Shell ScriptingValidationDatabase CentricAdvancedMay 2026

Shell Scripting Database Centric: Data Validation Scripts

Assert data quality after imports and migrations — row count checks, referential integrity probes, duplicate detection, range validation, cardinality checks, and statistical anomaly detection.

Every data pipeline has a validation step — or should. Shell scripts that validate database contents after imports, migrations, and ETL runs catch data quality problems before they propagate to downstream systems. This page covers row count checks, referential integrity probes, range validations, and duplicate detection.

BASH
#!/usr/bin/env bash
# validate_import.sh — Validate a CSV import into MySQL

set -euo pipefail

DB_CONF="/etc/myapp/mysql.conf"
db() { mysql --defaults-file="${DB_CONF}" -BNs -D myapp -e "$1"; }
ERRORS=0

check() {
  local name="$1" condition="$2"
  local result
  result=$(db "${condition}")
  if [[ "${result}" == "1" ]]; then
    printf "  %-40s PASS\n" "${name}"
  else
    printf "  %-40s FAIL (got: %s)\n" "${name}" "${result}"
    (( ERRORS++ ))
  fi
}

echo "=== Data Validation: $(date +%Y-%m-%d) ==="

# Row count checks
check "users table not empty"        "SELECT COUNT(*) > 0 FROM users"
check "all users have email"         "SELECT COUNT(*) = 0 FROM users WHERE email IS NULL OR email = ''"
check "email uniqueness"             "SELECT COUNT(email) = COUNT(DISTINCT email) FROM users"
check "no future created_at"         "SELECT COUNT(*) = 0 FROM users WHERE created_at > NOW()"
check "active flag valid"            "SELECT COUNT(*) = 0 FROM users WHERE active NOT IN (0,1)"

# Referential integrity
check "orders have valid user_id"    "SELECT COUNT(*) = 0 FROM orders o LEFT JOIN users u ON u.id=o.user_id WHERE u.id IS NULL"
check "order_items have valid order" "SELECT COUNT(*) = 0 FROM order_items oi LEFT JOIN orders o ON o.id=oi.order_id WHERE o.id IS NULL"

# Range validations
check "order totals non-negative"    "SELECT COUNT(*) = 0 FROM orders WHERE total < 0"
check "order totals reasonable"      "SELECT COUNT(*) = 0 FROM orders WHERE total > 100000"

echo ""
echo "  Results: ${ERRORS} failures"
(( ERRORS > 0 )) && { echo "  VALIDATION FAILED"; exit 1; }
echo "  All validations passed"
BASH
# ── Find duplicate emails ─────────────────────────────────
mysql --defaults-file="${DB_CONF}" -BNs myapp -e "
  SELECT email, COUNT(*) AS cnt
  FROM users
  GROUP BY email
  HAVING cnt > 1
  ORDER BY cnt DESC
  LIMIT 20" | awk -F$'\t' '{printf "  DUPLICATE: %-40s %d times\n",$1,$2}'

# ── Detect orphaned records ────────────────────────────────
mysql --defaults-file="${DB_CONF}" -BNs myapp -e "
  SELECT COUNT(*) FROM order_items oi
  WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.id=oi.order_id)" |   xargs -I{} bash -c '[ {} -gt 0 ] && echo "WARN: {} orphaned order_items"'

# ── Check column cardinality (detect runaway enums) ────────
mysql --defaults-file="${DB_CONF}" -BNs myapp -e "
  SELECT status, COUNT(*) FROM orders GROUP BY status ORDER BY 2 DESC"

# ── Statistical anomaly check ─────────────────────────────
mysql --defaults-file="${DB_CONF}" -BNs myapp -e "
  SELECT
    AVG(total) AS mean,
    STDDEV(total) AS stddev,
    MIN(total) AS min_val,
    MAX(total) AS max_val,
    COUNT(*) AS cnt
  FROM orders
  WHERE DATE(created_at) = CURDATE()" |   awk -F$'\t' '{
    if ($1 > 0 && $4/$1 > 10)
      printf "WARN: Max order (%.2f) is >10x average (%.2f)\n",$4,$1
  }' 
✔ Validati — Run validation after every import and migration. Use COALESCE and IS NULL checks for nullable columns. Always check referential integrity after bulk inserts. Store validation results to a log file for audit trails. Make the script exit non-zero on any failure so CI/CD pipelines can gate on it.