Database alerts bridge the gap between monitoring dashboards and immediate action — paging on-call when replication lags, Slack-posting when disk fills up, emailing when the backup fails. Shell scripts that combine database queries with notification channels provide flexible, customisable alerting without expensive monitoring platforms.
1. Threshold-based alert script
BASH
#!/usr/bin/env bash
# db_alert.sh — Database threshold alerting with multiple channels
set -euo pipefail
DB_CONF="/etc/myapp/mysql.conf"
ALERT_LOG="/var/log/myapp/db-alerts.log"
SLACK_URL="${SLACK_WEBHOOK_URL:-}"
PAGERDUTY_KEY="${PAGERDUTY_ROUTING_KEY:-}"
db() { mysql --defaults-file="${DB_CONF}" -BNs -D myapp -e "$1"; }
alert() {
  local level="$1" title="$2" message="$3"
  local ts; ts=$(date --iso-8601=seconds)
  # Log locally first, so every alert is recorded even if delivery fails
  echo "[${ts}] ${level}: ${title} — ${message}" >> "${ALERT_LOG}"
  # Slack (note the escaped quotes — unescaped JSON inside a double-quoted
  # string breaks the shell; "|| true" keeps a failed webhook from aborting
  # the remaining checks under set -e)
  if [[ -n "${SLACK_URL}" ]]; then
    local color
    case "${level}" in CRITICAL) color="danger";; WARNING) color="warning";; *) color="good";; esac
    curl -s -X POST "${SLACK_URL}" -H "Content-Type: application/json" \
      -d "{\"attachments\":[{\"color\":\"${color}\",\"title\":\"[${level}] ${title}\",\"text\":\"${message}\",\"footer\":\"$(hostname) | ${ts}\"}]}" \
      >/dev/null 2>&1 || true
  fi
  # PagerDuty for CRITICAL
  if [[ "${level}" == "CRITICAL" && -n "${PAGERDUTY_KEY}" ]]; then
    curl -s -X POST "https://events.pagerduty.com/v2/enqueue" \
      -H "Content-Type: application/json" \
      -d "{\"routing_key\":\"${PAGERDUTY_KEY}\",\"event_action\":\"trigger\",\"payload\":{\"summary\":\"${title}\",\"severity\":\"critical\",\"source\":\"$(hostname)\",\"custom_details\":{\"message\":\"${message}\"}}}" \
      >/dev/null 2>&1 || true
  fi
}
# ── Connection count ───────────────────────────────────────
CONNS=$(db "SELECT COUNT(*) FROM information_schema.PROCESSLIST")
MAX_CONNS=$(db "SELECT @@max_connections")
PCT=$(( CONNS * 100 / MAX_CONNS ))
# Use if/elif rather than "(( ... )) && alert": under set -e, a false
# arithmetic test at the end of a && list would exit the whole script
if (( PCT >= 90 )); then
  alert "CRITICAL" "MySQL connections near limit" "${CONNS}/${MAX_CONNS} (${PCT}%)"
elif (( PCT >= 75 )); then
  alert "WARNING" "MySQL connections elevated" "${CONNS}/${MAX_CONNS} (${PCT}%)"
fi
# ── Disk usage ────────────────────────────────────────────
DISK_PCT=$(df /var/lib/mysql | awk 'NR==2{gsub(/%/,"",$5);print $5}')
if (( DISK_PCT >= 90 )); then
  alert "CRITICAL" "MySQL disk nearly full" "${DISK_PCT}% used on /var/lib/mysql"
elif (( DISK_PCT >= 75 )); then
  alert "WARNING" "MySQL disk usage high" "${DISK_PCT}% used"
fi
# ── Replication lag ────────────────────────────────────────
LAG=$(db "SHOW SLAVE STATUS\G" 2>/dev/null | awk '/Seconds_Behind_Master/{print $2}') || true
if [[ -n "${LAG}" && "${LAG}" != "NULL" ]] && (( LAG > 60 )); then
  alert "CRITICAL" "MySQL replication lag" "${LAG}s behind master"
fi
# ── Long-running queries ───────────────────────────────────
LONG_QUERY=$(db "SELECT COUNT(*) FROM information_schema.PROCESSLIST
  WHERE command != 'Sleep' AND time > 300")
if (( LONG_QUERY > 0 )); then
  alert "WARNING" "Long-running queries" "${LONG_QUERY} queries running >5min"
fi
echo "Alert check complete: $(date '+%H:%M:%S')"
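Interpolating shell variables straight into a JSON string is fragile: a title or message containing a double quote produces invalid JSON (or a broken command line). One way to harden the `alert()` function is to build the payload through a small escaping helper. This is a sketch; `json_escape` and `build_slack_payload` are hypothetical helper names, not part of the script above.

```shell
#!/usr/bin/env bash
# json_escape: minimal JSON string escaping (backslashes first, then quotes).
# Sufficient for typical alert text; a full solution would also handle
# control characters, e.g. via jq if it is available.
json_escape() { printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'; }

# build_slack_payload: assemble the attachment JSON from escaped fields,
# instead of inlining raw variables into the curl -d string.
build_slack_payload() {
  local color="$1" title="$2" text="$3"
  printf '{"attachments":[{"color":"%s","title":"%s","text":"%s"}]}' \
    "$(json_escape "${color}")" \
    "$(json_escape "${title}")" \
    "$(json_escape "${text}")"
}

# Even a title containing double quotes now yields valid JSON
build_slack_payload danger '[CRITICAL] Disk "full"' '95% used on /var/lib/mysql'
```

The same helper works for the PagerDuty body; pass the result to `curl -d "$(build_slack_payload ...)"`.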
2. Dead-man switch — alert if backup stops
BASH
#!/usr/bin/env bash
# backup_watchdog.sh — Alert if no recent backup exists
BACKUP_DIR="/backups/mysql"
MAX_AGE_HOURS=25 # allow 1 hour slack past 24h schedule
ALERT_EMAIL="dba@example.com"
# Honour MAX_AGE_HOURS (the original -mtime -1 only looked back 24h) and
# pick the newest file by mtime rather than by a filename field
LATEST=$(find "${BACKUP_DIR}" -name "*.sql.gz" -mmin "-$(( MAX_AGE_HOURS * 60 ))" -printf '%T@ %p\n' 2>/dev/null | sort -rn | head -1 | cut -d' ' -f2-)
if [[ -z "${LATEST}" ]]; then
  NEWEST_TS=$(stat -c %Y "${BACKUP_DIR}"/*.sql.gz 2>/dev/null | sort -rn | head -1)
  AGE_HOURS=$(( ( $(date +%s) - ${NEWEST_TS:-0} ) / 3600 ))
  mail -s "CRITICAL: No MySQL backup in ${AGE_HOURS}h" "${ALERT_EMAIL}" << MSG
No MySQL backup file has been written in the last ${MAX_AGE_HOURS} hours.
Last backup found: $(ls -lt "${BACKUP_DIR}"/*.sql.gz 2>/dev/null | head -1)
Backup directory: ${BACKUP_DIR}
Host: $(hostname)
Time: $(date)
Investigate immediately — data is not being backed up.
MSG
  echo "ALERT sent: no recent backup"
else
  echo "OK: Recent backup found: $(basename "${LATEST}")"
fi
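Both scripts are written for unattended runs, so they pair naturally with cron. A crontab sketch, assuming the scripts live in /usr/local/bin and the nightly backup finishes before 03:30 (paths and cadence are assumptions, adjust to your schedule):

```
# Threshold checks every 5 minutes; watchdog once, after the backup window
*/5 * * * *  /usr/local/bin/db_alert.sh        >> /var/log/myapp/db-alerts.log 2>&1
30 3 * * *   /usr/local/bin/backup_watchdog.sh >> /var/log/myapp/db-alerts.log 2>&1
```

Redirecting stdout/stderr into the alert log keeps the "OK" lines as an audit trail that the checks themselves are still running.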
✔ Alerting tips — Check every threshold from multiple angles: connection percentage, absolute count, and trend. Deduplicate alerts with a state file so the same alert is not sent 60 times in an hour. Use severity levels (CRITICAL for page-now, WARNING for next-business-day) to route to different channels. Test alert routing monthly by deliberately triggering a non-critical threshold. And keep a dead-man switch (alert when backups STOP): it catches failures that positive checks miss.
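The state-file deduplication mentioned above can be sketched as a cooldown gate in front of `alert()`. This is a minimal illustration; `STATE_DIR`, `COOLDOWN_SECS`, and `should_alert` are assumed names, not part of the scripts above.

```shell
#!/usr/bin/env bash
# Suppress repeats of the same alert key within a cooldown window by
# recording the last-sent timestamp in one file per key.
STATE_DIR="${STATE_DIR:-/var/tmp/db-alert-state}"
COOLDOWN_SECS="${COOLDOWN_SECS:-3600}"

should_alert() {                       # returns 0 (send) or 1 (suppress)
  local key="$1" now last
  mkdir -p "${STATE_DIR}"
  now=$(date +%s)
  last=$(cat "${STATE_DIR}/${key}" 2>/dev/null || echo 0)
  if (( now - last >= COOLDOWN_SECS )); then
    echo "${now}" > "${STATE_DIR}/${key}"   # record the send time
    return 0
  fi
  return 1
}

STATE_DIR=$(mktemp -d)                 # demo: fresh state for this run
should_alert "disk_full" && echo "first: send"
should_alert "disk_full" || echo "repeat: suppress"
```

In the alert script this becomes a guard such as `should_alert "conn_pct" && alert "WARNING" ...`; keys should identify the condition, not the message text, so a changing percentage does not defeat the deduplication.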