A backup script that takes 20 minutes instead of 5 is a problem. A health check that hangs for 30 seconds on an unreachable host stalls your whole pipeline. Understanding where bash spends its time — and recognising the patterns that create invisible overhead — lets you fix scripts that are too slow for production use.
1
Measure first — time and profiling
BASH
# ── time — measure overall script runtime ─────────────────
time ./backup.sh
# real 0m42.183s ← wall clock time (what matters)
# user 0m0.284s ← CPU time in user space
# sys 0m0.091s ← CPU time in kernel
# ── Measure individual sections ───────────────────────────
time_section() {
    local label="${1}"; shift
    local start; start=$(date +%s%N)
    "$@"
    local elapsed=$(( ($(date +%s%N) - start) / 1000000 ))
    printf "  [TIMER] %-30s %dms\n" "${label}" "${elapsed}" >&2
}
time_section "DB dump" mysqldump myapp > /backups/dump.sql
time_section "Compress" gzip /backups/dump.sql
time_section "Upload" rsync /backups/ bkp-01:/archive/
# ── Simple profiling with PS4 ─────────────────────────────
# Note: $(date +%s%N) forks once per traced command; on bash 5+
# use $EPOCHREALTIME (seconds.microseconds) to avoid that fork.
PS4='+ $(date "+%s%N") ${BASH_SOURCE[0]}:${LINENO}: '
exec 3>&2 2>/tmp/trace.log
set -x
# ... your script ...
set +x
exec 2>&3 3>&-
echo "Trace written to /tmp/trace.log"
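The raw trace only has absolute nanosecond timestamps; a small awk pass (a sketch, assuming the PS4 format above) attributes the gap between consecutive timestamps to the earlier traced line, so the slowest lines surface first:

```shell
# Convert consecutive ns timestamps into per-line durations (ms)
# and show the 5 slowest traced lines.
awk '
    prev { printf "%10.1f  %s\n", ($2 - prev) / 1000000, line }
         { prev = $2; line = $0 }
' /tmp/trace.log | sort -rn | head -5
```

Each output row is the time spent on one traced line, in milliseconds, followed by the trace line itself.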
2
Avoid unnecessary subshells — the biggest performance trap
BASH
# ── Every $() forks a subshell — expensive in loops ───────
# SLOW — forks cat + wc for every iteration
for f in /var/log/*.log; do
    lines=$(cat "${f}" | wc -l)   # subshell + cat + wc every iteration
    echo "${f}: ${lines} lines"
done
# FAST — wc reads files directly, one fork total
wc -l /var/log/*.log
# ── Useless use of cat ────────────────────────────────────
# SLOW
cat file.txt | grep "ERROR" # forks cat unnecessarily
cat file.txt | wc -l # same
# FAST — redirect directly
grep "ERROR" file.txt
wc -l < file.txt
# ── Use built-ins over external commands ──────────────────
# SLOW — forks an external process
n=$(expr $n + 1)
upper=$(echo "${str}" | tr '[:lower:]' '[:upper:]')
# FAST — bash built-ins, no fork
(( n++ ))          # careful: exit status is 1 when n was 0 (matters under set -e)
upper="${str^^}"   # bash 4+ built-in uppercase
# ── More built-in alternatives ────────────────────────────
# Instead of: $(dirname "$path") → use: "${path%/*}" (when path contains /)
# Instead of: $(basename "$path") → use: "${path##*/}"
# Instead of: $(echo "$str" | sed 's/foo/bar/') → use: "${str/foo/bar}"
# Instead of: $(echo "$str" | grep -c 'pat') → use regex in [[ ]]
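These substitutions are behaviour-compatible for common inputs; a quick sanity check (a sketch with made-up sample values) confirms each expansion matches its forked counterpart:

```shell
path="/var/log/app/error.log"
str="disk error"

# Each pair produces identical output, the left side with no fork.
[[ "${path##*/}" == "$(basename "${path}")" ]] && echo "basename OK"
[[ "${path%/*}"  == "$(dirname "${path}")"  ]] && echo "dirname OK"   # needs a / in path
[[ "${str^^}"    == "$(echo "${str}" | tr '[:lower:]' '[:upper:]')" ]] && echo "upper OK"
[[ "${str/disk/tape}" == "$(echo "${str}" | sed 's/disk/tape/')" ]] && echo "replace OK"
```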
3
Parallelise work — xargs -P and background jobs
BASH
# ── xargs -P — parallel execution ─────────────────────────
# Compress 100 log files in parallel (8 at a time)
find /var/log -name "*.log" -mtime +7 | xargs -P 8 -I{} gzip {}
# Process images in parallel
find /photos -name "*.jpg" | \
    xargs -P "$(nproc)" -I{} convert {} -resize 800x600 {}_thumb.jpg
# ── Parallel with background jobs ─────────────────────────
MAX_PARALLEL=4
running=0
for server in "${servers[@]}"; do
    process_server "${server}" &
    (( ++running >= MAX_PARALLEL )) && {
        # bash 4.3+: wait for any one job; older bash falls back to all
        wait -n 2>/dev/null || wait
        (( running-- ))
    }
done
wait # wait for all remaining
# ── Batch network checks in parallel ─────────────────────
check_host() {
    ping -c1 -W2 "${1}" &>/dev/null \
        && echo "UP: ${1}" \
        || echo "DOWN: ${1}"
}
export -f check_host
printf "%s\n" web-{01..10} | xargs -P 10 -I{} bash -c 'check_host "$@"' _ {}
# ── Benchmark before/after parallelism ───────────────────
echo "Sequential:"
time { for i in {1..10}; do sleep 1; done; }
# real: ~10s
echo "Parallel:"
time { for i in {1..10}; do sleep 1 & done; wait; }
# real: ~1s
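When the per-file command is cheap, forking one gzip per file (-I{}) becomes the bottleneck itself. Batching arguments with -n (a sketch of the same find pipeline, with -print0/-0 added so filenames with spaces survive) keeps 8 workers busy with far fewer forks:

```shell
# Each gzip invocation receives up to 25 files; xargs keeps 8
# such invocations running at once. NUL separation (-print0/-0)
# makes filenames containing spaces or newlines safe.
find /var/log -name "*.log" -mtime +7 -print0 \
    | xargs -0 -P 8 -n 25 gzip
```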
vriddh@prod-01:~/scripts$ time ./compress_logs.sh # sequential
Compressed 48 log files
real 2m18.4s
vriddh@prod-01:~/scripts$ time find /var/log -name "*.log" -mtime +7 | xargs -P 8 -I{} gzip {}
real 0m19.2s
vriddh@prod-01:~/scripts$ printf "%s\n" web-{01..10} | xargs -P 10 -I{} bash -c 'check_host "$@"' _ {}
UP: web-01
UP: web-02
DOWN: web-05
UP: web-07
✔ Performance rules — Measure before optimising, with time and time_section. Avoid cat file | cmd; use cmd < file or just cmd file. Prefer bash built-ins (${str^^}, (( n++ )), ${path##*/}) over forking subshells. Parallelise file operations with xargs -P and long-running commands with background jobs plus wait. Use nproc to match parallelism to the CPU count automatically.