Shell Scripting CSV awk Intermediate May 2026

Shell Scripting CSV & Delimited File Processing

Process CSV exports, TSV files, and custom delimiters using awk, while read with IFS, cut, and join. Handle headers, quoted fields, missing values, and generate summary reports from tabular data.

CSV and delimited files are everywhere in DevOps: inventory exports, billing data, database dumps, configuration lists. Bash handles them well once you know the right tools. awk is the most powerful for column-aware processing, while a while IFS=',' read -r loop is the safest for line-by-line parsing with explicit field splitting.

BASH
# CSV file: servers.csv
# name,host,port,role
# db-primary,10.0.1.10,3306,primary
# db-replica,10.0.1.11,3306,replica
# cache-01,10.0.1.20,6379,cache

# ── Read CSV, skip header ─────────────────────────────────
while IFS=',' read -r name host port role; do
  [[ "${name}" == "name" ]] && continue   # skip header row
  [[ "${name}" == "#"* ]]   && continue   # skip comment lines
  [[ -z "${name}" ]]        && continue   # skip blank lines

  printf "  Checking %-15s %-15s %-6s %s\n" \
    "${name}" "${host}" "${port}" "${role}"
  nc -z -w 3 "${host}" "${port}" 2>/dev/null \
    && echo "    ✔ UP" || echo "    ✘ DOWN"
done < servers.csv

# ── Skip header with tail ─────────────────────────────────
tail -n +2 servers.csv | while IFS=',' read -r name host port role; do
  echo "  Processing: ${name}"
done

# ── TSV (tab-delimited) ───────────────────────────────────
while IFS=$'\t' read -r col1 col2 col3; do
  echo "  ${col1} | ${col2} | ${col3}"
done < data.tsv
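One caveat with the tail -n +2 | while pattern above: the pipeline runs the loop in a subshell, so any variables set inside it are lost when the loop ends. A sketch of the workaround using bash process substitution (the /tmp path here is just for illustration):

```shell
# Sample data, matching servers.csv from the article
cat > /tmp/servers.csv <<'EOF'
name,host,port,role
db-primary,10.0.1.10,3306,primary
db-replica,10.0.1.11,3306,replica
cache-01,10.0.1.20,6379,cache
EOF

# A pipeline (tail ... | while ...) runs the loop body in a subshell,
# so a counter incremented there would read 0 afterwards. Process
# substitution keeps the loop in the current shell:
count=0
while IFS=',' read -r name host port role; do
  count=$((count + 1))
done < <(tail -n +2 /tmp/servers.csv)
echo "Processed ${count} rows"
```

With the pipeline form, the final echo would print 0; with process substitution it prints the real row count.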
BASH
# ── awk with field separator ──────────────────────────────
awk -F',' '{print $1, $3}' servers.csv       # name and port
awk -F',' 'NR>1 {print $2}' servers.csv      # host column, skip header

# ── Filter rows ───────────────────────────────────────────
awk -F',' '$4=="primary"' servers.csv        # only primary servers
awk -F',' '$3>3000' servers.csv              # port > 3000

# ── Sum a column ──────────────────────────────────────────
awk -F',' 'NR>1 {sum+=$3} END {print "Total:", sum}' data.csv

# ── Report with header ────────────────────────────────────
awk -F',' '
BEGIN {
  printf "%-20s %-15s %s\n", "Name", "Host", "Port"
  printf "%-20s %-15s %s\n", "----", "----", "----"
}
NR>1 {
  printf "%-20s %-15s %s\n", $1, $2, $3
}
END {
  printf "\nTotal: %d servers\n", NR-1
}' servers.csv

# ── Group by and count ────────────────────────────────────
awk -F',' 'NR>1 {count[$4]++} END {for (r in count) print r, count[r]}' \
  servers.csv
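The intro also mentions cut and join, which deserve a quick look. cut extracts columns by position; join merges two delimited files on a shared key, provided both inputs are sorted on that key. A sketch, where owners.csv is a hypothetical companion file mapping server names to owners:

```shell
# Sample inputs (servers.csv matches the article; owners.csv is made up)
cat > /tmp/servers.csv <<'EOF'
name,host,port,role
db-primary,10.0.1.10,3306,primary
db-replica,10.0.1.11,3306,replica
cache-01,10.0.1.20,6379,cache
EOF
cat > /tmp/owners.csv <<'EOF'
name,owner
cache-01,alice
db-primary,bob
db-replica,bob
EOF

# cut: pull columns 1 and 3 (name and port)
cut -d',' -f1,3 /tmp/servers.csv

# join: merge on column 1 of both files.
# Both sides must be sorted on the join key, so strip the
# headers and sort before joining:
join -t',' \
  <(tail -n +2 /tmp/servers.csv | sort -t',' -k1,1) \
  <(tail -n +2 /tmp/owners.csv  | sort -t',' -k1,1)
```

join prints the key once, then the remaining fields of each file, so each output line looks like name,host,port,role,owner.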
BASH
#!/usr/bin/env bash
# disk_report.sh — Generate CSV report of disk usage

OUTPUT="/tmp/disk_report_$(date +%Y%m%d).csv"

# Write header
echo "hostname,filesystem,size,used,available,use_pct,mounted_on" > "${OUTPUT}"

HOSTNAME=$(hostname)

# Write data rows
df -h --output=source,size,used,avail,pcent,target | grep -v tmpfs \
  | tail -n +2 \
  | while IFS= read -r line; do
      read -r fs size used avail pct mount <<< "${line}"
      printf "%s,%s,%s,%s,%s,%s,%s\n" \
        "${HOSTNAME}" "${fs}" "${size}" "${used}" \
        "${avail}" "${pct}" "${mount}"
    done >> "${OUTPUT}"

echo "Report saved: ${OUTPUT}"

# Append data from multiple servers
for srv in web-01 web-02 db-01; do
  ssh "${srv}" "df -h --output=source,size,used,avail,pcent,target | grep -v tmpfs | tail -n +2" \
    | awk -v host="${srv}" '{print host","$1","$2","$3","$4","$5","$6}' \
    >> "${OUTPUT}"
done
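Once the report exists, awk can mine it for alerts. A sketch that flags filesystems over a usage threshold; the sample file path and the 90% limit are arbitrary choices, and the use_pct column needs its % sign stripped before the numeric comparison:

```shell
# Sample rows in the format disk_report.sh produces
cat > /tmp/disk_report_sample.csv <<'EOF'
hostname,filesystem,size,used,available,use_pct,mounted_on
prod-01,/dev/sda1,50G,18G,30G,38%,/
prod-01,/dev/sdb1,200G,172G,18G,91%,/data
EOF

# Strip the % from use_pct (column 6), then compare numerically;
# pct+0 forces awk to treat the string as a number
awk -F',' -v limit=90 '
NR>1 {
  pct = $6
  sub(/%/, "", pct)
  if (pct + 0 > limit)
    printf "  ALERT: %s %s at %s\n", $1, $7, $6
}' /tmp/disk_report_sample.csv
```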
bash — CSV processing
vriddh@prod-01:~/scripts$ awk -F',' 'NR>1 {count[$4]++} END {for (r in count) printf " %-10s %d\n", r, count[r]}' servers.csv
 primary    1
 replica    1
 cache      1
vriddh@prod-01:~/scripts$ ./disk_report.sh
Report saved: /tmp/disk_report_20260501.csv
vriddh@prod-01:~/scripts$ head -3 /tmp/disk_report_20260501.csv
hostname,filesystem,size,used,available,use_pct,mounted_on
prod-01,/dev/sda1,50G,18G,30G,38%,/
prod-01,/dev/sdb1,200G,172G,18G,91%,/data
⚠ Quoted fields with commas — Standard CSV allows fields like "Smith, John",42. The IFS=',' pattern breaks on this. For production CSV with quoted fields, use Python's csv module or mlr (Miller) which handles RFC 4180 properly. For simple CSV without quoted commas, bash + awk works perfectly.
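To make the quoting pitfall concrete, here is a minimal sketch of the Python csv route for a field containing a comma; the sample record is invented for illustration:

```shell
# IFS=',' would split "Smith, John",42 into three broken fields.
# Python's csv module parses RFC 4180 quoting correctly:
printf '%s\n' '"Smith, John",42' | python3 -c '
import csv, sys
for row in csv.reader(sys.stdin):
    print(row[0], "|", row[1])'
```

This prints the two real fields (Smith, John and 42) rather than the three fragments a naive comma split would produce.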
✔ CSV processing rules — Always skip the header row with NR>1 in awk or tail -n +2. Always skip blank lines and comments. Use IFS=',' with while read for simple parsing. Use awk -F',' for filtering, aggregation, and reporting. Always quote CSV output fields that could contain commas. Use mlr or Python for complex RFC 4180 CSV.