Shell Scripting AWK Multi-file Advanced May 2026

Shell Scripting Advanced AWK: Multi-file & Pipes

Process multiple files simultaneously, join datasets using FNR==NR, write to multiple output files, read from pipes inside AWK with getline, and build AWK programs stored in reusable script files.

AWK handles multiple files naturally — processing them in sequence with FILENAME tracking each source and FNR resetting per file. This makes AWK ideal for cross-file joins, per-file reporting, and enrichment pipelines where you load a reference dataset from one file and apply it to another. Combine this with getline for pipeline integration and you have a complete data processing toolkit.

AWK — multi-file basics: NR, FNR, FILENAME
# NR  = total records read across ALL files
# FNR = records read in CURRENT file (resets per file)
# FILENAME = name of current input file

# ── Per-file headers ──────────────────────────────────────
awk 'FNR==1 { print "=== File:", FILENAME, "===" }
     { print NR, FNR, $0 }' file1.log file2.log file3.log

# ── Per-file summary ──────────────────────────────────────
awk 'FNR==1 && NR>1 {
  # At the start of a new file (but not the very first)
  printf "  %s: %d errors\n", prev_file, errors
  errors = 0
}
/ERROR/ { errors++ }
{ prev_file = FILENAME }
END { printf "  %s: %d errors\n", prev_file, errors }' *.log
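If you can rely on GNU awk (4.0+), its `ENDFILE` special pattern fires after the last record of each input file, which removes the `prev_file` bookkeeping entirely. A gawk-only sketch:

```shell
# gawk-only: ENDFILE runs after each file; FILENAME is the file just finished
gawk '/ERROR/ { errors++ }
ENDFILE      { printf "  %s: %d errors\n", FILENAME, errors; errors = 0 }' *.log
```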

# ── Detect which file we are in ───────────────────────────
# servers.csv is comma-separated while metrics.log is whitespace-separated,
# so split the CSV lines explicitly instead of changing FS mid-stream
awk 'FILENAME == "servers.csv" { split($0, f, ","); servers[f[1]] = f[2] }
     FILENAME == "metrics.log" { print $1, servers[$1], $2 }' \
  servers.csv metrics.log
AWK — joins with FNR==NR
# The classic AWK join idiom:
# FNR==NR  is true only when reading the FIRST file
# next     skips to next record without processing rest of rules
# So: first file loads the lookup table; second file uses it

# ── Enrich metrics with server metadata ───────────────────
# servers.csv:  hostname,region,tier,owner
# metrics.log:  hostname cpu_pct mem_pct disk_pct
awk 'FNR==NR {
  # servers.csv is comma-separated; metrics.log uses the default
  # whitespace FS, so split the CSV lines explicitly
  split($0, f, ",")
  region[f[1]] = f[2]
  tier[f[1]]   = f[3]
  owner[f[1]]  = f[4]
  next
}
{
  h = $1
  printf "%-15s region=%-10s tier=%-4s owner=%-12s cpu=%s mem=%s disk=%s\n",
    h, region[h], tier[h], owner[h], $2, $3, $4
}' servers.csv metrics.log

# ── Find lines in file2 NOT in file1 (set difference) ─────
awk 'FNR==NR { seen[$0]=1; next }
!seen[$0]' file1.txt file2.txt

# ── Find lines COMMON to both files (intersection) ────────
awk 'FNR==NR { seen[$0]=1; next }
seen[$0]' file1.txt file2.txt

# ── Join on specific field (like SQL JOIN) ────────────────
# employees.csv:  id,name,dept_id
# departments.csv: dept_id,dept_name
awk 'BEGIN { FS="," }
FNR==NR { dept[$1]=$2; next }
{ print $2, $3, dept[$3] }' departments.csv employees.csv
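As written, an employee whose dept_id has no match in departments.csv prints an empty department. A sketch of a LEFT-JOIN-style fallback (the "UNKNOWN" label is an arbitrary choice); testing with `$3 in dept` avoids creating the key as a side effect, which a bare `dept[$3]` lookup would:

```shell
awk 'BEGIN { FS="," }
FNR==NR { dept[$1] = $2; next }
{ print $2, $3, (($3 in dept) ? dept[$3] : "UNKNOWN") }' departments.csv employees.csv
```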
AWK — writing to multiple outputs & pipes
# ── Split log by level ────────────────────────────────────
awk '{ print > ("/tmp/logs/" $3 ".log") }' app.log   # parenthesize a concatenated target (not portable otherwise)
# Creates: /tmp/logs/ERROR.log, /tmp/logs/WARN.log, /tmp/logs/INFO.log

# ── Split CSV by department ───────────────────────────────
awk -F',' 'NR>1 { print > ("/tmp/dept_" $3 ".csv") }' employees.csv

# ── Append to file ────────────────────────────────────────
awk '/ERROR/ { print >> "/var/log/errors_only.log" }
     { print }' app.log

# ── Split into chunks of N lines ─────────────────────────
awk '{ file = sprintf("/tmp/chunk_%04d.txt", int((NR-1)/1000))
     print > file }' bigfile.txt
# Creates chunk_0000.txt (lines 1-1000), chunk_0001.txt (1001-2000)...
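Each chunk file stays open until the program exits, so very large inputs can hit the per-process open-file limit. A sketch that closes the previous chunk on each rollover:

```shell
awk '(NR-1) % 1000 == 0 {
       if (file != "") close(file)   # flush and release the finished chunk
       file = sprintf("/tmp/chunk_%04d.txt", (NR-1)/1000)
     }
     { print > file }' bigfile.txt
```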

# ── Print to pipe ─────────────────────────────────────────
awk '{ print | "sort -rn" }' numbers.txt
awk '{ print | "mail -s Alert ops@example.com" }' alerts.txt
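A pipe opened with `print | "cmd"` stays open until the program ends or `close()` is called, and the literal command string is the pipe's identity. A sketch routing two groups to separately sorted files (file and column names are assumptions):

```shell
# close() must be passed the exact same command string that opened the pipe
awk '$1 == "a" { print $2 | "sort -n > /tmp/a_sorted.txt" }
     $1 == "b" { print $2 | "sort -n > /tmp/b_sorted.txt" }
     END {
       close("sort -n > /tmp/a_sorted.txt")
       close("sort -n > /tmp/b_sorted.txt")
     }' pairs.txt
```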

# ── Close file handles to flush and reuse ────────────────
awk '{
  print > "/tmp/output.txt"
  close("/tmp/output.txt")   # flush — needed before reading same file
}'
AWK — getline & reusable script files
# getline reads one line at a time from various sources
# Returns 1 on success, 0 on EOF, -1 on error

# ── Read from a command ───────────────────────────────────
awk 'BEGIN {
  while (("cat /etc/hostname" | getline line) > 0)
    hostname = line
  close("cat /etc/hostname")
  print "Running on:", hostname
}
{ print hostname, $0 }' app.log
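For a command that emits a single line, the while loop is unnecessary; one common pattern (the `date` command here is just an illustration):

```shell
awk 'BEGIN {
  "date +%Y" | getline year    # takes only the first line of output
  close("date +%Y")
  print "Year:", year
}'
```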

# ── Read next line from same input ────────────────────────
awk '/^START/ {
  getline nextline   # reads next line into nextline
  print "After START:", nextline
}' events.log
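When START is the last line of input there is no next line to read, and a plain getline leaves the variable untouched. A sketch guarding on the return value (1 on success, 0 at EOF):

```shell
awk '/^START/ {
  if ((getline nextline) > 0)      # getline var updates NR/FNR but not $0
    print "After START:", nextline
  else
    print "START was the last line"
}' events.log
```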

# ── Read from a file ──────────────────────────────────────
awk '{
  while ((getline line < "/etc/hosts") > 0)
    if (line ~ $1) print "Found in hosts:", line
  close("/etc/hosts")
}' hostnames.txt

# ── AWK script file ───────────────────────────────────────
# Store complex programs in a file: report.awk
# #!/usr/bin/awk -f
# BEGIN { FS=","; print "Report" }
# NR>1  { ... }
# END   { print "Done" }
awk -f report.awk data.csv        # run from file
chmod +x report.awk && ./report.awk data.csv  # run directly
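Putting the pieces together, a hypothetical report.awk for the id,name,dept_id layout used above (input data and column meanings are assumptions):

```shell
# sample input matching the employees.csv layout above
printf 'id,name,dept_id\n1,asha,10\n2,ravi,10\n3,meera,20\n' > employees.csv

cat > report.awk <<'EOF'
#!/usr/bin/awk -f
# Count employees per department id (columns: id,name,dept_id)
BEGIN  { FS=","; print "Report" }
NR > 1 { count[$3]++ }                     # skip the header row
END {
  for (d in count) printf "  dept %s: %d employees\n", d, count[d]
  print "Done"
}
EOF

awk -f report.awk employees.csv
```

Note that `for (d in count)` visits keys in an unspecified order; pipe through `sort` if the report must be stable.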
awk — multi-file join demo
vriddh@prod-01:~/scripts$ awk 'FNR==NR{split($0,f,",");region[f[1]]=f[2];tier[f[1]]=f[3];next} {printf "%-15s %-10s %s cpu=%s\n",$1,region[$1],tier[$1],$2}' servers.csv metrics.log
prod-web-01     ap-south   web cpu=38%
prod-db-01      ap-south   db cpu=91%
prod-cache-01   ap-south   cache cpu=23%
vriddh@prod-01:~/scripts$ awk '{print > ("/tmp/logs/" $3 ".log")}' app.log && ls /tmp/logs/
ERROR.log INFO.log WARN.log
✔ Multi-file AWK rules — Use FNR==NR { ...; next } to load the first file as a lookup table. Always pass the lookup file first on the command line. Use FILENAME and FNR for per-file tracking. Close output files with close() when writing many distinct files to avoid hitting the OS open-file limit. Store complex AWK programs in .awk files and run with awk -f program.awk data.