AWK handles multiple files naturally — processing them in sequence with FILENAME tracking each source and FNR resetting per file. This makes AWK ideal for cross-file joins, per-file reporting, and enrichment pipelines where you load a reference dataset from one file and apply it to another. Combine this with getline for pipeline integration and you have a complete data processing toolkit.
1
FNR vs NR — tracking files
AWK
# NR = total records read across ALL files
# FNR = records read in CURRENT file (resets per file)
# FILENAME = name of current input file
# ── Per-file headers ──────────────────────────────────────
awk 'FNR==1 { print "=== File:", FILENAME, "===" }
     { print NR, FNR, $0 }' file1.log file2.log file3.log
# ── Per-file summary ──────────────────────────────────────
awk 'FNR==1 && NR>1 {
    # At the start of a new file (but not the very first)
    printf " %s: %d errors\n", prev_file, errors
    errors = 0
}
/ERROR/ { errors++ }
{ prev_file = FILENAME }
END { printf " %s: %d errors\n", prev_file, errors }' *.log
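# GNU awk alternative (a sketch, gawk-only): the ENDFILE special
# pattern runs after each input file, so no prev_file bookkeeping
gawk '/ERROR/ { errors++ }
      ENDFILE { printf " %s: %d errors\n", FILENAME, errors; errors = 0 }' *.log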
# ── Detect which file we are in ───────────────────────────
# (-F',' so $1/$2 split on commas rather than whitespace)
awk -F',' 'FILENAME == "servers.csv" { servers[$1] = $2 }
           FILENAME == "metrics.log" { print $1, servers[$1], $2 }' \
    servers.csv metrics.log
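# ── Skip the rest of a file early ─────────────────────────
# nextfile (a common extension: gawk, mawk, recent BWK awk)
# jumps straight to the next input file
awk '/FATAL/ { print FILENAME ": contains FATAL"; nextfile }' *.log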
2
Two-file join — the FNR==NR pattern
AWK
# The classic AWK join idiom:
# FNR==NR is true only when reading the FIRST file
# next skips to next record without processing rest of rules
# So: first file loads the lookup table; second file uses it
# ── Enrich metrics with server metadata ───────────────────
# servers.csv: hostname,region,tier,owner
# metrics.log: hostname,cpu_pct,mem_pct,disk_pct
awk 'BEGIN { FS="," }
FNR==NR {
region[$1] = $2
tier[$1] = $3
owner[$1] = $4
next
}
{
h = $1
printf "%-15s region=%-10s tier=%-4s owner=%-12s cpu=%s mem=%s disk=%s\n",
h, region[h], tier[h], owner[h], $2, $3, $4
}' servers.csv metrics.log
# ── Find lines in file2 NOT in file1 (set difference) ─────
awk 'FNR==NR { seen[$0]=1; next }
     !seen[$0]' file1.txt file2.txt
# ── Find lines COMMON to both files (intersection) ────────
awk 'FNR==NR { seen[$0]=1; next }
     seen[$0]' file1.txt file2.txt
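# Caveat: FNR==NR also holds while reading the SECOND file if the
# first file is empty. A stricter test (sketch) compares FILENAME
# against ARGV:
awk 'FILENAME == ARGV[1] { seen[$0]=1; next }
     !seen[$0]' file1.txt file2.txt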
# ── Join on specific field (like SQL JOIN) ────────────────
# employees.csv: id,name,dept_id
# departments.csv: dept_id,dept_name
awk 'BEGIN { FS="," }
     FNR==NR { dept[$1]=$2; next }
     { print $2, $3, dept[$3] }' departments.csv employees.csv
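# Handle missing keys explicitly (a sketch): flag unmatched dept_ids
# instead of silently printing an empty field
awk 'BEGIN { FS="," }
     FNR==NR { dept[$1]=$2; next }
     { print $2, $3, ($3 in dept ? dept[$3] : "UNKNOWN") }' departments.csv employees.csv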
3
Writing to multiple output files
AWK
# ── Split log by level ────────────────────────────────────
awk '{ print > ("/tmp/logs/" $3 ".log") }' app.log
# Creates: /tmp/logs/ERROR.log, /tmp/logs/WARN.log, /tmp/logs/INFO.log
# (parentheses keep the filename concatenation portable; the target
#  directory must already exist, awk will not create it)
# ── Split CSV by department ───────────────────────────────
awk -F',' 'NR>1 { print > ("/tmp/dept_" $3 ".csv") }' employees.csv
# ── Append to file ────────────────────────────────────────
awk '/ERROR/ { print >> "/var/log/errors_only.log" }
     { print }' app.log
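# NOTE: ">" truncates its file only on FIRST use within a run, then
# keeps appending; ">>" never truncates, even across separate runs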
# ── Split into chunks of N lines ─────────────────────────
awk '{ file = sprintf("/tmp/chunk_%04d.txt", int((NR-1)/1000))
       print > file }' bigfile.txt
# Creates chunk_0000.txt (lines 1-1000), chunk_0001.txt (1001-2000)...
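# Same split but closing finished chunks (a sketch): avoids hitting
# the OS open-file limit when the split produces many files
awk '{ file = sprintf("/tmp/chunk_%04d.txt", int((NR-1)/1000))
       if (file != prev) { if (prev != "") close(prev); prev = file }
       print > file }' bigfile.txt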
# ── Print to pipe ─────────────────────────────────────────
awk '{ print | "sort -rn" }' numbers.txt
awk '{ print | "mail -s Alert ops@example.com" }' alerts.txt
# ── Close file handles to flush and reuse ────────────────
awk '{
    print > "/tmp/output.txt"
    close("/tmp/output.txt")   # flush; needed before re-reading the same file
}' input.txt                   # any input file; without one awk reads stdin
4
getline — reading from commands and files
AWK
# getline reads one line at a time from various sources
# Returns 1 on success, 0 on EOF, -1 on error
# ── Read from a command ───────────────────────────────────
awk 'BEGIN {
while (("cat /etc/hostname" | getline line) > 0)
hostname = line
close("cat /etc/hostname")
print "Running on:", hostname
}
{ print hostname, $0 }' app.log
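# One-shot capture (a sketch): a single getline is enough when the
# command prints exactly one line, e.g. today's date from date(1)
awk 'BEGIN { "date +%F" | getline today; close("date +%F") }
     { print today, $0 }' app.log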
# ── Read next line from same input ────────────────────────
awk '/^START/ {
    # getline var reads the next record into var (updates NR/FNR, not $0)
    if ((getline nextline) > 0)   # check the return value: 0 at EOF
        print "After START:", nextline
}' events.log
# ── Read from a file ──────────────────────────────────────
awk '{
    while ((getline line < "/etc/hosts") > 0)
        if (line ~ $1) print "Found in hosts:", line
    close("/etc/hosts")   # rewind so the next input line rescans from the top
}' hostnames.txt
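# The loop above re-reads /etc/hosts once per input line; for large
# inputs, load it once with the FNR==NR pattern instead (a sketch):
awk 'FNR==NR { hosts[++n] = $0; next }
     { for (i = 1; i <= n; i++)
           if (hosts[i] ~ $1) print "Found in hosts:", hosts[i] }' /etc/hosts hostnames.txt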
# ── AWK script file ───────────────────────────────────────
# Store complex programs in a file: report.awk
# #!/usr/bin/awk -f
# BEGIN { FS=","; print "Report" }
# NR>1 { ... }
# END { print "Done" }
awk -f report.awk data.csv # run from file
chmod +x report.awk && ./report.awk data.csv # run directly
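# A filled-in report.awk (a sketch; assumes a numeric amount in
# column 2 of data.csv, a hypothetical layout):
# #!/usr/bin/awk -f
# BEGIN { FS=","; print "Report" }
# NR>1 { total += $2; rows++ }
# END { printf "%d rows, total %.2f\n", rows, total; print "Done" }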
vriddh@prod-01:~/scripts$ awk 'BEGIN{FS=","} FNR==NR{region[$1]=$2;tier[$1]=$3;next} {printf "%-15s %-10s %s cpu=%s\n",$1,region[$1],tier[$1],$2}' servers.csv metrics.log
prod-web-01 ap-south web cpu=38%
prod-db-01 ap-south db cpu=91%
prod-cache-01 ap-south cache cpu=23%
vriddh@prod-01:~/scripts$ awk '{print > ("/tmp/logs/" $3 ".log")}' app.log && ls /tmp/logs/
ERROR.log INFO.log WARN.log
✔ Multi-file AWK rules — Use FNR==NR { ...; next } to load the first file as a lookup table. Always pass the lookup file first on the command line. Use FILENAME and FNR for per-file tracking. Close output files with close() when writing many distinct files to avoid hitting the OS open-file limit. Store complex AWK programs in .awk files and run with awk -f program.awk data.