AWK is not just a text tool — it is a complete programming language designed specifically for processing structured text. Unlike grep (which filters) or sed (which transforms), AWK processes data field by field, record by record, with full arithmetic, string functions, arrays, and control flow. Understanding AWK deeply is what separates advanced shell scripters from the rest.
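The filter/transform/compute distinction can be seen on one tiny input (the names and numbers below are made up for illustration):

```shell
# Three tools, same two-line input (hypothetical data)
printf 'alice 42\nbob 7\n' | grep 'alice'                            # filters:  alice 42
printf 'alice 42\nbob 7\n' | sed 's/alice/ALICE/'                    # rewrites: ALICE 42 / bob 7
printf 'alice 42\nbob 7\n' | awk '{ sum += $2 } END { print sum }'   # computes: 49
```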
1. AWK execution model — pattern { action }
# AWK processes input one record (line) at a time
# Each record is split into fields: $1, $2, ... $NF
# $0 = entire record, NR = record number, NF = number of fields
# FS = field separator (default: whitespace)
# ── Structure: [pattern] { action } ──────────────────────
awk '{ print $1 }' # action only (runs for every line)
awk '/ERROR/ { print $0 }' # pattern + action (regex match)
awk 'NR == 1 { print "Header:", $0 }' # expression pattern
awk '/START/,/END/ { print }' # range pattern
# ── BEGIN and END blocks ──────────────────────────────────
awk '
BEGIN {
    FS = ","                  # set field separator before any input
    print "Processing CSV..."
    count = 0
}
NR > 1 {                      # skip header row
    count++
    total += $3               # sum column 3
}
END {
    print "Records:", count
    print "Total:", total
    print "Average:", (count > 0 ? total/count : 0)
}' data.csv
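The same BEGIN/body/END shape can be exercised without a real data.csv by feeding a here-document (column names and values below are invented):

```shell
awk '
BEGIN  { FS = "," }
NR > 1 { count++; total += $3 }
END    { printf "avg=%.1f\n", (count > 0 ? total/count : 0) }
' <<'EOF'
host,region,cpu
web-1,us,40
db-1,eu,90
EOF
# avg=65.0
```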
# ── Multiple rules match the same line ───────────────────
awk '
/ERROR/ { errors++ }
/WARN/ { warns++ }
END { printf "Errors: %d, Warnings: %d\n", errors, warns }
' app.log
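Piping a few fabricated log lines through a two-rule program of this shape confirms each rule fires independently on every record:

```shell
printf 'ERROR disk full\nWARN slow query\nERROR timeout\n' |
    awk '/ERROR/ { e++ } /WARN/ { w++ } END { printf "Errors: %d, Warnings: %d\n", e, w }'
# Errors: 2, Warnings: 1
```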
2. Built-in variables — the AWK environment
# ── Input variables ───────────────────────────────────────
# NR current record number (total lines read)
# FNR record number within current file
# NF number of fields in current record
# $0 full current record
# $1..$NF individual fields
# FILENAME current input filename
# ── Separator variables ───────────────────────────────────
# FS field separator (default: whitespace; set in BEGIN)
# OFS output field separator (default: space)
# RS record separator (default: newline)
# ORS output record separator (default: newline)
# ── Examples ──────────────────────────────────────────────
awk '{ print NR, NF, $1, $NF }' file.txt
# Process CSV — comma separator
awk -F',' '{ print $2 }' data.csv
awk 'BEGIN{FS=","} { print $2 }' data.csv # same
# Output with custom separator
awk 'BEGIN{FS=":"; OFS="\t"} { print $1, $3, $7 }' /etc/passwd
# Multi-character delimiter
awk -F'  +' '{ print $1 }' # 2+ spaces as delimiter
awk -F'[,;|]' '{ print $2 }' # regex delimiter
# Process records separated by blank lines (paragraphs)
awk 'BEGIN{RS=""} { print NR, $0 }' file.txt
# $NF — last field (very useful)
ls -la | awk 'NR>1 { print $NF }' # filename only (breaks on names with spaces)
df -h | awk 'NR>1 { print $NF, $5 }' # mount point + use%
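The NR/FNR distinction only shows up with multiple input files; a sketch using two throwaway files under /tmp (paths are arbitrary):

```shell
printf 'a\nb\n' > /tmp/awk_f1
printf 'c\n'    > /tmp/awk_f2
awk '{ print FILENAME, NR, FNR }' /tmp/awk_f1 /tmp/awk_f2
# /tmp/awk_f1 1 1    <- NR keeps counting across files
# /tmp/awk_f1 2 2
# /tmp/awk_f2 3 1    <- FNR resets at each new file
```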
3. String functions — the AWK stdlib
# ── String functions ──────────────────────────────────────
awk '{ print length($0) }' # line length
awk '{ print length($1) }' # field 1 length
awk '{ print toupper($1) }' # uppercase
awk '{ print tolower($1) }' # lowercase
awk '{ print substr($0, 1, 10) }' # first 10 chars
awk '{ print substr($1, 5) }' # from position 5 to end
awk '{ print index($0, "ERROR") }' # position of substring (0=not found)
awk '{ if (split($0, arr, ":") > 2) print arr[1], arr[3] }'
# split("str", array, sep) → fills array, returns count
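One concrete split() call, on a made-up date string, shows both the return value and the filled array:

```shell
echo '2024-01-15' | awk '{ n = split($0, d, "-"); print n, d[1], d[3] }'
# 3 2024 15    <- n = piece count, d[1..3] = the pieces
```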
# ── sub and gsub — find and replace ──────────────────────
awk '{ sub(/ERROR/, "WARN"); print }' # replace first match
awk '{ gsub(/ERROR/, "WARN"); print }' # replace all matches
awk '{ gsub(/ +/, "_"); print }' # spaces → underscores
awk '{ gsub(/[^0-9]/, ""); print }' # keep digits only
# ── printf — formatted output ─────────────────────────────
awk '{ printf "%-20s %8.2f\n", $1, $2 }' # aligned columns
awk '{ printf "%05d %s\n", NR, $0 }' # zero-padded line numbers
# ── match — regex matching with position ─────────────────
awk '{ if (match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/))
print substr($0, RSTART, RLENGTH) }' # extract IP
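Two details worth verifying by hand: gsub() returns the number of replacements it made, and RSTART/RLENGTH are set only after a successful match() (the inputs below are invented):

```shell
echo 'a,b,,c' | awk '{ n = gsub(/,/, ";"); print n, $0 }'
# 3 a;b;;c
echo 'src=10.0.0.1 ok' |
    awk 'match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) { print substr($0, RSTART, RLENGTH) }'
# 10.0.0.1
```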
4. AWK arithmetic and control flow
# ── Arithmetic ────────────────────────────────────────────
awk '{ sum += $1; count++ } END { print sum/count }'
awk 'BEGIN { print 355/113 }' # pi approximation: 3.14159
awk '{ print int($1 * 1.21) }' # integer truncation
awk 'BEGIN { print sqrt(2), log(10), sin(3.14) }'
# ── Conditionals ──────────────────────────────────────────
awk '$3 > 80 { print "HIGH:", $0 }'
awk '{ if ($2 == "ERROR") print "error:", $3; else print "ok:", $3 }'
awk '{ status = ($3 > 90) ? "CRITICAL" : ($3 > 70 ? "WARN" : "OK"); print status, $1 }'
# ── Loops ─────────────────────────────────────────────────
awk '{ for (i=1; i<=NF; i++) printf "%s ", $i; print "" }'
awk 'BEGIN { for (i=1; i<=10; i++) print i*i }'
awk '{ i=1; while (i <= NF) { print $i; i++ } }'
# ── next — skip to next record ────────────────────────────
awk '/^#/ { next } { print }' # skip comment lines
awk 'NF == 0 { next } { print }' # skip empty lines
# ── exit — stop processing ────────────────────────────────
awk 'NR==5 { exit } { print }' # print first 4 lines only
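The field loop composes nicely with OFS/ORS; for instance, a sketch that prints a record's fields in reverse order (the input words are arbitrary):

```shell
echo 'alpha beta gamma' |
    awk '{ for (i = NF; i >= 1; i--) printf "%s%s", $i, (i > 1 ? OFS : ORS) }'
# gamma beta alpha
```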
Terminal output
vriddh@prod-01:~/scripts$ awk -F',' 'BEGIN{printf "%-15s %5s %s\n","Name","CPU","Status"} NR>1{status=$3>80?"HIGH":"OK"; printf "%-15s %4s%% %s\n",$1,$3,status}' servers.csv
Name              CPU Status
prod-web-01       38% OK
prod-db-01        91% HIGH
prod-cache-01     23% OK
vriddh@prod-01:~/scripts$ awk -F',' 'NR>1{sum+=$3; n++} END{printf "Avg CPU: %.1f%%\n", sum/n}' servers.csv
Avg CPU: 50.7%
✔ AWK fundamentals — Every AWK program is zero or more pattern { action } rules. Use BEGIN to set FS and print headers; use END for summaries. $0 = full line, $1–$NF = fields, $NF = last field. When the program lives in a script file, set FS in BEGIN rather than relying on a -F flag on the command line. Use printf for aligned output — plain print gives too little control for reports.