awk

Process and transform structured text data — extract columns, filter rows, and compute aggregates from log files and CSV data with awk.

awk is a text-processing language and command-line tool that scans input line by line, splits each line into fields, and applies pattern-action rules to extract, transform, and report structured data on Linux, macOS, and other Unix-like systems.

What awk Does and When to Use It

awk reads input one record (line) at a time, splits each record into fields by a delimiter (whitespace by default), and executes user-defined rules against those fields. Each rule consists of a pattern (when to act) and an action (what to do). System administrators use awk to extract columns from log files, calculate sums and averages, reformat CSV data, and build quick reports from command output.

awk is a complete programming language, but it is a poor fit for general-purpose scripting. For tasks involving complex data structures, structured error handling, or external API calls, use Python or Perl. awk excels at one-liners and short text transformations where brevity and speed matter.

Three implementations are in common use: the original awk (the "one true awk", the default on macOS and the BSDs), gawk (GNU awk, the default on many Linux distributions), and mawk (a fast, minimal implementation, the default on Debian and Ubuntu). gawk adds extensions such as network I/O, true multidimensional arrays (arrays of arrays), and extra functions such as gensub(). For official documentation, see man awk or gnu.org/software/gawk/manual/.

How to Install awk

awk ships pre-installed on macOS and on nearly all Linux distributions; minimal container images are the usual exception. Ubuntu and Debian install mawk by default; install gawk for GNU extensions:

sudo apt install gawk

Verify the installed version (older mawk builds do not accept --version; use awk -W version there):

awk --version

Core Concepts of awk

awk Fields and Delimiters

awk splits each input line into fields labeled $1, $2, $3, and so on. $0 represents the entire line. The default field separator is whitespace (spaces and tabs). Use -F to set a custom delimiter — for example, -F: for /etc/passwd or -F, for CSV files.
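
The splitting rules above can be sketched with a few inline lines. The sample data and the variable names (ws_out, colon_out, whole_out) are made up for illustration; nothing here reads a real file.

```shell
# Hypothetical sample data, generated inline so the example is self-contained.

# Default separator: any run of spaces or tabs.
ws_out=$(printf 'alice   42\tadmin\n' | awk '{print $2}')
echo "$ws_out"

# Custom separator with -F: a passwd-style, colon-delimited line.
colon_out=$(printf 'alice:x:1000:1000::/home/alice:/bin/sh\n' | awk -F: '{print $1, $6}')
echo "$colon_out"

# $0 still holds the whole, unsplit line regardless of the separator.
whole_out=$(printf 'a,b,c\n' | awk -F, '{print $0}')
echo "$whole_out"
```

Note that the empty field between the two adjacent colons still counts, which is why the home directory is $6 and not $5.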

awk Patterns and Actions

An awk program consists of pattern { action } rules. The pattern determines which lines to process; the action defines what to do with matching lines. Omitting the pattern processes every line. Omitting the action prints the matching line. Special patterns BEGIN and END run before the first line and after the last line.
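
As a rough illustration, the program below combines all four rule forms in one script. The three input lines and the variable name rules_out are assumptions made for this sketch.

```shell
# Hypothetical three-line input, generated inline.
rules_out=$(printf 'ok 1\nfail 2\nok 3\n' | awk '
  BEGIN   { print "start" }          # runs once, before any input is read
  /fail/  { print "matched:", $0 }   # pattern + action: only lines containing "fail"
  $2 > 2                             # pattern only: prints the matching line unchanged
          { count++ }                # action only: runs for every line
  END     { print "lines:", count }  # runs once, after the last line
')
echo "$rules_out"
```

Rules are tested in order against each line, so a single line can trigger more than one rule. The opening brace of an action must sit on the same line as its pattern; otherwise awk treats them as two separate rules, as the third and fourth rules here do on purpose.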

awk Built-In Variables

awk provides built-in variables for program logic: NR (current record/line number), NF (number of fields in the current line), FS (input field separator), OFS (output field separator), RS (record separator), and FILENAME (current input filename).
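
A short sketch of these variables in use, with inline sample input (the data and the names vars_out and sep_out are illustrative only):

```shell
# NR = record number, NF = field count, $NF = last field of the record.
vars_out=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }')
echo "$vars_out"

# Setting FS and OFS in BEGIN: read comma-separated input, write dash-separated output.
sep_out=$(printf 'x,y,z\n' | awk 'BEGIN { FS = ","; OFS = "-" } { print $1, $3 }')
echo "$sep_out"
```

Setting FS in a BEGIN block has the same effect as passing -F on the command line; OFS is inserted wherever print separates its arguments with a comma.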

Common Tasks with awk

How to Print a Specific Column with awk

awk extracts a single column from structured text. Print the first column of command output:

df -h | awk '{print $1}'

Print the username field from /etc/passwd (colon-delimited):

awk -F: '{print $1}' /etc/passwd
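
The same idea extends to several columns at once. This sketch uses simplified, made-up text shaped like df -h output (the real header ends in "Mounted on", which would split into two fields), so the result does not depend on the machine it runs on.

```shell
# Hypothetical output shaped like `df -h`, inlined so the example is reproducible.
df_sample='Filesystem Size Used Avail Use% Mounted
/dev/sda1 50G 20G 30G 40% /
tmpfs 2G 0 2G 0% /tmp'

# Skip the header (NR > 1) and print the filesystem, the usage, and the last column.
cols_out=$(printf '%s\n' "$df_sample" | awk 'NR > 1 { print $1, $5, $NF }')
echo "$cols_out"
```

$NF always refers to the last field, which helps when the number of columns varies between lines.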

How to Filter Lines by Pattern with awk

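awk prints only the lines for which the pattern is true. A pattern can be a comparison on a field or a regex. Filter Nginx access log entries returning a 500 status code (assuming the status is field 9, as in the default combined log format):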

awk '$9 == 500' /var/log/nginx/access.log

Match lines containing a string:

awk '/error/' /var/log/syslog
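
A regex can also be applied to a single field with the ~ operator, and conditions can be combined with && and ||. The sketch below runs against three invented log lines in an access-log-like layout, where the status again lands in field 9; the variable names are illustrative.

```shell
# Hypothetical access-log lines; the status code is field 9.
log='1.2.3.4 - - [10/Oct/2024:13:55:36 +0000] "GET /a HTTP/1.1" 500 123
1.2.3.5 - - [10/Oct/2024:13:55:37 +0000] "GET /b HTTP/1.1" 200 456
1.2.3.6 - - [10/Oct/2024:13:55:38 +0000] "POST /c HTTP/1.1" 502 0'

# Regex against one field: any 5xx status, printing only client IP and path.
five_xx=$(printf '%s\n' "$log" | awk '$9 ~ /^5[0-9][0-9]$/ { print $1, $7 }')
echo "$five_xx"

# Combined conditions: 5xx responses to POST requests only.
post_5xx=$(printf '%s\n' "$log" | awk '$9 >= 500 && $6 ~ /POST/ { print $7 }')
echo "$post_5xx"
```

Matching on a field avoids false positives that a whole-line regex such as /500/ would produce when 500 appears in a URL or byte count.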

How to Sum a Column with awk

awk computes column totals using a variable accumulator and the END block:

awk '{sum += $5} END {print "Total:", sum}' data.txt
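
The accumulator pattern extends to averages and per-key totals. This is a minimal sketch on inline sample data; the column layout and the names avg_out and by_key are assumptions for illustration.

```shell
# Sum and average of column 2; the n > 0 guard avoids dividing by zero on empty input.
avg_out=$(printf 'a 10\nb 20\nc 30\n' | awk '
  { sum += $2; n++ }
  END { if (n > 0) printf "sum=%d avg=%.1f\n", sum, sum / n }
')
echo "$avg_out"

# Per-key totals with an associative array; for-in order is unspecified, so sort the result.
by_key=$(printf 'get 10\npost 5\nget 7\n' |
  awk '{ total[$1] += $2 } END { for (k in total) print k, total[k] }' | sort)
echo "$by_key"
```

Uninitialized awk variables start as 0 (or the empty string), so neither sum nor the array elements need explicit initialization.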

How to Change the Output Delimiter with awk

awk reformats output by setting the output field separator (OFS). Print the username, UID, and home directory from /etc/passwd as tab-separated columns:

awk -F: 'BEGIN {OFS="\t"} {print $1, $3, $6}' /etc/passwd
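
To convert every field of a line rather than a chosen few, a common idiom is to assign a field to itself, which makes awk rebuild $0 with OFS. The sketch below uses an invented passwd-style line and a comma as OFS so the effect is easy to see.

```shell
# Hypothetical passwd-style line.
line='alice:x:1000:1000::/home/alice:/bin/sh'

# Without modifying a field, print outputs $0 unchanged and OFS is not applied.
unchanged=$(printf '%s\n' "$line" | awk -F: -v OFS=',' '{ print }')
echo "$unchanged"

# Assigning a field (even to itself) rebuilds $0 using OFS.
rebuilt=$(printf '%s\n' "$line" | awk -F: -v OFS=',' '{ $1 = $1; print }')
echo "$rebuilt"
```

The -v option sets a variable before the program starts and is an alternative to a BEGIN block.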

awk Troubleshooting

Error / Symptom | Cause | Fix
awk: fatal: cannot open file | File path is incorrect or file does not exist | → Full article
Empty or incorrect output | Wrong field separator; fields are not where expected | → Full article
awk: syntax error at source line 1 | Unbalanced braces, missing quotes, or using double quotes around the program on the shell | → Full article
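
When output is empty or wrong, it often helps to print how awk actually split a line. The helper below, show_fields, is a hypothetical name introduced for this sketch; it prints the field count and each field in brackets, and passes any options (such as -F) through to awk.

```shell
# Hypothetical debugging helper: show NF and every field of each input line.
show_fields() {
  awk "$@" '{ printf "NF=%d", NF; for (i = 1; i <= NF; i++) printf " [%s]", $i; print "" }'
}

# Comma-separated input read with the default (whitespace) separator: fields are wrong.
wrong_fs=$(printf 'a,b c,d\n' | show_fields)
echo "$wrong_fs"

# Same input with the matching separator: fields land where expected.
right_fs=$(printf 'a,b c,d\n' | show_fields -F,)
echo "$right_fs"
```

Keeping the program in single quotes, as here, also avoids the shell expanding $1 or $i before awk sees them, which is the usual cause of syntax errors with double-quoted programs.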

grep filters lines by pattern but cannot extract or rearrange fields. Use grep for simple matching and awk for field-level processing. See the grep article.

sed performs line-level substitutions and transformations. awk handles column-based processing that is awkward or impractical in sed. See the sed article.
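
As a rough side-by-side comparison, the sketch below runs grep, sed, and awk over the same three invented log lines. The sample text and variable names are illustrative.

```shell
# Hypothetical log lines shared by all three commands.
sample='10:01 error disk full
10:02 info backup done
10:03 error timeout'

# grep: whole lines that match.
grep_out=$(printf '%s\n' "$sample" | grep 'error')
echo "$grep_out"

# sed: every line, with a substitution applied where it matches.
sed_out=$(printf '%s\n' "$sample" | sed 's/error/ERROR/')
echo "$sed_out"

# awk: one field from the matching lines.
awk_out=$(printf '%s\n' "$sample" | awk '/error/ { print $1 }')
echo "$awk_out"
```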