awk

awk is a non-interactive tool for processing, transforming and editing text, and for performing simple computations on it.

It processes files line by line and gives direct access to the fields within each line.

Principle of operation

awk operates in a similar way to sed:

  • the file(s) to be processed are read line by line

  • each line is passed through an awk program: if the current line satisfies one or more selections, the corresponding actions are executed

Each line of an awk program has the format:

    selection {action}

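A prologue (the BEGIN selection) is executed before the first input line is read, and an epilogue (the END selection) after the last line has been processed; they are used to initialize and to conclude computations over the lines of the files.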

A selection is a regular expression placed between two slashes, or a more complex expression. An action is a sequence of one or more language instructions.

Without a selection, all lines of the file(s) are processed.

Without an action, the selected line is printed.
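The three forms of a program line described above (selection with action, selection alone, action alone) can be sketched as follows. This assumes a POSIX shell and a POSIX-compatible awk; the sample input is made up and fed to awk on standard input through printf, and the expected output is shown in comments.

```shell
#!/bin/sh
# Hypothetical sample input.
input='ok start
error disk full
ok done'

# Selection and action: print the line number and the text of matching lines.
printf '%s\n' "$input" | awk '/error/ { print NR ": " $0 }'
# → 2: error disk full

# Selection without action: matching lines are printed unchanged.
printf '%s\n' "$input" | awk '/^ok/'
# → ok start
# → ok done

# Action without selection: applied to every line (here, print the first field).
printf '%s\n' "$input" | awk '{ print $1 }'
# → ok
# → error
# → ok
```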

Call

    awk [ -Fc ] { -f fprog | prog } [ param ... ] [ file ... ]

Without a file argument, awk reads the standard input.

If more than one input file is specified, they are processed sequentially.

The option -Fc sets the field separator to c (by default, fields are separated by runs of spaces and tabs).

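An awk program can be given on the command line (the prog argument) or in a file (the fprog file of the -f option). A param has the form variable=value and assigns value to variable just before the following file is read.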
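A sketch of the calling conventions: -F, a program given inline, the same program read with -f, and a param operand. It assumes a POSIX shell, a POSIX-compatible awk, and that `mktemp` is available (common, but not required by POSIX); it also assumes that param refers to the usual variable=value operand. The file contents are hypothetical sample data.

```shell
#!/bin/sh
# Hypothetical colon-separated sample file.
data=$(mktemp)
printf 'root:x:0\nalice:x:1000\n' > "$data"

# -F: sets the field separator to ':'; the program is given on the command line.
awk -F: '{ print $1 }' "$data"
# → root
# → alice

# The same program stored in a file and passed with -f.
prog=$(mktemp)
printf '%s\n' '{ print $1 }' > "$prog"
awk -F: -f "$prog" "$data"
# → root
# → alice

# A param of the form variable=value is assigned before the following file is read.
awk -F: '{ print tag $3 }' tag='uid=' "$data"
# → uid=0
# → uid=1000

rm -f "$data" "$prog"
```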

Variables

An awk program manipulates three kinds of variables:

  • the pre-defined variables

  • the variables that designate the fields of the line being processed

  • work variables that are needed by actions

The main pre-defined variables are:

variable   meaning
FILENAME   name of the file being processed
NF         number of fields in the current record/line
NR         number of the current record/line (counted across all input files)
FS         input field separator (default: a space, meaning any run of spaces and tabs)
OFS        output field separator (default: a single space)
RS         input record/line separator (default: newline)
ORS        output record/line separator (default: newline)
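A short sketch of some of these variables in use, assuming a POSIX shell and a POSIX-compatible awk; the input lines are made up. It shows NR and NF, FS set in the BEGIN prologue as an alternative to -F, and OFS and ORS controlling how print joins and terminates its output.

```shell
#!/bin/sh
# NR is the line number, NF the number of fields on that line.
printf 'a b c\nd e\n' | awk '{ print NR, NF }'
# → 1 3
# → 2 2

# FS can be set in the BEGIN prologue instead of using -F.
printf 'x,y,z\n' | awk 'BEGIN { FS = "," } { print $2 }'
# → y

# OFS separates the items of print; ORS terminates each printed record.
printf 'a b\n' | awk 'BEGIN { OFS = "|"; ORS = ";\n" } { print $1, $2 }'
# → a|b;
```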

The variables that designate the fields of the line are $1, $2, $3, ..., up to $NF.

$0 means the complete current line.
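A minimal sketch of field access on a made-up line, assuming a POSIX shell and awk. Since NF is a number, $NF is the last field and $(NF-1) the one before it.

```shell
#!/bin/sh
# $1 is the first field, $NF the last, $(NF-1) the next to last, $0 the whole line.
printf 'alpha beta gamma\n' | awk '{ print $1; print $NF; print $(NF-1); print $0 }'
# → alpha
# → gamma
# → beta
# → alpha beta gamma
```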

Work variables are defined by identifiers:

  • they don't have to be declared

  • they are of type string by default

  • they are initialized to the empty string

  • depending on the context, they can also be treated as numbers (floating point), in which case their initial value is zero

Array variables (or dictionaries) do not have to be declared either. The index can be a number, a string or any other expression. Square brackets around the index indicate an array element.
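The following sketch, assuming a POSIX shell and awk and a made-up sentence as input, illustrates an undeclared work variable used both as a string and as a number, and an array indexed by strings to count words. The order in which `for (w in count)` visits the indices is unspecified, so the output is piped through sort.

```shell
#!/bin/sh
# An unset variable is the empty string in a string context and 0 in a numeric one.
awk 'BEGIN { print "[" x "]", x + 1 }'
# → [] 1

# Count word occurrences with an array indexed by the words themselves.
printf 'to be or not to be\n' | awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }   # one counter per distinct word
    END { for (w in count) print w, count[w] }
' | sort
# → be 2
# → not 1
# → or 1
# → to 2
```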

The character # starts a comment, which extends to the end of the line.

Selections

Three kinds of selection are possible:

selection       lines selected for the action
(none)          every line
condition       the lines that satisfy condition
cond1, cond2    a range: the first line that satisfies cond1, then every following line up to and including the next line that satisfies cond2 (a new range may start afterwards)
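One command per kind of selection, as a sketch assuming a POSIX shell and awk; the five input lines are hypothetical. The range selection includes both the line matching cond1 and the line matching cond2.

```shell
#!/bin/sh
# Hypothetical input.
input='header
START
body
END
footer'

# No selection: every line is processed (here, counted).
printf '%s\n' "$input" | awk '{ n++ } END { print n }'
# → 5

# A condition: only the second line.
printf '%s\n' "$input" | awk 'NR == 2'
# → START

# A range cond1, cond2: from the START line through the END line inclusive.
printf '%s\n' "$input" | awk '/^START$/, /^END$/'
# → START
# → body
# → END
```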

A condition is either a regular expression between slashes, or a boolean expression whose terms are combined with the operators && (and), || (or) and ! (not).

A boolean term can use the relational operators:

operator   meaning
==         equal to
!=         not equal to
<, <=      less than, less than or equal to
>, >=      greater than, greater than or equal to
~          matches a regular expression (e.g. $1 ~ /^a/)
!~         does not match a regular expression
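A sketch of boolean conditions built from relational operators, assuming a POSIX shell and awk; the "name age" records are made up. Fields that look like numbers are compared numerically.

```shell
#!/bin/sh
# Hypothetical "name age" records.
input='ann 25
bob 7
carl 40'

# && combined with >= and !~ (does not match a regular expression).
printf '%s\n' "$input" | awk '$2 >= 20 && $1 !~ /^c/ { print $1 }'
# → ann

# ~ tests one field against a regular expression; || and ! combine terms.
printf '%s\n' "$input" | awk '$1 ~ /^b/ || !($2 < 30) { print $1 }'
# → bob
# → carl
```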

A regular expression in awk can use the following operators:

  • |: alternation between two regular expressions (one or the other)

  • +: one or more occurrences of the preceding regular expression (at least one)

  • ?: zero or one occurrence of the preceding regular expression (at most one)

  • *: zero or more occurrences of the preceding regular expression (any number, including none)
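Each operator in the list above applied to a few made-up words, as a sketch assuming a POSIX shell and awk. The expressions are anchored with ^ and $ so that the whole line must match, and parentheses group the alternatives for |.

```shell
#!/bin/sh
# ?: the u is optional.
printf 'color\ncolour\ncolouur\n' | awk '/^colou?r$/'
# → color
# → colour

# |: one word or the other.
printf 'cat\ncow\ndog\n' | awk '/^(cat|dog)$/'
# → cat
# → dog

# +: at least one b.
printf 'ac\nabc\nabbc\n' | awk '/^ab+c$/'
# → abc
# → abbc

# *: any number of b, including none.
printf 'ac\nabc\nabbc\n' | awk '/^ab*c$/'
# → ac
# → abc
# → abbc
```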

Actions

    [ BEGIN { actions } ] [ selection { actions } ] [ END { actions } ]

Actions are composed of a set of instructions separated by:

  • a newline

  • a semi-colon (;)

  • a closing brace (})
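A sketch of the overall program structure, assuming a POSIX shell and awk and three made-up numbers as input: BEGIN runs before the first line, the unconditional action runs on each line, and END runs after the last one. Instructions are separated both by semicolons and by newlines.

```shell
#!/bin/sh
printf '10\n20\n30\n' | awk '
    BEGIN { sum = 0; print "start" }
    { sum += $1 }
    END {
        print "lines:", NR
        print "average:", sum / NR
    }
'
# → start
# → lines: 3
# → average: 20
```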

Assignment

    variable = expression

operator              meaning
+, -, *, /, %         addition, subtraction, multiplication, division, remainder
++, --                increment and decrement
+=, -=, *=, /=, %=    compound assignment (e.g. x += 2 is x = x + 2)

Any variable can be assigned, including a field variable; assigning to a field rebuilds $0 with OFS between the fields.
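A sketch of the arithmetic and assignment operators, assuming a POSIX shell and awk. The second command shows a field assignment: after $2 is replaced, print outputs the rebuilt $0, whose fields are joined with OFS.

```shell
#!/bin/sh
# n: 7 -> 10 -> 20 -> 19; then 19 % 6 is 1 and 19 / 2 is 9.5.
awk 'BEGIN { n = 7; n += 3; n *= 2; n--; print n, n % 6, n / 2 }'
# → 19 1 9.5

# Assigning a field rebuilds $0 using OFS as the separator.
printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { $2 = "X"; print }'
# → a-X-c
```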

Display

Commands:

print expr, expr, ...             prints the expressions, separated by OFS and followed by ORS
printf format, expr, expr, ...    formatted output of the expressions; no newline is added automatically

The format is a string whose syntax is similar to that of printf in C.
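A sketch contrasting print and printf, assuming a POSIX shell and awk and a made-up input line. With commas, print separates items with OFS (a space by default); items written side by side are concatenated. printf needs an explicit \n.

```shell
#!/bin/sh
printf 'pi 3.14159\n' | awk '{ print $1, $2; print $1 $2 }'
# → pi 3.14159
# → pi3.14159

# C-like format: %s for a string, %.2f for a number with two decimals.
printf 'pi 3.14159\n' | awk '{ printf "%s = %.2f\n", $1, $2 }'
# → pi = 3.14
```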

Conditional

    if (condition) {statement} [ else {statement} ]
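A minimal if/else sketch, assuming a POSIX shell and awk; the input numbers and the threshold of 10 are arbitrary choices for illustration.

```shell
#!/bin/sh
printf '5\n12\n' | awk '{
    if ($1 > 10) { print $1, "big" } else { print $1, "small" }
}'
# → 5 small
# → 12 big
```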

Loop

    while (condition) {statement}
    for (initialization; condition; increment) statement
    for (i in array) statement
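One sketch per loop form, assuming a POSIX shell and awk. Since the order of `for (k in a)` is unspecified, that output is piped through sort.

```shell
#!/bin/sh
# while: double n until it reaches at least 100.
awk 'BEGIN { n = 1; while (n < 100) n *= 2; print n }'
# → 128

# C-style for: concatenate the fields in reverse order.
printf 'a b c\n' | awk '{ s = ""; for (i = NF; i > 0; i--) s = s $i; print s }'
# → cba

# for (k in array): visit every index of the array.
awk 'BEGIN { a["x"] = 1; a["y"] = 2; for (k in a) print k, a[k] }' | sort
# → x 1
# → y 2
```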

Other control structures

break      exits the innermost loop
continue   skips the rest of the current iteration and goes directly to the next one
next       stops processing the current line and moves on to the next input line
exit       stops processing the input; the epilogue (END), if it exists, is still executed
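A sketch of these control structures, assuming a POSIX shell and awk; the input lines are made up. In the first command, comment lines are skipped with next, and exit stops reading at the "stop" line (so "c" is never printed) while END still runs. The second command uses continue and break inside a for loop.

```shell
#!/bin/sh
printf '# comment\na\nb\nstop\nc\n' | awk '
    /^#/     { next }
    /^stop$/ { exit }
             { print }
    END      { print "done" }
'
# → a
# → b
# → done

# continue skips odd numbers; break leaves the loop once i exceeds 6.
awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2) continue
        if (i > 6) break
        s = s i
    }
    print s
}'
# → 246
```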