Gawk variables

As awk is processing the input file, it uses several variables. Some are editable, some are read-only.

The input field separator

The field separator, which is either a single character or a regular expression, controls the way awk splits up an input record into fields. The input record is scanned for character sequences that match the separator definition; the fields themselves are the text between the matches.

The field separator is represented by the built-in variable FS. Note that this is something different from the IFS variable used by POSIX-compliant shells.

The value of the field separator variable can be changed in the awk program with the assignment operator =. Often the right time to do this is at the beginning of execution before any input has been processed, so that the very first record is read with the proper separator. To do this, use the special BEGIN pattern.

In the example below, we build a command that displays all the users on your system with a description:

kelly is in ~> awk 'BEGIN { FS=":" } { print $1 "\t" $5 }' /etc/passwd
--output omitted--
kelly	Kelly Smith
franky	Franky B.
eddy	Eddy White
willy	William Black
cathy	Catherine the Great
sandy	Sandy Li Wong

kelly is in ~>

In an awk script, it would look like this:

kelly is in ~> cat printnames.awk
BEGIN { FS=":" }
{ print $1 "\t" $5 }

kelly is in ~> awk -f printnames.awk /etc/passwd
--output omitted--

Choose input field separators carefully to prevent problems. An example to illustrate this: say you get input in the form of lines that look like this:

Sandy L. Wong, 64 Zoo St., Antwerp, 2000X

You write a command line or a script, which prints out the name of the person in that record:

awk 'BEGIN { FS="," } { print $1, $2, $3 }' inputfile

But a person might have a PhD, and it might be written like this:

Sandy L. Wong, PhD, 64 Zoo St., Antwerp, 2000X

Your awk will give the wrong output for this line. If needed, use an extra awk or sed to uniform data input formats.

The default input field separator is one or more whitespaces or tabs.

The output separators

The output field separator

Fields are normally separated by spaces in the output. This becomes apparent when you use the correct syntax for the print command, where arguments are separated by commas:

kelly@octarine ~/test> cat test
record1         data1
record2         data2

kelly@octarine ~/test> awk '{ print $1 $2}' test

kelly@octarine ~/test> awk '{ print $1, $2}' test
record1 data1
record2 data2

kelly@octarine ~/test>

If you don't put in the commas, print will treat the items to output as one argument, thus omitting the use of the default output separator, OFS.

Any character string may be used as the output field separator by setting this built-in variable.

The output record separator

The output from an entire print statement is called an output record. Each print command results in one output record, and then outputs a string called the output record separator, ORS. The default value for this variable is \n, a newline character. Thus, each print statement generates a separate line.

To change the way output fields and records are separated, assign new values to OFS and ORS:

kelly@octarine ~/test> awk 'BEGIN { OFS=";" ; ORS="\n-->\n" } \
{ print $1,$2}' test

kelly@octarine ~/test>

If the value of ORS does not contain a newline, the program's output is run together on a single line.

The number of records

The built-in NR holds the number of records that are processed. It is incremented after reading a new input line. You can use it at the end to count the total number of records, or in each output record:

kelly@octarine ~/test> cat processed.awk
BEGIN { OFS="-" ; ORS="\n--> done\n" }
{ print "Record number " NR ":\t" $1,$2 }
END { print "Number of records processed: " NR }

kelly@octarine ~/test> awk -f processed.awk test
Record number 1:        record1-data1
--> done
Record number 2:        record2-data2
--> done
Number of records processed: 2
--> done

kelly@octarine ~/test>

User defined variables

Apart from the built-in variables, you can define your own. When awk encounters a reference to a variable which does not exist (which is not predefined), the variable is created and initialized to a null string. For all subsequent references, the value of the variable is whatever value was assigned last. Variables can be a string or a numeric value. Content of input fields can also be assigned to variables.

Values can be assigned directly using the = operator, or you can use the current value of the variable in combination with other operators:

kelly@octarine ~> cat revenues
20021009        20021013        consultancy     BigComp         2500
20021015        20021020        training        EduComp         2000
20021112        20021123        appdev          SmartComp       10000
20021204        20021215        training        EduComp         5000

kelly@octarine ~> cat total.awk
{ total=total + $5 }
{ print "Send bill for " $5 " dollar to " $4 }
END { print "---------------------------------\nTotal revenue: " total }

kelly@octarine ~> awk -f total.awk test
Send bill for 2500 dollar to BigComp
Send bill for 2000 dollar to EduComp
Send bill for 10000 dollar to SmartComp
Send bill for 5000 dollar to EduComp
Total revenue: 19500

kelly@octarine ~>

C-like shorthands like VAR+= value are also accepted.

More examples

The example from the section called “Writing output files” becomes much easier when we use an awk script:

kelly@octarine ~/html> cat make-html-from-text.awk
BEGIN { print "<html>\n<head><title>Awk-generated HTML</title></head>\n<body bgcolor=\"#ffffff\">\n<pre>" }
{ print $0 }
END { print "</pre>\n</body>\n</html>" }

And the command to execute is also much more straightforward when using awk instead of sed:

kelly@octarine ~/html> awk -f make-html-from-text.awk testfile > file.html
[Tip]Awk examples on your system

We refer again to the directory containing the initscripts on your system. Enter a command similar to the following to see more practical examples of the widely spread usage of the awk command:

grep awk /etc/init.d/*

The printf program

For more precise control over the output format than what is normally provided by print, use printf. The printf command can be used to specify the field width to use for each item, as well as various formatting choices for numbers (such as what output base to use, whether to print an exponent, whether to print a sign, and how many digits to print after the decimal point). This is done by supplying a string, called the format string, that controls how and where to print the other arguments.

The syntax is the same as for the C-language printf statement; see your C introduction guide. The gawk info pages contain full explanations.