Chapter 6. The GNU awk programming language

Table of Contents

Getting started with gawk
What is gawk?
Gawk commands
The print program
Printing selected fields
Formatting fields
The print command and regular expressions
Special patterns
Gawk scripts
Gawk variables
The input field separator
The output separators
The number of records
User defined variables
More examples
The printf program


In this chapter we will discuss:

  • What is gawk?

  • Using gawk commands on the command line

  • How to format text with gawk

  • How gawk uses regular expressions

  • Gawk in scripts

  • Gawk and variables

Note: To make it more fun

As with sed, entire books have been written about various versions of awk. This introduction is far from complete and is only intended to help you understand the examples in the following chapters. For more information, start with the documentation that comes with GNU awk: "GAWK: Effective AWK Programming: A User's Guide for GNU Awk".

Getting started with gawk

What is gawk?

Gawk is the GNU version of the commonly available UNIX awk program, a pattern-scanning and text-processing language that, like sed, works through its input one line at a time. On many systems the awk command is just a link to gawk, so we will refer to it as awk.

The basic function of awk is to search files for lines or other text units containing one or more patterns. When a line matches one of the patterns, the action associated with that pattern is performed on that line.

Programs in awk are different from programs in most other languages, because awk programs are data-driven: you describe the data you want to work with and then what to do when you find it. Most other languages are procedural: you have to describe, in great detail, every step the program is to take. When working with procedural languages, it is usually much harder to clearly describe the data your program will process. For this reason, awk programs are often refreshingly easy to read and write.

Note: What does it really mean?

Back in the 1970s, three programmers got together to create this language: Alfred Aho, Brian Kernighan and Peter Weinberger. They took the first letter of each of their surnames and put them together. So the name of the language might just as well have been wak.

Gawk commands

When you run awk, you specify an awk program that tells awk what to do. The program consists of a series of rules. (It may also contain function definitions, loops, conditions and other programming constructs, advanced features that we will ignore for now.) Each rule specifies one pattern to search for and one action to perform upon finding the pattern.
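
As a minimal sketch of what rules look like, the program below holds two of them, each written as a pattern followed by an action in braces. The sample input lines and the patterns /error/ and /warning/ are made up for this illustration; $0 stands for the whole input line.

```shell
# Hypothetical sample input: three lines fed to awk on standard input.
# Rule 1: pattern /error/,   action: print the line with the prefix "E: ".
# Rule 2: pattern /warning/, action: print the line with the prefix "W: ".
out=$(printf '%s\n' 'disk error on sda' 'all fine' 'warning: low memory' |
  awk '/error/   { print "E: " $0 }
       /warning/ { print "W: " $0 }')
echo "$out"
```

Lines that match neither pattern, such as "all fine" here, trigger no action and produce no output.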

There are several ways to run awk. If the program is short, it is easiest to run it on the command line:

awk PROGRAM inputfile(s)
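
For instance, a one-rule program can be passed directly as the first argument. The sketch below assumes a throwaway input file with made-up contents; the single quotes around the program keep the shell from interpreting the braces and slashes.

```shell
# Create a hypothetical input file for this illustration.
input=$(mktemp)
printf '%s\n' 'apple pie' 'banana split' 'apple juice' > "$input"

# PROGRAM is the quoted string; the input file name follows it.
result=$(awk '/apple/ { print }' "$input")
echo "$result"

# Clean up the temporary file.
rm -f "$input"
```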

If multiple changes have to be made, possibly regularly and on multiple files, it is easier to put the awk commands in a script. The -f option tells awk to read the program from that file:

awk -f PROGRAM-FILE inputfile(s)
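
A small sketch of this form, assuming hypothetical file names (apple.awk and fruit.txt) created in a temporary directory. The program file holds the rules only, so no shell quoting is needed inside it.

```shell
# Hypothetical program and input files, kept in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'apple pie' 'banana split' 'apple juice' > "$dir/fruit.txt"

# Write the awk program to its own file; the quoted 'EOF' keeps the
# shell from expanding anything inside the here-document.
cat > "$dir/apple.awk" <<'EOF'
/apple/ { print }
EOF

# Run awk with the program read from the file given to -f.
script_result=$(awk -f "$dir/apple.awk" "$dir/fruit.txt")
echo "$script_result"

# Clean up the temporary directory.
rm -rf "$dir"
```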