
Awk for dummies

This is a very general introduction to awk, a fine text-processing language. It applies equally to all Unix and Unix-like systems, whether or not their awk command is the GNU version, gawk:

prompt> ls -l /bin/awk
lrwxrwxrwx    1 root     root            4 May 25 16:00 /bin/awk -> gawk

The awk command reads its input, e.g. a text file, line by line, and it can examine the columns on each line. This makes awk the perfect sysadmin companion.

When working with awk, there's only one simple rule you have to know: this command refers to each column using a dollar sign ($) followed by the column number. Columns are separated from one another by field separators. Most likely, these are whitespace, e.g. when you issue the ls -l command, which lists the files in a directory and their properties in columns. In the examples below, we will assume that the column separators are whitespace, as in this typical output (in this case, on a Linux machine, we filtered out the first line of the output, since it only reports the total number of disk blocks used by the listed files):

myprompt> ls -l | sed '1d'
-r-xr-xr-x    1 tille    tille      970637 Dec  4  2001 install.sh*
-r--r--r--    1 tille    tille        2841 Dec  4  2001 javalogo52x88.gif
drwxrwxr-x    3 tille    tille        4096 Feb 19  2002 javaws/
-r--r--r--    1 tille    tille       15286 Dec  4  2001 Readme_de.html
-r--r--r--    1 tille    tille       13161 Dec  4  2001 readme.html

awk thinks of the first column as '$1'. The second column is '$2', the third '$3' and so on, as long as there are columns left. The entire line is referred to as '$0'.

Thus, if we want awk to print the file size and name, columns 5 and 9 in the above example, we would use this command:
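A minimal sketch of this numbering, using one line of the listing above as fixed input in place of ls -l so that the result does not depend on your directory. The built-in variable NF (the number of columns on the current line) is standard awk but is not covered in the text above; the shell variable names are only illustrative:

```shell
# One line copied from the listing above; printf stands in for `ls -l`.
line='-r--r--r--    1 tille    tille       13161 Dec  4  2001 readme.html'

first=$(printf '%s\n' "$line" | awk '{print $1}')   # first column: the permissions
count=$(printf '%s\n' "$line" | awk '{print NF}')   # how many columns awk found
last=$(printf '%s\n' "$line" | awk '{print $NF}')   # the last column, whatever its number
whole=$(printf '%s\n' "$line" | awk '{print $0}')   # the entire line, unchanged

echo "$first"
echo "$count"
echo "$last"
echo "$whole"
```

Since this line has nine columns, '$NF' and '$9' refer to the same column here.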

myprompt> ls -l | sed '1d' | awk '{print $5 $9}'

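This is, however, not what we want: with nothing between them, the two values are printed glued together (970637install.sh* for the first file), which is rather unreadable. This gives better results: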

myprompt> ls -l | sed '1d' | awk '{print $5 " " $9}'
970637 install.sh*
2841 javalogo52x88.gif
4096 javaws/
15286 Readme_de.html
13161 readme.html

This is in fact equivalent to the expression

awk '{print $5, $9}'

The comma tells awk to put its output field separator, a single space by default, between the two values.

We can make this output even more understandable, for instance when generating a report on disk usage for your boss, by using the -h option of the (GNU) ls command, adding a meaningful string, and switching the columns:
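A minimal sketch of the difference between the forms of print shown so far, again with one line of the listing as fixed input in place of ls -l. The OFS variable and the -v option are standard awk but are not covered in the text above, so take the last command as an illustration only:

```shell
# One line copied from the listing above; printf stands in for `ls -l`.
line='-r--r--r--    1 tille    tille        2841 Dec  4  2001 javalogo52x88.gif'

# No comma: awk concatenates the two values with nothing in between.
glued=$(printf '%s\n' "$line" | awk '{print $5 $9}')

# Comma: awk puts its output field separator (OFS, a space by default) in between.
spaced=$(printf '%s\n' "$line" | awk '{print $5, $9}')

# Same print statement, but with OFS set to a colon on the command line.
colon=$(printf '%s\n' "$line" | awk -v OFS=':' '{print $5, $9}')

echo "$glued"
echo "$spaced"
echo "$colon"
```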

myprompt> ls -lh | sed '1d' | awk '{print $9 " is " $5 " in size."}'
install.sh* is 948k in size.
javalogo52x88.gif is 2.8k in size.
javaws/ is 4.0k in size.
Readme_de.html is 15k in size.
readme.html is 13k in size.
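The sed '1d' stage used in these pipelines could also be folded into awk itself. The sketch below assumes the built-in variable NR (the number of the line being read), which is standard awk but not covered above. The listing, including the block count on its "total" line, is made-up sample input standing in for real ls -l output:

```shell
# Stand-in for `ls -l` output; the "total" value is invented for illustration.
listing='total 988
-r-xr-xr-x    1 tille    tille      970637 Dec  4  2001 install.sh*
drwxrwxr-x    3 tille    tille        4096 Feb 19  2002 javaws/'

# NR > 1 is a condition: the print action only runs from the second line on,
# which skips the "total" line without needing sed.
report=$(printf '%s\n' "$listing" | awk 'NR > 1 {print $9 " is " $5 " bytes in size."}')

echo "$report"
```

On a real directory the equivalent would be ls -l | awk 'NR > 1 {print $9, $5}'.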

There is much more to say about awk: it is one of those commands that have been the subject of many a literary adventure. This was only to make the fear go away. See the info pages for more.

© 1995-2010 Machtelt Garrels - tille - Last update 20100511