Awk tips

awk is a text-processing language that is well suited to extracting columns from a file or stdin. For example, the command awk '{print $2;}' file1 prints the second column ($2) of the text file file1. An action can also be restricted to lines that match a regular expression.
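As a small illustration of the regular-expression form, a pattern written before the braces limits the action to matching lines. The sample input and the word "error" are made up for this sketch.

```shell
# Sample input (hypothetical): a level, a name, and a message.
input='info alpha started
error beta failed
info gamma started
error delta timeout'

# The /^error/ pattern selects lines that begin with "error";
# for those lines only, print the second column.
result=$(printf '%s\n' "$input" | awk '/^error/ { print $2 }')
echo "$result"
```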

= Using stand-alone files =

Once an awk script gets long, it's probably best to put it into a file. It can then be run directly. Just place this line at the beginning of the file (use which awk or which gawk to figure out the exact name to use):

#!/bin/gawk -f

Then set the file executable:

$ chmod +x myscript.awk

And you can run it!
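A script file can also be run without the shebang line or the executable bit by handing it to awk with -f. The sketch below writes a throwaway script to a temporary file; the file contents and sample input are made up for illustration.

```shell
# Write a small awk script to a temporary file (the name is arbitrary).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the second column of every line.
{ print $2 }
EOF

# awk -f reads the program from the file instead of the command line.
result=$(printf 'a b c\nd e f\n' | awk -f "$script")
echo "$result"
rm -f "$script"
```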

= Printing a column =

Print only one column of a file with:

$ awk '{print $2;}' infile  # $2 means column 2

Or, print the last word on each line with:

$ awk '{print $NF;}' infile # NF is the number of "fields" (words)
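Because NF is an ordinary number, it can also be used in arithmetic to count fields from the end of the line. A minimal sketch, using a made-up input line:

```shell
line='one two three four'

# NF is the field count, $(NF-1) is the next-to-last field,
# and $NF is the last field.
result=$(echo "$line" | awk '{ print NF, $(NF-1), $NF }')
echo "$result"
```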

= Running awk without input data =

The BEGIN rule can be used to run an awk program without having input data, or to do things before any input is read.

$ awk 'BEGIN { print "Hello, world!"; }'

This can be used to evaluate floating-point expressions like bc does.

$ X=3.5
$ Y=2.5
$ Z=$(awk "BEGIN { print $X * $Y; }")  # Double quotes allow shell variables.
$ echo $Z
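An alternative to double-quoting the program is awk's -v option, which copies shell values into awk variables so the program text can stay in single quotes. The awk variable names x and y below are arbitrary choices for this sketch.

```shell
X=3.5
Y=2.5

# -v assigns the shell values to the awk variables x and y
# before the BEGIN rule runs.
Z=$(awk -v x="$X" -v y="$Y" 'BEGIN { print x * y }')
echo "$Z"
```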

= Quick sums and averages =

Ever want to know exactly how many bytes are in a directory or average a bunch of numbers? Awk is well-suited for this.

$ ls -l | awk '{ sum += $5; } END { print sum; }'

$ awk '{ sum += $1; n++; } END { print sum/n; }' datafile
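The averaging one-liner divides by n, which is zero when the input is empty. The following sketch adds a guard for that case and prints the sum and the average together; the sample numbers and the "no data" message are made up for illustration.

```shell
data='10
20
30
40'

# Accumulate the sum and the line count, then report both.
# The n > 0 check avoids a division by zero on empty input.
result=$(printf '%s\n' "$data" | awk '
    { sum += $1; n++ }
    END { if (n > 0) print sum, sum / n; else print "no data" }')
echo "$result"

# The same program on empty input takes the guarded branch.
empty=$(awk '
    { sum += $1; n++ }
    END { if (n > 0) print sum, sum / n; else print "no data" }' < /dev/null)
echo "$empty"
```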

= Sort a file by the sum of two of its fields =

For a file with three columns, with numbers in columns 2 and 3, sorting its lines by the sum of fields 2 and 3 can be done as follows:

$ awk '{ print $1, $2, $3, $2 + $3 }' datafile | sort -nk4
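The command above leaves the helper sum in the output as a fourth column. One possible way to drop it again is a second awk pass after the sort, sketched here on made-up sample data:

```shell
data='a 5 9
b 1 2
c 4 3'

# Append the sum as a fourth column, sort numerically on that column,
# then print only the original three columns.
result=$(printf '%s\n' "$data" |
    awk '{ print $1, $2, $3, $2 + $3 }' |
    sort -n -k4,4 |
    awk '{ print $1, $2, $3 }')
echo "$result"
```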

= Digitizing a graph I: Print coordinates clicked from xv =

Every once in a while you have a graph that you need the numbers from. Rather than getting out a ruler and measuring point after point, scan the graph (if necessary) and load it up in xv using the command line suggested below. Start by middle-clicking on the origin and the ends of the axes so you can rescale the coordinates after you're done saving them. (Middle because left-click actually does stuff in xv, like select a region and move it.)

#!/bin/gawk -f
# Run with "xv -D 1 image.png 2>&1 | clickxv.awk"
/ButtonPress.*mainW/ {
    if ($NF == "button=2") {
        x = $4 + 0;  # The +0 makes an integer from a string
        y = substr($4, index($4, ",") + 1) + 0;
        print x, y;
    }
}

= Digitizing a graph II: Rescale a set of coordinates =

Remember those points you saved? Here's how to rescale them with awk. Look at the coordinates for the origin and the ends of the x and y axes. Use them to write the following.

#!/bin/gawk -f
{
    print ($1 - x0) * (xrange_real / xrange_pixels), \
          ($2 - y0) * (yrange_real / yrange_pixels)
}

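x0 and y0 are the pixel coordinates you clicked for the origin. xrange_real is the label on the end of the x axis, while xrange_pixels is the pixel distance of that clicked point from the origin (its pixel x value minus x0). yrange_real and yrange_pixels are defined the same way for the y axis. This assumes the axes are linear and the origin is labelled zero.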
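One way to supply the calibration values is with -v on the command line. The sketch below uses made-up numbers: the origin clicked at pixel (100, 500), the x-axis end labelled 10 at pixel x = 600, and the y-axis end labelled 50 at pixel y = 100. It assumes the pixel ranges are measured from the origin, so yrange_pixels is negative because pixel y values grow downward in the image.

```shell
# Two hypothetical clicked points, as pixel coordinates.
points='350 300
600 100'

# Calibration values (all made up): pixel ranges are offsets from the origin.
result=$(printf '%s\n' "$points" | awk \
    -v x0=100 -v y0=500 \
    -v xrange_real=10 -v xrange_pixels=500 \
    -v yrange_real=50 -v yrange_pixels=-400 '
    {
        # Shift to the origin, then scale pixels to axis units.
        x = ($1 - x0) * (xrange_real / xrange_pixels)
        y = ($2 - y0) * (yrange_real / yrange_pixels)
        print x, y
    }')
echo "$result"
```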