Awk tips
awk is a small programming language for processing text line by line, and it is especially handy for extracting columns from a file or stdin. For example, the command
awk '{print $2;}' file1
prints the second column ($2) of the text file file1. A rule can also be limited to lines that match a regular expression.
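As a minimal sketch of the regular-expression form, a pattern written before the action restricts that action to matching lines. The sample input and the word "error" are made up for illustration.

```shell
# Print the second column only for lines that contain "error".
# The three input lines are invented sample data.
out=$(printf 'ok alpha\nerror beta\nerror gamma\n' | awk '/error/ { print $2 }')
echo "$out"
```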
Using stand-alone files
Once an awk script gets long, it's probably best to put it into a file. It can then be run directly. Just place this line at the beginning of the file (use which awk or which gawk to figure out the exact name to use):
#!/bin/gawk -f
Then set the file executable:
chmod +x myscript.awk
And you can run it!
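The whole sequence might look like the sketch below. It fills in the interpreter line from whatever awk is found on the PATH instead of hard-coding /bin/gawk, and the one-rule script body is only a placeholder.

```shell
# Write a tiny script whose first line points at the awk found on this system.
printf '#!%s -f\n{ print $2 }\n' "$(command -v awk)" > myscript.awk
chmod +x myscript.awk
# Feed it one line of sample input; it prints the second word.
out=$(echo "one two three" | ./myscript.awk)
echo "$out"
```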
Printing a column
Print only one column of a file with:
awk '{print $2;}' infile # $2 means column 2
Or, print the last word on each line with:
awk '{print $NF;}' infile # NF is the number of "fields" (words)
Running awk without input data
The BEGIN rule can be used to run an awk program without having input data, or to do things before any input is read.
$ awk 'BEGIN { print "Hello, world!"; }'
This can be used to evaluate floating-point expressions like bc does.
$ X=3.5
$ Y=2.5
$ Z=$(awk "BEGIN { print $X * $Y; }") # Double quotes allow shell variables.
$ echo $Z
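The same calculation can also be written with awk's standard -v option, which assigns awk variables before the program runs, so the program itself can stay in single quotes. A sketch, reusing the X and Y values from above (the awk variable names x and y are arbitrary):

```shell
X=3.5
Y=2.5
# -v copies each shell value into an awk variable before BEGIN runs.
Z=$(awk -v x="$X" -v y="$Y" 'BEGIN { print x * y }')
echo "$Z"
```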
Quick sums and averages
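Ever want to know exactly how many bytes the files in a directory take up, or average a bunch of numbers? Awk is well-suited for this. The first command sums the size column of ls -l (the entries listed, not the contents of subdirectories); the second averages the first column of datafile.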
$ ls -l | awk '{ sum += $5; } END { print sum; }'
$ awk '{ sum += $1; n++; } END { print sum/n; }' datafile
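If the input might be empty, the average divides by zero. One possible guard, shown here on made-up sample numbers, is to print only when at least one line was read:

```shell
# Average the first column; the n > 0 check avoids a division by zero
# on empty input. The four numbers are invented sample data.
avg=$(printf '2\n4\n6\n8\n' | awk '{ sum += $1; n++ } END { if (n > 0) print sum / n }')
echo "$avg"
```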
Sort a file by the sum of two of its fields
For a file with three columns, with numbers in columns 2 and 3, sorting its lines by the sum of fields 2 and 3 can be done as follows:
$ awk '{ print $1, $2, $3, $2 + $3 }' datafile | sort -nk4
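The command above leaves the sum in the output as a fourth column. If only the original three columns are wanted, a second awk can drop it after sorting. A sketch on invented sample data:

```shell
# Append the sum as field 4, sort numerically on it, then print only
# the original three fields. The input lines are made-up sample data.
sorted=$(printf 'a 5 5\nb 1 2\nc 3 1\n' |
    awk '{ print $1, $2, $3, $2 + $3 }' |
    sort -nk4 |
    awk '{ print $1, $2, $3 }')
echo "$sorted"
```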
Digitizing a graph I: Print coordinates clicked from xv
Every once in a while you have a graph that you need the numbers from. Rather than getting out a ruler and measuring point after point, scan the graph (if necessary) and load it up in xv using the command line suggested below. Start by middle-clicking on the origin and the ends of the axes so you can rescale the coordinates after you're done saving them. (Middle because left-click actually does stuff in xv, like select a region and move it.)
#!/bin/gawk -f
# Run with "xv -D 1 image.png 2>&1 | clickxv.awk"
/ButtonPress.*mainW/ {
    if ($NF == "button=2") {
        x = $4 + 0;  # The +0 makes an integer from a string
        y = substr($4, index($4, ",") + 1) + 0;
        print x, y;
    }
}
Digitizing a graph II: Rescale a set of coordinates
Remember those points you saved? Here's how to rescale them with awk. Look at the coordinates for the origin and the ends of the x and y axes. Use them to write the following.
#!/bin/gawk -f
{
    print ($1 - x0) * (xrange_real / xrange_pixels), \
          ($2 - y0) * (yrange_real / yrange_pixels)
}
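Here x0 and y0 are the pixel coordinates of the origin. xrange_real is the label at the end of the x axis, while xrange_pixels is the distance in pixels from the origin to the point you clicked at the end of that axis (its x pixel value minus x0). yrange_real and yrange_pixels are the same quantities for the y axis.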
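As a worked sketch, the values can also be passed in with -v instead of being typed into the script. Every number here is hypothetical: the origin was clicked at pixel (100, 500), the end of the x axis (labelled 10) at x = 600, and the end of the y axis (labelled 4) at y = 100. The sketch assumes the axes start at 0, that each pixel range is measured from the origin, and that pixel rows grow downward, which makes yrange_pixels negative.

```shell
# All pixel and axis values below are hypothetical.
# xrange_pixels = 600 - 100 = 500, yrange_pixels = 100 - 500 = -400.
scaled=$(echo "350 300" | awk -v x0=100 -v y0=500 \
    -v xrange_real=10 -v xrange_pixels=500 \
    -v yrange_real=4 -v yrange_pixels=-400 '
{
    x = ($1 - x0) * (xrange_real / xrange_pixels)
    y = ($2 - y0) * (yrange_real / yrange_pixels)
    print x, y
}')
echo "$scaled"
```

The clicked point (350, 300) lies halfway along both axes, so it should come out near the midpoint of each labelled range.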