Awk tips

From LQWiki

awk is a pattern-scanning and text-processing language, most often used to extract columns from a file or from standard input. For example, the command

awk '{print $2;}' file1

prints the second column ($2) of the text file file1. A rule can also be restricted to the lines that match a regular expression.
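As a small sketch of the regular-expression form, the command below prints the second column only for lines that start with ERROR. The sample log lines are made up for illustration.

```shell
# Sample input is hypothetical; /^ERROR/ selects the lines the action runs on.
matches=$(printf 'INFO boot ok\nERROR disk full\nERROR net down\n' |
  awk '/^ERROR/ { print $2 }')
echo "$matches"
```

Lines that do not match the pattern are skipped entirely, so no separate grep is needed.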

Using stand-alone files

Once an awk script gets long, it is usually best to put it into a file, which can then be run directly. Place a line like this at the very beginning of the file (use which awk or which gawk to find the exact path on your system):

 #!/bin/gawk -f

Then set the file executable:

 chmod +x myscript.awk

You can then run it directly, for example as ./myscript.awk infile.
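If you are unsure of the interpreter path, a script file can also be passed to awk with the -f option, which needs neither the first line nor the executable bit. The sketch below writes a tiny script to a temporary file; the script contents and sample input are assumptions for illustration.

```shell
# Hypothetical script file created just for this demonstration.
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the second field of every input line.
{ print $2 }
EOF

# Run the script file through awk explicitly instead of executing it.
out=$(printf 'a b c\nd e f\n' | awk -f "$script")
echo "$out"
rm -f "$script"
```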

Printing a column

Print only one column of a file with:

 awk '{print $2;}' infile   # $2 means column 2

Or, print the last word on each line with:

 awk '{print $NF;}' infile  # NF is the number of "fields" (words)
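To see how NF and $NF differ, here is a quick sketch on two made-up lines of different lengths. It prints the field count, the second field, and the last field of each line.

```shell
# NF is the number of fields on the line; $NF is the contents of the last one.
out=$(printf 'one two three\nalpha beta\n' | awk '{ print NF, $2, $NF }')
echo "$out"
```

On the second line $2 and $NF are the same field, because that line has only two words.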

Running awk without input data

The BEGIN rule can be used to run an awk program without having input data, or to do things before any input is read.

 $ awk 'BEGIN { print "Hello, world!"; }'

This makes awk handy for evaluating floating-point expressions, much as bc does.

 $ X=3.5
 $ Y=2.5
 $ Z=$(awk "BEGIN { print $X * $Y; }")   # Double quotes let the shell substitute $X and $Y before awk runs.
 $ echo "$Z"                             # prints 8.75
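An alternative sketch passes the values in with awk's -v option, so the program can stay in single quotes and the shell never rewrites the awk code. The awk variable names x and y are chosen here for illustration.

```shell
X=3.5
Y=2.5
# -v assigns the awk variables x and y before the BEGIN rule runs.
Z=$(awk -v x="$X" -v y="$Y" 'BEGIN { print x * y }')
echo "$Z"
```

This form is usually easier to get right once the awk program itself contains $ signs or double quotes.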

Quick sums and averages

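Ever wanted to know how many bytes the files in a directory take up, or to average a column of numbers? Awk is well suited to both. The first command below sums the size column ($5) of ls -l for the entries in the current directory (it does not descend into subdirectories); the second averages the first column of datafile.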

 $ ls -l | awk '{ sum += $5; } END { print sum; }'
 $ awk '{ sum += $1; n++; } END { print sum/n; }' datafile
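The averaging command divides by zero if the input is empty, and it counts blank lines as zeros. A minimal sketch that guards against both, using made-up input, might look like this:

```shell
# The bare pattern NF is true only for lines with at least one field,
# so blank lines are skipped.
avg=$(printf '4\n8\n\n6\n' |
  awk 'NF { sum += $1; n++ } END { if (n > 0) print sum / n; else print "no data" }')
echo "$avg"

# With no input at all, the END rule reports that instead of dividing by zero.
empty=$(awk 'NF { sum += $1; n++ } END { if (n > 0) print sum / n; else print "no data" }' < /dev/null)
echo "$empty"
```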

Sort a file by the sum of two of its fields

For a file with three columns, with numbers in columns 2 and 3, the lines can be sorted by the sum of fields 2 and 3 as follows. The sum is appended as a fourth column, which sort then uses as its numeric key:

 $ awk '{ print $1, $2, $3, $2 + $3 }' datafile | sort -nk4
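If the extra sum column is not wanted in the result, one possible variation puts the sum in front, sorts on it, and then cuts it off again. The sample data below is made up for illustration.

```shell
# Prefix each line with the sum, sort numerically, then drop the prefix.
sorted=$(printf 'a 5 5\nb 1 2\nc 3 1\n' |
  awk '{ print $2 + $3, $0 }' |
  sort -n |
  cut -d' ' -f2-)
echo "$sorted"
```

Because $0 is printed unchanged, this works for any number of columns, not just three.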

Digitizing a graph I: Print coordinates clicked from xv

Every once in a while you have a graph that you need the numbers from. Rather than getting out a ruler and measuring point after point, scan the graph (if necessary) and load it in xv using the command line suggested below. Start by middle-clicking on the origin and on the ends of the axes so that you can rescale the coordinates once you have saved them. (Use the middle button because left-clicking already has a meaning in xv, such as selecting a region and moving it.)

#!/bin/gawk -f
#  Run with "xv -D 1 image.png 2>&1 | clickxv.awk"

/ButtonPress.*mainW/ {
  if ($NF == "button=2") {
    x=$4+0;  # Adding 0 converts the leading digits of the string to a number
    y=substr($4, index($4,",")+1)+0;
    print x, y;
  }
}
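The two conversions in the script can be tried on their own. The sketch below assumes the coordinate field has the form "123,456"; that sample value is hypothetical and is not taken from real xv output.

```shell
# LC_ALL=C keeps the comma from being read as a decimal separator.
coords=$(echo '123,456' | LC_ALL=C awk '{
  x = $1 + 0                                # numeric prefix before the comma
  y = substr($1, index($1, ",") + 1) + 0    # everything after the comma
  print x, y
}')
echo "$coords"
```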

Digitizing a graph II: Rescale a set of coordinates

Remember those points you saved? Here's how to rescale them with awk. Look at the coordinates you recorded for the origin and for the ends of the x and y axes, and use them to supply the values of x0, y0, and the range variables in the following script.

#!/bin/gawk -f
{ print ($1 - x0) * (xrange_real / xrange_pixels), \
        ($2 - y0) * (yrange_real / yrange_pixels)
}

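xrange_real is the value labelled at the end of the x axis, while xrange_pixels is the distance in pixels from the origin to the point you clicked there (its x coordinate minus x0). yrange_real and yrange_pixels are defined the same way for the y axis, and x0 and y0 are the pixel coordinates of the origin. The formula assumes the origin of the graph is labelled 0 on both axes.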
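One way to supply the values is awk's -v option. The numbers below are made up: the origin was clicked at pixel (50, 400), the x axis ends 400 pixels to the right at the label 10, and the y axis ends 300 pixels higher at the label 5. Pixel y coordinates usually grow downward, so yrange_pixels is negative in this sketch.

```shell
# All values passed with -v are illustrative, not measured from a real graph.
scaled=$(echo '250 250' | awk \
  -v x0=50 -v y0=400 \
  -v xrange_real=10 -v xrange_pixels=400 \
  -v yrange_real=5 -v yrange_pixels=-300 \
  '{ print ($1 - x0) * (xrange_real / xrange_pixels),
           ($2 - y0) * (yrange_real / yrange_pixels) }')
echo "$scaled"
```

The clicked point (250, 250) lies halfway along both axes, so it comes out as 5 and 2.5 in graph units.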