
From LQWiki


awk is a text-processing scripting language with C-like syntax, originally created in 1977 at AT&T Bell Laboratories by Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. In its modern GNU incarnation (gawk), it implements regular expressions, redirection (including bidirectional communication with coprocesses), user-defined functions, arrays, TCP/IP networking, and floating-point math, among other things. Awk tips provides some quick-and-dirty awk scripts.


Usage

When awk is run, it is given two forms of input: the program and the data. The program can be typed directly on the command line or stored in a file and loaded with the -f option. The data comes from files listed on the command line, or from stdin if none are listed. The first example below has the script on the command line with input from a file. The second uses an external program to create the input data and pipes it into awk, which reads its program from an external script file.

$ awk '{ print $1; }' datafile
$ makedata | awk -f myscript.awk

awk scripts that are saved in files can be executed directly by placing the proper shebang line at the beginning:

#!/bin/awk -f

Important note: if your awk is not installed as /bin/awk, use its actual path instead (shown by typing "which awk"); on many systems it is /usr/bin/awk.
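As a rough illustration of the shebang mechanism, the sketch below builds a tiny executable awk script and runs it directly. The script name first.awk and the sample input are made up for the example, and the shebang path is taken from "command -v awk" rather than hard-coded, on the assumption that awk is somewhere on your PATH.

```shell
# Sketch only: first.awk is a hypothetical script name.
tmpdir=$(mktemp -d)
script="$tmpdir/first.awk"

# Build the shebang line from wherever awk actually lives on this system.
printf '#!%s -f\n' "$(command -v awk)" > "$script"
printf '{ print $1 }\n' >> "$script"
chmod +x "$script"

# Run the script directly; the system starts awk with "-f" and the script path.
out=$(printf 'alpha beta\ngamma delta\n' | "$script")
echo "$out"
# prints:
# alpha
# gamma

rm -rf "$tmpdir"
```

The same file could also be run as "awk -f first.awk", which does not need the execute bit or a correct shebang path.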

Language structure

An awk program consists of a series of rules, each consisting of a pattern and an action. Awk reads the input (whether files or data piped from stdin) line by line automatically. For each line of data, if the pattern is true, the action is executed. There are a few special patterns. The BEGIN rule is executed first, before any input is read, and the END rule is executed last, after the end of all input. Some complicated awk scripts consist of only a BEGIN rule and use getline to read the input data. If the pattern is omitted, the action is executed for every line. If the action is omitted, awk prints each line that matches the pattern.
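A minimal sketch of these rule types, using made-up sample data (fruit names and counts), might look like this. It combines a BEGIN rule, a pattern with no action, a pattern with an action, and an END rule.

```shell
out=$(printf 'apple 3\nbanana 5\ncherry 7\n' | awk '
BEGIN    { print "start" }         # runs once, before any input is read
/banana/                           # pattern only: the matching line is printed
$2 > 4   { sum += $2 }             # runs for lines whose second field exceeds 4
END      { print "total", sum }    # runs once, after all input is consumed
')
echo "$out"
# prints:
# start
# banana 5
# total 12
```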

The pattern can be a regular expression enclosed in slashes ('/'), in which case it is considered true if the input line matches the pattern (i.e. contains matching text). The expression /^[^#]/ would select all non-empty lines not beginning with a pound sign (#); empty lines are skipped too, because the bracket expression must match at least one character. The pattern could also be an awk expression, e.g. (NF>5) to select all lines with more than 5 words.
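The two kinds of pattern can be tried on a small made-up input, as in the sketch below. Note that /^[^#]/ drops the empty line as well as the comment line, since it needs at least one character to match.

```shell
# Hypothetical sample input: a comment, an empty line, and two text lines.
data='# comment line

short line
one two three four five six'

# Regular expression pattern: non-empty lines that do not start with "#".
no_comments=$(printf '%s\n' "$data" | awk '/^[^#]/')
echo "$no_comments"
# prints:
# short line
# one two three four five six

# Expression pattern: lines with more than 5 words.
long_lines=$(printf '%s\n' "$data" | awk 'NF > 5')
echo "$long_lines"
# prints:
# one two three four five six
```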

Whenever a line of input is read (whether automatically or with getline [1]), the line is split into words (called fields), by default on whitespace. The first word is assigned to $1, the second to $2, etc., and $0 holds the whole line. This makes it easy for awk to deal with columns of data. The variable NF is set to the number of words. $ is an awk operator, so the "number" can be the result of any expression. $NF is the last word on the line.
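A short sketch of field access, using an invented six-word input line, shows NF, $1, $NF, and $ applied to an expression:

```shell
out=$(printf 'root x 0 0 /root /bin/sh\n' | awk '{
    print NF          # number of words on the line
    print $1          # first word
    print $NF         # last word
    print $(NF - 1)   # $ applied to an expression: the next-to-last word
}')
echo "$out"
# prints:
# 6
# root
# /bin/sh
# /root
```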

For a complete description of the language, see the GNU awk manual [2].

GNU Awk extensions

Things to be careful about when using a gawk script in a non-GNU awk include:

  • Special files like /dev/stderr, useful for printing error messages.
  • The systime() and strftime() functions.
  • The nextfile statement.
  • delete ARRAY to delete an entire array.
  • The gensub() function.
  • Bidirectional pipes to coprocesses.

This list is not comprehensive; the gawk manual (below) has more info.
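For two of the items above there are well-known idioms that avoid the extension and should work in other awks too. The sketch below is illustrative only: the array name seen and the warning text are made up. It empties an array with split() instead of "delete ARRAY", and writes to standard error through a pipe to cat instead of /dev/stderr.

```shell
# Empty a whole array without "delete seen": split() of an empty string
# clears the target array before (not) filling it.
count=$(awk 'BEGIN {
    seen["a"] = 1; seen["b"] = 1
    split("", seen)
    n = 0
    for (k in seen) n++
    print n
}')
echo "$count"
# prints:
# 0

# Print an error message without /dev/stderr: pipe it to a command
# that copies its input to standard error.
err=$(awk 'BEGIN {
    print "warning: bad input" | "cat 1>&2"
    close("cat 1>&2")
}' 2>&1 >/dev/null)
echo "$err"
# prints:
# warning: bad input
```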

Examples

Extracting addresses from a mailbox file

This command pulls out the first thing on each line that looks like an e-mail address from a mailbox file (here called foo), then sorts the addresses into alphabetical order and removes duplicates.

 awk '!/essage-[iI][dD]/ && !/MAILER-DAEMON/ {
   # match() sets RSTART and RLENGTH to the start and length of the first match
   if (match($0, /[A-Za-z][A-Za-z0-9._-]*@[A-Za-z0-9._-]+/))
     print substr($0, RSTART, RLENGTH); }' foo | sort | uniq

The awk script works as follows. If the line is not a message-id header, does not mention "MAILER-DAEMON", and has something in it that looks like an email address, then the address-like portion is extracted and printed. The output from awk is sorted alphabetically by sort, and uniq deletes repetitions.

See also

This article is based, in whole or in part, on entry or entries in the Jargon File.

External links

  • gawk manual (www.gnu.org)
    Excellent reference, especially the sections on Reading Files, Expressions, and Functions.
  • gawk man page (man.linuxquestions.org)
  • awk forum (www.tek-tips.com)
    when you have questions
  • How to Use AWK (sparky.rice.edu)
    quick intro
  • awk manual (www.cs.uu.nl)
    shorter than the gawk manual above
