From LQWiki


Gawk is the GNU implementation of the AWK programming language. It is probably the form of awk most often used in Linux.

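Unlike conventional procedural programming languages, awk is data-driven. An awk program is a series of rules, each consisting of a pattern (a record-matching criterion) and an action; the action is carried out on every input record that matches the pattern, and may modify the record, print output, or calculate and store various totals. Either part may be omitted: a rule with no pattern applies to every record, and a rule with no action prints the matching record. Each record is tested against every rule in the program in turn, and this continues until all records have been processed. Note that awk does not write each record to standard output automatically; output appears only when an action prints it or when a matching rule has no action.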
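As a minimal sketch of this pattern/action model, the example below feeds three made-up records to awk. The sample data and the variable names (data, out, total) are illustrative assumptions. The first rule prints only the names whose second field exceeds 50, the second rule has no pattern and so runs for every record, and the special END pattern runs once after the last record. The command is written as awk; on systems where gawk is installed under its own name, gawk can be substituted.

```shell
# Sample input: a name and a score on each line (made-up data).
data='alice 90
bob 45
carol 72'

out=$(printf '%s\n' "$data" | awk '
    $2 > 50 { print $1 }            # pattern + action: only matching records
            { total += $2 }         # no pattern: runs for every record
    END     { print "total", total } # runs once, after all input is read
')

# Show the result: the two matching names, then the accumulated total.
printf '%s\n' "$out"
```

The record for bob is read and counted in the total but never printed, because no rule prints it.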

gawk can analyze either one or more text files whose names are passed as arguments or its own standard input. When working from standard input, it can be used to analyze the output of other utilities in quite complex ways. The program that it uses for this analysis can be read from a file (with the -f option) or, if it is very small, given directly on the gawk command line.
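The three ways of invoking awk described above might look like the following sketch. The temporary directory and the file names input.txt and second.awk are assumptions made for illustration only. Each invocation runs the same one-line program, which prints the second field of every record.

```shell
# Work in a throwaway directory; the file names are illustrative.
tmp=$(mktemp -d)
printf 'one two\nthree four\n' > "$tmp/input.txt"
printf '{ print $2 }\n' > "$tmp/second.awk"

# 1. Program on the command line, input file named as an argument.
from_file=$(awk '{ print $2 }' "$tmp/input.txt")

# 2. Same program, input arriving on standard input through a pipe.
from_stdin=$(cat "$tmp/input.txt" | awk '{ print $2 }')

# 3. Program read from a file with -f, input file named as an argument.
from_script=$(awk -f "$tmp/second.awk" "$tmp/input.txt")

printf '%s\n' "$from_file"
rm -r "$tmp"
```

All three variables should end up holding the same two lines of output.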

Records are defined as strings separated by the awk record separator RS; each record is treated as a list of fields separated by the field separator FS. By default, FS is a single space, which awk treats as meaning any run of white space, and RS is a newline, so that records are lines and each word is a field. However, these variables, like awk's other built-in variables, can be reassigned within the awk program.
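A short sketch of reassigning the separators follows. The input strings are made up, and the shell variable names users and keys are hypothetical. A BEGIN rule, which runs before any input is read, is a common place to set FS and RS.

```shell
# Colon-separated lines in the style of /etc/passwd (made-up data):
# with FS set to ":", $1 is the user name and $3 the numeric ID.
users=$(printf 'root:x:0:0\ndaemon:x:1:1\n' |
    awk 'BEGIN { FS = ":" } { print $1, $3 }')

# Records separated by ";" instead of newlines, fields separated by "=".
keys=$(printf 'a=1;b=2;c=3' |
    awk 'BEGIN { RS = ";"; FS = "=" } { print $1 }')

printf '%s\n' "$users" "$keys"
```

The field separator can also be set from the command line with the -F option, for example awk -F: '{ print $1 }'.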

Regular expressions are often used as match criteria. They may be tested against a particular field or against the whole record. Like other modern awks, gawk uses extended regular expressions rather than the more limited basic ones of the original UNIX tools, and it adds some GNU-specific operators, such as \< and \> for matching at the start and end of a word.
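The two kinds of regular-expression match can be sketched as follows, using a made-up request log. The data and the variable names log, gets_posts and errors are illustrative assumptions. A bare /regex/ pattern is tested against the whole record, while the ~ operator tests a single field. Only portable extended-regex syntax is used here, so the GNU-specific operators are not shown.

```shell
# Made-up log: method, path, status code.
log='GET /index.html 200
POST /login 302
GET /missing 404
HEAD /index.html 200'

# Whole-record match, using alternation and grouping:
# print the path of every GET or POST request.
gets_posts=$(printf '%s\n' "$log" | awk '/^(GET|POST) / { print $2 }')

# Single-field match: the third field must be a 4xx or 5xx status code.
errors=$(printf '%s\n' "$log" | awk '$3 ~ /^[45][0-9][0-9]$/ { print $2, $3 }')

printf '%s\n' "$gets_posts" "$errors"
```

Restricting the match to one field avoids false hits, such as a path that happens to contain the digits 404.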