awk.1 (13062B)
- .de EX
- .nf
- .ft CW
- ..
- .de EE
- .br
- .fi
- .ft 1
- ..
- .de TF
- .IP "" "\w'\fB\\$1\ \ \fP'u"
- .PD 0
- ..
- .TH AWK 1
- .CT 1 files prog_other
- .SH NAME
- awk \- pattern-directed scanning and processing language
- .SH SYNOPSIS
- .B awk
- [
- .BI \-F
- .I fs
- |
- .B \-\^\-csv
- ]
- [
- .BI \-v
- .I var=value
- ]
- [
- .I 'prog'
- |
- .BI \-f
- .I progfile
- ]
- [
- .I file ...
- ]
- .SH DESCRIPTION
- .I Awk
- scans each input
- .I file
- for lines that match any of a set of patterns specified literally in
- .I prog
- or in one or more files
- specified as
- .B \-f
- .IR progfile .
- With each pattern
- there can be an associated action that will be performed
- when a line of a
- .I file
- matches the pattern.
- Each line is matched against the
- pattern portion of every pattern-action statement;
- the associated action is performed for each matched pattern.
- The file name
- .B \-
- means the standard input.
- Any
- .I file
- of the form
- .I var=value
- is treated as an assignment, not a filename,
- and is executed at the time it would have been opened if it were a filename.
- The option
- .B \-v
- followed by
- .I var=value
- is an assignment to be done before
- .I prog
- is executed;
- any number of
- .B \-v
- options may be present.
- The
- .B \-F
- .I fs
- option defines the input field separator to be the regular expression
- .IR fs .
- The
- .B \-\^\-csv
- option causes
- .I awk
- to process records using (more or less) standard comma-separated values
- (CSV) format.
- .PP
- An input line is normally made up of fields separated by white space,
- or by the regular expression
- .BR FS .
- The fields are denoted
- .BR $1 ,
- .BR $2 ,
- \&..., while
- .B $0
- refers to the entire line.
- If
- .BR FS
- is null, the input line is split into one field per character.
- .PP
- A pattern-action statement has the form:
- .IP
- .IB pattern " { " action " }
- .PP
- A missing
- .BI { " action " }
- means print the line;
- a missing pattern always matches.
- Pattern-action statements are separated by newlines or semicolons.
- .PP
- An action is a sequence of statements.
- A statement can be one of the following:
- .PP
- .EX
- .ta \w'\f(CWdelete array[expression]\fR'u
- .RS
- .nf
- .ft CW
- if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
- while(\fI expression \fP)\fI statement\fP
- for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
- for(\fI var \fPin\fI array \fP)\fI statement\fP
- do\fI statement \fPwhile(\fI expression \fP)
- break
- continue
- {\fR [\fP\fI statement ... \fP\fR] \fP}
- \fIexpression\fP #\fR commonly\fP\fI var = expression\fP
- print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
- printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
- return\fR [ \fP\fIexpression \fP\fR]\fP
- next #\fR skip remaining patterns on this input line\fP
- nextfile #\fR skip rest of this file, open next, start at top\fP
- delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
- delete\fI array\fP #\fR delete all elements of array\fP
- exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
- .fi
- .RE
- .EE
- .DT
- .PP
- Statements are terminated by
- semicolons, newlines or right braces.
- An empty
- .I expression-list
- stands for
- .BR $0 .
- String constants are quoted \&\f(CW"\ "\fR,
- with the usual C escapes recognized within.
- Expressions take on string or numeric values as appropriate,
- and are built using the operators
- .B + \- * / % ^
- (exponentiation), and concatenation (indicated by white space).
- The operators
- .B
- ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
- are also available in expressions.
- Variables may be scalars, array elements
- (denoted
- .IB x [ i ] \fR)
- or fields.
- Variables are initialized to the null string.
- Array subscripts may be any string,
- not necessarily numeric;
- this allows for a form of associative memory.
- Multiple subscripts such as
- .B [i,j,k]
- are permitted; the constituents are concatenated,
- separated by the value of
- .BR SUBSEP .
- .PP
- The
- .B print
- statement prints its arguments on the standard output
- (or on a file if
- .BI > " file
- or
- .BI >> " file
- is present or on a pipe if
- .BI | " cmd
- is present), separated by the current output field separator,
- and terminated by the output record separator.
- .I file
- and
- .I cmd
- may be literal names or parenthesized expressions;
- identical string values in different statements denote
- the same open file.
- The
- .B printf
- statement formats its expression list according to the
- .I format
- (see
- .IR printf (3)).
- The built-in function
- .BI close( expr )
- closes the file or pipe
- .IR expr .
- The built-in function
- .BI fflush( expr )
- flushes any buffered output for the file or pipe
- .IR expr .
- .PP
- The mathematical functions
- .BR atan2 ,
- .BR cos ,
- .BR exp ,
- .BR log ,
- .BR sin ,
- and
- .B sqrt
- are built in.
- Other built-in functions:
- .TF "\fBlength(\fR[\fIv\^\fR]\fB)\fR"
- .TP
- \fBlength(\fR[\fIv\^\fR]\fB)\fR
- the length of its argument
- taken as a string,
- number of elements in an array for an array argument,
- or length of
- .B $0
- if no argument.
- .TP
- .B rand()
- random number on [0,1).
- .TP
- \fBsrand(\fR[\fIs\^\fR]\fB)\fR
- sets seed for
- .B rand
- and returns the previous seed.
- .TP
- .BI int( x\^ )
- truncates to an integer value.
- .TP
- \fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR
- the
- .IR n -character
- substring of
- .I s
- that begins at position
- .I m
- counted from 1.
- If no
- .IR n ,
- use the rest of the string.
- .TP
- .BI index( s , " t" )
- the position in
- .I s
- where the string
- .I t
- occurs, or 0 if it does not.
- .TP
- .BI match( s , " r" )
- the position in
- .I s
- where the regular expression
- .I r
- occurs, or 0 if it does not.
- The variables
- .B RSTART
- and
- .B RLENGTH
- are set to the position and length of the matched string.
- .TP
- \fBsplit(\fIs\fB, \fIa \fR[\fB, \fIfs\^\fR]\fB)\fR
- splits the string
- .I s
- into array elements
- .IB a [1] \fR,
- .IB a [2] \fR,
- \&...,
- .IB a [ n ] \fR,
- and returns
- .IR n .
- The separation is done with the regular expression
- .I fs
- or with the field separator
- .B FS
- if
- .I fs
- is not given.
- An empty string as field separator splits the string
- into one array element per character.
- .TP
- \fBsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
- substitutes
- .I t
- for the first occurrence of the regular expression
- .I r
- in the string
- .IR s .
- If
- .I s
- is not given,
- .B $0
- is used.
- .TP
- \fBgsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
- same as
- .B sub
- except that all occurrences of the regular expression
- are replaced;
- .B sub
- and
- .B gsub
- return the number of replacements.
- .TP
- .BI sprintf( fmt , " expr" , " ...\fB)
- the string resulting from formatting
- .I expr ...
- according to the
- .IR printf (3)
- format
- .IR fmt .
- .TP
- .BI system( cmd )
- executes
- .I cmd
- and returns its exit status. This will be \-1 upon error,
- .IR cmd 's
- exit status upon a normal exit,
- 256 +
- .I sig
- upon death-by-signal, where
- .I sig
- is the number of the murdering signal,
- or 512 +
- .I sig
- if there was a core dump.
- .TP
- .BI tolower( str )
- returns a copy of
- .I str
- with all upper-case characters translated to their
- corresponding lower-case equivalents.
- .TP
- .BI toupper( str )
- returns a copy of
- .I str
- with all lower-case characters translated to their
- corresponding upper-case equivalents.
- .PD
- .PP
- The ``function''
- .B getline
- sets
- .B $0
- to the next input record from the current input file;
- .B getline
- .BI < " file
- sets
- .B $0
- to the next record from
- .IR file .
- .B getline
- .I x
- sets variable
- .I x
- instead.
- Finally,
- .IB cmd " | getline
- pipes the output of
- .I cmd
- into
- .BR getline ;
- each call of
- .B getline
- returns the next line of output from
- .IR cmd .
- In all cases,
- .B getline
- returns 1 for a successful input,
- 0 for end of file, and \-1 for an error.
- .PP
- Patterns are arbitrary Boolean combinations
- (with
- .BR "! || &&" )
- of regular expressions and
- relational expressions.
- Regular expressions are as in
- .IR egrep ;
- see
- .IR grep (1).
- Isolated regular expressions
- in a pattern apply to the entire line.
- Regular expressions may also occur in
- relational expressions, using the operators
- .B ~
- and
- .BR !~ .
- .BI / re /
- is a constant regular expression;
- any string (constant or variable) may be used
- as a regular expression, except in the position of an isolated regular expression
- in a pattern.
- .PP
- A pattern may consist of two patterns separated by a comma;
- in this case, the action is performed for all lines
- from an occurrence of the first pattern
- through an occurrence of the second, inclusive.
- .PP
- A relational expression is one of the following:
- .IP
- .I expression matchop regular-expression
- .br
- .I expression relop expression
- .br
- .IB expression " in " array-name
- .br
- .BI ( expr ,\| expr ,\| ... ") in " array-name
- .PP
- where a
- .I relop
- is any of the six relational operators in C,
- and a
- .I matchop
- is either
- .B ~
- (matches)
- or
- .B !~
- (does not match).
- A conditional is an arithmetic expression,
- a relational expression,
- or a Boolean combination
- of these.
- .PP
- The special patterns
- .B BEGIN
- and
- .B END
- may be used to capture control before the first input line is read
- and after the last.
- .B BEGIN
- and
- .B END
- do not combine with other patterns.
- They may appear multiple times in a program and execute
- in the order they are read by
- .IR awk .
- .PP
- Variable names with special meanings:
- .TF FILENAME
- .TP
- .B ARGC
- argument count, assignable.
- .TP
- .B ARGV
- argument array, assignable;
- non-null members are taken as filenames.
- .TP
- .B CONVFMT
- conversion format used when converting numbers
- (default
- .BR "%.6g" ).
- .TP
- .B ENVIRON
- array of environment variables; subscripts are names.
- .TP
- .B FILENAME
- the name of the current input file.
- .TP
- .B FNR
- ordinal number of the current record in the current file.
- .TP
- .B FS
- regular expression used to separate fields; also settable
- by option
- .BI \-F fs\fR.
- .TP
- .BR NF
- number of fields in the current record.
- .TP
- .B NR
- ordinal number of the current record.
- .TP
- .B OFMT
- output format for numbers (default
- .BR "%.6g" ).
- .TP
- .B OFS
- output field separator (default space).
- .TP
- .B ORS
- output record separator (default newline).
- .TP
- .B RLENGTH
- the length of a string matched by
- .BR match .
- .TP
- .B RS
- input record separator (default newline).
- If empty, blank lines separate records.
- If more than one character long,
- .B RS
- is treated as a regular expression, and records are
- separated by text matching the expression.
- .TP
- .B RSTART
- the start position of a string matched by
- .BR match .
- .TP
- .B SUBSEP
- separates multiple subscripts (default 034).
- .PD
- .PP
- Functions may be defined (at the position of a pattern-action statement) thus:
- .IP
- .B
- function foo(a, b, c) { ... }
- .PP
- Parameters are passed by value if scalar and by reference if array name;
- functions may be called recursively.
- Parameters are local to the function; all other variables are global.
- Thus local variables may be created by providing excess parameters in
- the function definition.
- .SH ENVIRONMENT VARIABLES
- If
- .B POSIXLY_CORRECT
- is set in the environment, then
- .I awk
- follows the POSIX rules for
- .B sub
- and
- .B gsub
- with respect to consecutive backslashes and ampersands.
- .SH EXAMPLES
- .TP
- .EX
- length($0) > 72
- .EE
- Print lines longer than 72 characters.
- .TP
- .EX
- { print $2, $1 }
- .EE
- Print first two fields in opposite order.
- .PP
- .EX
- BEGIN { FS = ",[ \et]*|[ \et]+" }
- { print $2, $1 }
- .EE
- .ns
- .IP
- Same, with input fields separated by comma and/or spaces and tabs.
- .PP
- .EX
- .nf
- { s += $1 }
- END { print "sum is", s, " average is", s/NR }
- .fi
- .EE
- .ns
- .IP
- Add up first column, print sum and average.
- .TP
- .EX
- /start/, /stop/
- .EE
- Print all lines between start/stop pairs.
- .PP
- .EX
- .nf
- BEGIN { # Simulate echo(1)
- for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
- printf "\en"
- exit }
- .fi
- .EE
- .SH SEE ALSO
- .IR grep (1),
- .IR lex (1),
- .IR sed (1)
- .br
- A. V. Aho, B. W. Kernighan, P. J. Weinberger,
- .IR "The AWK Programming Language, Second Edition" ,
- Addison-Wesley, 2024. ISBN 978-0-13-826972-2, 0-13-826972-6.
- .SH BUGS
- There are no explicit conversions between numbers and strings.
- To force an expression to be treated as a number add 0 to it;
- to force it to be treated as a string concatenate
- \&\f(CW""\fP to it.
- .PP
- The scope rules for variables in functions are a botch;
- the syntax is worse.
- .PP
- Input is expected to be UTF-8 encoded. Other multibyte
- character sets are not handled.
- However, in eight-bit locales,
- .I awk
- treats each input byte as a separate character.
- .SH UNUSUAL FLOATING-POINT VALUES
- .I Awk
- was designed before IEEE 754 arithmetic defined Not-A-Number (NaN)
- and Infinity values, which are supported by all modern floating-point
- hardware.
- .PP
- Because
- .I awk
- uses
- .IR strtod (3)
- and
- .IR atof (3)
- to convert string values to double-precision floating-point values,
- modern C libraries also convert strings starting with
- .B inf
- and
- .B nan
- into infinity and NaN values respectively. This led to strange results,
- with something like this:
- .PP
- .EX
- .nf
- echo nancy | awk '{ print $1 + 0 }'
- .fi
- .EE
- .PP
- printing
- .B nan
- instead of zero.
- .PP
- .I Awk
- now follows GNU AWK, and prefilters string values before attempting
- to convert them to numbers, as follows:
- .TP
- .I "Hexadecimal values"
- Hexadecimal values (allowed since C99) convert to zero, as they did
- prior to C99.
- .TP
- .I "NaN values"
- The two strings
- .B +nan
- and
- .B \-nan
- (case independent) convert to NaN. No others do.
- (NaNs can have signs.)
- .TP
- .I "Infinity values"
- The two strings
- .B +inf
- and
- .B \-inf
- (case independent) convert to positive and negative infinity, respectively.
- No others do.