awk.1p (108240B)
- '\" et
- .TH AWK "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
- .\"
- .SH PROLOG
- This manual page is part of the POSIX Programmer's Manual.
- The Linux implementation of this interface may differ (consult
- the corresponding Linux manual page for details of Linux behavior),
- or the interface may not be implemented on Linux.
- .\"
- .SH NAME
- awk
- \(em pattern scanning and processing language
- .SH SYNOPSIS
- .LP
- .nf
- awk \fB[\fR-F \fIsepstring\fB] [\fR-v \fIassignment\fB]\fR... \fIprogram\fB [\fIargument\fR...\fB]\fR
- .P
- awk \fB[\fR-F \fIsepstring\fB] \fR-f \fIprogfile \fB[\fR-f \fIprogfile\fB]\fR... \fB[\fR-v \fIassignment\fB]\fR...
- \fB[\fIargument\fR...\fB]\fR
- .fi
- .SH DESCRIPTION
- The
- .IR awk
- utility shall execute programs written in the
- .IR awk
- programming language, which is specialized for textual data
- manipulation. An
- .IR awk
- program is a sequence of patterns and corresponding actions. When
- input is read that matches a pattern, the action associated with that
- pattern is carried out.
- .P
- Input shall be interpreted as a sequence of records. By default, a
- record is a line, less its terminating
- <newline>,
- but this can be changed by using the
- .BR RS
- built-in variable. Each record of input shall be matched in turn
- against each pattern in the program. For each pattern matched, the
- associated action shall be executed.
- .P
- The
- .IR awk
- utility shall interpret each input record as a sequence of fields
- where, by default, a field is a string of non-\c
- <blank>
- non-\c
- <newline>
- characters. This default
- <blank>
- and
- <newline>
- field delimiter can be changed by using the
- .BR FS
- built-in variable or the
- .BR \-F
- .IR sepstring
- option. The
- .IR awk
- utility shall denote the first field in a record $1, the second $2, and
- so on. The symbol $0 shall refer to the entire record; setting any
- other field causes the re-evaluation of $0. Assigning to $0 shall reset
- the values of all other fields and the
- .BR NF
- built-in variable.
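As a minimal sketch of the field model described above, the following shell commands feed one made-up record to awk. The sample text and the shell variable names are assumptions chosen for illustration only.

```shell
# One input record with three blank-separated fields (sample data).
out=$(printf 'alpha beta gamma\n' | awk '{ print $2; print NF; print $0 }')
printf '%s\n' "$out"

# Assigning to $0 re-splits the record, so NF changes from 3 to 2.
nf=$(printf 'alpha beta gamma\n' | awk '{ $0 = "x y"; print NF }')
printf '%s\n' "$nf"
```

The first command prints the second field, the field count, and the whole record; the second shows NF being reset after an assignment to $0.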
- .SH OPTIONS
- The
- .IR awk
- utility shall conform to the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 12.2" ", " "Utility Syntax Guidelines".
- .P
- The following options shall be supported:
- .IP "\fB\-F\ \fIsepstring\fR" 10
- Define the input field separator. This option shall be equivalent to:
- .RS 10
- .sp
- .RS 4
- .nf
- -v FS=\fIsepstring
- .fi
- .P
- .RE
- .P
- except that if
- .BR \-F
- .IR sepstring
- and
- .BR \-v
- .IR \fRFS=\fPsepstring\fR
- are both used, it is unspecified whether the
- .BR FS
- assignment resulting from
- .BR \-F
- .IR sepstring
- is processed in command line order or is processed after the last
- .BR \-v
- .IR \fRFS=\fPsepstring\fR .
- See the description of the
- .BR FS
- built-in variable, and how it is used, in the EXTENDED DESCRIPTION
- section.
- .RE
- .IP "\fB\-f\ \fIprogfile\fR" 10
- Specify the pathname of the file
- .IR progfile
- containing an
- .IR awk
- program. A pathname of
- .BR '\-'
- shall denote the standard input. If multiple instances of this option
- are specified, the concatenation of the files specified as
- .IR progfile
- in the order specified shall be the
- .IR awk
- program. The
- .IR awk
- program can alternatively be specified in the command line as a single
- argument.
- .IP "\fB\-v\ \fIassignment\fR" 10
- .br
- The application shall ensure that the
- .IR assignment
- argument is in the same form as an
- .IR assignment
- operand. The specified variable assignment shall occur prior to
- executing the
- .IR awk
- program, including the actions associated with
- .BR BEGIN
- patterns (if any). Multiple occurrences of this option can be
- specified.
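The options above can be sketched as follows. The input line and the variable name "name" are hypothetical; only one of -F or -v FS= is used per command, since combining them has an unspecified processing order.

```shell
# -F sets the input field separator; -v FS=: has the same effect here.
with_F=$(printf 'root:x:0\n' | awk -F: '{ print $1 }')
with_v=$(printf 'root:x:0\n' | awk -v FS=: '{ print $1 }')

# A -v assignment takes effect before BEGIN actions run.
greeting=$(awk -v name=world 'BEGIN { print "hello, " name }')

printf '%s\n%s\n%s\n' "$with_F" "$with_v" "$greeting"
```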
- .SH OPERANDS
- The following operands shall be supported:
- .IP "\fIprogram\fR" 10
- If no
- .BR \-f
- option is specified, the first operand to
- .IR awk
- shall be the text of the
- .IR awk
- program. The application shall supply the
- .IR program
- operand as a single argument to
- .IR awk .
- If the text does not end in a
- <newline>,
- .IR awk
- shall interpret the text as if it did.
- .IP "\fIargument\fR" 10
- Either of the following two types of
- .IR argument
- can be intermixed:
- .RS 10
- .IP "\fIfile\fR" 10
- A pathname of a file that contains the input to be read, which is
- matched against the set of patterns in the program. If no
- .IR file
- operands are specified, or if a
- .IR file
- operand is
- .BR '\-' ,
- the standard input shall be used.
- .IP "\fIassignment\fR" 10
- An operand that begins with an
- <underscore>
- or alphabetic character from the portable character set (see the table
- in the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 6.1" ", " "Portable Character Set"),
- followed by a sequence of underscores, digits, and alphabetics from the
- portable character set, followed by the
- .BR '='
- character, shall specify a variable assignment rather than a pathname.
- The characters before the
- .BR '='
- represent the name of an
- .IR awk
- variable; if that name is an
- .IR awk
- reserved word (see
- .IR "Grammar")
- the behavior is undefined. The characters following the
- <equals-sign>
- shall be interpreted as if they appeared in the
- .IR awk
- program preceded and followed by a double-quote (\c
- .BR '\&"' )
- character, as a
- .BR STRING
- token (see
- .IR "Grammar"),
- except that if the last character is an unescaped
- <backslash>,
- it shall be interpreted as a literal
- <backslash>
- rather than as the first character of the sequence
- .BR \(dq\e"\(dq .
- The variable shall be assigned the value of that
- .BR STRING
- token and, if appropriate, shall be considered a
- .IR "numeric string"
- (see
- .IR "Expressions in awk"),
- in which case the variable shall also be assigned its numeric value. Each such
- variable assignment shall occur just prior to the processing of the
- following
- .IR file ,
- if any. Thus, an assignment before the first
- .IR file
- argument shall be executed after the
- .BR BEGIN
- actions (if any), while an assignment after the last
- .IR file
- argument shall occur before the
- .BR END
- actions (if any). If there are no
- .IR file
- arguments, assignments shall be executed before processing the standard
- input.
- .RE
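The timing of assignment operands can be illustrated with a short sketch. The variable x and the one-line input are assumptions made for the example; the "-" operand stands for standard input.

```shell
# x is unset in BEGIN, is 1 while standard input is read, and is 2 in END.
out=$(printf 'line\n' |
    awk 'BEGIN { print "B:" x } { print "R:" x } END { print "E:" x }' x=1 - x=2)
printf '%s\n' "$out"
```

The assignment placed before the file operand runs after BEGIN, and the one placed after it runs before END.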
- .SH STDIN
- The standard input shall be used only if no
- .IR file
- operands are specified, or if a
- .IR file
- operand is
- .BR '\-' ,
- or if a
- .IR progfile
- option-argument is
- .BR '\-' ;
- see the INPUT FILES section. If the
- .IR awk
- program contains no actions and no patterns, but is otherwise a valid
- .IR awk
- program, standard input and any
- .IR file
- operands shall not be read and
- .IR awk
- shall exit with a return status of zero.
- .SH "INPUT FILES"
- Input files to the
- .IR awk
- program from any of the following sources shall be text files:
- .IP " *" 4
- Any
- .IR file
- operands or their equivalents, achieved by modifying the
- .IR awk
- variables
- .BR ARGV
- and
- .BR ARGC
- .IP " *" 4
- Standard input in the absence of any
- .IR file
- operands
- .IP " *" 4
- Arguments to the
- .BR getline
- function
- .P
- For these files, regardless of whether the variable
- .BR RS
- is set to a value other than a
- <newline>,
- implementations shall support records terminated with the specified
- separator up to
- {LINE_MAX}
- bytes and may support longer records.
- .P
- If
- .BR \-f
- .IR progfile
- is specified, the application shall ensure that the files named by each
- of the
- .IR progfile
- option-arguments are text files and their concatenation, in the same
- order as they appear in the arguments, is an
- .IR awk
- program.
- .SH "ENVIRONMENT VARIABLES"
- The following environment variables shall affect the execution of
- .IR awk :
- .IP "\fILANG\fP" 10
- Provide a default value for the internationalization variables that are
- unset or null. (See the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 8.2" ", " "Internationalization Variables"
- for the precedence of internationalization variables used to determine
- the values of locale categories.)
- .IP "\fILC_ALL\fP" 10
- If set to a non-empty string value, override the values of all the
- other internationalization variables.
- .IP "\fILC_COLLATE\fP" 10
- .br
- Determine the locale for the behavior of ranges, equivalence classes,
- and multi-character collating elements within regular expressions and
- in comparisons of string values.
- .IP "\fILC_CTYPE\fP" 10
- Determine the locale for the interpretation of sequences of bytes of
- text data as characters (for example, single-byte as opposed to
- multi-byte characters in arguments and input files), the behavior of
- character classes within regular expressions, the identification of
- characters as letters, and the mapping of uppercase and lowercase
- characters for the
- .BR toupper
- and
- .BR tolower
- functions.
- .IP "\fILC_MESSAGES\fP" 10
- .br
- Determine the locale that should be used to affect the format and
- contents of diagnostic messages written to standard error.
- .IP "\fILC_NUMERIC\fP" 10
- .br
- Determine the radix character used when interpreting numeric input,
- performing conversions between numeric and string values, and
- formatting numeric output. Regardless of locale, the
- <period>
- character (the decimal-point character of the POSIX locale) is the
- decimal-point character recognized in processing
- .IR awk
- programs (including assignments in command line arguments).
- .IP "\fINLSPATH\fP" 10
- Determine the location of message catalogs for the processing of
- .IR LC_MESSAGES .
- .IP "\fIPATH\fP" 10
- Determine the search path when looking for commands executed by
- \fIsystem\fR(\fIexpr\fR), or input and output pipes; see the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Chapter 8" ", " "Environment Variables".
- .P
- In addition, all environment variables shall be visible via the
- .IR awk
- variable
- .BR ENVIRON .
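A minimal sketch of reading the environment through ENVIRON follows. AWK_DEMO_VAR is a hypothetical variable name used only for this example.

```shell
# Set a variable in awk's environment and read it back via ENVIRON.
out=$(AWK_DEMO_VAR=hello awk 'BEGIN { print ENVIRON["AWK_DEMO_VAR"] }')
printf '%s\n' "$out"
```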
- .SH "ASYNCHRONOUS EVENTS"
- Default.
- .SH STDOUT
- The nature of the output files depends on the
- .IR awk
- program.
- .SH STDERR
- The standard error shall be used only for diagnostic messages.
- .SH "OUTPUT FILES"
- The nature of the output files depends on the
- .IR awk
- program.
- .br
- .SH "EXTENDED DESCRIPTION"
- .SS "Overall Program Structure"
- .P
- An
- .IR awk
- program is composed of pairs of the form:
- .sp
- .RS 4
- .nf
- \fIpattern\fR { \fIaction\fR }
- .fi
- .P
- .RE
- .P
- Either the pattern or the action (including the enclosing brace
- characters) can be omitted.
- .P
- A missing pattern shall match any record of input, and a missing action
- shall be equivalent to:
- .sp
- .RS 4
- .nf
- { print }
- .fi
- .P
- .RE
- .P
- Execution of the
- .IR awk
- program shall start by first executing the actions associated with all
- .BR BEGIN
- patterns in the order they occur in the program. Then each
- .IR file
- operand (or standard input if no files were specified) shall be
- processed in turn by reading data from the file until a record
- separator is seen (\c
- <newline>
- by default). Before the first reference to a field in the record is
- evaluated, the record shall be split into fields, according to the
- rules in
- .IR "Regular Expressions",
- using the value of
- .BR FS
- that was current at the time the record was read. Each pattern in the
- program shall then be evaluated in the order of occurrence, and the
- action associated with each pattern that matches the current record
- shall be executed. The action for a matching pattern shall be executed before
- evaluating subsequent patterns. Finally, the actions associated with
- all
- .BR END
- patterns shall be executed in the order they occur in the program.
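The execution order described above can be sketched with a small program. The two input records are made-up sample data, and the counter n is an illustrative name.

```shell
out=$(printf 'apple\nbanana\n' | awk '
BEGIN { print "start" }       # runs before any input is read
/an/                          # pattern only: the action defaults to { print }
{ n++ }                       # action only: runs for every record
END { print n " records" }    # runs after all input is consumed
')
printf '%s\n' "$out"
```

Only "banana" matches the pattern-only rule, while the action-only rule counts both records.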
- .SS "Expressions in awk"
- .P
- Expressions describe computations used in
- .IR patterns
- and
- .IR actions .
- In the following table, valid expression operations are given in groups
- from highest precedence first to lowest precedence last, with
- equal-precedence operators grouped between horizontal lines. In
- expression evaluation, where the grammar is formally ambiguous, higher
- precedence operators shall be evaluated before lower precedence
- operators. In this table
- .IR expr ,
- .IR expr1 ,
- .IR expr2 ,
- and
- .IR expr3
- represent any expression, while lvalue represents any entity that can
- be assigned to (that is, on the left side of an assignment operator).
- The precise syntax of expressions is given in
- .IR "Grammar".
- .sp
- .ce 1
- \fBTable 4-1: Expressions in Decreasing Precedence in \fIawk\fP\fR
- .TS
- box tab(@) center;
- cB | cB | cB | cB
- l1f5 | l1 | l1 | l.
- Syntax@Name@Type of Result@Associativity
- _
- ( \fIexpr\fP )@Grouping@Type of \fIexpr\fP@N/A
- _
- $\fIexpr\fP@Field reference@String@N/A
- _
- lvalue ++@Post-increment@Numeric@N/A
- lvalue \-\|\-@Post-decrement@Numeric@N/A
- _
- ++ lvalue@Pre-increment@Numeric@N/A
- \-\|\- lvalue@Pre-decrement@Numeric@N/A
- _
- \fIexpr\fP ^ \fIexpr\fP@Exponentiation@Numeric@Right
- _
- ! \fIexpr\fP@Logical not@Numeric@N/A
- + \fIexpr\fP@Unary plus@Numeric@N/A
- \- \fIexpr\fP@Unary minus@Numeric@N/A
- _
- \fIexpr\fP * \fIexpr\fP@Multiplication@Numeric@Left
- \fIexpr\fP / \fIexpr\fP@Division@Numeric@Left
- \fIexpr\fP % \fIexpr\fP@Modulus@Numeric@Left
- _
- \fIexpr\fP + \fIexpr\fP@Addition@Numeric@Left
- \fIexpr\fP \- \fIexpr\fP@Subtraction@Numeric@Left
- _
- \fIexpr\fP \fIexpr\fP@String concatenation@String@Left
- _
- \fIexpr\fP < \fIexpr\fP@Less than@Numeric@None
- \fIexpr\fP <= \fIexpr\fP@Less than or equal to@Numeric@None
- \fIexpr\fP != \fIexpr\fP@Not equal to@Numeric@None
- \fIexpr\fP == \fIexpr\fP@Equal to@Numeric@None
- \fIexpr\fP > \fIexpr\fP@Greater than@Numeric@None
- \fIexpr\fP >= \fIexpr\fP@Greater than or equal to@Numeric@None
- _
- \fIexpr\fP ~ \fIexpr\fP@ERE match@Numeric@None
- \fIexpr\fP !~ \fIexpr\fP@ERE non-match@Numeric@None
- _
- \fIexpr\fP in \fIarray\fP@Array membership@Numeric@Left
- ( \fIindex\fP ) in \fIarray\fP@Multi-dimension array@Numeric@Left
- @membership
- _
- \fIexpr\fP && \fIexpr\fP@Logical AND@Numeric@Left
- _
- \fIexpr\fP || \fIexpr\fP@Logical OR@Numeric@Left
- _
- \fIexpr1\fP ? \fIexpr2\fP : \fIexpr3\fP@Conditional expression@Type of selected@Right
- @@\fIexpr2\fP or \fIexpr3\fP
- _
- lvalue ^= \fIexpr\fP@Exponentiation assignment@Numeric@Right
- lvalue %= \fIexpr\fP@Modulus assignment@Numeric@Right
- lvalue *= \fIexpr\fP@Multiplication assignment@Numeric@Right
- lvalue /= \fIexpr\fP@Division assignment@Numeric@Right
- lvalue += \fIexpr\fP@Addition assignment@Numeric@Right
- lvalue \-= \fIexpr\fP@Subtraction assignment@Numeric@Right
- lvalue = \fIexpr\fP@Assignment@Type of \fIexpr\fP@Right
- .TE
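A few rows of the precedence table can be checked with a short sketch. The expressions are illustrative choices, not taken from the standard.

```shell
out=$(awk 'BEGIN {
    print 2 ^ 3 ^ 2            # exponentiation is right-associative: 2 ^ (3 ^ 2)
    print 2 + 3 * 4            # multiplication binds tighter than addition
    print 1 " " 2 + 3          # addition binds tighter than concatenation
    x = 1 > 2 ? "yes" : "no"   # comparison binds tighter than the conditional
    print x
}')
printf '%s\n' "$out"
```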
- .P
- Each expression shall have either a string value, a numeric value, or
- both. Except as stated for specific contexts, the value of an expression
- shall be implicitly converted to the type needed for the context in which
- it is used. A string value shall be converted to a numeric value either by
- the equivalent of the following calls to functions defined by the ISO\ C standard:
- .sp
- .RS 4
- .nf
- setlocale(LC_NUMERIC, "");
- \fInumeric_value\fR = atof(\fIstring_value\fR);
- .fi
- .P
- .RE
- .P
- or by converting the initial portion of the string to type
- .BR double
- representation as follows:
- .sp
- .RS
- The input string is decomposed into two parts: an initial, possibly empty,
- sequence of white-space characters (as specified by
- \fIisspace\fR())
- and a subject sequence interpreted as a floating-point constant.
- .P
- The expected form of the subject sequence is an optional
- .BR '+'
- or
- .BR '\-'
- sign, then a non-empty sequence of digits optionally containing a
- <period>,
- then an optional exponent part. An exponent part consists of
- .BR 'e'
- or
- .BR 'E' ,
- followed by an optional sign, followed by one or more decimal digits.
- .P
- The sequence starting with the first digit or the
- <period>
- (whichever occurs first) is interpreted as a floating constant of the
- C language, and if neither an exponent part nor a
- <period>
- appears, a
- <period>
- is assumed to follow the last digit in the string. If the subject
- sequence begins with a
- <hyphen-minus>,
- the value resulting from the conversion is negated.
- .RE
- .P
- A numeric value that is exactly equal to the value of an integer (see
- .IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard")
- shall be converted to a string by the equivalent of a call to the
- .BR sprintf
- function (see
- .IR "String Functions")
- with the string
- .BR \(dq%d\(dq
- as the
- .IR fmt
- argument and the numeric value being converted as the first and only
- .IR expr
- argument. Any other numeric value shall be converted to a string by the
- equivalent of a call to the
- .BR sprintf
- function with the value of the variable
- .BR CONVFMT
- as the
- .IR fmt
- argument and the numeric value being converted as the first and only
- .IR expr
- argument. The result of the conversion is unspecified if the value of
- .BR CONVFMT
- is not a floating-point format specification. This volume of POSIX.1\(hy2017 specifies no
- explicit conversions between numbers and strings. An application can
- force an expression to be treated as a number by adding zero to it, or
- can force it to be treated as a string by concatenating the null string
- (\c
- .BR \(dq\^\(dq )
- to it.
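The conversion rules above can be sketched as follows, assuming a CONVFMT value of "%.2f" chosen purely for the example; the variable names are illustrative.

```shell
out=$(awk 'BEGIN {
    CONVFMT = "%.2f"
    x = 3.14159; s = x ""      # non-integer value: converted with CONVFMT
    y = 17;      t = y ""      # integer value: converted as if with "%d"
    n = "3x" + 0               # adding zero forces a numeric value
    print s; print t; print n
}')
printf '%s\n' "$out"
```

Concatenating the null string forces a string value, and adding zero forces a numeric one, as the text describes.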
- .P
- A string value shall be considered a
- .IR "numeric string"
- if it comes from one of the following:
- .IP " 1." 4
- Field variables
- .IP " 2." 4
- Input from the
- \fIgetline\fR()
- function
- .IP " 3." 4
- .BR FILENAME
- .IP " 4." 4
- .BR ARGV
- array elements
- .IP " 5." 4
- .BR ENVIRON
- array elements
- .IP " 6." 4
- Array elements created by the
- \fIsplit\fR()
- function
- .IP " 7." 4
- A command line variable assignment
- .IP " 8." 4
- Variable assignment from another numeric string variable
- .P
- and an implementation-dependent condition corresponding to either
- case (a) or (b) below is met.
- .IP " a." 4
- After the equivalent of the following calls to functions defined by
- the ISO\ C standard,
- .IR string_value_end
- would differ from
- .IR string_value ,
- and any characters before the terminating null character in
- .IR string_value_end
- would be
- <blank>
- characters:
- .RS 4
- .sp
- .RS 4
- .nf
- char *string_value_end;
- setlocale(LC_NUMERIC, "");
- numeric_value = strtod (string_value, &string_value_end);
- .fi
- .P
- .RE
- .RE
- .IP " b." 4
- After all the following conversions have been applied, the resulting
- string would lexically be recognized as a
- .BR NUMBER
- token as described by the lexical conventions in
- .IR "Grammar":
- .RS 4
- .IP -- 4
- All leading and trailing
- <blank>
- characters are discarded.
- .IP -- 4
- If the first non-\c
- <blank>
- is
- .BR '\(pl'
- or
- .BR '\-' ,
- it is discarded.
- .IP -- 4
- Each occurrence of the decimal point character from the current locale
- is changed to a
- <period>.
- .RE
- In case (a) the numeric value of the
- .IR "numeric string"
- shall be the value that would be returned by the
- \fIstrtod\fR()
- call. In case (b) if the first non-\c
- <blank>
- is
- .BR '\-' ,
- the numeric value of the
- .IR "numeric string"
- shall be the negation of the numeric value of the recognized
- .BR NUMBER
- token; otherwise, the numeric value of the
- .IR "numeric string"
- shall be the numeric value of the recognized
- .BR NUMBER
- token. Whether or not a string is a
- .IR "numeric string"
- shall be relevant only in contexts where that term is used in this
- section.
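As a hedged illustration of numeric strings, the sketch below compares field values, which can be numeric strings, with a string constant, which cannot. The input record is made-up sample data.

```shell
# $1 ("1.0") looks numeric, so it is compared as a number;
# $2 ("abc") does not, so it is compared as a string.
fields=$(printf '1.0 abc\n' | awk '{ printf "%d %d\n", ($1 == 1), ($2 == 0) }')

# A string constant is not a numeric string: "1.0" and "1" differ as strings.
const=$(awk 'BEGIN { printf "%d\n", ("1.0" == 1) }')

printf '%s\n%s\n' "$fields" "$const"
```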
- .P
- When an expression is used in a Boolean context, if it has a numeric
- value, a value of zero shall be treated as false and any other value
- shall be treated as true. Otherwise, a string value of the null string
- shall be treated as false and any other value shall be treated as true.
- A Boolean context shall be one of the following:
- .IP " *" 4
- The first subexpression of a conditional expression
- .IP " *" 4
- An expression operated on by logical NOT, logical AND, or logical OR
- .IP " *" 4
- The second expression of a
- .BR for
- statement
- .IP " *" 4
- The expression of an
- .BR if
- statement
- .IP " *" 4
- The expression of the
- .BR while
- clause in either a
- .BR while
- or
- .BR do .\|.\|.\c
- .BR while
- statement
- .IP " *" 4
- An expression used as a pattern (as in Overall Program Structure)
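The truth rules above can be sketched with the conditional operator. The four test values are illustrative: numeric zero and the null string are false, while the string constant "0" is a non-null string and therefore true.

```shell
out=$(awk 'BEGIN {
    printf "%s %s %s %s\n",
        (0   ? "T" : "F"),
        (0.0 ? "T" : "F"),
        (""  ? "T" : "F"),
        ("0" ? "T" : "F")
}')
printf '%s\n' "$out"
```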
- .P
- All arithmetic shall follow the semantics of floating-point arithmetic as
- specified by the ISO\ C standard (see
- .IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard").
- .P
- The value of the expression:
- .sp
- .RS 4
- .nf
- \fIexpr1\fR \(ha \fIexpr2\fR
- .fi
- .P
- .RE
- .P
- shall be equivalent to the value returned by the ISO\ C standard function call:
- .sp
- .RS 4
- .nf
- \fRpow(\fIexpr1\fR, \fIexpr2\fR)
- .fi
- .P
- .RE
- .P
- The expression:
- .sp
- .RS 4
- .nf
- lvalue \(ha= \fIexpr\fR
- .fi
- .P
- .RE
- .P
- shall be equivalent to the ISO\ C standard expression:
- .sp
- .RS 4
- .nf
- lvalue = pow(lvalue, \fIexpr\fR)
- .fi
- .P
- .RE
- .P
- except that lvalue shall be evaluated only once. The value of the
- expression:
- .sp
- .RS 4
- .nf
- \fIexpr1\fR % \fIexpr2\fR
- .fi
- .P
- .RE
- .P
- shall be equivalent to the value returned by the ISO\ C standard function call:
- .sp
- .RS 4
- .nf
- fmod(\fIexpr1\fR, \fIexpr2\fR)
- .fi
- .P
- .RE
- .P
- The expression:
- .sp
- .RS 4
- .nf
- lvalue %= \fIexpr\fR
- .fi
- .P
- .RE
- .P
- shall be equivalent to the ISO\ C standard expression:
- .sp
- .RS 4
- .nf
- lvalue = fmod(lvalue, \fIexpr\fR)
- .fi
- .P
- .RE
- .P
- except that lvalue shall be evaluated only once.
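The correspondence with pow() and fmod() can be sketched as follows. The operand values are arbitrary examples.

```shell
out=$(awk 'BEGIN {
    print 2 ^ 10      # like pow(2, 10)
    print 7 % 3       # like fmod(7, 3)
    print -7 % 3      # like fmod(-7, 3): the result takes the sign of the dividend
    print 7.5 % 2     # operands need not be integers
    x = 3; x ^= 2     # like x = pow(x, 2)
    print x
}')
printf '%s\n' "$out"
```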
- .P
- Variables and fields shall be set by the assignment statement:
- .sp
- .RS 4
- .nf
- lvalue = \fIexpression\fR
- .fi
- .P
- .RE
- .P
- and the type of
- .IR expression
- shall determine the resulting variable type. The assignment includes
- the arithmetic assignments (\c
- .BR \(dq+=\(dq ,
- .BR \(dq-=\(dq ,
- .BR \(dq*=\(dq ,
- .BR \(dq/=\(dq ,
- .BR \(dq%=\(dq ,
- .BR \(dq\(ha=\(dq ,
- .BR \(dq++\(dq ,
- .BR \(dq--\(dq ),
- all of which shall produce a numeric result. The left-hand side of an
- assignment and the target of increment and decrement operators can be
- one of a variable, an array with index, or a field selector.
- .P
- The
- .IR awk
- language supplies arrays that are used for storing numbers or strings.
- Arrays need not be declared. They shall initially be empty, and their
- sizes shall change dynamically. The subscripts, or element identifiers,
- are strings, providing a type of associative array capability. An array
- name followed by a subscript within square brackets can be used as an
- lvalue and thus as an expression, as described in the grammar; see
- .IR "Grammar".
- Unsubscripted array names can be used in only the following contexts:
- .IP " *" 4
- A parameter in a function definition or function call
- .IP " *" 4
- The
- .BR NAME
- token following any use of the keyword
- .BR in
- as specified in the grammar (see
- .IR "Grammar");
- if the name used in this context is not an array name, the behavior is
- undefined
- .P
- A valid array
- .IR index
- shall consist of one or more
- <comma>-separated
- expressions, similar to the way in which multi-dimensional arrays are
- indexed in some programming languages. Because
- .IR awk
- arrays are really one-dimensional, such a
- <comma>-separated
- list shall be converted to a single string by concatenating the string
- values of the separate expressions, each separated from the other by
- the value of the
- .BR SUBSEP
- variable. Thus, the following two index operations shall be
- equivalent:
- .sp
- .RS 4
- .nf
- \fIvar\fB[\fIexpr1\fR, \fIexpr2\fR, ... \fIexprn\fB]
- .P
- \fIvar\fB[\fIexpr1\fR SUBSEP \fIexpr2\fR SUBSEP ... \fRSUBSEP \fIexprn\fB]\fR
- .fi
- .P
- .RE
- .P
- The application shall ensure that a multi-dimensioned
- .IR index
- used with the
- .BR in
- operator is parenthesized. The
- .BR in
- operator, which tests for the existence of a particular array element,
- shall not cause that element to exist. Any other reference to a
- nonexistent array element shall automatically create it.
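A minimal sketch of subscripts and the in operator follows. The array name a and the subscripts are hypothetical.

```shell
out=$(awk 'BEGIN {
    a["x", "y"] = 1
    k = "x" SUBSEP "y"                 # the same subscript, built by hand
    printf "%d %d %d\n", (k in a), (("x", "y") in a), ("z" in a)
    n = 0; for (i in a) n++            # the failed membership test created nothing
    print n
    if (a["w"] == "") m = 0            # any other reference creates the element
    for (i in a) m++
    print m
}')
printf '%s\n' "$out"
```

The element count stays at 1 after the membership tests and becomes 2 once a["w"] is merely referenced.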
- .P
- Comparisons (with the
- .BR '<' ,
- .BR \(dq<=\(dq ,
- .BR \(dq!=\(dq ,
- .BR \(dq==\(dq ,
- .BR '>' ,
- and
- .BR \(dq>=\(dq
- operators) shall be made numerically if both operands are numeric, if
- one is numeric and the other has a string value that is a numeric
- string, or if one is numeric and the other has the uninitialized value.
- Otherwise, operands shall be converted to strings as required and a
- string comparison shall be made as follows:
- .IP " *" 4
- For the
- .BR \(dq!=\(dq
- and
- .BR \(dq==\(dq
- operators, the strings should be compared to check if they are
- identical but may be compared using the locale-specific collation
- sequence to check if they collate equally.
- .IP " *" 4
- For the other operators, the strings shall be compared using the
- locale-specific collation sequence.
- .P
- The value of the comparison expression shall be 1 if the relation is
- true, or 0 if the relation is false.
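The comparison rules above can be sketched with three cases. The operands are illustrative, and u is a deliberately uninitialized variable.

```shell
out=$(awk 'BEGIN {
    printf "%d\n", (10 < 9)        # both numeric: numeric comparison, false
    printf "%d\n", ("10" < "9")    # both string constants: string comparison, true
    printf "%d\n", (u < 1)         # numeric vs. uninitialized: numeric comparison
}')
printf '%s\n' "$out"
```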
- .SS "Variables and Special Variables"
- .P
- Variables can be used in an
- .IR awk
- program by referencing them. With the exception of function parameters
- (see
- .IR "User-Defined Functions"),
- they are not explicitly declared. Function parameter names shall be
- local to the function; all other variable names shall be global. The
- same name shall not be used as both a function parameter name and as
- the name of a function or a special
- .IR awk
- variable. The same name shall not be used both as a variable name with
- global scope and as the name of a function. The same name shall not be
- used within the same scope both as a scalar variable and as an array.
- Uninitialized variables, including scalar variables, array elements,
- and field variables, shall have an uninitialized value. An
- uninitialized value shall have both a numeric value of zero and a
- string value of the empty string. Evaluation of variables with an
- uninitialized value, to either string or numeric, shall be determined
- by the context in which they are used.
- .P
- Field variables shall be designated by a
- .BR '$'
- followed by a number or numerical expression. The effect of the field
- number
- .IR expression
- evaluating to anything other than a non-negative integer is
- unspecified; uninitialized variables or string values need not be
- converted to numeric values in this context. New field variables can be
- created by assigning a value to them. References to nonexistent fields
- (that is, fields after $\fBNF\fP), shall evaluate to the uninitialized
- value. Such references shall not create new fields. However, assigning
- to a nonexistent field (for example, $(\fBNF\fP+2)=5) shall increase
- the value of
- .BR NF ;
- create any intervening fields with the uninitialized value; and cause
- the value of $0 to be recomputed, with the fields being separated by
- the value of
- .BR OFS .
- Each field variable shall have a string value or an uninitialized value
- when created. Field variables shall have the uninitialized value when
- created from $0 using
- .BR FS
- and the variable does not contain any characters. If appropriate, the
- field variable shall be considered a numeric string (see
- .IR "Expressions in awk").
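The field-creation behavior can be sketched as follows, assuming an OFS of "-" chosen so that the rebuilt record is easy to read. The input record is sample data.

```shell
out=$(printf 'a b\n' | awk -v OFS=- '{
    x = $5                 # reading a nonexistent field does not create it
    print NF
    $(NF + 2) = "d"        # creates an empty $3 and sets $4; $0 is rebuilt with OFS
    print NF
    print $0
}')
printf '%s\n' "$out"
```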
- .P
- Implementations shall support the following other special variables
- that are set by
- .IR awk :
- .IP "\fBARGC\fR" 10
- The number of elements in the
- .BR ARGV
- array.
- .IP "\fBARGV\fR" 10
- An array of command line arguments, excluding options and the
- .IR program
- argument, numbered from zero to
- .BR ARGC \-1.
- .RS 10
- .P
- The arguments in
- .BR ARGV
- can be modified or added to;
- .BR ARGC
- can be altered. As each input file ends,
- .IR awk
- shall treat the next non-null element of
- .BR ARGV ,
- up to the current value of
- .BR ARGC \-1,
- inclusive, as the name of the next input file. Thus, setting an element
- of
- .BR ARGV
- to null means that it shall not be treated as an input file. The name
- .BR '\-'
- indicates the standard input. If an argument matches the format of an
- .IR assignment
- operand, this argument shall be treated as an
- .IR assignment
- rather than a
- .IR file
- argument.
- .RE
- .IP "\fBCONVFMT\fR" 10
- The
- .BR printf
- format for converting numbers to strings (except for output statements,
- where
- .BR OFMT
- is used);
- .BR \(dq%.6g\(dq
- by default.
- .IP "\fBENVIRON\fR" 10
- An array representing the value of the environment, as described in the
- .IR exec
- functions defined in the System Interfaces volume of POSIX.1\(hy2017. The indices of the array shall be
- strings consisting of the names of the environment variables, and the
- value of each array element shall be a string consisting of the value
- of that variable. If appropriate, the environment variable shall be
- considered a
- .IR "numeric string"
- (see
- .IR "Expressions in awk");
- the array element shall also have its numeric value.
- .RS 10
- .P
- In all cases where the behavior of
- .IR awk
- is affected by environment variables (including the environment of any
- commands that
- .IR awk
- executes via the
- .BR system
- function or via pipeline redirections with the
- .BR print
- statement, the
- .BR printf
- statement, or the
- .BR getline
- function), the environment used shall be the environment at the time
- .IR awk
- began executing; it is implementation-defined whether any
- modification of
- .BR ENVIRON
- affects this environment.
- .RE
- .IP "\fBFILENAME\fR" 10
- A pathname of the current input file. Inside a
- .BR BEGIN
- action the value is undefined. Inside an
- .BR END
- action the value shall be the name of the last input file processed.
- .IP "\fBFNR\fR" 10
- The ordinal number of the current record in the current file. Inside a
- .BR BEGIN
- action the value shall be zero. Inside an
- .BR END
- action the value shall be the number of the last record processed in
- the last file processed.
- .IP "\fBFS\fR" 10
- Input field separator regular expression; a
- <space>
- by default.
- .IP "\fBNF\fR" 10
- The number of fields in the current record. Inside a
- .BR BEGIN
- action, the use of
- .BR NF
- is undefined unless a
- .BR getline
- function without a
- .IR var
- argument is executed previously. Inside an
- .BR END
- action,
- .BR NF
- shall retain the value it had for the last record read, unless a
- subsequent, redirected,
- .BR getline
- function without a
- .IR var
- argument is performed prior to entering the
- .BR END
- action.
- .IP "\fBNR\fR" 10
- The ordinal number of the current record from the start of input.
- Inside a
- .BR BEGIN
- action the value shall be zero. Inside an
- .BR END
- action the value shall be the number of the last record processed.
- .IP "\fBOFMT\fR" 10
- The
- .BR printf
- format for converting numbers to strings in output statements (see
- .IR "Output Statements");
- .BR \(dq%.6g\(dq
- by default. The result of the conversion is unspecified if the value of
- .BR OFMT
- is not a floating-point format specification.
- .IP "\fBOFS\fR" 10
- The
- .BR print
- statement output field separator;
- <space>
- by default.
- .IP "\fBORS\fR" 10
- The
- .BR print
- statement output record separator; a
- <newline>
- by default.
- .IP "\fBRLENGTH\fR" 10
- The length of the string matched by the
- .BR match
- function.
- .IP "\fBRS\fR" 10
- The first character of the string value of
- .BR RS
- shall be the input record separator; a
- <newline>
- by default. If
- .BR RS
- contains more than one character, the results are unspecified. If
- .BR RS
- is null, then records are separated by sequences consisting of a
- <newline>
- plus one or more blank lines, leading or trailing blank lines shall not
- result in empty records at the beginning or end of the input, and a
- <newline>
- shall always be a field separator, no matter what the value of
- .BR FS
- is.
- .IP "\fBRSTART\fR" 10
- The starting position of the string matched by the
- .BR match
- function, numbering from 1. This shall always be equivalent to the
- return value of the
- .BR match
- function.
- .IP "\fBSUBSEP\fR" 10
- The subscript separator string for multi-dimensional arrays; the
- default value is implementation-defined.
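- .P
- As an informal illustration (not part of the normative text), the
- following sketch shows the effect of a null
- .BR RS
- on
- .BR NR
- and
- .BR NF .
- The sample input, built here with
- .IR echo ,
- is an assumption chosen for the example.
- .sp
- .RS 4
- .nf
```shell
# Input: one leading blank line, a two-line record, two blank lines,
# and a final one-line record.
{ echo; echo 'a b'; echo c; echo; echo; echo 'd e'; } |
awk 'BEGIN { RS = "" } { print NR ": " NF }'
```
- .fi
- .P
- .RE
- .P
- The leading blank line produces no empty record, and the
- <newline>
- inside the first record acts as a field separator, so the output
- should be two lines, reporting three fields for record 1 and two
- fields for record 2.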
- .SS "Regular Expressions"
- .P
- The
- .IR awk
- utility shall make use of the extended regular expression notation
- (see the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 9.4" ", " "Extended Regular Expressions")
- except that it shall allow the use of C-language conventions
- for escaping special characters within the EREs, as specified in the
- table in the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Chapter 5" ", " "File Format Notation"
- (\c
- .BR '\e\e' ,
- .BR '\ea' ,
- .BR '\eb' ,
- .BR '\ef' ,
- .BR '\en' ,
- .BR '\er' ,
- .BR '\et' ,
- .BR '\ev' )
- and the following table; these escape sequences shall be recognized
- both inside and outside bracket expressions. Note that records need not
- be separated by
- <newline>
- characters and string constants can contain
- <newline>
- characters, so even the
- .BR \(dq\en\(dq
- sequence is valid in
- .IR awk
- EREs. Using a
- <slash>
- character within an ERE requires the escaping shown in the following
- table.
- .br
- .sp
- .ce 1
- \fBTable 4-2: Escape Sequences in \fIawk\fP\fR
- .ad l
- .TS
- center tab(@) box;
- cB | cB | cB
- cB | cB | cB
- lf5 | lw(34) | lw(34).
- Escape
- Sequence@Description@Meaning
- _
- \e"@T{
- <backslash> <quotation-mark>
- T}@T{
- <quotation-mark> character
- T}
- _
- \e/@T{
- <backslash> <slash>
- T}@T{
- <slash> character
- T}
- _
- \eddd@T{
- A
- <backslash>
- character followed by the longest sequence of one, two, or
- three octal-digit characters (01234567). If all of the digits are 0
- (that is, representation of the NUL character), the behavior is
- undefined.
- T}@T{
- The character whose encoding is represented by the one, two, or
- three-digit octal integer. Multi-byte characters require
- multiple, concatenated escape sequences of this type, including the
- leading
- <backslash>
- for each byte.
- T}
- _
- \ec@T{
- A
- <backslash>
- character followed by any character not described in this
- table or in the table in the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Chapter 5" ", " "File Format Notation"
- (\c
- .BR '\e\e' ,
- .BR '\ea' ,
- .BR '\eb' ,
- .BR '\ef' ,
- .BR '\en' ,
- .BR '\er' ,
- .BR '\et' ,
- .BR '\ev' ).
- T}@Undefined
- .TE
- .ad b
- .P
- A regular expression can be matched against a specific field or string
- by using one of the two regular expression matching operators,
- .BR '\(ti'
- and
- .BR \(dq!\(ti\(dq .
- These operators shall interpret their right-hand operand as a regular
- expression and their left-hand operand as a string. If the regular
- expression matches the string, the
- .BR '\(ti'
- expression shall evaluate to a value of 1, and the
- .BR \(dq!\(ti\(dq
- expression shall evaluate to a value of 0. (The regular expression
- matching operation is as defined by the term matched in the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 9.1" ", " "Regular Expression Definitions",
- where a match occurs on any part of the string unless the regular
- expression is limited with the
- <circumflex>
- or
- <dollar-sign>
- special characters.) If the regular expression does not match the
- string, the
- .BR '\(ti'
- expression shall evaluate to a value of 0, and the
- .BR \(dq!\(ti\(dq
- expression shall evaluate to a value of 1. If the right-hand operand is
- any expression other than the lexical token
- .BR ERE ,
- the string value of the expression shall be interpreted as an extended
- regular expression, including the escape conventions described above.
- Note that these same escape conventions shall also be applied in
- determining the value of a string literal (the lexical token
- .BR STRING ),
- and thus shall be applied a second time when a string literal is used
- in this context.
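- .P
- The matching operators can be sketched as follows. This is an
- informal example, not normative text; the variable names
- .IR re ,
- .IR a ,
- .IR b ,
- and
- .IR c
- and the sample strings are hypothetical.
- .sp
- .RS 4
- .nf
```shell
awk 'BEGIN {
    re = "^[0-9]+$"            # string value interpreted as an ERE
    a = ("2017" ~ re)          # whole string is digits: match
    b = ("awk.1p" ~ /k[.]1/)   # ERE token, unanchored: matches a substring
    c = ("awk" !~ /^a/)        # the ERE matches, so the negated form is 0
    print a, b, c
}'
```
- .fi
- .P
- .RE
- .P
- The three results are expected to be 1, 1, and 0, separated by the
- default
- .BR OFS .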
- .P
- When an
- .BR ERE
- token appears as an expression in any context other than as the
- right-hand of the
- .BR '\(ti'
- or
- .BR \(dq!\(ti\(dq
- operator or as one of the built-in function arguments described below,
- the value of the resulting expression shall be the equivalent of:
- .sp
- .RS 4
- .nf
- $0 \(ti /\fIere\fR/
- .fi
- .P
- .RE
- .P
- The
- .IR ere
- argument to the
- .BR gsub ,
- .BR match ,
- .BR sub
- functions, and the
- .IR fs
- argument to the
- .BR split
- function (see
- .IR "String Functions")
- shall be interpreted as extended regular expressions. These can be
- either
- .BR ERE
- tokens or arbitrary expressions, and shall be interpreted in the same
- manner as the right-hand side of the
- .BR '\(ti'
- or
- .BR \(dq!\(ti\(dq
- operator.
- .P
- An extended regular expression can be used to separate fields by assigning
- a string containing the expression to the built-in variable
- .BR FS ,
- either directly or as a consequence of using the
- .BR \-F
- .IR sepstring
- option.
- The default value of the
- .BR FS
- variable shall be a single
- <space>.
- The following describes
- .BR FS
- behavior:
- .IP " 1." 4
- If
- .BR FS
- is a null string, the behavior is unspecified.
- .IP " 2." 4
- If
- .BR FS
- is a single character:
- .RS 4
- .IP " a." 4
- If
- .BR FS
- is
- <space>,
- skip leading and trailing
- <blank>
- and
- <newline>
- characters; fields shall be delimited by sets of one or more
- <blank>
- or
- <newline>
- characters.
- .IP " b." 4
- Otherwise, if
- .BR FS
- is any other character
- .IR c ,
- fields shall be delimited by each single occurrence of
- .IR c .
- .RE
- .IP " 3." 4
- Otherwise, the string value of
- .BR FS
- shall be considered to be an extended regular expression. Each
- occurrence of a sequence matching the extended regular expression shall
- delimit fields.
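- .P
- The three defined cases above can be sketched as follows. This is an
- informal example with assumed sample input, not normative text.
- .sp
- .RS 4
- .nf
```shell
# Case 2a: FS is a single <space>; leading and trailing blanks are skipped.
echo '  a   b  ' | awk '{ print NF }'

# Case 2b: FS is any other single character; each occurrence delimits.
echo 'a::b' | awk -F: '{ print NF }'

# Case 3: FS is longer than one character and is treated as an ERE.
echo 'a1b22c' | awk -F'[0-9]+' '{ print $1 "-" $2 "-" $3 }'
```
- .fi
- .P
- .RE
- .P
- The first command should report 2 fields, the second 3 (the middle
- field is empty), and the third should print the three letters joined
- by hyphens.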
- .P
- Except for the
- .BR '\(ti'
- and
- .BR \(dq!\(ti\(dq
- operators, and in the
- .BR gsub ,
- .BR match ,
- .BR split ,
- and
- .BR sub
- built-in functions, ERE matching shall be based on input records; that
- is, record separator characters (the first character of the value of
- the variable
- .BR RS ,
- <newline>
- by default) cannot be embedded in the expression, and no expression
- shall match the record separator character. If the record separator is
- not
- <newline>,
- <newline>
- characters embedded in the expression can be matched. For the
- .BR '\(ti'
- and
- .BR \(dq!\(ti\(dq
- operators, and in those four built-in functions, ERE matching shall be
- based on text strings; that is, any character (including
- <newline>
- and the record separator) can be embedded in the pattern, and an
- appropriate pattern shall match any character. However, in all
- .IR awk
- ERE matching, the use of one or more NUL characters in the pattern,
- input record, or text string produces undefined results.
- .SS "Patterns"
- .P
- A
- .IR pattern
- is any valid
- .IR expression ,
- a range specified by two expressions separated by a comma, or one of the
- two special patterns
- .BR BEGIN
- or
- .BR END .
- .SS "Special Patterns"
- .P
- The
- .IR awk
- utility shall recognize two special patterns,
- .BR BEGIN
- and
- .BR END .
- Each
- .BR BEGIN
- pattern shall be matched once and its associated action executed before
- the first record of input is read\(emexcept possibly by use of the
- .BR getline
- function (see
- .IR "Input/Output and General Functions")
- in a prior
- .BR BEGIN
- action\(emand before command line assignment is done. Each
- .BR END
- pattern shall be matched once and its associated action executed after
- the last record of input has been read. These two patterns shall have
- associated actions.
- .P
- .BR BEGIN
- and
- .BR END
- shall not combine with other patterns. Multiple
- .BR BEGIN
- and
- .BR END
- patterns shall be allowed. The actions associated with the
- .BR BEGIN
- patterns shall be executed in the order specified in the program, as
- are the
- .BR END
- actions. An
- .BR END
- pattern can precede a
- .BR BEGIN
- pattern in a program.
- .P
- If an
- .IR awk
- program consists of only actions with the pattern
- .BR BEGIN ,
- and the
- .BR BEGIN
- action contains no
- .BR getline
- function,
- .IR awk
- shall exit without reading its input when the last statement in the
- last
- .BR BEGIN
- action is executed. If an
- .IR awk
- program consists of only actions with the pattern
- .BR END
- or only actions with the patterns
- .BR BEGIN
- and
- .BR END ,
- the input shall be read before the statements in the
- .BR END
- actions are executed.
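- .P
- The ordering rules for the special patterns can be sketched as
- follows. This is an informal example, not normative text; the
- two-line input is an assumption.
- .sp
- .RS 4
- .nf
```shell
{ echo x; echo y; } | awk '
END   { print "END sees NR =", NR }
BEGIN { print "first BEGIN, NR =", NR }
BEGIN { print "second BEGIN" }'
```
- .fi
- .P
- .RE
- .P
- Although the
- .BR END
- item appears first in the program text, both
- .BR BEGIN
- actions should run first, in source order and with
- .BR NR
- equal to zero; the
- .BR END
- action then runs after the input has been read and reports 2.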
- .SS "Expression Patterns"
- .P
- An expression pattern shall be evaluated as if it were an expression in
- a Boolean context. If the result is true, the pattern shall be
- considered to match, and the associated action (if any) shall be
- executed. If the result is false, the action shall not be executed.
- .SS "Pattern Ranges"
- .P
- A pattern range consists of two expressions separated by a comma; in
- this case, the action shall be performed for all records between a
- match of the first expression and the following match of the second
- expression, inclusive. At this point, the pattern range can be repeated
- starting at input records subsequent to the end of the matched range.
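- .P
- A pattern range, including its repetition later in the input, can be
- sketched as follows. This is an informal example, not normative
- text; the words fed to
- .IR awk
- are assumed sample data.
- .sp
- .RS 4
- .nf
```shell
for w in a start b end c start d end e
do
    echo "$w"
done | awk '/start/,/end/ { print NR ": " $0 }'
```
- .fi
- .P
- .RE
- .P
- Records 2 to 4 and records 6 to 8 should be printed; records 1, 5,
- and 9 fall outside both ranges.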
- .SS "Actions"
- .P
- An action is a sequence of statements as shown in the grammar in
- .IR "Grammar".
- Any single statement can be replaced by a statement list enclosed in
- curly braces. The application shall ensure that statements in a
- statement list are separated by
- <newline>
- or
- <semicolon>
- characters. Statements in a statement list shall be executed sequentially
- in the order that they appear.
- .P
- The
- .IR expression
- acting as the conditional in an
- .BR if
- statement shall be evaluated and if it is non-zero or non-null, the
- following statement shall be executed; otherwise, if
- .BR else
- is present, the statement following the
- .BR else
- shall be executed.
- .P
- The
- .BR if ,
- .BR while ,
- .BR do .\|.\|.\c
- .BR while ,
- .BR for ,
- .BR break ,
- and
- .BR continue
- statements are based on the ISO\ C standard (see
- .IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard"),
- except that the Boolean expressions shall be treated as described in
- .IR "Expressions in awk",
- and except in the case of:
- .sp
- .RS 4
- .nf
- for (\fIvariable\fR in \fIarray\fR)
- .fi
- .P
- .RE
- .P
- which shall iterate, assigning each
- .IR index
- of
- .IR array
- to
- .IR variable
- in an unspecified order. The results of adding new elements to
- .IR array
- within such a
- .BR for
- loop are undefined. If a
- .BR break
- or
- .BR continue
- statement occurs outside of a loop, the behavior is undefined.
- .P
- The
- .BR delete
- statement shall remove an individual array element. Thus, the following
- code deletes an entire array:
- .sp
- .RS 4
- .nf
- for (index in array)
- delete array[index]
- .fi
- .P
- .RE
- .P
- The
- .BR next
- statement shall cause all further processing of the current input
- record to be abandoned. The behavior is undefined if a
- .BR next
- statement appears or is invoked in a
- .BR BEGIN
- or
- .BR END
- action.
- .P
- The
- .BR exit
- statement shall invoke all
- .BR END
- actions in the order in which they occur in the program source and then
- terminate the program without reading further input. An
- .BR exit
- statement inside an
- .BR END
- action shall terminate the program without further execution of
- .BR END
- actions. If an expression is specified in an
- .BR exit
- statement, its numeric value shall be the exit status of
- .IR awk ,
- unless subsequent errors are encountered or a subsequent
- .BR exit
- statement with an expression is executed.
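- .P
- The interaction between
- .BR exit
- and
- .BR END
- actions can be sketched as follows. This is an informal example, not
- normative text; the shell variable
- .IR status
- and the three-line input are assumptions.
- .sp
- .RS 4
- .nf
```shell
status=0
{ echo a; echo b; echo c; } | awk '
NR == 2 { exit 3 }
END     { print "END ran after " NR " records" }' || status=$?
echo "exit status: $status"
```
- .fi
- .P
- .RE
- .P
- The
- .BR END
- action should still run, reporting 2 records, and the exit status
- of
- .IR awk
- should be 3.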
- .SS "Output Statements"
- .P
- Both
- .BR print
- and
- .BR printf
- statements shall write to standard output by default. The output shall
- be written to the location specified by
- .IR output_redirection
- if one is supplied, as follows:
- .sp
- .RS 4
- .nf
- > \fIexpression\fR
- >> \fIexpression\fR
- | \fIexpression\fR
- .fi
- .P
- .RE
- .P
- In all cases, the
- .IR expression
- shall be evaluated to produce a string that is used as a pathname
- into which to write (for
- .BR '>'
- or
- .BR \(dq>>\(dq )
- or as a command to be executed (for
- .BR '|' ).
- With the first two forms, if the file of that name is not currently
- open, it shall be opened and, if necessary, created; with the first
- form, the file shall also be truncated. The output shall then be appended to the
- file. As long as the file remains open, subsequent calls in which
- .IR expression
- evaluates to the same string value shall simply append output to the
- file. The file remains open until the
- .BR close
- function (see
- .IR "Input/Output and General Functions")
- is called with an expression that evaluates to the same string value.
- .P
- The third form shall write output onto a stream piped to the input of a
- command. The stream shall be created if no stream is currently open
- with the value of
- .IR expression
- as its command name. The stream created shall be equivalent to one
- created by a call to the
- \fIpopen\fR()
- function defined in the System Interfaces volume of POSIX.1\(hy2017 with the value of
- .IR expression
- as the
- .IR command
- argument and a value of
- .IR w
- as the
- .IR mode
- argument. As long as the stream remains open, subsequent calls in which
- .IR expression
- evaluates to the same string value shall write output to the existing
- stream. The stream shall remain open until the
- .BR close
- function (see
- .IR "Input/Output and General Functions")
- is called with an expression that evaluates to the same string value.
- At that time, the stream shall be closed as if by a call to the
- \fIpclose\fR()
- function defined in the System Interfaces volume of POSIX.1\(hy2017.
- .P
- As described in detail by the grammar in
- .IR "Grammar",
- these output statements shall take a
- <comma>-separated
- list of
- .IR expression s
- referred to in the grammar by the non-terminal symbols
- .BR expr_list ,
- .BR print_expr_list ,
- or
- .BR print_expr_list_opt .
- This list is referred to here as the
- .IR "expression list" ,
- and each member is referred to as an
- .IR "expression argument" .
- .P
- The
- .BR print
- statement shall write the value of each expression argument onto the
- indicated output stream separated by the current output field separator
- (see variable
- .BR OFS
- above), and terminated by the output record separator (see variable
- .BR ORS
- above). All expression arguments shall be taken as strings, being
- converted if necessary; this conversion shall be as described in
- .IR "Expressions in awk",
- with the exception that the
- .BR printf
- format in
- .BR OFMT
- shall be used instead of the value in
- .BR CONVFMT .
- An empty expression list shall stand for the whole input record ($0).
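- .P
- The difference between
- .BR OFMT
- and
- .BR CONVFMT
- can be sketched as follows. This is an informal example, not
- normative text; the variable
- .IR x
- and the chosen format are assumptions.
- .sp
- .RS 4
- .nf
```shell
awk 'BEGIN {
    OFMT = "%.2f"
    x = 3.14159
    print x        # numeric argument to print: converted with OFMT
    print x ""     # concatenation: converted with CONVFMT (default %.6g)
    print 17       # integer values are output as integers
}'
```
- .fi
- .P
- .RE
- .P
- The first line of output should show two decimal places, the second
- should show the value in full, and the third should be unaffected by
- .BR OFMT .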
- .P
- The
- .BR printf
- statement shall produce output based on a notation similar to the
- File Format Notation used to describe file formats in this volume of POSIX.1\(hy2017 (see the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Chapter 5" ", " "File Format Notation").
- Output shall be produced as specified with the first
- .IR expression
- argument as the string
- .IR format
- and subsequent
- .IR expression
- arguments as the strings
- .IR arg1
- to
- .IR argn ,
- inclusive, with the following exceptions:
- .IP " 1." 4
- The
- .IR format
- shall be an actual character string rather than a graphical
- representation. Therefore, it cannot contain empty character
- positions. The
- <space>
- in the
- .IR format
- string, in any context other than a
- .IR flag
- of a conversion specification, shall be treated as an ordinary
- character that is copied to the output.
- .IP " 2." 4
- If the character set contains a
- .BR '\(*D'
- character and that character appears in the
- .IR format
- string, it shall be treated as an ordinary character that is copied to
- the output.
- .IP " 3." 4
- The
- .IR "escape sequences"
- beginning with a
- <backslash>
- character shall be treated as sequences of ordinary characters that are
- copied to the output. Note that these same sequences shall be interpreted
- lexically by
- .IR awk
- when they appear in literal strings, but they shall not be treated
- specially by the
- .BR printf
- statement.
- .IP " 4." 4
- A
- .IR "field width"
- or
- .IR precision
- can be specified as the
- .BR '*'
- character instead of a digit string. In this case the next argument
- from the expression list shall be fetched and its numeric value taken
- as the field width or precision.
- .IP " 5." 4
- The implementation shall not precede or follow output from the
- .BR d
- or
- .BR u
- conversion specifier characters with
- <blank>
- characters not specified by the
- .IR format
- string.
- .IP " 6." 4
- The implementation shall not precede output from the
- .BR o
- conversion specifier character with leading zeros not specified by the
- .IR format
- string.
- .IP " 7." 4
- For the
- .BR c
- conversion specifier character: if the argument has a numeric value, the
- character whose encoding is that value shall be output. If the value is
- zero or is not the encoding of any character in the character set, the
- behavior is undefined. If the argument does not have a numeric value,
- the first character of the string value shall be output; if the string
- does not contain any characters, the behavior is undefined.
- .IP " 8." 4
- For each conversion specification that consumes an argument, the next
- expression argument shall be evaluated. With the exception of the
- .BR c
- conversion specifier character, the value shall be converted (according
- to the rules specified in
- .IR "Expressions in awk")
- to the appropriate type for the conversion specification.
- .IP " 9." 4
- If there are insufficient expression arguments to satisfy all the
- conversion specifications in the
- .IR format
- string, the behavior is undefined.
- .IP 10. 4
- If any character sequence in the
- .IR format
- string begins with a
- .BR '%'
- character, but does not form a valid conversion specification, the
- behavior is unspecified.
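- .P
- Items 4 and 7 of the list above can be sketched as follows. This is
- an informal example, not normative text. It assumes a codeset in
- which the value 65 encodes the letter A, and it passes
- .BR ORS
- as a final argument only to end the line.
- .sp
- .RS 4
- .nf
```shell
awk 'BEGIN {
    # %*d   takes its field width (4) from the argument list
    # %-5s  left-justifies in a field of five characters
    # %c    prints the character for a numeric value, or the first
    #       character of a string value
    printf "%*d|%-5s|%c%c%s", 4, 42, "ab", 65, "xyz", ORS
}'
```
- .fi
- .P
- .RE
- .P
- Under those assumptions the number is right-justified in four
- columns, the string is padded to five, and the two
- .BR c
- conversions yield A and x.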
- .P
- Both
- .BR print
- and
- .BR printf
- can output at least
- {LINE_MAX}
- bytes.
- .SS "Functions"
- .P
- The
- .IR awk
- language has a variety of built-in functions: arithmetic, string,
- input/output, and general.
- .SS "Arithmetic Functions"
- .P
- The arithmetic functions, except for
- .BR int ,
- shall be based on the ISO\ C standard (see
- .IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard").
- The behavior is undefined in cases where the ISO\ C standard specifies that an
- error be returned or that the behavior is undefined. Although the
- grammar (see
- .IR "Grammar")
- permits built-in functions to appear with no arguments or parentheses,
- unless the argument or parentheses are indicated as optional in the
- following list (by displaying them within the
- .BR \(dq[]\(dq
- brackets), such use is undefined.
- .IP "\fBatan2\fR(\fIy\fR,\fIx\fR)" 10
- Return arctangent of \fIy\fP/\fIx\fR in radians in the range
- [\-\(*p,\(*p].
- .IP "\fBcos\fR(\fIx\fR)" 10
- Return cosine of \fIx\fP, where \fIx\fP is in radians.
- .IP "\fBsin\fR(\fIx\fR)" 10
- Return sine of \fIx\fP, where \fIx\fP is in radians.
- .IP "\fBexp\fR(\fIx\fR)" 10
- Return the exponential function of \fIx\fP.
- .IP "\fBlog\fR(\fIx\fR)" 10
- Return the natural logarithm of \fIx\fP.
- .IP "\fBsqrt\fR(\fIx\fR)" 10
- Return the square root of \fIx\fP.
- .IP "\fBint\fR(\fIx\fR)" 10
- Return the argument truncated to an integer. Truncation shall
- be toward 0 when \fIx\fP>0.
- .IP "\fBrand\fP(\|)" 10
- Return a random number \fIn\fP, such that 0\(<=\fIn\fP<1.
- .IP "\fBsrand\fR(\fB[\fIexpr\fB]\fR)" 10
- Set the seed value for
- .BR rand
- to
- .IR expr
- or use the time of day if
- .IR expr
- is omitted. The previous seed value shall be returned.
- .SS "String Functions"
- .P
- The string functions in the following list shall be supported.
- Although the grammar (see
- .IR "Grammar")
- permits built-in functions to appear with no arguments or parentheses,
- unless the argument or parentheses are indicated as optional in the
- following list (by displaying them within the
- .BR \(dq[]\(dq
- brackets), such use is undefined.
- .IP "\fBgsub\fR(\fIere\fR,\ \fIrepl\fB[\fR,\ \fIin\fB]\fR)" 10
- .br
- Behave like
- .BR sub
- (see below), except that it shall replace all occurrences of the
- regular expression (like the
- .IR ed
- utility global substitute) in $0 or in the
- .IR in
- argument, when specified.
- .IP "\fBindex\fR(\fIs\fR,\ \fIt\fR)" 10
- Return the position, in characters, numbering from 1, in string
- .IR s
- where string
- .IR t
- first occurs, or zero if it does not occur at all.
- .IP "\fBlength[\fR(\fB[\fIs\fB]\fR)\fB]\fR" 10
- Return the length, in characters, of its argument taken as a string, or
- of the whole record, $0, if there is no argument.
- .IP "\fBmatch\fR(\fIs\fR,\ \fIere\fR)" 10
- Return the position, in characters, numbering from 1, in string
- .IR s
- where the extended regular expression
- .IR ere
- occurs, or zero if it does not occur at all.
- .BR RSTART
- shall be set to the starting position (which is the same as the
- returned value), or to zero if no match is found;
- .BR RLENGTH
- shall be set to the length of the matched string, or to \-1 if no
- match is found.
- .IP "\fBsplit\fR(\fIs\fR,\ \fIa\fB[\fR,\ \fIfs\ \fB]\fR)" 10
- .br
- Split the string
- .IR s
- into array elements
- .IR a [1],
- .IR a [2],
- \&.\|.\|.,
- .IR a [ n ],
- and return
- .IR n .
- All elements of the array shall be deleted before the split is
- performed. The separation shall be done with the ERE
- .IR fs
- or with the field separator
- .BR FS
- if
- .IR fs
- is not given. Each array element shall have a string value when created
- and, if appropriate, the array element shall be considered a numeric
- string (see
- .IR "Expressions in awk").
- The effect of a null string as the value of
- .IR fs
- is unspecified.
- .IP "\fBsprintf\fR(\fIfmt\fR,\ \fIexpr\fR,\ \fIexpr\fR,\ .\|.\|.)" 10
- .br
- Format the expressions according to the
- .BR printf
- format given by
- .IR fmt
- and return the resulting string.
- .IP "\fBsub\fR(\fIere\fR,\ \fIrepl\fB[\fR,\ \fIin\fB]\fR)" 10
- .br
- Substitute the string
- .IR repl
- in place of the first instance of the extended regular expression
- .IR ere
- in string
- .IR in
- and return the number of substitutions. An
- <ampersand>
- (\c
- .BR '&' )
- appearing in the string
- .IR repl
- shall be replaced by the string from
- .IR in
- that matches the ERE. An
- <ampersand>
- preceded with a
- <backslash>
- shall be interpreted as the literal
- <ampersand>
- character. An occurrence of two consecutive
- <backslash>
- characters shall be interpreted as just a single literal
- <backslash>
- character. Any other occurrence of a
- <backslash>
- (for example, preceding any other character) shall be treated as a
- literal
- <backslash>
- character. Note that if
- .IR repl
- is a string literal (the lexical token
- .BR STRING ;
- see
- .IR "Grammar"),
- the handling of the
- <ampersand>
- character occurs after any lexical processing, including any lexical
- <backslash>-escape
- sequence processing. If
- .IR in
- is specified and it is not an lvalue (see
- .IR "Expressions in awk"),
- the behavior is undefined. If
- .IR in
- is omitted,
- .IR awk
- shall use the current record ($0) in its place.
- .IP "\fBsubstr\fR(\fIs\fR,\ \fIm\fB[\fR,\ \fIn\ \fB]\fR)" 10
- .br
- Return the at most
- .IR n -character
- substring of
- .IR s
- that begins at position
- .IR m ,
- numbering from 1. If
- .IR n
- is omitted, or if
- .IR n
- specifies more characters than are left in the string, the length of
- the substring shall be limited by the length of the string
- .IR s .
- .IP "\fBtolower\fR(\fIs\fR)" 10
- Return a string based on the string
- .IR s .
- Each character in
- .IR s
- that is an uppercase letter specified to have a
- .BR tolower
- mapping by the
- .IR LC_CTYPE
- category of the current locale shall be replaced in the returned string
- by the lowercase letter specified by the mapping. Other characters in
- .IR s
- shall be unchanged in the returned string.
- .IP "\fBtoupper\fR(\fIs\fR)" 10
- Return a string based on the string
- .IR s .
- Each character in
- .IR s
- that is a lowercase letter specified to have a
- .BR toupper
- mapping by the
- .IR LC_CTYPE
- category of the current locale shall be replaced in the returned string by
- the uppercase letter specified by the mapping. Other characters in
- .IR s
- shall be unchanged in the returned string.
- .P
- All of the preceding functions that take
- .IR ere
- as a parameter expect a pattern or a string-valued expression that is a
- regular expression as defined in
- .IR "Regular Expressions".
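- .P
- Several of the string functions can be sketched together as follows.
- This is an informal example, not normative text; the variable names
- .IR s ,
- .IR n ,
- .IR pos ,
- .IR k ,
- and
- .IR parts
- and the sample strings are hypothetical.
- .sp
- .RS 4
- .nf
```shell
awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "[&]", s)         # & stands for the matched text
    print n, s

    pos = match("foobar", /o+/)     # also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH

    k = split("a:b:c", parts, ":")  # parts[1] to parts[k]
    print k, parts[3]

    print substr("abcdef", 3, 2), index("abcdef", "cd"), length("abc")
}'
```
- .fi
- .P
- .RE
- .P
- The first line of output should report two substitutions with each
- matched letter bracketed; the second should show the return value of
- .BR match
- equal to
- .BR RSTART ;
- the remaining lines show the element count from
- .BR split
- and positions numbered from 1.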
- .SS "Input/Output and General Functions"
- .P
- The input/output and general functions are:
- .IP "\fBclose\fR(\fIexpression\fR)" 10
- .br
- Close the file or pipe opened by a
- .BR print
- or
- .BR printf
- statement or a call to
- .BR getline
- with the same string-valued
- .IR expression .
- The limit on the number of open
- .IR expression
- arguments is implementation-defined. If the close was successful, the
- function shall return zero; otherwise, it shall return non-zero.
- .IP "\fIexpression\ |\ \fBgetline\ [\fIvar\fB]\fR" 10
- .br
- Read a record of input from a stream piped from the output of a
- command. The stream shall be created if no stream is currently open
- with the value of
- .IR expression
- as its command name. The stream created shall be equivalent to one
- created by a call to the
- \fIpopen\fR()
- function with the value of
- .IR expression
- as the
- .IR command
- argument and a value of
- .IR r
- as the
- .IR mode
- argument. As long as the stream remains open, subsequent calls in which
- .IR expression
- evaluates to the same string value shall read subsequent records from
- the stream. The stream shall remain open until the
- .BR close
- function is called with an expression that evaluates to the same string
- value. At that time, the stream shall be closed as if by a call to the
- \fIpclose\fR()
- function. If
- .IR var
- is omitted, $0 and
- .BR NF
- shall be set; otherwise,
- .IR var
- shall be set and, if appropriate, it shall be considered a numeric
- string (see
- .IR "Expressions in awk").
- .RS 10
- .P
- The
- .BR getline
- operator can form ambiguous constructs when there are unparenthesized
- operators (including concatenate) to the left of the
- .BR '|'
- (to the beginning of the expression containing
- .BR getline ).
- In the context of the
- .BR '$'
- operator,
- .BR '|'
- shall behave as if it had a lower precedence than
- .BR '$' .
- The result of evaluating other operators is unspecified, and conforming
- applications shall parenthesize properly all such usages.
- .RE
- .IP "\fBgetline\fR" 10
- Set $0 to the next input record from the current input file. This form
- of
- .BR getline
- shall set the
- .BR NF ,
- .BR NR ,
- and
- .BR FNR
- variables.
- .IP "\fBgetline\ \fIvar\fR" 10
- Set variable
- .IR var
- to the next input record from the current input file and, if
- appropriate,
- .IR var
- shall be considered a numeric string (see
- .IR "Expressions in awk").
- This form of
- .BR getline
- shall set the
- .BR FNR
- and
- .BR NR
- variables.
- .IP "\fBgetline\ \fB[\fIvar\fB]\ \fR<\ \fIexpression\fR" 10
- .br
- Read the next record of input from a named file. The
- .IR expression
- shall be evaluated to produce a string that is used as a pathname.
- If the file of that name is not currently open, it shall be opened. As
- long as the stream remains open, subsequent calls in which
- .IR expression
- evaluates to the same string value shall read subsequent records from
- the file. The file shall remain open until the
- .BR close
- function is called with an expression that evaluates to the same string
- value. If
- .IR var
- is omitted, $0 and
- .BR NF
- shall be set; otherwise,
- .IR var
- shall be set and, if appropriate, it shall be considered a numeric
- string (see
- .IR "Expressions in awk").
- .RS 10
- .P
- The
- .BR getline
- operator can form ambiguous constructs when there are unparenthesized
- binary operators (including concatenate) to the right of the
- .BR '<'
- (up to the end of the expression containing the
- .BR getline ).
- The result of evaluating such a construct is unspecified, and conforming
- applications shall parenthesize properly all such usages.
- .RE
- .IP "\fBsystem\fR(\fIexpression\fR)" 10
- .br
- Execute the command given by
- .IR expression
- in a manner equivalent to the
- \fIsystem\fR()
- function defined in the System Interfaces volume of POSIX.1\(hy2017 and return the exit status of the
- command.
- .P
- All forms of
- .BR getline
- shall return 1 for successful input, zero for end-of-file, and \-1
- for an error.
- .P
- Where strings are used as the name of a file or pipeline, the
- application shall ensure that the strings are textually identical. The
- terminology ``same string value'' implies that ``equivalent strings'',
- even those that differ only by
- <space>
- characters, represent different files.
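- .P
- Reading from a command with
- .BR getline
- can be sketched as follows. This is an informal example, not
- normative text; the names
- .IR cmd ,
- .IR line ,
- and
- .IR n
- and the command string are assumptions.
- .sp
- .RS 4
- .nf
```shell
awk 'BEGIN {
    cmd = "echo one; echo two"
    # Parenthesize the getline expression and compare with > 0, so that
    # an error return of -1 does not loop forever.
    while ((cmd | getline line) > 0)
        n++
    close(cmd)      # must be the same string value used to open the stream
    print n, line
}'
```
- .fi
- .P
- .RE
- .P
- Two records should be read, and
- .IR line
- should still hold the last record after end-of-file is reported.
- Holding the command in a single variable avoids the textual
- mismatch described above when calling
- .BR close .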
- .SS "User-Defined Functions"
- .P
- The
- .IR awk
- language also provides user-defined functions. Such functions can be
- defined as:
- .sp
- .RS 4
- .nf
- function \fIname\fR(\fB[\fIparameter\fR, ...\fB]\fR) { \fIstatements\fR }
- .fi
- .P
- .RE
- .P
- A function can be referred to anywhere in an
- .IR awk
- program; in particular, its use can precede its definition. The scope
- of a function is global.
- .P
- Function parameters, if present, can be either scalars or arrays; the
- behavior is undefined if an array name is passed as a parameter that
- the function uses as a scalar, or if a scalar expression is passed as a
- parameter that the function uses as an array. Function parameters shall
- be passed by value if scalar and by reference if array name.
- .P
- The number of parameters in the function definition need not match the
- number of parameters in the function call. Excess formal parameters can
- be used as local variables. If fewer arguments are supplied in a
- function call than are in the function definition, the extra parameters
- that are used in the function body as scalars shall evaluate to the
- uninitialized value until they are otherwise initialized, and the extra
- parameters that are used in the function body as arrays shall be
- treated as uninitialized arrays where each element evaluates to the
- uninitialized value until otherwise initialized.
- .P
- When invoking a function, no white space can be placed between the
- function name and the opening parenthesis. Function calls can be nested
- and recursive calls can be made upon functions. Upon return from any
- nested or recursive function call, the values of all of the calling
- function's parameters shall be unchanged, except for array parameters
- passed by reference. The
- .BR return
- statement can be used to return a value. If a
- .BR return
- statement appears outside of a function definition, the behavior is
- undefined.
- .P
- In the function definition,
- <newline>
- characters shall be optional before the opening brace and after the
- closing brace. Function definitions can appear anywhere in the program
- where a
- .IR pattern-action
- pair is allowed.
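- .P
- The parameter-passing rules can be sketched as follows. This is an
- informal example, not normative text; the function name
- .IR bump
- and all variable names are hypothetical.
- .sp
- .RS 4
- .nf
```shell
awk '
function bump(arr, x,    i) {  # i is an excess formal parameter, used as a local
    arr["n"]++                 # array passed by reference: caller sees the change
    x++                        # scalar passed by value: caller is unaffected
    i = 99                     # local to this call; the global i is untouched
}
BEGIN {
    a["n"] = 1; v = 1; i = 5
    bump(a, v)
    print a["n"], v, i
}'
```
- .fi
- .P
- .RE
- .P
- Only the array element should change, so the values printed should
- be 2, 1, and 5. The extra spaces before
- .IR i
- in the parameter list are a common convention for marking locals and
- have no syntactic meaning.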
- .SS "Grammar"
- .P
- The grammar in this section and the lexical conventions in the
- following section shall together describe the syntax for
- .IR awk
- programs. The general conventions for this style of grammar are
- described in
- .IR "Section 1.3" ", " "Grammar Conventions".
- A valid program can be represented as the non-terminal symbol
- .IR program
- in the grammar. This formal syntax shall take precedence over the
- preceding text syntax description.
- .sp
- .RS 4
- .nf
- %token NAME NUMBER STRING ERE
- %token FUNC_NAME /* Name followed by \(aq(\(aq without white space. */
- .P
- /* Keywords */
- %token Begin End
- /* \(aqBEGIN\(aq \(aqEND\(aq */
- .P
- %token Break Continue Delete Do Else
- /* \(aqbreak\(aq \(aqcontinue\(aq \(aqdelete\(aq \(aqdo\(aq \(aqelse\(aq */
- .P
- %token Exit For Function If In
- /* \(aqexit\(aq \(aqfor\(aq \(aqfunction\(aq \(aqif\(aq \(aqin\(aq */
- .P
- %token Next Print Printf Return While
- /* \(aqnext\(aq \(aqprint\(aq \(aqprintf\(aq \(aqreturn\(aq \(aqwhile\(aq */
- .P
- /* Reserved function names */
- %token BUILTIN_FUNC_NAME
- /* One token for the following:
- * atan2 cos sin exp log sqrt int rand srand
- * gsub index length match split sprintf sub
- * substr tolower toupper close system
- */
- %token GETLINE
- /* Syntactically different from other built-ins. */
- .P
- /* Two-character tokens. */
- %token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN
- /* \(aq+=\(aq \(aq-=\(aq \(aq*=\(aq \(aq/=\(aq \(aq%=\(aq \(aq\(ha=\(aq */
- .P
- %token OR AND NO_MATCH EQ LE GE NE INCR DECR APPEND
- /* \(aq||\(aq \(aq&&\(aq \(aq!\^\(ti\(aq \(aq==\(aq \(aq<=\(aq \(aq>=\(aq \(aq!=\(aq \(aq++\(aq \(aq--\(aq \(aq>>\(aq */
- .P
- /* One-character tokens. */
- %token \(aq{\(aq \(aq}\(aq \(aq(\(aq \(aq)\(aq \(aq[\(aq \(aq]\(aq \(aq,\(aq \(aq;\(aq NEWLINE
- %token \(aq+\(aq \(aq-\(aq \(aq*\(aq \(aq%\(aq \(aq\(ha\(aq \(aq!\(aq \(aq>\(aq \(aq<\(aq \(aq|\(aq \(aq?\(aq \(aq:\(aq \(aq\(ti\(aq \(aq$\(aq \(aq=\(aq
- .P
- %start program
- %%
- .P
- program : item_list
- | item_list item
- ;
- .P
- item_list : /* empty */
- | item_list item terminator
- ;
- .P
- item : action
- | pattern action
- | normal_pattern
- | Function NAME \(aq(\(aq param_list_opt \(aq)\(aq
- newline_opt action
- | Function FUNC_NAME \(aq(\(aq param_list_opt \(aq)\(aq
- newline_opt action
- ;
- .P
- param_list_opt : /* empty */
- | param_list
- ;
- .P
- param_list : NAME
- | param_list \(aq,\(aq NAME
- ;
- .P
- pattern : normal_pattern
- | special_pattern
- ;
- .P
- normal_pattern : expr
- | expr \(aq,\(aq newline_opt expr
- ;
- .P
- special_pattern : Begin
- | End
- ;
- .P
- action : \(aq{\(aq newline_opt \(aq}\(aq
- | \(aq{\(aq newline_opt terminated_statement_list \(aq}\(aq
- | \(aq{\(aq newline_opt unterminated_statement_list \(aq}\(aq
- ;
- .P
- terminator : terminator NEWLINE
- | \(aq;\(aq
- | NEWLINE
- ;
- .P
- terminated_statement_list : terminated_statement
- | terminated_statement_list terminated_statement
- ;
- .P
- unterminated_statement_list : unterminated_statement
- | terminated_statement_list unterminated_statement
- ;
- .P
- terminated_statement : action newline_opt
- | If \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement
- | If \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement
- Else newline_opt terminated_statement
- | While \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement
- | For \(aq(\(aq simple_statement_opt \(aq;\(aq
- expr_opt \(aq;\(aq simple_statement_opt \(aq)\(aq newline_opt
- terminated_statement
- | For \(aq(\(aq NAME In NAME \(aq)\(aq newline_opt
- terminated_statement
- | \(aq;\(aq newline_opt
- | terminatable_statement NEWLINE newline_opt
- | terminatable_statement \(aq;\(aq newline_opt
- ;
- .P
- unterminated_statement : terminatable_statement
- | If \(aq(\(aq expr \(aq)\(aq newline_opt unterminated_statement
- | If \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement
- Else newline_opt unterminated_statement
- | While \(aq(\(aq expr \(aq)\(aq newline_opt unterminated_statement
- | For \(aq(\(aq simple_statement_opt \(aq;\(aq
- expr_opt \(aq;\(aq simple_statement_opt \(aq)\(aq newline_opt
- unterminated_statement
- | For \(aq(\(aq NAME In NAME \(aq)\(aq newline_opt
- unterminated_statement
- ;
- .P
- terminatable_statement : simple_statement
- | Break
- | Continue
- | Next
- | Exit expr_opt
- | Return expr_opt
- | Do newline_opt terminated_statement While \(aq(\(aq expr \(aq)\(aq
- ;
- .P
- simple_statement_opt : /* empty */
- | simple_statement
- ;
- .P
- simple_statement : Delete NAME \(aq[\(aq expr_list \(aq]\(aq
- | expr
- | print_statement
- ;
- .P
- print_statement : simple_print_statement
- | simple_print_statement output_redirection
- ;
- .P
- simple_print_statement : Print print_expr_list_opt
- | Print \(aq(\(aq multiple_expr_list \(aq)\(aq
- | Printf print_expr_list
- | Printf \(aq(\(aq multiple_expr_list \(aq)\(aq
- ;
- .P
- output_redirection : \(aq>\(aq expr
- | APPEND expr
- | \(aq|\(aq expr
- ;
- .P
- expr_list_opt : /* empty */
- | expr_list
- ;
- .P
- expr_list : expr
- | multiple_expr_list
- ;
- .P
- multiple_expr_list : expr \(aq,\(aq newline_opt expr
- | multiple_expr_list \(aq,\(aq newline_opt expr
- ;
- .P
- expr_opt : /* empty */
- | expr
- ;
- .P
- expr : unary_expr
- | non_unary_expr
- ;
- .P
- unary_expr : \(aq+\(aq expr
- | \(aq-\(aq expr
- | unary_expr \(aq\(ha\(aq expr
- | unary_expr \(aq*\(aq expr
- | unary_expr \(aq/\(aq expr
- | unary_expr \(aq%\(aq expr
- | unary_expr \(aq+\(aq expr
- | unary_expr \(aq-\(aq expr
- | unary_expr non_unary_expr
- | unary_expr \(aq<\(aq expr
- | unary_expr LE expr
- | unary_expr NE expr
- | unary_expr EQ expr
- | unary_expr \(aq>\(aq expr
- | unary_expr GE expr
- | unary_expr \(aq\(ti\(aq expr
- | unary_expr NO_MATCH expr
- | unary_expr In NAME
- | unary_expr AND newline_opt expr
- | unary_expr OR newline_opt expr
- | unary_expr \(aq?\(aq expr \(aq:\(aq expr
- | unary_input_function
- ;
- .P
- non_unary_expr : \(aq(\(aq expr \(aq)\(aq
- | \(aq!\(aq expr
- | non_unary_expr \(aq\(ha\(aq expr
- | non_unary_expr \(aq*\(aq expr
- | non_unary_expr \(aq/\(aq expr
- | non_unary_expr \(aq%\(aq expr
- | non_unary_expr \(aq+\(aq expr
- | non_unary_expr \(aq-\(aq expr
- | non_unary_expr non_unary_expr
- | non_unary_expr \(aq<\(aq expr
- | non_unary_expr LE expr
- | non_unary_expr NE expr
- | non_unary_expr EQ expr
- | non_unary_expr \(aq>\(aq expr
- | non_unary_expr GE expr
- | non_unary_expr \(aq\(ti\(aq expr
- | non_unary_expr NO_MATCH expr
- | non_unary_expr In NAME
- | \(aq(\(aq multiple_expr_list \(aq)\(aq In NAME
- | non_unary_expr AND newline_opt expr
- | non_unary_expr OR newline_opt expr
- | non_unary_expr \(aq?\(aq expr \(aq:\(aq expr
- | NUMBER
- | STRING
- | lvalue
- | ERE
- | lvalue INCR
- | lvalue DECR
- | INCR lvalue
- | DECR lvalue
- | lvalue POW_ASSIGN expr
- | lvalue MOD_ASSIGN expr
- | lvalue MUL_ASSIGN expr
- | lvalue DIV_ASSIGN expr
- | lvalue ADD_ASSIGN expr
- | lvalue SUB_ASSIGN expr
- | lvalue \(aq=\(aq expr
- | FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq
- /* no white space allowed before \(aq(\(aq */
- | BUILTIN_FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq
- | BUILTIN_FUNC_NAME
- | non_unary_input_function
- ;
- .P
- print_expr_list_opt : /* empty */
- | print_expr_list
- ;
- .P
- print_expr_list : print_expr
- | print_expr_list \(aq,\(aq newline_opt print_expr
- ;
- .P
- print_expr : unary_print_expr
- | non_unary_print_expr
- ;
- .P
- unary_print_expr : \(aq+\(aq print_expr
- | \(aq-\(aq print_expr
- | unary_print_expr \(aq\(ha\(aq print_expr
- | unary_print_expr \(aq*\(aq print_expr
- | unary_print_expr \(aq/\(aq print_expr
- | unary_print_expr \(aq%\(aq print_expr
- | unary_print_expr \(aq+\(aq print_expr
- | unary_print_expr \(aq-\(aq print_expr
- | unary_print_expr non_unary_print_expr
- | unary_print_expr \(aq\(ti\(aq print_expr
- | unary_print_expr NO_MATCH print_expr
- | unary_print_expr In NAME
- | unary_print_expr AND newline_opt print_expr
- | unary_print_expr OR newline_opt print_expr
- | unary_print_expr \(aq?\(aq print_expr \(aq:\(aq print_expr
- ;
- .P
- non_unary_print_expr : \(aq(\(aq expr \(aq)\(aq
- | \(aq!\(aq print_expr
- | non_unary_print_expr \(aq\(ha\(aq print_expr
- | non_unary_print_expr \(aq*\(aq print_expr
- | non_unary_print_expr \(aq/\(aq print_expr
- | non_unary_print_expr \(aq%\(aq print_expr
- | non_unary_print_expr \(aq+\(aq print_expr
- | non_unary_print_expr \(aq-\(aq print_expr
- | non_unary_print_expr non_unary_print_expr
- | non_unary_print_expr \(aq\(ti\(aq print_expr
- | non_unary_print_expr NO_MATCH print_expr
- | non_unary_print_expr In NAME
- | \(aq(\(aq multiple_expr_list \(aq)\(aq In NAME
- | non_unary_print_expr AND newline_opt print_expr
- | non_unary_print_expr OR newline_opt print_expr
- | non_unary_print_expr \(aq?\(aq print_expr \(aq:\(aq print_expr
- | NUMBER
- | STRING
- | lvalue
- | ERE
- | lvalue INCR
- | lvalue DECR
- | INCR lvalue
- | DECR lvalue
- | lvalue POW_ASSIGN print_expr
- | lvalue MOD_ASSIGN print_expr
- | lvalue MUL_ASSIGN print_expr
- | lvalue DIV_ASSIGN print_expr
- | lvalue ADD_ASSIGN print_expr
- | lvalue SUB_ASSIGN print_expr
- | lvalue \(aq=\(aq print_expr
- | FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq
- /* no white space allowed before \(aq(\(aq */
- | BUILTIN_FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq
- | BUILTIN_FUNC_NAME
- ;
- .P
- lvalue : NAME
- | NAME \(aq[\(aq expr_list \(aq]\(aq
- | \(aq$\(aq expr
- ;
- .P
- non_unary_input_function : simple_get
- | simple_get \(aq<\(aq expr
- | non_unary_expr \(aq|\(aq simple_get
- ;
- .P
- unary_input_function : unary_expr \(aq|\(aq simple_get
- ;
- .P
- simple_get : GETLINE
- | GETLINE lvalue
- ;
- .P
- newline_opt : /* empty */
- | newline_opt NEWLINE
- ;
- .fi
- .P
- .RE
- .P
- This grammar has several ambiguities that shall be resolved as
- follows:
- .IP " *" 4
- Operator precedence and associativity shall be as described in
- .IR "Table 4-1, Expressions in Decreasing Precedence in \fIawk\fP".
- .IP " *" 4
- In case of ambiguity, an
- .BR else
- shall be associated with the most immediately preceding
- .BR if
- that would satisfy the grammar.
- .IP " *" 4
- In some contexts, a
- <slash>
- (\c
- .BR '/' )
- that is used to surround an ERE could also be the division operator.
- This shall be resolved in such a way that wherever the division
- operator could appear, a
- <slash>
- is assumed to be the division operator. (There is no unary division
- operator.)
- .P
- Each expression in an
- .IR awk
- program shall conform to the precedence and associativity rules, even
- when this is not needed to resolve an ambiguity. For example, because
- .BR '$'
- has higher precedence than
- .BR '++' ,
- the string
- .BR \(dq$x++--\(dq
- is not a valid
- .IR awk
- expression, even though it is unambiguously parsed by the grammar as
- .BR \(dq$(x++)--\(dq .
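- .P
- As an informal sketch of this precedence (the variable name
- .BR i
- is arbitrary), the following action increments the first field rather
- than the variable, because
- .BR \(dq$i++\(dq
- is grouped as
- .BR \(dq($i)++\(dq :
- .sp
- .RS 4
- .nf
- { i = 1; $i++; print $1, i }
- .fi
- .P
- .RE
- .P
- For an input line consisting of 5, this writes 6 1.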
- .P
- One convention that might not be obvious from the formal grammar is
- where
- <newline>
- characters are acceptable. There are several obvious placements such as
- terminating a statement, and a
- <backslash>
- can be used to escape
- <newline>
- characters between any lexical tokens. In addition,
- <newline>
- characters without
- <backslash>
- characters can follow a comma, an open brace, a logical AND operator (\c
- .BR \(dq&&\(dq ),
- a logical OR operator (\c
- .BR \(dq||\(dq ),
- the
- .BR do
- keyword, the
- .BR else
- keyword, and the closing parenthesis of an
- .BR if ,
- .BR for ,
- or
- .BR while
- statement. For example:
- .sp
- .RS 4
- .nf
- { print $1,
- $2 }
- .fi
- .P
- .RE
- .SS "Lexical Conventions"
- .P
- The lexical conventions for
- .IR awk
- programs, with respect to the preceding grammar, shall be as follows:
- .IP " 1." 4
- Except as noted,
- .IR awk
- shall recognize the longest possible token or delimiter beginning at a
- given point.
- .IP " 2." 4
- A comment shall consist of any characters beginning with the
- <number-sign>
- character and terminated by, but excluding the next occurrence of, a
- <newline>.
- Comments shall have no effect, except to delimit lexical tokens.
- .IP " 3." 4
- The
- <newline>
- shall be recognized as the token
- .BR NEWLINE .
- .IP " 4." 4
- A
- <backslash>
- character immediately followed by a
- <newline>
- shall have no effect.
- .IP " 5." 4
- The token
- .BR STRING
- shall represent a string constant. A string constant shall begin with
- the character
- .BR '\&"' .
- Within a string constant, a
- <backslash>
- character shall be considered to begin an escape sequence as specified
- in the table in the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Chapter 5" ", " "File Format Notation"
- (\c
- .BR '\e\e' ,
- .BR '\ea' ,
- .BR '\eb' ,
- .BR '\ef' ,
- .BR '\en' ,
- .BR '\er' ,
- .BR '\et' ,
- .BR '\ev' ).
- In addition, the escape sequences in
- .IR "Table 4-2, Escape Sequences in \fIawk\fP"
- shall be recognized. A
- <newline>
- shall not occur within a string constant. A string constant shall be
- terminated by the first unescaped occurrence of the character
- .BR '\&"'
- after the one that begins the string constant. The value of the string
- shall be the sequence of all unescaped characters and values of escape
- sequences between, but not including, the two delimiting
- .BR '\&"'
- characters.
- .IP " 6." 4
- The token
- .BR ERE
- represents an extended regular expression constant. An ERE constant
- shall begin with the
- <slash>
- character. Within an ERE constant, a
- <backslash>
- character shall be considered to begin an escape sequence as
- specified in the table in the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Chapter 5" ", " "File Format Notation".
- In addition, the escape sequences in
- .IR "Table 4-2, Escape Sequences in \fIawk\fP"
- shall be recognized. The application shall ensure that a
- <newline>
- does not occur within an ERE constant. An ERE constant shall be
- terminated by the first unescaped occurrence of the
- <slash>
- character after the one that begins the ERE constant. The extended regular
- expression represented by the ERE constant shall be the sequence of all
- unescaped characters and values of escape sequences between, but not
- including, the two delimiting
- <slash>
- characters.
- .IP " 7." 4
- A
- <blank>
- shall have no effect, except to delimit lexical tokens or within
- .BR STRING
- or
- .BR ERE
- tokens.
- .IP " 8." 4
- The token
- .BR NUMBER
- shall represent a numeric constant. Its form and numeric value shall
- either be equivalent to the
- .BR decimal-floating-constant
- token as specified by the ISO\ C standard, or it shall be a sequence of decimal
- digits and shall be evaluated as an integer constant in decimal. In
- addition, implementations may accept numeric constants with the form
- and numeric value equivalent to the
- .BR hexadecimal-constant
- and
- .BR hexadecimal-floating-constant
- tokens as specified by the ISO\ C standard.
- .RS 4
- .P
- If the value is too large or too small to be representable (see
- .IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard"),
- the behavior is undefined.
- .RE
- .IP " 9." 4
- A sequence of underscores, digits, and alphabetics from the portable
- character set (see the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 6.1" ", " "Portable Character Set"),
- beginning with an
- <underscore>
- or alphabetic character, shall be considered a word.
- .IP 10. 4
- The following words are keywords that shall be recognized as individual
- tokens; the name of the token is the same as the keyword:
- .TS
- tab(@);
- lw(0.6i)eB leB leB leB leB leB.
- T{
- .nf
- BEGIN
- break
- continue
- T}@T{
- .nf
- delete
- do
- else
- T}@T{
- .nf
- END
- exit
- for
- T}@T{
- .nf
- function
- getline
- if
- T}@T{
- .nf
- in
- next
- print
- T}@T{
- .nf
- printf
- return
- while
- T}
- .TE
- .IP 11. 4
- The following words are names of built-in functions and shall be
- recognized as the token
- .BR BUILTIN_FUNC_NAME :
- .TS
- tab(@);
- lw(0.6i)eB leB leB leB leB leB.
- T{
- .nf
- atan2
- close
- cos
- exp
- T}@T{
- .nf
- gsub
- index
- int
- length
- T}@T{
- .nf
- log
- match
- rand
- sin
- T}@T{
- .nf
- split
- sprintf
- sqrt
- srand
- T}@T{
- .nf
- sub
- substr
- system
- tolower
- T}@T{
- .nf
- toupper
- .fi
- T}
- .TE
- .RS 4
- .P
- The above-listed keywords and names of built-in functions are
- considered reserved words.
- .RE
- .IP 12. 4
- The token
- .BR NAME
- shall consist of a word that is not a keyword or a name of a built-in
- function and is not followed immediately (without any delimiters) by
- the
- .BR '('
- character.
- .IP 13. 4
- The token
- .BR FUNC_NAME
- shall consist of a word that is not a keyword or a name of a built-in
- function, followed immediately (without any delimiters) by the
- .BR '('
- character. The
- .BR '('
- character shall not be included as part of the token.
- .IP 14. 4
- The following two-character sequences shall be recognized as the named
- tokens:
- .TS
- box center tab(@);
- cB | cB | cB | cB
- lB | cf5 | lB | cf5.
- Token Name@Sequence@Token Name@Sequence
- _
- ADD_ASSIGN@+=@NO_MATCH@!\(ti
- SUB_ASSIGN@\-=@EQ@==
- MUL_ASSIGN@*=@LE@<=
- DIV_ASSIGN@/=@GE@>=
- MOD_ASSIGN@%=@NE@!=
- POW_ASSIGN@\(ha=@INCR@++
- OR@||@DECR@\-\|\-
- AND@&&@APPEND@>>
- .TE
- .IP 15. 4
- The following single characters shall be recognized as tokens whose
- names are the character:
- .RS 4
- .sp
- .RS 4
- .nf
- <newline> { } ( ) [ ] , ; + - * % \(ha ! > < | ? : \(ti $ =
- .fi
- .P
- .RE
- .RE
- .P
- There is a lexical ambiguity between the token
- .BR ERE
- and the tokens
- .BR '/'
- and
- .BR DIV_ASSIGN .
- When an input sequence begins with a
- <slash>
- character in any syntactic context where the token
- .BR '/'
- or
- .BR DIV_ASSIGN
- could appear as the next token in a valid program, the longer of those
- two tokens that can be recognized shall be recognized. In any other
- syntactic context where the token
- .BR ERE
- could appear as the next token in a valid program, the token
- .BR ERE
- shall be recognized.
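- .P
- As an informal sketch of this rule (the variable name
- .BR n
- is hypothetical), in the following program the first
- <slash>
- follows the
- .BR \(aq\(ti\(aq
- operator, where neither
- .BR '/'
- nor
- .BR DIV_ASSIGN
- could appear, so it begins an
- .BR ERE
- token; the second
- <slash>
- follows an lvalue, where a division operator could appear, so
- .BR \(dq/=\(dq
- is recognized as
- .BR DIV_ASSIGN :
- .sp
- .RS 4
- .nf
- BEGIN       { n = 1024 }
- $0 \(ti /=/ { n /= 2 }
- END         { print n }
- .fi
- .P
- .RE
- .P
- The value of
- .BR n
- is halved once for each input line that contains an equals sign.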
- .SH "EXIT STATUS"
- The following exit values shall be returned:
- .IP "\00" 6
- All input files were processed successfully.
- .IP >0 6
- An error occurred.
- .P
- The exit status can be altered within the program by using an
- .BR exit
- statement with an expression.
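- .P
- As an informal sketch (the pattern
- .BR /ERROR/
- and the variable name
- .BR errors
- are hypothetical), the following program, assuming all input is read
- successfully, exits with status 1 if any input line contains the
- string ERROR and with status 0 otherwise:
- .sp
- .RS 4
- .nf
- /ERROR/ { errors++ }
- END     { exit (errors > 0) }
- .fi
- .P
- .RE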
- .SH "CONSEQUENCES OF ERRORS"
- If any
- .IR file
- operand is specified and the named file cannot be accessed,
- .IR awk
- shall write a diagnostic message to standard error and terminate
- without any further action.
- .P
- If the program specified by either the
- .IR program
- operand or a
- .IR progfile
- operand is not a valid
- .IR awk
- program (as specified in the EXTENDED DESCRIPTION section), the
- behavior is undefined.
- .LP
- .IR "The following sections are informative."
- .SH "APPLICATION USAGE"
- The
- .BR index ,
- .BR length ,
- .BR match ,
- and
- .BR substr
- functions should not be confused with similar functions in the ISO\ C standard;
- the
- .IR awk
- versions deal with characters, while the ISO\ C standard deals with bytes.
- .P
- Because the concatenation operation is represented by adjacent
- expressions rather than an explicit operator, it is often necessary to
- use parentheses to enforce the proper evaluation precedence.
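- .P
- As a minimal sketch of this pitfall, in the first
- .BR print
- statement below the second
- .BR \(aq\-\(aq
- is taken as binary subtraction, which binds more tightly than
- concatenation, so the output is
- .BR \(dq\-12\-24\(dq ;
- the parentheses in the second statement force the intended grouping
- and the output is
- .BR \(dq\-12\ \-24\(dq :
- .sp
- .RS 4
- .nf
- BEGIN {
-     print -12 " " -24
-     print -12 " " (-24)
- }
- .fi
- .P
- .RE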
- .P
- When using
- .IR awk
- to process pathnames, it is recommended that LC_ALL, or at least
- LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment,
- since pathnames can contain byte sequences that do not form valid
- characters in some locales, in which case the utility's behavior would
- be undefined. In the POSIX locale each byte is a valid single-byte
- character, and therefore this problem is avoided.
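- .P
- As a minimal sketch of this recommendation for applications using
- .IR sh
- (the pipeline is only an illustration and assumes that no pathname
- contains a
- <newline>),
- the following command line writes each pathname under the current
- directory preceded by its length, which is counted in bytes because
- each byte is a character in the POSIX locale:
- .sp
- .RS 4
- .nf
- find . -print | LC_ALL=C awk \(aq{ print length($0), $0 }\(aq
- .fi
- .P
- .RE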
- .P
- On implementations where the
- .BR \(dq==\(dq
- operator checks if strings collate equally, applications needing to
- check whether strings are identical can use:
- .sp
- .RS 4
- .nf
- length(a) == length(b) && index(a,b) == 1
- .fi
- .P
- .RE
- .P
- On implementations where the
- .BR \(dq==\(dq
- operator checks if strings are identical, applications needing to
- check whether strings collate equally can use:
- .sp
- .RS 4
- .nf
- a <= b && a >= b
- .fi
- .P
- .RE
- .SH EXAMPLES
- The
- .IR awk
- program specified in the command line is most easily specified within
- single-quotes (for example, \(aq\fIprogram\fP\(aq) for applications using
- .IR sh ,
- because
- .IR awk
- programs commonly contain characters that are special to the shell,
- including double-quotes. In the cases where an
- .IR awk
- program contains single-quote characters, it is usually easiest to
- specify most of the program as strings within single-quotes
- concatenated by the shell with quoted single-quote characters. For
- example:
- .sp
- .RS 4
- .nf
- awk \(aq/\(aq\e\(aq\(aq/ { print "quote:", $0 }\(aq
- .fi
- .P
- .RE
- .P
- prints all lines from the standard input containing a single-quote
- character, prefixed with
- .IR quote :.
- .P
- The following are examples of simple
- .IR awk
- programs:
- .IP " 1." 4
- Write to the standard output all input lines for which field 3 is
- greater than 5:
- .RS 4
- .sp
- .RS 4
- .nf
- $3 > 5
- .fi
- .P
- .RE
- .RE
- .IP " 2." 4
- Write every tenth line:
- .RS 4
- .sp
- .RS 4
- .nf
- (NR % 10) == 0
- .fi
- .P
- .RE
- .RE
- .IP " 3." 4
- Write any line with a substring matching the regular expression:
- .RS 4
- .sp
- .RS 4
- .nf
- /(G|D)(2[0-9][[:alpha:]]*)/
- .fi
- .P
- .RE
- .RE
- .IP " 4." 4
- Print any line with a substring containing a
- .BR 'G'
- or
- .BR 'D' ,
- followed by a sequence of digits and alphabetic characters. This example uses
- character classes
- .BR digit
- and
- .BR alpha
- to match language-independent digit and alphabetic characters
- respectively:
- .RS 4
- .sp
- .RS 4
- .nf
- /(G|D)([[:digit:][:alpha:]]*)/
- .fi
- .P
- .RE
- .RE
- .IP " 5." 4
- Write any line in which the second field matches the regular expression
- and the fourth field does not:
- .RS 4
- .sp
- .RS 4
- .nf
- $2 \(ti /xyz/ && $4 !\(ti /xyz/
- .fi
- .P
- .RE
- .RE
- .IP " 6." 4
- Write any line in which the second field contains a
- <backslash>:
- .RS 4
- .sp
- .RS 4
- .nf
- $2 \(ti /\e\e/
- .fi
- .P
- .RE
- .RE
- .IP " 7." 4
- Write any line in which the second field contains a
- <backslash>.
- Note that
- <backslash>-escapes
- are interpreted twice; once in lexical processing of the string and once
- in processing the regular expression:
- .RS 4
- .sp
- .RS 4
- .nf
- $2 \(ti "\e\e\e\e"
- .fi
- .P
- .RE
- .RE
- .IP " 8." 4
- Write the second-to-last and the last field in each line. Separate
- the fields by a
- <colon>:
- .RS 4
- .sp
- .RS 4
- .nf
- {OFS=":";print $(NF-1), $NF}
- .fi
- .P
- .RE
- .RE
- .IP " 9." 4
- Write the line number and number of fields in each line. The three
- strings representing the line number, the
- <colon>,
- and the number of fields are concatenated and that string is written to
- standard output:
- .RS 4
- .sp
- .RS 4
- .nf
- {print NR ":" NF}
- .fi
- .P
- .RE
- .RE
- .IP 10. 4
- Write lines longer than 72 characters:
- .RS 4
- .sp
- .RS 4
- .nf
- length($0) > 72
- .fi
- .P
- .RE
- .RE
- .IP 11. 4
- Write the first two fields in opposite order separated by
- .BR OFS :
- .RS 4
- .sp
- .RS 4
- .nf
- { print $2, $1 }
- .fi
- .P
- .RE
- .RE
- .IP 12. 4
- Same, with input fields separated by a
- <comma>
- optionally followed by
- <space>
- and
- <tab>
- characters, or by one or more
- <space>
- and
- <tab>
- characters:
- .RS 4
- .sp
- .RS 4
- .nf
- BEGIN { FS = ",[ \et]*|[ \et]+" }
- { print $2, $1 }
- .fi
- .P
- .RE
- .RE
- .IP 13. 4
- Add up the first column, then print the sum and the average:
- .RS 4
- .sp
- .RS 4
- .nf
- {s += $1 }
- END {print "sum is", s, "average is", s/NR}
- .fi
- .P
- .RE
- .RE
- .IP 14. 4
- Write fields in reverse order, one per line (many lines out for each
- line in):
- .RS 4
- .sp
- .RS 4
- .nf
- { for (i = NF; i > 0; --i) print $i }
- .fi
- .P
- .RE
- .RE
- .IP 15. 4
- Write all lines between occurrences of the strings
- .BR start
- and
- .BR stop :
- .RS 4
- .sp
- .RS 4
- .nf
- /start/, /stop/
- .fi
- .P
- .RE
- .RE
- .IP 16. 4
- Write all lines whose first field is different from the previous one:
- .RS 4
- .sp
- .RS 4
- .nf
- $1 != prev { print; prev = $1 }
- .fi
- .P
- .RE
- .RE
- .IP 17. 4
- Simulate
- .IR echo :
- .RS 4
- .sp
- .RS 4
- .nf
- BEGIN {
- for (i = 1; i < ARGC; ++i)
- printf("%s%s", ARGV[i], i==ARGC-1?"\en":" ")
- }
- .fi
- .P
- .RE
- .RE
- .IP 18. 4
- Write the path prefixes contained in the
- .IR PATH
- environment variable, one per line:
- .RS 4
- .sp
- .RS 4
- .nf
- BEGIN {
- n = split (ENVIRON["PATH"], path, ":")
- for (i = 1; i <= n; ++i)
- print path[i]
- }
- .fi
- .P
- .RE
- .RE
- .IP 19. 4
- If there is a file named
- .BR input
- containing page headers of the form:
- .RS 4
- .sp
- .RS 4
- .nf
- Page #
- .fi
- .P
- .RE
- .P
- and a file named
- .BR program
- that contains:
- .sp
- .RS 4
- .nf
- /Page/ { $2 = n++; }
- { print }
- .fi
- .P
- .RE
- then the command line:
- .sp
- .RS 4
- .nf
- awk -f program n=5 input
- .fi
- .P
- .RE
- .P
- prints the file
- .BR input ,
- filling in page numbers starting at 5.
- .RE
- .SH RATIONALE
- This description is based on the new
- .IR awk ,
- ``nawk'', (see the referenced \fIThe AWK Programming Language\fP), which introduced a number of new features to
- the historical
- .IR awk :
- .IP " 1." 4
- New keywords:
- .BR delete ,
- .BR do ,
- .BR function ,
- .BR return
- .IP " 2." 4
- New built-in functions:
- .BR atan2 ,
- .BR close ,
- .BR cos ,
- .BR gsub ,
- .BR match ,
- .BR rand ,
- .BR sin ,
- .BR srand ,
- .BR sub ,
- .BR system
- .IP " 3." 4
- New predefined variables:
- .BR FNR ,
- .BR ARGC ,
- .BR ARGV ,
- .BR RSTART ,
- .BR RLENGTH ,
- .BR SUBSEP
- .IP " 4." 4
- New expression operators:
- .BR ? ,
- .BR : ,
- .BR , ,
- .BR \(ha
- .IP " 5." 4
- The
- .BR FS
- variable and the third argument to
- .BR split ,
- now treated as extended regular expressions.
- .IP " 6." 4
- The operator precedence, changed to more closely match the C language.
- Two examples of code that operate differently are:
- .RS 4
- .sp
- .RS 4
- .nf
- while ( n /= 10 > 1) ...
- if (!"wk" \(ti /bwk/) ...
- .fi
- .P
- .RE
- .RE
- .P
- Several features have been added based on newer implementations of
- .IR awk :
- .IP " *" 4
- Multiple instances of
- .BR \-f
- .IR progfile
- are permitted.
- .IP " *" 4
- The new option
- .BR \-v
- .IR assignment .
- .IP " *" 4
- The new predefined variable
- .BR ENVIRON .
- .IP " *" 4
- New built-in functions
- .BR toupper
- and
- .BR tolower .
- .IP " *" 4
- More formatting capabilities are added to
- .BR printf
- to match the ISO\ C standard.
- .P
- Earlier versions of this standard required implementations to
- support multiple adjacent
- <semicolon>s,
- lines with one or more
- <semicolon>
- before a rule (\c
- .IR pattern-action
- pairs), and lines with only
- <semicolon>(s).
- These are not required by this standard and are considered poor
- programming practice, but can be accepted by an implementation of
- .IR awk
- as an extension.
- .P
- The overall
- .IR awk
- syntax has always been based on the C language, with a few features
- from the shell command language and other sources. Because of this, it
- is not completely compatible with any other language, which has caused
- confusion for some users. It is not the intent of the standard
- developers to address such issues. A few relatively minor changes
- toward making the language more compatible with the ISO\ C standard were
- made; most of these changes are based on similar changes in recent
- implementations, as described above. There remain several C-language
- conventions that are not in
- .IR awk .
- One of the notable ones is the
- <comma>
- operator, which is commonly used to specify multiple expressions in the
- C language
- .BR for
- statement. Also, there are various places where
- .IR awk
- is more restrictive than the C language regarding the type of
- expression that can be used in a given context. These limitations are
- due to the different features that the
- .IR awk
- language does provide.
- .P
- Regular expressions in
- .IR awk
- have been extended somewhat from historical implementations to make
- them a pure superset of extended regular expressions, as defined by
- POSIX.1\(hy2008 (see the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 9.4" ", " "Extended Regular Expressions").
- The main extensions are internationalization
- features and interval expressions. Historical implementations of
- .IR awk
- have long supported
- <backslash>-escape
- sequences as an extension to extended regular expressions, and
- this extension has been retained despite inconsistency with other
- utilities. The number of escape sequences recognized in both extended
- regular expressions and strings has varied (generally increasing with
- time) among implementations. The set specified by POSIX.1\(hy2008 includes most
- sequences known to be supported by popular implementations and by the
- ISO\ C standard. One sequence that is not supported is hexadecimal value escapes
- beginning with
- .BR '\ex' .
- This would allow values expressed in more than 9 bits to be used within
- .IR awk
- as in the ISO\ C standard. However, because this syntax has a non-deterministic
- length, it does not permit the subsequent character to be a hexadecimal
- digit. This limitation can be dealt with in the C language by the use
- of lexical string concatenation. In the
- .IR awk
- language, concatenation could also be a solution for strings, but not
- for extended regular expressions (either lexical ERE tokens or strings
- used dynamically as regular expressions). Because of this limitation,
- the feature has not been added to POSIX.1\(hy2008.
- .P
- When a string variable is used in a context where an extended regular
- expression normally appears (where the lexical token ERE is used in the
- grammar) the string does not contain the literal
- <slash>
- characters.
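- .P
- As a minimal sketch (the variable name
- .BR re
- is hypothetical), the following program writes each line whose first
- field consists entirely of digits; the string assigned to
- .BR re
- holds only the expression itself, without delimiting
- <slash>
- characters:
- .sp
- .RS 4
- .nf
- BEGIN { re = "\(ha[[:digit:]]+$" }
- $1 \(ti re
- .fi
- .P
- .RE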
- .P
- Some versions of
- .IR awk
- allow the form:
- .sp
- .RS 4
- .nf
- func name(args, ... ) { statements }
- .fi
- .P
- .RE
- .P
- This has been deprecated by the authors of the language, who asked that
- it not be specified.
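- .P
- A definition written with the
- .BR function
- keyword is the form specified by this standard. As a minimal sketch
- (the function name
- .BR max
- is hypothetical), the following program writes 7:
- .sp
- .RS 4
- .nf
- function max(a, b) { return a > b ? a : b }
- BEGIN { print max(3, 7) }
- .fi
- .P
- .RE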
- .P
- Historical implementations of
- .IR awk
- produce an error if a
- .BR next
- statement is executed in a
- .BR BEGIN
- action, and cause
- .IR awk
- to terminate if a
- .BR next
- statement is executed in an
- .BR END
- action. This behavior has not been documented, and it was not believed
- that it was necessary to standardize it.
- .P
- The specification of conversions between string and numeric values is
- much more detailed than in the documentation of historical
- implementations or in the referenced \fIThe AWK Programming Language\fP. Although most of the behavior is
- designed to be intuitive, the details are necessary to ensure
- compatible behavior from different implementations. This is especially
- important in relational expressions since the types of the operands
- determine whether a string or numeric comparison is performed. From the
- perspective of an application developer, it is usually sufficient to
- expect intuitive behavior and to force conversions (by adding zero or
- concatenating a null string) when the type of an expression does not
- obviously match what is needed. The intent has been to specify
- historical practice in almost all cases. The one exception is that, in
- historical implementations, variables and constants maintain both
- string and numeric values after their original value is converted by
- any use. This means that referencing a variable or constant can have
- unexpected side-effects. For example, with historical implementations
- the following program:
- .sp
- .RS 4
- .nf
- {
- a = "+2"
- b = 2
- if (NR % 2)
- c = a + b
- if (a == b)
- print "numeric comparison"
- else
- print "string comparison"
- }
- .fi
- .P
- .RE
- .P
- would perform a numeric comparison (and output numeric comparison) for
- each odd-numbered line, but perform a string comparison (and output
- string comparison) for each even-numbered line. POSIX.1\(hy2008 ensures that
- comparisons will be numeric if necessary. With historical
- implementations, the following program:
- .sp
- .RS 4
- .nf
- BEGIN {
- OFMT = "%e"
- print 3.14
- OFMT = "%f"
- print 3.14
- }
- .fi
- .P
- .RE
- .P
- would output
- .BR \(dq3.140000e+00\(dq
- twice, because in the second
- .BR print
- statement the constant
- .BR \(dq3.14\(dq
- would have a string value from the previous conversion. POSIX.1\(hy2008 requires
- that the output of the second
- .BR print
- statement be
- .BR \(dq3.140000\(dq .
- The behavior of historical implementations was seen as too unintuitive
- and unpredictable.
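- .P
- As a minimal sketch of the forced conversions mentioned above (the
- variable names are hypothetical), adding zero forces a numeric
- comparison and concatenating a null string forces a string
- comparison; the following program writes both messages:
- .sp
- .RS 4
- .nf
- BEGIN {
-     a = "+2"
-     b = 2
-     if (a + 0 == b + 0)        # both operands numeric: 2 == 2
-         print "numerically equal"
-     if ((a "") != (b ""))      # both operands strings: "+2" != "2"
-         print "different as strings"
- }
- .fi
- .P
- .RE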
- .P
- It was pointed out that with the rules contained in early drafts, the
- following script would print nothing:
- .sp
- .RS 4
- .nf
- BEGIN {
- y[1.5] = 1
- OFMT = "%e"
- print y[1.5]
- }
- .fi
- .P
- .RE
- .P
- Therefore, a new variable,
- .BR CONVFMT ,
- was introduced. The
- .BR OFMT
- variable is now restricted to affecting output conversions of numbers
- to strings and
- .BR CONVFMT
- is used for internal conversions, such as comparisons or array
- indexing. The default value is the same as that for
- .BR OFMT ,
- so unless a program changes
- .BR CONVFMT
- (which no historical program would do), it will receive the historical
- behavior associated with internal string conversions.
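As a rough illustration of this split, the sketch below stores an array element under a non-integer subscript, changes OFMT, and then looks the element up again. It assumes an awk that follows the rules described above; the shell variable name `out` is only illustrative.

```shell
# Sketch: OFMT changes only how print formats numbers; array subscripts
# are converted with CONVFMT, so the element stored under 1.5 is still found.
out=$(awk 'BEGIN {
    y[1.5] = 1                                  # subscript converted with CONVFMT
    OFMT = "%e"                                 # affects output conversions only
    print ((1.5 in y) ? "found" : "missing")    # subscript again uses CONVFMT
    print 1.5                                   # output conversion uses OFMT
}')
printf '%s\n' "$out"
```

The first output line should report that the element was found; the second shows the number formatted in exponent notation, because only `print` consults OFMT.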
- .P
- The POSIX
- .IR awk
- lexical and syntactic conventions are specified more formally than in
- other sources. Again the intent has been to specify historical
- practice. One convention that may not be obvious from the formal
- grammar, as in other verbal descriptions, is where
- <newline>
- characters are acceptable. There are several obvious placements such as
- terminating a statement, and a
- <backslash>
- can be used to escape
- <newline>
- characters between any lexical tokens. In addition,
- <newline>
- characters without
- <backslash>
- characters can follow a comma, an open brace, a logical AND operator (\c
- .BR \(dq&&\(dq ),
- a logical OR operator (\c
- .BR \(dq||\(dq ),
- the
- .BR do
- keyword, the
- .BR else
- keyword, and the closing parenthesis of an
- .BR if ,
- .BR for ,
- or
- .BR while
- statement. For example:
- .sp
- .RS 4
- .nf
- { print $1,
- $2 }
- .fi
- .P
- .RE
- .P
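A slightly longer sketch of the same convention follows. It breaks lines after a logical AND operator, after the closing parenthesis of an if statement, after a comma, and after the else keyword, none of which needs a <backslash>. The input record and the shell variable name `out` are made up for the illustration.

```shell
# Sketch: unescaped newlines after &&, after the closing parenthesis of if,
# after a comma, and after else.
out=$(printf 'a b\n' | awk '{
    if ($1 == "a" &&
        $2 == "b")
        print $1,
              $2
    else
        print "no match"
}')
printf '%s\n' "$out"
```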
- The requirement that
- .IR awk
- add a trailing
- <newline>
- to the program argument text is to simplify the grammar, making it
- match a text file in form. There is no way for an application or test
- suite to determine whether a literal
- <newline>
- is added or whether
- .IR awk
- simply acts as if it did.
- .P
- POSIX.1\(hy2008 requires several changes from historical implementations in order
- to support internationalization. Probably the most subtle of these is
- the use of the decimal-point character, defined by the
- .IR LC_NUMERIC
- category of the locale, in representations of floating-point numbers.
- This locale-specific character is used in recognizing numeric input, in
- converting between strings and numeric values, and in formatting
- output. However, regardless of locale, the
- <period>
- character (the decimal-point character of the POSIX locale) is the
- decimal-point character recognized in processing
- .IR awk
- programs (including assignments in command line arguments). This is
- essentially the same convention as the one used in the ISO\ C standard. The
- difference is that the C language includes the
- \fIsetlocale\fR()
- function, which permits an application to modify its locale. Because of
- this capability, a C application begins executing with its locale set
- to the C locale, and only executes in the environment-specified locale
- after an explicit call to
- \fIsetlocale\fR().
- However, adding such an elaborate new feature to the
- .IR awk
- language was seen as inappropriate for POSIX.1\(hy2008. It is possible to execute
- an
- .IR awk
- program explicitly in any desired locale by setting the environment in
- the shell.
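A minimal sketch of setting the locale from the shell for a single invocation is shown below. It selects the POSIX locale through LC_ALL, so the <period> is the decimal-point character in the output as well as in the program text; the shell variable name `out` is only illustrative.

```shell
# Sketch: run one awk program in an explicitly chosen locale.
# LC_ALL=POSIX applies only to this command, not to the rest of the shell.
out=$(LC_ALL=POSIX awk 'BEGIN { printf "%.2f\n", 3.14159 }')
printf '%s\n' "$out"
```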
- .P
- The undefined behavior resulting from NULs in extended regular
- expressions allows future extensions for the GNU
- .IR gawk
- program to process binary data.
- .P
- The behavior in the case of invalid
- .IR awk
- programs (including lexical, syntactic, and semantic errors) is
- undefined because it was considered overly limiting on implementations
- to specify. In most cases such errors can be expected to produce a
- diagnostic and a non-zero exit status. However, some implementations
- may choose to extend the language in ways that make use of certain
- invalid constructs. Other invalid constructs might be deemed worthy of
- a warning, but otherwise cause some reasonable behavior. Still other
- constructs may be very difficult to detect in some implementations.
- Also, different implementations might detect a given error during an
- initial parsing of the program (before reading any input files) while
- others might detect it when executing the program after reading some
- input. Implementors should be aware that diagnosing errors as early as
- possible and producing useful diagnostics can ease debugging of
- applications, and thus make an implementation more usable.
- .P
- The unspecified behavior from using multi-character
- .BR RS
- values is to allow possible future extensions based on extended regular
- expressions used for record separators. Historical implementations take
- the first character of the string and ignore the others.
- .P
- The behavior is unspecified when
- .IR split (\c
- .IR string ,\c
- .IR array ,\c
- <null>)
- is used, to allow a proposed future extension that would split up a
- string into an array of individual characters.
- .P
- In the context of the
- .BR getline
- function, equally good arguments for different precedences of the
- .BR |
- and
- .BR <
- operators can be made. Historical practice has been that:
- .sp
- .RS 4
- .nf
- getline < "a" "b"
- .fi
- .P
- .RE
- .P
- is parsed as:
- .sp
- .RS 4
- .nf
- ( getline < "a" ) "b"
- .fi
- .P
- .RE
- .P
- although many would argue that the intent was that the file
- .BR ab
- should be read. However:
- .sp
- .RS 4
- .nf
- getline < "x" + 1
- .fi
- .P
- .RE
- .P
- parses as:
- .sp
- .RS 4
- .nf
- getline < ( "x" + 1 )
- .fi
- .P
- .RE
- .P
- Similar problems occur with the
- .BR |
- version of
- .BR getline ,
- particularly in combination with
- .BR $ .
- For example:
- .sp
- .RS 4
- .nf
- $"echo hi" | getline
- .fi
- .P
- .RE
- .P
- (This situation is particularly problematic when used in a
- .BR print
- statement, where the
- .BR |getline
- part might be a redirection of the
- .BR print .)
- .P
- Since in most cases such constructs are not (or at least should not be)
- used (because they have a natural ambiguity for which there is no
- conventional parsing), the meaning of these constructs has been made
- explicitly unspecified. (The effect is that a conforming application that
- runs into the problem must parenthesize to resolve the ambiguity.)
- There appeared to be few if any actual uses of such constructs.
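One way to parenthesize such constructs is sketched below. The scratch file name, the awk variables `dir` and `name`, and the shell variables `demo_dir`, `demo_name`, and `out` are all hypothetical and chosen only for the illustration.

```shell
# Sketch: parenthesize concatenations used with getline so the parse is unambiguous.
demo_dir="${TMPDIR:-/tmp}"
demo_name="awk_getline_demo.$$"          # hypothetical scratch file name
printf 'from file\n' > "$demo_dir/$demo_name"
out=$(awk -v dir="$demo_dir" -v name="$demo_name" 'BEGIN {
    # Parenthesize the concatenation so the whole string is the command.
    ("echo " "hi") | getline line
    print line
    # Parenthesize the file operand and the getline expression itself.
    if ((getline line < (dir "/" name)) > 0)
        print line
}')
rm -f "$demo_dir/$demo_name"
printf '%s\n' "$out"
```

With the parentheses in place, the command read is `echo hi` and the file read is the full concatenated path, whatever precedence an implementation gives the unparenthesized forms.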
- .P
- Grammars can be written that would cause an error under these
- circumstances. Where backwards-compatibility is not a large
- consideration, implementors may wish to use such grammars.
- .P
- Some historical implementations have allowed some built-in functions to
- be called without an argument list, the result being a default argument
- list chosen in some ``reasonable'' way. Use of
- .BR length
- as a synonym for
- .BR "length($0)"
- is the only one of these forms that is thought to be widely known or
- widely used; this particular form is documented in various places (for
- example, most historical
- .IR awk
- reference pages, although not in the referenced \fIThe AWK Programming Language\fP) as legitimate practice.
- With this exception, default argument lists have always been
- undocumented and vaguely defined, and it is not at all clear how (or
- if) they should be generalized to user-defined functions. They add no
- useful functionality and preclude possible future extensions that might
- need to name functions without calling them. Not standardizing them
- seems the simplest course. The standard developers considered that
- .BR length
- merited special treatment, however, since it has been documented in the
- past and sees possibly substantial use in historical programs.
- Accordingly, this usage has been made legitimate. Issue\ 5
- removed the obsolescent marking for XSI-conforming implementations,
- and many otherwise conforming applications depend on this feature.
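A small sketch of the two equivalent forms follows; the input record and the shell variable name `out` are made up for the illustration.

```shell
# Sketch: length with no argument list is a synonym for length($0).
out=$(printf 'hello\n' | awk '{ print length, length($0) }')
printf '%s\n' "$out"
```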
- .P
- In
- .BR sub
- and
- .BR gsub ,
- if
- .IR repl
- is a string literal (the lexical token
- .BR STRING ),
- then two consecutive
- <backslash>
- characters should be used in the string to ensure a single
- <backslash>
- will precede the
- <ampersand>
- when the resultant string is passed to the function. (For example,
- to specify one literal
- <ampersand>
- in the replacement string, use
- .BR gsub (\c
- .BR ERE ,
- .BR \(dq\e\e&\(dq ).)
- .P
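The difference between an unescaped and an escaped <ampersand> in a string literal can be sketched as follows. The sample strings and the variable names `s`, `t`, and `out` are illustrative only.

```shell
# Sketch: & in the replacement stands for the matched text;
# the string literal "\\&" reaches gsub as \& and yields a literal ampersand.
out=$(awk 'BEGIN {
    s = "a.b"; gsub(/\./, "[&]", s); print s     # matched text is reinserted
    t = "a.b"; gsub(/\./, "\\&", t); print t     # literal ampersand
}')
printf '%s\n' "$out"
```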
- Historically, the only special character in the
- .IR repl
- argument of
- .BR sub
- and
- .BR gsub
- string functions was the
- <ampersand>
- (\c
- .BR '&' )
- character and preceding it with the
- <backslash>
- character was used to turn off its special meaning.
- .P
- The description in the ISO\ POSIX\(hy2:\|1993 standard introduced behavior such that the
- <backslash>
- character was another special character and it was unspecified whether
- there were any other special characters. This description introduced
- several portability problems, some of which are described below, and so
- it has been replaced with the more historical description. Some of the
- problems include:
- .IP " *" 4
- Historically, to create the replacement string, a script could use
- .BR gsub (\c
- .BR ERE ,
- .BR \(dq\e\e&\(dq ),
- but with the ISO\ POSIX\(hy2:\|1993 standard wording, it was necessary to use
- .BR gsub (\c
- .BR ERE ,
- .BR \(dq\e\e\e\e&\(dq ).
- The
- <backslash>
- characters are doubled here because all string literals are subject to
- lexical analysis, which would reduce each pair of
- <backslash>
- characters to a single
- <backslash>
- before being passed to
- .BR gsub .
- .IP " *" 4
- Since it was unspecified what the special characters were, for portable
- scripts to guarantee that characters are printed literally, each
- character had to be preceded with a
- <backslash>.
- (For example, a portable script had to use
- .BR gsub (\c
- .BR ERE ,
- .BR \(dq\e\eh\e\ei\(dq )
- to produce a replacement string of
- .BR \(dqhi\(dq .)
- .P
- The description for comparisons in the ISO\ POSIX\(hy2:\|1993 standard did not properly describe
- historical practice because of the way numeric strings are compared as
- numbers. The rules in that standard cause the following code:
- .sp
- .RS 4
- .nf
- if (0 == "000")
- print "strange, but true"
- else
- print "not true"
- .fi
- .P
- .RE
- .P
- to do a numeric comparison, causing the
- .BR if
- to succeed. It should be intuitively obvious that this is incorrect
- behavior, and indeed, no historical implementation of
- .IR awk
- actually behaves this way.
- .P
- To fix this problem, the definition of
- .IR "numeric string"
- was enhanced to include only those values obtained from specific
- circumstances (mostly external sources) where it is not possible to
- determine unambiguously whether the value is intended to be a string or
- a numeric.
- .P
- Variables that are assigned a numeric string shall also be treated
- as a numeric string. (That is, the notion of a numeric string can
- be propagated across assignments.) In comparisons, all variables having
- the uninitialized value are to be treated as a numeric operand
- evaluating to the numeric value zero.
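The effect of the revised definition can be sketched with a short program. It compares the number 0 first with a string constant, then with a field read from input, and finally with a variable assigned from that field. The input record and the names `x` and `out` are illustrative, and the sketch assumes an awk that implements the numeric-string rules described here.

```shell
# Sketch: only values from external sources (such as fields) can be numeric strings.
out=$(printf '000\n' | awk '{
    print ((0 == "000") ? "equal" : "not equal")   # string constant: compared as strings
    print (($1 == 0) ? "equal" : "not equal")      # field that looks numeric: compared as numbers
    x = $1                                         # numeric-string property follows the assignment
    print ((x == 0) ? "equal" : "not equal")
}')
printf '%s\n' "$out"
```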
- .P
- Uninitialized variables include all types of variables including
- scalars, array elements, and fields. The definition of an uninitialized
- value in
- .IR "Variables and Special Variables"
- is necessary to describe the value placed on uninitialized variables
- and on fields that are valid (for example,
- .BR <
- .BR $NF )
- but have no characters in them and to describe how these variables are
- to be used in comparisons. A valid field, such as
- .BR $1 ,
- that has no characters in it can be obtained from an input line of
- .BR \(dq\et\et\(dq
- when
- .BR FS= \c
- .BR '\et' .
- Historically, the comparison (\c
- .BR $1< 10)
- was done numerically after evaluating
- .BR $1
- to the value zero.
- .P
- The phrase ``.\|.\|. also shall have the numeric value of the numeric
- string'' was removed from several sections of the ISO\ POSIX\(hy2:\|1993 standard because it
- specifies an unnecessary implementation detail. It is not necessary for
- POSIX.1\(hy2008 to specify that these objects be assigned two different values.
- It is only necessary to specify that these objects may evaluate to two
- different values depending on context.
- .P
- Historical implementations of
- .IR awk
- did not parse hexadecimal integer or floating constants like
- .BR \(dq0xa\(dq
- and
- .BR \(dq0xap0\(dq .
- Due to an oversight, the 2001 through 2004 editions of this standard
- required support for hexadecimal floating constants. This was due to
- the reference to
- \fIatof\fR().
- This version of the standard allows but does not require implementations
- to use
- \fIatof\fR()
- and includes a description of how floating-point numbers are recognized
- as an alternative to match historic behavior. The intent of this change
- is to allow implementations to recognize floating-point constants
- according to either the ISO/IEC\ 9899:\|1990 standard or ISO/IEC\ 9899:\|1999 standard, and to allow (but not require)
- implementations to recognize hexadecimal integer constants.
- .P
- Historical implementations of
- .IR awk
- did not support floating-point infinities and NaNs in
- .IR "numeric strings" ;
- e.g.,
- .BR \(dq-INF\(dq
- and
- .BR \(dqNaN\(dq .
- However, implementations that use the
- \fIatof\fR()
- or
- \fIstrtod\fR()
- functions to do the conversion picked up support for these values if they
- used an ISO/IEC\ 9899:\|1999 standard version of the function instead of an ISO/IEC\ 9899:\|1990 standard version. Due to
- an oversight, the 2001 through 2004 editions of this standard did not
- allow support for infinities and NaNs, but in this revision support is
- allowed (but not required). This is a silent change to the behavior of
- .IR awk
- programs; for example, in the POSIX locale the expression:
- .sp
- .RS 4
- .nf
- ("-INF" + 0 < 0)
- .fi
- .P
- .RE
- .P
- formerly had the value 0 because
- .BR \(dq-INF\(dq
- converted to 0, but now it may have the value 0 or 1.
- .SH "FUTURE DIRECTIONS"
- A future version of this standard may require the
- .BR \(dq!=\(dq
- and
- .BR \(dq==\(dq
- operators to perform string comparisons by checking if the strings are
- identical (and not by checking if they collate equally).
- .SH "SEE ALSO"
- .IR "Section 1.3" ", " "Grammar Conventions",
- .IR "\fIgrep\fR\^",
- .IR "\fIlex\fR\^",
- .IR "\fIsed\fR\^"
- .P
- The Base Definitions volume of POSIX.1\(hy2017,
- .IR "Chapter 5" ", " "File Format Notation",
- .IR "Section 6.1" ", " "Portable Character Set",
- .IR "Chapter 8" ", " "Environment Variables",
- .IR "Chapter 9" ", " "Regular Expressions",
- .IR "Section 12.2" ", " "Utility Syntax Guidelines"
- .P
- The System Interfaces volume of POSIX.1\(hy2017,
- .IR "\fIatof\fR\^(\|)",
- .IR "\fIexec\fR\^",
- .IR "\fIisspace\fR\^(\|)",
- .IR "\fIpopen\fR\^(\|)",
- .IR "\fIsetlocale\fR\^(\|)",
- .IR "\fIstrtod\fR\^(\|)"
- .\"
- .SH COPYRIGHT
- Portions of this text are reprinted and reproduced in electronic form
- from IEEE Std 1003.1-2017, Standard for Information Technology
- -- Portable Operating System Interface (POSIX), The Open Group Base
- Specifications Issue 7, 2018 Edition,
- Copyright (C) 2018 by the Institute of
- Electrical and Electronics Engineers, Inc and The Open Group.
- In the event of any discrepancy between this version and the original IEEE and
- The Open Group Standard, the original IEEE and The Open Group Standard
- is the referee document. The original Standard can be obtained online at
- http://www.opengroup.org/unix/online.html .
- .PP
- Any typographical or formatting errors that appear
- in this page are most likely
- to have been introduced during the conversion of the source files to
- man page format. To report such errors, see
- https://www.kernel.org/doc/man-pages/reporting_bugs.html .