oasis-root

Compiled tree of Oasis Linux based on own branch at <https://hacktivis.me/git/oasis/>

git clone https://anongit.hacktivis.me/git/oasis-root.git

awk.1p (108240B)


  1. '\" et
  2. .TH AWK "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
  3. .\"
  4. .SH PROLOG
  5. This manual page is part of the POSIX Programmer's Manual.
  6. The Linux implementation of this interface may differ (consult
  7. the corresponding Linux manual page for details of Linux behavior),
  8. or the interface may not be implemented on Linux.
  9. .\"
  10. .SH NAME
  11. awk
  12. \(em pattern scanning and processing language
  13. .SH SYNOPSIS
  14. .LP
  15. .nf
  16. awk \fB[\fR\-F \fIsepstring\fB] [\fR\-v \fIassignment\fB]\fR... \fIprogram\fB [\fIargument\fR...\fB]\fR
  17. .P
  18. awk \fB[\fR\-F \fIsepstring\fB] \fR\-f \fIprogfile \fB[\fR\-f \fIprogfile\fB]\fR... \fB[\fR\-v \fIassignment\fB]\fR...
  19. \fB[\fIargument\fR...\fB]\fR
  20. .fi
  21. .SH DESCRIPTION
  22. The
  23. .IR awk
  24. utility shall execute programs written in the
  25. .IR awk
  26. programming language, which is specialized for textual data
  27. manipulation. An
  28. .IR awk
  29. program is a sequence of patterns and corresponding actions. When
  30. input is read that matches a pattern, the action associated with that
  31. pattern is carried out.
  32. .P
  33. Input shall be interpreted as a sequence of records. By default, a
  34. record is a line, less its terminating
  35. <newline>,
  36. but this can be changed by using the
  37. .BR RS
  38. built-in variable. Each record of input shall be matched in turn
  39. against each pattern in the program. For each pattern matched, the
  40. associated action shall be executed.
  41. .P
  42. The
  43. .IR awk
  44. utility shall interpret each input record as a sequence of fields
  45. where, by default, a field is a string of non-\c
  46. <blank>
  47. non-\c
  48. <newline>
  49. characters. This default
  50. <blank>
  51. and
  52. <newline>
  53. field delimiter can be changed by using the
  54. .BR FS
  55. built-in variable or the
  56. .BR \-F
  57. .IR sepstring
  58. option. The
  59. .IR awk
  60. utility shall denote the first field in a record $1, the second $2, and
  61. so on. The symbol $0 shall refer to the entire record; setting any
  62. other field causes the re-evaluation of $0. Assigning to $0 shall reset
  63. the values of all other fields and the
  64. .BR NF
  65. built-in variable.
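The field rules above can be tried with a short sketch. The helper names (`show_fields`, `change_field`, `reset_record`) and the sample input are assumptions made for illustration only; they are not part of the standard.

```shell
# Hypothetical helper: print the field count and the second field.
show_fields() {
    awk '{ print NF ": " $2 }'
}

# Hypothetical helper: setting a field causes $0 to be re-evaluated.
change_field() {
    awk '{ $2 = "B"; print $0 }'
}

# Hypothetical helper: assigning to $0 resets the fields and NF.
reset_record() {
    awk '{ $0 = "x y z"; print NF, $3 }'
}

printf 'alpha beta gamma\n' | show_fields    # prints "3: beta"
printf 'a b c\n' | change_field              # prints "a B c"
printf 'one two\n' | reset_record            # prints "3 z"
```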
  66. .SH OPTIONS
  67. The
  68. .IR awk
  69. utility shall conform to the Base Definitions volume of POSIX.1\(hy2017,
  70. .IR "Section 12.2" ", " "Utility Syntax Guidelines".
  71. .P
  72. The following options shall be supported:
  73. .IP "\fB\-F\ \fIsepstring\fR" 10
  74. Define the input field separator. This option shall be equivalent to:
  75. .RS 10
  76. .sp
  77. .RS 4
  78. .nf
  79. \-v FS=\fIsepstring\fR
  80. .fi
  81. .P
  82. .RE
  83. .P
  84. except that if
  85. .BR \-F
  86. .IR sepstring
  87. and
  88. .BR \-v
  89. .IR \fRFS=\fPsepstring\fR
  90. are both used, it is unspecified whether the
  91. .BR FS
  92. assignment resulting from
  93. .BR \-F
  94. .IR sepstring
  95. is processed in command line order or is processed after the last
  96. .BR \-v
  97. .IR \fRFS=\fPsepstring\fR .
  98. See the description of the
  99. .BR FS
  100. built-in variable, and how it is used, in the EXTENDED DESCRIPTION
  101. section.
  102. .RE
  103. .IP "\fB\-f\ \fIprogfile\fR" 10
  104. Specify the pathname of the file
  105. .IR progfile
  106. containing an
  107. .IR awk
  108. program. A pathname of
  109. .BR '\-'
  110. shall denote the standard input. If multiple instances of this option
  111. are specified, the concatenation of the files specified as
  112. .IR progfile
  113. in the order specified shall be the
  114. .IR awk
  115. program. The
  116. .IR awk
  117. program can alternatively be specified in the command line as a single
  118. argument.
  119. .IP "\fB\-v\ \fIassignment\fR" 10
  120. .br
  121. The application shall ensure that the
  122. .IR assignment
  123. argument is in the same form as an
  124. .IR assignment
  125. operand. The specified variable assignment shall occur prior to
  126. executing the
  127. .IR awk
  128. program, including the actions associated with
  129. .BR BEGIN
  130. patterns (if any). Multiple occurrences of this option can be
  131. specified.
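As a minimal sketch of the options described above, the following shows `-F` and `-v FS=` used separately (combining them is where the ordering becomes unspecified), and a `-v` assignment that is visible inside a BEGIN action. The function names and the sample record are hypothetical.

```shell
# -F sets the input field separator; used alone it behaves like -v FS=:
first_by_F() { awk -F: '{ print $1 }'; }
first_by_v() { awk -v FS=: '{ print $1 }'; }

# A -v assignment is already in effect when BEGIN actions run.
greet() { awk -v name="$1" 'BEGIN { print "hello, " name }'; }

printf 'root:x:0\n' | first_by_F   # prints "root"
printf 'root:x:0\n' | first_by_v   # prints "root"
greet world                        # prints "hello, world"
```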
  132. .SH OPERANDS
  133. The following operands shall be supported:
  134. .IP "\fIprogram\fR" 10
  135. If no
  136. .BR \-f
  137. option is specified, the first operand to
  138. .IR awk
  139. shall be the text of the
  140. .IR awk
  141. program. The application shall supply the
  142. .IR program
  143. operand as a single argument to
  144. .IR awk .
  145. If the text does not end in a
  146. <newline>,
  147. .IR awk
  148. shall interpret the text as if it did.
  149. .IP "\fIargument\fR" 10
  150. Either of the following two types of
  151. .IR argument
  152. can be intermixed:
  153. .RS 10
  154. .IP "\fIfile\fR" 10
  155. A pathname of a file that contains the input to be read, which is
  156. matched against the set of patterns in the program. If no
  157. .IR file
  158. operands are specified, or if a
  159. .IR file
  160. operand is
  161. .BR '\-' ,
  162. the standard input shall be used.
  163. .IP "\fIassignment\fR" 10
  164. An operand that begins with an
  165. <underscore>
  166. or alphabetic character from the portable character set (see the table
  167. in the Base Definitions volume of POSIX.1\(hy2017,
  168. .IR "Section 6.1" ", " "Portable Character Set"),
  169. followed by a sequence of underscores, digits, and alphabetics from the
  170. portable character set, followed by the
  171. .BR '='
  172. character, shall specify a variable assignment rather than a pathname.
  173. The characters before the
  174. .BR '='
  175. represent the name of an
  176. .IR awk
  177. variable; if that name is an
  178. .IR awk
  179. reserved word (see
  180. .IR "Grammar")
  181. the behavior is undefined. The characters following the
  182. <equals-sign>
  183. shall be interpreted as if they appeared in the
  184. .IR awk
  185. program preceded and followed by a double-quote (\c
  186. .BR '\&"' )
  187. character, as a
  188. .BR STRING
  189. token (see
  190. .IR "Grammar"),
  191. except that if the last character is an unescaped
  192. <backslash>,
  193. it shall be interpreted as a literal
  194. <backslash>
  195. rather than as the first character of the sequence
  196. .BR \(dq\e"\(dq .
  197. The variable shall be assigned the value of that
  198. .BR STRING
  199. token and, if appropriate, shall be considered a
  200. .IR "numeric string"
  201. (see
  202. .IR "Expressions in awk");
  203. the variable shall also be assigned its numeric value. Each such
  204. variable assignment shall occur just prior to the processing of the
  205. following
  206. .IR file ,
  207. if any. Thus, an assignment before the first
  208. .IR file
  209. argument shall be executed after the
  210. .BR BEGIN
  211. actions (if any), while an assignment after the last
  212. .IR file
  213. argument shall occur before the
  214. .BR END
  215. actions (if any). If there are no
  216. .IR file
  217. arguments, assignments shall be executed before processing the standard
  218. input.
  219. .RE
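The timing of assignment operands relative to BEGIN, file processing, and END can be sketched as follows. The function name `trace_assignments` and the variable `x` are assumptions for illustration; standard input is named with `-` so that no temporary files are needed.

```shell
# Hypothetical helper: x=1 is applied just before "-" is read,
# and x=2 after the last file but before the END action.
trace_assignments() {
    awk 'BEGIN { print "begin:" x }
               { print x ":" $0 }
         END   { print "end:" x }' x=1 - x=2
}

printf 'a\n' | trace_assignments
# prints:
#   begin:
#   1:a
#   end:2
```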
  220. .SH STDIN
  221. The standard input shall be used only if no
  222. .IR file
  223. operands are specified, or if a
  224. .IR file
  225. operand is
  226. .BR '\-' ,
  227. or if a
  228. .IR progfile
  229. option-argument is
  230. .BR '\-' ;
  231. see the INPUT FILES section. If the
  232. .IR awk
  233. program contains no actions and no patterns, but is otherwise a valid
  234. .IR awk
  235. program, standard input and any
  236. .IR file
  237. operands shall not be read and
  238. .IR awk
  239. shall exit with a return status of zero.
  240. .SH "INPUT FILES"
  241. Input files to the
  242. .IR awk
  243. program from any of the following sources shall be text files:
  244. .IP " *" 4
  245. Any
  246. .IR file
  247. operands or their equivalents, achieved by modifying the
  248. .IR awk
  249. variables
  250. .BR ARGV
  251. and
  252. .BR ARGC
  253. .IP " *" 4
  254. Standard input in the absence of any
  255. .IR file
  256. operands
  257. .IP " *" 4
  258. Arguments to the
  259. .BR getline
  260. function
  261. .P
  262. Whether the variable
  263. .BR RS
  264. is set to a value other than a
  265. <newline>
  266. or not, for these files, implementations shall support records
  267. terminated with the specified separator up to
  268. {LINE_MAX}
  269. bytes and may support longer records.
  270. .P
  271. If
  272. .BR \-f
  273. .IR progfile
  274. is specified, the application shall ensure that the files named by each
  275. of the
  276. .IR progfile
  277. option-arguments are text files and their concatenation, in the same
  278. order as they appear in the arguments, is an
  279. .IR awk
  280. program.
  281. .SH "ENVIRONMENT VARIABLES"
  282. The following environment variables shall affect the execution of
  283. .IR awk :
  284. .IP "\fILANG\fP" 10
  285. Provide a default value for the internationalization variables that are
  286. unset or null. (See the Base Definitions volume of POSIX.1\(hy2017,
  287. .IR "Section 8.2" ", " "Internationalization Variables"
  288. for the precedence of internationalization variables used to determine
  289. the values of locale categories.)
  290. .IP "\fILC_ALL\fP" 10
  291. If set to a non-empty string value, override the values of all the
  292. other internationalization variables.
  293. .IP "\fILC_COLLATE\fP" 10
  294. .br
  295. Determine the locale for the behavior of ranges, equivalence classes,
  296. and multi-character collating elements within regular expressions and
  297. in comparisons of string values.
  298. .IP "\fILC_CTYPE\fP" 10
  299. Determine the locale for the interpretation of sequences of bytes of
  300. text data as characters (for example, single-byte as opposed to
  301. multi-byte characters in arguments and input files), the behavior of
  302. character classes within regular expressions, the identification of
  303. characters as letters, and the mapping of uppercase and lowercase
  304. characters for the
  305. .BR toupper
  306. and
  307. .BR tolower
  308. functions.
  309. .IP "\fILC_MESSAGES\fP" 10
  310. .br
  311. Determine the locale that should be used to affect the format and
  312. contents of diagnostic messages written to standard error.
  313. .IP "\fILC_NUMERIC\fP" 10
  314. .br
  315. Determine the radix character used when interpreting numeric input,
  316. performing conversions between numeric and string values, and
  317. formatting numeric output. Regardless of locale, the
  318. <period>
  319. character (the decimal-point character of the POSIX locale) is the
  320. decimal-point character recognized in processing
  321. .IR awk
  322. programs (including assignments in command line arguments).
  323. .IP "\fINLSPATH\fP" 10
  324. Determine the location of message catalogs for the processing of
  325. .IR LC_MESSAGES .
  326. .IP "\fIPATH\fP" 10
  327. Determine the search path when looking for commands executed by
  328. \fIsystem\fR(\fIexpr\fR), or input and output pipes; see the Base Definitions volume of POSIX.1\(hy2017,
  329. .IR "Chapter 8" ", " "Environment Variables".
  330. .P
  331. In addition, all environment variables shall be visible via the
  332. .IR awk
  333. variable
  334. .BR ENVIRON .
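A small sketch of reading the environment through `ENVIRON`. The variable name `DEMO_VAR` and the helper `show_env` are hypothetical.

```shell
# Hypothetical helper: export DEMO_VAR for one awk invocation and read it back.
show_env() {
    DEMO_VAR="$1" awk 'BEGIN { print ENVIRON["DEMO_VAR"] }'
}

show_env oasis   # prints "oasis"
```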
  335. .SH "ASYNCHRONOUS EVENTS"
  336. Default.
  337. .SH STDOUT
  338. The nature of the output files depends on the
  339. .IR awk
  340. program.
  341. .SH STDERR
  342. The standard error shall be used only for diagnostic messages.
  343. .SH "OUTPUT FILES"
  344. The nature of the output files depends on the
  345. .IR awk
  346. program.
  347. .br
  348. .SH "EXTENDED DESCRIPTION"
  349. .SS "Overall Program Structure"
  350. .P
  351. An
  352. .IR awk
  353. program is composed of pairs of the form:
  354. .sp
  355. .RS 4
  356. .nf
  357. \fIpattern\fR { \fIaction\fR }
  358. .fi
  359. .P
  360. .RE
  361. .P
  362. Either the pattern or the action (including the enclosing brace
  363. characters) can be omitted.
  364. .P
  365. A missing pattern shall match any record of input, and a missing action
  366. shall be equivalent to:
  367. .sp
  368. .RS 4
  369. .nf
  370. { print }
  371. .fi
  372. .P
  373. .RE
  374. .P
  375. Execution of the
  376. .IR awk
  377. program shall start by first executing the actions associated with all
  378. .BR BEGIN
  379. patterns in the order they occur in the program. Then each
  380. .IR file
  381. operand (or standard input if no files were specified) shall be
  382. processed in turn by reading data from the file until a record
  383. separator is seen (\c
  384. <newline>
  385. by default). Before the first reference to a field in the record is
  386. evaluated, the record shall be split into fields, according to the
  387. rules in
  388. .IR "Regular Expressions",
  389. using the value of
  390. .BR FS
  391. that was current at the time the record was read. Each pattern in the
  392. program then shall be evaluated in the order of occurrence, and the
  393. action associated with each pattern that matches the current record
  394. executed. The action for a matching pattern shall be executed before
  395. evaluating subsequent patterns. Finally, the actions associated with
  396. all
  397. .BR END
  398. patterns shall be executed in the order they occur in the program.
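The execution order described above can be illustrated with a short sketch, assuming a hypothetical helper `run_order` and made-up input. The second rule has a pattern but no action (so it prints matching records), and the third has an action but no pattern (so it runs for every record).

```shell
# Hypothetical helper: BEGIN first, then each record against each rule in order, then END.
run_order() {
    awk 'BEGIN { print "start" }
         /b/
         { n++ }
         END { print "records:", n }'
}

printf 'abc\nxyz\nbar\n' | run_order
# prints:
#   start
#   abc
#   bar
#   records: 3
```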
  399. .SS "Expressions in awk"
  400. .P
  401. Expressions describe computations used in
  402. .IR patterns
  403. and
  404. .IR actions .
  405. In the following table, valid expression operations are given in groups
  406. from highest precedence first to lowest precedence last, with
  407. equal-precedence operators grouped between horizontal lines. In
  408. expression evaluation, where the grammar is formally ambiguous, higher
  409. precedence operators shall be evaluated before lower precedence
  410. operators. In this table
  411. .IR expr ,
  412. .IR expr1 ,
  413. .IR expr2 ,
  414. and
  415. .IR expr3
  416. represent any expression, while lvalue represents any entity that can
  417. be assigned to (that is, on the left side of an assignment operator).
  418. The precise syntax of expressions is given in
  419. .IR "Grammar".
  420. .sp
  421. .ce 1
  422. \fBTable 4-1: Expressions in Decreasing Precedence in \fIawk\fP\fR
  423. .TS
  424. box tab(@) center;
  425. cB | cB | cB | cB
  426. l1f5 | l1 | l1 | l.
  427. Syntax@Name@Type of Result@Associativity
  428. _
  429. ( \fIexpr\fP )@Grouping@Type of \fIexpr\fP@N/A
  430. _
  431. $\fIexpr\fP@Field reference@String@N/A
  432. _
  433. lvalue ++@Post-increment@Numeric@N/A
  434. lvalue \-\|\-@Post-decrement@Numeric@N/A
  435. _
  436. ++ lvalue@Pre-increment@Numeric@N/A
  437. \-\|\- lvalue@Pre-decrement@Numeric@N/A
  438. _
  439. \fIexpr\fP ^ \fIexpr\fP@Exponentiation@Numeric@Right
  440. _
  441. ! \fIexpr\fP@Logical not@Numeric@N/A
  442. + \fIexpr\fP@Unary plus@Numeric@N/A
  443. \- \fIexpr\fP@Unary minus@Numeric@N/A
  444. _
  445. \fIexpr\fP * \fIexpr\fP@Multiplication@Numeric@Left
  446. \fIexpr\fP / \fIexpr\fP@Division@Numeric@Left
  447. \fIexpr\fP % \fIexpr\fP@Modulus@Numeric@Left
  448. _
  449. \fIexpr\fP + \fIexpr\fP@Addition@Numeric@Left
  450. \fIexpr\fP \- \fIexpr\fP@Subtraction@Numeric@Left
  451. _
  452. \fIexpr\fP \fIexpr\fP@String concatenation@String@Left
  453. _
  454. \fIexpr\fP < \fIexpr\fP@Less than@Numeric@None
  455. \fIexpr\fP <= \fIexpr\fP@Less than or equal to@Numeric@None
  456. \fIexpr\fP != \fIexpr\fP@Not equal to@Numeric@None
  457. \fIexpr\fP == \fIexpr\fP@Equal to@Numeric@None
  458. \fIexpr\fP > \fIexpr\fP@Greater than@Numeric@None
  459. \fIexpr\fP >= \fIexpr\fP@Greater than or equal to@Numeric@None
  460. _
  461. \fIexpr\fP ~ \fIexpr\fP@ERE match@Numeric@None
  462. \fIexpr\fP !~ \fIexpr\fP@ERE non-match@Numeric@None
  463. _
  464. \fIexpr\fP in \fIarray\fP@Array membership@Numeric@Left
  465. ( \fIindex\fP ) in \fIarray\fP@Multi-dimension array@Numeric@Left
  466. @membership
  467. _
  468. \fIexpr\fP && \fIexpr\fP@Logical AND@Numeric@Left
  469. _
  470. \fIexpr\fP || \fIexpr\fP@Logical OR@Numeric@Left
  471. _
  472. \fIexpr1\fP ? \fIexpr2\fP : \fIexpr3\fP@Conditional expression@Type of selected@Right
  473. @@\fIexpr2\fP or \fIexpr3\fP
  474. _
  475. lvalue ^= \fIexpr\fP@Exponentiation assignment@Numeric@Right
  476. lvalue %= \fIexpr\fP@Modulus assignment@Numeric@Right
  477. lvalue *= \fIexpr\fP@Multiplication assignment@Numeric@Right
  478. lvalue /= \fIexpr\fP@Division assignment@Numeric@Right
  479. lvalue += \fIexpr\fP@Addition assignment@Numeric@Right
  480. lvalue \-= \fIexpr\fP@Subtraction assignment@Numeric@Right
  481. lvalue = \fIexpr\fP@Assignment@Type of \fIexpr\fP@Right
  482. .TE
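A few rows of the precedence table can be checked with a sketch like the following. The function name `precedence_demo` is hypothetical and the expressions are chosen only for illustration.

```shell
# Hypothetical helper: each line exercises one precedence or associativity rule.
precedence_demo() {
    awk 'BEGIN {
        print 2 ^ 3 ^ 2        # right associative: 2 ^ (3 ^ 2)
        print 2 + 3 * 4        # multiplication binds tighter than addition
        print 1 " " 2 + 3      # addition binds tighter than concatenation
    }'
}

precedence_demo
# prints:
#   512
#   14
#   1 5
```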
  483. .P
  484. Each expression shall have either a string value, a numeric value, or
  485. both. Except as stated for specific contexts, the value of an expression
  486. shall be implicitly converted to the type needed for the context in which
  487. it is used. A string value shall be converted to a numeric value either by
  488. the equivalent of the following calls to functions defined by the ISO\ C standard:
  489. .sp
  490. .RS 4
  491. .nf
  492. setlocale(LC_NUMERIC, "");
  493. \fInumeric_value\fR = atof(\fIstring_value\fR);
  494. .fi
  495. .P
  496. .RE
  497. .P
  498. or by converting the initial portion of the string to type
  499. .BR double
  500. representation as follows:
  501. .sp
  502. .RS
  503. The input string is decomposed into two parts: an initial, possibly empty,
  504. sequence of white-space characters (as specified by
  505. \fIisspace\fR())
  506. and a subject sequence interpreted as a floating-point constant.
  507. .P
  508. The expected form of the subject sequence is an optional
  509. .BR '+'
  510. or
  511. .BR '\-'
  512. sign, then a non-empty sequence of digits optionally containing a
  513. <period>,
  514. then an optional exponent part. An exponent part consists of
  515. .BR 'e'
  516. or
  517. .BR 'E' ,
  518. followed by an optional sign, followed by one or more decimal digits.
  519. .P
  520. The sequence starting with the first digit or the
  521. <period>
  522. (whichever occurs first) is interpreted as a floating constant of the
  523. C language, and if neither an exponent part nor a
  524. <period>
  525. appears, a
  526. <period>
  527. is assumed to follow the last digit in the string. If the subject
  528. sequence begins with a
  529. <hyphen-minus>,
  530. the value resulting from the conversion is negated.
  531. .RE
  532. .P
  533. A numeric value that is exactly equal to the value of an integer (see
  534. .IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard")
  535. shall be converted to a string by the equivalent of a call to the
  536. .BR sprintf
  537. function (see
  538. .IR "String Functions")
  539. with the string
  540. .BR \(dq%d\(dq
  541. as the
  542. .IR fmt
  543. argument and the numeric value being converted as the first and only
  544. .IR expr
  545. argument. Any other numeric value shall be converted to a string by the
  546. equivalent of a call to the
  547. .BR sprintf
  548. function with the value of the variable
  549. .BR CONVFMT
  550. as the
  551. .IR fmt
  552. argument and the numeric value being converted as the first and only
  553. .IR expr
  554. argument. The result of the conversion is unspecified if the value of
  555. .BR CONVFMT
  556. is not a floating-point format specification. This volume of POSIX.1\(hy2017 specifies no
  557. explicit conversions between numbers and strings. An application can
  558. force an expression to be treated as a number by adding zero to it, or
  559. can force it to be treated as a string by concatenating the null string
  560. (\c
  561. .BR \(dq\^\(dq )
  562. to it.
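The conversion rules above can be sketched as follows, assuming a hypothetical helper `convert_demo`. Concatenating the null string forces a string conversion, and adding zero forces a numeric one.

```shell
# Hypothetical helper: number-to-string and string-to-number conversions.
convert_demo() {
    awk 'BEGIN {
        CONVFMT = "%.2f"
        a = 17;      s = a "";  print s    # integer value: converted as with "%d"
        b = 3.14159; t = b "";  print t    # non-integer value: converted using CONVFMT
        print "3x" + 0                     # the leading numeric portion is used
        print "abc" + 0                    # no numeric portion: 0
    }'
}

convert_demo
# prints:
#   17
#   3.14
#   3
#   0
```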
  563. .P
  564. A string value shall be considered a
  565. .IR "numeric string"
  566. if it comes from one of the following:
  567. .IP " 1." 4
  568. Field variables
  569. .IP " 2." 4
  570. Input from the
  571. \fIgetline\fR()
  572. function
  573. .IP " 3." 4
  574. .BR FILENAME
  575. .IP " 4." 4
  576. .BR ARGV
  577. array elements
  578. .IP " 5." 4
  579. .BR ENVIRON
  580. array elements
  581. .IP " 6." 4
  582. Array elements created by the
  583. \fIsplit\fR()
  584. function
  585. .IP " 7." 4
  586. A command line variable assignment
  587. .IP " 8." 4
  588. Variable assignment from another numeric string variable
  589. .P
  590. and an implementation-dependent condition corresponding to either
  591. case (a) or (b) below is met.
  592. .IP " a." 4
  593. After the equivalent of the following calls to functions defined by
  594. the ISO\ C standard,
  595. .IR string_value_end
  596. would differ from
  597. .IR string_value ,
  598. and any characters before the terminating null character in
  599. .IR string_value_end
  600. would be
  601. <blank>
  602. characters:
  603. .RS 4
  604. .sp
  605. .RS 4
  606. .nf
  607. char *string_value_end;
  608. setlocale(LC_NUMERIC, "");
  609. numeric_value = strtod (string_value, &string_value_end);
  610. .fi
  611. .P
  612. .RE
  613. .RE
  614. .IP " b." 4
  615. After all the following conversions have been applied, the resulting
  616. string would lexically be recognized as a
  617. .BR NUMBER
  618. token as described by the lexical conventions in
  619. .IR "Grammar":
  620. .RS 4
  621. .IP -- 4
  622. All leading and trailing
  623. <blank>
  624. characters are discarded.
  625. .IP -- 4
  626. If the first non-\c
  627. <blank>
  628. is
  629. .BR '\(pl'
  630. or
  631. .BR '\-' ,
  632. it is discarded.
  633. .IP -- 4
  634. Each occurrence of the decimal point character from the current locale
  635. is changed to a
  636. <period>.
  637. .RE
  638. In case (a) the numeric value of the
  639. .IR "numeric string"
  640. shall be the value that would be returned by the
  641. \fIstrtod\fR()
  642. call. In case (b) if the first non-\c
  643. <blank>
  644. is
  645. .BR '\-' ,
  646. the numeric value of the
  647. .IR "numeric string"
  648. shall be the negation of the numeric value of the recognized
  649. .BR NUMBER
  650. token; otherwise, the numeric value of the
  651. .IR "numeric string"
  652. shall be the numeric value of the recognized
  653. .BR NUMBER
  654. token. Whether or not a string is a
  655. .IR "numeric string"
  656. shall be relevant only in contexts where that term is used in this
  657. section.
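As a rough illustration of why the numeric string distinction matters: field variables that look numeric compare as numbers, while string constants with the same text compare as strings. The helper names and input are hypothetical.

```shell
# Fields come from input, so "1.0" and "1" are numeric strings.
fields_equal()    { awk '{ print ($1 == $2) }'; }

# String constants are never numeric strings.
constants_equal() { awk 'BEGIN { print ("1.0" == "1") }'; }

printf '1.0 1\n' | fields_equal   # prints 1
constants_equal                   # prints 0
```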
  658. .P
  659. When an expression is used in a Boolean context, if it has a numeric
  660. value, a value of zero shall be treated as false and any other value
  661. shall be treated as true. Otherwise, a string value of the null string
  662. shall be treated as false and any other value shall be treated as true.
  663. A Boolean context shall be one of the following:
  664. .IP " *" 4
  665. The first subexpression of a conditional expression
  666. .IP " *" 4
  667. An expression operated on by logical NOT, logical AND, or logical OR
  668. .IP " *" 4
  669. The second expression of a
  670. .BR for
  671. statement
  672. .IP " *" 4
  673. The expression of an
  674. .BR if
  675. statement
  676. .IP " *" 4
  677. The expression of the
  678. .BR while
  679. clause in either a
  680. .BR while
  681. or
  682. .BR do .\|.\|.\c
  683. .BR while
  684. statement
  685. .IP " *" 4
  686. An expression used as a pattern (as in Overall Program Structure)
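The truth rules for Boolean contexts can be sketched with the conditional operator, whose first subexpression is one such context. The helper `truthiness` and the variable `unset_var` are hypothetical names.

```shell
# Hypothetical helper: evaluate several values in a Boolean context.
truthiness() {
    awk 'BEGIN {
        print (0    ? "true" : "false")         # numeric zero
        print (0.5  ? "true" : "false")         # any other number
        print (""   ? "true" : "false")         # null string
        print ("a"  ? "true" : "false")         # non-null string
        print (unset_var ? "true" : "false")    # uninitialized value
    }'
}

truthiness
# prints:
#   false
#   true
#   false
#   true
#   false
```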
  687. .P
  688. All arithmetic shall follow the semantics of floating-point arithmetic as
  689. specified by the ISO\ C standard (see
  690. .IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard").
  691. .P
  692. The value of the expression:
  693. .sp
  694. .RS 4
  695. .nf
  696. \fIexpr1\fR \(ha \fIexpr2\fR
  697. .fi
  698. .P
  699. .RE
  700. .P
  701. shall be equivalent to the value returned by the ISO\ C standard function call:
  702. .sp
  703. .RS 4
  704. .nf
  705. \fRpow(\fIexpr1\fR, \fIexpr2\fR)
  706. .fi
  707. .P
  708. .RE
  709. .P
  710. The expression:
  711. .sp
  712. .RS 4
  713. .nf
  714. lvalue \(ha= \fIexpr\fR
  715. .fi
  716. .P
  717. .RE
  718. .P
  719. shall be equivalent to the ISO\ C standard expression:
  720. .sp
  721. .RS 4
  722. .nf
  723. lvalue = pow(lvalue, \fIexpr\fR)
  724. .fi
  725. .P
  726. .RE
  727. .P
  728. except that lvalue shall be evaluated only once. The value of the
  729. expression:
  730. .sp
  731. .RS 4
  732. .nf
  733. \fIexpr1\fR % \fIexpr2\fR
  734. .fi
  735. .P
  736. .RE
  737. .P
  738. shall be equivalent to the value returned by the ISO\ C standard function call:
  739. .sp
  740. .RS 4
  741. .nf
  742. fmod(\fIexpr1\fR, \fIexpr2\fR)
  743. .fi
  744. .P
  745. .RE
  746. .P
  747. The expression:
  748. .sp
  749. .RS 4
  750. .nf
  751. lvalue %= \fIexpr\fR
  752. .fi
  753. .P
  754. .RE
  755. .P
  756. shall be equivalent to the ISO\ C standard expression:
  757. .sp
  758. .RS 4
  759. .nf
  760. lvalue = fmod(lvalue, \fIexpr\fR)
  761. .fi
  762. .P
  763. .RE
  764. .P
  765. except that lvalue shall be evaluated only once.
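A brief sketch of the `pow()` and `fmod()` equivalences stated above, using a hypothetical helper `arith_demo`. Note that the result of `%` takes the sign of the dividend and that operands need not be integers.

```shell
# Hypothetical helper: exponentiation and modulus examples.
arith_demo() {
    awk 'BEGIN {
        print 2 ^ 10              # pow(2, 10)
        print 7 % 3               # fmod(7, 3)
        y = -7; print y % 3       # fmod(-7, 3): sign follows the dividend
        print 5.5 % 2             # fmod(5.5, 2)
        x = 2; x ^= 3; print x    # x = pow(x, 3)
    }'
}

arith_demo
# prints:
#   1024
#   1
#   -1
#   1.5
#   8
```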
  766. .P
  767. Variables and fields shall be set by the assignment statement:
  768. .sp
  769. .RS 4
  770. .nf
  771. lvalue = \fIexpression\fR
  772. .fi
  773. .P
  774. .RE
  775. .P
  776. and the type of
  777. .IR expression
  778. shall determine the resulting variable type. The assignment includes
  779. the arithmetic assignments (\c
  780. .BR \(dq+=\(dq ,
  781. .BR \(dq\-=\(dq ,
  782. .BR \(dq*=\(dq ,
  783. .BR \(dq/=\(dq ,
  784. .BR \(dq%=\(dq ,
  785. .BR \(dq\(ha=\(dq ,
  786. .BR \(dq++\(dq ,
  787. .BR \(dq\-\-\(dq )
  788. all of which shall produce a numeric result. The left-hand side of an
  789. assignment and the target of increment and decrement operators can be
  790. one of a variable, an array with index, or a field selector.
  791. .P
  792. The
  793. .IR awk
  794. language supplies arrays that are used for storing numbers or strings.
  795. Arrays need not be declared. They shall initially be empty, and their
  796. sizes shall change dynamically. The subscripts, or element identifiers,
  797. are strings, providing a type of associative array capability. An array
  798. name followed by a subscript within square brackets can be used as an
  799. lvalue and thus as an expression, as described in the grammar; see
  800. .IR "Grammar".
  801. Unsubscripted array names can be used in only the following contexts:
  802. .IP " *" 4
  803. A parameter in a function definition or function call
  804. .IP " *" 4
  805. The
  806. .BR NAME
  807. token following any use of the keyword
  808. .BR in
  809. as specified in the grammar (see
  810. .IR "Grammar");
  811. if the name used in this context is not an array name, the behavior is
  812. undefined
  813. .P
  814. A valid array
  815. .IR index
  816. shall consist of one or more
  817. <comma>-separated
  818. expressions, similar to the way in which multi-dimensional arrays are
  819. indexed in some programming languages. Because
  820. .IR awk
  821. arrays are really one-dimensional, such a
  822. <comma>-separated
  823. list shall be converted to a single string by concatenating the string
  824. values of the separate expressions, each separated from the other by
  825. the value of the
  826. .BR SUBSEP
  827. variable. Thus, the following two index operations shall be
  828. equivalent:
  829. .sp
  830. .RS 4
  831. .nf
  832. \fIvar\fB[\fIexpr1\fR, \fIexpr2\fR, ... \fIexprn\fB]\fR
  833. .P
  834. \fIvar\fB[\fIexpr1\fR SUBSEP \fIexpr2\fR SUBSEP ... \fRSUBSEP \fIexprn\fB]\fR
  835. .fi
  836. .P
  837. .RE
  838. .P
  839. The application shall ensure that a multi-dimensioned
  840. .IR index
  841. used with the
  842. .BR in
  843. operator is parenthesized. The
  844. .BR in
  845. operator, which tests for the existence of a particular array element,
  846. shall not cause that element to exist. Any other reference to a
  847. nonexistent array element shall automatically create it.
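The array rules above can be sketched as follows. The helper `array_demo` and the array names `a` and `b` are hypothetical. Element counts are obtained with a `for (k in b)` loop so that the sketch uses only constructs described in this page.

```shell
# Hypothetical helper: SUBSEP-joined indices and the side effects of "in" versus a reference.
array_demo() {
    awk 'BEGIN {
        a["x", 1] = 5
        key = "x" SUBSEP 1
        print (key in a)                          # same element as a["x", 1]
        print (("x", 1) in a)                     # parenthesized multi-dimension index

        if ("missing" in b) print "unexpected"    # "in" does not create the element
        n = 0; for (k in b) n++; print n

        tmp = b["missing"]                        # any other reference creates it
        n = 0; for (k in b) n++; print n
    }'
}

array_demo
# prints:
#   1
#   1
#   0
#   1
```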
  848. .P
  849. Comparisons (with the
  850. .BR '<' ,
  851. .BR \(dq<=\(dq ,
  852. .BR \(dq!=\(dq ,
  853. .BR \(dq==\(dq ,
  854. .BR '>' ,
  855. and
  856. .BR \(dq>=\(dq
  857. operators) shall be made numerically if both operands are numeric, if
  858. one is numeric and the other has a string value that is a numeric
  859. string, or if one is numeric and the other has the uninitialized value.
  860. Otherwise, operands shall be converted to strings as required and a
  861. string comparison shall be made as follows:
  862. .IP " *" 4
  863. For the
  864. .BR \(dq!=\(dq
  865. and
  866. .BR \(dq==\(dq
  867. operators, the strings should be compared to check if they are
  868. identical but may be compared using the locale-specific collation
  869. sequence to check if they collate equally.
  870. .IP " *" 4
  871. For the other operators, the strings shall be compared using the
  872. locale-specific collation sequence.
  873. .P
  874. The value of the comparison expression shall be 1 if the relation is
  875. true, or 0 if the relation is false.
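The difference between numeric and string comparison can be sketched as below. The helper `compare_demo` is hypothetical; `LC_ALL=C` is set as an assumption so that the string comparison uses a predictable collation sequence.

```shell
# Hypothetical helper: the same two fields compared numerically, then as strings.
compare_demo() {
    LC_ALL=C awk '{
        print ($1 < $2)          # both fields are numeric strings: numeric comparison
        print ($1 "" < $2 "")    # concatenation yields plain strings: string comparison
    }'
}

printf '10 9\n' | compare_demo
# prints:
#   0
#   1
```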
  876. .SS "Variables and Special Variables"
  877. .P
  878. Variables can be used in an
  879. .IR awk
  880. program by referencing them. With the exception of function parameters
  881. (see
  882. .IR "User-Defined Functions"),
  883. they are not explicitly declared. Function parameter names shall be
  884. local to the function; all other variable names shall be global. The
  885. same name shall not be used as both a function parameter name and as
  886. the name of a function or a special
  887. .IR awk
  888. variable. The same name shall not be used both as a variable name with
  889. global scope and as the name of a function. The same name shall not be
  890. used within the same scope both as a scalar variable and as an array.
  891. Uninitialized variables, including scalar variables, array elements,
  892. and field variables, shall have an uninitialized value. An
  893. uninitialized value shall have both a numeric value of zero and a
  894. string value of the empty string. Evaluation of variables with an
  895. uninitialized value, to either string or numeric, shall be determined
  896. by the context in which they are used.
  897. .P
  898. Field variables shall be designated by a
  899. .BR '$'
  900. followed by a number or numerical expression. The effect of the field
  901. number
  902. .IR expression
  903. evaluating to anything other than a non-negative integer is
  904. unspecified; uninitialized variables or string values need not be
  905. converted to numeric values in this context. New field variables can be
  906. created by assigning a value to them. References to nonexistent fields
  907. (that is, fields after $\fBNF\fP) shall evaluate to the uninitialized
  908. value. Such references shall not create new fields. However, assigning
  909. to a nonexistent field (for example, $(\fBNF\fP+2)=5) shall increase
  910. the value of
  911. .BR NF ;
  912. create any intervening fields with the uninitialized value; and cause
  913. the value of $0 to be recomputed, with the fields being separated by
  914. the value of
  915. .BR OFS .
  916. Each field variable shall have a string value or an uninitialized value
  917. when created. A field variable created from $0 using
  918. .BR FS
  919. shall have the uninitialized value if it does not contain any
  920. characters. If appropriate, the
  921. field variable shall be considered a numeric string (see
  922. .IR "Expressions in awk").
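The behavior of nonexistent fields can be sketched as follows, with hypothetical helper names and sample input. Reading a field past `NF` leaves the record alone, while assigning to one extends the record and rebuilds `$0` using `OFS`.

```shell
# Hypothetical helper: a reference beyond NF does not create new fields.
beyond_nf() {
    awk '{ x = $(NF + 1); print NF }'
}

# Hypothetical helper: assigning beyond NF raises NF and creates the intervening field.
extend_record() {
    awk '{ $(NF + 2) = "d"; print NF; print $0 }'
}

# Hypothetical helper: any field assignment recomputes $0 with OFS between fields.
rejoin() {
    awk 'BEGIN { OFS = "-" } { $1 = $1; print $0 }'
}

printf 'a b\n' | beyond_nf        # prints "2"
printf 'a b\n' | extend_record    # prints "4", then "a b  d" (field 3 is empty)
printf 'a b c\n' | rejoin         # prints "a-b-c"
```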
  923. .P
  924. Implementations shall support the following other special variables
  925. that are set by
  926. .IR awk :
  927. .IP "\fBARGC\fR" 10
  928. The number of elements in the
  929. .BR ARGV
  930. array.
  931. .IP "\fBARGV\fR" 10
  932. An array of command line arguments, excluding options and the
  933. .IR program
  934. argument, numbered from zero to
  935. .BR ARGC \-1.
  936. .RS 10
  937. .P
  938. The arguments in
  939. .BR ARGV
  940. can be modified or added to;
  941. .BR ARGC
  942. can be altered. As each input file ends,
  943. .IR awk
  944. shall treat the next non-null element of
  945. .BR ARGV ,
  946. up to the current value of
  947. .BR ARGC \-1,
  948. inclusive, as the name of the next input file. Thus, setting an element
  949. of
  950. .BR ARGV
  951. to null means that it shall not be treated as an input file. The name
  952. .BR '\-'
  953. indicates the standard input. If an argument matches the format of an
  954. .IR assignment
  955. operand, this argument shall be treated as an
  956. .IR assignment
  957. rather than a
  958. .IR file
  959. argument.
  960. .RE
  961. .IP "\fBCONVFMT\fR" 10
  962. The
  963. .BR printf
  964. format for converting numbers to strings (except for output statements,
  965. where
  966. .BR OFMT
  967. is used);
  968. .BR \(dq%.6g\(dq
  969. by default.
  970. .IP "\fBENVIRON\fR" 10
  971. An array representing the value of the environment, as described in the
  972. .IR exec
  973. functions defined in the System Interfaces volume of POSIX.1\(hy2017. The indices of the array shall be
  974. strings consisting of the names of the environment variables, and the
  975. value of each array element shall be a string consisting of the value
  976. of that variable. If appropriate, the environment variable shall be
  977. considered a
  978. .IR "numeric string"
  979. (see
  980. .IR "Expressions in awk");
  981. the array element shall also have its numeric value.
  982. .RS 10
  983. .P
  984. In all cases where the behavior of
  985. .IR awk
  986. is affected by environment variables (including the environment of any
  987. commands that
  988. .IR awk
  989. executes via the
  990. .BR system
  991. function or via pipeline redirections with the
  992. .BR print
  993. statement, the
  994. .BR printf
  995. statement, or the
  996. .BR getline
  997. function), the environment used shall be the environment at the time
  998. .IR awk
  999. began executing; it is implementation-defined whether any
  1000. modification of
  1001. .BR ENVIRON
  1002. affects this environment.
  1003. .RE
  1004. .IP "\fBFILENAME\fR" 10
  1005. A pathname of the current input file. Inside a
  1006. .BR BEGIN
  1007. action the value is undefined. Inside an
  1008. .BR END
  1009. action the value shall be the name of the last input file processed.
  1010. .IP "\fBFNR\fR" 10
  1011. The ordinal number of the current record in the current file. Inside a
  1012. .BR BEGIN
  1013. action the value shall be zero. Inside an
  1014. .BR END
  1015. action the value shall be the number of the last record processed in
  1016. the last file processed.
  1017. .IP "\fBFS\fR" 10
  1018. Input field separator regular expression; a
  1019. <space>
  1020. by default.
  1021. .IP "\fBNF\fR" 10
  1022. The number of fields in the current record. Inside a
  1023. .BR BEGIN
  1024. action, the use of
  1025. .BR NF
  1026. is undefined unless a
  1027. .BR getline
  1028. function without a
  1029. .IR var
  1030. argument is executed previously. Inside an
  1031. .BR END
  1032. action,
  1033. .BR NF
  1034. shall retain the value it had for the last record read, unless a
  1035. subsequent, redirected,
  1036. .BR getline
  1037. function without a
  1038. .IR var
  1039. argument is performed prior to entering the
  1040. .BR END
  1041. action.
  1042. .IP "\fBNR\fR" 10
  1043. The ordinal number of the current record from the start of input.
  1044. Inside a
  1045. .BR BEGIN
  1046. action the value shall be zero. Inside an
  1047. .BR END
  1048. action the value shall be the number of the last record processed.
  1049. .IP "\fBOFMT\fR" 10
  1050. The
  1051. .BR printf
  1052. format for converting numbers to strings in output statements (see
  1053. .IR "Output Statements");
  1054. .BR \(dq%.6g\(dq
  1055. by default. The result of the conversion is unspecified if the value of
  1056. .BR OFMT
  1057. is not a floating-point format specification.
  1058. .IP "\fBOFS\fR" 10
  1059. The
  1060. .BR print
  1061. statement output field separator;
  1062. <space>
  1063. by default.
  1064. .IP "\fBORS\fR" 10
  1065. The
  1066. .BR print
  1067. statement output record separator; a
  1068. <newline>
  1069. by default.
  1070. .IP "\fBRLENGTH\fR" 10
  1071. The length of the string matched by the
  1072. .BR match
  1073. function.
  1074. .IP "\fBRS\fR" 10
  1075. The first character of the string value of
  1076. .BR RS
  1077. shall be the input record separator; a
  1078. <newline>
  1079. by default. If
  1080. .BR RS
  1081. contains more than one character, the results are unspecified. If
  1082. .BR RS
  1083. is null, then records are separated by sequences consisting of a
  1084. <newline>
  1085. plus one or more blank lines, leading or trailing blank lines shall not
  1086. result in empty records at the beginning or end of the input, and a
  1087. <newline>
  1088. shall always be a field separator, no matter what the value of
  1089. .BR FS
  1090. is.
  1091. .IP "\fBRSTART\fR" 10
  1092. The starting position of the string matched by the
  1093. .BR match
  1094. function, numbering from 1. This shall always be equivalent to the
  1095. return value of the
  1096. .BR match
  1097. function.
  1098. .IP "\fBSUBSEP\fR" 10
  1099. The subscript separator string for multi-dimensional arrays; the
  1100. default value is implementation-defined.
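.P
The following sketch illustrates two of the variables above under assumed
sample input. The first command sets
.BR RS
to null so that blank lines separate records; the second shows
.BR CONVFMT
governing number-to-string conversion in an expression while
.BR OFMT
governs
.BR print .
The format values and the variable names
.IR x
and
.IR s
are chosen only for illustration.
.sp
.RS 4
.nf
```shell
# Null RS: blank-line-separated records; leading and trailing blank
# lines produce no empty records, and newline separates fields.
awk 'BEGIN { RS = "" } { print NR ": " NF }' <<EOF

a b
c


d e

EOF
# prints 1: 3, then 2: 2

# CONVFMT applies to the concatenation, OFMT to the print statement.
awk 'BEGIN { CONVFMT = "%.2g"; OFMT = "%.4g"; x = 3.14159265; s = x ""; print s; print x }'
# prints 3.1, then 3.142
```
.fi
.P
.RE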
  1101. .SS "Regular Expressions"
  1102. .P
  1103. The
  1104. .IR awk
  1105. utility shall make use of the extended regular expression notation
  1106. (see the Base Definitions volume of POSIX.1\(hy2017,
  1107. .IR "Section 9.4" ", " "Extended Regular Expressions")
  1108. except that it shall allow the use of C-language conventions
  1109. for escaping special characters within the EREs, as specified in the
  1110. table in the Base Definitions volume of POSIX.1\(hy2017,
  1111. .IR "Chapter 5" ", " "File Format Notation"
  1112. (\c
  1113. .BR '\e\e' ,
  1114. .BR '\ea' ,
  1115. .BR '\eb' ,
  1116. .BR '\ef' ,
  1117. .BR '\en' ,
  1118. .BR '\er' ,
  1119. .BR '\et' ,
  1120. .BR '\ev' )
  1121. and the following table; these escape sequences shall be recognized
  1122. both inside and outside bracket expressions. Note that records need not
  1123. be separated by
  1124. <newline>
  1125. characters and string constants can contain
  1126. <newline>
  1127. characters, so even the
  1128. .BR \(dq\en\(dq
  1129. sequence is valid in
  1130. .IR awk
  1131. EREs. Using a
  1132. <slash>
  1133. character within an ERE requires the escaping shown in the following
  1134. table.
  1135. .br
  1136. .sp
  1137. .ce 1
  1138. \fBTable 4-2: Escape Sequences in \fIawk\fP\fR
  1139. .ad l
  1140. .TS
  1141. center tab(@) box;
  1142. cB | cB | cB
  1143. cB | cB | cB
  1144. lf5 | lw(34) | lw(34).
  1145. Escape
  1146. Sequence@Description@Meaning
  1147. _
  1148. \e"@T{
  1149. <backslash> <quotation-mark>
  1150. T}@T{
  1151. <quotation-mark> character
  1152. T}
  1153. _
  1154. \e/@T{
  1155. <backslash> <slash>
  1156. T}@T{
  1157. <slash> character
  1158. T}
  1159. _
  1160. \eddd@T{
  1161. A
  1162. <backslash>
  1163. character followed by the longest sequence of one, two, or
  1164. three octal-digit characters (01234567). If all of the digits are 0
  1165. (that is, representation of the NUL character), the behavior is
  1166. undefined.
  1167. T}@T{
  1168. The character whose encoding is represented by the one, two, or
  1169. three-digit octal integer. Multi-byte characters require
  1170. multiple, concatenated escape sequences of this type, including the
  1171. leading
  1172. <backslash>
  1173. for each byte.
  1174. T}
  1175. _
  1176. \ec@T{
  1177. A
  1178. <backslash>
  1179. character followed by any character not described in this
  1180. table or in the table in the Base Definitions volume of POSIX.1\(hy2017,
  1181. .IR "Chapter 5" ", " "File Format Notation"
  1182. (\c
  1183. .BR '\e\e' ,
  1184. .BR '\ea' ,
  1185. .BR '\eb' ,
  1186. .BR '\ef' ,
  1187. .BR '\en' ,
  1188. .BR '\er' ,
  1189. .BR '\et' ,
  1190. .BR '\ev' ).
  1191. T}@Undefined
  1192. .TE
  1193. .ad b
  1194. .P
  1195. A regular expression can be matched against a specific field or string
  1196. by using one of the two regular expression matching operators,
  1197. .BR '\(ti'
  1198. and
  1199. .BR \(dq!\(ti\(dq .
  1200. These operators shall interpret their right-hand operand as a regular
  1201. expression and their left-hand operand as a string. If the regular
  1202. expression matches the string, the
  1203. .BR '\(ti'
  1204. expression shall evaluate to a value of 1, and the
  1205. .BR \(dq!\(ti\(dq
  1206. expression shall evaluate to a value of 0. (The regular expression
  1207. matching operation is as defined by the term matched in the Base Definitions volume of POSIX.1\(hy2017,
  1208. .IR "Section 9.1" ", " "Regular Expression Definitions",
  1209. where a match occurs on any part of the string unless the regular
  1210. expression is limited with the
  1211. <circumflex>
  1212. or
  1213. <dollar-sign>
  1214. special characters.) If the regular expression does not match the
  1215. string, the
  1216. .BR '\(ti'
  1217. expression shall evaluate to a value of 0, and the
  1218. .BR \(dq!\(ti\(dq
  1219. expression shall evaluate to a value of 1. If the right-hand operand is
  1220. any expression other than the lexical token
  1221. .BR ERE ,
  1222. the string value of the expression shall be interpreted as an extended
  1223. regular expression, including the escape conventions described above.
  1224. Note that these same escape conventions shall also be applied in
  1225. determining the value of a string literal (the lexical token
  1226. .BR STRING ),
  1227. and thus shall be applied a second time when a string literal is used
  1228. in this context.
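.P
A short sketch of the matching operators, assuming the sample string
.BR \(dqfoobar\(dq
and a hypothetical variable
.IR re
holding a string that is interpreted as an ERE:
.sp
.RS 4
.nf
```shell
awk 'BEGIN {
    s = "foobar"
    a = (s ~ /oba/)      # matches anywhere in the string
    b = (s !~ /oba/)     # negated form of the same test
    c = (s ~ /^oba/)     # anchored at the start, so no match
    re = "^fo+b"         # a string value used as a dynamic ERE
    d = (s ~ re)
    print a, b, c, d
}'
# prints 1 0 0 1
```
.fi
.P
.RE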
  1229. .P
  1230. When an
  1231. .BR ERE
  1232. token appears as an expression in any context other than as the
  1233. right-hand of the
  1234. .BR '\(ti'
  1235. or
  1236. .BR \(dq!\(ti\(dq
  1237. operator or as one of the built-in function arguments described below,
  1238. the value of the resulting expression shall be the equivalent of:
  1239. .sp
  1240. .RS 4
  1241. .nf
  1242. $0 \(ti /\fIere\fR/
  1243. .fi
  1244. .P
  1245. .RE
  1246. .P
  1247. The
  1248. .IR ere
  1249. argument to the
  1250. .BR gsub ,
  1251. .BR match ,
  1252. .BR sub
  1253. functions, and the
  1254. .IR fs
  1255. argument to the
  1256. .BR split
  1257. function (see
  1258. .IR "String Functions")
  1259. shall be interpreted as extended regular expressions. These can be
  1260. either
  1261. .BR ERE
  1262. tokens or arbitrary expressions, and shall be interpreted in the same
  1263. manner as the right-hand side of the
  1264. .BR '\(ti'
  1265. or
  1266. .BR \(dq!\(ti\(dq
  1267. operator.
  1268. .P
  1269. An extended regular expression can be used to separate fields by assigning
  1270. a string containing the expression to the built-in variable
  1271. .BR FS ,
  1272. either directly or as a consequence of using the
  1273. .BR \-F
  1274. .IR sepstring
  1275. option.
  1276. The default value of the
  1277. .BR FS
  1278. variable shall be a single
  1279. <space>.
  1280. The following describes
  1281. .BR FS
  1282. behavior:
  1283. .IP " 1." 4
  1284. If
  1285. .BR FS
  1286. is a null string, the behavior is unspecified.
  1287. .IP " 2." 4
  1288. If
  1289. .BR FS
  1290. is a single character:
  1291. .RS 4
  1292. .IP " a." 4
  1293. If
  1294. .BR FS
  1295. is
  1296. <space>,
  1297. skip leading and trailing
  1298. <blank>
  1299. and
  1300. <newline>
  1301. characters; fields shall be delimited by sets of one or more
  1302. <blank>
  1303. or
  1304. <newline>
  1305. characters.
  1306. .IP " b." 4
  1307. Otherwise, if
  1308. .BR FS
  1309. is any other character
  1310. .IR c ,
  1311. fields shall be delimited by each single occurrence of
  1312. .IR c .
  1313. .RE
  1314. .IP " 3." 4
  1315. Otherwise, the string value of
  1316. .BR FS
  1317. shall be considered to be an extended regular expression. Each
  1318. occurrence of a sequence matching the extended regular expression shall
  1319. delimit fields.
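.P
The three defined cases of
.BR FS
behavior can be sketched as follows; the input strings are assumptions
chosen to make each rule visible:
.sp
.RS 4
.nf
```shell
# FS is <space>: leading and trailing blanks are skipped,
# and runs of blanks delimit fields.
echo '  a   b  ' | awk '{ print NF }'
# prints 2

# FS is any other single character: every occurrence delimits,
# so the empty string between the two commas is a field.
echo 'a,,b' | awk -F, '{ print NF }'
# prints 3

# FS is longer than one character: it is treated as an ERE.
echo 'a1b22c' | awk -F '[0-9]+' '{ print NF, $1 $2 $3 }'
# prints 3 abc
```
.fi
.P
.RE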
  1320. .P
  1321. Except for the
  1322. .BR '\(ti'
  1323. and
  1324. .BR \(dq!\(ti\(dq
  1325. operators, and in the
  1326. .BR gsub ,
  1327. .BR match ,
  1328. .BR split ,
  1329. and
  1330. .BR sub
  1331. built-in functions, ERE matching shall be based on input records; that
  1332. is, record separator characters (the first character of the value of
  1333. the variable
  1334. .BR RS ,
  1335. <newline>
  1336. by default) cannot be embedded in the expression, and no expression
  1337. shall match the record separator character. If the record separator is
  1338. not
  1339. <newline>,
  1340. <newline>
  1341. characters embedded in the expression can be matched. For the
  1342. .BR '\(ti'
  1343. and
  1344. .BR \(dq!\(ti\(dq
  1345. operators, and in those four built-in functions, ERE matching shall be
  1346. based on text strings; that is, any character (including
  1347. <newline>
  1348. and the record separator) can be embedded in the pattern, and an
  1349. appropriate pattern shall match any character. However, in all
  1350. .IR awk
  1351. ERE matching, the use of one or more NUL characters in the pattern,
  1352. input record, or text string produces undefined results.
  1353. .SS "Patterns"
  1354. .P
  1355. A
  1356. .IR pattern
  1357. is any valid
  1358. .IR expression ,
  1359. a range specified by two expressions separated by a comma, or one of the
  1360. two special patterns
  1361. .BR BEGIN
  1362. or
  1363. .BR END .
  1364. .SS "Special Patterns"
  1365. .P
  1366. The
  1367. .IR awk
  1368. utility shall recognize two special patterns,
  1369. .BR BEGIN
  1370. and
  1371. .BR END .
  1372. Each
  1373. .BR BEGIN
  1374. pattern shall be matched once and its associated action executed before
  1375. the first record of input is read\(emexcept possibly by use of the
  1376. .BR getline
  1377. function (see
  1378. .IR "Input/Output and General Functions")
  1379. in a prior
  1380. .BR BEGIN
  1381. action\(emand before command line assignment is done. Each
  1382. .BR END
  1383. pattern shall be matched once and its associated action executed after
  1384. the last record of input has been read. These two patterns shall have
  1385. associated actions.
  1386. .P
  1387. .BR BEGIN
  1388. and
  1389. .BR END
  1390. shall not combine with other patterns. Multiple
  1391. .BR BEGIN
  1392. and
  1393. .BR END
  1394. patterns shall be allowed. The actions associated with the
  1395. .BR BEGIN
  1396. patterns shall be executed in the order specified in the program, as
  1397. are the
  1398. .BR END
  1399. actions. An
  1400. .BR END
  1401. pattern can precede a
  1402. .BR BEGIN
  1403. pattern in a program.
  1404. .P
  1405. If an
  1406. .IR awk
  1407. program consists of only actions with the pattern
  1408. .BR BEGIN ,
  1409. and the
  1410. .BR BEGIN
  1411. action contains no
  1412. .BR getline
  1413. function,
  1414. .IR awk
  1415. shall exit without reading its input when the last statement in the
  1416. last
  1417. .BR BEGIN
  1418. action is executed. If an
  1419. .IR awk
  1420. program consists of only actions with the pattern
  1421. .BR END
  1422. or only actions with the patterns
  1423. .BR BEGIN
  1424. and
  1425. .BR END ,
  1426. the input shall be read before the statements in the
  1427. .BR END
  1428. actions are executed.
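.P
As an illustration under assumed two-line input, the program below places
its
.BR END
pattern before its
.BR BEGIN
pattern. The
.BR BEGIN
action still runs first, and because an
.BR END
pattern is present the input is read before the
.BR END
action runs:
.sp
.RS 4
.nf
```shell
awk 'END { print "end:", NR } BEGIN { print "begin:", NR }' <<EOF
x
y
EOF
# prints begin: 0, then end: 2
```
.fi
.P
.RE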
  1429. .SS "Expression Patterns"
  1430. .P
  1431. An expression pattern shall be evaluated as if it were an expression in
  1432. a Boolean context. If the result is true, the pattern shall be
  1433. considered to match, and the associated action (if any) shall be
  1434. executed. If the result is false, the action shall not be executed.
  1435. .SS "Pattern Ranges"
  1436. .P
  1437. A pattern range consists of two expressions separated by a comma; in
  1438. this case, the action shall be performed for all records between a
  1439. match of the first expression and the following match of the second
  1440. expression, inclusive. At this point, the pattern range can be repeated
  1441. starting at input records subsequent to the end of the matched range.
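.P
A sketch of a pattern range over assumed sample input; the words
.BR start
and
.BR stop
are arbitrary markers. The range matches once, then begins again at the
second
.BR start
and, with no closing match, runs to the end of the input:
.sp
.RS 4
.nf
```shell
awk '/start/, /stop/ { print NR ": " $0 }' <<EOF
a
start
b
stop
c
start
d
EOF
# prints records 2, 3, 4, 6, and 7
```
.fi
.P
.RE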
  1442. .SS "Actions"
  1443. .P
  1444. An action is a sequence of statements as shown in the grammar in
  1445. .IR "Grammar".
  1446. Any single statement can be replaced by a statement list enclosed in
  1447. curly braces. The application shall ensure that statements in a
  1448. statement list are separated by
  1449. <newline>
  1450. or
  1451. <semicolon>
  1452. characters. Statements in a statement list shall be executed sequentially
  1453. in the order that they appear.
  1454. .P
  1455. The
  1456. .IR expression
  1457. acting as the conditional in an
  1458. .BR if
  1459. statement shall be evaluated and if it is non-zero or non-null, the
  1460. following statement shall be executed; otherwise, if
  1461. .BR else
  1462. is present, the statement following the
  1463. .BR else
  1464. shall be executed.
  1465. .P
  1466. The
  1467. .BR if ,
  1468. .BR while ,
  1469. .BR do .\|.\|.\c
  1470. .BR while ,
  1471. .BR for ,
  1472. .BR break ,
  1473. and
  1474. .BR continue
  1475. statements are based on the ISO\ C standard (see
  1476. .IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard"),
  1477. except that the Boolean expressions shall be treated as described in
  1478. .IR "Expressions in awk",
  1479. and except in the case of:
  1480. .sp
  1481. .RS 4
  1482. .nf
  1483. for (\fIvariable\fR in \fIarray\fR)
  1484. .fi
  1485. .P
  1486. .RE
  1487. .P
  1488. which shall iterate, assigning each
  1489. .IR index
  1490. of
  1491. .IR array
  1492. to
  1493. .IR variable
  1494. in an unspecified order. The results of adding new elements to
  1495. .IR array
  1496. within such a
  1497. .BR for
  1498. loop are undefined. If a
  1499. .BR break
  1500. or
  1501. .BR continue
  1502. statement occurs outside of a loop, the behavior is undefined.
  1503. .P
  1504. The
  1505. .BR delete
  1506. statement shall remove an individual array element. Thus, the following
  1507. code deletes an entire array:
  1508. .sp
  1509. .RS 4
  1510. .nf
  1511. for (i in array)
  1512.     delete array[i]
  1513. .fi
  1514. .P
  1515. .RE
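.P
A runnable sketch of the element-by-element deletion described above; the
names
.IR arr ,
.IR k ,
.IR n ,
and
.IR left
are hypothetical, and the array is filled with
.BR split
only to have something to delete:
.sp
.RS 4
.nf
```shell
awk 'BEGIN {
    n = split("a b c", arr)      # arr[1], arr[2], arr[3]
    for (k in arr)
        delete arr[k]            # remove one element per iteration
    left = 0
    for (k in arr)               # count whatever remains
        left++
    print n, left
}'
# prints 3 0
```
.fi
.P
.RE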
  1516. .P
  1517. The
  1518. .BR next
  1519. statement shall cause all further processing of the current input
  1520. record to be abandoned. The behavior is undefined if a
  1521. .BR next
  1522. statement appears or is invoked in a
  1523. .BR BEGIN
  1524. or
  1525. .BR END
  1526. action.
  1527. .P
  1528. The
  1529. .BR exit
  1530. statement shall invoke all
  1531. .BR END
  1532. actions in the order in which they occur in the program source and then
  1533. terminate the program without reading further input. An
  1534. .BR exit
  1535. statement inside an
  1536. .BR END
  1537. action shall terminate the program without further execution of
  1538. .BR END
  1539. actions. If an expression is specified in an
  1540. .BR exit
  1541. statement, its numeric value shall be the exit status of
  1542. .IR awk ,
  1543. unless subsequent errors are encountered or a subsequent
  1544. .BR exit
  1545. statement with an expression is executed.
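.P
The interaction of
.BR exit
with
.BR END
actions and the exit status can be sketched as follows; the status value 3
and the message text are arbitrary:
.sp
.RS 4
.nf
```shell
# exit in a main action still invokes the END action,
# and its expression becomes the exit status of awk.
echo x | awk '{ exit 3 } END { print "END still runs" }'
echo "status=$?"
# prints END still runs, then status=3
```
.fi
.P
.RE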
  1546. .SS "Output Statements"
  1547. .P
  1548. Both
  1549. .BR print
  1550. and
  1551. .BR printf
  1552. statements shall write to standard output by default. The output shall
  1553. be written to the location specified by
  1554. .IR output_redirection
  1555. if one is supplied, as follows:
  1556. .sp
  1557. .RS 4
  1558. .nf
  1559. > \fIexpression\fR
  1560. >> \fIexpression\fR
  1561. | \fIexpression\fR
  1562. .fi
  1563. .P
  1564. .RE
  1565. .P
  1566. In all cases, the
  1567. .IR expression
  1568. shall be evaluated to produce a string that is used as a pathname
  1569. into which to write (for
  1570. .BR '>'
  1571. or
  1572. .BR \(dq>>\(dq )
  1573. or as a command to be executed (for
  1574. .BR '|' ).
  1575. Using the first two forms, if the file of that name is not currently
  1576. open, it shall be opened, creating it if necessary and, when using the
  1577. first form, truncating it. The output then shall be appended to the
  1578. file. As long as the file remains open, subsequent calls in which
  1579. .IR expression
  1580. evaluates to the same string value shall simply append output to the
  1581. file. The file remains open until the
  1582. .BR close
  1583. function (see
  1584. .IR "Input/Output and General Functions")
  1585. is called with an expression that evaluates to the same string value.
  1586. .P
  1587. The third form shall write output onto a stream piped to the input of a
  1588. command. The stream shall be created if no stream is currently open
  1589. with the value of
  1590. .IR expression
  1591. as its command name. The stream created shall be equivalent to one
  1592. created by a call to the
  1593. \fIpopen\fR()
  1594. function defined in the System Interfaces volume of POSIX.1\(hy2017 with the value of
  1595. .IR expression
  1596. as the
  1597. .IR command
  1598. argument and a value of
  1599. .IR w
  1600. as the
  1601. .IR mode
  1602. argument. As long as the stream remains open, subsequent calls in which
  1603. .IR expression
  1604. evaluates to the same string value shall write output to the existing
  1605. stream. The stream shall remain open until the
  1606. .BR close
  1607. function (see
  1608. .IR "Input/Output and General Functions")
  1609. is called with an expression that evaluates to the same string value.
  1610. At that time, the stream shall be closed as if by a call to the
  1611. \fIpclose\fR()
  1612. function defined in the System Interfaces volume of POSIX.1\(hy2017.
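.P
A minimal sketch of the piped form, assuming the
.IR sort
utility is available on the search path. Both
.BR print
statements write to the same stream because the command string is
identical, and
.BR close
with that same string ends the stream so that the sorted output appears
before the final line:
.sp
.RS 4
.nf
```shell
awk 'BEGIN {
    print "b" | "sort"       # opens the stream to sort
    print "a" | "sort"       # same string, so the same stream is reused
    close("sort")            # sort sees end of input and writes its result
    print "done"
}'
# prints a, b, done on separate lines
```
.fi
.P
.RE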
  1613. .P
  1614. As described in detail by the grammar in
  1615. .IR "Grammar",
  1616. these output statements shall take a
  1617. <comma>-separated
  1618. list of
  1619. .IR expression s
  1620. referred to in the grammar by the non-terminal symbols
  1621. .BR expr_list ,
  1622. .BR print_expr_list ,
  1623. or
  1624. .BR print_expr_list_opt .
  1625. This list is referred to here as the
  1626. .IR "expression list" ,
  1627. and each member is referred to as an
  1628. .IR "expression argument" .
  1629. .P
  1630. The
  1631. .BR print
  1632. statement shall write the value of each expression argument onto the
  1633. indicated output stream separated by the current output field separator
  1634. (see variable
  1635. .BR OFS
  1636. above), and terminated by the output record separator (see variable
  1637. .BR ORS
  1638. above). All expression arguments shall be taken as strings, being
  1639. converted if necessary; this conversion shall be as described in
  1640. .IR "Expressions in awk",
  1641. with the exception that the
  1642. .BR printf
  1643. format in
  1644. .BR OFMT
  1645. shall be used instead of the value in
  1646. .BR CONVFMT .
  1647. An empty expression list shall stand for the whole input record ($0).
  1648. .P
  1649. The
  1650. .BR printf
  1651. statement shall produce output based on a notation similar to the
  1652. File Format Notation used to describe file formats in this volume of POSIX.1\(hy2017 (see the Base Definitions volume of POSIX.1\(hy2017,
  1653. .IR "Chapter 5" ", " "File Format Notation").
  1654. Output shall be produced as specified with the first
  1655. .IR expression
  1656. argument as the string
  1657. .IR format
  1658. and subsequent
  1659. .IR expression
  1660. arguments as the strings
  1661. .IR arg1
  1662. to
  1663. .IR argn ,
  1664. inclusive, with the following exceptions:
  1665. .IP " 1." 4
  1666. The
  1667. .IR format
  1668. shall be an actual character string rather than a graphical
  1669. representation. Therefore, it cannot contain empty character
  1670. positions. The
  1671. <space>
  1672. in the
  1673. .IR format
  1674. string, in any context other than a
  1675. .IR flag
  1676. of a conversion specification, shall be treated as an ordinary
  1677. character that is copied to the output.
  1678. .IP " 2." 4
  1679. If the character set contains a
  1680. .BR '\(*D'
  1681. character and that character appears in the
  1682. .IR format
  1683. string, it shall be treated as an ordinary character that is copied to
  1684. the output.
  1685. .IP " 3." 4
  1686. The
  1687. .IR "escape sequences"
  1688. beginning with a
  1689. <backslash>
  1690. character shall be treated as sequences of ordinary characters that are
  1691. copied to the output. Note that these same sequences shall be interpreted
  1692. lexically by
  1693. .IR awk
  1694. when they appear in literal strings, but they shall not be treated
  1695. specially by the
  1696. .BR printf
  1697. statement.
  1698. .IP " 4." 4
  1699. A
  1700. .IR "field width"
  1701. or
  1702. .IR precision
  1703. can be specified as the
  1704. .BR '*'
  1705. character instead of a digit string. In this case the next argument
  1706. from the expression list shall be fetched and its numeric value taken
  1707. as the field width or precision.
  1708. .IP " 5." 4
  1709. The implementation shall not precede or follow output from the
  1710. .BR d
  1711. or
  1712. .BR u
  1713. conversion specifier characters with
  1714. <blank>
  1715. characters not specified by the
  1716. .IR format
  1717. string.
  1718. .IP " 6." 4
  1719. The implementation shall not precede output from the
  1720. .BR o
  1721. conversion specifier character with leading zeros not specified by the
  1722. .IR format
  1723. string.
  1724. .IP " 7." 4
  1725. For the
  1726. .BR c
  1727. conversion specifier character: if the argument has a numeric value, the
  1728. character whose encoding is that value shall be output. If the value is
  1729. zero or is not the encoding of any character in the character set, the
  1730. behavior is undefined. If the argument does not have a numeric value,
  1731. the first character of the string value shall be output; if the string
  1732. does not contain any characters, the behavior is undefined.
  1733. .IP " 8." 4
  1734. For each conversion specification that consumes an argument, the next
  1735. expression argument shall be evaluated. With the exception of the
  1736. .BR c
  1737. conversion specifier character, the value shall be converted (according
  1738. to the rules specified in
  1739. .IR "Expressions in awk")
  1740. to the appropriate type for the conversion specification.
  1741. .IP " 9." 4
  1742. If there are insufficient expression arguments to satisfy all the
  1743. conversion specifications in the
  1744. .IR format
  1745. string, the behavior is undefined.
  1746. .IP 10. 4
  1747. If any character sequence in the
  1748. .IR format
  1749. string begins with a
  1750. .BR '%'
  1751. character, but does not form a valid conversion specification, the
  1752. behavior is unspecified.
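.P
Several of the rules above can be seen in one
.BR printf
call. This is a sketch with arbitrary argument values; the result shown
for the numeric
.BR c
conversion assumes a character set in which 65 encodes the letter A, as in
ASCII:
.sp
.RS 4
.nf
```shell
# The * takes its field width (5) from the argument list; the first
# c conversion gets a number, the second gets a string.
awk 'BEGIN { printf "%*d|%-5s|%c%c|%.2f", 5, 42, "ab", 65, "xyz", 3.14159; print "" }'
# prints "   42|ab   |Ax|3.14" (without the quotation marks)
```
.fi
.P
.RE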
  1753. .P
  1754. Both
  1755. .BR print
  1756. and
  1757. .BR printf
  1758. can output at least
  1759. {LINE_MAX}
  1760. bytes.
  1761. .SS "Functions"
  1762. .P
  1763. The
  1764. .IR awk
  1765. language has a variety of built-in functions: arithmetic, string,
  1766. input/output, and general.
  1767. .SS "Arithmetic Functions"
  1768. .P
  1769. The arithmetic functions, except for
  1770. .BR int ,
  1771. shall be based on the ISO\ C standard (see
  1772. .IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard").
  1773. The behavior is undefined in cases where the ISO\ C standard specifies that an
  1774. error be returned or that the behavior is undefined. Although the
  1775. grammar (see
  1776. .IR "Grammar")
  1777. permits built-in functions to appear with no arguments or parentheses,
  1778. unless the argument or parentheses are indicated as optional in the
  1779. following list (by displaying them within the
  1780. .BR \(dq[]\(dq
  1781. brackets), such use is undefined.
  1782. .IP "\fBatan2\fR(\fIy\fR,\fIx\fR)" 10
  1783. Return arctangent of \fIy\fP/\fIx\fR in radians in the range
  1784. [\-\(*p,\(*p].
  1785. .IP "\fBcos\fR(\fIx\fR)" 10
  1786. Return cosine of \fIx\fP, where \fIx\fP is in radians.
  1787. .IP "\fBsin\fR(\fIx\fR)" 10
  1788. Return sine of \fIx\fP, where \fIx\fP is in radians.
  1789. .IP "\fBexp\fR(\fIx\fR)" 10
  1790. Return the exponential function of \fIx\fP.
  1791. .IP "\fBlog\fR(\fIx\fR)" 10
  1792. Return the natural logarithm of \fIx\fP.
  1793. .IP "\fBsqrt\fR(\fIx\fR)" 10
  1794. Return the square root of \fIx\fP.
  1795. .IP "\fBint\fR(\fIx\fR)" 10
  1796. Return the argument truncated to an integer. Truncation shall
  1797. be toward 0 when \fIx\fP>0.
  1798. .IP "\fBrand\fP(\|)" 10
  1799. Return a random number \fIn\fP, such that 0\(<=\fIn\fP<1.
  1800. .IP "\fBsrand\fR(\fB[\fIexpr\fB]\fR)" 10
  1801. Set the seed value for
  1802. .IR rand
  1803. to
  1804. .IR expr
  1805. or use the time of day if
  1806. .IR expr
  1807. is omitted. The previous seed value shall be returned.
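.P
A brief sketch of some arithmetic functions with arbitrary arguments. Note
that the text above specifies the direction of truncation for
.BR int
only when the argument is positive; the result shown for \-3.9 is what
common implementations produce, not a requirement stated here:
.sp
.RS 4
.nf
```shell
awk 'BEGIN { print int(3.9), int(-3.9), sqrt(16), atan2(0, -1) }'
# prints 3 -3 4 3.14159
```
.fi
.P
.RE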
  1808. .SS "String Functions"
  1809. .P
  1810. The string functions in the following list shall be supported.
  1811. Although the grammar (see
  1812. .IR "Grammar")
  1813. permits built-in functions to appear with no arguments or parentheses,
  1814. unless the argument or parentheses are indicated as optional in the
  1815. following list (by displaying them within the
  1816. .BR \(dq[]\(dq
  1817. brackets), such use is undefined.
  1818. .IP "\fBgsub\fR(\fIere\fR,\ \fIrepl\fB[\fR,\ \fIin\fB]\fR)" 10
  1819. .br
  1820. Behave like
  1821. .BR sub
  1822. (see below), except that it shall replace all occurrences of the
  1823. regular expression (like the
  1824. .IR ed
  1825. utility global substitute) in $0 or in the
  1826. .IR in
  1827. argument, when specified.
  1828. .IP "\fBindex\fR(\fIs\fR,\ \fIt\fR)" 10
  1829. Return the position, in characters, numbering from 1, in string
  1830. .IR s
  1831. where string
  1832. .IR t
  1833. first occurs, or zero if it does not occur at all.
  1834. .IP "\fBlength[\fR(\fB[\fIs\fB]\fR)\fB]\fR" 10
  1835. Return the length, in characters, of its argument taken as a string, or
  1836. of the whole record, $0, if there is no argument.
  1837. .IP "\fBmatch\fR(\fIs\fR,\ \fIere\fR)" 10
  1838. Return the position, in characters, numbering from 1, in string
  1839. .IR s
  1840. where the extended regular expression
  1841. .IR ere
  1842. occurs, or zero if it does not occur at all. RSTART shall be set to the
  1843. starting position (which is the same as the returned value), zero if no
  1844. match is found; RLENGTH shall be set to the length of the matched
  1845. string, \-1 if no match is found.
  1846. .IP "\fBsplit\fR(\fIs\fR,\ \fIa\fB[\fR,\ \fIfs\ \fB]\fR)" 10
  1847. .br
  1848. Split the string
  1849. .IR s
  1850. into array elements
  1851. .IR a [1],
  1852. .IR a [2],
  1853. \&.\|.\|.,
  1854. .IR a [ n ],
  1855. and return
  1856. .IR n .
  1857. All elements of the array shall be deleted before the split is
  1858. performed. The separation shall be done with the ERE
  1859. .IR fs
  1860. or with the field separator
  1861. .BR FS
  1862. if
  1863. .IR fs
  1864. is not given. Each array element shall have a string value when created
  1865. and, if appropriate, the array element shall be considered a numeric
  1866. string (see
  1867. .IR "Expressions in awk").
  1868. The effect of a null string as the value of
  1869. .IR fs
  1870. is unspecified.
  1871. .IP "\fBsprintf\fR(\fIfmt\fR,\ \fIexpr\fR,\ \fIexpr\fR,\ .\|.\|.)" 10
  1872. .br
  1873. Format the expressions according to the
  1874. .BR printf
  1875. format given by
  1876. .IR fmt
  1877. and return the resulting string.
  1878. .IP "\fBsub\fR(\fIere\fR,\ \fIrepl\fB[\fR,\ \fIin\ \fB]\fR)" 10
  1879. .br
  1880. Substitute the string
  1881. .IR repl
  1882. in place of the first instance of the extended regular expression
  1883. .IR ere
  1884. in string
  1885. .IR in
  1886. and return the number of substitutions. An
  1887. <ampersand>
  1888. (\c
  1889. .BR '&' )
  1890. appearing in the string
  1891. .IR repl
  1892. shall be replaced by the string from
  1893. .IR in
  1894. that matches the ERE. An
  1895. <ampersand>
  1896. preceded with a
  1897. <backslash>
  1898. shall be interpreted as the literal
  1899. <ampersand>
  1900. character. An occurrence of two consecutive
  1901. <backslash>
  1902. characters shall be interpreted as just a single literal
  1903. <backslash>
  1904. character. Any other occurrence of a
  1905. <backslash>
  1906. (for example, preceding any other character) shall be treated as a
  1907. literal
  1908. <backslash>
  1909. character. Note that if
  1910. .IR repl
  1911. is a string literal (the lexical token
  1912. .BR STRING ;
  1913. see
  1914. .IR "Grammar"),
  1915. the handling of the
  1916. <ampersand>
  1917. character occurs after any lexical processing, including any lexical
  1918. <backslash>-escape
  1919. sequence processing. If
  1920. .IR in
  1921. is specified and it is not an lvalue (see
  1922. .IR "Expressions in awk"),
  1923. the behavior is undefined. If
  1924. .IR in
  1925. is omitted,
  1926. .IR awk
  1927. shall use the current record ($0) in its place.
  1928. .IP "\fBsubstr\fR(\fIs\fR,\ \fIm\fB[\fR,\ \fIn\ \fB]\fR)" 10
  1929. .br
  1930. Return the substring of
  1931. .IR s ,
  1932. of at most
  1933. .IR n
  1934. characters, that begins at position
  1935. .IR m ,
  1936. numbering from 1. If
  1937. .IR n
  1938. is omitted, or if
  1939. .IR n
  1940. specifies more characters than are left in the string, the length of
  1941. the substring shall be limited by the length of the string
  1942. .IR s .
  1943. .IP "\fBtolower\fR(\fIs\fR)" 10
  1944. Return a string based on the string
  1945. .IR s .
  1946. Each character in
  1947. .IR s
  1948. that is an uppercase letter specified to have a
  1949. .BR tolower
  1950. mapping by the
  1951. .IR LC_CTYPE
  1952. category of the current locale shall be replaced in the returned string
  1953. by the lowercase letter specified by the mapping. Other characters in
  1954. .IR s
  1955. shall be unchanged in the returned string.
  1956. .IP "\fBtoupper\fR(\fIs\fR)" 10
  1957. Return a string based on the string
  1958. .IR s .
  1959. Each character in
  1960. .IR s
  1961. that is a lowercase letter specified to have a
  1962. .BR toupper
  1963. mapping by the
  1964. .IR LC_CTYPE
  1965. category of the current locale shall be replaced in the returned string
  1966. by the uppercase letter specified by the mapping. Other characters in
  1967. .IR s
  1968. shall be unchanged in the returned string.
  1969. .P
  1970. All of the preceding functions that take
  1971. .IR ere
  1972. as a parameter expect a pattern or a string valued expression that is a
  1973. regular expression as defined in
  1974. .IR "Regular Expressions".
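.P
The string functions described above can be exercised with a short
script. This is a minimal sketch, assuming a POSIX shell with an awk
utility on the search path; the sample record and the variable names
(s, t, parts, out) are illustrative and not part of this specification.

```shell
# Hypothetical sample record; any text would do.
out=$(echo 'foo bar foo' | awk '{
    s = $0
    n = gsub(/foo/, "[&]", s)      # replace every match; & is the matched text
    print n, s
    t = $0
    sub(/foo/, "X", t)             # replace only the first match
    print t
    print index($0, "bar"), length($0)
    if (match($0, /ba+r/))         # sets RSTART and RLENGTH on success
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    k = split("a:b:c", parts, ":")
    print k, parts[1], parts[3], toupper(parts[2])
}')
echo "$out"
```

The copies s and t are used as the third argument so that $0 itself is
left unmodified for the later calls.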
  1975. .SS "Input/Output and General Functions"
  1976. .P
  1977. The input/output and general functions are:
  1978. .IP "\fBclose\fR(\fIexpression\fR)" 10
  1979. .br
  1980. Close the file or pipe opened by a
  1981. .BR print
  1982. or
  1983. .BR printf
  1984. statement or a call to
  1985. .BR getline
  1986. with the same string-valued
  1987. .IR expression .
  1988. The limit on the number of open
  1989. .IR expression
  1990. arguments is implementation-defined. If the close was successful, the
  1991. function shall return zero; otherwise, it shall return non-zero.
  1992. .IP "\fIexpression\ |\ \fBgetline\ [\fIvar\fB]\fR" 10
  1993. .br
  1994. Read a record of input from a stream piped from the output of a
  1995. command. The stream shall be created if no stream is currently open
  1996. with the value of
  1997. .IR expression
  1998. as its command name. The stream created shall be equivalent to one
  1999. created by a call to the
  2000. \fIpopen\fR()
  2001. function with the value of
  2002. .IR expression
  2003. as the
  2004. .IR command
  2005. argument and a value of
  2006. .IR r
  2007. as the
  2008. .IR mode
  2009. argument. As long as the stream remains open, subsequent calls in which
  2010. .IR expression
  2011. evaluates to the same string value shall read subsequent records from
  2012. the stream. The stream shall remain open until the
  2013. .BR close
  2014. function is called with an expression that evaluates to the same string
  2015. value. At that time, the stream shall be closed as if by a call to the
  2016. \fIpclose\fR()
  2017. function. If
  2018. .IR var
  2019. is omitted, $0 and
  2020. .BR NF
  2021. shall be set; otherwise,
  2022. .IR var
  2023. shall be set and, if appropriate, it shall be considered a numeric
  2024. string (see
  2025. .IR "Expressions in awk").
  2026. .RS 10
  2027. .P
  2028. The
  2029. .BR getline
  2030. operator can form ambiguous constructs when there are unparenthesized
  2031. operators (including concatenate) to the left of the
  2032. .BR '|'
  2033. (to the beginning of the expression containing
  2034. .BR getline ).
  2035. In the context of the
  2036. .BR '$'
  2037. operator,
  2038. .BR '|'
  2039. shall behave as if it had a lower precedence than
  2040. .BR '$' .
  2041. The result of evaluating other operators is unspecified, and conforming
  2042. applications shall parenthesize properly all such usages.
  2043. .RE
  2044. .IP "\fBgetline\fR" 10
  2045. Set $0 to the next input record from the current input file. This form
  2046. of
  2047. .BR getline
  2048. shall set the
  2049. .BR NF ,
  2050. .BR NR ,
  2051. and
  2052. .BR FNR
  2053. variables.
  2054. .IP "\fBgetline\ \fIvar\fR" 10
  2055. Set variable
  2056. .IR var
  2057. to the next input record from the current input file and, if
  2058. appropriate,
  2059. .IR var
  2060. shall be considered a numeric string (see
  2061. .IR "Expressions in awk").
  2062. This form of
  2063. .BR getline
  2064. shall set the
  2065. .BR FNR
  2066. and
  2067. .BR NR
  2068. variables.
  2069. .IP "\fBgetline\ \fB[\fIvar\fB]\ \fR<\ \fIexpression\fR" 10
  2070. .br
  2071. Read the next record of input from a named file. The
  2072. .IR expression
  2073. shall be evaluated to produce a string that is used as a pathname.
  2074. If the file of that name is not currently open, it shall be opened. As
  2075. long as the stream remains open, subsequent calls in which
  2076. .IR expression
  2077. evaluates to the same string value shall read subsequent records from
  2078. the file. The file shall remain open until the
  2079. .BR close
  2080. function is called with an expression that evaluates to the same string
  2081. value. If
  2082. .IR var
  2083. is omitted, $0 and
  2084. .BR NF
  2085. shall be set; otherwise,
  2086. .IR var
  2087. shall be set and, if appropriate, it shall be considered a numeric
  2088. string (see
  2089. .IR "Expressions in awk").
  2090. .RS 10
  2091. .P
  2092. The
  2093. .BR getline
  2094. operator can form ambiguous constructs when there are unparenthesized
  2095. binary operators (including concatenate) to the right of the
  2096. .BR '<'
  2097. (up to the end of the expression containing the
  2098. .BR getline ).
  2099. The result of evaluating such a construct is unspecified, and conforming
  2100. applications shall parenthesize properly all such usages.
  2101. .RE
  2102. .IP "\fBsystem\fR(\fIexpression\fR)" 10
  2103. .br
  2104. Execute the command given by
  2105. .IR expression
  2106. in a manner equivalent to the
  2107. \fIsystem\fR()
  2108. function defined in the System Interfaces volume of POSIX.1\(hy2017 and return the exit status of the
  2109. command.
  2110. .P
  2111. All forms of
  2112. .BR getline
  2113. shall return 1 for successful input, zero for end-of-file, and \-1
  2114. for an error.
  2115. .P
  2116. Where strings are used as the name of a file or pipeline, the
  2117. application shall ensure that the strings are textually identical. The
  2118. terminology ``same string value'' means that strings that are merely
  2119. ``equivalent'', even those that differ only by
  2120. <space>
  2121. characters, represent different files.
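.P
The interaction between a piped getline, close, and system described
in this section might look as follows. This is a sketch only, assuming
a POSIX shell and an awk utility on the search path; the command string
and the variable names (cmd, line, first, rc) are hypothetical.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    # Parenthesize the call so the comparison applies to the result of getline.
    while ((cmd | getline line) > 0)
        n++
    status = close(cmd)            # zero when the close succeeds
    # Same string value as before, so after close the command is run again.
    cmd | getline first
    close(cmd)
    rc = system("exit 3")          # exit status of the command
    print n, status, first, rc
}')
echo "$out"
```

Testing the return value against zero with "> 0" rather than treating
it as a boolean avoids looping forever when getline returns \-1 on error.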
  2122. .SS "User-Defined Functions"
  2123. .P
  2124. The
  2125. .IR awk
  2126. language also provides user-defined functions. Such functions can be
  2127. defined as:
  2128. .sp
  2129. .RS 4
  2130. .nf
  2131. function \fIname\fR(\fB[\fIparameter\fR, ...\fB]\fR) { \fIstatements\fR }
  2132. .fi
  2133. .P
  2134. .RE
  2135. .P
  2136. A function can be referred to anywhere in an
  2137. .IR awk
  2138. program; in particular, its use can precede its definition. The scope
  2139. of a function is global.
  2140. .P
  2141. Function parameters, if present, can be either scalars or arrays; the
  2142. behavior is undefined if an array name is passed as a parameter that
  2143. the function uses as a scalar, or if a scalar expression is passed as a
  2144. parameter that the function uses as an array. Function parameters shall
  2145. be passed by value if scalar and by reference if array name.
  2146. .P
  2147. The number of parameters in the function definition need not match the
  2148. number of parameters in the function call. Excess formal parameters can
  2149. be used as local variables. If fewer arguments are supplied in a
  2150. function call than are in the function definition, the extra parameters
  2151. that are used in the function body as scalars shall evaluate to the
  2152. uninitialized value until they are otherwise initialized, and the extra
  2153. parameters that are used in the function body as arrays shall be
  2154. treated as uninitialized arrays where each element evaluates to the
  2155. uninitialized value until otherwise initialized.
  2156. .P
  2157. When invoking a function, no white space can be placed between the
  2158. function name and the opening parenthesis. Function calls can be nested
  2159. and recursive calls can be made upon functions. Upon return from any
  2160. nested or recursive function call, the values of all of the calling
  2161. function's parameters shall be unchanged, except for array parameters
  2162. passed by reference. The
  2163. .BR return
  2164. statement can be used to return a value. If a
  2165. .BR return
  2166. statement appears outside of a function definition, the behavior is
  2167. undefined.
  2168. .P
  2169. In the function definition,
  2170. <newline>
  2171. characters shall be optional before the opening brace and after the
  2172. closing brace. Function definitions can appear anywhere in the program
  2173. where a
  2174. .IR pattern-action
  2175. pair is allowed.
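.P
The parameter-passing rules above can be illustrated with a small
program. This is a minimal sketch, assuming a POSIX shell and an awk
utility on the search path; the function and variable names (fill,
fact, squares, count) are made up for the example.

```shell
out=$(awk '
# The extra formal parameter "i" serves as a local variable.
function fill(arr, n,    i) {
    for (i = 1; i <= n; i++)
        arr[i] = i * i             # arrays are passed by reference
    n = 0                          # scalars are passed by value
}
# Recursive calls are permitted.
function fact(k) {
    return (k <= 1) ? 1 : k * fact(k - 1)
}
BEGIN {
    count = 3
    fill(squares, count)
    print count, squares[3], fact(5)
}')
echo "$out"
```

The extra spacing before "i" in the parameter list is only a common
convention for marking locals; the language does not require it.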
  2176. .SS "Grammar"
  2177. .P
  2178. The grammar in this section and the lexical conventions in the
  2179. following section shall together describe the syntax for
  2180. .IR awk
  2181. programs. The general conventions for this style of grammar are
  2182. described in
  2183. .IR "Section 1.3" ", " "Grammar Conventions".
  2184. A valid program can be represented as the non-terminal symbol
  2185. .IR program
  2186. in the grammar. This formal syntax shall take precedence over the
  2187. preceding text syntax description.
  2188. .sp
  2189. .RS 4
  2190. .nf
  2191. %token NAME NUMBER STRING ERE
  2192. %token FUNC_NAME /* Name followed by \(aq(\(aq without white space. */
  2193. .P
  2194. /* Keywords */
  2195. %token Begin End
  2196. /* \(aqBEGIN\(aq \(aqEND\(aq */
  2197. .P
  2198. %token Break Continue Delete Do Else
  2199. /* \(aqbreak\(aq \(aqcontinue\(aq \(aqdelete\(aq \(aqdo\(aq \(aqelse\(aq */
  2200. .P
  2201. %token Exit For Function If In
  2202. /* \(aqexit\(aq \(aqfor\(aq \(aqfunction\(aq \(aqif\(aq \(aqin\(aq */
  2203. .P
  2204. %token Next Print Printf Return While
  2205. /* \(aqnext\(aq \(aqprint\(aq \(aqprintf\(aq \(aqreturn\(aq \(aqwhile\(aq */
  2206. .P
  2207. /* Reserved function names */
  2208. %token BUILTIN_FUNC_NAME
  2209. /* One token for the following:
  2210. * atan2 cos sin exp log sqrt int rand srand
  2211. * gsub index length match split sprintf sub
  2212. * substr tolower toupper close system
  2213. */
  2214. %token GETLINE
  2215. /* Syntactically different from other built-ins. */
  2216. .P
  2217. /* Two-character tokens. */
  2218. %token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN
  2219. /* \(aq+=\(aq \(aq-=\(aq \(aq*=\(aq \(aq/=\(aq \(aq%=\(aq \(aq\(ha=\(aq */
  2220. .P
  2221. %token OR AND NO_MATCH EQ LE GE NE INCR DECR APPEND
  2222. /* \(aq||\(aq \(aq&&\(aq \(aq!\^\(ti\(aq \(aq==\(aq \(aq<=\(aq \(aq>=\(aq \(aq!=\(aq \(aq++\(aq \(aq--\(aq \(aq>>\(aq */
  2223. .P
  2224. /* One-character tokens. */
  2225. %token \(aq{\(aq \(aq}\(aq \(aq(\(aq \(aq)\(aq \(aq[\(aq \(aq]\(aq \(aq,\(aq \(aq;\(aq NEWLINE
  2226. %token \(aq+\(aq \(aq-\(aq \(aq*\(aq \(aq%\(aq \(aq\(ha\(aq \(aq!\(aq \(aq>\(aq \(aq<\(aq \(aq|\(aq \(aq?\(aq \(aq:\(aq \(aq\(ti\(aq \(aq$\(aq \(aq=\(aq
  2227. .P
  2228. %start program
  2229. %%
  2230. .P
  2231. program : item_list
  2232. | item_list item
  2233. ;
  2234. .P
  2235. item_list : /* empty */
  2236. | item_list item terminator
  2237. ;
  2238. .P
  2239. item : action
  2240. | pattern action
  2241. | normal_pattern
  2242. | Function NAME \(aq(\(aq param_list_opt \(aq)\(aq
  2243. newline_opt action
  2244. | Function FUNC_NAME \(aq(\(aq param_list_opt \(aq)\(aq
  2245. newline_opt action
  2246. ;
  2247. .P
  2248. param_list_opt : /* empty */
  2249. | param_list
  2250. ;
  2251. .P
  2252. param_list : NAME
  2253. | param_list \(aq,\(aq NAME
  2254. ;
  2255. .P
  2256. pattern : normal_pattern
  2257. | special_pattern
  2258. ;
  2259. .P
  2260. normal_pattern : expr
  2261. | expr \(aq,\(aq newline_opt expr
  2262. ;
  2263. .P
  2264. special_pattern : Begin
  2265. | End
  2266. ;
  2267. .P
  2268. action : \(aq{\(aq newline_opt \(aq}\(aq
  2269. | \(aq{\(aq newline_opt terminated_statement_list \(aq}\(aq
  2270. | \(aq{\(aq newline_opt unterminated_statement_list \(aq}\(aq
  2271. ;
  2272. .P
  2273. terminator : terminator NEWLINE
  2274. | \(aq;\(aq
  2275. | NEWLINE
  2276. ;
  2277. .P
  2278. terminated_statement_list : terminated_statement
  2279. | terminated_statement_list terminated_statement
  2280. ;
  2281. .P
  2282. unterminated_statement_list : unterminated_statement
  2283. | terminated_statement_list unterminated_statement
  2284. ;
  2285. .P
  2286. terminated_statement : action newline_opt
  2287. | If \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement
  2288. | If \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement
  2289. Else newline_opt terminated_statement
  2290. | While \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement
  2291. | For \(aq(\(aq simple_statement_opt \(aq;\(aq
  2292. expr_opt \(aq;\(aq simple_statement_opt \(aq)\(aq newline_opt
  2293. terminated_statement
  2294. | For \(aq(\(aq NAME In NAME \(aq)\(aq newline_opt
  2295. terminated_statement
  2296. | \(aq;\(aq newline_opt
  2297. | terminatable_statement NEWLINE newline_opt
  2298. | terminatable_statement \(aq;\(aq newline_opt
  2299. ;
  2300. .P
  2301. unterminated_statement : terminatable_statement
  2302. | If \(aq(\(aq expr \(aq)\(aq newline_opt unterminated_statement
  2303. | If \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement
  2304. Else newline_opt unterminated_statement
  2305. | While \(aq(\(aq expr \(aq)\(aq newline_opt unterminated_statement
  2306. | For \(aq(\(aq simple_statement_opt \(aq;\(aq
  2307. expr_opt \(aq;\(aq simple_statement_opt \(aq)\(aq newline_opt
  2308. unterminated_statement
  2309. | For \(aq(\(aq NAME In NAME \(aq)\(aq newline_opt
  2310. unterminated_statement
  2311. ;
  2312. .P
  2313. terminatable_statement : simple_statement
  2314. | Break
  2315. | Continue
  2316. | Next
  2317. | Exit expr_opt
  2318. | Return expr_opt
  2319. | Do newline_opt terminated_statement While \(aq(\(aq expr \(aq)\(aq
  2320. ;
  2321. .P
  2322. simple_statement_opt : /* empty */
  2323. | simple_statement
  2324. ;
  2325. .P
  2326. simple_statement : Delete NAME \(aq[\(aq expr_list \(aq]\(aq
  2327. | expr
  2328. | print_statement
  2329. ;
  2330. .P
  2331. print_statement : simple_print_statement
  2332. | simple_print_statement output_redirection
  2333. ;
  2334. .P
  2335. simple_print_statement : Print print_expr_list_opt
  2336. | Print \(aq(\(aq multiple_expr_list \(aq)\(aq
  2337. | Printf print_expr_list
  2338. | Printf \(aq(\(aq multiple_expr_list \(aq)\(aq
  2339. ;
  2340. .P
  2341. output_redirection : \(aq>\(aq expr
  2342. | APPEND expr
  2343. | \(aq|\(aq expr
  2344. ;
  2345. .P
  2346. expr_list_opt : /* empty */
  2347. | expr_list
  2348. ;
  2349. .P
  2350. expr_list : expr
  2351. | multiple_expr_list
  2352. ;
  2353. .P
  2354. multiple_expr_list : expr \(aq,\(aq newline_opt expr
  2355. | multiple_expr_list \(aq,\(aq newline_opt expr
  2356. ;
  2357. .P
  2358. expr_opt : /* empty */
  2359. | expr
  2360. ;
  2361. .P
  2362. expr : unary_expr
  2363. | non_unary_expr
  2364. ;
  2365. .P
  2366. unary_expr : \(aq+\(aq expr
  2367. | \(aq-\(aq expr
  2368. | unary_expr \(aq\(ha\(aq expr
  2369. | unary_expr \(aq*\(aq expr
  2370. | unary_expr \(aq/\(aq expr
  2371. | unary_expr \(aq%\(aq expr
  2372. | unary_expr \(aq+\(aq expr
  2373. | unary_expr \(aq-\(aq expr
  2374. | unary_expr non_unary_expr
  2375. | unary_expr \(aq<\(aq expr
  2376. | unary_expr LE expr
  2377. | unary_expr NE expr
  2378. | unary_expr EQ expr
  2379. | unary_expr \(aq>\(aq expr
  2380. | unary_expr GE expr
  2381. | unary_expr \(aq\(ti\(aq expr
  2382. | unary_expr NO_MATCH expr
  2383. | unary_expr In NAME
  2384. | unary_expr AND newline_opt expr
  2385. | unary_expr OR newline_opt expr
  2386. | unary_expr \(aq?\(aq expr \(aq:\(aq expr
  2387. | unary_input_function
  2388. ;
  2389. .P
  2390. non_unary_expr : \(aq(\(aq expr \(aq)\(aq
  2391. | \(aq!\(aq expr
  2392. | non_unary_expr \(aq\(ha\(aq expr
  2393. | non_unary_expr \(aq*\(aq expr
  2394. | non_unary_expr \(aq/\(aq expr
  2395. | non_unary_expr \(aq%\(aq expr
  2396. | non_unary_expr \(aq+\(aq expr
  2397. | non_unary_expr \(aq-\(aq expr
  2398. | non_unary_expr non_unary_expr
  2399. | non_unary_expr \(aq<\(aq expr
  2400. | non_unary_expr LE expr
  2401. | non_unary_expr NE expr
  2402. | non_unary_expr EQ expr
  2403. | non_unary_expr \(aq>\(aq expr
  2404. | non_unary_expr GE expr
  2405. | non_unary_expr \(aq\(ti\(aq expr
  2406. | non_unary_expr NO_MATCH expr
  2407. | non_unary_expr In NAME
  2408. | \(aq(\(aq multiple_expr_list \(aq)\(aq In NAME
  2409. | non_unary_expr AND newline_opt expr
  2410. | non_unary_expr OR newline_opt expr
  2411. | non_unary_expr \(aq?\(aq expr \(aq:\(aq expr
  2412. | NUMBER
  2413. | STRING
  2414. | lvalue
  2415. | ERE
  2416. | lvalue INCR
  2417. | lvalue DECR
  2418. | INCR lvalue
  2419. | DECR lvalue
  2420. | lvalue POW_ASSIGN expr
  2421. | lvalue MOD_ASSIGN expr
  2422. | lvalue MUL_ASSIGN expr
  2423. | lvalue DIV_ASSIGN expr
  2424. | lvalue ADD_ASSIGN expr
  2425. | lvalue SUB_ASSIGN expr
  2426. | lvalue \(aq=\(aq expr
  2427. | FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq
  2428. /* no white space allowed before \(aq(\(aq */
  2429. | BUILTIN_FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq
  2430. | BUILTIN_FUNC_NAME
  2431. | non_unary_input_function
  2432. ;
  2433. .P
  2434. print_expr_list_opt : /* empty */
  2435. | print_expr_list
  2436. ;
  2437. .P
  2438. print_expr_list : print_expr
  2439. | print_expr_list \(aq,\(aq newline_opt print_expr
  2440. ;
  2441. .P
  2442. print_expr : unary_print_expr
  2443. | non_unary_print_expr
  2444. ;
  2445. .P
  2446. unary_print_expr : \(aq+\(aq print_expr
  2447. | \(aq-\(aq print_expr
  2448. | unary_print_expr \(aq\(ha\(aq print_expr
  2449. | unary_print_expr \(aq*\(aq print_expr
  2450. | unary_print_expr \(aq/\(aq print_expr
  2451. | unary_print_expr \(aq%\(aq print_expr
  2452. | unary_print_expr \(aq+\(aq print_expr
  2453. | unary_print_expr \(aq-\(aq print_expr
  2454. | unary_print_expr non_unary_print_expr
  2455. | unary_print_expr \(aq\(ti\(aq print_expr
  2456. | unary_print_expr NO_MATCH print_expr
  2457. | unary_print_expr In NAME
  2458. | unary_print_expr AND newline_opt print_expr
  2459. | unary_print_expr OR newline_opt print_expr
  2460. | unary_print_expr \(aq?\(aq print_expr \(aq:\(aq print_expr
  2461. ;
  2462. .P
  2463. non_unary_print_expr : \(aq(\(aq expr \(aq)\(aq
  2464. | \(aq!\(aq print_expr
  2465. | non_unary_print_expr \(aq\(ha\(aq print_expr
  2466. | non_unary_print_expr \(aq*\(aq print_expr
  2467. | non_unary_print_expr \(aq/\(aq print_expr
  2468. | non_unary_print_expr \(aq%\(aq print_expr
  2469. | non_unary_print_expr \(aq+\(aq print_expr
  2470. | non_unary_print_expr \(aq-\(aq print_expr
  2471. | non_unary_print_expr non_unary_print_expr
  2472. | non_unary_print_expr \(aq\(ti\(aq print_expr
  2473. | non_unary_print_expr NO_MATCH print_expr
  2474. | non_unary_print_expr In NAME
  2475. | \(aq(\(aq multiple_expr_list \(aq)\(aq In NAME
  2476. | non_unary_print_expr AND newline_opt print_expr
  2477. | non_unary_print_expr OR newline_opt print_expr
  2478. | non_unary_print_expr \(aq?\(aq print_expr \(aq:\(aq print_expr
  2479. | NUMBER
  2480. | STRING
  2481. | lvalue
  2482. | ERE
  2483. | lvalue INCR
  2484. | lvalue DECR
  2485. | INCR lvalue
  2486. | DECR lvalue
  2487. | lvalue POW_ASSIGN print_expr
  2488. | lvalue MOD_ASSIGN print_expr
  2489. | lvalue MUL_ASSIGN print_expr
  2490. | lvalue DIV_ASSIGN print_expr
  2491. | lvalue ADD_ASSIGN print_expr
  2492. | lvalue SUB_ASSIGN print_expr
  2493. | lvalue \(aq=\(aq print_expr
  2494. | FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq
  2495. /* no white space allowed before \(aq(\(aq */
  2496. | BUILTIN_FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq
  2497. | BUILTIN_FUNC_NAME
  2498. ;
  2499. .P
  2500. lvalue : NAME
  2501. | NAME \(aq[\(aq expr_list \(aq]\(aq
  2502. | \(aq$\(aq expr
  2503. ;
  2504. .P
  2505. non_unary_input_function : simple_get
  2506. | simple_get \(aq<\(aq expr
  2507. | non_unary_expr \(aq|\(aq simple_get
  2508. ;
  2509. .P
  2510. unary_input_function : unary_expr \(aq|\(aq simple_get
  2511. ;
  2512. .P
  2513. simple_get : GETLINE
  2514. | GETLINE lvalue
  2515. ;
  2516. .P
  2517. newline_opt : /* empty */
  2518. | newline_opt NEWLINE
  2519. ;
  2520. .fi
  2521. .P
  2522. .RE
  2523. .P
  2524. This grammar has several ambiguities that shall be resolved as
  2525. follows:
  2526. .IP " *" 4
  2527. Operator precedence and associativity shall be as described in
  2528. .IR "Table 4-1, Expressions in Decreasing Precedence in \fIawk\fP".
  2529. .IP " *" 4
  2530. In case of ambiguity, an
  2531. .BR else
  2532. shall be associated with the most immediately preceding
  2533. .BR if
  2534. that would satisfy the grammar.
  2535. .IP " *" 4
  2536. In some contexts, a
  2537. <slash>
  2538. (\c
  2539. .BR '/' )
  2540. that is used to surround an ERE could also be the division operator.
  2541. This shall be resolved in such a way that wherever the division
  2542. operator could appear, a
  2543. <slash>
  2544. is assumed to be the division operator. (There is no unary division
  2545. operator.)
  2546. .P
  2547. Each expression in an
  2548. .IR awk
  2549. program shall conform to the precedence and associativity rules, even
  2550. when this is not needed to resolve an ambiguity. For example, because
  2551. .BR '$'
  2552. has higher precedence than
  2553. .BR '++' ,
  2554. the string
  2555. .BR \(dq$x++--\(dq
  2556. is not a valid
  2557. .IR awk
  2558. expression, even though it is unambiguously parsed by the grammar as
  2559. .BR \(dq$(x++)--\(dq .
  2560. .P
  2561. One convention that might not be obvious from the formal grammar is
  2562. where
  2563. <newline>
  2564. characters are acceptable. There are several obvious placements such as
  2565. terminating a statement, and a
  2566. <backslash>
  2567. can be used to escape
  2568. <newline>
  2569. characters between any lexical tokens. In addition,
  2570. <newline>
  2571. characters without
  2572. <backslash>
  2573. characters can follow a comma, an open brace, logical AND operator (\c
  2574. .BR \(dq&&\(dq ),
  2575. logical OR operator (\c
  2576. .BR \(dq||\(dq ),
  2577. the
  2578. .BR do
  2579. keyword, the
  2580. .BR else
  2581. keyword, and the closing parenthesis of an
  2582. .BR if ,
  2583. .BR for ,
  2584. or
  2585. .BR while
  2586. statement. For example:
  2587. .sp
  2588. .RS 4
  2589. .nf
  2590. { print $1,
  2591.         $2 }
  2592. .fi
  2593. .P
  2594. .RE
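.P
A slightly longer sketch of the same convention, showing unescaped
<newline> characters after a logical AND operator, after a comma, and
after the closing parenthesis of an if statement. It assumes a POSIX
shell and an awk utility on the search path; the input record is
illustrative.

```shell
out=$(echo '3 4' | awk '{
    if ($1 > 0 &&
        $2 > 0)
        print $1,
              $2
    else
        print "skip"
}')
echo "$out"
```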
  2595. .SS "Lexical Conventions"
  2596. .P
  2597. The lexical conventions for
  2598. .IR awk
  2599. programs, with respect to the preceding grammar, shall be as follows:
  2600. .IP " 1." 4
  2601. Except as noted,
  2602. .IR awk
  2603. shall recognize the longest possible token or delimiter beginning at a
  2604. given point.
  2605. .IP " 2." 4
  2606. A comment shall consist of any characters beginning with the
  2607. <number-sign>
  2608. character and terminated by, but excluding the next occurrence of, a
  2609. <newline>.
  2610. Comments shall have no effect, except to delimit lexical tokens.
  2611. .IP " 3." 4
  2612. The
  2613. <newline>
  2614. shall be recognized as the token
  2615. .BR NEWLINE .
  2616. .IP " 4." 4
  2617. A
  2618. <backslash>
  2619. character immediately followed by a
  2620. <newline>
  2621. shall have no effect.
  2622. .IP " 5." 4
  2623. The token
  2624. .BR STRING
  2625. shall represent a string constant. A string constant shall begin with
  2626. the character
  2627. .BR '\&"' .
  2628. Within a string constant, a
  2629. <backslash>
  2630. character shall be considered to begin an escape sequence as specified
  2631. in the table in the Base Definitions volume of POSIX.1\(hy2017,
  2632. .IR "Chapter 5" ", " "File Format Notation"
  2633. (\c
  2634. .BR '\e\e' ,
  2635. .BR '\ea' ,
  2636. .BR '\eb' ,
  2637. .BR '\ef' ,
  2638. .BR '\en' ,
  2639. .BR '\er' ,
  2640. .BR '\et' ,
  2641. .BR '\ev' ).
  2642. In addition, the escape sequences in
  2643. .IR "Table 4-2, Escape Sequences in \fIawk\fP"
  2644. shall be recognized. A
  2645. <newline>
  2646. shall not occur within a string constant. A string constant shall be
  2647. terminated by the first unescaped occurrence of the character
  2648. .BR '\&"'
  2649. after the one that begins the string constant. The value of the string
  2650. shall be the sequence of all unescaped characters and values of escape
  2651. sequences between, but not including, the two delimiting
  2652. .BR '\&"'
  2653. characters.
  2654. .IP " 6." 4
  2655. The token
  2656. .BR ERE
  2657. represents an extended regular expression constant. An ERE constant
  2658. shall begin with the
  2659. <slash>
  2660. character. Within an ERE constant, a
  2661. <backslash>
  2662. character shall be considered to begin an escape sequence as
  2663. specified in the table in the Base Definitions volume of POSIX.1\(hy2017,
  2664. .IR "Chapter 5" ", " "File Format Notation".
  2665. In addition, the escape sequences in
  2666. .IR "Table 4-2, Escape Sequences in \fIawk\fP"
  2667. shall be recognized. The application shall ensure that a
  2668. <newline>
  2669. does not occur within an ERE constant. An ERE constant shall be
  2670. terminated by the first unescaped occurrence of the
  2671. <slash>
  2672. character after the one that begins the ERE constant. The extended regular
  2673. expression represented by the ERE constant shall be the sequence of all
  2674. unescaped characters and values of escape sequences between, but not
  2675. including, the two delimiting
  2676. <slash>
  2677. characters.
  2678. .IP " 7." 4
  2679. A
  2680. <blank>
  2681. shall have no effect, except to delimit lexical tokens or within
  2682. .BR STRING
  2683. or
  2684. .BR ERE
  2685. tokens.
  2686. .IP " 8." 4
  2687. The token
  2688. .BR NUMBER
  2689. shall represent a numeric constant. Its form and numeric value shall
  2690. either be equivalent to the
  2691. .BR decimal-floating-constant
  2692. token as specified by the ISO\ C standard, or it shall be a sequence of decimal
  2693. digits and shall be evaluated as an integer constant in decimal. In
  2694. addition, implementations may accept numeric constants with the form
  2695. and numeric value equivalent to the
  2696. .BR hexadecimal-constant
  2697. and
  2698. .BR hexadecimal-floating-constant
  2699. tokens as specified by the ISO\ C standard.
  2700. .RS 4
  2701. .P
  2702. If the value is too large or too small to be representable (see
  2703. .IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard"),
  2704. the behavior is undefined.
  2705. .RE
  2706. .IP " 9." 4
  2707. A sequence of underscores, digits, and alphabetics from the portable
  2708. character set (see the Base Definitions volume of POSIX.1\(hy2017,
  2709. .IR "Section 6.1" ", " "Portable Character Set"),
  2710. beginning with an
  2711. <underscore>
  2712. or alphabetic character, shall be considered a word.
  2713. .IP 10. 4
  2714. The following words are keywords that shall be recognized as individual
  2715. tokens; the name of the token is the same as the keyword:
  2716. .TS
  2717. tab(@);
  2718. lw(0.6i)eB leB leB leB leB leB.
  2719. T{
  2720. .nf
  2721. BEGIN
  2722. break
  2723. continue
  2724. T}@T{
  2725. .nf
  2726. delete
  2727. do
  2728. else
  2729. T}@T{
  2730. .nf
  2731. END
  2732. exit
  2733. for
  2734. T}@T{
  2735. .nf
  2736. function
  2737. getline
  2738. if
  2739. T}@T{
  2740. .nf
  2741. in
  2742. next
  2743. print
  2744. T}@T{
  2745. .nf
  2746. printf
  2747. return
  2748. while
  2749. T}
  2750. .TE
  2751. .IP 11. 4
  2752. The following words are names of built-in functions and shall be
  2753. recognized as the token
  2754. .BR BUILTIN_FUNC_NAME :
  2755. .TS
  2756. tab(@);
  2757. lw(0.6i)eB leB leB leB leB leB.
  2758. T{
  2759. .nf
  2760. atan2
  2761. close
  2762. cos
  2763. exp
  2764. T}@T{
  2765. .nf
  2766. gsub
  2767. index
  2768. int
  2769. length
  2770. T}@T{
  2771. .nf
  2772. log
  2773. match
  2774. rand
  2775. sin
  2776. T}@T{
  2777. .nf
  2778. split
  2779. sprintf
  2780. sqrt
  2781. srand
  2782. T}@T{
  2783. .nf
  2784. sub
  2785. substr
  2786. system
  2787. tolower
  2788. T}@T{
  2789. .nf
  2790. toupper
  2791. .fi
  2792. T}
  2793. .TE
  2794. .RS 4
  2795. .P
  2796. The above-listed keywords and names of built-in functions are
  2797. considered reserved words.
  2798. .RE
  2799. .IP 12. 4
  2800. The token
  2801. .BR NAME
  2802. shall consist of a word that is not a keyword or a name of a built-in
  2803. function and is not followed immediately (without any delimiters) by
  2804. the
  2805. .BR '('
  2806. character.
  2807. .IP 13. 4
  2808. The token
  2809. .BR FUNC_NAME
  2810. shall consist of a word that is not a keyword or a name of a built-in
  2811. function, followed immediately (without any delimiters) by the
  2812. .BR '('
  2813. character. The
  2814. .BR '('
  2815. character shall not be included as part of the token.
  2816. .IP 14. 4
  2817. The following two-character sequences shall be recognized as the named
  2818. tokens:
  2819. .TS
  2820. box center tab(@);
  2821. cB | cB | cB | cB
  2822. lB | cf5 | lB | cf5.
  2823. Token Name@Sequence@Token Name@Sequence
  2824. _
  2825. ADD_ASSIGN@+=@NO_MATCH@!~
  2826. SUB_ASSIGN@\-=@EQ@==
  2827. MUL_ASSIGN@*=@LE@<=
  2828. DIV_ASSIGN@/=@GE@>=
  2829. MOD_ASSIGN@%=@NE@!=
  2830. POW_ASSIGN@^=@INCR@++
  2831. OR@||@DECR@\-\|\-
  2832. AND@&&@APPEND@>>
  2833. .TE
  2834. .IP 15. 4
  2835. The following single characters shall be recognized as tokens whose
  2836. names are the character:
  2837. .RS 4
  2838. .sp
  2839. .RS 4
  2840. .nf
  2841. <newline> { } ( ) [ ] , ; + - * % \(ha ! > < | ? : \(ti $ =
  2842. .fi
  2843. .P
  2844. .RE
  2845. .RE
  2846. .P
  2847. There is a lexical ambiguity between the token
  2848. .BR ERE
  2849. and the tokens
  2850. .BR '/'
  2851. and
  2852. .BR DIV_ASSIGN .
  2853. When an input sequence begins with a
  2854. <slash>
  2855. character in any syntactic context where the token
  2856. .BR '/'
  2857. or
  2858. .BR DIV_ASSIGN
  2859. could appear as the next token in a valid program, the longer of those
  2860. two tokens that can be recognized shall be recognized. In any other
  2861. syntactic context where the token
  2862. .BR ERE
  2863. could appear as the next token in a valid program, the token
  2864. .BR ERE
  2865. shall be recognized.
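.P
As an informal sketch of the rule above (the sample values and the shell
variable names quotient and matched are assumptions made for illustration),
a <slash> that follows an operand is read as the division token, while a
<slash> at the start of a pattern begins an ERE:
.sp
.RS 4
.nf
```shell
# After the operand "a" a division operator is valid, so each "/" is the '/' token.
quotient=$(awk 'BEGIN { a = 8; b = 2; print a / b / 2 }')
echo "$quotient"

# At the start of a rule no division operator is valid, so "/b/" is an ERE token.
matched=$(echo abc | awk '/b/ { print "matched" }')
echo "$matched"
```
.fi
.P
.RE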
  2866. .SH "EXIT STATUS"
  2867. The following exit values shall be returned:
  2868. .IP "\00" 6
  2869. All input files were processed successfully.
  2870. .IP >0 6
  2871. An error occurred.
  2872. .P
  2873. The exit status can be altered within the program by using an
  2874. .BR exit
  2875. expression.
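.P
A minimal sketch of this, assuming a POSIX shell; the value 3 and the
variable name status are arbitrary choices made for illustration:
.sp
.RS 4
.nf
```shell
# The exit statement in the BEGIN action ends the program with status 3.
status=0
awk 'BEGIN { exit 3 }' || status=$?
echo "exit status: $status"
```
.fi
.P
.RE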
  2876. .SH "CONSEQUENCES OF ERRORS"
  2877. If any
  2878. .IR file
  2879. operand is specified and the named file cannot be accessed,
  2880. .IR awk
  2881. shall write a diagnostic message to standard error and terminate
  2882. without any further action.
  2883. .P
  2884. If the program specified by either the
  2885. .IR program
  2886. operand or a
  2887. .IR progfile
  2888. operand is not a valid
  2889. .IR awk
  2890. program (as specified in the EXTENDED DESCRIPTION section), the
  2891. behavior is undefined.
  2892. .LP
  2893. .IR "The following sections are informative."
  2894. .SH "APPLICATION USAGE"
  2895. The
  2896. .BR index ,
  2897. .BR length ,
  2898. .BR match ,
  2899. and
  2900. .BR substr
  2901. functions should not be confused with similar functions in the ISO\ C standard;
  2902. the
  2903. .IR awk
  2904. versions deal with characters, while the ISO\ C standard deals with bytes.
  2905. .P
  2906. Because the concatenation operation is represented by adjacent
  2907. expressions rather than an explicit operator, it is often necessary to
  2908. use parentheses to enforce the proper evaluation precedence.
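.P
The effect can be sketched as follows; the operands and the variable names
x, y, and result are assumptions chosen for illustration. Multiplication
binds more tightly than concatenation, so the two assignments differ:
.sp
.RS 4
.nf
```shell
result=$(awk 'BEGIN {
    x = "1" "2" * 3      # parsed as "1" ("2" * 3), giving the string 16
    y = ("1" "2") * 3    # parentheses force the concatenation first, giving 36
    print x
    print y
}')
echo "$result"
```
.fi
.P
.RE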
  2909. .P
  2910. When using
  2911. .IR awk
  2912. to process pathnames, it is recommended that LC_ALL, or at least
  2913. LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment,
  2914. since pathnames can contain byte sequences that do not form valid
  2915. characters in some locales, in which case the utility's behavior would
  2916. be undefined. In the POSIX locale each byte is a valid single-byte
  2917. character, and therefore this problem is avoided.
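.P
One way to follow this recommendation is to set LC_ALL for the single
command. The sketch below assumes a POSIX shell; the sample pathname and the
variable name base are illustrative only:
.sp
.RS 4
.nf
```shell
# Run awk in the POSIX locale so that every byte of a pathname is a valid character.
base=$(echo /usr/local/bin/awk | LC_ALL=C awk -F/ '{ print $NF }')
echo "$base"
```
.fi
.P
.RE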
  2918. .P
  2919. On implementations where the
  2920. .BR \(dq==\(dq
  2921. operator checks if strings collate equally, applications needing to
  2922. check whether strings are identical can use:
  2923. .sp
  2924. .RS 4
  2925. .nf
  2926. length(a) == length(b) && index(a,b) == 1
  2927. .fi
  2928. .P
  2929. .RE
  2930. .P
  2931. On implementations where the
  2932. .BR \(dq==\(dq
  2933. operator checks if strings are identical, applications needing to
  2934. check whether strings collate equally can use:
  2935. .sp
  2936. .RS 4
  2937. .nf
  2938. a <= b && a >= b
  2939. .fi
  2940. .P
  2941. .RE
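.P
The identity test can be wrapped in a user-defined function. In this sketch
the function name identical and the sample strings are assumptions, not part
of any interface:
.sp
.RS 4
.nf
```shell
result=$(awk '
function identical(a, b) {
    # Same length, and b occurs in a starting at position 1.
    return length(a) == length(b) && index(a, b) == 1
}
BEGIN {
    print identical("abc", "abc")
    print identical("abc", "abd")
    print identical("abc", "ab")
}')
echo "$result"
```
.fi
.P
.RE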
  2942. .SH EXAMPLES
  2943. The
  2944. .IR awk
  2945. program specified in the command line is most easily specified within
  2946. single-quotes (for example, \(aq\fIprogram\fP\(aq) for applications using
  2947. .IR sh ,
  2948. because
  2949. .IR awk
  2950. programs commonly contain characters that are special to the shell,
  2951. including double-quotes. In the cases where an
  2952. .IR awk
  2953. program contains single-quote characters, it is usually easiest to
  2954. specify most of the program as strings within single-quotes
  2955. concatenated by the shell with quoted single-quote characters. For
  2956. example:
  2957. .sp
  2958. .RS 4
  2959. .nf
  2960. awk \(aq/\(aq\e\(aq\(aq/ { print "quote:", $0 }\(aq
  2961. .fi
  2962. .P
  2963. .RE
  2964. .P
  2965. prints all lines from the standard input containing a single-quote
  2966. character, prefixed with
  2967. .IR quote :.
  2968. .P
  2969. The following are examples of simple
  2970. .IR awk
  2971. programs:
  2972. .IP " 1." 4
  2973. Write to the standard output all input lines for which field 3 is
  2974. greater than 5:
  2975. .RS 4
  2976. .sp
  2977. .RS 4
  2978. .nf
  2979. $3 > 5
  2980. .fi
  2981. .P
  2982. .RE
  2983. .RE
  2984. .IP " 2." 4
  2985. Write every tenth line:
  2986. .RS 4
  2987. .sp
  2988. .RS 4
  2989. .nf
  2990. (NR % 10) == 0
  2991. .fi
  2992. .P
  2993. .RE
  2994. .RE
  2995. .IP " 3." 4
  2996. Write any line with a substring matching the regular expression:
  2997. .RS 4
  2998. .sp
  2999. .RS 4
  3000. .nf
  3001. /(G|D)(2[0-9][[:alpha:]]*)/
  3002. .fi
  3003. .P
  3004. .RE
  3005. .RE
  3006. .IP " 4." 4
  3007. Print any line with a substring containing a
  3008. .BR 'G'
  3009. or
  3010. .BR 'D' ,
  3011. followed by a sequence of digits and characters. This example uses
  3012. character classes
  3013. .BR digit
  3014. and
  3015. .BR alpha
  3016. to match language-independent digit and alphabetic characters
  3017. respectively:
  3018. .RS 4
  3019. .sp
  3020. .RS 4
  3021. .nf
  3022. /(G|D)([[:digit:][:alpha:]]*)/
  3023. .fi
  3024. .P
  3025. .RE
  3026. .RE
  3027. .IP " 5." 4
  3028. Write any line in which the second field matches the regular expression
  3029. and the fourth field does not:
  3030. .RS 4
  3031. .sp
  3032. .RS 4
  3033. .nf
  3034. $2 \(ti /xyz/ && $4 !\(ti /xyz/
  3035. .fi
  3036. .P
  3037. .RE
  3038. .RE
  3039. .IP " 6." 4
  3040. Write any line in which the second field contains a
  3041. <backslash>:
  3042. .RS 4
  3043. .sp
  3044. .RS 4
  3045. .nf
  3046. $2 \(ti /\e\e/
  3047. .fi
  3048. .P
  3049. .RE
  3050. .RE
  3051. .IP " 7." 4
  3052. Write any line in which the second field contains a
  3053. <backslash>.
  3054. Note that
  3055. <backslash>-escapes
  3056. are interpreted twice; once in lexical processing of the string and once
  3057. in processing the regular expression:
  3058. .RS 4
  3059. .sp
  3060. .RS 4
  3061. .nf
  3062. $2 \(ti "\e\e\e\e"
  3063. .fi
  3064. .P
  3065. .RE
  3066. .RE
  3067. .IP " 8." 4
  3068. Write the second to the last and the last field in each line. Separate
  3069. the fields by a
  3070. <colon>:
  3071. .RS 4
  3072. .sp
  3073. .RS 4
  3074. .nf
  3075. {OFS=":";print $(NF-1), $NF}
  3076. .fi
  3077. .P
  3078. .RE
  3079. .RE
  3080. .IP " 9." 4
  3081. Write the line number and number of fields in each line. The three
  3082. strings representing the line number, the
  3083. <colon>,
  3084. and the number of fields are concatenated and that string is written to
  3085. standard output:
  3086. .RS 4
  3087. .sp
  3088. .RS 4
  3089. .nf
  3090. {print NR ":" NF}
  3091. .fi
  3092. .P
  3093. .RE
  3094. .RE
  3095. .IP 10. 4
  3096. Write lines longer than 72 characters:
  3097. .RS 4
  3098. .sp
  3099. .RS 4
  3100. .nf
  3101. length($0) > 72
  3102. .fi
  3103. .P
  3104. .RE
  3105. .RE
  3106. .IP 11. 4
  3107. Write the first two fields in opposite order separated by
  3108. .BR OFS :
  3109. .RS 4
  3110. .sp
  3111. .RS 4
  3112. .nf
  3113. { print $2, $1 }
  3114. .fi
  3115. .P
  3116. .RE
  3117. .RE
  3118. .IP 12. 4
  3119. Same, with input fields separated by a
  3120. <comma>
  3121. or
  3122. <space>
  3123. and
  3124. <tab>
  3125. characters, or both:
  3126. .RS 4
  3127. .sp
  3128. .RS 4
  3129. .nf
  3130. BEGIN { FS = ",[ \et]*|[ \et]+" }
  3131. { print $2, $1 }
  3132. .fi
  3133. .P
  3134. .RE
  3135. .RE
  3136. .IP 13. 4
  3137. Add up the first column, then print the sum and the average:
  3138. .RS 4
  3139. .sp
  3140. .RS 4
  3141. .nf
  3142. {s += $1 }
  3143. END {print "sum is", s, "average is", s/NR}
  3144. .fi
  3145. .P
  3146. .RE
  3147. .RE
  3148. .IP 14. 4
  3149. Write fields in reverse order, one per line (many lines out for each
  3150. line in):
  3151. .RS 4
  3152. .sp
  3153. .RS 4
  3154. .nf
  3155. { for (i = NF; i > 0; --i) print $i }
  3156. .fi
  3157. .P
  3158. .RE
  3159. .RE
  3160. .IP 15. 4
  3161. Write all lines between occurrences of the strings
  3162. .BR start
  3163. and
  3164. .BR stop :
  3165. .RS 4
  3166. .sp
  3167. .RS 4
  3168. .nf
  3169. /start/, /stop/
  3170. .fi
  3171. .P
  3172. .RE
  3173. .RE
  3174. .IP 16. 4
  3175. Write all lines whose first field is different from the previous one:
  3176. .RS 4
  3177. .sp
  3178. .RS 4
  3179. .nf
  3180. $1 != prev { print; prev = $1 }
  3181. .fi
  3182. .P
  3183. .RE
  3184. .RE
  3185. .IP 17. 4
  3186. Simulate
  3187. .IR echo :
  3188. .RS 4
  3189. .sp
  3190. .RS 4
  3191. .nf
  3192. BEGIN {
  3193.     for (i = 1; i < ARGC; ++i)
  3194.         printf("%s%s", ARGV[i], i==ARGC-1?"\en":" ")
  3195. }
  3196. .fi
  3197. .P
  3198. .RE
  3199. .RE
  3200. .IP 18. 4
  3201. Write the path prefixes contained in the
  3202. .IR PATH
  3203. environment variable, one per line:
  3204. .RS 4
  3205. .sp
  3206. .RS 4
  3207. .nf
  3208. BEGIN {
  3209.     n = split (ENVIRON["PATH"], path, ":")
  3210.     for (i = 1; i <= n; ++i)
  3211.         print path[i]
  3212. }
  3213. .fi
  3214. .P
  3215. .RE
  3216. .RE
  3217. .IP 19. 4
  3218. If there is a file named
  3219. .BR input
  3220. containing page headers of the form:
  3221. Page #
  3222. .RS 4
  3223. .P
  3224. and a file named
  3225. .BR program
  3226. that contains:
  3227. .sp
  3228. .RS 4
  3229. .nf
  3230. /Page/ { $2 = n++; }
  3231. { print }
  3232. .fi
  3233. .P
  3234. .RE
  3235. then the command line:
  3236. .sp
  3237. .RS 4
  3238. .nf
  3239. awk -f program n=5 input
  3240. .fi
  3241. .P
  3242. .RE
  3243. .P
  3244. prints the file
  3245. .BR input ,
  3246. filling in page numbers starting at 5.
  3247. .RE
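.P
The last example can be tried end to end with the sketch below. It assumes
a POSIX shell and the mktemp utility; the scratch directory, the sample page
text, and the variable names dir and numbered are illustrative only:
.sp
.RS 4
.nf
```shell
dir=$(mktemp -d)
{ echo "Page #"; echo "first page text"; echo "Page #"; echo "second page text"; } > "$dir/input"
echo '/Page/ { $2 = n++; }' > "$dir/program"
echo '{ print }' >> "$dir/program"

# The operand n=5 is processed as an assignment before the file input is read.
numbered=$(cd "$dir" && awk -f program n=5 input)
echo "$numbered"
rm -r "$dir"
```
.fi
.P
.RE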
  3248. .SH RATIONALE
  3249. This description is based on the new
  3250. .IR awk ,
  3251. ``nawk'', (see the referenced \fIThe AWK Programming Language\fP), which introduced a number of new features to
  3252. the historical
  3253. .IR awk :
  3254. .IP " 1." 4
  3255. New keywords:
  3256. .BR delete ,
  3257. .BR do ,
  3258. .BR function ,
  3259. .BR return
  3260. .IP " 2." 4
  3261. New built-in functions:
  3262. .BR atan2 ,
  3263. .BR close ,
  3264. .BR cos ,
  3265. .BR gsub ,
  3266. .BR match ,
  3267. .BR rand ,
  3268. .BR sin ,
  3269. .BR srand ,
  3270. .BR sub ,
  3271. .BR system
  3272. .IP " 3." 4
  3273. New predefined variables:
  3274. .BR FNR ,
  3275. .BR ARGC ,
  3276. .BR ARGV ,
  3277. .BR RSTART ,
  3278. .BR RLENGTH ,
  3279. .BR SUBSEP
  3280. .IP " 4." 4
  3281. New expression operators:
  3282. .BR ? ,
  3283. .BR : ,
  3284. .BR , ,
  3285. .BR ^
  3286. .IP " 5." 4
  3287. The
  3288. .BR FS
  3289. variable and the third argument to
  3290. .BR split ,
  3291. now treated as extended regular expressions.
  3292. .IP " 6." 4
  3293. The operator precedence, changed to more closely match the C language.
  3294. Two examples of code that operate differently are:
  3295. .RS 4
  3296. .sp
  3297. .RS 4
  3298. .nf
  3299. while ( n /= 10 > 1) ...
  3300. if (!"wk" \(ti /bwk/) ...
  3301. .fi
  3302. .P
  3303. .RE
  3304. .RE
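.P
Programs that must behave the same under either set of precedence rules can
parenthesize explicitly. The following sketch shows both readings of each of
the two constructs above; the starting value 50 and the variable names are
assumptions made for illustration:
.sp
.RS 4
.nf
```shell
result=$(awk 'BEGIN {
    n = 50; n /= (10 > 1)          # divide by the comparison result, 1
    print n
    n = 50; r = ((n /= 10) > 1)    # divide by 10 first, then compare
    print n, r
    s = ((!"wk") ~ /bwk/)          # negate the string first, then match
    t = !("wk" ~ /bwk/)            # match first, then negate
    print s
    print t
}')
echo "$result"
```
.fi
.P
.RE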
  3305. .P
  3306. Several features have been added based on newer implementations of
  3307. .IR awk :
  3308. .IP " *" 4
  3309. Multiple instances of
  3310. .BR \-f
  3311. .IR progfile
  3312. are permitted.
  3313. .IP " *" 4
  3314. The new option
  3315. .BR \-v
  3316. .IR assignment .
  3317. .IP " *" 4
  3318. The new predefined variable
  3319. .BR ENVIRON .
  3320. .IP " *" 4
  3321. New built-in functions
  3322. .BR toupper
  3323. and
  3324. .BR tolower .
  3325. .IP " *" 4
  3326. More formatting capabilities are added to
  3327. .BR printf
  3328. to match the ISO\ C standard.
  3329. .P
  3330. Earlier versions of this standard required implementations to
  3331. support multiple adjacent
  3332. <semicolon>s,
  3333. lines with one or more
  3334. <semicolon>
  3335. before a rule (\c
  3336. .IR pattern-action
  3337. pairs), and lines with only
  3338. <semicolon>(s).
  3339. These are not required by this standard and are considered poor
  3340. programming practice, but can be accepted by an implementation of
  3341. .IR awk
  3342. as an extension.
  3343. .P
  3344. The overall
  3345. .IR awk
  3346. syntax has always been based on the C language, with a few features
  3347. from the shell command language and other sources. Because of this, it
  3348. is not completely compatible with any other language, which has caused
  3349. confusion for some users. It is not the intent of the standard
  3350. developers to address such issues. A few relatively minor changes
  3351. toward making the language more compatible with the ISO\ C standard were
  3352. made; most of these changes are based on similar changes in recent
  3353. implementations, as described above. There remain several C-language
  3354. conventions that are not in
  3355. .IR awk .
  3356. One of the notable ones is the
  3357. <comma>
  3358. operator, which is commonly used to specify multiple expressions in the
  3359. C language
  3360. .BR for
  3361. statement. Also, there are various places where
  3362. .IR awk
  3363. is more restrictive than the C language regarding the type of
  3364. expression that can be used in a given context. These limitations are
  3365. due to the different features that the
  3366. .IR awk
  3367. language does provide.
  3368. .P
  3369. Regular expressions in
  3370. .IR awk
  3371. have been extended somewhat from historical implementations to make
  3372. them a pure superset of extended regular expressions, as defined by
  3373. POSIX.1\(hy2008 (see the Base Definitions volume of POSIX.1\(hy2017,
  3374. .IR "Section 9.4" ", " "Extended Regular Expressions").
  3375. The main extensions are internationalization
  3376. features and interval expressions. Historical implementations of
  3377. .IR awk
  3378. have long supported
  3379. <backslash>-escape
  3380. sequences as an extension to extended regular expressions, and
  3381. this extension has been retained despite inconsistency with other
  3382. utilities. The number of escape sequences recognized in both extended
  3383. regular expressions and strings has varied (generally increasing with
  3384. time) among implementations. The set specified by POSIX.1\(hy2008 includes most
  3385. sequences known to be supported by popular implementations and by the
  3386. ISO\ C standard. One sequence that is not supported is hexadecimal value escapes
  3387. beginning with
  3388. .BR '\ex' .
  3389. This would allow values expressed in more than 9 bits to be used within
  3390. .IR awk
  3391. as in the ISO\ C standard. However, because this syntax has a non-deterministic
  3392. length, it does not permit the subsequent character to be a hexadecimal
  3393. digit. This limitation can be dealt with in the C language by the use
  3394. of lexical string concatenation. In the
  3395. .IR awk
  3396. language, concatenation could also be a solution for strings, but not
  3397. for extended regular expressions (either lexical ERE tokens or strings
  3398. used dynamically as regular expressions). Because of this limitation,
  3399. the feature has not been added to POSIX.1\(hy2008.
  3400. .P
  3401. When a string variable is used in a context where an extended regular
  3402. expression normally appears (where the lexical token ERE is used in the
  3403. grammar) the string does not contain the literal
  3404. <slash>
  3405. characters.
  3406. .P
  3407. Some versions of
  3408. .IR awk
  3409. allow the form:
  3410. .sp
  3411. .RS 4
  3412. .nf
  3413. func name(args, ... ) { statements }
  3414. .fi
  3415. .P
  3416. .RE
  3417. .P
  3418. This has been deprecated by the authors of the language, who asked that
  3419. it not be specified.
  3420. .P
  3421. Historical implementations of
  3422. .IR awk
  3423. produce an error if a
  3424. .BR next
  3425. statement is executed in a
  3426. .BR BEGIN
  3427. action, and cause
  3428. .IR awk
  3429. to terminate if a
  3430. .BR next
  3431. statement is executed in an
  3432. .BR END
  3433. action. This behavior has not been documented, and it was not believed
  3434. that it was necessary to standardize it.
  3435. .P
  3436. The specification of conversions between string and numeric values is
  3437. much more detailed than in the documentation of historical
  3438. implementations or in the referenced \fIThe AWK Programming Language\fP. Although most of the behavior is
  3439. designed to be intuitive, the details are necessary to ensure
  3440. compatible behavior from different implementations. This is especially
  3441. important in relational expressions since the types of the operands
  3442. determine whether a string or numeric comparison is performed. From the
  3443. perspective of an application developer, it is usually sufficient to
  3444. expect intuitive behavior and to force conversions (by adding zero or
  3445. concatenating a null string) when the type of an expression does not
  3446. obviously match what is needed. The intent has been to specify
  3447. historical practice in almost all cases. The one exception is that, in
  3448. historical implementations, variables and constants maintain both
  3449. string and numeric values after their original value is converted by
  3450. any use. This means that referencing a variable or constant can have
  3451. unexpected side-effects. For example, with historical implementations
  3452. the following program:
  3453. .sp
  3454. .RS 4
  3455. .nf
  3456. {
  3457.     a = "+2"
  3458.     b = 2
  3459.     if (NR % 2)
  3460.         c = a + b
  3461.     if (a == b)
  3462.         print "numeric comparison"
  3463.     else
  3464.         print "string comparison"
  3465. }
  3466. .fi
  3467. .P
  3468. .RE
  3469. .P
  3470. would perform a numeric comparison (and output numeric comparison) for
  3471. each odd-numbered line, but perform a string comparison (and output
  3472. string comparison) for each even-numbered line. POSIX.1\(hy2008 ensures that
  3473. comparisons will be numeric if necessary. With historical
  3474. implementations, the following program:
  3475. .sp
  3476. .RS 4
  3477. .nf
  3478. BEGIN {
  3479.     OFMT = "%e"
  3480.     print 3.14
  3481.     OFMT = "%f"
  3482.     print 3.14
  3483. }
  3484. .fi
  3485. .P
  3486. .RE
  3487. .P
  3488. would output
  3489. .BR \(dq3.140000e+00\(dq
  3490. twice, because in the second
  3491. .BR print
  3492. statement the constant
  3493. .BR \(dq3.14\(dq
  3494. would have a string value from the previous conversion. POSIX.1\(hy2008 requires
  3495. that the output of the second
  3496. .BR print
  3497. statement be
  3498. .BR \(dq3.140000\(dq .
  3499. The behavior of historical implementations was seen as too unintuitive
  3500. and unpredictable.
  3501. .P
  3502. It was pointed out that with the rules contained in early drafts, the
  3503. following script would print nothing:
  3504. .sp
  3505. .RS 4
  3506. .nf
  3507. BEGIN {
  3508.     y[1.5] = 1
  3509.     OFMT = "%e"
  3510.     print y[1.5]
  3511. }
  3512. .fi
  3513. .P
  3514. .RE
  3515. .P
  3516. Therefore, a new variable,
  3517. .BR CONVFMT ,
  3518. was introduced. The
  3519. .BR OFMT
  3520. variable is now restricted to affecting output conversions of numbers
  3521. to strings and
  3522. .BR CONVFMT
  3523. is used for internal conversions, such as comparisons or array
  3524. indexing. The default value is the same as that for
  3525. .BR OFMT ,
  3526. so unless a program changes
  3527. .BR CONVFMT
  3528. (which no historical program would do), it will receive the historical
  3529. behavior associated with internal string conversions.
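.P
The division of labor between the two variables can be sketched as follows.
The format strings, the value 3.14159, and the variable names are
assumptions chosen for illustration:
.sp
.RS 4
.nf
```shell
result=$(awk 'BEGIN {
    x = 3.14159
    CONVFMT = "%.2f"     # used when a number is converted to a string internally
    OFMT = "%.4f"        # used when print outputs a number
    s = x ""             # concatenation converts x through CONVFMT
    print s
    print x
    y[1.5] = 1
    OFMT = "%e"
    print y[1.5]         # subscripts use CONVFMT, so the element is still found
}')
echo "$result"
```
.fi
.P
.RE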
  3530. .P
  3531. The POSIX
  3532. .IR awk
  3533. lexical and syntactic conventions are specified more formally than in
  3534. other sources. Again the intent has been to specify historical
  3535. practice. One convention that may not be obvious from the formal
  3536. grammar as in other verbal descriptions is where
  3537. <newline>
  3538. characters are acceptable. There are several obvious placements such as
  3539. terminating a statement, and a
  3540. <backslash>
  3541. can be used to escape
  3542. <newline>
  3543. characters between any lexical tokens. In addition,
  3544. <newline>
  3545. characters without
  3546. <backslash>
  3547. characters can follow a comma, an open brace, a logical AND operator (\c
  3548. .BR \(dq&&\(dq ),
  3549. a logical OR operator (\c
  3550. .BR \(dq||\(dq ),
  3551. the
  3552. .BR do
  3553. keyword, the
  3554. .BR else
  3555. keyword, and the closing parenthesis of an
  3556. .BR if ,
  3557. .BR for ,
  3558. or
  3559. .BR while
  3560. statement. For example:
  3561. .sp
  3562. .RS 4
  3563. .nf
  3564. { print $1,
  3565.         $2 }
  3566. .fi
  3567. .P
  3568. .RE
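.P
A slightly longer sketch, with <newline> characters after a logical AND
operator, after the closing parenthesis of an if statement, after a comma,
and after the else keyword; the variable names and strings are illustrative
assumptions:
.sp
.RS 4
.nf
```shell
result=$(awk 'BEGIN {
    a = 1; b = 0
    if (a &&
        !b)
        print "first",
              "second"
    else
        print "not reached"
}')
echo "$result"
```
.fi
.P
.RE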
  3569. .P
  3570. The requirement that
  3571. .IR awk
  3572. add a trailing
  3573. <newline>
  3574. to the program argument text is to simplify the grammar, making it
  3575. match a text file in form. There is no way for an application or test
  3576. suite to determine whether a literal
  3577. <newline>
  3578. is added or whether
  3579. .IR awk
  3580. simply acts as if it did.
  3581. .P
  3582. POSIX.1\(hy2008 requires several changes from historical implementations in order
  3583. to support internationalization. Probably the most subtle of these is
  3584. the use of the decimal-point character, defined by the
  3585. .IR LC_NUMERIC
  3586. category of the locale, in representations of floating-point numbers.
  3587. This locale-specific character is used in recognizing numeric input, in
  3588. converting between strings and numeric values, and in formatting
  3589. output. However, regardless of locale, the
  3590. <period>
  3591. character (the decimal-point character of the POSIX locale) is the
  3592. decimal-point character recognized in processing
  3593. .IR awk
  3594. programs (including assignments in command line arguments). This is
  3595. essentially the same convention as the one used in the ISO\ C standard. The
  3596. difference is that the C language includes the
  3597. \fIsetlocale\fR()
  3598. function, which permits an application to modify its locale. Because of
  3599. this capability, a C application begins executing with its locale set
  3600. to the C locale, and only executes in the environment-specified locale
  3601. after an explicit call to
  3602. \fIsetlocale\fR().
  3603. However, adding such an elaborate new feature to the
  3604. .IR awk
  3605. language was seen as inappropriate for POSIX.1\(hy2008. It is possible to execute
  3606. an
  3607. .IR awk
  3608. program explicitly in any desired locale by setting the environment in
  3609. the shell.
  3610. .P
  3611. The undefined behavior resulting from NULs in extended regular
  3612. expressions allows future extensions for the GNU
  3613. .IR gawk
  3614. program to process binary data.
  3615. .P
  3616. The behavior in the case of invalid
  3617. .IR awk
  3618. programs (including lexical, syntactic, and semantic errors) is
  3619. undefined because it was considered overly limiting on implementations
  3620. to specify. In most cases such errors can be expected to produce a
  3621. diagnostic and a non-zero exit status. However, some implementations
  3622. may choose to extend the language in ways that make use of certain
  3623. invalid constructs. Other invalid constructs might be deemed worthy of
  3624. a warning, but otherwise cause some reasonable behavior. Still other
  3625. constructs may be very difficult to detect in some implementations.
  3626. Also, different implementations might detect a given error during an
  3627. initial parsing of the program (before reading any input files) while
  3628. others might detect it when executing the program after reading some
  3629. input. Implementors should be aware that diagnosing errors as early as
  3630. possible and producing useful diagnostics can ease debugging of
  3631. applications, and thus make an implementation more usable.
  3632. .P
  3633. The unspecified behavior from using multi-character
  3634. .BR RS
  3635. values is to allow possible future extensions based on extended regular
  3636. expressions used for record separators. Historical implementations take
  3637. the first character of the string and ignore the others.
  3638. .P
  3639. Unspecified behavior when
  3640. .IR split (\c
  3641. .IR string ,\c
  3642. .IR array ,\c
  3643. <null>)
  3644. is used is to allow a proposed future extension that would split up a
  3645. string into an array of individual characters.
  3646. .P
  3647. In the context of the
  3648. .BR getline
  3649. function, equally good arguments for different precedences of the
  3650. .BR |
  3651. and
  3652. .BR <
  3653. operators can be made. Historical practice has been that:
  3654. .sp
  3655. .RS 4
  3656. .nf
  3657. getline < "a" "b"
  3658. .fi
  3659. .P
  3660. .RE
  3661. .P
  3662. is parsed as:
  3663. .sp
  3664. .RS 4
  3665. .nf
  3666. ( getline < "a" ) "b"
  3667. .fi
  3668. .P
  3669. .RE
  3670. .P
  3671. although many would argue that the intent was that the file
  3672. .BR ab
  3673. should be read. However:
  3674. .sp
  3675. .RS 4
  3676. .nf
  3677. getline < "x" + 1
  3678. .fi
  3679. .P
  3680. .RE
  3681. .P
  3682. parses as:
  3683. .sp
  3684. .RS 4
  3685. .nf
  3686. getline < ( "x" + 1 )
  3687. .fi
  3688. .P
  3689. .RE
  3690. .P
  3691. Similar problems occur with the
  3692. .BR |
  3693. version of
  3694. .BR getline ,
  3695. particularly in combination with
  3696. .BR $ .
  3697. For example:
  3698. .sp
  3699. .RS 4
  3700. .nf
  3701. $"echo hi" | getline
  3702. .fi
  3703. .P
  3704. .RE
  3705. .P
  3706. (This situation is particularly problematic when used in a
  3707. .BR print
  3708. statement, where the
  3709. .BR |getline
  3710. part might be a redirection of the
  3711. .BR print .)
  3712. .P
  3713. Since in most cases such constructs are not (or at least should not be)
  3714. used (because they have a natural ambiguity for which there is no
  3715. conventional parsing), the meaning of these constructs has been made
  3716. explicitly unspecified. (The effect is that a conforming application that
  3717. runs into the problem must parenthesize to resolve the ambiguity.)
  3718. There appeared to be few if any actual uses of such constructs.
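.P
Parenthesized forms of both kinds of getline can be sketched as follows,
assuming a POSIX shell and the mktemp utility; the scratch directory, the
file contents, and the variable names are illustrative only:
.sp
.RS 4
.nf
```shell
dir=$(mktemp -d)
echo "contents of ab" > "$dir/ab"
result=$(cd "$dir" && awk 'BEGIN {
    # Parenthesize the concatenation so that the file named ab is read.
    if ((getline line < ("a" "b")) > 0)
        print line
    # Parenthesize the whole command-to-getline expression as well.
    cmd = "echo hi"
    if ((cmd | getline reply) > 0)
        print reply
}')
echo "$result"
rm -r "$dir"
```
.fi
.P
.RE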
  3719. .P
  3720. Grammars can be written that would cause an error under these
  3721. circumstances. Where backwards-compatibility is not a large
  3722. consideration, implementors may wish to use such grammars.
  3723. .P
  3724. Some historical implementations have allowed some built-in functions to
  3725. be called without an argument list, the result being a default argument
  3726. list chosen in some ``reasonable'' way. Use of
  3727. .BR length
  3728. as a synonym for
  3729. .BR "length($0)"
  3730. is the only one of these forms that is thought to be widely known or
  3731. widely used; this particular form is documented in various places (for
  3732. example, most historical
  3733. .IR awk
  3734. reference pages, although not in the referenced \fIThe AWK Programming Language\fP) as legitimate practice.
  3735. With this exception, default argument lists have always been
  3736. undocumented and vaguely defined, and it is not at all clear how (or
  3737. if) they should be generalized to user-defined functions. They add no
  3738. useful functionality and preclude possible future extensions that might
  3739. need to name functions without calling them. Not standardizing them
  3740. seems the simplest course. The standard developers considered that
  3741. .BR length
  3742. merited special treatment, however, since it has been documented in the
  3743. past and sees possibly substantial use in historical programs.
  3744. Accordingly, this usage has been made legitimate, but Issue\ 5
  3745. removed the obsolescent marking for XSI-conforming implementations
  3746. and many otherwise conforming applications depend on this feature.
  3747. .P
  3748. In
  3749. .BR sub
  3750. and
  3751. .BR gsub ,
  3752. if
  3753. .IR repl
  3754. is a string literal (the lexical token
  3755. .BR STRING ),
  3756. then two consecutive
  3757. <backslash>
  3758. characters should be used in the string to ensure a single
  3759. <backslash>
  3760. will precede the
  3761. <ampersand>
  3762. when the resultant string is passed to the function. (For example,
  3763. to specify one literal
  3764. <ampersand>
  3765. in the replacement string, use
  3766. .BR gsub (\c
  3767. .BR ERE ,
  3768. .BR \(dq\e\e&\(dq ).)
  3769. .P
  3770. Historically, the only special character in the
  3771. .IR repl
  3772. argument of
  3773. .BR sub
  3774. and
  3775. .BR gsub
  3776. string functions was the
  3777. <ampersand>
  3778. (\c
  3779. .BR '&' )
  3780. character and preceding it with the
  3781. <backslash>
  3782. character was used to turn off its special meaning.
  3783. .P
  3784. The description in the ISO\ POSIX\(hy2:\|1993 standard introduced behavior such that the
  3785. <backslash>
  3786. character was another special character and it was unspecified whether
  3787. there were any other special characters. This description introduced
  3788. several portability problems, some of which are described below, and so
  3789. it has been replaced with the more historical description. Some of the
  3790. problems include:
  3791. .IP " *" 4
  3792. Historically, to create the replacement string, a script could use
  3793. .BR gsub (\c
  3794. .BR ERE ,
  3795. .BR \(dq\e\e&\(dq ),
  3796. but with the ISO\ POSIX\(hy2:\|1993 standard wording, it was necessary to use
  3797. .BR gsub (\c
  3798. .BR ERE ,
  3799. .BR \(dq\e\e\e\e&\(dq ).
  3800. The
  3801. <backslash>
  3802. characters are doubled here because all string literals are subject to
  3803. lexical analysis, which would reduce each pair of
  3804. <backslash>
  3805. characters to a single
  3806. <backslash>
  3807. before being passed to
  3808. .BR gsub .
  3809. .IP " *" 4
  3810. Since it was unspecified what the special characters were, for portable
  3811. scripts to guarantee that characters are printed literally, each
  3812. character had to be preceded with a
  3813. <backslash>.
  3814. (For example, a portable script had to use
  3815. .BR gsub (\c
  3816. .BR ERE ,
  3817. .BR \(dq\e\eh\e\ei\(dq )
  3818. to produce a replacement string of
  3819. .BR \(dqhi\(dq .)
  3820. .P
  3821. The description for comparisons in the ISO\ POSIX\(hy2:\|1993 standard did not properly describe
  3822. historical practice because of the way numeric strings are compared as
  3823. numbers. The current rules cause the following code:
  3824. .sp
  3825. .RS 4
  3826. .nf
  3827. if (0 == "000")
  3828.     print "strange, but true"
  3829. else
  3830.     print "not true"
  3831. .fi
  3832. .P
  3833. .RE
  3834. .P
  3835. to do a numeric comparison, causing the
  3836. .BR if
  3837. to succeed. It should be intuitively obvious that this is incorrect
  3838. behavior, and indeed, no historical implementation of
  3839. .IR awk
  3840. actually behaves this way.
  3841. .P
  3842. To fix this problem, the definition of
  3843. .IR "numeric string"
  3844. was enhanced to include only those values obtained from specific
  3845. circumstances (mostly external sources) where it is not possible to
  3846. determine unambiguously whether the value is intended to be a string or
  3847. a numeric.
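.P
The distinction can be sketched as follows, assuming a POSIX shell and a
conforming awk. The same three characters are compared with zero twice:
once as a field read from input (a numeric string) and once as a string
constant in the program text (not a numeric string). The input line and
the variable name are illustrative only.

```sh
# $1 comes from input, so "000" is a numeric string and is compared as a number.
# The constant "000" is a plain string, so 0 is converted to "0" and the
# two operands are compared as strings.
result=$(printf '000\n' | awk '{ print ($1 == 0), ("000" == 0) }')
printf '%s\n' "$result"
```

A conforming implementation is expected to print 1 for the field and 0
for the string constant.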
  3848. .P
  3849. Variables that are assigned to a numeric string shall also be treated
  3850. as a numeric string. (That is, the notion of a numeric string can
  3851. be propagated across assignments.) In comparisons, all variables having
  3852. the uninitialized value are to be treated as a numeric operand
  3853. evaluating to the numeric value zero.
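.P
A short sketch of both rules, assuming a POSIX shell and a conforming
awk: the variable v (a hypothetical name) inherits numeric-string status
from the field assigned to it, and never_set (also hypothetical) is
never assigned, so it holds the uninitialized value.

```sh
# v = $1 copies the numeric string "000", so v == 0 is a numeric comparison.
# never_set is uninitialized: it equals 0 numerically and "" as a string.
result=$(printf '000\n' | awk '{ v = $1; print (v == 0), (never_set == 0), (never_set == "") }')
printf '%s\n' "$result"
```

All three comparisons are expected to yield 1.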
  3854. .P
  3855. Uninitialized variables include all types of variables including
  3856. scalars, array elements, and fields. The definition of an uninitialized
  3857. value in
  3858. .IR "Variables and Special Variables"
  3859. is necessary to describe the value placed on uninitialized variables
  3860. and on fields that are valid (for example,
  3861. .BR <
  3862. .BR $NF )
  3863. but have no characters in them and to describe how these variables are
  3864. to be used in comparisons. A valid field, such as
  3865. .BR $1 ,
  3866. that has no characters in it can be obtained from an input line of
  3867. .BR \(dq\et\et\(dq
  3868. when
  3869. .BR FS= \c
  3870. .BR '\et' .
  3871. Historically, the comparison (\c
  3872. .BR $1< 10)
  3873. was done numerically after evaluating
  3874. .BR $1
  3875. to the value zero.
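.P
The empty-field case described above can be reproduced with the
following sketch, assuming a POSIX shell and an awk that accepts the
escape sequence for <tab> as the argument to -F. The variable name is
illustrative only.

```sh
# Input is two <tab> characters, giving three fields that are all empty.
# $1 is valid (1 <= NF) but has no characters in it.
result=$(printf '\t\t\n' | awk -F '\t' '{ print NF, length($1), ($1 < 10) }')
printf '%s\n' "$result"
```

This should report three fields, a length of zero for the first field,
and a true comparison.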
  3876. .P
  3877. The phrase ``.\|.\|. also shall have the numeric value of the numeric
  3878. string'' was removed from several sections of the ISO\ POSIX\(hy2:\|1993 standard because it
  3879. specifies an unnecessary implementation detail. It is not necessary for
  3880. POSIX.1\(hy2008 to specify that these objects be assigned two different values.
  3881. It is only necessary to specify that these objects may evaluate to two
  3882. different values depending on context.
  3883. .P
  3884. Historical implementations of
  3885. .IR awk
  3886. did not parse hexadecimal integer or floating constants like
  3887. .BR \(dq0xa\(dq
  3888. and
  3889. .BR \(dq0xap0\(dq .
  3890. Due to an oversight, the 2001 through 2004 editions of this standard
  3891. required support for hexadecimal floating constants. This was due to
  3892. the reference to
  3893. \fIatof\fR().
  3894. This version of the standard allows but does not require implementations
  3895. to use
  3896. \fIatof\fR()
  3897. and includes a description of how floating-point numbers are recognized
  3898. as an alternative to match historic behavior. The intent of this change
  3899. is to allow implementations to recognize floating-point constants
  3900. according to either the ISO/IEC\ 9899:\|1990 standard or ISO/IEC\ 9899:\|1999 standard, and to allow (but not require)
  3901. implementations to recognize hexadecimal integer constants.
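.P
Because recognition of hexadecimal input is allowed but not required, a
portable script cannot rely on either outcome. The following sketch,
assuming a POSIX shell, probes what the local awk does with a
hexadecimal numeric string; the sample value and variable name are
illustrative only.

```sh
# An implementation that recognizes hexadecimal converts "0x1A" to 26;
# one that follows historical practice stops at the "x" and yields 0.
hex=$(printf '0x1A\n' | awk '{ print $1 + 0 }')
printf '%s\n' "$hex"
```

Either result is consistent with the text above, so scripts that need a
fixed interpretation should convert such values explicitly.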
  3902. .P
  3903. Historical implementations of
  3904. .IR awk
  3905. did not support floating-point infinities and NaNs in
  3906. .IR "numeric strings" ;
  3907. e.g.,
  3908. .BR \(dq-INF\(dq
  3909. and
  3910. .BR \(dqNaN\(dq .
  3911. However, implementations that use the
  3912. \fIatof\fR()
  3913. or
  3914. \fIstrtod\fR()
  3915. functions to do the conversion picked up support for these values if they
  3916. used an ISO/IEC\ 9899:\|1999 standard version of the function instead of an ISO/IEC\ 9899:\|1990 standard version. Due to
  3917. an oversight, the 2001 through 2004 editions of this standard did not
  3918. allow support for infinities and NaNs, but in this revision support is
  3919. allowed (but not required). This is a silent change to the behavior of
  3920. .IR awk
  3921. programs; for example, in the POSIX locale the expression:
  3922. .sp
  3923. .RS 4
  3924. .nf
  3925. ("-INF" + 0 < 0)
  3926. .fi
  3927. .P
  3928. .RE
  3929. .P
  3930. formerly had the value 0 because
  3931. .BR \(dq-INF\(dq
  3932. converted to 0, but now it may have the value 0 or 1.
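.P
The expression can be evaluated directly to see which behavior a given
implementation chose. This is a sketch assuming a POSIX shell; the
variable name is illustrative only.

```sh
# Prints 1 if "-INF" converts to negative infinity, 0 if it converts to 0.
inf=$(awk 'BEGIN { print ("-INF" + 0 < 0) }')
printf '%s\n' "$inf"
```

Both values are permitted, so portable programs should not depend on
the result.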
  3933. .SH "FUTURE DIRECTIONS"
  3934. A future version of this standard may require the
  3935. .BR \(dq!=\(dq
  3936. and
  3937. .BR \(dq==\(dq
  3938. operators to perform string comparisons by checking if the strings are
  3939. identical (and not by checking if they collate equally).
  3940. .SH "SEE ALSO"
  3941. .IR "Section 1.3" ", " "Grammar Conventions",
  3942. .IR "\fIgrep\fR\^",
  3943. .IR "\fIlex\fR\^",
  3944. .IR "\fIsed\fR\^"
  3945. .P
  3946. The Base Definitions volume of POSIX.1\(hy2017,
  3947. .IR "Chapter 5" ", " "File Format Notation",
  3948. .IR "Section 6.1" ", " "Portable Character Set",
  3949. .IR "Chapter 8" ", " "Environment Variables",
  3950. .IR "Chapter 9" ", " "Regular Expressions",
  3951. .IR "Section 12.2" ", " "Utility Syntax Guidelines"
  3952. .P
  3953. The System Interfaces volume of POSIX.1\(hy2017,
  3954. .IR "\fIatof\fR\^(\|)",
  3955. .IR "\fIexec\fR\^",
  3956. .IR "\fIisspace\fR\^(\|)",
  3957. .IR "\fIpopen\fR\^(\|)",
  3958. .IR "\fIsetlocale\fR\^(\|)",
  3959. .IR "\fIstrtod\fR\^(\|)"
  3960. .\"
  3961. .SH COPYRIGHT
  3962. Portions of this text are reprinted and reproduced in electronic form
  3963. from IEEE Std 1003.1-2017, Standard for Information Technology
  3964. -- Portable Operating System Interface (POSIX), The Open Group Base
  3965. Specifications Issue 7, 2018 Edition,
  3966. Copyright (C) 2018 by the Institute of
  3967. Electrical and Electronics Engineers, Inc and The Open Group.
  3968. In the event of any discrepancy between this version and the original IEEE and
  3969. The Open Group Standard, the original IEEE and The Open Group Standard
  3970. is the referee document. The original Standard can be obtained online at
  3971. http://www.opengroup.org/unix/online.html .
  3972. .PP
  3973. Any typographical or formatting errors that appear
  3974. in this page are most likely
  3975. to have been introduced during the conversion of the source files to
  3976. man page format. To report such errors, see
  3977. https://www.kernel.org/doc/man-pages/reporting_bugs.html .