oasis-root

Compiled tree of Oasis Linux based on own branch at <https://hacktivis.me/git/oasis/>

git clone https://anongit.hacktivis.me/git/oasis-root.git

awk.1 (13062B)


.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
.de TF
.IP "" "\w'\fB\\$1\ \ \fP'u"
.PD 0
..
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
|
.B \-\^\-csv
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.I file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.I fs
option defines the input field separator to be the regular expression
.IR fs .
The
.B \-\^\-csv
option causes
.I awk
to process records using (more or less) standard comma-separated values
(CSV) format.
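.PP
As an illustrative sketch (the file
.B /etc/passwd
and the variable name
.B min
are assumptions chosen for the example, not part of
.IR awk ),
the options above might be combined like this:
.PP
.EX
awk \-F: \-v min=1000 '$3 >= min { print $1 }' /etc/passwd
.EE
.PP
Here
.B \-F:
splits each line on colons and
.B \-v
sets
.B min
before the program runs, so the first field is printed for lines whose
third field is at least 1000.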
.PP
An input line is normally made up of fields separated by white space,
or by the regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
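.PP
For instance, a sketch of the null
.B FS
case (the input string is arbitrary):
.PP
.EX
echo abc | awk 'BEGIN { FS = "" } { print NF, $2 }'
.EE
.PP
This should print
.BR "3 b" ,
since each of the three characters becomes its own field.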
.PP
A pattern-action statement has the form:
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]\fR'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP #\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next #\fR skip remaining patterns on this input line\fP
nextfile #\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
delete\fI array\fP #\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] \fR)
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
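.PP
A minimal sketch of multiple subscripts (the array and variable names are
chosen only for illustration):
.PP
.EX
BEGIN {
	count["x", "y"] = 1
	for (key in count) {
		split(key, part, SUBSEP)	# recover the two subscripts
		print part[1], part[2], count[key]
	}
}
.EE
.PP
The single key is the string
.B x
followed by
.B SUBSEP
followed by
.BR y ,
so splitting it on
.B SUBSEP
recovers the two subscripts.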
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > " file
or
.BI >> " file
is present or on a pipe if
.BI | " cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the
.I format
(see
.IR printf (3)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
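.PP
As a sketch of output redirection (the
.B .txt
suffix and the use of
.IR sort (1)
are assumptions for the example):
.PP
.EX
{ print $0 > ($1 ".txt") }	# one output file per distinct first field
{ print $1 | "sort" }	# all first fields go to a single sort process
END { close("sort") }	# close the pipe and wait for sort to finish
.EE
.PP
Because identical string values denote the same open file, every record
with the same first field is written to the same output file.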
.PP
The mathematical functions
.BR atan2 ,
.BR cos ,
.BR exp ,
.BR log ,
.BR sin ,
and
.B sqrt
are built in.
Other built-in functions:
.TF "\fBlength(\fR[\fIv\^\fR]\fB)\fR"
.TP
\fBlength(\fR[\fIv\^\fR]\fB)\fR
the length of its argument
taken as a string,
number of elements in an array for an array argument,
or length of
.B $0
if no argument.
.TP
.B rand()
random number on [0,1).
.TP
\fBsrand(\fR[\fIs\^\fR]\fB)\fR
sets seed for
.B rand
and returns the previous seed.
.TP
.BI int( x\^ )
truncates to an integer value.
.TP
\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR
the
.IR n -character
substring of
.I s
that begins at position
.I m
counted from 1.
If no
.IR n ,
use the rest of the string.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIfs\^\fR]\fB)\fR
splits the string
.I s
into array elements
.IB a [1] \fR,
.IB a [2] \fR,
\&...,
.IB a [ n ] \fR,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
\fBsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
\fBgsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB)
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.IR fmt .
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status. This will be \-1 upon error,
.IR cmd 's
exit status upon a normal exit,
256 +
.I sig
upon death-by-signal, where
.I sig
is the number of the murdering signal,
or 512 +
.I sig
if there was a core dump.
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
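.PP
A short sketch combining several of these functions (the patterns and
replacement text are arbitrary):
.PP
.EX
{
	if (match($0, /[0-9]+/))	# first run of digits, if any
		print substr($0, RSTART, RLENGTH)
	n = gsub(/a/, "A")	# operates on $0, returns the count
	print n, toupper($1)
}
.EE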
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < " file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
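.PP
For example, a sketch that reads the output of a command (here
.IR date (1),
chosen only for illustration) one line at a time:
.PP
.EX
BEGIN {
	while (("date" | getline line) > 0)	# 1 means a line was read
		print "now:", line
	close("date")
}
.EE
.PP
Testing for
.B "> 0"
rather than for a non-zero value avoids looping forever when
.B getline
returns \-1.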
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.B ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second, inclusive.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr ,\| expr ,\| ... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
They may appear multiple times in a program and execute
in the order they are read by
.IR awk .
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B ARGC
argument count, assignable.
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames.
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" ).
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.TP
.B FILENAME
the name of the current input file.
.TP
.B FNR
ordinal number of the current record in the current file.
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\fR.
.TP
.BR NF
number of fields in the current record.
.TP
.B NR
ordinal number of the current record.
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" ).
.TP
.B OFS
output field separator (default space).
.TP
.B ORS
output record separator (default newline).
.TP
.B RLENGTH
the length of a string matched by
.BR match .
.TP
.B RS
input record separator (default newline).
If empty, blank lines separate records.
If more than one character long,
.B RS
is treated as a regular expression, and records are
separated by text matching the expression.
.TP
.B RSTART
the start position of a string matched by
.BR match .
.TP
.B SUBSEP
separates multiple subscripts (default 034).
.PD
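.PP
As a sketch of how some of these variables interact (the output layout is
arbitrary), the following treats blank-line-separated paragraphs as records
and each line of a paragraph as a field:
.PP
.EX
BEGIN { RS = ""; FS = "\en"; OFS = " | " }
{ print FILENAME, FNR, NF, $1 }
.EE
.PP
For each paragraph this prints the file name, the record number within
that file, the number of lines in the paragraph, and its first line,
joined by
.BR OFS .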
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ... }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
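.PP
A sketch of the excess-parameter idiom (the function name
.B max
and the extra spacing before
.B t
are conventions assumed for the example):
.PP
.EX
function max(a, b,    t) {	# t is never passed, so it acts as a local
	t = (a > b) ? a : b
	return t
}
{ print max($1, $2) }
.EE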
.SH ENVIRONMENT VARIABLES
If
.B POSIXLY_CORRECT
is set in the environment, then
.I awk
follows the POSIX rules for
.B sub
and
.B gsub
with respect to consecutive backslashes and ampersands.
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
{ print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or spaces and tabs.
.PP
.EX
.nf
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
.PP
.EX
.nf
BEGIN {	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.fi
.EE
.SH SEE ALSO
.IR grep (1),
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.IR "The AWK Programming Language, Second Edition" ,
Addison-Wesley, 2024. ISBN 978-0-13-826972-2, 0-13-826972-6.
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
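.PP
For instance (the input values and variable names are arbitrary):
.PP
.EX
echo 10 9 | awk '{ a = ($1 < $2); b = ($1 "" < $2 ""); print a, b }'
.EE
.PP
This should print
.BR "0 1" :
the fields compare as numbers in the first expression,
while concatenating
\&\f(CW""\fP forces a string comparison in the second.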
.PP
The scope rules for variables in functions are a botch;
the syntax is worse.
.PP
Input is expected to be UTF-8 encoded. Other multibyte
character sets are not handled.
However, in eight-bit locales,
.I awk
treats each input byte as a separate character.
.SH UNUSUAL FLOATING-POINT VALUES
.I Awk
was designed before IEEE 754 arithmetic defined Not-A-Number (NaN)
and Infinity values, which are supported by all modern floating-point
hardware.
.PP
Because
.I awk
uses
.IR strtod (3)
and
.IR atof (3)
to convert string values to double-precision floating-point values,
modern C libraries also convert strings starting with
.B inf
and
.B nan
into infinity and NaN values respectively. This led to strange results,
with something like this:
.PP
.EX
.nf
echo nancy | awk '{ print $1 + 0 }'
.fi
.EE
.PP
printing
.B nan
instead of zero.
.PP
.I Awk
now follows GNU AWK, and prefilters string values before attempting
to convert them to numbers, as follows:
.TP
.I "Hexadecimal values"
Hexadecimal values (allowed since C99) convert to zero, as they did
prior to C99.
.TP
.I "NaN values"
The two strings
.B +nan
and
.B \-nan
(case independent) convert to NaN. No others do.
(NaNs can have signs.)
.TP
.I "Infinity values"
The two strings
.B +inf
and
.B \-inf
(case independent) convert to positive and negative infinity, respectively.
No others do.
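.PP
As a sketch of these rules (the input strings are arbitrary, and the exact
spelling of the printed NaN and infinity values is left unstated here because
it may vary):
.PP
.EX
echo +nan nancy +inf 0x11 | awk '{ print $1 + 0, $2 + 0, $3 + 0, $4 + 0 }'
.EE
.PP
Under the rules above, the first and third fields convert to NaN and
positive infinity, while
.B nancy
and
.B 0x11
both convert to zero.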