
oasis-root

Compiled tree of Oasis Linux based on own branch at <https://hacktivis.me/git/oasis/>
git clone https://anongit.hacktivis.me/git/oasis-root.git

uniq.1p


'\" et
.TH UNIQ "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
.\"
.SH PROLOG
This manual page is part of the POSIX Programmer's Manual.
The Linux implementation of this interface may differ (consult
the corresponding Linux manual page for details of Linux behavior),
or the interface may not be implemented on Linux.
.\"
.SH NAME
uniq
\(em report or filter out repeated lines in a file
.SH SYNOPSIS
.LP
.nf
uniq \fB[\fR-c|-d|-u\fB] [\fR-f \fIfields\fB] [\fR-s \fIchars\fB] [\fIinput_file \fB[\fIoutput_file\fB]]\fR
.fi
.SH DESCRIPTION
The
.IR uniq
utility shall read an input file, comparing adjacent lines, and write
one copy of each input line on the output. The second and succeeding
copies of repeated adjacent input lines shall not be written.
The trailing
<newline>
of each line in the input shall be ignored when doing comparisons.
.P
Repeated lines in the input shall not be detected if they are not
adjacent.
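.P
As an informal illustration (a sketch that assumes a POSIX shell and a
conforming
.IR uniq
on the search path; the four input lines are arbitrary), only the run
of two adjacent
.B a
lines collapses, while the later, non-adjacent
.B a
is written again:
.sp
.RS 4
.nf
```shell
# Input: a, a, b, a. Only the first two lines are adjacent duplicates.
uniq <<EOF
a
a
b
a
EOF
# Writes three lines: a, b, a
```
.fi
.P
.RE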
.SH OPTIONS
The
.IR uniq
utility shall conform to the Base Definitions volume of POSIX.1\(hy2017,
.IR "Section 12.2" ", " "Utility Syntax Guidelines",
except that
.BR '\(pl'
may be recognized as an option delimiter as well as
.BR '\-' .
.P
The following options shall be supported:
.IP "\fB\-c\fP" 10
Precede each output line with a count of the number of times the line
occurred in the input.
.IP "\fB\-d\fP" 10
Suppress the writing of lines that are not repeated in the input.
.IP "\fB\-f\ \fIfields\fR" 10
Ignore the first
.IR fields
fields on each input line when doing comparisons, where
.IR fields
is a positive decimal integer. A field is the maximal string matched
by the basic regular expression:
.RS 10
.sp
.RS 4
.nf
[[:blank:]]*[\(ha[:blank:]]*
.fi
.P
.RE
.P
If the
.IR fields
option-argument specifies more fields than appear on an input line, a
null string shall be used for comparison.
.RE
.IP "\fB\-s\ \fIchars\fR" 10
Ignore the first
.IR chars
characters when doing comparisons, where
.IR chars
shall be a positive decimal integer. If specified in conjunction with
the
.BR \-f
option, the first
.IR chars
characters after the first
.IR fields
fields shall be ignored. If the
.IR chars
option-argument specifies more characters than remain on an input line,
a null string shall be used for comparison.
.IP "\fB\-u\fP" 10
Suppress the writing of lines that are repeated in the input.
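.P
The following informal sketch exercises
.BR \-d ,
.BR \-u ,
.BR \-f ,
and
.B \-s
on small inputs. It assumes a POSIX shell; the helper functions
.I pairs
and
.I keyed
are hypothetical names used only for this illustration.
.sp
.RS 4
.nf
```shell
# Hypothetical helper: two identical lines followed by a distinct one.
pairs() {
    cat <<EOF
x
x
y
EOF
}
pairs | uniq -d    # writes: x
pairs | uniq -u    # writes: y

# Hypothetical helper: a distinct number in the first field of each line.
keyed() {
    cat <<EOF
1 a:red
2 b:red
3 c:blue
EOF
}
# -f 1 compares " a:red", " b:red", " c:blue": all three lines are written.
keyed | uniq -f 1
# -f 1 -s 3 also skips " a:" (and so on), comparing "red", "red", "blue".
keyed | uniq -f 1 -s 3    # writes: 1 a:red, then 3 c:blue
```
.fi
.P
.RE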
.SH OPERANDS
The following operands shall be supported:
.IP "\fIinput_file\fR" 10
A pathname of the input file. If the
.IR input_file
operand is not specified, or if the
.IR input_file
is
.BR '\-' ,
the standard input shall be used.
.IP "\fIoutput_file\fR" 10
A pathname of the output file. If the
.IR output_file
operand is not specified, the standard output shall be used. The
results are unspecified if the file named by
.IR output_file
is the file named by
.IR input_file .
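.P
As an informal sketch (assuming a POSIX shell; the file names
.IR in.txt ,
.IR out.txt ,
.IR out2.txt ,
and
.I in.txt.tmp
are hypothetical), the operands can be used as follows. Because the
results are unspecified when
.I output_file
names the same file as
.IR input_file ,
an in-place edit writes to a temporary file and then renames it:
.sp
.RS 4
.nf
```shell
cat > in.txt <<EOF
red
red
blue
EOF
uniq in.txt out.txt          # input_file and output_file operands
uniq - out2.txt < in.txt     # "-" as input_file selects standard input

# Avoid "uniq in.txt in.txt": write elsewhere, then rename.
uniq in.txt in.txt.tmp && mv in.txt.tmp in.txt
```
.fi
.P
.RE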
.SH STDIN
The standard input shall be used only if no
.IR input_file
operand is specified or if
.IR input_file
is
.BR '\-' .
See the INPUT FILES section.
.SH "INPUT FILES"
The input file shall be a text file.
.SH "ENVIRONMENT VARIABLES"
The following environment variables shall affect the execution of
.IR uniq :
.IP "\fILANG\fP" 10
Provide a default value for the internationalization variables that are
unset or null. (See the Base Definitions volume of POSIX.1\(hy2017,
.IR "Section 8.2" ", " "Internationalization Variables"
for the precedence of internationalization variables used to determine
the values of locale categories.)
.IP "\fILC_ALL\fP" 10
If set to a non-empty string value, override the values of all the
other internationalization variables.
.IP "\fILC_CTYPE\fP" 10
Determine the locale for the interpretation of sequences of bytes of
text data as characters (for example, single-byte as opposed to
multi-byte characters in arguments and input files) and which
characters constitute a
<blank>
in the current locale.
.IP "\fILC_MESSAGES\fP" 10
.br
Determine the locale that should be used to affect the format and
contents of diagnostic messages written to standard error.
.IP "\fINLSPATH\fP" 10
Determine the location of message catalogs for the processing of
.IR LC_MESSAGES .
.SH "ASYNCHRONOUS EVENTS"
Default.
.SH STDOUT
The standard output shall be used if no
.IR output_file
operand is specified, and shall be used if the
.IR output_file
operand is
.BR '\-'
and the implementation treats the
.BR '\-'
as meaning standard output. Otherwise, the standard output shall
not be used.
See the OUTPUT FILES section.
.SH STDERR
The standard error shall be used only for diagnostic messages.
.SH "OUTPUT FILES"
If the
.BR \-c
option is specified, the output file shall be empty or each line
shall be of the form:
.sp
.RS 4
.nf
"%d %s", <\fInumber of duplicates\fR>, <\fIline\fR>
.fi
.P
.RE
.P
Otherwise, the output file shall be empty or each line shall be
of the form:
.sp
.RS 4
.nf
"%s", <\fIline\fR>
.fi
.P
.RE
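.P
As an informal sketch of consuming the
.B \-c
format (assuming a POSIX shell;
.I fruit
is a hypothetical helper), a script can separate each count from its
line with
.IR read ,
which also discards any blanks before the count; some implementations,
such as GNU coreutils, pad the count with leading blanks. Note that
.I read
likewise strips leading blanks from the line text itself.
.sp
.RS 4
.nf
```shell
# Hypothetical helper producing sample input.
fruit() {
    cat <<EOF
apple
apple
pear
EOF
}
# read splits on blanks, so padding before the count is discarded.
fruit | uniq -c | while read -r count line; do
    echo "$line: $count"
done
# Writes: "apple: 2", then "pear: 1"
```
.fi
.P
.RE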
.SH "EXTENDED DESCRIPTION"
None.
.SH "EXIT STATUS"
The following exit values shall be returned:
.IP "\00" 6
The utility executed successfully.
.IP >0 6
An error occurred.
.SH "CONSEQUENCES OF ERRORS"
Default.
.LP
.IR "The following sections are informative."
.SH "APPLICATION USAGE"
If the collating sequence of the current locale has a total ordering
of all characters, the
.IR sort
utility can be used to cause repeated lines to be adjacent in the input
file. If the collating sequence does not have a total ordering of all
characters, the
.IR sort
utility should still do this, but it might not. To ensure that all
duplicate lines are eliminated and that the output is sorted according
to the collating sequence of the current locale, applications should use:
.sp
.RS 4
.nf
LC_ALL=C sort -u | sort
.fi
.P
.RE
.P
instead of:
.sp
.RS 4
.nf
sort | uniq
.fi
.P
.RE
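.P
An informal sketch comparing the two pipelines, assuming a POSIX shell;
.I words
is a hypothetical helper. For input like this, where no two distinct
lines collate equally, both pipelines write the same result; the
difference matters only when the collating sequence lacks a total
ordering.
.sp
.RS 4
.nf
```shell
# Hypothetical helper producing sample input.
words() {
    cat <<EOF
pear
apple
pear
EOF
}
words | LC_ALL=C sort -u | sort    # recommended form; writes: apple, pear
words | sort | uniq                # same output for this input
```
.fi
.P
.RE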
.P
To remove duplicate lines based on whether they collate equally
instead of whether they are identical, applications should use:
.sp
.RS 4
.nf
sort -u
.fi
.P
.RE
.P
instead of:
.sp
.RS 4
.nf
sort | uniq
.fi
.P
.RE
.P
When using
.IR uniq
to process pathnames, it is recommended that LC_ALL, or at least
LC_CTYPE and LC_COLLATE, be set to POSIX or C in the environment,
since pathnames can contain byte sequences that do not form valid
characters in some locales, in which case the utility's behavior would
be undefined. In the POSIX locale each byte is a valid single-byte
character, and therefore this problem is avoided.
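.P
A sketch of that recommendation, assuming a POSIX shell;
.I paths
is a hypothetical helper and the pathnames are examples only. Setting
.I LC_ALL
for both
.I sort
and
.I uniq
makes each command treat every byte as a single-byte character:
.sp
.RS 4
.nf
```shell
# Hypothetical list of pathnames containing one duplicate.
paths() {
    cat <<EOF
/tmp/b
/tmp/a
/tmp/b
EOF
}
# Report each pathname that occurs more than once.
paths | LC_ALL=C sort | LC_ALL=C uniq -d    # writes: /tmp/b
```
.fi
.P
.RE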
.SH EXAMPLES
The following input file data (but flush left) was used for a test
series on
.IR uniq :
.sp
.RS 4
.nf
#01 foo0 bar0 foo1 bar1
#02 bar0 foo1 bar1 foo1
#03 foo0 bar0 foo1 bar1
#04
#05 foo0 bar0 foo1 bar1
#06 foo0 bar0 foo1 bar1
#07 bar0 foo1 bar1 foo0
.fi
.P
.RE
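.P
To rerun the invocations below, the data can be saved under the file
name they use,
.IR uniq_0I.t .
A minimal sketch, assuming a POSIX shell and a writable current
directory:
.sp
.RS 4
.nf
```shell
# Write the test data, flush left, to uniq_0I.t.
cat > uniq_0I.t <<EOF
#01 foo0 bar0 foo1 bar1
#02 bar0 foo1 bar1 foo1
#03 foo0 bar0 foo1 bar1
#04
#05 foo0 bar0 foo1 bar1
#06 foo0 bar0 foo1 bar1
#07 bar0 foo1 bar1 foo0
EOF
```
.fi
.P
.RE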
.P
What follows is a series of test invocations of the
.IR uniq
utility that apply a mixture of
.IR uniq
options to the input file data. These tests verify the meaning of
.IR adjacent .
The
.IR uniq
utility views the input data as a sequence of strings delimited by
.BR '\en' .
Accordingly, for the
.IR n th
member of the sequence,
.IR uniq
interprets unique or repeated adjacent lines strictly relative to the
.RI ( n +1)th
member.
.IP " 1." 4
This first example tests the line counting option, comparing each line
of the input file data starting from the second field:
.RS 4
.sp
.RS 4
.nf
uniq -c -f 1 uniq_0I.t
1 #01 foo0 bar0 foo1 bar1
1 #02 bar0 foo1 bar1 foo1
1 #03 foo0 bar0 foo1 bar1
1 #04
2 #05 foo0 bar0 foo1 bar1
1 #07 bar0 foo1 bar1 foo0
.fi
.P
.RE
.P
The number
.BR '2' ,
prefixing the fifth line of output, signifies that the
.IR uniq
utility detected a pair of repeated lines (input lines #05 and #06).
Given the input data, this is true only because
.IR uniq
is run with the
.BR "\-f\ 1"
option, which shall cause
.IR uniq
to ignore the first field on each input line; since that field holds a
distinct line number on every line, no two adjacent lines compare equal
without it.
.RE
.IP " 2." 4
The second example tests the option to suppress unique lines, comparing
each line of the input file data starting from the second field:
.RS 4
.sp
.RS 4
.nf
uniq -d -f 1 uniq_0I.t
#05 foo0 bar0 foo1 bar1
.fi
.P
.RE
.RE
.IP " 3." 4
This test suppresses repeated lines, comparing each line of the input
file data starting from the second field:
.RS 4
.sp
.RS 4
.nf
uniq -u -f 1 uniq_0I.t
#01 foo0 bar0 foo1 bar1
#02 bar0 foo1 bar1 foo1
#03 foo0 bar0 foo1 bar1
#04
#07 bar0 foo1 bar1 foo0
.fi
.P
.RE
.RE
.IP " 4." 4
This test suppresses unique lines, comparing each line of the input file
data starting from the third character:
.RS 4
.sp
.RS 4
.nf
uniq -d -s 2 uniq_0I.t
.fi
.P
.RE
.P
In this last example, the
.IR uniq
utility found no input matching the above criteria: with the first two
characters ignored, each line still begins with its own distinct line
number digit, so no line is repeated.
.RE
.SH RATIONALE
Some historical implementations have limited lines to 1\|080 bytes
in length, which does not meet the implied
{LINE_MAX}
limit.
.P
Earlier versions of this standard allowed the
.BR \- \c
.IR number
and
.BR \(pl \c
.IR number
options. These options are no longer specified by POSIX.1\(hy2008 but
may be present in some implementations.
.SH "FUTURE DIRECTIONS"
None.
.SH "SEE ALSO"
.IR "\fIcomm\fR\^",
.IR "\fIsort\fR\^"
.P
The Base Definitions volume of POSIX.1\(hy2017,
.IR "Chapter 8" ", " "Environment Variables",
.IR "Section 12.2" ", " "Utility Syntax Guidelines"
.\"
.SH COPYRIGHT
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1-2017, Standard for Information Technology
-- Portable Operating System Interface (POSIX), The Open Group Base
Specifications Issue 7, 2018 Edition,
Copyright (C) 2018 by the Institute of
Electrical and Electronics Engineers, Inc and The Open Group.
In the event of any discrepancy between this version and the original IEEE and
The Open Group Standard, the original IEEE and The Open Group Standard
is the referee document. The original Standard can be obtained online at
http://www.opengroup.org/unix/online.html .
.PP
Any typographical or formatting errors that appear
in this page are most likely
to have been introduced during the conversion of the source files to
man page format. To report such errors, see
https://www.kernel.org/doc/man-pages/reporting_bugs.html .