uniq.1p (9961B)
- '\" et
- .TH UNIQ "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
- .\"
- .SH PROLOG
- This manual page is part of the POSIX Programmer's Manual.
- The Linux implementation of this interface may differ (consult
- the corresponding Linux manual page for details of Linux behavior),
- or the interface may not be implemented on Linux.
- .\"
- .SH NAME
- uniq
- \(em report or filter out repeated lines in a file
- .SH SYNOPSIS
- .LP
- .nf
- uniq \fB[\fR-c|-d|-u\fB] [\fR-f \fIfields\fB] [\fR-s \fIchars\fB] [\fIinput_file \fB[\fIoutput_file\fB]]\fR
- .fi
- .SH DESCRIPTION
- The
- .IR uniq
- utility shall read an input file comparing adjacent lines, and write
- one copy of each input line on the output. The second and succeeding
- copies of repeated adjacent input lines shall not be written.
- The trailing
- <newline>
- of each line in the input shall be ignored when doing comparisons.
- .P
- Repeated lines in the input shall not be detected if they are not
- adjacent.
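- .P
- As a minimal sketch of the adjacency rule, assuming a hypothetical
- four-line input, the following pipeline keeps the first and third
- lines even though they hold the same text, because a different line
- separates them. Only the fourth line, which directly follows an
- identical line, is dropped:

```shell
# Hypothetical sample data: "a" occurs three times, but only the
# last two occurrences are adjacent.
adjacent_only=$(printf 'a\nb\na\na\n' | uniq)

# Prints three lines: a, b, a
printf '%s\n' "$adjacent_only"
```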
- .SH OPTIONS
- The
- .IR uniq
- utility shall conform to the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 12.2" ", " "Utility Syntax Guidelines",
- except that
- .BR '\(pl'
- may be recognized as an option delimiter as well as
- .BR '\-' .
- .P
- The following options shall be supported:
- .IP "\fB\-c\fP" 10
- Precede each output line with a count of the number of times the line
- occurred in the input.
- .IP "\fB\-d\fP" 10
- Suppress the writing of lines that are not repeated in the input.
- .IP "\fB\-f\ \fIfields\fR" 10
- Ignore the first
- .IR fields
- fields on each input line when doing comparisons, where
- .IR fields
- is a positive decimal integer. A field is the maximal string matched
- by the basic regular expression:
- .RS 10
- .sp
- .RS 4
- .nf
- [[:blank:]]*[\(ha[:blank:]]*
- .fi
- .P
- .RE
- .P
- If the
- .IR fields
- option-argument specifies more fields than appear on an input line, a
- null string shall be used for comparison.
- .RE
- .IP "\fB\-s\ \fIchars\fR" 10
- Ignore the first
- .IR chars
- characters when doing comparisons, where
- .IR chars
- shall be a positive decimal integer. If specified in conjunction with
- the
- .BR \-f
- option, the first
- .IR chars
- characters after the first
- .IR fields
- fields shall be ignored. If the
- .IR chars
- option-argument specifies more characters than remain on an input line,
- a null string shall be used for comparison.
- .IP "\fB\-u\fP" 10
- Suppress the writing of lines that are repeated in the input.
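- .P
- The options above can be illustrated with a short sketch. The
- four-line sample input and the variable names are hypothetical and
- chosen only to make the effect of each option visible:

```shell
# Hypothetical sample data: two identical lines, then two lines that
# differ only in their first field.
data=$(printf 'x 1\nx 1\ny 2\nz 2\n')

# -d writes only lines that are repeated: "x 1"
repeated=$(printf '%s\n' "$data" | uniq -d)

# -u writes only lines that are not repeated: "y 2" and "z 2"
unrepeated=$(printf '%s\n' "$data" | uniq -u)

# -f 1 ignores the first field, so "y 2" and "z 2" compare equal and
# only the first line of each group is written: "x 1" and "y 2"
skip_field=$(printf '%s\n' "$data" | uniq -f 1)

printf '%s\n' "$repeated" "$unrepeated" "$skip_field"
```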
- .SH OPERANDS
- The following operands shall be supported:
- .IP "\fIinput_file\fR" 10
- A pathname of the input file. If the
- .IR input_file
- operand is not specified, or if the
- .IR input_file
- is
- .BR '\-' ,
- the standard input shall be used.
- .IP "\fIoutput_file\fR" 10
- A pathname of the output file. If the
- .IR output_file
- operand is not specified, the standard output shall be used. The
- results are unspecified if the file named by
- .IR output_file
- is the file named by
- .IR input_file .
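- .P
- A minimal sketch of both operands, assuming that the
- .IR mktemp
- utility is available to create a scratch directory and that the
- hypothetical file name out.txt is free to use. The input is read from
- standard input by way of
- .BR '\-'
- and the result is written to a separate output file:

```shell
# Assumption: mktemp is available; out.txt is a hypothetical name.
workdir=$(mktemp -d)

# '-' selects standard input; the second operand names the output file.
printf 'a\na\nb\n' | uniq - "$workdir/out.txt"

# The output file now holds two lines: a, b
from_file=$(cat "$workdir/out.txt")
printf '%s\n' "$from_file"

rm -rf "$workdir"
```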
- .SH STDIN
- The standard input shall be used only if no
- .IR input_file
- operand is specified or if
- .IR input_file
- is
- .BR '\-' .
- See the INPUT FILES section.
- .SH "INPUT FILES"
- The input file shall be a text file.
- .SH "ENVIRONMENT VARIABLES"
- The following environment variables shall affect the execution of
- .IR uniq :
- .IP "\fILANG\fP" 10
- Provide a default value for the internationalization variables that are
- unset or null. (See the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 8.2" ", " "Internationalization Variables"
- for the precedence of internationalization variables used to determine
- the values of locale categories.)
- .IP "\fILC_ALL\fP" 10
- If set to a non-empty string value, override the values of all the
- other internationalization variables.
- .IP "\fILC_CTYPE\fP" 10
- Determine the locale for the interpretation of sequences of bytes of
- text data as characters (for example, single-byte as opposed to
- multi-byte characters in arguments and input files) and which
- characters constitute a
- <blank>
- in the current locale.
- .IP "\fILC_MESSAGES\fP" 10
- .br
- Determine the locale that should be used to affect the format and
- contents of diagnostic messages written to standard error.
- .IP "\fINLSPATH\fP" 10
- Determine the location of message catalogs for the processing of
- .IR LC_MESSAGES .
- .SH "ASYNCHRONOUS EVENTS"
- Default.
- .SH STDOUT
- The standard output shall be used if no
- .IR output_file
- operand is specified, and shall be used if the
- .IR output_file
- operand is
- .BR '\-'
- and the implementation treats the
- .BR '\-'
- as meaning standard output. Otherwise, the standard output shall
- not be used.
- See the OUTPUT FILES section.
- .SH STDERR
- The standard error shall be used only for diagnostic messages.
- .SH "OUTPUT FILES"
- If the
- .BR \-c
- option is specified, the output file shall be empty or each line
- shall be of the form:
- .sp
- .RS 4
- .nf
- "%d %s", <\fInumber of duplicates\fR>, <\fIline\fR>
- .fi
- .P
- .RE
- .P
- otherwise, the output file shall be empty or each line shall be
- of the form:
- .sp
- .RS 4
- .nf
- "%s", <\fIline\fR>
- .fi
- .P
- .RE
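- .P
- As a sketch of the counted output format, assuming a hypothetical
- three-line input: some implementations (GNU coreutils, for instance)
- right-align the count with leading <blank> characters, so the
- example strips them with
- .IR sed
- before the result is used. That normalization step is an addition
- for illustration and is not part of the format specified above:

```shell
# Hypothetical sample data: "a" twice, then "b" once.
# The sed step removes any leading blanks that an implementation may
# place before the count.
counted=$(printf 'a\na\nb\n' | uniq -c | sed 's/^[[:blank:]]*//')

# Prints two lines: "2 a" and "1 b"
printf '%s\n' "$counted"
```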
- .SH "EXTENDED DESCRIPTION"
- None.
- .SH "EXIT STATUS"
- The following exit values shall be returned:
- .IP "\00" 6
- The utility executed successfully.
- .IP >0 6
- An error occurred.
- .SH "CONSEQUENCES OF ERRORS"
- Default.
- .LP
- .IR "The following sections are informative."
- .SH "APPLICATION USAGE"
- If the collating sequence of the current locale has a total ordering
- of all characters, the
- .IR sort
- utility can be used to cause repeated lines to be adjacent in the input
- file. If the collating sequence does not have a total ordering of all
- characters, the
- .IR sort
- utility should still do this but it might not. To ensure that all
- duplicate lines are eliminated, and to have the output sorted according
- to the collating sequence of the current locale, applications should use:
- .sp
- .RS 4
- .nf
- LC_ALL=C sort -u | sort
- .fi
- .P
- .RE
- .P
- instead of:
- .sp
- .RS 4
- .nf
- sort | uniq
- .fi
- .P
- .RE
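- .P
- A minimal sketch of the recommended pipeline, assuming a hypothetical
- input in which the duplicate lines are not adjacent. The first
- .IR sort
- removes duplicates byte-for-byte in the C locale, and the second
- re-sorts the survivors according to the current locale:

```shell
# Hypothetical input with non-adjacent duplicates.
deduped=$(printf 'pear\napple\npear\napple\n' | LC_ALL=C sort -u | sort)

# Prints two lines: apple, pear
printf '%s\n' "$deduped"
```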
- .P
- To remove duplicate lines based on whether they collate equally
- instead of whether they are identical, applications should use:
- .sp
- .RS 4
- .nf
- sort -u
- .fi
- .P
- .RE
- .P
- instead of:
- .sp
- .RS 4
- .nf
- sort | uniq
- .fi
- .P
- .RE
- .P
- When using
- .IR uniq
- to process pathnames, it is recommended that LC_ALL, or at least
- LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment,
- since pathnames can contain byte sequences that do not form valid
- characters in some locales, in which case the utility's behavior would
- be undefined. In the POSIX locale each byte is a valid single-byte
- character, and therefore this problem is avoided.
- .SH EXAMPLES
- The following input file data (but flushed left) was used for a test
- series on
- .IR uniq :
- .sp
- .RS 4
- .nf
- #01 foo0 bar0 foo1 bar1
- #02 bar0 foo1 bar1 foo1
- #03 foo0 bar0 foo1 bar1
- #04
- #05 foo0 bar0 foo1 bar1
- #06 foo0 bar0 foo1 bar1
- #07 bar0 foo1 bar1 foo0
- .fi
- .P
- .RE
- .P
- What follows is a series of test invocations of the
- .IR uniq
- utility that use a mixture of
- .IR uniq
- options against the input file data. These tests verify the meaning of
- .IR adjacent .
- The
- .IR uniq
- utility views the input data as a sequence of strings delimited by
- .BR '\en' .
- Accordingly, for the
- .IR fields th
- member of the sequence,
- .IR uniq
- interprets unique or repeated adjacent lines strictly relative to the
- .IR fields +1th
- member.
- .IP " 1." 4
- This first example tests the line counting option, comparing each line
- of the input file data starting from the second field:
- .RS 4
- .sp
- .RS 4
- .nf
- uniq -c -f 1 uniq_0I.t
- 1 #01 foo0 bar0 foo1 bar1
- 1 #02 bar0 foo1 bar1 foo1
- 1 #03 foo0 bar0 foo1 bar1
- 1 #04
- 2 #05 foo0 bar0 foo1 bar1
- 1 #07 bar0 foo1 bar1 foo0
- .fi
- .P
- .RE
- .P
- The number
- .BR '2' ,
- prefixing the fifth line of output, signifies that the
- .IR uniq
- utility detected a pair of repeated lines. Given the input data, this
- can only be true when
- .IR uniq
- is run using the
- .BR "\-f\ 1"
- option (which shall cause
- .IR uniq
- to ignore the first field on each input line).
- .RE
- .IP " 2." 4
- The second example tests the option to suppress unique lines, comparing
- each line of the input file data starting from the second field:
- .RS 4
- .sp
- .RS 4
- .nf
- uniq -d -f 1 uniq_0I.t
- #05 foo0 bar0 foo1 bar1
- .fi
- .P
- .RE
- .RE
- .IP " 3." 4
- This test suppresses repeated lines, comparing each line of the input
- file data starting from the second field:
- .RS 4
- .sp
- .RS 4
- .nf
- uniq -u -f 1 uniq_0I.t
- #01 foo0 bar0 foo1 bar1
- #02 bar0 foo1 bar1 foo1
- #03 foo0 bar0 foo1 bar1
- #04
- #07 bar0 foo1 bar1 foo0
- .fi
- .P
- .RE
- .RE
- .IP " 4." 4
- This suppresses unique lines, comparing each line of the input file
- data starting from the third character:
- .RS 4
- .sp
- .RS 4
- .nf
- uniq -d -s 2 uniq_0I.t
- .fi
- .P
- .RE
- .P
- In the last example, the
- .IR uniq
- utility found no input matching the above criteria.
- .RE
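- .P
- The test series above can be reproduced with the following sketch.
- It assumes that the
- .IR mktemp
- utility is available to create a scratch directory; the variable
- names are hypothetical. The file name uniq_0I.t is the one used in
- the invocations above:

```shell
# Assumption: mktemp is available for creating a scratch directory.
workdir=$(mktemp -d)

# Recreate the test data, flushed left.
cat > "$workdir/uniq_0I.t" <<'EOF'
#01 foo0 bar0 foo1 bar1
#02 bar0 foo1 bar1 foo1
#03 foo0 bar0 foo1 bar1
#04
#05 foo0 bar0 foo1 bar1
#06 foo0 bar0 foo1 bar1
#07 bar0 foo1 bar1 foo0
EOF

# Example 2: only the repeated pair (#05 and #06) is reported, once.
example2=$(uniq -d -f 1 "$workdir/uniq_0I.t")

# Example 3: the five lines that have no adjacent repeat.
example3=$(uniq -u -f 1 "$workdir/uniq_0I.t")

# Example 4: skipping two characters still leaves a distinct digit on
# every line, so nothing is reported.
example4=$(uniq -d -s 2 "$workdir/uniq_0I.t")

printf '%s\n' "$example2"
printf '%s\n' "$example3"

rm -rf "$workdir"
```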
- .SH RATIONALE
- Some historical implementations have limited lines to be 1\|080 bytes
- in length, which does not meet the implied
- {LINE_MAX}
- limit.
- .P
- Earlier versions of this standard allowed the
- .BR \- \c
- .IR number
- and
- .BR \(pl \c
- .IR number
- options. These options are no longer specified by POSIX.1\(hy2008 but
- may be present in some implementations.
- .SH "FUTURE DIRECTIONS"
- None.
- .SH "SEE ALSO"
- .IR "\fIcomm\fR\^",
- .IR "\fIsort\fR\^"
- .P
- The Base Definitions volume of POSIX.1\(hy2017,
- .IR "Chapter 8" ", " "Environment Variables",
- .IR "Section 12.2" ", " "Utility Syntax Guidelines"
- .\"
- .SH COPYRIGHT
- Portions of this text are reprinted and reproduced in electronic form
- from IEEE Std 1003.1-2017, Standard for Information Technology
- -- Portable Operating System Interface (POSIX), The Open Group Base
- Specifications Issue 7, 2018 Edition,
- Copyright (C) 2018 by the Institute of
- Electrical and Electronics Engineers, Inc and The Open Group.
- In the event of any discrepancy between this version and the original IEEE and
- The Open Group Standard, the original IEEE and The Open Group Standard
- is the referee document. The original Standard can be obtained online at
- http://www.opengroup.org/unix/online.html .
- .PP
- Any typographical or formatting errors that appear
- in this page are most likely
- to have been introduced during the conversion of the source files to
- man page format. To report such errors, see
- https://www.kernel.org/doc/man-pages/reporting_bugs.html .