sort.1p (23063B)
- '\" et
- .TH SORT "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
- .\"
- .SH PROLOG
- This manual page is part of the POSIX Programmer's Manual.
- The Linux implementation of this interface may differ (consult
- the corresponding Linux manual page for details of Linux behavior),
- or the interface may not be implemented on Linux.
- .\"
- .SH NAME
- sort
- \(em sort, merge, or sequence check text files
- .SH SYNOPSIS
- .LP
- .nf
- sort \fB[\fR-m\fB] [\fR-o \fIoutput\fB] [\fR-bdfinru\fB] [\fR-t \fIchar\fB] [\fR-k \fIkeydef\fB]\fR... \fB[\fIfile\fR...\fB]\fR
- .P
- sort \fB[\fR-c|-C\fB] [\fR-bdfinru\fB] [\fR-t \fIchar\fB] [\fR-k \fIkeydef\fB] [\fIfile\fB]\fR
- .fi
- .SH DESCRIPTION
- The
- .IR sort
- utility shall perform one of the following functions:
- .IP " 1." 4
- Sort lines of all the named files together and write the result to
- the specified output.
- .IP " 2." 4
- Merge lines of all the named (presorted) files together and write the
- result to the specified output.
- .IP " 3." 4
- Check that a single input file is correctly presorted.
- .P
- Comparisons shall be based on one or more sort keys extracted from each
- line of input (or, if no sort keys are specified, the entire line up
- to, but not including, the terminating
- <newline>),
- and shall be performed using the collating sequence of the current
- locale. If this collating sequence does not have a total ordering of
- all characters (see the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 7.3.2" ", " "LC_COLLATE"),
- any lines of input that collate equally should be further compared
- byte-by-byte using the collating sequence for the POSIX locale.
- .SH OPTIONS
- The
- .IR sort
- utility shall conform to the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 12.2" ", " "Utility Syntax Guidelines",
- except for Guideline 9, and the
- .BR \-k
- .IR keydef
- option should follow the
- .BR \-b ,
- .BR \-d ,
- .BR \-f ,
- .BR \-i ,
- .BR \-n ,
- and
- .BR \-r
- options. In addition,
- .BR '\(pl'
- may be recognized as an option delimiter as well as
- .BR '\-' .
- .P
- The following options shall be supported:
- .IP "\fB\-c\fP" 10
- Check that the single input file is ordered as specified by the
- arguments and the collating sequence of the current locale. Output
- shall not be sent to standard output. The exit code shall indicate
- whether or not disorder was detected or an error occurred. If
- disorder (or, with
- .BR \-u ,
- a duplicate key) is detected, a warning message shall be sent to
- standard error indicating where the disorder or duplicate key
- was found.
- .IP "\fB\-C\fP" 10
- Same as
- .BR \-c ,
- except that a warning message shall not be sent to standard error
- if disorder or, with
- .BR \-u ,
- a duplicate key is detected.
- .IP "\fB\-m\fP" 10
- Merge only; the input file shall be assumed to be already sorted.
- .IP "\fB\-o\ \fIoutput\fR" 10
- Specify the name of an output file to be used instead of the standard
- output. This file can be the same as one of the input
- .IR file s.
- .IP "\fB\-u\fP" 10
- Unique: suppress all but one in each set of lines having equal keys.
- If used with the
- .BR \-c
- option, check that there are no lines with duplicate keys, in addition
- to checking that the input file is sorted.
- .P
- The following options shall override the default ordering rules. When
- ordering options appear independent of any key field specifications,
- the requested field ordering rules shall be applied globally to all
- sort keys. When attached to a specific key (see
- .BR \-k ),
- the specified ordering options shall override all global ordering
- options for that key.
- .IP "\fB\-d\fP" 10
- Specify that only
- <blank>
- characters and alphanumeric characters, according to the current
- setting of
- .IR LC_CTYPE ,
- shall be significant in comparisons. The behavior is undefined for a
- sort key to which
- .BR \-i
- or
- .BR \-n
- also applies.
- .IP "\fB\-f\fP" 10
- Consider all lowercase characters that have uppercase equivalents,
- according to the current setting of
- .IR LC_CTYPE ,
- to be the uppercase equivalent for the purposes of comparison.
- .IP "\fB\-i\fP" 10
- Ignore all characters that are non-printable, according to the current
- setting of
- .IR LC_CTYPE .
- The behavior is undefined for a sort key for which
- .BR \-n
- also applies.
- .IP "\fB\-n\fP" 10
- Restrict the sort key to an initial numeric string, consisting of
- optional
- <blank>
- characters, optional
- <hyphen-minus>
- character, and zero or more digits with an
- optional radix character and thousands separators (as defined in the
- current locale), which shall be sorted by arithmetic value. An empty
- digit string shall be treated as zero. Leading zeros and signs on zeros
- shall not affect ordering.
- .IP "\fB\-r\fP" 10
- Reverse the sense of comparisons.
- .P
- The treatment of field separators can be altered using the options:
- .IP "\fB\-b\fP" 10
- Ignore leading
- <blank>
- characters when determining the starting and ending positions of a
- restricted sort key. If the
- .BR \-b
- option is specified before the first
- .BR \-k
- option, it shall be applied to all
- .BR \-k
- options. Otherwise, the
- .BR \-b
- option can be attached independently to each
- .BR \-k
- .IR field_start
- or
- .IR field_end
- option-argument (see below).
- .IP "\fB\-t\ \fIchar\fR" 10
- Use
- .IR char
- as the field separator character;
- .IR char
- shall not be considered to be part of a field (although it can be
- included in a sort key). Each occurrence of
- .IR char
- shall be significant (for example, <\fIchar\fR><\fIchar\fR> delimits an
- empty field). If
- .BR \-t
- is not specified,
- <blank>
- characters shall be used as default field separators; each maximal
- non-empty sequence of
- <blank>
- characters that follows a non-\c
- <blank>
- shall be a field separator.
- .P
- Sort keys can be specified using the options:
- .IP "\fB\-k\ \fIkeydef\fR" 10
- The
- .IR keydef
- argument is a restricted sort key field definition. The format of this
- definition is:
- .RS 10
- .sp
- .RS 4
- .nf
- \fIfield_start\fB[\fItype\fB][\fR,\fIfield_end\fB[\fItype\fB]]\fR
- .fi
- .P
- .RE
- .P
- where
- .IR field_start
- and
- .IR field_end
- define a key field restricted to a portion of the line (see the
- EXTENDED DESCRIPTION section), and
- .IR type
- is one or more modifiers from the list of characters
- .BR 'b' ,
- .BR 'd' ,
- .BR 'f' ,
- .BR 'i' ,
- .BR 'n' ,
- .BR 'r' .
- The
- .BR 'b'
- modifier shall behave like the
- .BR \-b
- option, but shall apply only to the
- .IR field_start
- or
- .IR field_end
- to which it is attached. The other modifiers shall behave like the
- corresponding options, but shall apply only to the key field to which
- they are attached; they shall have this effect if specified with
- .IR field_start ,
- .IR field_end ,
- or both. If any modifier is attached to a
- .IR field_start
- or to a
- .IR field_end ,
- no option shall apply to either. Implementations shall support at
- least nine occurrences of the
- .BR \-k
- option, which shall be significant in command line order. If no
- .BR \-k
- option is specified, a default sort key of the entire line shall be
- used.
- .P
- When there are multiple key fields, later keys shall be compared only
- after all earlier keys compare equal. Except when the
- .BR \-u
- option is specified, lines that otherwise compare equal shall be
- ordered as if none of the options
- .BR \-d ,
- .BR \-f ,
- .BR \-i ,
- .BR \-n ,
- or
- .BR \-k
- were present (but with
- .BR \-r
- still in effect, if it was specified) and with all bytes in the lines
- significant to the comparison. The order in which lines that still
- compare equal are written is unspecified.
- .RE
- .SH OPERANDS
- The following operand shall be supported:
- .IP "\fIfile\fR" 10
- A pathname of a file to be sorted, merged, or checked. If no
- .IR file
- operands are specified, or if a
- .IR file
- operand is
- .BR '\-' ,
- the standard input shall be used. If
- .IR sort
- encounters an error when opening or reading a
- .IR file
- operand, it may exit without writing any output to standard output or
- processing later operands.
- .SH STDIN
- The standard input shall be used only if no
- .IR file
- operands are specified, or if a
- .IR file
- operand is
- .BR '\-' .
- See the INPUT FILES section.
- .SH "INPUT FILES"
- The input files shall be text files, except that the
- .IR sort
- utility shall add a
- <newline>
- to the end of a file ending with an incomplete last line.
- .SH "ENVIRONMENT VARIABLES"
- The following environment variables shall affect the execution of
- .IR sort :
- .IP "\fILANG\fP" 10
- Provide a default value for the internationalization variables that are
- unset or null. (See the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 8.2" ", " "Internationalization Variables"
- for the precedence of internationalization variables used to determine
- the values of locale categories.)
- .IP "\fILC_ALL\fP" 10
- If set to a non-empty string value, override the values of all the
- other internationalization variables.
- .IP "\fILC_COLLATE\fP" 10
- .br
- Determine the locale for ordering rules.
- .IP "\fILC_CTYPE\fP" 10
- Determine the locale for the interpretation of sequences of bytes of
- text data as characters (for example, single-byte as opposed to
- multi-byte characters in arguments and input files) and the behavior of
- character classification for the
- .BR \-b ,
- .BR \-d ,
- .BR \-f ,
- .BR \-i ,
- and
- .BR \-n
- options.
- .IP "\fILC_MESSAGES\fP" 10
- .br
- Determine the locale that should be used to affect the format and
- contents of diagnostic messages written to standard error.
- .IP "\fILC_NUMERIC\fP" 10
- .br
- Determine the locale for the definition of the radix character and
- thousands separator for the
- .BR \-n
- option.
- .IP "\fINLSPATH\fP" 10
- Determine the location of message catalogs for the processing of
- .IR LC_MESSAGES .
- .SH "ASYNCHRONOUS EVENTS"
- Default.
- .SH STDOUT
- Unless the
- .BR \-o
- or
- .BR \-c
- options are in effect, the standard output shall contain the sorted
- input.
- .SH STDERR
- The standard error shall be used for diagnostic messages. When
- .BR \-c
- is specified, if disorder is detected (or if
- .BR \-u
- is also specified and a duplicate key is detected), a message shall
- be written to the standard error which identifies the input line at
- which disorder (or a duplicate key) was detected. A warning
- message about correcting an incomplete last line of an input file
- may be generated, but need not affect the final exit status.
- .SH "OUTPUT FILES"
- If the
- .BR \-o
- option is in effect, the sorted input shall be written to the file
- .IR output .
- .SH "EXTENDED DESCRIPTION"
- The notation:
- .sp
- .RS 4
- .nf
- -k \fIfield_start\fB[\fItype\fB][\fR,\fIfield_end\fB[\fItype\fB]]\fR
- .fi
- .P
- .RE
- .P
- shall define a key field that begins at
- .IR field_start
- and ends at
- .IR field_end
- inclusive, unless
- .IR field_start
- falls beyond the end of the line or after
- .IR field_end ,
- in which case the key field is empty. A missing
- .IR field_end
- shall mean the last character of the line.
- .P
- A field comprises a maximal sequence of non-separating characters and,
- in the absence of option
- .BR \-t ,
- any preceding field separator.
- .P
- The
- .IR field_start
- portion of the
- .IR keydef
- option-argument shall have the form:
- .sp
- .RS 4
- .nf
- \fIfield_number\fB[\fR.\fIfirst_character\fB]\fR
- .fi
- .P
- .RE
- .P
- Fields and characters within fields shall be numbered starting with 1.
- The
- .IR field_number
- and
- .IR first_character
- pieces, interpreted as positive decimal integers, shall specify the
- first character to be used as part of a sort key. If
- .IR .first_character
- is omitted, it shall refer to the first character of the field.
- .P
- The
- .IR field_end
- portion of the
- .IR keydef
- option-argument shall have the form:
- .sp
- .RS 4
- .nf
- \fIfield_number\fB[\fR.\fIlast_character\fB]\fR
- .fi
- .P
- .RE
- .P
- The
- .IR field_number
- shall be as described above for
- .IR field_start.
- The
- .IR last_character
- piece, interpreted as a non-negative decimal integer, shall specify the
- last character to be used as part of the sort key. If
- .IR last_character
- evaluates to zero or
- .IR .last_character
- is omitted, it shall refer to the last character of the field specified
- by
- .IR field_number .
- .P
- If the
- .BR \-b
- option or
- .BR b
- type modifier is in effect, characters within a field shall be counted
- from the first non-\c
- <blank>
- in the field. (This shall apply separately to
- .IR first_character
- and
- .IR last_character .)
- .SH "EXIT STATUS"
- The following exit values shall be returned:
- .IP "\00" 6
- All input files were output successfully, or
- .BR \-c
- was specified and the input file was correctly sorted.
- .IP "\01" 6
- Under the
- .BR \-c
- option, the file was not ordered as specified, or if the
- .BR \-c
- and
- .BR \-u
- options were both specified, two input lines were found with equal
- keys.
- .IP >1 6
- An error occurred.
- .SH "CONSEQUENCES OF ERRORS"
- The default requirements shall apply, except that if
- .IR sort
- encounters an error when opening or reading a
- .IR file
- operand, it may exit without writing any output to standard output or
- processing later operands.
- .LP
- .IR "The following sections are informative."
- .SH "APPLICATION USAGE"
- The default value for
- .BR \-t ,
- <blank>,
- has different properties from, for example,
- .BR \-t \c
- "<space>". If a line contains:
- .sp
- .RS 4
- .nf
- <space><space>foo
- .fi
- .P
- .RE
- .P
- the following treatment would occur with default separation as opposed
- to specifically selecting a
- <space>:
- .TS
- center box tab(@);
- cB | cB | cB
- n | l | l.
- Field@Default@\-t "<space>"
- _
- 1@<space><space>foo@\fIempty\fP
- 2@\fIempty\fP@\fIempty\fP
- 3@\fIempty\fP@foo
- .TE
- .P
- The leading field separator itself is included in a field when
- .BR \-t
- is not used. For example, this command returns an exit status of zero,
- meaning the input was already sorted:
- .sp
- .RS 4
- .nf
- sort -c -k 2 <<eof
- y<tab>b
- x<space>a
- eof
- .fi
- .P
- .RE
- .P
- (assuming that a
- <tab>
- precedes the
- <space>
- in the current collating sequence). The field separator is not included
- in a field when it is explicitly set via
- .BR \-t .
- This is historical practice and allows usage such as:
- .sp
- .RS 4
- .nf
- sort -t "|" -k 2n <<eof
- Atlanta|425022|Georgia
- Birmingham|284413|Alabama
- Columbia|100385|South Carolina
- eof
- .fi
- .P
- .RE
- .P
- where the second field can be correctly sorted numerically without
- regard to the non-numeric field separator.
- .P
- The wording in the OPTIONS section clarifies that the
- .BR \-b ,
- .BR \-d ,
- .BR \-f ,
- .BR \-i ,
- .BR \-n ,
- and
- .BR \-r
- options have to come before the first sort key specified if they are
- intended to apply to all specified keys. The way it is described in
- \&this volume of POSIX.1\(hy2017 matches historical practice, not historical documentation.
- The results are unspecified if these options are specified after a
- .BR \-k
- option.
- .P
- The
- .BR \-f
- option might not work as expected in locales where there is not a
- one-to-one mapping between an uppercase and a lowercase letter.
- .P
- When using
- .IR sort
- to process pathnames, it is recommended that LC_ALL, or at least
- LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment,
- since pathnames can contain byte sequences that do not form valid
- characters in some locales, in which case the utility's behavior would
- be undefined. In the POSIX locale each byte is a valid single-byte
- character, and therefore this problem is avoided.
- .P
- If the collating sequence of the current locale does not have a total
- ordering of all characters, this can affect the behavior of
- .IR sort
- in the following ways:
- .IP " *" 4
- As
- .IR sort
- .BR \-u
- suppresses lines with duplicate keys, it suppresses lines that collate
- equally but are not identical.
- .IP " *" 4
- The output of
- .IR sort
- (without
- .BR \-u )
- can contain identical lines that are not adjacent, if it does not
- implement the recommended further byte-by-byte comparison of lines
- that collate equally. This affects the use of
- .IR sort
- with
- .IR comm
- and
- .IR uniq ;
- see the APPLICATION USAGE for those utilities.
- .SH EXAMPLES
- .IP " 1." 4
- The following command sorts the contents of
- .BR infile
- with the second field as the sort key:
- .RS 4
- .sp
- .RS 4
- .nf
- sort -k 2,2 infile
- .fi
- .P
- .RE
- .RE
- .IP " 2." 4
- The following command sorts, in reverse order, the contents of
- .BR infile1
- and
- .BR infile2 ,
- placing the output in
- .BR outfile
- and using the second character of the second field as the sort key
- (assuming that the first character of the second field is the field
- separator):
- .RS 4
- .sp
- .RS 4
- .nf
- sort -r -o outfile -k 2.2,2.2 infile1 infile2
- .fi
- .P
- .RE
- .RE
- .IP " 3." 4
- The following command sorts the contents of
- .BR infile1
- and
- .BR infile2
- using the second non-\c
- <blank>
- of the second field as the sort key:
- .RS 4
- .sp
- .RS 4
- .nf
- sort -k 2.2b,2.2b infile1 infile2
- .fi
- .P
- .RE
- .RE
- .IP " 4." 4
- The following command prints the System\ V password file (user
- database) sorted by the numeric user ID (the third
- <colon>-separated
- field):
- .RS 4
- .sp
- .RS 4
- .nf
- sort -t : -k 3,3n /etc/passwd
- .fi
- .P
- .RE
- .RE
- .IP " 5." 4
- The following command prints the lines of the already sorted file
- .BR infile ,
- suppressing all but one occurrence of lines having the same third
- field:
- .RS 4
- .sp
- .RS 4
- .nf
- sort -um -k 3.1,3.0 infile
- .fi
- .P
- .RE
- .RE
- .SH RATIONALE
- Examples in some historical documentation state that options
- .BR \-um
- with one input file keep the first in each set of lines with equal
- keys. This behavior was deemed to be an implementation artifact and
- was not standardized.
- .P
- The
- .BR \-z
- option was omitted; it is not standard practice on most systems and is
- inconsistent with using
- .IR sort
- to sort several files individually and then merge them together. The
- text concerning
- .BR \-z
- in historical documentation appeared to require implementations to
- determine the proper buffer length during the sort phase of operation,
- but not during the merge.
- .P
- The
- .BR \-y
- option was omitted because of non-portability. The
- .BR \-M
- option, present in System V, was omitted because of non-portability in
- international usage.
- .P
- An undocumented
- .BR \-T
- option exists in some implementations. It is used to specify a
- directory for intermediate files. Implementations are encouraged to
- support the use of the
- .IR TMPDIR
- environment variable instead of adding an option to support this
- functionality.
- .P
- The
- .BR \-k
- option was added to satisfy two objections. First, the zero-based
- counting used by
- .IR sort
- is not consistent with other utility conventions. Second, it did not
- meet syntax guideline requirements.
- .P
- Historical documentation indicates that ``setting
- .BR \-n
- implies
- .BR \-b ''.
- The description of
- .BR \-n
- already states that optional leading <blank>s are tolerated in doing
- the comparison. If
- .BR \-b
- is enabled, rather than implied, by
- .BR \-n ,
- this has unusual side-effects. When a character offset is used in a
- column of numbers (for example, to sort modulo 100), that offset is
- measured relative to the most significant digit, not to the column.
- Based upon a recommendation from the author of the original
- .IR sort
- utility, the
- .BR \-b
- implication has been omitted from this volume of POSIX.1\(hy2017, and an application wishing to
- achieve the previously mentioned side-effects has to code the
- .BR \-b
- flag explicitly.
- .P
- Earlier versions of this standard allowed the
- .BR \-o
- option to appear after operands. Historical practice allowed all
- options to be interspersed with operands. This version of the
- standard allows implementations to accept options after operands
- but conforming applications should not use this form.
- .P
- Earlier versions of this standard also allowed the
- .BR \- \c
- .IR number
- and
- .BR \(pl \c
- .IR number
- options. These options are no longer specified by POSIX.1\(hy2008 but may
- be present in some implementations.
- .P
- Historical implementations produced a message on standard error when
- .BR \-c
- was specified and disorder was detected, and when
- .BR \-c
- and
- .BR \-u
- were specified and a duplicate key was detected. An earlier version of
- this standard contained wording that did not make it clear that this
- message was allowed and some implementations removed this message to
- be sure that they conformed to the standard's requirements. Confronted
- with this difference in behavior, interactive users that wanted to be
- sure that they got visual feedback instead of just exit code 1 could
- have used a command like:
- .sp
- .RS 4
- .nf
- sort -c file || echo disorder
- .fi
- .P
- .RE
- .P
- whether or not the
- .IR sort
- utility provided a message in this case. But, it was not easy for a user
- to find where the disorder or duplicate key occurred on implementations
- that do not produce a message, especially when some parts of the input
- line were not part of the key and when one or more of the
- .BR \-b ,
- .BR \-d ,
- .BR \-f ,
- .BR \-i ,
- .BR \-n ,
- or
- .BR \- r
- options or
- .IR keydef
- type modifiers were in use. POSIX.1\(hy2008 requires a message to be
- produced in this case. POSIX.1\(hy2008 also contains the
- .BR \-C
- option giving users the ability to choose either behavior.
- .P
- When a disorder or duplicate is found when the
- .BR \-c
- option is specified, some implementations print a message containing
- the first line that is out of order or contains a duplicate key; others
- print a message specifying the line number of the offending line. This
- standard allows either type of message.
- .P
- Implementations are encouraged to perform the recommended further
- byte-by-byte comparison of lines that collate equally, even though
- this may affect efficiency. The impact on efficiency can be mitigated
- by only performing the additional comparison if the current locale's
- collating sequence does not have a total ordering of all characters
- (if the implementation provides a way to query this) or by only
- performing the additional comparison if the locale name associated
- with the LC_COLLATE category has an
- .BR '@'
- modifier in the name (since locales without an
- .BR '@'
- modifier should have a total ordering of all characters \(em see the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 7.3.2" ", " "LC_COLLATE").
- Note that if the implementation provides a
- .IR "stable sort"
- option as an extension (usually
- .BR \-s ),
- the additional comparison should not be performed when this option has
- been specified.
- .SH "FUTURE DIRECTIONS"
- A future version of this standard may require that if the collating
- sequence of the current locale does not have a total ordering of all
- characters, any lines of input that collate equally when comparing
- them as whole lines are further compared byte-by-byte using the
- collating sequence for the POSIX locale.
- .SH "SEE ALSO"
- .IR "\fIcomm\fR\^",
- .IR "\fIjoin\fR\^",
- .IR "\fIuniq\fR\^"
- .P
- The Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 7.3.2" ", " "LC_COLLATE",
- .IR "Chapter 8" ", " "Environment Variables",
- .IR "Section 12.2" ", " "Utility Syntax Guidelines"
- .P
- The System Interfaces volume of POSIX.1\(hy2017,
- .IR "\fItoupper\fR\^(\|)"
- .\"
- .SH COPYRIGHT
- Portions of this text are reprinted and reproduced in electronic form
- from IEEE Std 1003.1-2017, Standard for Information Technology
- -- Portable Operating System Interface (POSIX), The Open Group Base
- Specifications Issue 7, 2018 Edition,
- Copyright (C) 2018 by the Institute of
- Electrical and Electronics Engineers, Inc and The Open Group.
- In the event of any discrepancy between this version and the original IEEE and
- The Open Group Standard, the original IEEE and The Open Group Standard
- is the referee document. The original Standard can be obtained online at
- http://www.opengroup.org/unix/online.html .
- .PP
- Any typographical or formatting errors that appear
- in this page are most likely
- to have been introduced during the conversion of the source files to
- man page format. To report such errors, see
- https://www.kernel.org/doc/man-pages/reporting_bugs.html .