- '\" et
- .TH TR "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
- .\"
- .SH PROLOG
- This manual page is part of the POSIX Programmer's Manual.
- The Linux implementation of this interface may differ (consult
- the corresponding Linux manual page for details of Linux behavior),
- or the interface may not be implemented on Linux.
- .\"
- .SH NAME
- tr
- \(em translate characters
- .SH SYNOPSIS
- .LP
- .nf
- tr \fB[\fR-c|-C\fB] [\fR-s\fB] \fIstring1 string2\fR
- .P
- tr -s \fB[\fR-c|-C\fB] \fIstring1\fR
- .P
- tr -d \fB[\fR-c|-C\fB] \fIstring1\fR
- .P
- tr -ds \fB[\fR-c|-C\fB] \fIstring1 string2\fR
- .fi
- .SH DESCRIPTION
- The
- .IR tr
- utility shall copy the standard input to the standard output with
- substitution or deletion of selected characters. The options specified
- and the
- .IR string1
- and
- .IR string2
- operands shall control translations that occur while copying characters
- and single-character collating elements.
- .SH OPTIONS
- The
- .IR tr
- utility shall conform to the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 12.2" ", " "Utility Syntax Guidelines".
- .P
- The following options shall be supported:
- .IP "\fB\-c\fP" 10
- Complement the set of values specified by
- .IR string1 .
- See the EXTENDED DESCRIPTION section.
- .IP "\fB\-C\fP" 10
- Complement the set of characters specified by
- .IR string1 .
- See the EXTENDED DESCRIPTION section.
- .IP "\fB\-d\fP" 10
- Delete all occurrences of input characters that are specified by
- .IR string1 .
- .IP "\fB\-s\fP" 10
- Replace instances of repeated characters with a single character, as
- described in the EXTENDED DESCRIPTION section.
- .SH OPERANDS
- The following operands shall be supported:
- .IP "\fIstring1\fR,\ \fIstring2\fR" 10
- .br
- Translation control strings. Each string shall represent a set of
- characters to be converted into an array of characters used for the
- translation. For a detailed description of how the strings are
- interpreted, see the EXTENDED DESCRIPTION section.
- .SH STDIN
- The standard input can be any type of file.
- .SH "INPUT FILES"
- None.
- .SH "ENVIRONMENT VARIABLES"
- The following environment variables shall affect the execution of
- .IR tr :
- .IP "\fILANG\fP" 10
- Provide a default value for the internationalization variables that are
- unset or null. (See the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 8.2" ", " "Internationalization Variables"
- for the precedence of internationalization variables used to determine
- the values of locale categories.)
- .IP "\fILC_ALL\fP" 10
- If set to a non-empty string value, override the values of all the
- other internationalization variables.
- .IP "\fILC_COLLATE\fP" 10
- .br
- Determine the locale for the behavior of range expressions and
- equivalence classes.
- .IP "\fILC_CTYPE\fP" 10
- Determine the locale for the interpretation of sequences of bytes of
- text data as characters (for example, single-byte as opposed to
- multi-byte characters in arguments) and the behavior of character
- classes.
- .IP "\fILC_MESSAGES\fP" 10
- .br
- Determine the locale that should be used to affect the format and
- contents of diagnostic messages written to standard error.
- .IP "\fINLSPATH\fP" 10
- Determine the location of message catalogs for the processing of
- .IR LC_MESSAGES .
- .SH "ASYNCHRONOUS EVENTS"
- Default.
- .SH STDOUT
- The
- .IR tr
- output shall be identical to the input, with the exception of the
- specified transformations.
- .SH STDERR
- The standard error shall be used only for diagnostic messages.
- .SH "OUTPUT FILES"
- None.
- .SH "EXTENDED DESCRIPTION"
- The operands
- .IR string1
- and
- .IR string2
- (if specified) define two arrays of characters. The constructs in the
- following list can be used to specify characters or single-character
- collating elements. If any of the constructs result in multi-character
- collating elements,
- .IR tr
- shall exclude, without a diagnostic, those multi-character elements
- from the resulting array.
- .IP "\fIcharacter\fR" 10
- Any character not described by one of the conventions below shall
- represent itself.
- .IP "\e\fIoctal\fR" 10
- Octal sequences can be used to represent characters with specific coded
- values. An octal sequence shall consist of a
- <backslash>
- followed by the longest sequence of one, two, or three octal-digit
- characters (01234567). The sequence shall cause the value whose encoding
- is represented by the one, two, or three-digit octal integer to be placed
- into the array. Multi-byte characters require multiple, concatenated
- escape sequences of this type, including the leading
- <backslash>
- for each byte.
- .IP "\e\fIcharacter\fR" 10
- The
- <backslash>-escape
- sequences in the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Table 5-1" ", " "Escape Sequences and Associated Actions"
- (\c
- .BR '\e\e' ,
- .BR '\ea' ,
- .BR '\eb' ,
- .BR '\ef' ,
- .BR '\en' ,
- .BR '\er' ,
- .BR '\et' ,
- .BR '\ev' )
- shall be supported. The results of using any other character, other
- than an octal digit, following the
- <backslash>
- are unspecified. Also, if there is no character following the
- <backslash>,
- the results are unspecified.
- .IP "\fIc\fR\-\fIc\fR" 10
- In the POSIX locale, this construct shall represent the range of
- collating elements between the range endpoints (as long as neither
- endpoint is an octal sequence of the form \e\fIoctal\fP), inclusive, as
- defined by the collation sequence. The characters or collating elements
- in the range shall be placed in the array in ascending collation
- sequence. If the second endpoint precedes the starting endpoint in the
- collation sequence, it is unspecified whether the range of collating
- elements is empty, or this construct is treated as invalid. In locales
- other than the POSIX locale, this construct has unspecified behavior.
- .RS 10
- .P
- If either or both of the range endpoints are octal sequences of the
- form \e\fIoctal\fP, this shall represent the range of specific coded
- values between the two range endpoints, inclusive.
- .RE
- .IP "[:\fIclass\fR:]" 10
- Represents all characters belonging to the defined character class, as
- defined by the current setting of the
- .IR LC_CTYPE
- locale category. The following character class names shall be accepted
- when specified in
- .IR string1 :
- .TS
- tab(@);
- lB lB lB lB lB lB.
- alnum@blank@digit@lower@punct@upper
- alpha@cntrl@graph@print@space@xdigit
- .TE
- .RS 10
- .P
- In addition, character class expressions of the form [:\c
- .IR name :]
- shall be recognized in those locales where the
- .IR name
- keyword has been given a
- .BR charclass
- definition in the
- .IR LC_CTYPE
- category.
- .P
- When both the
- .BR \-d
- and
- .BR \-s
- options are specified, any of the character class names shall be
- accepted in
- .IR string2 .
- Otherwise, only character class names
- .BR lower
- or
- .BR upper
- are valid in
- .IR string2
- and then only if the corresponding character class (\c
- .BR upper
- and
- .BR lower ,
- respectively) is specified in the same relative position in
- .IR string1 .
- Such a specification shall be interpreted as a request for case
- conversion. When [:\c
- .IR lower :]
- appears in
- .IR string1
- and [:\c
- .IR upper :]
- appears in
- .IR string2 ,
- the arrays shall contain the characters from the
- .BR toupper
- mapping in the
- .IR LC_CTYPE
- category of the current locale. When [:\c
- .IR upper :]
- appears in
- .IR string1
- and [:\c
- .IR lower :]
- appears in
- .IR string2 ,
- the arrays shall contain the characters from the
- .BR tolower
- mapping in the
- .IR LC_CTYPE
- category of the current locale. The first character from each mapping
- pair shall be in the array for
- .IR string1
- and the second character from each mapping pair shall be in the array
- for
- .IR string2
- in the same relative position.
- .P
- Except for case conversion, the characters specified by a character
- class expression shall be placed in the array in an unspecified order.
- .P
- If the name specified for
- .IR class
- does not define a valid character class in the current locale, the
- behavior is undefined.
- .RE
- .IP "[=\fIequiv\fR=]" 10
- Represents all characters or collating elements belonging to the same
- equivalence class as
- .IR equiv ,
- as defined by the current setting of the
- .IR LC_COLLATE
- locale category. An equivalence class expression shall be allowed only
- in
- .IR string1 ,
- or in
- .IR string2
- when it is being used by the combined
- .BR \-d
- and
- .BR \-s
- options. The characters belonging to the equivalence class shall be
- placed in the array in an unspecified order.
- .IP "[\fIx\fR*\fIn\fR]" 10
- Represents
- .IR n
- repeated occurrences of the character
- .IR x .
- Because this expression is used to map multiple characters to one, it
- is only valid when it occurs in
- .IR string2 .
- If
- .IR n
- is omitted or is zero, it shall be interpreted as large enough to
- extend the
- .IR string2 -based
- sequence to the length of the
- .IR string1 -based
- sequence. If
- .IR n
- has a leading zero, it shall be interpreted as an octal value.
- Otherwise, it shall be interpreted as a decimal value.
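- .P
- The constructs listed above can be tried with a minimal sketch such as
- the following. It assumes a POSIX shell and sets
- .IR LC_ALL
- to C so that range expressions have their POSIX-locale meaning. The
- variable names and sample inputs are illustrative only.

```shell
#!/bin/sh
# Force the POSIX locale; ranges are unspecified in other locales.
LC_ALL=C
export LC_ALL

# c-c range: map lowercase ASCII letters to uppercase.
range_out=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# \octal: replace each <tab> (octal 011) with a <space>.
octal_out=$(printf 'a\tb\n' | tr '\011' ' ')

# [x*n]: string2 expands to three 'x' characters, one per string1 entry.
repeat_out=$(printf 'abc\n' | tr 'abc' '[x*3]')

# [x*]: n is omitted, so 'd' is repeated to the length of string1.
fill_out=$(printf 'tel: 555-0199\n' | tr '0123456789' '[d*]')

printf '%s\n' "$range_out" "$octal_out" "$repeat_out" "$fill_out"
# prints:
# HELLO WORLD
# a b
# xxx
# tel: ddd-dddd
```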
- .P
- When the
- .BR \-d
- option is not specified:
- .IP " *" 4
- If
- .IR string2
- is present, each input character found in the array specified by
- .IR string1
- shall be replaced by the character in the same relative position in the
- array specified by
- .IR string2 .
- If the array specified by
- .IR string2
- is shorter than the one specified by
- .IR string1 ,
- or if a character occurs more than once in
- .IR string1 ,
- the results are unspecified.
- .IP " *" 4
- If the
- .BR \-C
- option is specified, the complements of the characters specified by
- .IR string1
- (the set of all characters in the current character set, as defined by
- the current setting of
- .IR LC_CTYPE ,
- except for those actually specified in the
- .IR string1
- operand) shall be placed in the array in ascending collation sequence,
- as defined by the current setting of
- .IR LC_COLLATE .
- .IP " *" 4
- If the
- .BR \-c
- option is specified, the complement of the values specified by
- .IR string1
- shall be placed in the array in ascending order by binary value.
- .IP " *" 4
- Because the order in which characters specified by character class
- expressions or equivalence class expressions are placed in the array is
- unspecified, such
- expressions should only be used if the intent is to map several
- characters into one. An exception is case conversion, as described
- previously.
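- .P
- As a hedged illustration of the translation rules above, the following
- sketch shows a positional mapping and a complemented mapping. It
- assumes a POSIX shell and the POSIX locale; the variable names and
- sample inputs are hypothetical.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Positional mapping with arrays of equal length: a->x, b->y, c->z.
map_out=$(printf 'aabbcc\n' | tr 'abc' 'xyz')

# -c: every value NOT in string1 is translated. [_*] pads string2 so that
# all complemented values map to '_'. The <newline> is listed in string1
# so that the line terminator is left alone.
comp_out=$(printf 'id=42;ok\n' | tr -c '[:alnum:]\n' '[_*]')

printf '%s\n' "$map_out" "$comp_out"
# prints:
# xxyyzz
# id_42_ok
```

- .P
- The character class is used here only to map several characters into
- one, which does not depend on the order of the class members.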
- .P
- When the
- .BR \-d
- option is specified:
- .IP " *" 4
- Input characters found in the array specified by
- .IR string1
- shall be deleted.
- .IP " *" 4
- When the
- .BR \-C
- option is specified with
- .BR \-d ,
- all characters except those specified by
- .IR string1
- shall be deleted. The contents of
- .IR string2
- are ignored, unless the
- .BR \-s
- option is also specified.
- .IP " *" 4
- When the
- .BR \-c
- option is specified with
- .BR \-d ,
- all values except those specified by
- .IR string1
- shall be deleted. The contents of
- .IR string2
- shall be ignored, unless the
- .BR \-s
- option is also specified.
- .IP " *" 4
- The same string cannot be used for both the
- .BR \-d
- and the
- .BR \-s
- option; when both options are specified, both
- .IR string1
- (used for deletion) and
- .IR string2
- (used for squeezing) shall be required.
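- .P
- The deletion rules above might be exercised as in the following sketch,
- which assumes a POSIX shell and the POSIX locale. The variable names
- and sample inputs are illustrative only.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# -d: delete every digit.
del_out=$(printf 'a1b2c3\n' | tr -d '0123456789')

# -cd: delete every value except digits and <newline>.
keep_out=$(printf 'a1b2c3\n' | tr -cd '0123456789\n')

# -ds: string1 names what to delete, string2 names what to squeeze.
# Digits are removed first, then runs of <space> collapse to one.
ds_out=$(printf 'a1  2b   c\n' | tr -ds '0123456789' ' ')

printf '%s\n' "$del_out" "$keep_out" "$ds_out"
# prints:
# abc
# 123
# a b c
```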
- .P
- When the
- .BR \-s
- option is specified, after any deletions or translations have taken
- place, repeated sequences of the same character shall be replaced by
- one occurrence of the same character, if the character is found in the
- array specified by the last operand. If the last operand contains a
- character class, such as the following example:
- .sp
- .RS 4
- .nf
- tr -s \(aq[:space:]\(aq
- .fi
- .P
- .RE
- .P
- the last operand's array shall contain all of the characters in that
- character class. However, in a case conversion, as described
- previously, such as:
- .sp
- .RS 4
- .nf
- tr -s \(aq[:upper:]\(aq \(aq[:lower:]\(aq
- .fi
- .P
- .RE
- .P
- the last operand's array shall contain only those characters defined as
- the second characters in each of the
- .BR toupper
- or
- .BR tolower
- character pairs, as appropriate.
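- .P
- The two squeezing cases described above can be compared with a small
- sketch like the following, assuming a POSIX shell and the POSIX locale.
- The variable names and sample inputs are hypothetical.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Single operand: runs of <space> are squeezed to one <space>.
sq_out=$(printf 'too    many   spaces\n' | tr -s ' ')

# Case conversion with -s: after translation, only the characters in the
# string2 array (lowercase letters here) are squeezed, so the run of two
# <space> characters is left untouched.
case_out=$(printf 'AAbb  CC\n' | tr -s '[:upper:]' '[:lower:]')

printf '%s\n' "$sq_out" "$case_out"
# prints:
# too many spaces
# ab  c
```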
- .P
- An empty string used for
- .IR string1
- or
- .IR string2
- produces undefined results.
- .SH "EXIT STATUS"
- The following exit values shall be returned:
- .IP "\00" 6
- All input was processed successfully.
- .IP >0 6
- An error occurred.
- .SH "CONSEQUENCES OF ERRORS"
- Default.
- .LP
- .IR "The following sections are informative."
- .SH "APPLICATION USAGE"
- If necessary,
- .IR string1
- and
- .IR string2
- can be quoted to avoid pattern matching by the shell.
- .P
- If an ordinary digit (representing itself) is to follow an octal
- sequence, the octal sequence must use the full three digits to avoid
- ambiguity.
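- .P
- The ambiguity can be seen in the following sketch, which assumes a
- POSIX shell and the POSIX locale with an ASCII-based encoding (where
- octal 111 is the letter I). The variable names are illustrative only.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# '\0111' is the three-digit octal sequence 011 (<tab>) followed by the
# ordinary digit '1', so both <tab> and '1' are translated.
full_out=$(printf 'a\tb1c\n' | tr '\0111' '__')

# '\111' is consumed as ONE octal sequence (the letter 'I' in ASCII), so
# neither <tab> nor '1' is affected.
short_out=$(printf 'a\tb1cI\n' | tr '\111' '_')

printf '%s\n' "$full_out" "$short_out"
# first line printed: a_b_c
# second line printed: a<tab>b1c_
```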
- .P
- When
- .IR string2
- is shorter than
- .IR string1 ,
- a difference results between historical System\ V and BSD systems. A
- BSD system pads
- .IR string2
- with the last character found in
- .IR string2 .
- Thus, it is possible to do the following:
- .sp
- .RS 4
- .nf
- tr 0123456789 d
- .fi
- .P
- .RE
- .P
- which would translate all digits to the letter
- .BR 'd' .
- Because this behavior is explicitly left unspecified in this volume of
- POSIX.1\(hy2017, both the BSD and
- System\ V behaviors are allowed, but a conforming application cannot rely
- on the BSD behavior. It would have to code the example in the
- following way:
- .sp
- .RS 4
- .nf
- tr 0123456789 \(aq[d*]\(aq
- .fi
- .P
- .RE
- .P
- It should be noted that, despite similarities in appearance, the string
- operands used by
- .IR tr
- are not regular expressions.
- .P
- Unlike some historical implementations, this definition of the
- .IR tr
- utility correctly processes NUL characters in its input stream. NUL
- characters can be stripped by using:
- .sp
- .RS 4
- .nf
- tr -d \(aq\e000\(aq
- .fi
- .P
- .RE
- .SH EXAMPLES
- .IP " 1." 4
- The following example creates a list of all words in
- .BR file1
- one per line in
- .BR file2 ,
- where a word is taken to be a maximal string of letters.
- .RS 4
- .sp
- .RS 4
- .nf
- tr -cs "[:alpha:]" "[\en*]" <file1 >file2
- .fi
- .P
- .RE
- .RE
- .IP " 2." 4
- The next example translates all lowercase characters in
- .BR file1
- to uppercase and writes the results to standard output.
- .RS 4
- .sp
- .RS 4
- .nf
- tr "[:lower:]" "[:upper:]" <file1
- .fi
- .P
- .RE
- .RE
- .IP " 3." 4
- This example uses an equivalence class to identify accented variants of
- the base character
- .BR 'e'
- in
- .BR file1 ,
- which are stripped of diacritical marks and written to
- .BR file2 .
- .RS 4
- .sp
- .RS 4
- .nf
- tr "[=e=]" "[e*]" <file1 >file2
- .fi
- .P
- .RE
- .RE
- .SH RATIONALE
- In some early proposals, an explicit option
- .BR \-n
- was added to disable the historical behavior of stripping NUL
- characters from the input. It was considered that automatically
- stripping NUL characters from the input was not correct functionality.
- However, the removal of
- .BR \-n
- in a later proposal does not remove the requirement that
- .IR tr
- correctly process NUL characters in its input stream. NUL characters
- can be stripped by using
- .IR tr
- .BR \-d
- \&\(aq\e000\(aq.
- .P
- Historical implementations of
- .IR tr
- differ widely in syntax and behavior. For example, the BSD version does
- not require the bracket characters for the repetition sequence. The
- .IR tr
- utility syntax is based more closely on the System V and XPG3 model
- while attempting to accommodate historical BSD implementations. In the
- case of the short
- .IR string2
- padding, the decision was to unspecify the behavior and preserve System
- V and XPG3 scripts, which might find difficulty with the BSD method.
- The assumption was made that BSD users of
- .IR tr
- have to make accommodations to meet the syntax defined here. Since it
- is possible to use the repetition sequence to duplicate the desired
- behavior, whereas there is no simple way to achieve the System V
- method, this was the correct, if not desirable, approach.
- .P
- The use of octal values to specify control characters, while having
- historical precedents, is not portable. The introduction of escape
- sequences for control characters should provide the necessary
- portability. It is recognized that this may cause some historical
- scripts to break.
- .P
- An early proposal included support for multi-character collating elements.
- It was pointed out that, while
- .IR tr
- does employ some syntactical elements from REs, the aim of
- .IR tr
- is quite different; ranges, for example, do not have a similar meaning
- (``any of the chars in the range matches'', \fIversus\fP ``translate
- each character in the range to the output counterpart''). As a result,
- the previously included support for multi-character collating elements
- has been removed. What remains are ranges in current collation order
- (to support, for example, accented characters), character classes, and
- equivalence classes.
- .P
- In XPG3 the [:\c
- .IR class :]
- and [=\c
- .IR equiv =]
- conventions are shown with double brackets, as in RE syntax. However,
- .IR tr
- does not implement RE principles; it just borrows part of the syntax.
- Consequently, [:\c
- .IR class :]
- and [=\c
- .IR equiv =]
- should be regarded as syntactical elements on a par with [\c
- .IR x *\c
- .IR n ],
- which is not an RE bracket expression.
- .P
- The standard developers will consider changes to
- .IR tr
- that allow it to translate characters between different character
- encodings, or they will consider providing a new utility to accomplish
- this.
- .P
- On historical System V systems, a range expression requires enclosing
- square brackets, such as:
- .sp
- .RS 4
- .nf
- tr \(aq[a-z]\(aq \(aq[A-Z]\(aq
- .fi
- .P
- .RE
- .P
- However, BSD-based systems did not require the brackets, and this
- convention is used here to avoid breaking large numbers of BSD scripts:
- .sp
- .RS 4
- .nf
- tr a-z A-Z
- .fi
- .P
- .RE
- .P
- The preceding System V script will continue to work because the
- brackets, treated as regular characters, are translated to themselves.
- However, any System V script that relied on
- .BR \(dqa\(hyz\(dq
- representing the three characters
- .BR 'a' ,
- .BR '\-' ,
- and
- .BR 'z'
- has to be rewritten as
- .BR \(dqaz-\(dq .
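- .P
- Both points can be checked with a brief sketch such as the following,
- assuming a POSIX shell and the POSIX locale. The variable names and
- sample inputs are hypothetical.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# System V style: the brackets are ordinary characters that map to
# themselves, so the range between them still converts case.
sysv_out=$(printf '[abc]\n' | tr '[a-z]' '[A-Z]')

# To name the three literal characters 'a', 'z', and '-', place the
# <hyphen-minus> last so that it cannot form a range.
literal_out=$(printf 'a-z b\n' | tr 'az-' 'AZ_')

printf '%s\n' "$sysv_out" "$literal_out"
# prints:
# [ABC]
# A_Z b
```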
- .P
- The ISO\ POSIX\(hy2:\|1993 standard had a
- .BR \-c
- option that behaved similarly to the
- .BR \-C
- option, but did not supply functionality equivalent to the
- .BR \-c
- option specified in POSIX.1\(hy2008.
- .P
- The earlier version also said that octal sequences referred to
- collating elements and could be placed adjacent to each other to
- specify multi-byte characters. However, it was noted that this caused
- ambiguities because
- .IR tr
- would not be able to tell whether adjacent octal sequences were
- intending to specify multi-byte characters or multiple single-byte
- characters. POSIX.1\(hy2008 specifies that octal sequences always refer
- to single-byte binary values when used to specify an endpoint of a range of
- collating elements.
- .P
- Earlier versions of this standard allowed for implementations with
- bytes other than eight bits, but this has been modified in this
- version.
- .SH "FUTURE DIRECTIONS"
- None.
- .SH "SEE ALSO"
- .IR "\fIsed\fR\^"
- .P
- The Base Definitions volume of POSIX.1\(hy2017,
- .IR "Table 5-1" ", " "Escape Sequences and Associated Actions",
- .IR "Chapter 8" ", " "Environment Variables",
- .IR "Section 12.2" ", " "Utility Syntax Guidelines"
- .\"
- .SH COPYRIGHT
- Portions of this text are reprinted and reproduced in electronic form
- from IEEE Std 1003.1-2017, Standard for Information Technology
- -- Portable Operating System Interface (POSIX), The Open Group Base
- Specifications Issue 7, 2018 Edition,
- Copyright (C) 2018 by the Institute of
- Electrical and Electronics Engineers, Inc and The Open Group.
- In the event of any discrepancy between this version and the original IEEE and
- The Open Group Standard, the original IEEE and The Open Group Standard
- is the referee document. The original Standard can be obtained online at
- http://www.opengroup.org/unix/online.html .
- .PP
- Any typographical or formatting errors that appear
- in this page are most likely
- to have been introduced during the conversion of the source files to
- man page format. To report such errors, see
- https://www.kernel.org/doc/man-pages/reporting_bugs.html .