join.1p (13379B)
- '\" et
- .TH JOIN "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
- .\"
- .SH PROLOG
- This manual page is part of the POSIX Programmer's Manual.
- The Linux implementation of this interface may differ (consult
- the corresponding Linux manual page for details of Linux behavior),
- or the interface may not be implemented on Linux.
- .\"
- .SH NAME
- join
- \(em relational database operator
- .SH SYNOPSIS
- .LP
- .nf
- join \fB[\fR-a \fIfile_number\fR|-v \fIfile_number\fB] [\fR-e \fIstring\fB] [\fR-o \fIlist\fB] [\fR-t \fIchar\fB]
- [\fR-1 \fIfield\fB] [\fR-2 \fIfield\fB]\fI file1 file2\fR
- .fi
- .SH DESCRIPTION
- The
- .IR join
- utility shall perform an equality join on the files
- .IR file1
- and
- .IR file2 .
- The joined files shall be written to the standard output.
- .P
- The join field is a field in each file on which the files are
- compared. The
- .IR join
- utility shall write one line in the output for each pair of lines in
- .IR file1
- and
- .IR file2
- that have join fields that collate equally. The output line by default
- shall consist of the join field, then the remaining fields from
- .IR file1 ,
- then the remaining fields from
- .IR file2 .
- This format can be changed by using the
- .BR \-o
- option (see below). The
- .BR \-a
- option can be used to add unmatched lines to the output. The
- .BR \-v
- option can be used to output only unmatched lines.
- .P
- The files
- .IR file1
- and
- .IR file2
- shall be ordered in the collating sequence of
- .IR sort
- .BR \-b
- on the fields on which they shall be joined, by default the first in
- each line. All selected output shall be written in the same collating
- sequence.
- .P
- The default input field separators shall be
- <blank>
- characters. In this case, multiple separators shall count as one field
- separator, and leading separators shall be ignored. The default output
- field separator shall be a
- <space>.
- .P
- The field separator and collating sequence can be changed by using the
- .BR \-t
- option (see below).
- .P
- If the same key appears more than once in either file, all combinations
- of the set of remaining fields in
- .IR file1
- and the set of remaining fields in
- .IR file2
- are output in the order of the lines encountered.
- .P
- If the input files are not in the appropriate collating sequence, the
- results are unspecified.
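- .P
- As a minimal sketch of the ordering requirement above, the following
- shell fragment sorts two inputs on their first field and then joins them.
- The file names, the sample data, and the use of
- .IR mktemp
- (widely available, but not specified by POSIX.1\(hy2017) are assumptions
- made for illustration only.

```shell
# Hypothetical sample data; file names and contents are illustrative only.
# mktemp is assumed to be available (it is not part of POSIX.1-2017).
tmp=$(mktemp -d)
printf '%s\n' 'cherry red' 'apple green' 'banana yellow' > "$tmp/colors"
printf '%s\n' 'banana 3' 'cherry 7' 'apple 5' > "$tmp/counts"

# Order both inputs on the join field (field 1) in the collating sequence
# of "sort -b", using the same locale for sort and for join.
LC_ALL=C sort -b -k 1,1 "$tmp/colors" > "$tmp/colors.sorted"
LC_ALL=C sort -b -k 1,1 "$tmp/counts" > "$tmp/counts.sorted"

# One output line is written per pair of lines whose join fields match.
joined=$(LC_ALL=C join "$tmp/colors.sorted" "$tmp/counts.sorted")
printf '%s\n' "$joined"
rm -rf "$tmp"
```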
- .SH OPTIONS
- The
- .IR join
- utility shall conform to the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 12.2" ", " "Utility Syntax Guidelines".
- .P
- The following options shall be supported:
- .IP "\fB\-a\ \fIfile_number\fR" 10
- .br
- Produce a line for each unpairable line in file
- .IR file_number ,
- where
- .IR file_number
- is 1 or 2, in addition to the default output. If both
- .BR "\-a 1"
- and
- .BR "\-a 2"
- are specified, all unpairable lines shall be output.
- .IP "\fB\-e\ \fIstring\fR" 10
- Replace empty output fields in the list selected by
- .BR \-o
- with the string
- .IR string .
- .IP "\fB\-o\ \fIlist\fR" 10
- Construct the output line to comprise the fields specified in
- .IR list ,
- each element of which shall have one of the following two forms:
- .RS 10
- .IP " 1." 4
- \fIfile_number.field\fR, where
- .IR file_number
- is a file number and
- .IR field
- is a decimal integer field number
- .IP " 2." 4
- 0 (zero), representing the join field
- .P
- The elements of
- .IR list
- shall be either
- <comma>-separated
- or
- <blank>-separated,
- as specified in Guideline 8 of the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 12.2" ", " "Utility Syntax Guidelines".
- The fields specified by
- .IR list
- shall be written for all selected output lines. Fields selected by
- .IR list
- that do not appear in the input shall be treated as empty output
- fields. (See the
- .BR \-e
- option.) Only specifically requested fields shall be written. The
- application shall ensure that
- .IR list
- is a single command line argument.
- .RE
- .IP "\fB\-t\ \fIchar\fR" 10
- Use character
- .IR char
- as a separator, for both input and output. Every appearance of
- .IR char
- in a line shall be significant. When this option is specified, the
- collating sequence shall be the same as
- .IR sort
- without the
- .BR \-b
- option.
- .IP "\fB\-v\ \fIfile_number\fR" 10
- .br
- Instead of the default output, produce a line only for each unpairable
- line in
- .IR file_number ,
- where
- .IR file_number
- is 1 or 2. If both
- .BR "\-v 1"
- and
- .BR "\-v 2"
- are specified, all unpairable lines shall be output.
- .IP "\fB\-1\ \fIfield\fR" 10
- Join on the
- .IR field th
- field of file 1. Fields are decimal integers starting with 1.
- .IP "\fB\-2\ \fIfield\fR" 10
- Join on the
- .IR field th
- field of file 2. Fields are decimal integers starting with 1.
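- .P
- The interaction of the
- .BR \-1 ,
- .BR \-2 ,
- .BR \-a ,
- and
- .BR \-v
- options described above can be sketched as follows. The files
- .IR emp
- and
- .IR dept
- and their contents are hypothetical, and
- .IR mktemp
- is assumed to be available.

```shell
# Hypothetical sample data; mktemp is assumed to be available.
tmp=$(mktemp -d)
# emp is ordered on its second field; dept is ordered on its first field.
printf '%s\n' 'alice d1' 'bob d2' 'carol d4' > "$tmp/emp"
printf '%s\n' 'd1 sales' 'd2 ops' 'd3 legal' > "$tmp/dept"

# -1 2 -2 1: join field 2 of file 1 with field 1 of file 2.
paired=$(LC_ALL=C join -1 2 -2 1 "$tmp/emp" "$tmp/dept")

# -a 1: the paired lines plus the unpairable lines of file 1.
with_left=$(LC_ALL=C join -1 2 -2 1 -a 1 "$tmp/emp" "$tmp/dept")

# -v 2: only the unpairable lines of file 2.
only_right=$(LC_ALL=C join -1 2 -2 1 -v 2 "$tmp/emp" "$tmp/dept")

printf '%s\n' "$paired" '--' "$with_left" '--' "$only_right"
rm -rf "$tmp"
```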
- .SH OPERANDS
- The following operands shall be supported:
- .IP "\fIfile1\fR,\ \fIfile2\fR" 10
- A pathname of a file to be joined. If either of the
- .IR file1
- or
- .IR file2
- operands is
- .BR '\-' ,
- the standard input shall be used in its place.
- .SH STDIN
- The standard input shall be used only if the
- .IR file1
- or
- .IR file2
- operand is
- .BR '\-' .
- See the INPUT FILES section.
- .SH "INPUT FILES"
- The input files shall be text files.
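- .P
- As a sketch of the
- .BR '\-'
- operand, the fragment below sorts unsorted data in a pipeline and joins
- it, as
- .IR file1 ,
- against an already ordered file. The file name and data are hypothetical,
- and
- .IR mktemp
- is assumed to be available.

```shell
# Hypothetical sample data; mktemp is assumed to be available.
tmp=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'c 3' > "$tmp/sorted"

# The "-" operand makes join read file1 from standard input, so the
# unsorted lines can be ordered by sort in the same pipeline.
from_stdin=$(printf '%s\n' 'c z' 'a x' |
    LC_ALL=C sort -b -k 1,1 |
    LC_ALL=C join - "$tmp/sorted")
printf '%s\n' "$from_stdin"
rm -rf "$tmp"
```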
- .SH "ENVIRONMENT VARIABLES"
- The following environment variables shall affect the execution of
- .IR join :
- .IP "\fILANG\fP" 10
- Provide a default value for the internationalization variables that are
- unset or null. (See the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 8.2" ", " "Internationalization Variables"
- for the precedence of internationalization variables used to determine
- the values of locale categories.)
- .IP "\fILC_ALL\fP" 10
- If set to a non-empty string value, override the values of all the
- other internationalization variables.
- .IP "\fILC_COLLATE\fP" 10
- .br
- Determine the locale of the collating sequence
- .IR join
- expects to have been used when the input files were sorted.
- .IP "\fILC_CTYPE\fP" 10
- Determine the locale for the interpretation of sequences of bytes of
- text data as characters (for example, single-byte as opposed to
- multi-byte characters in arguments and input files).
- .IP "\fILC_MESSAGES\fP" 10
- .br
- Determine the locale that should be used to affect the format and
- contents of diagnostic messages written to standard error.
- .IP "\fINLSPATH\fP" 10
- Determine the location of message catalogs for the processing of
- .IR LC_MESSAGES .
- .SH "ASYNCHRONOUS EVENTS"
- Default.
- .SH STDOUT
- The
- .IR join
- utility output shall be a concatenation of selected character fields.
- When the
- .BR \-o
- option is not specified, the output shall be:
- .sp
- .RS 4
- .nf
- "%s%s%s\en", <\fIjoin field\fR>, <\fIother file1 fields\fR>,
- <\fIother file2 fields\fR>
- .fi
- .P
- .RE
- .P
- If the join field is not the first field in a file, the
- <\fIother\ file\ fields\fP> for that file shall be:
- .sp
- .RS 4
- .nf
- <\fIfields preceding join field\fR>, <\fIfields following join field\fR>
- .fi
- .P
- .RE
- .P
- When the
- .BR \-o
- option is specified, the output format shall be:
- .sp
- .RS 4
- .nf
- "%s\en", <\fIconcatenation of fields\fR>
- .fi
- .P
- .RE
- .P
- where the concatenation of fields is described by the
- .BR \-o
- option, above.
- .P
- For either format, each field (except the last) shall be written with
- its trailing separator character. If the separator is the default (\c
- <blank>
- characters), a single
- <space>
- shall be written after each field (except the last).
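- .P
- The two output formats above can be compared with a small sketch in
- which the join field is the second field of
- .IR file1 .
- The file names
- .IR f1
- and
- .IR f2
- and their contents are hypothetical, and
- .IR mktemp
- is assumed to be available.

```shell
# Hypothetical sample data; mktemp is assumed to be available.
tmp=$(mktemp -d)
printf '%s\n' 'x1 k1 y1' 'x2 k2 y2' > "$tmp/f1"
printf '%s\n' 'k1 z1' 'k2 z2' > "$tmp/f2"

# Default format: the join field, then the file1 fields that precede and
# follow it, then the remaining file2 fields, separated by single spaces.
default_out=$(LC_ALL=C join -1 2 "$tmp/f1" "$tmp/f2")

# With -o, only the listed fields are written, in the listed order.
listed_out=$(LC_ALL=C join -1 2 -o 2.2,0,1.1 "$tmp/f1" "$tmp/f2")

printf '%s\n' "$default_out" '--' "$listed_out"
rm -rf "$tmp"
```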
- .SH STDERR
- The standard error shall be used only for diagnostic messages.
- .SH "OUTPUT FILES"
- None.
- .SH "EXTENDED DESCRIPTION"
- None.
- .SH "EXIT STATUS"
- The following exit values shall be returned:
- .IP "\00" 6
- All input files were output successfully.
- .IP >0 6
- An error occurred.
- .SH "CONSEQUENCES OF ERRORS"
- Default.
- .LP
- .IR "The following sections are informative."
- .SH "APPLICATION USAGE"
- Pathnames consisting of numeric digits or of the form
- .IR string.string
- should not be specified directly following the
- .BR \-o
- list.
- .P
- If the collating sequence of the current locale does not have a total
- ordering of all characters (see the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 7.3.2" ", " "LC_COLLATE"),
- .IR join
- treats fields that collate equally but are not identical as being the
- same. If this behavior is not desired, it can be avoided by forcing
- the use of the POSIX locale (although this means re-sorting the input
- files into the POSIX locale collating sequence).
- .P
- When using
- .IR join
- to process pathnames, it is recommended that LC_ALL, or at least
- LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment,
- since pathnames can contain byte sequences that do not form valid
- characters in some locales, in which case the utility's behavior would
- be undefined. In the POSIX locale each byte is a valid single-byte
- character, and therefore this problem is avoided.
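- .P
- Forcing the POSIX locale, as suggested above, might look like the
- following sketch, which re-sorts both inputs and joins them with
- LC_ALL set to C for every command. The file names and data are
- hypothetical, and
- .IR mktemp
- is assumed to be available.

```shell
# Hypothetical sample data; mktemp is assumed to be available.
tmp=$(mktemp -d)
printf '%s\n' 'a 1' 'B 2' > "$tmp/left"
printf '%s\n' 'a x' 'B y' > "$tmp/right"

# Re-sort both inputs in the POSIX (C) locale, where collation is by byte
# value and uppercase ASCII letters therefore sort before lowercase ones.
LC_ALL=C sort -b -k 1,1 "$tmp/left" > "$tmp/left.sorted"
LC_ALL=C sort -b -k 1,1 "$tmp/right" > "$tmp/right.sorted"

# Run join in the same locale so it expects the same collating sequence.
posix_joined=$(LC_ALL=C join "$tmp/left.sorted" "$tmp/right.sorted")
printf '%s\n' "$posix_joined"
rm -rf "$tmp"
```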
- .SH EXAMPLES
- The
- .BR \-o
- 0 field essentially selects the union of the join fields. For example,
- given file
- .BR phone :
- .sp
- .RS 4
- .nf
- !Name           Phone Number
- Don             +1 123-456-7890
- Hal             +1 234-567-8901
- Yasushi         +2 345-678-9012
- .fi
- .P
- .RE
- .P
- and file
- .BR fax :
- .sp
- .RS 4
- .nf
- !Name           Fax Number
- Don             +1 123-456-7899
- Keith           +1 456-789-0122
- Yasushi         +2 345-678-9011
- .fi
- .P
- .RE
- .P
- (where the large expanses of white space are meant to each represent a
- single
- <tab>),
- the command:
- .sp
- .RS 4
- .nf
- join -t "<tab>" -a 1 -a 2 -e \(aq(unknown)\(aq -o 0,1.2,2.2 phone fax
- .fi
- .P
- .RE
- .P
- (where
- .IR <tab>
- is a literal
- <tab>
- character) would produce:
- .sp
- .RS 4
- .nf
- !Name           Phone Number            Fax Number
- Don             +1 123-456-7890         +1 123-456-7899
- Hal             +1 234-567-8901         (unknown)
- Keith           (unknown)               +1 456-789-0122
- Yasushi         +2 345-678-9012         +2 345-678-9011
- .fi
- .P
- .RE
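- .P
- The phone and fax example above can be reproduced with a short script
- that writes real
- <tab>
- characters using
- .IR printf .
- The temporary directory and the use of
- .IR mktemp
- are assumptions made for illustration; the data is taken from the
- example.

```shell
# mktemp is assumed to be available; the data comes from the example above.
tmp=$(mktemp -d)
tab=$(printf '\t')

# printf reuses its format for each pair of arguments, writing one
# <tab>-separated line per pair.
printf '%s\t%s\n' '!Name' 'Phone Number' \
    'Don' '+1 123-456-7890' \
    'Hal' '+1 234-567-8901' \
    'Yasushi' '+2 345-678-9012' > "$tmp/phone"
printf '%s\t%s\n' '!Name' 'Fax Number' \
    'Don' '+1 123-456-7899' \
    'Keith' '+1 456-789-0122' \
    'Yasushi' '+2 345-678-9011' > "$tmp/fax"

# Full outer join: -a 1 -a 2 keeps unpairable lines from both files, and
# -e fills the fields that -o selects but the input does not provide.
directory=$(LC_ALL=C join -t "$tab" -a 1 -a 2 -e '(unknown)' \
    -o 0,1.2,2.2 "$tmp/phone" "$tmp/fax")
printf '%s\n' "$directory"
rm -rf "$tmp"
```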
- .P
- Multiple instances of the same key will produce combinatorial results.
- The following:
- .sp
- .RS 4
- .nf
- fa:
-     a x
-     a y
-     a z
- fb:
-     a p
- .fi
- .P
- .RE
- .P
- will produce:
- .sp
- .RS 4
- .nf
- a x p
- a y p
- a z p
- .fi
- .P
- .RE
- .P
- And the following:
- .sp
- .RS 4
- .nf
- fa:
-     a b c
-     a d e
- fb:
-     a w x
-     a y z
-     a o p
- .fi
- .P
- .RE
- .P
- will produce:
- .sp
- .RS 4
- .nf
- a b c w x
- a b c y z
- a b c o p
- a d e w x
- a d e y z
- a d e o p
- .fi
- .P
- .RE
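- .P
- The combinatorial case above can be checked with a sketch such as the
- following, which recreates the files
- .IR fa
- and
- .IR fb
- from the last example in a temporary directory. The use of
- .IR mktemp
- is an assumption made for illustration.

```shell
# mktemp is assumed to be available; the data comes from the example above.
tmp=$(mktemp -d)
printf '%s\n' 'a b c' 'a d e' > "$tmp/fa"
printf '%s\n' 'a w x' 'a y z' 'a o p' > "$tmp/fb"

# Every line of fa pairs with every line of fb that has the same key,
# so 2 lines times 3 lines gives 6 output lines.
combos=$(LC_ALL=C join "$tmp/fa" "$tmp/fb")
printf '%s\n' "$combos"
rm -rf "$tmp"
```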
- .SH RATIONALE
- The
- .BR \-e
- option is only effective when used with
- .BR \-o
- because, unless specific fields are identified using
- .BR \-o ,
- .IR join
- is not aware of what fields might be empty. The exception to this is
- the join field, but identifying an empty join field with the
- .BR \-e
- string is not historical practice and some scripts might break if this
- were changed.
- .P
- The 0 field in the
- .BR \-o
- list was adopted from the Tenth Edition version of
- .IR join
- to satisfy international objections that the
- .IR join
- in the base documents for IEEE\ Std 1003.2\(hy1992 did not support the ``full join''
- or ``outer join'' described in relational database literature.
- Although it has been possible to include a join field in the
- output (by default, or by field number using
- .BR \-o ),
- the join field could not be included for an unpaired line selected by
- .BR \-a .
- The
- .BR \-o
- 0 field essentially selects the union of the join fields.
- .P
- This sort of outer join was not possible with the
- .IR join
- commands in the base documents for IEEE\ Std 1003.2\(hy1992. The
- .BR \-o
- 0 field was chosen because it is an upwards-compatible change for
- applications. An alternative was considered: have the join field
- represent the union of the fields in the files (where they are
- identical for matched lines, and one or both are null for unmatched
- lines). This was not adopted because it would break some historical
- applications.
- .P
- The ability to specify
- .IR file2
- as
- .BR \-
- is not historical practice; it was added for completeness.
- .P
- The
- .BR \-v
- option is not historical practice, but was considered necessary because
- it permitted the writing of
- .IR only
- those lines that do not match on the join field, as opposed to the
- .BR \-a
- option, which prints both lines that do and do not match. This
- additional facility is parallel with the
- .BR \-v
- option of
- .IR grep .
- .P
- Some historical implementations have been encountered where a blank
- line in one of the input files was considered to be the end of the
- file; the description in this volume of POSIX.1\(hy2017 does not cite this as an allowable case.
- .P
- Earlier versions of this standard allowed
- .BR \-j ,
- .BR \-j1 ,
- .BR \-j2
- options, and a form of the
- .BR \-o
- option that allowed the
- .IR list
- option-argument to be multiple arguments. These forms are no longer
- specified by POSIX.1\(hy2008 but may be present in some implementations.
- .SH "FUTURE DIRECTIONS"
- None.
- .SH "SEE ALSO"
- .IR "\fIawk\fR\^",
- .IR "\fIcomm\fR\^",
- .IR "\fIsort\fR\^",
- .IR "\fIuniq\fR\^"
- .P
- The Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 7.3.2" ", " "LC_COLLATE",
- .IR "Chapter 8" ", " "Environment Variables",
- .IR "Section 12.2" ", " "Utility Syntax Guidelines"
- .\"
- .SH COPYRIGHT
- Portions of this text are reprinted and reproduced in electronic form
- from IEEE Std 1003.1-2017, Standard for Information Technology
- -- Portable Operating System Interface (POSIX), The Open Group Base
- Specifications Issue 7, 2018 Edition,
- Copyright (C) 2018 by the Institute of
- Electrical and Electronics Engineers, Inc and The Open Group.
- In the event of any discrepancy between this version and the original IEEE and
- The Open Group Standard, the original IEEE and The Open Group Standard
- is the referee document. The original Standard can be obtained online at
- http://www.opengroup.org/unix/online.html .
- .PP
- Any typographical or formatting errors that appear
- in this page are most likely
- to have been introduced during the conversion of the source files to
- man page format. To report such errors, see
- https://www.kernel.org/doc/man-pages/reporting_bugs.html .