join.1 (7285B)
- .\" SPDX-License-Identifier: BSD-3-Clause
- .\" Copyright (c) 1990, 1993
- .\" The Regents of the University of California. All rights reserved.
- .\"
- .\" This code is derived from software contributed to Berkeley by
- .\" the Institute of Electrical and Electronics Engineers, Inc.
- .\"
- .\" Redistribution and use in source and binary forms, with or without
- .\" modification, are permitted provided that the following conditions
- .\" are met:
- .\" 1. Redistributions of source code must retain the above copyright
- .\" notice, this list of conditions and the following disclaimer.
- .\" 2. Redistributions in binary form must reproduce the above copyright
- .\" notice, this list of conditions and the following disclaimer in the
- .\" documentation and/or other materials provided with the distribution.
- .\" 3. Neither the name of the University nor the names of its contributors
- .\" may be used to endorse or promote products derived from this software
- .\" without specific prior written permission.
- .\"
- .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
- .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
- .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- .\" SUCH DAMAGE.
- .\"
- .\" @(#)join.1 8.3 (Berkeley) 4/28/95
- .\"
- .Dd June 20, 2020
- .Dt JOIN 1
- .Os
- .Sh NAME
- .Nm join
- .Nd relational database operator
- .Sh SYNOPSIS
- .Nm
- .Oo
- .Fl a Ar file_number | Fl v Ar file_number
- .Oc
- .Op Fl e Ar string
- .Op Fl o Ar list
- .Op Fl t Ar char
- .Op Fl 1 Ar field
- .Op Fl 2 Ar field
- .Ar file1
- .Ar file2
- .Sh DESCRIPTION
- The
- .Nm
- utility performs an
- .Dq equality join
- on the specified files
- and writes the result to the standard output.
- The
- .Dq join field
- is the field in each file by which the files are compared.
- The first field in each line is used by default.
- There is one line in the output for each pair of lines in
- .Ar file1
- and
- .Ar file2
- which have identical join fields.
- Each output line consists of the join field, the remaining fields from
- .Ar file1
- and then the remaining fields from
- .Ar file2 .
- .Pp
- The default field separators are tab and space characters.
- In this case, multiple tabs and spaces count as a single field separator,
- and leading tabs and spaces are ignored.
- The default output field separator is a single space character.
- .Pp
- Many of the options use file and field numbers.
- Both file numbers and field numbers are 1 based, i.e., the first file on
- the command line is file number 1 and the first field is field number 1.
- The following options are available:
- .Bl -tag -width indent
- .It Fl a Ar file_number
- In addition to the default output, produce a line for each unpairable
- line in file
- .Ar file_number .
- .It Fl e Ar string
- Replace empty output fields with
- .Ar string .
- .It Fl o Ar list
- The
- .Fl o
- option specifies the fields that will be output from each file for
- each line with matching join fields.
- Each element of
- .Ar list
- has either the form
- .Ar file_number . Ns Ar field ,
- where
- .Ar file_number
- is a file number and
- .Ar field
- is a field number, or the form
- .Ql 0
- .Pq zero ,
- representing the join field.
- The elements of list must be either comma
- .Pq Ql \&,
- or whitespace separated.
- (The latter requires quoting to protect it from the shell, or, a simpler
- approach is to use multiple
- .Fl o
- options.)
- .It Fl t Ar char
- Use character
- .Ar char
- as a field delimiter for both input and output.
- Every occurrence of
- .Ar char
- in a line is significant.
- .It Fl v Ar file_number
- Do not display the default output, but display a line for each unpairable
- line in file
- .Ar file_number .
- The options
- .Fl v Cm 1
- and
- .Fl v Cm 2
- may be specified at the same time.
- .It Fl 1 Ar field
- Join on the
- .Ar field Ns 'th
- field of
- .Ar file1 .
- .It Fl 2 Ar field
- Join on the
- .Ar field Ns 'th
- field of
- .Ar file2 .
- .El
- .Pp
- When the default field delimiter characters are used, the files to be joined
- should be ordered in the collating sequence of
- .Xr sort 1 ,
- using the
- .Fl b
- option, on the fields on which they are to be joined, otherwise
- .Nm
- may not report all field matches.
- When the field delimiter characters are specified by the
- .Fl t
- option, the collating sequence should be the same as
- .Xr sort 1
- without the
- .Fl b
- option.
- .Pp
- If one of the arguments
- .Ar file1
- or
- .Ar file2
- is
- .Sq Fl ,
- the standard input is used.
- .Sh EXIT STATUS
- .Ex -std
- .Sh EXAMPLES
- Assuming a file named
- .Pa nobel_laureates.txt
- with information about some of the first Nobel Peace Prize laureates:
- .Bd -literal -offset indent
- 1901,Jean Henri Dunant,M
- 1901,Frederic Passy,M
- 1902,Elie Ducommun,M
- 1905,Baroness Bertha Sophie Felicita Von Suttner,F
- 1910,Permanent International Peace Bureau,
- .Ed
- .Pp
- and a second file
- .Pa nobel_nationalities.txt
- with their nationalities:
- .Bd -literal -offset indent
- Jean Henri Dunant,Switzerland
- Frederic Passy,France
- Elie Ducommun,Switzerland
- Baroness Bertha Sophie Felicita Von Suttner
- .Ed
- .Pp
- Join the two files using the second column from first file and the default first
- column from second file specifying a custom field delimiter:
- .Bd -literal -offset indent
- $ join -t, -1 2 nobel_laureates.txt nobel_nationalities.txt
- Jean Henri Dunant,1901,M,Switzerland
- Frederic Passy,1901,M,France
- Elie Ducommun,1902,M,Switzerland
- Baroness Bertha Sophie Felicita Von Suttner,1905,F
- .Ed
- .Pp
- Show only the year and the nationality of the laureate using
- .Ql <<NULL>>
- to replace empty fields:
- .Bd -literal -offset indent
- $ join -e "<<NULL>>" -t, -1 2 -o "1.1 2.2" nobel_laureates.txt nobel_nationalities.txt
- 1901,Switzerland
- 1901,France
- 1902,Switzerland
- 1905,<<NULL>>
- .Ed
- .Pp
- Show only lines from first file which do not have a match in second file:
- .Bd -literal -offset indent
- $ join -v1 -t, -1 2 nobel_laureates.txt nobel_nationalities.txt
- Permanent International Peace Bureau,1910,
- .Ed
- .Pp
- Assuming a file named
- .Pa capitals.txt
- with the following content:
- .Bd -literal -offset indent
- Belgium,Brussels
- France,Paris
- Italy,Rome
- Switzerland
- .Ed
- .Pp
- Show the name and capital of the country where the laureate was born.
- This example uses
- .Pa nobel_nationalities.txt
- as a bridge but does not show any information from that file.
- Also see the note about
- .Xr sort 1
- above to understand why we need to sort the intermediate result.
- .Bd -literal -offset indent
- $ join -t, -1 2 -o "1.2 2.2" nobel_laureates.txt nobel_nationalities.txt | \e
- sort -k2 -t, | join -t, -e "<<NULL>>" -1 2 -o "1.1 2.2" - capitals.txt
- Elie Ducommun,<<NULL>>
- Jean Henri Dunant,<<NULL>>
- .Ed
- .Sh COMPATIBILITY
- Compatibility with
- .Fl a
- and multi-operand
- .Fl o
- from historical versions of
- .Nm
- got dropped, compatibility with
- .Fl j Ar field
- is kept for now.
- .Sh SEE ALSO
- .Xr awk 1 ,
- .Xr comm 1 ,
- .Xr paste 1 ,
- .Xr sort 1 ,
- .Xr uniq 1
- .Sh STANDARDS
- .Nm
- should be compliant with the
- IEEE Std 1003.1-2024 (“POSIX.1”)
- specification.