join.1 (7285B)
- .\" SPDX-License-Identifier: BSD-3-Clause
 - .\" Copyright (c) 1990, 1993
 - .\" The Regents of the University of California. All rights reserved.
 - .\"
 - .\" This code is derived from software contributed to Berkeley by
 - .\" the Institute of Electrical and Electronics Engineers, Inc.
 - .\"
 - .\" Redistribution and use in source and binary forms, with or without
 - .\" modification, are permitted provided that the following conditions
 - .\" are met:
 - .\" 1. Redistributions of source code must retain the above copyright
 - .\" notice, this list of conditions and the following disclaimer.
 - .\" 2. Redistributions in binary form must reproduce the above copyright
 - .\" notice, this list of conditions and the following disclaimer in the
 - .\" documentation and/or other materials provided with the distribution.
 - .\" 3. Neither the name of the University nor the names of its contributors
 - .\" may be used to endorse or promote products derived from this software
 - .\" without specific prior written permission.
 - .\"
 - .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 - .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 - .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 - .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 - .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 - .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 - .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 - .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 - .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 - .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 - .\" SUCH DAMAGE.
 - .\"
 - .\" @(#)join.1 8.3 (Berkeley) 4/28/95
 - .\"
 - .Dd June 20, 2020
 - .Dt JOIN 1
 - .Os
 - .Sh NAME
 - .Nm join
 - .Nd relational database operator
 - .Sh SYNOPSIS
 - .Nm
 - .Oo
 - .Fl a Ar file_number | Fl v Ar file_number
 - .Oc
 - .Op Fl e Ar string
 - .Op Fl o Ar list
 - .Op Fl t Ar char
 - .Op Fl 1 Ar field
 - .Op Fl 2 Ar field
 - .Ar file1
 - .Ar file2
 - .Sh DESCRIPTION
 - The
 - .Nm
 - utility performs an
 - .Dq equality join
 - on the specified files
 - and writes the result to the standard output.
 - The
 - .Dq join field
 - is the field in each file by which the files are compared.
 - The first field in each line is used by default.
 - There is one line in the output for each pair of lines in
 - .Ar file1
 - and
 - .Ar file2
 - which have identical join fields.
 - Each output line consists of the join field, the remaining fields from
 - .Ar file1
 - and then the remaining fields from
 - .Ar file2 .
 - .Pp
 - The default field separators are tab and space characters.
 - In this case, multiple tabs and spaces count as a single field separator,
 - and leading tabs and spaces are ignored.
 - The default output field separator is a single space character.
 - .Pp
 - Many of the options use file and field numbers.
 - Both file numbers and field numbers are 1 based, i.e., the first file on
 - the command line is file number 1 and the first field is field number 1.
 - The following options are available:
 - .Bl -tag -width indent
 - .It Fl a Ar file_number
 - In addition to the default output, produce a line for each unpairable
 - line in file
 - .Ar file_number .
 - .It Fl e Ar string
 - Replace empty output fields with
 - .Ar string .
 - .It Fl o Ar list
 - The
 - .Fl o
 - option specifies the fields that will be output from each file for
 - each line with matching join fields.
 - Each element of
 - .Ar list
 - has either the form
 - .Ar file_number . Ns Ar field ,
 - where
 - .Ar file_number
 - is a file number and
 - .Ar field
 - is a field number, or the form
 - .Ql 0
 - .Pq zero ,
 - representing the join field.
 - The elements of list must be either comma
 - .Pq Ql \&,
 - or whitespace separated.
 - (The latter requires quoting to protect it from the shell, or, a simpler
 - approach is to use multiple
 - .Fl o
 - options.)
 - .It Fl t Ar char
 - Use character
 - .Ar char
 - as a field delimiter for both input and output.
 - Every occurrence of
 - .Ar char
 - in a line is significant.
 - .It Fl v Ar file_number
 - Do not display the default output, but display a line for each unpairable
 - line in file
 - .Ar file_number .
 - The options
 - .Fl v Cm 1
 - and
 - .Fl v Cm 2
 - may be specified at the same time.
 - .It Fl 1 Ar field
 - Join on the
 - .Ar field Ns 'th
 - field of
 - .Ar file1 .
 - .It Fl 2 Ar field
 - Join on the
 - .Ar field Ns 'th
 - field of
 - .Ar file2 .
 - .El
 - .Pp
 - When the default field delimiter characters are used, the files to be joined
 - should be ordered in the collating sequence of
 - .Xr sort 1 ,
 - using the
 - .Fl b
 - option, on the fields on which they are to be joined, otherwise
 - .Nm
 - may not report all field matches.
 - When the field delimiter characters are specified by the
 - .Fl t
 - option, the collating sequence should be the same as
 - .Xr sort 1
 - without the
 - .Fl b
 - option.
 - .Pp
 - If one of the arguments
 - .Ar file1
 - or
 - .Ar file2
 - is
 - .Sq Fl ,
 - the standard input is used.
 - .Sh EXIT STATUS
 - .Ex -std
 - .Sh EXAMPLES
 - Assuming a file named
 - .Pa nobel_laureates.txt
 - with information about some of the first Nobel Peace Prize laureates:
 - .Bd -literal -offset indent
 - 1901,Jean Henri Dunant,M
 - 1901,Frederic Passy,M
 - 1902,Elie Ducommun,M
 - 1905,Baroness Bertha Sophie Felicita Von Suttner,F
 - 1910,Permanent International Peace Bureau,
 - .Ed
 - .Pp
 - and a second file
 - .Pa nobel_nationalities.txt
 - with their nationalities:
 - .Bd -literal -offset indent
 - Jean Henri Dunant,Switzerland
 - Frederic Passy,France
 - Elie Ducommun,Switzerland
 - Baroness Bertha Sophie Felicita Von Suttner
 - .Ed
 - .Pp
 - Join the two files using the second column from first file and the default first
 - column from second file specifying a custom field delimiter:
 - .Bd -literal -offset indent
 - $ join -t, -1 2 nobel_laureates.txt nobel_nationalities.txt
 - Jean Henri Dunant,1901,M,Switzerland
 - Frederic Passy,1901,M,France
 - Elie Ducommun,1902,M,Switzerland
 - Baroness Bertha Sophie Felicita Von Suttner,1905,F
 - .Ed
 - .Pp
 - Show only the year and the nationality of the laureate using
 - .Ql <<NULL>>
 - to replace empty fields:
 - .Bd -literal -offset indent
 - $ join -e "<<NULL>>" -t, -1 2 -o "1.1 2.2" nobel_laureates.txt nobel_nationalities.txt
 - 1901,Switzerland
 - 1901,France
 - 1902,Switzerland
 - 1905,<<NULL>>
 - .Ed
 - .Pp
 - Show only lines from first file which do not have a match in second file:
 - .Bd -literal -offset indent
 - $ join -v1 -t, -1 2 nobel_laureates.txt nobel_nationalities.txt
 - Permanent International Peace Bureau,1910,
 - .Ed
 - .Pp
 - Assuming a file named
 - .Pa capitals.txt
 - with the following content:
 - .Bd -literal -offset indent
 - Belgium,Brussels
 - France,Paris
 - Italy,Rome
 - Switzerland
 - .Ed
 - .Pp
 - Show the name and capital of the country where the laureate was born.
 - This example uses
 - .Pa nobel_nationalities.txt
 - as a bridge but does not show any information from that file.
 - Also see the note about
 - .Xr sort 1
 - above to understand why we need to sort the intermediate result.
 - .Bd -literal -offset indent
 - $ join -t, -1 2 -o "1.2 2.2" nobel_laureates.txt nobel_nationalities.txt | \e
 - sort -k2 -t, | join -t, -e "<<NULL>>" -1 2 -o "1.1 2.2" - capitals.txt
 - Elie Ducommun,<<NULL>>
 - Jean Henri Dunant,<<NULL>>
 - .Ed
 - .Sh COMPATIBILITY
 - Compatibility with
 - .Fl a
 - and multi-operand
 - .Fl o
 - from historical versions of
 - .Nm
 - got dropped, compatibility with
 - .Fl j Ar field
 - is kept for now.
 - .Sh SEE ALSO
 - .Xr awk 1 ,
 - .Xr comm 1 ,
 - .Xr paste 1 ,
 - .Xr sort 1 ,
 - .Xr uniq 1
 - .Sh STANDARDS
 - .Nm
 - should be compliant with the
 - IEEE Std 1003.1-2024 (“POSIX.1”)
 - specification.