oasis-root

Compiled tree of Oasis Linux based on own branch at <https://hacktivis.me/git/oasis/> git clone https://anongit.hacktivis.me/git/oasis-root.git

join.1p (13379B)


  1. '\" et
  2. .TH JOIN "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
  3. .\"
  4. .SH PROLOG
  5. This manual page is part of the POSIX Programmer's Manual.
  6. The Linux implementation of this interface may differ (consult
  7. the corresponding Linux manual page for details of Linux behavior),
  8. or the interface may not be implemented on Linux.
  9. .\"
  10. .SH NAME
  11. join
  12. \(em relational database operator
  13. .SH SYNOPSIS
  14. .LP
  15. .nf
  16. join \fB[\fR-a \fIfile_number\fR|-v \fIfile_number\fB] [\fR-e \fIstring\fB] [\fR-o \fIlist\fB] [\fR-t \fIchar\fB]
  17. [\fR-1 \fIfield\fB] [\fR-2 \fIfield\fB]\fI file1 file2\fR
  18. .fi
  19. .SH DESCRIPTION
  20. The
  21. .IR join
  22. utility shall perform an equality join on the files
  23. .IR file1
  24. and
  25. .IR file2 .
  26. The joined files shall be written to the standard output.
  27. .P
  28. The join field is a field in each file on which the files are
  29. compared. The
  30. .IR join
  31. utility shall write one line in the output for each pair of lines in
  32. .IR file1
  33. and
  34. .IR file2
  35. that have join fields that collate equally. The output line by default
  36. shall consist of the join field, then the remaining fields from
  37. .IR file1 ,
  38. then the remaining fields from
  39. .IR file2 .
  40. This format can be changed by using the
  41. .BR \-o
  42. option (see below). The
  43. .BR \-a
  44. option can be used to add unmatched lines to the output. The
  45. .BR \-v
  46. option can be used to output only unmatched lines.
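The default output and the effect of -a and -v can be tried with a minimal sketch like the one below. The file names (ages, cities) and their contents are hypothetical sample data, and mktemp -d is assumed to be available (it is common but not required by POSIX).

```shell
# Hypothetical sample data; both files are already sorted on field 1.
dir=$(mktemp -d)
printf '%s\n' 'alice 30' 'bob 25' 'carol 41' > "$dir/ages"
printf '%s\n' 'alice paris' 'carol oslo' 'dave rome' > "$dir/cities"

# Default output: join field, remaining file1 fields, remaining file2 fields.
matched=$(join "$dir/ages" "$dir/cities")

# -a 1 adds the unpairable lines of file1 to the default output.
with_unmatched=$(join -a 1 "$dir/ages" "$dir/cities")

# -v 2 writes only the unpairable lines of file2.
only_unmatched=$(join -v 2 "$dir/ages" "$dir/cities")

printf '%s\n' "$matched" '--' "$with_unmatched" '--' "$only_unmatched"
rm -r "$dir"
```

Here matched should hold the lines for alice and carol, with_unmatched should add "bob 25", and only_unmatched should be "dave rome".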
  47. .P
  48. The files
  49. .IR file1
  50. and
  51. .IR file2
  52. shall be ordered in the collating sequence of
  53. .IR sort
  54. .BR \-b
  55. on the fields on which they shall be joined, by default the first in
  56. each line. All selected output shall be written in the same collating
  57. sequence.
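One way to meet the ordering requirement above is to sort each input on its join field first. The sketch below assumes the default join field (field 1) and uses hypothetical files named stock and colors; mktemp -d is assumed to be available.

```shell
dir=$(mktemp -d)
# Hypothetical unsorted inputs.
printf '%s\n' 'pear 4' 'apple 3' 'fig 9' > "$dir/stock"
printf '%s\n' 'fig purple' 'pear green' 'apple red' > "$dir/colors"

# Sort both files on field 1, ignoring leading blanks (sort -b),
# so that they are in the order join expects.
sort -b -k 1,1 "$dir/stock" > "$dir/stock.sorted"
sort -b -k 1,1 "$dir/colors" > "$dir/colors.sorted"

joined=$(join "$dir/stock.sorted" "$dir/colors.sorted")
printf '%s\n' "$joined"
rm -r "$dir"
```

Both sort and join should run under the same locale settings so that they agree on the collating sequence.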
  58. .P
  59. The default input field separators shall be
  60. <blank>
  61. characters. In this case, multiple separators shall count as one field
  62. separator, and leading separators shall be ignored. The default output
  63. field separator shall be a
  64. <space>.
  65. .P
  66. The field separator and collating sequence can be changed by using the
  67. .BR \-t
  68. option (see below).
  69. .P
  70. If the same key appears more than once in either file, all combinations
  71. of the set of remaining fields in
  72. .IR file1
  73. and the set of remaining fields in
  74. .IR file2
  75. are output in the order of the lines encountered.
  76. .P
  77. If the input files are not in the appropriate collating sequence, the
  78. results are unspecified.
  79. .SH OPTIONS
  80. The
  81. .IR join
  82. utility shall conform to the Base Definitions volume of POSIX.1\(hy2017,
  83. .IR "Section 12.2" ", " "Utility Syntax Guidelines".
  84. .P
  85. The following options shall be supported:
  86. .IP "\fB\-a\ \fIfile_number\fR" 10
  87. .br
  88. Produce a line for each unpairable line in file
  89. .IR file_number ,
  90. where
  91. .IR file_number
  92. is 1 or 2, in addition to the default output. If both
  93. .BR \-a " 1"
  94. and
  95. .BR \-a " 2"
  96. are specified, all unpairable lines shall be output.
  97. .IP "\fB\-e\ \fIstring\fR" 10
  98. Replace empty output fields in the list selected by
  99. .BR \-o
  100. with the string
  101. .IR string .
  102. .IP "\fB\-o\ \fIlist\fR" 10
  103. Construct the output line to comprise the fields specified in
  104. .IR list ,
  105. each element of which shall have one of the following two forms:
  106. .RS 10
  107. .IP " 1." 4
  108. \fIfile_number.field\fR, where
  109. .IR file_number
  110. is a file number and
  111. .IR field
  112. is a decimal integer field number
  113. .IP " 2." 4
  114. 0 (zero), representing the join field
  115. .P
  116. The elements of
  117. .IR list
  118. shall be either
  119. <comma>-separated
  120. or
  121. <blank>-separated,
  122. as specified in Guideline 8 of the Base Definitions volume of POSIX.1\(hy2017,
  123. .IR "Section 12.2" ", " "Utility Syntax Guidelines".
  124. The fields specified by
  125. .IR list
  126. shall be written for all selected output lines. Fields selected by
  127. .IR list
  128. that do not appear in the input shall be treated as empty output
  129. fields. (See the
  130. .BR \-e
  131. option.) Only specifically requested fields shall be written. The
  132. application shall ensure that
  133. .IR list
  134. is a single command line argument.
  135. .RE
  136. .IP "\fB\-t\ \fIchar\fR" 10
  137. Use character
  138. .IR char
  139. as a separator, for both input and output. Every appearance of
  140. .IR char
  141. in a line shall be significant. When this option is specified, the
  142. collating sequence shall be the same as
  143. .IR sort
  144. without the
  145. .BR \-b
  146. option.
  147. .IP "\fB\-v\ \fIfile_number\fR" 10
  148. .br
  149. Instead of the default output, produce a line only for each unpairable
  150. line in
  151. .IR file_number ,
  152. where
  153. .IR file_number
  154. is 1 or 2. If both
  155. .BR \-v " 1"
  156. and
  157. .BR \-v " 2"
  158. are specified, all unpairable lines shall be output.
  159. .IP "\fB\-1\ \fIfield\fR" 10
  160. Join on the
  161. .IR field th
  162. field of file 1. Fields are decimal integers starting with 1.
  163. .IP "\fB\-2\ \fIfield\fR" 10
  164. Join on the
  165. .IR field th
  166. field of file 2. Fields are decimal integers starting with 1.
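The options above can be combined. The following sketch joins two colon-separated files on different fields and selects the output fields with -o; the file names (users, groups), their layout, and the '(none)' placeholder are hypothetical, and mktemp -d is assumed to be available.

```shell
dir=$(mktemp -d)
# users:  name:uid:gid   (sorted on field 3, the gid)
# groups: gid:groupname  (sorted on field 1)
printf '%s\n' 'ann:1001:100' 'bob:1002:100' 'cy:1003:200' > "$dir/users"
printf '%s\n' '100:staff' '200:guest' '300:empty' > "$dir/groups"

# Join field 3 of file1 with field 1 of file2, using ':' as the separator
# for input and output. The -o list is passed as a single argument.
picked=$(join -t : -1 3 -2 1 -o 1.1,0,2.2 "$dir/users" "$dir/groups")

# Also include unpairable lines of file2; -e fills the empty 1.1 field.
filled=$(join -t : -1 3 -2 1 -a 2 -e '(none)' -o 1.1,0,2.2 \
    "$dir/users" "$dir/groups")

printf '%s\n' "$picked" '--' "$filled"
rm -r "$dir"
```

The second command should end with the line "(none):300:empty", because group 300 has no matching user and field 1.1 is therefore an empty output field.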
  167. .SH OPERANDS
  168. The following operands shall be supported:
  169. .IP "\fIfile1\fR,\ \fIfile2\fR" 10
  170. A pathname of a file to be joined. If either of the
  171. .IR file1
  172. or
  173. .IR file2
  174. operands is
  175. .BR '\-' ,
  176. the standard input shall be used in its place.
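As an illustration of the '-' operand, the sketch below filters one input through grep and feeds the result to join on standard input. The files (colors, counts) and the '#' comment convention are hypothetical; mktemp -d is assumed to be available.

```shell
dir=$(mktemp -d)
# Hypothetical data: colors carries a comment line that must not be joined.
printf '%s\n' '# fruit color' 'apple red' 'kiwi green' > "$dir/colors"
printf '%s\n' 'apple 3' 'kiwi 7' > "$dir/counts"

# '-' as file1 makes join read the filtered lines from standard input.
from_stdin=$(grep -v '^#' "$dir/colors" | join - "$dir/counts")

printf '%s\n' "$from_stdin"
rm -r "$dir"
```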
  177. .SH STDIN
  178. The standard input shall be used only if the
  179. .IR file1
  180. or
  181. .IR file2
  182. operand is
  183. .BR '\-' .
  184. See the INPUT FILES section.
  185. .SH "INPUT FILES"
  186. The input files shall be text files.
  187. .SH "ENVIRONMENT VARIABLES"
  188. The following environment variables shall affect the execution of
  189. .IR join :
  190. .IP "\fILANG\fP" 10
  191. Provide a default value for the internationalization variables that are
  192. unset or null. (See the Base Definitions volume of POSIX.1\(hy2017,
  193. .IR "Section 8.2" ", " "Internationalization Variables"
  194. for the precedence of internationalization variables used to determine
  195. the values of locale categories.)
  196. .IP "\fILC_ALL\fP" 10
  197. If set to a non-empty string value, override the values of all the
  198. other internationalization variables.
  199. .IP "\fILC_COLLATE\fP" 10
  200. .br
  201. Determine the locale of the collating sequence
  202. .IR join
  203. expects to have been used when the input files were sorted.
  204. .IP "\fILC_CTYPE\fP" 10
  205. Determine the locale for the interpretation of sequences of bytes of
  206. text data as characters (for example, single-byte as opposed to
  207. multi-byte characters in arguments and input files).
  208. .IP "\fILC_MESSAGES\fP" 10
  209. .br
  210. Determine the locale that should be used to affect the format and
  211. contents of diagnostic messages written to standard error.
  212. .IP "\fINLSPATH\fP" 10
  213. Determine the location of message catalogs for the processing of
  214. .IR LC_MESSAGES .
  215. .SH "ASYNCHRONOUS EVENTS"
  216. Default.
  217. .SH STDOUT
  218. The
  219. .IR join
  220. utility output shall be a concatenation of selected character fields.
  221. When the
  222. .BR \-o
  223. option is not specified, the output shall be:
  224. .sp
  225. .RS 4
  226. .nf
  227. "%s%s%s\en", <\fIjoin field\fR>, <\fIother file1 fields\fR>,
  228. <\fIother file2 fields\fR>
  229. .fi
  230. .P
  231. .RE
  232. .P
  233. If the join field is not the first field in a file, the
  234. <\fIother\ file\ fields\fP> for that file shall be:
  235. .sp
  236. .RS 4
  237. .nf
  238. <\fIfields preceding join field\fR>, <\fIfields following join field\fR>
  239. .fi
  240. .P
  241. .RE
  242. .P
  243. When the
  244. .BR \-o
  245. option is specified, the output format shall be:
  246. .sp
  247. .RS 4
  248. .nf
  249. "%s\en", <\fIconcatenation of fields\fR>
  250. .fi
  251. .P
  252. .RE
  253. .P
  254. where the concatenation of fields is described by the
  255. .BR \-o
  256. option, above.
  257. .P
  258. For either format, each field (except the last) shall be written with
  259. its trailing separator character. If the separator is the default (\c
  260. <blank>
  261. characters), a single
  262. <space>
  263. shall be written after each field (except the last).
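The field order described above, for the case where the join field is not the first field, can be observed with a small sketch. The files (people, countries) are hypothetical sample data and mktemp -d is assumed to be available.

```shell
dir=$(mktemp -d)
# people:    name city year   (sorted on field 2, the city)
# countries: city country     (sorted on field 1)
printf '%s\n' 'ann oslo 1990' 'bob rome 1985' > "$dir/people"
printf '%s\n' 'oslo norway' 'rome italy' > "$dir/countries"

# No -o option: the join field comes first, then the file1 fields that
# preceded and followed it, then the remaining file2 fields.
reordered=$(join -1 2 "$dir/people" "$dir/countries")

printf '%s\n' "$reordered"
rm -r "$dir"
```

The first output line should be "oslo ann 1990 norway": the join field, the file1 field before it, the file1 field after it, and then the rest of file2.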
  264. .SH STDERR
  265. The standard error shall be used only for diagnostic messages.
  266. .SH "OUTPUT FILES"
  267. None.
  268. .SH "EXTENDED DESCRIPTION"
  269. None.
  270. .SH "EXIT STATUS"
  271. The following exit values shall be returned:
  272. .IP "\00" 6
  273. All input files were output successfully.
  274. .IP >0 6
  275. An error occurred.
  276. .SH "CONSEQUENCES OF ERRORS"
  277. Default.
  278. .LP
  279. .IR "The following sections are informative."
  280. .SH "APPLICATION USAGE"
  281. Pathnames consisting of numeric digits or of the form
  282. .IR string.string
  283. should not be specified directly following the
  284. .BR \-o
  285. list.
  286. .P
  287. If the collating sequence of the current locale does not have a total
  288. ordering of all characters (see the Base Definitions volume of POSIX.1\(hy2017,
  289. .IR "Section 7.3.2" ", " "LC_COLLATE"),
  290. .IR join
  291. treats fields that collate equally but are not identical as being the
  292. same. If this behavior is not desired, it can be avoided by forcing
  293. the use of the POSIX locale (although this means re-sorting the input
  294. files into the POSIX locale collating sequence).
  295. .P
  296. When using
  297. .IR join
  298. to process pathnames, it is recommended that LC_ALL, or at least
  299. LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment,
  300. since pathnames can contain byte sequences that do not form valid
  301. characters in some locales, in which case the utility's behavior would
  302. be undefined. In the POSIX locale each byte is a valid single-byte
  303. character, and therefore this problem is avoided.
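A minimal sketch of the recommendation above: sort and join a list of pathnames with LC_ALL set to C for every command involved. The pathnames, the file names (sizes, owners), and the second column are hypothetical; mktemp -d is assumed to be available.

```shell
dir=$(mktemp -d)
# Hypothetical pathname lists; the keys mix case and punctuation, which
# can collate differently from byte order in other locales.
printf '%s\n' '/srv/a.txt 120' '/srv/B.txt 64' '/srv/a-1.txt 8' > "$dir/sizes"
printf '%s\n' '/srv/B.txt root' '/srv/a-1.txt ann' '/srv/a.txt bob' > "$dir/owners"

# Use the same locale for sorting and joining.
LC_ALL=C sort -b -k 1,1 "$dir/sizes" > "$dir/sizes.c"
LC_ALL=C sort -b -k 1,1 "$dir/owners" > "$dir/owners.c"
by_path=$(LC_ALL=C join "$dir/sizes.c" "$dir/owners.c")

printf '%s\n' "$by_path"
rm -r "$dir"
```

In the C locale the keys are compared byte by byte, so /srv/B.txt sorts before /srv/a-1.txt, which sorts before /srv/a.txt.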
  304. .SH EXAMPLES
  305. The
  306. .BR \-o
  307. 0 field essentially selects the union of the join fields. For example,
  308. given file
  309. .BR phone :
  310. .sp
  311. .RS 4
  312. .nf
  313. !Name           Phone Number
  314. Don             +1 123-456-7890
  315. Hal             +1 234-567-8901
  316. Yasushi         +2 345-678-9012
  317. .fi
  318. .P
  319. .RE
  320. .P
  321. and file
  322. .BR fax :
  323. .sp
  324. .RS 4
  325. .nf
  326. !Name           Fax Number
  327. Don             +1 123-456-7899
  328. Keith           +1 456-789-0122
  329. Yasushi         +2 345-678-9011
  330. .fi
  331. .P
  332. .RE
  333. .P
  334. (where the large expanses of white space are meant to each represent a
  335. single
  336. <tab>),
  337. the command:
  338. .sp
  339. .RS 4
  340. .nf
  341. join -t "<tab>" -a 1 -a 2 -e \(aq(unknown)\(aq -o 0,1.2,2.2 phone fax
  342. .fi
  343. .P
  344. .RE
  345. .P
  346. (where
  347. .IR <tab>
  348. is a literal
  349. <tab>
  350. character) would produce:
  351. .sp
  352. .RS 4
  353. .nf
  354. !Name           Phone Number            Fax Number
  355. Don             +1 123-456-7890         +1 123-456-7899
  356. Hal             +1 234-567-8901         (unknown)
  357. Keith           (unknown)               +1 456-789-0122
  358. Yasushi         +2 345-678-9012         +2 345-678-9011
  359. .fi
  360. .P
  361. .RE
  362. .P
  363. Multiple instances of the same key will produce combinatorial results.
  364. The following:
  365. .sp
  366. .RS 4
  367. .nf
  368. fa:
  369. a x
  370. a y
  371. a z
  372. fb:
  373. a p
  374. .fi
  375. .P
  376. .RE
  377. .P
  378. will produce:
  379. .sp
  380. .RS 4
  381. .nf
  382. a x p
  383. a y p
  384. a z p
  385. .fi
  386. .P
  387. .RE
  388. .P
  389. And the following:
  390. .sp
  391. .RS 4
  392. .nf
  393. fa:
  394. a b c
  395. a d e
  396. fb:
  397. a w x
  398. a y z
  399. a o p
  400. .fi
  401. .P
  402. .RE
  403. .P
  404. will produce:
  405. .sp
  406. .RS 4
  407. .nf
  408. a b c w x
  409. a b c y z
  410. a b c o p
  411. a d e w x
  412. a d e y z
  413. a d e o p
  414. .fi
  415. .P
  416. .RE
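The second combinatorial example above can be reproduced with a short sketch. It recreates the fa and fb files in a temporary directory (mktemp -d is assumed to be available) and runs join with no options.

```shell
dir=$(mktemp -d)
# The same data as in the example: every line has the key "a".
printf '%s\n' 'a b c' 'a d e' > "$dir/fa"
printf '%s\n' 'a w x' 'a y z' 'a o p' > "$dir/fb"

# Each of the 2 lines in fa pairs with each of the 3 lines in fb.
pairs=$(join "$dir/fa" "$dir/fb")

printf '%s\n' "$pairs"
rm -r "$dir"
```

The result should be the six lines listed above, in the order in which the input lines were encountered.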
  417. .SH RATIONALE
  418. The
  419. .BR \-e
  420. option is only effective when used with
  421. .BR \-o
  422. because, unless specific fields are identified using
  423. .BR \-o ,
  424. .IR join
  425. is not aware of what fields might be empty. The exception to this is
  426. the join field, but identifying an empty join field with the
  427. .BR \-e
  428. string is not historical practice and some scripts might break if this
  429. were changed.
  430. .P
  431. The 0 field in the
  432. .BR \-o
  433. list was adopted from the Tenth Edition version of
  434. .IR join
  435. to satisfy international objections that the
  436. .IR join
  437. in the base documents for IEEE\ Std 1003.2\(hy1992 did not support the ``full join''
  438. or ``outer join'' described in relational database literature.
  439. Although it has been possible to include a join field in the
  440. output (by default, or by field number using
  441. .BR \-o ),
  442. the join field could not be included for an unpaired line selected by
  443. .BR \-a .
  444. The
  445. .BR \-o
  446. 0 field essentially selects the union of the join fields.
  447. .P
  448. This sort of outer join was not possible with the
  449. .IR join
  450. commands in the base documents for IEEE\ Std 1003.2\(hy1992. The
  451. .BR \-o
  452. 0 field was chosen because it is an upwards-compatible change for
  453. applications. An alternative was considered: have the join field
  454. represent the union of the fields in the files (where they are
  455. identical for matched lines, and one or both are null for unmatched
  456. lines). This was not adopted because it would break some historical
  457. applications.
  458. .P
  459. The ability to specify
  460. .IR file2
  461. as
  462. .BR \-
  463. is not historical practice; it was added for completeness.
  464. .P
  465. The
  466. .BR \-v
  467. option is not historical practice, but was considered necessary because
  468. it permitted the writing of
  469. .IR only
  470. those lines that do not match on the join field, as opposed to the
  471. .BR \-a
  472. option, which prints both lines that do and do not match. This
  473. additional facility is parallel with the
  474. .BR \-v
  475. option of
  476. .IR grep .
  477. .P
  478. Some historical implementations have been encountered where a blank
  479. line in one of the input files was considered to be the end of the
  480. file; the description in this volume of POSIX.1\(hy2017 does not cite this as an allowable case.
  481. .P
  482. Earlier versions of this standard allowed
  483. .BR \-j ,
  484. .BR \-j1 ,
  485. .BR \-j2
  486. options, and a form of the
  487. .BR \-o
  488. option that allowed the
  489. .IR list
  490. option-argument to be multiple arguments. These forms are no longer
  491. specified by POSIX.1\(hy2008 but may be present in some implementations.
  492. .SH "FUTURE DIRECTIONS"
  493. None.
  494. .SH "SEE ALSO"
  495. .IR "\fIawk\fR\^",
  496. .IR "\fIcomm\fR\^",
  497. .IR "\fIsort\fR\^",
  498. .IR "\fIuniq\fR\^"
  499. .P
  500. The Base Definitions volume of POSIX.1\(hy2017,
  501. .IR "Section 7.3.2" ", " "LC_COLLATE",
  502. .IR "Chapter 8" ", " "Environment Variables",
  503. .IR "Section 12.2" ", " "Utility Syntax Guidelines"
  504. .\"
  505. .SH COPYRIGHT
  506. Portions of this text are reprinted and reproduced in electronic form
  507. from IEEE Std 1003.1-2017, Standard for Information Technology
  508. -- Portable Operating System Interface (POSIX), The Open Group Base
  509. Specifications Issue 7, 2018 Edition,
  510. Copyright (C) 2018 by the Institute of
  511. Electrical and Electronics Engineers, Inc and The Open Group.
  512. In the event of any discrepancy between this version and the original IEEE and
  513. The Open Group Standard, the original IEEE and The Open Group Standard
  514. is the referee document. The original Standard can be obtained online at
  515. http://www.opengroup.org/unix/online.html .
  516. .PP
  517. Any typographical or formatting errors that appear
  518. in this page are most likely
  519. to have been introduced during the conversion of the source files to
  520. man page format. To report such errors, see
  521. https://www.kernel.org/doc/man-pages/reporting_bugs.html .