  1. '\" et
  2. .TH REGCOMP "3P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
  3. .\"
  4. .SH PROLOG
  5. This manual page is part of the POSIX Programmer's Manual.
  6. The Linux implementation of this interface may differ (consult
  7. the corresponding Linux manual page for details of Linux behavior),
  8. or the interface may not be implemented on Linux.
  9. .\"
  10. .SH NAME
  11. regcomp,
  12. regerror,
  13. regexec,
  14. regfree
  15. \(em regular expression matching
  16. .SH SYNOPSIS
  17. .LP
  18. .nf
  19. #include <regex.h>
  20. .P
  21. int regcomp(regex_t *restrict \fIpreg\fP, const char *restrict \fIpattern\fP,
  22. int \fIcflags\fP);
  23. size_t regerror(int \fIerrcode\fP, const regex_t *restrict \fIpreg\fP,
  24. char *restrict \fIerrbuf\fP, size_t \fIerrbuf_size\fP);
  25. int regexec(const regex_t *restrict \fIpreg\fP, const char *restrict \fIstring\fP,
  26. size_t \fInmatch\fP, regmatch_t \fIpmatch\fP[restrict], int \fIeflags\fP);
  27. void regfree(regex_t *\fIpreg\fP);
  28. .fi
  29. .SH DESCRIPTION
  30. These functions interpret
  31. .IR basic
  32. and
  33. .IR extended
  34. regular expressions as described in the Base Definitions volume of POSIX.1\(hy2017,
  35. .IR "Chapter 9" ", " "Regular Expressions".
  36. .P
  37. The
  38. .BR regex_t
  39. structure is defined in
  40. .IR <regex.h>
  41. and contains at least the following member:
  42. .TS
  43. center box tab(!);
  44. cB | cB | cB
  45. lw(1.25i)B | lw(1.25i)I | lw(2.5i).
  46. Member Type!Member Name!Description
  47. _
  48. size_t!re_nsub!T{
  49. Number of parenthesized subexpressions.
  50. T}
  51. .TE
  52. .P
  53. The
  54. .BR regmatch_t
  55. structure is defined in
  56. .IR <regex.h>
  57. and contains at least the following members:
  58. .TS
  59. center box tab(!);
  60. cB | cB | cB
  61. lw(1.25i)B | lw(1.25i)I | lw(2.5i).
  62. Member Type!Member Name!Description
  63. _
  64. regoff_t!rm_so!T{
  65. Byte offset from start of \fIstring\fP to start of substring.
  66. T}
  67. regoff_t!rm_eo!T{
  68. Byte offset from start of
  69. .IR string
  70. of the first character after the end of substring.
  71. T}
  72. .TE
  73. .P
  74. The
  75. \fIregcomp\fR()
  76. function shall compile the regular expression contained in the string
  77. pointed to by the
  78. .IR pattern
  79. argument and place the results in the structure pointed to by
  80. .IR preg .
  81. The
  82. .IR cflags
  83. argument is the bitwise-inclusive OR of zero or more of the following
  84. flags, which are defined in the
  85. .IR <regex.h>
  86. header:
  87. .IP REG_EXTENDED 14
  88. Use Extended Regular Expressions.
  89. .IP REG_ICASE 14
  90. Ignore case in match (see the Base Definitions volume of POSIX.1\(hy2017,
  91. .IR "Chapter 9" ", " "Regular Expressions").
  92. .IP REG_NOSUB 14
  93. Report only success/fail in
  94. \fIregexec\fR().
  95. .IP REG_NEWLINE 14
  96. Change the handling of
  97. <newline>
  98. characters, as described in the text.
  99. .P
  100. The default regular expression type for
  101. .IR pattern
  102. is a Basic Regular Expression. The application can specify Extended
  103. Regular Expressions using the REG_EXTENDED
  104. .IR cflags
  105. flag.
  106. .P
  107. If the REG_NOSUB flag was not set in
  108. .IR cflags ,
  109. then
  110. \fIregcomp\fR()
  111. shall set
  112. .IR re_nsub
  113. to the number of parenthesized subexpressions (delimited by
  114. .BR \(dq\e(\e)\(dq
  115. in basic regular expressions or
  116. .BR \(dq(\|)\(dq
  117. in extended regular expressions) found in
  118. .IR pattern .
  119. .P
  120. The
  121. \fIregexec\fR()
  122. function compares the null-terminated string specified by
  123. .IR string
  124. with the compiled regular expression
  125. .IR preg
  126. initialized by a previous call to
  127. \fIregcomp\fR().
  128. If it finds a match,
  129. \fIregexec\fR()
  130. shall return 0; otherwise, it shall return non-zero indicating either
  131. no match or an error. The
  132. .IR eflags
  133. argument is the bitwise-inclusive OR of zero or more of the following
  134. flags, which are defined in the
  135. .IR <regex.h>
  136. header:
  137. .IP REG_NOTBOL 14
  138. The first character of the string pointed to by
  139. .IR string
  140. is not the beginning of the line. Therefore, the
  141. <circumflex>
  142. character
  143. (\c
  144. .BR '\(ha' ),
  145. when taken as a special character, shall not match the beginning of
  146. .IR string .
  147. .IP REG_NOTEOL 14
  148. The last character of the string pointed to by
  149. .IR string
  150. is not the end of the line. Therefore, the
  151. <dollar-sign>
  152. (\c
  153. .BR '$' ),
  154. when taken as a special character, shall not match the end of
  155. .IR string .
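.P
The effect of these two flags can be checked with a short C sketch.
The helper name matches_with_eflags() is hypothetical and not part of <regex.h>;
it assumes the pattern is a Basic Regular Expression and, for brevity,
treats a regexec() error the same as no match.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of <regex.h>): compile pattern as a BRE
 * and test it against string using the given regexec() eflags.
 *
 * Returns 1 for a match, 0 for no match (or a regexec() error),
 * and -1 if the pattern does not compile.
 */
int matches_with_eflags(const char *pattern, const char *string, int eflags)
{
    regex_t re;
    int status;

    /* REG_NOSUB: only success or failure is needed, no offsets. */
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;

    status = regexec(&re, string, 0, NULL, eflags);
    regfree(&re);
    return status == 0 ? 1 : 0;
}
```

Under these assumptions, a call with the pattern "^abc", the string "abc" and
eflags of 0 should report a match, while the same call with REG_NOTBOL should
not, because the anchor may no longer match at the start of the string.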
  156. .P
  157. If
  158. .IR nmatch
  159. is 0 or REG_NOSUB was set in the
  160. .IR cflags
  161. argument to
  162. \fIregcomp\fR(),
  163. then
  164. \fIregexec\fR()
  165. shall ignore the
  166. .IR pmatch
  167. argument. Otherwise, the application shall ensure that the
  168. .IR pmatch
  169. argument points to an array with at least
  170. .IR nmatch
  171. elements, and
  172. \fIregexec\fR()
  173. shall fill in the elements of that array with offsets of the substrings
  174. of
  175. .IR string
  176. that correspond to the parenthesized subexpressions of
  177. .IR pattern :
  178. .IR pmatch [\c
  179. .IR i ].\c
  180. .IR rm_so
  181. shall be the byte offset of the beginning and
  182. .IR pmatch [\c
  183. .IR i ].\c
  184. .IR rm_eo
  185. shall be one greater than the byte offset of the end of substring
  186. .IR i .
  187. (Subexpression
  188. .IR i
  189. begins at the
  190. .IR i th
  191. matched open parenthesis, counting from 1.) Offsets in
  192. .IR pmatch [0]
  193. identify the substring that corresponds to the entire regular
  194. expression. Unused elements of
  195. .IR pmatch
  196. up to
  197. .IR pmatch [\c
  198. .IR nmatch \-1]
  199. shall be filled with \-1. If there are more than
  200. .IR nmatch
  201. subexpressions in
  202. .IR pattern
  203. (\c
  204. .IR pattern
  205. itself counts as a subexpression), then
  206. \fIregexec\fR()
  207. shall still do the match, but shall record only the first
  208. .IR nmatch
  209. substrings.
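.P
The offsets recorded in pmatch can be turned back into text as in the
following sketch. The helper copy_group() is a hypothetical name introduced
for illustration only; it assumes an Extended Regular Expression and sizes
the array as re_nsub+1 so that the whole match and every subexpression fit.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of <regex.h>): return a malloc()ed copy of
 * the text matched by subexpression group (0 = the whole match) of the ERE
 * pattern in string. The caller frees the result.
 *
 * Returns NULL on a compile error, no match, an out-of-range group,
 * a subexpression that did not participate, or allocation failure.
 */
char *copy_group(const char *pattern, const char *string, size_t group)
{
    regex_t re;
    regmatch_t *pm;
    char *out = NULL;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    /* One element for the whole match plus one per subexpression. */
    size_t nmatch = re.re_nsub + 1;
    pm = malloc(nmatch * sizeof *pm);

    if (pm != NULL && group < nmatch &&
        regexec(&re, string, nmatch, pm, 0) == 0 &&
        pm[group].rm_so != -1) {
        /* rm_eo is one past the last byte, so the length is rm_eo - rm_so. */
        size_t len = (size_t)(pm[group].rm_eo - pm[group].rm_so);

        out = malloc(len + 1);
        if (out != NULL) {
            memcpy(out, string + pm[group].rm_so, len);
            out[len] = '\0';
        }
    }

    free(pm);
    regfree(&re);
    return out;
}
```

Because the offsets are byte offsets from the start of string, the copy needs
no knowledge of the pattern; only the pair of offsets is used.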
  210. .P
  211. When matching a basic or extended regular expression, any given
  212. parenthesized subexpression of
  213. .IR pattern
  214. might participate in the match of several different substrings of
  215. .IR string ,
  216. or it might not match any substring even though the pattern as a whole
  217. did match. The following rules shall be used to determine which
  218. substrings to report in
  219. .IR pmatch
  220. when matching regular expressions:
  221. .IP " 1." 4
  222. If subexpression
  223. .IR i
  224. in a regular expression is not contained within another subexpression,
  225. and it participated in the match several times, then the byte offsets
  226. in
  227. .IR pmatch [\c
  228. .IR i ]
  229. shall delimit the last such match.
  230. .IP " 2." 4
  231. If subexpression
  232. .IR i
  233. is not contained within another subexpression, and it did not
  234. participate in an otherwise successful match, the byte offsets in
  235. .IR pmatch [\c
  236. .IR i ]
  237. shall be \-1. A subexpression does not participate in the match when:
  238. .sp
  239. .RS
  240. .BR '*'
  241. or
  242. .BR \(dq\e{\e}\(dq
  243. appears immediately after the subexpression in a basic regular
  244. expression, or
  245. .BR '*' ,
  246. .BR '?' ,
  247. or
  248. .BR \(dq{\|}\(dq
  249. appears immediately after the subexpression in an extended regular
  250. expression, and the subexpression did not match (matched 0 times)
  251. .RE
  252. .RS 4
  253. .P
  254. or:
  255. .sp
  256. .RS
  257. .BR '|'
  258. is used in an extended regular expression to select this subexpression
  259. or another, and the other subexpression matched.
  260. .RE
  261. .RE
  262. .IP " 3." 4
  263. If subexpression
  264. .IR i
  265. is contained within another subexpression
  266. .IR j ,
  267. and
  268. .IR i
  269. is not contained within any other subexpression that is contained
  270. within
  271. .IR j ,
  272. and a match of subexpression
  273. .IR j
  274. is reported in
  275. .IR pmatch [\c
  276. .IR j ],
  277. then the match or non-match of subexpression
  278. .IR i
  279. reported in
  280. .IR pmatch [\c
  281. .IR i ]
  282. shall be as described in 1. and 2. above, but within the substring
  283. reported in
  284. .IR pmatch [\c
  285. .IR j ]
  286. rather than the whole string. The offsets in
  287. .IR pmatch [\c
  288. .IR i ]
  289. are still relative to the start of
  290. .IR string .
  291. .IP " 4." 4
  292. If subexpression
  293. .IR i
  294. is contained in subexpression
  295. .IR j ,
  296. and the byte offsets in
  297. .IR pmatch [\c
  298. .IR j ]
  299. are \-1, then the byte offsets in
  300. .IR pmatch [\c
  301. .IR i ]
  302. shall also be \-1.
  303. .IP " 5." 4
  304. If subexpression
  305. .IR i
  306. matched a zero-length string, then both byte offsets in
  307. .IR pmatch [\c
  308. .IR i ]
  309. shall be the byte offset of the character or null terminator
  310. immediately following the zero-length string.
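.P
Rules 1 and 2 above can be observed with a small C sketch. The helper
match_offsets() is a hypothetical name, not an interface from <regex.h>;
it assumes Extended Regular Expressions and simply returns the offsets
that regexec() reports.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of <regex.h>): match the ERE pattern
 * against string and let regexec() fill in up to nmatch elements of pm.
 *
 * Returns 0 on a match, otherwise the non-zero value returned by
 * regcomp() or regexec().
 */
int match_offsets(const char *pattern, const char *string,
                  size_t nmatch, regmatch_t pm[])
{
    regex_t re;
    int status = regcomp(&re, pattern, REG_EXTENDED);

    if (status != 0)
        return status;

    status = regexec(&re, string, nmatch, pm, 0);
    regfree(&re);
    return status;
}
```

With the pattern "(a)|(b)" and the string "b", the first subexpression does
not participate, so both of its offsets are expected to be \-1 (rule 2).
With "(a|b)*c" and "abc", the subexpression matches twice and the offsets
are expected to delimit the last match, the "b" at offsets 1 to 2 (rule 1).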
  311. .P
  312. If, when
  313. \fIregexec\fR()
  314. is called, the locale is different from when the regular expression was
  315. compiled, the result is undefined.
  316. .P
  317. If REG_NEWLINE is not set in
  318. .IR cflags ,
  319. then a
  320. <newline>
  321. in
  322. .IR pattern
  323. or
  324. .IR string
  325. shall be treated as an ordinary character. If REG_NEWLINE is set, then
  326. <newline>
  327. shall be treated as an ordinary character except as follows:
  328. .IP " 1." 4
  329. A
  330. <newline>
  331. in
  332. .IR string
  333. shall not be matched by a
  334. <period>
  335. outside a bracket expression or by any form of a non-matching list
  336. (see the Base Definitions volume of POSIX.1\(hy2017,
  337. .IR "Chapter 9" ", " "Regular Expressions").
  338. .IP " 2." 4
  339. A
  340. <circumflex>
  341. (\c
  342. .BR '\(ha' )
  343. in
  344. .IR pattern ,
  345. when used to specify expression anchoring (see the Base Definitions volume of POSIX.1\(hy2017,
  346. .IR "Section 9.3.8" ", " "BRE Expression Anchoring"),
  347. shall match the zero-length string immediately after a
  348. <newline>
  349. in
  350. .IR string ,
  351. regardless of the setting of REG_NOTBOL.
  352. .IP " 3." 4
  353. A
  354. <dollar-sign>
  355. (\c
  356. .BR '$' )
  357. in
  358. .IR pattern ,
  359. when used to specify expression anchoring, shall match the zero-length
  360. string immediately before a
  361. <newline>
  362. in
  363. .IR string ,
  364. regardless of the setting of REG_NOTEOL.
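.P
The difference REG_NEWLINE makes can be seen by compiling the same pattern
with and without the flag. The following C sketch uses a hypothetical helper,
search_with_cflags(), which is not part of <regex.h>; it assumes that the
caller does not pass REG_NOSUB, since the offsets would then be ignored.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of <regex.h>): compile pattern with the
 * given cflags and search text for it. On a match, the offsets of the
 * whole match are stored in *pm.
 *
 * Returns 0 on a match, otherwise the non-zero value returned by
 * regcomp() or regexec().
 */
int search_with_cflags(const char *pattern, const char *text, int cflags,
                       regmatch_t *pm)
{
    regex_t re;
    int status = regcomp(&re, pattern, cflags);

    if (status != 0)
        return status;

    status = regexec(&re, text, 1, pm, 0);
    regfree(&re);
    return status;
}
```

For the three-line text "one\entwo\enthree", the pattern "^two$" is expected
to fail without REG_NEWLINE and to match at byte offsets 4 to 7 with it.
Conversely, "e.t" can match across the first line break only when
REG_NEWLINE is not set, because the <period> then matches the <newline>.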
  365. .P
  366. The
  367. \fIregfree\fR()
  368. function frees any memory allocated by
  369. \fIregcomp\fR()
  370. associated with
  371. .IR preg .
  372. .P
  373. The following constants are defined as the minimum set of error return
  374. values, although other errors listed as implementation extensions in
  375. .IR <regex.h>
  376. are possible:
  377. .IP REG_BADBR 14
  378. Content of
  379. .BR \(dq\e{\e}\(dq
  380. invalid: not a number, number too large, more than two numbers, first
  381. larger than second.
  382. .IP REG_BADPAT 14
  383. Invalid regular expression.
  384. .IP REG_BADRPT 14
  385. .BR '?' ,
  386. .BR '*' ,
  387. or
  388. .BR '+'
  389. not preceded by valid regular expression.
  390. .IP REG_EBRACE 14
  391. .BR \(dq\e{\e}\(dq
  392. imbalance.
  393. .IP REG_EBRACK 14
  394. .BR \(dq[]\(dq
  395. imbalance.
  396. .IP REG_ECOLLATE 14
  397. Invalid collating element referenced.
  398. .IP REG_ECTYPE 14
  399. Invalid character class type referenced.
  400. .IP REG_EESCAPE 14
  401. Trailing
  402. <backslash>
  403. character in pattern.
  404. .IP REG_EPAREN 14
  405. .BR \(dq\e(\e)\(dq
  406. or
  407. .BR \(dq()\(dq
  408. imbalance.
  409. .IP REG_ERANGE 14
  410. Invalid endpoint in range expression.
  411. .IP REG_ESPACE 14
  412. Out of memory.
  413. .IP REG_ESUBREG 14
  414. Number in
  415. .BR \(dq\edigit\(dq
  416. invalid or in error.
  417. .IP REG_NOMATCH 14
  418. \fIregexec\fR()
  419. failed to match.
  420. .P
  421. If more than one error occurs in processing a function call, any one
  422. of the possible constants may be returned, as the order of detection is
  423. unspecified.
  424. .P
  425. The
  426. \fIregerror\fR()
  427. function provides a mapping from error codes returned by
  428. \fIregcomp\fR()
  429. and
  430. \fIregexec\fR()
  431. to unspecified printable strings. It generates a string corresponding
  432. to the value of the
  433. .IR errcode
  434. argument, which the application shall ensure is the last non-zero value
  435. returned by
  436. \fIregcomp\fR()
  437. or
  438. \fIregexec\fR()
  439. with the given value of
  440. .IR preg .
  441. If
  442. .IR errcode
  443. is not such a value, the content of the generated string is unspecified.
  444. .P
  445. If
  446. .IR preg
  447. is a null pointer, but
  448. .IR errcode
  449. is a value returned by a previous call to
  450. \fIregexec\fR()
  451. or
  452. \fIregcomp\fR(),
  453. the
  454. \fIregerror\fR()
  455. still generates an error string corresponding to the value of
  456. .IR errcode ,
  457. but it might not be as detailed under some implementations.
  458. .P
  459. If the
  460. .IR errbuf_size
  461. argument is not 0,
  462. \fIregerror\fR()
  463. shall place the generated string into the buffer of size
  464. .IR errbuf_size
  465. bytes pointed to by
  466. .IR errbuf .
  467. If the string (including the terminating null) cannot fit in the
  468. buffer,
  469. \fIregerror\fR()
  470. shall truncate the string and null-terminate the result.
  471. .P
  472. If
  473. .IR errbuf_size
  474. is 0,
  475. \fIregerror\fR()
  476. shall ignore the
  477. .IR errbuf
  478. argument, and return the size of the buffer needed to hold the
  479. generated string.
  480. .P
  481. If the
  482. .IR preg
  483. argument to
  484. \fIregexec\fR()
  485. or
  486. \fIregfree\fR()
  487. is not a compiled regular expression returned by
  488. \fIregcomp\fR(),
  489. the result is undefined. A
  490. .IR preg
  491. is no longer treated as a compiled regular expression after it is given
  492. to
  493. \fIregfree\fR().
  494. .SH "RETURN VALUE"
  495. Upon successful completion, the
  496. \fIregcomp\fR()
  497. function shall return 0. Otherwise, it shall return an integer value
  498. indicating an error as described in
  499. .IR <regex.h> ,
  500. and the content of
  501. .IR preg
  502. is undefined. If a code is returned, the interpretation shall be as
  503. given in
  504. .IR <regex.h> .
  505. .P
  506. If
  507. \fIregcomp\fR()
  508. detects an invalid RE, it may return REG_BADPAT, or it may return one
  509. of the error codes that more precisely describes the error.
  510. .P
  511. Upon successful completion, the
  512. \fIregexec\fR()
  513. function shall return 0. Otherwise, it shall return REG_NOMATCH to
  514. indicate no match.
  515. .P
  516. Upon successful completion, the
  517. \fIregerror\fR()
  518. function shall return the number of bytes needed to hold the entire
  519. generated string, including the null termination. If the return value
  520. is greater than
  521. .IR errbuf_size ,
  522. the string returned in the buffer pointed to by
  523. .IR errbuf
  524. has been truncated.
  525. .P
  526. The
  527. \fIregfree\fR()
  528. function shall not return a value.
  529. .SH ERRORS
  530. No errors are defined.
  531. .LP
  532. .IR "The following sections are informative."
  533. .SH "EXAMPLES"
  534. .sp
  535. .RS 4
  536. .nf
#include <regex.h>
#include <stddef.h>
.P
/*
 * Match string against the extended regular expression in
 * pattern, treating errors as no match.
 *
 * Return 1 for match, 0 for no match.
 */
.P
int
match(const char *string, const char *pattern)
{
    int status;
    regex_t re;
.P
    if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
        return(0); /* Pattern did not compile; treat as no match. */
    }
    status = regexec(&re, string, (size_t) 0, NULL, 0);
    regfree(&re);
    if (status != 0) {
        return(0); /* No match, or an error treated as no match. */
    }
    return(1);
}
  562. .fi
  563. .P
  564. .RE
  565. .P
  566. The following demonstrates how the REG_NOTBOL flag could be used with
  567. \fIregexec\fR()
  568. to find all substrings in a line that match a pattern supplied by a user.
  569. (For simplicity of the example, very little error checking is done.)
  570. .sp
  571. .RS 4
  572. .nf
p = buffer; /* p (a const char *) marks the unsearched remainder. */
(void) regcomp (&re, pattern, 0);
/* This call to regexec() finds the first match on the line. */
error = regexec (&re, p, 1, &pm, 0);
while (error == 0) { /* While matches found. */
    /* Substring found between p + pm.rm_so and p + pm.rm_eo. */
    if (p[pm.rm_eo] == 0)
        break; /* Match ends at the end of the line. */
    /* Offsets are relative to p; step one byte past an empty match. */
    p += (pm.rm_eo > 0) ? pm.rm_eo : 1;
    /* This call to regexec() finds the next match. */
    error = regexec (&re, p, 1, &pm, REG_NOTBOL);
}
  581. .fi
  582. .P
  583. .RE
  584. .SH "APPLICATION USAGE"
  585. An application could use:
  586. .sp
  587. .RS 4
  588. .nf
  589. regerror(code,preg,(char *)NULL,(size_t)0)
  590. .fi
  591. .P
  592. .RE
  593. .P
  594. to find out how big a buffer is needed for the generated string,
  595. \fImalloc\fR()
  596. a buffer to hold the string, and then call
  597. \fIregerror\fR()
  598. again to get the string. Alternatively, it could allocate a fixed,
  599. static buffer that is big enough to hold most strings, and then use
  600. \fImalloc\fR()
  601. to allocate a larger buffer if it finds that this is too small.
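.P
The two-call approach described above might look like the following C sketch.
The helper name regerror_alloc() is hypothetical and not part of <regex.h>;
the message text itself is unspecified and varies between implementations.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of <regex.h>): return a malloc()ed copy
 * of the message that regerror() generates for code. The caller frees
 * the result. Returns NULL if the allocation fails.
 */
char *regerror_alloc(int code, const regex_t *preg)
{
    /* First call: a size of 0 only asks how many bytes are needed. */
    size_t need = regerror(code, preg, NULL, 0);
    char *msg;

    if (need == 0)
        return NULL;

    msg = malloc(need);
    if (msg != NULL) {
        /* Second call: the buffer is large enough, so nothing is cut off. */
        (void) regerror(code, preg, msg, need);
    }
    return msg;
}
```

The returned size already includes the terminating null, so no extra byte
needs to be added before calling malloc().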
  602. .P
  603. To match a pattern as described in the Shell and Utilities volume of POSIX.1\(hy2017,
  604. .IR "Section 2.13" ", " "Pattern Matching Notation",
  605. use the
  606. \fIfnmatch\fR()
  607. function.
  608. .SH RATIONALE
  609. The
  610. \fIregexec\fR()
  611. function must fill in all
  612. .IR nmatch
  613. elements of
  614. .IR pmatch ,
  615. where
  616. .IR nmatch
  617. and
  618. .IR pmatch
  619. are supplied by the application, even if some elements of
  620. .IR pmatch
  621. do not correspond to subexpressions in
  622. .IR pattern .
  623. The application developer should note that there is probably no reason
  624. for using a value of
  625. .IR nmatch
  626. that is larger than
  627. .IR preg \->\c
  628. .IR re_nsub +1.
  629. .P
  630. The REG_NEWLINE flag supports a use of RE matching that is needed in
  631. some applications like text editors. In such applications, the user
  632. supplies an RE asking the application to find a line that matches the
  633. given expression. An anchor in such an RE anchors at the beginning or
  634. end of any line. Such an application can pass a sequence of
  635. <newline>-separated
  636. lines to
  637. \fIregexec\fR()
  638. as a single long string and specify REG_NEWLINE to
  639. \fIregcomp\fR()
  640. to get the desired behavior. The application must ensure that there are
  641. no explicit
  642. <newline>
  643. characters in
  644. .IR pattern
  645. if it wants to ensure that any match occurs entirely within a single
  646. line.
  647. .P
  648. The REG_NEWLINE flag affects the behavior of
  649. \fIregexec\fR(),
  650. but it is in the
  651. .IR cflags
  652. parameter to
  653. \fIregcomp\fR()
  654. to allow flexibility of implementation. Some implementations will want
  655. to generate the same compiled RE in
  656. \fIregcomp\fR()
  657. regardless of the setting of REG_NEWLINE and have
  658. \fIregexec\fR()
  659. handle anchors differently based on the setting of the flag. Other
  660. implementations will generate different compiled REs based on the
  661. REG_NEWLINE.
  662. .P
  663. The REG_ICASE flag supports the operations taken by the
  664. .IR grep
  665. .BR \-i
  666. option and the historical implementations of
  667. .IR ex
  668. and
  669. .IR vi .
  670. Including this flag will make it easier for application code to be
  671. written that does the same thing as these utilities.
  672. .P
  673. The substrings reported in
  674. .IR pmatch [\|]
  675. are defined using offsets from the start of the string rather than
  676. pointers. This allows type-safe access to both constant and non-constant
  677. strings.
  678. .P
  679. The type
  680. .BR regoff_t
  681. is used for the elements of
  682. .IR pmatch [\|]
  683. to ensure that the application can represent large arrays in memory
  684. (important for an application conforming to the Shell and Utilities volume of POSIX.1\(hy2017).
  685. .P
  686. The 1992 edition of this standard required
  687. .BR regoff_t
  688. to be at least as wide as
  689. .BR off_t ,
  690. to facilitate future extensions in which the string to be searched is
  691. taken from a file. However, these future extensions have not appeared.
  692. The requirement rules out popular implementations with 32-bit
  693. .BR regoff_t
  694. and 64-bit
  695. .BR off_t ,
  696. so it has been removed.
  697. .P
  698. The standard developers rejected the inclusion of a
  699. \fIregsub\fR()
  700. function that would be used to do substitutions for a matched RE. While
  701. such a routine would be useful to some applications, its utility would
  702. be much more limited than the matching function described here. Both RE
  703. parsing and substitution are possible to implement without support
  704. other than that required by the ISO\ C standard, but matching is much more
  705. complex than substituting. The only difficult part of substitution,
  706. given the information supplied by
  707. \fIregexec\fR(),
  708. is finding the next character in a string when there can be multi-byte
  709. characters. That is a much larger issue, and one that needs a more
  710. general solution.
  711. .P
  712. The
  713. .IR errno
  714. variable has not been used for error returns to avoid filling the
  715. .IR errno
  716. name space for this feature.
  717. .P
  718. The interface is defined so that the matched substrings
  719. .IR rm_so
  720. and
  721. .IR rm_eo
  722. are in a separate
  723. .BR regmatch_t
  724. structure instead of in
  725. .BR regex_t .
  726. This allows a single compiled RE to be used simultaneously in several
  727. contexts; in
  728. \fImain\fR()
  729. and a signal handler, perhaps, or in multiple threads of lightweight
  730. processes. (The
  731. .IR preg
  732. argument to
  733. \fIregexec\fR()
  734. is declared with type
  735. .BR const ,
  736. so the implementation is not permitted to use the structure to store
  737. intermediate results.) It also allows an application to request an
  738. arbitrary number of substrings from an RE. The number of
  739. subexpressions in the RE is reported in
  740. .IR re_nsub
  741. in
  742. .IR preg .
  743. With this change to
  744. \fIregexec\fR(),
  745. consideration was given to dropping the REG_NOSUB flag since the user
  746. can now specify this with a zero
  747. .IR nmatch
  748. argument to
  749. \fIregexec\fR().
  750. However, keeping REG_NOSUB allows an implementation to use a different
  751. (perhaps more efficient) algorithm if it knows in
  752. \fIregcomp\fR()
  753. that no subexpressions need be reported. The implementation is only
  754. required to fill in
  755. .IR pmatch
  756. if
  757. .IR nmatch
  758. is not zero and if REG_NOSUB is not specified. Note that the
  759. .BR size_t
  760. type, as defined in the ISO\ C standard, is unsigned, so the description of
  761. \fIregexec\fR()
  762. does not need to address negative values of
  763. .IR nmatch .
  764. .P
  765. REG_NOTBOL was added to allow an application to do repeated searches
  766. for the same pattern in a line. If the pattern contains a
  767. <circumflex>
  768. character that should match the beginning of a line, then the pattern
  769. should only match when matched against the beginning of the line.
  770. Without the REG_NOTBOL flag, the application could rewrite the
  771. expression for subsequent matches, but in the general case this would
  772. require parsing the expression. The need for REG_NOTEOL is not as
  773. clear; it was added for symmetry.
  774. .P
  775. The addition of the
  776. \fIregerror\fR()
  777. function addresses the historical need for conforming application
  778. programs to have access to error information more than ``Function
  779. failed to compile/match your RE for unknown reasons''.
  780. .P
  781. This interface provides for two different methods of dealing with error
  782. conditions. The specific error codes (REG_EBRACE, for example), defined
  783. in
  784. .IR <regex.h> ,
  785. allow an application to recover from an error if it is so able. Many
  786. applications, especially those that use patterns supplied by a user,
  787. will not try to deal with specific error cases, but will just use
  788. \fIregerror\fR()
  789. to obtain a human-readable error message to present to the user.
  790. .P
  791. The
  792. \fIregerror\fR()
  793. function uses a scheme similar to
  794. \fIconfstr\fR()
  795. to deal with the problem of allocating memory to hold the generated
  796. string. The scheme used by
  797. \fIstrerror\fR()
  798. in the ISO\ C standard was considered unacceptable since it creates difficulties
  799. for multi-threaded applications.
  800. .P
  801. The
  802. .IR preg
  803. argument is provided to
  804. \fIregerror\fR()
  805. to allow an implementation to generate a more descriptive message than
  806. would be possible with
  807. .IR errcode
  808. alone. An implementation might, for example, save the character offset
  809. of the offending character of the pattern in a field of
  810. .IR preg ,
  811. and then include that in the generated message string. The
  812. implementation may also ignore
  813. .IR preg .
  814. .P
  815. A REG_FILENAME flag was considered, but omitted. This flag caused
  816. \fIregexec\fR()
  817. to match patterns as described in the Shell and Utilities volume of POSIX.1\(hy2017,
  818. .IR "Section 2.13" ", " "Pattern Matching Notation"
  819. instead of REs. This service is now provided by the
  820. \fIfnmatch\fR()
  821. function.
  822. .P
  823. Notice that there is a difference in philosophy between the ISO\ POSIX\(hy2:\|1993 standard and
  824. POSIX.1\(hy2008 in how to handle a ``bad'' regular expression. The ISO\ POSIX\(hy2:\|1993 standard says
  825. that many bad constructs ``produce undefined results'', or that
  826. ``the interpretation is undefined''. POSIX.1\(hy2008, however, says that the
  827. interpretation of such REs is unspecified. The term ``undefined'' means
  828. that the action by the application is an error, of similar severity
  829. to passing a bad pointer to a function.
  830. .P
  831. The
  832. \fIregcomp\fR()
  833. and
  834. \fIregexec\fR()
  835. functions are required to accept any null-terminated string as the
  836. .IR pattern
  837. argument. If the meaning of the string is ``undefined'', the behavior
  838. of the function is ``unspecified''. POSIX.1\(hy2008 does not specify how the
  839. functions will interpret the pattern; they might return error codes, or
  840. they might do pattern matching in some completely unexpected way, but
  841. they should not do something like abort the process.
  842. .SH "FUTURE DIRECTIONS"
  843. None.
  844. .SH "SEE ALSO"
  845. .IR "\fIfnmatch\fR\^(\|)",
  846. .IR "\fIglob\fR\^(\|)"
  847. .P
  848. The Base Definitions volume of POSIX.1\(hy2017,
  849. .IR "Chapter 9" ", " "Regular Expressions",
  850. .IR "\fB<regex.h>\fP",
  851. .IR "\fB<sys/types.h>\fP"
  852. .P
  853. The Shell and Utilities volume of POSIX.1\(hy2017,
  854. .IR "Section 2.13" ", " "Pattern Matching Notation"
  855. .\"
  856. .SH COPYRIGHT
  857. Portions of this text are reprinted and reproduced in electronic form
  858. from IEEE Std 1003.1-2017, Standard for Information Technology
  859. -- Portable Operating System Interface (POSIX), The Open Group Base
  860. Specifications Issue 7, 2018 Edition,
  861. Copyright (C) 2018 by the Institute of
  862. Electrical and Electronics Engineers, Inc and The Open Group.
  863. In the event of any discrepancy between this version and the original IEEE and
  864. The Open Group Standard, the original IEEE and The Open Group Standard
  865. is the referee document. The original Standard can be obtained online at
  866. http://www.opengroup.org/unix/online.html .
  867. .PP
  868. Any typographical or formatting errors that appear
  869. in this page are most likely
  870. to have been introduced during the conversion of the source files to
  871. man page format. To report such errors, see
  872. https://www.kernel.org/doc/man-pages/reporting_bugs.html .