yacc.1p (49732B)
- '\" et
- .TH YACC "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual"
- .\"
- .SH PROLOG
- This manual page is part of the POSIX Programmer's Manual.
- The Linux implementation of this interface may differ (consult
- the corresponding Linux manual page for details of Linux behavior),
- or the interface may not be implemented on Linux.
- .\"
- .SH NAME
- yacc
- \(em yet another compiler compiler (\fBDEVELOPMENT\fP)
- .SH SYNOPSIS
- .LP
- .nf
- yacc \fB[\fR-dltv\fB] [\fR-b \fIfile_prefix\fB] [\fR-p \fIsym_prefix\fB]\fI grammar\fR
- .fi
- .SH DESCRIPTION
- The
- .IR yacc
- utility shall read a description of a context-free grammar in
- .IR grammar
- and write C source code, conforming to the ISO\ C standard, to a code file,
- and optionally header information into a header file, in the current
- directory. The generated source code shall not depend on any undefined,
- unspecified, or implementation-defined behavior, except in cases where
- it is copied directly from the supplied grammar, or in cases that are
- documented by the implementation. The C code shall define a function
- and related routines and macros for an automaton that executes a parsing
- algorithm meeting the requirements in
- .IR "Algorithms".
- .P
- The form and meaning of the grammar are described in the EXTENDED
- DESCRIPTION section.
- .P
- The C source code and header file shall be produced in a form suitable
- as input for the C compiler (see
- .IR "\fIc99\fR\^").
- .SH OPTIONS
- The
- .IR yacc
- utility shall conform to the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 12.2" ", " "Utility Syntax Guidelines",
- except for Guideline 9.
- .P
- The following options shall be supported:
- .IP "\fB\-b\ \fIfile_prefix\fR" 10
- Use
- .IR file_prefix
- instead of
- .BR y
- as the prefix for all output filenames. The code file
- .BR y.tab.c ,
- the header file
- .BR y.tab.h
- (created when
- .BR \-d
- is specified), and the description file
- .BR y.output
- (created when
- .BR \-v
- is specified), shall be changed to
- .IR file_prefix \c
- .BR .tab.c ,
- .IR file_prefix \c
- .BR .tab.h ,
- and
- .IR file_prefix \c
- .BR .output ,
- respectively.
- .IP "\fB\-d\fP" 10
- Write the header file; by default only the code file is written. See
- the OUTPUT FILES section.
- .IP "\fB\-l\fP" 10
- Produce a code file that does not contain any
- .BR #line
- constructs. If this option is not present, it is unspecified whether
- the code file or header file contains
- .BR #line
- directives. This should only be used after the grammar and the
- associated actions are fully debugged.
- .IP "\fB\-p\ \fIsym_prefix\fR" 10
- .br
- Use
- .IR sym_prefix
- instead of
- .BR yy
- as the prefix for all external names produced by
- .IR yacc .
- The names affected shall include the functions
- \fIyyparse\fR(),
- \fIyylex\fR(),
- and
- \fIyyerror\fR(),
- and the variables
- .IR yylval ,
- .IR yychar ,
- and
- .IR yydebug .
- (In the remainder of this section, the six symbols cited are referenced
- using their default names only as a notational convenience.) Local
- names may also be affected by the
- .BR \-p
- option; however, the
- .BR \-p
- option shall not affect
- .BR #define
- symbols generated by
- .IR yacc .
- .IP "\fB\-t\fP" 10
- Modify conditional compilation directives to permit compilation of
- debugging code in the code file. Runtime debugging statements shall
- always be contained in the code file, but by default conditional
- compilation directives prevent their compilation.
- .IP "\fB\-v\fP" 10
- Write a file containing a description of the parser and a report of
- conflicts generated by ambiguities in the grammar.
- .br
- .SH OPERANDS
- The following operand is required:
- .IP "\fIgrammar\fR" 10
- A pathname of a file containing instructions, hereafter called
- .IR grammar ,
- for which a parser is to be created. The format for the grammar is
- described in the EXTENDED DESCRIPTION section.
- .SH STDIN
- Not used.
- .SH "INPUT FILES"
- The file
- .IR grammar
- shall be a text file formatted as specified in the EXTENDED DESCRIPTION
- section.
- .SH "ENVIRONMENT VARIABLES"
- The following environment variables shall affect the execution of
- .IR yacc :
- .IP "\fILANG\fP" 10
- Provide a default value for the internationalization variables that are
- unset or null. (See the Base Definitions volume of POSIX.1\(hy2017,
- .IR "Section 8.2" ", " "Internationalization Variables"
- for the precedence of internationalization variables used to determine
- the values of locale categories.)
- .IP "\fILC_ALL\fP" 10
- If set to a non-empty string value, override the values of all the
- other internationalization variables.
- .IP "\fILC_CTYPE\fP" 10
- Determine the locale for the interpretation of sequences of bytes of
- text data as characters (for example, single-byte as opposed to
- multi-byte characters in arguments and input files).
- .IP "\fILC_MESSAGES\fP" 10
- .br
- Determine the locale that should be used to affect the format and
- contents of diagnostic messages written to standard error.
- .IP "\fINLSPATH\fP" 10
- Determine the location of message catalogs for the processing of
- .IR LC_MESSAGES .
- .P
- The
- .IR LANG
- and
- .IR LC_*
- variables affect the execution of the
- .IR yacc
- utility as stated. The
- \fImain\fR()
- function defined in
- .IR "Yacc Library"
- shall call:
- .sp
- .RS 4
- .nf
- setlocale(LC_ALL, "")
- .fi
- .P
- .RE
- .P
- and thus the program generated by
- .IR yacc
- shall also be affected by the contents of these variables at runtime.
- .SH "ASYNCHRONOUS EVENTS"
- Default.
- .SH STDOUT
- Not used.
- .SH STDERR
- If shift/reduce or reduce/reduce conflicts are detected in
- .IR grammar ,
- .IR yacc
- shall write a report of those conflicts to the standard error in an
- unspecified format.
- .P
- Standard error shall also be used for diagnostic messages.
- .SH "OUTPUT FILES"
- The code file, the header file, and the description file shall be text
- files. All are described in the following sections.
- .SS "Code File"
- .P
- This file shall contain the C source code for the
- \fIyyparse\fR()
- function. It shall contain code for the various semantic actions with
- macro substitution performed on them as described in the EXTENDED
- DESCRIPTION section. It also shall contain a copy of the
- .BR #define
- statements in the header file. If a
- .BR %union
- declaration is used, the declaration for YYSTYPE shall also be included
- in this file.
- .SS "Header File"
- .P
- The header file shall contain
- .BR #define
- statements that associate the token numbers with the token names. This
- allows source files other than the code file to access the token
- codes. If a
- .BR %union
- declaration is used, the declaration for YYSTYPE and an
- .IR "extern YYSTYPE yylval"
- declaration shall also be included in this file.
- .SS "Description File"
- .P
- The description file shall be a text file containing a description of
- the state machine corresponding to the parser, using an unspecified
- format. Limits for internal tables (see
- .IR "Limits")
- shall also be reported, in an implementation-defined manner. (Some
- implementations may use dynamic allocation techniques and have no
- specific limit values to report.)
- .SH "EXTENDED DESCRIPTION"
- The
- .IR yacc
- command accepts a language that is used to define a grammar for a
- target language to be parsed by the tables and code generated by
- .IR yacc .
- The language accepted by
- .IR yacc
- as a grammar for the target language is described below using the
- .IR yacc
- input language itself.
- .P
- The input
- .IR grammar
- includes rules describing the input structure of the target language
- and code to be invoked when these rules are recognized to provide the
- associated semantic action. The code to be executed shall appear as bodies
- of text that are intended to be C-language code. These bodies of text
- shall not contain C-language trigraphs. The C-language inclusions are
- presumed to form a correct function when processed by
- .IR yacc
- into its output files. The code included in this way shall be executed
- during the recognition of the target language.
- .P
- Given a grammar, the
- .IR yacc
- utility generates the files described in the OUTPUT FILES section. The
- code file can be compiled and linked using
- .IR c99 .
- If the declaration and programs sections of the grammar file did not
- include definitions of
- \fImain\fR(),
- \fIyylex\fR(),
- and
- \fIyyerror\fR(),
- the compiled output requires linking with externally supplied versions
- of those functions. Default versions of
- \fImain\fR()
- and
- \fIyyerror\fR()
- are supplied in the
- .IR yacc
- library and can be linked in by using the
- .BR "\-l\ y"
- operand to
- .IR c99 .
- The
- .IR yacc
- library interfaces need not support interfaces with other than the
- default
- .BR yy
- symbol prefix. The application provides the lexical analyzer function,
- \fIyylex\fR();
- the
- .IR lex
- utility is specifically designed to generate such a routine.
- .SS "Input Language"
- .P
- The application shall ensure that every specification file consists of
- three sections in order:
- .IR declarations ,
- .IR "grammar rules" ,
- and
- .IR programs ,
- separated by double
- <percent-sign>
- characters (\c
- .BR \(dq%%\(dq ).
- The declarations and programs sections can be empty. If the latter is
- empty, the preceding
- .BR \(dq%%\(dq
- mark separating it from the rules section can be omitted.
- .P
- The input is free form text following the structure of the grammar
- defined below.
- .SS "Lexical Structure of the Grammar"
- .P
- The
- <blank>,
- <newline>,
- and
- <form-feed>
- character shall be ignored, except that the application shall ensure that
- they do not appear in names or multi-character reserved symbols. Comments
- shall be enclosed in
- .BR \(dq/*\ ...\ */\(dq ,
- and can appear wherever a name is valid.
- .P
- Names are of arbitrary length, made up of letters, periods (\c
- .BR '.' ),
- underscores (\c
- .BR '_' ),
- and non-initial digits. Uppercase and lowercase letters are distinct.
- Conforming applications shall not use names beginning in
- .BR yy
- or
- .BR YY
- since the
- .IR yacc
- parser uses such names. Many of the names appear in the final output
- of
- .IR yacc ,
- and thus they should be chosen to conform with any additional rules
- created by the C compiler to be used. In particular they appear in
- .BR #define
- statements.
- .P
- A literal shall consist of a single character enclosed in single-quote
- characters. All of the escape sequences supported for character constants
- by the ISO\ C standard shall be supported by
- .IR yacc .
- .P
- The relationship with the lexical analyzer is discussed in detail below.
- .P
- The application shall ensure that the NUL character is not used in
- grammar rules or literals.
- .SS "Declarations Section"
- .P
- The declarations section is used to define the symbols used to define
- the target language and their relationship with each other. In
- particular, much of the additional information required to resolve
- ambiguities in the context-free grammar for the target language is
- provided here.
- .P
- Usually
- .IR yacc
- assigns the relationship between the symbolic names it generates and
- their underlying numeric value. The declarations section makes it
- possible to control the assignment of these values.
- .P
- It is also possible to keep semantic information associated with the
- tokens currently on the parse stack in a user-defined C-language
- .BR union ,
- if the members of the union are associated with the various names in
- the grammar. The declarations section provides for this as well.
- .P
- The first group of declarators below all take a list of names as
- arguments. That list can optionally be preceded by the name of a C
- union member (called a
- .IR tag
- below) appearing within
- .BR '<'
- and
- .BR '>' .
- (As an exception to the typographical conventions of the rest of this volume of POSIX.1\(hy2017,
- in this case <\fItag\fP> does not represent a metavariable, but the
- literal angle bracket characters surrounding a symbol.) The use of
- .IR tag
- specifies that the tokens named on this line shall be of the same C
- type as the union member referenced by
- .IR tag .
- This is discussed in more detail below.
- .P
- For lists used to define tokens, the first appearance of a given token
- can be followed by a positive integer (as a string of decimal digits).
- If this is done, the underlying value assigned to it for lexical
- purposes shall be taken to be that number.
- .P
- The following declares
- .IR name
- to be a token:
- .sp
- .RS 4
- .nf
- %token \fB[\fR<\fItag\fR>\fB] \fIname \fB[\fInumber\fB] [\fIname \fB[\fInumber\fB]]\fR...
- .fi
- .P
- .RE
- .P
- If
- .IR tag
- is present, the C type for all tokens on this line shall be declared to
- be the type referenced by
- .IR tag .
- If a positive integer,
- .IR number ,
- follows a
- .IR name ,
- that value shall be assigned to the token.
- .P
- The following declares
- .IR name
- to be a token, and assigns precedence to it:
- .sp
- .RS 4
- .nf
- %left \fB[\fR<\fItag\fR>\fB] \fIname \fB[\fInumber\fB] [\fIname \fB[\fInumber\fB]]\fR...
- %right \fB[\fR<\fItag\fR>\fB] \fIname \fB[\fInumber\fB] [\fIname \fB[\fInumber\fB]]\fR...
- .fi
- .P
- .RE
- .P
- One or more lines, each beginning with one of these symbols, can appear
- in this section. All tokens on the same line have the same precedence
- level and associativity; the lines are in order of increasing
- precedence or binding strength.
- .BR %left
- denotes that the operators on that line are left associative, and
- .BR %right
- similarly denotes right associative operators. If
- .IR tag
- is present, it shall declare a C type for
- .IR name s
- as described for
- .BR %token .
- .P
- The following declares
- .IR name
- to be a token, and indicates that this cannot be used associatively:
- .sp
- .RS 4
- .nf
- %nonassoc \fB[\fR<\fItag\fR>\fB] \fIname \fB[\fInumber\fB] [\fIname \fB[\fInumber\fB]]\fR...
- .fi
- .P
- .RE
- .P
- If the parser encounters associative use of this token it reports an
- error. If
- .IR tag
- is present, it shall declare a C type for
- .IR name s
- as described for
- .BR %token .
- .P
- The following declares that union member
- .IR name s
- are non-terminals, and thus it is required to have a
- .IR tag
- field at its beginning:
- .sp
- .RS 4
- .nf
- %type <\fItag\fR> \fIname\fR...
- .fi
- .P
- .RE
- .P
- Because it deals with non-terminals only, assigning a token number or
- using a literal is also prohibited. If this construct is present,
- .IR yacc
- shall perform type checking; if this construct is not present, the
- parse stack shall hold only the
- .BR int
- type.
- .P
- Every name used in
- .IR grammar
- not defined by a
- .BR %token ,
- .BR %left ,
- .BR %right ,
- or
- .BR %nonassoc
- declaration is assumed to represent a non-terminal symbol. The
- .IR yacc
- utility shall report an error for any non-terminal symbol that does not
- appear on the left side of at least one grammar rule.
- .P
- Once the type, precedence, or token number of a name is specified, it
- shall not be changed. If the first declaration of a token does not
- assign a token number,
- .IR yacc
- shall assign a token number. Once this assignment is made, the token
- number shall not be changed by explicit assignment.
- .P
- The following declarators do not follow the previous pattern.
- .P
- The following declares the non-terminal
- .IR name
- to be the
- .IR "start symbol" ,
- which represents the largest, most general structure described by the
- grammar rules:
- .sp
- .RS 4
- .nf
- %start \fIname\fR
- .fi
- .P
- .RE
- .P
- By default, it is the left-hand side of the first grammar rule; this
- default can be overridden with this declaration.
- .P
- The following declares the
- .IR yacc
- value stack to be a union of the various types of values desired.
- .sp
- .RS 4
- .nf
- %union { \fIbody of union\fR (\fIin C\fR) }
- .fi
- .P
- .RE
- .P
- The body of the union shall not contain unbalanced curly brace
- preprocessing tokens.
- .P
- By default, the values returned by actions (see below) and the lexical
- analyzer shall be of type
- .BR int .
- The
- .IR yacc
- utility keeps track of types, and it shall insert corresponding union
- member names in order to perform strict type checking of the resulting
- parser.
- .P
- Alternatively, given that at least one <\fItag\fP> construct is used,
- the union can be declared in a header file (which shall be included in
- the declarations section by using a
- .BR #include
- construct within
- .BR %{
- and
- .BR %} ),
- and a
- .BR typedef
- used to define the symbol YYSTYPE to represent this union. The effect
- of
- .BR %union
- is to provide the declaration of YYSTYPE directly from the
- .IR yacc
- input.
- .P
- C-language declarations and definitions can appear in the declarations
- section, enclosed by the following marks:
- .sp
- .RS 4
- .nf
- %{ ... %}
- .fi
- .P
- .RE
- .P
- These statements shall be copied into the code file, and have global
- scope within it so that they can be used in the rules and program
- sections. The statements shall not contain
- .BR \(dq%}\(dq
- outside a comment, string literal, or multi-character constant.
- .P
- The application shall ensure that the declarations section is
- terminated by the token
- .BR %% .
- .SS "Grammar Rules in yacc"
- .P
- The rules section defines the context-free grammar to be accepted by
- the function
- .IR yacc
- generates, and associates with those rules C-language actions and
- additional precedence information. The grammar is described below, and
- a formal definition follows.
- .P
- The rules section is comprised of one or more grammar rules. A grammar
- rule has the form:
- .sp
- .RS 4
- .nf
- A : BODY ;
- .fi
- .P
- .RE
- .P
- The symbol
- .BR A
- represents a non-terminal name, and
- .BR BODY
- represents a sequence of zero or more
- .IR name s,
- .IR literal s,
- and
- .IR "semantic action" s
- that can then be followed by optional
- .IR "precedence rule" s.
- Only the names and literals participate in the formation of the
- grammar; the semantic actions and precedence rules are used in other
- ways. The
- <colon>
- and the
- <semicolon>
- are
- .IR yacc
- punctuation. If there are several successive grammar rules with the
- same left-hand side, the
- <vertical-line>
- (\c
- .BR '|' )
- can be used to avoid rewriting the left-hand side; in this case the
- <semicolon>
- appears only after the last rule. The BODY part can be empty
- (or empty of names and literals) to indicate that the non-terminal
- symbol matches the empty string.
- .P
- The
- .IR yacc
- utility assigns a unique number to each rule. Rules using the vertical
- bar notation are distinct rules. The number assigned to the rule
- appears in the description file.
- .P
- The elements comprising a BODY are:
- .IP "\fIname\fR,\ \fIliteral\fR" 10
- These form the rules of the grammar:
- .IR name
- is either a
- .IR token
- or a
- .IR non-terminal ;
- .IR literal
- stands for itself (less the lexically required quotation marks).
- .IP "\fIsemantic action\fR" 10
- .br
- With each grammar rule, the user can associate actions to be performed
- each time the rule is recognized in the input process. (Note that the
- word ``action'' can also refer to the actions of the parser\(emshift,
- reduce, and so on.)
- .RS 10
- .P
- These actions can return values and can obtain the values returned by
- previous actions. These values are kept in objects of type YYSTYPE
- (see
- .BR %union ).
- The result value of the action shall be kept on the parse stack with
- the left-hand side of the rule, to be accessed by other reductions as
- part of their right-hand side. By using the <\fItag\fP> information
- provided in the declarations section, the code generated by
- .IR yacc
- can be strictly type checked and contain arbitrary information. In
- addition, the lexical analyzer can provide the same kinds of values for
- tokens, if desired.
- .P
- An action is an arbitrary C statement and as such can do input or
- output, call subprograms, and alter external variables. An action is
- one or more C statements enclosed in curly braces
- .BR '{'
- and
- .BR '}' .
- The statements shall not contain unbalanced curly brace preprocessing
- tokens.
- .P
- Certain pseudo-variables can be used in the action. These are macros
- for access to data structures known internally to
- .IR yacc .
- .IP $$ 10
- The value of the action can be set by assigning it to $$. If type
- checking is enabled and the type of the value to be assigned cannot be
- determined, a diagnostic message may be generated.
- .IP "$\fInumber\fR" 10
- This refers to the value returned by the component specified by the
- token
- .IR number
- in the right side of a rule, reading from left to right;
- .IR number
- can be zero or negative. If
- .IR number
- is zero or negative, it refers to the data associated with the name on
- the parser's stack preceding the leftmost symbol of the current rule.
- (That is,
- .BR \(dq$0\(dq
- refers to the name immediately preceding the leftmost name in the
- current rule to be found on the parser's stack and
- .BR \(dq$-1\(dq
- refers to the symbol to
- .IR its
- left.) If
- .IR number
- refers to an element past the current point in the rule, or beyond the
- bottom of the stack, the result is undefined. If type checking is
- enabled and the type of the value to be assigned cannot be determined,
- a diagnostic message may be generated.
- .IP "$<\fItag\fR>\fInumber\fR" 10
- .br
- These correspond exactly to the corresponding symbols without the
- .IR tag
- inclusion, but allow for strict type checking (and preclude unwanted
- type conversions). The effect is that the macro is expanded to use
- .IR tag
- to select an element from the YYSTYPE union (using
- .IR dataname.tag ).
- This is particularly useful if
- .IR number
- is not positive.
- .IP "$<\fItag\fR>$" 10
- This imposes on the reference the type of the union member referenced
- by
- .IR tag .
- This construction is applicable when a reference to a left context
- value occurs in the grammar, and provides
- .IR yacc
- with a means for selecting a type.
- .P
- Actions can occur anywhere in a rule (not just at the end); an
- action can access values returned by actions to its left, and in turn
- the value it returns can be accessed by actions to its right. An
- action appearing in the middle of a rule shall be equivalent to
- replacing the action with a new non-terminal symbol and adding an empty
- rule with that non-terminal symbol on the left-hand side. The semantic
- action associated with the new rule shall be equivalent to the original
- action. The use of actions within rules might introduce conflicts that
- would not otherwise exist.
- .P
- By default, the value of a rule shall be the value of the first element
- in it. If the first element does not have a type (particularly in the
- case of a literal) and type checking is turned on by
- .BR %type ,
- an error message shall result.
- .RE
- .IP "\fIprecedence\fR" 10
- The keyword
- .BR %prec
- can be used to change the precedence level associated with a particular
- grammar rule. Examples of this are in cases where a unary and binary
- operator have the same symbolic representation, but need to be given
- different precedences, or where the handling of an ambiguous if-else
- construction is necessary. The reserved symbol
- .BR %prec
- can appear immediately after the body of the grammar rule and can be
- followed by a token name or a literal. It shall cause the precedence
- of the grammar rule to become that of the following token name or
- literal. The action for the rule as a whole can follow
- .BR %prec .
- .P
- If a program section follows, the application shall ensure that the
- grammar rules are terminated by
- .BR %% .
- .SS "Programs Section"
- .P
- The
- .IR programs
- section can include the definition of the lexical analyzer
- \fIyylex\fR(),
- and any other functions; for example, those used in the actions
- specified in the grammar rules. It is unspecified whether the programs
- section precedes or follows the semantic actions in the output file;
- therefore, if the application contains any macro definitions and
- declarations intended to apply to the code in the semantic actions, it
- shall place them within
- .BR \(dq%{\ ...\ %}\(dq
- in the declarations section.
- .SS "Input Grammar"
- .P
- The following input to
- .IR yacc
- yields a parser for the input to
- .IR yacc .
- This formal syntax takes precedence over the preceding text syntax
- description.
- .P
- The lexical structure is defined less precisely;
- .IR "Lexical Structure of the Grammar"
- defines most terms. The correspondence between the previous terms and
- the tokens below is as follows.
- .IP "\fBIDENTIFIER\fR" 12
- This corresponds to the concept of
- .IR name ,
- given previously. It also includes literals as defined previously.
- .IP "\fBC_IDENTIFIER\fR" 12
- This is a name, and additionally it is known to be followed by a
- <colon>.
- A literal cannot yield this token.
- .IP "\fBNUMBER\fR" 12
- A string of digits (a non-negative decimal integer).
- .IP "\fBTYPE\fR,\ \fBLEFT\fR,\ \fBMARK\fR,\ \fBLCURL\fR,\ \fBRCURL\fR" 12
- .br
- These correspond directly to
- .BR %type ,
- .BR %left ,
- .BR %% ,
- .BR %{ ,
- and
- .BR %} .
- .IP "\fB{\ .\|.\|.\ }\fR" 12
- This indicates C-language source code, with the possible inclusion of
- .BR '$'
- macros as discussed previously.
- .sp
- .RS 4
- .nf
- /* Grammar for the input to yacc. */
- /* Basic entries. */
- /* The following are recognized by the lexical analyzer. */
- .P
- %token IDENTIFIER /* Includes identifiers and literals */
- %token C_IDENTIFIER /* identifier (but not literal)
- followed by a :. */
- %token NUMBER /* [0-9][0-9]* */
- .P
- /* Reserved words : %type=>TYPE %left=>LEFT, and so on */
- .P
- %token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION
- .P
- %token MARK /* The %% mark. */
- %token LCURL /* The %{ mark. */
- %token RCURL /* The %} mark. */
- .P
- /* 8-bit character literals stand for themselves; */
- /* tokens have to be defined for multi-byte characters. */
- .P
- %start spec
- .P
- %%
- .P
- spec : defs MARK rules tail
- ;
- tail : MARK
- {
- /* In this action, set up the rest of the file. */
- }
- | /* Empty; the second MARK is optional. */
- ;
- defs : /* Empty. */
- | defs def
- ;
- def : START IDENTIFIER
- | UNION
- {
- /* Copy union definition to output. */
- }
- | LCURL
- {
- /* Copy C code to output file. */
- }
- RCURL
- | rword tag nlist
- ;
- rword : TOKEN
- | LEFT
- | RIGHT
- | NONASSOC
- | TYPE
- ;
- tag : /* Empty: union tag ID optional. */
- | \(aq<\(aq IDENTIFIER \(aq>\(aq
- ;
- nlist : nmno
- | nlist nmno
- ;
- nmno : IDENTIFIER /* Note: literal invalid with % type. */
- | IDENTIFIER NUMBER /* Note: invalid with % type. */
- ;
- .P
- /* Rule section */
- .P
- rules : C_IDENTIFIER rbody prec
- | rules rule
- ;
- rule : C_IDENTIFIER rbody prec
- | \(aq|\(aq rbody prec
- ;
- rbody : /* empty */
- | rbody IDENTIFIER
- | rbody act
- ;
- act : \(aq{\(aq
- {
- /* Copy action, translate $$, and so on. */
- }
- \(aq}\(aq
- ;
- prec : /* Empty */
- | PREC IDENTIFIER
- | PREC IDENTIFIER act
- | prec \(aq;\(aq
- ;
- .fi
- .P
- .RE
- .sp
- .SS "Conflicts"
- .P
- The parser produced for an input grammar may contain states in which
- conflicts occur. The conflicts occur because the grammar is not
- LALR(1). An ambiguous grammar always contains at least one LALR(1)
- conflict. The
- .IR yacc
- utility shall resolve all conflicts, using either default rules or
- user-specified precedence rules.
- .P
- Conflicts are either shift/reduce conflicts or reduce/reduce
- conflicts. A shift/reduce conflict is where, for a given state and
- lookahead symbol, both a shift action and a reduce action are
- possible. A reduce/reduce conflict is where, for a given state and
- lookahead symbol, reductions by two different rules are possible.
- .P
- The rules below describe how to specify what actions to take when a
- conflict occurs. Not all shift/reduce conflicts can be successfully
- resolved this way because the conflict may be due to something other
- than ambiguity, so incautious use of these facilities can cause the
- language accepted by the parser to be much different from that which
- was intended. The description file shall contain sufficient
- information to understand the cause of the conflict. Where ambiguity
- is the reason either the default or explicit rules should be adequate
- to produce a working parser.
- .P
- The declared precedences and associativities (see
- .IR "Declarations Section")
- are used to resolve parsing conflicts as follows:
- .IP " 1." 4
- A precedence and associativity is associated with each grammar rule; it
- is the precedence and associativity of the last token or literal in the
- body of the rule. If the
- .BR %prec
- keyword is used, it overrides this default. Some grammar rules might
- not have both precedence and associativity.
- .IP " 2." 4
- If there is a shift/reduce conflict, and both the grammar rule and the
- input symbol have precedence and associativity associated with them,
- then the conflict is resolved in favor of the action (shift or reduce)
- associated with the higher precedence. If the precedences are the
- same, then the associativity is used; left associative implies reduce,
- right associative implies shift, and non-associative implies an error
- in the string being parsed.
- .IP " 3." 4
- When there is a shift/reduce conflict that cannot be resolved by rule
- 2, the shift is done. Conflicts resolved this way are counted in the
- diagnostic output described in
- .IR "Error Handling".
- .IP " 4." 4
- When there is a reduce/reduce conflict, a reduction is done by the
- grammar rule that occurs earlier in the input sequence. Conflicts
- resolved this way are counted in the diagnostic output described in
- .IR "Error Handling".
- .P
- Conflicts resolved by precedence or associativity shall not be counted
- in the shift/reduce and reduce/reduce conflicts reported by
- .IR yacc
- on either standard error or in the description file.
- .SS "Error Handling"
- .P
- The token
- .BR error
- shall be reserved for error handling. The name
- .BR error
- can be used in grammar rules. It indicates places where the parser can
- recover from a syntax error. The default value of
- .BR error
- shall be 256. Its value can be changed using a
- .BR %token
- declaration. The lexical analyzer should not return the value of
- .BR error .
- .P
- The parser shall detect a syntax error when it is in a state where the
- action associated with the lookahead symbol is
- .BR error .
- A semantic action can cause the parser to initiate error handling by
- executing the macro YYERROR. When YYERROR is executed, the semantic
- action passes control back to the parser. YYERROR cannot be used
- outside of semantic actions.
- .P
- When the parser detects a syntax error, it normally calls
- \fIyyerror\fR()
- with the character string
- .BR \(dqsyntax\ error\(dq
- as its argument. The call shall not be made if the parser is still
- recovering from a previous error when the error is detected. The
- parser is considered to be recovering from a previous error until the
- parser has shifted over at least three normal input symbols since the
- last error was detected or a semantic action has executed the macro
- .IR yyerrok .
- The parser shall not call
- \fIyyerror\fR()
- when YYERROR is executed.
- .P
- The macro function YYRECOVERING shall return 1 if a syntax error
- has been detected and the parser has not yet fully recovered from it.
- Otherwise, zero shall be returned.
- .P
- When a syntax error is detected by the parser, the parser shall check
- if a previous syntax error has been detected. If a previous error was
- detected, and if no normal input symbols have been shifted since the
- preceding error was detected, the parser checks if the lookahead symbol
- is an endmarker (see
- .IR "Interface to the Lexical Analyzer").
- If it is, the parser shall return with a non-zero value. Otherwise,
- the lookahead symbol shall be discarded and normal parsing shall
- resume.
- .P
- When YYERROR is executed or when the parser detects a syntax error and
- no previous error has been detected, or at least one normal input
- symbol has been shifted since the previous error was detected, the
- parser shall pop back one state at a time until the parse stack is
- empty or the current state allows a shift over
- .BR error .
- If the parser empties the parse stack, it shall return with a non-zero
- value. Otherwise, it shall shift over
- .BR error
- and then resume normal parsing. If the parser reads a lookahead symbol
- before the error was detected, that symbol shall still be the lookahead
- symbol when parsing is resumed.
- .P
- The macro
- .IR yyerrok
- in a semantic action shall cause the parser to act as if it has fully
- recovered from any previous errors. The macro
- .IR yyclearin
- shall cause the parser to discard the current lookahead token. If the
- current lookahead token has not yet been read,
- .IR yyclearin
- shall have no effect.
- .P
- The macro YYACCEPT shall cause the parser to return with the value
- zero. The macro YYABORT shall cause the parser to return with a
- non-zero value.
- .SS "Interface to the Lexical Analyzer"
- .P
- The
- \fIyylex\fR()
- function is an integer-valued function that returns a
- .IR "token number"
- representing the kind of token read. If there is a value associated
- with the token returned by
- \fIyylex\fR()
- (see the discussion of
- .IR tag
- above), it shall be assigned to the external variable
- .IR yylval .
- .P
- If the parser and
- \fIyylex\fR()
- do not agree on these token numbers, reliable communication between
- them cannot occur. For (single-byte character) literals, the token is
- simply the numeric value of the character in the current character set.
- The numbers for other tokens can either be chosen by
- .IR yacc ,
- or chosen by the user. In either case, the
- .BR #define
- construct of C is used to allow
- \fIyylex\fR()
- to return these numbers symbolically. The
- .BR #define
- statements are put into the code file, and the header file if that file
- is requested. The set of characters permitted by
- .IR yacc
- in an identifier is larger than that permitted by C. Token names found
- to contain such characters shall not be included in the
- .BR #define
- declarations.
- .P
- If the token numbers are chosen by
- .IR yacc ,
- the tokens other than literals shall be assigned numbers greater than
- 256, although no order is implied. A token can be explicitly assigned a
- number by following its first appearance in the declarations section
- with a number. Names and literals not defined this way retain their
- default definition. All token numbers assigned by
- .IR yacc
- shall be unique and distinct from the token numbers used for literals
- and user-assigned tokens. If duplicate token numbers cause conflicts in
- parser generation,
- .IR yacc
- shall report an error; otherwise, it is unspecified whether the token
- assignment is accepted or an error is reported.
- .P
- The end of the input is marked by a special token called the
- .IR endmarker ,
- which has a token number that is zero or negative. (These values are
- invalid for any other token.) All lexical analyzers shall return zero
- or negative as a token number upon reaching the end of their input. If
- the tokens up to, but excluding, the endmarker form a structure that
- matches the start symbol, the parser shall accept the input. If the
- endmarker is seen in any other context, it shall be considered an
- error.
- .SS "Completing the Program"
- .P
- In addition to
- \fIyyparse\fR()
- and
- \fIyylex\fR(),
- the functions
- \fIyyerror\fR()
- and
- \fImain\fR()
- are required to make a complete program. The application can supply
- \fImain\fR()
- and
- \fIyyerror\fR(),
- or those routines can be obtained from the
- .IR yacc
- library.
- .SS "Yacc Library"
- .P
- The following functions shall appear only in the
- .IR yacc
- library accessible through the
- .BR "\-l\ y"
- operand to
- .IR c99 ;
- they can therefore be redefined by a conforming application:
- .IP "\fBint\ \fImain\fR(\fBvoid\fR)" 6
- .br
- This function shall call
- \fIyyparse\fR()
- and exit with an unspecified value. Other actions within this function
- are unspecified.
- .IP "\fBint\ \fIyyerror\fR(\fBconst\ char\fR\ *\fIs\fR)" 6
- .br
- This function shall write the NUL-terminated argument to standard
- error, followed by a
- <newline>.
- .P
- The order of the
- .BR "\-l\ y"
- and
- .BR "\-l\ l"
- operands given to
- .IR c99
- is significant; the application shall either provide its own
- \fImain\fR()
- function or ensure that
- .BR "\-l\ y"
- precedes
- .BR "\-l\ l" .
- .SS "Debugging the Parser"
- .P
- The parser generated by
- .IR yacc
- shall have diagnostic facilities in it that can be optionally enabled
- at either compile time or at runtime (if enabled at compile time).
- The compilation of the runtime debugging code is under the control of
- YYDEBUG, a preprocessor symbol. If YYDEBUG has a non-zero value, the
- debugging code shall be included. If its value is zero, the code shall
- not be included.
- .P
- In parsers where the debugging code has been included, the external
- .BR int
- .IR yydebug
- can be used to turn debugging on (with a non-zero value) and off (zero
- value) at runtime. The initial value of
- .IR yydebug
- shall be zero.
- .P
- When
- .BR \-t
- is specified, the code file shall be built such that, if YYDEBUG is not
- already defined at compilation time (using the
- .IR c99
- .BR \-D
- YYDEBUG option, for example), YYDEBUG shall be set explicitly to 1.
- When
- .BR \-t
- is not specified, the code file shall be built such that, if YYDEBUG is
- not already defined, it shall be set explicitly to zero.
- .P
- The format of the debugging output is unspecified but includes at least
- enough information to determine the shift and reduce actions, and the
- input symbols. It also provides information about error recovery.
- .SS "Algorithms"
- .P
- The parser constructed by
- .IR yacc
- implements an LALR(1) parsing algorithm as documented in the
- literature. It is unspecified whether the parser is table-driven or
- direct-coded.
- .P
- A parser generated by
- .IR yacc
- shall never request an input symbol from
- \fIyylex\fR()
- while in a state where the only actions other than the error action are
- reductions by a single rule.
- .P
- The literature of parsing theory defines these concepts.
- .SS "Limits"
- .P
- The
- .IR yacc
- utility may have several internal tables. The minimum maximums for
- these tables are shown in the following table. The exact meaning of
- these values is implementation-defined. The implementation shall
- define the relationship between these values and between them and any
- error messages that the implementation may generate should it run out
- of space for any internal structure. An implementation may combine
- groups of these resources into a single pool as long as the total
- available to the user does not fall below the sum of the sizes
- specified by this section.
- .br
- .sp
- .ce 1
- \fBTable: Internal Limits in \fIyacc\fP\fR
- .ad l
- .TS
- center box tab(@);
- cB | cB | cB
- cB | cB | cB
- l | n | lw(3i).
- @Minimum
- Limit@Maximum@Description
- _
- {NTERMS}@126@Number of tokens.
- {NNONTERM}@200@Number of non-terminals.
- {NPROD}@300@Number of rules.
- {NSTATES}@600@Number of states.
- {MEMSIZE}@5\|200@T{
- Length of rules. The total length, in names (tokens and
- non-terminals), of all the rules of the grammar. The left-hand side is
- counted for each rule, even if it is not explicitly repeated, as
- specified in
- .IR "Grammar Rules in yacc".
- T}
- {ACTSIZE}@4\|000@T{
- Number of actions. ``Actions'' here (and in the description file)
- refer to parser actions (shift, reduce, and so on) not to semantic
- actions defined in
- .IR "Grammar Rules in yacc".
- T}
- .TE
- .ad b
- .SH "EXIT STATUS"
- The following exit values shall be returned:
- .IP "\00" 6
- Successful completion.
- .IP >0 6
- An error occurred.
- .SH "CONSEQUENCES OF ERRORS"
- If any errors are encountered, the run is aborted and
- .IR yacc
- exits with a non-zero status. Partial code files and header files
- may be produced. The summary information in the description file
- shall always be produced if the
- .BR \-v
- flag is present.
- .LP
- .IR "The following sections are informative."
- .SH "APPLICATION USAGE"
- Historical implementations experience name conflicts on the names
- .BR yacc.tmp ,
- .BR yacc.acts ,
- .BR yacc.debug ,
- .BR y.tab.c ,
- .BR y.tab.h ,
- and
- .BR y.output
- if more than one copy of
- .IR yacc
- is running in a single directory at one time. The
- .BR \-b
- option was added to overcome this problem. The related problem of
- allowing multiple
- .IR yacc
- parsers to be placed in the same file was addressed by adding a
- .BR \-p
- option to override the previously hard-coded
- .BR yy
- variable prefix.
- .P
- The description of the
- .BR \-p
- option specifies the minimal set of function and variable names that
- cause conflict when multiple parsers are linked together. YYSTYPE does
- not need to be changed. Instead, the programmer can use
- .BR \-b
- to give the header files for different parsers different names, and
- then the file with the
- \fIyylex\fR()
- for a given parser can include the header for that parser. Names such
- as
- .IR yyclearerr
- do not need to be changed because they are used only in the actions;
- they do not have linkage. It is possible that an implementation has
- other names, either internal ones for implementing things such as
- .IR yyclearerr ,
- or providing non-standard features that it wants to change with
- .BR \-p .
- .P
- Unary operators that are the same token as a binary operator in general
- need their precedence adjusted. This is handled by the
- .BR %prec
- advisory symbol associated with the particular grammar rule defining
- that unary operator. (See
- .IR "Grammar Rules in yacc".)
- Applications are not required to use this operator for unary operators,
- but the grammars that do not require it are rare.
- .SH EXAMPLES
- Access to the
- .IR yacc
- library is obtained with library search operands to
- .IR c99 .
- To use the
- .IR yacc
- library
- \fImain\fR():
- .sp
- .RS 4
- .nf
- c99 y.tab.c -l y
- .fi
- .P
- .RE
- .P
- Both the
- .IR lex
- library and the
- .IR yacc
- library contain
- \fImain\fR().
- To access the
- .IR yacc
- \fImain\fR():
- .sp
- .RS 4
- .nf
- c99 y.tab.c lex.yy.c -l y -l l
- .fi
- .P
- .RE
- .P
- This ensures that the
- .IR yacc
- library is searched first, so that its
- \fImain\fR()
- is used.
- .P
- The historical
- .IR yacc
- libraries have contained two simple functions that are normally coded
- by the application programmer. These functions are similar to the
- following code:
- .sp
- .RS 4
- .nf
- #include <locale.h>
- int main(void)
- {
- extern int yyparse();
- .P
- setlocale(LC_ALL, "");
- .P
- /* If the following parser is one created by lex, the
- application must be careful to ensure that LC_CTYPE
- and LC_COLLATE are set to the POSIX locale. */
- (void) yyparse();
- return (0);
- }
- .P
- #include <stdio.h>
- .P
- int yyerror(const char *msg)
- {
- (void) fprintf(stderr, "%s\en", msg);
- return (0);
- }
- .fi
- .P
- .RE
- .SH RATIONALE
- The references in
- .BR "Referenced Documents"
- may be helpful in constructing the parser generator. The referenced DeRemer and Pennello article (along
- with the works it references) describes a technique to generate parsers
- that conform to this volume of POSIX.1\(hy2017. Work in this area continues to be done, so
- implementors should consult current literature before doing any new
- implementations. The original Knuth article is the theoretical basis for this
- kind of parser, but the tables it generates are impractically large for
- reasonable grammars and should not be used. The ``equivalent to''
- wording is intentional to assure that the best tables that are LALR(1)
- can be generated.
- .P
- There has been confusion between the class of grammars, the algorithms
- needed to generate parsers, and the algorithms needed to parse the
- languages. They are all reasonably orthogonal. In particular, a parser
- generator that accepts the full range of LR(1) grammars need not
- generate a table any more complex than one that accepts SLR(1) (a
- relatively weak class of LR grammars) for a grammar that happens to be
- SLR(1). Such an implementation need not recognize the case, either;
- table compression can yield the SLR(1) table (or one even smaller than
- that) without recognizing that the grammar is SLR(1).
- The speed of an LR(1) parser for any class is dependent more upon the
- table representation and compression (or the code generation if a
- direct parser is generated) than upon the class of grammar that the
- table generator handles.
- .P
- The speed of the parser generator is somewhat dependent upon the class
- of grammar it handles. However, the original Knuth article algorithms for
- constructing LR parsers were judged by its author to be impractically
- slow at that time. Although full LR is more complex than LALR(1), as
- computer speeds and algorithms improve, the difference (in terms of
- acceptable wall-clock execution time) is becoming less significant.
- .P
- Potential authors are cautioned that the referenced DeRemer and Pennello article previously cited
- identifies a bug (an over-simplification of the computation of LALR(1)
- lookahead sets) in some of the LALR(1) algorithm statements that
- preceded it to publication. They should take the time to seek out that
- paper, as well as current relevant work, particularly Aho's.
- .P
- The
- .BR \-b
- option was added to provide a portable method for permitting
- .IR yacc
- to work on multiple separate parsers in the same directory. If a
- directory contains more than one
- .IR yacc
- grammar, and both grammars are constructed at the same time (by, for
- example, a parallel
- .IR make
- program), conflict results. While the solution is not historical
- practice, it corrects a known deficiency in historical implementations.
- Corresponding changes were made to all sections that referenced the
- filenames
- .BR y.tab.c
- (now ``the code file''),
- .BR y.tab.h
- (now ``the header file''), and
- .BR y.output
- (now ``the description file'').
- .P
- The grammar for
- .IR yacc
- input is based on System V documentation. The textual description shows
- there that the
- .BR ';'
- is required at the end of the rule. The grammar and the implementation
- do not require this. (The use of
- .BR C_IDENTIFIER
- causes a reduce to occur in the right place.)
- .P
- Also, in that implementation, the constructs such as
- .BR %token
- can be terminated by a
- <semicolon>,
- but this is not permitted by the grammar. The keywords such as
- .BR %token
- can also appear in uppercase, which is again not discussed. In most
- places where
- .BR '%'
- is used,
- <backslash>
- can be substituted, and there are alternate spellings for some of the
- symbols (for example,
- .BR %LEFT
- can be
- .BR \(dq%<\(dq
- or even
- .BR \(dq\e<\(dq ).
- .P
- Historically, <\fItag\fP> can contain any characters except
- .BR '>' ,
- including white space, in the implementation. However, since the
- .IR tag
- must reference an ISO\ C standard union member, in practice conforming
- implementations need to support only the set of characters for ISO\ C standard
- identifiers in this context.
- .P
- Some historical implementations are known to accept actions that are
- terminated by a period. Historical implementations often allow
- .BR '$'
- in names. A conforming implementation does not need to support either
- of these behaviors.
- .P
- Deciding when to use
- .BR %prec
- illustrates the difficulty in specifying the behavior of
- .IR yacc .
- There may be situations in which the
- .IR grammar
- is not, strictly speaking, in error, and yet
- .IR yacc
- cannot interpret it unambiguously. The resolution of ambiguities in the
- grammar can in many instances be resolved by providing additional
- information, such as using
- .BR %type
- or
- .BR %union
- declarations. It is often easier and it usually yields a smaller parser
- to take this alternative when it is appropriate.
- .P
- The size and execution time of a program produced without the runtime
- debugging code is usually smaller and slightly faster in historical
- implementations.
- .P
- Statistics messages from several historical implementations include the
- following types of information:
- .sp
- .RS 4
- .nf
- \fIn\fR/512 terminals, \fIn\fR/300 non-terminals
- \fIn\fR/600 grammar rules, \fIn\fR/1\|500 states
- \fIn\fR shift/reduce, \fIn\fR reduce/reduce conflicts reported
- \fIn\fR/350 working sets used
- Memory: states, etc. \fIn\fR/15\|000, parser \fIn\fR/15\|000
- \fIn\fR/600 distinct lookahead sets
- \fIn\fR extra closures
- \fIn\fR shift entries, \fIn\fR exceptions
- \fIn\fR goto entries
- \fIn\fR entries saved by goto default
- Optimizer space used: input \fIn\fR/15\|000, output \fIn\fR/15\|000
- \fIn\fR table entries, \fIn\fR zero
- Maximum spread: \fIn\fR, Maximum offset: \fIn\fR
- .fi
- .P
- .RE
- .P
- The report of internal tables in the description file is left
- implementation-defined because all aspects of these limits are also
- implementation-defined. Some implementations may use dynamic
- allocation techniques and have no specific limit values to report.
- .P
- The format of the
- .BR y.output
- file is not given because specification of the format was not seen to
- enhance applications portability. The listing is primarily intended to
- help human users understand and debug the parser; use of
- .BR y.output
- by a conforming application script would be unusual. Furthermore,
- implementations have not produced consistent output and no popular
- format was apparent. The format selected by the implementation should
- be human-readable, in addition to the requirement that it be a text
- file.
- .P
- Standard error reports are not specifically described because they are
- seldom of use to conforming applications and there was no reason to
- restrict implementations.
- .P
- Some implementations recognize
- .BR \(dq={\(dq
- as equivalent to
- .BR '{'
- because it appears in historical documentation. This construction was
- recognized and documented as obsolete as long ago as 1978, in the
- referenced \fIYacc: Yet Another Compiler-Compiler\fP. This volume of POSIX.1\(hy2017 chose to leave it as obsolete and omit it.
- .P
- Multi-byte characters should be recognized by the lexical analyzer and
- returned as tokens. They should not be returned as multi-byte
- character literals. The token
- .BR error
- that is used for error recovery is normally assigned the value 256 in
- the historical implementation. Thus, the token value 256, which is used
- in many multi-byte character sets, is not available for use as the
- value of a user-defined token.
- .SH "FUTURE DIRECTIONS"
- None.
- .SH "SEE ALSO"
- .IR "\fIc99\fR\^",
- .IR "\fIlex\fR\^"
- .P
- The Base Definitions volume of POSIX.1\(hy2017,
- .IR "Chapter 8" ", " "Environment Variables",
- .IR "Section 12.2" ", " "Utility Syntax Guidelines"
- .\"
- .SH COPYRIGHT
- Portions of this text are reprinted and reproduced in electronic form
- from IEEE Std 1003.1-2017, Standard for Information Technology
- -- Portable Operating System Interface (POSIX), The Open Group Base
- Specifications Issue 7, 2018 Edition,
- Copyright (C) 2018 by the Institute of
- Electrical and Electronics Engineers, Inc and The Open Group.
- In the event of any discrepancy between this version and the original IEEE and
- The Open Group Standard, the original IEEE and The Open Group Standard
- is the referee document. The original Standard can be obtained online at
- http://www.opengroup.org/unix/online.html .
- .PP
- Any typographical or formatting errors that appear
- in this page are most likely
- to have been introduced during the conversion of the source files to
- man page format. To report such errors, see
- https://www.kernel.org/doc/man-pages/reporting_bugs.html .