yacc.1 (13759B)
- .\" $Id: yacc.1,v 1.44 2024/12/31 15:46:49 tom Exp $
- .\"
- .TH YACC 1 2024-12-31 "Berkeley Yacc" "User Commands"
- .
- .ds N Yacc
- .ds n yacc
- .
- .ie n .ds CW R
- .el \{
- .ie \n(.g .ds CW CR
- .el .ds CW CW
- .\}
- .
- .de Ex
- .RS +7
- .PP
- .nf
- .ft \*(CW
- ..
- .de Ee
- .fi
- .ft R
- .RE
- ..
- .\" Escape single quotes in literal strings from groff's Unicode transform.
- .ie \n(.g \{\
- .ds `` \(lq
- .ds '' \(rq
- .ds ' \(aq
- .\}
- .el \{\
- .ie t .ds `` ``
- .el .ds `` ""
- .ie t .ds '' ''
- .el .ds '' ""
- .ie t .ds ' \(aq
- .el .ds ' '
- .\}
- .\" Bulleted paragraph
- .de bP
- .ie n .IP \(bu 4
- .el .IP \(bu 2
- ..
- .SH NAME
- \*N \-
- an LALR(1) parser generator
- .SH SYNOPSIS
- .B \*n [ \-BdghilLPrstvVy ] [ \-b
- .I file_prefix
- .B ] [ \-H
- .I defines_file
- .B ] [ \-o
- .I output_file
- .B ] [ \-p
- .I symbol_prefix
- .B ]
- .I filename
- .SH DESCRIPTION
- .B \*N
- reads the grammar specification in the file
- .I filename
- and generates an LALR(1) parser for it.
- The parsers consist of a set of LALR(1) parsing tables and a driver routine
- written in the C programming language.
- .B \*N
- normally writes the parse tables and the driver routine to the file
- .IR y.tab.c .
- .PP
- The following options are available:
- .TP 5
- \fB\-b \fIfile_prefix\fR
- The
- .B \-b
- option changes the prefix prepended to the output file names to
- the string denoted by
- .IR file_prefix .
- The default prefix is the character
- .IR y .
- .TP
- .B \-B
- create a backtracking parser (compile-time configuration for \fBbtyacc\fP).
- .TP
- .B \-d
- causes the header file
- .B y.tab.h
- to be written.
- It contains #define's for the token identifiers.
- .TP
- .B \-h
- print a usage message to the standard error.
- .TP
- \fB\-H \fIdefines_file\fR
- causes #define's for the token identifiers
- to be written to the given \fIdefines_file\fP rather
- than the \fBy.tab.h\fP file used by the \fB\-d\fP option.
- .TP
- .B \-g
- The
- .B \-g
- option causes a graphical description of the generated LALR(1) parser to
- be written to the file
- .B y.dot
- in graphviz format, ready to be processed by
- .BR dot (1).
- .TP
- .B \-i
- The \fB\-i\fR option causes a supplementary header file
- .B y.tab.i
- to be written.
- It contains extern declarations
- and supplementary #define's as needed to map the conventional \fIyacc\fP
- \fByy\fP-prefixed names to whatever the \fB\-p\fP option may specify.
- The code file, e.g., \fBy.tab.c\fP is modified to #include this file
- as well as the \fBy.tab.h\fP file, enforcing consistent usage of the
- symbols defined in those files.
- .IP
- The supplementary header file makes it simpler to separate compilation
- of lex- and yacc-files.
- .TP
- .B \-l
- If the
- .B \-l
- option is not specified,
- .B \*n
- will insert \fI#line\fP directives in the generated code.
- The \fI#line\fP directives let the C compiler relate errors in the
- generated code to the user's original code.
- If the \fB\-l\fR option is specified,
- .B \*n
- will not insert the \fI#line\fP directives.
- \&\fI#line\fP directives specified by the user will be retained.
- .TP
- .B \-L
- enable position processing,
- e.g., \*(``%locations\*('' (compile-time configuration for \fBbtyacc\fP).
- .TP
- \fB\-o \fIoutput_file\fR
- specify the filename for the parser file.
- If this option is not given, the output filename is
- the file prefix concatenated with the file suffix, e.g., \fBy.tab.c\fP.
- This overrides the \fB\-b\fP option.
- .TP
- \fB\-p \fIsymbol_prefix\fR
- The
- .B \-p
- option changes the prefix prepended to yacc-generated symbols to
- the string denoted by
- .IR symbol_prefix .
- The default prefix is the string
- .BR yy .
- .TP
- .B \-P
- create a reentrant parser, e.g., \*(``%pure\-parser\*(''.
- .TP
- .B \-r
- The
- .B \-r
- option causes
- .B \*n
- to produce separate files for code and tables.
- The code file is named
- .IR y.code.c ,
- and the tables file is named
- .IR y.tab.c .
- The prefix \*(``\fIy.\fP\*('' can be overridden using the \fB\-b\fP option.
- .TP
- .B \-s
- suppress \*(``\fB#define\fP\*('' statements generated for string literals in
- a \*(``\fB%token\fP\*('' statement,
- to more closely match original \fByacc\fP behavior.
- .IP
- Normally when \fB\*n\fP sees a line such as
- .Ex
- %token OP_ADD "ADD"
- .Ee
- .IP
- it notices that the quoted \*(``ADD\*('' is a valid C identifier,
- and generates a #define not only for OP_ADD,
- but for ADD as well,
- e.g.,
- .Ex
- #define OP_ADD 257
- .br
- #define ADD 258
- .Ee
- .IP
- The original \fByacc\fP does not generate the second \*(``\fB#define\fP\*(''.
- The \fB\-s\fP option suppresses this \*(``\fB#define\fP\*(''.
- .IP
- POSIX (IEEE 1003.1 2004) documents only names and numbers
- for \*(``\fB%token\fP\*('',
- though original \fByacc\fP and bison also accept string literals.
- .TP
- .B \-t
- The
- .B \-t
- option changes the preprocessor directives generated by
- .B \*n
- so that debugging statements will be incorporated in the compiled code.
- .IP
- \fB\*N\fR sends debugging output to the standard output
- (compatible with both the original \fByacc\fP and \fBbtyacc\fP),
- while \fBbison\fP writes debugging output to the standard error.
- .TP
- .B \-v
- The
- .B \-v
- option causes a human-readable description of the generated parser to
- be written to the file
- .IR y.output .
- .TP
- .B \-V
- print the version number to the standard output.
- .TP
- .B \-y
- \fB\*n\fP ignores this option,
- which bison supports for ostensible POSIX compatibility.
- .PP
- The \fIfilename\fP parameter is not optional.
- However, \fB\*n\fP accepts a single \*(``\-\*('' to read the grammar
- from the standard input.
- A double \*(``\-\-\*('' marker denotes the end of options.
- A single \fIfilename\fP parameter is expected after a \*(``\-\-\*('' marker.
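- .PP
- As a rough illustration of how these options combine
- (the grammar file \fIcalc.y\fP, the output name \fIcalc_parser.c\fP
- and the prefix \fIcalc_\fP are hypothetical names chosen for this sketch):
- .Ex
- # write calc.tab.c and calc.tab.h rather than y.tab.c and y.tab.h
- \*n \-d \-b calc calc.y
- # rename yy-prefixed symbols, e.g., yyparse becomes calc_parse
- \*n \-p calc_ \-o calc_parser.c calc.y
- # read the grammar from the standard input
- \*n \-d \- < calc.y
- # "\-\-" ends the options; the next word is the filename
- \*n \-v \-\- calc.y
- .Ee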
- .
- .SH DIAGNOSTICS
- If there are rules that are never reduced, the number of such rules is
- reported on standard error.
- If there are any LALR(1) conflicts, the number of conflicts is reported
- on standard error.
- .SH EXTENSIONS
- .B \*N
- provides some extensions for
- compatibility with bison and other implementations of yacc.
- It accepts several \fIlong options\fP which have equivalents in \*n.
- The \fB%destructor\fP and \fB%locations\fP features are available
- only if \fB\*n\fP has been configured and compiled to support the
- back-tracking (\fBbtyacc\fP) functionality.
- The remaining features are always available:
- .TP
- \fB %code\fP \fIkeyword\fP { \fIcode\fP }
- Adds the indicated source \fIcode\fP at a given point in the output file.
- The optional \fIkeyword\fP tells \fB\*n\fP where to insert the \fIcode\fP:
- .RS 7
- .TP 5
- \fBtop\fP
- just after the version-definition in the generated code-file.
- .TP 5
- \fBrequires\fP
- just after the declaration of public parser variables.
- If the \fB\-d\fP option is given, the code is inserted at the
- beginning of the defines-file.
- .TP 5
- \fBprovides\fP
- just after the declaration of private parser variables.
- If the \fB\-d\fP option is given, the code is inserted at the
- end of the defines-file.
- .RE
- .IP
- If no \fIkeyword\fP is given, the code is inserted at the
- beginning of the section of code copied verbatim from the source file.
- Multiple \fB%code\fP directives may be given;
- \fB\*n\fP inserts those into the corresponding code- or defines-file
- in the order that they appear in the source file.
- .TP
- \fB %debug\fP
- This has the same effect as the \*(``\-t\*('' command-line option.
- .TP
- \fB %destructor\fP { \fIcode\fP } \fIsymbol+\fP
- defines code that is invoked when a symbol is automatically
- discarded during error recovery.
- This code can be used to
- reclaim dynamically allocated memory associated with the corresponding
- semantic value for cases where user actions cannot manage the memory
- explicitly.
- .IP
- On encountering a parse error, the generated parser
- discards symbols on the stack and input tokens until it reaches a state
- that will allow parsing to continue.
- This error recovery approach results in a memory leak
- if the \fBYYSTYPE\fP value is, or contains,
- pointers to dynamically allocated memory.
- .IP
- The bracketed \fIcode\fP is invoked whenever the parser discards one of
- the symbols.
- Within \fIcode\fP, \*(``\fB$$\fP\*('' or
- \*(``\fB$<\fItag\fB>$\fR\*('' designates the semantic value associated with the
- discarded symbol, and \*(``\fB@$\fP\*('' designates its location (see
- \fB%locations\fP directive).
- .IP
- A per-symbol destructor is defined by listing a grammar symbol
- in \fIsymbol+\fP.
- A per-type destructor is defined by listing
- a semantic type tag (e.g., \*(``<some_tag>\*('') in \fIsymbol+\fP;
- in this case, the parser will invoke \fIcode\fP whenever it discards any grammar
- symbol that has that semantic type tag, unless that symbol has its own
- per-symbol destructor.
- .IP
- Two categories of default destructor are supported that are
- invoked when discarding any grammar symbol that has no per-symbol and no
- per-type destructor:
- .RS
- .bP
- the code for \*(``\fB<*>\fP\*('' is used
- for grammar symbols that have an explicitly declared semantic type tag
- (via \*(``\fB%type\fP\*('');
- .bP
- the code for \*(``\fB<>\fP\*('' is used
- for grammar symbols that have no declared semantic type tag.
- .RE
- .TP
- \fB %empty\fP
- ignored by \fB\*n\fP.
- .TP
- \fB %expect\fP \fInumber\fP
- tells \fB\*n\fP the expected number of shift/reduce conflicts.
- \fB\*N\fP then reports the number of conflicts only if it differs.
- .TP
- \fB %expect\-rr\fP \fInumber\fP
- tells \fB\*n\fP the expected number of reduce/reduce conflicts.
- \fB\*N\fP then reports the number of conflicts only if it differs.
- This is (unlike bison) allowable in LALR parsers.
- .TP
- \fB %locations\fP
- tells \fB\*n\fP to enable management of position information associated
- with each token, provided by the lexer in the global variable \fByylloc\fP,
- similar to management of semantic value information provided in \fByylval\fP.
- .IP
- As for semantic values, locations can be referenced within actions using
- \fB@$\fP to refer to the location of the left hand side symbol, and \fB@\fIN\fR
- (\fIN\fP an integer) to refer to the location of one of the right hand side
- symbols.
- Also as for semantic values, when a rule is matched, a default
- action is used to compute the location represented by \fB@$\fP as the
- beginning of the first symbol and the end of the last symbol in the right
- hand side of the rule.
- This default computation can be overridden by
- explicit assignment to \fB@$\fP in a rule action.
- .IP
- The type of \fByylloc\fP is \fBYYLTYPE\fP, which is defined by default as:
- .Ex
- typedef struct YYLTYPE {
- int first_line;
- int first_column;
- int last_line;
- int last_column;
- } YYLTYPE;
- .Ee
- .IP
- \fBYYLTYPE\fP can be redefined by the user
- (\fBYYLTYPE_IS_DEFINED\fP must be defined, to inhibit the default)
- in the declarations section of the specification file.
- As in bison, the macro \fBYYLLOC_DEFAULT\fP is invoked
- each time a rule is matched to calculate a position for the left hand side of
- the rule, before the associated action is executed; this macro can be
- redefined by the user.
- .IP
- This directive adds a \fBYYLTYPE\fP parameter to \fByyerror()\fP.
- If the \fB%pure\-parser\fP directive is present,
- a \fBYYLTYPE\fP parameter is added to \fByylex()\fP calls.
- .TP
- \fB %lex\-param\fP { \fIargument-declaration\fP }
- By default, the lexer accepts no parameters, e.g., \fByylex()\fP.
- Use this directive to add parameter declarations for your customized lexer.
- .TP
- \fB %parse\-param\fP { \fIargument-declaration\fP }
- By default, the parser accepts no parameters, e.g., \fByyparse()\fP.
- Use this directive to add parameter declarations for your customized parser.
- .TP
- \fB %pure\-parser\fP
- Most variables (other than \fByydebug\fP and \fByynerrs\fP) are
- allocated on the stack within \fByyparse\fP, making the parser reasonably
- reentrant.
- .TP
- \fB %token\-table\fP
- Make the parser's names for tokens available in the \fByytname\fP array.
- However,
- .B \*n
- does not predefine \*(``$end\*('', \*(``$error\*(''
- or \*(``$undefined\*('' in this array.
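- .PP
- As a rough sketch of how several of these directives fit together in the
- declarations section
- (the header \fInode.h\fP, \fIstruct node\fP and \fInode_free()\fP are
- hypothetical names, and the \fB%destructor\fP lines assume that \fB\*n\fP was
- configured for back-tracking):
- .Ex
- %{
- #include <stdlib.h>   /* free() */
- %}
- /* with \-d, this is copied to the beginning of the defines-file */
- %code requires {
- #include "node.h"     /* hypothetical: struct node, node_free() */
- }
- %union {
-     char *str;
-     struct node *node;
- }
- %token <str> IDENT
- %type <node> expr
- /* assumes the grammar has exactly one known shift/reduce conflict */
- %expect 1
- /* yyparse() takes this as an extra argument */
- %parse\-param { struct node **result }
- /* per-type destructor: any discarded symbol tagged <str> */
- %destructor { free($$); } <str>
- /* per-symbol destructor: takes precedence for expr */
- %destructor { node_free($$); } expr
- .Ee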
- .
- .SH PORTABILITY
- According to Robert Corbett,
- .Ex
- Berkeley Yacc is an LALR(1) parser generator. Berkeley Yacc
- has been made as compatible as possible with AT&T Yacc.
- Berkeley Yacc can accept any input specification that
- conforms to the AT&T Yacc documentation. Specifications
- that take advantage of undocumented features of AT&T Yacc
- will probably be rejected.
- .Ee
- .PP
- The rationale in
- .Ex
- http://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html
- .Ee
- .PP
- documents some features of AT&T yacc which are no longer required for POSIX
- compliance.
- .PP
- That said, you may be interested in reusing grammar files with some
- other implementation which is not strictly compatible with AT&T yacc.
- For instance, there is bison.
- Here are a few differences:
- .bP
- \fBYacc\fP accepts an equals sign preceding the left curly brace
- of an action (as in the original grammar file \fBftp.y\fP):
- .Ex
- | STAT CRLF
- = {
- statcmd();
- }
- .Ee
- .bP
- \fBYacc\fP and bison emit code in a different order; in particular, bison
- makes forward references to common functions such as yylex, yyparse and
- yyerror without providing prototypes.
- .bP
- Bison's support for \*(``%expect\*('' is broken in more than one release.
- For best results using bison, delete that directive.
- .bP
- Bison has no equivalent for some of \fB\*n\fP's command-line options,
- relying on directives embedded in the grammar file.
- .bP
- Bison's \*(``\fB\-y\fP\*('' option does not affect bison's lack of support for
- features of AT&T yacc which were deemed obsolescent.
- .bP
- \fBYacc\fP accepts multiple parameters
- with \fB%lex\-param\fP and \fB%parse\-param\fP in two forms
- .Ex
- {type1 name1} {type2 name2} ...
- {type1 name1, type2 name2 ...}
- .Ee
- .IP
- Bison accepts the latter (though undocumented), but depending on the
- release may generate bad code.
- .bP
- Like bison, \fB\*n\fP will add parameters specified via \fB%parse\-param\fP
- to \fByyparse\fP, \fByyerror\fP and (if configured for back-tracking)
- to the destructor declared using \fB%destructor\fP.
- Bison puts the additional parameters \fIfirst\fP for
- \fByyparse\fP and \fByyerror\fP but \fIlast\fP for destructors.
- \fBYacc\fP matches this behavior.
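- .PP
- A minimal sketch of the parameter order described above for \fByyparse\fP
- and \fByyerror\fP, assuming a hypothetical \fIstruct ctx\fP passed through
- \fB%parse\-param\fP and no \fB%locations\fP directive
- (the \fIvoid\fP return type of the user-supplied \fByyerror\fP is also an
- assumption):
- .Ex
- %parse\-param { struct ctx *ctx }
- /* parser entry point: takes the extra parameter */
- int yyparse(struct ctx *ctx);
- /* user-supplied: the extra parameter precedes the message */
- void yyerror(struct ctx *ctx, const char *msg);
- .Ee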
- .
- .SH SEE ALSO
- \fBbison\fP(1),
- \fBbtyacc\fP(1),
- \fBlex\fP(1),
- \fBflex\fP(1),
- \fByacc\fP(1)