Parser¶
Introduction¶
The generated parser is LALR(1).
The grammar is specified as a series of terminals and non-terminals. For example,
<rhs>VARIABLE '=' expr ';'</rhs>
Extended Grammar¶
Extended grammar are mostly for convenience. New operators ?
, *
,
+
, |
, (
and )
are added. They are described below. Some
of the extended grammar requires a small runtime library
`cookcc-rt
.
The tests for the extended grammar have some examples.
Optional Operator¶
Optional operator ?
is used to indicate a symbol is optional. For
example, in the following grammar
<grammar rule="G">
<rhs>A B?</rhs>
</grammar>
It is equivalent to
<grammar rule="G">
<rhs>A @1</rhs>
</grammar>
<grammar rule="@1">
<rhs>B</rhs>
<action>$$ = $1;</action>
<rhs></rhs>
<action>$$ = null;</action>
</grammar>
With ()
, it is possible to enclose multiple symbols with ?
operator. For example, in the following
<grammar rule="G">
<rhs>A (B C)?</rhs>
</grammar>
It is equivalent to
<grammar rule="G">
<rhs>A C</rhs>
</grammar>
<grammar rule="@1">
<rhs>B C</rhs>
<action>$$ = new ASTNode (); $$.add ($1); $$.add ($2);</action>
<rhs></rhs>
<action>$$ = null;</action>
</grammar>
The collection used here is org.yuanheng.cookcc.ASTNode
which is
part of cookcc-rt runtime.
List Operator¶
List operator +
basically repeats a symbol or a set of symbols for
one or more iterations. It is very common for grammars to have repeated
terminal / non-terminals, so it should be helpful to avoid some tedious
works. For example, in the following grammar
<grammar rule="G">
<rhs>A B+</rhs>
</grammar>
It is equivalent to
<grammar rule="G">
<rhs>A @1</rhs>
</grammar>
<grammar rule="@1">
<rhs>B</rhs>
<action>$$ = new ASTListNode (); $$.add ($1);</action>
<rhs>@1 B</rhs>
<action>$$.add ($2);</action>
</grammar>
The collection used here is org.yuanheng.cookcc.ASTListNode
which is
part of cookcc-rt runtime.
It is possible to repeat a set of symbols, but its use should be rare. For example, in the following grammar
<grammar rule="G">
<rhs>A (B C)+</rhs>
</grammar>
It is equivalent to
<grammar rule="G">
<rhs>A @1</rhs>
</grammar>
<grammar rule="@1">
<rhs>B C</rhs>
<action>$$ = new ASTListNode (); $$.add ($1); $$.add ($2);</action>
<rhs>@1 B C</rhs>
<action>$$.add ($2); $$.add ($3);</action>
</grammar>
It should be noted a single ASTListNode
is used to contain all the
repeated symbols.
To avoid extra ASTListNode
being created, the following two rules
are equivalent.
<rhs>A B+</rhs>
<rhs>A (B)+</rhs>
Optional List Operator¶
Optional list operator *
basically repeats a symbol or a set of
symbols for zero or more iterations. It is very similar to +
operator except that the number of repeats can be zero.
Grouping Operator¶
(
and )
are used enclose symbols. The behavior are the
following.
- It has to enclose at least a symbol. Thus simply
()
is not allowed. - It is usually used in conjunction with other extended grammar operators. See notes in other extended grammar operators.
- You can use
(
and)
simply for grouping one or more symbols without using other extended grammar operators. For example:A (B) C
. An objectorg.yuanheng.cookcc.ASTNode
is used to collect the symbol values. - Nesting such as
((A))
is allowed, but its use probably does not make practical sense.
Or Operator¶
Or operator |
is used to make several possible choices. For example,
in the following grammar
<grammar rule="G">
<rhs>A (B | C)</rhs>
</grammar>
It is equivalent to
<grammar rule="G">
<rhs>A @1</rhs>
</grammar>
<grammar rule="@1">
<rhs>B</rhs>
<action>$$ = $1;</action>
<rhs>C</rhs>
<action>$$ = $1;</action>
</grammar>
It is not possible to specify empty rule with |
operator. Instead,
you should combine with ?
operator for this need. For example, in
the following grammar, A could be followed by B, C, or D.
<grammar rule="G">
<rhs>A (B | C) ? D</rhs>
</grammar>
cookcc-rt Runtime¶
cookcc-rt is a tiny Java runtime library for cookcc. It is only required by certain extended grammars mentioned above.
Parser Table Format¶
Currently, the following table formats are supported.
Format | Description |
---|---|
ecs |
Good when there are not a lot symbols and states. |
compressed |
A smaller table in most cases at some performance cost. |
Default Reduce¶
the command line option -defaultreduce
is specified, DFA states that
contain a reduceable item would convert all 0 (i.e. error) entries to
reduce. This approach can make the compressed table more compact, at the
expense of slightly more difficult error recovery.
Analysis Output¶
When the command line option -analysis
is specified, a file named
cookcc_parser_analysis.txt
is generated in the current directory that contains the detail of the
parser. It can be useful in analyzing the grammar.
Error Recovery¶
<parser>
has an option recovery
which would turn on/off the
error recovery code depending whether or not the value is true
or
false
. This value is by default true
.
Turning off error recovery can be useful since in many cases we do not really care much about the corrupted data, and error recovery can be slow.
The exact behavior of error recovery depends on the specific implementation of output language.
Error Recovery in Java¶
When the recovery
option is set to false
, the parser simply
returns with a value of 1
to indicate that an error has occurred.
The option parseerror
controls whether or not the code generator
should generatet he yyParseError
function. Set this option to
false
if you want to the parser to use your own function.
Otherwise, the behavior of the parser is the following.
When a token not belonging to one of the lookahead (i.e. cannot either
reduce or shift) is encountered. yyParseError
function is called. If
this function returns true
, the parser stops and returns a value
1
. If the function returns false
(by default), an error token is
pushed onto the lookahead stack and an internal error flag
_yyInError
is set. Then the parsing is resumed.
If the error token can be shifted, then a grammar dealing with error recovery is found. Otherwise, the parser would start discarding a state on the stack until a grammar that can handle the “error” token is reached.
With the error token shifted on to the stack. The state should be immediately reduceable if the grammar does not require any tokens after the error token. Otherwise, it means the grammar is looking for a specific terminal. Then the input is continuously consumed until the desired token is found, or the end of file is reached.
Additionally, yyPeekLookahead ()
is provided to check the cause of
the error (only accurate if the user didn’t specify any terminals after
error
in the grammar). yyPopLookahead ()
is provided to remove
the possible offending token. However, this function should be called
only once.
Here are some test cases that demonstrate these behaviors.