Lexer Section
Specifying Shortcuts
@Shortcut defines a single, frequently used pattern that can be reused in actual lexical patterns. Multiple @Shortcut entries can be grouped with the @Shortcuts annotation. Either annotation can be placed on any function. The order of the shortcuts is not important, and a shortcut may contain references to other shortcuts. Just be careful not to create cyclic references.
@Shortcuts ( shortcuts = {
    @Shortcut (name="nonws", pattern="[^ \\t\\n]"),
    @Shortcut (name="ws", pattern="[ \\t]")
})
@Lex (pattern="{nonws}+", state="INITIAL")
void matchWord ()
{
    m_cc += yyLength ();
    ++m_wc;
}
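To make the remarks about ordering and cyclic references concrete, here is a minimal sketch of how shortcut references of the form {name} could be expanded. It is an illustration only and not CookCC's actual implementation: the class ShortcutExpander, its methods, and the choice to wrap each expansion in parentheses are all assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of CookCC.
class ShortcutExpander
{
    // Matches a reference such as {nonws}. Names are assumed to be identifiers,
    // so repetition counts like {2} or {2,3} are left alone.
    private static final Pattern REF = Pattern.compile ("\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    private final Map<String, String> m_shortcuts = new HashMap<> ();

    // Definitions are stored by name, so the order of definition does not matter.
    void define (String name, String pattern)
    {
        m_shortcuts.put (name, pattern);
    }

    String expand (String pattern)
    {
        return expand (pattern, new ArrayDeque<> ());
    }

    private String expand (String pattern, Deque<String> inProgress)
    {
        Matcher m = REF.matcher (pattern);
        StringBuilder sb = new StringBuilder ();
        int last = 0;
        while (m.find ())
        {
            String name = m.group (1);
            String body = m_shortcuts.get (name);
            if (body == null)
                throw new IllegalArgumentException ("undefined shortcut: " + name);
            // A name that is already being expanded means the definitions loop.
            if (inProgress.contains (name))
                throw new IllegalStateException ("cyclic shortcut reference: " + name);
            inProgress.push (name);
            sb.append (pattern, last, m.start ());
            sb.append ('(').append (expand (body, inProgress)).append (')');
            inProgress.pop ();
            last = m.end ();
        }
        sb.append (pattern, last, pattern.length ());
        return sb.toString ();
    }
}
```

With this sketch, a shortcut defined as "{nonws}+" expands correctly even when nonws is defined after it, while two shortcuts that refer to each other are rejected instead of recursing forever.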
Specifying Lexical Patterns
@Lex is used to specify a single lexical pattern. @Lexs is used to specify multiple lexical patterns that share a common action.
Three types of functions can be marked with these two annotations, and each has a different meaning.
None of the functions can be private, since they are called from the generated class. They can be protected, public, or in the package scope (if the generated class is in the same package as this class).
Case 1: Function returns void
This is the simplest case. The lexer calls this function and then moves on to matching the next potential pattern. For example:
@Lex (pattern = "[ \\t\\r\\n]+")
protected void ignoreWhiteSpace ()
{
}
Note that escape sequences need double backslashes here because Java itself also interprets escape sequences. This is perhaps one of the main drawbacks of using Java annotations to specify the lexer. Fortunately, it is usually fairly easy to get the lexer working correctly. IntelliJ IDEA also has a nice feature: pasting text that contains escape sequences, such as [ \t\r\n]+, inside a pair of double quotes automatically adds the extra backslashes.
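The effect of the doubled backslashes can be checked with plain Java, independent of any lexer. The sketch below only shows what the compiler does to each string literal; the class name EscapeDemo is made up for the illustration, and it says nothing about how CookCC's own pattern parser treats raw control characters.

```java
// Hypothetical demo class; it only inspects Java string literals.
class EscapeDemo
{
    // Doubled backslashes: the backslash characters survive compilation,
    // so the pattern text is [ \t\r\n]+ exactly as written in a lexer spec.
    static final String DOUBLE = "[ \\t\\r\\n]+";

    // Single backslashes: javac replaces each escape with the actual
    // tab, carriage return, and line feed characters.
    static final String SINGLE = "[ \t\r\n]+";

    public static void main (String[] args)
    {
        System.out.println (DOUBLE);            // prints [ \t\r\n]+
        System.out.println (DOUBLE.length ());  // prints 10
        System.out.println (SINGLE.length ());  // prints 7
    }
}
```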
Case 2: Function returns a non-int value
In this case, @Lex needs to specify the terminal token that is returned. The return value of the function becomes the value associated with this terminal.
The terminal has to be specified as a String due to a technical limitation.
@Lex (pattern="[0-9]+", token="INTEGER")
protected Integer parseInt ()
{
    return Integer.parseInt (yyText ());
}

@Lexs (patterns = {
    @Lex (pattern = "while", token = "WHILE"),
    @Lex (pattern = "if", token = "IF"),
    @Lex (pattern = "else", token = "ELSE"),
    @Lex (pattern = "print", token = "PRINT")
})
protected Object parseKeyword ()
{
    return null;
}
Case 3: Function returns an int value
In this case, the lexer returns the function's int value itself. For example:
@Lex (pattern="[(){}.]")
protected int parseSymbol ()
{
    return yyText ().charAt (0);
}
Be extra careful if the return value is used as a terminal in the parser. A value that is not one of the parser's valid terminals can cause the parser to terminate early.
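It may help to see which int values an expression like the one in parseSymbol actually produces. The sketch below is plain Java with a made-up class SymbolCodes; it simply relies on the fact that a char widens to its character code when returned as an int.

```java
// Hypothetical demo class; mirrors the shape of parseSymbol without a lexer.
class SymbolCodes
{
    // charAt returns a char, which Java widens to int on return,
    // so the result is the character code of the first character.
    static int code (String text)
    {
        return text.charAt (0);
    }

    public static void main (String[] args)
    {
        for (String s : new String[] { "(", ")", "{", "}", "." })
            System.out.println (s + " -> " + code (s));
        // prints: ( -> 40, ) -> 41, { -> 123, } -> 125, . -> 46 (one per line)
    }
}
```

These are the values the parser has to recognize as terminals if the symbols are used in the grammar.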
Note that when <<EOF>> is encountered, it is necessary to return a value; otherwise the lexer gets into an infinite loop. There are a number of ways of doing so:
@Lex (pattern = "<<EOF>>", token = "$")
protected void parseEOF ()
{
}
Or, more simply:
@Lex (pattern = "<<EOF>>")
protected int parseEOF ()
{
    return 0;
}
This works because the $ terminal corresponds to 0.
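The infinite loop mentioned above can be pictured with a toy model of a scanning loop. This is not CookCC's generated code: the class ToyLexerLoop, the Action interface, and the use of null to mean "no token, keep scanning" are assumptions made purely for illustration, and a spin limit stands in for the loop that would otherwise never end.

```java
// Hypothetical model of a scanning loop; not CookCC's generated lexer.
class ToyLexerLoop
{
    // An action yields a token value, or null to mean "keep scanning".
    interface Action
    {
        Integer run ();
    }

    // Simulates asking for the next token once the input is exhausted.
    static int nextToken (Action eofAction, int maxSpins)
    {
        for (int spin = 0; spin < maxSpins; ++spin)
        {
            // At end of input, only the <<EOF>> action can match again.
            Integer token = eofAction.run ();
            if (token != null)
                return token;
        }
        return -1;  // gave up: stands in for the infinite loop
    }

    public static void main (String[] args)
    {
        System.out.println (nextToken (() -> 0, 1000));     // prints 0
        System.out.println (nextToken (() -> null, 1000));  // prints -1
    }
}
```

In this model an EOF action that returns 0 ends the scan at once, while one that returns nothing is invoked again and again because the input position never advances.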