`<lexer>`¶

A sample XML for the <lexer> section looks like:

<lexer table="ecs">
    <shortcut name="nonws">[^ \t\r\n]</shortcut>
    <shortcut name="word">{nonws}+</shortcut>
    <rule>
        <pattern>{word}</pattern>
        <pattern>{word}[ \t\r\n]*</pattern>
        <action>
            ++wordCount;
        </action>
    </rule>
    <state name="INITIAL,TEST">
        <rule>
            <pattern>.|\n</pattern>
            <action>
                // ignore
            </action>
        </rule>
        <rule state="ANOTHER_STATE">
            <pattern><![CDATA[<<EOF>>]]></pattern>
            <action>
                return 0;  /* exit lexer loop */
            </action>
        </rule>
    </state>
</lexer>

Options for the lexer are specified as attributes of the <lexer> tag.

Attribute	Description
`table`	The DFA table format. Can be one of `"ecs"`, `"full"`, `"compressed"` options. Command line options can override the choice here.
`bol`	Instruct the lexer to keep track of the BOL (beginning of line) information even when there are no patterns use that information.
`warnbackup`	Generate warning of backup lexer states if set to true. Default is false.
`yywrap`	Indicates that `yyWrap ()` function should be called when EOF is encountered. Default is false.
`linemode`	Instruct the lexer to match patterns one line at a time. This mode is primarily useful for interactive modes where inputs are delimited by `'\n'` character. Multi-line patterns will generate warnings since they cannot be matched in this mode.

`<shortcut>`¶

This tag is used to specify frequently used subset of patterns. In the above example, when the pattern {word} is seen, it is replaced with ({nonws}), which is in turn replaced with ([^ \t\r\n]). So the actual pattern is [^ \t\r\n]+.

<shortcut> tags can only be specified as immediate children of <lexer>.

`<state>`¶

<state> tags are used to indicate the state conditions. It has only one attribute name to specify a comma separated list of state names. All rules specified under this tag are automatically added to this particular state. If the name attribute is not specified, it is assumed to be {{{INITIAL}}, which is required as the initial state at the start of the lexer.

`<rule>`¶

Rule tags are used to specify patterns and their associated action codes. It can have multiple <pattern> children, but one and only one <action> child.

Attribute	Description
`state`	A comma separated list of state names that this rule is in. If the current rule is already under a `<state>` tag, then the rule is added to all of them.

`<pattern>`¶

Attribute	Description
`bol`	Specify that the pattern only works at BOL (beginning of line).
`nocase`	Specify that the pattern does case insensitive match.

Although multiple patterns may be under the same rule and share the action code, in actual generated code, the action code is replicated for each pattern. This is to avoid the problem that some patterns may work at BOL while some other patterns may not. To avoid action code replication, try put them inside a single <pattern> tag with | in between.

`<action>`¶

It contains the code to be executed when the pattern is matched.

<lexer>¶

<shortcut>¶

<state>¶

<rule>¶

<pattern>¶

<action>¶