Java

Introduction

Java is the default output language of CookCC. It can also be selected using the -lang java command line option. The generated class is self-contained and does not require additional libraries.

Command Line Options

Option Version Description
-d <directory> 0.1+ Select the output directory. By default, it is the current directory.
-class <className> 0.1+ Set the class name. By default, the class name is Lexer. The output Java file is generated under the package subdirectories of the output directory that correspond to the class name. The subdirectories are created if they do not exist.
-public 0.1+ Set the class scope to public. By default, the generated class has package scope.
-abstract 0.3+ Make the output class abstract. This also disables generation of the main function.

Code Locations

Code placed in <code name="name"></code> blocks is copied to the locations in the generated file shown in the following example. If the name is not given, it is assumed to be “default”.

/* code name = "fileheader" */
package foo;

/**
 * code name = "classheader"
 */
public class Bar
{
    public int yyLex ()
    {
        // code name = "lexerprolog"

        // case switch codes
    }

    public int yyParse ()
    {
        // code name = "parserprolog"

        // case switch codes
    }

    // code name = "default"
}

Lexer

Word count tests on large files (5 MB - 22 MB) have shown that the ecs table has about the same performance as the full table for Java.

Buffer Size

When a particular match exceeds the buffer size (default 4096), the buffer length is increased by 50%. For very long matches (such as code dumping near the end), set the initial buffer size to the size of the input for the best performance.
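The growth policy described above can be sketched as follows. This is only an illustration of growing a buffer by 50% at a time, not the code CookCC actually generates; the class name BufferGrowthSketch and all of its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of a 50% buffer growth policy; not CookCC's generated code.
final class BufferGrowthSketch
{
    static final int DEFAULT_SIZE = 4096;

    // Returns the next capacity: the current length plus 50%,
    // and always at least one more than the current length.
    static int nextCapacity (int current)
    {
        return Math.max (current + (current >> 1), current + 1);
    }

    // Grows the buffer until it can hold at least 'needed' characters,
    // preserving the existing contents.
    static char[] ensureCapacity (char[] buffer, int needed)
    {
        int capacity = buffer.length;
        while (capacity < needed)
            capacity = nextCapacity (capacity);
        return capacity == buffer.length ? buffer : Arrays.copyOf (buffer, capacity);
    }

    // Counts how many times the buffer must grow to reach 'needed'.
    static int growthSteps (int initial, int needed)
    {
        int steps = 0;
        for (int capacity = initial; capacity < needed; capacity = nextCapacity (capacity))
            ++steps;
        return steps;
    }
}
```

Under this sketch, a single 5 MB match starting from the default size needs 18 reallocations (growthSteps (4096, 5 * 1024 * 1024) returns 18), each copying the data read so far. Sizing the buffer to the input up front avoids all of them.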

yywrap

By default, when the end of the current input is reached, a special <<EOF>> character is generated. To hook this event instead, set the yywrap="true" option for the lexer and define a protected boolean yywrap () function, which is then called at that point.

The yywrap function should return true if no further action should be taken, and false if the lexer should attempt to read from the input again.
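The return value contract can be illustrated with a small self-contained model. The classes below are hypothetical stand-ins, not the lexer CookCC generates: WrapLexerSketch only imitates "call yywrap () when the input runs out", and ChainedInputSketch uses the hook to continue with the next input.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a generated lexer; only the yywrap contract is modeled.
abstract class WrapLexerSketch
{
    private String input = "";
    private int pos;

    // Replaces the current input and restarts reading from its beginning.
    protected void setInput (String text)
    {
        input = text;
        pos = 0;
    }

    // Called at end of input: true = stop, false = try reading again.
    protected abstract boolean yywrap ();

    // Reads every character, consulting yywrap () whenever the input runs out.
    String readAll ()
    {
        StringBuilder sb = new StringBuilder ();
        for (;;)
        {
            if (pos < input.length ())
                sb.append (input.charAt (pos++));
            else if (yywrap ())
                return sb.toString ();
        }
    }
}

// Chains several inputs together by switching to the next one in yywrap ().
final class ChainedInputSketch extends WrapLexerSketch
{
    private final Deque<String> pending = new ArrayDeque<> ();

    ChainedInputSketch (String... inputs)
    {
        for (String s : inputs)
            pending.add (s);
    }

    @Override
    protected boolean yywrap ()
    {
        if (pending.isEmpty ())
            return true;            // nothing left: no further action
        setInput (pending.poll ()); // more input available: read again
        return false;
    }
}
```

For example, new ChainedInputSketch ("ab", "cd").readAll () reads both inputs as if they were a single stream.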

Input Stack

Sometimes it may be useful to suspend the current input and switch to another input temporarily, for instance when handling #include "file". For these cases, the yyPushInput, yyPopInput, and yyInputStackSize functions are provided.

A test example is provided.
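The idea behind an input stack can be modeled in a few lines. The sketch below is an assumption-laden illustration, not the generated API: the class InputStackSketch and its methods pushInput, popInput, and inputStackSize are hypothetical names that only mirror the functions mentioned above, whose actual signatures are not shown here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of an input stack; not the API of the generated lexer.
final class InputStackSketch
{
    private final Deque<Reader> suspended = new ArrayDeque<> ();
    private Reader current;

    InputStackSketch (Reader initial)
    {
        current = initial;
    }

    // Suspends the current input and starts reading from 'next'.
    void pushInput (Reader next)
    {
        suspended.push (current);
        current = next;
    }

    // Closes the current input and resumes the most recently suspended one.
    void popInput () throws IOException
    {
        current.close ();
        current = suspended.pop ();
    }

    // Number of suspended inputs waiting to be resumed.
    int inputStackSize ()
    {
        return suspended.size ();
    }

    // Reads one character; at the end of a nested input, resumes the outer one.
    int read () throws IOException
    {
        for (;;)
        {
            int c = current.read ();
            if (c >= 0 || suspended.isEmpty ())
                return c;
            popInput ();
        }
    }

    // Demo: '#' in the outer input stands in for an #include directive.
    static String demo ()
    {
        try
        {
            InputStackSketch in = new InputStackSketch (new StringReader ("ab#cd"));
            StringBuilder sb = new StringBuilder ();
            for (int c; (c = in.read ()) >= 0; )
            {
                if (c == '#')
                    in.pushInput (new StringReader ("12"));
                else
                    sb.append ((char) c);
            }
            return sb.toString ();
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException (ex);
        }
    }
}
```

In the demo, the nested input "12" is read in full at the point of the '#', after which reading continues with the rest of the outer input.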

Unicode Support

CookCC can generate tables for 16-bit characters. The default input handling, however, does not detect the encoding of the input.

See Input Encoding Detection for more details.
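One common way to supply the missing detection is to inspect a leading byte order mark (BOM) before handing a Reader to the lexer. The following is a minimal sketch under that assumption; it is not necessarily the approach described on the linked page, and the class BomSniffSketch and its methods are hypothetical. It uses only standard java.io and java.nio.charset classes, and it does not distinguish UTF-32 from UTF-16.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: picks a charset from a leading BOM, if present.
final class BomSniffSketch
{
    // Returns a Reader positioned after the BOM, or one using 'fallback' if no BOM is found.
    static Reader open (InputStream in, Charset fallback) throws IOException
    {
        PushbackInputStream pin = new PushbackInputStream (in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length)
        {
            int r = pin.read (head, n, head.length - n);
            if (r < 0)
                break;
            n += r;
        }

        Charset charset = fallback;
        int skip = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF)
        {
            charset = StandardCharsets.UTF_8;
            skip = 3;
        }
        else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF)
        {
            charset = StandardCharsets.UTF_16BE;
            skip = 2;
        }
        else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE)
        {
            charset = StandardCharsets.UTF_16LE;
            skip = 2;
        }

        // Push back everything that was read past the BOM.
        if (n > skip)
            pin.unread (head, skip, n - skip);
        return new InputStreamReader (pin, charset);
    }

    // Demo helper: decodes a whole byte array, falling back to UTF-8 without a BOM.
    static String decode (byte[] data)
    {
        try (Reader r = open (new ByteArrayInputStream (data), StandardCharsets.UTF_8))
        {
            StringBuilder sb = new StringBuilder ();
            for (int c; (c = r.read ()) >= 0; )
                sb.append ((char) c);
            return sb.toString ();
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException (ex);
        }
    }
}
```

The Reader returned by open could then be passed to the lexer in place of a Reader built with the platform default encoding.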

CookCC 0.3.3 generates a string that is too long for Oracle’s Java compiler. A workaround is to compile the generated code with ECJ (the Eclipse Compiler for Java).

CookCC 0.4 fixes this issue.

Performance

Here is a lexer performance chart based on Flex’s fastwc examples, run on a simple 5 MB text file (a lower bar indicates better performance). The programs tested were the MinGW WC program and five versions of word count generated by Flex (full table), CookCC (ecs table), and JFlex (ecs table). MinGW WC and the Flex-generated code were in C, while the CookCC-generated and JFlex-generated code was in Java.

Word Count Performance Chart

The file is too small to really show the differences among the five versions of word count for Flex, but the pattern shows quite noticeably for CookCC, and the performance is as expected (#1 to #4 show gradual improvements, while #5 introduces backups and is actually slower).

There are several reasons why JFlex was so much slower than CookCC. JFlex has a very slow startup time due to its inefficient table packing method; as a result, the DFA table size has a major impact on performance. JFlex also does not have a local variable declaration section, so all variables need to be instance variables. It also does not have a yyLength variable and must call the yylength () function instead.

As a side note, the choice between the ecs table and the full table did not make much difference for CookCC.