Lexing & Parsing

In the final part of the project we will generate ASTs from textual representations of MiniJava programs.

Usage

java -jar mjavac.jar parse marshal inputProg.java out.xml

will write to out.xml a valid XML representation (per the schema) of the AST that corresponds to the input MiniJava program. (More about the XML representation of ASTs here.) Including line numbers in the resulting AST/XML file is completely optional.

In case of a parser/lexer error, no file should be created, an an error message starting with "Syntax error at line 11 of input." must be printed to stderr (not stdout!), where "11" is replaced by the line number where the error occurred. Use the supplied methods for error reporting as in the solution starter kit's parser example, including the code under scan with which captures exceptions thrown by the lexer. Make sure you report errors originating both from the lexer and the parser!

Examples

Here (see the readme). Our beloved AST examples also demonstrate successful parsing.

Starter Code

The solution starter kit includes a basic JFlex file, a basic CUP file (with methods for error handling), and an Ant build configuration that generates the lexical analyzer and parser from them before building. (Note that these files have been updated just before ex4 was announced; the modifications to the build file are slight and perform cleanup more properly.) You may also find the example from the recitation helpful.

MiniJava Grammar

The syntax of MiniJava is defined by the following BNF grammar. An <IDENTIFIER> is a sequence of letters, digits, and underscores, starting with a letter. Uppercase letters are distinguished from lowercase. An <INTEGER_LITERAL> is a sequence of decimal digits that denotes an integer value - it shouldn't start with 0 (which denotes an octal number, e.g. 012) or 0x (which denotes a hexadecimal number, e.g. 0xDEADBEAF).
For our project, /* comments */ are not nested, i.e., you need to implement /* */ comments, but the first */ terminates any open comment regardless of how many /* sequences have appeared before it. You are also required to support single-line // comments.

Which AST to Generate?

You must generate an AST that faithfully captures the syntax of the input program; printing the AST back to Java should be very similar to the original program, excepting indentation, spacing, and parentheses. For example, the order of classes in the AST's classdecls should be the same as the order of class declarations in the original program.

I am not currently aware of any scenario in which more than one AST corresponds to the program at the syntactic level (equivalence is not enough - in this phase we capture the syntax). Please let me know if you think of anything that might result in such a situation. That being said, we are planning to accommodate some minor variations if they represent the same program.

You do not need to include any line numbers in the ASTs you generate.

Parsing Semantically Invalid Programs

Your solution must successfully parse both valid programs and programs that would not pass semantic checking.

Some Things to Note

Good luck!

E Pluribus Unum

Sit back and enjoy all your hard work coming together.
java -jar mjavac.jar parse compile inputProg.java out.ll
lli out.ll