MiniJava: Overview

In the course project, we will implement a compiler for an object-oriented language called MiniJava into LLVM assembly/IR. The project is adapted from materials kindly provided by the staff of the compiler classes at the University of Athens and the University of Washington. We also thank Hila Peleg and Oren Ish-Shalom for consulting on the project design.

MiniJava is (almost exactly) a subset of Java, defined in the Appendix of Appel and Palsberg's Modern Compiler Implementation in Java, 2nd edition and described on the MiniJava web site.

Syntax

The syntax of MiniJava is defined by the following BNF grammar. An <IDENTIFIER> is a sequence of letters, digits, and underscores, starting with a letter. Uppercase letters are distinguished from lowercase. An <INTEGER_LITERAL> is a sequence of decimal digits that denotes an integer value.
For our project, /* comments */ are not nested, i.e., you need to implement /* */ comments, but the first */ terminates any open comment regardless of how many /* sequences have appeared before it, as in standard Java. There are also single-line // comments.

Semantics

The meaning of a MiniJava program is the same as in Java. This means that when your compiler generates code, its behavior must match that of code generated by a standard Java compiler.

Abstract Syntax

In our project we will be using a specific way to represent programs an abstract syntax tree (AST).
In Java code, we represent an AST using the class hierarchy provided here.

Your code will process an XML representation of the AST. (See the XML tutorial.)
Here are some example MiniJava programs and XML representations of their AST.
The following XML schema defines the span of possible ASTs in XML. To ensure that an XML file represents a well-formed AST, validate the file against the schema using (on Linux):
xmllint --schema schema/ast.xsd yourCoolAst.xml --noout

The provided solution starter kit (see more details below) provides code to translate from an XML to Java objects and vice versa.
To read an AST from XML and print it as a Java program, use
java -jar mjavac.jar unmarhsal print yourCoolAst.xml out.java
There is also code for writing an AST to XML; for example,
java -jar mjavac.jar unmarhsal marshal yourCoolAst.xml out.xml
reads the AST from the XML file and outputs the same AST to another XML file. This may not seem so useful (it's not), but some of the exercises ask you to output to XML an AST that you will generate, so the code that does the marshaling will be useful to you.

In your solutions, you may modify the AST classes as you desire, but you must preserve the same XML interface. If you want to add additional fields, you might find the @XmlTransient annotation helpful.

Assignments

  1. Variable & method renaming.
  2. LLVM code generation.
  3. Semantic checking.
  4. Lexing & parsing.

Solution starter kit

The starter kit is now offline; contact me if you need access.
Is here. It contains a Java skeleton of the project, including the class hierarchy to represent ASTs, code for converting between these classes and the XML representation (see above), and a main class accepting the expected command line arguments for each of the exercises.

Build environment

The provided build settings (in build.xml) are intended for Ubuntu 18.04 and Java 11. It works using Apache Ant, you need to have it installed (e.g. sudo apt install ant). Compile the project by running ant (and ant clean for cleaning), which creates a JAR file called mjavac.jar. Execute by java -jar mjavac.jar ....

To install Ant on Ubuntu, you can run sudo apt install ant.

Your build might run into problems in the generate phase - as a temporary workaround you can remove (by editing build.xml) the dependency of the compile task on generate.
Also note that defining a class of name Symbol leads to name clashes with CUP's class of the same name.

On the "nova" university server, the problem seems to be related to a preexisting old installation of JFlex. Fortunately, there is a simple workaround (due to the system team): build using ant -lib ./tools (instead of simply ant).
Note that you need to use Java 11 for this; you can download a .tar.gz file and "install" it locally as explained e.g. here (from bash).

Submission guidelines

Groups

Submission is in groups of at most 3. There's a Moodle forum for seeking partners.

Submission date and late submissions

The submission time is always at midnight of the day of the deadline (23:59).
Every student is allowed a total of 5 grace days. For example, if a student submits ex1 within 2 days of the deadline and ex2 within 3 days of the respective deadline, every other late submission will be graded 0.
Late submissions due to reasons recognized by the faculty's rules (e.g. illness, military duty, etc.) are not included in the grace days.

What to submit

Submit all your code in a zip file.

  • The top level must contain a build.xml file. We execute ant in this directory, and this must generate a JAR file called mjavac.jar.
  • The top level must contain a file called ids.txt that contains the ID numbers of the students in the group, each in a different line.
  • The rest of the directory structure is up to you (you can use the structure of the solution starter kit but you don't have to). Make sure you generate a target of the correct name in the correct place upon compilation.

    Your code must compile on a standard Ubuntu 18.04 distribution. The grade on a project that doesn't compile is 0. We will try to notify you when your submission doesn't compile and allow you to resubmit (on the expense of the grace days), but we can't promise to do that - make sure your code compiles.

    Where to submit

    One student from each group should submit the solution via the course's Moodle (under "HW submission").

    Start early

    There's much to do, and it's going to take you time. Plan ahead.

    Words of caution

    Do not plagiarize code from others. It is OK to discuss ideas with other students, but you must write your code on your own. Don't make us take disciplinary measures.

    Good luck!