Semantic Checks

In this part of the project we will validate a MiniJava AST to ensure that it conforms to the MiniJava specification, and in particular satisfies all the assumptions we have utilized for code generation. Nevertheless, these checks are part of the language, and they must be enforced even if your compiler can generate code for programs that don't satisfy some of the restrictions.

Usage

java -jar mjavac.jar unmarshal semantic inputProg.xml out.txt

will write to out.txt either "OK\n" if the input program passes all the semantic checks, or "ERROR\n" if any of the rules is violated.

Examples

Here (see the readme).

Semantic Checks on MiniJava Programs

Below is a list of requirements that you need to check, and report an error if some are violated. Every item in this list requires some explicit checks on the AST, since they express properties that are not inherently enforced by the AST structure. Note that the AST structure already enforces some properties of MiniJava, which we do not need to check explicitly here - for example, there's simply no way to write down a method with variable declarations not at the beginning.

A few items below are marked with *won't be tested* - this means that this is an assumption you can make about valid ASTs, and you don't have to implement a check for in this exercise. (Not because it's particularly difficult to do :))

  1. The superclass of a class precedes it in the file: when class B extends A, the class A was defined, and the definition of A must come before that of B. In particular, there are no cycles in the inheritance graph - i.e., check to be sure that no class directly or indirectly extends itself.

  2. The main class (the class containing the main method) cannot be extended.

  3. The same name cannot be used to name two classes. (Including the class containing the main method.)

  4. The same name cannot be used for the same field in one class. This includes fields defined in a class and a subclass (even though this is legal in real Java).

  5. The same name cannot be used for the same method in one class - overloading is not supported.

  6. An overriding method matches the ancestor's method signature with the same name (same number of arguments, same static type of arguments, a covariant static return type). Note that overloading is not supported.

  7. *won't be tested* The name this cannot be used for the name of a variable, field, formal parameter, or method.

  8. A type declaration of a reference type of A refers to classes that are defined somewhere in the file (either before or after the same class, or to the same class itself).

  9. new A() is invoked for a class A that is defined somewhere in the file (either before or after the same class, or to the same class itself).

  10. In method invocation, the static type of the object is a reference type (not int, bool, or int[]).

  11. A method call is to a method that was defined in the class according to the static type of the object, and further, the type of the actual parameters matches the definition. Namely, in e.f(a_1, ..., a_k), the method f was defined in class A where A is the static type of e, it has k arguments, and the static type of each a_i matches (i.e. is a subtype of) the type of the i'th formal parameter of said definition.

  12. A method call is invoked on an owner expression e (e.g. e.f(...)) which is either this, a new expression, or a reference to a local variable, formal parameter or a field.

  13. The static type of the object on which length invoked is int[].

  14. A reference in an expression to a variable (i.e., not in a role of a method name in a call or a class name in new) is to a local variable or formal parameter defined in the current method, or to a field defined in the current class or its superclasses.

  15. Every local variable is definitely initialized (assigned to) before it is used. See below.

  16. In an assignment x = e, the static type of e is valid according to the declaration of x. Note subtyping!

  17. In if and while, the condition is boolean.

  18. The static type of e in return e is valid according to the definition of the current method. Note subtyping!

  19. *won't be tested* The argument of the main method (e.g. args in public static void main(String[] args)) is not referenced.

  20. The argument to System.out.println is of type int.

  21. The arguments to the predefined operators (&&, <, !, +, -, * etc.) are of the correct type.

  22. In an array access x[e], x is int[] and e is an int.

  23. In an assignment to an array x[e1] = e2, x is int[], e1 is an int and also e2 is an int.

  24. Variable redeclaration is forbidden - the same name cannot be used for declarations of two local variables or formal parameters.

  25. updated In an array allocation new int[e], e is an int.

Definite Initialization of Local Variables

In Java, it is an error to access (as part of an r-value) a variable which has not been previously initialized (e.g., was assigned to, as part of an l-value). This is enforced statically at compile time. You will perform a very simple static analysis to ensure that. Let me emphasize again the "very simple" part - the analysis will sometimes (even in simple cases) declare an error even though at runtime there's never a problem, because detecting uninitialized variables with precision quickly goes off the deep end (it is undecidable).

We will follow a very simple rule: for each variable access there must be a preceding (in the same method) variable initialization in every program path. This is only tricky when there's more than one incoming edge in the control-flow graph, namely, in if and while. So for the variable to be properly initialized in an if to allow usage after the if, the variable must be initialized both in the then and the then case. Inside a while, the initializations don't count for uses after the loop, because we can't be sure that the loop executes at all. Note that if the only use of a variable is inside an if branch or a while body and it's initialized there, this is valid.
Also note that fields, formal parameters, and array components are always initialized.
See the recitation for more details.

Submission guidelines

As specified for all assignments. Not all the requirements in the list are graded equally.

Good luck!