

716.077 KU Compilerbau - Programming Assignment

Univ.-Prof. Dr. Franz Wotawa, Birgit Hofer
Institute for Software Technology, Graz University of Technology

April 20, 2011

Introduction

During this semester you will build a simple C compiler in Java using ANTLR 3.3. You can do this together with up to 3 of your colleagues (4 students per group at maximum). Your compiler should be able to translate a program written in the simplified C language described below to Java bytecode. Figure 1(b) illustrates the different phases of a compiler. The phases that are part of your exercise are highlighted in grey.

(a) Language processing system (b) Compiler phases

Figure 1: Compiler


The following table gives an overview of your tasks and their delivery deadlines.

Task                    Reachable Points    Deadline
0 Group Registration    -                   31.03.
1 Lexical Analysis      5                   04.04.
2 Syntax Analysis       15                  11.04.
3 Type Checking         15                  16.05.
4 Intermediate Code     15                  30.05.
5 Code Generation       15                  20.06.

General rules

• You have to deliver tasks 1-5 via SVN.

• You must always hand in something, since the tasks build on the previous tasks.

• If you are not able to solve one of the tasks, you will get the source code of another group on request so that you are able to finish the other tasks.

• You must document the percentage of participation of each team member for each task in the README file.

• You will get the points reached for each task via email.

• There are mandatory interviews at the end of task 5.

File/folder hierarchy

Your SVN repository must have a branch for each task. The branches must have the name task_{number}, where {number} is replaced by the actual task number. Each branch must have the following structure. Please take care that the required documents are in the correct folders.

• task_{number}/

◦ build.xml

◦ doc/

∗ readme.txt

◦ lib/

◦ src/

∗ **/*.java

∗ NEW: SimpleC.g

◦ test/

readme.txt

The file readme.txt must contain

• the table with the percentage of participation

• all changes made (e.g. bug correction) with respect to the previous tasks in clear and short sentences

• known limitations / bugs

• implemented additional tasks


lib

The directory lib should contain the ANTLR and the JUnit .jar files.

build.xml

The file build.xml must define

• an ant task compile which compiles the grammar file to Java files and compiles all Java files from src.

• an ant task run-junit which depends on compile, compiles all Java files from test, runs all JUnit tests and creates an XML file in the output folder.

You can use the BUILD file from the framework.

Framework

You can download a simple framework from the course web site. This framework contains

• all required libraries (ANTLR, JUnit),

• skeletons of the required classes,

• JUnit test files,

• example input and output files,

• an example README file and

• an example BUILD file.

Upload these files to your repository and modify and extend them.


The Grammar

Syntax

program → declarations function_declarations

declarations → declarations type identifier_list ;
    | declarations function_head ;
    | ε

identifier_list → identifier
    | identifier_list , identifier

identifier → id
    | * identifier

type → int
    | char
    | void

function_declarations → function_declaration function_declarations
    | ε

function_declaration → function_head function_body

function_head → type identifier arguments

arguments → ( parameter_list )
    | ( )

parameter_list → type identifier
    | parameter_list , type identifier

function_body → { declarations optional_statements }

compound_statement → { optional_statements }

optional_statements → statement_list
    | ε

statement_list → statement
    | statement_list statement

statement → compound_statement
    | if ( assignment_expr ) statement else statement
    | for ( expr_stmt expr_stmt assignment_expr ) statement
    | expr_stmt
    | return assignment_expr ;

expr_stmt → ;
    | assignment_expr ;

assignment_expr → identifier assignop expression
    | identifier assignop & identifier
    | expression

expression → simple_expression
    | simple_expression relop_assign simple_expression

simple_expression → term
    | sign term
    | simple_expression sign term
    | simple_expression or term

term → factor
    | term mulop factor

factor → identifier
    | function_call
    | NEW: int
    | ( assignment_expr )
    | not factor
    | literal_string

function_call → identifier ( extend_assignment_expr_list )
    | identifier ( )

extend_assignment_expr_list → assignment_expr
    | & identifier
    | extend_assignment_expr_list , assignment_expr
    | extend_assignment_expr_list , & identifier

relop_assign → relop
    | assignop

Operators

relop → < | <= | > | >= | ==

sign → + | -

mulop → * | / | % | &&

assignop → =

or → ||

not → !

Identifiers

id → letter ( letter | digit | _ )*

letter → [a-zA-Z]

digit → [0-9]

String literals

literal_string → "( [ 0-9a-zA-Z ] | | ! | % | escape_sequence | relop | sign | mulop | assignop )*"

Numerical literals

int → decInt | octInt | hexInt

decInt → 0 | digit digit0*

hexInt → 0 (x|X) hexDigit+

octInt → 0 octDigit+

digit → [1-9]

digit0 → [0-9]

hexDigit → [0-9a-fA-F]

octDigit → [0-7]
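
The literal rules above map directly onto regular expressions. The following is only a sketch of how a lexeme could be classified and converted; the class IntLiteral and its methods are hypothetical names and not part of the framework.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: classifies a lexeme according to the int rules above. */
final class IntLiteral {
    // decInt -> 0 | digit digit0*   (digit = [1-9], digit0 = [0-9])
    private static final Pattern DEC = Pattern.compile("0|[1-9][0-9]*");
    // hexInt -> 0 (x|X) hexDigit+
    private static final Pattern HEX = Pattern.compile("0[xX][0-9a-fA-F]+");
    // octInt -> 0 octDigit+
    private static final Pattern OCT = Pattern.compile("0[0-7]+");

    /** Returns "dec", "hex", "oct", or null if the text is not an int literal. */
    static String kind(String text) {
        if (DEC.matcher(text).matches()) return "dec";
        if (HEX.matcher(text).matches()) return "hex";
        if (OCT.matcher(text).matches()) return "oct";
        return null;
    }

    /** Converts a valid literal to its value; throws on invalid input. */
    static int value(String text) {
        String k = kind(text);
        if (k == null) throw new NumberFormatException("not an int literal: " + text);
        switch (k) {
            case "hex": return Integer.parseInt(text.substring(2), 16);
            case "oct": return Integer.parseInt(text.substring(1), 8);
            default:    return Integer.parseInt(text);
        }
    }
}
```

Note that a lexeme such as 08 matches none of the three rules, so this sketch reports it as invalid.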

Escape sequences

\n newline

\t horizontal tab

\b backspace

\r carriage return

\f form feed

\' single quote

\" double quote

\\ backslash
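
Later phases need the characters that these sequences stand for. As a minimal sketch, assuming the surrounding double quotes have already been removed, the table above could be decoded like this (the class Escapes is a hypothetical name):

```java
/** Hypothetical helper: decodes the escape sequences listed above. */
final class Escapes {
    /** Replaces every escape sequence in the body of a string literal. */
    static String unescape(String body) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {            // ordinary character: copy it
                out.append(c);
                continue;
            }
            if (i + 1 >= body.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = body.charAt(++i);  // character after the backslash
            switch (e) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case 'b':  out.append('\b'); break;
                case 'r':  out.append('\r'); break;
                case 'f':  out.append('\f'); break;
                case '\'': out.append('\''); break;
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                default:
                    throw new IllegalArgumentException("unknown escape: \\" + e);
            }
        }
        return out.toString();
    }
}
```

Rejecting unknown sequences with an exception is a design choice of this sketch; a lexer would more likely count them as lexical errors.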

Comments

Comments may appear after any token and are surrounded by /* and */.

White spaces

White spaces between tokens are optional, with one exception: keywords must be surrounded by white spaces, newlines or the beginning of the program.


0 Group Registration

Groups of up to 4 students are allowed. In order to get access to an SVN repository you have to register your group with the help of the Web-Interface. The link to the Web-Interface will be posted to the newsgroup on Wednesday, March 16th.

1 Lexical Analysis

Write a lexical analyzer for the subset of the C language described above in Java with ANTLR version 3.3. Name your grammar file NEW: SimpleC.g.

Create the class LexicalAnalyzer.java in the package at.tugraz.ist.compilerbau in the directory src with the method public static int lexer(String file_path). This method returns 0 if the lexical analysis was successful, otherwise it returns the number of errors found. In addition write the following information to the standard output:

• Input program abstracted to lexemes and keywords with

◦ the same line separations as the original program,

◦ line numbers added,

◦ all comments and unnecessary white spaces removed,

• Summary of errors

◦ Number of errors found in the source file

You can find an example input and output in the framework. Create your own example programs (at least one error-free program and one program which leads to lexical errors) and add them to the test folder. Extend the JUnit test to test those programs.

Create the README file as described above and extend the BUILD file if necessary.

Task Summary

1. Define an ANTLR grammar file for the shown grammar

2. Implement the requested method for the class LexicalAnalyzer

3. Create example programs and extend the JUnit tests

4. Create a README file

5. Adapt the BUILD file if necessary


2 Syntax Analysis

Write a syntactical analyzer for the grammar described above in Java with ANTLR version 3.3. Extend the file NEW: SimpleC.g. Please note: the grammar above must be transformed into an LL grammar.

Create the class SyntaxAnalyzer.java in the package at.tugraz.ist.compilerbau in the directory src with the method public static int checkSyntax(String file_path). This method returns 0 if the syntax analysis was successful, otherwise it returns the number of errors found. If there are lexer errors, no syntactical analysis has to be performed. Instead, the number of lexical errors should be returned.
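
The rule that an earlier phase's errors take precedence recurs in every later task. One way to express it, shown here only as a sketch with hypothetical names (Phases, firstFailure), is to run the phases in order and stop at the first one that reports errors:

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: runs compiler phases in order until one reports errors. */
final class Phases {
    /**
     * Each phase returns its own error count. The first non-zero count is
     * returned and all later phases are skipped; 0 means every phase passed.
     */
    static int firstFailure(IntSupplier... phases) {
        for (IntSupplier phase : phases) {
            int errors = phase.getAsInt();
            if (errors > 0) {
                return errors;
            }
        }
        return 0;
    }
}
```

With this helper, checkSyntax could pass the lexer run as the first phase and the parser run as the second, so the parser never sees input that already failed lexical analysis.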

In addition write the following information to the standard output:

• Input program abstracted to function definitions

◦ a line for every definition of a function/procedure (name + signature)

◦ an output for the end of every function/procedure

◦ line numbers added

• Summary of errors

◦ Number of errors found in the source file

You can find an example input and output in the framework.

Create your own example programs (at least one error-free program and one program which leads to syntactical errors) and add them to the test folder. Extend the JUnit test to test those programs.

It makes sense to build an abstract syntax tree, since you need it for the next task. Be careful with left and right associativity and precedence. An operator like + is left-associative: a + b + c is equal to (a + b) + c, but assignment is right-associative: a = b = c means a = (b = c).

Update your README file. Don't forget to document any changes in the lexer you made in this task.
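
In an LL parser, associativity follows from the shape of the rule: a loop that folds operands to the left gives left associativity, and recursion on the right-hand operand gives right associativity. The sketch below is a hand-written toy parser, not ANTLR output, and deliberately ignores the rest of the assignment grammar; it accepts only single-letter operands, + and =, and the class AssocDemo is a hypothetical name.

```java
/** Hypothetical mini-parser for single-letter operands, '+' and '='. */
final class AssocDemo {
    private final String src;
    private int pos = 0;

    private AssocDemo(String src) { this.src = src.replace(" ", ""); }

    /** Returns the input with explicit parentheses showing the tree shape. */
    static String parse(String src) {
        AssocDemo p = new AssocDemo(src);
        String tree = p.assignment();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return tree;
    }

    // assignment -> letter '=' assignment | sum
    // Right-associative: the rule recurses on its right-hand side.
    private String assignment() {
        if (pos + 1 < src.length()
                && Character.isLetter(src.charAt(pos))
                && src.charAt(pos + 1) == '=') {
            char target = src.charAt(pos);
            pos += 2;
            return "(" + target + " = " + assignment() + ")";
        }
        return sum();
    }

    // sum -> operand ('+' operand)*
    // Left-associative: the loop folds each new operand into the left tree.
    private String sum() {
        String left = operand();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++;
            left = "(" + left + " + " + operand() + ")";
        }
        return left;
    }

    private String operand() {
        if (pos >= src.length() || !Character.isLetter(src.charAt(pos))) {
            throw new IllegalArgumentException("operand expected at " + pos);
        }
        return String.valueOf(src.charAt(pos++));
    }
}
```

For example, AssocDemo.parse("a + b + c") yields ((a + b) + c), while AssocDemo.parse("a = b = c") yields (a = (b = c)).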

Bonus Task

You are allowed to extend the given grammar, e.g., you can add in-line comments. If you extend the language, please make sure that your grammar stays backward compatible, and document your changes in the README file. You can get up to 3 points for extensions.

Task Summary

1. Extend your ANTLR grammar file

2. Implement the requested method for the class SyntaxAnalyzer

3. Create example programs and extend the JUnit tests

4. Adapt the README file


3 Type Checking

Create data structures to hold the type information for the identifiers. You will have (at least) int, void, pointers to (pointers to pointers to ...) int or void, and functions with argument and return types. Implement the code to fill the data structures, and to perform type checking.
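
One possible shape for these data structures is a small class hierarchy in which pointer and function types are built from other types. This is only a sketch; every class name in it (Type, Basic, Pointer, Function) is hypothetical, and char is included because the grammar defines it.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical type representation; all names are illustrative. */
abstract class Type {
    static final Type INT = new Basic("int");
    static final Type CHAR = new Basic("char");
    static final Type VOID = new Basic("void");

    /** Basic types exist once each, so identity comparison is sufficient. */
    static final class Basic extends Type {
        final String name;
        private Basic(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    /** Pointer to any other type, including another pointer. */
    static final class Pointer extends Type {
        final Type target;
        Pointer(Type target) { this.target = target; }
        @Override public boolean equals(Object o) {
            return o instanceof Pointer && ((Pointer) o).target.equals(target);
        }
        @Override public int hashCode() { return 31 * target.hashCode() + 1; }
        @Override public String toString() { return target + "*"; }
    }

    /** Function type with a return type and an ordered parameter list. */
    static final class Function extends Type {
        final Type result;
        final List<Type> params;
        Function(Type result, Type... params) {
            this.result = result;
            this.params = Arrays.asList(params);
        }
        @Override public boolean equals(Object o) {
            return o instanceof Function
                    && ((Function) o).result.equals(result)
                    && ((Function) o).params.equals(params);
        }
        @Override public int hashCode() {
            return 31 * result.hashCode() + params.hashCode();
        }
        @Override public String toString() {
            return result + params.toString().replace('[', '(').replace(']', ')');
        }
    }
}
```

A pointer to a pointer to int is then new Type.Pointer(new Type.Pointer(Type.INT)), which prints as int**.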

Create the class TypeChecker.java in the package at.tugraz.ist.compilerbau in the directory src with the method public static int checkTypes(String file_path). This method returns 0 if the type checking analysis was successful, otherwise it returns the number of errors found. If there are lexical or syntactical errors, no type checking has to be performed. Instead, the number of lexical or syntactical errors should be returned.

In addition write the following information to the standard output:

• Type errors

◦ Use of undeclared identifiers

◦ Use of incorrect type

◦ Double declarations (Note: It is allowed to declare a variable in the global scope and in each local scope. But it is an error to declare a variable more than once in any given scope.)

• Type coercions

◦ All type coercions. (For example: "cast expression '4 * a' from int to real in line 27.")

◦ The type of any operator. (For example: "real-real addition '8.0 + 4 * a' in line 27".)

◦ Definitions of variables, including type and scope. (For example: "variable 'a' defined in function 'f' with type integer".)

Keep your output readable, e.g., in the form of a table. You can find an example input and output in the framework. The examples shown here do not prescribe the exact form of your output.

Define printf and scanf now. Note that printf may have one or two arguments.

Create your own example programs (at least one type-correct and one type-incorrect program) and add them to the test folder. Extend the JUnit test to test those programs.

Scopes

Note that there are two scopes: a global one for the program and a local one for the current function. Nested scoping is not required, but you can get bonus points for implementing it.
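
A symbol table for exactly these two scopes can be as small as two maps. The sketch below is an assumption about one workable design, with types kept as plain strings for brevity; the class SymbolTable and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-level symbol table: global scope plus the current function. */
final class SymbolTable {
    private final Map<String, String> globals = new HashMap<>();
    private Map<String, String> locals = null;   // null while outside any function

    void enterFunction() { locals = new HashMap<>(); }
    void leaveFunction() { locals = null; }

    /** Declares name in the current scope; returns false on a double declaration. */
    boolean declare(String name, String type) {
        Map<String, String> scope = (locals != null) ? locals : globals;
        if (scope.containsKey(name)) {
            return false;
        }
        scope.put(name, type);
        return true;
    }

    /** Looks up name, local scope first; returns null for an undeclared identifier. */
    String lookup(String name) {
        if (locals != null && locals.containsKey(name)) {
            return locals.get(name);
        }
        return globals.get(name);
    }
}
```

Declaring a local variable with the same name as a global one succeeds and shadows it, while a second declaration in the same scope is reported through the false return value.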

NEW: Forward declarations

In C it is not allowed to use a variable before it is defined. For instance, if you define a function a, and then a global variable q, you cannot use q in a.

For functions, the situation is more complicated. If you define a function a, and then a function b, and you use b in a, you are creating an implicit declaration, which assumes that b returns a value of a certain type, e.g., an int. This problem is avoided in C by forward declarations. Don't forget to include forward declarations in your type checking.

Type Coercion

A coercion occurs if the type of an operand is automatically converted to the type expected by the operator.

1. A conversion to void is always allowed and results in the result being thrown away. For instance, printf returns an int, but (void) printf("hallo") (not allowed in our grammar), or more succinctly printf("hallo"), are legal C statements.

2. A conversion from void is never possible. A void type may not participate in an operation (such as +). A void variable cannot be defined.

3. A binary operator (such as + or <, but not && or ||) takes either two int, or a char and an int (in either order). In the latter case, the char is converted to int before the operation is performed. (You cannot do arbitrary arithmetic on pointers. In C, adding a pointer and an int is allowed, but you do not need to support it. Comparison of pointers (a<b) is allowed!)

4. A char can be assigned to an int and is converted automatically. An int can also be converted to a char automatically; this may lead to loss of information.

5. int, char and pointers can be used in Boolean expressions. An int or char with the value 0 is false, any other value is true. Pointers are NULL (false) or non-NULL (true). Boolean operators (&&, ||, <, >, etc.) return an int.

6. Any pointer conversion is allowed, and never leads to an error. A warning about an invalid cast is appreciated, and if you implement such warnings consistently, you'll get bonus points. Conversions between pointers and ints are not supported.
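
Rules 2, 3, 4 and 6 can be turned into two small checks. The sketch below represents types as strings such as "int" or "char*" to stay short, and its names (Coercion, arithmetic, assignable) are hypothetical. The rules do not say what happens with two char operands; converting both to int is an assumption of this sketch.

```java
/** Hypothetical coercion checks; types are plain strings like "int" or "char*". */
final class Coercion {
    private static boolean isPointer(String t) { return t.endsWith("*"); }
    private static boolean isIntLike(String t) { return t.equals("int") || t.equals("char"); }

    /**
     * Rule 3 for arithmetic operators such as + or *.
     * Returns a description of the required casts (empty if none),
     * or null if the operand types are a type error.
     */
    static String arithmetic(String left, String right) {
        if (!isIntLike(left) || !isIntLike(right)) {
            return null;   // void (rule 2) and pointers are rejected here
        }
        StringBuilder casts = new StringBuilder();
        if (left.equals("char")) casts.append("cast left from char to int;");
        if (right.equals("char")) casts.append("cast right from char to int;");
        return casts.toString();
    }

    /** Rules 2, 4 and 6: may a value of type from be assigned to type to? */
    static boolean assignable(String from, String to) {
        if (from.equals("void") || to.equals("void")) return false;   // rule 2
        if (isIntLike(from) && isIntLike(to)) return true;            // rule 4
        if (isPointer(from) && isPointer(to)) return true;            // rule 6, possibly with a warning
        return false;                                                 // pointer <-> int is not supported
    }
}
```

Pointer comparison, which rule 3 explicitly allows, would need a separate check for relational operators and is left out here.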

Bonus Tasks

• Warnings for invalid pointer casts: 2 points

• Implementation of nested scopes: 2 points

• Extend the type system to include more types. Note: coercion rules in C are not quite trivial. Points depend on your exact plans.

Don't forget to clearly document the implemented bonus tasks in the README file.

Task Summary

1. Implement the requested method for the class TypeChecker

2. Define printf and scanf

3. Create example programs and extend the JUnit tests

4. Adapt the README file


4 Intermediate Code

Build an Intermediate Code Generator which is able to build the intermediate code for any program written in our grammar.

Create the class IntermediateCode.java in the package at.tugraz.ist.compilerbau in the directory src with the method public static int createIntermediateCode(String file_path). This method returns 0 if the intermediate code generation was successful, otherwise it returns the number of (lexical, syntactical or type) errors found. In addition it writes the ASCII representation of the intermediate code to the standard output. Use comments to clarify which part of the C code corresponds to the intermediate code.

Three-Address Code

The three address code should support the following commands:

• Assignments of the form x := y op z, where op is one of

◦ int+, int-, int*, int/, int%, which take two integers and return the corresponding result.

◦ int&&, int||, which take two integers and return 0 or 1.

◦ int<, int<=, int>=, int>, int==, which take two integers and return 0 or 1.

• Assignments of the form x := op y, where

◦ op is one of intminus, intnot and y is a variable.

◦ op is one of address, dereference and y is a variable.

◦ op is one of intconst, pointerconst, and y is an integer constant or a label, respectively.

• The assignment x *= y, stating that y should be assigned to the location that x points to. (Translates the C statement *x = y.)

• A constant declaration string a, where a is a quoted string ("hello!\0").

• Copy statements: x := y.

• Jumps

◦ The label label L.

◦ The unconditional jump goto L, where L is a label

◦ Conditional jump ifint x goto L, where x is an int and L a label.

• For functions

◦ The statement param x, which says that x is a parameter to the next function call. Parameters should be listed from right to left.

◦ The statement call f n, where f is the name of a function and n is the number of parameters.

◦ The statement function f n, where f is a function, and n is the number of local variables, including temporary ones.

◦ The statements return and return x, which return from the function, possibly returning the value in x. If you want your compiler to be gcc compatible, you need to distinguish between int and char returns. This is not part of the assignment.

◦ The statement local x i, where x is a local variable, a temporary variable, or an argument, and i is a number to reserve space.

∗ For locals and temps, i should be negative and run from -1 down.

∗ For arguments, i should be non-negative and run from 0 up (the left argument has number 0).

◦ The statement getresult x, which stores the return value of the last function call (if any) in x.

• The statement global x i, where x is a variable name, and i is a number, starting at 0, running up.

• The statement comment x, which is ignored.
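
As an illustration of the function-call statements in the list above, the following sketch emits the statement sequence for one call: the parameters in right-to-left order, then call f n, then an optional getresult. The class CallEmitter and the use of plain strings for statements are assumptions made for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: emits the three-address statements for one call. */
final class CallEmitter {
    /**
     * args holds the argument variables in source order (left to right).
     * resultVar may be null if the return value is not used.
     */
    static List<String> emitCall(String function, List<String> args, String resultVar) {
        List<String> code = new ArrayList<>();
        // Parameters are listed from right to left.
        for (int i = args.size() - 1; i >= 0; i--) {
            code.add("param " + args.get(i));
        }
        code.add("call " + function + " " + args.size());
        if (resultVar != null) {
            code.add("getresult " + resultVar);
        }
        return code;
    }
}
```

For a C call t1 = f(a, b) this produces param b, param a, call f 2 and getresult t1, in that order.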

This list might be incomplete. You are allowed to add additional commands. Post your additions to the newsgroup so that others can benefit from them. Don't forget to document your extensions in the README file. You do not have to pay attention to efficiency of time or space. In the intermediate code (but not in C), every function has to end with a return.

Boolean values

Truth values are represented by integers: 0 is false, anything else is true. A truth value generated by a comparison, &&, or || may only be 0 or 1.

Pointers

Pointers are treated like ints.

Implementation Suggestion

Create an inheritance hierarchy, with the class Statement on top. Other Statements inherit from this class. Your intermediate code program is a vector of Statements, each Statement can print itself. Later, each Statement will know how to translate itself to e.g. assembler code or Java bytecode.

Use an attributed grammar on the abstract syntax tree to generate the code. A function node, for example, should get the intermediate code from its children (declarations, statements), plus the number of temp variables, so that it can construct the function name number statement. The intermediate code for the function node consists of this statement and the intermediate code for the body.
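
The suggested hierarchy could start out as follows. This is a sketch covering only three of the statement kinds; apart from Statement, which the text above names, all class names are hypothetical, and an ArrayList stands in for the vector.

```java
import java.util.ArrayList;
import java.util.List;

/** Base class: every intermediate-code statement can print itself. */
abstract class Statement {
    abstract String print();
}

/** x := y op z, for example t1 := a int+ b */
final class BinaryAssign extends Statement {
    private final String target, left, op, right;
    BinaryAssign(String target, String left, String op, String right) {
        this.target = target; this.left = left; this.op = op; this.right = right;
    }
    @Override String print() { return target + " := " + left + " " + op + " " + right; }
}

/** function f n, where n counts local and temporary variables */
final class FunctionHeader extends Statement {
    private final String name;
    private final int locals;
    FunctionHeader(String name, int locals) { this.name = name; this.locals = locals; }
    @Override String print() { return "function " + name + " " + locals; }
}

/** return or return x */
final class Return extends Statement {
    private final String value;   // null for a plain return
    Return(String value) { this.value = value; }
    @Override String print() { return value == null ? "return" : "return " + value; }
}

/** The intermediate program is an ordered list of statements. */
final class IntermediateProgram {
    private final List<Statement> statements = new ArrayList<>();

    IntermediateProgram add(Statement s) { statements.add(s); return this; }

    /** Prints one statement per line. */
    String print() {
        StringBuilder out = new StringBuilder();
        for (Statement s : statements) {
            out.append(s.print()).append('\n');
        }
        return out.toString();
    }
}
```

A later task could add a second abstract method to Statement for emitting bytecode, so that each statement kind carries its own translation.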

Bonus Task

You do not need to implement short-circuit evaluation for Boolean expressions. You will get 2 bonus points for implementing short circuiting.
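
For the bonus task, a short-circuit && can be built from the jump statements alone. The sketch below assumes that ifint x goto L jumps when x is non-zero, which the statement list does not spell out; the class ShortCircuit and the label names L0, L1, ... are hypothetical. The code for the right operand is only reached when the left operand is true, and the result is always 0 or 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generator for short-circuit && in three-address code. */
final class ShortCircuit {
    private int nextLabel = 0;

    private String fresh() { return "L" + (nextLabel++); }

    /**
     * left is the variable holding the left operand; rightCode computes the
     * right operand into the variable right; result receives 0 or 1.
     */
    List<String> and(String result, String left, List<String> rightCode, String right) {
        String rhs = fresh(), isFalse = fresh(), isTrue = fresh(), end = fresh();
        List<String> code = new ArrayList<>();
        code.add("ifint " + left + " goto " + rhs);   // left is true: look at the right operand
        code.add("goto " + isFalse);                  // left is false: skip the right operand
        code.add("label " + rhs);
        code.addAll(rightCode);                       // evaluated only if left was true
        code.add("ifint " + right + " goto " + isTrue);
        code.add("label " + isFalse);
        code.add(result + " := intconst 0");
        code.add("goto " + end);
        code.add("label " + isTrue);
        code.add(result + " := intconst 1");
        code.add("label " + end);
        return code;
    }
}
```

The operator || can be generated the same way with the roles of the true and false branches swapped.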

Task Summary

1. Implement the requested method for the class IntermediateCode

2. Update your README file


5 Code Generation

Implement the translation of the intermediate code to Java bytecode.

Create the class CodeGeneration.java in the package at.tugraz.ist.compilerbau in the directory src with the method public static int createCode(String file_path). This method creates the output file <file>.class, where <file>.c is the name of the input file. The method returns 0 if the code generation was successful, otherwise it returns the number of (lexical, syntactical or type) errors found.

Your program must be executable by the Java Virtual Machine. Printf and scanf must work.

NEW: Option: Instead of directly producing bytecode, you are allowed to produce code in an assembler-like syntax (Jasmin). In this case your output files must end with .j. It must be possible to translate the generated output files to bytecode by using Jasmin 2.4. Don't forget to add jasmin.jar to your lib folder.
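
To give an impression of the Jasmin option, the sketch below builds the text of a .j file containing one static method that adds two ints. It shows the overall file shape only; the class JasminSketch, the method name add and the fixed stack and locals limits are assumptions, and a real generator would derive them from the intermediate code.

```java
/** Hypothetical helper: produces Jasmin source text for a tiny class. */
final class JasminSketch {
    /** Returns Jasmin source for a class with one static method add(int, int). */
    static String addClass(String className) {
        StringBuilder j = new StringBuilder();
        j.append(".class public ").append(className).append('\n');
        j.append(".super java/lang/Object\n\n");
        j.append(".method public static add(II)I\n");
        j.append("    .limit stack 2\n");    // two operands on the stack at most
        j.append("    .limit locals 2\n");   // the two int arguments
        j.append("    iload 0\n");           // push first argument
        j.append("    iload 1\n");           // push second argument
        j.append("    iadd\n");              // corresponds to x := y int+ z
        j.append("    ireturn\n");
        j.append(".end method\n");
        return j.toString();
    }
}
```

Writing the returned text to a file ending in .j gives an input that the Jasmin assembler can turn into a class file.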

Task Summary

1. Implement the requested method for the class CodeGeneration

2. Update your README file
