Joey Paquet, 2000, 2002, 2007, 2008

Concordia University
Department of Computer Science

COMP 442/6421 Compiler Design


Course Description

• Instructor
– Name: Dr. Joey Paquet
– Office: EV-3-221
– Phone: 7831
– e-mail: [email protected]
– Web: www.cse.concordia.ca/~paquet

Course Description

• Topic
– Compiler organization and implementation.
– Lexical, syntax and semantic analysis. Code generation.

• Outline
– Design and implementation of a simple compiler.
– Lectures related to the project.

Course Description

• Grading
– Assignments (4): 40%
– Final Examination: 30%
– Final Project: 30%

• Late assignment penalty: 50% per working day
• Assignments and project are graded on: Correctness, Completeness, Design, Style, Documentation.

Project Description

• Design and coding of a simple compiler
– Individual work
– Divided into four assignments
– Final project is graded at the end of the semester, during a final demonstration
– Testing is VERY important and up to you

Project Description

• A complete compiler is a fairly complex and large program: from 10,000 to 1,000,000 lines of code.

• Programming one will push you beyond your limits.

• It uses most of the elements of the theoretical foundations of Computer Science.

• It will probably be the most complex program you have ever written.

Introduction to Compilation

• A compiler is a translation system.
• It translates programs written in a high-level language into a lower-level language, generally machine (binary) language.

[Figure: source code (source language) → compiler (translator) → target code (target language)]

Introduction to Compilation

• The only language that the processor understands is binary.

000100 0001 0011 1111
  a     b    c    d

a: Register addition (from a symbol table)
b: First operand (R1)
c: Second operand (R3)
d: Third operand (R15)

Introduction to Compilation

• Assembly language is the first higher-level programming language.
• 000100000100111111 <=> Add R1,R3,R15
• There is a one-to-one correspondence between lines of assembly code and machine code instructions.
• An op-code table is sufficient to translate assembly language into machine code.
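The op-code table idea can be sketched in a few lines of Python. This is only an illustration: the 6-bit op-code and 4-bit register fields are assumptions inferred from the example bit string above, only the `Add` entry comes from the slides, and `assemble` is a hypothetical helper name.

```python
# Hypothetical op-code table; only the Add entry is taken from the slides.
OPCODES = {"Add": "000100"}

def assemble(line: str) -> str:
    """Translate one assembly line such as 'Add R1,R3,R15' into a bit string."""
    mnemonic, operands = line.split(maxsplit=1)
    bits = OPCODES[mnemonic]                 # table lookup for the operation
    for reg in operands.split(","):
        number = int(reg.strip().lstrip("Rr"))
        bits += format(number, "04b")        # assumed 4-bit register field
    return bits

print(assemble("Add R1,R3,R15"))  # 000100000100111111
```

Because every assembly line maps to exactly one machine instruction, no parsing beyond splitting the line is needed.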

Introduction to Compilation

• Compared to binary, assembly language greatly improved the productivity of programmers. Why?

• Though a great improvement, it is not ideal:
– Not easy to write
– Even less easy to read and understand
– Extremely architecture-dependent

Introduction to Compilation

• A compiler translates a given high-level language into assembler or machine code.

X=Y+Z;

L 3,Y     Load working register with Y
A 3,Z     Add Z to working register
ST 3,X    Store the result in X

000010010010110001001001010100100100101001

FORTRAN: The first compiler

• The problems with assembly led to the development of the first compiler: FORTRAN.
• Stands for FORmula TRANslation.
• Developed between 1954 and 1957 at IBM by a team led by John Backus.
• This was an incredible feat, as the theory of compilation was not available at the time.

Paving down the road

• In parallel, Noam Chomsky was investigating the structure of natural languages.
• His studies led the way to the classification of languages according to their complexity (aka the Chomsky hierarchy).
• This was used by various theoreticians in the 1960s and early 1970s to design a fairly complete set of solutions to the parsing problem.
• These solutions have been used ever since.

• As the parsing solutions became well understood, efforts were devoted to the development of parser generators.
• The most commonly known is YACC (Yet Another Compiler Compiler).
• Developed by Steve Johnson in 1975 for the Unix system.

Compilation vs. Interpretation

• A compiler translates high-level instructions into machine code. An interpreter uses the computer to execute the program directly, statement by statement.
– Advantage: immediate response
– Drawbacks: inefficient with loops, restricted to single-file programs.

Compiler’s Environment

• Building an executable from multiple files

[Figure: source code → compiler → object code → linker (which also takes compiled modules and run-time libraries) → executable code]

Phases of a Compiler

[Figure: source code → lexical analysis → token stream → syntactic analysis → syntax tree → semantic analysis → annotated tree → high-level optimization → intermediate code → target code generation → target code → low-level optimization → optimized target code. The front-end spans lexical analysis through high-level optimization; the back-end spans target code generation and low-level optimization.]

Lexical analysis

• Transforms the initial stream of characters into a stream of tokens
– keywords: while, to, do, int, main
– identifiers: i, max, total, i1, i2
– literals: 123, 12.34, “Hello”
– operators: +, *, and, >, <
– punctuation: {, }, [, ], ;
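A minimal sketch of such a tokenizer, using Python's `re` module, might look as follows. The token categories come from the list above; the regular expressions, the `=` operator, the plain double-quoted string syntax, and the names `TOKEN_SPEC` and `tokenize` are illustrative assumptions, not the lexical definition used in the course project.

```python
import re

KEYWORDS = {"while", "to", "do", "int", "main"}
WORD_OPERATORS = {"and"}

# Order matters: a float literal must be tried before an integer literal.
TOKEN_SPEC = [
    ("literal",     r'\d+\.\d+|\d+|"[^"]*"'),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("operator",    r"[+*<>=]"),          # '=' is an assumed addition
    ("punctuation", r"[{}\[\];]"),
    ("skip",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Turn a character stream into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue                       # whitespace produces no token
        if kind == "identifier":
            if text in KEYWORDS:
                kind = "keyword"           # reserved words look like identifiers
            elif text in WORD_OPERATORS:
                kind = "operator"
        tokens.append((kind, text))
    return tokens

print(tokenize("while i < 12.34 do total = total + i1;"))
```

Keywords are recognized after matching an identifier, a common design that keeps the regular expressions small.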

Syntactic analysis

• Attempts to build a valid parse tree from the grammatical description of the language.

[Figure: parse tree for the statement "Distance = rate * time;". The root S has the children id, =, E and ;. The E node expands to E * E, and each inner E derives an id.]
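As a rough illustration, the tree for this statement can be built by a small hand-written parser. The grammar `S -> id = E ;` and `E -> E * E | id` is read off the figure; the nested-tuple tree layout and the name `parse_statement` are assumptions made for this sketch, and the left-recursive rule is handled with a loop rather than recursion.

```python
def parse_statement(tokens):
    """Parse  S -> id = E ;  and return the parse tree as nested tuples."""
    pos = 0

    def expect_id():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            raise SyntaxError(f"identifier expected at token {pos}")
        pos += 1
        return ("id", tokens[pos - 1])

    def expect(symbol):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != symbol:
            raise SyntaxError(f"{symbol!r} expected at token {pos}")
        pos += 1

    def parse_expression():
        # E -> E * E | id, handled iteratively to avoid left recursion.
        nonlocal pos
        tree = ("E", expect_id())
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            tree = ("E", tree, "*", ("E", expect_id()))
        return tree

    target = expect_id()
    expect("=")
    expression = parse_expression()
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("S", target, "=", expression, ";")

tree = parse_statement("Distance = rate * time ;".split())
print(tree)
```

A statement that does not fit the grammar, such as `Distance = ;`, raises a `SyntaxError` instead of producing a tree.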

Semantic Analysis

• The semantics of a program is its meaning.
• It is possible to have a syntactically valid program that does not have any meaning.

• Semantic analysis has two parts:
– Semantic checking: Validating the semantics of a syntactically valid program and gathering information about the meaning of its constituents (attributes).
– Semantic translation: Giving a meaning to a program using a pre-established language, typically a syntax tree decorated with attributes. This is often called an intermediate representation.

Semantic Translation: example

• Breaks the statements into small pieces corresponding roughly to machine instructions.

x = a*y+z;

t1 = a*y;
t2 = t1+z;
x = t2;
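The translation above can be reproduced by a short tree walk. In this sketch the expression is assumed to be available as nested `(operator, left, right)` tuples, and the function name `to_three_address` is hypothetical.

```python
def to_three_address(target, expression):
    """Flatten a nested (op, left, right) expression into small statements."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, str):          # a plain variable needs no temporary
            return node
        op, left, right = node
        left_name = emit(left)             # operands are translated first
        right_name = emit(right)
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} = {left_name}{op}{right_name};")
        return temp

    result = emit(expression)
    code.append(f"{target} = {result};")
    return code

# x = a*y+z;
for line in to_three_address("x", ("+", ("*", "a", "y"), "z")):
    print(line)
```

Each inner node of the tree yields exactly one statement with a fresh temporary, which is why the output is longer than strictly necessary.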

High-Level Optimization

• The generated intermediate representation is often inefficient because of bad structure or redundancy.

• This kind of optimization is not bound to the target machine’s architecture.

t1 = a*y;
t2 = t1+z;
x = t2;

is optimized into:

t1 = a*y;
x = t1+z;
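One way to obtain this result is to fold a temporary that is only copied once into the statement that copies it. The sketch below is one possible approach, not necessarily the one used in the course; it assumes statements are stored as `(destination, operands)` pairs and that compiler temporaries are named `t` followed by digits.

```python
def is_temporary(name):
    # Assumption: compiler-generated temporaries are named t1, t2, ...
    return name[0] == "t" and name[1:].isdigit()

def remove_redundant_copies(code):
    """Fold 'tN = expr; x = tN;' into 'x = expr;' when tN has no other use."""
    uses = {}
    for _, operands in code:
        for name in operands:
            uses[name] = uses.get(name, 0) + 1

    result = []
    for dest, operands in code:
        if result and len(operands) == 1:          # a plain copy: dest = source
            prev_dest, prev_operands = result[-1]
            if (operands[0] == prev_dest and is_temporary(prev_dest)
                    and uses[prev_dest] == 1):
                result[-1] = (dest, prev_operands)  # reuse the computation
                continue
        result.append((dest, operands))
    return result

before = [("t1", ("a", "*", "y")), ("t2", ("t1", "+", "z")), ("x", ("t2",))]
print(remove_redundant_copies(before))
```

Nothing in this pass refers to registers or instructions, which is what makes it a high-level optimization.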

Target Code Generation

• Translates the optimized intermediate representation into the target code (normally machine language or assembler).

t1 = a*y;
x = t1+z;

LE 4,a     a in register 4
ME 4,y     multiply by y
AE 4,z     add z
STE 4,x    store register 4 in x
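A deliberately limited sketch of this step is shown below. It reuses the mnemonics from the example, keeps everything in a single register, and assumes that each temporary is consumed as the left operand of the very next statement; anything else would require spilling to memory and is rejected. The names `generate` and `is_temporary` are hypothetical.

```python
ARITHMETIC = {"*": "ME", "+": "AE"}   # mnemonics as they appear in the example

def is_temporary(name):
    # Assumption: compiler-generated temporaries are named t1, t2, ...
    return name[0] == "t" and name[1:].isdigit()

def generate(code, register=4):
    """Emit single-register target code for a chain of statements."""
    lines = []
    in_register = None                 # name whose value the register holds
    for dest, (left, op, right) in code:
        if is_temporary(right) or (is_temporary(left) and left != in_register):
            raise NotImplementedError("temporary would need to be spilled")
        if left != in_register:
            lines.append(f"LE {register},{left}")       # load left operand
        lines.append(f"{ARITHMETIC[op]} {register},{right}")
        in_register = dest
        if not is_temporary(dest):
            lines.append(f"STE {register},{dest}")      # store user variable
    return lines

optimized = [("t1", ("a", "*", "y")), ("x", ("t1", "+", "z"))]
for line in generate(optimized):
    print(line)
```

Because `t1` stays in the register between the two statements, it never needs a memory location.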

Passes, Front End and Back End

• A pass consists of reading a high-level version of the program and writing a new lower-level version.

• Several passes are often needed:
– To resolve forward references
– To limit the memory used by the different phases.

Low-Level Optimization

• The generated target code is analyzed for inefficiencies such as dead code or code redundancy.

• Care is taken to exploit as much as possible the CPU’s capabilities.

• This phase is heavily architecture dependent.

• Lots of research is still done in this very complex area.

Passes, Front End and Back End

• The front-end is composed of: Lexical, Syntactic, Semantic analysis and High-level optimization.
• In most compilers, most of the front-end is driven by the Syntactic analyzer.
• It calls the Lexical analyzer for tokens and generates an abstract syntax tree when syntactic elements are recognized.
• The generated tree (or other intermediate representation) is then analyzed and optimized in a separate process.
• The front-end has little or no concern with the target machine.

Passes, Front End and Back End

• The back-end is composed of: Code generation and low-level optimization.

• Uses the intermediate representation generated by the front-end to generate target machine code.

• Heavily dependent on the target machine.

• Independent of the programming language compiled.

System Support

• Symbol table
– Central repository of identifiers (variable or function names) used in the compiled program.
– Contains information such as the data type or value in the case of constants.
– Used to identify undeclared or multiply declared identifiers, as well as type mismatches.
– Provides temporary variables for intermediate code generation.
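The four roles listed above can be sketched with a dictionary-backed class. The class and method names, the flat (single-scope) layout, and the `tN` naming of temporaries are all assumptions for illustration; a real symbol table usually also handles nested scopes.

```python
class SymbolTable:
    """Single-scope symbol table keyed by identifier name."""

    def __init__(self):
        self._entries = {}
        self._temp_count = 0

    def declare(self, name, data_type, value=None):
        if name in self._entries:
            raise NameError(f"multiply declared identifier: {name}")
        self._entries[name] = {"type": data_type, "value": value}

    def lookup(self, name):
        if name not in self._entries:
            raise NameError(f"undeclared identifier: {name}")
        return self._entries[name]

    def check_assignment(self, target, source):
        if self.lookup(target)["type"] != self.lookup(source)["type"]:
            raise TypeError(f"type mismatch between {target} and {source}")

    def new_temp(self, data_type):
        # Skip names already taken by user identifiers.
        while True:
            self._temp_count += 1
            name = f"t{self._temp_count}"
            if name not in self._entries:
                break
        self.declare(name, data_type)
        return name

table = SymbolTable()
table.declare("x", "int")
table.declare("pi", "float", value=3.14)   # a constant keeps its value
temp = table.new_temp("int")
table.check_assignment("x", temp)          # both are int, so no error
print(temp, table.lookup("pi"))
```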

System Support

• Error handling procedures
– Implement the compiler’s response to errors in the code it is compiling.
– Provide useful insight to the user about where the error is and what it is.
– Should find all errors in the whole program.
– Can attempt to correct some errors and only give a warning.

System Support

• Run-time system
– Some programming language concepts raise the need for dynamic memory allocation. What are they?
– The running program must then be able to manage its own memory use.
– Some will require a stack, others a heap. These are managed by the run-time system.

Writing of Early Compilers

• The first C compiler

[Figure: minimal C compiler source → assembler → executable C compiler (minimal); full C compiler source → C compiler (minimal) → executable C compiler (full)]

Writing Cross-Compilers

• A Unix-Macintosh C cross compiler

[Figure: Mac C compiler source code in Unix C → Unix C compiler → Mac C compiler usable on Unix; Mac C compiler source code in Unix C → Mac C compiler usable on Unix → Mac C compiler usable on Mac]

Writing Retargetable Compilers

• Two methods:
– Make a strict distinction between front-end and back-end, then use different back-ends.
– Generate code for a virtual machine, then build a compiler or interpreter to translate virtual machine code to a specific machine code. That is what we do in the project.

Summary

• The first compiler was the assembler, a one-to-one direct translator.

• Complex compilers were written incrementally, first using assemblers.

• The fundamental compilation techniques have been well known since the 1960s and early 1970s.

Summary

• The compilation process is divided into phases.

• The input of a phase is the output of the previous phase.

• It can be seen as a pipeline, where the phases are filters that successively transform the input program into an executable.
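The pipeline view can be illustrated by chaining functions so that each one consumes the previous one's output. The three phases below are toy stand-ins that only handle statements of the form `X=Y+Z;` and emit the `L`/`A`/`ST` instructions from the earlier example; all function names are hypothetical.

```python
import re
from functools import reduce

def pipeline(*phases):
    """Chain phases so that each receives the output of the previous one."""
    return lambda program: reduce(lambda data, phase: phase(data), phases, program)

# Toy stand-ins for real phases, only meant to show the data flow.
def lexical_analysis(source):
    return re.findall(r"\w+|\S", source)          # characters -> tokens

def syntactic_analysis(tokens):
    if len(tokens) != 6 or tokens[1] != "=" or tokens[5] != ";":
        raise SyntaxError("expected: id = id + id ;")
    target, _, left, op, right, _ = tokens
    return (target, (left, op, right))            # tokens -> tree

def code_generation(tree):
    target, (left, op, right) = tree
    if op != "+":
        raise NotImplementedError("only addition is handled in this sketch")
    return [f"L 3,{left}", f"A 3,{right}", f"ST 3,{target}"]

compile_statement = pipeline(lexical_analysis, syntactic_analysis, code_generation)
print(compile_statement("X=Y+Z;"))
```

Swapping `code_generation` for another function with the same input type changes the target without touching the earlier phases.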