compilers and language translation gordon college

Post on 15-Dec-2015

223 Views

Category:

Documents

4 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Compilers and Language Translation

Gordon College

2

What’s a compiler? All computers only understand machine language

Therefore, high-level language instructions must be translated into machine language prior to execution

10000010010110100100101……

This isa program

3

What’s a compiler? Compiler

A piece of system software that translates high-level languages into machine language

10000010010110100100101……

Congrats!

while (c!='x') { if (c == 'a' || c == 'e' || c == 'i')

printf("Congrats!"); else if (c!='x') printf("You Loser!"); }

Compiler

gcc -o prog program.c

program.c

prog

4

Assembler (a kind of compiler)

LOAD X Assembly

0101 0000 0000 1001 Machine Language

(symbol table)(opcode table)

One-to-one translation

5

Compiler (high-level language translator)

a = b + c - d;

One-to-many translation

0101 00001110001 LOAD B0111 00001110010 ADD C0110 00001110011 SUBTRACT D0100 00001110100 STORE A

0101 00001110001 0111 00001110010…….

6

Goals of a compiler Code produced must be correct

A = (B+C)-(D+E);

Possible translation:LOAD BADD CSTORE BLOAD DADD ESTORE DLOAD BSUBTRACT DSTORE A

No - STORE B and STORE D changes the values of variables B and D which is the high-level language does not intend

Is this correct?

7

Goals of a compiler Code produced should be reasonably efficient

and conciseCompute the sum - 2x1+ 2x2+ 2x3+ 2x4+…. 2x50000

sum = 0.0for(i=0;i<50000;i++) {

sum = sum + (2.0 * x[i]);

Optimizing compiler:sum = 0.0for(i=0;i<50000;i++) {

sum = sum + x[i];sum = sum * 2.0; 49,999 less instructions

8

General Structure of a Compiler

9

The Compilation Process Phase I: Lexical analysis

Compiler examines the individual characters in the source program and groups them into syntactical units called tokens

Phase II: Parsing

The sequence of tokens formed by the scanner is checked to see whether it is syntactically correct

ScannerSourcecode

Groups of

tokens

ParserGroups of

tokens

correct

not correct

10

The Compilation Process

Phase III: Semantic analysis and code generation The compiler analyzes the meaning of the

high-level language statement and generates the machine language instructions to carry out these actions

Code Generator

Groupsof

tokens

Machinelanguage

11

The Compilation Process Phase IV: Code optimization

The compiler takes the generated code and sees whether it can be made more efficient

Code Optimizer Machine

languageMachinelanguage

12

Overall Execution Sequence on a High-Level Language Program

13

The Compilation Process

Source program

Original high-level language program

Object program

Machine language translation of the source program

14

Phase I: Lexical Analysis Lexical analyzer

The program that performs lexical analysis

More commonly called a scanner

Job of lexical analyzer Group input characters into tokens

• Tokens: Syntactical units that are treated as single, indivisible entities for the purposes of translation

Classify tokens according to their type

15

Phase I: Lexical AnalysisProgram statement

sum = sum + a[i];

Digital perspective:

tab,s,u,m,blank,=,blank,s,u,m,blank,+,blank,a,[,i,],;

Tokenized:

sum,=,sum,+,a[i],;

16

Phase I: Lexical Analysis

TOKEN TYPE CLASSIFICATION NUMBERSymbol 1Number 2= 3+ 4- 5; 6== 7If 8Else 9( 10) 11[ 12] 13…

Typical Token Classifications

17

Phase I: Lexical Analysis Lexical Analysis Process

1. Discard blanks, tabs, etc. - look for beginning of token.2. Put characters together 3. Repeat step 2 until end of token4. Classify and save token5. Repeat steps 1-4 until end of statement6. Repeat steps 1-5 until end of source code

Scannersum=sum+a[i];

sum 1= 3+ 4a 1[ 12i 1] 13; 6

18

Phase I: Lexical Analysis

Input to a scanner- A high-level language statement from the source program

Scanner’s output- A list of all the tokens in that statement- The classification number of each token found

Scannersum=sum+a[i];

sum 1= 3+ 4a 1[ 12i 1] 13; 6

19

Phase II: Parsing

Parsing phase

A compiler determines whether the tokens recognized by the scanner are a syntactically legal statement

Performed by a parser

20

Phase II: Parsing

Output of a parser

A parse tree, if such a tree exists

An error message, if a parse tree cannot be constructed

Successful construction of a parse tree is proof that the statement is correctly formed

21

Example High-level language statement: a = b + c

22

Grammars, Languages, and BNF

Syntax

The grammatical structure of the language

The parser must be given the syntax of the language

BNF (Backus-Naur Form)Most widely used notation for representing the syntax of a programming language

literal_expression ::= integer_literal | float_literal | string | character

23

Grammars, Languages, and BNF

In BNF

The syntax of a language is specified as a set of rules (also called productions)

A grammar

• The entire collection of rules for a language

Structure of an individual BNF rule

left-hand side ::= “definition”

24

Grammars, Languages, and BNF

BNF rules use two types of objects on the right-hand side of a production Terminals

• The actual tokens of the language

• Never appear on the left-hand side of a BNF rule Nonterminals

• Intermediate grammatical categories used to help explain and organize the language

• Must appear on the left-hand side of one or more rules

25

Grammars, Languages, and BNF

Goal symbol

The highest-level nonterminal

The nonterminal object that the parser is trying to produce as it builds the parse tree

All nonterminals are written inside angle brackets

Java BNF

26

BNF Example<postal-address> ::= <name-part> <street-address> <zip-part> <name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL> | <personal-part> <name-part> <personal-part> ::= <first-name> | <initial> "." <street-address> ::= <opt-apt-num> <house-num> <street-name> <EOL><zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL><opt-jr-part> ::= "Sr." | "Jr." | <roman-numeral> | ""

Identify the following:Goal symbol, terminals, nonterminals, a individual rule

Is this a legal postal address?Steve Moses Sr. 215 Rose Ave.Everywhere, NC 43563

27

Parsing Concepts and Techniques Fundamental rule of parsing:

By repeated applications of the rules of the grammar-

If the parser can convert the sequence of input tokens into the goal symbol the sequence of tokens is a syntactically valid

statement of the languageelse

the sequence of tokens is not a syntactically valid statement of the language

28

Parsing Example<httpaddress> ::= http:// <hostport> [ / <path> ] [ ? <search> ]<hostport> ::= <host> [ : <port> ]<host> ::= <hostname> | <hostnumber><hostname> ::= <ialpha> [ . <hostname> ]<hostnumber> ::= <digits> . <digits> . <digits> . <digits><port> ::= <digits><path> ::= <void> | <xpalphas> [ / <path> ]<search> ::= <xalphas> [ + <search> ]<xalpha> ::= <alpha> | <digit> | <safe> | <extra> | <escape> <xalphas> ::= <xalpha> [ <xalphas> ]<xpalpha> ::= <xalpha> | +<xpalphas> ::= <xpalpha> [ <xpalpha> ]<ialpha> ::= <alpha> [ <xalphas> ]<alpha> ::= a | b | … | z | A | B | … | Z<digit> ::= 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9<safe> ::= $ | - | _ | @ | . | & | ~<extra> ::= ! | * | " | ' | ( | ) | : | ; | , | <space><escape> ::= % <hex> <hex><hex> ::= <digit> | a | b | c | d | e | f | A | B | C | D | E | F<digits> ::= <digit> [ <digits> ]<void> ::=

Is the following http address legal:http://www.csm.astate.edu/~rossa/cs3543/bnf.html

29

Parsing Concepts and Techniques

Look-ahead parsing algorithms - intelligent parsers

One of the biggest problems in building a compiler is designing a grammar that:

Includes every valid statement that we want to be in the language

Excludes every invalid statement that we do not want to be in the language

30

Parsing Concepts and Techniques

Another problem in constructing a compiler: Designing a grammar that is not ambiguous

An ambiguous grammar allows the construction of two or more distinct parse trees for the same statement

NOT GOOD - multiple interpretations

31

Phase III: Semantics and Code Generation

Semantic analysis

The compiler makes a first pass over the parse tree to determine whether all branches of the tree are semantically valid

• If they are validthe compiler can generate machine language

instructionselse

there is a semantic error; machine language instructions are not generated

32

Phase III: Semantics and Code Generation

Semantic analysis

Syntactically correct, but semantically incorrect

example:

sum = a + b;

int a;double sum; data type mismatchchar b;

Semantic recordsa integersum doubleb char

33

Phase III: Semantics and Code Generation

Semantic analysis

a integerb char

temp ?

<expression> + <expression>

<expression>

Parse tree

Semantic recordSemantic record

Semantic record

34

Phase III: Semantics and Code Generation

Semantic analysis

a integerb integer

temp integer

<expression> + <expression>

<expression>

Parse tree

Semantic recordSemantic record

Semantic record

35

Phase III: Semantics and Code Generation

Code generation

Compiler makes a second pass over the parse tree to produce the translated code

36

Phase IV: Code Optimization Two types of optimization

Local Global

Local optimization The compiler looks at a very small block of

instructions and tries to determine how it can improve the efficiency of this local code block

Relatively easy; included as part of most compilers:

37

Phase IV: Code Optimization

Examples of possible local optimizations

Constant evaluation x = 1 + 1 ---> x = 2

Strength reduction x = x * 2 ---> x = x + x

Eliminating unnecessary operations

38

Phase IV: Code Optimization

Global optimization The compiler looks at large segments of the program

to decide how to improve performance Much more difficult; usually omitted from all but the

most sophisticated and expensive production-level “optimizing compilers”

Optimization cannot make an inefficient algorithm efficient - “only makes an efficient algorithm more efficient”

39

Summary A compiler is a piece of system software that

translates high-level languages into machine language

Goals of a compiler: Correctness and the production of efficient and concise code

Source program: High-level language program

40

Summary Object program: The machine language translation

of the source program

Phases of the compilation process

Phase I: Lexical analysis

Phase II: Parsing

Phase III: Semantic analysis and code generation

Phase IV: Code optimization

top related