241-437 Compilers: Attr. Grammars/8 Compiler Structures Objective describe semantic analysis with attribute grammars, as applied in yacc and recursive descent parsers 241-437, Semester 1, 2011-2012 8. Attribute Grammars

Upload: griffin-willis

Post on 28-Dec-2015


241-437 Compilers: Attr. Grammars/8 1

Compiler Structures

• Objective
  – describe semantic analysis with attribute grammars, as applied in yacc and recursive descent parsers

241-437, Semester 1, 2011-2012

8. Attribute Grammars


Overview

1. What is an Attribute Grammar?

2. Parse Tree Evaluation

3. Attributes

4. Attribute Grammars and yacc

5. A Grid Grammar

6. Recursive Descent and Attributes

In this lecture

[Figure: compiler structure. Source Program → Front End (Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Int. Code Generator) → Intermediate Code → Back End (Code Optimizer, Target Code Generator) → Target Lang. Prog. This lecture concentrates on attribute grammars.]

1. What is an Attribute Grammar?

• An attribute grammar is a context free grammar with semantic actions attached to some of the productions
  – semantic = meaning

• An action specifies the meaning of a production in terms of its body terminals and nonterminals.

Example Attribute Grammar

Production        Semantic Action
L → E             printf(Ebody.val)
E → E + T         E.val := Ebody.val + Tbody.val
E → T             E.val := Tbody.val
T → T * F         T.val := Tbody.val * Fbody.val
T → F             T.val := Fbody.val
F → ( E )         F.val := Ebody.val
F → num           F.val := value(num)


2. Parse Tree Evaluation

• One way of understanding semantic actions is as extra information (attributes) attached to the nodes of the parse tree for the input.

• The semantic action specifies the parent node attribute in terms of the attributes of its children.

Basic Parse Tree
Input: 9 * 5 + 2

Grammar:
  L → E
  E → E + T
  E → T
  T → T * F
  T → F
  F → ( E )
  F → num

Parse tree (children are indented under their parent):
  L
    E
      E
        T
          T
            F
              9
          *
          F
            5
      +
      T
        F
          2

Adding Meaning to the Tree

• What is the meaning of "9 * 5 + 2"?
  – the answer is to evaluate it, to get 47

• Add attributes to the tree, starting from the leaves and working up to the root
  – use the semantic actions to get the attribute values

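Parse Tree with Actions

Evaluate bottom-up. Each node shows its attribute value and the semantic action that produced it:

  L                          printf(Ebody.val), prints 47
    E  (val = 47)            E.val := Ebody.val + Tbody.val
      E  (val = 45)          E.val := Tbody.val
        T  (val = 45)        T.val := Tbody.val * Fbody.val
          T  (val = 9)       T.val := Fbody.val
            F  (val = 9)     F.val := value(num)
              9
          *
          F  (val = 5)       F.val := value(num)
            5
      +
      T  (val = 2)           T.val := Fbody.val
        F  (val = 2)         F.val := value(num)
          2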

3. Attributes

• Attribute values can be
  – numbers, strings, any data structures, code, assembly language instructions

• It's not always necessary to build a parse tree in order to evaluate the grammar's action.

Kinds of Attribute

• There are two main kinds of attribute evaluation:
  – synthesized and inherited attributes

• The value of a synthesized attribute is calculated by using its body values
  – as in the previous example

Synthesized Attributes in a Tree

• Example:
  Production        Semantic Action
  T → T * F         T.val := Tbody.val * Fbody.val

  T  (val = 45)
    T  (val = 9)
    *
    F  (val = 5)

  Evaluate bottom-up.

Inherited Attributes

• An inherited attribute for a body symbol (i.e. terminal, non-terminal) gets its value from the other body symbols and the parent value
  – often used for evaluating more complex programming language features

Inherited Attributes in a Tree

• Two examples:
  X.x := function(A.a, Y.y)
  Y.y := function(A.a, X.x)

[Figure: two trees, each with parent A.a and children X.x and Y.y; arrows show the direction of evaluation, from the parent and one body symbol into the other body symbol.]

4. Attribute Grammars and yacc

• yacc supports (synthesized) attribute grammars
  – yacc actions are semantic actions
  – no parse tree is needed, since yacc evaluates the actions using the parser's built-in stack

expr.y Again

%token NUMBER                        /* declarations */

%%

exprs: expr '\n'          { printf("Value = %d\n", $1); }
     | exprs expr '\n'    { printf("Value = %d\n", $2); }
     ;

/* the { ... } blocks are actions; $$, $1, $3 are attributes */
expr: expr '+' term       { $$ = $1 + $3; }
    | expr '-' term       { $$ = $1 - $3; }
    | term                { $$ = $1; }
    ;

/* more actions */
term: term '*' factor     { $$ = $1 * $3; }
    | term '/' factor     { $$ = $1 / $3; }   /* integer division */
    | factor
    ;

factor: '(' expr ')'      { $$ = $2; }
      | NUMBER
      ;

%%

/* C code */
#include "lex.yy.c"

int yyerror(char *s)
{
  fprintf(stderr, "%s\n", s);
  return 0;
}

int main(void)
{
  yyparse();    // the syntax analyzer
  return 0;
}

Evaluation in yacc
Input: 3 * 5 + 4\n

Stack        Input       Stack Action         val_     Semantic Action
$            3*5+4\n$    shift
$ 3          *5+4\n$     reduce F → num       3        $$ = $1 (implicit)
$ F          *5+4\n$     reduce T → F         3        $$ = $1 (implicit)
$ T          *5+4\n$     shift                3
$ T *        5+4\n$      shift                3
$ T * 5      +4\n$       reduce F → num       3 5      $$ = $1 (implicit)
$ T * F      +4\n$       reduce T → T * F     3 5      $$ = $1 * $3
$ T          +4\n$       reduce E → T         15       $$ = $1 (implicit)
$ E          +4\n$       shift                15
$ E +        4\n$        shift                15
$ E + 4      \n$         reduce F → num       15 4     $$ = $1 (implicit)
$ E + F      \n$         reduce T → F         15 4     $$ = $1 (implicit)
$ E + T      \n$         reduce E → E + T     15 4     $$ = $1 + $3
$ E          \n$         shift                19
$ E \n       $           reduce Es → E \n     19       printf $1
$ Es         $           accept               19

5. A Grid Grammar

• A robot starts at (0,0) on a grid, and is given compass directions:
  – n = north, s = south, e = east, w = west

• Evaluate the sequence of directions to work out the final position of the robot.

Example

• The robot receives the directions:
  – n e e n n w
  – what is the 'meaning' (semantics) of the directions?
  – the 'meaning' is the final robot position, (1,3)

[Figure: grid showing the robot's path from the start to the final position, with a compass (n, e, w, s).]

5.1. Grid Grammar
Input: n w s s

  robot → path
  path → path step | ε
  step → e | w | s | n

Parse tree (children are indented under their parent):
  robot
    path
      path
        path
          path
            path   (empty)
            step
              n
          step
            w
        step
          s
      step
        s

Grid Attribute Grammar

Production            Semantic Actions
robot → path          printf( pathbody.(x,y) )
path → path step      path.x := pathbody.x + stepbody.dx
                      path.y := pathbody.y + stepbody.dy
path → ε              path.(x,y) := (0,0)
step → e              step.(dx,dy) := (1,0)
step → w              step.(dx,dy) := (-1,0)
step → s              step.(dx,dy) := (0,-1)
step → n              step.(dx,dy) := (0,1)


Data Types

• The path rules use (x,y), the position of the robot.

• The step rules use (dx,dy), the step taken by the robot.

• Implementing these data types requires new features of yacc.


Parse Tree with Actions
Input: n w s s

Evaluate bottom-up. Each path node shows its (x,y) value and each step node its (dx,dy) value:

  robot                      printf (-1,-1)
    path  (-1,-1)
      path  (-1,0)
        path  (-1,1)
          path  (0,1)
            path  (0,0)      (empty)
            step  (0,1)
              n
          step  (-1,0)
            w
        step  (0,-1)
          s
      step  (0,-1)
        s


5.2. Non-integer Yacc Attributes

• The default yacc attributes (e.g. $$, $1, etc) are integers.

• We want data structures for (x,y) and (dx,dy), coded as two struct types.

Defining New Types

• The new types are collected together inside a %union in the yacc definitions section:

  %union {
    type1 name1;
    type2 name2;
    . . .
  }

• For the grid:

  %union {
    struct { int x; int y; } pos;
    struct { int dx; int dy; } offset;
  }

Using the Types

• The non-terminals that return the new types must be listed.

• Any tokens that use the types must be listed.

• For the grid:
  %type <offset> step
  %type <pos> path
  – these non-terminals return values of the specified type

Using Typed Variables

• If an attribute (variable) is a record, then dotted-name notation is used to refer to its fields
  – e.g. $$.dx, $1.y

• The default action ($$ = $1) will cause an error if $$ and $1 are not the same type.

5.3. Grid Compiler

$ flex grid.l
$ bison grid.y
$ gcc grid.tab.c -o gridEval

[Figure: build pipeline. flex turns grid.l (a flex file) into lex.yy.c; bison turns grid.y (a bison file) into grid.tab.c, which #includes lex.yy.c; gcc compiles grid.tab.c into gridEval (a C executable).]

Usage

$ ./gridEval
nwss                       (typed by the user, followed by ctrl-D)
Robot is at (-1,-1)
$ ./gridEval
n n n w w w s e            (typed by the user, followed by ctrl-D)
Robot is at (-2,2)
$

grid.l

%%

[nN]      {return NORTH;}
[sS]      {return SOUTH;}
[eE]      {return EAST;}
[wW]      {return WEST;}

[ \n\t]   ;

%%

int yywrap(void) { return 1; }

grid.y

/* type definitions */
%union {
  struct { int x; int y; } pos;
  struct { int dx; int dy; } offset;
}

%token EAST WEST NORTH SOUTH

/* types used by the non-terminals */
%type <offset> step
%type <pos> path

%%

robot: path       { printf("Robot is at (%d,%d)\n", $1.x, $1.y); }
     ;

path: path step   { $$.x = $1.x + $2.dx;
                    $$.y = $1.y + $2.dy; }
    |             { $$.x = 0; $$.y = 0; }
    ;

step: EAST        { $$.dx = 1;  $$.dy = 0; }
    | WEST        { $$.dx = -1; $$.dy = 0; }
    | SOUTH       { $$.dx = 0;  $$.dy = -1; }
    | NORTH       { $$.dx = 0;  $$.dy = 1; }
    ;

%%

#include "lex.yy.c"

int yyerror(char *s)
{
  fprintf(stderr, "%s\n", s);
  return 0;
}

int main(void)
{
  yyparse();
  return 0;
}

6. Recursive Descent and Attributes

• It is easy to add semantic actions to a recursive descent parser
  – in many cases, there's no need for the parser to build a parse tree in order to evaluate the attributes

• The basic translation strategy:
  – each production becomes a function

• The function (e.g. f()) calls other functions representing its body non-terminals
  – those functions return values (attributes) to f()
  – f() combines the values, and returns a value (attribute)

6.1. The Expressions Parser Again

• The basic LL(1) grammar:
  Stats => ( [ Stat ] \n )*
  Stat  => let ID = Expr | Expr
  Expr  => Term ( (+ | - ) Term )*
  Term  => Fact ( (* | / ) Fact )*
  Fact  => '(' Expr ')' | Int | Id

An Expressions Program (test3.txt)

5 + 6                         (give answer)
let x = 2                     (declare variable)
3 + ( (x*y)/2)   // comments
// y
let x = 5
let y = x /0                  (error)
// comments

6.2. Parsing with Actions

• exprParse1.c is a recursive descent parser for the expressions language.

• It differs from exprParse0.c by having semantic actions attached to its productions
  – these actions evaluate the expressions, and assign values to expression variables

Grammar with Actions

Productions                            Actions
Stats => ( [ Stat ] \n )*              ---

Stat => let ID = Expr                  add id to symbol table;
                                       id.val = expr.val;
                                       print( id.val );

Stat => Expr                           print( expr.val );

Expr => Term ( (+ | - ) Term )*        return term1.val (+| -) term2.val (+| -) ... termn.val;

Term => Fact ( (* | / ) Fact )*        return fact1.val (*| /) fact2.val (*| /) ... factn.val;

Fact => '(' Expr ')'                   return expr.val;

Fact => Int                            return int.val;

Fact => Id                             lookup id;
                                       if not found then add (id, 0) to table;
                                       return id.val;


The Symbol Table

• The symbol table is a data structure used to store expression variables and their values.

• In exprParse1.c, it's an array of structs, with each struct holding the name of the variable and its current integer value.

6.3. Usage

$ gcc -Wall -o exprParse1 exprParse1.c
$ ./exprParse1 < test3.txt
== 11
x being declared
x = 2
y being declared
== 3
x = 5
Error: Division by zero; using 1 instead
y = 5
$

6.4. exprParse1.c Callgraph

[Figure: call graph of exprParse1.c, with three callout labels: "same as in exprParse0.c", "symbol table (new)", and "generated from grammar (now with actions)".]

6.5. Symbol Table Data Structures

#define MAX_SYMS 15     // max no of variables

typedef struct SymInfo {
  char *id;             // name of variable
  int value;            // value (an integer)
} SymbolInfo;

int symNum = 0;         // number of symbols stored
SymbolInfo syms[MAX_SYMS];

[Figure: the syms[] array, with slots 0 to 14, each holding an id and a value.]

Symbol Table Functions

SymbolInfo *getIDEntry(void)
/* find _OR_ create symbol table entry for
   current tokString; return a pointer to it */
{
  SymbolInfo *si = NULL;
  if ((si = lookupID(tokString)) != NULL)   // already declared
    return si;

  // add id to table
  printf("%s being declared\n", tokString);
  return addID(tokString, 0);   // 0 is default value
} // end of getIDEntry()

SymbolInfo *lookupID(char *nm)
/* is nm in the symbol table?
   return pointer to struct or NULL */
{
  int i;
  for (i = 0; i < symNum; i++)
    if (!strcmp(syms[i].id, nm))
      return &syms[i];
  return NULL;
} // end of lookupID()

SymbolInfo *addID(char *nm, int value)
/* add nm and value to the symbol table;
   return pointer to struct */
{
  if (symNum == MAX_SYMS) {
    printf("Symbol table full; cannot add %s\n", nm);
    exit(1);
  }

  syms[symNum].id = (char *) malloc(strlen(nm)+1);
  strcpy(syms[symNum].id, nm);
  syms[symNum].value = value;
  SymbolInfo *si = &syms[symNum];
  symNum++;

  return si;
} // end of addID()

Using the Symbol Table

• The grammar functions use the symbol table via the matchId() function.

SymbolInfo *matchId(void)
// checks current ID with symbol table
{
  SymbolInfo *si;
  dprint("Parsing ident\n");
  if ((si = getIDEntry()) == NULL) {
    printf("Error: id is NULL on line %d\n", lineNum);
    exit(1);
  }
  match(ID);   // ok, so consume ID token
  return si;
} // end of matchId()


6.6. Translating the Grammar Rules

• The same translation is carried out as before, but the code is augmented with actions.

• The semantic actions are translated into extra C code in the grammar functions.

The Grammar Functions

• main() and statements() are unchanged from exprParse0.c since they don't have any semantic actions.

• Functions with extra actions:
  – statement(), expression(), term(), factor()

Unchanged Functions

int main(void)
{
  nextToken();
  statements();
  match(SCANEOF);
  return 0;
}

void statements(void)
// statements ::= { [ statement ] '\n' }
{
  dprint("Parsing statements\n");
  while (currToken != SCANEOF) {
    if (currToken != NEWLINE)
      statement();
    match(NEWLINE);
  }
} // end of statements()

statement() Before and After

With no semantic actions:

void statement(void)
// statement ::= ( 'let' ID '=' EXPR ) | EXPR
{
  if (currToken == LET) {
    match(LET);
    match(ID);
    match(ASSIGNOP);
    expression();
  }
  else
    expression();
} // end of statement()

With semantic actions:

void statement(void)
// statement ::= ( 'let' ID '=' EXPR ) | EXPR
{
  SymbolInfo *si;
  int value;
  dprint("Parsing statement\n");
  if (currToken == LET) {
    match(LET);
    si = matchId();      // was match(ID);
    match(ASSIGNOP);
    value = expression();
    si->value = value;
    printf("%s = %d\n", si->id, value);
  }
  else {   // expression
    value = expression();
    printf("== %d\n", value);
  }
}

Actions: add id to table; id.val = expr.val; print( id.val ); or print( expr.val );

expression() Before and After

With no semantic actions:

void expression(void)
// expression ::= term ( ('+'|'-') term )*
{
  term();
  while ((currToken == PLUSOP) || (currToken == MINUSOP)) {
    match(currToken);
    term();
  }
} // end of expression()

With semantic actions:

int expression(void)
// expression ::= term ( ('+'|'-') term )*
{
  int result, v2;
  int isAddOp;

  dprint("Parsing expression\n");
  result = term();
  while ((currToken == PLUSOP) || (currToken == MINUSOP)) {
    isAddOp = (currToken == PLUSOP) ? 1 : 0;
    match(currToken);
    v2 = term();
    if (isAddOp == 1)   // addition
      result += v2;
    else                // subtraction
      result -= v2;
  }
  return result;
} // end of expression()

Action: return term1.val (+| -) term2.val (+| -) ... termn.val;

term() Before and After

With no semantic actions:

void term(void)
// term ::= factor ( ('*'|'/') factor )*
{
  factor();
  while ((currToken == MULTOP) || (currToken == DIVOP)) {
    match(currToken);
    factor();
  }
} // end of term()

With semantic actions:

int term(void)
// term ::= factor ( ('*'|'/') factor )*
{
  int result, v2;
  int isMultOp;
  dprint("Parsing term\n");
  result = factor();
  while ((currToken == MULTOP) || (currToken == DIVOP)) {
    isMultOp = (currToken == MULTOP) ? 1 : 0;
    match(currToken);
    v2 = factor();
    if (isMultOp == 1)   // multiplication
      result *= v2;
    else {               // division
      if (v2 == 0)
        printf("Error: Division by zero; using 1 instead\n");
      else
        result = result / v2;
    }
  }
  return result;
} // end of term()

Action: return fact1.val (*| /) fact2.val (*| /) ... factn.val;

factor() Before and After

With no semantic actions:

void factor(void)
// factor ::= '(' expression ')' | INT | ID
{
  if (currToken == LPAREN) {
    match(LPAREN);
    expression();
    match(RPAREN);
  }
  else if (currToken == INT)
    match(INT);
  else if (currToken == ID)
    match(ID);
  else
    syntax_error(currToken);
} // end of factor()

With semantic actions:

int factor(void)
// factor ::= '(' expression ')' | INT | ID
{
  int result = 0;
  dprint("Parsing factor\n");
  if (currToken == LPAREN) {
    match(LPAREN);
    result = expression();
    match(RPAREN);
  }
  else if (currToken == INT) {
    match(INT);
    result = currTokValue;
  }
  else if (currToken == ID) {
    SymbolInfo *si = matchId();
    result = si->value;
  }
  else
    syntax_error(currToken);
  return result;
} // end of factor()

Actions: return expr.val; or return int.val; or add id to table (if new); return id.val;