edan65: compilers, lecture03 context...

Post on 20-Jan-2020

5 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

EDAN65:Compilers,Lecture 03

Context-free grammars,introductionto parsingGörelHedinRevised:2017-09-04

Courseoverview

Semantic analyzer

Intermediatecode generator

Optimizer

Targetcodegenerator

2

Lexical analyzer(scanner)

Syntactic analyzer(parser)

Regularexpressions

Context-freegrammar

Attributegrammar

machine

runtime system

stack

heap

codeanddata

objects

activationrecords

Interpreter

target code

tokens

Attributed AST

intermediate code

sourcecode (text)

AST(Abstractsyntaxtree)

intermediate code

garbagecollection

Virtualmachine

This lecture

Analyzing programtext

3

sum = sum + k ;

AssignStmt

Exp

Add

Exp Exp

ID EQ ID PLUS ID SEMIprogram text

tokens

parse tree

Recall:Generatingthecompiler:

Semantic analyzer

Lexical analyzer(scanner)

Syntactic analyzer(parser)

Regularexpressions

ScannergeneratorJFlex

Context-freegrammar

ParsergeneratorBeaver

Attributegrammar

Attribute evaluatorgenerator

We will use aparsergeneratorcalled Beaver

4

tokens

text

tree

5

Context-Free Grammars

Regular ExpressionsvsContext-Free Grammars

6

AnREcan have iteration

ACFGcan also have recursion(itispossible toderive asymbol,e.g.,Stmt,fromitself)

Example REs:WHILE = "while"ID = [a-z][a-z0-9]*LPAR = "("RPAR = ")"PLUS = "+"...

Example CFG:Stmt –> WhileStmtStmt –> AssignStmtWhileStmt –> WHILE LPAR Exp RPAR StmtExp –> IDExp –> Exp PLUS Exp...

ElementsofaContext-Free Grammar

7

Production rules:X –> s1 s2 … sn

where sk isasymbol(terminalornonterminal)

Nonterminal symbols

Terminalsymbols(tokens)

Startsymbol(one ofthenonterminals,usually theleft-handside of thefirst production)

Example CFG:Stmt –> WhileStmtStmt –> AssignStmtWhileStmt –> WHILE LPAR Exp RPAR StmtAssignStmt –> ID EQ Exp SEMIC…

Shorthand foralternatives

8

Stmt –> WhileStmtStmt –> AssignStmt

Stmt –> WhileStmt | AssignStmt

isequivalent to

Shorthand forrepetition

9

Stmt*

StmtList –> e | Stmt StmtList

isequivalent to

StmtList

where

ExerciseConstruct agrammar covering this programandsimilar ones:

10

Example program:while (k <= n) {sum = sum + k; k = k+1;}

CFG:Stmt –> WhileStmt | AssignStmt | CompoundStmtWhileStmt –> "while" "(" Exp ")" StmtAssignStmt –> ID "=" Exp ";"CompoundStmt –> ...Exp –> ...LessEq –> ...Add –> ...

(Often,simpletokensarewritten directly astextstrings)

SolutionConstruct agrammar covering this programandsimilar ones:

11

CFG:Stmt –> WhileStmt | AssignStmt | CompoundStmtWhileStmt –> "while" "(" Exp ")" StmtAssignStmt –> ID "=" Exp ";"CompoundStmt –> "{" Stmt* "}"Exp –> LessEq | Add | ID | INTLessEq –> Exp "<=" ExpAdd –> Exp "+" Exp

Example program:while (k <= n) {sum = sum + k; k = k+1;}

ParsingUse thegrammar toderive atree foraprogram:

12sum = sum + k ;

StmtExample program:sum = sum + k;

Startsymbol

Parse treeUse thegrammar toderive aparse tree foraprogram:

13sum = sum + k ;

Stmt

AssignStmt

Exp

Add

Exp Exp

Example program:sum = sum + k;

Nonterminalsareinnernodes

Startsymbol

Terminalsareleafs

Aparse tree includes allthetokensasleafs.

Corresponding abstractsyntaxtree(will bediscussed inlaterlecture)

14sum = sum + k ;

AssignStmt

Add

IdExp IdExp

Example program:sum = sum + k;

IdExp

Anabstractsyntaxtree issimilarto aparse tree,but simpler.

Itdoes notinclude allthetokens.

EBNFvsCanonical Form

15

EBNF:Stmt –> AssignStmt | CompoundStmtAssignStmt –> ID "=" Exp ";"CompoundStmt –> "{" Stmt* "}"Exp –> Add | IDAdd –> Exp "+" Exp

Canonical form:Stmt –> ID "=" Exp ";"Stmt –> "{" Stmts "}"Stmts –> eStmts –> Stmt StmtsExp –> Exp "+" ExpExp –> ID

(Extended)Backus-Naur Form:• Compact,easytoreadandwrite• EBNFhasalternatives,repetition,optionals,parentheses (likeREs)

• Commonnotationforpracticaluse

Canonical form:• Core formalismforCFGs• Useful forproving properties andexplaining algorithms

Realworldexample:TheJavaLanguageSpecification

16

See http://docs.oracle.com/javase/specs/jls/se8/html/index.html• See Chapter 2about theJavagrammar notation.• Lookatsome other chapters to see other syntaxexamples.

CompilationUnit:[PackageDeclaration]{ImportDeclaration}{TypeDeclaration}

PackageDeclaration:{PackageModifier}package Identifier {.Identifier};

PackageModifier:Annotation

17

Formaldefinitionof CFGs

FormaldefinitionofCFGs (canonical form)

18

Acontext-free grammar G=(N,T,P,S),whereN– thesetofnonterminalsymbolsT– thesetofterminalsymbolsP– thesetofproduction rules,each withtheform

X–>Y1 Y2 …Ynwhere X∈ N,n≥ 0,andYk∈ N∪ T

S– thestartsymbol(one ofthenonterminals).I.e.,S∈ N

So,theleft-hand side Xofarule isanonterminal.

Andtheright-hand side Y1 Y2 …Yn isasequence of nonterminalsandterminals.

If therhs foraproduction isempty,i.e.,n=0,we writeX–>e

AgrammarG defines alanguageL(G)

19

A context-free grammar G=(N,T,P,S),whereN– thesetofnonterminalsymbolsT– thesetofterminalsymbolsP– thesetofproduction rules,each withtheform

X–>Y1 Y2 …Ynwhere X∈ N,n≥ 0,andYk∈ N∪ T

S– thestartsymbol(one ofthenonterminals).I.e.,S∈ N

Gdefines alanguage L(G) overthealphabet T

T*isthesetofallpossible sequences ofTsymbols.

L(G)isthesubsetofT*thatcan bederived fromthestartsymbolS,byfollowing theproduction rules P.

Exercise

20

G = (N, T, P, S)

P = {Stmt –> ID "=" Exp ";",Stmt –> "{" Stmts "}" ,Stmts –> e ,Stmts –> Stmt Stmts ,Exp –> Exp "+" Exp ,Exp –> ID

}

N = { }

T = { }

S =

L(G) = {

}

Solution

21

G = (N, T, P, S)

P = {Stmt –> ID "=" Exp ";",Stmt –> "{" Stmts "}" ,Stmts –> e ,Stmts –> Stmt Stmts ,Exp –> Exp "+" Exp ,Exp –> ID

}

N = {Stmt, Exp, Stmts}

T = {ID, "=", "{", "}", ";", "+"}

S = Stmt

L(G) = {"{" "}","{" "{" "}" "}",ID "=" ID ";","{" ID "=" ID ";" "}",ID "=" ID "+" ID ";","{" "{" "}" "{" "}" "}","{" "{" "{" "}" "}" "}","{" ID "=" ID "+" ID ";" "}",ID "=" ID "+" ID "+" ID ";",...

}

Thesequences inL(G)areusually called sentences orstrings

22

Derivations

Derivationstep

23

If we have asequence ofterminalsandnonterminals,e.g.,

XaYY b

we can replace one ofthenonterminals,applying aproductionrule.Thisiscalled aderivationstep.(Swedish:Härledningssteg)

Supposethere isaproduction

Y–>Xa

andwe apply itforthefirstYinthesequence.We write thederivationstepasfollows:

XaYY b=>XaXaYb

Derivation

24

Aderivation,issimply asequence ofderivationsteps,e.g.:

g0 =>g1 =>…=>gn (n≥0)

where each gi isasequence ofterminalsandnonterminals

Ifthere isaderivationfromg0 togn,we can write thisas

g0 =>*gn

Sothismeans itispossible togetfromthesequence g0 tothesequence gn byfollowing theproduction rules.

Definitionofthelanguage L(G)

25

Recall that:

G=(N,T,P,S)

T* isthesetofallpossible sequences ofT symbols.

L(G)isthesubsetofT*thatcan bederived fromthestartsymbol S,byfollowing theproduction rules P.

Using theconcept ofderivations,we can formally define L(G) asfollows:

L(G)={w∈ T*| S=>*w}

Exercise:Prove thatasentence belongs toalanguage

26

Prove that

INT+INT*INT

Proof (byshowing allthederivationstepsfromthestartsymbolExp):

Exp=>

belongs tothelanguage ofthefollowing grammar:

p1: Exp –>Exp "+" Expp2: Exp –>Exp "*" Expp3: Exp –> INT

Solution:Prove thatasentence belongs toalanguage

27

Prove that

INT+INT*INT

Proof:(byshowing allthederivationstepsfromthestartsymbolExp)

Exp=>p1 Exp "+" Exp=>p3 INT"+"Exp=>p2 INT"+"Exp "*"Exp=>p3 INT"+"INT"*"Exp=>p3 INT"+"INT"*"INT

belongs tothelanguage ofthefollowing grammar:

p1: Exp –>Exp "+" Expp2: Exp –>Exp "*" Expp3: Exp –> INT

Leftmost andrightmost derivations

28

Inaleftmost derivation,theleftmost nonterminalisreplacedineach derivationstep,e.g.,:

Exp =>Exp "+" Exp =>INT"+"Exp =>INT"+"Exp "*"Exp =>INT"+"INT"*"Exp =>INT"+"INT"*"INT

LLparsingalgorithms use leftmost derivation.LRparsingalgorithms use rightmost derivation.Willbediscussed inlaterlectures.

Inarightmost derivation,therightmost nonterminalisreplaced ineach derivationstep,e.g.,:

Exp =>Exp "+" Exp =>Exp "+"Exp "*"Exp =>Exp "+"Exp "*"INT=>Exp "+"INT"*"INT=>INT"+"INT"*"INT

Aderivationcorresponds tobuilding aparse tree

29

Grammar:Exp –>Exp "+" ExpExp –>Exp "*" ExpExp –> INT

Example derivation:

Exp =>Exp "+" Exp =>INT"+"Exp =>INT"+"Exp "*"Exp =>INT"+"INT"*"Exp =>INT"+"INT"*"INT

Exercise:build theparse tree(also called derivationtree).

Aderivationcorresponds tobuilding aparse tree

30

Grammar:Exp –>Exp "+" ExpExp –>Exp "*" ExpExp –> INT

Example derivation:

Exp =>Exp "+" Exp =>INT"+"Exp =>INT"+"Exp "*"Exp =>INT"+"INT"*"Exp =>INT"+"INT"*"INT

Parse tree (derivationtree):

Exp

Exp Exp

Exp Exp

"+"

INT"*"

INT INT

31

Ambiguities

Exercise:Canwe do another derivationofthesamesentence,

thatgivesadifferentparse tree?

32

Grammar:Exp –>Exp "+" ExpExp –>Exp "*" ExpExp –> INT

Parse tree:

Anotherderivation:

Exp =>

Solution:Canwe do another derivationofthesamesentence,

thatgivesadifferentparse tree?

33

Grammar:Exp –>Exp "+" ExpExp –>Exp "*" ExpExp –> INT

Parse tree:

Exp

Exp"*"

INT

Exp

Exp Exp"+"

INT INT

Anotherderivation:

Exp =>Exp "*" Exp =>Exp "+"Exp "*"Exp =>INT"+"Exp "*"Exp =>INT"+"INT"*"Exp =>INT"+"INT"*"INT

Which parse tree would we prefer?

Ambiguous context-freegrammars

34

ACFGisambiguous if asentence inthelanguage can bederived bytwo (ormore)differentparse trees.

ACFGisunambiguous if each sentence inthelanguage canbederived byonly one parse tree.

(Swedish:tvetydig,otvetydig)

Note!There can bemany differentderivationsthat give thesameparse tree.

HowcanweknowifaCFGisambiguous?

35

Ifwe find anexample of anambiguity,we know thegrammar isambiguous.

There are algorithms fordeciding if aCFGbelongs to certainsubsets of CFGs,e.g.LL,LR,etc.(See laterlectures.)Thesegrammarsare unambiguous.

But inthegeneralcase,theproblemisundecidable:itisnotpossible to construct ageneralalgorithm that decidesambiguity foranarbitrary CFG.

Strategies foreliminating ambiguities,next lecture.

36

Parsing

Differentparsingalgorithms

37

Ambiguous

Unambiguous

Allcontext-freegrammars

LR

LL

LL:Left-to-rightscanLeftmost derivationBuilds tree top-downSimpleto understand

LR:Left-to-rightscanRightmost derivationBuilds tree bottom-upMore powerful

LLandLRparsers:main idea

38

...if IDthen ID=ID;ID...

LR(1):decidestobuildAssignafterseeingthefirsttokenfollowingitssubtree.Thetreeisbuiltbottomup.

Id Assign

Id Id

Thetokeniscalled lookahead.LL(k)andLR(k)use k lookahead tokens.

...if IDthen ID=ID;ID...

IfStmt

Id Assign

LL(1):decidestobuildAssignafterseeingthefirsttokenofitssubtree.Thetreeisbuilttopdown.

CompoundStmt

Recursive-descentparsingAwayofprogramminganLL(1)parserbyrecursivemethodcalls

39

Assume aBNFgrammar with exactly one production rule foreach nonterminal.(Can easily begeneralized to EBNF.)

Each production rule RHSiseither1. asequence of token/nonterminal symbols,or2. asetof nonterminal symbolalternatives

Foreach nonterminal,amethod isconstructed.Themethod1. matches tokensandcallsnonterminal methods,or2. callsone of thenonterminal methods – which one depends onthe

lookahead token.

Ifthelookahead tokendoes notmatch,aparsingerror isreported.

A–>B|C|DB–>eCfDC–>...D–>...

ExampleJavaimplementation:overview

40

statement –>assignment |compoundStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACE...

class Parser{privateint token; //current lookahead tokenvoid accept(int t){...} //accepttandreadinnext tokenvoid error(Stringstr){...} //generate error messagevoid statement(){...}void assignment (){...}void compoundStmt (){...}...

}

Example:recursivedescentmethods

41

statement –>assignment |compoundStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACE

class Parser{void statement(){switch(token){case ID:assignment();break;case LBRACE:compoundStmt();break;default:error("Expecting statement,found:"+token);}}void assignment(){accept(ID);accept(ASSIGN);expr();accept(SEMICOLON);}void compoundStmt(){accept(LBRACE);while (token!=RBRACE){statement();}accept(RBRACE);}...}

Example:Parserskeletondetails

42

statement –>assignment |compoundStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACEexpr –>...

class Parser{finalstatic int ID=1,WHILE=2,DO=3,ASSIGN=4,...;privateint token; //current lookahead tokenvoid accept(int t){ //accepttandreadinnext tokenif (token==t){token=nextToken();}else {error("Expected "+t+",but found "+token);}}void error(Stringstr){...} //generate error messageprivateint nextToken(){...}//readnext tokenfromscannervoid statement()......

}

ArethesegrammarsLL(1)?

43

expr –>name params |name

Whatwouldhappeninarecursive-descentparser?

Could they beLL(2)?LL(k)?

Commonprefix

expr –>expr "+" term Left recursion

Dealingwithcommonprefixoflimitedlength:Locallookahead

44

LL(2)grammar:statement –>assignment |compoundStmt |callStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACEcallStmt –> IDLPAR expr RPAR SEMICOLON

voidstatement()...

45

LL(2)grammar:statement –>assignment |compoundStmt |callStmtassignment–>IDASSIGN expr SEMICOLONcompoundStmt –>LBRACE statement*RBRACEcallStmt –> IDLPAR expr RPAR SEMICOLON

voidstatement(){switch(token){caseID:if(lookahead(2) ==ASSIGN){assignment();}else{callStmt();}break;caseLBRACE:compoundStmt();break;default:error("Expectingstatement,found:"+token);}}

Dealingwithcommonprefixoflimitedlength:Locallookahead

Generatingtheparser:

Syntactic analyzer(parser)

Context-freegrammar Parsergenerator

46

tokens

tree

Beaver:anLR-based parsergenerator

ParserinJava

Context-freegrammar,

with semanticactionsinJava

Beaver

47

tokens

tree

Example beaver specification

48

%class "LangParser";%package "lang";...%terminalsLET,IN,END,ASSIGN,MUL,ID,NUMERAL;

%goal program;//Thestartsymbol

//Context-free grammarprogram=exp;exp =factor |exp MULfactor;factor =let |numeral |id;let =LETidASSIGNexp INexp END;numeral =NUMERAL;id=ID;

Lateron,we will extend this specification with semantic actionsto build thesyntaxtree.

49

RE CFGTypicalAlphabet

characters terminalsymbols(tokens)

Language isasetof ...

strings(charsequences)

sentences(tokensequences)

Used for... tokens parsetreesPower iteration recursionRecognizer DFA DFAwith stack

RegularExpressionsvsContext-FreeGrammars

50

Grammar Rulepatterns Typeregular X –>aY orX –>a orX –>e 3

contextfree X–> g 2context sensitive a X b –>a g b 1

arbitrary g –>d 0

TheChomskyhierarchy of formalgrammars

a – terminalsymbola, b, g, d – sequences of (terminalornonterminal)symbols

Type(3)⊂ Type (2)⊂ Type(1)⊂ Type(0)

Regular grammarshave thesamepower asregular expressions(tail recursion =iteration).

Type 2and3are of practicaluse incompiler construction.Type 0and1are only of theoretical interest.

Courseoverview

Semantic analyzer

51

Lexical analyzer(scanner)

Syntactic analyzer(parser)

Regularexpressions

Context-freegrammar

Attributegrammar

tokens

sourcecode (text)

AST(Abstractsyntaxtree)

What we have covered:Context-free grammars,derivations,parse treesAmbiguous grammarsIntroduction to parsing,recursive-descent

You can now finishassignment 1

Summary questions

52

• Construct aCFGforasimplepartof aprogramming language.• What isanonterminal symbol?Aterminalsymbol?Aproduction?Astartsymbol?Aparse tree?• What isaleft-handside of aproduction?Aright-handside?• Givenagrammar G,what ismeant bythelanguage L(G)?• What isaderivationstep?Aderivation?Aleftmost derivation?Arighmostderivation?• How does aderivationcorrespond to aparse tree?• What does itmean foragrammar to beambiguous?Unambiguous?• Give anexample anambiguous CFG.• What isthedifference between anLLandanLRparser?• What isthedifference between LL(1)andLL(2)?Orbetween LR(1)andLR(2)?• Construct arecursive descent parserforasimplelanguage.• Give typical examples of grammarsthat cannot behandledbyarecursive-descent parser.• Explain why context-free grammarsare more powerful than regularexpressions.• Inwhat senseare context-free grammars"context-free"?

top related