cs1622 - university of pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture07.pdf1 cs 1622...

17
1 CS 1622 Lecture 7 1 CS1622 Lecture 7 Parsing (2) CS 1622 Lecture 7 2 Reading Chapter 3 Introduction Section 3.1 CS 1622 Lecture 7 3 Parsing = determining whether a string of tokens can be generated by a grammar Two classes based on order in which parse tree is constructed: Top-down parsing Start construction at root of parse tree Bottom-up parsing Start at leaves and proceed to root

Upload: vudieu

Post on 22-May-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

1

CS 1622 Lecture 7 1

CS1622

Lecture 7Parsing (2)

CS 1622 Lecture 7 2

Reading Chapter 3

Introduction Section 3.1

CS 1622 Lecture 7 3

Parsing = determining whether a string of

tokens can be generated by a grammar Two classes based on order in which

parse tree is constructed: Top-down parsing

Start construction at root of parse tree

Bottom-up parsing Start at leaves and proceed to root

2

CS 1622 Lecture 7 4

Derivations and Parse TreesA derivation is a sequence of productionsA derivation can be drawn as a tree

Start symbol is the tree’s rootFor a production N -> X1…Xn add

X1 … Xn as children to node N

S!!!LLL

1 nXYY!L

X

1 nYYL

CS 1622 Lecture 7 5

Derivation Example S -> E + E | E * E E -> id | (E) Derivation for: id * id + id Derivation for: id + id + id

See board

CS 1622 Lecture 7 6

Notes on Derivations A parse tree has

Terminals at the leaves Non-terminals at the interior nodes

An in-order traversal of the leaves is theoriginal input

The parse tree shows the association ofoperations, the input string does not

3

CS 1622 Lecture 7 7

Terminals Terminals are called because there are no

rules for replacing them

Once generated, terminals are permanent

Terminals are the tokens of the languagerepresented by the grammar

CS 1622 Lecture 7 8

Left-most and Right-mostDerivations The example is a left-

most derivation At each step, replace the

left-most non-terminal

There is an equivalentnotion of a right-mostderivation

EE*Eid *Eid *E+E id *id+Eid*id+ id

®

®

®

®

®

CS 1622 Lecture 7 9

Right-most Derivation inDetail (1)

EEE+EE+ idE * E + idE * id + idid * id + id

®

®*®*®*®*

4

CS 1622 Lecture 7 10

Derivations and Parse Trees Note that right-most and left-most

derivations have the same parse tree

The difference is the order in whichbranches are added

CS 1622 Lecture 7 11

Summary of Derivations We are not just interested in whether

s ∈ L(G) We need a parse tree for s

A derivation defines a parse tree But one parse tree may have many derivations

Left-most and right-most derivations areimportant in parser implementation

CS 1622 Lecture 7 12

Parse treesQuestion 1:

for each of the two parse trees, find thecorresponding left-most derivation

Question 2: for each of the two parse trees, find the

corresponding right-most derivation

5

CS 1622 Lecture 7 13

Ambiguity (Cont.) A grammar is ambiguous if for some string

(the following three conditions are equivalent)

it has more than one parse tree if there is more than one right-most derivation if there is more than one left-most derivation

Ambiguity is BAD Leaves meaning of some programs ill-defined

CS 1622 Lecture 7 14

Dealing with Ambiguity There are several ways to handle ambiguity

Most direct method is to rewrite grammarunambiguousl

Enforces precedence of * over + Separate (external of grammar) conflict-resolution

rules E.g., precedence or associativity rules (e.g. via parser

tool declarations) Same idea we saw with scanning: instead of

complicated RE’s use symbol table to recognizekeywords

'''EEE | EEid E' | id | (E)!+!"

CS 1622 Lecture 7 15

Expression Grammars(precedence) Rewrite the grammar

use a different nonterminal for each precedence level start with the lowest precedence (MINUS)

E E - E | E / E | ( E ) | id

rewrite to

E E - E | TT T / T | FF id | ( E )

6

CS 1622 Lecture 7 16

Exampleparse tree for id – id / id

E E - E | TT T / T | FF id | ( E )

E

E

F F

T

-

id

/

idid

T

FT T

E

CS 1622 Lecture 7 17

More than one parse tree? Attempt to construct a parse tree for id-

id/id that shows the wrong precedence.

Question: Why do you fail to construct this parse

tree?

CS 1622 Lecture 7 18

Associativity The grammar captures operator precedence,

but it is still ambiguous! fails to express that both subtraction and division

are left associative;

e.g., 5-3-2 is equivalent to: ((5-3)-2) and not to:(5-(3-2)).

7

CS 1622 Lecture 7 19

Recursion A grammar is recursive in nonterminal X if:

X + … X … recall that + means “in one or more steps, X derives a

sequence of symbols that includes an X”

A grammar is left recursive in X if: X + X …

in one or more steps, X derives a sequence of symbolsthat starts with an X

A grammar is right recursive in X if: X + … X

in one or more steps, X derives a sequence of symbolsthat ends with an X

CS 1622 Lecture 7 20

How to fix associativity The grammar given above is both left and

right recursive in nonterminals exp and term try at home: write the derivation steps that show

this. To correctly express operator associativity:

For left associativity, use left recursion. For right associativity, use right recursion.

Here's the correct grammar:E E – T | TT T / F | FF id | ( E )

CS 1622 Lecture 7 21

Here's the correct grammar - corrected for bothprecedence and associativity

E E – T | TT T / F | FF id | ( E )

E E - E | TT T / T | F F id | ( E )

Ambiguity - has both left and right associativity

Old grammar - corrected for precedence

8

CS 1622 Lecture 7 22

Ambiguity: More thanoperator - The Dangling Else

Consider the grammar E → if E then E | if E then E else E | print

This grammar is also ambiguous

CS 1622 Lecture 7 23

The Dangling Else: Example The expression

if E1 then if E2 then E3 else E4has two parse trees

if

E1 if

E2 E3 E4

if

E1 if

E2 E3

E4

• Typically we want the second form

CS 1622 Lecture 7 24

The Dangling Else: A Fix else matches the closest unmatched then We can describe this in the grammar

S → WithElse /* all then are matched */ | LastElse /* some then are unmatched */WithElse → if E then WithElse else WithElse

| printLastElse → if E then S | if E then WithElse else LastElse

Describes the same set of strings

9

CS 1622 Lecture 7 25

The Dangling Else: ExampleRevisited The expression if E1 then if E2 then E3 else

E4if

E1 if

E2 E3 E4

if

E1 if

E2 E3

E4

• Not valid because thethen expression is nota MIF

• A valid parse tree(for a UIF)

CS 1622 Lecture 7 26

Precedence and AssociativityDeclarations Instead of rewriting the grammar

Use the more natural (ambiguous) grammar Along with disambiguating declarations

Most parser generators allow precedenceand associativity declarations todisambiguate grammars

Examples …

CS 1622 Lecture 7 27

Associativity Declarations Consider the grammar E → E - E | int Ambiguous: two parse trees of int - int - intE

E

E E

E-

int -

intint

E

E

E E

E-

int-

intint

• Left associativity declaration: %left +

10

CS 1622 Lecture 7 28

Precedence Declarations Consider the grammar E → E + E | E * E |

int And the string int + int * int

E

E

E E

E+

int *

intint

E

E

E E

E*

int+

intint

• Precedence declarations: %left + %left *

CS 1622 Lecture 7 29

Parenthetical Excursion intothe Land of Grammars Just FYI - if you are theoretically

inclined you may want to know this Relevant for the practical compiler

hacker wrt. kinds of constructs can beput into a CFG

CS 1622 Lecture 7 30

Chomsky Language Classificationbased on Grammar4 types of grammars

Type 0 grammar - unrestricted or recursive

Form of rule:α → β , α ε (N U T)*, β ε (N U T)*no restrictions on form of grammar rules

Example: aAB → aB

A → ε can have empty productions

| α | > | β |

11

CS 1622 Lecture 7 31

Type 1 grammar - context sensitive

Form of rule:

α A β → α γ β, where A ε N+

γ ε (N U T) +

| A | ≤ | γ |

Replace A by γ only if found in context of α and β

No erasure rules

CS 1622 Lecture 7 32

Type 2 grammars - context free

A → α, A ε N, α ε (N U T)*

Programming language constructs - mostly context free

but not all -

• declarations must proceed use

• number of formals and actuals agree

must check this in semantic phase

CS 1622 Lecture 7 33

Type 3 grammar - regular

rules: A → a

or A → aB, a ε T A,B, ε N

tokens defined by type 3 grammar

So lexical analysis - regular expressions - can only describe

regular languages

12

CS 1622 Lecture 7 34

End of Excursion Time to wake up again

CS 1622 Lecture 7 35

Parsing Algorithms Can parse any grammar

Earley’s algorithm - expensive O(n3) Top-down

Start from S and build input string Leftmost derivation Can be easily written by hand

Recursive descent Alternative implementation Predictive parsing

Only works for some grammars (LL(k))

CS 1622 Lecture 7 36

Bottom-up Parsing Reduce input string to S (root of tree) Rightmost derivation starting from leaves Shift-reduce: shift symbols on a stack,

reduce to lhs of production when a “handle”is identified

LR(k) parser Hard to build by hand Tools More grammars can be parsed

13

CS 1622 Lecture 7 37

Intro to Top-Down Parsing The parse tree is constructed

From the top From left to right

Terminals are seen in order ofappearance in the tokenstream:

t2 t5 t6 t8 t9

1

t2 3

4

t5

7

t6

t9

t8

CS 1622 Lecture 7 38

Recursive Descent Parsing Consider the grammar

E → T + E | T T → int | int * T | ( E )

Token stream is: int5 * int2 Start with top-level non-terminal E

Try the rules for E in order

CS 1622 Lecture 7 39

Recursive Descent Parsing.Example (Cont.) Try E0 → T1 + E2 Then try a rule for T1 → ( E3 )

But ( does not match input token int5 Try T1 → int . Token matches.

But + after T1 does not match input token * Try T1 → int * T2

This will match but + after T1 will be unmatched Has exhausted the choices for T1

Backtrack to choice for E0

14

CS 1622 Lecture 7 40

Recursive Descent Parsing.Example (Cont.) Try E0 → T1

Follow same steps as before for T1 And succeed with T1 → int * T2 and T2 → int With the following parse tree

E0

T1

int5

* T2

int2

CS 1622 Lecture 7 41

A Recursive Descent Parser.Preliminaries Let TOKEN be the type of tokens

Special tokens INT, OPEN, CLOSE,PLUS, TIMES

Let the global variable next point to thenext token

CS 1622 Lecture 7 42

A Recursive Descent Parser(2)

Define boolean functions that check thetoken string for a match of A given token terminal bool term(TOKEN tok) { return *next++ == tok; }

S -> S1 | S2 | S… | Sn A given production of S (the nth) bool Sn() { … } Any production of S: bool S() { … }

These functions advance next

15

CS 1622 Lecture 7 43

A Recursive Descent Parser(3) For production E → T bool E1() { return T(); } For production E → T + E bool E2() { return T() && term(PLUS) && E(); } For all productions of E (with backtracking)

bool E() { TOKEN *save = next; return (next = save, E1()) || (next = save, E2()); }

CS 1622 Lecture 7 44

A Recursive Descent Parser(4) Functions for non-terminal Tbool T1() { return term(OPEN) && E() && term(CLOSE);

}bool T2() { return term(INT) && term(TIMES) && T(); }bool T3() { return term(INT); }

bool T() { TOKEN *save = next; return (next = save, T1()) || (next = save, T2()) || (next = save, T3()); }

CS 1622 Lecture 7 45

Recursive Descent Parsing.Notes. To start the parser

Initialize next to point to first token Invoke E()

Notice how this simulates our previousexample

Easy to implement by hand But does not always work …

16

CS 1622 Lecture 7 46

When Recursive DescentDoes Not Work Consider a production S → S a bool S1() { return S() && term(a); } bool S() { return S1(); } S() will get into an infinite loop

A left-recursive grammar has a non-terminalS S →+ Sα for some α

Recursive descent does not work in suchcases

CS 1622 Lecture 7 47

Elimination of Left Recursion Consider the left-recursive grammar S → S α | β

S generates all strings starting with a β andfollowed by a number of α

Can rewrite using right-recursion S → β S’ S’ → α S’ | ε

CS 1622 Lecture 7 48

More Elimination of Left-Recursion In general S → S α1 | … | S αn | β1 | … | βm

All strings derived from S start with oneof β1,…,βm and continue with severalinstances of α1,…,αn

Rewrite as S → β1 S’ | … | βm S’ S’ → α1 S’ | … | αn S’ | ε

17

CS 1622 Lecture 7 49

General Left Recursion The grammar

S → A α | δ A → S β is also left-recursive because

S →+ S β α

This left-recursion can also be eliminated - won’t go into this

CS 1622 Lecture 7 50

Summary of RecursiveDescent Simple and general parsing strategy

Left-recursion must be eliminated first … but that can be done automatically

Unpopular because of backtracking Thought to be too inefficient

In practice, backtracking is eliminated byrestricting the grammar