cs1622 - university of pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 cs 1622...

15
1 CS 1622 Lecture 9 1 CS1622 Lecture 9 Parsing (4) CS 1622 Lecture 9 2 Today Example of a recursive descent parser Predictive & LL(1) parsers Building parse tables CS 1622 Lecture 9 3 A Recursive Descent Parser. Preliminaries Let TOKEN be the type of tokens Special tokens INT, OPEN, CLOSE, PLUS, TIMES Let the global variable next point to the next token

Upload: others

Post on 16-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

1

CS 1622 Lecture 9 1

CS1622

Lecture 9Parsing (4)

CS 1622 Lecture 9 2

Today Example of a recursive descent parser Predictive & LL(1) parsers Building parse tables

CS 1622 Lecture 9 3

A Recursive Descent Parser.Preliminaries Let TOKEN be the type of tokens

Special tokens INT, OPEN, CLOSE,PLUS, TIMES

Let the global variable next point to thenext token

Page 2: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

2

CS 1622 Lecture 9 4

A Recursive Descent Parser(2)

Define boolean functions that check thetoken string for a match of A given token terminal bool term(TOKEN tok) { return *next++ == tok; }

S -> S1 | S2 | S… | Sn A given production of S (the nth) bool Sn() { … } Any production of S: bool S() { … }

These functions advance next

CS 1622 Lecture 9 5

A Recursive Descent Parser(3) For production E → T bool E1() { return T(); } For production E → T + E bool E2() { return T() && term(PLUS) && E(); } For all productions of E (with backtracking)

bool E() { TOKEN *save = next; return (next = save, E1()) || (next = save, E2()); }

CS 1622 Lecture 9 6

A Recursive Descent Parser(4) Functions for non-terminal Tbool T1() { return term(OPEN) && E() && term(CLOSE);

}bool T2() { return term(INT) && term(TIMES) && T(); }bool T3() { return term(INT); }

bool T() { TOKEN *save = next; return (next = save, T1()) || (next = save, T2()) || (next = save, T3()); }

Page 3: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

3

CS 1622 Lecture 9 7

Recursive Descent Parsing.Notes. To start the parser

Initialize next to point to first token Invoke E()

Notice how this simulates our previousexample

Easy to implement by hand But does not always work …

CS 1622 Lecture 9 8

When Recursive DescentDoes Not Work Consider a production S → S a bool S1() { return S() && term(a); } bool S() { return S1(); } S() will get into an infinite loop

A left-recursive grammar has a non-terminalS S →+ Sα for some α

Recursive descent does not work in suchcases

CS 1622 Lecture 9 9

Elimination of Left Recursion Consider the left-recursive grammar S → S α | β

S generates all strings starting with a β andfollowed by a number of α

Can rewrite using right-recursion S → β S’ S’ → α S’ | ε

Page 4: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

4

CS 1622 Lecture 9 10

More Elimination of Left-Recursion In general S → S α1 | … | S αn | β1 | … | βm

All strings derived from S start with oneof β1,…,βm and continue with severalinstances of α1,…,αn

Rewrite as S → β1 S’ | … | βm S’ S’ → α1 S’ | … | αn S’ | ε

CS 1622 Lecture 9 11

General Left Recursion The grammar

S → A α | δ A → S β is also left-recursive because

S →+ S β α

This left-recursion can also be eliminated - won’t go into this

CS 1622 Lecture 9 12

Summary of RecursiveDescent Simple and general parsing strategy

Left-recursion must be eliminated first … but that can be done automatically

Unpopular because of backtracking Too inefficient

In practice, backtracking is eliminated byrestricting the grammar

Page 5: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

5

CS 1622 Lecture 9 13

When Recursive DescentDoes Not Work Consider a production S → S a bool S1() { return S() && term(a); } bool S() { return S1(); } S() will get into an infinite loop

A left-recursive grammar has a non-terminalS S →+ Sα for some α

Recursive descent does not work in suchcases

CS 1622 Lecture 9 14

Elimination of Left Recursion Consider the left-recursive grammar S → S α | β

S generates all strings starting with a β andfollowed by a number of α

Can rewrite using right-recursion S → β S’ S’ → α S’ | ε

CS 1622 Lecture 9 15

More Elimination of Left-Recursion In general S → S α1 | … | S αn | β1 | … | βm

All strings derived from S start with oneof β1,…,βm and continue with severalinstances of α1,…,αn

Rewrite as S → β1 S’ | … | βm S’ S’ → α1 S’ | … | αn S’ | ε

Page 6: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

6

CS 1622 Lecture 9 16

General Left Recursion The grammar

S → A α | δ A → S β is also left-recursive because

S →+ S β α

This left-recursion can also be eliminated - won’t go into this

CS 1622 Lecture 9 17

Summary of RecursiveDescent Simple and general parsing strategy

Left-recursion must be eliminated first … but that can be done automatically

Unpopular because of backtracking Thought to be too inefficient

In practice, backtracking is eliminated byrestricting the grammar

CS 1622 Lecture 9 18

Predictive Parsers Like recursive-descent but parser can

“predict” which production to use By looking at the next few tokens No backtracking

Predictive parsers accept LL(k) grammars L means “left-to-right” scan of input L means “leftmost derivation” k means “predict based on k tokens of lookahead”

In practice, LL(1) is used

Page 7: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

7

CS 1622 Lecture 9 19

LL(1) Languages In recursive-descent, for each non-terminal

and input token there may be a choice ofproduction

LL(1) means that for each non-terminal andtoken there is only one production

Implementation Recursive procedures Table driven: Can be specified via 2D table

One dimension for current non-terminal to expand One dimension for next token A table entry contains one production

CS 1622 Lecture 9 20

Predictive Parser Avoid Backtracking by selecting the correct

production

How? Use look ahead

LL(1) uses 1 lookahead

Must structure the grammar to enable this A → a B | c D | e F Not A → a B | a C

CS 1622 Lecture 9 21

Predictive Parsing and LeftFactoring Consider the expression grammar

E → T + E | T T → int | int * T | ( E )

Hard to predict because For T two productions start with int For E it is not clear how to predict

A grammar must be left-factored before usefor predictive parsing

Page 8: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

8

CS 1622 Lecture 9 22

Left-Factoring Example Recall the grammar

E → T + E | T T → int | int * T | ( E )

• Factor out common prefixes of productions E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε

CS 1622 Lecture 9 23

Predictive ParsingBasic ideaGiven A → α | β, the parser should be able to

choose between α & β

FIRST sets (see later how to compute)For some rhs α∈G, define FIRST(α) as the set of

tokens that appear as the first symbol in somestring derived from α

That is, x ∈ FIRST(α) iff α ⇒* x γ, for some γNote: x is terminal token

(Pursuing this idea leads to LL(1) parser generators...)

CS 1622 Lecture 9 24

LL(1) PropertyThe LL(1) Property

If A → α | β both appear in the grammar, α and β are

strings consisting of terminal and non-terminals,

we would like

FIRST(α) ∩ FIRST(β) = ∅ (see later how to compute)

This would allow the parser to make a correct

choice with a lookahead of exactly one symbol !

Page 9: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

9

CS 1622 Lecture 9 25

LL(1) ParsingConsider A → β1 | β2 | β3, with

FIRST(β1) ∩ FIRST(β2) ∩ FIRST(β3) = ∅

/* find an A */if (current_word ∈ FIRST(β1)) do β1

else if (current_word ∈ FIRST(β2)) do β2

else if (current_word ∈ FIRST(β3)) do β3

else report an error and return false

CS 1622 Lecture 9 26

Predictive RecursiveDescent Parsing

1 Goal ! Expr

2 Expr ! Term Expr"

3 Expr" ! + Term Expr"

4 | - Term Expr"

5 | #

6 Term ! Factor Term"

7 Term" ! * Factor Term"

8 | / Factor Term"

9 | #

10 Factor ! number

11 | id

This produces a parser with sixmutually recursive routines:

• Goal

• Expr

• Expr_Prime

• Term

• Term_Prime

• FactorEach recognizes one nonterminal

CS 1622 Lecture 9 27

Recursive Descent ParsingGoal( ) token ← next_token( ); if (Expr( ) = true) then nextcompilation step; else return false;

Expr( ) result ← true; if (Term( ) = false) then result ← false; else if (EPrime( ) =false) thenresult ← false; return result;

Factor( ) result ← true; if (token = Number) then token ←next_token( ); else if (token =identifier) then token ←next_token( ); else

report syntaxerror;

result ← false; return result;

Page 10: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

10

CS 1622 Lecture 9 28

Implementation Alternative Recursive procedures expensive

Stack space & call overhead Parsing table + driver normally used as

alternative

CS 1622 Lecture 9 29

LL(1) Parsing Table Example Left-factored grammar

E → T X X → + E | εT → ( E ) | int Y Y → * T | ε

The LL(1) parsing table:

εεε* TY( E )int YT

εε+ EXT XT XE

$)(+*int

CS 1622 Lecture 9 30

LL(1) Parsing Table Example(Cont.) Consider the [E, int] entry

“When current non-terminal is E and next input isint, use production E → T X

This production can generate an int in the firstplace

Consider the [Y,+] entry “When current non-terminal is Y and current token

is +, get rid of Y” Y can be followed by + only in a derivation in

which Y → ε

Page 11: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

11

CS 1622 Lecture 9 31

LL(1) Parsing Tables. Errors Blank entries indicate error situations

Consider the [E,*] entry “There is no way to derive a string starting

with * from non-terminal E”

CS 1622 Lecture 9 32

Using Parsing Tables Method similar to recursive descent, except

For each non-terminal S We look at the next token a And chose the production shown at [S,a]

We use a stack to keep track of pending non-terminals - instead of procedure stack!

We reject when we encounter an error state We accept when we encounter end-of-input

CS 1622 Lecture 9 33

LL(1) Parsing Algorithminitialize stack = <S $> and nextrepeat case stack of <X, rest> : if T[X,*next] = Y1…Yn then stack ← <Y1… Yn

rest>; else error (); <t, rest> : if t == *next ++ then stack ← <rest>; else error ();until stack == < >

Page 12: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

12

CS 1622 Lecture 9 34

LL(1) Parsing ExampleStack Input ActionE $ int * int $ T XT X $ int * int $ int Yint Y X $ int * int $ terminalY X $ * int $ * T* T X $ * int $ terminalT X $ int $ int Yint Y X $ int $ terminalY X $ $ εX $ $ ε$ $ ACCEPT

CS 1622 Lecture 9 35

Constructing Parsing Tables LL(1) languages are those defined by a

parsing table for the LL(1) algorithm No table entry can be multiply defined We want to generate parsing tables

from CFG

CS 1622 Lecture 9 36

Constructing Parsing Tables(Cont.) If A → α, where in the line of A we place α ? In the column of t where t can start a string

derived from α α →* t β We say that t ∈ First(α)

In the column of t if α is ε and t can follow anA S →* β A t δ We say t ∈ Follow(A)

Page 13: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

13

CS 1622 Lecture 9 37

Computing First SetsDefinition First(X) = { t | X →* tα} ∪ {ε | X →* ε}Algorithm: First(t) = { t } ε ∈ First(X) if X → ε is a production ε ∈ First(X) if X → A1 … An

and ε ∈ First(Ai) for 1 ≤ i ≤ n First(α) ⊆ First(X) if X → A1 … An α

and ε ∈ First(Ai) for 1 ≤ i ≤ n

CS 1622 Lecture 9 38

First Sets. Example Recall the grammar

E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε

First sets First( ( ) = { ( } First( T ) = {int, ( } First( ) ) = { ) } First( E ) = {int, ( } First( int) = { int } First( X ) = {+, ε } First( + ) = { + } First( Y ) = {*, ε } First( * ) = { * }

CS 1622 Lecture 9 39

Computing Follow Sets forNonterminals Follow(S) needed for productions that

generate the empty string ε Definition: Follow(X) = { t | S →* β X t δ }

Note that ε CAN NEVER BE IN FOLLOW(X)!! Intuition

If X → A B then First(B) ⊆ Follow(A) and Follow(X) ⊆ Follow(B) Also if B →* ε then Follow(X) ⊆ Follow(A) If S is the start symbol then $ ∈ Follow(S)

Page 14: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

14

CS 1622 Lecture 9 40

Computing Follow Sets (Cont.)Algorithm sketch: $ ∈ Follow(S) For each production A → α X β

First(β) - {ε} ⊆ Follow(X)

For each production A → α X β where ε ∈First(β)Follow(A) ⊆ Follow(X)

CS 1622 Lecture 9 41

Follow Sets. Example Recall the grammar

E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε

Follow sets

Follow( E ) = {), $}Follow( X ) = {$, ) }Follow( T ) = {+, ) , $}Follow( Y ) = {+, ) , $}

CS 1622 Lecture 9 42

Constructing LL(1) ParsingTables Construct a parsing table T for CFG G

For each production A → α in G do: For each terminal t ∈ First(α) do

T[A, t] = α If ε ∈ First(α), for each t ∈ Follow(A) do

T[A, t] = α If ε ∈ First(α) and $ ∈ Follow(A) do

T[A, $] = α

Page 15: CS1622 - University of Pittsburghpeople.cs.pitt.edu/~mock/cs1622/lectures/lecture09.pdf3 CS 1622 Lecture 9 7 Recursive Descent Parsing. Notes. To start the parser Initialize next to

15

CS 1622 Lecture 9 43

Notes on LL(1) Parsing Tables If any entry is multiply defined then G is not

LL(1). Possible reasons: If G is ambiguous If G is left recursive If G is not left-factored Grammar is not LL(1)

Most programming language grammars arenot LL(1) Some can be made LL(1) though but others can’t

There are tools that build LL(1) tables E.g. LLGen