lesson10.ppt

CPSC 388 – Compiler Design and Construction Implementing a Parser LL(1) and LALR Grammars FBI Noon Dining Hall Vicki Anderson Recruiter

Upload: paksmiler

Post on 02-Feb-2016


Page 1: lesson10.ppt

CPSC 388 – Compiler Design and Construction

Implementing a Parser
LL(1) and LALR Grammars

FBI Noon Dining Hall Vicki Anderson Recruiter

Page 2: lesson10.ppt

Announcements

PROG 3 out, due Oct 9th
Get started NOW!
HW due Friday
HW6 posted, due next Friday

Page 3: lesson10.ppt

Parsing using CFGs

Algorithms can parse using CFGs in O(n^3) time (n is the number of characters in the input stream) – TOO SLOW
Subclasses of grammars can be parsed in O(n) time

LL(1)
  one token of look-ahead
  do a leftmost derivation
  scan the input left-to-right

LALR(1)
  one token of look-ahead
  do a rightmost derivation in reverse
  scan the input left-to-right
  LA means "look-ahead" (nothing to do with the number of tokens)

Page 4: lesson10.ppt

LALR(1)

More general than LL(1) grammars
  (Almost every LL(1) grammar is an LALR(1) grammar, but not vice versa)
Class of grammars used by java_cup, Bison, YACC
Parsed bottom-up
  (start with terminals and build tree from leaves up to root)
Covered in text sections 4.6-4.7
For class, need to understand details of just LL(1) grammars

Page 5: lesson10.ppt

LL(1) Grammars – Predictive Parsers

“build” parse tree top-down
  actually discover tree top-down, don’t actually build it
Keep track of work to be done using a stack
Scanned tokens along with stack correspond to leaves of incomplete tree
Use parse table to decide how to parse input
  Rows are non-terminals
  Columns are tokens (plus EOF token)
  Cells are the bodies of production rules

Page 6: lesson10.ppt

Predictive Parser Algorithm

s.push(EOF)                  // special EOF terminal
s.push(start)                // start is start non-terminal
x = s.peek()
t = scanner.next_token()
while (x != EOF):
    if x == t:
        s.pop()
        t = scanner.next_token()
    else:
        if x is terminal: error
        else:
            if table[x][t] == empty: error
            else:
                let body = table[x][t]   // body of production
                output x → body
                s.pop()
                s.push(…)                // push body from right to left
    x = s.peek()

Page 7: lesson10.ppt

Example Parse using algorithm

Consider the language of balanced parentheses and brackets, e.g. ([])
Input string is “([])EOF”

Grammar:
S → ε | ( S ) | [ S ]

Parse Table:
      (        )      [        ]      EOF
S     ( S )    ε      [ S ]    ε      ε
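The predictive parser algorithm, run on this parse table, can be sketched in Python as follows. This is a minimal sketch, not the course's implementation: the names `TABLE`, `NONTERMINALS`, and `predictive_parse` are made up for illustration, tokens are assumed to be single characters, and an empty tuple stands for ε. The final check that the input is exhausted is an addition that the pseudocode does not spell out.

```python
EOF = "EOF"

# Parse table for S → ε | ( S ) | [ S ]; a missing cell means "error".
# An empty tuple stands for the ε body (assumption of this sketch).
TABLE = {
    ("S", "("): ("(", "S", ")"),
    ("S", "["): ("[", "S", "]"),
    ("S", ")"): (),
    ("S", "]"): (),
    ("S", EOF): (),
}
NONTERMINALS = {"S"}


def predictive_parse(tokens, table=TABLE, start="S"):
    """Return the productions used, in the order the parser applies them.

    Raises SyntaxError if the token stream is rejected.
    """
    stream = iter(list(tokens) + [EOF])
    stack = [EOF, start]              # top of the stack is the end of the list
    output = []
    t = next(stream)
    x = stack[-1]
    while x != EOF:
        if x == t:                    # terminal on top matches the current token
            stack.pop()
            t = next(stream)
        elif x not in NONTERMINALS:   # terminal on top, but it does not match
            raise SyntaxError(f"expected {x!r}, got {t!r}")
        elif (x, t) not in table:     # empty table cell
            raise SyntaxError(f"no rule for {x} on {t!r}")
        else:
            body = table[(x, t)]
            output.append((x, body))
            stack.pop()
            stack.extend(reversed(body))   # push body from right to left
        x = stack[-1]
    if t != EOF:                      # added check: leftover input is an error
        raise SyntaxError(f"unexpected trailing input {t!r}")
    return output


for head, body in predictive_parse("([])"):
    print(head, "→", " ".join(body) or "ε")
```

For the input ([]) the sketch applies S → ( S ), then S → [ S ], then S → ε, which is the leftmost derivation of the string.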

Page 8: lesson10.ppt

Not All Grammars LL(1)

Not all grammars are LL(1):
S → ( S ) | [ S ] | ( ) | [ ]
If input is ( don’t know which rule to use!

Try input “[[]]” on the LL(1) grammar using the predictive parser
Draw:
  Input seen so far
  Stack
  Action taken

Page 9: lesson10.ppt

Is Grammar LL(1)

Given a grammar how do you tell if it is LL(1)?

How to build the parse table?

If parse table is built and only one entry per cell then LL(1)

Page 10: lesson10.ppt

Non-LL(1) Grammars

A grammar is not LL(1) if it is left-recursive

A grammar is not LL(1) if it is not left-factored

It is sometimes possible to change a grammar to remove left-recursion and to make it left-factored

Page 11: lesson10.ppt

Left-Recursion

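Grammar g is recursive if there exists a non-terminal x such that:

x ⇒+ α x β      Recursive
x ⇒+ x β        Left recursive
x ⇒+ α x        Right recursive

(⇒+ means "derives in one or more steps"; α and β are sequences of terminals and/or non-terminals)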

Page 12: lesson10.ppt

Removing Immediate Left-Recursion

Consider the grammar
A → A α | β
  A is a nonterminal
  α is a sequence of terminals and/or nonterminals
  β is a sequence of terminals and/or nonterminals not starting with A

Replace production with
A → β A'
A' → α A' | ε

The two grammars are equivalent (recognize the same set of input strings)
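The rewrite above can be sketched as a small Python function. This is an illustrative sketch only: the function name, the tuple-of-symbols representation of bodies (empty tuple for ε), and the `list` grammar in the usage lines are assumptions, not part of the lesson. It assumes every α is non-empty.

```python
def remove_immediate_left_recursion(head, bodies):
    """Rewrite A → A α | β as A → β A', A' → α A' | ε.

    `bodies` is a list of tuples of symbols; the empty tuple stands for ε.
    Returns a dict mapping each head to its new list of bodies.
    """
    new_head = head + "'"
    # Bodies that start with the head contribute their tail as an α.
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    # All other bodies are the β alternatives.
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: list(bodies)}   # no immediate left recursion
    return {
        head: [beta + (new_head,) for beta in betas],
        new_head: [alpha + (new_head,) for alpha in alphas] + [()],
    }


# Hypothetical grammar: list → list COMMA ID | ID
result = remove_immediate_left_recursion(
    "list", [("list", "COMMA", "ID"), ("ID",)]
)
for h, bs in result.items():
    print(h, "→", " | ".join(" ".join(b) or "ε" for b in bs))
```

The transformed grammar generates the same strings but builds them to the right, so operators that were left-associative in the original parse tree need extra care when the tree is interpreted.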

Page 13: lesson10.ppt

You Try It

Remove left recursion from the grammar:
exp → exp - factor | factor
factor → INTLITERAL | ( exp )

Construct parse trees using the original grammar and the new grammar for input “5-3-2”

In general it is more difficult than this to remove left recursion, see text 4.3.3

Page 14: lesson10.ppt

Left Factored

A grammar is NOT left-factored if a non-terminal has two productions whose bodies have a common prefix
exp → ( exp ) | ( )

A top-down predictive parser would not know which production rule to use when seeing the input token “(”

Page 15: lesson10.ppt

Left Factoring

Given a pair of productions:
A → α β1 | α β2
  α is a sequence of terminals and non-terminals
  β1 and β2 are sequences of terminals and non-terminals that don’t have a common prefix (may be epsilon)

Change to:
A → α A'
A' → β1 | β2

Page 16: lesson10.ppt

Left Factoring Example

So for grammar
exp → ( exp ) | ( )

It becomes
exp → ( exp'
exp' → exp ) | )
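Left factoring a pair of productions can be sketched in Python as below. The helper names `common_prefix` and `left_factor_pair` and the tuple representation of bodies are assumptions made for this sketch; it handles exactly two bodies, as in the rule given for a pair of productions.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol tuples."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor_pair(head, body1, body2):
    """Rewrite A → α β1 | α β2 as A → α A', A' → β1 | β2.

    Bodies are tuples of symbols; an empty tuple stands for ε.
    """
    prefix = common_prefix(body1, body2)
    if not prefix:
        return {head: [body1, body2]}      # already left-factored
    new_head = head + "'"
    return {
        head: [prefix + (new_head,)],
        new_head: [body1[len(prefix):], body2[len(prefix):]],
    }


# exp → ( exp ) | ( )
factored = left_factor_pair("exp", ("(", "exp", ")"), ("(", ")"))
for h, bs in factored.items():
    print(h, "→", " | ".join(" ".join(b) or "ε" for b in bs))
```

On exp → ( exp ) | ( ) the common prefix is the single token (, so the sketch yields exp → ( exp' and exp' → exp ) | ).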

Page 17: lesson10.ppt

You Try It

Remove left recursion and do left factoring for grammar
exp → ( exp ) | exp exp | ( )

Page 18: lesson10.ppt

Building Parse Tables

Recall a parse table
  Every row is a non-terminal
  Every column is an input token
  Every cell contains a production body

If any cell contains more than one production body then the grammar is not LL(1)

To build the parse table, need to have FIRST sets and FOLLOW sets

Page 19: lesson10.ppt

FIRST set

FIRST(α)
  α is some sequence of terminals and non-terminals

FIRST(α) is the set of terminals that begin the strings derivable from α

If α can derive ε, then ε is in FIRST(α)

FIRST(α) = { t | (t is a terminal and α ⇒* t β) or (t = ε and α ⇒* ε) }

Page 20: lesson10.ppt

FIRST(X)

X is a single terminal, non-terminal or ε
FIRST(X) = {X}    // X is terminal
FIRST(X) = {ε}    // X is ε
FIRST(X) = …      // X is non-terminal
  Look at all production rules with X as head
  For each production rule X → Y1 Y2 … Yn:
    Put FIRST(Y1) - {ε} into FIRST(X).
    If ε is in FIRST(Y1), then put FIRST(Y2) - {ε} into FIRST(X).
    If ε is in FIRST(Y2), then put FIRST(Y3) - {ε} into FIRST(X).
    etc...
    If ε is in FIRST(Yi) for all 1 <= i <= n (all symbols on the production's right-hand side), then put ε into FIRST(X).

Page 21: lesson10.ppt

Example FIRST Sets

Compute FIRST sets for each non-terminal:

exp → term exp'                   { INTLITERAL, ( }
exp' → - term exp' | ε            { -, ε }
term → factor term'               { INTLITERAL, ( }
term' → / factor term' | ε        { /, ε }
factor → INTLITERAL | ( exp )     { INTLITERAL, ( }
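The FIRST computation for non-terminals can be sketched as a fixed-point loop that reapplies the rules until no set grows. The sketch below assumes a dictionary representation of the grammar (head mapped to a list of body tuples, with an empty tuple for ε); the names `GRAMMAR`, `EPS`, and `first_sets` are chosen for illustration only.

```python
EPS = "ε"

# The expression grammar from this example; () is the ε body.
GRAMMAR = {
    "exp":    [("term", "exp'")],
    "exp'":   [("-", "term", "exp'"), ()],
    "term":   [("factor", "term'")],
    "term'":  [("/", "factor", "term'"), ()],
    "factor": [("INTLITERAL",), ("(", "exp", ")")],
}


def first_sets(grammar):
    """Compute FIRST(X) for every non-terminal X by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                all_derive_eps = True
                for sym in body:
                    # FIRST of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[head] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        all_derive_eps = False
                        break          # later symbols cannot contribute
                if all_derive_eps:     # every Yi can derive ε (or body is ε)
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first


for nt, s in first_sets(GRAMMAR).items():
    print(nt, sorted(s))
```

Iterating until nothing changes avoids having to order the non-terminals by dependency, at the cost of a few extra passes.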

Page 22: lesson10.ppt

FIRST(α) for any α

α is of the form X1 X2 … Xn
where each Xi is a terminal, non-terminal or ε

1. Put FIRST(X1) - {ε} into FIRST(α).
2. If ε is in FIRST(X1), put FIRST(X2) - {ε} into FIRST(α).
3. etc...
4. If ε is in the FIRST set for every Xi, put ε into FIRST(α).

Page 23: lesson10.ppt

Example FIRST sets for rules

FIRST( term exp' ) = { INTLITERAL, ( }
FIRST( - term exp' ) = { - }
FIRST( ε ) = { ε }
FIRST( factor term' ) = { INTLITERAL, ( }
FIRST( / factor term' ) = { / }
FIRST( ε ) = { ε }
FIRST( INTLITERAL ) = { INTLITERAL }
FIRST( ( exp ) ) = { ( }
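Computing FIRST(α) for a whole rule body can be sketched as a left-to-right scan that stops at the first symbol that cannot derive ε. In this sketch the `FIRST` dictionary is simply typed in from the non-terminal FIRST sets of the expression grammar, and the function name `first_of_sequence` is an assumption.

```python
EPS = "ε"

# FIRST sets of the non-terminals in the expression grammar (typed in by hand).
FIRST = {
    "exp":    {"INTLITERAL", "("},
    "exp'":   {"-", EPS},
    "term":   {"INTLITERAL", "("},
    "term'":  {"/", EPS},
    "factor": {"INTLITERAL", "("},
}


def first_of_sequence(symbols, first=FIRST):
    """FIRST(X1 X2 … Xn); an empty sequence stands for ε."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                   # this symbol cannot vanish: stop
    result.add(EPS)                         # every symbol could derive ε
    return result


print(sorted(first_of_sequence(("term", "exp'"))))
print(sorted(first_of_sequence(("-", "term", "exp'"))))
print(sorted(first_of_sequence(())))
```

A sequence such as exp' term' (not a rule body in this grammar, used here only to exercise step 4) gives { -, /, ε } because both symbols can derive ε.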

Page 24: lesson10.ppt

Why Do We Care about FIRST(α)?

During parsing, suppose the top-of-stack symbol is nonterminal A, that there are two productions:
A → α
A → β
and that the current token is x

If x is in FIRST(α) then use the first production
If x is in FIRST(β) then use the second production

Page 25: lesson10.ppt

FOLLOW(A) sets

Only defined for single non-terminals, A

the set of terminals that can appear immediately to the right of A (may include EOF but never ε)

Page 26: lesson10.ppt

Calculating FOLLOW(A)

If A is start non-terminal put EOF in FOLLOW(A)

Find productions with A in body:
  For each production X → α A β
    put FIRST(β) – {ε} in FOLLOW(A)
    If ε is in FIRST(β), put FOLLOW(X) into FOLLOW(A)
  For each production X → α A
    put FOLLOW(X) into FOLLOW(A)

Page 27: lesson10.ppt

FIRST and FOLLOW sets

To compute FIRST(A) you must look for A on a production's left-hand side.
To compute FOLLOW(A) you must look for A on a production's right-hand side.
FIRST and FOLLOW sets are always sets of terminals (plus, perhaps, ε for FIRST sets, and EOF for FOLLOW sets).
Nonterminals are never in a FIRST or a FOLLOW set.

Page 28: lesson10.ppt

Example FOLLOW sets

CAPS are non-terminals and lower-case are terminals
S → B c | D B
B → a b | c S
D → d | ε

X     FIRST(X)       FOLLOW(X)
-------------------------------------------
D     { d, ε }       { a, c }
B     { a, c }       { c, EOF }
S     { a, c, d }    { EOF, c }

Note: FOLLOW of S always includes EOF
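The FOLLOW rules can be sketched as another fixed-point loop, here run on the S, B, D grammar of this example. The sketch assumes the FIRST sets are already known and types them in by hand; the names `GRAMMAR`, `FIRST`, `first_of_sequence`, and `follow_sets` are illustrative, not from the lesson.

```python
EPS, EOF = "ε", "EOF"

GRAMMAR = {
    "S": [("B", "c"), ("D", "B")],
    "B": [("a", "b"), ("c", "S")],
    "D": [("d",), ()],            # () is the ε body
}
# FIRST sets of the non-terminals, typed in from the table above.
FIRST = {"S": {"a", "c", "d"}, "B": {"a", "c"}, "D": {"d", EPS}}


def first_of_sequence(symbols):
    """FIRST of a sequence of symbols; the empty sequence gives {ε}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)                     # start symbol is followed by EOF
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():   # production X → α A β
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue               # FOLLOW is only for non-terminals
                    rest = first_of_sequence(body[i + 1:])   # FIRST(β)
                    new = rest - {EPS}
                    if EPS in rest:            # β is empty or can derive ε
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


for nt, s in follow_sets(GRAMMAR, "S").items():
    print(nt, sorted(s))
```

Treating an empty β as a sequence whose FIRST set is {ε} lets one code path cover both the X → α A β and the X → α A cases.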

Page 29: lesson10.ppt

You Try It

Compute FIRST and FOLLOW sets for:

methodHeader → VOID ID LPAREN paramList RPAREN

paramList → epsilon

paramList → nonEmptyParamList

nonEmptyParamList → ID ID

nonEmptyParamList → ID ID COMMA nonEmptyParamList

Remember you need FIRST and FOLLOW sets for all non-terminals and FIRST sets for all bodies of rules

Page 30: lesson10.ppt

Parse Table

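(Figure: an empty parse table. Rows are labeled with the non-terminals S, A, X, R; columns are labeled with the current token a, b, c, d; cells hold rule bodies.)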

Page 31: lesson10.ppt

Parse Table Construction Algorithm

for each production X → α:
    for each terminal t in First(α):
        put α in Table[X,t]
    if ε is in First(α) then:
        for each terminal t in Follow(X):
            put α in Table[X,t]
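The construction algorithm can be sketched in Python as below, applied to the balanced-bracket grammar S → ε | ( S ) | [ S ]. All names are assumptions of this sketch, and the FIRST sets of the bodies and FOLLOW(S) = { ), ], EOF } were worked out by hand and typed in rather than computed. Each cell holds a list of bodies so that conflicts stay visible.

```python
EPS, EOF = "ε", "EOF"


def build_parse_table(productions, first_of_body, follow):
    """Map (non-terminal, terminal) to the list of bodies placed in that cell.

    productions   : list of (head, body) pairs; body is a tuple, () for ε
    first_of_body : FIRST set of each body
    follow        : FOLLOW set of each non-terminal
    """
    table = {}
    for head, body in productions:
        targets = first_of_body[body] - {EPS}   # terminals in First(α)
        if EPS in first_of_body[body]:
            targets |= follow[head]             # plus Follow(X) when α ⇒* ε
        for t in targets:
            table.setdefault((head, t), []).append(body)
    return table


def is_ll1(table):
    """LL(1) exactly when no cell holds more than one body."""
    return all(len(bodies) == 1 for bodies in table.values())


# Hand-computed inputs for S → ε | ( S ) | [ S ]
PRODUCTIONS = [("S", ()), ("S", ("(", "S", ")")), ("S", ("[", "S", "]"))]
FIRST_OF_BODY = {(): {EPS}, ("(", "S", ")"): {"("}, ("[", "S", "]"): {"["}}
FOLLOW = {"S": {")", "]", EOF}}

TABLE = build_parse_table(PRODUCTIONS, FIRST_OF_BODY, FOLLOW)
for (head, tok), bodies in sorted(TABLE.items()):
    print(head, tok, [" ".join(b) or "ε" for b in bodies])
print("LL(1):", is_ll1(TABLE))
```

For this grammar every filled cell holds exactly one body. A grammar such as S → ( S ) | [ S ] | ( ) | [ ] would put two bodies in the ( cell, and `is_ll1` would report the conflict.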

Page 32: lesson10.ppt

Example Parse Table Construction

S → B c | D B
B → a b | c S
D → d | ε

For this grammar:
  Construct FIRST and FOLLOW sets
  Apply algorithm to calculate parse table

Page 33: lesson10.ppt

Example Parse Table Construction

X     FIRST(X)       FOLLOW(X)
---------------------------------------------------
D     { d, ε }       { a, c }
B     { a, c }       { c, EOF }
S     { a, c, d }    { EOF, c }
Bc    { a, c }
DB    { d, a, c }
ab    { a }
cS    { c }
d     { d }
ε     { ε }

Page 34: lesson10.ppt

Parse Table

      a         b      c         d      EOF
S     Bc, DB           Bc, DB    DB
B
D     ε                ε

Finish Filling In Table

Page 35: lesson10.ppt

Predictive Parser Algorithm

s.push(EOF)                  // special EOF terminal
s.push(start)                // start is start non-terminal
x = s.peek()
t = scanner.next_token()
while (x != EOF):
    if x == t:
        s.pop()
        t = scanner.next_token()
    else:
        if x is terminal: error
        else:
            if table[x][t] == empty: error
            else:
                let body = table[x][t]   // body of production
                output x → body
                s.pop()
                s.push(…)                // push body from right to left
    x = s.peek()