lesson10.ppt

CPSC 388 – Compiler Design and Construction

Implementing a Parser: LL(1) and LALR Grammars

FBI recruiter Vicki Anderson: Noon, Dining Hall

Announcements

PROG 3 out, due Oct 9th

Get started NOW!

HW due Friday

HW6 posted, due next Friday

Parsing using CFGs

Algorithms can parse any CFG in O(n³) time (n is the number of characters in the input stream): TOO SLOW.

Subclasses of grammars can be parsed in O(n) time:

LL(1): 1 token of look-ahead, do a leftmost derivation, scan the input from left to right.

LALR(1): one token of look-ahead, do a rightmost derivation in reverse, scan the input left-to-right. "LA" means "look-ahead" (nothing to do with the number of tokens).

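LALR(1) grammars are more general than LL(1) grammars in practice (essentially every LL(1) grammar met in practice is also LALR(1), but not vice versa; strictly speaking, every LL(1) grammar is LR(1), and a few contrived LL(1) grammars are not LALR(1)).

This is the class of grammars used by java_cup, Bison, and YACC.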

They are parsed bottom-up (start with the terminals at the leaves and build the tree up to the root). Covered in text sections 4.6-4.7. For this class you need to understand the details of LL(1) grammars only.

LL(1) Grammars: Predictive Parsers

Predictive parsers "build" the parse tree top-down (they actually discover the tree top-down; they don't actually build it).

They keep track of the work to be done using a stack. The scanned tokens, together with the stack, correspond to the leaves of the incomplete tree.

They use a parse table to decide how to parse the input:
Rows are non-terminals.
Columns are tokens (plus the EOF token).
Cells are the bodies of production rules.

Predictive Parser Algorithm

  s.push(EOF)                // special EOF terminal
  s.push(start)              // start is the start non-terminal
  x = s.peek();  t = scanner.next_token()
  while (x != EOF):
      if x == t:  s.pop();  t = scanner.next_token()
      else if x is a terminal: error
      else if table[x][t] is empty: error
      else:
          let body = table[x][t]   // body of production
          output x → body
          s.pop()
          push the symbols of body onto s from right to left
      x = s.peek()
  if t != EOF: error         // input left over after the stack emptied
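A minimal Python sketch of the algorithm above follows. It assumes, for illustration only, that the scanner is replaced by a list of token names ending in "EOF", that the parse table is a dict keyed by (non-terminal, token), and that a production body is a list of symbols with [] standing for ε; the names `predictive_parse` and `ParseError` are hypothetical.

```python
class ParseError(Exception):
    """Raised when the input is not in the language of the grammar."""


def predictive_parse(table, start, nonterminals, tokens):
    """Table-driven predictive parse; returns the productions used, in order.

    table:        dict mapping (non-terminal, token) -> body (list of symbols)
    start:        the start non-terminal
    nonterminals: set of non-terminal names (any other symbol is a terminal)
    tokens:       list of token names ending with "EOF" (stands in for the scanner)
    """
    output = []
    stack = ["EOF", start]           # the end of the list is the top of the stack
    pos = 0
    t = tokens[pos]                  # current token
    x = stack[-1]
    while x != "EOF":
        if x == t:                   # matching terminal: consume it
            stack.pop()
            pos += 1
            t = tokens[pos]
        elif x not in nonterminals:  # terminal that does not match the token
            raise ParseError(f"expected {x!r} but saw {t!r}")
        elif (x, t) not in table:    # empty table cell
            raise ParseError(f"no rule for {x!r} on {t!r}")
        else:
            body = table[(x, t)]
            output.append((x, body))       # "output x -> body"
            stack.pop()
            stack.extend(reversed(body))   # push body from right to left
        x = stack[-1]
    if t != "EOF":                   # stack is empty but input remains
        raise ParseError(f"unexpected {t!r} after the end of the parse")
    return output
```

Called with the balanced-parentheses table from the example that follows and the tokens ( [ ] ) EOF, it returns the productions S → ( S ), S → [ S ], S → ε in that order. The final `t != "EOF"` check is what rejects input such as "()(", where the stack empties before the input does.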

Example Parse using algorithm

Consider the language of balanced parentheses and brackets, e.g. ([])

Input string is "([])EOF"

Grammar:

S → ε | ( S ) | [ S ]

Parse Table:

       (        )    [        ]    EOF
S      ( S )    ε    [ S ]    ε    ε
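Tracing a parse by hand means recording, at each step, the stack, the remaining input, and the action taken. The Python sketch below produces that trace for the table above; the row format (stack top on the left, actions written as "match t" or "S -> body") and the function name `trace` are illustrative choices, not course conventions.

```python
# Parse table from the slide: S -> ε | ( S ) | [ S ]   ([] is an ε body)
TABLE = {
    ("S", "("): ["(", "S", ")"],
    ("S", ")"): [],
    ("S", "["): ["[", "S", "]"],
    ("S", "]"): [],
    ("S", "EOF"): [],
}


def trace(tokens, start="S"):
    """Return (stack, remaining input, action) rows for each parser step."""
    stack = ["EOF", start]
    rows = []
    pos = 0
    while stack[-1] != "EOF":
        x, t = stack[-1], tokens[pos]
        shown = (" ".join(reversed(stack)), " ".join(tokens[pos:]))
        if x == t:                               # match a terminal
            action = f"match {t}"
            stack.pop()
            pos += 1
        elif (x, t) in TABLE:                    # expand a non-terminal
            body = TABLE[(x, t)]
            action = f"{x} -> {' '.join(body) or 'ε'}"
            stack.pop()
            stack.extend(reversed(body))
        else:                                    # mismatch or empty cell
            rows.append((*shown, "error"))
            return rows
        rows.append((*shown, action))
    final = "accept" if tokens[pos] == "EOF" else "error"
    rows.append((" ".join(reversed(stack)), " ".join(tokens[pos:]), final))
    return rows


for stack, rest, action in trace(["(", "[", "]", ")", "EOF"]):
    print(f"{stack:<12} {rest:<12} {action}")
```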

Not All Grammars Are LL(1)

Not all grammars are LL(1), e.g.:

S → ( S ) | [ S ] | ( ) | [ ]

If the input is "(", the parser doesn't know which rule to use!

Try the input "[[]]" with the LL(1) grammar above, using the predictive parser. For each step, draw the input seen so far, the stack, and the action taken.

Is a Grammar LL(1)?

Given a grammar, how do you tell if it is LL(1)?

How do you build the parse table?

If the parse table is built and every cell has at most one entry, then the grammar is LL(1).
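For the non-LL(1) grammar above, the check can be done almost mechanically: every body starts with a terminal, so FIRST(body) is just that first symbol and each production lands in exactly one cell. The Python sketch below relies on that shortcut (an assumption that fails as soon as a body starts with a non-terminal or is ε); `conflicts` is a hypothetical helper name.

```python
from collections import defaultdict

# S → ( S ) | [ S ] | ( ) | [ ]
# Every body starts with a terminal, so FIRST(body) is that first symbol.
PRODUCTIONS = [
    ("S", ["(", "S", ")"]),
    ("S", ["[", "S", "]"]),
    ("S", ["(", ")"]),
    ("S", ["[", "]"]),
]


def conflicts(productions):
    """Return the parse-table cells that would hold more than one body."""
    cells = defaultdict(list)
    for head, body in productions:
        cells[(head, body[0])].append(body)   # first symbol stands in for FIRST(body)
    return {cell: bodies for cell, bodies in cells.items() if len(bodies) > 1}


for (head, token), bodies in conflicts(PRODUCTIONS).items():
    print(head, token, [" ".join(b) for b in bodies])
```

Both the "(" cell and the "[" cell end up with two bodies, which is exactly why the parser cannot decide on input "(".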

Non-LL(1) Grammars

If a grammar is left-recursive

If a grammar is not left-factored

It is sometimes possible to change a grammar to remove left-recursion and to make it left-factored

Left-Recursion

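A grammar is recursive if some non-terminal x can derive, in one or more steps, a string that contains x itself:

Recursive: $x \Rightarrow^{+} \alpha\, x\, \beta$

Left recursive: $x \Rightarrow^{+} x\, \alpha$

Right recursive: $x \Rightarrow^{+} \alpha\, x$

Here α and β are sequences of terminals and/or non-terminals.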

Removing Immediate Left-Recursion

Consider the grammar

A → A α | β

where A is a non-terminal, α is a sequence of terminals and/or non-terminals, and β is a sequence of terminals and/or non-terminals not starting with A. Replace these productions with

A → β A'
A' → α A' | ε

The two grammars are equivalent (they recognize the same set of input strings).
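The transformation can be sketched in Python as follows, assuming a body is a list of symbols, [] stands for ε, and the new non-terminal is named by appending a prime to the head (the sketch does not check that this name is unused). The function name is hypothetical.

```python
def remove_immediate_left_recursion(head, bodies):
    """Rewrite A → A α | β as A → β A', A' → α A' | ε.

    bodies: all production bodies (lists of symbols) whose head is `head`.
    Returns a dict mapping each head to its new list of bodies ([] is ε).
    """
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # A → A α
    betas = [b for b in bodies if not b or b[0] != head]      # A → β
    if not alphas:
        return {head: bodies}                  # not left-recursive: unchanged
    new = head + "'"                           # fresh non-terminal A'
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],
    }


print(remove_immediate_left_recursion("A", [["A", "a"], ["b"]]))
```

For A → A a | b it produces A → b A' and A' → a A' | ε.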

You Try It

Remove left recursion from the grammar:

exp → exp - factor | factor
factor → INTLITERAL | ( exp )

Construct parse trees for the input "5-3-2" using both the original grammar and the new grammar.

In general, removing left recursion is more difficult than this; see text section 4.3.3.

Left Factored

A grammar is NOT left-factored if a non-terminal has two productions whose bodies have a common prefix, e.g.

exp → ( exp ) | ( )

A top-down predictive parser would not know which production to use when it sees the input token "(".

Left Factoring

Given a pair of productions:

A → α β1 | α β2

where α is a sequence of terminals and non-terminals, and β1 and β2 are sequences of terminals and non-terminals that don't have a common prefix (either may be ε). Change them to:

A → α A'
A' → β1 | β2
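One left-factoring step can be sketched in Python as below, assuming bodies are lists of symbols, [] stands for ε, and the new non-terminal is the head name plus a prime. The sketch factors out the longest common prefix of the two bodies; `left_factor` and `common_prefix` are hypothetical names.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor(head, body1, body2):
    """Rewrite A → α β1 | α β2 as A → α A', A' → β1 | β2 ([] is ε)."""
    alpha = common_prefix(body1, body2)
    if not alpha:
        return {head: [body1, body2]}          # nothing to factor
    new = head + "'"                           # fresh non-terminal A'
    return {
        head: [alpha + [new]],
        new: [body1[len(alpha):], body2[len(alpha):]],
    }


print(left_factor("exp", ["(", "exp", ")"], ["(", ")"]))
```

Applied to exp → ( exp ) | ( ), it yields exp → ( exp' and exp' → exp ) | ).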

Left Factoring Example

So the grammar

exp → ( exp ) | ( )

becomes

exp → ( exp'
exp' → exp ) | )

You Try It

Remove left recursion and do left factoring for the grammar

exp → ( exp ) | exp exp | ( )

Building Parse Tables

Recall a parse table:
Every row is a non-terminal.
Every column is an input token.
Every cell contains a production body.

If any cell contains more than one production body, then the grammar is not LL(1).

To build the parse table, we need the FIRST and FOLLOW sets.

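FIRST set

FIRST(α), where α is some sequence of terminals and non-terminals, is the set of terminals that begin the strings derivable from α. If α can derive ε, then ε is also in FIRST(α):

$\mathrm{FIRST}(\alpha) = \{\, t \mid (t \text{ is a terminal and } \alpha \Rightarrow^{*} t\beta \text{ for some } \beta) \text{ or } (t = \varepsilon \text{ and } \alpha \Rightarrow^{*} \varepsilon) \,\}$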

FIRST(X), where X is a single terminal, non-terminal, or ε:

FIRST(X) = {X} if X is a terminal
FIRST(X) = {ε} if X is ε
If X is a non-terminal, look at all production rules with X as head. For each production rule X → Y1 Y2 … Yn:

Put FIRST(Y1) - {ε} into FIRST(X).
If ε is in FIRST(Y1), then put FIRST(Y2) - {ε} into FIRST(X).
If ε is also in FIRST(Y2), then put FIRST(Y3) - {ε} into FIRST(X).
etc.
If ε is in FIRST(Yi) for every 1 <= i <= n (the entire right-hand side can derive ε), then put ε into FIRST(X).
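The rules above depend on each other (FIRST(exp) needs FIRST(term), which needs FIRST(factor)), so a common way to compute them is to apply the rules repeatedly until no set changes. The Python sketch below does this for the grammar of the next example; the grammar encoding ([] for an ε body, the string "ε" inside sets) and the name `first_sets` are assumptions.

```python
EPS = "ε"

# Grammar from the next example; [] is an ε body.
GRAMMAR = {
    "exp":    [["term", "exp'"]],
    "exp'":   [["-", "term", "exp'"], []],
    "term":   [["factor", "term'"]],
    "term'":  [["/", "factor", "term'"], []],
    "factor": [["INTLITERAL"], ["(", "exp", ")"]],
}


def first_sets(grammar):
    """Compute FIRST(X) for every non-terminal X by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:                    # production head → Y1 ... Yn
                before = len(first[head])
                for sym in body:
                    # FIRST of a terminal is the terminal itself
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[head] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        break                      # Yi cannot vanish: stop here
                else:
                    first[head].add(EPS)           # every Yi can derive ε
                if len(first[head]) != before:
                    changed = True
    return first


for nt, s in first_sets(GRAMMAR).items():
    print(nt, sorted(s))
```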

Example FIRST Sets

Compute FIRST sets for each non-terminal:

exp → term exp'                  FIRST(exp) = { INTLITERAL, ( }
exp' → - term exp' | ε           FIRST(exp') = { -, ε }
term → factor term'              FIRST(term) = { INTLITERAL, ( }
term' → / factor term' | ε       FIRST(term') = { /, ε }
factor → INTLITERAL | ( exp )    FIRST(factor) = { INTLITERAL, ( }

FIRST(α) for any α

α is of the form X1 X2 … Xn, where each Xi is a terminal, a non-terminal, or ε.

1. Put FIRST(X1) - {ε} into FIRST(α).

2. If ε is in FIRST(X1), put FIRST(X2) - {ε} into FIRST(α).

3. Continue in the same way: if ε is also in FIRST(X2), put FIRST(X3) - {ε} into FIRST(α), and so on.

4. If ε is in FIRST(Xi) for every i, put ε into FIRST(α).
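A Python sketch of these steps, assuming the FIRST sets of the non-terminals are already known (they are copied from the example above) and that any symbol not in that table is a terminal or ε. `first_of` is a hypothetical name.

```python
EPS = "ε"

# FIRST sets of the non-terminals, copied from the example above.
FIRST = {
    "exp": {"INTLITERAL", "("},
    "exp'": {"-", EPS},
    "term": {"INTLITERAL", "("},
    "term'": {"/", EPS},
    "factor": {"INTLITERAL", "("},
}


def first_of(alpha):
    """FIRST of a sequence X1 X2 ... Xn ([] or ["ε"] is the empty sequence)."""
    result = set()
    for sym in alpha:
        sym_first = FIRST.get(sym, {sym})   # terminal or ε: FIRST = {sym}
        result |= sym_first - {EPS}         # steps 1-3
        if EPS not in sym_first:
            return result                   # Xi cannot vanish: stop
    result.add(EPS)                         # step 4: every Xi can derive ε
    return result


print(first_of(["term", "exp'"]), first_of(["exp'", "term'"]))
```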

Example FIRST sets for rules

FIRST( term exp' ) = { INTLITERAL, ( }
FIRST( - term exp' ) = { - }
FIRST( ε ) = { ε }
FIRST( factor term' ) = { INTLITERAL, ( }
FIRST( / factor term' ) = { / }
FIRST( ε ) = { ε }
FIRST( INTLITERAL ) = { INTLITERAL }
FIRST( ( exp ) ) = { ( }

Why Do We Care about FIRST(α)?

During parsing, suppose the top-of-stack symbol is the non-terminal A, that there are two productions A → α and A → β, and that the current token is x. If x is in FIRST(α), use the first production; if x is in FIRST(β), use the second production. (If x were in both sets, the parser could not choose, and the grammar would not be LL(1).)

FOLLOW(A) sets

FOLLOW is only defined for single non-terminals A: FOLLOW(A) is the set of terminals that can appear immediately to the right of A in some derivation (it may include EOF but never ε).

Calculating FOLLOW(A)

If A is the start non-terminal, put EOF in FOLLOW(A).

Find the productions with A in the body:

For each production X → α A β:
  put FIRST(β) - {ε} in FOLLOW(A);
  if ε is in FIRST(β), put FOLLOW(X) into FOLLOW(A).

For each production X → α A:
  put FOLLOW(X) into FOLLOW(A).

Repeat these steps until no FOLLOW set changes.
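Like FIRST, FOLLOW sets can depend on each other (in the S/B/D example below, FOLLOW(S) and FOLLOW(B) each feed into the other), so the rules are usually applied repeatedly until nothing changes. The Python sketch below does this for that grammar, with its FIRST sets hard-coded from the example; the encoding ([] for an ε body) and the names `first_of` and `follow_sets` are illustrative assumptions.

```python
EPS, EOF = "ε", "EOF"

GRAMMAR = {                          # [] is an ε body
    "S": [["B", "c"], ["D", "B"]],
    "B": [["a", "b"], ["c", "S"]],
    "D": [["d"], []],
}
FIRST = {"S": {"a", "c", "d"}, "B": {"a", "c"}, "D": {"d", EPS}}  # from the example


def first_of(seq):
    """FIRST of a symbol sequence (contains ε if the whole sequence can vanish)."""
    result = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}


def follow_sets(grammar, start):
    """Compute FOLLOW(A) for every non-terminal A by iterating to a fixed point."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)                       # start symbol is followed by EOF
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                 # only non-terminals get FOLLOW sets
                    beta = first_of(body[i + 1:])    # FIRST of what follows sym
                    new = beta - {EPS}
                    if EPS in beta:              # also covers X → α A (empty β)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


print(follow_sets(GRAMMAR, "S"))
```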

FIRST and FOLLOW sets

To compute FIRST(A) you must look for A on a production's left-hand side. To compute FOLLOW(A) you must look for A on a production's right-hand side.

FIRST and FOLLOW sets are always sets of terminals (plus, perhaps, ε for FIRST sets and EOF for FOLLOW sets). Non-terminals are never in a FIRST or a FOLLOW set.

Example FOLLOW sets

CAPS are non-terminals and lower-case letters are terminals:

S → B c | D B
B → a b | c S
D → d | ε

X     FIRST(X)       FOLLOW(X)
----------------------------------
D     { d, ε }       { a, c }
B     { a, c }       { c, EOF }
S     { a, c, d }    { EOF, c }

Note: FOLLOW of the start symbol S always includes EOF.

You Try It

Compute FIRST and FOLLOW sets for:

methodHeader → VOID ID LPAREN paramList RPAREN
paramList → ε
paramList → nonEmptyParamList
nonEmptyParamList → ID ID
nonEmptyParamList → ID ID COMMA nonEmptyParamList

Remember that you need FIRST and FOLLOW sets for all non-terminals, and FIRST sets for all bodies of rules.

Parse Table

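(Diagram: a table whose rows are labeled with non-terminals such as S, A, X, R, whose columns are labeled with the current token such as a, b, c, d, and whose cells hold rule bodies.)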

Parse Table Construction Algorithm

for each production X → α:
    for each terminal t in FIRST(α):
        put α in Table[X,t]
    if ε is in FIRST(α):
        for each terminal t in FOLLOW(X):
            put α in Table[X,t]
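A Python sketch of the construction algorithm, applied to the balanced-parentheses grammar S → ε | ( S ) | [ S ] from earlier so the result can be checked against that parse table. The FIRST and FOLLOW sets are hard-coded (computed by hand: FIRST(S) = { (, [, ε }, FOLLOW(S) = { EOF, ), ] }), cells are lists so that conflicts stay visible, and the names are hypothetical.

```python
from collections import defaultdict

EPS = "ε"

# Balanced parentheses/brackets: S → ε | ( S ) | [ S ]   ([] is ε)
GRAMMAR = {"S": [[], ["(", "S", ")"], ["[", "S", "]"]]}
FIRST = {"S": {"(", "[", EPS}}           # computed by hand
FOLLOW = {"S": {"EOF", ")", "]"}}        # computed by hand


def first_of(seq):
    """FIRST of a symbol sequence (contains ε if the whole sequence can vanish)."""
    result = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}


def build_table(grammar):
    """Apply the construction algorithm; each cell holds a list of bodies."""
    table = defaultdict(list)
    for head, bodies in grammar.items():
        for body in bodies:                      # production head → body
            f = first_of(body)
            for t in f - {EPS}:
                table[(head, t)].append(body)
            if EPS in f:
                for t in FOLLOW[head]:
                    table[(head, t)].append(body)
    return table


table = build_table(GRAMMAR)
is_ll1 = all(len(bodies) == 1 for bodies in table.values())
print(dict(table), is_ll1)
```

Every cell ends up with exactly one body, matching the earlier table, so `is_ll1` is True. With the S/B/D grammar and its sets plugged in instead, cells S,a and S,c would each collect two bodies.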

Example Parse Table Construction

S → B c | D B
B → a b | c S
D → d | ε

For this grammar:
Construct the FIRST and FOLLOW sets.
Apply the algorithm to calculate the parse table.

Example Parse Table Construction

X      FIRST(X)       FOLLOW(X)
-----------------------------------
D      { d, ε }       { a, c }
B      { a, c }       { c, EOF }
S      { a, c, d }    { EOF, c }
B c    { a, c }
D B    { d, a, c }
a b    { a }
c S    { c }
d      { d }
ε      { ε }

Parse Table

       a            b    c            d      EOF
S      B c, D B          B c, D B     D B
B
D      ε                 ε

(Cells S,a and S,c each contain two bodies, so this grammar is not LL(1).)

Finish Filling In Table
