COMP3190: Principle of Programming Languages Formal Language Syntax

Upload: frank-cooper

Post on 28-Dec-2015

Motivation

The problem of parsing structured text is very common.
Consider the structure of email addresses (using a grammar):
<emailAddress> := <person> @ <host>
<person> := <word>
<host> := <word> | <word>.<host>
Goal: describe and recognize email addresses in arbitrary text.

Outline

DFA & NFA
Regular expressions
Regular languages
Context-free languages & PDA
Scanner
Parser

Deterministic Finite Automata (DFA)

A DFA is a 5-tuple (Q, Σ, δ, q0, F):
Q: finite set of states
Σ: finite set of “letters” (the alphabet)
δ: Q × Σ -> Q (transition function)
q0: start state (in Q)
F: set of accept states (subset of Q)
Acceptance: the input is fully consumed with the automaton in a final (accept) state.

Example of DFA

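[Figure: state diagram of a two-state DFA over {0, 1} with states q1 and q2]

δ     0     1
q1    q1    q2
q2    q1    q2

Accepts all strings that end in 1 (q1 is the start state and q2 the accept state).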
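The δ table can be executed directly. Below is a minimal Java sketch of a table-driven DFA simulator for this machine; mapping q1 and q2 to array indices 0 and 1 is this sketch's own encoding, and the start state q1 and accept state q2 are the ones implied by "accepts all strings that end in 1".

```java
/** Table-driven simulation of the two-state DFA (q1 = index 0, q2 = index 1). */
class EndsInOneDfa {
    // DELTA[state][symbol], with symbol 0 for '0' and 1 for '1'; copied from the δ table
    static final int[][] DELTA = {
        {0, 1},   // q1: on 0 -> q1, on 1 -> q2
        {0, 1},   // q2: on 0 -> q1, on 1 -> q2
    };
    static final int START = 0;                       // q1
    static final boolean[] ACCEPT = {false, true};    // q2 is the only accept state

    static boolean accepts(String input) {
        int state = START;
        for (char c : input.toCharArray()) {
            if (c != '0' && c != '1') return false;   // symbol outside the alphabet {0, 1}
            state = DELTA[state][c - '0'];             // exactly one move per input symbol
        }
        return ACCEPT[state];                          // accept iff we end in a final state
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "1", "10", "0101"}) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```

Because the table has exactly one entry per (state, symbol) pair, the loop never has to choose between moves; that determinism is what distinguishes this from the NFA simulation later on.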

Another Example of a DFA

[Figure: state diagram of a five-state DFA over {a, b} with start state S and states q1, q2, r1, r2]

Accepts all strings that start and end with “a” OR start and end with “b”

Non-deterministic Finite Automata (NFA)

The transition function is different: δ: Q × Σε -> P(Q)
P(Q) is the powerset of Q (the set of all subsets of Q).
Σε is the union of Σ and the special symbol ε (denoting the empty string).
A string is accepted if there is at least one path that consumes all of the input and leads to an accept state.

Example of an NFA

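[Figure: NFA with states q1, q2, q3, q4: q1 loops on 0, 1; q1 -> q2 on 1; q2 -> q3 on 0, ε; q3 -> q4 on 1; q4 loops on 0, 1]

δ     0       1          ε
q1    {q1}    {q1, q2}   ∅
q2    {q3}    ∅          {q3}
q3    ∅       {q4}       ∅
q4    {q4}    {q4}       ∅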

What strings does this NFA accept?
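One way to explore the question is to simulate the NFA by tracking the set of states it could be in, taking the ε-closure after every step. This Java sketch assumes q4 is the accept state (the transcript does not show which state is accepting) and encodes ε as the character 'e'; both are assumptions of the sketch, not part of the slide.

```java
import java.util.*;

/** Simulates the four-state NFA by tracking the SET of states it could be in. */
class NfaSim {
    static final char EPS = 'e';   // this sketch's stand-in for ε
    // TRANS.get(q).get(symbol) = set of next states (a missing entry means the empty set)
    static final Map<Integer, Map<Character, Set<Integer>>> TRANS = new HashMap<>();

    static void add(int from, char symbol, int... to) {
        Set<Integer> targets = TRANS.computeIfAbsent(from, k -> new HashMap<>())
                                    .computeIfAbsent(symbol, k -> new HashSet<>());
        for (int t : to) targets.add(t);
    }

    static {   // the δ table, with q1 = 1, ..., q4 = 4
        add(1, '0', 1);  add(1, '1', 1, 2);
        add(2, '0', 3);  add(2, EPS, 3);
        add(3, '1', 4);
        add(4, '0', 4);  add(4, '1', 4);
    }

    static final Set<Integer> ACCEPT = Set.of(4);   // assumption: q4 is the accept state

    static Set<Integer> move(Set<Integer> states, char symbol) {
        Set<Integer> next = new HashSet<>();
        for (int q : states)
            next.addAll(TRANS.getOrDefault(q, Map.of()).getOrDefault(symbol, Set.of()));
        return next;
    }

    /** ε-closure: every state reachable without consuming input. */
    static Set<Integer> closure(Set<Integer> states) {
        Set<Integer> result = new HashSet<>(states);
        Deque<Integer> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            for (int r : move(Set.of(work.pop()), EPS))
                if (result.add(r)) work.push(r);
        }
        return result;
    }

    /** Accept if at least one path consumes the whole input and ends in an accept state. */
    static boolean accepts(String input) {
        Set<Integer> current = closure(Set.of(1));
        for (char c : input.toCharArray()) current = closure(move(current, c));
        for (int q : current) if (ACCEPT.contains(q)) return true;
        return false;
    }
}
```

Trying a handful of strings with accepts(...) and looking at which ones reach q4 is a quick way to form and check a conjecture about the language.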

Regular Expressions

R is a regular expression if R is:
» a, for some a in Σ
» ε (the empty string)
» ∅ (the empty language)
» R1 | R2, the union of two regular expressions
» R1R2, the concatenation of two regular expressions
» R1* (Kleene closure: zero or more repetitions of R1)

Regular Expression Notation
a: an ordinary letter
ε: the empty string
M | N: choosing from M or N (alternation)
MN: concatenation of M and N
M*: zero or more times (Kleene star)
M+: one or more times
M?: zero or one occurrence
[a-zA-Z]: character set (alternation of the listed characters)
. (period): any single character except newline

Examples of Regular Expressions

{0, 1}*0: all strings that end in 0
{0, 1}0*: strings that start with 0 or 1, followed by zero or more 0s
{0, 1}*: all strings over {0, 1}
{0^n 1^n | n >= 0}: not a regular language, so no regular expression describes it!
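The first three expressions can be tried out with Java's java.util.regex, where a choice of one symbol such as {0, 1} is written as the character class [01]. This is an illustrative sketch; java.util.regex also offers features such as back-references that go beyond regular languages, so it is a convenient tester rather than a model of the theory.

```java
import java.util.regex.Pattern;

/** The slide's example expressions written in java.util.regex syntax. */
class RegexExamples {
    // {0, 1} (a choice of one symbol) becomes the character class [01].
    static final Pattern ENDS_IN_ZERO = Pattern.compile("[01]*0");
    static final Pattern ONE_SYMBOL_THEN_ZEROS = Pattern.compile("[01]0*");
    static final Pattern ALL_BINARY_STRINGS = Pattern.compile("[01]*");

    /** Matcher.matches() succeeds only if the WHOLE string matches, like language membership. */
    static boolean inLanguage(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(inLanguage(ENDS_IN_ZERO, "1100"));          // true
        System.out.println(inLanguage(ONE_SYMBOL_THEN_ZEROS, "0100")); // false: a 1 after the first symbol
        System.out.println(inLanguage(ALL_BINARY_STRINGS, ""));        // true: zero repetitions
    }
}
```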

Converting a Regular Expression to an NFA

[Figure: NFA fragments for a, ε, M|N, MN and M*, with the sub-machines for M and N joined by ε-transitions]

Regular expression->NFA

Language: Strings of 0s and 1s in which the number of 0s is even

Regular expression: (1*01*0)*1*

Converting an NFA to a DFA

For set of states S, closure(S) is the set of states that can be reached from S without consuming any input.

For a set of states S, DFAedge(S, c) is the set of states that can be reached from S by consuming input symbol c (followed by any ε-transitions, i.e. the closure of the result).

Each set of NFA states corresponds to one DFA state (hence at most 2^n DFA states for an NFA with n states).
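The closure and DFAedge definitions translate almost line by line into code. The following Java sketch builds only the DFA states reachable from closure({start}); the map-based NFA encoding, the 'e' character for ε, and the helper names addEdge and endsInOneNfa are assumptions made for illustration.

```java
import java.util.*;

/** Subset construction: every reachable set of NFA states becomes one DFA state. */
class SubsetConstruction {
    static final char EPS = 'e';   // assumed encoding of ε; must not be an input letter

    /** closure(S): states reachable from S without consuming any input. */
    static Set<Integer> closure(Map<Integer, Map<Character, Set<Integer>>> nfa, Set<Integer> s) {
        Set<Integer> result = new TreeSet<>(s);
        Deque<Integer> work = new ArrayDeque<>(s);
        while (!work.isEmpty()) {
            int q = work.pop();
            for (int r : nfa.getOrDefault(q, Map.of()).getOrDefault(EPS, Set.of()))
                if (result.add(r)) work.push(r);
        }
        return result;
    }

    /** DFAedge(S, c): closure of the states reachable from S by consuming c. */
    static Set<Integer> dfaEdge(Map<Integer, Map<Character, Set<Integer>>> nfa, Set<Integer> s, char c) {
        Set<Integer> moved = new TreeSet<>();
        for (int q : s) moved.addAll(nfa.getOrDefault(q, Map.of()).getOrDefault(c, Set.of()));
        return closure(nfa, moved);
    }

    /** DFA transition table keyed by sets of NFA states; the start state is inserted first. */
    static Map<Set<Integer>, Map<Character, Set<Integer>>> build(
            Map<Integer, Map<Character, Set<Integer>>> nfa, int start, String alphabet) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new LinkedHashMap<>();
        Set<Integer> s0 = closure(nfa, Set.of(start));
        dfa.put(s0, new HashMap<>());
        Deque<Set<Integer>> work = new ArrayDeque<>(List.of(s0));
        while (!work.isEmpty()) {
            Set<Integer> s = work.pop();
            for (char c : alphabet.toCharArray()) {
                Set<Integer> t = dfaEdge(nfa, s, c);   // an empty set acts as a dead state
                dfa.get(s).put(c, t);
                if (!dfa.containsKey(t)) {
                    dfa.put(t, new HashMap<>());
                    work.push(t);
                }
            }
        }
        return dfa;
    }

    static void addEdge(Map<Integer, Map<Character, Set<Integer>>> nfa, int from, char c, int to) {
        nfa.computeIfAbsent(from, k -> new HashMap<>()).computeIfAbsent(c, k -> new TreeSet<>()).add(to);
    }

    /** Hypothetical example NFA for "strings ending in 1": 1 loops on 0 and 1; 1 -> 2 on 1. */
    static Map<Integer, Map<Character, Set<Integer>>> endsInOneNfa() {
        Map<Integer, Map<Character, Set<Integer>>> nfa = new HashMap<>();
        addEdge(nfa, 1, '0', 1);
        addEdge(nfa, 1, '1', 1);
        addEdge(nfa, 1, '1', 2);
        return nfa;
    }
}
```

For the example NFA the construction yields two DFA states, {1} and {1, 2}, which play the roles of q1 and q2 in the two-state DFA shown earlier.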

NFA -> DFA

Initial classes: {A, B, E}, {C, D}

No class requires partitioning!

Hence a two-state DFA is obtained.

Obtaining the minimal equivalent DFA

Initially there are two equivalence classes: final and nonfinal states.

Search for an equivalence class C and an input letter a such that, with a as input, the states in C make transitions to states in k > 1 different equivalence classes.

Partition C into k classes accordingly. Repeat until no class can be partitioned.
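A compact way to implement this refinement is to give every state a signature (its current class plus the classes of its successors) and split any class whose members have different signatures. This Java sketch splits every such class in one round instead of one class at a time; both variants stop at the same coarsest stable partition. It assumes every state is reachable and that the DFA is given as an int[][] transition table.

```java
import java.util.*;

/** Partition refinement: start from {final, nonfinal} and split classes until stable. */
class DfaMinimizer {
    /**
     * delta[s][a] = next state; accept[s] = whether s is final.
     * Returns cls[s], the equivalence class of state s (one class per minimal-DFA state).
     */
    static int[] minimize(int[][] delta, boolean[] accept) {
        int n = delta.length;
        int[] cls = new int[n];
        for (int s = 0; s < n; s++) cls[s] = accept[s] ? 1 : 0;   // initial two classes
        int classes = (int) Arrays.stream(cls).distinct().count();
        while (true) {
            // Two states stay together only if they are in the same class AND, for every
            // letter a, their a-successors are in the same class.
            Map<List<Integer>, Integer> ids = new HashMap<>();
            int[] next = new int[n];
            for (int s = 0; s < n; s++) {
                List<Integer> signature = new ArrayList<>();
                signature.add(cls[s]);
                for (int a = 0; a < delta[s].length; a++) signature.add(cls[delta[s][a]]);
                Integer id = ids.get(signature);
                if (id == null) { id = ids.size(); ids.put(signature, id); }
                next[s] = id;
            }
            cls = next;
            if (ids.size() == classes) return cls;                   // no class was split
            classes = ids.size();
        }
    }
}
```

For example, a three-state DFA for "ends in 1" in which states 0 and 2 behave identically collapses to two classes, mirroring the "no class requires partitioning" outcome on the previous slide.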

Example (cont.)

Regular Grammar

Later definitions build on earlier ones.
Nothing is defined in terms of itself (no recursion).

Regular grammar for numeric literals in Pascal:
digit -> 0 | 1 | 2 | ... | 8 | 9
unsigned_integer -> digit digit*
unsigned_number -> unsigned_integer ( ( . unsigned_integer ) | ε ) ( ( e ( + | - | ε ) unsigned_integer ) | ε )
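Because nothing in this grammar is recursive, each definition can be spliced textually into the next, producing a single regular expression. A sketch in Java, assuming java.util.regex syntax and the lowercase e exponent marker exactly as written in the grammar above:

```java
import java.util.regex.Pattern;

/** The Pascal numeric-literal grammar, built bottom-up by string substitution. */
class PascalNumber {
    // Each name is replaced by its definition; this only works because nothing is recursive.
    static final String DIGIT = "[0-9]";
    static final String UNSIGNED_INTEGER = DIGIT + DIGIT + "*";
    static final String UNSIGNED_NUMBER =
        UNSIGNED_INTEGER
        + "(\\." + UNSIGNED_INTEGER + ")?"           // ( . unsigned_integer ) | ε
        + "(e[+-]?" + UNSIGNED_INTEGER + ")?";       // ( e ( + | - | ε ) unsigned_integer ) | ε
    static final Pattern PATTERN = Pattern.compile(UNSIGNED_NUMBER);

    static boolean isUnsignedNumber(String s) {
        return PATTERN.matcher(s).matches();
    }
}
```

With this definition "6.02e23" and "1e-5" are accepted, while "3." and ".5" are rejected, since the grammar requires digits on both sides of the decimal point.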

Languages and Automata in Programming Languages

Regular languages
» Recognized (accepted) by finite automata
» Useful for tokenizing program text (lexical analysis)

Context-free languages
» Recognized (accepted) by pushdown automata
» Useful for parsing the syntax of a program

Important Theorems

A language is regular if and only if a regular expression describes it.

A language is regular if and only if a finite automaton recognizes it.

DFAs and NFAs are equally powerful.

Context-free Grammars

Context-free grammars are defined by substitution rules

Example sentences:
Big Jim ate green cheese
green Jim ate green cheese
Jim ate cheese
Cheese ate Jim

Rules:
S -> P V P
P -> N
P -> A P
A -> big | green
N -> cheese | Jim
V -> ate

Context-free Grammars

Context-free grammars are used to formally describe the syntax of programming languages.

Every syntactically correct program can be derived from the context-free grammar of the language.

Parsing a program means reconstructing such a derivation, given the context-free grammar and the program.

Context-free Grammars

A context-free grammar consists of:
V: a finite set of variables (non-terminals)
Σ: a finite set of terminals
R: a finite set of rules of the form variable -> (string of variables and terminals)
S: the start variable

Pushdown Automata (PDA)

A pushdown automaton consists of:
Q: a set of states
Σ: input alphabet (of terminals)
Γ: stack alphabet
δ: a set of transition rules
   Q × Σε × Γε -> P(Q × Γε)
   (currentState, inputSymbol, headOfStack) -> (newState, pushSymbolOnStack)
q0: the start state
F: the set of accept states (subset of Q)

Deterministic: at most one move is possible from any configuration.

How does a PDA accept?

By final state:
» Consume all the input while
» reaching a final state

By empty stack:
» Consume all the input while
» having an empty stack
» The set of final states is irrelevant

Example of a PDA

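[Figure: PDA with states q1 to q4: q1 -> q2 on ε, ε -> $; q2 loops on 0, ε -> 0; q2 -> q3 on 1, 0 -> ε; q3 loops on 1, 0 -> ε; q3 -> q4 on ε, $ -> ε]

Notation: a, b -> c means that when the PDA reads “a” from the input, it replaces “b” at the top of the stack with “c”. An ε in any of the three positions means that nothing is read, popped or pushed, respectively.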

What does this PDA accept?
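A way to test hypotheses about this PDA is to simulate its transitions directly with a stack. The Java sketch below follows the four states in order; treating q1 as an accept state (so the empty string is accepted) and q4 as the final accept state is an assumption, since the accept markings are not visible in this transcript.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Direct simulation of the four-state PDA; the stack is an ArrayDeque of symbols. */
class ZeroOnePda {
    static boolean accepts(String input) {
        if (input.isEmpty()) return true;        // assumption: the start state q1 is also accepting
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$');                         // q1 -> q2 on  ε, ε -> $  (mark the bottom)
        int i = 0;
        while (i < input.length() && input.charAt(i) == '0') {
            stack.push('0');                     // q2 loop on  0, ε -> 0
            i++;
        }
        boolean inQ3 = false;
        while (i < input.length() && input.charAt(i) == '1') {
            if (stack.peek() != '0') return false;   // no 0 left to cancel: no move, reject
            stack.pop();                         // q2 -> q3, then q3 loop, on  1, 0 -> ε
            inQ3 = true;
            i++;
        }
        // q3 -> q4 on  ε, $ -> ε  needs '$' on top, and all input must be consumed
        return inQ3 && i == input.length() && stack.peek() == '$';
    }
}
```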

Important Theorems

A language is context-free if and only if a pushdown automaton recognizes it.

Non-deterministic PDAs are more powerful than deterministic ones.

Example of Context-free Language That Requires a Non-deterministic PDA

{ w w^R | w ∈ {0, 1}* }

i.e. w^R is w written backwards

Idea:

Non-deterministically guess the middle of the input string

The Solution

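[Figure: PDA with states q1 to q4: q1 -> q2 on ε, ε -> $; q2 loops on 0, ε -> 0 and 1, ε -> 1; q2 -> q3 on ε, ε -> ε (the guess of the middle); q3 loops on 1, 1 -> ε and 0, 0 -> ε; q3 -> q4 on ε, $ -> ε]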
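Nondeterminism can be simulated deterministically by trying every possible guess. This Java sketch runs one deterministic branch of the PDA for each candidate midpoint and accepts if any branch accepts; trying all n + 1 midpoints costs O(n^2) time here, which is fine for illustrating the idea but is not how a practical recognizer would work.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simulates the nondeterministic "guess the middle" PDA by trying every guess. */
class PalindromePda {
    static boolean accepts(String input) {
        for (int mid = 0; mid <= input.length(); mid++) {
            if (runWithGuess(input, mid)) return true;   // accept if ANY branch accepts
        }
        return false;
    }

    /** One deterministic branch: push the first 'mid' symbols, then pop and match the rest. */
    static boolean runWithGuess(String input, int mid) {
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$');                                        // ε, ε -> $
        for (int i = 0; i < mid; i++) stack.push(input.charAt(i));   // 0, ε -> 0 and 1, ε -> 1
        for (int i = mid; i < input.length(); i++) {            // 0, 0 -> ε and 1, 1 -> ε
            if (stack.peek() != input.charAt(i)) return false;  // mismatch: this branch dies
            stack.pop();
        }
        return stack.peek() == '$';                             // ε, $ -> ε
    }
}
```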

Derivations and Parse Trees

Nested constructs require recursion, i.e. context-free grammars

CFG for arithmetic expressions

expression -> identifier | number | - expression | (expression) | expression operator expression

operator -> + | - | * | /

Parse Tree for Slope*x + Intercept

Is this the only parse tree for this expression and grammar?

A Better Expression Grammar

1. expression -> term | expression add_op term

2. term -> factor | term mult_op factor

3. factor -> identifier | number | - factor | (expression)

4. add_op -> + | -

5. mult_op -> * | /

A good grammar reflects the internal structure of programs.

This grammar is unambiguous and captures (how?):
- operator precedence (*, / bind tighter than +, -)
- associativity (operators group left to right)

And Better Parse Trees...

3 + 4 * 5

10 - 4 - 3
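The grammar can be turned into an evaluator that reproduces both trees. In this Java sketch the left-recursive rules 1 and 2 are written as loops (a common way to handle them in recursive descent, since a method that begins by calling itself would never terminate), and identifiers are omitted; integer arithmetic and the class name ExprEvaluator are assumptions.

```java
/** Evaluates integer expressions with the precedence and associativity of the grammar above. */
class ExprEvaluator {
    private final String src;
    private int pos = 0;

    private ExprEvaluator(String text) { src = text.replace(" ", ""); }

    static int eval(String text) {
        ExprEvaluator p = new ExprEvaluator(text);
        int value = p.expression();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return value;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    // expression -> term | expression add_op term   (written as: term { add_op term })
    private int expression() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;   // fold leftwards: left associative
        }
        return value;
    }

    // term -> factor | term mult_op factor   (written as: factor { mult_op factor })
    private int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            int rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // factor -> number | - factor | ( expression )   (identifiers omitted in this sketch)
    private int factor() {
        if (peek() == '-') { pos++; return -factor(); }
        if (peek() == '(') {
            pos++;
            int value = expression();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a number at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }
}
```

Because term() is called inside the loop of expression(), 4 * 5 is grouped before the addition, so eval("3 + 4 * 5") is 23; and because the loop folds from the left, eval("10 - 4 - 3") is 3 rather than 9.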

Syntax-directed Compilation

The parser calls the scanner to obtain tokens, assembles the tokens into a parse tree, and passes the tree to later phases of compilation.
Scanner: deterministic finite automaton.
Parser: pushdown automaton.
Scanners and parsers can be generated automatically from regular expressions and CFGs (e.g. lex/yacc).

Scanning

Accept the longest possible token in each invocation of the scanner.

Implementation:
» Capture the finite automaton, either as
   case (switch) statements, or
   a table and a driver.
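The longest-match rule and the case-statement implementation can be sketched together. The token set below (identifiers, unsigned integers, := : <= <> < >= > and single-character punctuation) is an illustrative, Pascal-flavoured assumption, not the full Pascal scanner from the slides.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written scanner: switch on the first character, then take the LONGEST token. */
class TinyScanner {
    static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            char next = i + 1 < src.length() ? src.charAt(i + 1) : '\0';   // one character of lookahead
            switch (c) {
                case ':': case '>':
                    i += (next == '=') ? 2 : 1;                  // ":=" beats ":", ">=" beats ">"
                    break;
                case '<':
                    i += (next == '=' || next == '>') ? 2 : 1;   // "<=" and "<>" beat "<"
                    break;
                case '+': case '-': case '*': case '/': case '=':
                case '(': case ')': case ',': case ';':
                    i++;
                    break;
                default:
                    if (Character.isLetter(c)) {                 // identifier: letter (letter | digit)*
                        while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                    } else if (Character.isDigit(c)) {           // unsigned integer: digit digit*
                        while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                    } else {
                        throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
                    }
            }
            tokens.add(src.substring(start, i));
        }
        return tokens;
    }
}
```

The inner while loops are what implement "longest possible token": an identifier keeps growing while letters or digits follow, so y1 is one token rather than y followed by 1.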

Scanner for Pascal

Scanner for Pascal (case statements)

Scanner (table and driver)

Scanner Generators

Start with a regular expression.
Construct an NFA from it.
Use a set-of-subsets construction to obtain an equivalent DFA.
Construct the minimal equivalent DFA.

Outline

DFA & NFA
Regular expressions
Regular languages
Context-free languages & PDA
Scanner
Parser
» Top-down parsing
» Bottom-up parsing
» Comparison

Parsing approaches

Parsing an arbitrary context-free grammar costs O(n^3) in general. We need classes of grammars that can be parsed in linear time:
» Top-down, or predictive, or recursive-descent, or LL parsing (Left-to-right scan, Leftmost derivation)
» Bottom-up, or shift-reduce, or LR parsing (Left-to-right scan, Rightmost derivation, built in reverse)

A Simple Grammar for a Comma-separated List of Identifiers

id_list -> id id_list_tail

id_list_tail -> , id id_list_tail

id_list_tail -> ;

String to be parsed: A, B, C;
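A recursive-descent parser for this grammar has one method per non-terminal, and each method chooses a production by looking at a single token. The Java sketch below assumes the input has already been scanned into a token list and that an id is a letter followed by letters, digits or underscores.

```java
import java.util.ArrayList;
import java.util.List;

/** Recursive-descent parser for id_list: one method per non-terminal, one token of lookahead. */
class IdListParser {
    private final List<String> tokens;
    private int pos = 0;
    private final List<String> ids = new ArrayList<>();   // identifiers seen, in order

    private IdListParser(List<String> tokens) { this.tokens = tokens; }

    /** Returns the identifiers, or throws IllegalArgumentException on a syntax error. */
    static List<String> parse(List<String> tokens) {
        IdListParser p = new IdListParser(tokens);
        p.idList();
        if (p.pos != tokens.size()) throw new IllegalArgumentException("input after ';'");
        return p.ids;
    }

    static boolean accepts(List<String> tokens) {
        try { parse(tokens); return true; } catch (IllegalArgumentException e) { return false; }
    }

    private String lookahead() { return pos < tokens.size() ? tokens.get(pos) : "$"; }

    private void match(String expected) {
        if (!lookahead().equals(expected))
            throw new IllegalArgumentException("expected " + expected + " but saw " + lookahead());
        pos++;
    }

    // id_list -> id id_list_tail
    private void idList() { id(); idListTail(); }

    // id_list_tail -> , id id_list_tail  |  ;
    private void idListTail() {
        if (lookahead().equals(",")) { match(","); id(); idListTail(); }
        else if (lookahead().equals(";")) match(";");
        else throw new IllegalArgumentException("expected , or ; but saw " + lookahead());
    }

    // id: assumed here to be a letter followed by letters, digits or underscores
    private void id() {
        String t = lookahead();
        if (!t.matches("[A-Za-z]\\w*")) throw new IllegalArgumentException("expected id but saw " + t);
        ids.add(t);
        pos++;
    }
}
```

Parsing the tokens A , B , C ; calls idList once and idListTail three times; that call sequence is exactly the top-down parse tree.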

Top-down/bottom-up Parsing

Top-down Parsing

Predicts a derivation.
Matches a non-terminal against the token observed in the input.

LL(1) Grammar

A grammar for which a top-down deterministic parser can be produced with one token of look-ahead.

LL(1) grammar:
» For a given non-terminal, the lookahead symbol uniquely determines the production to apply
» Top-down parsing = predictive parsing
» Driven by a predictive parsing table: non-terminals × terminals -> productions

From Last Time: Parsing with Table

Input: (1+2+(3+4))+5

Partly-derived string   Lookahead
ES’                     (
(S)S’                   1
(ES’)S’                 1
(1S’)S’                 +
(1+ES’)S’               2
(1+2S’)S’               +

Grammar: S -> E S’    S’ -> ε | + S    E -> num | ( S )

        num         +           (           )           $
S       S -> E S’               S -> E S’
S’                  S’ -> + S               S’ -> ε     S’ -> ε
E       E -> num                E -> ( S )
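The table-driven loop itself is short: pop the top of the stack, expand a non-terminal using the table entry for the lookahead, or match a terminal. This Java sketch encodes the parse table for S -> E S’, S’ -> ε | + S, E -> num | ( S ); the token names and the Map-based table are assumptions.

```java
import java.util.*;

/** Predictive (LL(1)) parser driven by the parse table for S -> E S', S' -> ε | + S, E -> num | ( S ). */
class LL1TableParser {
    static final List<String> EPSILON = List.of();   // an empty right-hand side
    // TABLE.get(X).get(t) = right-hand side to predict for non-terminal X on lookahead t.
    static final Map<String, Map<String, List<String>>> TABLE = Map.of(
        "S",  Map.of("num", List.of("E", "S'"), "(", List.of("E", "S'")),
        "S'", Map.of("+", List.of("+", "S"), ")", EPSILON, "$", EPSILON),
        "E",  Map.of("num", List.of("num"), "(", List.of("(", "S", ")")));

    /** tokens are terminals such as "num", "+", "(" and ")"; "$" marks the end of input. */
    static boolean parse(List<String> tokens) {
        List<String> input = new ArrayList<>(tokens);
        input.add("$");
        Deque<String> stack = new ArrayDeque<>();
        stack.push("$");
        stack.push("S");                                  // start symbol on top
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.pop();
            String look = input.get(pos);
            if (TABLE.containsKey(top)) {                 // non-terminal: predict from the table
                List<String> rhs = TABLE.get(top).get(look);
                if (rhs == null) return false;            // empty cell = syntax error
                for (int i = rhs.size() - 1; i >= 0; i--) stack.push(rhs.get(i)); // leftmost on top
            } else if (top.equals(look)) {                // terminal or $: must match the lookahead
                pos++;
            } else {
                return false;
            }
        }
        return pos == input.size();
    }
}
```

Pushing the right-hand side in reverse keeps its leftmost symbol on top of the stack, so the stack always holds the unexpanded suffix of the partly-derived string shown in the trace.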

How to Construct Parsing Tables?

Needed: an algorithm for automatically generating a predictive parse table from a grammar

S -> E S’    S’ -> ε | + S    E -> number | ( S )

        num         +           (           )           $
S       S -> E S’               S -> E S’
S’                  S’ -> + S               S’ -> ε     S’ -> ε
E       E -> num                E -> ( S )

??

Constructing Parse Tables

We can construct a predictive parser if:
» For every non-terminal, every lookahead symbol can be handled by at most one production

FIRST(γ), for an arbitrary string γ of terminals and non-terminals, is:
» The set of symbols that might begin the fully expanded version of γ

FOLLOW(X), for a non-terminal X, is:
» The set of symbols that might follow the derivation of X in the input stream

[Figure: FIRST and FOLLOW sets relative to a non-terminal X in a derivation]

Parse Table Entries

Consider a production X -> γ:
» Add X -> γ to the X row for each symbol in FIRST(γ)
» If γ can derive ε (γ is nullable), add X -> γ for each symbol in FOLLOW(X)
The grammar is LL(1) if there are no conflicting entries.

        num         +           (           )           $
S       S -> E S’               S -> E S’
S’                  S’ -> + S               S’ -> ε     S’ -> ε
E       E -> num                E -> ( S )

S -> E S’    S’ -> ε | + S    E -> number | ( S )

Computing Nullable

X is nullable if it can derive the empty string:
» If it derives ε directly (X -> ε)
» If it has a production X -> Y Z ... where all RHS symbols (Y, Z, ...) are nullable

Algorithm: assume all non-terminals are non-nullable, then apply the rules repeatedly until nothing changes.

S -> E S’    S’ -> ε | + S    E -> number | ( S )

Only S’ is nullable.

Computing FIRST

Determining FIRST(X):
1. If X is a terminal, then add X to FIRST(X).
2. If X -> ε, then add ε to FIRST(X).
3. If X is a non-terminal and X -> Y1 Y2 ... Yk, then a is in FIRST(X) if a is in FIRST(Yi) and ε is in FIRST(Yj) for j = 1 ... i-1 (i.e., the prefix Y1 ... Yi-1 can derive the empty string).
4. If ε is in FIRST(Y1 Y2 ... Yk), then ε is in FIRST(X).

FIRST Example

S -> E S’    S’ -> ε | + S    E -> number | ( S )

Apply rule 1: FIRST(num) = {num}, FIRST(+) = {+}, etc.
Apply rule 2: FIRST(S’) = {ε}
Apply rule 3: FIRST(S) = FIRST(E) = {}
   FIRST(S’) = FIRST('+') + {ε} = {ε, +}
   FIRST(E) = FIRST(num) + FIRST('(') = {num, (}
Rule 3 again: FIRST(S) = FIRST(E) = {num, (}
   FIRST(S’) = {ε, +}
   FIRST(E) = {num, (}

Computing FOLLOW

Determining FOLLOW(X):
1. If S is the start symbol, then $ is in FOLLOW(S).
2. If A -> α B β, then add every symbol of FIRST(β) other than ε to FOLLOW(B).
3. If A -> α B, or A -> α B β where ε is in FIRST(β), then add FOLLOW(A) to FOLLOW(B).

FOLLOW Example

S -> E S’    S’ -> ε | + S    E -> number | ( S )

FIRST(S) = {num, (}    FIRST(S’) = {ε, +}    FIRST(E) = {num, (}

Apply rule 1: FOLLOW(S) = {$}
Apply rule 2: S -> E S’: FOLLOW(E) += FIRST(S’) - {ε} = {+}
   S’ -> ε | + S: nothing added
   E -> num | ( S ): FOLLOW(S) += FIRST(')') - {ε} = {$, )}
Apply rule 3: S -> E S’: FOLLOW(E) += FOLLOW(S) = {+, $, )} (because S’ is nullable)
   FOLLOW(S’) += FOLLOW(S) = {$, )}
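The three fixed-point computations can be run mechanically. This Java sketch computes nullable, FIRST (with ε tracked separately through the nullable set) and FOLLOW by repeating every rule until no set changes; the Map-of-lists grammar encoding and the slideGrammar helper are assumptions made for illustration.

```java
import java.util.*;

/** Fixed-point computation of nullable, FIRST and FOLLOW for a grammar given as a map. */
class FirstFollow {
    final Map<String, List<List<String>>> rules;   // non-terminal -> alternatives (empty list = ε)
    final Set<String> nullable = new HashSet<>();
    final Map<String, Set<String>> first = new HashMap<>();   // FIRST without ε; see 'nullable'
    final Map<String, Set<String>> follow = new HashMap<>();

    FirstFollow(Map<String, List<List<String>>> rules, String start) {
        this.rules = rules;
        for (String x : rules.keySet()) {
            first.put(x, new TreeSet<>());
            follow.put(x, new TreeSet<>());
        }
        follow.get(start).add("$");                                    // FOLLOW rule 1
        boolean changed = true;
        while (changed) {                                              // iterate until nothing changes
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : rules.entrySet()) {
                String a = e.getKey();
                for (List<String> rhs : e.getValue()) {
                    if (!nullable.contains(a) && isNullable(rhs)) changed |= nullable.add(a);
                    changed |= first.get(a).addAll(firstOf(rhs));
                    for (int i = 0; i < rhs.size(); i++) {
                        String b = rhs.get(i);
                        if (!rules.containsKey(b)) continue;           // FOLLOW is only for non-terminals
                        List<String> beta = rhs.subList(i + 1, rhs.size());
                        changed |= follow.get(b).addAll(firstOf(beta));          // FOLLOW rule 2
                        if (isNullable(beta)) changed |= follow.get(b).addAll(follow.get(a)); // rule 3
                    }
                }
            }
        }
    }

    /** True if every symbol is a nullable non-terminal (so the empty sequence is nullable). */
    boolean isNullable(List<String> symbols) {
        for (String s : symbols) if (!nullable.contains(s)) return false;
        return true;
    }

    /** FIRST of a sequence of symbols, excluding ε. */
    Set<String> firstOf(List<String> symbols) {
        Set<String> result = new TreeSet<>();
        for (String s : symbols) {
            if (!rules.containsKey(s)) { result.add(s); break; }       // terminal: FIRST rule 1
            result.addAll(first.get(s));
            if (!nullable.contains(s)) break;                          // later symbols are hidden
        }
        return result;
    }

    /** The grammar from the slides: S -> E S', S' -> ε | + S, E -> num | ( S ). */
    static Map<String, List<List<String>>> slideGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("S", List.of(List.of("E", "S'")));
        g.put("S'", List.of(List.<String>of(), List.of("+", "S")));
        g.put("E", List.of(List.of("num"), List.of("(", "S", ")")));
        return g;
    }
}
```

On the slide grammar this reproduces the hand computation: only S’ is nullable, FIRST(S) = {num, (}, FOLLOW(S’) = {$, )} and FOLLOW(E) = {+, ), $}.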

Putting it all Together

FOLLOW(S) = {$, )}    FOLLOW(S’) = {$, )}    FOLLOW(E) = {+, ), $}

FIRST(S) = {num, (}    FIRST(S’) = {ε, +}    FIRST(E) = {num, (}

Consider a production X -> γ:
» Add X -> γ to the X row for each symbol in FIRST(γ)
» If γ can derive ε (γ is nullable), add X -> γ for each symbol in FOLLOW(X)

        num         +           (           )           $
S       S -> E S’               S -> E S’
S’                  S’ -> + S               S’ -> ε     S’ -> ε
E       E -> num                E -> ( S )

S -> E S’    S’ -> ε | + S    E -> number | ( S )

Ambiguous Grammars

Constructing a predictive parse table for an ambiguous grammar results in conflicts in the table (i.e., two or more productions to apply in the same cell).

S -> S + S | S * S | num

FIRST(S + S) = FIRST(S * S) = FIRST(num) = {num}

Class Problem

E -> E + T | T
T -> T * F | F
F -> ( E ) | num | ε

1. Compute the FIRST and FOLLOW sets for this grammar.
2. Compute the parse table entries.

Top-Down Parsing Up to This Point

Now we know:
» How to build a parsing table for an LL(1) grammar (i.e., FIRST/FOLLOW)
» How to construct a recursive-descent parser from the parsing table
» Call tree = parse tree

Open question: can we generate the AST?

Creating the Abstract Syntax Tree

Some class definitions to assist with AST construction:

class Expr {}

class Add extends Expr {
    Expr left, right;
    Add(Expr L, Expr R) { left = L; right = R; }
}

class Num extends Expr {
    int value;
    Num(int v) { value = v; }
}

[Figure: class hierarchy: Add and Num are subclasses of Expr]

Creating the AST

[Figure: parse tree for (1 + 2 + (3 + 4)) + 5 beside its AST; the AST has + at the root with children 1 + (2 + (3 + 4)) and 5]

• We got the parse tree from the call tree

• Just add code to each parsing routine to create the appropriate nodes

• This works because the parse tree and the call tree have the same shape, and the AST is just a compressed form of the parse tree

AST Creation: parse_E

Expr parse_E() {
    switch (token) {
        case num: {   // E -> number
            Expr result = new Num(token.value);
            token = input.read();
            return result;
        }
        case '(': {   // E -> ( S )
            token = input.read();
            Expr result = parse_S();
            if (token != ')') ParseError();
            token = input.read();
            return result;
        }
        default:
            ParseError();   // assumed to throw, so no value is returned
    }
}

Remember: token is the lookahead token.

S -> E S’    S’ -> ε | + S    E -> number | ( S )

AST Creation: parse_S

Expr parse_S() {
    switch (token) {
        case num:
        case '(':   // S -> E S’
            Expr left = parse_E();
            Expr right = parse_S_prime();   // parse_S’ on the slide; null means S’ -> ε
            if (right == null) return left;
            else return new Add(left, right);
        default:
            ParseError();   // assumed to throw
    }
}

S -> E S’    S’ -> ε | + S    E -> number | ( S )
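For reference, here is one way to assemble the pieces into a complete, runnable Java sketch. It reads characters directly instead of calling a separate scanner, represents ε from S’ -> ε as null (as parse_S does above), and adds toString and eval methods to the slide's classes for testing; these additions and the name AstParser are assumptions.

```java
/** Expr, Add and Num as on the slides (plus toString/eval for testing), and a parser that builds them. */
abstract class Expr {
    abstract int eval();
}

class Add extends Expr {
    final Expr left, right;
    Add(Expr l, Expr r) { left = l; right = r; }
    int eval() { return left.eval() + right.eval(); }
    @Override public String toString() { return "(" + left + " + " + right + ")"; }
}

class Num extends Expr {
    final int value;
    Num(int v) { value = v; }
    int eval() { return value; }
    @Override public String toString() { return Integer.toString(value); }
}

class AstParser {
    private final String src;
    private int pos = 0;                       // index of the lookahead character

    private AstParser(String text) { src = text.replace(" ", ""); }

    static Expr parse(String text) {
        AstParser p = new AstParser(text);
        Expr e = p.parseS();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return e;
    }

    private char lookahead() { return pos < src.length() ? src.charAt(pos) : '$'; }

    // S -> E S'
    private Expr parseS() {
        char c = lookahead();
        if (!Character.isDigit(c) && c != '(') throw new IllegalArgumentException("unexpected " + c);
        Expr left = parseE();
        Expr right = parseSPrime();
        return right == null ? left : new Add(left, right);
    }

    // S' -> ε | + S   (null stands for ε, as on the slide)
    private Expr parseSPrime() {
        char c = lookahead();
        if (c == '+') { pos++; return parseS(); }
        if (c == ')' || c == '$') return null;  // FOLLOW(S') = { ), $ }
        throw new IllegalArgumentException("unexpected " + c);
    }

    // E -> num | ( S )
    private Expr parseE() {
        char c = lookahead();
        if (Character.isDigit(c)) {
            int start = pos;
            while (Character.isDigit(lookahead())) pos++;
            return new Num(Integer.parseInt(src.substring(start, pos)));
        }
        if (c == '(') {
            pos++;
            Expr result = parseS();
            if (lookahead() != ')') throw new IllegalArgumentException("expected )");
            pos++;
            return result;
        }
        throw new IllegalArgumentException("unexpected " + c);
    }
}
```

Parsing (1+2+(3+4))+5 yields the tree ((1 + (2 + (3 + 4))) + 5): the right-recursive S’ rule makes each + group to the right, which is the shape shown in the AST figure.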

Grammars

We have been using a grammar for the language of “sums with parentheses”, e.g. (1+2+(3+4))+5.

We started with a simple, right-associative grammar:
» S -> E + S | E
» E -> num | ( S )

and transformed it to an LL(1) grammar by left factoring:
» S -> E S’
» S’ -> ε | + S
» E -> num | ( S )

What if we start with a left-associative grammar?
» S -> S + E | E
» E -> num | ( S )

Reminder: Left vs Right Associativity

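[Figure: two trees for 1 + 2 + 3 + 4. With the right-recursive grammar S -> E + S | E, E -> num the tree groups as 1 + (2 + (3 + 4)); with the left-recursive grammar S -> S + E | E, E -> num it groups as ((1 + 2) + 3) + 4]

Right recursion: right associative

Left recursion: left associative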

Consider a simpler string on a simpler grammar: “1 + 2 + 3 + 4”

Left Recursion

Derived string    Lookahead    Input
S                 1            1+2+3+4
S+E               1            1+2+3+4
S+E+E             1            1+2+3+4
S+E+E+E           1            1+2+3+4
E+E+E+E           1            1+2+3+4
1+E+E+E           2            1+2+3+4
1+2+E+E           3            1+2+3+4
1+2+3+E           4            1+2+3+4
1+2+3+4           $            1+2+3+4

Is this right? If not, what’s the problem?

S -> S + E | E    E -> num

“1 + 2 + 3 + 4”

Left-Recursive Grammars

Left-recursive grammars don’t work with top-down parsers: we don’t know when to stop the recursion.

Left-recursive grammars are NOT LL(1)!
» S -> S α
» S -> β

In the parse table:
» Both productions will appear in the predictive table at row S in all the columns corresponding to FIRST(β)

Eliminate Left Recursion

Replace
» X -> X α1 | ... | X αm
» X -> β1 | ... | βn

With
» X -> β1 X’ | ... | βn X’
» X’ -> α1 X’ | ... | αm X’ | ε

See the complete algorithm in the Dragon book.
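The replacement above is mechanical for a single non-terminal. This Java sketch handles immediate left recursion only (indirect left recursion needs the complete algorithm from the Dragon book) and names the new non-terminal by appending a prime; the List-of-symbol-lists representation is an assumption.

```java
import java.util.*;

/** Removes immediate left recursion from the alternatives of one non-terminal X. */
class LeftRecursion {
    /**
     * alternatives: right-hand sides of X, each a list of symbols.
     * Returns new rules for X and X' (the name X + "'" is this sketch's choice); an empty list is ε.
     */
    static Map<String, List<List<String>>> eliminate(String x, List<List<String>> alternatives) {
        String xPrime = x + "'";
        List<List<String>> xRules = new ArrayList<>();
        List<List<String>> primeRules = new ArrayList<>();
        for (List<String> alt : alternatives) {
            List<String> rhs;
            if (!alt.isEmpty() && alt.get(0).equals(x)) {        // X -> X α   becomes  X' -> α X'
                rhs = new ArrayList<>(alt.subList(1, alt.size()));
                rhs.add(xPrime);
                primeRules.add(rhs);
            } else {                                             // X -> β     becomes  X -> β X'
                rhs = new ArrayList<>(alt);
                rhs.add(xPrime);
                xRules.add(rhs);
            }
        }
        if (primeRules.isEmpty()) return Map.of(x, alternatives);   // X was not left-recursive
        primeRules.add(List.of());                                   // X' -> ε
        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        result.put(x, xRules);
        result.put(xPrime, primeRules);
        return result;
    }
}
```

Applied to S -> S + E | E, it produces S -> E S’ and S’ -> + E S’ | ε, the LL(1) form used on the following slides.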

Class Problem

Transform the following grammar to eliminate left recursion:

E -> E + T | T
T -> T * F | F
F -> ( E ) | num

Creating an LL(1) Grammar

Start with a left-recursive grammar:
S -> S + E
S -> E
» and apply the left-recursion elimination algorithm:
S -> E S’
S’ -> + E S’ | ε

Start with a right-recursive grammar:
S -> E + S
S -> E
» and apply left factoring to eliminate common prefixes:
S -> E S’
S’ -> + S | ε

Top-Down Parsing Summary

Language grammar -> (left-recursion elimination, left factoring) -> LL(1) grammar -> (FIRST, FOLLOW) -> predictive parsing table -> recursive-descent parser -> parser with AST generation

New Topic: Bottom-Up Parsing

A more powerful parsing technology

LR grammars: more expressive than LL
» Construct the right-most derivation of the program (in reverse)
» Handle left-recursive grammars; virtually all programming-language grammars are left-recursive
» Easier to express syntax

Shift-reduce parsers
» Parsers for LR grammars
» Automatic parser generators (yacc, bison)

Bottom-Up Parsing (2)

Right-most derivation, backward:
» Start with the tokens
» End with the start symbol
» Match a substring on the RHS of a production and replace it by the LHS

S -> S + E | E    E -> num | ( S )

(1+2+(3+4))+5
(E+2+(3+4))+5
(S+2+(3+4))+5
(S+E+(3+4))+5
(S+(3+4))+5
(S+(E+4))+5
(S+(S+4))+5
(S+(S+E))+5
(S+(S))+5
(S+E)+5
(S)+5
E+5
S+5
S+E
S

Shift-Reduce Parsing

Parsing actions: a sequence of shift and reduce operations

Parser state: a stack of terminals and non-terminals (grows to the right)

Current derivation step = stack + input

Derivation step    Stack    Unconsumed input
(1+2+(3+4))+5               (1+2+(3+4))+5
(E+2+(3+4))+5      (E       +2+(3+4))+5
(S+2+(3+4))+5      (S       +2+(3+4))+5
(S+E+(3+4))+5      (S+E     +(3+4))+5
...

Shift-Reduce Actions

Parsing is a sequence of shifts and reduces.
Shift: move the look-ahead token to the stack.

Reduce: replace the symbols γ at the top of the stack with the non-terminal X corresponding to the production X -> γ (i.e., pop γ, push X).

Stack    Input            Action
(        1+2+(3+4))+5     shift 1
(1       +2+(3+4))+5

Stack    Input            Action
(S+E     +(3+4))+5        reduce S -> S + E
(S       +(3+4))+5

Shift-Reduce Parsing

Derivation        Stack    Input stream      Action
(1+2+(3+4))+5              (1+2+(3+4))+5     shift
(1+2+(3+4))+5     (        1+2+(3+4))+5      shift
(1+2+(3+4))+5     (1       +2+(3+4))+5       reduce E -> num
(E+2+(3+4))+5     (E       +2+(3+4))+5       reduce S -> E
(S+2+(3+4))+5     (S       +2+(3+4))+5       shift
(S+2+(3+4))+5     (S+      2+(3+4))+5        shift
(S+2+(3+4))+5     (S+2     +(3+4))+5         reduce E -> num
(S+E+(3+4))+5     (S+E     +(3+4))+5         reduce S -> S + E
(S+(3+4))+5       (S       +(3+4))+5         shift
(S+(3+4))+5       (S+      (3+4))+5          shift
(S+(3+4))+5       (S+(     3+4))+5           shift
(S+(3+4))+5       (S+(3    +4))+5            reduce E -> num
...

S -> S + E | E    E -> num | ( S )

Potential Problems

How do we know which action to take: whether to shift or reduce, and which production to apply?

Issues:
» Sometimes we can reduce but should not
» Sometimes we can reduce in different ways

Action Selection Problem

Given stack σ and look-ahead symbol b, should the parser:
» Shift b onto the stack, making it σb?
» Reduce X -> γ, assuming that the stack has the form σ = αγ, making it αX?

If the stack has the form αγ, the parser should apply the reduction X -> γ (or shift) depending on the stack prefix α.

α is different for different possible reductions, since the γ’s have different lengths.

LR Parsing Engine

Basic mechanism:
» Use a set of parser states
» Use a stack with alternating symbols and states, e.g., 1 ( 6 S 10 + 5 (here 1, 6, 10 and 5 are state numbers)
» Use a parsing table to determine what action to apply (shift/reduce) and to determine the next state

The parser actions can be precisely determined from the table.

LR Parsing Table

Algorithm: look at the entry for the current state S and input terminal C:
» If Table[S, C] = s(S’), then shift: push(C), push(S’)
» If Table[S, C] = X -> α, then reduce: pop(2 * |α|), S’ = top(), push(X), push(Table[S’, X])

[Figure: layout of the LR parsing table: one row per state; the action table (columns: terminals) gives the next action and next state; the goto table (columns: non-terminals) gives the next state]

LR Parsing Table Example

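State  (         )         id        ,         $         S    L
1      s3                  s2                            g4
2      S -> id   S -> id   S -> id   S -> id   S -> id
3      s3                  s2                            g7   g5
4                                              accept
5                s6                  s8
6      S -> (L)  S -> (L)  S -> (L)  S -> (L)  S -> (L)
7      L -> S    L -> S    L -> S    L -> S    L -> S
8      s3                  s2                            g9
9      L -> L,S  L -> L,S  L -> L,S  L -> L,S  L -> L,S

Columns: input terminals ( ) id , $ (action table), then non-terminals S L (goto table). sN = shift and go to state N; gN = go to state N; X -> α = reduce by that production.

We want to derive this in an algorithmic fashion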

Parsing Example ((a),b)

Derivation   Stack                   Input     Action
((a),b)      1                       ((a),b)   shift, goto 3
((a),b)      1 ( 3                   (a),b)    shift, goto 3
((a),b)      1 ( 3 ( 3               a),b)     shift, goto 2
((a),b)      1 ( 3 ( 3 a 2           ),b)      reduce S -> id
((S),b)      1 ( 3 ( 3 S 7           ),b)      reduce L -> S
((L),b)      1 ( 3 ( 3 L 5           ),b)      shift, goto 6
((L),b)      1 ( 3 ( 3 L 5 ) 6       ,b)       reduce S -> (L)
(S,b)        1 ( 3 S 7               ,b)       reduce L -> S
(L,b)        1 ( 3 L 5               ,b)       shift, goto 8
(L,b)        1 ( 3 L 5 , 8           b)        shift, goto 2
(L,b)        1 ( 3 L 5 , 8 b 2       )         reduce S -> id
(L,S)        1 ( 3 L 5 , 8 S 9       )         reduce L -> L,S
(L)          1 ( 3 L 5               )         shift, goto 6
(L)          1 ( 3 L 5 ) 6                     reduce S -> (L)
S            1 S 4                   $         done

S -> (L) | id    L -> S | L,S
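The driver loop from the LR Parsing Table slide can be run against the example table. This Java sketch hard-codes that table for S -> (L) | id, L -> S | L,S with the same state numbers; encoding actions as strings such as "s3" and "r2" (productions numbered 1 to 4 in the order S -> (L), S -> id, L -> S, L -> L,S) is an assumption of the sketch.

```java
import java.util.*;

/** LR driver using the example table for S -> (L) | id, L -> S | L,S. */
class LrDriver {
    // Productions numbered 1..4: S -> (L), S -> id, L -> S, L -> L,S  (index 0 unused)
    static final String[] LHS = {null, "S", "S", "L", "L"};
    static final int[] RHS_LEN = {0, 3, 1, 1, 3};
    static final Map<Integer, Map<String, String>> ACTION = new HashMap<>();
    static final Map<Integer, Map<String, Integer>> GOTO = new HashMap<>();

    static void act(int state, String terminal, String action) {
        ACTION.computeIfAbsent(state, k -> new HashMap<>()).put(terminal, action);
    }

    static void reduceOnAll(int state, int production) {
        for (String t : List.of("(", ")", "id", ",", "$")) act(state, t, "r" + production);
    }

    static {   // rows 1-9 of the table: sN = shift and go to N, rN = reduce by production N
        act(1, "(", "s3"); act(1, "id", "s2"); GOTO.put(1, Map.of("S", 4));
        reduceOnAll(2, 2);
        act(3, "(", "s3"); act(3, "id", "s2"); GOTO.put(3, Map.of("S", 7, "L", 5));
        act(4, "$", "accept");
        act(5, ")", "s6"); act(5, ",", "s8");
        reduceOnAll(6, 1);
        reduceOnAll(7, 3);
        act(8, "(", "s3"); act(8, "id", "s2"); GOTO.put(8, Map.of("S", 9));
        reduceOnAll(9, 4);
    }

    static boolean parse(List<String> tokens) {
        List<String> input = new ArrayList<>(tokens);
        input.add("$");
        Deque<Object> stack = new ArrayDeque<>();   // alternating symbols and states, state on top
        stack.push(1);
        int pos = 0;
        while (true) {
            int state = (Integer) stack.peek();
            String action = ACTION.getOrDefault(state, Map.of()).get(input.get(pos));
            if (action == null) return false;                 // blank entry: syntax error
            if (action.equals("accept")) return true;
            int n = Integer.parseInt(action.substring(1));
            if (action.charAt(0) == 's') {                    // shift: push(C), push(S')
                stack.push(input.get(pos++));
                stack.push(n);
            } else {                                          // reduce X -> α: pop 2*|α| entries
                for (int i = 0; i < 2 * RHS_LEN[n]; i++) stack.pop();
                int exposed = (Integer) stack.peek();         // S' = top()
                stack.push(LHS[n]);                           // push(X)
                stack.push(GOTO.get(exposed).get(LHS[n]));    // push(Table[S', X])
            }
        }
    }
}
```

Running it on the tokens of ((a),b) goes through the same sequence of shifts and reductions as the trace above and ends in the accept entry of state 4.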

LR(k) Grammars

LR(k) = Left-to-right scanning, Right-most derivation, k lookahead symbols

Main cases:
» LR(0), LR(1)
» Some variations: SLR and LALR(1)

Parsers for LR(0) grammars:
» Determine the actions without any lookahead
» Will help us understand shift-reduce parsing

Building LR(0) Parsing Tables

To build the parsing table:
» Define the states of the parser
» Build a DFA to describe transitions between states
» Use the DFA to build the parsing table

Each LR(0) state is a set of LR(0) items:
» An LR(0) item: X -> α . β, where X -> αβ is a production in the grammar
» The LR(0) items keep track of the progress on all of the possible upcoming productions
» The item X -> α . β abstracts the fact that the parser has already matched the string α at the top of the stack

Example LR(0) State

An LR(0) item is a production from the language with a separator “.” somewhere in the RHS of the production.

Sub-string before “.”: already on the stack (beginnings of possible γ’s to be reduced)

Sub-string after “.”: what we might see next

E -> num .
E -> ( . S )

[Figure labels: “state” marks the whole set of items, “item” marks one of its lines]

Class Problem

For the production E -> num | ( S ),

two items are:
E -> num .
E -> ( . S )

Are there any others? If so, what are they? If not, why?

LR(0) Grammar

Nested lists
» S -> ( L ) | id
» L -> S | L , S

Examples
» (a,b,c)
» ((a,b), (c,d), (e,f))
» (a, (b,c,d), ((f,g)))

[Figure: parse tree for (a, (b,c), d)]

Start State and Closure

Start state:
» Augment the grammar with the production S’ -> S $
» The start state of the DFA has an empty stack: S’ -> . S $

Closure of a parser state:
» Start with Closure(S) = S
» Then for each item in S of the form X -> α . Y β, add items for all the productions Y -> γ to the closure of S: Y -> . γ

Closure Example

S -> (L) | id    L -> S | L,S

DFA start state: S’ -> . S $

Its closure:
S’ -> . S $
S -> . (L)
S -> . id

- The set of possible productions to be reduced next
- Added items have the “.” located at the beginning: no symbols for these items are on the stack yet

The Goto Operation

Goto operation: describes transitions between parser states, which are sets of items

Algorithm: for state I and a symbol Y:
» For every item [X -> α . Y β] in I, move the dot over Y, giving [X -> α Y . β]
» Goto(I, Y) = Closure of the set of all such moved items

S’ -> . S $
S -> . (L)
S -> . id

Goto(I, '(') = Closure( { S -> ( . L ) } )
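Closure and Goto can be implemented directly over items represented as (production, dot position) pairs. This Java sketch (it uses a record, so it needs Java 16 or later) builds the canonical collection for the nested-list grammar; it does not follow the $ transition out of S’ -> S . $, matching the accept entry in the parsing table. The method is named goTo because goto is a reserved word in Java.

```java
import java.util.*;

/** LR(0) items, Closure and Goto for S' -> S $, S -> ( L ) | id, L -> S | L , S. */
class Lr0Items {
    // PRODS[i][0] is the left-hand side; the remaining entries are the right-hand side.
    static final String[][] PRODS = {
        {"S'", "S", "$"}, {"S", "(", "L", ")"}, {"S", "id"}, {"L", "S"}, {"L", "L", ",", "S"}
    };

    /** The item "production prod with the dot before RHS position dot". */
    record Item(int prod, int dot) {
        /** Symbol just after the dot, or null if the dot is at the end (a reduce item). */
        String afterDot() {
            String[] p = PRODS[prod];
            return dot + 1 < p.length ? p[dot + 1] : null;
        }
    }

    static boolean isNonTerminal(String s) {
        for (String[] p : PRODS) if (p[0].equals(s)) return true;
        return false;
    }

    /** Closure: for every item X -> α . Y β, add Y -> . γ for each production of Y. */
    static Set<Item> closure(Set<Item> items) {
        Set<Item> result = new LinkedHashSet<>(items);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Item it : new ArrayList<>(result)) {             // copy: we add while looping
                String y = it.afterDot();
                if (y == null || !isNonTerminal(y)) continue;
                for (int i = 0; i < PRODS.length; i++)
                    if (PRODS[i][0].equals(y)) changed |= result.add(new Item(i, 0));
            }
        }
        return result;
    }

    /** Goto(I, X): move the dot over X in every item that allows it, then take the closure. */
    static Set<Item> goTo(Set<Item> items, String x) {
        Set<Item> moved = new LinkedHashSet<>();
        for (Item it : items)
            if (x.equals(it.afterDot())) moved.add(new Item(it.prod(), it.dot() + 1));
        return closure(moved);
    }

    /** All DFA states reachable from the start state; no transition is built on $ (accept instead). */
    static List<Set<Item>> states() {
        List<Set<Item>> states = new ArrayList<>();
        states.add(closure(Set.of(new Item(0, 0))));
        for (int i = 0; i < states.size(); i++) {
            Set<String> symbols = new LinkedHashSet<>();
            for (Item it : states.get(i)) if (it.afterDot() != null) symbols.add(it.afterDot());
            symbols.remove("$");
            for (String x : symbols) {
                Set<Item> next = goTo(states.get(i), x);
                if (!states.contains(next)) states.add(next);
            }
        }
        return states;
    }
}
```

The start state's closure has three items, Goto on ( adds the five-item state containing S -> ( . L), and the whole collection has nine states, one per row of the LR parsing table.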

Class Problem

1. If I = { [E’ -> . E] }, then Closure(I) = ??

2. If I = { [E’ -> E .], [E -> E . + T] }, then Goto(I, +) = ??

E’ -> E
E -> E + T | T
T -> T * F | F
F -> ( E ) | id

Applying Reduce Actions

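[Figure: part of the LR(0) DFA for the grammar S -> (L) | id, L -> S | L,S. From {S’ -> . S $, S -> . (L), S -> . id}, id leads to {S -> id .} and ( leads to {S -> ( . L), L -> . S, L -> . L, S, S -> . (L), S -> . id}; from that state, L leads to {S -> (L . ), L -> L . , S} and S leads to {L -> S .}]

States causing reductions: those where the dot has reached the end!

Pop the RHS off the stack, replace it with the LHS X (X -> γ), then rerun the DFA (e.g., on input (x)).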

Reductions

On reducing X -> γ with stack αγ:
» Pop γ off the stack, revealing the prefix α and its state
» Take a single step in the DFA from that top state
» Push X onto the stack with the new DFA state

Example

Derivation   Stack              Input    Action
((a),b)      1 ( 3 ( 3          a),b)    shift, goto 2
((a),b)      1 ( 3 ( 3 a 2      ),b)     reduce S -> id
((S),b)      1 ( 3 ( 3 S 7      ),b)     reduce L -> S

Full DFA

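[Figure: full LR(0) DFA for the grammar S -> (L) | id, L -> S | L,S, with states
1: S’ -> . S $, S -> . (L), S -> . id
2: S -> id .
3: S -> ( . L), L -> . S, L -> . L, S, S -> . (L), S -> . id
4: S’ -> S . $ (final state)
5: S -> (L . ), L -> L . , S
6: S -> (L) .
7: L -> S .
8: L -> L , . S, S -> . (L), S -> . id
9: L -> L , S .
and transitions: 1 on ( to 3, on id to 2, on S to 4; 3 on ( to 3, on id to 2, on S to 7, on L to 5; 5 on ) to 6, on , to 8; 8 on ( to 3, on id to 2, on S to 9]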

Building the Parsing Table

States in the table = states in the DFA

For a transition from state S to state S’ on terminal C:
» Table[S, C] += Shift(S’)

For a transition from state S to state S’ on non-terminal N:
» Table[S, N] += Goto(S’)

If S is a reduction state for X → β, then:
» Table[S, *] += Reduce(X → β)
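The table-building rules above can be sketched by combining closure and goto into a worklist search over DFA states. This Python sketch is an illustration under assumptions: the helper names are hypothetical, states are numbered from 0 in discovery order rather than as on the slides, the item S’ → S . $ is treated as accept on $ instead of shifting $, and every cell keeps a list of actions so that a cell holding two actions exposes a conflict.

```python
from collections import deque

# Slide grammar:  S' -> S $   S -> ( L ) | id   L -> S | L , S
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("(", "L", ")")),
    ("S", ("id",)),
    ("L", ("S",)),
    ("L", ("L", ",", "S")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
TERMINALS = {"(", ")", "id", ",", "$"}


def next_symbol(item):
    prod, dot = item
    rhs = GRAMMAR[prod][1]
    return rhs[dot] if dot < len(rhs) else None


def closure(items):
    result, work = set(items), list(items)
    while work:
        y = next_symbol(work.pop())
        for i, (lhs, _) in enumerate(GRAMMAR):
            if lhs == y and (i, 0) not in result:
                result.add((i, 0))
                work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    return closure({(p, d + 1) for p, d in items if next_symbol((p, d)) == symbol})


def build_lr0_table():
    """Return (states, table); table maps (state, symbol) to the list of actions there."""
    start = closure({(0, 0)})
    states, index, work = [start], {start: 0}, deque([start])
    table = {}
    while work:
        items = work.popleft()
        s = index[items]
        for sym in sorted({next_symbol(it) for it in items} - {None}):
            if sym == "$":                        # S' -> S . $ : accept
                table.setdefault((s, sym), []).append(("acc",))
                continue
            target = goto(items, sym)
            if target not in index:               # a new DFA state
                index[target] = len(states)
                states.append(target)
                work.append(target)
            kind = "g" if sym in NONTERMINALS else "s"   # goto on N, shift on C
            table.setdefault((s, sym), []).append((kind, index[target]))
        for prod, dot in items:                   # reduction state: reduce in every column
            if prod != 0 and dot == len(GRAMMAR[prod][1]):
                for t in sorted(TERMINALS):
                    table.setdefault((s, t), []).append(("r", prod))
    return states, table


states, table = build_lr0_table()
conflicts = {key: acts for key, acts in table.items() if len(acts) > 1}
print(len(states), "states;", "conflicts:", conflicts)
```

For this grammar it finds 9 states and no conflicting cells, matching the nine-state DFA.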

Computed LR Parsing Table

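State | (      | )      | id     | ,      | $      | S  | L
1     | s3     |        | s2     |        |        | g4 |
2     | S→id   | S→id   | S→id   | S→id   | S→id   |    |
3     | s3     |        | s2     |        |        | g7 | g5
4     |        |        |        |        | accept |    |
5     |        | s6     |        | s8     |        |    |
6     | S→(L)  | S→(L)  | S→(L)  | S→(L)  | S→(L)  |    |
7     | L→S    | L→S    | L→S    | L→S    | L→S    |    |
8     | s3     |        | s2     |        |        | g9 |
9     | L→L,S  | L→L,S  | L→L,S  | L→L,S  | L→L,S  |    |

Columns ( ) id , $ are input terminals; S and L are non-terminals.

sN = shift and go to state N (blue on the original slide); gN = go to state N; X → β = reduce by that production (red on the original slide)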
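A table-driven shift-reduce driver for this table might look like the following Python sketch. The table entries are copied from the slide; the dictionary layout and the function name are assumptions. It stores only states on the stack (the grammar symbols the slides interleave are implied by the states) and follows the reduction recipe: pop |β| states, then take one goto step on X from the state revealed underneath.

```python
# LR(0) table from the slide (states 1-9) for  S -> ( L ) | id   L -> S | L , S
TERMINALS = ["(", ")", "id", ",", "$"]
PRODUCTIONS = {            # name -> (left-hand side, length of right-hand side)
    "S -> ( L )": ("S", 3),
    "S -> id": ("S", 1),
    "L -> S": ("L", 1),
    "L -> L , S": ("L", 3),
}


def reduce_everywhere(prod):
    """LR(0) reduction states reduce no matter what the next token is."""
    return {t: ("r", prod) for t in TERMINALS}


ACTION = {
    1: {"(": ("s", 3), "id": ("s", 2)},
    2: reduce_everywhere("S -> id"),
    3: {"(": ("s", 3), "id": ("s", 2)},
    4: {"$": ("acc",)},
    5: {")": ("s", 6), ",": ("s", 8)},
    6: reduce_everywhere("S -> ( L )"),
    7: reduce_everywhere("L -> S"),
    8: {"(": ("s", 3), "id": ("s", 2)},
    9: reduce_everywhere("L -> L , S"),
}
GOTO = {1: {"S": 4}, 3: {"S": 7, "L": 5}, 8: {"S": 9}}


def parse(tokens):
    """Run the shift-reduce driver; return the reductions in the order they happen."""
    tokens = list(tokens) + ["$"]
    stack = [1]                  # states only; the grammar symbols are implied by them
    pos, reductions = 0, []
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        if act[0] == "acc":
            return reductions
        if act[0] == "s":        # shift: push the new state, consume the token
            stack.append(act[1])
            pos += 1
        else:                    # reduce X -> beta: pop |beta| states, then one goto step on X
            lhs, length = PRODUCTIONS[act[1]]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(act[1])


# ((a),b) with a and b scanned as id tokens
print(parse(["(", "(", "id", ")", ",", "id", ")"]))
```

On ((a),b) it records S → id, L → S, S → (L), L → S, S → id, L → L , S, S → (L): a rightmost derivation in reverse, and its first two steps match the trace in the Reductions example.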

LR(0) Summary

LR(0) parsing recipe:
» Start with an LR(0) grammar
» Compute the LR(0) states and build the DFA:
  Use the closure operation to compute states
  Use the goto operation to compute transitions
» Build the LR(0) parsing table from the DFA

This can be done automatically

Class Problem

Generate the DFA for the following grammar:

S → E + S | E
E → num

LR(0) Limitations

An LR(0) machine only works if each state with a reduce action has a single reduce action and no shift action
» It always reduces, regardless of lookahead

With a more complex grammar, the construction gives states with shift/reduce or reduce/reduce conflicts

Need to use lookahead to choose

OK:             L → L , S .
shift/reduce:   L → L , S .    S → S . , L
reduce/reduce:  L → S , L .    L → S .

A Non-LR(0) Grammar

Grammar for addition of numbers
» S → S + E | E
» E → num

The left-associative version is LR(0). The right-associative version is not LR(0), as you saw with the previous class problem:
» S → E + S | E
» E → num

LR(0) Parsing Table

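Grammar: S → E + S | E    E → num

[1] S’ → . S $    S → . E + S    S → . E    E → . num
    on num: 4    on E: 2    on S: 6
[2] S → E . + S    S → E .
    on +: 3
[3] S → E + . S    S → . E + S    S → . E    E → . num
    on num: 4    on E: 2    on S: 5
[4] E → num .
[5] S → E + S .
[6] S’ → S . $
    on $: 7
[7] S’ → S $ .

State | num | +          | $    | E  | S
1     | s4  |            |      | g2 | g6
2     | S→E | s3 / S→E   | S→E  |    |

Shift or reduce in state 2?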
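The conflict in state 2 can be detected mechanically. The Python sketch below (helper names are assumptions) builds the LR(0) states of this grammar with closure and goto, and flags a state as conflicted when, because LR(0) ignores lookahead, a completed item coexists with a shiftable terminal or with another completed item.

```python
# Right-associative sum grammar from the slide, augmented:  S' -> S $   S -> E + S | E   E -> num
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("E", "+", "S")),
    ("S", ("E",)),
    ("E", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def next_symbol(item):
    prod, dot = item
    rhs = GRAMMAR[prod][1]
    return rhs[dot] if dot < len(rhs) else None


def closure(items):
    result, work = set(items), list(items)
    while work:
        y = next_symbol(work.pop())
        for i, (lhs, _) in enumerate(GRAMMAR):
            if lhs == y and (i, 0) not in result:
                result.add((i, 0))
                work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    return closure({(p, d + 1) for p, d in items if next_symbol((p, d)) == symbol})


def lr0_conflicts(items):
    """LR(0) reduces without looking ahead, so a completed item clashes with any shift or other reduce."""
    reduces = [p for p, d in items if next_symbol((p, d)) is None]
    shifts = {next_symbol(it) for it in items} - NONTERMINALS - {None}
    kinds = []
    if reduces and shifts:
        kinds.append("shift/reduce")
    if len(reduces) > 1:
        kinds.append("reduce/reduce")
    return kinds


state1 = closure({(0, 0)})
state2 = goto(state1, "E")        # S -> E . + S   and   S -> E .
print(lr0_conflicts(state2))      # ['shift/reduce']
```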

Solve Conflict With Lookahead

Three popular techniques employ a lookahead of 1 symbol with bottom-up parsing
» SLR – Simple LR
» LALR – LookAhead LR
» LR(1)

Each has a different means of utilizing the lookahead
» Results in different processing capabilities

SLR Parsing

SLR parsing = easy extension of LR(0)
» For each reduction X → β, look at the next symbol C
» Apply the reduction only if C is in FOLLOW(X)

SLR parsing table eliminates some conflicts
» Same as the LR(0) table except for the reduction rows
» Adds reductions X → β only in the columns of symbols in FOLLOW(X)

State | num | +  | $   | E  | S
1     | s4  |    |     | g2 | g6
2     |     | s3 | S→E |    |

Example: FOLLOW(S) = {$}

Grammar: S → E + S | E    E → num
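Computing FOLLOW and filling a reduction only into its FOLLOW columns can be sketched as follows. The Python below is a minimal illustration under stated assumptions: the grammar has no ε-productions (so FIRST of a right-hand side is FIRST of its first symbol), $ marks the end of input, and the helper names are hypothetical. It recovers FOLLOW(S) = {$} and FOLLOW(E) = {+, $}, and shows that state 2 now shifts on + and reduces on $ only.

```python
# Sum grammar from the slide (no epsilon productions):  S -> E + S | E    E -> num
GRAMMAR = [
    ("S", ("E", "+", "S")),
    ("S", ("E",)),
    ("E", ("num",)),
]
START = "S"
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST for each non-terminal; with no epsilon rules, FIRST(rhs) = FIRST(rhs[0])."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(first):
    """FOLLOW for each non-terminal, with $ marking the end of input."""
    follow = {n: set() for n in NONTERMINALS}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):              # something follows sym in this rule
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                             # sym ends the rule: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def slr_row(items, follow):
    """Actions for one state: shift on terminals after the dot, reduce only on FOLLOW(lhs)."""
    row = {}
    for prod, dot in items:
        lhs, rhs = GRAMMAR[prod]
        if dot < len(rhs):
            if rhs[dot] not in NONTERMINALS:
                row.setdefault(rhs[dot], set()).add("shift")
        else:
            for t in follow[lhs]:
                row.setdefault(t, set()).add(f"reduce {lhs} -> {' '.join(rhs)}")
    return row


follow = follow_sets(first_sets())
state2 = {(0, 1), (1, 1)}              # S -> E . + S   and   S -> E .
print({n: sorted(s) for n, s in sorted(follow.items())})   # {'E': ['$', '+'], 'S': ['$']}
print(slr_row(state2, follow))
```

Each cell of the resulting row holds a single action, which is exactly how SLR removes the LR(0) conflict in state 2.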

SLR Parsing Table

Reductions do not fill entire rows as before. Otherwise, same as LR(0)

State | num | +      | $       | E  | S
1     | s4  |        |         | g2 | g6
2     |     | s3     | S→E     |    |
3     | s4  |        |         | g2 | g5
4     |     | E→num  | E→num   |    |
5     |     |        | S→E+S   |    |
6     |     |        | s7      |    |
7     | accept

Grammar: S → E + S | E    E → num

Class Problem

Consider:

S → L = R
S → R
L → *R
L → ident
R → L

Think of L as an l-value, R as an r-value, and * as a pointer dereference

When you create the states in the SLR(1) DFA, 2 of the states are the following:

{ S → L . = R    R → L . }        { S → R . }

Do you have any shift/reduce conflicts? (Not as easy as it looks)

LR(1) Parsing

Get as much as possible out of 1 lookahead symbol in the parsing table

LR(1) grammar = recognizable by a shift/reduce parser with 1 lookahead symbol

LR(1) parsing uses similar concepts as LR(0)
» Parser states = sets of items
» LR(1) item = LR(0) item + lookahead symbol possibly following the production

LR(0) item: S → . S + E
LR(1) item: S → . S + E , +

Lookahead only has an impact upon REDUCE operations: apply a reduction only when its lookahead = the next input symbol

LR(1) States

LR(1) state = set of LR(1) items

LR(1) item = (X → α . β , y)
» Meaning: α already matched at the top of the stack; next expect to see β y

Shorthand notation
» (X → α . β , {x1, ..., xn})
» means: (X → α . β , x1), . . . , (X → α . β , xn)

Need to extend the closure and goto operations

S → S . + E , +,$        S → S + . E , num

LR(1) Closure

LR(1) closure operation:
» Start with Closure(S) = S
» For each item in S: [X → α . Y β , z]
and for each production Y → γ, add the following item to the closure of S: [Y → . γ , FIRST(β z)]
» Repeat until nothing changes

Similar to LR(0) closure, but also keeps track of the lookahead symbol
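The LR(1) closure rule can be sketched in Python. Items are encoded as (production index, dot position, lookahead), an assumption for this example, and the grammar is augmented as S’ → S with lookahead $, matching the start item (S’ → . S , $) used on these slides. Because this grammar has no ε-productions, FIRST(β z) is FIRST of the first symbol of β when β is non-empty and {z} otherwise; a general version would need nullable-aware FIRST sets.

```python
# LR(1) closure for the sum grammar, augmented as S' -> S with end marker $:
#   S' -> S      S -> E + S | E      E -> num
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("E", "+", "S")),
    ("S", ("E",)),
    ("E", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

# An LR(1) item is (production index, dot position, lookahead terminal).


def first_of(symbol):
    """FIRST of one symbol; assumes no epsilon rules and no left recursion (true here)."""
    if symbol not in NONTERMINALS:
        return {symbol}
    result = set()
    for lhs, rhs in GRAMMAR:
        if lhs == symbol:
            result |= first_of(rhs[0])
    return result


def closure1(items):
    """For [X -> alpha . Y beta, z] add [Y -> . gamma, b] for every b in FIRST(beta z)."""
    result, work = set(items), list(items)
    while work:
        prod, dot, z = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            beta = rhs[dot + 1:]
            # No epsilon rules, so FIRST(beta z) is FIRST(beta[0]) if beta is non-empty, else {z}.
            lookaheads = first_of(beta[0]) if beta else {z}
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot]:
                    for b in lookaheads:
                        if (i, 0, b) not in result:
                            result.add((i, 0, b))
                            work.append((i, 0, b))
    return frozenset(result)


start = closure1({(0, 0, "$")})
for prod, dot, la in sorted(start):
    lhs, rhs = GRAMMAR[prod]
    print(lhs, "->", " ".join(rhs[:dot] + (".",) + rhs[dot:]), ",", la)
```

Applied to (S’ → . S , $), it yields S’ → . S , $; S → . E + S , $; S → . E , $; E → . num , + and E → . num , $: the LR(1) start state of the sum grammar.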

LR(1) Start State

Initial state: start with (S’ → . S , $), then apply the closure operation

Example: sum grammar

S’ → S $
S → E + S | E
E → num

Closure({ S’ → . S , $ }) = { S’ → . S , $    S → . E + S , $    S → . E , $    E → . num , +,$ }

LR(1) Goto Operation

LR(1) goto operation = describes transitions between LR(1) states

Algorithm: for a state I and a symbol Y (as before)
» If the item [X → α . Y β , z] is in I, then
» Goto(I, Y) = Closure( [X → α Y . β , z] )

Example: S1 = { S → E . + S , $    S → E . , $ }

Goto(S1, ‘+’) = S2 = Closure({ S → E + . S , $ })

Grammar: S’ → S $    S → E + S | E    E → num

Class Problem

1. Compute: Closure(I = {S → E + . S , $})
2. Compute: Goto(I, num)
3. Compute: Goto(I, E)

Grammar: S’ → S $    S → E + S | E    E → num

LR(1) DFA Construction

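(1) S’ → . S , $    S → . E + S , $    S → . E , $    E → . num , +,$
    on S: (4)    on num: (5)    on E: (2)
(2) S → E . + S , $    S → E . , $
    on +: (3)
(3) S → E + . S , $    S → . E + S , $    S → . E , $    E → . num , +,$
    on S: (6)    on E: (2)    on num: (5)
(4) S’ → S . , $
(5) E → num . , +,$
(6) S → E + S . , $

Grammar: S’ → S $    S → E + S | E    E → num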

LR(1) Reductions

Same LR(1) DFA as before (grammar S’ → S $, S → E + S | E, E → num); its reduction items are:

E → num . , +,$       reduce E → num on + or $
S → E . , $           reduce S → E on $ only (its state also holds S → E . + S , $, which shifts on +)
S → E + S . , $       reduce S → E + S on $

• Reductions correspond to LR(1) items of the form (X → γ . , y)

LR(1) Parsing Table Construction

Same as the construction of LR(0), except for reductions

For a transition from S to S’ on terminal x:
» Table[S, x] += Shift(S’)

For a transition from S to S’ on non-terminal N:
» Table[S, N] += Goto(S’)

If I contains {(X → γ . , y)} then:
» Table[I, y] += Reduce(X → γ)
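The whole construction (LR(1) closure, goto, the DFA worklist and the three table rules above) fits in a short Python sketch. Names and encodings are assumptions; the grammar is augmented as S’ → S, S’ → S . , $ is treated as accept, and states are numbered from 0 in discovery order. Reductions are entered only in the column of each item's own lookahead, so a cell with more than one action is a conflict.

```python
from collections import deque

# Sum grammar, augmented as S' -> S with end marker $:  S -> E + S | E   E -> num
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("E", "+", "S")),
    ("S", ("E",)),
    ("E", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbol):
    """FIRST of one symbol; assumes no epsilon rules and no left recursion (true here)."""
    if symbol not in NONTERMINALS:
        return {symbol}
    return set().union(*(first_of(rhs[0]) for lhs, rhs in GRAMMAR if lhs == symbol))


def closure1(items):
    result, work = set(items), list(items)
    while work:
        prod, dot, z = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            beta = rhs[dot + 1:]
            lookaheads = first_of(beta[0]) if beta else {z}     # FIRST(beta z)
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot]:
                    for b in lookaheads:
                        if (i, 0, b) not in result:
                            result.add((i, 0, b))
                            work.append((i, 0, b))
    return frozenset(result)


def goto1(items, symbol):
    """Advance the dot over `symbol`, keeping each item's lookahead, then close."""
    moved = {(p, d + 1, z) for p, d, z in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure1(moved)


def build_lr1_table():
    start = closure1({(0, 0, "$")})
    states, index, work = [start], {start: 0}, deque([start])
    table = {}                                   # (state, symbol) -> set of actions
    while work:
        items = work.popleft()
        s = index[items]
        symbols = {GRAMMAR[p][1][d] for p, d, _ in items if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            target = goto1(items, sym)
            if target not in index:
                index[target] = len(states)
                states.append(target)
                work.append(target)
            kind = "g" if sym in NONTERMINALS else "s"
            table.setdefault((s, sym), set()).add((kind, index[target]))
        for p, d, z in items:
            if d == len(GRAMMAR[p][1]):          # (X -> gamma . , z): reduce only on z
                table.setdefault((s, z), set()).add(("acc",) if p == 0 else ("r", p))
    return states, table


states, table = build_lr1_table()
conflicts = {key: acts for key, acts in table.items() if len(acts) > 1}
print(len(states), "states;", "conflicts:", conflicts)
```

For the sum grammar it builds 6 states and no conflicts: the state holding S → E . + S , $ and S → E . , $ shifts on + and reduces on $, which is exactly the fragment shown in the LR(1) parsing table example.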

LR(1) Parsing Table Example

1: S’ → . S , $    S → . E + S , $    S → . E , $    E → . num , +,$    (on E: 2)
2: S → E . + S , $    S → E . , $    (on +: 3)
3: S → E + . S , $    S → . E + S , $    S → . E , $    E → . num , +,$

Grammar: S’ → S $    S → E + S | E    E → num

Fragment of the parsing table:

State | +  | $   | E
1     |    |     | g2
2     | s3 | S→E |

Class Problem

Compute the LR(1) DFA for the following grammar:

E → E + T | T
T → T F | F
F → F * | a | b

LALR(1) Grammars

Problem with LR(1): too many states

LALR(1) parsing (aka LookAhead LR)
» Constructs the LR(1) DFA and then merges any 2 LR(1) states whose items are identical except for the lookahead
» Results in smaller parser tables
» Theoretically less powerful than LR(1)

LALR(1) grammar = a grammar whose LALR(1) parsing table has no conflicts

{ S → id . , +    S → E . , $ }  +  { S → id . , $    S → E . , + }  =  ??
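The merge step can be sketched as grouping LR(1) states by their core (the items with lookaheads dropped) and taking the union of each group. In this Python sketch the item encoding and the function names are assumptions, and the two states are the ones transcribed on this slide. Each state is conflict-free on its own, but the merged state wants to reduce by both productions on + and on $: merging can introduce reduce/reduce conflicts (though never a shift/reduce conflict that the LR(1) automaton did not already have).

```python
from collections import defaultdict

# An LR(1) item here is (lhs, rhs, dot, lookahead).


def core(state):
    """The LR(0) part of a state: its items with the lookaheads dropped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_by_core(states):
    """LALR(1): union every group of LR(1) states that share the same core."""
    merged = defaultdict(set)
    for state in states:
        merged[core(state)] |= set(state)
    return list(merged.values())


def reduce_reduce_conflicts(state):
    """Lookaheads on which two or more completed items want to reduce."""
    wanted = defaultdict(set)
    for lhs, rhs, dot, la in state:
        if dot == len(rhs):
            wanted[la].add((lhs, rhs))
    return {la: prods for la, prods in wanted.items() if len(prods) > 1}


# The two LR(1) states from the slide (items as transcribed); each is conflict-free.
A = {("S", ("id",), 1, "+"), ("S", ("E",), 1, "$")}
B = {("S", ("id",), 1, "$"), ("S", ("E",), 1, "+")}

(merged,) = merge_by_core([A, B])
print(sorted(reduce_reduce_conflicts(merged)))   # ['$', '+']
```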

LALR Parsers

LALR(1)
» Generally the same number of states as SLR (much less than LR(1))
» But with almost the lookahead capability of LR(1) (much better than SLR)
» Example: Pascal programming language

In SLR, several hundred states
In LR(1), several thousand states

Automate the Parsing Process

Can automate:
» The construction of LR parsing tables
» The construction of shift-reduce parsers based on these parsing tables

LALR(1) parser generators
» yacc, bison

» Not much difference compared to LR(1) in practice

» Smaller parsing tables than LR(1)

» Augment the LALR(1) grammar specification with declarations of precedence and associativity

» Output: LALR(1) parser program

Associativity

S → S + E | E    E → num        (unambiguous, left-associative)

E → E + E    E → num            (ambiguous)

What happens if we run this grammar through LALR construction?

E → E + E    E → num

E → E + E . , +    E → E . + E , +,$
On +: shift/reduce conflict

1 + 2 + 3
shift: 1 + (2 + 3)
reduce: (1 + 2) + 3

Associativity (2)

If an operator is left associative
» Assign a slightly higher value to its precedence if it is on the parse stack than if it is in the input stream
» Since stack precedence is higher, reduce will take priority (which is correct for left associative)

If an operator is right associative
» Assign a slightly higher value if it is in the input stream
» Since input stream precedence is higher, shift will take priority (which is correct for right associative)

Precedence

E → E + E | T    T → T x T | num | (E)

E → E + E | E x E | num | (E)

What happens if we run this grammar through LALR construction?

Shift/reduce conflict results:
E → E . + E , ...    E → E x E . , +
E → E + E . , x      E → E . x E , ...

Precedence: attach precedence indicators to terminals. A shift/reduce conflict is resolved by:

1. If the precedence of the input token is greater than that of the last terminal on the parse stack, favor shift over reduce
2. If the precedence of the input token is less than or equal to that of the last terminal on the parse stack, favor reduce over shift
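The two resolution rules, together with the associativity tie-break described under Associativity (2), can be sketched as a small function. The precedence table is hypothetical, in the spirit of yacc/bison %left and %right declarations, and the right-associative ^ operator is an illustrative addition that does not appear on the slides. Here stack_op stands for the last terminal of the handle on the parse stack and input_op for the lookahead token.

```python
# Hypothetical precedence table in the style of yacc/bison %left / %right declarations:
# a larger number binds tighter. "^" is an illustrative right-associative operator
# that does not appear on the slides.
PRECEDENCE = {"+": (1, "left"), "x": (2, "left"), "^": (3, "right")}


def resolve(stack_op, input_op):
    """Choose shift or reduce when E -> E stack_op E . meets lookahead input_op."""
    stack_prec, assoc = PRECEDENCE[stack_op]
    input_prec, _ = PRECEDENCE[input_op]
    if input_prec > stack_prec:
        return "shift"                  # rule 1: the incoming operator binds tighter
    if input_prec < stack_prec:
        return "reduce"                 # rule 2: the operator on the stack binds tighter
    # Equal precedence: associativity breaks the tie.
    return "reduce" if assoc == "left" else "shift"


print(resolve("+", "x"))   # shift:  1 + 2 x 3 parses as 1 + (2 x 3)
print(resolve("x", "+"))   # reduce: 1 x 2 + 3 parses as (1 x 2) + 3
print(resolve("+", "+"))   # reduce: 1 + 2 + 3 parses as (1 + 2) + 3
print(resolve("^", "^"))   # shift:  1 ^ 2 ^ 3 parses as 1 ^ (2 ^ 3)
```

For left-associative operators the equal-precedence case reduces, which is rule 2 above; right-associative operators shift instead, matching the "slightly higher in the input stream" trick.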

Abstract Syntax Tree (AST) - Review

Derivation = sequence of applied productions
» S → E+S → 1+S → 1+E → 1+2

Parse tree = graph representation of a derivation
» Doesn’t capture the order of applying the productions

AST discards unnecessary information from the parse tree

[Figure: parse tree and AST for (1 + 2 + (3 + 4)) + 5 under S → E + S | E, E → num | (S); the AST keeps only the + nodes and the numbers: Add(Add(1, Add(2, Add(3, 4))), 5)]

Implicit AST Construction

LL/LR parsing techniques implicitly build the AST

The parse tree is captured in the derivation
» LL parsing: AST represented by the applied productions
» LR parsing: AST represented by the applied reductions

We want to explicitly construct the AST during the parsing phase

AST Construction - LL

void parse_S() {
    switch (token) {
        case num:
        case '(':
            parse_E();
            parse_S_prime();   /* S' */
            return;
        default:
            ParseError();
    }
}

Expr parse_S() {
    switch (token) {
        case num:
        case '(': {
            Expr left = parse_E();
            Expr right = parse_S_prime();   /* NULL when S' → ε was applied */
            if (right == NULL) return left;
            else return new Add(left, right);
        }
        default:
            ParseError();
    }
}

LL parsing: extend the procedures for non-terminals

S → E S’
S’ → ε | + S
E → num | (S)
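A runnable version of these AST-building procedures can be sketched in Python. The Num and Add node classes, the token encoding (ints for num, strings for punctuation) and the method names are assumptions; parse_S_prime returns None when S’ → ε is chosen, mirroring the NULL check in parse_S.

```python
from dataclasses import dataclass


@dataclass
class Num:
    val: int


@dataclass
class Add:
    left: object
    right: object


class Parser:
    """Recursive descent for  S -> E S'   S' -> + S | epsilon   E -> num | ( S )."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]      # ints are num tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_S(self):                          # S -> E S'
        left = self.parse_E()
        right = self.parse_S_prime()
        return left if right is None else Add(left, right)

    def parse_S_prime(self):                    # S' -> + S | epsilon
        if self.peek() == "+":
            self.pos += 1
            return self.parse_S()
        return None                             # epsilon: no right operand

    def parse_E(self):                          # E -> num | ( S )
        tok = self.peek()
        if isinstance(tok, int):
            self.pos += 1
            return Num(tok)
        if tok == "(":
            self.pos += 1
            inner = self.parse_S()
            self.expect(")")
            return inner
        raise SyntaxError(f"unexpected {tok!r}")


def parse(tokens):
    p = Parser(tokens)
    tree = p.parse_S()
    p.expect("$")                               # all input must be consumed
    return tree


print(parse([1, "+", 2, "+", 3]))
```

Because S’ → + S is right-recursive, 1 + 2 + 3 becomes Add(Num(1), Add(Num(2), Num(3))); parentheses override the grouping.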

AST Construction - LR

We again need to add code for explicit AST construction

AST construction mechanism
» Store parts of the tree on the stack
» For each nonterminal symbol X on the stack, also store the sub-tree rooted at X on the stack
» Whenever the parser performs a reduce operation for a production X → γ, create an AST node for X

AST Construction for LR - Example

Grammar: S → E + S | E    E → num | (S)

Input string: “1 + 2 + 3”

Before reduction S → E + S, the top of the stack holds:
. . .   E: Num(1)   +   S: Add(Num(2), Num(3))

After reduction S → E + S, these three entries are replaced by one:
. . .   S: Add(Num(1), Add(Num(2), Num(3)))
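The stack-of-subtrees mechanism can be sketched by pairing each LR state with a semantic value. This Python sketch reuses the SLR table given earlier for S → E + S | E, E → num (states 1 to 7); each production's action receives the popped values, in the role of $1..$n, and returns the new node, in the role of $$. The node classes, the table encoding and the choice to accept directly on $ in state 6 are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Num:
    val: int


@dataclass
class Add:
    left: object
    right: object


# SLR table from the slides for  S -> E + S | E   E -> num  (states 1-7).
# Each production carries (lhs, rhs length, action); the action receives the values
# popped from the stack, playing the role of $1..$n, and returns $$.
PRODS = {
    "S -> E + S": ("S", 3, lambda v: Add(v[0], v[2])),   # $$ = new Add($1, $3)
    "S -> E": ("S", 1, lambda v: v[0]),                  # $$ = $1
    "E -> num": ("E", 1, lambda v: Num(v[0])),           # $$ = new Num($1)
}
ACTION = {
    1: {"num": ("s", 4)},
    2: {"+": ("s", 3), "$": ("r", "S -> E")},
    3: {"num": ("s", 4)},
    4: {"+": ("r", "E -> num"), "$": ("r", "E -> num")},
    5: {"$": ("r", "S -> E + S")},
    6: {"$": ("acc",)},          # the slide shifts $ into state 7, which accepts
}
GOTO = {1: {"E": 2, "S": 6}, 3: {"E": 2, "S": 5}}


def parse(tokens):
    """tokens: ints (num) and '+'. Each stack entry is (state, semantic value)."""
    kinds = ["num" if isinstance(t, int) else t for t in tokens] + ["$"]
    values = list(tokens) + [None]
    stack, pos = [(1, None)], 0
    while True:
        act = ACTION[stack[-1][0]].get(kinds[pos])
        if act is None:
            raise SyntaxError(f"unexpected {kinds[pos]!r}")
        if act[0] == "acc":
            return stack[-1][1]                    # value attached to S
        if act[0] == "s":
            stack.append((act[1], values[pos]))
            pos += 1
        else:
            lhs, n, action = PRODS[act[1]]
            popped = [value for _, value in stack[-n:]]
            del stack[-n:]
            stack.append((GOTO[stack[-1][0]][lhs], action(popped)))   # build the node on reduce


print(parse([1, "+", 2, "+", 3]))
```

For “1 + 2 + 3” the last reduction S → E + S combines E: Num(1) with S: Add(Num(2), Num(3)) and returns Add(Num(1), Add(Num(2), Num(3))), the tree in the example above.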

Problems

Unstructured code: mixing parsing code with AST construction code

Automatic parser generators
» The generated parser needs to contain AST construction code
» How to construct a customized AST data structure using an automatic parser generator?

May want to perform other actions concurrently with the parsing phase
» E.g., semantic checks
» This can reduce the number of compiler passes

Syntax-Directed Definition

Solution: Syntax-directed definition
» Extends each grammar production with an associated semantic action (code): S → E + S {action}
» The parser generator adds these actions into the generated parser
» Each action is executed when the corresponding production is reduced

Semantic Actions

Actions = C code (for bison/yacc)

The actions access the parser stack
» Parser generators extend the stack of symbols with entries for user-defined structures (e.g., parse trees)

The action code should be able to refer to the grammar symbols in the productions
» Need to refer to multiple occurrences of the same non-terminal symbol and distinguish RHS vs LHS occurrences: E → E + E
» Use dollar variables in yacc/bison ($$, $1, $2, etc.): expr : expr PLUS expr { $$ = $1 + $3; } ;

Building the AST

Use semantic actions to build the AST
The AST is built bottom-up along with parsing

expr : NUM              { $$ = new Num($1.val); }
     | expr PLUS expr   { $$ = new Add($1, $3); }
     | expr MULT expr   { $$ = new Mul($1, $3); }
     | LPAR expr RPAR   { $$ = $2; }
     ;

Recall: user-defined type for objects on the stack (%union)

Outline

DFA & NFA
Regular expression
Regular languages
Context free languages & PDA
Scanner
Parser
» Top-down parsing
» Bottom-up parsing
» Comparison

LL/LR Grammar Summary

LL parsing tables
» Non-terminals × terminals → productions
» Computed using FIRST/FOLLOW

LR parsing tables
» LR states × terminals → {shift/reduce}
» LR states × non-terminals → goto
» Computed using closure/goto operations on LR states

A grammar is:
» LL(1) if its LL(1) parsing table has no conflicts
» same for LR(0), SLR, LALR(1), LR(1)

Top-Down Parsing

S → S+E → E+E → (S)+E → (S+E)+E → (S+E+E)+E → (E+E+E)+E → (1+E+E)+E → (1+2+E)+E → ...

Grammar: S → S + E | E    E → num | (S)

In a left-most derivation, the entire tree above a token (2) has been expanded when that token is encountered

[Figure: parse tree of (1 + 2 + (3 + 4)) + 5 under this grammar]

Top-Down vs Bottom-Up

[Figure: scanned vs. unscanned input in top-down and bottom-up parsing]

Bottom-up: don’t need to figure out as much of the parse tree for a given amount of input, so there is more time to decide what rules to apply

Terminology: LL vs LR

LL(k)
» Left-to-right scan of input
» Left-most derivation
» k symbol lookahead
» [Top-down or predictive] parsing or LL parser
» Performs pre-order traversal of parse tree

LR(k)
» Left-to-right scan of input
» Right-most derivation
» k symbol lookahead
» [Bottom-up or shift-reduce] parsing or LR parser
» Performs post-order traversal of parse tree

Classification of Grammars

[Figure (not to scale): nested grammar classes LR(0), SLR, LALR(1), LR(1), with LL(1) also shown]

LR(k) ⊆ LR(k+1)    LL(k) ⊆ LL(k+1)

LL(k) ⊆ LR(k)    LR(0) ⊆ SLR ⊆ LALR(1) ⊆ LR(1)

Bottom-Up Parsing

(1+2+(3+4))+5 ← (E+2+(3+4))+5 ← (S+2+(3+4))+5 ← (S+E+(3+4))+5 ← ...

Grammar: S → S + E | E    E → num | (S)

Advantage of bottom-up parsing: can postpone the selection of productions until more of the input is scanned

[Figure: parse tree of (1 + 2 + (3 + 4)) + 5 under this grammar]