lecture 4 cs 473: compiler design · 2020-01-29 · converting the table to code • define n...

30
CS 473: COMPILER DESIGN Lecture 4 1

Upload: others

Post on 01-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

CS 473: COMPILER DESIGNLecture 4

1

Page 2: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

2

Page 3: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

LL & LR PARSING

3

searching for derivations

Page 4: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Today: Parsing

4

Source Code(Character stream)if (b == 0) { a = 1; }

Backend

Assembly Codel1:

cmpq %eax, $0

jeq l2

jmp l3

l2:

Lexical Analysis

Analysis & Transformation

Abstract Syntax Tree:

Parsing

If

Eq

b 0 a 1

NoneAssn

Token stream:

if ( b == 0 ) { a = 0 ; }

Page 5: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Context Free Grammars: Summary

• Context-free grammars allow concise specifications of programming languages.– Can apply rules to convert token stream to parse tree/AST

– Can remove some ambiguity by encoding precedence and associativity in the grammar

• Next up: finding a derivation automatically (parsing!)

5

Page 6: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Automata for CFGs?

• Regular expressions can be recognized by (nondet.) finite automata

• CFGs can be recognized by nondet. pushdown automata

• Problem: pushdown automata can’t always be determinized!

• If we want computers to recognize CFGs automatically, we need to take another approach.

• First, let’s try making things simpler.

6

Page 7: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Consider finding leftmost derivations

• Look at only one input symbol at a time.

• Goal: derive (1 + 2 + (3 + 4)) + 5

Partly-derived String Look-ahead Parsed/Unparsed InputS ( (1 + 2 + (3 + 4)) + 5

⟼ E + S ( (1 + 2 + (3 + 4)) + 5⟼ (S) + S 1 (1 + 2 + (3 + 4)) + 5

⟼ (E + S) + S 1 (1 + 2 + (3 + 4)) + 5⟼ (1 + S) + S 2 (1 + 2 + (3 + 4)) + 5

⟼ (1 + E + S) + S 2 (1 + 2 + (3 + 4)) + 5⟼ (1 + 2 + S) + S ( (1 + 2 + (3 + 4)) + 5⟼ (1 + 2 + E) + S ( (1 + 2 + (3 + 4)) + 5⟼ (1 + 2 + (S)) + S 3 (1 + 2 + (3 + 4)) + 5⟼ (1 + 2 + (E + S)) + S 3 (1 + 2 + (3 + 4)) + 5⟼ …

7

S⟼ E + S | EE⟼ number | ( S )

Page 8: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Consider finding leftmost derivations

• Look at only one input symbol at a time.

• Goal: derive (1 + 2 + (3 + 4)) + 5

Partly-derived String Look-ahead Parsed/Unparsed InputS ( (1 + 2 + (3 + 4)) + 5

⟼ E + S ( (1 + 2 + (3 + 4)) + 5⟼ (S) + S 1 (1 + 2 + (3 + 4)) + 5

⟼ (E + S) + S 1 (1 + 2 + (3 + 4)) + 5⟼ (1 + S) + S 2 (1 + 2 + (3 + 4)) + 5

⟼ (1 + E + S) + S 2 (1 + 2 + (3 + 4)) + 5⟼ (1 + 2 + S) + S ( (1 + 2 + (3 + 4)) + 5⟼ (1 + 2 + E) + S ( (1 + 2 + (3 + 4)) + 5⟼ (1 + 2 + (S)) + S 3 (1 + 2 + (3 + 4)) + 5⟼ (1 + 2 + (E + S)) + S 3 (1 + 2 + (3 + 4)) + 5⟼ …

8

S⟼ E + S | EE⟼ number | ( S )

Page 9: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

There is a problem

• We want to decide which productionto apply based on the look-ahead symbol.

• But there is a choice:

(1) S⟼ E⟼ (S) ⟼ (E) ⟼ (1)

vs.(1) + 2 S⟼ E + S⟼ (S) + S⟼ (E) + S⟼ (1) + S⟼ (1) + E⟼ (1) + 2

• Given the look-ahead symbol ‘(’ it isn’t clear whether to pick S⟼ E or S⟼ E + S first.

9

S⟼ E + S | EE⟼ number | ( S )

Page 10: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

LL(1) GRAMMARS

10

Page 11: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Grammar is the problem

• Not all grammars can be parsed “top-down” with only a single lookaheadsymbol.

• Top-down: starting from the start symbol (root of the parse tree) and going down

• LL(1) means

– Left-to-right scanning

– Leftmost derivation

– 1 lookahead symbol

• This language isn’t “LL(1)”

• Is it LL(k) for some k?

• What can we do?

11

S⟼ E + S | EE⟼ number | ( S )

Page 12: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Making a grammar LL(1)

• Problem: We can’t decide which S production to apply until we see the symbol after the first E.

• Solution: Put off the choice by “left-factoring” the grammar. Turn the decision point into a new non-terminal S’:

• Also need to eliminate left-recursion somehow. Why?

• Consider:

12

S⟼ E + S | EE⟼ number | ( S )

S⟼ S + E | EE⟼ number | ( S )

S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 13: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

LL(1) Parse of the input string

• Look at only one input symbol at a time.

Partly-derived String Look-ahead Parsed/Unparsed InputS ( (1 + 2 + (3 + 4)) + 5

⟼ E S’ ( (1 + 2 + (3 + 4)) + 5⟼ (S) S’ 1 (1 + 2 + (3 + 4)) + 5⟼ (E S’) S’ 1 (1 + 2 + (3 + 4)) + 5⟼ (1 S’) S’ + (1 + 2 + (3 + 4)) + 5

⟼ (1 + S) S’ 2 (1 + 2 + (3 + 4)) + 5⟼ (1 + E S’) S’ 2 (1 + 2 + (3 + 4)) + 5⟼ (1 + 2 S’) S’ + (1 + 2 + (3 + 4)) + 5

⟼ (1 + 2 + S) S’ ( (1 + 2 + (3 + 4)) + 5⟼ (1 + 2 + E S’) S’ ( (1 + 2 + (3 + 4)) + 5⟼ (1 + 2 + (S)S’) S’ 3 (1 + 2 + (3 + 4)) + 5

13

S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 14: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

LL(1) Parse of the input string

• Look at only one input symbol at a time.

Partly-derived String Look-ahead Parsed/Unparsed InputS ( (1 + 2 + (3 + 4)) + 5

⟼ …

⟼ (1 + 2 + (S)S’) S’ 3 (1 + 2 + (3 + 4)) + 5

⟼ …

⟼ (1 + 2 + (3 + 4)S’) S’ ) (1 + 2 + (3 + 4)) + 5

⟼ (1 + 2 + (3 + 4)) S’ + (1 + 2 + (3 + 4)) + 5

⟼ (1 + 2 + (3 + 4)) + S 5 (1 + 2 + (3 + 4)) + 5

⟼ … ⟼ (1 + 2 + (3 + 4)) + 5 finished!

14

S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 15: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

15

Page 16: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Predictive Parsing

• Given an LL(1) grammar:

– For a given nonterminal, the lookahead symbol uniquely determines the production to apply.

– Top-down/predictive/recursive descent parsing

– Driven by a parsing table: nonterminal and input token → production

• Note: it is convenient to add a special end-of-file token $ and a start symbol T (top-level) that requires $.

16

number + ( ) $ (EOF)

T ⟼ S$ ⟼S$

S ⟼ E S’ ⟼E S’

S’ ⟼ + S ⟼ e ⟼ e

E ⟼ num. ⟼ ( S )

T ⟼ S$S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 17: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

How do we construct the parse table?

• Consider a given production: A→ g

• Construct the set of all input tokens that may appear first in strings that can be derived from g

– Add the production → g to the entry (A,token) for each such token.

• If g can derive e (the empty string), then we construct the set of all input tokens that may follow the nonterminal A in the grammar.

– Add the production → g to the entry (A, token) for each such token.

• Note: if there are two different productions for a given entry, the grammar is not LL(1)

17

Page 18: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Example

• First(S$) = ?

18

number + ( ) $ (EOF)

T

S

S’

E

T ⟼ S$S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 19: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Example

• First(S$) = First(S) = First(ES’)

19

number + ( ) $ (EOF)

T

S

S’

E

T ⟼ S$S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 20: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Example

• First(S$) = First(S) = First(ES’) = First(E)

• First(number) = number

20

number + ( ) $ (EOF)

T

S

S’

E ⟼ num.

T ⟼ S$S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 21: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Example

• First(S$) = First(S) = First(ES’) = First(E)

• First(number) = number

• First( ( S ) ) = (

21

number + ( ) $ (EOF)

T

S

S’

E ⟼ num. ⟼ ( S )

T ⟼ S$S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 22: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Example

• First(S$) = First(S) = First(ES’) = First(E)

• First(number) = number

• First( ( S ) ) = (

22

number + ( ) $ (EOF)

T ⟼ S$ ⟼ S$

S ⟼ ES’ ⟼ ES’

S’

E ⟼ num. ⟼ ( S )

T ⟼ S$S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 23: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Example

• First(S$) = First(S) = First(ES’) = First(E)

• First(number) = number

• First( ( S ) ) = (

• First(e) = ?

• First(+ S) = +

23

number + ( ) $ (EOF)

T ⟼ S$ ⟼ S$

S ⟼ ES’ ⟼ ES’

S’

E ⟼ num. ⟼ ( S )

T ⟼ S$S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 24: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Example

• First(S$) = First(S) = First(ES’) = First(E)

• First(number) = number

• First( ( S ) ) = (

• First(e) = ?

• First(+ S) = +

24

number + ( ) $ (EOF)

T ⟼ S$ ⟼ S$

S ⟼ ES’ ⟼ ES’

S’ ⟼ + S

E ⟼ num. ⟼ ( S )

T ⟼ S$S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 25: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Example

• First(S$) = First(S) = First(ES’) = First(E)

• First(number) = number

• First( ( S ) ) = (

• First(e) = ?

• First(+ S) = +

• Follow(S’) = ?

25

number + ( ) $ (EOF)

T ⟼ S$ ⟼ S$

S ⟼ ES’ ⟼ ES’

S’ ⟼ S

E ⟼ num. ⟼ ( S )

T ⟼ S$S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 26: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Example

• First(S$) = First(S) = First(ES’) = First(E)

• First(number) = number

• First( ( S ) ) = (

• First(e) = ?

• First(+ S) = +

• Follow(S’) = Follow(S)

26

number + ( ) $ (EOF)

T ⟼ S$ ⟼ S$

S ⟼ ES’ ⟼ ES’

S’ ⟼ S

E ⟼ num. ⟼ ( S )

T ⟼ S$S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 27: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Example

• First(S$) = First(S) = First(ES’) = First(E)

• First(number) = number

• First( ( S ) ) = (

• First(e) = ?

• First(+ S) = +

• Follow(S’) = Follow(S)

• Follow(S) = { $, ) }

27

number + ( ) $ (EOF)

T ⟼ S$ ⟼ S$

S ⟼ ES’ ⟼ ES’

S’ ⟼ S ⟼ e ⟼ e

E ⟼ num. ⟼ ( S )

T ⟼ S$S ⟼ ES’S’⟼ e

S’⟼ + SE ⟼ number | ( S )

Page 28: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

Converting the table to code

• Define n mutually recursive functions

– one for each nonterminal A: parse_A

• Each function “peeks” at the lookahead token and then follows the production rule in the corresponding entry.

– Consume terminal tokens from the input stream

– Call parse_X to create sub-tree for nonterminal X

– Build the piece of the AST for the production

28

Page 29: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

DEMO: PARSER.C

29

Hand-generated LL(1) code for the table above.

number + ( ) $ (EOF)

T ⟼ S$ ⟼S$

S ⟼ E S’ ⟼E S’

S’ ⟼ + S ⟼ e ⟼ e

E ⟼ num. ⟼ ( S )

Page 30: Lecture 4 CS 473: COMPILER DESIGN · 2020-01-29 · Converting the table to code • Define n mutually recursive functions –one for each nonterminal A: parse_A • Each function

30