Parsing

Upload: elisabeth-cobb

Post on 22-Dec-2015

Front-End: Parser

Checks the stream of words and their parts of speech for grammatical correctness

[Figure: source code → scanner → tokens → parser → IR, with errors reported]

Determines if the input is syntactically well formed

Guides context-sensitive (“semantic”) analysis (type checking)

Builds the intermediate representation (IR) for the source program
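The scanner-to-parser hand-off in the figure can be sketched in a few lines. The following is a minimal, hypothetical scanner that assumes just three token categories (id, num, op), chosen to match the <id,x> and <num,2> notation used in the derivations; it writes minus as an ASCII hyphen, whereas these slides typeset it as –. A real scanner would cover the language's full token set.

```python
import re

# Hypothetical token categories: identifiers, integer literals, the four
# arithmetic operators, and whitespace (which the scanner discards).
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[-+*/]"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Turn source text into a list of (category, lexeme) tokens for the parser."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(scan("x - 2 * y"))
# [('id', 'x'), ('op', '-'), ('num', '2'), ('op', '*'), ('id', 'y')]
```

The parser never sees characters, only this list of categorized tokens.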

Syntactic Analysis

Natural language analogy: consider the sentence

He wrote the program

Each word gets a part of speech, and the words group into grammatical roles:

He: noun, subject
wrote: verb, predicate
the program: article + noun, object

Subject, predicate, and object together form a sentence.

Programming language analogy:

if ( b <= 0 ) a = b

Here b <= 0 is a boolean expression (bool expr), a = b is an assignment, and together they form an if-statement.
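A parser recovers exactly this grouping from the token stream. The sketch below is a hypothetical hand-written recursive-descent parser for only this one statement shape, if ( id relop operand ) id = operand; the shape, the set of relational operators, and the tuple-based tree are assumptions for illustration, not a grammar given in the lecture.

```python
RELOPS = {"<", "<=", ">", ">=", "==", "!="}


def parse_if(tokens):
    """Parse tokens of the form  if ( id relop operand ) id = operand.

    Returns a nested-tuple parse tree, or raises SyntaxError.
    """
    pos = 0

    def expect(accepts, description):
        # Consume one token if it satisfies `accepts`; otherwise report an error.
        nonlocal pos
        if pos >= len(tokens) or not accepts(tokens[pos]):
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {description}, found {found!r}")
        pos += 1
        return tokens[pos - 1]

    def is_id(tok):
        return tok.isidentifier() and tok != "if"

    def is_operand(tok):
        return is_id(tok) or tok.isdigit()

    expect(lambda t: t == "if", "'if'")
    expect(lambda t: t == "(", "'('")
    left = expect(is_id, "identifier")
    relop = expect(lambda t: t in RELOPS, "relational operator")
    right = expect(is_operand, "operand")
    expect(lambda t: t == ")", "')'")
    target = expect(is_id, "identifier")
    expect(lambda t: t == "=", "'='")
    value = expect(is_operand, "operand")
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("if-statement",
            ("bool expr", left, relop, right),
            ("assignment", target, "=", value))


print(parse_if("if ( b <= 0 ) a = b".split()))
# ('if-statement', ('bool expr', 'b', '<=', '0'), ('assignment', 'a', '=', 'b'))
```

Feeding it fi ( b <= 0 ) a = b instead raises a SyntaxError at the first token, the same kind of error the next slides mark as "not a keyword".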

Syntactic Analysis: Syntax Errors

int* foo(int i, int j)){ for(k=0; i j; ) fi( i > j ) return j;}

Compiler Construction

Sohail Aslam

Lecture 11

The same fragment, one construct per line, with each syntax error marked:

int* foo(int i, int j))      <- extra parenthesis
{
    for(k=0; i j; )          <- missing expression
    fi( i > j )              <- not a keyword
        return j;
}

Semantic Analysis

He wrote the computer

This sentence is grammatically correct: He (noun, subject), wrote (verb, predicate), and the computer (article + noun, object) form a sentence. It is, however, semantically (in meaning) wrong!

Semantic errors in a syntactically valid fragment:

int* foo(int i, int j)
{
    for(k=0; i < j; j++ )
        if( i < j-2 )
            sum = sum+i;
    return sum;
}

Undeclared variables: k and sum are used without being declared.
Return type mismatch: the value returned by return sum does not match foo's declared return type, int*.

Role of the Parser

Not all sequences of tokens are programs. The parser must distinguish between valid and invalid sequences of tokens.

What we need:

An expressive way to describe the syntax
An acceptor mechanism that determines whether the input token stream satisfies the syntax

Study of Parsing

Parsing is the process of discovering a derivation for some sentence.

Mathematical model of syntax: a grammar G
Algorithm for testing membership in L(G), the language of G (the set of sentences derivable from its start symbol)

Context-Free Grammars

A CFG is a four-tuple G = (S, N, T, P), where
S is the start symbol,
N is a set of non-terminals,
T is a set of terminals, and
P is a set of productions.
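One way to make the four-tuple concrete is to store each component explicitly and check that the components fit together. The Grammar class below is a hypothetical sketch (the class, its field names, and the check method are not from the lecture). It encodes the SheepNoise grammar that appears a few slides later, writing each production as a (left-hand side, right-hand side) pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar G = (S, N, T, P)."""
    start: str               # S: the start symbol
    nonterminals: frozenset  # N: the set of non-terminals
    terminals: frozenset     # T: the set of terminals
    productions: tuple       # P: (lhs, rhs) pairs, rhs a tuple of symbols

    def check(self):
        """Return True if the components are consistent, else raise ValueError."""
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        if self.nonterminals & self.terminals:
            raise ValueError("N and T must be disjoint")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a non-terminal")
            for sym in rhs:
                if sym not in symbols:
                    raise ValueError(f"unknown symbol {sym!r} in a production for {lhs!r}")
        return True


# SheepNoise -> SheepNoise baa | baa
sheep = Grammar(
    start="SheepNoise",
    nonterminals=frozenset({"SheepNoise"}),
    terminals=frozenset({"baa"}),
    productions=(("SheepNoise", ("SheepNoise", "baa")), ("SheepNoise", ("baa",))),
)
print(sheep.check())  # True
```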

Why Not Regular Expressions?

Reason: regular languages do not have enough power to express the syntax of programming languages.

Limitations of Regular Languages

A finite automaton cannot remember the number of times it has visited a particular state. It therefore cannot, for example, check that every opening parenthesis has a matching closing one, because that requires counting arbitrarily deep nesting.
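Balanced parentheses are the classic illustration of this limitation. The hypothetical balanced function below keeps an integer counter, depth, that can grow without bound; no fixed set of states can play that role for arbitrarily deep nesting, which is why this check lies outside what a regular expression can express. It is a sketch for illustration, not part of the lecture.

```python
def balanced(text):
    """Return True if the parentheses in text are balanced."""
    depth = 0  # unbounded counter: the memory a finite automaton lacks
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                return False
    return depth == 0


print(balanced("((a)(b))"))                 # True
print(balanced("int* foo(int i, int j))"))  # False: extra parenthesis
```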

Example of CFG

Context-free syntax is specified with a CFG.

Example:

SheepNoise → SheepNoise baa
           | baa

This CFG defines the set of noises sheep make.
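Testing membership in L(SheepNoise) is easy because the language is simply one or more baa words. The recursive function below is a hypothetical sketch, not a general parsing algorithm: it mirrors the two productions directly, accepting a single baa (SheepNoise → baa) and otherwise requiring a final baa preceded by a shorter SheepNoise (SheepNoise → SheepNoise baa).

```python
def is_sheep_noise(words):
    """Return True if the list of words is a sentence of L(SheepNoise)."""
    if not words or words[-1] != "baa":
        return False                    # every SheepNoise ends in baa
    if len(words) == 1:
        return True                     # SheepNoise -> baa
    return is_sheep_noise(words[:-1])   # SheepNoise -> SheepNoise baa


print(is_sheep_noise("baa baa baa".split()))  # True
print(is_sheep_noise("baa moo".split()))      # False
```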

We can use the SheepNoise grammar to create sentences.
We use the productions as rewriting rules.

1  SheepNoise → SheepNoise baa
2             | baa

Rule   Sentential Form
-      SheepNoise
2      baa

Rule   Sentential Form
-      SheepNoise
1      SheepNoise baa
2      baa baa

And so on ...

Rule   Sentential Form
-      SheepNoise
1      SheepNoise baa
1      SheepNoise baa baa
2      baa baa baa
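Each row of these tables rewrites one occurrence of SheepNoise using the numbered rule in the left column. The derive function below is a hypothetical sketch that replays a given rule sequence and records every sentential form; because the grammar has a single non-terminal, it simply rewrites the first SheepNoise it finds.

```python
# Numbered productions of the SheepNoise grammar:
#   1: SheepNoise -> SheepNoise baa
#   2: SheepNoise -> baa
RULES = {
    1: ("SheepNoise", ["SheepNoise", "baa"]),
    2: ("SheepNoise", ["baa"]),
}


def derive(rule_sequence, start="SheepNoise"):
    """Apply numbered rules as rewriting steps; return (rule, form) rows."""
    form = [start]
    rows = [("-", " ".join(form))]
    for rule in rule_sequence:
        lhs, rhs = RULES[rule]
        i = form.index(lhs)  # position of the non-terminal to rewrite
        form = form[:i] + rhs + form[i + 1:]
        rows.append((rule, " ".join(form)))
    return rows


for rule, sentential_form in derive([1, 1, 2]):
    print(rule, sentential_form)
# - SheepNoise
# 1 SheepNoise baa
# 1 SheepNoise baa baa
# 2 baa baa baa
```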

While it is cute, this example quickly runs out of intellectual steam.
To explore uses of CFGs, we need a more complex grammar.

More Useful Grammar

1  expr → expr op expr
2       | num
3       | id
4  op   → +
5       | –
6       | *
7       | /
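This grammar is left-recursive (rule 1 begins with expr) and ambiguous, so a naive top-down parser for it would loop. Its language, however, has a simple shape: an operand (num or id) followed by zero or more op-operand pairs. The hypothetical in_expr_language function below exploits that shape to test membership in L(expr) over token categories, writing minus as an ASCII hyphen; it is a shortcut for this particular grammar, not a general parsing algorithm.

```python
OPERANDS = {"num", "id"}
OPERATORS = {"+", "-", "*", "/"}


def in_expr_language(categories):
    """Return True if a list of token categories is a sentence of L(expr).

    Every sentence has odd length and alternates operand, op, operand, ...
    """
    if len(categories) % 2 == 0:  # also rejects the empty list
        return False
    for position, category in enumerate(categories):
        expected = OPERANDS if position % 2 == 0 else OPERATORS
        if category not in expected:
            return False
    return True


print(in_expr_language(["id", "-", "num", "*", "id"]))  # True
print(in_expr_language(["id", "num"]))                   # False
```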

Backus-Naur Form (BNF)

Grammar rules in a similar form were first used in the description of the Algol60 language.

The notation was developed by John Backus and adapted by Peter Naur for the Algol60 report; hence the term Backus-Naur Form (BNF).

Derivation

Let us use the expression grammar to derive the sentence

x – 2 * y

Derivation: x – 2 * y

Rule   Sentential Form
-      expr
1      expr op expr
3      <id,x> op expr
5      <id,x> – expr
1      <id,x> – expr op expr
2      <id,x> – <num,2> op expr
6      <id,x> – <num,2> * expr
3      <id,x> – <num,2> * <id,y>
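The derivation above always rewrites the leftmost non-terminal, which makes it mechanical enough to replay in code. The hypothetical leftmost_derivation function below takes a sequence of rule numbers from the expression grammar, applies each one to the leftmost non-terminal, and records every sentential form. It works on token categories (id, num) rather than <id,x>-style tokens and writes minus as an ASCII hyphen; both are simplifications for illustration.

```python
# Numbered productions of the expression grammar: number -> (lhs, rhs).
# Minus is written "-" here; the slides typeset it as "–".
RULES = {
    1: ("expr", ["expr", "op", "expr"]),
    2: ("expr", ["num"]),
    3: ("expr", ["id"]),
    4: ("op", ["+"]),
    5: ("op", ["-"]),
    6: ("op", ["*"]),
    7: ("op", ["/"]),
}
NONTERMINALS = {"expr", "op"}


def leftmost_derivation(rule_numbers, start="expr"):
    """Apply each rule to the leftmost non-terminal; return all sentential forms."""
    form = [start]
    forms = [" ".join(form)]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        # Index of the leftmost non-terminal, or None if only terminals remain.
        i = next((k for k, sym in enumerate(form) if sym in NONTERMINALS), None)
        if i is None or form[i] != lhs:
            raise ValueError(f"rule {number} cannot rewrite the leftmost non-terminal")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(" ".join(form))
    return forms


for sentential_form in leftmost_derivation([1, 3, 5, 1, 2, 6, 3]):
    print(sentential_form)
# the last line printed is: id - num * id
```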

Such a process of rewrites is called a derivation.
The process of discovering a derivation is called parsing.

We denote this derivation as:

expr →* id – num * id

where →* reads "derives in zero or more steps".

Derivations

At each step, we choose a non-terminal to replace.
Different choices can lead to different derivations.

Two derivations are of interest:

1. Leftmost derivation
2. Rightmost derivation

Leftmost derivation: replace the leftmost non-terminal (NT) at each step.
Rightmost derivation: replace the rightmost NT at each step.

The example on the preceding slides was a leftmost derivation.
There is also a rightmost derivation.

Rightmost Derivation

Rule   Sentential Form
-      expr
1      expr op expr
3      expr op <id,y>
6      expr * <id,y>
1      expr op expr * <id,y>
2      expr op <num,2> * <id,y>
5      expr – <num,2> * <id,y>
3      <id,x> – <num,2> * <id,y>
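The leftmost and rightmost tables end in the same sentence; they differ only in which non-terminal each step rewrites. The hypothetical derive function below replays a rule sequence with a rightmost flag that selects the leftmost or the rightmost non-terminal, and confirms that the rule sequences from both tables reach id – num * id. It uses token categories instead of <id,x>-style tokens and an ASCII hyphen for minus, as simplifications.

```python
# Numbered productions of the expression grammar: number -> (lhs, rhs).
RULES = {
    1: ("expr", ["expr", "op", "expr"]),
    2: ("expr", ["num"]),
    3: ("expr", ["id"]),
    4: ("op", ["+"]),
    5: ("op", ["-"]),
    6: ("op", ["*"]),
    7: ("op", ["/"]),
}
NONTERMINALS = {"expr", "op"}


def derive(rule_numbers, rightmost=False, start="expr"):
    """Replay a derivation, rewriting the leftmost (default) or rightmost NT."""
    form = [start]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        positions = [k for k, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to rewrite")
        i = positions[-1] if rightmost else positions[0]
        if form[i] != lhs:
            raise ValueError(f"rule {number} rewrites {lhs!r}, not {form[i]!r}")
        form = form[:i] + rhs + form[i + 1:]
    return " ".join(form)


left = derive([1, 3, 5, 1, 2, 6, 3])
right = derive([1, 3, 6, 1, 2, 5, 3], rightmost=True)
print(left)           # id - num * id
print(left == right)  # True
```

The rule order matters only together with the choice of non-terminal: replaying the rightmost sequence with leftmost choices produces a different sentence altogether.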

In both cases we have

expr →* id – num * id

The two derivations produce different parse trees: the leftmost derivation above groups the sentence as x – (2 * y), while the rightmost derivation groups it as (x – 2) * y. This is possible because the grammar is ambiguous.
The parse trees imply different evaluation orders!
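Evaluating both trees on sample values makes the difference visible. The tuple encoding, the evaluate function, and the values x = 5 and y = 3 below are assumptions for illustration; the comment on each tree records which derivation it comes from.

```python
import operator

# Operator symbol -> function; minus is written as an ASCII hyphen.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

# A tree is (op, left, right), an identifier name (str), or a number (int).
leftmost_tree = ("-", "x", ("*", 2, "y"))   # leftmost derivation: x – (2 * y)
rightmost_tree = ("*", ("-", "x", 2), "y")  # rightmost derivation: (x – 2) * y


def evaluate(tree, env):
    """Evaluate a parse tree bottom-up, looking identifiers up in env."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree


env = {"x": 5, "y": 3}
print(evaluate(leftmost_tree, env))   # -1, since 5 - (2 * 3)
print(evaluate(rightmost_tree, env))  # 9, since (5 - 2) * 3
```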
