cs419 lec7 cfg

23
Compilers WELCOME TO A JOURNEY TO CS419 Lecture 7 Parsing tokens using Context Free Grammars Cairo University FCI Dr. Hussien Sharaf Computer Science Department [email protected]

Upload: arab-open-university-and-cairo-university

Post on 20-Nov-2014

311 views

Category:

Education


0 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Cs419 lec7   cfg

Compilers

WELCOME TO A JOURNEY TO

CS419 Lecture 7

Parsing tokens using Context Free Grammars

Cairo UniversityFCI

Dr. Hussien SharafComputer Science [email protected]

Page 2: Cs419 lec7   cfg

2

PART ONE

Dr. Hussien M. Sharaf

Page 3: Cs419 lec7   cfg

PARSING A parser gets a stream of

tokens from the scanner, and determines if the syntax (structure) of the program is correct according to the (context-free) grammar of the source language.

Then, it produces a data structure, called a parse tree or an abstract syntax tree, which describes the syntactic structure of the program.

parser

Stream of tokens

Parse/syntax tree

Dr. Hussien M. Sharaf 3

Page 4: Cs419 lec7   cfg

CFG A context-free grammar is a notation for

defining context free languages. It is more powerful than finite automata or

RE’s, but still cannot define all possible languages.

Useful for nested structures, e.g., parentheses in programming languages.

Basic idea is to use “variables” to stand for sets of strings.

These variables are defined recursively, in terms of one another.

Dr. Hussien M. Sharaf 4

Page 5: Cs419 lec7   cfg

CFG FORMAL DEFINITION C =(V, Σ, R, S) V: is a finite set of variables. Σ: symbols called terminals of the

alphabet of the language being defined.

S V: a special start symbol. R: is a finite set of production rules of

the form A→ where AV, (V Σ)

Dr. Hussien M. Sharaf 5

Page 6: Cs419 lec7   cfg

CFG -1

Define the language { anbn | n > 1}. Terminals = {a, b}. Variables = {S}. Start symbol = S. Productions =

S → ab S → aSb Summary S → ab S → aSb

Dr. Hussien M. Sharaf 6

Page 7: Cs419 lec7   cfg

DERIVATION We derive strings in the language of a CFG by

starting with the start symbol, and repeatedly replacing some variable A by the right side of one of its productions.

Derivation example for “aabb” Using S→ aSb

generates uncompleted string that still has a non- terminal S.

Then using S→ ab to replace the inner S Generates “aabb”

S aSb aabb ……[Successful derivation of aabb]

Dr. Hussien M. Sharaf 7

Page 8: Cs419 lec7   cfg

CFG -1 : BALANCED-PARENTHESES

Prod1S → (S)Prod2S → ()

Derive the string ((())).S → (S) …..[by prod1]

→ ((S)) …..[by prod1]→ ((())) …..[by prod2]

Dr. Hussien M. Sharaf 8

Page 9: Cs419 lec7   cfg

CFG -2 : PALINDROME

Describe palindrome of a’s and b’s using CFG

1] S → aSa 2] S → bSb 3] S → Λ

Derive “baab” from the above grammar. S → bSb [by 2]

→ baSab [by 1]→ ba ab [by 3]

Dr. Hussien M. Sharaf 9

Page 10: Cs419 lec7   cfg

CFG -3 : EVEN-PLAINDROME

i.e. {Λ, ab, abbaabba,… } S → aSa| bSb| Λ Derive abaaba

10

S

S

Λ

aa

S bb

S aa

Dr. Hussien M. Sharaf

Page 11: Cs419 lec7   cfg

CFG – 4

Describe anything (a+b)* using CGF1] S → Λ 2] S → Y 3] Y→ aY4] Y → bY 5] Y →a 6] Y→ b

Derive “aab” from the above grammar.

S → aY [by 3]Y → aaY [by 3]Y → aab [by 6]

Dr. Hussien M. Sharaf 11

Page 12: Cs419 lec7   cfg

CFG – 5

1] S → Λ 2] S → aS 3] S→ bS

Derive “aa” from the above grammar.

S → aS [by 2]→ aaS [by 2]→ aa [by 1]

Dr. Hussien M. Sharaf 12

Page 13: Cs419 lec7   cfg

13

PART TWO

Dr. Hussien M. Sharaf

Page 14: Cs419 lec7   cfg

Parsing CFG grammar is about categorizing the

statements of a language. Parsing using CFG means categorizing a certain

statements into categories defined in the CFG. Parsing can be expressed using a special type

of graph called Trees where no cycles exist. A parse tree is the graph representation of a

derivation. Programmatically; Parse tree can be

represented as a dynamic data structure using a single root node.

14Dr. Hussien M. Sharaf

Page 15: Cs419 lec7   cfg

Parse tree

15

(1)A vertex with a label which is a Non-terminal symbol is a parse tree.

(2) If A → y1 y2 … yn is a rule in R, then the tree

A

y1 y2 yn. . .

is a parse tree.

Dr. Hussien M. Sharaf

Page 16: Cs419 lec7   cfg

Ambiguity A grammar can generate the same

string in different ways. Ambiguity occurs when a string has two

or more leftmost derivations for the same CFG.

There are ways to eliminate ambiguity such as using Chomsky Normal Form (CNF) which does n’t use Λ.

Λ cause ambiguity.

16Dr. Hussien M. Sharaf

Page 17: Cs419 lec7   cfg

Ex 1 Deduce CFG of addition and parse the

following expression 2+3+5 1] S→S+S|N 2] N→1|2|3|4|5|6|7|8|9|0 N1|N2|N3|N4|N5|N6|N7|N8|N9|N0

17

S

S+N

S+

N

5S+

3

N

2

N

Can u makeanother parsingtree ?

Dr. Hussien M. Sharaf

Page 18: Cs419 lec7   cfg

Ex 2 Deduce CFG of a addition/multiplication

and parse the following expression 2+3*5

1] S→S+S|S*S|N

2] N→1|2|3|4|5|6|7|8|9|0|NN

18

S

S*S

S*

N

5S+

3

N

2

N

Can u makeanother parsingtree ?

Dr. Hussien M. Sharaf

Page 19: Cs419 lec7   cfg

Ex 3 CFG without ambiguity Deduce CFG of a addition/multiplication

and parse the following expression

2*3+51] S→ Term|Term + S 2] Term → N|N * Term 3] N→1|2|3|4|5|6|7|8|9|0

19

S

S+N

S+

N

5S*

3

N

2

N

Can you makeanother parsingtree ?

Dr. Hussien M. Sharaf

Page 20: Cs419 lec7   cfg

Example 4 : AABB

20

S A | A BA Λ| a | A b | A A

B b | b c | B c | b BSample derivations:

S AB AbB Abb AAbb Aabb aabb

S

A B

AA Bb

a a b

S

BA

b

A

b

AA

a aDr. Hussien M. Sharaf

S AB AAB aAB aaB aabB aabb

Page 21: Cs419 lec7   cfg

Ex 5

21

S A | A BA Λ | a | A b | A AB b | b c | B c | b B

w = aabb

S

A B

AA b

a

a

bA

S

A

A A

AA bA

a e

a

bA

S

A B

AA Bb

a a b

Dr. Hussien M. Sharaf

Page 22: Cs419 lec7   cfg

REMOVING AMBIGUITY

22

Eliminate “useless” variables.Eliminate Λ-productions: AΛ.Avoid left recursion by replacing it with

right-recursion.

But if a language is ambiguous, it can’t be totally removed. We just need to the parsing to continue without entering an infinite loop.

Dr. Hussien M. Sharaf

Page 23: Cs419 lec7   cfg

THANK YOU

Dr. Hussien M. Sharaf 23