![Page 1: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/1.jpg)
CMSC 330: Organization of Programming Languages
Context Free Grammars and Parsing
CMSC 330 - Spring 2013 1
![Page 2: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/2.jpg)
Recall: Architecture of Compilers, Interpreters
CMSC 330 - Spring 2013 2
Front End
Intermediate Representation
Back End
Parser Static Analyzer Source
Compiler / Interpreter
![Page 3: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/3.jpg)
Front End
! Responsible for turning source code (sequence of symbols) into representation of program structure
! Parser generates parse trees, which static analyzer processes
CMSC 330 - Spring 2013 3
Front End
Source Parser Static Analyzer
Intermediate Representation
Parse Tree
![Page 4: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/4.jpg)
Parser Architecture
! Scanner / lexer converts sequences of symbols into tokens (keywords, variable names, operators, numbers, etc.)
! Parser (yes, a parser contains a parser!) converts sequences of tokens into parse tree; implements a context free grammar • We can’t do everything as a regular expression – not powerful enough
CMSC 330 - Spring 2013 4
Parser
Source Scanner Parser
Parse Tree
Token Stream
![Page 5: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/5.jpg)
CMSC 330 - Spring 2013
Context Free Grammar (CFG)
! A way of describing sets of strings (= languages) • The notation [[S]] denotes the language of strings
defined by S
! Example grammar: S → 0S | 1S | ε String s’ ∊ [[S]] iff
• s’ = ε, or ∃s ∊ [[S]] such that s’ = 0s, or s’ = 1s
! Grammar is same as regular expression (0|1)* • Generates / accepts the same set of strings
5
![Page 6: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/6.jpg)
CMSC 330 - Spring 2013
Context-Free Grammars (CFGs)
! But CFGs can do what regexps cannot • S → ( S ) | ε // represents balanced pairs of ( )’s
! In fact, CFGs subsume REs, DFAs, NFAs • There is a CFG that generates any regular language • But REs are a better notation for regular languages
! CFGs can specify programming language syntax • CFGs (mostly) describe the parsing process
6
![Page 7: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/7.jpg)
CMSC 330 - Spring 2013
Formal Definition: Context-Free Grammar
! A CFG G is a 4-tuple (Σ, N, P, S) • Σ – alphabet (finite set of symbols, or terminals)
Often written in lowercase
• N – a finite, nonempty set of nonterminal symbols Often written in uppercase It must be that N ∩ Σ = ∅
• P – a set of productions of the form N → (Σ|N)* Informally: the nonterminal can be replaced by the string of
zero or more terminals / nonterminals to the right of the → Can think of productions as rewriting rules (more later)
• S ∊ N – the start symbol 7
![Page 8: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/8.jpg)
CMSC 330 - Spring 2013
Backus-Naur Form ! Context-free grammar production rules are also
called Backus-Naur Form or BNF • A production like A → B c D is written in BNF as
<A> ::= <B> c <D> (Non-terminals written with angle brackets and ::= instead of →)
• Often used to describe language syntax ! BNF was designed by
• John Backus Chair of the Algol committee in the early 1960s
• Peter Naur Secretary of the committee, who used this notation to
describe Algol in 1962
8
![Page 9: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/9.jpg)
Generating strings
! We can think of a grammar as generating strings by rewriting
! Example grammar: S → 0S | 1S | ε • S ⇒ 0S // using S → 0S • ⇒ 01S // using S → 1S • ⇒ 011S // using S → 1S • ⇒ 011 // using S → ε
CMSC 330 - Spring 2013 9
![Page 10: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/10.jpg)
CMSC 330 - Spring 2013
Accepting strings (informally)
! Determining if s ∈ [[S]] is called acceptance: goal is to find a rewriting from S that yields s • 011 ∈ [[S]] according to the previous rewriting • A rewriting is some sequence of productions
(rewrites) applied starting at the start symbol ! Terminology
• Such a sequence of rewrites is a derivation or parse • Discovering the derivation is called parsing
10
![Page 11: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/11.jpg)
CMSC 330 - Spring 2013
Derivations
! Notation ⇒ indicates a derivation of one step ⇒+ indicates a derivation of one or more steps ⇒* indicates a derivation of zero or more steps
! Example • S → 0S | 1S | ε
! For the string 010 • S ⇒ 0S ⇒ 01S ⇒ 010S ⇒ 010 • S ⇒+ 010 • S ⇒* S
11
![Page 12: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/12.jpg)
CMSC 330 - Spring 2013
Language Generated by Grammar
! L(G) (a.k.a. [[G]]), the language defined by G is
[[G]] = L(G) = { ω ∊ Σ* | S ⇒+ ω }
• S is the start symbol of the grammar • Σ is the alphabet for that grammar
! In other words • All strings over Σ that can be derived from the start
symbol via one or more productions
12
![Page 13: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/13.jpg)
CMSC 330 - Spring 2013
Practice
! Try to make a grammar which accepts • 0*|1* – 0n1n where n ≥ 0 – 0n1m where m ≤ n
! Give some example strings from this language • S → 0 | 1S
0, 10, 110, 1110, 11110, …
• What language is it? 1*0
S → A | B A → 0A | ε B → 1B | ε
S → 0S1 | ε S → 0S1 | 0S | ε
13
![Page 14: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/14.jpg)
CMSC 330 - Spring 2013
Example: Arithmetic Expressions
! E → a | b | c | E+E | E-E | E*E | (E) • An expression E is either a letter a, b, or c • Or an E followed by + followed by an E • etc…
! This describes (or generates) a set of strings • {a, b, c, a+b, a+a, a*c, a-(b*a), c*(b + a), …}
! Example strings not in the language • d, c(a), a+, b**c, etc.
14
![Page 15: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/15.jpg)
CMSC 330 - Spring 2013
Formal Description of Example
! Formally, the grammar we just showed is • Σ = { +, -, *, (, ), a, b, c } // terminals • N = { E } // nonterminals • P = { E → a, E → b, E → c, // productions
E → E-E, E → E+E, E → E*E, E → (E)
} • S = E // start symbol
15
![Page 16: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/16.jpg)
CMSC 330 - Spring 2013
(Non-)Uniqueness of Grammars
! Different grammars generate the same set of strings (language)
! The following grammar generates the same set
of strings as the previous grammar E → E+T | E-T | T T → T*P | P P → (E) | a | b | c
16
![Page 17: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/17.jpg)
CMSC 330 - Spring 2013
Notational Shortcuts
! A production is of the form • left-hand side (LHS) → right hand side (RHS)
! If not specified • Assume LHS of first listed production is the start symbol
! Productions with the same LHS • Are usually combined with |
! If a production has an empty RHS • It means the RHS is ε
S → ABC // S is start symbol A → aA | b // A → b | // A → ε
17
![Page 18: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/18.jpg)
CMSC 330 - Spring 2013
Practice
! Given the grammar S → aS | T T → bT | U U → cU | ε
• Provide derivations for the following strings b ac bbc
• Does the grammar generate the following? S ⇒+ ccc S ⇒+ bS S ⇒+ bab S ⇒+ Ta
S ⇒ T ⇒ bT ⇒ bU ⇒ b S ⇒ aS ⇒ aT ⇒ aU ⇒ acU ⇒ ac S ⇒ T ⇒ bT ⇒ bbT ⇒ bbU ⇒ bbcU ⇒ bbc
Yes No
No No
18
![Page 19: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/19.jpg)
CMSC 330 - Spring 2013
Practice
! Given the grammar S → aS | T T → bT | U U → cU | ε
• Name language accepted by grammar a*b*c*
• Give a different grammar accepting language
S → ABC A → aA | ε // a* B → bB | ε // b* C → cC | ε // c*
19
![Page 20: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/20.jpg)
CMSC 330 - Spring 2013
Parse Trees
! Parse tree shows how a string is produced by a grammar • Root node is the start symbol • Every internal node is a nonterminal • Children of an internal node
Are symbols on RHS of production applied to nonterminal
• Every leaf node is a terminal or ε
! Reading the leaves left to right • Shows the string corresponding to the tree
20
![Page 21: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/21.jpg)
CMSC 330 - Spring 2013
Parse Tree Example
S → aS | T T → bT | U U → cU | ε
S ⇒ aS ⇒ aT ⇒ aU ⇒ acU ⇒ ac
21
![Page 22: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/22.jpg)
CMSC 330 - Spring 2013
Parse Trees for Expressions ! A parse tree shows the structure of an
expression as it corresponds to a grammar E → a | b | c | d | E+E | E-E | E*E | (E) a a*c c*(b+d)
22
![Page 23: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/23.jpg)
CMSC 330 - Spring 2013
Practice
E → a | b | c | d | E+E | E-E | E*E | (E) Make a parse tree for… • a*b • a+(b-c) • d*(d+b)-a • (a+b)*(c-d) • a+(b-c)*d
23
![Page 24: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/24.jpg)
CMSC 330 - Spring 2013
Ambiguity
! A grammar is ambiguous if a string may have multiple derivations • Equivalent to multiple parse trees • Can be hard to determine
1. S → aS | T T → bT | U U → cU | ε
2. S → T | T T → Tx | Tx | x | x
3. S → SS | () | (S)
No
Yes
?
24
![Page 25: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/25.jpg)
CMSC 330 - Spring 2013
Ambiguity (cont.)
! Example • Grammar: S → SS | () | (S) String: ()()() • 2 distinct (leftmost) derivations (and parse trees)
S ⇒ SS ⇒ SSS ⇒()SS ⇒()()S ⇒()()() S ⇒ SS ⇒ ()S ⇒()SS ⇒()()S ⇒()()()
25
![Page 26: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/26.jpg)
CMSC 330 - Spring 2013
CFGs for Programming Languages
! Recall that our goal is to describe programming languages with CFGs
! We had the following example which describes limited arithmetic expressions E → a | b | c | E+E | E-E | E*E | (E)
! What’s wrong with using this grammar?
• It’s ambiguous!
26
![Page 27: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/27.jpg)
CMSC 330 - Spring 2013
Example: a-b-c E ⇒ E-E ⇒ a-E ⇒ a-E-E ⇒ a-b-E ⇒ a-b-c
E ⇒ E-E ⇒ E-E-E ⇒ a-E-E ⇒ a-b-E ⇒ a-b-c
Corresponds to a-(b-c) Corresponds to (a-b)-c 27
![Page 28: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/28.jpg)
CMSC 330 - Spring 2013
Example: a-b*c E ⇒ E-E ⇒ a-E ⇒ a-E*E ⇒ a-b*E ⇒ a-b*c
E ⇒ E-E ⇒ E-E*E ⇒ a-E*E ⇒ a-b*E ⇒ a-b*c
Corresponds to a-(b*c) Corresponds to (a-b)*c
*
*
28
![Page 29: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/29.jpg)
CMSC 330 - Spring 2013
Another Example: If-Then-Else <stmt> → <assignment> | <if-stmt> | ... <if-stmt> → if (<expr>) <stmt> | if (<expr>) <stmt> else <stmt>
(Note < >’s are used to denote nonterminals)
! Consider the following program fragment if (x > y) if (x < z) a = 1; else a = 2; (Note: Ignore newlines)
29
![Page 30: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/30.jpg)
CMSC 330 - Spring 2013
Parse Tree #1
! Else belongs to inner if
30
![Page 31: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/31.jpg)
CMSC 330 - Spring 2013
Parse Tree #2
! Else belongs to outer if
31
![Page 32: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/32.jpg)
CMSC 330 - Spring 2013
Dealing With Ambiguous Grammars
! Ambiguity is bad • Syntax is correct • But semantics differ depending on choice
Different associativity (a-b)-c vs. a-(b-c) Different precedence (a-b)*c vs. a-(b*c) Different control flow if (if else) vs. if (if) else
! Two approaches • Rewrite grammar • Use special parsing rules
Depending on parsing method (learn in CMSC 430)
32
![Page 33: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/33.jpg)
CMSC 330 - Spring 2013
Fixing the Expression Grammar
! Require right operand to not be bare expression E → E+T | E-T | E*T | T T → a | b | c | (E)
! Corresponds to left-associativity
! Now only one parse tree for a-b-c • Find derivation
33
![Page 34: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/34.jpg)
CMSC 330 - Spring 2013
What if We Wanted Right-Associativity?
! Left-recursive productions • Used for left-associative operators • Example
E → E+T | E-T | E*T | T T → a | b | c | (E)
! Right-recursive productions • Used for right-associative operators • Example
E → T+E | T-E | T*E | T T → a | b | c | (E)
34
![Page 35: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/35.jpg)
CMSC 330 - Spring 2013
Parse Tree Shape
! The kind of recursion determines the shape of the parse tree
left recursion right recursion
35
![Page 36: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/36.jpg)
CMSC 330 - Spring 2013
A Different Problem
! How about the string a+b*c ? E → E+T | E-T | E*T | T T → a | b | c | (E)
! Doesn’t have correct precedence for *
• When a nonterminal has productions for several operators, they effectively have the same precedence
36
![Page 37: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/37.jpg)
CMSC 330 - Spring 2013
Final Expression Grammar E → E+T | E-T | T lowest precedence operators T → T*P | P higher precedence P → a | b | c | (E) highest precedence (parentheses)
! Practice
• Construct tree and left and and right derivations for a+b*c a*(b+c) a*b+c a-b-c
• See what happens if you change the last set of productions to P → a | b | c | E | (E)
• See what happens if you change the first set of productions to E → E +T | E-T | T | P
37
![Page 38: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/38.jpg)
CMSC 330 - Spring 2013
Tips for Designing Grammars
1. Use recursive productions to generate an arbitrary number of symbols
A → xA | ε // Zero or more x’s A → yA | y // One or more y’s
2. Use separate non-terminals to generate disjoint parts of a language, and then combine in a production
{ a*b* } // a’s followed by b’s S → AB A → aA | ε // Zero or more a’s B → bB | ε // Zero or more b’s
38
![Page 39: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/39.jpg)
CMSC 330 - Spring 2013
Tips for Designing Grammars (cont.)
3. To generate languages with matching, balanced, or related numbers of symbols, write productions which generate strings from the middle
{anbn | n ≥ 0} // N a’s followed by N b’s S → aSb | ε Example derivation: S ⇒ aSb ⇒ aaSbb ⇒ aabb
{anb2n | n ≥ 0} // N a’s followed by 2N b’s S → aSbb | ε Example derivation: S ⇒ aSbb ⇒ aaSbbbb ⇒ aabbbb
39
![Page 40: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/40.jpg)
CMSC 330 - Spring 2013
Tips for Designing Grammars (cont.) 4. For a language that is the union of other
languages, use separate nonterminals for each part of the union and then combine { an(bm|cm) | m > n ≥ 0} Can be rewritten as { anbm | m > n ≥ 0} ∪ { ancm | m > n ≥ 0} S → T | V T → aTb | U
U → Ub | b V → aVc | W W → Wc | c
40
![Page 41: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/41.jpg)
CMSC 330 - Spring 2013
Recall: Steps of Compilation
41
![Page 42: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/42.jpg)
CMSC 330 - Spring 2013
Implementing Parsers
! Many efficient techniques for parsing • I.e., turning strings into parse trees • Examples
LL(k), SLR(k), LR(k), LALR(k)… Take CMSC 430 for more details
! One simple technique: recursive descent parsing • This is a top-down parsing algorithm • Other algorithms are bottom-up
42
![Page 43: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/43.jpg)
CMSC 330 - Spring 2013
Top-Down Parsing E → id = n | { L } L → E ; L | ε (Assume: id is
variable name, n is integer)
Show parse tree for { x = 3 ; { y = 4 ; } ; }
43
{ x = 3 ; { y = 4 ; } ; }
E
L
E L
E L
ε L
E L ε
![Page 44: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/44.jpg)
CMSC 330 - Spring 2013
Example: Recursive Descent Parsing ! Goal
• Determine if we can produce the string to be parsed from the grammar's start symbol
! Approach • Construct a parse tree: starting with start symbol at root,
recursively replace nonterminal with RHS of production ! Key question: which production to use?
• There can be many productions for each nonterminal • We could try them all, backtracking if we are unsuccessful • But this is slow!
! Answer: lookahead • Keep track of next token on the input string to be processed • Use this to guide selection of production
44
![Page 45: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/45.jpg)
CMSC 330 - Spring 2013
Bottom-up Parsing E → id = n | { L } L → E ; L | ε Show parse tree for { x = 3 ; { y = 4 ; } ; } Note that final trees
constructed are same as for top-down; only order in which nodes are added to tree is different
45
{ x = 3 ; { y = 4 ; } ; }
E
L
E L
E L
ε L
E L ε
![Page 46: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/46.jpg)
CMSC 330 - Spring 2013
Example: Shift-reduce parsing
! Replaces RHS of production with LHS (nonterminal)
! Example grammar • S → aA, A → Bc, B → b
! Example parse • abc ⇒ aBc ⇒ aA ⇒ S • Derivation happens in reverse
! Something to look forward to in CMSC 430 ! Complicated to use; requires tool support
• Bison, yacc produce shift-reduce parsers from CFGs
46
![Page 47: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/47.jpg)
CMSC 330 - Spring 2013
Tradeoffs
! Recursive descent parsers • Easy to write
The formal definition is a little clunky, but if you follow the code then it’s almost what you might have done if you weren't told about grammars formally
• Fast Can be implemented with a simple table
! Shift-reduce parsers handle more grammars • But are slower
! Most languages use hacked parsers (!) • Strange combination of the two
47
![Page 48: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/48.jpg)
CMSC 330 - Spring 2013
What’s Wrong With Parse Trees?
! Parse trees contain too much information • Example
Parentheses Extra nonterminals for precedence
• This extra stuff is needed for parsing
! But when we want to reason about languages • Extra information gets in the way (too much detail)
48
![Page 49: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/49.jpg)
CMSC 330 - Spring 2013
Abstract Syntax Trees (ASTs)
! An abstract syntax tree is a more compact, abstract representation of a parse tree, with only the essential parts
parse tree AST
49
![Page 50: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/50.jpg)
CMSC 330 - Spring 2013
Abstract Syntax Trees (cont.)
! Intuitively, ASTs correspond to the data structure you’d use to represent strings in the language • Note that grammars describe trees • So do OCaml datatypes (which we’ll see later) • E → a | b | c | E+E | E-E | E*E | (E)
50
![Page 51: CMSC 330: Organization of Programming Languages 330: Organization of Programming Languages Context Free Grammars and Parsing CMSC 330 - Spring 2013 1 Recall: Architecture of Compilers,](https://reader034.vdocuments.us/reader034/viewer/2022050720/5ab67b857f8b9a7c5b8da763/html5/thumbnails/51.jpg)
CMSC 330 - Spring 2013
The Compilation Process
51