prof. necula cs 164 lecture 5 1 introduction to parsing lecture 4
TRANSCRIPT
![Page 1: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/1.jpg)
Prof. Necula CS 164 Lecture 5 1
Introduction to Parsing
Lecture 4
![Page 2: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/2.jpg)
Prof. Necula CS 164 Lecture 5 3
Outline
• Regular languages revisited
• Parser overview
• Context-free grammars (CFG’s)
• Derivations
![Page 3: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/3.jpg)
Prof. Necula CS 164 Lecture 5 4
Languages and Automata
• Formal languages are very important in CS– Especially in programming languages
• Regular languages– The weakest formal languages widely used– Many applications
• We will also study context-free languages
![Page 4: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/4.jpg)
Prof. Necula CS 164 Lecture 5 5
Limitations of Regular Languages
• Intuition: A finite automaton that runs long enough must repeat states
• Finite automaton can’t remember # of times it has visited a particular state
• Finite automaton has finite memory– Only enough to store in which state it is – Cannot count, except up to a finite limit
• E.g., language of balanced parentheses is not regular: { (i )i | i ¸ 0}
![Page 5: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/5.jpg)
Prof. Necula CS 164 Lecture 5 6
The Functionality of the Parser
• Input: sequence of tokens from lexer
• Output: parse tree of the program
![Page 6: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/6.jpg)
Prof. Necula CS 164 Lecture 5 7
Example
• Coolif x = y then 1 else 2 fi
• Parser inputIF ID = ID THEN INT ELSE INT FI
• Parser outputIF-THEN-ELSE
=
ID ID
INT
INT
![Page 7: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/7.jpg)
Prof. Necula CS 164 Lecture 5 8
Comparison with Lexical Analysis
Phase Input Output
Lexer Sequence of characters
Sequence of tokens
Parser Sequence of tokens
Parse tree
![Page 8: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/8.jpg)
Prof. Necula CS 164 Lecture 5 9
The Role of the Parser
• Not all sequences of tokens are programs . . .
• . . . Parser must distinguish between valid and invalid sequences of tokens
• We need– A language for describing valid sequences of
tokens– A method for distinguishing valid from invalid
sequences of tokens
![Page 9: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/9.jpg)
Prof. Necula CS 164 Lecture 5 10
Context-Free Grammars
• Programming language constructs have recursive structure
• An EXPR isif EXPR then EXPR else EXPR fi , orwhile EXPR loop EXPR pool , or…
• Context-free grammars are a natural notation for this recursive structure
![Page 10: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/10.jpg)
Prof. Necula CS 164 Lecture 5 11
CFGs (Cont.)
• A CFG consists of– A set of terminals T– A set of non-terminals N– A start symbol S (a non-terminal)– A set of productions
Assuming X N X => , or X => Y1 Y2 ... Yn where Yi (N U T)
![Page 11: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/11.jpg)
Prof. Necula CS 164 Lecture 5 12
Notational Conventions
• In these lecture notes– Non-terminals are written upper-case– Terminals are written lower-case– The start symbol is the left-hand side of the
first production
![Page 12: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/12.jpg)
Prof. Necula CS 164 Lecture 5 13
Examples of CFGs
A fragment of Cool:
![Page 13: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/13.jpg)
Prof. Necula CS 164 Lecture 5 14
Examples of CFGs (cont.)
Simple arithmetic expressions:
![Page 14: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/14.jpg)
Prof. Necula CS 164 Lecture 5 15
The Language of a CFG
Read productions as replacement rules: X => Y1 ... Yn
Means X can be replaced by Y1 ... Yn
X => Means X can be erased (replaced with empty
string)
![Page 15: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/15.jpg)
Prof. Necula CS 164 Lecture 5 16
Key Idea
1. Begin with a string consisting of the start symbol “S”
2. Replace any non-terminal X in the string by a right-hand side of some production X => Y1 … Yn
1. Repeat (2) until there are no non-terminals in the string
![Page 16: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/16.jpg)
Prof. Necula CS 164 Lecture 5 17
The Language of a CFG (Cont.)
More formally, write X1 … Xi … Xn => X1 … Xi-1 Y1 … Ym Xi+1 … Xn
if there is a production Xi => Y1 … Ym
![Page 17: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/17.jpg)
Prof. Necula CS 164 Lecture 5 18
The Language of a CFG (Cont.)
Write X1 … Xn =>* Y1 … Ym
if X1 … Xn => … => … => Y1 … Ym
in 0 or more steps
![Page 18: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/18.jpg)
Prof. Necula CS 164 Lecture 5 19
The Language of a CFG
Let G be a context-free grammar with start symbol S. Then the language of G is:
{ a1 … an | S =>* a1 … an and every ai is a terminal }
![Page 19: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/19.jpg)
Prof. Necula CS 164 Lecture 5 20
Terminals
• Terminals are called because there are no rules for replacing them
• Once generated, terminals are permanent
• Terminals ought to be tokens of the language
![Page 20: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/20.jpg)
Prof. Necula CS 164 Lecture 5 21
Examples
L(G) is the language of CFG G
Strings of balanced parentheses
Two grammars:
OR
![Page 21: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/21.jpg)
Prof. Necula CS 164 Lecture 5 22
Cool Example
A fragment of COOL:
![Page 22: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/22.jpg)
Prof. Necula CS 164 Lecture 5 23
Cool Example (Cont.)
Some elements of the language
![Page 23: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/23.jpg)
Prof. Necula CS 164 Lecture 5 24
Arithmetic Example
Simple arithmetic expressions:
Some elements of the language:
![Page 24: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/24.jpg)
Prof. Necula CS 164 Lecture 5 25
Notes
The idea of a CFG is a big step. But:
• Membership in a language is “yes” or “no”– we also need parse tree of the input
• Must handle errors gracefully
• Need an implementation of CFG’s (e.g., bison)
![Page 25: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/25.jpg)
Prof. Necula CS 164 Lecture 5 26
More Notes
• Form of the grammar is important– Many grammars generate the same language– Tools are sensitive to the grammar
– Note: Tools for regular languages (e.g., flex) are also sensitive to the form of the regular expression, but this is rarely a problem in practice
![Page 26: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/26.jpg)
Prof. Necula CS 164 Lecture 5 27
Derivations and Parse Trees
A derivation is a sequence of productions S => … => …
A derivation can be drawn as a tree– Start symbol is the tree’s root
– For a production X => Y1 … Yn add children Y1, …, Yn to node X
![Page 27: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/27.jpg)
Prof. Necula CS 164 Lecture 5 28
Derivation Example
• Grammar
• String
![Page 28: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/28.jpg)
Prof. Necula CS 164 Lecture 5 29
Derivation Example (Cont.)
E
E
E E
E+
id*
idid
![Page 29: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/29.jpg)
Prof. Necula CS 164 Lecture 5 30
Derivation in Detail (1)
E
![Page 30: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/30.jpg)
Prof. Necula CS 164 Lecture 5 31
Derivation in Detail (2)
E
E E+
![Page 31: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/31.jpg)
Prof. Necula CS 164 Lecture 5 32
Derivation in Detail (3)
E
E
E E
E+
*
![Page 32: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/32.jpg)
Prof. Necula CS 164 Lecture 5 33
Derivation in Detail (4)
E
E
E E
E+
*
id
![Page 33: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/33.jpg)
Prof. Necula CS 164 Lecture 5 34
Derivation in Detail (5)
E
E
E E
E+
*
idid
![Page 34: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/34.jpg)
Prof. Necula CS 164 Lecture 5 35
Derivation in Detail (6)
E
E
E E
E+
id*
idid
![Page 35: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/35.jpg)
Prof. Necula CS 164 Lecture 5 36
Notes on Derivations
• A parse tree has– Terminals at the leaves– Non-terminals at the interior nodes
• An in-order traversal of the leaves is the original input
• The parse tree shows the association of operations, the input string does not
![Page 36: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/36.jpg)
Prof. Necula CS 164 Lecture 5 37
• The previous example is a left-most derivation– At each step, replace
the left-most non-terminal
• Here is an equivalent notion of a right-most derivation
Left-most and Right-most Derivations
![Page 37: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/37.jpg)
Prof. Necula CS 164 Lecture 5 38
Right-most Derivation in Detail (1)
E
![Page 38: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/38.jpg)
Prof. Necula CS 164 Lecture 5 39
Right-most Derivation in Detail (2)
E
E E+
![Page 39: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/39.jpg)
Prof. Necula CS 164 Lecture 5 40
Right-most Derivation in Detail (3)
E
E E+
id
![Page 40: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/40.jpg)
Prof. Necula CS 164 Lecture 5 41
Right-most Derivation in Detail (4)
E
E
E E
E+
id*
![Page 41: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/41.jpg)
Prof. Necula CS 164 Lecture 5 42
Right-most Derivation in Detail (5)
E
E
E E
E+
id*
id
![Page 42: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/42.jpg)
Prof. Necula CS 164 Lecture 5 43
Right-most Derivation in Detail (6)
E
E
E E
E+
id*
idid
![Page 43: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/43.jpg)
Prof. Necula CS 164 Lecture 5 44
Derivations and Parse Trees
• Note that right-most and left-most derivations have the same parse tree
• The difference is the order in which branches are added
![Page 44: Prof. Necula CS 164 Lecture 5 1 Introduction to Parsing Lecture 4](https://reader035.vdocuments.us/reader035/viewer/2022062301/5697c0101a28abf838ccb439/html5/thumbnails/44.jpg)
Prof. Necula CS 164 Lecture 5 45
Summary of Derivations
• We are not just interested in whether s L(G)
– We need a parse tree for s
• A derivation defines a parse tree– But one parse tree may have many derivations
• Left-most and right-most derivations are important in parser implementation