cs335: syntax analysis
TRANSCRIPT
CS335: Syntax AnalysisSwarnendu Biswas
Semester 2019-2020-II
CSE, IIT Kanpur
Content influenced by many excellent references, see References slide for acknowledgements.
An Overview of Compilation
CS 335 Swarnendu Biswas
lexical analyzer
semantic analyzer
source program
syntax analyzer code optimizer
code generator
intermediate code generator
target program
error handler
symbol table
Parser Interface
CS 335 Swarnendu Biswas
symbol table
token
get next token
parsetree
Lexical Analyzer
IRSyntax Analyzer
Rest of Front End
source program
Need for Checking Syntax
β’ Given an input program, scanner generates a stream of tokens classified according to the syntactic category
β’ The parser determines if the input program, represented by the token stream, is a valid sentence in the programming language
β’ The parser attempts to build a derivation for the input program, using a grammar for the programming languageβ’ If the input stream is a valid program, parser builds a valid model for later
phases
β’ If the input stream is invalid, parser reports the problem and diagnostic information to the user
CS 335 Swarnendu Biswas
Syntax Analysis
β’ Given a programming language grammar πΊ and a stream of tokens π , parsing tries to find a derivation in πΊ that produces π
β’ In addition, a syntax analyserβ’ Forward the information as IR to the next compilation phases
β’ Handle errors if the input string is not in πΏ(πΊ)
CS 335 Swarnendu Biswas
Context-Free Grammars
β’ A context-free grammar (CFG) πΊ is a quadruple (π, ππ, π, π)
CS 335 Swarnendu Biswas
π Set of terminal symbols (also called words) in the language πΏ(πΊ)
ππ Set of nonterminal symbols that appear in the productions of πΊ
π Goal or start symbol of the grammar πΊ
π Set of productions (or rules) in πΊ
Context-Free Grammars
β’ Terminal symbols correspond to syntactic categories returned by the scannerβ’ Terminal symbol is a word that can occur in a sentence
β’ Nonterminals are syntactic variables introduced to provide abstraction and structure in the productions
β’ π represents the set of sentences in πΏ(πΊ)
β’ Each rule in π is of the form π΅π» β (π» βͺ π΅π»)β
CS 335 Swarnendu Biswas
Definitions
β’ Derivation is a a sequence of rewriting steps that begins with the grammar πΊβs start symbol and ends with a sentence in the language
π Φ+π€ where π€ β πΏ(πΊ)
β’ At each point during derivation process, the string is a collection of terminal or nonterminal symbols
πΌπ΄π½ β πΌπΎπ½ if π΄ β πΎ
β’ Such a string is called a sentential form if it occurs in some step of a valid derivation
β’ A sentential form can be derived from the start symbol in zero or more steps
CS 335 Swarnendu Biswas
Example of a CFG
CFG
πΈπ₯ππ β πΈπ₯ππ
| πΈπ₯ππ ππ name
| name
ππ β + β Γ | Γ·
(π + π) Γ π
πΈπ₯ππ β πΈπ₯ππ ππ name
β πΈπ₯ππ Γ name
β (πΈπ₯ππ) Γ name
β (πΈπ₯ππ ππ name) Γ name
β (πΈπ₯ππ + name) Γ name
β (name + name) Γ name
CS 335 Swarnendu Biswas
Sentential Form and Parse Tree
CS 335 Swarnendu Biswas
πΈπ₯ππ β πΈπ₯ππ ππ name
β πΈπ₯ππ Γ name
β (πΈπ₯ππ) Γ name
β (πΈπ₯ππ ππ name) Γ name
β (πΈπ₯ππ + name) Γ name
β (name + name) Γ name
πΈπ₯ππ
πΈπ₯ππ ππ name
πΈπ₯ππ( ) Γ
πΈπ₯ππ ππ name
name +
Parse Tree
Parse Tree
β’ A parse tree is a graphical representation of a derivation β’ Root is labeled with by the start symbol π
β’ Each internal node is a nonterminal, and represents the application of a production
β’ Leaves are labeled by terminals and constitute a sentential form, read from left to right, called the yield or frontier of the tree
β’ Parse tree filters out the order in which productions are applied to replace nonterminals β’ It just represents the rules applied
CS 335 Swarnendu Biswas
Derivations
β’ At each step during derivation, we have two choices to make1. Which nonterminal to rewrite?
2. Which production rule to pick?
β’ Rightmost (or canonical) derivation rewrites the rightmost nonterminal at each step, denoted by πΌ
πππ½
β’ Similarly, leftmost derivation rewrites the leftmost nonterminal at each step, denoted by πΌ
πππ½
β’ Every leftmost derivation can be written as π€π΄πΎπππ€πΏπΎ
CS 335 Swarnendu Biswas
Leftmost Derivation
CS 335 Swarnendu Biswas
πΈπ₯ππ β πΈπ₯ππ ππ name
β (πΈπ₯ππ) ππ name
β πΈπ₯ππ ππ name ππ name
β name ππ name ππ name
β name + name ππ name
β name + name Γ name
πΈπ₯ππ
πΈπ₯ππ ππ name
πΈπ₯ππ( ) Γ
πΈπ₯ππ ππ name
name +
Parse Tree
Ambiguous Grammars
β’ A grammar πΊ is ambiguous if some sentence in πΏ(πΊ) has more than one rightmost (or leftmost) derivation
β’ An ambiguous grammar can produce multiple derivations and parse trees
CS 335 Swarnendu Biswas
Example of Ambiguous Grammar
β’ A grammar πΊ is ambiguous if some sentence in πΏ(πΊ) has more than one rightmost (or leftmost) derivation
β’ An ambiguous grammar can produce multiple derivations and parse trees
CS 335 Swarnendu Biswas
Sπ‘ππ‘ β if πΈπ₯ππ then ππ‘ππ‘
| if πΈπ₯ππ then ππ‘ππ‘ else ππ‘ππ‘
| π΄π π πππ
Ambiguous Dangling-Else Grammar
CS 335 Swarnendu Biswas
if πΈπ₯ππ1 then if πΈπ₯ππ2 then π΄π π πππ1 else π΄π π πππ2
ππ‘ππ‘
if πΈπ₯ππ1 then ππ‘ππ‘
if πΈπ₯ππ2 then ππ‘ππ‘ else ππ‘ππ‘
π΄π π πππ1 π΄π π πππ2
if πΈπ₯ππ1 then ππ‘ππ‘ else ππ‘ππ‘
ππ‘ππ‘
if πΈπ₯ππ2 then ππ‘ππ‘
π΄π π πππ1 π΄π π πππ2
Dealing with Ambiguous Grammars
β’ Ambiguous grammars are problematic for compilersβ’ Compilers use parse trees to interpret the meaning of the expressions during
later stages
β’ Multiple parse trees can give rise to multiple interpretations
β’ Fixing ambiguous grammarsβ’ Transform the grammar to remove the ambiguity
β’ Include rules to disambiguate during derivations β’ For e.g., associativity and precedence
CS 335 Swarnendu Biswas
Fixing the Ambiguous Dangling-Else Grammar
β’ In all programming languages, an else is matched with the closest then
CS 335 Swarnendu Biswas
Sπ‘ππ‘ β if πΈπ₯ππ then ππ‘ππ‘
| if πΈπ₯ππ then πβππππ‘ππ‘ else ππ‘ππ‘
| π΄π π πππ
πβππππ‘ππ‘ β if πΈπ₯ππ then πβππππ‘ππ‘ else πβππππ‘ππ‘
| π΄π π πππ
Fixed Dangling-Else Grammar
CS 335 Swarnendu Biswas
if πΈπ₯ππ1 then if πΈπ₯ππ2 then π΄π π πππ1 else π΄π π πππ2
Sπ‘ππ‘ β if πΈπ₯ππ then ππ‘ππ‘
β if πΈπ₯ππ then if πΈπ₯ππ then πβππππ‘ππ‘ else ππ‘ππ‘
β if πΈπ₯ππ then if πΈπ₯ππ then πβππππ‘ππ‘ else π΄π π πππ
β if πΈπ₯ππ then if πΈπ₯ππ then π΄π π πππ else π΄π π πππ
Interpreting the Meaning
CFG
πΈπ₯ππ β (πΈπ₯ππ)
| πΈπ₯ππ ππ name
| name
ππ β + β Γ | Γ·
π + π Γ π
πΈπ₯ππ β πΈπ₯ππ ππ name
β πΈπ₯ππ Γ name
β πΈπ₯ππ ππ name Γ name
β πΈπ₯ππ + name Γ name
β name + name Γ name
CS 335 Swarnendu Biswas
rightmost derivation
Corresponding Parse Tree
π + π Γ π
πΈπ₯ππ β πΈπ₯ππ ππ name
β πΈπ₯ππ Γ name
β πΈπ₯ππ ππ name Γ name
β πΈπ₯ππ + name Γ name
β name + name Γ name
CS 335 Swarnendu Biswas
πΈπ₯ππ
name
πΈπ₯ππ
πΈπ₯ππ ππ name
Γππ name
+
How do we evaluate the expression?
Associativity
CS 335 Swarnendu Biswas
π π‘ππππ β π π‘ππππ + π π‘ππππ π π‘ππππ β π π‘ππππ 0 1 2|β¦ |9
π π‘ππππ
9 5
π π‘ππππ
π π‘ππππ + π π‘ππππ
β π π‘ππππ 2
π π‘ππππ
β π π‘ππππ
+π π‘ππππ π π‘ππππ
5 2
π π‘ππππ
9
9 β 5 + 2
Associativity
β’ If an operand has operator on both the sides, the side on which operator takes this operand is the associativity of that operatorβ’ +, -, *, / are left associative
β’ ^, = are right associative
β’ Grammar to generate strings with right associative operators
CS 335 Swarnendu Biswas
πππβπ‘ β πππ‘π‘ππ = πππβπ‘|πππ‘π‘πππππ‘π‘ππ β π π β¦ |π§
Parse Tree for Right Associative Grammars
CS 335 Swarnendu Biswas
a = b = cπππβπ‘
πππ‘π‘ππ = πππβπ‘
a πππ‘π‘ππ πππβπ‘
πππ‘π‘ππ
c
=
b
Encode Precedence into the Grammar
ππ‘πππ‘ β πΈπ₯ππ
πΈπ₯ππ β πΈπ₯ππ + ππππ|πΈπ₯ππ β ππππ|ππππ
ππππ β ππππ Γ πΉπππ‘ππ|ππππ Γ· πΉπππ‘ππ|πΉπππ‘ππ
πΉπππ‘ππ β (πΈπ₯ππ)|num|name
CS 335 Swarnendu Biswas
pri
ori
ty
Corresponding Parse Tree
π β π + π
ππ‘πππ‘ β πΈπ₯ππ
β πΈπ₯ππ + ππππ
β πΈπ₯ππ + πΉπππ‘ππ
β πΈπ₯ππ + name
β πΈπ₯ππ β ππππ + name
β πΈπ₯ππ β πΉπππ‘ππ + name
β πΈπ₯ππ β name + name
β ππππ β name + name
β πΉπππ‘ππ β name + name
β name β name + name
CS 335 Swarnendu Biswas
πΈπ₯ππ
name
πΈπ₯ππ
πΈπ₯ππ + ππππ
β ππππ πΉπππ‘ππ
nameπΉπππ‘ππ
name
ππππ
πΉπππ‘ππ
Types of Parsers
Top-down
β’ Starts with the root and grows the tree toward the leaves
Bottom-up
β’ Starts with the leaves and grow the tree toward the root
Universal
β’ More general algorithms, but inefficient to use in production compilers
CS 335 Swarnendu Biswas
Error Handling
β’ The scanner cannot deal with all errors
β’ Common source of programming errorsβ’ Lexical errors
β’ For e.g., illegal characters, missing quotes around strings
β’ Syntactic errorsβ’ For e.g., misspelled keywords, misplaced semicolons or extra or missing braces
β’ Semantic errorsβ’ For e.g., type mismatches between operators and operands, undeclared variables
β’ Logical errors
CS 335 Swarnendu Biswas
Handling Errors
Panic-mode recovery
β’ Parser discards input symbols one at a time until a synchronizing token is found
β’ Synchronizing tokens are usually delimiters (for e.g., ; or })
Phrase-level recovery
β’ Perform local correction on the remaining input
β’ Can go into an infinite loop because of wrong correction, or the error may have occurred before it is detected
CS 335 Swarnendu Biswas
Handling Errors
Error productions
β’ Augment the grammar with productions that generate erroneous constructs
β’ Works only for common mistakes, complicates the grammar
Global correction
β’ Given an incorrect input string π₯ and grammar πΊ, find a parse tree for a related string π¦ such that the number of modifications (insertions, deletions, and changes) of tokens required to transform π₯ into π¦ is as small as possible
CS 335 Swarnendu Biswas
Context-Free vs Regular Grammar
β’ CFGs are more powerful than REsβ’ Every regular language is context-free, but not vice versa
β’ We can create a CFG for every NFA that simulates some RE
β’ Language that can be described by a CFG but not by a RE
CS 335 Swarnendu Biswas
πΏ = ππππ π β₯ 1}
Limitations of Syntax Analysis
β’ Cannot determine whether β’ A variable has been declared before use
β’ A variable has been initialized
β’ Variables are of types on which operations are allowed
β’ Number of formal and actual arguments of a function match
β’ These limitations are handled during semantic analysis
CS 335 Swarnendu Biswas