foo presentation
8/8/2019 Foo Presentation
http://slidepdf.com/reader/full/foo-presentation 1/21
PRESENTATION ON
SYNTAX ANALYSIS PHASE
IN A COMPILER
by JYOTIRMOY
Syntax analysis is also often called parsing.
In computing, parsing (or syntactic analysis) is the process of analyzing a text,
made of a sequence of tokens (for example, words), to determine its
grammatical structure with respect to a given (more or less) formal grammar.
Parser
In computing, a parser is one of the components of an interpreter or compiler.
It checks for correct syntax and builds a data structure (often some kind of
parse tree, abstract syntax tree, or other hierarchical structure) implicit in
the input tokens. The parser often relies on a separate lexical analyzer to
create tokens from the sequence of input characters.
Parsers may be programmed by hand or may be (semi-)automatically generated
(in some programming languages) by a tool.
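The lexical analysis step that feeds the parser can be sketched as follows. This is a minimal illustration, not the presenter's code: the token categories (ID, NUM, OP, LPAREN, RPAREN) and the function name tokenize are assumptions chosen for a tiny expression language.

```python
import re

# Hypothetical token categories for a tiny expression language.
TOKEN_SPEC = [
    ("ID",     r"[A-Za-z_]\w*"),
    ("NUM",    r"\d+"),
    ("OP",     r"[+\-*/]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn a sequence of characters into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":      # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

The parser then works on the returned token list rather than on raw characters.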
Syntax Analysis (Parsing)
Input
- Sequence of tokens
Output
- Abstract syntax tree
Report syntax errors
- for example, unbalanced parentheses
[Create "symbol table"] and parse tree
- In some cases the tree need not be generated (one-pass compilers)
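As an illustration of the kind of error the parser reports, the sketch below checks a token sequence for unbalanced parentheses. A real parser would detect this through its grammar; the standalone function check_parentheses and its message format are assumptions made only for this example.

```python
def check_parentheses(tokens):
    """Return None if parentheses balance, else a message for the first error found."""
    open_positions = []                    # indices of '(' tokens not yet closed
    for index, token in enumerate(tokens):
        if token == "(":
            open_positions.append(index)
        elif token == ")":
            if not open_positions:
                return f"syntax error: unmatched ')' at token {index}"
            open_positions.pop()
    if open_positions:
        return f"syntax error: unmatched '(' at token {open_positions[-1]}"
    return None
```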
The Parsing Process: Stage 2
The next stage is parsing, or syntactic analysis, which checks that the tokens
form an allowable expression. This is usually done with reference to a
context-free grammar, which recursively defines the components that can
make up an expression and the order in which they must appear.
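The Parsing Process: Diagram
Source program -> Lexical analyzer -> Parser -> Rest of front end
- The parser sends a "request for token" to the lexical analyzer.
- The parser passes the parse tree to the rest of the front end.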
We categorize parsers into two groups:
1. Top-Down Parser
- the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser
- the parse tree is created bottom to top, starting from the leaves.
Both top-down and bottom-up parsers scan the input from left to right
(one symbol at a time).
Efficient top-down and bottom-up parsers can be implemented only for
sub-classes of context-free grammars:
- LL for top-down parsing
- LR for bottom-up parsing
Context-Free Grammars (CFG)
The inherently recursive structures of a programming language are defined by a CFG.
In a CFG, we have:
- A finite set of terminals (in our case, this will be the set of tokens)
- A finite set of non-terminals (syntactic variables)
- A finite set of production rules of the form
  A → α, where A is a non-terminal and α is a string of terminals and
  non-terminals (including the empty string)
- A start symbol (one of the non-terminal symbols)
Example:
E → E + E | E - E | E * E | E / E | - E
E → ( E )
E → id
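The four components of a CFG map directly onto a small data structure. The sketch below encodes the example expression grammar; the class name Grammar, its field names, and the validate helper are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar: terminals, non-terminals, productions, start symbol."""
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple   # pairs (head, body); body is a tuple of symbols, () for the empty string
    start: str

    def validate(self):
        """Return True if the four components are mutually consistent."""
        if self.terminals & self.nonterminals:
            return False                   # a symbol cannot be both kinds
        if self.start not in self.nonterminals:
            return False
        symbols = self.terminals | self.nonterminals
        return all(
            head in self.nonterminals and all(sym in symbols for sym in body)
            for head, body in self.productions
        )


# The example grammar, one (head, body) pair per alternative.
EXPR_GRAMMAR = Grammar(
    terminals=frozenset({"+", "-", "*", "/", "(", ")", "id"}),
    nonterminals=frozenset({"E"}),
    productions=(
        ("E", ("E", "+", "E")),
        ("E", ("E", "-", "E")),
        ("E", ("E", "*", "E")),
        ("E", ("E", "/", "E")),
        ("E", ("-", "E")),
        ("E", ("(", "E", ")")),
        ("E", ("id",)),
    ),
    start="E",
)
```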
Derivations
E ⇒ E+E
E+E derives from E
- we can replace E by E+E
- to be able to do this, we have to have a production rule
  E → E+E in our grammar.
E ⇒ E+E ⇒ id+E ⇒ id+id
A sequence of replacements of non-terminal symbols is called a derivation;
the sequence above is a derivation of id+id from E.
In general, a derivation step is
αAβ ⇒ αγβ if there is a production rule A → γ in our grammar,
where α and β are arbitrary strings of terminal and non-terminal symbols.
α1 ⇒ α2 ⇒ ... ⇒ αn (αn derives from α1, or α1 derives αn)
⇒  : derives in one step
⇒* : derives in zero or more steps
⇒+ : derives in one or more steps
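A single derivation step replaces one occurrence of a non-terminal with the body of one of its productions and leaves the symbols on either side unchanged. A minimal sketch, assuming sentential forms are tuples of symbols and productions are (head, body) pairs; the name derive_step is hypothetical:

```python
def derive_step(form, position, production):
    """Rewrite the non-terminal at `position` in `form` using `production`."""
    head, body = production
    if form[position] != head:
        raise ValueError(f"symbol at {position} is {form[position]!r}, not {head!r}")
    # Everything left and right of the rewritten symbol is carried over unchanged.
    return form[:position] + body + form[position + 1:]


E_PLUS = ("E", ("E", "+", "E"))            # E -> E + E
E_ID = ("E", ("id",))                      # E -> id

step1 = derive_step(("E",), 0, E_PLUS)     # E      =>  E + E
step2 = derive_step(step1, 0, E_ID)        # E + E  =>  id + E
step3 = derive_step(step2, 2, E_ID)        # id + E =>  id + id
```

Chaining the three calls reproduces the derivation of id+id from E.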
Derivations
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
At each derivation step, we can choose any of the non-terminals in the
sentential form of G for the replacement.
If we always choose the left-most non-terminal in each derivation step,
the derivation is called a left-most derivation.
If we always choose the right-most non-terminal in each derivation step,
the derivation is called a right-most derivation.
Left-Most and Right-Most Derivation
Left-Most Derivation
E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
Right-Most Derivation
E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
(lm and rm mark left-most and right-most derivation steps.)
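The two derivations of -(id+id) differ only in which non-terminal is rewritten at each step. The sketch below applies the same list of productions once left-most and once right-most; the function derive and the constant names are assumptions made for this example.

```python
NONTERMINALS = {"E"}


def derive(start, productions, leftmost=True):
    """Apply each production to the left-most (or right-most) non-terminal.

    Returns every sentential form of the derivation as a string.
    """
    form = [start]
    steps = ["".join(form)]
    for head, body in productions:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"expected {head!r} at position {i}, found {form[i]!r}")
        form[i:i + 1] = body               # replace the chosen non-terminal by the body
        steps.append("".join(form))
    return steps


PRODUCTIONS = [
    ("E", ["-", "E"]),
    ("E", ["(", "E", ")"]),
    ("E", ["E", "+", "E"]),
    ("E", ["id"]),
    ("E", ["id"]),
]

left = derive("E", PRODUCTIONS, leftmost=True)
right = derive("E", PRODUCTIONS, leftmost=False)
```

Both lists end in -(id+id); they differ only in the fifth form, -(id+E) for the left-most derivation and -(E+id) for the right-most one.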
Top-down parsers try to find the left-most derivation of the given source
program.
Bottom-up parsers try to find the right-most derivation of the given source
program in reverse order.
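Ambiguity
A grammar that produces more than one parse tree for some sentence
is called an ambiguous grammar.
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
Parse tree for the first derivation (+ at the root):
      E
    / | \
   E  +  E
   |   / | \
  id  E  *  E
      |     |
      id    id
Parse tree for the second derivation (* at the root):
        E
      / | \
     E  *  E
   / | \   |
  E  +  E  id
  |     |
  id    id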
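The ambiguity can be checked mechanically by counting how many distinct parse trees the expression grammar E → E op E | - E | ( E ) | id admits for a token sequence. The following is a sketch with a hypothetical function name, count_parse_trees, using memoized recursion over token spans:

```python
from functools import lru_cache

BINARY_OPS = {"+", "-", "*", "/"}


def count_parse_trees(tokens):
    """Count the parse trees that derive `tokens` from E in the ambiguous grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving tokens[i:j] from E."""
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1                                  # E -> id
        if tokens[i] == "-":
            total += count(i + 1, j)                    # E -> - E
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                # E -> ( E )
        for k in range(i + 1, j - 1):                   # operator with operands on both sides
            if tokens[k] in BINARY_OPS:
                total += count(i, k) * count(k + 1, j)  # E -> E op E
        return total

    return count(0, len(tokens))
```

A count greater than one for any sentence, such as id+id*id, shows that the grammar is ambiguous.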
Ambiguity (cont.)
For most parsers, the grammar must be unambiguous.
unambiguous grammar
⇒ unique selection of the parse tree for a sentence
We should eliminate the ambiguity in the grammar during the design phase
of the compiler.
An unambiguous grammar should be written to eliminate the ambiguity.
To disambiguate the grammar, we choose one of the parse trees that the
ambiguous grammar generates for a sentence and restrict the grammar to
that choice.
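One common way to make that choice, which the slides do not spell out, is to give each precedence level its own non-terminal so that * and / bind tighter than + and -. The sketch below assumes the layered grammar E → T { (+|-) T }, T → F { (*|/) F }, F → - F | ( E ) | id and parses it by recursive descent. The grammar, the function name parse_expression, and the nested-tuple tree format are all assumptions for illustration.

```python
def parse_expression(tokens):
    """Parse `tokens` with the layered grammar and return a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at token {pos}, found {peek()!r}")
        pos += 1

    def factor():                          # F -> - F | ( E ) | id
        nonlocal pos
        if peek() == "-":
            pos += 1
            return ("neg", factor())
        if peek() == "(":
            pos += 1
            node = expr()
            expect(")")
            return node
        expect("id")
        return "id"

    def term():                            # T -> F { (*|/) F }
        nonlocal pos
        node = factor()
        while peek() in ("*", "/"):
            op = peek()
            pos += 1
            node = (op, node, factor())    # left-associative
        return node

    def expr():                            # E -> T { (+|-) T }
        nonlocal pos
        node = term()
        while peek() in ("+", "-"):
            op = peek()
            pos += 1
            node = (op, node, term())      # left-associative
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree
```

With this grammar, id+id*id has exactly one tree, the one with + at the root, so the parser always selects the same parse tree for a sentence.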