nscet e-learning presentation notes/unit 2/cs8602_cd.pdf · statement in a java method with result...

NSCET E-LEARNING PRESENTATION LISTEN … LEARN… LEAD…


Post on 16-Jul-2020


Page 1: NSCET E-LEARNING PRESENTATION NOTES/unit 2/CS8602_CD.pdf · statement in a Java method with result type void. iv) Logical errors It can be anything from incorrect reasoning on the

NSCET

E-LEARNING

PRESENTATION

LISTEN … LEARN … LEAD …


COMPUTER SCIENCE AND ENGINEERING

P.MAHALAKSHMI,M.E,MISTE

ASSISTANT PROFESSOR

Nadar Saraswathi College of Engineering & Technology,

Vadapudupatti, Annanji (po), Theni – 625531.

CS8602 – Compiler Design

III YEAR / VI SEMESTER


UNIT II

SYNTAX ANALYSIS


Introduction

A syntax analyzer or parser takes its input from a lexical analyzer in the form of token streams. The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree.

The parse tree is constructed by using the pre-defined grammar of the language and the input string. If the given input string can be derived from the grammar (that is, a parse tree can be built for it in the derivation process), the input string is syntactically correct. If not, an error is reported by the syntax analyzer.

Department of CSE, NSCET, Theni Page-1


Topic

Role of Parser


Definition

In the compiler model, the parser obtains a string of tokens from the lexical analyzer and verifies that the string of token names can be generated by the grammar for the source language.

We expect the parser to report any syntax errors in an intelligible fashion and to recover from commonly occurring errors so that it can continue processing the remainder of the program.

Conceptually, for well-formed programs, the parser constructs a parse tree and passes it to the rest of the compiler for further processing.

In fact, the parse tree need not be constructed explicitly, since checking and translation actions can be interspersed with parsing. Thus, the parser and the rest of the front end could well be implemented by a single module.

There are three general types of parsers for grammars:

universal

top-down

bottom-up


Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse any grammar. These general methods are, however, too inefficient to use in production compilers.

Top-down methods build parse trees from the top (root) to the bottom (leaves), while bottom-up methods start from the leaves and work their way up to the root.

In either case, the input to the parser is scanned from left to right, one symbol at a time.


Topic

Error Handling and

Recovery in Syntax Analyzer


Syntax Error Handling

Common programming errors can occur at many different levels.

i) Lexical errors

Include misspellings of identifiers, keywords, or operators (e.g., the use of an identifier elipsesize instead of ellipsesize) and missing quotes around text intended as a string.

ii) Syntactic errors

Include misplaced semicolons or extra or missing braces, that is, "{" or "}". As another example, in C, the appearance of a case statement without an enclosing switch is a syntactic error.

iii) Semantic errors

Include type mismatches between operators and operands. An example is a return statement that returns a value in a Java method with result type void.

iv) Logical errors

Can be anything from incorrect reasoning on the part of the programmer to the use in a C program of the assignment operator = instead of the comparison operator ==.


Parsing methods such as LL and LR allow syntactic errors to be detected very efficiently. They have the viable-prefix property, meaning that they detect that an error has occurred as soon as they see a prefix of the input that cannot be completed to form a string in the language.

A few semantic errors, such as type mismatches, can also be detected efficiently; however, accurate detection of semantic and logical errors at compile time is in general a difficult task.

The goals of error handler are as follows.

Report the presence of errors clearly and accurately.

Recover from each error quickly enough to detect subsequent errors.

Add minimal overhead to the processing of correct programs.

Error-Recovery Strategies

The simplest approach is for the parser to quit with an informative error message when it detects the first error.

If errors pile up, it is better for the compiler to give up after exceeding some error limit than to produce an annoying avalanche of "spurious" errors.


i) Panic-Mode Recovery

With this method, on discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. The synchronizing tokens are usually delimiters, such as semicolon or }.

Advantage

Simplicity.

It is guaranteed not to go into an infinite loop.

Disadvantage

It often skips a considerable amount of input without checking it for additional errors.
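The skipping step of panic-mode recovery can be sketched as follows. This is a minimal illustration, assuming tokens are plain strings and that the synchronizing set is `{";", "}"}`; the function name `skip_to_sync` is hypothetical and not taken from any particular compiler.

```python
SYNC_TOKENS = {";", "}"}  # assumed synchronizing set (delimiters)

def skip_to_sync(tokens, pos, sync=SYNC_TOKENS):
    """Discard tokens from index pos until a synchronizing token is found.

    Returns (new_pos, skipped). new_pos is the index just past the
    synchronizing token, or len(tokens) if none was found.
    """
    skipped = []
    while pos < len(tokens) and tokens[pos] not in sync:
        skipped.append(tokens[pos])
        pos += 1
    if pos < len(tokens):
        pos += 1  # consume the synchronizing token itself
    return pos, skipped

tokens = ["x", "=", "+", "*", "3", ";", "y", "=", "1", ";"]
# Suppose the parser detects an error at index 3 ("*").
new_pos, skipped = skip_to_sync(tokens, 3)
print(new_pos, skipped)
```

Parsing would resume at `tokens[new_pos]`. The loop always advances or stops, which is why this strategy cannot loop forever, but everything in `skipped` goes unchecked.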

ii) Phrase-Level Recovery

On discovering an error, a parser may perform local correction on the remaining input; i.e., it may replace a prefix of the remaining input by some string that allows the parser to continue.

A typical local correction is to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon.

Advantage

Phrase-level replacement has been used in several error-repairing compilers, as it can correct any input string.


Disadvantage

We must be careful to choose replacements that do not lead to infinite loops, as would happen, for example, if the parser always inserted something on the input ahead of the current input symbol.

Its major drawback is the difficulty it has in coping with situations in which the actual error has occurred before the point of detection.

iii) Error Productions

By anticipating common errors that might be encountered, we can augment the grammar for the language at hand with productions that generate the erroneous constructs.

A parser constructed from a grammar augmented by these error productions detects the anticipated errors when an error production is used during parsing.

The parser can then generate appropriate error diagnostics about the erroneous construct that has been recognized in the input.

iv) Global Correction

Here, algorithms are used for choosing a minimal sequence of changes to obtain a globally least-cost correction.

Given an incorrect input string x and grammar G, these algorithms will find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible.

Disadvantage

Too costly to implement in terms of time and space.


Topic

Grammars – Context-free grammars –

Writing a grammar


Grammar

A grammar naturally describes the hierarchical structure of most programming language constructs. For example, an if-else statement in Java can have the form

if ( expression ) statement else statement

Using the variable expr to denote an expression and the variable stmt to denote a statement, this structuring rule can be expressed as

stmt → if ( expr ) stmt else stmt

in which the arrow may be read as "can have the form." Such a rule is called a production. In a production, lexical elements like the keyword if and the parentheses are called terminals.

Variables like expr and stmt represent sequences of terminals and are called nonterminals.


Context-Free Grammar

Formally, a context-free grammar G is a 4-tuple G = (V, T, P, S), where

1. T - A set of terminal symbols, sometimes referred to as "tokens." The terminals are the elementary symbols of the language defined by the grammar.

2. V - A set of nonterminals, sometimes called "syntactic variables." Each nonterminal represents a set of strings of terminals, in a manner we shall describe.

3. P - A set of productions, where each production consists of a nonterminal, called the head or left side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side of the production.

4. S - A designation of one of the nonterminals as the start symbol.

For example, the following grammar describes the syntax of expressions that are "lists of digits separated by plus or minus signs":

list -> list + digit

list -> list - digit

list -> digit

digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The three productions for list can be grouped into a single line:

list -> list + digit | list - digit | digit
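As an illustration only, the 4-tuple for the list grammar can be written down as plain Python data. The encoding (sets of strings for V and T, a dict of lists for P) and the helper name `well_formed` are assumptions chosen for this sketch.

```python
# The "list of digits" grammar as a 4-tuple (V, T, P, S).
V = {"list", "digit"}                      # nonterminals
T = set("+-0123456789")                    # terminals
P = {                                      # productions: head -> list of bodies
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}
S = "list"                                 # start symbol

def well_formed(V, T, P, S):
    """Check the structural conditions in the definition of a CFG."""
    if S not in V or V & T:                # start symbol in V; V and T disjoint
        return False
    symbols = V | T
    return all(
        head in V and all(sym in symbols for sym in body)
        for head, bodies in P.items()
        for body in bodies
    )

print(well_formed(V, T, P, S))
```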


Derivation

A derivation is a sequence of production-rule applications that produces the input string from the start symbol. During parsing, we take two decisions for each sentential form:

Deciding the non-terminal which is to be replaced.

Deciding the production rule by which the non-terminal will be replaced.

To decide which non-terminal is to be replaced, we have two options.

Left-most Derivation

If the left-most non-terminal of the sentential form is always replaced first, it is called a left-most derivation. The sentential form derived by the left-most derivation is called the left-sentential form.

Right-most Derivation

If the right-most non-terminal of the sentential form is always replaced first, it is known as a right-most derivation. The sentential form derived from the right-most derivation is called the right-sentential form.


Example

Production rules:

E → E + E

E → E * E

E → id

Input string: id + id * id

The left-most derivation is:

E ⇒ E * E

⇒ E + E * E

⇒ id + E * E

⇒ id + id * E

⇒ id + id * id

Notice that the left-most non-terminal is always replaced first.

The right-most derivation is:

E ⇒ E + E

⇒ E + E * E

⇒ E + E * id

⇒ E + id * id

⇒ id + id * id
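The two derivations can be replayed mechanically. The sketch below assumes sentential forms are lists of symbols and that E is the only non-terminal. The helper names are hypothetical, and the code does not check that each body is a real production of the grammar.

```python
NONTERMINALS = {"E"}

def replace_at(form, i, body):
    """Replace the symbol at index i of the sentential form by body."""
    return form[:i] + body + form[i + 1:]

def leftmost_step(form, body):
    i = next(k for k, s in enumerate(form) if s in NONTERMINALS)
    return replace_at(form, i, body)

def rightmost_step(form, body):
    i = max(k for k, s in enumerate(form) if s in NONTERMINALS)
    return replace_at(form, i, body)

def derive(bodies, step):
    """Apply the production bodies in order and record every sentential form."""
    form = ["E"]
    trace = [" ".join(form)]
    for body in bodies:
        form = step(form, body)
        trace.append(" ".join(form))
    return trace

leftmost = derive(
    [["E", "*", "E"], ["E", "+", "E"], ["id"], ["id"], ["id"]], leftmost_step)
rightmost = derive(
    [["E", "+", "E"], ["E", "*", "E"], ["id"], ["id"], ["id"]], rightmost_step)
print("\n".join(leftmost))
print("\n".join(rightmost))
```

Both traces end in id + id * id, but the intermediate sentential forms differ because a different non-terminal is chosen at each step.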


Parse Tree

A parse tree is a graphical depiction of a derivation. It is convenient to see how strings are derived from the start symbol. The start symbol of the derivation becomes the root of the parse tree.

Consider the production rules:

E → E + E

E → E * E

E → id

The left-most derivation of id + id * id is:

E ⇒ E * E

⇒ E + E * E

⇒ id + E * E

⇒ id + id * E

⇒ id + id * id

In a parse tree:

All leaf nodes are terminals.

All interior nodes are non-terminals.

Reading the leaves from left to right gives the original input string.

A parse tree depicts associativity and precedence of operators.


Writing Grammar

A grammar consists of a number of productions. Each production has an abstract symbol called a nonterminal as its left-hand side, and a sequence of zero or more nonterminal and terminal symbols as its right-hand side (an empty right-hand side is written ε).

For each grammar, the terminal symbols are drawn from a specified alphabet.

Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given context-free grammar specifies a language, namely, the set of possible sequences of terminal symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production for which the nonterminal is the left-hand side.

There are four categories in writing a grammar :

1. Regular Expression Vs Context Free Grammar

2. Eliminating ambiguous grammar.

3. Eliminating left-recursion

4. Left-factoring.

Each parsing method can handle grammars only of a certain form; hence, the initial grammar may have to be rewritten to make it parsable.


1. Regular Expression Vs Context Free Grammar


2. Eliminating ambiguous grammar

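Ambiguity

A grammar G is said to be ambiguous if it has more than one parse tree (equivalently, more than one left-most or more than one right-most derivation) for at least one string.

Example

E → E + E

E → E – E

E → id

For the string id + id – id, the above grammar generates two parse trees: one groups the string as (id + id) – id and the other as id + (id – id).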


Elimination

The ambiguity problem for context-free grammars is undecidable, i.e., there is no general algorithm for detecting or removing the ambiguity of a grammar. In practice, we can remove ambiguity by:

Disambiguating the grammar, i.e., rewriting the grammar such that there is only one parse tree possible for each string of the language which the grammar represents.

Example:

Consider the ambiguous grammar E -> E+E | E*E | (E) | id

Ans:

Unambiguous Grammar

E -> E + T | T

T -> T * F | F

F -> (E) | id


3. Eliminating left-recursion

A grammar is left-recursive if it has a non-terminal A with a derivation whose result contains A itself as the left-most symbol. When a top-down parser expands such a non-terminal, it meets the same non-terminal again without having consumed any input, so it cannot judge when to stop expanding and goes into an infinite loop.

Top-down parsing methods cannot handle left-recursive grammars, so a transformation that eliminates left recursion is needed.

A left-recursive pair of productions of the form A → Aα | β can be replaced by the non-left-recursive productions

A → βA'

A' → αA' | ε

without changing the set of strings derivable from A.
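The transformation for immediate left recursion can be sketched in a few lines. The representation is an assumption made for illustration: each body is a list of symbols, the empty list stands for ε, and the new non-terminal is named by appending a prime to the head. The function name `eliminate_immediate` is hypothetical.

```python
def eliminate_immediate(head, bodies):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
    A -> b1 A' | ... | bn A'  and  A' -> a1 A' | ... | am A' | ε."""
    recursive = [b[1:] for b in bodies if b and b[0] == head]   # the alphas
    others = [b for b in bodies if not b or b[0] != head]       # the betas
    if not recursive:
        return {head: bodies}            # nothing to do
    new = head + "'"
    return {
        head: [b + [new] for b in others],
        new: [a + [new] for a in recursive] + [[]],   # [] stands for ε
    }

result = eliminate_immediate("E", [["E", "+", "T"], ["T"]])
print(result)
```

For E → E + T | T this gives E → T E' and E' → + T E' | ε.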


Algorithm to eliminate left recursion from a grammar

Input: Grammar G with no cycles or ε-productions.

Output: An equivalent grammar with no left recursion.

Method: Apply the steps below. Note that the resulting non-left-recursive grammar may have ε-productions.

1. Arrange the nonterminals in some order A1, A2, …, An

2. for i := 1 to n do begin
       for j := 1 to i-1 do begin
           replace each production of the form Ai → Ajγ
           by the productions Ai → δ1γ | δ2γ | … | δkγ
           where Aj → δ1 | δ2 | … | δk are all the current Aj-productions;
       end
       eliminate the immediate left recursion among the Ai-productions
   end
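A possible Python rendering of this algorithm is sketched below. The grammar encoding (a dict from head to a list of bodies, with the empty list standing for ε) and the function name `eliminate_left_recursion` are assumptions for illustration. The sample grammar S → A a | b, A → A c | S d | ε contains an ε-production, which the stated input condition excludes, but it happens to be harmless in this case.

```python
def eliminate_left_recursion(grammar, order):
    """Remove direct and indirect left recursion, visiting the
    nonterminals in the given order A1, A2, ..., An."""
    g = {head: [list(b) for b in bodies] for head, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # Replace Ai -> Aj gamma by Ai -> delta gamma for every Aj -> delta.
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    expanded.extend(delta + body[1:] for delta in g[aj])
                else:
                    expanded.append(body)
            g[ai] = expanded
        # Eliminate immediate left recursion among the Ai-productions.
        recursive = [b[1:] for b in g[ai] if b and b[0] == ai]
        if recursive:
            new = ai + "'"
            rest = [b for b in g[ai] if not b or b[0] != ai]
            g[ai] = [b + [new] for b in rest]
            g[new] = [a + [new] for a in recursive] + [[]]   # [] stands for ε
    return g

grammar = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], []]}
result = eliminate_left_recursion(grammar, ["S", "A"])
for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```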


Example 1:

Remove the left recursion from the productions:

E -> E + T | T

T -> T * F | F

F -> (E) | id

Ans:

Applying the transformation yields:

E -> TE'

E' -> +TE' | ε

T -> FT'

T' -> *FT' | ε

F -> (E) | id

Example 2:

Remove the left recursion from the productions:

S -> A a | b

A -> A c | S d | ε

Ans:

The nonterminal S is left recursive because S => A a => S d a, but it is not immediately left recursive.


Substitute the S-productions in A -> S d to obtain:

A -> A c | A a d | b d | ε

Eliminating the immediate left recursion among the A-productions yields:

S -> A a | b

A -> b d A' | A'

A' -> c A' | a d A' | ε

4. Left Factoring

Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing.

If two or more productions for the same nonterminal have a common prefix, a top-down parser cannot decide which of them to use to parse the string in hand until it has seen more of the input.

Algorithm

For each nonterminal A, find the longest prefix α that occurs in two or more right-hand sides of A.

If α ≠ ε, then replace all of the A-productions

A -> αβ1 | αβ2 | … | αβn | γ

with

A -> αA' | γ

A' -> β1 | β2 | … | βn

where A' is a new nonterminal and γ stands for the alternatives that do not begin with α. Repeat until no common prefixes remain.

It is easy to remove common prefixes by left factoring, creating new nonterminals.
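The left-factoring procedure can be sketched as follows, assuming each body is a list of single-symbol strings and the empty list stands for ε. The function names and the priming scheme for new nonterminals are hypothetical choices for this illustration.

```python
def common_prefix(a, b):
    """Longest common prefix of two bodies."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor(grammar):
    g = {h: [list(b) for b in bodies] for h, bodies in grammar.items()}
    changed = True
    while changed:                         # repeat until no common prefix remains
        changed = False
        for head in list(g):
            bodies = g[head]
            best = []                      # longest prefix shared by two or more bodies
            for i in range(len(bodies)):
                for j in range(i + 1, len(bodies)):
                    p = common_prefix(bodies[i], bodies[j])
                    if len(p) > len(best):
                        best = p
            if not best:
                continue
            new = head + "'"
            while new in g:                # keep the new name unique
                new += "'"
            n = len(best)
            g[new] = [b[n:] for b in bodies if b[:n] == best]
            g[head] = [best + [new]] + [b for b in bodies if b[:n] != best]
            changed = True
    return g

grammar = {"S": [list("iEtS"), list("iEtSeS"), ["a"]], "E": [["b"]]}
result = left_factor(grammar)
for head, bodies in result.items():
    print(head, "->", " | ".join("".join(b) or "ε" for b in bodies))
```

On S → iEtS | iEtSeS | a the sketch factors out iEtS and introduces S' with the alternatives ε and eS.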


Example 1:

Consider the grammar G. Apply left factoring.

S -> iEtS | iEtSeS | a

E -> b

Ans:

S -> iEtSS' | a

S' -> eS | ε

E -> b

Example 2:

Do left factoring in the following grammar:

A → aAB / aBc / aAc

Ans:

A → aA'

A' → AB / Bc / Ac

Again, this is a grammar with common prefixes.

A → aA'

A' → AD / Bc

D → B / c

This is a left factored grammar.


Topic

Top Down Parsing - General

Strategies Recursive Descent Parser

Predictive Parser-LL(1) Parser


Parsing

It is the process of analyzing a continuous stream of input in order to determine its grammatical structure with respect to a given formal grammar.

Parse tree

Graphical representation of a derivation or deduction is called a parse tree. Each interior node of the parse tree is a non-terminal; the children of the node can be terminals or non-terminals.

Types of parsing

1. Top down parsing

2. Bottom up parsing

Top-down parsing

A parser can start with the start symbol and try to transform it to the input string.

Example : LL Parsers.

Bottom-up parsing

A parser can start with input and attempt to rewrite it into the start symbol.

Example : LR Parsers.


Top-down parsing

Top-down parsing can be viewed as the problem of constructing a parse tree for the given input string, starting from the root and creating the nodes of the parse tree in preorder (depth-first, left to right).

Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.

It is classified into two variants: one that uses backtracking and one that does not.


General Strategies Recursive Descent Parser

Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string.

Equivalently, it can be viewed as an attempt to construct a parse tree for the input starting from the root and creating the nodes of the parse tree in preorder.

The general form of top-down parsing, called recursive descent, may involve backtracking, that is, making repeated scans of the input. A special case of recursive-descent parsing, called predictive parsing, requires no backtracking.

Predictive parsing works only on grammars where the first terminal symbol of each subexpression provides enough information to choose which production to use.

Backtracking parsers are not seen frequently, as backtracking is rarely needed to parse programming language constructs.


Example for backtracking

Consider the grammar G :

S → cAd

A → ab | a

and the input string w = cad.

The parse tree can be constructed using the following top-down approach:

Step 1:

Initially create a tree with a single node labeled S. An input pointer points to ‘c’, the first symbol of w. Expand the tree with the production of S.


Step 2:

The leftmost leaf ‘c’ matches the first symbol of w, so advance the input pointer to ‘a’, the second symbol of w, and consider the next leaf ‘A’. Expand A using the first alternative, A → ab.

Step 3:

The second symbol ‘a’ of w matches the second leaf of the tree, so advance the input pointer to ‘d’, the third symbol of w. But the third leaf of the tree is ‘b’, which does not match the input symbol ‘d’. Hence discard the chosen production and reset the input pointer to the second symbol of w. This is called backtracking.

Step 4:

Now try the second alternative for A, A → a. The leaf ‘a’ matches the second symbol of w and the leaf ‘d’ matches the third symbol. Now halt and announce the successful completion of parsing.
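The backtracking steps above can be reproduced with a small recursive-descent recognizer. This sketch uses Python generators to try the alternatives in order, which is one convenient way to express backtracking and is not taken from the slides. The names `GRAMMAR`, `parse`, `parse_seq` and `accepts` are hypothetical.

```python
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}

def parse(symbol, text, pos):
    """Yield every input position reachable after deriving symbol from pos."""
    if symbol not in GRAMMAR:                  # terminal: must match the input
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for body in GRAMMAR[symbol]:               # try the alternatives in order
        yield from parse_seq(body, text, pos)

def parse_seq(body, text, pos):
    """Yield the positions reachable after deriving the whole body from pos."""
    if not body:
        yield pos
        return
    for mid in parse(body[0], text, pos):
        # If the rest of the body fails, the loop resumes with the next
        # position or alternative from position pos: this is the backtrack.
        yield from parse_seq(body[1:], text, mid)

def accepts(text):
    return any(end == len(text) for end in parse("S", text, 0))

print(accepts("cad"), accepts("cabd"), accepts("cd"))
```

For w = cad the alternative A → ab fails at ‘b’, the search falls back to A → a, and the input is accepted.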


Predictive Parser (or) LL(1) Parser

LL(1) stands for

L - Left to right scan of input,

L - Uses a leftmost derivation,

1 - The parser uses 1 symbol of lookahead from the input in making each parsing action decision.

It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls.

The key problem during predictive parsing is that of determining the production to be applied for a nonterminal.

The nonrecursive parser in the figure looks up the production to be applied in a parsing table. In what follows, we shall see how the table can be constructed directly from certain grammars.


A predictive parser has

Input - The input buffer contains the string to be parsed, followed by $, a symbol used as a right end marker to indicate the end of the input string.

Stack - The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the start symbol of the grammar on top of $.

Parsing Table - The parsing table is a two-dimensional array M[A, a], where A is a nonterminal and a is a terminal or the symbol $.

The parser is controlled by a program that behaves as follows. The program considers X, the symbol on the top of the stack, and a, the current input symbol. These two symbols determine the action of the parser. There are three possibilities.

1. If X = a = $, the parser halts and announces successful completion of parsing.

2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.

3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X -> UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error recovery routine.


Algorithm for Nonrecursive predictive parsing

Input: A string w and a parsing table M for grammar G.

Output: If w is in L(G), a leftmost derivation of w; otherwise, an error indication.

Method: Initially, the parser is in a configuration in which it has $S on the stack with S, the start symbol of G, on top, and w$ in the input buffer. The program that utilizes the predictive parsing table M to produce a parse for the input is as follows.

set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a nonterminal */
        if M[X, a] = X -> Y1Y2...Yk then begin
            pop X from the stack;
            push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
            output the production X -> Y1Y2...Yk
        end
        else error()
until X = $ /* stack is empty */
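A runnable sketch of this driver is given below. The parsing table is hard-coded here as an assumption: it is the standard LL(1) table for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id. The names `TABLE`, `ParseError` and `predictive_parse` are hypothetical.

```python
class ParseError(Exception):
    pass

NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Parsing table M[X, a] -> body of the production (empty list stands for ε).
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

def predictive_parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise ParseError."""
    stack = ["$", start]
    buf = list(tokens) + ["$"]
    ip = 0
    output = []
    while stack[-1] != "$":
        X, a = stack[-1], buf[ip]
        if X not in NONTERMINALS:              # X is a terminal
            if X != a:
                raise ParseError(f"expected {X!r}, found {a!r}")
            stack.pop()
            ip += 1
        elif (X, a) in TABLE:
            body = TABLE[(X, a)]
            stack.pop()
            stack.extend(reversed(body))       # leftmost body symbol ends on top
            output.append(f"{X} -> {' '.join(body) or 'ε'}")
        else:
            raise ParseError(f"no entry M[{X}, {a}]")
    if buf[ip] != "$":
        raise ParseError(f"unexpected input {buf[ip]!r}")
    return output

productions = predictive_parse(["id", "+", "id", "*", "id"])
print("\n".join(productions))
```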


Predictive parsing table construction

The construction of a predictive parser is aided by two functions associated with a grammar G :

FIRST

FOLLOW

Rules for FIRST( )

1. If X is terminal, then FIRST(X) is {X}.

2. If X → ε is a production, then add ε to FIRST(X).

3. If X is non-terminal and X → aα is a production then add a to FIRST(X).

4. If X is non-terminal and X → Y1 Y2…Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), …, FIRST(Yi-1); that is, Y1…Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).

Rules for FOLLOW( )

1. If S is a start symbol, then FOLLOW(S) contains $.

2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).

3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
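The rules can be applied repeatedly until no set changes. The sketch below does this for the expression grammar with left recursion removed. The grammar encoding (empty list for ε) and the function names are assumptions made for illustration.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_of_seq(seq, first):
    """FIRST of a string of grammar symbols (FIRST rule 4, generalised)."""
    result = set()
    for sym in seq:
        f = first[sym] if sym in first else {sym}   # terminal: FIRST(a) = {a}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                                  # every symbol can vanish
    return result

def compute_first(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_seq(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {a: set() for a in grammar}
    follow[start].add("$")                           # FOLLOW rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue
                    rest = first_of_seq(body[i + 1:], first)
                    new = rest - {EPS}               # FOLLOW rule 2
                    if EPS in rest:
                        new |= follow[head]          # FOLLOW rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, first, "E")
for a in GRAMMAR:
    print(a, sorted(first[a]), sorted(follow[a]))
```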


Algorithm for construction of predictive parsing table

Input : Grammar G

Output : Parsing table M

Method :

1. For each production A → α of the grammar, do steps 2 and 3.

2. For each terminal a in FIRST(α), add A → α to M[A, a].

3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].

4. Make each undefined entry of M be error.
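Steps 1 to 4 can be sketched as follows. To keep the block short, the FIRST and FOLLOW sets of the expression grammar (left recursion removed) are hard-coded rather than computed. Each table cell holds a list so that a cell with more than one production can be noticed, and a missing key plays the role of an error entry. All names are hypothetical.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"+", "*", "$", ")"}}

def first_of_body(body):
    """FIRST(α) for a production body α."""
    out = set()
    for sym in body:
        f = FIRST.get(sym, {sym})          # terminal: FIRST(a) = {a}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def build_table(grammar):
    table = {}
    for head, bodies in grammar.items():   # step 1: every production A -> α
        for body in bodies:
            f = first_of_body(body)
            columns = f - {EPS}            # step 2: terminals in FIRST(α)
            if EPS in f:
                columns |= FOLLOW[head]    # step 3: FOLLOW(A), including $
            for a in columns:
                table.setdefault((head, a), []).append(body)
    return table                           # step 4: absent keys mean error

table = build_table(GRAMMAR)
for (head, a), bodies in sorted(table.items()):
    print(f"M[{head}, {a}] = {head} -> {' '.join(bodies[0]) or EPS}")
```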

Implementation of predictive parser

1. Eliminate left recursion, apply left factoring, and remove ambiguity from the grammar.

2. Construct FIRST( ) and FOLLOW( ) for all non-terminals.

3. Construct predictive parsing table.

4. Parse the given input string using stack and parsing table.


Example

Consider the following grammar:

E → E+T | T

T → T*F | F

F → (E) | id

Solution

Step 1: Eliminating left-recursion

E → TE'

E' → +TE' | ε

T → FT'

T' → *FT' | ε

F → (E) | id

Step 2: Compute FIRST( ) and FOLLOW( )

FIRST( ) for all nonterminals

FIRST(E) = { (, id }

FIRST(E') = { +, ε }

FIRST(T) = { (, id }

FIRST(T') = { *, ε }

FIRST(F) = { (, id }

FOLLOW( ) for all nonterminals

FOLLOW(E) = { $, ) }

FOLLOW(E') = { $, ) }

FOLLOW(T) = { +, $, ) }

FOLLOW(T') = { +, $, ) }

FOLLOW(F) = { +, *, $, ) }


Step 3: Predictive parsing Table

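Actions performed in predictive parsing:

1. Match (pop a terminal that equals the current input symbol and advance the input pointer)

2. Expand (replace the nonterminal X on top of the stack by the body of the production in M[X, a])

3. Accept

4. Error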


Step 4: Stack Implementation

Consider the input string id+id*id


LL(1) grammar

If every entry of the parsing table holds at most one production, the grammar is called an LL(1) grammar.

Example

Consider the following grammar:

S → iEtS | iEtSeS | a

E → b

Solution

Step 1: Left factoring

S → iEtSS' | a

S' → eS | ε

E → b

Step 2: Compute FIRST( ) and FOLLOW( )

To construct a parsing table, we need FIRST( ) and FOLLOW( ) for all the non-terminals.

FIRST(S) = { i, a }

FIRST(S') = { e, ε }

FIRST(E) = { b }


FOLLOW(S) = { $ ,e }

FOLLOW(S’) = { $ ,e }

FOLLOW(E) = {t}

Step 3: Predictive parsing Table

Since the entry M[S’, e] contains more than one production (S’ → eS, because e is in FIRST(eS), and S’ → ε, because e is in FOLLOW(S’)), the grammar is not an LL(1) grammar.
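The multiply defined entry can be found mechanically. The sketch below fills the table for the left-factored grammar from the FIRST and FOLLOW sets listed above and reports every cell with more than one production; the names PRODUCTIONS, build_table and CONFLICTS are illustrative assumptions.

```python
from collections import defaultdict

EPS = "ε"
# Left-factored grammar and the FIRST/FOLLOW sets given in the notes.
PRODUCTIONS = [
    ("S", ["i", "E", "t", "S", "S'"]),
    ("S", ["a"]),
    ("S'", ["e", "S"]),
    ("S'", [EPS]),
    ("E", ["b"]),
]
FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
FOLLOW = {"S": {"$", "e"}, "S'": {"$", "e"}, "E": {"t"}}


def first_of_body(body):
    """FIRST of a production body."""
    result = set()
    for sym in body:
        sym_first = FIRST.get(sym, {sym})   # a terminal (or ε) is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table():
    table = defaultdict(list)
    for head, body in PRODUCTIONS:
        firsts = first_of_body(body)
        lookaheads = firsts - {EPS}
        if EPS in firsts:                   # body can vanish: use FOLLOW(head)
            lookaheads |= FOLLOW[head]
        for terminal in lookaheads:
            table[(head, terminal)].append(body)
    return table


TABLE = build_table()
CONFLICTS = {cell: bodies for cell, bodies in TABLE.items() if len(bodies) > 1}
```

Here CONFLICTS contains the single cell ("S'", "e"), which holds both S’ → eS and S’ → ε.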


Topic

Shift Reduce Parser


Bottom-up Parsing
Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing. A general type of bottom-up parser is a shift-reduce parser.

Handles
A handle of a string is a substring that matches the right side of a production, and whose reduction to the non-terminal on the left side of the production represents one step along the reverse of a rightmost derivation.

Example
Consider the grammar:
E → E+E
E → E*E
E → (E)
E → id
and the input string id1+id2*id3.
The rightmost derivation is:
E ⇒ E+E ⇒ E+E*E ⇒ E+E*id3 ⇒ E+id2*id3 ⇒ id1+id2*id3
Reading the derivation in reverse, the handles (underlined on the original slide) are id1, id2, id3, E*E and E+E.


Handle pruning
A rightmost derivation in reverse can be obtained by “handle pruning”. That is, if w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some rightmost derivation S = γ0 ⇒ γ1 ⇒ … ⇒ γn = w. To reconstruct the derivation, locate the handle in γn, replace it with the left side of its production to obtain γn-1, and repeat until the start symbol is reached.

Shift-reduce Parsing
Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).

Actions in shift-reduce parser
shift - The next input symbol is shifted onto the top of the stack.

reduce - The parser replaces the handle on top of the stack with a non-terminal.

accept - The parser announces successful completion of parsing.

error - The parser discovers that a syntax error has occurred and calls an error recovery routine.

Conflicts in shift-reduce parsing
There are two conflicts that occur in shift-reduce parsing:

1. Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.

2. Reduce-reduce conflict: The parser cannot decide which of several reductions to make.


Example
E → E+E
E → E*E
E → (E)
E → id
and the input string id1+id2*id3

Solution
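One possible sequence of shift and reduce moves for this input can be generated with the sketch below. Because the grammar is ambiguous, the sketch has to pick a policy: it assumes that * binds tighter than + and that both operators are left-associative. That policy and the function name shift_reduce are assumptions of this illustration, not something stated in the notes.

```python
def shift_reduce(tokens):
    """Trace a shift-reduce parse for E -> E+E | E*E | (E) | id."""
    tokens = list(tokens) + ["$"]
    stack, pos, trace = ["$"], 0, []
    while True:
        look = tokens[pos]
        top3 = stack[-3:]
        if stack[-1] == "id":                       # handle: id
            stack[-1:] = ["E"]
            trace.append("reduce E -> id")
        elif top3 == ["(", "E", ")"]:               # handle: (E)
            stack[-3:] = ["E"]
            trace.append("reduce E -> (E)")
        elif top3 == ["E", "*", "E"]:               # handle: E*E
            stack[-3:] = ["E"]
            trace.append("reduce E -> E*E")
        elif top3 == ["E", "+", "E"] and look != "*":
            stack[-3:] = ["E"]                      # reduce E+E unless * follows
            trace.append("reduce E -> E+E")
        elif stack == ["$", "E"] and look == "$":
            trace.append("accept")
            return trace
        elif look == "$":
            raise SyntaxError(f"cannot reduce stack {stack}")
        else:
            stack.append(look)                      # shift the next input symbol
            pos += 1
            trace.append(f"shift {look}")


moves = shift_reduce(["id", "+", "id", "*", "id"])
```

With E+E on the stack and * as the next input, the parser could either reduce or shift; the precedence assumption makes it shift, which is the shift-reduce conflict this grammar exhibits.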


1. Shift-reduce conflict
Example
Consider the grammar
E → E+E | E*E | id
and the input id+id*id


2. Reduce-reduce conflict:

Consider the grammar:

M→R+R|R+c|R

R→c

and input c+c

Viable prefixes:

α is a viable prefix of the grammar if there is a w such that αw is a right-sentential form.

The prefixes of right-sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.

The set of viable prefixes is a regular language.


Topic

LR Parser - LR(0) Items - Construction of SLR Parsing Table


LR Parsers
An efficient bottom-up syntax analysis technique that can be used to parse a large class of context-free grammars (CFGs) is called LR(k) parsing. The ‘L’ is for left-to-right scanning of the input, the ‘R’ for constructing a rightmost derivation in reverse, and the ‘k’ for the number of input symbols of lookahead. When ‘k’ is omitted, it is assumed to be 1.

Advantages of LR parsing
1. It recognizes virtually all programming language constructs for which a CFG can be written.
2. It is an efficient non-backtracking shift-reduce parsing method.
3. The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.
4. It detects a syntactic error as soon as possible on a left-to-right scan of the input.

Drawbacks of LR method
It is too much work to construct an LR parser by hand for a programming language grammar. A specialized tool, called an LR parser generator, is needed. Example: YACC.

Types of LR parsing method
1. SLR - Simple LR: easiest to implement, least powerful.
2. CLR - Canonical LR: most powerful, most expensive.
3. LALR - Look-Ahead LR: intermediate in size and cost between the other two methods.


The LR parsing algorithm
The schematic form of an LR parser is as follows:

It consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto). The driver program is the same for all LR parsers. The parsing program reads symbols from an input buffer one at a time. The program uses a stack to store a string of the form s0X1s1X2s2…Xmsm, where sm is on top. Each Xi is a grammar symbol and each si is a state. The parsing table consists of two parts: the action and goto functions.


Action
The parsing program determines sm, the state currently on top of the stack, and ai, the current input symbol. It then consults action[sm, ai] in the action table, which can have one of four values:
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept,
4. error.

Goto
The function goto takes a state and a grammar symbol as arguments and produces a state.

LR Parsing algorithm
Input: An input string w and an LR parsing table with functions action and goto for grammar G.
Output: If w is in L(G), a bottom-up parse for w; otherwise, an error indication.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. The parser then executes the following program:


set ip to point to the first input symbol of w$;
repeat forever
begin
    let s be the state on top of the stack and a the symbol pointed to by ip;
    if action[s, a] = shift s’ then
    begin
        push a then s’ on top of the stack;
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce A → β then
    begin
        pop 2 * |β| symbols off the stack;
        let s’ be the state now on top of the stack;
        push A then goto[s’, A] on top of the stack;
        output the production A → β
    end
    else if action[s, a] = accept then
        return
    else
        error( )
end
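The driver program above can be sketched in Python as follows. The ACTION and GOTO tables are written by hand for the small grammar 1. S → aSbS, 2. S → a and serve only as sample data; the dictionary layout and the name lr_parse are assumptions. Unlike the pseudocode, this sketch keeps only states on the stack, so a reduction pops |β| entries instead of 2 * |β|.

```python
# Production number -> (head, length of the right side).
PRODUCTIONS = {1: ("S", 4), 2: ("S", 1)}
ACTION = {
    (0, "a"): ("shift", 2),
    (1, "$"): ("accept", None),
    (2, "a"): ("shift", 2), (2, "b"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "b"): ("shift", 4),
    (4, "a"): ("shift", 2),
    (5, "b"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 3, (4, "S"): 5}


def lr_parse(tokens):
    """Return the production numbers used, in order of reduction."""
    tokens = list(tokens) + ["$"]
    stack = [0]                     # states only; symbols are implied by states
    ip = 0
    output = []
    while True:
        state, a = stack[-1], tokens[ip]
        kind, arg = ACTION.get((state, a), ("error", None))
        if kind == "shift":
            stack.append(arg)
            ip += 1
        elif kind == "reduce":
            head, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]         # pop |β| states
            stack.append(GOTO[(stack[-1], head)])   # push goto[s', A]
            output.append(arg)
        elif kind == "accept":
            return output
        else:
            raise SyntaxError(f"unexpected {a!r} in state {state}")


reductions = lr_parse("aaba")
```

The returned list gives the reductions in the order they are made, which is a rightmost derivation in reverse.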


Augmented Grammar
If G is a grammar with start symbol S, then G’, the augmented grammar for G, is G with a new start symbol S’ and the production S’ → S. The purpose of this new starting production is to indicate to the parser when it should stop parsing and announce acceptance of the input; i.e., acceptance occurs when and only when the parser is about to reduce by S’ → S.

Constructing SLR(1) Parsing Table
To perform SLR parsing, take the grammar as input and do the following:
1. Find the LR(0) items.
2. Complete the closure.
3. Compute goto(I, X), where I is a set of items and X is a grammar symbol.

LR(0) items:
An LR(0) item of a grammar G is a production of G with a dot at some position of the right side. For example, production A → XYZ yields the four items:
A → . XYZ
A → X . YZ
A → XY . Z
A → XYZ .
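Enumerating the items of a production is mechanical: a right side with n symbols yields n + 1 items. A minimal sketch, with the helper name lr0_items chosen only for this illustration:

```python
def lr0_items(head, body):
    """All LR(0) items of head -> body, with the dot written as '.'."""
    return [
        f"{head} -> {' '.join(body[:dot] + ['.'] + body[dot:])}"
        for dot in range(len(body) + 1)
    ]


ITEMS = lr0_items("A", ["X", "Y", "Z"])
```

For A → XYZ the list holds the four items shown above, from the dot at the far left to the dot at the far right.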


Closure operation

If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by the two rules:

1. Initially, every item in I is added to closure(I).

2. If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → . γ to closure(I), if it is not already there. We apply this rule until no more new items can be added to closure(I).

Goto operation

Goto(I, X) is defined to be the closure of the set of all items [A→ αX . β] such that [A→ α .Xβ] is in I.
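The two operations can be sketched directly from their definitions. An item is represented here as a (head, body, dot position) triple, and the sample grammar S’ → S, S → aSbS | a is used only as test data; both choices are assumptions of this sketch.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("a", "S", "b", "S"), ("a",)],
}


def closure(items):
    """Closure of a set of LR(0) items given as (head, body, dot) triples."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:    # dot before non-terminal B
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], production, 0)       # B -> . γ
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(h, b, d + 1) for (h, b, d) in items if d < len(b) and b[d] == symbol}
    return closure(moved)


I0 = closure({("S'", ("S",), 0)})
I2 = goto(I0, "a")
```

Returning a frozenset makes item sets hashable, so they can be compared and stored when the canonical collection is built.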

Steps to construct SLR parsing table for grammar G are:

1. Augment G and produce G’

2. Construct the canonical collection of set of items C for G’

3. Construct the parsing action function action and goto using the following algorithm, which requires FOLLOW(A) for each non-terminal of the grammar.


Algorithm for construction of LR(0) or SLR(1) parsing table:

Input : An augmented grammar G’

Output : The SLR parsing table functions action and goto for G’

Method :

1. Construct C = {I0, I1, …. In}, the collection of sets of LR(0) items for G’.

2. State i is constructed from Ii. The parsing actions for state i are determined as follows:

(a) If [A → α∙aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to “shift j”. Here a must be a terminal.

(b) If [A → α∙] is in Ii, then set action[i, a] to “reduce A → α” for all a in FOLLOW(A); here A may not be S’.

(c) If [S’ → S∙] is in Ii, then set action[i, $] to “accept”.

If any conflicting actions are generated by the above rules, we say the grammar is not SLR(1).

3. The goto transitions for state i are constructed for all non-terminals A using the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.

4. All entries not defined by rules (2) and (3) are made “error”.

5. The initial state of the parser is the one constructed from the set of items containing [S’ → ∙S].
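The construction can be sketched end to end for a small grammar. The grammar S’ → S, S → aSbS | a, its rule numbers, and FOLLOW(S) = {b, $} are supplied as data; the item representation and all function names are assumptions of this illustration.

```python
GRAMMAR = {"S'": [("S",)], "S": [("a", "S", "b", "S"), ("a",)]}
RULES = {("S", ("a", "S", "b", "S")): 1, ("S", ("a",)): 2}  # production numbers
FOLLOW = {"S": {"b", "$"}}      # taken as given, not computed here
SYMBOLS = ["S", "a", "b"]       # fixed order keeps state numbering deterministic


def closure(items):
    result, work = set(items), list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], production, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    return closure({(h, b, d + 1) for (h, b, d) in items
                    if d < len(b) and b[d] == symbol})


def canonical_collection():
    states = [closure({("S'", ("S",), 0)})]
    for state in states:                    # the list grows while we iterate
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target and target not in states:
                states.append(target)
    return states


def build_slr_table():
    states = canonical_collection()
    action, goto_table = {}, {}

    def set_action(key, value):
        if action.get(key, value) != value:
            raise ValueError(f"conflict at {key}: {action[key]} vs {value}")
        action[key] = value

    for i, state in enumerate(states):
        for head, body, dot in state:
            if dot < len(body):
                symbol = body[dot]
                j = states.index(goto(state, symbol))
                if symbol in GRAMMAR:
                    goto_table[(i, symbol)] = j                 # rule 3
                else:
                    set_action((i, symbol), ("shift", j))       # rule 2(a)
            elif head == "S'":
                set_action((i, "$"), ("accept", None))          # rule 2(c)
            else:
                for terminal in FOLLOW[head]:                   # rule 2(b)
                    set_action((i, terminal), ("reduce", RULES[(head, body)]))
    return action, goto_table


ACTION, GOTO = build_slr_table()
```

A ValueError from set_action corresponds to the case where conflicting actions are generated, i.e. the grammar is not SLR(1); for this sample grammar no conflict arises.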


Example 1

Construct the SLR parsing table for the following grammar and check whether the input aaba is acceptable or not.

S ‐‐> aSbS

S ‐‐> a

Solution

Step1: Construct the augmented grammar G’:

S’ ‐‐> S

S ‐‐> aSbS

S ‐‐> a

Step2: Set of LR(0) item sets

S’ -> .S

S -> .aSbS ---------------> I0

S -> .a


Step 3: Closure Operation

State  Item            Notes
I0     [S’ -> .S]      start item; read on ‘S’ goes to I1 (state 1)
       [S -> .aSbS]    complete operation on the S rules; read on ‘a’ goes to I2 (state 2)
       [S -> .a]       continue complete for all rules of ‘S’; ditto the read on ‘a’, to state 2
I1     [S’ -> S.]      from read on ‘S’ from state I0; note: never read on ‘$’; nothing to read on; nothing to complete
I2     [S -> a.SbS]    from read on ‘a’ from state I0; read on ‘S’ goes to I3 (state 3)
       [S -> a.]       continue from read on ‘a’ from state I0 (see step 2 of state creation); nothing to read on; nothing to complete
       [S -> .aSbS]    complete the state because of ‘.S’ in the first item; read on ‘a’ cycles back to state 2
       [S -> .a]       continue complete of all grammar rules for ‘S’; ditto read on ‘a’ cycles back to state 2
I3     [S -> aS.bS]    from read on ‘S’ from state I2; the dot is before a terminal, so no complete operation; read on ‘b’ goes to I4 (state 4)
I4     [S -> aSb.S]    from read on ‘b’ from state I3; read on ‘S’ goes to I5 (state 5)
       [S -> .aSbS]    complete the state because of ‘.S’ in the first item (note: the dot is always in front for completes); read on ‘a’ cycles back to state 2
       [S -> .a]       continue complete; ditto read on ‘a’ cycles back to state 2
I5     [S -> aSbS.]    from read on ‘S’ from state I4; nothing to read on


Parsing Table Construction

Construction rules

α, β = any string of terminals and/or non‐terminals

X, S’, S = non‐terminals

(When dot is in middle)

1. if [A --> α.aβ] ∈ Ii and read on ‘a’ produces Ij then ACTION[i, a] = SHIFT j.

2. if [A --> α.Xβ] ∈ Ii and read on ‘X’ produces Ij then GOTO[i, X] = j.

(When dot is at end)

3. if [A --> α.] ∈ Ii then ACTION[i, a] = REDUCE on A --> α for all a ∈ FOLLOW(A).

4. if [S’ --> S.] ∈ Ii then ACTION[i, $] = ACCEPT.

Using the parse table construction rules for the augmented grammar G’:
1. S --> aSbS
2. S --> a
The FIRST and FOLLOW sets are:
FIRST(S) = {a}
FOLLOW(S) = {b, $}


Procedure
(Remember that a SHIFT refers to a state, a REDUCE refers to a grammar rule.)

State  Item            Notes
I0     [S’ -> .S$]     read on ‘S’ goes to state 1; dot in middle #2, GOTO[0,S] = 1
       [S -> .aSbS]    read on ‘a’ for both of these goes to state 2;
       [S -> .a]       dot in middle #1, ACTION[0,a] = SHIFT 2
I1     [S’ -> S.$]     dot at end #4 (only one of these), ACTION[1,$] = ACCEPT
I2     [S -> a.SbS]    read on ‘S’ goes to state 3; dot in middle #2, GOTO[2,S] = 3
       [S -> a.]       dot at end #3, ACTION[2,b] = ACTION[2,$] = REDUCE 2
       [S -> .aSbS]    read on ‘a’ for both of these cycles back to state 2;
       [S -> .a]       dot in middle #1, ACTION[2,a] = SHIFT 2
I3     [S -> aS.bS]    read on ‘b’ goes to state 4; dot in middle #1, ACTION[3,b] = SHIFT 4
I4     [S -> aSb.S]    read on ‘S’ goes to state 5; dot in middle #2, GOTO[4,S] = 5
       [S -> .aSbS]    read on ‘a’ for both of these cycles back to state 2;
       [S -> .a]       dot in middle #1, ACTION[4,a] = SHIFT 2
I5     [S -> aSbS.]    dot at end #3, ACTION[5,b] = ACTION[5,$] = REDUCE 1


Step 4: Parsing table
Using the parse table construction rules for the augmented grammar G’:
1. S --> aSbS
2. S --> a
The FIRST and FOLLOW sets are:
FIRST(S) = {a}
FOLLOW(S) = {b, $}

           Action                 Goto
State   a      b      $           S
0       s2                        1
1                     accept
2       s2     r2     r2          3
3              s4
4       s2                        5
5              r1     r1


Step 5: Parse the input
Consider the input w = aaba.
The given grammar:
1. S --> aSbS
2. S --> a

Stack Input Action

$0 aaba$ Compare state 0 with input a in parsing table. s2 – means shift a and put the number 2.

$0a2 aba$ Compare state 2 with next input a in parsing table. s2 – means shift a and put the number 2.

$0a2a2 ba$ Compare state 2 with next input b in parsing table. r2 – means reduce a2 from stack and place S using the rule S--> a. 2 – means second rule in production

$0a2S ba$ Now Compare 2 with S in parsing table to get state number. And place that number after S. State number is 3.

$0a2S3 ba$ Compare state 3 with next input b in parsing table. s4 – means shift b and put the number 4.

$0a2S3b4 a$ Compare state 4 with next input a in parsing table. s2 – means shift a and put the number 2.


Stack Input Action

$0a2S3b4a2 $ Compare state 2 with next input $ in parsing table. r2 – means reduce a2 from stack and place S using the rule S--> a. 2 – means second rule in production

$0a2S3b4S $ Now Compare 4 with S in parsing table to get state number. And place that number after S. State number is 5.

$0a2S3b4S5 $ Compare state 5 with next input $ in parsing table. r1 – means reduce a2S3b4S5 from stack and place S using the rule 1. S ‐‐> aSbS . 1 – means first rule in production

$0S $ Now Compare 0 with S in parsing table to get state number. And place that number after S. State number is 1.

$0S1 $ Compare state 1 with input $ in parsing table. The answer is accept- means the input is successfully completed.

Accept Input Parsed Successfully


Example 2
Construct the LR(0) parsing table for the given grammar (G). Also check whether the grammar is SLR(1) or not.

S -> aB

B -> bB | b

Solution:

Step 1: Construct Augmented Grammar

S’ -> S

S -> aB

B -> bB | b

Step 2: Construct LR(0) Items

S′ -> •S
S -> •aB --------------------> I0

The dot in I0 is never in front of B, so the B-productions are not added to I0.

Step 3: Construct closure operation

Goto for I0
goto (I0, S)
S′ -> S• ------------> I1

goto (I0, a)
S -> a•B
B -> •bB ------------> I2
B -> •b

Goto for I2
goto (I2, B)
S -> aB• -------------> I4

goto (I2, b)
B -> b•B
B -> b•
B -> •bB ------------> I3
B -> •b

Goto for I3
goto (I3, B)
B -> bB• -------------> I5

goto (I3, b)
B -> b•B
B -> b•
B -> •bB ------------> I3
B -> •b


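Step 4: Finite State diagram (DFA)
The following DFA gives the state transitions of the parser and is useful in constructing the LR parsing table.

[Figure: DFA of the LR(0) item sets]
I0 = { S′ -> •S, S -> •aB }
I1 = { S′ -> S• }
I2 = { S -> a•B, B -> •bB, B -> •b }
I3 = { B -> b•B, B -> b•, B -> •bB, B -> •b }
I4 = { S -> aB• }
I5 = { B -> bB• }
Transitions: I0 --S--> I1, I0 --a--> I2, I2 --B--> I4, I2 --b--> I3, I3 --B--> I5, I3 --b--> I3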


Step 5: Parsing Table
1. S -> aB   2. B -> bB   3. B -> b
FIRST(S) = {a}   FOLLOW(S) = {$}
FIRST(B) = {b}   FOLLOW(B) = {$}

LR(0) parsing table (a reduce is entered under every terminal of the state):

           Action                        Goto
State   a      b          $            S     B
0       S2                             1
1                         Accept
2              S3                            4
3       R3     S3 / R3    R3                 5
4       R1     R1         R1
5       R2     R2         R2

Row 3 holds two entries for the single terminal b, which is a shift-reduce conflict, so the grammar is not LR(0).

In the SLR(1) table a reduce by A → α is entered only under the terminals in FOLLOW(A). Since FOLLOW(B) = {$}, state 3 keeps S3 under b and R3 only under $, and states 4 and 5 reduce only under $. No entry is multiply defined, so the grammar is SLR(1).
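The effect of FOLLOW on state 3 can be checked with a short sketch. It fills the action cells of one state twice: once entering a reduce under every terminal (the LR(0) convention) and once only under FOLLOW of the production's left side (the SLR(1) convention). The item encoding and the helper name actions are assumptions of this illustration.

```python
TERMINALS = ["a", "b", "$"]
FOLLOW = {"S": {"$"}, "B": {"$"}}
RULE_NUMBER = {("S", "aB"): 1, ("B", "bB"): 2, ("B", "b"): 3}
# State I3 as (head, body, dot position); its successor on 'b' is I3 itself.
I3 = [("B", "bB", 1), ("B", "b", 1), ("B", "bB", 0), ("B", "b", 0)]


def actions(state, shift_target, use_follow):
    """Map each terminal to the set of actions the state places under it."""
    cell = {t: set() for t in TERMINALS}
    for head, body, dot in state:
        if dot < len(body):
            if body[dot] in TERMINALS:              # dot before a terminal: shift
                cell[body[dot]].add(f"s{shift_target[body[dot]]}")
        else:                                       # dot at the end: reduce
            lookaheads = FOLLOW[head] if use_follow else TERMINALS
            for t in lookaheads:
                cell[t].add(f"r{RULE_NUMBER[(head, body)]}")
    return cell


LR0 = actions(I3, {"b": 3}, use_follow=False)
SLR = actions(I3, {"b": 3}, use_follow=True)
```

A cell with more than one action is a conflict: LR0["b"] holds both s3 and r3, while SLR["b"] holds only s3 and the reduce r3 appears only under $.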


Topic

Introduction to LALR Parser


LR(1) Parser or Canonical LR (CLR)
Even more powerful than SLR(1) is the LR(1) parsing method.

LR(1) includes LR(0) items and a lookahead token in item sets.

An LR(1) item consists of:

o Grammar production rule.

o Right-hand position represented by the dot.

o Lookahead token.

A ---> X1 · · · Xi • Xi+1 · · · Xn , l   where l is a lookahead token

The • represents how much of the right-hand side has been seen:

o X1 · · · Xi appear on top of the stack.

o Xi+1 · · · Xn are expected to appear on the input buffer.

The lookahead token l is expected after X1 · · · Xn appears on the stack.

An LR(1) state is a set of LR(1) items.


LALR Parser

LALR stands for lookahead LR parser.

It extends LR(0) items by introducing one symbol of lookahead on the input.

It supports a large class of grammars.

The number of states in an LALR parser is smaller than that of an LR(1) parser. Hence, LALR is preferable as it needs less memory.

Most syntactic constructs of programming languages can be stated conveniently.

Steps to construct LALR parsing table

Generate LR(1) items.

Find the item sets that have the same set of first components (core) and merge these sets into one.

Merge the goto's of combined item sets.

Revise the parsing table of the LR(1) parser by replacing states and goto's with combined states and combined goto's respectively.


LR(1) items
An LR(1) item is defined by a production, the position of the dot and a terminal symbol. The terminal is called the lookahead symbol. The general form of the closure step on an LR(1) item is:

S -> α•Aβ , $
A -> •γ , FIRST(β$)

Rules to create the canonical collection:
1. Every element of I is added to the closure of I.
2. If an LR(1) item [X -> A•BC, a] exists in I, and there exists a production B -> b1b2…, then add the item [B -> •b1b2…, z], where z is a terminal in FIRST(Ca), if it is not already in closure(I). Keep applying this rule until no more elements are added.

Construction of LR(1) Table
Rule 1: if there is an item [A -> α•Xβ, b] in Ii and goto(Ii, X) = Ij, then set action[i][X] = shift j, where X is a terminal.
Rule 2: if there is an item [A -> α•, b] in Ii and A ≠ S′, then set action[i][b] = reduce by A -> α, written with its production number.
Rule 3: if there is an item [S′ -> S•, $] in Ii, then set action[i][$] = accept.
Rule 4: if there is an item [A -> α•Xβ, b] in Ii and goto(Ii, X) = Ij, then set goto[i][X] = j, where X is a non-terminal.
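The closure rule with lookaheads can be sketched as follows. The grammar S′ → S, S → CC, C → cC | d is used as sample data, and its FIRST sets are written in by hand; the item layout (head, body, dot, lookahead) and the function names are assumptions of this sketch. Because no symbol of this grammar derives ε, FIRST of a string is simply FIRST of its first symbol.

```python
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
# FIRST sets of the non-terminals, supplied by hand for this grammar.
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}


def first_of(symbols):
    """FIRST of a non-empty symbol string (valid only because nothing is nullable)."""
    head = symbols[0]
    return FIRST.get(head, {head})      # a terminal or $ is its own FIRST


def closure(items):
    """Closure of a set of LR(1) items (head, body, dot, lookahead)."""
    result, work = set(items), list(items)
    while work:
        head, body, dot, look = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            beta_a = body[dot + 1:] + (look,)       # the string βa
            for production in GRAMMAR[body[dot]]:
                for z in first_of(beta_a):
                    item = (body[dot], production, 0, z)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(items, symbol):
    moved = {(h, b, d + 1, a) for (h, b, d, a) in items
             if d < len(b) and b[d] == symbol}
    return closure(moved)


I0 = closure({("S'", ("S",), 0, "$")})
I2 = goto(I0, "C")
```

Each item carries a single lookahead, so an item written with lookahead c/d in the notation above appears here as two separate tuples.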


Example
S -> CC
C -> cC
C -> d
Construct the CLR and LALR parsing tables.

Solution
Step 1: Construct the augmented grammar
S′ -> S
S -> CC
C -> cC
C -> d

Step 2: LR(0) items
S′ -> •S
S -> •CC
C -> •cC
C -> •d


Step 3: LR(1) items
S′->•S, $ --->1.1 \\ Begin with the lookahead (LAH) $, because the item for the augmented production always has $ as its lookahead symbol.

The dot symbol is followed by a non-terminal S. So, add productions starting with S

S->•CC

Now match the item S′->•S, $ with the term A ->α •Xβ,b

A=S’ α=ε X=S β= ε b=$

Then compute FIRST(βb)

FIRST(ε$) = FIRST($) = $

Now

S->•CC , $ -------> 1.2

The dot symbol is followed by a Non terminal C. So, add productions starting with C

C->•cC

C->•d

Now match the item S->•CC , $ with the term A ->α •Xβ,b

A=S α=ε X=C β= C b=$


Then compute FIRST(βb)

FIRST(C$) = FIRST(C) = c/d \\ First element in C

Now

C->•cC , c/d --------> 1.3

C->•d , c/d

Now combine (1.1, 1.2 and 1.3) to get LR(1) items

S′->•S, $

S->•CC , $

C->•cC , c/d ----------------> I0

C->•d , c/d

Step 4: Compute Goto

Move dot one position right of the production in each goto operation

Goto of I0

goto ( I0, S)

S′->S•,$ ------------------> I1


goto (I0 , C)

S-> C•C, $ \\The dot symbol is followed by a Non terminal C. So, add productions starting with C

C->•cC , $ --------> I2 \\ S-> C•C, $ match with A ->α •Xβ,b then A=S α=C X=C β= ε b=$

C->•d,$ \\ Then FIRST(βb) = FIRST(ε$) = FIRST($) = $

goto(I0 ,c)

C->c•C , c/d \\The dot symbol is followed C. So, add productions starting with C

C->•cC , c/d --------> I3 \\ C->c•C , c/d match with A ->α •Xβ,b then A=C α=c X=C β= ε b=c/d

C->•d, c/d \\ Then FIRST(βb) = FIRST(εc/d) = FIRST(c/d) = c/d

goto( I0 , d)

C->d•, c/d ------------> I4

No more move in I1. Because the dot is already in end

Goto of I2

goto ( I2, C)

S->CC•,$ -------------->I5


goto ( I2, c)

C->c•C , $ \\The dot symbol is followed C. So, add productions starting with C

C->•cC, $ -----------> I6 \\ C->c•C , $ match with A ->α •Xβ,b then A=C α=c X=C β= ε b=$

C->•d , $ \\ Then FIRST(βb) = FIRST(ε$) = FIRST($) = $

goto ( I2, d)

C->d•,$ -------------> I7

Goto of I3

goto (I3 , C)

C->cC•, c/d -----------> I8

goto(I3, c)

C->c•C , c/d \\The dot symbol is followed C. So, add productions starting with C

C->•cC , c/d --------> I3 \\ C->c•C , c/d match with A ->α •Xβ,b then A=C α=c X=C β= ε b=c/d

C->•d, c/d \\ Then FIRST(βb) = FIRST(εc/d) = FIRST(c/d) = c/d


goto (I3 , d)

C->d•, c/d ------------> I4

No more move in I4 and I5. Because the dot is already in end

Goto of I6

goto (I6 , C)

C->cC• , $ ------------> I9

goto (I6 , c)

C->c•C , $ \\The dot symbol is followed C. So, add productions starting with C

C->•cC, $ -----------> I6 \\ C->c•C , $ match with A ->α •Xβ,b then A=C α=c X=C β= ε b=$

C->•d , $ \\ Then FIRST(βb) = FIRST(ε$) = FIRST($) = $

goto (I6 , d)

C->d•,$ -------------> I7

No more move in I7 , I8 and I9 . Because the dot is already in end


Step 4: Finite State Machine DFA for the above LR (1) items


Step 5: CLR Parsing Table


Step 6: LALR Parser Construction

Consider the grammar in the previous example. Consider the states I4 and I7 as given below:

C->d•, c/d ------------> I4

C->d•,$ -------------> I7

These states differ only in their look-aheads; they have the same productions. Hence they are combined to form a single state called I47.

C->d•, c/d/$ -----------> I47

Similarly, the states I3 and I6 differ only in their look-aheads, as given below:

C->c•C , c/d
C->•cC , c/d --------> I3
C->•d, c/d

C->c•C , $
C->•cC , $ --------> I6
C->•d, $

These states differ only in their look-aheads; they have the same productions. Hence they are combined to form a single state called I36.

C->c•C , c/d/ $

C->•cC , c/d/ $ --------> I36

C->•d, c/d/ $


Similarly the States I8 and I9 differing only in look-aheads.

C->cC•, c/d -----------> I8

C->cC• , $ ------------> I9

Hence they combined to form the state I89

C->cC•, c/d/ $ -----------> I89

Therefore the final states of LALR(1) are:

I0:  S′->•S, $
     S->•CC , $
     C->•cC , c/d
     C->•d , c/d

I1:  S′->S•, $

I2:  S-> C•C, $
     C->•cC , $
     C->•d, $

I36: C->c•C , c/d/$
     C->•cC , c/d/$
     C->•d, c/d/$

I47: C->d•, c/d/$

I5:  S->CC•, $

I89: C->cC•, c/d/$
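Merging states with a common core can be sketched as a grouping step. Each LR(1) state is written here as a set of (core item, lookahead) pairs, where a core item such as "C->d." is the production with its dot and without the lookahead; this encoding and the name merge_by_core are assumptions of the sketch.

```python
from collections import defaultdict

# LR(1) states of the example as {state name: set of (core item, lookahead)}.
LR1_STATES = {
    "I3": {("C->c.C", "c"), ("C->c.C", "d"), ("C->.cC", "c"),
           ("C->.cC", "d"), ("C->.d", "c"), ("C->.d", "d")},
    "I6": {("C->c.C", "$"), ("C->.cC", "$"), ("C->.d", "$")},
    "I4": {("C->d.", "c"), ("C->d.", "d")},
    "I7": {("C->d.", "$")},
    "I8": {("C->cC.", "c"), ("C->cC.", "d")},
    "I9": {("C->cC.", "$")},
    "I5": {("S->CC.", "$")},
}


def merge_by_core(states):
    """Union the lookaheads of all states that share the same core."""
    groups = defaultdict(list)
    for name, items in states.items():
        core = frozenset(item for item, _ in items)
        groups[core].append(name)
    merged = {}
    for names in groups.values():
        # Name the merged state after its members, e.g. I3 and I6 -> I36.
        label = "I" + "".join(name[1:] for name in sorted(names))
        merged[label] = set().union(*(states[name] for name in names))
    return merged


LALR_STATES = merge_by_core(LR1_STATES)
```

States without a partner, such as I5, pass through unchanged, so the seven states given here become four.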


Step 7: LALR Parsing Table


Topic

YACC


YACC – Automatic Parser Generator
YACC is an automatic tool that generates the parser program.
YACC stands for Yet Another Compiler Compiler. This program is available in UNIX OS.
The construction of an LR parser requires a lot of work for parsing the input string. Hence, the process must involve automation to achieve efficiency in parsing an input.
Basically, YACC is an LALR parser generator that reports conflicts or uncertainties (if at all present) in the form of error messages.


Basic SpecificationsThe YACC specification file consists of three part

Declaration section

In this section, ordinary C declarations are inserted and grammar tokens are declared.

The C declarations should be placed between %{ and %}. The tokens are declared outside this block using %token.


Translation rule section

It includes the production rules of the context-free grammar with their corresponding actions.

Example

Rule-1 action-1
Rule-2 action-2
:
Rule-n action-n

If there is more than one alternative for a single rule, the alternatives are separated by the '|' (pipe) character. The actions are ordinary C statements. If the CFG is

LHS: alternative 1 | alternative 2 | …… alternative n

then

LHS: alternative 1 {action 1}
   | alternative 2 {action 2}
   :
   | alternative n {action n}


C functions Section

This section consists of a main function in which the routine yyparse() is called. It also contains any other C functions that are required.

Example

YACC Specification of a simple desk calculator:

%{
#include <ctype.h>
#include <stdio.h>

int yylex(void);                 /* lexical analyzer, defined below */
void yyerror(const char *msg);   /* error reporter, defined below */
%}

%token DIGIT

%%

line   : expr '\n'         { printf("%d\n", $1); }
       ;

expr   : expr '+' term     { $$ = $1 + $3; }
       | term
       ;

term   : term '*' factor   { $$ = $1 * $3; }
       | factor
       ;

factor : '(' expr ')'      { $$ = $2; }
       | DIGIT
       ;

%%

/* Returns DIGIT for a single decimal digit, with its value in yylval.
   Any other character is returned as its own token. */
int yylex(void)
{
    int c;

    c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';
        return DIGIT;
    }
    return c;
}

/* Called by yyparse() when a syntax error is found. */
void yyerror(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
}

int main(void)
{
    return yyparse();
}
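To see what the semantic actions compute, the following hand-written C sketch evaluates the same grammar. It uses recursive descent with loops in place of the left-recursive rules, which is a different technique from the LALR parser that YACC generates; it only mirrors the actions of the specification. The names eval_line, parse_expr, parse_term and parse_factor are hypothetical, and the sketch assumes well-formed input with single-digit operands and does no error recovery.

```c
#include <ctype.h>

/* Cursor into the expression being evaluated (hypothetical helper state). */
static const char *cur;

static int parse_expr(void);

/* factor : '(' expr ')' | DIGIT */
static int parse_factor(void)
{
    if (*cur == '(') {
        cur++;                       /* consume '(' */
        int v = parse_expr();
        if (*cur == ')')
            cur++;                   /* consume ')' */
        return v;                    /* mirrors $$ = $2 */
    }
    if (isdigit((unsigned char)*cur))
        return *cur++ - '0';         /* same value yylex stores in yylval */
    return 0;                        /* malformed input: no recovery here */
}

/* term : term '*' factor | factor */
static int parse_term(void)
{
    int v = parse_factor();
    while (*cur == '*') {
        cur++;
        v = v * parse_factor();      /* mirrors $$ = $1 * $3 */
    }
    return v;
}

/* expr : expr '+' term | term */
static int parse_expr(void)
{
    int v = parse_term();
    while (*cur == '+') {
        cur++;
        v = v + parse_term();        /* mirrors $$ = $1 + $3 */
    }
    return v;
}

/* Evaluate one expression given as a string of digits, '+', '*', '(' and ')'. */
int eval_line(const char *s)
{
    cur = s;
    return parse_expr();
}
```

Because term is parsed below expr, '*' binds more tightly than '+', just as in the grammar: eval_line("2+3*4") gives 14 and eval_line("(2+3)*4") gives 20.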
