
NSCET

E-LEARNING

PRESENTATION

LISTEN … LEARN … LEAD …

COMPUTER SCIENCE AND ENGINEERING

P.MAHALAKSHMI,M.E,MISTE

ASSISTANT PROFESSOR

Nadar Saraswathi College of Engineering & Technology,

Vadapudupatti, Annanji (po), Theni – 625531.

CS8602 – Compiler Design

III YEAR / VI SEMESTER


UNIT II

SYNTAX ANALYSIS

Introduction

A syntax analyzer or parser takes its input from the lexical analyzer in the form of token streams. The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree.

The parse tree is constructed using the predefined grammar of the language and the input string. If the input string can be derived from the start symbol using the productions of the grammar, the input string is in the correct syntax; if not, the syntax analyzer reports an error.

Department of CSE, NSCET, Theni Page-1

Topic

Role of Parser


Definition

In the compiler model, the parser obtains a string of tokens from the lexical analyzer and verifies that the string of token names can be generated by the grammar for the source language.

We expect the parser to report any syntax errors in an intelligible fashion and to recover from commonly occurring errors, so that it can continue processing the remainder of the program.

Conceptually, for well-formed programs, the parser constructs a parse tree and passes it to the rest of the compiler for further processing.

In fact, the parse tree need not be constructed explicitly, since checking and translation actions can be interspersed with parsing. Thus, the parser and the rest of the front end could well be implemented by a single module.

There are three general types of parsers for grammars:

universal

top-down

bottom-up


Universal parsing methods, such as the Cocke-Younger-Kasami algorithm and Earley's algorithm, can parse any grammar. However, these methods are too inefficient to use in production compilers.

Top-down methods build parse trees from the top (root) to the bottom (leaves), while bottom-up methods start from the leaves and work their way up to the root.

In either case, the input to the parser is scanned from left to right, one symbol at a time.

Topic

Error Handling and Recovery in Syntax Analyzer

Syntax Error Handling

Common programming errors can occur at many different levels.

i) Lexical errors

Include misspellings of identifiers, keywords, or operators (e.g., the use of an identifier elipsesize instead of ellipsesize) and missing quotes around text intended as a string.

ii) Syntactic errors

Include misplaced semicolons or extra or missing braces, that is, "{" or "}". As another example, in C, the appearance of a case statement without an enclosing switch is a syntactic error.

iii) Semantic errors

Include type mismatches between operators and operands. An example is the return of a value in a Java method with result type void.

iv) Logical errors

These can be anything from incorrect reasoning on the part of the programmer to the use in a C program of the assignment operator = instead of the comparison operator ==.


Several parsing methods, such as the LL and LR methods, allow syntactic errors to be detected very efficiently. They have the viable-prefix property, meaning that they detect that an error has occurred as soon as they see a prefix of the input that cannot be completed to form a string in the language.

A few semantic errors, such as type mismatches, can also be detected efficiently; however, accurate detection of semantic and logical errors at compile time is in general a difficult task.

The goals of the error handler are as follows:

Report the presence of errors clearly and accurately.

Recover from each error quickly enough to detect subsequent errors.

Add minimal overhead to the processing of correct programs.

Error-Recovery Strategies

The simplest approach is for the parser to quit with an informative error message when it detects the first error.

If errors pile up, it is better for the compiler to give up after exceeding some error limit than to produce an annoying avalanche of "spurious" errors.


i) Panic-Mode Recovery

With this method, on discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. The synchronizing tokens are usually delimiters, such as semicolon or }.

Advantage

Simplicity

It is guaranteed not to go into an infinite loop.

Disadvantage

It often skips a considerable amount of input without checking it for additional errors.
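The panic-mode idea can be sketched in Python. This is a minimal illustration under stated assumptions, not a real compiler front end: it assumes a toy grammar in which every statement has the form id = id ;, tokens are plain strings, and the synchronizing set is { ; and } }. The names parse_statements, SYNC and EXPECTED are hypothetical.

```python
# Panic-mode recovery sketch for a toy grammar: stmt -> id = id ;
SYNC = {";", "}"}                   # assumed synchronizing tokens (delimiters)
EXPECTED = ["id", "=", "id", ";"]   # the only statement form in this toy grammar


def parse_statements(tokens):
    """Return (number of statements parsed, list of error messages)."""
    parsed, errors = 0, []
    i = 0
    while i < len(tokens):
        j = 0
        # Match the fixed statement pattern token by token.
        while j < len(EXPECTED) and i < len(tokens) and tokens[i] == EXPECTED[j]:
            i += 1
            j += 1
        if j == len(EXPECTED):
            parsed += 1
            continue
        errors.append(f"syntax error at token {i}")
        # Panic mode: discard input symbols until a synchronizing token...
        while i < len(tokens) and tokens[i] not in SYNC:
            i += 1
        # ...then consume it and resume. i always advances, so no infinite loop.
        i += 1
    return parsed, errors


toks = ["id", "=", "id", ";", "id", "id", "=", "id", ";", "id", "=", "id", ";"]
print(parse_statements(toks))  # (2, ['syntax error at token 5'])
```

The second statement is malformed; everything up to and including its ; is skipped without further checking, which is exactly the disadvantage noted above.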

ii) Phrase-Level Recovery

On discovering an error, a parser may perform local correction on the remaining input; i.e., it may replace a prefix of the remaining input by some string that allows the parser to continue.

A typical local correction is to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon.

Advantage

Phrase-level replacement has been used in several error-repairing compilers, as it can correct any input string.


Disadvantage

We must be careful to choose replacements that do not lead to infinite loops. Its major drawback is the difficulty it has in coping with situations in which the actual error has occurred before the point of detection.
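A minimal sketch of phrase-level recovery for one toy statement of the form id = id ;, assuming tokens are plain strings; repair_statement and EXPECTED are hypothetical names. It performs two of the local corrections listed above: replacing a comma by a semicolon and inserting a missing semicolon. Because each statement has a fixed number of expected symbols, an insertion can happen at most once per statement, which sidesteps the infinite-loop risk.

```python
# Phrase-level recovery sketch for one toy statement: id = id ;
EXPECTED = ["id", "=", "id", ";"]


def repair_statement(tokens):
    """Return (repaired token list, list of corrections made)."""
    out, notes = [], []
    i = 0
    for want in EXPECTED:
        got = tokens[i] if i < len(tokens) else None
        if got == want:
            out.append(got)
            i += 1
        elif want == ";" and got == ",":
            # Local correction: replace a comma by a semicolon.
            notes.append("replaced ',' with ';'")
            out.append(";")
            i += 1
        elif want == ";":
            # Local correction: insert a missing semicolon (input not consumed).
            notes.append("inserted missing ';'")
            out.append(";")
        else:
            raise SyntaxError(f"cannot repair: expected {want!r}, got {got!r}")
    return out, notes


print(repair_statement(["id", "=", "id", ","]))
print(repair_statement(["id", "=", "id"]))
```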

iii) Error Productions

By anticipating common errors that might be encountered, we can augment the grammar for the language at hand with productions that generate the erroneous constructs. A parser constructed from a grammar augmented by these error productions detects the anticipated errors when an error production is used during parsing. The parser can then generate appropriate error diagnostics about the erroneous construct that has been recognized in the input.

iv) Global Correction

Here, algorithms are used for choosing a minimal sequence of changes to obtain a globally least-cost correction. Given an incorrect input string x and grammar G, these algorithms find a parse tree for a related string y such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible.

Disadvantage

Too costly to implement in terms of time and space.


Topic

Grammars – Context-free grammars – Writing a grammar

Grammar

A grammar naturally describes the hierarchical structure of most programming language constructs. For example, an if-else statement in Java can have the form

if ( expression ) statement else statement

Using the variable expr to denote an expression and the variable stmt to denote a statement, this structuring rule can be expressed as

stmt → if ( expr ) stmt else stmt

in which the arrow may be read as "can have the form." Such a rule is called a production. In a production, lexical elements like the keyword if and the parentheses are called terminals.

Variables like expr and stmt represent sequences of terminals and are called nonterminals.


Context-Free Grammar

Formally, a context-free grammar G is a 4-tuple G = (V, T, P, S), where

1. T - A set of terminal symbols, sometimes referred to as "tokens." The terminals are the elementary symbols of the language defined by the grammar.

2. V - A set of nonterminals, sometimes called "syntactic variables." Each nonterminal represents a set of strings of terminals, in a manner we shall describe.

3. P - A set of productions, where each production consists of a nonterminal, called the head or left side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side of the production.

4. S - A designation of one of the nonterminals as the start symbol.

For example, the following grammar describes the syntax of expressions as "lists of digits separated by plus or minus signs":

list -> list + digit

list -> list - digit

list -> digit

digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The three list-productions can be combined into one: list -> list + digit | list - digit | digit
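The list grammar above generates exactly the strings of the form digit ((+ | -) digit)*, so membership can be checked with a simple loop. The following hypothetical recognizer (is_list is an illustrative name) only makes the language of the grammar concrete; it does not build a parse tree.

```python
DIGITS = set("0123456789")


def is_list(s):
    """Check whether s is derivable from
    list -> list + digit | list - digit | digit."""
    # Every such string has the shape: digit ((+|-) digit)*
    if not s or s[0] not in DIGITS:
        return False
    i = 1
    while i < len(s):
        # After a digit we need an operator followed by another digit.
        if s[i] not in "+-" or i + 1 >= len(s) or s[i + 1] not in DIGITS:
            return False
        i += 2
    return True


for text in ["9-5+2", "9", "9-", "95"]:
    print(text, is_list(text))
```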


Derivation

A derivation is a sequence of applications of production rules that transforms the start symbol into the input string. During parsing, we make two decisions for each sentential form of the input:

Deciding the non-terminal which is to be replaced.

Deciding the production rule, by which, the non-terminal will be replaced.

To decide which non-terminal is to be replaced with a production rule, we have two options.

Left-most Derivation

If the leftmost non-terminal of each sentential form is always the one replaced, the derivation is called a left-most derivation. The sentential form derived by the left-most derivation is called the left-sentential form.

Right-most Derivation

If the rightmost non-terminal of each sentential form is always the one replaced, the derivation is called a right-most derivation. The sentential form derived from the right-most derivation is called the right-sentential form.


Example

Production rules:

E → E + E

E → E * E

E → id

Input string: id + id * id

The left-most derivation is:

E → E * E

E → E + E * E

E → id + E * E

E → id + id * E

E → id + id * id

Notice that the left-most side non-terminal is always processed first.

The right-most derivation is:

E → E + E

E → E + E * E

E → E + E * id

E → E + id * id

E → id + id * id
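The two derivations above can be replayed mechanically. In this sketch (derive is a hypothetical helper), a sentential form is a list of symbols, E is the only nonterminal, and each step replaces either the leftmost or the rightmost E by the body given for that step.

```python
NONTERMINALS = {"E"}


def derive(bodies, leftmost=True, start="E"):
    """Apply one production E -> body per step; return all sentential forms."""
    form = [start]
    forms = [" ".join(form)]
    for body in bodies:
        positions = [k for k, sym in enumerate(form) if sym in NONTERMINALS]
        k = positions[0] if leftmost else positions[-1]  # which E to replace
        form = form[:k] + body.split() + form[k + 1:]
        forms.append(" ".join(form))
    return forms


lm = derive(["E * E", "E + E", "id", "id", "id"], leftmost=True)
rm = derive(["E + E", "E * E", "id", "id", "id"], leftmost=False)
print(" => ".join(lm))
print(" => ".join(rm))
```

Both sequences end in id + id * id, but they start with different productions, which already hints that this grammar is ambiguous.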


Parse Tree

A parse tree is a graphical depiction of a derivation. It is convenient to see how strings are derived from the start symbol. The start symbol of the derivation becomes the root of the parse tree.

Consider the production rules:

E → E + E
E → E * E
E → id

The left-most derivation of id + id * id is:

E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id

In a parse tree:
All leaf nodes are terminals.
All interior nodes are non-terminals.
In-order traversal gives the original input string.
A parse tree depicts the associativity and precedence of operators.
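A sketch of a parse tree as a small Python data structure, assuming a hypothetical Node class. The tree below is the one built by the left-most derivation above (root production E → E * E), and reading its leaves from left to right recovers the input string. Note that this tree groups id + id before the multiplication, which is why the ambiguity of this grammar matters for operator precedence.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def tree_yield(node):
    """Concatenate the leaves of a parse tree from left to right."""
    if not node.children:
        return [node.label]
    result = []
    for child in node.children:
        result.extend(tree_yield(child))
    return result


def e_id():
    # An interior node E whose only child is the leaf id.
    return Node("E", [Node("id")])


# Tree for the left-most derivation: the root uses E -> E * E,
# and its left child uses E -> E + E.
tree = Node("E", [Node("E", [e_id(), Node("+"), e_id()]), Node("*"), e_id()])
print(" ".join(tree_yield(tree)))  # id + id * id
```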


Writing Grammar

A grammar consists of a number of productions. Each production has an abstract symbol called a nonterminal as its left-hand side, and a sequence of one or more nonterminal and terminal symbols as its right-hand side.

For each grammar, the terminal symbols are drawn from a specified alphabet.

Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given context-free grammar specifies a language, namely, the set of possible sequences of terminal symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production for which the nonterminal is the left-hand side.

There are four categories in writing a grammar :

1. Regular Expression Vs Context Free Grammar

2. Eliminating ambiguous grammar.

3. Eliminating left-recursion

4. Left-factoring.

Each parsing method can handle grammars only of a certain form; hence, the initial grammar may have to be rewritten to make it parsable.


1. Regular Expression Vs Context Free Grammar


2. Eliminating ambiguous grammar

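Ambiguity

A grammar G is said to be ambiguous if it has more than one parse tree (equivalently, more than one leftmost or more than one rightmost derivation) for at least one string.

Example

E → E + E
E → E - E
E → id

For the string id + id - id, the above grammar generates two parse trees, one grouping the string as (id + id) - id and the other as id + (id - id):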


Elimination

Ambiguity of a grammar is undecidable, i.e., there is no general algorithm for detecting or removing the ambiguity of a grammar, but we can often remove ambiguity by:

Disambiguating the grammar, i.e., rewriting the grammar such that there is only one derivation or parse tree possible for each string of the language that the grammar represents.

Example:

Consider the ambiguous grammar E->E+E | E*E | (E) | id

Ans:

Unambiguous Grammar

E -> E + T | T

T -> T * F | F

F -> (E) | id
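To see that the unambiguous grammar gives * higher precedence than + and makes both operators left-associative, here is a small recursive-descent sketch that returns a fully parenthesized string. Because recursive descent cannot use the left-recursive productions directly, E → E + T | T and T → T * F | F are implemented as loops (an assumption of this sketch; the systematic transformation is left-recursion elimination). The function name parse is illustrative.

```python
def parse(tokens):
    """Parse with E -> E + T | T, T -> T * F | F, F -> (E) | id and
    return the input fully parenthesized to show the grouping."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def expr():    # E -> E + T | T, with the left recursion written as a loop
        left = term()
        while peek() == "+":
            expect("+")
            left = f"({left} + {term()})"
        return left

    def term():    # T -> T * F | F
        left = factor()
        while peek() == "*":
            expect("*")
            left = f"({left} * {factor()})"
        return left

    def factor():  # F -> (E) | id
        if peek() == "(":
            expect("(")
            inner = expr()
            expect(")")
            return inner
        expect("id")
        return "id"

    result = expr()
    expect("$")  # the whole input must be consumed
    return result


print(parse(["id", "+", "id", "*", "id"]))  # (id + (id * id))
```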


3. Eliminating left-recursion

A grammar is left-recursive if it has a non-terminal 'A' whose derivation contains 'A' itself as the left-most symbol. When a top-down parser encounters such a non-terminal in its derivation, it cannot judge when to stop expanding the left non-terminal, and it goes into an infinite loop.

Top-down parsing methods cannot handle left-recursive grammars, so a transformation that eliminates left recursion is needed.

A left-recursive production of the form A → Aα | β can be replaced by the non-left-recursive productions

A → βA'

A' → αA' | ε

without changing the set of strings derivable from A.
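The transformation for immediate left recursion can be written as a small Python function, sketched here under these assumptions: a grammar body is a tuple of symbol names, the empty tuple () stands for ε, and the new nonterminal is named by appending a prime. eliminate_immediate is a hypothetical name.

```python
def eliminate_immediate(head, bodies):
    """A -> A a1 | ... | A am | b1 | ... | bn  becomes
       A  -> b1 A' | ... | bn A'
       A' -> a1 A' | ... | am A' | epsilon
    Bodies are tuples of symbols; () stands for epsilon."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:                 # no immediate left recursion
        return {head: list(bodies)}
    new = head + "'"               # assumed naming scheme for the new nonterminal
    return {
        head: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],
    }


print(eliminate_immediate("E", [("E", "+", "T"), ("T",)]))
```

For E → E + T | T this gives E → TE' and E' → +TE' | ε, the form used later in this unit for the predictive parsing example.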


Algorithm to eliminate left recursion from a grammar

Input: Grammar G with no cycles or ε-productions.

Output: An equivalent grammar with no left recursion.

Method: Note that the resulting non-left-recursive grammar may have ε-productions.

1. Arrange the nonterminals in some order A1, A2, …. An

2. for i := 1 to n do begin

for j := 1 to i-1 do begin

replace each production of the form Ai → Ajγ

by the productions Ai → δ1γ | δ2γ | … | δkγ

where Aj → δ1 | δ2 | … | δk are all the current Aj-productions;

end

eliminate the immediate left recursion among the Ai-productions

end


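Example

Consider the grammar S → Aa | b, A → Ac | Sd | ε, with the nonterminals ordered S, A. Substitute the S-productions in A → Sd to obtain:

A → Ac | Aad | bd | ε

Eliminating the immediate left recursion among the A-productions gives:

S → Aa | b

A → bdA' | A'

A' → cA' | adA' | ε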
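A sketch of the full algorithm, using the same representation as before: bodies are tuples, () stands for ε, new nonterminals get primed names, and all function names are hypothetical. Applied to the example grammar S → Aa | b, A → Ac | Sd | ε with the order S, A, it reproduces the result above. Strictly, the algorithm assumes no ε-productions; in this example A → ε happens to be harmless.

```python
def eliminate_immediate(head, bodies):
    """Remove immediate left recursion: A -> A alpha | beta
    becomes A -> beta A', A' -> alpha A' | epsilon."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: list(bodies)}
    new = head + "'"  # assumed naming scheme
    return {head: [b + (new,) for b in betas],
            new: [a + (new,) for a in alphas] + [()]}


def eliminate_left_recursion(grammar, order):
    """grammar: dict nonterminal -> list of bodies (tuples; () is epsilon)."""
    g = {a: list(bodies) for a, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            new_bodies = []
            for body in g[ai]:
                if body and body[0] == aj:
                    # Ai -> Aj gamma  becomes  Ai -> delta1 gamma | ... | deltak gamma
                    new_bodies.extend(delta + body[1:] for delta in g[aj])
                else:
                    new_bodies.append(body)
            g[ai] = new_bodies
        g.update(eliminate_immediate(ai, g[ai]))
    return g


example = {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ()]}
result = eliminate_left_recursion(example, ["S", "A"])
for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```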

4. Left Factoring

Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing. If more than one production for a nonterminal has a common prefix, then a top-down parser cannot choose which of the productions it should use to parse the string in hand.

Algorithm

For each nonterminal A, find the longest prefix α common to two or more of its right-hand sides. If α ≠ ε, replace all of the A-productions

A -> αβ1 | αβ2 | … | αβn | γ

where γ represents all alternatives that do not begin with α, with

A -> αA' | γ
A' -> β1 | β2 | … | βn

Here, A' is a new nonterminal. Repeat until no two alternatives for a nonterminal have a common prefix. Each round of left factoring removes a common prefix by creating a new nonterminal.


Example 1:

Consider the grammar G. Apply left factoring.

S -> iEtS | iEtSeS | a
E -> b

Ans:

S -> iEtSS' | a
S' -> eS | ε
E -> b

Example 2:

Do left factoring in the following grammar:

A → aAB / aBc / aAc

Ans:

A → aA'
A' → AB / Bc / Ac

Again, this is a grammar with common prefixes (A' → AB / Ac).

A → aA'
A' → AD / Bc
D → B / c

This is a left factored grammar.
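Left factoring can be sketched in Python as follows, assuming bodies are tuples of symbols, () stands for ε, and new nonterminals are named by appending primes (left_factor and common_prefix are hypothetical names). In each round the function picks the longest prefix shared by any two alternatives of a nonterminal, as the algorithm describes. On Example 2 it first factors aA out of aAB and aAc, so its output differs from the hand solution only in the order of the steps and the names of the new nonterminals.

```python
from itertools import combinations


def common_prefix(x, y):
    """Longest common prefix of two bodies (tuples of symbols)."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return x[:n]


def left_factor(grammar):
    """grammar: dict nonterminal -> list of bodies; () stands for epsilon."""
    g = {a: list(bodies) for a, bodies in grammar.items()}
    changed = True
    while changed:
        changed = False
        for head in list(g):
            bodies = g[head]
            # Longest prefix shared by at least two alternatives of head.
            alpha = max((common_prefix(x, y) for x, y in combinations(bodies, 2)),
                        key=len, default=())
            if not alpha:
                continue
            new = head + "'"
            while new in g:          # pick a fresh primed name
                new += "'"
            n = len(alpha)
            g[head] = [alpha + (new,)] + [b for b in bodies if b[:n] != alpha]
            g[new] = [b[n:] for b in bodies if b[:n] == alpha]
            changed = True
    return g


ex1 = {"S": [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)],
       "E": [("b",)]}
ex2 = {"A": [("a", "A", "B"), ("a", "B", "c"), ("a", "A", "c")]}
for g in (left_factor(ex1), left_factor(ex2)):
    for head, bodies in g.items():
        print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```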


Topic

Top Down Parsing - General Strategies - Recursive Descent Parser

Predictive Parser-LL(1) Parser


Parsing

It is the process of analyzing a continuous stream of input in order to determine its grammatical structure with respect to a given formal grammar.

Parse tree

Graphical representation of a derivation or deduction is called a parse tree. Each interior node of the parse tree is a non-terminal; the children of the node can be terminals or non-terminals.

Types of parsing

1. Top down parsing

2. Bottom up parsing

Top-down parsing

A parser can start with the start symbol and try to transform it to the input string.

Example : LL Parsers.

Bottom-up parsing

A parser can start with input and attempt to rewrite it into the start symbol.

Example : LR Parsers.

Top-down parsing

Top-down parsing can be viewed as the problem of constructing a parse tree for the given input string, starting from the root and creating the nodes of the parse tree in preorder (depth-first, left to right).

Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.

It is classified into two variants: one that uses backtracking and one that does not (non-backtracking).


General Strategies Recursive Descent Parser

Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string.

Equivalently, it can be viewed as an attempt to construct a parse tree for the input starting from the root and creating the nodes of the parse tree in preorder.

The general form of top-down parsing, called recursive descent, may involve backtracking, that is, making repeated scans of the input. Predictive parsing is a special case of recursive-descent parsing in which no backtracking is required.

Predictive (non-backtracking recursive-descent) parsing works only on grammars where the first terminal symbol of each subexpression provides enough information to choose which production to use.

A recursive-descent parser is a top-down parser that may involve backtracking, that is, it may make repeated scans of the input. Backtracking parsers are not seen frequently, as backtracking is rarely needed to parse programming language constructs.


Example for backtracking

Consider the grammar G :

S → cAd

A→ab|a

and the input string w=cad.

The parse tree can be constructed using the following top-down approach :

Step1:

Initially, create a tree with a single node labeled S. An input pointer points to ‘c’, the first symbol of w. Expand the tree with the production of S.


Step 2: The leftmost leaf ‘c’ matches the first symbol of w, so advance the input pointer to the second symbol of w, ‘a’, and consider the next leaf ‘A’. Expand A using the first alternative.

Step 3: The second symbol ‘a’ of w also matches the second leaf of the tree, so advance the input pointer to the third symbol of w, ‘d’. But the third leaf of the tree is b, which does not match the input symbol d. Hence, discard the chosen production and reset the input pointer to the second symbol (backtracking).

Step 4: Now try the second alternative for A, A → a. Its leaf a matches the second input symbol, and the next leaf d matches the third input symbol.

Now halt and announce the successful completion of parsing.
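The backtracking behaviour in Steps 1 to 4 can be sketched with Python generators: each alternative for a nonterminal is tried in order, and if the rest of the input fails to match, control returns to try the next alternative, which plays the role of resetting the input pointer. The grammar is hard-coded and the function names are hypothetical; like any recursive-descent parser, this sketch would loop forever on a left-recursive grammar.

```python
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}


def parse(symbol, s, pos):
    """Yield every input position reachable after deriving `symbol` from s[pos:]."""
    if symbol not in GRAMMAR:              # terminal: must match the input symbol
        if pos < len(s) and s[pos] == symbol:
            yield pos + 1
        return
    for body in GRAMMAR[symbol]:           # try the alternatives in order
        yield from parse_seq(body, s, pos)


def parse_seq(body, s, pos):
    """Match a sequence of symbols; failure of a later symbol backtracks
    into the earlier ones, which then try their next alternative."""
    if not body:
        yield pos
        return
    for mid in parse(body[0], s, pos):
        yield from parse_seq(body[1:], s, mid)


def accepts(w):
    return any(end == len(w) for end in parse("S", w, 0))


print(accepts("cad"))  # True: A -> ab fails on 'd', then A -> a succeeds
```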


Predictive Parser (or) LL(1) Parser

LL(1) stands for:
L - Left-to-right scan of input,
L - uses a Leftmost derivation,
1 - the parser uses one input symbol of lookahead at each step in making parsing action decisions.

It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls. The key problem during predictive parsing is that of determining the production to be applied for a nonterminal. The nonrecursive parser in the figure looks up the production to be applied in a parsing table. In what follows, we shall see how the table can be constructed directly from certain grammars.


A predictive parser has

Input - The input buffer contains the string to be parsed, followed by $, a symbol used as a right endmarker to indicate the end of the input string.

Stack - The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the start symbol of the grammar on top of $.

Parsing Table - The parsing table is a two-dimensional array M[A, a], where A is a nonterminal and a is a terminal or the symbol $. The parser is controlled by a program that behaves as follows. The program considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the action of the parser. There are three possibilities.

1. If X= a=$, the parser halts and announces successful completion of parsing.

2. If X=a!=$, the parser pops X off the stack and advances the input pointer to the next input symbol.

3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error recovery routine.


Algorithm for Nonrecursive predictive parsing

Input: A string w and a parsing table M for grammar G.

Output: If w is in L(G), a leftmost derivation of w; otherwise, an error indication.

Method: Initially, the parser is in a configuration in which it has $S on the stack with S, the start symbol of G, on top, and w$ in the input buffer. The following program uses the predictive parsing table M to produce a parse for the input.

set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a nonterminal */
        if M[X, a] = X → Y1Y2...Yk then begin
            pop X from the stack;
            push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
            output the production X → Y1Y2...Yk
        end
        else error()
until X = $ /* stack is empty */
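The driver loop above can be sketched in Python for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id used in the worked example of this unit. The table entries are assumptions of this sketch, filled in by the table-construction rules from the FIRST and FOLLOW sets of that example; an empty body stands for an ε-production, and ll1_parse is a hypothetical name.

```python
# Assumed LL(1) table for E -> TE', E' -> +TE' | e, T -> FT', T' -> *FT' | e, F -> (E) | id
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Nonrecursive predictive parsing; returns the productions used."""
    stack = ["$", start]              # start symbol on top of $
    tokens = list(tokens) + ["$"]     # w$ in the input buffer
    ip = 0
    output = []
    while True:
        X, a = stack[-1], tokens[ip]
        if X == "$" and a == "$":
            return output                     # successful completion
        if X not in NONTERMINALS:             # X is a terminal or $
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()
            ip += 1
        else:
            body = TABLE.get((X, a))
            if body is None:                  # error entry
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(body))      # push Yk ... Y1, with Y1 on top
            output.append(f"{X} -> {' '.join(body) or 'ε'}")


for step in ll1_parse(["id", "+", "id", "*", "id"]):
    print(step)
```

The printed productions, read top to bottom, form the leftmost derivation of id + id * id.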


Predictive parsing table construction

The construction of a predictive parser is aided by two functions associated with a grammar G :

FIRST

FOLLOW

Rules for FIRST( )

1. If X is terminal, then FIRST(X) is {X}.

2. If X → ε is a production, then add ε to FIRST(X).

3. If X is non-terminal and X → aα is a production then add a to FIRST(X).

4. If X is non-terminal and X → Y1 Y2…Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), …, FIRST(Yi-1); that is, Y1…Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).

Rules for FOLLOW( )

1. If S is the start symbol, then place $ in FOLLOW(S).

2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).

3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
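The FIRST and FOLLOW rules are applied repeatedly until no set changes. A minimal sketch of that fixpoint computation, assuming a grammar is a dict from each nonterminal to a list of bodies (lists of symbols, with [] for ε) and that any symbol not used as a key is a terminal. The function names are hypothetical; on the expression grammar of the worked example in this unit it produces the FIRST and FOLLOW sets listed there.

```python
EPS = "ε"


def first_of_seq(seq, first):
    """FIRST of a string of symbols, given FIRST sets of the nonterminals."""
    result = set()
    for sym in seq:
        f = first.get(sym, {sym})      # rule 1: FIRST(a) = {a} for a terminal
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                    # every symbol can derive ε (or seq is empty)
    return result


def first_follow(grammar, start):
    """grammar: dict nonterminal -> list of bodies (lists of symbols; [] is ε)."""
    first = {A: set() for A in grammar}
    follow = {A: set() for A in grammar}
    follow[start].add("$")             # FOLLOW rule 1
    changed = True
    while changed:                     # iterate both sets to a fixpoint
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                new = first_of_seq(body, first)   # FIRST rules 2-4
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
                for i, B in enumerate(body):
                    if B not in grammar:
                        continue
                    rest = first_of_seq(body[i + 1:], first)
                    add = rest - {EPS}             # FOLLOW rule 2
                    if EPS in rest:
                        add |= follow[A]           # FOLLOW rule 3
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return first, follow


G = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], []],
     "T": [["F", "T'"]], "T'": [["*", "F", "T'"], []],
     "F": [["(", "E", ")"], ["id"]]}
first, follow = first_follow(G, "E")
for A in G:
    print(f"FIRST({A}) = {sorted(first[A])}   FOLLOW({A}) = {sorted(follow[A])}")
```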


Algorithm for construction of predictive parsing table

Input : Grammar G

Output : Parsing table M

Method :

1. For each production A → α of the grammar, do steps 2 and 3.

2. For each terminal a in FIRST(α), add A → α to M[A, a].

3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].

4. Make each undefined entry of M be error.
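Steps 1 to 4 of the table-construction algorithm can be sketched as follows. To keep the block short, the FIRST sets of the nonterminals and the FOLLOW sets are hard-coded from the worked example for E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id; build_table and first_of_seq are hypothetical names. Each table cell holds a list of productions, so a cell with more than one entry signals that the grammar is not LL(1); missing cells are error entries.

```python
EPS = "ε"
# Worked-example data (assumed input): productions, FIRST of nonterminals, FOLLOW sets.
PRODUCTIONS = [
    ("E", ["T", "E'"]), ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]), ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"+", "*", "$", ")"}}


def first_of_seq(seq):
    result = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})   # a terminal's FIRST set is itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}


def build_table(productions):
    table = {}
    for head, body in productions:               # step 1
        f = first_of_seq(body)
        targets = f - {EPS}                      # step 2
        if EPS in f:
            targets |= FOLLOW[head]              # step 3 ($ is already in FOLLOW)
        for a in targets:
            table.setdefault((head, a), []).append(body)
    return table                                 # step 4: absent keys are errors


table = build_table(PRODUCTIONS)
print("LL(1):", all(len(cell) == 1 for cell in table.values()))
```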

Implementation of predictive parser

1. Eliminate ambiguity and left recursion, and left-factor the grammar.

2. Construct FIRST( ) and FOLLOW( ) for all non-terminals.

3. Construct predictive parsing table.

4. Parse the given input string using stack and parsing table


Example

Consider the following grammar:

E → E + T | T
T → T * F | F
F → (E) | id

Solution

Step 1: Eliminating left recursion

E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

Step 2: Compute FIRST( ) and FOLLOW( )

FIRST( ) for all nonterminals:
FIRST(E) = { (, id }
FIRST(E’) = { +, ε }
FIRST(T) = { (, id }
FIRST(T’) = { *, ε }
FIRST(F) = { (, id }

FOLLOW( ) for all nonterminals:
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T’) = { +, $, ) }
FOLLOW(F) = { +, *, $, ) }


Step 3: Predictive parsing Table

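Actions performed in predictive parsing:
1. Match (pop a terminal that equals the current input symbol and advance the input)
2. Expand (replace the nonterminal on top of the stack by the body of the production in M[X, a])
3. Accept
4. Error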


Step 4: Stack Implementation

Consider the input string id+id*id.


LL(1) grammar

If every entry of the parsing table contains at most one production, the grammar is called an LL(1) grammar.

Example

Consider the following grammar:

S → iEtS | iEtSeS | a
E → b

Solution

Step 1: Left factoring

S → iEtSS’ | a
S’ → eS | ε
E → b

Step 2: Compute FIRST( ) and FOLLOW( )

To construct a parsing table, we need FIRST( ) and FOLLOW( ) for all the nonterminals.

FIRST(S) = { i, a }
FIRST(S’) = { e, ε }
FIRST(E) = { b }


FOLLOW(S) = { $ ,e }

FOLLOW(S’) = { $ ,e }

FOLLOW(E) = {t}

Step 3: Predictive parsing Table

Since the entry M[S’, e] contains more than one production (S’ → eS and S’ → ε), the grammar is not an LL(1) grammar.
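The conflict can be checked mechanically with the same table-construction rules, sketched here with the FIRST and FOLLOW sets computed above hard-coded (names hypothetical, [] standing for ε). The only cell that receives two productions is M[S', e].

```python
EPS = "ε"
# Left-factored grammar and the FIRST/FOLLOW sets computed above (assumed input).
PRODUCTIONS = [("S", ["i", "E", "t", "S", "S'"]), ("S", ["a"]),
               ("S'", ["e", "S"]), ("S'", []), ("E", ["b"])]
FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
FOLLOW = {"S": {"$", "e"}, "S'": {"$", "e"}, "E": {"t"}}


def first_of_seq(seq):
    result = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}


table = {}
for head, body in PRODUCTIONS:
    f = first_of_seq(body)
    targets = (f - {EPS}) | (FOLLOW[head] if EPS in f else set())
    for a in targets:
        table.setdefault((head, a), []).append(body)

conflicts = {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}
print(conflicts)  # one conflicting cell: M[S', e]
```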


Topic

Shift Reduce Parser

Bottom-up Parsing

Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing. A general type of bottom-up parser is a shift-reduce parser.

Handles

A handle of a string is a substring that matches the right side of a production, and whose reduction to the non-terminal on the left side of the production represents one step along the reverse of a rightmost derivation.

Example

Consider the grammar:

E → E + E
E → E * E
E → (E)
E → id

and the input string id1 + id2 * id3.

The rightmost derivation is:

E → E + E → E + E * E → E + E * id3 → E + id2 * id3 → id1 + id2 * id3

In the above derivation, the handles reduced when the derivation is traced in reverse are id1, id2, id3, E * E and E + E, in that order.


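Handle pruning

A rightmost derivation in reverse can be obtained by "handle pruning". That is, if w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some as yet unknown rightmost derivation S = γ0 ⇒ γ1 ⇒ … ⇒ γn = w. Locating the handle in γn and replacing it by the head of its production gives γn-1; repeating this until only S remains yields the rightmost derivation in reverse.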

Shift-reduce Parsing

Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).

Actions in shift-reduce parser

shift - The next input symbol is shifted onto the top of the stack.

reduce - The parser replaces the handle within a stack with a non-terminal.

accept - The parser announces successful completion of parsing.

error - The parser discovers that a syntax error has occurred and calls an error recovery routine.

Conflicts in shift-reduce parsing

There are two conflicts that occur in shift-reduce parsing:

1. Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.

2. Reduce-reduce conflict: The parser cannot decide which of several reductions to make.
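A rough sketch of a shift-reduce parser for E → E + E | E * E | (E) | id. It is not an LR parser: handles are recognised by pattern-matching the top of the stack, and the shift-reduce conflict that arises on id + id * id is resolved by an assumed precedence rule (* binds tighter than +, and both are left-associative). The function name and the trace format are illustrative.

```python
PREC = {"+": 1, "*": 2}  # assumed: * binds tighter than +, both left-associative


def shift_reduce(tokens):
    """Shift-reduce parse for E -> E + E | E * E | (E) | id; returns the actions."""
    stack, trace = ["$"], []
    tokens = list(tokens) + ["$"]
    ip = 0
    while True:
        a = tokens[ip]
        # Look for a handle on top of the stack and reduce it.
        if stack[-1] == "id":
            stack[-1] = "E"
            trace.append("reduce E -> id")
            continue
        if stack[-3:] == ["(", "E", ")"]:
            stack[-3:] = ["E"]
            trace.append("reduce E -> (E)")
            continue
        if len(stack) >= 4 and stack[-3] == "E" and stack[-2] in PREC and stack[-1] == "E":
            op = stack[-2]
            # Shift-reduce conflict: shift only if the lookahead binds tighter.
            if not (a in PREC and PREC[a] > PREC[op]):
                stack[-3:] = ["E"]
                trace.append(f"reduce E -> E {op} E")
                continue
        if stack == ["$", "E"] and a == "$":
            trace.append("accept")
            return trace
        if a == "$":
            raise SyntaxError(f"cannot reduce stack {stack}")
        stack.append(a)
        ip += 1
        trace.append(f"shift {a}")


for action in shift_reduce(["id", "+", "id", "*", "id"]):
    print(action)
```

With stack $ E + E and lookahead *, both "reduce by E → E + E" and "shift *" are possible; the sketch shifts, so E * E is reduced first.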


Example

E → E + E
E → E * E
E → (E)
E → id

and the input string id1 + id2 * id3

Solution


1. Shift-reduce conflict

Example

Consider the grammar E → E + E | E * E | id and input id + id * id.


2. Reduce-reduce conflict:

Consider the grammar:

M→R+R|R+c|R

R→c

and input c+c

Viable prefixes:

α is a viable prefix of the grammar if there is a w such that αw is a right-sentential form.

The set of prefixes of right-sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.

The set of viable prefixes is a regular language.


Topic

LR Parser - LR(0) Item

Construction of SLR Parsing Table

LR Parsers

An efficient bottom-up syntax analysis technique that can be used to parse a large class of CFGs is called LR(k) parsing. The ‘L’ is for left-to-right scanning of the input, the ‘R’ for constructing a rightmost derivation in reverse, and the ‘k’ for the number of input symbols of lookahead used in making parsing decisions. When ‘k’ is omitted, it is assumed to be 1.

Advantages of LR parsing
1. It recognizes virtually all programming language constructs for which a CFG can be written.
2. It is an efficient non-backtracking shift-reduce parsing method.
3. The class of grammars that can be parsed using the LR method is a proper superset of the class of grammars that can be parsed with predictive parsers.
4. It detects a syntactic error as soon as possible.

Drawbacks of LR method

It is too much work to construct an LR parser by hand for a programming language grammar. A specialized tool, called an LR parser generator, is needed. Example: YACC.

Types of LR parsing method
1. SLR - Simple LR: easiest to implement, least powerful.
2. CLR - Canonical LR: most powerful, most expensive.
3. LALR - Look-Ahead LR: intermediate in size and cost between the other two methods.


The LR parsing algorithm

The schematic form of an LR parser is as follows:

It consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto). The driver program is the same for all LR parsers. The parsing program reads characters from an input buffer one at a time. The program uses a stack to store a string of the form s0X1s1X2s2…Xmsm, where sm is on top. Each Xi is a grammar symbol and each si is a state. The parsing table consists of two parts: the action and goto functions.


Action

The parsing program determines sm, the state currently on top of the stack, and ai, the current input symbol. It then consults action[sm, ai] in the action table, which can have one of four values:
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept,
4. error.

Goto

The function goto takes a state and a grammar symbol as arguments and produces a state.

LR Parsing algorithm

Input: An input string w and an LR parsing table with functions action and goto for grammar G.

Output: If w is in L(G), a bottom-up parse for w; otherwise, an error indication.

Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. The parser then executes the following program:


set ip to point to the first input symbol of w$;
repeat forever begin
    let s be the state on top of the stack and a the symbol pointed to by ip;
    if action[s, a] = shift s' then begin
        push a then s' on top of the stack;
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce A → β then begin
        pop 2 * |β| symbols off the stack;
        let s' be the state now on top of the stack;
        push A then goto[s', A] on top of the stack;
        output the production A → β
    end
    else if action[s, a] = accept then
        return
    else error()
end
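The LR driver can be sketched in Python. The ACTION and GOTO entries below are copied from the SLR table constructed in Example 1 of this unit for 1. S → aSbS, 2. S → a; the stack alternates grammar symbols and states as s0X1s1…Xmsm. The names lr_parse, ACTION, GOTO and PRODUCTIONS are hypothetical.

```python
# SLR table from Example 1 (assumed input): 1. S -> aSbS   2. S -> a
ACTION = {
    (0, "a"): ("shift", 2),
    (1, "$"): ("accept", None),
    (2, "a"): ("shift", 2), (2, "b"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "b"): ("shift", 4),
    (4, "a"): ("shift", 2),
    (5, "b"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 3, (4, "S"): 5}
PRODUCTIONS = {1: ("S", ["a", "S", "b", "S"]), 2: ("S", ["a"])}


def lr_parse(w):
    """LR driver: returns the productions used (a rightmost derivation in reverse)."""
    stack = [0]                       # s0 X1 s1 ... Xm sm, with sm on top
    tokens = list(w) + ["$"]
    ip = 0
    output = []
    while True:
        s, a = stack[-1], tokens[ip]
        entry = ACTION.get((s, a))
        if entry is None:
            raise SyntaxError(f"error in state {s} on input {a!r}")
        kind, arg = entry
        if kind == "shift":
            stack += [a, arg]          # push a, then the new state
            ip += 1
        elif kind == "reduce":
            head, body = PRODUCTIONS[arg]
            del stack[len(stack) - 2 * len(body):]   # pop 2*|body| entries
            s_prime = stack[-1]
            stack += [head, GOTO[(s_prime, head)]]
            output.append(f"{head} -> {''.join(body)}")
        else:                          # accept
            return output


print(lr_parse("aaba"))  # ['S -> a', 'S -> a', 'S -> aSbS']
```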


Augmented Grammar

If G is a grammar with start symbol S, then G’, the augmented grammar for G, is G with a new start symbol S’ and production S’ -> S. The purpose of this new starting production is to indicate to the parser when it should stop parsing and announce acceptance of the input, i.e., acceptance occurs when and only when the parser is about to reduce by S’ -> S.

Constructing SLR(1) Parsing Table

To perform SLR parsing, take the grammar as input and do the following:
1. Find the LR(0) items.
2. Complete the closure.
3. Compute goto(I, X), where I is a set of items and X is a grammar symbol.

LR(0) items

An LR(0) item of a grammar G is a production of G with a dot at some position of the right side. For example, production A → XYZ yields the four items:

A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.


Closure operation

If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by the two rules:

1. Initially, every item in I is added to closure(I).

2. If A → α.Bβ is in closure(I) and B → γ is a production, then add the item B → .γ to closure(I), if it is not already there. We apply this rule until no more new items can be added to closure(I).

Goto operation

Goto(I, X) is defined to be the closure of the set of all items [A→ αX . β] such that [A→ α .Xβ] is in I.
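The closure and goto operations, together with the loop that collects all reachable item sets, can be sketched as follows. An item is a tuple (head, body, dot), where dot is the index of the symbol after the dot; the grammar is the augmented grammar S' → S, S → aSbS | a from Example 1 of this unit, and all names are hypothetical. Symbols are visited in a fixed sorted order, which here happens to reproduce the numbering I0 to I5 used in that example.

```python
# Augmented grammar of Example 1 (assumed input): S' -> S, S -> aSbS | a
GRAMMAR = {"S'": [("S",)], "S": [("a", "S", "b", "S"), ("a",)]}


def closure(items):
    """Items are (head, body, dot); add B -> .gamma for every item A -> alpha.B beta."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for _, body, dot in list(result):
            if dot < len(body) and body[dot] in GRAMMAR:
                for gamma in GRAMMAR[body[dot]]:
                    item = (body[dot], gamma, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return frozenset(result)


def goto(items, X):
    """Closure of all items A -> alpha X . beta such that A -> alpha . X beta is in items."""
    return closure({(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == X})


def canonical_collection(start="S'"):
    states = [closure({(start, GRAMMAR[start][0], 0)})]
    symbols = sorted({sym for bodies in GRAMMAR.values() for b in bodies for sym in b})
    i = 0
    while i < len(states):
        for X in symbols:
            J = goto(states[i], X)
            if J and J not in states:
                states.append(J)
        i += 1
    return states


states = canonical_collection()
for n, state in enumerate(states):
    items = sorted(f"{h} -> {' '.join(b[:d])} . {' '.join(b[d:])}" for h, b, d in state)
    print(f"I{n}:", "; ".join(items))
```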

Steps to construct SLR parsing table for grammar G are:

1. Augment G and produce G’

2. Construct the canonical collection of set of items C for G’

3. Construct the parsing action function action and goto using the following algorithm, which requires FOLLOW(A) for each non-terminal of the grammar.


Algorithm for construction of LR(0) or SLR(1) parsing table:

Input : An augmented grammar G’

Output : The SLR parsing table functions action and goto for G’

Method :

1. Construct C = {I0, I1, …. In}, the collection of sets of LR(0) items for G’.

2. State i is constructed from Ii. The parsing functions for state i are determined as follows:

(a) If [A→α∙aβ] is in Ii and goto(Ii,a) = Ij, then set action[i,a] to “shift j”. Here a must be a terminal.

(b) If [A→α∙] is in Ii, then set action[i,a] to “reduce A→α” for all a in FOLLOW(A); here A may not be S’.

(c) If [S’→S.] is in Ii, then set action[i,$] to “accept”.

If any conflicting actions are generated by the above rules, we say the grammar is not SLR(1).

3. The goto transitions for state i are constructed for all non-terminals A using the rule: if goto(Ii,A) = Ij, then goto[i,A] = j.

4. All entries not defined by rules (2) and (3) are made “error”.

5. The initial state of the parser is the one constructed from the set of items containing [S’→.S].
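Putting the pieces together, here is a compact sketch of the SLR table algorithm for the augmented grammar of Example 1 (S' → S, S → aSbS | a). FOLLOW(S) = {b, $} is hard-coded from the example rather than computed, productions are numbered by their position in GRAMMAR (0 is S' → S, 1 is S → aSbS, 2 is S → a, matching the example), and a conflicting entry raises an error, mirroring "the grammar is not SLR(1)". All names are hypothetical.

```python
# Augmented grammar of Example 1; production numbers are list indices.
GRAMMAR = [("S'", ("S",)), ("S", ("a", "S", "b", "S")), ("S", ("a",))]
NONTERMINALS = {"S'", "S"}
FOLLOW = {"S": {"b", "$"}}  # taken from the worked example, not computed here


def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for head, b in GRAMMAR:
                item = (head, b, 0)
                if head == body[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, X):
    return closure({(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == X})


def set_entry(action, key, value):
    if action.get(key, value) != value:   # two different actions for one cell
        raise ValueError(f"conflict at {key}: grammar is not SLR(1)")
    action[key] = value


def slr_table():
    states = [closure({("S'", ("S",), 0)})]         # rule 5: initial state
    symbols = sorted({s for _, b in GRAMMAR for s in b})
    action, goto_table = {}, {}
    i = 0
    while i < len(states):
        for X in symbols:
            J = goto(states[i], X)
            if not J:
                continue
            if J not in states:
                states.append(J)
            j = states.index(J)
            if X in NONTERMINALS:
                goto_table[(i, X)] = j                      # rule 3
            else:
                set_entry(action, (i, X), ("shift", j))     # rule 2(a)
        for head, body, dot in states[i]:
            if dot == len(body):
                if head == "S'":
                    set_entry(action, (i, "$"), ("accept",))      # rule 2(c)
                else:
                    n = GRAMMAR.index((head, body))
                    for a in FOLLOW[head]:
                        set_entry(action, (i, a), ("reduce", n))  # rule 2(b)
        i += 1
    return action, goto_table


action, goto_table = slr_table()
for key in sorted(action):
    print(key, action[key])
print(goto_table)
```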


Example 1

Construct the SLR parsing table for the following grammar and also check whether the input aaba is acceptable or not.

S ‐‐> aSbS

S ‐‐> a

Solution

Step1: Construct the augmented grammar G’:

S’ ‐‐> S

S ‐‐> aSbS

S ‐‐> a

Step 2: Set of LR(0) item sets

I0:
S’ -> .S
S -> .aSbS
S -> .a


Step 3: Closure Operation

State Item Notes

I0 [S’ -> .S] start operation; read on S goes to I1 (state 1)

[S -> .aSbS] complete operation on S rule; read on ‘a’ goes to I2 (state 2)

[S -> .a] continue complete for all rules ‘S’; ditto the read on ‘a’, to state 2

I1 [S’ -> S.] read on ‘S’ from first line; note: never read on ‘$’; nothing to read on; nothing to complete

I2 [S -> a.SbS] from read on ‘a’ from state I0; read on ‘S’ goes to I3 (state 3)

[S -> a.] continue from read on ‘a’ from state I0 (see step 2 of state creation) nothing to read on; nothing to complete

[S -> .aSbS] complete the state because of ‘.S’ in the first item read on ‘a’ cycles back to state 2

[S -> .a] continue complete of all grammar rules for ‘S’ ditto read on ‘a’ cycles back to state 2


I3 [S -> aS.bS] from read on ‘S’ from state I2; the dot is before a terminal, so no complete operation; read on ‘b’ goes to I4 (state 4)

I4 [S -> aSb.S] from read on ‘b’ from state I3; read on ‘S’ goes to I5 (state 5)

[S -> .aSbS] complete the state because of ‘.S’ in the first item (note: the dot is always in front for completes); read on ‘a’ cycles back to state 2

[S -> .a] continue complete; ditto read on ‘a’ cycles back to state 2

I5 [S -> aSbS.] from read on ‘S’ from state I4; nothing to read on


Parsing Table Construction

Construction rules

α, β = any string of terminals and/or non‐terminals

X, S’, S = non‐terminals

(When dot is in middle)

1. If [A ‐‐> α.aβ] ∈ Ii and read on ‘a’ produces Ij, then ACTION[i, a] = SHIFT j.

2. If [A ‐‐> α.Xβ] ∈ Ii and read on ‘X’ produces Ij, then GOTO[i, X] = j.

(When dot is at end)

3. If [A ‐‐> α.] ∈ Ii and A ≠ S’, then ACTION[i, a] = REDUCE on A ‐> α for all a ∈ FOLLOW(A).

4. If [S’ ‐‐> S.] ∈ Ii, then ACTION[i, $] = ACCEPT.

Using the parse table construction rules for the augmented grammar G’:

1. S ‐‐> aSbS

2. S ‐‐> a

The FIRST and FOLLOW sets are:

FIRST(S) = {a}

FOLLOW(S) = {b, $}


Procedure (remember that a SHIFT refers to a state, a REDUCE refers to a grammar rule)

State Item Notes

I0 [S’ -> .S$] read on S goes to state 1; dot in middle #2, GOTO[0,S] = 1

[S -> .aSbS] read on ‘a’ for both of these goes to state 2;

[S -> .a] dot in middle #1, ACTION[0,a] = SHIFT 2

I1 [S’ -> S.$] dot at end #4 (only one of these), ACTION[1,$] = ACCEPT

I2 [S -> a.SbS] read on ‘S’ goes to state 3; dot in middle #2, GOTO[2,S] = 3

[S -> a.] dot at end #3, ACTION[2,b] = ACTION[2,$] = REDUCE 2

[S -> .aSbS] read on ‘a’ for both of these cycles back to state 2;

[S -> .a] dot in middle #1, ACTION[2,a] = SHIFT 2



I3 [S -> aS.bS] read on ‘b’ goes to state 4; dot in middle #1, ACTION[3,b] = SHIFT 4

I4 [S -> aSb.S] read on ‘S’ goes to state 5; dot in middle #2, GOTO[4,S] = 5

[S -> .aSbS] read on ‘a’ for both of these cycles back to state 2;

[S -> .a] dot in middle #1, ACTION[4,a] = SHIFT 2

I5 [S -> aSbS.] dot at end #3, ACTION[5,b] = ACTION[5,$] = REDUCE 1


Step 4: Parsing table

ACTION columns: a, b, $; GOTO column: S

State | a | b | $ | S
0 | s2 | | | 1
1 | | | accept |
2 | s2 | r2 | r2 | 3
3 | | s4 | |
4 | s2 | | | 5
5 | | r1 | r1 |


Step 5: Parse the input

Consider the input w = aaba and the given grammar: 1. S ‐‐> aSbS 2. S ‐‐> a

Stack Input Action

$0 aaba$ Compare state 0 with the input a in the parsing table. s2 means shift a and push state 2.

$0a2 aba$ Compare state 2 with the next input a. s2 means shift a and push state 2.

$0a2a2 ba$ Compare state 2 with the next input b. r2 means reduce by the second rule S--> a: pop a2 from the stack and push S.

$0a2S ba$ Look up GOTO[2, S] in the parsing table to get the state number, and push it after S. The state number is 3.

$0a2S3 ba$ Compare state 3 with the next input b. s4 means shift b and push state 4.

$0a2S3b4 a$ Compare state 4 with the next input a. s2 means shift a and push state 2.



$0a2S3b4a2 $ Compare state 2 with the next input $. r2 means reduce by the second rule S--> a: pop a2 from the stack and push S.

$0a2S3b4S $ Look up GOTO[4, S] to get the state number, and push it after S. The state number is 5.

$0a2S3b4S5 $ Compare state 5 with the next input $. r1 means reduce by the first rule S ‐‐> aSbS: pop a2S3b4S5 from the stack and push S.

$0S $ Look up GOTO[0, S] to get the state number, and push it after S. The state number is 1.

$0S1 $ Compare state 1 with the input $. The entry is accept, meaning the input has been parsed successfully.

Accept Input Parsed Successfully


Example 2

Construct the LR(0) parsing table for the given grammar (G). Also check whether the grammar is SLR(1).

S -> aB

B -> bB | b

Solution:

Step 1: Construct Augmented Grammar

S’ -> S

S -> aB

B -> bB | b

Step 2: Construct LR(0) Items

The closure of [S′ -> •S] adds S -> •aB. In S -> •aB the dot is in front of the terminal a, so no B-productions are added:

S′ -> •S

S -> •aB --------------------> I0

Step 3: Construct goto operations

Goto for I0

goto (I0, S)
S′ -> S• ------------> I1

goto (I0, a)
S -> a•B
B -> •bB ------------> I2
B -> •b

Goto for I2

goto (I2, B)
S -> aB• ------------> I4

goto (I2, b)
B -> b•B
B -> b• ------------> I3
B -> •bB
B -> •b

Goto for I3

goto (I3, B)
B -> bB• ------------> I5

goto (I3, b)
B -> b•B
B -> b• ------------> I3
B -> •bB
B -> •b

Step 4: Finite State diagram DFA

The following DFA gives the state transitions of the parser and is useful in constructing the LR parsing table.

I0 = {S′ -> •S, S -> •aB}
I1 = {S′ -> S•}
I2 = {S -> a•B, B -> •bB, B -> •b}
I3 = {B -> b•B, B -> b•, B -> •bB, B -> •b}
I4 = {S -> aB•}
I5 = {B -> bB•}

Transitions: I0 --S--> I1, I0 --a--> I2, I2 --B--> I4, I2 --b--> I3, I3 --B--> I5, I3 --b--> I3

Step 5: Parsing Table

1. S -> aB 2. B -> bB 3. B -> b

FIRST(S) = {a} FOLLOW(S) = {$}

FIRST(B) = {b} FOLLOW(B) = {$}

In the LR(0) table, a state containing a completed item A -> α• gets the reduce action under every terminal:

State | a | b | $ | S | B
0 | S2 | | | 1 |
1 | | | Accept | |
2 | | S3 | | | 4
3 | R3 | S3 / R3 | R3 | | 5
4 | R1 | R1 | R1 | |
5 | R2 | R2 | R2 | |

The grammar is not LR(0): row 3 gives two entries (S3 and R3) for the single terminal b, which is called a shift-reduce conflict. In the SLR(1) table, however, R3 is entered only under the terminals in FOLLOW(B) = {$}, and R1 and R2 likewise only under $. Row 3 then becomes S3 on b and R3 on $, so no entry is multiply defined and the grammar is SLR(1).


Topic

Introduction to LALR Parser

LR(1) Parser or Canonical LR (CLR)

Even more powerful than SLR(1) is the LR(1) parsing method.

An LR(1) item is an LR(0) item extended with a lookahead token.

An LR(1) item consists of:

o A grammar production rule.

o A position in the right-hand side, represented by the dot.

o A lookahead token.

A -> X1 · · · Xi • Xi+1 · · · Xn , l where l is a lookahead token

The • represents how much of the right-hand side has been seen:

o X1 · · · Xi appear on top of the stack.

o Xi+1 · · · Xn are expected to appear in the input buffer.

The lookahead token l is expected to follow once X1 · · · Xn appears on the stack; the reduction A -> X1 · · · Xn is made only when the next input token is l.

An LR(1) state is a set of LR(1) items.


LALR Parser

LALR stands for lookahead LR parser.

It extends LR(0) items by introducing one symbol of lookahead on the input.

It handles a large class of grammars.

The number of states in an LALR parser is smaller than that of the canonical LR(1) parser (it equals the number of SLR states). Hence, LALR is preferable because its tables need less memory.

Most syntactic constructs of programming languages can be expressed conveniently by LALR grammars.

Steps to construct the LALR parsing table

1. Generate the LR(1) items.

2. Find the item sets that have the same set of first components (core) and merge these sets into one.

3. Merge the gotos of the combined item sets.

4. Revise the parsing table of the LR(1) parser by replacing states and gotos with the combined states and combined gotos, respectively.


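LR (1) items

An LR(1) item is defined by a production, the position of the dot and a terminal symbol. The terminal is called the lookahead symbol. The general form of an LR(1) item is [A -> α•β, a]. During closure, an item such as S->α•Aβ , $ causes the items A-> •γ, b to be added for every terminal b in FIRST(β$).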

Rules to create the canonical collection:

1. Every element of I is added to closure(I).

2. If an LR(1) item [X-> A•BC, a] exists in I, and there exists a production B->b1b2….., then add the item [B->•b1b2….., z] for every terminal z in FIRST(Ca), if it is not already in closure(I). Keep applying this rule until no more elements are added.

Construction of LR(1) Table

Rule 1: If there is an item [A->α•Xβ, b] in Ii and goto(Ii,X) = Ij, then set action[Ii][X] = Shift j, where X is a terminal.

Rule 2: If there is an item [A->α•, b] in Ii and A ≠ S`, then set action[Ii][b] = reduce, along with the production number.

Rule 3: If there is an item [S`->S•, $] in Ii, then set action[Ii][$] = Accept.

Rule 4: If there is an item [A->α•Xβ, b] in Ii and goto(Ii,X) = Ij, then set goto[Ii][X] = j, where X is a nonterminal.


Example

S->CC

C->cC

C->d

Construct the CLR and LALR parsing tables.

Solution

Step 1: Construct the augmented grammar

S′->S

S->CC

C->cC

C->d

Step 2: LR(0) items

S′->•S

S->•CC

C->•cC

C->•d


Step 3: LR(1) items

S′->•S, $ ---> 1.1 \\ Begin with the lookahead (LAH) $, because the augmented item always has $ as its lookahead symbol.

The dot symbol is followed by the nonterminal S. So, add the productions starting with S

S->•CC

Now match the item S′->•S, $ with the term A ->α •Xβ,b

A=S’ α=ε X=S β= ε b=$

Then compute FIRST(βb)

FIRST(ε$) = FIRST($) = $

Now

S->•CC , $ -------> 1.2

The dot symbol is followed by a Non terminal C. So, add productions starting with C

C->•cC

C->•d

Now match the item S->•CC , $ with the term A ->α •Xβ,b

A=S α=ε X=C β= C b=$


Then compute FIRST(βb)

FIRST(C$) = FIRST(C) = c/d \\ C does not derive ε, so only the first element C matters

Now

C->•cC , c/d --------> 1.3

C->•d , c/d

Now combine (1.1, 1.2 and 1.3) to get LR(1) items

S′->•S, $

S->•CC , $

C->•cC , c/d ----------------> I0

C->•d , c/d

Step 4: Compute Goto

In each goto operation, move the dot one position to the right.

Goto of I0

goto ( I0, S)

S′->S•,$ ------------------> I1


goto (I0 , C)

S-> C•C, $ \\The dot symbol is followed by a Non terminal C. So, add productions starting with C

C->•cC , $ --------> I2 \\ S-> C•C, $ match with A ->α •Xβ,b then A=S α=C X=C β= ε b=$

C->•d,$ \\ Then FIRST(βb) = FIRST(ε$) = FIRST($) = $

goto(I0 ,c)

C->c•C , c/d \\The dot symbol is followed by C. So, add productions starting with C

C->•cC , c/d --------> I3 \\ C->c•C , c/d match with A ->α •Xβ,b then A=C α=c X=C β= ε b=c/d

C->•d, c/d \\ Then FIRST(βb) = FIRST(εc/d) = FIRST(c/d) = c/d

goto( I0 , d)

C->d•, c/d ------------> I4

No more moves from I1, because the dot is already at the end.

Goto of I2

goto ( I2, C)

S->CC•,$ -------------->I5


goto ( I2, c)

C->c•C , $ \\The dot symbol is followed by C. So, add productions starting with C

C->•cC, $ -----------> I6 \\ C->c•C , $ match with A ->α •Xβ,b then A=C α=c X=C β= ε b=$

C->•d , $ \\ Then FIRST(βb) = FIRST(ε$) = FIRST($) = $

goto ( I2, d)

C->d•,$ -------------> I7

Goto of I3

goto (I3 , C)

C->cC•, c/d -----------> I8

goto(I3, c)

C->c•C , c/d \\The dot symbol is followed by C. So, add productions starting with C

C->•cC , c/d --------> I3 \\ C->c•C , c/d match with A ->α •Xβ,b then A=C α=c X=C β= ε b=c/d

C->•d, c/d \\ Then FIRST(βb) = FIRST(εc/d) = FIRST(c/d) = c/d


goto (I3 , d)

C->d•, c/d ------------> I4

No more moves from I4 and I5, because the dot is already at the end.

Goto of I6

goto (I6 , C)

C->cC• , $ ------------> I9

goto (I6 , c)

C->c•C , $ \\The dot symbol is followed by C. So, add productions starting with C

C->•cC, $ -----------> I6 \\ C->c•C , $ match with A ->α •Xβ,b then A=C α=c X=C β= ε b=$

C->•d , $ \\ Then FIRST(βb) = FIRST(ε$) = FIRST($) = $

goto (I6 , d)

C->d•,$ -------------> I7

No more moves from I7, I8 and I9, because the dot is already at the end.


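Step 4: Finite State Machine DFA for the above LR (1) items

The transitions of the DFA are:

I0 --S--> I1, I0 --C--> I2, I0 --c--> I3, I0 --d--> I4

I2 --C--> I5, I2 --c--> I6, I2 --d--> I7

I3 --C--> I8, I3 --c--> I3, I3 --d--> I4

I6 --C--> I9, I6 --c--> I6, I6 --d--> I7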


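Step 5: CLR Parsing Table

Productions: 1. S->CC 2. C->cC 3. C->d

State | c | d | $ | S | C
0 | s3 | s4 | | 1 | 2
1 | | | accept | |
2 | s6 | s7 | | | 5
3 | s3 | s4 | | | 8
4 | r3 | r3 | | |
5 | | | r1 | |
6 | s6 | s7 | | | 9
7 | | | r3 | |
8 | r2 | r2 | | |
9 | | | r2 | |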


Step 6: LALR Parser Construction

Consider the grammar in the previous example. Consider the states I4 and I7 as given below:

C->d•, c/d ------------> I4

C->d•,$ -------------> I7

These states differ only in their lookaheads; they have the same productions with the dot in the same position (the same core). Hence these states are combined to form a single state called I47.

C->d•, c/d/$ -----------> I47

Similarly, the states I3 and I6 differ only in their lookaheads, as given below:

C->c•C , c/d

C->•cC , c/d --------> I3

C->•d, c/d

C->c•C , $

C->•cC , $ --------> I6

C->•d, $

These states also differ only in their lookaheads and have the same core. Hence these states are combined to form a single state called I36.

C->c•C , c/d/ $

C->•cC , c/d/ $ --------> I36

C->•d, c/d/ $


Similarly, the states I8 and I9 differ only in their lookaheads.

C->cC•, c/d -----------> I8

C->cC• , $ ------------> I9

Hence they are combined to form the state I89.

C->cC•, c/d/ $ -----------> I89

Therefore, the final states of the LALR(1) parser are:

I0: S′->•S, $ ; S->•CC , $ ; C->•cC , c/d ; C->•d , c/d

I1: S′->S•,$

I2: S-> C•C, $ ; C->•cC , $ ; C->•d,$

I36: C->c•C , c/d/ $ ; C->•cC , c/d/ $ ; C->•d, c/d/ $

I47: C->d•, c/d/$

I5: S->CC•,$

I89: C->cC•, c/d/ $


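Step 7: LALR Parsing Table

Productions: 1. S->CC 2. C->cC 3. C->d

State | c | d | $ | S | C
0 | s36 | s47 | | 1 | 2
1 | | | accept | |
2 | s36 | s47 | | | 5
36 | s36 | s47 | | | 89
47 | r3 | r3 | r3 | |
5 | | | r1 | |
89 | r2 | r2 | r2 | |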


Topic

YACC

YACC – Automatic Parser Generator

YACC is an automatic tool that generates a parser program. YACC stands for Yet Another Compiler-Compiler, and it is available on the UNIX operating system. Constructing an LR parser by hand requires a lot of work, so the construction is automated to parse the input efficiently. Basically, YACC is an LALR parser generator; it reports conflicts or ambiguities in the grammar (if any are present) as diagnostic messages.


Basic Specifications

A YACC specification file consists of three parts, separated by lines containing %%:

Declaration section

In this section, ordinary C declarations are written between %{ and %}, and the grammar tokens are declared with %token (for example, %token DIGIT).


Translation rule section

It includes the production rules of the context-free grammar with the corresponding actions:

Rule-1 {action-1}
Rule-2 {action-2}
…
Rule-n {action-n}

If there is more than one alternative for a single rule, the alternatives are separated by the ‘|’ (pipe) character. The actions are ordinary C statements. If the CFG is

LHS: alternative 1 | alternative 2 | …… | alternative n

then it is written as

LHS: alternative 1 {action 1}
| alternative 2 {action 2}
…
| alternative n {action n}
;


C functions Section

This section consists of a main function in which the routine yyparse() is called. It also contains the other required C functions, such as yylex().

Example

YACC Specification of a simple desk calculator:

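%{
#include <ctype.h>
#include <stdio.h>
int yylex(void);
void yyerror(const char *s);
%}

%token DIGIT

%%

line   : expr '\n'          { printf("%d\n", $1); }
       ;

expr   : expr '+' term      { $$ = $1 + $3; }
       | term
       ;

term   : term '*' factor    { $$ = $1 * $3; }
       | factor
       ;

factor : '(' expr ')'       { $$ = $2; }
       | DIGIT
       ;

%%

/* Lexical analyzer: a digit is returned as DIGIT with its value in yylval;
   any other character (including '\n') is returned as itself. */
int yylex(void) {
    int c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';
        return DIGIT;
    }
    return c;
}

/* Called by the generated parser on a syntax error. */
void yyerror(const char *s) {
    fprintf(stderr, "%s\n", s);
}

int main(void) {
    return yyparse();
}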

