chapter 6 simplification of context-free grammars and normal forms


Upload: aerona

Post on 23-Jan-2016

Page 1: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Chapter 6

Simplification of Context-free Grammars and Normal Forms

These class notes are based on material from our textbook, An Introduction to Formal Languages and Automata, 4th ed., by Peter

04/21/23 19:54

Page 2: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Parsing

• Given a string w and a grammar G, a parser finds a derivation of the string w from the grammar G, or else determines that the string is not part of the language

• Thus, a parser solves the membership problem for a language, which is the problem of deciding, for any string w and grammar G, whether w belongs to the language generated by G

• Typically, a parser also constructs a parse tree for the string (which can be used by a compiler for code generation)

Page 3: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Two questions

• Can we solve the membership problem for context-free languages? That is, can we develop a parsing algorithm for any context-free language?

• If so, can we develop an efficient parsing algorithm?

• We saw in the previous chapter that we can, if we place restrictions on the grammar.


Page 4: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Simplified forms and normal forms

Simplified forms can eliminate ambiguity and otherwise “improve” a grammar

What we would like is for every production in a CFG to have a form such that the length of the sentential form never decreases during a derivation. Once the productions are in this form, whenever the sentential form we are deriving becomes longer than the input string, we know that this derivation cannot produce the input string and can be abandoned.

Page 5: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Simplified forms and normal forms

Normal forms of context-free grammars are interesting in that, although they are restricted forms, it can be shown that every CFG can be converted to a normal form.

The two types of normal forms that we will look at are Chomsky normal form and Greibach normal form.


Page 6: Chapter 6 Simplification of Context-free Grammars and Normal Forms

The empty string

The empty string often complicates things, so we would like to define (and work with) the subset of a language that excludes the empty string.

Let L be a context-free language and let G’ = (V, T, S, P) be a context-free grammar for L – { λ }.

Then, if λ is in L, we can construct a grammar G that generates L by adding the following to G’:

Create a new start variable, S0

Add two new production rules to G’:

S0 → S

S0 → λ

Page 7: Chapter 6 Simplification of Context-free Grammars and Normal Forms

The empty string

Most of the proofs for CFG languages are demonstrated by using λ-free languages. It usually can be shown quite easily that the proof can also be extended to “equivalent” languages for which the only difference is the acceptance of the empty string.

(yes, this is handwaving, but . . .)


Page 8: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Simplified forms

Theorem 6.1: Let G = (V, T, S, P) be a context-free grammar. Suppose that P contains a production rule of the form:

A → x1Bx2

Assume that A and B are different variables and that

B → y1 | y2 | . . . | yn

is the set of all productions in P which have B as the left side.

Page 9: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Simplified forms

Theorem 6.1: (continued)

Let G’ = (V, T, S, P’) be the grammar in which P’ is constructed by deleting

A → x1Bx2

from P, and adding to it

A → x1y1x2 | x1y2x2 | . . . | x1ynx2

Then it may be shown that

L(G’) = L(G)

(see the Linz textbook for the proof)

Page 10: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Simplified forms

Example:

A → a | aaA | abBc

B → abbA | b

Here we can’t eliminate the rules with B on the left side, but we can eliminate B from the right side of the A rules by substituting each B rule into A → abBc. The equivalent productions would be:

A → a | aaA | ababbAc | abbc

B → abbA | b
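The substitution in Theorem 6.1 can be sketched in a few lines of Python. This is only an illustration under assumed conventions that are not part of the notes: every symbol is a single character, uppercase letters are variables, and the hypothetical helper `substitute` replaces the first occurrence of B in the chosen rule.

```python
def substitute(productions, a, rule, b):
    """Replace the production a -> rule (which contains variable b)
    by one production per b-rule, as in Theorem 6.1."""
    assert a != b, "the theorem assumes A and B are different variables"
    i = rule.index(b)                      # rule = x1 + b + x2
    x1, x2 = rule[:i], rule[i + 1:]
    new_rules = [r for r in productions[a] if r != rule]
    new_rules += [x1 + y + x2 for y in productions[b]]
    return {**productions, a: new_rules}

grammar = {"A": ["a", "aaA", "abBc"], "B": ["abbA", "b"]}
result = substitute(grammar, "A", "abBc", "B")
print(result["A"])  # ['a', 'aaA', 'ababbAc', 'abbc']
```

The B rules are left in place; whether they are still needed is a separate question about reachability.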

Page 11: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Simplified forms

Example:

Suppose that our complete simplified grammar is:

S → A

A → a | aaA | ababbAc | abbc

B → abbA | b

Since you can’t get to B from S, there is no longer any way that any B rules can play a part in any derivation; they are useless.

Page 12: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Simplified forms

Another example:

Suppose that our grammar is:

S → aSb | λ | A

A → aA

Notice that the production rule A → aA can never be used to produce a string of all terminals. It is therefore useless.

The production rule S → A is also useless. (Why?) Both of these rules may be deleted without changing the language generated by the grammar.

Page 13: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Reachable

Definition: A variable A in a context-free grammar G = (V, T, S, P) is reachable if S ⇒* xAy for some x, y ∈ (V ∪ T)*.

Reachable variables are variables that appear in strings derivable from S.

Page 14: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Example

S → EA

A → abA | ab

C → EC | Ab

E → bC

G → EbE | CE | ba

Reachable variables:

R0 = {S}

R1 = {S, E, A}

R2 = {S, E, A, C}

R3 = {S, E, A, C}
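The sets R0, R1, ... above are the stages of a simple graph search. A minimal sketch, assuming single-character symbols with uppercase letters as variables (the function name `reachable` is made up for this illustration):

```python
def reachable(productions, start):
    """Return the set of variables reachable from the start symbol."""
    found = {start}
    frontier = [start]
    while frontier:
        var = frontier.pop()
        for rhs in productions.get(var, []):
            for sym in rhs:
                if sym.isupper() and sym not in found:
                    found.add(sym)
                    frontier.append(sym)
    return found

grammar = {
    "S": ["EA"],
    "A": ["abA", "ab"],
    "C": ["EC", "Ab"],
    "E": ["bC"],
    "G": ["EbE", "CE", "ba"],
}
print(sorted(reachable(grammar, "S")))  # ['A', 'C', 'E', 'S']
```

G never appears on a right side that can be reached from S, so it is left out of the result.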

Page 15: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Live variables

Definition: Let G = (V, T, S, P) be a context-free grammar.

Let A ∈ V; then A is live iff there is at least one string w ∈ T* such that

A ⇒* w

Informally, live variables are those from which strings of terminals can be derived. Variables which are not live are said to be dead.

Page 16: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Example

S → AB | CD | ADF | CF | EA

A → abA | ab

B → bB | aD | BF | aF

C → cB | EC | Ab

D → bB | FFB

E → bC | AB

F → abbF | baF | bD | BB

G → EbE | CE | ba

Live variables:

L0 = {A, G}

L1 = {A, G, C}

L2 = {A, G, C, E}

L3 = {A, G, C, E, S}
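The live set can be computed as a fixed point: keep adding any variable that has a right side made up only of terminals and variables already known to be live. The sketch below assumes single-character symbols with uppercase letters as variables; `live` is a hypothetical helper name.

```python
def live(productions):
    """Return the variables from which some terminal string can be derived."""
    alive = set()
    changed = True
    while changed:
        changed = False
        for var, rules in productions.items():
            if var in alive:
                continue
            # var is live if some right side uses only terminals and live variables
            if any(all(not s.isupper() or s in alive for s in rhs) for rhs in rules):
                alive.add(var)
                changed = True
    return alive

grammar = {
    "S": ["AB", "CD", "ADF", "CF", "EA"],
    "A": ["abA", "ab"],
    "B": ["bB", "aD", "BF", "aF"],
    "C": ["cB", "EC", "Ab"],
    "D": ["bB", "FFB"],
    "E": ["bC", "AB"],
    "F": ["abbF", "baF", "bD", "BB"],
    "G": ["EbE", "CE", "ba"],
}
print(sorted(live(grammar)))  # ['A', 'C', 'E', 'G', 'S']
```

A single pass of this loop may add several variables at once, so the intermediate sets need not match L0, L1, L2 exactly, but the final set is the same.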

Page 17: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Useful variables

Definition 6.1 (modified): A variable A in a context-free grammar G = (V, T, S, P) is useful if, for some string w ∈ L(G), there is a derivation of w that takes the form S ⇒* xAy ⇒* w, with x, y ∈ (V ∪ T)*.

Informally, a variable is useful if it can be used in a derivation of a string in the language L(G).

A variable which is not useful is said to be useless. Variables which are dead are useless. Variables which are not reachable are useless.

Page 18: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Useless variables

So a variable is useless if either:

1. it is not live (i.e., cannot derive a terminal string), or

2. it is not reachable from the start symbol

A production is useless if it involves any useless variables.


Page 19: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Exercise

Example:

Given G = ({S, A, B, C}, {a, b}, S, P), with P =

S → aS | A | C

A → a

B → aa

C → aCb

eliminate all useless variables and productions.

First, we find any dead variables. C can never generate a string of all terminals, because its only production, C → aCb, reintroduces C. C is dead.

Page 20: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Exercise

Delete any productions involving C.

New grammar:

S → aS | A

A → a

B → aa

Next, we check to see if there are any variables which cannot be reached from the start symbol.

To do this, we may use a dependency graph.

Page 21: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Exercise

Example:

S → aS | A | C

A → a

B → aa

C → aCb

[Figure: dependency graph with nodes S, A, C, and B; edges S → A and S → C; B has no incoming edge]

Clearly, B is not reachable from S.

Page 22: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Exercise

Delete any productions involving B.

New grammar:

S → aS | A

A → a

The only productions that were deleted from the original grammar were useless.

This new grammar generates all and only the strings generated by the original grammar. It is equivalent to the original grammar.
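The two steps of this exercise (drop dead variables first, then drop unreachable ones) can be combined into one routine. The following is a sketch under assumed conventions, not the textbook's procedure verbatim: symbols are single characters, uppercase letters are variables, and `remove_useless` is a made-up name.

```python
def remove_useless(productions, start):
    """Remove dead variables, then unreachable ones, with their productions."""
    # Step 1: find live variables by fixed-point iteration
    alive = set()
    changed = True
    while changed:
        changed = False
        for var, rules in productions.items():
            if var not in alive and any(
                all(not s.isupper() or s in alive for s in rhs) for rhs in rules
            ):
                alive.add(var)
                changed = True
    pruned = {
        v: [r for r in rules if all(not s.isupper() or s in alive for s in r)]
        for v, rules in productions.items()
        if v in alive
    }
    # Step 2: keep only variables reachable from the start symbol
    seen, stack = {start}, [start]
    while stack:
        for rhs in pruned.get(stack.pop(), []):
            for s in rhs:
                if s.isupper() and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return {v: rules for v, rules in pruned.items() if v in seen}

grammar = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
print(remove_useless(grammar, "S"))  # {'S': ['aS', 'A'], 'A': ['a']}
```

The order matters in this sketch: deleting dead variables can make other variables unreachable, so the reachability pass runs second.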

Page 23: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Useless variables

Theorem 6.2: Let G = (V, T, S, P) be a context-free grammar. Then there exists an equivalent grammar G’ = (V’, T’, S, P’) that does not contain any useless variables or productions.

Note that useless variables may be removed from V to give V’, and any terminals not occurring in any useful production may be removed from T to give T’.

Page 24: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Simplified forms and normal forms

Two undesirable types of productions in a CFG can keep the length of a sentential form from increasing:

λ-productions -

these productions are of the form A → λ, and they actually decrease the length of the string

unit productions -

these productions are of the form A → B, and they allow rules to be applied to a string without increasing the length of the string and without getting us any closer to the goal of ending up with a string of all terminals

Page 25: Chapter 6 Simplification of Context-free Grammars and Normal Forms

λ-productions

Definition 6.2: Any production of a context-free grammar of the form

A → λ

is called a λ-production.

Any variable A for which the derivation A ⇒* λ is possible is called nullable.

Page 26: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Nullable variables

A nullable variable in a context-free grammar G = (V, T, S, P) is defined as follows:

1. Any variable A for which P contains the production A → λ is nullable.

2. If P contains the production A → B1B2…Bn and B1, B2, …, Bn are all nullable variables, then A is nullable.

3. No other variables in V are nullable.

The nullable variables in V are precisely those variables A for which A ⇒* λ.

Page 27: Chapter 6 Simplification of Context-free Grammars and Normal Forms

The effect of λ-productions

Suppose we are trying to see if our CFG generates the string aabaa, which contains 5 terminal characters. In the process of applying productions, we have generated an intermediate string, aaYbYaa, containing 7 characters. Since λ-productions decrease the length of the string, it might still be possible to generate aabaa from aaYbYaa (if there were a derivation Y ⇒* λ).

Page 28: Chapter 6 Simplification of Context-free Grammars and Normal Forms

λ-productions

Note that without λ-productions, a grammar would have no way to reduce the number of characters in its intermediate strings. In such a grammar, we could stop processing intermediate strings as soon as they exceeded the length of the target string.

Page 29: Chapter 6 Simplification of Context-free Grammars and Normal Forms

λ-productions

So, given a CFG G without λ-productions, we could determine whether a given string x of length |x| belongs to L(G) simply by applying production rules and generating all sentential forms of length at most |x|. If x has not been generated by that point, it cannot belong to the language.

Page 30: Chapter 6 Simplification of Context-free Grammars and Normal Forms

λ-productions

Given the grammar

S → aS1b

S1 → aS1b | λ

What is the effect of the production S1 → λ?

The effect is to delete S1 wherever it occurs in a sentential form, that is, wherever S1 was introduced by the right-hand side of a production rule.

Page 31: Chapter 6 Simplification of Context-free Grammars and Normal Forms

λ-productions

If we apply the production S1 → λ to

S → aS1b

the resulting production rule is

S → ab

If we apply the production S1 → λ to

S1 → aS1b

the resulting production rule is

S1 → ab

Page 32: Chapter 6 Simplification of Context-free Grammars and Normal Forms

λ-productions

Therefore, we can eliminate any λ-productions from this grammar by adding the new productions obtained by substituting λ for S1 wherever S1 appears on the right-hand side of the production rules, and then deleting the λ-production.

When we do this, we obtain the equivalent grammar:

S → aS1b | ab

S1 → aS1b | ab

Page 33: Chapter 6 Simplification of Context-free Grammars and Normal Forms

λ-productions

Theorem 6.3: Let G be any context-free grammar with λ not in L(G). Then there exists an equivalent grammar G’ having no λ-productions.

Page 34: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Algorithm FindNull

1. Establish the set N0, which is the set of all variables A in the grammar that have a production A → λ.

2. Now loop: on each pass through the loop, add to this set every variable B that has a production B → X1X2…Xn in which every Xi is a variable already in the set. (The first pass adds variables whose right side consists only of variables in N0, the second pass adds variables whose right side consists only of variables found so far, etc. . . .)

3. Stop when no new variables were added to the set during the last iteration of the loop.

Page 35: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Example

Let G be the CFG with the productions:

S → ABCBCDA

A → CD

B → Cb

C → a | λ

D → bD | λ

Here, C and D are nullable because there are production rules C → λ and D → λ.

But A is also nullable, because A → CD, and both C and D are nullable.
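Algorithm FindNull translates almost directly into code. In this sketch the representation is an assumption made for the illustration: each right side is a string of single-character symbols, uppercase letters are variables, and the empty string "" stands for λ.

```python
def find_nullable(productions):
    """Return the set of nullable variables ('' denotes λ)."""
    # N0: variables with a direct λ-production
    nullable = {v for v, rules in productions.items() if "" in rules}
    changed = True
    while changed:
        changed = False
        for v, rules in productions.items():
            # v is nullable if some non-empty right side is made of nullable variables only
            if v not in nullable and any(
                rhs and all(s in nullable for s in rhs) for rhs in rules
            ):
                nullable.add(v)
                changed = True
    return nullable

grammar = {
    "S": ["ABCBCDA"],
    "A": ["CD"],
    "B": ["Cb"],
    "C": ["a", ""],
    "D": ["bD", ""],
}
print(sorted(find_nullable(grammar)))  # ['A', 'C', 'D']
```

B is not nullable because its only right side contains the terminal b, and S is not nullable because its right side contains B.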

Page 36: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Algorithm: Eliminate λ-productions

Given a CFG G = (V, T, S, P), construct a CFG G’ = (V, T, S, P’) with no λ-productions as follows:

1. Initialize P’ = P

2. Find all nullable variables in V, using FindNull.

3. For every production A → x in P (x ∈ (V ∪ T)*), where x contains nullable variables, add to P’ every production that can be obtained from this one by deleting from x one or more of the occurrences in x of nullable variables.

4. Delete all λ-productions from P’.

5. In addition, delete any duplicates and delete productions of the form A → A.

Page 37: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Implications of Theorem 6.3:

Let G = (V, T, S, P) be any context-free grammar, and let G’ be the grammar obtained from G by the previous algorithm. Then:

1. G’ has no λ-productions, and

2. L(G’) = L(G) - {λ}.

3. Moreover, if G is unambiguous, then so is G’.

Page 38: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Example

Given a context-free grammar with the following production rules, find the nullable variables:

S → ABC

A → B | a

B → C | b | λ

C → AB | D

D → Cd

N0 = {B}

N1 = {B, A}

N2 = {B, A, C}

N3 = {B, A, C, S}

Page 39: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Example (continued)

S → ABC

A → B | a

B → C | b | λ

C → AB | D

D → Cd

N = {A, B, C, S}

S → ABC becomes S → ABC | BC | AC | AB | A | B | C

C → AB | D becomes C → AB | A | B | D

D → Cd becomes D → Cd | d

Page 40: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Example (continued)

S → ABC | AB | AC | BC | A | B | C

A → B | a

B → C | b

C → AB | A | B | D

D → Cd | d

Note that we have gotten rid of all λ-productions. However, other beneficial changes can still be made.
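The elimination algorithm can be sketched as follows, using the grammar of this example as input. The representation is assumed for illustration only: right sides are strings of single-character symbols, uppercase letters are variables, "" stands for λ, and `eliminate_lambda` is a hypothetical name.

```python
from itertools import product

def eliminate_lambda(productions):
    """Return an equivalent rule set (up to λ) with no λ-productions."""
    # Find the nullable variables; all() of an empty right side is True,
    # so variables with a direct λ-production are picked up on the first pass.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for v, rules in productions.items():
            if v not in nullable and any(all(s in nullable for s in r) for r in rules):
                nullable.add(v)
                changed = True
    result = {}
    for v, rules in productions.items():
        new_rules = []
        for rhs in rules:
            # each nullable symbol may either be kept or dropped
            options = [(s, "") if s in nullable else (s,) for s in rhs]
            for choice in product(*options):
                candidate = "".join(choice)
                # skip λ, skip A -> A, skip duplicates
                if candidate and candidate != v and candidate not in new_rules:
                    new_rules.append(candidate)
        result[v] = new_rules
    return result

grammar = {
    "S": ["ABC"],
    "A": ["B", "a"],
    "B": ["C", "b", ""],
    "C": ["AB", "D"],
    "D": ["Cd"],
}
result = eliminate_lambda(grammar)
print(result["D"])  # ['Cd', 'd']
```

The rules may come out in a different order than on the slide, so it is safer to compare them as sets.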

Page 41: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Unit productions

Definition 6.3: Any production of a context-free grammar of the form

A → B,

where A, B ∈ V, is called a unit-production.

Page 42: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Unit productions

Theorem 6.4: Let G = (V, T, S, P) be any context-free grammar without λ-productions. Then there exists a context-free grammar G’ = (V’, T’, S, P’) that does not have any unit-productions and that is equivalent to G.

Proof: See p. 159 in the Linz text.


Page 43: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Definition of A-derivable variables

The set of “A-derivable variables” is the set of all variables B for which A ⇒* B.

1. If A → B is a production, then B is A-derivable.

2. If:

C is A-derivable

C → B is a production

B ≠ A

then B is A-derivable.

3. No other variables are A-derivable.

Page 44: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Algorithm: Eliminating Unit Productions

Given a context-free grammar G = (V, T, S, P) with no λ-productions, construct a grammar G’ = (V, T, S, P’) having no unit productions as follows:

1. Initialize P’ to be P.

2. For each A ∈ V, find the set of A-derivable variables.

3. For every pair (A, B) such that B is A-derivable, and every non-unit production B → x (where x ∈ (V ∪ T)+), add the production A → x to P’.

4. Delete all unit productions from P’.

Page 45: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Example

Original grammar:

S → S+T | T

T → T*F | F

F → (S) | a

{S-derivable} = {T} (from S → T)

{T-derivable} = {F} (from T → F)

{S-derivable} = {T, F} (since T is S-derivable and T → F)

Resulting grammar:

S → S+T | T*F | (S) | a

T → T*F | (S) | a

F → (S) | a
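A sketch of the unit-production algorithm applied to this grammar is shown below. It assumes single-character symbols with uppercase letters as variables, so a unit production is simply a right side consisting of one uppercase letter; `eliminate_units` is a made-up name.

```python
def eliminate_units(productions):
    """Return a rule set with no unit productions (input must have no λ-productions)."""
    def is_unit(rhs):
        return len(rhs) == 1 and rhs.isupper()

    result = {}
    for a in productions:
        # collect the a-derivable variables by following unit productions only
        derivable, stack = [], [a]
        while stack:
            for rhs in productions.get(stack.pop(), []):
                if is_unit(rhs) and rhs != a and rhs not in derivable:
                    derivable.append(rhs)
                    stack.append(rhs)
        # keep a's own non-unit rules, then copy the non-unit rules of each derivable variable
        rules = [r for r in productions[a] if not is_unit(r)]
        for b in derivable:
            rules += [r for r in productions.get(b, []) if not is_unit(r) and r not in rules]
        result[a] = rules
    return result

grammar = {"S": ["S+T", "T"], "T": ["T*F", "F"], "F": ["(S)", "a"]}
print(eliminate_units(grammar))
# {'S': ['S+T', 'T*F', '(S)', 'a'], 'T': ['T*F', '(S)', 'a'], 'F': ['(S)', 'a']}
```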

Page 46: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Summary

Theorem 6.5: Let L be a context-free language that does not contain λ. Then there exists a context-free grammar that generates L and that does not have any useless productions, λ-productions, or unit-productions.

Proof: Find a CFG that generates L. Apply the procedures in theorems 6.2, 6.3, and 6.4. The result is an equivalent CFG that generates L but does not have any useless productions, λ-productions, or unit-productions.

Page 47: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Summary

Note that the procedure specified above must occur in a particular order. The procedure for removing λ-productions can create new unit-productions, and the procedure for eliminating unit-productions must start with a CFG that has no λ-productions. The required sequence is:

1. Remove λ-productions

2. Remove unit productions

3. Remove useless productions

Page 48: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Unit productions

Given a context-free grammar G’ without λ-productions or unit productions, any production rule must either:

• Convert a non-terminal to a terminal, or

• Replace a non-terminal with a string of at least two symbols

Page 49: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Unit productions

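Let:

l = length of the current string

t = the number of terminals in the current string

The value of l + t is 1 for the starting string S and 2k for a string (all terminals) of length k in the language.

Each application of a production rule increases l + t by at least 1: a rule that converts a non-terminal to a terminal increases t by 1, and a rule that replaces a non-terminal with two or more symbols increases l by at least 1.

The value of l + t for an intermediate string of length k containing 1 or more variables would be < 2k.

Any intermediate string with l + t > 2k cannot generate a string of length k in the language.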

Page 50: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Simplified forms

What does this mean for us?

Given a grammar G with no λ-productions or unit productions, it means that if you have a string x in L(G) with |x| = k, then starting from S there are no more than 2k - 1 steps in the derivation of x.

Page 51: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Proof:

At the beginning of the derivation of x, the length of the intermediate string, S, is 1. Somehow you need to generate a string of length k. If G has no λ-productions or unit-productions, then there are 2 possible kinds of rules:

1. The rule transforms one non-terminal into some combination of two or more non-terminals and/or terminals

2. The rule transforms one non-terminal into one terminal

Rules of the first type will increase the length of the derivation string by at least one character at each step. So it will take no more than k-1 steps to increase the size of the string to k.

Page 52: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Proof:

Once the intermediate string has k symbols in it, any additional rules involved in the derivation of x must simply replace variable symbols with terminals. The “worst-case scenario” is if all the symbols are variables; in that case, we will need at most k steps (of rules of the second type, which replace a single variable with a single terminal) to convert the intermediate string into a string of all terminals.

It will take no more than 2k - 1 applications of the production rules to derive x.

These rules can be applied in any order. (We don’t have to expand the string first and then convert it to terminals.)

Page 53: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Chomsky Normal Form

There are other ways to limit the form a grammar can have.

A context-free grammar in Chomsky Normal Form (CNF) has all of its rules restricted so that the right-hand side of every production is either exactly one terminal or exactly two variables.

This seems very restrictive, but actually every context-free grammar can be converted into Chomsky Normal Form.

Page 54: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Chomsky Normal Form

Definition 6.4: A context-free grammar is in Chomsky Normal Form (CNF) if every production is one of these two types:

A → BC

A → a

where A, B, and C are variables and a is a terminal symbol.

Page 55: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Chomsky normal form

For languages that include the empty string λ, the rule S → λ may also be allowed, where S is the start symbol, as long as S does not occur on the right-hand side of any rule

Page 56: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Chomsky Normal Form

Theorem 6.6: Any context-free grammar G = (V, T, S, P) with λ ∉ L(G) has an equivalent grammar G’ = (V’, T’, S, P’) in Chomsky Normal Form.

(Actually, for languages that include the empty string λ, the rule S → λ may also be allowed, where S is the start symbol, as long as S does not occur on the right-hand side of any rule.)

Page 57: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Chomsky Normal Form: Proof by construction

Given a context-free grammar G = (V, T, S, P), to convert it to Chomsky Normal Form:

1. Eliminate λ-productions and unit-productions from G, producing a CFG G’ = (V, T, S, P’), such that L(G’) = L(G) - {λ}.

2. Convert G’ into G’’ = (V’’, T, S, P’’) so that every production is either of the form

A → B1B2 … Bk

(where k ≥ 2 and each Bi is a variable in V’’), or of the form

A → a

Page 58: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Chomsky Normal Form

Basically, what you are doing in step 2 is restricting the right sides of productions to be either single terminals or strings of two or more variables.

What we don’t want is strings of length 2 or more that have one or more terminals in them. If we have strings like this, for every terminal a appearing in such a string:

1. Add a new variable, Xa, and add a new production, Xa → a

2. Replace a by Xa in all the productions where it appears (except those of the form A → a).

Page 59: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Chomsky Normal Form (continued)

3. Convert G’’ into G’’’ = (V’’’, T, S, P’’’). To do this, replace each production having more than two variables on the right by an equivalent set of productions, each one having exactly two variables on the right. (Create new variables as necessary to accomplish this.)

For example: the production A → BCD would be replaced with

A → BZ1

Z1 → CD

Done!

Page 60: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Example

Original grammar:

S → AB | ab

A → ABAB | BA

B → ab | b

After step 2:

S → AB | XaXb

Xa → a

Xb → b

A → ABAB | BA

B → XaXb | b

Page 61: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Example

After step 2:

S → AB | XaXb

Xa → a

Xb → b

A → ABAB | BA

B → XaXb | b

After step 3:

S → AB | XaXb

Xa → a

Xb → b

A → AY1 | BA

Y1 → BY2

Y2 → AB

B → XaXb | b

Page 62: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Example

If you recognize that A → ABAB has two copies of the same pair of variables, you could substitute the following instead (but the first procedure works equally well):

After step 3:

S → AB | XaXb

Xa → a

Xb → b

A → Y1Y1 | BA

Y1 → AB

B → XaXb | b
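Steps 2 and 3 of the construction can be sketched in code. The sketch follows the first procedure (it produces Y1 and Y2 rather than reusing one pair), and it rests on assumptions that are not in the notes: right sides are tuples of symbol names, the variables are exactly the dictionary keys, the input already has no λ-productions or unit productions, and the generated names X... and Y... are not already in use.

```python
def to_cnf(productions):
    """Apply steps 2 and 3 of the CNF construction to a dict var -> list of tuples."""
    # Step 2: in right sides of length >= 2, replace each terminal a by a variable Xa
    step2, term_var = {}, {}
    for v, rules in productions.items():
        step2[v] = []
        for rhs in rules:
            if len(rhs) >= 2:
                rhs = tuple(
                    s if s in productions else term_var.setdefault(s, "X" + s)
                    for s in rhs
                )
            step2[v].append(rhs)
    for t, x in term_var.items():
        step2[x] = [(t,)]
    # Step 3: split right sides with more than two variables, left to right
    final, counter = {}, 0
    for v, rules in step2.items():
        final.setdefault(v, [])
        for rhs in rules:
            head = v
            while len(rhs) > 2:
                counter += 1
                fresh = f"Y{counter}"
                final.setdefault(head, []).append((rhs[0], fresh))
                head, rhs = fresh, rhs[1:]
            final.setdefault(head, []).append(rhs)
    return final

grammar = {
    "S": [("A", "B"), ("a", "b")],
    "A": [("A", "B", "A", "B"), ("B", "A")],
    "B": [("a", "b"), ("b",)],
}
cnf = to_cnf(grammar)
print(cnf["A"], cnf["Y1"], cnf["Y2"])
# [('A', 'Y1'), ('B', 'A')] [('B', 'Y2')] [('A', 'B')]
```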

Page 63: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Proof (concluded)

This constitutes a proof by construction that any CFG can be converted to CNF.

Later, this will be used to prove that there are languages which are not context-free.


Page 64: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Greibach Normal Form

Greibach Normal Form is similar to Chomsky Normal Form, except that every production is of the form A → ax, where a is a terminal symbol and x is a string of zero or more variables. Note that GNF puts a limit on where terminals and variables can appear (restrictions on their relative positions) rather than on the number of symbols on the right-hand side of the production rules.

Page 65: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Greibach Normal Form

Definition 6.5: A context-free grammar is said to be in Greibach Normal Form if all productions have the form

A → ax

where a ∈ T and x ∈ V*

Page 66: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Greibach Normal Form

Example:

Convert the following grammar into GNF:

S → abSb | aa

Introduce new variables A and B to stand for a and b respectively, and substitute:

S → aBSB | aA

A → a

B → b
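Definition 6.5 is easy to check mechanically. The small checker below is only an illustration, assuming single-character symbols with uppercase letters as variables; `is_gnf` is a hypothetical name.

```python
def is_gnf(productions):
    """True if every right side is one terminal followed by zero or more variables."""
    return all(
        len(rhs) >= 1
        and not rhs[0].isupper()
        and all(s.isupper() for s in rhs[1:])
        for rules in productions.values()
        for rhs in rules
    )

before = {"S": ["abSb", "aa"]}
after = {"S": ["aBSB", "aA"], "A": ["a"], "B": ["b"]}
print(is_gnf(before), is_gnf(after))  # False True
```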

Page 67: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Greibach Normal Form

Theorem 6.7: Any context-free grammar G = (V, T, S, P) with λ ∉ L(G) has an equivalent grammar G’ = (V’, T’, S, P’) in Greibach Normal Form.

It is hard to prove this, and it is hard to construct an easy-to-implement algorithm for performing the conversion.

Page 68: Chapter 6 Simplification of Context-free Grammars and Normal Forms

A membership algorithm for CFG’s

The famous linguist Noam Chomsky showed that every context-free grammar can be converted to an equivalent grammar in Chomsky normal form.

Why should you care about this?

The fact that any CFG can be converted to Chomsky normal form lets us develop a parsing algorithm that shows that the membership problem can be solved for context-free languages (CFLs).

Page 69: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Some motivation

Here is the idea of the algorithm: For a grammar in Chomsky normal form, any derivation of a string w has exactly 2n-1 steps, where n ≥ 1 is the length of w. (Why?) So, it is only necessary to check derivations of 2n-1 steps to decide whether G generates w.

Of course, this parsing algorithm is inefficient! It would never be used in practice. But it solves the membership problem for CFLs.

Page 70: Chapter 6 Simplification of Context-free Grammars and Normal Forms

The CYK algorithm

The membership algorithm for CFG’s that is usually cited is the CYK algorithm, named for its three developers.

It works by breaking down the problem into a sequence of smaller problems and solving them. Details may be found on pages 172-173 of the Linz textbook.

This algorithm can be shown to run in time proportional to |w|^3.
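A compact sketch of the CYK table-filling idea is given below. It is not taken from the textbook's presentation; it assumes a grammar already in Chomsky normal form, stored as a dictionary from variables to lists of tuples, and the example grammar for strings of the form a^n b^n (n ≥ 1) is made up for this illustration.

```python
def cyk(productions, start, w):
    """Decide whether w is generated by a CNF grammar (non-empty w only)."""
    n = len(w)
    if n == 0:
        return False
    # table[i][j] holds the variables that derive the substring w[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][0] = {v for v, rules in productions.items() if (ch,) in rules}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for v, rules in productions.items():
                    if any(len(r) == 2 and r[0] in left and r[1] in right for r in rules):
                        table[i][length - 1].add(v)
    return start in table[0][n - 1]

# Hypothetical CNF grammar for a^n b^n with n >= 1
grammar = {
    "S": [("A", "B"), ("A", "C")],
    "C": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}
print(cyk(grammar, "S", "aabb"), cyk(grammar, "S", "abab"))  # True False
```

The three nested loops over substring length, start position, and split point are where the cubic dependence on |w| comes from.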

Page 71: Chapter 6 Simplification of Context-free Grammars and Normal Forms

LL grammars

• A top-down parser finds a leftmost derivation of a string. “Top-down” means to start with the start symbol and show how to derive the string from it.

• An LL(k) grammar allows a parser to perform a left-to-right scan of the input to find a leftmost derivation, using k symbols of lookahead to select the next rule.

• Many compilers have been written using LL parsers. But LL grammars are not sufficiently general to generate all deterministic CFLs. This led to study of more general deterministic grammars, especially LR grammars.

Page 72: Chapter 6 Simplification of Context-free Grammars and Normal Forms

LR grammars

• A bottom-up parser finds a rightmost derivation of a string (in reverse). “Bottom-up” means to start with a string and “reduce” it to the start symbol.

• An LR(k) grammar allows a parser to perform a left-to-right scan of the input to produce a rightmost derivation, using k symbols of lookahead to select the next rule.

• The class of languages generated by LR(1) grammars is exactly the deterministic CFLs.

• Two subclasses of LR(1) grammars, called SLR(1) (for “simple” LR) and LALR(1) (for “lookahead” LR) are commonly used for programming languages.

Page 73: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Parsing algorithms

• Parsing is an extremely important topic in the design and compilation of programming languages. You will study parsing algorithms based on various LL and LR grammars in a course on compiler design.

• Most of what we have studied in these chapters about regular and context-free languages provides the mathematical foundation for designing good compilers. (It has many other applications as well.)


Page 74: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Efficient parsing

• The syntax of most programming languages is described by context-free grammars, and parsing is central to any programming language compiler

• Many parsing algorithms for context-free grammars have been developed over the years. Most simulate pushdown automata.

• However, some PDAs cannot be simulated efficiently by computer programs because they are nondeterministic. Efficient parsers simulate deterministic PDAs.

Page 75: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Regular grammar CFG’s

A word is a string of all terminals. A semiword is a string of 0 or more terminals concatenated with exactly one nonterminal on the right. So, for example, abcA is a semiword.

A CFG is called a regular grammar if each of its productions is one of the two forms:

Nonterminal → semiword

Nonterminal → word

Page 76: Chapter 6 Simplification of Context-free Grammars and Normal Forms

Regular grammars

All regular languages can be generated by regular grammars. All regular grammars generate regular languages.

Context-free grammars are more powerful than regular grammars. Regular languages are a proper subset of context-free languages, so CFG’s can generate all regular languages (as well as non-regular context-free languages).
