Languages, Grammars, and Regular Expressions Chuck Cusack Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5 th edition, by Kenneth Rosen

Post on 13-Jan-2016

Languages, Grammars, and Regular Expressions

Chuck Cusack

• Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5th edition, by Kenneth Rosen

Alphabets and Languages

• Definition: A vocabulary (or alphabet) V is a finite, nonempty set of symbols.

• Definition: A word or sentence over V is a finite string of symbols from V.

• Definition: The empty string or null string, denoted by λ, is the string containing no symbols.

• Definition: The set of all words over V is denoted by V*.

• Definition: A language over V is a subset of V*.

Language Examples

• Let V={0,1}

• 00110, 11111, 00, and 11 are words over V

• 012, a234, and 222 are not words over V

• V*={λ,0,1,00,01,10,11,000,…}

• In other words, V* is the set of all binary strings

• The set of strings consisting of only 0s is a language over V

• {1,10,100,1000,10000,…} is a language over V

Concatenation

• Definition: Let V be a vocabulary, and A and B be subsets of V*. The concatenation of A and B, denoted by AB, is the set of all strings of the form xy, where x ∈ A and y ∈ B.

• Example: Let A={0, 10}, and B={1,12}. Then

– AB={01, 012, 101, 1012}

– BA={10, 110, 120, 1210}

– AA={00, 010, 100, 1010}

– AAA=A(AA)={000, 0010, 0100, 01010, 1000, 10010, 10100, 101010}
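
The concatenation of two sets of strings can be sketched in a few lines of Python. This is only an illustration, assuming strings are represented as Python `str` values and sets of strings as Python `set`s; the function name `concat` is hypothetical.

```python
def concat(a, b):
    """Return AB = {xy : x in A, y in B} for two sets of strings."""
    return {x + y for x in a for y in b}

A = {"0", "10"}
B = {"1", "12"}

print(sorted(concat(A, B)))          # ['01', '012', '101', '1012']
print(sorted(concat(B, A)))          # ['10', '110', '120', '1210']
print(concat(A, B) == concat(B, A))  # False: concatenation is not commutative
```

The last line shows that AB and BA are different sets for this choice of A and B.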

Concatenation: A^n

• Definition: Let V be a vocabulary, and A a subset of V*. Then A^0={λ}, and for n>0, we can define A^n=A^(n-1)A

• Example: Let A={0, 10}. Then

– A^0={λ}

– A^1=A^0A={λ}A=A={0,10}

– A^2=A^1A={00, 010, 100, 1010}

– A^3=A^2A={000, 0010, 0100, 01010, 1000, 10010, 10100, 101010}
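
The recursive definition of A^n translates directly into a loop. The sketch below assumes that the Python empty string `""` plays the role of λ; the names `concat` and `power` are hypothetical.

```python
def concat(a, b):
    """Return AB = {xy : x in A, y in B}."""
    return {x + y for x in a for y in b}

def power(a, n):
    """Return A^n, using A^0 = {λ} and A^n = A^(n-1) A."""
    result = {""}              # A^0 = {λ}; "" stands for the empty string λ
    for _ in range(n):
        result = concat(result, a)
    return result

A = {"0", "10"}
print(sorted(power(A, 0)))     # ['']
print(sorted(power(A, 2)))     # ['00', '010', '100', '1010']
print(len(power(A, 3)))        # 8
```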

Kleene Closure

• Definition: Let V be a vocabulary, and A a subset of V*. The Kleene closure of A, denoted by A*, is the set consisting of concatenations of an arbitrary number of strings from A. That is,

$A^* = \bigcup_{k=0}^{\infty} A^k$

• Definition: A+ is the set of nonempty strings over A. In other words,

$A^+ = \bigcup_{k=1}^{\infty} A^k$

Kleene Closure Example

• Example: Let A={0, 1}. Then

– A^0={λ}

– A^1={0,1}

– A^2={00, 01, 10, 11}

– A^3={000, 001, 010, 011, 100, 101, 110, 111}

– A*={0,1}*={All binary strings}

• Example: Let B={111}. Then

– B^0={λ}, B^1={111}, B^2={111111}

– B^3={111111111}

– B* is the set of strings consisting of 3n 1s, for every n≥0
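
Since A* is usually infinite, a program can only list part of it. The following sketch enumerates the strings of A* up to a chosen length; the function name `kleene_up_to` and the length bound are assumptions made for illustration, and `""` stands for λ.

```python
def kleene_up_to(a, max_len):
    """Return all strings of A* whose length is at most max_len."""
    closure = {""}             # A^0 = {λ}
    frontier = {""}            # strings found in the previous round
    while frontier:
        # Append one more string from A, keeping only new, short enough results.
        frontier = {x + y for x in frontier for y in a
                    if len(x + y) <= max_len} - closure
        closure |= frontier
    return closure

print(sorted(kleene_up_to({"0", "1"}, 2)))   # ['', '0', '00', '01', '1', '10', '11']
print(sorted(kleene_up_to({"111"}, 9)))      # ['', '111', '111111', '111111111']
```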

Regular Sets

• Definition: A regular set is a set that can be generated starting from the empty set, empty string, and single elements from the vocabulary, using concatenations, unions, and Kleene closures in arbitrary order.

• We will give a more precise definition after we define a regular expression.

Regular Expressions

• Definition: The regular expressions over a set I are defined recursively by:

– ∅ (the empty set) is a regular expression,

– λ (the set containing the empty string) is a regular expression,

– x is a regular expression for all x ∈ I,

– (AB), (A∪B), and A* are regular expressions if A and B are regular expressions

• Definition: A regular set is a set represented by a regular expression.

• Examples: 001*, 1(0(01)*11, and AB*C are regular expressions

Regular Expression Example

• The regular set defined by the regular expression 01* is the set of strings starting with a 0 followed by 0 or more 1s.

• The regular set defined by (10)* is the set of strings containing 0 or more copies of 10.

• The regular set defined by 0(0∪1)*1 is the set of all binary strings beginning with 0 and ending with 1.

• The regular set defined by (0∪1)1(0∪1) is the set of strings {010, 011, 110, 111}.
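
These four regular sets can be checked with Python's `re` module. The sketch below assumes the usual translation of the notation (the union symbol becomes `|`, while concatenation and `*` are written the same way) and uses `re.fullmatch` so that the whole string must belong to the set. The helper name `regular_set` is hypothetical.

```python
import re
from itertools import product

def regular_set(pattern, max_len, alphabet="01"):
    """List the strings over alphabet, up to max_len symbols, matched in full by pattern."""
    regex = re.compile(pattern)
    words = ("".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    return [w for w in words if regex.fullmatch(w)]

print(regular_set("01*", 3))           # ['0', '01', '011']
print(regular_set("(10)*", 4))         # ['', '10', '1010']
print(regular_set("0(0|1)*1", 3))      # ['01', '001', '011']
print(regular_set("(0|1)1(0|1)", 3))   # ['010', '011', '110', '111']
```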

Regular Expression Applications

• Regular expressions are actually used quite often in computer science.

• For instance, if you are editing a file with vi, and want to see if it contains the string blah followed by a number followed by any character followed by the letter Q, you can use the regular expression

blah[0-9][0-9]*.Q

• This works because vi uses regular expressions for searching.
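
The same pattern can be tried outside vi. The sketch below uses Python's `re` module, which is a different regular expression engine from the one in vi; it is assumed here only that the constructs in this pattern (`[0-9]`, `*`, and `.`) are read the same way by both. The sample lines are made up for illustration.

```python
import re

# blah, then one or more digits, then any single character, then Q
pattern = re.compile(r"blah[0-9][0-9]*.Q")

for line in ["blah7xQ", "see blah2016-Q here", "blahQ", "blah7Q"]:
    print(line, bool(pattern.search(line)))
# blah7xQ True
# see blah2016-Q here True
# blahQ False
# blah7Q False
```

The last line fails because after the digit 7 there must still be one arbitrary character before the Q.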

Grammars and Languages

• Many languages can be defined by grammars.

• We are particularly interested in phrase-structure grammars.

• Before we can define phrase-structure grammars, we need to define a few more terms.

Special Symbols

• Definition: A nonterminal symbol (or just nonterminal) is a symbol which can be replaced by other symbols.

• Definition: A terminal symbol (or just terminal) is a symbol which cannot be replaced by other symbols.

• Definition: The start symbol is a special symbol, usually denoted by S.

• The set of terminals is denoted by T, and the set of nonterminals by N.

• S is a nonterminal.

Productions

• Definition: A production is a rule which tells how to replace one string from V* with another string.

• Productions are denoted by a → b, which denotes that a can be replaced by b.

• Example

– Let S → A0, A → A1, and A → 0 be productions

– Then I can replace S with A0

– Since I can replace A with A1, A0 can become A10

– Since I can replace A with 0, A10 can become 010

– Thus, I can replace S with 010
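
The replacement steps in this example can be traced with simple string rewriting. This is a sketch only: it assumes each symbol is a single character and always rewrites the leftmost occurrence, and the function name `apply_production` is hypothetical.

```python
def apply_production(string, lhs, rhs):
    """Replace the leftmost occurrence of lhs in string with rhs."""
    if lhs not in string:
        raise ValueError(f"{lhs!r} does not occur in {string!r}")
    return string.replace(lhs, rhs, 1)

# Productions used in order: S -> A0, A -> A1, A -> 0
steps = [("S", "A0"), ("A", "A1"), ("A", "0")]

current = "S"
trace = [current]
for lhs, rhs in steps:
    current = apply_production(current, lhs, rhs)
    trace.append(current)

print(" => ".join(trace))   # S => A0 => A10 => 010
```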

Phrase-Structure Grammars

• Definition: A phrase-structure grammar is a 4-tuple G=(V,T,S,P), where

– V is a vocabulary

– T ⊆ V is a set of terminals

– S ∈ V is a start symbol

– P is a set of productions

• N=V-T is the set of nonterminals

• Each production contains at least one nonterminal on its left side.

• We will always use S as the start symbol.

Direct Derivations

• Let G=(V,T,S,P) be a phrase-structure grammar.

• Let A=lar and B=lbr, where l, a, b, r ∈ V*.

• Let a → b be a production.

• Then we can derive B from A.

• Thus we say that B is directly derivable from A.

• We write this as A ⇒ B
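
The definition of a direct derivation can be turned into a small function that lists every string directly derivable from a given one. The sketch assumes productions are stored as pairs `(a, b)` of Python strings with a nonempty left side; the function name `directly_derivable` is hypothetical, and the productions are those of Example 3.

```python
def directly_derivable(string, productions):
    """Return every B with string => B, i.e. string = l a r and B = l b r for some a -> b."""
    results = set()
    for lhs, rhs in productions:
        start = string.find(lhs)
        while start != -1:
            # l = string[:start], r = string[start + len(lhs):]
            results.add(string[:start] + rhs + string[start + len(lhs):])
            start = string.find(lhs, start + 1)
    return results

P = [("S", "AB"), ("B", "BB"), ("A", "AA"), ("A", "0"), ("B", "1")]
print(sorted(directly_derivable("AB", P)))   # ['0B', 'A1', 'AAB', 'ABB']
```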

Derivations

• Let G=(V,T,S,P) be a phrase-structure grammar

• Let A1, A2,…,An ∈ V* be such that

A1 ⇒ A2 ⇒ … ⇒ An

• Then we say that An is derivable from A1.

• We write A1 ⇒* An

• The sequence of productions used is called a derivation.

Generating Languages

• Let G=(V,T,S,P) be a grammar

• Definition: The language generated by G, denoted L(G) , is the set of all strings of terminals that are derivable from S.

• Put another way,

L(G)={w ∈ T* | S ⇒* w}

Example 1

Let G be the grammar with

– V={S,0,1}

– T={0,1}

– P={S → S0, S → 0}

• Clearly S ⇒ 0, so 0 ∈ L(G)

• Also, S ⇒ S0 ⇒ 00, so 00 ∈ L(G)

• And, S ⇒ S0 ⇒ S00 ⇒ 000, so 000 ∈ L(G)

• It is not hard to see that L(G) is the language consisting of all strings with 1 or more 0s.

Example 2

Let G be the grammar with V={S,0,1}, T={0,1}, and P={S → SS, S → 1, S → 0}

• Clearly S ⇒ 0, so 0 ∈ L(G)

• Also, S ⇒ 1, so 1 ∈ L(G)

• Since S ⇒ SS ⇒ S1 ⇒ 01, we have 01 ∈ L(G)

• In general, we can get a sequence of Ss, and replace each with either 0 or 1.

• Given this fact, it is easy to see that L(G)={0,1}+, the set of all non-empty binary strings

Example 3

Let G be the grammar with V={S,A,B,0,1}, T={0,1}, and P={S → AB, B → BB, A → AA, A → 0, B → 1}

• Clearly S ⇒ AB ⇒ 0B ⇒ 01, so 01 ∈ L(G)

• Also, S ⇒ AB ⇒ AAB ⇒ 0AB ⇒ 00B ⇒ 001, so 001 ∈ L(G)

• Similarly, we can get 011, 0011, 0001, etc.

• In general, we can get a sequence of n 0s followed by m 1s, where n>0, m>0.

• Thus L(G)={0^n 1^m | m and n are positive integers}
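
The languages claimed in Examples 1 to 3 can be checked for short words by searching through all derivations from S. The sketch below is a plain breadth-first search, not an efficient parser. It assumes that no production shortens a string (true for all three examples), so that any sentential form longer than the bound can be discarded. The function name `language_up_to` is hypothetical.

```python
from collections import deque

def language_up_to(productions, terminals, max_len, start="S"):
    """Return the words of L(G) with at most max_len symbols.

    Assumes no production shortens a string, so forms longer than
    max_len can never lead back to a short enough word.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if all(symbol in terminals for symbol in form):
            words.add(form)                  # a string of terminals derivable from S
            continue
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return words

T = {"0", "1"}
example1 = [("S", "S0"), ("S", "0")]
example2 = [("S", "SS"), ("S", "1"), ("S", "0")]
example3 = [("S", "AB"), ("B", "BB"), ("A", "AA"), ("A", "0"), ("B", "1")]

print(sorted(language_up_to(example1, T, 3)))   # ['0', '00', '000']
print(sorted(language_up_to(example2, T, 2)))   # ['0', '00', '01', '1', '10', '11']
print(sorted(language_up_to(example3, T, 3)))   # ['001', '01', '011']
```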

Type 0 Grammars

• Type 0 grammars have no restrictions on the types of productions that are allowed.

• Thus type 0 grammars are just phrase-structure grammars.

• This is not too exciting, so we will move on to type 1 grammars.

Type 1 Grammars

• In a type 1 grammar, productions are of the form

– aXb → acb, where X ∈ N and a, b, c ∈ V* with c ≠ λ

– (or S → λ, but ignore this for now)

• Thus, a production can only be applied if the symbol X is surrounded by a and b.

• In other words, the production can only be applied in a certain context.

• This is why type 1 grammars are also called context-sensitive grammars.

Type 2 Grammars

• Productions are of the form

– X → a, where X ∈ N and a ∈ V*.

• Thus, if X is in a string, we can replace X with a no matter what surrounds X.

• In other words, the context in which X appears does not matter.

• This is why type 2 grammars are called context-free grammars.

• Context-free grammars produce context-free languages.

Type 3 Grammars

• Productions are of the form

– X → a, where X ∈ N and a ∈ T

– X → aY, where X, Y ∈ N and a ∈ T

– S → λ

• Type 3 grammars are called regular grammars.

• Regular grammars produce regular languages.

• It is easy to see that a type 3 grammar is a type 2 grammar.

Types of Grammars

Type   Productions allowed

0      Almost any kind allowed

1      aXb → acb, where X ∈ N, a, b, c ∈ V*, c ≠ λ
       S → λ

2      X → a, where X ∈ N and a ∈ V*

3      X → a, where X ∈ N and a ∈ T
       X → aY, where X, Y ∈ N and a ∈ T
       S → λ

Types of Grammars

• The following summarizes the relationships between the types of grammars, from the most general (type 0) to the most restricted (type 3)

Type 0: phrase-structure

Type 1: context-sensitive

Type 2: context-free

Type 3: regular
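
The production forms listed for types 0 to 3 can be checked mechanically. The sketch below returns the most restrictive type whose form a production fits, and takes the type of a grammar to be the smallest type over its productions. It assumes single-character symbols and uses `""` for λ; the names `production_type` and `grammar_type` are hypothetical.

```python
def production_type(lhs, rhs, nonterminals, terminals):
    """Return the most restrictive type (3, 2, 1 or 0) whose form fits lhs -> rhs."""
    if lhs == "S" and rhs == "":
        return 3                                  # S -> λ appears in the type 3 list
    if len(lhs) == 1 and lhs in nonterminals:
        if len(rhs) == 1 and rhs in terminals:
            return 3                              # X -> a, a a terminal
        if len(rhs) == 2 and rhs[0] in terminals and rhs[1] in nonterminals:
            return 3                              # X -> aY
        return 2                                  # X -> a, a any string in V*
    for i, symbol in enumerate(lhs):
        if symbol in nonterminals:
            a, b = lhs[:i], lhs[i + 1:]
            # aXb -> acb with c nonempty: rhs keeps the context a ... b
            if (len(rhs) > len(a) + len(b)
                    and rhs.startswith(a) and rhs.endswith(b)):
                return 1
    return 0

def grammar_type(productions, nonterminals, terminals):
    """Smallest production type over all productions of the grammar."""
    return min(production_type(l, r, nonterminals, terminals)
               for l, r in productions)

T = {"0", "1"}
regular = [("S", "0A"), ("A", "0A"), ("A", "1A"), ("A", "1")]
example3 = [("S", "AB"), ("B", "BB"), ("A", "AA"), ("A", "0"), ("B", "1")]

print(grammar_type(regular, {"S", "A"}, T))        # 3
print(grammar_type(example3, {"S", "A", "B"}, T))  # 2
```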

Regular Grammar Example

• Let G be the grammar with

– V={S,A,0,1},

– T={0,1}, and

– P={S → 0A, A → 0A, A → 1A, A → 1}

• We can determine what the language is by constructing a few words.

– S ⇒ 0A ⇒ 01

– S ⇒ 0A ⇒ 00A ⇒ 001

– S ⇒ 0A ⇒ 01A ⇒ 011

– S ⇒ 0A ⇒ 00A ⇒ 000A ⇒ 0001

– S ⇒ 0A ⇒ 00A ⇒ 001A ⇒ 0011

– S ⇒ 0A ⇒ 01A ⇒ 010A ⇒ 0101

– S ⇒ 0A ⇒ 01A ⇒ 011A ⇒ 0111

• We can see that in general, L(G) is the set of binary strings beginning with 0 and ending with 1.

Regular Languages and Sets

• Theorem: Let A be a subset of V* . Then A is a regular language if and only if A is a regular set.

• In other words, a language defined by a regular grammar can also be defined by a regular expression, and vice-versa.

• Example: We just saw that the grammar with V={S,A,0,1}, T={0,1}, and P={S → 0A, A → 0A, A → 1A, A → 1} generates the set of binary strings beginning with 0 and ending with 1.

• Recall that the regular set defined by 0(0∪1)*1 is also the set of all binary strings beginning with 0 and ending with 1.
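
The agreement between this regular grammar and the regular expression can be tested on all short binary strings. The sketch below is a check on examples, not a proof of the theorem. It decides membership for a grammar with productions of the forms X → a and X → aY by tracking which nonterminals may remain after each symbol, and compares the answer with Python's `re.fullmatch` on `0(0|1)*1`. The function name `generates` is hypothetical.

```python
import re
from itertools import product

def generates(productions, word, start="S"):
    """Decide whether a regular grammar (X -> a, X -> aY, S -> λ) derives word."""
    if word == "":
        return (start, "") in productions        # only S -> λ yields the empty string
    current = {start}                            # nonterminals possible after the prefix read so far
    for i, symbol in enumerate(word):
        last = i == len(word) - 1
        nxt = set()
        for lhs, rhs in productions:
            if lhs in current and rhs[:1] == symbol:
                if len(rhs) == 2:
                    nxt.add(rhs[1])              # X -> aY: continue from Y
                elif last:
                    return True                  # X -> a finishes the word
        current = nxt
    return False

P = [("S", "0A"), ("A", "0A"), ("A", "1A"), ("A", "1")]
regex = re.compile("0(0|1)*1")

words = ["".join(p) for n in range(7) for p in product("01", repeat=n)]
print(all(generates(P, w) == bool(regex.fullmatch(w)) for w in words))   # True
```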

Grammar Applications

• Context-free grammars are used to define the syntax of most programming languages.

• Regular grammars are used in several applications, including the following

– Searching text for patterns

– Lexical analysis (during program compilation)

• Efficient algorithms exist to determine if a string is in a context-free or regular language.

• This is important for tasks like determining whether or not a program is syntactically valid.

Backus-Naur Form

• Backus-Naur form (BNF) is a more compact representation of productions in a type 2 grammar.

• All productions with the same left hand side are combined into one production

• The symbol → is replaced with ::=

• All nonterminals are enclosed in < and >

• The right hand sides of the various productions are combined, and separated by |

Backus-Naur Form Example

• Consider the set of productions

– S → AB

– B → BB

– A → AA

– A → 0

– B → 1

• In BNF, they are represented by

– <S> ::= <A><B>

– <B> ::= <B><B> | 1

– <A> ::= <A><A> | 0

Backus-Naur Form Example 2

• The Backus Naur form for the production of a signed integer is

– <signed integer> ::= <sign><integer>

– <sign> ::= + | -

– <integer> ::= <digit> | <digit><integer>

– <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
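
A BNF description like this one maps naturally onto a recognizer with one function per nonterminal. The following Python sketch is one possible way to do it, with hypothetical function names; each function returns the index just after the part it recognized, or `None` on failure.

```python
DIGITS = set("0123456789")

def parse_digit(s, i):
    """<digit> ::= 0 | 1 | ... | 9"""
    return i + 1 if i < len(s) and s[i] in DIGITS else None

def parse_integer(s, i):
    """<integer> ::= <digit> | <digit><integer>; consumes as many digits as possible."""
    j = parse_digit(s, i)
    if j is None:
        return None
    rest = parse_integer(s, j)
    return j if rest is None else rest

def parse_sign(s, i):
    """<sign> ::= + | -"""
    return i + 1 if i < len(s) and s[i] in "+-" else None

def is_signed_integer(s):
    """<signed integer> ::= <sign><integer>; the whole string must be consumed."""
    i = parse_sign(s, 0)
    if i is None:
        return False
    return parse_integer(s, i) == len(s)

for text in ["+123", "-0", "123", "+", "+12a"]:
    print(text, is_signed_integer(text))
# +123 True
# -0 True
# 123 False
# + False
# +12a False
```

Note that 123 is rejected: in this grammar the sign is required, since <signed integer> has no alternative without <sign>.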

Backus-Naur Form Applications

• Specifying the syntax for programming languages including

– Java

– LISP

• Specifying database languages

– SQL

• Specifying markup languages

– XML