Lecture 3 Graph Representation for Regular Expressions

digraph (directed graph)

• A digraph is a pair of sets (V, E) such that each element of E is an ordered pair of elements of V. The elements of V are called vertices and the elements of E are called edges.

• A path is an alternating sequence of vertices and edges such that all edges point in the same direction.

string-labeled digraph

• A string-labeled digraph is a digraph in which each edge is labeled by a string.

• In a string-labeled digraph, every path is associated with a string, obtained by concatenating, in order, the labels of the edges on the path.

• This string is called the label of the path.
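As a minimal sketch of the definition above, a string-labeled digraph can be stored as a list of edges (tail, label, head), and the label of a path is the concatenation of its edge labels. The vertex names and labels below are hypothetical illustration data, and the empty string is assumed to stand for ε.

```python
# A string-labeled digraph as a list of edges (tail, label, head).
# The vertices and labels are hypothetical illustration data.
EDGES = [
    ("s", "ab", "u"),
    ("u", "", "v"),   # the empty string plays the role of ε
    ("v", "c", "f"),
]


def path_label(path):
    """Concatenate the edge labels along a path, in order.

    `path` is a list of edges (tail, label, head). Each edge must start
    where the previous one ended, so all edges point the same way.
    """
    for (_, _, head), (tail, _, _) in zip(path, path[1:]):
        if head != tail:
            raise ValueError("edges do not form a path")
    return "".join(label for _, label, _ in path)


print(path_label(EDGES))  # abc
```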

G(r)

• For each regular expression r, we can construct a digraph G(r) with edges labeled by symbols and ε as follows.

• If r = ∅, then G(r) is: [figure not recovered]

• If r ≠ ∅, then G(r) is: [figures not recovered; surviving labels: ∅*, ε, ε]

Theorem 1

• G(r) has the property that a string x belongs to r if and only if x is the label of a path from the initial vertex to the final vertex.

• The proof is by induction on the structure of r.

Graph Representation

• A graph representation of a regular expression r is a string-labeled digraph with an initial vertex s and a final vertex f such that a string x belongs to r if and only if x is the label of a path from s to f.

Corollary 2

• For any regular expression r, there exists a string-labeled digraph with two special vertices, an initial vertex s and a final vertex f, such that a string x belongs to r if and only if x is the label of a path from s to f.
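The membership condition in a graph representation can be checked mechanically. The following is a minimal sketch, assuming edges are stored as (tail, label, head) triples with the empty string standing for ε. It searches over pairs (vertex, number of symbols of x consumed). The example graph is a hand-built representation of (0+1)*00(0+1)* chosen for illustration; it is not claimed to be the G(r) produced by the lecture's construction.

```python
from collections import deque


def has_labeled_path(edges, s, f, x):
    """Return True if some path from s to f has label exactly x."""
    start = (s, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        vertex, pos = queue.popleft()
        if vertex == f and pos == len(x):
            return True
        for tail, label, head in edges:
            # Follow an edge only if its label matches x at position pos.
            if tail == vertex and x.startswith(label, pos):
                nxt = (head, pos + len(label))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


# Hypothetical graph representation of (0+1)*00(0+1)*.
GRAPH = [
    ("s", "0", "s"), ("s", "1", "s"),
    ("s", "00", "f"),
    ("f", "0", "f"), ("f", "1", "f"),
]

print(has_labeled_path(GRAPH, "s", "f", "1001"))  # True
print(has_labeled_path(GRAPH, "s", "f", "0101"))  # False
```

The `seen` set keeps the search finite even when the graph has cycles of ε-edges, since there are only finitely many (vertex, position) pairs.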

Puzzle: If a regular expression r contains u "+"s, v "·"s, and w "*"s, how many ε-edges does G(r) contain?

Question: How can the number of ε-edges be reduced?

Theorem 3

• An ε-edge (u, v) in G(r) that is the only out-edge of a nonfinal vertex u, or the only in-edge of a noninitial vertex v, can be shrunk to a single vertex. (If one of u and v is the initial vertex or the final vertex, so is the resulting vertex.)

• Remark: The ε-edges should be shrunk one at a time, because shrinking one edge can change whether another edge still satisfies the condition.
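The shrinking rule of Theorem 3 can be sketched as follows, assuming a digraph stored as a list of (tail, label, head) edges with the empty string standing for ε, one initial vertex s, and one final vertex f. The function names and the example graph are hypothetical. The condition is rechecked after every single shrink, as the remark requires.

```python
EPS = ""  # the empty string stands for ε


def find_shrinkable(edges, s, f):
    """Return the index of an ε-edge satisfying Theorem 3, or None."""
    for i, (u, label, v) in enumerate(edges):
        if label != EPS or u == v:
            continue
        out_u = sum(1 for tail, _, _ in edges if tail == u)
        in_v = sum(1 for _, _, head in edges if head == v)
        # Only out-edge of a nonfinal u, or only in-edge of a noninitial v.
        if (u != f and out_u == 1) or (v != s and in_v == 1):
            return i
    return None


def shrink_all(edges, s, f):
    """Shrink qualifying ε-edges one at a time; return (edges, s, f)."""
    edges = list(edges)
    while (i := find_shrinkable(edges, s, f)) is not None:
        u, _, v = edges.pop(i)
        # Merge u into v: every remaining occurrence of u becomes v.
        edges = [(v if a == u else a, lab, v if b == u else b)
                 for a, lab, b in edges]
        # The merged vertex inherits the initial or final role.
        s = v if s == u else s
        f = v if f == u else f
    return edges, s, f


# Hypothetical example: s --ε--> a --0--> b --ε--> f
edges = [("s", EPS, "a"), ("a", "0", "b"), ("b", EPS, "f")]
print(shrink_all(edges, "s", "f"))  # ([('a', '0', 'f')], 'a', 'f')
```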

Lecture 4 Deterministic Finite Automata (DFA)

DFA

[Figure: a DFA, showing the finite control, the tape, and the head]

• The tape is divided into finitely many cells. Each cell contains a symbol in an alphabet Σ.

[Figure: a tape whose cells hold the symbols a, l, p, h, a, b, e, t]

• The head scans one cell of the tape at a time and reads the symbol in that cell. In each move, the head moves one cell to the right.

• The finite control has finitely many states which form a set Q. For each move, the state is changed according to the evaluation of a transition function

δ : Q × Σ → Q.

• δ(q, a) = p means that if the head reads symbol a and the finite control is in the state q, then the next state should be p, and the head moves one cell to the right.

• There are some special states: an initial state s and a set F of final states.

• Initially, the DFA is in the initial state s and the head scans the leftmost cell. The tape holds an input string.

• When the head gets off the tape, the DFA stops. An input string x is accepted by the DFA if the DFA stops at a final state.

• Otherwise, the input string is rejected.

• The DFA can be represented by

M = (Q, Σ, δ, s, F)

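where Q is the finite set of states, Σ is the alphabet of input symbols, δ is the transition function, s is the initial state, and F is the set of final states.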

• The set of all strings accepted by a DFA M is denoted by L(M). We also say that the language L(M) is accepted by M.
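The behaviour of M = (Q, Σ, δ, s, F) on an input can be sketched in a few lines, assuming δ is stored as a dictionary keyed by (state, symbol). The two-state machine below (strings over {0, 1} with an even number of 1s) is a hypothetical example, not one taken from the lecture.

```python
def accepts(delta, s, F, x):
    """Run the DFA on x: one transition per symbol, left to right."""
    q = s
    for a in x:
        q = delta[(q, a)]  # δ(q, a) = p, then the head moves right
    return q in F          # accepted iff the DFA stops in a final state


# Hypothetical DFA: accepts strings with an even number of 1s.
DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

print(accepts(DELTA, "even", {"even"}, "0110"))  # True
print(accepts(DELTA, "even", {"even"}, "111"))   # False
```

Because δ is a function, the loop never has a choice to make: each input string determines exactly one sequence of states.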

• The transition diagram of a DFA is an alternative way to represent the DFA.

• For M = (Q, Σ, δ, s, F), the transition diagram of M is a symbol-labeled digraph G=(V, E) satisfying the following:

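V = Q, where the initial state s and each final state f ∈ F are drawn with distinguishing marks,

E = { (q, a, p) | δ(q, a) = p }, where (q, a, p) denotes an edge from q to p labeled a.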

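Example: a DFA M with L(M) = (0+1)*00(0+1)*.

Transition table:

δ    0    1
s    p    s
p    q    s
q    q    q

[Figure: transition diagram on the vertices s, p, q. By the table, s has a loop labeled 1 and an edge labeled 0 to p; p has an edge labeled 1 to s and an edge labeled 0 to q; q has a loop labeled 0, 1.]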
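The example can be checked by brute force. The sketch below assumes the transition table reads row by row as δ(s,0)=p, δ(s,1)=s, δ(p,0)=q, δ(p,1)=s, δ(q,0)=q, δ(q,1)=q, and assumes F = {q}, which is inferred from the stated language rather than given explicitly. A string belongs to (0+1)*00(0+1)* exactly when it contains 00 as a substring, so that is used as the reference.

```python
from itertools import product

# Transition table of the example, as read above (an assumption).
DELTA = {
    ("s", "0"): "p", ("s", "1"): "s",
    ("p", "0"): "q", ("p", "1"): "s",
    ("q", "0"): "q", ("q", "1"): "q",
}
START = "s"
FINAL = {"q"}  # assumed: inferred from L(M), not stated in the table


def accepts(x):
    """Run the example DFA on the string x."""
    state = START
    for symbol in x:
        state = DELTA[(state, symbol)]
    return state in FINAL


# Compare with the regular expression on every string of length 0..8.
for n in range(9):
    for symbols in product("01", repeat=n):
        x = "".join(symbols)
        assert accepts(x) == ("00" in x), x

print("DFA agrees with (0+1)*00(0+1)* on all strings up to length 8")
```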

The transition diagram of the DFA M has the following properties:

• For every vertex q and every symbol a, there exists exactly one edge with label a leaving q.

• For each string x, there exists exactly one path starting from the initial state s whose label is x.

• A string x is accepted by M if and only if this path ends at a final state.
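These properties can be observed on a concrete diagram. The sketch below builds the edge set E = {(q, a, p) | δ(q, a) = p} from a transition table, checks that every vertex has exactly one out-edge per symbol, and traces the unique path for a string. The function names are hypothetical, and the table is the example DFA as read from the lecture's transition table (an assumption about the garbled original).

```python
def transition_diagram(Q, sigma, delta):
    """E = {(q, a, p) | δ(q, a) = p}: one labeled edge per table entry."""
    return {(q, a, delta[(q, a)]) for q in Q for a in sigma}


def one_edge_per_symbol(Q, sigma, edges):
    """Check that every vertex has exactly one out-edge for each symbol."""
    return all(
        sum(1 for tail, label, _ in edges if tail == q and label == a) == 1
        for q in Q for a in sigma
    )


def unique_path(edges, s, x):
    """Follow the only path from s labeled x; return its vertex sequence."""
    path = [s]
    for a in x:
        # Unpacking fails unless exactly one matching edge exists.
        (nxt,) = [h for t, label, h in edges if t == path[-1] and label == a]
        path.append(nxt)
    return path


Q = {"s", "p", "q"}
SIGMA = {"0", "1"}
DELTA = {
    ("s", "0"): "p", ("s", "1"): "s",
    ("p", "0"): "q", ("p", "1"): "s",
    ("q", "0"): "q", ("q", "1"): "q",
}

E = transition_diagram(Q, SIGMA, DELTA)
print(one_edge_per_symbol(Q, SIGMA, E))  # True
print(unique_path(E, "s", "1001"))       # ['s', 's', 'p', 'q', 'q']
```

The string 1001 is accepted exactly when the last vertex of this path, here q, is a final state.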
