Lecture 3: RegExpr → NFA → DFA
Topics
  Lex, Flex
  Thompson Construction
  Subset construction (maybe)
Readings: 3.3-3.5, 3.7, 3.6
January 18, 2006
CSCE 531 Compiler Construction
– 2 – CSCE 531 Spring 2006

Overview
Last Time
  Why Study Compilers?
  Regular Expressions
  DFAs
  Language accepted by a DFA
Today's Lecture
  Lex then Flex
  Thompson Construction
  Examples
  NFA → DFA, the subset construction
  ε-closure, move(s,a)
References
  Flex: http://www.cse.sc.edu/~matthews/Courses/531/Resources.html

Review of Regular Expressions
Language denoted by a regular expression
  Recursive definition
Equivalence of two regular expressions r and s
  L(s) = L(r)
More examples like Example 3.3
  If r = (a | b) (a | b) a what is L(r)?
  If s = (a | b | c)* then L(s) =
  If s = (a | b | ca)* then L(s) =
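The slide leaves these three as exercises. One way to write the answers, worked from the recursive definition (these are not on the original slide, and the alphabet is assumed to be {a, b, c}):

```latex
\begin{align*}
L\big((a \mid b)(a \mid b)\,a\big) &= \{\, aaa,\ aba,\ baa,\ bba \,\} \\
L\big((a \mid b \mid c)^{*}\big)   &= \{a, b, c\}^{*} \\
L\big((a \mid b \mid ca)^{*}\big)  &= \{\, x \in \{a, b, c\}^{*} : \text{every } c \text{ in } x \text{ is immediately followed by an } a \,\}
\end{align*}
```

The last line holds because any such string splits uniquely into pieces a, b and ca, and any concatenation of those pieces has the stated property.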

Properties of Regular Expressions
Axiom                         Description
r | s = s | r                 | is commutative
r | (s | t) = (r | s) | t     | is associative
(r s) t = r (s t)             concatenation is associative
r (s | t) = rs | rt           concatenation distributes over |  ( | = alternation)
(s | t) r = sr | tr
rε = r                        ε is the identity for concatenation
εr = r
(r | ε)* = r*
r** = r*                      * is idempotent

Regular definitions
A regular definition is a sequence of definitions of the form
  d1 → r1
  d2 → r2
  …
  dn → rn
where
  each di is a distinct name
  each ri is a regular expression over the symbols Σ ∪ {d1, d2, …, d(i-1)}

Regular Definition Example 3.4
  letter → A | B | … | Z | a | b | … | z
  digit → 0 | 1 | … | 9
  ID → letter ( letter | digit )*
In Lex/Flex this regular definition will be written as (names in braces, no embedded blanks)
  ID    {letter}({letter}|{digit})*
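To make the definition concrete, here is a minimal hand-written recognizer for ID. It is a sketch, not something from the slides: the name match_id is hypothetical, and it assumes the default "C" locale, where isalpha and isalnum accept exactly A-Z, a-z and 0-9. It behaves like a two-state DFA: the start state needs a letter, and the accepting state loops on letters and digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the longest prefix of s that matches
 * letter ( letter | digit )*, or 0 if s does not start with an ID. */
size_t match_id(const char *s)
{
    size_t n;

    assert(s != NULL);
    if (!isalpha((unsigned char)s[0]))      /* start state: must see a letter */
        return 0;
    n = 1;
    while (isalnum((unsigned char)s[n]))    /* accepting state: loop on letter | digit */
        n++;
    return n;
}
```

For example, match_id("x1y+2") stops at the '+' and returns 3, while match_id("9lives") returns 0 because an ID cannot start with a digit.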

Lexical Analyzer Generators
Fig 3.17 Creating and using a lexical analyzer with lex or flex
  Lex source (lang.l) → Lex/Flex → lex.yy.c
  lex.yy.c → C compiler → a.out
  Input stream → a.out → Sequence of tokens

Lex and Flex
Johnson, Porter, Ackley, Ross (CACM 1968) described the construction of scanners using finite state techniques
Lex: A Lexical Analyzer Generator
Flex: Fast Lex
  Manual page ( man flex )
  531 Resources Page
  http://www.gnu.org/software/flex/manual/html_mono/flex.html

Sections of a Lex Specification File
  definitions
  %%
  pattern-action pairs
  %%
  user code

Small Example
Here is a program which compresses multiple blanks and tabs down to a single blank, and throws away whitespace found at the end of a line:
%%
[ \t]+        putchar( ' ' );
[ \t]+$       /* ignore this token */
Notes
  1. Only one separator "%%"; there is no routines (user code) section

Lex Patterns
  x          match the character `x'
  .          any character (byte) except newline
  [xyz]      a "character class"; in this case, the pattern matches either an `x', a `y', or a `z'
  [abj-oZ]   a "character class" with a range in it; matches an `a', a `b', any letter from `j' through `o', or a `Z'
  [^A-Z]     a "negated character class", i.e., any character but those in the class. In this case, any character EXCEPT an uppercase letter.
  [^A-Z\n]   any character EXCEPT an uppercase letter or a newline

More Lex Patterns
  rs     the regular expression r followed by the regular expression s; called "concatenation"
  r|s    either an r or an s
  r/s    (note this is a slash, not a |) an r, but only if it is followed by an s. The text matched by s is included when determining whether this rule is the longest match, but is then returned to the input before the action is executed. So the action only sees the text matched by r. This type of pattern is called trailing context. (There are some combinations of `r/s' that flex cannot match correctly; see the notes in the Deficiencies / Bugs section of the flex manual regarding "dangerous trailing context".)
  ^r     an r, but only at the beginning of a line (i.e., when just starting to scan, or right after a newline has been scanned).
  r$     an r, but only at the end of a line (i.e., just before a newline).

More Lex Patterns
  r*        zero or more r's, where r is any regular expression
  r+        one or more r's
  r?        zero or one r's (that is, "an optional r")
  r{2,5}    anywhere from two to five r's
  r{2,}     two or more r's
  r{4}      exactly 4 r's
  {name}    the expansion of the "name" definition (see above)

Esoteric Lex Patterns
  "[xyz]\"foo"   the literal string: `[xyz]"foo'
  \x             if x is an `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C interpretation of \x. Otherwise, a literal `x' (used to escape operators such as `*')
  \0             a NUL character (ASCII code 0)
  \123           the character with octal value 123
  \x2a           the character with hexadecimal value 2a
  (r)            match an r; parentheses are used to override precedence

Matching Patterns
Output of lex is lex.yy.c and contains the function yylex()
int yylex() {
    ... various definitions and the actions in here ...
}
extern char yytext[];   // the lexeme (flex declares it as char *yytext by default)
extern int yyleng;      // the length of the lexeme "yytext"
Lookahead

Example 1 in flex Man Pages
        int num_lines = 0, num_chars = 0;
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void)
{
    yylex();
    printf( "# of lines = %d, # of chars = %d\n",
            num_lines, num_chars );
    return 0;
}

/* scanner for a toy Pascal-like language */
%{
/* need this for the call to atof() below */
#include <math.h>
%}
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
%%
{DIGIT}+    {
            printf( "An integer: %s (%d)\n", yytext,
                    atoi( yytext ) );
            }
…
excerpted from an example in the flex man pages

Figure 3.18 excerpts with modifications
%{
#include "y.tab.h"
int yylineno = 1;
%}
/* regular definitions */
delim     [ \t]
ws        {delim}+
letter    [A-Za-z]
digit     [0-9]
id        {letter}({letter}|{digit})*
number    …
%%

Figure 3.18b excerpts with modifications
%%
"\n"        { ++yylineno; }
{ws}        { /* no action and no return */ }
if          { return(IF); }
then        { return(THEN); }
else        { return(ELSE); }
{id}        { yylval = install_id(); return(ID); }
{number}    { yylval = install_num(); return(CONSTANT); }
"<="        { yylval = LE; return(RELOP); }
(more pattern-action pairs)
.           { printf("Unrecog. char %s line %d\n",
                     yytext, yylineno); }
%%
Code for install_id() and install_num();

Lookahead Operator
The Fortran compiler's first step is to squeeze out all the blanks, i.e. the blank was not a "separator"
Example where this presents a problem
  Do 5 J = 1, 25   becomes   Do5J=1,25   Initialize a do loop to statement #5 with loop variable J
  Do 5 J = 1.25    becomes   Do5J=1.25   Assign the variable "Do5J" the value 1.25
The problem: we can't recognize the token "do" until we look ahead and see the comma
Lex pattern for "do" using lookahead operator '/'
  Do/({letter}|{digit})*=({letter}|{digit})*,
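The trailing-context idea can be spelled out by hand. The sketch below is illustrative only (the function name starts_with_do_keyword is hypothetical and it assumes the blanks have already been squeezed out): it accepts the keyword only when the text after it has the shape (letter|digit)* = (letter|digit)* followed by a comma, which is what the pattern after the '/' asks for.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if the blank-free statement s begins with the keyword DO,
 * decided by looking ahead for  (letter|digit)* = (letter|digit)* ,
 * Returns 0 otherwise (for example when DO5J is just a variable name). */
int starts_with_do_keyword(const char *s)
{
    const char *p;

    assert(s != NULL);
    if (toupper((unsigned char)s[0]) != 'D' ||
        toupper((unsigned char)s[1]) != 'O')
        return 0;
    p = s + 2;                               /* lookahead starts after "DO" */
    while (isalnum((unsigned char)*p))       /* (letter | digit)* */
        p++;
    if (*p != '=')
        return 0;
    p++;
    while (isalnum((unsigned char)*p))       /* (letter | digit)* */
        p++;
    return *p == ',';                        /* the comma settles it */
}
```

With the two statements from the slide, starts_with_do_keyword("Do5J=1,25") returns 1 and starts_with_do_keyword("Do5J=1.25") returns 0. Only the two characters of the keyword would be consumed as the token; the lookahead text stays in the input.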

Lex / Flex Disambiguation Rules
What if several patterns match?
Lex follows the algorithm
  1. Match as long a string as is possible from the current character.
  2. Of those patterns that match this longest string, choose the one that is listed first in the lex specification file
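The two rules can be sketched in a few lines of C. This is only an illustration of the selection logic, not how lex is implemented (lex runs one combined DFA rather than trying each pattern in turn). The token codes, the rule table and the helper matchers are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum token { TOK_NONE, TOK_IF, TOK_ID };     /* hypothetical token codes */

/* Each matcher returns the length of the prefix it matches, 0 for no match. */
static size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s)        /* letter (letter | digit)* */
{
    size_t n = 0;
    if (isalpha((unsigned char)s[0]))
        for (n = 1; isalnum((unsigned char)s[n]); n++)
            ;
    return n;
}

struct rule { size_t (*match)(const char *); enum token tok; };

/* Order matters: "if" is listed before the identifier pattern. */
static const struct rule rules[] = {
    { match_if, TOK_IF },
    { match_id, TOK_ID },
};

/* Picks the rule with the longest match; on a tie the earliest rule wins. */
enum token next_token(const char *s, size_t *len)
{
    enum token best = TOK_NONE;
    size_t i;

    assert(s != NULL && len != NULL);
    *len = 0;
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        if (n > *len) {                      /* strict '>' keeps the first rule on ties */
            *len = n;
            best = rules[i].tok;
        }
    }
    return best;
}
```

On the input "if" both rules match two characters, so rule 2 picks TOK_IF because it is listed first. On "ifx" the identifier rule matches three characters, so rule 1 picks TOK_ID.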

`input()' reads the next character from the input stream. For example, the following is one way to eat up C comments:
%%
"/*"    {
        register int c;

        for ( ; ; ) {
            while ( (c = input()) != '*' && c != EOF )
                ;    /* eat up text of comment */
            if ( c == '*' ) {
                while ( (c = input()) == '*' )
                    ;
                if ( c == '/' )
                    break;    /* found the end */
            }
            if ( c == EOF ) {
                error( "EOF in comment" );
                break;
            }
        }
        }
Example from the GNU flex documentation

Main Routine taking arguments
int main(int argc, char **argv)
{
    ++argv, --argc;    /* skip over program name */
    if ( argc > 0 )
        yyin = fopen( argv[0], "r" );
    else
        yyin = stdin;
    yylex();
    return 0;
}

Symbol Tables

Hash Table
#define ENDSTR 0
#define MAXSTR 100
#include <stdio.h>
struct nlist {              /* basic table entry */
    char *name;
    int val;
    struct nlist *next;     /* next entry in chain */
};
#define HASHSIZE 100
static struct nlist *hashtab[HASHSIZE];    /* pointer table */

Hashtable
[Figure: hash table drawn as an array of buckets, each pointing to a chain of entries. Names shown: xbar, foo, boat, count, x. Types shown: int, int, float, func, double. Chains end in null.]

The Hash Function
/* PURPOSE: Hash determines hash value based on the sum of the
            character values in the string.
   USAGE: n = hash(s);
   DESCRIPTION OF PARAMETERS: s (array of char) string to be hashed
   AUTHOR: Kernighan and Ritchie
   LAST REVISION: 12/11/83
*/
int hash(char *s)    /* form hash value for string s */
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; )
        hashval += (unsigned char) *s++;    /* unsigned, so the index is never negative */
    return (hashval % HASHSIZE);
}

The lookup Function
/* PURPOSE: Lookup searches for entry in symbol table and returns a pointer
   USAGE: np = lookup(s);
   DESCRIPTION OF PARAMETERS: s (array of char) string searched for
   AUTHOR: Kernighan and Ritchie
   LAST REVISION: 12/11/83
*/
#include <string.h>    /* strcmp */
struct nlist *lookup(char *s)    /* look for s in hashtab */
{
    struct nlist *np;
    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->name) == 0)
            return(np);    /* found it */
    return(NULL);          /* not found */
}

The install Function
/*
   PURPOSE: Install checks hash table using lookup and if entry not found, it "installs" the entry.
   USAGE: np = install(name);
   DESCRIPTION OF PARAMETERS: name (array of char) name to install in symbol table
   AUTHOR: Kernighan and Ritchie, modified by Ron Sobczak
   LAST REVISION: 12/11/83
*/
#include <stdlib.h>    /* malloc, free */
#include <string.h>    /* strdup (POSIX) */
struct nlist *install(char *name)    /* put (name) in hashtab */
{
    struct nlist *np;
    int hashval;
    if ((np = lookup(name)) == NULL) {    /* not found */
        np = (struct nlist *) malloc(sizeof(*np));
        if (np == NULL)
            return(NULL);
        if ((np->name = strdup(name)) == NULL) {
            free(np);                     /* do not leak the half-built entry */
            return(NULL);
        }
        np->val = 0;
        hashval = hash(np->name);
        np->next = hashtab[hashval];      /* push on the front of the chain */
        hashtab[hashval] = np;
    }
    return(np);
}

NFAs (Non-deterministic Finite Automata)
Recall from last time
  M = (Σ, S, s0, δ, S_F)
    Σ - alphabet
    S - states
    δ - state transition function
    s0 - start state
    S_F - set of final or accepting states
  L(M) = { x such that it is possible to follow a path in the transition diagram, starting at s0 and labeled x, that ends in an accepting state }
NFAs relax the functional nature of the transition function
  δ(s, a), the next state for state s and input a, is a subset of states

Language Accepted by an NFA
A string x1x2 … xn is accepted by an NFA M = (Σ, S, s0, δ, S_F) if there is a path in the transition diagram from s0 to some state in S_F whose edge labels, with ε labels skipped, spell out x1x2 … xn

Figure 2.3 Equivalence NFA, DFA, RE
  RegExpr → NFA    Thompson Construction
  NFA → DFA        Subset Construction
  DFA → DFA        DFA minimization
  DFA → tables for scanner
  DFA → RegExpr    Kleene Construction

Converting Regular Expressions to NFAs
Ken Thompson (1968) outlined a regular expression to NFA conversion algorithm for use in an editor
  Future fame?
How would we use regular expressions in an editor?
Unix regular expressions
Grep family: Global Regular Expression Print, prints all lines in a file that contain a match to the regular expression
Variations
  Fgrep: fast, fixed; the "regular expression" is just a string
  Egrep: goes through NFA → DFA and minimization

Restrictions on NFAs in Thompson Construction
Constructs an NFA from the regular expression with the following restrictions:
  1. The NFA has a single start state, s0, and a single final state, sf.
  2. There are no transitions coming into the start state
  3. and no transitions leaving the final state.
  4. A state has at most 2 exiting ε-transitions and at most 2 entering ε-transitions.
[Figure: NFA with start state s0 and final state sf]

Base Cases of Thompson Construction
For a ∈ Σ the NFA M_a = (Σ, {s0, sf}, δ, s0, {sf}) that accepts it is:
  [Figure: s0 --a--> sf]
For ε the NFA M_ε = (Σ, {s0, sf}, δ, s0, {sf}) that accepts it is:
  [Figure: s0 --ε--> sf]

Recursive Cases of Thompson Construction: R|S
For regular expressions R and S with machines M_R and M_S
  M_R = (Σ, S_R, δ_R, r0, {rf})    M_S = (Σ, S_S, δ_S, s0, {sf})
Then the NFA
  M_R|S = (Σ, S_R ∪ S_S ∪ {new0, newf}, δ_R|S, new0, {newf})

Recursive Cases of Thompson Construction: RS
For regular expressions R and S with machines M_R and M_S
  M_R = (Σ, S_R, δ_R, r0, {rf})    M_S = (Σ, S_S, δ_S, s0, {sf})
Then the NFA
  M_RS = (Σ, S_R ∪ S_S ∪ {new0, newf}, δ_RS, new0, {newf})

Recursive Cases of Thompson Construction: R*
For regular expression R with machine M_R
  M_R = (Σ, S_R, δ_R, r0, {rf})
Then the NFA
  M_R* = (Σ, S_R ∪ {new0, newf}, δ_R*, new0, {newf})

Thompson example
Fig 2.5 has one; let's do another: RegExpr = (a|b)*b(a|b)*
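The transition diagrams for the recursive cases did not survive in this transcript, so the sketch below fills them in with the usual Thompson wiring; treat the ε-edges as an assumption rather than a copy of the slides. Following the tuples on the slides, every case (including RS) adds a fresh new0 and newf. All names (frag_sym, frag_alt, frag_cat, frag_star, nfa_accepts, build_example) are hypothetical, and the state limit of 64 is only there so a state set fits in one machine word.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STATES 64
#define NONE (-1)

/* A Thompson state has at most one labeled edge and at most two ε-edges. */
struct state { int sym, sym_to, eps[2]; };
struct frag  { int start, final; };          /* single start, single final */

static struct state nfa[MAX_STATES];
static int nstates = 0;

static int new_state(void)
{
    assert(nstates < MAX_STATES);
    nfa[nstates] = (struct state){ NONE, NONE, { NONE, NONE } };
    return nstates++;
}

static void add_eps(int from, int to)
{
    int slot = (nfa[from].eps[0] == NONE) ? 0 : 1;
    assert(nfa[from].eps[slot] == NONE);     /* restriction 4: at most 2 ε-edges out */
    nfa[from].eps[slot] = to;
}

static struct frag fresh(void)               /* allocates new0 and newf */
{
    struct frag f;
    f.start = new_state();
    f.final = new_state();
    return f;
}

struct frag frag_sym(int c)                  /* base case: s0 --c--> sf */
{
    struct frag f = fresh();
    nfa[f.start].sym = c;
    nfa[f.start].sym_to = f.final;
    return f;
}

struct frag frag_alt(struct frag r, struct frag s)    /* R|S */
{
    struct frag f = fresh();
    add_eps(f.start, r.start); add_eps(f.start, s.start);
    add_eps(r.final, f.final); add_eps(s.final, f.final);
    return f;
}

struct frag frag_cat(struct frag r, struct frag s)    /* RS */
{
    struct frag f = fresh();
    add_eps(f.start, r.start); add_eps(r.final, s.start);
    add_eps(s.final, f.final);
    return f;
}

struct frag frag_star(struct frag r)                  /* R* */
{
    struct frag f = fresh();
    add_eps(f.start, r.start); add_eps(f.start, f.final);   /* enter or skip R */
    add_eps(r.final, r.start); add_eps(r.final, f.final);   /* repeat or leave */
    return f;
}

static uint64_t closure(uint64_t set)        /* ε-closure of a set of states */
{
    uint64_t prev;
    do {
        prev = set;
        for (int s = 0; s < nstates; s++)
            if ((set >> s) & 1)
                for (int k = 0; k < 2; k++)
                    if (nfa[s].eps[k] != NONE)
                        set |= (uint64_t)1 << nfa[s].eps[k];
    } while (set != prev);
    return set;
}

int nfa_accepts(struct frag f, const char *x)
{
    uint64_t cur = closure((uint64_t)1 << f.start);
    for (; *x != '\0'; x++) {
        uint64_t next = 0;
        for (int s = 0; s < nstates; s++)
            if (((cur >> s) & 1) && nfa[s].sym == (unsigned char)*x)
                next |= (uint64_t)1 << nfa[s].sym_to;
        cur = closure(next);
    }
    return (int)((cur >> f.final) & 1);
}

struct frag build_example(void)              /* (a|b)*b(a|b)* */
{
    struct frag left  = frag_star(frag_alt(frag_sym('a'), frag_sym('b')));
    struct frag right = frag_star(frag_alt(frag_sym('a'), frag_sym('b')));
    return frag_cat(frag_cat(left, frag_sym('b')), right);
}
```

After struct frag f = build_example();, nfa_accepts(f, "aba") returns 1 and nfa_accepts(f, "aaa") returns 0, since the language is all strings over {a, b} that contain at least one b. The acceptance test simulates the NFA directly on sets of states, which is the same idea the subset construction precomputes.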

NFA to DFA: the Subset Construction
In an NFA, given an input string, we make choices about which way to go. We can think of it as being in a subset of the states.
To convert to a DFA
  The states of the DFA correspond to sets of states of the NFA
  Transitions of the DFA are when you can move between the sets in the NFA

NFA to DFA: the Subset Construction
To convert an NFA M = (Σ, S_N, δ_N, N0, F_N) to a DFA M' = (Σ, S_D, δ_D, D0, F_D)
We will use a pair of functions to facilitate seeing all of the states we can get to from one on a given input.
  Move(si, a) is the set of states reachable from si by a
  ε-closure(si) is the set of states reachable from si by ε-transitions alone
The algorithm:
  Start state derived from N0 of the NFA
  Take its ε-closure: D0 = ε-closure(N0)
  Take the image of D0, Move(D0, a) for each a ∈ Σ, and take its ε-closure
  Iterate until no more states are added
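The algorithm above can be sketched with bit sets, one bit per NFA state. Everything here is illustrative and assumed rather than taken from the slides: the names (struct nfa, eps_closure, nfa_move, subset_construction, example_nfa), the two-symbol alphabet {a, b} encoded as 0 and 1, and the limit of 32 NFA states. The example NFA has states 0, 1, 2 with 0 --ε--> 1, 1 looping on a and b, and 1 --b--> 2, so it accepts (a|b)*b when state 2 is final.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NFA 32
#define MAX_DFA 64
#define NSYMS   2                    /* symbol 0 = 'a', symbol 1 = 'b' */

typedef uint32_t set_t;              /* bit i set <=> NFA state i is in the set */

struct nfa {
    int   nstates;
    set_t eps[MAX_NFA];              /* eps[s]: targets of ε-edges leaving s */
    set_t delta[MAX_NFA][NSYMS];     /* delta[s][k]: targets of s on symbol k */
};

struct dfa {
    int   nstates;
    set_t subset[MAX_DFA];           /* the NFA subset each DFA state stands for */
    int   next[MAX_DFA][NSYMS];      /* DFA transition table */
};

/* ε-closure: keep adding ε-successors until nothing new appears. */
static set_t eps_closure(const struct nfa *n, set_t set)
{
    set_t prev;
    do {
        prev = set;
        for (int s = 0; s < n->nstates; s++)
            if (set & ((set_t)1 << s))
                set |= n->eps[s];
    } while (set != prev);
    return set;
}

/* Move(set, k): all states reachable from some state in set on symbol k. */
static set_t nfa_move(const struct nfa *n, set_t set, int k)
{
    set_t out = 0;
    for (int s = 0; s < n->nstates; s++)
        if (set & ((set_t)1 << s))
            out |= n->delta[s][k];
    return out;
}

/* Builds the DFA; n0 is the NFA start state. The empty subset, if reached,
 * simply becomes a dead DFA state. */
void subset_construction(const struct nfa *n, int n0, struct dfa *d)
{
    d->nstates = 1;
    d->subset[0] = eps_closure(n, (set_t)1 << n0);        /* D0 */
    for (int i = 0; i < d->nstates; i++) {                /* d->nstates grows as we go */
        for (int k = 0; k < NSYMS; k++) {
            set_t t = eps_closure(n, nfa_move(n, d->subset[i], k));
            int j = 0;
            while (j < d->nstates && d->subset[j] != t)   /* seen this subset before? */
                j++;
            if (j == d->nstates) {                        /* no: add a new DFA state */
                assert(d->nstates < MAX_DFA);
                d->subset[d->nstates++] = t;
            }
            d->next[i][k] = j;
        }
    }
}

struct nfa example_nfa(void)         /* accepts (a|b)*b with final state 2 */
{
    struct nfa n = { .nstates = 3 };
    n.eps[0]      = (set_t)1 << 1;
    n.delta[1][0] = (set_t)1 << 1;
    n.delta[1][1] = ((set_t)1 << 1) | ((set_t)1 << 2);
    return n;
}
```

For the example NFA the construction yields three DFA states: D0 = {0,1}, D1 = {1} and D2 = {1,2}. Every state goes to D1 on a and to D2 on b, and D2 is the only accepting state because it is the only subset containing NFA state 2. The loop ends exactly when a full pass adds no new subset, which is the "iterate until no more states are added" step.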