Page 1: Lecture 3 RegExpr  NFA  DFA Topics Lex, Flex Thompson Construction Subset construction (maybe) Readings: 3.3-3.5, 3.7, 3.6 January 18, 2006 CSCE 531

Lecture 3 RegExpr → NFA → DFA

Topics
  Lex, Flex
  Thompson Construction
  Subset construction (maybe)

Readings: 3.3-3.5, 3.7, 3.6

January 18, 2006

CSCE 531 Compiler Construction

– 2 – CSCE 531 Spring 2006

Overview

Last Time
  Why Study Compilers?
  Regular Expressions
  DFAs
  Language accepted by a DFA

Today's Lecture
  Lex then Flex
  Thompson Construction
  Examples
  NFA → DFA, the subset construction
  ε-closure, move(s,a)

References
  Flex: http://www.cse.sc.edu/~matthews/Courses/531/Resources.html

Review of Regular Expressions

Language denoted by a regular expression
  Recursive definition

Equivalence of two regular expressions r and s
  L(s) = L(r)

More examples like Example 3.3
  If r = (a | b) (a | b) a what is L(r)?
  If s = (a | b | c)* then L(s) =
  If s = (a | b | ca)* then L(s) =
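As a hedged illustration of the first question, the sketch below hard-codes a membership test for r = (a | b) (a | b) a. The function name in_L_r is an assumption made for illustration and is not from the lecture. Under the usual semantics, every string in L(r) has exactly three symbols, so L(r) would be {aaa, aba, baa, bba}.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if w is in L((a|b)(a|b)a), else 0. */
int in_L_r(const char *w)
{
    assert(w != NULL);
    if (strlen(w) != 3)
        return 0;                   /* r always matches exactly 3 symbols */
    if (w[0] != 'a' && w[0] != 'b')
        return 0;                   /* first (a|b) */
    if (w[1] != 'a' && w[1] != 'b')
        return 0;                   /* second (a|b) */
    return w[2] == 'a';             /* trailing a */
}
```

A call such as in_L_r("bba") checks one candidate string. The starred expressions in the other two questions denote infinite languages, so they cannot be enumerated this way.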

Properties of Regular Expressions

Axiom                         Description
r | s = s | r                 | is commutative
r | (s | t) = (r | s) | t     | is associative
(r s) t = r (s t)             concatenation is associative
r (s | t) = rs | rt           concatenation distributes over |
(s | t) r = sr | tr             ( | = alternation)
r ε = r                       ε is the identity for concatenation
ε r = r
(r | ε)* = r*
r** = r*                      * is idempotent

Regular definitions

A regular definition is a sequence of definitions of the form

d_1 → r_1
d_2 → r_2
…
d_n → r_n

Where
  each d_i is a distinct name
  each r_i is a regular expression over the symbols Σ ∪ {d_1, d_2, …, d_{i-1}}

Regular Definition Example 3.4

letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
ID → letter ( letter | digit )*

In Lex/Flex this regular definition will be written as

ID    {letter}({letter}|{digit})*
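To make the ID definition concrete, here is a minimal hand-written sketch of what a recognizer for letter ( letter | digit )* might look like in C. The name match_id is hypothetical, and the sketch assumes the default "C" locale, where isalpha and isalnum accept only the ASCII letters and digits used in Example 3.4.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the longest prefix of s that matches
   letter ( letter | digit )*, or 0 if s does not start with a letter. */
size_t match_id(const char *s)
{
    size_t n;

    assert(s != NULL);
    if (!isalpha((unsigned char) s[0]))
        return 0;                       /* must start with a letter */
    n = 1;
    while (isalnum((unsigned char) s[n]))
        n++;                            /* ( letter | digit )* */
    return n;
}
```

For the input "x1y+2" the matched prefix is "x1y". A generated scanner does the same job with a table-driven DFA rather than hand-written tests.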

Lexical Analyzer Generators

Fig 3.17 Creating and using a lexical analyzer with lex or flex

Lex source lang.l → Lex/Flex → lex.yy.c
lex.yy.c → C compiler → a.out
Input stream → a.out → Sequence of tokens

Lex and Flex

Johnson, Porter, Ackley, Ross CACM 1968 - described the construction of scanners using Finite State techniques

Lex - A Lexical Analyzer Generator

Flex - Fast Lex
  Manual page ( man flex )
  531 Resources Page
  http://www.gnu.org/software/flex/manual/html_mono/flex.html

Sections of a Lex Specification File

definitions
%%
pattern-action pairs
%%
user code

Small Example

Here is a program which compresses multiple blanks and tabs down to a single blank, and throws away whitespace found at the end of a line:

%%
[ \t]+        putchar( ' ' );
[ \t]+$       /* ignore this token */

Notes
1. Only one separator "%%": no routines section

Lex Patterns

x           match the character `x'
.           any character (byte) except newline
[xyz]       a "character class"; in this case, the pattern matches either an `x', a `y', or a `z'
[abj-oZ]    a "character class" with a range in it; matches an `a', a `b', any letter from `j' through `o', or a `Z'
[^A-Z]      a "negated character class", i.e., any character but those in the class. In this case, any character EXCEPT an uppercase letter.
[^A-Z\n]    any character EXCEPT an uppercase letter or a newline

More Lex Patterns

rs      the regular expression r followed by the regular expression s; called "concatenation"
r|s     either an r or an s
r/s     (note this is a slash, not a |) an r but only if it is followed by an s. The text matched by s is included when determining whether this rule is the longest match, but is then returned to the input before the action is executed. So the action only sees the text matched by r. This type of pattern is called trailing context. (There are some combinations of `r/s' that flex cannot match correctly; see the notes on "dangerous trailing context" in the Deficiencies / Bugs section of the flex manual.)
^r      an r, but only at the beginning of a line (i.e., when just starting to scan, or right after a newline has been scanned).
r$      an r, but only at the end of a line (i.e., just before a newline).

More Lex Patterns

r*        zero or more r's, where r is any regular expression
r+        one or more r's
r?        zero or one r's (that is, "an optional r")
r{2,5}    anywhere from two to five r's
r{2,}     two or more r's
r{4}      exactly 4 r's
{name}    the expansion of the "name" definition (see above)

Esoteric Lex Patterns

"[xyz]\"foo"    the literal string: `[xyz]"foo'
\x              if x is an `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C interpretation of \x. Otherwise, a literal `x' (used to escape operators such as `*')
\0              a NUL character (ASCII code 0)
\123            the character with octal value 123
\x2a            the character with hexadecimal value 2a
(r)             match an r; parentheses are used to override precedence (see below)

Matching Patterns

Output of lex is lex.yy.c and contains the function yylex()

int yylex() {
    ... various definitions and the actions in here ...
}

extern char yytext[];   // the lexeme (flex declares it as char * unless %array is used)
extern int yyleng;      // the length of the lexeme "yytext"

Lookahead

Example 1 in flex Man Pages

        int num_lines = 0, num_chars = 0;
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main()
{
    yylex();
    printf( "# of lines = %d, # of chars = %d\n",
            num_lines, num_chars );
    return 0;
}

/* scanner for a toy Pascal-like language */

%{
/* need this for the call to atof() below */
#include <math.h>
%}

DIGIT    [0-9]
ID       [a-z][a-z0-9]*

%%

{DIGIT}+    {
            printf( "An integer: %s (%d)\n", yytext,
                    atoi( yytext ) );
            }

… excerpted from an example in the flex man pages

Figure 3.18 excerpts with modifications

%{
#include "y.tab.h"
int yylineno=1;
%}

/* regular definitions */
delim     [ \t]
ws        {delim}+
letter    [A-Za-z]
digit     [0-9]
id        {letter}({letter}|{digit})*
number    …

%%

Figure 3.18b excerpts with modifications

%%
"\n"        { ++yylineno; }
{ws}        { /* no action and no return */ }
if          { return(IF); }
then        { return(THEN); }
else        { return(ELSE); }
{id}        { yylval = install_id(); return(ID); }
{number}    { yylval = install_num(); return(CONSTANT); }
"<="        { yylval = LE; return(RELOP); }
(more pattern-action pairs)
.           { printf("Unrecog. char %s line %d\n",
                     yytext, yylineno); }
%%
Code for install_id() and install_num();

Lookahead Operator

In Fortran the blank is not a "separator": the compiler's first step is to squeeze out all the blanks

Example where this presents a problem

Do 5 J = 1, 25    Do5J=1,25    Initialize a do loop to statement #5 with loop variable J
Do 5 J = 1.25     Do5J=1.25    Assign the variable "Do5J" the value 1.25

The problem: we can't recognize the token "do" until we look ahead and see the comma

Lex pattern for "do" using lookahead operator '/'
  Do/({letter}|{digit})*=({letter}|{digit})*,

Lex / Flex Disambiguation Rules

What if several patterns match?

Lex follows the algorithm

1. Match as long a string as is possible from the current character.
2. Of those patterns that match this longest string, choose the one that is listed first in the lex specification file.
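The two rules can be sketched in C as a loop over the patterns in specification order. This is only an illustration of the selection policy, not how lex is implemented (a generated scanner runs a single DFA). The names matcher, match_if, match_id and pick_rule are assumptions for this sketch, with one keyword rule listed ahead of an identifier rule.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A matcher returns the length of the prefix it matches, 0 for no match. */
typedef size_t (*matcher)(const char *);

static size_t match_if(const char *s)       /* pattern: if */
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s)       /* pattern: [a-z][a-z0-9]* */
{
    size_t n;

    if (!islower((unsigned char) s[0]))
        return 0;
    for (n = 1; islower((unsigned char) s[n]) || isdigit((unsigned char) s[n]); n++)
        ;
    return n;
}

/* Rules in specification order: index 0 is listed first. */
enum { RULE_IF = 0, RULE_ID = 1, NRULES = 2 };
static const matcher rules[NRULES] = { match_if, match_id };

/* Returns the index of the winning rule, or -1 if nothing matches;
   *len receives the length of the chosen lexeme. */
int pick_rule(const char *s, size_t *len)
{
    int best = -1;
    size_t best_len = 0;
    int i;

    assert(s != NULL && len != NULL);
    for (i = 0; i < NRULES; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {     /* strictly longer wins; a tie keeps the earlier rule */
            best = i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

On the input "if" both rules match two characters, so the earlier rule (RULE_IF) is chosen. On "iffy" the identifier rule matches four characters and wins by rule 1.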

`input()' reads the next character from the input stream. For example, the following is one way to eat up C comments:

%%
"/*"    {
        register int c;

        for ( ; ; )
            {
            while ( (c = input()) != '*' && c != EOF )
                ;    /* eat up text of comment */

            if ( c == '*' )
                {
                while ( (c = input()) == '*' )
                    ;
                if ( c == '/' )
                    break;    /* found the end */
                }

            if ( c == EOF )
                {
                error( "EOF in comment" );
                break;
                }
            }
        }

Example from GNU documentation

Main Routine taking arguments

int main(int argc, char **argv)
{
    ++argv, --argc;    /* skip over program name */
    if ( argc > 0 )
        yyin = fopen( argv[0], "r" );
    else
        yyin = stdin;
    yylex();
    return 0;
}

Symbol Tables

Hash Table

#define ENDSTR 0
#define MAXSTR 100
#include <stdio.h>

struct nlist {                 /* basic table entry */
    char *name;
    int val;
    struct nlist *next;        /* next entry in chain */
};

#define HASHSIZE 100
static struct nlist *hashtab[HASHSIZE];    /* pointer table */

Hashtable

[Figure: hash table drawn as an array of bucket pointers, each heading a chain of entries. Labels visible in the figure: xbar, foo, boat, count, x, int, int, float, func, double, null.]

The Hash Function

/* PURPOSE: Hash determines hash value based on the sum of the
            character values in the string.
   USAGE: n = hash(s);
   DESCRIPTION OF PARAMETERS: s(array of char) string to be hashed
   AUTHOR: Kernighan and Ritchie
   LAST REVISION: 12/11/83
*/
int hash(char *s)    /* form hash value for string s */
{
    unsigned hashval;    /* unsigned so the index can never be negative */

    for (hashval = 0; *s != '\0'; )
        hashval += (unsigned char) *s++;
    return (int) (hashval % HASHSIZE);
}

The lookup Function

/* PURPOSE: Lookup searches for entry in symbol table and returns a pointer
   USAGE: np = lookup(s);
   DESCRIPTION OF PARAMETERS: s(array of char) string searched for
   AUTHOR: Kernighan and Ritchie
   LAST REVISION: 12/11/83
*/
#include <string.h>    /* strcmp */

struct nlist *lookup(char *s)    /* look for s in hashtab */
{
    struct nlist *np;

    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->name) == 0)
            return(np);    /* found it */
    return(NULL);          /* not found */
}

The install Function

/*
   PURPOSE: Install checks hash table using lookup and if entry not found, it "installs" the entry.
   USAGE: np = install(name);
   DESCRIPTION OF PARAMETERS: name(array of char) name to install in symbol table
   AUTHOR: Kernighan and Ritchie, modified by Ron Sobczak
   LAST REVISION: 12/11/83
*/

#include <stdlib.h>    /* malloc, free */
#include <string.h>    /* strdup */

struct nlist *install(char *name)    /* put (name) in hashtab */
{
    struct nlist *np;
    int hashval;

    if ((np = lookup(name)) == NULL) {    /* not found */
        np = (struct nlist *) malloc(sizeof(*np));
        if (np == NULL)
            return(NULL);
        if ((np->name = strdup(name)) == NULL) {
            free(np);    /* do not leak the half-built entry */
            return(NULL);
        }
        hashval = hash(np->name);
        np->next = hashtab[hashval];    /* push onto the front of the chain */
        hashtab[hashval] = np;
    }
    return(np);
}

NFAs (Non-deterministic Finite Automata)

Recall from last time

M = (Σ, S, s_0, δ, S_F)
  Σ - alphabet
  S - states
  s_0 - start state
  δ - state transition function
  S_F - set of final or accepting states

L(M) = { x such that it is possible to follow a path in the transition diagram, starting at s_0 and labeled x, that ends in an accepting state }

NFAs relax the functional nature of the transition function

δ(s, a), the next state for state s and input a, is a subset of states
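One way to picture "δ(s, a) is a subset of states" is to store each subset as a bit mask. The sketch below is an assumption-laden illustration: the three-state NFA (accepting strings over {a, b} that end in "ab"), the type set_t, and the function nfa_accepts are all made up for this example, and the NFA has no ε-transitions.

```c
#include <assert.h>
#include <stddef.h>

enum { NSTATES = 3, NSYMS = 2 };    /* states 0..2, alphabet {a, b} */
typedef unsigned set_t;             /* bit i set <=> state i is in the set */

/* delta[s][c] is the SET of next states for state s on symbol c
   (c = 0 for 'a', c = 1 for 'b'). Hypothetical NFA for "ends in ab". */
static const set_t delta[NSTATES][NSYMS] = {
    /* state 0 */ { (1u << 0) | (1u << 1), 1u << 0 },  /* a: {0,1}  b: {0} */
    /* state 1 */ { 0,                     1u << 2 },  /* a: {}     b: {2} */
    /* state 2 */ { 0,                     0       }   /* no moves         */
};
static const set_t accepting = 1u << 2;                /* S_F = {2} */

/* Returns 1 if some path labeled w from state 0 ends in an accepting state. */
int nfa_accepts(const char *w)
{
    set_t cur = 1u << 0;            /* start in {s_0} */
    size_t i;
    int s;

    assert(w != NULL);
    for (i = 0; w[i] != '\0'; i++) {
        set_t next = 0;
        int c = w[i] - 'a';

        if (c < 0 || c >= NSYMS)
            return 0;               /* symbol outside the alphabet */
        for (s = 0; s < NSTATES; s++)
            if (cur & (1u << s))
                next |= delta[s][c];    /* union of delta(s, c) over current states */
        cur = next;
    }
    return (cur & accepting) != 0;
}
```

Tracking the whole set of possible states at once avoids backtracking over the nondeterministic choices. The same idea underlies the subset construction later in the lecture.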

Language Accepted by an NFA

A string x_1 x_2 … x_n is accepted by an NFA M = (Σ, S, s_0, δ, S_F) if

Figure 2.3 Equivalence NFA, DFA, RE

RegExpr → NFA              Thompson Construction
NFA → DFA                  Subset Construction
DFA → DFA                  DFA minimization
DFA → tables for scanner
DFA → RegExpr              Kleene Construction

Converting Regular Expressions to NFAs

Ken Thompson (1968) outlined a regular expression to NFA conversion algorithm for use in an editor
  Future fame?

How would we use regular expressions in an editor?

Unix regular expressions

Grep family - Global Regular Expression Print - prints all lines in a file that contain a match to the regular expression

Variations
  Fgrep - fast, fixed pattern: the regular expression is just a string
  Egrep - goes through NFA → DFA and minimization

Restrictions on NFAs in Thompson Construction

Constructs an NFA from the regular expression with the following restrictions:

1. The NFA has a single start state, s_0, and a single final state, s_f.
2. There are no transitions coming into the start state
3. and no transitions leaving the final state.
4. A state has at most 2 exiting ε-transitions and at most 2 entering ε-transitions.

[Figure: an NFA with start state s_0 and final state s_f]

Base Cases of Thompson Construction

For a ∈ Σ the NFA M_a = (Σ, {s_0, s_f}, δ, s_0, {s_f}) that accepts it is:

For ε the NFA M_ε = (Σ, {s_0, s_f}, δ, s_0, {s_f}) that accepts it is:

Recursive Cases of Thompson Construction R|S

For regular expressions R and S with machines M_R and M_S

M_R = (Σ, S_R, δ_R, r_0, {r_f})    M_S = (Σ, S_S, δ_S, s_0, {s_f})

Then the NFA

M_{R|S} = (Σ, S_R ∪ S_S ∪ {new_0, new_f}, δ_{R|S}, new_0, {new_f})

Recursive Cases of Thompson Construction RS

For regular expressions R and S with machines M_R and M_S

M_R = (Σ, S_R, δ_R, r_0, {r_f})    M_S = (Σ, S_S, δ_S, s_0, {s_f})

Then the NFA

M_{RS} = (Σ, S_R ∪ S_S ∪ {new_0, new_f}, δ_{RS}, new_0, {new_f})

Recursive Cases of Thompson Construction R*

For regular expression R with machine M_R

M_R = (Σ, S_R, δ_R, r_0, {r_f})

Then the NFA

M_{R*} = (Σ, S_R ∪ {new_0, new_f}, δ_{R*}, new_0, {new_f})
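The transition diagrams for these cases did not survive in the transcript, so the following C sketch fills in the ε-edges with the usual textbook wiring. That wiring is an assumption, not a reconstruction of the slides. Following the tuples above, every operator (including concatenation) adds a fresh new_0 and new_f. All names (struct state, struct frag, frag_sym, frag_alt, frag_cat, frag_star) are hypothetical.

```c
#include <assert.h>

enum { MAXSTATES = 64, NONE = -1 };

/* Each state needs at most one labeled edge and two epsilon edges. */
struct state {
    int sym;        /* label of the symbol edge, or NONE */
    int sym_to;     /* target of the symbol edge */
    int eps[2];     /* targets of epsilon edges, or NONE */
};

struct frag { int start, final; };  /* single start state, single final state */

static struct state st[MAXSTATES];
static int nstates = 0;

static int new_state(void)
{
    assert(nstates < MAXSTATES);
    st[nstates].sym = st[nstates].sym_to = NONE;
    st[nstates].eps[0] = st[nstates].eps[1] = NONE;
    return nstates++;
}

static struct frag new_frag(void)   /* allocates new_0 and new_f */
{
    struct frag f;
    f.start = new_state();
    f.final = new_state();
    return f;
}

static void add_eps(int from, int to)
{
    int slot = (st[from].eps[0] == NONE) ? 0 : 1;
    assert(st[from].eps[slot] == NONE);     /* at most 2 exiting epsilon edges */
    st[from].eps[slot] = to;
}

struct frag frag_sym(int c)                 /* base case: a in the alphabet */
{
    struct frag f = new_frag();
    st[f.start].sym = c;
    st[f.start].sym_to = f.final;
    return f;
}

struct frag frag_alt(struct frag r, struct frag s)  /* R|S */
{
    struct frag f = new_frag();
    add_eps(f.start, r.start);
    add_eps(f.start, s.start);
    add_eps(r.final, f.final);
    add_eps(s.final, f.final);
    return f;
}

struct frag frag_cat(struct frag r, struct frag s)  /* RS */
{
    struct frag f = new_frag();
    add_eps(f.start, r.start);
    add_eps(r.final, s.start);
    add_eps(s.final, f.final);
    return f;
}

struct frag frag_star(struct frag r)                /* R* */
{
    struct frag f = new_frag();
    add_eps(f.start, r.start);
    add_eps(f.start, f.final);      /* zero repetitions */
    add_eps(r.final, r.start);      /* go around again */
    add_eps(r.final, f.final);      /* leave the loop */
    return f;
}
```

Each fragment is meant to be used as an operand only once. Under that assumption no state ever gets more than two exiting ε-edges, and the overall start and final states keep no incoming and no outgoing edges respectively.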

Thompson example

Fig 2.5 has one; let's do another: RegExpr = (a|b)*b(a|b)*

NFA to DFA the Subset Construction

In an NFA, given an input string, we make choices about which way to go. Instead of choosing, we can think of the NFA as being in a subset of its states: all the states it could have reached so far.

To convert to a DFA

The states of the DFA correspond to sets of states of the NFA

A transition of the DFA on a symbol goes from one set to the set of NFA states reachable from it on that symbol

NFA to DFA the Subset Construction

To convert an NFA M = (Σ, S_N, δ_N, N_0, F_N) to a DFA M = (Σ, S_D, δ_D, D_0, F_D)

We will use a pair of functions to facilitate seeing all of the states we can get to from one on a given input.

Move(s_i, a) is the set of states reachable from s_i by a
ε-closure(s_i) is the set of states reachable from s_i by ε

The algorithm:
  Start state derived from N_0 of the NFA
  Take its ε-closure: D_0 = ε-closure(N_0)
  Take the image of D_0, Move(D_0, a), for each a ∈ Σ, and take its ε-closure
  Iterate until no more states are added
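The algorithm can be sketched in C with bit masks standing in for sets of NFA states. This is a minimal sketch under stated assumptions: the four-state NFA for a*b is made up for the example, and the names set_t, eps_closure, nfa_move, dstate, dtran and subset_construction are hypothetical, not from the lecture or the textbook.

```c
#include <assert.h>

enum { NN = 4, NSYMS = 2, MAXD = 16 };  /* NFA states, alphabet {a, b}, DFA limit */
typedef unsigned set_t;                 /* bit i set <=> NFA state i is in the set */

/* Hypothetical NFA for a*b:
   0 -eps-> 1,  1 -a-> 1,  1 -eps-> 2,  2 -b-> 3,  state 3 accepting. */
static const set_t eps[NN] = { 1u << 1, 1u << 2, 0, 0 };
static const set_t sym[NN][NSYMS] = {
    { 0, 0 }, { 1u << 1, 0 }, { 0, 1u << 3 }, { 0, 0 }
};

set_t eps_closure(set_t t)              /* states reachable from t by eps edges */
{
    set_t prev;
    int s;

    do {
        prev = t;
        for (s = 0; s < NN; s++)
            if (t & (1u << s))
                t |= eps[s];
    } while (t != prev);                /* repeat until nothing new is added */
    return t;
}

set_t nfa_move(set_t t, int c)          /* Move(T, c): one step on symbol c */
{
    set_t out = 0;
    int s;

    for (s = 0; s < NN; s++)
        if (t & (1u << s))
            out |= sym[s][c];
    return out;
}

set_t dstate[MAXD];                     /* DFA state i is the NFA set dstate[i] */
int dtran[MAXD][NSYMS];                 /* DFA transitions; -1 = no transition */

int subset_construction(void)           /* returns the number of DFA states */
{
    int n = 0, i, j, c;

    dstate[n++] = eps_closure(1u << 0); /* D_0 = eps-closure(N_0) */
    for (i = 0; i < n; i++) {           /* dstate[i..n-1] is the worklist */
        for (c = 0; c < NSYMS; c++) {
            set_t t = eps_closure(nfa_move(dstate[i], c));

            dtran[i][c] = -1;
            if (t == 0)
                continue;               /* no NFA state reachable on c */
            for (j = 0; j < n && dstate[j] != t; j++)
                ;                       /* has this set been seen before? */
            if (j == n) {
                assert(n < MAXD);
                dstate[n++] = t;        /* a new DFA state was added */
            }
            dtran[i][c] = j;
        }
    }
    return n;
}
```

A DFA state i would be accepting when dstate[i] contains an accepting NFA state, here when (dstate[i] & (1u << 3)) is nonzero. The loop stops once a full pass adds no new sets, which is the "iterate until no more states are added" step.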