regular expressions & automata
DESCRIPTION
Regular Expressions & Automata . Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park. Complexity to Computability. Just looked at algorithmic complexity How many steps required At high end of complexity is computability Decidable Undecidable. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/1.jpg)
Regular Expressions & Automata
Fawzi EmadChau-Wen Tseng
Department of Computer ScienceUniversity of Maryland, College Park
![Page 2: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/2.jpg)
Complexity to Computability
Just looked at algorithmic complexityHow many steps required
At high end of complexity is computabilityDecidableUndecidable
![Page 3: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/3.jpg)
Complexity to Computability
Approach complexity from different directionLook at simple models of computation
Regular expressionsFinite automataTuring Machines
![Page 4: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/4.jpg)
Overview
Regular expressionsNotationPatternsJava support
AutomataLanguagesFinite State MachinesTuring MachinesComputability
![Page 5: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/5.jpg)
Regular Expression (RE)
Notation for describing simple string patternsVery useful for text processing
Finding / extracting pattern in textManipulating stringsAutomatically generating web pages
![Page 6: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/6.jpg)
Regular Expression
Regular expression is composed ofSymbolsOperators
Concatenation ABUnion A | BClosure A*
![Page 7: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/7.jpg)
Definitions
AlphabetSet of symbols Examples {a, b}, {A, B, C}, {a-z,A-Z,0-9}…
StringsSequences of 0 or more symbols from alphabetExamples , “a”, “bb”, “cat”, “caterpillar”…
LanguagesSets of stringsExamples , {}, {“a”}, {“bb”, “cat”}…
empty string
![Page 8: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/8.jpg)
More Formally
Regular expression describes a language over an alphabetL(E) is language for regular expression E
Set of strings generated from regular expressionString in language if it matches pattern specified by regular expression
![Page 9: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/9.jpg)
Regular Expression Construction
Every symbol is a regular expressionExample “a”
REs can be constructed from other REs usingConcatenationUnion |Closure *
![Page 10: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/10.jpg)
Regular Expression Construction
ConcatenationA followed by BL(AB) = { ab | a L(A) AND b L(B) }
Example a
{“a”}ab
{“ab”}
![Page 11: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/11.jpg)
Regular Expression Construction
UnionA or BL(A | B) = { a | a L(A) OR a L(B) }
Example a | b
{“a”, “b”}
![Page 12: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/12.jpg)
Regular Expression Construction
ClosureZero or more AL(A*) = { a | a = OR a L(A) OR a L(A)L(A)… }
Examplea*
{, “a”, “aa”, “aaa”, “aaaa” …}(ab)*c
{“c”, “abc”, “ababc”, “abababc”…}
![Page 13: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/13.jpg)
Regular Expressions in Java
Java supports regular expressions In java.util.regex.*Applies to String class in Java 1.4
Introduces additional specification methodsSimplifies specificationDoes not increase power of regular expressionsCan simulate with concatenation, union, closure
![Page 14: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/14.jpg)
Regular Expressions in Java
Concatenationab “ab”(ab)c “abc”
Union ( bar | or square brackets [ ] )a | b “a”, “b”[abc] “a”, “b”, “c”
Closure (star *)(ab)* , “ab”, “abab”, “ababab” …[ab]* , “a”, “b”, “aa”, “ab”, “ba”, “bb” …
![Page 15: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/15.jpg)
Regular Expressions in Java
One or more (plus +)(a)+ One or more “a”s
Range (dash –)[a–z] Any lowercase letters[0–9] Any digit
Complement (caret ^ at beginning of RE)[^a] Any symbol except “a”[^a–z] Any symbol except lowercase letters
![Page 16: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/16.jpg)
Regular Expressions in Java
PrecedenceHigher precedence operators take effect first
Precedence orderParentheses ( … )Closure a* b+Concatenation abUnion a | bRange [ … ]
![Page 17: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/17.jpg)
Regular Expressions in Java
Examplesab+ “ab”, “abb”, “abbb”, “abbbb”…(ab)+ “ab”, “abab”, “ababab”, …ab | cd “ab”, “cd”a(b | c)d “abd”, “acd”[abc]d “ad”, “bd”, “cd”
When in doubt, use parentheses
![Page 18: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/18.jpg)
Regular Expressions in Java
Predefined character classes[.] Any character except end of line[\d] Digit: [0-9][\D] Non-digit: [^0-9][\s] Whitespace character: [ \t\n\x0B\f\r][\S] Non-whitespace character: [^\s][\w] Word character: [a-zA-Z_0-9][\W] Non-word character: [^\w]
![Page 19: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/19.jpg)
Regular Expressions in Java
Literals using backslash \Need two backslashJava compiler will interpret 1st backslash for String
Examples\\] “]”\\. “.”\\\\ “\”
4 backslashes interpreted as \\ by Java compiler
![Page 20: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/20.jpg)
Using Regular Expressions in Java
1. Compile patternimport java.util.regex.*;Pattern p = Pattern.compile("[a-z]+");
2. Create matcher for specific piece of text Matcher m = p.matcher("Now is the time");
3. Search textBoolean found = m.find();
Returns true if pattern is found in text
![Page 21: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/21.jpg)
Using Regular Expressions in Java
If pattern is found in textm.group() string found m.start() index of the first character matchedm.end() index after last character matchedm.group() is same as substring(m.start(), m.end())
Calling m.find() againStarts search after end of current pattern matchIf no more matches, return to beginning of string
![Page 22: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/22.jpg)
Complete Java ExampleCode
Outputow – is – the – time –
import java.util.regex.*;public class RegexTest { public static void main(String args[]) { Pattern p = Pattern.compile(“[a-z]+”); Matcher m = p.matcher(“Now is the time”); while (m.find()) { System.out.print(m.group() + “ – ”);} } }
![Page 23: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/23.jpg)
Language Recognition
Accept string if and only if in language Abstract representation of computationPerforming language recognition can be
SimpleStrings with even number of 1’s
HardStrings representing legal Java programs
Impossible!Strings representing nonterminating Java programs
![Page 24: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/24.jpg)
Automata
Simple abstract computersCan be used to recognize languagesFinite state machine
States + transitions
Turing machineStates + transitions + tape
![Page 25: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/25.jpg)
Finite State Machine
StatesStartingAcceptingFinite number allowed
TransitionsState to stateLabeled by symbol
L(M) = { w | w ends in a 1}
q1 q2
0
10 1
Start State
Accept Statea
![Page 26: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/26.jpg)
Finite State Machine
Operations Move along transitions based on symbol Accept string if ends up in accept stateReject string if ends up in non-accepting state
q1 q2
0
10 1
“011” Accept
“10” Reject
![Page 27: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/27.jpg)
Finite State Machine
PropertiesPowerful enough to recognize regular expressionsIn fact, finite state machine regular expression
Languages recognized by
finite state machines
Languages recognized by
regular expressions
1-to-1 mapping
![Page 28: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/28.jpg)
Turing Machine
Defined by Alan Turing in 1936Finite state machine + tapeTape
Infinite storageRead / write one symbol at tape headMove tape head one space left / right
Tape Head
… …q1 q2
0
10 1
![Page 29: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/29.jpg)
Turing Machine
Allowable actionsRead symbol from current squareWrite symbol to current squareMove tape head leftMove tape head rightGo to next state
![Page 30: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/30.jpg)
Turing Machine
* 1 0 0 1 0 *… …
Current State
Current Content
Value to Write
Direction to Move
New state to enter
START * * Left MOVING
MOVING 1 0 Left MOVING
MOVING 0 1 Left MOVING
MOVING * * No move HALT
Tape Head
![Page 31: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/31.jpg)
Turing Machine
OperationsRead symbol on current squareSelect action based on symbol & current stateAccept string if in accept stateReject string if halts in non-accepting stateReject string if computation does not terminate
Halting problemIt is undecidable in general whether long-running computations will eventually accept
![Page 32: Regular Expressions & Automata](https://reader033.vdocuments.us/reader033/viewer/2022061413/568160dc550346895dd00b71/html5/thumbnails/32.jpg)
Computability
ComputabilityA language is computable if it can be recognized by some algorithm with finite number of steps
Church-Turing thesisTuring machine can recognize any language computable on any machine
IntuitionTuring machine captures essence of computing
Program (finite state machine) + Memory (tape)