![Page 1: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/1.jpg)
cs3102: Theory of Computation
Class 10: DFAs in Practice
Spring 2010
University of Virginia
David Evans
![Page 2: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/2.jpg)
Menu
• Today:
  – Preparing for Exam 1
  – Language class for Deterministic PDAs
  – Applications of DFAs
• Thursday:
  – Exam Review (if you send questions and/or topics)
  – Applications of probabilistic DFAs and Grammars
![Page 3: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/3.jpg)
Exam 1
• In class, next Tuesday, 2 March
• Covers:
Classes 1-9 (10 and 11)
Sipser Ch 0-2
Problem Sets 1-3 + Comments
Exam 1
Note: unlike nearly all other sets we draw in this class, all of these sets are finite, and the size (roughly) represents the relative size.
![Page 4: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/4.jpg)
What’s on the Exam?
• Definitions
  – Language, problem, sets
• Constructing and understanding computing models
  – Finite automata (DFA, NFA)
  – Pushdown automata (DPDA, NPDA)
  – Grammars (Context-Free Grammar)
• Language Classes: Regular and Context Free
  – Show a language is in the class
  – Show a language is not in the class
  – Prove or disprove a closure property
• Proof Methods
  – Proof by Induction
  – Proof by Construction
  – Understand and use the pumping lemmas for RL and CFL
Sample exam on website should give you a good idea what to expect
Your exam will probably also have “what’s wrong with this proof” questions
![Page 5: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/5.jpg)
Exam 1 Notesheet
For Exam 1, you may use only:
– Your own brain and body
– A low-tech writing instrument (pen or pencil)
– A single page (both sides) of notes that you create
You may work with others to create your notes page.
![Page 6: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/6.jpg)
Admiral Grace Hopper
John von Neumann
Albert Einstein
![Page 7: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/7.jpg)
Exam Help Available
• Office Hours:
  – Thursdays, 8:30-9:30am
  – Thursdays, after class
  – Fridays, 10-11:30am (Sonali in Stacks)
  – Mondays, 1:15-3pm
• TA’s Exam Review Session
  – This Sunday, 5-6:30pm, Olsson 228E
![Page 8: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/8.jpg)
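[Figure: nested sets. All Languages contains Context-Free (CFG or NPDA), which contains Regular Languages (DFA, NFA, RE, RG), which contains Finite Languages. Example languages marked in the diagram: $w$, $a^n$, $a^n b^n c^n$, $ww$.]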
Where are the languages recognized by a Deterministic PDA?
![Page 9: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/9.jpg)
Proving Set Equivalence
$A = B \iff A \subseteq B \text{ and } B \subseteq A$
Sets A and B are equivalent if A is a subset of B and B is a subset of A.
[Figure: two set diagrams, one illustrating $A \subseteq B$ and one illustrating $B \subseteq A$.]
![Page 10: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/10.jpg)
Proving Formalism Equivalence
![Page 11: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/11.jpg)
Proving Formalism Equivalence
![Page 12: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/12.jpg)
Proving Formalism Non-Equivalence
![Page 13: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/13.jpg)
[Figure: nested sets. All Languages contains Context-Free (CFG or NPDA), which contains Regular Languages (DFA, NFA, RE, RG); $a^n b^n$ is marked in the diagram.]
Which of these could be true?
![Page 14: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/14.jpg)
[Figure: two candidate diagrams, each showing Regular Languages (DFA, NFA, RE, RG), Context-Free (NPDA), and DPDA regions.]
How can we distinguish these two plausible possibilities?
![Page 15: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/15.jpg)
[Figure: the same two candidate diagrams, each showing Regular Languages (DFA, NFA, RE, RG), Context-Free (NPDA), and DPDA regions.]
How can we distinguish these two plausible possibilities?
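• If the DPDA languages are a proper subset of the context-free languages: find some language A that can be recognized by some NPDA but not by any DPDA.
• If the two classes are the same: prove by construction that for any NPDA, there is a DPDA that recognizes the same language.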
![Page 16: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/16.jpg)
![Page 17: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/17.jpg)
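[Figure: NPDA state diagram. Transition labels, written as input, pop → push:]
ε, ε → $
a, ε → +
ε, ε → ε
b, + → ε
ε, $ → ε
ε, ε → ε
b, + → ε
b, ε → ε
ε, $ → ε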
![Page 18: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/18.jpg)
Proof by contradiction: Assume there is a DPDA that recognizes A. Show how to construct an NPDA that recognizes some language we know is not context free.
Proved by construction: We showed an NPDA that recognizes A.
![Page 19: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/19.jpg)
Proof by contradiction. Suppose there is a DPDA M that recognizes A. It must be in an accept state only after processing $a^i b^i$ and $a^i b^{2i}$.
… a, α → β   b, α → β   (2i transitions, consuming $a^i b^i$)
… b, α → β   b, α → β   (i transitions, consuming $b^i$)
Construct M′: copy all the states on the second half, replacing b with c:
… a, α → β   b, α → β   … c, α → β   c, α → β
What is the language of M’?
![Page 20: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/20.jpg)
Proof by contradiction. Suppose there is a DPDA M that recognizes A. It must be in an accept state only after processing $a^i b^i$ and $a^i b^{2i}$.
… a, α → β   b, α → β   … b, α → β   b, α → β
Construct M′: copy all the states on the second half, replacing b with c:
… a, α → β   b, α → β   … c, α → β   c, α → β
Not a Context-Free Language!
We have a contradiction: if A is in L(DPDA), we could use the DPDA that recognizes A to construct a DPDA that recognizes a non-context-free language! Hence, A must not be in L(DPDA).
![Page 21: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/21.jpg)
[Figure: nested sets. All Languages, Context-Free (CFG or NPDA), Deterministic Context-Free, and Regular Languages (DFA, NFA, RE, RG), with $a^n b^n$ and A marked.]
Deterministic Context-Free Languages: recognized by a DPDA (or DCFG)
Context-Free Languages $\supset$ Deterministic Context-Free Languages $\supset$ Regular Languages
![Page 22: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/22.jpg)
DFAs in Practice
![Page 23: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/23.jpg)
Malware Scanner
W32.Bolzano.Gen: 576a222bd2c20400558b4c240cd9ffff07fbffffff{0-2}5c4e544c445200{0-2}5c57494e4e545c73797374656d33325c6e746f736b726e6c2e65786500{0-29}3b4658
W32.MyLife.E: 7a6172793230*40656d61696c2e636f6d
Note: These are the signatures from ClamAV, an open source virus scanner.
Files
Network Traffic
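The two signatures above are regular expressions in disguise. As a minimal sketch, the function below translates the two wildcard forms that appear on this slide into a Python bytes regex: `*` (any number of bytes) and `{n-m}` (between n and m bytes). The name `signature_to_regex` is hypothetical, and other wildcard forms in the real ClamAV signature format are deliberately not handled.

```python
import re

def signature_to_regex(sig: str) -> "re.Pattern[bytes]":
    """Translate a hex signature using only `*` and `{n-m}` wildcards
    into a compiled bytes regex. Hypothetical helper, not ClamAV code."""
    parts = []
    i = 0
    while i < len(sig):
        if sig[i] == "*":                       # gap of any length
            parts.append(b".*")
            i += 1
        elif sig[i] == "{":                     # bounded gap, e.g. {0-2}
            close = sig.index("}", i)
            low, high = sig[i + 1:close].split("-")
            parts.append(b".{%d,%d}" % (int(low), int(high)))
            i = close + 1
        else:                                   # two hex digits = one literal byte
            parts.append(re.escape(bytes.fromhex(sig[i:i + 2])))
            i += 2
    # DOTALL so that "." also matches the newline byte
    return re.compile(b"".join(parts), re.DOTALL)

MYLIFE = "7a6172793230*40656d61696c2e636f6d"    # W32.MyLife.E from the slide
mylife_re = signature_to_regex(MYLIFE)
infected = mylife_re.search(b"..zary20\n--@email.com..") is not None
clean = mylife_re.search(b"nothing to see here") is not None
```

Decoding the literal bytes shows the MyLife signature is just `zary20`, any gap, then `@email.com`, which hints at how easily a loose signature could match a benign file.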
![Page 24: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/24.jpg)
String Matching
q0 q1 q2 q3 q4 q5
t r u t h
We hold these truths to be self-evident, that …
How much work is it to scan a string of length N for a signature?
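One way to answer the question: a DFA takes exactly one transition per input character, so scanning costs N steps regardless of the signature. The sketch below builds a six-state matching DFA for `truth` (states 0 to 5, mirroring q0 to q5) and counts the steps. The function names are hypothetical, and the table construction is the straightforward one, not an optimized one.

```python
def build_match_dfa(pattern: str):
    """delta[state][ch] is the next state; state len(pattern) means 'just matched'.
    Characters missing from a state's table send the DFA back to state 0."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in set(pattern):
            # Next state = length of the longest pattern prefix that is
            # a suffix of (what we have matched so far + ch).
            seen = pattern[:state] + ch
            k = min(m, len(seen))
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            delta[state][ch] = k
    return delta

def scan_with_dfa(text: str, pattern: str):
    """Return (start indices of all matches, number of DFA transitions taken)."""
    delta = build_match_dfa(pattern)
    state, steps, hits = 0, 0, []
    for i, ch in enumerate(text):
        state = delta[state].get(ch, 0)   # exactly one transition per character
        steps += 1
        if state == len(pattern):
            hits.append(i - len(pattern) + 1)
    return hits, steps

TEXT = "We hold these truths to be self-evident, that"
hits, steps = scan_with_dfa(TEXT, "truth")
```

Here `steps` always equals `len(TEXT)`: the DFA never backs up and never skips, which is the baseline the next slides try to beat.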
![Page 25: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/25.jpg)
Faster String Matching
q0 q1 q2 q3 q4 q5
t r u t h
We hold these truths to be self-evident, that …
s[4] = h?   s[10] = h?
s[9] = t?   s[8] = u?
[Figure: the pattern "truth" shown at successive alignments against the text.]
Skip table:
a, b, c, d, e, f, g, i, j, k, l, m, n, o, p, q, r, s, v, w, x, y, z: 6
h: 0
r: 4
t: 1
u: 2
![Page 26: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/26.jpg)
DFA / Skipping DFA
Is a “Skipping DFA” still a DFA?
(That is, does it still only accept the Regular Languages?)
![Page 27: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/27.jpg)
J. Strother Moore (UT Austin)
Boyer-Moore Fast String Searching Algorithm (1977)
Best case: N/(w+1) comparisons where N is the length of the text and w is the length of the search string
Is this fast enough for a malware scanner?
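To make the skipping idea concrete, here is a minimal sketch of the Horspool simplification of Boyer-Moore, not the full 1977 algorithm. It compares from the right end of the pattern and shifts by a table keyed on the text character under the pattern's last position. Its shift values follow Horspool's convention, so they differ from the skip table on the earlier slide; the function name is hypothetical.

```python
def horspool_search(text: str, pattern: str):
    """Return (index of the first match or -1, number of character comparisons)."""
    w = len(pattern)
    # Shift for a character = distance from its last occurrence in
    # pattern[:-1] to the end of the pattern; unseen characters shift by w.
    shift = {ch: w - 1 - i for i, ch in enumerate(pattern[:-1])}
    comparisons = 0
    pos = 0
    while pos + w <= len(text):
        j = w - 1
        while j >= 0:                      # compare right to left
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:                          # every position matched
            return pos, comparisons
        pos += shift.get(text[pos + w - 1], w)
    return -1, comparisons

TEXT = "We hold these truths to be self-evident, that"
index, comparisons = horspool_search(TEXT, "truth")
```

On this text the search finds `truth` after far fewer comparisons than the number of characters scanned past, because most alignments are rejected with a single comparison. That still handles only one signature per pass.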
![Page 28: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/28.jpg)
Virus Detection
Total number of signatures: 720,033
[Figure: signature database size (MB, axis from 2 to 12) by date, 11/01 through 03/06, for Symantec and RAV AV. From Nate Paul’s study.]
Can we scan one input for many possible malware signatures quickly?
![Page 29: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/29.jpg)
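Combining DFAs?
Regular languages closed under union:
[Figure: a new start state q0 with ε-transitions to qA0 and qB0, the start states of the two machines; a-labeled transitions lead on to qA1 and qB1, and so on.]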
How many states are there now?
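One way to explore the question is to build the union directly as a DFA with the product construction, which tracks both machines at once. The ε-transition picture needs only one new state as an NFA, but a deterministic version can need up to |Q_A| × |Q_B| states. The sketch below counts the reachable product states for two small example DFAs; the tuple representation and the example machines are assumptions made for illustration.

```python
def union_dfa(dfa_a, dfa_b, alphabet):
    """Each DFA is (start, accepting_set, delta) with a total
    delta[(state, symbol)] -> state. Returns (start, accepting, delta, states)."""
    start_a, accept_a, delta_a = dfa_a
    start_b, accept_b, delta_b = dfa_b
    start = (start_a, start_b)
    delta, accepting = {}, set()
    frontier, states = [start], {start}
    while frontier:                        # explore reachable state pairs only
        state = frontier.pop()
        qa, qb = state
        if qa in accept_a or qb in accept_b:
            accepting.add(state)           # union: either machine accepts
        for sym in alphabet:
            nxt = (delta_a[(qa, sym)], delta_b[(qb, sym)])
            delta[(state, sym)] = nxt
            if nxt not in states:
                states.add(nxt)
                frontier.append(nxt)
    return start, accepting, delta, states

def accepts(dfa, word):
    start, accepting, delta = dfa[:3]
    state = start
    for sym in word:
        state = delta[(state, sym)]
    return state in accepting

# Example machines (assumed): even number of a's, and strings ending in b.
even_a = (0, {0}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1})
ends_b = (0, {1}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1})
both = union_dfa(even_a, ends_b, "ab")
state_count = len(both[3])
```

With two 2-state machines the product already has 4 reachable states. Repeating this naively across hundreds of thousands of signatures is what motivates sharing prefixes in a trie.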
![Page 30: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/30.jpg)
Signatures
First byte:   Set of signatures:
00000000      ~720000/256
00000001      ~720000/256
00000010      ~720000/256
…
11111111      ~720000/256
![Page 31: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/31.jpg)
Try a Trie
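[Figure: trie. The root q0 has edges labeled 0x00, 0x01, 0x02, …, 0xFF leading to states q00, q01, q02, …, qFF. Each of those states has edges labeled 0x00, 0x01, 0x02, …, 0xFF leading to second-level states q0000, q0001, q0002, …, q01FF, ….]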
720000/(256*256) ~ 11
Alfred V. Aho and Margaret J. Corasick, 1975
q0000Alureona
0x02
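The trie idea, extended with failure links so the scanner never re-reads input, is the Aho-Corasick construction credited on this slide. The following is a compact sketch of that technique for byte strings, using the classic small example rather than real signatures. The function names and the list-based state tables are assumptions made for illustration.

```python
from collections import deque

def build_automaton(patterns):
    """Build the trie (goto), failure links (fail), and per-state outputs (out)."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for byte in pat:                   # walk or extend the trie
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].append(pat)
    queue = deque(goto[0].values())        # depth-1 states fail to the root
    while queue:                           # breadth-first: shallow states first
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]   # inherit shorter matches
            queue.append(nxt)
    return goto, fail, out

def scan_many(data, automaton):
    """Return (start index, pattern) for every match, in one pass over data."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, byte in enumerate(data):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits

automaton = build_automaton([b"he", b"she", b"his", b"hers"])
hits = scan_many(b"ushers", automaton)
```

The scan makes a single left-to-right pass over the input no matter how many patterns are loaded, so adding signatures grows the automaton but not the number of input bytes examined.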
![Page 33: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/33.jpg)
Evasive Malware
Metamorphic Code: as virus propagates, each new copy is different
How hard is it to automatically modify code without changing its behavior?
![Page 34: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/34.jpg)
Detecting Evasive Malware
• Less exact signatures (e.g., W32.MyLife.E: 7a6172793230*40656d61696c2e636f6d)
  – Dangerous – start matching benign programs if you’re not careful!
• Behavioral signatures: match the behavior, not the program text
  – Undecidable in general (we’ll see in a few weeks)
  – Expensive and difficult in practice (but done by all decent scanners)
![Page 35: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/35.jpg)
Faster String Scanning
![Page 36: Cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans](https://reader034.vdocuments.us/reader034/viewer/2022051620/56649e0d5503460f94af73b7/html5/thumbnails/36.jpg)
Charge
• We focus on DFAs, NFAs, PDAs, CFGs, etc. as abstract models: Number of states, time to process, etc. don’t matter
• Lots of real applications of these models: but in practice, what matters is different
If you have topics you want me to review, post comments (on today’s class announcement) by 5pm tomorrow.