1
Monitoring Extended Regular Expressions
Grigore Rosu
University of Illinois at Urbana-Champaign, USA
Joint work with Mahesh Viswanathan and Koushik Sen
2
Increasing Software Reliability
Current solutions
– Human review of code and testing: most used in practice; usually ad hoc, intensive human support
– (Advanced) static analysis: often scales up; false positives and negatives, annotations
– (Traditional) formal methods: model checking and theorem proving; general, good confidence, do not always scale up
4
Runtime Verification and Monitoring
Idea: Let system run and observe execution trace. If that violates or appears to violate requirements then report error or guide the
program to avoid or to hit error.
5
Runtime Verification and Monitoring
PathExplorer, developed jointly with Havelund
– Used on 70,000 lines of C++ code (K9 Rover)
– Found a deadlock in ~10 seconds
– Confirmed a datarace suspicion
Runtime Verification Workshop
– ’01 France (CAV), ’02 Denmark (CAV), ’03 USA (CAV)
– ’04 Spain (ETAPS), …
6
PathExplorer - Overview
[Figure: the running program sends events over a socket to the observer]
(Joint work with Klaus Havelund of NASA Ames)
7
PathExplorer – the Observer
[Figure: the event stream enters the dispatcher, which feeds the datarace, deadlock, temporal and ERE modules; each module emits warnings. Panel labels: Predictive Analysis, Specification-Based Monitoring.]
paxmodules
  module datarace = ‘java pax.Datarace’;
  module deadlock = ‘java pax.Deadlock’;
  module temporal = ‘java pax.Temporal spec’;
  module ERE = ‘java pax.Ere spec’;
end
8
Why (Extended) Regular Expressions?
Ordinary programmers and software engineers understand and use regular expressions
– Perl, Python, etc.
Safety policies are often regular patterns on sequences of states/events:
– (idle* open (read + write)* close)*
– Complementation needed to say what should not happen: ¬ (any* start1 (¬ end1)* start2 any*)
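As a rough illustration of the first policy, the sketch below checks a finished trace with Python's `re` module. The one-letter event encoding `CODES` and the function name `follows_policy` are assumptions made for this example only; they are not part of PathExplorer. Python's `re` offers no complement operator, which is one reason EREs need their own machinery.

```python
import re

# Hypothetical one-letter encoding of events, chosen only for this sketch.
CODES = {"idle": "i", "open": "o", "read": "r", "write": "w", "close": "c"}

# (idle* open (read + write)* close)*, written with the codes above.
POLICY = re.compile(r"(?:i*o[rw]*c)*")

def follows_policy(events):
    """Return True if the complete event trace matches the policy."""
    word = "".join(CODES[e] for e in events)
    return POLICY.fullmatch(word) is not None

print(follows_policy(["idle", "open", "read", "write", "close"]))  # True
print(follows_policy(["open", "read"]))                             # False
```

Note that this checks a whole stored trace at once; it does not consume events one by one.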
9
Extended Regular Expressions (ERE)
Regular expressions with complement
R ::= Φ | ε | A | R + R | R · R | R* | ¬R
Language of an ERE
L(Φ) = ∅
L(ε) = {ε}
L(A) = {A}
L(R + R’) = L(R) ∪ L(R’)
L(R · R’) = {ww’ | w ∈ L(R), w’ ∈ L(R’)}
L(R*) = (L(R))*
L(¬R) = Σ* \ L(R)
Intersection R ∩ R’ := ¬(¬R + ¬R’)
10
ERE Membership Problem
Given w ∈ Σ* and R, is it the case that w ∈ L(R)?
Patterns in strings; many applications
– Programming languages (Perl, Python)
– Molecular biology (Knight-Myers ’95)
– Monitoring
Efficient solutions are of great practical interest
From now on, n is the length of the word/trace w and m is the size of the ERE R
– n is typically much, much larger than m
11
What is known (I)
If R does not contain negations, then
– Transform R into an NFA of size O(m) (Aho ’90)
  Solution in time O(nm) and space O(m)
  Improved by Myers ’92 (JACM): time/space O(nm / log n)
– Transform R into a DFA of size O(2^m) (Aho ’90)
  Solution in time O(nm) and space O(2^m)
  Note: transitions in a DFA take logarithmic time
Negations and their nesting make the membership problem highly non-trivial
12
Problems with Negation (I)
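How to complement an NFA?
– Just complementing the set of final states is wrong!
[Figure: an NFA A over {a, b} with L(A) = {ab}; swapping final and non-final states gives A’ with L(A’) = {ab, a, ε}, which still contains ab and so is not the complement of L(A)]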
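The pitfall can be reproduced with a small simulation. The transition table below is a hypothetical NFA chosen to yield the two languages quoted on the slide; the original figure's automaton may have a different shape. `accepts` tracks the set of reachable states, and `language` enumerates accepted words up to a length bound.

```python
from itertools import product

# Hypothetical NFA: state 0 is initial; reading "a" nondeterministically
# goes to 1 or 3, and only the path through 1 reaches final state 2.
DELTA = {
    (0, "a"): {1, 3},
    (1, "b"): {2},
    (3, "b"): {4},
}
STATES = {0, 1, 2, 3, 4}
FINALS = {2}

def accepts(word, finals):
    """Subset simulation: accept if some reachable state is final."""
    current = {0}
    for letter in word:
        current = set().union(*(DELTA.get((q, letter), set()) for q in current))
    return bool(current & finals)

def language(finals, max_len=3):
    """All accepted words over {a, b} of length at most max_len."""
    words = ("".join(p) for n in range(max_len + 1)
             for p in product("ab", repeat=n))
    return {w for w in words if accepts(w, finals)}

print(sorted(language(FINALS)))           # ['ab']
print(sorted(language(STATES - FINALS)))  # ['', 'a', 'ab']
```

The word ab is accepted both before and after the swap, because one run ends in a final state and another run ends in a non-final one.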
13
Problems with Negation (II)
DFAs can be complemented safely by just complementing the set of final states, but
– NFA -> DFA implies exponential state blowup!
– For k nested negations, 2^(2^(…(2^m)…)) states, a tower of k exponentials
– This makes the membership problem non-elementarily more complex in the context of (nested) negations
14
What is known (II)
Dynamic programming algorithm (Hopcroft-Ullman ’79)
– Time O(n^3 m) and space O(n^2 m)
Special synchronized alternating automata
– (Yamamoto ’02): intersection but not negation
– (Kupferman-Zuhovitzky ’02): general ERE
– Time O(n^2 m) and space O(nm + kn^2), where k is the number of negations and intersections
Algorithms above store the word; this is unacceptable in many practical situations
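To make the "stores the word" drawback concrete, here is a minimal memoized sketch in the spirit of the dynamic programming approach: it decides, for every subexpression and every substring w[i:j], whether the substring belongs to the subexpression's language. The tuple encoding of EREs and the name `member` are assumptions of this sketch, not the algorithm as published.

```python
from functools import lru_cache

def member(r, w):
    """Decide w ∈ L(r) by tabulating (subexpression, i, j) triples."""
    @lru_cache(maxsize=None)
    def m(r, i, j):
        kind = r[0]
        if kind == "empty":
            return False
        if kind == "eps":
            return i == j
        if kind == "sym":
            return j == i + 1 and w[i] == r[1]
        if kind == "or":
            return m(r[1], i, j) or m(r[2], i, j)
        if kind == "cat":   # try every split point k
            return any(m(r[1], i, k) and m(r[2], k, j) for k in range(i, j + 1))
        if kind == "star":  # empty, or a non-empty first piece then the rest
            return i == j or any(m(r[1], i, k) and m(r, k, j)
                                 for k in range(i + 1, j + 1))
        return not m(r[1], i, j)  # kind == "not": complement is a lookup
    return m(r, 0, len(w))

a, b = ("sym", "a"), ("sym", "b")
any_star = ("star", ("or", a, b))
# ¬(any* a b any*): the word never contains "ab"
no_ab = ("not", ("cat", ("cat", any_star, ("cat", a, b)), any_star))
print(member(no_ab, "bba"), member(no_ab, "aab"))  # True False
```

Negation is cheap here, but the table is indexed by positions in w, so the entire word must be available before the check can run.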
15
Desired Behavior - Monitoring
[Figure: the running program sends events over a socket to the observer]
Algorithms that process and then discard each event are desired in practice, since words or execution traces can be extremely long
16
Challenges and Talk Overview
What is the lower space/time bound of the ERE monitoring problem (to process one event)?
– Ω(2^(c·√m)) for space
What is a reasonable upper bound for the ERE monitoring problem (to process one event)?
– Rewriting algorithm in O(22m2) space/time
How to generate optimal monitors for ERE?
– Optimal monitor generation by coinduction
17
Lower Bound for ERE Monitoring (I)
Consider the language (Chandra-Kozen-Stockmeyer ’81 in alternation; Kupferman-Vardi ’98 in model checking)
Lk = {u # w # u’ $ w | w ∈ {0,1}^k and u, u’ ∈ {0,1,#}*}
We show that
• There is an ERE Rk of size Θ(k^2) with L(Rk) = Lk
• Any monitoring algorithm for Lk needs Ω(2^k) space
So we can conclude that the space lower bound for ERE monitoring is Ω(2^(c·√m))
18
Lower Bound for ERE Monitoring (II)
Lk = {u # w # u’ $ w | w ∈ {0,1}^k and u, u’ ∈ {0,1,#}*}
Note that the size of Rk is Θ(k^2) and L(Rk) = Lk
Rk = [(¬$)* $ (¬$)*] ∩ [(0+1+#)* # ⋂_{i=0}^{k-1} ((0+1)^i 0 (0+1)^(k-i-1) # (0+1+#)* $ (0+1)^i 0 (0+1)^(k-i-1) + (0+1)^i 1 (0+1)^(k-i-1) # (0+1+#)* $ (0+1)^i 1 (0+1)^(k-i-1))]
– (¬$)* $ (¬$)*: there should be exactly one $ symbol, and …
– (0+1+#)* #: there should be some sequence of 0, 1, #, followed by a # and then by a w …
– Intersection over i: each letter in w should appear after $ at exactly the same position …
19
Lower Bound for ERE Monitoring (III)
Lk = {u # w # u’ $ w | w ∈ {0,1}^k and u, u’ ∈ {0,1,#}*}
• Let A be a monitor for Lk
• When A reads symbol $, it should “remember” exactly those w that have been seen so far
• There are 2^(2^k) possible distinct situations to remember, so at least 2^k bits of memory are needed by A to encode each of these situations
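The counting argument can be made tangible with a direct checker for Lk. This is only an illustrative sketch (the names `in_Lk` and `situations` are invented here): the set `seen` is the information a monitor has to hold at the moment $ arrives, and it can be any subset of {0,1}^k.

```python
def in_Lk(word, k):
    """Check word ∈ Lk = {u # w # u' $ w} for w ∈ {0,1}^k."""
    if word.count("$") != 1:
        return False
    prefix, suffix = word.split("$")
    if set(prefix) - set("01#") or set(suffix) - set("01") or len(suffix) != k:
        return False
    pieces = prefix.split("#")
    # Blocks enclosed by '#' on both sides are all pieces except the
    # first and the last; only those of length k can play the role of w.
    seen = {p for p in pieces[1:-1] if len(p) == k}
    return suffix in seen

def situations(k):
    """Number of distinct sets of k-bit blocks a monitor may need to tell apart."""
    return 2 ** (2 ** k)

print(in_Lk("0#01#1#10#$10", 2))  # True: "10" occurs between two '#'
print(in_Lk("01#$01", 2))         # False: "01" is not preceded by '#'
print(situations(3))              # 256
```

Distinguishing 2^(2^k) sets takes log2 of that many bits, which is where the 2^k figure comes from.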
20
Idea of an Event-Consuming Algorithm
“Consume” each event as it arrives, generating a new ERE monitoring requirement
Use the notion of derivative
– R{a} is the ERE that should hold after seeing event a, in order for R to hold now
– Algorithm A stores an ERE R, and when an event a arrives it replaces R by R{a}; at the end of the trace A checks whether ε ∈ R
– How can we generate R{a} efficiently?
– How can we store R{a} compactly?
21
ERE Syntax
Sorts Ere and Event; subsort Event < Ere
Operations
Φ : -> Ere
ε : -> Ere
_+_ : Ere Ere -> Ere [assoc comm id: empty]
_ _ : Ere Ere -> Ere [assoc id: nil]
_* : Ere -> Ere
¬_ : Ere -> Ere
22
Derivatives
Operations
_{_} : Ere Event -> Ere
_?_:_ : Bool Ere Ere -> Ere
ε ∈ _ : Ere -> Bool
Equations
(R1 + R2){a} = R1{a} + R2{a}
(R1 R2){a} = R1{a} R2 + ((ε ∈ R1) ? R2{a} : Φ)
(R*){a} = R{a} R*
(¬R){a} = ¬(R{a})
ε{a} = Φ
Φ{a} = Φ
b{a} = (b == a) ? ε : Φ
Obvious!
• Related work: Antimirov and Mosses
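The derivative equations translate almost line by line into code. The sketch below is a minimal Python rendering, assuming a nested-tuple encoding of EREs; the talk's own implementation is in Maude, and names such as `deriv` and `accepts` are chosen here for illustration. No simplification is applied, so terms grow with every event.

```python
# ERE terms as nested tuples (an encoding assumed for this sketch).
EMPTY, EPS = ("empty",), ("eps",)
def sym(a): return ("sym", a)
def alt(r, s): return ("or", r, s)
def cat(r, s): return ("cat", r, s)
def star(r): return ("star", r)
def neg(r): return ("not", r)

def nullable(r):
    """ε ∈ R."""
    kind = r[0]
    if kind in ("eps", "star"): return True
    if kind in ("empty", "sym"): return False
    if kind == "or": return nullable(r[1]) or nullable(r[2])
    if kind == "cat": return nullable(r[1]) and nullable(r[2])
    return not nullable(r[1])                      # kind == "not"

def deriv(r, a):
    """R{a}: what must hold after event a for R to hold now."""
    kind = r[0]
    if kind in ("empty", "eps"): return EMPTY
    if kind == "sym": return EPS if r[1] == a else EMPTY
    if kind == "or": return alt(deriv(r[1], a), deriv(r[2], a))
    if kind == "cat":
        rest = deriv(r[2], a) if nullable(r[1]) else EMPTY
        return alt(cat(deriv(r[1], a), r[2]), rest)
    if kind == "star": return cat(deriv(r[1], a), r)
    return neg(deriv(r[1], a))                     # kind == "not"

def accepts(r, trace):
    """Consume events one at a time, then test ε ∈ R."""
    for event in trace:
        r = deriv(r, event)
    return nullable(r)

any_star = star(alt(sym("a"), sym("b")))
no_ab = neg(cat(cat(any_star, cat(sym("a"), sym("b"))), any_star))
print(accepts(no_ab, ["b", "a", "a"]), accepts(no_ab, ["a", "b"]))  # True False
```

Each event is discarded as soon as its derivative has been taken, which is the event-consuming behavior asked for earlier in the talk.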
23
Three Important Simplifying Rules
Without any other rules, R{a1}{a2}…{an} can grow to unbounded size
Simplifying rules
Φ R = Φ
R + R = R
R1 R + R2 R = (R1 + R2) R
Let R be the rewriting system defined so far
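One way to see the effect of these rules is to build them into the term constructors, so that every derivative is simplified as it is formed. The following is a sketch under assumptions: sums are kept as sets (which gives associativity, commutativity and R + R = R for free), Φ and ε act as identities as declared in the syntax, and the distributivity rule is applied only when two summands share the same right factor in this particular tuple encoding. It is not the Maude rewriting system itself.

```python
EMPTY, EPS = ("empty",), ("eps",)
def sym(a): return ("sym", a)
def star(r): return ("star", r)
def neg(r): return ("not", r)

def cat(r, s):
    if r == EMPTY: return EMPTY            # Φ R = Φ
    if r == EPS: return s                  # ε is the identity of concatenation
    return ("cat", r, s)

def alt(*terms):
    items = set()
    for t in terms:                        # flatten nested sums
        items |= set(t[1]) if t[0] == "or" else {t}
    items.discard(EMPTY)                   # Φ is the identity of +
    by_right, rest = {}, set()
    for t in items:                        # set semantics give R + R = R
        if t[0] == "cat":
            by_right.setdefault(t[2], set()).add(t[1])
        else:
            rest.add(t)
    for right, lefts in by_right.items():  # R1 R + R2 R = (R1 + R2) R
        rest.add(cat(alt(*lefts), right))
    if not rest: return EMPTY
    if len(rest) == 1: return next(iter(rest))
    return ("or", frozenset(rest))

def nullable(r):
    kind = r[0]
    if kind in ("eps", "star"): return True
    if kind in ("empty", "sym"): return False
    if kind == "or": return any(nullable(t) for t in r[1])
    if kind == "cat": return nullable(r[1]) and nullable(r[2])
    return not nullable(r[1])

def deriv(r, a):
    kind = r[0]
    if kind in ("empty", "eps"): return EMPTY
    if kind == "sym": return EPS if r[1] == a else EMPTY
    if kind == "or": return alt(*(deriv(t, a) for t in r[1]))
    if kind == "cat":
        rest = deriv(r[2], a) if nullable(r[1]) else EMPTY
        return alt(cat(deriv(r[1], a), r[2]), rest)
    if kind == "star": return cat(deriv(r[1], a), r)
    return neg(deriv(r[1], a))

def size(r):
    """Number of nodes in the term."""
    if r[0] == "or": return 1 + sum(size(t) for t in r[1])
    return 1 + sum(size(t) for t in r[1:] if isinstance(t, tuple))

any_star = star(alt(sym("a"), sym("b")))
no_ab = neg(cat(cat(cat(any_star, sym("a")), sym("b")), any_star))
r, seen = no_ab, set()
for event in "ab" * 100:                   # a 200-event trace
    r = deriv(r, event)
    seen.add(r)
print(len(seen), size(r), nullable(r))
```

On this example the stored term cycles through a handful of shapes instead of growing with the trace; that the same holds in general is what the theorems about the full rewriting system establish, not this sketch.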
24
Theorems (RTA’03)
R is terminating and ground Church-Rosser modulo AC of _+_ and A of _ _
L(nfAC(R{a})) = {w | aw ∈ L(R)} for all EREs R
a1a2…an ∈ L(R) iff ε ∈ R{a1}{a2}…{an}
R{a1}{a2}…{an} requires O(22m2) space and
O(n22m2) time, where m = |R|
25
Problems …
Previous algorithm is not synchronous!
– Unless we check for emptiness after processing each event, which is very expensive
How to generate a minimal monitor for ERE, avoiding the highly exponential state explosion?
Solution: Circular Coinduction
– Related work by Rutten: no negation
26
Hidden Logic: Behavioral Specification
Behavioral specification
– Tuple (V, H, Γ, Σ, E), or simply (Γ, Σ, E)
– Sorts S = V ∪ H
  V = visible sorts (stand for data: integers, reals, chars, etc.)
  H = hidden sorts (stand for states, objects, blackboxes, etc.)
– Operations Γ ⊆ Σ
  Σ is an S-signature
  Γ is a subsignature of Σ of behavioral operations
– E is a set of Σ-equations
27
Contexts and Experiments
A Γ-context is a Γ-term with a hidden “slot” z : h
A Γ-experiment is a Γ-context of visible result
[Figure: a term built from operations in Γ over the slot z : h; it is a Γ-experiment if its result is visible]
28
Behavioral Equivalence
Models are called hidden Σ-algebras: A, A’, …
Behavioral equivalence on A: a ≡ a’
– Identity on visible carriers
– a ≡_h a’ iff A_ξ(a) = A_ξ(a’) for any Γ-experiment ξ
[Figure: a and a’ are pushed through the same Γ-experiments; the visible results A_ξ(a) and A_ξ(a’) must be equal]
29
Behavioral Satisfaction
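Let (∀X) t = t’ be a Σ-equation and A a hidden Σ-algebra
A behaviorally satisfies (∀X) t = t’, written A |≡ (∀X) t = t’, iff θ(t) ≡_h θ(t’) for any map θ : X → A
Related notation: A |≡ (Γ, Σ, E) when A behaviorally satisfies every equation in E, and B |≡ (∀X) t = t’ when every hidden algebra that behaviorally satisfies the specification B also satisfies the equation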
30
Proving Behavioral Equivalence
Behavioral satisfaction is known to be Π_2^0-hard, so
– No way to automatically prove any truth
– No way to automatically disprove any falsity
– Hidden logics are incomplete
Coinduction and context induction are very strong
– Both require human support
Circular coinduction is an automatic procedure
– Tuned and tested on hundreds of examples: streams, protocols (ABP), Peterson’s mutual exclusion, etc.
– Supported by BOBJ, prototyped in Maude
31
Circular Coinduction in a Nutshell
“Derive” the original proof goal until it ends up in circles
[Figure: proof graph. The goal ▲ = ♥ is derived along the operations a, m1, m2 into new goals such as ☺ = ☼ and ♣ = ►; branches end either in visible equalities (5 = 5, 9 = 9, 0 = 0) or in goals already on the graph (circles).]
Circles are detected modulo substitutions, “special” contexts and equational reasoning
Moreover, all the behavioral equalities on the proof graph are true: lemma discovery!
“Explanation?”
(1) All possibilities to distinguish the two are exhaustively explored
(2) Any experiment can be “consumed” bottom-up, ending in a “visible” node
(3) A congruent binary relation R is built; but behavioral equivalence is the largest!
(4) Context induction: nodes above form the “induction hypothesis”
32
zip(zero, one) = blink
Proof graph with cobasis {h, t}:
zip(zero, one) = blink
– h: 0 = 0
– t: zip(one, zero) = t(blink)
  – h: 1 = 1
  – t: zip(zero, one) = blink (back to the original goal)
33
zip(zero, one) = blink
Proof graph with cobasis {h, ht, tt}:
zip(zero, one) = blink
– h: 0 = 0
– ht: 1 = 1
– tt: zip(zero, one) = blink (back to the original goal)
34
zip(odd(S), even(S)) = S
Proof graph with cobasis {h, t}:
zip(odd(S), even(S)) = S
– h: h(S) = h(S)
– t: zip(even(S), even(t(S))) = t(S)
  – h: h(t(S)) = h(t(S))
  – t: zip(even(t(S)), even(t(t(S)))) = t(t(S))
35
zip(odd(S), even(S)) = S
Proof graph with cobasis {h, odd, even}:
zip(odd(S), even(S)) = S
– h: h(S) = h(S)
– odd: odd(S) = odd(S)
– even: even(S) = even(S)
One can prove by {h,t}-circular coinduction that
odd(zip(S,S’)) = S
even(zip(S,S’)) = S’
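The stream identities from these examples can be spot-checked on finite prefixes. The sketch below models streams as Python iterators, with `zip_streams`, `odd`, `even` and `prefix` as names assumed for the illustration. Comparing prefixes is only a test; it is the coinductive proof that covers the whole infinite stream.

```python
from itertools import count, cycle, islice, repeat

def zip_streams(s, t):
    """Interleave two infinite streams: s0, t0, s1, t1, ..."""
    s, t = iter(s), iter(t)
    while True:
        yield next(s)
        s, t = t, s

def odd(s):
    """Elements at positions 1, 3, 5, ... (counting from 1)."""
    return islice(s, 0, None, 2)

def even(s):
    """Elements at positions 2, 4, 6, ... (counting from 1)."""
    return islice(s, 1, None, 2)

def prefix(s, n):
    return list(islice(s, n))

zero, one, blink = repeat(0), repeat(1), cycle([0, 1])
print(prefix(zip_streams(zero, one), 6) == prefix(blink, 6))            # True
# S is the stream 0, 1, 2, ...; two fresh copies are needed because
# iterators are consumed as they are read.
print(prefix(zip_streams(odd(count()), even(count())), 8))              # [0, 1, 2, 3, 4, 5, 6, 7]
```

The streams are assumed to be infinite; a finite input would end the generator with an error.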
36
Behavioral Specification of EREs
B = (V, H, Γ, Σ, E) where
– V contains Event and Bool
– H contains Ere
– Σ contains Φ, ε, _+_, _ _, _*, ¬_
– E contains all equations defined before
– Γ contains ε ∈ _ : Ere -> Bool and _{_} : Ere Event -> Ere
Theorem: B behaviorally satisfies R = R’ iff L(R) = L(R’)
37
(a + b)* = (a* b*)*
Proof graph with cobasis {ε ∈ _, _{a}, _{b}}:
(a + b)* = (a* b*)*
– ε ∈ _: true = true
– _{a}: (a + b)* = a* b* (a* b*)*
  – ε ∈ _: true = true
  – _{a}: (a + b)* = a* b* (a* b*)* (circularity)
  – _{b}: (a + b)* = b* (a* b*)*
– _{b}: (a + b)* = b* (a* b*)*
  – ε ∈ _: true = true
  – _{a}: (a + b)* = a* b* (a* b*)* (circularity)
  – _{b}: (a + b)* = b* (a* b*)* (circularity)
Moreover, all the equivalences in the proof graph are true!
Theorem: Circular coinduction is a decision procedure for ERE language equality
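For EREs, the proof search amounts to the following loop: compare the two sides under ε ∈ _, then derive both sides by every event, and stop on goals already seen. The sketch below is a simplified stand-in for the Maude implementation, using an assumed tuple encoding with simplifying constructors so that derivatives repeat; the function name `equivalent` and its return value are choices made for this illustration.

```python
EMPTY, EPS = ("empty",), ("eps",)
def sym(a): return ("sym", a)
def star(r): return ("star", r)
def neg(r): return ("not", r)

def cat(r, s):
    if r == EMPTY: return EMPTY
    return s if r == EPS else ("cat", r, s)

def alt(*terms):
    """Sum as a set, with Φ dropped and R1 R + R2 R = (R1 + R2) R."""
    items = set()
    for t in terms:
        items |= set(t[1]) if t[0] == "or" else {t}
    items.discard(EMPTY)
    by_right, rest = {}, set()
    for t in items:
        if t[0] == "cat":
            by_right.setdefault(t[2], set()).add(t[1])
        else:
            rest.add(t)
    for right, lefts in by_right.items():
        rest.add(cat(alt(*lefts), right))
    if not rest: return EMPTY
    return next(iter(rest)) if len(rest) == 1 else ("or", frozenset(rest))

def nullable(r):
    kind = r[0]
    if kind in ("eps", "star"): return True
    if kind in ("empty", "sym"): return False
    if kind == "or": return any(nullable(t) for t in r[1])
    if kind == "cat": return nullable(r[1]) and nullable(r[2])
    return not nullable(r[1])

def deriv(r, a):
    kind = r[0]
    if kind in ("empty", "eps"): return EMPTY
    if kind == "sym": return EPS if r[1] == a else EMPTY
    if kind == "or": return alt(*(deriv(t, a) for t in r[1]))
    if kind == "cat":
        rest = deriv(r[2], a) if nullable(r[1]) else EMPTY
        return alt(cat(deriv(r[1], a), r[2]), rest)
    if kind == "star": return cat(deriv(r[1], a), r)
    return neg(deriv(r[1], a))

def equivalent(r, s, alphabet):
    """Return (verdict, goals); goals are the pairs placed on the proof graph."""
    goals, todo = set(), [(r, s)]
    while todo:
        pair = todo.pop()
        if pair in goals:
            continue                       # circularity: goal already on the graph
        if nullable(pair[0]) != nullable(pair[1]):
            return False, goals            # experiment ε ∈ _ tells them apart
        goals.add(pair)
        todo.extend((deriv(pair[0], a), deriv(pair[1], a)) for a in alphabet)
    return True, goals

a, b = sym("a"), sym("b")
ok, goals = equivalent(star(alt(a, b)), star(cat(star(a), star(b))), "ab")
print(ok, len(goals))  # True 3
```

The three goals found correspond to the three distinct equalities on the proof graph. Termination of the loop relies on the derivatives repeating after simplification; the sketch shows this on examples and leaves the general guarantee to the theorems of the talk.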
38
Generating Minimal DFAs for EREs
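[Figure: the DFA is built by derivation. From R, edges a and b lead to R{a} and R{b}; further down, a newly derived ERE such as R’{a} is compared with the EREs R’, R’’, … already in the DFA: equivalent?]
(1) Maintain a set C of pairs of equivalent EREs
(2) Check each new ERE for equivalence with already existing EREs in the DFA
• First in C
• Then by CC. If an equivalent ERE is found, then add the new circularities to C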
39
Implementation
BOBJ cannot be used because it does not return the set of circularities
Implemented a specialized circular coinduction algorithm in Maude
Web server at http://fsl.cs.uiuc.edu
– A Perl CGI script which calls Maude
– Generates JPEG, PS, and DOT versions of DFA
40
Conclusion and Future Work
Exponential complexity unavoidable when negation is added to regular expressions (EREs)
Few rewriting rules provide the best trace membership algorithm known for EREs
Generation of minimal DFAs for EREs by circular coinduction (CC) avoids state explosion
– To be part of PathExplorer at NASA Ames
Behavioral Maude with circular coinduction
Inductive/Coinductive Theorem Prover (ICTP)
Behavioral Rewriting Logic