Ditto: Speeding Up Runtime Data Structure Invariant Checks
AJ Shankar and Ras Bodik
UC Berkeley
Motivation: A Debugging Scenario
Buggy program: a large-scale web application in Java
  Primary data structure: hashMap of shopping carts
  Carts are modified throughout code
Bug: hashMap acting weird: carts disappearing, etc.
Hypothesis: cart modification violates hashCode() invariance
How to Check the Hypothesis?
Debugger facilities inadequate
Idea: write a runtime check
  Iterates over buckets, checks hashCode() of each cart in bucket
  Run check frequently to pinpoint error
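A runtime check of this kind might look like the following minimal sketch. The Cart and CartTable classes, the customerId field, and the method names are all hypothetical stand-ins for the application's real types; only the idea (walk every bucket and compare each cart's hashCode() with the bucket that holds it) comes from the slide.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cart whose hash code depends on a mutable field. */
final class Cart {
    int customerId; // mutable on purpose: changing it changes hashCode()

    Cart(int customerId) { this.customerId = customerId; }

    @Override public int hashCode() { return Integer.hashCode(customerId); }

    @Override public boolean equals(Object o) {
        return o instanceof Cart && ((Cart) o).customerId == customerId;
    }
}

/** Minimal chained hash table, standing in for the application's hashMap. */
final class CartTable {
    final List<List<Cart>> buckets = new ArrayList<>();

    CartTable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());
    }

    int indexFor(Cart c) { return Math.floorMod(c.hashCode(), buckets.size()); }

    void add(Cart c) { buckets.get(indexFor(c)).add(c); }

    /** Invariant check: every cart sits in the bucket its current hash code selects. */
    boolean checkHashInvariant() {
        for (int i = 0; i < buckets.size(); i++) {
            for (Cart c : buckets.get(i)) {
                if (indexFor(c) != i) return false;
            }
        }
        return true;
    }
}
```

Each call walks the whole table, so running the check after every modification costs time proportional to the table size.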
Problem
The check is slow! (100x slowdown)
  Rerunning the program is now a problem
Furthermore, what if bug isn’t reproducible?
  Run the program with the check on entire test suite? Infeasible.
Our Tool: Ditto
Ditto speeds up data structure invariant checks
  Usually asymptotically in size of data structure
  Hash table: 10x speedup at 1600 elements
What invariant checks can Ditto handle?
  Side-effect-free: cannot return fresh mutable objects
  Recursive: not an inherent limitation of algorithm
Basic Observation: Incrementalize
Invariant checks the entire data structure …
… but once checked, a local change can be (re)checked locally!
So, first establish invariant, then incrementally check changes
Invariant: “Hash code of each cart in table corresponds to containing bucket.”
A New Domain
Existing incrementalizers: general purpose but not automatic [Acar PLDI 2006]
  User must annotate the program
  For functional programs
  Other caveats (conversion to CPS, etc.)
Ditto is automatic in this domain
  Functional invariant checks in an imperative Java setting
  No user annotations
  Allows arbitrary heap updates outside the invariant
  A simple bytecode-to-bytecode implementation
Ditto Algorithm Overview
1. First run of check: construct graph of the computation
  Stores function calls, concrete inputs
2. Track changes to computation inputs
3. Subsequent runs of check: rerun only subcomputations with changed inputs
  Incrementally update computation graph = incrementally compute invariant check
Example Invariant Check
Ensures a tree is locally ordered
boolean isOrdered(Tree t) {
    if (t == null) return true;
    if (t.left != null && t.left.value >= t.value) return false;
    if (t.right != null && t.right.value <= t.value) return false;
    return isOrdered(t.left) && isOrdered(t.right);
}
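To make "locally ordered" concrete, the sketch below wraps the slide's isOrdered in a small runnable form. The Tree constructor and the OrderCheck holder class are assumptions added for illustration; only the fields value, left, and right and the body of isOrdered come from the slide.

```java
/** Hypothetical tree node; the three field names follow the slide. */
final class Tree {
    int value;
    Tree left, right;

    Tree(int value, Tree left, Tree right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

final class OrderCheck {
    /** The check from the slide: each node is compared only with its direct children. */
    static boolean isOrdered(Tree t) {
        if (t == null) return true;
        if (t.left != null && t.left.value >= t.value) return false;
        if (t.right != null && t.right.value <= t.value) return false;
        return isOrdered(t.left) && isOrdered(t.right);
    }
}
```

Because each node is compared only with its direct children, a tree such as 5 with left child 3 whose right child is 7 passes this check even though 7 lies in the left subtree of 5. The check is local, not a global search-tree ordering.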
1. Constructing a Computation Graph
Purpose of computation graph:
  1. For unchanged parts of data structure, reuse existing results
  2. For changed parts, identify parts of check that need to be rerun
Graph stores the initial check run:
  Node = function invocation, along with its
    Inputs: concrete formal arguments, concrete heap accesses
    Return value
  Same inputs = can reuse return val
  Changed inputs = must rerun
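One way to picture such a node is the sketch below. It is an illustration under stated assumptions, not Ditto's actual representation: the CheckNode class, its method names, and the modeling of the heap as a map from location names such as "A.value" to values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical computation-graph node: one invocation of the check. */
final class CheckNode {
    final String function;                                        // e.g. "isOrdered"
    final List<Object> args;                                      // concrete formal arguments
    final Map<String, Object> heapReads = new LinkedHashMap<>();  // location -> value seen
    Object returnValue;                                           // cached result of the invocation

    CheckNode(String function, List<Object> args) {
        this.function = function;
        this.args = new ArrayList<>(args);
    }

    /** Remember one heap read made while the invocation ran. */
    void recordRead(String location, Object value) {
        heapReads.put(location, value);
    }

    /** Same inputs = can reuse returnValue; any changed input = must rerun. */
    boolean canReuse(List<Object> currentArgs, Map<String, Object> currentHeap) {
        if (!args.equals(currentArgs)) return false;
        for (Map.Entry<String, Object> read : heapReads.entrySet()) {
            Object now = currentHeap.get(read.getKey());
            if (!Objects.equals(read.getValue(), now)) return false;
        }
        return true;
    }
}
```

This sketch compares only the node's direct inputs (arguments and heap reads) using equals(); a real implementation would more likely compare object references by identity.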
1. Constructing a Computation Graph
During first check run, by instrumentation:
  Node isOrdered(A) created with concrete formal arg A
  Heap reads from a.value, a.left, a.right, a.left.value, a.right.value are remembered
  Calls children isOrdered(B), isOrdered(C)
  Returns true
[Figure: the heap (tree nodes P, A, B, C) beside the computation graph (nodes isOrdered(P), isOrdered(A), isOrdered(B), isOrdered(C))]
2. Detecting Changed Inputs
Inputs to check that could change between runs:
  Arguments – easy to detect (passed to the check)
  Heap values – harder (could be modified anywhere in code)
Selective write barriers
  Statically determine which fields are read in the check
  Barriers collect changed heap inputs used by check
In example: add write barriers for all writes into fields: Tree.left, Tree.right, Tree.value
  if (t == null) return true;
  if (t.left != null && t.left.value >= t.value) return false;
  if (t.right != null && t.right.value <= t.value) return false;
  return isOrdered(t.left) && isOrdered(t.right);
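The effect of a write barrier can be approximated by hand, as in the sketch below. This is only an illustration: the slides describe barriers added by bytecode rewriting, whereas this sketch uses hand-written setters, and the WriteLog and TrackedTree names and methods are hypothetical.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical log of objects whose checked fields were written since the last check run. */
final class WriteLog {
    private static final Set<Object> dirty =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());

    /** Called by every barrier; objects are tracked by identity, not equals(). */
    static void record(Object written) { dirty.add(written); }

    /** Hand the collected writes to the next check run and start a new epoch. */
    static Set<Object> drain() {
        Set<Object> snapshot =
                Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        snapshot.addAll(dirty);
        dirty.clear();
        return snapshot;
    }
}

/** Tree node whose setters act as barriers for the three fields the check reads. */
final class TrackedTree {
    private int value;
    private TrackedTree left, right;

    // No barrier in the constructor: a fresh object cannot be in the graph yet.
    TrackedTree(int value) { this.value = value; }

    int value() { return value; }
    TrackedTree left() { return left; }
    TrackedTree right() { return right; }

    void setValue(int v) { WriteLog.record(this); value = v; }
    void setLeft(TrackedTree t) { WriteLog.record(this); left = t; }
    void setRight(TrackedTree t) { WriteLog.record(this); right = t; }
}
```

Fields that the check never reads need no barrier, which is what keeps the barriers selective.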
3. Rerunning the Invariant
Invariant: isOrdered()
Data structure modification: add node N, remove node F
[Figure: the tree before and after the modification; node labels A through G, plus new node N]
3. Rerunning the Invariant
Goal: incrementally update computation graph
  Graph must look as if check was run afresh
[Figure, two panels: “Tree With New Modifications” (node labels A through G and N) and “Computation Graph From Last Run” (isOrdered(A) and its callees, cached result true)]
Write barriers say…
3. Rerunning the Invariant
isOrdered(A) is first node that needs to be rerun
  Parent inputs haven’t changed (functions are side-effect-free)
Rerunning exposes new node N
What happens at isOrdered(B)?
3. Rerunning the Invariant
isOrdered(B) has same formal args, heap inputs
  We’d like to reuse its previous result
  And end this subcomputation
Problem: isOrdered(B) also depends on return values of its callees
  Which might change, since isOrdered(D) will be rerun
  So we can’t be sure isOrdered(B)’s result will be the same!
Optimistic Memoization
Don’t want to rerun all nodes between B and D
Solution: we optimistically assume that isOrdered(B) will return the same result
  Invariant checks generally do! (e.g. “success”)
  Check assumption when we rerun isOrdered(D)
For now, reuse previous result, finish up A
  A returns previous result (true), so finished here
3. Rerunning the Invariant
Now we rerun isOrdered(D)
  Reuse previous result of isOrdered(E), isOrdered(G)
    No further changes so no need for optimism
  isOrdered(F) pruned from graph
isOrdered(D) returns previous result (true)
  So optimistic assumption was correct
  Computations around isOrdered(A) all correct
What If isOrdered(D) Returned false?
Result propagated up graph
  Continues as long as return val differs
In this case, root node of graph is reached
  Result for entire computation is changed
Automatically corrects optimistic assumptions
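The upward propagation described here can be sketched as follows. This is a simplified illustration, not Ditto's implementation: the GraphNode and Propagation classes are hypothetical, each node's result is modeled as its own local comparison outcome combined with its callees' cached results, and the short-circuiting of && in the real check is ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical computation-graph node for one isOrdered(...) invocation. */
final class GraphNode {
    final String label;
    final GraphNode parent;                      // caller; null for the root
    final List<GraphNode> callees = new ArrayList<>();
    boolean localOk = true;                      // outcome of this node's own comparisons
    boolean result = true;                       // cached return value from the last run

    GraphNode(String label, GraphNode parent) {
        this.label = label;
        this.parent = parent;
        if (parent != null) parent.callees.add(this);
    }

    /** Recompute the return value from the local outcome and cached callee results. */
    boolean recompute() {
        boolean r = localOk;
        for (GraphNode c : callees) r = r && c.result;
        return r;
    }
}

final class Propagation {
    /**
     * Rerun `start`; while the fresh return value differs from the cached one,
     * store it and move on to the caller. Returns labels of nodes whose result changed.
     */
    static List<String> rerun(GraphNode start) {
        List<String> changed = new ArrayList<>();
        for (GraphNode n = start; n != null; n = n.parent) {
            boolean fresh = n.recompute();
            if (fresh == n.result) break;        // optimistic assumption held from here up
            n.result = fresh;
            changed.add(n.label);
        }
        return changed;
    }
}
```

If the rerun node returns its previous value, the loop stops immediately and no caller is touched. If the value differs, callers are corrected one level at a time, and the walk can reach the root.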
Result of Algorithm
We’ve incrementally updated computation graph to reflect updated data structure
  Even with circular dependencies throughout graph, only reran 3 nodes
Result of computation is result of root node (true)
Graph is ready for next incremental update
Evaluation
Ran on a number of common data structure invariants, two real-world examples
Most complex invariant: red-black trees
  Tree is globally ordered
  Same # of black nodes to leaf
  Other RB properties (Black follows Red, etc.)
  We were unable to incrementalize this check by hand!
Kernel Results
[Figure: Ordered list performance. x-axis: data structure size (0 to 3000); y-axis: time (ms), 0 to 1400; series: No invariants, With Ditto, Invariants]
[Figure: Hash table performance. x-axis: data structure size (0 to 3000); y-axis: time (ms), 0 to 3500; series: No invariants, With Ditto, Invariants]
[Figure: Red-black tree performance. x-axis: data structure size (0 to 3000); y-axis: time (ms), 0 to 10000; series: No invariants, With Ditto, Invariants]
Real-world Examples
Tetris-like game Netcols
  Invariant: no “floating” jewels in grid
  With check, main event loop ran at 80ms, noticeably laggy
  Result: event loop to 15ms with Ditto
JavaScript obfuscator
  Invariant: no excluded keywords (based on a set of criteria) in renaming map
[Figure: JSO performance. x-axis: lines of JavaScript (0 to 15000); y-axis: time (ms), 0 to 25000; series: No invariants, With Ditto, Invariants]
Summary
Results:
  Automatic incrementalization made practical
    For checks in Java programs
  Data structure checks viable for development environment
Made possible by
  Selection of an interesting domain
  Optimistic memoization
Web: http://www.cs.berkeley.edu/~aj/cs/ditto/