Post on 04-Jan-2016
QUANTITATIVE INFORMATION FLOW AS NETWORK FLOW CAPACITY
Stephen McCamant and Michael D. Ernst
Reading Group 9/18/08
Slides by Michelle Goodstein
MOTIVATION
Subset of inputs are secret
Subset of outputs are public
Express confidentiality as a limit on number of secret bits revealed in public outputs
  Quantitative information flow security
Goal: Develop scheme for dynamic quantitative information flow analysis
  Dynamic: examine actual runs
Developed system for single-thread apps only
MOTIVATION
Problem is similar to max flow
  Distinguished beginning, end
  Secret inputs “flow” from start
  Can take many routes to the end
  Want to know: how many secret bits reach the end?
Invent a “gadget” to convert dynamic execution trace into a flow graph
Roughly: Max flow bounds number of bits of secret information revealed
  Max flow-min cut theorem
OUTLINE
Dynamic Max-Flow Analysis
Soundness and Consistency
Implementation Details
Checking Flow Bounds
Case Studies
Conclusions
DYNAMIC MAX-FLOW ANALYSIS
Edges
  Capacities are in # of (secret) bits an edge can hold
Nodes
  Represent basic operations, memory locations, registers
  In-degree of node corresponds to arity of operation
Goal: Graph where max flow soundly bounds information dissemination
“GADGET” EXAMPLE
c = a + b
32-bit integers, a & b part of “secret input”
[Figure: nodes a and b each feed the + node over a 32-bit edge; the + node feeds c over a 32-bit edge]
DYNAMIC MAX-FLOW ANALYSIS
Output used multiple times: Modify gadget to properly bound flow
Example: d = c = a + b
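[Figure: two candidate gadgets for d = c = a + b, with every edge labeled 32. When the output of + fans out directly to c and d over separate 32-bit edges, max flow from a,b to c,d is 64 bits. When the output of + first passes through a single shared 32-bit edge, max flow from a,b to c,d is 32 bits.]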
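The 64-bit versus 32-bit difference between the two gadgets can be checked with a small max-flow computation. This is only a sketch: the Edmonds-Karp routine, the node names ("src", "sink", "out") and the super-source/super-sink wiring are assumptions made for illustration, not the tool's actual graph construction.

```python
from collections import defaultdict, deque

def max_flow(edges, source, sink):
    """Edmonds-Karp max flow; edges is a list of (u, v, capacity_in_bits)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0  # make sure the reverse (residual) edge exists
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left
        # Walk back from the sink to collect the path, then push flow along it.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push

# Secret inputs a, b hang off a hypothetical super-source; outputs c, d feed a super-sink.
inputs = [("src", "a", 32), ("src", "b", 32), ("a", "+", 32), ("b", "+", 32)]
outputs = [("c", "sink", 32), ("d", "sink", 32)]

# Gadget 1: the result of + fans out directly to c and d.
fan_out = inputs + [("+", "c", 32), ("+", "d", 32)] + outputs
# Gadget 2: the result of + first crosses one shared 32-bit edge.
shared = inputs + [("+", "out", 32), ("out", "c", 32), ("out", "d", 32)] + outputs

print(max_flow(fan_out, "src", "sink"))  # 64
print(max_flow(shared, "src", "sink"))   # 32
```

The shared edge models the fact that c and d are copies of the same 32-bit value, so together they cannot reveal more than 32 bits.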
IMPLICIT FLOWS
General programs are more complicated than circuits
  Branches, arrays, pointers indirectly affect computation
  Example: array[5] == 0 implies prior accesses did not touch 5th location
  Call this implicit data flow
Need to fix graph to account for implicit flows
  Solution: add edges that represent all possible data flows
IMPLICIT FLOW SOLUTION
Implicit flows are contained in an enclosure with defined inputs/outputs
  Enclosures make code appear to be straight-line
Idea: Add edges from “implicit flows” to outputs of enclosure
Examples: Square root
  Special square root instruction: explicit flow
  Uses branches, loops: assume all can affect final value
One problem: how to represent an edge from a “flow” to an output
ENCLOSURE EXAMPLE
ENCLOSURE EXAMPLE
[Figure labels: 1st Enclosure, Start, 2nd Enclosure, End]
IMPLICIT FLOW SOLUTION
Assign edge capacity according to the number of possible different executions
2-way branch: add edge with a 1-bit capacity
Pointer op: add edge with capacity equal to number of secret bits in pointer value
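The two capacity rules above can be written as small helpers. This is a sketch under assumptions: the function names are hypothetical, and generalizing the 2-way branch to an n-way choice as ceil(log2(n)) bits is an extrapolation from the 1-bit case, not something the slides state.

```python
def branch_capacity(num_targets):
    """Bits needed to tell apart num_targets possible executions (assumed: ceil(log2 n))."""
    if num_targets < 1:
        raise ValueError("a branch needs at least one target")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1, using integers only.
    return (num_targets - 1).bit_length()

def pointer_capacity(pointer_taint_mask):
    """Capacity of a pointer operation: the number of secret (tainted) bits in the pointer."""
    return bin(pointer_taint_mask).count("1")

print(branch_capacity(2))            # 1: a 2-way branch gets a 1-bit edge
print(pointer_capacity(0x000000FF))  # 8: only the low byte of the pointer is secret
```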
IMPLICIT FLOW SOLUTION
Enclosure regions are either
  Annotated explicitly by the programmer
    Used in most of the case studies
  Inferred using static analysis
    Pilot study conducted
“Additional edges” don’t actually go to enclosure outputs
  Instead, add a distinguished node
  All flows have edge to distinguished node, and distinguished node has edges to outputs
DETERMINING EDGE CAPACITIES
In order to assign edge capacities, need to know how many bits in a data value are “secret”
  Can only leak as many secret bits as a value contains
Computed like Taintcheck, but at the bit level
  Create shadow bit vectors
  Track taint of each bit
  When creating an edge: the number of tainted bits provides the capacity bound
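A minimal sketch of shadow bit vectors, assuming 32-bit values. The class name, the three operations and their propagation rules are illustrative assumptions in the spirit of bit-level taint tracking; they are not taken from the tool.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1

class Shadowed:
    """A 32-bit value plus a shadow bit vector; a set shadow bit marks a secret bit."""

    def __init__(self, value, taint=0):
        self.value = value & MASK
        self.taint = taint & MASK

    def and_public(self, constant):
        # Bits forced to 0 by a public constant carry no secret information.
        return Shadowed(self.value & constant, self.taint & constant)

    def shift_left(self, amount):
        # Taint moves with the bits; the zeros shifted in are public.
        return Shadowed(self.value << amount, self.taint << amount)

    def xor(self, other):
        # Each result bit depends on the same bit position in both operands.
        return Shadowed(self.value ^ other.value, self.taint | other.taint)

def edge_capacity(x):
    """Capacity bound for an edge carrying x: the number of tainted bits."""
    return bin(x.taint).count("1")

secret = Shadowed(0xDEADBEEF, taint=MASK)  # all 32 bits secret
low_byte = secret.and_public(0xFF)         # only the low 8 bits stay secret
print(edge_capacity(secret))               # 32
print(edge_capacity(low_byte))             # 8
```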
EXAMPLE
Exposes 9 bits of secret input
  1 bit from branch
  8 bits from counter
ADVERSARIAL MODEL
A bound of k bits is sound if an adversary could have communicated the message directly using a k-bit code
Consider deterministic, public programs
  Public inputs fixed in advance
Alice and Bob agree on a set of messages ahead of time
Alice communicates to Bob via program by manipulating secret inputs
Bob can only observe public inputs, public outputs and program code
Program is a channel for communication
  Bound is channel capacity
ADVERSARIAL MODEL: SOUND
Alice sends an input i ∈ I
  Tool reports bound k(i)
Tool is sound iff there is also a code c where for each message i, Alice and Bob could have communicated i using exactly k(i) bits
ADVERSARIAL EXAMPLE
Assume Divide(a,b) returns c = a/b
  Alice controls inputs a,b
  Bob sees public output c
Alice sets a=2,b=0 for “Attack”
  Bob observes “exception”
Alice sets a=4,b=1 for “Don’t attack”
  Bob observes “4”
Code c: 1 → Attack, 0 → Don’t attack
  1 bit bound is sound
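The Divide channel can be played out in a few lines. This is a sketch: the `divide`, `alice` and `bob` functions are hypothetical stand-ins, and integer division is assumed for a/b.

```python
def divide(a, b):
    """Stand-in for the Divide(a, b) program; the public output is a string."""
    try:
        return str(a // b)
    except ZeroDivisionError:
        return "exception"

# Agreed ahead of time: which secret inputs Alice uses for each message.
ENCODE = {"Attack": (2, 0), "Don't attack": (4, 1)}

def alice(message):
    """Alice picks the secret inputs that encode her message."""
    return ENCODE[message]

def bob(public_output):
    """Bob sees only the public output and decodes the message."""
    return "Attack" if public_output == "exception" else "Don't attack"

for message in ENCODE:
    output = divide(*alice(message))
    print(message, "->", output, "->", bob(output))
```

Two messages are distinguished, so a 1-bit code would have done the same job, which is why a 1-bit bound is sound here.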
IMPLICATIONS OF ADVERSARIAL MODEL
Impossible to have many distinct outputs without revealing secret input
Kraft’s inequality: $\sum_{i} 2^{-k(i)} \le 1$
Bound of 0 bits
  Public output does not depend on secret inputs
  Fixed public inputs determine public output
$2^k$ possibilities
  Bound of ≥ k bits
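Kraft's inequality is easy to check for a list of reported bounds. A minimal sketch, with hypothetical function names; `bounds` holds one k(i) per distinguishable message.

```python
from fractions import Fraction

def kraft_sum(bounds):
    """Sum of 2^(-k) over the reported bounds k, computed exactly."""
    return sum(Fraction(1, 2 ** k) for k in bounds)

def satisfies_kraft(bounds):
    """True if a code with these lengths could exist (Kraft's inequality holds)."""
    return kraft_sum(bounds) <= 1

print(satisfies_kraft([1, 1]))     # True: two messages, 1 bit each
print(satisfies_kraft([0, 0]))     # False: two distinct messages cannot both cost 0 bits
print(satisfies_kraft([3] * 8))    # True: 2^3 possibilities, each with a bound of 3 bits
print(satisfies_kraft([2] * 8))    # False: 8 possibilities cannot all have a 2-bit bound
```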
SOUNDNESS AND CONSISTENCY
Soundness is measured over multiple runs
  Graph, as defined, only operates over one dynamic run
Without getting into detail…
  Can “merge” these graphs using union-find
  Takes almost-linear time
  Bound for merged graph is sound for original graphs
  Any cut in the combined graph is also valid in the original graphs
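The slides skip the details of merging, so the following is only a sketch of the union-find part. It assumes, hypothetically, that each edge carries a label (for example a code location) and that edges with the same label in different runs should have their endpoints identified. How capacities are combined is not covered.

```python
class UnionFind:
    """Disjoint sets with path compression and union by size (near-linear overall)."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx  # attach the smaller tree under the larger one
        self.size[rx] += self.size[ry]
        return rx

def merge_runs(runs):
    """runs: one edge list per run; each edge is (label, u, v). Labels are hypothetical."""
    uf = UnionFind()
    first_seen = {}
    for run_id, edges in enumerate(runs):
        for label, u, v in edges:
            u, v = (run_id, u), (run_id, v)  # keep node names from different runs apart
            uf.find(u)
            uf.find(v)
            if label in first_seen:
                u0, v0 = first_seen[label]
                uf.union(u0, u)
                uf.union(v0, v)
            else:
                first_seen[label] = (u, v)
    return uf

uf = merge_runs([[("add@line10", "x", "y")], [("add@line10", "p", "q")]])
print(uf.find((0, "x")) == uf.find((1, "p")))  # True: same edge label, endpoints merged
```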
IMPLEMENTATION
Valgrind, for Linux/x86
Associate positive integer tags with any values that could contain secret information
  Registers, each byte in memory gets a tag
  Tag == 0 → no secret information, not necessary to include in graph
For each operation, if at least one input has nonzero tag, generate graph nodes and edges appropriately
Some optimizations for arrays possible
  Descriptions for large memory regions, along with exception lists
MAXIMUM FLOW
Solving for maximum flow takes O(VE)
  V = # of vertices
  E = # of edges
  For n nodes, potentially O(n^3)
Want: Linear in actual program runtime
Solution: Collapse edges, nodes to shrink graph size
  Sacrificing some precision
USING THE MAX-FLOW IN LATER RUNS
So far: Can observe ≥ 1 run, calculate a max flow for all observed runs
But later runs may have different inputs and different outputs
Question: how can the flow be used again without rerunning the entire algorithm?
Answer: Use Max-flow/Min-cut theorem
MAX-FLOW/MIN-CUT
Once a max flow is found…
  …use it to find a min-cut (DFS)
Cut edges show where information flows from secret inputs to public outputs
If no other flows occur, the bound is sound
Commentary: if other flows do occur, no guarantee…
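Finding the cut from a max flow can be sketched as follows: compute the residual graph, run a depth-first search from the source over edges with leftover capacity, and report the original edges that leave the reachable set. The function names and the example graph (a 32-bit shared-edge gadget with hypothetical "src" and "sink" nodes) are assumptions for illustration.

```python
from collections import defaultdict, deque

def residual_after_max_flow(edges, source, sink):
    """Run Edmonds-Karp and return the final residual capacities."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0  # make sure the reverse edge exists
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return residual
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push

def min_cut(edges, source, sink):
    """Original edges crossing from the source side to the sink side of a min cut."""
    residual = residual_after_max_flow(edges, source, sink)
    reachable, stack = {source}, [source]
    while stack:  # iterative DFS over edges that still have residual capacity
        u = stack.pop()
        for v, cap in residual[u].items():
            if cap > 0 and v not in reachable:
                reachable.add(v)
                stack.append(v)
    return [(u, v, cap) for u, v, cap in edges if u in reachable and v not in reachable]

graph = [("src", "a", 32), ("src", "b", 32), ("a", "+", 32), ("b", "+", 32),
         ("+", "out", 32), ("out", "c", 32), ("out", "d", 32),
         ("c", "sink", 32), ("d", "sink", 32)]
print(min_cut(graph, "src", "sink"))  # [('+', 'out', 32)]
```

The capacities of the cut edges add up to the max flow, here 32 bits.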
FIXING THE SOUNDNESS
On future runs, use Taintcheck
  When encountering an operation corresponding to an edge in the cut: clear all taint bits
  If anything is tainted at the end: followed a new flow path
  Nothing tainted at the end: bound was sound
FIXING THE SOUNDNESS
Can also run two versions of the app in lockstep
  E1 gets secret input, E2 gets “fake” secret input
  Reach nodes corresponding to the cut: E1 sends values to E2
  If they have same outputs at the end, then no new flow path followed
WHY THIS MIGHT BE USEFUL…
Can use to limit the number of bits of information leaked in any particular execution
Results from one execution do not necessarily transfer to another
  Unless deterministic programs and equivalent inputs
Main use in debugging/testing
CASE STUDIES
Paper presented 5 case studies & pilot study for inferring enclosure regions
I’ll go over 3 case studies:
  Battleship
  OpenSSH
  ImageMagick
KBATTLESHIP
http://games.kde.org/game.php?game=kbattleship
KBATTLESHIP
Network messages between players call method shipTypeAt() to determine what type of ship is at location (x,y), if any
shipTypeAt() returns an integer length
  Nonzero: ship is there
  Particular value: indicates which ship is there
Information can be used to write a modified program that infers extra information
Tool shows patched version leaks at most 2 bits
  Whether or not a ship was hit
  If hit, whether the hit was fatal
OPENSSH
Marked private key as secret during authentication
Tool finds 128 bits of information about the secret key are revealed
Cut location corresponds to an MD5 checksum
IMAGEMAGICK
Suppose you wish to obscure part of an image
Images from http://people.csail.mit.edu/smcc/projects/secret-flow/
IMAGEMAGICK
Which technique does the best job?
IMAGEMAGICK
Which technique does the best job?
IMAGEMAGICK
In fact…
[Figure panels: Swirled, Unswirled, Original, Original]
CONCLUSIONS
Interesting application of Max-flow/Min-cut to network security
Can be applied to a variety of programs
Uses dynamic analysis instead of static
  Can sometimes make inferences static couldn’t
  Bounds don’t necessarily hold across multiple runs
Framework designed for single-threaded applications
BACKUP SLIDES
IMPLEMENTATION
Memory utilization
  Not merging graphs: can write graph immediately to file
  Merging graphs:
    Merging can be done piecewise
    Only current graph + info about nodes that still correspond to values in registers/memory
INFERRING ENCLOSURE REGIONS
Static analysis for C code
  Based on CIL framework
  No alias analysis
72% of regions found by hand are discovered by simple pilot program
Adding array and introprocedural aliasing necessary to infer full set of enclosures
bzip was an outlier
INFERRING ENCLOSURE REGIONS
Ran on bzip2, example of worst case
  Computationally intensive, relies extensively on input, uses large arrays
Used inputs of various sizes
  Inputs: digits of π in words (“Three point one four one five nine”)
  Highly compressible input
Estimated bound of flow to be portion of output that depends on the input
MAX FLOW RUNTIME
MAX FLOW RUNTIME