QUANTITATIVE INFORMATION FLOW AS NETWORK FLOW CAPACITY

Stephen McCamant and Michael D. Ernst

Reading Group 9/18/08

Slides by Michelle Goodstein

MOTIVATION

A subset of the inputs is secret

A subset of the outputs is public

Express confidentiality as a limit on the number of secret bits revealed in public outputs: quantitative information flow security

Goal: develop a scheme for dynamic quantitative information flow analysis (dynamic: examine actual runs)

Developed system for single-thread apps only

MOTIVATION

Problem is similar to max flow: a distinguished beginning and end; secret inputs “flow” from the start and can take many routes to the end

Want to know: how many secret bits reach the end?

Invent a “gadget” to convert a dynamic execution trace into a flow graph

Roughly: max flow bounds the number of bits of secret information revealed (max-flow/min-cut theorem)

OUTLINE

Dynamic Max-Flow Analysis

Soundness and Consistency

Implementation Details

Checking Flow Bounds

Case Studies

Conclusions

DYNAMIC MAX-FLOW ANALYSIS

Edges: capacities are in number of (secret) bits an edge can hold

Nodes: represent basic operations, memory locations, registers

In-degree of a node corresponds to the arity of the operation

Goal: Graph where max flow soundly bounds information dissemination

“GADGET” EXAMPLE

c = a + b

32-bit integers; a and b are part of the “secret input”

DYNAMIC MAX-FLOW ANALYSIS

Output used multiple times: Modify gadget to properly bound flow

Example: d = c = a + b

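[Figure: two versions of the “+” gadget, with 32-bit edge capacities]

Unmodified gadget: max flow from a,b to c,d is 64 bits

Modified gadget: max flow from a,b to c,d is 32 bits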
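The 64-bit versus 32-bit difference can be reproduced with a small max-flow computation. The sketch below is only an illustration: the node names (`src`, `plus`, `result`, `sink`), the artificial source and sink, and the `max_flow` helper (a plain Edmonds-Karp search) are assumptions, not the tool's actual data structures. It reads the slide as contrasting a gadget whose sum feeds c and d directly with one whose sum first passes through a single 32-bit edge.

```python
from collections import defaultdict, deque

def max_flow(edges, source, sink):
    """Edmonds-Karp max flow; edges is a list of (u, v, capacity_in_bits)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0  # make sure the reverse edge exists
    total = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        total += bottleneck

inputs = [("src", "a", 32), ("src", "b", 32), ("a", "plus", 32), ("b", "plus", 32)]
outputs = [("c", "sink", 32), ("d", "sink", 32)]

# Hypothetical unmodified gadget: the sum feeds c and d directly.
unmodified = inputs + [("plus", "c", 32), ("plus", "d", 32)] + outputs
# Hypothetical modified gadget: the sum passes through one 32-bit edge first.
modified = inputs + [("plus", "result", 32),
                     ("result", "c", 32), ("result", "d", 32)] + outputs

print(max_flow(unmodified, "src", "sink"), max_flow(modified, "src", "sink"))
```

The single `plus -> result` edge is what keeps a value that is copied twice from being counted twice.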

IMPLICIT FLOWS

General programs are more complicated than circuits

Branches, arrays, pointers indirectly affect computation

Example: array[5] == 0 implies prior accesses did not touch location 5

Call this implicit data flow

Need to fix graph to account for implicit flows

Solution: add edges that represent all possible data flows

IMPLICIT FLOW SOLUTION

Implicit flows are contained in an enclosure with defined inputs/outputs

Enclosures make code appear to be straight-line

Idea: add edges from “implicit flows” to outputs of enclosure

Example: square root. With a special square root instruction: explicit flow. With branches and loops: assume all can affect the final value

One problem: how to represent an edge from a “flow” to an output

ENCLOSURE EXAMPLE

[Figure: code example with two enclosure regions, labeled “Enclosure” and “2nd Enclosure”]

Assign edge capacity according to the number of possible different executions

2-way branch: add edge with a 1-bit capacity

Pointer op: add edge with capacity equal to number of secret bits in pointer value

Enclosure regions are either annotated explicitly by the programmer (used in most of the case studies) or inferred using static analysis (pilot study conducted)

“Additional edges” don’t actually go to enclosure outputs

Instead, add a distinguished node

All flows have an edge to the distinguished node, and the distinguished node has edges to the outputs
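A minimal sketch of how the edges for one enclosure might be laid out, with every implicit flow routed through a single distinguished node. The function name `enclosure_edges`, the node names, and the choice to cap each output edge at the output's bit width are assumptions made for illustration; the slides do not give the capacities of the output edges.

```python
def enclosure_edges(name, implicit_flows, outputs):
    """Return (from_node, to_node, capacity_in_bits) edges for one enclosure.

    implicit_flows: list of (value_node, secret_bits) pairs, e.g. 1 bit for a
        two-way branch, or the number of secret bits in a pointer value.
    outputs: list of (output_node, width_in_bits) pairs.
    """
    hub = f"{name}:implicit"  # the distinguished node for this enclosure
    edges = [(value, hub, bits) for value, bits in implicit_flows]
    # Assumption: each output edge is capped at the output's bit width.
    edges += [(hub, out, width) for out, width in outputs]
    return edges

# Hypothetical square-root routine: one secret branch and one lookup whose
# index holds 8 secret bits, with a single 32-bit result.
edges = enclosure_edges(
    "sqrt",
    implicit_flows=[("loop_test", 1), ("table_index", 8)],
    outputs=[("root", 32)],
)
for edge in edges:
    print(edge)
```

With this layout the implicit flows into `root` are limited by the edges entering the distinguished node (9 bits here), however many outputs the enclosure has.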

DETERMINING EDGE CAPACITIES

In order to assign edge capacities, need to know how many bits in a data value are “secret”

Can only leak as many secret bits as a value contains

Computed as in Taintcheck, but at the bit level: create shadow bit vectors and track the taint of each bit

When creating an edge: the number of tainted bits provides the capacity bound
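The shadow-bit idea can be sketched as follows. This is a simplified illustration, not the tool's implementation: the `Shadowed` class and its two propagation rules (masking with a public constant, shifting right) are assumptions chosen to show how the count of tainted bits becomes an edge capacity.

```python
MASK32 = 0xFFFFFFFF

class Shadowed:
    """A 32-bit value plus a shadow mask: bit i of `taint` set means bit i is secret."""

    def __init__(self, value, taint=0):
        self.value = value & MASK32
        self.taint = taint & MASK32

    def and_const(self, mask):
        # Bits forced to 0 by a public constant no longer carry secret data.
        return Shadowed(self.value & mask, self.taint & mask)

    def shift_right(self, n):
        # Shifting moves taint bits along with the data bits.
        return Shadowed(self.value >> n, self.taint >> n)

    def secret_bits(self):
        # Number of tainted bits = capacity bound for an edge carrying this value.
        return bin(self.taint).count("1")

secret = Shadowed(0xDEADBEEF, taint=MASK32)  # fully secret 32-bit input
low_byte = secret.and_const(0xFF)
top_nibble = secret.shift_right(28)

print(secret.secret_bits(), low_byte.secret_bits(), top_nibble.secret_bits())
```

An edge carrying `low_byte` would get capacity 8 rather than 32, which is where the bit-level tracking buys precision.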

EXAMPLE

Exposes 9 bits of secret input: 1 bit from branch, 8 bits from counter

ADVERSARIAL MODEL

A bound of k bits is sound if an adversary could have communicated the message directly using a k-bit code

Consider deterministic, public programs, with public inputs fixed in advance

Alice and Bob agree on a set of messages ahead of time

Alice communicates to Bob via program by manipulating secret inputs

Bob can only observe public inputs, public outputs and program code

Program is a channel for communication; the bound is the channel capacity

ADVERSARIAL MODEL: SOUND

Alice sends an input i ∈ I

Tool reports bound k(i)

Tool is sound iff there is also a code c where, for each message i, Alice and Bob could have communicated i using exactly k(i) bits

ADVERSARIAL EXAMPLE

Assume Divide(a,b) returns c = a/b

Alice controls inputs a,b; Bob sees public output c

Alice sets a=2,b=0 for “Attack”: Bob observes “exception”

Alice sets a=4,b=1 for “Don’t attack”: Bob observes “4”

Code c: 1 → Attack, 0 → Don’t attack

1-bit bound is sound

IMPLICATIONS OF ADVERSARIAL MODEL

Impossible to have many distinct outputs that each get a small bound on the secret input revealed

Kraft’s inequality: $\sum_{i \in I} 2^{-k(i)} \le 1$

Bound of 0 bits: public output does not depend on secret inputs; fixed public inputs determine public output

$2^k$ possibilities: bound of ≥ k bits
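Kraft’s inequality can be checked directly for a proposed set of bounds. The sketch below uses a hypothetical helper, `satisfies_kraft`, which takes the reported bound k(i) for each message and tests whether the sum of 2^(-k(i)) stays at or below 1; exact fractions avoid rounding issues.

```python
from fractions import Fraction

def satisfies_kraft(bounds):
    """bounds: reported bound k(i), in bits, for each distinguishable message i."""
    return sum(Fraction(1, 2 ** k) for k in bounds) <= 1

two_messages = [1, 1]       # "Attack" / "Don't attack", 1 bit each
three_messages = [1, 1, 1]  # three distinct outputs cannot all get a 1-bit bound
zero_bit_pair = [0, 0]      # two distinct outputs cannot both get a 0-bit bound

print(satisfies_kraft(two_messages),
      satisfies_kraft(three_messages),
      satisfies_kraft(zero_bit_pair))
```

The last case is the 0-bit implication from the slide: a 0-bit bound is only possible when there is a single possible public output.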

SOUNDNESS AND CONSISTENCY

Soundness is measured over multiple runs

Graph, as defined, only operates over one dynamic run

Without getting into detail… can “merge” these graphs using union-find

Takes almost-linear time

Bound for merged graph is sound for original graphs

Any cut in the combined graph is also valid in the original graphs
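A rough sketch of merging per-run graphs with union-find. The slides skip the details, so several things here are assumptions: nodes are identified by hypothetical names such as `in@1`, the caller supplies which nodes stand for the same program location, and parallel edges that land on the same merged pair have their capacities added.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def merge_runs(runs, same_location):
    """runs: one edge list [(u, v, bits), ...] per run.
    same_location: pairs of node names that stand for the same program location."""
    uf = UnionFind()
    for a, b in same_location:
        uf.union(a, b)
    merged = {}
    for edges in runs:
        for u, v, bits in edges:
            key = (uf.find(u), uf.find(v))
            # Assumption: parallel edges add their capacities.
            merged[key] = merged.get(key, 0) + bits
    return merged

run1 = [("in@1", "add@1", 8)]
run2 = [("in@2", "add@2", 8)]
merged = merge_runs([run1, run2],
                    same_location=[("in@1", "in@2"), ("add@1", "add@2")])
print(merged)
```

Each `find` is nearly constant time with path compression, which matches the slide's almost-linear total cost.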

IMPLEMENTATION

Valgrind, for Linux/x86

Associate positive integer tags with any values that could contain secret information

Registers, each byte in memory gets a tag

Tag == 0: no secret information, not necessary to include in graph

For each operation, if at least one input has a nonzero tag, generate graph nodes and edges appropriately

Some optimizations for arrays possible: descriptions for large memory regions, along with exception lists

MAXIMUM FLOW

Solving for maximum flow takes O(VE), where V = # of vertices and E = # of edges

For n nodes, potentially O(n^3)

Want: linear in actual program runtime

Solution: collapse edges and nodes to shrink graph size, sacrificing some precision

USING THE MAX-FLOW IN LATER RUNS

So far: Can observe ≥ 1 run, calculate a max flow for all observed runs

But later runs may have different inputs and different outputs

Question: how can the flow be used again without rerunning the entire algorithm?

Answer: Use Max-flow/Min-cut theorem

MAX-FLOW/MIN-CUT

Once a max flow is found, use it to find a min-cut (DFS)

Cut edges show where information flows from secret inputs to public outputs

If no other flows occur, the bound is sound

Commentary: if other flows do occur, no guarantee…
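The DFS step can be sketched as follows: given edge capacities and an already computed max flow, search the residual graph from the source, and report every edge that leaves the reachable set. The `min_cut` helper, the dictionary layout, and the three-node chain are illustrative assumptions.

```python
from collections import defaultdict

def min_cut(capacity, flow, source):
    """capacity, flow: dicts mapping (u, v) -> bits; flow must be a max flow."""
    residual = defaultdict(dict)
    for (u, v), cap in capacity.items():
        used = flow.get((u, v), 0)
        residual[u][v] = residual[u].get(v, 0) + cap - used  # unused capacity
        residual[v][u] = residual[v].get(u, 0) + used        # flow that could be undone
    # Depth-first search over edges that still have spare residual capacity.
    reachable, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v, spare in residual[u].items():
            if spare > 0 and v not in reachable:
                reachable.add(v)
                stack.append(v)
    # Cut edges go from the reachable side to the unreachable side.
    return sorted((u, v) for (u, v) in capacity
                  if u in reachable and v not in reachable)

# Hypothetical chain: a 32-bit secret is reduced to 8 bits before going public.
capacity = {("secret", "x"): 32, ("x", "hash"): 32, ("hash", "public"): 8}
flow = {("secret", "x"): 8, ("x", "hash"): 8, ("hash", "public"): 8}

cut = min_cut(capacity, flow, "secret")
print(cut, sum(capacity[e] for e in cut))
```

The total capacity of the cut edges equals the max flow, so the cut names the program points where the counted bits cross from secret to public.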

FIXING THE SOUNDNESS

On future runs, use Taintcheck

When encountering an operation corresponding to an edge in the cut: clear all taint bits

If anything is tainted at the end: followed a new flow path

Nothing tainted at the end: bound was sound
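A minimal sketch of this check on a later run, simplified to track taint per variable rather than per bit. The trace format `(op_id, dest, sources)`, the function name `bound_still_holds`, and the example operations are all hypothetical.

```python
def bound_still_holds(trace, secret_inputs, cut_ops, public_outputs):
    """Replay a run, clearing taint at cut operations; True if no output is tainted."""
    tainted = set(secret_inputs)
    for op_id, dest, sources in trace:
        if op_id in cut_ops:
            # Value crosses the cut: already counted by the bound, so clear taint.
            tainted.discard(dest)
        elif any(src in tainted for src in sources):
            tainted.add(dest)
        else:
            tainted.discard(dest)
    return not any(out in tainted for out in public_outputs)

cut_ops = {"hash"}

# Run that only leaks through the cut edge.
expected_run = [("hash", "h", ["key"]), ("send", "out", ["h"])]
# Run that also copies the key to a second output, bypassing the cut.
surprising_run = expected_run + [("log", "out2", ["key"])]

print(bound_still_holds(expected_run, ["key"], cut_ops, ["out", "out2"]))
print(bound_still_holds(surprising_run, ["key"], cut_ops, ["out", "out2"]))
```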

FIXING THE SOUNDNESS

Can also run two versions of the app in lockstep

E1 gets secret input, E2 gets “fake” secret input

When execution reaches nodes corresponding to the cut, E1 sends values to E2

If they have the same outputs at the end, then no new flow path was followed.
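The two-execution check can be imitated in a few lines, under stated assumptions: the program is modeled as a Python function that passes every value crossing the cut through a `cut` callback, and the two executions run one after the other rather than truly in lockstep. All names here (`lockstep_check`, `checksum_only`, `leaky`) are hypothetical.

```python
def lockstep_check(program, real_secret, fake_secret):
    """True if outputs match once E2 receives E1's values at the cut."""
    recorded = []

    def record(value):  # E1: remember each value that crosses the cut
        recorded.append(value)
        return value

    out_real = program(real_secret, record)

    replay = iter(recorded)

    def substitute(_value):  # E2: ignore its own value, use E1's instead
        return next(replay)

    out_fake = program(fake_secret, substitute)
    return out_real == out_fake

def checksum_only(secret, cut):
    # The only secret-dependent output goes through the cut.
    return cut(sum(secret) % 256)

def leaky(secret, cut):
    # The first secret byte also reaches the output without crossing the cut.
    return (cut(sum(secret) % 256), secret[0])

print(lockstep_check(checksum_only, [1, 2, 3], [9, 9, 9]))
print(lockstep_check(leaky, [1, 2, 3], [9, 9, 9]))
```

A mismatch means some secret-dependent value reached the output without passing through the cut.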

WHY THIS MIGHT BE USEFUL…

Can use to limit the number of bits of information leaked in any particular execution

Results from one execution do not necessarily transfer to another, unless the program is deterministic and the inputs are equivalent

Main use in debugging/testing

CASE STUDIES

Paper presented 5 case studies and a pilot study for inferring enclosure regions

I’ll go over 3 case studies: Battleship, OpenSSH, ImageMagick

KBATTLESHIP

http://games.kde.org/game.php?game=kbattleship

KBATTLESHIP

Network messages between players call method shipTypeAt() to determine what type of ship is at location (x,y), if any

shipTypeAt() returns an integer length

Nonzero: ship is there

Particular value: indicates which ship is there

Information can be used to write a modified program that infers extra information

Tool shows patched version leaks at most 2 bits: whether or not a ship was hit and, if hit, whether the hit was fatal

OPENSSH

Marked private key as secret during authentication

Tool finds 128 bits of information about the secret key are revealed

Cut location corresponds to an MD5 checksum

IMAGEMAGICK

Suppose you wish to obscure part of an image

Images from http://people.csail.mit.edu/smcc/projects/secret-flow/

IMAGEMAGICK

Which technique does the best job?

IMAGEMAGICK

In fact…

[Figure: image panels labeled “Original”, “Swirled”, and “Unswirled”]

CONCLUSIONS

Interesting application of Max-flow/Min-cut to network security

Can be applied to a variety of programs

Uses dynamic analysis instead of static: can sometimes make inferences static couldn’t

Bounds don’t necessarily hold across multiple runs

Framework designed for single-threaded applications

BACKUP SLIDES

IMPLEMENTATION

Memory utilization

Not merging graphs: can write graph immediately to file

Merging graphs: merging can be done piecewise; only need the current graph plus info about nodes that still correspond to values in registers/memory

INFERRING ENCLOSURE REGIONS

Static analysis for C code

Based on CIL framework

No alias analysis

72% of regions found by hand are discovered by simple pilot program

Adding array and introprocedural aliasing necessary to infer full set of enclosures

bzip was an outlier

INFERRING ENCLOSURE REGIONS

Ran on bzip2, example of worst case: computationally intensive, relies extensively on input, uses large arrays

Used inputs of various sizes

Inputs: digits of π in words (“Three point one four one five nine”)

Highly compressible input

Estimated bound of flow to be the portion of the output that depends on the input

MAX FLOW RUNTIME
