specification inference for explicit information flow problems benjamin livshits, aditya v. nori,...

35
Merlin Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee IMDEA Software

Upload: alberta-watson

Post on 17-Dec-2015

220 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

MerlinSpecification Inference for Explicit Information Flow Problems

Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani

Microsoft Research

Anindya BanerjeeIMDEA Software

Page 2: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Mining Security Specifications

• Problem: Can we automatically infer which routines in a program are sources, sinks and sanitizers?

• Technology: Static analysis + Probabilistic inference

• Applications:– Lowers false errors from tools– Enables more complete flow checking

• Results:– Over 300 new vulnerabilities discovered in 10

deployed ASP.NET applications

Page 3: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Motivation

Page 4: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Static Analysis Tools for Security

Web application vulnerabilities are a serious threat!

CAT.NET

Page 5: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Web Application Vulnerabilities

$username = $_REQUEST['username'];$sql = "SELECT * FROM Students WHERE username = '$username';

Page 6: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Propagation graph

ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

Propagation graphm1→ m2 iff information flows “explicitly” from m1 to m2

void ProcessRequest() { string s1 = ReadData1("name"); string s2 = ReadData2("encoding"); string s11 = Prop1(s1); string s22 = Prop2(s2); string s111 = Cleanse(s11); string s222 = Cleanse(s22);

WriteData("Parameter " + s111); WriteData("Header " + s222); }

Page 7: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Specification Source

returns tainted data Sink

error to pass tainted data

Sanitizer cleanse or untaint

the input Regular nodes

propagate input to output

Every path from a source to a sink should go through a sanitizer

Any source to sink path without a sanitizer is an information flow vulnerability

Vulnerability

Page 8: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

void ProcessRequest() { string s1 = ReadData1("name"); string s2 = ReadData2("encoding"); string s11 = Prop1(s1); string s22 = Prop2(s2); string s111 = Cleanse(s11); string s222 = Cleanse(s22);

WriteData("Parameter " + s111); WriteData("Header " + s222); }

Information flow vulnerabilities

ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

Vulnerabilities?

Page 9: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

void ProcessRequest() { string s1 = ReadData1("name"); string s2 = ReadData2("encoding"); string s11 = Prop1(s1); string s22 = Prop2(s2); string s111 = Cleanse(s11); string s222 = Cleanse(s22);

WriteData("Parameter " + s111); WriteData("Header " + s222); }

ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

Information flow vulnerabilities

source

sink

source

sanitizer

Page 10: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

void ProcessRequest() { string s1 = ReadData1("name"); string s2 = ReadData2("encoding"); string s11 = Prop1(s1); string s22 = Prop2(s2); string s111 = Cleanse(s11); string s222 = Cleanse(s22);

WriteData("Parameter " + s111); WriteData("Header " + s222); }

ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

Information flow vulnerabilities

source

sanitizer

sink

source

Page 11: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Goal

AssumptionMost flow paths in the propagation graph

are secure

Given a propagation graph, can we infer a specification or ‘complete’ a partial specification?

Page 12: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Algorithms

Page 13: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Merlin Architecture

Initial specificati

on

Program

Final specificatio

n

Prop. graph

construction

Factor graph

construction

Probabilistic

inference

Merlin

Vulnerabilities

CAT.NET

Page 14: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Propagation Graph Construction

ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

CAT.NET

void ProcessRequest() { string s1 = ReadData1("name"); string s2 = ReadData2("encoding"); string s11 = Prop1(s1); string s22 = Prop2(s2); string s111 = Cleanse(s11); string s222 = Cleanse(s22);

WriteData("Parameter " + s111); WriteData("Header " + s222); }

Page 15: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Inference?

ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

source

sanitizer

sink

?ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

source

?

sink

source

ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

source source

sanitizer

?

Page 16: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Path constraints

For every acyclic path m1 m2 … mn the probability that m1 is a source, mn is a sink, and m2, …, mn-1 are not sanitizers is very low

ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

Exponential number of path constraints: O(2|V|)!

ReadData1 Prop1 Cleanse WriteDatasource sink

Page 17: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Triple constraints

• For every triple <m1, mi, mn> such that mi

is on a path from m1 to mn, the probability that m1 is a source, mn is a sink, and mi is not a sanitizer is very low

ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

Cubic number of triple constraints: O(|V|3)!

ReadData1 Prop1 WriteData

ReadData1 WriteDataCleanse

source sink

source sink

Page 18: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Minimizing Sanitizers

ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

source source

sink

Page 19: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Minimizing Sanitizers

For every pair of nodes m1, m2 such that m1 and m2 lie on the same path from a potential source to a potential sink, the probability that both m1 and m2 are sanitizers is low

ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

ReadData1 Prop1 Cleanse WriteData

source sink

Page 20: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Need for probabilistic constraints

ReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

a

b

c

d

Triple constraints ¬(a ∧ ¬b ∧ d) ¬(a ∧ ¬c ∧ d)

Avoid double sanitizers ¬(b ∧ c)

a ∧ d ⇒ b a ∧ d ⇒ c ¬(b ∧ c)

Page 21: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Boolean formulas as probabilistic constraints

(x1∨ x2) ∧ (x1∨ ¬x3) C1 C2

f(x1, x2, x3) = fC1(x1, x2) ∧ fC2(x1, x3) fC1(x1, x2) = fC2(x1, x3) =

1 if x1∨ x2 = true0 otherwise

1 if x1∨ ¬x3 = true0 otherwise

Page 22: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Boolean formulas as probabilistic constraints

f(x1, x2, x3) = fC1(x1, x2) ∧ fC2(x1, x3) fC1(x1, x2) = fC2(x1, x3) =

1 if x1∨ x2 = true0 otherwise

1 if x1∨ ¬x3 = true0 otherwise

(x1∨ x2) ∧ (x1∨ ¬x3) C1 C2

p(x1, x2, x3) = fC1(x1, x2) × fC2(x1, x3)/Z Z = ∑x1, x2, x3 (fC1(x1, x2) × fC2(x1, x3))

0.90.1

0.90.1

Page 23: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Solution = Marginalization

• Step 1: choose xi with highest pi(xi) and set xi = true if pi(xi) is greater than a threshold, false otherwise• Step 2: recompute marginals and repeat Step 1 until all variables have been assigned

marginal

pi(xi) = ∑x1 … ∑x(i-1) ∑x(i+1) … ∑xN p(x1, … , xN)

Page 24: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Factor graphs: efficient computation of marginals

fC1(x1, x2) = fC2(x1, x3) =

1 if x1∨ x2 = true0 otherwise1 if x1∨ ¬x3 = true0 otherwise

Page 25: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Factor GraphsReadData1

Prop1 Prop2

Cleanse

WriteData

ReadData2

Page 26: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Probabilistic InferenceSourc

eSanitiz

erSink

ReadData1 .95 .001 .001ReadData2 .5 .5 .5Cleanse .5 .5 .5WriteData .5 .5 .85…

Source

Sanitizer

Sink

ReadData1 .95 .001 .001ReadData2 .5 .5 .5Cleanse .01 .997 .03WriteData .5 .5 .85…

ProbabilisticInference

Page 27: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Paths vs. Triples

TheoremPath refines Triple

Page 28: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Experiments

Page 29: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Implementation

Merlin is implemented in C# Uses CAT.NET for building the

propagation graph Uses Infer.NET for probabilistic

inference▪ http://research.microsoft.com/infernet

Page 30: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Experiments

10 line-of-business applications written in C# using ASP.NET

Page 31: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Summary of Discovered Specifications

Sources Sanitizers Sinks0

20

40

60

80

100

120

140

160

27 7

7794

32

152Original With Merlin

Page 32: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Summary of Discovered Vulnerabilities

-50 0 50 100 150 200 250 300 350 400

89

342

13

Original

With MerlinFalse

positives eliminate

d

Page 33: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Experiments - summary

10 large Web apps in .NET Time taken per app < 4 minutes New specs: 167 New vulnerabilities: 322 False positives removed: 13 Final false positive rate for CAT.NET

after Merlin < 1%

Page 34: Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani Microsoft Research Anindya Banerjee

Summary

Merlin is first practical approach to infer explicit information flow specifications

Design based on a formal characterization of an approximate probabilistic constraint system

Able to successfully and efficiently infer explicit information flow specifications in large applications which result in detection of new vulnerabilities