Page 1: Binary Analysis and Rewriting

Binary Analysis and Rewriting

Arvind Ayyangar, Niranjan Hasabnis, Alireza Saberi, Tung Tran, R. Sekar
Stony Brook University

Min Gyung Kang, Stephen McCamant, Pongsin Poosankam, Dawn Song
UC Berkeley

Page 2: Binary Analysis and Rewriting
Page 3: Binary Analysis and Rewriting

Motivation

• A popular approach for protecting applications from an untrusted OS is to rely on a trusted VMM
• Binary translation is one of the commonly used implementation technologies in VMMs
    • QEMU, earlier versions of VMware, …
    • Benefits: no need for hardware support, applicable to COTS binaries, whole system can be instrumented
• Unfortunately, existing binary translators are unsuited for enforcing higher-level properties
    • Information flow, control-flow integrity, object-granularity memory safety, …
    • They incur very high overheads (4x to 10x slowdown), or are simply unable to express certain properties

Page 4: Binary Analysis and Rewriting

Our Approach

• Develop novel static-analysis-based methods to overcome the drawbacks of today’s techniques
    • Robust, scalable static analysis of low-level code
        • From different compilers, or hand-coded assembly
    • Accurate disassembly of binary code
        • Indirect control-flow transfers, non-standard call/return conventions, mixing of data and code, …
    • Accurate reasoning about key properties
        • Dynamic taint analysis

Page 5: Binary Analysis and Rewriting

Robust and scalable Static analysis of low-level code

Page 6: Binary Analysis and Rewriting

Static analysis of low-level code

• Scalability: requires modular analysis
    • Analyze functions individually, compose results
    • Avoids repeated analysis of same code (esp. libraries)
• Strength: requires accurate reasoning about variables (esp. local variables)
• Challenges in low-level binary code
    • Difficult to identify parameter passing in optimized code
        • Missing pushes, parameter passing via registers, …
    • Difficult to distinguish local variables from other accesses
        • Caller/callee-saved registers, stack pointer conventions, …

Page 7: Binary Analysis and Rewriting

Static analysis of low-level code

• To solve these challenges, previous approaches
    • make optimistic assumptions, or rely on compiler idioms
    • often fail on optimized code and/or large programs
    • don’t work for other compilers, or hand-written assembly
• Our solution: develop a new approach that
    • Uses systematic analysis to reduce assumptions/heuristics
    • Accurately tracks local variables by analyzing values held in registers and on the stack

Page 8: Binary Analysis and Rewriting

Stack Analysis

• Analyzes one function at a time
• Examines the use of stack to
    • Determine parameters
        • Number of them, whether in registers or on stack
    • Determine caller- and callee-saved registers
    • Summarize effect on parameters
        • Preservation of SP, return to caller, changes in parameter or register contents, …

[Figure: stack of function ƒ, with ESP pointing at RETURN ADDR]

Page 9: Binary Analysis and Rewriting

Abstract Interpretation for Stack Analysis

<ƒ>:
    push %ebp
    mov  %esp, %ebp
    sub  $16, %esp

[Figure: lattice values over the activation record of ƒ. EBP = Base_BP + [0, 0]; ESP = Base_SP + [0, 0]; offset 0 of the activation record is at Base_SP.]

Page 10: Binary Analysis and Rewriting

Abstract Interpretation for Stack Analysis

<ƒ>:
    push %ebp
    mov  %esp, %ebp
    sub  $16, %esp

[Figure: lattice values over the activation record of ƒ. EBP = Base_BP + [0, 0]; ESP = Base_SP + [-4, -4]; the slot at offset -4 from Base_SP holds Base_BP + [0, 0].]
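The lattice values on these slides have the shape "symbolic base plus offset interval", for example Base_SP + [-4, -4]. The following is a minimal Python sketch of such a value, assuming a flat treatment of bases (two different bases join to an unknown value). The class name AbsVal, the TOP element, and the join rule are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AbsVal:
    """Abstract value: symbolic base plus an offset interval [lo, hi]."""
    base: Optional[str]   # e.g. "Base_SP"; None stands for TOP (unknown)
    lo: int = 0
    hi: int = 0

    def add(self, k: int) -> "AbsVal":
        """Effect of adding the constant k (pass -k to subtract)."""
        if self.base is None:
            return self
        return AbsVal(self.base, self.lo + k, self.hi + k)

    def join(self, other: "AbsVal") -> "AbsVal":
        """Merge two paths: same base widens the interval, otherwise TOP."""
        if self.base is None or self.base != other.base:
            return TOP
        return AbsVal(self.base, min(self.lo, other.lo), max(self.hi, other.hi))

    def __str__(self) -> str:
        if self.base is None:
            return "TOP"
        return f"{self.base} + [{self.lo}, {self.hi}]"


TOP = AbsVal(None)

# Prologue from the slide (32-bit x86): push %ebp; mov %esp, %ebp; sub $16, %esp
esp = AbsVal("Base_SP")   # value of ESP at function entry
esp = esp.add(-4)         # push %ebp moves ESP down by 4
ebp = esp                 # mov %esp, %ebp copies the abstract value
esp = esp.add(-16)        # sub $16, %esp reserves space for locals
print(ebp)                # Base_SP + [-4, -4]
print(esp)                # Base_SP + [-20, -20]

# Two paths that leave ESP at different depths merge into one interval.
merged = AbsVal("Base_SP", -8, -8).join(AbsVal("Base_SP", -20, -20))
print(merged)             # Base_SP + [-20, -8]
```

Keeping the base symbolic is what lets a function be analyzed on its own: nothing in the sketch needs to know the concrete stack pointer of the caller.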

Page 11: Binary Analysis and Rewriting

Stack Analysis (contd)

Summary for f:
• No change to ESP
• Two input parameters on stack
• EAX, EDX, arg1 changed as shown
• Others unchanged

<f>:
    push %ebp
    mov  %esp, %ebp
    sub  $16, %esp
    mov  8(%ebp), %eax
    add  $3, %eax
    mov  %eax, 8(%ebp)
    mov  $7, -12(%ebp)
    mov  12(%ebp), %edx
    mov  %edx, -8(%ebp)
    leave
    ret

[Figure: abstract register and stack state for f. Labels shown: caller frame (args), callee frame (locals), Ret Addr, Base_SP, SP, RP, EBP, EAX, EDX, ESP. Lattice values shown: Base_SP + [-4, -4], Base_SP + [0, 0], Base_SP + [-20, -20], Base_BP + [0, 0], arg1 + [3, 3], arg2 + [0, 0], and the constant 7.]
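The summary for f can be reproduced with a small abstract interpreter. The sketch below is a hedged illustration in Python, not the authors' tool: the tuple encoding of instructions, the names analyze, EAX0, EDX0, RetAddr, and the rule "a first read above the return address is a stack parameter" are all assumptions made for this example. It handles only straight-line code, so each abstract value is a (base, offset) pair standing for base + [offset, offset].

```python
# Abstract values are (base, offset) pairs: ("Base_SP", -4) means Base_SP + [-4, -4].
F = [  # the function <f> from the slide, as (opcode, operands...) tuples
    ("push", "ebp"),
    ("mov", "esp", "ebp"),
    ("sub", 16, "esp"),
    ("load", 8, "ebp", "eax"),     # mov 8(%ebp), %eax
    ("add", 3, "eax"),
    ("store", "eax", 8, "ebp"),    # mov %eax, 8(%ebp)
    ("store", 7, -12, "ebp"),      # mov $7, -12(%ebp)
    ("load", 12, "ebp", "edx"),    # mov 12(%ebp), %edx
    ("store", "edx", -8, "ebp"),   # mov %edx, -8(%ebp)
    ("leave",),
    ("ret",),
]


def analyze(code):
    regs = {"esp": ("Base_SP", 0), "ebp": ("Base_BP", 0),
            "eax": ("EAX0", 0), "edx": ("EDX0", 0)}
    entry = dict(regs)
    stack = {0: ("RetAddr", 0)}    # slot offsets are relative to Base_SP
    params = set()                 # stack offsets read as incoming arguments

    def bump(reg, k):
        base, off = regs[reg]
        regs[reg] = (base, off + k)

    def slot(disp, reg):
        base, off = regs[reg]
        assert base == "Base_SP", "address is not a known stack location"
        return off + disp

    def read(off):
        if off > 0 and off not in stack:   # first read above the return address
            params.add(off)
            stack[off] = (f"arg{off // 4}", 0)
        return stack.get(off, ("TOP", 0))

    for op, *a in code:
        if op == "push":
            bump("esp", -4)
            stack[slot(0, "esp")] = regs[a[0]]
        elif op == "mov":
            regs[a[1]] = regs[a[0]]
        elif op == "sub":
            bump(a[1], -a[0])
        elif op == "add":
            bump(a[1], a[0])
        elif op == "load":
            regs[a[2]] = read(slot(a[0], a[1]))
        elif op == "store":
            src = regs[a[0]] if isinstance(a[0], str) else ("const", a[0])
            stack[slot(a[1], a[2])] = src
        elif op == "leave":                # mov %ebp, %esp; pop %ebp
            regs["esp"] = regs["ebp"]
            regs["ebp"] = read(slot(0, "esp"))
            bump("esp", 4)
        elif op == "ret":                  # summarize the state at the return
            changed = {r: v for r, v in regs.items() if v != entry[r]}
            for off in sorted(params):
                if stack[off] != (f"arg{off // 4}", 0):
                    changed[f"arg{off // 4}"] = stack[off]
            return {"esp_at_ret": regs["esp"],
                    "returns_to_caller": stack.get(slot(0, "esp")) == ("RetAddr", 0),
                    "stack_params": len(params),
                    "changed": changed}


summary = analyze(F)
print(summary)
```

Under these assumptions the result agrees with the slide: ESP is back at Base_SP + 0 when ret executes, two stack parameters are read, and only EAX, EDX, and arg1 differ from their entry values.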

Page 12: Binary Analysis and Rewriting

Stack Analysis: Preliminary results

[Figure: analysis time (seconds, axis 0 to 300) versus size (K instructions, axis 0 to 600) for FTP, pdftops, Gimp, XMMS, and Apache]

Page 13: Binary Analysis and Rewriting

Static disassembly of binary code

Page 14: Binary Analysis and Rewriting

Background: Disassembly Techniques

• Linear sweep algorithm
    • Start with program entry point, proceed to disassemble instructions sequentially
    • Key assumption: all instructions appear one after the next, without any gaps
        • Violated in most code (presence of data or padding)
• Recursive traversal algorithm
    • After a control-flow transfer instruction (CTI), proceed to disassemble target address
    • For conditional CTI and non-CTI, proceed to disassemble next instruction
    • Key problems
        • Code reached only through indirect CTIs
        • Functions that don’t return in the usual way
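The difference between the two algorithms shows up as soon as data is mixed into code. The Python sketch below uses a hypothetical five-opcode toy ISA (not x86) and a nine-byte image invented for this example. The function names and the choice to skip one byte on a decode failure are assumptions.

```python
# Hypothetical toy ISA: opcode -> (mnemonic, length in bytes). This is not x86.
ISA = {0x01: ("nop", 1), 0x02: ("jmp", 2), 0x03: ("jz", 2),
       0x04: ("ret", 1), 0x05: ("call", 2)}

IMAGE = bytes([
    0x02, 0x03,   # 0: jmp 3
    0x03,         # 2: one byte of data that happens to look like a jz opcode
    0x05, 0x07,   # 3: call 7
    0x04,         # 5: ret
    0xFF,         # 6: padding
    0x01,         # 7: nop
    0x04,         # 8: ret
])


def decode(image, addr):
    """Return (mnemonic, length, target), or None if addr does not decode."""
    if addr >= len(image) or image[addr] not in ISA:
        return None
    name, length = ISA[image[addr]]
    if addr + length > len(image):
        return None
    target = image[addr + 1] if length == 2 else None
    return name, length, target


def linear_sweep(image, start=0):
    """Decode sequentially from start, assuming instructions are contiguous."""
    found, addr = set(), start
    while addr < len(image):
        insn = decode(image, addr)
        if insn is None:
            addr += 1                 # skip an undecodable byte and resynchronize
            continue
        found.add(addr)
        addr += insn[1]
    return found


def recursive_traversal(image, entry=0):
    """Follow control flow from entry; only direct targets are known."""
    found, work = set(), [entry]
    while work:
        addr = work.pop()
        if addr in found:
            continue
        insn = decode(image, addr)
        if insn is None:
            continue
        found.add(addr)
        name, length, target = insn
        if name in ("jmp", "jz", "call"):
            work.append(target)           # follow the direct target
        if name not in ("jmp", "ret"):
            work.append(addr + length)    # fall through (a call is assumed to return)
    return found


print(sorted(linear_sweep(IMAGE)))         # [0, 2, 5, 7, 8]
print(sorted(recursive_traversal(IMAGE)))  # [0, 3, 5, 7, 8]
```

Linear sweep treats the data byte at address 2 as an instruction and consequently misses the real call at address 3. Recursive traversal gets this image right, but only because every target is direct and the call returns normally, which are exactly the two problem cases listed above.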

Page 15: Binary Analysis and Rewriting

Our Approach for Disassembly

• Assumption
    • No code obfuscation
• Non-assumptions
    • Function prologue and epilogue patterns
    • Compiler idioms or (lack of) optimizations
• Approach
    • Use recursive traversal
    • Use stack analysis to compute/verify return targets
    • Develop new analysis to determine targets of indirect control-flow transfers

Page 16: Binary Analysis and Rewriting

Our Approach: Type inference

• Key insight: code pointer values don’t undergo arithmetic or other transformations
    • Implication: values assigned to code pointers must represent indirect CTI targets
• Achieves much better results than data flow analysis
    • Avoids global def-use problem, which is very hard in low-level languages
• Compute sets C_poss of possible code addresses and C_def of definite code addresses
    • Code at addresses in C_def can be safely disassembled
    • Code at addresses not in C_poss can be safely relocated
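One way to picture the two sets is a flow-insensitive pass over a lifted program. The Python sketch below is only an illustration under strong assumptions: the statement forms (store, copy, icall), the section bounds, and the rule "a location is a code pointer if it reaches an indirect call through plain copies" are invented here and are not taken from the authors' analysis.

```python
TEXT = range(0x1000, 0x2000)   # hypothetical bounds of the code section

# Hypothetical three-address statements standing in for lifted binary code.
PROGRAM = [
    ("store", "handler", 0x1200),   # handler = 0x1200
    ("store", "counter", 0x1400),   # an integer that merely looks like an address
    ("store", "table0", 0x1800),    # table0 = 0x1800
    ("copy", "fp", "table0"),       # fp = table0 (no arithmetic on the value)
    ("icall", "handler"),           # call *handler
    ("icall", "fp"),                # call *fp
    ("store", "limit", 0x9000),     # constant outside the code section
]


def infer_code_addresses(program, text):
    """Return (possible, definite) sets of code addresses."""
    # Locations used as indirect CTI targets get the type "code pointer".
    code_ptr = {s[1] for s in program if s[0] == "icall"}
    # The type flows backwards through plain copies until nothing changes.
    changed = True
    while changed:
        changed = False
        for _, dst, src in (s for s in program if s[0] == "copy"):
            if dst in code_ptr and src not in code_ptr:
                code_ptr.add(src)
                changed = True
    stores = [(s[1], s[2]) for s in program if s[0] == "store"]
    # Definite: constants assigned to code-pointer-typed locations.
    definite = {v for loc, v in stores if loc in code_ptr and v in text}
    # Possible: any constant that falls inside the code section.
    possible = {v for loc, v in stores if v in text}
    return possible, definite


possible, definite = infer_code_addresses(PROGRAM, TEXT)
print(sorted(hex(a) for a in possible))   # ['0x1200', '0x1400', '0x1800']
print(sorted(hex(a) for a in definite))   # ['0x1200', '0x1800']
```

In this toy, 0x1400 is a possible but not a definite code address, so a rewriter would neither start disassembly there nor move whatever sits at that address. No global def-use chains are needed because the typing step never tracks which store reaches which call.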

Page 17: Binary Analysis and Rewriting

Static Disassembly: Preliminary Results

Analysis of disassembler on 'ls' binary

Analysis                          Disassembled code    Reachable code not disassembled
Recursive traversal               2.7%                 85%
Compiler idioms and heuristics    87%                  1%
Function pointer analysis         88%                  0%

Page 18: Binary Analysis and Rewriting

Static Disassembly: Preliminary Results

Gap in dhclient due to incomplete implementation, dealing with global arrays

Application    Size (KB)    Disassembled code    Reachable code not disassembled
pdftops        14           97%                  0%
chroot         26           85%                  0%
chmod          39           87%                  0%
cat            43           92%                  0%
ls             96           88%                  0%
dhclient       411          81%                  4%

Page 19: Binary Analysis and Rewriting

DTA++: Improving accuracy of Dynamic Taint Analysis [NDSS 2011]

Page 20: Binary Analysis and Rewriting

Under-tainting and Over-tainting

Results vary based on which values are considered to depend on others:

Page 21: Binary Analysis and Rewriting


• Too few dependencies lead to under-tainting

Page 22: Binary Analysis and Rewriting


• Too many dependencies lead to over-tainting

Page 23: Binary Analysis and Rewriting

Key Idea

• Under-tainting occurs when control flow state represents (almost) all of the information in inputs
• Key idea: propagate taint only for control dependencies that would cause under-tainting (culprit implicit flows)

Page 24: Binary Analysis and Rewriting


    char output[256];
    char input = next_in();
    long len = 0;
    if (input == '{') {
        output[0] = '\\';
        output[1] = '{';
        len = 2;
    }
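The fragment above is a culprit implicit flow: the output is fully determined by the input, yet every value written inside the branch is a constant. The Python sketch below ports the fragment and attaches one taint bit to each value. The function name escape_brace and the propagate_control switch are assumptions for illustration, and the sketch models a single branch rather than a general taint engine.

```python
def escape_brace(ch, ch_tainted, propagate_control):
    """Python port of the C fragment, with a taint bit per value.

    Returns (output, length, taint bits of output, taint bit of length).
    """
    output, out_taint = [], []
    length, len_taint = 0, False        # constants start untainted
    if ch == "{":
        # Every value assigned in this branch is a constant, so pure
        # data-flow tracking sees no tainted source operand here.
        branch_taint = ch_tainted and propagate_control
        output = ["\\", "{"]
        out_taint = [branch_taint, branch_taint]
        length, len_taint = 2, branch_taint
    return output, length, out_taint, len_taint


data_only = escape_brace("{", True, propagate_control=False)
with_control = escape_brace("{", True, propagate_control=True)
print(data_only)      # output untainted although it depends on the input
print(with_control)   # taint carried across the control dependency
```

With propagate_control off, the escaped brace comes out untainted, which is the under-tainting case. Turning it on for every branch in a program would instead taint nearly everything, which is why the rule is meant to be applied only at culprit branches.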

Page 25: Binary Analysis and Rewriting

DTA++ Approach Overview

• Hypothesis: under-tainting occurs at just a few locations in a program (culprit branches)
• Approach: find these locations in advance, and construct new taint propagation rules for them
• Assumption: we are given test inputs that demonstrate the under-tainting

Page 26: Binary Analysis and Rewriting

Approach Details

• Under-tainting detection predicate
    • Given a (partial) execution trace t, φ(t) holds if t contains a culprit implicit flow
    • Implementation
        • Use symbolic execution to count how many other inputs could take the same execution path as t
        • Few or none → φ(t) = true
• Search for culprit branches
    • Find shortest prefix of t that satisfies φ
        • The last instruction in the prefix is the culprit
    • Remove culprit, repeat the search to find others
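The search loop can be sketched in a few lines of Python. To keep the example runnable without a solver, it replaces symbolic execution with brute-force enumeration of a one-byte input space, which is a deliberate simplification and not what the slides describe. The toy program run, the branch names, and the threshold few are likewise assumptions.

```python
def run(byte):
    """Hypothetical program under test: returns its branch trace for one input byte."""
    return [("is_ascii", byte < 128), ("is_brace", byte == ord("{"))]


def follows(trace, prefix, ignored):
    """True if trace takes the same branch outcomes as prefix, skipping ignored branches."""
    if len(trace) < len(prefix):
        return False
    return all(p[0] in ignored or trace[i] == p for i, p in enumerate(prefix))


def phi(prefix, ignored, inputs, few=1):
    """Detection predicate: at most `few` other inputs follow this prefix."""
    same_path = sum(follows(run(x), prefix, ignored) for x in inputs)
    return same_path - 1 <= few    # same_path includes the traced input itself


def find_culprits(t, inputs):
    """Repeatedly take the last branch of the shortest prefix that satisfies phi."""
    culprits = []
    while True:
        ignored = set(culprits)    # branches already handled are removed from the check
        for n in range(1, len(t) + 1):
            if t[n - 1][0] not in ignored and phi(t[:n], ignored, inputs):
                culprits.append(t[n - 1][0])
                break
        else:
            return culprits


culprits = find_culprits(run(ord("{")), range(256))
print(culprits)   # ['is_brace']
```

For the input '{', the first branch is shared with 127 other bytes, so it is not flagged. The two-branch prefix is taken by no other input, so its last branch is reported as the culprit. Once that branch is ignored, no further prefix satisfies φ and the search stops.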

Page 27: Binary Analysis and Rewriting

DTA++ Results: Diagnosis Time

Program, Description    # of Culprit Implicit Flows Detected & Fixed    Time for Diagnosis
WordPad, RTF            1                                               0.26s
MS Word 2003, RTF       24                                              31m 5.26s
AbiWord, HTML           1                                               14.29s
AngelWriter, HTML       3                                               0.63s
AurelEdit, RTF          1                                               0.76s
VNU Editor, RTF         1                                               0.34s
IntelliEdit, RTF        1                                               0.40s
CryptEdit, RTF          1                                               0.23s

Page 28: Binary Analysis and Rewriting

DTA++ Results: Over-tainting

Page 29: Binary Analysis and Rewriting

Summary and Future Work

• Develop novel static-analysis-based methods to overcome the drawbacks of today’s techniques
    • Robust, scalable static analysis of low-level code
    • Accurate disassembly of binary code
    • Accurate reasoning about key properties
        • Dynamic taint analysis
• Future work
    • Experimentation and evaluation of stack analysis and disassembly
    • Robust and efficient binary instrumentation for information flow and related properties
    • Application to hostile OS defense