recovery of variables and heap structure in x86 executables

Recovery of Variables and Heap Structure in x86 Executables Gogul Balakrishnan Thomas Reps University of Wisconsin


Post on 11-Jan-2016


Page 1: Recovery of Variables and  Heap Structure in x86 Executables

Recovery of Variables and Heap Structure in x86 Executables

Gogul Balakrishnan, Thomas Reps

University of Wisconsin

Page 2: Recovery of Variables and  Heap Structure in x86 Executables

Overview

• Introduction
• Challenges
• Background
• Recovering A-locs via Iteration
• An Abstraction for Heap-Allocated Storage
• Experiments

Page 3: Recovery of Variables and  Heap Structure in x86 Executables

Introduction

• The Need for Analyzing Executables
  – What You See Is Not What You eXecute
    e.g., memset(password, '\0', len); free(password);
    (an optimizing compiler may remove the memset as a dead store, so the password is never cleared from memory)

• Many Obstacles in Analyzing Executables
  – Data Objects are Not Easily Identifiable.
  – Absence of Symbol Table & Debugging Information
  – Determining the Memory Addresses of Data Objects
  – Difficult to Track the Flow of Data through Memory
  – Challenging to Get Useful Information about the Heap

Page 4: Recovery of Variables and  Heap Structure in x86 Executables

Challenges(1/3)

• Recovering Variable-like Entities
  – The Layout of Memory is Known at Compile Time or Assembly Time (IDAPro's Approach)
  – To Recover y, the Set of Values that eax Holds at Instruction 5 Needs to be Determined.

void main() {
  int x, y;
  x = 1;
  y = 2;
  return;
}

proc main
1  mov ebp, esp
2  sub esp, 8
3  mov [ebp-8], 1
4  mov eax, ebp
5  mov [eax-4], 2
6  add esp, 8
7  retn

Page 5: Recovery of Variables and  Heap Structure in x86 Executables

Challenges(2/3)

• Granularity of Recovered Variable-like Entities
  – Affects the complexity and accuracy of subsequent analyses

• The Structure of Heap-Allocated Objects
  – Only the Size of the Allocated Block is Known.
  – Using an Abstraction-Refinement Algorithm

Page 6: Recovery of Variables and  Heap Structure in x86 Executables

Challenges(3/3)

• Resolving Virtual-Function Calls
  – With the one-variable-per-malloc-site abstraction, a Definite Link between the Object and the Virtual-Function Table is Never Established. (Weak Update)

Page 7: Recovery of Variables and  Heap Structure in x86 Executables

Background(1/6)

• Abstract Locations (A-locs)
  – Memory-Region
    • A Set of Disjoint Memory Areas
    • Represents a Group of Locations that have Similar Runtime Properties
  – Abstract Locations
    • Locations between two addresses/offsets in a Memory-Region
    • Addresses & Offsets are Statically Determined

Page 8: Recovery of Variables and  Heap Structure in x86 Executables

Background(2/6)

• Abstract Locations (cont'd)

proc main
0   mov ebp,esp
1   sub esp,40
2   mov ecx,0
3   lea eax,[ebp-40]
L1: mov [eax], 1
5   mov [eax+4],2
6   add eax, 8
7   inc ecx
8   cmp ecx, 5
9   jl L1
10  mov eax,[ebp-36]
11  add esp,40
12  retn

Page 9: Recovery of Variables and  Heap Structure in x86 Executables

Background(3/6)

• Value-Set Analysis (VSA)
  – Combined Numeric-Analysis & Pointer-Analysis
  – Over-Approximation of the values that each a-loc holds at each program point
  – Value-Set
    • The Set of Addresses and Numeric Values
    • N-tuple of strided intervals of the form s[l, u], where N is the number of memory-regions
    • (Global Region, Procedure Region, …)
    • (1[0, 9], ∅) versus (∅, 8[-40, -8])

e.g., 8[-40, -8] = {-40, -32, -24, -16, -8}
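The strided-interval notation can be made concrete with a few lines of Python. This is a minimal sketch, not the analysis's actual data structure: the class name `StridedInterval` and the method `concretize` are hypothetical names chosen for illustration, and overflow and machine-word width are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StridedInterval:
    """s[l, u]: the values l, l+s, l+2s, ... that do not exceed u."""
    stride: int
    lo: int
    hi: int

    def concretize(self):
        # A singleton (or a zero stride) denotes exactly one value.
        if self.stride == 0 or self.lo == self.hi:
            return [self.lo]
        return list(range(self.lo, self.hi + 1, self.stride))


# The slide's example: offsets in the activation record of main.
print(StridedInterval(8, -40, -8).concretize())
```

A value-set is then one such interval (or the empty set) per memory-region, for instance one component for the global region and one for the procedure's region.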

Page 10: Recovery of Variables and  Heap Structure in x86 Executables

Background(4/6)

• Value-Set Analysis (cont'd)
  – The Value-Set of eax at L1
    • (∅, 8[-40, -8])
    • eax holds the offsets {-40, -32, -24, -16, -8}
    • Starting Addresses of Field x of p

proc main
0   mov ebp,esp
1   sub esp,40
2   mov ecx,0
3   lea eax,[ebp-40]
L1: mov [eax], 1
5   mov [eax+4],2
6   add eax, 8
7   inc ecx
8   cmp ecx, 5
9   jl L1
10  mov eax,[ebp-36]
11  add esp,40
12  retn

typedef struct {
  int x, y;
} Point;

int main() {
  int i;
  Point p[5];
  for (i = 0; i < 5; ++i) {
    p[i].x = 1;
    p[i].y = 2;
  }
  return p[0].y;
}

Page 11: Recovery of Variables and  Heap Structure in x86 Executables

Background(5/6)

• Aggregate Structure Identification (ASI)
  – Can Distinguish between Accesses to Different Parts of the Same Aggregate
  – Aggregate is broken up into smaller parts (atoms)
  – Data-Access Constraint Language (DAC)
    • Specifying Data-Access Pattern in the Program

    DataRef           Reference to a set of sequences of bytes
    UnifyConstraint   Flow of Data in the Program

Page 12: Recovery of Variables and  Heap Structure in x86 Executables

Background(6/6)

• Aggregate Structure Identification (cont'd)
  – Data-Access Constraint Language (DAC)
    • DataRef[l : u] refers to bytes l through u in DataRef
    • DataRef\n : n is the number of elements
      e.g., P[0:11]\3 = P[0:3], P[4:7], or P[8:11]
  – ASI DAG

[Figure: ASI DAG; labeled node: return_main]

p[0:39]\5[0:3] ≈ const_1[0:3];
p[0:39]\5[4:7] ≈ const_2[0:3];
return_main[0:3] ≈ p[4:7]
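To see which bytes an array-style DataRef denotes, the range can be split into equal-sized elements and a sub-range selected inside each one. The sketch below is only an illustration of that reading; the function names `array_elements` and `select` are hypothetical and are not part of ASI itself.

```python
def array_elements(lo, hi, n):
    """Split bytes lo..hi (inclusive) into n equal-sized elements."""
    total = hi - lo + 1
    if n <= 0 or total % n != 0:
        raise ValueError("range does not divide into n equal elements")
    size = total // n
    return [(lo + i * size, lo + (i + 1) * size - 1) for i in range(n)]


def select(lo, hi, n, sub_lo, sub_hi):
    """Bytes sub_lo..sub_hi inside each of the n elements of lo..hi."""
    return [(start + sub_lo, start + sub_hi)
            for start, _ in array_elements(lo, hi, n)]


# P[0:11] as 3 elements: P[0:3], P[4:7], P[8:11].
print(array_elements(0, 11, 3))
# p[0:39] as 5 elements, bytes 0..3 of each: the x field of every Point.
print(select(0, 39, 5, 0, 3))
```

Under this reading, the first two constraints on the slide say that every `x` field is unified with the constant 1 and every `y` field with the constant 2.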

Page 13: Recovery of Variables and  Heap Structure in x86 Executables

Recovering A-locs via Iteration

• Problems of VSA
  – Can only Represent a Contiguous Sequence of Memory Locations
  – Cannot Detect Internal Substructure

• Basic Idea
  1. VSA is used to obtain memory-access patterns in the executable;
  2. ASI is used as a heuristic to determine a set of a-locs according to the memory-access patterns obtained from the information recovered by VSA.

[Figure: flow among IDAPro, VSA, and ASI, producing the Final Value-Sets]

Page 14: Recovery of Variables and  Heap Structure in x86 Executables

Recovering A-locs via Iteration

• Generating Data-Access Constraints from Value-Sets

<Algorithm 1 SI2ASI>
Input  : (r, s[l, u], length)
Output : (ASI Ref, Boolean)

if s[l,u] is a singleton then                  // e.g., s[l, l]
  return <"r[l : l+length-1]", true>
else
  size ← max(s, length)
  n ← (u - l + size - 1) / size                // the number of array elements
  ref ← "r[l : u+size-1]\n[0 : size-1]"        // r[l : u+size-1] is the actual byte range
  return <ref, (s = size)>
end if

e.g., (AR_main, 8[-40, -8], length) => {AR_main[(-40):(-1)]\5[0:7]}
  AR_main[-40:-33][0:7]
  AR_main[-32:-25][0:7]
  AR_main[-24:-17][0:7]
  AR_main[-16:-9][0:7]
  AR_main[-8:-1][0:7]
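Algorithm 1 translates almost directly into Python. The following is a sketch under stated assumptions rather than the authors' implementation: the function name `si2asi` and its argument order are hypothetical, `\` is assumed to be the separator that introduces the element count in an ASI reference, and the element count is computed as the number of `size`-byte elements in the byte range, `(hi - lo + size) // size`, an assumption chosen because the slide does not state how its division rounds and this form reproduces the 5 elements of the slide's example.

```python
def si2asi(region, stride, lo, hi, length):
    """Convert a strided interval stride[lo, hi] in `region` into an ASI
    reference string plus a flag saying whether the reference is exact."""
    if lo == hi:
        # Singleton: the access touches exactly `length` bytes at one offset.
        return f"{region}[{lo}:{lo + length - 1}]", True
    size = max(stride, length)
    # Assumption: count the size-byte elements covering lo .. hi+size-1.
    n = (hi - lo + size) // size
    ref = f"{region}[{lo}:{hi + size - 1}]\\{n}[0:{size - 1}]"
    return ref, stride == size


# The slide's example, with an access length of 4 bytes assumed.
print(si2asi("AR_main", 8, -40, -8, 4))
```

The Boolean is false when the access length exceeds the stride, that is, when consecutive accesses overlap and the array view is only an approximation.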

Page 15: Recovery of Variables and  Heap Structure in x86 Executables

Recovering A-locs via Iteration

• Generating Data-Access Constraints from Value-Sets

<Algorithm 2>
if s1[l1,u1] or s2[l2,u2] is a singleton then
  return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
end if
// Determine base register
if s1 ≥ (u2 - l2 + length) then
  baseSI ← s1[l1, u1]
  indexSI ← s2[l2, u2]
else if s2 ≥ (u1 - l1 + length) then
  baseSI ← s2[l2, u2]
  indexSI ← s1[l1, u1]
else
  return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
end if
<baseRef, exactRef> ← SI2ASI(r, baseSI, stride(baseSI))
if exactRef is false then
  return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
else
  return concat(baseRef, SI2ASI('', indexSI, length))
end if

e.g., eax : (1[0:9], ∅)
      ecx : (∅, 16[-160, -16])
      In case of [ecx+eax] => AR[-160:-1]\10[0:15][0:9]\10[0:0]
        Base Addr  : AR[-160:-1]\10[0:15]
        Index Addr : [0:9]\10[0:0]
      (Row-major order)
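The two-operand case can be sketched in the same way. Everything here is illustrative: `access_to_asi`, `si_add`, and the tuple encoding `(stride, lo, hi)` are hypothetical names and representations, `si_add` is a simplified stand-in for ⊕ that ignores overflow (stride is the gcd of the two strides, bounds are added), the helper `si2asi` repeats Algorithm 1 with the element count assumed to be `(hi - lo + size) // size`, and the function returns only the reference string rather than a pair.

```python
from math import gcd


def si2asi(region, stride, lo, hi, length):
    # Algorithm 1, with the element count assumed as (hi - lo + size) // size.
    if lo == hi:
        return f"{region}[{lo}:{lo + length - 1}]", True
    size = max(stride, length)
    n = (hi - lo + size) // size
    return f"{region}[{lo}:{hi + size - 1}]\\{n}[0:{size - 1}]", stride == size


def si_add(a, b):
    # Simplified stand-in for ⊕: no overflow handling.
    return gcd(a[0], b[0]), a[1] + b[1], a[2] + b[2]


def access_to_asi(region, a, b, length):
    """ASI reference for a `length`-byte access at [reg_a + reg_b], where
    a and b are (stride, lo, hi) strided intervals of the two registers."""
    def fallback():
        return si2asi(region, *si_add(a, b), length)[0]

    (s1, l1, u1), (s2, l2, u2) = a, b
    if l1 == u1 or l2 == u2:
        return fallback()
    # The base is the operand whose stride is large enough to hold every
    # byte that the other operand (the index) can reach.
    if s1 >= u2 - l2 + length:
        base, index = a, b
    elif s2 >= u1 - l1 + length:
        base, index = b, a
    else:
        return fallback()
    base_ref, exact = si2asi(region, *base, base[0])
    if not exact:
        return fallback()
    return base_ref + si2asi("", *index, length)[0]


eax = (1, 0, 9)        # index: 1[0, 9]
ecx = (16, -160, -16)  # base: 16[-160, -16]
print(access_to_asi("AR", ecx, eax, 1))
```

With a 1-byte access assumed, the call reproduces the nested reference from the slide: an array of 10 elements of 16 bytes, each containing an array of 10 one-byte elements.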

Page 16: Recovery of Variables and  Heap Structure in x86 Executables

Recovering A-locs via Iteration

• Interpreting Indirect Memory-References
  – Lookup Algorithm
    • NodeDesc : <name, length>
      name : the name associated with the ASI tree node
      length : the length of the above node
    • NodeDescList : An Ordered List of NodeDesc
      e.g., [nd1, nd2, …, ndn]
    • Three Operations

    Name                    Output
    GetChildren(aloc)       List of Child Nodes
    GetRange(start, end)    List of Nodes with offsets in the given range [start, end]
    GetArrayElements(m)     List of Nodes with m elements

Page 17: Recovery of Variables and  Heap Structure in x86 Executables

Recovering A-locs via Iteration

• Lookup Algorithm Examples

e.g., Lookup p[0:39]\5[0:3]

GetChildren(p) = [<a3, 4>, <a4, 4>, <i2, 32>]
GetRange(0, 39) = [<a3, 4>, <a4, 4>, <i2, 32>]
GetArrayElements(5) = [<a3, 4>, <a4, 4>], [<a5, 4>, <a6, 4>]
GetRange(0, 3) = [<a3, 4>, <a5, 4>]

Page 18: Recovery of Variables and  Heap Structure in x86 Executables

An Abstraction for Heap-Allocated Storage

• Previous Abstraction
  – All of the nodes allocated at a given allocation site s are folded together into a single summary node n_s.

• Recency Abstraction
  – Allowing VSA & ASI to recover information about virtual-function tables
  – Use Two Memory-Regions per allocation site s
    • MRAB[s] : Most Recently Allocated Block
    • NMRAB[s] : Non-Most Recently Allocated Block
    • count : How many concrete blocks the memory-region represents (MRAB[s].count, NMRAB[s].count)
      – SmallRange = {[0, 0], [0, 1], [1, 1], [0, ∞], [1, ∞], [2, ∞]}
    • size : over-approximation of the size of block (MRAB[s].size, NMRAB[s].size)

Page 19: Recovery of Variables and  Heap Structure in x86 Executables

An Abstraction for Heap-Allocated Storage

• Operation
  – AbsEnv[s] : MRAB[s]/NMRAB[s] → <count, size, alocEnv>
  – AlocEnv = a-loc → ValueSet
  – Allocation site s transforms absEnv to absEnv’
    • absEnv’(MRAB[s]) = <[0,1], size, a-loc.Value-Set>
    • absEnv’(NMRAB[s]).count = absEnv(NMRAB[s]).count + absEnv(MRAB[s]).count
    • absEnv’(NMRAB[s]).size = absEnv(NMRAB[s]).size ∪ absEnv(MRAB[s]).size
    • absEnv’(NMRAB[s]).alocEnv = absEnv(NMRAB[s]).alocEnv ∪ absEnv(MRAB[s]).alocEnv
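The effect of an allocation site on the two memory-regions can be sketched as a small state update. This is an illustration under several assumptions, none of which come from the slides: sizes and value-sets are modeled as plain Python sets instead of strided intervals, the names `allocate`, `add_counts`, `to_small_range`, and `join_env` are hypothetical, the way a summed count is rounded into SmallRange is a guess, and the fresh block starts with an empty a-loc environment.

```python
import math

INF = math.inf


def to_small_range(lo, hi):
    # Assumed rounding into SmallRange: lower bound capped at 2,
    # any upper bound above 1 becomes infinity.
    return min(lo, 2), hi if hi <= 1 else INF


def add_counts(a, b):
    return to_small_range(a[0] + b[0], a[1] + b[1])


def join_env(a, b):
    # Pointwise union of the value-sets of each a-loc.
    return {k: a.get(k, set()) | b.get(k, set()) for k in a.keys() | b.keys()}


def allocate(abs_env, new_size):
    """Transformer for one allocation site: the old MRAB is folded into
    NMRAB, and MRAB becomes the freshly allocated block."""
    mrab, nmrab = abs_env["MRAB"], abs_env["NMRAB"]
    return {
        "MRAB": {"count": (0, 1), "size": {new_size}, "aloc_env": {}},
        "NMRAB": {
            "count": add_counts(nmrab["count"], mrab["count"]),
            "size": nmrab["size"] | mrab["size"],
            "aloc_env": join_env(nmrab["aloc_env"], mrab["aloc_env"]),
        },
    }


env0 = {
    "MRAB": {"count": (1, 1), "size": {8}, "aloc_env": {"vptr": {0x1000}}},
    "NMRAB": {"count": (0, 0), "size": set(), "aloc_env": {}},
}
env1 = allocate(env0, 8)
print(env1["NMRAB"])
```

Because MRAB[s] always stands for at most one concrete block, writes through a pointer to it can be treated as strong updates, which is what lets the analysis keep the link between an object and its virtual-function table.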

Page 20: Recovery of Variables and  Heap Structure in x86 Executables

An Abstraction for Heap-Allocated Storage

Page 21: Recovery of Variables and  Heap Structure in x86 Executables

Experiments

• Environments
  – Software

    OS        Compiler            Language   Target Files
    Windows   Visual Studio 6.0   C++        .obj

Page 22: Recovery of Variables and  Heap Structure in x86 Executables

Experiments

• Results of Virtual-Function Call Resolution

Page 23: Recovery of Variables and  Heap Structure in x86 Executables

Experiments

• Results of A-loc Identification
  – Comparing the Results of the Algorithm with Debugging Information

The structure of 87% of the local variables is correct

Page 24: Recovery of Variables and  Heap Structure in x86 Executables

Experiments

• Results of A-loc Identification

The structure of 72% of the objects in the heap is correct

Page 25: Recovery of Variables and  Heap Structure in x86 Executables

Q & A