automated reassembly of document fragments dfrws 2002

27
Automated Reassembly of Document Fragments DFRWS 2002

Upload: brent-park

Post on 03-Jan-2016

219 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Automated Reassembly of Document Fragments DFRWS 2002

Automated Reassembly of Document Fragments

DFRWS 2002

Page 2: Automated Reassembly of Document Fragments DFRWS 2002

Outline

Introduction & Motivation Stages in Reassembly Reassembly Problem Our Solution Implementation & Experiments Summary

Page 3: Automated Reassembly of Document Fragments DFRWS 2002

Introduction & Motivation

Page 4: Automated Reassembly of Document Fragments DFRWS 2002

Introduction

Reassembly of objects from mixed fragments

Common problem in:– Classical Forensics– Failure Analysis– Archaeology

Well studied, automated… Is there a similar problem in

digital forensics?

Page 5: Automated Reassembly of Document Fragments DFRWS 2002

Motivation

Digital evidence is– malleable – easily scattered

Fragmentation process

Evidence

F1

F2

Fn-1

Fn

F3

Keywords

Split

Fragments

Page 6: Automated Reassembly of Document Fragments DFRWS 2002

Motivation…

Scenarios… Hiding in Slack Space

– Criminal splits the document and hides them selectively into slack spaces based on a password

Swap File– Addressing & state information is not

available on the disk Peer-to-peer systems

– Fragments are assigned a sequence of keywords and scattered across the network

– e.g.: FreeNet, M-o-o-t

Page 7: Automated Reassembly of Document Fragments DFRWS 2002

Stages of Reassembly

Page 8: Automated Reassembly of Document Fragments DFRWS 2002

Stages of Reassembly

pre

pro

cess

ing

coll

atin

g

F1

Fx

G1

Gy

H1

Hz

reas

sem

bly

G1G1

Gy

F1F2

Fx

H1H1

Hz

F

G

H

Page 9: Automated Reassembly of Document Fragments DFRWS 2002

Stages of Reassembly

Preprocessing– Cryptanalysis– Weight Assignments

Collating– Group together fragments of a

document– Hierarchical approach

Reassembly– Reordering the fragments to form the

original document

Page 10: Automated Reassembly of Document Fragments DFRWS 2002

Reassembly

Page 11: Automated Reassembly of Document Fragments DFRWS 2002

The Problem of Reassembly Suppose we have fragments {A0, A1,

… An} of document A Compute a permutation X such that

A= AX(0)||AX(1)|| … AX(n)

To compute A, we need to find adjacent fragments

To reassemble:– Need to find adjacent fragments– Automate the process

Page 12: Automated Reassembly of Document Fragments DFRWS 2002

Quantifying Adjacency

An Example: A linguist may assign probabilities based on syntactic and semantic analysis

This process is language dependent

the quick bro

over the lazy dog

wn fox jumps

Page 13: Automated Reassembly of Document Fragments DFRWS 2002

Context-Based Statistical Models

Context based models are used in data compression

Predicts subsequent symbols based on current context

Works well on natural languages as well as other data types

Context models can be used to predict upcoming symbols and assign candidate probabilities

abracadabra

abr acontext prediction

Page 14: Automated Reassembly of Document Fragments DFRWS 2002

Adjacency Matrix

Candidate probabilities of each pair of fragments form complete graph

A Hamiltonian path that maximizes the sum of candidate probabilities is our solution

But this problem is intractable

We will discuss a near optimal solution

c

a

e

d

b

a b c d ea 0 c(a,b) ….b c(b,a) 0 …. c d e c(e,a)

Page 15: Automated Reassembly of Document Fragments DFRWS 2002

Steps in Reassembling

1. Build context model using all the fragments

2. Compute candidate probabilities for each pair

3. Find a Hamiltonian Path that maximizes the sum of candidate probabilities

Page 16: Automated Reassembly of Document Fragments DFRWS 2002

Implementation & Experiments

Page 17: Automated Reassembly of Document Fragments DFRWS 2002

Prediction by Partial Matching (PPM)

Uses a suite of fixed order context models

Uses one or more orders to predict upcoming symbol

We process each fragment with PPM

Combine the statistics to form a model for all the fragments

order=2 order=1 order=0ab r 2 2/3 ^ 1 1/3

a b 2 2/7 c 1 1/7 d 1 1/7 ^ 3 3/7

a 5 5/12 b 2 2/16 c 1 1/16

ac a 1 ½ ^ 1 ½

ad a 1 ½ ^ 1 ½

br a 2 2/3 ^ 1 1/3

ca d 1 ½ ^ 1 ½

abracadabra

Page 18: Automated Reassembly of Document Fragments DFRWS 2002

Candidate Probability

Slide a window of size d from one fragment into the other

At each position, use the window as context and determine the probability (pi) of next symbol

Candidate prob. C(1,2) = (p0* p1*…pd)

a b r a c a d a b r a c d a b a

fragment 1 fragment 2

Page 19: Automated Reassembly of Document Fragments DFRWS 2002

Solution Tree

Assumtions:– Fragments are recovered

without data loss– First fragment is

known/easily identified

Paths in complete path can be represented as a tree

Tree grows exponentially!

We have to prune the tree

c

a

e

d

b

a

b c e

c d e

c e

e c

d

Page 20: Automated Reassembly of Document Fragments DFRWS 2002

Pruning

At every level choose a node with the largest candidate probability

We can choose alpha nodes at each level

By looking at candidate probabilities beta levels deep

a

b c e

c d e

c e

e c

d

b c e

b c

c b

Page 21: Automated Reassembly of Document Fragments DFRWS 2002

Experiments

Page 22: Automated Reassembly of Document Fragments DFRWS 2002

Data Set

OS Forensics Log, history files of various users

Source Code C/C++, Java source code

Binary Code Executable, object code (Window, Linux, Solaris)

Binary Document MS Office, PDF documents

Raw Plain-Text Unformatted text, transcripts

Encrypted & Compressed

Encrypted, compressed files

Page 23: Automated Reassembly of Document Fragments DFRWS 2002

Reassembly for various types

Document Reassembly for Various Type

01020304050607080

Document Type

Rea

ssem

bled

(%)

Page 24: Automated Reassembly of Document Fragments DFRWS 2002

Compression ratio

0

1

2

3

4

5

6

7

8

9

10

Co

mp

ress

ion

Ra

tio

Page 25: Automated Reassembly of Document Fragments DFRWS 2002

Iterative Approach

F1

F2

Fn

Reassembly Concatenate

Concatenatedfragments

Final Sequenc

e

OS Forensic

4

Source Code

6

Binary Code

9

Binary Doc

10

Raw-text 10

Page 26: Automated Reassembly of Document Fragments DFRWS 2002

Summary

Introduced reassembly of scattered evidence

Experiments & results Future work:

– Identifying preprocessing heuristics– Compare performance with other

models– Work on reassembling images

Page 27: Automated Reassembly of Document Fragments DFRWS 2002

Questions