Saumya Debray, The University of Arizona, Tucson, AZ 85721
![Page 1: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/1.jpg)
UNDERSTANDING SOFTWARE THAT DOESN’T WANT TO BE UNDERSTOOD REVERSE ENGINEERING OBFUSCATED BINARIES
Saumya Debray
The University of Arizona
Tucson, AZ 85721
![Page 2: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/2.jpg)
The Problem
Rapid analysis and understanding of malware code essential for swift response to new threats
‒ malicious software is usually heavily obfuscated against analysis
Existing approaches to reverse engineering such code are primitive
‒ not a lot of high-level tool support
‒ requires a lot of manual intervention
‒ slow, cumbersome, potentially error-prone
Delays development of countermeasures
![Page 3: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/3.jpg)
Goals
Develop automated techniques for analysis and reverse engineering of obfuscated binaries
semantics-based
‒ output is functionally equivalent to, but simpler than, the input program
generality
‒ should work on any obfuscation
  ‒ even ones we haven’t thought of yet!
‒ should minimize assumptions about obfuscations
![Page 4: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/4.jpg)
Challenges
can’t make assumptions about obfuscations
‒ what do we leverage for deobfuscation?
‒ distinguishing code we care about from code we don’t
  ‒ how do we know which instructions we care about?
scale
‒ “needle in haystack”
  ‒ number of instructions executed increases by 270x (VMProtect) to 4300x (Themida) [Lau 2008]
anti-analysis defenses
‒ runtime unpacking
‒ anti-emulation, anti-debug checks
![Page 5: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/5.jpg)
Our Approach
no obfuscation-specific assumptions
‒ treat programs as input-to-output transformations
‒ use semantics-preserving transformations to simplify execution traces
dynamic analysis to handle runtime unpacking
[Figure: processing pipeline from input program to control flow graph]
‒ Taint analysis (bit-level): map flow of values from input to output
‒ Semantics-preserving transformations: simplify logic of input-to-output transformation
‒ Control flow reconstruction: reconstruct logic of simplified computation
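The idea of using taint to separate input-to-output computation from chaff can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the trace format `(text, dest, srcs)`, the location names, and the function `tainted_instructions` are all hypothetical, taint is tracked per location rather than per bit, and only explicit data flow is followed.

```python
def tainted_instructions(trace, input_locs):
    """Forward pass over an execution trace.

    Returns the indices of instructions that read at least one location
    whose value derives from the program input.
    """
    tainted = set(input_locs)
    marked = []
    for idx, (_text, dest, srcs) in enumerate(trace):
        if any(src in tainted for src in srcs):
            tainted.add(dest)       # destination now carries input-derived data
            marked.append(idx)
        else:
            tainted.discard(dest)   # overwritten with an input-independent value
    return marked


# Hypothetical trace: (disassembly text, destination, sources)
TRACE = [
    ("mov eax, [input]",  "eax",    ["input"]),
    ("mov ebx, 0x1234",   "ebx",    []),        # chaff
    ("xor ebx, 0x55",     "ebx",    ["ebx"]),   # chaff
    ("add eax, 1",        "eax",    ["eax"]),
    ("mov [output], eax", "output", ["eax"]),
]

relevant = tainted_instructions(TRACE, {"input"})
print([TRACE[i][0] for i in relevant])
```

A fuller treatment would also restrict attention to instructions that actually reach the output; this sketch covers only the forward direction.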
![Page 6: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/6.jpg)
Ex 1: Emulation-based Obfuscation
examination of the code reveals only the emulator’s logic
‒ actual program logic embedded in byte code
lots of “chaff” during execution
‒ separating emulator logic from payload logic tricky
emulators can be nested
[Figure: Obfuscator (mutation engine). Inputs: input program, random seed. Outputs: bytecode logic (data), emulator (code)]
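To make the point about emulation concrete, here is a toy sketch. The opcode names, the tuple encoding, and the two-register machine are invented for illustration and do not correspond to Themida or any real obfuscator. Reading `run` shows only a dispatch loop; the factorial logic lives entirely in the `FACTORIAL` data.

```python
def run(bytecode, arg):
    """Interpret `bytecode`; returns (result, number of dispatch steps)."""
    regs = [arg, 1]              # r0 = input, r1 = accumulator
    pc = 0
    steps = 0
    while True:
        op, a, b = bytecode[pc]
        steps += 1
        if op == "mul":          # regs[a] *= regs[b]
            regs[a] *= regs[b]
        elif op == "dec":        # regs[a] -= 1
            regs[a] -= 1
        elif op == "jnz":        # jump to index b if regs[a] != 0
            if regs[a] != 0:
                pc = b
                continue
        elif op == "halt":       # return regs[a]
            return regs[a], steps
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1


# The "payload": factorial, expressed as data for the emulator above.
FACTORIAL = [
    ("jnz", 0, 2),   # 0: if r0 != 0 goto 2
    ("halt", 1, 0),  # 1: return r1 (input was 0)
    ("mul", 1, 0),   # 2: r1 *= r0
    ("dec", 0, 0),   # 3: r0 -= 1
    ("jnz", 0, 2),   # 4: loop while r0 != 0
    ("halt", 1, 0),  # 5: return r1
]

result, steps = run(FACTORIAL, 5)
print(result, steps)
```

Even in this tiny case, every payload operation costs a fetch, a decode, and a chain of comparisons in the emulator, which hints at why instruction counts grow so much under real emulation-based obfuscators.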
![Page 7: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/7.jpg)
Ex 2: Return-Oriented Programs (ROP)
Originally designed to bypass anti-code-injection defenses
‒ stitches together existing code fragments (“gadgets”), e.g., in system libraries
Logic can be difficult to discern
‒ gadgets are typically scattered across many different functions and/or libraries
‒ gadgets can overlap in memory in weird ways
‒ control flow structures (if-else, loops, function calls) are typically implemented using non-standard idioms
![Page 8: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/8.jpg)
Example 1 (emulation-obfuscation)
factorial (Themida)
![Page 9: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/9.jpg)
Example 2 (ROP)
factorial
[Figure panels: original, ROP]
![Page 10: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/10.jpg)
Interactions between Obfuscations
Example: Unpacking + Emulation
[Figure: two panels, each running from input to output. Execution trace (with two unpack phases): instructions “tainted” as propagating values from input to output. Input-to-output computation (further simplified): used to construct control flow graph]
![Page 11: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/11.jpg)
Results
Ex. 1. binary search: Themida
[Figure panels: original, obfuscated (cropped), deobfuscated]
![Page 12: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/12.jpg)
Results
Ex. 2. Hunatcha (drive infection code): ExeCryptor
[Figure panels: original, obfuscated (cropped), deobfuscated]
![Page 13: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/13.jpg)
Results
Ex. 3. fibonacci: ROP
[Figure panels: original, obfuscated, deobfuscated]
![Page 14: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/14.jpg)
Results
Ex. 4. Win32/Kryptik.OHY: Code Virtualizer
[Figure panels: obfuscated, deobfuscated]
‒ multiple layers of runtime code generation
‒ unpacking code
‒ initial unpacker is emulation-obfuscated
‒ the CFG shown materializes incrementally
![Page 15: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/15.jpg)
Results: CFG Similarity
[Figure: bar chart. x-axis: Programs; y-axis: Similarity with original program (%), 0 to 100; series: OBFUSCATED, DEOBFUSCATED]
![Page 16: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/16.jpg)
Lessons and Issues
Static vs. dynamic analysis
‒ multiple layers of runtime code generation/unpacking limits utility of static analysis
‒ dynamic analysis can run into problems of scale
  ‒ O(n²) algorithms impractical; even O(n log n) can be problematic
  ‒ trade memory space for execution time/complexity
  ‒ code coverage: multi-path exploration?
Taint propagation
‒ byte/word-level analyses may not be precise enough
  ‒ we use (enhanced) bit-level taint propagation
Simplified trace → CFG: NP-hard
‒ semantic considerations?
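The precision gap between byte-level and bit-level taint can be illustrated with a small sketch. The propagation rules below are assumptions chosen for illustration (the slides do not spell out the rules actually used, and the "enhanced" variant is not modeled); taint is represented as a 32-bit mask with a 1 for each tainted bit. The example computes `(x & 0x0F) & 0xF0`, which is always zero and therefore cannot depend on `x`.

```python
MASK32 = 0xFFFFFFFF


def bit_and_const(taint, const):
    """Bit-level rule for `x AND const`: a result bit can depend on x
    only where the constant has a 1."""
    return taint & const & MASK32


def coarsen_to_bytes(taint):
    """A byte counts as tainted if any of its bits is tainted."""
    out = 0
    for i in range(4):
        byte = 0xFF << (8 * i)
        if taint & byte:
            out |= byte
    return out


def byte_and_const(taint, const):
    """Byte-level rule (assumed for comparison): a result byte stays tainted
    if the operand byte is tainted and the constant byte is non-zero."""
    return coarsen_to_bytes(coarsen_to_bytes(taint) & const)


x_taint = 0x000000FF  # low byte of x comes from the input

bit_result = bit_and_const(bit_and_const(x_taint, 0x0F), 0xF0)
byte_result = byte_and_const(byte_and_const(x_taint, 0x0F), 0xF0)
print(hex(bit_result), hex(byte_result))  # 0x0 0xff
```

Under these rules the byte-level analysis keeps the result tainted, so downstream chaff that consumes it would be retained, while the bit-level analysis correctly reports no dependence on the input.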
![Page 17: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/17.jpg)
Conclusions
Rapid analysis and understanding of malware code essential for swift response to new threats
‒ need to deal with advanced code obfuscations
‒ obfuscation-specific solutions tend to be fragile
We describe a semantics-based framework for automatic code deobfuscation
‒ no assumptions about the obfuscation(s) used
‒ promising results on obfuscators (e.g., Themida) not handled by prior research
![Page 18: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/18.jpg)
ADDITIONAL MATERIAL
![Page 19: Saumya Debray The University of Arizona Tucson, AZ 85721](https://reader036.vdocuments.us/reader036/viewer/2022062323/568161ba550346895dd19165/html5/thumbnails/19.jpg)
Semantics-based simplification
Quasi-invariant locations: locations that have the same value at each use.
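As a rough illustration of this definition, the sketch below scans a trace and reports locations whose value is identical at every use. The trace representation (a list of dicts with a `"reads"` map from location to observed value) and the function name are hypothetical, chosen only to keep the example small.

```python
def quasi_invariant_locations(trace):
    """Return {location: value} for locations that hold the same value
    at every use (read) observed in the trace."""
    first_seen = {}   # location -> value at first use
    varying = set()   # locations observed with more than one value
    for step in trace:
        for loc, val in step["reads"].items():
            if loc in first_seen and first_seen[loc] != val:
                varying.add(loc)
            first_seen.setdefault(loc, val)
    return {loc: val for loc, val in first_seen.items() if loc not in varying}


# Hypothetical trace: [0x1000] always reads as 7, eax changes on every step.
TRACE = [
    {"reads": {"[0x1000]": 7, "eax": 1}},
    {"reads": {"[0x1000]": 7, "eax": 2}},
    {"reads": {"[0x1000]": 7, "eax": 6}},
]

print(quasi_invariant_locations(TRACE))  # {'[0x1000]': 7}
```

Because the property is established from one observed trace, a location found this way is invariant only for that execution, which is one reason to treat it as "quasi" rather than truly constant.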
Our transformations (currently):
‒ Arithmetic simplification
  ‒ adaptation of constant folding to execution traces
  ‒ consider quasi-invariant locations as constants
  ‒ controlled to avoid over-simplification
‒ Data movement simplification
  ‒ use pattern-driven rules to identify and simplify data movement.
‒ Dead code elimination
  ‒ need to consider implicit destinations, e.g., condition code flags.
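The remark about implicit destinations can be illustrated with a small backward liveness pass over a straight-line trace. This is a sketch under assumptions, not the framework's implementation: the `Insn` record, the register names, and the choice to always keep instructions with no destinations (such as branches) are hypothetical simplifications.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    text: str
    defs: frozenset   # locations written, including implicit ones such as flags
    uses: frozenset   # locations read


def eliminate_dead(trace, live_out):
    """Drop instructions whose results are never used later in the trace."""
    live = set(live_out)
    kept = []
    for insn in reversed(trace):
        # Instructions without destinations (e.g. branches) are kept conservatively.
        if not insn.defs or insn.defs & live:
            live -= insn.defs
            live |= insn.uses
            kept.append(insn)
    kept.reverse()
    return kept


def ignore_flags(insn):
    """Variant of an instruction that pretends the flags do not exist."""
    return Insn(insn.text, insn.defs - {"flags"}, insn.uses - {"flags"})


TRACE = [
    Insn("mov ecx, 5",   frozenset({"ecx"}),          frozenset()),
    Insn("add eax, edx", frozenset({"eax", "flags"}), frozenset({"eax", "edx"})),
    Insn("sub ebx, 1",   frozenset({"ebx", "flags"}), frozenset({"ebx"})),
    Insn("jnz L",        frozenset(),                 frozenset({"flags"})),
]

with_flags = [i.text for i in eliminate_dead(TRACE, {"eax"})]
without_flags = [i.text for i in eliminate_dead([ignore_flags(i) for i in TRACE], {"eax"})]
print(with_flags)     # ['add eax, edx', 'sub ebx, 1', 'jnz L']
print(without_flags)  # ['add eax, edx', 'jnz L']
```

In the second run `sub ebx, 1` is removed because `ebx` is never read again, even though the following `jnz` depends on the flags it sets; modeling the flags as a destination keeps it.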