reverse engineering malware · 2013. 1. 10. · sri international reverse engineering malware...

67
SRI International Reverse Engineering Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International [email protected]

Upload: others

Post on 20-Aug-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

SRI International

Reverse EngineeringReverse EngineeringMalware

Hassen SaidiComputer Science Laboratoryp ySRI [email protected]

Page 2: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

The Growth of a Network

Page 3: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

The Growth of a ThreatThe Growth of a ThreatMass email campaign: Love letter, MelissaMultiple vectors of infection, attacks against AV software,Combined infection vectors, dangerous payloads: Code Red, Nimda

Email virus + social engineering: Xmas ExecLarge scale pandemics: Morris wormInfected 10% of the Internet.

Self replicatingProgram: Creeper

Sophisticated engineering: ConfickerUse of Crypto.Social Networks/cell phone worms.Stuxnet,…

Page 4: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

The Threats of the Connected AgeThe Threats of the Connected Age

Malware incidents are rising dramatically:• increase of infection vectors• increase in the complexity of

botnet structures From Biology: Connected World Gives Viruses The Edge“as human activity makes the world more connected, natural selection will favor more virulent and dangerous parasites."

Page 5: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The
Page 6: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Motivation

• Malware landscape is diverse and constant evolving– Large botnets

– Diverse propagation vectors, exploits, C&C

– Capabilities – backdoor, keylogging, rootkits,

L i b b ti b b– Logic bombs, time‐bombs

– Diverse targets: desktops, mobile platforms, SCADA systems (Stuxnet)

• Malware is not about script‐kiddies anymore, it’s real business. Recent events indicate that it can be a powerful 

i b fweapon in cyber warfare. 

• Manual reverse‐engineering is close to impossible Need automated techniques to extract system logic– Need automated techniques to extract system logic, interactions and side‐effects, derive intent, and devise mitigating strategies.

Page 7: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Outline

• Review of the workflow of binary program analysis

• Review of the challenges in binary program analysis:a a ys s– Obfuscation Techniques

• Techniques for reverse engineering stripped• Techniques for reverse engineering stripped binaries:

Systematic deobfuscation– Systematic deobfuscation

• Examples of obfuscation: Conficker, Hydrac (G l tt k) St t(Google attack), Stuxnet, …

Page 8: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Capturing Malware

• Honeynets: Capture malware that scans theHoneynets: Capture malware that scans the Internet for vulnerable targets

• Mining SPAM for attachments• Mining SPAM for attachments

• Mining SPAM for malicious URLs, and i d i b d l dcapturing drive‐by downloads

• AV heuristics 

Page 9: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Malware Binary Analysis

0100101010010101010101010011010101010010101001010101010101001101010101001010100101010101010100110101010100101010010101010101010011010101

• What does the malware do• How does it do it • identify triggers• What is the purpose of the10101010011010101

101010100110101010100101010010101010101010011010101

.exe

What is the purpose of themalware

• is this an instance of a knownthreat or a new malware

h i h hTypically a strippedbinary with no debugging information

• who is the author• …

debugging information.

In the case of maliciouscode, it is often obfuscatedand packed

Challenges:• lack of automation• time-critical analysisand packed

Often has embedded suicide logic andanti-analysis logic

y• labor intensive• requires a human in the loop

Page 10: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Dynamic vs Static Malware Analysis

• Dynamic AnalysisDynamic Analysis– Techniques that profile actions of binary at runtime

– More popular• CWSandbox, TTAnalyze, multipath exploration• Only provides partial ``effects‐oriented profile’’ of malware potential

• Static AnalysisStatic Analysis– Can provide complementary insights– Potential for more comprehensive assessmentPotential for more comprehensive assessment

Page 11: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Malware Evasions and Obfuscations

• To defeat signature based detection schemesP l hi t hi t t d i i i f– Polymorphism, metamorphism: started appearing in viruses of the 90’s primarily to defeat AV tools

• To defeat Dynamic Malware Analysisy y– Anti‐debugging, anti‐tracing, anti‐memory dumping 

– VMM detection, emulator detection

• To defeat Static Malware analysis– Encryption (packing)

API and control flow obfuscations– API and control‐flow obfuscations

– Anti‐disassembly

• The main purpose of obfuscation is to slow down the p psecurity community

Page 12: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

My Personal Philosophy

• Push the limits of static analysis as much asPush the limits of static analysis as much as possible.

• Rebuild the binary in its original form prior to• Rebuild the binary in its original form prior to obfuscation.

R hi h l l d i i f h• Recover a higher level description of the malware binary that makes deriving the 

f h l i bl Ipurpose of the malware atteingnable: I want to stare at C code as opposed to staring at 

bl dassembly code

Page 13: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Malware Revere Engineering System Goals

• Desiderata for a Static Analysis Frameworky– Unpack most of contemporary malware– Handle most if not all packers

Deobfuscate API references– Deobfuscate API references– Automate identification of capabilities– Provide feedback on unpacking success– Simplify and annotate call graphs to illustrate interactions 

between key logical blocks– Enable decompilation of assemply code into a higher‐level 

language– Identify key logical blocks (crypto for instance)

Page 14: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Reverse Engineering Phases Unpacking phase: the image of a running malware sample is often considered damaged:

‐ No known OEP. Imported APIs are invoked dynamically and the original import bl b / /table is destroyed. Arbitrary section names and r/w/e permissions. 

Disassembly phase:

‐ Identification of code and data segments

‐ Relies on the unpacker to capture all code and data segments. Our unpacking approach guarantees that.

Decompilation phase:p p

‐ Reconstruction of the code segment into a C‐like higher level representation

‐ Relies on the disassembler to recognize function boundaries, targets of call sites, imports and OEP Our API resolution guarantees thatimports, and OEP. Our API resolution guarantees that.

Program understanding phase:

‐ Relies on the decompiler to produce readable C code, by recognizing the compiler calling conventions stack frames manipulation functions prologs andcompiler, calling conventions, stack frames manipulation, functions prologs and epilogs, user‐defined data structures. Our code rewrite and analysis guarantees that.

Page 15: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Phase 1: Malware Unpacking 

Unpacking

Page 16: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Example of Packed Code

Page 17: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

The Eureka Framework

• Novel unpacking technique based on coarseNovel unpacking technique based on coarse grained execution tracing

• Heuristic based and statistic based upacking• Heuristic‐based and statistic‐based upacking

• Implements several techniques to handle bf d API fobfucated API references

• Multiple metrics to evaluate unpack success

• Annotated call graphs provide bird’s eye view of system interactiony

Page 18: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

The Eureka Workflow 

Packed Binary

Dis‐assemblyIDA‐Pro

Packed .ASM

Trace Malware syscalls in 

VM

Eureka’s Unpacker

Un‐packedBinary

Dis‐assemblyIDA‐Pro

Un‐Packed .ASM

IDA ProStatistics based

Evaluator

Unpack Evaluatio

n

VM

Syscalltrace

Favorable execution 

point

Raw unpackedExecutable

•Unknown OEPHeuristic basedoffline analysis

•Unknown OEP•No debug information•Unresolved library calls•Snapshot of data segment•Unreachable code•Loss of structuresStatistics 

basedEvaluator

Page 19: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Coarse‐grained Execution Monitoring

• Generalized unpacking principleGeneralized unpacking principle– Execute binary till it has sufficiently revealed itself

Dump the process execution image for static– Dump the process execution image for static analysis

• Monitoring exection progress• Monitoring exection progress– Eureka employs a Windows driver that hooks to SSDT (System Service Dispatch Table)SSDT (System Service Dispatch Table)

– Callback invoked on each NTDLL system call

Filtering based on malware process pid– Filtering based on malware process pid

Page 20: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Heuristic‐based Unpacking

• How do you determine when to dump?o do you de e e e o du p– Heuristic #1: Dump as late as possible. NtTerminateProcess

– Heuristic #2: Dump when your program generates errors.  N R i H dENtRaiseHardError

– Heuristic #3: Dump when program forks a child process.  NtCreateProcess

• Issues– Weak adversarial model, too simple to evade…

– Doesn’t work well for package non‐malware programs

Page 21: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Statistics‐based Unpacking

• ObservationsObservations– Statistical properties of packed executable differ from unpacked exectuable

– As malware executes code‐to‐data ratio increases

• Complications– Code and data sections are interleaved in PE executables

– Data directories(import tables) look similar to data but are often found in code sectionsProperties of data sections vary with packers– Properties of data sections vary with packers

Page 22: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Statistics‐based Unpacking (2)

• Our ApproachOu pp oac– Model statistical properties of unpacked code

• Estimating unpacked code– N‐gram analysis to look for frequent instructions

– We use bi‐grams (2‐grams) because x‐86 opcodes are 1 or 2 bytesbytes

– Extract subroutine code from 9 benign executables

– FF 15 (call), FF 75 (push),  E8 _ _ _ ff (call), E8 _ _ _ 00 (call) 

Page 23: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Evaluation  (ASPack)

Page 24: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Evaluation  (MoleBox)

Page 25: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Evaluation  (Armadillo)

Page 26: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Systematic Approach to Code Deobfuscation:UnpackingU pac g

• Automatic Unpacking: involves running the malware and p g gcapturing its memory image.

• Monitoring the execution of the malware is an intrusive d i ft d t t d i ti t i d tiprocess and is often detected using anti‐tracing and anti‐

debugging techniques embedded in the malware.

• Our multi‐strategy approach consists of minimal monitoring gy pp gand capturing the process image at key events:– ExitProcess

– Byte bigram monitoring: call push instructions for instance– Byte bigram monitoring: call, push instructions for instance

– Number of seconds elapsed

– Run the malware without monitoring and suspend its execution and perform memory inspectionmemory inspection

• In practice, we always manage to get a dump (memory snapshot) of the running process: no OEP and no Import table

Page 27: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Phase 2: Disassembly

• The disassembler reads the PE data structure in order to:

1. Determine the different sections of the file and separate code from data and identifies resource information such as import tablesidentifies resource information such as import tables

– The disassembler relies on the PE data structure (could be corrupt)

– The disassembler translates into code, any referenced address from known code locationknown code location

2. Translate code segments into assembly language

– The disassembler relies on the hardware instruction set documentation

3. Interpret data according to identified types3. Interpret data according to identified types

1. A data referenced by code can be of any type: integer, string, struct, etc.

Integer:

0x0040F45C dword 40F45C dd 0E06D7363h, 1, 2 dup(0) ; DATA XREF: 408C980x0040F45C dword_40F45C    dd 0E06D7363h, 1, 2 dup(0) ; DATA XREF: 408C98

String:

0x0040F45C unk_40F45C   db  63h ; c             ; DATA XREF: sub_408C98

0x0040F45D db 73h ; s0x0040F45D                 db  73h ; s

0x0040F45E                 db  6Dh ; m

0x0040F45F                 db 0E0h ; a

Page 28: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

IDA Pro Disassembler• http://www.hex‐rays.com/idapro/

– It supports a variety of executable formats for different processors and operating systems. It also can be used as a debugger for Windows PE, Mac OS X, and Linux ELF executables.

– IDA performs a large degree of automatic code analysis to a certain t t l i f b t d ti k l dextent, leveraging cross‐references between code sections, knowledge 

of parameters of API calls, limited dataflow analysis, and recognition of standard libraries.

• Hashes of known statically linked libraries are compared to hashes ofHashes of known statically linked libraries are compared to hashes of identified subroutines in the code

– Provides scripting languages to interact with the system to improve p g g g y pthe analysis.

– Support plug‐ins: The IDA decompiler is the most impressive plug‐in.

Page 29: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

PE Execution

1. Read the Portable Executable (PE) file data structure and maps the file into memory

2 Load import modules2. Load import modules

1. Start execution at entry point

2. Runtime unpacking

3 Jump to OEP3. Jump to OEP

Page 30: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Phase 3: Fixing the Disassembled Code

• Unpacked & disassembled code does not haveUnpacked & disassembled code does not have an OEP.

• Import tables are rebuilt dynamically and• Import tables are rebuilt dynamically and there are no static references to dynamically loaded librariesloaded libraries

• Header information is not reliable

• Data is not typed

Page 31: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Parsing the PE executable format

Page 32: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Challenges in Binary Code Disassembly

• Disassembly is not an exact science: On CISC platforms withy pvariable‐width instructions, or in the presence of self‐modifying code, it is possible for a single program to have twoor more reasonable disassemblies Determining whichor more reasonable disassemblies. Determining whichinstructions would actually be encountered during a run ofthe program reduces to the proven‐unsolvable haltingproblem.

• Bad disassembly because of variable length instructions• Bad disassembly because of variable length instructions

• Jumps into middle of instructions

• No reachability analysis: Unreachable code can hide data.No reachability analysis: Unreachable code can hide data. 

Page 33: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Examples of Disassembly problems (The Storm Worm)

Data hidden as code:ArcadeWorld.exe:0042BDB0 mov     eax, 0

ArcadeWorld.exe:0042BDB5 test    eax, eax

ArcadeWorld.exe:0042BDB7 jnz     short loc_42BDD8; unreachable code

ArcadeWorld.exe:0042BDD8 loc_42BDD8: ; CODE XREF: 0042BDB7 ArcadeWorld.exe:0042BDD8 lea edx, [ebp+401C42h]ArcadeWorld.exe:0042BDDE lea eax, [ebp+40B220h]ArcadeWorld.exe:0042BDE4 push eaxArcadeWorld.exe:0042BDE5 push 8A20hArcadeWorld exe:0042BDEA push edx

0042BDD8 8D 95 42 1C 40 00 8D 85 20 B2 40 00 50 68 20 8A ìòB.@.ìà ¦@.Ph è0042BDE8 00 00 52 E8 8D 01 00 00 60 8B BD 67 C0 40 00 03 ..RFì...`ï[email protected] BD 3F C0 40 00 8D B5 BF BF 40 00 B9 74 00 00 00 +?+@.ì¦++@.¦t...0042BE08 68 00 20 00 00 57 E8 26 FF FF FF F3 A4 8D 85 AF h. ..WF& =ñìà»0042BE18 BF 40 00 8B 9D 3F C0 40 00 FF B5 43 C0 40 00 FF +@.ï¥?+@. ¦C+@.

ArcadeWorld.exe:0042BDEA push edxArcadeWorld.exe:0042BDEB call near ptr unk_42BF7DArcadeWorld.exe:0042BDF0 pushaArcadeWorld.exe:0042BDF1 mov edi, [ebp+40C067h]ArcadeWorld.exe:0042BDF7 add edi, [ebp+40C03Fh]ArcadeWorld.exe:0042BDFD lea esi, [ebp+40BFBFh]

0042BE28 B5 37 C0 40 00 6A 01 50 53 E8 9E 00 00 00 FF B5 ¦[email protected]... ¦0042BE38 6B C0 40 00 FF B5 3F C0 40 00 E8 DE FD FF FF 8B k+@. ¦[email protected]¦² ï0042BE48 85 18 B2 40 00 85 C0 74 1C FF B5 18 B2 40 00 FF à.¦@.à+t. ¦.¦@.0042BE58 B5 57 C0 40 00 FF B5 3F C0 40 00 E8 E7 10 00 00 ¦W+@. ¦[email protected] E8 3F FE FF FF 53 E8 E5 04 00 00 E8 14 00 00 00 F?¦ SFs...F....0042BE78 5C 64 6C 6C 63 61 63 68 65 5C 74 63 70 69 70 2E \dllcache\tcpip.0042BE88 73 79 73 00 E8 16 0A 00 00 E8 13 00 00 00 5C 64 sys.F....F....\d

Page 34: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

API Resolution

• User‐level malware programs require systemUser level malware programs require system calls to perform malicious actions

• Use Win32 API to access user level libraries• Use Win32 API to access user level libraries

• Obufscations impede malware analysis using IDA P Oll DbIDA Pro or OllyDbg– Packers use non‐standard linking and loading of dlldlls

– Obfuscated API resolution

Page 35: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Standard API Resolution

Imports in IAT identified by IDA by looking at Import Table

Page 36: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Handling Thunks

• Identify subroutines with a JMP instruction only– Treat any calls to these subs as an API call IsDebuggerPresent

Page 37: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Leveraging Standard API Address Loading

==================================================Function Name : ADSICloseDSObjectAddress : 0x76e30826Relative Address : 0x00020826Ordinal : 142 (0x8e)Fil d ld dllFilename : adsldpc.dllFull Path : c:\WINDOWS\system32\adsldpc.dllType : Exported Function==================================================

==================================================Function Name : ADSICloseSearchHandleFunction Name : ADSICloseSearchHandleAddress : 0x76e3050aRelative Address : 0x0002050aOrdinal : 143 (0x8f)Filename : adsldpc.dllFull Path : c:\WINDOWS\system32\adsldpc.dllType : Exported Function==================================================

==================================================Function Name : ADSICreateDSObjectAddress : 0x76e30447Relative Address : 0x00020447Ordinal : 144 (0x90)Ordinal : 144 (0x90)Filename : adsldpc.dllFull Path : c:\WINDOWS\system32\adsldpc.dllType : Exported Function==================================================

Page 38: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Using Dataflow Analysis

• Identify register based indirect callsGetEnvironmentStringW

def

use

Page 39: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Handling Dynamic Pointer Updates

• Identify register based indirect calls

dword_41e304 has no staticvalue to look up API

A def to dword_41e308 is foundLook for probable call toGetProcAddress earlier

C ll t G tP Add value to look up APICall to GetProcAddress

def

use

Page 40: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Leveraging Standard API Address Loading is not enough

==================================================Function Name : ADSICloseDSObjectAddress : 0x76e30826Relative Address : 0x00020826Ordinal : 142 (0x8e)Fil d ld dllFilename : adsldpc.dllFull Path : c:\WINDOWS\system32\adsldpc.dllType : Exported Function==================================================

==================================================Function Name : ADSICloseSearchHandle

There are many indirect ways to loadAnd call a Windows API:• access to list of loaded DLLs• access to a loaded DLL and use ofFunction Name : ADSICloseSearchHandle

Address : 0x76e3050aRelative Address : 0x0002050aOrdinal : 143 (0x8f)Filename : adsldpc.dllFull Path : c:\WINDOWS\system32\adsldpc.dllType : Exported Function

access to a loaded DLL and use ofGetModulHandle() + offset

• …

==================================================

==================================================Function Name : ADSICreateDSObjectAddress : 0x76e30447Relative Address : 0x00020447Ordinal : 144 (0x90)Ordinal : 144 (0x90)Filename : adsldpc.dllFull Path : c:\WINDOWS\system32\adsldpc.dllType : Exported Function==================================================

Page 41: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Consequence of Failure to Identify APIs

...

.text:004011A7 push offset unk_40A2DC ; arg 1

.text:004011AC xor ebx, ebx

Name of a library

.text:004011AE call dword ptr unk_40A0E4 .data:0040A0E4 00000000

.text:004011B4 mov edi, eax

.text:004011B6 cmp edi, ebx

.text:004011B8 jz short loc_401211

.text:004011BA push esi

.text:004011BB mov esi, dword ptr unk_40A0E8text:004011C1 push offset unk 40A2C4 ; arg 2

Load library call (LoadLibrary)

Name of the library function.text:004011C1 push offset unk_40A2C4 ; arg 2.text:004011C6 push edi ; arg 1.text:004011C7 call esi ; unk_40A0E8 .data:0040A0E8 00000000.text:004011C9 push offset unk_40A2AC.text:004011CE push edi.text:004011CF mov dword_433480, eax...

Name of the libraryAPI call to get the addressOf the loaded library function(GetProcAddress)

...

.text:00401132 lea eax, [ebp+var 4].text:00401132 lea eax, [ebp var_4]

.text:00401135 push eax

.text:00401136 push ebx

.text:00401137 push 0

.text:00401139 mov [ebp+var_4], esi

.text:0040113C call dword_433480

.text:00401142 test eax, eaxlibrary function call

Page 42: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Failure to Perform Control Flow Analysis

• CreateThread

.text:009A3A4C                push    eax

.text:009A3A4D                 xor     eax, eax

.text:009A3A4F                 push    eax

.text:009A3A50                 push    eax             

h ff d d

Location of the start address of a thread

.text:009A3A51                 push    offset dword_9A3939 

.text:009A3A56                 push    eax            

.text:009A3A57                 push    eax             

.text:009A3A58                 call    [ebx]

Call to CreateThread

.data:009A3939          xxxxxxx

• Starting Services

• Thread synchronization

• Critical sections

• Callback functions

Page 43: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Advanced API Resolution

• There are many ways in which a library or APIThere are many ways in which a library or API can be invoked.

• There are many ways an API call can be• There are many ways an API call can be obfuscated

B h i i i i d h• But there is one invariant associated to each API and library: its signature– i.e; number of arguments, type of arguments, and type of return value if any.

Page 44: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Advanced API Resolution: Type Inference for binary program analysisanalysis

• Use type inference as a single solution to solve three fundamental problems:– Identifying API and function calls (call and jump targets)

– Building a precise CFG

– Recovering user‐defined types for proper decompilation– Recovering user‐defined types for proper decompilation

• For Windows Executable files:– Integers: object handles, addresses, IP address, ports, etc

– Strings: file names, service names, etc

– Structures: sockaddrStructures: sockaddr

struct sockaddr_in {

short sin family;short   sin_family;

u_short sin_port;

struct  in_addr sin_addr;

char    sin_zero[8];

};

Page 45: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Type propagation and matching

• Type propagation using dataflow analysis

• Propagation of return values and arguments of functionsp g g

sub_403649 proc near

...

.text:00403649 arg_0 = dword ptr 8

There is only one API that has7 arguments such that the seventh

d thi d d fi t b.text:00403649 arg_4 = dword ptr 0Ch

.text:00403649 arg_8 = dword ptr 10h

.text:00403649 arg_C = dword ptr 14h

...

text:00403668 xor ebx ebx

and third and first one can be pointers and all others are not.

.text:00403668 xor ebx, ebx

...

.text:00403704 mov esi, ds:dword_40A06C

.text:0040370A push ebx ; arg 7 type(f,7) = type (ebx)

.text:0040370B mov edi, 80h

t t 00403710 h di 6 t (f 6) t ( di)

HANDLE WINAPI CreateFile(

__in LPCTSTR lpFileName,

i DWORD d D i dA.text:00403710 push edi ; arg 6 type(f,6) = type (edi)

.text:00403711 push 4 ; arg 5 type(f,5) = union(int,char)

.text:00403713 push ebx ; arg 4 type(f,4) = type (ebx)

.text:00403714 push 2 ; arg 3 type(f,3) = union(int,char)

.text:00403716 push 2 ; arg 2 type(f,2) = union(int,char)

__in DWORD dwDesiredAccess,

__in DWORD dwShareMode,

__in LPSECURITY_ATTRIBUTES lpSecurityAttributes,

__in DWORD dwCreationDisposition,

in DWORD dwFlagsAndAttributes,.text:00403718 push [ebp+arg_4] ; arg 1 type(f,1) = type([ebp+arg_4])

.text:0040371B mov [ebp+var_18], ebx

.text:0040371E call esi ; dword_40A06C type(ret(f)) = type(eax)

...

__in DWORD dwFlagsAndAttributes,

__in HANDLE hTemplateFile

);

Page 46: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Advantages of type Inference Analysis

• Programmers data structures and types are going to be based on known data structures and types provided by the libraries

• Identifying API calls and type information help capture better the semantics of the program executionp g

• Not restricted to Windows but require knowledge of the libraries and their documentation

• Can deal with some of the widely used obfuscation• Can deal with some of the widely used obfuscation techniques– Import table obfuscation

– Code rewrite: code rewrite preserves the types!

Page 47: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Phase 3: Rebuilding the unpacked executable

• From a damaged dumped image of a running malware to a PE g p g gexecutable:– Knowing all APIs allows us to identify the OEP.

Semantic approach: ExitProcess CreateMutex GetCommandLine– Semantic approach: ExitProcess, CreateMutex, GetCommandLine, GetModulaHandle �, etc are close to OEP. There are about 20 APIs that are often called at the beginning of the execution of the code.

Structural approach: find sources of call graphs in the binary– Structural approach: find sources of call graphs in the binary

– Rebuilding in import table with all references to identified APIs

• The disassembly of the reconstructed PE is often of better quality than the disassembly of the dumped process image– The new PE code bypasses the unpacking routine embedded in the 

packed codep

– The new PE contains the original code

Page 48: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The
Page 49: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The
Page 50: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Phase 4: Decompilation

• Identifies local variables

• Identifies arguments registers stack or any• Identifies arguments: registers, stack, or any combination

• Identifies global variables

• Identify calling conventions

• Identifies common idioms and compiler features

• Eliminates the use of registers as intermediate variablesvariables

• Identifies control structures

Page 51: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Decompilation Depends on Previous Analysis Phases

Page 52: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Malware Obfuscation Effect on Decompilation

• While packing is the most used obfuscation technique it is often combined with other advancedtechnique, it is often combined with other advanced forms of obfuscation that make decompilation often impossible:p

• Call obfuscation in general and API obfuscation in particular

• Binary Rewrite to create semantically equivalent code with vastly different structure

• Chuncking or “code spaghettisation”• …

Page 53: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Analysis Phases

Source CodeIdeally

MalwareReality

Compiler

Executable code

Unpacking

Non-executableExecutable code

Disassembly &Analysis

Non executablecode

Disassembly &Analysis

Undo Obfuscation

Assembly code

Decompilation

Obfuscated assembly

y

D il ti

Assembly code

D il ti

Legitimate C/C++that a compiler

Decompilation

A mess

Decompilation

Legitimate C/C++

Decompilation

would generate

Page 54: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Example of Binary Rewrite.text:009B327C OBFUSCATED_VERSION_OF_is_private_subnet proc near.text:009B327C mov ecx, eax.text:009B327E and ecx, 0FFFFh.text:009B3284 cmp ecx, 0A8C0h.text:009B328A jz loc_9B4264.text:009B3290 jmp off 9BAAA5

.text:009A1311 is_private_subnet proc near

.text:009A1311 mov ecx, eax

.text:009A1313 and ecx, 0FFFFh

.text:009A1319 cmp ecx, 0A8C0h

.text:009A131F jz loc_9A1340j p _/* .text:009BAAA5 off_9BAAA5 dd offset loc_9AB10C */

.text:009B3290 OBFUSCATED_VERSION_OF_is_private_subnet endp

.

.text:009B4264 loc_9B4264:

.text:009B4264t t 009B4264 1

.text:009A1325 cmp al, 0Ah

.text:009A1327 jz loc_9A1340

.text:009A132D and eax, 0F0FFh

.text:009A1332 cmp eax, 10ACh

.text:009A1337 jz loc_9A1340

.text:009A133D xor eax, eaxtext:009A133F retn.text:009B4264 mov eax, 1

.text:009B4269 retn

.

.text:009AB10C cmp al, 0Ah

.text:009AB10E jz loc_9B4264text:009AB114 jmp off 9BA137

.text:009A133F retn

.text:009A1340

.text:009A1340 loc_9A1340:

.text:009A1340 mov eax, 1

.text:009A1345 retn

.text:009A1345 is_private_subnet endp.text:009AB114 jmp off_9BA137

/* .text:009BA137 off_9BA137 dd offset loc_9B2CE4 */..text:009B2CE4 and eax, 0F0FFh.text:009B2CE9 cmp eax, 10ACh.text:009B2CEE jz loc_9B4264.text:009B2CF4 jmp off_9BA3CC

/* .text:009BA3CC off_9BA3CC dd offset loc_9ABA24 */..text:009ABA24 xor eax, eax.text:009ABA26 retn

bool __usercall is_private_subnet (unsigned __int16 a1) {return a1 == 43200 || a1 == 10 || (a1 & 0xF0FF) == 4268;

}��int __usercall OBFUSCATED_VERSION_OF_is_private_subnet<eax>(unsigned __int16 a1 <ax>){{int result; // eax@2

if ( a1 == 43200 ) result = 1;else result = off_9BAAA5();return result;

}

Page 55: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Systematic Approach to Code Deobfuscation:Binary Rewritinga y e t g

• Dechunking: The control flow of Conficker's P2P module has been significantlyDechunking: The control flow of Conficker s P2P module has been significantly obfuscated to hinder its disassembly and decompilation.  Specifically, the contents of code blocks from each subroutine have been extracted and relocated throughout different portions of the executable.  These different blocks (or chunks) are then referenced through unconditional and conditional jump instructions In effect the logical control flow of the P2Punconditional and conditional jump instructions.  In effect, the logical control flow of the P2P module has been obscured (spaghetti‐code) to a degree that the module cannot be decompiled into coherent C‐like code, which typically drives more in‐depth and accurate code interpretation. Move all blocks to a contiguous memory block.

• Normalize x86 instructions: push followed by a pop is a mova mov

• Normalize calling convention: cdecl fastcall stdcallNormalize calling convention: cdecl, fastcall, stdcall, instead of user‐defined.

Page 56: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Conficker and Hydrac Dechunking

• Identify all chunks in a function and rewrite the functiony

• Applied to all Conficker C P2P Protocol subroutines

• Unlike the Conficker P2P logic, Hydraq did not exhibit the same level of obfuscation. It did, however, share some obfuscation features with Conficker. The functions of the Hydraq binary have been subjected to chunking, which y q y j g,renders decompilation difficult. We applied our transformations to automatically generate the C‐like code for each subroutine and build a complete CFG of the binary Theeach subroutine and build a complete CFG of the binary. The IDA disassembler identified 185 subroutines in the binary prior to our analysis. After running the dechunking transformation, only 141 subroutine remained and were decompiled. 

Page 57: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Purpose of code obfuscation

• While packing is often used to reduce the size of binaries and to create polymorphic malware samples, the more advanced obfuscation techniques are designed to slow down reverse engineering efforts and to prevent:

– the identification of API calls: identify the basic building blocks of the malware

– the control‐flow reconstruction of the malware: follow and reconstruct the logic flow

t ti l i d t i th f ll f ti lit– static analysis: determine the full functionality, triggers, hidden logic, time bombs, etc.

timely reverse engineering and mitigation of the– timely reverse engineering and mitigation of the threat

Page 58: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Why Code Obfuscation is not Easy

• Malware authors can design binary code that is extremely difficult to analyze. Using advanced programming languagesdifficult to analyze. Using advanced programming languages knowledge, it is possible to create such code.

• Malware authors do not feel the need to always obfuscated h d l d f b d dtheir code. Can easily defeat signature‐based detection. Overwhelm analysts and tools with large numbers of samples.

• Malware code should be able to run in a reliable manner.Malware code should be able to run in a reliable manner. Obfuscation should not compromise this important requirement and should maintain the reliability of the initial d Thi i f t f tcode. This requires a proof or guarantee of some sort.

• Malware deobfuscation is therefore a more attainable than you might think. Systematic obfuscation informs systematic y g y ydeobfuscation. 

Page 59: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Our Approach

• Because obfuscation is introduced in a ratherBecause obfuscation is introduced in a rather systematic way, there is a hope that it can be dealt with in an automated waydealt with in an automated way.

S i ll id if i bf i• Systematically identifying an obfuscation step and undoing its effect.

• Focus on generic approaches as opposed to g pp pppacker/obfuscator specifics

Page 60: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Example: Static Analysis of Conficker• Conficker appeared on November 20th, 2008• Infected millions of machines worldwide• Millions of machines still infected despite an extensive news coverage 

b h habout the threat• Four versions have been released: A, B, B++, and C• It is a sophisticated piece of malicious code created by professionals who 

have extensive knowledge about networking cryptography system andhave extensive knowledge about networking, cryptography, system and network programming, and security

• Managed to defeat the security community in stopping its progression by using strong crypto, code obfuscation, aggressive propagation strategies, and constantly monitoring the security community actionsand constantly monitoring the security community actions

• Dynamic analysis provided a limited understanding of the threat:• Identification of what appears to be a P2P protocol• Identification of ports opened by the malwareIdentification of ports opened by the malware

• Deobfuscation and static analysis were the only techniques that were able to uncover the full capability of the malware.

SRI Technical Report (http://mtc.sri.com/Conficker) 

Page 61: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Example: Static Analysis of Conficker

Static Analysis of Conficker Code:– Domain generation algorithm: provided a list of daily domains to be blockedg g p f y– Quarter of the Internet scanned: Understand what part of the Internet was 

targeted for scanning and what infections were due to USB ports and mobile devices

– List of disabled security products: detection– Ukrainian keyboard avoidance: Geo‐location database poisoning– Use of MD6 and related crypto algorithms: Attribution– DNS APIs patching to disable list of websites (including SRI!): detection– Distribute a number of modified versions of the binary– TCP and UDP ports based on the IP address of the infected machine: 

detection

Page 62: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Deobfuscation of the Conficker C P2P protocol

• Heavily obfuscated protocol code

• 88 APIs obfuscated88 APIs obfuscated

• Use of chunking lead to poor decompilation

• Benefits of the deobfuscation– P2P Protocol description: protocol understanding and P2P structure– Peer selection algorithm: proved the peer poisoning approach useless– Possibility to hot patch code without DGA updates: proved C&C domains y p p p

obsolete

The P2P protocol was not just a mechanism for distributing PE executableThe P2P protocol was not just a mechanism for distributing PE executablefiles but also digitally signed sets of x86 instructions that are executed in aseparate thread and take as argument the IP address of the sender. Thiswould provide a hot patch mechanism for all data manipulated by Conficker:list of peers, encryption/decryption keys, the Conficker code it self, etc.

Page 63: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Stuxnet: Keeping it “relatively” simple

• Stuxnet does not use advanced binary obfuscation techniques.

• The analysis of the code is challenging nevertheless• The analysis of the code is challenging nevertheless

• Stuxnet Code Characteristics:

– Use of C++

– Use of C++ exception handling

– Use of C++ classes

– ��������Use of simple data encoding (encryption)

– Use of C structures for all data passed to the main subroutines:

• Over 40 user‐defined structures

• Not recognized by disassemblers and decompilersNot recognized by disassemblers and decompilers

Page 64: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Phase 5: Program Understanding

• Need to identify higher‐level concepts fromNeed to identify higher level concepts from the deobfuscated code

• Need to interpret the code into a higher level• Need to interpret the code into a higher‐level malware objective

N d i d if i l f• Need to indentify particular features: crypto:– Functions that use crypto‐related opcode, loops, etc

– Known constants in crypto algorithms

Page 65: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Finding Known and Unknown Crypto

Page 66: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Conclusions

• It is always desirable to recover from the malware a description that is as close as possible to the original d d d b h hcode produced by the authors.

• It is often possible to do that in practice

• It is often the only way to really determine the full capability of the malware

h b fi i h i hi h• The benefits are important when it comes to high‐profile targets

E il i t t d i l i t l• Easily integrated in common analysis tools: disassembler (IDA), Decompilers.

Page 67: Reverse Engineering Malware · 2013. 1. 10. · SRI International Reverse Engineering Malware Hassen Saidi Computer Science Laboratory SRI International Hassen.Saidi@sri.com. The

Daily Malware Capture and Analysis

• http://mtc sri com/http://mtc.sri.com/