
Malgram Malware Analysis: Malware Unpacking, Static Analysis, Code Deobfuscation, Decompilation

Phillip Porras and Hassen Saidi
Computer Science Lab
SRI International


Objectives

• Now that we have various ways of knowing what the malware does when running on an infected system, we aim to answer two fundamental questions:

1. How does it do it?

2. What is the full capability of the malware, covering both observed behavior and behavior yet to be triggered?


Dynamic vs Static Malware Analysis

• Dynamic Analysis
  – Techniques that profile the actions of a binary at runtime
  – Only provides a partial "effects-oriented profile" of malware potential

• Static Analysis
  – Techniques that apply program analysis to the binary code
  – Can provide complementary insights
  – Potential for more comprehensive assessment


Malgram Report

• …go interactive


From Binary To Semantically Rich C Code

[Figure: Raw Binary → Disassembly → Complete Disassembly → Decompiled C code]


Challenges in Static Analysis

[Figure: Raw Binary → Disassembly → Complete Disassembly → Decompiled C code]


Malware Obfuscation

• Most malware is obfuscated
• Packing is the most widely used obfuscation technique
• Packing is often combined with other advanced forms of obfuscation:
  – Binary rewriting to create semantically equivalent code with vastly different structure
  – Call obfuscation in general and API obfuscation in particular
  – Chunking or "code spaghettisation"
  – Dead code (or functionally irrelevant code)


Challenges in Static Analysis

[Figure: Raw Binary → Disassembly]

Challenge: Does the binary represent the full malware logic?


Unpacking Result


Unpacking


Packed vs Unpacked

• go interactive…


Coarse-grained Execution Monitoring

• Generalized unpacking principle
  – Execute the binary until it has sufficiently revealed itself
  – Dump the process execution image for static analysis
• Monitoring execution progress
  – Eureka employs a Windows driver that hooks the SSDT (System Service Dispatch Table)
  – Callback invoked on each NTDLL system call
  – Filtering based on malware process pid
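The monitoring loop described on this slide can be pictured with a small simulation. This is only a sketch under stated assumptions: the event stream, the `monitor` function, the `dump` callback, and the choice of `NtTerminateProcess` and `NtCreateProcess` as dump triggers are all hypothetical stand-ins, not the Eureka driver itself.

```python
# Hypothetical trigger set: system calls after which we assume the
# binary has "sufficiently revealed itself". This is an assumption.
DUMP_TRIGGERS = {"NtTerminateProcess", "NtCreateProcess"}


def monitor(events, target_pid, dump):
    """Simulate a per-syscall callback with pid filtering.

    events: iterable of (pid, syscall_name) pairs (stand-in for SSDT hooks)
    dump:   callable invoked as dump(pid, calls_seen) when a trigger fires
    Returns the number of calls seen for the target before dumping,
    or None if no trigger was observed.
    """
    seen = 0
    for pid, syscall in events:
        if pid != target_pid:        # filter on the malware process pid
            continue
        seen += 1
        if syscall in DUMP_TRIGGERS:
            dump(target_pid, seen)   # hand the image to static analysis
            return seen
    return None


if __name__ == "__main__":
    trace = [(4, "NtOpenFile"), (7, "NtWriteFile"), (7, "NtCreateProcess")]
    monitor(trace, 7, lambda pid, n: print("dump pid", pid, "after", n, "calls"))
```

A real driver would receive these events from the kernel rather than from a list; the list only makes the filtering and trigger logic visible.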


Statistics-based Unpacking

• Observations
  – Statistical properties of a packed executable differ from those of an unpacked executable
  – As malware executes, the code-to-data ratio increases
• Complications
  – Code and data sections are interleaved in PE executables
  – Data directories (import tables) look similar to data but are often found in code sections
  – Properties of data sections vary with packers


Statistics-based Unpacking (3)

Bigram                 Calc     Explorer  Ipconfig  lpr    Mshearts  Notepad  Ping   Shutdown  Taskman
(file size)            117 KB   1010 KB   59 KB     11 KB  131 KB    72 KB    21 KB  23 KB     19 KB
FF 15 (call)           246      3045      184       24     192       415      58     132       126
FF 75 (push)           235      2494      272       33     274       254      41     63        85
E8 _ _ _ 0xff (call)   1583     2201      181       19     369       180      87     49        41
E8 _ _ _ 0x00 (call)   746      1091      152       62     641       108      57     66        50
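Counts like the ones in this table can be gathered with a plain byte scan. The following is a minimal sketch, assuming the four patterns above are matched at every byte offset without disassembly; `count_bigrams` is a hypothetical helper name and this is not Eureka's implementation. On x86, `FF 15` encodes an indirect call through a 32-bit address, `FF 75` a push of a stack-frame slot, and `E8` a near call whose fourth displacement byte is `0xFF` for short backward targets and `0x00` for short forward targets.

```python
def count_bigrams(image: bytes) -> dict:
    """Count the opcode patterns from the table in a raw byte buffer."""
    counts = {"FF 15": 0, "FF 75": 0, "E8 _ _ _ FF": 0, "E8 _ _ _ 00": 0}
    for i in range(len(image) - 1):
        if image[i] == 0xFF:
            if image[i + 1] == 0x15:      # call dword ptr [addr]
                counts["FF 15"] += 1
            elif image[i + 1] == 0x75:    # push dword ptr [ebp+disp8]
                counts["FF 75"] += 1
        elif image[i] == 0xE8 and i + 4 < len(image):
            high = image[i + 4]           # top byte of the rel32 displacement
            if high == 0xFF:
                counts["E8 _ _ _ FF"] += 1
            elif high == 0x00:
                counts["E8 _ _ _ 00"] += 1
    return counts


if __name__ == "__main__":
    sample = bytes.fromhex("ff1500104000" "e810000000" "ff7508")
    print(count_bigrams(sample))
```

Because the scan ignores instruction boundaries, it also counts accidental matches inside data, which is one reason the slides treat these numbers as statistical evidence rather than exact instruction counts.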


Evaluation (ASPack)


Evaluation (MoleBox)


API Resolution

• User-level malware programs require system calls to perform malicious actions
• They use the Win32 API to access user-level libraries
• Obfuscations impede malware analysis using disassemblers and decompilers
  – Packers use non-standard linking and loading of DLLs
  – Obfuscated API resolution


Standard API Resolution

Imports in the IAT are identified by IDA by looking at the Import Table


Resolving API Calls Using Dataflow Analysis

• Identify register-based indirect calls

[Figure: def-use chain leading to an indirect call resolved as GetEnvironmentStringW]
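The def-use idea on this slide can be illustrated on a toy instruction list. This is a sketch under assumptions, not the analysis used by the authors: the tuple-based instruction format, the `WRITERS` set, and `resolve_indirect_calls` are hypothetical, and the walk only covers a single basic block.

```python
# Mnemonics that we assume overwrite their first operand.
WRITERS = {"mov", "lea", "pop", "add", "sub", "xor"}


def resolve_indirect_calls(instructions, iat):
    """Resolve `call reg` sites by walking back to the reaching definition.

    instructions: list of (mnemonic, dst, src) tuples; a memory load is
                  written as src == ("mem", address), a register as a str
    iat:          dict mapping an IAT slot address to an API name
    Returns {call_index: api_name} for the calls that could be resolved.
    """
    resolved = {}
    for idx, (mnem, target, _) in enumerate(instructions):
        if mnem != "call" or not isinstance(target, str):
            continue
        reg = target                              # the "use"
        for j in range(idx - 1, -1, -1):
            m, dst, src = instructions[j]
            if m not in WRITERS or dst != reg:
                continue                          # does not define reg
            if m == "mov" and isinstance(src, tuple) and src[0] == "mem":
                if src[1] in iat:                 # the "def" loads an IAT slot
                    resolved[idx] = iat[src[1]]
                break
            if m == "mov" and isinstance(src, str):
                reg = src                         # follow a register copy
                continue
            break                                 # definition we cannot follow
    return resolved


if __name__ == "__main__":
    iat = {0x40A010: "GetEnvironmentStringW"}
    block = [("mov", "esi", ("mem", 0x40A010)), ("call", "esi", None)]
    print(resolve_indirect_calls(block, iat))
```

A production analysis would work over the control-flow graph and handle several reaching definitions per use; the single backward walk here only shows the core step.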


Evaluation Metrics

• Measuring analyzability
  – Code-to-data ratio
    • Use a disassembler to separate code and data.
    • Most successfully unpacked malware has a code-to-data ratio over 50%.
  – API resolution success
    • Percentage of API calls that have been resolved, out of the set of all call sites.
    • A higher percentage implies the malware is more amenable to static analysis.
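Both metrics reduce to simple fractions. The sketch below assumes that "code-to-data ratio" means code bytes as a share of code plus data bytes, which the slides do not spell out; the function names and the 0.5 default threshold (taken from the 50% figure above) are illustrative.

```python
def code_to_data_ratio(code_bytes: int, data_bytes: int) -> float:
    """Share of the image that the disassembler classified as code."""
    total = code_bytes + data_bytes
    return code_bytes / total if total else 0.0


def api_resolution_success(resolved_calls: int, total_call_sites: int) -> float:
    """Fraction of call sites whose API target was resolved."""
    return resolved_calls / total_call_sites if total_call_sites else 0.0


def looks_unpacked(code_bytes: int, data_bytes: int, threshold: float = 0.5) -> bool:
    """Apply the 'over 50%' rule of thumb from the slide."""
    return code_to_data_ratio(code_bytes, data_bytes) > threshold


if __name__ == "__main__":
    print(code_to_data_ratio(600, 400), api_resolution_success(45, 60))
```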


Challenges in Static Analysis

[Figure: Disassembly → Complete Disassembly]

Challenge: Can we isolate subroutines?


Binary Rewrites

• go interactive …


From Raw Binary To Decompiled C Code

[Figure: Raw Binary → Disassembly → Complete Disassembly → Decompiled C code]


Renaissance: Improving C Code Readability

Hex Rays + Renaissance:

  void *sub_9AB966(unsigned int *destination1, unsigned int *source, size_t num1)
  {
    unsigned int *destination2;
    size_t num3, num2, num4, num5;

    destination2 = destination1;
    num3 = destination1[20] + 8 * num1;
    num2 = (destination1[20] >> 3) & 0x3F;
    destination1[20] = num3;
    if ( num3 < 8 * num1 )
      ++destination1[24];
    destination1[24] += num1 >> 29;
    if ( num2 + num1 <= 0x3F ) {
      num4 = 0;
    } else {
      num4 = 64 - num2;
      memcpy( &destination1[num2 + 28], source, 64 - num2);
      sub_9A9F13( destination1, &destination1[28] );
      if ( num4 + 63 < num1 ) {
        num5 = num4 + 63;
        do {
          sub_9A9F13( destination2, &source[num5 - 63] );
          num5 += 64;
          num4 += 64;
        } while ( num5 < num1 );
      }
      num2 = 0;
    }
    return memcpy( &destination2[num2 + 28], &source[num4], num1 - num4 );
  }

Hex Rays:

  void *sub_9AB966(int a1, void *source, unsigned int a3)
  {
    int v3, v4, v5, v6, v8;

    v3 = a1;
    v4 = *(_DWORD *)(a1 + 20) + 8 * a3;
    v5 = (*(_DWORD *)(a1 + 20) >> 3) & 0x3F;
    *(_DWORD *)(a1 + 20) = v4;
    if ( v4 < 8 * a3 )
      ++*(_DWORD *)(a1 + 24);
    *(_DWORD *)(a1 + 24) += a3 >> 29;
    if ( v5 + a3 <= 0x3F ) {
      v6 = 0;
    } else {
      v6 = 64 - v5;
      memcpy((void *)(v5 + a1 + 28), source, 64 - v5);
      sub_9A9F13(a1, (void *)(a1 + 28));
      if ( v6 + 63 < a3 ) {
        v8 = v6 + 63;
        do {
          sub_9A9F13(v3, (char *)source + v8 - 63);
          v8 += 64;
          v6 += 64;
        } while ( v8 < a3 );
      }
      v5 = 0;
    }
    return memcpy((void *)(v5 + v3 + 28), (char *)source + v6, a3 - v6);
  }


1. Typing and naming variables


2. Highlighting important variables


3. Improvements to decompilation


4. Caller → Callee type info


Evaluation

Sample               IDA Pro             Renaissance
Adialer              153/606 (25%)       276/606 (46%)
Adpclient            12/300 (4%)         93/300 (31%)
Adultbrowser         296/762 (39%)       339/762 (44%)
Agent.DZ (packed)    1/61 (2%)           14/61 (23%)
Browsermodifier      161/469 (34%)       252/469 (54%)
Casino_c12a          794/7207 (11%)      2614/7207 (36%)
Conficker-A          243/781 (31%)       318/781 (41%)
Conficker-B          296/1516 (20%)      735/1516 (49%)
Cycbot               267/2842 (9%)       881/2842 (31%)
Duqu                 66/300 (22%)        117/300 (37%)
Lexotan32-A          2/40 (5%)           9/40 (23%)
Lexotan32-B          2/50 (4%)           8/50 (16%)
Lolyda.AA            10/134 (7%)         40/134 (30%)
Magiccasino          42/1064 (4%)        351/1064 (33%)
Mydoom_aa32          153/543 (28%)       189/543 (35%)
Podnhua_f0a6         99/372 (27%)        139/372 (37%)
Qakbot-A             179 (29%)           183 (30%)
Stuxnet              64/320 (20%)        134/320 (42%)
Torpig               629 (44%)           717 (51%)
Notepad              145/273 (53%)       108/273 (40%)
Quake                260/4054 (6%)       1271/4054 (31%)
Total                3697/23721 (16%)    9503/23721 (40%)


Challenges in Static Analysis

[Figure: Raw Binary → Disassembly → Complete Disassembly → Decompiled C code]


The Need for Rapid Crypto-Algorithm Isolation

Algorithm                   Found in
AES                         Truecrypt, Waledac
SSL                         Agobot (IRC over SSL)
Serpent                     Truecrypt
Twofish                     Truecrypt
Cascades                    Truecrypt
HASH Whirlpool              Truecrypt
HASH MD6                    Conficker B/C
HASH SHA1                   Conficker A, Truecrypt
RC4                         Rustock, Zeus, Conficker
Custom Crypto / Encoding    Pushdo, Kraken, Mebroot, Mega-D
XOR-Custom                  Lethic, Virut, Hydraq, Torpig
RSA variants                Nugache, Conficker, Waledac
Blowfish - 448 bit          Clampi


Intra-module Analyzer

isCrypto Score = isConst + isPadded + Crypt API fn (LargeVar, Loop Detection, Opcodes, BigMath)

cryptoFnDetection (): at least 2 matches

[Figure: IntraModule isCrypto() and cryptoFnDetection () applied to an Unknown Computation, fed by the following detectors]
• Constant Detector
• Padding Analysis
• Large Local Variables
• Loop Detection
• Big Number Math
• Opcode Analysis
• Microsoft CryptoAPI / CAPICOM
• Constant Data Loading
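The "at least 2 matches" rule can be read as a vote over independent detectors. The sketch below assumes every detector contributes one equally weighted vote, which the slide does not state; the detector keys and `is_crypto` are hypothetical names.

```python
# One key per detector shown in the diagram (names are illustrative).
DETECTORS = (
    "constant", "padding", "crypto_api", "large_locals",
    "loops", "opcodes", "big_math", "constant_data_loading",
)


def crypto_score(features: dict) -> int:
    """Number of detectors that fired for one subroutine."""
    return sum(1 for name in DETECTORS if features.get(name))


def is_crypto(features: dict, min_matches: int = 2) -> bool:
    """Flag a subroutine when at least `min_matches` detectors agree."""
    return crypto_score(features) >= min_matches


if __name__ == "__main__":
    print(is_crypto({"constant": True, "loops": True}))
```

Requiring two independent signals is a common way to keep a single noisy heuristic, such as loop detection alone, from flagging ordinary code.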


Constant detection

Blowfish, Camellia, CAST, CAST256, CRC32, DES, GOST, HAVAL, MARS, MD2, PKCS_MD2, PKCS_MD5, PKCS_RIPEMD160, PKCS_SHA256, PKCS_SHA384, PKCS_SHA512, PKCS_Tiger, RawDES, RC2, Rijndael, SAFER, SHA1, SHA256, SHA512, SHARK, SKIPJACK, Square, Tiger, Twofish, WAKE, Whirlpool, zlib, AES, MD6

Direct Reference
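Constant detection of this kind amounts to searching the binary for byte patterns that are characteristic of an algorithm. The following is a minimal sketch with only three well-known signatures (the start of the AES S-box, the start of the standard CRC32 table, and the five SHA-1 initial hash words); the `find_crypto_constants` helper and the little-endian assumption are illustrative and not the SRI tool.

```python
import struct

# Contiguous tables: match the first bytes of the array.
ARRAY_SIGNATURES = {
    "AES (S-box)": bytes.fromhex("637c777bf26b6fc53001672bfed7ab76"),
    "CRC32 (table)": struct.pack("<4I", 0x00000000, 0x77073096,
                                 0xEE0E612C, 0x990951BA),
}

# Sparse constants: every word must appear somewhere, typically as
# immediates scattered through the code rather than as one array.
SPARSE_SIGNATURES = {
    "SHA-1": [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0],
}


def find_crypto_constants(image: bytes):
    """Return a sorted list of (offset, algorithm) hits in a byte buffer."""
    hits = []
    for name, signature in ARRAY_SIGNATURES.items():
        pos = image.find(signature)
        if pos != -1:
            hits.append((pos, name))
    for name, words in SPARSE_SIGNATURES.items():
        offsets = [image.find(struct.pack("<I", w)) for w in words]
        if all(o != -1 for o in offsets):
            hits.append((min(offsets), name))   # report the first word seen
    return sorted(hits)


if __name__ == "__main__":
    blob = b"\x90" * 8 + ARRAY_SIGNATURES["AES (S-box)"]
    print(find_crypto_constants(blob))
```

Note that four of the five SHA-1 words are shared with MD5, so a fuller scanner would need the fifth word, or further context, to tell the two apart.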


Indirect Load

Data array contains known crypto content (same algorithm list as for constant detection)

[Figure labels: Load Array; Unknown Computation]

This could be encryption or decryption


Inter-module Analyzer

func ColorNode (subgraph) {
  if (exists uncolored subgraph) ColorNode (subgraph)
  foreach leaf in subgraph { isCrypto (leaf) }
  if (exists green leaf) then color root green
  if (exists orange leaf) then color root orange
  if (more than 2 red leaves exist) then color root red
}

func cryptoString (per subroutine)
  if node contains a known crypto implementation substring, label node with the corresponding crypto library.

[Figure: call graph with nodes labeled AES, Vowpal wabbit, MD6]
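The ColorNode pseudocode can be turned into a small recursive routine over a call graph. This is one possible reading, offered as a sketch: it treats the direct callees of a node as its "leaves", applies the three rules in the order written so that later rules override earlier ones, and uses a hypothetical `leaf_color` callback in place of isCrypto. The meaning of the colors is not given in the slides and is left open here.

```python
def color_node(graph, node, leaf_color, colors=None):
    """Color `node` and everything reachable from it.

    graph:      dict mapping a subroutine to the list of its callees
    leaf_color: callable returning 'green', 'orange', 'red' or None
                for a leaf subroutine (stand-in for isCrypto)
    Returns a dict mapping each visited subroutine to its color.
    """
    if colors is None:
        colors = {}
    colors[node] = None                  # mark as visited; guards against cycles
    callees = graph.get(node, [])
    if not callees:
        colors[node] = leaf_color(node)
        return colors
    for callee in callees:
        if callee not in colors:         # recurse into uncolored subgraphs
            color_node(graph, callee, leaf_color, colors)
    below = [colors[c] for c in callees]
    if "green" in below:
        colors[node] = "green"
    if "orange" in below:
        colors[node] = "orange"
    if below.count("red") > 2:
        colors[node] = "red"
    return colors


if __name__ == "__main__":
    calls = {"main": ["a", "b"], "a": ["x", "y", "z"]}
    verdicts = {"x": "red", "y": "red", "z": "red", "b": "green"}
    print(color_node(calls, "main", verdicts.get))
```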


IDA Pro Call Graph w/ Crypto-routine detection


Running SRI Crypt Finder

Example (c) SRI International
Finding crypto constants and subroutines in binary files
Automatic discovery of crypto functions as unknown computations

4BABF1: found sparse constants for SHA-1
50C254: found const array sbox_AES (used in AES)
50E354: found const array rsbox_AES (used in AES)
50F574: found const array Twofish_q (used in Twofish)
50F7A4: found const array MARS_Sbox (used in MARS)
510EA4: found const array zinflate_lengthExtraBits (used in zlib)
510F18: found const array zinflate_distanceExtraBits (used in zlib)
511918: found const array CRC32_m_tab (used in CRC32)
514F98: found const array CRC32_m_tab (used in CRC32)
Found 9 known constant arrays in total.

Scanning code for crypto subroutines
found crypto in Function @ 407334
found crypto in Function @ 40E5B4
found crypto in Function @ 47D954
found crypto in Function @ 47ED34
found crypto in Function @ 4816F4
found crypto in Function @ 4B6624
found crypto in Function @ 4B9980
found crypto in Function @ 4CCBD4
found crypto in Function @ 4CCD4C
found crypto in Function @ 4CE208
found crypto in Function @ 4CE7CC
found crypto in Function @ 4CEBE8
found crypto in Function @ 4D9B00
found crypto in Function @ 4D9EE4
Done labelling crypto subroutines
Found 14 subroutine(s) with possible crypto


Running SRI Crypt Finder


Report Generation

• go interactive
