Post on 12-Jan-2016

Malgram Malware Analysis: Malware Unpacking, Static Analysis, Code Deobfuscation, Decompilation

Phillip Porras and Hassen Saidi, Computer Science Lab, SRI International

Objectives

• Now that we have various ways of knowing what the malware does when running on an infected system, we aim to answer two fundamental questions:

1. How does it do it?

2. What is the full capability of the malware, including both observed behavior and behavior yet to be triggered?

Dynamic vs Static Malware Analysis

• Dynamic Analysis
– Techniques that profile the actions of a binary at runtime
– Only provides a partial "effects-oriented profile" of malware potential

• Static Analysis
– Techniques that apply program analysis to the binary code
– Can provide complementary insights
– Potential for more comprehensive assessment

Malgram Report

• …go interactive

From Binary To Semantically Rich C Code

Raw Binary

Disassembly

Complete Disassembly

Decompiled C code

Challenges in Static Analysis

Raw Binary

Disassembly

Complete Disassembly

Decompiled C code

Malware Obfuscation

• Most malware is obfuscated
• Packing is the most used obfuscation technique
• Packing is often combined with other advanced forms of obfuscation:
– Binary rewriting to create semantically equivalent code with vastly different structure
– Call obfuscation in general and API obfuscation in particular
– Chunking or "code spaghettisation"
– Dead code (or functionally irrelevant code)

Challenges in Static Analysis

Raw Binary

Disassembly

Challenge: Does the binary represent the full malware logic?

Unpacking Result

Unpacking

Packed vs Unpacked

• go interactive…

Coarse-grained Execution Monitoring

• Generalized unpacking principle
– Execute the binary until it has sufficiently revealed itself
– Dump the process execution image for static analysis

• Monitoring execution progress
– Eureka employs a Windows driver that hooks the SSDT (System Service Dispatch Table)
– Callback invoked on each NTDLL system call
– Filtering based on the malware process pid
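The monitoring loop above can be modeled in user space as a filter over a stream of system-call events. This is only an illustrative sketch, not Eureka's driver: the event tuples, the function name `find_dump_point`, and the `should_dump` predicate are all assumptions, and the slides do not say which trigger Eureka actually uses.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical event shape: (pid, system call name).
SyscallEvent = Tuple[int, str]


def find_dump_point(
    events: Iterable[SyscallEvent],
    target_pid: int,
    should_dump: Callable[[int, str], bool],
) -> Optional[int]:
    """Return the index of the first event at which the image should be dumped.

    Events from other processes are skipped, mirroring pid-based filtering.
    `should_dump(seen, name)` receives the number of calls observed so far for
    the target process and the current call name; it stands in for whatever
    "sufficiently revealed" heuristic the analyst chooses.
    """
    seen = 0
    for index, (pid, name) in enumerate(events):
        if pid != target_pid:
            continue  # not the malware process
        seen += 1
        if should_dump(seen, name):
            return index
    return None
```

A caller might pass `lambda seen, name: name == "NtTerminateProcess"` to dump just before the process exits, or a count-based predicate to dump after a fixed number of calls.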

Statistics-based Unpacking

• Observations
– Statistical properties of a packed executable differ from those of an unpacked executable
– As malware executes, the code-to-data ratio increases

• Complications
– Code and data sections are interleaved in PE executables
– Data directories (import tables) look similar to data but are often found in code sections
– Properties of data sections vary with packers

Statistics-based Unpacking (3)

Bigram                Calc    Explorer  Ipconfig  lpr    Mshearts  Notepad  Ping   Shutdown  Taskman
Size                  117 KB  1010 KB   59 KB     11 KB  131 KB    72 KB    21 KB  23 KB     19 KB
FF 15 (call)          246     3045      184       24     192       415      58     132       126
FF 75 (push)          235     2494      272       33     274       254      41     63        85
E8 _ _ _ 0xff (call)  1583    2201      181       19     369       180      87     49        41
E8 _ _ _ 0x00 (call)  746     1091      152       62     641       108      57     66        50
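Counts like those in the table can be gathered with a simple byte scan. The sketch below is an assumption about how such statistics might be collected, not the authors' tool; the function name and dictionary keys are made up for illustration. On x86, `FF 15` encodes an indirect call through an absolute memory operand, `FF 75` a push of a stack-frame slot, and `E8` a near call with a 4-byte little-endian displacement, so a final displacement byte of `0xff` or `0x00` indicates a nearby backward or forward target.

```python
from typing import Dict


def count_code_patterns(image: bytes) -> Dict[str, int]:
    """Count byte patterns that are typical of x86 code in a memory image."""
    counts = {"FF 15": 0, "FF 75": 0, "E8 .. FF": 0, "E8 .. 00": 0}
    for i in range(len(image) - 1):
        first, second = image[i], image[i + 1]
        if first == 0xFF and second == 0x15:
            counts["FF 15"] += 1          # call dword ptr [addr]
        elif first == 0xFF and second == 0x75:
            counts["FF 75"] += 1          # push dword ptr [ebp+disp8]
        elif first == 0xE8 and i + 4 < len(image):
            high = image[i + 4]           # most significant displacement byte
            if high == 0xFF:
                counts["E8 .. FF"] += 1   # short backward call
            elif high == 0x00:
                counts["E8 .. 00"] += 1   # short forward call
    return counts
```

Comparing these counts before and after execution gives a rough signal of how much code has been revealed; the scan is linear and needs no disassembler, which fits coarse-grained monitoring.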

Evaluation (ASPack)

Evaluation (MoleBox)

API Resolution

• User-level malware programs require system calls to perform malicious actions
• They use the Win32 API to access user-level libraries
• Obfuscations impede malware analysis using disassemblers and decompilers
– Packers use non-standard linking and loading of DLLs
– Obfuscated API resolution

Standard API Resolution

Imports in the IAT are identified by IDA by looking at the Import Table

Resolving API Calls Using Dataflow Analysis

• Identify register-based indirect calls

[Figure: def-use chain for a register-based indirect call to GetEnvironmentStringW; labels: def, use]
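The def-use idea can be sketched on a single straight-line block: remember which API address each register was last loaded with (the def), and resolve a `call reg` (the use) from that record. This is a minimal sketch under stated assumptions, not the tool's algorithm: the instruction tuples, the `iat` mapping from operand text to API name, and the function name are all hypothetical, and real code would need to follow definitions across basic blocks.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction form: (address, mnemonic, dst, src).
Instruction = Tuple[int, str, str, str]


def resolve_register_calls(block: List[Instruction], iat: Dict[str, str]) -> Dict[int, str]:
    """Map call-site addresses to API names for register-based indirect calls."""
    reg_defs: Dict[str, str] = {}   # register -> API named by its reaching definition
    resolved: Dict[int, str] = {}
    for address, mnemonic, dst, src in block:
        if mnemonic == "mov":
            if src in iat:
                reg_defs[dst] = iat[src]        # def: register loaded from an import slot
            elif src in reg_defs:
                reg_defs[dst] = reg_defs[src]   # copy between registers
            else:
                reg_defs.pop(dst, None)         # overwritten with an unknown value
        elif mnemonic == "call":
            if dst in reg_defs:
                resolved[address] = reg_defs[dst]  # use: indirect call through the register
            reg_defs.pop("eax", None)           # the return value clobbers eax
    return resolved
```

Instructions other than `mov` and `call` are ignored here, which is unsound for real code (any write to a register should kill its definition); the sketch only shows the shape of the analysis.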

Evaluation Metrics

• Measuring analyzability
– Code-to-data ratio
    • Use a disassembler to separate code and data.
    • Most successfully unpacked malware has a code-to-data ratio over 50%.
– API resolution success
    • Percentage of API calls that have been resolved from the set of all call sites.
    • A higher percentage implies the malware is more amenable to static analysis.
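Both metrics reduce to simple ratios. The sketch below assumes that "code-to-data ratio" means the share of code bytes among all bytes the disassembler classified (which is what makes a 50% threshold meaningful); the slides do not spell out the definition, and the function names are illustrative.

```python
def code_to_data_ratio(code_bytes: int, data_bytes: int) -> float:
    """Fraction of classified bytes that the disassembler marked as code."""
    total = code_bytes + data_bytes
    return code_bytes / total if total else 0.0


def api_resolution_success(resolved_call_sites: int, total_call_sites: int) -> float:
    """Fraction of API call sites whose target was resolved."""
    return resolved_call_sites / total_call_sites if total_call_sites else 0.0


def looks_unpacked(code_bytes: int, data_bytes: int, threshold: float = 0.5) -> bool:
    """Apply the 'over 50%' rule of thumb from the slide."""
    return code_to_data_ratio(code_bytes, data_bytes) > threshold
```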

Challenges in Static Analysis

Disassembly

Complete Disassembly

Challenge: Can we isolate subroutines?

Binary Rewrites

• go interactive …

From Raw Binary To Decompiled C Code

Raw Binary

Disassembly

Complete Disassembly

Decompiled C code

Renaissance: Improving C Code Readability

Hex Rays

void *sub_9AB966(int a1, void *source, unsigned int a3)
{
  int v3, v4, v5, v6, v8;

  v3 = a1;
  v4 = *(_DWORD *)(a1 + 20) + 8 * a3;
  v5 = (*(_DWORD *)(a1 + 20) >> 3) & 0x3F;
  *(_DWORD *)(a1 + 20) = v4;
  if ( v4 < 8 * a3 )
    ++*(_DWORD *)(a1 + 24);
  *(_DWORD *)(a1 + 24) += a3 >> 29;
  if ( v5 + a3 <= 0x3F ) {
    v6 = 0;
  } else {
    v6 = 64 - v5;
    memcpy((void *)(v5 + a1 + 28), source, 64 - v5);
    sub_9A9F13(a1, (void *)(a1 + 28));
    if ( v6 + 63 < a3 ) {
      v8 = v6 + 63;
      do {
        sub_9A9F13(v3, (char *)source + v8 - 63);
        v8 += 64;
        v6 += 64;
      } while ( v8 < a3 );
    }
    v5 = 0;
  }
  return memcpy((void *)(v5 + v3 + 28), (char *)source + v6, a3 - v6);
}

Hex Rays + Renaissance

void *sub_9AB966(unsigned int *destination1, unsigned int *source, size_t num1)
{
  unsigned int *destination2;
  size_t num3, num2, num4, num5;

  destination2 = destination1;
  num3 = destination1[20] + 8 * num1;
  num2 = (destination1[20] >> 3) & 0x3F;
  destination1[20] = num3;
  if ( num3 < 8 * num1 )
    ++destination1[24];
  destination1[24] += num1 >> 29;
  if ( num2 + num1 <= 0x3F ) {
    num4 = 0;
  } else {
    num4 = 64 - num2;
    memcpy( &destination1[num2 + 28], source, 64 - num2);
    sub_9A9F13( destination1, &destination1[28] );
    if ( num4 + 63 < num1 ) {
      num5 = num4 + 63;
      do {
        sub_9A9F13( destination2, &source[num5 - 63] );
        num5 += 64;
        num4 += 64;
      } while ( num5 < num1 );
    }
    num2 = 0;
  }
  return memcpy( &destination2[num2 + 28], &source[num4], num1 - num4 );
}

1. Typing and naming variables

2. Highlighting important vars

3. Improvements to decompilation

4. Caller → Callee type info

Evaluation

Sample              IDA Pro             Renaissance
Adialer             153/606 (25%)       276/606 (46%)
Adpclient           12/300 (4%)         93/300 (31%)
Adultbrowser        296/762 (39%)       339/762 (44%)
Agent.DZ (packed)   1/61 (2%)           14/61 (23%)
Browsermodifier     161/469 (34%)       252/469 (54%)
Casino_c12a         794/7207 (11%)      2614/7207 (36%)
Conficker-A         243/781 (31%)       318/781 (41%)
Conficker-B         296/1516 (20%)      735/1516 (49%)
Cycbot              267/2842 (9%)       881/2842 (31%)
Duqu                66/300 (22%)        117/300 (37%)
Lexotan32-A         2/40 (5%)           9/40 (23%)
Lexotan32-B         2/50 (4%)           8/50 (16%)
Lolyda.AA           10/134 (7%)         40/134 (30%)
Magiccasino         42/1064 (4%)        351/1064 (33%)
Mydoom_aa32         153/543 (28%)       189/543 (35%)
Podnhua_f0a6        99/372 (27%)        139/372 (37%)
Qakbot-A            179 (29%)           183 (30%)
Stuxnet             64/320 (20%)        134/320 (42%)
Torpig              629 (44%)           717 (51%)
Notepad             145/273 (53%)       108/273 (40%)
Quake               260/4054 (6%)       1271/4054 (31%)
Total               3697/23721 (16%)    9503/23721 (40%)

The Need for Rapid Crypto-Algorithm Isolation

AES: Truecrypt, Waledac
SSL: Agobot (IRC over SSL)
Serpent: Truecrypt
Twofish: Truecrypt
Cascades: Truecrypt
HASH Whirlpool: Truecrypt
HASH MD6: Conficker BC
HASH SHA1: Conficker A, Truecrypt
RC4: Rustock, Zeus, Conficker
Custom Crypto / Encoding: Pushdo, Kraken, Mebroot, Mega-D
XOR-Custom: Lethic, Virut, Hydraq, Torpig
RSA variants: Nugashe, Conficker, Waledac
Blowfish - 448 bit: Clampi

Intra-module Analyzer

isCrypto Score = isConst + isPadded + Crypt API fn (LargeVar, Loop Detection, Opcodes, BigMath)

cryptoFnDetection (): at least 2 matches

[Diagram components: IntraModule isCrypto(); Constant Detector; Padding Analysis; Large Local Variables; Loop Detection; Big Number Math; Opcode Analysis; Microsoft CryptoAPI / CAPICOM; Constant Data Loading; cryptoFnDetection (); Unknown Computation]
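One possible reading of the scoring rule is sketched below. It is an interpretation, not the analyzer's actual logic: it assumes the three terms of the isCrypto score are binary indicators that are summed, and that `cryptoFnDetection` flags a subroutine as an unknown computation when at least two of the generic heuristics (large local variables, loops, opcodes, big-number math) fire. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubroutineFeatures:
    # Indicators of a known algorithm or library.
    has_known_constants: bool = False
    has_padding: bool = False
    calls_crypto_api: bool = False
    # Generic heuristics that only suggest crypto-like computation.
    large_locals: bool = False
    has_loop: bool = False
    crypto_opcodes: bool = False
    big_number_math: bool = False


def is_crypto_score(f: SubroutineFeatures) -> int:
    """Sum of the three indicator terms (assumed to be 0/1 each)."""
    return int(f.has_known_constants) + int(f.has_padding) + int(f.calls_crypto_api)


def crypto_fn_detection(f: SubroutineFeatures) -> bool:
    """True when at least two generic heuristics match."""
    matches = sum([f.large_locals, f.has_loop, f.crypto_opcodes, f.big_number_math])
    return matches >= 2


def classify(f: SubroutineFeatures) -> str:
    if is_crypto_score(f) > 0:
        return "crypto"
    if crypto_fn_detection(f):
        return "unknown computation"
    return "not crypto"
```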

Constant detection

Blowfish, Camelia, CAST, CAST256, CRC32, DES, GOST, HAVAL, MARS, MD2, PKCS_MD2, PKCS_MD5, PKCS_RIPEMD160, PKCS_SHA256, PKCS_SHA384, PKCS_SHA512, PKCS_Tiger, RawDES, RC2, Rijndael, SAFER, SHA1, SHA256, SHA512, SHARK, SKIPJACK, Square, Tiger, Twofish, WAKE, Whirlpool, zlib, AES, MD6

• Direct Reference
• Indirect Load: data array contains known crypto content
• Load Array

Unknown Computation: this could be encryption or decryption
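Constant detection of this kind can be approximated by searching an image for byte signatures of well-known tables. The sketch below is illustrative only: it uses three signatures (the first 16 bytes of the AES S-box, the first four entries of the standard CRC-32 table, and the SHA-1 initial value 0xC3D2E1F0, all stored little-endian where applicable). The label `SHA1_h4`, the function name, and the choice of prefix lengths are assumptions; a real detector would carry many more signatures and also look for constants loaded indirectly.

```python
import struct
from typing import List, Tuple

# Signature prefixes for a few well-known constants (labels partly hypothetical).
KNOWN_CONSTANTS = {
    "sbox_AES": bytes.fromhex("637c777bf26b6fc53001672bfed7ab76"),
    "CRC32_m_tab": struct.pack("<4I", 0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA),
    "SHA1_h4": struct.pack("<I", 0xC3D2E1F0),
}


def find_crypto_constants(image: bytes, base: int = 0) -> List[Tuple[int, str]]:
    """Return sorted (address, label) pairs for every signature occurrence."""
    hits = []
    for label, pattern in KNOWN_CONSTANTS.items():
        offset = image.find(pattern)
        while offset != -1:
            hits.append((base + offset, label))
            offset = image.find(pattern, offset + 1)
    return sorted(hits)
```

Short signatures such as a single 32-bit constant can match by coincidence, so hits like these are best treated as candidates that the other heuristics confirm.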

func ColorNode (subgraph) {
  if (exists uncolored subgraph) ColorNode (subgraph)
  foreach leaf in subgraph { isCrypto (leaf) }
  if (exists green leaf) then color root green
  if (exists orange leaf) then color root orange
  if (exist > 2 red leaves) then color root red
}

func cryptoString (per subroutine)
  if node contains known crypto implementation substring, label node with corresponding crypto library.
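The ColorNode pseudocode can be read as a bottom-up pass over the call graph. The sketch below is one literal interpretation under stated assumptions: the call graph is a dictionary from a function to its callees, "leaves" are taken to be a node's direct callees, `is_crypto` is a hypothetical callback returning "green", "orange", "red", or None for a function without callees, and the three rules are applied in the order given so that later rules override earlier ones. The slides do not define what each color means.

```python
from typing import Callable, Dict, List, Optional


def color_call_graph(
    call_graph: Dict[str, List[str]],
    root: str,
    is_crypto: Callable[[str], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Color every function reachable from `root`, children before parents."""
    colors: Dict[str, Optional[str]] = {}

    def color_node(node: str) -> Optional[str]:
        if node in colors:
            return colors[node]
        colors[node] = None  # mark as visited so recursive cycles terminate
        callees = call_graph.get(node, [])
        if not callees:
            colors[node] = is_crypto(node)  # leaf: ask the intra-module analyzer
            return colors[node]
        child_colors = [color_node(callee) for callee in callees]
        color = None
        if "green" in child_colors:
            color = "green"
        if "orange" in child_colors:
            color = "orange"
        if child_colors.count("red") > 2:
            color = "red"
        colors[node] = color
        return color

    color_node(root)
    return colors
```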

Inter-module Analyzer

AES

Vowpal wabbit

MD6

IDA Pro Call Graph w/ Crypto-routine detection

Example

(c) SRI International
Finding crypto constants and subroutines in binary files
Automatic discovery of crypto functions as unknown computations

4BABF1: found sparse constants for SHA-1
50C254: found const array sbox_AES (used in AES)
50E354: found const array rsbox_AES (used in AES)
50F574: found const array Twofish_q (used in Twofish)
50F7A4: found const array MARS_Sbox (used in MARS)
510EA4: found const array zinflate_lengthExtraBits (used in zlib)
510F18: found const array zinflate_distanceExtraBits (used in zlib)
511918: found const array CRC32_m_tab (used in CRC32)
514F98: found const array CRC32_m_tab (used in CRC32)
Found 9 known constant arrays in total.

Scanning code for crypto subroutines
found crypto in Function @ 407334
found crypto in Function @ 40E5B4
found crypto in Function @ 47D954
found crypto in Function @ 47ED34
found crypto in Function @ 4816F4
found crypto in Function @ 4B6624
found crypto in Function @ 4B9980
found crypto in Function @ 4CCBD4
found crypto in Function @ 4CCD4C
found crypto in Function @ 4CE208
found crypto in Function @ 4CE7CC
found crypto in Function @ 4CEBE8
found crypto in Function @ 4D9B00
found crypto in Function @ 4D9EE4
Done labelling crypto subroutines
Found 14 subroutine(s) with possible crypto

Running SRI Crypt Finder

Report Generation

• go interactive
