Malgram Malware Analysis: Malware Unpacking, Static Analysis, Code Deobfuscation, Decompilation
Phillip Porras and Hassen Saidi, Computer Science Lab, SRI International
Objectives
• Now that we have various ways of knowing what the malware does when running on an infected system, we aim to answer two fundamental questions:
1. How does it do it?
2. What is the full capability of the malware, covering both observed behavior and yet-to-be-triggered behavior?
Dynamic vs Static Malware Analysis
• Dynamic Analysis
  – Techniques that profile actions of the binary at runtime
  – Only provides a partial "effects-oriented profile" of malware potential
• Static Analysis
  – Techniques that apply program analysis to the binary code
  – Can provide complementary insights
  – Potential for more comprehensive assessment
Malgram Report
• …go interactive
From Binary To Semantically Rich C Code
Raw Binary
Disassembly
Complete Disassembly
Decompiled C code
Challenges in Static Analysis
Raw Binary
Disassembly
Complete Disassembly
Decompiled C code
Malware Obfuscation
• Most malware is obfuscated
• Packing is the most used obfuscation technique
• Packing is often combined with other advanced forms of obfuscation:
  – Binary rewriting to create semantically equivalent code with vastly different structure
  – Call obfuscation in general and API obfuscation in particular
  – Chunking or "code spaghettisation"
  – Dead code (or functionally irrelevant code)
Challenges in Static Analysis
Raw Binary
Disassembly
Challenge: Does the binary represent the full malware logic?
Unpacking Result
Unpacking
Packed vs Unpacked
• go interactive…
Coarse-grained Execution Monitoring
• Generalized unpacking principle
  – Execute the binary until it has sufficiently revealed itself
  – Dump the process execution image for static analysis
• Monitoring execution progress
  – Eureka employs a Windows driver that hooks the SSDT (System Service Dispatch Table)
  – Callback invoked on each NTDLL system call
  – Filtering based on the malware process pid
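The monitoring loop above can be sketched in a few lines. This is only an illustration of the "filter by pid, then decide when to dump" idea, not Eureka's driver code: the `SyscallEvent` record, the `dump_point` function, the choice of `NtTerminateProcess` as a stop call, and the `max_calls` cap are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class SyscallEvent:
    """Hypothetical record delivered by a system-call callback."""
    pid: int
    name: str


def dump_point(events: Iterable[SyscallEvent],
               target_pid: int,
               stop_calls: FrozenSet[str] = frozenset({"NtTerminateProcess"}),
               max_calls: int = 1000) -> Optional[int]:
    """Return the index of the event at which to dump the process image.

    Events from other processes are ignored (pid filtering). The dump is
    triggered when the monitored process issues one of the assumed stop
    calls or after max_calls of its own system calls, whichever is first.
    """
    seen = 0
    for index, event in enumerate(events):
        if event.pid != target_pid:
            continue  # not the malware process
        seen += 1
        if event.name in stop_calls or seen >= max_calls:
            return index
    return None  # the process never reached a dump condition
```

A real monitor would make this decision inside the driver callback and suspend the process before dumping; the list-based form here only shows the control flow.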
Statistics-based Unpacking
• Observations
  – Statistical properties of a packed executable differ from those of an unpacked executable
  – As malware executes, the code-to-data ratio increases
• Complications
  – Code and data sections are interleaved in PE executables
  – Data directories (import tables) look similar to data but are often found in code sections
  – Properties of data sections vary with packers
Statistics-based Unpacking (3)
Bigram                Calc      Explorer  Ipconfig  lpr       Mshearts  Notepad   Ping      Shutdown  Taskman
                      117 KB    1010 KB   59 KB     11 KB     131 KB    72 KB     21 KB     23 KB     19 KB
FF 15 (call)          246       3045      184       24        192       415       58        132       126
FF 75 (push)          235       2494      272       33        274       254       41        63        85
E8 _ _ _ 0xff (call)  1583      2201      181       19        369       180       87        49        41
E8 _ _ _ 0x00 (call)  746       1091      152       62        641       108       57        66        50
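Counts like those in the table can be gathered with a plain byte scan. The sketch below assumes a sliding window over the raw bytes with no disassembly, which is a simplification chosen for the example; the function name `count_code_patterns` is hypothetical. The `E8 _ _ _ 0xff` and `E8 _ _ _ 0x00` patterns correspond to a near call whose 32-bit relative offset has a high byte of 0xFF (short backward call) or 0x00 (short forward call).

```python
from typing import Dict


def count_code_patterns(buf: bytes) -> Dict[str, int]:
    """Count byte patterns that are common in x86 code but rare in data."""
    counts = {"FF 15": 0, "FF 75": 0, "E8 _ _ _ FF": 0, "E8 _ _ _ 00": 0}

    # Two-byte patterns: FF 15 (indirect call) and FF 75 (push [ebp+disp8]).
    for i in range(len(buf) - 1):
        if buf[i] == 0xFF:
            if buf[i + 1] == 0x15:
                counts["FF 15"] += 1
            elif buf[i + 1] == 0x75:
                counts["FF 75"] += 1

    # Five-byte pattern: E8 followed by a little-endian rel32 whose
    # most significant byte (offset +4) is 0xFF or 0x00.
    for i in range(len(buf) - 4):
        if buf[i] == 0xE8:
            if buf[i + 4] == 0xFF:
                counts["E8 _ _ _ FF"] += 1
            elif buf[i + 4] == 0x00:
                counts["E8 _ _ _ 00"] += 1

    return counts
```

Running this on successive memory dumps of a packed process would be one way to watch the counts rise as code is unpacked; any thresholds for deciding "unpacked" would have to be calibrated separately.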
Evaluation (ASPack)
Evaluation (MoleBox)
API Resolution
• User-level malware programs require system calls to perform malicious actions
• Use the Win32 API to access user-level libraries
• Obfuscations impede malware analysis using disassemblers and decompilers
  – Packers use non-standard linking and loading of DLLs
  – Obfuscated API resolution
Standard API Resolution
Imports in the IAT are identified by IDA by looking at the Import Table
Resolving API Calls Using Dataflow Analysis
• Identify register-based indirect calls
[Figure: def-use chain for a register-based indirect call resolved to GetEnvironmentStringW]
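The def-use idea can be illustrated on a toy instruction list. This is a minimal sketch, assuming straight-line code and a pre-parsed instruction form; `Insn`, `resolve_call`, the register set, and the `iat` mapping from address to API name are hypothetical names introduced for the example, not part of any real disassembler API.

```python
from typing import Dict, List, NamedTuple, Optional


class Insn(NamedTuple):
    """Toy instruction: opcode, destination operand, optional source."""
    op: str
    dst: str
    src: str = ""


REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp"}
# Opcodes treated as writing their destination register (assumed subset).
WRITE_OPS = {"mov", "lea", "add", "sub", "xor", "pop"}


def resolve_call(insns: List[Insn], call_idx: int,
                 iat: Dict[int, str]) -> Optional[str]:
    """Resolve `call reg` by walking backwards from the use to its def."""
    reg = insns[call_idx].dst
    if reg not in REGISTERS:
        return None
    for prev in reversed(insns[:call_idx]):
        if prev.op not in WRITE_OPS or prev.dst != reg:
            continue  # does not define the register being tracked
        if prev.op != "mov":
            return None  # register was computed; give up
        if prev.src in REGISTERS:
            reg = prev.src  # register copy: keep following the chain
            continue
        if prev.src.startswith("[") and prev.src.endswith("]"):
            try:
                return iat.get(int(prev.src[1:-1], 16))
            except ValueError:
                return None
        return None
    return None
```

A production analysis would work over the control-flow graph rather than a linear list, since a definition can reach the call along several paths.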
Evaluation Metrics
• Measuring analyzability
  – Code-to-data ratio
    • Use a disassembler to separate code and data.
    • Most successfully unpacked malware has a code-to-data ratio over 50%.
  – API resolution success
    • Percentage of API calls that have been resolved out of the set of all call sites.
    • A higher percentage implies the malware is more amenable to static analysis.
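Both metrics reduce to simple ratios. The sketch below assumes that "code-to-data ratio over 50%" means the share of code bytes among all classified bytes; the slide does not define the denominator, so that reading, like the function names, is an assumption.

```python
def code_to_data_ratio(code_bytes: int, data_bytes: int) -> float:
    """Fraction of classified bytes that the disassembler marked as code."""
    total = code_bytes + data_bytes
    return code_bytes / total if total else 0.0


def api_resolution_rate(resolved_sites: int, total_call_sites: int) -> float:
    """Fraction of call sites whose API target has been resolved."""
    return resolved_sites / total_call_sites if total_call_sites else 0.0
```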
Challenges in Static Analysis
Disassembly
Complete Disassembly
Challenge: Can we isolate subroutines?
Binary Rewrites
• go interactive …
From Raw Binary To Decompiled C Code
Raw Binary
Disassembly
Complete Disassembly
Decompiled C code
Renaissance: Improving C Code Readability

Hex Rays

void *sub_9AB966(int a1, void *source, unsigned int a3)
{
  int v3, v4, v5, v6, v8;

  v3 = a1;
  v4 = *(_DWORD *)(a1 + 20) + 8 * a3;
  v5 = (*(_DWORD *)(a1 + 20) >> 3) & 0x3F;
  *(_DWORD *)(a1 + 20) = v4;
  if ( v4 < 8 * a3 )
    ++*(_DWORD *)(a1 + 24);
  *(_DWORD *)(a1 + 24) += a3 >> 29;
  if ( v5 + a3 <= 0x3F )
  {
    v6 = 0;
  }
  else
  {
    v6 = 64 - v5;
    memcpy((void *)(v5 + a1 + 28), source, 64 - v5);
    sub_9A9F13(a1, (void *)(a1 + 28));
    if ( v6 + 63 < a3 )
    {
      v8 = v6 + 63;
      do
      {
        sub_9A9F13(v3, (char *)source + v8 - 63);
        v8 += 64;
        v6 += 64;
      }
      while ( v8 < a3 );
    }
    v5 = 0;
  }
  return memcpy((void *)(v5 + v3 + 28), (char *)source + v6, a3 - v6);
}

Hex Rays + Renaissance

void *sub_9AB966(unsigned int *destination1, unsigned int *source, size_t num1)
{
  unsigned int *destination2;
  size_t num3, num2, num4, num5;

  destination2 = destination1;
  num3 = destination1[20] + 8 * num1;
  num2 = (destination1[20] >> 3) & 0x3F;
  destination1[20] = num3;
  if ( num3 < 8 * num1 )
    ++destination1[24];
  destination1[24] += num1 >> 29;
  if ( num2 + num1 <= 0x3F )
  {
    num4 = 0;
  }
  else
  {
    num4 = 64 - num2;
    memcpy( &destination1[num2 + 28], source, 64 - num2 );
    sub_9A9F13( destination1, &destination1[28] );
    if ( num4 + 63 < num1 )
    {
      num5 = num4 + 63;
      do
      {
        sub_9A9F13( destination2, &source[num5 - 63] );
        num5 += 64;
        num4 += 64;
      }
      while ( num5 < num1 );
    }
    num2 = 0;
  }
  return memcpy( &destination2[num2 + 28], &source[num4], num1 - num4 );
}
1. Typing and naming variables
2. Highlighting important variables
3. Improvements to decompilation
4. Caller → Callee type info
Evaluation
Sample              IDA Pro             Renaissance
Adialer             153/606 (25%)       276/606 (46%)
Adpclient           12/300 (4%)         93/300 (31%)
Adultbrowser        296/762 (39%)       339/762 (44%)
Agent.DZ (packed)   1/61 (2%)           14/61 (23%)
Browsermodifier     161/469 (34%)       252/469 (54%)
Casino_c12a         794/7207 (11%)      2614/7207 (36%)
Conficker-A         243/781 (31%)       318/781 (41%)
Conficker-B         296/1516 (20%)      735/1516 (49%)
Cycbot              267/2842 (9%)       881/2842 (31%)
Duqu                66/300 (22%)        117/300 (37%)
Lexotan32-A         2/40 (5%)           9/40 (23%)
Lexotan32-B         2/50 (4%)           8/50 (16%)
Lolyda.AA           10/134 (7%)         40/134 (30%)
Magiccasino         42/1064 (4%)        351/1064 (33%)
Mydoom_aa32         153/543 (28%)       189/543 (35%)
Podnhua_f0a6        99/372 (27%)        139/372 (37%)
Qakbot-A            179 (29%)           183 (30%)
Stuxnet             64/320 (20%)        134/320 (42%)
Torpig              629 (44%)           717 (51%)
Notepad             145/273 (53%)       108/273 (40%)
Quake               260/4054 (6%)       1271/4054 (31%)
Total               3697/23721 (16%)    9503/23721 (40%)
Challenges in Static Analysis
Raw Binary
Disassembly
Complete Disassembly
Decompiled C code
The Need for Rapid Crypto-Algorithm Isolation
AES: Truecrypt, Waledac
SSL: Agobot (IRC over SSL)
Serpent: Truecrypt
Twofish: Truecrypt
Cascades: Truecrypt
HASH Whirlpool: Truecrypt
HASH MD6: Conficker B/C
HASH SHA1: Conficker A, Truecrypt
RC4: Rustock, Zeus, Conficker
Custom Crypto / Encoding: Pushdo, Kraken, Mebroot, Mega-D
XOR-Custom: Lethic, Virut, Hydraq, Torpig
RSA variants: Nugache, Conficker, Waledac
Blowfish (448 bit): Clampi
Intra-module Analyzer
isCrypto Score = isConst + isPadded + Crypt API fn (LargeVar, Loop Detection, Opcodes, BigMath)
cryptoFnDetection(): at least 2 matches
[Figure: IntraModule isCrypto() applied to an Unknown Computation. Components shown: Constant Detector, Padding Analysis, Large Local Variables, Loop Detection, Big Number Math, Opcode Analysis, Microsoft CryptoAPI / CAPICOM, Constant Data Loading]
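One way to read the score formula is sketched below. The slide's notation is terse, so the interpretation is an assumption: three independent indicators (known constants, padding, use of a crypto API) plus one point when at least two of the four structural heuristics fire, matching "at least 2 matches". The `Features` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Per-subroutine observations (hypothetical field names)."""
    has_known_constants: bool   # isConst
    has_padding: bool           # isPadded
    uses_crypto_api: bool       # Crypt API
    large_locals: bool          # LargeVar
    has_loop: bool              # Loop Detection
    crypto_opcodes: bool        # Opcodes
    big_number_math: bool       # BigMath


def crypto_fn_detection(f: Features) -> bool:
    """True when at least two of the four structural heuristics match."""
    matches = sum([f.large_locals, f.has_loop,
                   f.crypto_opcodes, f.big_number_math])
    return matches >= 2


def is_crypto_score(f: Features) -> int:
    """Sum of the indicators; a higher score suggests crypto is more likely."""
    return (int(f.has_known_constants) + int(f.has_padding)
            + int(f.uses_crypto_api) + int(crypto_fn_detection(f)))
```

How the score maps to a final verdict is not stated on the slide, so no threshold is applied here.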
Constant detection
Known constants: Blowfish, Camellia, CAST, CAST256, CRC32, DES, GOST, HAVAL, MARS, MD2, PKCS_MD2, PKCS_MD5, PKCS_RIPEMD160, PKCS_SHA256, PKCS_SHA384, PKCS_SHA512, PKCS_Tiger, RawDES, RC2, Rijndael, SAFER, SHA1, SHA256, SHA512, SHARK, SKIPJACK, Square, Tiger, Twofish, WAKE, Whirlpool, zlib, AES, MD6
[Figure labels: Direct Reference; Indirect Load; Load Array; Data array contains known crypto content; Unknown Computation; "This could be encryption or decryption"]
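Constant detection amounts to searching the binary for byte sequences that only well-known algorithms use. The sketch below is an assumption-laden miniature: it carries just two array signatures (the first 16 bytes of the AES S-box and the first four entries of the standard CRC32 table, stored little-endian) plus the five SHA-1 initialization words as "sparse" constants. The function names are hypothetical, and a real detector would hold a signature per algorithm in the list above.

```python
import struct
from typing import List, Tuple

# Signature prefixes for two well-known constant arrays.
SIGNATURES = {
    "sbox_AES": bytes.fromhex("637c777bf26b6fc53001672bfed7ab76"),
    "CRC32_m_tab": struct.pack("<4I", 0x00000000, 0x77073096,
                               0xEE0E612C, 0x990951BA),
}

# SHA-1 initial hash values; usually embedded as immediates, not an array.
SHA1_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def find_const_arrays(blob: bytes, base: int = 0) -> List[Tuple[int, str]]:
    """Return (address, name) for every signature occurrence in blob."""
    hits = []
    for name, sig in SIGNATURES.items():
        pos = blob.find(sig)
        while pos != -1:
            hits.append((base + pos, name))
            pos = blob.find(sig, pos + 1)
    return sorted(hits)


def has_sparse_sha1(blob: bytes) -> bool:
    """True when all five SHA-1 constants appear somewhere in blob."""
    return all(struct.pack("<I", c) in blob for c in SHA1_INIT)
```

Matching constants says which algorithm is present but not its direction, which is why a hit "could be encryption or decryption".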
func ColorNode (subgraph) {
    if (exists uncolored subgraph)
        ColorNode (subgraph)
    foreach leaf in subgraph {
        isCrypto (leaf)
    }
    if (exists green leaf) then color root green
    if (exists orange leaf) then color root orange
    if (exist > 2 red leaves) then color root red
}

func cryptoString (per subroutine)
    if node contains a known crypto implementation substring,
    label node with the corresponding crypto library.
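The ColorNode pseudocode can be made concrete as a post-order walk over a call tree. This is a sketch under assumptions: the `Node` class is hypothetical, `is_crypto` is a caller-supplied function returning "green", "orange", "red", or None for a leaf, "leaves" is read as the direct children of a node once they have been colored, and the three rules are applied in the order written so a later rule overrides an earlier one. What each color signifies is not defined on the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical call-graph node; children are its callees."""
    name: str
    children: List["Node"] = field(default_factory=list)
    color: Optional[str] = None


def color_node(node: Node,
               is_crypto: Callable[[Node], Optional[str]]) -> Optional[str]:
    """Color leaves with is_crypto, then propagate colors up to the root."""
    if not node.children:
        node.color = is_crypto(node)
        return node.color

    # Color every child subgraph first (post-order).
    child_colors = [color_node(child, is_crypto) for child in node.children]

    # Rules applied in the order given in the pseudocode.
    if "green" in child_colors:
        node.color = "green"
    if "orange" in child_colors:
        node.color = "orange"
    if child_colors.count("red") > 2:
        node.color = "red"
    return node.color
```

Because the rules are sequential, a node with both green and orange children ends up orange; if a different precedence is intended, the order of the three checks is the only thing that needs to change.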
Inter-module Analyzer
AES
Vowpal wabbit
MD6
IDA Pro Call Graph w/ Crypto-routine detection
Example
(c) SRI International
Finding crypto constants and subroutines in binary files
Automatic discovery of crypto functions as unknown computations

4BABF1: found sparse constants for SHA-1
50C254: found const array sbox_AES (used in AES)
50E354: found const array rsbox_AES (used in AES)
50F574: found const array Twofish_q (used in Twofish)
50F7A4: found const array MARS_Sbox (used in MARS)
510EA4: found const array zinflate_lengthExtraBits (used in zlib)
510F18: found const array zinflate_distanceExtraBits (used in zlib)
511918: found const array CRC32_m_tab (used in CRC32)
514F98: found const array CRC32_m_tab (used in CRC32)
Found 9 known constant arrays in total.

Scanning code for crypto subroutines
found crypto in Function @ 407334
found crypto in Function @ 40E5B4
found crypto in Function @ 47D954
found crypto in Function @ 47ED34
found crypto in Function @ 4816F4
found crypto in Function @ 4B6624
found crypto in Function @ 4B9980
found crypto in Function @ 4CCBD4
found crypto in Function @ 4CCD4C
found crypto in Function @ 4CE208
found crypto in Function @ 4CE7CC
found crypto in Function @ 4CEBE8
found crypto in Function @ 4D9B00
found crypto in Function @ 4D9EE4
Done labelling crypto subroutines
Found 14 subroutine(s) with possible crypto
Running SRI Crypt Finder
Report Generation
• go interactive