IDA and obfuscated code

Hex-Rays
Ilfak Guilfanov

Presentation Outline

Is obfuscated code a problem for IDA Pro? IDA Pro expects nice, proper code.

A lost battle? At first sight, yes.

Solutions exist: they are numerous...

Future development: your feedback

Online copy of this presentation is available at http://www.hex-rays.com/idapro/ppt/caro_obfuscation.ppt

Sample obfuscated code

IDA is a static analysis tool, and it makes many assumptions about the input code. When these assumptions are violated, the analysis goes wrong. An extremely simple case: call instructions are expected to return to the next instruction.

The solution will be presented later...

Obfuscation categories

Redundancy: blows up the code size; code cleaning is necessary

Camouflage: hide and seek; the seeker is bound to win

Anti-debugger tricks: tricks can be learned even by old dogs

Since it is “just” obfuscation, a determined reverse engineer will eventually overcome it

Redundancy

- Instructions with no effect
- Useless jumps
- Complex computations with a constant result
- Code duplication

Instructions with no effect

In fact, CL is zero.

Instructions with no effect - countermeasures

- Replace them with NOPs
- Collapse regions of useless instructions into one line (select the useless instructions, then View, Hide)

Ideally, a plugin would clean up the code. The Hex-Rays decompiler ignores useless instructions because it removes all dead code, but it cannot handle obfuscated code well; expect improvements in this direction.
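As a rough illustration of the NOP-replacement idea, here is a minimal Python sketch that works outside IDA on a hypothetical pre-decoded instruction list of (mnemonic, destination, source) tuples. The known register values (such as CL being zero) are assumed to be supplied by the analyst, and flag side effects are deliberately ignored, so a real cleaner would first have to check that the flags are dead.

```python
# Hypothetical pre-decoded format: (mnemonic, destination, source), where
# source is an int immediate, a register name, or None.
NOP = ("nop", None, None)

# Operations that leave the destination unchanged when the operand is 0.
# Flags are NOT considered: add/sub/or/xor with 0 still update them.
ZERO_IS_NOOP = {"add", "sub", "or", "xor", "shl", "shr", "rol", "ror"}


def has_no_effect(insn, known):
    """True if the instruction leaves every register value unchanged,
    given `known`, a dict of register values proven constant (e.g. CL == 0)."""
    mnem, dst, src = insn
    value = src if isinstance(src, int) else known.get(src)
    if mnem in ZERO_IS_NOOP and value == 0:
        return True
    if mnem in ("mov", "xchg") and dst is not None and dst == src:
        return True
    return False


def clean(code, known):
    """Replace no-effect instructions by NOPs, keeping one entry per insn."""
    return [NOP if has_no_effect(insn, known) else insn for insn in code]


code = [("mov", "eax", 5), ("shl", "eax", "cl"),
        ("xchg", "ebx", "ebx"), ("add", "eax", 1)]
print(clean(code, {"cl": 0}))
```

Keeping one entry per instruction mirrors patching with NOPs: addresses do not move, so existing cross-references stay valid.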

Useless jumps

Text view is pretty useless:

Useless jumps

Graph view is slightly better:

A plugin to clean the graph and combine adjacent nodes would be really useful (can be done without modifying the database)
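A minimal sketch of the node-combining step, written in Python on a hypothetical graph representation (a dict of instruction lists and a dict of successor lists) rather than IDA's graph API. It merges a node into its single successor when that successor has no other predecessor, dropping the now-redundant unconditional jump.

```python
def predecessors(succs):
    """Map every node to the list of nodes that branch to it."""
    preds = {n: [] for n in succs}
    for n, targets in succs.items():
        for t in targets:
            preds[t].append(n)
    return preds


def combine_chains(blocks, succs, entry):
    """blocks: {node: [instruction text]}, succs: {node: [successor nodes]}.
    Merge A and B whenever A has exactly one successor B and B has exactly
    one predecessor A (and B is not the entry). Returns new dicts."""
    blocks = {n: list(insns) for n, insns in blocks.items()}
    succs = {n: list(t) for n, t in succs.items()}
    changed = True
    while changed:
        changed = False
        preds = predecessors(succs)
        for a in list(succs):
            if len(succs[a]) != 1:
                continue
            b = succs[a][0]
            if b in (a, entry) or len(preds[b]) != 1:
                continue
            if blocks[a] and blocks[a][-1].startswith("jmp"):
                blocks[a].pop()            # this jump only linked a to b
            blocks[a].extend(blocks.pop(b))
            succs[a] = succs.pop(b)
            changed = True
            break                          # graph changed: recompute preds
    return blocks, succs


blocks = {"n1": ["mov eax, 1", "jmp n2"], "n2": ["jmp n3"],
          "n3": ["add eax, 2", "ret"]}
succs = {"n1": ["n2"], "n2": ["n3"], "n3": []}
print(combine_chains(blocks, succs, "n1"))
```

Since the result is a new graph, this kind of cleanup can be applied to the display only, leaving the database untouched.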

Graph view and plugins

Graphs generated by IDA can be modified by a plugin on the fly: just hook the grcode_changed_graph event. This allows improving the graph. Some ideas:
- Combine sequential nodes into one
- Hide dead code paths
- Remove dead edges
- Add annotations to graph nodes/edges
- Automatically recognize and collapse patterns (e.g., strlen)
- Local optimization (within a node: constant folding, etc.)

All this can be really useful for obfuscated code!

Constant result calculations

Some constant calculations can be easily handled with a user-defined offset (Ctrl-R).

When there are too many offsets...

The answer is obvious: write a script or a plugin :) Here's a very simple one-line script:

    OpOffEx(here, 1, REF_OFF32|REFINFO_NOBASE, -1, EBP, 0);

To make your life even easier, you may assign the script to a hotkey. Press Shift-F2 and enter:

    static main()
    {
      // bind the W key to the function below
      AddHotkey("w", "make_ebp_offset");
    }

    static make_ebp_offset()
    {
      // turn operand 1 at the cursor into an offset with base EBP
      OpOffEx(here, 1, REF_OFF32|REFINFO_NOBASE, -1, EBP, 0);
    }

This trick and many others are explained on http://www.xs4all.nl/~itsme/projects/disassemblers/ida.html

What if there are thousands of such offsets?...

Improve the script to check all instructions for the desired pattern. Here's how to organize a loop over all instructions:

    auto ea, ea2;
    ea2 = MaxEA();
    // visit every head (instruction or data item) in the database
    for ( ea = MinEA(); ea < ea2; ea = NextHead(ea, ea2) )
    {
      if ( !isCode(GetFlags(ea)) )   // skip data items
        continue;
      if ( GetMnem(ea) == "mov" && GetOpnd(ea, 0) == "ebp" )
        Message("%a: found mov ebp!\n", ea);
    }

What if these offsets appear and vanish dynamically?

Well, then you have to create a plugin. It would:
- recognize the desired pattern
- modify the database (create an offset or code, add a comment, etc.)

Such plugins are fully automatic. They hook analysis events (frequently custom_emu). This is the most powerful technique, but, alas, it requires DLL programming in C and using the SDK. Just three wishes for your plugins:
- A switch to turn your plugin off is probably a good idea
- Try to be user-friendly (for example, check whether there is a comment before calling set_cmt; otherwise you may overwrite a user-defined comment)
- Do not exit to the OS in case of errors
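The following Python toy model shows the shape of such a hook together with the three wishes. It is not the IDA SDK: the class, the on_emulate callback and the dictionary standing in for the database are all hypothetical. A real plugin would be written in C against the SDK and hook custom_emu.

```python
class DeobfuscationHook:
    """Toy analysis-event hook; `database` is a hypothetical stand-in
    of the form {"comments": {address: text}}."""

    def __init__(self, database):
        self.db = database
        self.enabled = True     # wish 1: a switch to turn the plugin off
        self.errors = []        # problems are recorded, never fatal

    def on_emulate(self, ea, mnem, operands):
        """Called for each emulated instruction; True means 'handled'."""
        if not self.enabled:
            return False
        try:
            if mnem == "mov" and operands[0] == "ebp":
                self.set_comment(ea, "possible base register setup")
                return True
        except Exception as exc:        # wish 3: never exit to the OS
            self.errors.append((ea, repr(exc)))
        return False

    def set_comment(self, ea, text):
        comments = self.db.setdefault("comments", {})
        if ea not in comments:          # wish 2: keep the user's comments
            comments[ea] = text


db = {"comments": {0x401010: "user note"}}
hook = DeobfuscationHook(db)
hook.on_emulate(0x401000, "mov", ["ebp", "esp"])
hook.on_emulate(0x401010, "mov", ["ebp", "eax"])    # user comment is kept
print(db["comments"])
```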

Constant calculations – some ideas

Create a script or plugin to:
- add calculation results as comments (what about a script that traces the application and adds register values as comments for each instruction?)
- modify the database and simplify instructions
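For the "simplify instructions" part, here is a minimal constant-folding sketch in Python, assuming a hypothetical (mnemonic, register, immediate) instruction format and 32-bit wraparound arithmetic. It only handles straight-line code that starts with a mov of an immediate and touches a single register. Flags are not modeled, and a real plugin would also have to make sure nothing jumps into the middle of the folded sequence.

```python
MASK = 0xFFFFFFFF  # 32-bit registers

# How each supported instruction updates the register value `acc`.
OPS = {
    "mov": lambda acc, imm: imm & MASK,
    "add": lambda acc, imm: (acc + imm) & MASK,
    "sub": lambda acc, imm: (acc - imm) & MASK,
    "xor": lambda acc, imm: acc ^ (imm & MASK),
    "and": lambda acc, imm: acc & imm,
    "or": lambda acc, imm: acc | (imm & MASK),
    "not": lambda acc, imm: ~acc & MASK,
    "neg": lambda acc, imm: -acc & MASK,
}


def fold(sequence):
    """Fold [(mnemonic, register, immediate), ...] into one 'mov reg, imm'.
    Returns None unless the sequence starts with 'mov reg, imm' and every
    instruction updates that same register (not/neg take no immediate)."""
    if not sequence or sequence[0][0] != "mov":
        return None
    reg = sequence[0][1]
    acc = 0
    for mnem, dst, imm in sequence:
        if dst != reg or mnem not in OPS:
            return None
        acc = OPS[mnem](acc, imm)
    return ("mov", reg, acc)


print(fold([("mov", "ecx", 10), ("add", "ecx", 5), ("neg", "ecx", None)]))
```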

Camouflage

- Opaque predicates
- Proprietary virtual machine
- Encryption/compression
- Message-driven systems
- No direct references: PIC (position independent code)
- Hidden execution flow using SEH
- Rootkit techniques
- Hidden entry point (TLS callbacks, entry point in the resources section or in the header)

Opaque predicates

By definition, an opaque predicate is a predicate (an expression that evaluates to either "true" or "false") whose outcome is known to the programmer a priori but which, for a variety of reasons, still needs to be evaluated at run time. In fact, some opaque expressions evaluate to an integer value rather than just true or false:

GetLastError returns 0x57 (ERROR_INVALID_PARAMETER)
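To make the definition concrete, here is a classic textbook opaque predicate (not taken from the slides), sketched in Python. x * (x + 1) is the product of two consecutive integers and therefore always even, and reducing it modulo 2**32 keeps the lowest bit, so the branch outcome is fixed. The sampled check only illustrates the claim; the parity argument is what proves it.

```python
import random

MASK32 = 0xFFFFFFFF


def opaque_is_true(x):
    """x * (x + 1) computed with 32-bit wraparound, as a CPU would.
    The exact product of consecutive integers is even, and reducing it
    modulo 2**32 keeps bit 0, so the result is always True."""
    product = (x * ((x + 1) & MASK32)) & MASK32
    return product % 2 == 0


def sampled_check(n=10_000, seed=1):
    """Spot-check the predicate on edge cases and random inputs."""
    rng = random.Random(seed)
    samples = [0, 1, MASK32] + [rng.getrandbits(32) for _ in range(n)]
    return all(opaque_is_true(x) for x in samples)


print(sampled_check())
```

A static analyzer sees an ordinary data-dependent branch here; only the programmer (or an analyst who spots the pattern) knows that one side is dead.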

Opaque predicates

They come in many varieties. Since IDA cannot determine the outcome statically, we have to find it out ourselves and then:
- inform IDA about the predicate outcome
- prune dead code paths and simplify the code
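Once the outcome is known, pruning is a simple reachability problem. Here is a minimal Python sketch, assuming a hypothetical graph format in which each node maps to a list of (target, label) edges labeled "true", "false" or "always".

```python
def prune(succs, entry, node, outcome):
    """Keep only the edge of `node` selected by `outcome` (a bool), then
    drop every node that is no longer reachable from `entry`."""
    succs = {n: list(edges) for n, edges in succs.items()}
    label = "true" if outcome else "false"
    succs[node] = [(t, "always") for t, lbl in succs[node] if lbl == label]
    # Depth-first search from the entry node.
    reachable, stack = set(), [entry]
    while stack:
        n = stack.pop()
        if n in reachable:
            continue
        reachable.add(n)
        stack.extend(t for t, _ in succs[n])
    return {n: edges for n, edges in succs.items() if n in reachable}


graph = {
    "A": [("B", "true"), ("junk", "false")],   # A ends with the predicate
    "B": [("C", "always")],
    "junk": [("junk2", "always")],
    "junk2": [("C", "always")],
    "C": [],
}
print(prune(graph, "A", "A", True))
```

Removing the dead edge first is what makes the junk nodes disappear: they are only unreachable once the "false" edge is gone.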

Working on graph view or pseudocode is easier

Automate this? How?

Future versions of IDA/Hex-Rays will offer some solutions. Interactivity and extensibility help.

Proprietary virtual machine

Many implementations use this obfuscation method. It requires reverse engineering the virtual machine. Examples:
- Themida & Code Virtualizer (http://www.oreans.com/)
- Various malware

In the general case, building a processor module for the VM is required. Let me show you a simple case.

Bagle malware case

This mass mailer contains the following code sequence:

Bagle - opcodes

Opcode handlers are very simple; I renamed them:

Bagle – opcode table

After renaming all handlers the opcode table was:

Bagle – create opcode enumeration

The following script created an enumeration for all VM opcodes based on the handler names:

Bagle – enumeration ready

We can use this enumeration in the disassembly now. Just declare an array of bytes and convert them to VM_CODES. All this without quitting IDA (in fact, I was in the middle of a debugging session, since there was another layer of protection before the VM).
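Here is a minimal Python sketch, outside IDA, of how such an enumeration can be derived from the handler names. It assumes that handler i in the opcode table has been renamed with a hypothetical vm_ prefix, and the names below are placeholders, not Bagle's real handlers. The resulting name/value pairs are what would go into the VM_CODES enumeration.

```python
def build_opcode_enum(handler_table, prefix="VM_"):
    """handler_table[i] is the (renamed) handler of opcode i.
    Returns {constant_name: opcode}. Opcodes that share a handler get a
    numeric suffix so that every constant name stays unique."""
    enum, seen = {}, {}
    for opcode, handler in enumerate(handler_table):
        short = handler[3:] if handler.startswith("vm_") else handler
        base = prefix + short.upper()
        count = seen.get(base, 0)
        seen[base] = count + 1
        enum[base if count == 0 else f"{base}_{count}"] = opcode
    return enum


# Hypothetical handler names, in opcode order.
VM_CODES = build_opcode_enum(["vm_push", "vm_pop", "vm_xor", "vm_push"])
for name, value in VM_CODES.items():
    print(f"{name} = {value:#04x}")
```

Deriving the names from the handlers keeps a single source of truth: renaming a handler and rerunning the script updates the enumeration.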

Bagle – virtual machine readable

Create an array of bytes, declare them as VM_CODES:

Bagle – VM logic visible

The logic of the VM program became visible but there were immediate constants in the code that required manual intervention:

Bagle – VM decoding automated

The following script solved the problem:

Bagle – comfortable analysis of VM

After assigning a hotkey to the previous script, it was almost the same as having a processor module for the VM. However, another level of deobfuscation is required (0x63FE34B2 ^ 0x9C01CB4D = 0xFFFFFFFF).
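The next deobfuscation level can be approached as a peephole pass over the decoded VM listing. Here is a minimal Python sketch, assuming a hypothetical (mnemonic, immediate) listing in which VM_PUSH_IMM pushes a constant and VM_XOR replaces the two topmost stack values with their XOR. Under those assumptions, it folds the constant pair above into a single push of 0xFFFFFFFF.

```python
def fold_xor_constants(listing):
    """Replace 'push imm1; push imm2; xor' with 'push imm1 ^ imm2'.
    Folding happens as instructions are appended, so chains collapse."""
    out = []
    for insn in listing:
        out.append(insn)
        if (len(out) >= 3 and out[-1][0] == "VM_XOR"
                and out[-2][0] == "VM_PUSH_IMM"
                and out[-3][0] == "VM_PUSH_IMM"):
            a, b = out[-3][1], out[-2][1]
            del out[-3:]
            out.append(("VM_PUSH_IMM", a ^ b))
    return out


listing = [("VM_PUSH_IMM", 0x63FE34B2), ("VM_PUSH_IMM", 0x9C01CB4D),
           ("VM_XOR", None), ("VM_POP", None)]
for mnem, imm in fold_xor_constants(listing):
    print(mnem, "" if imm is None else hex(imm))
```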

VM - summary

We have to:
- analyze the VM opcodes
- give them meaningful, descriptive names
- in simple cases, a simple enumeration will do the job
- in complex cases, a processor module has to be developed
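Whether you go with an enumeration plus scripts or with a processor module, the core is an instruction decoder. Here is a minimal Python sketch of one, assuming a hypothetical opcode numbering in which some opcodes carry a 4-byte little-endian immediate. In a real case, both facts come from reversing the handlers.

```python
import struct

# Hypothetical opcode numbering and operand layout.
OPCODE_NAMES = {0: "VM_PUSH_IMM", 1: "VM_POP", 2: "VM_XOR", 3: "VM_JMP"}
HAS_IMM32 = {"VM_PUSH_IMM", "VM_JMP"}


def decode(bytecode, start=0):
    """Yield (offset, mnemonic, immediate or None) for each VM instruction."""
    pos = start
    while pos < len(bytecode):
        name = OPCODE_NAMES.get(bytecode[pos])
        if name is None:
            raise ValueError(f"unknown VM opcode {bytecode[pos]:#x} at {pos:#x}")
        if name in HAS_IMM32:
            (imm,) = struct.unpack_from("<I", bytecode, pos + 1)
            yield pos, name, imm
            pos += 5
        else:
            yield pos, name, None
            pos += 1


program = bytes([0]) + struct.pack("<I", 0x63FE34B2) + bytes([2, 1])
for offset, name, imm in decode(program):
    print(f"{offset:04x}  {name}" + ("" if imm is None else f" {imm:#010x}"))
```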

It is not _that_ difficult after all ;)

Rolf Rolles created a processor module for a VM: http://www.openrce.org/articles/full_view/28

Executable packing

A plethora of packing methods, good and bad. Manual unpacking is always possible; automatic unpacking would be ideal. There are sample scripts and plugins in IDA:
- uunp: a proof-of-concept unpacker plugin, which exists as an IDC script as well
- unpack: another sample unpacker

IDA has stayed away from this arms race. There are many other solutions available (unpackers, process dumpers, etc.).

Executable packing - approaches

Static analysis: too time-consuming; requires tedious manual work

Dynamic analysis (debugger): much faster; requires a special sandboxed environment; vulnerable to anti-debugger tricks

Code emulation: a good idea, but any widespread emulator will be attacked; emulation imperfections are a problem

No ideal solution...
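One heuristic often used by generic unpackers, whether driven by a debugger or by an emulator, is to watch for execution of memory that the program itself has written. The Python sketch below is a minimal illustration of that heuristic on a hypothetical event stream; it is not a description of how uunp works.

```python
def find_oep_candidate(events):
    """events: iterable of ("write", address, size) or ("exec", address).
    Returns the first executed address that was previously written by
    the program (a candidate original entry point), or None."""
    written = set()
    for event in events:
        if event[0] == "write":
            _, addr, size = event
            written.update(range(addr, addr + size))
        elif event[0] == "exec" and event[1] in written:
            return event[1]
    return None


events = [("exec", 0x500000), ("write", 0x401000, 4),
          ("exec", 0x500004), ("exec", 0x401000)]
print(hex(find_oep_candidate(events)))
```

Packers that run several layers will trigger this more than once, so in practice the candidate has to be confirmed by the analyst.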

Encryption

Methods vary from simple XOR encryption to serious encryption schemes like AES, Blowfish, etc. Since the key must be present to run the executable, the strength of the encryption method does not matter. Ideally, we just let the application decrypt itself and then take a memory snapshot. If only part of the executable is decrypted at a time, then we need to automate the process of taking memory snapshots.
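Automating the snapshots largely means comparing them. Here is a minimal Python sketch that reports which address ranges changed between two snapshots of the same memory region; how the snapshots are taken (debugger, dumper) is deliberately left out.

```python
def changed_ranges(before, after, base=0):
    """Return [(start, end)] address ranges (end exclusive) whose bytes
    differ between two equally sized snapshots starting at `base`."""
    if len(before) != len(after):
        raise ValueError("snapshots must cover the same region")
    ranges, start = [], None
    for i, (a, b) in enumerate(zip(before, after)):
        if a != b and start is None:
            start = i                       # a changed run begins
        elif a == b and start is not None:
            ranges.append((base + start, base + i))
            start = None                    # the run has ended
    if start is not None:
        ranges.append((base + start, base + len(before)))
    return ranges


before = bytes([0x11, 0x22, 0x33, 0x44, 0x55])
after = bytes([0x11, 0x90, 0x90, 0x44, 0xC3])
print([(hex(s), hex(e)) for s, e in changed_ranges(before, after, 0x401000)])
```

The changed ranges are the places worth re-analyzing after each decryption step.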

Position independent code

No fixed addresses means no xrefs. Analysis is harder, but user-defined offsets can help.
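A common source of such code is the "call next / next: pop ebp" idiom, after which EBP holds the runtime address of next. Here is a minimal Python sketch of turning [ebp+disp] operands into absolute addresses under that assumption, which is roughly what a user-defined offset with base EBP expresses. The function and parameter names are hypothetical.

```python
def resolve_pic_refs(next_address, displacements, load_delta=0):
    """next_address: address of the 'pop ebp' right after 'call next'
    (the value that call pushes). displacements: disp values seen in
    [ebp+disp] operands. load_delta: relocation applied at run time,
    0 if the code runs where it was analyzed."""
    base = next_address + load_delta
    return {disp: (base + disp) & 0xFFFFFFFF for disp in displacements}


refs = resolve_pic_refs(0x401005, [0x10, -5])
print({d: hex(a) for d, a in refs.items()})
```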

Anti-debugging tricks

I'm sure you know better, since you are the practitioners :) IDA related:

- Its default settings are not good for hostile code debugging
- Exceptions are handled by the debugger; change this in the debugger settings

Just two simple methods

Use tracing to find anti-debugging tricks

Tracing is slow, but it may be used to find why/when/how the process misbehaves. Sample trace log from naïve code:
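Reading long traces by eye is tedious. Here is a minimal Python sketch that flags a few well-known anti-debugging indicators, assuming a hypothetical "address: instruction" log format. The indicator list is a small sample, not a complete catalogue.

```python
import re

# (pattern, reason) pairs; a small, illustrative selection.
INDICATORS = [
    (re.compile(r"\brdtsc\b", re.IGNORECASE), "timing check (rdtsc)"),
    (re.compile(r"\bint\s*3\b", re.IGNORECASE), "int 3 trick"),
    (re.compile(r"IsDebuggerPresent"), "IsDebuggerPresent call"),
    (re.compile(r"CheckRemoteDebuggerPresent"), "CheckRemoteDebuggerPresent call"),
    (re.compile(r"fs:\[(?:30h|0x30)\]", re.IGNORECASE), "PEB access via fs:[30h]"),
]


def scan_trace(lines):
    """Return (address, reason) for every log line matching an indicator."""
    hits = []
    for line in lines:
        addr, _, insn = line.partition(":")   # split at the first colon
        for pattern, reason in INDICATORS:
            if pattern.search(insn):
                hits.append((addr.strip(), reason))
    return hits


log = ["00401000: rdtsc", "00401002: mov eax, fs:[30h]",
       "00401008: call ds:IsDebuggerPresent", "0040100E: mov eax, ebx",
       "00401010: int 3"]
for addr, reason in scan_trace(log):
    print(addr, reason)
```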

Simple method to neutralize found tricks

Use a “conditional” breakpoint to neutralize tricks encountered while single-stepping. The breakpoint condition for the call instruction is:

    ip=ip+2

Breakpoint conditions may call any defined IDC function (including user-defined ones); this can be used for logging and for changing the application behavior.

Debugger – current state

IDA debugger advantages:
- The annotated database is available during debugging
- All facilities continue to work: FLIRT signatures, function prototypes and argument names, structures, enumerations, your scripts and plugins, etc.
- Scriptable
- Available on multiple platforms (+ remote debugging)

Shortcomings:
- Slow operation
- Multithreaded applications are poorly handled
- Only application-level debugging is available

We continue to work on the shortcomings. Future versions will be better suited for hostile code analysis.

Debugger - ideas

A debugger plugin to configure a 'stealth' mode:
- exceptions are passed to the application
- calls to IsDebuggerPresent, NtSetInformationThread and similar functions are intercepted

An emulating debugger module

A 'stealth' debugger module:
- do not use the standard debugger interface (CreateProcess/WaitForDebugEvent)
- inject a debugger DLL into the process and communicate with it (the must-have functionality is breakpoint handling and memory access)

Higher-level debugging:
- skip hidden code areas, group nodes in the graph view
- source-level debugging using the pseudocode view

Summary

Obfuscation methods vary; there is no single recipe for all cases. The key is to be able to represent the code nicely on the screen. The problem is generic: what to do if IDA displays things not the way I want? The answer is: modify the output!

- Use interactive commands, menus, etc.
- Represent data in a meaningful way
- Hide irrelevant information
- Patch the database and simplify it

Create scripts, plugins, processor modules to avoid routine work.

The obfuscating call instruction

The function returns a few bytes further than it normally would:

Example: solution to obfuscating call

The idea: intercept the emulation of calls to “ex_obfuscating” and create correct xrefs.
- Just a few lines of code (unfortunately, a plugin)
- Can be made more complex if necessary
- The source code of the sample plugin can be found at http://www.hexblog.com/ida_pro/files/ex_deobfuscate.zip
- See the next slide for the essential part of the plugin
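The logic boils down to computing the correct successor of a call. Here is a toy Python model of it (not the code of ex_deobfuscate): calls to functions known to return past their call site get a code reference to the real return address instead of the usual fall-through. The 3-byte skip distance is a hypothetical value.

```python
# Callee name -> number of bytes it skips past the normal return address.
OBFUSCATING_CALLS = {"ex_obfuscating": 3}   # hypothetical skip distance


def successors(ea, size, mnem, target):
    """Addresses where execution can continue after the instruction at
    `ea` (of length `size`), as a static analyzer should record them."""
    if mnem == "call":
        skip = OBFUSCATING_CALLS.get(target)
        if skip is not None:
            return [ea + size + skip]       # real return address only
        return [ea + size]                  # ordinary call: fall through
    if mnem == "jmp":
        return [target]
    if mnem == "ret":
        return []
    return [ea + size]


print(hex(successors(0x401000, 5, "call", "ex_obfuscating")[0]))
```

The bytes between the call and the real return address are never executed, so they can be left undefined or hidden.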

Plugin to handle weird call instructions

Deobfuscated code

Note the arrow on the left side of the listing. The graph could be simplified further by a plugin.

The “thank you” slide

Thank you for your attention! Questions?
