deobfuscation - optimice
TRANSCRIPT
-
8/10/2019 Deobfuscation - optimice
1/28
-
8/10/2019 Deobfuscation - optimice
2/28
Overview
Why?Project goal?How?
DisassemblyInstruction semanticsOptimizationsAssembling
DemoQuestions?
-
8/10/2019 Deobfuscation - optimice
3/28
Why?
To name a fewX86 is complex
2 books, ~1600 pages of instructions
Obfuscated code = complexity++Disassembly is not always prettyDebugging can help mitigate some problemsbut not all
-
8/10/2019 Deobfuscation - optimice
4/28
-
8/10/2019 Deobfuscation - optimice
5/28
Why?
Now what?
-
8/10/2019 Deobfuscation - optimice
6/28
Why?
No public/opensource tool for deobfuscationNo framework to analyze instruction semanticsFun thing to do?
To speed up thingsReuse code for some other projects
-
8/10/2019 Deobfuscation - optimice
7/28
Project goal?
Rewrite code to fix disassembly representationproblemsBuild framework to analyze instruction tainting andsemantics
Extend it for automatic deobfuscationExpose API
Ease development of heuristic deobfuscation rules andcode transformationsExperiment with code transformations
-
8/10/2019 Deobfuscation - optimice
8/28
How? - Disassembly
Main disassembly unit is a functionFunction representation should have all instructionsvisibleProblems (for reversers)
Basic block scatteringNot a real problem for disassembler
Fake paths in conditional jumps lead to broken disassembly(opaque predicates)Instruction overlapping
Not a real problem for disassembler
-
8/10/2019 Deobfuscation - optimice
9/28
How? - Disassembly
JCC path leads to broken disassemblyReplace it with RET , add comment to instruction and continueThis way code can be transformed to a function
-
8/10/2019 Deobfuscation - optimice
10/28
How? - Disassembly
Instruction overlapping hides code pathsDisassembly graph should contain all instructions
-
8/10/2019 Deobfuscation - optimice
11/28
-
8/10/2019 Deobfuscation - optimice
12/28
How? - Disassembly
Nodes represent InstructionsInstruction contains the following information:
OriginEA, Mnemonic, Disassembly, Prefix, Operands,Opcode, Operand types
Instruction information populated from two sources:Information from IDA API
GetMnem (), GetOpnd (), GetOpType ()
Information derived from GetDisasm () API
-
8/10/2019 Deobfuscation - optimice
13/28
-
8/10/2019 Deobfuscation - optimice
14/28
How? Disassembly - Functions
Function abstracted as a ClassFunction = Basic Blocks + CFGCFG stored as two graphs
References from location (GetRefsFrom)References to location (GetRefsTo)
Some of the exposed functions are: GetRefsFrom (),GetRefsTo (), DFSFalseTraverseBlocks ),
CFG optimizations mainly operate on Function class
-
8/10/2019 Deobfuscation - optimice
15/28
How? Disassembly Basic Blocks
Basic Block implemented as linked list in FunctionClassEach entry is an Instruction Class instanceInstruction stores relevant instruction data:
Prefix, mnemonic, operands, types, values, comments
Stores instruction information from two sources:IDAGetOp *() functionsParsing of GetDisasm () string, regex style
-
8/10/2019 Deobfuscation - optimice
16/28
-
8/10/2019 Deobfuscation - optimice
17/28
How? Instruction Semantics
Implemented in TaintInstr ClassContains information about:
Source and destination operands (displayed and hidden)Flag modificationSide effects (e.g. ESP+4 for POP )Ring association ( LLDT )
BlockTainting Class automates process on blocks
Tainting information necessary to perform safeoptimizations
-
8/10/2019 Deobfuscation - optimice
18/28
How? Overall
FunctionCFG information
Basic BlocksInstruction grouping
InstructionOpcode information
Taint informationOperands information
Function
Basic Block
InstructionTaint
Information
-
8/10/2019 Deobfuscation - optimice
19/28
How? Optimizations
We have foundation to analyze codeIts time to exploit some algorithms Four main types of optimizations:
CFG reductionsJCC reductionJMP merging
Dead code removalHeuristic rulesConstant propagation and folding (TODO)
-
8/10/2019 Deobfuscation - optimice
20/28
How? Optimizations - CFG
JCC reductionsJCC path depends on flags statusUse tainting information to detect constant flagsReplace JCC with JMPe.g. AND , clears OF and CF flags
[JO, JNO, JC, JB, ] all take single path
Results in smaller graphs and better/more
precise disassemblyRemoves fake paths that break creation offunctions and mess up disassembly
-
8/10/2019 Deobfuscation - optimice
21/28
How? Optimizations - CFG
JMP mergingIf current block ends in JMP andnext block has only single reference then merge them
Increases block size, reduces CFG complexityCode optimizations are block based so merging caninfluence a lot the final code quality
-
8/10/2019 Deobfuscation - optimice
22/28
JCC
How? Optimizations - CFG
Two staged CFG optimization:
BLOCK 2
BLOCK 3
JMP
BLOCK 2
BLOCK 1
BLOCK 1,2
BLOCK 1
-
8/10/2019 Deobfuscation - optimice
23/28
-
8/10/2019 Deobfuscation - optimice
24/28
How? Optimizations Rule based
There is obfuscation which bothers you and isntautomagically removed?Adding rule based optimization is easy?
-
8/10/2019 Deobfuscation - optimice
25/28
How? Assembling
idaapi.Assemble () ?, we do not support it. It is very limited and canhandle only some trivial instructions. We do nothave plans to improve or modify it . Ilfak
Sensitive to syntaxRemember GetMnem ()?IDA before 5.5 cant assemble JCC easily BUT, it can handle most instructions if you play nice(Batch(1) is your friend)
-
8/10/2019 Deobfuscation - optimice
26/28
DEMO TIME
-
8/10/2019 Deobfuscation - optimice
27/28
Conclusion
It can remove static obfuscationsYou can feed it data from disassembler for betterresults
Tool chaining
Work in progressIt has bugs :D send samples and will fix themGot ideas? Share them.
You can extend, improve, contributeShouts: n00ne, bzdrnja, tox, haarp, MazeGen,RolfRolles, all gnoblets, reddit/RE
-
8/10/2019 Deobfuscation - optimice
28/28
Thank you for your attentionQuestions?
http://code.google.com/p/optimice