The College of William & Mary
Mimimorphism: A New Approach to Binary Code Obfuscation
Zhenyu Wu, Steven Gianvecchio, Mengjun Xie
Advisor: Dr. Haining Wang

Malware Propagation & Detection
Internet & Ubiquitous Computing
◦ Billions of networked computers
◦ Playground for malware
Suppression Techniques
◦ Static analysis
    Low latency, high throughput
    Widely used, IDS deployable
◦ Dynamic analysis

The Game of Hide and Seek
Un-obfuscated
◦ Binary in plain form
Oligomorphism
◦ Simple transformation (XOR)
Polymorphism
◦ Compression and encryption
Metamorphism
◦ Meta transformation (P-code)
State of the Art
◦ Control-flow encryption
◦ Byte frequency manipulation

Unique substring
◦ Segments of the binary
Algorithmic detection
◦ Built-in transformations
Statistical analysis
◦ Anomalies in code body
Advanced pattern matching
◦ N-gram signatures
Semantic analysis
◦ Persistent high-level fingerprints

Fugitive On The Run
Polymorphism
◦ Compression & Encryption
Nobody looks like a small dark box!

Fugitive On The Run
Metamorphism
◦ Reordering Components
Cannot evade feature detections
[Image: "Wanted, $5,000,000" poster]

Fugitive On The Run
Control Flow Encryption
◦ Prevent feature analysis
Increases suspicion

Fugitive On The Run
The Real Player
◦ Assume other people’s identity (Mimicry)

Fugitive On The Run
Lessons Learned:
◦ Evasion without obfuscating features
◦ Evasion by refusing inspection
◦ Evasion by mimicking
    Obfuscates original features
    Open to inspection, but disguised against detection

Binary Executable Mimicry
Mimimorphism:
◦ A reversible transformation of an executable that produces output statically resembling other benign programs
◦ Characteristics:
    Completely erases features of the original binary
    High-order statistics match those of benign executables
    Transformed payload consists of “meaningful” control flows that closely resemble those of benign executables

Mimic Functions
Text Steganography Technique
◦ Transforms the input data and produces mimicry output copies that assume the statistical and grammatical (structural) properties of another type of data
◦ Originally proposed by Peter Wayner as a means to transport sensitive data under harsh surveillance
    Novel use of Huffman coding

Mimic Functions
Huffman Coding
◦ Digesting
    Builds a Huffman tree according to the symbol frequency
◦ Encoding
    Removes redundancies of the input data using a given Huffman tree
◦ Decoding
    Recovers the original data from the “condensed” data by emitting symbols according to the original Huffman tree
[Figure: Huffman Tree with codes s = 1, m = 00, a = 01; “mass” (32 bits) encodes to 000111 (6 bits)]
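
The three Huffman steps named on this slide can be sketched in Python. This is a minimal illustration rather than the authors' code: the function names `digest`, `encode`, and `decode` are chosen here to mirror the slide's terms, and `SLIDE_CODES` is the code table read off the slide's example tree (s = 1, m = 00, a = 01).

```python
import heapq
from collections import Counter

def digest(sample):
    """Digesting: build a Huffman code table {symbol: bits} from symbol frequencies."""
    freq = Counter(sample)
    # Heap entries are (weight, tie_breaker, partial code table).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        # Merging two subtrees prepends one branch bit to every code below them.
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]

def encode(data, codes):
    """Encoding: replace each symbol by its variable-length code."""
    return "".join(codes[s] for s in data)

def decode(bits, codes):
    """Decoding: walk the bits and emit a symbol whenever a full code is read."""
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)

# Code table taken from the slide's example tree.
SLIDE_CODES = {"s": "1", "m": "00", "a": "01"}

print(encode("mass", SLIDE_CODES))           # 000111
print(decode("000111", SLIDE_CODES))         # mass
print(len(encode("mass", digest("mass"))))   # 6
```

A tree built by `digest("mass")` may label its branches differently from the slide, but the code lengths, and therefore the 6-bit total, are the same.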

Mimic Functions
What if we decode a piece of random data?
◦ Produces “meaningless” data, but
    The output exhibits symbol frequencies similar to the digest
    - and -
    The input data can be recovered by Huffman encoding
Regular Mimic Function
◦ Learn: Build a Huffman tree from sample text
◦ Mimicry: Huffman decode on input (randomized)
◦ Recover: Huffman encode
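
The learn / mimicry / recover cycle of a regular mimic function can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the names `learn`, `mimic`, and `recover` are hypothetical, the input is a string of "0"/"1" characters, the sample must contain at least two distinct symbols, and a trailing partial code word is completed with zero bits (so the original bit length is passed to `recover`).

```python
import heapq
import random
from collections import Counter

def learn(sample):
    """Learn: build a Huffman code table {symbol: bits} from sample text."""
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(Counter(sample).items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]

def mimic(bits, codes):
    """Mimicry: Huffman-DECODE arbitrary bits into symbols of the sample."""
    inverse = {c: s for s, c in codes.items()}
    out, cur, i = [], "", 0
    while i < len(bits) or cur:
        # Past the end of the input, pad with zeros to finish the last symbol.
        cur += bits[i] if i < len(bits) else "0"
        i += 1
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)

def recover(text, codes, nbits):
    """Recover: Huffman-ENCODE the mimicry text and drop the padding bits."""
    return "".join(codes[s] for s in text)[:nbits]

random.seed(0)
secret = "".join(random.choice("01") for _ in range(64))  # stands in for randomized input
codes = learn("the quick brown fox jumps over the lazy dog")
text = mimic(secret, codes)
print(repr(text))                        # garbled text using only the sample's symbols
assert recover(text, codes, len(secret)) == secret
```

The output is illegible because each symbol is drawn independently of its neighbours, which is the insufficiency the high-order mimic function addresses.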

Mimic Functions
Insufficiencies
◦ Produces illegible, garbled text
◦ Symbol frequencies follow a 2^(-n) distribution
High-order Mimic Function
◦ Captures interdependencies
    Builds multiple Huffman trees
    One for each unique symbol prefix
◦ Produces “sensible” text with much more “natural” symbol frequency distributions
[Figure: Huffman “Forest”, one tree per prefix: “chi” → {c, l, n}, “ins” → {p, t}, “rou” → {t, n, g}]
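
A high-order mimic function keeps one Huffman tree per prefix, so the next symbol is always chosen in context. The Python sketch below illustrates the idea under assumptions of its own: the names (`huffman`, `learn`, `mimic`, `recover`), the circular wrap of the sample (so every prefix has a successor), and the convention that a prefix with a single successor consumes no input bits are choices made here, not details taken from the slides.

```python
import heapq
import random
from collections import Counter, defaultdict

def huffman(freq):
    """Return {symbol: bits}; a lone symbol gets the empty code."""
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]

def learn(sample, order):
    """Build the Huffman forest: one code table per length-`order` prefix."""
    wrapped = sample + sample[:order]  # wrap around so no prefix is a dead end
    followers = defaultdict(Counter)
    for i in range(len(sample)):
        followers[wrapped[i:i + order]][wrapped[i + order]] += 1
    return {prefix: huffman(f) for prefix, f in followers.items()}

def mimic(bits, forest, seed, max_len=10_000):
    """Decode bits into text, switching trees as the prefix slides forward."""
    out, prefix, i = seed, seed, 0
    while i < len(bits):
        inverse = {c: s for s, c in forest[prefix].items()}
        cur = ""
        while cur not in inverse:  # a single-successor prefix matches "" at once
            cur += bits[i] if i < len(bits) else "0"
            i += 1
        out += inverse[cur]
        prefix = out[-len(seed):]
        if len(out) > max_len:
            raise RuntimeError("sample has no branching; input is never consumed")
    return out

def recover(text, forest, order, nbits):
    """Re-encode each symbol with the tree of the prefix that preceded it."""
    bits = [forest[text[j - order:j]][text[j]] for j in range(order, len(text))]
    return "".join(bits)[:nbits]

SAMPLE = "the cat sat on the mat and the rat ate the oat "
ORDER = 2
forest = learn(SAMPLE, ORDER)
random.seed(1)
secret = "".join(random.choice("01") for _ in range(48))
text = mimic(secret, forest, SAMPLE[:ORDER])
print(repr(text))
assert recover(text, forest, ORDER, len(secret)) == secret
```

Every run of `ORDER + 1` symbols in the output also occurs in the sample, which is why higher orders read as more “sensible”.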

Mimicry Text Sample
Mimicry of Peter Wayner’s paper
◦ Produced by 6th order mimic function
Each of these historical reason, I don’t recommend using gA(t) to choose the safe. These one-to-one encoded with n leaves and punctuation. The starting every intended to find the same order mimic files. A Method is to break the trees by constructing the mimics the path down the most even though, offer no way that is, in this paper. Figure will not overflow memory. These produced by truncating letter. This need to handle n-th ordered compartment of nonsense words cannot bear any resemblance to B because this task is a Huffman showed in [1], [2], [3] among others.

Mimimorphism
The Challenge: Machine Language Mimicking
◦ Consists of instructions and control flows
    Each instruction has a strict format to follow
    Machines never make a “typo” or use the wrong “tense”!
◦ A mimic function has no knowledge of instructions
    Often makes mistakes when generating instructions
    Has a low success rate of creating mimicry control flows
Our Solution
◦ Integrate a custom assembler / disassembler
◦ Help the mimic function understand the language

Mimimorphism: Digesting
[Figure: Exec. Binaries (Mimicry Target) → Disassemble → High Order Instruction Mimic Function → Mimicry Digest (Instruction Huffman Forest)]
[Figure sequence: each disassembled instruction (e.g., MOV) is parsed into a COMMON_INST structure with the fields Inst. Prefixes (atomic op., repeat, operand size, etc.), ModR/M (Mod / Reg. / R/M), SIB (Scale / Idx. / Base), and Displacement. The opcode is recorded in the Instruction Huffman Tree selected by the current Instruction Prefix, i.e., the preceding instructions (e.g., PUSH, DEC, MOV, XOR). The field values (e.g., 16bit / REP; EAX / ECX / EDX; 2x8+16 / 3x4+0) are recorded in that opcode's Instruction Encoding Template. The Instruction Prefix then slides forward by one instruction. The resulting collection of trees is the Mimimorphic Digest.]
WILLIAM k MARY 25
Encoding
Mimimorphism: Encoding
Binary Data
PRNG
High Order Instruction
Mimic Function
Mimicry Digest
Assemble
MimicryBinaries
The College of
WILLIAM k MARY 26
Encoding
Mimimorphism: Encoding
Binary Data
01001001100101010001010010001001
XOR
PUSH
DEC
Instruction Prefix
Mimicry Digest
MOV
INC
PUSH
0 1
0 1
XOR
PUSH
DEC
Instruction Huffman Tree
The College of
WILLIAM k MARY 27
Instruction Encoding Template
Encoding
Mimimorphism: Encoding
Binary Data
01001001100101010001010010001001
MOV
INC
PUSH
0 1
0 1
Instruction Huffman Tree
MOV
XOR
PUSH
DEC
MOV
Inst. Prefix
16bit REP
0 1
ModR/M
EAX
0 1
ECX EDX
0 1
……
DisplacementSIB
2x8+16 3x4+0
0 1
The College of
WILLIAM k MARY 28
Instruction Encoding Template
Encoding
Mimimorphism: Encoding
01001001100101010001010010001001
MOV
Inst. Prefix
16bit REP
0 1
ModR/M
EAX
0 1
ECX EDX
0 1
……
DisplacementSIB
2x8+16 3x4+0
0 1
16bit
ECX
3x4+0
The College of
WILLIAM k MARY 29
Encoding
Mimimorphism: Encoding
01001001100101010001010010001001
MOV
Inst. Prefixes(Atomic op., repeat, operand size, etc.)
ModR/M(Mod / Reg. / R/M)
SIB(Scale / Idx. / Base)
Displacement
COMMON_INST Structure
Instruction Encoding TemplateMOV
Inst. Prefix
16bit REP
0 1
ModR/M
EAX
0 1
ECX EDX
0 1
……
DisplacementSIB
2x8+16 3x4+0
0 1
16bit
ECX
3x4+0
The College of
WILLIAM k MARY 30
Encoding
Mimimorphism: Encoding
01001001100101010001010010001001
MOV
Inst. Prefixes(Atomic op., repeat, operand size, etc.)
ModR/M(Mod / Reg. / R/M)
SIB(Scale / Idx. / Base)
Displacement
COMMON_INST Structure
PUSH
DEC
?
XOR
MOV
The College of
WILLIAM k MARY 31
Encoding
Mimimorphism: Encoding
01001001100101010001010010001001
PUSH
DEC
MOV
XOR
MOV
XOR
PUSH
DEC
MOV
Instruction Prefix
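
The two-stage selection shown in the encoding slides (opcode from a prefix-specific tree, then field values from a per-opcode template) can be imitated with a toy model. The Python sketch below is only an illustration under heavy simplification: instructions are (opcode, operand) pairs rather than real x86 encodings, each template has a single operand field instead of prefixes / ModR/M / SIB / displacement, and `PROGRAM`, `BitReader`, `digest`, `encode`, and `decode` are hypothetical names. The PRNG and the assembler / disassembler are omitted.

```python
import heapq
import random
from collections import Counter, defaultdict

def huffman(freq):
    """Return {symbol: bits}; a lone symbol gets the empty code."""
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]

class BitReader:
    """Consumes input bits while walking a code table (zero-padded at the end)."""
    def __init__(self, bits):
        self.bits, self.pos = bits, 0

    def pick(self, codes):
        inverse = {c: s for s, c in codes.items()}
        cur = ""
        while cur not in inverse:
            cur += self.bits[self.pos] if self.pos < len(self.bits) else "0"
            self.pos += 1
        return inverse[cur]

def digest(program, order):
    """Build opcode trees keyed by instruction prefix, plus per-opcode templates."""
    ops = [op for op, _ in program]
    wrapped = ops + ops[:order]  # circular, so every prefix has a successor
    next_op, operand = defaultdict(Counter), defaultdict(Counter)
    for i, (op, arg) in enumerate(program):
        next_op[tuple(wrapped[i:i + order])][wrapped[i + order]] += 1
        operand[op][arg] += 1
    return ({p: huffman(f) for p, f in next_op.items()},
            {op: huffman(f) for op, f in operand.items()})

def encode(bits, forest, templates, seed, max_len=10_000):
    """Turn input bits into a mimicry instruction stream."""
    reader, prefix, out = BitReader(bits), tuple(seed), []
    while reader.pos < len(bits):
        op = reader.pick(forest[prefix])   # stage 1: opcode, in context
        arg = reader.pick(templates[op])   # stage 2: fields, from the template
        out.append((op, arg))
        prefix = prefix[1:] + (op,)
        if len(out) > max_len:
            raise RuntimeError("digest has no branching; input is never consumed")
    return out

def decode(mimicry, forest, templates, seed, nbits):
    """Recover the input bits by replaying the same tree choices."""
    prefix, bits = tuple(seed), []
    for op, arg in mimicry:
        bits.append(forest[prefix][op] + templates[op][arg])
        prefix = prefix[1:] + (op,)
    return "".join(bits)[:nbits]

# Toy "disassembly" of a mimicry target (illustrative values only).
PROGRAM = [("PUSH", "EAX"), ("MOV", "ECX"), ("XOR", "EAX"), ("PUSH", "ECX"),
           ("DEC", "EDX"), ("MOV", "EAX"), ("XOR", "ECX"), ("PUSH", "EAX"),
           ("DEC", "ECX"), ("MOV", "EDX"), ("INC", "EAX"), ("PUSH", "EDX")]
ORDER = 2
forest, templates = digest(PROGRAM, ORDER)
seed = [op for op, _ in PROGRAM[:ORDER]]
random.seed(7)
secret = "".join(random.choice("01") for _ in range(32))
mimicry = encode(secret, forest, templates, seed)
assert decode(mimicry, forest, templates, seed, len(secret)) == secret
```

Because every opcode is drawn from a tree built for its actual predecessors, each run of `ORDER + 1` opcodes in the output also occurs in the target program.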

Mimimorphism: Decoding
[Figure: Mimicry Binaries → Disassemble → High Order Instruction Mimic Function (using the Mimicry Digest) → PRNG → Binary Data]

Experimental Setup
Training
◦ Select 100 Windows XP system files as mimicry target
    They represent typical legitimate binaries
◦ Trained using 7th and 8th order mimimorphic engines
    Most control flow basic blocks have 7-8 instructions
Evaluations
◦ Statistical Anomaly Tests
    Kolmogorov-Smirnov Test & Entropy Test
◦ Semantic Detection Test
    Control Flow Fingerprinting

Evaluation Results
Statistical Tests
◦ Kolmogorov-Smirnov Test
    Maximum byte frequency distribution differences
    Legitimate: 0.074±0.045; Mimimorphic: 0.093±0.006
◦ Entropy Test
    Measurement of predictability (or randomness) of data
    Legitimate: 6.353±0.258; Mimimorphic: 6.528±0.021
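
Both statistics can be computed directly from byte counts. The Python sketch below shows one plausible reading of the two tests; the slides do not say which reference distribution or implementation the authors used, so the two-sample form of the Kolmogorov-Smirnov statistic and the function names `byte_entropy` and `ks_statistic` are assumptions.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def ks_statistic(a: bytes, b: bytes) -> float:
    """Largest gap between the cumulative byte-frequency distributions of a and b."""
    count_a, count_b = Counter(a), Counter(b)
    cdf_a = cdf_b = worst = 0.0
    for value in range(256):
        cdf_a += count_a[value] / len(a)
        cdf_b += count_b[value] / len(b)
        worst = max(worst, abs(cdf_a - cdf_b))
    return worst

uniform = bytes(range(256))
print(byte_entropy(uniform))                        # 8.0: every byte equally likely
print(ks_statistic(uniform, uniform))               # 0.0: identical distributions
print(ks_statistic(b"\x00" * 4, b"\xff" * 4))       # 1.0: no overlap at all
```

Under this reading, a lower Kolmogorov-Smirnov value means the byte distribution is closer to the reference, and an entropy close to that of legitimate binaries means the data is neither suspiciously random nor suspiciously regular.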

Evaluation Results
Semantic Tests
◦ Control Flow Fingerprinting
    Statically analyzes executables (with a special disassembler) and extracts control flow patterns
    Detects malware by matching characteristic control flow patterns (i.e., shared fingerprints)
◦ Between original binary and mimimorphic instances
    Shared fingerprints: the lower the better
    Only 1 out of 100 instances shares a single fingerprint (out of hundreds of thousands of fingerprints)

Evaluation Results
Semantic Tests
◦ Between mimimorphic and legitimate binaries
    Shared fingerprints: the higher the better
    7th order mimimorphic instances:
        Average 1856.46±372.5 (72.93 benign files)
        Minimum 1057 (44 files); Maximum 3321 (92 files)
    8th order mimimorphic instances:
        Average 11407.99±912.42 (81.37 benign files)
        Minimum 9606 (70 files); Maximum 14216 (91 files)

Evaluation Results
Semantic Tests
◦ A sample mimicry control flow pattern
    Reproduced by a 7th order mimimorphic instance

Limitations & Discussions
Application Constraint
◦ Memory consumption: 600MB for 7th order and 1.2GB for 8th order mimimorphic transformation
    Disk-based on-demand digest storage
◦ Size increase: 20x inflation for 7th order and 30x for 8th order mimimorphic transformation
    Typical malware is less than 100KB
    Mimimorphism results in 2~3MB files

Conclusion
We propose mimimorphism as a novel binary obfuscation technique
◦ Enhances high order mimic functions with a custom assembler / disassembler
◦ Achieves evasion by disguising rather than by refusing inspection
◦ Effective against both statistical anomaly detection and semantic fingerprinting tests

Limitations & Discussions
Robustness against other approaches
◦ Automatic n-gram detections
    Typical x86 instruction length: 2.1~2.8 bytes
    8th order mimimorphism can approach 16-gram mimicry
    Existing n-gram detection algorithms can hardly scale up to
◦ Static semantic analysis
    Mimimorphism does not target specific detection techniques
    Focuses on reproducing features from benign programs
    Immune to lower order signature detections
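
The 16-gram figure appears to follow from simple arithmetic, assuming the quoted instruction lengths are averages in bytes and that an n-th order engine reproduces runs of about n consecutive instructions from the mimicry target:

```latex
\[
8 \ \text{instructions} \times 2.1 \ \tfrac{\text{bytes}}{\text{instruction}} = 16.8 \ \text{bytes},
\qquad
8 \ \text{instructions} \times 2.8 \ \tfrac{\text{bytes}}{\text{instruction}} = 22.4 \ \text{bytes}
\]
```

Under these assumptions an 8th order instance reproduces benign byte sequences of roughly 16 bytes or more, so an n-gram detector would need n of about that size to see past the mimicry.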

Limitations & Discussions
Robustness against other approaches
◦ Deep syntactic analysis
    Fails to exactly reproduce high level syntactic features:
        45% of “functions” do not have matching prologue and epilogue
        Many jump instructions go across function boundaries
    Detectable program-level anomalies
        Not all programs follow conventions
        Could lead to false positives

Limitations & Discussions
The Problem of the Unpacker
◦ Mimimorphic transformation does not provide a solution for hiding the unpacker
◦ However, we believe unpackers do benefit from using mimimorphism
    The unpacker is the weakness of polymorphism because it is easy to “spot”: all other payload is not executable!
    All mimimorphic payload is “executable”, so separating unpacker code from the payload becomes non-trivial

Mimimorphism: Decoding
[Figure sequence: the Mimicry Binary is disassembled one instruction at a time into a COMMON_INST structure. The position of the opcode (e.g., MOV) in the Instruction Huffman Tree for the current Instruction Prefix yields Decoded Bits (e.g., 00). The positions of the field values (e.g., 16bit, ECX, 3x4+0) in that opcode's Instruction Encoding Template yield further Decoded Bits (e.g., 0101). The Instruction Prefix then slides forward to include the decoded instruction.]