Implementing Oblivious Hashing Using Overlapped Instruction Encodings
ACM Multimedia and Security ’07
Dallas, TX (USA)
September 20-21, 2007
Mariusz H. Jakubowski
Ramarathnam Venkatesan
Microsoft Research
Matthias Jacob
Nokia Research

Introduction
• Field of work: Software protection
  – Obfuscation and tamper-resistance
  – Prevention (or delaying) of reverse engineering and hacking
  – Securing of content-rights systems (DRM)
• Background: Two specific protection techniques
  – Oblivious hashing (OH): Computing hashes (“fingerprints”) of execution traces
  – Overlapped code: “Jumping into the middle of instructions” to obfuscate and protect against disassembly
• Goals of our work:
  – Apply overlapped code towards obfuscation and tamper-resistance via OH.
  – Study new techniques in terms of formal models, avoiding “ad hoc” approaches.

Overview
• Introduction
• Background
  – Software protection
  – Oblivious hashing (OH)
  – Overlapped code
• Code interleaving
• Conclusion
Oblivious hashing via overlapped code

Software Protection
• Obfuscation
  – Making programs “hard to understand”
• Tamper-resistance
  – Making programs “hard to modify”
• Obfuscation ⇒ tamper-resistance
• Tamper-resistance ⇒ obfuscation?

Formal Obfuscation
• Impossible in general
  – Black-box model (Barak et al.): “Source code” doesn’t help adversary who can examine input-output behavior.
  – Worst-case programs and poly-time attackers
• Possible in specific limited scenarios
  – Secret hiding by hashing (Lynn et al.)
  – Point functions (Wee, Kalai et al.)
• Results difficult to use in practice.

Tamper-Resistance
• Many techniques used in practice, e.g.:
  – Code-integrity checksums (e.g., Atallah et al.’s software guards)
  – Anti-debugging and anti-disassembly methods
  – Virtual machines and interpreters
  – Polymorphic and metamorphic code
• Never-ending battle in a very active field
  – Targets: DRM, CD/DVD protection, games, dongles, licensing, etc.
  – Defenses: Binary packers and “cryptors,” special compilers, transformation tools, programming strategies, etc.
• Current techniques tend to be “ad hoc”:
  – No provable security
  – No analysis of time required to crack protected instances

Tamper-Resistance Model
Abstraction of software tamper-resistance (Dedić et al., IH ’07)
• Program: A graph G
• Execution: A “random” walk on G
• Integrity checks:
  – Probabilistic monitoring of a set of G’s nodes
  – Detection of failures that lead to delayed responses
• Security analysis: “Graph game” on G between attacker and defender
• OH and overlapped code in context of model:
  – Provide a source of integrity checks.
  – Help enforce “local indistinguishability” and other engineering assumptions about implementation.

Oblivious Hashing
• Computation of hashes over program traces
  – Initialize hash values at specific points.
  – Update hashes upon assignments and branches.

Original code:

    int x = 123;

    if (GetUserInput() > 10)
    {
        x = x + 1;
    }
    else
    {
        printf("Hello\n");
    }

Hashed code (after the hash transform):

    INITIALIZE_HASH(hash1);

    int x = 123;
    UPDATE_HASH(hash1, x);

    if (GetUserInput() > 10)
    {
        UPDATE_HASH(hash1, BRANCH_ID_1);
        x = x + 1;
        UPDATE_HASH(hash1, x);
    }
    else
    {
        UPDATE_HASH(hash1, BRANCH_ID_2);
        printf("Hello\n");
    }

    VERIFY_HASH(hash1);
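
The slide names the hashing macros but does not show their bodies. The following is a minimal sketch of one way they could be written in C. The mixing step (an FNV-1a-style xor and multiply), the branch tag values, the two-argument form of VERIFY_HASH, and the helper names hashed_trace and verified_run are all assumptions made for illustration. The parameter `input` stands in for GetUserInput(), and the hypothetical parameter `increment` models an attacker who patches `x = x + 1`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical macro bodies; the slides give only the macro names. */
#define INITIALIZE_HASH(h)       uint32_t h = 2166136261u
#define UPDATE_HASH(h, v)        ((h) = ((h) ^ (uint32_t)(v)) * 16777619u)
#define VERIFY_HASH(h, expected) assert((h) == (expected))

enum { BRANCH_ID_1 = 0xA1, BRANCH_ID_2 = 0xB2 };  /* arbitrary branch tags */

/* Executes the hashed code and returns the hash of its trace.
   increment == 1 is the untampered program. */
static uint32_t hashed_trace(int input, int increment)
{
    INITIALIZE_HASH(hash1);

    int x = 123;
    UPDATE_HASH(hash1, x);

    if (input > 10) {
        UPDATE_HASH(hash1, BRANCH_ID_1);
        x = x + increment;
        UPDATE_HASH(hash1, x);
    } else {
        UPDATE_HASH(hash1, BRANCH_ID_2);
        printf("Hello\n");
    }
    return hash1;
}

/* Runs the untampered code and checks it against a reference hash
   recorded earlier for the same input; aborts on mismatch. */
static void verified_run(int input, uint32_t reference)
{
    uint32_t hash1 = hashed_trace(input, 1);
    VERIFY_HASH(hash1, reference);
}
```

In this sketch the reference value would come from a trusted run on the same input. Any change to an assigned value or to the branch taken changes the final hash, which is the property the hash transform relies on.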

Overlapped Code
• Code sharing among different paths
  – Semantic: Sharing of code blocks among execution paths.
  – Physical: Sharing of code bytes among machine or byte-code instructions.
• Purposes
  – Anti-disassembly and anti-decompilation
  – Obfuscation
  – Tamper-resistance from code sharing and explicit OH

Semantic Overlap
Code section is shared along different paths:

    int win, loss;

    void increase_ctr(int *ctr) { (*ctr)++; }

    int increase_win(void)  { increase_ctr(&win);  return win; }
    int increase_loss(void) { increase_ctr(&loss); return loss; }

Automated via code outlining

Physical Overlap
Sample x86 code: B8 B8 04 05 2D 05 90

Execution and disassembly depend on entry point into code.

  Entry      Bytes             Disassembly
  Offset 0:  B8 B8 04 05 2D    mov eax, 2D0504B8
             05 90             add eax, 90
  Offset 1:  B8 04 05 2D 05    mov eax, 52D0504
             90                nop
  Offset 2:  04 05             add al, 5
             2D 05 90          sub eax, 9005
  Offset 3:  05 2D 05 90       add eax, 90052D
  Offset 4:  2D 05 90          sub eax, 9005
  Offset 5:  05 90             add eax, 90

Instructions that run past the end of the sample are shown with only the immediate bytes that are available.

Note: Disassembly tends to resynchronize naturally – but we can prevent this.
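
The dependence on the entry point can be reproduced with a toy linear-sweep decoder. This is a sketch under stated assumptions: the length table covers only the five opcodes that occur in the sample, and the names insn_length, sweep and SAMPLE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The sample bytes from the slide. */
static const uint8_t SAMPLE[7] = { 0xB8, 0xB8, 0x04, 0x05, 0x2D, 0x05, 0x90 };

/* Instruction length for the few opcodes in the sample; 0 for anything else. */
static size_t insn_length(uint8_t opcode)
{
    switch (opcode) {
    case 0xB8: return 5;  /* mov eax, imm32 */
    case 0x05: return 5;  /* add eax, imm32 */
    case 0x2D: return 5;  /* sub eax, imm32 */
    case 0x04: return 2;  /* add al, imm8   */
    case 0x90: return 1;  /* nop            */
    default:   return 0;
    }
}

/* Decodes forward from `entry`, storing each instruction's start offset.
   A final instruction truncated by the end of the buffer is still recorded,
   as in the slide's listing. Returns the number of starts stored. */
static size_t sweep(const uint8_t *code, size_t n, size_t entry,
                    size_t *starts, size_t max_starts)
{
    size_t count = 0;
    assert(entry < n);
    for (size_t pc = entry; pc < n && count < max_starts; ) {
        size_t len = insn_length(code[pc]);
        if (len == 0)
            break;                /* opcode outside the toy table */
        starts[count++] = pc;
        pc += len;
    }
    return count;
}
```

Entering at offsets 0, 1 and 2 yields instruction starts {0, 5}, {1, 6} and {2, 4}, so the same seven bytes decode as three different instruction streams.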

Disassembly Synchronization
• Often observed in practice, but previously not explained mathematically.
• Limits effectiveness of code overlapping for security.
• Requires explicit anti-synchronization measures to enforce protection.
• Rigorous explanation: Kruskal count

Example of corruption and synchronization:

Original code:

    00411410  55                 push ebp
    00411411  8B EC              mov ebp,esp
    00411413  81 EC C0 00 00 00  sub esp,0C0h
    00411419  53                 push ebx
    0041141A  56                 push esi
    0041141B  57                 push edi
    0041141C  8D BD 40 FF FF FF  lea edi,[ebp-0C0h]

With the byte at 00411413 corrupted:

    00411410  55                 push ebp
    00411411  8B EC              mov ebp,esp
    00411413  12 EC              adc ch,ah                  <- corrupted byte
    00411415  C0 00 00           rol byte ptr [eax],0
    00411418  00 53 56           add byte ptr [ebx+56h],dl
    0041141B  57                 push edi                   <- synchronization point
    0041141C  8D BD 40 FF FF FF  lea edi,[ebp-0C0h]

Disassembly Synchronization
• Disassembly: A “leapfrog” process over code bytes
  – Each byte address contains an instruction of a definite length.
  – After disassembling an instruction, a disassembler skips to the next instruction.
• Example: Sequence of instruction lengths at consecutive offsets:
    3 4 6 2 6 3 4 5 3 3 5 4 2 7 3 1 4

    Disassembly at offset   Instruction lengths encountered
    0                       3 2 3 3 4 1 4
    1                       4 3 3 4 1 4
    2                       6 3 4 1 4
    3                       2 3 3 4 1 4
    4                       6 5 1 4

  Synchronization point: all five disassemblies coincide from offset 15 onward (the trailing “1 4”).
• Kruskal count: Such disassembly synchronizes in about B^2/16 steps, where B = average # of bytes per instruction.

Disassembly Synchronization
Model of the disassembly process
• Let InstructionLength(address) = length of instruction found at address.
• Starting at “slightly different” addresses x and y, a disassembler iterates:
    x ← x + InstructionLength(x)   (“leapfrog x”)
    y ← y + InstructionLength(y)   (“leapfrog y”)
• Our goal: Compute N = approximate number of steps before any intermediate x is equal to any intermediate y.
• Treat all possible values of x-y as states of a Markov chain.
• N is the coupling time of this Markov chain.
• Kruskal count: N is about B^2/16, where B is the average instruction length.
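
The leapfrog model can be run directly on the instruction-length sequence from the example slide (3 4 6 2 6 3 4 5 3 3 5 4 2 7 3 1 4). The sketch below finds the first offset visited by both walks; the names EXAMPLE_LENGTHS and sync_point are hypothetical, and the code only locates the meeting point. It does not estimate the Kruskal-count bound.

```c
#include <assert.h>
#include <stddef.h>

/* Instruction lengths at consecutive offsets, from the example slide. */
static const int EXAMPLE_LENGTHS[17] =
    { 3, 4, 6, 2, 6, 3, 4, 5, 3, 3, 5, 4, 2, 7, 3, 1, 4 };

/* Leapfrogs from offsets x and y and returns the first offset that both
   walks visit, or -1 if a walk leaves the buffer before they meet. */
static long sync_point(const int *length, size_t n, size_t x, size_t y)
{
    while (x < n && y < n) {
        if (x == y)
            return (long)x;
        /* Walks only move forward, so advance whichever one is behind. */
        if (x < y) {
            assert(length[x] > 0);
            x += (size_t)length[x];
        } else {
            assert(length[y] > 0);
            y += (size_t)length[y];
        }
    }
    return -1;
}
```

On this sequence the walks from offsets 0 and 1 meet at offset 5, those from 0 and 2 meet at offset 8, and those from 2 and 4 meet at offset 15.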

Code Interleaving
• A method to overlap arbitrary code blocks
  – Explicitly prevents disassembly resynchronization
  – Adds tamper-resistance
    • Hash of instruction bytes only (like traditional code checksums)
    • Hash of instruction bytes and program state (like oblivious hashing)
• Basic algorithm
  – Code interspersing: Create a block of interleaved instructions from two code blocks.
  – Code merging: Inject hashing instructions overlapped with existing instructions.

Code Interleaving: Basic Idea

Two input code blocks:

    SEQ1: INST_1
          INST_2

    SEQ2: INST_A
          INST_B

After code interspersing:

    SEQ1: INST_1
          JMP L2
    SEQ2: INST_A
          JMP LB
    L2:   INST_2
          JMP L3
    LB:   INST_B
    L3:

After code merging:

    Disassembly at SEQ1: INST_1 HASH_1 INST_2 HASH_2
    Disassembly at SEQ2: INST_A HASH_A INST_B

Code interspersing: Interleave instructions, injecting jumps as needed to maintain control flow.
Code merging: Replace jumps with hash instructions, maintaining control flow.
  o E.g.: JMP L2; INST_A; JMP LB transforms into HASH_1.
  o HASH_1 contains INST_A and part of HASH_A.
Suitable hash instructions must be found (and fit together like puzzle pieces).
  o Various possibilities identified on x86.
  o Can also design custom byte-codes to maximize utility of overlapping.

Code Interleaving: Example

Two input code blocks (x86):

    SEQ1: C1 E0 02           shl eax, 2
    I11:  40                 inc eax
          C3                 ret

    SEQ2: 48                 dec eax
    I21:  C1 E8 03           shr eax, 3
          C3                 ret

After code interspersing:

    SEQ1: C1 E0 02           shl eax, 2
          EB 03              jmp I11
    SEQ2: 48                 dec eax
          EB 04              jmp I21
    I11:  90                 nop
          40                 inc eax
          EB 03              jmp O
    I21:  C1 E8 03           shr eax, 3
    O:    90                 nop
          C3                 ret

After code merging (OH instructions are the xor/add/sub on ecx; shown in red on the slide):

  Disassembly at SEQ1:

    SEQ1: C1 E0 02           shl eax, 2
          81 F1 48 81 E9 90  xor ecx, 90E98148
    I11:  40                 inc eax
          81 C1 C1 E8 03 90  add ecx, 9003E8C1
    O:    C3                 ret

  Disassembly at SEQ2:

    SEQ2: 48                 dec eax
          81 E9 90 40 81 C1  sub ecx, C1814090
    I21:  C1 E8 03           shr eax, 3
    O:    90                 nop
          C3                 ret
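
The merged bytes can be checked with a toy interpreter. This is a sketch only: it handles just the encodings that appear in the example, and the names MERGED, Regs, imm32 and run are hypothetical. Reading ecx as the hash register is an interpretation of the slide; each ecx immediate consists of code bytes belonging to the other stream, so ecx ends up mixing in code bytes without the program ever reading its own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Merged code bytes from the slide; SEQ1 enters at offset 0, SEQ2 at offset 5. */
static const uint8_t MERGED[17] = {
    0xC1, 0xE0, 0x02, 0x81, 0xF1, 0x48, 0x81, 0xE9, 0x90,
    0x40, 0x81, 0xC1, 0xC1, 0xE8, 0x03, 0x90, 0xC3
};

typedef struct { uint32_t eax, ecx; } Regs;

/* Reads a little-endian 32-bit immediate. */
static uint32_t imm32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Executes from `entry` until ret. Returns 0 on ret, -1 on an unsupported
   or truncated instruction. Supports only the encodings in the example. */
static int run(const uint8_t *code, size_t n, size_t entry, Regs *r)
{
    size_t pc = entry;
    assert(entry < n);
    while (pc < n) {
        uint8_t op = code[pc];
        if (op == 0xC3) {
            return 0;                                   /* ret */
        } else if (op == 0x90) {
            pc += 1;                                    /* nop */
        } else if (op == 0x40) {
            r->eax++; pc += 1;                          /* inc eax */
        } else if (op == 0x48) {
            r->eax--; pc += 1;                          /* dec eax */
        } else if (op == 0xC1 && pc + 2 < n) {          /* shift eax, imm8 */
            uint8_t modrm = code[pc + 1];
            uint8_t count = code[pc + 2] & 31;
            if (modrm == 0xE0) r->eax <<= count;        /* shl eax */
            else if (modrm == 0xE8) r->eax >>= count;   /* shr eax */
            else return -1;
            pc += 3;
        } else if (op == 0x81 && pc + 5 < n) {          /* op ecx, imm32 */
            uint8_t modrm = code[pc + 1];
            uint32_t v = imm32(code + pc + 2);
            if (modrm == 0xF1) r->ecx ^= v;             /* xor ecx */
            else if (modrm == 0xE9) r->ecx -= v;        /* sub ecx */
            else if (modrm == 0xC1) r->ecx += v;        /* add ecx */
            else return -1;
            pc += 6;
        } else {
            return -1;
        }
    }
    return -1;
}
```

Entering at offset 0 with eax = 1 computes (1 << 2) + 1 = 5, the behavior of the original SEQ1. Entering at offset 5 with eax = 17 computes (17 - 1) >> 3 = 2, the behavior of the original SEQ2. In both cases ecx ends with a value derived from the overlapped code bytes.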

Code Interleaving
• Observations
  – Tamper-resistance comes from two main sources:
    • Implicit: Shared instruction bytes
    • Explicit: OH instructions
  – Disassembly synchronization is explicitly prevented.
  – Method enables code-byte hashes even on architectures that do not allow explicit access to code bytes.
• Extensions
  – Iteration to build up complexity
    • Enhances security at little or no implementation cost.
    • Complex (emergent) code patterns and behaviors can arise.
  – Implementation over custom byte codes designed to maximize utility of overlapping (unlike x86)

Experimental Results
• Tool implementation using Vulcan (binary-rewriting framework)
• Reasonable impact on performance, depending on desired security level
• Remaining work on analyzing security in practice

[Figure: Performance impact on SpecINT benchmarks; 0 = no overlapping, 1 = full overlapping]

Conclusion
• Contributions
  – Investigation of overlapped code for software protection
    • Study of disassembly synchronization and other roadblocks
    • Design of code interleaving and outlining to address limitations
    • Integrity checking via oblivious hashing
    • Placement in context of security models, not ad hoc methods
  – Tool implementations to verify practical effectiveness
    • Code interleaving and outlining for x86 binaries
    • Iteration framework to enhance security
• Future work
  – Security analysis in theory and practice
  – Other overlapped-code methods
  – Porting to custom byte-codes