
OBFSCURO: A Commodity Obfuscation Engine on Intel SGX

Adil Ahmad1,⋆, Byunggill Joe2,⋆,†, Yuan Xiao3, Yinqian Zhang3, Insik Shin2, and Byoungyoung Lee1,4

1Purdue University, 2KAIST, 3Ohio State University, 4Seoul National University

Abstract: Program obfuscation is a popular cryptographic construct with a wide range of uses such as IP theft prevention. Although cryptographic solutions for program obfuscation impose impractically high overheads, a recent breakthrough in systematically leveraging trusted hardware has shown promise. However, the existing solution is based on special-purpose trusted hardware, restricting its use-cases to a limited few.

In this paper, we first study whether such obfuscation is feasible on commodity trusted hardware, Intel SGX, and we observe that certain important security considerations are not afforded by commodity hardware. In particular, we found that existing obfuscation/obliviousness schemes are insecure if directly applied to the SGX environment, mainly due to its side-channel limitations. To this end, we present OBFSCURO, the first system providing program obfuscation using commodity trusted hardware, Intel SGX. The key idea is to leverage ORAM-based operations to perform secure code execution and data access. Initially, OBFSCURO transforms the regular program layout into a side-channel-secure and ORAM-compatible layout. Then, OBFSCURO ensures that its ORAM controller always performs data-oblivious accesses in order to protect itself from side-channel attacks. Furthermore, OBFSCURO protects the program from timing-based attacks by ensuring that it always runs for a pre-configured time interval. Along the way, OBFSCURO also introduces systematic optimizations such as a register-based ORAM stash. We provide a thorough security analysis of OBFSCURO along with empirical attack evaluations showing that OBFSCURO can protect SGX program execution from being leaked by access pattern-based and timing-based channels. We also provide detailed performance benchmark results in order to show the practical aspects of OBFSCURO.

I. INTRODUCTION

Program obfuscation [5, 29] is a popular cryptographic construct with interesting and wide-ranging applications in protecting the intellectual property of a software program. As computing trends are rapidly shifting towards cloud-based computing, there is a strong need for systems supporting such a notion of program obfuscation. This need arises from various use-cases where computing is offloaded to remote clouds and the owner of the program desires to shield his/her proprietary algorithm from the cloud providers and/or other tenants. For example, a company like 23andMe [2] could easily outsource its large genomic computations to the cloud without having to worry about the theft of its algorithm.

⋆ The two lead authors contributed equally to this work. † The author did part of this work while visiting Purdue University. The corresponding author is Byoungyoung Lee ([email protected]).

In program obfuscation, a sender, who owns a program, transforms the program to create an obfuscated version that is functionally identical to the original version, but always runs for a fixed time before returning an output. The sender then sends this obfuscated program to a receiver. After that, the receiver runs the obfuscated program within a blackbox-like environment: the receiver cannot see (or infer) any intermediate computational results and/or footprints of the obfuscated program. Consequently, although the receiver can keep running the obfuscated program using any input of his/her choice, the receiver learns nothing about the original program. Therefore, as far as the attacker is concerned, he/she is interacting with a virtual blackbox, which only takes an input and returns the resulting output.

In this sphere, there have been significant (mostly cryptographic) research efforts [7, 11, 16, 25] towards achieving program obfuscation, specifically through additional trust in the underlying system. Recently, there has been a systematic breakthrough, HOP [47], in achieving program obfuscation through relaxed assumptions on trust in the underlying hardware. However, HOP relies on special-purpose hardware, severely limiting its practicality. In particular, the system relies on custom RISC-V processors to conveniently transplant the root of trust to implement the core security logic and securely contain the program code. We believe such convenience is not free: it would be challenging and unrealistic to deploy such custom-built hardware to a majority of end-user machines or cloud-computing machines.

In this paper, we propose OBFSCURO, the first system achieving program obfuscation on commodity hardware. Unlike the previous work relying on special-purpose hardware, OBFSCURO is specifically designed to run on Intel SGX, which already ships with millions of machines in the market. The general idea behind OBFSCURO is similar to HOP: it strictly observes the security protocol of Oblivious RAM (ORAM) [23] to support secure code/data accesses between the CPU and memory subsystems, because the trusted boundary of Intel SGX only includes the CPU itself and excludes memory subsystems. The novelty of OBFSCURO comes from several challenges in supporting such program obfuscation on Intel SGX, ironically because it has to run on commodity hardware. Intel SGX includes many other features that can be abused to invalidate a key security assumption behind program obfuscation (i.e., the obfuscated program should be running within a blackbox-like environment).

This is a preprint, and the final paper will appear in NDSS 2019.

More specifically, researchers have identified that Intel SGX has critical access pattern-based side-channel security flaws. These allow adversaries to infer computational semantics within SGX, thereby breaking the blackbox execution environment. Based on various side-channels, namely page fault [68, 70], cache [8, 24, 57], and branch-prediction [40] attacks, attackers with high privileges, e.g., the OS, can infer substantial information about the execution semantics of the enclave program. For example, previous work [57] has shown how cache attacks can be abused to leak an RSA private key from an enclave program.

As far as access pattern-based side-channels are concerned, the root cause of the problem is that it is challenging to completely hide memory access patterns from privileged adversaries in the current Intel SGX architecture. The reason for this is that the CPU is designed to rely on other subsystems to perform computation on given instructions. In particular, Intel SGX is not designed to secure communication patterns between the CPU and memory-management hardware units (e.g., the MMU/TLB, cache, DRAM, branch-predictors, etc.). For performance reasons, the communication channels and hardware units are designed to be partially shared between trusted and untrusted entities, allowing potentially adversarial entities to observe and collect memory traces exhibited by an SGX enclave.

To address these challenges, OBFSCURO utilizes three main ideas. First, OBFSCURO employs a data-oblivious ORAM implementation. Our work improves on the previously proposed secure ORAM implementations [3, 52, 55] by designing an efficient register-based stash. Second, OBFSCURO designs side-channel resistant scratchpad-based code execution and data access models, in order to neutralize the memory access patterns observed by attackers as well as bridge the gap between traditional ORAM and native program execution. Lastly, OBFSCURO ensures start-to-end obfuscation of the target programs by providing execution time normalization to all applications, thereby protecting the programs against information leakage through timing-based channels.
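To make the term "data-oblivious" concrete, the following is a minimal sketch of an oblivious table lookup. It is an illustration only, not OBFSCURO's controller: the function name `oblivious_lookup` and the `touched` log are hypothetical, the table is assumed to hold non-negative integers, and Python gives no timing guarantees. The point is only that the sequence of touched entries is the same for every requested index.

```python
def oblivious_lookup(table, wanted_index):
    """Return table[wanted_index] while touching every entry in order.

    Assumes non-negative integer entries. `touched` records the access
    pattern an observer would see; it never depends on wanted_index.
    """
    touched = []
    result = 0
    for i, value in enumerate(table):
        touched.append(i)
        mask = -int(i == wanted_index)   # -1 (all ones) on a match, else 0
        result |= value & mask           # keeps value only on a match
    return result, touched
```

Calling `oblivious_lookup([7, 11, 13, 17], 2)` returns the third entry together with the full index sequence, and the same sequence is produced for any other index.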

Our implementation of OBFSCURO is based on the LLVM compiler suite with an installed runtime library. Through compiler instrumentation, we transform a native SGX program’s code into cache-line-granular and ORAM-compatible basic blocks. OBFSCURO forces each basic block to have one data access and one code access at fixed offsets within the basic block, thereby neutralizing branch targets. The code and data access instructions are translated into equivalent jump instructions targeting OBFSCURO’s runtime library functions. Then, OBFSCURO’s runtime library obliviously serves the program with code and data blocks extracted from ORAM storage onto scratchpad memory regions called C-Pad and D-Pad, respectively. Then, the code execution and data access (related to the target program) are always performed at these locations, thereby neutralizing the program’s page table and cache footprints. Lastly, the program is instrumented to keep executing until a user-configured time interval has elapsed, to mitigate the threat of timing channels.

Furthermore, we highlight that although OBFSCURO’s performance overhead is quite high, it is still much faster than the state-of-the-art cryptographic obfuscation schemes. In particular, cryptographic obfuscation techniques (which rely on homomorphic encryption and/or circuit construction as security primitives) are still far from being adopted in practice, largely due to severe performance overheads or limited generality to support generic programs (detailed discussion in §IX). However, by leveraging the root of trust in the underlying commodity hardware, OBFSCURO demonstrates comparatively moderate performance overheads on real-world generic programs.

In broad terms, the contributions made by this paper can be described as follows:

• We dissect commodity off-the-shelf hardware to find the key hardware features that hinder the adoption of program obfuscation in Intel SGX. We also provide a comparison with existing work, explaining how their approaches are insecure or incomplete if directly applied within Intel SGX.

• We present OBFSCURO, the first program obfuscation system built on top of commodity hardware. Motivated by the hardware limitations of Intel SGX, OBFSCURO provides a complete start-to-end program obfuscation solution which can be readily adopted without any modifications by the program developer to legacy code written for SGX programs.

• We provide a thorough security analysis of OBFSCURO showing how it can prevent information leakage through both access pattern-based and timing-based side-channels.

• We provide a performance comparison of OBFSCURO using a diverse set of benchmarking applications as well as a real-world application, OpenSSL. Our experiments indicate that OBFSCURO incurs an average overhead of 51× over native SGX execution for our custom benchmarks and an overhead of 16× to 57× while executing OpenSSL [20].

II. BACKGROUND

A. Intel SGX

Intel SGX is a new set of x86 instructions which have been commoditized since the Intel Skylake architecture. SGX allows user-level programs to create a protected memory region called an enclave, which is inaccessible from other user-level programs as well as privileged components such as the BIOS, OS, hypervisor, etc. The Enclave Page Cache (EPC) consists of physical memory pages that are dedicated to enclaves. The CPU explicitly denies access to the EPC from outside an enclave. Each enclave process is provided its own virtual address space, which is divided into trusted and untrusted parts. The trusted part is allocated pages from the EPC to provide memory integrity and confidentiality. The page tables that translate virtual to physical addresses for EPC pages can be updated by untrusted system components.

B. SGX Side-Channel Attacks

The three most prominent categories of side-channel attacks against Intel SGX are summarized as follows.


Page Table Attacks. As with regular non-enclave processes, the untrusted OS handles page tables for the EPC pages to flexibly provision EPC resources. Previous works [68, 70] have shown that a privileged attacker can exploit page faults and page table walks in order to gain page-granular insight into the execution of an enclave process. Since the OS handles the page tables, it can revoke access to all EPC pages, which will result in page faults, thereby capturing a trace of all page accesses performed by the enclave. Similarly, the attacker can monitor the access/dirty bits present within the page table to find out which page was last accessed without invoking a page fault.
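The page-fault variant can be modeled with a few lines of code. The sketch below is a simplified simulation under stated assumptions, not an attack implementation: it assumes 4 KiB pages and an OS that keeps only the most recently faulting page mapped, and the function name `page_fault_trace` is hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def page_fault_trace(accesses):
    """Page numbers a simulated OS learns from faults.

    Model: only the last-faulting page stays mapped, so every access to
    a different page traps and reveals that page's number.
    """
    mapped = None
    faults = []
    for addr in accesses:
        page = addr // PAGE_SIZE
        if page != mapped:       # access to an unmapped page traps to the OS
            faults.append(page)
            mapped = page        # OS maps this page and revokes the previous one
    return faults
```

In this model the attacker sees one entry per page transition; consecutive accesses within the same page are invisible, which is why the attack is page-granular.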

Cache Attacks. Caches are designed to reduce the access latency of code and data by exploiting the temporal and spatial locality of an application’s execution. The caches are divided into a number of cache sets, which are further divided into fixed-size cache lines (64 B). Recent reports [8, 24, 57] have shown that SGX enclaves are insecure against the Prime+Probe [50] attack. As part of this attack, the attacker runs an attack application which monitors the cache usage of a victim application performing some security-critical operations. During the Prime phase, the attacker fills one or more cache sets with his/her own data, and during the Probe phase, he/she tries to access the data again. If the victim has accessed any of these cache sets, it must have evicted some of the attacker’s cache lines, and subsequent accesses by the attacker will take longer than if the lines had not been evicted. Therefore, an attacker with prior knowledge of the victim application can infer what operation took place (assuming different operations access different cache sets).
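The Prime and Probe phases can be illustrated on a toy cache model. This is a sketch under assumptions, not a real attack: the cache geometry (`NUM_SETS`, `WAYS`), the LRU policy, and the names `ToyCache`, `prime_probe`, and `victim_op` are all hypothetical, and a simulated miss stands in for a slow access.

```python
from collections import OrderedDict

LINE_SIZE = 64   # bytes per cache line, as in the text
NUM_SETS = 8     # toy value (assumption)
WAYS = 4         # toy associativity (assumption)


class ToyCache:
    """Set-associative cache with LRU replacement; tracks tags only."""

    def __init__(self):
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def access(self, addr):
        """Touch addr; return True on a hit, False on a miss."""
        line = addr // LINE_SIZE
        idx, tag = line % NUM_SETS, line // NUM_SETS
        ways = self.sets[idx]
        if tag in ways:
            ways.move_to_end(tag)       # refresh LRU position
            return True
        if len(ways) >= WAYS:
            ways.popitem(last=False)    # evict the least recently used line
        ways[tag] = True
        return False


def attacker_lines(set_idx, base_tag=1000):
    """Attacker-owned addresses that all map to cache set set_idx."""
    return [((base_tag + w) * NUM_SETS + set_idx) * LINE_SIZE for w in range(WAYS)]


def prime_probe(cache, victim):
    """Return the cache sets in which the Probe phase observed a miss."""
    for s in range(NUM_SETS):           # Prime: fill every set
        for a in attacker_lines(s):
            cache.access(a)
    victim(cache)                       # the victim runs in between
    touched = []
    for s in range(NUM_SETS):           # Probe: re-access the primed lines
        hits = [cache.access(a) for a in attacker_lines(s)]
        if not all(hits):
            touched.append(s)
    return touched


def victim_op(secret_bit):
    """Hypothetical victim whose accessed line depends on a secret bit."""
    addr = 2 * LINE_SIZE if secret_bit else 5 * LINE_SIZE
    return lambda cache: cache.access(addr)
```

With this model, the set reported by `prime_probe` differs between `victim_op(1)` and `victim_op(0)`, which is exactly the inference the paragraph above describes.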

Branch Prediction Attacks. The Last Branch Record (LBR) saves the history of recently taken branches, which can be referenced by developers for further optimization. The LBR stores information including the source/target address of a branch and a flag indicating whether the branch was taken. SGX disables direct reporting of the LBR information outside the enclave. However, recent reports [40] have shown how it can be indirectly inferred from outside the enclave. To perform this attack, the attacker leverages prior information on the sources and destinations of the branches in a target program. Next, the attacker writes shadow code for a set of branches within the program. The attacker executes both the victim and the shadow code in parallel. Finally, the attacker monitors the shadow code for mispredictions (penalized by extra CPU cycles) to figure out which branch was taken by the enclave.

C. ORAM

ORAM [23] is a well-known cryptographic technique which provides secure access to an encrypted memory region located in a remote and untrusted server. ORAM achieves secure memory access by (a) accessing multiple memory locations instead of a single memory location and (b) re-shuffling and re-encrypting the extracted memory regions with a random seed. Path ORAM [63] is an improved variant of ORAM which uses a binary tree-like formation to store the encrypted memory on the server. Each node within the tree is composed of K blocks, where K is a fixed number pre-defined during initialization. An ORAM tree contains both real blocks, i.e., with actual client data, and dummy blocks, i.e., with dummy data, meant to fool an attacker. The number of real blocks within a tree of L leaves can be at most L in order to provide the security guarantees of ORAM. The tree is stored within the untrusted storage in an encrypted format.

Fig. 1: Path ORAM illustration. [Figure: three panels, (1) initial state, (2) read operation, and (3) write operation, each showing the ORAM tree with real blocks A to D and dummy blocks, the stash, and the position map.]

Using Path ORAM, the client runs an ORAM controller within a small, completely trusted memory region. There are two key data structures for Path ORAM, i.e., the position map and the stash. The position map can be a simple integer array, which links each real block to its corresponding leaf index within the ORAM tree. Whenever the client needs to access a block from the ORAM tree, the ORAM controller finds the corresponding leaf from the position map and extracts the path from the root to the leaf. The extracted blocks are stored within the stash memory.

Figure 1 illustrates the Path ORAM algorithm. In this figure, the client attempts to access the block D from the untrusted storage containing the ORAM tree (1). First, the client looks up the leaf index corresponding to block D, which is 11 in our example (2). Then, the client extracts the complete path from the root of the tree to the leaf (i.e., d1, d3, D) and saves it in the stash as shown. The dummy blocks (i.e., d1, d3) are discarded at this point to keep the stash size small. After accessing the block D, the client randomizes its position, i.e., the initial leaf was 11 and the final leaf is 10, and re-encrypts the block with a random seed (3). The client then writes back to the tree from the old leaf (11) up to the root. To ensure consistency, the client only writes back a real block to a certain node if and only if that node is the new leaf, i.e., 10, or that node is on the path to the new leaf. If the client does not have a real block to put into the node, it generates dummy data, encrypts it (using a random nonce), and writes it to that node. For example, in the figure, (d4, d5) corresponds to the generated dummy data.
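The read, remap, and write-back steps can be sketched in a few dozen lines. The following is a toy model under stated assumptions rather than a faithful or secure implementation: encryption is omitted, dummy blocks are represented as empty slots, the tree height and K are arbitrary, and the class name `ToyPathORAM` and its `trace` field are hypothetical.

```python
import random

K = 4        # blocks per tree node ("K" in the text); toy value
HEIGHT = 3   # tree levels below the root, giving 2**HEIGHT leaves (assumption)


class ToyPathORAM:
    """Unencrypted Path ORAM model; empty bucket slots play the dummy blocks."""

    def __init__(self, num_blocks, rng=None):
        self.rng = rng or random.Random()
        self.leaves = 2 ** HEIGHT
        # Heap-ordered tree: node 1 is the root, node n has children 2n, 2n+1.
        self.tree = {n: [] for n in range(1, 2 * self.leaves)}
        self.position = [self.rng.randrange(self.leaves) for _ in range(num_blocks)]
        self.stash = {}    # block id -> value, held in trusted memory
        self.trace = []    # leaves whose paths the untrusted storage observed

    def _path(self, leaf):
        """Node ids from the root down to the given leaf."""
        node = self.leaves + leaf
        path = []
        while node >= 1:
            path.append(node)
            node //= 2
        return path[::-1]

    def access(self, block_id, new_value=None):
        """Read block_id (or write new_value) and return its current value."""
        leaf = self.position[block_id]
        self.position[block_id] = self.rng.randrange(self.leaves)  # remap
        self.trace.append(leaf)
        path = self._path(leaf)
        for node in path:                      # read the whole path into the stash
            for bid, val in self.tree[node]:
                self.stash[bid] = val
            self.tree[node] = []
        if new_value is not None:
            self.stash[block_id] = new_value
        value = self.stash.get(block_id)
        for node in reversed(path):            # write back, leaf first
            fits = [bid for bid in self.stash
                    if node in self._path(self.position[bid])][:K]
            self.tree[node] = [(bid, self.stash.pop(bid)) for bid in fits]
        return value
```

Each call to `access` touches exactly one root-to-leaf path, and a block is written back only to nodes that lie on the path to its newly assigned leaf, mirroring the consistency rule described above.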

III. THREAT MODEL

We assume a scenario where a user runs an SGX enclave containing some security-sensitive program. The enclave program, OBFSCURO’s runtime and compiler, and the CPU are the only trusted components, and all other software and hardware components (including operating systems, hypervisors, memory hardware units, etc.) are untrusted. The user’s goal is to ensure that the program’s logic is not leaked to any attacker observing the enclave’s execution. Therefore, the program executable is securely provided to the remote SGX enclave through an encrypted channel (e.g., Diffie-Hellman [54] between enclaves). We assume that the enclave is already provisioned with all prerequisite memory and/or files that it requires to execute correctly before it starts executing. Therefore, we can safely assume that the enclave does not perform a synchronous exit (e.g., for a system call) between the start of its execution and termination. The attacker’s goal is to obtain the underlying algorithm or program logic. To achieve this, the attacker can probe¹ the enclave using any input of his/her choice and get the correct output. Furthermore, the attacker can observe the program’s access patterns through a combination of bus snooping attacks and side-channel attacks using page tables, caches, and branch prediction units. The attacker can also measure the program’s execution time and use that to leak some information.

As far as memory access patterns are concerned, we assume a worst-case attack scenario: a powerful attacker who learns perfect execution traces at their finest resolution (i.e., 64 B from a combination of page table, cache, and bus-snooping attacks, and exact branch targets from branch prediction attacks) of both the physical and virtual memory addresses that an enclave program accesses. More formally, let Φ be an SGX enclave program, and let Φ_k(I) (0 ≤ k ≤ n) denote the k-th element of its runtime memory access trace on an input I, i.e., a sequence of stripped code and/or data addresses (stripped according to the attacking method’s granularity). Φ_0(I) denotes the first address that the program accesses (i.e., an instruction at the program’s entry point) and Φ_n(I) denotes the last address that the program accesses (only if the program terminates on the input I). Furthermore, the attacker can learn some information about the program by monitoring timing channels: the attacker can measure the entire execution time T of the program on his/her provided inputs. Given these memory and timing traces, the attacker tries to learn the security-sensitive information (e.g., the algorithm or some part of it) of the program.
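The notion of a stripped trace can be made concrete with a short sketch. The example below is illustrative only: the addresses in `phi_input_a` and `phi_input_b` are made-up values, and the helper names `strip` and `distinguishable` are hypothetical. It shows how the same pair of executions may be indistinguishable at page granularity yet distinguishable at cache-line granularity.

```python
PAGE_SIZE = 4096   # granularity of page table attacks (assumed 4 KiB pages)
LINE_SIZE = 64     # granularity of cache and bus-snooping attacks


def strip(trace, granularity):
    """Drop the low-order address bits the attacker cannot resolve."""
    return [addr // granularity * granularity for addr in trace]


def distinguishable(trace_a, trace_b, granularity):
    """True if the attacker can tell the two executions apart."""
    return strip(trace_a, granularity) != strip(trace_b, granularity)


# Hypothetical raw traces of one program on two different inputs.
phi_input_a = [0x401000, 0x401040, 0x602010]
phi_input_b = [0x401000, 0x401080, 0x602010]
```

Here the two traces differ only in the second access, which falls in the same page but in a different cache line, so only the finer-grained attacker learns anything.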

We do not consider software vulnerabilities in an enclave program (i.e., memory corruption vulnerabilities or semantic/logical vulnerabilities) or physical attacks (power-based, electromagnetic, etc.); security solutions [39, 58] to these issues are orthogonal to OBFSCURO. Furthermore, we consider Spectre [38] and Meltdown [43] attacks out of scope as well. Traditional program obfuscation assumes that the program cannot directly disclose the memory contents of the application, which is what these attacks do. Also, a patch [35] for them has already been provided by Intel, and its presence can be rigorously checked through the CPUSVN number provided during SGX remote attestation.

IV. CHALLENGES

As mentioned before, the goal of OBFSCURO is to achieve a strong notion of security, program obfuscation (also called virtual black box (VBB) obfuscation), on market-available commodity trusted hardware, Intel SGX. Unlike supporting program obfuscation on special-purpose hardware, such as HOP [47], there are a number of challenges involved in supporting VBB obfuscation on Intel SGX. These challenges arise partly due to the lower-privileged execution supported by SGX enclaves and partly due to side-channel information leakage from SGX enclaves. In particular, these challenges include (a) how to enforce secure ORAM-based program execution in SGX and (b) how to secure the ORAM controller in SGX. Unlike with special-purpose hardware, SGX enclaves cannot control the cache and/or the branch-predictors, which can be abused by an attacker to infer significant information from an insecure ORAM-based program execution. Also, while special-purpose hardware provides the program with a trusted memory region to store the ORAM controller, SGX enclaves only provide a very small trusted memory region, the registers, due to side-channel issues.

¹ This assumption can be easily relaxed to ensure input/output confidentiality as we describe in §IX.

A. Comparison with Existing Schemes

In this subsection, we provide a comparison of OBFSCURO with all existing schemes tailored to provide oblivious and/or obfuscated execution. For the ensuing discussion, it is imperative that we make a clear distinction between side-channel obliviousness (and its weaker version, memory trace obliviousness) and program obfuscation. In particular, the former assumes that the program is known to the attacker but the input (securely provided to the program) is sensitive and therefore has to be protected. The latter assumes that the program is unknown and in itself sensitive, whereas the input/output can be known to the attacker. It is also worth mentioning that program obfuscation can also be used to protect the input to the program, but this is not its primary goal. Figure 2 provides a comparison of all existing work with OBFSCURO.

First, we compare the existing side-channel oblivious systems with OBFSCURO. In general, these systems are based on custom hardware [18, 46], software-level defenses [45, 52, 62], or a hybrid of the two [44]. The most notable example is Raccoon [52], which can protect the input to a known program against all access pattern leakage (page table, cache, bus-snooping, and branch-prediction) on commodity hardware. However, none of these schemes fulfill the requirements of traditional program obfuscation; they are only concerned with protecting the input provided to the program. On the other hand, VBB obfuscation protects the program and can also be used to protect the input to the program.

The closest existing work is HOP [47], which is the only other system apart from OBFSCURO that guarantees complete VBB obfuscation to the program. However, the HOP system is based on special-purpose hardware and further utilizes an on-chip trusted storage region for storing and executing the code segments of the program and the ORAM controller. Thanks to the special-purpose hardware, HOP remains unconcerned with protecting its ORAM controller and/or the program against sophisticated cache or branch-prediction attacks. Conversely, since OBFSCURO has to support oblivious execution on commodity hardware, its design revolves around the limitations of the hardware and therefore has to deal with the cache and/or the branch-predictor to provide the same theoretical guarantees of program obfuscation. Furthermore, OBFSCURO has to provision its ORAM controller to ensure that there is no information leakage through its operation.

B. OBFSCURO’s Approach

OBFSCURO is a compiler-based approach that requires nomodification from the program developer in order to achieve thesecurity guarantees of VBB obfuscation for Intel SGX programs.Motivated by the limitations in existing approaches, OBFSCUROadopts the following approaches: 1) all code execution and dataaccess has to be performed at the granularity of cache-linesusing ORAM-based shuffling scheme, 2) all code executionshould leave oblivious memory traces on all uncontrolledhardware, i.e., DRAM, cache, TLB and branch-predictors and3) all programs should terminate at user-defined times in order

4

Page 5: OBFSCURO: A Commodity Obfuscation Engine on Intel SGXobfscuro.pdf · OBFSCURO: A Commodity Obfuscation Engine on Intel SGX Adil Ahmad1, ⋆, Byunggill Joe2, , †, Yuan Xiao 3, Yinqian

This is a preprint, and the final paper will appear in NDSS 2019.

Scheme Architecture Protection Scope Secure Program

Bus Snooping Cache Attacks Branch Prediction Page-level Attacks ORAM Obfuscation

Raccoon [52] commodity hardware ✓ ✓ ✓ ✓ ✓ ✗

Phantom [46] special purpose hardware ✓ ✗ ✗ ✗ ✗ ✗

GhostRider [44] special purpose hardware ✓ ✗ ✗ ✓ ✗ ✗

HOP [47] special purpose hardware ✓ ✗ ✗ ✓ ✗ ✓

OBFSCURO (this paper) commodity hardware ✓ ✓ ✓ ✓ ✓ ✓

Fig. 2: An overview of the differences in OBFSCURO and existing oblivious execution schemes.

[Figure 3 (diagram). Code execution model (§5.4): a Code ORAM Controller (register-based stash, data-oblivious pos. map; §5.2, §5.3) retrieves a 64 B code block from the C-Tree into the C-Pad, from which the code is fetched and jumped to (steps 1∼3, entered through a code access jmp). Data access model (§5.5): a Data ORAM Controller (register-based stash, data-oblivious pos. map; §5.2, §5.3) retrieves a 64 B data block from the D-Tree into the D-Pad, where the data is fetched, the actual data access is performed, and the data is flushed (steps 1∼5, entered through a data access jmp). The C-Tree and D-Tree form the ORAM Bank.]

Fig. 3: OBFSCURO’s system-level overview.

to protect programs against timing attacks. To do so, firstly, OBFSCURO utilizes ORAM access to securely randomize each memory access at cache-line granularity. Secondly, OBFSCURO further provisions the program by introducing a central code execution and data access location, a scratchpad, thereby neutralizing the observed memory traces through all hardware including the branch predictors. Finally, OBFSCURO ensures that all programs (irrespective of their intended program logic) terminate after a certain time period T specified by the user at the start.

V. DESIGN

A. Overview

OBFSCURO is a software framework enabling obfuscated execution for SGX enclave programs. The key idea behind OBFSCURO is to adopt ORAM operations in conjunction with scratchpad-based code execution and data access, thereby exhibiting memory traces oblivious to program execution (illustrated in Figure 3). The core design features of OBFSCURO can be summarized as follows.

• Secure ORAM Scheme. OBFSCURO implements its ORAM controller using data-oblivious algorithms, thereby protecting it from side-channel attacks (§V-B). OBFSCURO implements a register-based stash which improves on the prior side-channel resilient ORAM implementations [3, 55].

• Repurposing Native Programs. The ORAM scheme is designed to retrieve data, securing data access patterns rather than program execution. As a result, there exists a semantic gap between ORAM operations and general program execution. OBFSCURO bridges this gap by repurposing native programs (§V-C) through memory layout transformation and virtual address translation.

• Code Execution Model. OBFSCURO ensures that all the code execution (for the target program) is only performed within a code scratchpad, a fixed memory location (§V-D). All instructions are loaded onto the scratchpad through ORAM operations and executed from the start to the end of the scratchpad (steps 1∼3 in Figure 3). Furthermore, OBFSCURO’s scratchpad is designed with SGX-aware protections unlike previous work [44, 47].

• Data Access Model. OBFSCURO ensures that all data access is performed through a data scratchpad, which is a fixed memory location updated using ORAM operations (§V-E). The target program’s read and write operations are performed at the same memory location regardless of execution context (steps 1∼5 in Figure 3). OBFSCURO also ensures that the data access is always performed once per C-Pad, normalizing the number of data accesses.

• Start-to-End Obfuscation. OBFSCURO ensures that the target program continues executing until a certain predefined time to mitigate timing-based channels, irrespective of the program logic (§V-F). OBFSCURO achieves this by instrumenting the target application to introduce dummy memory blocks after the termination of the intended logic. Furthermore, OBFSCURO returns a fixed output size at the end of program execution.

Workflow of OBFSCURO. The input to OBFSCURO is the source code of a target enclave application. Using the input, OBFSCURO produces an instrumented executable, fully loaded with a runtime library (containing the ORAM controller). During initialization, the runtime library populates the code and data blocks into different ORAM trees. Afterwards, the ORAM controller extracts the first code block to be executed, loads it onto the code scratchpad, and ensures execution starts from the beginning of the code scratchpad. Every branch instruction in a code block is replaced with a jump instruction to the ORAM controller for code. The ORAM controller then loads the required code block onto the code scratchpad using ORAM operations and jumps back to the beginning of the code scratchpad (§V-D). Similarly, every instruction that accesses data (i.e., global/heap/stack objects) is replaced with a jump to the ORAM controller for data. The ORAM controller for data always loads the corresponding data block onto the data scratchpad using ORAM operations and returns the appropriate address (i.e., base address of data scratchpad + access offset) (§V-E). Finally, OBFSCURO ensures that the program keeps executing until a certain predefined time and returns a fixed amount of output to the user, thereby ensuring complete start-to-end obfuscation (§V-F).


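[Figure 4 (diagram). The ORAM controller moves the target block from the stash to the C-Pad or D-Pad. a) cmov-based stash: the stash is a regular array in DRAM, and every stash block is touched (one genuine access, the remaining ones bogus accesses). b) Register-only stash: the stash blocks are held in the CPU’s AVX registers.]

Fig. 4: Register-based stash versus CMOV-based stash. The CMOV-based stash has to access an entire array placed in DRAM, whereas the register-based stash can directly retrieve an item from the CPU’s AVX registers.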

B. Secure ORAM Scheme

In this subsection, we explain how OBFSCURO designs a secure ORAM scheme to ensure oblivious program execution. Firstly, OBFSCURO places both the ORAM controller and trees within an SGX enclave. Secondly, in response to side-channel threats against SGX enclaves, OBFSCURO secures the working mechanisms of its ORAM controller, i.e., it ensures that each operation is branch-free (to mitigate the risk of branch-prediction attacks) and data-independent (to mitigate the risk of page table and cache attacks). In this regard, OBFSCURO provides two stash designs for the ORAM controller: a CMOV-based stash and a novel register-based stash (§V-B1). Furthermore, OBFSCURO employs a data-oblivious population scheme to securely populate the ORAM trees (§V-B2).

1) ORAM Controller: In the following, we describe how OBFSCURO secures two main data structures of the ORAM controller, i.e., position map and stash, against access-pattern leakage. By securing access onto these data structures, OBFSCURO also ensures that its code is devoid of conditional branches (i.e., secure against branch-prediction attacks).

Data Oblivious Position Map. The position map contains sensitive information regarding ORAM blocks, i.e., the mapping from block-id to the leaf in the ORAM tree. An attacker can leak sensitive information about program execution by observing the access patterns onto the position map. OBFSCURO employs a data-oblivious access mechanism to prevent information leakage from the position map. The key security primitive of this mechanism is leveraging the cmov instruction in x86 to stream through the entire data structure. As shown by Raccoon [52], we devise a wrapper function for the cmov instruction to add additional bogus memory accesses. Depending on the flag value provided to the wrapper function, the function performs either the actual memory write (if the flag is true) or a bogus memory access without writing (if the flag is false).
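The streaming lookup described above can be sketched in C as follows. This is a minimal illustration and not OBFSCURO's code: the names oblivious_select and posmap_lookup are hypothetical, and a mask-based select stands in for the cmov wrapper. A compiler is not guaranteed to keep this code branch-free, which is one reason a real implementation would rely on an explicit cmov in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free select: returns a if flag is 1, b if flag is 0.
 * Hypothetical stand-in for a cmov wrapper. */
static uint32_t oblivious_select(uint32_t flag, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - (flag & 1u); /* all ones or all zeros */
    return (a & mask) | (b & ~mask);
}

/* Returns the leaf stored for block_id while reading every entry of the
 * position map, so the sequence of touched addresses does not depend on
 * block_id. */
static uint32_t posmap_lookup(const uint32_t *posmap, size_t n,
                              uint32_t block_id) {
    assert(posmap != NULL);
    uint32_t leaf = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t hit = (uint32_t)(i == block_id); /* 1 on match, else 0 */
        leaf = oblivious_select(hit, posmap[i], leaf);
    }
    return leaf;
}
```

The cost is linear in the size of the position map for every access, which is the price paid for hiding which entry was actually needed.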

Next, we describe how OBFSCURO secures access onto the stash. Naively accessing the stash would leave memory traces that can be used to distinguish between real and dummy blocks in the extracted ORAM tree path. OBFSCURO can utilize two different stash designs, a CMOV-based stash and a novel register-based stash. While both completely secure stash accesses, they have different performance characteristics depending on the underlying hardware architecture.

CMOV-based Stash. OBFSCURO can use data-oblivious access (using CMOV) to stream through the complete stash memory region (Figure 4-a), similar to previous schemes [3, 55]. As a result, the CMOV-supported access guarantees that the

    void retrieve_from_stash_cmov(void* cpad, int required_blk) {
        bool flag = false;

        for (int i = 0; i < NUM_STASH_BLOCKS; i++) {
            // Check the validity of the condition, i.e.,
            // is this the block to retrieve from the stash
            flag = ((stash[i].blocknum == required_blk));

            // Based on the flag, either perform a real or a dummy copy
            x86_cmov(cpad, stash[i].memblk, flag);
        }
    }

(a) CMOV-based stash

    ; %rsi points to the base address of ORAM tree block.
    movaps (%rsi), %xmm0
    vinserti128 $0x0, %xmm0, %ymm5, %ymm5
    add $16, %rsi
    movaps (%rsi), %xmm0
    vinserti128 $0x1, %xmm0, %ymm5, %ymm5
    add $16, %rsi
    movaps (%rsi), %xmm0
    vinserti128 $0x0, %xmm0, %ymm6, %ymm6
    add $16, %rsi
    movaps (%rsi), %xmm0
    vinserti128 $0x1, %xmm0, %ymm6, %ymm6

(b) Register-based stash

Fig. 5: Implementation snippets of OBFSCURO’s stash access: (a) OBFSCURO obliviously retrieves a block from the stash using CMOV; and (b) OBFSCURO leverages YMM registers to obliviously access stash indices. As can be observed, there are no conditional branches and/or data-dependent accesses in either case.

attacker learns nothing from the leaked access patterns as the attacker observes accesses onto all stash indices. One caveat of this approach is that the stash is a large memory region, i.e., $\geq B \log_2 N$ bytes, where $B$ is the block size in bytes and $\log_2 N$ is the height of the ORAM tree containing $N$ nodes. Therefore, using CMOV within the stash can result in performance overhead, as noted by previous works and reported in §VIII-1. Figure 5a shows a code snippet illustrating how the CMOV-based stash functions.

Register-based Stash. OBFSCURO also designs a novel register-based stash, which leverages the Advanced Vector Extensions (AVX) instruction set along with the XMM and YMM registers. We collectively refer to these registers as AVX registers. The key idea is to reserve these registers for the ORAM stash only and restrict the program and associated libraries from using them. An operation performed on any CPU register does not imprint traces on memory-related units (cache, TLB/MMU, DRAM, etc.) and is therefore oblivious even to privileged attackers such as the OS (Figure 4-b). Therefore, OBFSCURO copies each tree block onto a set of AVX registers and performs all required operations on these registers. This limits the involvement of CMOV and therefore provides a performance improvement of 30-40% as compared to the CMOV-based stash, as shown in §VIII-1. Figure 5b shows an example where the memory located at rsi is moved, 16 bytes at a time, into the 32-byte registers ymm5 and ymm6.

However, there are two things to consider while opting for the register-based stash over the CMOV-based stash. Firstly, the register-based stash makes the AVX registers unavailable for other important operations such as the AES-NI instruction set; if the enclave program requires these operations, it would be better suited to use the CMOV-based stash. Secondly, current desktop


hardware only supports AVX2 [31], which provides 16 YMM registers of 32 B each, totaling 512 B of memory for the stash. This size is enough for small ORAM tree sizes (e.g., 4-8 KB) but is insufficient for larger tree sizes. However, the AVX-512 [53] instruction set architecture introduces larger AVX registers (ZMM registers), currently present on high-end hardware [32, 33]. There are 32 ZMM registers in total, each 512 bits wide, which support a total stash size of 2 KB and increase the supported tree size from 8 KB to 512 MB.
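The register budget quoted above follows from simple arithmetic, which the sketch below makes explicit. The function names are hypothetical, and the 64 B block size is an assumption taken from the cache-line-sized blocks used elsewhere in the design.

```c
#include <assert.h>
#include <stddef.h>

/* Total bytes available when num_regs registers of reg_bits bits each
 * are reserved for the stash. */
static size_t stash_capacity_bytes(size_t num_regs, size_t reg_bits) {
    assert(reg_bits % 8 == 0);
    return num_regs * (reg_bits / 8);
}

/* Number of 64 B blocks (assumed block size) that fit in that capacity. */
static size_t stash_capacity_blocks(size_t num_regs, size_t reg_bits) {
    return stash_capacity_bytes(num_regs, reg_bits) / 64;
}
```

With 16 YMM registers of 256 bits this gives 512 B (8 blocks), and with 32 ZMM registers of 512 bits it gives 2048 B (32 blocks), matching the figures in the text.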

Workflow. Based on the above building blocks, we now illustrate how OBFSCURO performs a secure ORAM access. First, OBFSCURO uses CMOV to scan through the whole position map to find the required ORAM block. Then, OBFSCURO sequentially copies the tree blocks to either memory (if the CMOV-based stash is used) or the registers (if the register-based stash is used). Afterwards, OBFSCURO performs an oblivious retrieval of the required block from the stash. In the case of the CMOV-based stash, it performs a sequential CMOV access on each individual stash index; in the case of the register-based stash, it performs an inline assembly move operation to move the block from the register to memory. After performing the relevant tasks on the ORAM block, OBFSCURO writes the block back using an approach similar to the one described above.

2) ORAM Bank: OBFSCURO places the ORAM bank, comprising the ORAM trees, within the enclave memory. OBFSCURO performs secure ORAM tree population to mitigate side-channel leakage.

Allocation. The ORAM trees are allocated as global arrays within the enclave program’s memory space (i.e., within the EPC). OBFSCURO can avoid encrypting the ORAM trees, which is an important step in the ORAM protocol, because the Memory Encryption Engine (MEE) in SGX [28] implicitly performs the encryption. There are two things to note here: (a) the allocation step does not leak any important information to the attacker apart from the location of the ORAM tree (which is public information in the ORAM attack model) and (b) the size of the code and data trees should be carefully considered prior to allocation since, as per Path ORAM’s design, the size of the trees cannot be dynamically adjusted.

Population. As per Path ORAM’s requirement, the population of each block into the ORAM tree should be performed as a regular ORAM access. To illustrate, the population of code and data blocks in the C-Tree and D-Tree, respectively, is carried out as follows: (a) OBFSCURO picks a block which is to be added to the ORAM tree. (b) OBFSCURO determines a random position to store the block within the ORAM tree. The random position is determined using the RDRAND hardware instruction, which only involves the trusted CPU. (c) OBFSCURO performs an ORAM access onto the path that corresponds to the selected position. At first glance, this might leak some information to the attacker. However, since this is an ORAM access, the final destination of the block will be randomized within the path once more, which ensures strong secrecy. (d) OBFSCURO repeats the above steps until all real blocks are populated to the ORAM tree.
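Step (b) of the population procedure can be sketched as follows. This is only an illustration under stated assumptions: the random source is passed in as a callback (the hypothetical rand32_fn), demo_rand32 is a toy generator used in place of RDRAND so the sketch runs anywhere, and the number of leaves is assumed to be a power of two. The ORAM access of step (c) is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*rand32_fn)(void); /* hypothetical random-source hook */

/* Toy linear congruential generator; NOT a substitute for RDRAND. */
static uint32_t demo_state = 12345u;
static uint32_t demo_rand32(void) {
    demo_state = demo_state * 1664525u + 1013904223u;
    return demo_state;
}

/* Assigns every block a uniformly chosen leaf index in [0, num_leaves).
 * num_leaves must be a power of two so that masking introduces no bias. */
static void assign_random_leaves(uint32_t *posmap, size_t num_blocks,
                                 uint32_t num_leaves, rand32_fn rng) {
    assert(num_leaves != 0 && (num_leaves & (num_leaves - 1u)) == 0u);
    for (size_t i = 0; i < num_blocks; i++) {
        posmap[i] = rng() & (num_leaves - 1u);
    }
}
```

In an enclave, the callback would wrap a hardware random source instead of the toy generator.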

C. Repurposing Native Programs

In order to bridge the semantic gap between native execution and oblivious execution, OBFSCURO transforms the native program’s memory layout into an ORAM-compatible memory layout, provides virtual address translation to support dynamic memory relocation, and introduces a scratchpad region for code execution and data access.

Memory Layout Transformation. OBFSCURO separates the target program into two sections, i.e., code and data, and allocates a dedicated ORAM tree for each section, namely the C-Tree for code and the D-Tree for data. OBFSCURO can estimate the size of the C-Tree since the program’s code size remains static. Since the size of dynamically allocated data (e.g., heap and stack) cannot be precisely estimated, OBFSCURO sets a maximum limit on the size of the D-Tree. This is not a limitation since SGX programs themselves are initialized with a user-provided stack and heap size. Code blocks are prepared during the compilation phase, where the code is divided into blocks of the same size and filled with instrumented instructions by OBFSCURO (more details in §V-D). During program initialization, OBFSCURO populates the code blocks and data blocks into the C-Tree and D-Tree, respectively. Initialized data objects (i.e., global variables) are filled into their corresponding blocks, whereas the blocks corresponding to uninitialized data are zero-initialized.

Virtual Address Translation. All the code and data accesses in a traditional program are realized through virtual addresses, while ORAM operations deal in blocks of the ORAM tree. To bridge this semantic gap, OBFSCURO performs on-the-fly translation of virtual addresses into ORAM block indices. To ensure secure translation, OBFSCURO linearly maps the virtual address space of a program into ORAM blocks and performs a bitwise right-shift. In order to achieve linear mapping, OBFSCURO modifies the linker configuration, specifying that code and data be located contiguously within the enclave’s virtual address space. Hence, OBFSCURO can perform a constant bitwise right-shift operation to securely translate a virtual address to a block index.
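A minimal sketch of this translation is given below, assuming 64 B blocks (a shift by 6) and a contiguous region starting at a known, block-aligned base. The function name and the region_base parameter are hypothetical; the text only states that a constant right-shift is used.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SHIFT 6u /* log2(64): assumed 64 B ORAM blocks */

/* Maps a virtual address inside a linearly laid-out region to the index
 * of the ORAM block that contains it. */
static uint64_t vaddr_to_block(uint64_t vaddr, uint64_t region_base) {
    assert(vaddr >= region_base); /* debug-only sanity check */
    return (vaddr - region_base) >> BLOCK_SHIFT;
}
```

Because the shift amount is a constant, the translation involves no table lookup and therefore no data-dependent memory access.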

Heap Management. Since SGX enclaves do not have support for dynamic memory allocation, the maximum heap size required for the application has to be decided at compilation time. To handle runtime requests, OBFSCURO provides wrappers for the malloc and free function calls, i.e., malloc_ob and free_ob, which are responsible for managing the heap memory (alongside the metadata) requested by the enclave program. In particular, malloc_ob obliviously picks a block from the D-Tree, which is already provisioned during program initialization with blocks to handle heap memory requests. The wrapper function returns the virtual address corresponding to the selected block. Later, when free_ob is called, it deallocates the heap memory region, determines which blocks from the D-Tree are now free, and simply tags them as such.
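One way to picture the tagging scheme is the sketch below, which is a simplification and not the actual malloc_ob/free_ob: every allocation is a single block, the functions return and accept block indices instead of virtual addresses, and the in-memory tag array stands in for metadata that the real system would itself keep behind ORAM accesses. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HEAP_BLOCKS 8u
#define NO_BLOCK    UINT32_MAX

static uint8_t in_use[HEAP_BLOCKS]; /* 1 = allocated, 0 = free */

/* Picks the first free block while touching every tag, then marks it. */
static uint32_t malloc_ob_sketch(void) {
    uint32_t chosen = NO_BLOCK;
    for (uint32_t i = 0; i < HEAP_BLOCKS; i++) {
        uint32_t is_free  = (uint32_t)(in_use[i] ^ 1u);
        uint32_t none_yet = (uint32_t)(chosen == NO_BLOCK);
        uint32_t take     = is_free & none_yet;
        uint32_t mask     = 0u - take;          /* all ones if taking i */
        chosen    = (i & mask) | (chosen & ~mask);
        in_use[i] = (uint8_t)(in_use[i] | take);
    }
    return chosen; /* NO_BLOCK if the heap is exhausted */
}

/* Clears the tag of one block while touching every tag. */
static void free_ob_sketch(uint32_t block) {
    assert(block < HEAP_BLOCKS);
    for (uint32_t i = 0; i < HEAP_BLOCKS; i++) {
        uint32_t hit = (uint32_t)(i == block);
        in_use[i] = (uint8_t)(in_use[i] & (hit ^ 1u));
    }
}
```

Both functions visit all tags on every call, so the access pattern does not reveal which block was allocated or released.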

Scratchpad. In traditional ORAM, the program can simply access the extracted block from the stash. However, doing so within the SGX environment will leak the position of the required block within the stash through observed memory traces. To deal with this problem, OBFSCURO prepares two fixed locations (determined during program initialization) of fixed


size (one cache line, i.e., 64 B) to access code and data blocks, called C-Pad and D-Pad respectively. These memory regions are provisioned with SGX-specific defenses (refer to §V-D and §V-E). After OBFSCURO performs the oblivious operation and locates a target block in the stash, OBFSCURO copies the target block from the stash to the scratchpad. Note that this copy from the stash is oblivious, as described in §V-B1. Therefore, by normalizing access location and size through scratchpads, OBFSCURO can successfully hide the actual memory location, and the attacker cannot infer that information. We provide more details as to how this is accomplished in the next two subsections.

D. Code Execution Model

OBFSCURO ensures the following three security properties in its code execution model: C1) Code execution is always performed within the C-Pad²; C2) Code access instructions (i.e., branch instructions which impact the control flow of a program, including call, return, unconditional branch, and conditional branch instructions) are only executed at a fixed location (i.e., the end of the C-Pad); C3) All code access instructions are replaced with an instruction jumping to a runtime function (i.e., code_oram_controller), which performs an ORAM operation to fetch the required code block.

The above-mentioned security properties of OBFSCURO protect code execution from access-based side-channel attacks. Since the size of the C-Pad is the same as the minimum granularity of page table and cache-based attacks (i.e., 64 B), C1 prevents these attacks from gaining any meaningful information. C2 and C3 prevent branch-prediction attacks, because all control-flow changes are made from the same location (i.e., the end of the C-Pad as specified by C2) to the same destination (i.e., code_oram_controller as specified by C3), irrespective of the semantics of the original branch instruction.

To meet property C1, OBFSCURO restricts all basic blocks to the size of the C-Pad (i.e., 64 B) during the compilation phase. Specifically, OBFSCURO breaks up larger basic blocks into smaller ones equaling the size of the C-Pad. If a basic block is smaller than the C-Pad, OBFSCURO inserts nop instructions to fill the space. To meet properties C2 and C3, OBFSCURO replaces all branch instructions with a sequence of equivalent instructions invoking code_oram_controller. This invocation is always performed using a jmp instruction to code_oram_controller, which is aligned at the end of the basic block.
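The splitting and padding step can be pictured with the following compile-time sketch. It is an assumption-laden illustration, not the LLVM pass itself: pack_blocks is a hypothetical helper that greedily packs instructions, given only by their byte lengths, into blocks offering payload usable bytes each, never letting an instruction straddle two blocks, and counts the nop bytes needed to fill the gaps. How many bytes of a 64 B block are usable depends on the instrumentation sequences, so the payload is left as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs n instructions into fixed-size blocks; returns the number of
 * blocks and stores the total nop padding in *nop_bytes. */
static size_t pack_blocks(const uint8_t *insn_len, size_t n,
                          size_t payload, size_t *nop_bytes) {
    size_t blocks = 0, used = 0, pad = 0;
    for (size_t i = 0; i < n; i++) {
        assert(insn_len[i] > 0 && insn_len[i] <= payload);
        if (blocks == 0 || used + insn_len[i] > payload) {
            if (blocks > 0) {
                pad += payload - used; /* fill the previous block */
            }
            blocks++;
            used = 0;
        }
        used += insn_len[i];
    }
    if (blocks > 0) {
        pad += payload - used;         /* fill the last block */
    }
    *nop_bytes = pad;
    return blocks;
}
```

For instance, instructions of 10, 20, 30 and 5 bytes with a 44-byte payload (an illustrative value) occupy two blocks and need 23 bytes of padding. Since this runs at compile time on public code sizes, ordinary branches are acceptable here.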

For example, Figure 6a shows how OBFSCURO replaces an unconditional branch instruction. Given the original jmp instruction, OBFSCURO first instruments an instruction storing the virtual address of the jump target in R15. Then, OBFSCURO inserts a jmp instruction to the code_oram_controller. The code ORAM controller computes the ORAM block index using the virtual address stored in R15 (as mentioned in §V-C), and retrieves the required code block from the C-Tree through an ORAM access. Afterwards, OBFSCURO overwrites the C-Pad using the obtained code block and resumes execution from the beginning of the C-Pad. In this manner, OBFSCURO translates all types of control flow instructions, including conditional jump, function call, and return.

² The C-Pad is a writable and executable region, but it can be secured against memory corruption by employing SFI similar to SGX-Shield [58] and/or dynamic page protection to be available in SGXv2.

    ; Before
    jmp jump_target

    ; After
    mov R15, jump_target      ; Pass jump_target through R15
    jmp code_oram_controller  ; code_oram_controller loads the code
                              ; block to C-Pad and then jumps to the
                              ; beginning of C-Pad.

(a) Unconditional branch (code access)

    ; Before
    mov 4(RAX), RBX           ; Store RBX at where (RAX + 4) points to

    ; After
    lea R15, 4(RAX)           ; Pass the store address through R15
    mov R14, after_fetch      ; Pass the return address through R14
    jmp data_oram_controller  ; data_oram_controller fetches data block
                              ; and returns address of (D-Pad + offset)
                              ; through R15
    after_fetch:
    mov (R15), RBX            ; Write a value RBX to (D-Pad + offset)

(b) Store (data access)

Fig. 6: Instrumentation on code and data access.

E. Data Access Model

OBFSCURO ensures the following security properties in the data access model: D1) Data access is always performed within the D-Pad of size 64 B; D2) Data access instructions are only executed once per C-Pad at a fixed location (i.e., the beginning of the C-Pad); and D3) All data access instructions are replaced with an instruction jumping to a runtime function, data_oram_controller, which performs an ORAM operation to load the corresponding data block onto the D-Pad. Similar to the code execution model (§V-D), these properties prevent cache and page table attacks. This is because attackers will always observe the same data access patterns onto the D-Pad.

One thing to note here is that D2 enforces each code block to perform a single jump to the data_oram_controller. This restriction is partly due to the constraint of the 64-byte code block. In particular, OBFSCURO’s data access instructions take 28 bytes and the code access instructions (mentioned in §V-D) take 20 bytes. Since a code block requires at least one code access instruction, i.e., to access the next code block, it leaves room for only a single data access. However, as a result of this, OBFSCURO ensures that there is a normalized number of data accesses per code block, which cannot be exploited by an attacker. OBFSCURO also prevents branch-prediction attacks by placing the data access instruction at a fixed location. If a certain code block does not require a data access, OBFSCURO performs a dummy data access in order to portray the same memory footprint for each block.
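The byte budget behind the single-data-access restriction can be checked with a few lines of arithmetic. The helpers below are hypothetical and simply encode the sizes quoted above (a 64 B block, a 20 B code access sequence, and a 28 B data access sequence).

```c
#include <assert.h>

/* How many data access sequences fit next to one code access sequence. */
static unsigned max_data_accesses(unsigned block_bytes,
                                  unsigned code_seq_bytes,
                                  unsigned data_seq_bytes) {
    assert(data_seq_bytes > 0 && block_bytes >= code_seq_bytes);
    return (block_bytes - code_seq_bytes) / data_seq_bytes;
}

/* Bytes left for the program's own instructions after the sequences. */
static unsigned spare_bytes(unsigned block_bytes,
                            unsigned code_seq_bytes,
                            unsigned data_seq_bytes) {
    unsigned n = max_data_accesses(block_bytes, code_seq_bytes,
                                   data_seq_bytes);
    return block_bytes - code_seq_bytes - n * data_seq_bytes;
}
```

With the quoted sizes, 64 - 20 = 44 bytes remain after the code access sequence, which fits one 28 B data access but not two, leaving 16 bytes for other instructions.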

Unlike the code execution model, the data access model allows offset-based access within the D-Pad such that a memory access can be directly performed at any location within the D-Pad. This offset-based access is secure against memory-based side-channel attacks since the D-Pad is the size of the minimum granularity of attack resolution, i.e., 64 B. In order to reflect changes made by the enclave code on the D-Pad back to the ORAM tree, OBFSCURO flushes the extracted data block after performing the required memory access.

For example, Figure 6b illustrates how OBFSCURO instruments the store instruction. Similar to the code execution model, OBFSCURO uses the reserved R15 register to pass the virtual address (i.e., the memory operand of a store instruction) to the data_oram_controller. Then the


Original Target Application:

    target_main()
    {
        [original_function_body]
        return ret_val
    }

    entry()
    {
        ret_val = target_main()
    }

Instrumented Target Application:

    target_main()
    {
        [original_function_body]
        *V = ret_val
        continuous_dummy()
    }

    entry()
    {
        RT_return_addr = R
        target_main()
    R:
        ret_val = data_oram_controller(V)
    }

    continuous_dummy()
    {
        continuous_dummy()
    }

Obfuscuro Runtime Library:

    code_oram_controller(code_block_id)
    {
        *C-Pad = obtain_oram_block(code_block_id)
        flag = (num_executed_blocks < limit)
        num_executed_blocks++
        CMOV(flag, C-Pad, RT_return_addr)
        jmp C-Pad
    }

(The figure marks steps 1∼6 and distinguishes the obfuscated region, the return path, and the return value.)

Fig. 7: OBFSCURO’s continuous execution.

data_oram_controller translates the virtual address into the corresponding ORAM block index, and updates the D-Pad after extracting the data block using an ORAM access. Afterwards, the data_oram_controller returns, through R15, a virtual address which points within the D-Pad (i.e., p1 + p2, where p1 is the base address of the D-Pad and p2 is the offset within the D-Pad). Therefore, the enclave program correctly performs the store instruction using R15, and the data block is later flushed back into the D-Tree.
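The returned address p1 + p2 can be sketched as below. The sketch assumes that data blocks are 64 B aligned, so that the offset p2 is simply the low six bits of the original virtual address; the dpad array and the function name are hypothetical stand-ins, and the ORAM access that fills the D-Pad is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DPAD_SIZE 64u

static uint8_t dpad[DPAD_SIZE]; /* stand-in for the fixed D-Pad region */

/* Returns p1 + p2: the D-Pad base plus the offset of vaddr inside its
 * 64 B block. */
static uint8_t *dpad_address(uint64_t vaddr) {
    assert((DPAD_SIZE & (DPAD_SIZE - 1u)) == 0u); /* power-of-two size */
    size_t offset = (size_t)(vaddr & (DPAD_SIZE - 1u));
    return dpad + offset;
}
```

A store to virtual address 0x1044, for example, would land at offset 4 of the D-Pad once the block containing that address has been loaded.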

F. Start-to-End Obfuscation

In the previous subsections, we explained how OBFSCURO ensures that the target program’s code blocks perform a normalized sequence of operations, irrespective of their original logic. However, that is not enough for complete obfuscation. In particular, there is one further distinguishing factor in the program, i.e., the execution time of the program. For example, running different programs, or just running the same program with different inputs, can result in drastically different execution times, which can be abused by an attacker.

OBFSCURO handles both of these cases to ensure that, irrespective of program logic, the obfuscated execution always terminates after a fixed amount of time. In order to fix the execution time, OBFSCURO inserts dummy code blocks within a native program’s code, ensuring that the program keeps executing even after completing the intended program logic. OBFSCURO instruments the target application as shown in Figure 7. As shown in the figure, OBFSCURO injects a dummy function called continuous_dummy into the program. The dummy function is meant to recursively call itself to infinity, ensuring that if the program calls this function, it will not terminate of its own accord. As mentioned in §V-D, each code access will go through the code_oram_controller. Therefore, OBFSCURO can stop the program execution after a certain predefined number of code blocks, even if the dummy function never stops executing. However, to do so and provide the required output back, OBFSCURO needs an address to jump to after reaching the limit on code blocks.

Now, we explain the workflow of the instrumented target program. The application code is defined as target_main, whereas the enclave officially starts execution from the entry function (1). At the start of entry, OBFSCURO ensures that the return address R is passed to the runtime library by writing RT_return_addr (2). Afterwards, OBFSCURO starts running the target_main function and writes its output to global memory within the program (3). It is worth noting that this write will also be achieved through an ORAM access (as per all data access mentioned in §V-E) and is therefore oblivious to the attacker. Then, OBFSCURO invokes continuous_dummy (4), ensuring that the program continues executing.

As the program executes, it will jump to the code_oram_controller on each code access. At this time, OBFSCURO checks whether the predefined limit on the number of code blocks has been reached. If the limit has been reached, the program jumps back to RT_return_addr instead of jumping to the C-Pad (5). At this point, the original program logic has completed, but the output has not yet been obtained. To get the output, OBFSCURO calls the data_oram_controller to extract the output from the D-Tree (6). Through the above-mentioned steps, OBFSCURO ensures that there is a start-to-end obfuscation of the target program, which always executes the same number of code blocks and thus terminates after a fixed amount of time.
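The limit check performed on each code access can be sketched as a branch-free choice between two jump targets, mirroring the flag and CMOV steps of Figure 7. The function below is hypothetical: it returns the chosen target as an integer instead of jumping, and uses a mask-based select in place of CMOV, which a compiler is not guaranteed to keep branch-free.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t num_executed_blocks = 0;

/* Returns cpad while fewer than limit blocks have been executed, and
 * rt_return_addr afterwards, without branching on the comparison. */
static uintptr_t next_jump_target(uintptr_t cpad, uintptr_t rt_return_addr,
                                  uint64_t limit) {
    assert(limit > 0); /* a zero limit would never run the program */
    uintptr_t flag = (uintptr_t)(num_executed_blocks < limit);
    uintptr_t mask = (uintptr_t)0 - flag; /* all ones while under limit */
    num_executed_blocks++;
    return (cpad & mask) | (rt_return_addr & ~mask);
}
```

Because the counter advances on every code access, including those issued by the dummy function, the switch to the return address happens after the same number of blocks for every run.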

VI. IMPLEMENTATION

We have implemented a prototype of OBFSCURO based on the LLVM Compiler project 4.0 as well as the Intel SGX SDK’s enclave loader. OBFSCURO modifies the following two components in LLVM: a) the LLVM backend, to emit 64 B code blocks as well as to instrument code and data access instructions; and b) the compiler runtime library, for the ORAM controllers. In the LLVM backend, specifically the assembly emitter, we arranged a new code emitter to measure the size of instructions in parallel with the default emitter. We also utilized the built-in machine code builder to redirect code and data accesses to the runtime ORAM controllers. The compiler runtime library includes the implementation of the data-oblivious ORAM and interfaces for the LLVM backend and applications to employ it. The oblivious stash access is implemented at the assembly level with the vinserti128 and vextracti128 AVX register manipulation instructions. The oblivious position map access is based on the CMOV instruction, and we generalized its operation to variable lengths. We also changed the enclave loader of the Intel SGX SDK to create the C-Pad using SGX’s EADD instruction. In total, OBFSCURO introduces 3,117 LoC in the LLVM backend, 2,179 LoC in the compiler runtime library, and 25 LoC in the Intel SGX SDK.

VII. SECURITY ANALYSIS

This section provides a security analysis of OBFSCURO. In general, there are two ways an attacker can steal information from SGX enclaves using side channels. Firstly, an attacker can abuse observed access patterns to infer some information about the program and/or its input. Secondly, an attacker can perform timing-based attacks to leak some information. We provide a systematic security analysis of OBFSCURO against both of these attack avenues.


[Figure 8 (diagram): four components, (1) Code ORAM controller, (2) Code execution (C-Pad), (3) Data ORAM controller, and (4) D-Pad, connected by five interactions labeled a∼e. The diagram also marks a fixed offset.]

Fig. 8: Data oblivious execution cycle of OBFSCURO

A. Access Pattern Attacks.

As OBFSCURO is composed of multiple components to realize obfuscated program execution, we start by showing the security properties of the individual components of OBFSCURO. Then we show how these components interact with each other and show that these interactions are completely oblivious as well. Finally, we present the results of an empirical study showing that OBFSCURO achieves access pattern obliviousness.

Obliviousness of Individual Components. OBFSCURO introduces new components to legacy programs in order to achieve obfuscated execution, as shown in Figure 8. In the figure, we show the four components of OBFSCURO (labeled as 1∼4). We comment on each component individually in the following.

(1) Code ORAM controller: The code ORAM controller takes the virtual address of the next required code block as input, and it places the corresponding code block on the C-Pad. An attacker cannot decipher the virtual address because OBFSCURO performs secure computation based on this address. In particular, the address is first translated to a specific ORAM block using a data-oblivious right-shift operation (§V-C), which returns the corresponding block number in the ORAM tree. Then, OBFSCURO finds the corresponding leaf for this block through sequential CMOV-based scanning of the position map.

For the stash, OBFSCURO uses two variants, a CMOV-based one and a register-based one. The CMOV-based stash performs CMOV-based memory access similar to how OBFSCURO shields the position map. This applies both (a) while copying the required block from the stash to the C-Pad or D-Pad and (b) while writing back the blocks from the C-Pad or D-Pad to the stash. For the register-based stash, the AVX registers are retrofitted as stash space. Since all operations on the AVX registers are oblivious to the underlying system, we can perform a direct memory access to/from a specific register while ensuring that no information is leaked. Please refer to Figure 9 for the detailed operations performed by the code controller.

(2) C-Pad: OBFSCURO ensures that the C-Pad has a fixed location (determined at program loading) and a fixed size (i.e., 64B), and ensures that all oblivious code execution occurs from this location. Since 64B is the cache-line size (i.e., the finest visible granularity through access pattern-based side-channel attacks), the attacker learns no useful information to infer semantics during the C-Pad execution. In other words, as OBFSCURO runs the target program, the attacker will keep observing the same memory activity over the C-Pad, which is completely independent of the code block being executed.

(3) Data ORAM controller: The data ORAM controller takes the virtual address of data objects as input, and places the corresponding data block on the D-Pad. The data controller follows the exact same workflow as the code controller except that it operates on the D-Tree instead of the C-Tree. As previously shown for the code controller, the data controller also does not leak any sensitive information.

(4) D-Pad: The D-Pad is functionally and structurally similar to the C-Pad, except that data access is performed on it and not code execution. Similar to the C-Pad, it has a fixed location and the same size, thereby showing the same memory activity for each data access.

Oblivious Interactions between Components. The aforementioned components perform five interactions between them (labeled (a) to (e)). We illustrate below how each of these interactions is secure against access pattern-based attacks.

(a) Jump from Code ORAM controller to C-Pad after fetching code block: After obliviously extracting a block from the C-Tree and copying it to the C-Pad, the code controller performs a single jump to the start of the C-Pad. This step only reveals that some code block of a target program will now be executed, which reveals nothing about the semantics of the code block being executed.

(b) Jump from C-Pad to Data ORAM controller for fetching data block: Each code block (executing within the C-Pad) is strictly enforced to perform a single jump to the data controller, because OBFSCURO normalizes the number of data accesses within each code block to be exactly one (refer to §V-E). Moreover, this jump is performed at a fixed offset within the C-Pad to mitigate the risk of branch prediction attacks. The target address of this jump is also fixed, i.e., the start of the data controller’s logic.

(c) Return from Data ORAM controller to C-Pad: There is only a single jump from the data controller to the C-Pad at a fixed offset within the C-Pad, after fetching/updating the required data block on the D-Pad.

(d) Single D-Pad access: There is only a single access to the D-Pad per code block. Since the size of the D-Pad is 64B, this access does not reveal offset information either.

(e) Jump from C-Pad to Code ORAM controller: Finally, OBFSCURO enforces that there is only one jump from the C-Pad to the code controller at a fixed address located towards the end of the C-Pad. The target address of this jump is also fixed at the start of the code controller logic.
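The five interactions can be summarized as a fixed per-block event sequence. The following Python sketch is a toy model under stated assumptions: the event labels, the single-slot `dpad` list, and the `run_program` helper are hypothetical names, not part of OBFSCURO. It only illustrates that the observable sequence depends on how many code blocks run, not on what each block computes.

```python
from typing import Callable, List, Tuple

Event = Tuple[str, str]


def run_block(payload: Callable[[int], int], dpad: List[int],
              trace: List[Event]) -> None:
    """Execute one code block and record the events an observer could see."""
    trace.append(("a", "code controller -> C-Pad start"))
    trace.append(("b", "C-Pad fixed offset -> data controller start"))
    trace.append(("c", "data controller -> C-Pad fixed offset"))
    trace.append(("d", "single D-Pad access"))
    dpad[0] = payload(dpad[0])  # the block's actual work stays unobserved
    trace.append(("e", "C-Pad end -> code controller start"))


def run_program(blocks: List[Callable[[int], int]],
                start: int) -> Tuple[int, List[Event]]:
    """Run all blocks on a one-slot D-Pad; return the result and the trace."""
    dpad = [start]
    trace: List[Event] = []
    for payload in blocks:
        run_block(payload, dpad, trace)
    return dpad[0], trace
```

Two different programs with the same number of blocks produce different results but identical traces in this model, which is the property the interaction analysis argues for.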

Empirical Study. Lastly, we present the results of our empirical study on obfuscated memory traces exhibited by various applications. The results are depicted in Figure 10. We choose six target applications for the study: anagram, pi, mattranspose, sum, fibonacci, and palindrome. These applications were chosen due to the diversity of their computational complexity.

In Figure 10, we attempt to show that there is no correlation between unobfuscated and obfuscated memory traces of the same program. We measure multiple runs for each aforementioned application, and for each run we accumulate data mapping a timing sequence to the addresses accessed by the program. Using the accumulated data, we calculate the Pearson correlation value between the test applications and populate the corresponding cell in the confusion matrix. For example, consider the (anagram, anagram) cell in Figure 10-(a): the Pearson correlation value is very close to 1 because this cell is comparing the memory traces between two runs of the same program. On the other hand, the correlation value in the (anagram, pi) cell is nearly 0 because their access patterns are quite distinct from each other.

| ORAM operations | Sensitive information | OBFSCURO defense | Observed traces by adversaries |
|---|---|---|---|
| 1. Locating corresponding pos.map element | Offset in pos.map | CMOV-scanning read | Sequential read traces on pos.map |
| 2. Extracting requested ORAM path to stash | No sensitive info. | - | Sequential copy traces from requested ORAM path to stash |
| 3-a. Copying ORAM block in stash to scratchpad (CMOV-based) | Offset in stash | CMOV-scanning copy | Sequential copy traces from stash to scratchpad |
| 3-b. Copying ORAM block in stash to scratchpad (Register-based) | Offset in stash | Register operations | No traces since registers are oblivious to memory |
| 4. Updating pos.map with new leaf number | Offset in pos.map | CMOV-scanning write | Sequential write traces on pos.map |
| 5-a. Writing back scratchpad to stash (CMOV-based) | Offset in stash | CMOV-scanning write | Sequential write traces from scratchpad to stash |
| 5-b. Writing back scratchpad to stash (Register-based) | Offset in stash | Register operations | No traces since registers are oblivious to memory |
| 6. Writing back stash to requested ORAM path | No sensitive info. | - | Sequential write traces from stash to requested ORAM path |

Fig. 9: Security analysis of secure ORAM implementation used by the code and data controller.

[Figure 10 heat maps: Pearson correlation on a color scale from -1.0 to 1.0 for anagram, pi, mattranspose, sum, fibonacci, and palindrome; panels (a) Before and (b) After.]

Fig. 10: Confusion matrix for naive access patterns vs. obfuscated pattern by OBFSCURO

Figure 10-(b) shows the confusion matrix formed while comparing obfuscated programs (using OBFSCURO) to their native access patterns. Since OBFSCURO ensures that all applications proceed in a fixed pattern of execution, the access patterns of these programs are completely different from their counterparts in native execution. Furthermore, all cells in Figure 10-(b) are almost 0 because none of the obfuscated programs have any correlation with any of the native programs.
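The correlation computation behind such a confusion matrix can be sketched as follows. This is an illustrative sketch, not the authors' measurement code: it assumes each trace is an equal-length sequence of accessed addresses ordered by time, and it returns 0.0 for a constant trace, where the Pearson coefficient is formally undefined. The names `pearson` and `correlation_matrix` are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length address traces."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("traces must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant trace; treated as uncorrelated here
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(rows: Dict[str, Sequence[float]],
                       cols: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """One cell per (row program, column program) pair of traces."""
    return {r: {c: pearson(rows[r], cols[c]) for c in cols} for r in rows}
```

Passing native traces as both `rows` and `cols` corresponds to the setup described for Figure 10-(a); passing obfuscated traces as `rows` and native traces as `cols` corresponds to Figure 10-(b).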

B. Timing-based Attacks.

Apart from access pattern attacks, a privileged attacker can also break program obfuscation within Intel SGX by abusing timing channels. In particular, we can think of two ways in which an attacker can abuse timing channels to leak information from OBFSCURO: (a) observing the time it takes for individual code blocks (in the C-Pad) to execute, and (b) observing the total time it takes for an obfuscated program to execute. We individually show the infeasibility of each of these timing channels.

C-Pad Execution Time. Timing differences in executing each code block (i.e., C-Pad) can leak information about the execution semantics of the program. We show statistically that this side channel is infeasible within OBFSCURO’s execution. The reason for this is that the execution time for the data ORAM access (which is performed exactly once per C-Pad) dominates the entire execution time of the C-Pad. We conducted a statistical experiment measuring CPU cycles in executing different classes of code blocks. We constructed five different code blocks: NOP, ADD, SUB, IMUL, and IDIV code blocks. Each code block initially jumps to the data controller to fetch a data block, and the remaining space is filled using instructions of a single type. Furthermore, we impose data dependencies within the instructions to prevent out-of-order execution. We accumulated the execution times for each class over 10,000 repetitions, and the distribution is shown in Figure 11. As illustrated, the 10%∼90% percentile intervals for each type (marked as two broken lines) largely overlap, making it hard for an attacker to distinguish between them.

[Figure 11 plots: execution cycles (y-axis) for NOP, ADD, SUB, IMUL, and IDIV code blocks; panels for # of ORAM tree leaves = 128 and # of ORAM tree leaves = 1024.]

Fig. 11: Distributions of code execution cycles of different types of code blocks (y-axis) with 10%∼90% percentile intervals.

[Figure 12 plots: total execution cycles (y-axis) of the test programs; panels for # of ORAM tree leaves = 128 and # of ORAM tree leaves = 1024.]

Fig. 12: Distributions of total execution cycles of various test programs (y-axis) with 10%∼90% percentile intervals.
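The percentile-interval comparison used for Figure 11 can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the authors' analysis script: the helper names are hypothetical, and `statistics.quantiles` with `n=10` is just one common way to estimate the 10th and 90th percentiles of the measured cycle counts.

```python
import statistics
from typing import Sequence, Tuple


def interval_10_90(samples: Sequence[float]) -> Tuple[float, float]:
    """Estimate the 10th and 90th percentiles of measured cycle counts."""
    cuts = statistics.quantiles(samples, n=10)  # nine decile cut points
    return cuts[0], cuts[-1]


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def indistinguishable(samples_a: Sequence[float],
                      samples_b: Sequence[float]) -> bool:
    """Coarse check: do the 10%-90% intervals of two block types overlap?"""
    return intervals_overlap(interval_10_90(samples_a), interval_10_90(samples_b))
```

Overlap of these intervals is only a coarse indicator; it says that most measurements of two block types fall in a shared range, not that their distributions are identical.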

Program Execution Time. As mentioned in §V-F, OBFSCURO enforces a property that a program continues executing until its number of executed code blocks reaches a fixed user-configured limit. In particular, OBFSCURO allows the user to define the total number of C-Pad executions a program should perform. If the program’s logic terminates before the configured number of C-Pad executions, OBFSCURO continues executing dummy code blocks to complete the number of C-Pad executions.

[Figure 13 plots: slowdown (times) of Obfuscuro-CMOV and Obfuscuro-AVX for (a) general programs and (b) OpenSSL, the latter plotted against the number of encryptions.]

Fig. 13: Performance benchmarks from our test applications. The average performance overhead of OBFSCURO-CMOV is 83× and for OBFSCURO-AVX is 51×.

In order to show that this results in a uniform execution time irrespective of the target program being executed, we performed an experiment on a diverse set of applications as shown in Figure 12. In the experiment, we fixed the total number of C-Pad executions for each of these applications to 30,000 and measured the total execution time. We accumulated 50 executions for each program and plotted their distributions. As shown in the figure, the ranges of total execution times for the chosen evaluation set largely overlap, despite the computational diversity of these applications. The reason for this is that each C-Pad execution, as shown above, is bounded at very similar execution times irrespective of the underlying CPU instructions. Therefore, it is expected that the program execution time (with the same number of C-Pad executions) will also be very similar.
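The fixed-length execution rule can be sketched in a few lines of Python. This is a minimal model, not OBFSCURO's runtime: `run_padded` and its parameters are hypothetical names, code blocks are modeled as plain callables, and raising an error when the program needs more executions than the limit is an assumption of this sketch, since this section does not describe that case.

```python
from typing import Callable, Iterable


def run_padded(program: Iterable[Callable[[], None]],
               dummy_block: Callable[[], None],
               limit: int) -> int:
    """Run the real blocks, then dummy blocks, until exactly `limit` have run."""
    executed = 0
    for block in program:
        if executed >= limit:
            # Assumption: the behaviour for an undersized limit is not specified here.
            raise RuntimeError("configured limit is smaller than the program needs")
        block()
        executed += 1
    while executed < limit:
        dummy_block()  # padding keeps the total number of executions constant
        executed += 1
    return executed
```

With a limit of 30,000, as in the experiment above, every program would perform the same number of C-Pad executions in this model regardless of how early its real logic finishes.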

VIII. PERFORMANCE EVALUATION

In this section, we report a detailed performance benchmark through both micro-benchmarking custom applications and macro-benchmarking by running OpenSSL [20].

Experimental Setup. All our evaluations were performed on an Intel(R) Core(TM) i7-6700K CPU @ 3.40GHz (Skylake with 8 MB cache, 8 cache-slices and 16-way set-associativity) with 64 GB RAM (128 MB for EPC). Our system ran Ubuntu 16.04 with Linux 4.4.0.59 64-bit. We performed our experiments using the Intel SGX SDK [1] and the Intel SGX drivers [34]. Due to the current unavailability of AVX-512 for SGX-enabled computers, most of our experiments (having large code and data sizes) used the CMOV-based stash. However, we experimented with AVX2 registers to find the expected benefit of using the register-based stash and have accordingly simulated the performance improvement achieved by the register-based stash on our target applications.

1) Micro-Evaluation: Firstly, we provide a detailed performance evaluation by running several programs with OBFSCURO. Next, we show the performance improvement achieved by the novel register-based stash designed by OBFSCURO.

Benchmarks. We ported simple benchmarking applications to OBFSCURO in order to show the feasibility of obfuscated execution using commodity hardware such as Intel SGX. In particular, we ported a diverse set of applications, from simple applications like finding the maximum within a given array to complex binary searching.

| Data Size (Bytes) | CMOV (cycles) | AVX (cycles) | Improvement |
|---|---|---|---|
| 1,024 | 272M | 206M | 32% |
| 2,048 | 521M | 388M | 34% |
| 4,096 | 1,044M | 741M | 41% |
| 8,192 | 2,050M | 1,481M | 38% |

Fig. 14: Performance improvement achieved by using the AVX register extensions as the ORAM stash compared to CMOV-based stash.

Figure 13-(a) shows the performance of OBFSCURO while running the test set of applications described above. We also simulate the performance of OBFSCURO-AVX (the version of OBFSCURO which uses the register-based stash). These simulated results are based on the experiments we performed on AVX2. In general, the performance overhead of OBFSCURO-CMOV is on average 83× and that of OBFSCURO-AVX is 51×. The performance overhead of OBFSCURO is expected since it has to cater to the plethora of side-channels plaguing Intel SGX. In no particular order, the overhead is attributed to: (a) code access control, especially dealing with branch-alignment, (b) data access normalization, and (c) side-channel-resistant ORAM-based access inside Intel SGX.

Comparison: CMOV-based vs Register-based Stash. In the coming paragraphs, we provide a comparison of the CMOV-based stash versus the register-based stash. We attempt to answer the question: what is the performance benefit attained by using the register-based stash over the CMOV-based stash? One caveat is that all our experiments are based on the AVX2 registers, although we expect the performance benefits to be similar when using the AVX-512 registers. Figure 14 illustrates the performance benefit achieved by AVX extensions over CMOV while accessing data of variable size through ORAM. Since the register-based stash performs just a single oblivious access onto the AVX registers, it outperforms the CMOV-based stash. The average improvement is around 30-40%.
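As a worked check of the numbers in Figure 14, the short Python sketch below recomputes the Improvement column from the two cycle counts. The formula (CMOV cycles divided by AVX cycles, minus one) is inferred from the reported values rather than stated in the text, so it should be read as an assumption; the variable and function names are likewise illustrative.

```python
# Rows from Fig. 14: (data size in bytes, CMOV cycles in millions, AVX cycles in millions)
FIG14_ROWS = [
    (1024, 272, 206),
    (2048, 521, 388),
    (4096, 1044, 741),
    (8192, 2050, 1481),
]


def improvement_percent(cmov_cycles: float, avx_cycles: float) -> float:
    """Assumed definition: how much longer the CMOV-based stash takes, in percent."""
    return (cmov_cycles / avx_cycles - 1.0) * 100.0


improvements = [improvement_percent(cmov, avx) for _, cmov, avx in FIG14_ROWS]
rounded = [round(value) for value in improvements]      # matches the table column
average = sum(improvements) / len(improvements)         # falls in the 30-40% range
```

Under this assumed definition the rounded values reproduce the four reported percentages, and their mean lies within the quoted 30-40% range.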

2) Macro-Evaluation: In order to show how real-world applications perform with OBFSCURO, we provide a case study with a real-world application, OpenSSL [20]. Figure 13-(b) shows the result of our evaluations using OpenSSL with and without OBFSCURO. In this experiment, we perform a variable number of consecutive encryptions and compare the results. As the number of encryptions increases, the difference between the performance of OBFSCURO and native execution also increases. The reason for this is that OBFSCURO has to perform a fixed number of ORAM operations, which adds significant overhead per encryption, whereas the per-encryption overhead of native execution is very small.

IX. DISCUSSION

Comparison with Cryptographic Schemes. There have been extensive research thrusts in cryptographic circles to achieve secure remote computation. In line with what OBFSCURO provides, we discuss the following two security properties: 1) computational confidentiality (i.e., whether a computational artifact of a target program can be known to the untrusted party) and 2) integrity (i.e., whether the integrity of program execution can be disrupted by the untrusted party). Towards these security properties, we focus our discussion on theoretical program obfuscation techniques, which construct a virtual blackbox (VBB) using various techniques. We note that, unlike OBFSCURO, which performs hardware-assisted secure remote computation, theoretical program obfuscation techniques [12, 17, 26] do not rely on specific architectural characteristics and thus are designed to be generally resistant to memory-based side-channel attacks.

More specifically, two well-known cryptographic primitives, fully-homomorphic encryption and garbled circuits, are generally used for program obfuscation, but both of them are limited in terms of either performance or generality. In the case of fully-homomorphic encryption [21], its performance overhead is on the scale of twelve orders of magnitude in string search [51], severely impeding its adoption in practice today (and in the near future). Moreover, while fully-homomorphic encryption ensures confidentiality of its inputs, its construction does not ensure integrity. In the case of garbled circuits [22], the performance overhead is about four orders of magnitude for circuit evaluation alone. Moreover, garbled circuits cannot be used for generic programs (i.e., a loop structure in a program cannot be supported and thus the loop should be completely unrolled), and integrity cannot be guaranteed, similar to fully-homomorphic encryption. For both of these solutions, verifiable computing techniques can be adopted to provide integrity, but verifiable computing itself imposes huge performance overheads (i.e., about 10^4 times [65]).

Compared to these theoretical solutions, OBFSCURO efficiently achieves confidentiality and integrity, leveraging the memory protection and remote attestation mechanisms of SGX. From the performance perspective, OBFSCURO is a much more practical solution as it imposes two orders of magnitude performance overhead, as opposed to twelve and four orders in the case of fully-homomorphic encryption and circuit representation, respectively. OBFSCURO also supports generic programs, as it still retains the form of the host-architecture instructions in conjunction with secured ORAM, and the program is automatically transformed at the compiler level.

Protecting Input/Output. Traditional program obfuscation assumes that the attacker has oracle-like access to the obfuscated program. Therefore, the assumption is that the attacker can provide any input that he/she desires and get the corresponding correct output from the obfuscated program. However, this assumption can be further relaxed to protect the input/output of the program. In that case, the scenario could be that of a cloud provider who hosts an enclave on his/her machine but cannot directly access the enclave because the enclave requires prior authentication. In this case, OBFSCURO can be further leveraged to guarantee that an attacker does not figure out the input/output either. For the input, since it is not controlled by OBFSCURO, we assume that the user of the enclave will provide us a fixed-length encrypted memory buffer to extract the input from. OBFSCURO will execute for a fixed time T based on the input and extract a fixed-size output from the D-Tree at the end of T. Then, OBFSCURO will encrypt this data and send it back to the user. During all this, the attacker knows that the program is executing but is completely blind to what the execution might be.

Potential Applications. There are various potential applications for OBFSCURO, ranging from protection of a developer’s intellectual property from theft to hiding exploitable vulnerabilities in programs. Firstly, OBFSCURO can ensure that machine learning services that require huge computing resources, and therefore cloud machines, can safely outsource their computational load. For example, companies like 23andMe [2] want to outsource genomic analysis but also want to stay ahead of the competition by preventing the theft of their algorithm. Another potential application is in the prevention of vulnerability exploitation. For example, an attacker can extract information about the open-source libraries used by the program and perform various code-reuse exploits against the program. In both cases, obfuscated memory traces of program execution provided by OBFSCURO will be meaningless to the attacker.

General side-channel defense for SGX. OBFSCURO can be utilized as a general-purpose side-channel defense, whose main objective is to protect an input of a target program from attackers. The attackers usually exploit unique memory access patterns leaked from side channels consisting of caches, page faults, and the branch predictor [8, 24, 40, 57, 70]. Because OBFSCURO is specially designed to remain obfuscated from these channels, OBFSCURO can protect the target program, utilizing its obfuscated execution. For that purpose, we can relax the obfuscation problem to a side channel defense problem, and apply the obfuscation mechanism to the subset of program functions which are sensitive, resulting in a performance improvement.

Other Use-cases of OBFSCURO. Our current design for an oblivious execution framework is SGX-specific. However, we believe its design characteristics and optimization techniques are general and can be applied to other trusted platforms such as AEGIS [64], Ascend [19], XOM [41], Bastion [9], and Sanctum [14]. For example, our register-based stash (§V-B1) can be considered a generic optimization for ORAM if the underlying trust architecture shares any memory-related subsystem such as cache, TLB, MMU, and DRAM.

X. RELATED WORK

SGX Based Systems. Haven [6], Graphene [66, 67] and Panoply [61] provide a LibOS for SGX, which enables easier application porting and prevents Iago attacks [10]. OpenSGX [36] provides an open research framework for running SGX applications. VC3 [56] provides oblivious data analytical algorithms such as MapReduce [15]. SGX-Shield [58] performs fine-grained ASLR within SGX environments. Some of OBFSCURO’s design schemes, particularly how OBFSCURO breaks a program into smaller ORAM-compatible blocks, have been inspired by SGX-Shield. Ryoan [30] provides a secure framework to port Native Client (NaCl) [71] to Intel SGX. SCONE [4] provides performance optimizations and ports containers within SGX. Eleos [49] provides a framework to use non-enclave space to improve enclave performance. Glamdring [42] provides automatic partitioning within enclave programs. These systems do not consider side-channel issues within SGX and can be used together with OBFSCURO.

Attacks on Intel SGX. SGX is vulnerable to both page fault [70] and page table [68] attacks. Recent works [8, 24, 57] have shown that cache-based attacks are possible against an SGX enclave. SGX has also been found to be vulnerable to the branch-prediction attack [40]. Wang et al. [69] provide an overview of the attack vectors against SGX and the limitations of current defense solutions.

Defenses with Intel SGX. There have been various defenses [59, 60] proposed against the page table attacks. T-SGX [59] uses Transactional Memory (TSX) to run a program. However, T-SGX is vulnerable to the improved controlled channel attack [68]. Cloak [27] also utilizes TSX as a defense primitive, but it only considers cache side channel attacks, and it is limited by the cache size of the CPU. Another work [60] provides a way to prevent page faults from the OS-level attacker by periodically modifying the program’s memory access patterns. Ohrimenko et al. [48] show how to re-adapt ML algorithms to exhibit data-oblivious memory access patterns. For cache-based attacks, the previous solutions [13, 37, 72] for non-SGX environments are not directly applicable since most of them require OS-level support. Compared to these defenses, OBFSCURO proposes a generic security framework against all memory-based side-channel attacks. Obliviate [3] and ZeroTrace [55] provide access to files and data structures respectively using secure ORAM implementations. Compared to OBFSCURO, their scope of protection is limited, i.e., files and data arrays respectively.

Hardware and Software-based Oblivious Systems. Previous work has alluded to the concept of creating oblivious systems based on custom hardware [18, 46, 47], software-level defenses [45, 52], or a hybrid of the two [44]. All aforementioned systems use variants of ORAM [23] to achieve oblivious execution. Out of all these works, HOP [47] and Phantom [46] are the most similar to OBFSCURO. However, both Phantom and HOP use RISC-V processors to implement secure ORAM controllers, while OBFSCURO runs on commodity trusted hardware.

XI. CONCLUSION

This paper presents OBFSCURO, the first system which provides program obfuscation using commodity trusted hardware. OBFSCURO systematically protects the SGX enclave against information leakage through all side-channels, thereby neutralizing all memory and timing footprints to create a virtual blackbox for obfuscated program execution. Our evaluation shows that OBFSCURO can provide strong obfuscation guarantees within Intel SGX while performing much faster than existing cryptographic schemes and being more deployment-friendly than existing system-based solutions.

REFERENCES

[1] 01org. 2016. Intel(R) Software Guard Extensions for Linux* OS (source code). (2016). https://github.com/01org/linux-sgx

[2] 23andme. 2018. 23andme. (2018). https://www.23andme.com

[3] Adil Ahmad, Kyungtae Kim, Muhammad Ihsanulhaq Sarfaraz, and Byoungyoung Lee. 2018. OBLIVIATE: A Data Oblivious File System for Intel SGX. In Proceedings of the 2018 Annual Network and Distributed System Security Symposium (NDSS). San Diego, CA.

[4] Sergei Arnautov, Bohdan Trach, Franz Gregor, Thomas Knauth, Andre Martin, Christian Priebe, Joshua Lind, Divya Muthukumaran, Dan O’Keeffe, Mark Stillwell, et al. 2016. SCONE: Secure Linux Containers with Intel SGX. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Savannah, GA.

[5] Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil Vadhan, and Ke Yang. 2001. On the (im)possibility of obfuscating programs. In Annual International Cryptology Conference. Springer.

[6] Andrew Baumann, Marcus Peinado, and Galen Hunt. 2014. Shielding Applications from an Untrusted Cloud with Haven. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Broomfield, Colorado.

[7] Nir Bitansky, Ran Canetti, Shafi Goldwasser, Shai Halevi, Yael Tauman Kalai, and Guy N Rothblum. 2011. Program obfuscation with leaky hardware. In International Conference on the Theory and Application of Cryptology and Information Security. Springer, 722–739.

[8] Ferdinand Brasser, Urs Müller, Alexandra Dmitrienko, Kari Kostiainen, Srdjan Capkun, and Ahmad-Reza Sadeghi. 2017. Software Grand Exposure: SGX Cache Attacks Are Practical. In 11th USENIX Workshop on Offensive Technologies (WOOT 17). Vancouver, BC.

[9] David Champagne and Ruby B Lee. 2010. Scalable architectural support for trusted software. In High Performance Computer Architecture (HPCA), 2010 IEEE 16th International Symposium on. IEEE, 1–12.

[10] Stephen Checkoway and Hovav Shacham. 2013. Iago Attacks: Why the System Call API is a Bad Untrusted RPC Interface. In Proceedings of the 18th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Houston, TX.

[11] Kai-Min Chung, Jonathan Katz, and Hong-Sheng Zhou. 2013. Functional encryption from (small) hardware tokens. In International Conference on the Theory and Application of Cryptology and Information Security. Springer, 120–139.

[12] Kai-Min Chung, Jonathan Katz, and Hong-Sheng Zhou. 2013. Functional Encryption from (Small) Hardware Tokens. In Advances in Cryptology - ASIACRYPT 2013. Springer Berlin Heidelberg, Berlin, Heidelberg, 120–139.

[13] Bart Coppens, Ingrid Verbauwhede, Koen De Bosschere, and Bjorn De Sutter. 2009. Practical mitigations for timing-based side-channel attacks on modern x86 processors. In Proceedings of the 30th IEEE Symposium on Security and Privacy (Oakland). Oakland, CA.

[14] Victor Costan, Ilia A Lebedev, and Srinivas Devadas. 2016. Sanctum: Minimal Hardware Extensions for Strong Software Isolation. In Proceedings of the 25th USENIX Security Symposium (Security). Austin, TX.

[15] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107–113.

[16] Nico Döttling, Thilo Mie, Jörn Müller-Quade, and Tobias Nilges. 2011. Basing Obfuscation on Simple Tamper-Proof Hardware Assumptions. IACR Cryptology ePrint Archive 2011 (2011), 675.

[17] Nico Döttling, Thilo Mie, Jörn Müller-Quade, and Tobias Nilges. 2011. Basing Obfuscation on Simple Tamper-Proof Hardware Assumptions. 2011 (01 2011), 675.

[18] Christopher W. Fletcher, Marten van Dijk, and Srinivas Devadas. 2012. A Secure Processor Architecture for Encrypted Computation on Untrusted Programs. In Proceedings of the Seventh ACM Workshop on Scalable Trusted Computing (STC ’12).

[19] Christopher W Fletcher, Marten van Dijk, and Srinivas Devadas. 2012. A secure processor architecture for encrypted computation on untrusted programs. In Proceedings of the seventh ACM workshop on Scalable trusted computing. ACM, 3–8.

[20] OpenSSL Software Foundation. 2017. OpenSSL: Cryptography and SSL/TLS Toolkit. https://www.openssl.org/. (2017).

[21] Craig Gentry. 2009. Fully Homomorphic Encryption Using Ideal Lattices. In Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing (STOC ’09). ACM, New York, NY, USA, 169–178. https://doi.org/10.1145/1536414.1536440

[22] Oded Goldreich. 2003. Cryptography and cryptographic protocols. Distributed Computing (2003), 177–199.

[23] Oded Goldreich and Rafail Ostrovsky. 1996. Software protection and simulation on oblivious RAMs. Journal of the ACM (JACM) 43, 3 (1996), 431–473.

[24] Johannes Götzfried, Moritz Eckert, Sebastian Schinzel, and Tilo Müller. 2017. Cache Attacks on Intel SGX. In EUROSEC. 2–1.

[25] Vipul Goyal, Yuval Ishai, Amit Sahai, Ramarathnam Venkatesan, and Akshay Wadia. 2010. Founding cryptography on tamper-proof hardware tokens. In Theory of Cryptography Conference. Springer, 308–326.

[26] Vipul Goyal, Yuval Ishai, Amit Sahai, Ramarathnam Venkatesan, and Akshay Wadia. 2010. Founding Cryptography on Tamper-Proof Hardware Tokens. In Theory of Cryptography, Daniele Micciancio (Ed.). 308–326.

[27] Daniel Gruss, Julian Lettner, Felix Schuster, Olya Ohrimenko, Istvan Haller, and Manuel Costa. 2017. Strong and efficient cache side-channel protection using hardware transactional memory. In USENIX Security Symposium. 217–233.

[28] Shay Gueron. 2016. A Memory Encryption Engine Suitable for General Purpose Processors. (2016).

[29] Satoshi Hada. 2000. Zero-knowledge and code obfuscation. In International Conference on the Theory and Application of Cryptology and Information Security. Springer.

[30] Tyler Hunt, Zhiting Zhu, Yuanzhong Xu, Simon Peter, and Emmett Witchel. 2016. Ryoan: A Distributed Sandbox for Untrusted Computation on Secret Data. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Savannah, GA.

[31] Intel. 2014. Overview: Intrinsics for Intel (R) Advanced Vector Extensions 2 (Intel(R) AVX2) Instructions. (2014). https://software.intel.com/en-us/node/523876

[32] Intel. 2017. Intel (R) Core (TM) i9-7970X Processor. (2017). https://ark.intel.com/products/126697/Intel-Core-i9-7960X-X-series-Processor-22M-Cache-up-to-420-GHz

[33] Intel. 2017. Intel (R) Core (TM) i9-7980XE Extreme Edition Processor. (2017). https://ark.intel.com/products/126699/Intel-Core-i9-7980XE-Extreme-Edition-Processor-2475M-Cache-up-to-420-GHz

[34] Intel. 2017. Intel SGX Linux* Driver. (2017). https://github.com/01org/linux-sgx-driver

[35] Intel. 2018. Intel Analysis of Speculative Execution Side Channels. (2018). https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf

[36] Prerit Jain, Soham Desai, Seongmin Kim, Ming-Wei Shih, JaeHyuk Lee, Changho Choi, Youjung Shin, Taesoo Kim, Brent B. Kang, and Dongsu Han. 2016. OpenSGX: An Open Platform for SGX Research. In Proceedings of the 2016 Annual Network and Distributed System Security Symposium (NDSS). San Diego, CA.

[37] Taesoo Kim, Marcus Peinado, and Gloria Mainar-Ruiz. 2012. STEALTHMEM: System-Level Protection Against Cache-Based Side Channel Attacks in the Cloud. In Proceedings of the 21st USENIX Security Symposium (Security). Bellevue, WA.

[38] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. 2019. Spectre Attacks: Exploiting Speculative Execution. In Proceedings of the 40th IEEE Symposium on Security and Privacy (Oakland). San Francisco, CA.

[39] Dmitrii Kuvaiskii, Oleksii Oleksenko, Sergei Arnautov, Bohdan Trach, Pramod Bhatotia, Pascal Felber, and Christof Fetzer. 2017. SGXBOUNDS: Memory Safety for Shielded Execution. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys ’17). ACM.

[40] Sangho Lee, Ming-Wei Shih, Prasun Gera, Taesoo Kim, Hyesoon Kim, and Marcus Peinado. 2017. Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing. In Proceedings of the 26th USENIX Security Symposium (Security). Vancouver, BC.

[41] David Lie, Chandramohan Thekkath, Mark Mitchell, Patrick Lincoln, Dan Boneh, John Mitchell, and Mark Horowitz. 2000. Architectural support for copy and tamper resistant software. ACM SIGPLAN Notices 35, 11 (2000), 168–177.

[42] Joshua Lind, Christian Priebe, Divya Muthukumaran, Dan O'Keeffe, Pierre-Louis Aublin, Florian Kelbert, Tobias Reiher, David Goltzsche, et al. 2017. Glamdring: Automatic application partitioning for Intel SGX. In Proceedings of the 2017 USENIX Annual Technical Conference (ATC). Santa Clara, CA.

[43] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. 2018. Meltdown: Reading Kernel Memory from User Space. In 27th USENIX Security Symposium (USENIX Security 18).

[44] Chang Liu, Austin Harris, Martin Maas, Michael Hicks, Mohit Tiwari, and Elaine Shi. 2015. Ghostrider: A hardware-software system for memory trace oblivious computation. ACM SIGARCH Computer Architecture News 43, 1 (2015), 87–101.

[45] Chang Liu, Michael Hicks, and Elaine Shi. 2013. Memory trace oblivious program execution. In Computer Security Foundations Symposium (CSF), 2013 IEEE 26th. IEEE, 51–65.

[46] Martin Maas, Eric Love, Emil Stefanov, Mohit Tiwari, Elaine Shi, Krste Asanovic, John Kubiatowicz, and Dawn Song. 2013. Phantom: Practical oblivious computation in a secure processor. In Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS). Berlin, Germany.

[47] Kartik Nayak, Christopher Fletcher, Ling Ren, Nishanth Chandran, Satya Lokam, Elaine Shi, and Vipul Goyal. 2017. Hop: Hardware makes obfuscation practical. In Proceedings of the 2017 Annual Network and Distributed System Security Symposium (NDSS). San Diego, CA.

[48] Olga Ohrimenko, Felix Schuster, Cédric Fournet, Aastha Mehta, Sebastian Nowozin, Kapil Vaswani, and Manuel Costa. 2016. Oblivious Multi-Party Machine Learning on Trusted Processors. In Proceedings of the 25th USENIX Security Symposium (Security). Austin, TX.

[49] Meni Orenbach, Pavel Lifshits, Marina Minkin, and Mark Silberstein. 2016. Eleos: ExitLess OS Services for SGX Enclaves. In Proceedings of the 12th European Conference on Computer Systems (EuroSys). Belgrade, Serbia.

[50] Dag Arne Osvik, Adi Shamir, and Eran Tromer. 2006. Cache attacks and countermeasures: the case of AES. In Cryptographers' Track at the RSA Conference. Springer, 1–20.

[51] Raluca Ada Popa, Catherine Redfield, Nickolai Zeldovich, and Hari Balakrishnan. 2011. CryptDB: protecting confidentiality with encrypted query processing. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM, 85–100.

[52] Ashay Rane, Calvin Lin, and Mohit Tiwari. 2015. Raccoon: closing digital side-channels through obfuscated execution. In Proceedings of the 24th USENIX Security Symposium (Security). Washington, DC.

[53] James Reinders. 2013. AVX-512 instructions. Intel Corporation (2013).

[54] Eric Rescorla. 1999. Diffie-hellman key agreement method. (1999).

[55] Sajin Sasy, Sergey Gorbunov, and Christopher W. Fletcher. 2018. ZeroTrace: Oblivious Memory Primitives from Intel SGX. In Proceedings of the 2018 Annual Network and Distributed System Security Symposium (NDSS). San Diego, CA.

[56] Felix Schuster, Manuel Costa, Cédric Fournet, Christos Gkantsidis, Marcus Peinado, Gloria Mainar-Ruiz, and Mark Russinovich. 2015. VC3: Trustworthy data analytics in the cloud using SGX. In Proceedings of the 36th IEEE Symposium on Security and Privacy (Oakland). San Jose, CA.

[57] Michael Schwarz, Samuel Weiser, Daniel Gruss, Clémentine Maurice, and Stefan Mangard. 2017. Malware guard extension: Using SGX to conceal cache attacks. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 3–24.

[58] Jaebaek Seo, Byoungyoung Lee, Seongmin Kim, Ming-Wei Shih, Insik Shin, Dongsu Han, and Taesoo Kim. 2017. SGX-Shield: Enabling address space layout randomization for SGX programs. In Proceedings of the 2017 Annual Network and Distributed System Security Symposium (NDSS). San Diego, CA.

[59] Ming-Wei Shih, Sangho Lee, Taesoo Kim, and Marcus Peinado. 2017. T-SGX: Eradicating Controlled-Channel Attacks Against Enclave Programs. In Proceedings of the 2017 Annual Network and Distributed System Security Symposium (NDSS). San Diego, CA.

[60] S Shinde, ZL Chua, V Narayanan, and P Saxena. 2016. Preventing your faults from telling your secrets. In Proceedings of the 11th ACM Symposium on Information, Computer and Communications Security (ASIACCS). Xi’an, China.

[61] Shweta Shinde, Dat Le Tien, Shruti Tople, and Prateek Saxena. 2017. PANOPLY: Low-TCB Linux Applications With SGX Enclaves. In Proceedings of the 2017 Annual Network and Distributed System Security Symposium (NDSS). San Diego, CA.

[62] Rohit Sinha, Sriram Rajamani, and Sanjit A. Seshia. 2017. A Compiler and Verifier for Page Access Oblivious Computation. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017). ACM.

[63] Emil Stefanov, Marten van Dijk, Elaine Shi, Christopher Fletcher, Ling Ren, Xiangyao Yu, and Srinivas Devadas. 2013. Path ORAM: An Extremely Simple Oblivious RAM Protocol. In Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS). Berlin, Germany.

[64] G Edward Suh, Charles W O’Donnell, and Srinivas Devadas. 2005. AEGIS: A single-chip secure processor. Information Security Technical Report 10, 2 (2005), 63–73.

[65] Justin Thaler. 2017. Verifiable Computing: Between Theory and Practice. (2017). https://people.eecs.berkeley.edu/~alexch/docs/pcpipthaler.pdf

[66] Chia-Che Tsai, Kumar Saurabh Arora, Nehal Bandi, Bhushan Jain, William Jannen, Jitin John, Harry A Kalodner, Vrushali Kulkarni, Daniela Oliveira, and Donald E Porter. 2014. Cooperation and security isolation of library OSes for multi-process applications. In Proceedings of the 9th European Conference on Computer Systems (EuroSys). Amsterdam, The Netherlands.

[67] Chia-Che Tsai, Donald E Porter, and Mona Vij. 2017. Graphene-SGX: A practical library OS for unmodified applications on SGX. In 2017 USENIX Annual Technical Conference (USENIX ATC).

[68] Jo Van Bulck, Nico Weichbrodt, Rüdiger Kapitza, Frank Piessens, and Raoul Strackx. 2017. Telling your secrets without page faults: Stealthy page table-based attacks on enclaved execution. In Proceedings of the 26th USENIX Security Symposium (Security). Vancouver, BC.

[69] Wenhao Wang, Guoxing Chen, Xiaorui Pan, Yinqian Zhang, XiaoFeng Wang, Vincent Bindschaedler, Haixu Tang, and Carl A Gunter. 2016. Leaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX. In Proceedings of the 24th ACM Conference on Computer and Communications Security (CCS). Dallas, TX.

[70] Yuanzhong Xu, Weidong Cui, and Marcus Peinado. 2015. Controlled-channel attacks: Deterministic side channels for untrusted operating systems. In Proceedings of the 36th IEEE Symposium on Security and Privacy (Oakland). San Jose, CA.

[71] Bennet Yee, David Sehr, Gregory Dardyk, J Bradley Chen, Robert Muth, Tavis Ormandy, Shiki Okasaka, Neha Narula, and Nicholas Fullagar. 2009. Native client: A sandbox for portable, untrusted x86 native code. In Proceedings of the 30th IEEE Symposium on Security and Privacy (Oakland). Oakland, CA.

[72] Ziqiao Zhou, Michael K Reiter, and Yinqian Zhang. 2016. A software approach to defeating side channels in last-level caches. In Proceedings of the 23rd ACM Conference on Computer and Communications Security (CCS). Vienna, Austria.
