
SecureCloud
Joint EU-Brazil Research and Innovation Action
SECURE BIG DATA PROCESSING IN UNTRUSTED CLOUDS
https://www.securecloudproject.eu/

Services for trust management for secure resources
D2.3

Due date: 30 June 2018
Submission date: 3 July 2018
Start date of project: 1 January 2016
Document type: Deliverable
Work package: WP2
Editor: Andrey Brito (UFCG)
Reviewer: Christof Fetzer (TUD)
Reviewer: Hanneli Tavante, Maurílio Coutinho (UNIFEI)

Dissemination Level
PU  Public
√ CO  Confidential, only for members of the consortium (including the Commission Services)
CI  Classified, as referred to in Commission Decision 2001/844/EC

SecureCloud has received funding from the European Union's Horizon 2020 research and innovation programme and was supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under grant agreement No 690111.

Tasks related to this deliverable:

Task No.   Task description                                      Partners involved°
T2.3       Services for trust management for secure resources    UFCG*, TUD, IMP, SYNC, CS

° This task list may not be equivalent to the list of partners contributing as authors to the deliverable.
* Task leader

Contents

1 Introduction
2 Background
  2.1 Intel SGX
    2.1.1 Components
    2.1.2 Usage
    2.1.3 Limitations
  2.2 Managing a cloud infrastructure
    2.2.1 OpenStack services and architecture
    2.2.2 Provisioning virtual machines in OpenStack
  2.3 SGX virtualization with KVM
  2.4 SGX-enabled virtual machines using Nova
    2.4.1 The provisioning process
    2.4.2 The scheduling process
  2.5 SCONE: Secure Container Environment
3 CAS Access for external Users
  3.1 Changes made on OpenStack Keystone
  3.2 How the CASaaS works
  3.3 Validation of the CASaaS
4 Demonstrator
  4.1 Creating a private CAS (CAS Host)
    4.1.1 Create a security group for the CAS Host
    4.1.2 Create a CAS Host
    4.1.3 Install the Intel SGX driver on the instance
    4.1.4 Run CAS
  4.2 Installing the Patched Keystone (Keystone Container)
  4.3 Installing CASaaS (OSA Controller)
  4.4 Running the example (Local Machine)
5 Final remarks

List of Figures

2.1 Architecture of a production installation of OpenStack
3.1 Successful benchmark.
3.2 Failed benchmark.
4.1 LSD Secure Cloud login screen.
4.2 Create a security group.
4.3 Manage security group rules.
4.4 private-cas rules.
4.5 Create an instance.
4.6 Name your instance.
4.7 Select the instance image.
4.8 Select the instance flavor.
4.9 Select the instance network.
4.10 Network ports are not needed.
4.11 Select the instance security group.
4.12 Select the instance key pair.
4.13 Configuration is not needed.
4.14 Server groups are not needed.
4.15 Scheduler hints are not needed.
4.16 Launch the instance.

1 Introduction

In the SecureCloud project, we aim to enable the secure execution of big data applications within untrusted cloud environments. Although the project's use cases originate from the domain of smart grids, the developed SecureCloud solutions are generic. Thus, they address a broad range of security requirements that are also applicable in many other scenarios. These requirements include confidentiality and integrity of sensitive data, as well as the preservation of the typical tools and system interfaces, as much as possible, to ease the adoption process.

The smart grid use case has been selected because, as with other application domains, it is also affected by the increasing amount of data that is generated by a large range of device types (e.g., meters, sensors in the distribution or in the transmission systems).

In such a context, we need technologies that help us provide access control to certain features in a secure and efficient way. With this in mind, the SecureCloud project makes use of state-of-the-art technologies such as OpenStack [3] and SCONE [8] to build tools and extensions that provide services for trust management for secure resources.

The goal of this document is to serve as a tutorial for using the demonstrator (Deliverable 2.3) as well as to explain the changes made to OpenStack.

In this document, we present the solution implemented to enable role-based access control (RBAC) on top of the SecureCloud infrastructure services. More specifically, we want to use the standard OpenStack RBAC rules and configurations to control the access to the Configuration and Attestation Service (CAS). We start by detailing the changes needed on Keystone [4], the identity component in OpenStack, and then we take a look at how exactly the developed CAS-as-a-Service (CASaaS) works.

We organize this deliverable as follows. In Chapter 2, we review key background concepts, especially OpenStack, SGX virtualization with KVM, and SCONE. Then, in Chapter 3 we review the changes made on the Keystone service and summarize how the CASaaS works. Chapter 4 describes the actual demonstrator and provides a guide on how to experiment with it. Finally, Chapter 5 summarizes the contributions and lists some related next steps.


2 Background

In this chapter we review basic concepts relevant to the context of this demonstrator1. This includes a brief overview of Intel SGX and OpenStack, of SGX virtualization with KVM, and of SCONE.

    2.1 Intel SGX

Intel Software Guard eXtensions (SGX) is an Intel hardware-based technology for protecting sensitive data from disclosure or modification. It enables user-level code to allocate enclaves (i.e., private regions of memory) that are protected even from processes running at higher privilege levels. Intel SGX capabilities are available through a set of instructions introduced in off-the-shelf processors based on the Skylake microarchitecture, starting from the 6th Generation Intel Core family and some of the Xeon E3 v5 family.

    2.1.1 Components

An application of the Intel SGX technology typically2 requires four main components: (i) the availability of the set of instructions in the processor, (ii) the operating system driver, (iii) the software development kit to facilitate the access to the driver from the application code, and (iv) the Platform Software.

The Platform Software (Intel SGX PSW) is a collection of special SGX enclaves and an Intel SGX Application Enclave Services Manager (AESM), provided along with the SGX SDK. These special enclaves and the AESM are used when loading enclaves, retrieving cryptographic keys, and evaluating the contents of an enclave. The software development kit (SDK) is a collection of APIs, sample source code, libraries and tools that enable software developers to write and debug SGX applications in C/C++. Next, the drivers enable OSs and other software to access the SGX hardware. Intel SGX drivers are available both for Windows (via the Intel Management Engine) and for Linux OSs. Finally, the instruction set is composed of 17 new instructions that can be classified into the following functions [11]:

Enclave build/teardown: Used to allocate protected memory for the enclave, load values into the protected memory, measure the values loaded into the enclave's protected memory, and tear down the enclave after the application has completed. Instructions used for this purpose are:

    • ECREATE - Declare base and range, start build

    • EADD - Add 4k page

    • EEXTEND - Measure 256 bytes

    • EINIT - Declare enclave built

    • EREMOVE - Remove Page

Enclave entry/exit: Used to enter and exit the enclave. An enclave can be entered and exited explicitly. It may also be exited asynchronously due to interrupts or exceptions. In the case of asynchronous exits, the hardware will save all secrets inside the enclave, scrub secrets from registers, and return to the external program flow. Execution later resumes where it left off. Instructions used for this purpose are:

1Most of this chapter is an extract from the previous deliverable, D2.1, and has been included here to make the document sufficiently self-contained.

    2If an application accesses processor resources directly, the software development kit and platform software are optional.


• EENTER - Enter enclave
• ERESUME - Resume enclave
• EEXIT - Leave enclave
• AEX - Asynchronous enclave exit

Enclave security operations: Allow an enclave to prove to an external party that the enclave was built on hardware which supports the SGX instruction set. Instructions used for this purpose are:

• EREPORT - Enclave report
• EGETKEY - Generate unique key

Paging instructions: Allow system software to securely move enclave pages to and from unprotected memory. Instructions used for this purpose are:

• EPA - Create version array page
• ELDB/U - Load an evicted page into protected memory
• EWB - Evict a protected page
• EBLOCK - Prepare for eviction
• ETRACK - Prepare for eviction

Debug instructions: Allow developers to use familiar debugging techniques inside special debug enclaves. A debug enclave can be single-stepped and examined. A debug enclave cannot share data with a production enclave. This protects enclave developers if a debug enclave should escape the development environment. Instructions used for this purpose are:

• EDBGRD - Read inside debug enclave
• EDBGWR - Write inside debug enclave

In addition, some features of Intel SGX make it very useful for providing data security. The main aspects are discussed below:

Enclave Page Cache: The Enclave Page Cache (EPC) is a protected memory used to store enclave pages and SGX structures. The EPC is divided into 4KB chunks called EPC pages. EPC pages can either be valid or invalid. A valid EPC page contains either an enclave page or an SGX structure.

Each enclave instance has an enclave control structure, SECS. Every valid enclave page in the EPC belongs to exactly one enclave instance. System software is required to map enclave virtual addresses to a valid EPC page.

Memory Encryption Engine: The Memory Encryption Engine (MEE) is a hardware unit that encrypts and integrity-protects selected traffic between the processor package and the main memory (DRAM). The overall memory region that an MEE operates on is called an MEE Region. Depending on the implementation, the Processor Reserved Memory (PRM) is covered by one or more MEE regions. Intel SGX guarantees that all the data that leaves the CPU and is stored in DRAM is first encrypted using the MEE. Thus, even attackers with physical access to DRAM will not be able to retrieve secret data protected by SGX enclaves from it.


Memory Access Semantics: CPU memory protection mechanisms physically block access to the PRM from all external agents, by treating such accesses as references to non-existent memory. To access a page inside an enclave using MOV and other memory-related instructions, the hardware checks the following:

    • Logical processor is executing in "enclave mode".

    • Page belongs to enclave that the logical processor is executing.

    • Page accessed using the correct virtual address.

If any of these checks fails, the access is treated as a reference to non-existent memory or a fault is signaled. This guarantees that even processes with higher privilege levels will not be able to access enclave memory.

    2.1.2 Usage

Hoekstra et al. define three examples of secure solutions that have been developed to take advantage of the new instructions provided by Intel SGX [9]:

One-time Password (OTP): OTP is an authentication technology often used as a second factor to authenticate a user. As suggested by the name, the password is valid only for one authentication and is often used to authorize online financial transactions. There are two primary components in the architecture: the OTP server and the OTP client. In the case of the prototype developed for this work, the OTP client-side component is implemented as a browser plugin. Within the OTP client software, the algorithms that interact directly with the OTP secrets are placed in an enclave. The OTP server can then use the Remote Attestation mechanism to verify the enclave running on the client side and establish a secure communication channel, allowing it to send a pre-shared key to be used as an OTP.

Enterprise Rights Management: Enterprise Rights Management (ERM) is a technology that aims to secure crucial elements of access and distribution of sensitive documents, such as confidentiality, access control, usage policies, and logging of user activities. While most existing solutions focus on the protection of enterprise data, the need to enforce the authorized use and dissemination of personal content such as pictures and videos is becoming increasingly apparent. The same technologies could be used for this purpose as well.

If an attacker has physical possession of a platform, he may be able to use memory snooping or cold boot style attacks [10] to acquire the keying material for a valid ERM solution. This would permit the attacker to create malware which could use those stolen keys to effectively impersonate a valid ERM client. To avoid this kind of attack, SGX can be used, so that even if an attacker gains physical possession of the platform he will not be able to acquire the keying material.

Secure Video Conferencing: With the widespread availability of high network bandwidth and inexpensive hardware for capturing video and audio on client platforms, the use of video chat, video conferencing and web conferencing applications has become increasingly popular for real-time information sharing. This creates an opportunity for the unauthorized capture and distribution of a video conferencing stream by malicious individuals, or theft of valuable IP or sensitive information in enterprise and government sectors.

Today's secure video conferencing solutions provide strong protection of sensitive content on the network through the use of cryptographic methods. But with the migration of threats from the network onto the computing platform, this level of security is no longer sufficient to protect the AV stream as it is being processed on the computing device. SGX allows a video conferencing application to protect its assets on the platform and enables strong participant authentication, thus mitigating a broad range of threats that could compromise the secrecy and integrity of the AV stream.

All the above-mentioned examples provided by Intel are focused on the usage of SGX on client machines. Nevertheless, Intel SGX also has potential uses in server-side/backend applications. One example is VC3 [12], a system that allows users to run distributed MapReduce computations in the cloud while keeping their code and data secret, and ensuring the correctness and completeness of their results. VC3 runs on unmodified Hadoop, but crucially keeps Hadoop, the operating system and the hypervisor out of the trusted computing base (TCB); thus, confidentiality and integrity are preserved even if these large components are compromised. VC3 relies on SGX to isolate memory regions on individual computers, and to deploy new protocols that secure distributed MapReduce computations.

    2.1.3 Limitations

Before designing a new secure application using SGX, there are some limitations that need to be kept in mind by enclave developers when designing an enclave, in order to avoid having security flaws or big overheads due to memory swapping.

Memory size: When starting a machine, the SGX-capable processor needs to reserve a portion of memory to itself (the PRM). Also, the entire EPC must reside inside the PRM. In the current version of SGX, this portion of memory is limited to 128MB in size per machine. If more space is needed than what is available, a big overhead in processing time is added, due to the need to re-encrypt the data before swapping from EPC to DRAM and to re-encrypt the data again after swapping from DRAM to EPC. This re-encryption is needed since data in the EPC is cache-line encrypted, whereas in DRAM it is page-encrypted.

Programming languages: The SGX SDK provided by Intel is only compatible with C/C++. This leaves secure application developers with no choice over what programming language to use when writing an enclave's code.

Hardware dependency: Intel SGX is a hardware-based technology. Therefore, the SGX capabilities can only be used on machines that are SGX-capable and that have SGX enabled in the BIOS setup.

    2.2 Managing a cloud infrastructure

To provide an overview of how to manage a cloud infrastructure, we start by describing OpenStack's main components and continue with a description of how computing resources are provisioned.


    2.2.1 OpenStack services and architecture

In the context of cloud computing, the open-source OpenStack [3] platform has grown and stood out in recent years. Often offered as Infrastructure as a Service (IaaS), OpenStack is used for public, private or hybrid clouds and is maintained by an active community financed by more than 200 organizations, such as Red Hat, Cisco, Rackspace, and others. Companies choose to use this platform for a variety of reasons, as explained in the 2017 OpenStack User Survey [6], such as to increase operational efficiency, avoid vendor lock-in, and mainly to standardize on the same open platform APIs that power a global network of public and private clouds.

OpenStack components interrelate to control sets of physical and virtual resources, used for processing and storage of data, communicating via predefined internal and external networks. Users can manage an OpenStack cloud through a control panel available as a web service, with command-line tools, or via a REST API. One of its biggest appeals is the ability to create IaaS clouds using simple hardware, in small quantities, using completely open-source solutions, while still being able to use the resources in a sophisticated way.

Among the services available in OpenStack [5], some can be considered essential and some target specific use cases. The essential components are named core projects. Below, we mention the most relevant ones for a typical general-use OpenStack cloud:

Nova: Primary OpenStack computing and provisioning service. It is used to deploy and manage from small to large amounts of virtual and physical machines, offering vertical and horizontal scalability mechanisms. Nova interacts with Keystone for authentication, with Glance to retrieve images to be deployed on provisioned machines, and with Horizon for the user interface.

Swift: Storage system for general objects (often used for files that are not edited at the byte level), providing high availability and scalability. Developers can use this service to efficiently and inexpensively store large amounts of data. Swift also deals with concurrent access to files throughout the data set. It is ideal for storing unstructured data that can grow arbitrarily.

Cinder: A service to provide Block Storage for VMs, containers or bare-metal nodes. Its interface is homogeneous independently of the backend, which can use popular distributed access solutions in hardware (e.g., using drivers for storage systems from HPE, Lenovo, SolidFire, among others) or in software (using, for example, one's own implementation or others such as Ceph, the most popular one [6]).

Glance: Provides services related to the registration and retrieval of disk images for virtual and physical machines in OpenStack. Images available through Glance can be stored in different places, from simple file systems to systems for storage of objects such as the Swift service itself.

Neutron: Service that manages network connections between resources provided by OpenStack, as well as with the outside world. Neutron ensures user resources will be able to communicate with each other and with the Internet, and will have access to higher-level tiers such as firewalls or load balancers.


Keystone: Provides authentication and authorization services. For example, it controls access to the APIs and, therefore, access to users and resources. For authentication, it supports well-known systems such as LDAP, OAuth, OpenID Connect, SAML, and also simple solutions like storing credentials in internal MySQL databases.

Ceilometer: It aims to collect, normalize and transform cloud monitoring data. From these data it is possible to have a view of the resource usage of each project, helping cloud providers to better plan the deployment of new resources based on this usage and to define business strategies for the cloud. From the user perspective, it also helps to estimate the resources allocated to applications.

Heat: Orchestrates resources for cloud applications, based on templates. Such templates are written in text form, treated as code, and define which infrastructure is necessary for an application to be deployed. Heat also provides automated scaling services, integrated with the Ceilometer service, increasing or decreasing the nodes in a cluster based on the application needs defined in the template. Finally, it is also used by other OpenStack components, such as Magnum (to deploy container orchestration engines, such as Kubernetes) or Sahara (to deploy data processing clusters, such as Hadoop or Spark).

Horizon: It is a user interface with a control panel. This graphical interface is usually the first contact of the end users with the system. Cloud administrators make use of Horizon for visualization purposes, for instance, when considering resource usage.

Rally: It is a benchmarking service for OpenStack. It automates the process of validating and benchmarking an OpenStack cloud.

As the list of services above illustrates, OpenStack is an ecosystem of cooperating services. Many other projects are available and their maturity and adoption can be monitored in a central dashboard3.

In the main deployment of the SecureCloud infrastructure service, we aim to produce a deployment that mimics as much as possible what a production OpenStack cloud would look like. This cloud is illustrated in Figure 2.1. To help keep track of the availability and performance of the cloud, we use Rally [7]. The figure describes four types of nodes and five types of networks. The nodes can be described as follows:

• Deployment host: hosts the Ansible scripts4 that automate the installation and configuration of all the other nodes. It is a single node, as its unavailability does not compromise a running system. The services running in this node are listed in the Service Table 01, at the bottom of the figure, and include Rally and services for indexing logs generated by other nodes.

• Infra Nodes: three replicas that host the user dashboard, Horizon, the service cores and APIs, and the message queue used internally for communication between OpenStack components. The services running in these nodes are listed in the Service Table 02, at the bottom of the figure. By using three nodes, the core services and systems tolerate the failure of a node.

3 https://www.openstack.org/software/project-navigator/
4 Ansible is software that helps automate the configuration of other software (see https://www.ansible.com/). It is also broadly used in OpenStack deployments, with a large community of operators and developers that write Ansible scripts that help configure OpenStack components (see https://docs.openstack.org/openstack-ansible/latest/).

Figure 2.1: Architecture of a production installation of OpenStack

• Compute Nodes: each compute node hosts computing instances for the users. In the SecureCloud cloud we consider 4 types of resources: (i) regular KVM instances, which do not support SGX, but for which there are more hardware options available on the market; (ii) KVM machines that expose SGX to the VMs, as will be discussed in the next chapters; (iii) LXD nodes, for faster provisioning of computing resources with or without SGX; and (iv) Ironic nodes, which provision physical machines and for which the availability of SGX depends only on the hardware itself. The types of hypervisor are also listed in the Service Table 03, at the bottom of the figure. The number of nodes depends on the storage configuration and will be discussed below.

• OSD Nodes: the cloud uses a Ceph storage system5; in this case, the capacity and performance of the storage depend on the number of nodes and the configuration considered. In our case, we considered three servers for a storage cluster, each with a small (400 GB) PCIe SSD for journaling, as it considerably improves writing speed, and five larger spinning disks for capacity. With such a configuration, the cluster is able to sustain around 1 GB/s of reading speed and around 728 MB/s for data that is replicated in two nodes (e.g., short-term data such as the root filesystem of the cloud instances) and 588 MB/s for data replicated in all four nodes. For accesses to data that is replicated twice, using a 75% read and 25% write workload, the cluster sustains 8362 IO operations per second (IOPS), which for our purposes would enable supporting 170 instances with an average of 50 IOPS6.

5 https://ceph.com/
6 For comparison, a dedicated 7200 RPM disk can support around 66 IOPS (source: https://www.symantec.com/connect/articles/getting-hang-iops-v13).

    Regarding the networks, five are depicted in Figure 2.1, as detailed below:

• API Network: This is a moderate- to low-traffic network for private communication between OpenStack components (for example, a request for a node to create a VM). It connects all the components in the cloud.

• Storage Network: This is a high-traffic network where all the storage traffic (including the volumes, objects, images or, sometimes, even filesystem access) circulates. Its main purpose is to connect the storage (Ceph) nodes to the compute nodes.

• Administration Network: This is used for out-of-band management, such as administrative interfaces that help debug machine failures (e.g., through a remote console).

• Self-service Network: This is the network in which the user-defined networks are hosted. All user network data circulates through this network.

• OSD-sync Network: This is a private network that is used to synchronize multiple replicas of data blocks stored in the Ceph storage system. It connects only the storage nodes.

Currently, our cloud is split into two parts: the bare-metal nodes and the KVM-SGX compute nodes are controlled by a separate machine, due to the constant need for experimentation and changes in the code that runs on the nodes that control the cloud, the Infra Nodes.

    2.2.2 Provisioning virtual machines in OpenStack

The Nova [2] service is responsible for provisioning resources in an OpenStack cloud, providing on-demand access to large sets of virtual and physical machines. For this, virtualization technologies are used, such as the hypervisors KVM, QEMU, VMware, Xen, Hyper-V, and Linux LXC and LXD for containers.

To access the offered resources, the system defines which projects a user belongs to and which roles they can exercise. A project consists of a set of resources, such as volumes, instances, images and keys, among others, which can be managed by a set of users. Each project has access to a portion of the resources available in the cloud, limited by quotas. Finally, the roles that a user can have define the actions that they can perform as a member of a given project. In this case, rules are created to specify these limitations, if any. A detailed discussion of access control mechanisms and the needed adaptations for the SecureCloud infrastructure is given in Sections 3.1 and 3.2.
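For illustration only, assigning a role to a user in a project is a single operation with the standard OpenStack command-line client; the project, user and role names below are placeholders:

openstack role add --project <project> --user <user> <role>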

In order to provide the service, Nova has some components that play a variety of roles (storing data, allocating instances, managing hosts, and more). The components from the two types of cloud infrastructure nodes are especially important for understanding the adaptations made in the implementation of the SecureCloud infrastructure services, as detailed below:

Controller: As mentioned in the previous section, the controller node (the Infra Nodes in our architecture) hosts the core components of the OpenStack cloud related to the provisioning of instances. Components such as the scheduler, which decides to which node each VM will be allocated, and the usage database, which stores the resources used by the cloud users, run on this node.

Compute: This node is responsible for the hypervisors that operate the provisioned instances. A compute node can host several of these instances, depending on the set of resources it has available, such as CPU, RAM and disk, and, in our case, the EPC memory (see Section 2.1). In addition, more than one compute node can be associated with the same controller, which enables the cloud to provide a larger set of instances, for example.

Thus, considering these components and how they are organized, we detail next how the provisioning of a machine works in OpenStack. Table 2.1 below shows the essential parameters for creating an instance. It is important to understand how a flavor is used. Flavors represent templates for the resources of a VM. As defined in Table 2.1, this object can store a variety of configurations regarding the hardware resources being requested for an instance. Such resources can be memory, CPU, storage capacity, among others, as shown in Table 2.2 below.

nova boot option parameter    Description
--flavor                      Specifies the template for hardware configurations for an instance.
--image                       Name or ID of an image in Glance to be deployed in an instance.
--nic net-id='net-id'         ID of the provisioning network to be used.

Table 2.1: Parameters necessary to create an instance using the Nova service.

nova flavor-create option parameter    Description
--ram                                  Amount of RAM to be used (in megabytes).
--disk                                 Amount of disk to be used (in gigabytes) for the root partition.
--vcpus                                Number of virtual CPUs to be used.
--is-public                            Defines if the flavor is available to all users or only to a specific project. Default value is True.
--property                             key:value pairs that define on which compute nodes instances can run.

Table 2.2: Parameters necessary to create a flavor.


In some cases, specific parameters are needed for deploying instances, for example, information that will influence the scheduling or the mapping between virtual CPUs and physical CPUs. Specifying additional information for an instance is done through the --property parameter.

A common case nowadays is to have instances that should run only on compute nodes that have GPU hardware. In this case, a flavor is created that defines this in its extra_specs. In our case, this will be used to limit SGX instances to run on nodes supporting SGX.

Once a request to create an instance arrives, the Scheduler [1] component of OpenStack handles the filtering and selection of compute nodes to host the virtual or physical resources being requested. During scheduling, this component iterates over all found compute nodes, evaluating each one based on filters configured by the cloud administrator. Such filters define metrics and conditions a compute node has to meet in order to be considered able to receive the new instance. All compute nodes that pass the selected filters are then ordered by weighers, and the Scheduler chooses one of the best evaluated ones. If the Scheduler cannot find candidates for an instance, it means that there are no appropriate hosts available (possibly due to the lack of resources in the cloud).

Considering filters, there is a variety of filter strategies which may be selected for scheduling. Some are listed below:

AvailabilityZoneFilter. Filters hosts by availability zone. Hosts matching the availability zone specified in the instance pass the filter. As in public cloud providers, different availability zones have different failure domains, such as different power distribution lines or different Internet connections. Users may influence how this parameter is set.

AggregateInstanceExtraSpecsFilter. Filters hosts based on the metadata of an aggregate of compute nodes. If the metadata satisfies any extra_specs associated with the instance, the hosts belonging to that aggregate pass the filter. This is not accessible to users; a flavor defined by the cloud admin may or may not have the metadata that restricts how it can be scheduled on the cloud compute nodes.

RamFilter. Filtering is based on the available RAM of the hosts. If there is enough RAM to meet the amount required by an instance request, the compute node passes the filter.

DiskFilter. Similarly to the RamFilter, filtering is based on the available disk of the hosts. Only hosts with sufficient disk space, as requested by an instance, pass this test.

There are tens of possible filters in OpenStack to support a number of filtering and weighting strategies. Through this variety of filters, the Scheduler becomes very flexible. Nevertheless, if this flexibility is insufficient for a specific use case, new filters can be implemented using custom filtering algorithms, as is the case for our solution, as explained below (see Section 2.4).

    2.3 SGX virtualization with KVM

The first step to bridge the OpenStack and SGX worlds is to enable OpenStack to handle SGX-aware resources. For isolation purposes, independently of whether applications will be deployed through containers or VMs, the base resources are the VMs. For the container-based applications in a cloud environment, there will be multiple Kubernetes clusters for different tenants, with the support for multitenancy in the infrastructure improved through the usage of VMs. Furthermore, for applications that are slowly migrating to cloud environments, the deployment is strongly based on VMs.

Creating SGX virtual machines is still considered to be in a beta development stage by Intel. Nevertheless, there is a consistent effort to improve the support. The specially created kvm-sgx kernel operates by having KVM serve the EPC (Enclave Page Cache) from the host to applications inside the virtual machines. It is important to notice that the special kvm-sgx kernel needs to be installed on a host with SGX capabilities available and enabled. In order to expose SGX features to guest VMs, it is necessary to predefine an amount of EPC to be allocated to the virtual machines prior to their creation. Currently, only static EPC partitioning is supported, meaning the EPC pages are statically allocated to the VMs and only released upon deletion.

To allocate EPC, some changes to QEMU are also necessary. For this, the specially created qemu-sgx provides new parameters to be passed in the guest's XML definition, containing information on how much EPC should be allocated. These flags are -sgx epc=$AMOUNT_OF_EPC and -cpu host, and, when combined, they tell the hypervisor how to create a guest with SGX capabilities.
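Combined on a command line, the flags could be used roughly as follows. This is only a sketch: the -sgx option is specific to the qemu-sgx fork and its exact syntax may change between versions, while the remaining options are ordinary QEMU options with illustrative values.

qemu-system-x86_64 -enable-kvm -cpu host -sgx epc=64M \
    -m 2048 -hda guest-image.qcow2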

Considering this, we integrated kvm-sgx with an OpenStack cloud by making the compute nodes hosting instances able to expose their SGX to the VM guests. Thus, compute nodes on the cloud are SGX servers, and have kvm-sgx installed as the hypervisor. In addition, to enable the user to define the amount of EPC needed and to properly schedule SGX-enabled instances, among other features, changes were made to OpenStack.

Finally, there have been changes in the qemu-sgx and kvm-sgx kernel, recently made available in their respective repositories. The new versions were successfully integrated into our cloud, requiring only minor changes to the previous code.

    2.4 SGX-enabled virtual machines using Nova

In order to make it possible to create SGX resources in an OpenStack cloud, it is important to consider aspects regarding the provisioning, scheduling and accounting of SGX usage (e.g., the amount of EPC used by each provisioned instance). In this section we describe the changes made in the OpenStack Nova service to provide SGX virtual machines, as well as challenges and limitations.

The regular process of providing virtual machines using the Nova service is described in Section 2.2.2. We change this process in three places: the XML file used by the hypervisor to create the instance, a new scheduling filter that understands EPC size limitations and selects only hosts capable of allocating EPC, and the accounting of this resource displayed to the cloud administrator through the Nova command line.

    2.4.1 The provisioning process

In order to properly provision a secure instance that has access to the SGX features of the host, it is necessary to let the hypervisor know how much EPC should be allocated to the instance being provisioned. For that to happen, as previously explained in Section 2.3, an installation of the special kvm-sgx kernel is required on the physical machine serving as a host to the guest VMs. In this case, considering an OpenStack cloud, hosts are SGX-enabled compute nodes, having kvm-sgx installed and serving as the cloud hypervisor.


The next step consists of modifying the XML files used to create guest VMs, passing the information regarding EPC allocation to the hypervisor through the XML, so it knows how to properly create the SGX instance. In OpenStack, when using KVM, this is handled by the Libvirt driver implemented in the compute nodes, so we located the code that creates the XML objects and changed it to contain the amount of EPC that should be allocated, coming from the flavors of the instances.

Finally, it is necessary to change the nova.conf file, which holds Nova's configuration options, to allow the special kvm-sgx kernel to coexist with the OpenStack provisioning services. Essentially, we set the virtualization type to KVM and enable the Virtual Network Computing (VNC) console display instead of SPICE, which is the default option. As of now, only VNC is supported by KVM-SGX.
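On the compute nodes, the corresponding nova.conf excerpt could look roughly like the sketch below; these are the usual Nova options for such settings, but the exact sections and defaults depend on the OpenStack release in use.

[libvirt]
virt_type = kvm

[vnc]
enabled = true

[spice]
enabled = false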

    2.4.2 The scheduling process

In an OpenStack cloud, the process of creating an instance relies on a scheduling mechanism that decides which compute node is the most suitable host for the requested virtual machine. As mentioned in Section 2.2.2, there is a number of filters available in the Nova scheduler that can be enabled by the cloud administrator. These filters are generally related to resource usage and the hosts' capacity to offer these resources. However, there can be specific use cases that are not covered by any standard OpenStack filter. In this case, the administrator can create a custom filter that meets their needs.

Considering this, in our scenario it is important to guarantee that instances requesting a certain amount of EPC are deployed in suitable hosts, meaning the ones that have the capacity to offer the requested EPC. That said, there is no standard filter in OpenStack that understands this logic, so we created one called SgxEpcFilter.

The filter consists of a method called by the Scheduler to determine if the host is suitable, based on the conditions declared in the function. In order to do this, we check if the EPC being requested through the flavor is lower than the current amount of available EPC on the host. If so, the method returns True and the host passes the filter. If there is no EPC available anymore, the host is not suitable to receive new instances and, therefore, will not pass the filter.
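As an illustration of how such a filter plugs into the Nova scheduler, a minimal sketch is shown below. The sgx:epc_size extra_specs key is the one defined in this deliverable, while the free_epc_mb attribute on the host state is a hypothetical name for the EPC bookkeeping discussed in this section; this is not the actual SgxEpcFilter code, and the base-class signature may differ slightly between Nova releases.

# Sketch of a custom scheduler filter in the spirit of the SgxEpcFilter.
from nova.scheduler import filters


class SgxEpcFilter(filters.BaseHostFilter):
    """Only pass hosts that can provide the EPC requested by the flavor."""

    def host_passes(self, host_state, spec_obj):
        # EPC requested through the flavor, e.g. sgx:epc_size=64 (in MB).
        requested_epc = int(spec_obj.flavor.extra_specs.get('sgx:epc_size', 0))
        if requested_epc == 0:
            # Instances that do not request EPC can run on any host.
            return True
        # Free EPC bookkeeping kept by the (modified) HostState object;
        # the attribute name is an assumption for illustration.
        free_epc = getattr(host_state, 'free_epc_mb', 0)
        return requested_epc <= free_epc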

The calculation of the free amount of EPC is based on previous experiments that exhausted the system to see how much EPC a physical machine could actually provide. In addition to that, the modified Linux SGX driver mentioned before was installed on a clean machine and the scripts that calculate the free EPC returned a result of around 90 MB, which is similar to what we previously achieved when exhausting the system.

On another front, when successfully creating new instances, we need to compute the right amount of EPC being used by the given host so the Scheduler knows how much of the resource is still available to be provisioned. In order to do that, the HostState object is responsible for updating the calculations and checking them against the conditions defined by the filters.

Finally, to enable the SgxEpcFilter for the Scheduler, it is necessary to include this filter among the enabled ones defined in the nova.conf configuration file. Also, the flavors used to create instances should now contain a new extra_specs property indicating the amount of requested EPC. The property should follow the format sgx:epc_size=$AMOUNT_OF_EPC.
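A sketch of the two configuration steps is shown below; the list of other enabled filters is illustrative, and the name of the option holding that list varies across Nova releases (e.g., scheduler_default_filters in older releases or enabled_filters in the [filter_scheduler] section in newer ones).

[filter_scheduler]
enabled_filters = AvailabilityZoneFilter,RamFilter,DiskFilter,ComputeFilter,SgxEpcFilter

A flavor requesting EPC can then be created and tagged with the standard OpenStack client (names and sizes are placeholders):

openstack flavor create --ram 2048 --disk 20 --vcpus 1 sgx.small
openstack flavor set --property sgx:epc_size=64 sgx.small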


    2.5 SCONE: Secure Container Environment

With the advent of SGX and the growing use of containers for hosting applications, new approaches to handle the security and privacy aspects of such structures have emerged. Here, we use SCONE [8], a Secure Container Environment that uses SGX to protect given container processes, using SGX-protected enclaves. This mechanism offers secure containers on top of untrusted operating systems, and does that in a way that is transparent to already existing Docker, Kubernetes or Ranger environments. For this to happen, it is only required that the host machine has an SGX-capable Intel CPU and a Linux SGX kernel driver7 installed.

One of the main features of SCONE is the ability to secure existing applications in an easy way: applications do not need to be modified to use it. SCONE also supports the most popular programming languages, like JavaScript, Python (including PyPy), Java, Rust, Go, C, and C++. This way, SCONE can ensure that applications can later run on different trusted execution environments.

In addition, amongst other features offered by SCONE, there are: (i) an asynchronous system call interface to the host OS provided to container processes, allowing them to perform system calls without threads having to exit the enclaves; (ii) support for transparent encryption and authentication of data through a mechanism called shielding, ensuring data integrity and confidentiality; (iii) no changes to the application code being deployed, since SCONE's special compiler automatically prepares the code to be SGX-compatible; and (iv) simple Docker integration relying on a secure container image specially built for this purpose.

Besides that, providing a secure container requires a SCONE client extension to enable the creation of configuration files, the spawning of such containers and secure communication with them. During container startup, a configuration file is necessary, containing keys for encryption, application arguments and environment variables. Also, the application code must be statically compiled with its library dependencies and the SCONE library.

Finally, SCONE provides two services to help in the attestation process involved in SGX applications. The Configuration and Attestation Service (CAS) stores the confidential information that needs to be protected and releases this data only to trusted parties, who have to attest themselves in order to prove they can execute the application and did not make any modifications to its code. More on the changes made to provide a CAS as a service inside an OpenStack cloud is described in Chapter 3. The second service, named Local Attestation Service (LAS), runs on the client side, providing local attestation features for the client code in order for it to communicate with the CAS service.
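As an illustration only, running a SCONE-protected container against a CAS typically boils down to a docker run invocation along the following lines; the image name is a placeholder, and the device path and environment variable name (/dev/isgx, SCONE_CAS_ADDR) are assumptions based on common SCONE setups that may differ between SGX driver and SCONE versions.

docker run --device /dev/isgx -e SCONE_CAS_ADDR=<cas-ip> <scone-enabled-image>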

To summarize, SCONE provides secure containers maintaining a small Trusted Computing Base (TCB) size, reducing overheads naturally imposed by SGX enclave transitions, thanks to its asynchronous system calls mechanism and custom kernel module.

    7https://01.org/intel-softwareguard-eXtensions (visited: June 01, 2018).


3 CAS Access for external Users

In the task associated with this deliverable, we are interested in providing services that enable controlling access to resources, especially to resources that represent confidential information (such as database credentials, encryption keys, certificates, etc.), using as much as possible the OpenStack role-based access control rules to define access to services. In this chapter, we present the solution implemented to control access to the Configuration and Attestation Service (CAS, detailed in deliverables D1.1 and D1.2) on top of OpenStack, the most popular open-source cloud management platform.

We start by detailing the changes needed on Keystone [4], and then we take a look at how exactly the CASaaS works.

    3.1 Changes made on OpenStack Keystone

Most OpenStack projects limit access through a role-based access control approach. This means that each API call is associated with a policy rule that defines the level of access required for executing this API.

Details can be found in the project documentation1, but the general format is shown in Listing 3.1 and a short example is shown in Listing 3.2.

{
    "API_NAME" : "RULE_STATEMENT or MATCH_STATEMENT"
}

    Listing 3.1: OpenStack RBAC rule format.

The API_NAME shown in Listing 3.1 corresponds to the target API in the service function. On the right-hand side of the rule, the RULE_STATEMENT is a reference to another rule. Alternatively, the MATCH_STATEMENT is a set of identifiers that must match between the authentication token provided by the caller of the API and parameters provided or targets of the call. In our case, we are interested in restricting the access to the CAS API to users that have the adequate roles. A role is simply a tag given by the cloud provider that enables a user to have access to a service (e.g., to be able to instantiate containers or not, to create volumes or not, and so on).

{
    "identity:create_user" : "role:cloud_admin"
}

    Listing 3.2: OpenStack RBAC rule example.

Thus, in the example in Listing 3.2, the access to the create_user API is restricted to users who have the cloud_admin role, which is typically held by a small set of cloud operators.

As detailed on the project website [4]: "Keystone is an OpenStack service that provides API client authentication, service discovery, and distributed multi-tenant authorization by implementing OpenStack's Identity API." That means Keystone is the service that enables the creation of roles that restrict the access to other resources. Therefore, as the CAS is also used to specify which applications will have access to which secret resources, we chose to put the API that controls access to the CAS in the Keystone service.

In order to be able to control access to the CAS using the OpenStack standard procedures, but keeping compatibility with applications that run on other cloud management platforms or even in stand-alone, on-premise scenarios, we created a module that limits access to the CAS to machines that were previously authenticated to Keystone. This is done through a new controller in Keystone that listens for POST requests on /v3/cas_tunnels. When this controller receives a request, it authenticates the user and posts a message on the identity channel of RabbitMQ with the IP address of the user that sent the request. Once the IP of the just-authorized machine is in the queue, the CASaaS module will tunnel its requests to a cloud-platform-agnostic CAS.

1 https://docs.openstack.org/keystone/pike/admin/identity-service-api-protection.html
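For illustration only, a policy entry restricting the new endpoint could look like the example below; the API name follows the controller method shown in this section, while the role name is hypothetical, so the policy used in an actual deployment may differ.

{
    "identity:create_cas_tunnel" : "role:cas_user"
}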

    The part of the CASaaS module that is inserted into the Keystone component is shown below.

@controller.protected()
def create_cas_tunnel(self, request):
    user_ip = request.context_dict['environment']['HTTP_X_FORWARDED_FOR']
    host_ip = request.context_dict['environment']['HTTP_HOST'].split(
        ':')[0]
    payload = {'ip_address': user_ip}

    try:
        self._notifier.info({}, 'identity.create_cas_tunnel', payload)
    except Exception:
        LOG.exception(
            'Failed to send identity.create_cas_tunnel notification')

    3.2 How the CASaaS works

CASaaS is a service that waits for new messages on the identity channel of RabbitMQ. When it receives a new message of the type create_cas_tunnel, it reads the IP address in the payload and creates a tunnel between the machine where it is running (the OpenStack controller) and the machine that is running the CAS. So, when users want to use the CAS, they use the IP address of the machine that CASaaS is running on and all their requests get redirected to the CAS. Please note that not all requests going to CASaaS' host machine are redirected to the CAS, only those hitting ports 8081 and 18765 (the ports that the CAS uses to communicate).

The tunnel between the host machine and the CAS machine is a Python TCP server running on the host machine that redirects the traffic of the authenticated users. If you try to send a request to this server without first authenticating yourself with Keystone on the /v3/cas_tunnels endpoint, the request will simply be ignored and will not reach the CAS.

This approach aims to ensure that applications do not need to be modified for the different infrastructures considered in the SecureCloud project, for example, when running in an OpenStack cloud or on servers provisioned by other cloud providers (such as CloudSigma).

if __name__ == '__main__':
    transport = oslo_messaging.get_notification_transport(CONF)
    targets = [
        oslo_messaging.Target(topic='cas_tunnel', exchange='/identity')
    ]
    endpoints = [
        TunnelEndpoint()
    ]

    logging.info('CASaaS - waiting for identity.create_cas_tunnel messages.')

    server = oslo_messaging.get_notification_listener(transport, targets,
                                                      endpoints,
                                                      executor='threading')
    server.start()
    scheduler.start()

    sockets = _get_sockets()
    logging.info('CASaaS - tunnel server started.')

    while True:
        readable, _, _ = select.select(sockets, [], [])
        socket_ready = readable[0]

        server_socket, address = socket_ready.accept()
        accepted_port = server_socket.getsockname()[1]

        if address[0] in ip_whitelist:
            logging.info('CASaaS - setting up tunnel for %s.' % address[0])

            cas_port1 = CONF['scone']['cas_port1']
            cas_port2 = CONF['scone']['cas_port2']

            if accepted_port == cas_port1:
                TunnelServer(server_socket, CONF['scone']['cas_ip'],
                             cas_port1).start()
            elif accepted_port == cas_port2:
                TunnelServer(server_socket, CONF['scone']['cas_ip'],
                             cas_port2).start()

    3.3 Validation of the CASaaS

We ran a few benchmarks to get an idea of how mature the current implementation of the CASaaS feature is. The virtual machine running the CAS is hosted on our SecureCloud with 2 GB of RAM, 1 vCPU and 20 GB of disk. The virtual machine running the OpenStack controller is running on our cloud and has 8 GB of RAM, 4 vCPUs and 80 GB of disk.

One cycle of the benchmark consists of the steps that an application would need to execute at the start of its execution (a minimal sketch of the first two steps is shown after the list). The steps are the following:

    1. Authenticate with Keystone to get a valid token.

    2. Send a POST request to /v3/cas_tunnels in order to get access to the CAS.

    3. Run a simple SCONE application using the docker run command.
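A minimal sketch of steps 1 and 2, using the plain Keystone v3 HTTP API with the Python requests library, is shown below. The endpoint URL, user, project and password are placeholders; the /v3/cas_tunnels endpoint is the one added by the patched Keystone and, as shown in Section 3.1, it only uses the caller's IP address, so the request body can be empty.

# Sketch of one benchmark cycle (steps 1 and 2); values are placeholders.
import requests

KEYSTONE = 'http://controller:5000/v3'

auth_body = {
    'auth': {
        'identity': {
            'methods': ['password'],
            'password': {
                'user': {
                    'name': 'demo',
                    'domain': {'id': 'default'},
                    'password': 'secret',
                },
            },
        },
        'scope': {'project': {'name': 'demo', 'domain': {'id': 'default'}}},
    },
}

# Step 1: authenticate with Keystone to obtain a token.
resp = requests.post(KEYSTONE + '/auth/tokens', json=auth_body)
resp.raise_for_status()
token = resp.headers['X-Subject-Token']

# Step 2: request a CAS tunnel; CASaaS will then whitelist this machine's IP.
resp = requests.post(KEYSTONE + '/cas_tunnels', headers={'X-Auth-Token': token})
print(resp.status_code)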

The first benchmark ran for 24 hours with 1 cycle starting every 2 seconds, which is equivalent to one application (be it in a container, a VM or a physical machine) being started every 2 seconds. This test ran smoothly and with very few failures. We had a total of 43147 cycles with 13 failures (0.03%) and 43134 successes (99.97%). The fastest running time was 2.348 seconds, the slowest was 43.583 seconds and the mean running time was 3.836 seconds. See Figure 3.1 for more details. We removed the x-axis labels from the figure to improve readability, but the horizontal axis is the date and time that each cycle started and the vertical axis is the time that each cycle took to run, in seconds.

As depicted in the middle of the figure, a series of cycles took longer to complete. We believe this happened because of some network instability.

    Figure 3.1: Successful benchmark.


The second benchmark ran for 24 hours with one cycle starting every second. We had a total of 1279 cycles with 804 failures and 475 successes, and the running time increased with each cycle to the point where a cycle took hours to complete. The fastest successful cycle took 2.992 seconds to complete; the slowest successful one took 8.5 minutes. After that slowest success, the system started to break down and fail, and the slowest failing cycle took 11.5 hours to report the failure. See Figure 3.2 for more details. We removed the horizontal axis labels from the figure to improve readability and changed the vertical axis to a base-10 logarithmic scale to show more detail.


    Figure 3.2: Failed benchmark.

These test results indicate the limits of a single CAS service for large cloud deployments, with tens of thousands of CPUs and thousands of applications being started per hour. For these cases, the recommendation is to create multiple CAS servers and partition them based on the tenants that access them (see the sketch below). It is important to note that in such scenarios there would already be multiple cloud controllers and Keystone servers, and thus this CAS requirement does not impose an additional burden on the cloud.
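
As an illustration of this recommendation, the sketch below maps each tenant (Keystone project ID) deterministically to one of several CAS endpoints. The endpoint addresses are placeholders and this mapping is not part of the current implementation.

    import hashlib

    # Placeholder addresses of the partitioned CAS servers.
    CAS_ENDPOINTS = [
        '<cas1_ip>:18765',
        '<cas2_ip>:18765',
        '<cas3_ip>:18765',
    ]

    def cas_for_tenant(project_id):
        # Hash the Keystone project ID so that each tenant always
        # reaches the same CAS server.
        digest = hashlib.sha256(project_id.encode('utf-8')).hexdigest()
        return CAS_ENDPOINTS[int(digest, 16) % len(CAS_ENDPOINTS)]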


4 Demonstrator

In this demonstrator we present how to install, configure and use CASaaS on existing OpenStack infrastructures. We begin by installing a patched version of OpenStack Keystone that supports this feature. Then, we run the CAS on a private virtual machine with strict network access rules, since the point of this feature is that the cloud's CAS service does not have to be publicly open and is only accessible to authenticated users. Finally, we configure both Keystone and CASaaS and run an example that uses this feature.

The Keystone service was only slightly modified: we created a new endpoint on the identity controller to accept POST requests on /v3/cas_tunnels. When it receives a request from an authenticated user, Keystone publishes a message on the identity channel of RabbitMQ with the IP address of the user as the payload, so that CASaaS can grant that specific IP address access to the CAS. This access is granted for a limited time (the default is one hour). A minimal sketch of this publishing step is shown below.
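
The sketch below is not the actual patch: the function name and payload key are assumptions, but the topic and event name match the CASaaS listener shown in Section 3.2.

    import oslo_messaging
    from oslo_config import cfg

    CONF = cfg.CONF

    def notify_cas_tunnel(user_ip):
        # Publish the notification that the CASaaS listener (Section 3.2)
        # is waiting for; the payload key 'ip' is an assumption.
        transport = oslo_messaging.get_notification_transport(CONF)
        notifier = oslo_messaging.Notifier(transport,
                                           publisher_id='identity',
                                           driver='messaging',
                                           topics=['cas_tunnel'])
        notifier.info({}, 'identity.create_cas_tunnel', {'ip': user_ip})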

In order to run the steps in this section, we use our miniature secure cloud. This cloud includes servers with SGX-enabled virtual and bare-metal machines, and it is accessible through the OpenStack Horizon interface, available at https://secure-cloud.lsd.ufcg.edu.br (see the note at the end of this section).

Our example application is a simple script that receives a secret from the CAS and prints it on the output console (a minimal sketch is given below). We will first try to run it without requesting access through CASaaS, to see it fail to connect. Then, we will ask CASaaS for access and run it again to see it execute properly.
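
The sketch below only illustrates such a script, assuming that the CAS session injects the secret into the enclave as an environment variable; the variable name SECRET is an assumption and is not taken from the actual demo.

    import os
    import sys

    # The secret is expected to be injected by the CAS/SCONE runtime; the
    # environment variable name is an assumption for this illustration.
    secret = os.environ.get('SECRET')
    if secret is None:
        sys.exit('No secret received from the CAS.')
    print('Secret received from the CAS: %s' % secret)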

Each section below has a location between parentheses indicating where the operations should take place. The locations considered are the following:

• The OSA Controller refers to the machine that is the OpenStack controller. It hosts all the LXC containers for the OpenStack components. If you use the default IP address provided here, note that the OSA controller runs (in a VM) external to the secure cloud.

• The Keystone Container refers to the location where the Keystone service is running, which is typically a container. In our environment, which represents a typical installation using the OpenStack Ansible project's scripts, one should ssh to the OSA controller and then attach to the Keystone container.

• The CAS Host refers to the machine that is hosting the CAS, not the CAS container (assuming that the CAS service runs in a container, for ease of deployment). If you use the IP address provided in this document, note that the CAS host runs in our secure cloud (as it needs a VM with SGX access).

• The Local Machine refers to the client's own machine. Beware that you should either use a VPN to access the LSD hosts/VMs (because the OSA controller is hosted in a private network) or give the OSA controller a public IP. This machine is typically outside the cloud environment and inside the user's circle of trust. This is the case because the main motivation of the CAS is to put secrets in the cloud while still protecting them from the operators; thus, the secrets are never seen unencrypted inside the cloud.

Note: Access to internal or external collaborators, and to project evaluators, can be granted by request. Please check the form at https://goo.gl/forms/Y0r0WVjYLH9fvz592.


    4.1 Creating a private CAS (CAS Host)

We need to create a virtual machine that will run the CAS. This way we are able to tighten the access to it through the VM's security groups. In this section we present the steps to create a private CAS with strict access rules.

    4.1.1 Create a security group for the CAS Host

Access our LSD Secure Cloud, available at https://secure-cloud.lsd.ufcg.edu.br (see Figure 4.1).

    Figure 4.1: LSD Secure Cloud login screen.

Click on Network, Security Groups and then Create Security Group (see Figure 4.2).


    Figure 4.2: Create a security group.

Name your security group (we will call it private-cas). We will leave the description empty. After the security group is created, click on Manage Rules so we can define access rules (see Figure 4.3).


    Figure 4.3: Manage security group rules.

    Check Figure 4.4 to see how our rules are defined.

    Figure 4.4: private-cas rules.


Ports 8081 and 18765 are the ports that the CAS uses to communicate, and the rules allow access to them only from our OSA controller (IP address 10.11.5.15). A scripted equivalent of these steps is sketched below.
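
For readers who prefer to script these steps, the openstacksdk sketch below creates an equivalent security group. The cloud name in clouds.yaml is an assumption; the rules actually used in the demonstrator are the ones shown in Figure 4.4.

    import openstack

    # Assumes a 'secure-cloud' entry in clouds.yaml with valid credentials.
    conn = openstack.connect(cloud='secure-cloud')

    sg = conn.network.create_security_group(
        name='private-cas',
        description='CAS reachable only from the OSA controller')

    # Allow the two CAS ports only from the OSA controller (10.11.5.15).
    for port in (8081, 18765):
        conn.network.create_security_group_rule(
            security_group_id=sg.id,
            direction='ingress',
            protocol='tcp',
            port_range_min=port,
            port_range_max=port,
            remote_ip_prefix='10.11.5.15/32')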

    4.1.2 Create a CAS Host

    Click on Instances and then Launch Instance (see Figure 4.5).

    Figure 4.5: Create an instance.

Name your instance (we named it private-cas) and choose the Nova availability zone with Count one (see Figure 4.6).


    Figure 4.6: Name your instance.

Select Image as the boot source (there is no need to create a volume) and Ubuntu 16.04 Server as the image (see Figure 4.7).


    Figure 4.7: Select the instance image.

    Select any of the available SGX flavors (see Figure 4.8).

    Figure 4.8: Select the instance flavor.


    Select the provider-vm-net network (see Figure 4.9).

    Figure 4.9: Select the instance network.

    There is no need to select anything on Network Ports (see Figure 4.10).


    Figure 4.10: Network ports are not needed.

    Select the private-cas security group we created earlier (see Figure 4.11).

    Figure 4.11: Select the instance security group.


    Create or import a key pair for accessing the VM (see Figure 4.12).

    Figure 4.12: Select the instance key pair.

    Leave the rest of the options empty and launch the instance (see Figures 4.13, 4.14, 4.15 and 4.16).


    Figure 4.13: Configuration is not needed.

    Figure 4.14: Server groups are not needed.


    Figure 4.15: Scheduler hints are not needed.

    Figure 4.16: Launch the instance.


    4.1.3 Install the Intel SGX driver on the instance

    Now we need to ssh to the instance and install the SGX driver using our repository: https://git.lsd.ufcg.edu.br/secure-cloud/sgx-install.

    $ ssh ubuntu@<cas_host_ip_address>
    $ git clone https://git.lsd.ufcg.edu.br/secure-cloud/sgx-install.git
    $ cd sgx-install
    $ sudo ./install_latest_sgx_driver.sh

    4.1.4 Run CAS

    After installing the SGX driver, we can run the CAS.

    $ sudo docker run -d --rm --privileged --device /dev/isgx -p18765:18765 -p8081:8081 \
          --name securecloudlsd_cas 10.11.5.6:5000/sconecuratedimages/sconetainer:cas

NOTE: New remote attestation requires newer BIOS versions. If your BIOS version is old, or if you simply do not want to upgrade it, use the sconetainer:cas.trust.group-out-of-date image to run your CAS.

NOTE: Remember to enable insecure registries on Docker to use our images. Edit the /etc/docker/daemon.json file with the following settings.

    {
        "insecure-registries": ["10.11.5.6:5000"]
    }

    4.2 Installing the Patched Keystone (Keystone Container)

In this section we present the steps to patch OpenStack Keystone. We assume that you have a cloud infrastructure up and running. More specifically, in this example our OpenStack deployment is structured according to the OpenStack Ansible installation, meaning that the components all run inside Linux Containers. If you have a different structure, check the OpenStack documentation to see how to install Keystone according to your deployment structure.

    You can find our patched Keystone here: https://git.lsd.ufcg.edu.br/secure-cloud/keystone.

    To install it, go to the machine that is the OSA controller and attach to the Keystone container.

    $ ssh ubuntu@<osa_controller_ip_address>
    $ sudo su
    $ lxc-ls | grep keystone
    $ lxc-attach -n <container_name>

    Now clone the repository and install it.

    $ git clone https://git.lsd.ufcg.edu.br/secure-cloud/keystone.git keystone-patched
    $ source /openstack/venvs/keystone-<version>/bin/activate
    $ cd keystone-patched
    $ pip install -r requirements.txt
    $ python setup.py install


Exit the container with CTRL+D and restart it with lxc-stop -r -n <container_name>. Attach to the Keystone container again and create a .config folder in the home folder of the keystone user.

    $ su keystone
    $ cd
    $ mkdir .config

    Edit the /etc/keystone/keystone.conf file with the following settings.

    [oslo_messaging_notifications]
    driver = messaging
    transport_url = rabbit://keystone:<password>@<rabbit_ip>:<rabbit_port>//keystone

    [oslo_messaging_rabbit]
    ssl = True

To find the correct value for transport_url, search for that same variable in the same file; it should be near the beginning, like this:

    ## RabbitMQ RPC
    transport_url = <...>

We need these settings to enable the communication between Keystone and RabbitMQ.

Finally, edit the /etc/keystone/policy.json file with the following rule.

    {
        "identity:create_tunnel": "role:admin"
    }

With this, we are allowing every user with the admin role to use the endpoint that will give access to the CAS.

    4.3 Installing CASaaS (OSA Controller)

    Go to the OSA controller and clone the CASaaS repository from here: https://git.lsd.ufcg.edu.br/secure-cloud/casaas.

CASaaS expects a configuration file at /.casaas.conf with the following settings.

    [DEFAULT]
    transport_url = rabbit://keystone:<password>@<rabbit_ip>:<rabbit_port>//keystone

    [oslo_messaging_notifications]
    driver = messaging
    transport_url = rabbit://keystone:<password>@<rabbit_ip>:<rabbit_port>//keystone

    [oslo_messaging_rabbit]
    ssl = True

    [scone]
    cas_ip = <cas_ip>
    cas_port1 = 8081
    cas_port2 = 18765

    [casaas]
    bind_ip = <bind_ip>

The value for transport_url is the same as in the Keystone configuration file.

The bind_ip is the IP address that CASaaS will bind to in order to create the tunnel. This address should be reachable by your users, because it is what they will use to communicate with the CAS. CASaaS listens on two ports, and they need to be the same ports that the CAS is listening on. For example, if your OSA controller IP address is 10.11.5.15, you would set bind_ip to that IP (or any other address your users can reach); when you allow a user to use the tunnel, they will use 10.11.5.15 to reach the CAS. Also, if the CAS is listening on ports 8081 and 18765, these ports must be free on the machine running CASaaS, because it will try to bind to them.

Install the dependencies using pip install -r requirements.txt. If pip has problems finding a package, you need to provide a URL where pip can find it. For example, to install APScheduler you would run pip install APScheduler --find-links https://pypi.org/project/APScheduler/.

    Finally, on the OSA controller machine, run the src/casaas script.

    $ git clone https://git.lsd.ufcg.edu.br/secure-cloud/casaas.git
    $ cd casaas
    $ pip install -r requirements.txt
    $ python src/casaas.py

    4.4 Running the example (Local Machine)

    On your local machine, clone the CASaaS repository from here: https://git.lsd.ufcg.edu.br/secure-cloud/casaas.

Make sure that the LAS (the SCONE Local Attestation Service) is running on your local machine by executing the following command.

    $ docker run -d --rm --privileged --device /dev/isgx -p18766:18766 \
          --name securecloudlsd_las 10.11.5.6:5000/sconecuratedimages/sconetainer:las

Now go to the CASaaS repository and build the image of the example application (you should be in the same directory as the Dockerfile).

    $ git clone https://git.lsd.ufcg.edu.br/secure-cloud/casaas.git
    $ cd casaas
    $ docker build . -t casaas-demo

The commands below will log in and send the compose.yml to the CAS. The split command should fail, since you do not have access to the CAS yet. Also, do not forget to replace 10.11.5.15 with the IP of the machine that is running CASaaS (the OSA controller), not the actual CAS IP.

    $ ./scone cas login casaas-user cas-lsd -h 10.11.5.15:8081:18765
    $ ./scone cas split --stack casaas-demo-stack --cas-alias cas-lsd compose.yml


NOTE: You might need to run chmod +x scone on the SCONE CLI binary to be able to run it.

Now run the create_tunnel script (edit this script if you need to change the user credentials or the URL for OpenStack Keystone).

    $ python create_tunnel.py

    After this, you should be able to run the example application.

    $ ./scone cas split --stack casaas-demo-stack --cas-alias cas-lsd compose.yml
    $ docker run -i -e 'SCONE_CAS_ADDR=10.11.5.15:18765' \
          -e 'SCONE_CONFIG_ID=casaas-user/casaas-demo-stack/casaas-demo' \
          -e 'SCONE_LAS_ADDR=172.17.0.1:18766' \
          --device /dev/isgx:/dev/isgx --rm casaas-demo

Check the file generated by the split command (compose.yml.docker.yml) if you need to change any of the variables of the docker run command above.


5 Final remarks

In this report we have presented the current state of a service that leverages the Configuration and Attestation Service (CAS), previously developed to provide highly granular, software-defined sharing of secrets for SGX applications, by integrating it with the role-based access control that is typically used by cloud providers. This system is named CASaaS (CAS-as-a-Service) and was implemented in OpenStack, the best-known open-source platform for cloud computing infrastructures. The developed service integrates well with OpenStack clouds but, nevertheless, does not require modifications to applications prepared with SCONE. Thus, the process of preparing applications (e.g., compiling with SCONE, packing into containers) remains the same and, consequently, applications can still be ported to other (non-OpenStack) infrastructures, such as the ones provided by CloudSigma.

The goal of this report was to detail the content relevant to understanding the demonstrator. A tutorial on the basic usage of the CASaaS feature was provided, and access to the demonstrator's resources may be granted to members of the project or of the review board.

The demonstrator discussed in this report illustrates how to install the CASaaS feature in a fully operational OpenStack cloud. As a result, from the cloud administrator's perspective, this feature makes it possible to have a private CAS with restricted access, allowing the administrator to use Keystone's full capabilities for access control. We used a simple Python application (running with SCONE) as an example of how the user would interact with the cloud when using CASaaS. As mentioned above, from the user's perspective, the interface to run a SCONE application is unchanged: the only extra task is to authenticate with Keystone, and existing SCONE applications do not need to change how they interact with the CAS.

Further steps include experiments using the actual applications from WP5. In addition, the process of configuring the Kubernetes cluster still requires some manual actions, due to a bug in the image builder component used in OpenStack. These actions have been automated so that the user only needs to download and execute a couple of scripts. Nevertheless, we will wait for this bug to be fixed before implementing a final, simpler solution that includes all needed steps in the image used to build the cluster.

