DIGITAL FORENSIC RESEARCH CONFERENCE

Design and Implementation of FROST - Digital Forensic Tools for the OpenStack Cloud Computing Platform

By Josiah Dykstra and Alan Sherman

From the proceedings of The Digital Forensic Research Conference, DFRWS 2013 USA, Monterey, CA (Aug 4th - 7th)

DFRWS is dedicated to the sharing of knowledge and ideas about digital forensics research. Ever since it organized the first open workshop devoted to digital forensics in 2001, DFRWS continues to bring academics and practitioners together in an informal environment. As a non-profit, volunteer organization, DFRWS sponsors technical working groups, annual conferences and challenges to help drive the direction of research and development.

http://dfrws.org



Design and implementation of FROST: Digital forensic tools for the OpenStack cloud computing platform

Josiah Dykstra*, Alan T. Sherman

Cyber Defense Lab, Department of CSEE, University of Maryland, Baltimore County (UMBC), 1000 Hilltop Circle, Baltimore, MD 21250, United States

Keywords: OpenStack; Cloud computing; Digital forensics; Cloud forensics; FROST

Abstract

We describe the design, implementation, and evaluation of FROST, three new forensic tools for the OpenStack cloud platform. Our implementation for the OpenStack cloud platform supports an Infrastructure-as-a-Service (IaaS) cloud and provides trustworthy forensic acquisition of virtual disks, API logs, and guest firewall logs. Unlike traditional acquisition tools, FROST works at the cloud management plane rather than interacting with the operating system inside the guest virtual machines, thereby requiring no trust in the guest machine. We assume trust in the cloud provider, but FROST overcomes non-trivial challenges of remote evidence integrity by storing log data in hash trees and returning evidence with cryptographic hashes. Our tools are user-driven, allowing customers, forensic examiners, and law enforcement to conduct investigations without necessitating interaction with the cloud provider. We demonstrate how FROST’s new features enable forensic investigators to obtain forensically-sound data from OpenStack clouds independent of provider interaction. Our preliminary evaluation indicates the ability of our approach to scale in a dynamic cloud environment. The design supports an extensible set of forensic objectives, including the future addition of other data preservation, discovery, real-time monitoring, metrics, auditing, and acquisition capabilities.

© 2013 Josiah Dykstra and Alan T. Sherman. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Today, cloud computing environments lack trustworthy capabilities for the cloud customer or forensic investigator to perform incident response and forensic investigation. Consequently, customers of public cloud services are at the mercy of their cloud provider to assist in an investigation. Law enforcement relies on the cumbersome and time-consuming search warrant process to obtain cloud data, and requires the cloud provider to execute each search on behalf of the requester. In 2012, we concluded that the management plane is an attractive solution for user-driven forensic capabilities since it provides access to forensic data without needing to trust the guest virtual machine (VM) and without needing assistance from the cloud provider. Storing and acquiring trustworthy evidence from a third party provider is non-trivial. This paper describes our design and implementation of a management plane forensic toolkit in a private instantiation of the OpenStack cloud platform, which we call Forensic OpenStack Tools (FROST).

FROST provides the first forensic capabilities integrated with OpenStack, and to our knowledge the first to be built into any Infrastructure-as-a-Service (IaaS) cloud platform. Throughout the paper we use the NIST definition of cloud computing as a model for on-demand access to a pool of resources “that can be rapidly provisioned and released with minimal management effort or service provider interaction” (National Institute of Standards and Technology, 2011). Our forensic extensions allow for efficient, trustworthy, and user-driven incident response and forensic acquisition in a cloud environment, which match the cloud characteristic of being user-driven.

* Corresponding author. E-mail addresses: [email protected] (J. Dykstra), [email protected] (A.T. Sherman).

Digital Investigation, journal homepage: www.elsevier.com/locate/diin

1742-2876/$ – see front matter © 2013 Josiah Dykstra and Alan T. Sherman. Published by Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.diin.2013.06.010

Digital Investigation 10 (2013) S87–S95

This work implements practical tools on the theoretical foundations we established (Dykstra and Sherman, 2012). FROST collects data at the cloud provider, from the host operating system level (outside the guest virtual machines), and makes that data available within the management plane. The management plane, exposed through a website and application programming interface (API), is how users of OpenStack control the cloud and where they start and stop virtual machines. Because the user collecting forensic data does not communicate with a virtual machine, the forensic data are preserved against compromised or untrustworthy virtual machines.

Consider a cloud customer, Alice, whose provider uses OpenStack with FROST. Alice wants to investigate an incident of suspiciously high bandwidth usage from her cloud-hosted webserver. While her webserver logs web requests inside of its VM, Alice can get a more complete picture of activity by obtaining a record of management activity and metadata about her VM. Alice uses FROST to retrieve firewall logs, Nova Compute Service API logs, and a virtual hard drive image of the suspicious machine and then provides this evidence to the authorities. The firewall logs may show an attacker scanning Alice’s virtual machine before hacking it. API logs may contain evidence of unauthorized attempts to stop the virtual machine. The disk image may contain evidence of what an attacker did once he obtained access. This is strong forensic evidence about the potential crime that can be used in court. Alice can obtain this evidence using either the web management plane or the OpenStack API. FROST ensures the forensic integrity of the evidence that Alice gathers. Without FROST, this evidence would only be available with assistance from Alice’s cloud provider.

OpenStack (2012b) is an open-source cloud computing platform, conceived as a joint project between the National Aeronautics and Space Administration (NASA) and Rackspace. OpenStack users include many large organizations such as Intel, Argonne National Laboratory, AT&T, Rackspace, and Deutsche Telekom. The cloud platform comprises six primary modular components: Nova, the compute platform and cloud controller; Swift, the object storage system; Glance, the service for managing disk images; Keystone, the identity service; Horizon, the web-based dashboard for managing OpenStack services; and Quantum, network services for virtual devices. OpenStack is a complex software package, with over 600,000 lines of code and 415 active developers (OpenStack, 2012a). It is a widely used platform for private cloud instances, but it is also compatible with commercial cloud offerings. OpenStack has APIs compatible with Amazon EC2 and S3.

Without loss of generality, our approach makes the following assumptions. First, the user-driven forensic capabilities are applicable in situations where a cooperative cloud customer is involved in the investigation. That is, if a malicious customer uses the cloud to commit a crime, the cloud provider will still be required to assist law enforcement in the investigation. Second, the proposed solution assumes a trusted cloud provider and cloud infrastructure. Evidence from our forensic tools could be manipulated unless the underlying layers of the cloud infrastructure, such as the host operating system and hardware, have integrity. We assume that the hardware, host operating system, hypervisor, and cloud employees are trusted, but we do not assume trust in the guest machine. Third, we do not consider legal issues associated with the process or product of cloud-based forensic data acquisition; Dykstra and Riehl (2012) previously explored those issues.

Our contributions are:

• Description of the architecture, design goals, and implementation of user-driven forensic acquisition of virtual disks, API logs, and firewall logs from the management plane of OpenStack.

• An algorithm for storing and retrieving log data with integrity in a hash tree that logically segregates the data of each cloud user in his or her own subtree.

• Preliminary informal evaluation results showing that the proposed solution satisfies technological and legal requirements for acceptance in court and scales appropriately for a cloud environment.

The rest of the paper is organized as follows. Section 2 reviews previous and related work. Section 3 describes the requirements, specifications, and capabilities of FROST. Section 4 explains the architecture of our solution. Section 5 discusses the design. Section 6 explains our API and management console implementations based on the architecture. Section 7 evaluates our solution. Section 8 discusses advantages, limitations, and trust assumptions. Section 9 concludes the work.

2. Previous and related work

We survey previous and related work in remote forensic acquisition, forensic data collected by providers, and methods for storing content on untrusted platforms.

Data acquisition is a key issue when investigating cloud-based incidents (Dykstra and Sherman, 2011a,b; Ruan et al., 2011; Taylor et al., 2011). Research to date has focused on explaining this issue but has failed to produce practical tools to support remote forensic acquisition. Dykstra and Sherman (2012) illustrated how to use existing tools like Guidance EnCase to acquire forensic data remotely over the Internet, but explained why the data may be untrustworthy. Martini and Choo (2012) proposed a conceptual framework for preservation and collection of forensic data from cloud computing but did not implement any capabilities. In some cases, such as in Amazon Web Services, it is possible to retrieve an image of the virtual disk of a virtual machine. There is, however, no mechanism to obtain a hash of the image on the provider’s system to validate the integrity of the image after download.

In addition to disk images, forensic investigators use metadata and system logs to reconstruct an event. Generating metadata and system logs are usually standard practices in the operation of a system, rather than forensic-specific tasks. Nevertheless, the logs are useful in an investigation and easily gathered. Consumers of cloud services have few tools available for accessing low-level logs to the cloud infrastructure. Cloud providers and researchers encourage application-level logging (Marty, 2011); Google, Amazon, and Microsoft allow customers to log accesses to stored objects (Google, 2012; Amazon Web Services, 2011; Microsoft, 2012). Cloud customers are usually responsible for their own monitoring, metrics, and auditing inside the customer’s VM. Amazon CloudWatch is a monitoring service for EC2, but its metrics are very coarse (Amazon Web Services, 2013); detailed monitoring for network utilization reports the number of bytes sent or received on all network interfaces by an instance at one-minute frequency. To our knowledge, no cloud provider makes available customer accessible API call audit logs or VM firewall logs. That is, a customer has no way to know if, when, and from what IP address his or her credentials were used to make API calls.

Data integrity is a critical component of the forensic process. Given the popular discussions about data security in the cloud, researchers and commercial vendors have been dedicated to addressing data privacy. Other authors have developed proposals for ensuring integrity on untrusted machines, such as third-party servers. Clarke (2005) proposed a method for validating the integrity of untrusted data using hash trees and a small fixed-sized trusted state. This method differs from our method because it does not check the integrity of subsets of the data. SUNDR (Secure untrusted data repository) (Li et al., 2004) is a filesystem for storing data securely on untrusted servers. However, SUNDR requires that each filesystem user is able to see file modifications by all other clients. In our solution, we want each cloud consumer to be independent from the other customers.

Other research has focused on storing content securely on untrusted servers, which could then produce trustworthy forensic data, even from third-party cloud providers. Haber et al. (2008) explored in depth the redaction of subdocuments from signed original data, while preserving the cryptographic link of integrity between the two datasets. Haber posited that audit logs can be considered an append-only database, and that an audit report is essentially a database query with certain entries redacted. Haber’s proposed redactable signature algorithm is precisely applicable to the cloud logs we will encounter, though it must take into account a constantly changing dataset.

The dissertations of Crosby (2009) and Kundu (2010) bear striking similarity to our goals despite different motivations. Crosby proposed history tree tamper-evident logs, and suggested that they could “increase the trust in software service and ‘cloud computing.’” Kundu was interested in authenticating subsets of signed data objects without leaking structural information about the data structures. Our work was influenced by these designs. We assume that the logger is trusted, and we use our enhanced logging mechanism simply for efficient log storage, retrieval, and integrity validation.

3. Requirements, specifications, and capabilities

We describe the requirements, specifications, and capabilities for FROST. We identify the stakeholders and use cases that will help determine the tool requirements. We also discuss the accepted legal and forensic community requirements, and how we will meet them.

Cloud-based crimes take two general forms that determine the stakeholders who would use FROST. One form is a crime committed against the cloud-based resources of an innocent victim who is cooperative in an investigation. The other is a crime committed by an uncooperative party using the cloud as an instrument of a crime. In the first case, the legitimate cloud customer and/or law enforcement will use FROST. In the second case, law enforcement or the provider will use FROST. In both cases the requirement is to minimize interaction with personnel at the cloud provider. The cloud provider deploys FROST, but has no other responsibilities (subject to the assumptions in Section 1).

3.1. Scientific, technical, and legal requirements

There is no single, authoritative source for requirements development of new forensic tools. Our solution, however, is informed by accepted practices and written guidance. The Scientific Working Group on Digital Evidence (SWGDE) (2006) asserts that “Digital Evidence submitted for examination should be maintained in such a way that the integrity of the data is preserved. The commonly accepted method to achieve this is to use a hashing function.” On the requirements for acquisition, the National Institute of Standards and Technology (2004) says “The two critical measurable attributes of the acquisition process are completeness and accuracy. Completeness measures if the all the data was acquired, and accuracy measures if the data was correctly acquired.” Integrity and completeness of the data will be of foremost importance.

The cloud environment dictates the technical requirements. Any digital forensic tools for cloud computing should be compatible with cloud characteristics of on-demand self-service, rapid elasticity, and scalability. The following technical requirements are consistent with these characteristics:

1. Be compatible with existing forensic formats. Instead of creating new data formats, the new capabilities output data in existing formats to be easily ingested by other forensic tools. Our logs and disk images are provided in standard formats, and all are accompanied by a Digital Forensic XML (DFXML) file (Garfinkel, 2012). DFXML is used to express the cryptographic hashes and provenance information.

2. Be easy to generate. It must be easy to modify existing cloud deployments to add forensic capabilities. It must also be intuitive and simple for a user to request forensic data. Our changes to a stock installation of OpenStack can be made by running an installation script. Users can request forensic data with a single command or web click.

3. Be open and extensible. The implementation must be available for any OpenStack administrator. Developers should be able to extend and contribute new forensic capabilities. The platform we developed allows other developers to integrate other forensic tools quickly and easily. The software will be submitted to the OpenStack project.


4. Be scalable. The forensic tools must be usable for single cloud instances, while also supporting millions of cloud customers and virtual machines. FROST can support any number of instances and is limited only by the processing time it takes the host operating system to retrieve the forensic data.

5. Follow existing practices and standards. Where possible, cloud forensic tools should follow standard forensic practices. The forensic data we provide adheres to accepted practices and can be ingested by standard forensic tools such as Guidance EnCase.

For acceptance in court, the Department of Justice’s “Search and Seizure Manual” (2009) applies Federal Rules of Evidence 901(b)(9), explaining that “to demonstrate authenticity for computer-generated records, or any records generated by a process, the proponent should introduce ‘[e]vidence describing a process or a system used to produce a result and showing that the process or system produces an accurate result.’” In most cases, the reliability of a computer program can be established by showing that users of the program actually do rely on it on a regular basis, such as in the ordinary course of business. Our solutions use ordinary data, such as firewall logs, even when we have enhanced the storage of data to add increased data security.

3.2. Specifications and capabilities

FROST has three primary components. First, a cloud user can retrieve an image of the virtual disks associated with any of the user’s virtual machines, and validate the integrity of those images with cryptographic checksums. Second, a cloud user can retrieve logs of all API requests made to the cloud provider using his or her credentials, and validate the integrity of those logs. The API is used for administering virtual machines, such as creating and starting VM instances. Third, the cloud user can retrieve the OpenStack firewall logs for any of the user’s virtual machines, and validate the integrity of those logs. The OpenStack firewall operates at the host operating system, and the API is used to administer it, such as allowing or blocking network ports. These three components are useful and offer forensic data that are not available directly to cloud users today. In our informal discussions with cloud users and administrators of two large private clouds and forensic experts, they all requested capabilities that were consistent with these features.

Cloud users interact with their provider and manage cloud resources through the management plane using a web interface and API. FROST is accessible from each of these management plane interfaces. The implementation is modular to allow additional forensic capabilities to be added later.

4. Architecture

We describe the architecture of our solution. We show how we integrate with OpenStack, the type and format of the data we collect, and the methods for returning data to the requestor.

4.1. Integration with OpenStack

OpenStack has many components, but we focus on the two where we have integrated FROST: Nova and Horizon. Nova provides the compute service through virtual servers similar to those in Amazon EC2 and implements the compute API. Horizon provides the web-based user interface for OpenStack, and communicates with Nova through the compute API. Fig. 1 highlights where we modified Nova and Horizon to integrate FROST.

Fig. 1. Pictorial snippet of the OpenStack architecture showing where OpenStack Compute (Nova) and OpenStack Dashboard (Horizon) have been modified to add FROST. Horizon provides a web interface to the management plane and Nova provides an API interface to the management plane. The majority of changes for FROST were to the API Daemon.

We add new Nova API calls that correspond to our forensic features. Cloud users who interact with OpenStack using the compute API are able to exercise our capabilities from command-line tools and in their own programs.

Horizon is built using Django and Python, and implements dashboards for OpenStack. We modify the specification for the dashboard that displays instance information and create a new tab. This tab has links to our forensic capabilities. These links return data from their corresponding API calls.

OpenStack has a variety of credentials for different purposes. Our tools assume that OpenStack has authenticated the user making the request. The Horizon web interface requires only a username and password. The command-line API requires either an access key and secret access key (which can be retrieved using the API), or an X.509 certificate and private key. API requests are digitally signed using the private key, and this signature is transmitted to OpenStack along with the certificate. Nova also has a root certificate that can sign documents. We use this root certificate to add integrity to the storage of log data, which we call the Authenticated Logging Service (ALS).

4.2. Data retrieval

Each of the three FROST capabilities accesses unique data that are already stored by OpenStack or which we can easily enable for storage. Retrieval of data for the user depends on how and where the data are stored.

Retrieval of virtual disks is the most straightforward task. For each virtual machine, OpenStack creates a directory on the host operating system that contains the virtual disk, ramdisk, and other host-specific files. The file format of the virtual disk varies according to the hypervisor used. Since we use KVM as our hypervisor, the format of our virtual disks is QEMU QCOW2 images. The ability to retrieve the original virtual disks must support snapshots of disks from machines that are running, as well as downloads of disk images from stopped machines. QEMU provides utilities to convert QCOW2 images to raw format, and libewf can convert raw images to the EWF-E01 format.
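As a rough illustration of the acquisition step, the sketch below builds a `qemu-img convert` command that turns a QCOW2 image into a raw image and computes checksums of the result in fixed-size chunks. The function names, the file paths, and the choice of SHA-1 and SHA-256 are assumptions made for illustration; they are not taken from the FROST source.

```python
import hashlib
import subprocess

def qemu_convert_command(qcow2_path, raw_path):
    """Build the qemu-img command that converts a QCOW2 image to raw format."""
    return ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", qcow2_path, raw_path]

def hash_image(path, algorithms=("sha1", "sha256"), chunk_size=1024 * 1024):
    """Hash a (possibly very large) image file without loading it into memory."""
    digests = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as image:
        # Read the file chunk by chunk and feed every chunk to each digest.
        for chunk in iter(lambda: image.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}

def acquire(qcow2_path, raw_path):
    """Convert the image (requires qemu-img on the host), then hash the raw copy."""
    subprocess.run(qemu_convert_command(qcow2_path, raw_path), check=True)
    return hash_image(raw_path)
```

A user who downloads the raw image can run `hash_image` locally and compare the digests with those reported by the provider.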

Cloud users may run a firewall inside their VM, but OpenStack provides firewall services beneath the VM. By default OpenStack uses the Linux iptables firewall on the host machine to implement network security for the guest machines. A new chain, or group of rules, is created for each instance. Several default rules are automatically created, such as allowing the host to communicate with the guest. Cloud users are then able to create custom rules manually, such as allowing inbound SSH or HTTP traffic. OpenStack has no inherent configuration options to log network connections that match the firewall rules or connections that are denied by the firewall. However, iptables natively has this ability. We enable logging on all denied network connections and enable the user to retrieve logs for her OpenStack instances.

OpenStack has the ability to log request successes and failures when a user issues a request to Nova. For example, when a user uses the API to request a new VM, this request can be recorded. These logs are stored on the host operating system, and therefore are typically not available to cloud users. FROST stores these data, but in a method that allows the data to be segregated for each user and that includes integrity checking information.

5. Design

The goals of enhanced API and firewall logging are to enable a cloud user to retrieve and validate the integrity of forensically-relevant log data. The Authenticated Logging Service supplements Nova’s default logging capability. This service stores the same data as the traditional log, but a new hash tree segregates users’ data and integrity checking information with minimal overhead for record storage or retrieval. Each OpenStack user account has his or her own subtree under the root.

When a user provisions a new virtual machine in OpenStack, a universally unique identifier (UUID) is assigned to the machine. These UUIDs become children of the owner’s root, and logs for that machine are appended as follows. The subtree of any virtual machine has a depth of four for the year, month, and day of the log entry, with the log messages as leaves of the tree. Because the tree is constantly changing as new log entries are added, hash values for the intermediate hash tree nodes are re-calculated daily. This structure enables a user to request any date range for any or all virtual machines, while reducing the additional overhead required.
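The layout described above can be sketched with nested dictionaries keyed by user, instance UUID, year, month, and day, with each day holding a list of log messages. This is a minimal sketch under that assumption; the function names and the in-memory representation are hypothetical and do not reflect how FROST stores its tree.

```python
from datetime import datetime

def tree_path(user, instance_uuid, timestamp):
    """Return the node labels from the root down to the day of a log entry."""
    return (
        user,
        instance_uuid,
        f"{timestamp.year:04d}",
        f"{timestamp.month:02d}",
        f"{timestamp.day:02d}",
    )

def append_entry(tree, user, instance_uuid, timestamp, message):
    """Store a log message as a leaf under user/instance/year/month/day."""
    *branches, day = tree_path(user, instance_uuid, timestamp)
    node = tree
    for label in branches:
        node = node.setdefault(label, {})
    node.setdefault(day, []).append(message)
    return tree

def entries_between(tree, user, instance_uuid, start, end):
    """Collect the messages of one instance whose day lies in [start, end]."""
    found = []
    years = tree.get(user, {}).get(instance_uuid, {})
    # Zero-padded labels sort chronologically, so results come back in order.
    for year, months in sorted(years.items()):
        for month, days in sorted(months.items()):
            for day, messages in sorted(days.items()):
                when = datetime(int(year), int(month), int(day)).date()
                if start <= when <= end:
                    found.extend(messages)
    return found
```

Because each day is its own node, a date-range request only needs to visit the day nodes inside the range (`start` and `end` are `datetime.date` values here).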

The Authenticated Logging Service guarantees integrity of the log data using cryptographic hashes. Integrity checking allows the user to validate if data have been inserted, removed, or modified. For example, if Alice requests her logs for December, she can calculate the hash values that she expects in the tree and compare them to what the provider claimed they should be. If an attacker modified the log data in transit, the integrity check would fail and alert Alice to errors or manipulation.

6. Implementation

We provide details about the implementation of FROST and show how users interact with the tools.

We implemented the forensic extensions using DevStack, an OpenStack development environment, on Ubuntu 12.04. We used OpenStack Folsom, which was released September 27, 2012. We used the KVM hypervisor and Ubuntu guests, but our implementation can support any hypervisor and guest operating systems.

6.1. Authenticated logging service

The Authenticated Logging Service uses Merkle trees (Merkle, 1988) as the data structure for storing API and firewall log data. Unlike previous work, we are not concerned with hiding the structural information associated with the tree, nor about prohibiting redaction in exported subtrees.

Hash trees offer three advantages. First, storing summary information about a larger dataset enables efficient validation and minimal data transmission. For any subset of data in the tree, the algorithm hashes chunks of the data, and uses those hashes to compute the hash of the whole tree. It is unnecessary to reveal or transmit the entire tree. Second, given the way we organize the tree, a user can easily query for data over any date range. Third, the hash tree natively enables a user to validate the integrity of a subset of log data.

Our algorithm for storing API and firewall logs is as follows. These two sets of data are stored separately. Since the design is the same for each, we describe only the storage of API logs. As shown in Fig. 2, the cloud provider maintains a single, append-only hash tree for all users. When a new user joins the cloud service, a subtree is created for the user under the root. The user’s tree root is signed using the user’s public key. All API logs associated with that user are stored in his or her subtree. Data under the user’s root are organized in four layers, corresponding to the machine instance, year, month, and day of the respective log entry. Raw records are found at the leaves, stored as children of the day. The value at each branch node is calculated by concatenating the values of its children and computing the hash of that aggregate. Every day, the provider computes a hash of the children at each node and updates the value of each node with a new hash. The provider also signs the root of the tree, and the root of each cloud customer, using the Nova root certificate.
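The rule that a branch value is the hash of the concatenated values of its children can be sketched as a short recursive function. The sketch assumes the tree is held as nested dictionaries whose innermost values are lists of raw log messages, that children are ordered by sorted label, that a leaf's value is the hash of its message, and that SHA-256 is the hash function; the text does not specify any of these details, and signing with the Nova root certificate is omitted.

```python
import hashlib

def leaf_value(message):
    """Value of a leaf: here, the SHA-256 digest of the raw log message."""
    return hashlib.sha256(message.encode("utf-8")).digest()

def node_value(node):
    """Value of a branch: the hash of the concatenated values of its children."""
    if isinstance(node, list):
        # A day node: its children are the raw log messages (the leaves).
        children = [leaf_value(message) for message in node]
    else:
        # Any other branch: recurse into the children in sorted label order.
        children = [node_value(node[label]) for label in sorted(node)]
    return hashlib.sha256(b"".join(children)).digest()

# Illustrative tree: root -> user -> instance -> year -> month -> day -> messages.
example_tree = {
    "alice": {
        "vm-1": {"2012": {"12": {"07": ["entry a", "entry b"], "08": ["entry c"]}}},
    },
}
root_value = node_value(example_tree)
alice_value = node_value(example_tree["alice"])
```

Changing, inserting, or removing any leaf changes the value of every node on the path from that leaf to the root, which is what makes the periodic recalculation useful for integrity checking.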

When a user wishes to retrieve the logs associated with a particular instance, the cloud provider returns the raw log messages and any hash values necessary to validate the integrity of the result up to the user’s root. For example, in the most trivial case shown in Fig. 2, the provider would return only a single log message and the hash value at node “Alice.” Using the Nova root certificate, the provider also hashes and signs all data being returned and records these values in a DFXML log file which is returned to the user. Alice could then compute the hashes and validate that the value she calculated for “Alice” matches what the provider claimed.
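The validation Alice performs can be sketched as follows. The sketch assumes the provider returns the raw messages of one day together with, for each ancestor from the month node up to the user's root, the values of the sibling nodes to the left and right of the path. This proof format, the function names, and the use of SHA-256 are assumptions for illustration; checking the provider's signature over the user's root is left out.

```python
import hashlib
import hmac

def sha256(data):
    return hashlib.sha256(data).digest()

def recompute_user_root(messages, proof):
    """Recompute the value of the user's root from returned log messages.

    messages: raw log lines stored under one day node.
    proof: one (left_siblings, right_siblings) pair per ancestor, ordered
           from the month node up to the user's root.
    """
    # Value of the day node: hash of the concatenated leaf values.
    value = sha256(b"".join(sha256(m.encode("utf-8")) for m in messages))
    for left, right in proof:
        # Re-insert our value among its siblings, then hash the concatenation.
        value = sha256(b"".join(left) + value + b"".join(right))
    return value

def verify(messages, proof, claimed_user_root):
    """True when the recomputed root matches the value the provider claimed."""
    recomputed = recompute_user_root(messages, proof)
    return hmac.compare_digest(recomputed, claimed_user_root)
```

In the trivial case of a single log message with no siblings at any level, every pair in `proof` is `([], [])` and the user only needs the message and the claimed value at her root.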

6.2. API implementation

Many users interact with cloud platforms with command line tools that call API functions. The Nova API daemon is the endpoint for API queries. Our API extension file contains code to implement our features. We register these extensions with Nova, and add the ability to call them from the dashboard and the command-line novaclient. New API calls are added to OpenStack by placing their functionality in a contribution directory, and modifying novaclient to allow the user to call the API. Each of our forensic capabilities is implemented in this manner. We then hook the Nova logging handler to send log messages to our replacement logging service, described in Section 6.1. We also hook the iptables manager to label firewall messages with the instance ID associated with them. The Nova Network daemon then carries out the work of correctly modifying the iptables rules as the system and the user create them.

To use FROST a user must have already authenticated to OpenStack with his or her private key or credentials. The authenticated user can access only the logs for machines that he or she owns, as enforced by Keystone, the OpenStack identity service. The API validates that the requestor has permission to access the instance for which he or she is requesting forensic data.

Nova logs are stored in /var/log/nova/ on the host operating system. When a user requests his or her Nova logs, FROST searches the logs in this directory for lines that contain that user’s personal identifier.
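A minimal Python sketch of this kind of search follows. It is not the FROST implementation: the function names, the `*.log` file pattern, and the plain substring match are assumptions made for illustration.

```python
from pathlib import Path


def read_log_lines(log_dir="/var/log/nova/", pattern="*.log"):
    """Yield every line of every matching log file in log_dir.

    The "*.log" pattern is an assumption; adjust it to the files present.
    """
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\n")


def matching_lines(lines, identifier):
    """Return the lines that mention the requested identifier."""
    return [line for line in lines if identifier in line]


# Made-up lines standing in for Nova log content.
sample_lines = [
    "INFO request by user-1 for instance vm-1",
    "INFO request by user-2 for instance vm-7",
    "WARNING user-1 denied access to instance vm-7",
]
user1_lines = matching_lines(sample_lines, "user-1")
```

On a host, `matching_lines(read_log_lines(), identifier)` would apply the same filter to the files on disk. A substring match is the simplest possible rule; a production filter would need to guard against one identifier being a prefix of another.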

Listing 1 shows the output of using FROST from the command line to retrieve the Nova logs for a single virtual machine. FROST returns the Nova entries that match that UUID, and also creates a DFXML file named report.xml. The DFXML file contains provenance information about the execution of FROST and a hash of the log data for integrity validation.

Fig. 2. Tree structure used to store API logs by user, machine, year, month, and day, showing log entries for Alice’s two virtual machines on December 7–8, 2012. The value at each branch node is a hash of the concatenation of the values of its children. These hash values enable integrity validation for any subtree of the whole.

Listing 1. Execution of the FROST API to retrieve the Nova logs for virtual machine 0afcfbcd-b836-4593-a02c-25d8d3a94b00 showing user “admin” provisioning a new virtual machine. These data are available only to users with FROST or with provider assistance.

J. Dykstra, A.T. Sherman / Digital Investigation 10 (2013) S87–S95


Firewall logging must be enabled, since it is not enabled by default in OpenStack. Because OpenStack creates default rules for each running virtual machine, we append another rule that logs all dropped packets to /var/log/syslog. For each instance, we prepend a special prefix to the log messages that labels the UUID of the machine. Doing so enables us to parse the log file and identify those lines that correspond to the particular virtual machine that the user requests.
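The parsing step can be sketched as follows. The prefix format `FROST-<uuid>:` is hypothetical, since the exact prefix is not given here, and the sample lines are made up for illustration rather than captured from a FROST deployment. The `DPT=` field follows the usual layout of packet log messages written by iptables.

```python
import re

# Hypothetical prefix: "FROST-" followed by the instance UUID and a colon.
PREFIX_RE = re.compile(
    r"FROST-(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}):"
)
DPT_RE = re.compile(r"\bDPT=(\d+)\b")  # destination port field


def dropped_ports(lines, instance_uuid):
    """Return destination ports of dropped packets logged for one instance."""
    ports = []
    for line in lines:
        prefix = PREFIX_RE.search(line)
        if prefix and prefix.group("uuid") == instance_uuid:
            port = DPT_RE.search(line)
            if port:
                ports.append(int(port.group(1)))
    return ports


# Made-up sample lines for two instances.
SAMPLE_LINES = [
    "Dec  7 10:15:01 host kernel: FROST-0a18799f-c198-4dbb-b369-b49184e3dfbc: "
    "IN=eth0 OUT= SRC=10.0.0.5 DST=10.0.0.2 PROTO=TCP SPT=51234 DPT=443",
    "Dec  7 10:15:02 host kernel: FROST-0afcfbcd-b836-4593-a02c-25d8d3a94b00: "
    "IN=eth0 OUT= SRC=10.0.0.5 DST=10.0.0.3 PROTO=TCP SPT=51235 DPT=23",
    "Dec  7 10:15:03 host kernel: FROST-0a18799f-c198-4dbb-b369-b49184e3dfbc: "
    "IN=eth0 OUT= SRC=10.0.0.5 DST=10.0.0.2 PROTO=UDP SPT=40000 DPT=53",
]
ports_for_vm = dropped_ports(SAMPLE_LINES, "0a18799f-c198-4dbb-b369-b49184e3dfbc")
```

Matching on the full UUID, rather than on a substring of the line, keeps entries for one instance from being returned to the owner of another.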

Listing 2 shows the output of using FROST from the command line to retrieve the firewall logs for a single virtual machine. FROST returns the firewall logs that match that UUID, and also creates a DFXML file named report.xml.

Disk images are stored in the filesystem of the host operating system. The file path includes the name of the instance, which is used to identify the correct image to return to the user.

Our implementation supports the retrieval of disk images from virtual machines that are powered off. New versions of QEMU and Libvirt include functionality to create snapshots of running instances, but these features have not yet been added to OpenStack.

Listing 3 shows the output of using FROST from the command line to retrieve a disk image for a single virtual disk with volume name myvol-e9a5612d. FROST returns the disk image for myvol-e9a5612d, and also creates a DFXML file named report.xml in the same way as above. The requestor can validate the integrity of the image by comparing the hash value in the DFXML, as computed by the cloud provider, with the hash value computed by the requestor.
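The requestor's side of this comparison might look like the following Python sketch. The helper names are hypothetical, and the hash algorithm is a parameter because the algorithm recorded in report.xml is not specified here; SHA-1 is only the default assumed for the example. Reading the hash value out of the DFXML file is left out.

```python
import hashlib
import hmac
import io


def stream_digest(stream, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a binary stream in fixed-size chunks so large images fit in memory."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def image_is_intact(path, expected_hex, algorithm="sha1"):
    """Compare a locally computed digest with the provider-reported one."""
    with open(path, "rb") as image:
        actual = stream_digest(image, algorithm)
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Stand-in for a downloaded image; a real call would pass the image path
# and the hash value copied from report.xml.
sample = io.BytesIO(b"abc")
sample_digest = stream_digest(sample)
```

Hashing in chunks matters for disk images, which are typically far larger than available memory.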

6.3. Management console web implementation

The Management Console for OpenStack Compute contains an Instance Detail page for each virtual machine guest created by the user. We added a new tab for “Incident Response” to the Instance Detail section. This tab contains our forensic tools, and provides a space for future forensics and incident response related features.

Fig. 3 shows the Incident Response page for a virtual machine. On this page a user can click to retrieve Nova logs, firewall logs, and a disk image. These links return a zip file that contains the data requested and a DFXML file.

7. Preliminary evaluation

We conducted two evaluations of FROST. The first is an objectives-based assessment to validate that FROST can scale and produce correct results. The second is a consumer-oriented demonstration and independent appraisal to gather feedback from potential users.

We tested FROST by creating 100 fictitious users and using the API to launch five virtual machines for each user simultaneously. For each virtual machine, we associated firewall rules that allowed only SSH. With 500 virtual machines running, we used a network scanner to scan ports 1–1024 on each machine. This was done to trigger the firewall to block network traffic on the prohibited ports. We then chose a random user’s key from the list of 100 users, and a random instance from the list of 500, and used the API to try to stop the virtual machine. There was only a 1% chance that the chosen user owned the chosen virtual machine, and this procedure generated Nova logs for both successful and unsuccessful attempts.
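The 1% figure follows directly from the test setup, assuming the user and the instance are drawn uniformly and independently. A short check in Python, with variable names chosen for this example:

```python
from fractions import Fraction

users = 100
vms_per_user = 5
total_vms = users * vms_per_user  # 500 instances in total

# Whichever user is drawn, exactly 5 of the 500 instances belong to that user.
p_owner = Fraction(vms_per_user, total_vms)
p_owner_percent = float(p_owner) * 100
```

Each random stop request therefore succeeds with probability 1/100, so most attempts exercise the permission-denied path and only a few exercise the authorized path.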

We then chose 20 users at random and for each user requested the API logs, firewall logs, and disk image for each of the user’s instances. We validated the integrity of each log and disk image returned by computing the hash of the data and comparing it to the hash value in the DFXML file. No anomalies were observed.

To scale to more users the logging mechanism needs only more storage space. Each API and firewall log entry can be no larger than 1 KB. Using SHA-1 as the cryptographic hash algorithm requires 160 bits for each tree node (user, VM, year, month, day). In the worst case this creates 1664 bytes per entry. Therefore, the logging mechanism can store more than 645,000 log entries in 1 GB of storage. We believe that modern servers can easily handle this load. Cloud providers could choose to share this cost with customers who wish to enable the logging service.
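The capacity estimate can be reproduced with a few lines of arithmetic. The sketch below takes the stated worst case of 1664 bytes per entry as given and reads 1 GB as 2^30 bytes, which is the reading that matches the 645,000 figure. As a point of comparison, and as an assumption of this example rather than a figure from the evaluation, it also counts only the raw storage of a 1 KB entry plus five 160-bit digests.

```python
GIB = 2**30                   # 1 GB read as 2^30 bytes
STATED_BYTES_PER_ENTRY = 1664  # worst case stated in the text

entries_stated = GIB // STATED_BYTES_PER_ENTRY

# Comparison: raw bytes of one entry plus one SHA-1 digest per tree node.
ENTRY_BYTES = 1024            # maximum size of one log entry
SHA1_BYTES = 160 // 8         # 20 bytes per digest
TREE_NODES = 5                # user, VM, year, month, day
raw_bytes_per_entry = ENTRY_BYTES + TREE_NODES * SHA1_BYTES
entries_raw = GIB // raw_bytes_per_entry
```

With 1664 bytes per entry, 1 GB holds 645,277 entries. Counting only the entry and its five digests gives 1124 bytes per entry and 955,286 entries, so the stated worst case is the more conservative of the two estimates.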

Listing 2. Execution of the FROST API to retrieve the firewall logs of virtual machine 0a18799f-c198-4dbb-b369-b49184e3dfbc showing traffic to ports 443 and 53 being dropped. This level of logging is exposed only to users with FROST or with provider assistance.

Listing 3. Execution of the FROST API to retrieve a disk image of volume myvol-e9a5612d. The user can easily validate the integrity by comparing the checksums.


Cloud providers can expect minimal performance impact after deploying FROST. The overhead of calculating checksums and providing them to users is negligible. The time and bandwidth required for a user to download his or her logs or disk images is dependent upon the size of the data. We also expect users to request large data volumes, such as disk images, infrequently.

We demonstrated FROST to 12 users and administrators of a large private government cloud; their reactions were positive. One administrator said “[FROST] is exactly what OpenStack has been missing” and “I appreciate shifting the load [of investigation] away from me and onto our users.” The audience was confident that FROST would be useful in incident response and forensics due to its ease of use. Users exercised FROST’s web and API interfaces and described them as “intuitive and consistent with OpenStack’s design.” Most users anticipated automating their use of FROST, such as for collecting logs on a daily basis. They were also interested in using FROST for non-forensic purposes, such as troubleshooting and compliance. The administrators plan to deploy FROST to this cloud in mid-2013.

This evaluation shows that the integrity, completeness, and accuracy of the forensic data are intact, as identified by SWGDE and NIST in Section 3.1. The legal requirements are similarly met. Our solutions use computer data which are already collected and used in standard practice, or, like firewall logs, are standard practice in computer networks and are easily enabled in OpenStack.

8. Discussion

We discuss advantages, limitations, trust assumptions, and open problems of FROST.

FROST offers advantages to forensic investigators over today’s options for data acquisition. Obtaining a search warrant and serving it to a cloud provider, or requiring a system administrator’s intervention to collect data, puts the investigator at the mercy of others to complete the data acquisition. Today’s forensic tools like EnCase Enterprise also perform acquisition inside the guest VM, which could be compromised. FROST performs acquisition at the host operating system.

One limitation of FROST is that it still requires trust in the cloud provider. In particular, users must trust the host operating system, hardware, network, and cloud employees. We previously explored options for addressing these concerns, such as rooting trust in hardware with trusted platform modules (TPMs) (Dykstra and Sherman, 2012). Today FROST concedes some trust in the system for the ability to perform forensics remotely. FROST comes as a software solution with almost no cost to the provider other than some disk space and the support required to maintain and troubleshoot FROST. More comprehensive trust solutions, including TPMs, require more substantial cost on a large scale, and hardware changes to the entire cloud infrastructure.

The enhanced logging mechanism is not foolproof. First, it is impossible to detect if an untrusted logger intentionally fails to record an event without access to the logger. Second, cloud clients could collude with the logger to roll back or modify events that may be difficult for a third party, such as law enforcement, to detect. Because we assume that the cloud provider is non-coercible, this concern is mitigated.

One open problem is preservation of data in the cloud. Rapid elasticity is a feature of cloud computing, but it comes with the challenge of preserving data in an investigation until that data can be identified and retrieved. OpenStack needs the capability for manual or automatic data preservation to maintain the record of activity of a malicious cloud user. This could be achieved by archiving logs and virtual disks for some period of time after the cloud consumer requests their disposal. However, forensic accountability may present tension with user privacy and requires careful thought.

Forensic examination of cloud layers remains open. For example, forensic capabilities for hypervisors or virtual networks and software-defined networking should be addressed.

Another open problem is the evolution and maturity of OpenStack. OpenStack has an active development community and regular software releases. Future modifications to OpenStack may affect FROST’s functionality. We are working to add FROST to the public OpenStack project.

Future work remains in several areas. FROST is unique to IaaS environments. While the reference implementation has been done with OpenStack, implementations for other IaaS platforms are feasible. Platform-as-a-Service and Software-as-a-Service give less control to cloud users. Forensic capabilities for these environments remain to be done, and require considerations for their unique challenges. The provider will have to collect more of the forensic data, such as logging in the guest operating system.

Expansion within OpenStack also remains. It would be useful to support other virtual disk formats, since OpenStack supports many hypervisors. Other forensic capabilities could be added to FROST, such as data preservation or server-side e-discovery. Users will need the capability to create snapshots of running instances, which will be possible with the latest versions of QEMU and Libvirt. It would be useful to have automated snapshots of virtual machines, and the ability to detect changes between snapshots. Forensic tools for object storage, like that provided by OpenStack Swift and Amazon S3, would be useful today.

Our preliminary evaluation indicates that FROST is able to support many users without introducing undue overhead. More comprehensive testing in a large-scale environment is needed to gauge the performance impact. We suggest an analysis to compare a production environment with and without FROST.

Fig. 3. Screenshot of the OpenStack web interface showing our new incident response tab and links to FROST functions to download Nova logs, firewall logs, and disk images for one virtual machine. These links provide easy access to forensic functions for cloud users.

While FROST implements the acquisition phase of the forensic process, future work should consider solutions for other phases of the process affected by cloud computing. FROST produces data that can be consumed by standard tools, but as data volumes increase other analysis tools will be necessary.

9. Conclusion

We have introduced the FROST suite for OpenStack, the first collection of forensic tools integrated into the management plane of a cloud architecture. These tools enable cloud consumers, law enforcement, and forensic investigators to acquire trustworthy forensic data independent of the cloud provider. In addition to incident response and forensics, FROST can also be used for real-time monitoring, metrics, or auditing.

FROST offers concrete user-accessible forensic capabilities to cloud consumers. While many organizations are still hesitant to adopt cloud solutions because of security concerns, FROST arms them with powerful and immediate response capabilities. Similar tools should be a part of all commercial cloud services, and we look forward to the creation and adoption of more such tools to enhance forensic readiness for cloud computing.

Acknowledgments

We would like to thank the anonymous reviewers for their helpful suggestions. We thank Simson Garfinkel, Ken Zatyko, and Tim Leschke for comments on early drafts. We also thank Ron Rivest and Stuart Haber for insights and suggestions related to hash trees.

Sherman was supported in part by the Department of Defense under IASP grants H98230-11-1-0473 and H98230-12-1-0454, and by the National Science Foundation under SFS grant 1241576. Dykstra was supported in part by an AWS in Education grant award.

References

Amazon Web Services. Amazon web services: overview of security processes. Available at: http://awsmedia.s3.amazonaws.com/pdf/AWS_Security_Whitepaper.pdf; 2011 [accessed 28.10.12].

Amazon Web Services. Amazon CloudWatch. Available at: http://aws.amazon.com/cloudwatch/; 2013 [accessed 05.02.13].

Clarke DE. Towards constant bandwidth overhead integrity checking of untrusted data [Ph.D. thesis]. MIT; 2005.

Crosby SA. Efficient tamper-evident data structures for untrusted servers [Ph.D. thesis]. Rice University; 2009.

Dykstra J, Riehl D. Forensic collection of electronic evidence from infrastructure-as-a-service cloud computing. Richmond Journal of Law and Technology 2012;19. Available at: http://jolt.richmond.edu/wordpress/?p=463.

Dykstra J, Sherman AT. Understanding issues in cloud forensics: two hypothetical case studies. In: Proceedings of the 2011 ADFSL Conference on digital forensics security and law. ADFSL; 2011a. p. 191–206.

Dykstra J, Sherman AT. Understanding issues in cloud forensics: two hypothetical case studies. Journal of Network Forensics 2011b;3(1):19–31.

Dykstra J, Sherman AT. Acquiring forensic evidence from infrastructure-as-a-service cloud computing: exploring and evaluating tools, trust, and techniques. Digital Investigation 2012;9(Suppl.):S90–S98. The Proceedings of the Twelfth Annual DFRWS Conference.

Garfinkel S. Digital forensics XML and the DFXML toolset. Digital Investigation 2012;8(3–4):161–74.

Google. Access logs & storage data (experimental) – Google cloud storage. Available at: https://developers.google.com/storage/docs/accesslogs; 2012 [accessed 28.10.12].

Haber S, Hatano Y, Honda Y, Horne W, Miyazaki K, Sander T, et al. Efficient signature schemes supporting redaction, pseudonymization, and data deidentification. In: Proceedings of the ACM Symposium on information, computer & communication security (ASIACCS’08); 2008. p. 353–62.

Kundu A. Data in the cloud: authentication without leaking [Ph.D. thesis]. Purdue University; 2010.

Li J, Krohn M, Mazières D, Shasha D. Secure untrusted data repository (SUNDR). In: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6 (OSDI’04). Berkeley, CA, USA: USENIX Association; 2004. p. 9–9.

Martini B, Choo KKR. An integrated conceptual digital forensic framework for cloud computing. Available at: http://dx.doi.org/10.1016/j.diin.2012.07.001; 2012 [accessed 10.09.12].

Marty R. Cloud application logging for forensics. In: Proceedings of the 2011 ACM Symposium on applied computing. New York, NY, USA: ACM; SAC ’11 2011. p. 178–84.

Merkle RC. A digital signature based on a conventional encryption function. In: A Conference on the theory and applications of cryptographic techniques on advances in cryptology. London, UK: Springer-Verlag; CRYPTO ’87 1988. p. 369–78.

Microsoft. About storage analytics logging. Available at: http://msdn.microsoft.com/en-us/library/windowsazure/hh343262.aspx; 2012 [accessed 12.11.12].

National Institute of Standards and Technology. Digital data acquisition tool specification. Available at: http://www.cftt.nist.gov/Pub-Draft-1-DDA-Require.pdf; 2004 [accessed 16.09.12].

National Institute of Standards and Technology. The NIST definition of cloud computing. Available at: http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf; 2011 [accessed 05.01.13].

OpenStack. Keynote recap, day 2: why we do what we do. Available at: http://www.openstack.org/blog/2012/10/keynote-recap-day-2-why-we-do-what-we-do/; 2012a [accessed 26.10.12].

OpenStack. OpenStack open source cloud computing software. Available at: http://www.openstack.org/; 2012b [accessed 13.12.12].

Ruan K, Carthy J, Kechadi T, Crosbie M. Cloud forensics: an overview. In: Advances in digital forensics VII; 2011.

Scientific Working Group on Digital Evidence (SWGDE). Data integrity within computer forensics. Available at: https://www.swgde.org/documents/Current%20Documents/2006-04-12%20SWGDE%20Data%20Integrity%20Within%20Computer%20Forensics%20v1.0; 2006 [accessed 16.09.12].

Taylor M, Haggerty J, Gresty D, Lamb D. Forensic investigation of cloud computing systems. Network Security 2011;3:4–10.

U.S. Dept. of Justice. Searching and seizing computers and obtaining electronic evidence in criminal investigations. Available at: http://www.justice.gov/criminal/cybercrime/docs/ssmanual2009.pdf; 2009 [accessed 16.09.12].
