A Novel Digital Forensic Framework for
Cloud Computing Environment
THESIS
Submitted in partial fulfilment
of the requirements for the degree of
DOCTOR OF PHILOSOPHY
by
POVAR DIGAMBAR
ID. No. 2011PHXF401H
Under the supervision of
Dr. G. Geethakumari
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
2015
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
CERTIFICATE
This is to certify that the thesis entitled A Novel Digital Forensic Framework for Cloud
Computing Environment, submitted by Povar Digambar, ID No. 2011PHXF401H,
for the award of the Ph.D. degree of the Institute, embodies the original work done by him under my
supervision.
Signature of the Supervisor
Name in capital letters DR. G. GEETHAKUMARI
Designation Asst. Professor, Dept. of CSIS
Date:
Acknowledgements
Foremost, I would like to express my deepest thanks to my supervisor Dr. G. Geethakumari
for all her suggestions and constant support during this research. Her valuable guidance
and encouragement throughout the period were critical factors which contributed towards
completion of the work. Through her untiring efforts, she helped me to critically analyse
the problems in a systematic manner and consider innovative approaches to evolve
practical solutions.
I would also like to thank Prof. Chittaranjan Hota and Prof. Yoganandam, members
of my doctoral advisory committee for their constant review and invaluable suggestions
in steering the work. I would also like to express my gratitude to other members of the
faculty in the Department of Computer Science and Information Systems Prof. Gururaj,
Prof. Bhanu Murthy, Dr. Tathagata Ray, Dr. Aruna Malapati, Mr. KCS Murti, Mr.
Abhishek Thakur and Mr. Rakesh Prasanna for all their suggestions and encouragement
during various presentations and whenever I interacted with them.
My sincere gratitude to my fellow researchers Meera and Pavan for their continuous
support during all the stages of this work. Our numerous discussions and brainstorming
sessions helped me to analyse the problem from different perspectives to provide critical
insights. I would like to thank each of the other researchers in the department Agrima,
Jagan, Muthu, Prateek, Anita and Neha for all the wonderful time we shared during our
work.
I am also indebted to the members of the Resource Center for Cyber Forensics group
from CDAC, Trivandrum, with whom I have interacted during the course of my research.
Particularly, I would like to acknowledge K.L. Thomas (Assoc. Director), V.K. Bhadran
(Assoc. Director), C. Balan (Jt. Director), Dija (Dy. Director) and Nabeel Koya (Senior
Scientific Officer) for their invaluable suggestions and inputs during the practical aspects
carried out as part of this report.
Finally, my sincere acknowledgement of the sacrifices made and the support given by
each member of my family during this period. They were my pillars of strength, always
understanding and encouraging me. Without their support, this work would never have
been completed.
BITS Pilani, Hyderabad Campus Digambar Povar
October 16, 2015
Abstract
Cloud computing is a transformative computing model for businesses that delivers
computer-based services over the Internet. Despite the technological innovations that have
made it a feasible solution, cloud computing faces major concerns due to its architectural
characteristics. The huge popularity and utility of the cloud environment have made it a
soft target for cloud crimes. Investigating cloud crimes and fixing responsibility for the
cyber crimes committed on cloud platforms help instill confidence and trust in the
stakeholders, be they the clients, the cloud service providers or third-party entities. Cyber
crime investigation is incomplete without proper detection of the digital evidence in the
cloud. In general, cloud computing is characterized by its highly virtualized nature. While
virtualization provides many benefits, it also makes it difficult to detect digital evidence
in the cloud environment. The approach used in traditional digital forensics cannot be
directly applied to the cloud environment due to the presence of virtualization, and hence
a cloud crime investigation is more difficult to perform than a traditional physical
computer investigation. Existing research in cloud forensics has focused only on the
organizational and legal aspects, whereas our work aims to contribute towards the
technical aspects of forensics in the cloud.
The aim of this research is to design a generic digital forensic framework for cloud
crime investigation by identifying the challenges and requirements of forensics in the
virtualized environment of cloud computing; to address the issues of dead/live forensic
analysis within and outside the virtual machines that run in a cloud environment; and to
design a digital forensic triage using a parallel processing framework to examine and
partially analyze virtual machine data, thereby speeding up the investigation of cloud
crime. To analyze the evidence within the virtual machine, we designed various methods
of examining the file system metadata, the registry file content, and the physical memory
content. For the evidence that lies outside a virtual machine (cloud logs), various methods
of log data segregation and collection have been devised.
Table of Contents
Certificate i
Acknowledgements ii
Abstract iii
Table of Contents iv
1 Introduction 1
1.1 Digital Forensics . . . . . 2
1.1.1 Definition . . . . . 2
1.1.2 Digital forensic process . . . . . 3
1.2 Cloud Computing . . . . . 4
1.2.1 Definition . . . . . 5
1.2.2 Cloud Services, Deployment Models and Characteristics . . . . . 5
1.2.3 Cloud Crime . . . . . 8
1.2.4 Cloud Forensics . . . . . 9
1.2.5 Gaps in Existing Research . . . . . 9
1.3 Objectives of the Research . . . . . 11
1.4 Scope and Problem Definition . . . . . 12
1.5 Contributions of the Thesis . . . . . 12
1.6 Outline of the Thesis . . . . . 13
1.7 Summary . . . . . 15
2 Background and Related Work 16
2.1 General Terms in Digital Forensics . . . . . 16
2.1.1 Computer Crime . . . . . 16
2.1.2 Storage Media and File Systems . . . . . 18
2.1.3 Limits of Traditional Digital Forensic Tools . . . . . 18
2.2 Cybercrimes in Cloud Computing . . . . . 19
2.2.1 Sources of Digital Evidence . . . . . 19
2.2.2 Does the Cloud Deployment Model Play a Role? . . . . . 19
2.2.3 Role of Cloud Delivery Models in the Investigation . . . . . 21
2.2.4 Issues with Multi-layered Architecture . . . . . 22
2.3 Cloud Crime and Forensics - Review . . . . . 23
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3 Approaches to Forensics in Presence of Virtualization in Cloud 29
3.1 Introduction . . . . . 29
3.2 Challenges and Requirements of Forensics . . . . . 30
3.3 Detection of Virtual Environment . . . . . 31
3.3.1 Important files in virtual machine investigation . . . . . 32
3.3.2 Changes in the host OS when the virtual platform is used . . . . . 34
3.4 Detection of Virtual Machine Hidden Using ADS . . . . . 35
3.4.1 Role of Alternate Data Streams (ADSs) . . . . . 36
3.4.2 Approach to Hide and Detect a VM Hidden using ADS . . . . . 38
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4 Designing a Digital Forensic Framework for Cloud Computing Systems 45
4.1 Introduction . . . . . 45
4.2 Cloud Forensic Process and Phases . . . . . 46
4.2.1 Comparison of Digital Forensic Frameworks . . . . . 47
4.2.2 Identification of Digital Evidence . . . . . 48
4.2.3 Collection and Preservation of Digital Evidence . . . . . 49
4.2.4 Analysis of the Digital Evidence . . . . . 51
4.2.5 Reporting of Digital Evidence . . . . . 52
4.3 Heuristic Approach for Performing Digital Forensics in Cloud . . . . . 54
4.4 Digital Forensic Architecture for Cloud . . . . . 56
4.4.1 Cloud Infrastructure Setup . . . . . 57
4.4.2 Cloud Deployment (Cloud OS) . . . . . 57
4.4.3 Cloud Investigation and Auditing Tools . . . . . 59
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5 Digital Forensic Methods for Cloud Data Acquisition and Analysis 61
5.1 Introduction . . . . . 61
5.2 Digital Evidence Source Identification, Data Segregation and Acquisition . . . . . 62
5.2.1 Identification of the Evidence . . . . . 62
5.2.2 Segregation of the Evidence . . . . . 63
5.2.3 Acquisition of the Evidence . . . . . 65
5.3 Examination and Partial Analysis of the Evidence . . . . . 68
5.3.1 Within the Virtual Machine . . . . . 68
5.3.2 Boyer-Moore (BM) Algorithm . . . . . 76
5.3.3 Outside the Virtual Machine . . . . . 79
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6 Digital Forensic Triage in the Examination and Partial Analysis 80
6.1 Introduction . . . . . 80
6.2 Digital Forensic Triage . . . . . 81
6.2.1 Introduction to Triage and Background . . . . . 81
6.2.2 Parallel Processing Framework using Hadoop . . . . . 82
6.3 Real-time Digital Forensic Analysis Process . . . . . 83
6.3.1 Selection of the Pattern Matching Algorithm . . . . . 83
6.3.2 Proposed System Architecture . . . . . 84
6.3.3 Proposed System Implementation Details . . . . . 86
6.4 Results and Discussion . . . . . 91
6.5 Summary . . . . . 97
7 Conclusion and Future Scope 99
7.1 Summary of Deductions . . . . . 99
7.2 Future Scope of Work . . . . . 101
7.3 Concluding Remarks . . . . . 102
List of Publications 103
Bibliography 104
Glossary 113
List of Tables
2.1 Categories of computer crimes . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Challenges of data acquisition in private and public clouds . . . . . . . . 20
3.1 Files which make up a virtual machine . . . . . . . . . . . . . . . . . . . 33
3.2 Virtual disk file signatures . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.1 Comparison of digital forensic frameworks . . . . . . . . . . . . . . . . 48
4.2 Hardware configuration details of the private cloud (IaaS) . . . . . . . . . 57
4.3 Basic services of OpenStack cloud OS [31] . . . . . . . . . . . . . . . . 59
5.1 Details of the OpenStack cloud service logs [30] . . . . . . . . . . . . . . 64
5.2 Regular expressions used for corresponding patterns . . . . . . . . . . . . 75
6.1 Report of acquisition and indexing time using traditional digital forensic tools . . . . . 82
6.2 Execution time of Boyer-Moore and KMP algorithms with multiple keywords . . . . . 84
6.3 Hardware configuration of a node in Hadoop cluster . . . . . . . . . . . . 84
List of Figures
1.1 Digital forensic process [64] . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 CSP and cloud customer’s control over multiple layers in three service models . . . . . 21
2.2 Layers of the IaaS cloud environment and cumulative trust required by each layer [55] . . . . . 22
2.3 Types of hypervisors [86] . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1 Multi-level virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Changes in host OS files during VMware workstation installation . . . . . 35
3.3 MFT file record with sample attributes . . . . . . . . . . . . . . . . . . . 36
3.4 MFT file record with named attributes . . . . . . . . . . . . . . . . . . . 37
3.5 Hiding of virtual machine in a cloud hosting server . . . . . . . . . . . . 39
3.6 Launching a hidden virtual machine . . . . . . . . . . . . . . . . . . . . 39
3.7 Configuration file (.vmx) . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.8 Modified configuration file (.vmx) . . . . . . . . . . . . . . . . . . . . . 40
3.9 Hash value of vmtest.txt file before ADS attachment . . . . . . . . . . . . 40
3.10 Hash value of vmtest.txt file after ADS attachment . . . . . . . . . . . . . 41
3.11 Detection of hidden virtual machine . . . . . . . . . . . . . . . . . . . . 43
4.1 Phases of cyber crime investigation . . . . . . . . . . . . . . . . . . . . . 46
4.2 Daubert principles for digital forensic [8] . . . . . . . . . . . . . . . . . 52
4.3 Content of the chain of custody record . . . . . . . . . . . . . . . . . . . 53
4.4 Control flow diagram for digital forensic investigation in cloud . . . . . . 55
4.5 Digital forensic architecture for cloud . . . . . . . . . . . . . . . . . . . 56
4.6 Conceptual architecture of the private cloud IaaS . . . . . . . . . . . . . 58
5.1 Remote data acquisition in the private cloud data center . . . . . . . . . . 65
5.2 Directory of virtual machine instances in the OpenStack cloud . . . . . . 66
5.3 Virtual hard disk location in the OpenStack cloud . . . . . . . . . . . . . 66
5.4 Connecting to cloud hosting server that stores the shared table database . 67
5.5 Shared table with different attribute information . . . . . . . . . . . . . . 67
5.6 Virtual disk examination process . . . . . . . . . . . . . . . . . . . . . . 69
5.7 File system metadata extractor . . . . . . . . . . . . . . . . . . . . . . . 70
5.8 File system metadata extractor report . . . . . . . . . . . . . . . . . . . . 70
5.9 Cloud VM’s registry analyzer . . . . . . . . . . . . . . . . . . . . . . . . 71
5.10 Cloud VM’s registry analyzer report . . . . . . . . . . . . . . . . . . . . 72
5.11 Selective memory analysis . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.12 Selective memory analysis report . . . . . . . . . . . . . . . . . . . . . . 73
5.13 Selection of keyword option for searching . . . . . . . . . . . . . . . . . 73
5.14 Entering multiple keywords for search (indexing) . . . . . . . . . . . . . 74
5.15 Selection of RE option for searching . . . . . . . . . . . . . . . . . . . . 74
5.16 Selecting multiple patterns for search (indexing) . . . . . . . . . . . . . . 75
5.17 Memory analysis report (result of keywords or pattern matching search) . 76
6.1 MapReduce application framework to count distinct words of a file . . . . 83
6.2 Mapping of Hadoop framework components to forensic triage [23] . . . . 85
6.3 Proposed system for ‘real-time digital forensic partial analysis’ using MapReduce with KMP/BM search engine . . . . . 87
6.4 Default regular expressions to generate Mapper code . . . . . . . . . . . 90
6.5 Adding regular expression to generate Mapper code . . . . . . . . . . . . 90
6.6 Searching time of KMP based MapReduce with single keyword . . . . . 92
6.7 Searching time of KMP based MapReduce with multiple keywords . . . . 94
6.8 Searching time of RE based MapReduce with single pattern . . . . . . . . 95
6.9 Searching time of RE based MapReduce with multiple patterns . . . . . . 96
Chapter 1
Introduction
“I cannot teach anybody anything. I can only make them think.”
- Socrates
Over the past few years, cloud computing has revolutionized the methods by which
digital information is stored, transmitted, and processed. Cloud computing is not just
a hyped model but a technology embraced by Information Technology giants such as
Apple, Amazon, Microsoft, Google, Oracle, IBM, HP, and others. Cloud computing has
the potential to become one of the most transformative developments in the history of
computing, following in the footsteps of mainframes, minicomputers, PCs (Personal
Computers), smart phones, and so on [81].
Gartner estimates that there are currently about 50 million enterprise users of cloud
office systems, which represent only 8 percent of overall office system users (excluding
China and India). Gartner, however, predicts that a major shift toward cloud office systems
will begin by the first half of 2015 and reach 33 percent penetration by 2017 and 60
percent by 2020 (Gartner, 2013). According to an IDC IT Cloud Services User Survey,
74 percent of IT executives and CIOs have cited security as the top challenge preventing
their adoption of the cloud services model [6].
The eighth annual Worldwide Infrastructure Security Report (2013) from security
provider Arbor Networks has revealed how cloud services and data centres are "increas-
ingly victimised" by cyber attackers. Some recent attacks on cloud computing platforms
strengthen this security concern. Due to their characteristics, cloud services are more
vulnerable to Denial of Service (DoS) attacks, which may cause extreme damage. For
example, a botnet attack (running of the "Zeus botnet controller" on an EC2 instance) on
Amazon's cloud infrastructure was reported in 2009 [46]. This implies that an adversary
can rent any number of virtual machines (VMs) to launch a Distributed Denial of Service
(DDoS) attack on other systems, including the infrastructure on which the VMs themselves
are running. Also, due to the remote storage facility provided by cloud platforms such as
Google Drive, Dropbox, SpiderOak, Amazon Cloud Drive, Microsoft SkyDrive, Ubuntu
One, Apple iCloud, etc., cyber criminals can keep their secret files (e.g., pornographic
pictures, forged documents, etc.) in cloud storage and destroy all digital evidence on their
local storage to remain undetected during an investigation. To investigate these kinds of
cybercrimes involving cloud computing platforms, investigators have to carry out digital
forensic investigation on the suspected client device as well as in the cloud computing
environment.
Cyber crime is a form of crime in which the Internet or a computer is used as a medium
to commit the crime [92].
1.1 Digital Forensics
As noted by Ben Martini and Kim-Kwang Raymond Choo [69], digital forensics is a
relatively new sub-discipline among the common forensic science disciplines. Digital
forensics has a number of synonyms, including computer forensics, cyber forensics,
computational forensics and forensic computing.
1.1.1 Definition
One of the first definitions of digital forensics was provided by McKemmish in 1999 as,
“The process of identifying, preserving, analyzing and presenting digital evidence in a
manner that is legally acceptable by court of law”[71].
Another widely adopted definition introduced by the inaugural DFRWS (Digital Forensic
Research Workshop, August 7-8, 2001, Utica, New York) is given as,
“The use of scientifically derived and proven methods toward the preservation, collec-
tion, validation, identification, analysis, interpretation, documentation and presentation
of digital evidence derived from digital sources for the purpose of facilitating or furthering
the reconstruction of events found to be criminal, or helping to anticipate unauthorized
actions shown to be disruptive to planned operations”.
-DFRWS, 2001
Today, the digital forensic community uses the definition provided by NIST [64], which
shares some similarities with those of McKemmish and the DFRWS across the four
phases given below.
1. Collection phase discusses identifying relevant data, preserving its integrity and
acquiring the data;
2. Examination phase uses automated and manual tools to extract data of interest
while ensuring preservation;
3. Analysis phase is concerned with deriving useful information from the results of
the examination; and
4. Reporting phase is concerned with the preparation and presentation of the forensic
analysis results.
Thus we can say that digital forensics deals with the forensic analysis of cyber crimes;
its role is to systematically gather digital evidence, analyze it to establish credible
evidence, and authentically present it to a court of law.
1.1.2 Digital forensic process
The most common goal of performing forensics is to gain a better understanding of an
event of interest by finding and analyzing the facts related to that event [64]. The basic
phases required for a forensic process are collection, examination, analysis and reporting,
as shown in the following figure. The digital forensic process transforms the content of
a storage medium into evidence. This transformation has three stages.
Figure 1.1: Digital forensic process [64]
Stage 1:
Collection - examination: where digital data is collected in a format that can be
understood by various forensic tools.
Stage 2:
Examination - analysis: where the relevant pieces of information are extracted from the
collected data.
Stage 3:
Analysis - reporting: where, by using various analysis methods, the forensic investigator
processes and analyzes the data to draw conclusions relevant to the case under
investigation.
The arrow from the reporting phase back to the collection phase denotes that the reported
evidence is repeatable and reproducible.
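The preservation requirement running through these stages can be illustrated with a short sketch (the file paths and the choice of SHA-256 are illustrative, not part of any specific toolset): an acquired image is hashed once at collection time, and any later phase re-hashes it to confirm the evidence is still bit-for-bit identical, which is what makes the reported results repeatable.

```python
import hashlib

def sha256_of(path, block_size=1 << 20):
    """Stream a (possibly large) evidence image and return its SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for block in iter(lambda: image.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify_integrity(path, recorded_digest):
    """Re-hash the image and compare against the digest recorded at collection."""
    return sha256_of(path) == recorded_digest
```

The digest computed at collection time would be entered in the chain-of-custody record; each later examination or analysis pass calls `verify_integrity` first, and a mismatch means the data can no longer be presented as the evidence originally collected.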
1.2 Cloud Computing
Cloud computing is a relatively new business model, following grid computing, that
makes computer resources available as a service to end users over a network. Various
definitions and interpretations of the term “cloud computing” exist in the world commu-
nity of users. Vaquero et al. [90] reviewed more than 20 cloud computing definitions and
noticed that the key terms mandatory in a minimal definition are scalability, a pay-per-use
utility model and virtualization. The most widely used definition of cloud computing was
provided by Mell and Grance in the NIST special publication [72].
1.2.1 Definition
“a model for enabling ubiquitous, convenient, on-demand network access to a shared
pool of configurable computing resources (e.g., networks, servers, storage, applications
and services) that can be rapidly provisioned and released with minimal management
effort or service provider interaction”.
- NIST [72]
This was the final definition released by NIST in 2011 after 15 versions of working defi-
nitions. This definition was also adopted by the Australian Government for Information
and Communication Technology (ICT) services [7]. A few researchers have suggested it
as the “de facto standard” [60]. According to NIST, the working of cloud computing is
based on a 3-4-5 rule: three unique service models, four unique deployment models and
five unique characteristics.
1.2.2 Cloud Services, Deployment Models and Characteristics
Cloud Services
The three services, named according to the abstraction level of the capability provided
and the service models of the providers, are [91]:
1. Infrastructure-as-a-Service (IaaS)
2. Platform-as-a-Service (PaaS)
3. Software-as-a-Service (SaaS)
Infrastructure-as-a-Service (IaaS):
This service model provides the user the facility of renting processing power and storage
to run his or her own virtual machine in the cloud. A user can access the launched virtual
machine through a thin client interface such as a web browser running on devices like
computers, mobiles, PDAs (Personal Digital Assistants), etc. Users are charged based
on the resources their virtual machines consume from the cloud. Amazon, through its
AWS (Amazon Web Services) console, provides IaaS using its EC2 (Elastic Compute
Cloud) facility [4]. Many other vendors like Microsoft, Rackspace, GoGrid, Terremark,
etc., provide the same facility. There are also well-known open source cloud platforms
available for this purpose, like Eucalyptus [13], OpenNebula [28], OpenStack [29], etc.
Platform-as-a-Service (PaaS):
Through this model, the cloud owner provides the user the facility of renting a platform
to develop and deploy user applications in the cloud environment. It is basically applica-
tion middleware offered as a service to developers, integrators, and architects [86]. Users
are charged according to the platform used (e.g., Database, .Net, etc.) and the bandwidth
consumed. The well-known example of PaaS is Google App Engine [19]. There are a
number of other PaaS providers like Windows Azure, Force.com, Drupal, Wolf Frame-
works, Cloud Foundry, IBM Bluemix, Eccentex, AppBase, LongJump, SquareSpace,
WaveMaker, Heroku, GitHub, etc., to name a few.
Software-as-a-Service (SaaS):
Using this model, the user can make use of the cloud service provider’s software appli-
cations running on the cloud infrastructure [86]. The user can access the application
through a thin client interface such as a web browser from various devices like comput-
ers, mobiles, PDAs (Personal Digital Assistants), etc. Users are charged based on their
usage of the application. Examples of SaaS include applications like Salesforce.com,
QuickBooks, GoToMeeting, Zoho Office Suite, Microsoft Office 365, Google Docs,
Google Calendar, Facebook, LinkedIn, SlideShare, etc., to name a few.
Deployment Models
Based on the deployment model, cloud computing can be categorized into four categories
[72]:
1. Private Cloud
2. Public Cloud
3. Community Cloud
4. Hybrid Cloud
Private Cloud:
In this model, the cloud infrastructure is fully operated by the owner organization. It is
an internal data center in which the infrastructure is located on the organization’s
premises. One can set up this kind of cloud computing environment using solutions like
OpenStack, Eucalyptus, OpenNebula, VMware [40], etc.
Public Cloud:
The cloud service provider (CSP) owns the cloud infrastructure and makes it available
to the general public or a large industry group. Amazon, Microsoft and Google are the
major public cloud service providers in the current IT industry.
Community Cloud:
This is similar to the grid computing model in which several organizations with common
concerns (e.g., mission, security requirements, policy, and compliance considerations)
share the cloud infrastructure. Different private cloud data centers can be connected to
form this kind of computing model. Public cloud service providers like Amazon,
Microsoft, etc., can deploy this kind of cloud platform based on user requirements.
Hybrid Cloud:
This model is a composition of two or more clouds (private, community, or public).
Hybrid cloud architecture requires both on-premises resources and off-site (remote)
server-based cloud infrastructure. Eucalyptus, VMware, etc., are examples of hybrid
cloud deployment solutions.
Characteristics
Five unique characteristics of cloud computing according to NIST (National Institute of
Standards and Technology) are [72]:
1. On-demand Self-service
2. Broad Network Access
3. Resource Pooling
4. Rapid Elasticity
5. Measured Service
On-demand Self-service: A user of a cloud can provision computer resources without
the need for interaction with cloud service provider personnel. For example, one can
log on to Amazon EC2 and obtain virtual resources such as server, storage, memory and
network within minutes [86].
Broad Network Access: Ubiquitous access to virtual resources in the cloud, i.e., access
to resources in the cloud is available over the network using standard methods, in a
manner that provides platform-independent access to clients of all types.
Resource Pooling: A cloud service provider creates resources that are pooled together in
a system that supports multi-tenant usage. Physical and virtual systems are dynamically
allocated or reallocated as needed.
Rapid Elasticity: Resources can be rapidly and elastically provisioned. The system can
add resources by either scaling up systems (more powerful computers) or scaling out
systems (more computers of the same kind), and scaling may be automatic or manual.
From the standpoint of the client, cloud computing resources should look limitless and
can be purchased at any time and in any quantity.
Measured Service: The use of cloud system resources is measured, audited, and reported
to the customer based on a metered system.
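As a toy illustration of the pay-per-use accounting behind Measured Service, the sketch below bills metered resource consumption against per-unit rates; the resource names and rates are invented for the example and do not reflect any real provider’s price list.

```python
# Hypothetical per-unit rates; real providers publish their own price lists.
RATES = {"vcpu_hours": 0.02, "ram_gb_hours": 0.005, "storage_gb_hours": 0.0001}

def metered_charge(usage):
    """Bill the customer for exactly the resources their VM consumed.

    `usage` maps each metered resource to the quantity consumed, e.g.
    {"vcpu_hours": 48, "ram_gb_hours": 96, "storage_gb_hours": 2400}
    for a 2-vCPU, 4 GB VM with a 100 GB disk left running for 24 hours.
    """
    return sum(RATES[resource] * amount for resource, amount in usage.items())

bill = metered_charge({"vcpu_hours": 48, "ram_gb_hours": 96, "storage_gb_hours": 2400})
```

The point of the sketch is that the customer pays only for measured, audited consumption rather than for provisioned capacity, which is what distinguishes this characteristic from traditional hosting.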
1.2.3 Cloud Crime
Ruan et al. [82] have extended the definition of cyber crime (or computer crime) to cloud
crime as,
“a crime that involves cloud computing in a sense that the cloud can be the object, subject
or tool of crimes (object - CSP is the target of the crime; subject - cloud is the environment
where the crime is committed; tool - cloud can also be the tool used to conduct or plan a
crime)”.
Cyber criminals may use DDoS (Distributed Denial of Service) attacks to target the CSP
(cloud service provider), use the cloud environment to commit a crime such as identity
theft of a cloud user or illegal access to data residing in the cloud, or use the cloud as a
platform to store crime-related data and share it with others.
1.2.4 Cloud Forensics
As there is no unique definition available for cloud computing, it is too early to expect
a definition of an emerging area like cloud forensics. According to Ruan et al. [82],
cloud computing is based on broad network access, and network forensics deals with the
forensic investigation of networks; cloud forensics can therefore be seen as a subset of
network forensics. They also view it as a cross-discipline of cloud computing and digital
forensics.
Shams et al. [93] define cloud forensics as “the application of computer forensic prin-
ciples and procedures in a cloud computing environment”.
We define cloud forensics as “the process of applying the various digital forensic phases
in a cloud platform depending on the deployment model of the cloud”. For example, the
digital forensic process used for a private cloud may differ from that for a public cloud
environment.
1.2.5 Gaps in Existing Research
Cloud computing has completely changed the way digital data is stored, transmitted
and processed. With such a paradigm shift from desktop systems to geographically dis-
tributed networks of servers, many technological and legal challenges may be encoun-
tered when we intend to perform digital forensics in different types of cloud platforms.
In the past few years, many researchers have contributed to identifying the forensic chal-
lenges and designing forensic frameworks and data acquisition methods for cloud com-
puting systems. Though all these works identify the technical, organizational and legal
challenges of cloud forensic analysis, no concrete solutions have been proposed that ad-
dress the challenges of applying forensics to the cloud environment in general and that
are acceptable to the forensic investigators or LEAs (Law Enforcement Agencies) of this
digital space.
The related research done so far largely studies the issues in the cloud forensic arena,
with a few specific contributions like FROST (Digital Forensic Tools for the OpenStack
Cloud Computing Platform) [56] as exceptions. Therefore, there is a real requirement to
undertake forensic research in the cloud on a large scale. The major challenges of cloud
forensics originate from the very characteristics by which the cloud computing platform
is identified. Accordingly, we list a few gaps which demand the immediate attention of
cloud researchers for practical solutions to cloud forensics.
1. Absence of uniform standards and protocols - This leads to technical difficulties in
forensic data collection and analysis
2. Investigation in virtualized environment is a real challenge - as virtualization is a
key technology used to implement cloud services
3. Multi-tenancy and multi-jurisdiction - these result in legal concerns with respect to
cloud forensics
4. Evidence segregation is a big issue - due to the “resource pooling” characteristic
of the cloud
5. Partial forensic examination - Absence of tools for pre-processing of virtual disks
and memory to help in completing the investigation process
6. Absence of digital forensic triage in cloud data analysis - no use of parallel processing
techniques to index virtual disk data and speed up the investigation process
7. Lack of interoperability between cloud providers - there is no interoperability as such
among cloud providers
8. Lack of transparency - the operational details of cloud data centers are not transpar-
ent enough to cloud investigators
9. Maintaining chain of custody - due to the multi-layered and distributed architecture
of cloud, the chain of custody of data may be difficult to verify
10. Loss of data control - cloud user or investigator will have little or no control (or
knowledge) over the physical locations of the digital evidence
11. Virtual machine data is not persistent - if a virtual machine is terminated, there is
no procedure available for recovering its data
12. Identification of evidence - sources of evidence pertain to different cloud platforms
and hence no unique method for identification
13. Live forensics - live data acquisition will have data integrity or preservation prob-
lems
14. Imaging the data center of the cloud - complete data center imaging of the cloud is
not possible and partial imaging may have legal implications
15. Selective data acquisition - requires a good amount of prior knowledge about the
cloud platform
16. Layers of trust - because cloud has a multi-layered architecture, trust is required at
various layers to maintain the integrity of the evidence
17. Reliance on cloud providers - for data acquisition, the investigator has to exclusively
depend on the cloud provider
18. Outsourcing of services to third parties - this widens the scope of the investigation,
and the forensic activity needs to be carried out as a joint effort
19. Absence of cloud forensic SLAs - There are no well-framed SLAs (Service Level
Agreements) for performing forensics in cloud
1.3 Objectives of the Research
The objectives of this research work include the following:
1. Explore the challenges and requirements of forensics in the virtualized environment
of cloud computing
2. Design a digital forensic framework for the cloud computing systems from the view
point of investigator and/or cloud architecture
3. Address the issues of dead/live forensic analysis within/outside the virtual machine
that runs in a cloud environment
4. Use digital forensic triage in the examination and partial analysis phases of cloud
forensics
1.4 Scope and Problem Definition
Cloud computing is maturing and continues to be the most hyped concept in the infor-
mation technology industry, and it evokes different perceptions in different people. These
developments may create problems for law enforcement agencies (LEAs) throughout the
world that are actively involved in cyber crime investigation, especially in the cloud. The
work "A Novel Digital Forensic Framework for Cloud Computing Environment" helps an
investigator or a Cloud Service Provider get an overall idea of performing digital forensic
investigation in a cloud computing environment. The digital forensic methods suggested
in this research can scale to cloud data for handling the analysis of cloud crimes. The
proposed methods of partial analysis help the forensic investigator minimize the overall
processing time of a cloud crime under investigation. The digital forensic research
community, which is actively involved in the design and development of cyber forensic
tools for cloud computing systems, could consider the cloud forensic architecture presented
in this work as a reference model. In brief, the work presented in this thesis can be a way
forward to combat cyber crimes in cloud computing systems.
1.5 Contributions of the Thesis
The contributions of this work can be organized into four aspects. The first aspect deals
with identifying the challenges and requirements of forensics in the virtualized environ-
ment that is omnipresent in the cloud. More specifically, the contribution is restricted to
detecting the virtual environment in multi-level virtualization, identifying the forensically
relevant files which are generated when virtual systems are used inside the virtual
machines that are part of the cloud environment, and devising a method to detect virtual
machines hidden using alternate data streams in such virtual systems.
In the second aspect, we designed a digital forensic process for cloud computing sys-
tems from the viewpoint of the investigator and/or the cloud architecture. The digital
forensic process that we designed for investigators has a built-in digital forensic
framework which contains the required phases of digital forensics for cloud computing
systems. We also compared our proposed digital forensic framework with the existing
traditional forensic frameworks. A generic digital forensic architecture is designed for
cloud computing systems to understand the challenges that may come up in designing
new digital forensic tools for cloud platforms.
In the third aspect, we addressed the issues of forensic acquisition and analysis of
evidence within and/or outside the virtual machine that runs in a cloud environment. In
particular, we have designed the digital forensic methods for cloud data acquisition and
analysis. For the examination and partial analysis of the evidential artifacts of a virtual
machine, we have designed and implemented tools to collect actionable evidence depend-
ing on the nature of the cloud crime. We have used the Boyer-Moore pattern matching
algorithm to report the running status of a virtual machine by extracting the physical
memory artifacts of that virtual machine. To analyze the cloud logs (outside the virtual
machine), we designed and implemented tools to segregate and collect important logs
pertaining to the virtual instances.
For large-scale data examination, we have designed and implemented a digital forensic
triage using parallel processing framework to find the evidence of interest to the investiga-
tor in real time. This forms the fourth aspect of the thesis. The approach uses MapReduce
with inbuilt KMP (Knuth-Morris-Pratt) and Boyer-Moore string search algorithms on the
distributed computing platform Hadoop to search for user specified keywords. The facil-
ity of regular expression search is also provided in this framework.
1.6 Outline of the Thesis
The thesis is organized into seven chapters. In Chapter 1, we provide an introduc-
tion to cloud computing and digital forensics. In Chapter 2, we discuss the background
and related work. To begin with, we define a few general terms in digital forensics and
discuss different storage media and file systems. For cloud crime investigation, the
limitations of the traditional digital forensic tools, the sources of digital evidence, the
role of deployment models, the role of delivery models, and the issues with the multi-
layered architecture are discussed in detail. Finally, we provide an extensive research
review on cloud crime and forensics.
Chapter 3 deals with identifying the challenges and requirements of performing foren-
sic activity in the presence of virtualization in the cloud environment. When virtualization
software runs inside a virtual machine to create another virtual layer in the cloud (second-
level virtualization), such virtualization can be detected by identifying the relevant files
and the changes in the host OS of the virtual machine; this detection is discussed. A
technique for detecting virtual machines hidden using alternate data streams (ADS) for
malicious purposes is also discussed in detail.
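As a sketch of the kind of ADS check involved: on NTFS, the Windows command `dir /r` lists alternate data streams as `<file>:<stream>:$DATA`, so a saved listing can be scanned for stream names that carry virtual disk extensions. The regex, extension list, and listing layout below are illustrative assumptions (file names containing spaces are not handled), not the detection technique of Chapter 3.

```python
import re

# `dir /r` on Windows prints NTFS alternate data streams (ADS) as
# "<size> <file>:<stream>:$DATA". A VM disk hidden in an ADS shows up
# as a stream whose name carries a virtual disk extension.
VM_EXTS = (".vdi", ".vmdk", ".vhd", ".qcow2", ".img")
STREAM_RE = re.compile(r"^\s*[\d,]+\s+(\S+?):([^:]+):\$DATA\s*$")

def suspicious_streams(dir_r_output):
    """Return (host_file, stream_name) pairs whose ADS name looks like a VM disk."""
    hits = []
    for line in dir_r_output.splitlines():
        m = STREAM_RE.match(line)
        if m and m.group(2).lower().endswith(VM_EXTS):
            hits.append((m.group(1), m.group(2)))
    return hits
```

Ordinary directory listings never show these streams, which is what makes ADS attractive for hiding VM images in the first place.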
To help an investigator understand the complete digital forensic process for investigating
cloud crime, we designed a digital forensic framework for the cloud, which is discussed
in Chapter 4. We describe the cloud forensic process by elaborating on each of its phases,
such as identification, collection and preservation, analysis, and reporting, in detail.
Also, we compare our proposed framework with the existing digital forensic frameworks.
Having identified the different phases of cloud forensics, we designed a control flow
diagram of the digital forensic process for cloud computing systems, which provides a
detailed view of how to perform a cloud crime investigation of the client device as well
as the cloud data center. Finally, we provide a generic architecture of cloud forensics
which includes an IaaS (Infrastructure as a Service) cloud test bed using OpenStack for
experimental purposes.
In Chapter 5, the digital forensic methods for data acquisition and analysis in the
cloud environment are provided. In particular, the methods for examination and partial
analysis of the data within the virtual machine, such as examination of the file system
metadata, the registry files, and the physical memory, are described in detail. Also, the
methods of data segregation and acquisition with respect to the cloud logs are discussed.
Finally, the implementation details of the Boyer-Moore pattern matching algorithm, which
is used to search multiple keywords in a memory image, are provided.
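As an illustration of the approach, a minimal Boyer-Moore search (bad-character rule only) applied to multi-keyword search over a raw memory image can be sketched as follows; this is a simplified stand-in under stated assumptions, not the implementation detailed in Chapter 5.

```python
def bm_search(data, pattern):
    """Boyer-Moore search (bad-character rule only): offsets of pattern in data."""
    m, n = len(pattern), len(data)
    if m == 0 or m > n:
        return []
    # rightmost position of each byte value within the pattern
    last = {b: i for i, b in enumerate(pattern)}
    hits, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and data[s + j] == pattern[j]:
            j -= 1          # compare right to left
        if j < 0:
            hits.append(s)  # full match at shift s
            s += 1
        else:
            # skip ahead based on where the mismatched byte last occurs
            s += max(1, j - last.get(data[s + j], -1))
    return hits

def search_keywords(image_bytes, keywords):
    """Map each keyword to its byte offsets in a (memory) image."""
    return {kw: bm_search(image_bytes, kw) for kw in keywords}
```

The bad-character shift grows with pattern length, which is why Boyer-Moore suits scanning large memory images for comparatively long keywords.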
The approach of digital forensic triage in the examination and partial analysis
of the cloud data is discussed in Chapter 6. To begin with, the requirement of
digital forensic triage in the cloud and the parallel processing framework using Hadoop are
explained in detail. Then, the complete real-time digital forensic analysis process, with
emphasis on the selection of a pattern matching algorithm, the proposed system architec-
ture, and the implementation details, is provided. Finally, the searching capability of the
KMP (Knuth-Morris-Pratt) based MapReduce over single and multiple keywords, and RE
(regular expression) based MapReduce over single and multiple patterns, on a multi-node
Hadoop cluster is summarized.
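As a rough illustration of how a KMP-based map step could emit keyword hits from input splits, the sketch below counts occurrences per line; the function names and the (keyword, count) output shape are illustrative assumptions rather than the MapReduce implementation described in Chapter 6.

```python
def kmp_failure(pattern):
    """Failure (prefix) function for KMP."""
    fail, k = [0] * len(pattern), 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_count(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text in O(n + m)."""
    if not pattern:
        return 0
    fail, k, count = kmp_failure(pattern), 0, 0
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]     # fall back along the failure function
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            count += 1
            k = fail[k - 1]     # allow overlapping matches
    return count

def mapper(lines, keywords):
    """Hadoop-streaming-style map step: emit (keyword, count) per input line."""
    for line in lines:
        for kw in keywords:
            c = kmp_count(line, kw)
            if c:
                yield kw, c
```

In a streaming job, a reducer would then sum the counts per keyword across all splits.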
1.7 Summary
The concepts of digital forensics and cloud computing are not new. In the last few years,
network administrators and technology developers have represented the Internet as a cloud.
Digital forensics started in the late 1970s and grew as a field during the 1980s and 1990s;
many tools for performing digital forensics were developed between 1980 and 2015.
In this introductory chapter, we have introduced the concepts of cyber crime, digital
forensics, cloud computing, cloud crime and cloud forensics, taking inputs from various
researchers in the fields of digital forensics and cloud computing. We have identified gaps
in the existing research and defined the scope and the problem definition. Finally, we
described the contributions made and gave the outline of the thesis. In Chapter 2, we
elaborate on the background and the related work with respect to this research.
Chapter 2
Background and Related Work
“A little knowledge is a dangerous thing. So is a lot.”
- Albert Einstein
2.1 General Terms in Digital Forensics
2.1.1 Computer Crime
In Chapter 1, we discussed various terms related to digital forensics and cloud com-
puting. Having identified the gaps in the existing research in the area of cloud forensics,
we now provide more information on its background and related research.
Computer crime, also called cyber crime, is an "unlawful act wherein the computer
is either a tool or a target or both" [78]. A computer may be used as a tool to commit a
crime (for example: child pornography, threatening e-mail, spam, phishing, etc.), or the
computer itself may become the target of a crime (for example: viruses, worms, software
piracy, hacking, etc.). Computer crimes fall into three categories, as shown in
Table 2.1.
Table 2.1.
Digital Evidence:
Digital evidence can be defined as "the digital data which can establish that a crime has
been committed or can provide a link between a crime and its victim or a crime and its
perpetrator" [52]. So, digital evidence is a means for the investigation and analysis of
cyber crimes to bring the culprits to conviction. Digital evidence may exist in the form of
a text, audio, image, video, or raw file (a binary file of 0's and 1's).
Table 2.1: Categories of computer crimes
Against Organizations: Hacking, DOS (Denial of Service), Virus/Worm/Trojan/Spyware
Attacks, IPR Violations, Stealing Trade Secrets, Website Defacement, etc.
Against People: Phishing, Identity Theft, E-mail Hijacking, Defamation, Internet Fraud,
Pornography, Distribution of Pirated Software, etc.
Against Country: Cyber Terrorism, Cyber Attacks, etc.
Seizure:
Seizure is the process of taking custody of the suspect computer for evidence collection.
A systematic procedure is needed for seizure to avoid loss of digital evidence. Seizure
may not be possible in the cloud environment due to geographically dispersed servers
(public cloud) and the multi-tenant nature of the cloud. Multi-tenancy is a technology by
which multiple organizations or users share the computing resources of a physical server.
Acquisition:
Acquisition is the process of recording the physical scene and duplicating digital evidence
using standardized and accepted procedures. This process is known as imaging (bit-by-bit
copying) of digital storage media. Full imaging also may not be possible in the cloud
environment due to its virtualized nature. Rather, selective remote data collection is
possible for cloud crime analysis [55].
Authentication:
Authentication means validating the seized and acquired evidence to make sure that its
integrity has not been compromised. Investigators generally use hashing algorithms (MD5 -
Message Digest, or SHA - Secure Hash Algorithm) to compute a checksum that is used to
verify the evidence integrity.
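A minimal sketch of such an integrity check, using Python's standard hashlib module (the function name and chunk size are illustrative choices):

```python
import hashlib

def evidence_digests(path, chunk_size=1 << 20):
    """Compute MD5 and SHA-256 of an evidence image, reading in chunks
    so that multi-gigabyte images need not fit in memory."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Recomputing the digests at any later stage of the investigation and comparing them with the values recorded at acquisition time demonstrates that the evidence is unchanged.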
2.1.2 Storage Media and File Systems
Digital data created for any purpose must be stored in a proper format so that it is easily
accessible for further processing. Any data which is stored in the form of 0's and 1's is
termed digital data. In the digital/computing environment, many devices are designed
for storing such data. Today, data is generally stored in three different ways: electro-
magnetism (magnetic disks - hard disks), microscopic electrical transistors (flash mem-
ory - USB drives, solid state drives, etc.), and reflected light (optical storage - CDs, DVDs,
etc.) [83]. A file is a named collection of data. A file system is a data structure which
allows this data to be stored in a systematic manner. The file system keeps track of the
free space as well as the location of each file on the storage. The free space is also called
unallocated space; it is either empty or contains files which were deleted previously.
File systems are of different types. Windows operating systems use FAT (File Al-
location Table, with versions such as FAT12, FAT16, FAT32 and exFAT) and NTFS
(New Technology File System). The Mac operating system used in Apple products uses file
systems like HFS (Hierarchical File System) and HFS+. File systems used by open source
operating systems include ext2, ext3, ext4, etc. Distributed systems like cloud computing
make use of GFS (Google File System), HDFS (Hadoop Distributed File System), etc.
2.1.3 Limits of Traditional Digital Forensic Tools
The FBI (Federal Bureau of Investigation) reports that the average amount of data per case
grew 6.65 times during 2003-2011. In reality, however, the capability of digital forensic
tools has not grown appreciably to handle this data growth rate [80]. There are
numerous digital forensic tools that generate a timeline view based on file system metadata,
like EnCase [12], TSK (The Sleuth Kit) [37], CyberCheck [9], etc. The limitation of these
tools is that they depend on file systems and do not use the content of individual files.
Data segregation is not required in traditional digital forensics and hence there is no
segregation phase. The log formats of desktop or server operating systems are not similar
to the logs of a cloud operating system. For cloud log analysis, data segregation and
collection are required.
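As a sketch of what such segregation can look like: OpenStack compute logs tag entries with an "[instance: &lt;uuid&gt;]" marker, so lines can be grouped by the VM they refer to. The exact log layout varies across cloud platforms and releases, so the pattern below is an assumption for illustration.

```python
import re
from collections import defaultdict

# Assumed marker format: "[instance: <36-char uuid>]" as used in
# OpenStack compute logs; other cloud platforms tag entries differently.
INSTANCE_RE = re.compile(r"\[instance: ([0-9a-f-]{36})\]")

def segregate_by_instance(log_lines):
    """Group log lines by the VM (instance UUID) they refer to."""
    per_vm = defaultdict(list)
    for line in log_lines:
        m = INSTANCE_RE.search(line)
        if m:
            per_vm[m.group(1)].append(line)
    return dict(per_vm)
```

An investigator interested in one tenant's VM can then collect only that VM's entries, instead of acquiring the data center's entire log store.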
2.2 Cybercrimes in Cloud Computing
As surveyed and reported by RSA (2012), McAfee (2013), Norton (2013), etc., cybercrime
will pose many challenges to digital forensics in the near future, including threats
due to virtualization and cloud computing among others [48]. Incident response and
computer forensics in a cloud computing environment require fundamentally different
tools, techniques, and training [55]. A draft report from the National Institute of Standards
and Technology noted that "little guidance exists on how to acquire and conduct forensics
in a cloud platform" and suggested that the existing best practices and guidelines still
apply to digital forensics in the cloud computing environment [22].
2.2.1 Sources of Digital Evidence
Any information in a traditional desktop system is stored as files, including data re-
lated to system activity. Depending on the nature of the computer crime, the files from
the storage are retrieved and parsed to investigate the cause. Similar to a desktop ma-
chine, a cloud user can create and run virtual machines (VMs) in the cloud environment.
Such a VM is as good as a physical machine and creates lots of data in the cloud through
its activity and management. The data created by a VM includes the virtual disk, virtual
physical memory, and logs (VM logs, cloud service logs, firewall logs). The virtual physical
memory is the memory space seen by the VM's operating system, not by the cloud. Virtual
disk formats that different cloud providers may support include .qcow2, .vhd, .vdi, .vmdk,
.img, etc. Every cloud provider has its own mechanism for storing service logs (activity
maintenance information) and hence there is no interoperability in log formats among
cloud providers.
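Most of these virtual disk formats can be told apart by their header magic bytes, which an examiner can check before choosing a parsing tool. The signatures below come from the respective format specifications; this is a simplified sketch (for example, a fixed-size VHD carries its "conectix" cookie only in the footer, and descriptor-style VMDKs are plain text, so neither would be identified here).

```python
# First-bytes signatures of common virtual disk formats (simplified:
# only offset-0 magic is checked).
MAGIC = {
    b"QFI\xfb": "qcow2",    # QEMU copy-on-write v2/v3
    b"KDMV": "vmdk",        # VMware sparse extent
    b"<<< ": "vdi",         # VirtualBox: "<<< ... Disk Image >>>" banner
    b"conectix": "vhd",     # VHD footer cookie, copied to offset 0 in dynamic disks
}

def identify_virtual_disk(path):
    """Best-effort identification of a virtual disk image by header magic."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, fmt in MAGIC.items():
        if head.startswith(magic):
            return fmt
    return "unknown"
```

A raw .img file has no magic at all, which is one reason format identification in the cloud remains best-effort.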
2.2.2 Does the Cloud Deployment Model Play a Role?
Among the four deployment models defined by NIST [72], two most popular models are
private and public clouds. An appropriate digital forensic architecture designed for these
models can also be used for the remaining models (community and hybrid clouds).
It may be impossible to seize some or all of the servers physically in a cloud data center
because the servers are geographically dispersed (possibly in multiple jurisdictions) or
contain multi-tenant data (seizure would violate the privacy of other tenants). Cloud
forensics mainly differs from traditional digital forensics in the data acquisition phase.
The rest of the phases are similar, except for the data segregation of logs in the cloud
environment, which helps in the analysis. Also, several researchers have pointed out that
the acquisition of data in the cloud is a forefront issue while investigating cloud based
crimes [55, 82, 88].
As pointed out by Ben Martini and Raymond Choo, the approach used by the digital
forensic investigator in acquiring digital evidence will certainly depend on the cloud de-
ployment model used [69]. The following table lists the challenges an investigator may
face during data acquisition in private and public clouds.
Table 2.2: Challenges of data acquisition in private and public clouds
Private Cloud: Law Enforcement (LE), with the help of the CSP, may acquire data
related to the crime, such as the VM's virtual disk file, cloud service logs, firewall logs,
etc., of a particular IP address belonging to the incident, using remote acquisition
methods. Acquisition is comparatively easy because no jurisdiction issues are involved
and there is no loss of control.
Public Cloud: Law Enforcement will have to issue a search warrant to the CSP for
acquiring the data related to the crime of a particular IP address belonging to the
incident. A technician at the CSP acquires the required data using the same methods as
in a private cloud (because the technician has access to the cloud) and submits it to LE.
LE has to trust the technician of the CSP and his capabilities in using sound methods of
forensic data acquisition.
2.2.3 Role of Cloud Delivery Models in the Investigation
Each of the four deployment models defined by NIST can deliver software, platform and in-
frastructure as services to end users. Platform-as-a-Service is built on Infrastructure-as-
a-Service, and Software-as-a-Service is built on Platform-as-a-Service. Hence, the pro-
cedures and frameworks designed for performing digital forensics in the Infrastructure-as-a-
Service model will also help in the other two service models.
The cloud computing architecture comprises layers, and the cloud user has access to
different layers in different service models. The IaaS model provides access to the largest
number of layers, while SaaS provides access to only one (access control). The number of
layers to which a cloud user can have access in the different service models is shown in
Figure 2.1. This implies that the investigator can acquire data related to a VM or user
account in the IaaS model but not in the PaaS and/or SaaS models. Hence, the contribution
of this thesis is restricted to the IaaS model of the private and public cloud deployment
models.
Figure 2.1: CSP and cloud customer’s control over multiple layers in three service models
2.2.4 Issues with Multi-layered Architecture
Presenting a cyber crime case before a court of law raises many questions about trust in
the hardware (e.g., hard drive), software (e.g., operating system), the procedures and tools
used for data acquisition and analysis, the capability of the investigator, etc. Cloud
computing adds a few more areas of concern due to its unique characteristics and layered
architecture. Dykstra and Sherman designed a model of trust in the IaaS cloud environment
in six layers [55]. A summary of their work is depicted in Figure 2.2.
Figure 2.2: Layers of the IaaS cloud environment and cumulative trust required by each layer [55]
They represent the network as layer 1 and application data (guest application) as layer
6. For each layer, the data acquisition method and the level of trust required are different.
For example, at layer 5, using remote acquisition methods, trust is required at several
layers: the guest OS, HV (hypervisor), host OS, hardware and network. The host OS is
the operating system that runs on the cloud server hardware. The guest OS is the operating
system installed in a virtual machine with the help of a hypervisor. The hypervisor is
virtualization software (also called a VMM - Virtual Machine Monitor) which can create and
run virtual machines. There are two types of hypervisors [86]:
• Type 1 Hypervisor (Bare-metal or Native)
• Type 2 Hypervisor (Hosted)
A type 1 hypervisor, as shown in Figure 2.3, runs directly on the hardware, while a
type 2 hypervisor runs on top of an existing OS (Windows 7, Windows 8, Red Hat
Linux, Ubuntu, etc.).
Figure 2.3: Types of hypervisors [86]
Examples of bare-metal hypervisors include VMware ESX/ESXi,
Microsoft Hyper-V, Citrix XenServer, IBM z/VM, etc., and hosted hypervisors include
VMware Workstation, Microsoft Virtual PC, Sun VirtualBox, QEMU, etc.
2.3 Cloud Crime and Forensics - Review
For a forensic investigator or a cloud customer, cloud computing environments lack trust-
worthy capabilities. The cloud investigator or the customer is at the mercy of the cloud
service provider for assistance in cloud crime investigation. In this section, we list some
of the major contributions made to date by researchers in the area of digital forensics in
cloud computing.
A number of surveys have been conducted on mapping the principles and guidelines
available for the traditional digital forensic process to the cloud computing environment.
The Incident Management and Forensics Working Group mapped the forensic standard
ISO/IEC 27037 to cloud computing [24]. This mapping is basically a survey of the issues
related to the forensic investigation of cloud environments. It includes the standards
which can be followed by LEAs (Law Enforcement Agencies) across nations, the
requirements of service level agreements (SLAs) for cloud forensics, etc.
Harjinder Singh Lallie and Lee Pimlott have investigated the impact of cloud com-
puting environments on the ACPO (Association of Chief Police Officers) principles [66].
The ACPO principles are the guidelines for digital forensic investigation followed in
handling computer-based electronic evidence by the law enforcement agencies in the
United Kingdom. In their findings, they warned the digital forensic community about the
usage of these guidelines in the cloud computing environment for various reasons. The
reasons they identified include the problems associated with metadata, lack of control
over the investigation, complexities related to the distribution of the data stores, and the
problems associated with maintaining an audit trail.
To research the digital forensic issues in the cloud environment, the NIST Cloud Com-
puting Forensic Science Working Group (NCC FSWG, 2014) was established to identify
the challenges which cannot be handled with current technology and methods [61].
This group surveyed the existing literature and identified a set of challenges for cloud
crime investigation. The researchers of this group also interacted with the international
digital forensics community to summarize the challenges. The final research report
summarizes 65 challenges which broadly fall into the categories of incident first
responders, architecture, anti-forensics, data collection, analysis, legal issues, role man-
agement, training and standards.
All these research reports emphasize the need for practical methods to investigate
cloud crime.
To our knowledge, the researchers who first actively started working in cloud forensics
were Dykstra and Sherman. In 2012, for the first time, they used existing tools like En-
Case Enterprise, FTK (Forensic Tool Kit), Fastdump, Memoryze, and FTK Imager to
acquire digital evidence from the public cloud over the Internet. They used the Elastic
Compute Cloud (EC2) of the Amazon Web Services (AWS) public cloud as a live test
bed. The aim of their research was to measure the effectiveness and accuracy of
traditional digital forensic tools in an entirely different and new environment like the
cloud. They succeeded in the experiment and highlighted the limits of these tools. Their
experiment showed that trust is required at many layers to acquire forensic evidence from
the cloud environment. Due to the trust issue, they did not recommend traditional forensic
tools (EnCase Enterprise, FTK, etc.) but explored four other solutions for data acquisition:
Trusted Platform Modules (TPM), the management plane, forensics-as-a-service, and
legal solutions. Of these four solutions, they strongly recommended the management
plane.
In continuation of their work, Dykstra and Sherman (in 2013) implemented user-
driven forensic capabilities using the management plane of a private cloud platform called
OpenStack [56]. Their solution is capable of collecting virtual disks, guest firewall logs
and API logs through the management plane of OpenStack [29]. OpenStack users and/or
administrators interact with the cloud platform and manage cloud resources through the
management plane, using a web interface (e.g., Horizon) and an API (e.g., the Nova API).
They call their implementation FROST (Forensic OpenStack Tools), and it is available
through both of these interfaces. Their emphasis was on data collection and the
segregation of log data in data centers using OpenStack as the cloud platform. Hence,
their solution is not independent of the OpenStack platform, and to date it has not been
added to the public distribution (the latest stable version of OpenStack is Kilo, released
on 30th April 2015).
Ben Martini and Kim-Kwang Raymond Choo (in 2012) [69] proposed an integrated
conceptual digital forensic framework to collect and preserve digital evidence for forensic
purposes from the cloud computing environment. Their framework was based on two of
the most widely accepted and used digital forensic frameworks: McKemmish (1999) [71]
and NIST (Kent et al., 2006) [64]. They reviewed these two frameworks to determine the
changes required to conduct digital forensics in the cloud computing environment. For Law
Enforcement (LE) and forensic investigators, they contributed to understanding the technical
challenges and implications of digital forensics in the cloud computing platform. They
raised the following two potential questions to the digital forensic research commu-
nity for designing and developing frameworks that are evidence-based and forensically
sound.
1. What further changes are required to the existing forensic frameworks and prac-
tices for conducting forensically-sound investigations in a cloud computing envi-
ronment?
2. What are the legal and privacy issues surrounding the access to cloud computing
data, particularly cross-border legal and privacy issues; and what reforms are re-
quired to facilitate access to such data for LEAs (Law Enforcement Agencies)?
The main contribution of this thesis is limited to answering the first question above.
A number of solutions have been proposed by researchers to reduce the overall
processing time of digital evidence. Rogers et al. (2006) proposed a live
forensics model called the Cyber Forensic Field Triage Process Model (CFFTPM), which
deals with gathering actionable intelligence at the crime scene [79]. The model, aimed at
time-critical investigations, defines a workflow for on-scene identification, analysis and
interpretation of digital evidence, without the requirement of acquiring a complete foren-
sic copy or taking the system back to the lab for an in-depth examination. Vassil Roussev
et al. (2013) formulated forensic triage as a real-time computational problem with specific
technical requirements and used these requirements to evaluate the suitability of different
forensic methods for triage purposes [80]. Fabio Marturana et al. (2013) proposed a "ma-
chine learning based digital forensic triage methodology for automated categorization of
digital media" [70]. Kyungho Lee et al. (2013) proposed a new triage model conforming
to the needs of selective seizure of electronic evidence by surveying Law Enforcement
officers who are involved in the onsite search and seizure of digital evidence [63]. Also,
there are many digital forensic triage tools which are used to collect crime-related data
quickly and are able to preserve its integrity [1, 21, 38, 39]. Neither the existing tools nor
the recently proposed forensic triage methods use any parallel processing framework to
achieve digital forensic triage.
Adrian Shaw and Alan Browne in their paper [84] have summarized the risks associ-
ated with using triage techniques in digital forensics. Out of the six risks they summa-
rized, the following risks are worth mentioning. We provide counter measures for them
in this thesis.
1. A high risk of evidence being missed through the lack of thoroughness in the pro-
cess: this includes searching through encrypted data, unallocated space, swap space,
etc.
2. The risk of case backlogs: the triage process provides little assistance to the
examiner/investigator in defining the scope of the examination; thus, a full forensic
examination cannot be avoided
3. The risk of missed investigative opportunities: absence of intelligence evidence
gathering and analysis
For cloud crimes involving storage as a service (Amazon Cloud Drive, Microsoft SkyDrive,
Google Drive, Dropbox, etc.), forensic investigation has to be carried out in the
suspected client device as well as the cloud computing environment. Chung et al. [53]
have proposed a forensic model for investigating cloud storage services (Amazon S3,
Google Docs, Evernote, Dropbox) that enables analysis of artifacts present in client
devices such as Android smartphones, iPhones, Windows systems and Mac systems.
Darren Quick and Raymond Choo have analyzed the data remnants of cloud storage ser-
vices (Dropbox, GoogleDrive, and Microsoft SkyDrive) on user machines [74, 75, 77]. In
another paper [76], they have used browser and client software (for example Google drive
client software [20] is available for PC’s, Android device, iPhone and iPad ) to collect and
preserve data (basically files) from the cloud services mentioned above. Through their
experiments, they observed that the integrity of the data is preserved through processes
such as uploading, downloading and storing of files in the cloud store. However, the file
timestamp information was changed. Hence, they cautioned forensic investigators about
the implications of making wrong assumptions regarding timestamps.
Corrado Federici has developed the Cloud Data Imager (CDI) library, a mediation layer
that offers a browsing facility for the files, folders and metadata of a cloud storage
service [57]. He has built a desktop application on top of the CDI library that provides
folder listings with the ability to view present, deleted and shared contents. Using this
desktop application, the investigator can also image a folder tree of a cloud account to the
widely used Expert Witness Format (EWF). We believe that the research done so far on
investigating remote storage services can also be applied to any other cloud storage
service.
Garfinkel in his research paper [59] has summarized the research directions of digital
forensics for the ten years following 2010. He suggests that the digital forensic research
community adopt standardized and modular approaches for data representation and
digital forensic processing. He makes a valid point about the scalability and validation
of the existing tools: digital forensic techniques developed and used today work on
relatively small data sets (n < 100) and fail to scale to real-world sizes (n > 10,000).
Here 'n' refers to the number of JPEG files, the size of a disk in terabytes (TB), the
number of hard drives, mobile phones, etc.
2.4 Summary
The various advantages offered by the cloud computing business model have made it one
of the most significant of the current computing trends, alongside the personal, mobile,
ubiquitous, cluster, grid, and utility computing models. These advantages have also
created complex issues for forensic investigators and practitioners conducting digital
forensic investigations in the cloud computing environment.
In this Chapter, we have defined a few general terms required to understand and perform
digital forensics. Keeping the cloud computing platform in mind, we have listed
the limitations of traditional digital forensic tools in performing forensic analysis in the
cloud. Considering the cloud computing architecture and characteristics, we have identified
the sources of digital evidence for performing digital forensics. We highlighted the
effect and role of the cloud deployment models, the delivery models and the multi-layered
cloud architecture in cloud crime investigation. Finally, we reviewed research from some
of the experts in this emerging area over the last few years. In Chapter 3, we will discuss
one of the major objectives of this work, i.e., identifying the challenges and requirements
of forensics in the virtualized environment, without which cloud computing systems
cannot exist.
Chapter 3
Approaches to Forensics in Presence of
Virtualization in Cloud
“Knowledge will bring you the opportunity to make a difference.”
- Claire Fagin
3.1 Introduction
In Chapter 2, we provided an overview of the background and related research in the area
of cloud forensics. After carefully examining the past research, we identified certain
areas in which to contribute. In this Chapter, we discuss the various ways in which digital
forensics can be carried out in a virtual environment and the challenges therein.
Cloud computing is an Internet based computing paradigm that delivers on-demand
software and hardware computing capability as a “service” where the consumer is com-
pletely abstracted from the computing resources. These services are provided by means
of virtualization.
Virtualization is omnipresent in cloud computing. It is one of the important technologies
for the realization of cloud computing [89]. Using virtualization, one can create as many
virtual machines as needed on a given hardware platform. A virtual machine (VM) is a
software implementation of a computer that executes programs like a physical machine.
Almost every publication in this area today mentions virtualization or cloud computing,
and most companies are adopting virtualization technologies in their current IT
environments. VMware ESX/ESXi/Workstation, Microsoft Hyper-V/Virtual PC, Citrix
XenServer, IBM z/VM, KVM, Sun VirtualBox, QEMU, etc. are a few examples of
virtualization solutions which are used extensively by many companies today.
With more emphasis being placed on going green and power becoming more expensive,
virtualization offers cost benefits by decreasing the number of physical machines required
within an environment. A virtualized environment also reduces support effort by making
testing and maintenance easier. The way of performing digital forensics has changed
due to these virtual environments.
As the use of the virtual machine environment increases, computer attackers are be-
coming increasingly interested in exploring the virtual environment to spread malware,
steal data, or conceal activities. The contributions of this chapter are mainly limited to
the following.
• Identification of challenges and requirements of forensics in the virtualized envi-
ronment
• Devise procedures to counter the major challenges in performing forensic analysis
3.2 Challenges and Requirements of Forensics
Cloud computing is characterized by its highly virtualized environment. Traditional dig-
ital forensics cannot be applied to the cloud environment directly due to its virtualized
nature. After the extensive literature survey we conducted (Chapter 2), we have identified
the following digital forensic challenges in the cloud virtual environment for which some
of the counter measures will be provided in the succeeding sections.
1. Understanding the files created when a virtual machine is launched
2. Multi-level virtualization: a virtual machine can run virtualization to launch any
number of virtual machines on top of it
3. Detecting virtual environment on a physical system or a virtual machine
4. Security challenges due to the vulnerability in the host operating system of the
virtual platform
5. Increase in virtual hard disk space: will increase the time taken to complete the
forensic investigation
6. Digital evidence is no longer confined to the local or single hard drive
7. Analysis of cloud hosting server logs: virtual environment hosting the cloud ser-
vices create altogether different logs for running and maintaining VMs and provid-
ing other services
We will address the first four challenges in the following sections, and the remaining
three in the succeeding chapters.
3.3 Detection of Virtual Environment
There are different ways in which virtual machines can be created and used. First,
virtual machines can be created on the same machine where an OS is already installed.
Second, virtual machines can be created using different cloud deployment models
(private, public, community, and hybrid). Third, virtual machines can be created on any
external storage like a USB flash drive, USB hard drive, or other portable storage devices
like an iPod or mobile phone. In this section we will explore the forensic analysis of
virtual machines for the first case, using VMware Workstation as the virtualization software.
Contributions on analysis of the cloud virtual machines will be discussed in Chapter 5
and 6. Analysis of virtual machines created on external storage is not within the scope of
this thesis.
3.3.1 Important files in virtual machine investigation
The type of virtual machine which can be created on an existing OS with the help of
virtualization software can also be created within another virtual machine, by running
virtualization software as an application inside that virtual machine. This concept can be
termed multi-level virtualization and is shown in Figure 3.1.
Figure 3.1: Multi-level virtualization
The virtual machines running within VM1 with the help of the guest OS and VMM
can be analysed by taking a snapshot or copying the virtual disk file of VM1. Taking a
snapshot or copying the virtual disk file (a .vmdk file in the case of VMware Workstation)
of VM1 is possible through the VMM in the case of a type 1 hypervisor, and through the
host OS in the case of type 2. The snapshot taken or the virtual disk file copied for
forensic analysis is called the "digital evidence". Once the VM's virtual hard disk file is
acquired, it can be mounted as a virtual drive and analyzed using various available
traditional digital forensic tools [9, 11, 12, 17, 37, 45]. More on the analysis of virtual
machines is discussed in Chapters 5 and 6.
To conclude whether a virtual machine may have existed on the digital evidence,
one has to find at least one of the files listed in Table 3.1 on the evidence being
analyzed. These files are specific to the VMware Workstation virtualization [42] solution
and may differ for other solutions like Citrix, Sun VirtualBox, QEMU, Microsoft Virtual
PC, etc.
Table 3.1: Files which make up a virtual machine
File Extension Description
.VMDK / .DSK   Virtual hard drive for the guest operating system; may be either a
               dynamic or a fixed virtual disk.
.VMX / .CFG    Configuration file. Stores settings chosen in the virtual machine
               settings editor.
.LOG           Log of activity for a virtual machine and hypervisor. Stored along
               with the .VMX file.
.VMEM          Backup of the virtual machine's paging file. Available only when
               the VM is running or has crashed.
.VMSN          The VM's snapshot file, which stores its running state.
.VMXF          Supplemental configuration file for virtual machines that are in a
               team. It remains if a virtual machine is removed from the team.
.VMTM          Configuration file containing information about a team of VMs. A
               team is a group of virtual machines which can inter-operate in a
               VMware virtual lab environment.
.VMSS / .SDT   Stores the state of a suspended virtual machine.
.NVRAM         Stores the BIOS information of the virtual machine.
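The presence check described above, flagging evidence that contains at least one of these file types, can be sketched as a simple extension scan (an illustrative sketch only; a real examination would also verify file headers, since extensions like .log and .cfg are generic and can only corroborate, never prove, the presence of a VM):

```python
import os

# File extensions from Table 3.1 that suggest a VMware virtual machine
# may have existed on the evidence being analyzed.
VMWARE_EXTS = {".vmdk", ".dsk", ".vmx", ".cfg", ".log", ".vmem",
               ".vmsn", ".vmxf", ".vmtm", ".vmss", ".sdt", ".nvram"}

def vmware_artifacts(paths):
    """Return the paths whose extensions match a VMware VM file type."""
    return [p for p in paths
            if os.path.splitext(p)[1].lower() in VMWARE_EXTS]
```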
3.3.2 Changes in the host OS when the virtual platform is used
Nowadays, virtual machines have almost replaced physical machines in the day-to-day
activity of IT professionals and organizations. These developments are of interest to
cyber criminals for several reasons, including the exploration of virtual environments for
crimes. In this section we analyze the host OS (which can be a guest OS, as shown in
Figure 3.1, running on VM1) to detect the presence of virtualization software.
In this experiment, we have used Windows 7 as the host operating system and VMware
Workstation as the virtualization software. We have used ZSoft Uninstaller 2.5 [47] for
analysis purposes. ZSoft Uninstaller 2.5 is a freely available program which can uninstall
software and find its remnants after uninstalling. We carried out the following steps to
detect the presence of a virtual environment.
Step 1: Installed ZSoft Uninstaller 2.5 on the Windows 7 system.
Step 2: Ran ZSoft Uninstaller 2.5 to capture a snapshot of the system.
Step 3: Installed VMware Workstation on the Windows 7 system for virtualization.
Step 4: Created a virtual machine using the standard procedure available with VMware
Workstation, and used it for day-to-day activities such as surfing the Internet, sending
mails, etc.
Step 5: Ran ZSoft Uninstaller 2.5 again to capture a second snapshot of the system.
We could observe the file changes in the host OS due to virtualization, as shown
in Figure 3.2. The same experiment can be carried out for other virtual environments
like Citrix, Sun VirtualBox, QEMU, Microsoft Virtual PC, etc. to detect their presence.
Using this approach, a virtualized data center administrator can keep track of multi-
level virtualization and monitor the activities of virtual machine users.
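The before/after comparison that such a snapshot tool performs can be sketched as a set difference over file listings (a simplified illustration, not the tool's actual implementation; the example paths are hypothetical):

```python
def snapshot_diff(before, after):
    """Compare two filesystem snapshots (sets of file paths) and
    report which files were added and removed between them."""
    return {"added": sorted(set(after) - set(before)),
            "removed": sorted(set(before) - set(after))}

# Hypothetical listings: the second snapshot, taken after installing
# VMware Workstation, reveals the files the virtualization software added.
before = {r"C:\Windows\System32\kernel32.dll"}
after = before | {r"C:\Program Files\VMware\vmware.exe",
                  r"C:\Windows\System32\drivers\vmnet.sys"}
```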
Figure 3.2: Changes in host OS files during VMware workstation installation
3.4 Detection of Virtual Machine Hidden Using ADS
Virtualization poses challenges to the implementation of security as well as to cybercrime
investigation in the cloud. Hidden data has always been a major concern of the computer
forensic analysis process. Data hiding in digital media can be performed for various
reasons, including potential malware attacks, hiding data for later use in a compromised
environment by an attacker, or an offender hiding useful information on his personal
computer [68]. There are numerous methods that can be used to hide data from
potential examination. One of them is hiding data in alternate data streams under NTFS
or HFS+ [51]. Another is hiding data in the slack areas of the digital media. Data
hidden in slack areas (file slack, disk slack, etc.) can be easily carved out using
traditional digital forensic tools, especially carving tools [73].
The method of hiding data using ADS has legitimate applications like Services for
Macintosh (interoperability between HFS and NTFS), volume change tracking, storing
summary data information, etc. However, this method of hiding files is vulnerable
to both insider and outsider attacks, whereby attackers may hide files on Windows
systems supporting NTFS. An insider may choose this method to perform unauthorized
or unacceptable deeds on his system. Outsiders may choose this method to hide malicious
files on a remote system and to prevent third parties from finding those files. The
same method can be used to hide virtual machines created in a virtualization environment
and used for malicious purposes.
3.4.1 Role of Alternate Data Streams (ADSs)
Alternate Data Streams (ADS) are a unique feature of the NTFS file system, introduced
with Windows NT 3.1 in the early 1990s to provide compatibility between Windows NT
servers and Macintosh clients which use the Hierarchical File System (HFS) [87].
Understanding the concept of alternate data streams requires knowledge of the structure
of a special metadata file called the Master File Table (MFT). Windows creates twelve
metadata files when an NTFS (New Technology File System) partition is formatted; these
contain information about the volume itself and the data stored in it [51]. The file that
stores all of the records and attributes that Windows needs to access any file or
directory on the volume is called the MFT. The length of each record in the MFT may
vary, with a minimum of 1,024 bytes and a maximum of 4,096 bytes. Each record contains
different attributes of a file, as shown in Figure 3.3.
Figure 3.3: MFT file record with sample attributes
SIA: Standard Information Attribute
FNA: File Name Attribute
DA: Data Attribute
The data attribute contains the cluster information (cluster chain) allocated to a
particular file, depending on whether the file is resident or nonresident. A file is said to
be resident when it is stored in the MFT itself; otherwise it is called nonresident. A
"cluster" is the basic allocation unit of a file in Windows. By default, NTFS supports only
one data attribute per record without a name, called the unnamed attribute, so any
additional data attributes must be named. A directory has no default data attribute but can
have optional named data attributes. These additional named data attributes may contain
alternate data streams, as shown in Figure 3.4.
Figure 3.4: MFT file record with named attributes
H: Header of the attribute
N: Name of the attribute
LCN: Logical Cluster number
The Logical cluster number (LCN) range specifies the sequential clusters allocated to
the data stream. Figure 3.4 shows the MFT record containing three alternate data streams
(DA1, DA2, and DA3) whose names can be obtained from “N: name attribute”. The
original file name (assume file-org.jpg) is specified in the FNA field of the MFT record.
Any file can be attached to the file file-org.jpg using one of these named data attributes
(DA1, DA2, and DA3).
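The record layout just described can be modelled schematically as follows (a simplification for illustration only; the class and field names are our assumptions and do not reflect the actual on-disk NTFS layout):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DataAttribute:
    name: Optional[str]  # None for the default, unnamed stream
    lcn_start: int       # first logical cluster allocated to the stream
    size: int            # stream size in bytes

@dataclass
class MftRecord:
    file_name: str       # FNA field of the record
    data_attrs: List[DataAttribute] = field(default_factory=list)

    def alternate_streams(self) -> List[DataAttribute]:
        """Named data attributes are the alternate data streams."""
        return [da for da in self.data_attrs if da.name is not None]

# The situation of Figure 3.4: file-org.jpg carrying a named stream.
rec = MftRecord("file-org.jpg", [
    DataAttribute(None, 100, 4096),             # unnamed (default) stream
    DataAttribute("hiddenFile.txt", 236, 512),  # alternate data stream
])
```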
Example of Hiding and Accessing a File Using ADS:
To attach a file myFile.txt to the file-org.jpg, a malicious attacker can use the following
command [49].
C:\>type myFile.txt >file-org.jpg:hiddenFile.txt
Here, [type] is the command used to create the ADS,
[>] serves to redirect the file content, and
[:] separates the stream name from the original file name.
The file myFile.txt is now hidden in file-org.jpg under the stream name hiddenFile.txt.
The user can permanently delete myFile.txt from its original location and still
access its content as a stream. To open the stream, one can use the following command:
C:\>notepad file-org.jpg:hiddenFile.txt
Here, notepad is a utility program which can open any text file.
3.4.2 Approach to Hide and Detect a VM Hidden using ADS
This approach is only applicable to machines running a Windows operating system
as the host OS with NTFS as the file system. For experimental purposes, we considered
a compute server of a private cloud, as shown in Figure 3.5, which runs a set of virtual
machines. We installed VMware Workstation 8.0 virtualization software on a Windows 7
virtual machine (VM1) to create virtual machines on it. We created two virtual machines,
one containing Windows 7 (VMa) and the other containing Ubuntu 12.04 (VMb) as their
operating systems. We experimented with the second virtual machine for the purpose of
hiding and reusing it. As described in Table 3.1, any launched virtual machine creates
several files. The file which is important to a malicious insider or outsider is the virtual
disk file (.vmdk in our case). This .vmdk file is similar to any other system file and can be
easily viewed and hidden with the help of the host operating system (the Windows 7 OS
in VM1).
Hiding and Accessing a VM Using ADS:
When we created the second virtual machine, it created a virtual disk file with the
name ubuntu.vmdk in the folder path shown below:
C:\Program Files\VMware\VMwareWorkstation\Ubuntu
We then hid the ubuntu.vmdk file behind a temporary file vmtest.txt in the same path, as
follows.
C:\Program Files\VMware\VMwareWorkstation\Ubuntu>
type ubuntu.vmdk >vmtest.txt:myVMFile.vmdk
Figure 3.5: Hiding of virtual machine in a cloud hosting server
Now the stream is ready to be used with the name "vmtest.txt:myVMFile.vmdk". For the
experiment, we deleted the file ubuntu.vmdk from its original location and tried to
use the virtual machine, which caused an error with the message shown in
Figure 3.6.
Figure 3.6: Launching a hidden virtual machine
After carefully examining the error message, we realized that one of the supporting
files stores the path of the virtual disk file. So, we edited the configuration file
(ubuntu.vmx), which records the path of the ubuntu.vmdk file, as shown in Figure 3.7.
After modifying the configuration file by replacing the path of the old ubuntu.vmdk
file with the new data stream vmtest.txt:myVMFile.vmdk, as shown in Figure 3.8, we could
successfully use the virtual machine. Now the hidden virtual machine containing Ubuntu
12.04 can be used for malicious purposes (like storing executables, performing denial of
service attacks, etc.) or whatever the user desires.
Figure 3.7: Configuration file (.vmx)
Figure 3.8: Modified configuration file (.vmx)
From the digital forensic investigator's point of view, it may be impossible to retrieve
the original file of the hidden virtual machine, because a malicious user may use file
shredder software to overwrite the content of the deleted ubuntu.vmdk file. A file
shredder is software that securely deletes a file, making it practically impossible to
retrieve the file forensically.
Figure 3.9: Hash value of vmtest.txt file before ADS attachment
We have checked the integrity of the file (vmtest.txt) by hashing it before and after the
attachment of the alternate data stream (ubuntu.vmdk in this case).
Figure 3.10: Hash value of vmtest.txt file after ADS attachment
The tool we have used to compute the hash value was Hasher, which is part of the
CyberCheck suite [9] and uses
MD5/SHA/HMAC algorithms. The hash values of the file vmtest.txt before and after the
attachment of ADS (virtual disk file of a VM) are shown in Figure 3.9 and Figure 3.10
respectively. Because the hash values are the same before and after the attachment, one
cannot prove in a digital forensic investigation that the file vmtest.txt was modified in
any sense. Hence, the investigator may have no clue about the presence of an ADS
attached to any file in the system under investigation.
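This check can be reproduced with any standard hashing tool; below is a minimal sketch using Python's hashlib in place of the Hasher tool mentioned above. The point it illustrates is that such tools hash only a file's default (unnamed) stream, so attaching an ADS leaves the digest of the carrier file unchanged:

```python
import hashlib

def forensic_digests(data: bytes) -> dict:
    """Compute MD5 and SHA-1 digests of a byte stream, as a hashing
    tool would for the default (unnamed) stream of a file."""
    return {"md5": hashlib.md5(data).hexdigest(),
            "sha1": hashlib.sha1(data).hexdigest()}

# The carrier file's own bytes do not change when a named stream is
# attached, so the digests computed before and after are identical.
carrier = b"contents of vmtest.txt"
```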
Methodology to detect hidden VMs:
In NTFS, the MFT is the main data structure containing all the information required to
retrieve files. The first record of the MFT gives details about the layout of the MFT, its
total size, and whether a particular record is currently in use or not. The Bitmap attribute
in the first record indicates the status of each MFT record: the attribute contains a
sequence of bits where each bit represents the allocation status of one MFT record. If a
bit is set to 1, the corresponding MFT record is in use, meaning that the record represents
a normal, undeleted file. If the bit is zero, the record is not currently used and may contain
information about a file that has been deleted [92]. Our interest is in detecting the hidden
virtual machines using data streams, not in retrieving the original or deleted files.
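The allocation check just described amounts to testing one bit per record (a minimal sketch; in practice the bitmap bytes would be read from the Bitmap attribute of the MFT's first record):

```python
def record_in_use(bitmap: bytes, record_no: int) -> bool:
    """Test the allocation bit for an MFT record: a bit set to 1 means
    the record is in use (a normal, undeleted file); 0 means the record
    is free and may describe a deleted file."""
    byte_index, bit_index = divmod(record_no, 8)
    return bool((bitmap[byte_index] >> bit_index) & 1)
```

For example, with the bitmap byte 0x05 (binary 00000101), records 0 and 2 are in use while record 1 is free.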
To detect a hidden virtual machine within an NTFS file system, we have to scan
every record in the MFT and check for the presence of named data attributes in each
record. If present, the following filters are applied to check the metadata of the file.
Together, these three filters guarantee that a hidden file is in fact a virtual machine file.
1. Check for the file extension (.vmdk, .vhd, .vdi, .qcow2, etc.)
2. Check for the file size limit (>1GB)
3. Check for the header signature (Table 3.2)
In the flow chart shown in Figure 3.11, DACount = 1 means that the file does not
contain any named streams. If DACount > 1, the file may contain one or more named
streams (alternate data streams). The algorithm explained in the chart proceeds by first
getting the number of data attributes of an MFT record and then iterating through each
data attribute to check whether it contains a VM's virtual disk file (.vmdk, .vhd, .vdi,
.qcow2, etc.). The process continues for the MFT records of all the files within the
NTFS partition.
The first and second filters are necessary conditions for checking whether a given file is
a virtual machine file, but they are not sufficient. In the case of the first filter, a user can
easily change the file extension (causing a signature mismatch), so one cannot judge
based on the extension alone. The file extension can be read from the stream name
attribute (DA1: N of Figure 3.4). The second filter likewise does not guarantee that a
given file is a virtual machine file and not, for instance, a video file. The first and second
filters thus serve as quick pre-checks before the third filter. As every file format has a
unique header, it is sufficient to match the header of a given alternate data stream against
the headers of the virtual hard disk formats. Table 3.2 shows the header signatures of
different virtual disk files.
Table 3.2: Virtual disk file signatures
File extension   Header signature
.vmdk            4B444D56 (KDMV)
.vhd             636F6E6563746978 (conectix)
.qcow2           514649FB (QFI.)
.vdi             5644492E (VDI.)
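Taken together, the three filters can be sketched as follows (an illustrative check applied to one named data stream; the signatures are those of Table 3.2, and in practice the stream name, size and header bytes would be read from the MFT record and the stream's first cluster):

```python
# Header signatures of virtual disk formats, keyed by extension (Table 3.2).
VDISK_SIGNATURES = {
    ".vmdk":  b"KDMV",      # 4B 44 4D 56
    ".vhd":   b"conectix",  # 63 6F 6E 65 63 74 69 78
    ".qcow2": b"QFI\xfb",   # 51 46 49 FB
    ".vdi":   b"VDI.",      # 56 44 49 2E
}
MIN_VM_SIZE = 1 << 30  # filter 2: virtual disk files exceed 1 GB

def looks_like_hidden_vm(stream_name: str, stream_size: int,
                         header: bytes) -> bool:
    """Apply the three filters to a named data stream (ADS)."""
    ext = "." + stream_name.rsplit(".", 1)[-1].lower()
    if ext not in VDISK_SIGNATURES:   # filter 1: known virtual disk extension
        return False
    if stream_size <= MIN_VM_SIZE:    # filter 2: size limit
        return False
    # filter 3: header signature match (the decisive check)
    return header.startswith(VDISK_SIGNATURES[ext])
```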
To get the header of a given alternate data stream, one has to read the first few bytes
of the first cluster allocated to the stream (see cluster number 236 of Figure 3.4). The
stream file size can be obtained from the stream header attribute (DA1: H of Figure 3.4).
The detection algorithm we suggest can be used by the cloud service provider
to monitor the activities of the virtual machines from the host operating system. The
cloud service provider can pre-configure the virtual machine instances with the detection
algorithm proposed, before using them in the cloud environment.
Figure 3.11: Detection of hidden virtual machine
3.5 Summary
In this Chapter, we have presented the ways in which forensic analysis can be done in
the virtual environment that is omnipresent in the cloud. We identified the challenges and
requirements of performing digital forensics in the virtualized cloud environment. Taking
VMware Workstation as a specific virtualization solution, we devised a procedure to
detect and analyze the cloud virtual environment. This procedure is also applicable to
other platforms like Microsoft Virtual PC, Sun VirtualBox, QEMU, etc. The procedure
for detecting and analyzing the virtual environment in the cloud was published in [Pub1].
From the analysis perspective, we described the way in which virtual machines in the
cloud can be hidden using the Alternate Data Streams (ADS) technique of Windows. We
also presented an algorithm, based on three filters, to detect such virtual machines; on
implementation, this algorithm guarantees the detection of hidden virtual machines. The
algorithm and related work presented here were published in [Pub2]. In the next Chapter,
we will describe the proposed digital forensic framework for cloud computing systems
from the perspective of the cloud investigator and/or the cloud architecture.
Chapter 4
Designing a Digital Forensic
Framework for Cloud Computing
Systems
“Make things as simple as possible... but not simpler.”
- Albert Einstein
4.1 Introduction
In the previous Chapter, we identified the challenges and requirements of performing
digital forensics in the virtual environment that is omnipresent in the cloud. In particular,
we discussed the detection of a virtual environment and an algorithm to detect cloud
virtual machines hidden using ADS. In this Chapter, we discuss the proposed design of a
"digital forensic framework" for cloud computing systems from the view point of the
investigator and/or the cloud architecture.
Cloud as a business model presents a range of new challenges to the digital forensic
investigators due to its unique characteristics. It is necessary that the forensic investigators
and/or researchers adapt the existing traditional digital forensic practices and develop new
forensic frameworks which would enable investigators to perform digital forensics in the
cloud computing environment.
Ben Martini and Kim-Kwang Raymond Choo [69] have proposed an iterative, integrated
digital forensic framework for forensic data collection and preservation from cloud
services. They have also advised the forensic research community to identify the changes
that need to be incorporated into the existing forensic practices and frameworks. In the
following sections, we discuss the forensic phases required for the cloud and compare
them with those of Ben Martini and Kim-Kwang Raymond Choo, NIST and McKemmish.
Considering all the phases, we then design a digital forensic framework from the view
point of the digital forensic investigator and/or the digital forensic tool developer.
4.2 Cloud Forensic Process and Phases
The phases involved in investigating a cyber crime do not change with the investigative
environment, be it a desktop, laptop, mobile, network, server with a virtual environment,
or cloud. All the phases remain the same irrespective of the environment; only the way
in which they are applied differs. Figure 4.1 shows the various phases involved in cyber
crime investigation, as suggested in the digital forensic literature [52, 64, 69, 71].
Figure 4.1: Phases of cyber crime investigation
Hashing is also part of authentication and preservation. Within the jurisdiction of a
country, two kinds of labs can exist for performing forensic acquisition and analysis.
One is maintained by the cyber crime department (called the cyber crime lab), and
another by dedicated forensic laboratories of the government or a private body (called
the forensic lab). Identification of evidence, and seizure with hashing, is carried out by
the department of cyber crime. The experts of both labs are capable of acquisition,
authentication, analysis and preservation. If the analysis of the crime under investigation
is performed in the forensic lab, the presentation of the evidence to the court of law is
done by the cyber crime department in the presence of an expert of the forensic lab
as a witness for the evidence. Otherwise, officials of the cyber crime department can
directly submit the evidence. The findings of the investigation have to be repeatable and
reproducible at any time before the court of law; hence the preservation phase.
4.2.1 Comparison of Digital Forensic Frameworks
The digital forensic frameworks suggested by NIST [64] and McKemmish [71] are very
similar. In NIST's framework, identification and preservation are part of the collection
phase, while in McKemmish's, examination is part of the analysis phase. Both
frameworks were aimed at traditional digital forensic investigation. Ben Martini and
Kim-Kwang Raymond Choo's forensic framework [69] for cloud computing was based
on these two frameworks. They called it an iterative framework due to the backward
continuity from phase 4 (Examination and analysis) to phase 1 (Evidence source
identification and preservation). This is needed because the identification and
preservation of evidence in the cloud has to be done once the use of cloud services is
reported during the examination and analysis phases of the client device.
We propose our framework based on the above three frameworks, incorporating an
additional phase, i.e., examination and partial analysis (phase 3). Through this
framework, we contribute in the following areas, for which proof of concept will be
provided in Chapters 5 and 6.
• Segregation of log data
• Selective data acquisition
• Partial analysis of evidence: this includes analysis of the evidence within a VM
(memory, registry, file system metadata, etc.) to speed up the final analysis.
• Digital forensic triage using parallel processing

Table 4.1: Comparison of digital forensic frameworks

Phase  Proposed cloud          Integrated forensic    NIST            McKemmish
No.    forensic framework      framework [69]         framework [64]  framework [71]
1      Evidence source         Evidence source        Collection      Identification
       Identification,         Identification and
       Segregation, and        Preservation
       Preservation
2      Collection (from        Collection             Examination     Preservation
       client device as
       well as cloud)
3      Examination and         -                      -               -
       Partial analysis
4      Analysis                Examination and        Analysis        Analysis
                               Analysis
5      Reporting               Reporting and          Reporting       Presentation
                               Presentation
Before we discuss the "Digital Forensic Framework for Cloud Computing Systems", we
elaborate in the following sections on the activities, in each phase, of the LEA (Law
Enforcement Agency) or the investigator who will be responsible for performing the
investigation of the cloud crime.
4.2.2 Identification of Digital Evidence
As an entry point, this phase describes ways of identifying the sources of evidence
for a digital forensics investigation in the cloud environment. The source of evidence
could be a client device or the cloud service provider’s data center. A client device can
be a desktop computer, laptop, mobile device, or any other device through which one can
access cloud services. After a cloud crime is reported, the client device can be identified
using network forensic techniques (for example, analysis of a company’s firewall logs
to determine which host on the company’s own network connected to a cloud service).
The identification phase may also be revisited during the analysis phase to establish how the
identified device was connected to the cloud environment. Any digital device may connect
to the cloud service using a web browser or a client provided by the cloud service provider.
Whether the source is a cloud provider’s data center or a client device, identifying the presence
of evidence, its type, format and location is very important.
4.2.3 Collection and Preservation of Digital Evidence
The emphasis of the cloud investigator for this phase will be on how the data is collected
and preserved for further analysis. Irrespective of the device (sources of evidence) iden-
tified, the forensic investigators need to ensure the proper collection and preservation of
the digital evidence. The Scientific Working Group on Digital Evidence (SWGDE, 2006)
alerts the forensic investigator that the evidence submitted for the analysis should be main-
tained in such a way that the integrity of the data is not lost. Hashing is the commonly
accepted method to achieve this. There are well known data preservation techniques
available like MD4 (Message Digest), MD5, SHA-1 (Secure Hash Algorithm), SHA-2
and SHA-3. The data collection method will depend on the type of the cloud platform
and the deployment models used. Also, the investigator needs to collect the data from the
cloud client device and the cloud service provider’s data center.
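As a minimal sketch of this preservation step (the function name is ours), the digest of an acquired evidence image can be computed in fixed-size chunks, so that images larger than available memory can still be hashed:

```python
import hashlib

def hash_evidence(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute the digest of an acquired evidence image in chunks,
    so that arbitrarily large images fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recording the digest at collection time and re-computing it before analysis demonstrates that the evidence was not altered in between.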
Client Side Data Collection and Preservation:
Once the client device is identified, its physical memory data should be collected before
powering off the device. Numerous tools are available for memory acquisition
(FTK Imager, OSForensics, dd - data duplication, LiME, etc.). The data from a powered-off
device can be collected using software tools (FTK Imager, EnCase Forensic Imager,
TrueBack, etc.) or hardware tools (Tableau forensic duplicator, HardCopy 3P, etc.). Many
of these tools are capable of performing forensically sound data acquisition, i.e., preservation.
Client Side Data Analysis:
Analysis at the client side proceeds much as in traditional digital forensics, but with a
focus on the usage of cloud services from the client device. Locard’s Exchange Principle
says that “Every contact leaves a trace”: there is every chance of remnants in the client
device if the criminal is unaware of anti-forensic techniques such as Darik’s Boot and
Nuke (DBAN) [10]. The investigator may need to use traditional digital forensic tools
[9, 11, 12, 17, 25, 26, 37, 43, 45] to identify traces of cloud services, analysing cookies,
logs, database files, the registry, prefetch files, browser history, the pagefile, link files,
physical memory, network traffic (incoming and outgoing packets from the client machine),
etc., to gather evidence that proves the usage of cloud services. Darren Quick et al.
identified the types of terrestrial artifacts that are likely to remain on a client’s machine
when a cloud service is launched from it [74]. The procedure for analysing these artifacts
is not unique, owing to the variety of operating systems (Windows, Ubuntu, Mac OS,
Android, etc.) on client devices.
Cloud Side Data Collection and Preservation:
In the case of the private cloud deployment model, the investigator can use remote acquisition
methods to get the virtual disk data and the physical memory data pertaining to a
particular VM (for example, the dd - data duplication utility of Unix can acquire the virtual
disk image as well as the physical memory image). Unfortunately, establishing the provenance
of a cloud crime depends not only on the analysis of the virtual disk and the memory of the
VM used by the criminal, but also on the logs generated by the virtual machine during its
operation. Such logs are categorized as API logs (which record the start, end and lifetime
activity of a VM) and host logs (also called firewall logs, which record the network
activity of a VM).
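As a rough sketch (not part of the framework itself), and assuming the investigator has SSH access to the compute node hosting the VM and knows the path of its virtual disk — both assumptions, with the host name and remote path below being placeholders — dd output can be streamed over the network and hashed in transit:

```python
import hashlib
import subprocess

def stream_and_hash(stream, local_image, chunk_size=1 << 20):
    """Copy a byte stream to a local image file while hashing it,
    so the acquired copy can be verified later."""
    digest = hashlib.sha256()
    with open(local_image, "wb") as out:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
            out.write(chunk)
    return digest.hexdigest()

def remote_dd_acquire(host, remote_path, local_image):
    """Run dd on the remote compute node over ssh and capture its
    output locally; returns the SHA-256 of the acquired image.
    `host` and `remote_path` are placeholders for this sketch."""
    proc = subprocess.Popen(
        ["ssh", host, "dd", "if=" + remote_path, "bs=1M"],
        stdout=subprocess.PIPE)
    try:
        return stream_and_hash(proc.stdout, local_image)
    finally:
        proc.wait()
```

Hashing the stream as it arrives means the integrity value is fixed at acquisition time, before the image is ever written to the investigator's disk.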
A private cloud data center (or any cloud data center, for that matter) runs as many
VMs as its computational capacity and the demand allow. Data generated by all the VMs
and the cloud services utilized by the cloud platform is stored in different log files, which
cannot be handed over to the investigator because of the privacy of other tenants in the
cloud. Hence, irrespective of the cloud deployment model, there is a requirement for
segregating the cloud log data and collecting a particular tenant’s data using remote
services.
In the case of the public deployment model, data collection may not be as simple
as in the private deployment, because the data is geographically dispersed. If remnants
of cloud usage are found on the client machine, the investigator has to establish for what
purpose the cloud service was used. If it was used for storage (Google Drive,
Dropbox, Windows SkyDrive, etc.), the investigator can obtain the user credentials from
the client machine and retrieve the data stored in the cloud [74]. Otherwise, it might have
been used to own a VM in the cloud (we are not considering the SaaS and PaaS
models in this work). In this case, the investigator has the option of either downloading
the virtual disk file or requesting the cloud service provider to ship the virtual disk
data [55].
At the scene of crime, after completion of the collection phase, the proposed examination
and partial analysis phase will commence. In this phase, the investigator can
extract actionable evidence from the collected data with the help of inputs from the LEAs
about the nature of the crime. We use forensic triage, discussed in Chapters 5 and 6, to
gather actionable evidence. The actionable evidence is then provided as input to the
analysis phase to speed up the investigation process.
4.2.4 Analysis of the Digital Evidence
This phase emphasizes the examination of the evidence after the source of evidence has been
identified and the data collected and preserved from the source (cloud computing platform).
Cloud Side Data Analysis:
The extensive study we conducted of the existing work on cloud forensics suggests that
no cloud computing architecture provides an in-built forensic facility for data
analysis. Once the data is collected from the cloud environment, the method of analysis
depends on the type of data collected. In the case of virtual disk data, the traditional
digital forensic analysis procedure can be followed. For cloud log analysis, data segregation
at the cloud data center has to be done first, so that the evidence of interest can be collected
and analyzed depending on the nature of the cloud crime. Data segregation and analysis
are discussed further in Chapters 5 and 6.
4.2.5 Reporting of Digital Evidence
This phase provides a way to document the evidence found during the analysis and present
it before the court-of-law so that the cyber criminal can be punished according to the
applicable national policies. No major change is required in the reporting phase
other than following the forensic-aware Daubert principles [8]. Figure 4.2 shows the Daubert
principles used to test the admissibility of digital evidence in the court-of-law. In general,
there are no nationwide rules for satisfying all these principles, but any evidence so tested
should retain its chain of custody throughout the investigation.
Figure 4.2: Daubert principles for digital forensic [8]
As pointed out by Martini [69], before presenting cloud evidence to the court,
there is a requirement for a clear distinction between the data owned by the suspect and the
data generated by the cloud service provider. This is the major difference between
cloud and traditional digital forensic evidence presentation. In the cloud, a number of
parties, such as LEAs and the cloud service provider, may be involved in collecting the
digital evidence. With so many parties involved, it is very important to maintain the chain
of custody record, as shown in Figure 4.3.
Chain of custody is a record that chronologically documents all the stages of a cybercrime
investigation, showing the seizure, acquisition, custody, transmission, examination,
analysis and disposition of the evidence [66].
Figure 4.3: Content of the chain of custody record
Figure 4.3 shows a possible template of the chain of custody record. The record
presented here is our own observation derived from traditional forensics; there is
no standard format for this record that digital forensic labs across nations could share.
The investigators may then focus on the technical aspects of the forensic investigation
and the presentation of the evidence to the court, assuming the latter is already
aware of the cloud computing deployment and service models.
The countermeasures and solutions for the challenges identified in the cloud forensic
phases above will be discussed in Chapters 5 and 6.
4.3 Heuristic Approach for Performing Digital Forensics
in Cloud
The control flow diagram of the proposed “Heuristic approach for performing digital
forensics in cloud computing environment”, shown in Figure 4.4, is fundamentally a
way forward to answering the first question raised by Ben Martini and Kim-Kwang Raymond
Choo [69] to the digital forensic community. The heuristic approach we propose is based
on the earlier digital forensic frameworks of McKemmish [71], NIST [64] and Martini
et al. [69].
The flow of control in the designed approach is self-explanatory and enables
the forensic investigators to perform an investigation in the cloud environment. The
approach can serve as a forensic process for cloud computing platforms even for
investigators who lack detailed knowledge of how the cloud services are built and run.
It differs from the traditional digital forensic process in certain aspects. In particular,
client side data analysis starts before cloud side data collection and preservation. The
client device in the identification phase refers to any traditional computing system, such
as a desktop, laptop or mobile device. Data acquisition and analysis of the client device
reveal the usage of cloud services from that device. Depending on the usage of the cloud
services, either the virtual disk data has to be collected, or the set of files stored in the
cloud storage has to be copied along with the logs of the cloud services.
The digital forensic triage phase will start after the collection of the evidentiary data
from the private or public cloud data center. The methods used for forensic triage for
partial analysis will be discussed in Chapters 5 and 6. The results of the forensic triage
will be used in the further analysis of the digital evidence using traditional digital forensic
methods. The forensic triage will help in the examination and partial analysis of the
collected data to minimize the total processing time of the investigation.
Figure 4.4: Control flow diagram for digital forensic investigation in cloud
4.4 Digital Forensic Architecture for Cloud
In the previous section we proposed a heuristic approach for performing digital
forensics in the cloud computing environment (IaaS delivery model of the private and
public cloud). This approach does not include the internal details of the cloud architecture,
whether private or public. In this section we provide a digital forensic architecture
for cloud computing platforms, based on the NIST cloud computing reference
architecture [67] and cloud computing solutions such as Eucalyptus [13], OpenNebula [28],
OpenStack [29], etc. This architecture would be useful to the digital forensic community
for designing and developing new forensic tools in the area of cloud forensics.
Figure 4.5: Digital forensic architecture for cloud
4.4.1 Cloud Infrastructure Setup
A cloud infrastructure consists of the bare metal hardware required to deploy a cloud
computing environment, either private or public. As shown in Figure 4.5, the hardware
may comprise a few high-end servers for compute and storage, network switches/routers,
and cables for networking. Any cloud operating system [13, 28, 29, 40] can be installed
on this hardware to set up a cloud platform. For experimental purposes, we set up
a cloud test bed using the OpenStack cloud OS with the hardware configuration shown in
Table 4.2.
Table 4.2: Hardware configuration details of the private cloud (IaaS)

Hardware Equipment | Qty. | Purpose
HP ProLiant 1U Rack Server, Intel Xeon E3-1220v3 (3.1GHz/4-core/8MB/80W, HT), HP 1TB Non-hot plug LFF SATA, 32GB (4*8GB) RAM, 4 NICs | 2 | 1. Controller node, 2. Compute node
Rack 37U | 1 | Housing the servers
HP 5120 20 RJ-45 autosensing 10/100/1000 ports | 1 | External connection
HP 1910 24G switch | 1 | Interconnection among servers
3KVA UPS | 1 | Power backup
UTP cable and IO box, patch cards | 1 | Networking
4.4.2 Cloud Deployment (Cloud OS)
A cloud deployment platform mainly consists of the services that manage and provide
access to the resources in the cloud environment. Owing to the layered architecture, user
access to cloud resources is restricted based on the delivery model (IaaS, PaaS, or SaaS).
In the IaaS model the user has access to all layers other than the hardware and virtualization
layers, whereas access is further restricted in the PaaS and SaaS models, as depicted in the
architecture. From the cloud crime investigation perspective, the services that manage logs,
instances, images, storage and network will be of interest to cloud forensic tool designers
and developers.
Figure 4.6: Conceptual architecture of the private cloud IaaS
For experimental purposes, we set up an IaaS (Infrastructure as a Service) cloud test
bed using OpenStack as the cloud operating system. Using the hardware configuration
provided in Table 4.2 and the two-node architecture concept of OpenStack [31], we
deployed the private cloud computing environment. The conceptual architecture diagram
of the private cloud IaaS, with one controller node and one compute node, is shown in
Figure 4.6. The version of OpenStack used for this purpose was Icehouse.
The controller node runs the OpenStack services required to launch and run virtual
machines. The compute node runs all the virtual machines along with the hypervisor; any
number of compute nodes can be added to this test bed, depending on how many virtual
machines are required. The conceptual architecture uses two network switches: one
for internal communication between the servers and among the virtual machines, and
another for external communication. The basic services required for an OpenStack
private cloud and their uses are listed in Table 4.3.
Table 4.3: Basic services of OpenStack cloud OS [31]

Service Name | Component | Use
Dashboard | Horizon | Web-based portal to interact with other OpenStack services (i.e., launching an instance, configuring access controls, attaching volumes to a VM, maintenance, etc.)
Compute | Nova | Provides virtual servers (virtual machines) on demand, allowing users to create, destroy, and manage virtual machines using user-supplied images.
Image | Glance | Provides a catalog and repository for virtual disk images, which are used by Compute (Nova) during instance (virtual machine) provisioning.
Identity | Keystone | Provides authentication and authorization for all the OpenStack services.
Block Storage | Cinder | Provides persistent block storage to running instances; also used to create and manage block storage devices.
4.4.3 Cloud Investigation and Auditing Tools
The cloud provider may have external auditing services for auditing security, privacy,
and performance. Our goal is to provide forensic investigative services for
data collection, hybrid data acquisition, and partial analysis of the evidence. As shown in
Figure 4.5, the cloud admin (CSP) can make use of the “Forensic Investigative Services”
directly, whereas the cloud user and/or the investigator will have to depend on the cloud
admin. The suggested digital forensic architecture for cloud computing systems is generic
and can be used with any cloud deployment model. The methods of data collection, hybrid
data acquisition, and partial analysis of the evidence will be discussed in Chapters 5 and
6.
4.5 Summary
The increasing use of cloud services will be accompanied by increasing criminal
exploitation of cloud platforms to commit cyber crimes. The exploitation of cloud services
for criminal activity presents many challenges for law enforcement agencies, such as data
segregation, collection, multi-jurisdiction, multi-tenancy, chain of custody, etc. In this
chapter, we started by comparing three major digital forensic frameworks and proposed
a forensic framework for cloud computing systems. After identifying the required
phases in the framework, we designed a heuristic approach for performing digital forensics
in the cloud computing environment. Forensic investigators can use this approach as
a forensic process for investigating cloud computing platforms even without knowing
many internal details of the cloud environment. The proposed approach is an accepted
and published work [Pub3]. We also designed a digital forensic architecture for
cloud computing systems, which may be useful to the digital forensic research community
for the design and development of new forensic tools in the area of cloud forensics.
The digital forensic architecture we proposed for the cloud environment is under review
for publication as [Pub5]. In the next chapter, we discuss the methods of cloud data
acquisition and analysis.
Chapter 5
Digital Forensic Methods for Cloud
Data Acquisition and Analysis
“Attribution is an enduring problem when it comes to forensic investigations. Computer
attacks can be launched from anywhere in the world and routed through multiple hijacked
machines or proxy servers to hide evidence of their source. Unless a hacker is sloppy
about hiding his tracks, it’s often not possible to unmask the perpetrator through digital
evidence alone.”
- Kim Zetter
5.1 Introduction
In Chapter 4, we proposed a “Heuristic approach for performing digital forensics in the
cloud computing environment”, framing its phases along the lines of the existing forensic
frameworks. We also designed a “Digital forensic architecture for the cloud computing
systems” to support the design and development of new forensic tools for the analysis of
cloud crimes. In this chapter, we introduce the digital forensic methods for acquiring and
analyzing cloud data. All the acquisition and analysis methods we suggest will work for
any type of private cloud computing environment, such as Eucalyptus [13], OpenNebula [28],
OpenStack [29], etc. In the case of the public cloud, the analysis methods remain
the same, but not the acquisition: the investigator will have to depend on the cloud service
provider for data acquisition.
There is no existing digital forensic solution (or toolkit) that can be used on cloud
platforms to collect the cloud data, segregate the multi-tenant data, and perform
partial analysis on the collected data so as to minimize the overall processing time of the
cloud crime evidence. Inspired by the work of Dykstra and Sherman [56], we propose
modules for the implementation of data collection and segregation, and modules for partial
analysis of evidence both within (virtual hard disk, physical memory of a VM) and outside
(cloud logs) the cloud environment.
The approach we suggest for data segregation (of cloud logs) provides a software
client that supports the collection of the cloud evidentiary data (forensic artifacts) without
disrupting other tenants. To minimize the processing time of the digital evidence, we
propose solutions for an initial forensic examination of the virtual machine’s data (virtual
hard disk, virtual physical memory) in the places where digital evidence artifacts are
most likely to be present. Since the case under investigation is understood better, considerable
time is saved that can be utilized for further analysis; hence, the investigation process may
take less time than it otherwise would.
For proof-of-concept and experimentation, we use the “Conceptual architecture of the
private cloud IaaS” test bed, which was set up using the OpenStack cloud solution
(Icehouse version, Figure 4.6).
5.2 Digital Evidence Source Identification, Data Segregation and Acquisition
In this section, we discuss the identification of evidence sources, the segregation of the
cloud logs after identification, and the acquisition of the identified virtual machine’s data
along with the segregated log data.
5.2.1 Identification of the Evidence
On traditional desktop systems, any information, including data related to system
activity, is stored as files. Depending on the nature of the computer crime, files are
retrieved from storage and parsed to investigate the cause of the crime. Similar
to a desktop machine, a cloud user can create and run virtual machines in the cloud
environment. Such a virtual machine is as good as a physical machine, and its activity and
management create a lot of data in the cloud. The data created by a virtual machine includes
the virtual hard disk (a file with the extension .qcow2 in the case of the OpenStack cloud), the
physical memory of the VM, and the logs. Virtual hard disk formats that different cloud
providers may support include .qcow2, .vhd, .vdi, .vmdk, .img, etc. The virtual hard disk
file is available on the compute node where the corresponding virtual machine runs.
Every cloud provider may have its own mechanism for service logs (activity maintenance
information), and hence there is no interoperability of log formats among the cloud
providers. In OpenStack, the cloud logs are spread across the controller and compute
nodes.
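As an illustration of where this evidence lives, the following sketch walks a compute node's instance store looking for candidate virtual disk files. The default path /var/lib/nova/instances holds for a stock OpenStack install, but it is an assumption the operator may have changed:

```python
import os

# Default nova instance store (an assumption; configurable by the operator).
INSTANCE_ROOT = "/var/lib/nova/instances"
DISK_EXTENSIONS = (".qcow2", ".vhd", ".vdi", ".vmdk", ".img", ".raw")

def find_virtual_disks(root=INSTANCE_ROOT):
    """List candidate virtual hard disk files, grouped by the
    instance directory (named after the Instance ID) they live in."""
    disks = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(DISK_EXTENSIONS):
                instance = os.path.basename(dirpath)
                disks.setdefault(instance, []).append(
                    os.path.join(dirpath, name))
    return disks
```

Grouping by directory lets the investigator map each disk file back to the Instance ID it belongs to.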
5.2.2 Segregation of the Evidence
A cloud computing platform is a multi-tenant environment in which end users share the
cloud resources, and log files store the activities of the cloud computing services. These
log files cannot be provided to the investigator and/or cloud user for forensic activity, owing
to the privacy of other users in the same environment. Dykstra and Sherman [56]
suggested a tree-based data structure called a “hash tree” to store API logs and firewall
logs. Since we have not modified any of the OpenStack service modules, we implemented
a different logging approach known as the “shared table” database. In this
approach, a script runs on the host servers where the different services of OpenStack are
installed (for example, the nova service). This script mines the data from all the log files
and creates a database table. This table contains the data of multiple tenants, and the
key to uniquely identify a record is the “Instance ID”, which is unique to a virtual machine.
The cloud user and/or the investigator, with the help of the cloud administrator, can then
query the database for any specific information from a remote system, as explained in the
next section.
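A simplified sketch of such a segregation script is shown below. It uses SQLite in place of MySQL so the sketch is self-contained, and the log-line layout assumed by the regular expression is modelled on the "[instance: <uuid>]" tag in OpenStack Icehouse nova logs; it would need adapting to each service's actual format:

```python
import re
import sqlite3

# Assumed line layout: "<date> <time> <pid> LEVEL service ... [instance: <uuid>] msg"
INSTANCE_RE = re.compile(
    r"^(?P<ts>\S+ \S+)\s+\d*\s*(?P<level>[A-Z]+)\s+(?P<service>\S+).*?"
    r"\[instance: (?P<instance_id>[0-9a-f-]{36})\]\s*(?P<message>.*)$")

def build_shared_table(log_lines, db_path=":memory:"):
    """Mine service log lines into a single 'shared table' keyed by
    Instance ID, so one tenant's records can later be filtered out
    without exposing the raw multi-tenant log files."""
    db = sqlite3.connect(db_path)
    db.execute("""CREATE TABLE IF NOT EXISTS servicelogs
                  (ts TEXT, level TEXT, service TEXT,
                   instance_id TEXT, message TEXT)""")
    for line in log_lines:
        m = INSTANCE_RE.match(line)
        if m:  # lines without an instance tag are not tenant-specific
            db.execute("INSERT INTO servicelogs VALUES (?,?,?,?,?)",
                       (m["ts"], m["level"], m["service"],
                        m["instance_id"], m["message"]))
    db.commit()
    return db
```

Because every stored row carries the Instance ID, a single tenant's records can be selected without ever handing over the underlying log files.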
Table 5.1 shows the paths of the cloud service logs on the controller and compute
servers.
Table 5.1: Details of the OpenStack cloud service logs [30]

Service Name | Hosting Server | Location | Description
Dashboard (Horizon) | Controller node | /var/log/apache2 or /var/log/httpd | Contains access logs (all attempts to access the web server) and error logs (all unsuccessful attempts to access the web server, along with the reason for failure).
Compute Management (Nova logs) | Controller node, Compute node | /var/log/nova/ | For virtual machine management, OpenStack runs many services such as API, scheduler, network, token authentication, etc. on the controller as well as the compute node. The logs of these services are created in this directory.
Block Storage (Cinder logs) | Controller node | /var/log/cinder/ | The log file of each block storage service is stored as api.log, cinder-manage.log, scheduler.log and volume.log.
Virtualization (KVM) | Compute node | /var/log/libvirt/ | Logs the activities of all the virtual machines, including the services of the hypervisor.
5.2.3 Acquisition of the Evidence
We designed a generic architecture for cloud forensics and tested the forensic methods
we implemented in the private cloud deployment using OpenStack. The tools designed and
developed for data collection and partial analysis run on the investigator’s workstation,
whereas the data segregation tool runs on the cloud hosting servers where the log files
are stored. A generic view of the investigator’s interaction with the private cloud platform
is shown in Figure 5.1.
Figure 5.1: Remote data acquisition in the private cloud data center
Virtual Disk Data Acquisition:
To acquire a forensic image of the virtual hard disk of a VM from a remote system,
the investigator can make use of the concepts suggested by Dykstra and Sherman [55]
or use traditional file transfer applications like WinSCP [44], PuTTY [33], etc. To
preserve the integrity of the collected data, its hash value must be computed irrespective
of the data collection method used. The creation of a virtual machine in the OpenStack
cloud creates a directory named with the “Instance ID” on the compute node, as shown in
Figure 5.2. This directory contains the virtual hard disk file (disk.qcow2) of the virtual
machine, as shown in Figure 5.3, which has to be acquired for analysis. The ‘examination
and partial analysis’ methods will be used by the investigator at the scene of crime to
extract the forensic artifacts from this file.
Figure 5.2: Directory of virtual machine instances in the OpenStack cloud
Figure 5.3: Virtual hard disk location in the OpenStack cloud
Virtual Machine’s Memory Data Acquisition:
Acquisition of the physical memory (RAM) data is only possible while the virtual
machine is running. To acquire the VM’s physical memory data, the investigator can
use traditional digital forensic tools such as FTK Imager [18], LiME (Linux Memory
Extractor) [25], Memoryze [26], etc. The acquisition tool has to be injected into the virtual
machine whose physical memory data is to be collected. The investigator cannot
preserve the integrity of the acquired physical memory data, owing to its volatile nature.
Physical memory analysis may help the investigator complete the investigation
process, but may not stand in the court-of-law because the integrity of the data cannot be
demonstrated.
Log Data Acquisition:
The segregated log data is collected using the investigator’s workstation, i.e., a computing
device where the acquisition and partial analysis tools are deployed. We created a
MySQL database named logdb, with a table servicelogs, on the controller node of
OpenStack, where most of the logs are present. The application screenshots for connecting
to the database from the investigator’s machine and viewing the table content are shown
in Figure 5.4 and Figure 5.5 respectively. The investigator (in the presence of the cloud
admin) can go through the table content and form a query based on ATTRIBUTE,
CONDITION (==, !=, &lt;, &lt;=, &gt;, &gt;=), and VALUE to filter the evidence required and, if
necessary, download it to the investigator’s workstation, as shown in Figure 5.5.
Figure 5.4: Connecting to cloud hosting server that stores the shared table database
Figure 5.5: Shared table with different attribute information
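A hedged sketch of how such an ATTRIBUTE/CONDITION/VALUE query could be built safely is shown below (again using SQLite in place of MySQL; the attribute names are assumptions based on our shared-table layout). Whitelisting the attribute and operator and binding the value keeps a malformed query from exposing other tenants' records:

```python
import sqlite3

# Whitelists mirroring the ATTRIBUTE / CONDITION / VALUE query form.
ATTRIBUTES = {"ts", "level", "service", "instance_id", "message"}
CONDITIONS = {"==": "=", "!=": "!=", "<": "<", "<=": "<=", ">": ">", ">=": ">="}

def filter_evidence(db, attribute, condition, value):
    """Build a parameterized SELECT over the shared servicelogs table
    from the investigator's attribute/condition/value triple."""
    if attribute not in ATTRIBUTES or condition not in CONDITIONS:
        raise ValueError("attribute or condition not allowed")
    sql = "SELECT * FROM servicelogs WHERE {} {} ?".format(
        attribute, CONDITIONS[condition])
    return db.execute(sql, (value,)).fetchall()
```

Only the filtered rows, not the whole table, need to be downloaded to the investigator's workstation.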
The data segregation method for logs can be applied to any private cloud deployment,
provided the data segregation tool is adapted to the log format of the cloud service
provider. The log data acquisition method we suggest is generic and can scale to any
cloud deployment.
5.3 Examination and Partial Analysis of the Evidence
Evidence examination is the digital forensics process in which data is extracted from
the forensic image for further analysis. Analysis is the process that applies a set of methods
to the forensic data extracted in the examination phase, such as anomaly detection, correlation,
user profiling and timeline analysis, to generate the analysis results. The evidence
examination and analysis approaches of traditional digital forensics are not directly
applicable to cloud data because of virtualization and multi-tenancy. “Digital forensic
triage” is required to enable the cybercrime investigator to judge whether a case warrants
investigation. Digital forensic triage is a technique used in selective data acquisition
and analysis to minimize the processing time of the digital evidence; we cover it in more
detail in Chapter 6. In the following sections, we present the methods of evidence
examination and partial analysis required for virtual machine data.
5.3.1 Within the Virtual Machine
Hard disk capacity has grown in proportion to the use of computers, and with the
emergence of cloud computing, disk space has become virtually unlimited for end users
(for example, VMware provides a datastore of size 62TB [41]). In this scenario, without
a baseline understanding and an overview of the disk space, the investigator may end up
examining the evidence without finding any useful evidentiary results related to the crime.
Hence, by applying the examination and partial analysis phase at the scene of crime to
different parts of the evidence, we provide the investigator with a knowledge base of the
file system metadata, the content of logs (for example, the content of registry files in
Windows), and the internals of the physical memory. With this knowledge base, the
investigator gains an in-depth understanding of the case under investigation and may save
a considerable amount of valuable time, which can be utilized for further analysis.
Examination of File System Metadata:
Once the forensic image of the virtual hard disk is obtained on the investigator’s workstation,
the examination of the file system metadata or logs (for example, the registry files in
Windows) is started, as shown in Figure 5.6. Before using the system metadata extractor
or an OS log analyzer (for example, a Windows registry analyzer), the investigator has
to mount the acquired virtual disk (a .qcow2 file in our case) as a virtual drive. We
used a tool called “Mount Image Pro” from GetData software solutions [27] for virtual
disk mounting; after mounting, the virtual disk acts like the drive where it is mounted.
Presently, “Mount Image Pro” does not support the .qcow2 virtual disk format, so we
converted the .qcow2 format to raw format to mount it. For the conversion, we used the
QEMU disk image utility qemu-img [35], an example of which is shown below.
$ qemu-img convert -f qcow2 -O raw windows7.qcow2 windows7.img
This command converts a qcow2 image file named windows7.qcow2 to a raw (img)
image file named windows7.img.
Figure 5.6: Virtual disk examination process
We used the free and open source software AWStats [5] for analyzing the logs of open source operating systems. The system metadata extractor shown in Figure 5.7 is used to list the metadata of the files and folders in the different partitions of the virtual hard disk. For example, on a machine where NTFS is the file system, we extracted metadata of files/folders such as the MFT record number, active/deleted status, file/folder flag, filename, file creation date, file accessed date, etc., as shown in Figure 5.8. This report may differ for other file systems (FAT32, EXT3, HFS, etc.).
Figure 5.7: File system metadata extractor
Figure 5.8: File system metadata extractor report
We used the Python programming language to implement the graphical user interface. To extract the MFT system file from the NTFS partition of a virtual machine's virtual disk file, we used FGET (Forensic Get) [15]. To parse the extracted MFT file, we used analyzeMFT.py [3], a Python script that can be effectively used for this purpose.
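The per-record metadata listed above is read from the fixed-size FILE record headers inside the MFT. As a rough illustration (not the thesis tool itself, which relies on analyzeMFT.py), the following sketch decodes the record number and the in-use/directory flags from a single 1024-byte MFT record; the field offsets assume an NTFS 3.1 volume.

```python
import struct

def parse_mft_record_header(record: bytes):
    """Decode a few fields from one NTFS MFT FILE record header.

    Illustrative sketch only: offsets assume an NTFS 3.1 volume and a
    standard 1024-byte record.
    """
    if record[0:4] != b"FILE":
        raise ValueError("not a FILE record")
    flags = struct.unpack_from("<H", record, 0x16)[0]      # 0x01 in use, 0x02 directory
    record_no = struct.unpack_from("<I", record, 0x2C)[0]  # MFT record number
    return {
        "record_no": record_no,
        "in_use": bool(flags & 0x01),   # cleared for deleted files
        "is_dir": bool(flags & 0x02),   # set for folders
    }
```

Iterating such a parser over every 1024-byte slice of the extracted $MFT yields the active/deleted and file/folder columns of the report in Figure 5.8.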
Examination of the cloud VM’s registry files:
Like traditional desktop systems, cloud virtual machines have registry files (or logs). The Windows operating system stores its configuration data in the registry, which makes it most important from a digital forensics perspective. The registry is a hierarchical database that can be described as a central repository for configuration data (i.e., it stores several elements of information including system details, application installation information, networks used, attached devices, history lists, etc.) [62]. Registry files are user specific and their location depends on the version of the operating system (Windows 2000, XP, 7, 8, etc.). The important registry files in Windows include USER.DAT, SYSTEM.DAT, CLASSES.DAT, NTUSER.DAT, USRCLASS.DAT, etc.
The GUI of the Windows Registry Analyzer is also built using the Python programming language. To read the content of a registry file, it first has to be extracted from the virtual disk file; for this we again used FGET (Forensic Get) [15]. To parse the extracted registry file, we used a Python library called ‘Python-Registry’ [34]. To retrieve specific information from a registry file, the investigator needs to choose the mount point of the virtual disk, the operating system, the user, and the element of information to be retrieved, as shown in Figure 5.9.
Figure 5.9: Cloud VM’s registry analyzer
A sample report generated with the system information, the application information, the attached devices and the history list is shown in Figure 5.10.
Figure 5.10: Cloud VM’s registry analyzer report
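The mapping from a selected “element of information” to a concrete hive and key path can be kept in a small lookup table. The sketch below is illustrative only: the key paths are common Windows 7-era locations, not necessarily the exact ones our analyzer queries, and it uses the third-party ‘Python-Registry’ library [34] (imported lazily, so the lookup table is usable without it installed).

```python
# Illustrative mapping from report sections to (hive file, key path).
# These are common Windows 7-era locations, not necessarily the exact
# keys queried by the thesis tool.
REGISTRY_QUERIES = {
    "System information":      ("SYSTEM",     r"ControlSet001\Control\ComputerName\ComputerName"),
    "Application information": ("SOFTWARE",   r"Microsoft\Windows\CurrentVersion\Uninstall"),
    "Attached devices":        ("SYSTEM",     r"ControlSet001\Enum\USBSTOR"),
    "History list":            ("NTUSER.DAT", r"Software\Microsoft\Windows\CurrentVersion\Explorer\RunMRU"),
}

def read_registry_values(hive_path, key_path):
    """Return (name, value) pairs for every value under key_path in the hive."""
    # Third-party dependency 'python-registry' [34]; imported lazily so the
    # lookup table above can be used without the library installed.
    from Registry import Registry
    reg = Registry.Registry(hive_path)
    key = reg.open(key_path)
    return [(v.name(), v.value()) for v in key.values()]
```

Given the user and hive selected in the GUI, the analyzer resolves the requested section through such a table and dumps the resulting values into the report of Figure 5.10.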
Examination of the physical memory of a cloud VM:
Physical memory (RAM, also called volatile memory) contains a wealth of information about the running state of a system, such as running and hidden processes, maliciously injected code, the list of open connections, command history, passwords, clipboard content, etc. We used Volatility 2.1 [43] plugins to capture some of this important information from the physical memory of the virtual machine, as shown in Figure 5.11.
Figure 5.11: Selective memory analysis
A selective memory analysis report of the running processes, hidden processes and command history is shown in Figure 5.12.
Figure 5.12: Selective memory analysis report
Figure 5.13: Selection of keyword option for searching
Apart from the selective memory analysis, we implemented a multiple-keyword search using the Boyer-Moore [50] pattern matching algorithm; the implementation details are provided in the next section. To search only for a set of keywords, the investigator can select the Keyword option as shown in Figure 5.13, and then enter the keywords in double quotes, separated by commas, as shown in Figure 5.14.
Figure 5.14: Entering multiple keywords for search (indexing)
For searching patterns, we implemented a regular expression (RE) search technique for URLs, phone numbers, email IDs and IP addresses. To search only for patterns, the investigator can select the RE option as shown in Figure 5.15, and then choose the patterns to be searched as shown in Figure 5.16.
Figure 5.15: Selection of RE option for searching
Figure 5.16: Selecting multiple patterns for search (indexing)
We implemented the search engine in C# on the .NET platform. The GUI provides the option of searching for either keywords or patterns. For pattern matching, we used the Match() method of the ‘Regex’ class as follows:
Match i = Regex.Match(input, pattern);
where
input - the text in which the pattern is to be searched;
pattern - the regular expression (URL, phone number, email ID or IP address);
i.Value - contains the matched pattern for the given regular expression (for example, www.microsoft.com for a URL);
i.Index - contains the file offset of the matched pattern.
The regular expressions used for the corresponding patterns are listed in Table 5.2.

Table 5.2: Regular expressions used for corresponding patterns
Keyword        Regular expression (Indian context)
Email ID       [A-Za-z0-9. -]+@[A-Za-z0-9]+.(com|in|net|edu)
URL            [[w]{3}]?.[A-Za-z0-9]+.[[gov.|co.|nic.]?in|com|edu|net]
IP address     [0-1]{8}.[0-1]{8}.[0-1]{8}.[0-1]{8}
Mobile number  [91|0]?[7-9][0-9]{9}

Whether it is a keyword or a pattern, the generated report contains the file offset of the given keyword or pattern, as shown in Figure 5.17.
Figure 5.17: Memory analysis report (result of keywords or pattern matching search)
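In Python terms (as opposed to the C# engine used in our implementation), a pattern search over a memory dump reduces to scanning the bytes with a compiled regular expression and reporting each match together with its file offset, mirroring i.Index and i.Value above. The email pattern below is a simplified illustrative stand-in, not the exact expression of Table 5.2.

```python
import re

# Illustrative email pattern restricted to a few TLDs, in the spirit of
# Table 5.2; simplified, not the exact expression used by the thesis tool.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.(?:com|in|net|edu)")

def find_patterns(data: bytes, regex):
    """Return (file offset, matched bytes) pairs for every hit in the dump."""
    return [(m.start(), m.group()) for m in regex.finditer(data)]
```

For example, `find_patterns(b"contact: abc@example.com end", EMAIL_RE)` yields `[(9, b"abc@example.com")]`, which corresponds to one row of the report in Figure 5.17.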
5.3.2 Boyer-Moore (BM) Algorithm
Boyer-Moore is an efficient string searching algorithm which is also a standard benchmark in the practical string search literature [85]. It works by calculating shift values for the characters of a pattern (or keyword), which are used in case of a bad match. The naive string matching algorithm shifts the pattern by one position every time a bad match occurs; the shift values in the Boyer-Moore algorithm prevent this [50]. The worst-case running times of the Boyer-Moore and naive algorithms are nevertheless the same (i.e., O(mn), where ‘m’ is the length of the pattern and ‘n’ the length of the text). In practical situations, however, the Boyer-Moore algorithm is vastly superior.
The algorithm works in two phases - the pre-processing phase and the searching phase. In the pre-processing phase, it builds the shift tables (bad character and good suffix), which contain the number of characters to shift by when a mismatch occurs during the search for the pattern (or keyword). These tables are built from the alphabet of the keyword. In the searching phase, it scans the characters from right to left looking for a match. In case of a mismatch, it uses the bad character and good suffix tables to shift the keyword by more than one character to the right.
Pre-processing Phase of Boyer-Moore Algorithm:
Bad Character Rule (BCR): this rule is used to build the bad character (BC) shift table of a pattern ‘P’. On a mismatch against a text character c, the pattern is shifted so that the rightmost occurrence of c in ‘P’ (to the left of the mismatch position) aligns with c in the text; if c does not occur in ‘P’, the pattern is shifted completely past c.
Good Suffix Rule (GSR): this rule calculates the shift values based on how many characters were matched successfully before a mismatch (i.e., it uses the knowledge of the matched suffix of the pattern). It builds the shift table of a pattern ‘P’ called the good suffix table, aligning the matched suffix with its next occurrence in ‘P’ or, failing that, with the longest prefix of ‘P’ that matches a suffix of the match. In the search algorithm, the shift value used is the largest of the values produced by Case 1, Case 2 and Case 3.
Searching Phase of Boyer-Moore Algorithm:
The Boyer-Moore search algorithm uses the precomputed shift values from the bad character and good suffix tables to avoid the single-character shifts of the naive algorithm. In case of a mismatch during the search, it shifts by the maximum of the values given by the good suffix rule and the bad character rule. The Boyer-Moore pattern matching algorithm to search given patterns (or keywords) in a text ‘T’ is given as Algorithm 1.
Algorithm 1: Boyer-Moore pattern matching algorithm [50, 58]
Input:
    T: an array of characters (the text in which keywords will be searched);
    P: an array of characters (holds a keyword to be searched in T);
Result: a file containing <keyword, file offset> pairs
Initialization:
    k: number of keywords; m: length of the keyword; n: length of the text T;
    q ← m;
while k-- do                               // for each keyword
    while q < n do
        j ← m; l ← q;
        while j > 0 and P[j] == T[l] do
            j ← j - 1; l ← l - 1;
        end
        if j == 0 then
            keyword found at file offset q;
            write the name of the keyword and the file offset to a file;
            q ← q + m - l(2);
        else
            q ← q + Max(L[j], l(j), j - BC[j]);
        end
    end
end
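A runnable counterpart to Algorithm 1 can be sketched in Python. For brevity this version implements only the bad character rule (the good suffix table of Algorithm 1 is omitted), so its shifts are no larger than those of the full algorithm while the matches found are identical.

```python
def bm_search(text: str, pattern: str):
    """Return all offsets of pattern in text, Boyer-Moore style.

    Simplified sketch: only the bad character rule is used; Algorithm 1
    additionally consults a good suffix table for larger shifts.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad character table: rightmost position of each character in the pattern.
    last = {c: i for i, c in enumerate(pattern)}
    hits, q = [], 0                       # q = current alignment of the pattern
    while q <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[q + j]:
            j -= 1                        # scan right to left, as in Algorithm 1
        if j < 0:
            hits.append(q)                # full match at offset q
            q += 1
        else:
            # Shift so the rightmost occurrence of the bad character aligns,
            # never moving the pattern backwards.
            q += max(1, j - last.get(text[q + j], -1))
    return hits
```

For example, `bm_search("here is a simple example", "example")` returns `[17]`.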
5.3.3 Outside the Virtual Machine
The data that resides outside a virtual machine consists of the logs related to the cloud services required to run and manage the virtual machine’s activity. The forensic process used to acquire and analyze data that is not within the virtual machine (i.e., that cannot be accessed from the guest operating system running in the VM) can be a remote application, or an application running alongside the cloud hosting services under the control of the host operating system of the cloud server. In the previous section, we discussed a script that runs on the cloud hosting server to segregate cloud log data with respect to a service or a virtual machine instance ID. The remote application that we developed (Data Extractor) provides a query-based facility to collect the log data of a virtual machine within a cloud platform from a remote machine (the investigator’s workstation), as depicted in Figure 5.5.
5.4 Summary
Along with new digital forensic frameworks and architectures for cloud computing platforms, there is an immediate requirement for new digital forensic methods that can scale to cloud data when handling the analysis of cloud crimes. In this Chapter, we proposed methods for data collection and segregation, and methods for the partial analysis of the evidence within and outside a virtual machine present in a cloud platform. In particular, to minimize the processing time of the digital evidence of a reported cloud crime, we proposed methods of examining the virtual machine’s data in the places where important evidentiary data is most likely to be present. The results of our findings in the examination and partial analysis phase are provided to the investigator for further analysis. This helps the investigator to know the location and presence of important artifacts in the evidence under investigation. The methods we proposed for data collection, segregation and partial analysis for cloud forensics are under review [Pub5]. In the next Chapter, we will demonstrate the application of digital forensic triage in the examination and partial analysis phase of cloud forensics.
Chapter 6
Digital Forensic Triage in the
Examination and Partial Analysis
“The term ‘triage’ normally means deciding who gets attention first.”
- Bill Dedman
6.1 Introduction
In Chapter 5, we proposed various methods for data collection, segregation and partial
analysis for cloud forensics. The proposed partial analysis methods were part of the
examination and partial analysis phase of our cloud forensic framework. In this Chapter,
we use the concept of digital forensic triage to examine and partially analyze the cloud
data under investigation using a parallel processing framework to find the evidence of
interest to the investigator in real time.
The traditional digital forensic approach to investigation (seizing, imaging, and analysis) is no longer applicable to large-scale data examinations [79]. The capacity of storage media has increased at such a rate that traditional digital forensic investigators are unable to keep pace. In parallel, the capacity of the virtual disk volumes provided to a VM in the cloud environment has also increased (for example, VMware provides a datastore of size 62 TB [41]). In this scenario, the investigator may need to speed up the investigation process while dealing with specific cloud crime cases such as murder, missing persons, child abductions, death threats, etc. Motivated by this, we used the concept of digital forensic triage to implement a ‘real-time digital forensic analysis process’ that searches for user-specified keywords or patterns in real time in the given evidence file to minimize the overall processing time. The digital forensic triage approach we designed uses MapReduce with built-in KMP (Knuth-Morris-Pratt) and Boyer-Moore string search algorithms on a distributed computing platform, which indexes the given keywords in real time depending on the computing nodes deployed for the computation. For searching patterns, we implemented a regular expression search for URLs, phone numbers, email IDs and IP addresses without using any specific algorithm. The index of the keywords or patterns is utilized by the investigator in the analysis phase to speed up the overall analysis process. The regular expressions used for the corresponding patterns are listed in Table 5.2. We have already discussed the working of the Boyer-Moore algorithm in Chapter 5; hence, we will elaborate on the working of KMP in this Chapter.
6.2 Digital Forensic Triage
6.2.1 Introduction to Triage and Background
Triage is defined in Oxford English dictionary as “The process of determining the most
important things from amongst a large number that require attention” [32]. Roussev et
al. [80] defined digital forensic triage as “a partial forensic examination conducted under
(significant) time and resource constraints”. Also, they have pointed out that the ability
of traditional digital forensic tools to employ a bigger ‘computational hammer’ has not
grown appreciably. We experimented on data acquisition and analysis (particularly indexing) with traditional digital forensic tools and obtained the results shown in Table 6.1.
From our results, we conclude that the actual processing time (which involves indexing, carving files and analysis) of digital evidence is always greater than the acquisition time. The total time required to complete a forensic investigation is the sum of the acquisition time and the processing time. Thus:
Total time = Acquisition time + Processing time
Table 6.1: Report of acquisition and indexing time using traditional digital forensic tools

Disk size (GB)   Tool used (hardware/software)                   Acquisition time (min)   Indexing time (min)
40, 80, 160      Tableau Forensic Duplicator Model TD1,          21, 47, 116              NA
                 S/W: 01d11068, F/W: 2.39
40, 80, 160      Logicube Talon, S/W: V2.43, F/W: V3.01          22, 46, 119              NA
40, 80, 160      FTK V5.2                                        NA                       19, 51, 118
where total time is the time taken to complete the investigation (i.e., from evidence acquisition to reporting).
MapReduce is a parallel programming model used to process and generate large data sets, and it is applicable to a broad variety of real-world problems. The digital forensic triage required in cloud computing data analysis is one such problem. In the next section, we use this parallel programming framework to design and implement a digital forensic triage for cloud data analysis which can speed up the overall processing of digital evidence in a cloud crime investigation.
6.2.2 Parallel Processing Framework using Hadoop
MapReduce is a software framework for easily running applications which process large amounts of data in parallel on large clusters of thousands of commodity hardware nodes in a reliable, fault-tolerant manner. MapReduce is a fundamental building block of the Hadoop framework [23]. With the help of MapReduce and other components such as HDFS, Mahout, Sqoop, Pig, Hive, ZooKeeper, HBase, etc., the Hadoop framework provides massively parallel processing. The programmer is completely abstracted from the details of parallelization, fault tolerance, locality optimization, load balancing, etc. In the MapReduce programming model, processing takes place where the data is (i.e., the computation goes to the data rather than the data coming to the program) [54].
MapReduce takes advantage of the parallelism provided within the Hadoop framework for efficient and fast processing by exposing the inherent parallelism in an application [86]. MapReduce is not suitable for all applications, but when it fits, it may save a huge amount of processing time. MapReduce has two phases - the Map phase and the Reduce phase - and any MapReduce application provides two functions, Map and Reduce, whose inputs are <key, value> pairs. An example of the MapReduce framework for a word count application is given in Figure 6.1.
Figure 6.1: MapReduce application framework to count distinct words of a file
As shown in Figure 6.1, the input file data is divided into parts and sent to different Mapper processes (three in our case). Each Mapper process produces <key, value> pairs. The Reduce functions (two in our case) receive these <key, value> pairs and sum the values per key to produce the output as <key, value> pairs.
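The word count flow of Figure 6.1 can be simulated in a few lines of plain Python (no Hadoop), with one map call per input split followed by an explicit shuffle and reduce step:

```python
from collections import defaultdict

def map_fn(split: str):
    """Map phase: emit a <word, 1> pair for every word in one input split."""
    return [(word, 1) for word in split.split()]

def word_count(splits):
    """Shuffle the intermediate pairs by key, then reduce each group by summing."""
    groups = defaultdict(list)
    for split in splits:                  # in Hadoop, the splits run on different nodes
        for key, value in map_fn(split):
            groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}
```

For example, `word_count(["deer bear river", "car car river", "deer car bear"])` yields `{"deer": 2, "bear": 2, "river": 2, "car": 3}`.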
6.3 Real-time Digital Forensic Analysis Process
6.3.1 Selection of the Pattern Matching Algorithm
The task before us was to select an efficient algorithm for MapReduce to search user-specified keywords in an evidence file. We tested two well-known string matching algorithms (Boyer-Moore and KMP) for this requirement. The performance of the two algorithms is very similar except when searching keywords of different lengths: Boyer-Moore [50, 58] is more appropriate when the keyword is very long, while KMP [65] outperforms other string search algorithms for shorter keywords. We conducted experiments on the execution time of both algorithms with different keyword lengths. The experiment was carried out (results are shown in Table 6.2) on a single-node Hadoop cluster over 1024 MB of plain text data using MapReduce [54]. Based on the experimental results, we decided to implement both algorithms, so that, depending on the length of the keyword, the appropriate algorithm is called during execution.
Table 6.2: Execution time of Boyer-Moore and KMP algorithms with multiple keywords

              Boyer-Moore algorithm         KMP algorithm
              length = 4   length = 8       length = 4   length = 8
1 keyword     17 sec       11 sec           15.5 sec     12 sec
3 keywords    20.5 sec     13.5 sec         18 sec       14.5 sec
5 keywords    23 sec       17.5 sec         21.5 sec     18 sec
6.3.2 Proposed System Architecture
For experimental purposes, we set up a Hadoop cluster of eight nodes, each with the hardware configuration shown in Table 6.3.
Table 6.3: Hardware configuration of a node in the Hadoop cluster

Processor            Intel Core i7-4770K
Clock (GHz)          3.5
Number of cores      4
Number of threads    8
RAM (GB)             8
Cache (MB)           8
Hard disk (GB)       1024
The versions of the software used are Apache Hadoop 2.2.0 and Ubuntu 12.04. In our proposed architecture, we used only two components of the Hadoop framework, the Hadoop Distributed File System (HDFS) and MapReduce, as shown in Figure 6.2. HDFS has a master/slave architecture where one of the nodes acts as the master (NameNode) and the rest become slaves (DataNodes). The master node runs Hadoop components such as the DataNode, NameNode, JobTracker, TaskTracker and the secondary NameNode; the slave nodes, on the other hand, run only the DataNode and TaskTracker components. Each DataNode manages the storage attached to it, while the master is responsible for managing and storing the metadata of all the files on the different DataNodes [23].
Figure 6.2: Mapping of Hadoop framework components to forensic triage [23]
The EditLog and FsImage are data structures (files) of HDFS. Every transactional change to the file system metadata is recorded in the EditLog, while the file system namespace details, such as the mapping of blocks to files and the file system properties, are stored in the FsImage. For reliability, the data is divided into blocks and distributed over multiple DataNodes, including the master. The number of copies of each block of a file is called the ‘replication factor’, and its value is 3 by default. The JobTracker that runs on the master node assigns the MapReduce tasks to the TaskTrackers. It also computes the pre-processing tables (required to run KMP) and the shift tables (required to run Boyer-Moore) of all keywords and distributes them to the TaskTrackers. The TaskTracker that runs on every DataNode is responsible for running the MapReduce functions, which produce the final index offsets of each keyword after searching. The mapping of Hadoop framework components to forensic triage is shown in Figure 6.2.
6.3.3 Proposed System Implementation Details
The proposed ‘real-time digital forensic partial analysis process’ consists of the following four steps:
Step 1: Select a VM’s virtual disk file acquired using forensically sound data acquisition techniques.
Step 2: Distribute parts of the data selected in Step 1 to the Hadoop cluster for real-time data processing.
Step 3: Run the KMP (or Boyer-Moore)/regular expression based MapReduce job on the Hadoop cluster to search the user-specified keywords or patterns in each part of the data, and aggregate the results of all the parts.
Step 4: Based on the aggregated result, the investigator decides whether or not to process the evidence further.
The proposed system shown in Figure 6.3 includes all of these steps. We implemented the KMP/BM search (for keywords) and the regular expression search (for patterns) within the Map function of MapReduce in the Java programming language. Our implementation uses the Hadoop library ‘hadoop-0.18.3-core.jar’ for a few built-in classes and functions. The Map function reads one line at a time from a part (block) of the evidence file (.vmdk, .vhd, .vdi, .qcow2, .img, etc.) and calls the KMP (or Boyer-Moore)/regular expression search for the given set of keywords/patterns. If there is a hit for a keyword/pattern, the search provides a local offset, and a built-in ‘Reporter’ facility provides the global offset corresponding to each local offset. These global offsets are collected by the ‘OutputCollector’ as intermediate records and written to a file. This process continues until all the lines of the different parts of the evidence file have been searched. The number of files containing the intermediate records depends on the number of Map functions configured (we set it to 16). The Reduce functions (we set their number to 2) take as input all the files created by the mappers and provide a merged result. The resulting keyword/pattern offsets can be used for further analysis.
Figure 6.3: Proposed system for ‘real-time digital forensic partial analysis’ using MapReduce with KMP/BM search engine
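In outline, each Map call turns block-local hit positions into global file offsets, and the Reduce step merges the per-block hit lists into one index per keyword. The Python sketch below mirrors that flow (the actual implementation is in Java with KMP/Boyer-Moore; here `str.find` stands in for the search routine):

```python
from collections import defaultdict

def map_block(block: str, block_offset: int, keywords):
    """Map phase: emit (keyword, global offset) for every hit in one block.

    str.find stands in for the KMP/Boyer-Moore routine of the real mapper;
    block_offset converts block-local offsets to global file offsets, as the
    'Reporter' facility does in our Java implementation.
    """
    hits = []
    for kw in keywords:
        i = block.find(kw)
        while i >= 0:
            hits.append((kw, block_offset + i))
            i = block.find(kw, i + 1)
    return hits

def reduce_hits(intermediate):
    """Reduce phase: merge per-block hit lists into one sorted index per keyword."""
    index = defaultdict(list)
    for kw, off in intermediate:
        index[kw].append(off)
    return {kw: sorted(offs) for kw, offs in index.items()}
```

Running `map_block` over two consecutive 8-byte blocks of a toy file and merging with `reduce_hits` produces the same <keyword, file offsets> index that the Hadoop job writes out.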
The pseudocode of the improvised KMP (multi-pattern, multi-occurrence) search algorithm [65] to be embedded in the Map function is given below as Algorithm 2, where S is the array of characters of the evidence file to be searched, H is the array of headers (keywords) that are sought, and T is a two-dimensional array in which T[h] is the array of integers precomputed for header H[h] by the KMP table building algorithm (Algorithm 3).
Algorithm 2: Improvised KMP pattern matching algorithm
Input:
    H: an array of headers (a two-dimensional array);
    T: a two-dimensional array of integers (result of the KMP table building algorithm);
    S: an array of characters (the text to be searched);
Result: a file containing <key, val> pairs as <keyword, file offset>
Initialization:
    h: number of headers; k ← h;
while h-- do
    m[h] ← 0;      // the beginning index of the current match of keyword h in S
    pos[h] ← 0;    // the position of the current character in keyword H[h]
end
while k-- do
    while m[h] + pos[h] < length(S) do
        if H[h][pos[h]] = S[m[h] + pos[h]] then
            if pos[h] = length(H[h]) - 1 then
                generate <key, value> as <H[h], m[h]>;    // keyword offset
                Update: m[h] ← m[h] + pos[h] - T[h][pos[h]];
                if T[h][pos[h]] > -1 then
                    pos[h] ← T[h][pos[h]];
                else
                    pos[h] ← 0;
                end
            else
                increment pos[h];
            end
        else
            goto Update;
        end
    end
end
Algorithm 3: KMP table building algorithm for multiple headers
Input: H: an array of headers (a two-dimensional array)
Result: populates the table T
Initialization:
    h: number of headers;
    T: a two-dimensional array of integers;
while h-- do
    T[h][0] ← -1; T[h][1] ← 0;
    pos ← 2; cnd ← 0;
    while pos < length(H[h]) do
        if H[h][pos-1] = H[h][cnd] then
            cnd ← cnd + 1; T[h][pos] ← cnd; pos ← pos + 1;
        else if cnd > 0 then
            cnd ← T[h][cnd];
        else
            T[h][pos] ← 0; pos ← pos + 1;
        end
    end
end
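A direct Python transcription of Algorithms 2 and 3 can serve as a reference implementation; a plain loop over the keyword list replaces the outer while k-- loop of Algorithm 2:

```python
def kmp_table(p: str):
    """Algorithm 3 for a single keyword: the KMP partial match (failure) table."""
    t = [0] * max(len(p), 1)
    t[0] = -1
    pos, cnd = 2, 0
    while pos < len(p):
        if p[pos - 1] == p[cnd]:
            cnd += 1
            t[pos] = cnd
            pos += 1
        elif cnd > 0:
            cnd = t[cnd]
        else:
            t[pos] = 0
            pos += 1
    return t

def kmp_search_all(s: str, keywords):
    """Algorithm 2 for a list of keywords: all offsets of every keyword in s."""
    hits = {}
    for kw in keywords:
        t = kmp_table(kw)
        offs, m, pos = [], 0, 0
        while m + pos < len(s):
            if kw[pos] == s[m + pos]:
                if pos == len(kw) - 1:
                    offs.append(m)             # emit <keyword, offset>
                    m = m + pos - t[pos]       # the 'Update' step of Algorithm 2
                    pos = t[pos] if t[pos] > -1 else 0
                else:
                    pos += 1
            else:
                m = m + pos - t[pos]           # goto Update
                pos = t[pos] if t[pos] > -1 else 0
        hits[kw] = offs
    return hits
```

For example, `kmp_search_all("ababcabab", ["abab"])` reports occurrences at offsets 0 and 5, including the overlap handled by the failure table.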
GUI Based Implementation:
As part of the automation process, a GUI was built to make it easier for the user to search for keywords/regular expressions. Since the MapReduce code depends entirely on the Mapper function, the GUI was created with the intention of automating the task of writing a Mapper function. The user is given the option of searching for either patterns or keywords. The four default regular expressions (i.e., URL, email address, mobile number and IP address) are available for selection as shown in Figure 6.4. The user can also add a pattern of his/her own choice as shown in Figure 6.5.
Figure 6.4: Default regular expressions to generate Mapper code
Figure 6.5: Adding a regular expression to generate Mapper code
AWT and Swing were used for creating the front end. A thread keeps the option selection in the GUI updated in real time. A standard template for the Mapper class is taken and modified according to the options selected in the GUI. A separate file, serving as a temporary database, is maintained to record the regular expressions the user adds. Once the user selects a set of keywords or patterns and presses the “START” button, a .jar file is created that contains the KMP (or Boyer-Moore) or regular expression based Mapper function code to search the selected keywords or patterns using the Hadoop framework.
Execution of MapReduce in Hadoop:
After compiling the application (say, KMPHadoop.java), a KMPHadoop.jar file is created that can be exported to a particular directory where it can be used as input to the Hadoop parallel framework. Once the .jar file is exported, we place it in the location where HDFS is installed for easier execution. To run the MapReduce functions containing the user logic on the Hadoop cluster, the following command can be used from the master node of the cluster:
$ bin/hadoop jar KMPHadoop.jar KMPHadoop /user/Pawar/Analysis/Ubuntu.img /user/Pawar/Analysis/output
where
KMPHadoop.jar - the jar file containing the code;
KMPHadoop - the class name of the application;
/user/Pawar/Analysis/Ubuntu.img - the input file in which keywords or patterns are searched;
/user/Pawar/Analysis/output - the output directory where the resultant file will be stored.
A successful execution of the MapReduce program on the Hadoop framework creates a file named “part-r-00000” in the output directory that contains the file offsets of the selected keywords or patterns.
6.4 Results and Discussion
To distribute the evidence file (.qcow2) over the multi-node cluster, we used the default block size (64 MB) with replication factors of 2 and 3; replication factor 3 gave better results. In our experimentation, we initially started with two nodes and gradually increased to four and eight. In the two-node setup, eight Map tasks and one Reduce task were configured; for the four- and eight-node setups, 16 Map tasks and two Reduce tasks were configured. The search time of the KMP algorithm in all three cases with a single keyword, for evidence files of different sizes, is shown in Figure 6.6.
Figure 6.6: Searching time of KMP based MapReduce with single keyword
The same experiment was carried out with multiple keywords (four) to observe the behavior of the KMP algorithm, as shown in Figure 6.7. The performance of the KMP algorithm when searching multiple keywords rather than a single keyword is far better, owing to the replication of the parts of the evidence file over the nodes of the cluster. We repeated the same experiment with one or more regular expressions. The search time of the regular expression based MapReduce function in all three cases with a single pattern, for evidence files of different sizes, is shown in Figure 6.8. Again, changing the number of patterns (to four), we ran the regular expression based MapReduce function, whose resulting search time is shown in Figure 6.9. As in the previous case, the performance of the regular expression based MapReduce function for multiple patterns over a single pattern is far better, for the same reason.
A closer look at the performance of the KMP (or Boyer-Moore) and regular expression based MapReduce reveals that keywords with regular expressions are searched faster. The reason is that keywords with regular expressions match only the exact pattern, whereas the others also match substrings. The regular expressions used for searching email IDs, URLs, IP addresses, and mobile numbers are given in Table 5.2; these regular expressions are the patterns for the respective keywords in the Indian context. The design we used supports the addition of new patterns if required by the investigator.
The approach we designed, implemented and tested could also be used for the following purposes:
• Data carving [73]
• Online social network analysis
• Screening of cloud crime cases (reducing the investigator’s backlog in the computer
forensics lab)
• The cloud crime investigator can make use of the computing facility of any cloud
provider to deliver ‘Forensics as a Service’ (FaaS) to the end-users who require the
indexes of certain keywords or patterns of the evidence
• Server log analysis for forensic purpose
Figure 6.7: Searching time of KMP based MapReduce with multiple keywords
Figure 6.8: Searching time of RE based MapReduce with single pattern
Figure 6.9: Searching time of RE based MapReduce with multiple patterns
• The generated file offsets of the keywords or patterns can be used to generate the
timeline view of the important artifacts related to the reported crime
Traditional digital forensic tools like CyberCheck, FTK and EnCase have indexing facilities. After indexing, these tools provide keyword search at the click of a button, but the time they take to index the evidence is considerably high and grows with the size of the digital media. These tools could use the output (file offsets of keywords) generated by our approach to search for the specific keywords and/or patterns related to a reported cloud crime; such a search avoids the indexing time altogether. Also, readily available file offsets of certain keywords would speed up the file carving process if used by file carving tools such as Adroit [2], F-DAC [14], foremost [16], R-STUDIO [36], etc.
With the computational facility of the eight-node cluster (results are shown in Figures 6.6 and 6.7), the investigator can determine whether an evidence file of size 1 TB contains four keywords within 90 minutes. This time can be drastically reduced to a few minutes by adding more high-end nodes to the cluster and increasing the number of Map and Reduce tasks. Hence, we call our approach of finding the user-specified keywords or patterns in the given evidence a ‘real-time digital forensic analysis process’.
6.5 Summary
Digital storage media capacity is growing at a rate comparable to Moore’s law. The cloud computing environment has added fuel to this by providing almost unlimited virtual computational facilities and storage media. This phenomenal growth increases the overall time it takes to process a typical cloud crime investigation. The increase in the time needed to completely analyze the evidence data is driving the need for additional research in this emerging area.
In this chapter, we designed and developed a ‘real-time digital forensic partial analysis process’ to search the given evidence for user-specified keywords or patterns in real time, thereby minimizing the overall processing time. For this, we implemented KMP (or Boyer-Moore)/regular-expression based MapReduce jobs in Java on a Hadoop cluster and tested them successfully on a cluster with eight nodes. When this approach is used for partial analysis, there is no possibility of missing a crucial piece of evidence. The overall model works as simply as searching for user-specified patterns in a plain text document. The proposed digital forensic triage process, ‘real-time digital forensic partial analysis’, is an accepted and published work [Pub4]. In the next chapter, we provide the logical conclusion of our research and suggest future directions to the worldwide community of researchers in this emerging field.
Chapter 7
Conclusion and Future Scope
In this research work, we addressed the challenges and requirements of performing digital forensics in the cloud. We designed a generic digital forensic framework for the cloud, suggested methods of dead/live forensic acquisition and analysis within/outside the virtual machines, and designed a digital forensic triage for the examination and partial analysis of virtual machines in cloud computing systems.
In particular, we addressed the concerns a digital forensic investigator may face during an investigation in the cloud computing environment. In the following sections, we summarize the details of the work carried out as part of this research.
7.1 Summary of Deductions
Cloud computing is still an evolving computational platform which lacks support for crime investigation in terms of the required frameworks and tools. The extensive literature survey we conducted in the area of digital forensics in cloud computing systems helped us identify the gaps in the existing research. From the identified gaps, we focused on a few and designed and implemented methods for partial forensic examination, evidence segregation, selective data acquisition, and digital forensic triage using parallel processing for cloud forensic data analysis.
Specific to performing digital forensics in the virtual environment, we identified the challenges and requirements of detecting the virtual environment in multi-level virtualization, identified important files that are generated when virtual systems are used in the virtual machines that are part of the cloud environment, and devised an algorithm to detect virtual machines hidden using alternate data streams. The proposed algorithm for the detection of hidden virtual machines uses three different filters, which guarantees the detection of malicious virtual machines.
To design a generic digital forensic framework for cloud crime investigation, we framed a digital forensic process with five phases. All the phases included in this framework work similarly to those in existing frameworks, except the third phase. This phase (examination and partial analysis) plays an important role in the examination and analysis of the data produced by the cloud environment. Having identified the different phases that need to be followed in a cloud crime investigation, we designed a generic control flow process for performing digital forensics in the cloud. The proposed control flow process serves as a blueprint for the investigator in acquiring and analyzing the data of the client device as well as that of the cloud provider data centers involved in the cloud crime investigation. To design and develop a cloud forensic application, complete knowledge of the cloud computing service models and deployment models is essential. To help the digital forensic research community understand the cloud computing architecture for forensic readiness, we designed a digital forensic architecture for the cloud. This architecture may be used as a reference to design and develop new digital forensic tools in the area of cloud computing systems.
As a proof of concept for the designed cloud forensic architecture, we formulated methods for cloud data acquisition and analysis. For a reported cloud crime, the important artifacts which need to be acquired were identified from the point of view of virtual machines and cloud logs. After acquiring a virtual machine’s virtual hard disk file using the traditional digital forensic approach, the methods we suggested for examination and partial analysis can be used to collect actionable evidence from the virtual disk file. Using our approach, the investigator can collect evidential artifacts such as the file system metadata, the registry file contents, and the physical memory contents at the scene of crime. Collecting the evidential artifacts of a virtual machine under investigation helps the investigator speed up the final analysis process. The virtual machine under investigation will also have logs related to its activity in the cloud platform. We have devised methods to segregate and acquire the log data belonging to a virtual machine.
The keyword and/or pattern search facility provided by traditional digital forensic tools depends on the indexing capability of the tool. The average time these tools take to index the complete disk content does not keep pace with the increase in digital media size, especially the virtual disk sizes provided by cloud platforms today. To speed up the search, we designed and implemented a digital forensic triage using a parallel processing framework to index the evidence of interest to the investigator in real time. This method of indexing the evidence of interest also falls into the category of examination and partial analysis, which helps the investigator speed up the final analysis process.
7.2 Future Scope of Work
The methods suggested for the realization of the designed framework, titled “A Novel Digital Forensic Framework for Cloud Computing Environment”, were tested using a private cloud test-bed set up with the OpenStack cloud solution. The methods can also be tested using different private cloud solutions such as Eucalyptus, OpenNebula, VMware vCloud, etc. The devised methodology of digital forensic triage using parallel processing was carried out using the Hadoop framework and was not integrated with any digital forensic analysis tool for searching patterns in the evidence file. In future work, one could include the pattern search facility using the proposed approach in the open source software called Digital Forensics Framework (DFF). One could also take up the implementation of the digital forensic triage using Amazon Elastic MapReduce (Amazon EMR) to index the patterns of interest in the given evidence.
We targeted the IaaS (Infrastructure-as-a-Service) delivery model of the cloud for performing digital forensic activity. As future work, the design and development of forensic methods for the PaaS (Platform-as-a-Service) and SaaS (Software-as-a-Service) delivery models of cloud computing may be taken up.
It may be appropriate to use machine learning principles to design and develop new methods for the problem of digital forensic triage. As an alternative to our proposed approach for increasing the efficiency of the investigation, one could use machine learning algorithms for feature extraction, prioritization of the evidence, classification of the evidence, etc., to extract and analyze crime-related features within the virtual machine.
7.3 Concluding Remarks
This research work enables digital forensic investigation in the cloud environment by filling the gap that exists between traditional digital forensics and cloud forensics, which differs significantly due to the virtual environment of cloud computing systems. We hope that the work presented here will be taken forward by the digital forensic research community to come up with new methods of performing digital forensics that cater to the dynamically changing nature of the cloud.
List of Publications Published/Accepted
[Pub1] Digambar Povar and G Geethakumari, “Digital Evidence Detection in Virtual
Environment for Cloud Computing”, Proceedings of the ACM International Conference
on Security of Internet of Things, SecurIT’12, August 17-19, India, 2012, pp. 102-106.
[Pub2] Digambar Povar and G Geethakumari, “A Novel approach to Detect Cloud Virtual
Machines hidden using Alternate Data Streams”, Proceedings of the IEEE International
Multi Conference on Automation, Computing, Control, Communication and Compressed
Sensing, iMac4s-2013, March 22-23, India, 2013, pp. 835-839.
[Pub3] Digambar Povar and G Geethakumari, “A Heuristic Model for Performing Digital
Forensics in Cloud Computing Environment”, International Symposium on Security in
Computing and Communications, SSCC-2014, September 24-27, India. Proceedings in
the Journal “Communications in Computer and Information Science (CCIS)”, Springer
Series, Volume 467, pp. 341-352.
[Pub4] Digambar Povar, Saibharath, and G Geethakumari, “Real-time digital forensic
triaging for cloud data analysis using MapReduce on Hadoop framework”, International
Journal of Electronic Security and Digital Forensics, Inderscience Publishers, Vol. 7,
Issue No. 2, pp. 119-133, 2015.
[Pub5] Digambar Povar and G Geethakumari, “Digital Forensic Architecture for Cloud
Computing Systems: Methods of Evidence Identification, Segregation, Collection and
Partial Analysis”, Accepted at the Third International Conference on INformation systems
Design and Intelligent Applications-INDIA-2016. Proceedings in the Journal “Advances
in Intelligent Systems and Computing (AISC) series”.
Bibliography
[1] Ad triage - forensically acquire data from live and powered down computers in the
field. http://accessdata.com/solutions/digital-forensics/AD-triage. Accessed: 2015-
06-25.
[2] Adroit photo forensics - smartcarving tool. http://digital-assembly.com/products/adroit-photo-forensics/features/smartcarving.html. Accessed: 2015-06-25.
[3] analyzemft - mft file parser. https://github.com/dkovar/analyzeMFT. Accessed:
2015-06-25.
[4] Aws:amazon web services - public cloud computing platform.
https://aws.amazon.com. Accessed: 2015-06-25.
[5] Awstats - an open source log analyzer. http://www.awstats.org. Accessed: 2015-06-
25.
[6] Clavister’s new dimension in network security reaches the cloud. Technical report. [Online]. Available: https://www.clavister.com/globalassets/documents/resources/white-papers/clavister-whp-cloud-security-en.pdf.
[7] Cloud computing strategic direction paper: Opportunities and applicability for use
by the australian government, version 1.0, 2011.
[8] Computer evidence vs daubert: The coming conflict.
https://www.cerias.purdue.edu/bookshelf/archive/2005-17.pdf. Accessed: 2015-06-
25.
[9] Cybercheck - digital evidence analysis software.
http://www.cyberforensics.in/Products/Cybercheck.aspx. Accessed: 2015-06-
25.
[10] Dban - data wiping software. http://www.dban.org. Accessed: 2015-06-25.
[11] Digital forensics framework - open source digital investigation software.
http://www.digital-forensic.org. Accessed: 2015-06-25.
[12] Encase forensic v7 - the fastest, most comprehensive forensic solution. https://www.guidancesoftware.com/products/Pages/encase-forensic/overview.aspx?cmpid=nav. Accessed: 2015-06-25.
[13] Eucalyptus - private cloud computing platform. https://www.eucalyptus.com. Ac-
cessed: 2015-06-25.
[14] F-dac - forensic data carving tool. http://www.cyberforensics.in/showdownloads.aspx?id=46. Accessed: 2015-06-25.
[15] Fget - network-capable forensic data acquisition tool. http://www.net-security.org/secworld.php?id=9757. Accessed: 2015-06-25.
[16] Foremost - freely available file carving tool. http://foremost.sourceforge.net. Ac-
cessed: 2015-06-25.
[17] Forensic tool kit - standard digital forensic investigation solution.
https://www.accessdata.com/solutions/digital-forensics/forensic-toolkit-ftk. Ac-
cessed: 2015-06-25.
[18] Ftk imager - disk imaging tool. http://accessdata.com/product-download. Accessed:
2015-06-25.
[19] Google app engine:google cloud platform for application development and deploy-
ment. https://cloud.google.com/appengine. Accessed: 2015-06-25.
[20] Google drive client software. https://www.google.co.in/drive/download. Accessed:
2015-06-25.
[21] Guidance software encase - real-world triage and collection with encase portable.
https://www.guidancesoftware.com/products/Pages/encase-portable/overview.aspx.
Accessed: 2015-06-25.
[22] Guidelines for the secure use of cloud computing by federal departments
and agencies. http://csrc.nist.gov/groups/SMA/ispab/documents/minutes/2011-
07/Jul13 Cloud-ISIMC-Cloud-Security-ISPAB.pdf. Accessed: 2015-06-25.
[23] Hadoop - hdfs architecture guide. http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html. Accessed: 2015-06-25.
[24] Incident management and forensics working group - map-
ping the forensic standard iso/iec 27037 to cloud computing.
https://downloads.cloudsecurityalliance.org/initiatives/imf/Mapping-the-Forensic-
Standard-ISO-IEC-27037-to-Cloud-Computing.pdf. Accessed: 2015-06-25.
[25] Lime - linux memory extractor. https://github.com/504ensicslabs/lime. Accessed:
2015-06-25.
[26] Memoryze - find evil in live memory. http://www.mandiant.com/resources/download/memoryze. Accessed: 2015-06-25.
[27] Mount image pro - mount image as a drive letter. http://www.mountimage.com.
Accessed: 2015-06-25.
[28] Opennebula - private cloud computing platform. https://opennebula.org. Accessed:
2015-06-25.
[29] Openstack - private cloud computing platform. https://www.openstack.org. Ac-
cessed: 2015-06-25.
[30] Openstack configuration reference manual guide.
http://docs.openstack.org/icehouse/config-reference/config-reference-icehouse.pdf.
Accessed: 2015-06-25.
[31] Openstack installation guide for ubuntu 12.04.
http://docs.openstack.org/icehouse/install-guide/install/apt/openstack-install-
guide-apt-icehouse.pdf. Accessed: 2015-06-25.
[32] Oxford dictionary - definition of triage. http://www.oxforddictionaries.com/definition/english/triage. Accessed: 2015-06-25.
[33] Putty - an ssh and telnet client. http://www.putty.org. Accessed: 2015-06-25.
[34] Python-registry - library that provides read-only access to windows registry files.
https://github.com/williballenthin/python-registry. Accessed: 2015-06-25.
[35] Qemu disk image utility - openstack virtual machine image guide.
http://docs.openstack.org/image-guide/image-guide.pdf. Accessed: 2015-06-
25.
[36] R-studio - disk recovery software. http://www.data-recovery-software.net. Ac-
cessed: 2015-06-25.
[37] The sleuth kit - open source digital forensics. http://www.sleuthkit.org/. Accessed:
2015-06-25.
[38] Spektor forensic intelligence - triage first responders.
http://www.evidencetalks.com/index.php/en/products. Accessed: 2015-06-25.
[39] The system for triaging key evidence - ideal technology corporation.
http://www.idealcorp.com/products/index.php?product=STRIKE. Accessed:
2015-06-25.
[40] Vmware - private cloud computing solution. https://www.vmware.com/cloud-
computing/private-cloud.html. Accessed: 2015-06-25.
[41] Vmware vsphere 5.5 - configuration maximums.
http://www.vmware.com/pdf/vsphere5/r55/vsphere-55-configuration-
maximums.pdf. Accessed: 2015-06-25.
[42] Vmware workstation 5.5 - what files make up a virtual machine?
https://www.vmware.com/support/ws55/doc/ws learning files in a vm.html.
Accessed: 2015-06-25.
[43] The volatility framework - an advanced memory forensics framework.
https://code.google.com/p/volatility. Accessed: 2015-06-25.
[44] Winscp - an open source free ssh client for windows.
https://winscp.net/eng/index.php. Accessed: 2015-06-25.
[45] X-ways forensics - integrated computer forensics software. http://www.x-ways.net.
Accessed: 2015-06-25.
[46] Zeus botnet controller. Technical report, Tech. Rep., 2009. [Online]. Available:
http://aws.amazon.com/security/security-bulletins/zeus-botnet-controller.
[47] Zsoft uninstaller 2.5 - search for remnants after uninstalling a application.
http://www.zsoft.dk/index/software details/4. Accessed: 2015-06-25.
[48] M Al Fahdi, NL Clarke, and SM Furnell. Towards an automated forensic examiner
(afe) based upon criminal profiling & artificial intelligence. 2013.
[49] Cory Altheide and Harlan Carvey. Digital forensics with open source tools. 2011.
[50] Robert S Boyer and J Strother Moore. A fast string searching algorithm. Communi-
cations of the ACM, 20(10):762–772, 1977.
[51] Brian Carrier. File system forensic analysis, volume 3. Addison-Wesley Reading,
2005.
[52] Brian Carrier, Eugene H Spafford, et al. Getting physical with the digital investiga-
tion process. International Journal of digital evidence, 2(2):1–20, 2003.
[53] Hyunji Chung, Jungheum Park, Sangjin Lee, and Cheulhoon Kang. Digital forensic
investigation of cloud storage services. Digital investigation, 9(2):81–95, 2012.
[54] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large
clusters. Communications of the ACM, 51(1):107–113, 2008.
[55] Josiah Dykstra and Alan T Sherman. Acquiring forensic evidence from
infrastructure-as-a-service cloud computing: Exploring and evaluating tools, trust,
and techniques. Digital Investigation, 9:S90–S98, 2012.
[56] Josiah Dykstra and Alan T Sherman. Design and implementation of frost: Digital
forensic tools for the openstack cloud computing platform. Digital Investigation,
10:S87–S95, 2013.
[57] Corrado Federici. Cloud data imager: A unified answer to remote acquisition of
cloud storage areas. Digital Investigation, 11(1):30–42, 2014.
[58] Zvi Galil. On improving the worst case running time of the boyer-moore string
matching algorithm. Communications of the ACM, 22(9):505–508, 1979.
[59] Simson L Garfinkel. Digital forensics research: The next 10 years. digital investi-
gation, 7:S64–S73, 2010.
[60] Bernd Grobauer, Tobias Walloschek, and Elmar Stocker. Understanding cloud com-
puting vulnerabilities. Security & privacy, IEEE, 9(2):50–57, 2011.
[61] NIST Cloud Computing Forensic Science Working Group et al. Nist cloud comput-
ing forensic science challenge (draft), 2014.
[62] Jerry Honeycutt and Jerry Honeycutt Jr. Microsoft Windows registry guide. Mi-
crosoft Press, 2005.
[63] Ilyoung Hong, Hyeon Yu, Sangjin Lee, and Kyungho Lee. A new triage model con-
forming to the needs of selective search and seizure of electronic evidence. Digital
Investigation, 10(2):175–192, 2013.
[64] Karen Kent, Suzanne Chevalier, Tim Grance, and Hung Dang. Guide to integrating
forensic techniques into incident response. NIST Special Publication, pages 800–86,
2006.
[65] Donald E Knuth, James H Morris, Jr, and Vaughan R Pratt. Fast pattern matching in
strings. SIAM journal on computing, 6(2):323–350, 1977.
[66] Harjinder Singh Lallie and Lee Pimlott. Applying the acpo principles in public cloud forensic investigations. Journal of Digital Forensics, Security and Law, 7(1):71–86, 2012.
[67] Fang Liu, Jin Tong, Jian Mao, Robert Bohn, John Messina, Lee Badger, and Dawn
Leaf. Nist cloud computing reference architecture: Recommendations of the na-
tional institute of standards and technology (special publication 500-292). 2012.
[68] Adamantini I Martini, Alexandros Zaharis, and Christos Ilioudis. Detecting and
manipulating compressed alternate data streams in a forensics investigation. In Dig-
ital Forensics and Incident Analysis, 2008. WDFIA’08. Third International Annual
Workshop on, pages 53–59. IEEE, 2008.
[69] Ben Martini and Kim-Kwang Raymond Choo. An integrated conceptual digital
forensic framework for cloud computing. Digital Investigation, 9(2):71–80, 2012.
[70] Fabio Marturana and Simone Tacconi. A machine learning-based triage methodol-
ogy for automated categorization of digital media. Digital Investigation, 10(2):193–
204, 2013.
[71] Rodney McKemmish. What is forensic computing? 1999.
[72] P Mell and T Grance. The nist definition of cloud computing. nist special publication 800-145 (final). Technical report, 2011. [Online]. Available: http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf.
[73] Antonio Merola. Data carving concepts. SANS Institute: Infosec Reading room,
2008.
[74] Darren Quick and Kim-Kwang Raymond Choo. Digital droplets: Microsoft skydrive
forensic data remnants. Future Generation Computer Systems, 29(6):1378–1394,
2013.
[75] Darren Quick and Kim-Kwang Raymond Choo. Dropbox analysis: Data remnants
on user machines. Digital Investigation, 10(1):3–18, 2013.
[76] Darren Quick and Kim-Kwang Raymond Choo. Forensic collection of cloud storage
data: Does the act of collection result in changes to the data or its metadata? Digital
Investigation, 10(3):266–277, 2013.
[77] Darren Quick and Kim-Kwang Raymond Choo. Google drive: Forensic analysis of
data remnants. Journal of Network and Computer Applications, 40:179–193, 2014.
[78] Anthony Reyes, Richard Brittson, Kevin O’Shea, and James Steele. Cyber crime
investigations: Bridging the gaps between security professionals, law enforcement,
and prosecutors. Syngress, 2011.
[79] Marcus K Rogers, James Goldman, Rick Mislan, Timothy Wedge, and Steve De-
brota. Computer forensics field triage process model. Journal of Digital Forensics,
Security and Law, 1(2):19–38, 2006.
[80] Vassil Roussev, Candice Quates, and Robert Martell. Real-time digital forensics and
triage. Digital Investigation, 10(2):158–167, 2013.
[81] Keyun Ruan, Ibrahim Baggili, Joe Carthy, and Tahar Kechadi. Survey on cloud
forensics and critical criteria for cloud forensic capability: A preliminary analysis.
In Proceedings of the Conference on Digital Forensics, Security and Law, pages
55–70, 2011.
[82] Keyun Ruan, Joe Carthy, Tahar Kechadi, and Mark Crosbie. Cloud forensics. In
Advances in digital forensics VII, pages 35–46. Springer, 2011.
[83] John Sammons. The basics of digital forensics: the primer for getting started in
digital forensics. Elsevier, 2012.
[84] Adrian Shaw and Alan Browne. A practical and robust approach to coping with large
volumes of data submitted for digital forensic examination. Digital Investigation,
10(2):116–128, 2013.
[85] Nimisha Singla and Deepak Garg. String matching algorithms and their applicability
in various applications. International Journal of Soft Computing and Engineering,
1(6):218–222, 2012.
[86] Dinkar Sitaram and Geetha Manjunath. Moving to the cloud: Developing apps in
the new world of cloud computing. Elsevier, 2011.
[87] David Solomon and Mark Russinovich. Microsoft windows internals, 2005.
[88] Mark Taylor, John Haggerty, David Gresty, and David Lamb. Forensic investigation
of cloud computing systems. Network Security, 2011(3):4–10, 2011.
[89] Udaya Tupakula and Vijay Varadharajan. Tvdsec: Trusted virtual domain security.
In Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference
on, pages 57–64. IEEE, 2011.
[90] Luis M Vaquero, Luis Rodero-Merino, Juan Caceres, and Maik Lindner. A break in
the clouds: towards a cloud definition. ACM SIGCOMM Computer Communication
Review, 39(1):50–55, 2008.
[91] Toby Velte, Anthony Velte, and Robert Elsenpeter. Cloud computing, a practical
approach. McGraw-Hill, Inc., 2014.
[92] Divya S Vidyadharan and KL Thomas. Digital image evidence detection based on
skin tone filtering technique. In Advances in Computing and Communications, pages
544–551. Springer, 2011.
[93] Shams Zawoad and Ragib Hasan. Cloud forensics: a meta-study of challenges,
approaches, and open problems. arXiv preprint arXiv:1302.6312, 2013.
Glossary of terms used in the thesis
ACPO (Association of Chief Police Officers) principles. The ACPO principles are guidelines for digital forensic investigations, followed when handling computer-based electronic evidence by law enforcement agencies, particularly in the United Kingdom. There are four principles in these guidelines; if all four are followed correctly, the result may serve as a benchmark ‘chain of custody’ for the court of law.
Client device. A digital device used to access cloud services. Examples of such devices include the desktop computer, laptop, mobile device, PDA (Personal Digital Assistant), etc.
Cloud service provider (CSP). A CSP is an entity that provides computing resources as a service. Examples of CSPs are Apple, Amazon, Microsoft, Google, Oracle, IBM, HP, and others.
Cloud storage. Also called remote storage: the cloud service that stores user data on the cloud provider’s storage (cloud servers).
Cloud user. User who uses the cloud services such as IaaS, PaaS, or SaaS.
CyberCheck. Cyber forensic tool for data recovery and analysis of digital evidence.
Darik’s Boot and Nuke (DBAN). A free erasure software tool for deleting the contents of any hard disk drive. Once the data is deleted, it cannot be recovered.
Data duplication (dd). A disk cloning utility capable of cloning a partition or an entire hard disk drive.
Daubert principles. A rule of evidence regarding the admissibility of expert witness testimony during United States federal legal proceedings.
Digital forensic investigation. The process of investigating a cyber crime using forensi-
cally sound acquisition and analysis methods.
Digital Forensic Research Workshop (DFRWS). A workshop conducted every year to bring together academic researchers, digital forensic investigators, and practitioners for active discussion.
EnCase. A digital forensic tool to analyze data from a wide range of devices such as computers, smartphones, and tablets.
EnCase Forensic Imager. Digital forensic tool to acquire evidence (bit by bit cloning
of the digital media) in a forensically sound manner.
Expert Witness Format (EWF). Digital data that could be used as evidence is typically stored in specialized and closed formats. One such format is EWF, which is used by all the major tools for acquiring and analyzing digital evidence.
Forensic Tool Kit (FTK). Another tool, like EnCase, used to analyze data from various digital devices.
FTK Imager. A disk imaging (bit-by-bit cloning of the digital media) tool that acquires digital evidence in a forensically sound manner.
Gartner. An Information Technology (IT) research and advisory firm providing technology-related insight. It uses hype cycles and magic quadrants to visualize its market analysis results.
HardCopy 3P. Portable hardware tool for the forensic hard drive cloning.
International Data Corporation (IDC). A market research, analysis, and advisory firm specializing in information technology (IT), telecommunications, and consumer technology.
Investigator. The person who investigates a cyber crime.
Law Enforcement Agency (LEA). The agency authorized to investigate a cyber crime.
Linux Memory Extractor (LiME). A physical memory (also called volatile memory or RAM) acquisition tool for Linux and Linux-based devices.
Message Digest (MD5). A cryptographic hash function that computes a 128-bit checksum used to provide data integrity and authentication.
National Institute of Standards and Technology (NIST). A standards body that provides standards and guidelines for new technologies such as mobile computing, cloud computing, the Internet of Things (IoT), etc.
Scientific Working Group on Digital Evidence (SWGDE). It is an organization that
builds standards for digital and multimedia evidence.
Secure Hash Algorithm (SHA). A cryptographic hash function that computes a check-
sum (160 bits or more) used to provide data integrity and authentication.
Tableau forensic duplicator. Portable hardware tool for fast and reliable forensic hard
drive cloning.
TrueBack. A digital forensic software tool for digital evidence seizure and acquisition (disk imaging or cloning) that is compatible with DOS, Windows, and Linux operating systems.
Biography: Mr. Digambar Povar
Mr. Digambar Povar is Lecturer, Dept. of Computer Science and Information Systems
at BITS Pilani, Hyderabad Campus. Before joining BITS, he worked as a Scientist in the Resource Center for Cyber Forensics (RCCF) at the Center for Development of Advanced Computing (CDAC), Trivandrum, India, for a period of 5 years and 6 months.
He was also associated with Center for Development of Advanced Computing (CDAC),
Noida, as a Project Engineer for a short period. Mr. Digambar Povar holds a postgraduate degree (M.Tech) in Computer Science and Engineering from NIT Warangal, India. At CDAC, he contributed to the design and development of cyber forensic tools like CyberCheck (forensic disk analysis tool), F-DaC (Forensic Data Carving tool), FIRT (Forensic Image Recovery Tool), etc. He was instrumental in commissioning the Cyber Forensic labs at the offices of the DGIT (Investigation), Delhi; DGIT (Investigation), Mumbai; and DG-DRI, Mumbai, India. He served as faculty conducting “Courses on Cyber Forensics” for Law Enforcement officers, CBI, IB, Navy, and Kerala police. He has also participated in many national and international seminars/workshops on “Digital Forensics”.
Mr. Digambar Povar has many international publications to his credit as primary author.
His areas of research interests include: digital forensics, cloud computing, cloud foren-
sics, cyber security and cloud security.
Mr. Digambar Povar has given many guest lectures on topics in emerging areas such as
digital forensics, cloud computing, cloud security and allied areas of cloud computing.
Presently, he is the Co-Investigator for the project “Design and Development of Digital
Forensic Tools for Cloud IaaS” funded by DeitY, Govt. of India.
Biography: Dr. G. Geethakumari
Dr G Geethakumari is Asst. Professor, Dept. of Computer Science and Information Systems at BITS Pilani, Hyderabad Campus. Before joining BITS, she worked as a faculty member in the CSE Dept. at the National Institute of Technology, Warangal. Dr Geetha received
her Ph.D. from University of Hyderabad. Her Ph.D. thesis was titled ‘Grid Computing
Security through Access Control Modelling’.
Dr. Geetha has many international publications to her credit. Her areas of research inter-
ests include: Information security, cloud computing and security, cloud forensics, enter-
prise security challenges and data analysis, cloud authentication techniques, cyber secu-
rity, semantic attacks and privacy in online social networks. She has been in the forefront
of technical activities at BITS-Pilani, Hyderabad Campus. She has been the Faculty Ad-
visor for Computer Science Association during 2008-2011. Presently she is the IEEE
Student Branch Counselor, BITS-Pilani, Hyderabad Campus. She is also the Coordinator
for the Linux User Group, BITS Pilani, Hyderabad Campus.
Dr. Geetha is a Member, IEEE as well as Member, IEEE Computer Society. She is also
a Professional Member, ACM. She was the Organizing Committee Member for the IEEE
INDICON Conference conducted in BITS Pilani, Hyderabad Campus during December
16-18, 2011. Dr Geetha was the Publicity Co-Chair for the IEEE Prime Asia Conference
hosted by BITS Pilani, Hyderabad Campus during December 5-7, 2012.
She was the Organizing
Committee Member for the Workshop on Advances in Image Processing and Applica-
tions held in BITS Pilani, Hyderabad Campus during October 26 - 27, 2013. She was
part of the Organizing Committee for the National Seminar on Indian Space Technology
- Present and Future (NSIST-2014) held at BITS Pilani Hyderabad Campus on 1st May,
2014.
Dr Geetha has given many guest lectures on topics in emerging areas such as cyber se-
curity, cloud computing and cloud security. She has been a member of the Technical
Program Committees of various IEEE International Conferences. An extract from the
paper ‘A taxonomy for modelling and analysis of diffusion of (mis)information in social
networks’, co-authored by Dr Geetha and published in the International Journal of Com-
munication Networks and Distributed Systems, Vol. 13, No. 2, 2014, pp.119-143, by
Inderscience Publishers, was selected for a press release on ‘Semantic attacks in online
social media’.
Presently, she is the Chief-Investigator for the project “Design and Development of Digital
Forensic Tools for Cloud IaaS” funded by DeitY, Govt. of India.