A Novel Digital Forensic Framework for
Cloud Computing Environment
THESIS
Submitted in partial fulfilment
of the requirements for the degree of
DOCTOR OF PHILOSOPHY
by
POVAR DIGAMBAR
ID. No. 2011PHXF401H
Under the supervision of
Dr. G. Geethakumari
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
2015
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
CERTIFICATE
This is to certify that the thesis entitled A Novel Digital Forensic Framework for Cloud
Computing Environment, submitted by Povar Digambar, ID No. 2011PHXF401H,
for the award of the Ph.D. degree of the Institute, embodies the original work done by him under my
supervision.
Signature of the Supervisor
Name in capital letters DR. G. GEETHAKUMARI
Designation Asst. Professor, Dept. of CSIS
Date:
Acknowledgements
Foremost, I would like to express my deepest thanks to my supervisor Dr. G. Geethakumari
for all her suggestions and constant support during this research. Her valuable guidance
and encouragement throughout the period were critical factors which contributed towards
completion of the work. Through her untiring efforts, she helped me to critically analyse
the problems in a systematic manner and consider innovative approaches to evolve
practical solutions.
I would also like to thank Prof. Chittaranjan Hota and Prof. Yoganandam, members
of my doctoral advisory committee for their constant review and invaluable suggestions
in steering the work. I would also like to express my gratitude to other members of the
faculty in the Department of Computer Science and Information Systems Prof. Gururaj,
Prof. Bhanu Murthy, Dr. Tathagata Ray, Dr. Aruna Malapati, Mr. KCS Murti, Mr.
Abhishek Thakur and Mr. Rakesh Prasanna for all their suggestions and encouragement
during various presentations and whenever I interacted with them.
My sincere gratitude to my fellow researchers Meera and Pavan for their continuous
support during all the stages of this work. Our numerous discussions and brainstorming
sessions helped me to analyse the problem from different perspectives to provide critical
insights. I would like to thank each of the other researchers in the department Agrima,
Jagan, Muthu, Prateek, Anita and Neha for all the wonderful time we shared during our
work.
I am also indebted to the members of the Resource Center for Cyber Forensics group
from CDAC, Trivandrum, with whom I have interacted during the course of my research.
Particularly, I would like to acknowledge K.L. Thomas (Assoc. Director), V.K. Bhadran
(Assoc. Director), C. Balan (Jt. Director), Dija (Dy. Director) and Nabeel Koya (Senior
Scientific Officer) for their invaluable suggestions and inputs during the practical aspects
carried out as part of this report.
Finally, my sincere acknowledgement of the sacrifices made and the support given by
each member of my family during this period. They were my pillars of strength, always
understanding and encouraging me. Without their support, this work would never have
been completed.
BITS Pilani, Hyderabad Campus Digambar Povar
October 16, 2015
Abstract
Cloud computing is a transformative computing model for businesses that delivers
computer-based services over the Internet. Despite the technological innovations that have
made it a feasible solution, cloud computing faces major concerns due to its architectural
characteristics. The huge popularity and utility of the cloud environment have made it a
soft target for cloud crimes. Investigating cloud crimes and fixing responsibility for the
cyber crimes committed on cloud platforms help instill confidence and trust in the
stakeholders, be they the clients, the cloud service providers or third-party entities. Cyber
crime investigation is incomplete without proper detection of the digital evidence in the
cloud. In general, cloud computing is characterized by its highly virtualized nature. While
virtualization provides many benefits, it also makes it difficult to detect digital evidence
in the cloud environment. The approach used in traditional digital forensics cannot be
directly applied to the cloud environment due to the presence of virtualization, and hence
a cloud crime investigation is more difficult to perform than a traditional physical
computer investigation. Existing research in cloud forensics has focused only on the
organizational and legal aspects, whereas our work aims to contribute towards the
technical aspects of forensics in the cloud.
The aim of this research is to design a generic digital forensic framework for cloud
crime investigation by identifying the challenges and requirements of forensics in the
virtualized environment of cloud computing; to address the issues of dead/live forensic
analysis within and outside the virtual machines that run in a cloud environment; and to
design a digital forensic triage using a parallel processing framework to examine and
partially analyze virtual machine data, thereby speeding up the investigation of cloud
crime. To analyze the evidence within the virtual machine, we designed various methods
of examining the file system metadata, the registry file content, and the physical memory
content. For the evidence that lies outside a virtual machine (cloud logs), various methods
of log data segregation and collection have been devised.
Table of Contents
Certificate i
Acknowledgements ii
Abstract iii
Table of Contents iv
1 Introduction 1
1.1 Digital Forensics . . . . . 2
1.1.1 Definition . . . . . 2
1.1.2 Digital forensic process . . . . . 3
1.2 Cloud Computing . . . . . 4
1.2.1 Definition . . . . . 5
1.2.2 Cloud Services, Deployment Models and Characteristics . . . . . 5
1.2.3 Cloud Crime . . . . . 8
1.2.4 Cloud Forensics . . . . . 9
1.2.5 Gaps in Existing Research . . . . . 9
1.3 Objectives of the Research . . . . . 11
1.4 Scope and Problem Definition . . . . . 12
1.5 Contributions of the Thesis . . . . . 12
1.6 Outline of the Thesis . . . . . 13
1.7 Summary . . . . . 15
2 Background and Related Work 16
2.1 General Terms in Digital Forensics . . . . . 16
2.1.1 Computer Crime . . . . . 16
2.1.2 Storage Media and File Systems . . . . . 18
2.1.3 Limits of Traditional Digital Forensic Tools . . . . . 18
2.2 Cybercrimes in Cloud Computing . . . . . 19
2.2.1 Sources of Digital Evidence . . . . . 19
2.2.2 Does the Cloud Deployment Model Play a Role? . . . . . 19
2.2.3 Role of Cloud Delivery Models in the Investigation . . . . . 21
2.2.4 Issues with Multi-layered Architecture . . . . . 22
2.3 Cloud Crime and Forensics - Review . . . . . 23
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3 Approaches to Forensics in Presence of Virtualization in Cloud 29
3.1 Introduction . . . . . 29
3.2 Challenges and Requirements of Forensics . . . . . 30
3.3 Detection of Virtual Environment . . . . . 31
3.3.1 Important files in virtual machine investigation . . . . . 32
3.3.2 Changes in the host OS when the virtual platform is used . . . . . 34
3.4 Detection of Virtual Machine Hidden Using ADS . . . . . 35
3.4.1 Role of Alternate Data Streams (ADSs) . . . . . 36
3.4.2 Approach to Hide and Detect a VM Hidden using ADS . . . . . 38
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4 Designing a Digital Forensic Framework for Cloud Computing Systems 45
4.1 Introduction . . . . . 45
4.2 Cloud Forensic Process and Phases . . . . . 46
4.2.1 Comparison of Digital Forensic Frameworks . . . . . 47
4.2.2 Identification of Digital Evidence . . . . . 48
4.2.3 Collection and Preservation of Digital Evidence . . . . . 49
4.2.4 Analysis of the Digital Evidence . . . . . 51
4.2.5 Reporting of Digital Evidence . . . . . 52
4.3 Heuristic Approach for Performing Digital Forensics in Cloud . . . . . 54
4.4 Digital Forensic Architecture for Cloud . . . . . 56
4.4.1 Cloud Infrastructure Setup . . . . . 57
4.4.2 Cloud Deployment (Cloud OS) . . . . . 57
4.4.3 Cloud Investigation and Auditing Tools . . . . . 59
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5 Digital Forensic Methods for Cloud Data Acquisition and Analysis 61
5.1 Introduction . . . . . 61
5.2 Digital Evidence Source Identification, Data Segregation and Acquisition . . . . . 62
5.2.1 Identification of the Evidence . . . . . 62
5.2.2 Segregation of the Evidence . . . . . 63
5.2.3 Acquisition of the Evidence . . . . . 65
5.3 Examination and Partial Analysis of the Evidence . . . . . 68
5.3.1 Within the Virtual Machine . . . . . 68
5.3.2 Boyer-Moore (BM) Algorithm . . . . . 76
5.3.3 Outside the Virtual Machine . . . . . 79
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6 Digital Forensic Triage in the Examination and Partial Analysis 80
6.1 Introduction . . . . . 80
6.2 Digital Forensic Triage . . . . . 81
6.2.1 Introduction to Triage and Background . . . . . 81
6.2.2 Parallel Processing Framework using Hadoop . . . . . 82
6.3 Real-time Digital Forensic Analysis Process . . . . . 83
6.3.1 Selection of the Pattern Matching Algorithm . . . . . 83
6.3.2 Proposed System Architecture . . . . . 84
6.3.3 Proposed System Implementation Details . . . . . 86
6.4 Results and Discussion . . . . . 91
6.5 Summary . . . . . 97
7 Conclusion and Future Scope 99
7.1 Summary of Deductions . . . . . 99
7.2 Future Scope of Work . . . . . 101
7.3 Concluding Remarks . . . . . 102
List of Publications 103
Bibliography 104
Glossary 113
List of Tables
2.1 Categories of computer crimes . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Challenges of data acquisition in private and public clouds . . . . . . . . 20
3.1 Files which make up a virtual machine . . . . . . . . . . . . . . . . . . . 33
3.2 Virtual disk file signatures . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.1 Comparison of digital forensic frameworks . . . . . . . . . . . . . . . . 48
4.2 Hardware configuration details of the private cloud (IaaS) . . . . . . . . . 57
4.3 Basic services of OpenStack cloud OS [31] . . . . . . . . . . . . . . . . 59
5.1 Details of the OpenStack cloud service logs [30] . . . . . . . . . . . . . . 64
5.2 Regular expressions used for corresponding patterns . . . . . . . . . . . . 75
6.1 Report of acquisition and indexing time using traditional digital forensic tools . . . . . 82
6.2 Execution time of Boyer-Moore and KMP algorithms with multiple keywords . . . . . 84
6.3 Hardware configuration of a node in Hadoop cluster . . . . . . . . . . . . 84
List of Figures
1.1 Digital forensic process [64] . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 CSP and cloud customer’s control over multiple layers in three service models . . . . . 21
2.2 Layers of the IaaS cloud environment and cumulative trust required by each layer [55] . . . . . 22
2.3 Types of hypervisors [86] . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1 Multi-level virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Changes in host OS files during VMware workstation installation . . . . . 35
3.3 MFT file record with sample attributes . . . . . . . . . . . . . . . . . . . 36
3.4 MFT file record with named attributes . . . . . . . . . . . . . . . . . . . 37
3.5 Hiding of virtual machine in a cloud hosting server . . . . . . . . . . . . 39
3.6 Launching a hidden virtual machine . . . . . . . . . . . . . . . . . . . . 39
3.7 Configuration file (.vmx) . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.8 Modified configuration file (.vmx) . . . . . . . . . . . . . . . . . . . . . 40
3.9 Hash value of vmtest.txt file before ADS attachment . . . . . . . . . . . . 40
3.10 Hash value of vmtest.txt file after ADS attachment . . . . . . . . . . . . . 41
3.11 Detection of hidden virtual machine . . . . . . . . . . . . . . . . . . . . 43
4.1 Phases of cyber crime investigation . . . . . . . . . . . . . . . . . . . . . 46
4.2 Daubert principles for digital forensic [8] . . . . . . . . . . . . . . . . . 52
4.3 Content of the chain of custody record . . . . . . . . . . . . . . . . . . . 53
4.4 Control flow diagram for digital forensic investigation in cloud . . . . . . 55
4.5 Digital forensic architecture for cloud . . . . . . . . . . . . . . . . . . . 56
4.6 Conceptual architecture of the private cloud IaaS . . . . . . . . . . . . . 58
5.1 Remote data acquisition in the private cloud data center . . . . . . . . . . 65
5.2 Directory of virtual machine instances in the OpenStack cloud . . . . . . 66
5.3 Virtual hard disk location in the OpenStack cloud . . . . . . . . . . . . . 66
5.4 Connecting to cloud hosting server that stores the shared table database . 67
5.5 Shared table with different attribute information . . . . . . . . . . . . . . 67
5.6 Virtual disk examination process . . . . . . . . . . . . . . . . . . . . . . 69
5.7 File system metadata extractor . . . . . . . . . . . . . . . . . . . . . . . 70
5.8 File system metadata extractor report . . . . . . . . . . . . . . . . . . . . 70
5.9 Cloud VM’s registry analyzer . . . . . . . . . . . . . . . . . . . . . . . . 71
5.10 Cloud VM’s registry analyzer report . . . . . . . . . . . . . . . . . . . . 72
5.11 Selective memory analysis . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.12 Selective memory analysis report . . . . . . . . . . . . . . . . . . . . . . 73
5.13 Selection of keyword option for searching . . . . . . . . . . . . . . . . . 73
5.14 Entering multiple keywords for search (indexing) . . . . . . . . . . . . . 74
5.15 Selection of RE option for searching . . . . . . . . . . . . . . . . . . . . 74
5.16 Selecting multiple patterns for search (indexing) . . . . . . . . . . . . . . 75
5.17 Memory analysis report (result of keywords or pattern matching search) . 76
6.1 MapReduce application framework to count distinct words of a file . . . . 83
6.2 Mapping of Hadoop framework components to forensic triage [23] . . . . 85
6.3 Proposed system for ‘real-time digital forensic partial analysis’ using MapReduce with KMP/BM search engine . . . . . 87
6.4 Default regular expressions to generate Mapper code . . . . . . . . . . . 90
6.5 Adding regular expression to generate Mapper code . . . . . . . . . . . . 90
6.6 Searching time of KMP based MapReduce with single keyword . . . . . 92
6.7 Searching time of KMP based MapReduce with multiple keywords . . . . 94
6.8 Searching time of RE based MapReduce with single pattern . . . . . . . . 95
6.9 Searching time of RE based MapReduce with multiple patterns . . . . . . 96
Chapter 1
Introduction
“I cannot teach anybody anything. I can only make them think.”
- Socrates
Over the past few years, cloud computing has revolutionized the methods by which
digital information is stored, transmitted, and processed. Cloud computing is not just
a hyped model but a technology embraced by Information Technology giants such as
Apple, Amazon, Microsoft, Google, Oracle, IBM, HP, and others. Cloud computing has
the potential to become one of the most transformative developments in the history of
computing, following in the footsteps of mainframes, minicomputers, PCs (Personal
Computers), smart phones, and so on [81].
Gartner estimates that there are currently about 50 million enterprise users of cloud
office systems, which represent only 8 percent of overall office system users (excluding
China and India). Gartner, however, predicts that a major shift toward cloud office systems
will begin by the first half of 2015 and reach 33 percent penetration by 2017 and 60
percent by 2020 (Gartner, 2013). According to an IDC IT Cloud Services User Survey,
74 percent of IT executives and CIOs have cited security as the top challenge preventing
their adoption of the cloud services model [6].
The eighth annual Worldwide Infrastructure Security Report (2013) from security
provider Arbor Networks has revealed how cloud services and data centres are "increas-
ingly victimised" by cyber attackers. Some recent attacks on cloud computing platforms
strengthen this security concern. Due to their characteristics, cloud services are more
vulnerable to Denial of Service (DoS) attacks, which may cause extreme damage. For
example, a botnet attack (running of the "Zeus botnet controller" on an EC2 instance) on
Amazon's cloud infrastructure was reported in 2009 [46]. This implies that an adversary
can rent any number of virtual machines (VMs) to launch a Distributed Denial of Service
(DDoS) attack on other systems, including the infrastructure on which the VMs themselves
are running. Also, due to the remote storage facility provided by cloud platforms such as
Google Drive, Dropbox, SpiderOak, Amazon Cloud Drive, Microsoft SkyDrive, Ubuntu
One, Apple iCloud, etc., cyber criminals can keep their secret files (e.g., pornographic
pictures, forged documents, etc.) in cloud storage and destroy all digital evidence on their
local storage to remain undetected during an investigation. To investigate these kinds of
cybercrimes involving cloud computing platforms, investigators have to carry out digital
forensic investigation on the suspected client device as well as in the cloud computing
environment.
Cyber crime is a form of crime in which the Internet or a computer is used as a medium
to commit the crime [92].
1.1 Digital Forensics
As noted by Ben Martini and Kim-Kwang Raymond Choo [69], digital forensics is a
relatively new sub-discipline among the common forensic science disciplines. Digital
forensics has a number of synonyms, including computer forensics, cyber forensics,
computational forensics and forensic computing.
1.1.1 Definition
One of the first definitions of digital forensics was provided by McKemmish in 1999 as,
“The process of identifying, preserving, analyzing and presenting digital evidence in a
manner that is legally acceptable by court of law”[71].
Another widely adopted definition introduced by the inaugural DFRWS (Digital Forensic
Research Workshop, August 7-8, 2001, Utica, New York) is given as,
“The use of scientifically derived and proven methods toward the preservation, collec-
tion, validation, identification, analysis, interpretation, documentation and presentation
of digital evidence derived from digital sources for the purpose of facilitating or furthering
the reconstruction of events found to be criminal, or helping to anticipate unauthorized
actions shown to be disruptive to planned operations”.
-DFRWS, 2001
Today, the digital forensic community uses the definition provided by NIST [64], which
shares some similarities with those of McKemmish and the DFRWS across the four
phases given below.
1. Collection phase discusses identifying relevant data, preserving its integrity and
acquiring the data;
2. Examination phase uses automated and manual tools to extract data of interest
while ensuring preservation;
3. Analysis phase is concerned with deriving useful information from the results of
the examination; and
4. Reporting phase is concerned with the preparation and presentation of the forensic
analysis results.
Thus we can say that digital forensics deals with the forensic analysis of cyber crimes;
its role is to systematically gather digital evidence, analyze it to establish credible
evidence, and authentically present it to a court of law.
1.1.2 Digital forensic process
The most common goal of performing forensics is to gain a better understanding of an
event of interest by finding and analyzing the facts related to that event [64]. The basic
phases required for a forensic process are collection, examination, analysis and reporting,
as shown in the following figure. The digital forensic process transforms the content of
a storage medium into evidence. This transformation has three stages.
Figure 1.1: Digital forensic process [64]
Stage 1:
Collection - examination: where digital data is collected in a format that can be
understood by various forensic tools.
Stage 2:
Examination - analysis: where the relevant pieces of information are extracted from the
collected data.
Stage 3:
Analysis - reporting: where, by using various analysis methods, the forensic investigator
processes and analyzes the data to draw conclusions relevant to the case under
investigation.
The arrow from the reporting phase back to the collection phase denotes that the reported
evidence is repeatable and reproducible.
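The preservation requirement running through these stages can be illustrated with a short sketch (the file paths and the choice of SHA-256 are illustrative, not part of any specific toolset): an acquired image is hashed once at collection time, and any later phase re-hashes it to confirm the evidence is still bit-for-bit identical, which is what makes the reported results repeatable.

```python
import hashlib

def sha256_of(path, block_size=1 << 20):
    """Stream a (possibly large) evidence image and return its SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for block in iter(lambda: image.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify_integrity(path, recorded_digest):
    """Re-hash the image and compare against the digest recorded at collection."""
    return sha256_of(path) == recorded_digest
```

The digest computed at collection time would be entered in the chain-of-custody record; each later examination or analysis pass calls `verify_integrity` first, and a mismatch means the data can no longer be presented as the evidence originally collected.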
1.2 Cloud Computing
Cloud computing is a relatively new business model, following grid computing, that
makes computer resources available as a service to end users over a network. Various
definitions and interpretations of the term “cloud computing” exist in the world commu-
nity of users. Vaquero et al. [90] reviewed more than 20 cloud computing definitions and
noticed that the key terms mandatory in a minimal definition are scalability, a pay-per-use
utility model and virtualization. The most widely used definition of cloud computing was
provided by Mell and Grance in the NIST special publication [72].
1.2.1 Definition
“a model for enabling ubiquitous, convenient, on-demand network access to a shared
pool of configurable computing resources (e.g., networks, servers, storage, applications
and services) that can be rapidly provisioned and released with minimal management
effort or service provider interaction”.
- NIST [72]
This was the final definition released by NIST in 2011 after 15 versions of working defi-
nitions. This definition was also adopted by the Australian Government for Information
and Communication Technology (ICT) services [7]. A few researchers have suggested it
as the “de facto standard” [60]. According to NIST, the working of cloud computing is
based on a 3-4-5 rule: three unique service models, four unique deployment models and
five unique characteristics.
1.2.2 Cloud Services, Deployment Models and Characteristics
Cloud Services
The three services, named according to the abstraction level of the capability provided
and the service models of the providers, are [91]:
1. Infrastructure-as-a-Service (IaaS)
2. Platform-as-a-Service (PaaS)
3. Software-as-a-Service (SaaS)
Infrastructure-as-a-Service (IaaS):
This service model provides the user the facility of renting processing power and storage
to run his or her own virtual machine in the cloud. A user can access the launched virtual
machine through a thin client interface such as a web browser running on devices like
computers, mobiles, PDAs (Personal Digital Assistants), etc. Users are charged based
on the resources their virtual machines consume from the cloud. Amazon, through its
AWS (Amazon Web Services) console, provides IaaS using its EC2 (Elastic Compute
Cloud) facility [4]. Many other vendors like Microsoft, Rackspace, GoGrid, Terremark,
etc., provide the same facility. There are also well-known open source cloud platforms
available for this purpose, like Eucalyptus [13], OpenNebula [28], OpenStack [29], etc.
Platform-as-a-Service (PaaS):
Through this model, the cloud owner provides the user the facility of renting a platform
to develop and deploy user applications in the cloud environment. It is basically applica-
tion middleware offered as a service to developers, integrators, and architects [86]. Users
are charged according to the platform used (e.g., Database, .Net, etc.) and the bandwidth
consumed. The well-known example of PaaS is Google App Engine [19]. There are a
number of other PaaS providers like Windows Azure, Force.com, Drupal, Wolf Frame-
works, Cloud Foundry, IBM Bluemix, Eccentex, AppBase, LongJump, SquareSpace,
WaveMaker, Heroku, GitHub, etc., to name a few.
Software-as-a-Service (SaaS):
Using this model, the user can make use of the cloud service provider’s software appli-
cations running on the cloud infrastructure [86]. The user can access the application
through a thin client interface such as a web browser from various devices like comput-
ers, mobiles, PDAs (Personal Digital Assistants), etc. Users are charged based on their
usage of the application. Examples of SaaS include applications like Salesforce.com,
QuickBooks, GoToMeeting, Zoho Office Suite, Microsoft Office 365, Google Docs,
Google Calendar, Facebook, LinkedIn, SlideShare, etc., to name a few.
Deployment Models
Based on the deployment model, cloud computing can be categorized into four categories
[72]:
1. Private Cloud
2. Public Cloud
3. Community Cloud
4. Hybrid Cloud
Private Cloud:
In this model, the cloud infrastructure is fully operated by the owner organization. It is
an internal data center in which the infrastructure is located on the organization’s
premises. One can set up this kind of cloud computing environment using solutions like
OpenStack, Eucalyptus, OpenNebula, VMware [40], etc.
Public Cloud:
The cloud service provider (CSP) owns the cloud infrastructure and makes it available
to the general public or a large industry group. Amazon, Microsoft and Google are the
major public cloud service providers in the current IT industry.
Community Cloud:
This is similar to the grid computing model in which several organizations with common
concerns (e.g., mission, security requirements, policy, and compliance considerations)
share the cloud infrastructure. Different private cloud data centers can be connected to
form this kind of computing model. Public cloud service providers like Amazon,
Microsoft, etc., can deploy this kind of cloud platform based on user requirements.
Hybrid Cloud:
This model is a composition of two or more clouds (private, community, or public).
Hybrid cloud architecture requires both on-premises resources and off-site (remote)
server-based cloud infrastructure. Eucalyptus, VMware, etc., are examples of hybrid
cloud deployment solutions.
Characteristics
Five unique characteristics of cloud computing according to NIST (National Institute of
Standards and Technology) are [72]:
1. On-demand Self-service
2. Broad Network Access
3. Resource Pooling
4. Rapid Elasticity
5. Measured Service
On-demand Self-service: A user of a cloud can provision computer resources without
the need for interaction with cloud service provider personnel. For example, one can
log on to Amazon EC2 and obtain virtual resources such as server, storage, memory and
network within minutes [86].
Broad Network Access: Ubiquitous access to virtual resources in the cloud, i.e., access
to resources in the cloud is available over the network using standard methods, in a
manner that provides platform-independent access to clients of all types.
Resource Pooling: A cloud service provider creates resources that are pooled together in
a system that supports multi-tenant usage. Physical and virtual systems are dynamically
allocated or reallocated as needed.
Rapid Elasticity: Resources can be rapidly and elastically provisioned. The system can
add resources by either scaling up systems (more powerful computers) or scaling out
systems (more computers of the same kind), and scaling may be automatic or manual.
From the standpoint of the client, cloud computing resources should look limitless and
can be purchased at any time and in any quantity.
Measured Service: The use of cloud system resources is measured, audited, and reported
to the customer based on a metered system.
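As a toy illustration of the pay-per-use accounting behind Measured Service, the sketch below bills metered resource consumption against per-unit rates; the resource names and rates are invented for the example and do not reflect any real provider’s price list.

```python
# Hypothetical per-unit rates; real providers publish their own price lists.
RATES = {"vcpu_hours": 0.02, "ram_gb_hours": 0.005, "storage_gb_hours": 0.0001}

def metered_charge(usage):
    """Bill the customer for exactly the resources their VM consumed.

    `usage` maps each metered resource to the quantity consumed, e.g.
    {"vcpu_hours": 48, "ram_gb_hours": 96, "storage_gb_hours": 2400}
    for a 2-vCPU, 4 GB VM with a 100 GB disk left running for 24 hours.
    """
    return sum(RATES[resource] * amount for resource, amount in usage.items())

bill = metered_charge({"vcpu_hours": 48, "ram_gb_hours": 96, "storage_gb_hours": 2400})
```

The point of the sketch is that the customer pays only for measured, audited consumption rather than for provisioned capacity, which is what distinguishes this characteristic from traditional hosting.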
1.2.3 Cloud Crime
Ruan et al. [82] have extended the definition of cyber crime (or computer crime) to cloud
crime as,
“a crime that involves cloud computing in a sense that the cloud can be the object, subject
or tool of crimes (object - CSP is the target of the crime; subject - cloud is the environment
where the crime is committed; tool - cloud can also be the tool used to conduct or plan a
crime)”.
Cyber criminals may use DDoS (Distributed Denial of Service) attacks to target the CSP
(cloud service provider), use the cloud environment to commit a crime such as identity
theft of a cloud user or illegal access to data residing in the cloud, or use the cloud as a
platform to store crime-related data and share it with others.
1.2.4 Cloud Forensics
As there is no unique definition available for cloud computing, it is too early to expect
a definition of an emerging area like cloud forensics. According to Ruan et al. [82],
cloud computing is based on broad network access, and network forensics deals with the
forensic investigation of networks; cloud forensics can therefore be seen as a subset of
network forensics. They also view it as a cross-discipline of cloud computing and digital
forensics.
Shams et al. [93] define cloud forensics as “the application of computer forensic prin-
ciples and procedures in a cloud computing environment”.
We define cloud forensics as “the process of applying the various digital forensic phases
in a cloud platform depending on the deployment model of the cloud”. For example, the
digital forensic process used for a private cloud may differ from that for a public cloud
environment.
1.2.5 Gaps in Existing Research
Cloud computing has completely changed the way digital data is stored, transmitted
and processed. With such a paradigm shift from desktop systems to geographically dis-
tributed networks of servers, many technological and legal challenges may be encoun-
tered when we intend to perform digital forensics in different types of cloud platforms.
In the past few years, many researchers have contributed to identifying the forensic chal-
lenges and designing forensic frameworks and data acquisition methods for cloud com-
puting systems. Though all these works identify the technical, organizational and legal
challenges of cloud forensic analysis, no concrete solutions have been proposed that ad-
dress the challenges of applying forensics to the cloud environment in general and that
are acceptable to the forensic investigators or LEAs (Law Enforcement Agencies) of this
digital space.
The related research done so far largely studies the issues in the cloud forensic arena,
with a few specific contributions like FROST (Digital Forensic Tools for the OpenStack
Cloud Computing Platform) [56] as exceptions. Therefore, there is a real requirement to
undertake forensic research in the cloud on a large scale. The major challenges of cloud
forensics originate from the very characteristics by which the cloud computing platform
is identified. Accordingly, we list a few gaps which demand the immediate attention of
cloud researchers for practical solutions to cloud forensics.
1. Absence of uniform standards and protocols - This leads to technical difficulties in
forensic data collection and analysis
2. Investigation in virtualized environment is a real challenge - as virtualization is a
key technology used to implement cloud services
3. Multi-tenancy and multi-jurisdiction - these result in legal concerns with respect to
cloud forensics
4. Evidence segregation is a big issue - due to the “resource pooling” characteristic
of the cloud
5. Partial forensic examination - Absence of tools for pre-processing of virtual disks
and memory to help in completing the investigation process
6. Absence of digital forensic triage in cloud data analysis - no use of parallel processing
techniques to index virtual disk data and speed up the investigation process
7. Lack of interoperability between cloud providers - there is no interoperability as such
among cloud providers
8. Lack of transparency - the operational details of cloud data centers are not transpar-
ent enough to cloud investigators
9. Maintaining chain of custody - due to the multi-layered and distributed architecture
of cloud, the chain of custody of data may be difficult to verify
10. Loss of data control - cloud user or investigator will have little or no control (or
knowledge) over the physical locations of the digital evidence
11. Virtual machine data is not persistent - if a virtual machine is terminated, there is
no procedure available for recovering its data
12. Identification of evidence - sources of evidence pertain to different cloud platforms
and hence no unique method for identification
13. Live forensics - live data acquisition will have data integrity or preservation prob-
lems
14. Imaging the data center of the cloud - complete data center imaging of the cloud is
not possible and partial imaging may have legal implications
15. Selective data acquisition - requires a good amount of prior knowledge about the
cloud platform
16. Layers of trust - because cloud has a multi-layered architecture, trust is required at
various layers to maintain the integrity of the evidence
17. Reliance on cloud providers - for data acquisition, the investigator has to exclusively
depend on the cloud provider
18. Outsourcing of services to third parties - this widens the scope of the investigation,
and the forensic activity needs to be carried out as a joint effort
19. Absence of cloud forensic SLAs - There are no well-framed SLAs (Service Level
Agreements) for performing forensics in cloud
1.3 Objectives of the Research
The objectives of this research work include the following:
1. Explore the challenges and requirements of forensics in the virtualized environment
of cloud computing
2. Design a digital forensic framework for the cloud computing systems from the view
point of investigator and/or cloud architecture
3. Address the issues of dead/live forensic analysis within/outside the virtual machine
that runs in a cloud environment
4. Use digital forensic triage in the examination and partial analysis phases of cloud
forensics
1.4 Scope and Problem Definition
Cloud computing is maturing and continues to be the most hyped concept in the infor-
mation technology industry, and it evokes different perceptions in different people. These
developments may create problems for law enforcement agencies (LEAs) throughout the
world that are actively involved in cyber crime investigation, especially in the cloud. The
work "A Novel Digital Forensic Framework for Cloud Computing Environment" helps an
investigator or a Cloud Service Provider get an overall idea of performing digital forensic
investigation in a cloud computing environment. The digital forensic methods suggested
in this research can scale to cloud data for handling the analysis of cloud crimes. The
proposed methods of partial analysis help the forensic investigator minimize the overall
processing time of a cloud crime under investigation. The digital forensic research
community, which is actively involved in the design and development of cyber forensic
tools for cloud computing systems, could consider the cloud forensic architecture presented
in this work as a reference model. In brief, the work presented in this thesis can be a way
forward to combat cyber crimes in cloud computing systems.
1.5 Contributions of the Thesis
The contributions of this work can be organized into four aspects. The first aspect deals
with identifying the challenges and requirements of forensics in the virtualized environ-
ment that is omnipresent in the cloud. More specifically, the contribution is restricted to
detecting the virtual environment in multi-level virtualization, identifying the forensically
relevant files which are generated when virtual systems are used inside the virtual
machines that are part of the cloud environment, and devising a method to detect virtual
machines hidden using alternate data streams in such virtual systems.
In the second aspect, we designed a digital forensic process for cloud computing sys-
tems from the viewpoint of the investigator and/or the cloud architecture. The digital
forensic process that we designed for investigators has a built-in digital forensic
framework which contains the required phases of digital forensics for cloud computing
systems. We also compared our proposed digital forensic framework with the existing
traditional forensic frameworks. A generic digital forensic architecture is designed for
cloud computing systems to understand the challenges that may come up in designing
new digital forensic tools for cloud platforms.
In the third aspect, we addressed the issues of forensic acquisition and analysis of
evidence within and/or outside the virtual machine that runs in a cloud environment. In
particular, we have designed the digital forensic methods for cloud data acquisition and
analysis. For the examination and partial analysis of the evidential artifacts of a virtual
machine, we have designed and implemented tools to collect actionable evidence depend-
ing on the nature of the cloud crime. We have used the Boyer-Moore pattern matching
algorithm to report the running status of a virtual machine by extracting the physical
memory artifacts of that virtual machine. To analyze the cloud logs (outside the virtual
machine), we designed and implemented tools to segregate and collect important logs
pertaining to the virtual instances.
For large-scale data examination, we have designed and implemented a digital forensic
triage using parallel processing framework to find the evidence of interest to the investiga-
tor in real time. This forms the fourth aspect of the thesis. The approach uses MapReduce
with inbuilt KMP (Knuth-Morris-Pratt) and Boyer-Moore string search algorithms on the
distributed computing platform Hadoop to search for user specified keywords. The facil-
ity of regular expression search is also provided in this framework.
1.6 Outline of the Thesis
The thesis is organized into seven chapters. In Chapter 1, we provide an introduc-
tion to cloud computing and digital forensics. In Chapter 2, we discuss the background
and related work. To begin with, we define a few general terms in digital forensics and
discuss different storage media and file systems. For cloud crime investigation, the
limitations of the traditional digital forensic tools, the sources of digital evidence, the
role of deployment models, the role of delivery models, and the issues with the multi-
layered architecture are discussed in detail. Finally, we provide an extensive research
review on cloud crime and forensics.
Chapter 3 deals with identifying the challenges and requirements of performing foren-
sic activity in the presence of virtualization in the cloud environment. When virtualization
software runs inside a virtual machine to create another virtual layer in the cloud (second-
level virtualization), such virtualization can be detected by identifying the relevant files
and the changes in the host OS of the virtual machine; this detection is discussed. A
technique for detecting virtual machines hidden using alternate data streams (ADS) for
malicious purposes is also discussed in detail.
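As a sketch of the kind of ADS check involved: on NTFS, the Windows command `dir /r` lists alternate data streams as `<file>:<stream>:$DATA`, so a saved listing can be scanned for stream names that carry virtual disk extensions. The regex, extension list, and listing layout below are illustrative assumptions (file names containing spaces are not handled), not the detection technique of Chapter 3.

```python
import re

# `dir /r` on Windows prints NTFS alternate data streams (ADS) as
# "<size> <file>:<stream>:$DATA". A VM disk hidden in an ADS shows up
# as a stream whose name carries a virtual disk extension.
VM_EXTS = (".vdi", ".vmdk", ".vhd", ".qcow2", ".img")
STREAM_RE = re.compile(r"^\s*[\d,]+\s+(\S+?):([^:]+):\$DATA\s*$")

def suspicious_streams(dir_r_output):
    """Return (host_file, stream_name) pairs whose ADS name looks like a VM disk."""
    hits = []
    for line in dir_r_output.splitlines():
        m = STREAM_RE.match(line)
        if m and m.group(2).lower().endswith(VM_EXTS):
            hits.append((m.group(1), m.group(2)))
    return hits
```

Ordinary directory listings never show these streams, which is what makes ADS attractive for hiding VM images in the first place.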
To help an investigator understand the complete digital forensic process for investigating
cloud crime, we designed a digital forensic framework for the cloud, which is discussed
in Chapter 4. We describe the cloud forensic process by elaborating on each of its phases,
such as identification, collection and preservation, analysis, and reporting, in detail.
Also, we compare our proposed framework with the existing digital forensic frameworks.
Having identified the different phases of cloud forensics, we designed a control flow
diagram of the digital forensic process for cloud computing systems, which provides a
detailed view of how to perform a cloud crime investigation of the client device as well
as the cloud data center. Finally, we provide a generic architecture of cloud forensics
which includes an IaaS (Infrastructure as a Service) cloud test bed using OpenStack for
experimental purposes.
In Chapter 5, the digital forensic methods for data acquisition and analysis in the
cloud environment are provided. In particular, the methods for examination and partial
analysis of the data within the virtual machine, such as examination of the file system
metadata, the registry files, and the physical memory, are described in detail. Also, the
methods of data segregation and acquisition with respect to the cloud logs are discussed.
Finally, the implementation details of the Boyer-Moore pattern matching algorithm, which
is used to search multiple keywords in a memory image, are provided.
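As an illustration of the approach, a minimal Boyer-Moore search (bad-character rule only) applied to multi-keyword search over a raw memory image can be sketched as follows; this is a simplified stand-in under stated assumptions, not the implementation detailed in Chapter 5.

```python
def bm_search(data, pattern):
    """Boyer-Moore search (bad-character rule only): offsets of pattern in data."""
    m, n = len(pattern), len(data)
    if m == 0 or m > n:
        return []
    # rightmost position of each byte value within the pattern
    last = {b: i for i, b in enumerate(pattern)}
    hits, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and data[s + j] == pattern[j]:
            j -= 1          # compare right to left
        if j < 0:
            hits.append(s)  # full match at shift s
            s += 1
        else:
            # skip ahead based on where the mismatched byte last occurs
            s += max(1, j - last.get(data[s + j], -1))
    return hits

def search_keywords(image_bytes, keywords):
    """Map each keyword to its byte offsets in a (memory) image."""
    return {kw: bm_search(image_bytes, kw) for kw in keywords}
```

The bad-character shift grows with pattern length, which is why Boyer-Moore suits scanning large memory images for comparatively long keywords.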
The approach of digital forensic triage in the examination and partial analysis
of the cloud data is discussed in Chapter 6. To begin with, the requirement of
digital forensic triage in the cloud and the parallel processing framework using Hadoop are
explained in detail. Then, the complete real-time digital forensic analysis process, with
emphasis on the selection of a pattern matching algorithm, the proposed system architec-
ture, and the implementation details, is provided. Finally, the searching capability of the
KMP (Knuth-Morris-Pratt) based MapReduce over single and multiple keywords, and RE
(regular expression) based MapReduce over single and multiple patterns, on a multi-node
Hadoop cluster is summarized.
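As a rough illustration of how a KMP-based map step could emit keyword hits from input splits, the sketch below counts occurrences per line; the function names and the (keyword, count) output shape are illustrative assumptions rather than the MapReduce implementation described in Chapter 6.

```python
def kmp_failure(pattern):
    """Failure (prefix) function for KMP."""
    fail, k = [0] * len(pattern), 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_count(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text in O(n + m)."""
    if not pattern:
        return 0
    fail, k, count = kmp_failure(pattern), 0, 0
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]     # fall back along the failure function
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            count += 1
            k = fail[k - 1]     # allow overlapping matches
    return count

def mapper(lines, keywords):
    """Hadoop-streaming-style map step: emit (keyword, count) per input line."""
    for line in lines:
        for kw in keywords:
            c = kmp_count(line, kw)
            if c:
                yield kw, c
```

In a streaming job, a reducer would then sum the counts per keyword across all splits.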
1.7 Summary
The concepts of digital forensics and cloud computing are not new. In the last few years,
network administrators and technology developers have represented the Internet as a cloud.
Digital forensics started in the late 1970s and grew as a field during the 1980s and 1990s;
many tools for performing digital forensics were developed between 1980 and 2015.
In this introductory chapter, we have introduced the concepts of cyber crime, digital
forensics, cloud computing, cloud crime and cloud forensics, taking inputs from various
researchers in the fields of digital forensics and cloud computing. We have identified gaps
in the existing research and defined the scope and the problem definition. Finally, we
described the contributions made and gave the outline of the thesis. In Chapter 2, we
elaborate on the background and the related work with respect to this research.
Chapter 2
Background and Related Work
“A little knowledge is a dangerous thing. So is a lot.”
- Albert Einstein
2.1 General Terms in Digital Forensics
2.1.1 Computer Crime
In Chapter 1, we discussed various terms related to digital forensics and cloud com-
puting. Having identified the gaps in the existing research in the area of cloud forensics,
we now provide more information on its background and related research.
Computer crime, also called cyber crime, is an "unlawful act wherein the computer
is either a tool or a target or both" [78]. A computer may be used as a tool to commit a
crime (for example: child pornography, threatening e-mail, spam, phishing, etc.), or the
computer itself may become the target of a crime (for example: viruses, worms, software
piracy, hacking, etc.). Computer crimes fall into three categories, as shown in
Table 2.1.
Table 2.1.
Digital Evidence:
Digital evidence can be defined as "the digital data which can establish that a crime has
been committed or can provide a link between a crime and its victim or a crime and its
perpetrator" [52]. So, digital evidence is a means for the investigation and analysis of
cyber crimes to bring the culprits to conviction. Digital evidence may exist in the form of
a text, audio, image, video, or raw file (a binary file of 0's and 1's).
Table 2.1: Categories of computer crimes
Against Organizations: Hacking, DOS (Denial of Service), Virus/Worm/Trojan/Spyware
Attacks, IPR Violations, Stealing Trade Secrets, Website Defacement, etc.
Against People: Phishing, Identity Theft, E-mail Hijacking, Defamation, Internet Fraud,
Pornography, Distribution of Pirated Software, etc.
Against Country: Cyber Terrorism, Cyber Attacks, etc.
Seizure:
Seizure is the process of taking custody of the suspect computer for evidence collection.
A systematic procedure is needed for seizure to avoid loss of digital evidence. Seizure
may not be possible in the cloud environment due to geographically dispersed servers
(public cloud) and the multi-tenant nature of the cloud. Multi-tenancy is a technology by
which multiple organizations or users share the computing resources of a physical server.
Acquisition:
Acquisition is the process of recording the physical scene and duplicating digital evidence
using standardized and accepted procedures. This process is known as imaging (bit-by-bit
copying) of digital storage media. Full imaging also may not be possible in the cloud
environment due to its virtualized nature. Rather, selective remote data collection is
possible for cloud crime analysis [55].
Authentication:
Authentication means validating the seized and acquired evidence to make sure that its
integrity has not been compromised. Investigators generally use hashing algorithms (MD5 -
Message Digest, or SHA - Secure Hash Algorithm) to compute a checksum that is used to
verify the evidence integrity.
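A minimal sketch of such an integrity check, using Python's standard hashlib module (the function name and chunk size are illustrative choices):

```python
import hashlib

def evidence_digests(path, chunk_size=1 << 20):
    """Compute MD5 and SHA-256 of an evidence image, reading in chunks
    so that multi-gigabyte images need not fit in memory."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Recomputing the digests at any later stage of the investigation and comparing them with the values recorded at acquisition time demonstrates that the evidence is unchanged.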
2.1.2 Storage Media and File Systems
Digital data created for any purpose must be stored in a proper format so that it is easily
accessible for further processing. Any data which is stored in the form of 0's and 1's is
termed digital data. In the digital/computing environment, many devices are designed
for storing such data. Today, data is generally stored in three different ways: electro-
magnetism (magnetic disks - hard disks), microscopic electrical transistors (flash mem-
ory - USB drives, solid state drives, etc.), and reflected light (optical storage - CDs, DVDs,
etc.) [83]. A file is a named collection of data. A file system is a data structure which
allows this data to be stored in a systematic manner. The file system keeps track of the
free space as well as the location of each file on the storage. The free space is also called
unallocated space; it is either empty or contains files which were deleted previously.
File systems are of different types. Windows operating systems use FAT (File Al-
location Table, with versions such as FAT12, FAT16, FAT32 and exFAT) and NTFS
(New Technology File System). The Mac operating system used in Apple products uses file
systems like HFS (Hierarchical File System) and HFS+. File systems used by open source
operating systems include ext2, ext3, ext4, etc. Distributed systems like cloud computing
make use of GFS (Google File System), HDFS (Hadoop Distributed File System), etc.
2.1.3 Limits of Traditional Digital Forensic Tools
The FBI (Federal Bureau of Investigation) reports that the average amount of data per case
grew 6.65 times during 2003-2011. In reality, however, the capability of digital forensic
tools has not grown appreciably to handle this data growth rate [80]. There are
numerous digital forensic tools that generate a timeline view based on file system metadata,
like EnCase [12], TSK (The Sleuth Kit) [37], CyberCheck [9], etc. The limitation of these
tools is that they depend on file systems and do not use the content of individual files.
Data segregation is not required in traditional digital forensics and hence there is no
segregation phase. The log formats of desktop or server operating systems are not similar
to the logs of a cloud operating system. For cloud log analysis, data segregation and
collection are required.
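As a sketch of what such segregation can look like: OpenStack compute logs tag entries with an "[instance: &lt;uuid&gt;]" marker, so lines can be grouped by the VM they refer to. The exact log layout varies across cloud platforms and releases, so the pattern below is an assumption for illustration.

```python
import re
from collections import defaultdict

# Assumed marker format: "[instance: <36-char uuid>]" as used in
# OpenStack compute logs; other cloud platforms tag entries differently.
INSTANCE_RE = re.compile(r"\[instance: ([0-9a-f-]{36})\]")

def segregate_by_instance(log_lines):
    """Group log lines by the VM (instance UUID) they refer to."""
    per_vm = defaultdict(list)
    for line in log_lines:
        m = INSTANCE_RE.search(line)
        if m:
            per_vm[m.group(1)].append(line)
    return dict(per_vm)
```

An investigator interested in one tenant's VM can then collect only that VM's entries, instead of acquiring the data center's entire log store.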
2.2 Cybercrimes in Cloud Computing
As surveyed and reported by RSA (2012), McAfee (2013), Norton (2013), etc., cybercrime
will pose many challenges to digital forensics in the near future, including threats
due to virtualization and cloud computing among others [48]. Incident response and
computer forensics in a cloud computing environment require fundamentally different
tools, techniques, and training [55]. A draft report from the National Institute of Standards
and Technology noted that "little guidance exists on how to acquire and conduct forensics
in a cloud platform" and suggested that the existing best practices and guidelines still
apply to digital forensics in the cloud computing environment [22].
2.2.1 Sources of Digital Evidence
Any information in a traditional desktop system is stored as files, including data re-
lated to system activity. Depending on the nature of the computer crime, the files from
the storage are retrieved and parsed to investigate the cause. Similar to a desktop ma-
chine, a cloud user can create and run virtual machines (VMs) in the cloud environment.
Such a VM is as good as a physical machine and creates lots of data in the cloud through
its activity and management. The data created by a VM includes the virtual disk, virtual
physical memory, and logs (VM logs, cloud service logs, firewall logs). The virtual physical
memory is the memory space seen by the VM's operating system, not by the cloud. Virtual
disk formats that different cloud providers may support include .qcow2, .vhd, .vdi, .vmdk,
.img, etc. Every cloud provider has its own mechanism for storing service logs (activity
maintenance information) and hence there is no interoperability in log formats among
cloud providers.
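Most of these virtual disk formats can be told apart by their header magic bytes, which an examiner can check before choosing a parsing tool. The signatures below come from the respective format specifications; this is a simplified sketch (for example, a fixed-size VHD carries its "conectix" cookie only in the footer, and descriptor-style VMDKs are plain text, so neither would be identified here).

```python
# First-bytes signatures of common virtual disk formats (simplified:
# only offset-0 magic is checked).
MAGIC = {
    b"QFI\xfb": "qcow2",    # QEMU copy-on-write v2/v3
    b"KDMV": "vmdk",        # VMware sparse extent
    b"<<< ": "vdi",         # VirtualBox: "<<< ... Disk Image >>>" banner
    b"conectix": "vhd",     # VHD footer cookie, copied to offset 0 in dynamic disks
}

def identify_virtual_disk(path):
    """Best-effort identification of a virtual disk image by header magic."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, fmt in MAGIC.items():
        if head.startswith(magic):
            return fmt
    return "unknown"
```

A raw .img file has no magic at all, which is one reason format identification in the cloud remains best-effort.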
2.2.2 Does the Cloud Deployment Model Play a Role?
Among the four deployment models defined by NIST [72], two most popular models are
private and public clouds. An appropriate digital forensic architecture designed for these
models can also be used for the remaining models (community and hybrid clouds).
It may be impossible to seize some or all of the servers physically in a cloud data center
because the servers are geographically dispersed (possibly in multiple jurisdictions) or
contain multi-tenant data (seizure would violate the privacy of other tenants). Cloud
forensics mainly differs from traditional digital forensics in the data acquisition phase.
The rest of the phases are similar, except for the data segregation of logs in the cloud
environment, which helps in the analysis. Also, several researchers have pointed out that
the acquisition of data in the cloud is a forefront issue while investigating cloud based
crimes [55, 82, 88].
As pointed out by Ben Martini and Raymond Choo, the approach used by the digital
forensic investigator in acquiring digital evidence will certainly depend on the cloud de-
ployment model used [69]. The following table lists the challenges an investigator may
face during data acquisition in private and public clouds.
Table 2.2: Challenges of data acquisition in private and public clouds
Private Cloud: Law Enforcement (LE), with the help of the CSP, may acquire data
related to the crime, such as the VM's virtual disk file, cloud service logs, firewall logs,
etc., of a particular IP address belonging to the incident, using remote acquisition
methods. Acquisition is comparatively easy because no jurisdiction issues are involved
and there is no loss of control.
Public Cloud: Law Enforcement will have to issue a search warrant to the CSP for
acquiring the data related to the crime of a particular IP address belonging to the
incident. A technician at the CSP acquires the required data using the same methods as
in a private cloud (because the technician has access to the cloud) and submits it to LE.
LE has to trust the technician of the CSP and his capabilities in using sound methods of
forensic data acquisition.
2.2.3 Role of Cloud Delivery Models in the Investigation
Each of the four deployment models defined by NIST can deliver software, platform and in-
frastructure as services to end users. Platform-as-a-Service is built on Infrastructure-as-
a-Service, and Software-as-a-Service is built on Platform-as-a-Service. Hence, the pro-
cedures and frameworks designed for performing digital forensics in the Infrastructure-as-a-
Service model will also help in the other two service models.
The cloud computing architecture comprises layers, and the cloud user has access to
different layers in different service models. The IaaS model provides access to the largest
number of layers, while SaaS provides access to only one (access control). The number of
layers to which a cloud user can have access in the different service models is shown in
Figure 2.1. This implies that the investigator can acquire data related to a VM or user
account in the IaaS model but not in the PaaS and/or SaaS models. Hence, the contribution
of this thesis is restricted to the IaaS model of the private and public cloud deployment
models.
Figure 2.1: CSP and cloud customer’s control over multiple layers in three service models
2.2.4 Issues with Multi-layered Architecture
Presenting a cyber crime case before a court of law raises many questions about trust in
the hardware (e.g., hard drive), software (e.g., operating system), the procedures and tools
used for data acquisition and analysis, the capability of the investigator, etc. Cloud
computing adds a few more areas of concern due to its unique characteristics and layered
architecture. Dykstra and Sherman designed a model of trust in the IaaS cloud environment
in six layers [55]. A summary of their work is depicted in Figure 2.2.
Figure 2.2: Layers of the IaaS cloud environment and cumulative trust required by each layer [55]
They represent the network as layer 1 and application data (guest application) as layer
6. For each layer, the data acquisition method and the level of trust required are different.
For example, at layer 5, using remote acquisition methods, trust is required at several
layers: the guest OS, HV (hypervisor), host OS, hardware and network. The host OS is
the operating system that runs on the cloud server hardware. The guest OS is the operating
system installed in a virtual machine with the help of a hypervisor. The hypervisor is
virtualization software (also called a VMM - Virtual Machine Monitor) which can create and
run virtual machines. There are two types of hypervisors [86]:
• Type 1 Hypervisor (Bare-metal or Native)
• Type 2 Hypervisor (Hosted)
A type 1 hypervisor, as shown in Figure 2.3, runs directly on the hardware, while a
type 2 hypervisor runs on top of an existing OS (Windows 7, Windows 8, Red Hat
Linux, Ubuntu, etc.).
Figure 2.3: Types of hypervisors [86]
Examples of bare-metal hypervisors include VMware ESX/ESXi,
Microsoft Hyper-V, Citrix XenServer, IBM z/VM, etc., and hosted hypervisors include
VMware Workstation, Microsoft Virtual PC, Sun VirtualBox, QEMU, etc.
2.3 Cloud Crime and Forensics - Review
For a forensic investigator or a cloud customer, cloud computing environments lack trust-
worthy capabilities. The cloud investigator or the customer is at the mercy of the cloud
service provider for assistance in cloud crime investigation. In this section, we list some
of the major contributions made to date by researchers in the area of digital forensics in
cloud computing.
A number of surveys have been conducted on mapping the principles and guidelines
available for the traditional digital forensic process to the cloud computing environment.
The Incident Management and Forensics Working Group mapped the forensic standard
ISO/IEC 27037 to cloud computing [24]. This mapping is basically a survey of the issues
related to the forensic investigation of cloud environments. It includes the standards
which can be followed by LEAs (Law Enforcement Agencies) across nations, the
requirements of service level agreements (SLAs) for cloud forensics, etc.
Harjinder Singh Lallie and Lee Pimlott have investigated the impact of cloud com-
puting environments on the ACPO (Association of Chief Police Officers) principles [66].
The ACPO principles are the guidelines for digital forensic investigation followed in
handling computer-based electronic evidence by the law enforcement agencies in the
United Kingdom. In their findings, they warned the digital forensic community about the
usage of these guidelines in the cloud computing environment for various reasons. The
reasons they identified include the problems associated with metadata, lack of control
over the investigation, complexities related to the distribution of the data stores, and the
problems associated with maintaining an audit trail.
To research the digital forensic issues in the cloud environment, the NIST Cloud Com-
puting Forensic Science Working Group (NCC FSWG, 2014) was established to identify
the challenges which cannot be handled with current technology and methods [61].
This group surveyed the existing literature and identified a set of challenges for cloud
crime investigation. The researchers of this group also interacted with the international
digital forensics community to summarize the challenges. The final research report
summarizes 65 challenges which broadly fall into the categories of incident first
responders, architecture, anti-forensics, data collection, analysis, legal issues, role man-
agement, training and standards.
All these research reports emphasize the need for practical methods to investigate
cloud crime.
To our knowledge, the researchers who first actively started working in cloud forensics
were Dykstra and Sherman. In 2012, for the first time, they used existing tools like En-
Case Enterprise, FTK (Forensic Tool Kit), Fastdump, Memoryze, and FTK Imager to
acquire digital evidence from the public cloud over the Internet. They used the Elastic
Compute Cloud (EC2) of the Amazon Web Services (AWS) public cloud as a live test
bed. The aim of their research was to measure the effectiveness and accuracy of
traditional digital forensic tools in an entirely different and new environment like the
cloud. They succeeded in the experiment and highlighted the limits of these tools. Their
experiment showed that trust is required at many layers to acquire forensic evidence from
the cloud environment. Due to the trust issue, they did not recommend traditional forensic
tools (EnCase Enterprise, FTK, etc.) but explored four other solutions for data acquisition:
Trusted Platform Modules (TPM), the management plane, forensics-as-a-service, and
legal solutions. Of these four solutions, they strongly recommended the management
plane.
In continuation of their work, Dykstra and Sherman (in 2013) implemented user-
driven forensic capabilities using the management plane of a private cloud platform called
OpenStack [56]. Their solution is capable of collecting virtual disks, guest firewall logs
and API logs through the management plane of OpenStack [29]. OpenStack users and/or
administrators interact with the cloud platform and manage cloud resources through the
management plane, using a web interface (e.g., Horizon) and an API (e.g., the Nova API).
They call their implementation FROST (Forensic OpenStack Tools), and it is available
through both of these interfaces. Their emphasis was on data collection and the
segregation of log data in data centers using OpenStack as the cloud platform. Hence,
their solution is not independent of the OpenStack platform, and to date it has not been
added to the public distribution (the latest stable version of OpenStack is Kilo, released
on 30th April 2015).
Ben Martini and Kim-Kwang Raymond Choo (in 2012) [69] proposed an integrated
conceptual digital forensic framework to collect and preserve digital evidence for forensic
purposes from the cloud computing environment. Their framework was based on two of
the most widely accepted and used digital forensic frameworks: McKemmish (1999) [71]
and NIST (Kent et al., 2006) [64]. They reviewed these two frameworks to determine the
changes required to conduct digital forensics in the cloud computing environment. For Law
Enforcement (LE) and forensic investigators, they contributed to understanding the technical
challenges and implications of digital forensics in the cloud computing platform. They
raised the following two potential questions to the digital forensic research commu-
nity for designing and developing frameworks that are evidence-based and forensically
sound.
1. What further changes are required to the existing forensic frameworks and prac-
tices for conducting forensically-sound investigations in a cloud computing envi-
ronment?
2. What are the legal and privacy issues surrounding the access to cloud computing
data, particularly cross-border legal and privacy issues; and what reforms are re-
quired to facilitate access to such data for LEAs (Law Enforcement Agencies)?
The main contribution of this thesis is limited to answering the first question above.
A number of solutions have been proposed by researchers to reduce the overall
processing time of digital evidence. Rogers et al. (2006) proposed a live
forensics model called the Cyber Forensic Field Triage Process Model (CFFTPM), which
deals with gathering actionable intelligence at the crime scene [79]. The model, aimed at
time-critical investigations, defines a workflow for on-scene identification, analysis and
interpretation of digital evidence, without the requirement of acquiring a complete foren-
sic copy or taking the system back to the lab for an in-depth examination. Vassil Roussev
et al. (2013) formulated forensic triage as a real-time computational problem with specific
technical requirements and used these requirements to evaluate the suitability of different
forensic methods for triage purposes [80]. Fabio Marturana et al. (2013) proposed a "ma-
chine learning based digital forensic triage methodology for automated categorization of
digital media" [70]. Kyungho Lee et al. (2013) proposed a new triage model conforming
to the needs of selective seizure of electronic evidence by surveying Law Enforcement
officers who are involved in the onsite search and seizure of digital evidence [63]. Also,
there are many digital forensic triage tools which are used to collect crime-related data
quickly and are able to preserve its integrity [1, 21, 38, 39]. Neither the existing tools nor
the recently proposed forensic triage methods use any parallel processing framework to
achieve digital forensic triage.
Adrian Shaw and Alan Browne in their paper [84] have summarized the risks associ-
ated with using triage techniques in digital forensics. Out of the six risks they summa-
rized, the following risks are worth mentioning. We provide counter measures for them
in this thesis.
1. A high risk of evidence being missed through the lack of thoroughness in the pro-
cess: this includes searching through encrypted data, unallocated space, swap space,
etc.
2. The risk of case backlogs: the triage process provides little assistance to the
examiner/investigator in defining the scope of the examination; thus, a full forensic
examination cannot be avoided
3. The risk of missed investigative opportunities: absence of intelligence evidence
gathering and analysis
For cloud crimes involving storage as a service (Amazon Cloud Drive, Microsoft SkyDrive,
Google Drive, Dropbox, etc.), forensic investigation has to be carried out in the
suspected client device as well as the cloud computing environment. Chung et al. [53]
have proposed a forensic model for investigating cloud storage services (Amazon S3,
Google Docs, Evernote, Dropbox) that enables analysis of artifacts present in client
devices such as Android smartphones, iPhones, Windows systems and Mac systems.
Darren Quick and Raymond Choo have analyzed the data remnants of cloud storage ser-
vices (Dropbox, GoogleDrive, and Microsoft SkyDrive) on user machines [74, 75, 77]. In
another paper [76], they have used browser and client software (for example Google drive
client software [20] is available for PC’s, Android device, iPhone and iPad ) to collect and
preserve data (basically files) from the cloud services mentioned above. Through their
experiments, they observed that the integrity of the data is preserved through processes
such as uploading, downloading and storing of files in the cloud store. However, the file
timestamp information was changed. Hence, they cautioned forensic investigators about
the implications of making wrong assumptions regarding timestamps.
Corrado Federici has developed the Cloud Data Imager (CDI) library, a mediation layer
that offers a browsing facility for the files, folders and metadata of a cloud storage
service [57]. He has built a desktop application on top of the CDI library that provides
folder listings with the ability to view present, deleted and shared contents. Using this
desktop application, the investigator can also image a folder tree of a cloud account to the
widely used Expert Witness Format (EWF). We believe that the research done so far on
investigating remote storage services can also be applied to any other cloud storage
service.
Garfinkel in his research paper [59] has summarized the research directions of digital
forensics for the ten years following 2010. He suggests that the digital forensic research
community adopt standardized and modular approaches for data representation and
digital forensic processing. He makes a valid point about the scalability and validation
of the existing tools: digital forensic techniques developed and used today work on
relatively small data sets (n < 100) and fail to scale to real-world sizes (n > 10,000).
Here 'n' refers to the number of JPEG files, the size of a disk in terabytes (TB), the
number of hard drives, mobile phones, etc.
2.4 Summary
The various advantages offered by the cloud computing business model have made it one
of the most significant of the current computing trends, alongside the personal, mobile,
ubiquitous, cluster, grid, and utility computing models. These advantages have also
created complex issues for forensic investigators and practitioners conducting digital
forensic investigations in the cloud computing environment.
In this Chapter, we have defined a few general terms required to understand and perform
digital forensics. Keeping the cloud computing platform in mind, we have listed
the limitations of traditional digital forensic tools in performing forensic analysis in the
cloud. Considering the cloud computing architecture and characteristics, we have identified
the sources of digital evidence for performing digital forensics. We highlighted the
effect and role of the cloud deployment models, the delivery models and the multi-layered
cloud architecture in cloud crime investigation. Finally, we reviewed research from some
of the experts in this emerging area over the last few years. In Chapter 3, we will discuss
one of the major objectives of this work, i.e., identifying the challenges and requirements
of forensics in the virtualized environment, without which cloud computing systems
cannot exist.
Chapter 3
Approaches to Forensics in Presence of
Virtualization in Cloud
“Knowledge will bring you the opportunity to make a difference.”
- Claire Fagin
3.1 Introduction
In Chapter 2, we provided an overview of the background and related research in the area
of cloud forensics. After carefully examining the past research, we identified certain
areas in which to contribute. In this Chapter, we discuss the various ways in which digital
forensics can be carried out in a virtual environment and the challenges therein.
Cloud computing is an Internet based computing paradigm that delivers on-demand
software and hardware computing capability as a “service” where the consumer is com-
pletely abstracted from the computing resources. These services are provided by means
of virtualization.
Virtualization is omnipresent in cloud computing. It is one of the important technologies
for the realization of cloud computing [89]. Using virtualization, one can create as many
virtual machines as needed on a given hardware platform. A virtual machine (VM) is a
software implementation of a computer that executes programs like a physical machine.
Almost every publication in this area today mentions virtualization or cloud computing,
and most companies are adopting virtualization technologies in their current IT
environments. VMware ESX/ESXi/Workstation, Microsoft Hyper-V/Virtual PC, Citrix
XenServer, IBM z/VM, KVM, Sun VirtualBox, QEMU, etc. are a few examples of
virtualization solutions which are used extensively by many companies today.
With more emphasis being placed on going green and power becoming more expensive,
virtualization offers cost benefits by decreasing the number of physical machines required
within an environment. A virtualized environment also reduces support effort by making
testing and maintenance easier. The way of performing digital forensics has changed
due to these virtual environments.
As the use of the virtual machine environment increases, computer attackers are be-
coming increasingly interested in exploring the virtual environment to spread malware,
steal data, or conceal activities. The contributions of this chapter are mainly limited to
the following.
• Identification of challenges and requirements of forensics in the virtualized envi-
ronment
• Devise procedures to counter the major challenges in performing forensic analysis
3.2 Challenges and Requirements of Forensics
Cloud computing is characterized by its highly virtualized environment. Traditional dig-
ital forensics cannot be applied to the cloud environment directly due to its virtualized
nature. After the extensive literature survey we conducted (Chapter 2), we have identified
the following digital forensic challenges in the cloud virtual environment for which some
of the counter measures will be provided in the succeeding sections.
1. Understanding the files created when a virtual machine is launched
2. Multi-level virtualization: a virtual machine can run virtualization to launch any
number of virtual machines on top of it
3. Detecting virtual environment on a physical system or a virtual machine
4. Security challenges due to the vulnerability in the host operating system of the
virtual platform
5. Increase in virtual hard disk space: will increase the time taken to complete the
forensic investigation
6. Digital evidence is no longer confined to the local or single hard drive
7. Analysis of cloud hosting server logs: virtual environment hosting the cloud ser-
vices create altogether different logs for running and maintaining VMs and provid-
ing other services
We will address the first four challenges in the following sections, and the remaining
three in the succeeding chapters.
3.3 Detection of Virtual Environment
There are different ways in which virtual machines can be created and used. First,
virtual machines can be created on the same machine where an OS is already installed.
Second, virtual machines can be created using different cloud deployment models
(private, public, community, and hybrid). Third, virtual machines can be created on any
external storage like a USB flash drive, USB hard drive, or other portable storage devices
like an iPod or mobile phone. In this section we will explore the forensic analysis of
virtual machines for the first case, using VMware Workstation as the virtualization software.
Contributions on analysis of the cloud virtual machines will be discussed in Chapter 5
and 6. Analysis of virtual machines created on external storage is not within the scope of
this thesis.
3.3.1 Important files in virtual machine investigation
The type of virtual machine which can be created on an existing OS with the help of
virtualization software can also be created within another virtual machine, by running
virtualization software as an application inside that virtual machine. This concept can be
termed multi-level virtualization and is shown in Figure 3.1.
Figure 3.1: Multi-level virtualization
The virtual machines running within VM1 with the help of the guest OS and VMM
can be analysed by taking a snapshot or copying the virtual disk file of VM1. Taking a
snapshot or copying the virtual disk file (a .vmdk file in the case of VMware Workstation)
of VM1 is possible through the VMM in the case of a type 1 hypervisor, and through the
host OS in the case of type 2. The snapshot taken or the virtual disk file copied for
forensic analysis is called the "digital evidence". Once the VM's virtual hard disk file is
acquired, it can be mounted as a virtual drive and analyzed using various available
traditional digital forensic tools [9, 11, 12, 17, 37, 45]. More on the analysis of virtual
machines is discussed in Chapters 5 and 6.
To conclude whether a virtual machine may have existed on the digital evidence,
one has to find at least one of the files listed in Table 3.1 on the evidence being
analyzed. These files are specific to the VMware Workstation virtualization [42] solution
and may differ for other solutions like Citrix, Sun VirtualBox, QEMU, Microsoft Virtual
PC, etc.
Table 3.1: Files which make up a virtual machine
File Extension Description
.VMDK / .DSK   Virtual hard drive for the guest operating system; may be either a
               dynamic or a fixed virtual disk.
.VMX / .CFG    Configuration file. Stores settings chosen in the virtual machine
               settings editor.
.LOG           Log of activity for a virtual machine and hypervisor. Stored along
               with the .VMX file.
.VMEM          Backup of the virtual machine's paging file. Available only when
               the VM is running or has crashed.
.VMSN          The VM's snapshot file, which stores its running state.
.VMXF          Supplemental configuration file for virtual machines that are in a
               team. It remains if a virtual machine is removed from the team.
.VMTM          Configuration file containing information about a team of VMs. A
               team is a group of virtual machines which can inter-operate in a
               VMware virtual lab environment.
.VMSS / .SDT   Stores the state of a suspended virtual machine.
.NVRAM         Stores the BIOS information of the virtual machine.
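The presence check described above, flagging evidence that contains at least one of these file types, can be sketched as a simple extension scan (an illustrative sketch only; a real examination would also verify file headers, since extensions like .log and .cfg are generic and can only corroborate, never prove, the presence of a VM):

```python
import os

# File extensions from Table 3.1 that suggest a VMware virtual machine
# may have existed on the evidence being analyzed.
VMWARE_EXTS = {".vmdk", ".dsk", ".vmx", ".cfg", ".log", ".vmem",
               ".vmsn", ".vmxf", ".vmtm", ".vmss", ".sdt", ".nvram"}

def vmware_artifacts(paths):
    """Return the paths whose extensions match a VMware VM file type."""
    return [p for p in paths
            if os.path.splitext(p)[1].lower() in VMWARE_EXTS]
```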
3.3.2 Changes in the host OS when the virtual platform is used
Nowadays, virtual machines have almost replaced physical machines in the day-to-day
activity of IT professionals and organizations. These developments are of interest to
cyber criminals for several reasons, including the exploration of virtual environments for
crimes. In this section we analyze the host OS (which can be a guest OS, as shown in
Figure 3.1, running on VM1) to detect the presence of virtualization software.
In this experiment, we have used Windows 7 as the host operating system and VMware
Workstation as the virtualization software. We have used ZSoft Uninstaller 2.5 [47] for
analysis purposes. ZSoft Uninstaller 2.5 is a freely available program which can uninstall
software and find its remnants after uninstalling. We carried out the following steps to
detect the presence of a virtual environment.
Step 1: Installed ZSoft Uninstaller 2.5 on the Windows 7 system.
Step 2: Ran ZSoft Uninstaller 2.5 to capture a snapshot of the system.
Step 3: Installed VMware Workstation on the Windows 7 system for virtualization.
Step 4: Created a virtual machine using the standard procedure available with VMware
Workstation, and used it for day-to-day activities such as surfing the Internet, sending
mails, etc.
Step 5: Ran ZSoft Uninstaller 2.5 again to capture a second snapshot of the system.
We could observe the file changes in the host OS due to virtualization, as shown
in Figure 3.2. The same experiment can be carried out for other virtual environments
like Citrix, Sun VirtualBox, QEMU, Microsoft Virtual PC, etc. to detect their presence.
Using this approach, a virtualized data center administrator can keep track of multi-
level virtualization and monitor the activities of virtual machine users.
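The before/after comparison that such a snapshot tool performs can be sketched as a set difference over file listings (a simplified illustration, not the tool's actual implementation; the example paths are hypothetical):

```python
def snapshot_diff(before, after):
    """Compare two filesystem snapshots (sets of file paths) and
    report which files were added and removed between them."""
    return {"added": sorted(set(after) - set(before)),
            "removed": sorted(set(before) - set(after))}

# Hypothetical listings: the second snapshot, taken after installing
# VMware Workstation, reveals the files the virtualization software added.
before = {r"C:\Windows\System32\kernel32.dll"}
after = before | {r"C:\Program Files\VMware\vmware.exe",
                  r"C:\Windows\System32\drivers\vmnet.sys"}
```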
Figure 3.2: Changes in host OS files during VMware workstation installation
3.4 Detection of Virtual Machine Hidden Using ADS
Virtualization poses challenges to the implementation of security as well as to cybercrime
investigation in the cloud. Hidden data has always been a major concern of the computer
forensic analysis process. Data hiding in digital media can be performed for various
reasons, including potential malware attacks, hiding data for later use in a compromised
environment by an attacker, or an offender hiding useful information on his personal
computer [68]. There are numerous methods that can be used to hide data from
potential examination. One of them is hiding data in alternate data streams under NTFS
or HFS+ [51]. Another is hiding data in the slack areas of the digital media. Data
hidden in slack areas (file slack, disk slack, etc.) can be easily carved out using
traditional digital forensic tools, especially carving tools [73].
The method of hiding data using ADS has legitimate applications like Services for
Macintosh (interoperability between HFS and NTFS), volume change tracking, storing
summary data information, etc. However, this method of hiding files is vulnerable
to both insider and outsider attacks, whereby attackers may hide files on Windows
systems supporting NTFS. An insider may choose this method to perform unauthorized
or unacceptable deeds on his system. Outsiders may choose this method to hide malicious
files on a remote system and to prevent third parties from finding those files. The
same method can be used to hide virtual machines created in a virtualization environment
and used for malicious purposes.
3.4.1 Role of Alternate Data Streams (ADSs)
Alternate Data Streams (ADS) are a unique feature of the NTFS file system, introduced
with Windows NT 3.1 in the early 1990s to provide compatibility between Windows NT
servers and Macintosh clients which use the Hierarchical File System (HFS) [87].
Understanding the concept of alternate data streams requires knowledge of the structure
of a special metadata file called the Master File Table (MFT). Windows creates twelve
metadata files when an NTFS (New Technology File System) partition is formatted; these
contain information about the volume itself and the data stored in it [51]. The file that
stores all of the records and attributes that Windows needs to access any file or
directory on the volume is called the MFT. The length of each record in the MFT may
vary, with a minimum of 1,024 bytes and a maximum of 4,096 bytes. Each record contains
different attributes of a file, as shown in Figure 3.3.
Figure 3.3: MFT file record with sample attributes
SIA: Standard Information Attribute
FNA: File Name Attribute
DA: Data Attribute
The data attribute contains the cluster information (cluster chain) allocated to a
particular file, depending on whether the file is resident or nonresident. A file is said to
be resident when it is stored in the MFT itself; otherwise it is called nonresident. A
"cluster" is the basic allocation unit of a file in Windows. By default, NTFS supports only
one data attribute per record without a name, called the unnamed attribute, so any
additional data attributes must be named. A directory has no default data attribute but can
have optional named data attributes. These additional named data attributes may contain
alternate data streams, as shown in Figure 3.4.
Figure 3.4: MFT file record with named attributes
H: Header of the attribute
N: Name of the attribute
LCN: Logical Cluster number
The Logical cluster number (LCN) range specifies the sequential clusters allocated to
the data stream. Figure 3.4 shows the MFT record containing three alternate data streams
(DA1, DA2, and DA3) whose names can be obtained from “N: name attribute”. The
original file name (assume file-org.jpg) is specified in the FNA field of the MFT record.
Any file can be attached to the file file-org.jpg using one of these named data attributes
(DA1, DA2, and DA3).
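The record layout just described can be modelled schematically as follows (a simplification for illustration only; the class and field names are our assumptions and do not reflect the actual on-disk NTFS layout):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DataAttribute:
    name: Optional[str]  # None for the default, unnamed stream
    lcn_start: int       # first logical cluster allocated to the stream
    size: int            # stream size in bytes

@dataclass
class MftRecord:
    file_name: str       # FNA field of the record
    data_attrs: List[DataAttribute] = field(default_factory=list)

    def alternate_streams(self) -> List[DataAttribute]:
        """Named data attributes are the alternate data streams."""
        return [da for da in self.data_attrs if da.name is not None]

# The situation of Figure 3.4: file-org.jpg carrying a named stream.
rec = MftRecord("file-org.jpg", [
    DataAttribute(None, 100, 4096),             # unnamed (default) stream
    DataAttribute("hiddenFile.txt", 236, 512),  # alternate data stream
])
```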
Example of Hiding and Accessing a File Using ADS:
To attach a file myFile.txt to the file-org.jpg, a malicious attacker can use the following
command [49].
C:\>type myFile.txt >file-org.jpg:hiddenFile.txt
Here, [type] is the command used to create the ADS,
[>] serves to redirect the file content, and
[:] separates the stream name from the original file name.
The file myFile.txt is now hidden in file-org.jpg under the stream name hiddenFile.txt.
The user can permanently delete myFile.txt from its original location and still
access its content as a stream. To open the stream, one can use the following command:
C:\>notepad file-org.jpg:hiddenFile.txt
Here, notepad is a utility program which can open any text file.
3.4.2 Approach to Hide and Detect a VM Hidden using ADS
This approach is only applicable to machines running a Windows operating system
as the host OS with NTFS as the file system. For experimental purposes, we considered
a compute server of a private cloud, as shown in Figure 3.5, which runs a set of virtual
machines. We installed VMware Workstation 8.0 virtualization software on a Windows 7
virtual machine (VM1) to create virtual machines on it. We created two virtual machines,
one containing Windows 7 (VMa) and the other containing Ubuntu 12.04 (VMb) as their
operating systems. We experimented with the second virtual machine for the purpose of
hiding and reusing it. As described in Table 3.1, any launched virtual machine creates
several files. The file which is important to a malicious insider or outsider is the virtual
disk file (.vmdk in our case). This .vmdk file is similar to any other system file and can be
easily viewed and hidden with the help of the host operating system (the Windows 7 OS
in VM1).
Hiding and Accessing a VM Using ADS:
When we created the second virtual machine, it created a virtual disk file with the
name ubuntu.vmdk in the folder path shown below:
C:\Program Files\VMware\VMwareWorkstation\Ubuntu
We then hid the ubuntu.vmdk file behind a temporary file vmtest.txt in the same path, as
follows.
C:\Program Files\VMware\VMwareWorkstation\Ubuntu>
type ubuntu.vmdk >vmtest.txt:myVMFile.vmdk
Figure 3.5: Hiding of virtual machine in a cloud hosting server
Now the stream is ready to be used with the name "vmtest.txt:myVMFile.vmdk". For the
experiment, we deleted the file ubuntu.vmdk from its original location and tried to
use the virtual machine, which caused an error with the message shown in
Figure 3.6.
Figure 3.6: Launching a hidden virtual machine
After carefully examining the error message, we realized that one of the supporting
files stores the path of the virtual disk file. So, we edited the configuration file
(ubuntu.vmx), which records the path of the ubuntu.vmdk file, as shown in Figure 3.7.
After modifying the configuration file by replacing the path of the old ubuntu.vmdk
file with the new data stream vmtest.txt:myVMFile.vmdk, as shown in Figure 3.8, we could
successfully use the virtual machine. Now the hidden virtual machine containing Ubuntu
12.04 can be used for malicious purposes (like storing executables, performing denial of
service attacks, etc.) or whatever the user desires.
Figure 3.7: Configuration file (.vmx)
Figure 3.8: Modified configuration file (.vmx)
From the digital forensic investigator's point of view, it may be impossible to retrieve
the original file of the hidden virtual machine, because a malicious user may use file
shredder software to overwrite the content of the deleted ubuntu.vmdk file. A file
shredder is software that securely deletes a file, making it practically impossible to
retrieve the file forensically.
Figure 3.9: Hash value of vmtest.txt file before ADS attachment
We have checked the integrity of the file (vmtest.txt) by hashing it before and after the
attachment of the alternate data stream (ubuntu.vmdk in this case).
Figure 3.10: Hash value of vmtest.txt file after ADS attachment
The tool we have used to compute the hash value was Hasher, which is part of the
CyberCheck suite [9] and uses
MD5/SHA/HMAC algorithms. The hash values of the file vmtest.txt before and after the
attachment of ADS (virtual disk file of a VM) are shown in Figure 3.9 and Figure 3.10
respectively. Because the hash values are the same before and after the attachment, one
cannot prove in a digital forensic investigation that the file vmtest.txt was modified in
any sense. Hence, the investigator may have no clue about the presence of an ADS
attached to any file in the system under investigation.
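This check can be reproduced with any standard hashing tool; below is a minimal sketch using Python's hashlib in place of the Hasher tool mentioned above. The point it illustrates is that such tools hash only a file's default (unnamed) stream, so attaching an ADS leaves the digest of the carrier file unchanged:

```python
import hashlib

def forensic_digests(data: bytes) -> dict:
    """Compute MD5 and SHA-1 digests of a byte stream, as a hashing
    tool would for the default (unnamed) stream of a file."""
    return {"md5": hashlib.md5(data).hexdigest(),
            "sha1": hashlib.sha1(data).hexdigest()}

# The carrier file's own bytes do not change when a named stream is
# attached, so the digests computed before and after are identical.
carrier = b"contents of vmtest.txt"
```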
Methodology to detect hidden VMs:
In NTFS, the MFT is the main data structure containing all the information required to
retrieve files. The first record of the MFT gives details about the layout of the MFT, its
total size, and whether a particular record is currently in use or not. The Bitmap attribute
in the first record indicates the status of each MFT record: the attribute contains a
sequence of bits where each bit represents the allocation status of one MFT record. If a
bit is set to 1, the corresponding MFT record is in use, meaning that the record represents
a normal, undeleted file. If the bit is zero, the record is not currently used and may contain
information about a file that has been deleted [92]. Our interest is in detecting the hidden
virtual machines using data streams, not in retrieving the original or deleted files.
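The allocation check just described amounts to testing one bit per record (a minimal sketch; in practice the bitmap bytes would be read from the Bitmap attribute of the MFT's first record):

```python
def record_in_use(bitmap: bytes, record_no: int) -> bool:
    """Test the allocation bit for an MFT record: a bit set to 1 means
    the record is in use (a normal, undeleted file); 0 means the record
    is free and may describe a deleted file."""
    byte_index, bit_index = divmod(record_no, 8)
    return bool((bitmap[byte_index] >> bit_index) & 1)
```

For example, with the bitmap byte 0x05 (binary 00000101), records 0 and 2 are in use while record 1 is free.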
To detect a hidden virtual machine within an NTFS file system, we have to scan
every record in the MFT and check for the presence of named data attributes in each
record. If present, the following filters are applied to check the metadata of the file.
Together, these three filters guarantee that a hidden file is in fact a virtual machine file.
1. Check for the file extension (.vmdk, .vhd, .vdi, .qcow2, etc.)
2. Check for the file size limit (>1GB)
3. Check for the header signature (Table 3.2)
In the flow chart shown in Figure 3.11, DACount = 1 means that the file does not
contain any named streams. If DACount > 1, the file may contain one or more named
streams (alternate data streams). The algorithm explained in the chart proceeds by first
getting the number of data attributes of an MFT record and then iterating through each
data attribute to check whether it contains a VM's virtual disk file (.vmdk, .vhd, .vdi,
.qcow2, etc.). The process continues for the MFT records of all the files within the
NTFS partition.
The first and second filters are necessary conditions for checking whether a given file is
a virtual machine file, but they are not sufficient. In the case of the first filter, a user can
easily change the file extension (causing a signature mismatch), so one cannot judge
based on the extension alone. The file extension can be read from the stream name
attribute (DA1: N of Figure 3.4). The second filter likewise does not guarantee that a
given file is a virtual machine file and not, for instance, a video file. The first and second
filters thus serve as quick pre-checks before the third filter. As every file format has a
unique header, it is sufficient to match the header of a given alternate data stream against
the headers of the virtual hard disk formats. Table 3.2 shows the header signatures of
different virtual disk files.
Table 3.2: Virtual disk file signatures
File extension   Header signature
.vmdk            4B444D56 (KDMV)
.vhd             636F6E6563746978 (conectix)
.qcow2           514649FB (QFI.)
.vdi             5644492E (VDI.)
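Taken together, the three filters can be sketched as follows (an illustrative check applied to one named data stream; the signatures are those of Table 3.2, and in practice the stream name, size and header bytes would be read from the MFT record and the stream's first cluster):

```python
# Header signatures of virtual disk formats, keyed by extension (Table 3.2).
VDISK_SIGNATURES = {
    ".vmdk":  b"KDMV",      # 4B 44 4D 56
    ".vhd":   b"conectix",  # 63 6F 6E 65 63 74 69 78
    ".qcow2": b"QFI\xfb",   # 51 46 49 FB
    ".vdi":   b"VDI.",      # 56 44 49 2E
}
MIN_VM_SIZE = 1 << 30  # filter 2: virtual disk files exceed 1 GB

def looks_like_hidden_vm(stream_name: str, stream_size: int,
                         header: bytes) -> bool:
    """Apply the three filters to a named data stream (ADS)."""
    ext = "." + stream_name.rsplit(".", 1)[-1].lower()
    if ext not in VDISK_SIGNATURES:   # filter 1: known virtual disk extension
        return False
    if stream_size <= MIN_VM_SIZE:    # filter 2: size limit
        return False
    # filter 3: header signature match (the decisive check)
    return header.startswith(VDISK_SIGNATURES[ext])
```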
To get the header of a given alternate data stream, one has to read the first few bytes
of the first cluster allocated to the stream (see cluster number 236 of Figure 3.4). The
stream file size can be obtained from the stream header attribute (DA1: H of Figure 3.4).
The detection algorithm we suggest can be used by the cloud service provider
to monitor the activities of the virtual machines from the host operating system. The
cloud service provider can pre-configure the virtual machine instances with the detection
algorithm proposed, before using them in the cloud environment.
Figure 3.11: Detection of hidden virtual machine
3.5 Summary
In this Chapter, we have presented the ways in which forensic analysis can be done in
the virtual environment that is omnipresent in the cloud. We identified the challenges and
requirements of performing digital forensics in the virtualized cloud environment. Taking
VMware Workstation as a specific virtualization solution, we devised a procedure to
detect and analyze the cloud virtual environment. This procedure is also applicable to
other platforms like Microsoft Virtual PC, Sun VirtualBox, QEMU, etc. The procedure
for detecting and analyzing the virtual environment in the cloud was published in [Pub1].
From the analysis perspective, we described the way in which virtual machines in the
cloud can be hidden using the Alternate Data Streams (ADS) technique of Windows. We
also presented an algorithm, based on three filters, to detect such virtual machines; on
implementation, this algorithm guarantees the detection of hidden virtual machines. The
algorithm and related work presented here were published in [Pub2]. In the next Chapter,
we will describe the proposed digital forensic framework for cloud computing systems
from the perspective of the cloud investigator and/or the cloud architecture.
Chapter 4
Designing a Digital Forensic
Framework for Cloud Computing
Systems
“Make things as simple as possible... but not simpler.”
- Albert Einstein
4.1 Introduction
In the previous Chapter, we identified the challenges and requirements of performing
digital forensics in the virtual environment that is omnipresent in the cloud. In particular,
we discussed the detection of a virtual environment and an algorithm to detect cloud
virtual machines hidden using ADS. In this Chapter, we discuss the proposed design of a
"digital forensic framework" for cloud computing systems from the view point of the
investigator and/or the cloud architecture.
Cloud as a business model presents a range of new challenges to the digital forensic
investigators due to its unique characteristics. It is necessary that the forensic investigators
and/or researchers adapt the existing traditional digital forensic practices and develop new
forensic frameworks which would enable investigators to perform digital forensics in the
cloud computing environment.
Ben Martini and Kim-Kwang Raymond Choo [69] have proposed an iterative, integrated
digital forensic framework for forensic data collection and preservation from cloud
services. They have also advised the forensic research community to identify the changes
that need to be incorporated into the existing forensic practices and frameworks. In the
following sections, we discuss the forensic phases required for the cloud and compare
them with those of Ben Martini and Kim-Kwang Raymond Choo, NIST and McKemmish.
Considering all the phases, we then design a digital forensic framework from the view
point of the digital forensic investigator and/or the digital forensic tool developer.
4.2 Cloud Forensic Process and Phases
The phases involved in investigating a cyber crime do not change with the investigative
environment, be it a desktop, laptop, mobile, network, server with a virtual environment,
or cloud. All the phases remain the same irrespective of the environment; only the way
in which they are applied differs. Figure 4.1 shows the various phases involved in cyber
crime investigation, as suggested in the digital forensic literature [52, 64, 69, 71].
Figure 4.1: Phases of cyber crime investigation
Hashing is also part of authentication and preservation. Within the jurisdiction of a
country, two kinds of labs can exist for performing forensic acquisition and analysis.
One is maintained by the cyber crime department (called the cyber crime lab), and
another by dedicated forensic laboratories of the government or a private body (called
the forensic lab). Identification of evidence, and seizure with hashing, is carried out by
the department of cyber crime. The experts of both labs are capable of acquisition,
authentication, analysis and preservation. If the analysis of the crime under investigation
is performed in the forensic lab, the presentation of the evidence to the court of law is
done by the cyber crime department in the presence of an expert of the forensic lab
as a witness for the evidence. Otherwise, officials of the cyber crime department can
directly submit the evidence. The findings of the investigation have to be repeatable and
reproducible at any time before the court of law; hence the preservation phase.
4.2.1 Comparison of Digital Forensic Frameworks
The digital forensic frameworks suggested by NIST [64] and McKemmish [71] are very
similar. In NIST's framework, identification and preservation are part of the collection
phase, while in McKemmish's, examination is part of the analysis phase. Both
frameworks were aimed at traditional digital forensic investigation. Ben Martini and
Kim-Kwang Raymond Choo's forensic framework [69] for cloud computing was based
on these two frameworks. They called it an iterative framework due to the backward
continuity from phase 4 (Examination and analysis) to phase 1 (Evidence source
identification and preservation). This is needed because the identification and
preservation of evidence in the cloud has to be done once the use of cloud services is
reported during the examination and analysis phases of the client device.
We propose our framework based on the above three frameworks, incorporating an
additional phase, i.e., examination and partial analysis (phase 3). Through this
framework, we contribute in the following areas, for which proof of concept will be
provided in Chapters 5 and 6.
• Segregation of log data
• Selective data acquisition
• Partial analysis of evidence: this includes analysis of the evidence within a VM
(memory, registry, file system metadata, etc.) to speed up the final analysis.
• Digital forensic triage using parallel processing

Table 4.1: Comparison of digital forensic frameworks

Phase  Proposed cloud          Integrated forensic    NIST            McKemmish
No.    forensic framework      framework [69]         framework [64]  framework [71]
1      Evidence source         Evidence source        Collection      Identification
       Identification,         Identification and
       Segregation, and        Preservation
       Preservation
2      Collection (from        Collection             Examination     Preservation
       client device as
       well as cloud)
3      Examination and         -                      -               -
       Partial analysis
4      Analysis                Examination and        Analysis        Analysis
                               Analysis
5      Reporting               Reporting and          Reporting       Presentation
                               Presentation
Before we discuss the "Digital Forensic Framework for Cloud Computing Systems", we
elaborate in the following sections on the activities, in each phase, of the LEA (Law
Enforcement Agency) or the investigator who will be responsible for performing the
investigation of the cloud crime.
4.2.2 Identification of Digital Evidence
As an entry point, this phase describes ways of identifying the sources of evidence
for a digital forensics investigation in the cloud environment. The source of evidence
could be a client device or the cloud service provider’s data center. A client device can
be a desktop computer, laptop, mobile device, or any other device through which one can
access cloud services. After a cloud crime is reported, the client device can be identified
using network forensic techniques (for example, analysis of a company’s firewall logs
to determine which host on the company’s own network connected to a cloud service).
The identification phase may also be revisited during the analysis phase to establish how the
identified device was connected to the cloud environment. Any digital device may connect
to the cloud service using a web browser or a client provided by the cloud service provider.
Whether the source is a cloud provider’s data center or a client device, identifying the presence
of evidence, its type, format and location is very important.
4.2.3 Collection and Preservation of Digital Evidence
The emphasis of the cloud investigator for this phase will be on how the data is collected
and preserved for further analysis. Irrespective of the device (sources of evidence) iden-
tified, the forensic investigators need to ensure the proper collection and preservation of
the digital evidence. The Scientific Working Group on Digital Evidence (SWGDE, 2006)
alerts the forensic investigator that the evidence submitted for the analysis should be main-
tained in such a way that the integrity of the data is not lost. Hashing is the commonly
accepted method to achieve this. There are well known data preservation techniques
available like MD4 (Message Digest), MD5, SHA-1 (Secure Hash Algorithm), SHA-2
and SHA-3. The data collection method will depend on the type of the cloud platform
and the deployment models used. Also, the investigator needs to collect the data from the
cloud client device and the cloud service provider’s data center.
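As a minimal sketch of this preservation step (the function name is ours), the digest of an acquired evidence image can be computed in fixed-size chunks, so that images larger than available memory can still be hashed:

```python
import hashlib

def hash_evidence(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute the digest of an acquired evidence image in chunks,
    so that arbitrarily large images fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recording the digest at collection time and re-computing it before analysis demonstrates that the evidence was not altered in between.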
Client Side Data Collection and Preservation:
Once the client device is identified, its physical memory data should be collected before
powering off the device. Numerous tools are available for memory acquisition
(FTK Imager, OSForensics, dd - data duplication, LiME, etc.). The data from a powered-off
device can be collected using software tools (FTK Imager, EnCase Forensic Imager,
TrueBack, etc.) or hardware tools (Tableau forensic duplicator, HardCopy 3P, etc.). Many
of these tools are capable of performing forensically sound data acquisition, i.e., preservation.
Client Side Data Analysis:
Analysis at the client side proceeds much as in traditional digital forensics, but with a
focus on the usage of cloud services from the client device. Locard’s Exchange Principle
says that “Every contact leaves a trace”: there is every chance of remnants in the client
device if the criminal is unaware of anti-forensic techniques such as Darik’s Boot and
Nuke (DBAN) [10]. The investigator may need to use traditional digital forensic tools
[9, 11, 12, 17, 25, 26, 37, 43, 45] to identify traces of cloud services, analysing cookies,
logs, database files, the registry, prefetch files, browser history, the pagefile, link files,
physical memory, network traffic (incoming and outgoing packets from the client machine),
etc., to gather evidence that proves the usage of cloud services. Darren Quick et al.
identified the types of terrestrial artifacts that are likely to remain on a client’s machine
when a cloud service is launched from it [74]. The procedure for analysing these artifacts
is not unique, owing to the variety of operating systems (Windows, Ubuntu, Mac OS,
Android, etc.) on client devices.
Cloud Side Data Collection and Preservation:
In the case of the private cloud deployment model, the investigator can use remote acquisition
methods to get the virtual disk data and the physical memory data pertaining to a
particular VM (for example, the dd - data duplication utility of Unix can acquire the virtual
disk image as well as the physical memory image). Unfortunately, establishing the provenance
of a cloud crime depends not only on the analysis of the virtual disk and the memory of the
VM used by the criminal, but also on the logs generated by the virtual machine during its
operation. Such logs are categorized as API logs (which record the start, end and lifetime
activity of a VM) and host logs (also called firewall logs, which record the network
activity of a VM).
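As a rough sketch (not part of the framework itself), and assuming the investigator has SSH access to the compute node hosting the VM and knows the path of its virtual disk — both assumptions, with the host name and remote path below being placeholders — dd output can be streamed over the network and hashed in transit:

```python
import hashlib
import subprocess

def stream_and_hash(stream, local_image, chunk_size=1 << 20):
    """Copy a byte stream to a local image file while hashing it,
    so the acquired copy can be verified later."""
    digest = hashlib.sha256()
    with open(local_image, "wb") as out:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
            out.write(chunk)
    return digest.hexdigest()

def remote_dd_acquire(host, remote_path, local_image):
    """Run dd on the remote compute node over ssh and capture its
    output locally; returns the SHA-256 of the acquired image.
    `host` and `remote_path` are placeholders for this sketch."""
    proc = subprocess.Popen(
        ["ssh", host, "dd", "if=" + remote_path, "bs=1M"],
        stdout=subprocess.PIPE)
    try:
        return stream_and_hash(proc.stdout, local_image)
    finally:
        proc.wait()
```

Hashing the stream as it arrives means the integrity value is fixed at acquisition time, before the image is ever written to the investigator's disk.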
A private cloud data center (or any cloud data center, for that matter) runs as many
VMs as its computational capacity and the demand allow. Data generated by all the VMs
and the cloud services utilized by the cloud platform is stored in different log files, which
cannot be handed over to the investigator because of the privacy of other tenants in the
cloud. Hence, irrespective of the cloud deployment model, there is a requirement for
segregating the cloud log data and collecting a particular tenant’s data using remote
services.
In the case of the public deployment model, data collection may not be as simple
as in the private deployment, because the data is geographically dispersed. If remnants
of cloud usage are found on the client machine, the investigator has to establish for what
purpose the cloud service was used. If it was used for storage (Google Drive,
Dropbox, Windows SkyDrive, etc.), the investigator can obtain the user credentials from
the client machine and retrieve the data stored in the cloud [74]. Otherwise, it might have
been used to own a VM in the cloud (we are not considering the SaaS and PaaS
models in this work). In this case, the investigator has the option of either downloading
the virtual disk file or requesting the cloud service provider to ship the virtual disk
data [55].
At the scene of crime, after completion of the collection phase, the proposed examination
and partial analysis phase will commence. In this phase, the investigator can
extract actionable evidence from the collected data with the help of inputs from the LEAs
about the nature of the crime. We use forensic triage, discussed in Chapters 5 and 6, to
gather actionable evidence. The actionable evidence is then provided as input to the
analysis phase to speed up the investigation process.
4.2.4 Analysis of the Digital Evidence
This phase emphasizes the examination of the evidence after the source of evidence has been
identified and the data collected and preserved from the source (cloud computing platform).
Cloud Side Data Analysis:
The extensive study we conducted of the existing work on cloud forensics suggests that
no cloud computing architecture provides an in-built forensic facility for data
analysis. Once the data is collected from the cloud environment, the method of analysis
depends on the type of data collected. In the case of virtual disk data, the traditional
digital forensic analysis procedure can be followed. For cloud log analysis, data segregation
at the cloud data center has to be done first, so that the evidence of interest can be collected
and analyzed depending on the nature of the cloud crime. Data segregation and analysis
are discussed further in Chapters 5 and 6.
4.2.5 Reporting of Digital Evidence
This phase provides a way to document the evidence found during the analysis and present
it before the court-of-law so that the cyber criminal can be punished according to the
applicable national policies. No major change is required in the reporting phase
other than following the forensic-aware Daubert principles [8]. Figure 4.2 shows the Daubert
principles used to test the admissibility of digital evidence in the court-of-law. In general,
there are no nationwide rules for satisfying all these principles, but any evidence so tested
should retain its chain of custody throughout the investigation.
Figure 4.2: Daubert principles for digital forensic [8]
As pointed out by Martini [69], before presenting cloud evidence to the court,
there is a requirement for a clear distinction between the data owned by the suspect and the
data generated by the cloud service provider. This is the major difference between
cloud and traditional digital forensic evidence presentation. In the cloud, a number of
parties, such as LEAs and the cloud service provider, may be involved in collecting the
digital evidence. With so many parties involved, it is very important to maintain the chain
of custody record, as shown in Figure 4.3.
Chain of custody is a record that chronologically documents all the stages of a cybercrime
investigation, showing the seizure, acquisition, custody, transmission, examination,
analysis and disposition of the evidence [66].
Figure 4.3: Content of the chain of custody record
Figure 4.3 shows a possible template of the chain of custody record. The record
presented here is our own observation derived from traditional forensics; there is
no standard format for this record that digital forensic labs across nations could share.
The investigators may then focus on the technical aspects of the forensic investigation
and the presentation of the evidence to the court, assuming the latter is already
aware of the cloud computing deployment and service models.
The countermeasures and solutions for the challenges identified in the cloud forensic
phases above will be discussed in Chapters 5 and 6.
4.3 Heuristic Approach for Performing Digital Forensics
in Cloud
The control flow diagram of the proposed “Heuristic approach for performing digital
forensics in cloud computing environment”, shown in Figure 4.4, is fundamentally a
way forward to answering the first question raised by Ben Martini and Kim-Kwang Raymond
Choo [69] to the digital forensic community. The heuristic approach we propose is based
on the earlier digital forensic frameworks of McKemmish [71], NIST [64] and Martini
et al. [69].
The flow of control in the designed approach is self-explanatory and enables
the forensic investigators to perform an investigation in the cloud environment. The
approach can serve as a forensic process for cloud computing platforms even for
investigators who lack detailed knowledge of how the cloud services are built and run.
It differs from the traditional digital forensic process in certain aspects. In particular,
client side data analysis starts before cloud side data collection and preservation. The
client device in the identification phase refers to any traditional computing system, such
as a desktop, laptop or mobile device. Data acquisition and analysis of the client device
reveal the usage of cloud services from that device. Depending on the usage of the cloud
services, either the virtual disk data has to be collected, or the set of files stored in the
cloud storage has to be copied along with the logs of the cloud services.
The digital forensic triage phase will start after the collection of the evidentiary data
from the private or public cloud data center. The methods used for forensic triage for
partial analysis will be discussed in Chapters 5 and 6. The results of the forensic triage
will be used in the further analysis of the digital evidence using traditional digital forensic
methods. The forensic triage will help in the examination and partial analysis of the
collected data to minimize the total processing time of the investigation.
Figure 4.4: Control flow diagram for digital forensic investigation in cloud
4.4 Digital Forensic Architecture for Cloud
In the previous section we proposed a heuristic approach for performing digital
forensics in the cloud computing environment (IaaS delivery model of the private and
public cloud). This approach does not include the internal details of the cloud architecture,
whether private or public. In this section we provide a digital forensic architecture
for cloud computing platforms, based on the NIST cloud computing reference
architecture [67] and cloud computing solutions such as Eucalyptus [13], OpenNebula [28],
OpenStack [29], etc. This architecture would be useful to the digital forensic community
for designing and developing new forensic tools in the area of cloud forensics.
Figure 4.5: Digital forensic architecture for cloud
4.4.1 Cloud Infrastructure Setup
A cloud infrastructure consists of the bare metal hardware required to deploy a cloud
computing environment, either private or public. As shown in Figure 4.5, the hardware
may comprise a few high-end servers for compute and storage, network switches/routers,
and cables for networking. Any cloud operating system [13, 28, 29, 40] can be installed
on this hardware to set up a cloud platform. For experimental purposes, we set up
a cloud test bed using the OpenStack cloud OS with the hardware configuration shown in
Table 4.2.
Table 4.2: Hardware configuration details of the private cloud (IaaS)

Hardware Equipment | Qty. | Purpose
HP ProLiant 1U Rack Server, Intel Xeon E3-1220v3 (3.1GHz/4-core/8MB/80W, HT), HP 1TB Non-hot plug LFF SATA, 32GB (4*8GB) RAM, 4 NICs | 2 | 1. Controller node, 2. Compute node
Rack 37U | 1 | Housing the servers
HP 5120 20 RJ-45 autosensing 10/100/1000 ports | 1 | External connection
HP 1910 24G switch | 1 | Interconnection among servers
3KVA UPS | 1 | Power backup
UTP cable and IO box, patch cards | 1 | Networking
4.4.2 Cloud Deployment (Cloud OS)
A cloud deployment platform mainly consists of the services that manage and provide
access to the resources in the cloud environment. Owing to the layered architecture, user
access to cloud resources is restricted based on the delivery model (IaaS, PaaS, or SaaS).
In the IaaS model the user has access to all layers other than the hardware and virtualization
layers, whereas access is further restricted in the PaaS and SaaS models, as depicted in the
architecture. From the cloud crime investigation perspective, the services that manage logs,
instances, images, storage and network will be of interest to cloud forensic tool designers
and developers.
Figure 4.6: Conceptual architecture of the private cloud IaaS
For experimental purposes, we set up an IaaS (Infrastructure as a Service) cloud test
bed using OpenStack as the cloud operating system. Using the hardware configuration
provided in Table 4.2 and the two-node architecture concept of OpenStack [31], we
deployed the private cloud computing environment. The conceptual architecture diagram
of the private cloud IaaS, with one controller node and one compute node, is shown in
Figure 4.6. The version of OpenStack used for this purpose was Icehouse.
The controller node runs the OpenStack services required to launch and run virtual
machines. The compute node runs all the virtual machines along with the hypervisor; any
number of compute nodes can be added to this test bed, depending on how many virtual
machines are required. The conceptual architecture uses two network switches: one
for internal communication between the servers and among the virtual machines, and
another for external communication. The basic services required for an OpenStack
private cloud and their uses are listed in Table 4.3.
Table 4.3: Basic services of OpenStack cloud OS [31]

Service Name | Component | Use
Dashboard | Horizon | Web-based portal to interact with other OpenStack services (i.e., launching an instance, configuring access controls, attaching volumes to a VM, maintenance, etc.)
Compute | Nova | Provides virtual servers (virtual machines) on demand, allowing users to create, destroy, and manage virtual machines using user-supplied images.
Image | Glance | Provides a catalog and repository for virtual disk images, which are used by Compute (Nova) during instance (virtual machine) provisioning.
Identity | Keystone | Provides authentication and authorization for all the OpenStack services.
Block Storage | Cinder | Provides persistent block storage to running instances; also used to create and manage block storage devices.
4.4.3 Cloud Investigation and Auditing Tools
The cloud provider may have external auditing services for auditing security, privacy,
and performance. Our goal is to provide forensic investigative services for
data collection, hybrid data acquisition, and partial analysis of the evidence. As shown in
Figure 4.5, the cloud admin (CSP) can make use of the “Forensic Investigative Services”
directly, whereas the cloud user and/or the investigator will have to depend on the cloud
admin. The suggested digital forensic architecture for cloud computing systems is generic
and can be used with any cloud deployment model. The methods of data collection, hybrid
data acquisition, and partial analysis of the evidence will be discussed in Chapters 5 and
6.
4.5 Summary
The increasing use of cloud services will be accompanied by increasing criminal
exploitation of cloud platforms to commit cyber crimes. The exploitation of cloud services
for criminal activity presents many challenges for law enforcement agencies, such as data
segregation, collection, multi-jurisdiction, multi-tenancy, chain of custody, etc. In this
chapter, we started by comparing three major digital forensic frameworks and proposed
a forensic framework for cloud computing systems. After identifying the required
phases in the framework, we designed a heuristic approach for performing digital forensics
in the cloud computing environment. Forensic investigators can use this approach as
a forensic process for investigating cloud computing platforms even without knowing
many internal details of the cloud environment. The proposed approach is an accepted
and published work [Pub3]. We also designed a digital forensic architecture for
cloud computing systems, which may be useful to the digital forensic research community
for the design and development of new forensic tools in the area of cloud forensics.
The digital forensic architecture we proposed for the cloud environment is under review
for publication as [Pub5]. In the next chapter, we discuss the methods of cloud data
acquisition and analysis.
Chapter 5
Digital Forensic Methods for Cloud
Data Acquisition and Analysis
“Attribution is an enduring problem when it comes to forensic investigations. Computer
attacks can be launched from anywhere in the world and routed through multiple hijacked
machines or proxy servers to hide evidence of their source. Unless a hacker is sloppy
about hiding his tracks, it’s often not possible to unmask the perpetrator through digital
evidence alone.”
- Kim Zetter
5.1 Introduction
In Chapter 4, we proposed a “Heuristic approach for performing digital forensics in the
cloud computing environment”, framing its phases along the lines of the existing forensic
frameworks. We also designed a “Digital forensic architecture for the cloud computing
systems” to support the design and development of new forensic tools for the analysis of
cloud crimes. In this chapter, we introduce the digital forensic methods for acquiring and
analyzing cloud data. All the acquisition and analysis methods we suggest will work for
any type of private cloud computing environment, such as Eucalyptus [13], OpenNebula [28],
OpenStack [29], etc. In the case of the public cloud, the analysis methods remain
the same, but not the acquisition: the investigator will have to depend on the cloud service
provider for data acquisition.
There is no existing digital forensic solution (or toolkit) that can be used on cloud
platforms to collect the cloud data, segregate the multi-tenant data, and perform
partial analysis on the collected data so as to minimize the overall processing time of the
cloud crime evidence. Inspired by the work of Dykstra and Sherman [56], we propose
modules for the implementation of data collection and segregation, and modules for partial
analysis of evidence both within (virtual hard disk, physical memory of a VM) and outside
(cloud logs) the cloud environment.
The approach we suggest for data segregation (of cloud logs) provides a software
client that supports the collection of the cloud evidentiary data (forensic artifacts) without
disrupting other tenants. To minimize the processing time of the digital evidence, we
propose solutions for an initial forensic examination of the virtual machine’s data (virtual
hard disk, virtual physical memory) in the places where digital evidence artifacts are
most likely to be present. Since the case under investigation is understood better, considerable
time is saved that can be utilized for further analysis; hence, the investigation process may
take less time than it otherwise would.
For proof-of-concept and experimentation, we use the “Conceptual architecture of the
private cloud IaaS” test bed, which was set up using the OpenStack cloud solution
(Icehouse version, Figure 4.6).
5.2 Digital Evidence Source Identification, Data Segregation and Acquisition
In this section, we discuss the identification of evidence sources, the segregation of the
cloud logs after identification, and the acquisition of the identified virtual machine’s data
along with the segregated log data.
5.2.1 Identification of the Evidence
On traditional desktop systems, any information, including data related to system
activity, is stored as files. Depending on the nature of the computer crime, files are
retrieved from storage and parsed to investigate the cause of the crime. Similar
to a desktop machine, a cloud user can create and run virtual machines in the cloud
environment. Such a virtual machine is as good as a physical machine, and its activity and
management create a lot of data in the cloud. The data created by a virtual machine includes
the virtual hard disk (a file with the extension .qcow2 in the case of the OpenStack cloud), the
physical memory of the VM, and the logs. Virtual hard disk formats that different cloud
providers may support include .qcow2, .vhd, .vdi, .vmdk, .img, etc. The virtual hard disk
file is available on the compute node where the corresponding virtual machine runs.
Every cloud provider may have its own mechanism for service logs (activity maintenance
information), and hence there is no interoperability of log formats among the cloud
providers. In OpenStack, the cloud logs are spread across the controller and compute
nodes.
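As an illustration of where this evidence lives, the following sketch walks a compute node's instance store looking for candidate virtual disk files. The default path /var/lib/nova/instances holds for a stock OpenStack install, but it is an assumption the operator may have changed:

```python
import os

# Default nova instance store (an assumption; configurable by the operator).
INSTANCE_ROOT = "/var/lib/nova/instances"
DISK_EXTENSIONS = (".qcow2", ".vhd", ".vdi", ".vmdk", ".img", ".raw")

def find_virtual_disks(root=INSTANCE_ROOT):
    """List candidate virtual hard disk files, grouped by the
    instance directory (named after the Instance ID) they live in."""
    disks = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(DISK_EXTENSIONS):
                instance = os.path.basename(dirpath)
                disks.setdefault(instance, []).append(
                    os.path.join(dirpath, name))
    return disks
```

Grouping by directory lets the investigator map each disk file back to the Instance ID it belongs to.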
5.2.2 Segregation of the Evidence
A cloud computing platform is a multi-tenant environment in which end users share the
cloud resources, and log files store the activities of the cloud computing services. These
log files cannot be provided to the investigator and/or cloud user for forensic activity, owing
to the privacy of other users in the same environment. Dykstra and Sherman [56]
suggested a tree-based data structure called a “hash tree” to store API logs and firewall
logs. Since we have not modified any of the OpenStack service modules, we implemented
a different logging approach known as the “shared table” database. In this
approach, a script runs on the host servers where the different services of OpenStack are
installed (for example, the nova service). This script mines the data from all the log files
and creates a database table. This table contains the data of multiple tenants, and the
key to uniquely identify a record is the “Instance ID”, which is unique to a virtual machine.
The cloud user and/or the investigator, with the help of the cloud administrator, can then
query the database for any specific information from a remote system, as explained in the
next section.
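A simplified sketch of such a segregation script is shown below. It uses SQLite in place of MySQL so the sketch is self-contained, and the log-line layout assumed by the regular expression is modelled on the "[instance: <uuid>]" tag in OpenStack Icehouse nova logs; it would need adapting to each service's actual format:

```python
import re
import sqlite3

# Assumed line layout: "<date> <time> <pid> LEVEL service ... [instance: <uuid>] msg"
INSTANCE_RE = re.compile(
    r"^(?P<ts>\S+ \S+)\s+\d*\s*(?P<level>[A-Z]+)\s+(?P<service>\S+).*?"
    r"\[instance: (?P<instance_id>[0-9a-f-]{36})\]\s*(?P<message>.*)$")

def build_shared_table(log_lines, db_path=":memory:"):
    """Mine service log lines into a single 'shared table' keyed by
    Instance ID, so one tenant's records can later be filtered out
    without exposing the raw multi-tenant log files."""
    db = sqlite3.connect(db_path)
    db.execute("""CREATE TABLE IF NOT EXISTS servicelogs
                  (ts TEXT, level TEXT, service TEXT,
                   instance_id TEXT, message TEXT)""")
    for line in log_lines:
        m = INSTANCE_RE.match(line)
        if m:  # lines without an instance tag are not tenant-specific
            db.execute("INSERT INTO servicelogs VALUES (?,?,?,?,?)",
                       (m["ts"], m["level"], m["service"],
                        m["instance_id"], m["message"]))
    db.commit()
    return db
```

Because every stored row carries the Instance ID, a single tenant's records can be selected without ever handing over the underlying log files.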
Table 5.1 shows the paths of the cloud service logs on the controller and compute
servers.
Table 5.1: Details of the OpenStack cloud service logs [30]

Service Name | Hosting Server | Location | Description
Dashboard (Horizon) | Controller node | /var/log/apache2 or /var/log/httpd | Contains access logs (all attempts to access the web server) and error logs (all unsuccessful attempts to access the web server, along with the reason for failure).
Compute Management (Nova logs) | Controller node, Compute node | /var/log/nova/ | For virtual machine management, OpenStack runs many services such as API, scheduler, network, token authentication, etc. on the controller as well as the compute node. The logs of these services are created in this directory.
Block Storage (Cinder logs) | Controller node | /var/log/cinder/ | The log file of each block storage service is stored as api.log, cinder-manage.log, scheduler.log and volume.log.
Virtualization (KVM) | Compute node | /var/log/libvirt/ | Logs the activities of all the virtual machines, including the services of the hypervisor.
5.2.3 Acquisition of the Evidence
We designed a generic architecture for cloud forensics and tested the forensic methods
we implemented in the private cloud deployment using OpenStack. The tools designed and
developed for data collection and partial analysis run on the investigator’s workstation,
whereas the data segregation tool runs on the cloud hosting servers where the log files
are stored. A generic view of the investigator’s interaction with the private cloud platform
is shown in Figure 5.1.
Figure 5.1: Remote data acquisition in the private cloud data center
Virtual Disk Data Acquisition:
To acquire a forensic image of the virtual hard disk of a VM from a remote system,
the investigator can make use of the concepts suggested by Dykstra and Sherman [55]
or use traditional file transfer applications like WinSCP [44], PuTTY [33], etc. To
preserve the integrity of the collected data, its hash value must be computed irrespective
of the data collection method used. The creation of a virtual machine in the OpenStack
cloud creates a directory named with the “Instance ID” on the compute node, as shown in
Figure 5.2. This directory contains the virtual hard disk file (disk.qcow2) of the virtual
machine, as shown in Figure 5.3, which has to be acquired for analysis. The ‘examination
and partial analysis’ methods will be used by the investigator at the scene of crime to
extract the forensic artifacts from this file.
Figure 5.2: Directory of virtual machine instances in the OpenStack cloud
Figure 5.3: Virtual hard disk location in the OpenStack cloud
Virtual Machine’s Memory Data Acquisition:
Acquisition of the physical memory (RAM) data is only possible while the virtual
machine is running. To acquire the VM’s physical memory data, the investigator can
use traditional digital forensic tools such as FTK Imager [18], LiME (Linux Memory
Extractor) [25], Memoryze [26], etc. The acquisition tool has to be injected into the virtual
machine whose physical memory data is to be collected. The investigator cannot
preserve the integrity of the acquired physical memory data, owing to its volatile nature.
Physical memory analysis may help the investigator complete the investigation
process, but may not stand in the court-of-law because the integrity of the data cannot be
demonstrated.
Log Data Acquisition:
The segregated log data is collected using the investigator’s workstation, i.e., a computing
device where the acquisition and partial analysis tools are deployed. We created a
MySQL database named logdb, with a table servicelogs, on the controller node of
OpenStack, where most of the logs are present. The application screenshots for connecting
to the database from the investigator’s machine and viewing the table content are shown
in Figure 5.4 and Figure 5.5 respectively. The investigator (in the presence of the cloud
admin) can go through the table content and form a query based on ATTRIBUTE,
CONDITION (==, !=, &lt;, &lt;=, &gt;, &gt;=), and VALUE to filter the evidence required and, if
necessary, download it to the investigator’s workstation, as shown in Figure 5.5.
Figure 5.4: Connecting to cloud hosting server that stores the shared table database
Figure 5.5: Shared table with different attribute information
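A hedged sketch of how such an ATTRIBUTE/CONDITION/VALUE query could be built safely is shown below (again using SQLite in place of MySQL; the attribute names are assumptions based on our shared-table layout). Whitelisting the attribute and operator and binding the value keeps a malformed query from exposing other tenants' records:

```python
import sqlite3

# Whitelists mirroring the ATTRIBUTE / CONDITION / VALUE query form.
ATTRIBUTES = {"ts", "level", "service", "instance_id", "message"}
CONDITIONS = {"==": "=", "!=": "!=", "<": "<", "<=": "<=", ">": ">", ">=": ">="}

def filter_evidence(db, attribute, condition, value):
    """Build a parameterized SELECT over the shared servicelogs table
    from the investigator's attribute/condition/value triple."""
    if attribute not in ATTRIBUTES or condition not in CONDITIONS:
        raise ValueError("attribute or condition not allowed")
    sql = "SELECT * FROM servicelogs WHERE {} {} ?".format(
        attribute, CONDITIONS[condition])
    return db.execute(sql, (value,)).fetchall()
```

Only the filtered rows, not the whole table, need to be downloaded to the investigator's workstation.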
The data segregation method for logs can be applied to any private cloud deployment,
provided the data segregation tool is adapted to the log format of the cloud service
provider. The log data acquisition method we suggest is generic and can scale to any
cloud deployment.
5.3 Examination and Partial Analysis of the Evidence
Evidence examination is the digital forensics process in which data is extracted from
the forensic image for further analysis. Analysis is the process that applies a set of methods
to the forensic data extracted in the examination phase, such as anomaly detection, correlation,
user profiling and timeline analysis, to generate the analysis results. The evidence
examination and analysis approaches of traditional digital forensics are not directly
applicable to cloud data because of virtualization and multi-tenancy. “Digital forensic
triage” is required to enable the cybercrime investigator to judge whether a case warrants
investigation. Digital forensic triage is a technique used in selective data acquisition
and analysis to minimize the processing time of the digital evidence; we cover it in more
detail in Chapter 6. In the following sections, we present the methods of evidence
examination and partial analysis required for virtual machine data.
5.3.1 Within the Virtual Machine
Hard disk capacity has grown in proportion to the use of computers, and with the
emergence of cloud computing, disk space has become virtually unlimited for end users
(for example, VMware provides a datastore of size 62TB [41]). In this scenario, without
a baseline understanding and an overview of the disk space, the investigator may end up
examining the evidence without finding any useful evidentiary results related to the crime.
Hence, by applying the examination and partial analysis phase at the scene of crime to
different parts of the evidence, we provide the investigator with a knowledge base of the
file system metadata, the content of logs (for example, the content of registry files in
Windows), and the internals of the physical memory. With this knowledge base, the
investigator gains an in-depth understanding of the case under investigation and may save
a considerable amount of valuable time, which can be utilized for further analysis.
Examination of File System Metadata:
Once the forensic image of the virtual hard disk is obtained on the investigator’s workstation,
the examination of the file system metadata or logs (for example, the registry files in
Windows) is started, as shown in Figure 5.6. Before using the system metadata extractor
or an OS log analyzer (for example, a Windows registry analyzer), the investigator has
to mount the acquired virtual disk (a .qcow2 file in our case) as a virtual drive. We
used a tool called “Mount Image Pro” from GetData software solutions [27] for virtual
disk mounting; after mounting, the virtual disk acts like the drive where it is mounted.
Presently, “Mount Image Pro” does not support the .qcow2 virtual disk format, so we
converted the .qcow2 format to raw format to mount it. For the conversion, we used the
QEMU disk image utility qemu-img [35], an example of which is shown below.
$ qemu-img convert -f qcow2 -O raw windows7.qcow2 windows7.img
This command converts a qcow2 image file named windows7.qcow2 to a raw (img)
image file named windows7.img.
Figure 5.6: Virtual disk examination process
We used the free and open source software AWStats [5] for analyzing the logs of open source operating systems. The system metadata extractor shown in Figure 5.7 is used to list the metadata of the files and folders in the different partitions of the virtual hard disk. For example, on a machine where NTFS is the file system, we extracted metadata of files/folders such as the MFT record number, active/deleted status, file/folder flag, filename, file creation date, file accessed date, etc., as shown in Figure 5.8. This report may differ for other file systems (FAT32, EXT3, HFS, etc.).
Figure 5.7: File system metadata extractor
Figure 5.8: File system metadata extractor report
We used the Python programming language to implement the graphical user interface. To extract the MFT system file from the NTFS partition of a virtual machine's virtual disk file, we used FGET (Forensic Get) [15]. To parse the extracted MFT file, we used analyzeMFT.py [3], a Python script that can be effectively used for this purpose.
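The per-record metadata listed above is read from the fixed-size FILE record headers inside the MFT. As a rough illustration (not the thesis tool itself, which relies on analyzeMFT.py), the following sketch decodes the record number and the in-use/directory flags from a single 1024-byte MFT record; the field offsets assume an NTFS 3.1 volume.

```python
import struct

def parse_mft_record_header(record: bytes):
    """Decode a few fields from one NTFS MFT FILE record header.

    Illustrative sketch only: offsets assume an NTFS 3.1 volume and a
    standard 1024-byte record.
    """
    if record[0:4] != b"FILE":
        raise ValueError("not a FILE record")
    flags = struct.unpack_from("<H", record, 0x16)[0]      # 0x01 in use, 0x02 directory
    record_no = struct.unpack_from("<I", record, 0x2C)[0]  # MFT record number
    return {
        "record_no": record_no,
        "in_use": bool(flags & 0x01),   # cleared for deleted files
        "is_dir": bool(flags & 0x02),   # set for folders
    }
```

Iterating such a parser over every 1024-byte slice of the extracted $MFT yields the active/deleted and file/folder columns of the report in Figure 5.8.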
Examination of the cloud VM’s registry files:
Like traditional desktop systems, cloud virtual machines have registry files (or logs). The Windows operating system stores its configuration data in the registry, which makes it most important from a digital forensics perspective. The registry is a hierarchical database that can be described as a central repository for configuration data (i.e., it stores several elements of information including system details, application installation information, networks used, attached devices, history lists, etc.) [62]. Registry files are user specific and their location depends on the version of the operating system (Windows 2000, XP, 7, 8, etc.). The important registry files in Windows include USER.DAT, SYSTEM.DAT, CLASSES.DAT, NTUSER.DAT, USRCLASS.DAT, etc.
The GUI of the Windows Registry Analyzer is also built using the Python programming language. To read the content of a registry file, it first has to be extracted from the virtual disk file; for this we again used FGET (Forensic Get) [15]. To parse the extracted registry file, we used a Python library called ‘Python-Registry’ [34]. To retrieve specific information from a registry file, the investigator needs to choose the mount point of the virtual disk, the operating system, the user, and the element of information to be retrieved, as shown in Figure 5.9.
Figure 5.9: Cloud VM’s registry analyzer
A sample report generated with the system information, the application information, the attached devices and the history list is shown in Figure 5.10.
Figure 5.10: Cloud VM’s registry analyzer report
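The mapping from a selected “element of information” to a concrete hive and key path can be kept in a small lookup table. The sketch below is illustrative only: the key paths are common Windows 7-era locations, not necessarily the exact ones our analyzer queries, and it uses the third-party ‘Python-Registry’ library [34] (imported lazily, so the lookup table is usable without it installed).

```python
# Illustrative mapping from report sections to (hive file, key path).
# These are common Windows 7-era locations, not necessarily the exact
# keys queried by the thesis tool.
REGISTRY_QUERIES = {
    "System information":      ("SYSTEM",     r"ControlSet001\Control\ComputerName\ComputerName"),
    "Application information": ("SOFTWARE",   r"Microsoft\Windows\CurrentVersion\Uninstall"),
    "Attached devices":        ("SYSTEM",     r"ControlSet001\Enum\USBSTOR"),
    "History list":            ("NTUSER.DAT", r"Software\Microsoft\Windows\CurrentVersion\Explorer\RunMRU"),
}

def read_registry_values(hive_path, key_path):
    """Return (name, value) pairs for every value under key_path in the hive."""
    # Third-party dependency 'python-registry' [34]; imported lazily so the
    # lookup table above can be used without the library installed.
    from Registry import Registry
    reg = Registry.Registry(hive_path)
    key = reg.open(key_path)
    return [(v.name(), v.value()) for v in key.values()]
```

Given the user and hive selected in the GUI, the analyzer resolves the requested section through such a table and dumps the resulting values into the report of Figure 5.10.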
Examination of the physical memory of a cloud VM:
Physical memory (RAM, also called volatile memory) contains a wealth of information about the running state of a system, such as running and hidden processes, maliciously injected code, the list of open connections, command history, passwords, clipboard content, etc. We used Volatility 2.1 [43] plugins to capture some of this important information from the physical memory of the virtual machine, as shown in Figure 5.11.
Figure 5.11: Selective memory analysis
A selective memory analysis report of the running processes, hidden processes and command history is shown in Figure 5.12.
Figure 5.12: Selective memory analysis report
Figure 5.13: Selection of keyword option for searching
Apart from the selective memory analysis, we implemented a multiple-keyword search using the Boyer-Moore [50] pattern matching algorithm; the implementation details are provided in the next section. To search only for a set of keywords, the investigator can select the Keyword option as shown in Figure 5.13, and then enter the keywords in double quotes, separated by commas, as shown in Figure 5.14.
Figure 5.14: Entering multiple keywords for search (indexing)
For searching patterns, we implemented a regular expression (RE) search technique for URLs, phone numbers, email IDs and IP addresses. To search only for patterns, the investigator can select the RE option as shown in Figure 5.15, and then choose the patterns to be searched as shown in Figure 5.16.
Figure 5.15: Selection of RE option for searching
Figure 5.16: Selecting multiple patterns for search (indexing)
We implemented the search engine in C# on the .NET platform. The GUI provides the option of searching for either keywords or patterns. For pattern matching, we used the Match() method of the ‘Regex’ class as follows:
Match i = Regex.Match(input, pattern);
where
input - the text in which the pattern is to be searched;
pattern - the regular expression (URL, phone number, email ID or IP address);
i.Value - contains the matched pattern for the given regular expression (for example, www.microsoft.com for a URL);
i.Index - contains the file offset of the matched pattern.
The regular expressions used for the corresponding patterns are listed in Table 5.2.

Table 5.2: Regular expressions used for corresponding patterns
Keyword        Regular expression (Indian context)
Email ID       [A-Za-z0-9. -]+@[A-Za-z0-9]+.(com|in|net|edu)
URL            [[w]{3}]?.[A-Za-z0-9]+.[[gov.|co.|nic.]?in|com|edu|net]
IP address     [0-1]{8}.[0-1]{8}.[0-1]{8}.[0-1]{8}
Mobile number  [91|0]?[7-9][0-9]{9}

Whether it is a keyword or a pattern, the generated report contains the file offset of the given keyword or pattern, as shown in Figure 5.17.
Figure 5.17: Memory analysis report (result of keywords or pattern matching search)
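In Python terms (as opposed to the C# engine used in our implementation), a pattern search over a memory dump reduces to scanning the bytes with a compiled regular expression and reporting each match together with its file offset, mirroring i.Index and i.Value above. The email pattern below is a simplified illustrative stand-in, not the exact expression of Table 5.2.

```python
import re

# Illustrative email pattern restricted to a few TLDs, in the spirit of
# Table 5.2; simplified, not the exact expression used by the thesis tool.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.(?:com|in|net|edu)")

def find_patterns(data: bytes, regex):
    """Return (file offset, matched bytes) pairs for every hit in the dump."""
    return [(m.start(), m.group()) for m in regex.finditer(data)]
```

For example, `find_patterns(b"contact: abc@example.com end", EMAIL_RE)` yields `[(9, b"abc@example.com")]`, which corresponds to one row of the report in Figure 5.17.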
5.3.2 Boyer-Moore (BM) Algorithm
Boyer-Moore is an efficient string searching algorithm which is also a standard benchmark in the practical string search literature [85]. It works by calculating shift values for the characters of a pattern (or keyword), which are used in case of a bad match. The naive string matching algorithm shifts the pattern by one position every time a bad match occurs; the shift values in the Boyer-Moore algorithm prevent this [50]. The worst-case running times of the Boyer-Moore and naive algorithms are nevertheless the same (i.e., O(mn), where ‘m’ is the length of the pattern and ‘n’ the length of the text). In practical situations, however, the Boyer-Moore algorithm is vastly superior.
The algorithm works in two phases - the pre-processing phase and the searching phase. In the pre-processing phase, it builds the shift tables (bad character and good suffix), which contain the number of characters to shift by when a mismatch occurs during the search for the pattern (or keyword). These tables are built from the alphabet of the keyword. In the searching phase, it scans the characters from right to left looking for a match. In case of a mismatch, it uses the bad character and good suffix tables to shift the keyword by more than one character to the right.
Pre-processing Phase of Boyer-Moore Algorithm:
Bad Character Rule (BCR): this rule is used to build the bad character (BC) shift table of a pattern ‘P’. On a mismatch against a text character c, the pattern is shifted so that the rightmost occurrence of c in ‘P’ (to the left of the mismatch position) aligns with c in the text; if c does not occur in ‘P’, the pattern is shifted completely past c.
Good Suffix Rule (GSR): this rule calculates the shift values based on how many characters were matched successfully before a mismatch (i.e., it uses the knowledge of the matched suffix of the pattern). It builds the shift table of a pattern ‘P’ called the good suffix table, aligning the matched suffix with its next occurrence in ‘P’ or, failing that, with the longest prefix of ‘P’ that matches a suffix of the match. In the search algorithm, the shift value used is the largest of the values produced by Case 1, Case 2 and Case 3.
Searching Phase of Boyer-Moore Algorithm:
The Boyer-Moore search algorithm uses the precomputed shift values from the bad character and good suffix tables to avoid the single-character shifts of the naive algorithm. In case of a mismatch during the search, it shifts by the maximum of the values given by the good suffix rule and the bad character rule. The Boyer-Moore pattern matching algorithm to search given patterns (or keywords) in a text ‘T’ is given as Algorithm 1.
Algorithm 1: Boyer-Moore pattern matching algorithm [50, 58]
Input:
    T: an array of characters (the text in which keywords will be searched);
    P: an array of characters (holds a keyword to be searched in T);
Result: a file containing <keyword, file offset> pairs
Initialization:
    k: number of keywords; m: length of the keyword; n: length of the text T;
    q ← m;
while k-- do                               // for each keyword
    while q < n do
        j ← m; l ← q;
        while j > 0 and P[j] == T[l] do
            j ← j - 1; l ← l - 1;
        end
        if j == 0 then
            keyword found at file offset q;
            write the name of the keyword and the file offset to a file;
            q ← q + m - l(2);
        else
            q ← q + Max(L[j], l(j), j - BC[j]);
        end
    end
end
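A runnable counterpart to Algorithm 1 can be sketched in Python. For brevity this version implements only the bad character rule (the good suffix table of Algorithm 1 is omitted), so its shifts are no larger than those of the full algorithm while the matches found are identical.

```python
def bm_search(text: str, pattern: str):
    """Return all offsets of pattern in text, Boyer-Moore style.

    Simplified sketch: only the bad character rule is used; Algorithm 1
    additionally consults a good suffix table for larger shifts.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad character table: rightmost position of each character in the pattern.
    last = {c: i for i, c in enumerate(pattern)}
    hits, q = [], 0                       # q = current alignment of the pattern
    while q <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[q + j]:
            j -= 1                        # scan right to left, as in Algorithm 1
        if j < 0:
            hits.append(q)                # full match at offset q
            q += 1
        else:
            # Shift so the rightmost occurrence of the bad character aligns,
            # never moving the pattern backwards.
            q += max(1, j - last.get(text[q + j], -1))
    return hits
```

For example, `bm_search("here is a simple example", "example")` returns `[17]`.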
5.3.3 Outside the Virtual Machine
The data that resides outside a virtual machine consists of the logs related to the cloud services required to run and manage the virtual machine’s activity. The forensic process used to acquire and analyze data that is not within the virtual machine (i.e., that cannot be accessed from the guest operating system running in the VM) can be a remote application, or an application running alongside the cloud hosting services under the control of the host operating system of the cloud server. In the previous section, we discussed a script that runs on the cloud hosting server to segregate cloud log data with respect to a service or a virtual machine instance ID. The remote application that we developed (Data Extractor) provides a query-based facility to collect the log data of a virtual machine within a cloud platform from a remote machine (the investigator’s workstation), as depicted in Figure 5.5.
5.4 Summary
Along with new digital forensic frameworks and architectures for cloud computing platforms, there is an immediate requirement for new digital forensic methods that can scale to cloud data when handling the analysis of cloud crimes. In this Chapter, we proposed methods for data collection and segregation, and methods for the partial analysis of the evidence within and outside a virtual machine present in a cloud platform. In particular, to minimize the processing time of the digital evidence of a reported cloud crime, we proposed methods of examining the virtual machine’s data in the places where important evidentiary data is most likely to be present. The results of our findings in the examination and partial analysis phase are provided to the investigator for further analysis. This helps the investigator to know the location and presence of important artifacts in the evidence under investigation. The methods we proposed for data collection, segregation and partial analysis for cloud forensics are under review [Pub5]. In the next Chapter, we will demonstrate the application of digital forensic triage in the examination and partial analysis phase of cloud forensics.
Chapter 6
Digital Forensic Triage in the
Examination and Partial Analysis
“The term ‘triage’ normally means deciding who gets attention first.”
- Bill Dedman
6.1 Introduction
In Chapter 5, we proposed various methods for data collection, segregation and partial
analysis for cloud forensics. The proposed partial analysis methods were part of the
examination and partial analysis phase of our cloud forensic framework. In this Chapter,
we use the concept of digital forensic triage to examine and partially analyze the cloud
data under investigation using a parallel processing framework to find the evidence of
interest to the investigator in real time.
The traditional digital forensic approach to investigation (seizing, imaging, and analysis) is no longer applicable to large-scale data examinations [79]. The capacity of storage media has increased at such a rate that traditional digital forensic investigators are unable to keep pace. In parallel, the capacity of the virtual disk volumes provided to a VM in the cloud environment has also increased (for example, VMware provides a datastore of size 62 TB [41]). In this scenario, the investigator may need to speed up the investigation process while dealing with specific cloud crime cases such as murder, missing persons, child abductions, death threats, etc. Motivated by this, we used the concept of digital forensic triage to implement a ‘real-time digital forensic analysis process’ that searches for user-specified keywords or patterns in real time in the given evidence file to minimize the overall processing time. The digital forensic triage approach we designed uses MapReduce with built-in KMP (Knuth-Morris-Pratt) and Boyer-Moore string search algorithms on a distributed computing platform, which indexes the given keywords in real time depending on the computing nodes deployed for the computation. For searching patterns, we implemented a regular expression search for URLs, phone numbers, email IDs and IP addresses without using any specific algorithm. The index of the keywords or patterns is utilized by the investigator in the analysis phase to speed up the overall analysis process. The regular expressions used for the corresponding patterns are listed in Table 5.2. We have already discussed the working of the Boyer-Moore algorithm in Chapter 5; hence, we will elaborate on the working of KMP in this Chapter.
6.2 Digital Forensic Triage
6.2.1 Introduction to Triage and Background
Triage is defined in Oxford English dictionary as “The process of determining the most
important things from amongst a large number that require attention” [32]. Roussev et
al. [80] defined digital forensic triage as “a partial forensic examination conducted under
(significant) time and resource constraints”. Also, they have pointed out that the ability
of traditional digital forensic tools to employ a bigger ‘computational hammer’ has not
grown appreciably. We experimented on data acquisition and analysis (particularly indexing) with traditional digital forensic tools and obtained the results shown in Table 6.1.
From our results, we conclude that the actual processing time (which involves indexing, carving files and analysis) of digital evidence is always greater than the acquisition time. The total time required to complete a forensic investigation is the sum of the acquisition time and the processing time. Thus:
Total time = Acquisition time + Processing time
Table 6.1: Report of acquisition and indexing time using traditional digital forensic tools

Disk size (GB)   Tool used (hardware/software)                   Acquisition time (min)   Indexing time (min)
40, 80, 160      Tableau Forensic Duplicator Model TD1,          21, 47, 116              NA
                 S/W: 01d11068, F/W: 2.39
40, 80, 160      Logicube Talon, S/W: V2.43, F/W: V3.01          22, 46, 119              NA
40, 80, 160      FTK V5.2                                        NA                       19, 51, 118
where total time is the time taken to complete the investigation (i.e., from evidence acquisition to reporting).
MapReduce is a parallel programming model used to process and generate large data sets, and it is applicable to a broad variety of real-world problems. The digital forensic triage required in cloud computing data analysis is one such problem. In the next section, we use this parallel programming framework to design and implement a digital forensic triage for cloud data analysis which can speed up the overall processing of digital evidence in a cloud crime investigation.
6.2.2 Parallel Processing Framework using Hadoop
MapReduce is a software framework for easily running applications which process large amounts of data in parallel on large clusters of thousands of commodity hardware nodes in a reliable, fault-tolerant manner. MapReduce is a fundamental building block of the Hadoop framework [23]. With the help of MapReduce and other components such as HDFS, Mahout, Sqoop, Pig, Hive, ZooKeeper, HBase, etc., the Hadoop framework provides massively parallel processing. The programmer is completely abstracted from the details of parallelization, fault tolerance, locality optimization, load balancing, etc. In the MapReduce programming model, processing takes place where the data is (i.e., the computation goes to the data rather than the data coming to the program) [54].
MapReduce takes advantage of the parallelism provided within the Hadoop framework for efficient and fast processing by exposing the inherent parallelism in an application [86]. MapReduce is not suitable for all applications, but when it fits, it may save a huge amount of processing time. MapReduce has two phases - the Map phase and the Reduce phase - and any MapReduce application provides two functions, Map and Reduce, whose inputs are <key, value> pairs. An example of the MapReduce framework for a word count application is given in Figure 6.1.
Figure 6.1: MapReduce application framework to count distinct words of a file
As shown in Figure 6.1, the input file data is divided into parts and sent to different Mapper processes (three in our case). Each Mapper process produces <key, value> pairs. The Reduce functions (two in our case) receive these <key, value> pairs and sum the values per key to produce the output as <key, value> pairs.
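The word count flow of Figure 6.1 can be simulated in a few lines of plain Python (no Hadoop), with one map call per input split followed by an explicit shuffle and reduce step:

```python
from collections import defaultdict

def map_fn(split: str):
    """Map phase: emit a <word, 1> pair for every word in one input split."""
    return [(word, 1) for word in split.split()]

def word_count(splits):
    """Shuffle the intermediate pairs by key, then reduce each group by summing."""
    groups = defaultdict(list)
    for split in splits:                  # in Hadoop, the splits run on different nodes
        for key, value in map_fn(split):
            groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}
```

For example, `word_count(["deer bear river", "car car river", "deer car bear"])` yields `{"deer": 2, "bear": 2, "river": 2, "car": 3}`.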
6.3 Real-time Digital Forensic Analysis Process
6.3.1 Selection of the Pattern Matching Algorithm
The task before us was to select an efficient algorithm for MapReduce to search user-specified keywords in an evidence file. We tested two well-known string matching algorithms (Boyer-Moore and KMP) for this requirement. The performance of the two algorithms is very similar except when searching keywords of different lengths: Boyer-Moore [50, 58] is more appropriate when the keyword is very long, while KMP [65] outperforms other string search algorithms for shorter keywords. We conducted experiments on the execution time of both algorithms with different keyword lengths. The experiment was carried out (results are shown in Table 6.2) on a single-node Hadoop cluster over 1024 MB of plain text data using MapReduce [54]. Based on the experimental results, we decided to implement both algorithms, so that, depending on the length of the keyword, the appropriate algorithm is called during execution.
Table 6.2: Execution time of Boyer-Moore and KMP algorithms with multiple keywords

              Boyer-Moore algorithm         KMP algorithm
              length = 4   length = 8       length = 4   length = 8
1 keyword     17 sec       11 sec           15.5 sec     12 sec
3 keywords    20.5 sec     13.5 sec         18 sec       14.5 sec
5 keywords    23 sec       17.5 sec         21.5 sec     18 sec
6.3.2 Proposed System Architecture
For experimental purposes, we set up a Hadoop cluster of eight nodes, each with the hardware configuration shown in Table 6.3.
Table 6.3: Hardware configuration of a node in the Hadoop cluster

Processor            Intel Core i7-4770K
Clock (GHz)          3.5
Number of cores      4
Number of threads    8
RAM (GB)             8
Cache (MB)           8
Hard disk (GB)       1024
The versions of the software used are Apache Hadoop 2.2.0 and Ubuntu 12.04. In our proposed architecture, we used only two components of the Hadoop framework, the Hadoop Distributed File System (HDFS) and MapReduce, as shown in Figure 6.2. HDFS has a master/slave architecture where one of the nodes acts as the master (NameNode) and the rest become slaves (DataNodes). The master node runs Hadoop components such as the DataNode, NameNode, JobTracker, TaskTracker and the secondary NameNode; the slave nodes, on the other hand, run only the DataNode and TaskTracker components. Each DataNode manages the storage attached to it, while the master is responsible for managing and storing the metadata of all the files on the different DataNodes [23].
Figure 6.2: Mapping of Hadoop framework components to forensic triage [23]
The EditLog and FsImage are data structures (files) of HDFS. Every transactional change to the file system metadata is recorded in the EditLog, while the file system namespace details, such as the mapping of blocks to files and the file system properties, are stored in the FsImage. For reliability, the data is divided into blocks and distributed over multiple DataNodes, including the master. The number of copies of each block of a file is called the ‘replication factor’, and its value is 3 by default. The JobTracker that runs on the master node assigns the MapReduce tasks to the TaskTrackers. It also computes the pre-processing tables (required to run KMP) and the shift tables (required to run Boyer-Moore) of all keywords and distributes them to the TaskTrackers. The TaskTracker that runs on every DataNode is responsible for running the MapReduce functions, which produce the final index offsets of each keyword after searching. The mapping of Hadoop framework components to forensic triage is shown in Figure 6.2.
6.3.3 Proposed System Implementation Details
The proposed ‘real-time digital forensic partial analysis process’ consists of the following four steps:
Step 1: Select a VM’s virtual disk file acquired using forensically sound data acquisition techniques.
Step 2: Distribute parts of the data selected in Step 1 to the Hadoop cluster for real-time data processing.
Step 3: Run the KMP (or Boyer-Moore)/regular expression based MapReduce job on the Hadoop cluster to search the user-specified keywords or patterns in each part of the data, and aggregate the results of all the parts.
Step 4: Based on the aggregated result, the investigator decides whether or not to process the evidence further.
The proposed system shown in Figure 6.3 includes all of these steps. We implemented the KMP/BM search (for keywords) and the regular expression search (for patterns) within the Map function of MapReduce in the Java programming language. Our implementation uses the Hadoop library ‘hadoop-0.18.3-core.jar’ for a few built-in classes and functions. The Map function reads one line at a time from a part (block) of the evidence file (.vmdk, .vhd, .vdi, .qcow2, .img, etc.) and calls the KMP (or Boyer-Moore)/regular expression search for the given set of keywords/patterns. If there is a hit for a keyword/pattern, the search provides a local offset, and a built-in ‘Reporter’ facility provides the global offset corresponding to each local offset. These global offsets are collected by the ‘OutputCollector’ as intermediate records and written to a file. This process continues until all the lines of the different parts of the evidence file have been searched. The number of files containing the intermediate records depends on the number of Map functions configured (we set it to 16). The Reduce functions (we set their number to 2) take as input all the files created by the mappers and provide a merged result. The resulting keyword/pattern offsets can be used for further analysis.
Figure 6.3: Proposed system for ‘real-time digital forensic partial analysis’ using MapReduce with KMP/BM search engine
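In outline, each Map call turns block-local hit positions into global file offsets, and the Reduce step merges the per-block hit lists into one index per keyword. The Python sketch below mirrors that flow (the actual implementation is in Java with KMP/Boyer-Moore; here `str.find` stands in for the search routine):

```python
from collections import defaultdict

def map_block(block: str, block_offset: int, keywords):
    """Map phase: emit (keyword, global offset) for every hit in one block.

    str.find stands in for the KMP/Boyer-Moore routine of the real mapper;
    block_offset converts block-local offsets to global file offsets, as the
    'Reporter' facility does in our Java implementation.
    """
    hits = []
    for kw in keywords:
        i = block.find(kw)
        while i >= 0:
            hits.append((kw, block_offset + i))
            i = block.find(kw, i + 1)
    return hits

def reduce_hits(intermediate):
    """Reduce phase: merge per-block hit lists into one sorted index per keyword."""
    index = defaultdict(list)
    for kw, off in intermediate:
        index[kw].append(off)
    return {kw: sorted(offs) for kw, offs in index.items()}
```

Running `map_block` over two consecutive 8-byte blocks of a toy file and merging with `reduce_hits` produces the same <keyword, file offsets> index that the Hadoop job writes out.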
The pseudocode of the improvised KMP (multi-pattern, multi-occurrence) search algorithm [65] to be embedded in the Map function is given below as Algorithm 2, where S is the array of characters of the evidence file to be searched, H is the array of headers (keywords) that are sought, and T is a two-dimensional array in which T[h] is the array of integers precomputed for header H[h] by the KMP table building algorithm (Algorithm 3).
Algorithm 2: Improvised KMP pattern matching algorithm
Input:
    H: an array of headers (a two-dimensional array);
    T: a two-dimensional array of integers (result of the KMP table building algorithm);
    S: an array of characters (the text to be searched);
Result: a file containing <key, val> pairs as <keyword, file offset>
Initialization:
    h: number of headers; k ← h;
while h-- do
    m[h] ← 0;      // the beginning index of the current match of keyword h in S
    pos[h] ← 0;    // the position of the current character in keyword H[h]
end
while k-- do
    while m[h] + pos[h] < length(S) do
        if H[h][pos[h]] = S[m[h] + pos[h]] then
            if pos[h] = length(H[h]) - 1 then
                generate <key, value> as <H[h], m[h]>;    // keyword offset
                Update: m[h] ← m[h] + pos[h] - T[h][pos[h]];
                if T[h][pos[h]] > -1 then
                    pos[h] ← T[h][pos[h]];
                else
                    pos[h] ← 0;
                end
            else
                increment pos[h];
            end
        else
            goto Update;
        end
    end
end
Algorithm 3: KMP table building algorithm for multiple headers
Input: H: an array of headers (a two-dimensional array)
Result: populates the table T
Initialization:
    h: number of headers;
    T: a two-dimensional array of integers;
while h-- do
    T[h][0] ← -1; T[h][1] ← 0;
    pos ← 2; cnd ← 0;
    while pos < length(H[h]) do
        if H[h][pos-1] = H[h][cnd] then
            cnd ← cnd + 1; T[h][pos] ← cnd; pos ← pos + 1;
        else if cnd > 0 then
            cnd ← T[h][cnd];
        else
            T[h][pos] ← 0; pos ← pos + 1;
        end
    end
end
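A direct Python transcription of Algorithms 2 and 3 can serve as a reference implementation; a plain loop over the keyword list replaces the outer while k-- loop of Algorithm 2:

```python
def kmp_table(p: str):
    """Algorithm 3 for a single keyword: the KMP partial match (failure) table."""
    t = [0] * max(len(p), 1)
    t[0] = -1
    pos, cnd = 2, 0
    while pos < len(p):
        if p[pos - 1] == p[cnd]:
            cnd += 1
            t[pos] = cnd
            pos += 1
        elif cnd > 0:
            cnd = t[cnd]
        else:
            t[pos] = 0
            pos += 1
    return t

def kmp_search_all(s: str, keywords):
    """Algorithm 2 for a list of keywords: all offsets of every keyword in s."""
    hits = {}
    for kw in keywords:
        t = kmp_table(kw)
        offs, m, pos = [], 0, 0
        while m + pos < len(s):
            if kw[pos] == s[m + pos]:
                if pos == len(kw) - 1:
                    offs.append(m)             # emit <keyword, offset>
                    m = m + pos - t[pos]       # the 'Update' step of Algorithm 2
                    pos = t[pos] if t[pos] > -1 else 0
                else:
                    pos += 1
            else:
                m = m + pos - t[pos]           # goto Update
                pos = t[pos] if t[pos] > -1 else 0
        hits[kw] = offs
    return hits
```

For example, `kmp_search_all("ababcabab", ["abab"])` reports occurrences at offsets 0 and 5, including the overlap handled by the failure table.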
GUI Based Implementation:
As part of the automation process, a GUI was built to make it easier for the user to search for keywords/regular expressions. Since the MapReduce code depends entirely on the Mapper function, the GUI was created with the intention of automating the task of writing a Mapper function. The user is given the option of searching for either patterns or keywords. The four default regular expressions (i.e., URL, email address, mobile number and IP address) are available for selection as shown in Figure 6.4. The user can also add a pattern of his/her own choice as shown in Figure 6.5.
Figure 6.4: Default regular expressions to generate Mapper code
Figure 6.5: Adding a regular expression to generate Mapper code
AWT and Swing were used for creating the front end. A thread keeps the option selection in the GUI updated in real time. A standard template for the Mapper class is taken and modified according to the options selected in the GUI. A separate file, serving as a temporary database, is maintained to record the regular expressions the user adds. Once the user selects a set of keywords or patterns and presses the “START” button, a .jar file is created that contains the KMP (or Boyer-Moore) or regular expression based Mapper function code to search the selected keywords or patterns using the Hadoop framework.
Execution of MapReduce in Hadoop:
After compiling the application (say, KMPHadoop.java), a KMPHadoop.jar file is created that can be exported to a particular directory where it can be used as input to the Hadoop parallel framework. Once the .jar file is exported, we place it in the location where HDFS is installed for easier execution. To run the MapReduce functions containing the user logic on the Hadoop cluster, the following command can be used from the master node of the cluster:
$ bin/hadoop jar KMPHadoop.jar KMPHadoop /user/Pawar/Analysis/Ubuntu.img /user/Pawar/Analysis/output
where
KMPHadoop.jar - the jar file containing the code;
KMPHadoop - the class name of the application;
/user/Pawar/Analysis/Ubuntu.img - the input file in which keywords or patterns are searched;
/user/Pawar/Analysis/output - the output directory where the resultant file will be stored.
A successful execution of the MapReduce program on the Hadoop framework creates a file named “part-r-00000” in the output directory that contains the file offsets of the selected keywords or patterns.
6.4 Results and Discussion
To distribute the evidence file (.qcow2) over the multi-node cluster, we used the default block size (64 MB) with replication factors of 2 and 3; replication factor 3 gave better results. In our experimentation, we initially started with two nodes and gradually increased to four and eight. In the two-node setup, eight Map tasks and one Reduce task were configured; for the four- and eight-node setups, 16 Map tasks and two Reduce tasks were configured. The search time of the KMP algorithm in all three cases with a single keyword, for evidence files of different sizes, is shown in Figure 6.6.
Figure 6.6: Searching time of KMP based MapReduce with single keyword
The same experiment was carried out with multiple keywords (four) to observe the behavior of the KMP algorithm, as shown in Figure 6.7. The performance of the KMP algorithm when searching multiple keywords rather than a single keyword is far better, owing to the replication of the parts of the evidence file over the nodes of the cluster. We repeated the same experiment with one or more regular expressions. The search time of the regular expression based MapReduce function in all three cases with a single pattern, for evidence files of different sizes, is shown in Figure 6.8. Again, changing the number of patterns (to four), we ran the regular expression based MapReduce function, whose resulting search time is shown in Figure 6.9. As in the previous case, the performance of the regular expression based MapReduce function for multiple patterns over a single pattern is far better, for the same reason.
A closer look at the performance of the KMP (or Boyer-Moore) and regular expression based MapReduce reveals that keywords with regular expressions are searched faster. The reason is that keywords with regular expressions match only the exact pattern, whereas the others also match substrings. The regular expressions used for searching email IDs, URLs, IP addresses, and mobile numbers are given in Table 5.2; these regular expressions are the patterns for the respective keywords in the Indian context. The design we used supports the addition of new patterns if required by the investigator.
The approach we designed, implemented and tested could also be used for the following purposes:
• Data carving [73]
• Online social network analysis
• Screening of cloud crime cases (reducing the investigator’s backlog in the computer
forensics lab)
• The cloud crime investigator can make use of the computing facility of any cloud
provider to deliver ‘Forensics as a Service’ (FaaS) to the end-users who require the
indexes of certain keywords or patterns of the evidence
• Server log analysis for forensic purpose
Figure 6.7: Searching time of KMP based MapReduce with multiple keywords
Figure 6.8: Searching time of RE based MapReduce with single pattern
Figure 6.9: Searching time of RE based MapReduce with multiple patterns
• The generated file offsets of the keywords or patterns can be used to generate the
timeline view of the important artifacts related to the reported crime
Traditional digital forensic tools like CyberCheck, FTK and EnCase have indexing facilities. After indexing, these tools provide keyword search at the click of a button, but the time they take to index the evidence is considerably high and grows with the size of the digital media. These tools could use the output (file offsets of keywords) generated by our approach to search for the specific keywords and/or patterns related to a reported cloud crime; such a search avoids the indexing time altogether. Also, readily available file offsets of certain keywords would speed up the file carving process if used by file carving tools such as Adroit [2], F-DAC [14], foremost [16], R-STUDIO [36], etc.
With the computational facility of the eight-node cluster (results are shown in Figures 6.6 and 6.7), the investigator can determine whether an evidence file of size 1 TB contains four keywords within 90 minutes. This time can be drastically reduced to a few minutes by adding more high-end nodes to the cluster and increasing the number of Map and Reduce tasks. Hence, we call our approach of finding the user-specified keywords or patterns in the given evidence a ‘real-time digital forensic analysis process’.
6.5 Summary
Digital storage media capacity is growing at a rate comparable to Moore’s law. The cloud computing environment has added fuel to this by providing almost unlimited virtual computational facilities and storage media. This phenomenal growth increases the overall time it takes to process a typical cloud crime investigation. The increase in the time needed to completely analyze the evidence data is driving the need for additional research in this emerging area.
In this chapter, we designed and developed a ‘real-time digital forensic partial analysis process’ to search the given evidence for user-specified keywords or patterns in real time, thereby minimizing the overall processing time. For this, we implemented KMP (or Boyer-Moore)/regular-expression based MapReduce jobs in Java on a Hadoop cluster and tested them successfully on a cluster with eight nodes. When this approach is used for partial analysis, there is no possibility of missing a crucial piece of evidence. The overall model works as simply as searching for user-specified patterns in a plain text document. The proposed digital forensic triage process, ‘real-time digital forensic partial analysis’, is an accepted and published work [Pub4]. In the next chapter, we provide the logical conclusion of our research and suggest future directions to the worldwide community of researchers in this emerging field.
Chapter 7
Conclusion and Future Scope
In this research work, we addressed the challenges and requirements of performing digital forensics in the cloud. We designed a generic digital forensic framework for the cloud, suggested methods of dead/live forensic acquisition and analysis within/outside the virtual machines, and designed a digital forensic triage for the examination and partial analysis of virtual machines in cloud computing systems.
In particular, we addressed the concerns a digital forensic investigator may face during an investigation in the cloud computing environment. In the following sections, we summarize the details of the work carried out as part of this research.
7.1 Summary of Deductions
Cloud computing is still an evolving computational platform which lacks support for crime investigation in terms of the required frameworks and tools. The extensive literature survey we conducted in the area of digital forensics in cloud computing systems helped us identify the gaps in the existing research. From the identified gaps, we focused on a few and designed and implemented methods for partial forensic examination, evidence segregation, selective data acquisition, and digital forensic triage using parallel processing for cloud forensic data analysis.
Specific to performing digital forensics in the virtual environment, we identified the challenges and requirements of detecting the virtual environment in multi-level virtualization, identified important files that are generated when virtual systems are used in the virtual machines that are part of the cloud environment, and devised an algorithm to detect virtual machines hidden using alternate data streams. The proposed algorithm for the detection of hidden virtual machines uses three different filters, which guarantees the detection of malicious virtual machines.
To design a generic digital forensic framework for cloud crime investigation, we framed a digital forensic process with five phases. All the phases included in this framework work similarly to those in existing frameworks, except the third phase. This phase (examination and partial analysis) plays an important role in the examination and analysis of the data produced by the cloud environment. Having identified the different phases that need to be followed in a cloud crime investigation, we designed a generic control flow process for performing digital forensics in the cloud. The proposed control flow process serves as a blueprint for the investigator in acquiring and analyzing the data of the client device as well as that of the cloud provider data centers involved in the cloud crime investigation. To design and develop a cloud forensic application, complete knowledge of the cloud computing service models and deployment models is essential. To help the digital forensic research community understand the cloud computing architecture for forensic readiness, we designed a digital forensic architecture for the cloud. This architecture may be used as a reference to design and develop new digital forensic tools in the area of cloud computing systems.
As a proof of concept for the designed cloud forensic architecture, we formulated methods for cloud data acquisition and analysis. For a reported cloud crime, the important artifacts which need to be acquired were identified from the point of view of virtual machines and cloud logs. After acquiring a virtual machine’s virtual hard disk file using the traditional digital forensic approach, the methods we suggested for examination and partial analysis can be used to collect actionable evidence from the virtual disk file. Using our approach, the investigator can collect evidential artifacts such as the file system metadata, the registry file contents, and the physical memory contents at the scene of crime. Collecting the evidential artifacts of a virtual machine under investigation helps the investigator speed up the final analysis process. The virtual machine under investigation will also have logs related to its activity in the cloud platform. We have devised methods to segregate and acquire the log data belonging to a virtual machine.
The keyword and/or pattern search facility provided by traditional digital forensic tools depends on the indexing capability of the tool. The average time these tools take to index the complete disk content does not keep pace with the increase in digital media size, especially the virtual disk sizes provided by cloud platforms today. To speed up the search, we designed and implemented a digital forensic triage using a parallel processing framework to index the evidence of interest to the investigator in real time. This method of indexing the evidence of interest also falls into the category of examination and partial analysis, which helps the investigator speed up the final analysis process.
7.2 Future Scope of Work
The methods suggested for the realization of the designed framework, titled “A Novel Digital Forensic Framework for Cloud Computing Environment”, were tested using a private cloud test-bed set up with the OpenStack cloud solution. The methods can also be tested using different private cloud solutions such as Eucalyptus, OpenNebula, VMware vCloud, etc. The devised methodology of digital forensic triage using parallel processing was carried out using the Hadoop framework and was not integrated with any digital forensic analysis tool for searching patterns in the evidence file. In future work, one could include the pattern search facility using the proposed approach in the open source software called Digital Forensics Framework (DFF). One could also take up the implementation of the digital forensic triage using Amazon Elastic MapReduce (Amazon EMR) to index the patterns of interest in the given evidence.
We targeted the IaaS (Infrastructure-as-a-Service) delivery model of the cloud for performing digital forensic activity. As future work, the design and development of forensic methods for the PaaS (Platform-as-a-Service) and SaaS (Software-as-a-Service) delivery models of cloud computing may be taken up.
It may be appropriate to use machine learning principles to design and develop new methods for the problem of digital forensic triage. As an alternative to our proposed approach for increasing the efficiency of the investigation, one could use machine learning algorithms for feature extraction, prioritization of the evidence, classification of the evidence, etc., to extract and analyze crime-related features within the virtual machine.
7.3 Concluding Remarks
This research work enables digital forensic investigation in the cloud environment by filling the gap that exists between traditional digital forensics and cloud forensics, which differs significantly due to the virtual environment of cloud computing systems. We hope that the work presented here will be taken forward by the digital forensic research community to come up with new methods of performing digital forensics that cater to the dynamically changing nature of the cloud.
List of Publications Published/Accepted
[Pub1] Digambar Povar and G Geethakumari, “Digital Evidence Detection in Virtual
Environment for Cloud Computing”, Proceedings of the ACM International Conference
on Security of Internet of Things, SecurIT’12, August 17-19, India, 2012, pp. 102-106.
[Pub2] Digambar Povar and G Geethakumari, “A Novel approach to Detect Cloud Virtual
Machines hidden using Alternate Data Streams”, Proceedings of the IEEE International
Multi Conference on Automation, Computing, Control, Communication and Compressed
Sensing, iMac4s-2013, March 22-23, India, 2013, pp. 835-839.
[Pub3] Digambar Povar and G Geethakumari, “A Heuristic Model for Performing Digital
Forensics in Cloud Computing Environment”, International Symposium on Security in
Computing and Communications, SSCC-2014, September 24-27, India. Proceedings in
the Journal “Communications in Computer and Information Science (CCIS)”, Springer
Series, Volume 467, pp. 341-352.
[Pub4] Digambar Povar, Saibharath, and G Geethakumari, “Real-time digital forensic
triaging for cloud data analysis using MapReduce on Hadoop framework”, International
Journal of Electronic Security and Digital Forensics, Inderscience Publishers, Vol. 7,
Issue No. 2, pp. 119-133, 2015.
[Pub5] Digambar Povar and G Geethakumari, “Digital Forensic Architecture for Cloud
Computing Systems: Methods of Evidence Identification, Segregation, Collection and
Partial Analysis”, Accepted at the Third International Conference on INformation systems
Design and Intelligent Applications-INDIA-2016. Proceedings in the Journal “Advances
in Intelligent Systems and Computing (AISC) series”.
Bibliography
[1] Ad triage - forensically acquire data from live and powered down computers in the
field. http://accessdata.com/solutions/digital-forensics/AD-triage. Accessed: 2015-
06-25.
[2] Adroit photo forensics - smartcarving tool. http://digital-assembly.com/products/adroit-photo-forensics/features/smartcarving.html. Accessed: 2015-06-25.
[3] analyzemft - mft file parser. https://github.com/dkovar/analyzeMFT. Accessed:
2015-06-25.
[4] Aws:amazon web services - public cloud computing platform.
https://aws.amazon.com. Accessed: 2015-06-25.
[5] Awstats - an open source log analyzer. http://www.awstats.org. Accessed: 2015-06-
25.
[6] Clavister’s new dimension in network security reaches the cloud. Technical report. [Online]. Available: https://www.clavister.com/globalassets/documents/resources/white-papers/clavister-whp-cloud-security-en.pdf.
[7] Cloud computing strategic direction paper: Opportunities and applicability for use
by the australian government, version 1.0, 2011.
[8] Computer evidence vs daubert: The coming conflict.
https://www.cerias.purdue.edu/bookshelf/archive/2005-17.pdf. Accessed: 2015-06-
25.
[9] Cybercheck - digital evidence analysis software.
http://www.cyberforensics.in/Products/Cybercheck.aspx. Accessed: 2015-06-
25.
[10] Dban - data wiping software. http://www.dban.org. Accessed: 2015-06-25.
[11] Digital forensics framework - open source digital investigation software.
http://www.digital-forensic.org. Accessed: 2015-06-25.
[12] Encase forensic v7 - the fastest, most comprehensive forensic solution. https://www.guidancesoftware.com/products/Pages/encase-forensic/overview.aspx?cmpid=nav. Accessed: 2015-06-25.
[13] Eucalyptus - private cloud computing platform. https://www.eucalyptus.com. Ac-
cessed: 2015-06-25.
[14] F-dac - forensic data carving tool. http://www.cyberforensics.in/showdownloads.aspx?id=46. Accessed: 2015-06-25.
[15] Fget - network-capable forensic data acquisition tool. http://www.net-security.org/secworld.php?id=9757. Accessed: 2015-06-25.
[16] Foremost - freely available file carving tool. http://foremost.sourceforge.net. Ac-
cessed: 2015-06-25.
[17] Forensic tool kit - standard digital forensic investigation solution.
https://www.accessdata.com/solutions/digital-forensics/forensic-toolkit-ftk. Ac-
cessed: 2015-06-25.
[18] Ftk imager - disk imaging tool. http://accessdata.com/product-download. Accessed:
2015-06-25.
[19] Google app engine:google cloud platform for application development and deploy-
ment. https://cloud.google.com/appengine. Accessed: 2015-06-25.
[20] Google drive client software. https://www.google.co.in/drive/download. Accessed:
2015-06-25.
[21] Guidance software encase - real-world triage and collection with encase portable.
https://www.guidancesoftware.com/products/Pages/encase-portable/overview.aspx.
Accessed: 2015-06-25.
[22] Guidelines for the secure use of cloud computing by federal departments
and agencies. http://csrc.nist.gov/groups/SMA/ispab/documents/minutes/2011-
07/Jul13 Cloud-ISIMC-Cloud-Security-ISPAB.pdf. Accessed: 2015-06-25.
[23] Hadoop - hdfs architecture guide. http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html. Accessed: 2015-06-25.
[24] Incident management and forensics working group - map-
ping the forensic standard iso/iec 27037 to cloud computing.
https://downloads.cloudsecurityalliance.org/initiatives/imf/Mapping-the-Forensic-
Standard-ISO-IEC-27037-to-Cloud-Computing.pdf. Accessed: 2015-06-25.
[25] Lime - linux memory extractor. https://github.com/504ensicslabs/lime. Accessed:
2015-06-25.
[26] Memoryze - find evil in live memory. http://www.mandiant.com/resources/download/memoryze. Accessed: 2015-06-25.
[27] Mount image pro - mount image as a drive letter. http://www.mountimage.com.
Accessed: 2015-06-25.
[28] Opennebula - private cloud computing platform. https://opennebula.org. Accessed:
2015-06-25.
[29] Openstack - private cloud computing platform. https://www.openstack.org. Ac-
cessed: 2015-06-25.
[30] Openstack configuration reference manual guide.
http://docs.openstack.org/icehouse/config-reference/config-reference-icehouse.pdf.
Accessed: 2015-06-25.
[31] Openstack installation guide for ubuntu 12.04.
http://docs.openstack.org/icehouse/install-guide/install/apt/openstack-install-
guide-apt-icehouse.pdf. Accessed: 2015-06-25.
[32] Oxford dictionary - definition of triage. http://www.oxforddictionaries.com/definition/english/triage. Accessed: 2015-06-25.
[33] Putty - an ssh and telnet client. http://www.putty.org. Accessed: 2015-06-25.
[34] Python-registry - library that provides read-only access to windows registry files.
https://github.com/williballenthin/python-registry. Accessed: 2015-06-25.
[35] Qemu disk image utility - openstack virtual machine image guide.
http://docs.openstack.org/image-guide/image-guide.pdf. Accessed: 2015-06-
25.
[36] R-studio - disk recovery software. http://www.data-recovery-software.net. Ac-
cessed: 2015-06-25.
[37] The sleuth kit - open source digital forensics. http://www.sleuthkit.org/. Accessed:
2015-06-25.
[38] Spektor forensic intelligence - triage first responders.
http://www.evidencetalks.com/index.php/en/products. Accessed: 2015-06-25.
[39] The system for triaging key evidence - ideal technology corporation.
http://www.idealcorp.com/products/index.php?product=STRIKE. Accessed:
2015-06-25.
[40] Vmware - private cloud computing solution. https://www.vmware.com/cloud-
computing/private-cloud.html. Accessed: 2015-06-25.
[41] Vmware vsphere 5.5 - configuration maximums.
http://www.vmware.com/pdf/vsphere5/r55/vsphere-55-configuration-
maximums.pdf. Accessed: 2015-06-25.
[42] Vmware workstation 5.5 - what files make up a virtual machine?
https://www.vmware.com/support/ws55/doc/ws learning files in a vm.html.
Accessed: 2015-06-25.
[43] The volatility framework - an advanced memory forensics framework.
https://code.google.com/p/volatility. Accessed: 2015-06-25.
[44] Winscp - an open source free ssh client for windows.
https://winscp.net/eng/index.php. Accessed: 2015-06-25.
[45] X-ways forensics - integrated computer forensics software. http://www.x-ways.net.
Accessed: 2015-06-25.
[46] Zeus botnet controller. Technical report, Tech. Rep., 2009. [Online]. Available:
http://aws.amazon.com/security/security-bulletins/zeus-botnet-controller.
[47] Zsoft uninstaller 2.5 - search for remnants after uninstalling a application.
http://www.zsoft.dk/index/software details/4. Accessed: 2015-06-25.
[48] M Al Fahdi, NL Clarke, and SM Furnell. Towards an automated forensic examiner
(afe) based upon criminal profiling & artificial intelligence. 2013.
[49] Cory Altheide and Harlan Carvey. Digital forensics with open source tools. 2011.
[50] Robert S Boyer and J Strother Moore. A fast string searching algorithm. Communi-
cations of the ACM, 20(10):762–772, 1977.
[51] Brian Carrier. File system forensic analysis, volume 3. Addison-Wesley Reading,
2005.
[52] Brian Carrier, Eugene H Spafford, et al. Getting physical with the digital investiga-
tion process. International Journal of digital evidence, 2(2):1–20, 2003.
[53] Hyunji Chung, Jungheum Park, Sangjin Lee, and Cheulhoon Kang. Digital forensic
investigation of cloud storage services. Digital investigation, 9(2):81–95, 2012.
[54] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large
clusters. Communications of the ACM, 51(1):107–113, 2008.
[55] Josiah Dykstra and Alan T Sherman. Acquiring forensic evidence from
infrastructure-as-a-service cloud computing: Exploring and evaluating tools, trust,
and techniques. Digital Investigation, 9:S90–S98, 2012.
[56] Josiah Dykstra and Alan T Sherman. Design and implementation of frost: Digital
forensic tools for the openstack cloud computing platform. Digital Investigation,
10:S87–S95, 2013.
[57] Corrado Federici. Cloud data imager: A unified answer to remote acquisition of
cloud storage areas. Digital Investigation, 11(1):30–42, 2014.
[58] Zvi Galil. On improving the worst case running time of the boyer-moore string
matching algorithm. Communications of the ACM, 22(9):505–508, 1979.
[59] Simson L Garfinkel. Digital forensics research: The next 10 years. digital investi-
gation, 7:S64–S73, 2010.
[60] Bernd Grobauer, Tobias Walloschek, and Elmar Stocker. Understanding cloud com-
puting vulnerabilities. Security & privacy, IEEE, 9(2):50–57, 2011.
[61] NIST Cloud Computing Forensic Science Working Group et al. Nist cloud comput-
ing forensic science challenge (draft), 2014.
[62] Jerry Honeycutt and Jerry Honeycutt Jr. Microsoft Windows registry guide. Mi-
crosoft Press, 2005.
[63] Ilyoung Hong, Hyeon Yu, Sangjin Lee, and Kyungho Lee. A new triage model con-
forming to the needs of selective search and seizure of electronic evidence. Digital
Investigation, 10(2):175–192, 2013.
[64] Karen Kent, Suzanne Chevalier, Tim Grance, and Hung Dang. Guide to integrating
forensic techniques into incident response. NIST Special Publication, pages 800–86,
2006.
[65] Donald E Knuth, James H Morris, Jr, and Vaughan R Pratt. Fast pattern matching in
strings. SIAM journal on computing, 6(2):323–350, 1977.
[66] Harjinder Singh Lallie and Lee Pimlott. Applying the acpo principles in public cloud forensic investigations. Journal of Digital Forensics, Security and Law, 7(1):71–86, 2012.
[67] Fang Liu, Jin Tong, Jian Mao, Robert Bohn, John Messina, Lee Badger, and Dawn
Leaf. Nist cloud computing reference architecture: Recommendations of the na-
tional institute of standards and technology (special publication 500-292). 2012.
[68] Adamantini I Martini, Alexandros Zaharis, and Christos Ilioudis. Detecting and
manipulating compressed alternate data streams in a forensics investigation. In Dig-
ital Forensics and Incident Analysis, 2008. WDFIA’08. Third International Annual
Workshop on, pages 53–59. IEEE, 2008.
[69] Ben Martini and Kim-Kwang Raymond Choo. An integrated conceptual digital
forensic framework for cloud computing. Digital Investigation, 9(2):71–80, 2012.
[70] Fabio Marturana and Simone Tacconi. A machine learning-based triage methodol-
ogy for automated categorization of digital media. Digital Investigation, 10(2):193–
204, 2013.
[71] Rodney McKemmish. What is forensic computing? 1999.
[72] P Mell and T Grance. The nist definition of cloud computing. nist special publication 800-145 (final). Technical report, 2011. [Online]. Available: http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf.
[73] Antonio Merola. Data carving concepts. SANS Institute: Infosec Reading room,
2008.
[74] Darren Quick and Kim-Kwang Raymond Choo. Digital droplets: Microsoft skydrive
forensic data remnants. Future Generation Computer Systems, 29(6):1378–1394,
2013.
[75] Darren Quick and Kim-Kwang Raymond Choo. Dropbox analysis: Data remnants
on user machines. Digital Investigation, 10(1):3–18, 2013.
[76] Darren Quick and Kim-Kwang Raymond Choo. Forensic collection of cloud storage
data: Does the act of collection result in changes to the data or its metadata? Digital
Investigation, 10(3):266–277, 2013.
[77] Darren Quick and Kim-Kwang Raymond Choo. Google drive: Forensic analysis of
data remnants. Journal of Network and Computer Applications, 40:179–193, 2014.
[78] Anthony Reyes, Richard Brittson, Kevin O’Shea, and James Steele. Cyber crime
investigations: Bridging the gaps between security professionals, law enforcement,
and prosecutors. Syngress, 2011.
[79] Marcus K Rogers, James Goldman, Rick Mislan, Timothy Wedge, and Steve De-
brota. Computer forensics field triage process model. Journal of Digital Forensics,
Security and Law, 1(2):19–38, 2006.
[80] Vassil Roussev, Candice Quates, and Robert Martell. Real-time digital forensics and
triage. Digital Investigation, 10(2):158–167, 2013.
[81] Keyun Ruan, Ibrahim Baggili, Joe Carthy, and Tahar Kechadi. Survey on cloud
forensics and critical criteria for cloud forensic capability: A preliminary analysis.
In Proceedings of the Conference on Digital Forensics, Security and Law, pages
55–70, 2011.
[82] Keyun Ruan, Joe Carthy, Tahar Kechadi, and Mark Crosbie. Cloud forensics. In
Advances in digital forensics VII, pages 35–46. Springer, 2011.
[83] John Sammons. The basics of digital forensics: the primer for getting started in
digital forensics. Elsevier, 2012.
[84] Adrian Shaw and Alan Browne. A practical and robust approach to coping with large
volumes of data submitted for digital forensic examination. Digital Investigation,
10(2):116–128, 2013.
[85] Nimisha Singla and Deepak Garg. String matching algorithms and their applicability
in various applications. International Journal of Soft Computing and Engineering,
1(6):218–222, 2012.
[86] Dinkar Sitaram and Geetha Manjunath. Moving to the cloud: Developing apps in
the new world of cloud computing. Elsevier, 2011.
[87] David Solomon and Mark Russinovich. Microsoft windows internals, 2005.
[88] Mark Taylor, John Haggerty, David Gresty, and David Lamb. Forensic investigation
of cloud computing systems. Network Security, 2011(3):4–10, 2011.
[89] Udaya Tupakula and Vijay Varadharajan. Tvdsec: Trusted virtual domain security.
In Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference
on, pages 57–64. IEEE, 2011.
[90] Luis M Vaquero, Luis Rodero-Merino, Juan Caceres, and Maik Lindner. A break in
the clouds: towards a cloud definition. ACM SIGCOMM Computer Communication
Review, 39(1):50–55, 2008.
[91] Toby Velte, Anthony Velte, and Robert Elsenpeter. Cloud computing, a practical
approach. McGraw-Hill, Inc., 2014.
[92] Divya S Vidyadharan and KL Thomas. Digital image evidence detection based on
skin tone filtering technique. In Advances in Computing and Communications, pages
544–551. Springer, 2011.
[93] Shams Zawoad and Ragib Hasan. Cloud forensics: a meta-study of challenges,
approaches, and open problems. arXiv preprint arXiv:1302.6312, 2013.
Glossary of terms used in the thesis
ACPO (Association of Chief Police Officers) principles. The ACPO principles are guidelines for digital forensic investigations, followed when handling computer-based electronic evidence by law enforcement agencies, particularly in the United Kingdom. There are four principles in these guidelines; if all four are followed correctly, the result may serve as a benchmark ‘chain of custody’ for the court of law.
Client device. A digital device used to access cloud services. Examples of such devices include the desktop computer, laptop, mobile device, PDA (Personal Digital Assistant), etc.
Cloud service provider (CSP). A CSP is an entity that provides computing resources as a service. Examples of CSPs are Apple, Amazon, Microsoft, Google, Oracle, IBM, HP, and others.
Cloud storage. Also called remote storage: the cloud service that stores user data on the cloud provider’s storage (cloud servers).
Cloud user. User who uses the cloud services such as IaaS, PaaS, or SaaS.
CyberCheck. Cyber forensic tool for data recovery and analysis of digital evidence.
Darik’s Boot and Nuke (DBAN). A free erasure software tool for deleting the contents of any hard disk drive. Once the data is deleted, it cannot be recovered.
Data duplication (dd). A disk cloning utility capable of cloning a partition or an entire hard disk drive.
Daubert principles. A rule of evidence regarding the admissibility of expert witness testimony during United States federal legal proceedings.
Digital forensic investigation. The process of investigating a cyber crime using forensi-
cally sound acquisition and analysis methods.
Digital Forensic Research Workshop (DFRWS). A workshop conducted every year to bring together academic researchers, digital forensic investigators, and practitioners for active discussion.
EnCase. A digital forensic tool to analyze data from a wide range of devices such as computers, smartphones, and tablets.
EnCase Forensic Imager. Digital forensic tool to acquire evidence (bit by bit cloning
of the digital media) in a forensically sound manner.
Expert Witness Format (EWF). Digital data that could be used as evidence is typically stored in specialized and closed formats. One such format is EWF, which is used by all the major tools for acquiring and analyzing digital evidence.
Forensic Tool Kit (FTK). Another tool, like EnCase, used to analyze data from various digital devices.
FTK Imager. A disk imaging (bit-by-bit cloning of the digital media) tool that acquires digital evidence in a forensically sound manner.
Gartner. An Information Technology (IT) research and advisory firm providing technology-related insight. It uses hype cycles and magic quadrants to visualize its market analysis results.
HardCopy 3P. Portable hardware tool for the forensic hard drive cloning.
International Data Corporation (IDC). A market research, analysis, and advisory firm specializing in information technology (IT), telecommunications, and consumer technology.
Investigator. The person who investigates a cyber crime.
Law Enforcement Agency (LEA). The agency authorized to investigate a cyber crime.
Linux Memory Extractor (LiME). A physical memory (also called volatile memory or RAM) acquisition tool for Linux and Linux-based devices.
Message Digest (MD5). A cryptographic hash function that computes a 128-bit checksum used to provide data integrity and authentication.
National Institute of Standards and Technology (NIST). A standards body that provides standards and guidelines for new technologies such as mobile computing, cloud computing, the Internet of Things (IoT), etc.
Scientific Working Group on Digital Evidence (SWGDE). It is an organization that
builds standards for digital and multimedia evidence.
Secure Hash Algorithm (SHA). A cryptographic hash function that computes a check-
sum (160 bits or more) used to provide data integrity and authentication.
Tableau forensic duplicator. Portable hardware tool for fast and reliable forensic hard
drive cloning.
TrueBack. A digital forensic software tool for digital evidence seizure and acquisition (disk imaging or cloning) that is compatible with DOS, Windows, and Linux operating systems.
Biography: Mr. Digambar Povar
Mr. Digambar Povar is Lecturer, Dept. of Computer Science and Information Systems
at BITS Pilani, Hyderabad Campus. Before joining BITS, he worked as a Scientist in the Resource Center for Cyber Forensics (RCCF) at the Center for Development of Advanced Computing (CDAC), Trivandrum, India, for a period of 5 years and 6 months.
He was also associated with Center for Development of Advanced Computing (CDAC),
Noida, as a Project Engineer for a short period. Mr. Digambar Povar holds a postgraduate degree (M.Tech) in Computer Science and Engineering from NIT Warangal, India. At CDAC, he contributed to the design and development of cyber forensic tools like CyberCheck (forensic disk analysis tool), F-DaC (Forensic Data Carving tool), FIRT (Forensic Image Recovery Tool), etc. He was instrumental in commissioning the Cyber Forensic labs at the offices of the DGIT (Investigation), Delhi; DGIT (Investigation), Mumbai; and DG-DRI, Mumbai, India. He served as faculty conducting “Courses on Cyber Forensics” for Law Enforcement officers, CBI, IB, Navy, and Kerala police. He has also participated in many national and international seminars/workshops on “Digital Forensics”.
Mr. Digambar Povar has many international publications to his credit as primary author.
His areas of research interests include: digital forensics, cloud computing, cloud foren-
sics, cyber security and cloud security.
Mr. Digambar Povar has given many guest lectures on topics in emerging areas such as
digital forensics, cloud computing, cloud security and allied areas of cloud computing.
Presently, he is the Co-Investigator for the project “Design and Development of Digital
Forensic Tools for Cloud IaaS” funded by DeitY, Govt. of India.
Biography: Dr. G. Geethakumari
Dr G Geethakumari is Asst. Professor, Dept. of Computer Science and Information Systems at BITS Pilani, Hyderabad Campus. Before joining BITS, she worked as a faculty member in the CSE Dept. at the National Institute of Technology, Warangal. Dr Geetha received
her Ph.D. from University of Hyderabad. Her Ph.D. thesis was titled ‘Grid Computing
Security through Access Control Modelling’.
Dr. Geetha has many international publications to her credit. Her areas of research inter-
ests include: Information security, cloud computing and security, cloud forensics, enter-
prise security challenges and data analysis, cloud authentication techniques, cyber secu-
rity, semantic attacks and privacy in online social networks. She has been in the forefront
of technical activities at BITS-Pilani, Hyderabad Campus. She has been the Faculty Ad-
visor for Computer Science Association during 2008-2011. Presently she is the IEEE
Student Branch Counselor, BITS-Pilani, Hyderabad Campus. She is also the Coordinator
for the Linux User Group, BITS Pilani, Hyderabad Campus.
Dr. Geetha is a Member, IEEE as well as Member, IEEE Computer Society. She is also
a Professional Member, ACM. She was the Organizing Committee Member for the IEEE
INDICON Conference conducted in BITS Pilani, Hyderabad Campus during December
16-18, 2011. Dr Geetha was the Publicity Co-Chair for the IEEE Prime Asia Conference
hosted by BITS Pilani, Hyderabad Campus during December 5-7, 2012.
She was the Organizing
Committee Member for the Workshop on Advances in Image Processing and Applica-
tions held in BITS Pilani, Hyderabad Campus during October 26 - 27, 2013. She was
part of the Organizing Committee for the National Seminar on Indian Space Technology
- Present and Future (NSIST-2014) held at BITS Pilani Hyderabad Campus on 1st May,
2014.
Dr Geetha has given many guest lectures on topics in emerging areas such as cyber se-
curity, cloud computing and cloud security. She has been a member of the Technical
Program Committees of various IEEE International Conferences. An extract from the
paper ‘A taxonomy for modelling and analysis of diffusion of (mis)information in social
networks’, co-authored by Dr Geetha and published in the International Journal of Com-
munication Networks and Distributed Systems, Vol. 13, No. 2, 2014, pp.119-143, by
Inderscience Publishers, was selected for a press release on ‘Semantic attacks in online
social media’.
Presently, she is the Chief-Investigator for the project “Design and Development of Digital
Forensic Tools for Cloud IaaS” funded by DeitY, Govt. of India.