J. Parallel Distrib. Comput. 74 (2014) 2918–2933
Contents lists available at ScienceDirect
J. Parallel Distrib. Comput.
journal homepage: www.elsevier.com/locate/jpdc
A survey of Cloud monitoring tools: Taxonomy, capabilities and objectives
Kaniz Fatema a,∗, Vincent C. Emeakaroha a, Philip D. Healy a, John P. Morrison a, Theo Lynn b
a Irish Centre for Cloud Computing & Commerce, University College Cork, Ireland
b Irish Centre for Cloud Computing & Commerce, Dublin City University, Ireland
Highlights
• Surveyed monitoring tools revealing common characteristics and distinctions.
• Identified practical capabilities of monitoring tools.
• Presented taxonomy of monitoring capabilities.
• Analysed strengths and weaknesses of monitoring tools based on taxonomy.
• Discussed challenges and identified future research trends in Cloud monitoring.
Article info
Article history:
Received 17 September 2013
Received in revised form 2 May 2014
Accepted 19 June 2014
Available online 5 July 2014
Keywords:
Cloud management
Monitoring tools
Cloud operational areas
Capabilities
Taxonomy
Survey
Abstract
The efficient management of Cloud infrastructure and deployments is a topic that is currently attracting significant interest. Complex Cloud deployments can result in an intricate layered structure. Understanding the behaviour of these hierarchical systems and how to manage them optimally are challenging tasks that can be facilitated by pervasive monitoring. Monitoring tools and techniques have an important role to play in this area by gathering the information required to make informed decisions. A broad variety of monitoring tools are available, from general-purpose infrastructure monitoring tools that predate Cloud computing, to high-level application monitoring services that are themselves hosted in the Cloud. Surveying the capabilities of monitoring tools can identify the fitness of these tools in serving certain objectives. Monitoring tools are essential components to deal with various objectives of both Cloud providers and consumers in different Cloud operational areas. We have identified the practical capabilities that an ideal monitoring tool should possess to serve the objectives in these operational areas. Based on these identified capabilities, we present a taxonomy and analyse the monitoring tools to determine their strengths and weaknesses. In conclusion, we present our reflections on the analysis, discuss challenges and identify future research trends in the area of Cloud monitoring.
© 2014 Elsevier Inc. All rights reserved.
1. Introduction
The emergence of Cloud Computing has ushered in a new era of Internet-based service provisioning opportunities. Cloud Computing is characterised by the provision of resources as general utilities that can be leased and released in an on-demand manner. Consequently, IT resources represent an operational rather than a capital expenditure. A broad variety of pricing models can be applied to Cloud resources, from simple fixed rental schemes to pay-as-you-go models. Monitoring techniques are indispensable in order to manage large-scale Cloud resources and enforce quality of service for consumers.
∗ Corresponding author.
E-mail address: [email protected] (K. Fatema).
http://dx.doi.org/10.1016/j.jpdc.2014.06.007
0743-7315/© 2014 Elsevier Inc. All rights reserved.
Given the multi-tenant nature of Cloud environments, efficient management in the face of quality of service and performance constraints can be a challenge. Monitoring tools have an important role to play in these areas by allowing informed decisions to be made regarding resource utilisation. Automated monitoring of physical and virtual IT resources allows for the identification and resolution of issues with availability, capacity, and other quality requirements. The benefits of automated monitoring have long been recognised, even in non-Cloud environments. The importance of monitoring has been widely addressed in the literature in
various contexts, such as: system/network [13,35,71], distributed systems/Grid [93,4,95], application [47,10] and Cloud [1,33]. For Cloud environments, appropriate monitoring is crucial as usage-based billing and elastic scaling are impossible to implement in the absence of relevant metrics. Currently, a variety of Cloud monitoring tools is applied in an ad-hoc and non-systematic way, everywhere from low-level, general-purpose infrastructure monitoring to high-level application and service monitoring. The purpose of this paper is to comprehensively review these tools to assess whether they are adequate in satisfying the essential objectives for measuring intrinsic Cloud behaviours.
The focus of our work is to capture the evolutionary adaptation of monitoring tools’ capabilities from general purpose to Cloud monitoring and to present a full capability analysis with respect to practical Cloud operational areas that would help Cloud providers and customers in making an informed choice of an appropriate monitoring tool. The monitoring platforms considered in this paper have been chosen based on literature reviews and perceived industrial acceptance.
The main contributions of this paper can be summarised as follows: (i) it surveys the range of monitoring tools currently in use to gain their technical insights, (ii) it identifies the desired capabilities of monitoring tools to serve different Cloud operational areas from both providers’ and consumers’ perspectives, (iii) it then presents a taxonomy of the identified capabilities, (iv) it analyses the available tools based on the identified capabilities and unveils the capabilities that are under-represented, the Cloud operational areas that are currently strongly supported by those monitoring tools and the areas that need further development, and (v) it discusses future research challenges and trends in Cloud monitoring and management.
Our paper flows as follows: First, we assign the tools into two categories, Cloud-specific and non-Cloud-specific. After studying the tools and extracting technical capabilities, we summarise these in Tables 1 and 2. We then focus on Cloud-specific monitoring capabilities and derive a taxonomy, which we present in Section 4. We then re-examine all of the tools in light of the taxonomy in Section 5 in order to identify their strengths and weaknesses when applied to particular operational areas.
The remainder of this paper is organised as follows: Section 2 provides information on various computing environments, ranging from single machines to various distributed systems, and the usage of monitoring in those environments. In Section 3, available monitoring tools are identified and described. Section 4 describes a taxonomy of desirable monitoring capabilities that forms the basis for the analysis of the identified monitoring tools. In Section 5, we analyse the identified monitoring tools. Section 6 presents an analysis of the related work. Section 7 discusses the identified challenges, while Section 8 concludes the paper and focuses on future research trends in Cloud monitoring.
2. Background
Monitoring tools have long been used for tracking resource utilisation and the performance of systems and networks. These tools have traditionally been administrated by a single administrator in a single domain. Subsequently, the development of distributed systems like clusters forced the evolution of monitoring tools to meet the demands of these new environments. As more sophisticated distributed environments emerged, including Grids and Clouds, monitoring tools had to be developed to capture their salient characteristics. In the case of Grids these included computations that span multiple administrative domains, and in the case of Clouds, on-demand multi-tenant services. According to the National Institute of Standards and Technology (NIST) definition [58]: ‘‘Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction’’.
Cloud computing has shifted computation from local machines to services accessed via the network. Services in the Cloud are typically offered via three different service models, which can be viewed as a layered model of services: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS).
IaaS providers, such as Amazon EC2, offer virtual resources, such as machine instances, storage, and networking devices, as a service to the end user and/or the layers above, enabling self-service for virtualized resources. A virtual machine manager, or hypervisor, is required in order to make the physical resources available to virtual machine instances. Through the application of clustering techniques, multiple physical servers can be aggregated into a resource pool from which storage, CPU, memory and networking resources can be dynamically allocated to a set of VMs. Clustering of the resource pool ensures high availability in the presence of hardware failures.
PaaS providers utilise the virtual machines’ environment (e.g. operating systems and tools) to provide a scalable abstraction of the underlying resources onto which applications can be deployed. For example, Google AppEngine provides developers with scalable application back-end functionality for dynamic web serving, automatic scaling and load balancing [86]. The resulting abstraction is, from the developer’s point of view, completely divorced from the underlying infrastructure. Similar offerings exist for Java EE (e.g., Red Hat OpenShift [61]) and Ruby on Rails (e.g., Engine Yard [73]), among a host of others. Some Cloud service providers, such as Microsoft Azure, offer tools and features as part of their offerings that simplify the development of PaaS services [56]. Open-source tools such as Nimbus [62] and Cloudify [32] are available that simplify PaaS integration in a provider-agnostic fashion.
SaaS providers offer software as a service which may hide the service implementation; thus, the SaaS customer is not necessarily aware of the underlying platforms, infrastructure, or hardware.
Given the rich architecture of the Cloud, effective monitoring requires an appropriate suite of tools capable of monitoring at the IaaS, PaaS and SaaS layers.
3. Review of existing monitoring systems
In the light of the above discussion, we examined the available monitoring tools and divided them into two broad categories: (i) general-purpose monitoring tools; and (ii) Cloud-specific monitoring tools. We examine each of these categories in turn in subsequent subsections to gain an understanding of their common characteristics and functionalities. The technical details of each tool, as well as its reported limitations and usage experiences, are summarised in Tables 1 and 2. These tabular presentations give readers the opportunity to gain technical insights into the tools. Furthermore, the presented information also helps to identify the features of the tools, which are later used for capability based analysis and the development of a taxonomy.
3.1. General-purpose monitoring tools
Before the advent of Cloud Computing, a number of tools were already available for the purpose of monitoring diverse IT infrastructure resources such as networks and compute nodes. Some specialise in particular domains such as HPC clusters and Grids. Many of these tools continue to be developed, and could be adopted in Clouds for monitoring at various abstraction levels. In
Table 1
Feature matrix for general purpose monitoring tools.

| Tool | Monitored resources | Agent language | Agent OS | Licence | Alerts | Enterprise messaging support | Reported limitations | Usage experience |
|---|---|---|---|---|---|---|---|---|
| Nagios [7] | System resources, network, sensors, applications and services | C | Linux/Unix (Windows via proxy agent) | Open source (GPL) | E-mail, SMS, Custom | No | Difficulty of configuration [43,70], inability to work at high sampling frequency [48,79], inability to bind service due to VM migration [22] | Grid infrastructure monitoring [41], VM monitoring [48,79,22] |
| Collectd [21] | System resources, network, sensors, databases, applications and services | C | Linux/Unix and Mac OS | Open source (GPLv2) | N/A | AMQP (above V5.1) | No visualisation platform | Gathering metrics from Cloud deployments for forensic purposes [63] |
| Opsview Core [52] | System resources, network, database, applications and services | Perl, C | Linux/Unix, Windows | Open source (GPL) | E-mail, SMS, Custom | No | N/A | N/A |
| Cacti [55] | Mainly network | PHP, C | Linux based OS and Windows | Open source (GPL) | Audible alerts | No | N/A | Integrated development of various third-party monitoring tools [96] |
| Zabbix [74] | System resources, network, sensors, databases, applications and services | C, PHP, Java | Linux/Unix, Mac, Windows | Open source (GPL) | E-mail, Custom | XMPP | Auto-discovery feature of Zabbix can be inefficient [72] | In a Cloud broker engine [8] to dynamically scale Cloud resources, in private Cloud monitoring [14] |
| OpenNMS [77] | Mainly network | Java | Linux/Unix, Windows, Mac | Open source (GPLv3) | E-mail, SMS | JMS | Autodiscovery service has limited capability and its service is not generalised for all network services [57] | N/A |
| Ganglia [64] | System resources | C, Perl, PHP, Python | Linux/Unix, Solaris, Windows, Mac | Open source (BSD) | N/A | No | Difficult to customise, introduces overhead both at hosts and networks because of the multicast updates, and XML event encoding [93] | Server side data collection for a Cloud based monitoring solution [50,89] and resource provisioning in Rocks Clusters [49] |
| Hyperic HQ [39] | System resources, network, database, applications and services | Java | Linux/Unix, Windows, Mac | Open source (GPLv2) | E-mail, SMS | No | High memory requirement [13] and difficulty of customising graphical interface [52] | For collecting software inventory data in CloudAlloc, a monitoring and reservation system for compute clusters [42] |
| IBM Tivoli [40] | System resources, network, database, applications and services | Java | Linux/Unix, Windows, Mac | Commercial | E-mail, SMS | No | N/A | In monitoring IBM Cloud [11], in enterprise service management [16] |
| Kiwi Application Monitor [30] | Application processes and user activity | N/A | Windows | Open source | Custom | No | N/A | In measuring application's peak memory usage and CPU time in the simulation of a space efficient quantum computer [30] |
| R-OSGi [83] | Distributed applications | Java | N/A | Open source | E-mail, SMS | No | Problem with service registration and static configuration [83] | Getting dependency information for a distributed application [27] |
| DAMS [46] | Distributed applications | Java | N/A | Open source | N/A | No | System's performance is affected [46] | Open Education Management System of Central Radio and TV University [46] |
Table 2
Cloud specific monitoring tools review.

| Name of tool | Monitored resource | Portable | Scalable | Open source | Operating system | Alerts | Messaging system | Implementation language | Reported limitations |
|---|---|---|---|---|---|---|---|---|---|
| Amazon CloudWatch [3] | AWS resources and applications and services running in AWS | No | Yes | No | Linux, Windows | | Amazon Simple Queue Service | Scripts in Windows PowerShell for Windows, Perl for Linux | Works for Amazon resources only, works in centralised models, does not ensure availability, potential security threat due to non-efficient use [75] |
| AzureWatch [5] | Azure based resources | No | Yes | No | Windows | E-mail, report to browser, smartphone, RSS feed | No | .Net | Works for Azure based resources only |
| Nimsoft [65] | Various Cloud resources, OS, network and applications | Yes | Yes | No | Linux, Netware, Unix, Windows, Mac | E-mail, RSS, text messages, instant messenger, Twitter | Nimsoft Message Bus | C/C++, Java, Perl, VB, and .Net | Does not cap resource consumption by the monitor, not fault-tolerant [66], does not support SLA compliance checking for data durability or location [80] |
| Monitis [69] | Network, server, Cloud resources and applications | Yes | Yes | No | Linux/Unix, Windows, Mac | E-mail, IM, SMS, Twitter, live voice | No | C++ | N/A |
| CloudKick [38] | Cloud resources, application and protocols | Yes | No | No | Linux, Windows | E-mail, webhook, SMS | No | C#, .NET, Python, PHP, Java and Ruby | Works in centralised models and does not ensure availability of services [75], can monitor only via CloudKick Agent [54], only suitable for infrastructure monitoring [38] |
| Boundary Application Monitor [12] | Cloud applications | Yes | Yes | No | Linux, Windows, Mac | E-mail, SMS | No | N/A | N/A |
| mOSAIC [82] | Cloud applications | Yes | No | Yes | Linux based OS | SLA violation warning | AMQP | Java | It monitors the applications developed with mOSAIC API only [26] |
| CASViD [26] | Cloud applications | Yes | Yes | Yes | Unix, Linux | SLA violation warning | JMS | Python, Java | Does not support multi-tier application monitoring |
| PCMONS [23] | Cloud resources | No | No | Yes | Linux | Tool dependent | No | Python, Perl, Bash script | Works only for infrastructure monitoring in private Clouds [23] |
Fig. 1. Infrastructure monitoring tool architecture.
this section, we discuss the features of general purpose infrastructure and application monitoring tools.
General purpose infrastructure monitoring tools typically utilise a client–server model by installing an agent in every system to be monitored. Fig. 1 shows this general architecture. Monitoring agents measure the metric values from monitored components and send them to the monitoring server. The server stores the collected metrics into a database, analyses them, and sends alerts. It may generate graphs, trending reports and SLA reports based on the monitored metrics retrieved from the database.
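As an illustration, the generic agent–server arrangement of Fig. 1 can be sketched in a few lines; the class names, metric names and the 90% threshold below are our own illustrative assumptions, not taken from any surveyed tool.

```python
import statistics
import time

# Illustrative sketch of the generic agent-server architecture of Fig. 1:
# agents measure metrics on monitored components and push them to a
# central server, which stores, analyses and raises alerts.

class MonitoringAgent:
    """Runs on a monitored host; collects primitive metric values."""
    def __init__(self, host, collectors):
        self.host = host
        self.collectors = collectors  # metric name -> zero-arg callable

    def sample(self):
        return {"host": self.host,
                "time": time.time(),
                "metrics": {name: fn() for name, fn in self.collectors.items()}}

class MonitoringServer:
    """Stores samples, analyses them, and emits alerts on threshold breach."""
    def __init__(self, thresholds):
        self.thresholds = thresholds  # metric name -> maximum allowed value
        self.history = []             # stands in for the metrics database
        self.alerts = []

    def receive(self, sample):
        self.history.append(sample)
        for name, value in sample["metrics"].items():
            limit = self.thresholds.get(name)
            if limit is not None and value > limit:
                self.alerts.append(f"{sample['host']}: {name}={value} > {limit}")

    def trend(self, metric):
        """Simple trending report: mean of a metric over stored history."""
        values = [s["metrics"][metric] for s in self.history if metric in s["metrics"]]
        return statistics.mean(values) if values else None

# Wiring it together with canned collector functions.
agent = MonitoringAgent("vm-01", {"cpu_percent": lambda: 95.0,
                                  "mem_percent": lambda: 40.0})
server = MonitoringServer(thresholds={"cpu_percent": 90.0})
server.receive(agent.sample())

print(server.alerts)               # the CPU breach is reported
print(server.trend("mem_percent"))
```

In a real tool, the collector callables would read `/proc`, SNMP counters or hypervisor APIs, and `receive` would sit behind a network endpoint; the control flow, however, follows the same shape.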
As illustrated in Fig. 2, primitive metric measurements can be initiated either by the monitoring server (depending on system configuration) or by an external program (or script) residing on the monitored resource. The latter arrangement is useful when monitoring is performed behind a firewall. For example, active mode monitoring in Nagios [7] is initiated at regular intervals by executing the Nagios Remote Plugin Executor (NRPE) as shown in Fig. 2(a). Passive monitoring in Nagios uses the Nagios Service Check Acceptor (NSCA) and is initiated by external applications. Fig. 2(b) depicts the passive mode.
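The two initiation modes can be contrasted in a short sketch; the NRPE/NSCA wire formats are not reproduced here, and the function and class names are our own.

```python
# Sketch of the two Nagios initiation modes described above.
# Active mode: the server drives the schedule and pulls each check
# (as with NRPE); passive mode: a program on the monitored resource
# runs the check itself and pushes the result in (as with NSCA),
# which is useful behind a firewall that blocks inbound connections.

def check_disk():
    """Stand-in for a monitoring plugin; returns (status, detail)."""
    return ("OK", "disk usage 42%")

class Server:
    def __init__(self):
        self.results = []

    # Active mode: server-initiated, run at regular intervals.
    def poll(self, host, check):
        self.results.append((host, "active", *check()))

    # Passive mode: resource-initiated, the server only accepts the result.
    def accept(self, host, status, detail):
        self.results.append((host, "passive", status, detail))

server = Server()
server.poll("vm-01", check_disk)        # Fig. 2(a): server pulls the check
server.accept("vm-02", *check_disk())   # Fig. 2(b): resource pushes the result
print(server.results)
```

Note that in both modes the same check logic runs; only the party that initiates the measurement differs.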
When a monitoring agent is initiated for collecting metrics, it measures the relevant metric values from the monitored components and passes them on to the monitoring server. Depending on the configuration of the system, the server may send alerts to interested parties on occurrence of an event. Most of the monitoring systems use e-mail and SMS as alerting mechanisms (e.g. Nagios, Opsview [52], Zabbix [74] and OpenNMS [77]). Some monitoring tools, such as Cacti [55], use audible alerts. Others, such as Collectd [21] and Ganglia [64], have no alerting mechanisms.
Monitoring servers may be arranged hierarchically to provide distributed monitoring. Zabbix, IBM Tivoli [40] and Opsview adopted this approach. In this arrangement, a child server passes its monitored data to a parent server which stores them into a database, allowing them to be displayed through a web front-end interface. Fig. 3 illustrates the hierarchical architecture of the Zabbix monitor. Ganglia, on the other hand, aggregates data from all
the Ganglia Monitoring Daemons (Gmond) via the Ganglia Meta Daemon (Gmetad) while utilising a multicasting technique amongst the nodes in a cluster.
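The hierarchical arrangement can be pictured as child servers summarising their own nodes and forwarding only the summaries to the parent, which keeps fan-in at any single server bounded. The sketch below is our own illustration of this pattern (cf. the Zabbix child/parent and Ganglia Gmond/Gmetad arrangements above), with invented record fields.

```python
# Sketch of hierarchical monitoring: each child server aggregates the
# samples of its own nodes and forwards a summary to the parent, which
# stores the merged, cluster-wide view.

def child_aggregate(node_samples):
    """Summarise per-node samples into one record for the parent."""
    return {"nodes": len(node_samples),
            "cpu_max": max(s["cpu"] for s in node_samples),
            "cpu_avg": sum(s["cpu"] for s in node_samples) / len(node_samples)}

def parent_merge(child_reports):
    """Parent-side view across all children, as stored in the database."""
    return {"nodes": sum(r["nodes"] for r in child_reports),
            "cpu_max": max(r["cpu_max"] for r in child_reports)}

cluster_a = [{"cpu": 10.0}, {"cpu": 90.0}]
cluster_b = [{"cpu": 50.0}]
view = parent_merge([child_aggregate(cluster_a), child_aggregate(cluster_b)])
print(view)  # {'nodes': 3, 'cpu_max': 90.0}
```

The design choice here is bandwidth versus detail: the parent sees one record per child rather than one per node, which is what makes the hierarchy scale.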
Few tools are available for general purpose application monitoring. The most appropriate application monitoring technique depends on the nature of the application. Kiwi Application Monitor [30] monitors centralised applications running in a single machine by inspecting OS processes. Other tools, such as the Remoting-Open Services Gateway initiative (R-OSGi) Development Tool (RDT) [83] and the Distributed Application Monitoring System (DAMS) [46], aim to monitor distributed applications involving network communications. RDT [83] helps to develop, deploy and monitor distributed Java applications with the Eclipse IDE while DAMS [46] enhances the byte code of distributed Java applications and monitors the application modules and class methods at runtime.
Table 1 summarises the general-purpose monitoring tools considered above and provides an overview of the technical details of each of the tools, such as: what resources the tool aims to monitor; what programming language is used to implement the tool agent; operating systems the agent works on; whether the tool is under an open source licence; the alerting mechanism used, if any; and whether the tool has any enterprise messaging support, i.e. whether the tool uses a standardised messaging middleware. Reported limitations and documented usage experience in the scientific literature are also included, which offers an opportunity to learn about the tools from the experience of others. Table 1 provides a brief comparison of the technology across monitoring tools. The extraction of technical details and usage experience of the tools also helps to identify the capabilities of the tools, which are discussed in Section 5.
The tools described above are general-purpose, and are not designed specifically for monitoring Cloud deployments. However, they can be adapted for this purpose by changing/extending their design with Cloud-specific monitoring features.
3.2. Cloud specific monitoring tools
The advent of Cloud Computing has given rise to the development of Cloud-specific monitoring tools. Currently, Cloud providers offer diverse services using proprietary software and management techniques. In addition, many of these providers use provider-dependent monitoring tools which complement their offerings. For instance, Amazon CloudWatch [3] monitors Amazon Web Services (AWS) such as Amazon EC2, Amazon RDS DB instances, and applications running on AWS. AzureWatch [5], on the other hand, monitors Azure-based resources, Windows Azure instances, SQL Azure Databases, websites, web applications, and Windows Azure storage. Both of these tools allow user-defined metrics to be monitored. In contrast, provider-independent Cloud monitoring tools exist that can be used to monitor a multiplicity of Cloud platforms. For example, Nimsoft [65] can monitor Amazon EC2, S3 Web Services, Rackspace Cloud, Microsoft Azure, Google AppEngine, Google Apps and Salesforce CRM; Monitis [69] can monitor Amazon EC2/AWS and Rackspace Cloud; CloudKick [38] can monitor Amazon EC2, GoGrid and Rackspace Cloud. Finally, some monitoring tools, such as the Private Cloud Monitoring System (PCMONS) [23], only monitor private Clouds. Table 2 summarises the review of Cloud specific monitoring tools.
Many Cloud based monitoring tools were initially designed to monitor IT infrastructure and were later extended to monitor Cloud deployments. For instance, the Nimsoft monitor originally provided monitoring for infrastructure, networks, servers, databases, applications and virtualised environments but later evolved to offer its services as a Unified Cloud Monitor for monitoring externally hosted systems and services. Like general purpose monitoring tools, provider independent Cloud monitoring tools have agents installed in the monitored systems, which measure and send monitored data to the server for processing. The server issues alerts in cases needing attention. Most of the Cloud monitoring tools (e.g., Nimsoft, CloudKick, Monitis, Boundary application monitor [12]) offer their services as SaaS that can be used to monitor third party Cloud installations. To realise this, the third party Clouds must support the installation and execution of SaaS agents. To efficiently determine the response time and communication latency, some monitoring tools, such as Boundary, only inspect packet header information, in contrast to other tools, which periodically sample the entire monitored dataset.
(a) Active mode. (b) Passive mode.
Fig. 2. Nagios active and passive monitoring modes.
Fig. 3. Zabbix monitoring architecture.
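Header-only inspection suffices for response-time measurement because flow identity and timing can be read from headers alone, without touching the payload. The sketch below illustrates the idea with an invented record layout (flow id, direction, timestamp); it does not reflect Boundary's actual implementation.

```python
# Illustration of why header-only inspection is enough for measuring
# response time: matching a request to its response needs only flow
# identifiers and timestamps, never the payload contents.

def response_times(packets):
    """Pair request/response headers per flow; return latency in ms."""
    pending = {}    # flow id -> request timestamp
    latencies = {}
    for flow, direction, t in packets:   # (flow_id, "req"/"resp", seconds)
        if direction == "req":
            pending[flow] = t
        elif flow in pending:
            latencies[flow] = (t - pending.pop(flow)) * 1000.0
    return latencies

trace = [("f1", "req", 10.000), ("f2", "req", 10.005),
         ("f1", "resp", 10.030), ("f2", "resp", 10.105)]
print(response_times(trace))  # f1 ~ 30 ms, f2 ~ 100 ms
```

By contrast, a tool that periodically samples the entire monitored dataset must copy and parse full payloads, which is why header-only inspection is the cheaper route to latency figures.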
Many of the Cloud monitoring tools, including Amazon CloudWatch, AzureWatch, Nimsoft and Monitis, are capable of monitoring at both the infrastructure and application levels. However, some tools are capable of monitoring only at the infrastructure (CloudKick, PCMONS) or application levels (Boundary application monitor, mOSAIC [82] and Cloud Application SLA Violation Detection (CASViD) [26]). In some cases, Cloud providers publish the source code of their monitoring agents under open source licences. For instance, CloudKick offers their client code in C#, .Net, Python, Java and Ruby and Nimsoft’s client code is available in C, Java, Perl, VB, and .Net.
Similar to Table 1, Table 2 specifies the technical details of the Cloud specific monitoring tools, which include the resources monitored by the tool, whether the tool is open source, operating systems supported by the tool, alerting mechanisms supported by the tool, whether the tool uses any standardised messaging middleware, the programming language used to implement the tool and any reported limitations found in the literature.
4. Importance of monitoring: A taxonomy
Monitoring is a term currently used in several fields of study to represent various processes. In the context of computing, there are some definitions of this term relating it to specialised areas such as Grid, Service-Oriented Architecture (SOA) and Distributed Systems [2,93,60,51]. However, these definitions are not complete and do not reflect the full characteristics of a modern monitoring system in our opinion.
Therefore, we propose a precise and concise definition of the term monitoring so that we can contextualise what we believe to be the important capabilities associated with the monitoring process. These capabilities can then be used as the basis to assess the monitoring tools described previously. Thus, informed by Kornaros et al. [51] and Mansouri et al. [60], we define monitoring as: ‘‘A process that fully and precisely identifies the root cause of an event by capturing the correct information at the right time and at the lowest cost in order to determine the state of a system and to surface the status in a timely and meaningful manner’’.
This definition views monitoring from a capability perspective. Terms like lowest cost, system’s status and timely indicate how the monitoring process should be exploited operationally in the service of managing complex systems like Clouds. The key characteristics of Cloud, such as agility, low cost, device and location independence, multi-tenancy, high reliability, high scalability, security and sustainability [34], also identify some capabilities a monitoring tool should possess to adapt to Cloud environments. There are many Cloud operational areas, such as SLA and configuration management and security, within which monitoring plays an important role in servicing Cloud provider and Cloud consumer objectives. In this section, we identify the monitoring capabilities that are essential for facilitating complex management activities in Cloud environments. From this we construct and present a taxonomy of monitoring capabilities in the context of specific Cloud operational areas.
4.1. Desirable Cloud monitoring capabilities
This section presents some important capabilities of an efficient Cloud monitoring tool. Interpreting our definition of monitoring in the context of Cloud, and from the key Cloud characteristics, we identify the following list of prevalent capabilities:
• Scalability: Cloud deployments can be of very large scale, consisting of thousands of nodes. In order to manage these resources, a monitoring tool needs to be scalable to deliver the monitored information in a timely and flexible manner. The importance of this capability has been discussed in the literature [33,2,93,37]. Developers are currently striving to achieve high scalability in terms of resource and application management in Clouds. Therefore, a scalable means of supervising the service delivery processes for adequate management is fundamental.
• Portability: Cloud environments incorporate heterogeneous platforms and services. Therefore, the portability of monitoring tools, i.e., the ability to move the tool from one platform to another, is indispensable to facilitate efficient management of such environments and to achieve wide penetration across multiple Clouds [93,33].
• Non-intrusiveness: In Clouds, there are large numbers of resources to be monitored, hence the computational power consumed by the monitoring tools to monitor all of these resources might have a big impact on the performance of the overall system. To cater for such environments, a monitoring tool should consume as little resource capacity as possible on the monitored systems so as not to hamper the overall performance of the monitored systems [33,2].
• Robustness: Clouds represent a frequently and dynamically changing environment. It is important that the monitoring tool detects changes in circumstances, such as the addition or removal of tenants and resources [37,2]. A monitoring tool needs the ability to adapt to a new situation by continuing its operation in the changed environment, which helps to mitigate faults and to provide accurate monitored information [33].
• Multi-tenancy: Clouds may offer a multi-tenant environment where multiple tenants share the same physical resources and application instances. A number of works in the literature have discussed the necessity of this functional requirement, especially in guaranteeing service level agreements and virtual machine monitoring [18,91,37]. To support multi-tenancy provisioning, the Cloud monitoring tool should maintain concurrency, i.e., multiple customers being able to get common monitored information, and isolation, i.e., tenants only being able to access the information that is addressed to them. Efficient monitoring tools should embody this capability.
• Interoperability: Currently, Cloud environments may include dozens of independent, heterogeneous data centres operating mostly as stand-alone resources. Many business analysts have predicted the need for interoperable federated Clouds [15]. Interoperability is a prerequisite for Cloud bursting and for the creation of federated offerings from multiple providers [37,91]. A modern monitoring tool should be capable of sharing monitoring information between heterogeneous Cloud components for managing collaborative operations.
• Customizability: There are presently numerous Cloud service offerings and many providers are seeking ways to deliver unique services to their customers by allowing them high customisation flexibility. Considering the large number of customers, providers must be able to manage the service customisation of each customer, for example, by granting customers the ability to choose the metrics to be monitored for their service. Thus, to realise this goal, efficient monitoring tools should possess this capacity.
• Extensibility: With the rapid growth of Cloud computing, there are continuous changes and extensions to technologies, especially in the area of management. Since monitoring techniques are fundamental to Cloud management, the monitoring tools need to be extensible and be able to adapt to new environments, such as being able to incorporate new monitoring metrics [93]. This is typically achieved through a modular design.
K. Fatema et al. / J. Parallel Distrib. Comput. 74 (2014) 2918–2933 2925
• Shared resource monitoring: Cloud technology uses virtualization of physical machine resources to achieve usage isolation in the form of virtual machines. The virtual machines share the underlying resources while multiple applications share the resources of a virtual machine. Thus, to avoid resource contention among the virtual machines or manage resources shared by applications in a virtual machine, efficient monitoring is needed. A Cloud monitoring tool needs the capability of supervising shared resources to manage such an environment [37].
• Usability: Usability is one of the critical issues facing the adoption of Cloud computing. Fitness for purpose is an important factor when evaluating usability, since the intended goal of a monitoring tool determines the usability judgement. As a consequence, any monitoring tool that is designed to support Cloud management needs to be easy to use [37]. To be highly usable, a monitoring tool should facilitate deployment, maintenance and human interaction.
• Affordability: One of the reasons behind the popularity of Cloud adoption is the reduction of cost. The cost effectiveness of a monitoring tool (e.g., being open source) impacts its widespread acceptance [19]. A Gartner survey [31] shows that more than 85% of enterprises were using open source software as of 2008 to drive down cost and increase flexibility. This indicates that being open source would also have a positive impact on the prominence of a monitoring tool. We rate affordability by considering the cost of both the monitoring agent and the back-end server component.
• Archivability: The availability of historical data can be useful for analysing and identifying the root cause of a problem in the long term [13,71]. In order to serve this purpose, a monitoring tool should possess a means of storing historical data.
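The extensibility and archivability capabilities above can be illustrated with a small sketch: a monitoring core that accepts pluggable metric collectors and appends every sample to a history for later analysis. This is a minimal illustration under stated assumptions; the class, method and metric names are hypothetical and are not taken from any surveyed tool.

```python
import time

class Monitor:
    """Minimal extensible monitor: metric collectors are plugged in
    as callables, and every sample is archived for later analysis."""

    def __init__(self):
        self.collectors = {}   # metric name -> zero-argument callable
        self.history = []      # archived samples: (timestamp, name, value)

    def register(self, name, collector):
        # Extensibility: new metrics are added without changing the core.
        self.collectors[name] = collector

    def sample(self):
        # Archivability: every reading is stored with its timestamp.
        now = time.time()
        readings = {}
        for name, collect in self.collectors.items():
            value = collect()
            self.history.append((now, name, value))
            readings[name] = value
        return readings

mon = Monitor()
mon.register("cpu_load", lambda: 0.42)   # stand-in for a real probe
mon.register("active_vms", lambda: 7)
print(mon.sample())   # {'cpu_load': 0.42, 'active_vms': 7}
```

A real tool would persist `history` to durable storage rather than keeping it in memory, but the design point is the same: the core stays fixed while metrics are added as modules.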
4.2. Cloud operational areas facilitated by monitoring
Cloud stakeholders, such as providers and consumers, have varying motivations for gaining insight into Cloud operations. In this section, we present some Cloud operational areas that can be supported by monitoring. Later on, we present a taxonomy of the corresponding capabilities that a monitoring tool needs to possess in order to support these Cloud operational areas.
Cloud computing offers a new style of computing that allows consumers to pay only for the services used and frees them from the management overhead of the underlying infrastructure. This enables a low initial set-up cost for business owners. In spite of the pricing advantage, consumers are still sceptical about Cloud offerings [59]. For assurance, they may require insight into areas such as (i) Cloud usage information, to confirm the correctness of their bills, (ii) SLA enforcement mechanisms, ensuring their QoS objectives, and (iii) security and privacy policies guiding the storage and transfer of their data. Surveys show that the possible lack of security and loss of control over data in Clouds are among the major concerns of Cloud consumers [90,17]. Monitoring plays a role in detecting security breaches and hence can provide assurance of security maintenance.
Whilst Cloud consumers are free from the worry of maintenance overhead, providers, on the other hand, have the responsibility of maintaining and managing the underlying infrastructure. Monitoring is an essential part of Cloud management and serves various objectives of Cloud providers, such as (i) provisioning resources/services, (ii) optimal capacity planning, (iii) assuring SLAs, (iv) configuration management, (v) billing and (vi) security/privacy assurance.
The above discussion highlights the Cloud operational areas that are facilitated by monitoring. Table 3 summarises these areas and indicates the areas where providers and consumers have different monitoring perspectives.
In the following, we describe the Cloud operational areas that can benefit from monitoring. In the process, we reflect on the desirable capabilities of a monitoring tool that would make it fit for the purpose in question.

Accounting and billing: The notion of providing computing as a utility service relies heavily on the ability to record and account for the Cloud usage information on which billing schemes are based. Accurate accounting and billing rely on the ability to capture the consumption and allocation information of virtual resources as well as that of applications (e.g., compute hours used, bandwidth used) [24]. This is a capability that monitoring can provide. Furthermore, the provision of a transparent billing system that is able to record data in a verifiable and trustworthy manner, to ensure protection against forgery and false modifications, requires robust and secure monitoring capabilities [78,87].

SLA management: A Service Level Agreement (SLA) represents a contract signed between a service provider and a customer specifying the terms of a service offering, including quality of service (QoS), pricing and penalties in case of violation of the agreed terms [25,36]. SLA management is an area of great importance for Cloud providers, since the assurance of SLA enforcement is essential for customer satisfaction and hence is a driving force for the continuity and growth of a Cloud business. Providers are expected to meet the QoS requirements as well as the Key Performance Indicators (KPIs) for services in order to enforce their agreed SLA terms. Monitoring is essential to achieve these goals. The monitoring capabilities required to support operations in this area include the ability to measure QoS parameters, to store and analyse data, to measure resource consumption and to assess SLA parameters.
These capabilities are expected from a monitoring tool for the purpose of SLA management [36,20,76,25].

Service/resource provisioning: Service/resource provisioning involves the optimal allocation of resources in order to match the workload [28]. It is an essential requirement for providing Cloud elasticity. Provisioning can be implemented in two ways: (1) static provisioning, where VMs are created with a specified size and then consolidated onto a set of physical servers, and the VM capacity does not change; and (2) dynamic provisioning, where VM capacity is dynamically adjusted to match workload fluctuations [68]. The ability to measure the overall resource consumption of a system, along with the ability to measure per-service resource consumption (which identifies the amount of resources each individual service needs), is essential for efficient provisioning. Furthermore, the ability to assess risk and QoS is needed for effective provisioning decisions, such as whether to allocate/release resources to ensure that quality is not compromised or resources are not wasted [28,94].

Capacity planning: Capacity planning is an important domain in Cloud computing, especially for the provider. It ensures adequate resource availability to meet the capacity demand necessary for securing a level of service quality and for serving various Cloud operational management activities, e.g., disaster recovery and maintaining backups [81]. The ability to measure capacity usage enables operations such as predicting the need for more resources or determining resource wastage. Furthermore, the ability to detect Cloud node availability is necessary to maintain a required level of resource limits [67]. Monitoring capabilities such as component status identification play an important role in facilitating these goals.

Configuration management: Configuration is a set of parameters and values that determine the behaviour of devices and software [88].
While a Cloud provider may operate a multi-tenant environment, it needs to manage customer-specific configurations. The initial configurations may contain the minimal set of resources required for a certain service. Resources may be added or released depending on varying load, resulting in reconfiguration at run
Table 3
Cloud operational areas: provider and consumer perspectives.

Cloud operational area             Provider perspective  Consumer perspective
Accounting and billing             yes                   yes
SLA management                     yes                   yes
Service and resource provisioning  yes                   no
Capacity planning                  yes                   no
Configuration management           yes                   no
Security and privacy assurance     yes                   yes
Fault management                   yes                   no
Fig. 4. Taxonomy of monitoring capabilities based on Cloud objectives.
time. A configuration management system needs to be able to verify specified configurations and to identify possible changes [29,88]. In this area, monitoring supports: configuration verification, by identifying a configuration and verifying that the instances have the desired configuration; configuration drift identification, by determining whether the configuration of an instance has changed; and configuration effect monitoring, by auditing the performance effects due to configuration changes.

Security and privacy assurance: The risk of compromising security and privacy is one of the major hindrances to the widespread use of Cloud computing [85]. The capability of detecting security breaches or attacks [53] is essential for this purpose, and monitoring can help in this regard, for example, by identifying a malicious process consuming a disproportionate share of system resources. To ensure the security of Cloud services, it is important to ensure that the tool being used for monitoring does not itself introduce any vulnerabilities. This is especially important in a multi-tenant Cloud environment. Monitoring capabilities such as user-based access control, secure notification and secure storage are essential to support this operational area [92,90,91].

Fault management: Fault management is one of the core operational areas of the Cloud that enables reliable and resilient Cloud services [44]. Continuous monitoring enables timely prediction and detection of failures, which can be pro-actively handled by replacing the suspected components [6]. Due to the complex constitution
of Cloud components, faults can occur in many different ways, e.g., server overload or network/hardware/service failure [45]. Therefore, the capability of identifying the load of a Cloud system and detecting the availability of components is necessary to support the operations in this area.
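Several of the operational areas above rely on the component status identification capability. As a minimal illustration (the probe mechanism and component names here are hypothetical, not taken from any surveyed tool), a monitor might classify components by whether they respond to a probe and flag the unavailable ones for pro-active handling:

```python
def check_availability(components, probe):
    """Classify components as 'up' or 'down' using a probe callable.

    `probe(name)` returns True when the component responds; in a real
    tool this would be a heartbeat, ping or agent query.
    """
    status = {}
    for name in components:
        status[name] = "up" if probe(name) else "down"
    return status

# Hypothetical deployment: one web server is unresponsive.
responding = {"web-1": True, "web-2": False, "db-1": True}
status = check_availability(responding, lambda name: responding[name])
unavailable = [n for n, s in status.items() if s == "down"]
print(unavailable)  # ['web-2']
```

In practice the probe result would feed fault management (replace or restart the suspected component) and capacity planning (track how many nodes are actually available).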
In Fig. 4, we present the taxonomy of the identified monitoring capabilities categorised on the basis of Cloud operational areas. There are 12 common capabilities that are essential for all Cloud operational areas (as described in Section 4.1). It can be seen in the taxonomy that some of the capabilities are common to a number of operational areas but not to all; this results in 29 distinct capabilities. Based on this taxonomy, we analyse the described monitoring tools to identify their strengths and challenges and to predict future trends in this area.
5. Analysis of tools based on taxonomy
This section presents our analysis of the monitoring tools described in Sections 3.1 and 3.2. This analysis is based on the monitoring capabilities taxonomy described previously. In line with our approach in Section 3, we partition the set of tools into two groups: those that are general purpose and those that are Cloud specific.
Table 4 presents the analysis of the general purpose monitoring tools and Table 5 presents the analysis of the Cloud specific monitoring tools. The second column of each table shows the weighted average percentage of the capabilities implemented by the tools. In the
Table 4
General purpose monitoring tool analysis. Tool columns, in order: Nagios, Collectd, Opsview, Cacti, Zabbix, OpenNMS, Ganglia, Hyperic, IBM Tivoli, Kiwi Monitor, DAMS, RDT.

Scalability (46% implemented): no, no, limited, no, yes, no, yes, yes, yes, no, yes, no
Portability (79%): limited, limited, yes, limited, yes, yes, yes, yes, yes, no, yes, yes
Non-intrusiveness^a (50%): limited, limited, yes, no, yes, yes, limited, limited, yes, no, no, no
Robustness (33%): no, no, no, no, no, no, yes, yes, yes, no, no, yes
Multi-tenancy (33%): yes, no, yes, no, no, no, no, yes, yes, no, no, no
Interoperability (25%): no, yes, no, no, yes, yes, no, no, no, no, no, no
Customizability (100%): yes for all tools
Extensibility (100%): yes for all tools
Shared resource monitoring (42%): yes, yes, yes, no, yes, no, no, yes, no, no, no, no
Usability^a (92%): no, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes
Affordability (79%): limited, yes, limited, yes, yes, yes, yes, limited, no, yes, yes, yes
Archivability (67%): yes, no, yes, yes, yes, yes, yes, yes, yes, no, no, no
Verifiable measuring (0%): no for all tools
Resource usage metering (75%): yes, yes, yes, yes, yes, yes, yes, yes, yes, no, no, no
Service usage metering (50%): yes, yes, yes, no, yes, no, no, yes, no, yes, no, no
Service KPI monitoring (0%): no for all tools
QoS monitoring (50%): yes, no, yes, no, yes, no, no, yes, yes, yes, no, no
Risk assessment (58%): yes, no, yes, yes, yes, yes, no, yes, yes, no, no, no
Component status identification (100%): yes for all tools
Service load monitoring (50%): yes, yes, yes, no, yes, no, no, yes, no, yes, no, no
Configuration verification (75%): yes, yes, yes, yes, yes, yes, yes, yes, yes, no, no, no
Configuration drift identification (75%): yes, yes, yes, yes, yes, yes, yes, yes, yes, no, no, no
Configuration effect monitoring (75%): yes, yes, yes, yes, yes, yes, yes, yes, yes, no, no, no
Security breach monitoring (33%): yes, no, no, no, yes, no, no, yes, yes, no, no, no
User access control (50%): no, no, yes, yes, yes, yes, no, yes, yes, no, no, no
User activity monitoring (17%): no, no, no, no, no, no, no, no, yes, yes, no, no
Secured notification (17%): no, no, no, no, no, yes, no, no, yes, no, no, no
Secured storage (17%): no, no, no, no, no, no, no, yes, yes, no, no, no
Service dependency (21%): no, no, no, no, no, no, no, yes, no, no, yes, yes
Percentage covered by tools: 61%, 52%, 70%, 46%, 78%, 59%, 50%, 85%, 78%, 33%, 30%, 30%

^a We note that some of these capabilities are somewhat subjective. With that in mind, the evaluation presented here is based on the weight of opinion as reflected in the reviewed literature.
calculations, we assign ''1'' if a tool has a particular capability and ''0'' if it does not. A ''0.5'' is assigned if the tool partly implements a capability. The sum of these values is used to calculate the assessment percentage. In Tables 4 and 5, a ''1'' is equivalent to ''yes'', a ''0'' implies ''no'' and ''0.5'' represents ''limited''. The capabilities that scored ''0'' for all tools are excluded from the calculation of the weighted average percentage of capabilities covered by each tool, which is presented in the last row of each table. For tools with multiple versions (for example, Nagios, Opsview, Hyperic, CloudKick, Nimsoft and Monitis), the capabilities are identified based on the superset of features of all versions.
The determinations of capability implementation by the tools are based on literature reviews and evaluations of the tools. Some of the capabilities, such as non-intrusiveness and usability, are subjective; hence they are evaluated based on the weight of opinion found in the reviewed literature.
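The scoring scheme described above can be sketched as follows. This is a minimal illustration of the calculation; the function name and the small set of rated capabilities are examples, not the paper's full data set.

```python
# Map the table labels to numeric scores: yes=1, limited=0.5, no=0.
SCORES = {"yes": 1.0, "limited": 0.5, "no": 0.0}

def coverage_percentage(ratings):
    """Weighted-average percentage of capabilities a tool implements.

    `ratings` maps capability name -> "yes" / "limited" / "no".
    Capabilities excluded from the calculation (e.g. those scored
    "no" for every tool) are simply omitted from the dict.
    """
    values = [SCORES[r] for r in ratings.values()]
    return round(100 * sum(values) / len(values))

# Hypothetical tool with four rated capabilities: 2.5 points out of 4.
tool = {"scalability": "yes", "portability": "limited",
        "robustness": "no", "customizability": "yes"}
print(coverage_percentage(tool))  # 62
```

The same calculation, restricted to the capabilities relevant to one operational area, yields the per-area percentages reported later in Tables 6 and 7.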
5.1. Analysis of general purpose monitoring tools
The Cloud monitoring capabilities taxonomy is applied to analyse the general purpose monitoring tools in order to determine how these tools could be extended or adapted to facilitate efficient Cloud monitoring.
As shown in Table 4, this group of tools has broad implementation of the portability, customizability, extensibility, usability, affordability, resource usage metering, component status identification, configuration verification, configuration drift identification and configuration effect monitoring capabilities.
These tools are weaker on the scalability, non-intrusiveness, robustness, multi-tenancy, per-service resource consumption metering, QoS monitoring, risk assessment and service load monitoring
capabilities. Interoperability and the capabilities related to security and privacy assurance are the least implemented by these tools, as shown by their calculated percentages. Some desirable capabilities, such as verifiable metering and service KPI monitoring, are not implemented by any of the tools in this group. The lack of support for these capabilities illustrates the fact that the tools were not designed with the Cloud in mind and must be extended or redesigned to be usable in a Cloud environment. The identification of these capabilities emphasises the challenges in adapting these tools for monitoring Clouds, since these capabilities are essential for efficient Cloud operational management.
The last row of Table 4 shows the weighted average percentage of the capabilities covered by each monitoring tool. This row provides a relative comparison of the number of capabilities implemented by each tool. As shown in the table, the Hyperic monitoring tool achieves the highest percentage, implementing 85% of all capabilities. All of the general purpose monitoring tools surveyed scored in excess of 50% except for Kiwi Application Monitor, DAMS and RDT, each of which implemented only around 30% of the listed capabilities. These three tools were mainly designed for monitoring general applications, and their scores reflect the fact that they are not fit for full Cloud operational management, which requires both resource and application monitoring.
5.2. Analysis of Cloud specific monitoring tools
In this section, we analyse the Cloud specific monitoring tools using the previously described taxonomy. The goal of the analysis is to determine the strengths, drawbacks and challenges facing these tools. Table 5 presents the analysis. As shown in the table, all of the tools implement the customizability, extensibility, resource usage metering, service usage metering, component
Table 5
Cloud based monitoring tool analysis. Tool columns, in order: CloudKick, Nimsoft, Monitis, Amazon CloudWatch, AzureWatch, PCMONS, Boundary app. monitor, mOSAIC, CASViD.

Scalability (78% implemented): yes, yes, yes, yes, yes, no, yes, no, yes
Portability (56%): limited, yes, yes, no, no, no, yes, yes, yes
Non-intrusiveness^a (94%): yes, yes, yes, yes, yes, limited, yes, yes, yes
Robustness (67%): yes, yes, yes, yes, yes, no, yes, no, no
Multi-tenancy (44%): yes, yes, no, yes, yes, no, no, no, no
Interoperability (33%): no, no, no, no, no, yes, no, yes, yes
Customizability (100%): yes for all tools
Extensibility (100%): yes for all tools
Shared resource monitoring (78%): yes, yes, yes, yes, yes, no, yes, no, yes
Usability^a (94%): yes, yes, yes, yes, yes, limited, yes, yes, yes
Affordability (50%): limited, limited, limited, no, no, yes, no, yes, yes
Archivability (78%): yes, yes, yes, yes, yes, yes, yes, no, no
Verifiable measuring (0%): no for all tools
Resource usage metering (100%): yes for all tools
Service usage metering (100%): yes for all tools
Service KPI monitoring (0%): no for all tools
QoS monitoring (89%): yes, yes, yes, yes, yes, no, yes, yes, yes
Risk assessment (92%): yes, yes, limited, yes, yes, yes, yes, yes, yes
Component status identification (100%): yes for all tools
Service load monitoring (100%): yes for all tools
Configuration verification (78%): yes, yes, yes, yes, yes, yes, no, no, yes
Configuration drift identification (78%): yes, yes, yes, yes, yes, yes, no, no, yes
Configuration effect monitoring (100%): yes for all tools
Security breach monitoring (56%): yes, yes, no, yes, yes, no, yes, no, no
User access control (67%): yes, yes, yes, yes, yes, no, yes, no, no
User activity monitoring (0%): no for all tools
Secured notification (33%): no, yes, no, yes, yes, no, no, no, no
Secured storage (33%): no, yes, no, yes, yes, no, no, no, no
Service dependency (11%): no, no, no, no, no, no, yes, no, no
Percentage covered by tools: 78%, 87%, 70%, 85%, 85%, 52%, 73%, 54%, 69%

^a We note that some of these capabilities are somewhat subjective. With that in mind, the evaluation presented here is based on the weight of opinion as reflected in the reviewed literature.
status identification, service load monitoring and configuration effect monitoring capabilities.
This group of tools lacks implementation of the portability, multi-tenancy, interoperability, secured notification, secured storage and service dependency capabilities. These tools are generally designed for monitoring in Clouds, and many of them are commercial and proprietary, i.e., provider and platform dependent. This accounts for the low levels of interoperability (33%) and portability (56%) observed. The implementation of multi-tenancy by these tools is 44%, compared to 33% for the general purpose monitoring tools. Since this capability is indispensable for multi-tenant Cloud environments, we argue that this area is a challenge for future research. The need for securing the monitoring tool itself is important for ensuring that the tool does not create any security holes in the Cloud. As can be observed, the capabilities associated with secure monitoring lack implementations. Therefore, future research efforts are required in this area.
Compared to the general purpose monitoring tools, the scalability (78%), non-intrusiveness (94%), robustness (67%), multi-tenancy (44%), shared resource monitoring (78%), resource usage metering (100%), per-service resource consumption metering (100%) and service load monitoring (100%) capabilities are better addressed by the Cloud specific monitoring tools. This is reasonable, as the demand for these capabilities is high in Clouds.
In line with our descriptions, the last row of Table 5 shows the weighted average percentage of the capabilities implemented by each of the tools. The Nimsoft monitor emerges as the best with 87% coverage, while the PCMONS tool is the lowest with 52%.
One interesting observation is that most of the capabilities are better covered by the Cloud based monitoring tools, except for portability and affordability. This reinforces the fact that many of the Cloud
specific monitoring tools are commercial and vendor dependent, which makes them less portable and less affordable, whereas most of the general purpose monitoring tools are open source.
It is interesting to note that none of the tools analysed implements the verifiable measuring, service KPI monitoring or user activity monitoring capabilities. This shows that the realisation of these features is still a challenge, which indicates that further research efforts are needed in this direction.
5.3. Capability implementation coverage
In this section, we present a graphical representation of the capability implementation percentages of the monitoring tools. We take the Cloud operational areas associated with the capabilities into consideration. This representation clearly shows the Cloud operational areas that are currently better supported by the monitoring tools in terms of implemented capabilities and the ones that are poorly supported, which exposes the need for further research efforts in those areas.
Fig. 5 shows this graphical representation. It includes the capabilities of the general purpose as well as the Cloud based monitoring tools. From the figure, it can be observed that all the operational areas are better supported by the Cloud based monitoring tools than by the general purpose monitoring tools. This can be attributed to the fact that Cloud based monitoring tools are more specialised in terms of supporting Cloud operational management.
With the exception of the capabilities relating to the security and privacy assurance operational area, all other areas are supported with at least 57% implementation coverage. This implies that the current monitoring tools are least supportive of security and privacy assurance management in Clouds.
Table 6
Cloud operational area based analysis of general purpose monitoring tools. Tool columns, in order: Nagios, Collectd, Opsview, Cacti, Zabbix, OpenNMS, Ganglia, Hyperic, IBM Tivoli, Kiwi Monitor, DAMS, RDT.

Accounting and billing: 61%, 64%, 79%, 46%, 86%, 64%, 68%, 86%, 71%, 36%, 43%, 43%
SLA management: 66%, 56%, 81%, 47%, 88%, 63%, 59%, 88%, 75%, 38%, 38%, 38%
Service and resource provisioning: 68%, 59%, 82%, 50%, 88%, 65%, 62%, 88%, 76%, 41%, 41%, 41%
Capacity planning: 60%, 60%, 80%, 56%, 87%, 73%, 70%, 87%, 80%, 33%, 46%, 46%
Configuration management: 63%, 66%, 80%, 56%, 86%, 73%, 73%, 86%, 80%, 26%, 40%, 40%
Security and privacy assurance: 44%, 41%, 59%, 38%, 70%, 59%, 50%, 76%, 82%, 29%, 41%, 41%
Fault management: 57%, 60%, 73%, 43%, 80%, 60%, 63%, 87%, 67%, 40%, 53%, 53%
Table 7
Cloud operational area based analysis of Cloud based monitoring tools. Tool columns, in order: CloudKick, Nimsoft, Monitis, Amazon CloudWatch, AzureWatch, PCMONS, Boundary app. monitor, mOSAIC, CASViD.

Accounting and billing: 86%, 89%, 82%, 79%, 79%, 57%, 79%, 64%, 79%
SLA management: 88%, 90%, 81%, 81%, 81%, 56%, 81%, 69%, 81%
Service and resource provisioning: 88%, 91%, 82%, 82%, 82%, 59%, 82%, 70%, 82%
Capacity planning: 87%, 90%, 80%, 80%, 80%, 60%, 80%, 66%, 80%
Configuration management: 87%, 90%, 83%, 80%, 80%, 60%, 73%, 60%, 80%
Security and privacy assurance: 75%, 90%, 66%, 81%, 81%, 38%, 69%, 44%, 56%
Fault management: 80%, 83%, 77%, 73%, 73%, 53%, 80%, 60%, 73%
Fig. 5. Capability implementation coverage by monitoring tools.
5.4. Cloud operational area coverage
In this section, we present the coverage of each Cloud operational area by the monitoring tools in terms of the percentage of implemented capabilities relevant to that area. This analysis reveals the fitness of each monitoring tool for the various Cloud operational areas. Table 6 shows the analysis of the general purpose monitoring tools in terms of coverage for each Cloud operational area, whereas Table 7 shows the same for the Cloud specific monitoring tools. The percentages shown in the tables are calculated from Tables 4 and 5 by separating out the relevant capabilities of each Cloud operational area as presented in the taxonomy.
From the comparison of Tables 4 and 6, it is interesting to observe that although Hyperic performs the best among the general purpose monitoring tools in terms of the implementation percentage of the considered capabilities, it is outperformed by IBM Tivoli in terms of security and privacy assurance. Zabbix is also found to be as good as Hyperic in supporting almost all of the operational areas except for security and privacy assurance and
fault management. However, among the Cloud specific monitoring tools, Nimsoft is found to perform the best, not only in terms of the implementation percentage of the considered capabilities but also in its support for each individual Cloud operational area. In the next section, we differentiate our contributions from the previous work.
6. Related work
This section analyses the previous work on this topic and highlights our differences. Buytaert et al. [13] provide a detailed review of several monitoring tools (Nagios, OpenNMS, Zabbix and Hyperic) while identifying the expected features of a monitoring system from the users', system administrators' and developers' perspectives. However, their analysis is limited to the suitability of the tools for monitoring computing infrastructures and networks only. Krizanic et al. [52] review performance monitoring tools and categorise them according to the operating systems and notification/alerting facilities that they support. Nevertheless, their analysis does not consider the application of the tools to Cloud monitoring. Zanikolas et al. [93] categorise monitoring tools applicable to Grid computing based on their usefulness for generating, processing, distributing and consuming monitoring data. The authors identify the overlapping functionality and the lack of agreement among the protocols and semantics used in monitoring projects, while pointing out the need for more coordination and interoperability among those projects. However, their approach does not consider Cloud environments. Benedict [9] surveys performance monitoring tools with the goal of helping developers choose the best monitoring tool for High Performance Computing (HPC) applications. His review does not consider the applicability of these tools to high-level Cloud monitoring scenarios.
Rimal et al. [84] survey a number of Cloud service providers and place their offerings into categories such as architecture, virtualization technology, load balancing, fault tolerance, interoperability and storage systems. Nevertheless, the authors do not consider the monitoring techniques used by these Cloud providers. Aceto et al. [2,1] present a taxonomy of Cloud monitoring which includes (i) the need for monitoring, revealing the various Cloud activities where monitoring is an essential task, (ii) basic concepts, such as layers, abstraction levels and metrics, (iii) properties, which exhibit the various characteristics a distributed monitoring system
should have, and (iv) open issues and future directions. Furthermore, their survey categorises the tools into open-source, commercial and services. Our work addresses the same area, i.e., Cloud monitoring, but from a different perspective. We surveyed the monitoring tools and categorised them into general purpose and Cloud specific monitoring tools. Such a categorisation provides a historical view of the evolution of these tools from the pre-Cloud era to the Cloud era. We identified the prevalent Cloud monitoring capabilities, which ensue from our definition of monitoring and key Cloud characteristics. We provided distinct provider and consumer views of the various Cloud operational areas where monitoring plays an essential role. The identified operational areas are explored in depth to surface the capabilities required of a monitoring tool in order to serve the objectives of those areas. Additionally, we presented a taxonomy of these capabilities. Moreover, our work provides a clear association of the monitoring tool capabilities with the Cloud operational areas, which is not the case in [2,1].
To the best of our knowledge, none of the previous work gives a historical evolution of monitoring tools and clearly associates Cloud operational areas with the monitoring capabilities required to effect informed decisions and thereby achieve efficient management. In the next section, we discuss the identified research challenges.
7. Identified challenges
In this section, we discuss the challenges and future trends derived from our analysis.
7.1. Trends and challenges
Cloud computing is a promising technology that is currently gaining wide acceptance in industry. Efficient management of this technology still remains a challenge and will be an important research area in the coming years. The work presented in this paper has analysed the state of the art in monitoring tools using a capability based approach. This novel approach afforded us the opportunity to compare and contrast the strengths and weaknesses of these tools in terms of their implemented functionalities.
General purpose monitoring tools, as we observed, are mainly focused on infrastructure resource monitoring. Many of these tools are currently being adapted for monitoring in Clouds, and we foresee that this adaptation process will continue into the future. In our analysis of general purpose monitoring tools, we observed that the realisation of some desirable capabilities, such as scalability, robustness and interoperability, is still a challenge. Our analysis showed that none of the tools surveyed has capabilities for verifiable metering or service KPI monitoring. Verifiable metering ensures the integrity of monitored data and is important for the trustworthiness of the Cloud provider. Monitoring service KPIs is important for a SaaS provider to measure the performance of the software service. Furthermore, the tools lack capabilities implementing user activity monitoring, secured notification, secured storage and service dependency monitoring. In a multi-tenant Cloud environment, it is important to monitor user activity to detect, for example, a malicious user. We consider these capabilities to be important for advanced Cloud operational management. We predict, therefore, that future research in this area will be focused on addressing these issues.
As evident from our review, Cloud specific monitoring tools are mostly being developed by commercial Cloud providers to help in managing their services and enforcing agreed Service Level Objectives (SLOs). This makes these tools platform dependent and proprietary. As can be seen from our analysis in Table 5, these tools are currently lacking in the implementation of the portability and affordability capabilities. Some Cloud specific monitoring tools
have also been developed in academia. However, these are largely prototypes and may not be production ready. We predict that most of the future research effort on Cloud monitoring will focus on the security capabilities of monitoring tools to make Clouds more dependable and trustworthy.
The lack of support for these capabilities may say more about the speed at which the field is evolving than about the importance with which these capabilities are viewed. We argue that this is an important area for future research.
8. Conclusion and future work
Monitoring in Clouds is an area that is yet to be fully realised. It plays an important role in supporting the efficient management of various Cloud operational areas, including SLA management and service and resource provisioning. In addition, we have postulated that it could, in the future, be used to provide quantitative support for trust assurance.
In this paper, we have reviewed the technical details of the monitoring tools, classifying them into general purpose and Cloud specific categories, which offers an opportunity to gain insights into the tools and to analyse their evolution historically. From our monitoring definition, we derived a list of capabilities that are considered relevant to facilitating efficient Cloud operational management. Since Cloud environments consist of various areas with unique functions that can be managed separately, we have identified some of these areas in order to investigate the role of monitoring in supporting them from both providers' and consumers' perspectives. To achieve this goal systematically, we defined a taxonomy grouping the capabilities of the different Cloud operational areas. This taxonomy provides a useful benchmark for analysing the strengths and weaknesses of the monitoring tools. Tables 4 and 5 highlight the extent to which the tools considered address the list of capabilities. More importantly, they draw attention to those capabilities that are under-represented.
We noted that general purpose monitoring tools and Cloud specific monitoring tools, as a whole, exposed different capabilities in accordance with their deployment domain. In general, this unbalanced emphasis on certain capabilities was broadly in line with expectations. General purpose monitoring tools are commonly designed with a client–server architecture where the client resides on the monitored object and communicates information to the server. These tools were designed for monitoring fixed-resource environments where there was no dynamic scaling of resources. This is reflected in the lack of many capabilities, such as scalability, as shown by our analysis in Table 4. We argue that these challenges must be addressed in the design of future monitoring tools, especially for Clouds, since issues such as scalability are important for Cloud monitoring.
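The client–server pattern described above can be illustrated with a minimal sketch: a push-based agent on the monitored object periodically reports a metric to a central collector. The address, message format and metric names below are illustrative assumptions, not taken from any of the surveyed tools.

```python
import json
import socket
import threading

# Illustrative collector address; real tools use configurable endpoints.
COLLECTOR_ADDR = ("127.0.0.1", 9999)

def run_collector(store, stop_event):
    """Server side: receive metric reports pushed by clients over UDP
    and keep the latest report per monitored host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(COLLECTOR_ADDR)
    sock.settimeout(0.5)  # wake up periodically to check the stop flag
    while not stop_event.is_set():
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            continue
        report = json.loads(data.decode())
        store[report["host"]] = report
    sock.close()

def push_metric(host, metric, value):
    """Client side: the agent residing on the monitored object
    pushes a single reading to the collector."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    message = json.dumps({"host": host, "metric": metric, "value": value})
    sock.sendto(message.encode(), COLLECTOR_ADDR)
    sock.close()
```

The sketch also hints at why this design scales poorly: every agent reports to one fixed collector, so under dynamic resource scaling the collector becomes a bottleneck and agents must be reconfigured as resources appear and disappear.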
The capability based analysis of the monitoring tools has helped to identify the Cloud operational areas that are better addressed by these tools and, more importantly, the areas that lack support, so that future research can take them into consideration. The review and analysis in this paper have provided a comprehensive overview of the described monitoring tools and their ability to support Cloud operational management. We consider this piece of work a reference and a basis for further research in this area, as described in the next section.
8.1. Future work
Proper and extensive Cloud monitoring has applications far beyond Cloud operational management. Applied intelligently, Cloud monitoring can form the foundation for trust assurance and for analytic techniques for visualising and analysing monitored data. From the trust assurance perspective, surveys of consumers and enterprises have consistently highlighted concerns about
entrustment of data to Cloud service providers. Cloud trustmarks have been proposed as a means of addressing these concerns [59]. By utilising modern web technologies, such as HTML5, trustmarks could be presented as active, dynamic entities that succinctly communicate up-to-date values for a number of high-level dependability measures. These dependability measures would be based on ‘‘live’’ analytics of aspects of the underlying service, supported by appropriate active monitoring.
From the perspective of data analysis and visualisation, the management of diverse Cloud metrics is an on-going challenge. To address this challenge, the construction of a metric ontology that categorises and structures the thousands of available Cloud metrics in a useful manner would be extremely beneficial. Such an ontology would afford us the chance to identify and study the factors that are important for making decisions such as, for instance, moving business applications to the Cloud or selecting particular providers and service offerings. This work would also include the design of analytic tools for analysing data and generating trend information, presented through a dashboard, for example.
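To make the idea concrete, a fragment of such a metric ontology might be organised as a hierarchy of operational areas, categories and metrics. The areas, categories and metric names below are illustrative examples only, not a proposed standard.

```python
# A minimal sketch of a Cloud metric ontology as a nested mapping:
# area -> category -> metrics. All names are illustrative assumptions.
METRIC_ONTOLOGY = {
    "infrastructure": {
        "compute": ["cpu_utilisation", "vm_count"],
        "storage": ["disk_iops", "capacity_used"],
        "network": ["throughput", "packet_loss"],
    },
    "service": {
        "availability": ["uptime_ratio"],
        "performance": ["response_time", "request_rate"],
    },
}

def find_category(metric):
    """Return the (area, category) path under which a metric is
    classified, or None if the metric is not in the ontology."""
    for area, categories in METRIC_ONTOLOGY.items():
        for category, metrics in categories.items():
            if metric in metrics:
                return (area, category)
    return None
```

For example, `find_category("response_time")` would locate the metric under the service performance category, allowing a dashboard or analytic tool to group and compare related metrics from different providers.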
Acknowledgments
The research work described in this paper was supported by the Irish Centre for Cloud Computing and Commerce, an Irish national Technology Centre funded by Enterprise Ireland and the Irish Industrial Development Authority.
References
[1] G. Aceto, A. Botta, W. de Donato, A. Pescapè, Cloud monitoring: definitions, issues and future directions, in: Proceedings of IEEE 1st International Conference on Cloud Networking (CLOUDNET), IEEE, 2012, pp. 63–67.
[2] G. Aceto, A. Botta, W. de Donato, A. Pescapè, Cloud monitoring: a survey, Comput. Netw. 57 (9) (2013) 2093–2115.
[3] Amazon, Amazon CloudWatch: Developer guide, 2009.
[4] S. Andreozzi, N. De Bortoli, S. Fantinel, A. Ghiselli, G.L. Rubini, G. Tortone, M.C. Vistoli, GridICE: a monitoring service for grid systems, Future Gener. Comput. Syst. 21 (4) (2005) 559–571.
[5] AzureWatch (Jan. 2014). URL https://www.paraleap.com/AzureWatch.
[6] A. Bala, I. Chana, Fault tolerance-challenges, techniques and implementation in cloud computing, Computer Science and Engineering Department, Thapar University, Patiala, Punjab, India.
[7] W. Barth, Nagios: System and Network Monitoring, No Starch Press, 2008.
[8] P. Bellavista, G. Carella, L. Foschini, T. Magedanz, F. Schreiner, K. Campowsky, QoS-aware elastic cloud brokering for IMS infrastructures, in: Computers and Communications (ISCC), 2012 IEEE Symposium on, IEEE, 2012, pp. 157–160.
[9] S. Benedict, Performance issues and performance analysis tools for HPC cloud applications: a survey, Computing (2012) 1–20.
[10] B. Bonev, S. Ilieva, Monitoring Java based SOA applications, in: Proceedings of the 13th International Conference on Computer Systems and Technologies, ACM, 2012, pp. 155–162.
[11] G. Boss, P. Malladi, D. Quan, L. Legregni, H. Hall, Cloud computing, IBM white paper 1369.
[12] Boundary application monitor (Jan. 2014). URL http://boundary.com/blog/tag/application-monitoring/.
[13] K. Buytaert, T. De Cooman, F. Descamps, B. Verwilst, Systems monitoring shootout, in: Linux Symposium, 2008, p. 53.
[14] V. Cardellini, S. Iannucci, Designing a flexible and modular architecture for a private cloud: a case study, in: Proceedings of the 6th International Workshop on Virtualization Technologies in Distributed Computing Date, in: VTDC'12, ACM, New York, NY, USA, 2012, pp. 37–44. http://dx.doi.org/10.1145/2287056.2287067.
[15] A. Celesti, F. Tusa, M. Villari, A. Puliafito, How to enhance cloud architectures to enable cross-federation, in: Proceedings of the 2010 IEEE 3rd International Conference on Cloud Computing, in: CLOUD'10, 2010, pp. 337–345.
[16] J. Chen, R. Childress, I. Mcintosh, G. Africa, A. Sitaramayya, A service management architecture component model, in: Network and Service Management (CNSM), 2011 7th International Conference on, IEEE, 2011, pp. 1–4.
[17] D. Chen, H. Zhao, Data security and privacy protection issues in cloud computing, in: Computer Science and Electronics Engineering (ICCSEE), 2012 International Conference on, vol. 1, IEEE, 2012, pp. 647–651.
[18] X. Cheng, Y. Shi, Q. Li, A multi-tenant oriented performance monitoring, detecting and scheduling architecture based on SLA, in: 2009 Joint Conferences on Pervasive Computing (JCPC), 2009, pp. 599–604. http://dx.doi.org/10.1109/JCPC.2009.5420114.
[19] C.C. Cirstoiu, C.C. Grigoras, L.L. Betev, A.A. Costan, I.C. Legrand, Monitoring, accounting and automated decision support for the ALICE experiment based on the MonALISA framework, in: Proceedings of the 2007 Workshop on Grid Monitoring, ACM, 2007, pp. 39–44.
[20] M. Comuzzi, C. Kotsokalis, G. Spanoudakis, R. Yahyapour, Establishing and monitoring SLAs in complex service based systems, in: Web Services, 2009. ICWS 2009. IEEE International Conference on, IEEE, 2009, pp. 783–790.
[21] B. Cowie, Building A Better Network Monitoring System, 2012.
[22] M. de Carvalho, L. Granville, Incorporating virtualization awareness in service monitoring systems, in: Integrated Network Management (IM), 2011 IFIP/IEEE International Symposium on, 2011, pp. 297–304. http://dx.doi.org/10.1109/INM.2011.5990704.
[23] S. De Chaves, R. Uriarte, C. Westphall, Toward an architecture for monitoring private clouds, IEEE Commun. Mag. 49 (12) (2011) 130–137.
[24] E. Elmroth, F.G. Marquez, D. Henriksson, D.P. Ferrera, Accounting and billing for federated cloud infrastructures, in: Grid and Cooperative Computing, 2009. GCC'09. Eighth International Conference on, IEEE, 2009, pp. 268–275.
[25] V.C. Emeakaroha, I. Brandic, M. Maurer, S. Dustdar, Low level metrics to high level SLAs-LoM2HiS framework: bridging the gap between monitored metrics and SLA parameters in cloud environments, in: High Performance Computing and Simulation (HPCS), 2010 International Conference on, IEEE, 2010, pp. 48–54.
[26] V. Emeakaroha, T. Ferreto, M. Netto, I. Brandic, C. De Rose, CASViD: applicationlevel monitoring for SLA violation detection in clouds, in: Computer Softwareand Applications Conference (COMPSAC), 2012 IEEE 36th Annual, 2012, pp.499–508. http://dx.doi.org/10.1109/COMPSAC.2012.68.
[27] Z. Feng, L. Huang, R-binder: application of using service binder in R-OSGi, in: Computer Science and Computational Technology, 2008, in: ISCSCT'08. International Symposium on, Vol. 2, IEEE, 2008, pp. 34–39.
[28] A.J. Ferrer, F. Hernández, J. Tordsson, E. Elmroth, A. Ali-Eldin, C. Zsigri, R. Sirvent, J. Guitart, R.M. Badia, K. Djemame, et al., OPTIMIS: a holistic approach to cloud service provisioning, Future Gener. Comput. Syst. 28 (1) (2012) 66–77.
[29] S. Ferretti, V. Ghini, F. Panzieri, M. Pellegrini, E. Turrini, QoS aware clouds, in: Cloud Computing (CLOUD), 2010 IEEE 3rd International Conference on, IEEE, 2010, pp. 321–328.
[30] M. Frank, L. Oniciuc, U. Meyer-Baese, I. Chiorescu, A space-efficient quantum computer simulator suitable for high-speed FPGA implementation, Proc. SPIE 7342, Quantum Information and Computation VII, 734203 (April 27, 2009). http://dx.doi.org/10.1117/12.817924.
[31] Gartner, Open source survey 2008. URL http://www.gartner.com/technology/home.jsp.
[32] GigaSpaces, Easy deployment of mission-critical applications to the cloud,2011.
[33] S.V. Gogouvitis, V. Alexandrou, N. Mavrogeorgi, S. Koutsoutos, D. Kyriazis, T. Varvarigou, A monitoring mechanism for storage clouds, in: Cloud and Green Computing (CGC), 2012 Second International Conference on, IEEE, 2012, pp. 153–159.
[34] C. Gong, J. Liu, Q. Zhang, H. Chen, Z. Gong, The characteristics of cloud computing, in: Parallel Processing Workshops (ICPPW), 2010 39th International Conference on, IEEE, 2010, pp. 275–279.
[35] D. Gupta, P. Mohapatra, C.-N. Chuah, Efficient monitoring in wireless mesh networks: overheads and accuracy trade-offs, in: Mobile Ad Hoc and Sensor Systems, 2008. MASS 2008. 5th IEEE International Conference on, IEEE, 2008, pp. 13–23.
[36] Z. Haiteng, S. Zhiqing, Z. Hong, Z. Jie, Establishing service level agreement requirement based on monitoring, in: Cloud and Green Computing (CGC), 2012 Second International Conference on, IEEE, 2012, pp. 472–476.
[37] P. Hasselmeyer, N. d'Heureuse, Towards holistic multi-tenant monitoring for virtual data centers, in: Network Operations and Management Symposium Workshops (NOMS Wksps), 2010 IEEE/IFIP, IEEE, 2010, pp. 350–356.
[38] B. Hoßbach, Reaktives Cloud Monitoring mit Complex Event Processing (Master's thesis), Philipps-Universität Marburg, Department of Mathematics and Informatics, Germany, 2011.
[39] Hyperic (Jan. 2014). URL http://www.hyperic.com.
[40] IBM Tivoli monitoring (Jan. 2014). URL https://publib.boulder.ibm.com/infocenter/ieduasst/tivv1r0/index.jsp?topic=/com.ibm.iea.itm/plugin_coverpage.html.
[41] E. Imamagic, D. Dobrenic, Grid infrastructure monitoring system based on Nagios, in: Proceedings of the 2007 Workshop on Grid Monitoring, in: GMW'07, ACM, New York, NY, USA, 2007, pp. 23–28. http://dx.doi.org/10.1145/1272680.1272685.
[42] E. Iori, A. Simitsis, T. Palpanas, K. Wilkinson, S. Harizopoulos, CloudAlloc: a monitoring and reservation system for compute clusters, in: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, in: SIGMOD'12, ACM, New York, NY, USA, 2012, pp. 721–724. http://dx.doi.org/10.1145/2213836.2213942.
[43] C. Issariyapat, P. Pongpaibool, S. Mongkolluksame, K. Meesublak, Using Nagios as a groundwork for developing a better network monitoring system, in: Technology Management for Emerging Technologies (PICMET), in: 2012 Proceedings of PICMET'12, IEEE, 2012, pp. 2771–2777.
[44] R. Jhawar, V. Piuri, M. Santambrogio, A comprehensive conceptual system-level approach to fault tolerance in cloud computing, in: Systems Conference(SysCon), 2012 IEEE International, IEEE, 2012, pp. 1–5.
[45] R. Jhawar, V. Piuri, M. Santambrogio, Fault tolerance management in cloudcomputing: A system-level perspective, Syst. J. IEEE 7 (2) (2013) 288–297.
[46] H. Jiang, H. Lv, N. Wang, R. Di, A performance monitoring solution for distributed application system based on JMX, in: Grid and Cooperative Computing (GCC), 2010 9th International Conference on, 2010, pp. 124–127. http://dx.doi.org/10.1109/GCC.2010.35.
[47] W.M. Jones, J.T. Daly, N. DeBardeleben, Application monitoring and checkpointing in HPC: looking towards exascale systems, in: Proceedings of the 50th Annual Southeast Regional Conference, ACM, 2012, pp. 262–267.
[48] G. Katsaros, R. Kübert, G. Gallizo, Building a service-oriented monitoring framework with REST and Nagios, in: Services Computing (SCC), 2011 IEEE International Conference on, IEEE, 2011, pp. 426–431. http://dx.doi.org/10.1109/SCC.2011.53.
[49] E. Kijsipongse, S. Vannarat, Autonomic resource provisioning in Rocks clusters using Eucalyptus cloud computing, in: Proceedings of the International Conference on Management of Emergent Digital EcoSystems, in: MEDES'10, ACM, New York, NY, USA, 2010, pp. 61–66. http://dx.doi.org/10.1145/1936254.1936265.
[50] I. Konstantinou, E. Angelou, D. Tsoumakos, C. Boumpouka, N. Koziris, S. Sioutas, Tiramola: elastic NoSQL provisioning through a cloud management platform, in: SIGMOD Conference'12, ACM, 2012, pp. 725–728.
[51] G. Kornaros, D. Pnevmatikatos, A survey and taxonomy of on-chip monitoringof multicore systems-on-chip, ACM Trans. Des. Autom. Electron. Syst. 18 (2)(2013) 17.
[52] J. Krizanic, A. Grguric, M. Mosmondor, P. Lazarevski, Load testing and performance monitoring tools in use with AJAX based web applications, in: MIPRO, 2010 Proceedings of the 33rd International Convention, IEEE, 2010, pp. 428–434.
[53] R.L. Krutz, R.D. Vines, Cloud Security: A Comprehensive Guide to Secure CloudComputing, Wiley, 2010.
[54] B.S. Lee, S. Yan, D. Ma, G. Zhao, Aggregating IaaS service, in: SRII GlobalConference (SRII), 2011 Annual, IEEE, 2011, pp. 335–338.
[55] D. Liang Lee, S.-Y. Yang, Y.-J. Chang, Developing an active mode of network management system with intelligent multi-agent techniques, in: Pervasive Computing (JCPC), 2009 Joint Conferences on, 2009, pp. 77–82.
[56] A. Lenk, M. Klems, J. Nimis, S. Tai, T. Sandholm, What's inside the cloud? An architectural map of the cloud landscape, in: Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing, IEEE Computer Society, 2009, pp. 23–31.
[57] T. Lindsey, Apparatus and method for modeling, and storing within a database, services on a telecommunications network, US Patent App. 10/127,315 (April 2002).
[58] F. Liu, J. Tong, J. Mao, R. Bohn, J. Messina, L. Badger, D. Leaf, NIST Cloud Computing Reference Architecture, vol. 500, NIST Special Publication, 2011, p. 292.
[59] T. Lynn, P. Healy, R. McClatchey, J. Morrison, C. Pahl, B. Lee, The case for cloud service trustmarks and assurance-as-a-service, in: Proceedings of the 3rd International Conference on Cloud Computing and Services Science (CLOSER), 2013.
[60] M. Mansouri-Samani, M. Sloman, Monitoring distributed systems, IEEE Netw.7 (6) (1993) 20–30.
[61] F. Marchioni, JBoss AS 7 Configuration, Deployment and Administration, PacktPublishing Ltd., 2012.
[62] P. Marshall, H. Tufo, K. Keahey, D. LaBissoniere, M. Woitaszek, Architecting a large-scale elastic environment: recontextualization and adaptive cloud services for scientific computing, in: 7th International Conference on Software Paradigm Trends (ICSOFT), Rome, Italy, 2012.
[63] R. Marty, Cloud application logging for forensics, in: Proceedings of the 2011ACM Symposium on Applied Computing, ACM, 2011, pp. 178–184.
[64] M. Massie, B. Chun, D. Culler, The Ganglia distributed monitoring system:design, implementation, and experience, Parallel Comput. 30 (7) (2004)817–840.
[65] A. Meddeb, Internet QoS: Pieces of the Puzzle, IEEE Commun. Mag. 48 (1)(2010) 86–94.
[66] R. Mehrotra, A. Dubey, J. Kwalkowski, M. Paterno, A. Singh, R. Herber, S. Abdelwahed, RFDMon: a real-time and fault-tolerant distributed system monitoring approach, 2011.
[67] D.A. Menascé, P. Ngo, Understanding cloud computing: Experimentation andcapacity planning, in: Computer Measurement Group Conference, 2009.
[68] X. Meng, C. Isci, J. Kephart, L. Zhang, E. Bouillet, D. Pendarakis, Efficient resource provisioning in compute clouds via VM multiplexing, in: Proceedings of the 7th International Conference on Autonomic Computing, ACM, 2010, pp. 11–20.
[69] Monitis (Jan. 2014). URL https://www.monitis.com/.
[70] S. Mongkolluksamee, Strengths and limitations of Nagios as a network monitoring solution, in: Proceedings of the 7th International Joint Conference on Computer Science and Software Engineering (JCSSE 2010), 2009, pp. 96–101.
[71] D.E. Morgan, W. Banks, D.P. Goodspeed, R. Kolanko, A computer network monitoring system, IEEE Trans. Softw. Eng. SE-1 (3) (1975) 299–311.
[72] J. Murphy, Snoscan: an iterative functionality service scanner for large scale networks (Ph.D. thesis), Iowa State University, 2008.
[73] Y. Natis, E. Knipp, R. Valdes, D. Cearley, D. Sholler, Who is who in applicationplatforms for cloud computing: the cloud specialists, 2009.
[74] R. Olups, Zabbix 1.8 Network Monitoring, Packt Publishing, 2010.
[75] S. Padhy, D. Kreutz, A. Casimiro, M. Pasin, Trustworthy and resilient monitoring system for cloud infrastructures, in: Proceedings of the Workshop on Posters and Demos Track, in: PDT'11, ACM, 2011, pp. 3:1–3:2.
[76] M. Palacios, J. Garcia-Fanjul, J. Tuya, G. Spanoudakis, Identifying test requirements by analyzing SLA guarantee terms, in: Web Services (ICWS), 2012 IEEE 19th International Conference on, IEEE, 2012, pp. 351–358.
[77] C. Pape, R. Trommer, Monitoring VMware-based virtual infrastructures with OpenNMS, 2012.
[78] K. Park, J. Han, J. Chung, Themis: a mutually verifiable billing system for the cloud computing environment, IEEE Trans. Serv. Comput. http://dx.doi.org/10.1109/TSC.2012.1.
[79] M. Paterson, Evaluation of Nagios for real-time cloud virtual machine monitoring, 2009.
[80] Z.N. Peterson, M. Gondree, R. Beverly, A position paper on data sovereignty: the importance of geolocating data in the cloud, in: Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, 2011.
[81] T. Pueschel, D. Neumann, Management of cloud infrastructures: Policy-basedrevenue optimization, in: Thirtieth International Conference on InformationSystems (ICIS 2009), 2009, pp. 1–16.
[82] M. Rak, S. Venticinque, T. Máhr, G. Echevarria, G. Esnal, Cloud application monitoring: the mOSAIC approach, in: Cloud Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference on, IEEE, 2011, pp. 758–763.
[83] J.S. Rellermeyer, G. Alonso, T. Roscoe, Building, deploying, and monitoring distributed applications with Eclipse and R-OSGi, in: Proceedings of the 2007 OOPSLA Workshop on Eclipse Technology eXchange, in: eclipse '07, ACM, New York, NY, USA, 2007, pp. 50–54. http://dx.doi.org/10.1145/1328279.1328290.
[84] B. Rimal, E. Choi, I. Lumb, A taxonomy and survey of cloud computing systems,in: INC, IMS and IDC, 2009. NCM’09. Fifth International Joint Conference on,IEEE, 2009, pp. 44–51.
[85] F. Sabahi, Cloud computing security threats and responses, in: CommunicationSoftware and Networks (ICCSN), 2011 IEEE 3rd International Conference on,2011, pp. 245–249.
[86] D. Sanderson, Programming Google App Engine, O'Reilly Media, 2009.
[87] V. Sekar, P. Maniatis, Verifiable resource accounting for cloud computing services, in: Proceedings of the 3rd ACM Workshop on Cloud Computing Security Workshop, ACM, 2011, pp. 21–26.
[88] A. Sekiguchi, K. Shimada, Y. Wada, A. Ooba, R. Yoshimi, A. Matsumoto,Configuration management technology using tree structures of ICT systems,in: Proceedings of the 15th Communications and Networking SimulationSymposium, Society for Computer Simulation International, 2012, p. 4.
[89] T. Somasundaram, K. Govindarajan, Cloud monitoring and discovery service (CMDS) for IaaS resources, in: Advanced Computing (ICoAC), 2011 Third International Conference on, 2011, pp. 340–345. http://dx.doi.org/10.1109/ICoAC.2011.6165199.
[90] S. Subashini, V. Kavitha, A survey on security issues in service delivery modelsof cloud computing, J. Netw. Comput. Appl. 34 (1) (2011) 1–11.
[91] D. Tovarnak, T. Pitner, Towards multi-tenant and interoperable monitoring ofvirtual machines in cloud, in: Symbolic and Numeric Algorithms for ScientificComputing (SYNASC), 2012 14th International Symposium on, 2012, pp.436–442.
[92] L.M. Vaquero, L. Rodero-Merino, D. Morán, Locking the sky: a survey on IaaScloud security, Computing 91 (1) (2011) 93–118.
[93] S. Zanikolas, R. Sakellariou, A taxonomy of grid monitoring systems, FutureGener. Comput. Syst. 21 (1) (2005) 163–188.
[94] Q. Zhang, L. Cherkasova, E. Smirni, A regression-based analytic model fordynamic resource provisioning of multi-tier applications, in: AutonomicComputing, 2007. ICAC ’07. Fourth International Conference on, 2007,http://dx.doi.org/10.1109/ICAC.2007.1.
[95] X. Zhang, J.L. Freschl, J.M. Schopf, A performance study of monitoring and information services for distributed systems, in: High Performance Distributed Computing, 2003. Proceedings. 12th IEEE International Symposium on, IEEE, 2003, pp. 270–281.
[96] W. Zhou, C. Chen, S. Li, Monitoring system for the campus card based on Cacti and Nagios, Shiyan Jishu yu Guanli 28 (4) (2011) 246–249.
Kaniz Fatema is a postdoctoral researcher at University College Cork focusing on cloud monitoring techniques, gathering and combining metrics from various layers of cloud and other topics related to cloud computing. Her Ph.D. research was conducted under an EC FP7 project (Trusted Architecture for Securely Shared Services) and focused on designing an authorisation system that provides privacy of personal data through access control policies and conflict resolution policies from multiple authorities dynamically, obtaining access control rules from the legislation (EU Data Protection Directive). She has multidisciplinary research experience (in Information Security and Privacy, Privacy Law, Cloud Computing, Speech Processing, Machine Learning, and Digital Signal Processing). Kaniz completed her Ph.D. studies at the University of Kent, UK. She obtained her M.Sc. (Eng) in Data Communications from the University of Sheffield, UK with distinction. She did her M.Sc. and B.Sc. (honours) in Computer Science and Engineering at the University of Dhaka, Bangladesh. She worked as a Lecturer at Stamford University Bangladesh for 2 years and as an Assistant Lecturer at the University of Kent, UK for 3.5 years.
Vincent C. Emeakaroha is currently a postdoctoral researcher at the Irish Centre for Cloud Computing and Commerce with affiliation to University College Cork, Ireland. Previously, he was a research assistant at the Distributed Systems Group, Information Systems Institute, Vienna University of Technology (TU Wien). He received his bachelor degree in Computer Engineering in 2006 and gained double masters in Software Engineering & Internet Computing in 2008, and in Informatics Management in 2009, all from Vienna University of Technology. In 2012, he received his Ph.D. in Computer Science with excellence from the same university. He has worked in different research projects over the years, including the Foundations of Self-governing ICT Infrastructures (FoSII) project funded by the Vienna Science and Technology Fund (WWTF) and the Holistic Energy Efficient Approach for the Management of Hybrid Clouds (HALEY) project funded by the Vienna University of Technology Science Award. He participated in the European Commission's COST Action on Energy Efficient Large Scale Distributed Systems. In October 2010, he was a visiting researcher at the Technical University of Catalonia, Barcelona, Spain. His research areas of interest include resource management in large scale distributed systems, Cloud monitoring, service provisioning management, autonomic computing, energy efficiency in Clouds, and SLA and QoS management.
Philip Healy has over a decade's experience in parallel and distributed software development, in both academic and industrial settings. He completed his Ph.D. studies in 2006 in the area of cluster computing with FPGAs. His recent academic work includes the research and development of a remote monitoring system for physiological signals acquired in a critical care setting. His recent industry experience includes the design and development of large-scale distributed data processing applications using cutting-edge technologies such as Hadoop and HBase. He is currently a Research Fellow at the Irish Centre for Cloud Computing & Commerce.
John Morrison is the founder and director of the Centre for Unified Computing. He is a co-founder and co-director of the Boole Centre for Research in Informatics and a co-founder and co-director of Grid-Ireland. Prof. Morrison is a Science Foundation Ireland Investigator award holder and has published widely in the field of Parallel, Distributed and Grid Computing. He has been a guest editor of many journals, including the Journal of Supercomputing and the Journal of Scientific Computing. He is on the Editorial Board of Multi-Agent and Grid Systems: An International Journal, published by IOS Press, and the International Journal of Computational Intelligence: Theory and Practice (IJCITP). He is a member of the ACM and a senior member of the IEEE. Prof. Morrison is a member of the I2Lab Advisory Board at the University of Central Florida. He has served on dozens of international conference programme committees and is a co-founder of the International Symposium on Parallel and Distributed Computing.
Theo Lynn is a Senior Lecturer at DCU Business School and is Principal Investigator of the Irish Centre for Cloud Computing and Commerce. Theo leads the Techspectations initiative, part of DCU Business School's MarketingLab, and teaches strategy and digital marketing at DCU Business School. He is a partner in The 30/60 Partnership, a seed capital investment fund for early stage technology startups, and advises a number of domestic and international companies.