![Page 1: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/1.jpg)
Predicting Service Metrics using Real-time Analytics
Rolf Stadler School of Electrical Engineering
KTH Royal Institute of Technology, Sweden
April 25, 2016
AnNet 2016, Istanbul, Turkey
![Page 2: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/2.jpg)
The Success of Analytics in System Engineering DARPA Grand Challenge (2004-2007) • competition for autonomous vehicles
IBM’s Watson (2011) • natural language processing and machine learning on
unstructured data
Google Brain (2012) • recognize higher-level concepts from unlabeled images
Google’s Alpha Go (2016) • plays Go on level of best professionals
3
![Page 3: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/3.jpg)
Cloud for Analytics vs. Analytics for Cloud Cloud technologies in support of (big) data analytics
- enable virtualization, scaling, pay-as-you-go, multi-tenancy
- new computing paradigms, online algorithms, stream processing
- platforms Analytics in support of engineering, operations of cloud technologies and services à this talk
4
![Page 4: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/4.jpg)
The Role of Analytics in Systems Engineering and Operations • What are the benefits and the costs of applying analytics methods?
• For which cases outperform analytics method traditional methods or provide new capabilities?
• How can we integrate analytics into an overall engineering methodology?
• Experience shows that both data science and domain knowledge needed.
à Need to train engineers in data science. 5
![Page 5: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/5.jpg)
Why Analytics for Clouds?
Enablers • large amounts of counters, statistics, event streams • technology has progressed to enable real-time
storage, processing at source • availability of platform technology
Need • complexity makes traditional methods infeasible
statistical learning creates a system model through observation without detailed knowledge of system architecture and its functional components
6
![Page 6: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/6.jpg)
Example:Real-,meAnaly,csforNetworkManagementCollabora,onbetweenEricssonResearch,KTH,SICS
TelecomClouds Analy0csIn-NetworkAnaly0cs
Real-0meManagementServiceAssuranceAnomalyDetec0on
REALMReal-0meAnaly0cs
forCloudNetworkManagement
![Page 7: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/7.jpg)
RTAN
M:X11…X44➝Ŷ1…Ŷn:In-NetworkComputa,on
S1…Sn
S1
S2
X41…X44X12…X24 X31X11
Y1
Y2
S1…Sn
INANINAN INANINAN
Real-,mePredic,onofServiceMetrics
![Page 8: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/8.jpg)
CPUload,memoryload,#networkac,vesockets,#contextswitching,#processes,etc..
Video-on-demand(VoD)• videostreaming(VLC)• videoframerate,audiobuffer
rate,networkreadrateKV-store• response,me
TheProblem
X:devicesta,s,csY:service-levelmetrics
VoD(S1):Cluster
Network
VoD(S1):Client
KV(S2):ClusterKV(S2):Client
FindM:X➝ŶthatpredictsYinreal-,me.
![Page 9: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/9.jpg)
Goal:• Predic,ngServiceMetricsfromDeviceSta,s,csinreal-,meApproach:• Sta,s,callearning,onlinemethods,
distributedlearningoncomputeserversandnetworknodes;Experimenta,onontestbed
Benefitsofapproach:• service-agnos,cmethodology,scalablity,…Challenges:• Largefeatureset(>1kfeatures)• Conceptdri`throughchangingloadpaaernsand
resourcemanagementfunc,ons
Real-,meAnaly,csforManagement
![Page 10: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/10.jpg)
DeviceSta,s,csX
• Linuxkernelsta,s,csXproc– Featuresextractedfrom/procdirectory– CPUcorejiffies,currentmemoryusage,virtualmemorysta,s,cs,
#processes,#blockedprocesses,…– Some4000metrics
• SystemAc,vityReport(SAR)Xsar– SARcomputesmetricsfrom/procover,meinterval– CPUcoreu,liza,on,memoryandswapspaceu,liza,on,diskI/Osta,s,cs,…– Some840metrics
• XproccontainsmanyOScounters,whileXsardoesnot• Formodelpredic,ons,focusonnumericalfeatures• Sensorsreadsta,s,cs1-2,mespersec.
11
![Page 11: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/11.jpg)
ServiceMetricsYVideo-on-Demand
• VideostreamingservicebasedonVLCmediaplayer.• WeinstrumentedtheVLCso`waretocaptureunderlyingeventstocomputethemetrics.
• Metrics:videoframerate,audiobufferrate,RTPpacketrate,…
KV-storagesystem
• Voldemortp2psystem• Metrics:response,me
Metricscaptured1-2,mespersec. 12
![Page 12: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/12.jpg)
VLCClientVLC
Sensor(Y1..Y3)
X4Sensor
Servermachine
Clientmachine
Loadgenerator
VLCClientVoDLoadgenerator
Servermachine
X3Sensor
HTTPLoadbalancer
Videostreaming(HTTP)
Transcodinginstance
X6Sensor
Servermachine
Transcodinginstance
VideoStreaming(HTTP)
X5Sensor
Servermachine
Transcodinginstance
NetworkFileSystem
X1Sensor
Servermachine
Filesystemaccess
NetworkFileSystem
X2Sensor
Servermachine
WebServer
WebServer
WebServer
Server-side Client-side13
M:X1…Xm➝Ŷ1…ŶkIn-NetworkComputa,onXi:OSkernelstatsYj:videoframerate,audiobufferrate,…
![Page 13: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/13.jpg)
Clientmachine
Loadgenerator
Server-side Client-side14
Key-valueSensor(Y)
Key-valueClient
Key-valueClient
Key-valueLoad
generator
Servermachine
Key-valueStore
X1Sensor
X2Sensor
Servermachine
Key-valueStore
X3Sensor
Servermachine
Key-valueStore
M:X1…Xm➝ŶIn-NetworkComputa,onXi:OSkernelstatsY:response,me
![Page 14: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/14.jpg)
Evalua,on
BatchLearningonTraces
X-Ytrace
Trainset Testset
70% 30%
Modeltraining
LinearregressionLassoregressionRegressiontreeRandomforest
….
LearningmodelLearningmodelLearningmodel
ei = | yi − yi |
NMAE = 1ym
eii=1
m∑
Predic,onaccuracy
yi =M (xi )
yiMeasure
TrainingTime
15
NormalizedMeanAbsoluteError
![Page 15: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/15.jpg)
Predic,onMethods
16
Increased difficulty and realism
OnlinelearningBatchlearning Real-,melearning
Usingtraces Usinglivesta,s,cs
![Page 16: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/16.jpg)
FeatureSetReduc,on
• Exhaus,vesearchisinfeasible– RequiresO(2p)trainingexecu,ons
• Op,on:forwardstepwisefeatureselec,on– Heuris,cmethodO(p2)trainingexecu,ons– Incrementallygrowsthefeaturesets
• Reducesfeaturesetfrom5000to12features
17
![Page 17: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/17.jpg)
EffectofFeatureSetReduc,on
18
Loadpaaern
Featureset
Video Audio
NMAE(%) Train(sec) NMAE(%) Train(sec)
PeriodicFull 12 >59000 32 >70000
“Minimal” 6 862 22 1600
FlashFull 8 >55000 21 >75000
“Minimal” 4 778 15 1750
“Minimal”featureset-improvespredic,onaccuracyoverfullset,featuresetselectedbyexperts-reducestraining,me
NMAE =
![Page 18: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/18.jpg)
Real-,meModelComputa,onandEvalua,on
![Page 19: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/19.jpg)
VisualizingOutputfromAnaly,csEngine
![Page 20: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/20.jpg)
Evalua,onforReal-,meLearning
21
LoadpaaernVideo-on-demand(VoD) KV-store
Videoframerate Audiobufferrate Response,me
Periodic 3.6% 14% 7%
Flashcrowd 5.6% 11% 6%
Periodic+
Flashcrowd8% 29% 11%
Withvirtualizedinfrastructure:sameresultsEnd-to-endservicealongnetworkpath:+10%
![Page 21: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/21.jpg)
ResultstoDate
• Predic,ngservicemetricsforcluster-basedservicesisfeasible:– videostreaming,key-valuestore(NMAEbelow14%forvideo,audioframerates,etc.)
• Featuresetreduc,ononXsarreducesmodelcomputa,on,meandimprovesaccuracy.
• Real-,meanaly,csengine– allowstoobserveeffectofsystemperturba,ononservicequality;– servesasbuildingblockforservicequalityassurancesystem,
anomalydetec,onsystem.
![Page 22: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/22.jpg)
Real-time Analytics Functions for Clouds
23
![Page 23: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/23.jpg)
Conclusions (1)
The availability of operational and historical data and recent technology advancement make real-time analytics for clouds possible. Analytics methods create models from observations, without knowing detailed architectural and functional model of a system. Training of engineers is key. 24
![Page 24: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/24.jpg)
Conclusions (2)
We demonstrated the feasibility of estimating service metrics in real-time. Promising application of real-time analytics for engineering and operation of cloud services: • Estimation of KPIs and control parameters; • Quality assurance and anomaly detection/root-cause
analysis. Major challenges remain: • scalability, in-network computation • integrating analytics into an overall engineering
methodology. 25
![Page 25: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time](https://reader034.vdocuments.us/reader034/viewer/2022042106/5e84ff5bf3fe0b0be3447645/html5/thumbnails/25.jpg)
Publica,ons
1.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Aservice-agnos,cmethodforpredic,ngservicemetricsinreal-,me,”submiaedforpublica,on.2.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Predic,ngservicemetricsforcluster-basedservicesusingreal-,meanaly,cs,”Interna,onalConferenceonNetworkandServiceManagement(CNSM2015),Barcelona,Spain,November2015.3.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Predic,ngReal-,meService-levelMetricsfromDeviceSta,s,cs,”Interna,onalSymposiumonIntegratedNetworkManagement(IM2015),Oaawa,CA,May2015.4.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Aplatormforpredic,ngreal-,meservice-levelmetricsfromdevicesta,s,cs,”Interna,onalSymposiumonIntegratedNetworkManagement(IM2015),Demonstra,onProgram,Oaawa,CA,May2015.5.Tracespublishedathap://mldata.org