predicting service metrics using real-time analytics annet 2016.pdf · • large amounts of...
TRANSCRIPT
Predicting Service Metrics using Real-time Analytics
Rolf Stadler School of Electrical Engineering
KTH Royal Institute of Technology, Sweden
April 25, 2016
AnNet 2016, Istanbul, Turkey
The Success of Analytics in System Engineering DARPA Grand Challenge (2004-2007) • competition for autonomous vehicles
IBM’s Watson (2011) • natural language processing and machine learning on
unstructured data
Google Brain (2012) • recognize higher-level concepts from unlabeled images
Google’s Alpha Go (2016) • plays Go on level of best professionals
3
Cloud for Analytics vs. Analytics for Cloud Cloud technologies in support of (big) data analytics
- enable virtualization, scaling, pay-as-you-go, multi-tenancy
- new computing paradigms, online algorithms, stream processing
- platforms Analytics in support of engineering, operations of cloud technologies and services à this talk
4
The Role of Analytics in Systems Engineering and Operations • What are the benefits and the costs of applying analytics methods?
• For which cases outperform analytics method traditional methods or provide new capabilities?
• How can we integrate analytics into an overall engineering methodology?
• Experience shows that both data science and domain knowledge needed.
à Need to train engineers in data science. 5
Why Analytics for Clouds?
Enablers • large amounts of counters, statistics, event streams • technology has progressed to enable real-time
storage, processing at source • availability of platform technology
Need • complexity makes traditional methods infeasible
statistical learning creates a system model through observation without detailed knowledge of system architecture and its functional components
6
Example:Real-,meAnaly,csforNetworkManagementCollabora,onbetweenEricssonResearch,KTH,SICS
TelecomClouds Analy0csIn-NetworkAnaly0cs
Real-0meManagementServiceAssuranceAnomalyDetec0on
REALMReal-0meAnaly0cs
forCloudNetworkManagement
RTAN
M:X11…X44➝Ŷ1…Ŷn:In-NetworkComputa,on
S1…Sn
S1
S2
X41…X44X12…X24 X31X11
Y1
Y2
S1…Sn
INANINAN INANINAN
Real-,mePredic,onofServiceMetrics
CPUload,memoryload,#networkac,vesockets,#contextswitching,#processes,etc..
Video-on-demand(VoD)• videostreaming(VLC)• videoframerate,audiobuffer
rate,networkreadrateKV-store• response,me
TheProblem
X:devicesta,s,csY:service-levelmetrics
VoD(S1):Cluster
Network
VoD(S1):Client
KV(S2):ClusterKV(S2):Client
FindM:X➝ŶthatpredictsYinreal-,me.
Goal:• Predic,ngServiceMetricsfromDeviceSta,s,csinreal-,meApproach:• Sta,s,callearning,onlinemethods,
distributedlearningoncomputeserversandnetworknodes;Experimenta,onontestbed
Benefitsofapproach:• service-agnos,cmethodology,scalablity,…Challenges:• Largefeatureset(>1kfeatures)• Conceptdri`throughchangingloadpaaernsand
resourcemanagementfunc,ons
Real-,meAnaly,csforManagement
DeviceSta,s,csX
• Linuxkernelsta,s,csXproc– Featuresextractedfrom/procdirectory– CPUcorejiffies,currentmemoryusage,virtualmemorysta,s,cs,
#processes,#blockedprocesses,…– Some4000metrics
• SystemAc,vityReport(SAR)Xsar– SARcomputesmetricsfrom/procover,meinterval– CPUcoreu,liza,on,memoryandswapspaceu,liza,on,diskI/Osta,s,cs,…– Some840metrics
• XproccontainsmanyOScounters,whileXsardoesnot• Formodelpredic,ons,focusonnumericalfeatures• Sensorsreadsta,s,cs1-2,mespersec.
11
ServiceMetricsYVideo-on-Demand
• VideostreamingservicebasedonVLCmediaplayer.• WeinstrumentedtheVLCso`waretocaptureunderlyingeventstocomputethemetrics.
• Metrics:videoframerate,audiobufferrate,RTPpacketrate,…
KV-storagesystem
• Voldemortp2psystem• Metrics:response,me
Metricscaptured1-2,mespersec. 12
VLCClientVLC
Sensor(Y1..Y3)
X4Sensor
Servermachine
Clientmachine
Loadgenerator
VLCClientVoDLoadgenerator
Servermachine
X3Sensor
HTTPLoadbalancer
Videostreaming(HTTP)
Transcodinginstance
X6Sensor
Servermachine
Transcodinginstance
VideoStreaming(HTTP)
X5Sensor
Servermachine
Transcodinginstance
NetworkFileSystem
X1Sensor
Servermachine
Filesystemaccess
NetworkFileSystem
X2Sensor
Servermachine
WebServer
WebServer
WebServer
Server-side Client-side13
M:X1…Xm➝Ŷ1…ŶkIn-NetworkComputa,onXi:OSkernelstatsYj:videoframerate,audiobufferrate,…
Clientmachine
Loadgenerator
Server-side Client-side14
Key-valueSensor(Y)
Key-valueClient
Key-valueClient
Key-valueLoad
generator
Servermachine
Key-valueStore
X1Sensor
X2Sensor
Servermachine
Key-valueStore
X3Sensor
Servermachine
Key-valueStore
M:X1…Xm➝ŶIn-NetworkComputa,onXi:OSkernelstatsY:response,me
Evalua,on
BatchLearningonTraces
X-Ytrace
Trainset Testset
70% 30%
Modeltraining
LinearregressionLassoregressionRegressiontreeRandomforest
….
LearningmodelLearningmodelLearningmodel
ei = | yi − yi |
NMAE = 1ym
eii=1
m∑
Predic,onaccuracy
yi =M (xi )
yiMeasure
TrainingTime
15
NormalizedMeanAbsoluteError
Predic,onMethods
16
Increased difficulty and realism
OnlinelearningBatchlearning Real-,melearning
Usingtraces Usinglivesta,s,cs
FeatureSetReduc,on
• Exhaus,vesearchisinfeasible– RequiresO(2p)trainingexecu,ons
• Op,on:forwardstepwisefeatureselec,on– Heuris,cmethodO(p2)trainingexecu,ons– Incrementallygrowsthefeaturesets
• Reducesfeaturesetfrom5000to12features
17
EffectofFeatureSetReduc,on
18
Loadpaaern
Featureset
Video Audio
NMAE(%) Train(sec) NMAE(%) Train(sec)
PeriodicFull 12 >59000 32 >70000
“Minimal” 6 862 22 1600
FlashFull 8 >55000 21 >75000
“Minimal” 4 778 15 1750
“Minimal”featureset-improvespredic,onaccuracyoverfullset,featuresetselectedbyexperts-reducestraining,me
NMAE =
Real-,meModelComputa,onandEvalua,on
VisualizingOutputfromAnaly,csEngine
Evalua,onforReal-,meLearning
21
LoadpaaernVideo-on-demand(VoD) KV-store
Videoframerate Audiobufferrate Response,me
Periodic 3.6% 14% 7%
Flashcrowd 5.6% 11% 6%
Periodic+
Flashcrowd8% 29% 11%
Withvirtualizedinfrastructure:sameresultsEnd-to-endservicealongnetworkpath:+10%
ResultstoDate
• Predic,ngservicemetricsforcluster-basedservicesisfeasible:– videostreaming,key-valuestore(NMAEbelow14%forvideo,audioframerates,etc.)
• Featuresetreduc,ononXsarreducesmodelcomputa,on,meandimprovesaccuracy.
• Real-,meanaly,csengine– allowstoobserveeffectofsystemperturba,ononservicequality;– servesasbuildingblockforservicequalityassurancesystem,
anomalydetec,onsystem.
Real-time Analytics Functions for Clouds
23
Conclusions (1)
The availability of operational and historical data and recent technology advancement make real-time analytics for clouds possible. Analytics methods create models from observations, without knowing detailed architectural and functional model of a system. Training of engineers is key. 24
Conclusions (2)
We demonstrated the feasibility of estimating service metrics in real-time. Promising application of real-time analytics for engineering and operation of cloud services: • Estimation of KPIs and control parameters; • Quality assurance and anomaly detection/root-cause
analysis. Major challenges remain: • scalability, in-network computation • integrating analytics into an overall engineering
methodology. 25
Publica,ons
1.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Aservice-agnos,cmethodforpredic,ngservicemetricsinreal-,me,”submiaedforpublica,on.2.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Predic,ngservicemetricsforcluster-basedservicesusingreal-,meanaly,cs,”Interna,onalConferenceonNetworkandServiceManagement(CNSM2015),Barcelona,Spain,November2015.3.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Predic,ngReal-,meService-levelMetricsfromDeviceSta,s,cs,”Interna,onalSymposiumonIntegratedNetworkManagement(IM2015),Oaawa,CA,May2015.4.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Aplatormforpredic,ngreal-,meservice-levelmetricsfromdevicesta,s,cs,”Interna,onalSymposiumonIntegratedNetworkManagement(IM2015),Demonstra,onProgram,Oaawa,CA,May2015.5.Tracespublishedathap://mldata.org