predicting service metrics using real-time analytics annet 2016.pdf · • large amounts of...

25
Predicting Service Metrics using Real-time Analytics Rolf Stadler School of Electrical Engineering KTH Royal Institute of Technology, Sweden April 25, 2016 AnNet 2016, Istanbul, Turkey

Upload: others

Post on 24-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Predicting Service Metrics using Real-time Analytics

Rolf Stadler School of Electrical Engineering

KTH Royal Institute of Technology, Sweden

April 25, 2016

AnNet 2016, Istanbul, Turkey

Page 2: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

The Success of Analytics in System Engineering DARPA Grand Challenge (2004-2007) • competition for autonomous vehicles

IBM’s Watson (2011) • natural language processing and machine learning on

unstructured data

Google Brain (2012) • recognize higher-level concepts from unlabeled images

Google’s Alpha Go (2016) • plays Go on level of best professionals

3

Page 3: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Cloud for Analytics vs. Analytics for Cloud Cloud technologies in support of (big) data analytics

- enable virtualization, scaling, pay-as-you-go, multi-tenancy

- new computing paradigms, online algorithms, stream processing

- platforms Analytics in support of engineering, operations of cloud technologies and services à this talk

4

Page 4: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

The Role of Analytics in Systems Engineering and Operations • What are the benefits and the costs of applying analytics methods?

• For which cases outperform analytics method traditional methods or provide new capabilities?

• How can we integrate analytics into an overall engineering methodology?

• Experience shows that both data science and domain knowledge needed.

à Need to train engineers in data science. 5

Page 5: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Why Analytics for Clouds?

Enablers • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

storage, processing at source • availability of platform technology

Need • complexity makes traditional methods infeasible

statistical learning creates a system model through observation without detailed knowledge of system architecture and its functional components

6

Page 6: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Example:Real-,meAnaly,csforNetworkManagementCollabora,onbetweenEricssonResearch,KTH,SICS

TelecomClouds Analy0csIn-NetworkAnaly0cs

Real-0meManagementServiceAssuranceAnomalyDetec0on

REALMReal-0meAnaly0cs

forCloudNetworkManagement

Page 7: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

RTAN

M:X11…X44➝Ŷ1…Ŷn:In-NetworkComputa,on

S1…Sn

S1

S2

X41…X44X12…X24 X31X11

Y1

Y2

S1…Sn

INANINAN INANINAN

Real-,mePredic,onofServiceMetrics

Page 8: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

CPUload,memoryload,#networkac,vesockets,#contextswitching,#processes,etc..

Video-on-demand(VoD)•  videostreaming(VLC)•  videoframerate,audiobuffer

rate,networkreadrateKV-store•  response,me

TheProblem

X:devicesta,s,csY:service-levelmetrics

VoD(S1):Cluster

Network

VoD(S1):Client

KV(S2):ClusterKV(S2):Client

FindM:X➝ŶthatpredictsYinreal-,me.

Page 9: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Goal:•  Predic,ngServiceMetricsfromDeviceSta,s,csinreal-,meApproach:•  Sta,s,callearning,onlinemethods,

distributedlearningoncomputeserversandnetworknodes;Experimenta,onontestbed

Benefitsofapproach:•  service-agnos,cmethodology,scalablity,…Challenges:•  Largefeatureset(>1kfeatures)•  Conceptdri`throughchangingloadpaaernsand

resourcemanagementfunc,ons

Real-,meAnaly,csforManagement

Page 10: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

DeviceSta,s,csX

•  Linuxkernelsta,s,csXproc–  Featuresextractedfrom/procdirectory–  CPUcorejiffies,currentmemoryusage,virtualmemorysta,s,cs,

#processes,#blockedprocesses,…–  Some4000metrics

•  SystemAc,vityReport(SAR)Xsar–  SARcomputesmetricsfrom/procover,meinterval–  CPUcoreu,liza,on,memoryandswapspaceu,liza,on,diskI/Osta,s,cs,…–  Some840metrics

•  XproccontainsmanyOScounters,whileXsardoesnot•  Formodelpredic,ons,focusonnumericalfeatures•  Sensorsreadsta,s,cs1-2,mespersec.

11

Page 11: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

ServiceMetricsYVideo-on-Demand

•  VideostreamingservicebasedonVLCmediaplayer.•  WeinstrumentedtheVLCso`waretocaptureunderlyingeventstocomputethemetrics.

•  Metrics:videoframerate,audiobufferrate,RTPpacketrate,…

KV-storagesystem

•  Voldemortp2psystem•  Metrics:response,me

Metricscaptured1-2,mespersec. 12

Page 12: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

VLCClientVLC

Sensor(Y1..Y3)

X4Sensor

Servermachine

Clientmachine

Loadgenerator

VLCClientVoDLoadgenerator

Servermachine

X3Sensor

HTTPLoadbalancer

Videostreaming(HTTP)

Transcodinginstance

X6Sensor

Servermachine

Transcodinginstance

VideoStreaming(HTTP)

X5Sensor

Servermachine

Transcodinginstance

NetworkFileSystem

X1Sensor

Servermachine

Filesystemaccess

NetworkFileSystem

X2Sensor

Servermachine

WebServer

WebServer

WebServer

Server-side Client-side13

M:X1…Xm➝Ŷ1…ŶkIn-NetworkComputa,onXi:OSkernelstatsYj:videoframerate,audiobufferrate,…

Page 13: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Clientmachine

Loadgenerator

Server-side Client-side14

Key-valueSensor(Y)

Key-valueClient

Key-valueClient

Key-valueLoad

generator

Servermachine

Key-valueStore

X1Sensor

X2Sensor

Servermachine

Key-valueStore

X3Sensor

Servermachine

Key-valueStore

M:X1…Xm➝ŶIn-NetworkComputa,onXi:OSkernelstatsY:response,me

Page 14: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Evalua,on

BatchLearningonTraces

X-Ytrace

Trainset Testset

70% 30%

Modeltraining

LinearregressionLassoregressionRegressiontreeRandomforest

….

LearningmodelLearningmodelLearningmodel

ei = | yi − yi |

NMAE = 1ym

eii=1

m∑

Predic,onaccuracy

yi =M (xi )

yiMeasure

TrainingTime

15

NormalizedMeanAbsoluteError

Page 15: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Predic,onMethods

16

Increased difficulty and realism

OnlinelearningBatchlearning Real-,melearning

Usingtraces Usinglivesta,s,cs

Page 16: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

FeatureSetReduc,on

•  Exhaus,vesearchisinfeasible– RequiresO(2p)trainingexecu,ons

•  Op,on:forwardstepwisefeatureselec,on– Heuris,cmethodO(p2)trainingexecu,ons– Incrementallygrowsthefeaturesets

•  Reducesfeaturesetfrom5000to12features

17

Page 17: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

EffectofFeatureSetReduc,on

18

Loadpaaern

Featureset

Video Audio

NMAE(%) Train(sec) NMAE(%) Train(sec)

PeriodicFull 12 >59000 32 >70000

“Minimal” 6 862 22 1600

FlashFull 8 >55000 21 >75000

“Minimal” 4 778 15 1750

“Minimal”featureset-improvespredic,onaccuracyoverfullset,featuresetselectedbyexperts-reducestraining,me

NMAE =

Page 18: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Real-,meModelComputa,onandEvalua,on

Page 19: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

VisualizingOutputfromAnaly,csEngine

Page 20: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Evalua,onforReal-,meLearning

21

LoadpaaernVideo-on-demand(VoD) KV-store

Videoframerate Audiobufferrate Response,me

Periodic 3.6% 14% 7%

Flashcrowd 5.6% 11% 6%

Periodic+

Flashcrowd8% 29% 11%

Withvirtualizedinfrastructure:sameresultsEnd-to-endservicealongnetworkpath:+10%

Page 21: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

ResultstoDate

•  Predic,ngservicemetricsforcluster-basedservicesisfeasible:–  videostreaming,key-valuestore(NMAEbelow14%forvideo,audioframerates,etc.)

•  Featuresetreduc,ononXsarreducesmodelcomputa,on,meandimprovesaccuracy.

•  Real-,meanaly,csengine–  allowstoobserveeffectofsystemperturba,ononservicequality;–  servesasbuildingblockforservicequalityassurancesystem,

anomalydetec,onsystem.

Page 22: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Real-time Analytics Functions for Clouds

23

Page 23: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Conclusions (1)

The availability of operational and historical data and recent technology advancement make real-time analytics for clouds possible. Analytics methods create models from observations, without knowing detailed architectural and functional model of a system. Training of engineers is key. 24

Page 24: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Conclusions (2)

We demonstrated the feasibility of estimating service metrics in real-time. Promising application of real-time analytics for engineering and operation of cloud services: • Estimation of KPIs and control parameters; • Quality assurance and anomaly detection/root-cause

analysis. Major challenges remain: • scalability, in-network computation •  integrating analytics into an overall engineering

methodology. 25

Page 25: Predicting Service Metrics using Real-time Analytics AnNet 2016.pdf · • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

Publica,ons

1.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Aservice-agnos,cmethodforpredic,ngservicemetricsinreal-,me,”submiaedforpublica,on.2.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Predic,ngservicemetricsforcluster-basedservicesusingreal-,meanaly,cs,”Interna,onalConferenceonNetworkandServiceManagement(CNSM2015),Barcelona,Spain,November2015.3.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Predic,ngReal-,meService-levelMetricsfromDeviceSta,s,cs,”Interna,onalSymposiumonIntegratedNetworkManagement(IM2015),Oaawa,CA,May2015.4.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Aplatormforpredic,ngreal-,meservice-levelmetricsfromdevicesta,s,cs,”Interna,onalSymposiumonIntegratedNetworkManagement(IM2015),Demonstra,onProgram,Oaawa,CA,May2015.5.Tracespublishedathap://mldata.org