predicting service metrics using real-time analytics annet 2016.pdf · • large amounts of...

Post on 24-Mar-2020

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Predicting Service Metrics using Real-time Analytics

Rolf Stadler School of Electrical Engineering

KTH Royal Institute of Technology, Sweden

April 25, 2016

AnNet 2016, Istanbul, Turkey

The Success of Analytics in System Engineering DARPA Grand Challenge (2004-2007) • competition for autonomous vehicles

IBM’s Watson (2011) • natural language processing and machine learning on

unstructured data

Google Brain (2012) • recognize higher-level concepts from unlabeled images

Google’s Alpha Go (2016) • plays Go on level of best professionals

3

Cloud for Analytics vs. Analytics for Cloud Cloud technologies in support of (big) data analytics

- enable virtualization, scaling, pay-as-you-go, multi-tenancy

- new computing paradigms, online algorithms, stream processing

- platforms Analytics in support of engineering, operations of cloud technologies and services à this talk

4

The Role of Analytics in Systems Engineering and Operations • What are the benefits and the costs of applying analytics methods?

• For which cases outperform analytics method traditional methods or provide new capabilities?

• How can we integrate analytics into an overall engineering methodology?

• Experience shows that both data science and domain knowledge needed.

à Need to train engineers in data science. 5

Why Analytics for Clouds?

Enablers • large amounts of counters, statistics, event streams • technology has progressed to enable real-time

storage, processing at source • availability of platform technology

Need • complexity makes traditional methods infeasible

statistical learning creates a system model through observation without detailed knowledge of system architecture and its functional components

6

Example:Real-,meAnaly,csforNetworkManagementCollabora,onbetweenEricssonResearch,KTH,SICS

TelecomClouds Analy0csIn-NetworkAnaly0cs

Real-0meManagementServiceAssuranceAnomalyDetec0on

REALMReal-0meAnaly0cs

forCloudNetworkManagement

RTAN

M:X11…X44➝Ŷ1…Ŷn:In-NetworkComputa,on

S1…Sn

S1

S2

X41…X44X12…X24 X31X11

Y1

Y2

S1…Sn

INANINAN INANINAN

Real-,mePredic,onofServiceMetrics

CPUload,memoryload,#networkac,vesockets,#contextswitching,#processes,etc..

Video-on-demand(VoD)•  videostreaming(VLC)•  videoframerate,audiobuffer

rate,networkreadrateKV-store•  response,me

TheProblem

X:devicesta,s,csY:service-levelmetrics

VoD(S1):Cluster

Network

VoD(S1):Client

KV(S2):ClusterKV(S2):Client

FindM:X➝ŶthatpredictsYinreal-,me.

Goal:•  Predic,ngServiceMetricsfromDeviceSta,s,csinreal-,meApproach:•  Sta,s,callearning,onlinemethods,

distributedlearningoncomputeserversandnetworknodes;Experimenta,onontestbed

Benefitsofapproach:•  service-agnos,cmethodology,scalablity,…Challenges:•  Largefeatureset(>1kfeatures)•  Conceptdri`throughchangingloadpaaernsand

resourcemanagementfunc,ons

Real-,meAnaly,csforManagement

DeviceSta,s,csX

•  Linuxkernelsta,s,csXproc–  Featuresextractedfrom/procdirectory–  CPUcorejiffies,currentmemoryusage,virtualmemorysta,s,cs,

#processes,#blockedprocesses,…–  Some4000metrics

•  SystemAc,vityReport(SAR)Xsar–  SARcomputesmetricsfrom/procover,meinterval–  CPUcoreu,liza,on,memoryandswapspaceu,liza,on,diskI/Osta,s,cs,…–  Some840metrics

•  XproccontainsmanyOScounters,whileXsardoesnot•  Formodelpredic,ons,focusonnumericalfeatures•  Sensorsreadsta,s,cs1-2,mespersec.

11

ServiceMetricsYVideo-on-Demand

•  VideostreamingservicebasedonVLCmediaplayer.•  WeinstrumentedtheVLCso`waretocaptureunderlyingeventstocomputethemetrics.

•  Metrics:videoframerate,audiobufferrate,RTPpacketrate,…

KV-storagesystem

•  Voldemortp2psystem•  Metrics:response,me

Metricscaptured1-2,mespersec. 12

VLCClientVLC

Sensor(Y1..Y3)

X4Sensor

Servermachine

Clientmachine

Loadgenerator

VLCClientVoDLoadgenerator

Servermachine

X3Sensor

HTTPLoadbalancer

Videostreaming(HTTP)

Transcodinginstance

X6Sensor

Servermachine

Transcodinginstance

VideoStreaming(HTTP)

X5Sensor

Servermachine

Transcodinginstance

NetworkFileSystem

X1Sensor

Servermachine

Filesystemaccess

NetworkFileSystem

X2Sensor

Servermachine

WebServer

WebServer

WebServer

Server-side Client-side13

M:X1…Xm➝Ŷ1…ŶkIn-NetworkComputa,onXi:OSkernelstatsYj:videoframerate,audiobufferrate,…

Clientmachine

Loadgenerator

Server-side Client-side14

Key-valueSensor(Y)

Key-valueClient

Key-valueClient

Key-valueLoad

generator

Servermachine

Key-valueStore

X1Sensor

X2Sensor

Servermachine

Key-valueStore

X3Sensor

Servermachine

Key-valueStore

M:X1…Xm➝ŶIn-NetworkComputa,onXi:OSkernelstatsY:response,me

Evalua,on

BatchLearningonTraces

X-Ytrace

Trainset Testset

70% 30%

Modeltraining

LinearregressionLassoregressionRegressiontreeRandomforest

….

LearningmodelLearningmodelLearningmodel

ei = | yi − yi |

NMAE = 1ym

eii=1

m∑

Predic,onaccuracy

yi =M (xi )

yiMeasure

TrainingTime

15

NormalizedMeanAbsoluteError

Predic,onMethods

16

Increased difficulty and realism

OnlinelearningBatchlearning Real-,melearning

Usingtraces Usinglivesta,s,cs

FeatureSetReduc,on

•  Exhaus,vesearchisinfeasible– RequiresO(2p)trainingexecu,ons

•  Op,on:forwardstepwisefeatureselec,on– Heuris,cmethodO(p2)trainingexecu,ons– Incrementallygrowsthefeaturesets

•  Reducesfeaturesetfrom5000to12features

17

EffectofFeatureSetReduc,on

18

Loadpaaern

Featureset

Video Audio

NMAE(%) Train(sec) NMAE(%) Train(sec)

PeriodicFull 12 >59000 32 >70000

“Minimal” 6 862 22 1600

FlashFull 8 >55000 21 >75000

“Minimal” 4 778 15 1750

“Minimal”featureset-improvespredic,onaccuracyoverfullset,featuresetselectedbyexperts-reducestraining,me

NMAE =

Real-,meModelComputa,onandEvalua,on

VisualizingOutputfromAnaly,csEngine

Evalua,onforReal-,meLearning

21

LoadpaaernVideo-on-demand(VoD) KV-store

Videoframerate Audiobufferrate Response,me

Periodic 3.6% 14% 7%

Flashcrowd 5.6% 11% 6%

Periodic+

Flashcrowd8% 29% 11%

Withvirtualizedinfrastructure:sameresultsEnd-to-endservicealongnetworkpath:+10%

ResultstoDate

•  Predic,ngservicemetricsforcluster-basedservicesisfeasible:–  videostreaming,key-valuestore(NMAEbelow14%forvideo,audioframerates,etc.)

•  Featuresetreduc,ononXsarreducesmodelcomputa,on,meandimprovesaccuracy.

•  Real-,meanaly,csengine–  allowstoobserveeffectofsystemperturba,ononservicequality;–  servesasbuildingblockforservicequalityassurancesystem,

anomalydetec,onsystem.

Real-time Analytics Functions for Clouds

23

Conclusions (1)

The availability of operational and historical data and recent technology advancement make real-time analytics for clouds possible. Analytics methods create models from observations, without knowing detailed architectural and functional model of a system. Training of engineers is key. 24

Conclusions (2)

We demonstrated the feasibility of estimating service metrics in real-time. Promising application of real-time analytics for engineering and operation of cloud services: • Estimation of KPIs and control parameters; • Quality assurance and anomaly detection/root-cause

analysis. Major challenges remain: • scalability, in-network computation •  integrating analytics into an overall engineering

methodology. 25

Publica,ons

1.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Aservice-agnos,cmethodforpredic,ngservicemetricsinreal-,me,”submiaedforpublica,on.2.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Predic,ngservicemetricsforcluster-basedservicesusingreal-,meanaly,cs,”Interna,onalConferenceonNetworkandServiceManagement(CNSM2015),Barcelona,Spain,November2015.3.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Predic,ngReal-,meService-levelMetricsfromDeviceSta,s,cs,”Interna,onalSymposiumonIntegratedNetworkManagement(IM2015),Oaawa,CA,May2015.4.R.Yanggratoke,J.Ahmed,J.Ardelius,C.Flinta,A.Johnsson,D.Gillblad,andR.Stadler,“Aplatormforpredic,ngreal-,meservice-levelmetricsfromdevicesta,s,cs,”Interna,onalSymposiumonIntegratedNetworkManagement(IM2015),Demonstra,onProgram,Oaawa,CA,May2015.5.Tracespublishedathap://mldata.org

top related