Proactivity = Observation + Analysis + Knowledge extraction + Action planning?
András Pataricza, Budapest University of Technology and Economics
Department of Measurement and Information Systems
Contributors: Prof. G. Horváth (BME), I. Kocsis (BME), Z. Micskei (BME), K. Gáti (BME), Zs. Kocsis (IBM), I. Szombath (BME), and many others
There will be nothing new in this lecture
I learned the basics when I was this young
But old professors are happy to have a new audience
How can traditional signal processing help proactivity?
Proactive stance: builds on foreknowledge (intelligence) and creativity to anticipate the situation as an opportunity, regardless of how threatening or how bad it looks; influence the system constructively instead of merely reacting
Reactivity vs. proactivity
Reactive control: acting in response to a situation rather than creating or controlling it
Proactive control: controlling a situation rather than just responding to it after it has happened
Test environment
Test configuration
Virtual desktop infrastructure: a few tens of VMs per host, a few tens of hosts per cluster
vSphere monitoring and supervisory control
Objective: VM-level SLA control; capacity planning, proactive migration
CPU-ready metric: VM is ready to run, but lacks the resources to start
Performance monitoring
Detecting a possible problem at VM or host level; serves as a failure indicator as well
This document was created using the official VMware icon and diagram library. Copyright 2010 VMware, Inc. See http://communities.vmware.com/docs/DOC-13702
Actions to prevent performance issues
Add limits to neighbouring VMs
Actions to prevent performance issues
Live migrate the VM to another (underutilized) host
Measured data (at 20 sec sampling rate)
Aggregation over the population: statistical cluster behavior versus QoS over the VM population
Mean of the goal VM metric (VM_CPU_READY)
VM application ready to run, but lack of resources -> performance bottleneck -> availability problem
VMware recommended threshold: at 5%, start watching; at 10%, action is typically needed
The two traps
Visual processing: you believe your eyes
Automated processing: you believe your computer
Statistics:
Mean: 0.007 -> a good system
Only 2/3 of the samples are error-free -> a bad system
After eliminating the failure-free cases below the threshold, mean: 0.023 -> a good system
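The "you believe your computer" trap can be reproduced in a few lines: the same samples look good or bad depending on which summary statistic you trust. A minimal sketch with illustrative numbers (not the measured dataset from the talk), using the 5% VMware watch threshold:

```python
# One set of CPU-ready samples, three contradictory summaries.
def summarize(samples, threshold=0.05):
    """Return (overall mean, error-free fraction, mean of violating samples)."""
    violations = [s for s in samples if s > threshold]
    error_free = 1 - len(violations) / len(samples)
    mean_violation = sum(violations) / len(violations) if violations else 0.0
    return sum(samples) / len(samples), error_free, mean_violation

# Illustrative: 2/3 clean samples, 1/3 above the 5% threshold.
samples = [0.0] * 200 + [0.08] * 100
mean, clean_frac, mean_bad = summarize(samples)
print(f"mean={mean:.4f}  error-free={clean_frac:.2f}  mean(violations)={mean_bad:.3f}")
```

The overall mean stays comfortably below the threshold even though a third of the samples violate it, which is exactly how the "good system" verdict can coexist with a bad one.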
Mean of the goal VM metric
Visual inspection: lots of bad values -> this is a bad system
Host shared and used memory over time
Noisy; high-frequency components dominate
But they correlate (93%!)
YOU DON'T SEE IT
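The point that a strong correlation can hide behind visually dominant noise is easy to demonstrate. A sketch on synthetic data (not the actual host memory traces): two series share a common component, each buried under independent noise.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient, no external libraries."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rng = random.Random(0)
common = [rng.gauss(0, 1) for _ in range(2000)]          # shared component
series_a = [c + rng.gauss(0, 0.3) for c in common]       # e.g. "used memory"
series_b = [c + rng.gauss(0, 0.3) for c in common]       # e.g. "shared memory"
print(f"correlation = {pearson(series_a, series_b):.2f}")
```

Plotted side by side the two series look like unrelated noise, yet the coefficient lands around 0.9: the eye sees the high-frequency jitter, the number sees the shared component.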
... and a host of more mundane observations
Computing power use = CPU use × CPU clock rate (constant), so the relation should be purely proportional
Correlation coefficient: 0.99998477434137
The deviation is well visible, but numerically suppressed. Origin?
Host CPU usage vs VM ratio: bad vCPU ready
Most important factor: host CPU usage mean
The battleplan
Impacts of temporal resolution
Nyquist-Shannon sampling theorem: sampling frequency = 2 × bandwidth
Sampling period = 20 sec -> sampling frequency = 0.05 Hz -> bandwidth = 0.025 Hz
Additionally: sampling clock jitter (SW sampling), clock skew (distributed system); Precision Time Protocol (PTP, IEEE 1588-2008)
No fine-granular prediction
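The sampling arithmetic above can be spelled out explicitly; a small sketch of the slide's reasoning:

```python
# Nyquist-Shannon limit for the 20-second vSphere sampling rate.
sampling_period_s = 20.0
sampling_freq_hz = 1.0 / sampling_period_s       # 0.05 Hz
bandwidth_hz = sampling_freq_hz / 2.0            # Nyquist: half the sampling rate
shortest_visible_period_s = 1.0 / bandwidth_hz   # fastest observable oscillation

print(f"f_s = {sampling_freq_hz} Hz, B = {bandwidth_hz} Hz, "
      f"shortest visible period = {shortest_visible_period_s} s")
```

Anything faster than a 40-second cycle is aliased away, which is why fine-granular prediction is out of reach at this sampling rate regardless of the analysis method.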
Proactivity
Proactivity needs:
Situation recognition based on historical experience: what is to be expected?
Identification of the principal factors: single factor / multiple factors
Operation domains leading to failures: boundaries
Predictor design: high failure coverage, temporal lookahead sufficient for reaction
Design of the reaction
Situations to be covered
Single VM: application demand > resources allocated
VM-host: overcommitment, overload due to other VMs
VM-host-cluster
Data preparation: data cleaning, data reduction
Data reduction: huge initial set of samples
Reduction:
Object sampling: representative measurement objects
Parameter selection/reduction: aggregation, relevance, redundancy
Temporal sampling: relevance
Object sampling
In pursuit of discovering fine-grained behavior and the reasons for outliers
For presentation purposes only: reduction of the sample size to 400 for manageability
Real-life analysis: keep enough data to maintain a proper correlation with the operation
Subsample: keep samples with ratio > 0, then random subsampling
Demo: visual data discovery with parallel (||) coordinates
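The two-stage reduction above (filter to the interesting cases, then randomly cut down to a presentable size) can be sketched in a few lines; the field name `ratio` and the target of 400 rows follow the slides, the rest is an assumed record layout.

```python
import random

def subsample(rows, target=400, seed=0):
    """Keep rows with positive ratio, then randomly reduce to `target` rows."""
    interesting = [r for r in rows if r["ratio"] > 0]
    if len(interesting) <= target:
        return interesting
    return random.Random(seed).sample(interesting, target)

# Illustrative data: every fifth row has ratio 0, the rest are positive.
rows = [{"ratio": (i % 5) * 0.01} for i in range(10_000)]
reduced = subsample(rows)
print(len(reduced))  # 400
```

With a fixed seed the reduction is reproducible, which matters when the same subsample is inspected repeatedly in a parallel-coordinates view.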
Visual multifactor analysisVisual analytics for an arbitrary number of factors
Inselberg, A.: Parallel Coordinates: Visual Multidimensional Geometry and Its Applications. Springer, 2009.
You can do much, much more: redundancy reduction, correlation analysis, clustering, data mining, approximation, optimization
Prediction at the cluster level
What ratio of the VMs will become problematic?
Pinpointed interval for one VM
Situation of interest
Training time > prediction time
One-minute prediction based on all data sources
One minute prediction and classification
One-minute prediction with selected variables

Predicted \ Real    Alarm          Normal
Alarm               77 (67.54%)    56 (0.3%)
Normal              37 (32.46%)    18269 (99.7%)

Red -> missed alarm (type I error); yellow -> false alarm (type II error)
Classification error (simplest predictor)

Factors              Prediction time   Uncovered failure rate   False alarm rate
All                  1 min             73%                      0.2%
Proper feature set   1 min             32%                      0.3%
Wrong feature set    1 min             97%                      0.04%
All                  5 min             87%                      0.1%

False alarm rate is low (dominant pattern)
Feature set selection is critical to detection: more is less (PROPER selection is needed, cf. PFARM 2010)
Case separation for different situations
Long-term prediction is hard (for automated reactions)
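The rates in the tables above follow directly from the one-minute confusion matrix; a quick sketch recomputing them from its four cells:

```python
# Cells of the one-minute confusion matrix (predicted vs. real).
tp = 77      # real alarm, predicted alarm
fn = 37      # real alarm, predicted normal  -> missed alarm
fp = 56      # real normal, predicted alarm  -> false alarm
tn = 18269   # real normal, predicted normal

miss_rate = fn / (tp + fn)          # uncovered failure rate
false_alarm_rate = fp / (fp + tn)
print(f"miss = {miss_rate:.2%}, false alarm = {false_alarm_rate:.2%}")
```

This reproduces the ~32% uncovered failure rate and ~0.3% false alarm rate of the "proper feature set" row: the false alarm rate is low simply because normal samples dominate the population.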
Case study: connectivity testing in large networks
In dynamic infrastructures the active internode topology has to be discovered as well
Large networks: not known explicitly; too complex for conventional algorithms
Social network graph
Yahoo! Instant Messenger friend connectivity graph*: 1.8M nodes, ~4M edges
Serves as a model of large infrastructures
Typical power-law network: 75% of the friendships are related to 35% of the users
* ydata-yim-friends-graph-v1_0, Yahoo! Research Alliance Webscope program, http://research.yahoo.com/Academic_Relations
Typical model: random graphs
Yahoo! Instant Messenger dataset adjacency matrix
Preferential attachment graph
Graphon
Ordered by degree / in random order
Preferential attachment graph: at each step a new node is added with m new edges; existing nodes are chosen for attachment with probability proportional to their degrees
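The preferential attachment process described above can be sketched directly; a minimal Barabási-Albert-style generator (a simplified illustration, not the exact construction used in the talk), where degree-proportional selection is implemented by drawing from a list that repeats each node once per incident edge:

```python
import random

def preferential_attachment(n, m, seed=0):
    """Generate an n-node graph; each new node attaches m edges preferentially."""
    rng = random.Random(seed)
    # Seed graph: a clique on m+1 nodes, so every node has degree >= m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    stubs = [v for e in edges for v in e]   # node appears once per incident edge
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:             # m distinct, degree-proportional picks
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = preferential_attachment(1000, 2)
print(f"{len(edges)} edges")
```

Early nodes keep accumulating stubs and therefore attachments, which is what produces the hubs and the power-law degree tail visible in the ordered adjacency matrix.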
Approximating edge density by subgraph sampling
[Heat map: sample size (k) vs. number of samples (n), colored by relative error; white: error < 5%]
Random k-node sampling: with sample size k = 35 repeated n = 20 times, 2% error while examining only 4% of the graph
The decrease of the error is exponential when k or n is small, linear after that
Colors darker than purple mean the error is less than 5%
Neighborhood sampling: fault-tolerant services
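The density estimator itself is short: draw k random nodes, count the edges of the induced subgraph, and normalize by the number of node pairs; averaging over n repetitions gives the estimate. A sketch on a toy ring graph (the slide's k = 35, n = 20 setting is kept, but the graph is illustrative):

```python
import random
from itertools import combinations

def estimate_density(nodes, edge_set, k=35, n=20, seed=0):
    """Estimate edge density from n induced subgraphs on k random nodes each."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        sample = rng.sample(nodes, k)
        hits = sum(1 for u, v in combinations(sample, 2)
                   if (u, v) in edge_set or (v, u) in edge_set)
        total += hits / (k * (k - 1) / 2)   # edges found / pairs inspected
    return total / n

# Toy graph: a 1000-node ring, true density = 1000 / C(1000, 2) ~ 0.002
nodes = list(range(1000))
edge_set = {(i, (i + 1) % 1000) for i in nodes}
print(f"estimated density ~ {estimate_density(nodes, edge_set):.4f}")
```

Each repetition inspects only C(35, 2) = 595 of the roughly half a million node pairs, which is how a small examined fraction can still yield a low-error estimate.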
Number of 3- and 4-cycles = possible redundancy
A high-degree node has many substitute nodes (e.g. a load balancer)
The distributions approximated from samples are very close!
Root node: redundancy?
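The neighborhood sampling mentioned above (explore the graph around a random root to a given depth m) is a bounded breadth-first search; a minimal sketch, with the graph as an assumed adjacency-list dictionary:

```python
from collections import deque

def neighborhood(adj, root, m):
    """Return {node: distance} for all nodes within distance m of root."""
    seen = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if seen[u] == m:                 # do not expand beyond depth m
            continue
        for v in adj.get(u, ()):
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return seen

# Toy path graph 0-1-2-3-4-5: the depth-2 ball around node 2
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
print(sorted(neighborhood(adj, 2, 2)))  # [0, 1, 2, 3, 4]
```

The explored ball around each sampled root is then the unit of analysis, e.g. for counting the 3- and 4-cycles that indicate substitute nodes in a fault-tolerant domain.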
Neighborhood sampling: take random nodes, explore their neighborhood to a given depth (m)
Fault-tolerant domain; trends
Summary: proactivity needs
Observations: all relevant cases (stress test)
Analysis: check of the input data; visual analysis -> UNDERSTANDING; automated methods for calculation
Knowledge extraction: clustering (situation recognition); predictor (generalization)
Action planning: the situation-defining principal factors are indicative
Thank you for your attention