Understanding and Predicting Host Load

DESCRIPTION

Understanding and Predicting Host Load. Peter A. Dinda, Carnegie Mellon University. http://www.cs.cmu.edu/~pdinda. Talk in a nutshell: statistical analysis of two sets of week-long, 1 Hz resolution traces of load on ~40 machines, and evaluation of linear time series models for load prediction.

TRANSCRIPT
Understanding and Predicting Host Load
Peter A. Dinda, Carnegie Mellon University
http://www.cs.cmu.edu/~pdinda
Talk in a Nutshell

• Load is self-similar
• Load exhibits epochal behavior
• Load prediction benefits from capturing self-similarity

Statistical analysis of two sets of week-long, 1 Hz resolution traces of load on ~40 machines, and evaluation of linear time series models for load prediction.
Why Study Load?

Load partially determines execution time, so we want to model and predict load.

[Diagram: an interactive application submits short tasks with deadlines to an unmodified distributed system and asks for a predicted execution-time range [tmin, tmax]]
Load and Execution Time

[Scatter plot: measured load (1 to 7) vs. execution time (0 to 25 seconds); 42,000 points; coefficient of correlation = 0.998]

$t_{\text{nominal}} = \int_{t_{\text{now}}}^{t_{\text{now}} + t_{\text{exec}}} \frac{1}{1 + \text{load}(t)}\, dt$
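The relation on this slide says that nominal work accumulates at rate 1/(1 + load): a task needing t_nominal seconds at zero load finishes once that integral reaches t_nominal. Given a trace of 1 Hz load samples, this can be applied numerically. A minimal sketch (the function name and `dt` parameter are illustrative, not from the talk):

```python
def predicted_exec_time(nominal, load_samples, dt=1.0):
    """Integrate 1/(1 + load) over load samples spaced dt seconds
    apart until the accumulated nominal work reaches `nominal`."""
    done = 0.0      # nominal seconds of work completed so far
    elapsed = 0.0   # wall-clock seconds elapsed so far
    for load in load_samples:
        done += dt / (1.0 + load)
        elapsed += dt
        if done >= nominal:
            return elapsed
    return None  # trace too short to finish the task

# A constant load of 1.0 doubles the execution time:
print(predicted_exec_time(5.0, [1.0] * 20))  # -> 10.0
```

With constant load L this reduces to exec = nominal × (1 + L), which matches the 0.998 correlation between load and execution time on this slide.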
Outline

• Measurement methodology
• Load traces
• Load variance
• New results
  – Self-similarity
  – Epochal behavior
• Benefits of capturing self-similarity in linear models
• Conclusions
Measurement Methodology

[Diagram: the Digital Unix kernel samples the ready queue every T = 2 seconds and folds the last ~30 samples into an exponential average (the 1-minute load "average"); a user-level measurement tool records these averages at a 1 Hz sample rate]
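The exponential average in the diagram can be mimicked in a few lines. This is a sketch of the general technique, not the Digital Unix kernel's exact code; the decay constant below is an assumption chosen to give roughly a one-minute window from T = 2 second samples:

```python
import math

def exponential_average(samples, interval=2.0, window=60.0):
    """Exponentially weighted moving average of run-queue samples
    taken every `interval` seconds, with a `window`-second time
    constant (a Unix-style one-minute load 'average')."""
    decay = math.exp(-interval / window)
    avg = 0.0
    for s in samples:
        avg = decay * avg + (1.0 - decay) * s
    return avg
```

The smoothing is exactly why the talk measures the raw load at 1 Hz instead: the exponential average lags and damps the fluctuations the prediction models need to see.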
Load Traces

Trace           Machines                                      Duration
August 1997     13 production cluster, 8 research cluster,    ~ one week (over one
                2 compute servers, 15 desktops                million samples)
February 1998   13 production cluster, 8 research cluster,    ~ one week (over one
                2 compute servers, 11 desktops                million samples)
Absolute Variation

[Chart: per-host mean load with +/- one standard deviation (roughly -1 to 2); hosts grouped into production cluster, research cluster, and desktops]
Relative Variation

[Chart: per-host relative load variation (0 to 12); hosts grouped into production cluster, research cluster, and desktops]
[Figures for trace axp7.19.day$NormLoad: the load time series over time; its autocorrelation function (ACF) out to lag 600; and its raw periodogram in dB vs. frequency, bandwidth 3.34114e-006, 95% C.I. (-5.87588, 17.5667) dB]

Visual Self-Similarity
The Hurst Parameter

[Log-log periodogram plots for H = 0.375, H = 0.5, H = 0.625, and H = 0.875; log frequency from 0.0001 to 1]

H = (1 - slope) / 2, where slope is the slope of the log-log periodogram.
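The H = (1 - slope)/2 relation suggests a direct estimator: compute the periodogram, fit a least-squares line in log-log space, and convert the slope. A self-contained sketch of that idea (a serious estimator would restrict the fit to low frequencies, where the power law holds):

```python
import math

def hurst_from_periodogram(series):
    """Estimate the Hurst parameter as H = (1 - slope) / 2, where
    slope is a least-squares fit of log power vs. log frequency."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    logf, logp = [], []
    for k in range(1, n // 2):  # periodogram ordinates via a direct DFT
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = (re * re + im * im) / n
        if power > 0:
            logf.append(math.log(k / n))
            logp.append(math.log(power))
    # least-squares slope of log power against log frequency
    m = len(logf)
    mx, my = sum(logf) / m, sum(logp) / m
    slope = sum((a - mx) * (b - my) for a, b in zip(logf, logp)) / \
            sum((a - mx) ** 2 for a in logf)
    return (1.0 - slope) / 2.0
```

White noise has a flat spectrum (slope near 0), so the estimate comes out near H = 0.5; the load traces in the talk sit well above that.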
Self-Similarity Statistics

[Chart: estimated Hurst parameter per host (mean with +/- one standard deviation, scale 0 to 1.2); hosts grouped into production cluster, research cluster, and desktops]
Why is Self-Similarity Important?

• Complex structure
  – Not completely random, nor independent
  – Short-range dependence
    • Excellent for history-based prediction
  – Long-range dependence
    • Possibly a problem
• Modeling implications
  – Suggests models that can capture it: ARFIMA, FGN, TAR
Load Exhibits Epochal Behavior

[Figures (MATLAB EPS, previews unavailable): axp7_tue_19.eps and axp7_19_day_time.eps]
Epoch Length Statistics

[Chart: epoch length per host (mean with +/- one standard deviation, roughly -200 to 1200 seconds); hosts grouped into production cluster, research cluster, and desktops]
Why is Epochal Behavior Important?

• Complex structure
  – Non-stationary
• Modeling implications
  – Suggests models: ARIMA, ARFIMA, etc.; non-parametric spectral methods
  – Suggests problem decomposition
Linear Time Series Models

[Figures: a partially predictable load sequence produced by passing an unpredictable random sequence through a fixed linear filter]

$z_t = \sum_{j=1}^{\infty} \psi_j a_{t-j} + a_t$

$a_t \sim \text{WhiteNoise}(0, \sigma_a^2), \qquad z_t \sim (\cdot, \sigma_z^2), \qquad \sigma_a^2 \le \sigma_z^2$

Choose the weights $\psi_j$ to minimize $\sigma_a^2$; $\sigma_a$ gives the confidence interval for t+1 predictions.
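One concrete way to realize such a filter is an autoregressive fit: estimate the weights by least squares on the load history, take σ_a as the standard deviation of the residuals, and report predictions as mean ± 1.96·σ_a. A self-contained sketch in pure Python (the normal-equations solver has no pivoting, so it is only suitable for small p):

```python
def fit_ar(series, p):
    """Least-squares fit of an AR(p) model. Returns the lag
    coefficients and the innovation standard deviation sigma_a."""
    n = len(series)
    rows = [series[t - p:t][::-1] for t in range(p, n)]  # [z_{t-1},...,z_{t-p}]
    y = series[p:]
    # normal equations A c = b, solved by Gaussian elimination
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            f = A[j][i] / A[i][i]
            A[j] = [aj - f * ai for aj, ai in zip(A[j], A[i])]
            b[j] -= f * b[i]
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    resid = [yt - sum(c * r[i] for i, c in enumerate(coef))
             for r, yt in zip(rows, y)]
    sigma_a = (sum(e * e for e in resid) / len(resid)) ** 0.5
    return coef, sigma_a

def predict_next(series, coef):
    """One-step (t+1) prediction from the most recent p values."""
    recent = series[-len(coef):][::-1]
    return sum(c * x for c, x in zip(coef, recent))
```

Usage: `coef, sigma_a = fit_ar(load_trace, 4)` then `predict_next(load_trace, coef) ± 1.96 * sigma_a` is the 95% interval for the next sample.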
Realizable Pole-Zero Models

AR(p) and MA(q) are special cases of ARMA(p,q); ARIMA(p,d,q) extends ARMA with integer differencing (d integer, captures non-stationarity), and ARFIMA(p,d,q) extends it further with fractional differencing (d related to the Hurst parameter, captures self-similarity).

p and q are the numbers of parameters; d is the degree of differencing.
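ARFIMA's fractional d can be made concrete through the binomial expansion of the differencing operator (1 - B)^d, where B is the backshift operator; for self-similar series the usual link to the Hurst parameter is d = H - 1/2. A sketch of the standard coefficient recursion:

```python
def frac_diff_weights(d, n):
    """First n coefficients of (1 - B)^d expanded as a power series
    in the backshift operator B, via w_k = w_{k-1} * (k - 1 - d) / k."""
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (k - 1 - d) / k)
    return w

# Integer d = 1 recovers ordinary first differencing: weights 1, -1, 0, 0, ...
# Fractional d gives slowly decaying weights -- the long memory ARFIMA adds.
```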
Real-World Benefits of Models

σ_a gives the confidence interval for t+1 predictions. Map work that would take 100 ms at zero load:

axp0: σ_z = 0.54, mean load = 1.0, σ_a(ARMA(4,4)) = 0.109, σ_a(ARFIMA(4,d,4)) = 0.108
  no model: 1.0 +/- 1.06 (95%) => 100 to 306 ms
  ARMA:     1.0 +/- 0.22 (95%) => 178 to 222 ms
  ARFIMA:   1.0 +/- 0.21 (95%) => 179 to 221 ms   (~1% better than ARMA)

axp7: σ_z = 0.14, mean load = 0.12, σ_a(ARMA(4,4)) = 0.041, σ_a(ARFIMA(4,d,4)) = 0.025
  no model: 0.12 +/- 0.27 (95%) => 100 to 139 ms
  ARMA:     0.12 +/- 0.08 (95%) => 104 to 120 ms
  ARFIMA:   0.12 +/- 0.05 (95%) => 107 to 117 ms  (~40% better than ARMA)
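The mapping behind these numbers follows from exec ≈ nominal × (1 + load), with the predicted load interval mean ± 1.96·σ and load clamped at zero. A sketch (the function name is illustrative):

```python
def exec_time_range(nominal_ms, mean_load, sigma, z=1.96):
    """95% execution-time range for work taking nominal_ms at zero
    load, assuming exec time = nominal * (1 + load) and Gaussian
    prediction error of standard deviation sigma."""
    lo_load = max(0.0, mean_load - z * sigma)  # load cannot go negative
    hi_load = mean_load + z * sigma
    return nominal_ms * (1.0 + lo_load), nominal_ms * (1.0 + hi_load)

# axp0, no model (sigma_z = 0.54):    about (100, 306) ms, as on the slide
# axp0, ARMA(4,4) (sigma_a = 0.109):  about (179, 221) ms
```

Note how the clamp reproduces the "no model" lower bound of 100 ms: 1.96 × 0.54 exceeds the mean load of 1.0 only on the low side for axp7, and on axp0 the lower load bound dips below zero.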
t+1 Prediction

[Chart: per-host t+1 prediction results (scale -5 to 45); hosts grouped into production cluster, research cluster, and desktops]
t+8 Prediction

[Chart: per-host t+8 prediction results (scale -10 to 35); hosts grouped into production cluster, research cluster, and desktops]
Conclusions

• Load has high variance
• Load is self-similar
• Load exhibits epochal behavior
• Capturing self-similarity in linear time series models improves predictability
Load Traces

• Would a web-accessible load trace database be useful?
• Would you like to contribute?