An Evaluation of Linear Models for
Host Load Prediction
Peter A. Dinda
David R. O’Hallaron
Carnegie Mellon University
2
Motivating Questions
• What are the properties of host load?
• Is host load predictable?
• What predictive models are appropriate?
• Are host load predictions useful?
3
Overview of Answers
• Host load exhibits complex behavior
– Self-similarity, epochal behavior
• Host load is predictable
– 1 to 30 second timeframe
• Simple linear models are sufficient
– Recommend AR(16) or better
• Predictions lead to useful estimates of task execution times
Statistically rigorous approach
4
Outline
• Context: predicting task execution times
– Mean squared load prediction error
• Offline trace-based evaluation
– Host load traces
– Linear models
– Randomized methodology
– Results of data-mining
• Online prediction of task execution times
• Related work
• Conclusion
5
Prediction-based Best-effort Distributed Real-time Scheduling
[Figure: a task with a nominal time and a deadline is compared against the Predicted Exec Time on each candidate host; the deadline is marked on the plot]
Task notifies scheduler of its CPU requirements (nominal time) and its deadline
Scheduler acquires predicted task execution times for all hosts
Scheduler assigns task to a host where its deadline can be met
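The three scheduling steps above can be sketched in a few lines. This is a minimal illustration only: the function name `pick_host`, the host names, and the selection rule (take the feasible host with the smallest upper bound) are assumptions, not something the slides specify.

```python
from typing import Dict, Optional, Tuple

def pick_host(predicted: Dict[str, Tuple[float, float]],
              deadline: float) -> Optional[str]:
    """Return a host whose predicted execution-time interval meets the deadline.

    predicted maps a (hypothetical) host name to (low, high) bounds, in
    seconds, on the task's predicted execution time on that host.
    """
    # Keep only hosts whose upper bound still fits within the deadline.
    feasible = {host: high for host, (low, high) in predicted.items()
                if high <= deadline}
    if not feasible:
        return None  # no host is predicted to meet the deadline
    # Assumed tie-breaking rule: prefer the smallest upper bound.
    return min(feasible, key=feasible.get)

# Made-up intervals for three hypothetical hosts.
hosts = {"hostA": (0.38, 0.72), "hostB": (0.45, 0.56), "hostC": (0.90, 1.40)}
print(pick_host(hosts, deadline=0.6))
print(pick_host(hosts, deadline=0.3))
```

With a 0.6 second deadline only hostB qualifies; with a 0.3 second deadline no host does, so the function returns `None`.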
6
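Predicting Task Execution Times
[Figure: a task (nominal time, deadline) is passed through a pipeline of Load Sensor, Load Predictor, and Exec Time Model, which yields a Predicted Exec Time to compare against the deadline]
• Load Sensor: DEC Unix 5 second load average sampled at 1 Hz
• Load Predictor: 1 to 30 second predictions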
7
Confidence Intervals
[Figure: two panels plotting Predicted Exec Time against the deadline. Bad Predictor: no obvious choice. Good Predictor: two good choices.]
Good predictors provide smaller confidence intervals
Smaller confidence intervals simplify scheduling decisions
8
Load Prediction Focus
[Figure: the same pipeline of Load Sensor, Load Predictor, and Exec Time Model, producing a Predicted Exec Time to compare against the deadline]
CI length determined by mean squared error of predictor
9
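Load Predictor Operation
[Figure: block diagram of Modeler, Model, Load Predictor, and Evaluator, with labels "One-time use" and "Production Stream"]
• Measurements in fit interval $\langle z_{t-m}, \ldots, z_{t-2}, z_{t-1} \rangle$ and a model type go to the Modeler, which produces a Model
• The Model yields a Load Predictor
• Measurements in test interval $z_t, z_{t+1}, \ldots, z_{t+n-1}$ stream into the Load Predictor
• Prediction stream: for each measurement $z_{t+i}$, the predictions $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$
• The Evaluator applies the error metrics to the prediction stream and produces error estimates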
10
Mean Squared Error
[Figure: the Load Predictor consumes the measurements $\ldots, z_{t+1}, z_t$ and, for each $z_{t+i}$, emits the 1, 2, ..., w step ahead predictions $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$]
• Variance of z: $\sigma_z^2 = \operatorname{mean}_i\,(\bar{z} - z_{t+i})^2$
• 1 step ahead mean squared error: $\sigma_{a_1}^2 = \operatorname{mean}_i\,(z'_{t+i,t+i+1} - z_{t+i+1})^2$
• 2 step ahead mean squared error: $\sigma_{a_2}^2 = \operatorname{mean}_i\,(z'_{t+i,t+i+2} - z_{t+i+2})^2$
• ...
• w step ahead mean squared error: $\sigma_{a_w}^2 = \operatorname{mean}_i\,(z'_{t+i,t+i+w} - z_{t+i+w})^2$
Good Load Predictor: $\sigma_{a_1}^2, \sigma_{a_2}^2, \ldots, \sigma_{a_w}^2 < \sigma_z^2$
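The per-lead mean squared errors on this slide can be computed directly from a measurement sequence and its prediction stream. The sketch below is illustrative: the function name `lead_mse`, the toy measurements, and the use of a LAST-style predictor (repeat the latest measurement) to generate the prediction stream are all assumptions.

```python
from statistics import pvariance

def lead_mse(actual, predictions, k):
    """Mean squared error of the k step ahead predictions.

    actual[i] is the measurement z_{t+i}; predictions[i][k-1] is the
    prediction z'_{t+i,t+i+k} made when z_{t+i} arrived.
    """
    errs = [(predictions[i][k - 1] - actual[i + k]) ** 2
            for i in range(len(actual) - k)]
    return sum(errs) / len(errs)

# Made-up load measurements, one per second.
actual = [1.0, 1.1, 1.2, 1.3, 1.2, 1.1, 1.0, 0.9]
w = 2  # predict 1 and 2 steps ahead

# Stand-in prediction stream: every lead repeats the latest measurement.
predictions = [[z] * w for z in actual]

signal_variance = pvariance(actual)  # error of predicting the mean
mse_by_lead = {k: lead_mse(actual, predictions, k) for k in range(1, w + 1)}
print(signal_variance, mse_by_lead)
```

On this toy data the 1 step ahead error is below the signal variance while the 2 step ahead error is above it, which is the kind of comparison the slide's criterion for a good predictor expresses.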
11
CIs From Mean Squared Error
[Figure: 95% CI for exec time available in next second (0 to 1) plotted against one second ahead mean squared error (0 to 0.25), for Predicted Load = 1.0]
12
Example of Improving the Confidence Interval
[Figure: testcase -1619784968 from axpfea.psc trace, plotted against Lead (seconds), 1 to 25; series: By signal variance, By AR(18)]
Massive reduction in confidence interval length using prediction
Do such benefits consistently occur?
14
Host Load Traces
• DEC Unix 5 second exponential average
• Full bandwidth captured (1 Hz sample rate)
• Long durations
• Also looked at “deconvolved” traces

Trace set     Machines                                                                     Duration
August 1997   13 production cluster, 8 research cluster, 2 compute servers, 15 desktops   ~one week (over one million samples)
March 1998    13 production cluster, 8 research cluster, 2 compute servers, 11 desktops   ~one week (over one million samples)
15
Salient Properties of Load Traces
(+ encouraging for prediction, - discouraging for prediction)
+/- Extreme variation
+ Significant autocorrelation
– Suggests appropriateness of linear models
+ Significant average mutual information
- Self-similarity / long range dependence
+/- Epochal behavior
+ Stable spectrum during an epoch
- Abrupt transitions between epochs
(Detailed study in LCR98, SciProg99)
16
Linear Models
(2000 sample fits, largest models in study, 30 steps ahead)

Model Class     Fit time (ms)   Step time (ms)   Notes
MEAN                     0.03            0.003   Error is signal variance
LAST                     0.75            0.001   Last value is prediction
BM(p)                   46.26            0.001   Average over best window
AR(p)                    4.20            0.149   Deterministic algorithm
MA(q)                 6501.72            0.015   Function Optimization
ARMA(p,q)            77046.22            0.034   Function Optimization
ARIMA(p,d,q)         53016.77            0.045   Non-stationarity, FO
ARFIMA(p,d,q)         3692.63            9.485   Long range dependence, MLE
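The three simplest model classes in the table can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and the BM(p) rule shown (pick the window of at most p samples whose average gives the lowest 1 step ahead squared error on the fit data) is one reading of "average over best window", not necessarily the exact procedure used in the study.

```python
def fit_mean(fit):
    """MEAN: always predict the mean of the fit interval."""
    mu = sum(fit) / len(fit)
    return lambda history, w: [mu] * w

def fit_last(fit):
    """LAST: predict the most recent measurement for every lead."""
    return lambda history, w: [history[-1]] * w

def fit_bm(fit, p):
    """BM(p): predict the average of the last n samples, n <= p.

    Assumption: n is chosen to minimize the 1 step ahead mean squared
    error over the fit interval (requires len(fit) > p).
    """
    def window_mse(n):
        errs = [(sum(fit[i - n:i]) / n - fit[i]) ** 2
                for i in range(p, len(fit))]
        return sum(errs) / len(errs)
    best = min(range(1, p + 1), key=window_mse)
    return lambda history, w: [sum(history[-best:]) / best] * w

# Made-up fit data that alternates between 0 and 2.
fit = [0.0, 2.0] * 10
for name, predictor in [("MEAN", fit_mean(fit)),
                        ("LAST", fit_last(fit)),
                        ("BM(4)", fit_bm(fit, 4))]:
    print(name, predictor(fit, 3))
```

Each fit function returns a predictor that takes the measurement history and the number of leads w, mirroring the fit-then-step split behind the table's two timing columns.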
17
AR(p) Models
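– Fast to fit (4.2 ms, AR(32), 2000 points)
– Fast to use (<0.15 ms, AR(32), 30 steps ahead)
– Potentially less parsimonious than other models

$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t$
– $z_t$: next value
– $z_{t-1}, \ldots, z_{t-p}$: p previous values
– $\phi_1, \ldots, \phi_p$: weights chosen to minimize mean square error for fit interval
– $a_t$: error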
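An AR(p) fit and multi-step prediction can be sketched in pure Python. The slides only say the AR fit is a deterministic algorithm; solving the Yule-Walker equations with the Levinson-Durbin recursion, as below, is an assumption, as are the function names and the choice to subtract the sample mean before fitting.

```python
import random

def autocov(x, max_lag):
    """Biased sample autocovariances r[0..max_lag]."""
    n = len(x)
    mu = sum(x) / n
    return [sum((x[i] - mu) * (x[i + k] - mu) for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def fit_ar(x, p):
    """Return (mean, phi), where phi[j-1] is the weight on z_{t-j}."""
    r = autocov(x, p)
    phi, err = [], r[0]
    for k in range(1, p + 1):
        # Reflection coefficient for order k (Levinson-Durbin step).
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        refl = acc / err
        phi = [phi[j] - refl * phi[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl
    return sum(x) / len(x), phi

def predict_ar(mu, phi, history, w):
    """Predict w steps ahead, feeding each prediction back as input."""
    buf = [z - mu for z in history[-len(phi):]]
    out = []
    for _ in range(w):
        nxt = sum(c * buf[-1 - j] for j, c in enumerate(phi))
        out.append(nxt + mu)
        buf.append(nxt)
    return out

# Synthetic AR(1) data with weight 0.8, standing in for a load trace.
rng = random.Random(0)
series, z = [], 0.0
for _ in range(5000):
    z = 0.8 * z + rng.gauss(0.0, 1.0)
    series.append(z)

mu, phi = fit_ar(series, 2)
print(phi)                              # first weight should be near 0.8
print(predict_ar(mu, phi, series, 3))   # 1, 2, and 3 step ahead predictions
```

Because the recursion works from p + 1 autocovariances, fitting stays cheap even at order 32, which fits the slide's point that AR models are fast to fit and to use.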
18
Evaluation Methodology
• Ran ~152,000 randomly chosen testcases on the traces
– Evaluate models independently of prediction/evaluation framework
– ~30 testcases per trace, model class, parameter set
• Data-mine results
Offline and online systems implemented using RPS Toolkit
19
Testcases
• Models
– MEAN, LAST/BM(32)
– Randomly chosen model from: AR(1..32), MA(1..8), ARMA(1..8,1..8), ARIMA(1..8,1..2,1..8), ARFIMA(1..8,d,1..8)
• Load trace: ~one week, 345,000 to >1M samples at 1 Hz
• Crossover point $z_t$: between 3 hours and (trace length - 3 hours)
• Fit interval: 5 min ... 3 hours (m = 600 to 10800 samples), $z_{t-m}, \ldots, z_{t-1}$
• Test interval: 5 min ... 3 hours (n = 600 to 10800 samples), $z_t, \ldots, z_{t+n-1}$
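Drawing one random testcase from a trace might look like the sketch below. The sample-count bounds come from the slide; the function name, the returned dictionary layout, and the use of uniform distributions are assumptions.

```python
import random

def random_testcase(trace_len, rng, min_len=600, max_len=10800):
    """Pick a crossover point plus fit and test interval bounds.

    All values are sample indices into a 1 Hz trace. Intervals are
    half-open: fit covers [start, crossover), test covers [crossover, end).
    """
    # Keep the crossover at least max_len samples from either end of the
    # trace, so the longest fit or test interval always fits.
    crossover = rng.randrange(max_len, trace_len - max_len + 1)
    m = rng.randint(min_len, max_len)  # fit interval length in samples
    n = rng.randint(min_len, max_len)  # test interval length in samples
    return {"fit": (crossover - m, crossover),
            "test": (crossover, crossover + n)}

rng = random.Random(1)
testcase = random_testcase(345000, rng)
print(testcase)
```

The fit interval ends where the test interval begins, so a model never sees the data it is evaluated on.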
20
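Evaluating a Testcase
[Figure: block diagram of Modeler, Model, Load Predictor, and Evaluator, with labels "One-time use" and "Production Stream"]
• Measurements in fit interval $\langle z_{t-m}, \ldots, z_{t-2}, z_{t-1} \rangle$ and a model type go to the Modeler, which produces a Model
• Measurements in test interval $z_t, z_{t+1}, \ldots, z_{t+n-1}$ stream into the Load Predictor
• Prediction stream: for each measurement $z_{t+i}$, the predictions $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$
• The Evaluator applies the error metrics to the prediction stream and produces error estimates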
21
Error Metrics
• Summary statistics for the 1,2,…,30 step ahead prediction errors of all three models
– Mean squared error
– Min, median, max, mean, mean absolute errors
• IID tests for 1 step ahead errors
– Significant residual autocorrelations, Portmanteau Q (power of residuals), turning point test, sign test
• Normality test (R^2 of QQ plot) for 1 step ahead errors
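As an illustration of one of the IID tests listed above, here is a sketch of the turning point test. For an IID sequence of length n, the number of turning points has mean 2(n-2)/3 and variance (16n-29)/90, so the standardized count is approximately standard normal. The function name and the example sequences are assumptions.

```python
import math

def turning_point_z(errors):
    """Standardized turning point count for a sequence of errors.

    A turning point is an interior sample that is a strict local
    maximum or minimum. |z| > 1.96 rejects IID at the 5% level.
    """
    n = len(errors)
    turns = sum(1 for i in range(1, n - 1)
                if (errors[i] - errors[i - 1]) * (errors[i + 1] - errors[i]) < 0)
    mean = 2.0 * (n - 2) / 3.0
    var = (16.0 * n - 29.0) / 90.0
    return (turns - mean) / math.sqrt(var)

trend = [float(i) for i in range(100)]   # no turning points at all
zigzag = [0.0, 1.0] * 50                 # a turning point at every step
print(turning_point_z(trend), turning_point_z(zigzag))
```

A strongly negative score indicates too few turning points (trend or positive correlation left in the errors); a strongly positive score indicates too many (rapid alternation).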
22
Database
• 54 values characterize testcase, lead time
• SQL queries to answer questions

    select count(*), 100*avg((testvar-msqerr)/testvar) as avgpercentimprove
    from big where p=16 and q=0 and d=0 and lead=1

    +----------+-------------------+
    | count(*) | avgpercentimprove |
    +----------+-------------------+
    |     1164 |     66.7681346166 |
    +----------+-------------------+

“How much do AR(16) models reduce the variability of 1 second ahead predictions?”
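The query on this slide can be tried against a toy table with Python's built-in sqlite3 module. The table name `big` and the column names come from the slide's query; the column types and every row value below are made up for illustration, and the real database held 54 values per testcase and lead time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table big (p int, d int, q int, lead int, testvar real, msqerr real)"
)

# Made-up rows: (p, d, q, lead, test-interval variance, mean squared error).
rows = [
    (16, 0, 0, 1, 0.20, 0.05),
    (16, 0, 0, 1, 0.10, 0.05),
    (16, 0, 0, 2, 0.20, 0.10),
    (8, 0, 0, 1, 0.20, 0.08),
]
conn.executemany("insert into big values (?, ?, ?, ?, ?, ?)", rows)

# Same query as on the slide: average percentage by which AR(16) models
# reduce the error relative to the signal variance at a lead of 1 second.
count, avg_improve = conn.execute(
    "select count(*), 100*avg((testvar-msqerr)/testvar) as avgpercentimprove "
    "from big where p=16 and q=0 and d=0 and lead=1"
).fetchone()
print(count, avg_improve)
```

Only the two AR(16) rows with lead=1 match the filter, so the average is taken over improvements of 75% and 50%.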
23
Comparisons
• Paired
– MEAN vs BM/LAST vs another model
• Unpaired
– All models
– Unpaired t-test to compare expected mean square errors
– Box plots to determine consistency
24
AR(16) vs. LAST
[Figure: AR(16) models and LAST models plotted against Lead Time (seconds), 0 to 30]
25
AR(16), BM(32)
[Figure: AR(16) models and BM(32) models plotted against Lead Time (seconds), 0 to 30]
26
Unpaired Box Plot Comparisons
Good models achieve consistently low error
[Figure: box plots of Mean Squared Error for Model A (inconsistent low error), Model B (consistent low error), and Model C (consistent high error); each box marks the 2.5%, 25%, 50%, mean, 75%, and 97.5% points]
27
1 second Predictions, All Hosts
[Figure: box plots per model (2.5%, 25%, 50%, mean, 75%, 97.5%); image not preserved]
Predictive models clearly worthwhile
28
15 second Predictions, All Hosts
[Figure: box plots per model (2.5%, 25%, 50%, mean, 75%, 97.5%); image not preserved]
Predictive models clearly worthwhile
Begin to see differentiation between models
29
30 second Predictions, All Hosts
[Figure: box plots per model (2.5%, 25%, 50%, mean, 75%, 97.5%); image not preserved]
Predictive models clearly beneficial even at long prediction horizons
30
1 Second Predictions, Dynamic Host
[Figure: box plots per model (2.5%, 25%, 50%, mean, 75%, 97.5%); image not preserved]
Predictive models clearly worthwhile
31
15 Second Predictions, Dynamic Host
[Figure: box plots per model (2.5%, 25%, 50%, mean, 75%, 97.5%); image not preserved]
Predictive models clearly worthwhile
Begin to see differentiation between models
32
30 Second Predictions, Dynamic Host
[Figure: box plots per model (2.5%, 25%, 50%, mean, 75%, 97.5%); image not preserved]
Predictive models clearly worthwhile
Begin to see differentiation between models
34
Online Prediction of Task Execution Times
• Replay selected load trace on host
• Continuously run 1 Hz AR(16)-based host load predictor
• Select random tasks
– 5 to 15 second intervals
– 0.1 to 10 second nominal times
• Estimate exec time using predictions
– Assume priority-less round-robin scheduler
• Execute task
– Record nominal, predicted, and actual exec times
35
On-line Prediction Results
Measurement of 1000 0.1-30 second tasks on lightly loaded host
[Figure: two scatter plots against Predicted time (seconds), 0 to 60, each marking a 50% error band; panels titled "Nominal time as prediction" and "Load prediction based"]
– Fit f(x) = x - 0.03, R^2 = 0.99: all tasks usefully predicted
– Fit f(x) = 1.1*x + 0.02, R^2 = 0.79: 10% of tasks drastically mispredicted
Prediction is beneficial even on lightly loaded hosts
36
On-line Prediction Results
Measurement of 3000 0.1-30 second tasks on heavily loaded, dynamic host
[Figure: two scatter plots against Predicted time (seconds), 0 to 30, each marking a 50% error band; panels titled "Nominal time as prediction" and "Load prediction based"]
– Fits: f(x) = 2.15x - 0.88, R^2 = 0.82; f(x) = 0.95*x + 0.06, R^2 = 0.89
– Panel annotations: 74% of tasks mispredicted; 3% of tasks mispredicted
Prediction is beneficial on heavily loaded, dynamic hosts
37
Related Work
• Workload studies for load balancing
– Mutka, et al [PerfEval ’91]
– Harchol-Balter, et al [SIGMETRICS ’96]
• Host load measurement and studies
– Network Weather Service [HPDC ’97, HPDC ’99]
– Remos [HPDC ’98]
– Dinda [LCR98, SciProg99]
• Host load prediction
– Wolski, et al [HPDC ’99] (NWS)
– Samadani, et al [PODC ’95]
38
Conclusions
• Rigorous study of host load prediction
• Host load is predictable despite its complex behavior
• Simple linear models are sufficient
– Recommend AR(16) or better
• Predictions lead to useful estimates of task running time
39
Availability
• RPS Toolkit
– http://www.cs.cmu.edu/~pdinda/RPS.html
– Includes on-line and off-line prediction tools
• Load traces and tools
– http://www.cs.cmu.edu/~pdinda/LoadTraces/
• Prediction testcase database
– Available by request ([email protected])
• Remos
– http://www.cs.cmu.edu/~cmcl/remulac/remos.html
40
Linear Time Series Models
[Figure: an Unpredictable Random Sequence passes through a Fixed Linear Filter to give a Partially Predictable Load Sequence; both sequences are plotted against Time, 0 to 80000]
$a_t \sim \mathrm{WhiteNoise}(0, \sigma_a^2)$
$z_t = \sum_{j=1}^{\infty} \psi_j a_{t-j} + a_t$
$z_t \sim (\mu, \sigma_z^2)$, with $\sigma_a^2 < \sigma_z^2$
Choose weights $\psi_j$ to minimize $\sigma_a^2$
$\sigma_a$ is the confidence interval for t+1 predictions
41
Online Resource Prediction System
[Figure: block diagram. Components: Sensor, Buffer, Predictor, Evaluator, and three Applications. Labeled flows and interfaces: Measurement Stream, Prediction Stream, Refit Signal, User Control, Req/Resp, Stream.]
42
Execution Time Model
$t_{nominal} = \int_{t_{now}}^{t_{now}+t_{exec}} \frac{1}{1+\mathrm{load}(t)}\,dt$
[Figure: Execution Time (Seconds), 0 to 25, versus Measured Load, 1 to 7; 42,000 points, Coefficient of Correlation = 0.998]
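The execution time model on this slide relates a task's nominal time to its wall-clock execution time through the integral of 1/(1 + load(t)). A minimal sketch of solving that relation for the execution time follows. It assumes the load is piecewise constant over one-second steps (matching 1 Hz predictions) and that the last predicted value persists beyond the prediction horizon; both assumptions and the function name are illustrative.

```python
def predicted_exec_time(t_nominal, loads, step=1.0):
    """Solve for t_exec in: integral of dt / (1 + load(t)) = t_nominal.

    loads[k] is the (predicted) load during the k-th step from now.
    """
    remaining = t_nominal  # CPU seconds the task still needs
    elapsed = 0.0          # wall-clock seconds consumed so far
    for z in loads:
        rate = 1.0 / (1.0 + z)  # CPU seconds gained per wall-clock second
        if rate * step >= remaining:
            return elapsed + remaining / rate  # task finishes in this step
        remaining -= rate * step
        elapsed += step
    # Assumption: beyond the horizon, the last predicted load persists.
    return elapsed + remaining * (1.0 + loads[-1])

print(predicted_exec_time(0.5, [1.0]))            # load 1.0 doubles the time
print(predicted_exec_time(1.0, [1.0, 1.0, 1.0]))
print(predicted_exec_time(2.0, [0.0]))            # idle host: nominal time
```

With a constant load of 1.0 the task gets half the CPU, so a task with a 0.5 second nominal time needs a full second of wall-clock time.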
43
Prediction Errors
[Figure: the Load Predictor consumes the measurements $\ldots, z_{t+1}, z_t$ and, for each $z_{t+i}$, emits the 1, 2, ..., w step ahead predictions $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$]
• 1 step ahead prediction errors: $\langle z'_{t+i,t+i+1} - z_{t+i+1} \rangle$, i = 0, 1, ...
44
Prediction Errors
[Figure: the Load Predictor consumes the measurements $\ldots, z_{t+1}, z_t$ and, for each $z_{t+i}$, emits the 1, 2, ..., w step ahead predictions $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$]
• 1 step ahead prediction errors: $\langle z'_{t+i,t+i+1} - z_{t+i+1} \rangle$, i = 0, 1, ...
• 2 step ahead prediction errors: $\langle z'_{t+i,t+i+2} - z_{t+i+2} \rangle$
45
Prediction Errors
[Figure: the Load Predictor consumes the measurements $\ldots, z_{t+1}, z_t$ and, for each $z_{t+i}$, emits the 1, 2, ..., w step ahead predictions $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$]
• 1 step ahead prediction errors: $\langle z'_{t+i,t+i+1} - z_{t+i+1} \rangle$, i = 0, 1, ...
• 2 step ahead prediction errors: $\langle z'_{t+i,t+i+2} - z_{t+i+2} \rangle$
• ...
• w step ahead prediction errors: $\langle z'_{t+i,t+i+w} - z_{t+i+w} \rangle$
46
Mean Squared Error
[Figure: the Load Predictor consumes the measurements $\ldots, z_{t+1}, z_t$ and, for each $z_{t+i}$, emits the 1, 2, ..., w step ahead predictions $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$]
• 1 step ahead mean squared error: $\sigma_{a_1}^2 = \operatorname{mean}_i\,(z'_{t+i,t+i+1} - z_{t+i+1})^2$, i = 0, 1, ...
• 2 step ahead mean squared error: $\sigma_{a_2}^2 = \operatorname{mean}_i\,(z'_{t+i,t+i+2} - z_{t+i+2})^2$
• ...
• w step ahead mean squared error: $\sigma_{a_w}^2 = \operatorname{mean}_i\,(z'_{t+i,t+i+w} - z_{t+i+w})^2$
47
Load Predictor Operation
[Figure: the Load Predictor consumes the measurements $\ldots, z_{t+1}, z_t$ and, for each $z_{t+i}$, emits the 1, 2, ..., w step ahead predictions $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$]
48
CIs From Mean Squared Error
$z'_{t,t+1} = 1.0$: “load in next second is predicted to be 1.0”
$\sigma_{a_1}^2 = 0.1$: “one second ahead predictions are this bad”
Since $\sigma_{a_1} = \sqrt{0.1} \approx 0.316$, the half-width is $1.96\,\sigma_{a_1} \approx 0.62$:
$z_{t+1} \in [1.0 - 1.96\,\sigma_{a_1},\ 1.0 + 1.96\,\sigma_{a_1}] = [0.38, 1.62]$ with 95% confidence
$t_{exec} = 1/(1 + z'_{t,t+1})$: “your task will execute this long in the next second”
$t_{exec} = 1/(1 + 1.0) = 0.5$ seconds
$t_{exec} = 1/(1 + [0.38, 1.62]) = [0.38, 0.72]$ seconds with 95% confidence
With $\sigma_{a_1}^2 = 0.01$: $t_{exec} = 1/(1 + [0.8, 1.2]) = [0.45, 0.56]$ seconds with 95% confidence
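The arithmetic on this slide can be reproduced with a short function. The function name is hypothetical, and clamping the lower load bound at zero is an added assumption (load cannot be negative) that does not affect these two examples.

```python
import math

def exec_time_ci(pred_load, mse, z=1.96):
    """95% interval for CPU time available in the next second.

    pred_load is the one second ahead load prediction and mse is the
    one second ahead mean squared error of the predictor.
    """
    half = z * math.sqrt(mse)            # half-width of the load interval
    lo_load = max(pred_load - half, 0.0) # assumption: load is non-negative
    hi_load = pred_load + half
    # Higher load means less CPU time, so the bounds swap.
    return 1.0 / (1.0 + hi_load), 1.0 / (1.0 + lo_load)

print(exec_time_ci(1.0, 0.1))
print(exec_time_ci(1.0, 0.01))
```

The results agree with the slide's intervals to within rounding; the slide rounds the load interval to [0.8, 1.2] before inverting, so the second example can differ from it by about 0.01. Cutting the mean squared error by a factor of ten shrinks the execution-time interval from roughly 0.34 seconds wide to roughly 0.10.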
49
AR(1), LAST (big)
[Figure: AR(1) models and LAST models plotted against Lead Time (seconds), 0 to 30]
50
AR(2), LAST (big)
[Figure: AR(2) models and LAST models plotted against Lead Time (seconds), 0 to 30]
51
AR(4), LAST (big)
[Figure: AR(4) models and LAST models plotted against Lead Time (seconds), 0 to 30]
52
AR(8), LAST (big)
[Figure: AR(8) models and LAST models plotted against Lead Time (seconds), 0 to 30]
53
AR(16), LAST (big)
[Figure: AR(16) models and LAST models plotted against Lead Time (seconds), 0 to 30]
54
AR(32), LAST (big)
[Figure: AR(32) models and LAST models plotted against Lead Time (seconds), 0 to 30]
55
AR(1), BM(32)
[Figure: AR(1) models and BM(32) models plotted against Lead Time (seconds), 0 to 30]
56
AR(2), BM(32)
[Figure: AR(2) models and BM(32) models plotted against Lead Time (seconds), 0 to 30]
57
AR(4), BM(32)
[Figure: AR(4) models and BM(32) models plotted against Lead Time (seconds), 0 to 30]
58
AR(8), BM(32)
[Figure: AR(8) models and BM(32) models plotted against Lead Time (seconds), 0 to 30]
59
AR(16), BM(32)
[Figure: AR(16) models and BM(32) models plotted against Lead Time (seconds), 0 to 30]
60
AR(32), BM(32)
[Figure: AR(32) models and BM(32) models plotted against Lead Time (seconds), 0 to 30]
61
AR(p), LAST, +1 (big)
[Figure: AR models and LAST models plotted against AR(p) Model Order (p), 0 to 30; one second ahead predictions]
62
AR(p), LAST, +2 (big)
[Figure: AR models and LAST models plotted against AR(p) Model Order (p), 0 to 30; two second ahead predictions]
63
AR(p), LAST, +4 (big)
[Figure: AR models and LAST models plotted against AR(p) Model Order (p), 0 to 30; four second ahead predictions]
64
AR(p), LAST, +8 (big)
[Figure: AR models and LAST models plotted against AR(p) Model Order (p), 0 to 30; eight second ahead predictions]
65
AR(p), LAST, +16 (big)
[Figure: AR models and LAST models plotted against AR(p) Model Order (p), 0 to 30; 16 second ahead predictions]
66
AR(p), LAST, +30 (big)
[Figure: AR models and LAST models plotted against AR(p) Model Order (p), 0 to 30; 30 second ahead predictions]
67
AR(p), BM(32), +1
[Figure: AR models and BM(32) models plotted against AR(p) Model Order (p), 0 to 30; one second ahead predictions]
68
AR(p), BM(32), +2
[Figure: AR models and BM(32) models plotted against AR(p) Model Order (p), 0 to 30; two second ahead predictions]
69
AR(p), BM(32), +4
[Figure: AR models and BM(32) models plotted against AR(p) Model Order (p), 0 to 30; four second ahead predictions]
70
AR(p), BM(32), +8
[Figure: AR models and BM(32) models plotted against AR(p) Model Order (p), 0 to 30; 8 second ahead predictions]
71
AR(p), BM(32), +16
[Figure: AR models and BM(32) models plotted against AR(p) Model Order (p), 0 to 30; 16 second ahead predictions]
72
AR(p), BM(32), +30
[Figure: AR models and BM(32) models plotted against AR(p) Model Order (p), 0 to 30; 30 second ahead predictions]