
An Evaluation of Linear Models for Host Load Prediction

Peter A. Dinda

David R. O’Hallaron

Carnegie Mellon University

2

Motivating Questions

• What are the properties of host load?

• Is host load predictable?

• What predictive models are appropriate?

• Are host load predictions useful?

3

Overview of Answers

• Host load exhibits complex behavior
  – Self-similarity, epochal behavior

• Host load is predictable
  – 1 to 30 second timeframe

• Simple linear models are sufficient
  – Recommend AR(16) or better

• Predictions lead to useful estimates of task execution times

Statistically rigorous approach

4

Outline

• Context: predicting task execution times
  – Mean squared load prediction error

• Offline trace-based evaluation
  – Host load traces
  – Linear models
  – Randomized methodology
  – Results of data-mining

• Online prediction of task execution times
• Related work
• Conclusion

5

Prediction-based Best-effort Distributed Real-time Scheduling

[Figure: a Task labeled with its nominal time and deadline; a chart with axis "Predicted Exec Time" and a "deadline" marker]

Task notifies scheduler of its CPU requirements (nominal time) and its deadline

Scheduler acquires predicted task execution times for all hosts

Scheduler assigns task to a host where its deadline can be met

6

Predicting Task Execution Times

[Diagram: Task, Load Sensor, Load Predictor, Exec Time Model; chart with axis "Predicted Exec Time" and a "deadline" marker]

DEC Unix 5 second load average, sampled at 1 Hz

1 to 30 second predictions

7

Confidence Intervals

[Figure: two charts of Predicted Exec Time against the deadline. Bad Predictor: no obvious choice. Good Predictor: two good choices.]

Good predictors provide smaller confidence intervals

Smaller confidence intervals simplify scheduling decisions

8

Load Prediction Focus

[Diagram: Task, Load Sensor, Load Predictor, Exec Time Model; chart with axis "Predicted Exec Time" and a "deadline" marker]

CI length determined by mean squared error of predictor

9

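Load Predictor Operation

[Diagram: data flow among Modeler, Model, Load Predictor, and Evaluator]

One-time use:
  Measurements in Fit Interval $\langle z_{t-m}, \ldots, z_{t-2}, z_{t-1} \rangle$ and a Model Type go to the Modeler, which produces the Model used by the Load Predictor.

Production stream:
  Measurements in Test Interval $z_t, z_{t+1}, \ldots, z_{t+n-1}$ go to the Load Predictor.
  Prediction Stream: $z'_{t,t+1}, z'_{t,t+2}, \ldots, z'_{t,t+w}$; then $z'_{t+1,t+2}, z'_{t+1,t+3}, \ldots, z'_{t+1,t+1+w}$; then $z'_{t+2,t+3}, z'_{t+2,t+4}, \ldots, z'_{t+2,t+2+w}$; and so on.
  Evaluator: takes Error Metrics and produces Error Estimates.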

10

Mean Squared Error

[Diagram: measurements $\ldots, z_{t+1}, z_t$ enter the Load Predictor, which emits at each step $i$ the predictions $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$]

Variance of z:
$\sigma_z^2 = \operatorname{mean}_i \, (\mu - z_{t+i})^2$

1 step ahead mean squared error:
$\sigma_{a_1}^2 = \operatorname{mean}_i \, (z'_{t+i,t+i+1} - z_{t+i+1})^2$

2 step ahead mean squared error:
$\sigma_{a_2}^2 = \operatorname{mean}_i \, (z'_{t+i,t+i+2} - z_{t+i+2})^2$

...

w step ahead mean squared error:
$\sigma_{a_w}^2 = \operatorname{mean}_i \, (z'_{t+i,t+i+w} - z_{t+i+w})^2$

Good Load Predictor: $\sigma_{a_1}^2, \sigma_{a_2}^2, \ldots, \sigma_{a_w}^2 < \sigma_z^2$
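As a rough illustration of the comparison on this slide, the sketch below computes the k step ahead mean squared error of a prediction stream and the variance of the signal. The function names, the list-of-lists layout for the prediction stream, and the use of a last-value predictor as the example are assumptions made for this sketch, not part of the original study.

```python
from statistics import fmean, pvariance


def last_value_predictions(actual, w):
    """Hypothetical stand-in predictor: at each time i, predict the
    current value for all of the next w steps."""
    return [[z] * w for z in actual]


def k_step_mse(actual, predictions, k):
    """Mean squared error of the k step ahead predictions.

    predictions[i][k - 1] is assumed to be the prediction made at time i
    for time i + k (a layout chosen for this sketch).
    """
    errors = [
        (predictions[i][k - 1] - actual[i + k]) ** 2
        for i in range(len(actual) - k)
    ]
    return fmean(errors)


def signal_variance(actual):
    """Variance of the signal: the error of always predicting the mean."""
    return pvariance(actual)
```

On a strictly alternating signal such as `[1.0, 2.0, 1.0, 2.0, 1.0, 2.0]`, the last-value predictor has a 1 step ahead error larger than the signal variance and a 2 step ahead error of zero, which shows why each lead time is evaluated separately.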

11

CIs From Mean Squared Error

[Plot: 95% CI for exec time available in next second (y axis, 0 to 1) against one second ahead mean squared error (x axis, 0 to 0.25), for Predicted Load = 1.0]

12

Example of Improving the Confidence Interval

[Plot: x axis Lead (seconds), 1 to 25; y axis 0 to 2.5; two curves, "By signal variance" and "By AR(18)"; testcase -1619784968 from axpfea.psc trace]

Massive reduction in confidence interval length using prediction

Do such benefits consistently occur?

13





14

Host Load Traces

• DEC Unix 5 second exponential average
• Full bandwidth captured (1 Hz sample rate)
• Long durations
• Also looked at “deconvolved” traces

Period        Machines                 Duration
August 1997   13 production cluster    ~ one week
              8 research cluster       (over one million samples)
              2 compute servers
              15 desktops
March 1998    13 production cluster    ~ one week
              8 research cluster       (over one million samples)
              2 compute servers
              11 desktops

15

Salient Properties of Load Traces

+/- Extreme variation

+ Significant autocorrelation
    Suggests appropriateness of linear models

+ Significant average mutual information

- Self-similarity / long range dependence

+/- Epochal behavior
    + Stable spectrum during an epoch
    - Abrupt transitions between epochs

(Detailed study in LCR98, SciProg99)

Legend: + encouraging for prediction, - discouraging for prediction

16

Linear Models

(2000 sample fits, largest models in study, 30 steps ahead)

Model Class     Fit time (ms)   Step time (ms)   Notes
MEAN                     0.03            0.003   Error is signal variance
LAST                     0.75            0.001   Last value is prediction
BM(p)                   46.26            0.001   Average over best window
AR(p)                    4.20            0.149   Deterministic algorithm
MA(q)                 6501.72            0.015   Function optimization
ARMA(p,q)            77046.22            0.034   Function optimization
ARIMA(p,d,q)         53016.77            0.045   Non-stationarity, FO
ARFIMA(p,d,q)         3692.63            9.485   Long range dependence, MLE
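The three simplest model classes in the table can be sketched in a few lines. This is a minimal illustration, not the RPS implementation: the function names are hypothetical, and the reading of BM(p) as "pick the window size between 1 and p whose moving average has the lowest one step ahead squared error on the fit interval" is an assumption based on the note "Average over best window".

```python
from statistics import fmean


def mean_predict(fit):
    """MEAN: predict the long-term mean of the fit interval."""
    return fmean(fit)


def last_predict(history):
    """LAST: predict that the next value equals the last observed value."""
    return history[-1]


def bm_fit(fit, p):
    """BM(p), as assumed here: choose the window size k in 1..p whose
    moving average gives the lowest 1 step ahead MSE on the fit data."""
    best_k, best_mse = 1, float("inf")
    for k in range(1, min(p, len(fit) - 1) + 1):
        errors = [(fmean(fit[i - k:i]) - fit[i]) ** 2 for i in range(k, len(fit))]
        mse = fmean(errors)
        if mse < best_mse:  # ties keep the smaller window
            best_k, best_mse = k, mse
    return best_k


def bm_predict(history, k):
    """Predict the average of the last k observed values."""
    return fmean(history[-k:])
```

For an alternating signal such as `[1.0, 3.0, 1.0, 3.0, ...]`, LAST is always wrong by the full swing, while a window of two averages the swing out.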

17

AR(p) Models

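– Fast to fit (4.2 ms, AR(32), 2000 points)
– Fast to use (<0.15 ms, AR(32), 30 steps ahead)
– Potentially less parsimonious than other models

$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t$

– $z_t$: next value
– $z_{t-1}, \ldots, z_{t-p}$: p previous values
– $\phi_1, \ldots, \phi_p$: weights chosen to minimize mean square error for fit interval
– $a_t$: error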
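One standard way to fit an AR(p) model is to solve the Yule-Walker equations with the Levinson-Durbin recursion. The sketch below does that in plain Python and then iterates the model to predict several steps ahead. Whether the original toolkit uses this exact algorithm is not stated on the slide, so treat the method choice, the function names, and the mean subtraction as assumptions of this sketch.

```python
def autocovariance(z, max_lag):
    """Biased sample autocovariances r[0..max_lag] of the series z."""
    n = len(z)
    mu = sum(z) / n
    d = [x - mu for x in z]
    return mu, [sum(d[i] * d[i + k] for i in range(n - k)) / n
                for k in range(max_lag + 1)]


def fit_ar(z, p):
    """Return (mu, phi) where phi[j] weights the value j + 1 steps back.

    Solves the Yule-Walker equations by Levinson-Durbin recursion.
    Assumes z is not constant (r[0] > 0) and len(z) > p.
    """
    mu, r = autocovariance(z, p)
    phi, err = [], r[0]
    for k in range(1, p + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        refl = acc / err  # reflection coefficient for order k
        phi = [phi[j] - refl * phi[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl
    return mu, phi


def predict(mu, phi, history, steps):
    """Iterate the fitted model to predict 1..steps values ahead.

    Requires len(history) >= len(phi).
    """
    buf = [x - mu for x in history[-len(phi):]]
    out = []
    for _ in range(steps):
        nxt = sum(c * buf[-1 - j] for j, c in enumerate(phi))
        out.append(nxt + mu)
        buf.append(nxt)  # later predictions build on earlier ones
    return out
```

A call such as `predict(*fit_ar(trace, 16), trace, 30)` would give 1 to 30 step ahead predictions from an AR(16) fit, where `trace` is any list of load samples.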

18

Evaluation Methodology

• Ran ~152,000 randomly chosen testcases on the traces
  – Evaluate models independently of prediction/evaluation framework
  – ~30 testcases per trace, model class, parameter set

• Data-mine results

Offline and online systems implemented using RPS Toolkit

19

Testcases

• Models
  – MEAN, LAST/BM(32)
  – Randomly chosen model from: AR(1..32), MA(1..8), ARMA(1..8,1..8), ARIMA(1..8,1..2,1..8), ARFIMA(1..8,d,1..8)

• Load Trace: ~ one week, 345,000 to >1M samples at 1 Hz

• Crossover Point $z_t$: between 3 hours and trace length - 3 hours

• Fit Interval: 5 min ... 3 hours (m = 600 to 10800 samples), from $z_{t-m}$ to $z_{t-1}$

• Test Interval: 5 min ... 3 hours (n = 600 to 10800 samples), from $z_t$ to $z_{t+n-1}$
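The randomized choice of a testcase can be sketched as follows. The slide gives only the ranges, so the uniform distributions, the function name, and the returned layout are assumptions of this sketch. The sample bounds are taken from the slide as written.

```python
import random


def pick_testcase(trace_len, rng, min_len=600, max_len=10800):
    """Pick a crossover point and fit/test interval lengths at random.

    The crossover t lies between max_len and trace_len - max_len, so both
    intervals always fit inside the trace. Uniform sampling is an
    assumption; the slide only states the ranges.
    """
    t = rng.randint(max_len, trace_len - max_len)
    m = rng.randint(min_len, max_len)  # fit interval length in samples
    n = rng.randint(min_len, max_len)  # test interval length in samples
    # Half-open index ranges: fit is [t - m, t), test is [t, t + n).
    return {"fit": (t - m, t), "test": (t, t + n)}
```

For example, `pick_testcase(345000, random.Random(0))` returns two adjacent index ranges into a trace of 345,000 samples.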

20

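Evaluating a Testcase

[Diagram: data flow among Modeler, Model, Load Predictor, and Evaluator]

One-time use:
  Measurements in Fit Interval $\langle z_{t-m}, \ldots, z_{t-2}, z_{t-1} \rangle$ and a Model Type go to the Modeler, which produces the Model used by the Load Predictor.

Production stream:
  Measurements in Test Interval $z_t, z_{t+1}, \ldots, z_{t+n-1}$ go to the Load Predictor.
  Prediction Stream: $z'_{t,t+1}, z'_{t,t+2}, \ldots, z'_{t,t+w}$; then $z'_{t+1,t+2}, z'_{t+1,t+3}, \ldots, z'_{t+1,t+1+w}$; then $z'_{t+2,t+3}, z'_{t+2,t+4}, \ldots, z'_{t+2,t+2+w}$; and so on.
  Evaluator: takes Error Metrics and produces Error Estimates.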

21

Error Metrics

• Summary statistics for the 1,2,…,30 step ahead prediction errors of all three models
  – Mean squared error
  – Min, median, max, mean, mean absolute errors

• IID tests for 1 step ahead errors
  – Significant residual autocorrelations, Portmanteau Q (power of residuals), turning point test, sign test

• Normality test ($R^2$ of QQ plot) for 1 step ahead errors
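Of the IID tests listed, the turning point test is the simplest to show. For an IID sequence of length n, the number of turning points is approximately normal with mean 2(n-2)/3 and variance (16n-29)/90. The sketch below is a textbook version of the test, not code from the study, and the function name is hypothetical.

```python
import math


def turning_point_test(errors):
    """Return (turning_points, z_score) for a sequence of errors.

    A turning point is an interior value that is strictly greater than
    both neighbours or strictly less than both. Under the IID hypothesis
    |z_score| > 1.96 is unlikely at the 5% level.
    """
    n = len(errors)
    count = sum(
        1
        for i in range(1, n - 1)
        if errors[i - 1] < errors[i] > errors[i + 1]
        or errors[i - 1] > errors[i] < errors[i + 1]
    )
    expected = 2.0 * (n - 2) / 3.0
    variance = (16.0 * n - 29.0) / 90.0
    return count, (count - expected) / math.sqrt(variance)
```

A strictly alternating error sequence has too many turning points and a monotone one has none, so both would be flagged as not IID.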

22

Database

• 54 values characterize testcase, lead time

• SQL queries to answer questions

“How much do AR(16) models reduce the variability of 1 second ahead predictions?”

    select count(*), 100*avg((testvar-msqerr)/testvar) as avgpercentimprove
    from big
    where p=16 and q=0 and d=0 and lead=1

    +----------+-------------------+
    | count(*) | avgpercentimprove |
    +----------+-------------------+
    |     1164 |     66.7681346166 |
    +----------+-------------------+
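The quantity the query computes is the average percentage by which the model's mean squared error falls below the variance of the test interval. A minimal Python equivalent is sketched below. The column names follow the query, but the dictionary row layout and the function name are assumptions, and any rows passed in are illustrative rather than data from the study.

```python
from statistics import fmean


def avg_percent_improve(rows, p, d, q, lead):
    """Mirror of the SQL query: count matching testcases and average
    100 * (testvar - msqerr) / testvar over them."""
    selected = [
        r for r in rows
        if (r["p"], r["d"], r["q"], r["lead"]) == (p, d, q, lead)
    ]
    improvements = [
        100.0 * (r["testvar"] - r["msqerr"]) / r["testvar"] for r in selected
    ]
    return len(selected), fmean(improvements)
```

A value of 100 would mean perfect prediction, 0 would mean no better than predicting the mean, and negative values would mean the model did worse than the mean.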

23

Comparisons

• Paired
  – MEAN vs BM/LAST vs another model

• Unpaired
  – All models
  – Unpaired t-test to compare expected mean square errors
  – Box plots to determine consistency

24

AR(16) vs. LAST

[Plot: x axis Lead Time (seconds), 0 to 30; y axis -20 to 100; series: AR(16) models, LAST models]

25

AR(16), BM(32)

[Plot: y axis -40 to 100; series: AR(16) models, BM(32) models]

26

Unpaired Box Plot Comparisons

Good models achieve consistently low error

[Figure: box plots of Mean Squared Error for Model A (inconsistent low error), Model B (consistent low error), and Model C (consistent high error). Each box marks the 2.5%, 25%, 50%, 75%, and 97.5% points and the mean.]
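The six values drawn for each box can be computed from a model's mean squared errors as sketched below. The function name is hypothetical and the choice of the "inclusive" interpolation method is an assumption; the slides do not say how the percentiles were computed.

```python
from statistics import fmean, quantiles


def box_summary(mse_values):
    """Return the box plot values used on the slides for one model."""
    # With n=40 the cut points fall every 2.5%, so index 0 is the 2.5%
    # point, index 9 is 25%, index 19 is 50%, index 29 is 75%, and
    # index 38 is 97.5%.
    cuts = quantiles(mse_values, n=40, method="inclusive")
    return {
        "2.5%": cuts[0],
        "25%": cuts[9],
        "50%": cuts[19],
        "mean": fmean(mse_values),
        "75%": cuts[29],
        "97.5%": cuts[38],
    }
```

A model with consistently low error has both a low median and a short distance between its 2.5% and 97.5% points.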

27

1 second Predictions, All Hosts

[Box plot not recovered; each box marks the 2.5%, 25%, 50%, 75%, and 97.5% points and the mean]

Predictive models clearly worthwhile

28

15 second Predictions, All Hosts

[Box plot not recovered; each box marks the 2.5%, 25%, 50%, 75%, and 97.5% points and the mean]

Predictive models clearly worthwhile

Begin to see differentiation between models

29

30 second Predictions, All Hosts

[Box plot not recovered; each box marks the 2.5%, 25%, 50%, 75%, and 97.5% points and the mean]

Predictive models clearly beneficial even at long prediction horizons

30

1 Second Predictions, Dynamic Host

[Box plot not recovered; each box marks the 2.5%, 25%, 50%, 75%, and 97.5% points and the mean]

Predictive models clearly worthwhile

31

15 Second Predictions, Dynamic Host

[Box plot not recovered; each box marks the 2.5%, 25%, 50%, 75%, and 97.5% points and the mean]


32

30 Second Predictions, Dynamic Host

[Box plot not recovered; each box marks the 2.5%, 25%, 50%, 75%, and 97.5% points and the mean]


33





34

Online Prediction of Task Execution Times

• Replay selected load trace on host
• Continuously run 1 Hz AR(16)-based host load predictor
• Select random tasks
  – 5 to 15 second intervals
  – 0.1 to 10 second nominal times
• Estimate exec time using predictions
  – Assume priority-less round-robin scheduler
• Execute task
  – Record nominal, predicted, and actual exec times

35

On-line Prediction Results

Measurement of 1000 0.1-30 second tasks on lightly loaded host

[Two scatter plots, panels "Nominal time as prediction" and "Load prediction based"; axes 0 to 60, x axis Predicted time (seconds); each panel shows a 50% error band]

Panel annotations:
– f(x) = x - 0.03, R^2 = 0.99; all tasks usefully predicted
– f(x) = 1.1*x + 0.02, R^2 = 0.79; 10% of tasks drastically mispredicted

Prediction is beneficial even on lightly loaded hosts

36

On-line Prediction Results

Measurement of 3000 0.1-30 second tasks on heavily loaded, dynamic host

[Two scatter plots, panels "Nominal time as prediction" and "Load prediction based"; axes 0 to 45; each panel shows a 50% error band]

Panel annotations:
– f(x) = 2.15x - 0.88, R^2 = 0.82
– f(x) = 0.95*x + 0.06, R^2 = 0.89
– 74% of tasks mispredicted
– 3% of tasks mispredicted

Prediction is beneficial on heavily loaded, dynamic hosts

37

Related Work

• Workload studies for load balancing
  – Mutka, et al [PerfEval ’91]
  – Harchol-Balter, et al [SIGMETRICS ’96]

• Host load measurement and studies
  – Network Weather Service [HPDC’97, HPDC’99]
  – Remos [HPDC’98]
  – Dinda [LCR98, SciProg99]

• Host load prediction
  – Wolski, et al [HPDC’99] (NWS)
  – Samadani, et al [PODC’95]

38

Conclusions

• Rigorous study of host load prediction

• Host load is predictable despite its complex behavior

• Simple linear models are sufficient
  – Recommend AR(16) or better

• Predictions lead to useful estimates of task running time

39

Availability

• RPS Toolkit
  – http://www.cs.cmu.edu/~pdinda/RPS.html
  – Includes on-line and off-line prediction tools

• Load traces and tools
  – http://www.cs.cmu.edu/~pdinda/LoadTraces/

• Prediction testcase database
  – Available by request ([email protected])

• Remos
  – http://www.cs.cmu.edu/~cmcl/remulac/remos.html

40

Linear Time Series Models

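[Diagram: an Unpredictable Random Sequence $a_t$ passes through a Fixed Linear Filter to give the Partially Predictable Load Sequence $z_t$. Two time series plots over Time 0 to 80000: one ranging over about -0.04 to 0.04, the other over 0.0 to 1.4.]

$z_t = \sum_{j=1}^{\infty} \psi_j a_{t-j} + a_t$

$a_t \sim \mathrm{WhiteNoise}(0, \sigma_a^2)$

$z_t \sim (\mu, \sigma_z^2)$

$\sigma_a^2 < \sigma_z^2$

Choose weights $\psi_j$ to minimize $\sigma_a^2$

$\sigma_a$ is the confidence interval for t+1 predictions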

41

Online Resource Prediction System

[Diagram components: Sensor, Buffer, Predictor, Evaluator, Applications, User Control. Labeled connections: Measurement Stream, Prediction Stream, Refit Signal, Req/Resp, Stream.]

42

Execution Time Model

[Scatter plot: Execution Time (Seconds), 0 to 25, against Measured Load, 1 to 7; 42,000 points; Coefficient of Correlation = 0.998]

$\int_{t_{now}}^{t_{now}+t_{exec}} \frac{1}{1+\mathrm{load}(t)}\,dt = t_{nominal}$
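The execution time model says that a task receives a share 1/(1+load) of the CPU at each instant and finishes once those shares add up to its nominal time. A discrete version for load values given once per second might look like the sketch below. The function name, the piecewise-constant treatment of load within each interval, and the error raised for short inputs are assumptions of this sketch.

```python
def exec_time(nominal, loads, dt=1.0):
    """Wall-clock time needed to accumulate `nominal` CPU seconds.

    loads[i] is the (measured or predicted) load during the i-th interval
    of length dt, assumed constant within the interval.
    """
    done = 0.0     # CPU seconds received so far
    elapsed = 0.0  # wall-clock seconds used so far
    for z in loads:
        rate = 1.0 / (1.0 + z)  # CPU share available to the task
        if done + rate * dt >= nominal:
            # The task finishes part-way through this interval.
            return elapsed + (nominal - done) / rate
        done += rate * dt
        elapsed += dt
    raise ValueError("load sequence too short for this nominal time")
```

With a constant load of 1.0 the task gets half the CPU, so a task with a nominal time of 1 second takes 2 seconds.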

43

Prediction Errors

[Diagram: measurements $\ldots, z_{t+1}, z_t$ enter the Load Predictor, which emits the prediction stream $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$ for $i = 0, 1, \ldots$]

1 step ahead prediction errors: $\langle z'_{t+i,t+i+1} - z_{t+i+1} \rangle$, $i = 0, 1, \ldots$

44

Prediction Errors

[Diagram: measurements $\ldots, z_{t+1}, z_t$ enter the Load Predictor, which emits the prediction stream $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$ for $i = 0, 1, \ldots$]


45

Prediction Errors

[Diagram: measurements $\ldots, z_{t+1}, z_t$ enter the Load Predictor, which emits the prediction stream $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$ for $i = 0, 1, \ldots$]

2 step ahead prediction errors: $\langle z'_{t+i,t+i+2} - z_{t+i+2} \rangle$

...

w step ahead prediction errors: $\langle z'_{t+i,t+i+w} - z_{t+i+w} \rangle$

46

Mean Squared Error

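[Diagram: measurements $\ldots, z_{t+1}, z_t$ enter the Load Predictor, which emits the prediction stream $z'_{t+i,t+i+1}, z'_{t+i,t+i+2}, \ldots, z'_{t+i,t+i+w}$ for $i = 0, 1, \ldots$]

1 step ahead mean squared error:
$\sigma_{a_1}^2 = \operatorname{mean}_i \, (z'_{t+i,t+i+1} - z_{t+i+1})^2$

2 step ahead mean squared error:
$\sigma_{a_2}^2 = \operatorname{mean}_i \, (z'_{t+i,t+i+2} - z_{t+i+2})^2$

...

w step ahead mean squared error:
$\sigma_{a_w}^2 = \operatorname{mean}_i \, (z'_{t+i,t+i+w} - z_{t+i+w})^2$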

47

Load Predictor Operation

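[Diagram: measurements $\ldots, z_{t+1}, z_t$ enter the Load Predictor, which emits the prediction stream $z'_{t,t+1}, z'_{t,t+2}, \ldots, z'_{t,t+w}$; then $z'_{t+1,t+2}, z'_{t+1,t+3}, \ldots, z'_{t+1,t+1+w}$; then $z'_{t+2,t+3}, z'_{t+2,t+4}, \ldots, z'_{t+2,t+2+w}$; and so on]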

48

CIs From Mean Squared Error

$z'_{t,t+1} = 1.0$: “load in next second is predicted to be 1.0”

$t_{exec} = 1/(1+z'_{t,t+1})$: “your task will execute this long in the next second”

$t_{exec} = 1/(1+1.0) = 0.5$ seconds

$\sigma_{a_1}^2 = 0.1$: “one second ahead predictions are this bad”

$z'_{t,t+1} = [1.0 - 1.96\sigma_{a_1}, 1.0 + 1.96\sigma_{a_1}]$ with 95% confidence

$z'_{t,t+1} = [0.38, 1.62]$ with 95% confidence

$t_{exec} = 1/(1+[0.38, 1.62]) = [0.38, 0.72]$ seconds with 95% confidence

$\sigma_{a_1}^2 = 0.01$:

$t_{exec} = 1/(1+[0.8, 1.2]) = [0.45, 0.56]$ seconds with 95% confidence
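The arithmetic on this slide can be checked with a short sketch. The interval half-width is 1.96 times the square root of the one step ahead mean squared error, and because higher load means less CPU for the task, the load bounds swap when converted to execution time. The function name and the clamp of the lower load bound at zero are additions made for this sketch.

```python
import math


def exec_time_ci(predicted_load, mse, z_crit=1.96):
    """95% CI (by default) for the execution time available in the next
    second, given a predicted load and the predictor's 1 step ahead MSE."""
    half_width = z_crit * math.sqrt(mse)
    # Clamp at zero because load cannot be negative (an assumption added
    # here; the slide's examples never reach the clamp).
    load_lo = max(0.0, predicted_load - half_width)
    load_hi = predicted_load + half_width
    # High load gives the short end of the execution time interval.
    return 1.0 / (1.0 + load_hi), 1.0 / (1.0 + load_lo)
```

For a mean squared error of 0.1 this gives roughly 0.38 to 0.72 seconds, matching the slide. For 0.01 it gives roughly 0.455 to 0.554 seconds; the slide's [0.45, 0.56] comes from first rounding the load interval to [0.8, 1.2].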

49

AR(1), LAST (big)

[Plot: y axis -80 to 100; series: AR(1) models, LAST models]

50

AR(2), LAST (big)

[Plot: y axis -150 to 100; series: AR(2) models, LAST models]

51

AR(4), LAST (big)

[Plot: y axis -80 to 100; series: AR(4) models, LAST models]

52

AR(8), LAST (big)

[Plot: y axis -80 to 100; series: AR(8) models, LAST models]

53

AR(16), LAST (big)

[Plot: y axis -80 to 100; series: AR(16) models, LAST models]

54

AR(32), LAST (big)

[Plot: y axis -80 to 100; series: AR(32) models, LAST models]

55

AR(1), BM(32)

[Plot: y axis -40 to 100; series: AR(1) models, BM(32) models]

56

AR(2), BM(32)

[Plot: y axis -100 to 100; series: AR(2) models, BM(32) models]

57

AR(4), BM(32)

[Plot: y axis -40 to 100; series: AR(4) models, BM(32) models]

58

AR(8), BM(32)

[Plot: y axis -40 to 100; series: AR(8) models, BM(32) models]

59

AR(16), BM(32)

[Plot: y axis -40 to 100; series: AR(16) models, BM(32) models]

60

AR(32), BM(32)

[Plot: y axis -40 to 100; series: AR(32) models, BM(32) models]

61

AR(p), LAST, +1 (big)

[Plot: One second ahead predictions; x axis AR(p) Model Order (p), 0 to 30; y axis 0 to 100; series: AR models, LAST models]

62

[Plot: Two second ahead predictions; x axis AR(p) Model Order (p), 0 to 30; y axis -20 to 100; series: AR models, LAST models]

63

[Plot: Four second ahead predictions; x axis AR(p) Model Order (p), 0 to 30; y axis -60 to 100; series: AR models, LAST models]

64

[Plot: Eight second ahead predictions; x axis AR(p) Model Order (p), 0 to 30; y axis -80 to 100; series: AR models, LAST models]

65

[Plot: 16 second ahead predictions; x axis AR(p) Model Order (p), 0 to 30; y axis -150 to 100; series: AR models, LAST models]

66

[Plot: x axis AR(p) Model Order (p), 0 to 30; y axis -150 to 100; series: AR models, LAST models]

67

AR(p), BM(32), +1

[Plot: One second ahead predictions; x axis AR(p) Model Order (p), 0 to 30; y axis 0 to 100; series: AR models, BM(32) models]

68

AR(p), BM(32), +2

[Plot: Two second ahead predictions; x axis AR(p) Model Order (p), 0 to 30; y axis 0 to 100; series: AR models, BM(32) models]

69

AR(p), BM(32), +4

[Plot: Four second ahead predictions; x axis AR(p) Model Order (p), 0 to 30; y axis 0 to 100; series: AR models, BM(32) models]

70

AR(p), BM(32), +8

[Plot: x axis AR(p) Model Order (p), 0 to 30; y axis -40 to 100; series: AR models, BM(32) models]

71

AR(p), BM(32), +16

[Plot: x axis AR(p) Model Order (p), 0 to 30; y axis -80 to 100; series: AR models, BM(32) models]

72

AR(p), BM(32), +30

[Plot: x axis AR(p) Model Order (p), 0 to 30; y axis -100 to 100; series: AR models, BM(32) models]

