Understanding and Predicting Host Load

DESCRIPTION

Understanding and Predicting Host Load. Peter A. Dinda, Carnegie Mellon University. http://www.cs.cmu.edu/~pdinda. Talk in a nutshell: statistical analysis of two sets of week-long, 1 Hz resolution traces of load on ~40 machines, and evaluation of linear time series models for load prediction.

TRANSCRIPT
Understanding and Predicting Host Load
Peter A. Dinda, Carnegie Mellon University
http://www.cs.cmu.edu/~pdinda
Talk in a Nutshell

• Load is self-similar
• Load exhibits epochal behavior
• Load prediction benefits from capturing self-similarity

Statistical analysis of two sets of week-long, 1 Hz resolution traces of load on ~40 machines, and evaluation of linear time series models for load prediction.
Why Study Load?

Load partially determines execution time, so we want to model and predict load.

[Diagram: an interactive application submits short tasks with deadlines to an unmodified distributed system and asks for a predicted execution-time range [tmin, tmax]]
Load and Execution Time

[Scatter plot: measured load (1 to 7) vs. execution time (0 to 25 seconds); 42,000 points; coefficient of correlation = 0.998]

$t_{\text{nominal}} = \int_{t_{\text{now}}}^{t_{\text{now}} + t_{\text{exec}}} \frac{1}{1 + \text{load}(t)}\, dt$
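The relation on this slide says that nominal work accumulates at rate 1/(1 + load): a task needing t_nominal seconds at zero load finishes once that integral reaches t_nominal. Given a trace of 1 Hz load samples, this can be applied numerically. A minimal sketch (the function name and `dt` parameter are illustrative, not from the talk):

```python
def predicted_exec_time(nominal, load_samples, dt=1.0):
    """Integrate 1/(1 + load) over load samples spaced dt seconds
    apart until the accumulated nominal work reaches `nominal`."""
    done = 0.0      # nominal seconds of work completed so far
    elapsed = 0.0   # wall-clock seconds elapsed so far
    for load in load_samples:
        done += dt / (1.0 + load)
        elapsed += dt
        if done >= nominal:
            return elapsed
    return None  # trace too short to finish the task

# A constant load of 1.0 doubles the execution time:
print(predicted_exec_time(5.0, [1.0] * 20))  # -> 10.0
```

With constant load L this reduces to exec = nominal × (1 + L), which matches the 0.998 correlation between load and execution time on this slide.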
Outline

• Measurement methodology
• Load traces
• Load variance
• New results
  – Self-similarity
  – Epochal behavior
• Benefits of capturing self-similarity in linear models
• Conclusions
Measurement Methodology

[Diagram: the Digital Unix kernel samples the ready queue every T = 2 seconds and folds the last ~30 samples into an exponential average (the 1-minute load "average"); a user-level measurement tool records these averages at a 1 Hz sample rate]
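The exponential average in the diagram can be mimicked in a few lines. This is a sketch of the general technique, not the Digital Unix kernel's exact code; the decay constant below is an assumption chosen to give roughly a one-minute window from T = 2 second samples:

```python
import math

def exponential_average(samples, interval=2.0, window=60.0):
    """Exponentially weighted moving average of run-queue samples
    taken every `interval` seconds, with a `window`-second time
    constant (a Unix-style one-minute load 'average')."""
    decay = math.exp(-interval / window)
    avg = 0.0
    for s in samples:
        avg = decay * avg + (1.0 - decay) * s
    return avg
```

The smoothing is exactly why the talk measures the raw load at 1 Hz instead: the exponential average lags and damps the fluctuations the prediction models need to see.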
Load Traces

Trace           Machines                                      Duration
August 1997     13 production cluster, 8 research cluster,    ~ one week (over one
                2 compute servers, 15 desktops                million samples)
February 1998   13 production cluster, 8 research cluster,    ~ one week (over one
                2 compute servers, 11 desktops                million samples)
Absolute Variation

[Chart: per-host mean load with +/- one standard deviation (roughly -1 to 2); hosts grouped into production cluster, research cluster, and desktops]
Relative Variation

[Chart: per-host relative load variation (0 to 12); hosts grouped into production cluster, research cluster, and desktops]
[Figures for trace axp7.19.day$NormLoad: the load time series over time; its autocorrelation function (ACF) out to lag 600; and its raw periodogram in dB vs. frequency, bandwidth 3.34114e-006, 95% C.I. (-5.87588, 17.5667) dB]

Visual Self-Similarity
The Hurst Parameter

[Log-log periodogram plots for H = 0.375, H = 0.5, H = 0.625, and H = 0.875; log frequency from 0.0001 to 1]

H = (1 - slope) / 2, where slope is the slope of the log-log periodogram.
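The H = (1 - slope)/2 relation suggests a direct estimator: compute the periodogram, fit a least-squares line in log-log space, and convert the slope. A self-contained sketch of that idea (a serious estimator would restrict the fit to low frequencies, where the power law holds):

```python
import math

def hurst_from_periodogram(series):
    """Estimate the Hurst parameter as H = (1 - slope) / 2, where
    slope is a least-squares fit of log power vs. log frequency."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    logf, logp = [], []
    for k in range(1, n // 2):  # periodogram ordinates via a direct DFT
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = (re * re + im * im) / n
        if power > 0:
            logf.append(math.log(k / n))
            logp.append(math.log(power))
    # least-squares slope of log power against log frequency
    m = len(logf)
    mx, my = sum(logf) / m, sum(logp) / m
    slope = sum((a - mx) * (b - my) for a, b in zip(logf, logp)) / \
            sum((a - mx) ** 2 for a in logf)
    return (1.0 - slope) / 2.0
```

White noise has a flat spectrum (slope near 0), so the estimate comes out near H = 0.5; the load traces in the talk sit well above that.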
Self-Similarity Statistics

[Chart: estimated Hurst parameter per host (mean with +/- one standard deviation, scale 0 to 1.2); hosts grouped into production cluster, research cluster, and desktops]
Why is Self-Similarity Important?

• Complex structure
  – Not completely random, nor independent
  – Short-range dependence
    • Excellent for history-based prediction
  – Long-range dependence
    • Possibly a problem
• Modeling implications
  – Suggests models that can capture it: ARFIMA, FGN, TAR
Load Exhibits Epochal Behavior

[Figures (MATLAB EPS, previews unavailable): axp7_tue_19.eps and axp7_19_day_time.eps]
Epoch Length Statistics

[Chart: epoch length per host (mean with +/- one standard deviation, roughly -200 to 1200 seconds); hosts grouped into production cluster, research cluster, and desktops]
Why is Epochal Behavior Important?

• Complex structure
  – Non-stationary
• Modeling implications
  – Suggests models: ARIMA, ARFIMA, etc.; non-parametric spectral methods
  – Suggests problem decomposition
Linear Time Series Models

[Figures: a partially predictable load sequence produced by passing an unpredictable random sequence through a fixed linear filter]

$z_t = \sum_{j=1}^{\infty} \psi_j a_{t-j} + a_t$

$a_t \sim \text{WhiteNoise}(0, \sigma_a^2), \qquad z_t \sim (\cdot, \sigma_z^2), \qquad \sigma_a^2 \le \sigma_z^2$

Choose the weights $\psi_j$ to minimize $\sigma_a^2$; $\sigma_a$ gives the confidence interval for t+1 predictions.
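One concrete way to realize such a filter is an autoregressive fit: estimate the weights by least squares on the load history, take σ_a as the standard deviation of the residuals, and report predictions as mean ± 1.96·σ_a. A self-contained sketch in pure Python (the normal-equations solver has no pivoting, so it is only suitable for small p):

```python
def fit_ar(series, p):
    """Least-squares fit of an AR(p) model. Returns the lag
    coefficients and the innovation standard deviation sigma_a."""
    n = len(series)
    rows = [series[t - p:t][::-1] for t in range(p, n)]  # [z_{t-1},...,z_{t-p}]
    y = series[p:]
    # normal equations A c = b, solved by Gaussian elimination
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            f = A[j][i] / A[i][i]
            A[j] = [aj - f * ai for aj, ai in zip(A[j], A[i])]
            b[j] -= f * b[i]
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    resid = [yt - sum(c * r[i] for i, c in enumerate(coef))
             for r, yt in zip(rows, y)]
    sigma_a = (sum(e * e for e in resid) / len(resid)) ** 0.5
    return coef, sigma_a

def predict_next(series, coef):
    """One-step (t+1) prediction from the most recent p values."""
    recent = series[-len(coef):][::-1]
    return sum(c * x for c, x in zip(coef, recent))
```

Usage: `coef, sigma_a = fit_ar(load_trace, 4)` then `predict_next(load_trace, coef) ± 1.96 * sigma_a` is the 95% interval for the next sample.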
Realizable Pole-Zero Models

AR(p) and MA(q) are special cases of ARMA(p,q); ARIMA(p,d,q) extends ARMA with integer differencing (d integer, captures non-stationarity), and ARFIMA(p,d,q) extends it further with fractional differencing (d related to the Hurst parameter, captures self-similarity).

p and q are the numbers of parameters; d is the degree of differencing.
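ARFIMA's fractional d can be made concrete through the binomial expansion of the differencing operator (1 - B)^d, where B is the backshift operator; for self-similar series the usual link to the Hurst parameter is d = H - 1/2. A sketch of the standard coefficient recursion:

```python
def frac_diff_weights(d, n):
    """First n coefficients of (1 - B)^d expanded as a power series
    in the backshift operator B, via w_k = w_{k-1} * (k - 1 - d) / k."""
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (k - 1 - d) / k)
    return w

# Integer d = 1 recovers ordinary first differencing: weights 1, -1, 0, 0, ...
# Fractional d gives slowly decaying weights -- the long memory ARFIMA adds.
```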
Real-World Benefits of Models

σ_a gives the confidence interval for t+1 predictions. Map work that would take 100 ms at zero load:

axp0: σ_z = 0.54, mean load = 1.0, σ_a(ARMA(4,4)) = 0.109, σ_a(ARFIMA(4,d,4)) = 0.108
  no model: 1.0 +/- 1.06 (95%) => 100 to 306 ms
  ARMA:     1.0 +/- 0.22 (95%) => 178 to 222 ms
  ARFIMA:   1.0 +/- 0.21 (95%) => 179 to 221 ms   (~1% better than ARMA)

axp7: σ_z = 0.14, mean load = 0.12, σ_a(ARMA(4,4)) = 0.041, σ_a(ARFIMA(4,d,4)) = 0.025
  no model: 0.12 +/- 0.27 (95%) => 100 to 139 ms
  ARMA:     0.12 +/- 0.08 (95%) => 104 to 120 ms
  ARFIMA:   0.12 +/- 0.05 (95%) => 107 to 117 ms  (~40% better than ARMA)
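The mapping behind these numbers follows from exec ≈ nominal × (1 + load), with the predicted load interval mean ± 1.96·σ and load clamped at zero. A sketch (the function name is illustrative):

```python
def exec_time_range(nominal_ms, mean_load, sigma, z=1.96):
    """95% execution-time range for work taking nominal_ms at zero
    load, assuming exec time = nominal * (1 + load) and Gaussian
    prediction error of standard deviation sigma."""
    lo_load = max(0.0, mean_load - z * sigma)  # load cannot go negative
    hi_load = mean_load + z * sigma
    return nominal_ms * (1.0 + lo_load), nominal_ms * (1.0 + hi_load)

# axp0, no model (sigma_z = 0.54):    about (100, 306) ms, as on the slide
# axp0, ARMA(4,4) (sigma_a = 0.109):  about (179, 221) ms
```

Note how the clamp reproduces the "no model" lower bound of 100 ms: 1.96 × 0.54 exceeds the mean load of 1.0 only on the low side for axp7, and on axp0 the lower load bound dips below zero.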
t+1 Prediction

[Chart: per-host t+1 prediction results (scale -5 to 45); hosts grouped into production cluster, research cluster, and desktops]
t+8 Prediction

[Chart: per-host t+8 prediction results (scale -10 to 35); hosts grouped into production cluster, research cluster, and desktops]
Conclusions

• Load has high variance
• Load is self-similar
• Load exhibits epochal behavior
• Capturing self-similarity in linear time series models improves predictability
Load Traces

• Would a web-accessible load trace database be useful?
• Would you like to contribute?