Active Sampling for Accelerated Learning of Performance Models Piyush Shivam, Shivnath Babu, Jeff Chase Duke University


Page 1: Active Sampling for Accelerated Learning of Performance Models

Active Sampling for Accelerated Learning of Performance Models

Piyush Shivam, Shivnath Babu, Jeff Chase

Duke University

Page 2: Networked Computing Utility

[Figure: clusters C1, C2, and C3 at Sites A, B, and C, with a task scheduler and a task workflow.]

A network of clusters or grid sites.

Each site is a pool of heterogeneous resources (e.g., CPU, memory, storage, network).

Managed as a shared utility.

Jobs are task/data workflows.

Challenge: choose the ‘best’ resource mapping/schedule for the job mix.

Instance of “utility resource planning”.

Solution under construction: NIMO

Page 3: Subproblem: Predict Job Completion Time

Rows are samples; columns are attributes.

Sample  CPU speed  Memory size  Network latency  Disk spindles  Execution time
s1      2.4 GHz    2 GB         1 ms             10             2 hours
...     ...        ...          ...              ...            ...
...     ...        ...          ...              ...            ...

Page 4: Premises (Limitations)

• Important batch applications are run repeatedly.
  – Most resources are consumed by applications we have seen in the past.
• Behavior is predictable across data sets…
  – …given some attributes associated with the data set.
  – Stable behavior per unit of data processed (D).
  – D is predictable from data set attributes.
• Behavior depends only on resource attributes.
  – CPU type and clock, seek time, spindle count.
• Utility controls the resources assigned to each job.
  – Virtualization enables precise control.
• Your mileage may vary.

Page 5: NIMO: NonInvasive Modeling for Optimization

• NIMO learns end-to-end performance models
  – Models predict performance as a function of (a) application profile, (b) data set profile, and (c) resource profile of candidate resource assignment
• NIMO is active
  – NIMO collects training data for learning models by conducting proactive experiments on a ‘workbench’
• NIMO is noninvasive

[Figure: “What if…” queries. App/data profiles and candidate resource profiles feed the model, which yields (target) performance.]

Page 6: The Big Picture

[Figure: system overview. Components: application profiler, resource profiler, training set database, active learning, and scheduler, over Sites A, B, and C with clusters C1, C2, and C3. Labels: “Jobs, benchmarks”; “Pervasive instrumentation”; “Correlate metrics with job logs”.]

Page 7: Generic End-to-End Model

Compute phases: compute resource busy.
Stall phases: compute resource stalled on I/O.

$$T = D \times (O_a + O_n + O_d)$$

T: completion time
D: total data
O_a: compute occupancy
O_n: network occupancy
O_d: storage occupancy
O_s: stall occupancy, covering the stall phases (O_s = O_n + O_d)

Occupancy: average time consumed per unit of data.
(Figure label: “directly observable”.)
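A minimal sketch of the end-to-end model, reading the slide's formula as T = D × (O_a + O_n + O_d). The function name, parameter names, and the numbers are assumptions for illustration only; they do not come from the slides.

```python
def completion_time(total_data, compute_occ, network_occ, storage_occ):
    """Predict completion time T = D * (O_a + O_n + O_d).

    Occupancies are average times consumed per unit of data, so the
    result is in the same time unit as the occupancies.
    """
    stall_occ = network_occ + storage_occ  # O_s: time stalled on I/O per data unit
    return total_data * (compute_occ + stall_occ)


# Hypothetical job: 1000 data units, occupancies in seconds per unit.
t = completion_time(total_data=1000, compute_occ=2.0,
                    network_occ=0.5, storage_occ=1.5)
print(t)  # 4000.0
```

With this form, a “what if” question about a candidate assignment reduces to predicting the three occupancies for that assignment and multiplying by D.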

Page 8: Statistical Learning

Independent variables: resource profile ( ), data profile ( )

Dependent variables

Complexity (e.g., latency hiding, concurrency, arm contention) is captured implicitly in the training data rather than in the structure of the model.
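The slide does not say which learner is used. As one illustrative possibility, the sketch below fits a single predictor by ordinary least squares, mapping one transformed resource attribute (1 / CPU speed) to compute occupancy. The attribute choice, the transform, and the sample values are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training samples: (CPU speed in GHz, observed compute occupancy).
samples = [(1.0, 4.5), (2.0, 2.5), (4.0, 1.5)]
xs = [1.0 / speed for speed, _ in samples]   # independent variable
ys = [occ for _, occ in samples]             # dependent variable
intercept, slope = fit_line(xs, ys)


def predict_compute_occupancy(cpu_speed_ghz):
    """Predict occupancy for a candidate CPU speed from the fitted line."""
    return intercept + slope / cpu_speed_ghz
```

No explicit term models latency hiding or contention here; whatever effect they have shows up only through the observed occupancies in the training samples.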

Page 9: Sampling Challenges

• Full system operating range
  – Samples must cover the space of candidate resource assignments
• Cost of sample acquisition
  – Acquiring a sample has a non-negligible cost, e.g., time to acquire a sample, or opportunity cost for the application
• Curse of dimensionality
  – Too many parameters!
  – E.g., 10 dimensions × 10 values per dimension (10^10 points)
  – 5 minutes for each sample => 951 years to collect 1% of the samples!
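The 951-year figure can be checked with a few lines of arithmetic. The variable names are illustrative, and a 365-day year is assumed.

```python
dimensions = 10
values_per_dimension = 10
minutes_per_sample = 5

total_points = values_per_dimension ** dimensions   # 10**10 candidate points
samples_needed = total_points // 100                # 1% of the space
total_minutes = samples_needed * minutes_per_sample
years = total_minutes / (60 * 24 * 365)             # minutes in a 365-day year

print(round(years))  # 951
```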

Page 10: Active Learning in NIMO

How to learn accurate models quickly?

[Figure: accuracy of current model (up to 100%) vs. number of training samples, with curves for passive sampling and active sampling.]

• Passive sampling might not expose the system operating range
• Active sampling using “design of experiments” collects most relevant training data
• Automatic and quick

Page 11: Sample Carefully

[Figure: accuracy of current model (up to 100%) vs. number of training samples, with curves for passive sampling, active sampling without acceleration, and active sampling with acceleration.]

Page 12: Active Sampling Challenges

• How to expose the main factors and interactions in the shortest time?
  – Which dimensions/attributes to perturb?
  – What values to choose for the attributes?
• Where to conduct the experiment?
  – On a separate system (“workbench”) or “live”?

Page 13: Planning ‘active’ experiments

1. Choose a predictor function to refine
  • Focus on the most significant/relevant predictors… or… the least accurate
  • Example: CPU-intensive app needs an accurate compute time predictor
2. Choose attribute (if any) to add to the predictor
  • Example: CPU speed
3. Choose the values of the attributes
4. Conduct the experiment
5. Compute current prediction error; go to Step 1
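The five steps can be read as a loop. The sketch below is a skeleton under stated assumptions: every callable passed in (`pick_predictor`, `pick_attribute`, `pick_values`, `run_experiment`, `prediction_error`) is a hypothetical placeholder, and the stopping rule (a target error or an experiment budget) is an assumption rather than something the slides specify.

```python
def plan_experiments(predictor_names, pick_predictor, pick_attribute,
                     pick_values, run_experiment, prediction_error,
                     target_error, max_experiments):
    """Run the plan/experiment/evaluate cycle until the models are good enough."""
    samples = []
    active_attrs = {name: [] for name in predictor_names}
    for _ in range(max_experiments):
        # Step 5 (from the previous round): compute current prediction errors.
        errors = {n: prediction_error(n, samples) for n in predictor_names}
        if max(errors.values()) <= target_error:
            break
        name = pick_predictor(errors)                         # Step 1
        attr = pick_attribute(name, active_attrs[name])       # Step 2 (may be None)
        if attr is not None:
            active_attrs[name].append(attr)
        assignment = pick_values(active_attrs[name], samples)  # Step 3
        samples.append(run_experiment(assignment))             # Step 4
    return samples, active_attrs


# Toy stand-ins so the skeleton runs end to end (all hypothetical).
all_attrs = ["cpu_speed", "memory_size"]
samples, attrs = plan_experiments(
    predictor_names=["compute", "stall"],
    pick_predictor=lambda errors: max(errors, key=errors.get),
    pick_attribute=lambda name, used: next(
        (a for a in all_attrs if a not in used), None),
    pick_values=lambda used, done: {a: len(done) + 1 for a in used},
    run_experiment=lambda assignment: dict(assignment),
    prediction_error=lambda name, done: 1.0 / (1 + len(done)),
    target_error=0.25,
    max_experiments=10,
)
print(len(samples), attrs)
```

Passing the policies in as callables keeps the loop itself fixed while the choices in Steps 1 to 3 are swapped independently.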

Page 14: Choosing the Next Predictor

• Learn the most significant/relevant predictors first.
  – Static vs. dynamic ordering
  – Static: define total order, e.g., a priori or by pre-estimates of influence (Plackett-Burman).
    • Cycle through the order: round-robin vs. improvement threshold
  – Dynamic: choose the predictor with maximum current error
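A small sketch of the three selection policies named above. The predictor names, error values, and function names are assumptions chosen for illustration.

```python
from itertools import cycle


def static_round_robin(order):
    """Static ordering: cycle through a fixed total order of predictors."""
    return cycle(order)


def next_index_with_threshold(order, current_index, last_improvement, threshold):
    """Static ordering: stay on a predictor until its latest error
    reduction falls below the threshold, then move to the next one."""
    if last_improvement < threshold:
        return (current_index + 1) % len(order)
    return current_index


def dynamic_choice(current_errors):
    """Dynamic ordering: pick the predictor with maximum current error."""
    return max(current_errors, key=current_errors.get)


order = ["compute", "network_stall", "storage_stall"]  # hypothetical total order
rr = static_round_robin(order)
first_four = [next(rr) for _ in range(4)]
print(first_four)

errors = {"compute": 0.10, "network_stall": 0.35, "storage_stall": 0.20}
print(dynamic_choice(errors))  # network_stall
```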

Page 15: Choosing New Attributes

• Include the most significant/relevant attributes
  – Choose attributes to expose main factors and interactions
• Add an attribute when error reduction from further training with the current set falls below threshold.
• Choose the attribute with maximum potential improvement in accuracy.
  – Establish total order using pre-estimate of relevance using Plackett-Burman.
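To illustrate how a screening design can produce a total order of attributes, the sketch below builds a two-level design from a Sylvester Hadamard matrix (Plackett-Burman designs are derived from Hadamard matrices; restricting the run count to a power of two is a simplification made here). Each attribute is set to a low (-1) or high (+1) level per run, and attributes are ranked by the magnitude of their estimated main effect. The attribute names and the synthetic response are assumptions, not data from the talk.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def screening_design(num_attrs):
    """Two-level design: drop the all-ones column, keep one column per attribute."""
    n = 1
    while n < num_attrs + 1:
        n *= 2
    return [row[1:num_attrs + 1] for row in hadamard(n)]


def rank_attributes(names, design, responses):
    """Estimate main effects (mean at high minus mean at low) and rank by size."""
    runs = len(design)
    effects = {
        name: sum(design[i][j] * responses[i] for i in range(runs)) / (runs / 2)
        for j, name in enumerate(names)
    }
    ranking = sorted(names, key=lambda n: abs(effects[n]), reverse=True)
    return ranking, effects


names = ["cpu_speed", "memory_size", "network_latency"]
design = screening_design(len(names))
# Synthetic response standing in for measured execution time per run.
responses = [10 + 5 * cpu + 1 * mem + 0 * net for cpu, mem, net in design]
ranking, effects = rank_attributes(names, design, responses)
print(ranking)
```

A design this small confounds main effects with two-factor interactions, so the ranking is only a pre-estimate of relevance, which matches how the slide describes its use.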

Page 16: Choosing New Values

• Select a new value sample to train the selected predictor function with the chosen set of attributes.
• Range of approaches balance coverage vs. interactions
  – Binary search/bracket
  – PB to identify interactions
  – La-Ib, where a = #levels for value and b = degree of interactions
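One possible reading of “binary search/bracket” for a single attribute: sample the two extremes first to bracket the operating range, then repeatedly bisect the widest unsampled gap. This interpretation, the function name, and the candidate CPU speeds are assumptions.

```python
def next_value(candidates, sampled):
    """Return the next level to sample, or None when every level is covered."""
    levels = sorted(candidates)
    # Bracket: cover both ends of the range first.
    if levels[0] not in sampled:
        return levels[0]
    if levels[-1] not in sampled:
        return levels[-1]
    # Bisect: find the widest gap between already-sampled levels.
    idx = [i for i, v in enumerate(levels) if v in sampled]
    best = None
    for lo, hi in zip(idx, idx[1:]):
        if hi - lo > 1 and (best is None or hi - lo > best[1] - best[0]):
            best = (lo, hi)
    if best is None:
        return None
    return levels[(best[0] + best[1]) // 2]


cpu_speeds = [1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical candidate levels (GHz)
sampled, visit_order = set(), []
while (value := next_value(cpu_speeds, sampled)) is not None:
    sampled.add(value)
    visit_order.append(value)
print(visit_order)  # [1.0, 3.0, 2.0, 1.5, 2.5]
```

This favors coverage of one attribute's range; it does nothing to expose interactions between attributes, which is the other side of the trade-off the slide names.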

Page 17: Experimental Results

• Biomedical applications
  – BLAST, fMRI, NAMD, CardioWave
• Resources
  – 5 CPU speeds, 6 network latencies, 5 memory sizes
  – 5 × 6 × 5 = 150 resource assignments
• Goal: learn an execution time model with the least number of training assignments
• Separate test set to evaluate the accuracy of the current model

Page 18: BLAST Application

• Total time for 150 assignments: 130 hrs
• Active sampling: 5 hrs
• Sample space: 2%
• Incorrect order of predictor refinement
  – 12 hrs
  – 10% sample space

Page 19: BLAST Application

• Total time for 150 assignments: 130 hrs
• Active sampling: 5 hrs
• Sample space: 2%
• Incorrect order of attribute refinement
  – 12 hrs
  – 10% sample space

Page 20: Summary/Conclusions

• Current SLT: given the right data, learn the right model
• Use active sampling to acquire the right data
• Ongoing experiments demonstrate the importance/potential of guided active sampling
  – 2% sample space, >= 90% model accuracy
• Upcoming VLDB paper…