jockey: guaranteed job latency in data parallel clustersadf/work/eurosys2012-poster.pdf · jockey:...

Jockey: Guaranteed Job Latency in Data Parallel Clusters

Andrew D. Ferguson, adf@cs.brown.edu

Peter Bodík, peterb@microsoft.com

Srikanth Kandula, srikanth@microsoft.com

Eric Boutin, eric.boutin@microsoft.com

Rodrigo Fonseca, rfonseca@cs.brown.edu

Jockey

Deadlines and Varying Latency

Expressing Performance Targets

Evaluation

The Cosmos Environment

• Cosmos is Microsoft’s data parallel processing environment
• It primarily supports Bing, Microsoft AdCenter, and MSN
• Cosmos clusters contain 1000s of commodity servers, each running multiple tasks for many jobs
• Resources are managed by granting tokens to tasks
• Tokens are de-normalized weights in the scheduler and guarantee a fixed slice of CPU and memory

Conclusion

• Users of data parallel clusters now demand predictable latency
• Predictable latency can be required for deadlines with business partners; missing a deadline can have financial consequences

It would be easy to provide deadlines if job latency had low variance; unfortunately, it does not.

Users provide utility curves to express performance targets

Figure: utility curve plotted against job completion time, with the deadline marked.

• For single jobs, scale doesn’t matter
• For multiple jobs, use financial penalty
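One way to picture such a utility curve is as a piecewise-linear function of job completion time: flat up to the deadline, then dropping. The sketch below is illustrative only. The helper name `make_utility`, the breakpoints, and the utility values are assumptions, not taken from the poster.

```python
from bisect import bisect_right

def make_utility(points):
    """Build a piecewise-linear utility curve from (minutes, utility) points.

    Hypothetical helper: points must be sorted by time. Utility is held
    constant before the first point and after the last one.
    """
    times = [t for t, _ in points]
    values = [u for _, u in points]

    def utility(completion_minutes):
        if completion_minutes <= times[0]:
            return values[0]
        if completion_minutes >= times[-1]:
            return values[-1]
        i = bisect_right(times, completion_minutes)
        t0, t1 = times[i - 1], times[i]
        u0, u1 = values[i - 1], values[i]
        # Linear interpolation between neighbouring breakpoints.
        return u0 + (u1 - u0) * (completion_minutes - t0) / (t1 - t0)

    return utility

# Assumed example: full utility up to a 60-minute deadline, falling to a
# penalty of -1.0 by 70 minutes.
deadline_utility = make_utility([(60, 1.0), (70, -1.0)])
```

With a single job, multiplying every utility value by a positive constant would not change which completion time is preferred, which is one reading of the note that scale does not matter. Across jobs, expressing the drop as a financial penalty puts the curves on a common scale.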

Figure: Runtime of Recurring Jobs (x-axis: latency [minutes], 0 to 40)

Cosmos Components

• CosmosStore: distributed storage layer
• Dryad: data-parallel execution engine
• SCOPE: SQL-like query language for Dryad

Figure: Pipeline (with Team Boundaries marked)

Why does latency vary?

• Pipeline complexity: Users develop multi-stage pipelines of dependent jobs; variance in earlier jobs impacts later ones
• Noisy environment: Simultaneous data parallel jobs compete for highly utilized shared resources, which can also fail

How Jockey Managed a Real Job in a Production Cluster

Figure annotations: allocation above oracle; released resources due to excess pessimism; deadline lowered from 140 min. to 70 min.

“Oracle” allocation = total allocation-hours / deadline
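The oracle allocation, and the amount by which an actual allocation exceeds it, can be worked through in a few lines. This is a sketch with made-up numbers. `oracle_allocation` and `fraction_above_oracle` are hypothetical names, and the definition of the fraction (excess allocation-hours divided by total allocation-hours granted) is an assumption about how such a metric might be computed.

```python
def oracle_allocation(total_allocation_hours, deadline_hours):
    """Constant allocation that would finish the work exactly at the deadline."""
    if deadline_hours <= 0:
        raise ValueError("deadline must be positive")
    return total_allocation_hours / deadline_hours

def fraction_above_oracle(allocations, step_hours, oracle):
    """Share of granted allocation-hours that exceeded the oracle level.

    allocations: tokens held during each interval of length step_hours.
    Assumed definition: excess above the oracle / total granted.
    """
    granted = sum(a * step_hours for a in allocations)
    excess = sum(max(a - oracle, 0) * step_hours for a in allocations)
    return excess / granted if granted else 0.0

# Assumed numbers: 100 allocation-hours of work and a 2-hour deadline.
oracle = oracle_allocation(100, 2)   # 50 tokens
trace = [80, 60, 40, 20]             # tokens held per half-hour interval
share = fraction_above_oracle(trace, 0.5, oracle)
```

In this toy trace the job starts above the oracle level and later releases resources, so only part of its granted allocation counts as excess.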

Figure: Jockey’s Performance Relative to Other Allocation Schemes. x-axis: job completion time relative to deadline (10% to 130%), with the deadline marked. Series: max allocation, Jockey, Jockey w/o adapting, Jockey w/o simulator.

Allocated too many resources

Simulator made good predictions

Only missed 1 of 94 deadlines

Control-loop is stable and successful

Figure: fraction of allocation above oracle (0% to 100%) for Jockey w/o adaptation, max allocation, Jockey w/o simulator, and Jockey.

• Compared with a naive max allocation scheme, and simulator and control-loop independently
• 21 jobs in production cluster, CPU use: ~80%
• Two metrics: Did jobs complete before deadline? Minimized impact on the rest of the cluster?

Problem               Solution
Pipeline complexity   Use a simulator
Noisy environment     Dynamic control

• Jockey works without requiring latency guarantees from individual cluster components
• When a shared environment is underloaded, guaranteed latency brings predictability to the user experience
• When a shared environment is overloaded, utility-based resource allocation ensures jobs are completed by importance

Conceptually, Jockey is
1) a function from progress and allocation to remaining run time
2) a control-loop which dynamically adjusts the resource allocation
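A function from progress and allocation to remaining run time could, for instance, be represented as a lookup table filled in ahead of time. The sketch below is only an assumption about one possible shape. The table values, the bucket choices, the nearest-bucket lookup, and the name `remaining_minutes` are all invented for illustration and are not stated on the poster.

```python
# Hypothetical precomputed table. Rows are progress buckets, columns are
# allocations in tokens, entries are predicted remaining minutes.
PROGRESS = [0.0, 0.25, 0.5, 0.75, 1.0]
ALLOCATIONS = [10, 20, 40]
REMAINING = [
    [120, 65, 40],
    [90, 50, 30],
    [60, 33, 20],
    [30, 17, 10],
    [0, 0, 0],
]

def _nearest_index(values, x):
    """Index of the entry in values closest to x (first one wins on ties)."""
    return min(range(len(values)), key=lambda i: abs(values[i] - x))

def remaining_minutes(progress, allocation):
    """Look up predicted remaining run time for the nearest table cell."""
    row = _nearest_index(PROGRESS, progress)
    col = _nearest_index(ALLOCATIONS, allocation)
    return REMAINING[row][col]
```

A table like this keeps the run-time lookup cheap, since any expensive prediction work can happen before the job starts.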

Figure: diagram of Jockey’s components. Labels: offline; during job runtime; job profile; simulator; latency predictions; utility function; job stats; running job; resource allocation control loop.

Our Goal: Maximize utility, while minimizing resources, by dynamically adjusting the allocation
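A minimal sketch of one control step toward this goal follows: among candidate allocations, find the best predicted utility, then pick the smallest allocation that reaches it. The predictor, the utility function, and all numbers are placeholders assumed for illustration. This is not the policy evaluated in the paper.

```python
def choose_allocation(elapsed, progress, candidates, predict_remaining, utility):
    """Return the smallest allocation achieving the best predicted utility."""
    scored = []
    for alloc in sorted(candidates):
        finish = elapsed + predict_remaining(progress, alloc)
        scored.append((utility(finish), alloc))
    best = max(u for u, _ in scored)
    # Fewest resources among the allocations that reach the best utility.
    return min(alloc for u, alloc in scored if u == best)

# Toy stand-ins (assumptions): 600 token-minutes of total work, and a
# utility of 1 if the job finishes within 60 minutes, else 0.
def toy_predict(progress, alloc):
    return 600 * (1 - progress) / alloc

def toy_utility(finish_minutes):
    return 1.0 if finish_minutes <= 60 else 0.0

on_track = choose_allocation(10, 0.5, [10, 20, 40], toy_predict, toy_utility)
behind = choose_allocation(40, 0.5, [10, 20, 40], toy_predict, toy_utility)
```

Here the job that is on track keeps the smallest allocation (10 tokens), while the job that has fallen behind is raised to 20 tokens, the least that still meets the deadline. Running such a step repeatedly as progress reports arrive lets the allocation rise when a job slips and fall when earlier predictions were too pessimistic.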

Our paper provides details and evaluation of each component
