Dynamic Scheduling - Federated Clusters in Mesos
TRANSCRIPT
Aaron Carey, Production Engineer - ILM, [email protected]
Federated Clusters in Mesos
Why?
Who wins?
Why?
Sites in 3 time zones
Need to share render resources
Went through a project to prepare for cloud burst rendering
Renders mostly come at night (mostly)
What happens when our farm is full?
Can we burst to our other locations?
Approaches
Huawei Design
Led by the master and a gossip protocol
Includes policy model
Master decides if a framework gets an offer
Master is in control
Based on two master plugins, consul deployment, gossip protocol
https://www.youtube.com/watch?v=kqyVQzwwD5E
http://www.slideshare.net/mKrishnaKumar1/federated-mesos-clusters-for-global-data-center-designs
Our hack design
Needs to be simple
Decisions made in the framework
Framework connects to all masters
Masters don’t care about each other
We don’t need a policy engine
Keep code out of the Master
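A minimal sketch of this design, assuming one connection per datacentre master (in a real framework each would be a Mesos SchedulerDriver); the master URLs and offer pooling here are illustrative, not the actual ILM implementation:

```python
class FederatedScheduler:
    """Hack design from the talk: the framework connects to every
    datacentre's master, the masters never talk to each other, and all
    placement decisions stay in the framework."""

    def __init__(self, masters):
        # e.g. {"london": "zk://london:2181/mesos", ...} -- hypothetical URLs
        self.masters = masters
        self.pending_offers = {name: [] for name in masters}

    def on_offer(self, datacentre, offer):
        # Offers arrive independently from each master; pool them per site
        # so the framework (not any master) decides where a task runs.
        self.pending_offers[datacentre].append(offer)

    def offers_for(self, datacentre):
        return list(self.pending_offers[datacentre])
```

Because the masters are unaware of each other, no master-side plugins or policy engine are needed; federation lives entirely in framework code.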
Diversion...
A note on scheduling...
Historically, schedulers in VFX are tyrannical micromanagers
Full knowledge of the whole cluster and all tasks allows better-informed decisions
In Mesos you only know what the Master tells you
No knowledge of other frameworks
At the mercy of the Master
Offers only deal in the present
We could hoard all offers we get, but we want to play nice
We don’t know if a better offer is just around the corner
Making dynamic scheduling decisions...
Can we intelligently schedule tasks without knowing the whole cluster state?
Schedule penalty
Every datacentre has a penalty for scheduling a task
Golf rules: lowest score wins
Penalty = Interactivity Penalty + Data Penalty + Utilisation Penalty
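The total cost is just the sum of the three per-datacentre terms, as in the formula above:

```python
def schedule_penalty(interactivity, data, utilisation):
    """Total cost of scheduling a task in one datacentre:
    Penalty = Interactivity Penalty + Data Penalty + Utilisation Penalty.
    (An optional time-in-queue term can later reduce this total.)"""
    return interactivity + data + utilisation
```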
Interactive Penalty
Framework regularly checks current latency to connected datacentres
Lo = maximum latency for interactive applications (around 35ms)
Lm = latency for datacentre m
I = 0 for non interactive, 1 for interactive
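The transcript defines the inputs (Lo, Lm, I) but not the closed form, so the ratio below is an assumption: non-interactive tasks pay nothing, and interactive tasks pay latency scaled against the interactive ceiling.

```python
def interactive_penalty(latency_ms, interactive, max_latency_ms=35.0):
    """Penalty for running a task in datacentre m.

    latency_ms     -- L_m, current latency to datacentre m
    interactive    -- I, True for interactive tasks, False otherwise
    max_latency_ms -- L_o, max latency for interactive applications
    """
    if not interactive:            # I = 0: latency doesn't matter
        return 0.0
    # Assumed form: L_m / L_o. A value above 1.0 means the datacentre
    # exceeds the interactive latency budget.
    return latency_ms / max_latency_ms
```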
Data Penalty
Data Penalty = (Total Input Data Required - Input Data Already at Location) / Bandwidth
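This is effectively the transfer time for the input data a datacentre is still missing; units are illustrative, any consistent data/bandwidth pair works:

```python
def data_penalty(total_input_gb, local_gb, bandwidth_gb_per_s):
    """(Total Input Data Required - Input Data Already at Location) / Bandwidth:
    the time needed to copy the missing input data to this datacentre."""
    missing = max(total_input_gb - local_gb, 0.0)   # data not yet on site
    return missing / bandwidth_gb_per_s
```

A datacentre that already holds all of a render's inputs pays zero, which naturally steers work towards the data.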
Utilisation Penalty
Framework checks current utilisation of datacentres
Utarget = target utilisation of datacentre (e.g. 95%)
Um = utilisation of datacentre m
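The exact formula is not in the transcript; the ratio U_m / U_target below is an assumed form that makes busy datacentres progressively more expensive:

```python
def utilisation_penalty(current, target=0.95):
    """Assumed form U_m / U_target: penalty grows as datacentre m
    approaches (and exceeds 1.0 past) its target utilisation."""
    return current / target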
Time PenaltyOptional
Penalty decreases based on length of time in the queue
Putting it togetherSet a cost threshold above which jobs don’t run
Tasks will get dispatched to the datacentre with the lowest cost
Thresholding can ensure jobs wait for optimum resources without consuming all offers
Where were we?
Framework
System
What’s Next?
Peer to Peer vs Hierarchical
Get involved!Proposal for federated clusters:
https://docs.google.com/document/d/1U4IY_ObAXUPhtTa-0Rw_5zQxHDRnJFe5uFNOQ0VUcLg/edit?usp=sharing
Federated Marathon:
https://github.com/schibsted/triathlon
Current Discussion (favouring hierarchical design):
We’re [email protected]