Dynamic Scheduling - Federated Clusters in Mesos
TRANSCRIPT
Aaron Carey, Production Engineer - ILM, [email protected]
Federated Clusters in Mesos
Why?
Who wins?
Why?
Sites in 3 time zones
Need to share render resources
Went through a project to prepare for cloud burst rendering
Renders mostly come at night (mostly)
What happens when our farm is full?
Can we burst to our other locations?
Approaches
Huawei Design
Led by the master and a gossip protocol
Includes policy model
Master decides if a framework gets an offer
Master is in control
Based on two master plugins, consul deployment, gossip protocol
https://www.youtube.com/watch?v=kqyVQzwwD5E
http://www.slideshare.net/mKrishnaKumar1/federated-mesos-clusters-for-global-data-center-designs
Our hack design
Needs to be simple
Decisions made in the framework
Framework connects to all masters
Masters don’t care about each other
We don’t need a policy engine
Keep code out of the Master
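A minimal sketch of this design, assuming one connection per datacentre master (in a real framework each would be a Mesos SchedulerDriver); the master URLs and offer pooling here are illustrative, not the actual ILM implementation:

```python
class FederatedScheduler:
    """Hack design from the talk: the framework connects to every
    datacentre's master, the masters never talk to each other, and all
    placement decisions stay in the framework."""

    def __init__(self, masters):
        # e.g. {"london": "zk://london:2181/mesos", ...} -- hypothetical URLs
        self.masters = masters
        self.pending_offers = {name: [] for name in masters}

    def on_offer(self, datacentre, offer):
        # Offers arrive independently from each master; pool them per site
        # so the framework (not any master) decides where a task runs.
        self.pending_offers[datacentre].append(offer)

    def offers_for(self, datacentre):
        return list(self.pending_offers[datacentre])
```

Because the masters are unaware of each other, no master-side plugins or policy engine are needed; federation lives entirely in framework code.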
Diversion...
A note on scheduling...
Historically, schedulers in VFX are tyrannical micromanagers
Full knowledge of the whole cluster and all tasks allows better-informed decisions
In Mesos you only know what the Master tells you
No knowledge of other frameworks
At the mercy of the Master
Offers only deal in the present
We could hoard all offers we get, but we want to play nice
We don’t know if a better offer is just around the corner
Making dynamic scheduling decisions...
Can we intelligently schedule tasks without knowing the whole cluster state?
Schedule penalty
Every datacentre has a penalty for scheduling a task
Golf rules: lowest score wins
Penalty = Interactivity Penalty + Data Penalty + Utilisation Penalty
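The total cost is just the sum of the three per-datacentre terms, as in the formula above:

```python
def schedule_penalty(interactivity, data, utilisation):
    """Total cost of scheduling a task in one datacentre:
    Penalty = Interactivity Penalty + Data Penalty + Utilisation Penalty.
    (An optional time-in-queue term can later reduce this total.)"""
    return interactivity + data + utilisation
```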
Interactive Penalty
Framework regularly checks current latency to connected datacentres
Lo = maximum latency for interactive applications (around 35ms)
Lm = latency for datacentre m
I = 0 for non interactive, 1 for interactive
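The transcript defines the inputs (Lo, Lm, I) but not the closed form, so the ratio below is an assumption: non-interactive tasks pay nothing, and interactive tasks pay latency scaled against the interactive ceiling.

```python
def interactive_penalty(latency_ms, interactive, max_latency_ms=35.0):
    """Penalty for running a task in datacentre m.

    latency_ms     -- L_m, current latency to datacentre m
    interactive    -- I, True for interactive tasks, False otherwise
    max_latency_ms -- L_o, max latency for interactive applications
    """
    if not interactive:            # I = 0: latency doesn't matter
        return 0.0
    # Assumed form: L_m / L_o. A value above 1.0 means the datacentre
    # exceeds the interactive latency budget.
    return latency_ms / max_latency_ms
```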
Data Penalty
Data Penalty = (Total Input Data Required - Input Data Already at Location) / Bandwidth
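This is effectively the transfer time for the input data a datacentre is still missing; units are illustrative, any consistent data/bandwidth pair works:

```python
def data_penalty(total_input_gb, local_gb, bandwidth_gb_per_s):
    """(Total Input Data Required - Input Data Already at Location) / Bandwidth:
    the time needed to copy the missing input data to this datacentre."""
    missing = max(total_input_gb - local_gb, 0.0)   # data not yet on site
    return missing / bandwidth_gb_per_s
```

A datacentre that already holds all of a render's inputs pays zero, which naturally steers work towards the data.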
Utilisation Penalty
Framework checks current utilisation of datacentres
Utarget = target utilisation of datacentre (e.g. 95%)
Um = utilisation of datacentre m
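The exact formula is not in the transcript; the ratio U_m / U_target below is an assumed form that makes busy datacentres progressively more expensive:

```python
def utilisation_penalty(current, target=0.95):
    """Assumed form U_m / U_target: penalty grows as datacentre m
    approaches (and exceeds 1.0 past) its target utilisation."""
    return current / target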
Time PenaltyOptional
Penalty decreases based on length of time in the queue
Putting it togetherSet a cost threshold above which jobs don’t run
Tasks will get dispatched to the datacentre with the lowest cost
Thresholding can ensure jobs wait for optimum resources without consuming all offers
Where were we?
Framework
System
What’s Next?
Peer to Peer vs Hierarchical
Get involved!Proposal for federated clusters:
https://docs.google.com/document/d/1U4IY_ObAXUPhtTa-0Rw_5zQxHDRnJFe5uFNOQ0VUcLg/edit?usp=sharing
Federated Marathon:
https://github.com/schibsted/triathlon
Current Discussion (favouring hierarchical design):
We’re [email protected]