![Page 1: OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System](https://reader038.vdocuments.us/reader038/viewer/2022110310/55a5fbaa1a28abcd738b45ea/html5/thumbnails/1.jpg)
Dr. Bernd Mathiske, Senior Software Architect
Mesosphere
Why the Datacenter needs an Operating System
Bringing Google-Scale Computing to Everybody
A Slice of Google Tech Transfer History
2005: MapReduce -> Hadoop (Yahoo)
2007: Linux cgroups for lightweight isolation (Google)
2009: BigTable -> MongoDB
2009: “The Datacenter as a Computer” - Barroso, Hölzle (Google)
2009: Mesos - a distributed operating system kernel (UC Berkeley)
2010: Large scale production Mesos deployment (Twitter)
since 2010: Many more frameworks and quite a few meta-frameworks
Notable Operating System Developments
Single-something => multi-something: user, tasking, threading, core, …
More: bits, memory, storage, bandwidth…
OS virtualization => lightweight virtualization (cgroups, LXCs, jails, …)
Packaging => containers (docker, rkt, lmctfy, …)
Static libraries => dynamic libraries => static libraries
Cluster Operating Systems (Hardware Clustering)
Researched since the 1980s
Trying to provide (the illusion of) a single system image
Aiming at HA, load balancing, location transparency (e.g. for storage)
Many systems: Amoeba, ChorusOS, GLUnix, Hurricane, MOSIX, Plan9, RHCS, Spring, Sprite, Sumo, QNX, Solaris MC, UnixWare, VAXclusters, …
Relatively low scale (up to 100s of nodes)
Complicated to manage, less dynamic than software clustering
From HPC Grid to Enterprise Cloud
Condor, LSF, Maui, Moab, Quartz, SLURM, …
Typically for batch jobs
Also cover services => SOA => more job schedulers
=> grid computing => grid middleware … => cloud stacks
From Server Virtualization to App Aggregation
Client-Server Era: small apps, big servers => server virtualization (many apps on one server)
Cloud Era: big apps, small servers => app aggregation (one app across many servers)
Cloud Computing
SaaS: Salesforce demonstrated success, then many followed
PaaS: Deis, Dotcloud, OpenShift, Heroku, Pivotal, Stackato, …
IaaS: AWS, Azure, DigitalOcean, GCE…
Private cloud stacks including IaaS: Eucalyptus, CloudStack, Joyent, OpenStack, SmartCloud, vSphere, …
Datacenter
✴ A facility used to house computer systems and associated components (e.g. networking, storage, cooling, sensors)
✴ In this talk we focus on how to manage and use a single production cluster of networked computers in a datacenter
✴ Such clusters range in size from 10s to 10000s of nodes
✴ Why should we and how can we end up with just one production cluster?
Datacenter Services
✴ LAMP (Linux, Apache, MySQL, PHP) or similar
✴ MEAN (MongoDB, Express.js, Angular.js, Node.js) or similar
✴ Cassandra, ElasticSearch, Exelixi, Hadoop, Hypertable, Jenkins, Kafka, MPI, Spark, Storm, SSSP, Torque, …
✴ Private PaaS: Deis, …
✴ …
Operate your Laptop like your Datacenter?
From Static Partitioning to Elastic Sharing
[Figure: Static Partitioning: separate WEB, HADOOP, and CACHE clusters, each with WASTED capacity below 100%. Elastic Sharing: WEB, HADOOP, and CACHE share one cluster, leaving FREE capacity instead of waste.]
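The utilization argument behind static partitioning versus elastic sharing can be made concrete with a small calculation. The sketch below assumes three workloads with made-up per-time-slot demand (in nodes; the numbers are illustrative, not from the talk): a statically partitioned setup needs one cluster per workload sized for that workload's own peak, while a single shared cluster only needs to cover the peak of the combined demand.

```java
import java.util.Arrays;

// Sketch: compare cluster sizes under static partitioning vs. elastic sharing.
// demand[w][t] = nodes needed by workload w in time slot t (illustrative numbers only).
class PartitioningVsSharing {

    // Static partitioning: each workload gets a dedicated cluster sized for its own peak.
    static int staticPartitionSize(int[][] demand) {
        int total = 0;
        for (int[] workload : demand) {
            total += Arrays.stream(workload).max().orElse(0);
        }
        return total;
    }

    // Elastic sharing: one cluster only has to cover the peak of the summed demand.
    static int sharedClusterSize(int[][] demand) {
        int peak = 0;
        for (int t = 0; t < demand[0].length; t++) {
            int sum = 0;
            for (int[] workload : demand) {
                sum += workload[t];
            }
            peak = Math.max(peak, sum);
        }
        return peak;
    }
}
```

For example, with hypothetical demands web {8, 4, 2, 6}, Hadoop {2, 6, 10, 4}, and cache {4, 4, 3, 5}, static partitioning needs 8 + 10 + 5 = 23 nodes, whereas the combined demand never exceeds 15 nodes, because the workloads peak at different times.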
Software Clustering
Layer between node OS and application frameworks
Scale
Multi-tenancy
High availability
Available Open Source Components
✴ 2-level scheduler: Apache Mesos
✴ Meta-frameworks / schedulers: Aurora, Chronos, Marathon, Kubernetes, Swarm, …
✴ Service discovery: Consul, HAProxy, Mesos DNS, …
✴ Highly available configuration: zk, etcd, …
✴ Storage: HDFS, Ceph, …
✴ Node OSs: lots of Linux variants
✴ Lots of app frameworks: Spark, Storm, Cassandra, Kafka, …
2-Level Scheduling
Scale: from 1 node to at least 10000s of nodes
Optimizing resource management
End-to-end principle: “application-specific functions ought to reside in the end nodes of a network rather than intermediary nodes”
-> Requirement for general multi-tenancy
-> Requirement for having only one production cluster
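To make the division of labor in 2-level scheduling concrete, here is a toy simulation in plain Java (no Mesos dependency; all class and method names are hypothetical). Level 1, the allocator, decides which framework is offered a node's free CPUs first; level 2, each framework, decides how much of the offer to use for its own tasks. The least-allocated-first ordering is a deliberately simplified stand-in for a fair-sharing policy, not the actual Mesos allocator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Level 2: a framework decides what to do with an offer (hypothetical interface).
interface Framework {
    String name();

    // Returns how many of the offered CPUs the framework uses (0 = decline).
    double consider(String node, double offeredCpus);
}

// A framework that wants a fixed number of equally sized tasks.
class FixedTaskFramework implements Framework {
    private final String name;
    private final double cpusPerTask;
    private int tasksLeft;

    FixedTaskFramework(String name, double cpusPerTask, int tasks) {
        this.name = name;
        this.cpusPerTask = cpusPerTask;
        this.tasksLeft = tasks;
    }

    public String name() { return name; }

    int tasksLeft() { return tasksLeft; }

    public double consider(String node, double offeredCpus) {
        int fits = (int) Math.floor(offeredCpus / cpusPerTask);
        int launch = Math.min(fits, tasksLeft); // only launch tasks it still needs
        tasksLeft -= launch;
        return launch * cpusPerTask;
    }
}

// Level 1: the allocator decides which framework sees an offer first.
class Allocator {
    // Offers each node's free CPUs to frameworks, least-allocated framework first.
    static Map<String, Double> run(Map<String, Double> freeCpus, List<? extends Framework> frameworks) {
        Map<String, Double> allocated = new LinkedHashMap<>();
        for (Framework f : frameworks) {
            allocated.put(f.name(), 0.0);
        }
        for (Map.Entry<String, Double> node : freeCpus.entrySet()) {
            double free = node.getValue();
            List<Framework> order = new ArrayList<>(frameworks);
            order.sort(Comparator.comparingDouble(f -> allocated.get(f.name())));
            for (Framework f : order) {
                if (free <= 0) {
                    break;
                }
                double used = f.consider(node.getKey(), free); // leftover CPUs go to the next framework
                allocated.merge(f.name(), used, Double::sum);
                free -= used;
            }
        }
        return allocated;
    }
}
```

The point of the split is that the allocator never needs to understand what a task is; it only decides who gets to look at resources, while each framework applies its own application-specific placement logic.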
How Mesos Works
[Diagram: an App runs on a Framework, whose Scheduler talks to the Mesos Master (several Master replicas, coordinated via zk/etcd); the Master talks to Slaves, whose Executors run the framework's Tasks.]
Ways to Run an Application
1. Vanilla job
• Employ meta-framework for invocation: Chronos, Aurora, Kubernetes, …
2. Application of an adapted framework
• Hadoop, Spark, Storm, ElasticSearch, Cassandra, Kafka, many more…
3. Non-adapted services
• Employ meta-framework for invocation: Marathon, Aurora, Kubernetes, …
• Provide (select) a service discovery solution
4. Program your own scheduler (and executor)
The Mesos Framework API
✴ Currently like internal Mesos communication:
• protobuf messages over HTTP
✴ Soon:
• JSON messages over HTTP (stream)
=> no need to link against the native Mesos library, and less to reimplement
=> from bindings for about a dozen programming languages to any language
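As an illustration of JSON messages over an HTTP stream, the sketch below assumes the shape of the v1 scheduler HTTP API that later Mesos releases shipped: a framework POSTs a SUBSCRIBE call as JSON to the master's /api/v1/scheduler endpoint and then reads events from the long-lived response, which is encoded in RecordIO format (each record is its byte length, a newline, and the JSON payload). Only the string handling is shown; the user and framework names are placeholders, and no JSON library or HTTP client is involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class MesosHttpSketch {

    // JSON body of a SUBSCRIBE call, to be POSTed to the master's /api/v1/scheduler
    // endpoint with Content-Type: application/json. No escaping is done, so the
    // (placeholder) user and framework name must not contain quotes or backslashes.
    static String subscribeBody(String user, String frameworkName) {
        return "{\"type\":\"SUBSCRIBE\",\"subscribe\":{\"framework_info\":"
                + "{\"user\":\"" + user + "\",\"name\":\"" + frameworkName + "\"}}}";
    }

    // Splits a fully buffered, well-formed RecordIO stream
    // ("<byte length>\n<payload>" repeated) into its JSON event payloads.
    static List<String> decodeRecordIO(byte[] stream) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (pos < stream.length) {
            int newline = pos;
            while (stream[newline] != '\n') {
                newline++;
            }
            int length = Integer.parseInt(
                    new String(stream, pos, newline - pos, StandardCharsets.US_ASCII));
            int start = newline + 1;
            records.add(new String(stream, start, length, StandardCharsets.UTF_8));
            pos = start + length;
        }
        return records;
    }
}
```

Because everything on the wire is plain JSON and length-prefixed text, a client like this needs nothing beyond an HTTP library, which is what makes "any language" feasible.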
How to implement a framework
✴ Scheduler interface: one half of 2-level scheduling
• The framework knows best when to do what with what kind of resources
• About a dozen callbacks, main functionality in 2 of them:
- receive resource offers
- receive task status updates
✴ Executor interface: task life-cycle management and monitoring
• Command line executor included in Mesos
• Docker executor included in Mesos
• Custom executors often not needed
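The status-update callback is where a scheduler learns that its work is done. A minimal bookkeeping sketch, assuming a local enum that mirrors a subset of the Mesos TaskState names (so it compiles without the Mesos jar) and a hypothetical TaskTracker class: a task counts as finished once it reports a terminal state (FINISHED, FAILED, KILLED, or LOST), and the framework as a whole is done once all expected tasks have.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Local stand-in mirroring a subset of Mesos TaskState names (the real enum is
// org.apache.mesos.Protos.TaskState); used so the sketch compiles on its own.
enum TaskState {
    TASK_STAGING, TASK_STARTING, TASK_RUNNING,
    TASK_FINISHED, TASK_FAILED, TASK_KILLED, TASK_LOST
}

// Hypothetical bookkeeping a scheduler could update from its status-update callback.
class TaskTracker {
    private static final Set<TaskState> TERMINAL = EnumSet.of(
            TaskState.TASK_FINISHED, TaskState.TASK_FAILED,
            TaskState.TASK_KILLED, TaskState.TASK_LOST);

    private final Map<String, TaskState> latest = new HashMap<>();
    private final int expectedTasks;

    TaskTracker(int expectedTasks) {
        this.expectedTasks = expectedTasks;
    }

    // Record the most recent state reported for a task.
    void observe(String taskId, TaskState state) {
        latest.put(taskId, state);
    }

    static boolean isTerminal(TaskState state) {
        return TERMINAL.contains(state);
    }

    // True once every expected task has reported a terminal state.
    boolean done() {
        if (latest.size() < expectedTasks) {
            return false;
        }
        for (TaskState state : latest.values()) {
            if (!isTerminal(state)) {
                return false;
            }
        }
        return true;
    }

    // Tasks that ended without success; a real scheduler might relaunch these.
    long unsuccessfulCount() {
        return latest.values().stream()
                .filter(s -> isTerminal(s) && s != TaskState.TASK_FINISHED)
                .count();
    }
}
```

Keeping only the latest state per task makes the check idempotent: a duplicate or late status update for the same task simply overwrites the previous entry.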
Scheduler SPI (implemented by Framework)
```java
package org.apache.mesos;

import java.util.List;
import org.apache.mesos.Protos.*; // FrameworkID, MasterInfo, Offer, OfferID, TaskStatus, ...

public interface Scheduler {
  void registered(SchedulerDriver driver, FrameworkID frameworkId, MasterInfo masterInfo);
  void reregistered(SchedulerDriver driver, MasterInfo masterInfo);
  void resourceOffers(SchedulerDriver driver, List<Offer> offers);
  void offerRescinded(SchedulerDriver driver, OfferID offerId);
  void statusUpdate(SchedulerDriver driver, TaskStatus status);
  void frameworkMessage(SchedulerDriver driver, ExecutorID executorId, SlaveID slaveId, byte[] data);
  void disconnected(SchedulerDriver driver);
  void slaveLost(SchedulerDriver driver, SlaveID slaveId);
  void executorLost(SchedulerDriver driver, ExecutorID executorId, SlaveID slaveId, int status);
  void error(SchedulerDriver driver, String message);
}
```
Minimal Scheduler Implementation
```java
// TaskGenerator is an application-specific helper, not part of Mesos;
// _filters (a Filters instance) is assumed to be declared in the elided part.
class MyFrameworkScheduler implements Scheduler {
  …
  private TaskGenerator _taskGen;

  public void resourceOffers(SchedulerDriver driver, List<Offer> offers) {
    if (_taskGen.doneCreatingTasks()) {
      // Nothing left to launch: return all offered resources.
      for (Offer offer : offers) {
        driver.declineOffer(offer.getId());
      }
    } else {
      // Turn each offer into zero or more tasks and launch them on it.
      for (Offer offer : offers) {
        List<TaskInfo> taskInfos = _taskGen.generateTaskInfos(offer);
        driver.launchTasks(offer.getId(), taskInfos, _filters);
      }
    }
  }

  public void statusUpdate(SchedulerDriver driver, TaskStatus status) {
    _taskGen.observeTaskStatusUpdate(status);
    if (_taskGen.done()) {
      driver.stop();
    }
  }
  …
}
```
The Developer’s Perspective
✴ Focus on application logic, not datacenter structure
✴ Avoid networking-related code
✴ Reuse built-in fault-tolerance and high availability
✴ Reuse distributed (infrastructure) frameworks (e.g., storage)
=> API, SDK for datacenter services
The Operations Engineer’s Perspective
✴ Ease of deployment/management
✴ Uniformity of deployment/management
✴ Hardware utilization rate
✴ Scaling up as business grows
✴ Scaling out sporadically
✴ Cost and time for moving to a different datacenter
✴ High availability and fault-tolerance of system services
✴ Monitoring
✴ Troubleshooting
Necessary Multi-Tenancy Features
Task containerization
Resource isolation
Resource and task attributes
Static and dynamic resource reservations
Reservation levels
Meta-frameworks
Dynamic scheduler update and reconfiguration
Security
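Static and dynamic resource reservations can be pictured as a per-node map from role to resources. The sketch below is a hypothetical model, not Mesos code: it follows the Mesos convention that role "*" means unreserved, offers a framework the unreserved resources plus those reserved for its own role, and models a dynamic reservation as moving unreserved CPUs to a role at runtime (a static reservation would instead be part of the node's initial configuration).

```java
import java.util.Map;

// Hypothetical model of role-based reservations on a single node.
// Keys are roles; "*" holds unreserved CPUs (Mesos convention); values are CPUs.
class ReservationSketch {

    // CPUs a framework registered under frameworkRole could be offered on this node.
    static double offerableCpus(Map<String, Double> cpusByRole, String frameworkRole) {
        double total = cpusByRole.getOrDefault("*", 0.0);
        if (!frameworkRole.equals("*")) {
            total += cpusByRole.getOrDefault(frameworkRole, 0.0);
        }
        return total;
    }

    // Dynamic reservation: move unreserved CPUs to a role at runtime, if enough are free.
    static boolean reserve(Map<String, Double> cpusByRole, String role, double cpus) {
        double free = cpusByRole.getOrDefault("*", 0.0);
        if (cpus > free) {
            return false;
        }
        cpusByRole.put("*", free - cpus);
        cpusByRole.merge(role, cpus, Double::sum);
        return true;
    }
}
```

For instance, on a node with 4 unreserved CPUs and 8 reserved for a hypothetical "analytics" role, an analytics framework can be offered 12 CPUs while any other framework sees only the 4 unreserved ones.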
Desirable Multi-Tenancy Features
Optimistic offers
Oversubscription
Task preemption, migration, resizing, reconfiguration
Rate limiting
Auto-scaling => hybrid cloud
Infrastructure frameworks
Using Docker Containers in Mesos
Mesos Master Server
Process tree: init -> mesos-master; init -> marathon
Mesos Slave Server
Process tree: init -> docker -> lxc -> (user task, under container init system); init -> mesos-slave -> /var/lib/mesos/executors/docker -> docker run …
Docker Registry
When a user requests a container…
Mesos, LXC, and Docker are tied together for launch
Other Schedulers as Meta-Frameworks in a 2-level Scheduler
YARN => https://github.com/mesos/myriad
Kubernetes => https://github.com/mesosphere/kubernetes-mesos
Swarm => Swarm on Mesos (new project)
=> run everything in one cluster
Myriad: Virtual YARN Clusters on Mesos
◦ POST /api/clusters: Registers a new YARN cluster
◦ GET /api/clusters: Lists all registered clusters
◦ GET /api/clusters/{clusterId}: Lists the cluster with {clusterId}
◦ PUT /api/clusters/{clusterId}/flexup: Expands the size of the cluster with {clusterId}
◦ PUT /api/clusters/{clusterId}/flexdown: Shrinks the size of the cluster with {clusterId}
◦ DELETE /api/clusters/{clusterId}: Unregisters the YARN cluster with {clusterId} and kills all of its nodes
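[Diagram: Mesos master and slave, the YARN ResourceManager (RM) with the Myriad Scheduler, and the Myriad Executor. Step 1, "Launch NodeManager", starts a YARN NodeManager (NM) on a Mesos slave after a flexup. Resource figures shown: 2.5 CPU / 2.5 GB and 2.0 CPU / 2.0 GB; YARN containers C1 and C2.]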
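The Myriad endpoints listed above are plain REST calls, so any HTTP client can drive them. A sketch using the JDK's java.net.http API (Java 11+) that only builds the requests; the base URL and cluster id are placeholders, and because the slide does not specify request bodies, flexup and flexdown are sent without one here (an assumption).

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds requests for the Myriad cluster endpoints listed on the slide.
// baseUrl is a placeholder such as "http://myriad-host:8192"; cluster ids are
// assumed to be URL-safe, so they are not encoded.
class MyriadRequests {
    private final String baseUrl;

    MyriadRequests(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    private HttpRequest.Builder at(String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path));
    }

    HttpRequest listClusters() {
        return at("/api/clusters").GET().build();
    }

    HttpRequest getCluster(String clusterId) {
        return at("/api/clusters/" + clusterId).GET().build();
    }

    // No request body: the slide does not say whether flexup/flexdown take one.
    HttpRequest flexUp(String clusterId) {
        return at("/api/clusters/" + clusterId + "/flexup")
                .PUT(HttpRequest.BodyPublishers.noBody()).build();
    }

    HttpRequest flexDown(String clusterId) {
        return at("/api/clusters/" + clusterId + "/flexdown")
                .PUT(HttpRequest.BodyPublishers.noBody()).build();
    }

    HttpRequest unregister(String clusterId) {
        return at("/api/clusters/" + clusterId).DELETE().build();
    }
}
```

To actually send a request, pass it to a client, e.g. `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.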
Kubernetes in Mesos
Portability
[Diagram: Framework Apps, Meta-Frameworks, Vanilla Apps, and Infrastructure Frameworks run on Mesos, which in turn runs on a public cloud, a managed cloud, or your own DC.]
The Application User’s Perspective
✴ Focus on apps, services, parameters, results
✴ Avoid dealing with datacenter operations/management
✴ Avoid adjusting system settings
✴ High availability
✴ Throughput
✴ Responsiveness
✴ Predictability
✴ Run everything I need
✴ Return on and safety of investment
The Datacenter is the new form factor
✴ 2-level scheduler => single production cluster
✴ scalability and portability => avoiding hardware/cloud lock-in
✴ built-in container support => running containers at scale
✴ automation => operator efficiency
✴ repositories => apps/services readily available
✴ API and SDK => productive/quick app/service development
Above the Clouds
with Open Source!