1/40
Execution Environment for On-Demand Computing Services Based on Shared Clusters
PhD thesis, Grenoble University
By Rodrigue Chakode (LIG/INRIA, Mescal team)
Advisors: Jean-François Méhaut, Maurice Tchuenté
2/40
Cloud Computing in a Nutshell
◉ Provides computing capabilities as services
◉ Free or commercial services accessible over the network
◉ On-demand, elastic access with utility billing
– Customers (users of the service) only pay for what they use, i.e. pay-as-you-go
– Requests to scale capacity up or down should be satisfied quickly
◉ Services set up transparently for customers
– They do not have to care about how the service is provided
3/40
Context Statement on Cloud Computing
◉ Various kinds of cloud services
– Infrastructure-as-a-Service, Platform-as-a-Service, Software-as-a-Service, Data-as-a-Service, Translation-as-a-Service...
– Almost everything could be a service (XaaS)
◉ Requires a suitable computing infrastructure
– Servers, storage, network fabrics, cooling systems...
◉ May need significant investments
– Out of reach for many small and medium businesses (SMBs)
– Market currently dominated by the biggest organizations
Introduction
4/40
Challenges for HPC
◉ Many software applications require intensive computing capabilities
– E.g. Electronic Design Automation (EDA) applications (Ciloe Project)
– Integrated circuits need to be simulated before manufacturing
◉ Computing architectures are increasingly parallel
– SMP, NUMA, GPU, Cluster... and soon many-core architectures
◉ HPC applications run on clusters of multicore nodes (SMP/NUMA)
◉ Such clusters are also expensive
Example of a cluster. Credit: CEA
5/40
Bring HPC Services into Clouds
◉Services requiring intensive computations
◉ Services provided from a mutualized (shared) cluster
– Cluster jointly funded by several businesses
– Each business providing its own service
– Cluster's resources shared among the services
◉ Study conducted in the context of an industrial collaboration
– The Ciloe Project [http://ciloe.minalogic.net]
– Three small businesses publishing EDA applications are involved
6/40
Outline
◉ Introduction
◉ Problem statement
◉ Background
– Existing SaaS clouds and their related RM issues
– Survey on existing resource sharing techniques
◉ Contributions
– Overview: Scheduling Approach and Execution Model
– Architecture Model and Scheduling Strategy
– Prototyping
◉ Experimental evaluation
– Evaluation Protocol
– Results
◉ Conclusion & perspectives
7/40
Resource Management for HPC SaaS Services
◉ What is a service?
– Processes customer data with a specific application
– Input specifies an application and the data
– Output is retrieved after the computation
– No further interaction is needed
Problem Statement
8/40
Related Research Issues
◉Data Management
◉Resilience and Fault Tolerance
◉Security and privacy
◉Resource Management
9/40
Scheduling Problems
◉ Share the cluster's resources among the services
– According to the investments of the different businesses
◉ Maximize the use of resources
– Use idle resources to run pending requests
– Run miscellaneous tasks on idle resources in a best-effort way
◉ Minimize the impact of selfish behavior
– A business may under-invest while needing a lot of resources
10/40
Resource Allocation for On-demand Services
◉ Run requests dynamically
– Resources should be allocated dynamically
– Allocated resources should be freed up automatically once a request completes
– Handle input/output data transparently
◉ Resource partitioning must be considered
– Modern computing nodes have several cores
– Some tasks need fewer cores than a node provides
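Core-granularity partitioning can be illustrated with a minimal first-fit sketch in Python. It assumes each request declares how many cores it needs. `Node`, `first_fit` and `release` are hypothetical names. First-fit is only one simple placement policy and is not necessarily the one used in this work.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A multicore node (hypothetical model); free_cores tracks unallocated cores."""
    name: str
    total_cores: int
    free_cores: int = field(init=False)

    def __post_init__(self) -> None:
        self.free_cores = self.total_cores


def first_fit(nodes: List[Node], cores_needed: int) -> Optional[str]:
    """Allocate cores on the first node that has enough of them free.

    Returns the node name, or None if no node can host the task right now,
    in which case the request stays pending.
    """
    for node in nodes:
        if node.free_cores >= cores_needed:
            node.free_cores -= cores_needed
            return node.name
    return None


def release(nodes: List[Node], node_name: str, cores: int) -> None:
    """Give back the cores of a completed task."""
    for node in nodes:
        if node.name == node_name:
            node.free_cores += cores
            return
    raise KeyError(node_name)


# Two 8-core nodes: two 3-core tasks share n1, a 4-core task goes to n2.
cluster = [Node("n1", 8), Node("n2", 8)]
print(first_fit(cluster, 3), first_fit(cluster, 3), first_fit(cluster, 4))  # n1 n1 n2
```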
11/40
Outline
◉ Introduction
◉ Problem statement
◉ Background
– Existing SaaS clouds and their related RM issues
– Survey on existing resource sharing techniques
◉ Contributions
– Overview: Scheduling Approach and Execution Model
– Architecture Model and Scheduling Strategy
– Prototyping
◉ Experimental evaluation
– Evaluation Protocol
– Results
◉ Conclusion & perspectives
12/40
Background on Existing SaaS Clouds
◉ Target office and collaborative applications
– E.g. Google Docs, Salesforce, Office365...
– Require interactivity
◉ SaaS cloud as a layer on top of a PaaS
– PaaS can rely on an IaaS layer
– IaaS enables on-demand resource allocation
• Virtualization plays an important role
◉ Resources belong to a single organization
Background on SaaS Clouds
13/40
Services for Intensive Computations
◉ No need for interactivity
◉ Requires high dynamicity and transparency
• Resources allocated when a task starts
• Resources released once a task completes
◉ Mutualized resources
=> Resources must be shared among the services
14/40
Scheduling services on mutualized resources
◉ Raises conflicting objectives
– Fairness toward the service providers
– Efficient use of resources
◉ Prioritizing one objective penalizes the other
=> A tradeoff is required
Background on resource management
15/40
Common resource scheduling strategies
◉ First-come, first-served (FCFS)
+ Fair to users
– Inefficient in terms of utilization
– May be unfair to some businesses in our context
◉ FCFS with backfilling (EASY/Conservative)
+ Improves utilization
– May significantly delay the biggest tasks
+ Possible optimization with conservative backfilling
– Remains unfair in our context
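A simplified EASY backfilling pass can be sketched in Python as follows. The sketch assumes a single pool of identical processors, exact user runtime estimates, and jobs that each fit in the machine. `Job` and `easy_backfill` are hypothetical names. Only the blocked head job receives a reservation. Conservative backfilling would instead reserve for every queued job.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Job:
    name: str
    procs: int      # processors requested
    runtime: float  # user-supplied runtime estimate, assumed exact here


def easy_backfill(now: float, total_procs: int,
                  running: List[Tuple[float, int]], queue: List[Job]) -> List[str]:
    """Return the names of queued jobs to start at time `now`.

    running: (end_time, procs) of the jobs already executing.
    queue:   waiting jobs in FCFS (arrival) order.
    """
    running = list(running)
    free = total_procs - sum(p for _, p in running)
    pending = list(queue)
    started = []

    # 1. Plain FCFS: start jobs from the head of the queue while they fit.
    while pending and pending[0].procs <= free:
        job = pending.pop(0)
        free -= job.procs
        running.append((now + job.runtime, job.procs))
        started.append(job.name)
    if not pending:
        return started

    # 2. The head job is blocked: find the earliest time ("shadow time") at
    #    which enough processors will be released for it, and reserve them.
    head = pending[0]
    avail, shadow = free, now
    for end, procs in sorted(running):
        avail += procs
        if avail >= head.procs:
            shadow = end
            break
    extra = avail - head.procs  # processors the head job leaves unused at shadow time

    # 3. Backfill: a later job may start now only if it cannot delay the head job.
    for job in pending[1:]:
        if job.procs > free:
            continue
        if now + job.runtime <= shadow:
            pass                    # finishes before the reservation begins
        elif job.procs <= extra:
            extra -= job.procs      # overlaps the reservation on spare processors
        else:
            continue
        free -= job.procs
        started.append(job.name)
    return started
```

For example, on 8 processors with a running 6-processor job ending at t=10, a queued 4-processor job blocks. A short 2-processor job behind it can still start immediately, because it finishes before t=10.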
16/40
How Resources are Assigned to Tasks
◉ Simple assignment strategies
– Greedy and round-robin algorithms
◉ Assignments guided by performance requirements
– Notion of matchmaking (affinities between resources and tasks)
◉ Prioritization
– Higher-priority tasks get access to resources first
• Preemption can be introduced
=> Best-effort tasks: tasks that run only on idle resources
◉ Reservation and leasing
– Resources are allocated for a given time slot
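Two of these assignment styles can be sketched in a few lines of Python: round-robin, and a basic form of matchmaking that filters nodes by their attributes. The function names and the attribute keys (`cores`, `ram_gb`) are hypothetical and chosen only for illustration.

```python
from itertools import cycle
from typing import Dict, List


def round_robin(tasks: List[str], nodes: List[str]) -> Dict[str, str]:
    """Assign tasks to nodes in turn, regardless of their current load."""
    turn = cycle(nodes)
    return {task: next(turn) for task in tasks}


def matchmake(requirements: Dict[str, float],
              nodes: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the nodes whose attributes meet every requirement of a task."""
    return [name for name, attrs in nodes.items()
            if all(attrs.get(key, 0) >= value for key, value in requirements.items())]


print(round_robin(["t1", "t2", "t3"], ["n1", "n2"]))  # {'t1': 'n1', 't2': 'n2', 't3': 'n1'}
print(matchmake({"cores": 4, "ram_gb": 8},
                {"n1": {"cores": 8, "ram_gb": 8}, "n2": {"cores": 2, "ram_gb": 16}}))  # ['n1']
```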
17/40
Common resource sharing strategies
◉ Static sharing (partitioning)
+ Fair and easy to set up
– Inefficient in terms of utilization in our context
◉ Fair-sharing (no partitioning + dynamic priorities)
+ Tradeoff between fairness and utilization
– May still lead to unfair situations in our context
[Figure: two diagrams of resources R1 to R7 shared among Business 1, Business 2 and Business 3]
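The contrast between the two strategies can be sketched in Python. The sketch assumes three businesses with shares of 2/7, 2/7 and 3/7 of seven resources. The priority formula "share minus fraction of recent usage" is an illustrative assumption. Production fair-share schedulers typically use other formulas, often with usage decay. `static_quotas` and `fair_share_order` are hypothetical names.

```python
from fractions import Fraction
from typing import Dict, List


def static_quotas(shares: Dict[str, Fraction], total: int) -> Dict[str, Fraction]:
    """Static sharing: each business is confined to a fixed slice of the resources."""
    return {business: share * total for business, share in shares.items()}


def fair_share_order(shares: Dict[str, Fraction], usage: Dict[str, int]) -> List[str]:
    """Fair-sharing: serve first the businesses furthest below their share.

    Priority = share - fraction of recent usage (an illustrative formula).
    It changes as usage evolves, hence "dynamic priorities".
    """
    total_usage = sum(usage.values()) or 1
    priority = {b: s - Fraction(usage.get(b, 0), total_usage) for b, s in shares.items()}
    return sorted(shares, key=lambda b: priority[b], reverse=True)


shares = {"B1": Fraction(2, 7), "B2": Fraction(2, 7), "B3": Fraction(3, 7)}
print(static_quotas(shares, 7))
print(fair_share_order(shares, {"B1": 10, "B2": 0, "B3": 10}))  # ['B2', 'B3', 'B1']
```

Static quotas never change, so resources left idle by one business stay idle. The fair-share ordering instead lets any business use idle resources while favoring the one that has consumed least relative to its share.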
18/40
Partitioning Individual Nodes
◉ Requires isolation among tasks
– A task must not access resources allocated to another task
◉ Isolation with containers (cgroups, cpusets, OpenVZ, LXC...)
+ Low-level partitioning with low overhead
=> good performance
– Inflexible, since hard to manage dynamically
◉ Isolation with virtual machines (VMs)
+ High-level partitioning
=> High flexibility in terms of automation
– Possible performance overhead
― Mitigated by several optimizations (e.g. HVM, paravirtualization, PCI passthrough...)
19/40
Synthesis on Partitioning Resources
◉ Virtual machines offer useful features
– Partitioning each individual node with strong isolation
– Allocating and freeing up resources dynamically
– Suspending and restarting best-effort tasks
◉ Powerful, proven VM management tools
– Handle VMs on individual nodes
– Xen, KVM, ESXi, Hyper-V...
– Handle VMs in distributed environments
• OpenNebula, Eucalyptus, OpenStack...
― Designed for IaaS clouds
20/40
Problems to Address With VMs
◉ Reduce the performance overhead
– Generic optimizations
• HVM, PCI Passthrough
– Solution-specific optimizations
• Paravirtualization (Xen, Hyper-V)
• Virtio (KVM, Xen)
◉ Allocate custom VMs dynamically in distributed environments
– Contextualization (OpenNebula) allows each VM to be customized at boot time
21/40
Shortcomings of Existing Solutions with Respect to Our Aims
◉ On-demand HPC services on a mutualized cluster
– Existing SaaS clouds focus on collaborative or office applications
• Resources owned by a single organization
◉ Existing resource sharing strategies do not suit our needs
=> New approaches must be designed
◉ Contributions
– Scheduling strategy for sharing mutualized resources
– Architecture for on-demand HPC services
– Prototyping for evaluation
22/40
Outline
◉ Introduction
◉ Problem statement
◉ Background
– Existing SaaS clouds and their related RM issues
– Survey on existing resource sharing techniques
◉ Contributions
– Overview: Scheduling Approach and Execution Model
– Architecture Model and Scheduling Strategy
– Prototyping
◉ Experimental evaluation
– Evaluation Protocol
– Results
◉ Conclusion & perspectives
23/40
Ideas for the resource sharing strategy
◉ Combine the advantages...
– of static sharing, where fairness is easy to maintain
– and of fair-sharing, which improves utilization
◉ Enable elasticity in resource sharing
– A business may use more resources than its investment entitles it to:
• When the task causing this situation has a duration of at most an acceptable threshold, denoted D
• Or when the task is a best-effort task
=> Limits the impact of selfish behavior by some businesses
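The elastic sharing rule can be sketched in Python as an admission decision. This is a minimal sketch under stated assumptions. Each task carries an expected duration that is assumed to be known in advance. Quotas are the business's investment fraction times the total number of cores. `Task`, `admit` and `threshold_d` (standing for D) are hypothetical names, not taken from the actual prototype.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, Optional


@dataclass
class Task:
    business: str
    cores: int
    duration: float            # expected duration, assumed known in advance
    best_effort: bool = False


def admit(task: Task, used: Dict[str, int], shares: Dict[str, Fraction],
          total_cores: int, free_cores: int, threshold_d: float) -> Optional[str]:
    """Decide whether a task can start now under the elastic sharing rule.

    Returns "regular", "best-effort", or None (the task stays queued).
    """
    if task.cores > free_cores:
        return None                          # no idle resources for it
    if task.best_effort:
        return "best-effort"                 # may exceed the quota, but is preemptible
    quota = shares[task.business] * total_cores
    if used.get(task.business, 0) + task.cores <= quota:
        return "regular"                     # within the business's investment
    if task.duration <= threshold_d:
        return "regular"                     # short overrun: tolerated
    return None                              # would overrun for too long
```

With `threshold_d = 0` and no best-effort tasks, `admit` only accepts tasks within quota, so the rule reduces to static sharing.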
Contributions: Overview
24/40
Handling Requests Dynamically
◉ Encapsulate each task within a virtual machine (VM)
– Eases node partitioning and enables dynamic allocation
◉ Provide a dedicated SaaS Manager
– Implements the scheduling strategy to address the resource sharing issues
– Handles the allocation and destruction of VMs
◉ Exploit VM contextualization
– VM created, customized and started dynamically
• VM configured to launch the task once booted
– VM automatically destroyed once the task is completed
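The per-request VM lifecycle can be sketched in Python as follows. A VM is created with contextualization data, runs its task, and is always destroyed afterwards. `VirtualInfrastructureManager`, `run_request`, `FakeVIM` and the context keys `APP` and `INPUT` are all hypothetical. They only mimic the role a VIM such as OpenNebula plays here and are not its actual API.

```python
from typing import Dict, List, Protocol, Tuple


class VirtualInfrastructureManager(Protocol):
    """Hypothetical VIM interface; not the actual OpenNebula API."""
    def create_vm(self, image: str, context: Dict[str, str]) -> int: ...
    def wait_until_done(self, vm_id: int) -> None: ...
    def destroy_vm(self, vm_id: int) -> None: ...


def run_request(vim: VirtualInfrastructureManager, image: str,
                app: str, input_path: str) -> int:
    """Serve one request in its own VM: create, run, then always destroy."""
    # Contextualization data, read by the VM at boot so it launches the task itself.
    context = {"APP": app, "INPUT": input_path}
    vm_id = vim.create_vm(image, context)
    try:
        vim.wait_until_done(vm_id)
    finally:
        vim.destroy_vm(vm_id)   # resources are freed even if the task fails
    return vm_id


class FakeVIM:
    """In-memory stand-in that just records the calls it receives."""
    def __init__(self) -> None:
        self.log: List[Tuple] = []
        self.next_id = 0

    def create_vm(self, image: str, context: Dict[str, str]) -> int:
        self.next_id += 1
        self.log.append(("create", self.next_id, context["APP"]))
        return self.next_id

    def wait_until_done(self, vm_id: int) -> None:
        self.log.append(("done", vm_id))

    def destroy_vm(self, vm_id: int) -> None:
        self.log.append(("destroy", vm_id))


vim = FakeVIM()
run_request(vim, "base.img", "bodytrack", "/data/job")
print(vim.log)  # [('create', 1, 'bodytrack'), ('done', 1), ('destroy', 1)]
```

The `try`/`finally` is the design point: the VM is destroyed whether the task succeeds or fails, so allocated resources are always freed.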
25/40
Architecture Model
◉ The SaaS Manager sits on top of the cluster
– Relies on a virtual infrastructure manager (VIM)
– The VIM relies on hypervisors
◉ Existing tools can be reused
– Avoids rewriting existing features
– Benefits from the features of powerful, proven tools
Contributions: Architecture Model
26/40
Design Driven by Openness, Performance and Interoperability
◉ OpenNebula handles the VMs
– Including contextualization
◉ Xen manages VMs on each individual node
– Exploits paravirtualization for better performance
◉ The components are coupled through open APIs
– Ensures better interoperability
27/40
Resource Sharing Strategy: Case Study
◉ A situation with three businesses B1, B2 and B3
– B1 (green tasks) invested in 2/7 of the resources (R1, R2, ..., R7)
– B2 (red tasks) invested in 2/7
– B3 (blue tasks) invested in 3/7
◉ In the figure, each task stands for its related VM
Contributions: Resource Management Strategy
[Figure: tasks t1 to t6 and the queue of pending tasks]
28/40
Resource Sharing Strategy: Example 1
◉ Assume the durations of t1 and t5 are <= D (the chosen duration threshold)
– B1 and B3 are using shares of the resources greater than their investments
– Each of them exceeds its investment by an additional 1/14 of the resources
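Reading each investment as a fraction of the seven resources, the ratios of this example can be checked directly. The last line relies only on the three shares summing to 1. It bounds what B2 can be using at that moment.

```latex
\begin{aligned}
\text{B1:}\quad & \tfrac{2}{7} + \tfrac{1}{14} = \tfrac{5}{14} \\
\text{B3:}\quad & \tfrac{3}{7} + \tfrac{1}{14} = \tfrac{7}{14} = \tfrac{1}{2} \\
\text{B2 (at most):}\quad & 1 - \tfrac{5}{14} - \tfrac{7}{14} = \tfrac{2}{14} = \tfrac{1}{7} < \tfrac{2}{7}
\end{aligned}
```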
[Figure: placement of tasks t1 to t6 in Example 1, with the queue of pending tasks]
29/40
Resource Sharing Strategy: Example 2
◉ No task has a duration <= D, but task t2 is a best-effort task
– B1 is using 1/7 of the resources beyond its investment
– t2 can be suspended at any time
[Figure: placement of tasks t1 to t6 in Example 2, with the queue of pending tasks]
30/40
About the Implementation
◉ Relies on resource leasing principles
– A lease allocates a virtual machine to run a task
– The duration of a lease depends on the related task
• Its duration and its type (best-effort or not)
◉ Two kinds of leases, handled differently
– Non-preemptive leases
• Assigned to customer tasks
― Non-preemptive tasks
=> Resources only freed up at completion
– Preemptive leases
• Assigned to best-effort tasks
― VMs can be suspended and restarted later
=> No guarantee of completion
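The two kinds of leases can be sketched in Python. When a customer task needs cores, preemptive (best-effort) leases are suspended one by one until enough cores are free. Non-preemptive leases are never interrupted. The `Lease` class, the `make_room` function and the suspension order (list order) are hypothetical choices. In the actual system, suspending a VM would be delegated to the virtual infrastructure manager.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    task: str
    cores: int
    preemptive: bool          # True for best-effort tasks
    state: str = "running"    # "running" or "suspended"


def make_room(leases: List[Lease], free_cores: int, needed: int) -> Optional[List[str]]:
    """Suspend preemptive leases until `needed` cores are free.

    Non-preemptive leases are never touched: their resources are only freed
    when their task completes. Returns the names of the suspended tasks, or
    None (suspending nothing) if even all preemptive leases would not suffice.
    """
    candidates = [lease for lease in leases
                  if lease.preemptive and lease.state == "running"]
    if free_cores + sum(lease.cores for lease in candidates) < needed:
        return None
    suspended = []
    for lease in candidates:
        if free_cores >= needed:
            break
        lease.state = "suspended"   # its VM would be suspended and resumed later
        free_cores += lease.cores
        suspended.append(lease.task)
    return suspended
```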
31/40
Prototyping and Overview on Integration
◉ SVMSched (Smart Virtual Machine Scheduler)
– Drop-in replacement for OpenNebula's default scheduler
– Dedicated interfaces providing the SaaS abstraction
– Deals with allocating and freeing up VMs dynamically
– Implements the resource sharing strategy
– Supports contextualization data stored on Network File Systems
Contributions: Prototyping
32/40
Outline
◉ Introduction
◉ Problem statement
◉ Background
– Existing SaaS clouds and their related RM issues
– Survey on existing resource sharing techniques
◉ Contributions
– Overview: Scheduling Approach and Execution Model
– Architecture Model and Scheduling Strategy
– Prototyping
◉ Experimental evaluation
– Evaluation Protocol
– Results
◉ Conclusion & perspectives
33/40
Evaluation Protocol
◉ Evaluation of application performance
– Time to set up the VM
– Performance overhead induced by virtualization
◉ Study of the scheduling strategy
– Does it behave well regarding fairness and utilization?
– If not, how can it be improved?
◉ Experimental conditions
– Grid'5000 nodes, each with 2x4 cores at 2.27 GHz and 8 GB of RAM
– Xen 3.4.2 and OpenNebula 1.4.2 along with VM images of 500MB
– Applications from the Parsec Benchmark (BodyTrack, Blackscholes, Freqmine)
Evaluation
34/40
Virtualization Performance
◉ Full VMs perform slightly better than contextualized ones
◉ High overhead for applications requiring heavy disk I/O
◉ VMs perform better than native machines
=> for concurrent tasks requiring heavy memory I/O
◉ Contextualized VMs require a low, constant setup time
– ~15 s (<5% of the duration of a 5-minute task) with a 500 MB image
◉ Full VMs: setup times grow linearly
35/40
Analyzing the scheduling strategy
◉ With a well-chosen threshold
– Businesses can benefit from the mutualization
– Selfish behavior is discouraged
– Best-effort tasks allow better utilization
◉ Mutualization brings no benefit when
– The threshold is not suitably chosen
– There are no best-effort tasks
– The strategy reduces to static sharing
36/40
Outline
◉ Introduction
◉ Problem statement
◉ Background
– Existing SaaS clouds and their related RM issues
– Survey on existing resource sharing techniques
◉ Contributions
– Overview: Scheduling Approach and Execution Model
– Architecture Model and Scheduling Strategy
– Prototyping
◉ Experimental evaluation
– Evaluation Protocol
– Results
◉ Conclusion & perspectives
37/40
Conclusion
◉ We studied and built an environment that enables HPC SaaS services on shared computing resources
– Design of an architecture model that relies on virtualization to execute on-demand requests
– Design of resource management algorithms that share resources fairly while maximizing their use
◉ A prototype was developed to evaluate our contributions experimentally
– Results showed the feasibility of our approach
– The prototype was integrated into the Ciloe Project deliverables
◉ We thus open the way to addressing the cost problem that heavily constrains SMBs needing HPC resources for their applications
Conclusion & Perspectives
38/40
Perspectives
◉ A model for predicting the duration of each task
– An approximation model based on reinforcement learning is envisioned
◉ An economic billing model
– Which parameters can invoicing take into account?
• Per-use costs of software licenses and computing resources + earnings
◉ Sizing the platform
– To give each business a clear view of its resource needs
39/40
About this Work
◉ Awards
– 1st Prize Grid'5000 Challenge, Reims 2011
◉ Book Chapter
– Rodrigue Chakode, Jean-François Méhaut, Blaise-Omer Yenke. Scheduling On-demand SaaS Services on a Shared Virtual Cluster. In Cloud Computing and Services Science, pages 259–276. ISBN 978-1-4614-2325-6, Springer-Verlag, April 2012.
◉ International conferences
– Rodrigue Chakode, Blaise-Omer Yenke, Jean-François Méhaut. Resource Management of Virtual Infrastructure for On-demand SaaS Services. In CLOSER 2011, International Conference on Cloud Computing and Services Science, pages 352–361. Netherlands, May 2011.
– Rodrigue Chakode, Jean-François Méhaut, François Charlet. High Performance Computing on Demand: Sharing and Mutualizing Clusters. In AINA'10, IEEE International Conference on Advanced Information Networking and Applications, pages 126–133. Australia, April 2010.
◉ National conferences
– Rodrigue Chakode, Blaise-Omer Yenke. Utilisation des machines virtuelles comme support de services de calcul à la demande (Using virtual machines to support on-demand computing services). In Renpar'20: proceedings of the Rencontres francophones du Parallélisme, 2011 edition. Saint-Malo, France, May 2011.
◉ Other publications (in the cloud community)
– Rodrigue Chakode. SVMSched: A tool to enable On-demand SaaS and PaaS Services on top of OpenNebula. In OpenNebula Official Blog, http://blog.opennebula.org/?p=1646.
– Listed in the OpenNebula Software Ecosystem: http://opennebula.org/software:ecosystem:svmsched
40/40
Thanks for your attention !