Cloud Auto-Scaling with Deadline and Budget Constraints
Ming Mao, Jie Li, Marty Humphrey
eScience Group
CS Department, University of Virginia
Grid 2010 – Oct 27, 2010
Cloud Computing
A fast-growing computing platform
IDC: cloud spending grows 27.4% a year (compared with 5% a year for traditional IT), from $16.5 billion (2009) to $55.5 billion (2014)
Source: IDC, Worldwide and Regional Public IT Cloud Service 2010-2014 Forecast
Two most-quoted benefits: scalable computing and storage; reduced cost
Concerns: security, availability, cost management, integration, interoperability, etc.
Cost
Q1. Cost – the most important factor in practice?
Q2. Moving into the cloud == reduced cost?
Rate the benefits commonly ascribed to the cloud on-demand model
  Pay only for what you use                          77.9%
  Easy/fast to deploy to end-users                   77.7%
  Monthly payments                                   75.3%
  Encourages standard systems                        68.5%
  Requires less in-house IT staff, costs             67.0%
  Always offers latest functionality                 64.6%
  Sharing systems with partners simpler              63.9%
  Seems like the way of the future                   54.0%
Source: IDC Enterprise Panel, 3Q09, n = 263, Sep 2009

How important is it that Cloud service providers...
  Offer competitive pricing                          91.6%
  Offer Service Level Agreements                     88.6%
  Option to move cloud offerings back on premise     87.8%
  Provide a complete solution                        86.0%
  Understand my business and industry                84.5%
  Allow managing on-premise & cloud together         82.1%
  Support many of my IT needs                        81.0%
  Offer both on-premise and public cloud services    79.2%
  Are a technology and business model innovator      78.3%
  Have local presence, can come to my offices        72.9%
Source: IDC Enterprise Panel, 3Q09, n = 263, Sep 2009
Current Auto-Scaling Mechanisms
Triggers based on resource-utilization information (e.g., AWS Auto Scaling, RightScale, enStratus, Scalr)
Where does the gap exist?
Multiple instance types
Current billing models: full-hour billing
Non-negligible instance acquisition time: 7-15 min in Windows Azure
More specific performance goals
Budget awareness (e.g. dollars/month, dollars/job)
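Full-hour billing is one reason utilization triggers alone fall short: every started hour is charged in full, so the moment an instance is released matters. A minimal sketch, assuming a flat hourly price (the $0.12 per computing hour quoted for Windows Azure in the MODIS evaluation); both helper names are hypothetical, not a provider API.

```python
import math


def billed_cost(runtime_minutes: float, price_per_hour: float) -> float:
    """Charge for one instance under full-hour billing (every started hour is billed in full)."""
    if runtime_minutes <= 0:
        return 0.0
    return math.ceil(runtime_minutes / 60) * price_per_hour


def minutes_to_next_billing_boundary(runtime_minutes: float) -> float:
    """Minutes left in the hour that has already been paid for."""
    return (60 - runtime_minutes % 60) % 60


# Releasing at 61 minutes costs twice as much as releasing at 60 minutes.
for minutes in (59, 60, 61):
    print(minutes, round(billed_cost(minutes, 0.12), 2))
```

A scale-down decision could use the second helper to prefer releasing instances whose paid hour is about to run out; the slides do not say whether the authors' decider does exactly this.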
Problem Statement
Deadline (job finish time)
Cost
Problem Statement: how can auto-scaling enable a cloud application to finish all submitted jobs before a user-specified deadline while spending as little money as possible?
[Figure: users submit jobs to the cloud application, which runs on cloud servers]
Cloud Application Performance Model
Workload: independent jobs submitted to the job queue
Jobs are processed in FCFS order and fairly distributed
Different classes of jobs
All jobs share the same performance goal (e.g., a 1-hour deadline)
VM instances take time to start up
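The performance model above (an FCFS job queue served by instances that need time to start) can be sketched as a tiny dispatch simulation. Assumptions not on the slide: each idle instance pulls the next queued job, processing times are fixed, and `fcfs_makespan` is a hypothetical name. The 600 s startup in the usage lines is the General VM delay from the simulation parameters.

```python
import heapq


def fcfs_makespan(job_durations, instance_ready_times):
    """Finish time of the last job when queued jobs are dispatched FCFS.

    job_durations:        processing time (s) of each queued job, in queue order
    instance_ready_times: time (s) at which each instance has finished starting up
    Each job goes to whichever instance becomes free first (pull model, assumed).
    """
    if not instance_ready_times:
        raise ValueError("at least one instance is required")
    free_at = [(t, i) for i, t in enumerate(instance_ready_times)]  # (next free time, instance id)
    heapq.heapify(free_at)
    finish = 0
    for duration in job_durations:
        start, idx = heapq.heappop(free_at)
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, (end, idx))
    return finish


# Four 300 s jobs; the second instance needs 600 s to start (General VM delay).
print(fcfs_makespan([300] * 4, [0, 600]))
print(fcfs_makespan([300] * 4, [0, 0]))
```

In this toy case the startup delay alone stretches the completion time from 600 s to 900 s, which is why the model has to account for pending instances.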
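Problem Formalization (1)
Key variables used in the model
$J_j$: job class $j$; $n_j$: number of queued jobs of class $J_j$
$I_i$: VM instance $i$; $type(I_i)$: VM type of instance $I_i$
$V_i$: VM type $i$, with startup delay $d_{V_i}$ and cost $c_{V_i}$
$t_{i,j}$: average processing time of a job of class $J_j$ on VM type $V_i$
$D$: deadline; $C$: budget; $s_i$: time a pending instance $I_i$ has already spent starting up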
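Problem Formalization (2)
Workload: $W(J_j, n_j)$, the $n_j$ queued jobs of each class $J_j$
Computing power of a running instance $I_i$:
$$P_i(J_j, n_j) = \frac{D \, n_j}{\sum_k t_{type(I_i),k} \, n_k}$$
Computing power of a pending instance $I_i$:
$$P_i(J_j, n_j) = \frac{\left(D - (d_{type(I_i)} - s_i)\right) n_j}{\sum_k t_{type(I_i),k} \, n_k}$$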
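A direct transcription of the two computing-power formulas, as a sketch: job classes are dictionary keys, and the example numbers (300 s per job, 600 s startup delay, 30 jobs per class, a 1-hour deadline) are borrowed from the simulation slide for illustration. The zero-denominator guard is an added assumption, not part of the slide.

```python
def computing_power(deadline_s, queue, proc_time, startup_delay_s=0.0, elapsed_startup_s=0.0):
    """P_i(J_j, n_j) for every job class j of one instance I_i.

    queue:     {job class J_j: n_j}, number of queued jobs per class
    proc_time: {job class J_j: t_{type(I_i), j}}, average processing time (s)
               of that class on this instance's VM type
    Running instance: leave both startup arguments at 0.
    Pending instance: startup_delay_s = d_{type(I_i)}, elapsed_startup_s = s_i,
    so the remaining startup time d - s is subtracted from the deadline D.
    """
    usable_s = deadline_s - (startup_delay_s - elapsed_startup_s)
    # Time this instance would need to process the entire queue: sum_k t_{type(I_i),k} * n_k
    queue_time_s = sum(proc_time[j] * n for j, n in queue.items())
    if queue_time_s == 0:
        return {j: 0.0 for j in queue}
    return {j: usable_s * n / queue_time_s for j, n in queue.items()}


queue = {"mix": 30, "cpu": 30, "io": 30}
general = {"mix": 300, "cpu": 300, "io": 300}
running = computing_power(3600, queue, general)
pending = computing_power(3600, queue, general, startup_delay_s=600, elapsed_startup_s=200)
print(running["cpu"], pending["cpu"])
```

Here a running General instance can finish 4 jobs of each class before the deadline, while one that is still 400 s from ready can finish about 3.56.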
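Problem Formalization (3)
Scale up, sufficient budget: $\min \sum_i c_{type(I'_i)}$ subject to $\sum_i P'_i \ge W - \sum_i P_i$
Scale up, insufficient budget: $\max \sum_i P'_i$ subject to $\sum_i c_{type(I'_i)} \le C - \sum_i c_{type(I_i)}$
Scale down: release instances $I_s$ such that $\sum_i P_i - \sum_s P_s \ge W$
where $I'_i$ are newly acquired instances with computing power $P'_i$, and each inequality on $P$ and $W$ holds for every job class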
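The scale-down condition says which releases are allowed but not which instances to pick. One plausible rule, assumed here rather than taken from the slides, is to visit instances from most to least expensive and release each one whose removal still leaves $\sum_i P_i - \sum_s P_s \ge W$ in every job class. The costs and power vectors below are hypothetical.

```python
def scale_down(instances, workload):
    """Return indices of running instances to release.

    instances: list of (hourly_cost, power_vector), one per running instance I_i
    workload:  W, one entry per job class
    Assumed greedy rule: visit instances from most to least expensive and release
    one whenever the remaining power still covers W in every job class.
    """
    remaining = [sum(powers) for powers in zip(*(p for _, p in instances))]
    released = []
    for i in sorted(range(len(instances)), key=lambda k: instances[k][0], reverse=True):
        after = [r - p for r, p in zip(remaining, instances[i][1])]
        if all(a >= w for a, w in zip(after, workload)):
            remaining = after
            released.append(i)
    return released


running = [(2, (10, 5, 20)), (3, (10, 20, 5)), (4, (10, 10, 10))]  # hypothetical costs
print(scale_down(running, (15, 15, 15)))
```

With $W = (15, 15, 15)$ the sketch releases only the cost-4 instance, leaving $(20, 25, 25)$. A greedy pass is not guaranteed to find the cheapest remaining set, and it ignores the full-hour billing boundary.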
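An example
Required computing power for job classes $J_x, J_y, J_z$, given workload $W$ and running instances $I_1, I_2$:
$P' = W - P_{I_1} - P_{I_2}$
$J_x$: $60 - 10 - 10 = 40$
$J_y$: $60 - 5 - 20 = 35$
$J_z$: $60 - 20 - 5 = 35$
New instances, $n'_1, n'_2, n'_3$ of VM types $V_1, V_2, V_3$, must satisfy $n'_1 V_1 + n'_2 V_2 + n'_3 V_3 \ge P'$:
$J_x$: $10 n'_1 + 10 n'_2 + 10 n'_3 \ge 40$
$J_y$: $5 n'_1 + 20 n'_2 + 10 n'_3 \ge 35$
$J_z$: $20 n'_1 + 5 n'_2 + 10 n'_3 \ge 35$
$\min (c_1 n'_1 + c_2 n'_2 + c_3 n'_3)$, where $c_1 n'_1 + c_2 n'_2 + c_3 n'_3 \le C - c_{type(I_1)} - c_{type(I_2)}$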
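The example's scale-up step is a small integer program. The sketch below swaps in exhaustive search over instance counts for an integer-programming solver, purely for readability. The power vectors of $V_1, V_2, V_3$ and $P' = (40, 35, 35)$ come from the example; the slide gives no prices, so the costs `(2, 3, 4)` are hypothetical.

```python
from itertools import product


def min_cost_scale_up(type_power, type_cost, required, budget=None, max_per_type=10):
    """Find new-instance counts n'_k minimizing sum_k c_k * n'_k subject to
    sum_k n'_k * P_{V_k} >= P' for every job class (and the budget, if given).

    type_power: per VM type, its computing power for each job class
    type_cost:  per VM type, its hourly cost c_k
    required:   P', the computing power still missing for each job class
    budget:     C minus the cost of running instances, or None for "sufficient budget"
    Returns (counts, cost), or (None, None) if nothing feasible exists in the search range.
    """
    best, best_cost = None, None
    for counts in product(range(max_per_type + 1), repeat=len(type_power)):
        cost = sum(n * c for n, c in zip(counts, type_cost))
        if budget is not None and cost > budget:
            continue
        power = [sum(n * p[j] for n, p in zip(counts, type_power)) for j in range(len(required))]
        if all(pw >= r for pw, r in zip(power, required)) and (best_cost is None or cost < best_cost):
            best, best_cost = counts, cost
    return best, best_cost


V = [(10, 5, 20), (10, 20, 5), (10, 10, 10)]  # power of V_1, V_2, V_3 for J_x, J_y, J_z
costs = [2, 3, 4]                             # hypothetical hourly costs c_1, c_2, c_3
print(min_cost_scale_up(V, costs, (40, 35, 35)))            # ((3, 1, 0), 9)
print(min_cost_scale_up(V, costs, (40, 35, 35), budget=8))  # (None, None)
```

With these made-up prices the cheapest plan is three $V_1$ and one $V_2$ instance; with only 8 units of budget left no plan covers $P'$, which is the insufficient-budget case. Exhaustive search grows exponentially with the number of VM types, so it only suits toy instances like this one.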
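Windows Azure Implementation
Cloud Cruise Control
[Figure: architecture diagram. Components: Decider & Monitor, Repository, VM Manager, Config, Historical Data, VM instances. Users enqueue jobs and VM instances dequeue them; the Decider produces a VM plan (+/–) by solving $\min \sum_i c_{type(I'_i)}$ subject to $\sum_i P'_i \ge W - \sum_i P_i$; other labeled flows: workload, VM info, update, notify, admin dynamic configuration]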
Evaluation - Simulation
Workload & VM simulation parameters
Job classes: Mix, Computing Intensive, IO Intensive; each arrives at avg 30 jobs/hour (STD 5 jobs/hour)
Job processing time per VM type (average / STD):

VM type    Price         Startup delay   Mix            Computing Intensive   IO Intensive
General    $0.085/hour   600 s           300 s / 50 s   300 s / 50 s          300 s / 50 s
High-CPU   $0.17/hour    720 s           210 s / 25 s   75 s / 15 s           300 s / 50 s
High-IO    $0.17/hour    720 s           210 s / 25 s   300 s / 50 s          75 s / 15 s
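A back-of-the-envelope reading of the table: if one instance serves the three job classes in proportion to their average arrival rates, how many jobs does it finish per hour and per dollar? This sketch uses only the average processing times and ignores startup delay, variance and full-hour billing; the names are illustrative.

```python
ARRIVALS = {"mix": 30, "cpu": 30, "io": 30}  # average jobs/hour per job class
VM_TYPES = {
    # price in $/hour, t = average processing time (s) per job class
    "General": {"price": 0.085, "t": {"mix": 300, "cpu": 300, "io": 300}},
    "High-CPU": {"price": 0.17, "t": {"mix": 210, "cpu": 75, "io": 300}},
    "High-IO": {"price": 0.17, "t": {"mix": 210, "cpu": 300, "io": 75}},
}


def jobs_per_hour(proc_time, arrivals, horizon_s=3600):
    """Jobs one instance completes per hour when it serves the arrival mix in proportion."""
    busy_s = sum(proc_time[j] * n for j, n in arrivals.items())
    return horizon_s * sum(arrivals.values()) / busy_s


for name, vm in VM_TYPES.items():
    rate = jobs_per_hour(vm["t"], ARRIVALS)
    print(f"{name}: {rate:.2f} jobs/hour, {rate / vm['price']:.1f} jobs per dollar")
```

Under this mix a General instance handles 12 jobs/hour (about 141 jobs per dollar) against about 18.5 jobs/hour (about 109 jobs per dollar) for High-CPU or High-IO, consistent with Choice #1 (General only) costing less than Choices #2 and #3 in the cost comparison. The average hides specialization: a High-CPU instance running only computing-intensive jobs finishes 3600 / 75 = 48 jobs/hour, one plausible reason why mixing VM types (Choice #4) does better still.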
Stable workload & changing deadline
[Figure: Stable Workload & Changing Deadline. Utilization (%) and response time (sec) vs. time (hours, 0-80); series: utilization, deadline, avg, max, min]
Changing workload & fixed deadline
[Figure: Changing Workload & Fixed Deadline. Workload (jobs/h) and response time (sec) vs. time (hours, 0-80); series: deadline, avg, max, min, workload]
Cost
VM types                                   Total cost   % more than optimal
Choice #1: General                         $98.52       43%
Choice #2: High-CPU                        $128.86      87%
Choice #3: High-IO                         $129.71      88%
Choice #4: General, High-CPU, High-IO      $78.62       14%
Optimal: General, High-CPU, High-IO        $68.85       -
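The last column follows from the totals as (cost - optimal) / optimal; a quick check of the table's arithmetic:

```python
COSTS = {
    "Choice #1 (General)": 98.52,
    "Choice #2 (High-CPU)": 128.86,
    "Choice #3 (High-IO)": 129.71,
    "Choice #4 (General, High-CPU, High-IO)": 78.62,
}
OPTIMAL = 68.85  # total cost ($) of the optimal mix


def pct_over_optimal(cost, optimal=OPTIMAL):
    """Percentage by which a total cost exceeds the optimal cost."""
    return 100 * (cost - optimal) / optimal


for name, cost in COSTS.items():
    print(f"{name}: {pct_over_optimal(cost):.0f}% more than optimal")
```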
Evaluation - MODIS
Notation: MODIS200X = year; Terra & Aqua = satellite; (X-Y) = day X to day Y; 15 images/day

Moderate-scale test (up to 20 instances)
Dataset             Jobs   Theoretical cost    1-hour deadline                2-hour deadline               3-hour deadline
Terra 2004(10-12)   45     4 C.H.* ($0.48)     18 min late, 9 C.H. ($1.08)    8 min early, 6 C.H. ($0.72)   20 min early, 5 C.H. ($0.60)
Aqua 2008(30-32)    45     4 C.H. ($0.48)      15 min late, 10 C.H. ($1.20)   20 min early, 7 C.H. ($0.84)  29 min early, 5 C.H. ($0.60)

Large-scale test (up to 90 instances)
Dataset                    Jobs   Theoretical cost     2-hour deadline                  4-hour deadline
Terra & Aqua 2006(1-75)    1125   93 C.H. ($11.16)     20 min late, 170 C.H. ($20.40)   6 min early, 132 C.H. ($15.84)
Terra & Aqua 2006(1-150)   2250   185 C.H. ($22.20)    Admission denied                 22 min early, 243 C.H. ($29.16)

* C.H. = computing hour; 1 C.H. = $0.12 in Windows Azure
Evaluation - MODIS test: Terra & Aqua 2006(1-75), 1125 jobs in total
Finished 6 min early; theoretical cost: 93 C.H. ($11.16); actual cost: 132 C.H. ($15.84)
[Figure: Instance Acquisition and Release. Instance number (0-40) vs. time (hours, 0-5); states: released, acquiring, ready]
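Converting computing hours to dollars at the quoted $0.12 per C.H. makes the gap between actual and theoretical cost for this run explicit; the helper names are illustrative.

```python
PRICE_PER_CH = 0.12  # $ per computing hour (C.H.) in Windows Azure, as quoted on the MODIS slide


def ch_to_dollars(ch: float) -> float:
    """Dollar cost of a number of computing hours."""
    return ch * PRICE_PER_CH


def overhead_pct(actual_ch: float, theoretical_ch: float) -> float:
    """How much more the run cost than the theoretical cost, in percent."""
    return 100 * (actual_ch - theoretical_ch) / theoretical_ch


actual, theoretical = 132, 93  # Terra & Aqua 2006(1-75), 4-hour deadline
print(f"actual ${ch_to_dollars(actual):.2f} vs theoretical ${ch_to_dollars(theoretical):.2f}, "
      f"{overhead_pct(actual, theoretical):.1f}% over")
```

For this run the actual cost is about 42% above the theoretical cost.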
Conclusions & Future Work
Conclusions
More cost-efficient than a fixed-size instance choice
VM startup delay can have a large impact in practice
Future work
More general cloud application model
Multiple job classes
Consider other instance types (e.g., spot instances and reserved instances)
Data transfer performance and storage cost
Thank you