EIT ICT Labs Workshop at TU Delft, May 2011 – Cloud Computing
Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands
Our team:
Undergrad: Gargi Prasad, Arnoud Bakker, Nassos Antoniou, Thomas de Ruiter, …
Grad: Siqi Shen, Nezih Yigitbasi, Ozan Sonmez
Staff: Henk Sips, Dick Epema, Alexandru Iosup
Collaborators: Ion Stoica and the Mesos team (UC Berkeley), Thomas Fahringer, Radu Prodan (U. Innsbruck), Nicolae Tapus, Mihaela Balint, Vlad Posea (UPB), Derrick Kondo, Emmanuel Jeannot (INRIA), ...
Cloud Computing Research at TU Delft (2008—ongoing)
TUD Team: 2 Staff, 2+3 PhD, n MSc, ...
Our team:
Undergrad: Adrian Lascateu, Alexandru Dimitriu (UPB, Romania), …
Grad: Vlad Nae (U. Innsbruck, Austria), Siqi Shen, Nezih Yigitbasi (TU Delft, the Netherlands), …
Staff: Alexandru Iosup, Dick Epema, Henk Sips (TU Delft), Thomas Fahringer, Radu Prodan (U. Innsbruck), Nicolae Tapus, Mihaela Balint, Vlad Posea (UPB), etc.
What is Cloud Computing?
“The path to abundance”
• On-demand capacity
• Pay for what you use
• Great for web apps (EIP, web crawl, DB ops, I/O)
VS
“The killer cyclone”
• Not-so-great performance for scientific applications (1)
• Long-term performance variability (2)
• How to manage?
Images: http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/ ; Tropical Cyclone Nargis (NASA, ISSS, 04/29/08)
1- Iosup et al., Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing, IEEE TPDS, 2011.
2- Iosup et al., On the Performance Variability of Production Cloud Services, CCGrid 2011.
What do We Want from Clouds?
Good IaaS, PaaS, SaaS
• Portability (virtualisation, no vendor lock-in)
• Accountability (lease what you use)
• … for eScience
• … for Massively Social Gaming
Good resource management
• Elasticity
• Reliability
• Efficiency (scheduling)
• Data-aware mechanisms
• Being “green”?
Performance evaluation (What is “Good”?)
Agenda
1. Introduction
2. Cloud Performance Studies
3. The Cloud Workloads Archive
4. Massivizing Online Social Games using Clouds
   1. Platform Challenge
   2. Content Challenge
   3. Analytics Challenge
5. Other Cloud Activities at TUD
6. Take-Home Message
Cloud Performance Studies
• Many-Tasks Scientific Computing
  • Quantitative definition: J jobs and B bags-of-tasks
  • Extracted proto-MT users from grid and parallel production environments
• Performance Evaluation of Four Commercial Clouds
  • Amazon EC2, GoGrid, Elastic Hosts, Mosso
  • Resource acquisition, single- and multi-instance benchmarking
  • Low compute and networking performance
• Clouds vs Other Environments
  • Order of magnitude better performance needed for clouds
  • Clouds already good for short-term, deadline-driven scientific computing
1- Iosup et al., Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing, IEEE TPDS, 2011 (in print), http://www.st.ewi.tudelft.nl/~iosup/cloud-perf10tpds_in-print.pdf
2- Iosup et al., On the Performance Variability of Production Cloud Services, CCGrid 2011, pds.twi.tudelft.nl/reports/2010/PDS-2010-002.pdf
Performance Evaluation of Clouds [1/3]
Tools: C-Meter
Yigitbasi et al.: C-Meter: A Framework for Performance Analysis of Computing Clouds. Proc. of CCGRID 2009
Performance Evaluation of Clouds [2/3]
Low Performance for Sci.Comp.
• Evaluated the performance of resources from four production, commercial clouds
  • GrenchMark for evaluating the performance of cloud resources
  • C-Meter for complex workloads
• Four production, commercial IaaS clouds: Amazon Elastic Compute Cloud (EC2), Mosso, Elastic Hosts, and GoGrid
• Finding: cloud performance is low for scientific computing
S. Ostermann, A. Iosup, N. Yigitbasi, R. Prodan, T. Fahringer, and D. Epema, A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing, Cloudcomp 2009, LNICST 34, pp. 115–131, 2010.
Performance Evaluation of Clouds [3/3]
Cloud Performance Variability
• Long-term performance variability of production cloud services
  • IaaS: Amazon Web Services
  • PaaS: Google App Engine
• Year-long performance information for nine services
• Finding: about half of the cloud services investigated in this work exhibit yearly and daily patterns; the impact of performance variability depends on the application.
A. Iosup, N. Yigitbasi, and D. Epema, On the Performance Variability of Production Cloud Services, CCGrid 2011.
Amazon S3: GET US HI operations
Agenda
1. Introduction
2. Cloud Performance Studies
3. The Cloud Workloads Archive
4. Massivizing Online Social Games using Clouds
   1. Platform Challenge
   2. Content Challenge
   3. Analytics Challenge
5. Other Cloud Activities at TUD
6. Take-Home Message
Traces: Sine Qua Non in Comp.Sys.Res.
• “My system/method/algorithm is better than yours (on my carefully crafted workload)”
  • Unrealistic (trivial): Prove that “prioritize jobs from users whose name starts with A” is a good scheduling policy
  • Realistic? “85% jobs are short”; “10% Writes”; ...
  • Major problem in Computer Systems research
• Workload Trace = recording of real activity from a (real) system, often as a sequence of jobs / requests submitted by users for execution
  • Main use: compare and cross-validate new job and resource management techniques and algorithms
  • Major problem: real workload traces from several sources
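A minimal sketch of how a claim such as “85% jobs are short” could be checked against a recorded trace. The record fields, the 60-second threshold, and the sample values are hypothetical and chosen only for illustration; they are not taken from any real trace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """One entry of a workload trace (hypothetical minimal fields)."""
    job_id: int
    submit_time_s: float
    runtime_s: float


def short_job_fraction(trace, threshold_s=60.0):
    """Return the fraction of jobs whose runtime is below threshold_s."""
    if not trace:
        raise ValueError("empty trace")
    short = sum(1 for job in trace if job.runtime_s < threshold_s)
    return short / len(trace)


# Made-up sample trace, for illustration only.
sample_trace = [
    JobRecord(1, 0.0, 12.0),
    JobRecord(2, 5.0, 30.0),
    JobRecord(3, 9.0, 4000.0),
    JobRecord(4, 20.0, 45.0),
]
```

Running the same function over several real traces, rather than one hand-made workload, is what makes such a statistic comparable across systems.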
The Cloud Workloads Archive (CWA)
What’s in a Name?
CWA = Public collection of cloud/data center workload traces and of tools to process these traces; allows us to:
1. Compare and cross-validate new job and resource management techniques and algorithms, across various workload traces
2. Determine which (part of a) trace is most interesting for a specific job and resource management technique or algorithm
3. Design a general model for data center workloads, and validate it with various real workload traces
4. Evaluate the generality of a particular workload trace, to determine if results are biased towards a particular trace
5. Analyze the evolution of workload characteristics across long timescales, both intra- and inter-trace
One Format Fits Them All
• Flat format
  • Job and Tasks
  • Summary (20 unique data fields) and Detail (60 fields)
• Categories of information
  • Shared with GWA, PWA: Time, Disk, Memory, Net
  • Jobs/Tasks that change resource consumption profile
  • MapReduce-specific (two-thirds data fields)
A. Iosup, R. Griffith, A. Konwinski, M. Zaharia, A. Ghodsi, I. Stoica, Data Format for the Cloud Workloads Archive, v.3, 13/07/10
Figure labels: CWJ, CWJD, CWT, CWTD
CWA Contents: Large-Scale Workloads
• Tools
  • Convert to CWA format
  • Analyze and model automatically (report)

Trace ID   System       Size (J/T/Obs)   Period        Notes
CWA-01     Facebook     1.1M/-/-         5m/2009       Time & IO
CWA-02     Yahoo M      28K/28M/-        20d/2009      ~Full detail
CWA-03     Facebook 2   61K/10M/-        10d/2009      Full detail
CWA-04     Facebook 3   ?/?/-            10d/01-2010   Full detail
CWA-05     Facebook 4   ?/?/-            3m/02+2010    Full detail
CWA-06     Google 2                                    25 Aug 2010
CWA-07     eBay                                        23 Sep 2010
CWA-08     Twitter                                     Need help!
CWA-09?    Google       9K/177K/4M       7h/2009       Coarse, Period
The Cloud Workloads Archive
• Looking for invariants
  • Wr [%] ~40% Total IO, but absolute values vary
  • # Tasks/Job, ratio M:(M+R) Tasks, vary
• Understanding workload evolution

Trace ID   Total IO [MB]   Rd. [MB]   Wr [%]   HDFS Wr [MB]
CWA-01     10,934          6,805      38%      1,538
CWA-02     75,546          47,539     37%      8,563
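The Wr [%] column can be cross-checked from the other two columns. The sketch below assumes that the write volume equals total IO minus reads, which is an assumption about how the column was derived rather than something the slide states; the function name is hypothetical.

```python
def write_share_percent(total_io_mb, read_mb):
    """Write share of total IO, in percent, assuming writes = total - reads."""
    if total_io_mb <= 0:
        raise ValueError("total IO must be positive")
    return 100.0 * (total_io_mb - read_mb) / total_io_mb


# Total IO and read volumes [MB] as listed for the two traces.
table_rows = {
    "CWA-01": (10_934, 6_805),
    "CWA-02": (75_546, 47_539),
}

# Rounded to whole percent, as in the Wr [%] column.
write_shares = {
    trace_id: round(write_share_percent(total, read))
    for trace_id, (total, read) in table_rows.items()
}
```

Under that assumption both rows reproduce the listed values (38% and 37%), which supports reading Wr [%] as the write share of total IO.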
Agenda
1. Introduction
2. Cloud Performance Studies
3. The Cloud Workloads Archive
4. Massivizing Online Social Games using Clouds
   1. Platform Challenge
   2. Content Challenge
   3. Analytics Challenge
5. Other Cloud Activities at TUD
6. Take-Home Message
What’s in a name? MSG, MMOG, MMO, …
1. Virtual world: explore, do, learn, socialize, compete
+
2. Content: graphics, maps, puzzles, quests, culture
+
3. Game data: player stats and relationships
Romeo and Juliet
Massively Social Gaming = (online) games with massive numbers of players (100K+), for which social interaction helps the gaming experience
250,000,000 active players
3BN hours/week world-wide
FarmVille, a Massively Social Game
Sources: CNN, Zynga.
Source: InsideSocialGames.com
MSGs are a Popular, Growing Market
• 25,000,000 subscribed players (from 250,000,000+ active)
• Over 10,000 MSGs in operation
• Subscription market size $7.5B+/year, Zynga $600M+/year
Sources: MMOGChart, own research. Sources: ESA, MPAA, RIAA.
Massivizing Games using Clouds
(Platform Challenge) Build MSG platform that uses (mostly) cloud resources
• Close to players
• No upfront costs, no maintenance
• Compute platforms: multi-cores, GPUs, clusters, all-in-one!
Nae, Iosup, Prodan, Dynamic Resource Provisioning in Massively Multiplayer Online Games, IEEE TPDS, 2011.
(Content Challenge) Produce and distribute content for 1BN people
• Game Analytics: game statistics
• Auto-generated game content
Iosup, POGGI: Puzzle-Based Online Games on Grid Infrastructures, EuroPar 2009 (Best Paper Award)
(Analytics Challenge) Build cloud-based layer to improve gaming experience
• Game Analytics: ranking / rating
• Game Analytics: matchmaking / recommendations
Iosup, Lascateu, Tapus, CAMEO: social networks for MMOGs through continuous analytics and cloud computing, ACM NetGames 2010.
Cloudifying: PaaS for MSGs
(Platform Challenge) Build MSG platform that uses (mostly) cloud resources
• Close to players
• No upfront costs, no maintenance
• Compute platforms: multi-cores, GPUs, clusters, all-in-one!
• Performance guarantees
• Code for various compute platforms: platform profiling
• Misprediction = $$$
• What services?
• Vendor lock-in?
• My data
Nae, Iosup, Prodan, Dynamic Resource Provisioning in Massively Multiplayer Online Games, IEEE TPDS, 2011.
Proposed hosting model: dynamic
• Using data centers for dynamic resource allocation
• Main advantages:
  1. Significantly lower over-provisioning
  2. Efficient coverage of the world is possible
Figure labels: Massive join, Massive leave
[Source: Nae, Iosup, and Prodan, ACM SC 2008]
Static vs. Dynamic Allocation
Q: What is the penalty for static vs. dynamic allocation?
Figure annotations: 250%, 25%
[Source: Nae, Iosup, and Prodan, ACM SC 2008]
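A minimal sketch of how the two allocation styles can be compared. It assumes over-provisioning is measured as unused capacity relative to demand, that static allocation provisions once for the peak, and that dynamic allocation tracks demand perfectly with a fixed safety margin. The metric, the function names, and the demand values are all hypothetical; this is not the method of the cited paper, and the numbers are not meant to reproduce the figure.

```python
def over_provisioning_percent(allocated, demand):
    """Unused capacity relative to total demand, in percent."""
    if len(allocated) != len(demand) or not demand:
        raise ValueError("series must be non-empty and of equal length")
    total_demand = sum(demand)
    if total_demand <= 0:
        raise ValueError("total demand must be positive")
    unused = sum(a - d for a, d in zip(allocated, demand))
    return 100.0 * unused / total_demand


def static_allocation(demand):
    """Provision once for the peak and keep it for the whole period."""
    return [max(demand)] * len(demand)


def dynamic_allocation(demand, headroom_pct=10):
    """Follow demand at each step, adding a relative safety margin.

    Uses integer ceiling division so the result is a whole number of servers.
    """
    return [-(-d * (100 + headroom_pct) // 100) for d in demand]


# Made-up demand per time step (servers needed), e.g. a massive join then leave.
demand = [20, 40, 100, 40]

static_penalty = over_provisioning_percent(static_allocation(demand), demand)
dynamic_penalty = over_provisioning_percent(dynamic_allocation(demand), demand)
```

In this toy series the static penalty is driven entirely by the gap between peak and typical load, while the dynamic penalty is bounded by the chosen margin; a real dynamic allocator would also pay for prediction errors.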
Cloudifying: Content, Content, Content
(Content Challenge) Produce and distribute content for 1BN people
• Game Analytics: game statistics
• Crowdsourcing
• Storification
• Auto-generated game content
• Adaptive game content
• Content distribution / streaming content
A. Iosup, POGGI: Puzzle-Based Online Games on Grid Infrastructures, EuroPar 2009 (Best Paper Award)
(Procedural) Game Content (Generation)
Game Bits: Texture, Sound, Vegetation, Buildings, Behavior, Fire/Water/Stone/Clouds
Game Space: Height Maps, Bodies of Water, Placement Maps, …
Game Systems: Eco, Road Nets, Urban Envs, …
Game Scenarios: Puzzle, Quest/Story, …
Game Design: Rules, Mechanics, …
Derived Content: NewsGen, Storification
Hendricks, Meijer, vd Velden, Iosup, Procedural Game Content Generation: A Survey, Working Paper, 2010
The New Content Generation Process*
Only the puzzle concept, and the instance generation and solving algorithms, are produced at development time
* A. Iosup, POGGI: Puzzle-Based Online Games on Grid Infrastructures, EuroPar 2009 (Best Paper Award)
Puzzle-Specific Considerations
Generating Player-Customized Content
Puzzle difficulty
• Solution size
• Solution alternatives
• Variation of moves
• Skill moves
Player ability
• Keep population statistics and generate enough content for most likely cases
• Match player ability with puzzle difficulty
• Take into account puzzle freshness
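A minimal sketch of matching player ability with puzzle difficulty while respecting freshness. It assumes that ability and difficulty are expressed on the same numeric scale and that "fresh" simply means the player has not yet seen the puzzle; both assumptions, and all names and values, are hypothetical rather than taken from POGGI.

```python
def pick_puzzle(player_ability, puzzles, seen_ids):
    """Pick the unseen puzzle whose difficulty is closest to the player's ability.

    puzzles: iterable of (puzzle_id, difficulty) pairs.
    seen_ids: set of puzzle ids the player has already played (freshness).
    Returns the chosen puzzle_id, or None if no fresh puzzle is left.
    """
    fresh = [(pid, diff) for pid, diff in puzzles if pid not in seen_ids]
    if not fresh:
        return None
    # Ties on distance are broken by puzzle id to keep the choice deterministic.
    best_id, _ = min(fresh, key=lambda p: (abs(p[1] - player_ability), p[0]))
    return best_id


# Made-up pool of pre-generated puzzle instances.
puzzle_pool = [("p1", 2.0), ("p2", 5.0), ("p3", 5.5), ("p4", 9.0)]
```

A `None` result signals that the pool has run dry for this player, which is where keeping population statistics and generating enough content for the most likely ability levels comes in.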
Cloudifying: Social Everything!
• Social Network = undirected graph, relationship = edge
• Community = sub-graph, density of edges between its nodes higher than density of edges outside sub-graph
(Analytics Challenge) Build cloud-based layer to
Improve gaming experience
• Ranking / Rating
• Matchmaking / Recommendations
• Play Style / Tutoring
Organize Gaming Communities
• Player Behavior
A. Iosup, CAMEO: Continuous Analytics for Massively Multiplayer Online Games on Cloud Resources. ROIA, Euro-Par 2009 Workshops.
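The community definition on this slide can be sketched as a density comparison. The code below interprets "density of edges outside sub-graph" as the density of edges linking community members to non-members; that interpretation, the function names, and the sample graph are assumptions made for illustration.

```python
from itertools import combinations


def edge_densities(edges, community, all_nodes):
    """Return (internal_density, external_density) for a candidate community.

    internal: existing edges among community nodes / possible such pairs.
    external: existing edges between members and non-members / possible pairs.
    """
    community = set(community)
    outside = set(all_nodes) - community
    edge_set = {frozenset(e) for e in edges}  # undirected: (u, v) == (v, u)
    internal_pairs = [frozenset(p) for p in combinations(community, 2)]
    crossing_pairs = [frozenset((u, v)) for u in community for v in outside]
    internal = sum(p in edge_set for p in internal_pairs) / max(len(internal_pairs), 1)
    external = sum(p in edge_set for p in crossing_pairs) / max(len(crossing_pairs), 1)
    return internal, external


def is_community(edges, community, all_nodes):
    """True if the sub-graph is denser inside than towards the rest of the graph."""
    internal, external = edge_densities(edges, community, all_nodes)
    return internal > external


# Made-up player graph: an edge means two players have a relationship.
nodes = ["a", "b", "c", "d", "e"]
friendships = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
```

Here the triangle of players a, b, c qualifies as a community, while the unconnected pair a, e does not.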
Continuous Analytics for MMOGs
MMOG Data = raw and derivative information from the virtual world (millions of users)
Continuous Analytics for MMOGs = analysis of MMOG data such that important events are not lost
• Data collection
• Data storage
• Data analysis
• Data presentation
• … at MMOG rate and scale
Continuous Analysis for MMOGs
Main Uses By and For Gamers
1. Support player communities
2. Understand play patterns (decide future investments)
3. Prevent and detect cheating or disastrous game exploits (think MMOG economy reset)
4. Broadcasting of gaming events
5. Data for advertisement companies (new revenue stream for MMOGs)
The CAMEO Framework*
1. Address community needs
   • Can analyze skill level, experience points, rank
   • Can assess community size dynamically
2. Using on-demand technology: Cloud Comp.
   • Dynamic cloud resource allocation, Elastic IP
3. Data management and storage: Cloud Comp.
   • Crawl + Store data in the cloud (best performance)
4. Performance, scalability, robustness: Cloud Comp.
* A. Iosup, CAMEO: Continuous Analytics for Massively Multiplayer Online Games on Cloud Resources. ROIA, Euro-Par 2009 Workshops, LNCS 6043, (2010)
CAMEO: Cloud Resource Management
[Figure: Used Amazon EC2 Instances (0 to 2,500) vs. Date (3/6/2009 to 3/27/2009); series: Steady Analytics, Dynamic Analytics; annotations: Burst, Periodic, Unexpected]
• Snapshot = dataset for a set of players
• More machines = more snapshots per time unit
CAMEO: Exploiting Cloud Features
• Machines close(r) to server
• Traffic dominated by small packets (latency)
• Elastic IP to avoid traffic bans (legalese: acting on behalf of real people)
A. Iosup, A. Lascateu, N. Tapus, CAMEO: Enabling Social Networks for Massively Multiplayer Online Games through Continuous Analytics and Cloud Computing, ACM NetGames 2010.
Sample Game Analytics Results
Skill Level Distribution in RuneScape
• RuneScape: 135M+ open accounts (world record)
• Dataset: 3M players (largest measurement, to date)
  • 1,817,211 over level 100
  • Max skill 2,280
• Number of mid- and high-level players is significant
New Content Generation Challenge
Figure labels: High Level, Mid Level
Cost of Continuous RuneScape Analytics
• Put a price on MMOG analytics (here, $425/month, or less than $0.00015/user/month)
• Trade-off accuracy vs. cost, runtime is constant
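The per-user figure can be checked with simple arithmetic. The sketch below assumes the user count is the 3M-player RuneScape dataset mentioned earlier in the talk; the slide does not state which user count was used, so treat that as an assumption.

```python
def cost_per_user_per_month(monthly_cost_usd, users):
    """Monthly analytics cost divided evenly over the analysed users."""
    if users <= 0:
        raise ValueError("users must be positive")
    return monthly_cost_usd / users


MONTHLY_COST_USD = 425        # from the slide
ASSUMED_USERS = 3_000_000     # assumption: size of the 3M-player dataset

per_user = cost_per_user_per_month(MONTHLY_COST_USD, ASSUMED_USERS)
```

With these inputs the result is about $0.00014 per user per month, consistent with the stated bound of less than $0.00015.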
Cloud Scheduling
A Provisioning-and-Allocation problem
Figure labels: Before experiment; Provision; Allocate; During experiment; Manage; Queue; Queue; Application; Job; When needed; Many other possibilities
We’ve just started working on this problem
Take Home Message: TUD Research in Clouds
• Understanding how real clouds work (focus on data-intensive)
  • Modeling cloud infrastructure (performance, availability) and workloads
  • Compare clouds with other platforms (grids, parallel production env., p2p, …)
• The Cloud Workloads Archive: easy to share cloud workload traces and research associated with them
  • Complement the Grid Workloads Archive
• Scheduling: making clouds work
  • eScience and gaming applications (cloud application architectures)
  • MapReduce
• Massive Gaming: services on clouds
  • CAMEO: Massive Game Analytics
  • Toolkit for Online Social Network analysis
  • POGGI: game content generation at scale
Publications
2008: ACM SC
2009: ROIA, CCGrid, NetGames, EuroPar (Best Paper Award)
2010: IEEE TPDS, Elsevier CCPE, …
2011: ICPE, CCGrid, Book Chapter CAMEO+Clouds, IEEE TPDS, IJAMC, …
Graduation (Forecast)
2011-2014: 2+3 PhD, 10+ MSc, n BSc
Thank you for your attention! Questions? Suggestions? Observations?
Alexandru Iosup
[email protected]
http://www.pds.ewi.tudelft.nl/~iosup/ (or google “iosup”)
Parallel and Distributed Systems Group
Delft University of Technology
More Info:
- http://www.st.ewi.tudelft.nl/~iosup/research.html
- http://www.st.ewi.tudelft.nl/~iosup/research_gaming.html
- http://www.st.ewi.tudelft.nl/~iosup/research_cloud.html
Do not hesitate to contact me…