TRANSCLOUD: Design Considerations for a High-Performance Cloud Architecture Across Multiple Administrative Domains
Rick McGeer, HP Labs
For the TransCloud Team: HP Labs, UC San Diego, University of Victoria, Northwestern University, University of Amsterdam, TU-Kaiserslautern, Princeton University, PlanetWorks, PlanetLab, GENI, G-Lab, DFN, NLR, GLIF
Sponsored by the National Science Foundation
August 1, 2010
Introduction – TransCloud
• TransCloud: A Cloud Where Services Migrate, Anytime, Anywhere, In a World Where Distance Is Eliminated
– Joint Project Between GENICloud, iGENI, G-Lab
– GENICloud Provides Seamless Interoperation of Cloud Resources Across N-Sites, N-Administrative Domains
– iGENI Optimizes Private Networks of Intelligent Devices
– G-Lab contributes networking and advanced cloud resources
Sponsored by the National Science Foundation, November 3, 2010
Context 1: Seamless Computation Services Available Anytime, Anywhere
• “The Cloud” offers the prospect of ubiquitous information and services… BUT…
– Performance of Cloud Services Is Highly Dependent On Location
• Of End-User, Applications, Middle Processes, Network Topology
• Of Cloud Data, Compute Processes, Storage, etc.
• Why?
– Performance of Legacy Protocols Is Highly Dependent on Latency
• Therefore:
– If the Clouds Are Too Far Away, Performance Will Be Very Severely Restricted
• Ergo:
– Clouds Need To Be Close To Service Sites OR
– Networks (And Clouds) Must Be Designed To Eliminate Distance
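The latency dependence on this slide can be made concrete with the standard window-limited bound for a TCP-like protocol: a sender that keeps at most W bytes in flight cannot exceed W / RTT. The sketch below is illustrative only; the 64 KiB window and the round-trip times are assumed values, not TransCloud measurements, and `max_throughput_bps` is a name made up for this example.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput (bits/s) for a window-limited sender."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    # At most one full window can be delivered per round trip.
    return window_bytes * 8 / rtt_seconds


WINDOW = 64 * 1024  # assumed: a 64 KiB window with no window scaling

# Assumed round-trip times, chosen only to show the trend.
for label, rtt in [
    ("same metro, 2 ms", 0.002),
    ("cross-country, 70 ms", 0.070),
    ("intercontinental, 150 ms", 0.150),
]:
    mbps = max_throughput_bps(WINDOW, rtt) / 1e6
    print(f"{label}: at most {mbps:.1f} Mb/s")
```

The bound falls in proportion to distance regardless of link capacity, which is why the slide offers only two remedies: move the cloud closer, or change the protocols.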
Context 2: Living With Legacy Protocols Over Commodity Internet vs Creating Alternatives
• Legacy Is There For a Reason
– Compatibility
– Fairness
– Congestion Avoidance
• Therefore: Distributed Cloud
– Minimal Latencies Over Legacy Internet To Anywhere/Everywhere
• Therefore: Private Internal Networks
– Eliminate Latency Dependence Internally
– Use Aggressive Internal Transport/Application Protocols
• TIA-1039, Reliable Blast UDP, Lambda RAM
• Flow Control Enabled
Context 3: No Cloud Lives Everywhere
• Clusters are much easier to build than points-of-presence
• Most commercial clouds today have only a few sites
• Therefore: cloud service providers want to run services across multiple clouds
– Need a cloud standard that offers identical interfaces over multiple domains
• Inspiration: the web
– Standard protocol for sending documents
– Standard document format
– Permission and access control on a site-by-site, page-by-page basis
Context 4: General Considerations
• Major Cloud Use Case: Big Data, Distributed Collection, Must Live With Available Networks
– Smart Cities
– Sensor Nets
• Best Case: Create Private Network
– Owning Optical Fiber
– Create High Performance Wireless Point-to-Point Links
• Many Data Intensive Science Projects, Including
– High Energy Physics (e.g., LHCNet, Science Data Network, I-WIRE)
– Atmospheric Sensing Apparatus
– Ocean Observing (e.g., Project Neptune)
– Distributed Radio and Optical Telescopes
– Telemedicine
Premise: Compute Where Data Lives!
• Computation is Ubiquitous and Easy To Obtain
• Programs Are Small and Easy to Transmit
• Most Programs Reduce Data
• Often Data Is Large and Challenging To Transmit
– E.g., Jim Gray distributing SDSS by sending computers by FedEx!
• Solution -- Send Programs to Data
• Requires
– High-performance, low-latency network
– Common APIs and operating environments
– Lightweight, user-based federation
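A back-of-the-envelope comparison shows why sending programs to data pays off when programs reduce data. The sketch below compares moving a dataset against moving a program and bringing back its reduced result. All sizes and the link rate are assumptions picked for illustration, and the function names are hypothetical.

```python
def transfer_seconds(size_bytes: float, link_bps: float) -> float:
    """Ideal transfer time over a link, ignoring latency and protocol overhead."""
    return size_bytes * 8 / link_bps


def compare(data_bytes, program_bytes, result_bytes, link_bps):
    """Return (seconds to move the data, seconds to move program plus result)."""
    move_data = transfer_seconds(data_bytes, link_bps)
    move_program = transfer_seconds(program_bytes + result_bytes, link_bps)
    return move_data, move_program


TB, MB = 1e12, 1e6

# Assumed workload: 1 TB of raw data, a 10 MB program, a 100 MB result,
# and a 1 Gb/s path between the sites.
move_data, move_program = compare(1 * TB, 10 * MB, 100 * MB, 1e9)
print(f"ship the data:    {move_data:,.0f} s")
print(f"ship the program: {move_program:.2f} s")
```

Under these assumed numbers the gap is roughly four orders of magnitude, and it widens as the reduction ratio grows.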
What do we need to make this work?
• Advanced Networking and Caching
– Firm guarantees on bandwidth and latency on a per-application basis
– Application support at Layer 3 and Layer 2
– Means: Private Network where possible
• Access to platforms wherever data lives
– But data lives everywhere!
– No organization has Points of Presence (PoPs) everywhere
– Need for an individual to be able to make arrangements with a cloud service provider, anywhere, efficiently, minimal overhead
– Common form of identity
– Common identity not required
– Common AUP not required
What do we need to make this work?
• Ability to instantiate and run a program anywhere
– Common API at each level of the stack
– IaaS/NaaS (VM/VN Creation)
– PaaS (guaranteed OS/Programming environment)
– OaaS (Standard Query/Data Management API)
• Easy, Standard Naming Scheme
– I need to know the names of my VMs, logins, store, etc. without asking
Solution – TransCloud
• Introducing TransCloud Prototype
– An Early Instantiation of the Architecture
– A Distributed Environment That Enables Component and Interoperability Evaluation
– A Testbed On Which Early Experimental Research Can Be Conducted
– An Environment That Can Be Used To Explain/Showcase New Innovative Architecture/Concepts Through Demonstrations
TransCloud Today
Approx 40 nodes at 4 sites, 10 Gb/s connectivity
TransCloud Today
• Sites at
– HP Labs, Palo Alto
– UC San Diego
– Northwestern
– Kaiserslautern
• Tomorrow (literally!)
– Amsterdam
• Connectivity provided by:
– CAVEWave, StarLight, NetherLight, DFN, National Lambda Rail, Global Lambda Integrated Facility
Demo
(code by Chris Pearson and Chris Matthews, University of Victoria; data store from Paul Muller (Kaiserslautern) and Michael Zink (U Mass))
TransCloud Demonstration
• Multi-site query example
– Internet data repository (packet traces)
• Kaiserslautern, Germany (thanks to Paul Muller)
• UC San Diego (thanks to Michael Zink)
– Run an analysis job at each site
– Transmit the results back to HP Labs
– Run summary job at HPL
• What’s being demonstrated?
– Ability to run multi-site job
– Sending programs to data
– Prototype of analysis of coming world of sensors
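The shape of the demonstrated query (analyze at each site, ship only the results, summarize centrally) can be sketched in a few lines. This is not the demo code, which ran on distributed Hadoop/Pig; it is a single-process illustration in which the traces are tiny made-up records and `reduce_at_site` and `merge` are hypothetical names.

```python
from collections import Counter


def reduce_at_site(packets):
    """Per-site analysis job: count packets per protocol where the trace lives."""
    return Counter(p["proto"] for p in packets)


def merge(partials):
    """Summary job at the home site: combine the small per-site results."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Made-up stand-ins for the packet traces held at each site.
SITE_TRACES = {
    "kaiserslautern": [{"proto": "tcp"}, {"proto": "udp"}, {"proto": "tcp"}],
    "ucsd": [{"proto": "tcp"}, {"proto": "icmp"}],
}

# Only the per-site Counters cross the wide-area network, never the traces.
partials = [reduce_at_site(trace) for trace in SITE_TRACES.values()]
summary = merge(partials)
print(dict(summary))
```

The pattern works here because counting is associative and commutative, so partial results can be merged in any order.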
TransCloud Demonstration
[Figure: Reduction Job 1 and Reduction Job 2 each run a reduction program and produce a reduction result; a Merge Job combines the reduction results into the final result.]
TransCloud Distributed Query
Introduction – TransCloud
• Several Basic TransCloud Concepts
– High Performance Highly Distributed Cloud Architecture Allowing Processes Across Multiple Administrative Domains Integrated With Dynamic Networking (GENI)
– Scalable Lightweight Federation Processes
– Services Are Based On Processes That Can Be Executed Anywhere World-Wide (Location Independent)
– Top Level Services Can Be Accessed Via Public Internet
– Core Processes and Data Streams Leverage Sophisticated Communication Services, Not Merely “Best Effort” Commodity Internet
TransCloud Distributed Query Demo
Introduction – TransCloud
• TransCloud Architectural Components
– High Level APIs
– A High Performance General Programming Environment
– A Wide Area Programming Environment Integrated With Query Systems, Resource Management Frameworks, Including Cluster, VM and Network Resource Management
– High Levels of Virtualization Based on VMs and Network Abstractions
TransCloud Equals
• IaaS Based on Slice-Based Federation Architecture (GENI/FIRE Standard)
– Current instantiation: MyPLC over Eucalyptus
– Want: ports to OpenStack, etc.
• Identity: X.509 certificates and ssh keys
– TransCloud sites agree to accept these as forms of identity
– Which to accept is up to the site
• Standard DNS Infrastructure
<instanceName>.<sliceName>.<siteName>.<authorityName>.trans-cloud.net: experiment interface
e.g. hadoop22.queryTest.hplabs.genicloud.trans-cloud.net
<siteName>.<authorityName>.trans-cloud.org: admin interface
e.g. hplabs.genicloud.trans-cloud.org
Each authority does its own DNS.
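The naming scheme on this slide is regular enough to build and parse mechanically, which is what lets a user know a VM's name without asking. The sketch below assumes only the two patterns shown on the slide; `InstanceName`, `experiment_fqdn`, `admin_fqdn` and `parse_experiment_fqdn` are hypothetical helpers, not part of any TransCloud tool.

```python
from typing import NamedTuple

EXPERIMENT_DOMAIN = "trans-cloud.net"  # experiment interface, per the slide
ADMIN_DOMAIN = "trans-cloud.org"       # admin interface, per the slide


class InstanceName(NamedTuple):
    instance_name: str
    slice_name: str
    site_name: str
    authority_name: str


def experiment_fqdn(name: InstanceName) -> str:
    """Build <instanceName>.<sliceName>.<siteName>.<authorityName>.trans-cloud.net."""
    return ".".join([*name, EXPERIMENT_DOMAIN])


def admin_fqdn(site_name: str, authority_name: str) -> str:
    """Build <siteName>.<authorityName>.trans-cloud.org."""
    return ".".join([site_name, authority_name, ADMIN_DOMAIN])


def parse_experiment_fqdn(fqdn: str) -> InstanceName:
    """Split an experiment-interface name back into its four labels."""
    suffix = "." + EXPERIMENT_DOMAIN
    if not fqdn.endswith(suffix):
        raise ValueError(f"not an experiment-interface name: {fqdn}")
    labels = fqdn[: -len(suffix)].split(".")
    if len(labels) != 4:
        raise ValueError(f"expected 4 labels before {EXPERIMENT_DOMAIN}: {fqdn}")
    return InstanceName(*labels)


vm = InstanceName("hadoop22", "queryTest", "hplabs", "genicloud")
print(experiment_fqdn(vm))
print(admin_fqdn(vm.site_name, vm.authority_name))
```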
TransCloud Equals..
• Experimental QaaS (Distributed Hadoop/Pig)
• User-done PaaS (some stock images, but the usual tools for building your own…)
Integration with GENI
• Programmer and User Interface to Cluster Control is MyPLC
– Cluster version of PlanetLab control interface
– Used for a number of clusters worldwide, including VICI project in US
• Mechanics of cluster control done by Eucalyptus
– Single Eucalyptus user – MyPLC
– Users log in to MyPLC, issue directives, MyPLC effectuates by issuing appropriate Eucalyptus commands
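The division of labor on this slide is a front-end controller that knows the users, plus a back-end cluster manager that sees a single account. A minimal sketch of that pattern follows. It does not use the real MyPLC or Eucalyptus interfaces: `Controller`, `RecordingBackend`, the `create_vm` action and the `"myplc"` account name are all hypothetical stand-ins.

```python
class RecordingBackend:
    """Hypothetical cluster manager: records requests instead of acting on them."""

    def __init__(self):
        self.calls = []

    def run(self, account, action, **params):
        self.calls.append((account, action, params))


class Controller:
    """Hypothetical front end: checks its own users, then acts under one shared account."""

    BACKEND_ACCOUNT = "myplc"  # assumed name of the single back-end user

    def __init__(self, backend, users):
        self.backend = backend
        self.users = set(users)

    def directive(self, user, action, **params):
        # User identity is checked here; the back end only ever sees BACKEND_ACCOUNT.
        if user not in self.users:
            raise PermissionError(f"unknown user: {user}")
        self.backend.run(self.BACKEND_ACCOUNT, action, **params)


backend = RecordingBackend()
controller = Controller(backend, users={"alice"})
controller.directive("alice", "create_vm", slice_name="queryTest")
print(backend.calls)
```

One consequence of this design is that all user management and policy live in the controller, so the back end needs no per-user configuration.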
TransCloud Architecture
[Figure: TransCloud architecture stack. Labels: Distributed Pig, Distributed Hadoop, NaCl, RePy, GENI, Eucalyptus, Slice Federation Architecture, Flow Primitives, 1039/RBUDP…]
TransCloud Distributed Query
Getting Hacked!
• On April 15 (about) we were attacked by the Romanian Black Hats
– Stock VM had a privileged user with a guessable password
– Came with the VM…
– Attack was a worm attack to recruit bots for botnets
– We were alerted when a third-party site saw worm probes coming from us
• Solution: shut it down, fix it, bring it up
• The Fix:
– Use MyPLC (PlanetLab) as the controller
– Login only by ssh key, X.509 cert (GENI standard)
– Ssh login only from specified IP addresses (EC-2 standard)
– Authorized users can add whitelisted IPs
– Currently enforced by iptables, but we’ll add support into OpenFlow
• Running final pre re-launch tests now
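The source-address restriction in the fix amounts to an allowlist check on each incoming ssh connection. The slide says enforcement is done by iptables; the sketch below only illustrates the decision logic using Python's standard `ipaddress` module. `SshAllowlist` is a hypothetical class, and the addresses come from the reserved documentation ranges rather than any real site.

```python
import ipaddress


class SshAllowlist:
    """Hypothetical allowlist: permit ssh only from networks added by authorized users."""

    def __init__(self):
        self._networks = []

    def add(self, cidr: str) -> None:
        # strict=False accepts a host address with a prefix, e.g. "192.0.2.10/24".
        self._networks.append(ipaddress.ip_network(cidr, strict=False))

    def permits(self, source_ip: str) -> bool:
        addr = ipaddress.ip_address(source_ip)
        return any(addr in net for net in self._networks)


allowlist = SshAllowlist()
allowlist.add("192.0.2.0/24")  # documentation range standing in for a user's network

for ip in ("192.0.2.10", "198.51.100.7"):
    verdict = "accept" if allowlist.permits(ip) else "drop"
    print(f"{ip}: {verdict}")
```

The default is to deny: an empty allowlist permits nothing, which matches the intent of closing the password-guessing path that the worm used.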
Goals for 2011
• Complete integration with MyPLC
• Integrate the ProtoGENI Resource Specification (RSpec)
– Modified to make sense for clusters
• Integrate the GENI standard Attribute-Based Access Control (ABAC)
• Add utility to permit users to manually adjust connectivity rules
– Integration with ProtoGENI RSpec
Advancing TransCloud
• If You Are Interested In Using This Environment, Contact Us
• If You Would Like To Contribute Resources, Contact Us
TransCloud at NICT
• THANKS!
• Questions????