
www.helsinki.fi/yliopisto

Cloud Computing and Datacenters

Prof. Sasu Tarkoma University of Helsinki

22.1.2010 Faculty of Science (Matemaattis-luonnontieteellinen tiedekunta) / Sasu Tarkoma


Contents

• Introduction
• Views to Cloud Computing
• Eucalyptus Open Source Platform
• MapReduce and Hadoop
• NEXUS/Mesos
• Network and Data Center Virtualization
• PSIRP


Focus

• Distributed applications and services on the cloud
• Collaboration & P2P
• Massive scale
  • Millions and billions of users and data items
  • Significant growth is expected in the mobile sector
    • 50-fold increase by 2015?
  • Evaluation methods for wide-scale experiments
• Exploiting parallelism & locality
• Custom software & solutions
• Building blocks
  • Interfaces, policies, protocols, algorithms, runtimes, platforms
  • State-of-the-art examples: Amazon, Google, Yahoo, Microsoft


Cloud Computing

The services of cloud computing can be divided into three categories:
1. Software-as-a-Service (SaaS), in which a vendor supplies the hardware infrastructure and the software product, and interacts with the user through a portal (software on demand, pay-as-you-go).
2. Platform-as-a-Service (PaaS), in which a set of software and development tools is hosted by a provider on the provider's infrastructure, for example, Google App Engine.
3. Infrastructure-as-a-Service (IaaS), which involves virtual server instances with unique IP addresses and blocks of on-demand storage, for example, Amazon Web Services.


Cloud Computing

Browser as a Platform

Datacenters and clusters

Virtualization

Web Application Frameworks

Elasticity

Location Independent Resource Pooling

Information demand and supply (Open APIs)

Ubiquitous Network Access

On-demand service

Software as a Service (SaaS)

Platform as a Service (PaaS)

Infrastructure as a Service (IaaS)

Private, community, public, hybrid clouds


Datacenter for Experiments

• The University of Helsinki (UH) has a new datacenter for experimenting with networking technology, services, and computational science
  • 240 CPUs with 8 cores each, virtualization support (Ubuntu and KVM)
  • Aim to support network virtualization with OpenFlow switches and NetFPGA software-defined routers
  • Platform experiments with Eucalyptus
  • Platform experiments with Nexus (joint work with ICSI and UCB)
  • Pub/sub control and data planes for data centers


Toward New Deployments

[Figure labels: Datacenter / cluster / testbed; Networking solutions and basic processes; Distribution middleware; Services, applications; New deployments]


Toward New Knowledge

[Figure labels: Datacenter / cluster / testbed; Evaluation methodology; Wide-area simulation; Real-life experiments; Formal models; New knowledge on distributed systems]


Eucalyptus

• Elastic Utility Computing Architecture Linking Your Programs To Useful Systems
• Originally a research project at UC Santa Barbara
• Now a private company (Eucalyptus Systems)
• Web services based implementation of elastic/utility/cloud computing infrastructure
• Linux image hosting à la Amazon
• How do we know if it is a cloud?
• Try to emulate an existing cloud: Amazon AWS
• Functions as a software overlay
• Existing installation should not be violated (too much)
• Focus on installation and maintenance


Eucalyptus

• Supports KVM and Xen
• Open source platform with Amazon AWS features
• Support for Amazon AWS (Elastic Compute Cloud EC2, Simple Storage Service S3, and Elastic Block Store EBS)
• Includes Walrus: an Amazon S3 interface-compatible storage manager
• Added support for elastic IP assignment
• Web-based interface for cloud configuration
• Image registration and image attribute manipulation
• Configurable scheduling policies and SLAs
• Support for multiple hypervisor technologies within the same cloud


Eucalyptus Usage

• Foster greater understanding and uptake of cloud computing
• Provide a vehicle for extending what is known about the utility model of computing
• Experimentation vehicle prior to buying commercial services
• Provide a development, debugging, and “tech preview” platform for public clouds
• Homogenize the local IT environment with public clouds
• Having AWS functionality locally makes moving to and using Amazon AWS easier, cheaper, and more sustainable
• Provide a basic software development platform for the open source community


Architecture

[Figure labels: Client-side API translator; Client-side interface (via network); Cloud Controller; Database; Walrus (S3); Cluster Controller; Node Controller]


Components

• Node Controller controls the execution, inspection, and termination of VM instances on the host where it runs.

• Cluster Controller gathers information about and schedules VM execution on specific node controllers, and manages the virtual instance network.

• Storage Controller (Walrus) is a put/get storage service that implements Amazon’s S3 interface, providing a mechanism for storing and accessing virtual machine images and user data.

• Cloud Controller is the entry point into the cloud for users and administrators. It queries node managers for information about resources, makes high-level scheduling decisions, and implements them by making requests to cluster controllers.
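The division of labour among the controllers can be sketched as a small Python model. This is an illustrative sketch only, not Eucalyptus code: the class and method names, the `free_cores` field, the first-fit node placement, and the "most free cores first" cluster choice are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeController:
    """Runs VM instances on one host (hypothetical model, not the Eucalyptus API)."""
    name: str
    free_cores: int
    instances: list = field(default_factory=list)

    def run_instance(self, vm_id: str, cores: int) -> None:
        # Reserve the cores and record the instance on this host.
        self.free_cores -= cores
        self.instances.append(vm_id)


@dataclass
class ClusterController:
    """Gathers node information and schedules VMs onto specific nodes."""
    name: str
    nodes: list

    def free_cores(self) -> int:
        return sum(node.free_cores for node in self.nodes)

    def schedule(self, vm_id: str, cores: int):
        # Assumed policy: first node with enough free cores wins.
        for node in self.nodes:
            if node.free_cores >= cores:
                node.run_instance(vm_id, cores)
                return node.name
        return None


@dataclass
class CloudController:
    """Entry point: makes the high-level decision, then delegates to a cluster."""
    clusters: list

    def run_instance(self, vm_id: str, cores: int):
        # Assumed policy: try clusters in order of free capacity, largest first.
        ordered = sorted(self.clusters, key=lambda c: c.free_cores(), reverse=True)
        for cluster in ordered:
            node_name = cluster.schedule(vm_id, cores)
            if node_name is not None:
                return cluster.name, node_name
        return None  # no capacity anywhere


cloud = CloudController([
    ClusterController("cc1", [NodeController("n1", 2), NodeController("n2", 8)]),
    ClusterController("cc2", [NodeController("n3", 4)]),
])
placement = cloud.run_instance("vm-a", 4)
```

The point of the sketch is the delegation chain: the cloud-level component never touches a host directly, and each cluster keeps its own view of node capacity.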


Cloud Controller

• Web services in three categories
  • Resource Services perform system-wide arbitration of resource allocations, let users manipulate properties of the virtual machines and networks, and monitor both system components and virtual resources.
  • Data Services govern persistent user and system data and provide a configurable user environment for formulating resource allocation request properties.
  • Interface Services present user-visible interfaces, handle authentication and protocol translation, and expose system management tools.


Virtual Interfaces

[Figure labels: Physical resource; VM instance; Virtual interface; Software Ethernet bridge; VLAN-tagged virtual interface; Physical interface; To physical Ethernet]

• Each VM instance is assigned a virtual interface that is connected to a software Ethernet bridge
• The Cluster Controller configures VM traffic isolation and dynamic public IP assignment


Walrus

• Walrus is a data storage service
• Leverages standard web services technologies (Axis2, Mule)
• Interface compatible with Amazon’s Simple Storage Service (S3)
• Walrus implements the REST interface (via HTTP), sometimes termed the “Query” interface, as well as the SOAP interface
• Users that have access to EUCALYPTUS can use Walrus to stream data into and out of the cloud, as well as from instances that they have started on nodes.
• Walrus acts as a storage service for VM images.


What’s it Made Out Of?

• Axis2 and Axis2c version 1.4.0
• Hibernate 3.2.2
• HSQLDB 1.8.0
• jetty 6.1.9
• JiBX (March 30th sourceforge)
• Mule 2.0.1
• Rampart version 1.3
• libvirt version 0.4.2
• socat-1.6.0
• VDE version 2.2.0-pre2


Cloud Speed

• Performance study using HPC applications and benchmarks
• Two questions:
  • What is the performance impact of virtualization?
  • What is the performance impact of cloud infrastructure?
• Tested Xen, Eucalyptus, and AWS (small SLA)
• Many answers:
  • Random access disk is slower with Xen
  • CPU-bound workloads can be faster with Xen, depending on configuration
  • Kernel version is far more important
  • Eucalyptus imposes no statistically detectable overhead
  • AWS small appears to throttle network bandwidth and (maybe) disk bandwidth


MapReduce and Hadoop

• MapReduce was developed by Google to process large datasets in clusters
  • Used for sorting, searching and indexing, counting, clustering, machine learning, etc.
  • Inspired by the Map and Reduce operations used in functional programming
  • Uses a filesystem or a database to store intermediate values and solutions
  • Solves problems by splitting them into smaller problems, then combining the solutions
    • Done by a master node
• Hadoop is a Java framework inspired by the Google File System and MapReduce
  • Rack-aware processing of vast data sets
  • Batch-oriented workloads
  • Used by Facebook and Amazon
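The split-then-combine idea can be shown with a word count, the usual introductory MapReduce example. The following is a single-process Python sketch, not Hadoop or Google code; the function names are chosen for illustration, and a real deployment would run the map and reduce calls on separate workers under a master.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str):
    """Map: emit an intermediate (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Group intermediate values by key, as happens between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(splits):
    # Each split is mapped independently; this is the step that parallelizes.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


counts = word_count(["the cloud", "the cluster the cloud"])
```

Because each map call only sees its own split and each reduce call only sees one key, both phases can be distributed without coordination beyond the grouping step.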


[Figure: MapReduce execution overview. The user program (1) forks the master and the workers; the master (2) assigns map and reduce tasks; map workers (3) read the input splits (split 0 to split 4) and (4) write intermediate files to local disk; reduce workers (5) read them remotely and (6) write output file 0 and output file 1. Stages: input files, map phase, intermediate files (on local disks), reduce phase, output files.]


Nexus / Mesos

• Nexus is about running multiple frameworks in the same cluster.
  • Resource manager.
  • Can run systems such as Hadoop.
  • Multiplexes resources across frameworks.
• Nexus decouples job execution management from resource management by providing a simple resource management layer over which frameworks like Hadoop and Dryad can run.
• Microkernel
  • Make reliable component as small as possible
• Exokernel
  • Give maximal control to frameworks
• IP model
  • Narrow waist over which diverse frameworks can run
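The separation between resource management and job execution can be sketched in Python as a thin manager that offers free resources and leaves every scheduling decision to the frameworks. This is a toy model under stated assumptions, not the Nexus or Mesos API: the `on_offer` callback, the core-only resource model, and the fixed offer order are all hypothetical.

```python
class Framework:
    """A framework (e.g. a Hadoop-like scheduler) that decides which offers to use."""

    def __init__(self, name, cores_per_task, pending_tasks):
        self.name = name
        self.cores_per_task = cores_per_task
        self.pending_tasks = pending_tasks

    def on_offer(self, node, free_cores):
        """Return how many cores to take from this offer (0 means decline)."""
        tasks = min(self.pending_tasks, free_cores // self.cores_per_task)
        self.pending_tasks -= tasks
        return tasks * self.cores_per_task


class ResourceManager:
    """Thin layer: tracks free cores per node and offers them to frameworks."""

    def __init__(self, free_cores_by_node):
        self.free = dict(free_cores_by_node)
        self.allocations = []  # (framework name, node, cores)

    def offer_round(self, frameworks):
        # Assumed policy: offer each node to the frameworks in list order.
        for node in self.free:
            for framework in frameworks:
                if self.free[node] == 0:
                    break
                taken = framework.on_offer(node, self.free[node])
                if taken:
                    self.free[node] -= taken
                    self.allocations.append((framework.name, node, taken))


manager = ResourceManager({"n1": 8, "n2": 4})
hadoop = Framework("hadoop", cores_per_task=2, pending_tasks=3)
dryad = Framework("dryad", cores_per_task=4, pending_tasks=2)
manager.offer_round([hadoop, dryad])
```

The manager contains no job logic at all, which is the "narrow waist" idea: a new framework only needs to implement the offer callback.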


Task Management


Task Management Operations


Intro to OpenFlow

OpenFlow is an open protocol for router configuration and control

Opportunities for convergence of packet and circuit switched and clean slate designs

Slides from the Stanford Clean Slate program: http://cleanslate.stanford.edu
OpenFlow Web site: http://www.openflowswitch.org


OpenFlow: Enable Innovations “within” the Infrastructure

[Figure labels: Controller (PC) with API and Net Services; SSL; OpenFlow Switch with Secure Channel (sw) and Flow Table (hw)]

• Add/delete flow entries
• Encapsulated packets
• Controller discovery
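The interaction between switch and controller can be sketched as a flow table with a table-miss path. The Python below is a conceptual model only and does not follow the OpenFlow wire protocol: the dictionary-based match fields, the `packet_in` callback name, and the controller's "always output on port 1" policy are assumptions for illustration.

```python
class FlowTable:
    """Ordered list of (match, action) entries held by the switch."""

    def __init__(self):
        self.entries = []

    def add_flow(self, match, action):
        self.entries.append((dict(match), action))

    def delete_flow(self, match):
        self.entries = [(m, a) for m, a in self.entries if m != match]

    def lookup(self, packet):
        # An entry matches when every field it names equals the packet's field.
        for match, action in self.entries:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        return None


class Controller:
    """Decides how unknown traffic is handled and installs flow entries."""

    def __init__(self):
        self.seen = []

    def packet_in(self, switch, packet):
        self.seen.append(packet)
        # Hypothetical policy: send this destination out of port 1 from now on.
        switch.table.add_flow({"dst": packet["dst"]}, "output:1")
        return "output:1"


class Switch:
    def __init__(self, controller):
        self.table = FlowTable()
        self.controller = controller

    def receive(self, packet):
        action = self.table.lookup(packet)
        if action is None:
            # Table miss: hand the packet to the controller.
            action = self.controller.packet_in(self, packet)
        return action


controller = Controller()
switch = Switch(controller)
packet = {"src": "10.0.0.1", "dst": "10.0.0.2"}
first = switch.receive(packet)   # miss, goes to the controller
second = switch.receive(packet)  # hit, handled by the flow table
```

Only the first packet of a flow reaches the controller; later packets are matched in the table, which is what keeps the control logic out of the forwarding path.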


PSIRP

• Publish/subscribe Internet Routing Paradigm (PSIRP)
• FP7 project coordinated by HIIT
• Creating a new clean-slate protocol suite based on publish/subscribe


PSIRP

Observations

No topological addresses, only labels

No explicit layering (blackboard pattern)

Security enhanced using self-certification

End-to-end reachability, control in the network

Natural support for multicast, it is the norm

Support for broadcast and all-optical label-switching technologies

Dynamic state is introduced into the network

How do we make it scale?

Fragmentation

Rendezvous

Routing


Forwarding Design

Fast path
• In-packet Bloom filters
• Line-rate forwarding

Slow path (Rendezvous)
• Content-centric functions
• Policies
• Caching configuration
• Security


[Figure: data forwarding between a publisher and a subscriber. Labels: Forwarding nodes; Forwarding edge node; AS: Topology; AS: Rendezvous; Publish; Subscribe; Create delivery path; Configure forwarding path; Data forwarding]


Characteristics
• Links have identifiers (Link IDs)
• Source routing mechanism
• Install forwarding state on demand (traffic aggregation)

Topology Manager
• Network topology graph and its maintenance
• Constructs Bloom filter-based forwarding identifiers


Efficient flat identifier based forwarding
• Current zFilter size is 256 bits
• Link IDs are added to the zFilter (OR operation)
• Verification requires one comparison (AND operation)

Limitations
• Possible false positives
• Wrong forwarding path
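The OR and AND operations can be made concrete with a short Python sketch of in-packet Bloom filter forwarding. Only the 256-bit filter size comes from the slide; the hash-based Link ID derivation, the choice of five bits per Link ID, and the link names are assumptions made for illustration.

```python
import hashlib

FILTER_BITS = 256   # zFilter size stated on the slide
BITS_PER_LINK = 5   # assumed number of hash positions per Link ID


def link_id(name: str) -> int:
    """Derive a sparse 256-bit Link ID from a link name (hypothetical derivation)."""
    lid = 0
    for i in range(BITS_PER_LINK):
        digest = hashlib.sha256(f"{name}:{i}".encode()).digest()
        position = int.from_bytes(digest[:4], "big") % FILTER_BITS
        lid |= 1 << position
    return lid


def build_zfilter(path_links) -> int:
    """Source side: OR together the Link IDs of every link on the delivery path."""
    zfilter = 0
    for name in path_links:
        zfilter |= link_id(name)
    return zfilter


def should_forward(zfilter: int, outgoing_link: str) -> bool:
    """Forwarding node: one AND and one comparison per outgoing link."""
    lid = link_id(outgoing_link)
    return (zfilter & lid) == lid


zfilter = build_zfilter(["A-B", "B-C"])
on_path = should_forward(zfilter, "A-B")
```

A link that is not on the path can still pass the check if all of its bits happen to be set by other links. That is the false positive mentioned under limitations, and it becomes more likely as more Link IDs are ORed into the same 256 bits.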


Overlay Networks Book

• Introduction: Overview; Overlay Technology; Applications; Properties of Data; Structure of the Book
• Network Technologies: Networking; Firewalls and NATs; Naming; Addressing; Routing; Multicast; Network Coordinates; Network Metrics
• Properties of Networks and Data: Data on the Internet; Zipf’s Law; Scale-free Networks; Robustness; Small Worlds
• Unstructured Overlays: Overview; Early Systems; Locating Data; Napster; Gnutella; Skype; BitTorrent; Cross-ISP BitTorrent; Freenet; Comparison
• Foundations of Structured Overlays: Overview; Geometries; Consistent Hashing; Distributed Data Structures for Clusters
• Distributed Hash Tables: Overview; APIs; Plaxton’s Algorithm; Chord; Pastry; Koorde; Tapestry; Kademlia; Content Addressable Network; Viceroy; Skip Graph; Comparison
• Probabilistic Algorithms: Overview of Bloom Filters; Bloom Filters; Bloom Filters in Distributed Computing; Gossip Algorithms
• Content-based Networking and Publish/Subscribe: Overview; DHT-based Data-centric Communications; Content-based Routing; Router Configurations; Siena and Routing Structures; Hermes; Formal Specification of Content-based Routing Systems; Pub/sub Mobility
• Security: Overview; Attacks and Threats; Securing Data; Security Issues in P2P Networks; Anonymous Routing; Security Issues in Pub/Sub Networks
• Applications: Amazon Dynamo; Overlay Video Delivery; SIP and P2PSIP; CDN Solutions
• Conclusions
• References
• Index


Conclusions and Future Work

• Excellent facilities and connections for doing high impact research in cloud computing in Helsinki
• Current work with PSIRP, Future Internet SHOK, Cloud Software SHOK
• Not covered by current activities, seeds for a new project
  • Software-defined networking
  • Dynamic controller for OpenFlow enabled routers
  • PSIRP in data centers
  • Petabyte storage mechanisms