TRANSCRIPT
Distributed Systems 1
Chapter 11: Advanced Distributed Systems
[Timeline figure, 1960–2000: mainframe, minicomputer, Xerox PARC, ARPANET, email, LAN, TCP/IP, workstation, PCs, Crays, MPPs, Internet, WWW, HTML, XML, WS cluster, PC cluster, PDAs, high-performance computing models, Web services, P2P computing, Grid computing, Pervasive computing (IBM), Parasitic computing]
P2P Computing
Def 1: “A class of applications that take advantage of resources (e.g., storage, cycles, content) available at the edge of the Internet.”
Edges often turned off, without permanent IP addresses, etc.
Def 2: “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.” (IPTPS’02)
Lots of other definitions that fit in between
Distributed Systems 3
Applications: Computing
Examples: SETI@home, United Devices, Genome@home, many others
Approach suitable for a particular class of problems:
  Massive parallelism
  Low bandwidth/computation ratio
  Error tolerance, independence among subtasks
Problems:
  Centralized
  How to extend the model to problems that are not massively parallel?
  Ability to operate in an environment with limited trust and dynamic resources
Applications: File Sharing
The ‘killer’ application to date
Too many to list them all: Napster, FastTrack (KaZaA, iMesh), Gnutella (LimeWire, BearShare), Overnet, BitTorrent, etc.
Decentralized control: building a (relatively) reliable data-delivery service from a large, heterogeneous set of unreliable components.
FastTrack (KaZaA) measurements, 2003:
  Bytes transferred: 0.8 TB/day on average
  Download sessions: 230,000/day
  Local users: ≥ 10,000
  Unique files: ~300,000
Applications: Content Streaming
Streaming: the user ‘plays’ the data as it arrives
Examples: PPLive, SplitStream, etc.
[Figure: client/server approach vs. P2P approach; the overloaded server sighs “Oh, I am exhausted!”]
The first few users get the stream from the server; new users get the stream from the server or from users who are already receiving it.
Idea: offload part of the server load to consumers to improve scalability.
Many other P2P applications …
Backup storage (HiveNet, OceanStore)
Collaborative environments (Groove Networks)
Web serving communities (uServ)
Instant messaging (Yahoo, AOL)
Anonymous email
Censorship-resistant publishing systems (Eternity, Freenet)
Spam filtering
Overlay Network
An abstract layer built on top of the physical network
Neighbors in the overlay can be several hops away in the physical network
Why do we need overlays?
  Flexibility in:
    Choosing neighbors
    Forming and customizing the topology to fit application needs (e.g., short delay, reliability, high bandwidth, …)
    Designing communication protocols among nodes
  Getting around limitations in legacy networks
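The point that overlay neighbours may be several physical hops apart can be sketched with a toy topology (all node names and links below are made up for illustration):

```python
from collections import deque

# Physical (underlay) topology: hosts A, B, C behind routers R1-R4.
underlay = {
    "A": ["R1"], "B": ["R3"], "C": ["R4"],
    "R1": ["A", "R2"], "R2": ["R1", "R3", "R4"],
    "R3": ["B", "R2"], "R4": ["C", "R2"],
}

# Overlay topology: logical links chosen by the application,
# independent of physical adjacency.
overlay = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}

def underlay_hops(src, dst):
    """BFS over the physical network: hop count behind one overlay edge."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == dst:
            return dist
        for nxt in underlay[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

# A and B are one logical hop apart in the overlay,
# but four physical hops apart in the underlay.
print(underlay_hops("A", "B"))
```

This is why overlay design cares about matching logical links to the underlying network (short delay, high bandwidth): a single overlay edge can hide a long physical path.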
Network Communications Layer
This layer describes the network characteristics of desktop machines connected over the Internet, or of small wireless or sensor-based devices connected in an ad-hoc manner.
Overlay Nodes Management layer
The Overlay Nodes Management layer covers the management of peers, which includes the discovery of peers and routing algorithms for optimization.
Features Management layer
The Features Management layer deals with the security, reliability, fault resiliency and aggregated resource availability aspects of maintaining the robustness of P2P systems.
Services Specific layer
The Services Specific layer supports the underlying P2P infrastructure and the application-specific components through scheduling of parallel and computation-intensive tasks, and through content and file management.
Application-level layer
The Application-level layer is concerned with tools, applications and services that are implemented with specific functionalities on top of the underlying P2P overlay infrastructure.
P2P Systems: Simple Model
[Figure: software architecture model on a peer — layers: P2P Application, Middleware, P2P Substrate, Operating System, Hardware]
System architecture: peers form an overlay according to the P2P substrate
Peer Software Architecture Model
P2P Substrate (key component):
  Overlay management
    Construction
    Maintenance (peer join/leave/fail and network dynamics)
  Resource management
    Allocation (storage)
    Discovery (routing and lookup)
Substrates can be classified according to the flexibility of placing objects at peers
P2P Substrates: Classification
Structured (or tightly controlled, DHT):
− Objects are rigidly assigned to specific peers
− Looks like a Distributed Hash Table (DHT)
− Efficient search and a guarantee of finding
− No partial-name or keyword queries
− Maintenance overhead
− Examples: Chord, CAN, Pastry, Tapestry, Kademlia (Overnet)
Unstructured (or loosely controlled):
− Objects can be anywhere
− Supports partial-name and keyword queries
− Inefficient search and no guarantee of finding
− Some heuristics exist to enhance performance
− Examples: Gnutella, KaZaA (super nodes), GIA
Napster (1)
Sharing of music files
Lists of shared files are uploaded to the Napster server
Queries contain keywords of the required file
The server returns IP addresses of user machines holding the file
File transfer is direct
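Napster's lookup flow can be sketched as a toy centralized index (class and field names below are illustrative, not Napster's actual protocol):

```python
# Toy central index in the style of Napster: peers register their file
# lists with one server; queries hit only the server; the file transfer
# itself is then made directly between peers.

class IndexServer:
    def __init__(self):
        self.index = {}                       # filename -> set of peer IPs

    def register(self, peer_ip, filenames):
        """A peer uploads its list of shared files on connect."""
        for name in filenames:
            self.index.setdefault(name, set()).add(peer_ip)

    def search(self, keyword):
        """Return IPs of peers sharing any file whose name matches."""
        return {ip for name, ips in self.index.items()
                if keyword in name for ip in ips}

server = IndexServer()
server.register("10.0.0.5", ["madonna - ray of light.mp3"])
server.register("10.0.0.9", ["ray charles - hit the road.mp3"])
print(server.search("ray"))   # both peers hold a matching file
```

The sketch makes the two drawbacks on the next slide concrete: every query goes through `server` (scalability bottleneck), and losing it kills all lookups (single point of failure).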
Napster (2)
Centralised model
  The Napster server ensures correct results
  Only used for finding the location of the files
Drawbacks:
  Scalability bottleneck
  Single point of failure
  Denial-of-service attacks possible
  Lawsuits
Gnutella (1)
Sharing of any type of files
Decentralised search
Queries are sent to the neighbour nodes
Neighbours ask their own neighbours and so on
Time To Live (TTL) field on queries
File transfer is direct
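The TTL-bounded flooding can be sketched as follows (the topology, file placement, and depth-first forwarding are illustrative; real Gnutella forwards in parallel and deduplicates queries by message ID):

```python
# Toy Gnutella-style flooding search with a TTL.
network = {1: [2, 3], 2: [1, 4], 3: [1, 4, 5], 4: [2, 3, 6], 5: [3], 6: [4]}
files = {5: {"A"}, 6: {"A"}}          # which node holds which files

def flood_search(origin, wanted, ttl):
    hits, seen = [], set()

    def forward(node, remaining):
        if node in seen:
            return                    # drop duplicate queries
        seen.add(node)
        if wanted in files.get(node, set()):
            hits.append(node)         # in reality a reply back-propagates
        if remaining == 0:
            return                    # TTL expired: stop forwarding
        for neighbour in network[node]:
            forward(neighbour, remaining - 1)

    forward(origin, ttl)
    return sorted(hits)

print(flood_search(1, "A", ttl=2))    # finds node 5; a larger TTL also reaches 6
print(flood_search(1, "A", ttl=4))
```

The TTL caps how far a query spreads, trading completeness for traffic: with a small TTL some copies of the file are simply never found.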
Gnutella Network
[Figure: seven nodes, 1–7; nodes 5 and 7 hold file A]
Steps:
1. Node 2 initiates a search for A (it does not know where A is, so it floods the query)
2. Node 2 sends the query message to all its neighbours
3. Neighbours forward the message to their own neighbours
4. Nodes that have A (here, nodes 5 and 7) initiate a reply message
5. The query reply message is back-propagated along the query path
6. Node 2 gets the replies (A:5 and A:7)
7. Node 2 downloads A directly
Gnutella (2)
Decentralised model
  No single point of failure
  Less susceptible to denial of service
Drawbacks:
  SCALABILITY (flooding)
  Cannot ensure correct results
KaZaA Hybrid of Napster and Gnutella
Super-peers act as local search hubs
Each super-peer is like a constrained Napster server
Super-peers are chosen automatically based on capacity and availability
Lists of files are uploaded to a super-peer
Super-peers periodically exchange file lists
Queries are sent to super-peers
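The two-tier scheme can be sketched as follows (a toy model; the class names and the gossip step are illustrative, not KaZaA's actual protocol):

```python
# Toy two-tier KaZaA-style index: ordinary peers upload file lists to
# their super-peer; super-peers periodically exchange lists so a query
# sent to one hub can answer for the whole neighbourhood.

class SuperPeer:
    def __init__(self, name):
        self.name = name
        self.local = {}               # filename -> peer (my own peers)
        self.remote = {}              # merged in from other super-peers

    def upload(self, peer, filenames):
        for f in filenames:
            self.local[f] = peer

    def exchange(self, other):
        """Periodic exchange of file lists between super-peers."""
        self.remote.update(other.local)
        other.remote.update(self.local)

    def query(self, keyword):
        catalogue = {**self.remote, **self.local}
        return sorted(p for f, p in catalogue.items() if keyword in f)

sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.upload("peerA", ["song.mp3"])
sp2.upload("peerB", ["song_remix.mp3"])
sp1.exchange(sp2)
print(sp1.query("song"))      # both peers found via a single super-peer
```

The design point: flooding is confined to a small super-peer core, while ordinary peers see Napster-like single-hop lookups, without any single global server.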
Freenet
Ensures anonymity
Decentralised search
Queries are sent to the neighbour nodes
Neighbours ask their own neighbours and so on
The query process is sequential
Learning ability
• Second generation P2P (overlay) networks
• Self-organizing
• Load balanced
• Fault-tolerant
• Guarantees on the number of hops to answer a query
• Based on a distributed hash table interface
Structured P2P
• Distributed version of a hash table data structure
• Stores (key, value) pairs
The key is like a filename; the value can be the file contents
• Goal: Efficiently insert/lookup/delete (key, value) pairs
• Each peer stores a subset of (key, value) pairs in the system
• Core operation: Find node responsible for a key
Map the key to a node; efficiently route insert/lookup/delete requests to that node
Distributed Hash Tables (DHT)
• Node id: m-bit identifier (similar to an IP address)
• Key: sequence of bytes
• Value: sequence of bytes
put(key, value) Store (key,value) at the node responsible for the key
value = get(key) Retrieve value associated with key (from the appropriate node)
DHT Generic Interface
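A minimal sketch of this generic interface, assuming a Chord-style "first node at or after the key id" placement rule (the 8-bit identifier space and the node ids are toy values):

```python
import hashlib

M = 8                                   # toy identifier space: 0..255

def ident(data: bytes) -> int:
    """Hash arbitrary bytes to an m-bit identifier."""
    return int(hashlib.sha1(data).hexdigest(), 16) % (2 ** M)

class DHT:
    """Centralized toy model: all 'nodes' live in one process."""
    def __init__(self, node_ids):
        self.nodes = {n: {} for n in sorted(node_ids)}   # id -> local store

    def _responsible(self, key_id):
        for n in self.nodes:            # ascending order
            if n >= key_id:
                return n
        return min(self.nodes)          # wrap around the ring

    def put(self, key: bytes, value):
        """Store (key, value) at the node responsible for the key."""
        self.nodes[self._responsible(ident(key))][key] = value

    def get(self, key: bytes):
        """Retrieve the value from the appropriate node."""
        return self.nodes[self._responsible(ident(key))].get(key)

dht = DHT([20, 87, 140, 210])
dht.put(b"song.mp3", "peer 10.0.0.5")
print(dht.get(b"song.mp3"))
```

Because `put` and `get` hash the key the same way, they always land on the same node; a real DHT distributes the `_responsible` computation via routing instead of scanning a local list.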
File sharing
Databases
Service discovery
Chat service
Publish/subscribe networks
DHT Applications
Keys mapped evenly to all nodes in the network
Each node maintains information about only a few other nodes
Efficient routing of messages to nodes
Node insertion/deletion only affects a few nodes
DHT Desirable Properties
Node id: m-bit identifier (similar to an IP address)
Key: m-bit identifier (hash of a sequence of bytes)
Value: sequence of bytes
API insert(key, value) lookup(key) update(key, newval) join(n) leave()
Chord API
Chord Operation (1)
Nodes form a circle based on node identifiers
Each node is responsible for storing a portion of the keys
Hash function ensures even distribution of keys and nodes in the circle
Chord Ring– definition
Finger table: node k stores pointers to the first nodes succeeding k+2^0, k+2^1, k+2^2, …, k+2^(m−1) (mod 2^m)
Find the node responsible for any key in O(log N) steps, with O(log N) routing state per node (N = number of nodes)
Chord Operation (3)
Forward the lookup to the furthest known node that precedes the key
The query reaches the target node in O(log N) hops
Scalable Lookup Scheme

Finger table for N8 (m = 6):
  N8+1  → N14
  N8+2  → N14
  N8+4  → N14
  N8+8  → N21
  N8+16 → N32
  N8+32 → N42

finger[k] = first node that succeeds (n + 2^(k−1)) mod 2^m

[Figure: Chord ring with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56; fingers 1–6 of N8 drawn as chords across the ring]
Scalable Lookup Scheme
// ask node n to find the successor of idn.find_successor(id)
if (id belongs to (n, successor])return successor;
elsen0 = closest preceding node(id);return n0.find_successor(id);
// search the local table for the highest predecessor of idn.closest_preceding_node(id)
for i = m downto 1if (finger[i] belongs to (n, id))
return finger[i];return n;
Chord Properties
In a system with N nodes and K keys:
  Each node manages at most K/N keys
  Bounded routing information stored at every node
  Lookups resolved in O(log N) hops
Drawbacks:
  No delivery guarantees
  Poor network locality
Grid Computing
What is a Grid – an integrated advanced cyber infrastructure that delivers:
Computing capacity Data capacity Communication capacity
Analogy to the Electrical Power Grid
History
For many years, a few wacky computer scientists have been trying to help other scientists use distributed computing:
  Interactive simulation (climate modeling)
  Very large-scale simulation and analysis (galaxy formation, gravity waves, battlefield simulation)
  Engineering (parameter studies, linked component models)
  Experimental data analysis (high-energy physics)
  Image and sensor analysis (astronomy, climate study, ecology)
  Online instrumentation (microscopes, x-ray devices, etc.)
  Remote visualization (climate studies, biology)
  Engineering (large-scale structural testing, chemical engineering)
In these cases, the scientific problems are big enough that they require people in several organizations to collaborate and share computing resources, data, and instruments.
Some Core Problems
Too hard to keep track of authentication data (ID/password) across institutions
Too hard to monitor system and application status across institutions
Too many ways to submit jobs
Too many ways to store & access files and data
Too many ways to keep track of data
Too easy to leave “dangling” resources lying around (robustness)
Challenging Applications
The applications that Grid technology is aimed at are not easy applications! The reason these things haven’t been done before is that people believed it was too hard to bother trying. If you’re trying to do these things, you’d better be prepared for it to be challenging.
Grid technologies are aimed at helping to overcome the challenges:
  They solve some of the most common problems
  They encourage standard solutions that make future interoperability easier
  They were developed as parts of real projects
  In many cases, they benefit from years of lessons from multiple applications
  Ever-improving documentation, installation, configuration, training
Earth System Grid
Goal: address technical obstacles to the sharing & analysis of high-volume data from advanced earth system models
Other Examples of Grids
TeraGrid – NSF-funded, linking 5 major research sites at 40 Gb/s (www.teragrid.org)
European Union Data Grid – grid for applications in high energy physics, environmental science, bioinformatics (www.eu-datagrid.org)
Access Grid – collaboration systems using commodity technologies (www.accessgrid.org)
Network for Earthquake Engineering Simulations Grid - grid for earthquake engineering (www.nees.org)
Current Status of the Grid
Dozens of Grid projects in the scientific and technical computing and academic research communities
Consensus on key concepts and technologies (GGF – Global Grid Forum)
Open-source Globus Toolkit – a standard for major protocols and services
Funding agencies are funding a lot of Grid projects
Business interest emerging rapidly
Standards still emerging (grid services, Web Services Resource Framework)
Requires significant user training
Use Grid Now
A lot of work to make applications grid-ready:
  Adopt new algorithms for parallel computation
  Change the user interface
Have to build the application on different architectures
Need to move the application and data to different computers
Security and licensing issues
Requires a lot of system administration expertise
Largely UNIX-based
Software Layers
[Figure: layered stack — Globus client on the user’s workstation; Web browser or command window (user interface); Globus server on the master node (certificates, job submission); queue managers and schedulers on the master node (job manager); applications running on Grid clusters]
Developing Grid Standards
[Figure: timeline 1990–2010, vertical axis “increased functionality, standardization” — custom solutions; Internet standards; Globus Toolkit (de facto standard, single implementation); Web services, etc.; Open Grid Services Architecture (real standards, multiple implementations); managed shared virtual systems (research). Slide courtesy of Ian Foster, Argonne National Labs]
Sand Glass Model
Trying to force homogeneity on users is futile. Everyone has their own preferences, sometimes even dogma.
The Internet provides the model…
Evolution of the Grid
[Figure: timeline, vertical axis “increased functionality, standardization” — custom solutions (X.509, LDAP, FTP, …); de facto standards, GGF: GridFTP, GSI (leveraging IETF): Globus Toolkit; Web services; Open Grid Services Architecture, GGF: OGSI, WSRF, … (leveraging OASIS, W3C, IETF), multiple implementations including the Globus Toolkit; app-specific services]
Open Grid Services Architecture
Define a service-oriented architecture…
  the key to effective virtualization
…to address vital Grid requirements
  utility, on-demand, system management, collaborative computing, etc.
…building on Web service standards
  extending those standards when needed
Grid and Web Services Convergence
The definition of WSRF means that the Grid and Web services communities can move forward on a common base.
Who Is the Grid For?
Any Grid (distributed/collaborative) application or system will involve several “classes” of people:
  “End users” (e.g., scientists, engineers, customers)
  Application/product developers
  System administrators
  System architects and integrators
Each user class has unique skills and unique requirements.
The user class whose needs are met varies from tool to tool (even within the Globus Toolkit).
What End Users Need
Secure, reliable, on-demand access to data, software, people, and other resources (ideally all via a Web browser!)
General Architecture
[Figure: users work with client applications (Web browser, data viewer tool, chat tool); application services organize VOs and enable access to other services (Web portal, simulation tool, registration service); collective services aggregate and/or virtualize resources (data catalog, credential repository, certificate authority, database services); resources implement standard access and management interfaces (compute servers, cameras, telepresence monitor)]
Grid Community Software
[Figure: the same architecture instantiated with concrete software — CHEF portal and chat teamlet, MyProxy, certificate authority, Globus MCS/RLS, Globus Index Service, Globus GRAM on compute servers, Globus DAI on database services; of the components, 9 come from the Grid community, 8 are off the shelf, and 2 are from the application developer]
Social Policies/Procedures
How will people use the system?
  Who will set up access control?
  Who creates the data?
  How will computational resources be added to the system?
  How will simulation capabilities be used?
  What will accounting data be used for?
Not all problems are solved by technology! Understanding how the system will be used is important for narrowing the requirements.
What Is the Globus® Toolkit?
The Globus Toolkit is a collection of solutions to problems that frequently come up when trying to build collaborative distributed applications.
Heterogeneity:
  To date (v1.0–v4.0), the Toolkit has focused on simplifying heterogeneity for application developers.
  We aspire to include more “vertical solutions” in future versions.
Standards:
  Our goal has been to capitalize on and encourage use of existing standards (IETF, W3C, OASIS, GGF).
  The Toolkit also includes reference implementations of new/proposed standards in these organizations.
Comparison of P2P and Grid

Category         | Grid Computing                                              | P2P Computing
-----------------|-------------------------------------------------------------|------------------------------
Community        | well organized (institutes, universities)                   | anonymous
Application      | computing-intensive                                         | media exchange
Shared resources | high-performance, reliable computers and clusters, networks | PCs, different networks
Scalability      | 100–1000 nodes                                              | millions or even more
Services         | well defined: authorization/authentication, service discovery, task scheduling, … | limited: locating resources
Trustworthiness  | members trust each other                                    | no trust constraint
Software         | mature packages, standards                                  | simple software, no standards