
1

A Scalable Content-Addressable Network

Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker

Pirammanayagam Manickavasagam

2

Overview

Introduction

Design

Design Improvements

Design Review

Related Work

Discussion

3

Introduction

Hash table functionality: maps a ‘key’ to a ‘value’.

Content-Addressable Network (CAN): a distributed infrastructure that provides hash-table-like functionality at Internet-like scale.

Characteristics: scalable, fault-tolerant and completely self-organizing.

4

Introduction (cont..)

Napster: locating a file is centralized.

Gnutella: floods the request for a file, which does not scale.

CAN provides a solution:

Scalable: nodes maintain only a small amount of control state.

Distributed: the hash table is stored across all peers.

5

Design

Each node stores a chunk of the hash table (its zone) along with information about adjacent zones.

Requests are forwarded towards the CAN node whose zone contains the key.

Indexing uses a virtual d-dimensional Cartesian coordinate space. The coordinates are purely logical.

6

Coordinate Space

[Figure: 2-d coordinate space with corners (0,0), (1,0) and (0,1), partitioned into zones owned by nodes A, B, C and D]

Each node randomly picks a point in the coordinate space. The coordinate space is dynamically partitioned among the nodes.

Each node owns its individual zone.
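A minimal Python sketch of zones as axis-aligned boxes in the unit square may help make ownership concrete. `Zone`, `owner_of`, the half-open `[low, high)` convention and the four-node layout are all illustrative assumptions, not details taken from the slides.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Zone:
    """Axis-aligned box in the unit d-cube: one (low, high) interval per dimension."""
    bounds: tuple

    def contains(self, point):
        # Half-open intervals [low, high) so two adjacent zones never both claim a point.
        return all(lo <= x < hi for (lo, hi), x in zip(self.bounds, point))

    def volume(self):
        return prod(hi - lo for lo, hi in self.bounds)


def owner_of(point, zones):
    """Return the id of the node whose zone contains `point` (zones: id -> Zone)."""
    for node, zone in zones.items():
        if zone.contains(point):
            return node
    raise LookupError(f"{point} lies outside every zone")


# Four nodes sharing a 2-d space; this layout is chosen for illustration only.
zones = {
    "A": Zone(((0.0, 0.5), (0.5, 1.0))),
    "B": Zone(((0.5, 1.0), (0.5, 1.0))),
    "C": Zone(((0.5, 1.0), (0.0, 0.5))),
    "D": Zone(((0.0, 0.5), (0.0, 0.5))),
}
print(owner_of((0.7, 0.2), zones))  # C
```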

7

Design (cont..)

Inserting a pair (key K1, value V1): a hash function maps K1 to a point P1 in the space, and the pair is stored at the node that owns the zone containing P1.

Retrieving a value: apply the same hash function to the key to find P1, then route the request to the node that owns P1. Each node learns and maintains a table with the details of its adjacent nodes.
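The insert/retrieve path can be sketched in Python under stated assumptions: `key_to_point` derives each coordinate from SHA-256 (the slides only say "hash function"), and the hypothetical `ToyCAN` class finds the owning zone by direct lookup instead of routing hop by hop through neighbors.

```python
import hashlib


def key_to_point(key, d=2):
    """Deterministically map a key to a point in the unit d-cube [0, 1)^d.

    Assumption: coordinate i comes from SHA-256 of "i:key"; the top 53 bits of
    the digest are kept so the float is exact and strictly below 1.0.
    """
    coords = []
    for i in range(d):
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        coords.append((int.from_bytes(digest[:8], "big") >> 11) / 2**53)
    return tuple(coords)


class ToyCAN:
    """Single-process stand-in for a CAN: node id -> zone bounds and a local store."""

    def __init__(self, zones):
        self.zones = zones                      # node id -> ((lo, hi), ...) per dimension
        self.stores = {node: {} for node in zones}

    def _owner(self, point):
        for node, bounds in self.zones.items():
            if all(lo <= x < hi for (lo, hi), x in zip(bounds, point)):
                return node
        raise LookupError(point)

    def insert(self, key, value):
        # Hash the key to a point, then store the pair at that zone's owner.
        self.stores[self._owner(key_to_point(key))][key] = value

    def retrieve(self, key):
        # The same hash leads a lookup to the same owner that stored the pair.
        return self.stores[self._owner(key_to_point(key))].get(key)


can = ToyCAN({"left": ((0.0, 0.5), (0.0, 1.0)), "right": ((0.5, 1.0), (0.0, 1.0))})
can.insert("song.mp3", "10.0.0.7")
print(can.retrieve("song.mp3"))  # 10.0.0.7
```

Because insertion and lookup apply the same deterministic hash, any node can find a value knowing only its key.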

8

Routing

Information needed for routing: each CAN node holds a routing table that contains the IP address and virtual coordinate zone of each of its neighbors. Two nodes are neighbors if their coordinate spans overlap along d-1 dimensions and abut along one dimension. For a d-dimensional space partitioned into n equal zones, each node maintains 2d neighbors.
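The neighbor rule (spans overlapping in d-1 dimensions and abutting in one) translates directly into a check on two boxes. This Python sketch, with the hypothetical helper `are_neighbors`, ignores wrap-around at the edges of the space and uses a small tolerance for floating-point comparisons.

```python
def are_neighbors(a, b, eps=1e-12):
    """True if boxes a and b overlap in d-1 dimensions and abut in exactly one.

    a, b: tuples of (low, high) intervals, one per dimension.
    """
    abutting = 0
    for (alo, ahi), (blo, bhi) in zip(a, b):
        if min(ahi, bhi) - max(alo, blo) > eps:
            continue                          # spans overlap in this dimension
        if abs(ahi - blo) <= eps or abs(bhi - alo) <= eps:
            abutting += 1                     # faces touch in this dimension
        else:
            return False                      # a gap separates the spans
    return abutting == 1


a = ((0.0, 0.5), (0.0, 0.5))
b = ((0.5, 1.0), (0.0, 0.5))   # shares an edge with a along X
c = ((0.5, 1.0), (0.5, 1.0))   # touches a only at a corner
print(are_neighbors(a, b), are_neighbors(a, c))  # True False
```

Zones that touch only at a corner abut in two dimensions and are correctly rejected.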

9

In the figure, nodes 5 and 1 are neighbors: their spans overlap along the Y axis, and their X spans abut.

10

Routing (Cont..)

A CAN message carries the destination coordinates. Routing proceeds by simple greedy forwarding: each node passes the message to the neighbor closest to the destination. The average path length is $(d/4)\,n^{1/d}$ hops, where n is the number of zones. Because many paths exist between two points, the network keeps routing even if some nodes fail.
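Greedy forwarding can be sketched as choosing, among the current node's neighbors, the one nearest the destination point. Two assumptions here: distance is measured from each neighbor's zone center, and the space wraps around at its edges (a torus), so coordinates 0.95 and 0.05 count as close. `next_hop` and `torus_distance` are hypothetical names.

```python
import math


def torus_distance(p, q):
    """Euclidean distance in the unit d-torus: every axis wraps around at 1.0."""
    return math.sqrt(sum(min(abs(x - y), 1 - abs(x - y)) ** 2 for x, y in zip(p, q)))


def next_hop(neighbors, dest):
    """Greedy CAN forwarding step.

    neighbors: neighbor id -> center point of that neighbor's zone.
    Returns the id of the neighbor whose zone center is closest to dest.
    """
    return min(neighbors, key=lambda n: torus_distance(neighbors[n], dest))


neighbors = {"north": (0.5, 0.75), "east": (0.75, 0.5), "south": (0.5, 0.25), "west": (0.25, 0.5)}
print(next_hop(neighbors, (0.9, 0.55)))  # east
```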

11

Construction

1. First the new node must find a node already in the CAN.

2. Next, using the CAN routing mechanisms, it must find a node whose zone will be split.

3. Finally, the neighbors of the split zone must be notified so that routing can include the new node.

12

Bootstrap

The CAN has an associated DNS domain name, which resolves to the IP addresses of one or more bootstrap nodes.

A bootstrap node maintains a partial list of CAN nodes it believes are currently in the system.

To join a CAN, a new node looks up the CAN domain name in DNS to retrieve a bootstrap node's IP address.

This bootstrap node then supplies the IP addresses of several randomly chosen nodes currently in the system.
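The bootstrap node's role (keeping a partial member list and handing out a few random entries) is sketched below in Python. The DNS lookup itself is omitted, and `BootstrapNode`, `register` and `entry_points` are hypothetical names chosen for illustration.

```python
import random


class BootstrapNode:
    """Hypothetical bootstrap server keeping a partial list of current CAN members."""

    def __init__(self, known_nodes, rng=None):
        self.known_nodes = list(known_nodes)
        self.rng = rng or random.Random()

    def register(self, ip):
        # Members that join through this bootstrap node get remembered.
        if ip not in self.known_nodes:
            self.known_nodes.append(ip)

    def entry_points(self, k=3):
        """Return up to k distinct, randomly chosen member addresses for a newcomer."""
        return self.rng.sample(self.known_nodes, min(k, len(self.known_nodes)))


boot = BootstrapNode(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"], rng=random.Random(7))
print(boot.entry_points(2))
boot.register("10.0.0.9")
```

The newcomer can then send its JOIN through any of the returned addresses.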

13

Finding a zone

The new node randomly chooses a point P in the space and sends a JOIN request destined for P. The request is sent into the CAN via an existing CAN node and routed to the node whose zone contains P. That occupant node then splits its zone in half and assigns one half to the new node. Splitting follows a fixed ordering of the dimensions, so that zones can be re-merged when nodes leave.

E.g., in 2-d, a zone is split along the X coordinate first and then along the Y coordinate.
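The split step can be sketched as follows. The rule "split the longest side, ties to the lowest dimension" is one assumed way to realize a fixed ordering: it yields X, then Y, then X again for a square 2-d zone. Handing the half that contains P to the newcomer is also an assumption; the slide only says one half is assigned to it.

```python
def split_zone(bounds, point):
    """Split a zone in half; return (half kept by the occupant, half for the new node).

    bounds: tuple of (low, high) per dimension; point: the joiner's random point P,
    assumed to lie inside bounds.
    """
    extents = [hi - lo for lo, hi in bounds]
    # Longest side first, ties to the lowest index: a square 2-d zone splits
    # along X, the resulting half along Y, then X again, and so on.
    dim = extents.index(max(extents))
    lo, hi = bounds[dim]
    mid = (lo + hi) / 2
    lower = list(bounds)
    lower[dim] = (lo, mid)
    upper = list(bounds)
    upper[dim] = (mid, hi)
    # Assumption: the newcomer receives the half that contains its point P.
    if point[dim] < mid:
        return tuple(upper), tuple(lower)
    return tuple(lower), tuple(upper)


kept, new = split_zone(((0.0, 1.0), (0.0, 1.0)), (0.8, 0.3))
print(kept, new)
```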

14

Maintenance

Departure of a node

Single node failure

Multiple failures

15

Departure of a Node

The departing node explicitly hands over its zone and the associated (key, value) database to one of its neighbors.

If the zone of one of the neighbors can be merged with the departing node’s zone to produce a valid single zone, then this is done.

If not, then the zone is handed to the neighbor whose current zone is smallest, and that node will then temporarily handle both zones.
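The handover rule on this slide can be sketched as: merge with a neighbor when the union of the two zones is itself a box, otherwise give the zone to the smallest-volume neighbor. `merged_box` and `hand_over` are hypothetical; if several neighbors could merge, this sketch simply takes the first one found, which the slides do not specify.

```python
from math import prod


def merged_box(a, b):
    """Return the union of boxes a and b if that union is itself a box, else None.

    This holds when the boxes match exactly in every dimension but one and
    touch end to end in that remaining dimension. Exact float comparison is
    fine here because zone bounds come from repeated halving.
    """
    differing = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(differing) != 1:
        return None
    i = differing[0]
    (alo, ahi), (blo, bhi) = a[i], b[i]
    if ahi == blo or bhi == alo:
        merged = list(a)
        merged[i] = (min(alo, blo), max(ahi, bhi))
        return tuple(merged)
    return None


def hand_over(departing, neighbors):
    """Choose which neighbor takes over a departing node's zone.

    neighbors: node id -> zone bounds. Returns (node id, merged zone), or
    (node id, None) when that neighbor must temporarily handle two zones.
    """
    for node, zone in neighbors.items():
        union = merged_box(zone, departing)
        if union is not None:
            return node, union
    smallest = min(neighbors, key=lambda n: prod(hi - lo for lo, hi in neighbors[n]))
    return smallest, None


departing = ((0.5, 1.0), (0.0, 0.5))
neighbors = {"E": ((0.5, 1.0), (0.5, 1.0)), "D": ((0.0, 0.5), (0.0, 1.0))}
print(hand_over(departing, neighbors))
```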

16

Departure of a Node

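[Figure: 2-d coordinate space with corners (0,0), (1,0) and (0,1) showing the zones of nodes A, B, C and D, and a second view with nodes D, E and F]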

When node F fails, E will be merged with F

17

Failures

Prolonged absence of update messages from a neighbor indicates that the neighbor has failed. Each neighbor of the failed node then starts a takeover timer. When its timer expires, a node sends a TAKEOVER message conveying its own zone volume to all of the failed node's neighbors.

A node that receives a TAKEOVER message cancels its own timer if the zone volume in the message is smaller than its own zone volume.

Otherwise, it replies with its own TAKEOVER message.
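The TAKEOVER exchange can be simulated with a small event queue, as in this Python sketch. It assumes (as described in the CAN paper) that each neighbor's timer is set in proportion to its own zone volume; the scale factor and the tie-break by node id are arbitrary choices of this sketch, and `takeover_winner` is a hypothetical name. The outcome is that the neighbor with the smallest zone claims the failed zone.

```python
import heapq


def takeover_winner(volumes, seconds_per_unit_volume=100.0):
    """Simulate the TAKEOVER exchange among the neighbors of a failed node.

    volumes: neighbor id -> volume of that neighbor's own zone.
    Returns the neighbor that ends up claiming the failed zone.
    """
    # Each timer is proportional to the neighbor's own zone volume
    # (the scale factor is arbitrary), so smaller zones fire first.
    events = [(v * seconds_per_unit_volume, node) for node, v in volumes.items()]
    heapq.heapify(events)
    sent, cancelled = set(), set()
    while events:
        _, node = heapq.heappop(events)
        if node in sent or node in cancelled:
            continue
        sent.add(node)  # node sends TAKEOVER(own volume) to the other neighbors
        for other in volumes:
            if other == node or other in cancelled:
                continue
            if volumes[node] < volumes[other]:
                cancelled.add(other)  # smaller zone announced: cancel own timer
            elif other not in sent:
                heapq.heappush(events, (0.0, other))  # reply at once with own TAKEOVER
    claimants = [n for n in sent if n not in cancelled]
    # Assumption: equal volumes are broken by node id; the slides do not say.
    return min(claimants, key=lambda n: (volumes[n], n))


print(takeover_winner({"A": 0.25, "B": 0.5, "C": 0.125}))  # C
```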

18

Multiple Failure

First, a node performs an expanding ring search to find live nodes beyond the failed region.

It then rebuilds its neighbor state table so that it can take over safely.

19

Design Improvements

Multi-dimensioned coordinate spaces: increasing the dimensions of the CAN coordinate space reduces the routing path length, and hence the path latency.

More dimensions mean more neighbors per node, and therefore more routing choices, which increases routing fault tolerance.
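The trade-off can be made concrete with the figures already given on the routing slides: 2d neighbors per node and an average path length of $(d/4)\,n^{1/d}$ hops for n equal zones. The Python snippet below (with the hypothetical helper `can_tradeoff`) tabulates both for the $n = 2^{18}$ system size used in the evaluation; for d = 2 it gives (2/4) × 512 = 256 hops.

```python
def can_tradeoff(n, dims):
    """Neighbor state (2d) and average path length (d/4) * n**(1/d) for n equal zones."""
    return [(d, 2 * d, (d / 4) * n ** (1 / d)) for d in dims]


# n = 2**18 matches the system size used in the Design Review simulations.
for d, neighbors, hops in can_tradeoff(2**18, [2, 4, 8, 10]):
    print(f"d={d:2d}  neighbors per node={neighbors:2d}  avg path ~ {hops:6.1f} hops")
```

Per-node state grows only linearly in d while path length shrinks polynomially in n, which is why adding dimensions pays off.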

20

21

Design Improvements

Realities: multiple coordinate spaces. The system maintains multiple, independent coordinate spaces, and each node is assigned a different zone in each of them. Each such coordinate space is called a “reality”.

The hash table contents are replicated in every reality. To route toward a coordinate, a node checks its neighbors in all realities and forwards to the one closest to the destination, which reduces the average path length.

Multiple dimensions vs. multiple realities: multiple realities give better fault tolerance and data availability than multiple dimensions.
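Multi-reality forwarding can be sketched as a greedy choice over the union of all realities' neighbor sets. In this Python sketch, each neighbor is represented by its zone center in the corresponding reality, and distance wraps around the edges of the unit square; both are simplifying assumptions, and `next_hop_any_reality` is a hypothetical name.

```python
import math


def torus_distance(p, q):
    """Euclidean distance in the unit d-torus: every axis wraps around at 1.0."""
    return math.sqrt(sum(min(abs(x - y), 1 - abs(x - y)) ** 2 for x, y in zip(p, q)))


def next_hop_any_reality(neighbors_by_reality, dest):
    """Pick the closest neighbor to dest across all realities.

    neighbors_by_reality: reality index -> {neighbor id: zone center in that reality}.
    The key maps to the same point dest in every reality.
    Returns (reality index, neighbor id).
    """
    best = None
    for reality, neighbors in neighbors_by_reality.items():
        for node, center in neighbors.items():
            dist = torus_distance(center, dest)
            if best is None or dist < best[0]:
                best = (dist, reality, node)
    return best[1], best[2]


# Node "B" appears in both realities with a different zone in each.
realities = {
    0: {"B": (0.75, 0.25), "C": (0.25, 0.75)},
    1: {"D": (0.6, 0.6), "B": (0.1, 0.1)},
}
print(next_hop_any_reality(realities, (0.6, 0.55)))  # (1, 'D')
```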

22

Design Improvements

Overloading coordinate zones: multiple nodes are allowed to share the same zone. Nodes that share the same zone are termed peers, and MAXPEERS is the maximum number of allowable peers per zone. Benefits: reduced path length (number of hops), and hence reduced path latency; improved fault tolerance.
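The join behavior for overloaded zones can be sketched like this. Only MAXPEERS comes from the slide; the value 4, the helper name `join_zone`, and the rule for dividing members between the two halves of a full zone (sort the ids, then alternate) are assumptions made for illustration.

```python
MAXPEERS = 4  # illustrative value; MAXPEERS is a tunable system parameter


def join_zone(peers, newcomer, maxpeers=MAXPEERS):
    """Return the peer groups after `newcomer` arrives at a zone shared by `peers`.

    One group: the zone had room, so the newcomer simply becomes a peer.
    Two groups: the zone was full, so it splits in half and the members divide
    between the halves using a deterministic rule (here: sorted ids, alternating).
    """
    if len(peers) < maxpeers:
        return [peers + [newcomer]]
    members = sorted(peers + [newcomer])
    return [members[0::2], members[1::2]]


print(join_zone(["a", "b"], "c"))
print(join_zone(["n1", "n2", "n3", "n4"], "n0"))
```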

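Multiple hash functions: k hash functions map each key to k points, so each (key, value) pair is stored at k distinct nodes. The effect is almost equivalent to multiple realities.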
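Deriving k points from one key can be sketched by salting a single hash with the function index, which stands in for k independent hash functions. This construction is an assumption, not the paper's stated choice, and `key_to_points` is a hypothetical name.

```python
import hashlib


def key_to_points(key, k, d=2):
    """Map one key to k points in [0, 1)^d, one per (assumed) hash function.

    Hash function j is modeled as SHA-256 over "j/i/key" for coordinate i;
    the top 53 bits keep every coordinate exact and strictly below 1.0.
    """
    points = []
    for j in range(k):
        coords = []
        for i in range(d):
            digest = hashlib.sha256(f"{j}/{i}/{key}".encode()).digest()
            coords.append((int.from_bytes(digest[:8], "big") >> 11) / 2**53)
        points.append(tuple(coords))
    return points


# The (key, value) pair would be stored at the owner of each of these points.
for point in key_to_points("song.mp3", k=3):
    print(point)
```

A lookup can then query all k owners in parallel and use whichever answers first.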

23

Design Improvements

Topologically-sensitive construction of the CAN overlay network: each new node measures its round-trip time (RTT) to each of a set of landmarks and orders the landmarks by increasing RTT. With m landmarks, m! such orderings are possible. The coordinate space is divided into m! portions, each associated with one landmark ordering, and a new node joins the CAN at a random point in the portion of the coordinate space associated with its landmark ordering.
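Computing a node's landmark ordering, and which of the m! portions it maps to, can be sketched as follows. The RTT values are made up, and numbering the orderings lexicographically is an assumed convention; `landmark_ordering` and `ordering_bin` are hypothetical names.

```python
from itertools import permutations
from math import factorial


def landmark_ordering(rtts):
    """Order landmark ids from nearest to farthest by measured RTT (ms)."""
    return tuple(sorted(rtts, key=rtts.get))


def ordering_bin(ordering, landmarks):
    """Index in 0 .. m!-1 of `ordering` among all orderings of the m landmarks."""
    return list(permutations(sorted(landmarks))).index(tuple(ordering))


rtts = {"L1": 40.0, "L2": 12.5, "L3": 80.0}   # hypothetical RTT measurements
order = landmark_ordering(rtts)
print(order, ordering_bin(order, rtts), "of", factorial(len(rtts)))
```

Nodes that see the landmarks in the same order are likely to be topologically close, so placing them in the same portion makes overlay neighbors tend to be network neighbors too.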

24

Design Improvements

More uniform partitioning: when a JOIN request arrives, the occupant compares the volume of its zone with those of its immediate neighbors in the coordinate space, and the zone with the largest volume is split. Without the uniform partitioning feature, a little over 40% of the nodes are assigned to zones with volume V (the total volume divided by the number of nodes), compared to almost 90% with this feature, and the largest zone volume drops from 8V to 2V.

Not surprisingly, the partitioning of the space further improves with increasing dimensions.
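Choosing which zone to split can be sketched in a few lines. `zone_to_split` is a hypothetical helper; letting the occupant keep the split on ties is an assumption, since the slide does not cover ties.

```python
from math import prod


def zone_to_split(occupant, neighbors):
    """Return the id of the zone to split when a JOIN arrives at `occupant`.

    occupant: (node id, bounds); neighbors: node id -> bounds.
    bounds: tuple of (low, high) per dimension.
    The largest zone by volume wins; on a tie the occupant keeps the split.
    """
    def volume(bounds):
        return prod(hi - lo for lo, hi in bounds)

    best_id, best_vol = occupant[0], volume(occupant[1])
    for node, bounds in neighbors.items():
        vol = volume(bounds)
        if vol > best_vol:
            best_id, best_vol = node, vol
    return best_id


occupant = ("A", ((0.0, 0.5), (0.0, 0.5)))
neighbors = {"B": ((0.5, 1.0), (0.0, 1.0)), "C": ((0.0, 0.5), (0.5, 0.75))}
print(zone_to_split(occupant, neighbors))  # B
```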

Caching and Replication techniques

25

26

Design Review

The following metrics were used to evaluate system performance:

Path length: the number of (application-level) hops required to route between two points in the coordinate space.

Neighbor state: the number of CAN nodes for which an individual node must retain state.

Latency: both the end-to-end latency of the total routing path between two points in the coordinate space and the per-hop latency, i.e., the latency of individual application-level hops, obtained by dividing the end-to-end latency by the path length.

Volume: the volume of the zone to which a node is assigned that is indicative of the request and storage load a node must handle.

Routing fault tolerance: the availability of multiple paths between two points in the CAN.

Hash table availability: adequate replication of a (key,value) entry to withstand the loss of one or more replicas.

27

Design Review

The key design parameters affecting system performance are:

dimensionality of the virtual coordinate space: d

number of realities: r

number of peer nodes per zone: p

number of hash functions (i.e., number of points per reality at which a (key, value) pair is stored): k

use of the RTT-weighted routing metric

use of the uniform partitioning

Test system specification: a system size of $n = 2^{18}$ (262,144) nodes on a transit-stub topology, with a delay of 100ms on intra-transit links, 10ms on stub-transit links and 1ms on intra-stub links (i.e., 100ms on links that connect two transit nodes, 10ms on links that connect a transit node to a stub node, and so forth).

Transit-stub models explicitly group vertices into domains, and reflect that grouping in the connectivity between vertices.

28

100 node transit-stub topology

29

Bare bones: a CAN that does not utilize most of our additional design features.

Knobs-on-full: a CAN making full use of our added features (without the landmark ordering feature).

30

Related Work

Related algorithms: distance vector and link state algorithms.

These need widespread topological information; CAN, on the other hand, stores only a small amount of state per node.

Plaxton algorithm: each node has an n-bit label divided into l levels, each of width w = n/l bits. A node forwards a packet to a neighbor whose label matches the destination label in more digits.
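Digit-by-digit forwarding in the Plaxton scheme can be sketched as choosing a neighbor that shares a longer run of digits with the destination. Whether digits are matched from the front or from the back of the label is a convention; this sketch uses the leading digits, and `shared_digits` and `plaxton_next_hop` are hypothetical names.

```python
def shared_digits(a, b):
    """Number of leading digits that labels a and b have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def plaxton_next_hop(current, neighbors, dest):
    """Forward to the neighbor matching dest in the most leading digits.

    Returns None when no neighbor matches more digits than `current` does.
    Labels are strings of base-2^w digits, e.g. base 4 for w = 2.
    """
    here = shared_digits(current, dest)
    best = max(neighbors, key=lambda n: shared_digits(n, dest), default=None)
    if best is None or shared_digits(best, dest) <= here:
        return None
    return best


print(plaxton_next_hop("0123", ["0130", "0200", "3123"], "0132"))  # 0130
```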

31

Related Work

Algorithms with geographic routing: ‘space’ in these algorithms refers to physical space. There is no neighbor search problem, and correctly mimicking the space is trivial, but the approach does not extend to multiple dimensions.

32

Related System

Domain Name System: stores (domain name, IP address) pairs.

OceanStore: provides continuous access to persistent information; uses Plaxton's algorithm.

Peer-to-peer file sharing systems: Freenet nodes store keys (analogous to URLs), the addresses of other nodes, and the data corresponding to keys.

33

Discussion

Addresses two key problems in the design of Content-Addressable Networks: scalable routing and indexing.

Simulation results validate the scalability of our overall design: for a CAN with over 260,000 nodes, we can route with a latency that is less than twice the IP path latency.

Future work: a secure CAN, and keyword searching.