Scalable Content-Addressable Network
Lintao Liu 2001.11.19
System Goals
CAN: A distributed infrastructure that provides hash table-like functionality on Internet-like scales.
Scalable
Fault-tolerant
Self-organizing
Basic Design
Basic idea: a virtual d-dimensional coordinate space
Each node owns a zone in the virtual space.
Data is stored as (key, value) pairs.
Hash(key) --> a point P in the virtual space
The (key, value) pair is stored on the node whose zone contains the point P.
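A minimal sketch of the mapping above, assuming a 2-d unit square split into three hand-picked zones. The names `Zone`, `key_to_point`, `owner_of` and the SHA-256 based hashing scheme are illustrative assumptions, not something the slides specify.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    lo: tuple  # inclusive lower corner
    hi: tuple  # exclusive upper corner

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))


def key_to_point(key, d=2):
    """Hash a key to a point in [0, 1)^d (hypothetical scheme, d <= 8)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Four digest bytes per coordinate, scaled into [0, 1).
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2**32
        for i in range(d)
    )


# Hypothetical zone layout: the three zones tile the unit square exactly.
ZONES = {
    "A": Zone((0.0, 0.0), (0.5, 1.0)),
    "B": Zone((0.5, 0.0), (1.0, 0.5)),
    "C": Zone((0.5, 0.5), (1.0, 1.0)),
}


def owner_of(p):
    """Return the name of the node whose zone contains point p."""
    return next(name for name, zone in ZONES.items() if zone.contains(p))


point = key_to_point("song.mp3")
print(point, "is stored on node", owner_of(point))
```

Half-open zones (inclusive lower corner, exclusive upper corner) are a design choice here so that every point belongs to exactly one zone.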
Basic Design
For routing purposes, each node only needs to maintain information about the nodes whose coordinate zones adjoin its own zone (its neighbors).
Basic Design
Routing: greedy algorithm
if P is within the zone of the current node,
    return (key, value), or failure if there is no such key
else
    forward the query to the neighbor whose coordinates are closest to P
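The greedy rule can be sketched as follows, under simplifying assumptions that are not in the slides: a 4 x 4 space cut into 16 unit-square zones, nodes named by the lower-left corner of their zone, no wrap-around at the edges, and "closest" measured from each neighbor's zone center. The function names are hypothetical.

```python
import math

SIDE = 4  # the space is [0, 4) x [0, 4), one unit-square zone per node


def contains(node, p):
    """True if point p lies in the unit-square zone owned by node."""
    i, j = node
    return i <= p[0] < i + 1 and j <= p[1] < j + 1


def center(node):
    return (node[0] + 0.5, node[1] + 0.5)


def neighbors(node):
    """Nodes whose zones share an edge with this node's zone."""
    i, j = node
    candidates = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < SIDE and 0 <= b < SIDE]


def route(start, p):
    """Greedy forwarding: hop to the neighbor closest to p until p is local."""
    node, path = start, [start]
    while not contains(node, p):
        node = min(neighbors(node), key=lambda n: math.dist(center(n), p))
        path.append(node)
    return path


def lookup(start, p, key, store):
    """Return (value or None, path); None models the 'failure' reply."""
    path = route(start, p)
    return store.get(path[-1], {}).get(key), path


store = {(3, 3): {"song.mp3": "10.0.0.7"}}
value, path = lookup((0, 0), (3.2, 3.7), "song.mp3", store)
print(value, "after", len(path) - 1, "hops")
```

Each node only consults its own neighbor list, which is the point of the design: no node needs a global view of the zones.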
Example: Routing
[Figure: routing example in a 2-d coordinate space with corners (0, 0), (4, 0), (0, 4) and (4, 4)]
Basic Design
Node Insertion
A new node N1 is going to join the network:
1. Find a node N2 already in the CAN
2. Randomly choose a point P in the space
3. Send a JOIN request destined for P (P resides in the zone of N3)
4. N3 splits its zone, assigns half of it to N1, and sends the (key, value) pairs from that half to N1
Basic Design
5. N1 also gets its neighbor information from N3
6. N3 notifies all its neighbors of the reallocation of space
7. The neighbors update their corresponding data
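Step 4 of the insertion, the zone split and data handover, can be sketched as below. This is an illustration under assumptions: zones are (lower corner, upper corner) tuples, stored data is keyed by its hashed point, and the split dimension `dim` is passed in by the caller because the slides do not say how it is chosen.

```python
def split_zone(lo, hi, dim):
    """Halve the zone [lo, hi) along dimension dim; return (kept, given)."""
    mid = (lo[dim] + hi[dim]) / 2
    kept_hi = hi[:dim] + (mid,) + hi[dim + 1:]
    given_lo = lo[:dim] + (mid,) + lo[dim + 1:]
    return (lo, kept_hi), (given_lo, hi)


def in_zone(zone, p):
    lo, hi = zone
    return all(l <= x < h for l, x, h in zip(lo, p, hi))


def join(n3_zone, n3_data, dim=0):
    """N3 keeps one half and hands the other half, with its pairs, to N1.

    n3_data maps a hashed point to its (key, value) pair.
    Returns (n3_zone, n3_data, n1_zone, n1_data) after the split.
    """
    kept, given = split_zone(n3_zone[0], n3_zone[1], dim)
    n1_data = {p: kv for p, kv in n3_data.items() if in_zone(given, p)}
    remaining = {p: kv for p, kv in n3_data.items() if in_zone(kept, p)}
    return kept, remaining, given, n1_data


zone = ((0.0, 0.0), (1.0, 1.0))
data = {(0.2, 0.3): ("a", 1), (0.7, 0.9): ("b", 2)}
n3_zone, n3_data, n1_zone, n1_data = join(zone, data)
print("N3 keeps", n3_zone, "and N1 receives", n1_zone, "with", n1_data)
```

Steps 5 to 7 (neighbor lists and notifications) are left out of the sketch; they only touch the nodes adjacent to the split zone.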
Example: Insertion
Basic Design
Node departure
1. Explicit departure
The departing node hands over its zone to a neighbor whose zone can merge with it into a valid single zone; otherwise, the zone goes to the neighbor with the smallest zone.
Basic Design
2. Node failure
Neighbors exchange periodic update messages.
Prolonged absence of an update message from a neighbor indicates its failure.
A takeover mechanism merges the failed node's zone with the smallest adjacent zone.
A background zone-reassignment algorithm also smooths the zone allocation.
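A rough sketch of the failure handling described above, assuming each node records the time it last heard from every neighbor. The timeout value, the function names and the zone representation are hypothetical; the slides only state that a prolonged silence signals failure and that the smallest adjacent zone takes over.

```python
from math import prod


def volume(zone):
    """Volume of a zone given as (lower corner, upper corner)."""
    lo, hi = zone
    return prod(h - l for l, h in zip(lo, hi))


def failed_neighbors(last_heard, now, timeout):
    """Neighbors whose last update message is older than the timeout."""
    return sorted(n for n, t in last_heard.items() if now - t > timeout)


def takeover_node(adjacent_zones):
    """Pick the neighbor of the failed node that owns the smallest zone."""
    return min(adjacent_zones, key=lambda n: volume(adjacent_zones[n]))


last_heard = {"B": 100.0, "C": 91.0, "D": 99.5}  # seconds, hypothetical
print(failed_neighbors(last_heard, now=101.0, timeout=5.0))

zones_next_to_c = {
    "B": ((0.0, 0.0), (0.5, 1.0)),
    "D": ((0.5, 0.0), (0.75, 0.5)),
}
print(takeover_node(zones_next_to_c), "takes over the failed zone")
```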
Example: Node Departure
Problems in Basic Design
Scalable?
Fault-tolerant?
Self-organizing?
Data durable?
Efficient?
Some others?
Design Improvements
Guidelines:
Reduce path latency
Increase fault tolerance
Increase data availability
Without adding much complexity
Design Improvements
Multi-dimensional coordinate spaces
-- reduce average path length
Increasing the dimensions of the virtual space => shorter routing paths => lower path latency
(at the cost of a larger routing table, since each node has more neighbors)
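The trade-off can be made concrete with the average path length estimate quoted in the conclusions of these slides, (d/4)(n^{1/d}) hops. The neighbor count of 2d per node is an added assumption that holds when the space is partitioned into equal zones; the function names are illustrative.

```python
def avg_path_length(n, d):
    """Average routing path length in hops for n nodes in d dimensions."""
    return (d / 4) * n ** (1 / d)


def neighbor_count(d):
    """Routing table size, assuming equal zones (two neighbors per dimension)."""
    return 2 * d


n = 4096
for d in (2, 3, 4):
    hops = avg_path_length(n, d)
    print(f"d={d}: about {hops:.1f} hops, {neighbor_count(d)} neighbors")
```

For a fixed number of nodes, each extra dimension adds two routing table entries while the hop count shrinks.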
Design Improvements
Multiple realities: multiple coordinate spaces
-- improve data availability
-- improve routing fault tolerance
-- reduce the average path length
Multiple coordinate spaces exist at the same time.
Each space is called a “reality”.
Each node occupies a zone in each reality.
Design Improvements
Better CAN routing metrics
-- reduce per-hop latency
When there is more than one choice for forwarding, choose the neighbor with the lowest RTT.
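One way to read this rule, sketched under assumptions: a neighbor counts as a "choice" if it is closer to the destination P than the current node, and among those the one with the lowest measured RTT wins. Nodes are represented by hypothetical zone-center coordinates, and the RTT table is made-up sample data.

```python
import math


def next_hop(current, neighbors, rtt_ms, p):
    """Among neighbors that make progress toward p, pick the lowest RTT.

    Returns None if no neighbor is closer to p than the current node.
    """
    here = math.dist(current, p)
    closer = [n for n in neighbors if math.dist(n, p) < here]
    if not closer:
        return None
    return min(closer, key=lambda n: rtt_ms[n])


current = (1.5, 1.5)
p = (3.5, 3.5)
rtt_ms = {
    (0.5, 1.5): 5,   # fast, but moves away from p
    (1.5, 0.5): 10,  # fast, but moves away from p
    (2.5, 1.5): 80,  # makes progress, slow link
    (1.5, 2.5): 20,  # makes progress, fast link
}
print(next_hop(current, list(rtt_ms), rtt_ms, p))
```

Restricting the choice to neighbors that make progress keeps the greedy routing guarantee while still preferring cheap links.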
Design Improvements
Overloading coordinate zones
-- reduce average path length
-- reduce per-hop latency
-- improve fault tolerance
Assign more than one node to share the same zone.
Design Improvements
Topologically-sensitive construction of CAN
-- reduce the path latency
A set of machines acts as landmarks on the Internet.
Each node measures its RTT to each of these landmarks and orders them by increasing RTT.
Physically close nodes are likely to have the same ordering and will consequently reside in adjacent zones of the coordinate space.
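The landmark ordering can be sketched as below. The landmark names, node names and RTT values are invented sample data, and grouping nodes by identical ordering is only the first half of the scheme; how each ordering maps to a region of the coordinate space is not shown.

```python
from collections import defaultdict


def landmark_ordering(rtt_ms):
    """Order landmark names by increasing measured RTT."""
    return tuple(sorted(rtt_ms, key=rtt_ms.get))


def group_by_ordering(measurements):
    """Group nodes that share the same landmark ordering."""
    bins = defaultdict(list)
    for node, rtt_ms in measurements.items():
        bins[landmark_ordering(rtt_ms)].append(node)
    return dict(bins)


measurements = {
    "n1": {"L1": 10, "L2": 50, "L3": 30},
    "n2": {"L1": 12, "L2": 55, "L3": 33},  # similar RTT profile to n1
    "n3": {"L1": 90, "L2": 15, "L3": 40},
}
for ordering, nodes in group_by_ordering(measurements).items():
    print(ordering, nodes)
```

With m landmarks there are m! possible orderings, so a handful of landmarks already separates nodes into many groups.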
Other Design Improvements
Multiple hash functions
-- increase data availability
-- reduce query latency
Uniform partitioning
-- achieve load balance
Caching and replication
-- increase data availability
-- reduce query latency
-- achieve load balance
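A sketch of the multiple hash function idea, assuming k independent hash functions are emulated by salting SHA-256 with an index. Each key then maps to k points, so the pair can be stored in k zones and a query can target the nearest one. The salting scheme and the function names are illustrative assumptions.

```python
import hashlib
import math


def points_for_key(key, k=3, d=2):
    """Map one key to k points in [0, 1)^d, one per salted hash function."""
    points = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        points.append(tuple(
            int.from_bytes(digest[4 * j:4 * j + 4], "big") / 2**32
            for j in range(d)
        ))
    return points


def nearest_replica(points, origin):
    """Choose the replica point closest to the querying node's coordinates."""
    return min(points, key=lambda p: math.dist(p, origin))


replicas = points_for_key("song.mp3")
print(nearest_replica(replicas, origin=(0.1, 0.1)))
```

If one of the k owners fails, the other points still resolve the key, which is where the availability gain comes from.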
Comparisons
Scalability?
Construction & storage overhead?
Query efficiency?
Fault tolerance?
Complexity?
Gnutella, FreeNet, Past, Chord
Conclusions
CAN provides scalable routing and efficient indexing.
Given a key, it returns the (key, value) pair with an average path length of (d/4)(n^{1/d}) hops, or returns failure if there is no such (key, value) pair.
CAN is completely self-organizing, fault-tolerant and resistant to DoS attacks.