Post on 21-Dec-2015
Self-organized fault-tolerant routing in P2P overlays
Wojciech Galuba, Karl Aberer, EPFL, Switzerland
Zoran Despotovic, Wolfgang Kellerer, Docomo Euro-Labs, Munich, Germany
© 2009 EPFL, Docomo Euro-Labs
What are P2P overlays?
Underlying blue network (e.g. TCP/IP)
Red peers come and go
Peers form an overlay network (red links)
Routing in P2P overlays
Overlays (usually) have their own address space
Goal: provide point-to-point connectivity
  or rather point-to-service connectivity...
[Figure: overlay route from source to destination]
What is the problem?
Failures in large-scale systems are the norm, not the exception
Permanent failures: well understood
  Overlay maintenance algorithms
Intermittent failures
  Transient network connectivity problems
  Peer overload, resource exhaustion
  Cannot be addressed in the same way as permanent failures
Existing solutions - multipath
Multiple paths
Goal: at least one path reaches the destination
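To make the redundancy trade-off concrete, the sketch below estimates the chance that at least one of several paths delivers a message. It assumes, purely for illustration, that every hop forwards independently with the same probability and that paths share no peers; the function name and its parameters are hypothetical and do not come from the talk.

```python
def multipath_success(p_hop: float, hops: int, paths: int) -> float:
    """Probability that at least one of `paths` independent paths delivers.

    Assumption (not from the slides): every hop forwards independently with
    probability p_hop, and the paths do not share any peers.
    """
    p_path = p_hop ** hops                 # a path succeeds only if all hops do
    return 1.0 - (1.0 - p_path) ** paths   # complement of "every path fails"


if __name__ == "__main__":
    # Each extra path raises the delivery probability but also sends one
    # more copy of the message through the overlay.
    for k in (1, 2, 4):
        print(k, round(multipath_success(0.9, 5, k), 3))
```

Under these assumptions the gain from each additional path shrinks while the bandwidth cost grows linearly with the number of paths.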
[Figure: multipath routing from source to destination; the legend marks lossy peers]
Existing solutions – iterative routing
Source controls the routing process
Successively asks nodes for their neighbors
High redundancy: if one node fails, use others
[Figure: iterative routing from source to destination; the legend marks lossy peers]
Existing solutions - problems
Heavily rely on message redundancy
  High bandwidth cost
Do not learn from failures
  Likely to repeat the same routing mistakes
Forward feedback protocol (FFP)
[Figure: requestor and provider exchanging Request, Service, and Feedback messages (steps 1-3)]
Requestor determines the quality of the provided service
  Decision is binary: good or bad
Feedback follows the same path as the request
Feedback is obligatory: no feedback = bad feedback
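The "no feedback = bad feedback" rule can be sketched as a small table of pending requests kept by each peer on the path. This is only an illustration: the class, its method names, and the explicit `now` timestamps are hypothetical, and the talk does not say how timeouts are tracked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

GOOD, BAD = True, False


@dataclass
class PendingFeedback:
    """Tracks forwarded requests that still await feedback (hypothetical helper)."""

    timeout: float  # seconds to wait for feedback; the value is an assumption
    deadlines: Dict[str, float] = field(default_factory=dict)

    def on_forward(self, request_id: str, now: float) -> None:
        # Remember when feedback for this forwarded request is due.
        self.deadlines[request_id] = now + self.timeout

    def on_feedback(self, request_id: str, good: bool, now: float) -> Optional[bool]:
        # Returns the verdict to record, or None if the request is unknown
        # (for example because it already expired and was counted as bad).
        deadline = self.deadlines.pop(request_id, None)
        if deadline is None:
            return None
        return good if now <= deadline else BAD

    def expire(self, now: float) -> List[Tuple[str, bool]]:
        # Requests whose feedback never arrived: no feedback = bad feedback.
        late = [rid for rid, due in self.deadlines.items() if now > due]
        for rid in late:
            del self.deadlines[rid]
        return [(rid, BAD) for rid in late]
```

A peer would call `expire` periodically and treat every returned entry exactly like explicit negative feedback.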
A peer on the path
Knows only its overlay neighbors
Based on feedback, learns which neighbors are reliable
Associates a success estimator with each (j, dz) pair:
  j – neighbor address
  dz – destination zone
A success estimator is an exponentially averaged success rate in [0, 1]
  Initially 0.5
  Increased on positive feedback
  Decreased on negative feedback or feedback timeout
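One common way to realize an exponentially averaged success rate is an exponential moving average. The sketch below assumes that form; the smoothing factor `alpha` and the class name are illustrative choices, since the slides give only the initial value 0.5 and the direction of each update.

```python
class SuccessEstimator:
    """Exponentially averaged success rate for one (j, dz) pair.

    Assumption: the update rule value += alpha * (outcome - value), with
    outcome 1 for positive feedback and 0 for negative feedback or timeout.
    """

    def __init__(self, alpha: float = 0.2, initial: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = initial  # the slides state an initial value of 0.5

    def update(self, success: bool) -> float:
        # Move the estimate a fraction alpha toward the observed outcome,
        # so it rises on success, falls on failure, and stays in [0, 1].
        outcome = 1.0 if success else 0.0
        self.value += self.alpha * (outcome - self.value)
        return self.value
```

A larger `alpha` forgets history faster, which reacts sooner to a neighbor that starts failing but also makes the estimate noisier.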
[Figure: a peer on the path between ph and nh]
Next hop selection
Based on the state of the success estimators
Pick the neighbor j whose success estimator currently has the highest value
  i.e. maximize the probability of success based on performance history
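The selection rule amounts to an argmax over the estimators for the request's destination zone. In this minimal sketch the estimator table is a plain dictionary keyed by (j, dz); that layout, the function name, and the default of 0.5 for pairs not yet seen are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple


def select_next_hop(
    estimators: Dict[Tuple[Hashable, int], float],
    neighbors: Iterable[Hashable],
    dest_zone: int,
    initial: float = 0.5,
) -> Hashable:
    """Return the neighbor j with the highest estimator for (j, dest_zone)."""
    candidates = list(neighbors)
    if not candidates:
        raise ValueError("no neighbors to forward to")
    # Pairs without feedback yet fall back to the initial estimate.
    # On ties, max() keeps the first candidate in iteration order.
    return max(candidates, key=lambda j: estimators.get((j, dest_zone), initial))


if __name__ == "__main__":
    table = {("nh1", 2): 0.9, ("nh2", 2): 0.6, ("nh1", 3): 0.1}
    print(select_next_hop(table, ["nh1", "nh2"], dest_zone=2))
    print(select_next_hop(table, ["nh1", "nh2"], dest_zone=3))
```

Because estimators are kept per destination zone, the same neighbor can be preferred for one zone and avoided for another.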
The FFP protocol in action
[Figure: a peer between ph and two candidate next hops, nh1 and nh2, with + and - marking positive and negative feedback]
nh1 has a history of success but starts failing
The peer switches to nh2
Cumulative effect
The root cause of the failure receives the most negative feedback
The links to the faulty peer are avoided by its neighbors
[Figure: overlay with the lossy peer marked]
Scalability through destination zoning
O(log N) zones and O(log N) neighbors
Total state at each node: O(log^2 N)
[Figure: destination zones 0, 1, 2, 3. Labels: increasing overlay distance to destination; increasing destination zone number; exponentially decreasing zone size]
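One way to obtain O(log N) zones whose sizes change exponentially is to bucket destinations by the logarithm of their overlay distance. The sketch below assumes a Chord-like ring of 2^id_bits identifiers with clockwise distance and numbers zones by floor(log2(distance)); the function name and this numbering are assumptions and may not match the numbering used in the talk's figure.

```python
def dest_zone(own_id: int, dest_id: int, id_bits: int) -> int:
    """Map a destination to a zone index in the range 0 .. id_bits - 1.

    Assumption: clockwise ring distance, and zone k covers distances in
    [2**k, 2**(k + 1)), so each zone is twice as large as the previous one.
    """
    ring = 1 << id_bits
    distance = (dest_id - own_id) % ring
    if distance == 0:
        raise ValueError("destination is the local peer")
    return distance.bit_length() - 1  # floor(log2(distance))


if __name__ == "__main__":
    # With 4-bit identifiers there are 4 zones in total.
    print([dest_zone(0, d, 4) for d in range(1, 16)])
```

With id_bits zones and O(log N) neighbors, a table holding one estimator per (neighbor, zone) pair stays within the O(log^2 N) state bound.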
Evaluation
PlanetLab – a planetary-scale testbed
  350 peers
  Conditions:
    Median system load: 5.3
    Unpredictable delays and loss are "natural" on PlanetLab
Challenge:
  Introduce loss and delays in a Chord-like DHT
  Place a tight 3 s timeout on service requests
  See if protocols can route around faulty peers
Workload: multi-source, multi-destination
The line-up
BASE – baseline, no fault-tolerance mechanisms
MULTI4 – 4-way multipath routing
ITER4 – Kademlia-based iterative routing, 4 parallel RPCs
FFP – forward feedback protocol
Every 5 mins: a new 10% of peers become droppers
Droppers drop all requests
Every 5 mins: a new 10% of peers become delayers
Delayers delay all messages by 100-2000ms
25% of droppers arrive at 300 s
Convergence time depends on the traffic pattern
Topology-oblivious routing
Starts with all success estimators = 0.5
  Empty routing tables
Learn by trial and error
  Which neighbors are good forwarders for which destinations
Routing tables are entirely emergent
  Initially random walks converge to reliable routes
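When every estimator starts at 0.5, all neighbors are tied, so the initial routes depend on how ties are broken. The sketch below assumes ties are broken uniformly at random, which produces random-walk behavior at first and then follows whichever neighbor pulls ahead; the function name and this tie-breaking rule are assumptions, not details given in the talk.

```python
import random
from typing import Dict


def pick_with_random_ties(values: Dict[str, float], rng: random.Random) -> str:
    """Pick the neighbor with the highest estimate, breaking ties at random."""
    best = max(values.values())
    tied = [j for j, v in values.items() if v == best]
    return rng.choice(tied)


if __name__ == "__main__":
    rng = random.Random(0)
    fresh = {"a": 0.5, "b": 0.5, "c": 0.5}   # nothing learned yet
    print({pick_with_random_ties(fresh, rng) for _ in range(200)})
    learned = {"a": 0.5, "b": 0.9, "c": 0.2}  # after some feedback
    print(pick_with_random_ties(learned, rng))
```

Once feedback separates the estimates, the choice becomes deterministic, which is how the initially random forwarding can settle on stable routes.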
Warmup: initially use the original Chord routing tables
After some time, switch to FFP routing tables
Summary
FFP uses 2-5 times less bandwidth than MULTI and ITER
Same or higher fault tolerance
More suitable for workloads that are high-rate with fewer source-destination pairs
Benefits of the self-organized approach
Decentralized
  Scalability
Topology-oblivious
  Applicable to many networks
Agnostic to the causes of failures
  Robust to many failure scenarios
  Even those it was not designed for