intel research timothy roscoe p2: implementing declarative overlays timothy roscoe boon thau loo,...
Post on 26-Mar-2015
Intel Research, Timothy Roscoe
P2: Implementing Declarative Overlays
Timothy Roscoe, Boon Thau Loo, Tyson Condie, David Gay, Joseph M. Hellerstein, Petros Maniatis, Ion Stoica
Intel Research at Berkeley, UC Berkeley
Overlays: a broad view
“Overlay”: the routing and message forwarding component of any non-trivial distributed system
[Figure labels: Internet, Overlay]
Overlays Everywhere…
Many examples:
Internet routing, multicast
Content delivery, file sharing, DHTs, Google
Microsoft Exchange
Tibco (technology interoperation)
Overlays are a fundamental tool for repurposing communication infrastructures
Get a bunch of friends together and build your own ISP (Internet evolvability)
You don’t like Internet routing? Make up your own rules (RON)
Paranoid? Run Freenet
Intrusion detection with friends (DDI, Polygraph)
Have your assets discover each other (iAMT)
[Figure labels: Internet, Overlay]
Distributed systems innovation needs overlays
If only it weren’t so hard
In theory
Figure out right properties
Get the algorithms and protocols
Implement them
Tune them
Test them
Debug them
Repeat
But in practice
No global view
Wrong choice of algorithms
Incorrect implementation
Pathological timeouts
Partial failures
Impaired introspection
Homicidal boredom
Next to no debug support
It’s hard enough as it is
Do I also need to reinvent the wheel every time?
Our Goal
Make network development more accessible to developers of distributed applications
Specify network at a high level
Automatically translate specification into executable
Hide everything they don’t want to touch
Enjoy performance that is good enough
Do for networked systems what SQL and the relational model did for databases
The argument:
The set of routing tables in a network represents a distributed data structure
The data structure is characterized by a set of ideal properties which define the network
Thinking in terms of structure, not protocol
Routing is the process of maintaining these properties in the face of changing ground facts
Failures, topology changes, load, policy…
Routing as Query Processing
In database terms, the routing table is a view over changing network conditions and state
Maintaining it is the domain of distributed continuous query processing
Not merely an analogy: we have implemented a general routing protocol engine as a query processor.
Two directions
1. Declarative expression of Internet routing protocols
• Loo et al., ACM SIGCOMM 2005
2. Declarative implementation of overlay networks
• Loo et al., ACM SOSP 2005
• The focus of this talk (and my work)
P2: A Declarative Overlay Engine
Distributed state
Distributed soft state in relational tables, holding tuples of values
route(S, D, H)
Non-stored information passes around as event tuple streams
message(X, D)
Overlay specification in declarative logic language (OverLog)
<head> :- <precondition1>, <precondition2>, … , <preconditionN>.
Location specifiers @X place individual tuples at specific nodes
message@H(H, D) :- route@S(S, D, H), message@S(S, D).
[Figure: route tuples (a, x, c), (a, z, f), (a, z, t); the event message@a(a, z) yields message@f(f, z) and message@t(t, z)]
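The forwarding rule above can be read as a join between the stored route table and the incoming message events. The following is a minimal Python sketch of that reading, not P2's actual evaluator: the names Route, Message and forward are hypothetical, and the location specifier is modelled simply as the first field of each tuple.

```python
from collections import namedtuple

# Hypothetical Python stand-ins for the OverLog tuples on the slide.
Route = namedtuple("Route", ["src", "dst", "next_hop"])   # route(S, D, H)
Message = namedtuple("Message", ["node", "dst"])          # message(X, D)


def forward(routes, messages):
    """Apply message@H(H, D) :- route@S(S, D, H), message@S(S, D)."""
    derived = []
    for msg in messages:
        for route in routes:
            # Join on the shared variables S (current node) and D (destination).
            if route.src == msg.node and route.dst == msg.dst:
                # The head places the new message tuple at the next hop H.
                derived.append(Message(route.next_hop, msg.dst))
    return derived


routes = [Route("a", "x", "c"), Route("a", "z", "f"), Route("a", "z", "t")]
out = forward(routes, [Message("a", "z")])
print(out)
```

With the three route tuples from the slide, the single event message@a(a, z) matches the two routes towards z and produces one message at f and one at t; the route towards x is ignored.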
P2 Dataflow
OverLog automatically translated to dataflow graph
C++ dataflow elements (similar to Click elements)
Implements:
relational operators (joins, selections, projections)
flow operators (multiplexers, demultiplexers, queues)
network operators (congestion control, retry, rate limits)
Interlinked via asynchronous push or pull typed flows
Engine executes dataflow graph at runtime
A distributed query processor to maintain overlays
[Figure: dataflow graph with a demux element]
Example: Ring Routing
Every node has an address (e.g., IP address) and an identifier (large random)
Every object has an identifier
Order nodes and objects into a ring by their identifiers
Objects “served” by their successor node
Every node knows its successor on the ring
To find object K, walk around the ring until I locate K’s immediate successor node
[Figure: ring of node and object identifiers]
Example: Ring Routing
How do I find the responsible node for a given key k?
n.lookup(k)
  if k in (n, n.successor]
    return n.successor
  else
    return n.successor.lookup(k)
[Figure: ring of node and object identifiers]
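The lookup pseudocode above can be tried out directly. The following is a small Python sketch, assuming integer identifiers and a half-open ring interval (n, n.successor] that wraps past the largest identifier. The Node class, the build_ring helper and the identifiers used are illustrative choices, not taken from the slides.

```python
def in_half_open(k, lo, hi):
    """True if k lies in the ring interval (lo, hi], allowing wrap-around."""
    if lo < hi:
        return lo < k <= hi
    # The interval wraps past the top of the identifier space.
    # lo == hi means a single-node ring, which owns every key.
    return k > lo or k <= hi


class Node:
    def __init__(self, ident):
        self.ident = ident
        self.successor = None

    def lookup(self, k):
        # Same structure as the pseudocode: answer locally or pass it on.
        if in_half_open(k, self.ident, self.successor.ident):
            return self.successor
        return self.successor.lookup(k)


def build_ring(idents):
    """Link nodes into a ring ordered by identifier (hypothetical helper)."""
    nodes = [Node(i) for i in sorted(idents)]
    for i, node in enumerate(nodes):
        node.successor = nodes[(i + 1) % len(nodes)]
    return nodes


ring = build_ring([3, 15, 28, 40, 58])
start = ring[0]
print(start.lookup(37).ident, start.lookup(60).ident)
```

Key 37 falls between nodes 28 and 40, so node 40 is responsible. Key 60 lies past the largest identifier, so the lookup wraps around and ends at node 3.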
Ring State
Node state tuples
node(NAddr, N)
succ(NAddr, Succ, SAddr)
Transient event tuples
lookup(NAddr, Req, K)
Pseudocode to OverLog
n.lookup(k)
  if k in (n, n.successor]
    return n.successor
  else
    return n.successor.lookup(k)
Node state tuples
node(NAddr, N)
succ(NAddr, Succ, SAddr)
Transient event tuples
lookup(NAddr, Req, K)
response@Req(Req, K, SAddr) :-
  lookup@NAddr(NAddr, Req, K),
  node(NAddr, N),
  succ(NAddr, Succ, SAddr),
  K in (N, Succ].
lookup@SAddr(SAddr, Req, K) :-
  lookup@NAddr(NAddr, Req, K),
  node(NAddr, N),
  succ(NAddr, Succ, SAddr),
  K not in (N, Succ].
Implementation: From OverLog to Dataflow
Traditional problem in databases
Turn logic into relational algebra
Joins, projections, selections, aggregations, etc.
From OverLog to Dataflow
R1 response@R(R, K, SI) :- lookup@NI(NI, R, K), node@NI(NI, N), succ@NI(NI, S, SI), K in (N, S].
R2 lookup@SI(SI, R, K) :- lookup@NI(NI, R, K), node@NI(NI, N), succ@NI(NI, S, SI), K not in (N, S].
[Figure, built up one element at a time over slides 18 to 25: one dataflow strand per rule, each fed by the lookup event stream and reading the tables node and succ]
Strand for R1:
lookup
Join lookup.NI == node.NI, giving (NI, R, K, N)
Join lookup.NI == succ.NI, giving (NI, R, K, N, S, SI)
Select K in (N, S]
Project response@R(R, K, SI)
Strand for R2:
lookup
Join lookup.NI == node.NI, giving (NI, R, K, N)
Join lookup.NI == succ.NI, giving (NI, R, K, N, S, SI)
Select K not in (N, S], i.e. K in (S, N] on the ring
Project lookup@SI(SI, R, K)
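The two rule strands can be mimicked with ordinary Python generators to make the join, select and project stages concrete. This is only a sketch under simplifying assumptions: tuples are plain dicts keyed by the OverLog variable names, everything runs at a single node, and the helper names (join, select, project, run_strands) and the sample addresses are hypothetical. P2 itself builds these strands from C++ dataflow elements.

```python
def ring_between(k, lo, hi):
    """True if k is in the ring interval (lo, hi], allowing wrap-around."""
    if lo < hi:
        return lo < k <= hi
    return k > lo or k <= hi


def join(stream, table, key):
    """Extend each incoming tuple with every stored row that matches on `key`."""
    for tup in stream:
        for row in table:
            if row[key] == tup[key]:
                yield {**tup, **row}


def select(stream, predicate):
    """Pass through only the tuples that satisfy `predicate`."""
    return (tup for tup in stream if predicate(tup))


def project(stream, name, location, fields):
    """Emit (tuple name, destination address, values) for each tuple."""
    for tup in stream:
        yield (name, tup[location], tuple(tup[f] for f in fields))


def run_strands(lookups, node, succ):
    """Run the R1 and R2 strands over a batch of lookup events at one node."""
    lookups = list(lookups)

    def joined():
        # Both strands start with the same two joins against local tables.
        return join(join(lookups, node, "NI"), succ, "NI")

    def in_range(t):
        return ring_between(t["K"], t["N"], t["S"])

    r1 = project(select(joined(), in_range), "response", "R", ("R", "K", "SI"))
    r2 = project(select(joined(), lambda t: not in_range(t)),
                 "lookup", "SI", ("SI", "R", "K"))
    return list(r1) + list(r2)


node = [{"NI": "n28", "N": 28}]
succ = [{"NI": "n28", "S": 40, "SI": "n40"}]
events = [{"NI": "n28", "R": "req", "K": 37},
          {"NI": "n28", "R": "req", "K": 50}]
out = run_strands(events, node, succ)
print(out)
```

The lookup for key 37 falls in (28, 40], so strand R1 produces a response addressed to the requester. The lookup for key 50 does not, so strand R2 re-emits it as a lookup addressed to the successor.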
From OverLog to Dataflow
One rule strand per OverLog rule
Rule order is immaterial
Rule strands could execute in parallel
[Figure: lookup events feed Rule R1 and Rule R2, which read the tables node and succ; the outputs are response and lookup]
From OverLog to Dataflow
[Figure: complete dataflow graph. Elements shown: UDP Rx, CC Rx, Queue, Demux, rule strands R1 and R2 with tables node and succ, Sched, Queue, CC Tx, UDP Tx]
It actually works.
For instance, we implemented Chord in P2
Popular distributed hash table
Complex overlay
Dynamic maintenance
How do we know it works?
Same high-level properties
Logarithmic overlay diameter
Logarithmic state size
Consistent routing with churn
Comparable performance to hand-coded implementations
[Figure: dataflow graph for ping rules RM1 to RM4 over the tuples pingNodes, pingEvent, pingReq, pingResp and lastPing. RM1 generates pingEvent(local) every ping_interval; RM2 joins pingEvent.X with pingNodes.X and projects pingReq(X, Y); RM3 projects pingResp(Y, X); RM4 joins pingResp.X with pingNodes.X, selects pingResp.Y = pingNodes.Y and projects lastPing(X, Y, now). Surrounding elements: Network In, Demux (tuple name), Queue, TimedPullPush, Slot, RoundRobin, Mux, Insert, Demux (@local?), Network Out]
Key point: remarkably concise overlay specification
Full specification of Chord overlay, including
Failure recovery
Multiple successors
Stabilization
Optimized maintenance
44 OverLog rules
And it runs!
[Figure: the rule listing, set in 10 pt font]
Comparison: MIT Chord in C++
Lookup length in hops
Maintenance bandwidth (comparable with MIT Chord)
Latency without churn
Latency under churn
Compare with Bamboo non-adaptive timeout figures…
Consistency under churn
The story so far:
Can specify overlays as continuous queries in a logic language
Compile to a graph of dataflow elements
Efficiently execute graph to perform routing and forwarding
Overlays exhibit similar performance characteristics
But …
Once you have a distributed query processor, lots of things fall off the back of the truck…
What else does this buy you? Introspection (w/ Atul Singh, Rice)
Overlay invariant monitoring: a distributed watchpoint
“What’s the average path length?”
“Is routing consistent?”
Execution tracing at “pseudo-code” granularity: logical stepping
Why did rule R7 trigger?
… and at dataflow granularity: intermediate representation stepping
Why did that tuple expire?
Great way to do distributed debugging and logging
In fact, we use it and have found a number of bugs…
What else does this buy you? 2. Transport reconfiguration
Dataflow paradigm thins out layer boundaries
Mix and match transport facilities (retries, congestion control, rate limitation, buffering)
Spread bits of transport through the application to suit application requirements
Automatically!
[Figure: three dataflow configurations, (a), (b) and (c), between Application and Network. Elements shown: Queue, CC Tx, Demux, RR Sched, CC Rx, UDP Rx, UDP Tx, Route/Demux and Retry; panel (c) also shows Buffered Agg]
In fact, a rich seam for future research…
Reconfigurable transport protocols
Debugging and logging support
The “right” language: global invariants
Use distributed joins as abstraction mechanism
Optimization techniques
Including multiquery optimization
Monitoring other distributed systems and networks
Evolve towards more general query processor?
PIER heritage returns
Summary
Overlays enable distributed system innovation
We’d better make them easier to build, reuse, understand
P2 enables
High-level overlay specification in OverLog
Automatic translation of specification into dataflow graph
Execution of dataflow graph
Explore and embrace the trade-off between fine-tuning and ease of development
Get the full immersion treatment in our paper in SOSP ’05, code release imminent
Thanks! Questions?
A few to get you started:
Who cares about overlays?
Logic? You mean Prolog? Eeew!
This language is really ugly. Discuss.
But what about security?
Is anyone ever going to use this?
Is this as revolutionary and inspired as it looks?