Differential Dataflow (and the Naiad system)
![Page 1: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/1.jpg)
Differential Dataflow
(and the Naiad system)
Frank McSherry, Derek G. Murray, Rebecca Isaacs, Michael Isard
Microsoft Research, Silicon Valley
![Page 2: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/2.jpg)
Data-parallel dataflow
[Figure: records 1 to 6, grouped by key (k1, k2, k3), flowing between data-parallel operators A to E.]
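Data-parallel dataflow routes each record to a worker by its key, so every keyed operator instance sees all records for its keys and no others. The minimal Python sketch below illustrates that routing under stated assumptions: `stable_hash`, `partition`, and `min_per_key` are hypothetical helpers, not Naiad's API, and crc32 stands in for whatever hash a real system uses.

```python
import zlib


def stable_hash(key) -> int:
    # crc32 of the key's text form: unlike hash() on str, stable across runs.
    return zlib.crc32(str(key).encode("utf-8"))


def partition(records, key_fn, num_workers):
    """Route each record to the worker chosen by hashing its key."""
    shards = [[] for _ in range(num_workers)]
    for rec in records:
        shards[stable_hash(key_fn(rec)) % num_workers].append(rec)
    return shards


def min_per_key(shard, key_fn, val_fn):
    """A keyed operator: it only needs the records held by one worker."""
    best = {}
    for rec in shard:
        k, v = key_fn(rec), val_fn(rec)
        if k not in best or v < best[k]:
            best[k] = v
    return best


records = [("a", 5), ("b", 2), ("a", 1), ("c", 7), ("b", 9), ("c", 3)]
shards = partition(records, key_fn=lambda r: r[0], num_workers=3)

# Each worker runs the operator on its own shard; because no key spans two
# shards, the per-worker results can simply be unioned.
result = {}
for shard in shards:
    result.update(min_per_key(shard, lambda r: r[0], lambda r: r[1]))
print(sorted(result.items()))  # [('a', 1), ('b', 2), ('c', 3)]
```

Adding workers adds memory for per-key state, which is the sense in which data-parallelism scales memory.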
![Page 6: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/6.jpg)
Data-parallel dataflow
Simple systems (Hadoop, Dryad) process entire collections. More specialized systems support:
1. Incremental updates (StreamInsight, Incoop).
2. Fixed-point iteration (Datalog, Rex, Nephele).
3. Prioritized computation (PrIter).
These capabilities are hard to compose, for non-trivial reasons (cf. incremental view maintenance of recursive queries).
For example: maintaining the strongly connected components of a social graph as edges continually arrive and depart.
![Page 7: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/7.jpg)
Naiad
Data-parallel compute engine using differential dataflow.
C#/LINQ programming model:
• arbitrarily nested loops,
• incremental updates,
• prioritization,
• …
• fully composable.
Trades memory for performance: data-parallelism to scale memory.
![Page 8: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/8.jpg)
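Using Naiad
1. Programmer writes a declarative Naiad program.
[Figure: the program as a dataflow graph: Edges feed a loop body of join (⋈), union (∪), and Min operators, whose Labels form the Output.]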
![Page 9: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/9.jpg)
```csharp
// Produces a (name, label) pair for each node in the input graph, where the
// label is the smallest node name from which that node can be reached.
public static Collection<Node> DirectedReachability(this Collection<Edge> edges)
{
    // Start each node in the graph with itself as its label.
    var nodes = edges.Select(x => new Node(x.src, x.src))
                     .Distinct();

    // Repeatedly update each node's label to the minimum of its initial
    // label and the labels of its in-neighbors.
    return nodes.FixedPoint(x => x.Join(edges, n => n.name, e => e.src,
                                        (n, e) => new Node(e.dst, n.label))
                                  .Concat(nodes)
                                  .Min(n => n.name, n => n.label));
}
```
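To make the query's meaning concrete, here is a minimal, non-incremental Python sketch of the same fixed point, assuming edges are `(src, dst)` pairs with comparable node names. The function name `directed_reachability` is hypothetical; comments map each step to the LINQ operator it imitates. It recomputes everything each iteration, which is exactly what Naiad avoids.

```python
def directed_reachability(edges):
    """Map each node to the smallest node name that can reach it.

    edges: iterable of (src, dst) pairs; node names must be comparable.
    """
    edges = list(edges)
    # Select(...).Distinct(): every source starts labelled with itself.
    nodes = {src: src for src, _ in edges}
    labels = dict(nodes)
    while True:
        # Concat(nodes): the initial labels are always candidates.
        new_labels = dict(nodes)
        # Join on e.src, emit (e.dst, n.label), then Min per name.
        for src, dst in edges:
            if dst not in new_labels or labels[src] < new_labels[dst]:
                new_labels[dst] = labels[src]
        # FixedPoint: stop once an iteration changes nothing.
        if new_labels == labels:
            return labels
        labels = new_labels


print(directed_reachability([(3, 1), (1, 2), (2, 1), (5, 4)]))
# {3: 3, 1: 1, 2: 1, 5: 5, 4: 5}
```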
Using Naiad
2. Program is compiled to a cyclic dataflow graph.
![Page 13: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/13.jpg)
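Using Naiad
3. Graph is distributed across independent workers.
4. Computation stays resident, with interactive access.

```csharp
// Create an input collection and attach the query to it.
var edges = new InputCollection<Edge>();
var labels = edges.DirectedReachability();

// Receive output as it changes.
labels.Subscribe(x => ProcessLabels(x));

// Feed new input into the resident computation as it arrives.
while (!inputStream.Closed())
    edges.OnNext(inputStream.GetNext());
```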
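The "resident computation" contract can be illustrated with a toy Python analogue. Everything here is hypothetical: `InputCollection`, `on_next`, `subscribe`, `diff`, and `out_degree` only loosely mirror the C# names. The input accepts batches of `(record, delta)` changes, and subscribers receive changes to the output rather than whole collections. Unlike Naiad, this sketch naively re-runs the entire query on every batch.

```python
def diff(old, new):
    """Difference between two multisets given as {record: count} dicts."""
    keys = set(old) | set(new)
    return {k: new.get(k, 0) - old.get(k, 0)
            for k in keys if new.get(k, 0) != old.get(k, 0)}


class InputCollection:
    """Hypothetical toy analogue of a resident input (not Naiad's API)."""

    def __init__(self, query):
        self.query = query        # {record: count} -> {record: count}
        self.state = {}           # current input multiset
        self.output = {}          # last output multiset
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def on_next(self, changes):
        # Apply the batch of (record, delta) changes to the input.
        for record, delta in changes:
            count = self.state.get(record, 0) + delta
            if count == 0:
                self.state.pop(record, None)
            else:
                self.state[record] = count
        # Naive: recompute the query, then report only what changed.
        new_output = self.query(self.state)
        out_changes = diff(self.output, new_output)
        self.output = new_output
        for callback in self.subscribers:
            callback(out_changes)


def out_degree(edges):
    """Example query: (node, out-degree) pairs from an edge multiset."""
    degree = {}
    for (src, _dst), count in edges.items():
        degree[src] = degree.get(src, 0) + count
    return {(src, d): 1 for src, d in degree.items()}


edges = InputCollection(out_degree)
edges.subscribe(print)
edges.on_next([(("a", "b"), 1), (("a", "c"), 1)])  # {('a', 2): 1}
edges.on_next([(("a", "c"), -1)])  # retracts ('a', 2) and asserts ('a', 1)
```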
![Page 14: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/14.jpg)
Incremental Dataflow
Data-parallel operators can operate on differences:
Collection: { (record, count) }
[Figure: an operator transforms an input collection X into an output collection Y.]
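Difference: { (record, delta) }
[Figure: the operator consumes a sequence of input differences dX and produces the corresponding sequence of output differences dY.]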
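As a concrete instance of an operator that maps dX to dY, here is a minimal Python sketch of a per-key count. `IncrementalCount` is a hypothetical name, not Naiad's operator. It keeps per-key state between updates, and for each key whose count changes it retracts the old `(key, count)` output record and asserts the new one. Because of that state, the work per update tracks the size of the difference, not the size of the collection.

```python
class IncrementalCount:
    """Count records per key, consuming and producing differences.

    Input difference dX:  {record: delta}
    Output difference dY: {(key, count): delta}, the change to the output
    collection {(key, count)} implied by dX.
    """

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.counts = {}  # per-key state retained between updates

    def on_difference(self, dx):
        # Net change per key implied by this input difference.
        per_key = {}
        for record, delta in dx.items():
            key = self.key_fn(record)
            per_key[key] = per_key.get(key, 0) + delta

        dy = {}
        for key, change in per_key.items():
            old = self.counts.get(key, 0)
            new = old + change
            if new == old:
                continue                 # no visible change for this key
            if old != 0:
                dy[(key, old)] = -1      # retract the previous output record
            if new != 0:
                dy[(key, new)] = +1      # assert the updated output record
                self.counts[key] = new
            else:
                del self.counts[key]
        return dy


count = IncrementalCount(key_fn=lambda r: r[0])
print(count.on_difference({("a", 1): 1, ("a", 2): 1, ("b", 1): 1}))
# {('a', 2): 1, ('b', 1): 1}
print(count.on_difference({("a", 1): -1}))
# {('a', 2): -1, ('a', 1): 1}
```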
Up until this point, this is all old news.
![Page 25: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/25.jpg)
Differential Dataflow
Data-parallel operators can operate on differences:
Difference: { (record, delta, version) }
[Figure: the operator maps input differences dX to output differences dY, each tagged with a version.]
Important: A version can be more than just an integer.
![Page 35: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/35.jpg)
[Figure: differences dX and dY arranged along two dimensions of versions rather than a single sequence.]
![Page 37: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/37.jpg)
Differential Dataflow
Data-parallel operators can operate on differences:
Difference: { (record, delta, version) }, where versions are drawn from a lattice.
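A minimal Python sketch of what lattice-valued versions buy, assuming hypothetical `(epoch, iteration)` pairs ordered coordinate-wise (the product order). Two versions such as `(0, 1)` and `(1, 0)` can be incomparable, and their least upper bound (`join`) is the coordinate-wise maximum. The collection at version `t` is the sum of all differences at versions `s <= t`. So the difference stored at `(1, 1)` only has to record what neither `(0, 1)` nor `(1, 0)` already accounts for. The helper names are illustrative only.

```python
def leq(s, t):
    """Product partial order on version tuples: s <= t in every coordinate."""
    return all(a <= b for a, b in zip(s, t))


def join(s, t):
    """Least upper bound in the product lattice: coordinate-wise maximum."""
    return tuple(max(a, b) for a, b in zip(s, t))


def collection_at(differences, t):
    """Accumulate every (record, delta, version) whose version is <= t."""
    acc = {}
    for record, delta, version in differences:
        if leq(version, t):
            acc[record] = acc.get(record, 0) + delta
    return {r: c for r, c in acc.items() if c != 0}


# Versions are hypothetical (epoch, iteration) pairs.
diffs = [
    ("x", +1, (0, 0)),
    ("y", +1, (0, 1)),   # produced by a later loop iteration of epoch 0
    ("x", -1, (1, 0)),   # an input update arriving in epoch 1
    ("z", +1, (1, 1)),   # correction needed only where both effects meet
]

print(leq((0, 1), (1, 0)), leq((1, 0), (0, 1)))  # False False: incomparable
print(join((0, 1), (1, 0)))                       # (1, 1)
print(collection_at(diffs, (0, 1)))               # {'x': 1, 'y': 1}
print(collection_at(diffs, (1, 0)))               # {}
print(collection_at(diffs, (1, 1)))               # {'y': 1, 'z': 1}
```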
![Page 39: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/39.jpg)
Empirical Efficacy
[Figure: size of differences (size of dX, log scale from 1 to 1,000,000) against inner iterations (1 to 29), comparing "baseline" and "incremental".]
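To show what "size of dX per inner iteration" measures, the toy Python sketch below runs min-label propagation on a small chain graph. It records two numbers per iteration: the size of the full label collection, which a system that reprocesses whole collections touches every iteration, and the number of labels that actually changed, which is all a difference-based operator must process. The graph and the helper name `label_propagation_trace` are hypothetical. The numbers are not the measurements in the figure.

```python
def label_propagation_trace(edges):
    """Per iteration: (size of full label collection, labels changed)."""
    nodes = {src: src for src, _ in edges}
    labels = dict(nodes)
    trace = []
    while True:
        new_labels = dict(nodes)
        for src, dst in edges:
            if dst not in new_labels or labels[src] < new_labels[dst]:
                new_labels[dst] = labels[src]
        # Records whose label is new or different form this iteration's dX.
        changed = sum(1 for k, v in new_labels.items() if labels.get(k) != v)
        trace.append((len(new_labels), changed))
        if changed == 0:
            return trace
        labels = new_labels


chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(label_propagation_trace(chain))
# [(5, 4), (5, 3), (5, 2), (5, 1), (5, 0)]
```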
![Page 40: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/40.jpg)
Strongly Connected Components
Nested fixed-point computation.
Two inner loops re-use the existing DirectedReachability() query.
The entire computation is also automatically incrementalized.
The declarative program is 23 lines of code.
![Page 41: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/41.jpg)
Strongly Connected Components
```csharp
// Repeatedly remove edges until a fixed point is reached.
public static Collection<Edge> SCC(this Collection<Edge> edges)
{
    return edges.FixedPoint(y => y.TrimAndTranspose()
                                  .TrimAndTranspose());
}

// Retain edges whose endpoints receive the same reachability label,
// then reverse each retained edge.
public static Collection<Edge> TrimAndTranspose(this Collection<Edge> edges)
{
    var labels = edges.DirectedReachability();

    return edges.Join(labels, x => x.src, y => y.name, (x, y) => x.Label1(y))
                .Join(labels, x => x.dst, y => y.name, (x, y) => x.Label2(y))
                .Where(x => x.label1 == x.label2)
                .Select(x => new Edge(x.dst, x.src));
}
```
![Page 42: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/42.jpg)
Streaming SCC on Twitter
CDFs for 24-hour windowed SCC of the @mention graph.
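A sliding window fits naturally into the difference model. Each record can be inserted when it arrives and retracted once it leaves the window, so the windowed graph is just another stream of differences. The Python sketch below assumes a record is visible for times in `[t, t + 24)` and uses integer hours. The helper names and sample mentions are hypothetical, not the setup used for the measurements above.

```python
WINDOW = 24  # hours, matching the 24-hour window above


def windowed_differences(events, window=WINDOW):
    """Turn (record, time) events into (record, delta, time) differences:
    +1 when a record arrives, -1 when it falls out of the window."""
    diffs = []
    for record, t in events:
        diffs.append((record, +1, t))
        diffs.append((record, -1, t + window))
    return sorted(diffs, key=lambda d: d[2])


def window_contents(diffs, now):
    """Accumulate all differences with time <= now."""
    counts = {}
    for record, delta, t in diffs:
        if t <= now:
            counts[record] = counts.get(record, 0) + delta
    return {r: c for r, c in counts.items() if c != 0}


events = [(("alice", "bob"), 0), (("bob", "carol"), 10)]
diffs = windowed_differences(events)
print(window_contents(diffs, now=5))   # {('alice', 'bob'): 1}
print(window_contents(diffs, now=30))  # {('bob', 'carol'): 1}
print(window_contents(diffs, now=40))  # {}
```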
![Page 43: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/43.jpg)
Concluding Comments
The generality of differential dataflow allows Naiad to arrange computation more naturally and efficiently.
• Better re-use of previous work, by changing what counts as “previous”.
• Millisecond-scale updates for complex computations.
• Enables new and richer program patterns: e.g., SCC, but also graph coloring, partitioning, …
Bringing declarative data-parallel programming closer to imperative programming.
![Page 44: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/44.jpg)
Naiad Status
Public code release available at the project page:
http://research.microsoft.com/naiad/
http://bigdataatsvc.wordpress.com/
The code release is in C# and runs on Windows (.NET) and on Linux and OS X (Mono).
Come see our poster and demo, processing tweets.
![Page 45: Differential Dataflow (and the Naiad system)](https://reader035.vdocuments.us/reader035/viewer/2022062310/56816623550346895dd97b80/html5/thumbnails/45.jpg)
Questions?