
Seaweed: Scalable Delay Aware Querying

Austin Donnelly, Richard Mortier, Dushyanth Narayanan, Ant Rowstron

Microsoft Research, Cambridge


Sep 14 2006 Seaweed: Scalable Delay Aware Querying 2

Motivation
• Large, highly distributed data sets
• Data stored on endsystems
• Endsystems often unavailable
• Centralization, replication do not scale
• Must query data in-situ
• How can we deal with unavailability?


Delay aware querying
• In-situ
  • Push queries to endsystems
• Incremental results
  • As endsystems become available
• Progress estimation
  • Current and future completeness
• Scalability
• Fault-tolerance


Applications
• Admin, diagnostics, resource mgmt
• Select-Project-Aggregate queries
• Small results
• Low to moderate query rates
• Different network scales
  • Data center (10,000+)
  • Enterprise (100,000+)
  • Internet (1,000,000+)


Enterprise network management
• Endsystem-based monitoring
  • Endsystems log their own traffic
  • Flow and PacketHeader tables
• Queries by admins/operators
  • SELECT SUM(Bytes) FROM Flow WHERE SrcPort=80
  • Flow is horizontally partitioned
• 300,000 hosts, 1 month
  • 765 TB total size
  • 2.4 Gbps update rate
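Because Flow is horizontally partitioned, the example query can be evaluated on each endsystem's own partition and the partial sums added together. The following is a minimal sketch of that idea using in-memory SQLite databases as stand-ins for endsystems; the two-column schema, the sample rows, and the helper names are assumptions made for illustration, not Seaweed's actual storage layer.

```python
import sqlite3

QUERY = "SELECT SUM(Bytes) FROM Flow WHERE SrcPort=80"

def make_endsystem(rows):
    """Create one endsystem's local Flow partition (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Flow (SrcPort INTEGER, Bytes INTEGER)")
    db.executemany("INSERT INTO Flow VALUES (?, ?)", rows)
    return db

def local_sum(db):
    """Run the query in situ; SUM over no matching rows is NULL, treated as 0."""
    (value,) = db.execute(QUERY).fetchone()
    return value or 0

# Three endsystems, each holding only its own flows (made-up data).
endsystems = [
    make_endsystem([(80, 1500), (443, 900)]),
    make_endsystem([(80, 300), (80, 200)]),
    make_endsystem([(22, 4000)]),
]

partials = [local_sum(db) for db in endsystems]
total = sum(partials)  # SUM is decomposable, so partial sums can simply be added
print(partials, total)
```

Only the small partial results cross the network here; the raw Flow rows stay where they were logged.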


Roadmap
• Motivation
• Design
  • Overview
  • Delay awareness
  • Distributed query protocols
• Evaluation
• Conclusion


Seaweed overview
• In-situ querying
• One-shot queries
• Incremental results
• Progress estimation
• Meta-data replication
• Exactly-once semantics
• Scalable, failure-resilient protocols
• Built on P2P overlay


Why delay awareness?
• Endsystem unavailability


What is delay awareness?
• User receives partial results
• Needs progress indicator
  • How much data is out there?
  • How much have I seen?
  • How long before I get to 99%?
• Delay/completeness tradeoff
  • Predicted by Seaweed


Completeness
• % of relevant data rows seen so far
• Relevant: matches query predicates
• Query-specific
• Completeness predictor:
  • Currently available rows
  • Total rows
  • Expected rows/time
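The three predictor ingredients above are enough to answer "how long before I get to 99%?". A minimal sketch follows, assuming the predictor is given as a count of currently available rows plus a list of (hours from now, expected rows) pairs; this representation and the function names are illustrative assumptions, not the format Seaweed uses.

```python
def completeness_curve(available_rows, expected_arrivals):
    """Return [(time_hours, completeness)] for a predictor.

    available_rows: relevant rows on endsystems that are up now.
    expected_arrivals: [(time_hours, rows)] for rows expected to become
    reachable later (assumed representation).
    """
    total = available_rows + sum(rows for _, rows in expected_arrivals)
    if total == 0:
        return [(0, 1.0)]  # nothing relevant exists, so the query is complete
    seen = available_rows
    curve = [(0, seen / total)]
    for t, rows in sorted(expected_arrivals):
        seen += rows
        curve.append((t, seen / total))
    return curve

def time_to_reach(curve, target):
    """First time at which predicted completeness reaches the target, or None."""
    for t, completeness in curve:
        if completeness >= target:
            return t
    return None

# 900 rows available now; 60, 30 and 10 more expected after 2, 24 and 200 hours.
curve = completeness_curve(900, [(2, 60), (24, 30), (200, 10)])
print(curve)
print(time_to_reach(curve, 0.99))
```

With these made-up numbers the query is 90% complete immediately and is predicted to reach 99% after 24 hours, which is the kind of delay/completeness tradeoff a user can act on.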


Completeness predictor


Completeness prediction
• Relevant rows
  • Column histograms
  • Standard row-count estimation
  • Replication → remote estimation
• Uptime
  • Availability models
• Replicated meta-data
  • Highly available
  • Orders of magnitude smaller than data
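Row-count estimation from a column histogram can be sketched as follows for a predicate such as Bytes > 20000. This is the textbook technique with a uniform-within-bucket assumption; the bucket boundaries, counts, and function name are made up for illustration and are not taken from the Seaweed implementation.

```python
def estimate_rows_greater_than(bucket_edges, bucket_counts, threshold):
    """Estimate how many rows have value > threshold from a histogram.

    bucket_edges has one more entry than bucket_counts; bucket i covers
    [bucket_edges[i], bucket_edges[i + 1]). Values are assumed to be
    spread uniformly inside each bucket.
    """
    estimate = 0.0
    for lo, hi, count in zip(bucket_edges, bucket_edges[1:], bucket_counts):
        if threshold <= lo:
            estimate += count                                  # whole bucket matches
        elif threshold < hi:
            estimate += count * (hi - threshold) / (hi - lo)   # partial bucket
    return estimate

# Hypothetical histogram over the Bytes column of one endsystem's Flow table.
edges = [0, 10000, 20000, 30000, 40000]
counts = [500, 300, 150, 50]

print(estimate_rows_greater_than(edges, counts, 20000))
print(estimate_rows_greater_than(edges, counts, 25000))
```

Because only the histogram is needed, the same estimate can be computed by any node that holds a replica of an unavailable endsystem's meta-data.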


Predictor generation
• Meta-data replicated periodically
• Query sent to all endsystems
  • Application-level multicast tree
  • Retransmit on failure
  • Aggregate predictors in-tree
• Exactly-once semantics
  • Available → local histogram, time=0
  • Unavailable → replica histogram, avail.
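In-tree aggregation of predictors can be pictured as summing small "rows expected by time t" maps on the way up the multicast tree. The sketch below assumes each endsystem contributes exactly one leaf predictor: an available endsystem reports its estimated rows at time 0, and an unavailable one is reported by a replica holder at its predicted return time. The data layout, names, and numbers are illustrative assumptions.

```python
from collections import Counter

def leaf_predictor(rows, hours_until_available=0):
    """Predictor for one endsystem: {hours from now: expected relevant rows}."""
    return Counter({hours_until_available: rows})

def aggregate(node):
    """Sum the predictors of a subtree.

    A node is either a leaf predictor (Counter) or a list of child nodes.
    """
    if isinstance(node, Counter):
        return node
    total = Counter()
    for child in node:
        total.update(aggregate(child))  # Counter.update adds the counts
    return total

# Endsystems A and C are up; B and D are down and expected back in 8 hours.
A = leaf_predictor(120)
B = leaf_predictor(80, hours_until_available=8)
C = leaf_predictor(50)
D = leaf_predictor(30, hours_until_available=8)

root = aggregate([[A, B], [C, D]])  # (A+B) and (C+D), then A+B+C+D
print(dict(root))
```

Each interior vertex forwards one merged predictor instead of its children's individual ones, so the amount of state sent upward does not grow with the number of endsystems below it.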


Predictor generation

[Figure: predictors from endsystems A, B, C and D are aggregated in-tree (A+B, C+D, then A+B+C+D). Each predictor is plotted as Rows (millions) against Time (hours, log scale from 1 to 10000). Two histograms show Frequency against Thickness, labelled σ1.]


Query execution
• Persistent query state
  • New endsystems get active query list
• Incremental convergecast of results
  • Deterministic child → parent mapping
  • Each vertex is replicated set
  • Parent remembers child result versions
• Exactly-once semantics
• In-network aggregation
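One way to see why remembering child result versions gives exactly-once semantics: if a parent keeps the latest versioned result per child and recomputes its aggregate from that table, retransmissions and updated results replace earlier entries instead of being added twice. The class below is a minimal sketch under that assumption; the class name, the integer versions, and the use of a plain sum are illustrative, not Seaweed's wire protocol.

```python
class ParentVertex:
    """Keeps the newest result per child so nothing is double-counted."""

    def __init__(self):
        self.child_results = {}  # child id -> (version, value)

    def receive(self, child_id, version, value):
        """Record a child's result; ignore duplicates and stale versions."""
        known = self.child_results.get(child_id)
        if known is not None and known[0] >= version:
            return False
        self.child_results[child_id] = (version, value)
        return True

    def aggregate(self):
        """Aggregate (here: sum) the latest result of every child."""
        return sum(value for _, value in self.child_results.values())

parent = ParentVertex()
parent.receive("child-1", 1, 10)
parent.receive("child-2", 1, 5)
duplicate_accepted = parent.receive("child-1", 1, 10)  # retransmission
first_total = parent.aggregate()
parent.receive("child-2", 2, 9)                        # child-2 sends a newer result
second_total = parent.aggregate()
print(duplicate_accepted, first_total, second_total)
```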


Roadmap
• Motivation
• Design
• Evaluation
• Conclusion


Evaluation
• Packet-level simulation
• Farsite availability traces
  • 51663 hosts, ~4 weeks
• Flow tables from packet traces
  • 456 hosts, ~4 weeks
  • Assigned randomly to simulation hosts
• Two queries
  • SELECT SUM(Bytes) FROM Flow WHERE SrcPort=80
  • SELECT COUNT(*) FROM Flow WHERE Bytes > 20000


Predictor accuracy


Prediction accuracy (2)


Overheads

[Figure: Tx bandwidth (bytes/s/endsystem, log scale from 0.0001 to 1000) against Time (hours, 0 to 1000). Series: Seaweed maintenance O(1), MSPastry O(log N), Seaweed query O(log N).]


Scalability


Roadmap
• Motivation
• Design
• Evaluation
• Conclusion


Related work
• P2P querying
  • PIER, Mercury, …
  • Move data across network
• Continuous/streaming queries
  • Astrolabe, SDIMS, Borealis, …
  • Ignore availability


Future work
• Selective centralization
  • “Distributed materialized views”
  • Need bandwidth/availability estimation
  • Large views can melt network
• Beyond histograms
  • Wavelets → approximate results?
• Real-life experience, measurements
  • Deployment within Microsoft


Conclusion
• Querying highly distributed data
  • Challenges are unavailability, scale
• Delay awareness
  • Predict delay/availability tradeoff
  • Exactly-once semantics
• Seaweed: scalable delay aware querying
  • Meta-data replication
  • Fault-tolerant protocols


Questions?


Consistency (membership)
• “Exactly-once” semantics
  • No double-counting
  • Every endsystem’s results counted
    • If available at any point in query lifetime
  • “Precise single-site validity”
• Estimate always generated
  • For all endsystems, available or not
  • Endsystem computes own estimate
    • If available through estimation phase


Consistency (time)
• Avoid tight synchronization
  • Clock-skewed snapshots
• Loosely synchronized clocks
  • With good NTP, milliseconds
• Currently left to application layer
  • Timestamped, append-only tuples
  • Explicit predicates on timestamp
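A small sketch of why timestamped, append-only tuples plus an explicit timestamp predicate give a stable snapshot: an endsystem that evaluates the query late, after logging more data, still returns the same answer for the same cutoff. The tuple layout (timestamp, bytes) and the function name are assumptions for illustration.

```python
def bytes_up_to(flow_tuples, cutoff):
    """Sum bytes over tuples whose timestamp is <= cutoff (explicit predicate)."""
    return sum(nbytes for ts, nbytes in flow_tuples if ts <= cutoff)

# Local log of (timestamp_seconds, bytes) tuples on one endsystem (made-up data).
log = [(100, 500), (160, 700)]
early = bytes_up_to(log, cutoff=200)

# Append-only: tuples logged later carry later timestamps and never
# change what was recorded before the cutoff.
log.append((250, 900))
late = bytes_up_to(log, cutoff=200)

print(early, late)
```

The result is only as precise as the clocks that produced the timestamps, which is where the loose synchronization mentioned above comes in.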


Result aggregation

• Deterministic mapping to parent

• Each parent is replicated set

• Parents remember child results

[Figure: aggregation tree with node labels R1, R2, R3, R3’, R1+R2, R1+R2+R3 and R1+R2+R3’; the replicated parents hold the child results (R1, R2), (R1+R2, R3) and (R1+R2, R3’).]


Query dissemination in Pastry

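[Figure: Pastry identifier ring from 000 to FFF with node labels 836, 0FA, 37B, DA0 and E9A, the key hash(query), and identifier prefixes ???, 3??, 8?? and E??.]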


Replication in Pastry

• Topology-independent node identifiers
• Each node maintains a virtual neighbor set (vset)

[Figure: Pastry identifier ring from 000 to FFF with node labels 8E2, 8F0, 8F6, 90E and 910.]


Result routing in Pastry

[Figure: Pastry identifier ring with node labels 836, 036 and 0F6, and the key 0FA = hash(query).]
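Pastry delivers a message for a key to the live node whose identifier is numerically closest to that key. Assuming results for a query are routed toward hash(query) in this way, the destination can be sketched as below. The three-hex-digit identifier space and the identifiers are read off the figure labels; the helper names and the tie-breaking rule are assumptions for illustration.

```python
ID_SPACE = 16 ** 3  # three hex digits, 000 to FFF

def ring_distance(a, b):
    """Distance between two identifiers on the circular id space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)

def root_for(key, node_ids):
    """Node whose id is numerically closest to the key (ties go to the lower id)."""
    return min(node_ids, key=lambda n: (ring_distance(n, key), n))

nodes = [0x036, 0x0F6, 0x836]
query_key = 0x0FA  # stands in for hash(query)

root = root_for(query_key, nodes)
print(format(root, "03X"))
```

If that node fails, the same rule picks the next-closest live identifier, which fits with keeping query state on a replicated set of neighbors rather than on a single node.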