distributed data structures: a survey cyril gavoille (labri, university of bordeaux)

33
Distributed Data Distributed Data Structures: A Survey Structures: A Survey Cyril Gavoille Cyril Gavoille (LaBRI, University of Bordeaux) (LaBRI, University of Bordeaux)

Upload: randolph-manning

Post on 17-Dec-2015

225 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Distributed Data Distributed Data Structures: A SurveyStructures: A Survey

Cyril GavoilleCyril Gavoille(LaBRI, University of Bordeaux)(LaBRI, University of Bordeaux)

Page 2: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

ContentsContents

1.1. Efficient data structuresEfficient data structures

2.2. Distributed data structuresDistributed data structures

3.3. Informative labeling schemesInformative labeling schemes

4.4. ConclusionConclusion

Page 3: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

1. Efficient data structures1. Efficient data structures(Tarjan’s like)(Tarjan’s like)

Example 1:Example 1:

A tree (static) A tree (static) TT with with nn vertices vertices

Question:Question: nearest common ancestor nearest common ancestor nca(nca(x,yx,y) for some vertices ) for some vertices x,yx,y??

Note:Note: queries ( queries (x,yx,y) are not known in advance) are not known in advance

((on-line queries on a static treeon-line queries on a static tree))

Page 4: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

[Harel-Tarjan ’84][Harel-Tarjan ’84]

Each tree with Each tree with nn vertices has a data vertices has a data structure of O(structure of O(nn) space (computable in ) space (computable in linear time) such that nca queries can linear time) such that nca queries can be answered in constant time.be answered in constant time.

Page 5: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

A weighted graph A weighted graph GG with with nn vertices, vertices, and a parameter and a parameter kk≥11

Question:Question: a a kk-approximation δ(-approximation δ(x,yx,y) on ) on dist(dist(x,yx,y) in ) in G G for some vertices for some vertices x,yx,y??

with with dist(dist(x,yx,y) ≤ δ) ≤ δ((x,yx,y) ) ≤ ≤ kk..dist(dist(x,yx,y))

Example 2:Example 2:

Page 6: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

[Thorup-Zwick - [Thorup-Zwick - J.ACMJ.ACM ’05] ’05]

Each undirected weighted graph Each undirected weighted graph GG with with nn vertices, and each integer vertices, and each integer kk≥1, 1, has a data structure of has a data structure of O(O(kk..nn1+1/k) ) space (computable in O(space (computable in O(kkmm..nn1/k) ) expected time) such that (2expected time) such that (2kk-1)--1)-approximated distance queries can approximated distance queries can be answered in O(be answered in O(kk) time.) time.

Essentially optimal, related to an Erdös Conjecture.

Page 7: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

2. Distributed data structures2. Distributed data structures

Typical questions are:Typical questions are:

Answer to query Answer to query QQ with the local knowledge of with the local knowledge of xx (or (or its vicinity), so without any access to a global data its vicinity), so without any access to a global data structure.structure.

A networkA network

x

Page 8: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Query at Query at xx: : who has any mpeg file named ‘‘Sta*Wa*’’?who has any mpeg file named ‘‘Sta*Wa*’’?

Example 1: Distributed Hash Tables Example 1: Distributed Hash Tables (DHT)(DHT)

x

Answer:Answer: go to go to ww and ask it. and ask it.xx does not know, but does not know, but ww certainly knows … at least a certainly knows … at least a

pointerpointer

set of peersset of peerslogical networklogical network

Page 9: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Query at Query at xx: : next hop to go to next hop to go to yy??

Example 2: Routing in a physical Example 2: Routing in a physical networknetwork

x

y

Page 10: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Query at Query at xx: : the number of descents of the number of descents of xx(or a constant approximation of it)(or a constant approximation of it)

Example 3: in a dynamic settingExample 3: in a dynamic setting

A growing rooted A growing rooted treetree

It is possible to maintain a 2-approximation on the It is possible to maintain a 2-approximation on the number of descendants with O(lognumber of descendants with O(log22n) amortized n) amortized messages of O(loglogn) bits each, n number of messages of O(loglogn) bits each, n number of inserted vertices.inserted vertices.

[Afek,Awerbuch,Plokin,Saks – [Afek,Awerbuch,Plokin,Saks – J.ACM J.ACM ’96]’96]

Page 11: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Goals are:Goals are:

►The same as for global data structures:The same as for global data structures: Low preprocessing timeLow preprocessing time Small size data structureSmall size data structure Fast query timeFast query time Efficient updatesEfficient updates

+ + Smaller and balanced local data Smaller and balanced local data structuresstructures

++ Low communication cost (trade-offs), Low communication cost (trade-offs), for multiple hops answersfor multiple hops answers

Page 12: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

3. Informative Labeling 3. Informative Labeling SchemesSchemes

For the talkFor the talk A static network/graphA static network/graph Queries: involve only verticesQueries: involve only vertices Answers: do not require any Answers: do not require any

communication (direct data structures)communication (direct data structures)

Page 13: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Question: dist(Question: dist(x,yx,y) in a graph ) in a graph GG??

Answering to dist(Answering to dist(x,yx,y) consists only in inspecting ) consists only in inspecting the local data structure of the local data structure of xx and of and of yy..

Main goal:Main goal: minimize the maximal size of a local minimize the maximal size of a local data structure. Wish: |DS(data structure. Wish: |DS(x,Gx,G)| )| « « |DS(|DS(GG)|, ideally)|, ideally

|DS(|DS(x,Gx,G)| )| ≈ (1/n)≈ (1/n)..|DS(|DS(GG)|)|

Data StructureData Structurefor graph for graph GG

xx yy

Page 14: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

[Thorup-Zwick - [Thorup-Zwick - J.ACMJ.ACM ’05] ’05]

… … Moreover, each vertex Moreover, each vertex w w L( L(ww) of ) of Õ(Õ(nn1/kloglogDD) bits () bits (DD=weighted diameter of =weighted diameter of GG) ) such that a (2such that a (2kk-1)-approximation on dist(-1)-approximation on dist(x,yx,y) ) can be answered from L(can be answered from L(xx) and L() and L(yy) only.) only.

nn1+1/k

nn1/k

ww yyxx

Overlap: Õ(logOverlap: Õ(logDD))

Page 15: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Informative labeling schemesInformative labeling schemes(more formally) (more formally) [Peleg ’00][Peleg ’00]

A A PP-labeling scheme for -labeling scheme for FF is a pair ‹ is a pair ‹L,fL,f› › such that: such that: GG F F ,, u,vu,v GG::

• (labeling)(labeling) LL((u,Gu,G) is a binary ) is a binary stringstring• (decoder)(decoder) ff(L((L(u,Gu,G),L(),L(v,Gv,G)) = )) = PP((u,v,Gu,v,G))

Let Let PP be a graph property defined on be a graph property defined on pairs of vertices (can be extended to pairs of vertices (can be extended to any tuple), and let any tuple), and let FF be a graph family. be a graph family.

Page 16: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Some Some PP-labeling schemes-labeling schemes

► AdjacencyAdjacency► Distance (exact or approximate)Distance (exact or approximate)► First edge on a (near) shortest path (compact First edge on a (near) shortest path (compact

routing, labeled-based routing)routing, labeled-based routing)► Ancestry, parent, nca, sibling relation in treesAncestry, parent, nca, sibling relation in trees► Edge connectivity, flowEdge connectivity, flow► General predicate General predicate PP described in monadic described in monadic

second order logic second order logic [Courcelle][Courcelle]► Proof labeling systems Proof labeling systems [Korman,Kutten,Peleg][Korman,Kutten,Peleg]

Page 17: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Ancestry in rooted treesAncestry in rooted trees

Motivation: Motivation: [Abiteboul,Kaplan,Milo ’01][Abiteboul,Kaplan,Milo ’01]

The <TAG> … </TAG> structure of a huge XML data-The <TAG> … </TAG> structure of a huge XML data-base is a rooted tree. Some queries are ancestry base is a rooted tree. Some queries are ancestry relations in this tree.relations in this tree.

Use compact index for fast query XML search engine. Use compact index for fast query XML search engine. Here the constants do matter. Saving Here the constants do matter. Saving 11 byte on each byte on each entry of the index table is important. Here entry of the index table is important. Here nn is very is very large, ~ large, ~ 101099..

Ex: Is <“distributed computing”> descendantEx: Is <“distributed computing”> descendantof <book_title>?of <book_title>?

Page 18: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Folklore? Folklore? [Santoro, Khatib ’85][Santoro, Khatib ’85]

[a,b] [a,b] [c,d]?[c,d]?

2logn bit labels2logn bit labels

DFS labelingDFS labeling 1

L(x)=[2,18]

3

4 5 6

7

8

910

[13,18]

18

[22,27]

24

27

121114

16

23

2625

17

15

2120

19

Page 19: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

[Alstrup,Rauhe – SODA ’02][Alstrup,Rauhe – SODA ’02]

Upper bound: logn + O(Upper bound: logn + O(logn) bitslogn) bits

Lower bound: logn + Lower bound: logn + (loglogn) bits(loglogn) bits

1

2

3

4 5 6

7

8

910

13

18

22

24

27

121114

16

23

2625

17

15

2120

19

Page 20: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Adjacency Labeling /Adjacency Labeling /Implicit RepresentationImplicit Representation

PP(x,y,(x,y,GG)=1 iff xy in E()=1 iff xy in E(GG))

[Kanan,Naor,Rudich – STOC ’92][Kanan,Naor,Rudich – STOC ’92]

O(logn) bit labels for:O(logn) bit labels for:• trees (and forests)trees (and forests)• bounded arboricity graphs (planar, …)bounded arboricity graphs (planar, …)• bounded treewidth graphsbounded treewidth graphs

In particular:In particular:• 2logn bits for trees2logn bits for trees• 4logn bits for planar4logn bits for planar

Page 21: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Acutally, the problem is equivalent to an old Acutally, the problem is equivalent to an old combinatorial problem:combinatorial problem:

[Babai,Chung,Erd[Babai,Chung,Erdöös,Graham,Spencer ’82]s,Graham,Spencer ’82]

Small Universal Induced GraphSmall Universal Induced Graph

UU is an universal graph for the family is an universal graph for the family FF if every if every graph of graph of F F is isomorphic to an induced subgraph is isomorphic to an induced subgraph

of of UU b

e

b

a

c

ed

f

g

c

e

c ga

g

Page 22: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Universal graphUniversal graph UU(fixed for (fixed for FF)

Graph Graph GG of of F F

|L(|L(x,Gx,G)| = )| = loglog22|V(|V(UU)|)|

b

e

b

a

c

ed

f

g

c

e

c ga

g

Page 23: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Best known results/Open Best known results/Open questionsquestions

►Bounded degree graphs: 1.Bounded degree graphs: 1.867 867 lognlogn[Alon,Asodi - FOCS ’02][Alon,Asodi - FOCS ’02]

►Trees: logn + O(logTrees: logn + O(log**n)n)[Alstrup,Rauhe - FOCS ’02][Alstrup,Rauhe - FOCS ’02]

Planar: 3logn + O(logPlanar: 3logn + O(log**n)n)

xxvv

ZZ

yy

log*n = min{ ilog*n = min{ i0 | log0 | log(i)(i)nn 1} 1}

Page 24: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Lower bounds?: logn + Lower bounds?: logn + (1) for (1) for planarplanar

No hereditary family with n!2No hereditary family with n!2O(n)O(n) labeled graphs (trees, planar, labeled graphs (trees, planar, bounded genus, bounded treewidth,bounded genus, bounded treewidth,…) is known to require labels of logn …) is known to require labels of logn + + (1) bits.(1) bits.

logn + O(1) bits for this logn + O(1) bits for this family?family?

Page 25: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

DistanceDistance

Motivation: Motivation: [Peleg ’99][Peleg ’99]

If a short label (say of polylogarithmic size) can be If a short label (say of polylogarithmic size) can be added to the address of the destination, then routing to added to the address of the destination, then routing to any destination can be done without routing tables and any destination can be done without routing tables and with a “limited” number of messages.with a “limited” number of messages.

PP(x,y,(x,y,GG)=dist(x,y) in )=dist(x,y) in GG

dist(dist(x,x,yy))

xx

message header=hop-countyy

Page 26: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

A selection resultsA selection results

► (n) bits for general graphs(n) bits for general graphs 1.56n bits, but with O(n) time decoder!1.56n bits, but with O(n) time decoder!

[Winkler ’83 (Squashed Cube Conjecture)][Winkler ’83 (Squashed Cube Conjecture)]

11n bits and O(loglogn) time decoder11n bits and O(loglogn) time decoder[Gavoille,Peleg,Pérennès,Raz ’01][Gavoille,Peleg,Pérennès,Raz ’01]

► (log(log22n) bits for trees and bounded n) bits for trees and bounded treewidth graphs, … treewidth graphs, … [Peleg ’99, GPPR ’01][Peleg ’99, GPPR ’01]

► (logn) bits and O(1) time decoder for (logn) bits and O(1) time decoder for interval, permutation graphs, … interval, permutation graphs, … [ESA ’03]:[ESA ’03]: O(n) space O(1) time data structure, O(n) space O(1) time data structure, even for m=even for m=(n(n22))

Page 27: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Results (cont’d)Results (cont’d)

► (logn(logn..loglogn) bits and (1+o(1))-approximation loglogn) bits and (1+o(1))-approximation for trees and bounded treewidth graphsfor trees and bounded treewidth graphs[GKKPP – ESA ’01][GKKPP – ESA ’01]

► More recently: doubling dimension-More recently: doubling dimension- graphs graphs

Every radius-2r ball can be covered by Every radius-2r ball can be covered by 2 2 radius-r balls radius-r balls

• Euclidean graphs have Euclidean graphs have =O(1)=O(1)• Include bounded growing Include bounded growing graphsgraphs• Robust notionRobust notion

Page 28: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Distance labeling for doubling Distance labeling for doubling dimension graphsdimension graphs

((-O(-O() ) lognlogn..loglogn) bitsloglogn) bits

(1+(1+)-approximation for doubling )-approximation for doubling dimension-dimension- graphs graphs

[Gupta,Krauthgamer,Lee – FOCS ’03][Gupta,Krauthgamer,Lee – FOCS ’03]

[Talwar – STOC ’04][Talwar – STOC ’04]

[Mendel,Har-Peled – SoCG ’05][Mendel,Har-Peled – SoCG ’05]

[Slivkins - PODC ’05][Slivkins - PODC ’05]

Page 29: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Distance labeling for planarDistance labeling for planar

►O(logO(log22n) bits for 3-approximationn) bits for 3-approximation

[Gupta,Kumar,Rastogi – [Gupta,Kumar,Rastogi – SICOMPSICOMP ’05] ’05]

►O(O(-1-1loglog22n) bits for (1+n) bits for (1+)-)-approximationapproximation

[Thorup – [Thorup – J.ACMJ.ACM ’04] ’04]

►(n(n1/31/3) ) ? ? Õ( Õ(n) for exact distancen) for exact distance

Page 30: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

Lower bounds for planarLower bounds for planar[Gavoille,Peleg,Pérennès,Raz – SODA [Gavoille,Peleg,Pérennès,Raz – SODA

’01]’01]#vertices ~ k3

#critical edges ~ k2

#labels =2k

|label|> k2/ 2k ~ n1/3

Page 31: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

► A graph G with a state SA graph G with a state Suu at each vertex u: (G,S) at each vertex u: (G,S)

► A global property A global property PP (MST, 3-coloring, …) (MST, 3-coloring, …)► A marker algorithm applied on (G,S) that returns A marker algorithm applied on (G,S) that returns

a label L(u) for ua label L(u) for u► A binary decoder (checker) for u applied on N(u):A binary decoder (checker) for u applied on N(u):

ffuu = f(S = f(Suu,L(u),L(v,L(u),L(v11)…L(v)…L(vkk)) )) ∈ ∈ {0,1}{0,1}

G has property G has property PP ffuu=1 =1 uu

G hasn't prop. G hasn't prop. PP w, fw, fww=0 whatever the labels =0 whatever the labels areare

Proof Labeling SystemsProof Labeling Systems[Korman,Kutten,Peleg – PODC ’05][Korman,Kutten,Peleg – PODC ’05]

uu

vv11

vv33

vv22

SS11

SS44

SS22SS33SS55

Page 32: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

What is the knowledge needed for local What is the knowledge needed for local verifications of global properties?verifications of global properties?

SS11

SS44

SS22SS33SS55

Page 33: Distributed Data Structures: A Survey Cyril Gavoille (LaBRI, University of Bordeaux)

ConclusionConclusion

►Labeling scheme for Labeling scheme for distributed distributed computingcomputing is a rich concept. is a rich concept.

►Many things remain to do, specially Many things remain to do, specially lower boundslower bounds