cs 347notes 021 cs 347: parallel and distributed data management notes02: distributed db design...

80
CS 347 Notes 02 1 CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design Hector Garcia-Molina

Upload: florence-daniel

Post on 19-Dec-2015

215 views

Category:

Documents


2 download

TRANSCRIPT

CS 347 Notes 02 1

CS 347: Parallel and Distributed

Data ManagementNotes02: Distributed DB

Design

Hector Garcia-Molina

CS 347 Notes 02 2

Distributed DB Design

Top-down approach: - have DB… - how to split and allocate the sites

Multi-DBs (or bottom-up): no design issues!

Chapter 5Ozsu & Valduriez

CS 347 Notes 02 3

Two issues in DDB design:

• Fragmentation• Allocation

Note: issues not independent, but will cover separately

CS 347 Notes 02 4

Example

Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select *

from E from Ewhere loc=Sa where

loc=Sband… and ...

CS 347 Notes 02 5

Example

Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select *

from E from Ewhere loc=Sa where

loc=Sband… and ...

Motivation: Two sites: Sa, Sb Qa QbSa Sb

CS 347 Notes 02 6

• It does not take a rocket scientist to figure out fragmentation...

CS 347 Notes 02 7

# NM Loc Sal

E 578

Sa 10Sally Sb 25Tom Sa 15

Joe

# NM Loc Sal # NM Loc Sal58

Sa 10Tom Sa 15Joe 7 Sb 25Sally

..

..

....

F

At SaAt Sb

CS 347 Notes 02 8

F = { F1, F2 }

F1 = loc=Sa E F2 = loc=Sb E

CS 347 Notes 02 9

F = { F1, F2 }

F1 = loc=Sa E F2 = loc=Sb E

called primary horizontal fragmentation

CS 347 Notes 02 10

Fragmentation

• Horizontal Primary depends on local attributes

R Derived depends on foreign

relation

• Vertical

R

CS 347 Notes 02 11

Fragmentation

• Horizontal Primary depends on local attributes

R Derived depends on foreign

relation

• Vertical

R Fragmentation also calledSharding

CS 347 Notes 02 12

Three common horizontal partitioning techniques

• Round robin• Hash partitioning• Range partitioning

CS 347 Notes 02 13

• Round robin

R D0 D1 D2

t1 t1t2 t2t3 t3t4 t4... t5

• Evenly distributes data• Good for scanning full relation• Not good for point or range queries

CS 347 Notes 02 14

• Hash partitioning

R D0 D1 D2

t1h(k1)=2 t1t2h(k2)=0 t2t3h(k3)=0 t3t4h(k4)=1 t4...

• Good for point queries on key; also for joins• Not good for range queries; point queries not on

key• If hash function good, even distribution

CS 347 Notes 02 15

• Range partitioning

R D0 D1 D2

t1: A=5 t1t2: A=8 t2t3: A=2 t3t4: A=3 t4...

• Good for some range queries on A• Need to select good vector: else unbalance

data skew execution skew

4 7

partitioningvector

V0 V1

CS 347 Notes 02 16

Which are good fragmentations?

Example:

F = { F1, F2 }

F1 = sal<10 E F2 = sal>20 E

CS 347 Notes 02 17

Which are good fragmentations?

Example:

F = { F1, F2 }

F1 = sal<10 E F2 = sal>20 E

Problem: Some tuples lost!

CS 347 Notes 02 18

Which are good fragmentations?

Second example:

F = { F3, F4 }

F3 = sal<10 E F4 = sal>5 E

CS 347 Notes 02 19

Which are good fragmentations?

Second example:

F = { F3, F4 }

F3 = sal<10 E F4 = sal>5 E

Tuples with 5 < sal < 10 are duplicated...

CS 347 Notes 02 20

Prefer to deal with replication explicitly

Example: F = { F5, F6, F7 }

F5 = sal 5 E F6 = 5< sal <10

E

F7 = sal 10 E

Then replicate F6 if convenient (part of allocation problem)

CS 347 Notes 02 21

Desired properties for horizontal fragmentation

R F ={ F1, F2, … }

(1) Completeness

t R, Fi F such that t Fi

CS 347 Notes 02 22

(2) Disjointness t Fi, Fj such that tFj, i j, Fi, Fj F

(3) Reconstruction - ignore

CS 347 Notes 02 23

How do we get completeness and disjointness?

(1) Check it “manually”!

e.g., F1 = sal<10 E ; F2 = sal10 E

CS 347 Notes 02 24

How do we get completeness and disjointness?

(2) “Automatically” generate fragments with these properties

Desired simple predicates Fragments

CS 347 Notes 02 25

Example of generation

• Say queries use predicates:A<10, A>5, Loc = SA, Loc = SB

• Next: - generate “minterm” predicates

- eliminate useless ones

CS 347 Notes 02 26

Minterm predicates (part I)(1) A<10 A>5 Loc=SA Loc=SB

(2) A<10 A>5 Loc=SA ¬(Loc=SB)(3) A<10 A>5 ¬(Loc=SA) Loc=SB

(4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB)(5) A<10 ¬(A>5) Loc=SA Loc=SB

(6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB)(7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB

(8) A<10 ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)

CS 347 Notes 02 27

Minterm predicates (part I)(1) A<10 A>5 Loc=SA Loc=SB

(2) A<10 A>5 Loc=SA ¬(Loc=SB)(3) A<10 A>5 ¬(Loc=SA) Loc=SB

(4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB)(5) A<10 ¬(A>5) Loc=SA Loc=SB

(6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB)(7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB

(8) A<10 ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)

CS 347 Notes 02 28

Minterm predicates (part I)(1) A<10 A>5 Loc=SA Loc=SB

(2) A<10 A>5 Loc=SA ¬(Loc=SB)(3) A<10 A>5 ¬(Loc=SA) Loc=SB

(4) A<10 A>5 ¬(Loc=SA) ¬(Loc=SB)(5) A<10 ¬(A>5) Loc=SA Loc=SB

(6) A<10 ¬(A>5) Loc=SA ¬(Loc=SB)(7) A<10 ¬(A>5) ¬(Loc=SA) Loc=SB

(8) A<10 ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)

A 5

5 < A < 10

CS 347 Notes 02 29

Minterm predicates (part II)

(9) ¬(A<10) A>5 Loc=SA Loc=SB

(10) ¬(A<10) A>5 Loc=SA ¬(Loc=SB)(11) ¬(A<10) A>5 ¬(Loc=SA) Loc=SB

(12) ¬(A<10) A>5 ¬(Loc=SA) ¬(Loc=SB)(13) ¬(A<10) ¬(A>5) Loc=SA Loc=SB

(14) ¬(A<10) ¬(A>5) Loc=SA ¬(Loc=SB)(15) ¬(A<10) ¬(A>5) ¬(Loc=SA) Loc=SB

(16) ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)

CS 347 Notes 02 30

Minterm predicates (part II)

(9) ¬(A<10) A>5 Loc=SA Loc=SB

(10) ¬(A<10) A>5 Loc=SA ¬(Loc=SB)(11) ¬(A<10) A>5 ¬(Loc=SA) Loc=SB

(12) ¬(A<10) A>5 ¬(Loc=SA) ¬(Loc=SB)(13) ¬(A<10) ¬(A>5) Loc=SA Loc=SB

(14) ¬(A<10) ¬(A>5) Loc=SA ¬(Loc=SB)(15) ¬(A<10) ¬(A>5) ¬(Loc=SA) Loc=SB

(16) ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB)

A 10

CS 347 Notes 02 31

Final fragments:

F2: 5 < A < 10 Loc=SA F3: 5 < A < 10 Loc=SB F6: A 5 Loc=SA F7: A 5 Loc=SB F10: A 10 Loc=SA F11: A 10 Loc=SB

CS 347 Notes 02 32

Note: elimination of useless fragments depends on application semantics:

e.g.: if LOC could be SA, SB, we need to add fragments

F4: 5 <A <10 Loc SA Loc SB

F8: A 5 Loc SA Loc SB

F12: A 10 Loc SA Loc SB

CS 347 Notes 02 33

Why does this work?

Predicates: p1 p2 p3 p4 p1 p2 p3 ¬ p4

¬ p1 ¬ p2 ¬ p3 ¬ p4

...

CS 347 Notes 02 34

(1) Completeness: Take t Rpi(t) must be T or F!

Say p1(t) =T p2(t) = T p3(t) =F p4(t) =F

Then t is in fragment with predicate p1 p2 ¬ p3 ¬ p4

CS 347 Notes 02 35

(2) DisjointnessSay t Fragment p1 p2 ¬ p3 ¬ p4

Then: p1(t) = T, p2(t) = T, p3(t) = F, p4(t)= F

t cannot be in any other fragment!

CS 347 Notes 02 36

Summary• Given simple predicates Pr= { p1, p2,..

pm }minterm predicates are

M={m | m = pk*, 1 k m }

where pk* is pk or is ¬ pk

pkPr

• Fragments m R for all m M are

complete and disjoint

Another Desired Fragmentation Property:

Match Access Patterns

CS 347 Notes 02 37

data A

data B

data C

frequentlyaccessedtogether

try toplace insame

fragment

CS 347 Notes 02 38

Return to example:

E(#, NM, LOC, SAL,…)Common queries:

Qa: select * Qb: select * from E from E where LOC=Sa where

LOC=Sb and … and ...

CS 347 Notes 02 39

Three choices:

(1) Pr = { } F1 ={ E }

(2) Pr = {LOC=Sa, LOC=Sb}

F2={ loc=Sa E, loc=Sb E }

(3) Pr = {LOC=Sa, LOC=Sb, Sal<10}

F3={ loc=Sa sal<10 E, loc=Sa sal10 E, loc=Sb sal<10E, loc=Sb sal10 E }

CS 347 Notes 02 40

In other words:

Loc=Sa sal < 10

Loc=Sa sal 10

Loc=Sb sal < 10

Loc=Sb sal 10

F1 F3F2

Qa: Select … loc = Sa ...

Qb: Select … loc = Sb ...

CS 347 Notes 02 41

In other words:

Loc=Sa sal < 10

Loc=Sa sal 10

Loc=Sb sal < 10

Loc=Sb sal 10

F1 F3F2

Qa: Select … loc = Sa ...

Qb: Select … loc = Sb ...

F2 is good…

(not F1 , F3 )

CS 347 Notes 02 42

Derived horizontal fragmentation

Example: E(#, NM, SAL, LOC) F={ E1, E2} by LOC

J(#, DES,…)Common query for project: [Given employee name,

list projects (s)he works in]

CS 347 Notes 02 43

E1

(at Sa) (at Sb)

E2# NM Loc Sal5 Joe Sa 108 Tom Sa 15…

# NM Loc Sal7 Sally Sb 2512 Fred Sb 15…

# Description5 work on 347 hw7 go to moon5 build table12 rest…

J

CS 347 Notes 02 44

E1

(at Sa) (at Sb)

E2# NM Loc Sal5 Joe Sa 108 Tom Sa 15…

# NM Loc Sal7 Sally Sb 2512 Fred Sb 15…

J1 J2

J1 = J E1 J2 = J E2

# Des5 work on 347 hw5 build table…

# Des7 go to moon12 rest…

CS 347 Notes 02 45

Derived horizontal fragmentation

R, F = { F1, F2, ... Fn}

S, D = {D1, D2, …Dn} where Di =S Fi

Convention: R is owner S is member

F could beprimary or derived

CS 347 Notes 02 46

• Checking completeness and disjointness of derived fragmentation

But no #= 33 in E1 nor in E2!

# Des…33 build chair…

Example: Say J is:

This J tuple will not be in J1 nor J2Fragmentation not complete

CS 347 Notes 02 47

Need to enforcereferential integrity constraint:

join attr(#) of member relation

joint attr(#) of owner

relation

To get completeness

CS 347 Notes 02 48

# NM Loc Sal5 Joe Sa 10…

# NM Loc Sal5 Fred Sb 20…

Example:

E1 E2

# Description5 day off…

# Description5 day off…

# Description5 day off…

J1

J

J2

Fragmentationis not

disjoint!

CS 347 Notes 02 49

Join attribute(#) should bekey of owner relation

To get disjointness

CS 347 Notes 02 50

Summary: horizontal fragmentation

• Type: primary, derived• Properties: completeness,

disjointness

CS 347 Notes 02 51

Vertical fragmentation

E1

# NM Loc Sal5 Joe Sa 107 Sally Sb 258 Fred Sa 15…

# NM Loc5 Joe Sa7 Sally Sb8 Fred Sa…

# Sal5 107 258 15…

E

E2

Example:

CS 347 Notes 02 52

R[T] R1[T1] Ti T

Rn[Tn]

Just like normalization of relations

...

CS 347 Notes 02 53

Properties: R[T] Ri[Ti]

(1) Completeness

U Ti = Tall i

CS 347 Notes 02 54

(2) DisjointnessTi Tj = for all i,j ij

E(#,LOC,SAL)E1(#,LOC)

E2(SAL)

CS 347 Notes 02 55

(2) DisjointnessTi Tj = for all i,j ij

E(#,LOC,SAL)E1(#,LOC)

E2(SAL)

Not a desirable property!!(could not reconstruct R!)

CS 347 Notes 02 56

(3) Lossless join

Ri = Rall i

One way to achieve lossless join: Repeat key in all fragments, i.e.,

Key Ti for all i

CS 347 Notes 02 57

How do we decide what attributes are grouped with which?

E1(#,NM,LOC)E2(#,SAL)

Example:E(#,NM,LOC,SAL) E1(#,NM)

E2(#,LOC)E3(#,SAL)

?

CS 347 Notes 02 58

A1 A2 A3 A4 A5

A1 - - - - -

A2 50 - - - -

A3 45 48 - - -

A4 1 2 0 - -

A5 0 0 4 75 -

Attribute affinity matrix

CS 347 Notes 02 59

A1 A2 A3 A4 A5

A1 - - - - -

A2 50 - - - -

A3 45 48 - - -

A4 1 2 0 - -

A5 0 0 4 75 -

Attribute affinity matrix

R1[K,A1,A2,A3] R2[K,A4,A5]

CS 347 Notes 02 60

• Textbook (Ozsu & Valduriez) discusses– How to build affinity matrix– How to identify attribute clusters– How to partition relation

• You are not responsible for– Clustering and partitioning algorithms (i.e., Skip pages 135-145)

CS 347 Notes 02 61

Allocation

Example: E(#,NM,LOC,SAL) F1 = loc=Sa E ; F2 = loc=Sb E

Qa: select … where loc=Sa...Qb: select … where loc=Sb…

Site a Site b

Where do F1,F2 go?

?

CS 347 Notes 02 62

Issues

• Where do queries originate• What is communication cost?

and size of answers, relations,…

• What is storage capacity, cost at sites? and size of fragments?

• What is processing power at sites?

CS 347 Notes 02 63

• What is query processing strategy?

– How are joins done?– Where are answers collected?

More Issues

CS 347 Notes 02 64

• Cost of updating copies?• Writes and concurrency control?• ...

Do we replicate fragments?

CS 347 Notes 02 65

Optimization problem:• What is best placement of

fragments and/or best number of copies to:– minimize query response time– maximize throughput– minimize “some cost”– ...

• Subject to constraints?– Available storage– Available bandwidth, power,…– Keep 90% of response time below X– ...

CS 347 Notes 02 66

Optimization problem:• What is best placement of

fragments and/or best number of copies to:– minimize query response time– maximize throughput– minimize “some cost”– ...

• Subject to constraints?– Available storage– Available bandwidth, power,…– Keep 90% of response time below X– ...

This is an incredibly hard problem

CS 347 Notes 02 67

Example: Single fragment F

Read cost: [ti MIN Cij]

i: Originating site of requestti: Read traffic at Si

Cij: Retrieval costAccessing fragment F at Sj from Si

ji=1

m

CS 347 Notes 02 68

Scenario - Read cost

1 2

.

. .

.

.

.

3i

C=inf

ci,3

ci,1 ci,2

Stream of readrequests for F ti REQ/SEC

C=inf

C=infF

F F

CS 347 Notes 02 69

Write cost

Xj ui C’ij

i: Originating site of requestj: Site being updatedXj: 0 if F not stored at Sj

1 if F stored at Sj

ui: Write traffic at Si C’ij: Write cost

Updating F at Sj from Si

i=1 j=1

m m

CS 347 Notes 02 70

Scenario - write cost

Updates ui

updates/sec

. .

. .

. .i

F

F F

CS 347 Notes 02 71

Storage cost:

Xi di

Xi: 0 if F not stored at Si

1 if F stored at Si

di: storage cost at Si

i=1

m

CS 347 Notes 02 72

Target function:

min [ti MIN Cij + Xj ui

C’ij ]

+ Xi di

ji=1 j=1

i=1

m m

m

CS 347 Notes 02 73

Can add more complications:

Examples: - Multiple fragments- Fragment sizes- Concurrency control cost

Case Study: PNUTS• Where in the World is My Data?

Sudarshan Kadambi, Jianjun Chen, Brian F. Cooper, David Lomax, Raghu Ramakrishnan, Adam Silberstein, Erwin Tam, Hector Garcia-Molina; VLDB 2011

• Distributed object/tuple store for Yahoo!

CS 347 Notes 02 74

Case Study: PNUTS• Issue: Where to locate data• Issue: What and where to replicate

CS 347 Notes 02 75

PNUTS Discussion

• Dynamic vs Static fragment placement

• Caching vs Replication

CS 347 Notes 02 76

Policy Constraints• MIN_COPIES: The minimum number of

full replicas of the record that must exist.

• INCL_LIST: An inclusion list -- the locations where a full replica of the record must exist.

• EXCL_LIST: An exclusion list -- the locations where a full replica of the record cannot exist.

CS 347 Notes 02 77

Example Rule

• Rule 1: • IF TABLE_NAME = "Users“

THENSET 'MIN_COPIES' = 2

CONSTRAINT_PRI = 0

CS 347 Notes 02 78

Another Example Rule

• Rule 2:• IF TABLE_NAME = "Users" AND

FIELD STR('home location') = 'France‘THEN

SET 'MIN_COPIES' = 3 ANDSET 'EXCL LIST' = 'USWest,

USEast‘CONSTRAINT PRI = 1

CS 347 Notes 02 79

CS 347 Notes 02 80

Summary

• Description of fragmentation• Good fragmentations• Design of fragmentation• Allocation