data placement problems in database applications
DESCRIPTION
Data Placement Problems in Database Applications. An Zhu Stanford University. Data Placement. Data objects Multiple disks Assignment of objects to disks Optimize performance Optimize I/O Handle dynamic situations. Outline. Multimedia Systems [GKKTZ 00] - PowerPoint PPT PresentationTRANSCRIPT
Data Placement Problems in Database Applications
An ZhuAn Zhu
Stanford University
04/19/23 AZ 2
Data Placement
Data objects Multiple disks Assignment of objects to disks
Optimize performance Optimize I/O Handle dynamic situations
04/19/23 AZ 3
Outline
Multimedia Systems [GKKTZ 00] Maximize the total clients served
Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed
moves
04/19/23 AZ 4
Outline
Multimedia Systems [GKKTZ 00] Maximize the total clients served
Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed
moves
04/19/23 AZ 5
Multimedia Storage Systems
Movie objects Clients/subscribers Parallel disks
Limited storage: # of movies—Nj
Limited bandwidth: # of clients—Cj
Homogeneous system: Nj=k, Cj=L, j Uniform ratio: Cj/Nj=r, j
04/19/23 AZ 6
An Example
000/600
000/600
000/600
100 100100100100100
100 400400100100100
Total Storage: 12, Total Capacity: 1800
04/19/23 AZ 7
An Example
000/600
000/600
400/600
100100
100 400400100100100
Total Storage: 12, Total Capacity: 1800
04/19/23 AZ 8
An Example
400/600
000/600
400/600
400400100100
Total Storage: 12, Total Capacity: 1800
04/19/23 AZ 9
Not All Clients Can be Satisfied
400/600
600/600
400/600
400
Total Satisfied Clients: 1400/1800=7/9
04/19/23 AZ 10
Sliding Window Algorithm
Consider one disk at a time Maintain an ordered list of movies The first consecutive k movies (or less)
with at least L combined clients Assign the first L clients to the disk and
reconsider leftover clients
04/19/23 AZ 11
An Example
000/600
000/600
000/600
100 100100100100100
100 400400100100100
Max window size k=4
100
04/19/23 AZ 12
An Example
000/600
000/600
000/600
100 100100100100100
100 400400100100100
Max window size k=4
200
04/19/23 AZ 13
An Example
000/600
000/600
000/600
100 100100100100100
100 400400100100100
Max window size k=4
400
04/19/23 AZ 14
An Example
000/600
000/600
000/600
100 100100100100100
100 400400100100100
Max window size k=4
400
04/19/23 AZ 15
An Example
000/600
000/600
000/600
100 100100100100100
100 400400100100100
Max window size k=4
700
04/19/23 AZ 16
An Example
000/600
000/600
600/600
100 100100100100100
100 400100 0 0 0
Max window size k=4
04/19/23 AZ 17
An Example
000/600
000/600
600/600
100 100100100100100
100 400100
Max window size k=4
04/19/23 AZ 18
An Example
600/600
000/600
600/600
100 100100100100100
Max window size k=4
400
04/19/23 AZ 19
An Example
600/600
400/600
600/600
100 100
Total Satisfied Clients: 1600/1800=8/9
04/19/23 AZ 20
Theoretical Bounds
Satisfies at least fraction of total clients
In the worst case, no algorithm can satisfy more clients
Translates to an -approximation
PTAS: (1+)-approximation, >0
21
11
k
21
11
k
04/19/23 AZ 21
Theoretical Bounds
Satisfies at least fraction of total clients
In the worst case, no algorithm can satisfy more clients
Translates to an -approximation
PTAS: (1+)-approximation, >0
21
11
k
21
11
k
04/19/23 AZ 22
Proof Sketch
Load vs. storage saturated: ML, MS
Least loaded disk: cL ML+MS=M, 0<c<1 All remaining movies each have no
more than cL/k clients Initial instance is feasible (w.l.o.g.)
04/19/23 AZ 23
An Example
600/600
400/600
600/600
100 100
Total Satisfied Clients: 1600/1800=8/9
ML=2, MS=1,
c=400/600
cL/k=100
04/19/23 AZ 24
Proof Outline
If there is a load saturated disk with less than k movies All clients are satisfied
Otherwise At most ML movies are left Satisfy at least fraction of
the clients 21
11
k
04/19/23 AZ 25
Lemma
If any of the load saturated disk has less than k objects
Any k-1 remaining movies in the list has L clients or more
04/19/23 AZ 26
Lemma
The remaining disks are all load saturated
So, all clients are satisfied
At least LAt least L
04/19/23 AZ 27
Otherwise…
Each disk has exactly k movies Total assigned movies: M·k
Initial movies: N M·k “New” movies generated: ML
# of movies left: ≤ ML
# of clients/remaining movie: ≤ cL/k Total # of remaining clients: cLML/k
04/19/23 AZ 28
Otherwise…
Total clients: ≤ M·L Assigned clients: ML·L + Ms·cL Total # of remaining clients : ≤ Ms·(1-c)L Final bound:
cLMLM
LcMkMcL
S
U
SL
SL
)1(,/min
21
11
k
04/19/23 AZ 29
Simulation Results
M=5L=100N=M·k
Zipf with=0.0( i-1 )
04/19/23 AZ 30
Recap
The problem is NP-complete PTAS: best possible approximation
bound : best possible absolute bound
Sliding window algorithm: practical with O((M+N)log(M+N)) running time
21
11
k
04/19/23 AZ 31
Outline
Multimedia Systems [GKKTZ 00] Maximize the total clients served
Relational Database Layout [AFMPZ 03] Minimize the total I/O access time
Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed
moves
04/19/23 AZ 32
Relational Databases
Objects: indexes, tables, views Multiple disks Minimize the total I/O access time
04/19/23 AZ 33
Past Work
Full striping Split uniformly across all available disks Utilize I/O parallelism
: transfer rate
200MB=0.05s/MB,Tt=10s200MB
04/19/23 AZ 34
=0.05s/MB,Tt=2.5s
Past Work
Full striping Split uniformly across all available disks Utilize I/O parallelism
: transfer rate
200MB =0.05s/MB,Tt=10s50MB50MB50MB
50MB50MB 50MB 50MB
04/19/23 AZ 35
Past Work
Co-accessed objects with Random I/O Seek time/per block size: 0.01s/0.1MB Seek rate: =0.1s/MB Smaller object dominates
50MB 50MB 50MB 50MB100MB 100MB 100MB 100MB
AB
Ts=50·2=10s
04/19/23 AZ 36
Past Work
Combined access time Transfer time: Tt=(50+100)·=7.5s Seek time: Ts=min(50,100)·=10s Combined time: Tt+Ts=17.5s
50MB 50MB 50MB 50MB100MB 100MB 100MB 100MB
AB
04/19/23 AZ 37
Past Work
Fully striping is no longer optimal [Agrawal Chaudhuri Das Narasayya 03’]
Combined time: 200·=10s
100MB 100MB200MB 200MB
04/19/23 AZ 38
Data Layout Problem
Work Load (SQL DML) A set of queries and/or updates A set of co-accessed objects (pairwise) Access stats (pairwise) Minimize the estimated I/O access time
04/19/23 AZ 39
Theoretical Questions
Approximation and its hardness Transfer time: P Seek time: Very Hard Combined time
Hard Minimizing transfer time alone is a “good”
approximation
04/19/23 AZ 40
Transfer Time
Heterogeneous disks Different rate: j
Storage constraint: cj
Objects Different size: si
Access frequency: i,i’
Solvable using Linear Programming (LP)
04/19/23 AZ 41
LP
',',',
,',',
,',,',
,
,
,
min
',,
,',,)(
,
,
,,0
iiiiii
jiiii
jjijijii
ji
ji
jiji
ji
T
iitT
jiixxt
jcx
isx
jix
Amount of object i assigned to disk j
Each object must be completely assigned
Each disk’s storage limit is kept
Transfer time for (i,i’) on disk j
Overall transfer time for (i,i’)
Minimize the total transfer time
04/19/23 AZ 42
Seek Time
Hard even on disks with no storage constraint
Integral assignment Each object is assigned to one machine
only Conversion from a fraction assignment
with no loss
04/19/23 AZ 43
Conversion
f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 Want: each file is spread uniformly
across a subset of disks
100MB 100MB200MB 200MB
150MB100MB
A B ABC C
04/19/23 AZ 44
Conversion
f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 New cost: 1002+1252
100MB 100MB200MB 200MB
125MB125MB
A B ABC C
04/19/23 AZ 45
Conversion
f( , )=1, f( , )=1, f( , )=0 Total seek cost: 1002+1002 New cost: 1002
100MB 100MB200MB 200MB
125MB125MB250MB
A B ABC C
04/19/23 AZ 46
Conversion
f( , )=1, f( , )=1, f( , )=0 Total seek cost: 0 Each file resides on only one disk
200MB400MB250MB
100MB 100MB200MB200MB250MB
A B ABC C
04/19/23 AZ 47
Implications
A polynomial time algorithm Equivalent to Minimum Edge Deletion
k-Partition NP-Hard to approximate: O(n2) Forces combined time be hard to
approximate
04/19/23 AZ 48
Combined Time
Let
Hard to approximate: ·, 1>>0 Optimize transfer time alone gives 1+
j
jj
max
)(),min(2)(
))(1(),min(2)(
212121
212121
xxxxxx
xxxxxx
04/19/23 AZ 49
Outline
Multimedia Systems [GKKTZ 00] Maximize the total clients served
Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed
moves
04/19/23 AZ 50
Load Rebalancing
Access pattern changes Initial layout no longer balanced
2
1
54
3 6
7
8
9
1011
MAX LOAD
04/19/23 AZ 51
Load Rebalancing
Relocate objects Minimize the max load with k moves
2
1
543 6
78
9
1011
MAX LOAD
04/19/23 AZ 52
Simple Algorithm (O(nlogn))
Step 1: Repeat k times Remove the largest object from the most
loaded disk The resulting max load: L(1)
Step2: Relocate the removed k objects Assign each object to the least loaded
disk The resulting max load: L(2)
04/19/23 AZ 53
Example (k=3)
Step1: L(1) OPT
2
1
543 6
78
9
1011
MAX LOAD
9
MAX LOAD
1
6
L(1)
04/19/23 AZ 54
Example (k=3)
Step2: L(2) OPT + S 2OPT Overall: max(L(1),L(2)) 2OPT
2 543
78
1011
91
6
MIN LOAD6 MIN LOAD
1 9
L(2)
04/19/23 AZ 55
Can We Do Better?
Blindly remove the large object is not wise
2
1
543 6
78
9
1011
MAX LOAD
04/19/23 AZ 56
How can we do better
Take care of large objects Large objects: size >1/2OPT Small objects: size 1/2OPT
2
1
543 6
78
9
1011
OPT
04/19/23 AZ 57
Revising The Plan
Step 1: Repeat k times Remove the largest object from the most
loaded disk The resulting max load: L(1) OPT
Step2: Relocate the removed k objects Assign each object to the least loaded
disk The resulting max load: L(2) OPT +S
2OPT
04/19/23 AZ 58
Revised Plan
Step 1: with no more than k moves Shuffle large objects and remove small
objects The resulting max load: L(1) 3/2 OPT
Step2: Relocate the removed objects Assign each object to the least loaded
disk (they are all small) The resulting max load: L(2) OPT +S
3/2 OPT just to fill in the space
04/19/23 AZ 59
Example
Step 1
2
1
543 6
78
9
1011
MAX LOAD
2
1
10 11
3/2 OPT
04/19/23 AZ 60
2
Example
Step 2
543 6
78
9 MIN LOAD
2
1
10 11
11 MIN LOAD10 MIN LOAD
OPT+S
04/19/23 AZ 61
Recap
Fast 1.5-approximation (O(nlogn)) NP-complete PTAS: generalized cost
04/19/23 AZ 62
Summary
Multimedia Systems [GKKTZ 00] Maximize the total clients served
Relational Database Layout [AFMPZ 03] Minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03] Minimize the makespan within allowed
moves
04/19/23 AZ 63
Other Research Interests
Algorithms for mobile, sensor networks and privacy preserving databases
Online Algorithms: queue management, packet switching, web caching, scheduling
Approximation Algorithms: network design, multi-product pricing
Streaming Algorithms