COMP5331
Data Warehouse
Prepared by Raymond Wong
Presented by Raymond Wong
raywong@cse
Data Warehouse
Also called Online Analytical Processing (OLAP)
Many corporations use data warehouses for their analysis
Data Warehouse
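Without a data warehouse: users send a query to the databases directly and need to wait for a long time (e.g., 1 day to 1 week).
With a data warehouse: users send a query to the data warehouse, which answers it from pre-computed results.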
Advantages
Fast Query Response
Data Warehouse
Problem
NP-hardness
Algorithm
Performance Study
Data Warehouse
Parts are bought from suppliers and then sold to customers at a sale price SP.

| part | supplier | customer | SP |
|---|---|---|---|
| p1 | s1 | c1 | 4 |
| p3 | s1 | c2 | 3 |
| p2 | s3 | c1 | 7 |
| … | … | … | … |

Table T
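Data cube (figure): a three-dimensional cube with axes part (p1 to p5), supplier (s1 to s4) and customer (c1 to c4). Each cell holds the sale price SP of the matching row of Table T (the figure shows the cells holding 4 and 3).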
e.g., select part, customer, SUM(SP) from T group by part, customer

View pc (3 rows):

| part | customer | SUM(SP) |
|---|---|---|
| p1 | c1 | 4 |
| p3 | c2 | 3 |
| p2 | c1 | 7 |

e.g., select customer, SUM(SP) from T group by customer

View c (2 rows):

| customer | SUM(SP) |
|---|---|
| c1 | 11 |
| c2 | 3 |

Other aggregates can be used in the same way: AVG(SP), MAX(SP), MIN(SP), …
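The two group-by queries can be tried out with a minimal sketch like the one below. It uses Python's built-in `sqlite3` module and loads only the three rows of Table T that the slide shows; the elided rows are left out, so the sums hold for this three-row table only. The function name `run_views` and the `ORDER BY` clauses are additions for illustration and are not part of the slides.

```python
import sqlite3

# Only the three rows shown on the slide; the elided rows ("…") are left out.
ROWS = [("p1", "s1", "c1", 4), ("p3", "s1", "c2", 3), ("p2", "s3", "c1", 7)]

def run_views():
    """Build Table T in memory and evaluate the two group-by views."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE T (part TEXT, supplier TEXT, customer TEXT, SP INTEGER)"
    )
    con.executemany("INSERT INTO T VALUES (?, ?, ?, ?)", ROWS)
    # View "pc": one row per (part, customer) pair.
    pc = con.execute(
        "SELECT part, customer, SUM(SP) FROM T "
        "GROUP BY part, customer ORDER BY part, customer"
    ).fetchall()
    # View "c": one row per customer.
    c = con.execute(
        "SELECT customer, SUM(SP) FROM T GROUP BY customer ORDER BY customer"
    ).fetchall()
    con.close()
    return pc, c

pc_view, c_view = run_views()
print(pc_view)  # [('p1', 'c1', 4), ('p2', 'c1', 7), ('p3', 'c2', 3)]
print(c_view)   # [('c1', 11), ('c2', 3)]
```

Replacing `SUM` with `AVG`, `MAX` or `MIN` in the query strings gives the other aggregates.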
Data Warehouse
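Each group-by over part (p), supplier (s) and customer (c) is a view. The views and their sizes (M = million) are:
psc 6M
pc 4M ps 0.8M sc 2M
p 0.2M s 0.01M c 0.1M
none 1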
Suppose we materialize all views. This wastes a lot of space.
Cost for accessing pc = 4M
Cost for accessing ps = 0.8M
Cost for accessing sc = 2M
Cost for accessing p = 0.2M
Cost for accessing s = 0.01M
Cost for accessing c = 0.1M
Suppose we materialize the top view only.
Cost for accessing pc = 6M (not 4M)
Cost for accessing ps = 6M (not 0.8M)
Cost for accessing sc = 6M (not 2M)
Cost for accessing p = 6M (not 0.2M)
Cost for accessing s = 6M (not 0.01M)
Cost for accessing c = 6M (not 0.1M)
Suppose we materialize the top view and the view for “ps” only.
Cost for accessing pc = 6M (still 6M)
Cost for accessing ps = 0.8M (not 6M previously)
Cost for accessing sc = 6M (still 6M)
Cost for accessing p = 0.8M (not 6M previously)
Cost for accessing s = 0.8M (not 6M previously)
Cost for accessing c = 6M (still 6M)
Gain for each view, compared with materializing the top view only:
Gain for pc = 0
Gain for ps = 5.2M
Gain for sc = 0
Gain for p = 5.2M
Gain for s = 5.2M
Gain for c = 0
Gain({view for “ps”, top view}, {top view}) = 5.2M x 3 = 15.6M
Selective Materialization Problem: Select a set V of k views such that Gain(V ∪ {top view}, {top view}) is maximized.
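The cost and gain figures above can be reproduced with a short sketch. It assumes that a view is answered from the smallest materialized view whose attributes cover it, and it leaves the "none" view out of the sum because the slide's "x 3" count does not include it. The names `SIZE`, `cost` and `gain` are illustrative, not taken from the slides.

```python
# Sizes from the lattice (M = million). "none" is left out because the
# slide's "x 3" count does not include it (an assumption of this sketch).
SIZE = {"psc": 6_000_000, "pc": 4_000_000, "ps": 800_000, "sc": 2_000_000,
        "p": 200_000, "s": 10_000, "c": 100_000}
TOP = "psc"

def cost(view, materialized):
    """Size of the smallest materialized view whose attributes cover `view`."""
    return min(SIZE[w] for w in materialized if set(view) <= set(w))

def gain(new_set, old_set):
    """Total cost saved over all views when moving from old_set to new_set."""
    return sum(cost(v, old_set) - cost(v, new_set) for v in SIZE)

top_only = {TOP}
with_ps = {TOP, "ps"}
costs_top_only = {v: cost(v, top_only) for v in SIZE}
costs_with_ps = {v: cost(v, with_ps) for v in SIZE}
total_gain = gain(with_ps, top_only)

print(costs_with_ps)
print(total_gain)  # 15600000, i.e. 15.6M
```

Integer sizes are used instead of 0.8, 0.01 and so on to avoid floating-point rounding in the sums.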
NP-hardness
Selective Materialization Problem is NP-hard.
Selective Materialization Decision Problem (SMD)
Given an integer k and a real number J, is there a set V of k views such that Gain(V ∪ {top view}, {top view}) is at least J?
Selective Materialization Decision Problem is NP-hard.
Exact Cover by 3-Sets (XC)
Instance: Set X with 3q elements, and a collection C of 3-element subsets of X
Question: Does C contain an exact cover for X, i.e., a subcollection C’ ⊆ C such that every element of X occurs in exactly one set of C’?
It is well-known that this problem is NP-complete.
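To make the XC question concrete, here is a brute-force checker. It is only a sketch for tiny instances: it tries every choice of q triples, which takes exponential time in general, as expected for an NP-complete problem. The function name `exact_cover` and the small instance are illustrative.

```python
from itertools import combinations

def exact_cover(X, C):
    """Return a subcollection of C covering each element of X exactly once, or None."""
    q, remainder = divmod(len(X), 3)
    if remainder != 0:
        return None
    for subset in combinations(C, q):
        covered = [x for triple in subset for x in triple]
        # q triples hold 3q elements, so "no repeats and covers X" means exact cover.
        if len(set(covered)) == len(covered) and set(covered) == set(X):
            return list(subset)
    return None

X = list("ABCDEF")
C = [("A", "B", "C"), ("B", "C", "D"), ("D", "E", "F")]
print(exact_cover(X, C))      # [('A', 'B', 'C'), ('D', 'E', 'F')]
print(exact_cover(X, C[:2]))  # None
```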
Problem XC can be transformed to Problem SMD
Create a root node with size = 200 (at level 1)
Create a bottom node with size = 1 (at level 4)
For each element x in X,
  Create a node Nx with size = 50 at level 3
  Create an edge between Nx and the bottom node
For each element a ∈ C (where a = (x, y, z)),
  Create a node Na with size = 100 at level 2
  Create an edge between Na and the root node
  Create an edge between Na and Nx
  Create an edge between Na and Ny
  Create an edge between Na and Nz
Set k = q
Set J = 400q
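E.g., X = {A, B, C, D, E, F}, C = {(A, B, C), (B, C, D), (D, E, F)}, q = 2
Level 1: root node (size 200)
Level 2: one node for each of (A, B, C), (B, C, D), (D, E, F) (size 100 each)
Level 3: one node for each of A, B, C, D, E, F (size 50 each)
Level 4: bottom node (size 1)
k = q = 2
J = 400 x 2 = 800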
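The construction can be sketched in code as follows. The node names (`x:A`, `a:ABC`), the edge representation, and the helper `gain_of` are illustrative choices, not part of the slides. `gain_of` also assumes that the bottom node is not counted in the gain, in line with the earlier example where "none" is left out.

```python
def xc_to_smd(X, C):
    """Build the SMD instance (sizes, edges, k, J) for an XC instance (X, C)."""
    q = len(X) // 3
    size = {"root": 200, "bottom": 1}
    edges = []  # (upper node, lower node)
    for x in X:
        size["x:" + x] = 50                      # level 3
        edges.append(("x:" + x, "bottom"))
    for a in C:
        name = "a:" + "".join(a)
        size[name] = 100                         # level 2
        edges.append(("root", name))
        edges.extend((name, "x:" + x) for x in a)
    return size, edges, q, 400 * q

def gain_of(selected, size, edges):
    """Gain over materializing the root only; the bottom node is not counted."""
    children = {}
    for upper, lower in edges:
        children.setdefault(upper, set()).add(lower)
    total = 0
    for v in size:
        if v in ("root", "bottom"):
            continue
        # v can be answered from itself or from a selected parent, else from the root.
        sources = [s for s in selected if s == v or v in children.get(s, ())]
        best = min([size["root"]] + [size[s] for s in sources])
        total += size["root"] - best
    return total

X = list("ABCDEF")
C = [("A", "B", "C"), ("B", "C", "D"), ("D", "E", "F")]
size, edges, k, J = xc_to_smd(X, C)
print(k, J)                                      # 2 800
print(gain_of(["a:ABC", "a:DEF"], size, edges))  # 800
print(gain_of(["a:ABC", "a:BCD"], size, edges))  # 600
```

On this instance the two disjoint triples reach J = 800, while the overlapping pair stops at 600.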
It is easy to verify that solving problem SMD on this instance is equivalent to solving problem XC.
Therefore, Problem SMD is NP-hard.
Data Warehouse
Modified lattice (the size of pc is changed from 4M to 6M):
psc 6M
pc 6M ps 0.8M sc 2M
p 0.2M s 0.01M c 0.1M
none 1
Greedy Algorithm
k = number of views to be materialized
Given
  v is a view
  S is a set of views which are selected to be materialized
Define the benefit of selecting v for materialization as
  B(v, S) = Gain(S ∪ {v}, S)
S ← {top view};
For i = 1 to k do
  Select that view v not in S such that B(v, S) is maximized;
  S ← S ∪ {v};
Resulting S is the greedy selection
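A minimal Python sketch of this greedy loop is shown below. The sizes are those of the lattice used in the worked example (pc and sc set to 6M), the "none" view is left out of the gain (an assumption that matches the slide's counts), and all function names are illustrative.

```python
# Lattice sizes from the worked example (M = million); "none" is left out.
SIZE = {"psc": 6_000_000, "pc": 6_000_000, "ps": 800_000, "sc": 6_000_000,
        "p": 200_000, "s": 10_000, "c": 100_000}
TOP = "psc"

def cost(view, materialized):
    """Size of the smallest materialized view whose attributes cover `view`."""
    return min(SIZE[w] for w in materialized if set(view) <= set(w))

def gain(new_set, old_set):
    """Total cost saved over all views when moving from old_set to new_set."""
    return sum(cost(v, old_set) - cost(v, new_set) for v in SIZE)

def benefit(v, S):
    """B(v, S) = Gain(S U {v}, S)."""
    return gain(S | {v}, S)

def greedy(k):
    """Pick k views one at a time, each time taking the largest benefit."""
    S = {TOP}
    picked = []
    for _ in range(k):
        candidates = [v for v in SIZE if v not in S]
        best = max(candidates, key=lambda v: benefit(v, S))
        S = S | {best}
        picked.append(best)
    return picked, gain(S, {TOP})

picked, total = greedy(2)
print(picked, total)  # ['ps', 'c'] 21500000
```

Ties are broken by dictionary order here; the slides do not say how ties should be handled.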
1.1 Data Cube
psc 6M
pc 6M ps 0.8M sc 6M
p 0.2M s 0.01M c 0.1M
none 1
k = 2

Benefit of each candidate view, written as (saving per view) x (number of views affected):

| View | 1st Choice (M) | 2nd Choice (M) |
|---|---|---|
| pc | 0 x 3 = 0 | 0 x 2 = 0 |
| ps | 5.2 x 3 = 15.6 | |
| sc | 0 x 3 = 0 | 0 x 2 = 0 |
| p | 5.8 x 1 = 5.8 | 0.6 x 1 = 0.6 |
| s | 5.99 x 1 = 5.99 | 0.79 x 1 = 0.79 |
| c | 5.9 x 1 = 5.9 | 5.9 x 1 = 5.9 |

1st Choice: Benefit from pc = 6M-6M = 0, from ps = 6M-0.8M = 5.2M, from sc = 6M-6M = 0, from p = 6M-0.2M = 5.8M, from s = 6M-0.01M = 5.99M, from c = 6M-0.1M = 5.9M. ps is selected.
2nd Choice: Benefit from pc = 6M-6M = 0, from sc = 6M-6M = 0, from p = 0.8M-0.2M = 0.6M, from s = 0.8M-0.01M = 0.79M, from c = 6M-0.1M = 5.9M. c is selected.

Two views to be materialized are
1. ps
2. c
V = {ps, c}
Gain(V ∪ {top view}, {top view}) = 15.6 + 5.9 = 21.5
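The two columns of benefits can be recomputed with the sketch below. It reports only the total benefit per candidate in each round, not the "saving x count" breakdown. It assumes the "none" view is excluded and that a view is answered from its smallest materialized ancestor. The name `benefit_rounds` is illustrative.

```python
# Lattice sizes from the slide (M = million); "none" is left out of the gain.
SIZE = {"psc": 6_000_000, "pc": 6_000_000, "ps": 800_000, "sc": 6_000_000,
        "p": 200_000, "s": 10_000, "c": 100_000}
TOP = "psc"

def cost(view, materialized):
    """Size of the smallest materialized view whose attributes cover `view`."""
    return min(SIZE[w] for w in materialized if set(view) <= set(w))

def gain(new_set, old_set):
    """Total cost saved over all views when moving from old_set to new_set."""
    return sum(cost(v, old_set) - cost(v, new_set) for v in SIZE)

def benefit_rounds(k):
    """Return one {candidate: benefit} table per greedy round."""
    S = {TOP}
    rounds = []
    for _ in range(k):
        table = {v: gain(S | {v}, S) for v in SIZE if v not in S}
        rounds.append(table)
        S = S | {max(table, key=table.get)}  # greedy pick for this round
    return rounds

first, second = benefit_rounds(2)
print(first)   # ps has the largest benefit: 15600000
print(second)  # c has the largest benefit: 5900000
```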
Performance Study
How badly can the Greedy Algorithm perform?
1.1 Data Cube
a 200
b 100 c 99 d 100
p1 97 … p20 97, q1 97 … q20 97, r1 97 … r20 97, s1 97 … s20 97 (20 nodes in each group)
none 1
k = 2

| View | 1st Choice (M) | 2nd Choice (M) |
|---|---|---|
| b | 41 x 100 = 4100 | 21 x 100 = 2100 |
| c | 41 x 101 = 4141 | |
| d | 41 x 100 = 4100 | 21 x 100 = 2100 |
| … | … | … |

1st Choice: Benefit from b = 200-100 = 100, from c = 200-99 = 101. c is selected.
2nd Choice: Benefit from b = 200-100 = 100.

Greedy: V = {b, c}
Gain(V ∪ {top view}, {top view}) = 4141 + 2100 = 6241
With the same lattice and k = 2, selecting b first instead gives:

| View | 1st Choice (M) | 2nd Choice (M) |
|---|---|---|
| b | 41 x 100 = 4100 | |
| c | 41 x 101 = 4141 | 21 x 101 + 20 x 1 = 2141 |
| d | 41 x 100 = 4100 | 41 x 100 = 4100 |
| … | … | … |

Greedy: V = {b, c}
Gain(V ∪ {top view}, {top view}) = 4141 + 2100 = 6241
Optimal: V = {b, d}
Gain(V ∪ {top view}, {top view}) = 4100 + 4100 = 8200
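The comparison can be checked by brute force with the sketch below. The sizes come from the slide, but the parent lists are an assumption inferred from the benefit counts: p and q nodes sit under b, q and r under c, r and s under d, so each of b, c, d serves 41 views and neighbours share 20. The "none" view is again left out, and the function names are illustrative.

```python
from itertools import combinations

# Sizes from the slide. PARENTS is inferred from the benefit counts (assumption).
SIZE = {"a": 200, "b": 100, "c": 99, "d": 100}
PARENTS = {"a": [], "b": ["a"], "c": ["a"], "d": ["a"]}
GROUPS = {"p": ["b"], "q": ["b", "c"], "r": ["c", "d"], "s": ["d"]}
for group, parents in GROUPS.items():
    for i in range(1, 21):
        SIZE[f"{group}{i}"] = 97
        PARENTS[f"{group}{i}"] = parents

def ancestors(v):
    """v itself plus every view it can be computed from."""
    result = {v}
    for parent in PARENTS[v]:
        result |= ancestors(parent)
    return result

def cost(v, materialized):
    """Size of the smallest materialized view that can answer v."""
    return min(SIZE[w] for w in ancestors(v) if w in materialized)

def gain(new_set, old_set):
    return sum(cost(v, old_set) - cost(v, new_set) for v in SIZE)

def greedy(k):
    S = {"a"}
    for _ in range(k):
        best = max((v for v in SIZE if v not in S),
                   key=lambda v: gain(S | {v}, S))
        S = S | {best}
    return S - {"a"}, gain(S, {"a"})

def optimal(k):
    """Try every set of k views (feasible only for small lattices)."""
    others = [v for v in SIZE if v != "a"]
    best = max(combinations(others, k),
               key=lambda V: gain({"a"} | set(V), {"a"}))
    return set(best), gain({"a"} | set(best), {"a"})

greedy_views, greedy_gain = greedy(2)
optimal_views, optimal_gain = optimal(2)
print(greedy_views, greedy_gain)               # views b and c, gain 6241
print(optimal_views, optimal_gain)             # views b and d, gain 8200
print(round(greedy_gain / optimal_gain, 4))    # 0.7611
```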
Greedy / Optimal = 6241 / 8200 = 0.7611
If this ratio = 1, Greedy can give an optimal solution.
If this ratio ≈ 0, Greedy may give a “bad” solution.
Does this ratio have a “lower” bound?
It is proved that this ratio is at least 0.63.
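The 0.63 figure is consistent with 1 - 1/e ≈ 0.632, the limit of the bound 1 - ((k-1)/k)^k that is usually quoted for this kind of greedy selection. That this is the bound meant here is an assumption, since the slide does not state the formula. The sketch below simply evaluates it; `greedy_bound` is an illustrative name.

```python
import math

def greedy_bound(k):
    """Commonly quoted guarantee for greedy selection of k views: 1 - ((k-1)/k)^k."""
    return 1 - ((k - 1) / k) ** k

limit = 1 - 1 / math.e

print(greedy_bound(2))   # 0.75
print(round(limit, 3))   # 0.632
```

For k = 2 the bound is 0.75, and the example ratio of 0.7611 lies just above it.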
This is just an example to show that this greedy algorithm can perform badly.
A complete proof of the lower bound can be found in the paper.