![Page 1: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/1.jpg)
Multimedia DBs
![Page 2: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/2.jpg)
Time Series Data
A time series is a collection of observations made sequentially in time.
[Figure: a time series plotted against a time axis (0 to 500) and a value axis (23 to 29). Sample values: 25.1750, 25.1750, 25.2250, 25.2500, 25.2500, 25.2750, 25.3250, 25.3500, 25.3500, 25.4000, 25.4000, 25.3250, 25.2250, 25.2000, 25.1750, ..., 24.6250, 24.6750, 24.6750, 24.6250, 24.6250, 24.6250, 24.6750, 24.7500]
![Page 3: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/3.jpg)
PAA and APCA
Feature extraction for GEMINI:
Fourier
Wavelets
Another approach: segment the time series into equal parts, store the average value for each part.
Use an index to store the averages and the segment end points
![Page 4: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/4.jpg)
Feature Spaces
[Figure: a time series X and its approximation X' (time axis 0 to 140) in three feature spaces, each shown with its first eight basis functions (0 to 7):
DFT: Agrawal, Faloutsos, Swami 1993
DWT (Haar 0 to Haar 7): Chan & Fu 1999
SVD (eigenwave 0 to eigenwave 7): Korn, Jagadish, Faloutsos 1997]
![Page 5: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/5.jpg)
Piecewise Aggregate Approximation (PAA)
Original time series (n-dimensional vector): S = {s1, s2, …, sn}
n’-segment PAA representation (n’-d vector): S = {sv1, sv2, …, svn’}
[Figure: the series and its 8-segment PAA with segment means sv1 to sv8; time axis, value axis]
PAA representation satisfies the lower bounding lemma (Keogh, Chakrabarti, Mehrotra and Pazzani, 2000; Yi and Faloutsos 2000)
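The PAA idea can be sketched in a few lines. This is a minimal illustration, not the cited authors' code: the function names `paa`, `paa_lower_bound` and `euclidean` are made up here, and the sketch assumes n is a multiple of the number of segments.

```python
import math


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    n = len(series)
    if n % n_segments != 0:
        raise ValueError("this sketch assumes n is a multiple of n_segments")
    width = n // n_segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


def paa_lower_bound(a, b, n):
    """Distance between two PAA vectors, scaled so that it lower-bounds
    the Euclidean distance between the original n-point series."""
    width = n // len(a)
    return math.sqrt(width * sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean(s, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s, q)))


s = [1.0, 3.0, 2.0, 2.0, 5.0, 7.0, 6.0, 6.0]
q = [0.0, 2.0, 1.0, 3.0, 4.0, 8.0, 9.0, 5.0]
s_paa = paa(s, 4)  # [2.0, 2.0, 6.0, 6.0]
q_paa = paa(q, 4)  # [1.0, 2.0, 6.0, 7.0]
print(paa_lower_bound(s_paa, q_paa, len(s)), "<=", euclidean(s, q))
```

On this toy pair the feature-space distance is 2.0 and the true distance is 4.0, so no qualifying series would be dismissed by filtering on the PAA vectors.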
![Page 6: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/6.jpg)
Can we improve upon PAA?
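n’-segment PAA representation (n’-d vector): S = {sv1, sv2, …, svN}
[Figure: 8 equal-length segments with means sv1 to sv8]
n’/2-segment APCA representation (n’-d vector): S = {sv1, sr1, sv2, sr2, …, svM, srM}
(M is the number of segments = n’/2; svi is the mean value of the ith segment and sri is its right endpoint)
[Figure: 4 variable-length segments with means sv1 to sv4 and right endpoints sr1 to sr4]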
Adaptive Piecewise Constant Approximation (APCA)
![Page 7: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/7.jpg)
APCA approximates original signal better than PAA
Improvement factor = (Reconstruction error PAA) / (Reconstruction error APCA)
[Figure: improvement factors for six example signals: 1.69, 3.02, 1.21, 1.75, 3.77, 1.03]
![Page 8: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/8.jpg)
APCA Representation can be computed efficiently
Near-optimal representation can be computed in $O(n \log n)$ time
Optimal representation can be computed in $O(n^2 M)$ time (Koudas et al.)
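One way to reach the $O(n^2 M)$ bound is a textbook dynamic program over segment boundaries, with prefix sums giving each segment's squared error in constant time. The sketch below is an assumption about how such a computation could look, not necessarily the algorithm of Koudas et al.; `optimal_apca` is a hypothetical name, and right endpoints are 1-based.

```python
def optimal_apca(series, m):
    """Return [(sv_1, sr_1), ..., (sv_m, sr_m)] minimizing the total squared
    reconstruction error over all splits into m constant segments."""
    n = len(series)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= len(series)")
    # Prefix sums of values and squared values.
    ps, ps2 = [0.0], [0.0]
    for x in series:
        ps.append(ps[-1] + x)
        ps2.append(ps2[-1] + x * x)

    def sse(a, b):
        """Squared error of series[a:b] around its own mean."""
        total = ps[b] - ps[a]
        return (ps2[b] - ps2[a]) - total * total / (b - a)

    inf = float("inf")
    # cost[k][j]: best error for the first j points using k segments.
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    cut = [[0] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j], cut[k][j] = c, i
    # Walk the stored cut points back from the end of the series.
    segments, j = [], n
    for k in range(m, 0, -1):
        i = cut[k][j]
        segments.append(((ps[j] - ps[i]) / (j - i), j))
        j = i
    segments.reverse()
    return segments


apca = optimal_apca([1.0, 1.0, 1.0, 5.0, 5.0, 9.0, 9.0, 9.0], 3)
print(apca)
```

The three nested loops give the $O(n^2 M)$ running time; the near-optimal $O(n \log n)$ method mentioned on the slide is not shown here.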
![Page 9: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/9.jpg)
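Distance Measure
Exact (Euclidean) distance D(Q,S):
$D(Q,S) = \sqrt{\sum_{i=1}^{n} (q_i - s_i)^2}$
Lower bounding distance DLB(Q’,S):
$D_{LB}(Q',S) = \sqrt{\sum_{i=1}^{M} (sr_i - sr_{i-1})(qv_i - sv_i)^2}$
where $qv_i$ is the mean of Q over the ith segment of S and $sr_0 = 0$
[Figure: query Q against series S for D(Q,S), and projected query Q’ against the APCA representation of S for DLB(Q’,S)]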
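The two distances can be compared on a toy example. This is a minimal sketch under stated assumptions: an APCA representation is stored as a list of (sv, sr) pairs with 1-based right endpoints, Q’ is obtained by averaging Q over each segment of S, and the helper names are hypothetical.

```python
import math


def euclidean(q, s):
    """Exact distance D(Q,S)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


def project_query(q, apca):
    """Q': the mean of q over each segment of the stored series."""
    qv, start = [], 0
    for _, sr in apca:
        qv.append(sum(q[start:sr]) / (sr - start))
        start = sr
    return qv


def d_lb(q, apca):
    """Lower bounding distance DLB(Q',S), weighting each segment by its length."""
    total, prev = 0.0, 0
    for (sv, sr), qv in zip(apca, project_query(q, apca)):
        total += (sr - prev) * (qv - sv) ** 2
        prev = sr
    return math.sqrt(total)


s = [1.0, 1.0, 1.0, 5.0, 5.0, 9.0, 9.0, 9.0]
s_apca = [(1.0, 3), (5.0, 5), (9.0, 8)]
q = [2.0, 0.0, 4.0, 5.0, 7.0, 8.0, 9.0, 7.0]
print(d_lb(q, s_apca), "<=", euclidean(q, s))
```

Here the projected query is [2.0, 6.0, 8.0], so the squared lower bound is 3 + 2 + 3 = 8 against a squared exact distance of 20.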
![Page 10: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/10.jpg)
Index on 2M-dimensional APCA space
Any feature-based index structure can be used (e.g., R-tree, X-tree, Hybrid Tree)
[Figure: APCA points S1 to S9 in the 2M-dimensional APCA space, grouped into MBRs R1 to R4, and the corresponding index tree with S1 to S9 at the leaves]
![Page 11: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/11.jpg)
k-nearest neighbor Algorithm
[Figure: query Q and the MBRs R1 to R4 enclosing S1 to S9, with MINDIST(Q,R2), MINDIST(Q,R3) and MINDIST(Q,R4) marked]
For any node U of the index structure with MBR R, MINDIST(Q,R) ≤ D(Q,S) for any data item S under U
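The pruning argument behind the lower-bound property can be sketched without a tree. The code below is a simplification, not the slide's algorithm: it scans a flat list ordered by lower bound instead of traversing index nodes, and it uses a hypothetical one-dimensional feature (the series mean, i.e. a one-segment PAA) in place of APCA. In a tree-based search, MINDIST(Q,R) would play the role of `lower_bound` for whole subtrees.

```python
import heapq
import math


def mean(xs):
    return sum(xs) / len(xs)


def exact(q, s):
    """Exact Euclidean distance D(Q,S)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


def lower_bound(q, feature):
    """Lower bound on D(Q,S) computed from the stored mean of S."""
    return math.sqrt(len(q)) * abs(mean(q) - feature)


def knn(query, data, k):
    # Visit items in increasing order of their lower-bounding distance.
    candidates = sorted((lower_bound(query, mean(s)), name) for name, s in data.items())
    best = []  # max-heap of (-distance, name) holding the k best so far
    for lb, name in candidates:
        if len(best) == k and lb > -best[0][0]:
            break  # every remaining item has D >= lb > current k-th best
        d = exact(query, data[name])
        if len(best) < k:
            heapq.heappush(best, (-d, name))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, name))
    return [name for _, name in sorted((-nd, name) for nd, name in best)]


data = {
    "S1": [1.0, 2.0, 3.0, 4.0],
    "S2": [10.0, 10.0, 10.0, 10.0],
    "S3": [2.0, 2.0, 3.0, 6.0],
    "S4": [0.0, 0.0, 0.0, 0.0],
}
result = knn([1.0, 2.0, 3.0, 5.0], data, k=2)
print(result)
```

In this example the exact distance is computed only for S1 and S3; S4 and S2 are skipped because their lower bounds already exceed the second-best distance.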
![Page 12: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/12.jpg)
Index Modification for MINDIST Computation
APCA point S = {sv1, sr1, sv2, sr2, …, svM, srM}
APCA rectangle S = (L,H) where
L = {smin1, sr1, smin2, sr2, …, sminM, srM} and
H = {smax1, sr1, smax2, sr2, …, smaxM, srM}
[Figure: a 4-segment series with means sv1 to sv4, right endpoints sr1 to sr4 and per-segment bounds smin1 to smin4 and smax1 to smax4; the rectangles for S1 to S9 are grouped into MBRs R1 to R4]
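A small sketch of how an APCA rectangle could be derived, assuming (as the figure suggests) that smin_i and smax_i are the minimum and maximum of the original values inside the ith segment. The function name `apca_rectangle` and the 1-based right endpoints are choices made for this illustration.

```python
def apca_rectangle(series, right_endpoints):
    """Return (L, H) with L = [smin1, sr1, ...] and H = [smax1, sr1, ...]."""
    low, high, start = [], [], 0
    for sr in right_endpoints:
        segment = series[start:sr]
        low += [min(segment), sr]    # value lower bound, then time coordinate
        high += [max(segment), sr]   # value upper bound, same time coordinate
        start = sr
    return low, high


L, H = apca_rectangle([1.0, 2.0, 1.0, 5.0, 6.0, 9.0, 8.0, 9.0], [3, 5, 8])
print(L)
print(H)
```

The time coordinates are identical in L and H, so the rectangle has extent only along the value dimensions.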
![Page 13: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/13.jpg)
MBR Representation in time-value space
We can view the MBR R=(L,H) of any node U as two APCA representations
L = {l1, l2, …, l(N-1), lN} and H = {h1, h2, …, h(N-1), hN}
[Figure: example with N = 6, L = {l1, l2, l3, l4, l5, l6} and H = {h1, h2, h3, h4, h5, h6}, which together delimit REGION 1, REGION 2 and REGION 3 in the plane of the time axis and value axis]
![Page 14: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/14.jpg)
Regions
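M regions associated with each MBR; boundaries of ith region:
value axis: from l(2i-1) to h(2i-1)
time axis: from l(2i-2)+1 to h(2i)
[Figure: REGION 1, REGION 2 and REGION 3 drawn from l1 to l6 and h1 to h6; the odd-indexed bounds lie on the value axis and the even-indexed bounds on the time axis]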
![Page 15: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/15.jpg)
Regions
[Figure: REGION 1, REGION 2 and REGION 3 bounded by l1 to l6 and h1 to h6, with time instants t1 and t2 marked on the time axis]
ith region is active at time instant t if it spans across t
The value st of any time series S under node U at time instant t must lie in one of the regions active at t (Lemma 2)
![Page 16: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/16.jpg)
MINDIST Computation
For time instant t,
$MINDIST(Q,R,t) = \min_{\text{region } G \text{ active at } t} MINDIST(Q,G,t)$
[Figure: REGION 1, REGION 2 and REGION 3 with time instant t1 marked; Region 1 and Region 2 are active at t1]
$MINDIST(Q,R,t_1) = \min(MINDIST(Q,\mathrm{Region1},t_1), MINDIST(Q,\mathrm{Region2},t_1)) = \min((q_{t_1} - h_1)^2, (q_{t_1} - h_3)^2) = (q_{t_1} - h_1)^2$
$MINDIST(Q,R) = \sqrt{\sum_{t=1}^{n} MINDIST(Q,R,t)}$
Lemma 3: MINDIST(Q,R) ≤ D(Q,C) for any time series C under node U
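The region-based computation can be sketched as follows. Assumptions made for this illustration: the MBR is given as two Python lists L and H holding l1..l2M and h1..h2M, time instants run from 1 to n, l0 is taken as 0, and MINDIST(Q,G,t) is the squared gap between q_t and the value range of region G (zero when q_t falls inside it). All function names are hypothetical.

```python
import math


def regions(L, H):
    """Return (t_start, t_end, v_low, v_high) per region, times 1-based inclusive."""
    out = []
    for i in range(1, len(L) // 2 + 1):
        v_low = L[2 * i - 2]                           # l(2i-1)
        v_high = H[2 * i - 2]                          # h(2i-1)
        t_start = (L[2 * i - 3] if i > 1 else 0) + 1   # l(2i-2) + 1
        t_end = H[2 * i - 1]                           # h(2i)
        out.append((t_start, t_end, v_low, v_high))
    return out


def mindist_point(q_t, v_low, v_high):
    """Squared distance from q_t to the value range [v_low, v_high]."""
    if q_t < v_low:
        return (v_low - q_t) ** 2
    if q_t > v_high:
        return (q_t - v_high) ** 2
    return 0.0


def mindist(q, L, H):
    regs = regions(L, H)
    total = 0.0
    for t, q_t in enumerate(q, start=1):
        active = [r for r in regs if r[0] <= t <= r[1]]
        total += min(mindist_point(q_t, r[2], r[3]) for r in active)
    return math.sqrt(total)


def euclidean(q, c):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


# Two piecewise-constant series under one node, and the MBR of their APCA points.
c1 = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0, 5.0]   # APCA {1, 3, 5, 8}
c2 = [2.0, 2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 6.0]   # APCA {2, 4, 6, 8}
L = [1.0, 3, 5.0, 8]
H = [2.0, 4, 6.0, 8]
q = [0.0, 0.0, 0.0, 0.0, 7.0, 7.0, 7.0, 7.0]
print(regions(L, H))
print(mindist(q, L, H), "<=", min(euclidean(q, c1), euclidean(q, c2)))
```

Both regions are active at t = 4 in this example, and the smaller of the two per-region distances is the one that enters the sum.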
![Page 17: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/17.jpg)
Approximate Search
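A simpler definition of the distance in the feature space is the following:
$\sqrt{\sum_{i=1}^{M} \sum_{k=cr_{i-1}+1}^{cr_i} (cv_i - q_k)^2}$
where {cv1, cr1, …, cvM, crM} is the APCA representation of the stored series and $cr_0 = 0$
But there is one problem… what?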
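One way to probe the question is to evaluate this simpler distance on a tiny example. The sketch assumes the formula compares each raw query value with the mean of the segment it falls in; `approx_distance` is a hypothetical name. On the data below the simpler distance comes out larger than the exact Euclidean distance, which suggests it does not lower-bound D and so could cause false dismissals if used for pruning.

```python
import math


def approx_distance(q, apca):
    """Distance between the raw query and the piecewise-constant reconstruction."""
    total, start = 0.0, 0
    for cv, cr in apca:
        total += sum((cv - qk) ** 2 for qk in q[start:cr])
        start = cr
    return math.sqrt(total)


def euclidean(q, c):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


c = [0.0, 2.0, 4.0, 6.0]
c_apca = [(1.0, 2), (5.0, 4)]   # two segments with means 1 and 5
q = list(c)                     # the query is identical to the stored series
print(approx_distance(q, c_apca), "vs exact", euclidean(q, c))
```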
![Page 18: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/18.jpg)
Multimedia dbs
A multimedia database also stores images
Again similarity queries (content based retrieval)
Extract features, index in feature space, answer similarity queries using GEMINI
Again, average values help!
![Page 19: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/19.jpg)
Images - color
Q: what is an image? A: 2-d array
![Page 20: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/20.jpg)
Images - color
Color histograms, and distance function
![Page 21: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/21.jpg)
Images - color
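Mathematically, the distance function is: $d_{hist}^2(x, y) = (x - y)^T A (x - y)$, for color histograms x and y and a matrix A that weights each pair of color bins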
![Page 22: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/22.jpg)
Images - color
Problem: ‘cross-talk’: Features are not orthogonal -> SAMs will not work properly
Q: what to do? A: feature-extraction question
![Page 23: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/23.jpg)
Images - color
possible answers: avg red, avg green, avg blue
it turns out that this lower-bounds the histogram distance -> no cross-talk; SAMs are applicable
![Page 24: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/24.jpg)
Images - color
performance:
[Figure: time plotted against selectivity for two methods, 'w/ avg RGB' and 'seq scan']
![Page 26: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/26.jpg)
Images - shapes
distance function: Euclidean, on the area, perimeter, and 20 ‘moments’
(Q: how to normalize them? A: divide by standard deviation)
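The normalization answer can be sketched as below. The feature values are invented for illustration (two features, area and perimeter, instead of the 22 on the slide), the population standard deviation is an assumed choice, and the helper names are hypothetical.

```python
import math


def column_std(vectors, j):
    """Population standard deviation of feature j across all shapes."""
    col = [v[j] for v in vectors]
    mu = sum(col) / len(col)
    return math.sqrt(sum((x - mu) ** 2 for x in col) / len(col))


def normalize(vectors):
    """Divide every feature by its standard deviation (constant features are left as-is)."""
    stds = [column_std(vectors, j) or 1.0 for j in range(len(vectors[0]))]
    return [[x / s for x, s in zip(v, stds)] for v in vectors]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical (area, perimeter) pairs for three shapes.
shapes = [[100.0, 40.0], [400.0, 80.0], [900.0, 120.0]]
scaled = normalize(shapes)
print(euclidean(shapes[0], shapes[1]), euclidean(scaled[0], scaled[1]))
```

Before scaling, the area term dominates the Euclidean distance simply because its numeric range is larger; after scaling, each feature has unit spread.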
![Page 28: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/28.jpg)
Images - shapes
distance function: Euclidean, on the area, perimeter, and 20 ‘moments’
(Q: other ‘features’ / distance functions? A1: turning angle; A2: dilations/erosions; A3: ...)
![Page 30: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/30.jpg)
Images - shapes
distance function: Euclidean, on the area, perimeter, and 20 ‘moments’
Q: how to do dim. reduction? A: Karhunen-Loeve (= centered PCA/SVD)
![Page 31: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/31.jpg)
Images - shapes
Performance: ~10x faster
[Figure: log(# of I/Os) plotted against # of features kept, compared with the 'all kept' baseline]
![Page 32: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/32.jpg)
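Is $d(u,v) = \sqrt{(u-v)^T A (u-v)}$ a metric?
$x^T A x = \sum_{i,j} x_i x_j A_{ij} = \sum_i \lambda_i x_i^2$
In the last sum, $\lambda_i$ is the ith eigenvalue of A and $x_i$ is the projection of x along the ith eigenvector
$d(u,v) = \sqrt{(u-v)^T A (u-v)} = \sqrt{\sum_i \lambda_i (u_i - v_i)^2}$
$d(u,v) \ge 0$, $d(u,u) = 0$, $d(u,v) = d(v,u)$
$d(u,w) \le d(u,v) + d(v,w)$, provided
$\sqrt{\sum_i \lambda_i (u_i - w_i)^2} \le \sqrt{\sum_i \lambda_i (u_i - v_i)^2} + \sqrt{\sum_i \lambda_i (v_i - w_i)^2}$, that is,
$\sqrt{\sum_i (\sqrt{\lambda_i} u_i - \sqrt{\lambda_i} w_i)^2} \le \sqrt{\sum_i (\sqrt{\lambda_i} u_i - \sqrt{\lambda_i} v_i)^2} + \sqrt{\sum_i (\sqrt{\lambda_i} v_i - \sqrt{\lambda_i} w_i)^2}$
This is the metric condition for the Lp norm (here p = 2) applied to the rescaled coordinates, assuming every $\lambda_i \ge 0$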
![Page 33: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/33.jpg)
Filtering in QBIC
Histogram column vectors x, y of length n: $\sum_i x_i = 1$, $\sum_i y_i = 1$
Difference z = (x-y): $\sum_i z_i = 0$
Contribution of each color bin to a smaller set of colors: $V^T = (c_1, c_2, \ldots, c_n)$, each $c_i$ is a column vector of length 3
$x_{avg} = V^T x$, $y_{avg} = V^T y$, column vectors of length 3
![Page 34: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/34.jpg)
Filtering in QBIC
Distances
$d_{avg}^2 = (x_{avg} - y_{avg})^T (x_{avg} - y_{avg}) = (V^T z)^T (V^T z) = z^T V V^T z = z^T W z$
$d_{hist}^2 = z^T A z$
$d_{hist}^2 \ge \lambda_1 d_{avg}^2$, where $\lambda_1$ is the smallest eigenvalue of $A' z' = \lambda W' z'$
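The identity $d_{avg}^2 = z^T W z$ with $W = V V^T$ can be checked numerically. The sketch below assumes four color bins whose representative colors c_i are invented RGB triples, and stores $V^T$ as the list `colors`; the function names are hypothetical. It checks only the average-color side and does not evaluate the eigenvalue bound.

```python
def avg_color(colors, hist):
    """x_avg = V^T x: the histogram-weighted sum of the bin colors."""
    return [sum(h * c[k] for h, c in zip(hist, colors)) for k in range(3)]


def d_avg_sq(colors, x, y):
    """Squared Euclidean distance between the two average colors."""
    xa, ya = avg_color(colors, x), avg_color(colors, y)
    return sum((a - b) ** 2 for a, b in zip(xa, ya))


def d_avg_sq_via_w(colors, x, y):
    """The same quantity written as z^T W z with W = V V^T."""
    z = [a - b for a, b in zip(x, y)]
    n = len(z)
    w = [[sum(colors[i][k] * colors[j][k] for k in range(3)) for j in range(n)]
         for i in range(n)]
    return sum(z[i] * w[i][j] * z[j] for i in range(n) for j in range(n))


# Hypothetical bins: red, green, blue, yellow.
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
x = [0.5, 0.25, 0.25, 0.0]
y = [0.25, 0.25, 0.0, 0.5]
print(d_avg_sq(colors, x, y), d_avg_sq_via_w(colors, x, y))
```

Both computations give 0.375 here; the first needs only the two 3-d average colors, which is what makes the filter cheap compared with the full histogram distance.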
![Page 35: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/35.jpg)
Filtering in QBIC
Rewrite z to remove the extra condition that $\sum_i z_i = 0$
z’ becomes an (n-1)-dimensional column vector
$z^T A z = z'^T A' z'$ and $z^T W z = z'^T W' z'$
A’ and W’ are (n-1)x(n-1) matrices
Show that $z'^T A' z' \ge \lambda_1 z'^T W' z'$
![Page 36: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/36.jpg)
Proof of $z'^T A' z' \ge \lambda_1 z'^T W' z'$
Minimize $z'^T A' z'$ wrt z’, subject to the constraint $z'^T W' z' = C$
Same as minimizing, wrt z’, $z'^T A' z' - \lambda (z'^T W' z' - C)$
Differentiate wrt z’ and set to 0: $A' z' = \lambda W' z'$
λ and z’ must be eigenvalues and eigenvectors resp. of $A' z' = \lambda W' z'$
![Page 37: Multimedia DBs](https://reader036.vdocuments.us/reader036/viewer/2022062422/56813280550346895d991c50/html5/thumbnails/37.jpg)
Proof of $z'^T A' z' \ge \lambda_1 z'^T W' z'$
$z'^T A' z' = \lambda z'^T W' z' = \lambda C$
To minimize $z'^T A' z'$, we must choose the smallest eigenvalue $\lambda_1$
The minimum of $z'^T A' z'$ over z’, subject to the constraint $z'^T W' z' = C$, equals $\lambda_1 C$
If $z'^T W' z' = C > 0$ then $z'^T A' z' \ge \lambda_1 C$
If $z'^T W' z' = 0$ then $z'^T A' z' \ge 0$, since A’ is positive semi-definite