TRANSCRIPT
ALMA MATER STUDIORUM - UNIVERSITÀ DI BOLOGNA
Multimedia Data Management M
Second cycle degree programme (LM) in Computer Engineering, University of Bologna
Multimedia Data Retrieval
Home page: http://www-db.disi.unibo.it/courses/DM/
Electronic version: 4.01.MultimediaDataRetrieval.pdf
Electronic version: 4.01.MultimediaDataRetrieval-2p.pdf
I. Bartolini
Outline
Description models for MM data retrieval
Low-level features for MM data content representation
Similarity measures for MM data content comparison
I. Bartolini Multimedia Data Management M
MM data retrieval
From the previous lesson we know that features are a smarter way to represent MM data content than their original format, e.g., color and texture for an image.
Today we focus on the most suitable models for representing, interpreting, describing and comparing such features, e.g., color histograms for images, compared using the Euclidean distance as similarity measure…
…with the final goal of being able to retrieve from MM collections those objects which are most interesting for us!!
Content-based search
A first approach to searching for MM objects relies on standard text-based techniques, provided objects come with a precise textual description of what they represent/describe, i.e., of their semantics. However, the “annotation” of MM objects is a subjective, time-consuming, and tedious process (completely manual!!).
A more convenient approach, suitable for managing large DBs, is to automatically extract from MM objects a set of (low-level) relevant numerical features that, at least partially, convey some of the semantics of the objects. Clearly, which are the “best” features to extract depends on the specific medium and on the application at hand (i.e., what we are looking for).
Look for cheetahs? This is fine; but how to find it?
Content-based similarity search
Once we have feature values, we can use them to search for objects. Assume a database (DB) with N MM objects (e.g., images) and, for each of the N objects, we have extracted the “relevant features”, e.g., some color information from the images.
We can now search for objects whose feature values are “similar” (in some sense to be defined) to the feature values of our query [SWS+00, LSD+06, LZL+07, DJL+08].
In general this approach, much as happens in text retrieval, cannot guarantee that all and only relevant results are returned for a query.
Look for cheetahs? Oops! Not really a cheetah ;-)
The general scenario
In general, we have a 2-level scenario:
The objects level: ultimately, we want to find relevant objects (color, texture, shape)
The features level: we extract features from the objects, and use them for querying the DB; needed to support automatic retrieval
The reference architecture
For the text-based approach, the image querying problem can simply be transformed into a traditional information retrieval (IR) problem, as we saw for textual document retrieval and as we will see when speaking about “MM data annotation”…
For content-based image retrieval (CBIR), more sophisticated query evaluation techniques are required.
[Architecture diagram: the user submits a query image through a GUI; (optional) image segmentation and feature extraction populate the feature DB from the image DB; the query engine/query processor answers queries using an index over the feature DB and visualizes the results.]
Variables of the CBIR problem
How the set of relevant results is determined depends:
on which low-level features are used to characterize the MM data content
on the similarity criterion (distance function) used to compare such features
on how DB objects are ranked with respect to the query
on whether the user is interested in the whole MM data query or only in a part of it
All the above aspects strongly influence the query evaluation process.
CBIR problem
Simplest case: each MM data object (i.e., image) is characterized using global low-level features, and the result of a query consists of the set of DB objects that “best match” the visual characteristics of the target object, according to a predefined similarity criterion, which is in turn based on such features.
This is also known as the Nearest Neighbors (NN) search problem.
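E.g., the simplest possible NN search is a linear scan of the feature DB. A minimal Python sketch (the toy object IDs and feature values are illustrative; real systems use index structures to avoid scanning everything):

```python
import heapq
import math

def euclidean(p, q):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn(db, query, k, dist=euclidean):
    """Return the k DB objects whose feature vectors are closest to the query.
    db is a list of (object_id, feature_vector) pairs; a full linear scan."""
    return heapq.nsmallest(k, db, key=lambda item: dist(item[1], query))

# Toy DB of 3-D feature vectors
db = [("img1", [0.9, 0.1, 0.0]),
      ("img2", [0.1, 0.8, 0.1]),
      ("img3", [0.8, 0.2, 0.0])]
top2 = knn(db, [1.0, 0.0, 0.0], k=2)  # img1 and img3 are the nearest
```

The `dist` parameter makes the similarity criterion pluggable, which matters below: the same scan works for any of the distance functions we are about to see.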
Representing color
In a digital image, the color space that encodes the color content of each pixel is necessarily discretized; this depends on how many bits per pixel (bpp) are used. Example: if one represents images in the RGB space by using 8 × 3 = 24 bpp, the number of possible distinct colors is 2^24 = 16,777,216; with 8 bits per channel, we have 256 possible values on each channel.
Although discrete, the possible color values are still too many if one wants to compactly represent the color content of an image. Reducing them also aims at achieving some robustness in the matching process (e.g., the two RGB values (123,078,226) and (121,080,230) are almost indistinguishable).
In practice, a common approach to represent color is to make use of histograms…
Color histograms
A color histogram h is a D-dimensional vector, obtained by quantizing the color space into D distinct colors. Typical values of D are 32, 64, 256, 1024, …
Example: the HSV color space can be quantized into D=32 colors: H is divided into 8 intervals and S into 4; V = 0 guarantees invariance to light intensity.
The i-th component (also called bin) of h stores the percentage (number) of pixels in the image whose color is mapped to the i-th color.
Although conceptually simple, color histograms are widely used since they are relatively invariant to translation, rotation, scale changes and partial occlusions.
[Figure: a sample image and its D = 64 color histogram]
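E.g., the D=32 HSV quantization above (8 H intervals × 4 S intervals) can be sketched in a few lines of Python; the pixel data and function names are illustrative, and V is simply ignored here, in line with the slide’s invariance-to-light-intensity choice:

```python
def hsv_bin(h, s, v, h_bins=8, s_bins=4):
    """Map an HSV value (h in [0, 360), s and v in [0, 1]) to one of
    h_bins * s_bins = 32 color bins; V is ignored for light invariance."""
    hi = min(int(h / 360.0 * h_bins), h_bins - 1)
    si = min(int(s * s_bins), s_bins - 1)
    return hi * s_bins + si

def color_histogram(pixels, d=32):
    """Normalized D-dimensional color histogram of a list of HSV pixels."""
    hist = [0.0] * d
    for h, s, v in pixels:
        hist[hsv_bin(h, s, v)] += 1.0
    n = len(pixels)
    return [c / n for c in hist]

# A toy 4-pixel "image": two reddish pixels, two bluish pixels
pixels = [(10, 0.9, 0.5), (15, 0.8, 0.9), (230, 0.7, 0.4), (240, 0.6, 0.2)]
hist = color_histogram(pixels)  # bins sum to 1 (percentages)
```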
Further examples
Two D=64 color histograms
Comparing color histograms
Since histograms are vectors, we can use any Lp-norm to measure the distance (dissimilarity) of two color histograms. However, doing so we do not take colors’ correlation into account; depending on the query and the dataset, we might therefore obtain low-quality results. Weighted Lp-norms and relevance feedback can partially alleviate the problem…
The problem is that Lp-norms just consider the difference of corresponding bins, i.e., they perform a 1-1 comparison. With color histograms, our “coordinates” are not unrelated (the “cross-talk” effect).
Recall on Lp-norms (1)
Given two D-dimensional vectors p and q, their distance in the reference D-dimensional space based on the Lp-norm is:

L_p(p, q) = ( Σ_{i=1}^{D} |p_i − q_i|^p )^{1/p},   1 ≤ p < ∞

if p = 2 we obtain the Euclidean distance: L_2(p, q) = ( Σ_{i=1}^{D} (p_i − q_i)² )^{1/2}
if p = 1 we obtain the Manhattan distance: L_1(p, q) = Σ_{i=1}^{D} |p_i − q_i|
if p = ∞ we obtain the “max” distance: L_∞(p, q) = max_i { |p_i − q_i| }
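E.g., the three special cases can be obtained from one generic function; a minimal sketch (toy vectors):

```python
def lp_distance(p_vec, q_vec, p=2):
    """Lp-norm distance between two D-dimensional vectors.
    Pass p=float('inf') for the 'max' distance L_inf."""
    diffs = [abs(a - b) for a, b in zip(p_vec, q_vec)]
    if p == float("inf"):
        return max(diffs)                       # L_inf
    return sum(d ** p for d in diffs) ** (1.0 / p)

u, v = [1.0, 2.0, 5.0], [2.0, 4.0, 2.0]        # |differences| = (1, 2, 3)
d1 = lp_distance(u, v, p=1)                    # Manhattan: 6
d2 = lp_distance(u, v, p=2)                    # Euclidean: sqrt(14)
dinf = lp_distance(u, v, p=float("inf"))       # max: 3
```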
Recall on Lp-norms (2)
There exist weighted versions of Lp-norms; for the Euclidean distance (p = 2):

L_{2,W}(p, q; W) = ( Σ_{i=1}^{D} w_i (p_i − q_i)² )^{1/2}

where W = (w_1, …, w_D) is the vector of weights that reflect the importance of each coordinate of the D-dimensional space.
Examples (for D = 2; coordinates a and b):
[Figure: iso-distance curves around a query point q. With L2 they are circles (importance of a equal to b); with L_{2,W} and, e.g., W = [4, 1] they are ellipses compressed along a (importance of a > b); with W = [1, 4] they are ellipses compressed along b (importance of b > a).]
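A minimal sketch of L_{2,W} (toy values; with W = [1, …, 1] it reduces to the plain Euclidean distance):

```python
def weighted_l2(p_vec, q_vec, weights):
    """Weighted Euclidean distance L_{2,W}: each squared coordinate
    difference is scaled by its importance weight w_i."""
    return sum(w * (a - b) ** 2
               for w, a, b in zip(weights, p_vec, q_vec)) ** 0.5

q = [0.0, 0.0]
x = [1.0, 1.0]
plain = weighted_l2(q, x, [1, 1])   # sqrt(2): same as L2
skewed = weighted_l2(q, x, [4, 1])  # sqrt(5): coordinate a counts 4x as much
```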
Sample queries based on color (1)
[Figure: a query image and its retrieved results over 32-D HSV histograms, using the Euclidean distance vs. the weighted Euclidean distance]
Sample queries based on color (2)
[Figure: another query image and its retrieved results over 32-D HSV histograms, using the Euclidean distance vs. the weighted Euclidean distance]
Quadratic distance
Consider two histograms h and q, both with D bins. Their quadratic distance is defined as:

L_A(h, q; A) = ( Σ_{i=1}^{D} Σ_{j=1}^{D} a_{i,j} (h_i − q_i)(h_j − q_j) )^{1/2} = ( (h − q)^T × A × (h − q) )^{1/2}

where A = {a_{i,j}} is called the (color-)similarity matrix. The value of a_{i,j} is the “similarity” of the i-th and the j-th colors (a_{i,i} = 1).
Note that: when A is a diagonal matrix we are back to the weighted Euclidean distance; when A = I (the identity matrix) we obtain the L2 distance.
In order to guarantee that L_A is indeed a distance (L_A(h, q; A) ≥ 0 ∀ h, q), it is sufficient that A is a symmetric positive definite matrix.
Quadratic distance vs. Euclidean distance
As a simple example, let D = 3, with colors red, orange, and blue. Consider 3 pure-color images and the corresponding histograms:

h1 = (1, 0, 0)   h2 = (0, 1, 0)   h3 = (0, 0, 1)

Using L2, the distance between two different images is always √2. On the other hand, let the color-similarity matrix be defined as:

A = | 1    0.8  0 |
    | 0.8  1    0 |
    | 0    0    1 |

Now we have LA(h1,h2) = √0.4, whereas LA(h1,h3) = LA(h2,h3) = √2
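The slide’s numbers can be checked with a direct implementation of the quadratic form (a sketch; in practice one would vectorize this):

```python
def quadratic_distance(h, q, a):
    """Quadratic histogram distance:
    L_A(h, q) = sqrt((h - q)^T A (h - q)), with A the color-similarity matrix."""
    d = [hi - qi for hi, qi in zip(h, q)]
    form = sum(a[i][j] * d[i] * d[j]
               for i in range(len(d)) for j in range(len(d)))
    return form ** 0.5

# Red, orange, blue; red and orange are 0.8-similar, blue unrelated to both
A = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
h1, h2, h3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
d12 = quadratic_distance(h1, h2, A)  # sqrt(0.4): red "close" to orange
d13 = quadratic_distance(h1, h3, A)  # sqrt(2):  red far from blue
```

With A = I the same function returns the plain L2 distance, as stated on the previous slide.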
Representing texture (1)
Texture provides information about the uniformity, granularity and regularity of the image surface.
It is usually computed considering just the gray-scale values of pixels (the V channel in HSV).
Representing texture (2)
Tamura features correspond to properties of a texture which are readily perceived, that is, coarseness, contrast and directionality (a 3-D feature vector):
Coarseness - coarse vs. fine
Contrast - high vs. low contrast
Directionality - directional vs. non-directional
L1 or L2 distance functions are usually applied for comparison.
Texture extraction with Gabor filters
A Gabor filter is a Gaussian modulated by a sinusoid, which can reveal the presence of a pattern along a certain direction and at a certain scale (frequency).
To extract texture information, one chooses a number of directions/orientations (e.g., 6) and scales (e.g., 5) according to which the image has to be analyzed. For each orientation and scale, the average and the variance (standard deviation) of the filter output are computed.
This leads to, say, 2×6×5 = 60-D feature vectors, which are usually compared using the Manhattan (L1) distance.
[Figure: Gabor filter responses at scale 3 / 72°, scale 4 / 108°, scale 5 / 144°]
Gabor filter
Let I be an image, with I(x,y) being the gray-scale value of the pixel in position (x,y). A Gabor function is written as

G(x, y) = (1 / (2π σx σy)) exp( −(1/2) (x²/σx² + y²/σy²) ) cos(2π ω x)

and is completely determined by its frequency (ω) and bandwidth (σx, σy). The Gabor filter G_{m,n}(x,y) for scale m and orientation n is then defined as

G_{m,n}(x, y) = a^{−m} G(x', y')
x' = a^{−m} (x cos θ_n + y sin θ_n),   y' = a^{−m} (−x sin θ_n + y cos θ_n),   θ_n = nπ/K

where K is the total number of orientations. Finally, the image is analyzed by convolution with the filter:

w_{m,n}(x, y) = Σ_i Σ_j I(i, j) G_{m,n}(x − i, y − j)
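The two formulas above can be sketched directly in Python; the default parameter values (a, K, ω, σx, σy) are illustrative assumptions, not values fixed by the slides:

```python
import math

def gabor(x, y, omega, sigma_x, sigma_y):
    """Mother Gabor function: a 2-D Gaussian modulated by a cosine of
    frequency omega along x."""
    gauss = math.exp(-0.5 * (x ** 2 / sigma_x ** 2 + y ** 2 / sigma_y ** 2))
    return gauss * math.cos(2 * math.pi * omega * x) / (2 * math.pi * sigma_x * sigma_y)

def gabor_filter(x, y, m, n, a=2.0, K=6, omega=0.5, sigma_x=1.0, sigma_y=1.0):
    """G_{m,n}: rotate (x, y) by theta_n = n*pi/K, shrink by a^-m,
    then evaluate the mother function."""
    theta = n * math.pi / K
    s = a ** (-m)
    xp = s * (x * math.cos(theta) + y * math.sin(theta))
    yp = s * (-x * math.sin(theta) + y * math.cos(theta))
    return s * gabor(xp, yp, omega, sigma_x, sigma_y)
```

Convolving an image with a bank of such filters (5 scales × 6 orientations) and taking mean and standard deviation per filter yields the 60-D texture vector discussed above.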
Representing shape
Once one has succeeded in extracting an object’s contour, the next step is how to represent/encode it. A common approach is to navigate the contour, which leads to an ordering of the pixels in the contour: { (x(t), y(t)) : t = 1, …, M }
A 2nd step is to represent the resulting curve in parametric form. For instance, a possibility is to resort to complex values, by setting z(t) = x(t) + j y(t). Thus, we now have vectors of complex values…
The problem is that each vector has a different length (i.e., M depends on the specific image).
Representative points
The idea is to keep only the D most “interesting” points. Some methods are:
Equally-spaced sampling (a)
Grid-based sampling (b)
Maximum curvature points (c)
Fourier-based methods, which first compute the DFT of the contour, and then keep only the first D coefficients
Working in the frequency domain has several advantages:
It can be proved that by properly modifying Fourier coefficients one can achieve invariance to scale, translation and rotation
Further, by viewing shape as a “signal”, one can adopt distance measures that have been developed for the comparison of time series and that are somewhat insensitive to signals’ modifications
[Figure: (a) equally-spaced sampling, (b) grid-based sampling, (c) maximum curvature points]
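The Fourier-based method can be sketched with a naive DFT over contour points taken as complex values (the contour and descriptor size are toy assumptions). Note how coefficient 0 is just the centroid, so dropping it gives the translation invariance claimed above:

```python
import cmath

def contour_dft(points, d):
    """Fourier shape descriptor sketch: treat contour points (x(t), y(t)) as
    complex values z(t) = x(t) + j*y(t), compute a naive DFT and keep only
    the first d (low-frequency) coefficients."""
    z = [complex(x, y) for x, y in points]
    M = len(z)
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * f * t / M) for t in range(M)) / M
            for f in range(d)]

# A square contour sampled at its 4 corners, and a translated copy
square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
shifted = [(x + 5, y + 3) for x, y in square]
desc = contour_dft(square, d=2)
desc2 = contour_dft(shifted, d=2)
# Translation changes only coefficient 0 (the centroid); coefficient 1 is unchanged
```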
Comparing shapes
The commonest way to measure the (dis-)similarity of two shape vectors of equal length D is based on the Euclidean distance (L2). However, with the Euclidean distance we have to face a basic problem: sensitivity to the “alignment of values”.
Intuitively, we would need a distance measure that is able to “match” a point of a vector s even with “surrounding” points of a vector q; alternatively, we may view the time axis as a “stretchable” one.
A distance like this exists, and is called “Dynamic Time Warping” (DTW). In order to understand how it works, we first need to introduce some important concepts related to time series…
How to measure similarity between time series
Given two time series of equal length D, the commonest way to measure their (dis-)similarity is based on the Euclidean distance. However, with the Euclidean distance we have to face two basic problems:
1. High dimensionality: (very) large D values
2. Sensitivity to the “alignment of values”
For problem 1 we need to define effective dimensionality reduction techniques that work in a (much) lower-dimensional space; for problem 2 we will introduce a new similarity criterion.
L_2(s, q) = ( Σ_{t=0}^{D−1} (s_t − q_t)² )^{1/2}

[Figure: two time series s and q compared point-by-point]
Dimensionality reduction: DFT (1)
The first approach to reducing the dimensionality of time series, proposed in [AFS93], was based on the Discrete Fourier Transform (DFT). Remind: given a time series s, the Fourier coefficients are complex numbers (amplitude, phase), defined as:

S_f = (1/√D) Σ_{t=0}^{D−1} s_t exp(−j 2π f t / D),   f = 0, …, D−1

From Parseval’s theorem we know that the DFT preserves the energy of the signal:

E(s) = Σ_{t=0}^{D−1} |s_t|² = Σ_{f=0}^{D−1} |S_f|² = E(S)

Since the DFT is a linear transformation we have:

L_2²(s, q) = E(s − q) = E(S − Q) = L_2²(S, Q)

thus the DFT preserves the Euclidean distance. And? What can we gain from such a transformation??
Dimensionality reduction: DFT (2)
The key observation is that, by keeping only a small set of Fourier coefficients, we can obtain a good approximation of the original signal. Why? Because most of the energy of many real-world signals concentrates in the low frequencies ([AFS93]). More precisely, the energy spectrum (|S_f|² vs. f) behaves as O(f^−b), b > 0:
b = 2 (random walk or brown noise): used to model the behavior of stock movements and currency exchange rates
b > 2 (black noise): suitable to model slowly varying natural phenomena (e.g., water levels of rivers)
b = 1 (pink noise): according to Birkhoff’s theory, musical scores follow this energy pattern
Thus, if we only keep the first few coefficients (D' << D) we can achieve an effective dimensionality reduction.
Note: this is the basic idea used by well-known compression standards, such as JPEG (which is based on the Discrete Cosine Transform).
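A minimal sketch of the DFT-based reduction (toy series; a naive O(D²) DFT with the unitary 1/√D normalization, so that Parseval’s equality holds exactly). Note that the distance over a prefix of coefficients can never exceed the full distance, since we just drop non-negative terms:

```python
import cmath

def dft(s):
    """Unitary DFT: S_f = (1/sqrt(D)) * sum_t s_t * exp(-2j*pi*f*t/D).
    Preserves energy, hence the Euclidean distance."""
    D = len(s)
    return [sum(s[t] * cmath.exp(-2j * cmath.pi * f * t / D) for t in range(D))
            / (D ** 0.5) for f in range(D)]

def reduce_dim(s, d_prime):
    """Dimensionality reduction: keep only the first d_prime coefficients."""
    return dft(s)[:d_prime]

s = [1.0, 3.0, 2.0, 1.0, 1.5, 2.5, 2.0, 1.0]
q = [1.2, 2.9, 2.2, 0.8, 1.4, 2.4, 2.2, 1.1]
S, Q = dft(s), dft(q)

energy_time = sum(x ** 2 for x in s)              # E(s)
energy_freq = sum(abs(c) ** 2 for c in S)         # E(S): equal, by Parseval

full = sum(abs(a - b) ** 2 for a, b in zip(S, Q)) ** 0.5        # = L2(s, q)
trunc = sum(abs(a - b) ** 2
            for a, b in zip(reduce_dim(s, 3), reduce_dim(q, 3))) ** 0.5
```

`trunc` lower-bounds `full`, which is exactly what makes the reduced (D' = 3) vectors safe for filtering candidates without false dismissals.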
An example: EEG data
[Figure: an EEG time series (4 secs, 512 points, sampling rate 128 Hz) and its energy spectrum]
Another example
[Figure: a 128-point time series s, its Fourier coefficients, and s' = the approximation of s obtained with the first 4 Fourier coefficients]
The “alignment” problem
The Euclidean distance, as well as the other Lp-norms, is not robust w.r.t. even small contractions/expansions of the signal along the time axis (e.g., speech signals).
Intuitively, we would need a distance measure that is able to “match” a point of time series s even with “surrounding” points of time series q; alternatively, we may view the time axis as a “stretchable” one.
A distance like this exists, and is called “Dynamic Time Warping” (DTW)!
[Figure: fixed time axis - sequences are aligned “one to one”; “warped” time axis - non-linear alignments are possible]
How to compute the DTW (1)
Assume that the two time series s and q have the same length D (note that with DTW this is not necessary anymore!).
Construct a D×D matrix d, whose element d_{i,j} is the distance between s_i and q_j. We take d_{i,j} = (s_i − q_j)², but other possibilities exist (e.g., |s_i − q_j|).
The “rules of the game”:
Start from (0,0) and end in (D−1,D−1)
Take one step at a time
At each step, move only by increasing i, j, or both (i.e., never go back! “Jumps” are not allowed!)
Sum all the distances you have found in the “warping path”
Example (D = 6), with s = (1, 2, 5, 4, 3, 7) and q = (2, 3, 2, 1, 3, 4); matrix d, rows indexed by s (top to bottom: 7, 3, 4, 5, 2, 1), columns by q:

        q:   2    3    2    1    3    4
s=7:        25   16   25   36   16    9
s=3:         1    0    1    4    0    1
s=4:         4    1    4    9    1    0
s=5:         9    4    9   16    4    1
s=2:         0    1    0    1    1    4
s=1:         1    4    1    0    4    9

L2(s, q) = √29
How to compute the DTW (2)
The figure shows a possible warping path w over the d matrix, whose “cost” is 21; the “Euclidean path” moves only along the main diagonal, and costs 29.
The DTW is the minimum cost among all the warping paths. But the number of paths is exponential in D! Ok, but we can use dynamic programming, with complexity O(D²).
How to compute the DTW (3)
From the d matrix, incrementally build a new matrix WP, whose elements wp_{i,j} are recursively defined as:

wp_{i,j} = d_{i,j} + min{ wp_{i−1,j}, wp_{i,j−1}, wp_{i−1,j−1} }

For the running example this yields (rows again indexed by s, top to bottom: 7, 3, 4, 5, 2, 1):

        q:   2    3    2    1    3    4
s=7:        40   22   31   43   24   15
s=3:        15    6    7   11    8    6
s=4:        14    6    9   18    8    5
s=5:        10    5   11   18    7    5
s=2:         1    2    2    3    4    8
s=1:         1    5    6    6   10   19

Then set dDTW(s,q) = √(wp_{D−1,D−1})
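The recursion above translates into a few lines of dynamic programming. On the running example the minimum warping cost is 15 (the bottom-right WP entry), lower than both the sample path’s 21 and the Euclidean path’s 29:

```python
def dtw(s, q):
    """DTW via dynamic programming with step cost d[i][j] = (s_i - q_j)^2:
    wp[i][j] = d[i][j] + min(wp[i-1][j], wp[i][j-1], wp[i-1][j-1]).
    Returns sqrt(wp[D-1][D-1]); s and q may have different lengths."""
    n, m = len(s), len(q)
    INF = float("inf")
    wp = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = (s[i] - q[j]) ** 2
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(wp[i - 1][j] if i > 0 else INF,
                           wp[i][j - 1] if j > 0 else INF,
                           wp[i - 1][j - 1] if i > 0 and j > 0 else INF)
            wp[i][j] = d + best
    return wp[n - 1][m - 1] ** 0.5

s = [1, 2, 5, 4, 3, 7]
q = [2, 3, 2, 1, 3, 4]
d_dtw = dtw(s, q)  # sqrt(15)
```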
Back to the shape context: sample queries [BCP05]
[Figure: a query image and the contours retrieved from a DB of 1100 objects’ contours; R = relevant (same type of fish)]
This is not the whole story…
Of course, many other feature models (and corresponding distance functions) have been defined in the literature for MM data; please refer to [SWS+00, LZL+07, DJL+08] for detailed pointers. This was just a way to provide some concrete examples of MM features and of distance functions for comparing them!
Note that, besides “generic” features, any specific image domain/application needs to extract and manage specific features, which in general require much more sophisticated tools than the ones we have seen (e.g., face/fingerprint recognition).
Nonetheless, what is important to stress is that the problem of how to search in large image DBs remains (almost) the same!