lecture 9: multimedia content - cse.unsw.edu.aucs9519/lecture_notes_06/l9_comp9519_4in… ·...

Lecture 9: Multimedia Content Description (2)

Dr Jing Chen
NICTA & CSE UNSW

CS9519 Multimedia Systems
S2 2006

COMP9519 Multimedia Systems – Lecture 9 – Slide 2 – J. Chen

Last week's lecture …
Why describe multimedia content?
  Explosion in the sources of digital media content
  Large collections of media items
Problem?
  How to search and discover multimedia content?
  How to index long video and audio sequences?
  How to browse content more efficiently?
Application cases
Content description standard: MPEG-7
  Definition
  Goal: interoperability

COMP9519 Multimedia Systems – Lecture 9 – Slide 3 – J. Chen

Acknowledgement

Thanks to Dr. Jack Yu for providing the initial version of the lecture slides

COMP9519 Multimedia Systems – Lecture 9 – Slide 4 – J. Chen

Outline
Introduction
Color features
  Color and color spaces
  Histograms and similarity metrics
  Color descriptors
Texture features
Shape features
Motion features (next lecture)

COMP9519 Multimedia Systems – Lecture 9 – Slide 5 – J. Chen

Example

COMP9519 Multimedia Systems – Lecture 9 – Slide 6 – J. Chen

Visual features
Why visual features?
  Manual labeling is subjective and time consuming
  It is difficult to describe content completely by text
What visual features?
  Extractable from image/video
  Learned from the human visual system
Mathematical representation
  M pixels in R^3 (color channels) --> N-dimensional feature vectors
  E.g., color histogram: 640*480 pixels in R^3 -> 40-d vectors (bins)
  N << M; the dimension N is usually small

COMP9519 Multimedia Systems – Lecture 9 – Slide 7 – J. Chen

Good visual features
  Compactness of the representation
  Discriminative power
  Invariance: occlusion, shift, rotation, lighting change, etc.
  Complexity

COMP9519 Multimedia Systems – Lecture 9 – Slide 8 – J. Chen

Popular visual features
Color
  Color histogram
  Color moments
  Dominant color
Texture: structural and statistical
  Edge histogram
  Tamura features
Shape: boundaries of objects
Motion: camera motion and object motion

COMP9519 Multimedia Systems – Lecture 9 – Slide 9 – J. Chen

Outline
Introduction
Color features
  Color and color spaces
  Histograms and similarity metrics
  Color descriptors
Texture features
Shape features
Motion features
Content search examples with features

COMP9519 Multimedia Systems – Lecture 9 – Slide 10 – J. Chen

Color

Ref: Gonzalez and Woods, Digital Image Processing

COMP9519 Multimedia Systems – Lecture 9 – Slide 11 – J. Chen

Primary colors
Owing to the structure of the human visual system, all colors are seen as variable combinations of the three primary colors: Red, Green and Blue

COMP9519 Multimedia Systems – Lecture 9 – Slide 12 – J. Chen

RGB color space
A color space is a 3-D coordinate system, and a subspace within that system, in which each color is represented by a single point via its coordinates
RGB is the most commonly used color space

COMP9519 Multimedia Systems – Lecture 9 – Slide 13 – J. Chen

HSV Colour Space
The hue (H) represents the dominant spectral component: color in its pure form, as in green, red, or yellow
Saturation (S) refers to relative purity, or the amount of white light mixed with a hue
The value (V) corresponds to the brightness of the color
Why HSV colour space?
  Closer to perceptually uniform than RGB: geometric distance is more consistent with perceptual distance
  More natural to humans: more meaningful, easier to work with

HSV color space as a cylindrical object

COMP9519 Multimedia Systems – Lecture 9 – Slide 14 – J. Chen

RGB to HSV conversion
V = max(r,g,b)
S = (max(r,g,b) - min(r,g,b)) / max(r,g,b)
H depends on which of r, g, b is the maximum

    #include <math.h>

    /* Theoretically, hue 0 (pure red) is identical to hue 6 in these transforms.
       Pure red always maps to 6 in this implementation. Therefore UNDEFINED can be
       defined as 0 in situations where only unsigned numbers are desired. */
    #define UNDEFINED 0.0f
    #define RETURN_HSV(h, s, v) {HSV.H = h; HSV.S = s; HSV.V = v; return HSV;}

    typedef struct {float R, G, B;} RGBType;
    typedef struct {float H, S, V;} HSVType;

    /* RGB are each on [0, 1]. S and V are returned on [0, 1] and H is returned
       on [0, 6]. Exception: H is returned UNDEFINED if S == 0. */
    HSVType RGB_to_HSV(RGBType RGB) {
        float R = RGB.R, G = RGB.G, B = RGB.B, v, x, f;
        int i;
        HSVType HSV;
        x = fminf(R, fminf(G, B));   /* smallest channel */
        v = fmaxf(R, fmaxf(G, B));   /* largest channel = value */
        if (v == x) RETURN_HSV(UNDEFINED, 0, v);   /* grey: no hue */
        f = (R == x) ? G - B : ((G == x) ? B - R : R - G);
        i = (R == x) ? 3 : ((G == x) ? 5 : 1);
        RETURN_HSV(i - f / (v - x), (v - x) / v, v);
    }

COMP9519 Multimedia Systems – Lecture 9 – Slide 15 – J. Chen

Distance between two color points in HSV color space
Given two colors (h1, s1, v1) and (h2, s2, v2), where h, s and v are in the range [0,1], the distance between these two colors is: [formula shown as a figure on the slide]

COMP9519 Multimedia Systems – Lecture 9 – Slide 16 – J. Chen

HMMD color space
Five parameters
  Hue: the same as defined for HSV
  Max and Min: the maximum and minimum among the R, G, and B values; blackness and whiteness respectively
  Diff = Max – Min; colorfulness
  Sum = (Max+Min); brightness

Three parameters, Hue, Max and Min (or Hue, Diff and Sum), are enough to describe the color space

Adopted in MPEG-7; used in the color structure descriptor (CSD)

Advantage: close to perceptually uniform

[Figure: HMMD color space, labeled White Color, Black Color, Max, Min, Sum, Diff and Hue]

COMP9519 Multimedia Systems – Lecture 9 – Slide 17 – J. Chen

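Distance between Two Color Points in HMMD
Suppose h is from 0 to 2π, min is from 0 to 1, max is from 0 to 1, diff is from 0 to sqrt(2)/2, and sum is from 0 to sqrt(2) (the defining equations were figures on the slide)

The distance between c1 and c2 is (the range of the distance value is 0 ~ 1)

$$\mathrm{dist}(c_1, c_2) = \sqrt{\frac{s^2 + d^2}{2}}$$

where the terms s and d were defined by equations shown as figures on the slide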

COMP9519 Multimedia Systems – Lecture 9 – Slide 18 – J. Chen

YCbCr color space
ITU-R BT.601 defines Y as the brightness (luma), Cb as blue minus luma (B-Y), and Cr as red minus luma (R-Y).
Y has a nominal range of 16-235 (black to white); Cb and Cr have a nominal range of 16-240, with 128 corresponding to zero.
YCbCr is defined to have been derived from gamma pre-corrected component RGB signals:
  Y  = (77/256)R + (150/256)G + (29/256)B
  Cb = -(44/256)R - (87/256)G + (131/256)B + 128
  Cr = (131/256)R - (110/256)G - (21/256)B + 128
  R = Y + 1.371(Cr - 128)
  G = Y - 0.698(Cr - 128) - 0.336(Cb - 128)
  B = Y + 1.732(Cb - 128)

COMP9519 Multimedia Systems – Lecture 9 – Slide 19 – J. Chen

YCbCr color space (continued)

If the 24-bit RGB data have a range of 0-255 (black to white), as commonly found in PCs, the following equations should be used to maintain the correct black and white levels:
  Y  = 0.257R + 0.504G + 0.098B + 16
  Cb = -0.148R - 0.291G + 0.439B + 128
  Cr = 0.439R - 0.368G - 0.071B + 128
  R = 1.164(Y - 16) + 1.596(Cr - 128)
  G = 1.164(Y - 16) - 0.813(Cr - 128) - 0.392(Cb - 128)
  B = 1.164(Y - 16) + 2.017(Cb - 128)
YCbCr represents color as brightness and two color difference signals, while RGB represents color as red, green and blue.
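The 0-255 RGB equations on this slide can be sketched in C as follows. This is only an illustration under stated assumptions: the type and function names (`Rgb`, `YCbCr`, `rgb_to_ycbcr`, `ycbcr_to_rgb`) are hypothetical, values are kept as doubles, and no rounding or clamping to 8 bits is applied.

```c
#include <assert.h>

/* Hypothetical helper types: components are plain doubles, not clamped. */
typedef struct { double r, g, b; } Rgb;      /* each expected on [0, 255] */
typedef struct { double y, cb, cr; } YCbCr;  /* Y nominally 16-235, Cb/Cr 16-240 */

/* Forward transform, using the coefficients listed on the slide. */
YCbCr rgb_to_ycbcr(Rgb in) {
    assert(in.r >= 0 && in.r <= 255);
    assert(in.g >= 0 && in.g <= 255);
    assert(in.b >= 0 && in.b <= 255);
    YCbCr out;
    out.y  =  0.257 * in.r + 0.504 * in.g + 0.098 * in.b + 16.0;
    out.cb = -0.148 * in.r - 0.291 * in.g + 0.439 * in.b + 128.0;
    out.cr =  0.439 * in.r - 0.368 * in.g - 0.071 * in.b + 128.0;
    return out;
}

/* Inverse transform, again with the slide's coefficients. */
Rgb ycbcr_to_rgb(YCbCr in) {
    Rgb out;
    out.r = 1.164 * (in.y - 16.0) + 1.596 * (in.cr - 128.0);
    out.g = 1.164 * (in.y - 16.0) - 0.813 * (in.cr - 128.0) - 0.392 * (in.cb - 128.0);
    out.b = 1.164 * (in.y - 16.0) + 2.017 * (in.cb - 128.0);
    return out;
}
```

Because the coefficients are rounded to three decimals, a round trip recovers the input only approximately; white (255, 255, 255) maps to Y of about 235 with Cb = Cr = 128.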

COMP9519 Multimedia Systems – Lecture 9 – Slide 20 – J. Chen

Outline
Introduction
Color features
  Color and color spaces
  Histograms and similarity metrics
  Color descriptors
Texture features
Shape features

COMP9519 Multimedia Systems – Lecture 9 – Slide 21 – J. Chen

Histograms
One-dimensional data distribution = set of (bin, frequency) pairs

Each bin has its associated attribute value

The set of bins partitions the feature (here the gray scale value) space

* Nuno Vasconcelos and Andrew Lippman

COMP9519 Multimedia Systems – Lecture 9 – Slide 22 – J. Chen

Color histograms
Partition the feature space into several bins
Represent the statistics of the number of pixels in each bin
Example: partition of RGB color space into 8 bins

Three types of histograms, depending on how we partition the color space
  Fixed binning
  Clustered binning
  Adaptive binning

COMP9519 Multimedia Systems – Lecture 9 – Slide 23 – J. Chen

Fixed binning
The same as scalar quantization of the color space
Use fixed-size binning of the color space (the same for all images)

* H.R.Wu
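As a small illustration of fixed binning, the sketch below builds the 8-bin RGB histogram mentioned earlier by splitting each channel in half. The function name `rgb_histogram8`, the interleaved 8-bit pixel layout and the normalization to fractions are assumptions made for the example, not something the slides specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_BINS = 8 };

/* pixels: interleaved 8-bit R,G,B triples (assumed layout).
   hist:   receives the fraction of pixels falling in each of the 8 bins. */
void rgb_histogram8(const uint8_t *pixels, size_t num_pixels,
                    double hist[NUM_BINS]) {
    assert(pixels != NULL && num_pixels > 0);
    for (int b = 0; b < NUM_BINS; ++b) hist[b] = 0.0;
    for (size_t k = 0; k < num_pixels; ++k) {
        const uint8_t *p = pixels + 3 * k;
        /* One bit per channel: 0 for values 0-127, 1 for values 128-255. */
        int bin = ((p[0] >> 7) << 2) | ((p[1] >> 7) << 1) | (p[2] >> 7);
        hist[bin] += 1.0;
    }
    for (int b = 0; b < NUM_BINS; ++b) hist[b] /= (double)num_pixels;
}
```

The bin boundaries are identical for every image, which is what makes histograms from different images directly comparable bin by bin.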

COMP9519 Multimedia Systems – Lecture 9 – Slide 24 – J. Chen

Adaptive binning
The same as VQ (vector quantization) of the color space, adapted to each image
Partitions the color space into irregularly sized bins to minimize the representation distortion incurred
Optimization algorithm: k-means (aka Lloyd algorithm)
  Applied to every image, so time consuming

* H.R.Wu

COMP9519 Multimedia Systems – Lecture 9 – Slide 25 – J. Chen

Clustered binning
1. Gather the accumulated statistical distribution of pixel values over a training set of images
2. Perform k-means clustering on the accumulated distribution using a pre-determined number of bins (N)
     Get N clusters with their respective centroids
     Similar to codebook generation in VQ
3. Given a new image, calculate the color histogram based on the codebook generated in step 2
     Assign color c to cluster i if the distance between c and the centroid of cluster i is <= the distance between c and all other N-1 cluster centroids
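The assignment rule in the last step can be sketched as a nearest-centroid search. The function name `nearest_centroid` and the use of squared Euclidean distance on 3-component colors are assumptions for illustration; the slides do not fix the distance function.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the codebook centroid closest to `color`.
   Squared Euclidean distance is assumed; ties go to the lowest index. */
size_t nearest_centroid(const double color[3],
                        const double centroids[][3], size_t n) {
    assert(n > 0);
    size_t best = 0;
    double best_d2 = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d2 = 0.0;
        for (int ch = 0; ch < 3; ++ch) {
            double diff = color[ch] - centroids[i][ch];
            d2 += diff * diff;
        }
        if (i == 0 || d2 < best_d2) {
            best = i;
            best_d2 = d2;
        }
    }
    return best;
}
```

Counting how many pixels map to each index, and dividing by the pixel count, then gives the clustered-binning histogram for the new image.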

COMP9519 Multimedia Systems – Lecture 9 – Slide 26 – J. Chen

Comparison of different binning methods

Computational complexity
  Fixed binning: very low
  Clustered: middle (k-means once for all images; then, for each image, assign colors to clusters, which is one part of a k-means iteration)
  Adaptive: high (k-means for each image individually)

Representation distortion
  Fixed binning > clustered > adaptive

COMP9519 Multimedia Systems – Lecture 9 – Slide 27 – J. Chen

Similarity metrics

Given two histograms (two feature vectors) I and J, how do we quantify the similarity between these two?

Distance is the reverse of similarity, defined as D(I,J) = f(I,J) where f is a distance function

COMP9519 Multimedia Systems – Lecture 9 – Slide 28 – J. Chen

Minkowski-form distance metric

$$D(I,J) = \left( \sum_i \left| I(x_i) - J(x_i) \right|^p \right)^{1/p}$$

where p = 1, 2 and ∞, and the corresponding D(I,J) is called the L1, L2 (also called Euclidean) and L∞ distance respectively

[Figure: example histograms I and J; horizontal axis: Bin (1 to 7), vertical axis: Weight (0 to 7)]

Example: L2(I,J) =21/2

COMP9519 Multimedia Systems – Lecture 9 – Slide 29 – J. Chen

Kullback-Leibler Divergence and Jeffrey Divergence

Kullback-Leibler Divergence

$$D_{KL}(I,J) = \sum_i I(x_i) \log \frac{I(x_i)}{J(x_i)}$$

  Information theory interpretation
  Measures how inefficient it is to code one histogram using the other as the code-book
  Non-symmetric and sensitive to histogram binning

Jeffrey Divergence

$$D_{JD}(I,J) = \sum_i \left( I(x_i) \log \frac{I(x_i)}{J(x_i)} + J(x_i) \log \frac{J(x_i)}{I(x_i)} \right)$$

  Empirically derived
  Symmetric
  Robust with respect to noise and the size of histogram bins

COMP9519 Multimedia Systems – Lecture 9 – Slide 30 – J. Chen

χ2 statistics

$$D(I,J) = \sum_i \frac{\left( I(x_i) - J(x_i) \right)^2}{\left( I(x_i) + J(x_i) \right) / 2}$$

  Statistical interpretation
  Measures how unlikely it is that one distribution was drawn from the population represented by the other
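A short C sketch of the χ2 statistic, with each squared bin difference divided by the mean of the two bin values. The function name is hypothetical, and skipping bins that are empty in both histograms is an assumption made here to avoid division by zero.

```c
#include <assert.h>
#include <stddef.h>

/* Chi-square statistic between two n-bin histograms with non-negative bins. */
double chi_square_distance(const double *I, const double *J, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        assert(I[i] >= 0.0 && J[i] >= 0.0);
        double mean = (I[i] + J[i]) / 2.0;
        if (mean == 0.0) continue;   /* both bins empty: no contribution */
        double diff = I[i] - J[i];
        sum += diff * diff / mean;
    }
    return sum;
}
```

With I = {0.5, 0.5} and J = {0.25, 0.75} the two terms are 0.0625/0.375 and 0.0625/0.625, giving about 0.2667.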

COMP9519 Multimedia Systems – Lecture 9 – Slide 31 – J. Chen

Bin-to-bin and cross-bin metrics
Minkowski-form, K-L Divergence, Jeffrey Divergence and χ2 statistics are all bin-to-bin metrics
  Do not consider similarity across bins

Cross-bin distance metrics
  Consider the ground distance between bins

* Y. Rubner, C. Tomasi, and L. J. Guibas, “The earth mover's distance as a metric for image retrieval,” Int Jour of Comp Vision, vol 40, no 2, pp 99-121, 2000.

COMP9519 Multimedia Systems – Lecture 9 – Slide 32 – J. Chen

Quadratic-form distance

$$D_{QF}(I,J) = (F_I - F_J)^T A \, (F_I - F_J)$$

where $A = [a_{ij}]$ is a similarity matrix, $a_{ij}$ denotes the similarity between bins i and j (derived from their ground distance), and $F_I$ and $F_J$ are vectors listing all bins in I and J

Some ground distance functions:

$$a_{ij} = 1 - d_{ij} / d_{max}$$

  where $d_{ij}$ is the L2 distance between bins i and j, and $d_{max}$ is the maximum $d_{ij}$

$$a_{ij} = \exp\left( -\sigma \, (d_{ij} / d_{max})^2 \right)$$

  where σ is a positive constant
  Faster roll-off of $a_{ij}$ as a function of $d_{ij}$
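The quadratic form can be evaluated directly with two nested loops. In this sketch the function name is hypothetical, the similarity matrix A is assumed to be stored row-major in a flat array, and the value is returned without a square root, matching the expression as written on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Computes (FI - FJ)^T A (FI - FJ) for n-bin histograms.
   A is an n x n similarity matrix in row-major order (assumed layout). */
double quadratic_form_distance(const double *FI, const double *FJ,
                               const double *A, size_t n) {
    assert(FI != NULL && FJ != NULL && A != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double di = FI[i] - FJ[i];
        for (size_t j = 0; j < n; ++j)
            total += di * A[i * n + j] * (FI[j] - FJ[j]);
    }
    return total;
}
```

With A equal to the identity this reduces to the squared L2 distance; when A marks two bins as fully similar, mass moved between those bins costs nothing, which is the cross-bin behaviour the bin-to-bin metrics lack.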

COMP9519 Multimedia Systems – Lecture 9 – Slide 33 – J. Chen

Earth Mover’s Distance (EMD)
Given two histograms, move masses (earth) from one histogram to the other while minimising the cost, measured as weight × ground distance

The two histograms are not required to have the same binning partition
Empirically developed first; a statistical interpretation was found later (Mallows distance)

COMP9519 Multimedia Systems – Lecture 9 – Slide 34 – J. Chen

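EMD – mathematical representation
Given two histograms $P = \{(x_1, p_1), \ldots, (x_m, p_m)\}$ and $Q = \{(x_1, q_1), \ldots, (x_n, q_n)\}$, where $x_i$ is the centre of cluster i and $p_i$ is the number of points in the cluster, the EMD is derived by first solving for an optimal flow $F = \{f_{ij}\}$ that minimizes the total transport cost, subject to constraints on the flow. $d_{ij}$ is the ground distance between $x_i$ and $x_j$. The EMD is then defined from the optimal flow.
[The cost function, the constraints and the EMD definition were shown as figures on the slide]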
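For reference, one standard statement of this optimization, following the formulation in the Rubner et al. paper cited earlier in the lecture, is given below. It is included as an assumption about what the slide's figures showed; the symbol WORK and the exact form of the constraints come from that paper, not from the recovered slide text.

```latex
% Transport cost to be minimized over the flow F = {f_ij}
\mathrm{WORK}(P, Q, F) = \sum_{i=1}^{m} \sum_{j=1}^{n} d_{ij} \, f_{ij}

% Constraints on the flow
f_{ij} \ge 0, \qquad 1 \le i \le m, \; 1 \le j \le n
\sum_{j=1}^{n} f_{ij} \le p_i, \qquad 1 \le i \le m
\sum_{i=1}^{m} f_{ij} \le q_j, \qquad 1 \le j \le n
\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min\left( \sum_{i=1}^{m} p_i, \; \sum_{j=1}^{n} q_j \right)

% EMD: optimal work normalized by the total flow
\mathrm{EMD}(P, Q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} d_{ij} \, f_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}}
```

The last constraint forces as much mass as possible to be moved, and the normalization by total flow is what allows partial matches between histograms of different total mass.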

COMP9519 Multimedia Systems – Lecture 9 – Slide 35 – J. Chen

Comparison of histogram similarity metrics

                           Lp      χ2      KL      JD      QF    EMD
Symmetrical                yes     yes     no      yes     yes   yes
Computational complexity   medium  medium  medium  medium  high  high
Ground distance            no      no      no      no      yes   yes
Adaptive binning support   no      no      no      no      yes   yes
Partial matches            no      no      no      no      no    yes

Accuracy in image retrieval: depends on the application; χ2 usually gives reasonably good results

* Y. Rubner et al, "Empirical evaluation of dissimilarity measures for color and texture," Comput Vis Image Underst, vol 84, no 1, pp 25-43, 2001.

COMP9519 Multimedia Systems – Lecture 9 – Slide 36 – J. Chen

Outline
Introduction
Color features
  Color and color spaces
  Histograms and similarity metrics
  Color descriptors
Texture features
Shape features

COMP9519 Multimedia Systems – Lecture 9 – Slide 37 – J. Chen

Color descriptors in MPEG-7

COMP9519 Multimedia Systems – Lecture 9 – Slide 38 – J. Chen

Dominant color descriptor (DCD)
In the category of adaptive histograms

RGB is the default color space

Colors in an image are represented by N dominant color clusters

$$F = \{ (c_i, p_i, v_i), s \}, \quad i = 1, 2, \ldots, N$$

where $c_i$ is the color of a cluster, $p_i$ is the fraction of the image's pixels that belong to the cluster, the optional parameter $v_i$ is the variation of the color values of the pixels in the cluster, and s represents the overall spatial coherency of the dominant colors in the image

COMP9519 Multimedia Systems – Lecture 9 – Slide 39 – J. Chen

Examples of high and low spatial coherency of color

Low High

* MPEG-7

COMP9519 Multimedia Systems – Lecture 9 – Slide 40 – J. Chen

Extraction of dominant colors
Using the Generalized Lloyd Algorithm (aka k-means)
Minimizing the distortion

$$D = \sum_i \sum_k h(k) \, \| x(k) - c_i \|^2, \quad x(k) \in C_i, \quad i = 1, \ldots, N$$

where $c_i$ is the centroid of cluster $C_i$, x(k) is the color at pixel k, and h(k) is the perceptual weight for pixel k, in the form of an exponential function, to account for the fact that the HVS is more sensitive to changes in smooth regions than in textured regions (see Y. Deng, S. Kenney, M.S. Moore and B.S. Manjunath, "Peer group filtering and perceptual color image quantization", ISCAS'99, Orlando, FL, vol 4, pp. 21-24, June 1999.)

Update rule during optimization:

$$c_i = \frac{\sum_k h(k) \, x(k)}{\sum_k h(k)}, \quad x(k) \in C_i$$

Difference from normal GLA/k-means: the perceptual weighting h(k)
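The weighted centroid update can be sketched in C as below. The function name, the per-pixel `label` array holding each pixel's current cluster index, and the choice to leave a centroid unchanged when its cluster has zero total weight are all assumptions for illustration; computing the perceptual weights h(k) themselves is outside this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* One centroid-update step of the perceptually weighted GLA:
   c_i = sum_k h(k) x(k) / sum_k h(k), over pixels k assigned to cluster i. */
void update_weighted_centroids(const double (*x)[3], const double *h,
                               const size_t *label, size_t num_pixels,
                               double (*c)[3], size_t num_clusters) {
    assert(x != NULL && h != NULL && label != NULL && c != NULL);
    for (size_t i = 0; i < num_clusters; ++i) {
        double weight = 0.0;
        double acc[3] = {0.0, 0.0, 0.0};
        for (size_t k = 0; k < num_pixels; ++k) {
            if (label[k] != i) continue;
            weight += h[k];
            for (int ch = 0; ch < 3; ++ch) acc[ch] += h[k] * x[k][ch];
        }
        if (weight == 0.0) continue;   /* empty cluster: keep old centroid */
        for (int ch = 0; ch < 3; ++ch) c[i][ch] = acc[ch] / weight;
    }
}
```

Setting every h(k) to 1 recovers the ordinary k-means mean; larger weights pull the centroid toward the pixels the visual system is more sensitive to.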

COMP9519 Multimedia Systems – Lecture 9 – Slide 41 – J. Chen

Similarity measurement for DCD
DCD is essentially an adaptive histogram, so Lp, χ2, KL, JD etc. are not suitable
Quadratic-form distance is adopted in MPEG-7

$$D_{QF}(I,J) = (F_I - F_J)^T A \, (F_I - F_J)$$

Given two DCDs, $F_1$ and $F_2$,

$$D^2(F_1, F_2) = \sum_{i=1}^{N_1} p_{1i}^2 + \sum_{j=1}^{N_2} p_{2j}^2 - \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} 2 \, a_{1i,2j} \, p_{1i} \, p_{2j}$$

where p is the percentage, and a is the similarity coefficient between two colors (derived from their ground distance)

EMD may be applied here

COMP9519 Multimedia Systems – Lecture 9 – Slide 42 – J. Chen

Scalable color descriptor
Color histogram in HSV color space
Encoded by a Haar wavelet transform
  Sum coefficients: [1 1]
  Diff coefficients: [1 -1]
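One level of the Haar transform with these sum and difference coefficients can be sketched as follows. The function name `haar_step` and the pairing of adjacent bins are assumptions for illustration; the descriptor's actual bin ordering, quantization and number of levels are not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* One Haar level over an even-length histogram:
   sums[k]  = in[2k] + in[2k+1]   (coefficients [1  1])
   diffs[k] = in[2k] - in[2k+1]   (coefficients [1 -1]) */
void haar_step(const double *in, size_t n, double *sums, double *diffs) {
    assert(n % 2 == 0);
    for (size_t k = 0; k < n / 2; ++k) {
        sums[k]  = in[2 * k] + in[2 * k + 1];
        diffs[k] = in[2 * k] - in[2 * k + 1];
    }
}
```

Applying the step again to `sums` gives a coarser histogram each time, which is what makes the representation scalable: keeping only the coarse sums yields a short descriptor, and adding difference coefficients refines it.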

COMP9519 Multimedia Systems – Lecture 9 – Slide 43 – J. Chen

Scalable color descriptor diagram

COMP9519 Multimedia Systems – Lecture 9 – Slide 44 – J. Chen

Outline
Introduction
Color features
  Color and color spaces
  Histograms and similarity metrics
  Color descriptors
Texture features
Shape features

COMP9519 Multimedia Systems – Lecture 9 – Slide 45 – J. Chen

Texture
What is texture?
  Has structure or a repetitious pattern, e.g., checkered
  Has a statistical pattern, e.g., grass, sand, rocks

COMP9519 Multimedia Systems – Lecture 9 – Slide 46 – J. Chen

Brodatz textures

COMP9519 Multimedia Systems – Lecture 9 – Slide 47 – J. Chen

Why texture?
  Application to satellite images, medical images
  Describes contents of real-world images, e.g., clouds, fabrics, surfaces, wood, stone

Challenging issues
  Rotation and scale invariance (3D)
  Segmentation/extraction of texture regions from images
  Texture in noise

COMP9519 Multimedia Systems – Lecture 9 – Slide 48 – J. Chen

Some approaches to texture features
Fourier Domain Energy Distribution
  Angular features (directionality)
  Radial features (coarseness)
[The defining equations for both features were shown as figures on the slide]

COMP9519 Multimedia Systems – Lecture 9 – Slide 49 – J. Chen

MPEG-7 texture descriptors

Homogeneous texture descriptor
Texture browsing descriptor
Edge histogram (non-homogeneous texture descriptor)

COMP9519 Multimedia Systems – Lecture 9 – Slide 50 – J. Chen

Homogeneous Texture Descriptor (HTD)

Partitioning the frequency domain into 30 channels (modeled by a 2D Gabor function)

Computing the energy and energy deviation for each channel

Computing the mean and standard deviation of the frequency coefficients

F = {fDC, fSD, e1,…, e30, d1,…, d30}

COMP9519 Multimedia Systems – Lecture 9 – Slide 51 – J. Chen

Channels used in computing the HTD

Frequency plane partition is uniform along the angular direction (30°) and non-uniform along the radial direction (on an octave scale)
Can be implemented by 2D Fourier Transform

[Figure: partition of the frequency plane (ω) into 30 channels C_i, with channel numbers i = 1 to 30]

COMP9519 Multimedia Systems – Lecture 9 – Slide 52 – J. Chen

Gabor function
On top of the feature channels, the following 2D Gabor (modulated Gaussian) function is applied to each individual channel

$$G_{P_{s,r}}(\omega, \theta) = \exp\left[ \frac{-(\omega - \omega_s)^2}{2 \sigma_{\rho_s}^2} \right] \cdot \exp\left[ \frac{-(\theta - \theta_r)^2}{2 \sigma_{\theta_r}^2} \right]$$

This is equivalent to weighting the Fourier transform coefficients of the image with a Gaussian centered at the frequency channels as defined above.

Each channel filters a specific type of texture (!!!)

COMP9519 Multimedia Systems – Lecture 9 – Slide 53 – J. Chen

Demo
http://nayana.ece.ucsb.edu/M7TextureDemo/Demo/client/M7TextureDemo.html

COMP9519 Multimedia Systems – Lecture 9 – Slide 54 – J. Chen

Edge histogram (I): sub-images
Images are divided into 16 non-overlapping sub-images.

* Manjunath paper

COMP9519 Multimedia Systems – Lecture 9 – Slide 55 – J. Chen

Edge histogram (II): edge detection filters
Edges in the sub-images are categorized into five types: vertical, horizontal, 45-degree diagonal, 135-degree diagonal and non-directional edges.

Filters for edge detection (applied to 2x2 blocks)

a) vertical edge  b) horizontal edge  c) 45 degree edge  d) 135 degree edge  e) non-directional edge
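The classification of a 2x2 block can be sketched as below. The filter coefficients are the ones commonly quoted for the MPEG-7 edge histogram descriptor and are an assumption here, since the slide's filter figure is not reproduced in this transcript; the names, the block layout (top-left, top-right, bottom-left, bottom-right) and the `threshold` parameter are likewise hypothetical.

```c
#include <assert.h>
#include <math.h>

typedef enum {
    EDGE_NONE = -1, EDGE_VERTICAL, EDGE_HORIZONTAL,
    EDGE_DIAG_45, EDGE_DIAG_135, EDGE_NON_DIRECTIONAL
} EdgeType;

#define SQRT2 1.41421356237309504880

/* a[4]: mean intensities of the four sub-blocks, in the order
   top-left, top-right, bottom-left, bottom-right (assumed layout). */
EdgeType classify_block(const double a[4], double threshold) {
    /* Assumed coefficients, as commonly given for the MPEG-7 descriptor. */
    static const double filters[5][4] = {
        { 1.0,  -1.0,   1.0,  -1.0 },   /* vertical        */
        { 1.0,   1.0,  -1.0,  -1.0 },   /* horizontal      */
        { SQRT2, 0.0,   0.0,  -SQRT2 }, /* 45 degree       */
        { 0.0,   SQRT2, -SQRT2, 0.0 },  /* 135 degree      */
        { 2.0,  -2.0,  -2.0,   2.0 }    /* non-directional */
    };
    assert(threshold >= 0.0);
    int best = EDGE_NONE;
    double best_mag = 0.0;
    for (int f = 0; f < 5; ++f) {
        double resp = 0.0;
        for (int k = 0; k < 4; ++k) resp += filters[f][k] * a[k];
        if (fabs(resp) > best_mag) { best_mag = fabs(resp); best = f; }
    }
    /* Blocks whose strongest response is below the threshold carry no edge. */
    return (best_mag < threshold || best == EDGE_NONE) ? EDGE_NONE : (EdgeType)best;
}
```

Counting the returned edge types over all blocks of one sub-image gives that sub-image's five-bin local edge histogram.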

* Manjunath paper & MPEG-7

COMP9519 Multimedia Systems – Lecture 9 – Slide 56 – J. Chen

Edge histogram (III)
For each sub-image, a local edge histogram is constructed to represent the distribution of the five types of edges in the sub-image

In total there are 16x5 = 80 local edge histogram bins. A global edge histogram (5 bins) and semi-global edge histograms (65 bins in total) are computed from the 80 local histogram bins.

For the global edge histogram, the five types of edge distributions for all subimages are accumulated. For the semi-global edge histograms, subsets of subimages are grouped.

L1 norm of the distance of local, semi-global and global histograms between two frames is adopted as the distance function.

The distance of the global histogram difference is multiplied by 5 given the number of bins of the global histogram is much smaller than that of local and semi-global histograms.

COMP9519 Multimedia Systems – Lecture 9 – Slide 57 – J. Chen

Outline
Introduction
Color features
  Color and color spaces
  Histograms and similarity metrics
  Color descriptors
Texture features
Shape features

COMP9519 Multimedia Systems – Lecture 9 – Slide 58 – J. Chen

Examples of contour- and region-based shape similarity

Horizontal bar: similar shapes by region-based similarity
Vertical bar: similar shapes by contour-based similarity

COMP9519 Multimedia Systems – Lecture 9 – Slide 59 – J. Chen

MPEG-7 shape features
2-dimensional (2D)
  Region Shape descriptor
    The distribution of all pixels within a region
  Contour Shape descriptor
    Shape properties of a contour of an object
3-dimensional (3D)
  Shape3D descriptor
    Intrinsic shape characterization of 3D mesh models
  MultipleView descriptor
    Combined with a 2D shape descriptor for 3D shape description

COMP9519 Multimedia Systems – Lecture 9 – Slide 60 – J. Chen

Region shape descriptor
Expresses pixel distribution within a 2-D object region
  Based on both boundary and internal pixels
Uses a complex 2D Angular Radial Transformation (ART)

[Figure: real parts of the 2-D basis functions, whose origins are at the centers of each image]

Advantages:
  It gives a compact and efficient way of describing properties of multiple disjoint regions simultaneously
  The descriptor is robust to segmentation noise

COMP9519 Multimedia Systems – Lecture 9 – Slide 61 – J. Chen

Contour shape descriptor
Defines a closed contour of a 2D object or region in an image or video sequence
Examples of shapes where a contour-based descriptor is applicable
Advantages:
  Very compact representation (below 14 bytes in size on average)
  Can find semantically similar shapes (Fig (c))
  Robust to significant non-rigid deformations (Fig (d))
  Robust to distortions in the contour due to perspective transformation (Fig (e))

Uses the Curvature Scale Space (CSS) representation

COMP9519 Multimedia Systems – Lecture 9 – Slide 62 – J. Chen

Curvature Scale-Space (CSS)
Finds curvature zero-crossing points of the shape’s contour (key points)
  Zero-crossing points of the contour curvature function separate concave and convex parts of the contour
Reduces the number of key points iteratively, by applying Gaussian smoothing
The positions of key points (horizontal coordinates) are expressed relative to the length of the contour curve
The vertical coordinates (y_css) correspond to the amount of filtering applied

* MPEG-7

COMP9519 Multimedia Systems – Lecture 9 – Slide 63 – J. Chen

Concave and convex functions
Function f is concave if the line segment joining any two points on the graph of f is never above the graph; f is convex if the line segment joining any two points on the graph is never below the graph

Convex Concave

COMP9519 Multimedia Systems – Lecture 9 – Slide 64 – J. Chen

Application – trademark retrieval

COMP9519 Multimedia Systems – Lecture 9 – Slide 65 – J. Chen

Review
Introduction
Color features
  Color and color spaces
  Histograms and similarity metrics
  Color descriptors
Texture features
Shape features
Motion features (next week)

COMP9519 Multimedia Systems – Lecture 9 – Slide 66 – J. Chen

Key References

B.S. Manjunath, Philippe Salembier, Thomas Sikora, Introduction to MPEG-7: Multimedia Content Description Interface, John Wiley & Sons, Inc., New York, NY, 2002 (Book)
MPEG-7 visual standard
T. Sikora, “The MPEG-7 Visual Standard for Content Description - an overview”, IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 696-702, June 2001
B.S. Manjunath, J.-R. Ohm, V.V. Vasudevan, and A. Yamada, “MPEG-7 Color and Texture Descriptors”, IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 703-715, June 2001
M. Bober, “MPEG-7 Visual Shape Descriptors”, IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 716-719, June 2001