lecture 01 internet video search
Internet Video Search
Arnold W.M. Smeulders & Cees Snoek
CWI & UvA
Overview Image and Video Search
Lecture 1 visual search, the problem
color-spatial-textural-temporal features
measures and invariances
Lecture 2 descriptors
words and similarity
where and what
Lecture 3 data and metadata
performance
speed
1 Visual search, the problem
A brief history of television
From broadcasting to narrowcasting
…to thin casting
~1955 ~1985 ~2005
2008
2010
Any other purpose than tv?
Surveillance: to alert events
Forensics: to find evidence / to protect against misuse
Social media: to sort responses
Safety: to prevent terrorism
Agriculture: to sort fruit
News: to reuse archived footage
Business: to have efficient access
eBusiness: to mine consumer data
Science: to understand visual cognition
Family: “I have it somewhere on this disk”
How big? The answer from the web
The web is video
…as of May 2011
How big? The answer from
Yearly influx
15.000 hours of video
1 Pbyte per year
Next 6 years
137.200 hours of video
22.510 hours of film
2.900.000 photos
How big? Answer from the archive
Crowd-given search
What others say is in the video. We focus on what digital content says is in the video.
Problem 1: The variation
So many images of one thing: illumination, background, occlusion, viewpoint, …
This is the sensory gap.
Multimedia Archives
Suit Basketball
Table
Tree
US flag
Aircraft
Dog Tennis Mountain
Fire
Building
1101011011011011011011001101011011111001101011011111
Problem 2: What defines things?
Language
Machine
Problem 3: The many things
This is the model gap
Problem 4: The story of a video
This is the narrative gap
Problem 5: No shared intuition
This is the query-context gap
Query-by-keyword
Query-by-concept
Query-by-examples
What sources Query
Prediction
Find shots of people shaking hands
System 1: histogram matching
Histogram as a summary of color characteristics.
Swain and Ballard, IJCV 1991
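Histogram matching in the spirit of Swain and Ballard can be sketched as follows. This is a minimal illustration, not the system shown on the slide: the function names, the 4-bins-per-channel quantization, and the toy pixel lists are assumptions. The score is the sum of bin-wise minima divided by the model histogram size.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Count pixels per quantized (r, g, b) bin; pixels are 0..255 triples."""
    step = 256 // bins_per_channel  # assumed uniform quantization
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)

def histogram_intersection(image_hist, model_hist):
    """Sum of bin-wise minima, normalized by the number of model pixels."""
    overlap = sum(min(image_hist[b], count) for b, count in model_hist.items())
    return overlap / sum(model_hist.values())

# Toy data (hypothetical): a mostly red model and a half red, half green query.
model = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 4
query = [(240, 20, 5)] * 5 + [(20, 240, 20)] * 5

score = histogram_intersection(color_histogram(query), color_histogram(model))
print(score)  # 5 of the 10 model pixels are matched, so 0.5
```

The histogram discards all spatial layout, which is why such simple means give visually simple results.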
1 Conclusion
As content grows, many applications of image search.
Deep cognitive and computer science problems.
With simple means one gets visually simple results.
2 Features
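Light source: e(λ)
Object: ρ(λ)
Result (source · reflection): e(λ) ρ(λ)
(R,G,B):
\[
\begin{pmatrix} R \\ G \\ B \end{pmatrix} =
\begin{pmatrix}
\int_\lambda e(\lambda)\,\rho(\lambda)\,f_R(\lambda)\,d\lambda \\
\int_\lambda e(\lambda)\,\rho(\lambda)\,f_G(\lambda)\,d\lambda \\
\int_\lambda e(\lambda)\,\rho(\lambda)\,f_B(\lambda)\,d\lambda
\end{pmatrix}
\]
\[
\begin{pmatrix} r \\ g \\ b \end{pmatrix} =
\begin{pmatrix} R/(R+G+B) \\ G/(R+G+B) \\ B/(R+G+B) \end{pmatrix}
\]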
(r, g, b) in (R,G,B)
Independent of shadow!
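Why normalized (r, g, b) is independent of shadow can be checked numerically. The sketch below assumes that a shadow only scales the intensity of the light and leaves its color unchanged, so (R, G, B) is multiplied by one factor. The function name and the pixel values are illustrative.

```python
def normalized_rgb(R, G, B):
    """Divide each channel by R + G + B; the common intensity factor cancels."""
    total = R + G + B
    if total == 0:
        return (0.0, 0.0, 0.0)  # assumption: map black to zeros
    return (R / total, G / total, B / total)

lit = (120, 80, 40)
shadowed = tuple(0.25 * c for c in lit)  # same surface, a quarter of the light

print(normalized_rgb(*lit))
print(normalized_rgb(*shadowed))  # same chromaticity as the lit pixel
```

The price is one lost dimension: r + g + b = 1, so intensity can no longer discriminate.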
The sensation of spectra
Hue: dominant wavelength λ(EH)
Saturation: purity of the colour (EH - EW)/EH
Intensity: brightness of the colour EW
“white” “green”
EH
EW
Human perception combines (R,G,B) response of the eye in opponent colors
Maximizes perceived contrast!
The sensation of spectra: opponent
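\[
\begin{pmatrix} \text{Luminance} \\ \text{Blue-Yellow} \\ \text{Purple-Green} \end{pmatrix} =
\begin{pmatrix} R + G + B \\ \tfrac{1}{4}(2B - R - G) \\ \tfrac{1}{2}(R - G) \end{pmatrix}
\]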
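A minimal sketch of the opponent transform, assuming the three channels are Luminance = R + G + B, Blue-Yellow = (2B - R - G)/4 and Purple-Green = (R - G)/2 as read from the slide. The function name is illustrative.

```python
def opponent(R, G, B):
    """Map (R, G, B) to (luminance, blue-yellow, purple-green)."""
    luminance = R + G + B
    blue_yellow = (2 * B - R - G) / 4
    purple_green = (R - G) / 2
    return luminance, blue_yellow, purple_green

print(opponent(100, 100, 100))  # gray: both chromatic channels are zero
print(opponent(0, 0, 200))      # pure blue: positive blue-yellow response
```

A gray pixel carries energy only in the luminance channel, so the two chromatic channels respond to color contrast alone.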
Color Gaussian space
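\[
\begin{pmatrix} E \\ E_\lambda \\ E_{\lambda\lambda} \end{pmatrix} =
\begin{pmatrix} 0.06 & 0.63 & 0.27 \\ 0.30 & 0.04 & -0.35 \\ 0.34 & -0.60 & 0.17 \end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
\]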
Geusebroek PAMI 2002
Maximizes information content!
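The Gaussian color space is a linear map of (R, G, B). The sketch below assumes the 3x3 matrix with rows (0.06, 0.63, 0.27), (0.30, 0.04, -0.35), (0.34, -0.60, 0.17), as read from the slide. The names are illustrative.

```python
# Rows produce E, E_lambda and E_lambdalambda from (R, G, B).
GAUSSIAN_COLOR_MATRIX = [
    (0.06, 0.63, 0.27),
    (0.30, 0.04, -0.35),
    (0.34, -0.60, 0.17),
]

def gaussian_color(R, G, B):
    """Return (E, E_lambda, E_lambdalambda) for one pixel."""
    return tuple(a * R + b * G + c * B for a, b, c in GAUSSIAN_COLOR_MATRIX)

E, E_l, E_ll = gaussian_color(1.0, 1.0, 1.0)
print(E, E_l, E_ll)  # for gray, nearly all energy ends up in E
```

The first row sums to nearly 1 and the other two to nearly 0, so E behaves like intensity while E_λ and E_λλ measure spectral differences.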
(E0,Eλ,Eλλ)-pdf
Color Gaussian space (R,G,B)-pdf
Matter body reflectance in (R,G,B)
Taxonomy of diff-image structure T-junction
These junctions later bring recognition
Corner
Junction
Highlight
The 2D Gabor function is:
\[
h(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}\, e^{2\pi j (ux + vy)}
\]
Tuning parameters: u, v, σ
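A sampled Gabor kernel can be sketched as a Gaussian envelope of width σ times a complex wave with frequency (u, v). This assumes the standard form h(x, y) = 1/(2πσ²) exp(-(x² + y²)/(2σ²)) exp(2πj(ux + vy)). The function names, the kernel radius, and the parameter values are illustrative.

```python
import cmath
import math

def gabor(x, y, u, v, sigma):
    """Complex Gabor response at (x, y): Gaussian envelope times carrier."""
    envelope = math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
    carrier = cmath.exp(2j * math.pi * (u * x + v * y))
    return envelope * carrier

def gabor_kernel(radius, u, v, sigma):
    """Square kernel of size (2 * radius + 1), indexed as kernel[row][col]."""
    span = range(-radius, radius + 1)
    return [[gabor(x, y, u, v, sigma) for x in span] for y in span]

# Horizontal frequency of a quarter cycle per pixel, envelope width 2 pixels.
kernel = gabor_kernel(radius=6, u=0.25, v=0.0, sigma=2.0)
print(len(kernel), len(kernel[0]))
print(kernel[6][6])  # center sample: purely real, equal to 1 / (2 * pi * sigma^2)
```

Convolving an image with a bank of such kernels over several (u, v) and σ gives the local frequency responses that texture descriptors are built from.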
Gabor texture
Manjunath and Ma on Gabor for texture in Fourier-space
Gabor texture
K-means cluster of RGB
K-means cluster Gabor opponent
Hoang ECCV 2002
Gabor GIST descriptor
Calculate Gabor responses locally
Create histograms as before
Distinguishes things like naturalness, openness,
roughness, expansion, and ruggedness
Oliva IJCV 2001
Slide credit: James Hays and Alexei Efros
Receptive field in f(x,t)
Gaussian equivalent over x and t:
zero order first order t
Burghouts TIP 2006
Gaussians measure differentials
Taylor expansion at x
For a discretely sampled signal, use the Gaussians.
The preferred brand of filters:
separable by dimension
rotation symmetric
no new maxima
fast implementations
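Measuring a differential with Gaussians can be sketched in one dimension: convolve the sampled signal with a sampled Gaussian (zero order) or with its derivative (first order). The truncation at 4σ, the plain convolution loop, and the function names are assumptions of this sketch, not the fast recursive implementation.

```python
import math

def gaussian_kernels(sigma):
    """Sampled Gaussian and its first derivative, truncated at 4 sigma."""
    radius = int(4 * sigma)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-x * x / (2 * sigma ** 2)) for x in offsets]
    total = sum(weights)
    g0 = [w / total for w in weights]                        # zero order
    g1 = [-x / sigma ** 2 * w for x, w in zip(offsets, g0)]  # first order
    return g0, g1

def convolve(signal, kernel):
    """1D convolution, keeping only positions where the kernel fits."""
    radius = len(kernel) // 2
    return [
        sum(kernel[k] * signal[i + radius - k] for k in range(len(kernel)))
        for i in range(radius, len(signal) - radius)
    ]

ramp = [2.0 * i for i in range(40)]  # slope 2 everywhere
g0, g1 = gaussian_kernels(sigma=2.0)
smoothed = convolve(ramp, g0)  # reproduces the ramp in the interior
slope = convolve(ramp, g1)     # close to 2, up to truncation error
print(smoothed[0], slope[0])
```

Because the filters are separable by dimension, a 2D derivative needs only such 1D passes, one along each axis.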
Receptive fields: overview
All observables up to first order color, second order spatial scales, eight frequency bands & first order in t.
System 2: Blobworld, textured world
Group blobs based on color and Tamura texture
User specifies query blob and features
System returns images with similar regions
Carson PAMI 2002
2 Conclusion
Powerful features capture uniqueness.
A large set is needed for open-ended search.
The Gauss family is the preferred brand of filters.
Fast recursive implementation:
Geusebroek, Van de Weijer & Smeulders 2002
3 Measures and invariances
There are a million appearances to one object.
The same part of the same shoe does not have the same appearance in the image. This is the sensory gap.
Remove unwanted variance as early as you can.
The need for invariance
A feature g is invariant under a condition (transform)
caused by accidental conditions at the time of recording,
iff g, observed on equal objects, is constant.
Invariance: definition
Quiz: scale invariant detection
What properties are invariant to observation scale?
c_b(λ): surface albedo (scene & viewpoint invariant)
e(λ): illumination (scene dependent)
n: object surface normal (object shape variant)
s: illumination direction (scene dependent)
v: viewer’s direction (viewpoint variant)
f_C(λ): sensor sensitivity (scene dependent)
\[
C = m_b(n, s) \int_\lambda e(\lambda)\, c_b(\lambda)\, f_C(\lambda)\, d\lambda
  + m_s(n, s, v) \int_\lambda e(\lambda)\, c_s(\lambda)\, f_C(\lambda)\, d\lambda
\]
Color invariance
Matter body reflectance in E
E space C space
C is viewpoint invariant
Gevers TIP 2000
\[
c_1(R, G, B) = \arctan\frac{R}{\max\{G, B\}}, \quad
c_2(R, G, B) = \arctan\frac{G}{\max\{R, B\}}, \quad
c_3(R, G, B) = \arctan\frac{B}{\max\{R, G\}}
\]
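The c1c2c3 features depend only on ratios of channels, so a change in shading or viewpoint that scales (R, G, B) by one factor leaves them unchanged. The sketch assumes c1 = arctan(R / max{G, B}) and likewise for c2 and c3. Using `math.atan2` in place of a plain division is a choice made here to avoid dividing by zero. The function name and the pixel values are illustrative.

```python
import math

def c1c2c3(R, G, B):
    """Arctangent of each channel over the maximum of the other two."""
    return (
        math.atan2(R, max(G, B)),
        math.atan2(G, max(R, B)),
        math.atan2(B, max(R, G)),
    )

facing_light = (200, 100, 50)
turned_away = (100, 50, 25)  # same surface, half the body reflection

print(c1c2c3(*facing_light))
print(c1c2c3(*turned_away))  # same values up to rounding
```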
Hue is viewpoint invariant
\[
H = \arctan\left(\frac{\sqrt{3}\,(G - B)}{(R - G) + (R - B)}\right)
\]
H is a scalar
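Hue uses only differences between channels, so a white highlight, which adds the same amount to R, G and B, cancels out. The sketch assumes H = arctan(√3 (G - B) / ((R - G) + (R - B))) under white light. It uses `math.atan2`, which agrees with the plain arctangent when the denominator is positive and also handles a zero denominator. The function name and the pixel values are illustrative.

```python
import math

def hue(R, G, B):
    """Hue angle in radians from channel differences."""
    return math.atan2(math.sqrt(3) * (G - B), (R - G) + (R - B))

matte = (200, 100, 50)
with_highlight = (230, 130, 80)  # same surface plus a white specular term
darker = (100, 50, 25)           # same surface under less light

print(hue(*matte), hue(*with_highlight), hue(*darker))
```

The three values agree, which is the sense in which H is invariant to highlights, shading and illumination intensity.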
Differential invariants C’, W’, M’
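C’ is for matte objects and uneven white light:
\[
C_\lambda = \frac{E_\lambda}{E}, \qquad
C_{\lambda\lambda} = \frac{E_{\lambda\lambda}}{E}, \qquad
C_{\lambda x} = \frac{E_{\lambda x} E - E_\lambda E_x}{E^2}
\]
W’ is for matte planar objects and even white light:
\[
W_x = \frac{E_x}{E}, \qquad W_{\lambda x} = \frac{E_{\lambda x}}{E}
\]
M’ is for matte objects and monochromatic light:
\[
N_{\lambda x} = \frac{E_{\lambda x} E - E_\lambda E_x}{E^2}
\]
Geusebroek PAMI 2002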
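The normalization behind these invariants can be illustrated with W_x = E_x / E: dividing a spatial derivative by the local intensity cancels a global change of illumination intensity. The sketch below is a 1D toy under stated assumptions: central differences stand in for Gaussian derivatives, and the function name and the intensity profile are hypothetical.

```python
def w_x(E):
    """Central-difference estimate of W_x = E_x / E at interior samples."""
    return [(E[i + 1] - E[i - 1]) / 2 / E[i] for i in range(1, len(E) - 1)]

profile = [10.0, 12.0, 15.0, 19.0, 24.0]  # intensity along a line (toy data)
brighter = [3.0 * e for e in profile]     # same scene, three times the light

print(w_x(profile))
print(w_x(brighter))  # same values up to rounding
```

The raw derivative E_x triples with the light, while the ratio does not.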
            shadows  shading  highlights  ill. intensity  ill. color
E              -        -         -             -             -
H              +        +         +             +             -
W & W’         -        +         -             +             -
C & C’         +        +         -             +             -
M & M’         +        +         -             +             +
L              +        +         +             +             -
Retained from 1000 colors, σ = 3: E 990, H 315, W’ 995, C’ 850, M’ 900
Retained discrimination
Geusebroek PAMI 2003
3 Conclusion
Know your variances and invariants.
Good invariant features make algorithms simple.