lecture 01 internet video search

Internet Video Search

Arnold W.M. Smeulders & Cees Snoek

CWI & UvA

Overview Image and Video Search

Lecture 1 visual search, the problem

color-spatial-textural-temporal features

measures and invariances

Lecture 2 descriptors

words and similarity

where and what

Lecture 3 data and metadata

performance

speed

1 Visual search, the problem

A brief history of television

From broadcasting to narrowcasting

…to thin casting

~1955 ~1985 ~2005

2008

2010

Any other purpose than TV?

Surveillance: to alert on events
Forensics: to find evidence / to protect against misuse
Social media: to sort responses
Safety: to prevent terrorism
Agriculture: to sort fruit
News: to reuse archived footage
Business: to have efficient access
eBusiness: to mine consumer data
Science: to understand visual cognition
Family: “I have it somewhere on this disk”

How big? The answer from the web

The web is video

…as of May 2011

How big? The answer from

Yearly influx

15.000 hours of video

1 Pbyte per year

Next 6 years

137.200 hours of video

22.510 hours of film

2.900.000 photos

How big? Answer from the archive

Crowd-given search

What others say is in the video. We focus on what digital content says is in the video.

Problem 1: The variation

So many images of one thing: illumination, background, occlusion, viewpoint, … This is the sensory gap.

Multimedia Archives

Suit Basketball

Table

Tree

US flag

Aircraft

Dog Tennis Mountain

Fire

Building

1101011011011011011011001101011011111001101011011111


Problem 2: What defines things?

Language

Machine

Problem 3: The many things

This is the model gap

Problem 4: The story of a video

This is the narrative gap

Problem 5: No shared intuition

This is the query-context gap

Query-by-keyword

Query-by-concept

Query-by-examples

What sources Query

Prediction

Find shots of people shaking hands

System 1: histogram matching

Histogram as a summary of color characteristics.

Swain and Ballard, IJCV 1991
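A minimal sketch of color-histogram matching, in the spirit of the histogram intersection of Swain and Ballard. The bin count, the function names and the toy pixel lists are assumptions for illustration and do not come from the slides.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Count 8-bit (R, G, B) pixels into a coarse, flattened 3-D color histogram."""
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    return hist


def histogram_intersection(query_hist, model_hist):
    """Sum of bin-wise minima, normalised by the pixel count of the model histogram."""
    overlap = sum(min(q, m) for q, m in zip(query_hist, model_hist))
    return overlap / sum(model_hist)


# Toy "images" given as flat pixel lists (hypothetical data).
red_image = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
mostly_red = [(240, 20, 20)] * 80 + [(20, 240, 20)] * 20
green_image = [(10, 250, 10)] * 100

model = color_histogram(red_image)
score_similar = histogram_intersection(color_histogram(mostly_red), model)
score_different = histogram_intersection(color_histogram(green_image), model)
```

The histogram ignores where colors occur in the image, which is why such a system returns results that are only visually similar.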

1 Conclusion

As content grows, many applications of image search.

Deep cognitive and computer science problems.

With simple means one gets visually simple results.

2 Features

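Light source: $e(\lambda)$

Object: $\rho(\lambda)$

Result (source · reflection): $e(\lambda)\,\rho(\lambda)$

$$
\begin{pmatrix} R \\ G \\ B \end{pmatrix} =
\begin{pmatrix}
\int_\lambda e(\lambda)\,\rho(\lambda)\,f_R(\lambda)\,d\lambda \\
\int_\lambda e(\lambda)\,\rho(\lambda)\,f_G(\lambda)\,d\lambda \\
\int_\lambda e(\lambda)\,\rho(\lambda)\,f_B(\lambda)\,d\lambda
\end{pmatrix}
$$

$$
\begin{pmatrix} r \\ g \\ b \end{pmatrix} =
\begin{pmatrix}
\dfrac{R}{R+G+B} \\
\dfrac{G}{R+G+B} \\
\dfrac{B}{R+G+B}
\end{pmatrix}
$$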

(r, g, b) in (R,G,B)

Independent of shadow!
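A small sketch of why normalized color, r = R/(R+G+B) and likewise for g and b, does not change in shadow. It assumes that a shadow only scales the intensity of all three channels by the same factor; the pixel values and the factor 0.3 are made up for illustration.

```python
def normalized_rgb(R, G, B):
    """Divide each channel by the channel sum, leaving chromaticity only."""
    total = R + G + B
    if total == 0:
        # Assumption: a black pixel has no defined chromaticity, return zeros.
        return (0.0, 0.0, 0.0)
    return (R / total, G / total, B / total)


lit = (120.0, 80.0, 40.0)
shadowed = tuple(0.3 * c for c in lit)  # same surface, 30% of the light

rgb_lit = normalized_rgb(*lit)
rgb_shadowed = normalized_rgb(*shadowed)
```

Both pixels map to the same (r, g, b), because the common intensity factor cancels in the division.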

The sensation of spectra

Hue: dominant wavelength $\lambda(E_H)$

Saturation: purity of the colour, $(E_H - E_W)/E_H$

Intensity: brightness of the colour, $E_W$

[Figure labels: “white”, “green”, $E_H$, $E_W$]

Human perception combines (R,G,B) response of the eye in opponent colors

Maximizes perceived contrast!

The sensation of spectra: opponent

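$$
\begin{pmatrix} \text{Luminance} \\ \text{Blue-Yellow} \\ \text{Purple-Green} \end{pmatrix} =
\begin{pmatrix}
R + G + B \\
\tfrac{1}{4}\,(2B - R - G) \\
\tfrac{1}{2}\,(R - G)
\end{pmatrix}
$$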

Color Gaussian space

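$$
\begin{pmatrix} E \\ E_\lambda \\ E_{\lambda\lambda} \end{pmatrix} =
\begin{pmatrix}
0.06 & 0.63 & 0.27 \\
0.30 & 0.04 & -0.35 \\
0.34 & -0.60 & 0.17
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
$$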

Geusebroek PAMI 2002

Maximizes information content!
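A minimal sketch of the linear mapping from (R,G,B) to the Gaussian color space (E, Eλ, Eλλ). The coefficients are the ones readable on the slide; the function name and the sample pixel are assumptions for illustration.

```python
# Rows give E, E_lambda and E_lambdalambda as weighted sums of R, G, B.
GAUSSIAN_COLOR_MATRIX = (
    (0.06, 0.63, 0.27),    # E: intensity-like channel
    (0.30, 0.04, -0.35),   # E_lambda: first-order spectral derivative
    (0.34, -0.60, 0.17),   # E_lambdalambda: second-order spectral derivative
)


def rgb_to_gaussian_color(R, G, B):
    """Apply the 3x3 matrix to one (R, G, B) pixel."""
    return tuple(a * R + b * G + c * B for a, b, c in GAUSSIAN_COLOR_MATRIX)


E, E_l, E_ll = rgb_to_gaussian_color(100.0, 100.0, 100.0)
```

For a grey pixel the two spectral derivative channels stay close to zero, while E carries nearly all of the signal.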

(E0,Eλ,Eλλ)-pdf

Color Gaussian space

(R,G,B)-pdf

Matter body reflectance in (R,G,B)

Taxonomy of diff-image structure

T-junction

These junctions later bring recognition

Corner

Junction

Highlight

The 2D Gabor function is:

$$
h(x,y) = \frac{1}{2\pi\sigma^2}\; e^{-\frac{x^2+y^2}{2\sigma^2}}\; e^{2\pi j\,(ux+vy)}
$$

Tuning parameters: u, v, σ
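A minimal sketch of sampling a 2D Gabor function, taken here as a Gaussian envelope of width σ multiplied by a complex sinusoid with frequencies (u, v). The kernel size and the parameter values are assumptions for illustration.

```python
import cmath
import math


def gabor(x, y, u, v, sigma):
    """Gaussian envelope times complex carrier exp(2*pi*j*(u*x + v*y))."""
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
    carrier = cmath.exp(2j * math.pi * (u * x + v * y))
    return envelope * carrier


def gabor_kernel(half_size, u, v, sigma):
    """Sample the function on a (2*half_size + 1) square grid centred at the origin."""
    coords = range(-half_size, half_size + 1)
    return [[gabor(x, y, u, v, sigma) for x in coords] for y in coords]


kernel = gabor_kernel(half_size=7, u=0.1, v=0.0, sigma=2.0)
center = kernel[7][7]
```

Varying u and v selects the orientation and frequency band, and σ sets the spatial extent. A texture descriptor would collect the response magnitudes over a bank of such kernels.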

Gabor texture

Manjunath and Ma on Gabor for texture in Fourier-space

Gabor texture

K-means cluster of RGB

K-means cluster Gabor opponent

Hoang ECCV 2002

Gabor GIST descriptor

Calculate Gabor responses locally

Create histograms as before

Distinguishes things like naturalness, openness, roughness, expansion, and ruggedness

Oliva IJCV 2001

Slide credit: James Hays and Alexei Efros

Receptive field in f(x,t)

Gaussian equivalent over x and t:

zero order first order t

Burghouts TIP 2006

Gaussians measure differentials

Taylor expansion at x

For a discretely sampled signal, use the Gaussians.

The preferred brand of filters: separable by dimension, rotation symmetric, no new maxima, fast implementations.
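A minimal 1D sketch of measuring a signal and its first derivative with sampled Gaussian kernels. Truncating the kernel at 3σ, replicating border samples and the test ramp are assumptions for illustration; this is a plain convolution, not the fast recursive implementation.

```python
import math


def gaussian_kernel(sigma, order=0):
    """Sampled Gaussian (order 0) or its first derivative (order 1)."""
    radius = int(3 * sigma + 0.5)  # assumption: truncate at about 3 sigma
    xs = range(-radius, radius + 1)
    g = [math.exp(-x * x / (2.0 * sigma * sigma)) for x in xs]
    norm = sum(g)
    g = [value / norm for value in g]
    if order == 0:
        return g
    if order == 1:
        # d/dx of a Gaussian is -x / sigma^2 times the Gaussian itself.
        return [-x / (sigma * sigma) * value for x, value in zip(xs, g)]
    raise ValueError("only order 0 and 1 are sketched here")


def convolve(signal, kernel):
    """Discrete convolution with replicated border samples."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i - (k - radius), 0), n - 1)
            acc += weight * signal[j]
        out.append(acc)
    return out


ramp = [2.0 * i for i in range(40)]  # signal with slope 2
smoothed = convolve(ramp, gaussian_kernel(2.0, order=0))
slope = convolve(ramp, gaussian_kernel(2.0, order=1))
```

Away from the borders the zero-order output reproduces the ramp and the first-order output is close to the true slope of 2. Because the Gaussian is separable, the same 1D kernels can be applied per dimension to images and video.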

Receptive fields: overview

All observables up to first order color, second order spatial scales, eight frequency bands & first order in t.

System 2: Blobworld, textured world

Group blobs based on color and Tamura texture

User specifies query blob and features

System returns images with similar regions

Carson PAMI 2002

2 Conclusion

Powerful features capture uniqueness.

A large set is needed for open-ended search.

The Gauss family is the preferred brand of filters.

Fast recursive implementation:

Geusebroek, Van de Weijer & Smeulders 2002

3 Measures and invariances

There are a million appearances to one object. The same part of the same shoe does not have the same appearance in the image. This is the sensory gap. Remove unwanted variance as early as you can.

The need for invariance

A feature g is invariant under a condition (transform), caused by accidental conditions at the time of recording, iff g, observed on equal objects, is constant.
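One possible way to write the definition above as a formula. The symbols $W$ (the accidental condition or transform) and $t_1$, $t_2$ (two observations of equal objects) are assumed notation, not taken from the slide.

```latex
t_1 \overset{W}{\sim} t_2 \;\Longrightarrow\; g(t_1) = g(t_2)
```

Read: whenever two observations differ only by the condition $W$, the feature $g$ takes the same value on both.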

Invariance: definition

Quiz: scale invariant detection

What properties are invariant to observation scale?

surface albedo $c_b(\lambda)$: scene & viewpoint invariant
illumination $e(\lambda)$: scene dependent
object surface normal $n$: object shape variant
illumination direction $s$: scene dependent
viewer’s direction $v$: viewpoint variant
sensor sensitivity $f_C(\lambda)$: scene dependent

$$
C = m_b(n,s)\int_\lambda e(\lambda)\,c_b(\lambda)\,f_C(\lambda)\,d\lambda
  \;+\; m_s(n,s,v)\int_\lambda e(\lambda)\,c_s(\lambda)\,f_C(\lambda)\,d\lambda
$$

Color invariance

Matter body reflectance in E

E space C space

C is viewpoint invariant

Gevers TIP 2000

$$
c_1(R,G,B) = \arctan\frac{R}{\max\{G,B\}}, \qquad
c_2(R,G,B) = \arctan\frac{G}{\max\{R,B\}}, \qquad
c_3(R,G,B) = \arctan\frac{B}{\max\{R,G\}}
$$
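A minimal sketch of the c1c2c3 color model, where each channel is divided by the maximum of the other two before taking the arctangent. Here `math.atan2` stands in for arctan of the ratio, to avoid division by zero. Modelling a change of viewpoint or shading as a common scaling of R, G and B is an assumption that holds for matte surfaces under white light.

```python
import math


def c1c2c3(R, G, B):
    """Each channel relative to the maximum of the other two, for non-negative input."""
    return (
        math.atan2(R, max(G, B)),
        math.atan2(G, max(R, B)),
        math.atan2(B, max(R, G)),
    )


bright = (120.0, 80.0, 40.0)
dim = tuple(0.3 * c for c in bright)  # same surface patch under less light

c_bright = c1c2c3(*bright)
c_dim = c1c2c3(*dim)
```

The scaling factor cancels inside every ratio, so both pixels get the same (c1, c2, c3).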

Hue is viewpoint invariant

$$
H = \arctan\left(\frac{\sqrt{3}\,(G-B)}{(R-G)+(R-B)}\right)
$$

H is a scalar
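A small sketch of the hue H = arctan(√3 (G−B) / ((R−G) + (R−B))) and its invariance. Here `math.atan2` is used in place of arctan of the ratio, so that a zero denominator is handled. Shading is modelled as a common scale factor and a white highlight as an equal offset on all channels. Both are assumptions for illustration.

```python
import math


def hue(R, G, B):
    """Hue angle in radians from channel differences only."""
    return math.atan2(math.sqrt(3.0) * (G - B), (R - G) + (R - B))


base = (120.0, 80.0, 40.0)
# Half the intensity (shading) plus an equal amount of white (highlight).
shaded_with_highlight = tuple(0.5 * c + 30.0 for c in base)

h_base = hue(*base)
h_changed = hue(*shaded_with_highlight)
```

The offset cancels in every difference and the scale factor cancels in the ratio, so H is unchanged.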

Differential invariants C’, W’, M’

C’ is for matte objects and uneven white light:

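$$
C_\lambda = \frac{E_\lambda}{E}, \qquad
C_{\lambda\lambda} = \frac{E_{\lambda\lambda}}{E}, \qquad
C_{\lambda x} = \frac{E_{\lambda x}\,E - E_\lambda\,E_x}{E^2}
$$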

W’ is for matte planar objects and even white light:

$$
W_x = \frac{E_x}{E}, \qquad
W_{\lambda x} = \frac{E_{\lambda x}}{E}
$$

M’ is for matte objects and monochromatic light:

$$
N_{\lambda x} = \frac{E_{\lambda x}\,E - E_\lambda\,E_x}{E^2}
$$

Geusebroek PAMI 2002
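A 1D sketch of two of the ratios above, Cλ = Eλ/E and Wx = Ex/E, along one scanline. Central differences stand in for Gaussian derivatives to keep the example short, and the sample values and the shading profile are made up for illustration.

```python
def central_difference(values):
    """First derivative by central differences, interior samples only."""
    return [(values[i + 1] - values[i - 1]) / 2.0 for i in range(1, len(values) - 1)]


def c_lambda(E, E_lam):
    """C_lambda = E_lambda / E per sample."""
    return [el / e for e, el in zip(E, E_lam)]


def w_x(E):
    """W_x = E_x / E on the interior samples."""
    return [ex / e for ex, e in zip(central_difference(E), E[1:-1])]


E = [10.0, 12.0, 15.0, 19.0, 24.0]        # intensity along a scanline
E_lam = [1.0, 1.5, 2.5, 3.0, 2.0]         # spectral derivative, same positions
shading = [1.0, 0.8, 0.6, 0.9, 0.5]       # uneven white light i(x)

E_shaded = [i * e for i, e in zip(shading, E)]
E_lam_shaded = [i * el for i, el in zip(shading, E_lam)]
E_dim = [0.5 * e for e in E]              # even light at half the intensity

c_same = c_lambda(E, E_lam), c_lambda(E_shaded, E_lam_shaded)
w_same = w_x(E), w_x(E_dim)
w_uneven = w_x(E_shaded)
```

Cλ survives the uneven shading because i(x) multiplies numerator and denominator alike. Wx survives only the even intensity change, which is consistent with C’ being stated for uneven and W’ for even white light.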

          shadows  shading  highlights  ill. intensity  ill. color
E            -        -         -             -             -
H            +        +         +             +             -
W & W’       -        +         -             +             -
C & C’       +        +         -             +             -
M & M’       +        +         -             +             +
L            +        +         +             +             -

Retained from 1000 colors, σ = 3:

E     990
H     315
W’    995
C’    850
M’    900

Retained discrimination

Geusebroek PAMI 2003

3 Conclusion

Know your variances and invariants.

Good invariant features make algorithms simple.
