lecture 01 internet video search

Internet Video Search Arnold W.M. Smeulders & Cees Snoek CWI & UvA


Post on 19-May-2015



Page 1: Lecture 01 internet video search

Internet Video Search

Arnold W.M. Smeulders & Cees Snoek

CWI & UvA

Page 2: Lecture 01 internet video search

Overview Image and Video Search

Lecture 1 visual search, the problem

color-spatial-textural-temporal features

measures and invariances

Lecture 2 descriptors

words and similarity

where and what

Lecture 3 data and metadata

performance

speed

Page 3: Lecture 01 internet video search

1 Visual search, the problem

Page 4: Lecture 01 internet video search

A brief history of television

From broadcasting to narrowcasting

…to thin casting

~1955 ~1985 ~2005

2008

2010

Page 5: Lecture 01 internet video search

Any other purpose than tv?

Surveillance: to alert on events

Forensics: to find evidence / to protect against misuse

Social media: to sort responses

Safety: to prevent terrorism

Agriculture: to sort fruit

News: to reuse archived footage

Business: to have efficient access

eBusiness: to mine consumer data

Science: to understand visual cognition

Family: “I have it somewhere on this disk”

Page 6: Lecture 01 internet video search

How big? The answer from the web

The web is video

Page 7: Lecture 01 internet video search

…as of May 2011

How big? The answer from

Page 8: Lecture 01 internet video search

How big? Answer from the archive

Yearly influx: 15,000 hours of video, 1 Pbyte per year

Next 6 years: 137,200 hours of video, 22,510 hours of film, 2,900,000 photos

Page 9: Lecture 01 internet video search

Crowd-given search

What others say is in the video. We focus on what digital content says is in the video.

Page 10: Lecture 01 internet video search

Problem 1: The variation

So many images of one thing: illumination, background, occlusion, viewpoint, …

This is the sensory gap.

Page 11: Lecture 01 internet video search

Problem 2: What defines things?

[Figure: Multimedia Archives. Language: Suit, Basketball, Table, Tree, US flag, Aircraft, Dog, Tennis, Mountain, Fire, Building. Machine: bit strings such as 1101011011011011011011001101011011111001101011011111]

Page 12: Lecture 01 internet video search

Problem 3: The many things

This is the model gap

Page 13: Lecture 01 internet video search

Problem 4: The story of a video

This is the narrative gap

Page 14: Lecture 01 internet video search

Problem 5: No shared intuition

This is the query-context gap

Query-by-keyword

Query-by-concept

Query-by-examples

What sources Query

Prediction

Find shots of people shaking hands

Page 15: Lecture 01 internet video search

System 1: histogram matching

Histogram as a summary of color characteristics.

Swain and Ballard, IJCV 1991
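A minimal sketch of histogram matching, assuming the match score is the normalized histogram intersection introduced by Swain and Ballard. The bin count, the function names, and the toy pixel lists are illustrative assumptions, not taken from the slides.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Count 8-bit (R, G, B) pixels into a joint histogram of bins_per_channel**3 bins."""
    hist = [0] * (bins_per_channel ** 3)
    step = 256 // bins_per_channel
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[index] += 1
    return hist


def histogram_intersection(image_hist, model_hist):
    """Overlap of the two histograms, normalized by the number of model pixels."""
    overlap = sum(min(i, m) for i, m in zip(image_hist, model_hist))
    return overlap / sum(model_hist)


# Toy data (hypothetical): a mostly red scene, a similar red model, a green model.
red_scene = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 2
red_model = [(240, 20, 20)] * 6 + [(20, 20, 240)] * 2
green_model = [(10, 250, 10)] * 8

score_same = histogram_intersection(color_histogram(red_scene), color_histogram(red_model))
score_other = histogram_intersection(color_histogram(red_scene), color_histogram(green_model))
```

The histogram discards all spatial layout, so the score depends only on which colors occur and how often. This is why such a simple summary yields visually simple results.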

Page 16: Lecture 01 internet video search

1 Conclusion

As content grows, many applications of image search.

Deep cognitive and computer science problems.

With simple means one gets visually simple results.

Page 17: Lecture 01 internet video search

2 Features

Page 18: Lecture 01 internet video search

Light source: $e(\lambda)$

Object: $\rho(\lambda)$

Result (source · reflection): $e(\lambda)\,\rho(\lambda)$

Page 19: Lecture 01 internet video search

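(R,G,B)

$$
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
=
\begin{pmatrix}
\int_\lambda e(\lambda)\,\rho(\lambda)\,f_R(\lambda)\,d\lambda \\
\int_\lambda e(\lambda)\,\rho(\lambda)\,f_G(\lambda)\,d\lambda \\
\int_\lambda e(\lambda)\,\rho(\lambda)\,f_B(\lambda)\,d\lambda
\end{pmatrix}
$$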
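Each sensor value is the integral over wavelength of light source times object reflectance times sensor sensitivity. A rough numerical sketch follows. The Gaussian-shaped curves, their centers and widths, and the 400 to 700 nm sampling grid are hypothetical stand-ins, not measured spectra.

```python
import math


def gaussian_curve(center, width):
    """Return a bell-shaped spectral curve (a hypothetical stand-in for real data)."""
    return lambda lam: math.exp(-0.5 * ((lam - center) / width) ** 2)


def sensor_response(illuminant, reflectance, sensitivity, step=5):
    """Riemann-sum approximation of the integral of e * rho * f over 400..700 nm."""
    wavelengths = [400 + step * k for k in range(300 // step + 1)]
    return sum(illuminant(lam) * reflectance(lam) * sensitivity(lam) * step
               for lam in wavelengths)


white_light = lambda lam: 1.0                  # flat spectrum e(lambda)
reddish_surface = gaussian_curve(620.0, 40.0)  # reflectance rho(lambda)
sensitivities = {                              # f_R, f_G, f_B
    "R": gaussian_curve(600.0, 30.0),
    "G": gaussian_curve(540.0, 30.0),
    "B": gaussian_curve(450.0, 30.0),
}

rgb = {name: sensor_response(white_light, reddish_surface, f)
       for name, f in sensitivities.items()}
```

With these assumed curves the reddish surface gives the largest response in R and the smallest in B, and halving the light source halves all three values.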

Page 20: Lecture 01 internet video search

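(r, g, b) in (R,G,B)

$$
\begin{pmatrix} r \\ g \\ b \end{pmatrix}
=
\begin{pmatrix}
\dfrac{R}{R+G+B} \\[2mm]
\dfrac{G}{R+G+B} \\[2mm]
\dfrac{B}{R+G+B}
\end{pmatrix}
$$

Independent of shadow!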
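A small sketch of why normalized (r, g, b) is independent of shadow, assuming a shadow scales R, G and B by one common factor. The pixel values and the factor 0.25 are made up for illustration.

```python
def normalized_rgb(R, G, B):
    """Divide each channel by the sum R + G + B (black is mapped to zeros)."""
    total = R + G + B
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (R / total, G / total, B / total)


lit = (120.0, 80.0, 40.0)
shadowed = tuple(0.25 * channel for channel in lit)  # same surface, less light

rgb_lit = normalized_rgb(*lit)
rgb_shadowed = normalized_rgb(*shadowed)
```

The common factor cancels in the division, so both pixels map to the same (r, g, b). The price is that intensity is lost: black, grey and white can no longer be told apart.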

Page 21: Lecture 01 internet video search

The sensation of spectra

Hue: dominant wavelength $\lambda(E_H)$

Saturation: purity of the colour $(E_H - E_W)/E_H$

Intensity: brightness of the colour $E_W$

[Figure labels: “white”, “green”, $E_H$, $E_W$]

Page 22: Lecture 01 internet video search

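The sensation of spectra: opponent

Human perception combines (R,G,B) response of the eye in opponent colors

$$
\begin{pmatrix} \text{Luminance} \\ \text{Blue-Yellow} \\ \text{Purple-Green} \end{pmatrix}
=
\begin{pmatrix}
R + G + B \\
\tfrac{1}{4}\,(2B - R - G) \\
\tfrac{1}{2}\,(R - G)
\end{pmatrix}
$$

Maximizes perceived contrast!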

Page 23: Lecture 01 internet video search

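Color Gaussian space

$$
\begin{pmatrix} E \\ E_\lambda \\ E_{\lambda\lambda} \end{pmatrix}
=
\begin{pmatrix}
0.06 & 0.63 & 0.27 \\
0.30 & 0.04 & -0.35 \\
0.34 & -0.60 & 0.17
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
$$

Geusebroek PAMI 2002

Maximizes information content!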
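A sketch of the Gaussian color transform as a plain matrix product. It assumes the slide's coefficients read as the 3x3 matrix below, mapping (R,G,B) to (E, Eλ, Eλλ). The function and constant names are hypothetical.

```python
# Rows give E, E_lambda and E_lambdalambda as weighted sums of R, G, B
# (coefficients as read from the slide; treat them as an assumption).
GAUSSIAN_COLOR_MATRIX = (
    (0.06, 0.63, 0.27),
    (0.30, 0.04, -0.35),
    (0.34, -0.60, 0.17),
)


def rgb_to_gaussian_color(R, G, B):
    """Return (E, E_lambda, E_lambdalambda) for one RGB pixel."""
    return tuple(row[0] * R + row[1] * G + row[2] * B
                 for row in GAUSSIAN_COLOR_MATRIX)


grey = rgb_to_gaussian_color(1.0, 1.0, 1.0)
red = rgb_to_gaussian_color(1.0, 0.0, 0.0)
```

For a grey pixel nearly all energy ends up in E, while the two spectral derivatives stay close to zero. A saturated red pixel gives clearly non-zero derivatives.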

Page 24: Lecture 01 internet video search

Color Gaussian space

[Figure: (R,G,B)-pdf and (E0,Eλ,Eλλ)-pdf]

Page 25: Lecture 01 internet video search

Matter body reflectance in (R,G,B)

Page 26: Lecture 01 internet video search

Taxonomy of diff-image structure

T-junction

These junctions later bring recognition

Corner

Junction

Highlight

Page 27: Lecture 01 internet video search

Gabor texture

The 2D Gabor function is:

$$
h(x, y) = \frac{1}{2\pi\sigma^2}\; e^{-\frac{x^2 + y^2}{2\sigma^2}}\; e^{2\pi j (ux + vy)}
$$

Tuning parameters: u, v, σ

Manjunath and Ma on Gabor for texture in Fourier-space
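A sketch that samples the 2D Gabor function on a pixel grid, assuming the usual form: an isotropic Gaussian envelope of width σ multiplied by a complex exponential with frequency (u, v). The kernel size and the parameter values are illustrative choices.

```python
import cmath
import math


def gabor(x, y, u, v, sigma):
    """Complex Gabor value at (x, y): Gaussian envelope times complex carrier."""
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
    carrier = cmath.exp(2j * math.pi * (u * x + v * y))
    return envelope * carrier


def gabor_kernel(half_size, u, v, sigma):
    """Sample the Gabor function on a (2 * half_size + 1)^2 grid; rows are y, columns are x."""
    coords = range(-half_size, half_size + 1)
    return [[gabor(x, y, u, v, sigma) for x in coords] for y in coords]


# A horizontal frequency of a quarter cycle per pixel (hypothetical setting).
kernel = gabor_kernel(half_size=6, u=0.25, v=0.0, sigma=2.0)
center = kernel[6][6]
right_neighbor = kernel[6][7]
```

Convolving an image with the real and imaginary parts and taking the magnitude gives a local response for that frequency and orientation. A bank over several (u, v, σ) settings then describes texture.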

Page 28: Lecture 01 internet video search

Gabor texture

K-means cluster of RGB

K-means cluster Gabor opponent

Hoang ECCV 2002

Page 29: Lecture 01 internet video search

Gabor GIST descriptor

Calculate Gabor responses locally

Create histograms as before

Distinguishes things like naturalness, openness, roughness, expansion, and ruggedness

Oliva IJCV 2001

Slide credit: James Hays and Alexei Efros

Page 30: Lecture 01 internet video search

Receptive field in f(x,t)

Gaussian equivalent over x and t:

zero order first order t

Burghouts TIP 2006

Page 31: Lecture 01 internet video search

Gaussians measure differentials

Taylor expansion at x

For a discretely sampled signal, use the Gaussians.

The preferred brand of filters: separable by dimension, rotation symmetric, no new maxima, fast implementations.
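A one-dimensional sketch of measuring differentials with Gaussians: the signal is convolved with a sampled Gaussian for the zeroth order and with its derivative for the first order. The truncation at 4σ, the normalization, and the ramp test signal are assumptions made for this example.

```python
import math


def gaussian_kernels(sigma):
    """Return sampled zeroth- and first-order Gaussian kernels, truncated at 4 sigma."""
    radius = int(math.ceil(4 * sigma))
    xs = range(-radius, radius + 1)
    g = [math.exp(-x * x / (2.0 * sigma ** 2)) for x in xs]
    norm = sum(g)
    g = [value / norm for value in g]                        # weights sum to 1
    dg = [-x / sigma ** 2 * value for x, value in zip(xs, g)]  # derivative of the Gaussian
    return g, dg


def convolve_at(signal, kernel, position):
    """Discrete convolution of signal with kernel, evaluated at one position."""
    radius = len(kernel) // 2
    return sum(kernel[k + radius] * signal[position - k]
               for k in range(-radius, radius + 1))


ramp = [3.0 * i + 1.0 for i in range(41)]     # signal with constant slope 3
smooth, derivative = gaussian_kernels(sigma=2.0)
value = convolve_at(ramp, smooth, 20)         # smoothed signal at sample 20
slope = convolve_at(ramp, derivative, 20)     # estimated first derivative
```

In two dimensions the same kernels can be applied along rows and then along columns, which is what separable by dimension buys in practice.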

Page 32: Lecture 01 internet video search

Receptive fields: overview

All observables up to first order color, second order spatial scales, eight frequency bands & first order in t.

Page 33: Lecture 01 internet video search

System 2: Blobworld, textured world

Group blobs based on color and Tamura texture

User specifies query blob and features

System returns images with similar regions

Carson PAMI 2002

Page 34: Lecture 01 internet video search

2 Conclusion

Powerful features capture uniqueness.

A large set is needed for open-ended search.

The Gauss family is the preferred brand of filters.

Fast recursive implementation:

Geusebroek, Van de Weijer & Smeulders 2002

Page 35: Lecture 01 internet video search

3 Measures and invariances

Page 36: Lecture 01 internet video search

The need for invariance

There are a million appearances to one object. The same part of the same shoe does not have the same appearance in the image. This is the sensory gap.

Remove unwanted variance as early as you can.

Page 37: Lecture 01 internet video search

Invariance: definition

A feature g is invariant under a condition (transform), caused by accidental conditions at the time of recording, iff g observed on equal objects is constant.

Page 38: Lecture 01 internet video search

Quiz: scale invariant detection

What properties are invariant to observation scale?

Page 39: Lecture 01 internet video search

Color invariance

$c_b(\lambda)$: surface albedo; scene & viewpoint invariant

$e(\lambda)$: illumination; scene dependent

$n$: object surface normal; object shape variant

$s$: illumination direction; scene dependent

$v$: viewer’s direction; viewpoint variant

$f_C(\lambda)$: sensor sensitivity; scene dependent

$$
C = m_b(n, s) \int_\lambda e(\lambda)\, c_b(\lambda)\, f_C(\lambda)\, d\lambda
  + m_s(n, s, v) \int_\lambda e(\lambda)\, c_s(\lambda)\, f_C(\lambda)\, d\lambda
$$

Page 40: Lecture 01 internet video search

Matter body reflectance in E

Page 41: Lecture 01 internet video search

E space C space

C is viewpoint invariant

Gevers TIP 2000

$$
c_1(R, G, B) = \arctan\frac{R}{\max\{G, B\}}
$$

$$
c_2(R, G, B) = \arctan\frac{G}{\max\{R, B\}}
$$

$$
c_3(R, G, B) = \arctan\frac{B}{\max\{R, G\}}
$$
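A sketch of the c1c2c3 features, each the arctangent of one channel over the maximum of the other two. It assumes a matte surface, where a change of viewpoint or surface orientation scales R, G and B by a common factor. Using `math.atan2` to avoid division by zero is a choice made here, not something stated on the slide.

```python
import math


def c1c2c3(R, G, B):
    """Return (c1, c2, c3) for non-negative sensor values R, G, B."""
    c1 = math.atan2(R, max(G, B))
    c2 = math.atan2(G, max(R, B))
    c3 = math.atan2(B, max(R, G))
    return (c1, c2, c3)


facing_light = (0.8, 0.4, 0.2)
tilted_away = tuple(0.3 * channel for channel in facing_light)  # same surface, dimmer

features_a = c1c2c3(*facing_light)
features_b = c1c2c3(*tilted_away)
```

Only ratios of channels enter the features, so the common factor drops out and both measurements give the same values.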

Page 42: Lecture 01 internet video search

Hue is viewpoint invariant

$$
H = \arctan\left( \frac{\sqrt{3}\,(G - B)}{(R - G) + (R - B)} \right)
$$

H is a scalar
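A sketch that checks the hue formula on made-up pixel values. It assumes shading scales all channels by one factor and that a white highlight adds the same amount to each channel. `math.atan2` is used here so that a zero denominator does not fail.

```python
import math


def hue(R, G, B):
    """Hue angle: arctan of sqrt(3) * (G - B) over (R - G) + (R - B)."""
    return math.atan2(math.sqrt(3.0) * (G - B), (R - G) + (R - B))


surface = (0.6, 0.3, 0.1)
shaded = tuple(0.4 * channel for channel in surface)        # common scale factor
highlighted = tuple(channel + 0.25 for channel in surface)  # added white component

h_surface = hue(*surface)
h_shaded = hue(*shaded)
h_highlighted = hue(*highlighted)
```

The formula uses only channel differences, so an added white component cancels. Numerator and denominator scale together, so a common factor cancels as well.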

Page 43: Lecture 01 internet video search

Differential invariants C’, W’, M’

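C’ is for matte objects and uneven white light:

$$
C_\lambda = \frac{E_\lambda}{E}, \qquad
C_{\lambda\lambda} = \frac{E_{\lambda\lambda}}{E}, \qquad
C_{\lambda x} = \frac{E_{\lambda x} E - E_\lambda E_x}{E^2}
$$

W’ is for matte planar objects and even white light:

$$
W_x = \frac{E_x}{E}, \qquad
W_{\lambda x} = \frac{E_{\lambda x}}{E}
$$

M’ is for matte objects and monochromatic light:

$$
N_{\lambda x} = \frac{E_{\lambda x} E - E_\lambda E_x}{E^2}
$$

Geusebroek PAMI 2002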
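A sketch of the C invariants on a toy scene. It assumes a uniformly colored matte surface under white light whose intensity i(x) varies along x, so that E and its spectral derivative are both proportional to i(x). The numbers and function names are hypothetical.

```python
def c_lambda(E, E_lambda):
    """Ratio of the spectral derivative to the intensity."""
    return E_lambda / E


def c_lambda_x(E, E_x, E_lambda, E_lambda_x):
    """Spatial derivative of that ratio: (E_lambda_x * E - E_lambda * E_x) / E^2."""
    return (E_lambda_x * E - E_lambda * E_x) / (E * E)


BODY, BODY_LAMBDA = 0.8, 0.2   # surface terms, constant over x (assumed)


def measurements(x):
    """Return (E, E_x, E_lambda, E_lambda_x) under intensity i(x) = 1 + 0.5 x."""
    i, i_x = 1.0 + 0.5 * x, 0.5
    return (i * BODY, i_x * BODY, i * BODY_LAMBDA, i_x * BODY_LAMBDA)


positions = (0.0, 1.0, 2.0)
ratios = [c_lambda(measurements(x)[0], measurements(x)[2]) for x in positions]
edges = [c_lambda_x(*measurements(x)) for x in positions]
```

The intensity doubles between x = 0 and x = 2, yet the ratio stays at 0.25 and its spatial derivative stays at zero. A change of surface color would show up in both.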

Page 44: Lecture 01 internet video search

Retained discrimination

          shadows   shading   highlights   ill. intensity   ill. color
E            -         -          -              -              -
H            +         +          +              +              -
W & W’       -         +          -              +              -
C & C’       +         +          -              +              -
M & M’       +         +          -              +              +
L            +         +          +              +              -

Retained from 1000 colors, σ = 3: E 990, H 315, W’ 995, C’ 850, M’ 900

Geusebroek PAMI 2003

Page 45: Lecture 01 internet video search

3 Conclusion

Know your variances and invariants.

Good invariant features make algorithms simple.