
Computer vision: models, learning and inference
Chapter 20: Models for visual words

Please send errata to [email protected]

©2011 Simon J.D. Prince

Page 2: Visual words

• Most models treat data as continuous
• Likelihood based on normal distribution
• Visual words = discrete representation of image
• Likelihood based on categorical distribution
• Useful for difficult tasks such as scene recognition and object recognition

Page 3: Motivation: scene recognition

Page 4: Structure

• Computing visual words
• Bag of words model
• Latent Dirichlet allocation
• Single author-topic model
• Constellation model
• Scene model
• Applications

Page 5: Computing dictionary of visual words

1. For each of the I training images, select a set of J_i spatial locations, using either
   • interest points, or
   • a regular grid.

2. Compute a descriptor at each spatial location in each image.

3. Cluster all of these descriptor vectors into K groups using a method such as the K-means algorithm.

4. The means of the K clusters are used as the K prototype vectors in the dictionary (a code sketch follows).
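A minimal sketch of this procedure, assuming OpenCV for SIFT interest points and scikit-learn for K-means; the library choices and the image-path input are assumptions, not part of the slides:

```python
# Sketch: building a dictionary of visual words from training images.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(image_paths, K=200):
    sift = cv2.SIFT_create()                     # descriptors at interest points
    all_descriptors = []
    for path in image_paths:                     # one of the I training images
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)   # J_i descriptors
        if desc is not None:
            all_descriptors.append(desc)
    stacked = np.vstack(all_descriptors)         # pool descriptors over all images
    kmeans = KMeans(n_clusters=K, n_init=10).fit(stacked)
    return kmeans.cluster_centers_               # the K prototype vectors
```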

Page 6: Encoding images as visual words

1. Select a set of J spatial locations in the image using the same method as for the dictionary.

2. Compute the descriptor at each of the J spatial locations.

3. Compare each descriptor to the set of K prototype descriptors in the dictionary.

4. Assign to this location a discrete index corresponding to the index of the closest word in the dictionary.

End result: a discrete feature index together with an x, y position for each location (a code sketch follows).
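A minimal sketch of the encoding step under the same assumptions as the previous sketch (OpenCV SIFT; `dictionary` is the K x D array of prototype vectors returned by `build_dictionary`):

```python
# Sketch: encoding one image as visual words plus positions.
import cv2
import numpy as np

def encode_image(path, dictionary):
    sift = cv2.SIFT_create()
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    keypoints, desc = sift.detectAndCompute(img, None)
    # squared distance from each of the J descriptors to each of the K prototypes
    d2 = ((desc[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                          # index of the closest word
    positions = np.array([kp.pt for kp in keypoints])  # (x, y) for each location
    return words, positions
```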

Page 7: Structure

• Computing visual words
• Bag of words model
• Latent Dirichlet allocation
• Single author-topic model
• Constellation model
• Scene model
• Applications

Page 8: Bag of words model

Key idea:

• Abandon all spatial information.
• Represent the image just by the relative frequency (histogram) of words from the dictionary.

Likelihood of the word indices f_{1...J} for an image of category w = n:

Pr(f_{1...J} | w = n) = \prod_{j=1}^{J} Cat(f_j | \lambda_n)

where \lambda_n contains the categorical word probabilities for category n.
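A minimal sketch of forming the histogram from the word indices produced by the encoding step (NumPy assumed):

```python
import numpy as np

def word_histogram(words, K):
    # relative frequency of each dictionary word; spatial information discarded
    counts = np.bincount(words, minlength=K)
    return counts / counts.sum()
```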

Page 9: Bag of words

Page 10: Bag of words model

Learning (MAP solution):

Inference:
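A sketch of the standard forms for the categorical bag-of-words model, assuming a symmetric Dirichlet(\alpha) prior on each \lambda_n and writing N_{nk} for the number of times word k occurs across the training images of category n (the prior and notation are assumptions):

Learning (MAP solution):

\hat{\lambda}_{nk} = \frac{N_{nk} + \alpha - 1}{\sum_{k'} N_{nk'} + K(\alpha - 1)}

Inference (Bayes' rule over the N categories):

Pr(w = n | f_{1...J}) = \frac{Pr(w = n) \prod_{j=1}^{J} Cat(f_j | \hat{\lambda}_n)}{\sum_{m=1}^{N} Pr(w = m) \prod_{j=1}^{J} Cat(f_j | \hat{\lambda}_m)}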

Page 11: Bag of words for object recognition

Page 12: Problems with bag of words

Page 13: Structure

• Computing visual words
• Bag of words model
• Latent Dirichlet allocation
• Single author-topic model
• Constellation model
• Scene model
• Applications

Page 14: Latent Dirichlet allocation

• Describes relative frequency of visual words in a single image (no world term)
• Words not generated independently (connected by hidden variable)
• Analogy to text documents:
  – Each image contains a mixture of several topics (parts)
  – Each topic induces a distribution over words
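As a concrete illustration, a minimal sketch of sampling from the LDA generative process with toy sizes; M, K, J, alpha, and beta are illustrative assumptions, not the slide's notation:

```python
# Sketch: one draw from the LDA generative process.
import numpy as np

rng = np.random.default_rng(0)
M, K, J = 5, 200, 100          # parts (topics), dictionary size, words per image
alpha, beta = 1.0, 0.1
lam = rng.dirichlet(beta * np.ones(K), size=M)   # one word distribution per part

def generate_image():
    theta = rng.dirichlet(alpha * np.ones(M))    # this image's mixture of parts
    parts = rng.choice(M, size=J, p=theta)       # hidden part label for each word
    words = np.array([rng.choice(K, p=lam[m]) for m in parts])
    return parts, words
```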

Page 15: Latent Dirichlet allocation

Page 16: Latent Dirichlet allocation

Generative equations

Marginal distribution over features

Conjugate priors over parameters
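A sketch of the standard forms, assuming M parts, per-image part proportions \theta_i, and per-part word distributions \lambda_m (notation assumed):

Generative equations:

h_{ij} \sim Cat(\theta_i), \qquad f_{ij} \sim Cat(\lambda_{h_{ij}})

Marginal distribution over features (hidden part label summed out):

Pr(f_{ij} = k | \theta_i, \lambda_{1...M}) = \sum_{m=1}^{M} \theta_{im} \lambda_{mk}

Conjugate priors over parameters:

\theta_i \sim Dir(\alpha), \qquad \lambda_m \sim Dir(\beta)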

Page 17: Latent Dirichlet allocation

Page 18: Learning LDA model

• Part labels are hidden variables.
• If we knew them, it would be easy to estimate the parameters.
• How about the EM algorithm? Unfortunately, the parts within an image are not independent.

Page 19: Latent Dirichlet allocation

Page 20: Learning

Strategy:

1. Write an expression for the posterior distribution over part labels.
2. Draw samples from the posterior using MCMC.
3. Use the samples to estimate the parameters.

Page 21: 1. Posterior over part labels

The two terms in the numerator can be computed in closed form; the denominator is intractable.
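A sketch of the standard form by Bayes' rule (notation assumed):

Pr(h_{1...J} | f_{1...J}) = \frac{Pr(f_{1...J} | h_{1...J}) \, Pr(h_{1...J})}{\sum_{h'} Pr(f_{1...J} | h') \, Pr(h')}

The likelihood and prior in the numerator come out in closed form when the categorical parameters are integrated against their conjugate Dirichlet priors; the denominator sums over all M^J possible labelings, which is what makes it intractable.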

Page 22: 2. Draw samples from posterior

Gibbs sampling: fix all part labels except one and sample from the conditional distribution of that label. This conditional can be computed in closed form (a code sketch follows).
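A minimal sketch of one collapsed Gibbs sweep over the part labels, in the style of standard LDA samplers; the variable names, hyperparameters, and count-table layout are assumptions for illustration, not the slide's notation:

```python
# Sketch: one collapsed Gibbs sweep, resampling every part label in turn.
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(words, h, n_im, n_mk, n_m, alpha, beta, K):
    """words: list of 1-D int arrays, words[i][j] = word index f_ij
    h:     list of matching int arrays of current part labels
    n_im:  (I, M) counts of parts per image
    n_mk:  (M, K) counts of words per part
    n_m:   (M,)   total words per part; all counts updated in place
    """
    M = len(n_m)
    for i, f in enumerate(words):
        for j, k in enumerate(f):
            m = h[i][j]
            # remove word ij from the counts
            n_im[i, m] -= 1; n_mk[m, k] -= 1; n_m[m] -= 1
            # full conditional: prior term * likelihood term
            p = (n_im[i] + alpha) * (n_mk[:, k] + beta) / (n_m + K * beta)
            m = rng.choice(M, p=p / p.sum())
            h[i][j] = m
            # add the word back with its new label
            n_im[i, m] += 1; n_mk[m, k] += 1; n_m[m] += 1
```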

Page 23: 3. Use samples to estimate parameters

The samples substitute for the true part labels in the parameter update equations.
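A sketch of the standard smoothed estimates from one sample, writing n_{mk} for the number of words of type k assigned to part m and n_{im} for the number of words in image i assigned to part m (the hyperparameters \alpha, \beta are assumptions):

\hat{\lambda}_{mk} = \frac{n_{mk} + \beta}{n_m + K\beta}, \qquad \hat{\theta}_{im} = \frac{n_{im} + \alpha}{J_i + M\alpha}

These can be averaged over several Gibbs samples.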

Page 24: Structure

• Computing visual words
• Bag of words model
• Latent Dirichlet allocation
• Single author-topic model
• Constellation model
• Scene model
• Applications

Page 25: Single author-topic model

Page 26: Single author-topic model

Page 27: Learning

1. Posterior over part labels: the likelihood is the same as before; the prior becomes the category-specific form sketched below.
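One plausible reconstruction of this prior, following the single author-topic idea that the part proportions are tied to the image's single category rather than to the image itself (this exact form is an assumption):

Pr(h_{i1...iJ} | w_i = n) = \prod_{j=1}^{J} Cat(h_{ij} | \theta_n)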

Page 28: Learning

2. Draw samples from posterior

3. Use samples to estimate parameters

Page 29: Inference

Compute the posterior over categories, using the likelihood that the words in this image are due to category n (see below).
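A sketch of the standard form (notation assumed):

Pr(w = n | f_{1...J}) \propto Pr(w = n) \, Pr(f_{1...J} | w = n)

normalized by summing the right-hand side over all categories.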

Page 30: Structure

• Computing visual words
• Bag of words model
• Latent Dirichlet allocation
• Single author-topic model
• Constellation model
• Scene model
• Applications

Page 31: Problems with bag of words

Page 32: Constellation model

Page 33: Constellation model

Page 34: Learning

1. Posterior over part labels: the prior is the same as before; the likelihood becomes the position-dependent form sketched below.
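A plausible form for this likelihood, attaching a Gaussian position density to each part so that both the word identity and its image position x_{ij} inform the part label (the Gaussian form is an assumption in this sketch):

Pr(f_{ij}, x_{ij} | h_{ij} = m) = Cat(f_{ij} | \lambda_m) \, Norm_{x_{ij}}[\mu_m, \Sigma_m]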

Page 35: Learning

2. Draw samples from posterior

3. Use samples to estimate parameters

Part and word probabilities as before

Page 36: Inference

Compute the posterior over categories, using the likelihood that the words in this image are due to category n, exactly as on Page 29.

Page 37: Learning

Page 38: Structure

• Computing visual words
• Bag of words model
• Latent Dirichlet allocation
• Single author-topic model
• Constellation model
• Scene model
• Applications

Page 39: Problems with bag of words

Page 40: Scene model

Page 41: Scene model

Page 42: Structure

• Computing visual words
• Bag of words model
• Latent Dirichlet allocation
• Single author-topic model
• Constellation model
• Scene model
• Applications

Page 43: Video Google

Page 44: Action recognition

Spatio-temporal bag-of-words model: 91.8% classification accuracy.

Page 45: Action recognition