Download - Machine Learning in Computer Vision
![Page 1: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/1.jpg)
Machine Learning in Computer Vision
Fei-Fei Li
![Page 2: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/2.jpg)
What is (computer) vision?• When we “see” something, what does it
involve?• Take a picture with a camera, it is just a
bunch of colored dots (pixels)• Want to make computers understand
images• Looks easy, but not really…
Image (or video) Sensing device Interpreting device Interpretations
Corn/mature cornjn a cornfield/plant/blue sky inthe backgroundEtc.
![Page 3: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/3.jpg)
What is it related to?
Computer Vision
Neuroscience
Machine learning
Speech Information retrieval
Maths
ComputerScience
InformationEngineering
Physics
Biology
Robotics
Cognitive sciences
Psychology
![Page 4: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/4.jpg)
Quiz?
![Page 5: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/5.jpg)
What about this?
![Page 6: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/6.jpg)
A picture is worth a thousand words.--- Confucius
or Printers’ Ink Ad (1921)
![Page 7: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/7.jpg)
horizontal lines
verticalblue on the top
porous
oblique
white
shadow to the left
textured
large green patches
A picture is worth a thousand words.--- Confucius
or Printers’ Ink Ad (1921)
![Page 8: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/8.jpg)
A picture is worth a thousand words.--- Confucius
or Printers’ Ink Ad (1921)
building court yard
clear sky
campustrees
autumn leaves
people
day time
bicycles
talking outdoor
![Page 9: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/9.jpg)
Today: machine learning methods for object recognition
![Page 10: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/10.jpg)
outline
• Intro to object categorization• Brief overview
– Generative– Discriminative
• Generative models• Discriminative models
![Page 11: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/11.jpg)
How many object categories are there?
Biederman 1987
![Page 12: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/12.jpg)
Challenges 1: view point variation
Michelangelo 1475-1564
![Page 13: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/13.jpg)
Challenges 2: illumination
slide credit: S. Ullman
![Page 14: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/14.jpg)
Challenges 3: occlusion
Magritte, 1957
![Page 15: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/15.jpg)
Challenges 4: scale
![Page 16: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/16.jpg)
Challenges 5: deformation
Xu, Beihong 1943
![Page 17: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/17.jpg)
Challenges 6: background clutter
Klimt, 1913
![Page 18: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/18.jpg)
History: single object recognition
![Page 19: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/19.jpg)
History: single object recognition
• Lowe, et al. 1999, 2003• Mahamud and Herbert, 2000• Ferrari, Tuytelaars, and Van Gool, 2004• Rothganger, Lazebnik, and Ponce, 2004• Moreels and Perona, 2005• …
![Page 20: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/20.jpg)
Challenges 7: intra-class variation
![Page 21: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/21.jpg)
Object categorization: Object categorization: the statistical viewpointthe statistical viewpoint
)|( imagezebrap
)( ezebra|imagnopvs.
• Bayes rule:
)()(
)|()|(
)|()|(
zebranopzebrap
zebranoimagepzebraimagep
imagezebranopimagezebrap
⋅=
posterior ratio likelihood ratio prior ratio
![Page 22: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/22.jpg)
Object categorization: Object categorization: the statistical viewpointthe statistical viewpoint
)()(
)|()|(
)|()|(
zebranopzebrap
zebranoimagepzebraimagep
imagezebranopimagezebrap
⋅=
posterior ratio likelihood ratio prior ratio
• Discriminative methods model posterior
• Generative methods model likelihood and prior
![Page 23: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/23.jpg)
Discriminative
• Direct modeling of
Zebra
Non-zebra
Decisionboundary
)|()|(
imagezebranopimagezebrap
![Page 24: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/24.jpg)
• Model and Generative
)|( zebraimagep ) |( zebranoimagep
Low Middle
High Middle Low
)|( zebranoimagep)|( zebraimagep
![Page 25: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/25.jpg)
Three main issuesThree main issues
• Representation– How to represent an object category
• Learning– How to form the classifier, given training data
• Recognition– How the classifier is to be used on novel data
![Page 26: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/26.jpg)
Representation– Generative /
discriminative / hybrid
![Page 27: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/27.jpg)
Representation– Generative /
discriminative / hybrid– Appearance only or
location and appearance
![Page 28: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/28.jpg)
Representation– Generative /
discriminative / hybrid– Appearance only or
location and appearance
– Invariances• View point• Illumination• Occlusion• Scale• Deformation• Clutter• etc.
![Page 29: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/29.jpg)
Representation– Generative /
discriminative / hybrid– Appearance only or
location and appearance
– invariances– Part-based or global
w/sub-window
![Page 30: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/30.jpg)
Representation– Generative /
discriminative / hybrid– Appearance only or
location and appearance
– invariances– Parts or global w/sub-
window– Use set of features or
each pixel in image
![Page 31: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/31.jpg)
– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning
Learning
![Page 32: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/32.jpg)
– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning)
– Methods of training: generative vs. discriminative
Learning
0 0.2 0.4 0.6 0.8 10
1
2
3
4
5
clas
s de
nsiti
es
p(x|C1)
p(x|C2)
x0 0.2 0.4 0.6 0.8 1
0
0.2
0.4
0.6
0.8
1
post
erio
r pr
obab
ilitie
s
x
p(C1|x) p(C
2|x)
![Page 33: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/33.jpg)
– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning)
– What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.)
– Level of supervision• Manual segmentation; bounding box; image
labels; noisy labels
Learning
Contains a motorbike
![Page 34: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/34.jpg)
– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning)
– What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.)
– Level of supervision• Manual segmentation; bounding box; image
labels; noisy labels– Batch/incremental (on category and image
level; user-feedback )
Learning
![Page 35: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/35.jpg)
– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning)
– What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.)
– Level of supervision• Manual segmentation; bounding box; image
labels; noisy labels– Batch/incremental (on category and image
level; user-feedback ) – Training images:
• Issue of overfitting• Negative images for discriminative methods
Learning
![Page 36: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/36.jpg)
– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning)
– What are you maximizing? Likelihood (Gen.) or performances on train/validation set (Disc.)
– Level of supervision• Manual segmentation; bounding box; image
labels; noisy labels– Batch/incremental (on category and image
level; user-feedback ) – Training images:
• Issue of overfitting• Negative images for discriminative methods
– Priors
Learning
![Page 37: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/37.jpg)
– Scale / orientation range to search over – Speed
Recognition
![Page 38: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/38.jpg)
Bag-of-words models
![Page 39: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/39.jpg)
ObjectObject Bag of Bag of ‘‘wordswords’’
![Page 40: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/40.jpg)
Analogy to documentsAnalogy to documentsOf all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
sensory, brain, visual, perception,
retinal, cerebral cortex,eye, cell, optical
nerve, imageHubel, Wiesel
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
China, trade, surplus, commerce,
exports, imports, US, yuan, bank, domestic,
foreign, increase, trade, value
![Page 41: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/41.jpg)
![Page 42: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/42.jpg)
categorycategorydecisiondecision
learninglearning
feature detection& representation
codewords dictionarycodewords dictionary
image representation
category modelscategory models(and/or) classifiers(and/or) classifiers
recognitionrecognition
![Page 43: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/43.jpg)
feature detection& representation
codewords dictionarycodewords dictionary
image representation
RepresentationRepresentation
1.1.2.2.
3.3.
![Page 44: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/44.jpg)
1.Feature detection and representation1.Feature detection and representation
![Page 45: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/45.jpg)
1.Feature 1.Feature detectiondetection and and representationrepresentation
Normalize patch
Detect patches[Mikojaczyk and Schmid ’02]
[Matas et al. ’02]
[Sivic et al. ’03]
Compute SIFT
descriptor[Lowe’99]
Slide credit: Josef Sivic
![Page 46: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/46.jpg)
…
1.Feature 1.Feature detectiondetection and and representationrepresentation
![Page 47: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/47.jpg)
2. Codewords dictionary formation2. Codewords dictionary formation
…
![Page 48: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/48.jpg)
2. Codewords dictionary formation2. Codewords dictionary formation
Vector quantization
…
Slide credit: Josef Sivic
![Page 49: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/49.jpg)
2. Codewords dictionary formation2. Codewords dictionary formation
Fei-Fei et al. 2005
![Page 50: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/50.jpg)
3. Image representation3. Image representation
…..
frequ
ency
codewords
![Page 51: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/51.jpg)
feature detection& representation
codewords dictionarycodewords dictionary
image representation
RepresentationRepresentation
1.1.2.2.
3.3.
![Page 52: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/52.jpg)
categorycategorydecisiondecision
codewords dictionarycodewords dictionary
category modelscategory models(and/or) classifiers(and/or) classifiers
Learning and RecognitionLearning and Recognition
![Page 53: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/53.jpg)
2 case studies2 case studies
1. Naïve Bayes classifier– Csurka et al. 2004
2. Hierarchical Bayesian text models (pLSA and LDA)
– Background: Hoffman 2001, Blei et al. 2004– Object categorization: Sivic et al. 2005, Sudderth et
al. 2005– Natural scene categorization: Fei-Fei et al. 2005
![Page 54: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/54.jpg)
• wn: each patch in an image– wn = [0,0,…1,…,0,0]T
• w: a collection of all N patches in an image– w = [w1,w2,…,wN]
• dj: the jth image in an image collection• c: category of the image• z: theme or topic of the patch
First, some notationsFirst, some notations
![Page 55: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/55.jpg)
wN
c
Case #1: the NaCase #1: the Naïïve ve BayesBayes modelmodel
)|()( cwpcp
Prior prob. of the object classes
Image likelihoodgiven the class
Csurka et al. 2004
∏=
=N
nn cwpcp
1
)|()(
Object classdecision
∝)|( wcpc
c maxarg=∗
![Page 56: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/56.jpg)
Hoffman, 2001
Case #2: Hierarchical Bayesian Case #2: Hierarchical Bayesian text modelstext models
wN
d z
D
wN
c z
D
π
Blei et al., 2001
Probabilistic Latent Semantic Analysis (pLSA)
Latent Dirichlet Allocation (LDA)
![Page 57: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/57.jpg)
wN
d z
D
Case #2: Hierarchical Bayesian Case #2: Hierarchical Bayesian text modelstext models
Probabilistic Latent Semantic Analysis (pLSA)
“face”
Sivic et al. ICCV 2005
![Page 58: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/58.jpg)
wN
c z
D
π
Latent Dirichlet Allocation (LDA)
Case #2: Hierarchical Bayesian Case #2: Hierarchical Bayesian text modelstext models
Fei-Fei et al. ICCV 2005
“beach”
![Page 59: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/59.jpg)
Another application
• Human action classification
![Page 60: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/60.jpg)
Invariance issuesInvariance issues• Scale and rotation
– Implicit– Detectors and descriptors
Kadir and Brady. 2003
![Page 61: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/61.jpg)
• Scale and rotation• Occlusion
– Implicit in the models– Codeword distribution: small variations– (In theory) Theme (z) distribution: different
occlusion patterns
Invariance issuesInvariance issues
![Page 62: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/62.jpg)
• Scale and rotation• Occlusion• Translation
– Encode (relative) location information
Invariance issuesInvariance issues
Sudderth et al. 2005
![Page 63: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/63.jpg)
• Scale and rotation• Occlusion• Translation• View point (in theory)
– Codewords: detector and descriptor
– Theme distributions: different view points
Invariance issuesInvariance issues
Fergus et al. 2005
![Page 64: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/64.jpg)
Model propertiesModel propertiesOf all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
sensory, brain, visual, perception,
retinal, cerebral cortex,eye, cell, optical
nerve, imageHubel, Wiesel
• Intuitive– Analogy to documents
![Page 65: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/65.jpg)
Model propertiesModel properties
• Intuitive• (Could use)
generative models– Convenient for weakly-
or un-supervised training
– Prior information– Hierarchical Bayesian
frameworkSivic et al., 2005, Sudderth et al., 2005
![Page 66: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/66.jpg)
Model propertiesModel properties
• Intuitive• (Could use)
generative models• Learning and
recognition relatively fast– Compare to other
methods
![Page 67: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/67.jpg)
• No rigorous geometric information of the object components
• It’s intuitive to most of us that objects are made of parts – no such information
• Not extensively tested yet for– View point invariance– Scale invariance
• Segmentation and localization unclear
Weakness of the modelWeakness of the model
![Page 68: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/68.jpg)
part-based models
Slides courtesy to Rob Fergus for “part-based models”
![Page 69: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/69.jpg)
One-shot learningof object categoriesFei-Fei et al. ‘03, ‘04, ‘06
![Page 70: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/70.jpg)
One-shot learningof object categories
P. Bruegel, 1562
Fei-Fei et al. ‘03, ‘04, ‘06
![Page 71: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/71.jpg)
One-shot learningof object categories
model representation
Fei-Fei et al. ‘03, ‘04, ‘06
![Page 72: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/72.jpg)
X (location)
(x,y) coords. of region center
A (appearance)
Projection ontoPCA basis
c1
c2
c10
…..
normalize
11x11 patch
![Page 73: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/73.jpg)
X (location)
(x,y) coords. of region center
A (appearance)
Projection ontoPCA basis
c1
c2
c10
…..
normalize
11x11 patchX A
h
ΓXμX
I
ΓAμA
The Generative Model
![Page 74: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/74.jpg)
X A
h
ΓXμX
I
ΓAμA
The Generative Model
observed variables
hidden variable
parameters
![Page 75: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/75.jpg)
X A
h
ΓXμX
I
ΓAμA
The Generative Model
θn
θ1
θ2
where θ = {µX, ΓX, µA, ΓA}
ML/MAP
Weber et al. ’98 ’00, Fergus et al. ’03
![Page 76: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/76.jpg)
The Generative Model
θn
θ1
θ2
where θ = {µX, ΓX, µA, ΓA}
ML/MAPΓXμX
ΓAμA
shape model
appearance model
![Page 77: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/77.jpg)
X A
h
I
ΓXμX ΓAμA
θn
θ1
θ2
The Generative Model
ML/MAP
![Page 78: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/78.jpg)
X A
h
I
PΓXμX ΓAμA
a0XB0X
m0Xβ0X
a0AB0A
m0Aβ0A
Bayesian
Parameters to estimate: {mX, βX, aX, BX, mA, βA, aA, BA}i.e. parameters of Normal-Wishart distribution
θn
θ1
θ2
The Generative Model
Fei-Fei et al. ‘03, ‘04, ‘06
![Page 79: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/79.jpg)
The Generative Model
X A
h
I
PΓXμX ΓAμA
a0XB0X
m0Xβ0X
a0AB0A
m0Aβ0A
parameters
priors
Fei-Fei et al. ‘03, ‘04, ‘06
![Page 80: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/80.jpg)
The Generative Model
X A
h
I
PΓXμX ΓAμA
a0XB0X
m0Xβ0X
a0AB0A
m0Aβ0A
Prior distribution
Fei-Fei et al. ‘03, ‘04, ‘06
priors
![Page 81: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/81.jpg)
2. modelrepresentation
3. learning& inferences
4. evaluation& dataset
& application
1. human vision
One-shot learningof object categories
![Page 82: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/82.jpg)
One-shot learningof object categories
learning & inferences
Fei-Fei et al. 2003, 2004, 2006
No labeling No segmentation No alignment
![Page 83: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/83.jpg)
One-shot learningof object categories
learning & inferences
Fei-Fei et al. 2003, 2004, 2006
Bayesian
θn
θ1
θ2
![Page 84: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/84.jpg)
E-Step
Random initializationVariational EMVariational EM
new estimate of p(θ|train)
M-Step
prior knowledge of p(θ)Attias, Jordan, Hinton etc.
![Page 85: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/85.jpg)
One-shot learningof object categories
evaluation & dataset
Fei-Fei et al. 2004, 2006a, 2006b
![Page 86: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/86.jpg)
One-shot learningof object categories
evaluation & dataset
Fei-Fei et al. 2004, 2006a, 2006b
-- Caltech 101 Dataset
![Page 87: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/87.jpg)
One-shot learningof object categories
evaluation & dataset
Fei-Fei et al. 2004, 2006a, 2006b
-- Caltech 101 Dataset
![Page 88: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/88.jpg)
Part 3: discriminative methods
![Page 89: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/89.jpg)
Discriminative methodsObject detection and recognition is formulated as a classification problem.
Bag of image patches
Decision boundary
… and a decision is taken at each window about if it contains a target object or not.
Computer screen
Background
In some feature space
Where are the screens?
The image is partitioned into a set of overlapping windows
![Page 90: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/90.jpg)
(The lousy painter)
Discriminative vs. generative
0 10 20 30 40 50 60 700
0.05
0.1
x = data
• Generative model
0 10 20 30 40 50 60 700
0.5
1
x = data
• Discriminative model
0 10 20 30 40 50 60 70 80
-1
1
x = data
• Classification function
(The artist)
![Page 91: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/91.jpg)
Discriminative methods
106 examples
Nearest neighbor
Shakhnarovich, Viola, Darrell 2003Berg, Berg, Malik 2005…
Neural networks
LeCun, Bottou, Bengio, Haffner 1998Rowley, Baluja, Kanade 1998…
Conditional Random FieldsSupport Vector Machines and Kernels
McCallum, Freitag, Pereira 2000Kumar, Hebert 2003…
Guyon, VapnikHeisele, Serre, Poggio, 2001…
![Page 92: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/92.jpg)
• Formulation: binary classificationFormulation
+1-1
x1 x2 x3 xN
…
… xN+1 xN+2 xN+M
-1 -1 ? ? ?
…
Training data: each image patch is labeledas containing the object or background
Test data
Features x =
Labels y =
Where belongs to some family of functions
• Classification function
• Minimize misclassification error(Not that simple: we need some guarantees that there will be generalization)
![Page 93: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/93.jpg)
Overview of section
• Object detection with classifiers
• Boosting– Gentle boosting– Weak detectors– Object model– Object detection
• Multiclass object detection
![Page 94: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/94.jpg)
• A simple algorithm for learning robust classifiers– Freund & Shapire, 1995– Friedman, Hastie, Tibshhirani, 1998
• Provides efficient algorithm for sparse visual feature selection– Tieu & Viola, 2000– Viola & Jones, 2003
• Easy to implement, not requires external optimization tools.
Why boosting?
![Page 95: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/95.jpg)
• Defines a classifier using an additive model:
Boosting
Strong classifier
Weak classifier
WeightFeaturesvector
![Page 96: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/96.jpg)
• Defines a classifier using an additive model:
• We need to define a family of weak classifiers
Boosting
Strong classifier
Weak classifier
WeightFeaturesvector
from a family of weak classifiers
![Page 97: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/97.jpg)
Each data point has
a class label:
wt =1and a weight:
+1 ( )
-1 ( )yt =
Boosting• It is a sequential procedure:
xt=1
xt=2
xt
![Page 98: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/98.jpg)
Toy exampleWeak learners from the family of lines
h => p(error) = 0.5 it is at chance
Each data point has
a class label:
wt =1and a weight:
+1 ( )
-1 ( )yt =
![Page 99: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/99.jpg)
Toy example
This one seems to be the best
Each data point has
a class label:
wt =1and a weight:
+1 ( )
-1 ( )yt =
This is a ‘weak classifier’: It performs slightly better than chance.
![Page 100: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/100.jpg)
Toy example
We set a new problem for which the previous weak classifier performs at chance again
Each data point has
a class label:
wt wt exp{-yt Ht}
We update the weights:
+1 ( )
-1 ( )yt =
![Page 101: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/101.jpg)
Toy example
We set a new problem for which the previous weak classifier performs at chance again
Each data point has
a class label:
wt wt exp{-yt Ht}
We update the weights:
+1 ( )
-1 ( )yt =
![Page 102: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/102.jpg)
Toy example
We set a new problem for which the previous weak classifier performs at chance again
Each data point has
a class label:
wt wt exp{-yt Ht}
We update the weights:
+1 ( )
-1 ( )yt =
![Page 103: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/103.jpg)
Toy example
We set a new problem for which the previous weak classifier performs at chance again
Each data point has
a class label:
wt wt exp{-yt Ht}
We update the weights:
+1 ( )
-1 ( )yt =
![Page 104: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/104.jpg)
Toy example
The strong (non- linear) classifier is built as the combination of all the weak (linear) classifiers.
f1 f2
f3
f4
![Page 105: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/105.jpg)
From images to features:Weak detectors
We will now define a family of visual features that can be used as weak classifiers (“weak detectors”)
Takes image as input and the output is binary response.The output is a weak detector.
![Page 106: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/106.jpg)
Weak detectorsTextures of textures Tieu and Viola, CVPR 2000
Every combination of three filters generates a different feature
This gives thousands of features. Boosting selects a sparse subset, so computations on test time are very efficient. Boosting also avoids overfitting to some extend.
![Page 107: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/107.jpg)
Weak detectorsHaar filters and integral imageViola and Jones, ICCV 2001
The average intensity in the block is computed with four sums independently of the block size.
![Page 108: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/108.jpg)
Weak detectorsOther weak detectors:• Carmichael, Hebert 2004• Yuille, Snow, Nitzbert, 1998• Amit, Geman 1998• Papageorgiou, Poggio, 2000• Heisele, Serre, Poggio, 2001• Agarwal, Awan, Roth, 2004• Schneiderman, Kanade 2004 • …
![Page 109: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/109.jpg)
Weak detectorsPart based: similar to part-based generative
models. We create weak detectors by using parts and voting for the object center location
Car model Screen model
These features are used for the detector on the course web site.
![Page 110: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/110.jpg)
Weak detectorsFirst we collect a set of part templates from a set of training objects.Vidal-Naquet, Ullman (2003)
…
![Page 111: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/111.jpg)
Weak detectorsWe now define a family of “weak detectors” as:
= =
Better than chance
*
![Page 112: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/112.jpg)
Weak detectorsWe can do a better job using filtered images
Still a weak detectorbut better than before
* *= ==
![Page 113: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/113.jpg)
TrainingFirst we evaluate all the N features on all the training images.
Then, we sample the feature outputs on the object center and at random locations in the background:
![Page 114: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/114.jpg)
Representation and object model
…4 10
Selected features for the screen detector
1 2 3
…100
Lousy painter
![Page 115: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/115.jpg)
Representation and object modelSelected features for the car detector
1 2 3 4 10 100
… …
![Page 116: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/116.jpg)
Overview of section
• Object detection with classifiers
• Boosting– Gentle boosting– Weak detectors– Object model– Object detection
• Multiclass object detection
![Page 117: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/117.jpg)
Example: screen detectionFeature output
![Page 118: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/118.jpg)
Example: screen detectionFeature output
Thresholdedoutput
Weak ‘detector’Produces many false alarms.
![Page 119: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/119.jpg)
Example: screen detectionFeature output
Thresholdedoutput
Strong classifier at iteration 1
![Page 120: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/120.jpg)
Example: screen detectionFeature output
Thresholdedoutput
Strongclassifier
Second weak ‘detector’Produces a different set of false alarms.
![Page 121: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/121.jpg)
Example: screen detection
+
Feature output
Thresholdedoutput
Strongclassifier
Strong classifier at iteration 2
![Page 122: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/122.jpg)
Example: screen detection
+
…
Feature output
Thresholdedoutput
Strongclassifier
Strong classifier at iteration 10
![Page 123: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/123.jpg)
Example: screen detection
+
…
Feature output
Thresholdedoutput
Strongclassifier
Adding features
Finalclassification
Strong classifier at iteration 200
![Page 124: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/124.jpg)
applications
![Page 125: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/125.jpg)
![Page 126: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/126.jpg)
![Page 127: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/127.jpg)
![Page 128: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/128.jpg)
Document Analysis
Digit recognition, AT&T labshttp://www.research.att.com/~yann/
![Page 129: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/129.jpg)
Medical Imaging
![Page 130: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/130.jpg)
Robotics
![Page 131: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/131.jpg)
Toys and robots
![Page 133: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/133.jpg)
Surveillance
![Page 135: Machine Learning in Computer Vision](https://reader034.vdocuments.us/reader034/viewer/2022051313/54822d6bb4af9f730d8b4704/html5/thumbnails/135.jpg)
Searching the web