TRANSCRIPT
EECS 442 – Computer vision
Object Recognition
• Intro
• Recognition of 3D objects
• Recognition of object categories:
• Bag of words models
• Part based models
• 3D object categorization
Basic properties
• Representation – 2D Bag of Words (BoW) models;
– 2D Star/ISM models;
– Multi-view models;
• Learning – Generative & Discriminative BoW models
– Probabilistic Hough voting
– Generative multi-view models
• Recognition – Classification with BoW
– Detection via Hough voting
Representation
[Pipeline figure: feature detection & representation → codewords dictionary → image representation; learning produces the category models (and/or) classifiers; recognition yields the category decision]
Learning and Recognition
1. Discriminative method:
- NN
- SVM
2. Generative method:
- graphical models
Discriminative models
Support Vector Machines
Guyon, Vapnik, Heisele,
Serre, Poggio…
Boosting
Viola, Jones 2001,
Torralba et al. 2004,
Opelt et al. 2006,…
10^6 examples
Nearest neighbor
Shakhnarovich, Viola, Darrell 2003
Berg, Berg, Malik 2005...
Neural networks
Slide adapted from Antonio Torralba; courtesy of Vittorio Ferrari
Slide credit: Kristen Grauman
Latent SVM
Structural SVM
Felzenszwalb 00
Ramanan 03…
LeCun, Bottou, Bengio, Haffner 1998
Rowley, Baluja, Kanade 1998
…
Support vector machines
• Find the hyperplane that maximizes the margin between the positive and negative examples
• Distance between point x_i and the hyperplane: |x_i · w + b| / ||w||
• Support vectors satisfy: y_i (x_i · w + b) = 1
• Margin = 2 / ||w||
• Solution: w = Σ_i α_i y_i x_i
• Classification function (decision boundary): w · x + b = Σ_i α_i y_i (x_i · x) + b
Credit slide: S. Lazebnik
Support vector machines • Classification
For a test point x, evaluate f(x) = w · x + b = Σ_i α_i y_i (x_i · x) + b:
• if w · x + b ≥ 0, assign class 1
• if w · x + b < 0, assign class 2
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge
Discovery, 1998
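The margin formulas are easiest to see in the two-example case. The sketch below (plain NumPy, not from the slides) solves the max-margin hyperplane in closed form for one positive and one negative point; both points are support vectors and the margin equals their distance:

```python
import numpy as np

# Max-margin hyperplane for the simplest case: one positive and one
# negative example. The hyperplane is the perpendicular bisector.
x_pos = np.array([1.0, 1.0])   # label +1
x_neg = np.array([4.0, 4.0])   # label -1

diff = x_pos - x_neg
w = 2 * diff / np.dot(diff, diff)   # closed-form max-margin solution
b = 1 - np.dot(w, x_pos)            # chosen so that w.x_pos + b = +1

print("w.x_pos + b =", np.dot(w, x_pos) + b)   # +1: support vector
print("w.x_neg + b =", np.dot(w, x_neg) + b)   # -1: support vector
print("margin      =", 2 / np.linalg.norm(w))  # equals ||x_pos - x_neg||
```

The decision value is exactly +1 and -1 on the two support vectors, matching the support-vector condition y_i(x_i · w + b) = 1, and the margin 2/||w|| equals the distance between the points.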
• Datasets that are linearly separable work out great:
•
•
• But what if the dataset is just too hard?
• We can map it to a higher-dimensional space:
[Figure: 1D data that is not separable by any threshold on x becomes linearly separable after mapping x → (x, x²)]
Nonlinear SVMs
Slide credit: Andrew Moore
Φ: x → φ(x)
Nonlinear SVMs
• General idea: the original input space can always be
mapped to some higher-dimensional feature space
where the training set is separable:
Slide credit: Andrew Moore
lifting transformation
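A minimal numeric sketch of the lifting transformation (the data values are hypothetical): 1D points that no single threshold on x can separate become linearly separable after the map Φ: x → (x, x²):

```python
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])   # not linearly separable in 1D

# Lifting transformation Phi: x -> (x, x^2)
phi = np.stack([x, x ** 2], axis=1)

# In the lifted 2D space the horizontal line x2 = 2.25 separates
# the two classes perfectly.
pred = np.where(phi[:, 1] > 2.25, 1, -1)
print(pred)
```

In 1D the positive points lie on both sides of the negatives, so no hyperplane (threshold) works; in the (x, x²) space a single linear boundary does.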
Overfitting
• A simple dataset.
• Two models
[Figure: the same dataset of + and − points fit by two models: a simple linear boundary and a wiggly non-linear boundary that separates the training points perfectly]
Linear vs. non-linear
Overfitting
[Figure: the same two models evaluated after adding more data: the linear boundary still separates the classes well, while the non-linear boundary misclassifies many of the new points]
• Let's get more data.
• Simple model has better generalization.
Overfitting
• As complexity increases, the model overfits the data
• Training loss decreases
• Real loss increases
• We need to penalize model complexity, i.e., to regularize
[Plot: loss vs. model complexity; training loss decreases monotonically while real loss eventually increases]
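The trend in the plot can be reproduced with a small experiment (hypothetical data; polynomial degree stands in for model complexity here, which is one common illustration rather than the slides' own example):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
x_test = np.linspace(0.05, 0.95, 50)
true_f = lambda x: 2 * x + 1              # simple underlying relationship
y_train = true_f(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = true_f(x_test) + rng.normal(0, 0.2, x_test.size)

results = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares fit
    train_loss = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_loss = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (train_loss, test_loss)
    print(f"degree {degree}: train={train_loss:.4f}, test={test_loss:.4f}")
```

The degree-9 model nearly interpolates the 10 training points (training loss near zero), which is exactly the overfitting regime: lower training loss, but no better on held-out data.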
category models
(and/or) classifiers
Learning and Recognition
1. Discriminative method:
- NN
- SVM
2. Generative method:
- graphical models
Generative models
1. Naïve Bayes classifier – Csurka, Bray, Dance & Fan, 2004
2. Hierarchical Bayesian text models (pLSA and LDA)
– Background: Hofmann 2001; Blei, Ng & Jordan, 2003
– Object categorization: Sivic et al. 2005, Sudderth et al. 2005
– Natural scene categorization: Fei-Fei et al. 2005
Object categorization:
the statistical viewpoint
• Bayes rule:
p(zebra | image) / p(no zebra | image) = [ p(image | zebra) / p(image | no zebra) ] × [ p(zebra) / p(no zebra) ]
posterior ratio = likelihood ratio × prior ratio
• Discriminative methods model posterior
• Generative methods model likelihood and
prior
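A quick numeric illustration of the ratio form of Bayes rule, with made-up probabilities for the zebra example:

```python
# Hypothetical numbers: an image that looks quite zebra-like (high
# likelihood ratio) can still have posterior odds below 1 when the
# prior probability of seeing a zebra is small.
p_image_given_zebra = 0.8      # likelihood p(image | zebra)
p_image_given_no_zebra = 0.1   # likelihood p(image | no zebra)
p_zebra = 0.01                 # prior p(zebra)
p_no_zebra = 0.99              # prior p(no zebra)

likelihood_ratio = p_image_given_zebra / p_image_given_no_zebra
prior_ratio = p_zebra / p_no_zebra
posterior_ratio = likelihood_ratio * prior_ratio
print(likelihood_ratio, prior_ratio, posterior_ratio)
```

Even with an 8:1 likelihood ratio in favor of "zebra", the tiny prior keeps the posterior ratio below 1, so "no zebra" remains the more probable class.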
Some notation
• w: a collection of all N codewords in the image, w = [w1, w2, …, wN]
• c: category of the image
the Naïve Bayes model
p(c | w) ∝ p(c) p(w | c) = p(c) p(w1, …, wN | c)
• p(c): prior prob. of the object classes
• p(w1, …, wN | c): image likelihood given the class
• Assume that each feature (codeword) is conditionally independent given the class:
p(w1, …, wN | c) = ∏_{n=1}^{N} p(wn | c)
where p(wn | c) is the likelihood of the nth visual word given the class
Classification/Recognition
• Object class decision: c* = argmax_c p(c | w) ∝ p(c) ∏_{n=1}^{N} p(wn | c)
• Compare the per-word likelihoods p(wi | c1) and p(wi | c2) across classes
• How do we learn P(wi|cj)?
• From empirical
frequencies of code words
in images from a given
class
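The whole pipeline (frequency counting for p(w|c), then the argmax decision) fits in a few lines. A minimal sketch with toy data; the class names, codeword indices, and add-one smoothing are illustrative choices, not from the slides:

```python
import numpy as np

# Toy training set: a vocabulary of 4 codewords, two classes.
# Each "image" is a list of codeword indices.
train = {
    "zebra":    [[0, 0, 1, 2], [0, 1, 1, 0]],
    "no_zebra": [[2, 3, 3, 2], [3, 2, 1, 3]],
}
V = 4  # vocabulary size

# Learn p(w|c) by frequency counting (with add-one smoothing)
# and p(c) from class frequencies.
p_w_given_c, p_c = {}, {}
total_images = sum(len(imgs) for imgs in train.values())
for c, imgs in train.items():
    counts = np.ones(V)               # Laplace smoothing
    for img in imgs:
        for w in img:
            counts[w] += 1
    p_w_given_c[c] = counts / counts.sum()
    p_c[c] = len(imgs) / total_images

def classify(words):
    # c* = argmax_c  log p(c) + sum_n log p(w_n | c)
    scores = {c: np.log(p_c[c]) + sum(np.log(p_w_given_c[c][w]) for w in words)
              for c in train}
    return max(scores, key=scores.get)

print(classify([0, 1, 0]))   # dominated by zebra-typical codewords
```

Working in log space avoids underflow when N is large, which is the standard trick for the product of per-word likelihoods.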
Summary: Generative models
• Naïve Bayes
– Unigram models in document analysis
– Assumes conditional independence of words given class
– Parameter estimation: frequency counting
Other generative BoW models
• Hierarchical Bayesian topic models (e.g. pLSA and LDA)
– Object categorization: Sivic et al. 2005, Sudderth et al. 2005
– Natural scene categorization: Fei-Fei et al. 2005
Generative vs discriminative
• Discriminative methods – Computationally efficient & fast
• Generative models – Convenient for weakly- or un-supervised,
incremental training
– Prior information
– Flexibility in modeling parameters
Weakness of BoW models
• No rigorous geometric information about the object components
• It's intuitive to most of us that objects are made of parts, but BoW carries no such information
• Not extensively tested yet for:
– Viewpoint invariance
– Scale invariance
• Segmentation and localization unclear
EECS 442 – Computer vision
Object Recognition
• Intro
• Recognition of 3D objects
• Recognition of object categories:
• Bag of words models
• Part based models
• 3D object categorization
Basic properties
• Representation – 2D Bag of Words (BoW) models;
– 2D Star/ISM models;
– Multi-view models;
• Learning – Generative & Discriminative BoW models
– Probabilistic Hough voting
– Generative multi-view models
• Recognition – Classification with BoW
– Detection via Hough voting
Problem with bag-of-words
• All have equal probability for bag-of-words methods
• Location information is important
Part Based Representation
• Object as set of parts
• Model:
– Relative locations
between parts
– Appearance of part
Figure from [Fischler & Elschlager 73]
History of Parts and Structure
approaches
• Fischler & Elschlager 1973
• Yuille '91
• Brunelli & Poggio '93
• Lades, v.d. Malsburg et al. '93
• Cootes, Lanitis, Taylor et al. '95
• Amit & Geman '95, '99
• Perona et al. '95, '96, '98, '00, '03, '04, '05
• Ullman et al. '02
• Felzenszwalb & Huttenlocher '00, '04
• Crandall & Huttenlocher '05, '06
• Leibe & Schiele '03, '04
• Many papers since 2000
Sparse representation
Computationally tractable (10^5 pixels → 10^1–10^2 parts)
But throw away potentially useful image information
from Sparse Flexible Models of Local Features
Gustavo Carneiro and David Lowe, ECCV 2006
Different connectivity structures
[Figure: connectivity structures with inference complexities O(N^6), O(N^2), O(N^3), O(N^2)]
Fergus et al. '03; Fei-Fei et al. '03
Crandall et al. '05
Leibe '05; Felzenszwalb '09; Crandall et al. '05
Felzenszwalb & Huttenlocher '00
Bouchard & Triggs '05; Carneiro & Lowe '06
Csurka '04; Vasconcelos '00
[Figure: compositional hierarchy, from low-level to high-level: discontinuities and gradients; linelets, curvelets, T-junctions; contours and intermediate objects; animals, trees, rocks. "Animal head" can be instantiated by a tiger head or a bear head.]
Context and Hierarchy in a Probabilistic Image Model, Jin & Geman (2006)
– Search strategy: sliding windows
• Simple, but large computational complexity (over x, y, scale, and the number of classes)
Star models by Latent SVM
B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit
Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004
Implicit shape models by generalized Hough voting
Credit slide: S. Lazebnik
Basic properties
• Representation
– How to represent an object category; which classification scheme?
• Learning
– How to learn the classifier, given training data
• Recognition
– How the classifier is to be used on novel data
B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit
Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004
Object representation: Constellation of parts w.r.t object
centroid
Credit slide: S. Lazebnik
B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit
Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004
Object representation: How to capture constellation of parts?
Using Hough Voting
Credit slide: S. Lazebnik
Basic properties
• Representation
– How to represent an object category; which classification scheme?
• Learning
– How to learn the classifier, given training data
• Recognition
– How the classifier is to be used on novel data
• Parts in query image vote for a learnt model
• Significant aggregations of votes correspond to models
• Complexity : # parts * # votes
– Significantly lower than brute force search (e.g., sliding window detectors)
• Popular for detecting parameterized shapes – Hough’59, Duda&Hart’72, Ballard’81,…
Hough Transform Formulation
Slide Modified from S. Maji
Hough transform
Given a set of points, find the curve or line that explains
the data points best
P.V.C. Hough, Machine Analysis of Bubble Chamber Pictures, Proc. Int. Conf. High
Energy Accelerators and Instrumentation, 1959
[Figure: a line y = m x + n in image space (x, y) corresponds to a single point (m, n) in Hough space]
y = m x + n
Hough transform
Given a set of points, find the curve or line that explains
the data points best
P.V.C. Hough, Machine Analysis of Bubble Chamber Pictures, Proc. Int. Conf. High
Energy Accelerators and Instrumentation, 1959
Hough space: a point (x1, y1) in image space maps to the line y1 = m x1 + n, i.e. n = y1 − m x1, in (m, n) space
Hough transform
Issue : parameter space [m,n] is unbounded…
P.V.C. Hough, Machine Analysis of Bubble Chamber Pictures, Proc. Int. Conf. High
Energy Accelerators and Instrumentation, 1959
Hough space
• Use a polar representation for the parameter space: x cos θ + y sin θ = ρ
• Features cast votes in (θ, ρ) space
• IDEA: introduce a grid and count intersection points in each cell
• Issue: grid size needs to be adjusted…
Hough transform - experiments
Noisy data
• All points are processed independently, so can cope with
occlusion/outliers
• Some robustness to noise: noise points unlikely to
contribute consistently to any single bin
Hough transform - conclusions
Good:
Bad:
• Spurious peaks due to uniform noise
• Trade-off noise-grid size (hard to find sweet point)
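The polar-form voting scheme above can be sketched compactly. The grid resolution parameters (rho_res, theta_bins) are illustrative choices, and the trade-off just mentioned shows up directly in them:

```python
import numpy as np

def hough_lines(points, rho_res=1.0, theta_bins=180):
    # Accumulate votes in (theta, rho) space: each point (x, y) votes
    # for every line x*cos(theta) + y*sin(theta) = rho through it.
    pts = np.asarray(points, dtype=float)
    thetas = np.linspace(0, np.pi, theta_bins, endpoint=False)
    max_rho = np.hypot(np.abs(pts[:, 0]).max(), np.abs(pts[:, 1]).max()) + 1
    n_rho = int(2 * max_rho / rho_res) + 1
    acc = np.zeros((theta_bins, n_rho), dtype=int)
    for x, y in pts:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        idx = ((rhos + max_rho) / rho_res).astype(int)
        acc[np.arange(theta_bins), idx] += 1
    t, r = np.unravel_index(acc.argmax(), acc.shape)  # strongest bin
    return thetas[t], r * rho_res - max_rho, acc[t, r]

# Ten collinear points on y = x (theta = 3*pi/4, rho = 0) plus an outlier.
pts = [(i, i) for i in range(10)] + [(0, 7)]
theta, rho, votes = hough_lines(pts)
print(f"theta={theta:.3f}, rho={rho:.1f}, votes={votes}")
```

The ten collinear points all land in (nearly) the same accumulator cell, while the outlier's votes scatter: exactly the robustness-to-outliers property noted above.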
• What if want to detect arbitrary shapes defined by boundary points and a reference point?
Generalized Hough transform
Credit slide: C. Grauman
[Dana H. Ballard, Generalizing the Hough Transform to Detect Arbitrary Shapes, 1980]
To detect the model shape in a new image:
• For each edge point
– Index into table with its gradient orientation θ
– Use retrieved r vectors to vote for position of
reference point
• Peak in this Hough space is reference point with most supporting edges
Assuming translation is the only transformation here, i.e., orientation and
scale are fixed.
Generalized Hough transform
Credit slide: C. Grauman
θ     rx    ry
0     1     0
45    0.7   0.7
90    0     1
135   -0.7  0.7
…
270   0.7   -0.7

Example: circle model
• Query point P1 with θ = 0: R = [rx, ry] = [1, 0], vote C1 = P1 + R
• Query point P2 with θ = 45: R = [0.7, 0.7], vote C2 = P2 + R
• …
• Query point Pk with θ = -180: R = [-1, 0], vote Ck = Pk + R
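The R-table voting procedure can be sketched for a circle model. The orientation-to-displacement convention below is illustrative (it differs in sign from the slide's table, which uses its own convention):

```python
import numpy as np

# R-table for a circle of radius 1: for each boundary orientation theta,
# store the displacement R from boundary point to center. For a point at
# angle a on the unit circle, R = (-cos a, -sin a) points to the center.
angles = np.deg2rad(np.arange(0, 360, 45))
r_table = {round(np.degrees(a)): (-np.cos(a), -np.sin(a)) for a in angles}

# Query: edge points of a circle centered at (5, 3) with radius 1.
center = np.array([5.0, 3.0])
votes = []
for a in angles:
    p = center + np.array([np.cos(a), np.sin(a)])  # boundary point P
    rx, ry = r_table[round(np.degrees(a))]         # table lookup by theta
    votes.append(p + np.array([rx, ry]))           # vote C = P + R

print(np.mean(votes, axis=0))   # all votes coincide at the true center
```

Every edge point votes for the same location, so the reference point emerges as a sharp peak in the voting space.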
• Instead of indexing displacements by gradient orientation, index by "visual codeword"
B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit
Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004
Implicit shape models
Visual codebook is used to index votes for object position [center] and scale

CW    rx    ry
1     0.9   0.1
3     -1    0
…     …     …
N     0.7   -0.7
Implicit shape models: Training
1. Build codebook of patches around extracted interest points using clustering
2. Map the patch around each interest point to the closest codebook entry
3. For each codebook entry, store all positions relative to the object center [center is given] and scale [bounding box is given]
Credit slide: S. Lazebnik
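The three training steps can be sketched with random stand-in descriptors and offsets (a plain-NumPy k-means loop; a real system would cluster SIFT-like patch descriptors and use annotated object centers):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in training data: 200 interest points, each with an 8-D patch
# descriptor and a 2-D offset from the (given) object center.
descriptors = rng.normal(size=(200, 8))
offsets = rng.normal(scale=5.0, size=(200, 2))

# Step 1: build a codebook by clustering descriptors (a few Lloyd steps).
K = 10
codebook = descriptors[rng.choice(200, K, replace=False)].copy()
for _ in range(10):
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None], axis=2)
    assign = d.argmin(axis=1)          # step 2: nearest codebook entry
    for k in range(K):
        if (assign == k).any():
            codebook[k] = descriptors[assign == k].mean(axis=0)

# Step 3: per codebook entry, store all offsets to the object center.
ism_table = {k: offsets[assign == k] for k in range(K)}
print(sum(len(v) for v in ism_table.values()))   # every point stored once
```

At recognition time, each matched codebook entry casts one vote per stored offset, which is what makes the shape model "implicit": geometry lives in the offset lists, not in an explicit part graph.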
Implicit Shape Model - Recognition
Interest points → matched codebook entries → probabilistic voting → 3D voting space (continuous: x, y, s)
Backprojection of maxima → backprojected hypotheses
[Leibe, Leonardis, Schiele, SLCV'04; IJCV'08]
22-Nov-11
Segmentation
Probabilistic Hough Transform
Detection score (marginalizing over codebook matches):
p(O, x | f_j, l_j) = Σ_i p(O, x | C_i, l_j) · p(C_i | f_j)
• p(O, x | C_i, l_j): posterior distribution of the centroid given the codeword C_i observed at location l_j (position posterior)
• p(C_i | f_j): confidence (or weight) of the codeword C_i (codebook match)
• Learnt using a max-margin formulation (Maji et al., CVPR 2009)
f = features, l = location of the features, C = codebook entry, O = object class, x = object center
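A toy computation of this marginalization (all probabilities hypothetical): each matched codeword contributes its vote distribution over candidate centers, weighted by its match confidence:

```python
import numpy as np

# A feature f at location l softly matches 3 codebook entries.
p_c_given_f = np.array([0.6, 0.3, 0.1])   # codeword match confidence

# Rows: codewords C_i; columns: 4 candidate object centers x.
# Each row is p(O, x | C_i, l), the vote distribution of that codeword.
p_ox_given_c_l = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],   # an uninformative codeword
])

# Marginalize: p(O, x | f, l) = sum_i p(O, x | C_i, l) * p(C_i | f)
score = p_c_given_f @ p_ox_given_c_l
print(score, "best center:", score.argmax())
```

The confident first codeword dominates, so its preferred center wins; in the max-margin extension the weights p(C_i | f) would be learned discriminatively rather than set by match frequency.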
Perceptual and Sensory Augmented Computing, Computer Vision WS 08/09
Original image
Example: Results on Cows
Original image Interest points
Example: Results on Cows
Original image Interest points Matched patches
Example: Results on Cows
Example: Results on Cows
Prob. Votes
1st hypothesis
Example: Results on Cows
2nd hypothesis
Example: Results on Cows
Example: Results on Cows
3rd hypothesis
Example Results: Chairs
Office chairs
Dining room chairs
You Can Try It At Home…
• Linux binaries available
Including datasets & several pre-trained detectors
http://www.vision.ee.ethz.ch/bleibe/code
Source: Bastian Leibe
Extension: Learning Feature Weights
• Weights can be learned optimally using a max-margin framework.
Subhransu Maji and Jitendra Malik, Object Detection Using a Max-Margin Hough Transform, CVPR 2009
[Figure: learned part weights, blue (low) to dark red (high)]
• Max-margin: learned weights (ETHZ shape) concentrate on the important parts
• Naïve Bayes: influenced by clutter (rare structures)
Conclusions
• Pros:
– Works well for many different object categories, both rigid and articulated objects
– Flexible geometric model: can recombine parts seen on different training examples
– Learning from relatively few (50-100) training examples
– Optimized for detection, good localization properties
• Cons:
– Needs supervised training data (object bounding boxes for detection; segmentations for top-down segmentation)
– No discriminative learning
Object detection and 3D shape recovery from a single image
• M. Sun, B. Xu, G. Bradski, S. Savarese, ECCV 2010
• M. Sun, S. Kumar, G. Bradski, S. Savarese, 3DIM-PVT 2011
• Object categories
• Arbitrary texture, topology
• Uncontrolled environment
• Un-calibrated camera
Depth Encoded Hough Voting
Detection score:
p(O, x | f_j, l_j, s_j, d_j) = Σ_i p(O, x | C_i, l_j, s_j, d_j) · p(C_i | f_j) · p(s_j | l_j, d_j)
• p(O, x | C_i, l_j, s_j, d_j): position posterior
• p(C_i | f_j): codeword likelihood
• p(s_j | l_j, d_j): scale related to depth
C = codebook, f = part appearance (feature), l = 2D location, s = scale of the part, d = depth information
3D reconstruction
Object detection and 3d shape recovery from a single image
3D pose estimation
Object detection and 3d shape recovery from a single image
3D modeling
Object detection and 3d shape recovery from a single image