Object Recognition
Recognition
• Features
• Classifiers
• Example ‘winning’ system
Object Classes
Individual Recognition
Object parts: automatic, or query-driven
[Figure: labeled object parts – window, mirror, door knob, headlight, bumper, front and back wheels]
[Figure: class vs. non-class examples]
Features and Classifiers
• Same features with different classifiers
• Same classifier with different features
Generic Features
Simple (wavelets) vs. complex (Geons)
Class-specific Features: Common Building Blocks
Optimal Class Components?
• Large features are too rare
• Small features are found everywhere
Find features that carry the highest amount of information
Mutual information
I(C;F) = H(C) – H(C|F)
H(C) = – ∑c P(c) Log P(c)
[Figure: entropy H(C), and the conditional entropies H(C|F=1), H(C|F=0)]
Mutual Information I(C,F)
Class: 1 1 0 1 0 1 0 0
Feature: 1 0 0 1 1 1 0 0
I(F,C) = H(C) – H(C|F)
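A minimal sketch of this computation for the binary example above, using only numpy (base-2 logs give bits; the joint distribution is estimated from counts):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

C = np.array([1, 1, 0, 1, 0, 1, 0, 0])  # class labels from the slide
F = np.array([1, 0, 0, 1, 1, 1, 0, 0])  # feature detections from the slide

# Joint distribution P(C,F) estimated from counts
joint = np.array([[np.mean((C == c) & (F == f)) for f in (0, 1)] for c in (0, 1)])
H_C = entropy(joint.sum(axis=1))                         # H(C) = 1 bit
H_C_given_F = sum(joint[:, f].sum() * entropy(joint[:, f] / joint[:, f].sum())
                  for f in (0, 1))                       # H(C|F) ≈ 0.811
print(H_C - H_C_given_F)                                 # I(C;F) ≈ 0.189 bits
```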
Optimal classification features
• Theoretically: maximizing delivered information minimizes classification error
• In practice: informative object components can be identified in training images
KL Classification Error
[Diagram: C → F, with prior p(C), likelihood p(F|C), and posterior p(C|F)]
P(C,F) determines the best classification error: E = H(C|F).
Since I(C;F) = H(C) – H(C|F), the best F maximizes the mutual information.
Mutual Info vs. Threshold
[Plot: mutual information vs. detection threshold (0–40) for face fragments: forehead, hairline, mouth, eye, nose, nose bridge, long hairline, chin, two eyes]
Selecting Fragments
Adding a New Fragment (max-min selection)
ΔMI(Fi, Fk) = MI(Fi, Fk; class) – MI(Fk; class)
Select: max over i, min over k, of ΔMI(Fi, Fk)
(Min over existing fragments, max over the entire pool)
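A sketch of one greedy max-min selection step; mutual_info is an assumed helper returning the MI between a set of fragment detections and the class labels, not part of the original method's code:

```python
def select_next_fragment(pool, selected, labels, mutual_info):
    """Pick the F_i in the pool maximizing min_k ΔMI(F_i, F_k)."""
    def gain(Fi, Fk):
        # added information of Fi on top of an already-selected Fk
        return mutual_info([Fi, Fk], labels) - mutual_info([Fk], labels)
    # min over existing fragments, max over the entire pool
    return max(pool, key=lambda Fi: min(gain(Fi, Fk) for Fk in selected))
```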
Horse-class features
Car-class features
Pictorial features, learned from examples
Fragments with positions: all detected fragments within their regions
Variability of Airplanes Detected
Class-fragments and Activation
Malach et al., 2008
Bag of words
Object → Bag of ‘words’
Bag of visual words: a large collection of image patches
1. Feature detection and representation
• Regular grid (Vogel & Schiele, 2003; Fei-Fei & Perona, 2005)
Generate a dictionary using K-means clustering
Each class has its own word histogram
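A minimal sketch of both steps, assuming sklearn is available and patch descriptors have already been extracted (the dictionary size k is an illustrative choice, not from the slides):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(descriptors, k=500):
    """descriptors: (n_patches, d) array stacked over all training images."""
    return KMeans(n_clusters=k, n_init=10).fit(descriptors)

def word_histogram(kmeans, image_descriptors):
    """Assign each patch to its nearest visual word and histogram the words."""
    words = kmeans.predict(image_descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()   # normalized bag-of-words histogram
```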
Limited or no geometry. Simple and popular, but no longer state of the art.
HoG Descriptor: Dalal, N. & Triggs, B., Histograms of Oriented Gradients for Human Detection
• SIFT: Scale-Invariant Feature Transform
• MSER: Maximally Stable Extremal Regions
• SURF: Speeded-Up Robust Features
• Cross-correlation
• …
• HoG and SIFT are the most widely used.
SVM – linear separation in feature space
Optimal Separation
SVM vs. Perceptron
Find a separating plane such that the closest points are as far as possible
Rosenblatt, Principles of Neurodynamics 1962.
Vapnik, The Nature of Statistical Learning Theory, 1995
Separating line: w∙x + b = 0
Far line: w∙x + b = +1
Their distance: w∙∆x = +1
Separation: |∆x| = 1/|w|
Margin: 2/|w|
[Figure: the margin, bounded by the lines w∙x + b = –1, 0, +1]
Max Margin Classification
Maximize the margin 2/|w|
(Equivalent form, usually used: minimize ½|w|² subject to yi(w∙xi + b) ≥ 1)
How do we solve such a constrained optimization?
The examples are vectors xi
The labels yi are +1 for class, -1 for non-class
Using Lagrange multipliers: minimize
LP = ½|w|² – ∑ αi [yi(w∙xi + b) – 1]
with αi ≥ 0 the Lagrange multipliers.
Minimizing the Lagrangian
Minimize LP: set all derivatives to 0:
∂LP/∂w = 0 → w = ∑ αi yi xi
∂LP/∂b = 0 → ∑ αi yi = 0
Also for the derivatives w.r.t. αi (recovering the constraints).
Dual formulation: maximize the Lagrangian w.r.t. the αi, subject to the two conditions above.
Dual formulation
Mathematically equivalent formulation: maximize with respect to the αi
LD = ∑ αi – ½ ∑i,j αi αj yi yj <xi∙xj>
After manipulations, a concise matrix form:
Maximize 1ᵀα – ½ αᵀHα, subject to yᵀα = 0 and α ≥ 0
SVM: in simple matrix form
We first find the α. From this we can find: w, b, and the support vectors.
The matrix H is a simple ‘data matrix’: Hij = yiyj <xi∙xj>
Final classification: w∙x + b = ∑ αi yi <xi∙x> + b
Because w = ∑ αi yi xi, only <xi∙x> with support vectors are used.
Full story – separable case
Classification of a new data point x: sgn( ∑ αi yi <xi∙x> + b )
Quadratic Programming (QP)
Minimize (with respect to x): f(x) = ½ xᵀQx + cᵀx
Subject to one or more constraints of the form:
Ax ≤ b (inequality constraints)
Ex = d (equality constraints)
The problem can be solved in polynomial time when Q is positive definite (NP-hard otherwise).
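A sketch of the dual SVM solved as a QP, using cvxopt as an assumed dependency (any QP solver would do); variable names follow the slides, with Hij = yi yj <xi∙xj>:

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual(X, y):
    """X: (n, d) examples, y: (n,) labels in {+1, -1}. Hard-margin case."""
    n = X.shape[0]
    H = (y[:, None] * y[None, :]) * (X @ X.T)
    P, q = matrix(H), matrix(-np.ones(n))            # maximize 1'a - 1/2 a'Ha
    G, h = matrix(-np.eye(n)), matrix(np.zeros(n))   # alpha_i >= 0
    A, b = matrix(y.reshape(1, -1).astype(float)), matrix(0.0)  # sum a_i y_i = 0
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
    w = (alpha * y) @ X                              # w = sum alpha_i y_i x_i
    sv = alpha > 1e-6                                # the support vectors
    b0 = np.mean(y[sv] - X[sv] @ w)                  # b recovered from the SVs
    return w, b0, alpha
```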
Non-separable case (soft margin): same dual problem, with the added constraint C ≥ αi ≥ 0.
Kernel Classification
Full story – kernel case
Classification of a new data point x: sgn( ∑ αi yi K(xi, x) + b )
Hij = K(xi,xj)
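A sketch of kernel classification following the slide's formula; the alphas, b, and support vectors are assumed already found, and the RBF kernel here is one illustrative choice:

```python
import numpy as np

def rbf(u, v, gamma=0.5):
    return np.exp(-gamma * np.sum((u - v) ** 2))

def classify(x, X_sv, y_sv, alpha_sv, b, kernel=rbf):
    """sgn( sum_i alpha_i y_i K(x_i, x) + b ), over support vectors only."""
    s = sum(a * y * kernel(xi, x) for a, y, xi in zip(alpha_sv, y_sv, X_sv))
    return np.sign(s + b)
```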
Felzenszwalb Algorithm
• Felzenszwalb, McAllester, Ramanan CVPR 2008. A Discriminatively Trained, Multiscale, Deformable Part Model
• Many implementation details; we describe only the main points.
Using patches with HoG descriptors and classification by SVM
Person model: HoG
Object model using HoG
A bicycle and its ‘root filter’. The root filter is a patch of HoG descriptors. The image is partitioned into 8×8-pixel cells, and in each cell we compute a histogram of gradient orientations.
The filter is searched on a pyramid of HoG descriptors, to deal with unknown scale
Dealing with scale: multi-scale analysis
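A sketch of HoG features over a scale pyramid, assuming skimage is available; the parameter values are illustrative, not the paper's exact settings:

```python
from skimage.feature import hog
from skimage.transform import pyramid_gaussian

def hog_pyramid(image, levels=5, scale=2 ** 0.5):
    """Return HoG feature maps for progressively downscaled copies of image."""
    feats = []
    for im in pyramid_gaussian(image, max_layer=levels - 1, downscale=scale):
        f = hog(im, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2), feature_vector=False)
        feats.append(f)   # one HoG array per pyramid level
    return feats
```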
A part Pi = (Fi, vi, si, ai, bi).
Fi is filter for the i-th part, vi is the center for a box of possible positions for part i relative to the root position, si the size of this box
ai and bi are two-dimensional vectors specifying coefficients of a quadratic function measuring a score for each possible placement of the i-th part. That is, ai and bi are two numbers each, and the penalty for deviation (∆x, ∆y) from the expected location is a1∆x + a2∆y + b1∆x² + b2∆y²
Adding Parts
Bicycle model: root, parts, spatial map
Person model
The full score of a potential match is:
∑ Fi∙Hi + ∑ (ai1 xi + ai2 yi + bi1 xi² + bi2 yi²)
Fi ∙ Hi is the appearance part
xi, yi is the deviation of part pi from its expected location in the model; this is the spatial part.
Match Score
Search over placements with gradient descent, including the levels in the pyramid: start with the root filter and find locations of high score for it; for each high-scoring location, search for the optimal placement of the parts at a level with twice the resolution of the root filter, using GD.
Final decision β∙ψ > θ implies class
Recognition
Essentially maximize ∑ Fi∙Hi + ∑ (ai1 xi + ai2 yi + bi1 xi² + bi2 yi²) over placements (xi, yi)
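A sketch of this score for one candidate placement, following the slides' notation (Fi, Hi, ai, bi); the array shapes and names are illustrative assumptions:

```python
import numpy as np

def match_score(F, H, a, b, dxy):
    """
    F, H : lists of per-part filter / HoG arrays (matching shapes)
    a, b : (n_parts, 2) coefficients (a_i1, a_i2) and (b_i1, b_i2)
    dxy  : (n_parts, 2) deviations (x_i, y_i) from expected part locations
    """
    appearance = sum(np.sum(Fi * Hi) for Fi, Hi in zip(F, H))   # sum F_i . H_i
    spatial = float(np.sum(a * dxy) + np.sum(b * dxy ** 2))     # quadratic term
    return appearance + spatial
```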
• Training -- positive examples with bounding boxes around the objects, and negative examples.
• Learn the root filter using SVM
• Define a fixed number of parts, at locations of high energy in the root filter HoG
• Use these to start the iterative learning
The score of a match can be expressed as the dot-product of a vector β of coefficients, with the image:
Score = β∙ψ
Using the vectors ψ to train an SVM classifier:
β∙ψ > +1 for class examples
β∙ψ < –1 for non-class examples
However, ψ depends on the placement z, that is, the values of ∆xi, ∆yi
We need to take the best ψ over all placements. In their notation: fβ(x) = max over placements z of β∙ψ(x, z).
Classification then uses fβ(x) > 1.
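A one-function sketch of this latent scoring; psi and enumerate_placements are assumed helpers, not part of the paper's code:

```python
def f_score(beta, x, enumerate_placements, psi):
    """f(x) = max over placements z of beta . psi(x, z)."""
    return max(float(beta @ psi(x, z)) for z in enumerate_placements(x))
```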
In analogy to classical SVMs, we would like to train from the labeled training data D = (<x1, y1>, …, <xn, yn>).
Finding β, SVM training: the algorithm optimizes the following objective function:
L(β) = ½|β|² + C ∑ max(0, 1 – yi fβ(xi))
Hard Negatives
The set M of hard negatives for a known β and data set D: these are support vectors (y∙f = 1) or misses (y∙f < 1).
Optimal SVM training does not need all the examples; hard examples are sufficient.
• For a given β, use the positive examples + C hard examples
• Use this data to compute β by standard SVM
• Iterate (with a new set of C hard examples)
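A sketch of that iteration; train_svm and score are assumed helpers, and 'cache' plays the role of the slide's C hard examples per round:

```python
def train_with_hard_negatives(positives, negatives, train_svm, score,
                              rounds=10, cache=1000):
    hard = negatives[:cache]                  # initial negative cache
    beta = None
    for _ in range(rounds):
        beta = train_svm(positives, hard)     # standard SVM on pos + hard negs
        # for negatives (y = -1), support vectors and misses are exactly the
        # highest-scoring examples: y.f <= 1  <=>  score >= -1
        scored = sorted(negatives, key=lambda x: score(beta, x), reverse=True)
        hard = [x for x in scored[:cache] if score(beta, x) >= -1]
    return beta
```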
Comments: relations to Star and to SVM
• Like a Star model, it is a collection of parts, at expected locations, where parts are defined by image patches
• The decision about a part detection is made by an SVM score <F∙H>, where F are the learned coefficients and H is the part's HoG descriptor
• The locations of the parts are learned by so-called Latent SVM. The part location is selected to maximize the SVM score
• The scheme creates a scale pyramid and searches over the best scales.
‘Pascal Challenge’ Airplanes
Obtaining human-level performance?
All images contain at least 1 bike
Bike Recognition
Future challenges:
• Dealing with a very large number of classes – ImageNet: 15,000 categories, 12 million images
• To consider: human-level performance for at least one class