Post on 19-Dec-2015
Object Recognition
Szeliski Chapter 14
Recognition
Simultaneous recognition and segmentation (Shotton, Winn, Rother et al. 2009)
Location recognition (Philbin, Chum, Isard et al. 2007)
Using context (Russell, Torralba, Liu et al. 2007).
Recognition Slides from Rick Szeliski
Recognition problems
What is it?
• Object and scene recognition
Who is it?
• Identity recognition
Where is it?
• Object detection
What are they doing?
• Activities
All of these are classification problems
• Choose one class from a list of possible candidates
What is recognition?
A different taxonomy from [Csurka et al. 2006]:
• Recognition
  • Where is this particular object?
• Categorization
  • What kind of object(s) is (are) present?
• Content-based image retrieval
  • Find me something that looks similar
• Detection
  • Locate all instances of a given class
Readings
• Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition
  Fergus, R., Perona, P. and Zisserman, A. International Journal of Computer Vision, Vol. 71(3), 273-303, March 2007
• MIT course
  • http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm
Sources
• Steve Seitz, CSE 455/576, previous quarters
• Fei-Fei, Fergus, Torralba, CVPR’2007 course
• Efros, CMU 16-721 Learning in Vision
• Freeman, MIT 6.869 Computer Vision: Learning
• Linda Shapiro, CSE 576, Spring 2007
Object Detection
How to recognize each person in this image?
Every possible sub-window?
Effective special-purpose detectors: rapidly find likely regions where particular objects might occur.
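To see why testing every possible sub-window is expensive, the count can be worked out directly. This is a rough sketch only: the 640x480 image, the 24-pixel square base window, and the 1.25 scale step are illustrative assumptions, not values taken from the slides.

```python
def count_windows(width, height, win, stride=1):
    """Number of win x win windows that fit inside a width x height image."""
    if win > width or win > height:
        return 0
    nx = (width - win) // stride + 1
    ny = (height - win) // stride + 1
    return nx * ny


def count_multiscale(width, height, base=24, scale=1.25, stride=1):
    """Total windows over a geometric ladder of window sizes (assumed ladder)."""
    total = 0
    size = float(base)
    while int(size) <= min(width, height):
        total += count_windows(width, height, int(size), stride)
        size *= scale
    return total


if __name__ == "__main__":
    # Single scale, one-pixel stride: 617 * 457 window positions.
    print(count_windows(640, 480, 24))
    # Summing over all scales in the ladder gives a larger total still.
    print(count_multiscale(640, 480))
```

Every one of these windows would need a classifier evaluation, which is the motivation for detectors that discard most candidates cheaply.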
Object Detection
• Face detection
  • More successful
  • Built into
    • digital cameras, to enhance auto-focus
    • video conferencing systems, to control pan-tilt heads
    • …
• General object detection
  • Pedestrians
  • Cars
Face Detection
• Every pixel? Scale?
  • Too slow in practice
• Tutorials
  • http://people.csail.mit.edu/torralba/shortCourseRLOC/
    • General detection and recognition
  • http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html
    • Face recognition
CSE 576, Spring 2008 Face Recognition and Detection 12
Face detection
How to tell if a face is present?
Skin detection
Skin pixels have a distinctive range of colors
• Corresponds to region(s) in RGB color space
Skin classifier
• A pixel X = (R,G,B) is skin if it is in the skin (color) region
• How to find this region?
Skin detection
Learn the skin region from examples
• Manually label skin/non-skin pixels in one or more “training images”
• Plot the training data in RGB space
  – skin pixels shown in orange, non-skin pixels shown in gray
  – some skin pixels may be outside the region, and some non-skin pixels inside
Skin classifier
Given X = (R,G,B): how to determine if it is skin or not?
• Nearest neighbor
  – find labeled pixel closest to X
• Find plane/curve that separates the two classes
  – popular approach: Support Vector Machines (SVM)
• Data modeling
  – fit a probability density/distribution model to each class
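The first option, nearest neighbor, can be sketched in a few lines. The training pixels and labels below are made up for illustration (a real classifier would use many hand-labeled pixels), and squared Euclidean distance in RGB is an assumed choice of metric.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_neighbor_label(x, labeled_pixels):
    """Return the label of the training pixel closest to x.

    labeled_pixels is a list of ((R, G, B), label) pairs.
    """
    _, label = min(labeled_pixels, key=lambda item: sq_dist(x, item[0]))
    return label


# Hypothetical training data, not taken from the slides.
TRAINING = [
    ((224, 172, 150), "skin"),
    ((198, 134, 110), "skin"),
    ((141, 85, 60), "skin"),
    ((30, 90, 200), "not skin"),
    ((40, 160, 60), "not skin"),
    ((128, 128, 128), "not skin"),
]

if __name__ == "__main__":
    print(nearest_neighbor_label((210, 150, 130), TRAINING))
    print(nearest_neighbor_label((35, 100, 190), TRAINING))
```

Nearest neighbor needs no model fitting, but every query scans all labeled pixels, which is one reason the later slides move to density models.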
Probability
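• X is a random variable
• P(X) is the probability that X achieves a certain value
• P(X) is normalized: $\int_{-\infty}^{\infty} P(X)\,dX = 1$ for continuous X, or $\sum_{X} P(X) = 1$ for discrete X
• P(X) is called a PDF
  – probability distribution/density function
  – a 2D PDF is a surface
  – a 3D PDF is a volume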
Probabilistic skin classification
Model PDF / uncertainty
• Each pixel has a probability of being skin or not skin
Skin classifier
• Given X = (R,G,B): how to determine if it is skin or not?
• Choose interpretation of highest probability
Where do we get P(skin | R) and P(~skin | R)?
Learning conditional PDF’s
We can calculate P(R | skin) from a set of training images
• It is simply a histogram over the pixels in the training images
  • each bin Ri contains the proportion of skin pixels with color Ri
• This doesn’t work as well in higher-dimensional spaces. Why not?
Approach: fit parametric PDF functions
• common choice is rotated Gaussian
  – center
  – covariance
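The histogram estimate of P(R | skin) can be sketched as below, using a handful of made-up red-channel values and an assumed 8-bin quantization. The helper `bins_needed` hints at the answer to "Why not?": the number of bins grows exponentially with the dimension, so a fixed amount of training data leaves most bins empty.

```python
from collections import Counter


def conditional_histogram(values, num_bins=8, max_value=256):
    """Estimate P(R | class) from the channel values of one class.

    Returns a list of num_bins proportions that sum to 1.
    """
    if not values:
        raise ValueError("need at least one training value")
    bin_width = max_value // num_bins
    counts = Counter(min(v // bin_width, num_bins - 1) for v in values)
    return [counts.get(b, 0) / len(values) for b in range(num_bins)]


def bins_needed(bins_per_dim, dims):
    """Number of histogram cells for a joint histogram over dims channels."""
    return bins_per_dim ** dims


if __name__ == "__main__":
    skin_red = [200, 210, 190, 220, 180, 205, 150, 230]  # hypothetical samples
    print(conditional_histogram(skin_red))
    print(bins_needed(256, 1), bins_needed(256, 3))
```

A parametric model such as a Gaussian replaces all of those cells with a center and a covariance, which is why it scales better to full RGB.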
Learning conditional PDF’s
We can calculate P(R | skin) from a set of training images
But this isn’t quite what we want
• Why not? How to determine if a pixel is skin?
• We want P(skin | R), not P(R | skin)
• How can we get it?
Bayes rule
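In terms of our problem:
$P(\text{skin} \mid R) = \dfrac{P(R \mid \text{skin})\,P(\text{skin})}{P(R)}$
• P(skin | R): what we want (posterior)
• P(R | skin): what we measure (likelihood)
• P(skin): domain knowledge (prior)
• P(R): normalization term, $P(R) = P(R \mid \text{skin})\,P(\text{skin}) + P(R \mid \sim\text{skin})\,P(\sim\text{skin})$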
What can we use for the prior P(skin)?
• Domain knowledge:
  – P(skin) may be larger if we know the image contains a person
  – For a portrait, P(skin) may be higher for pixels in the center
• Learn the prior from the training set. How?
  – P(skin) is the proportion of skin pixels in the training set
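As a small worked example of Bayes rule for this problem, the sketch below learns the prior as the proportion of skin labels and then turns two likelihood values into a posterior. The function names and the numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def prior_from_labels(labels):
    """P(skin) as the proportion of training pixels labeled 'skin'."""
    if not labels:
        raise ValueError("need at least one label")
    return sum(1 for label in labels if label == "skin") / len(labels)


def posterior_skin(lik_skin, lik_not_skin, prior_skin):
    """P(skin | R) from P(R | skin), P(R | ~skin) and P(skin)."""
    prior_not_skin = 1.0 - prior_skin
    # Normalization term: P(R) = P(R|skin)P(skin) + P(R|~skin)P(~skin)
    evidence = lik_skin * prior_skin + lik_not_skin * prior_not_skin
    if evidence == 0.0:
        raise ValueError("color has zero probability under both classes")
    return lik_skin * prior_skin / evidence


if __name__ == "__main__":
    labels = ["skin"] * 2 + ["not skin"] * 8      # hypothetical training labels
    prior = prior_from_labels(labels)              # 0.2
    print(posterior_skin(0.5, 0.05, prior))        # 0.1 / 0.14
```

Even though skin is rare under this prior, a color that is ten times more likely under the skin model still ends up with a posterior above one half.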
Bayesian estimation
• Goal is to choose the label (skin or ~skin) that maximizes the posterior ↔ minimizes probability of misclassification
  – this is called Maximum A Posteriori (MAP) estimation
[Figure: plots of the likelihood and the unnormalized posterior]
Skin detection results
General classification
This same procedure applies in more general circumstances
• More than two classes
• More than one dimension
Example: face detection
• Here, X is an image region
  – dimension = # pixels
  – each face can be thought of as a point in a high-dimensional space
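The generalization to several classes and multi-dimensional observations can be sketched as a MAP rule: pick the class with the largest likelihood times prior. The class names, the two-dimensional discrete feature, and all probabilities below are invented for illustration, and the small likelihood floor for unseen observations is an assumption of this sketch.

```python
def table_likelihood(table, floor=1e-6):
    """Wrap a lookup table as a function x -> P(x | class).

    Observations missing from the table get a small floor value (assumed).
    """
    return lambda x: table.get(x, floor)


def map_label(x, likelihoods, priors):
    """Return (best_class, scores) where scores[c] = P(x | c) * P(c)."""
    scores = {c: likelihoods[c](x) * priors[c] for c in priors}
    return max(scores, key=scores.get), scores


# Hypothetical three-class problem over a 2D discrete feature.
PRIORS = {"skin": 0.2, "hair": 0.1, "background": 0.7}
LIKELIHOODS = {
    "skin": table_likelihood({("warm", "bright"): 0.6, ("warm", "dark"): 0.3}),
    "hair": table_likelihood({("warm", "dark"): 0.5, ("cool", "dark"): 0.4}),
    "background": table_likelihood(
        {("cool", "bright"): 0.4, ("cool", "dark"): 0.3, ("warm", "bright"): 0.1}
    ),
}

if __name__ == "__main__":
    for x in [("warm", "bright"), ("warm", "dark"), ("cool", "dark")]:
        print(x, map_label(x, LIKELIHOODS, PRIORS)[0])
```

The normalization term is the same for every class, so comparing the unnormalized scores is enough to choose the label.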
H. Schneiderman, T. Kanade. "A Statistical Method for 3D Object Detection Applied to Faces and Cars". CVPR 2000
Face Detection
• Feature-based
  • Distinctive image features: eyes, nose, mouth
  • Geometrical arrangement
• Template-based
  • Active shape model (ASM), active appearance model (AAM)
  • Requires good initialization near a real face
• Appearance-based
  • Search for likely candidates
  • Refine using a cascade of more expensive but more selective detection algorithms
  • Relies heavily on training classifiers using labeled faces
Li Fei-Fei, Princeton
Rob Fergus, MIT
Antonio Torralba, MIT
Recognizing and Learning Object Categories: Year 2007
CVPR 2007 Minneapolis, Short Course, June 17
(see other slide deck)
Today’s lecture
• Known object recognition [Lowe]
• Bag of keypoints [Csurka et al.]
• Location recognition [Schindler et al.]
• Deformable object/category recognition [Fergus et al.]
• Recognition by segmentation
Single object recognition
• Lowe et al. 1999, 2003
• Mahamud and Hebert, 2000
• Ferrari, Tuytelaars, and Van Gool, 2004
• Rothganger, Lazebnik, and Ponce, 2004
• Moreels and Perona, 2005
• …
Planar object recognition [Lowe]
• Use SIFT features• Verify affine
(or homography) geometric alignment
3D object recognition [Lowe]
• Extract object outlines with background subtraction
3D object recognition [Lowe]
• Use 3 matches to recognize
• Use additional matches for verification
• Tolerant to occlusions
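One way to read "3 matches to recognize, additional matches for verification" is with a 2D affine model: it has six unknowns, so three non-collinear point matches determine it, and the remaining matches can be checked against the prediction. The sketch below assumes that affine model and a 2-pixel tolerance; both are assumptions of this example, not details confirmed by the slides.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m @ x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate configuration (collinear points)")
    solution = []
    for col in range(3):
        replaced = [list(row) for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def affine_from_3(src, dst):
    """Affine map ((a, b, tx), (c, d, ty)) taking three src points to dst."""
    m = [(x, y, 1.0) for x, y in src]
    return solve3(m, [p[0] for p in dst]), solve3(m, [p[1] for p in dst])


def apply_affine(affine, p):
    (a, b, tx), (c, d, ty) = affine
    return a * p[0] + b * p[1] + tx, c * p[0] + d * p[1] + ty


def count_inliers(affine, matches, tol=2.0):
    """Count extra (src, dst) matches that agree with the affine map."""
    inliers = 0
    for src, dst in matches:
        px, py = apply_affine(affine, src)
        if (px - dst[0]) ** 2 + (py - dst[1]) ** 2 <= tol ** 2:
            inliers += 1
    return inliers
```

Because only three matches are needed to form a hypothesis, the object can still be found when most of its features are hidden, which fits the "tolerant to occlusions" point.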
Feature-based recognition
How can we scale to millions of objects?
Comparison to all stored objects/features is infeasible.
Answer:
• quantize features into words [Csurka et al. 04]
• use information retrieval (inverted index)
• use metric tree for faster quantization (NN) [Nistér & Stewénius 05]
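The first two steps of this answer can be sketched with toy data: each descriptor is quantized to its nearest visual word, and an inverted index maps each word to the images containing it. The 2D descriptors, the four-word vocabulary, and the image names are made up; real descriptors such as SIFT are 128-dimensional and the vocabulary is learned by clustering.

```python
from collections import Counter, defaultdict


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[i])),
    )


def build_inverted_index(database, vocabulary):
    """Map each visual word to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, descriptors in database.items():
        for d in descriptors:
            index[nearest_word(d, vocabulary)].add(image_id)
    return index


def query(descriptors, index, vocabulary):
    """Rank database images by the number of visual words shared with the query."""
    votes = Counter()
    for word in {nearest_word(d, vocabulary) for d in descriptors}:
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return votes.most_common()


# Hypothetical toy data.
VOCABULARY = [(0, 0), (10, 0), (0, 10), (10, 10)]
DATABASE = {
    "mug": [(1, 0), (9, 1)],
    "book": [(0, 9), (10, 9)],
    "phone": [(9, 0), (1, 9)],
}

if __name__ == "__main__":
    index = build_inverted_index(DATABASE, VOCABULARY)
    print(query([(0, 1), (10, 1)], index, VOCABULARY))
```

Only images that share at least one word with the query are ever touched, which is what makes the lookup cheap compared with matching against every stored feature.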
Part 1: Bag-of-words models
by Li Fei-Fei (Princeton)
(see other slide deck)
How to scale to 10^6s of images?
Make “word” generation even more efficient:
“Vocabulary tree”
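The efficiency gain of a vocabulary tree comes from descending a hierarchy of cluster centers instead of comparing a descriptor against every word. The sketch below uses a tiny hand-built tree with 2D centers; in practice the tree would be built by hierarchical k-means over training descriptors, and the `Node` class and example values here are hypothetical.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


class Node:
    """A tree node: a cluster center plus children, or a leaf with a word id."""

    def __init__(self, center, children=None, word_id=None):
        self.center = center
        self.children = children or []
        self.word_id = word_id


def quantize(descriptor, root):
    """Descend to the nearest leaf; return (word_id, number of comparisons)."""
    node, comparisons = root, 0
    while node.children:
        comparisons += len(node.children)
        node = min(node.children, key=lambda c: sq_dist(descriptor, c.center))
    return node.word_id, comparisons


def flat_vs_tree_cost(branching, depth):
    """Comparisons per descriptor: flat search over all leaves vs. tree descent."""
    return branching ** depth, branching * depth


def example_tree():
    """Hand-built tree with branching factor 2 and depth 2 (four words)."""
    left = Node((2, 2), [Node((1, 1), word_id=0), Node((3, 3), word_id=1)])
    right = Node((8, 8), [Node((7, 7), word_id=2), Node((9, 9), word_id=3)])
    return Node(None, [left, right])


if __name__ == "__main__":
    print(quantize((6.5, 7.5), example_tree()))
    print(flat_vs_tree_cost(10, 6))
```

With a branching factor of 10 and six levels, a descriptor reaches one of a million leaves after 60 distance computations rather than a million.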
Scalable Recognition with a Vocabulary Tree
David Nistér, Henrik Stewénius
Vocabulary Tree
Performance
http://vis.uky.edu/~stewe/ukbench/
Location Recognition
Can we apply this to recognizing your location from a cell-phone photo?
City-Scale Location Recognition
Grant Schindler, Matthew Brown, and Richard Szeliski
CVPR’2007
The Problem
Main idea
Find N-best matches in vocabulary tree
Other ideas
• Use only informative features (ignore trees…)
• Integrate matches with adjacent (streetside) neighbors
Part 2: part-based models
by Rob Fergus (MIT)
(see other slide deck)
Part 4: Combined segmentation and recognition
by Rob Fergus (MIT)
Aim
Given an image and object category, segment the object
Segmentation should (ideally) be
• shaped like the object, e.g. cow-like
• obtained efficiently in an unsupervised manner
• able to handle self-occlusion
[Figure: Cow Image + Object Category Model → Segmentation → Segmented Cow]
Slide from Kumar ‘05
Implicit Shape Model - Leibe and Schiele, 2003
[Figure: Interest Points → Matched Codebook Entries → Probabilistic Voting → Voting Space (continuous) → Backprojection of Maxima → Backprojected Hypotheses → Refined Hypotheses (uniform sampling) → Segmentation]
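The voting stage of this pipeline can be illustrated with a simplified sketch: each matched codebook entry casts weighted votes for where the object center would be, and the strongest accumulation becomes the hypothesis. The slide describes a continuous voting space; the coarse binned accumulator below is a deliberate simplification, and the match positions, offsets, and weights are invented.

```python
from collections import defaultdict


def vote_for_centers(matches, bin_size=10):
    """Accumulate weighted votes for the object center in a coarse grid.

    matches is a list of (x, y, offsets), where offsets is a list of
    (dx, dy, weight) votes stored with the matched codebook entry.
    """
    accumulator = defaultdict(float)
    for x, y, offsets in matches:
        for dx, dy, weight in offsets:
            cx, cy = x + dx, y + dy
            accumulator[(int(cx // bin_size), int(cy // bin_size))] += weight
    return accumulator


def best_hypothesis(accumulator, bin_size=10):
    """Return ((center_x, center_y), score) for the strongest bin."""
    (bx, by), score = max(accumulator.items(), key=lambda kv: kv[1])
    return ((bx + 0.5) * bin_size, (by + 0.5) * bin_size), score


if __name__ == "__main__":
    # Hypothetical matches: interest point position plus stored center offsets.
    matches = [
        (20, 30, [(30, 20, 0.5), (-10, 0, 0.5)]),
        (80, 55, [(-28, -3, 1.0)]),
        (50, 90, [(1, -38, 0.7), (40, 0, 0.3)]),
    ]
    print(best_hypothesis(vote_for_centers(matches)))
```

The features whose votes landed in the winning bin are the ones that would be backprojected to support and segment the hypothesis.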
Other topics: context (scenes)
Antonio Torralba, Contextual Priming for Object Detection, IJCV(53), No. 2, July 2003, pp. 169-191
New work: tiny images
Datasets and object collections
(see other slide deck)
Summary of object recognition
• Known object recognition [Lowe]
• Bag of keypoints [Csurka et al.]
• Location recognition [Schindler et al.]
• Deformable object/category recognition [Fergus et al.]
• Recognition by segmentation
• Context and scenes