A Project on
Generic Object Recognition
-- by Yatharth Saraf
Problem Definition and Background
Recognizing generic class or category of a given object as opposed to recognizing specific, individual objects
• humans are much better at generic recognition, machines are more competitive at specific object recognition
Early work by Marr led to the ‘reconstruction school’
• advocates 3-D reconstruction and modeling before further reasoning of a scene
Current work in object categorization tends to fall in the ‘recognition school’
• work in the 2-D domain, with 2-D image features and descriptors
• e.g. Bag of features approaches, spatial 2-D geometry approaches as in the ‘constellation model’
Applications
Image database annotation and retrieval
Video surveillance
Driver assistance, autonomous robots
Cognitive support for disabled people
Related Work
Discriminative approaches
• SVM, subspace methods
Bag of features
• Representation of objects with point descriptors
Constellation model
• Representations that take into account spatial geometry (2-D) of key points
Assumptions
Images are scale-normalized
Images are clean, i.e. no background clutter/occlusion
• (-) Implies segmentation is necessary as a pre-processing step
• (+) Avoids the problem of exponential search
Outline of the Method (Training)
Detect salient regions in all training images using Kadir-Brady feature detector
Extract X,Y coordinates, scale and 11x11 intensity patches around detected features
Reduce dimensionality of appearance patches from 121 to 16 using PCA
Estimate model parameters
• A single full Gaussian for location; one Gaussian per part
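The patch-extraction and PCA steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (`extract_patch`, `fit_pca`, `project`) are hypothetical, the patch is cropped at a fixed 11x11 size without resampling by the detected scale, borders are handled by edge padding, and the data is random noise standing in for real patches.

```python
import numpy as np

def extract_patch(image, x, y, size=11):
    """Crop a size x size intensity patch centred on column x, row y.

    Assumption: borders are handled by replicating edge pixels.
    """
    half = size // 2
    padded = np.pad(image, half, mode="edge")
    # Original pixel (y, x) sits at (y + half, x + half) after padding,
    # so the centred patch starts at (y, x) in the padded image.
    return padded[y:y + size, x:x + size]

def fit_pca(patches, n_components=16):
    """Fit PCA on flattened patches (one row per patch)."""
    X = np.asarray(patches, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centred data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(patches, mean, components):
    """Map flattened patches to their low-dimensional PCA coefficients."""
    X = np.asarray(patches, dtype=float)
    return (X - mean) @ components.T

# Synthetic stand-in data: 200 random 11x11 patches, flattened to 121 values.
rng = np.random.default_rng(0)
patches = rng.integers(0, 256, size=(200, 11, 11)).reshape(200, -1)
mean, components = fit_pca(patches, n_components=16)
codes = project(patches, mean, components)
print(codes.shape)
```

Each 121-dimensional patch becomes a 16-dimensional appearance vector, which is what the per-part Gaussians would then be fitted to.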
Outline of the Method (Testing)
Extract features of test images in the same manner as in training phase
Use the learnt model to estimate probability of detection
Use Bayes’ Decision Rule to classify
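As a rough illustration of the testing steps, the sketch below fits a full-covariance Gaussian, evaluates its log-density, and applies Bayes' decision rule between two hypotheses. The slides do not say what the competing hypothesis is, so the background Gaussian, the equal priors, the small ridge term, and all function names here are assumptions made only for this example. The data is synthetic.

```python
import numpy as np

def fit_gaussian(X, ridge=1e-6):
    """Estimate mean and full covariance; the ridge (an assumption) keeps it invertible."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
    return mean, cov

def log_gaussian(x, mean, cov):
    """Log-density of a multivariate Gaussian at a single point x."""
    d = mean.size
    diff = np.asarray(x, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def bayes_decide(x, obj_model, bg_model, prior_obj=0.5):
    """Return True if the object class has the larger posterior."""
    log_post_obj = log_gaussian(x, *obj_model) + np.log(prior_obj)
    log_post_bg = log_gaussian(x, *bg_model) + np.log(1.0 - prior_obj)
    return bool(log_post_obj > log_post_bg)

# Synthetic 4-D feature vectors for the two hypothetical classes.
rng = np.random.default_rng(0)
obj_model = fit_gaussian(rng.normal(0.0, 1.0, size=(50, 4)))
bg_model = fit_gaussian(rng.normal(5.0, 1.0, size=(50, 4)))
print(bayes_decide(np.zeros(4), obj_model, bg_model))
print(bayes_decide(np.full(4, 5.0), obj_model, bg_model))
```

Working with log-probabilities avoids numerical underflow when the location and appearance terms are combined into a total score.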
Experiments
Careful tweaking of detector parameters needed
A single set of parameter settings may not be suitable for all categories
Experiments (contd.)
47 clean motorbike images used for training motorbike model
Sorting the extracted patches by X-coordinate helped (as opposed to sorting by saliency)
Appearance model not doing as well
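One possible reading of why sorting by X-coordinate helps: it assigns features to part slots by position, so the same physical part tends to land in the same slot across images, whereas saliency ranks can shuffle between images. The sketch below illustrates this with made-up features. The `Feature` type, the choice to keep the top-N salient features before sorting, and the numbers are all hypothetical.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    x: float
    y: float
    scale: float
    saliency: float

def order_by_saliency(features, n_parts):
    """Keep the n_parts most salient features, most salient first."""
    return sorted(features, key=lambda f: f.saliency, reverse=True)[:n_parts]

def order_by_x(features, n_parts):
    """Keep the n_parts most salient features, then order them left to right."""
    return sorted(order_by_saliency(features, n_parts), key=lambda f: f.x)

def location_vector(ordered):
    """Concatenate (x, y) of the ordered parts into one location vector."""
    return [c for f in ordered for c in (f.x, f.y)]

# Two made-up images with the same layout but different saliency rankings.
image_a = [Feature(10, 50, 5, 0.90), Feature(80, 52, 6, 0.70), Feature(45, 20, 4, 0.80)]
image_b = [Feature(11, 49, 5, 0.60), Feature(79, 53, 6, 0.95), Feature(46, 21, 4, 0.70)]

print(location_vector(order_by_saliency(image_a, 3)))
print(location_vector(order_by_saliency(image_b, 3)))
print(location_vector(order_by_x(image_a, 3)))
print(location_vector(order_by_x(image_b, 3)))
```

With saliency ordering the two location vectors list the parts in opposite order, while X-ordering gives nearly identical vectors, which a single Gaussian over locations can model much more tightly.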
[Figure: Log-probabilities of the 9 test images from location model. Panels: features sorted by saliency; features sorted by X-coordinate.]
[Figure: Image 5 and Image 9.]
[Figure: Appearance log-probabilities of the 9 test images.]
[Figure: Total log-probabilities of the 9 test images. Panels: features sorted by saliency; features sorted by X-coordinate.]
Experiments (contd.)
Using a Mixture of Gaussians for the appearances of parts didn’t make too much difference
• 3 mixture components per part (EM initialized with k-means and sample covariances)
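For readers unfamiliar with the setup, here is a compact sketch of EM for a Gaussian mixture where the first M-step runs on hard k-means assignments, so the initial means and covariances are the per-cluster sample estimates. It is a generic textbook-style implementation under stated assumptions (fixed iteration counts, a small ridge on each covariance, synthetic 2-D data), not the code used in the experiments. All names are hypothetical.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=20, seed=0):
    """Plain k-means; returns a hard cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def log_gauss(X, mean, cov):
    """Row-wise log-density of a full-covariance Gaussian."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def fit_gmm(X, k=3, n_iter=50, ridge=1e-6, seed=0):
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Hard k-means assignments act as the initial responsibilities.
    resp = np.eye(k)[kmeans_labels(X, k, seed=seed)]
    for _ in range(n_iter):
        # M-step: weights, means, and (ridge-regularised) covariances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = []
        for j in range(k):
            diff = X - means[j]
            covs.append((resp[:, j, None] * diff).T @ diff / nk[j] + ridge * np.eye(d))
        # E-step: posterior responsibility of each component for each point.
        log_p = np.stack(
            [np.log(weights[j]) + log_gauss(X, means[j], covs[j]) for j in range(k)],
            axis=1,
        )
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
    return weights, means, np.array(covs), float(log_norm.sum())

# Synthetic 2-D data drawn around three centres.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(60, 2)) for c in (-4.0, 0.0, 4.0)])
weights, means, covs, loglik = fit_gmm(X, k=3)
print(weights, loglik)
```

With only a few dozen training images per part, a 3-component full-covariance mixture in 16 dimensions has many parameters to estimate, which may be one reason such a mixture would add little over a single Gaussian.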
Experiments (contd.)
Levenshtein distances on the appearance patches worked quite nicely
• Each appearance patch is treated as a single character
• Matching cost was computed as a plain sum of squared differences (SSD) between the two patches
• Cost of inserting a gap = matching cost of the patch with a canonical 11x11 patch having uniform intensity of 128.
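The three rules above map directly onto the standard edit-distance dynamic program. The sketch below is one way to write it, assuming each image is represented as an ordered sequence of 11x11 patches. How the sequences were ordered in the experiments is not stated, and the function names are hypothetical.

```python
import numpy as np

# Canonical gap patch: uniform intensity 128, as described above.
GAP_PATCH = np.full((11, 11), 128.0)

def ssd(a, b):
    """Sum of squared differences between two patches."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float((diff ** 2).sum())

def patch_edit_distance(seq_a, seq_b):
    """Levenshtein-style distance where each patch acts as one character."""
    n, m = len(seq_a), len(seq_b)
    gap_a = [ssd(p, GAP_PATCH) for p in seq_a]  # cost of aligning a patch to a gap
    gap_b = [ssd(p, GAP_PATCH) for p in seq_b]
    D = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        D[i, 0] = D[i - 1, 0] + gap_a[i - 1]
    for j in range(1, m + 1):
        D[0, j] = D[0, j - 1] + gap_b[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(
                D[i - 1, j - 1] + ssd(seq_a[i - 1], seq_b[j - 1]),  # match
                D[i - 1, j] + gap_a[i - 1],                          # gap in seq_b
                D[i, j - 1] + gap_b[j - 1],                          # gap in seq_a
            )
    return float(D[n, m])

def flat(value):
    """Toy patch of uniform intensity, used only for this example."""
    return np.full((11, 11), float(value))

seq_a = [flat(0), flat(255)]
seq_b = [flat(0), flat(130), flat(255)]
print(patch_edit_distance(seq_a, seq_a))  # 0.0
print(patch_edit_distance(seq_a, seq_b))  # 484.0
```

In the second call the cheapest alignment matches the two identical patches and aligns the extra patch of intensity 130 to a gap, costing 121 * (130 - 128)^2 = 484.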
Conclusions and Future Work
Strong dependence on feature detector
Appearance model doesn’t seem to be working too well
Levenshtein distances could be more promising
Experiments with more clean training and test data, multiple categories
Exponential search for dealing with clutter and occlusion