Contextless Object Recognition with Shape-enriched SIFT and
Bags of Features
Marcel Tella Amo
Directed by Dr. Matthias Zeppelzauer (TU Wien)
Codirected by Dr. Xavier Giró-i-Nieto (UPC)
Motivation
Object Recognition and Classification
Categories
• Ball
• Airplane
• Chair
• Beaver
• …
Ball Airplane Chair
Shape Information
Texture information
Index
Requirements
State of the Art
Design
Results
Requirements
Design shape features that can be used in an aggregated framework, such as Bag of Words, without the need for matching or alignment.
Requirements State of the Art Design Results
Take a successful method:
Shape Information
SIFT
Analyse the effect of the vocabulary size relative to the size of the shape features.
SIFT
Shape
The proposed features should be at least scale-, rotation- and translation-invariant and, if possible, flip-invariant as well.
Segmentation is needed to encode the shape.
Study the limitations of shape coding when using a state-of-the-art segmentation.
Manual annotations vs Automatic Segmentation
State of the Art
Object candidate algorithms
Multiscale Combinatorial Grouping (MCG)
Arbelaez, P., Pont-Tuset, J., Barron, J. T., Marques, F., Malik, J. (2014). Multiscale Combinatorial Grouping. CVPR.
Candidates are ranked by object plausibility, from high to low.
Shape Context
G. Mori, S. Belongie, and J. Malik. Efficient shape matching using shape contexts. PAMI, 27(11), 2005.
Interest point descriptors: SIFT descriptor
Typically 4×4 divisions × 8 bins per histogram = 128 features
dense SIFT
sparse SIFT
David G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2004), no. 2, 91-110.
Simplified example
Enrichment of SIFT
Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Computer Vision-ECCV 2012 (pp. 430-443). Springer Berlin Heidelberg.
Extra features: relative position + aspect ratio + scale ratio + color space
Extra features: absolute spatial location (X, Y) or angle and distance
Rene Grzeszick, Leonard Rothacker, and Gernot A. Fink, "Bag-of-features representations using spatial visual vocabularies for object classification," in IEEE Intl. Conf. on Image Processing, Melbourne, Australia, 2013.
128-dimensional SIFT descriptor Extra features
Bag of Words
Bags of Words - Pipeline
Training: Get descriptors → Clustering (K-means) → Create histograms → Train model (SVM)
Test: Image → Create histogram → Evaluate (SVM)
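A minimal sketch of the clustering and histogram steps of this pipeline, assuming descriptors are plain lists of floats. The function names, the toy 2-D descriptors and the evenly spaced centroid initialisation are illustrative assumptions, not taken from the slides; SVM training and evaluation are left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(desc, centroids):
    """Index of the visual word (centroid) closest to a descriptor."""
    return min(range(len(centroids)), key=lambda k: sq_dist(desc, centroids[k]))

def kmeans(descriptors, k, iters=20):
    """Plain Lloyd's k-means; returns k centroids (the visual vocabulary).

    Initial centroids are taken at evenly spaced indices, an arbitrary
    simplification that keeps the example deterministic.
    """
    step = len(descriptors) // k
    centroids = [list(descriptors[i * step]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(d, centroids)].append(d)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def bow_histogram(descriptors, centroids):
    """L1-normalised histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(centroids)
    for d in descriptors:
        hist[nearest(d, centroids)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy data: two well-separated groups of 2-D "descriptors".
train = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
         [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
vocab = kmeans(train, k=2)
image_hist = bow_histogram([[0.0, 0.0], [5.0, 5.0], [5.2, 4.9]], vocab)
```

The resulting `image_hist` is the fixed-length vector that would be passed to the classifier, whatever the number of descriptors in the image.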
Design
Why dense SIFT?
Main principle: Combination of dense SIFT and Object Candidates
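One possible reading of this combination, sketched under assumptions: dense SIFT keypoints lie on a regular grid, and only those falling inside an object candidate region are kept. The binary `mask` representation, the grid `step` and both function names are hypothetical.

```python
def dense_keypoints(height, width, step):
    """Regular grid of (row, col) keypoints, as used by dense SIFT."""
    return [(r, c)
            for r in range(step // 2, height, step)
            for c in range(step // 2, width, step)]

def keypoints_in_region(keypoints, mask):
    """Keep only the keypoints that fall inside the candidate region."""
    return [(r, c) for r, c in keypoints if mask[r][c]]

# 8x8 image with a 4x4 square object candidate (rows and columns 2..5).
mask = [[1 if 2 <= r <= 5 and 2 <= c <= 5 else 0 for c in range(8)]
        for r in range(8)]
grid = dense_keypoints(8, 8, step=2)
inside = keypoints_in_region(grid, mask)
```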
Distance to the nearest border (DNB)
Logarithmic distance to the nearest border (LDNB)
LDNB reduces the influence of large distances.
Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Computer Vision-ECCV 2012 (pp. 430-443). Springer Berlin Heidelberg.
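A rough sketch of DNB and LDNB on a binary region mask. The brute-force search over border pixels and the `log(1 + d)` form of the logarithm are assumptions made for illustration; the slides do not specify the exact formula or any normalisation.

```python
import math

def border_pixels(mask):
    """Region pixels with at least one 4-neighbour outside the region."""
    h, w = len(mask), len(mask[0])
    border = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                    border.append((r, c))
                    break
    return border

def dnb(point, border):
    """Euclidean distance from a keypoint to the nearest border pixel."""
    r, c = point
    return min(math.hypot(r - br, c - bc) for br, bc in border)

def ldnb(point, border):
    """Log-compressed DNB; log(1 + d) is an assumed form that maps 0 to 0."""
    return math.log1p(dnb(point, border))

# 7x7 image with a 5x5 square region (rows and columns 1..5).
mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)]
        for r in range(7)]
border = border_pixels(mask)
d_center = dnb((3, 3), border)
ld_center = ldnb((3, 3), border)
```

The logarithm compresses large distances, so keypoints deep inside a big region differ less from each other than under plain DNB.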
Distance and Angle to the nearest border (DANB)
Problem: directions that are very similar in 2D can have very different angle values.
Solution: encode them in two separate features.
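The wrap-around problem can be illustrated with 359° and 1°, which point almost the same way but differ by 358 as raw values. The sketch below uses a (cos, sin) pair as the two features; that particular encoding is an assumption, since the slide only states that two separate features are used.

```python
import math

def angle_features(angle_deg):
    """Encode an angle as (cos, sin) so nearby directions get nearby values."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))

def feature_gap(f1, f2):
    """Euclidean distance between two 2-D feature pairs."""
    return math.hypot(f1[0] - f2[0], f1[1] - f2[1])

raw_gap = abs(359.0 - 1.0)  # large gap between the raw angle values
enc_gap = feature_gap(angle_features(359.0), angle_features(1.0))  # small gap
```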
Rotation Invariant Angle to the nearest border
Distance to the center (DC)
η-Angular Scan (ηAS)
WINNER!
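A minimal sketch of an η-Angular Scan on a binary region mask, assuming that each of the η features is the distance from the keypoint to the region border along one ray, and that a ray stops before leaving the region. The ray-marching scheme, the pixel rounding and the angle convention are illustrative assumptions; how rotation invariance is obtained is not covered here.

```python
import math

def angular_scan(point, mask, n_angles=8, step=1.0):
    """Distances from a keypoint to the region border along n_angles rays.

    Each ray is marched in fixed steps and stops at the last sample that is
    still inside the region, so it never crosses the contour.
    """
    h, w = len(mask), len(mask[0])
    r0, c0 = point
    distances = []
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        # Angle 0 points to the right; angles grow counter-clockwise.
        dr, dc = -math.sin(theta), math.cos(theta)
        d = 0.0
        while True:
            r = int(round(r0 + (d + step) * dr))
            c = int(round(c0 + (d + step) * dc))
            if not (0 <= r < h and 0 <= c < w) or not mask[r][c]:
                break
            d += step
        distances.append(d)
    return distances

# 9x9 image with a 5x5 square region (rows and columns 2..6).
mask = [[1 if 2 <= r <= 6 and 2 <= c <= 6 else 0 for c in range(9)]
        for r in range(9)]
scan = angular_scan((4, 4), mask, n_angles=8)
```

For a keypoint at the centre of a square, the axis-aligned rays come out shorter than the diagonal ones, which is the kind of shape cue the feature is meant to capture.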
Shape Context from a dense SIFT (DSC)
Note: like Shape Context, DSC crosses the contour of the region; ηAS does not.
Rotation Invariant Region Quantization (RIRQ)
Main idea: Get spatial information.
Easily extensible to a pyramid!
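Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on (Vol. 2, pp. 2169-2178). IEEE.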
Achieving flip invariance (RIRQ)
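[Figure: quadrants numbered 1-4 for a region and for its flipped version; a SORT step maps both to the same ordering, e.g. (4, 2) and (2, 4) both become (2, 4).]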
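One way to read the SORT step on this slide, sketched under assumptions: if a horizontal flip swaps the values of the left and right quadrants, sorting each left/right pair yields the same descriptor for a region and its mirror image. The quadrant layout and both function names are hypothetical.

```python
def horizontal_flip(quadrants):
    """Swap left and right quadrants, as a mirrored region would."""
    tl, tr, bl, br = quadrants
    return (tr, tl, br, bl)

def flip_invariant(quadrants):
    """Sort each left/right pair so that mirrored regions agree.

    quadrants = (top_left, top_right, bottom_left, bottom_right) values.
    """
    tl, tr, bl, br = quadrants
    return tuple(sorted((tl, tr))) + tuple(sorted((bl, br)))

original = (0.4, 0.1, 0.2, 0.3)
mirrored = horizontal_flip(original)
same = flip_invariant(original) == flip_invariant(mirrored)
```

The price of the sort is that the descriptor no longer records which side held the larger value.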
Where do we integrate our features? Two main Architectures
1. Enriched SIFT (eSIFT): SIFT + shape features → Visual Vocabulary → Bag of eSIFT visual words
2. BoW+Shape: SIFT → Visual Vocabulary → Bag of Words, combined with a shape histogram
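A compact sketch contrasting the two architectures: shape features appended to each descriptor before quantization (eSIFT), versus an image-level shape histogram appended after quantization (BoW+Shape). The toy 2-D descriptors, the hand-written vocabularies and all function names are assumptions made for illustration.

```python
def nearest_word(desc, vocabulary):
    """Index of the closest visual word (squared Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda k: sum((x - y) ** 2
                                 for x, y in zip(desc, vocabulary[k])))

def histogram_of_words(descs, vocabulary):
    """L1-normalised visual-word histogram."""
    hist = [0.0] * len(vocabulary)
    for d in descs:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def esift_bow(sift_descs, shape_feats, vocabulary):
    """Architecture 1: enrich each descriptor, then quantize."""
    enriched = [s + f for s, f in zip(sift_descs, shape_feats)]
    return histogram_of_words(enriched, vocabulary)

def bow_plus_shape(sift_descs, shape_hist, vocabulary):
    """Architecture 2: quantize plain SIFT, then append the shape histogram."""
    return histogram_of_words(sift_descs, vocabulary) + shape_hist

sift = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.9]]    # stand-ins for 128-D SIFT
shape = [[0.2], [0.8], [0.7]]                  # one shape feature per keypoint
vocab_sift = [[0.0, 0.0], [1.0, 1.0]]
vocab_esift = [[0.0, 0.0, 0.2], [1.0, 1.0, 0.8]]

esift_vec = esift_bow(sift, shape, vocab_esift)
combined_vec = bow_plus_shape(sift, [0.5, 0.5], vocab_sift)
```

In the first case the vocabulary itself is built over enriched descriptors; in the second the vocabulary is unchanged and the final vector simply grows by the length of the shape histogram.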
Creation of the shape histograms (BoW+Shape)
1. Accumulate the same feature for all points.
2. Create a histogram of X bins for that feature.
3. Concatenate the histograms to create the final one.
Example: 8-Angular Scan
8 distances (different angles)
[Figure: accumulation of features over the SIFT keypoints (axis label: # SIFT keypoints)]
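The three steps can be sketched as follows for an 8-Angular Scan, where every keypoint carries 8 distances. The fixed value range `max_value`, the per-histogram normalisation and the function names are assumptions; the slides do not state how bins are delimited.

```python
def feature_histogram(values, n_bins, max_value):
    """Normalised histogram of one feature, with bins over [0, max_value]."""
    hist = [0.0] * n_bins
    for v in values:
        b = min(int(v / max_value * n_bins), n_bins - 1)
        hist[b] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def shape_histogram(per_point_features, n_bins, max_value):
    """Concatenate one histogram per feature, accumulated over all keypoints."""
    n_features = len(per_point_features[0])
    final = []
    for j in range(n_features):
        column = [p[j] for p in per_point_features]             # step 1
        final += feature_histogram(column, n_bins, max_value)   # steps 2 and 3
    return final

# Three keypoints, each with 8 distances (one per scan angle).
points = [[1.0] * 8, [4.0] * 8, [9.0] * 8]
shape_hist = shape_histogram(points, n_bins=4, max_value=10.0)
```

With 8 features and 4 bins each, the final shape histogram has 32 values regardless of how many keypoints the region contains.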
Results and conclusions
The dataset: Caltech-101
• Well-recognized dataset
• 101 different categories of images
• Ground truth annotations available
• From 40 to 800 images per category
Metrics: Accuracy (%)
$$\text{Accuracy} = 100 \times \frac{\text{Correct classifications}}{\text{Correct} + \text{Incorrect classifications}}$$
Experiments setup
• 30 images per category in train and 30-50 in test.
• 101 categories + background category.
• Different vocabulary sizes on the X axis.
• Accuracy (%) on the Y axis.
Experiments and analysis:
• eSIFT
• BoW+S
• eSIFT vs BoW+S
• Performance achieved
• Comparison between adding features before or after quantization
• Number of bins per histogram
• Ground truth vs MCG Object Candidates
• Context vs Shape
Results enriched SIFT
Results BoW+S
Performance achieved
Conclusion
With Angular Scan, accuracy increases from 16% to around 41%.
Comparison between adding features before and after quantization
Conclusion
With Angular Scan, if the number of shape features is high, both architectures tend to converge.
Number of bins per histogram
Conclusion
With Angular Scan, 8 bins per histogram give the best performance.
Ground truth vs MCG Object Candidates
Conclusion 1
Larger vocabulary sizes make the approach more robust to segmentation errors.
Conclusion 2
Shape-based methods are more sensitive to segmentation errors than texture-based methods.
Context gain vs Shape gain
Conclusion
Encoding the shape gives better performance than encoding the context of the image.
Object
Context
Future Work
Comparison between our work and Second Order Pooling
PhD thesis of Carles Ventura
Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Computer Vision-ECCV 2012 (pp. 430-443). Springer Berlin Heidelberg.
Future Work
Distance to the nearest border (DNB)
Conclusions
1. With Angular Scan, accuracy increases from 16% to around 41%.
2. With Angular Scan, if the number of shape features is high, both architectures tend to converge.
3. With Angular Scan, 8 bins per histogram give the best performance.
4. Larger vocabulary sizes make the approach more robust to segmentation errors.
5. Shape-based methods are more sensitive to segmentation errors than texture-based methods.
6. Encoding the shape gives better performance than encoding the context of the image.
Thank you! Questions?