icme 2013
![Page 1: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/1.jpg)
IEEE International Conference on Multimedia & Expo 2013
Augmenting Descriptors for Fine-grained Visual Categorization Using Polynomial Embedding
Hideki Nakayama
Graduate School of Information Science and Technology, The University of Tokyo
![Page 2: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/2.jpg)
Outline
Background and Motivation
Our solution
  Polynomial embedding
Experiment
  Fine-grained categorization
  Comparison with state-of-the-art
Conclusion
![Page 3: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/3.jpg)
Fine-grained visual categorization (FGVC)
Distinguish hundreds of fine-grained objects under a certain domain (e.g., species of animals and plants)
Complement to traditional object recognition problems
Generic object recognition, Caltech-256 [Griffin et al., 2007]: Airplane, Monitor, Dog
vs.
FGVC, Caltech-Bird-200 [Welinder et al., 2010]: Yellow Warbler, Prairie Warbler, Pine Warbler
![Page 4: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/4.jpg)
Motivation
We need highly discriminative features to distinguish visually very similar categories
Especially at the local level.
![Page 5: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/5.jpg)
Two basic ideas
1. Co-occurrence (correlation) of neighboring local descriptors
Shapelet [Sabzmeydani et al., 2007]
Covariance feature [Tuzel et al., 2006]
GGV [Harada et al., 2012]
☺ Expected to capture middle-level local information
☹ Results in high-dimensional local features
2. State-of-the-art bag-of-words representation
Based on higher-order statistics of local features
Fisher vector [Perronnin et al., 2010]: 2ND dimensions
VLAD [Jegou et al., 2010]: ND dimensions
(N: number of visual words, D: size of local features)
☺ Remarkably high performance, enables linear classification
☹ Dimensionality increases linearly with the size of local features
The two ideas conflict: co-occurrence features enlarge D, while the bag-of-words encodings need D to stay small.
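To make the conflict concrete, the short sketch below evaluates the ND and 2ND formulas from this slide. The function names are illustrative; N = 64 and the descriptor sizes 64, 2,144 and 18,528 are the values that appear elsewhere in these slides.

```python
def vlad_dim(n_words: int, d: int) -> int:
    """VLAD keeps one D-dimensional residual sum per visual word: N * D."""
    return n_words * d


def fisher_dim(n_words: int, d: int) -> int:
    """Fisher vector keeps mean and variance gradients per visual word: 2 * N * D."""
    return 2 * n_words * d


N = 64  # number of visual words (Gaussians)
for d in (64, 2144, 18528):
    # d = 64: compressed descriptor; 2144 and 18528: raw polynomial vectors
    print(f"D={d:>6}  VLAD={vlad_dim(N, d):>9}  Fisher={fisher_dim(N, d):>9}")
```

With D = 64 the Fisher vector has 8,192 dimensions per region, whereas encoding an 18,528-dimensional polynomial vector directly would need more than two million.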
![Page 6: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/6.jpg)
Our approach
Compress polynomials of neighboring local feature vectors with supervised dimensionality reduction
Discriminative latent descriptor
Encode by means of bag-of-words (Fisher vector)
Logistic regression classifier
Pipeline: descriptor (e.g. SIFT), dense sampling → polynomial vectors (1,000~10,000 dim) → CCA (trained with category labels) → latent descriptor (64 dim) → Fisher vector → logistic regression classifier
![Page 7: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/7.jpg)
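Exploit co-occurrence information
Polynomial vector built from the descriptor at the target position (e.g. SIFT) and its left and right neighbors:

$$
\mathbf{p}^{(2)}_{(x,y)} =
\begin{pmatrix}
\mathbf{v}_{(x,y)} \\
\mathrm{upperVec}\left(\mathbf{v}_{(x,y)} \mathbf{v}_{(x,y)}^{T}\right) \\
\mathrm{Vec}\left(\mathbf{v}_{(x,y)} \mathbf{v}_{(x-\delta,y)}^{T}\right) \\
\mathrm{Vec}\left(\mathbf{v}_{(x,y)} \mathbf{v}_{(x+\delta,y)}^{T}\right)
\end{pmatrix}
$$

$\mathbf{v}_{(x,y)}$: descriptor at the target position; $\mathbf{v}_{(x-\delta,y)}$: left neighbor; $\mathbf{v}_{(x+\delta,y)}$: right neighbor
$\mathrm{Vec}(\cdot)$: flattened vector of a matrix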
![Page 8: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/8.jpg)
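Exploit co-occurrence information
More spatial information can be integrated with more neighbors (but the vector becomes high-dimensional)

0-neighbor (2,144 dim):

$$
\mathbf{p}^{(0)}_{(x,y)} =
\begin{pmatrix}
\mathbf{v}_{(x,y)} \\
\mathrm{upperVec}\left(\mathbf{v}_{(x,y)} \mathbf{v}_{(x,y)}^{T}\right)
\end{pmatrix}
$$

2-neighbors (10,336 dim):

$$
\mathbf{p}^{(2)}_{(x,y)} =
\begin{pmatrix}
\mathbf{v}_{(x,y)} \\
\mathrm{upperVec}\left(\mathbf{v}_{(x,y)} \mathbf{v}_{(x,y)}^{T}\right) \\
\mathrm{Vec}\left(\mathbf{v}_{(x,y)} \mathbf{v}_{(x-\delta,y)}^{T}\right) \\
\mathrm{Vec}\left(\mathbf{v}_{(x,y)} \mathbf{v}_{(x+\delta,y)}^{T}\right)
\end{pmatrix}
$$

4-neighbors (18,528 dim):

$$
\mathbf{p}^{(4)}_{(x,y)} =
\begin{pmatrix}
\mathbf{v}_{(x,y)} \\
\mathrm{upperVec}\left(\mathbf{v}_{(x,y)} \mathbf{v}_{(x,y)}^{T}\right) \\
\mathrm{Vec}\left(\mathbf{v}_{(x,y)} \mathbf{v}_{(x-\delta,y)}^{T}\right) \\
\mathrm{Vec}\left(\mathbf{v}_{(x,y)} \mathbf{v}_{(x+\delta,y)}^{T}\right) \\
\mathrm{Vec}\left(\mathbf{v}_{(x,y)} \mathbf{v}_{(x,y-\delta)}^{T}\right) \\
\mathrm{Vec}\left(\mathbf{v}_{(x,y)} \mathbf{v}_{(x,y+\delta)}^{T}\right)
\end{pmatrix}
$$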
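The dimensionalities quoted on this slide can be reproduced with a small NumPy sketch. It assumes 64-dimensional input descriptors (64 + 64·65/2 = 2,144 matches the 0-neighbor case) and takes upperVec to mean the upper triangle including the diagonal. The function names, the `grid[y, x]` layout and the neighbor ordering are choices of this sketch rather than details given in the slides.

```python
import numpy as np


def upper_vec(m: np.ndarray) -> np.ndarray:
    """Flatten the upper triangle (diagonal included) of a square matrix."""
    rows, cols = np.triu_indices(m.shape[0])
    return m[rows, cols]


def polynomial_vector(grid, x, y, delta, n_neighbors):
    """Polynomial vector at (x, y) for a dense descriptor grid of shape (H, W, D).

    n_neighbors: 0, 2 (left/right) or 4 (left/right/up/down).
    """
    if n_neighbors not in (0, 2, 4):
        raise ValueError("n_neighbors must be 0, 2 or 4")
    offsets = [(-delta, 0), (delta, 0), (0, -delta), (0, delta)][:n_neighbors]
    v = grid[y, x]
    # First- and second-order terms of the target descriptor itself
    parts = [v, upper_vec(np.outer(v, v))]
    # Cross terms between the target descriptor and each neighbor
    for dx, dy in offsets:
        parts.append(np.outer(v, grid[y + dy, x + dx]).ravel())
    return np.concatenate(parts)


rng = np.random.default_rng(0)
grid = rng.standard_normal((9, 9, 64))  # toy 9x9 grid of 64-dim descriptors
for k in (0, 2, 4):
    print(k, polynomial_vector(grid, 4, 4, 2, k).shape[0])
# prints 2144, 10336 and 18528 for 0, 2 and 4 neighbors
```

Each additional neighbor contributes a full 64 × 64 = 4,096 cross terms, which is why the vector grows so quickly.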
![Page 9: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/9.jpg)
Pipeline: descriptor (e.g. SIFT), dense sampling → polynomial vectors → CCA (trained with category labels) → latent descriptor → Fisher vector → logistic regression classifier
![Page 10: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/10.jpg)
Supervised dimensionality reduction
Patch feature and label pairs
Category label: binary occurrence vector
Strong supervision assumption
Most patches should be related to the content (category)
(Somewhat) justified for FGVC considering the applications
Users will target the object, and can sometimes give a segmentation
(We do not perform manual segmentation in this work, though)
Figure: every patch of an example image (Allium triquetrum) is paired with the same binary label vector of its category.
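Under the strong supervision assumption, every patch simply inherits the label of its image. A minimal sketch of building the per-patch binary occurrence vectors follows; the function name and argument layout are hypothetical.

```python
import numpy as np


def patch_label_matrix(image_labels, patches_per_image, n_categories):
    """Return an (n_patches, n_categories) binary matrix.

    image_labels[i] is the category index of image i and
    patches_per_image[i] is the number of patches sampled from it.
    """
    # Repeat each image label once per patch of that image
    labels = np.repeat(np.asarray(image_labels), patches_per_image)
    onehot = np.zeros((labels.size, n_categories))
    onehot[np.arange(labels.size), labels] = 1.0
    return onehot


# Two images: category 2 with three patches, category 0 with two patches
L = patch_label_matrix([2, 0], [3, 2], n_categories=4)
print(L)
```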
![Page 11: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/11.jpg)
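Canonical Correlation Analysis (CCA) [Hotelling, 1936]
Supervised dimensionality reduction
$\mathbf{p}$: patch feature (polynomials), $\mathbf{l}$: label feature
CCA finds linear transformations $\mathbf{s} = A^{T}(\mathbf{p} - \bar{\mathbf{p}})$ and $\mathbf{t} = B^{T}(\mathbf{l} - \bar{\mathbf{l}})$ that maximize the correlation between $\mathbf{s}$ and $\mathbf{t}$.

$$
C_{pl} C_{ll}^{-1} C_{lp} A = C_{pp} A \Lambda^{2} \quad \left(A^{T} C_{pp} A = I\right)
$$

$$
C_{lp} C_{pp}^{-1} C_{pl} B = C_{ll} B \Lambda^{2} \quad \left(B^{T} C_{ll} B = I\right)
$$

$C$: covariance matrices, $\Lambda$: canonical correlations
Figure: the image feature $\mathbf{p}$ and the label feature $\mathbf{l}$ are projected to $\mathbf{s}$ and $\mathbf{t}$ in the canonical space.
Latent descriptor: $\mathbf{s} = A^{T}(\mathbf{p} - \bar{\mathbf{p}})$, from 1,000~10,000 dim down to 64 dim (discriminative)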
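One way to solve the CCA eigenproblem above is to whiten both covariance matrices and take an SVD of the whitened cross-covariance, as in the NumPy sketch below. The small ridge term `reg` is an assumption added for numerical stability (one-hot labels make the label covariance singular) and is not mentioned in the slides; the function name and the toy data are likewise illustrative.

```python
import numpy as np


def fit_cca(P, L, n_components, reg=1e-6):
    """P: (n, dp) patch features, L: (n, dl) label vectors.

    Returns the feature mean, the projection A (dp, n_components)
    and the canonical correlations.
    """
    n = P.shape[0]
    p_mean, l_mean = P.mean(axis=0), L.mean(axis=0)
    Pc, Lc = P - p_mean, L - l_mean
    Cpp = Pc.T @ Pc / n + reg * np.eye(P.shape[1])
    Cll = Lc.T @ Lc / n + reg * np.eye(L.shape[1])
    Cpl = Pc.T @ Lc / n
    # Cholesky factors: Cpp = Rp Rp^T and Cll = Rl Rl^T
    Rp = np.linalg.cholesky(Cpp)
    Rl = np.linalg.cholesky(Cll)
    # Whitened cross-covariance M = Rp^{-1} Cpl Rl^{-T}
    M = np.linalg.solve(Rp, Cpl) @ np.linalg.inv(Rl).T
    U, corr, _ = np.linalg.svd(M, full_matrices=False)
    # A = Rp^{-T} U satisfies A^T Cpp A = I
    A = np.linalg.solve(Rp.T, U[:, :n_components])
    return p_mean, A, corr[:n_components]


# Toy data: 4 categories, 10-dim features whose first 4 dims depend on the label
rng = np.random.default_rng(0)
y = rng.integers(0, 4, size=500)
L = np.eye(4)[y]
P = rng.standard_normal((500, 10))
P[:, :4] += 3.0 * L
p_mean, A, corr = fit_cca(P, L, n_components=3)
latent = (P - p_mean) @ A  # s = A^T (p - p_mean) for every patch
print(latent.shape, corr)
```

With one-hot labels over C categories there are at most C − 1 informative canonical directions, so `n_components` should not exceed that.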
![Page 12: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/12.jpg)
Pipeline: descriptor (e.g. SIFT), dense sampling → polynomial vectors → CCA (trained with category labels) → latent descriptor → Fisher vector → logistic regression classifier
![Page 13: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/13.jpg)
Fisher Vector [Perronnin et al., 2010]
State-of-the-art bag-of-words encoding method using higher-order statistics of descriptors (mean and variance)
http://www.image-net.org/challenges/LSVRC/2010/ILSVRC2010_XRCE.pdf
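A minimal NumPy sketch of Fisher vector encoding with a diagonal-covariance GMM is given below, using the commonly cited gradients with respect to the means and variances. It assumes the GMM parameters were fitted beforehand and omits the power and L2 normalization that is often applied afterwards; the function name and argument layout are illustrative, not taken from the slides.

```python
import numpy as np


def fisher_vector(X, weights, means, variances):
    """X: (T, D) descriptors of one image.

    Diagonal GMM with N components: weights (N,), means (N, D), variances (N, D).
    Returns a vector of length 2 * N * D.
    """
    T = X.shape[0]
    # Standardized differences to every Gaussian: shape (T, N, D)
    diff = (X[:, None, :] - means[None]) / np.sqrt(variances)[None]
    # Log of weight * Gaussian density, then soft assignments gamma: (T, N)
    log_prob = -0.5 * (np.sum(diff ** 2, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_prob = log_prob + np.log(weights)
    log_prob -= log_prob.max(axis=1, keepdims=True)
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Gradients with respect to means and variances, each of shape (N, D)
    g_mean = np.einsum("tn,tnd->nd", gamma, diff) / (T * np.sqrt(weights)[:, None])
    g_var = np.einsum("tn,tnd->nd", gamma, diff ** 2 - 1.0) / (T * np.sqrt(2 * weights)[:, None])
    return np.concatenate([g_mean.ravel(), g_var.ravel()])


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))            # 50 toy descriptors of dimension 8
w = np.full(4, 0.25)                        # 4 equally weighted Gaussians
mu = rng.standard_normal((4, 8))
var = np.ones((4, 8))
print(fisher_vector(X, w, mu, var).shape)   # (64,) = 2 * 4 * 8
```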
![Page 14: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/14.jpg)
Experiments
![Page 15: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/15.jpg)
Experimental setup
FGVC datasets: Oxford-Flowers-102 (Flowers), Caltech-Bird-200 (Birds)
Descriptors: SIFT, C-SIFT, Opponent-SIFT, Self Similarity; compressed into 64 dim using several methods
Fisher vector: 64 Gaussians (visual words); global + 3 horizontal spatial regions
Classifier: logistic regression
Evaluation: mean classification accuracy
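The "global + 3 horizontal spatial regions" layout can be sketched as follows. This reads the regions as three equal-height horizontal bands stacked from top to bottom, which is an assumption; `spatial_encode` and the `encode` callable are hypothetical names, and `encode` stands in for whatever per-region encoder is used (a Fisher vector in the slides).

```python
import numpy as np


def spatial_encode(X, ys, height, encode, n_bands=3):
    """Concatenate encode(all patches) with encode(patches of each band).

    X: (T, D) descriptors, ys: (T,) integer row coordinates of the patches,
    height: image height in the same units as ys.
    """
    ys = np.asarray(ys)
    # Band index 0..n_bands-1 from the vertical position of each patch
    band = np.minimum(ys * n_bands // height, n_bands - 1)
    parts = [encode(X)]
    for b in range(n_bands):
        parts.append(encode(X[band == b]))
    return np.concatenate(parts)


# Toy example with a sum-pooling encoder in place of a Fisher vector
X = np.ones((6, 2))
ys = np.array([0, 1, 2, 3, 4, 5])
print(spatial_encode(X, ys, height=6, encode=lambda Z: Z.sum(axis=0)))
```

With 64 Gaussians and 64-dimensional latent descriptors, four regions give 4 × 2 × 64 × 64 = 32,768 dimensions per descriptor type. A real encoder would also need a rule for bands that contain no patches.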
![Page 16: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/16.jpg)
Results: comparison with PCA and CCA
Our method substantially improves performance for all descriptors
Just applying CCA to concatenated neighbors does not improve performance
Polynomial embedding makes sense (non-linear convolution)
Figure: classification performance (%) with different embedding methods (all 64 dim) on the Flower and Bird datasets, for SIFT, C-SIFT, Opp.-SIFT and SSIM. Bars: PCA (baseline), CCA (4-neighbors, without polynomials), PolEmb (4-neighbors, ours).
![Page 17: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/17.jpg)
Results: number of neighbors
Including more neighbors improves performance
Figure: classification performance (%) of our method with different numbers of neighbors (0, 2 and 4) on the Flower and Bird datasets, for SIFT, C-SIFT, Opp.-SIFT and SSIM.
![Page 18: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/18.jpg)
Comparison with other work
![Page 19: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/19.jpg)
Our final system
Combine four descriptors in a late-fusion approach (SIFT, C-SIFT, Opp.-SIFT, SSIM)
Sum of log-likelihoods output by each classifier (weighted by its individual confidence)
Figure: each of the descriptors 1 to K (e.g. SIFT) goes through the same pipeline (dense sampling → polynomial vectors → CCA trained with category labels → latent descriptor → Fisher vector → logistic regression classifier), and the classifier outputs are summed to predict the category (e.g. Allium triquetrum).
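The late-fusion rule can be sketched as a weighted sum of per-classifier log-likelihoods. The slides do not say how the confidence weights are obtained, so this sketch takes them as given inputs; the function name and the toy numbers are illustrative.

```python
import numpy as np


def late_fusion(log_likelihoods, weights):
    """Fuse K classifiers.

    log_likelihoods: list of K arrays of shape (n_images, n_categories).
    weights: K non-negative scalars, one confidence weight per classifier.
    Returns the fused scores and the predicted category index per image.
    """
    fused = sum(w * ll for w, ll in zip(weights, log_likelihoods))
    return fused, fused.argmax(axis=1)


# Two toy classifiers over two images and two categories
ll_a = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))
ll_b = np.log(np.array([[0.2, 0.8], [0.1, 0.9]]))
scores, pred = late_fusion([ll_a, ll_b], weights=[1.0, 0.5])
print(pred)  # [0 1]
```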
![Page 20: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/20.jpg)
Comparison on FGVC datasets
Our method outperforms previous work on bird and flower datasets
For the bird dataset, [32] uses the bounding box only for training images, therefore the result is not directly comparable to ours.
Table: mean classification accuracy (%) of previous work and of our system with PCA (← baseline), PolEmb, and PCA+PolEmb descriptors.
![Page 21: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/21.jpg)
ImageCLEF 2013 Plant Identification
Organs: Flower, Fruit, Leaf, Stem, Entire
Example species: Kaki persimmon, Silver birch, Boxelder maple
Identify 250 plant species from different organs (Leaf, Flower, Fruit, etc.)
Took 1st place in the Natural Background task, and in 4 of 5 subtasks.
(Coming in Sept., 2013.)
![Page 22: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/22.jpg)
Conclusion
A simple but effective method for FGVC
Embedding co-occurrence patterns of neighboring descriptors
Obtain a discriminative and low-dimensional latent descriptor to use together with the Fisher vector
Polynomial embedding greatly improves the performance, indicating the importance of non-linearity
Patch-level strong supervision approximation
Not always perfect but reasonable for FGVC problems
Future work
Theoretical analysis (probabilistic interpretation)
Multiple instance dimensionality reduction
![Page 23: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/23.jpg)
Thank you!
Any questions?
![Page 24: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/24.jpg)
Object and scene categorization
Caltech-101 (Object dataset)
MIT-Indoor-67 (Scene dataset)
![Page 25: ICME 2013](https://reader033.vdocuments.us/reader033/viewer/2022051514/54b6b8194a79595f598b45cb/html5/thumbnails/25.jpg)
Results: Object and scene categorization
Our method seems less effective here than on FGVC problems
Combining the PCA feature with our feature consistently improves performance
Figure: mean classification accuracy (%) on Caltech-101 and MIT-Indoor-67 for SIFT, C-SIFT and Opp.-SIFT.