Sparse Kernel Learning for Image Annotation
Sean Moran and Victor Lavrenko
Institute of Language, Cognition and Computation
School of Informatics
University of Edinburgh
ICMR’14 Glasgow, April 2014
Sparse Kernel Learning for Image Annotation
Overview
SKL-CRM
Evaluation
Conclusion
Assigning words to pictures

[Figure: annotation pipeline. Training images annotated with keyword sets
(Tiger, Grass, Whiskers; City, Castle, Smoke; Tiger, Tree, Leaves; Eagle, Sky)
pass through feature extraction (GIST, SIFT, LAB, HAAR). For a testing image,
the annotation model scores every vocabulary word, e.g. P(Tiger|image) = 0.15,
P(Grass|image) = 0.12, P(Whiskers|image) = 0.12, P(Leaves|image) = 0.10,
P(Tree|image) = 0.10, P(Sky|image) = 0.08, P(Waterfall|image) = 0.05,
P(City|image) = 0.03, P(Castle|image) = 0.03, P(Eagle|image) = 0.02,
P(Smoke|image) = 0.01, yielding a ranked list of words
(Tiger, Grass, Tree, Leaves, Whiskers); the top 5 words form the annotation.]

This talk: how best to combine multiple features?
Previous work
- Topic models: latent Dirichlet allocation (LDA) [Barnard et al. '03], Machine Translation [Duygulu et al. '02]
- Mixture models: Continuous Relevance Model (CRM) [Lavrenko et al. '03], Multiple Bernoulli Relevance Model (MBRM) [Feng et al. '04]
- Discriminative models: Support Vector Machine (SVM) [Verma and Jawahar '13], Passive Aggressive Classifier [Grangier '08]
- Local learning models: Joint Equal Contribution (JEC) [Makadia et al. '08], Tag Propagation (Tagprop) [Guillaumin et al. '09], Two-pass KNN (2PKNN) [Verma et al. '12]
Combining different feature types
- Previous work: linear combination of feature distances in a weighted summation with "default" kernels

[Plot: generalised Gaussian kernels GG(x; p) for p = 1 (Laplacian), p = 2 (Gaussian) and p = 15 (approximately uniform)]

- Standard kernel assignment: Gaussian for GIST, Laplacian for colour features, χ2 for SIFT
Data-adaptive visual kernels

- Our contribution: permit the visual kernels themselves to adapt to the data

[Plots: generalised Gaussian kernels GG(x; p) for p = 1 (Laplacian), p = 2 (Gaussian) and p = 15 (approximately uniform), shown for Corel 5K and for IAPR TC12]

- Hypothesis: the optimal kernels for GIST, SIFT etc. depend on the image dataset itself
Sparse Kernel Continuous Relevance Model (SKL-CRM)
Continuous Relevance Model (CRM)
- CRM estimates the joint distribution of image features (f) and words (w) [Lavrenko et al. '03]:

  P(w, f) = \sum_{J \in T} P(J) \prod_{j=1}^{N} P(w_j | J) \prod_{i=1}^{M} P(\vec{f}_i | J)

- P(J): uniform prior over training images J
- P(\vec{f}_i | J): Gaussian non-parametric kernel density estimate
- P(w_j | J): multinomial with word smoothing
- Estimate the marginal probability distribution over individual tags:

  P(w | f) = \frac{P(w, f)}{\sum_w P(w, f)}

- The top (e.g. 5) words with highest P(w | f) are used as the annotation
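The CRM scoring above can be sketched in Python. This is a minimal illustration, not the paper's code: the data layout, the single-bandwidth Gaussian kernel density and the simple mixture smoothing for P(w|J) are all assumptions.

```python
import math

def crm_annotate(test_feats, train_set, vocab, beta=1.0, mu=0.5, top_k=5):
    """Sketch of CRM annotation.
    train_set: list of (region_feature_vectors, word_set) per training image J.
    Gaussian kernel density for P(f|J); smoothed multinomial for P(w|J)."""
    scores = {w: 0.0 for w in vocab}
    for feats_J, words_J in train_set:
        # P(f|J): product over test regions of averaged Gaussian kernels
        log_pf = 0.0
        for fi in test_feats:
            dens = sum(math.exp(-sum((a - b) ** 2 for a, b in zip(fi, fj)) / beta)
                       for fj in feats_J) / len(feats_J)
            log_pf += math.log(dens + 1e-300)
        pf = math.exp(log_pf)
        for w in vocab:
            # P(w|J): caption words mixed with a uniform background (assumed scheme)
            pw = mu * (1.0 if w in words_J else 0.0) + (1 - mu) / len(vocab)
            scores[w] += pw * pf / len(train_set)  # uniform prior P(J)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

The top-k words by marginal score then serve as the annotation.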
Sparse Kernel Learning CRM (SKL-CRM)
- Introduce a binary kernel-feature alignment matrix Ψ_{u,v}:

  P(I | J) = \prod_{i=1}^{M} \sum_{j=1}^{R} \exp\left\{ -\frac{1}{\beta} \sum_{u,v} \Psi_{u,v} \, k_v(\vec{f}_i^u, \vec{f}_j^u) \right\}

- k_v(\vec{f}_i^u, \vec{f}_j^u): v-th kernel function on the u-th feature type
- β: kernel bandwidth parameter
- Goal: learn Ψ_{u,v} by directly maximising the annotation F1 score on a held-out validation dataset
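The likelihood P(I|J) can be sketched directly from the formula; the dict-based data layout and kernel registry below are illustrative assumptions. Since Ψ is binary with one kernel per feature type, it is represented here as a feature-to-kernel mapping:

```python
import math

def skl_image_likelihood(test_feats, train_feats, psi, kernels, beta=1.0):
    """P(I|J) = prod_i sum_j exp(-(1/beta) * sum_{u,v} Psi[u,v] * k_v(f_i^u, f_j^u)).
    test_feats[u][i] / train_feats[u][j]: region i / j of feature type u.
    psi: dict mapping feature type u -> chosen kernel name v (binary alignment).
    kernels: dict mapping kernel name -> distance function (assumed registry)."""
    M = len(next(iter(test_feats.values())))   # regions in the test image
    R = len(next(iter(train_feats.values())))  # regions in the training image
    log_p = 0.0
    for i in range(M):
        s = 0.0
        for j in range(R):
            d = sum(kernels[v](test_feats[u][i], train_feats[u][j])
                    for u, v in psi.items())
            s += math.exp(-d / beta)
        log_p += math.log(s + 1e-300)
    return math.exp(log_p)
```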
Generalised Gaussian Kernel

- Shape factor p traces out an infinite family of kernels:

  P(\vec{f}_i | \vec{f}_j) = \frac{p^{1-1/p}}{2\beta\Gamma(1/p)} \exp\left[ -\frac{1}{p} \frac{|\vec{f}_i - \vec{f}_j|^p}{\beta^p} \right]

- Γ: Gamma function
- β: kernel bandwidth parameter

[Plots: GG(x; p) for p = 2 (Gaussian), p = 1 (Laplacian) and p = 15 (approximately uniform)]
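The generalised Gaussian density translates directly to code (scalar case shown; for vectors one would take the product over dimensions). Setting p = 2 recovers the Gaussian and p = 1 the Laplacian:

```python
import math

def gg_kernel(fi, fj, beta=1.0, p=2.0):
    """Generalised Gaussian kernel from the slide:
    P(fi|fj) = p^(1-1/p) / (2*beta*Gamma(1/p)) * exp(-(1/p) * |fi-fj|^p / beta^p)."""
    norm = p ** (1.0 - 1.0 / p) / (2.0 * beta * math.gamma(1.0 / p))
    return norm * math.exp(-abs(fi - fj) ** p / (p * beta ** p))
```

For p = 2 this reduces to the N(fj, β²) density, and for p = 1 to a Laplacian with scale β; large p flattens towards a uniform plateau.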
Multinomial Kernel
- Multinomial kernel optimised for count-based features:

  P(\vec{f}_i | \vec{f}_j) = \frac{(\sum_d f_{i,d})!}{\prod_d f_{i,d}!} \prod_d (p_{j,d})^{f_{i,d}}

- f_{i,d}: count for bin d in the unlabelled image i
- f_{j,d}: count for bin d in the training image j
- Jelinek-Mercer smoothing used to estimate p_{j,d}:

  p_{j,d} = \lambda \frac{f_{j,d}}{\sum_d f_{j,d}} + (1 - \lambda) \frac{\sum_j f_{j,d}}{\sum_{j,d} f_{j,d}}

- We also consider the standard χ2 and Hellinger kernels
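A minimal sketch of the multinomial kernel with Jelinek-Mercer smoothing, assuming histograms are plain integer count lists (function names are ours; log-space is used to avoid overflow on large counts):

```python
import math

def multinomial_log_kernel(fi, pj):
    """log P(fi | fj) = log[ (sum_d fi_d)! / prod_d fi_d! * prod_d pj_d^{fi_d} ].
    fi: integer bin counts of the unlabelled image; pj: smoothed bin probs."""
    log_p = math.lgamma(sum(fi) + 1)  # log (sum_d fi_d)!
    for c, p in zip(fi, pj):
        log_p += c * math.log(p) - math.lgamma(c + 1)
    return log_p

def jelinek_mercer(fj, collection, lam=0.9):
    """p_{j,d}: training-image frequency mixed with the collection frequency."""
    tj, tc = sum(fj), sum(collection)
    return [lam * c / tj + (1 - lam) * g / tc for c, g in zip(fj, collection)]
```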
Greedy kernel-feature alignment
[Figure: greedy alignment of kernels (Laplacian, Gaussian, Uniform) to feature
types (GIST, SIFT, LAB, HAAR) between a testing image and a training image.]

Iteration 0 (F1 = 0.00):

Ψ_{u,v}     GIST  SIFT  LAB  HAAR
Laplacian     0     0    0     0
Gaussian      0     0    0     0
Uniform       0     0    0     0
Iteration 1 (F1 = 0.25):

Ψ_{u,v}     GIST  SIFT  LAB  HAAR
Laplacian     0     0    0     0
Gaussian      1     0    0     0
Uniform       0     0    0     0
Iteration 2 (F1 = 0.34):

Ψ_{u,v}     GIST  SIFT  LAB  HAAR
Laplacian     0     0    0     0
Gaussian      1     0    0     0
Uniform       0     0    0     1
Iteration 3 (F1 = 0.38):

Ψ_{u,v}     GIST  SIFT  LAB  HAAR
Laplacian     0     0    0     0
Gaussian      1     1    0     0
Uniform       0     0    0     1
Iteration 4 (F1 = 0.42):

Ψ_{u,v}     GIST  SIFT  LAB  HAAR
Laplacian     0     0    1     0
Gaussian      1     1    0     0
Uniform       0     0    0     1
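The greedy forward selection of kernel-feature pairs can be sketched as below. The validation-F1 scoring callback is a hypothetical stand-in for annotating the held-out set under a candidate alignment; the one-kernel-per-feature constraint mirrors the binary columns of Ψ:

```python
def greedy_align(kernels, features, f1_on_validation):
    """Greedily add the (kernel, feature) pair that most improves held-out F1.
    f1_on_validation: callback scoring a set of (kernel, feature) pairs (assumed).
    Returns the selected alignment and its F1."""
    psi, best_f1 = set(), 0.0
    improved = True
    while improved:
        improved, best_pair = False, None
        for k in kernels:
            for u in features:
                if any(f == u for _, f in psi):
                    continue  # each feature type gets at most one kernel
                f1 = f1_on_validation(psi | {(k, u)})
                if f1 > best_f1:
                    best_f1, best_pair = f1, (k, u)
        if best_pair is not None:
            psi.add(best_pair)
            improved = True
    return psi, best_f1
```

Selection stops as soon as no remaining pair improves the validation F1, which is what keeps the learned Ψ sparse.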
Evaluation
Datasets/Features
- Standard evaluation datasets:
  - Corel 5K: 5,000 images (landscapes, cities), 260 keywords
  - IAPR TC12: 19,627 images (tourism, sports), 291 keywords
  - ESP Game: 20,768 images (drawings, graphs), 268 keywords
- Standard "Tagprop" feature set [Guillaumin et al. '09]:
  - Bag-of-words histograms: SIFT [Lowe '04] and Hue [van de Weijer & Schmid '06]
  - Global colour histograms: RGB, HSV, LAB
  - Global GIST descriptor [Oliva & Torralba '01]
  - All descriptors except GIST also computed in a 3x1 spatial arrangement [Lazebnik et al. '06]
Evaluation Metrics
- Standard evaluation metrics [Guillaumin et al. '09]:
  - Mean per-word recall (R)
  - Mean per-word precision (P)
  - F1 measure
  - Number of words with recall > 0 (N+)
- Fixed annotation length of 5 keywords
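These metrics can be computed as follows; the dict-of-sets data layout is an assumption for illustration:

```python
def annotation_metrics(predictions, ground_truth, vocab):
    """Mean per-word recall (R), precision (P), F1 and N+ (recall > 0 count).
    predictions / ground_truth: dict mapping image id -> set of words."""
    recalls, precisions, n_plus = [], [], 0
    for w in vocab:
        rel = {img for img, gt in ground_truth.items() if w in gt}
        pred = {img for img, p in predictions.items() if w in p}
        hits = len(rel & pred)
        recalls.append(hits / len(rel) if rel else 0.0)
        precisions.append(hits / len(pred) if pred else 0.0)
        if hits > 0:
            n_plus += 1
    R = sum(recalls) / len(vocab)
    P = sum(precisions) / len(vocab)
    f1 = 2 * P * R / (P + R) if P + R else 0.0
    return R, P, f1, n_plus
```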
F1 score of CRM model variants

[Bar chart: F1 (y-axis, 0.00-0.45) on Corel 5K, IAPR TC12 and ESP Game for
three variants: the original CRM with the Duygulu et al. features; CRM with
the 15 Tagprop features (+71% over the original CRM); and SKL-CRM with the
15 Tagprop features (a further +45%).]
F1 score of SKL-CRM on Corel 5K

[Chart: F1 (y-axis, 0.31-0.45) as feature types are added (x-axis: HSV_V3H1,
DS, HS_V3H1, HSV, HS, HH_V3H1, GIST, LAB_V3H1, RGB_V3H1, RGB, DH_V3H1, DH, HH,
LAB, DS_V3H1), with three series: SKL-CRM validation F1, SKL-CRM test F1 and
the Tagprop test F1 baseline.]
Optimal kernel-feature alignments on Corel 5K
- Optimal alignments¹:
  - HSV: Multinomial (λ = 0.99)
  - HSV V3H1: Generalised Gaussian (p = 0.9)
  - Harris Hue (HH V3H1): Generalised Gaussian (p = 0.1) ≈ Dirac spike!
  - Harris SIFT (HS): Gaussian
  - HS V3H1: Generalised Gaussian (p = 0.7)
  - Dense SIFT (DS): Laplacian
- Our data-driven kernels are more effective than standard kernels
- No alignment agrees with the literature's default assignment, i.e. Gaussian for GIST, Laplacian for colour histograms, χ2 for SIFT

¹ V3H1 denotes descriptors computed in a spatial arrangement
SKL-CRM Results vs. Literature (Precision & Recall)
[Bar chart: mean per-word recall (R) and precision (P), 0.20-0.50, on Corel 5K
and IAPR TC12 for MBRM, JEC, Tagprop, GS and SKL-CRM.]
SKL-CRM Results vs. Literature (N+)
[Bar chart: N+ (y-axis, 0-300) on Corel 5K and IAPR TC12 for MBRM, JEC,
Tagprop, GS and SKL-CRM.]
Conclusion
Conclusions and Future Work
- Proposed a sparse kernel model for image annotation
- Key experimental findings:
  - The default kernel-feature alignment is suboptimal
  - Data-adaptive kernels are superior to standard kernels
  - A sparse set of features is just as effective as a much larger set
  - Greedy forward selection is as effective as gradient ascent
- Future work: superposition of kernels per feature type