Multiple Kernel Learning
DESCRIPTION
Multiple Kernel Learning. Marius Kloft, Technische Universität Berlin / Korea University. Machine learning with multiple kernels; example: object detection in images.

TRANSCRIPT
![Page 1: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/1.jpg)
Multiple Kernel Learning
Marius Kloft
Technische Universität Berlin / Korea University
![Page 2: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/2.jpg)
Marius Kloft (TU Berlin)
Machine Learning
• Aim
▫ Learning the relation of two random quantities from observations
• Kernel-based learning
• Example
▫ Object detection in images
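Kernel-based learning represents the data only through pairwise similarities k(x, z). As a minimal illustration (not from the slides; the Gaussian kernel and its width `gamma` are standard choices, assumed here):

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - z_j||^2).

    X: (n, d) array, Z: (m, d) array; gamma is an illustrative width parameter.
    """
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)
```

Any learner that needs only inner products (an SVM, kernel ridge regression) can then operate on the matrix K instead of the raw features.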
![Page 3: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/3.jpg)
Multiple Views / Kernels
How to combine the views? Via weightings (Lanckriet, 2004).
Views: shape, space, color.
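The weighted combination of views amounts to a weighted sum of base kernel matrices; a minimal sketch (the toy matrices and weights below are hypothetical):

```python
import numpy as np

def combine_kernels(kernels, theta):
    """Combined kernel K = sum_m theta_m * K_m for view weights theta."""
    return sum(t * K_m for t, K_m in zip(theta, kernels))

# hypothetical 3x3 Gram matrices for shape, space, and color views
K_shape = np.eye(3)
K_space = np.ones((3, 3))
K_color = np.diag([1.0, 2.0, 3.0])
theta = np.array([0.5, 0.3, 0.2])  # view weights
K = combine_kernels([K_shape, K_space, K_color], theta)
```

Multiple kernel learning is then the question of learning theta from data rather than fixing it by hand.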
![Page 4: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/4.jpg)
Computation of Weights?
• State of the art (Bach, 2008)
▫ Sparse weights: kernels/views are discarded entirely
▫ But why discard information?
![Page 5: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/5.jpg)
From Vision to Reality?
• State of the art: sparse methods
▫ empirically ineffective (Gehler et al., Noble et al., Shawe-Taylor et al., NIPS 2008)
• My dissertation: a new methodology
▫ established as a standard
▫ breakthrough in learning bounds: O(M/n)
▫ more efficient and effective in practice, across applications
![Page 6: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/6.jpg)
Presentation of Methodology: Non-sparse Multiple Kernel Learning
![Page 7: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/7.jpg)
New Methodology (Kloft et al., ECML 2010; JMLR 2011)
• Computation of weights?
▫ Model: weighted combination of kernels
▫ Mathematical program: optimization over the weights; a convex problem
• Generalized formulation
▫ arbitrary loss
▫ arbitrary norms, e.g. lp-norms; the 1-norm leads to sparsity
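The formulas on this slide did not survive extraction. Using the notation of Kloft et al. (JMLR 2011), with M kernels, feature maps φ_m, and loss ℓ, the lp-norm MKL primal can be sketched as:

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\theta} \ge 0}\;
  \frac{1}{2} \sum_{m=1}^{M} \frac{\|\mathbf{w}_m\|_2^2}{\theta_m}
  \;+\; C \sum_{i=1}^{n} \ell\!\left( \sum_{m=1}^{M} \langle \mathbf{w}_m, \phi_m(x_i) \rangle + b,\; y_i \right)
  \quad \text{s.t.} \quad \|\boldsymbol{\theta}\|_p \le 1 .
```

For p = 1 the constraint induces sparse weights θ; for p > 1 all kernels keep nonzero weight, which is the non-sparse case advocated here.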
![Page 8: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/8.jpg)
Theoretical Analysis
• Theoretical foundations
▫ an active research topic (NIPS workshop 2010)
▫ We show:
Theorem (Kloft & Blanchard). The local Rademacher complexity of MKL is bounded by:
• Corollaries (learning bounds)
▫ upper bound with rate …; best previously known rate: … (Cortes et al., ICML 2010)
▫ generally an improvement; for …, an improvement of two orders of magnitude
(Kloft & Blanchard, NIPS 2011; JMLR 2012)
![Page 9: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/9.jpg)
Proof (Sketch)
1. Relating the original class to the centered class
2. Bounding the complexity of the centered class
3. Applying the Khintchine-Kahane and Rosenthal inequalities
4. Bounding the complexity of the original class
5. Relating the bound to the truncation of the spectra of the kernels
![Page 10: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/10.jpg)
Optimization (Kloft et al., JMLR 2011)
• Algorithms
1. Newton method
2. Sequential quadratically constrained programming with level-set projections
3. Block-coordinate descent (sketch): alternate between solving (P) w.r.t. w and solving (P) w.r.t. the kernel weights (analytical update), until convergence (convergence proved)
• Implementation
▫ in C++ ("SHOGUN Toolbox"), with Matlab/Octave/Python/R support
▫ runtime: ~1-2 orders of magnitude faster
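The block-coordinate descent of algorithm 3 can be sketched in Python. This is a hedged sketch, not the SHOGUN implementation: kernel ridge regression stands in for the SVM solve of (P) w.r.t. w, the analytical weight update follows the form θ_m ∝ ||w_m||^(2/(p+1)) from Kloft et al. (JMLR 2011), and all parameter values are illustrative.

```python
import numpy as np

def lp_mkl_bcd(kernels, y, p=2.0, lam=1.0, n_iter=20):
    """Block-coordinate descent sketch for lp-norm MKL.

    Alternates (a) a closed-form kernel ridge solve under the current
    combined kernel (standing in for the SVM step) with (b) the
    analytical lp-norm update of the kernel weights theta.
    """
    M, n = len(kernels), len(y)
    theta = np.full(M, M ** (-1.0 / p))  # uniform start with ||theta||_p = 1
    for _ in range(n_iter):
        K = sum(t * K_m for t, K_m in zip(theta, kernels))
        alpha = np.linalg.solve(K + lam * np.eye(n), y)  # ridge dual solve
        # squared block norms ||w_m||^2 = theta_m^2 * alpha' K_m alpha
        norms_sq = np.array([t ** 2 * alpha @ K_m @ alpha
                             for t, K_m in zip(theta, kernels)])
        norms_sq = np.maximum(norms_sq, 1e-12)
        # analytical update: theta_m ∝ ||w_m||^{2/(p+1)}, then renormalize
        theta = norms_sq ** (1.0 / (p + 1))
        theta /= np.linalg.norm(theta, ord=p)
    return theta, alpha
```

The weight step is analytical, so each iteration costs one linear solve plus M quadratic forms; this is what makes the alternating scheme cheap compared with solving the joint problem directly.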
![Page 11: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/11.jpg)
Toy Experiment
• Design
▫ two 50-dimensional Gaussians; mean μ1 on the simplex
▫ zero-mean features irrelevant; 50 training examples; one linear kernel per feature
▫ six scenarios: vary the % of irrelevant features; variance chosen so that the Bayes error stays constant
• Results
[Figure: test error (0-50%) vs. sparsity (0%, 44%, 64%, 82%, 92%, 98%) for each method]
▫ the choice of p is crucial
▫ the optimal p depends on the true sparsity; the SVM fails in the sparse scenarios
▫ the 1-norm is best in the sparsest scenario only
▫ p-norm MKL proves robust in all scenarios
• Bounds
▫ can minimize the bounds w.r.t. p
▫ the bounds reflect the empirical results well
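The toy design above can be sketched as follows. The helper is hypothetical (`sep` and `n_relevant` are illustrative, and the slide's exact variance scaling that keeps the Bayes error constant is not reproduced):

```python
import numpy as np

def toy_data(n=50, d=50, n_relevant=10, sep=1.0, rng=None):
    """Two d-dimensional Gaussians whose means differ only in the first
    n_relevant features (a simplex-style mean direction); the remaining
    zero-mean features are irrelevant, as in the slide's design."""
    rng = rng or np.random.RandomState(0)
    mu = np.zeros(d)
    mu[:n_relevant] = sep / n_relevant
    y = rng.choice([-1, 1], size=n)
    X = rng.randn(n, d) + np.outer(y, mu)
    return X, y
```

One linear kernel per feature then gives d base kernels `K_j = X[:, j:j+1] @ X[:, j:j+1].T`, of which only the first `n_relevant` carry class signal.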
![Page 12: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/12.jpg)
Applications
• Applications studied:
▫ computer vision: visual object recognition
▫ bioinformatics: subcellular localization, protein fold prediction, DNA transcription start site detection, metabolic network reconstruction
• Lesson learned:
▫ the optimality of a method depends on the true underlying sparsity of the problem (motivating non-sparsity)
![Page 13: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/13.jpg)
Application Domain: Computer Vision
• Visual object recognition
▫ aim: annotation of visual media (e.g., images)
▫ motivation: content-based image retrieval
Example categories: aeroplane, bicycle, bird
![Page 14: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/14.jpg)
• Multiple kernels, based on
▫ color histograms
▫ shapes (gradients)
▫ local features (SIFT words)
▫ spatial features
![Page 15: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/15.jpg)
Application Domain: Computer Vision
• Datasets
▫ 1. VOC 2009 challenge: 7054 train / 6925 test images; 20 object categories (aeroplane, bicycle, …)
▫ 2. ImageCLEF2010 challenge: 8000 train / 10000 test images, taken from Flickr; 93 concept categories (partylife, architecture, skateboard, …)
• 32 kernels
▫ 4 types:
▫ varied over color-channel combinations and spatial tilings (levels of a spatial pyramid)
▫ one one-vs.-rest classifier for each visual concept
|                  | Bag of Words | Global Histogram |
|------------------|--------------|------------------|
| Gradients        | BoW-SIFT     | Ho(O)G           |
| Color Histograms | BoW-C        | HoC              |
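The one-vs.-rest setup mentioned above can be sketched as follows; this is an illustrative sketch in which kernel ridge classifiers stand in for the SVMs actually used, operating on a precomputed (combined) kernel matrix:

```python
import numpy as np

def ovr_train(K, labels, classes, lam=1.0):
    """One-vs-rest training on a precomputed (combined) kernel matrix K:
    one +1/-1 kernel ridge classifier per class."""
    A = K + lam * np.eye(K.shape[0])
    return {c: np.linalg.solve(A, np.where(labels == c, 1.0, -1.0))
            for c in classes}

def ovr_predict(K_test, dual_coefs):
    """Predict, for each test point, the class whose classifier scores highest."""
    classes = list(dual_coefs)
    scores = np.stack([K_test @ dual_coefs[c] for c in classes], axis=1)
    return np.array(classes)[scores.argmax(axis=1)]
```

With 20 (VOC) or 93 (CLEF) concepts, this yields one binary classifier per concept, all sharing the same combined kernel.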
![Page 16: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/16.jpg)
Application Domain: Computer Vision
• Preliminary results (Binder et al., 2010, 2011)
▫ using BoW-S only gives worse results → BoW-S alone is not sufficient

|          | SVM   | MKL   |
|----------|-------|-------|
| VOC2009  | 55.85 | 56.76 |
| CLEF2010 | 36.45 | 37.02 |

• Challenge results
▫ employed our approach in the ImageCLEF2011 Photo Annotation challenge
▫ achieved the winning entries in 3 categories
![Page 17: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/17.jpg)
Application Domain: Computer Vision
• Why can MKL help?
▫ some images are better captured by certain kernels; different images may have different kernels that capture them well
• Experiment
▫ disagreement of single-kernel classifiers
▫ the BoW-S kernels induce more or less the same predictions
![Page 18: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/18.jpg)
Application Domain: Genetics
• Detection of transcription start sites
• By means of kernels based on:
▫ sequence alignments
▫ the distribution of nucleotides downstream / upstream
▫ folding properties (binding energies and angles)
• Empirical analysis (Kloft et al., NIPS 2009; JMLR 2011)
▫ detection accuracy (AUC): higher accuracies than sparse MKL and ARTS
▫ ARTS: winner of an international comparison of 19 models (Abeel et al., 2009)
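The AUC used as the evaluation measure here can be computed directly from the pairwise-ranking (Mann-Whitney) identity; a minimal sketch:

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs that the scores rank correctly, counting ties as half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    correct = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return correct + 0.5 * ties
```

Unlike accuracy, this measure is insensitive to the strong class imbalance typical of site-detection tasks, which is why it is the standard metric here.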
![Page 19: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/19.jpg)
Application Domain: Genetics
• Theoretical analysis (Kloft et al., NIPS 2009; JMLR 2011)
▫ impact of the lp-norm on the bound
▫ confirms the experimental results: stronger theoretical guarantees for the proposed approach (p > 1); empirical and theoretical results approximately agree
• Empirical analysis
▫ detection accuracy (AUC): higher accuracies than sparse MKL and ARTS, the winner of an international comparison of 19 models (Abeel et al., 2009)
![Page 20: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/20.jpg)
Application Domain: Pharmacology
• Protein fold prediction
▫ prediction of the fold class of a protein; the fold class is related to the protein's function (e.g., important for drug design)
▫ data set and kernels from Y. Ying: 27 fold classes, fixed train and test sets, 12 biologically inspired kernels (e.g., hydrophobicity, polarity, van der Waals volume)
• Results
▫ accuracy: 1-norm MKL and the SVM on par; p-norm MKL performs best, with 6% higher accuracy than the baselines
![Page 21: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/21.jpg)
Further Applications
• Computer vision
▫ visual object recognition
• Bioinformatics
▫ subcellular localization
▫ protein fold prediction
▫ DNA transcription start / splice site detection
▫ metabolic network reconstruction
• Non-sparsity
![Page 22: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/22.jpg)
Conclusion: Non-sparse Multiple Kernel Learning
• Training with > 100,000 data points and > 1,000 kernels
• Sharp learning bounds
• Applications
▫ visual object recognition: an established standard; winner of the ImageCLEF 2011 challenge
▫ computational biology: a more accurate gene detector than the winner of an international comparison
![Page 23: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/23.jpg)
Thank you for your attention. I will be pleased to answer any additional questions.
![Page 24: Multiple Kernel Learning](https://reader036.vdocuments.us/reader036/viewer/2022062521/56816617550346895dd96623/html5/thumbnails/24.jpg)
References
▫ Abeel, Van de Peer, Saeys (2009). Toward a gold standard for promoter prediction evaluation. Bioinformatics.
▫ Bach (2008). Consistency of the Group Lasso and Multiple Kernel Learning. Journal of Machine Learning Research (JMLR).
▫ Kloft, Brefeld, Laskov, and Sonnenburg (2008). Non-sparse Multiple Kernel Learning. NIPS Workshop on Kernel Learning.
▫ Kloft, Brefeld, Sonnenburg, Laskov, Müller, and Zien (2009). Efficient and Accurate Lp-norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2009).
▫ Kloft, Rückert, and Bartlett (2010). A Unifying View of Multiple Kernel Learning. ECML.
▫ Kloft, Blanchard (2011). The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2011).
▫ Kloft, Brefeld, Sonnenburg, and Zien (2011). Lp-Norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 12(Mar):953-997.
▫ Kloft and Blanchard (2012). On the Convergence Rate of Lp-norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 13(Aug):2465-2502.
▫ Lanckriet, Cristianini, Bartlett, El Ghaoui, Jordan (2004). Learning the Kernel Matrix with Semidefinite Programming. Journal of Machine Learning Research (JMLR).