![Page 1: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/1.jpg)
Model Selection for SVM &
Our intent works
Songcan Chen
Feb. 8, 2012
![Page 2: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/2.jpg)
Outline
• Model Selection for SVM
• Our intent works
![Page 3: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/3.jpg)
Model Selection for SVM
• Introduction to 2 works
![Page 4: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/4.jpg)
Introduction to 2 works
1. Model selection for primal SVM [MBB11, MLJ11]
2. Selection of Hypothesis Space
• Selecting the Hypothesis Space for Improving the Generalization Ability of Support Vector Machines [AGOR11, IJCNN2011]
• The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers [AGOR11, NIPS2011]
![Page 5: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/5.jpg)
1st work
• Model selection for primal SVM [MBB11, MLJ11]
[MBB11] Gregory Moore · Charles Bergeron · Kristin P. Bennett, Machine Learning (2011) 85:175–208
![Page 6: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/6.jpg)
Outline
• Primal SVM
• Model selection
1) Bilevel Program for CV
2) Two optimization methods:
Implicit & Explicit methods
3) Experiments
4) Conclusions
![Page 7: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/7.jpg)
Primal SVM
• Advantages:
1) simple to implement, theoretically sound, and easy to customize to different tasks such as classification, regression, ranking and so forth.
2) very fast, linear in the number of samples
• Difficulty
model selection
![Page 8: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/8.jpg)
Model selection
An often-adopted approach:
Cross-validation (CV) over a grid
Advantage:
simple and almost universal!
Weakness:
high computational cost: the number of grid points grows exponentially with the number of hyperparameters (k candidate values for each of m hyperparameters give k^m combinations).
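To make the growth concrete, the short sketch below counts the models a full grid search would have to fit. The grid values, the number of hyperparameters, and the fold count are illustrative assumptions, not settings taken from the slides.

```python
from itertools import product

def build_grid(axes):
    """Enumerate every combination of candidate hyperparameter values."""
    return list(product(*axes))

def num_fits(axes, folds):
    """Each grid point needs one model fit per CV fold."""
    size = 1
    for axis in axes:
        size *= len(axis)
    return size * folds

# Hypothetical setup: 3 hyperparameters, 5 candidate values each, 10-fold CV.
axes = [[10.0 ** e for e in range(-2, 3)] for _ in range(3)]
grid = build_grid(axes)
print(len(grid))           # 5 ** 3 = 125 grid points
print(num_fits(axes, 10))  # 125 * 10 = 1250 model fits
```

Adding a fourth hyperparameter with the same resolution would multiply both numbers by five, which is the cost the bilevel approach tries to avoid.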
![Page 9: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/9.jpg)
Motivation
• CV can be formulated naturally and precisely as a bilevel program (BP), as shown below.
Bilevel CV Problem (BCP)
![Page 10: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/10.jpg)
Bilevel CV Problem (BCP) (1)
BCP for a single validation and training split:
• The outer-level leader problem selects the hyperparameters, γ, to perform well on a validation set.
• The follower problem trains an optimal inner-level model for the given hyperparameters, and returns a weight vector w for validation.
![Page 11: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/11.jpg)
Bilevel CV Problem (BCP) (2)
More specifically, model selection via T-fold CV as a BCP:
1) The inner-level problems minimize the regularized training error to determine the best function for the given hyperparameters for each fold.
2) The hyperparameters are the outer-level control variables. The objective of the outer-level is to minimize the validation error based on the optimal parameters (w) returned for each fold.
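The two levels can be sketched in a few lines of Python. This is only an illustration of the structure: it swaps the SVM inner problem for a one-dimensional ridge regression (which has a closed-form solution), and the data, the fold assignment, and the function names are all hypothetical.

```python
def make_splits(data, T):
    """Partition data into T folds; fold t validates, the other T-1 train."""
    folds = [data[t::T] for t in range(T)]
    splits = []
    for t in range(T):
        trn = [p for s in range(T) if s != t for p in folds[s]]
        splits.append((trn, folds[t]))
    return splits

def inner_solve(trn, gamma):
    """Inner level: argmin_w sum (w*x - y)^2 + gamma * w^2, in closed form."""
    sxy = sum(x * y for x, y in trn)
    sxx = sum(x * x for x, _ in trn)
    return sxy / (sxx + gamma)

def outer_loss(splits, gamma):
    """Outer level: validation MSE averaged over folds, using each fold's w."""
    total = 0.0
    for trn, val in splits:
        w = inner_solve(trn, gamma)
        total += sum((w * x - y) ** 2 for x, y in val) / len(val)
    return total / len(splits)

# Toy data (hypothetical): y is roughly 2*x with an alternating perturbation.
data = [(i / 4.0, 2.0 * i / 4.0 + (0.3 if i % 2 else -0.3)) for i in range(1, 13)]
splits = make_splits(data, T=3)
for gamma in (0.0, 0.1, 1.0, 10.0):
    print(gamma, outer_loss(splits, gamma))
```

Grid search evaluates `outer_loss` at fixed values of `gamma`; the bilevel view instead treats `gamma` as a continuous variable to be optimized.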
![Page 12: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/12.jpg)
Formal Formulation for BCP (1)
• Given a training sample
$\Omega := \{(x_j, y_j)\}_{j=1}^{l} \subset \mathbb{R}^{n+1}$.
• T-fold CV: partition $\Omega$ into T equally sized divisions; then for fold t = 1, …, T, one of the divisions is used as the validation set, $\Omega_{val}^t$, and the remaining T−1 divisions are assigned to the training set, $\Omega_{trn}^t$.
• Let $\gamma \in \mathbb{R}^m$ be the set of m model hyperparameters and $w^t$ be the model weights for the t-th fold.
![Page 13: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/13.jpg)
Formal Formulation for BCP (2)
Let
$L_{in}^t(w^t, \gamma)$ be the inner-level training function given the t-th fold training dataset, and
$L_{out}^t(w^t, \gamma)$ be the t-th outer-level validation loss function given its validation dataset.
![Page 14: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/14.jpg)
Formal Formulation for BCP (3)
The bilevel program for T-fold CV is:
(2)
The BCP is challenging to solve in this form.
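Equation (2) did not survive extraction. A plausible reconstruction from the verbal description on the preceding slides is sketched below. The symbols $L_{out}^t$ and $L_{in}^t$ (outer validation loss and inner training objective of fold t) and the averaging factor 1/T are assumptions based on the later SVR slides, and any bound constraints on γ are omitted.

```latex
\min_{\gamma,\, w^1, \dots, w^T} \; \frac{1}{T} \sum_{t=1}^{T} L_{out}^{t}(w^t, \gamma)
\quad \text{s.t.} \quad
w^t \in \arg\min_{w} \; L_{in}^{t}(w, \gamma), \qquad t = 1, \dots, T.
```

The arg min constraints are the source of the difficulty: each constraint is itself an optimization problem.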
![Page 15: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/15.jpg)
Formal Formulation for BCP (4)
(3)
Two solution methods: 1) Implicit and 2) explicit
![Page 16: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/16.jpg)
Implicit Method
![Page 17: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/17.jpg)
Implicit method (1)
i.e., make w an implicit function of γ, namely w(γ).
(4)
This yields a nonlinear objective. In practice, nonlinear objectives are much easier to optimize than nonlinear constraints.
Where w(γ) is computed such that
![Page 18: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/18.jpg)
Implicit method (2)
Since for optimality,
Equivalently having the KKT condition
(5)
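The equations for this step are missing from the transcript. As a hedged sketch of why the implicit formulation yields a usable gradient, assume a single fold and an inner objective $L_{in}$ that is twice differentiable and strictly convex in w (the SVR losses treated in the paper are not smooth everywhere, so this is only the smooth-case analogue). Differentiating the stationarity condition with respect to γ gives:

```latex
\nabla_w L_{in}(w(\gamma), \gamma) = 0
\;\Longrightarrow\;
\frac{\partial w}{\partial \gamma}
  = -\left[\nabla^2_{ww} L_{in}\right]^{-1} \nabla^2_{w\gamma} L_{in},
\qquad
\frac{d}{d\gamma} L_{out}(w(\gamma), \gamma)
  = \frac{\partial L_{out}}{\partial \gamma}
  + \left(\frac{\partial w}{\partial \gamma}\right)^{\!\top} \frac{\partial L_{out}}{\partial w}.
```

The outer objective can then be minimized over γ alone with a gradient-type method, at the price of solving the inner problem at every iterate.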
![Page 19: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/19.jpg)
Implicit method (3)
• The reformulated BCP becomes:
(6)
(7)
![Page 20: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/20.jpg)
An Application
• Implicit model selection for SVR
• The objective of SVR (9) and its optimality condition (10) are, respectively:
(9)
(10)
![Page 21: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/21.jpg)
Defining the objective functions Lin and Lout for SVR respectively:
First, each fold t in T-fold CV contributes a validation mean squared error:
(11)
![Page 22: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/22.jpg)
• The T-folds are averaged to generate the outer-level objective:
(12)
For single group SVR, there are T inner-level objectives Lin:
(13)
![Page 23: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/23.jpg)
Implicit model selection for SVR
(14) A full bilevel program (BP)
![Page 24: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/24.jpg)
For multiple group SVR (multiSVR)
1) multiSVR’s objective:
(15)
2) Optimality condition or constraint:
(16)
![Page 25: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/25.jpg)
Implicit model selection methods: Algorithm (ImpGrad)
where
(17)
![Page 26: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/26.jpg)
Summary for ImpGrad
• ImpGrad alternates between training a model and updating the hyperparameters. Ideally an explicit algorithm that simultaneously solves for both model weights and hyperparameters would be more efficient as there is no need to train a model to optimality when far from the optimal solution.
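The alternation described above can be illustrated with a toy loop. This is not the ImpGrad algorithm from the paper: it assumes a smooth one-dimensional ridge regression as the inner problem, a single training/validation split, and a fixed step size, all chosen so that the example stays short and checkable.

```python
def train(trn, gamma):
    """Inner problem: w(gamma) = argmin_w sum (w*x - y)^2 + gamma * w^2."""
    sxy = sum(x * y for x, y in trn)
    sxx = sum(x * x for x, _ in trn)
    return sxy / (sxx + gamma)

def val_loss(w, val):
    """Outer objective: validation mean squared error."""
    return sum((w * x - y) ** 2 for x, y in val) / len(val)

def implicit_grad(trn, val, gamma):
    """d/dgamma of val_loss(w(gamma)) via the implicit function theorem."""
    w = train(trn, gamma)
    sxx = sum(x * x for x, _ in trn)
    # Differentiating the inner optimality condition
    # 2*(sxx + gamma)*w - 2*sxy = 0 with respect to gamma
    # gives dw/dgamma = -w / (sxx + gamma).
    dw_dgamma = -w / (sxx + gamma)
    dloss_dw = sum(2.0 * (w * x - y) * x for x, y in val) / len(val)
    return dloss_dw * dw_dgamma

def imp_grad_loop(trn, val, gamma=0.5, step=5.0, iters=200):
    """Alternate: train to optimality, then take a projected step on gamma."""
    for _ in range(iters):
        gamma = max(0.0, gamma - step * implicit_grad(trn, val, gamma))
    return gamma

# Hypothetical toy split: training targets are shifted, validation ones are clean.
trn = [(1.0, 1.5), (2.0, 2.5), (3.0, 3.5)]
val = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
gamma = imp_grad_loop(trn, val)
print(gamma, val_loss(train(trn, gamma), val))
```

Every call to `implicit_grad` retrains the model to optimality, which is exactly the cost the explicit method tries to avoid when the hyperparameters are still far from a good solution.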
![Page 27: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/27.jpg)
Explicit method
![Page 28: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/28.jpg)
Explicit Methods (1)
Assume that the inner-level objective functions are differentiable and convex with respect to w. The optimality condition is then that the partial derivative of $L_{in}(w, \gamma)$ with respect to w equals zero: $\partial L_{in}(w, \gamma) / \partial w = 0$.
![Page 29: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/29.jpg)
Explicit Methods (2)
We are now in a position to transform the bilevel program into a nonconvex, nonsmooth program.
![Page 30: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/30.jpg)
Penalized bilevel program (PBP)
(34)
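The exact form of (34) is not recoverable from this transcript. The sketch below only illustrates the general idea of a penalized bilevel program: the inner-level optimality condition is moved into the objective as a penalty, so that w and γ can be optimized together. The squared penalty, the weight `lam`, the ridge-regression inner problem, and the toy data are all assumptions made for illustration.

```python
def inner_residual(w, gamma, trn):
    """Stationarity residual dL_in/dw for L_in = sum (w*x - y)^2 + gamma * w^2."""
    return sum(2.0 * (w * x - y) * x for x, y in trn) + 2.0 * gamma * w

def val_loss(w, val):
    """Outer objective: validation mean squared error."""
    return sum((w * x - y) ** 2 for x, y in val) / len(val)

def penalized_objective(w, gamma, trn, val, lam):
    """Outer loss plus lam times the squared inner-level optimality violation."""
    return val_loss(w, val) + lam * inner_residual(w, gamma, trn) ** 2

# Hypothetical toy split, with w and gamma treated as free variables.
trn = [(1.0, 1.5), (2.0, 2.5), (3.0, 3.5)]
val = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
# w = 1 is inner-optimal for gamma = 3, so the penalty term vanishes there.
print(penalized_objective(1.0, 3.0, trn, val, lam=0.1))
# The same w with gamma = 0 violates inner optimality and is penalized.
print(penalized_objective(1.0, 0.0, trn, val, lam=0.1))
```

Unlike the implicit approach, no inner problem has to be solved to optimality in order to evaluate this objective.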
![Page 31: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/31.jpg)
PBP Algorithm (1)
![Page 32: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/32.jpg)
PBP Algorithm (2)
where
(35)
![Page 33: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/33.jpg)
An Application
SVR using the PBP algorithm
![Page 34: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/34.jpg)
![Page 35: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/35.jpg)
The optimality condition to be penalized for each inner-level problem, t = 1 . . . T , is:
![Page 36: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/36.jpg)
Experiments
• Experiment A: Small QSAR datasets
• Experiment B: Large QSAR datasets
![Page 37: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/37.jpg)
Experiment A (1)
![Page 38: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/38.jpg)
Experiment A (2): MSE
![Page 39: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/39.jpg)
Experiment A (3): Time
![Page 40: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/40.jpg)
Experiment B (1)
![Page 41: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/41.jpg)
Experiment B (2): MSE
For Pyruvate kinase Dataset
![Page 42: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/42.jpg)
More
![Page 43: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/43.jpg)
Experiment B (3): MSE
For Tau-fibril Dataset
![Page 44: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/44.jpg)
More
![Page 45: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/45.jpg)
Scalability (1): The size of dataset
![Page 46: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/46.jpg)
Scalability (2): The # of Parameters
![Page 47: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/47.jpg)
Summary
• Coarse grid search was reasonably fast; faster than both ImpGrad and PBP. In terms of generalization though, coarse grid search performed the worst.
• Implicit and PBP algorithms performed better, with PBP being faster on the smaller datasets and ImpGrad being faster on the larger datasets. Generalization was slightly better for PBP.
![Page 48: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/48.jpg)
Conclusions (1)
• ImpGrad finds solutions with good generalization very quickly for large datasets, but illustrates more erratic behavior on all of the small datasets.
• PBP uses a well-founded subgradient method with proven convergence properties and yields a robust explicit algorithm that performed well on problems of all sizes. Its training time appears to be roughly linear in the modeling set size.
![Page 49: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/49.jpg)
Conclusions (2)
• Like all machine learning algorithms, PBP and ImpGrad have algorithm parameters that must be defined such as exit criteria, starting points, and proximity parameters.
• ImpGrad and PBP assume that the inner-level objective functions are at least once differentiable.
![Page 50: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/50.jpg)
More reading for more details!
![Page 51: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/51.jpg)
2nd work
• Selecting the Hypothesis Space for Improving the Generalization Ability of Support Vector Machines [AGOR11,IJCNN2011]
[AGOR11] Davide Anguita, Alessandro Ghio, Luca Oneto, Sandro Ridella, Selecting the Hypothesis Space for Improving the Generalization Ability of Support Vector Machines, Inter. Joint Conf. Neural Networks, 2011.
![Page 52: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/52.jpg)
Outline
• Drawback of conventional SVC in selecting Hypothesis or model
1) Linear SVC
2) Structural Risk Minimization principle (SRM)
3) Drawback
4) Motivation
• Improving model selection for SVC
![Page 53: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/53.jpg)
Linear SVC (1)
• Linear SVC
(1)
1) Parameters (w,b) can be computed during the learning phase using a training set
2) Hyperparameters such as C must be tuned through model selection, e.g., via CV.
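Equation (1) did not survive extraction. For reference, the standard soft-margin primal of a linear SVC, which the slide presumably shows, is given below; the slack variables $\xi_i$ and the sample count n are notation assumed here.

```latex
\min_{w,\, b,\, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
y_i\,(w^{\top} x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n.
```

Here (w, b) are the parameters learned from the training set, while C is fixed before training and therefore has to be chosen by model selection.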
![Page 54: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/54.jpg)
Linear SVC (2)
Structural Risk Minimization Principle (SRM)
![Page 55: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/55.jpg)
SRM (1)
(i) Define a centroid $\hat{w}_0$;
(ii) Choose a (possibly infinite) sequence of hypothesis spaces $F_k$, k = 1, 2, ..., where the classes of functions $F_k$ describe classifiers of growing complexity and are centered on $\hat{w}_0$;
(iii) Select the optimal model $f_o$ among the hypothesis spaces by exploiting the following trade-off between overfitting and underfitting:
![Page 56: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/56.jpg)
SRM (2)
where
(iii) Select the optimal model fo among the hypothesis spaces by exploiting the following trade-off between overfitting and underfitting:
![Page 57: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/57.jpg)
Drawback (1)
The hypothesis space is usually (and arbitrarily) centered at the origin ($\hat{w}_0 = 0$), because, in general, there is no a priori information leading to a better choice. This choice, however, severely influences the sequence $F_k$. See Fig. 2.
![Page 58: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/58.jpg)
Drawback (2)
![Page 59: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/59.jpg)
Motivation
Select a "good" centroid and center a sequence of hypothesis spaces around it, so as to better explore the classes of functions for model selection purposes.
![Page 60: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/60.jpg)
Improving model selection for SVC
Model complexity measures:
• the Rademacher Complexity (RC) [1] and
• the Maximal Discrepancy (MD)[2]
[1] P. Bartlett, S. Mendelson, “Rademacher and Gaussian complexities: Risk bounds and structural results”, Computational Learning Theory, pp. 224–240, 2001.
[2] P.L. Bartlett, S. Boucheron, G. Lugosi, “Model selection and estimation”, Machine Learning, vol. 48, pp. 85–113, 2002.
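As a rough illustration of what a data-dependent complexity measure computes, the sketch below estimates an empirical Rademacher complexity by Monte Carlo for one simple class: linear functions with norm at most `rho`, without bias term. Normalization conventions differ between papers (some include a factor of 2 or an absolute value), so the definition used here is an assumption and not necessarily the one in the slides.

```python
import math
import random

def empirical_rademacher(points, rho, draws=2000, seed=0):
    """Estimate E_sigma sup_{||w|| <= rho} (1/n) sum_i sigma_i <w, x_i>.

    By Cauchy-Schwarz the supremum equals (rho / n) * ||sum_i sigma_i x_i||,
    so only the random signs sigma_i need to be sampled.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    total = 0.0
    for _ in range(draws):
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        s = [sum(sigma[i] * points[i][d] for i in range(n)) for d in range(dim)]
        total += rho / n * math.sqrt(sum(v * v for v in s))
    return total / draws

# Hypothetical sample of four 2-D patterns.
points = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-1.0, 2.0)]
print(empirical_rademacher(points, rho=1.0))
print(empirical_rademacher(points, rho=2.0))
```

A larger radius `rho` means a richer hypothesis space and a proportionally larger complexity term, which is the quantity traded off against the empirical error in SRM.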
![Page 61: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/61.jpg)
Traditional SVC formulation
Or alternative form
![Page 62: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/62.jpg)
Improved SVC formulation (1)
Or alternative form
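The formulas on this slide are not in the transcript. Going by the motivation stated earlier (center the hypothesis space on a chosen centroid instead of the origin), a plausible reading is that w is replaced by its offset from a centroid $\hat{w}$ in the constraint or in the regularizer. The two forms below are a sketch under that assumption; the radius ρ and the slack variables $\xi_i$ are notation introduced here.

```latex
% Constrained (Ivanov-style) form, centered on \hat{w}:
\min_{w,\, b,\, \xi} \; \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
\|w - \hat{w}\|^2 \le \rho^2, \quad
y_i\,(w^{\top} x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0.

% Penalized (Tikhonov-style) form:
\min_{w,\, b,\, \xi} \; \frac{1}{2}\|w - \hat{w}\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
y_i\,(w^{\top} x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0.
```

Setting $\hat{w} = 0$ recovers the conventional SVC in both cases.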
![Page 63: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/63.jpg)
Let
Primal problem:
![Page 64: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/64.jpg)
How to Choose feasible centroids
Split the dataset $D_n$ into two parts, $D_{n_c}$ and $D_{n_l}$,
where $n_c$ patterns are used to find the centroid and the remaining $n_l = n - n_c$ samples can be safely exploited for estimating the values of the MD and RC penalty terms (7) and (11) (as $D_{n_l} \cap D_{n_c} = \emptyset$).
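The split can be sketched as follows. The code is a simplified stand-in for the procedure in the paper: it assumes a penalized primal of the form 0.5·||w − ŵ||² + C·Σ hinge, solves it with plain subgradient descent, and uses a hypothetical toy dataset; the estimation of the MD and RC penalty terms on the second part is not shown.

```python
def hinge_objective(w, b, w_hat, data, C):
    """0.5 * ||w - w_hat||^2 + C * sum of hinge losses."""
    reg = 0.5 * sum((wi - hi) ** 2 for wi, hi in zip(w, w_hat))
    loss = sum(max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
               for x, y in data)
    return reg + C * loss

def train_centered_svc(data, w_hat, C=1.0, step0=0.1, iters=200):
    """Subgradient descent on the centered primal, started at the centroid."""
    w, b = list(w_hat), 0.0
    for k in range(iters):
        grad_w = [wi - hi for wi, hi in zip(w, w_hat)]
        grad_b = 0.0
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # hinge is active: add its subgradient
                grad_w = [g - C * y * xi for g, xi in zip(grad_w, x)]
                grad_b -= C * y
        step = step0 / (1.0 + k)  # diminishing step size
        w = [wi - step * g for wi, g in zip(w, grad_w)]
        b -= step * grad_b
    return w, b

# Hypothetical toy set, split into a centroid part D_nc and a learning part D_nl.
data = [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((2.0, 3.0), 1),
        ((-2.0, -2.0), -1), ((-3.0, -1.0), -1), ((-2.0, -3.0), -1)]
d_nc, d_nl = data[0::2], data[1::2]
w_hat, _ = train_centered_svc(d_nc, w_hat=[0.0, 0.0])  # centroid from D_nc
w, b = train_centered_svc(d_nl, w_hat=w_hat)           # model centered on it
print(w_hat, w, b)
```

Because the two parts are disjoint, the centroid is fixed before the second part is seen, which is what allows the remaining samples to be used for the complexity estimates.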
![Page 65: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/65.jpg)
Where (7) & (11) are respectively:
![Page 66: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/66.jpg)
In order to find the centroid, consider $D_{n_c}$ and a class of functions $F_c$ centered on $\hat{w}_0 = 0$. Varying the hyperparameter C and solving problem (13) in this way yields $n_s$ candidate solutions, indexed by p = 1, …, $n_s$.
![Page 67: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/67.jpg)
Improved SVC formulation (2): IMS & IMSA
Formulation:
or
![Page 68: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/68.jpg)
IMS
![Page 69: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/69.jpg)
IMSA
![Page 70: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/70.jpg)
Experiments (1)
![Page 71: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/71.jpg)
Experiments (2)
![Page 72: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/72.jpg)
Experiments (3)
![Page 73: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/73.jpg)
Experiments (4)
![Page 74: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/74.jpg)
Experiments (5)
![Page 75: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/75.jpg)
Conclusions
This work addresses two issues:
(1) the possibility of selecting a hypothesis space different from the one used by the conventional SVM formulation;
(2) the use of this greater flexibility to improve the generalization ability of the trained classifier.
While (1) could be seen as a theoretical curiosity, this paper showed that its solution leads to better results in practice, at least in the small-sample setting.
Future research will aim to understand how to exploit the new formulation in other ways. One example is choosing the alternative centroid(s) through a priori information about the classification problem, instead of deriving them from a portion of the training set.
![Page 76: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/76.jpg)
Thinking (1)
For the 1st work
• Use the BCP to help the 2nd one select the model! (trivial!)
• Adapt BCP to other regularized objectives with spatial regularization etc.
• How to adapt them to nonlinear machines or locally linear SVM. A possible starting point: refer to [KPB+11]
For the 2nd work
• Other possible feasible centroids: LDA etc.
• Explore the relationship with the projection penalty!
[KPB+11] Peter Karsmakers, K. Pelckmans, K. De Brabanter, H. Van hamme, Johan A. K. Suykens, Sparse conjugate directions pursuit with applications to fixed-size kernel models, Machine Learning (2011) 85:109–148.
![Page 77: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/77.jpg)
Thinking (2)
• How to diverge your thinking, e.g., starting from the form $f(x; w, b) = w^T x + b$
1) Pattern x → φ(x) (kernel) → matrix X: $f(X; u, v, b) = u^T X v + b$ → tensor: $f(X; W, b) = \mathrm{tr}(WX) + b$
2) Weight w → utilization of prior knowledge, e.g., the entries of w are monotonic, etc.
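The two matrix-pattern forms above are related: with the rank-one choice $W = v u^T$, the trace form reduces to the bilinear form, since $\mathrm{tr}(v u^T X) = u^T X v$. A small numerical check, with arbitrary illustrative values for X, u, v, and b:

```python
def bilinear(X, u, v, b):
    """f(X; u, v, b) = u^T X v + b for a matrix pattern X."""
    return sum(u[i] * X[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v))) + b

def trace_form(X, W, b):
    """f(X; W, b) = tr(W X) + b, with W of shape (cols of X) x (rows of X)."""
    return sum(W[j][i] * X[i][j]
               for i in range(len(X)) for j in range(len(X[0]))) + b

X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
u, v, b = [1.0, -1.0], [0.5, 0.0, 2.0], 0.25
W = [[vj * ui for ui in u] for vj in v]  # rank-one W = v u^T
print(bilinear(X, u, v, b), trace_form(X, W, b))
```

A general W has rows(X) × cols(X) free entries, whereas the bilinear form has only rows(X) + cols(X), so the bilinear classifier can be seen as a rank-one restricted version of the trace form.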
![Page 78: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/78.jpg)
Our intent works
![Page 79: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/79.jpg)
Outline
• Works related to matrix patterns
• Zero-Shot Learning
• How to approximate more realistic scenarios for research topics
![Page 80: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/80.jpg)
Works related to matrix patterns
• Metric Learning for matrix patterns
• Inverse covariance learning
• Indefinite kernel (matrix) learning
![Page 81: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/81.jpg)
Zero-Shot Learning
• Description and application scenarios
• Attribute Classification and Feature-based classification
[1] Hugo Larochelle, Dumitru Erhan, Yoshua Bengio, Zero-data Learning of New Tasks, AAAI08.
[2] Mark Palatucci, Dean Pomerleau, Geoffrey Hinton, Tom M. Mitchell, Zero-Shot Learning with Semantic Output Codes, NIPS2010.
[3] Mark M. Palatucci, Thought Recognition: Predicting and Decoding Brain Activity Using the Zero-Shot Learning Model, CMU PhD Thesis, 4-25-2011.
![Page 82: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/82.jpg)
Description and application scenarios
Zero-data learning problem:
• A model must generalize to classes or tasks for which no training data are available and only a description of the classes or tasks is provided.
• The description of each class or task is provided in some representation, the simplest being a vector of numeric or symbolic attributes.
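A minimal sketch of how such attribute descriptions can be used at prediction time: an input is first mapped to an attribute vector, and the class whose description is nearest to it is returned, even if that class had no training examples. The attribute names, class descriptions, and predicted vector below are hypothetical, and the attribute predictor itself is assumed to exist and is not shown.

```python
def nearest_class(predicted_attributes, class_descriptions):
    """Return the class whose attribute vector is closest (squared Euclidean)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(class_descriptions,
               key=lambda name: dist(predicted_attributes, class_descriptions[name]))

# Hypothetical attribute descriptions: (has_stripes, has_hooves, lives_in_water).
class_descriptions = {
    "horse":   (0.0, 1.0, 0.0),
    "zebra":   (1.0, 1.0, 0.0),  # described only by attributes, no training data
    "dolphin": (0.0, 0.0, 1.0),
}
# Assume an attribute predictor trained on other classes produced this output.
predicted = (0.9, 0.8, 0.1)
print(nearest_class(predicted, class_descriptions))
```

The class set can be extended after training simply by adding a new entry to `class_descriptions`.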
![Page 83: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/83.jpg)
Application scenarios (1)
• useful for problems where the set of classes to distinguish or tasks to solve is very large and is not entirely covered by the training data.
• E.g., character, object & face recognition, Multi-task ranking, neural decoding task for fMRI.
![Page 84: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/84.jpg)
Application scenarios (2)
![Page 85: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/85.jpg)
Attribute Classification and Feature-based classification
• A framework
![Page 86: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/86.jpg)
How to approximate more realistic scenarios for research topics
• A Case study
![Page 87: Model Selection for SVM & Our intent works](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814cf9550346895dba0c91/html5/thumbnails/87.jpg)
Thanks!
Q&A