Aug. 27, 2003, IFAC-SYSID2003
Functional Analytic Framework for Model Selection
Masashi Sugiyama
Tokyo Institute of Technology, Tokyo, Japan
Fraunhofer FIRST-IDA, Berlin, Germany
Regression Problem

f: learning target function
f̂: learned function
{(x_i, y_i)}_{i=1}^n: training examples, where y_i = f(x_i) + ε_i (ε_i: noise)

From the training examples {(x_i, y_i)}_{i=1}^n, obtain a good approximation f̂ to f.
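As a running illustration (my own, not from the slides), a minimal Python sketch of this setup; the sinc target and the noise level are placeholder choices (the sinc function does appear in the simulation slide later):

```python
import numpy as np

def target(x):
    # Hypothetical learning target: the (normalized) sinc function.
    return np.sinc(x)

rng = np.random.default_rng(0)
n = 50                                  # number of training examples
x = rng.uniform(-np.pi, np.pi, n)       # sample points x_i
noise = rng.normal(0.0, 0.1, n)         # additive noise eps_i
y = target(x) + noise                   # y_i = f(x_i) + eps_i
```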
Model Selection

[Figure: the target function and learned functions under three models: too simple, appropriate, too complex]

The choice of the model is extremely important for obtaining a good learned function!
("Model" refers to, e.g., the regularization parameter.)
Aims of Our Research

A model is chosen such that a generalization error estimator is minimized. Therefore, model selection research essentially pursues an accurate estimator of the generalization error.

We are interested in:
- Devising a novel method in a different framework.
- Estimating the generalization error with small (finite) samples.
Formulating the Regression Problem as a Function Approximation Problem

H: a functional Hilbert space. We assume f, f̂ ∈ H. We shall measure the "goodness" of the learned function (i.e., the generalization error) by

J = E_ε ‖f̂ − f‖²

‖·‖: norm in H
E_ε: expectation over the noise
Function Spaces for Learning

In learning problems, we sample values of the target function at sample points (e.g., y_i = f(x_i) + ε_i). Therefore, values of the target function at sample points should be specified. This means that the usual L₂-space is not suitable for learning problems:

[Figure: two functions f₁ and f₂ that coincide except at a point x₀]

f₁ and f₂ have different values at x₀, but they are treated as the same function in L₂, since L₂ consists of equivalence classes of functions that differ only on sets of measure zero.
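A worked one-line example of this indistinguishability (my own illustration, not from the slides):

```latex
% f_1(x) = 0 everywhere; f_2(x) = 1 at x = x_0 and 0 elsewhere.
\| f_1 - f_2 \|_{L_2}^2 = \int |f_1(x) - f_2(x)|^2 \, dx = 0,
\quad\text{yet}\quad f_1(x_0) \neq f_2(x_0).
```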
Reproducing Kernel Hilbert Spaces

In a reproducing kernel Hilbert space (RKHS) H, the value of a function at any input point is always specified. Indeed, an RKHS H has the reproducing kernel K(·, ·) with the reproducing property:

⟨f, K(·, x)⟩ = f(x)  for all f ∈ H and all x

⟨·, ·⟩: inner product in H
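For concreteness (the simulation slide later uses a Gaussian RKHS), one standard example of a reproducing kernel, stated here as background rather than taken from the slides:

```latex
% Gaussian kernel with width c > 0; the RKHS it generates is the "Gaussian RKHS".
K(x, x') = \exp\!\left( -\frac{(x - x')^2}{2c^2} \right),
\qquad
f(x) = \langle f, K(\cdot, x) \rangle \quad \text{for all } f \in H.
```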
Sampling Operator

For any RKHS H, there exists a linear operator A from H to ℝⁿ such that

Af = (f(x_1), f(x_2), …, f(x_n))ᵀ

Indeed,

A = Σ_{i=1}^n e_i ⊗ K(·, x_i)

⊗: Neumann–Schatten product; for vectors, (g ⊗ h)f = ⟨f, h⟩ g
e_i: i-th standard basis vector in ℝⁿ
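A one-step check (mine, using the reproducing property from the previous slide) that A indeed returns the sample values:

```latex
(Af)_i = \big\langle f,\; K(\cdot, x_i) \big\rangle = f(x_i)
\quad\Longrightarrow\quad
Af = \big( f(x_1), \ldots, f(x_n) \big)^\top .
```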
Our Framework

RKHS H --[sampling operator A (always linear), + noise ε]--> sample value space ℝⁿ
ℝⁿ --[learning operator X (generally non-linear)]--> RKHS H

Learning target function f ↦ y = Af + ε ↦ learned function f̂ = Xy

Generalization error:

J = E_ε ‖f̂ − f‖²

E_ε: expectation over the noise
Tricks for Estimating the Generalization Error

We want to estimate J = E_ε ‖f̂ − f‖². But it includes the unknown f, so estimating it is not straightforward. To cope with this problem, we shall estimate only its essential part:

J = E_ε ‖f̂‖² − 2 E_ε ⟨f̂, f⟩ (essential part) + ‖f‖² (constant)

We focus on the kernel regression model:

f̂(x) = Σ_{i=1}^n α_i K(x, x_i)

K: reproducing kernel of H
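Written out (with J′ denoting the essential part, the quantity the later slides estimate):

```latex
J = \underbrace{\mathbb{E}_\varepsilon \|\hat f\|^2
    - 2\,\mathbb{E}_\varepsilon \langle \hat f, f \rangle}_{J' \;(\text{essential part})}
  \;+\; \underbrace{\|f\|^2}_{\text{constant w.r.t. the model}}
```

Since ‖f‖² does not depend on the model, minimizing J′ over candidate models is equivalent to minimizing J itself.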
A Key Lemma

For the kernel regression model, the essential generalization error is expressed by

J′ = E_ε [ ⟨Kα, α⟩ − 2⟨α, K K†(y − ε)⟩ ]

where K is the Gram matrix (K_ij = K(x_i, x_j)) and ⟨Kα, α⟩ = ‖f̂‖². The unknown target function f can be erased!

K†: generalized inverse
E_ε: expectation over the noise
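A short derivation (mine, using the reproducing property and y − ε = Af) of why the unknown f drops out:

```latex
\langle \hat f, f \rangle
= \Big\langle \sum_{i=1}^n \alpha_i K(\cdot, x_i),\; f \Big\rangle
= \sum_{i=1}^n \alpha_i f(x_i)
= \langle \alpha, Af \rangle
= \langle \alpha, y - \varepsilon \rangle .
```

So J′ depends on f only through the observed y and the noise ε; moreover y − ε = Af lies in the range of the Gram matrix, whence K K†(y − ε) = y − ε and the lemma's two forms agree.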
Estimating the Essential Part

‖f̂‖² − 2⟨α, y − ε⟩ is an unbiased estimator of the essential generalization error J′. However, the noise vector ε is unknown. Let us define

Ĵ′ = ‖f̂‖² − 2⟨α, y⟩ + 2⟨α, ε⟩

Clearly, it is still unbiased: E_ε [Ĵ′] = J′. We would like to handle the noise-interaction term ⟨α, ε⟩ well.
How to Deal with ⟨α, ε⟩

Depending on the type of the learning operator X, we consider the following three cases.

A) X is linear.
B) X is non-linear but twice almost differentiable.
C) X is general non-linear.
A) Examples of Linear Learning Operator

- Kernel ridge regression
- A particular Gaussian process regression
- Least-squares support vector machine

For kernel ridge regression: α = (K + λI)⁻¹ y

α: parameters to be learned
λ: ridge parameter
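A minimal Python sketch of kernel ridge regression as one such linear learning operator (the Gaussian kernel width c and the interface are illustrative assumptions, not from the slides):

```python
import numpy as np

def gauss_kernel(a, b, c=1.0):
    # Gram matrix of the Gaussian kernel K(x, x') = exp(-(x - x')^2 / (2 c^2)).
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * c ** 2))

def kernel_ridge(x, y, lam, c=1.0):
    # Linear learning operator: alpha = (K + lam I)^{-1} y, i.e. alpha = B y.
    K = gauss_kernel(x, x, c)
    B = np.linalg.inv(K + lam * np.eye(len(x)))
    alpha = B @ y
    return alpha, B, K
```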
A) Linear Learning

When the learning operator X is linear, α = By for some matrix B, and the noise-interaction term satisfies E_ε⟨α, ε⟩ = σ² tr(B). This induces the subspace information criterion (SIC):

SIC = ‖f̂‖² − 2⟨α, y⟩ + 2σ² tr(B)

SIC is unbiased with finite samples:

E_ε [SIC] = J′

M. Sugiyama & H. Ogawa (Neural Computation, 2001)
M. Sugiyama & K.-R. Müller (JMLR, 2002)

σ²: noise variance
X*: adjoint of X
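Putting the pieces together in Python: a hedged sketch of SIC for kernel ridge regression, assuming the noise variance sigma2 is known (the slides do not say how σ² is obtained; in practice it must be estimated):

```python
import numpy as np

def sic_linear(alpha, B, K, y, sigma2):
    # SIC = ||f_hat||_H^2 - 2 <alpha, y> + 2 sigma^2 tr(B),
    # where ||f_hat||_H^2 = alpha^T K alpha for f_hat = sum_i alpha_i K(., x_i).
    return alpha @ K @ alpha - 2.0 * alpha @ y + 2.0 * sigma2 * np.trace(B)
```

Model selection then amounts to computing SIC for each candidate ridge parameter λ and picking the minimizer.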
How to Deal with ⟨α, ε⟩

Depending on the type of the learning operator X, we consider the following three cases.

A) X is linear.
B) X is non-linear but twice almost differentiable.
C) X is general non-linear.
B) Examples of Twice Almost Differentiable Learning Operator

- Support vector regression with Huber's loss

λ: ridge parameter
τ: threshold
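The formula lost from this slide is presumably the Huber-loss SVR objective; as background, Huber's loss with threshold τ has the standard definition below (the exact objective on the slide was not recoverable):

```latex
\rho_\tau(r) =
\begin{cases}
  r^2 / 2, & |r| \le \tau, \\[2pt]
  \tau |r| - \tau^2/2, & |r| > \tau,
\end{cases}
\qquad
\min_{\hat f \in H}\; \sum_{i=1}^n \rho_\tau\big(y_i - \hat f(x_i)\big) + \lambda \|\hat f\|^2 .
```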
B) Twice Differentiable Learning

For Gaussian noise, we have

E_ε ⟨α, ε⟩ = σ² E_ε [ Σ_{i=1}^n ∂α_i/∂y_i ]

SIC for twice almost differentiable learning:

SIC = ‖f̂‖² − 2⟨α, y⟩ + 2σ² Σ_{i=1}^n ∂α_i/∂y_i

It reduces to the original SIC if X is linear. It is still unbiased with finite samples:

E_ε [SIC] = J′

α(y): vector-valued function of y
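A quick check (mine) of the reduction to the linear case: for α(y) = By, each ∂α_i/∂y_i = B_ii, so the divergence term equals tr(B). The Gaussian-noise identity used above is Stein's lemma:

```latex
\mathbb{E}_\varepsilon\big[ \varepsilon_i\, g(y) \big]
  = \sigma^2\, \mathbb{E}_\varepsilon\!\left[ \frac{\partial g}{\partial y_i}(y) \right]
\;\Longrightarrow\;
\mathbb{E}_\varepsilon \langle \alpha(y), \varepsilon \rangle
  = \sigma^2\, \mathbb{E}_\varepsilon\!\left[ \sum_{i=1}^n \frac{\partial \alpha_i}{\partial y_i}(y) \right],
\qquad
\alpha(y) = By \;\Rightarrow\; \sum_{i} \frac{\partial \alpha_i}{\partial y_i} = \operatorname{tr}(B).
```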
How to Deal with ⟨α, ε⟩

Depending on the type of the learning operator X, we consider the following three cases.

A) X is linear.
B) X is non-linear but twice almost differentiable.
C) X is general non-linear.
C) Examples of General Non-Linear Learning Operator

- Kernel sparse regression
- Support vector regression with Vapnik's loss
C) General Non-Linear Learning

The noise-interaction term ⟨α, ε⟩ is approximated by the bootstrap. Bootstrap approximation of SIC (BASIC):

BASIC = ‖f̂‖² − 2⟨α, y⟩ + 2 E* ⟨α*, ε*⟩

BASIC is almost unbiased:

E_ε [BASIC] ≈ J′

E*: expectation over bootstrap replications
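A rough Python sketch of the bootstrap idea (my reading of the slide: resample residuals, relearn, and average the noise-interaction term; the residual-resampling scheme here is an assumption, not necessarily the paper's exact recipe):

```python
import numpy as np

def basic_noise_term(x, y, learn, n_boot=100, seed=0):
    # learn(x, y) -> alpha: the (possibly non-linear) learning operator.
    # Estimates E*<alpha*, eps*> by resampling centered residuals.
    rng = np.random.default_rng(seed)
    alpha = learn(x, y)
    K = gauss_kernel(x, x)              # Gram matrix (from the earlier sketch)
    resid = y - K @ alpha               # residuals as a noise proxy
    resid = resid - resid.mean()        # center the bootstrap noise
    terms = []
    for _ in range(n_boot):
        eps_star = rng.choice(resid, size=len(y), replace=True)
        y_star = K @ alpha + eps_star   # pseudo-targets with resampled noise
        a_star = learn(x, y_star)       # relearn on the bootstrap sample
        terms.append(a_star @ eps_star) # <alpha*, eps*>
    return np.mean(terms)
```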
Simulation: Learning the Sinc Function

- H: Gaussian RKHS
- Learning method: kernel ridge regression
- Model to select: the ridge parameter λ

[Figure: learned functions and error curves for the sinc target]
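An end-to-end toy run combining the earlier sketches: choose the ridge parameter for the sinc target by minimizing SIC. All concrete values (candidate grid, sample size, noise level) are illustrative assumptions, not the slide's settings:

```python
import numpy as np

# Reuses gauss_kernel, kernel_ridge, and sic_linear from the sketches above.
rng = np.random.default_rng(1)
n, sigma2 = 50, 0.1 ** 2
x = rng.uniform(-np.pi, np.pi, n)
y = np.sinc(x) + rng.normal(0.0, np.sqrt(sigma2), n)

candidates = [10.0 ** k for k in range(-6, 1)]   # candidate ridge parameters
scores = []
for lam in candidates:
    alpha, B, K = kernel_ridge(x, y, lam)
    scores.append(sic_linear(alpha, B, K, y, sigma2))
best = candidates[int(np.argmin(scores))]
print(f"SIC-chosen ridge parameter: {best}")
```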
Simulation: DELVE Data Sets

[Table: normalized test errors on DELVE data sets; red indicates the best or comparable method (95% t-test)]
Conclusions

We provided a functional analytic framework for regression, where the generalization error is measured using the RKHS norm:

J = E_ε ‖f̂ − f‖²

Within this framework, we derived a generalization error estimator called the subspace information criterion (SIC).

A) Linear learning (kernel ridge regression, Gaussian process regression, LS-SVM): SIC is exactly unbiased with finite samples.
B) Twice almost differentiable learning (SVR with Huber's loss): SIC is exactly unbiased with finite samples.
C) General non-linear learning (kernel sparse regression, SVR with Vapnik's loss): BASIC is almost unbiased.