On the Relation Between Universality, Characteristic Kernels and RKHS Embedding of Measures
Bharath K. Sriperumbudur⋆, Kenji Fukumizu† and Gert R. G. Lanckriet⋆
⋆UC San Diego †The Institute of Statistical Mathematics
AISTATS 2010
Outline
RKHS embedding of probability measures
Characteristic kernels
Universal kernels
Various notions of universality
Novel characterization of universality
Relation to RKHS embedding of signed measures
RKHS Embeddings of Probability Measures
Input space : X
Feature space : H (with reproducing kernel, k)
Feature map : Φ
$$\Phi : X \to H, \qquad x \mapsto \Phi(x) := k(\cdot, x)$$
Extension to probability measures:
$$P \mapsto \Phi(P) := \int_X k(\cdot, x)\,dP(x) = \mathbb{E}_{Y \sim P}[\Phi(Y)] = \mathbb{E}_{Y \sim P}[k(\cdot, Y)]$$
Advantage: Φ(P) can distinguish P by high-order moments.
$$k(y, x) = c_0 + c_1 (xy) + c_2 (xy)^2 + \cdots \quad (c_i \neq 0), \qquad \text{e.g. } k(y, x) = e^{xy}$$
$$\Phi(P)(y) = c_0 + c_1 \left( \int_X x\,dP(x) \right) y + c_2 \left( \int_X x^2\,dP(x) \right) y^2 + \cdots$$
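The moment expansion can be checked numerically. The following is a minimal sketch, assuming the example kernel k(y, x) = e^{xy} (so c_j = 1/j!) and replacing P by the empirical measure of a small hand-picked sample; the function names and the sample are illustrative and not from the slides.

```python
import math

def empirical_embedding(sample, y):
    """Evaluate Phi(P_n)(y) = (1/n) * sum_i k(y, x_i) for k(y, x) = exp(x * y)."""
    return sum(math.exp(x * y) for x in sample) / len(sample)

def moment_series(sample, y, order=20):
    """Truncated expansion sum_j c_j * m_j * y**j, with c_j = 1/j!
    and m_j the j-th empirical moment of the sample."""
    n = len(sample)
    total = 0.0
    for j in range(order + 1):
        m_j = sum(x ** j for x in sample) / n
        total += m_j * y ** j / math.factorial(j)
    return total

# Illustrative sample standing in for draws from P (an assumption).
sample = [-0.5, 0.1, 0.4, 1.2]
y = 0.7
direct = empirical_embedding(sample, y)
series = moment_series(sample, y)
print(direct, series)
```

The two numbers agree up to the truncation error of the series, which is one way to see that the embedding carries every moment of the measure.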
Applications
Two-sample problem:
Given random samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ drawn i.i.d. from P and Q, respectively.
Determine: are P and Q different?
$\gamma(P, Q) = \|\Phi(P) - \Phi(Q)\|_H$ : distance metric between P and Q.
$H_0 : P = Q \;\equiv\; H_0 : \gamma(P, Q) = 0$
$H_1 : P \neq Q \;\equiv\; H_1 : \gamma(P, Q) > 0$
Test: Say $H_0$ if $\gamma(P, Q) < \varepsilon$. Otherwise say $H_1$.
Other applications:
Hypothesis testing : Independence test, Goodness of fit test, etc.
Feature selection, message passing, density estimation, etc.
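In practice γ(P, Q) has to be estimated from the two samples. A minimal sketch of one common plug-in estimate follows, assuming a Gaussian kernel on the real line and the biased (V-statistic) form obtained by expanding the squared RKHS norm; the kernel choice, bandwidth, sample sizes and function names are assumptions, and the threshold ε is deliberately left out.

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

def mmd_biased(xs, ys, kernel=gaussian_kernel):
    """Plug-in estimate of gamma(P, Q) = ||Phi(P) - Phi(Q)||_H using
    ||a - b||^2 = <a, a> + <b, b> - 2 <a, b> on the empirical embeddings."""
    m, n = len(xs), len(ys)
    kxx = sum(kernel(a, b) for a in xs for b in xs) / (m * m)
    kyy = sum(kernel(a, b) for a in ys for b in ys) / (n * n)
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    return math.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0))

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys_same = [rng.gauss(0.0, 1.0) for _ in range(200)]    # Q = P
ys_shift = [rng.gauss(2.0, 1.0) for _ in range(200)]   # Q is P shifted by 2
gamma_same = mmd_biased(xs, ys_same)
gamma_shift = mmd_biased(xs, ys_shift)
print(gamma_same, gamma_shift)
```

The estimate for the shifted sample should come out clearly larger than the one for the matching sample, which is the gap that the test on the slide exploits.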
Characteristic Kernels
Define: k is characteristic if
$$P \mapsto \int_X k(\cdot, x)\,dP(x)$$
is injective.
In other words,
$$\int_X k(\cdot, x)\,dP(x) = \int_X k(\cdot, x)\,dQ(x) \iff P = Q.$$
When $k(\cdot, x) = e^{\sqrt{-1}\langle \cdot, x \rangle}$, Φ(P) is the characteristic function of P.
Not all kernels are characteristic, e.g., $k(x, y) = x^T y$:
$$\mu_P = \mu_Q \nRightarrow P = Q$$
When is k characteristic? [Gretton et al., 2007, Sriperumbudur et al., 2008, Fukumizu et al., 2008, Fukumizu et al., 2009, Sriperumbudur et al., 2009].
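The linear-kernel counterexample can be made concrete with two discrete measures that share a mean. This is a minimal sketch, assuming P uniform on {-1, +1} and Q uniform on {-2, +2} (both hypothetical choices) and computing the squared embedding distance exactly, since both measures have finitely many atoms.

```python
import math

def mmd_squared(xs, ys, kernel):
    """Squared RKHS distance between the embeddings of the uniform
    measures on the atoms xs and ys."""
    m, n = len(xs), len(ys)
    kxx = sum(kernel(a, b) for a in xs for b in xs) / (m * m)
    kyy = sum(kernel(a, b) for a in ys for b in ys) / (n * n)
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2.0 * kxy

def linear_kernel(x, y):
    return x * y

def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

p_atoms = [-1.0, 1.0]   # uniform on {-1, +1}, mean 0
q_atoms = [-2.0, 2.0]   # uniform on {-2, +2}, mean 0
linear_gap = mmd_squared(p_atoms, q_atoms, linear_kernel)
gaussian_gap = mmd_squared(p_atoms, q_atoms, gaussian_kernel)
print(linear_gap, gaussian_gap)
```

With the linear kernel the two embeddings coincide although P ≠ Q, because that embedding only records the mean; the Gaussian kernel separates the two measures.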
Universal Kernels
Regularization approach to supervised learning
$$\min_{f \in H} \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i), y_i) + \lambda\,\Omega[f], \tag{1}$$
where λ > 0 and $\{(x_i, y_i)\}_{i=1}^{n}$ is the training data.
Representer theorem : The solution to (1) is of the form
$$f = \sum_{i=1}^{n} c_i\,k(\cdot, x_i),$$
where $\{c_i\}_{i=1}^{n} \subset \mathbb{R}$ are the parameters typically obtained from the training data.
Question: Can f approximate any target function arbitrarily “well” as n → ∞?
We need H to be “dense” in the space of target functions: k is universal.
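One concrete instance of (1) is kernel ridge regression. The sketch below assumes the squared loss for ℓ, Ω[f] = ‖f‖²_H and a Gaussian kernel, none of which the slide fixes; under those assumptions the coefficients of the representer form solve (K + nλI)c = y, where K is the Gram matrix. The target sin(2x), the grid and all function names are illustrative.

```python
import math

def gaussian_kernel(x, y, sigma=0.5):
    return math.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

def fit(xs, ys, lam=1e-3):
    """Coefficients c of f = sum_i c_i k(., x_i) for squared loss and
    Omega[f] = ||f||_H^2: solve (K + n * lam * I) c = y."""
    n = len(xs)
    gram = [[gaussian_kernel(a, b) for b in xs] for a in xs]
    for i in range(n):
        gram[i][i] += n * lam
    return solve(gram, ys)

def predict(coeffs, xs, x_new):
    return sum(c * gaussian_kernel(x_new, x) for c, x in zip(coeffs, xs))

xs = [i / 10.0 for i in range(-20, 21)]      # 41 training inputs in [-2, 2]
ys = [math.sin(2.0 * x) for x in xs]         # illustrative target function
coeffs = fit(xs, ys)
err = max(abs(predict(coeffs, xs, x) - math.sin(2.0 * x))
          for x in [-1.05, -0.33, 0.25, 0.95])
print(err)
```

This only illustrates the representer form on one smooth target; whether every target in a given function space can be approximated is exactly what universality of k is meant to settle.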
Various Notions of Universality
Prior work
c-universality [Steinwart, 2001]
cc-universality [Micchelli et al., 2006]
Proposed notion: c0-universality
Characterization of c-, cc- and c0-universality : Relation to RKHS embedding of measures
Translation invariant kernels on $\mathbb{R}^d$
Radial kernels on $\mathbb{R}^d$
c-universality [Steinwart, 2001]
X : compact metric space
k : continuous on X × X
Target function space : C (X ), continuous functions on X
Define k to be c-universal if H is dense in C(X) w.r.t. the uniform norm ($\|f\|_u := \sup_{x \in X} |f(x)|$).
Sufficient conditions are obtained based on the Stone-Weierstraß theorem. Not easy to check!
Examples: Gaussian and Laplacian kernels on any compact subset of $\mathbb{R}^d$.
Issue: X is compact, which excludes many interesting spaces, such as $\mathbb{R}^d$.
cc-universality [Micchelli et al., 2006]
X : Hausdorff space
k : continuous on X × X
Target function space : C (X )
Define k to be cc-universal if H is dense in C(X) endowed with the topology of compact convergence.
In other words, for any compact set Z ⊂ X, $H|_Z := \{ f|_Z : f \in H \}$ is dense in C(Z) w.r.t. $\|\cdot\|_u$.
Necessary and sufficient conditions are obtained, which are related to the injectivity of RKHS embedding of measures.
Examples: Gaussian, Laplacian and Sinc kernels on $\mathbb{R}^d$.
Issue: Topology of compact convergence is weaker than the topology of uniform convergence.
Proposed Notion: c0-universality
X : locally compact Hausdorff (LCH) space
Target function space : $C_0(X)$, the space of bounded continuous functions that “vanish at infinity” (for every ε > 0, $\{x \in X : |f(x)| \geq \epsilon\}$ is compact).
k is bounded and $k(\cdot, x) \in C_0(X)$ for all x ∈ X.
Define k to be c0-universal if H is dense in $C_0(X)$ w.r.t. $\|\cdot\|_u$.
Handles non-compact X and ensures uniform convergence over entire X.
Embedding Characterization of Universality
Theorem
k is c0-universal if and only if
$$\mu \mapsto \int_X k(\cdot, x)\,d\mu(x), \qquad \mu \in M_b(X),$$
is injective. $M_b(X)$ is the space of finite signed Radon measures on X.
k is cc-universal if and only if
$$\mu \mapsto \int_X k(\cdot, x)\,d\mu(x), \qquad \mu \in M_{bc}(X),$$
is injective. $M_{bc}(X) = \{ \mu \in M_b(X) \mid \mathrm{supp}(\mu) \text{ is compact} \}$.
k is c-universal if and only if
$$\mu \mapsto \int_X k(\cdot, x)\,d\mu(x), \qquad \mu \in M_b(X),$$
is injective.
Positive Definite Characterization of Universality
Theorem
k is c0-universal (resp. c-universal) if and only if
$$\int_X \int_X k(x, y)\,d\mu(x)\,d\mu(y) > 0, \qquad \forall \mu \in M_b(X) \setminus \{0\}.$$
k is cc-universal if and only if
$$\int_X \int_X k(x, y)\,d\mu(x)\,d\mu(y) > 0, \qquad \forall \mu \in M_{bc}(X) \setminus \{0\}.$$
If k is c-, cc- or c0-universal, then it is strictly positive definite.
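The link between this positivity condition and injectivity of the embedding can be sketched in one line. The step below assumes that the inner product may be exchanged with the integrals, which is the case, for instance, for bounded measurable k and finite μ; it is a reading aid, not the proof from the paper.

```latex
\[
\Bigl\| \int_X k(\cdot, x)\,d\mu(x) \Bigr\|_H^2
  = \Bigl\langle \int_X k(\cdot, x)\,d\mu(x),\ \int_X k(\cdot, y)\,d\mu(y) \Bigr\rangle_H
  = \int_X \int_X k(x, y)\,d\mu(x)\,d\mu(y).
\]
```

Because the embedding is linear in μ, it is injective exactly when no nonzero μ is mapped to zero, that is, when the double integral is strictly positive for every μ ≠ 0. Restricting μ to finite combinations of point masses, $\mu = \sum_i a_i \delta_{x_i}$ with distinct points $x_i$, turns the condition into $\sum_{i,j} a_i a_j k(x_i, x_j) > 0$, which is strict positive definiteness.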
X is an LCH space: Summary
[Summary diagram not reproduced in this transcript.]
Translation Invariant Kernels on $\mathbb{R}^d$
$X = \mathbb{R}^d$ and $k(x, y) = \psi(x - y)$, where
$$\psi(x) = \int_{\mathbb{R}^d} e^{\sqrt{-1}\,x^T \omega}\,d\Lambda(\omega), \qquad x \in \mathbb{R}^d,$$
and Λ is a non-negative finite Borel measure.
Theorem
k is c0-universal if and only if supp(Λ) = $\mathbb{R}^d$.
k is c0-universal if and only if it is characteristic.
If supp(Λ) has a non-empty interior, then k is cc-universal [Micchelli et al., 2006].
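Why the support of Λ matters can be seen by inserting the Fourier representation of ψ into the double integral of the positive definite characterization. This is a sketch for a finite signed (real) measure μ, with the symbol $\hat{\mu}$ introduced here for its Fourier transform and the order of integration exchanged by Fubini's theorem.

```latex
\[
\int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \psi(x - y)\,d\mu(x)\,d\mu(y)
  = \int_{\mathbb{R}^d} \bigl| \hat{\mu}(\omega) \bigr|^2 \,d\Lambda(\omega),
\qquad
\hat{\mu}(\omega) := \int_{\mathbb{R}^d} e^{\sqrt{-1}\,x^T \omega}\,d\mu(x).
\]
```

For μ ≠ 0 the function $\hat{\mu}$ is continuous and not identically zero, so $|\hat{\mu}|^2$ is positive on some open set. If supp(Λ) = $\mathbb{R}^d$, every open set has positive Λ-measure and the right-hand side is strictly positive, which gives the sufficiency direction of the first statement.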
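Examples
Gaussian kernel: $\psi(x) = e^{-x^2/(2\sigma^2)}$; $\Psi(\omega) = \sigma e^{-\sigma^2 \omega^2/2}$; $d\Lambda(\omega) = \Psi(\omega)\,d\omega$.
[Plots: ψ(x) for x ∈ [−4, 4] with peak value 1, and Ψ(ω) for ω ∈ [−4, 4] with peak value σ.]
Laplacian kernel: $\psi(x) = e^{-\sigma |x|}$; $\Psi(\omega) = \sqrt{\frac{2}{\pi}}\,\frac{\sigma}{\sigma^2 + \omega^2}$.
[Plots: ψ(x) for x ∈ [−4, 4] with peak value 1, and Ψ(ω) for ω ∈ [−4, 4] with peak value $(2/(\pi\sigma^2))^{1/2}$.]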
Examples
B1-spline kernel: $\psi(x) = (1 - |x|)\,\mathbf{1}_{[-1,1]}(x)$; $\Psi(\omega) = \frac{2\sqrt{2}}{\sqrt{\pi}}\,\frac{\sin^2(\omega/2)}{\omega^2}$.
[Plots: ψ(x) for x ∈ [−3, 3] with peak value 1, and $(2\pi)^{1/2}\,\Psi(\omega)$ for ω ∈ [−8π, 8π].]
Sinc kernel: $\psi(x) = \frac{\sin(\sigma x)}{x}$; $\Psi(\omega) = \sqrt{\frac{\pi}{2}}\,\mathbf{1}_{[-\sigma,\sigma]}(\omega)$.
[Plots: ψ(x) for x ∈ [−6π, 6π], and Ψ(ω) for ω ∈ [−3, 3] with level $(\pi/2)^{1/2}$.]
Translation Invariant Kernels on $\mathbb{R}^d$ : Summary
[Summary diagram not reproduced in this transcript.]
Radial Kernels on $\mathbb{R}^d$
Let
$$k(x, y) = \int_{[0,\infty)} e^{-t \|x - y\|_2^2}\,d\nu(t),$$
where ν is a finite non-negative Borel measure on [0, ∞).
Examples: Gaussian kernel, Inverse multi-quadratic kernel, $k(x, y) = (c^2 + \|x - y\|_2^2)^{-\beta}$, $\beta > d/2$, $c > 0$, etc.
Theorem
The following conditions are equivalent.
supp(ν) ≠ {0}.
k is c0-universal.
k is cc-universal.
k is characteristic.
k is strictly pd.
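The inverse multi-quadratic example can be checked against the mixture representation numerically. The sketch below assumes the mixing measure has density $t^{\beta-1} e^{-c^2 t} / \Gamma(\beta)$ on [0, ∞), which follows from the Gamma-function integral and is not stated on the slide; the parameter values c = 1, β = 2 (admissible for d = 1), the midpoint rule and the function names are illustrative choices.

```python
import math

def inverse_multiquadric(r2, c=1.0, beta=2.0):
    """k as a function of r2 = ||x - y||_2^2."""
    return (c * c + r2) ** (-beta)

def mixture_of_gaussians(r2, c=1.0, beta=2.0, upper=50.0, steps=50000):
    """Midpoint-rule approximation of the integral over [0, upper] of
    exp(-t * r2) dnu(t), with the assumed density
    dnu(t) = t**(beta - 1) * exp(-c**2 * t) / Gamma(beta) dt."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        density = t ** (beta - 1.0) * math.exp(-c * c * t) / math.gamma(beta)
        total += math.exp(-t * r2) * density * h
    return total

# Compare the closed form and the mixture at a few squared distances.
gaps = [abs(inverse_multiquadric(r2) - mixture_of_gaussians(r2))
        for r2 in (0.0, 0.5, 2.0)]
print(gaps)
```

The assumed ν is supported on all of [0, ∞), so supp(ν) ≠ {0} and the equivalences of the theorem apply to this kernel.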
Radial Kernels on $\mathbb{R}^d$ : Summary
[Summary diagram not reproduced in this transcript.]
Summary
Characteristic kernel
Injective RKHS embedding of probability measures.
Applications: Hypothesis testing, feature selection, etc.
Universal kernel
Consistency of learning algorithms.
Injective RKHS embedding of finite signed Radon measures.
Clarified the relation between various notions of universality and characteristic kernels.
References
Fukumizu, K., Gretton, A., Sun, X., and Schölkopf, B. (2008). Kernel measures of conditional dependence. In Platt, J., Koller, D., Singer, Y., and Roweis, S., editors, Advances in Neural Information Processing Systems 20, pages 489–496, Cambridge, MA. MIT Press.
Fukumizu, K., Sriperumbudur, B. K., Gretton, A., and Schölkopf, B. (2009). Characteristic kernels on groups and semigroups. In Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 21, pages 473–480.
Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., and Smola, A. (2007). A kernel method for the two sample problem. In Schölkopf, B., Platt, J., and Hoffman, T., editors, Advances in Neural Information Processing Systems 19, pages 513–520. MIT Press.
Micchelli, C. A., Xu, Y., and Zhang, H. (2006). Universal kernels. Journal of Machine Learning Research, 7:2651–2667.
Sriperumbudur, B. K., Fukumizu, K., Gretton, A., Lanckriet, G. R. G., and Schölkopf, B. (2009). Kernel choice and classifiability for RKHS embeddings of probability distributions. In Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C. K. I., and Culotta, A., editors, Advances in Neural Information Processing Systems 22, pages 1750–1758. MIT Press.
Sriperumbudur, B. K., Gretton, A., Fukumizu, K., Lanckriet, G. R. G., and Schölkopf, B. (2008). Injective Hilbert space embeddings of probability measures. In Servedio, R. and Zhang, T., editors, Proc. of the 21st Annual Conference on Learning Theory, pages 111–122.
Steinwart, I. (2001). On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2:67–93.