nuno vasconcelos ece depp,artment, ucsdpositive definite kernels it is not hard to show that all dot...
TRANSCRIPT
![Page 1: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/1.jpg)
Dot-product kernels
Nuno Vasconcelos ECE Department, UCSDp ,
![Page 2: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/2.jpg)
Classificationa classification problem has two types of variables
• X - vector of observations (features) in the worldX - vector of observations (features) in the world• Y - state (class) of the world
Perceptron: classifier implements the linear decision rulep p
with [ ]g(x)sgn )( =xh bxwxg T +=)( w
appropriate when the classes arelinearly separable
b
x
xg )(
to deal with non-linear separabilitywe introduce a kernel
wb w
2
![Page 3: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/3.jpg)
Kernel summary1. D not linearly separable in X, apply feature transformation Φ:X → Z,
such that dim(Z) >> dim(X)
2. computing Φ(x) too expensive:• write your learning algorithm in dot-product form• instead of Φ(xi) we only need Φ(xi)TΦ(xj) ∀ijinstead of Φ(xi), we only need Φ(xi) Φ(xj) ∀ij
3. instead of computing Φ(xi)TΦ(xj) ∀ij, define the “dot-product kernel”
)()(),( zxzxK T ΦΦ=
and compute K(xi,xj) ∀ij directly• note: the matrix
)()()(
⎥⎤
⎢⎡ M
is called the “kernel” or Gram matrix
⎥⎥⎥
⎦⎢⎢⎢
⎣
=M
LL ),( ji xxKK
3
4. forget about Φ(x) and use K(x,z) from the start!
![Page 4: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/4.jpg)
Polynomial kernelsthis makes a significant difference when K(x,z) is easier to compute that Φ(x)TΦ(z)p ( ) ( )e.g., we have seen that
( ) TT zxzxzxK )()(),( 2ΦΦ== ( )
x
)()(),(
1 ⎞⎜⎛
ℜ→ℜΦ : with2dd
( )Tddddd
d
xxxxxxxxxxxxx
,,,,,,,, 2112111 LLLM →⎟⎟
⎠⎜⎜⎜
⎝
while K(x,z) has complexity O(d), Φ(x)TΦ(z) is O(d2)for K(x,z) = (xTz)k we go from O(d) to O(dk)
4
( , ) ( ) g ( ) ( )
![Page 5: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/5.jpg)
Questionwhat is a good dot-product kernel?• intuitively a good kernel is one that maximizes the margin γ in• intuitively, a good kernel is one that maximizes the margin γ in
range space• however, nobody knows how to do this effectively
in practice:• pick a kernel from a library of known kernels• we talked about• we talked about
• the linear kernel K(x,z) = xTz• the Gaussian family 2zx −
• the polynomial family
σ),(zx
ezxK−
=
( )k
5
( ) { }L,2,1,1),( ∈+= kzxzxK kT
![Page 6: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/6.jpg)
Question“this problem of mine is really asking for the kernel k’(x,z) = ...”( , )• how do I know if this is a dot-product kernel?
let’s start by the definitionDefinition: a mapping
k: X x X → ℜxxx
xx
xxx
xxxx
ooo
ooooo
oooo
x2 X
(x,y) → k(x,y)is a dot-product kernel if and only if
ooo x1
xxx
xx
xxx
xxxx
ooo
oo oo
ox1
Φ
H
k(x,y) = <Φ(x),Φ(y)>where Φ: X → H, H is a vector space and, <.,.> a dot-product in H
o o oo oox1
x3 x2xn
6
product in H
![Page 7: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/7.jpg)
Vector spacesnote that both H and <.,.> can be abstract, not necessarily ℜdyDefinition: a vector space is a set H where • addition and scalar multiplication are defined and satisfy:
1) x+(x’+x’’) = (x+x’)+x” 5) λx ∈ H2) x+x’ = x’+x ∈ H 6) 1x = x3) 0 ∈ H, 0 + x = x 7) λ(λ’ x) = (λλ’)x) , ) ( ) ( )4) –x ∈ H, -x + x = 0 8) λ(x+x’) = λx + λx’
9) (λ+λ’)x = λx + λ’x
the canonical example is ℜd with standard vector additionthe canonical example is ℜ with standard vector addition and scalar multiplicationanother example is the space of mappings X → ℜ with
7
(f+g)(x) = f(x) + g(x) (λf)(x) = λf(x)
![Page 8: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/8.jpg)
Bilinear formsto define dot-product we first need to recall the notion of a bilinear formDefinition: a bilinear form on a vector space H is a mapping
Q: H x H ℜQ: H x H → ℜ(x,x’) → Q(x,x’)
such that ∀ x,x’,x’’∈ Hsuch that ∀ x,x ,x ∈ H
i) Q[(λx+λx’),x”] = λQ(x,x”) + λ’Q(x’,x”)ii) Q[x”,(λx+λx’)] = λQ(x”,x) + λ’Q(x”,x’)) [ ( )] ( ) ( )
in ℜ d the canonical bilinear form is Q(x,x’) = xTAx’
8
if Q(x,x’) = Q(x’,x) ∀ x,x’∈ H, the form is symmetric
![Page 9: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/9.jpg)
Dot productsDefinition: a dot-product on a vector space H is a symmetric bilinear formy
<.,.>: H x H → ℜ(x,x’) → <x,x’>
h th tsuch that
i) <x,x> ≥ 0, ∀ x∈ Hii) <x x> = 0 if and only if x = 0ii) <x,x> = 0 if and only if x = 0
note that for the canonical bilinear form in ℜ dnote that for the canonical bilinear form in ℜ<x,x> = xTAx
this means that A must be positive definite
9
this means that A must be positive definitexTAx > 0, ∀ x ≠ 0
![Page 10: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/10.jpg)
Positive definite matricesrecall that (e.g. Linear Algebra and Applications, Strang)
Definition: each of the following is a necessary andDefinition: each of the following is a necessary and sufficient condition for a real symmetric matrix A to be (semi) positive definite:
i) TA ≥ 0 0i) xTAx ≥ 0, ∀ x ≠ 0 ii) all eigenvalues of A satisfy λi ≥ 0iii) all upper-left submatrices Ak have non-negative determinanti ) there is a matri R ith independent ro s s ch thativ) there is a matrix R with independent rows such that
A = RTR
l ft b t iupper left submatrices:
L 322212
3,12,11,1
32,11,1
2111 ⎥⎥⎤
⎢⎢⎡
=⎥⎤
⎢⎡
== aaaaaa
Aaa
AaA
10
3,32,31,3
3,22,21,232,21,2
21,11
⎥⎥⎦⎢
⎢⎣
⎥⎦
⎢⎣ aaa
aa
![Page 11: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/11.jpg)
Positive definite matricesproperty iv) is particularly interesting• in ℜ d <x x> = xTAx is a dot-product kernel if and only if A is• in ℜ , <x,x> = x Ax is a dot-product kernel if and only if A is
positive definite• from iv) this holds if and only if there is R such that A = RTR• hence
<x,y> = xTAy = (Rx)T(Ry) = Φ(x)TΦ(y)with
Φ: ℜ d → ℜ d
x → Rx
i.e. the dot-product kernelp
k(x,z) = xTAz, (A positive definite)is the standard dot-product in the range space of the
11
is the standard dot product in the range space of the mapping Φ(x) = Rx
![Page 12: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/12.jpg)
Notethere are positive semidefinite matrices
xTAx ≥ 0x Ax ≥ 0and positive definite matrices
xTAx > 0
we will work with semidefinite but, to simplify, will call definiteif we really need > 0 we will say “strictly positive definite”
12
![Page 13: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/13.jpg)
Positive definite kernelshow do we define a positive definite function?Definition: a function k(x y) is a positive definite kernel onDefinition: a function k(x,y) is a positive definite kernel on X xX if ∀ l and ∀ {x1, ..., xl}, xi∈ X, the Gram matrix
⎤⎡ M
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
M
LL
M
),( ji xxkK
is positive definite.Note: this implies that
⎥⎦⎢⎣ M
Note: this implies that• k(x,x) ≥ 0 ∀ x∈ X
• PD ∀ x,y∈ X⎤⎡ ),(),( yxkxxk (*)
13
yetc...⎥
⎦
⎤⎢⎣
⎡),(),(),(),(
yykxyky ( )
![Page 14: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/14.jpg)
Positive definite kernelsthis proves some simple properties• a PD kernel is symmetrica PD kernel is symmetric
Xyxxykyxk ∈∀= ,),,(),( Proof:
since PD means symmetric (*) implies k(x,y) = k(y,x) ∀ x,y∈ X
• Cauchy-Schwarz inequality for kernels: if k(x,y) is a PD kernel, y q y ( y)then
Xyxyykxxkyxk ∈∀≤ ,),,(),(),( 2
Proof:from (*), and property iii) of PD matrices, the determinant of the 2x2 matrix of (*) is non-negative This means that
14
2x2 matrix of ( ) is non negative. This means that
k(x,x)k(y,y) – k(x,y)2 ≥ 0
![Page 15: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/15.jpg)
Positive definite kernelsit is not hard to show that all dot product kernels are PDLemma 1: Let k(x y) be a dot product kernel Then k(x y)Lemma 1: Let k(x,y) be a dot-product kernel. Then k(x,y)is positive definiteproof: p• k(x,y) dot product kernel implies that• ∃Φ and some dot product <.,.> such that
k( ) <Φ( ) Φ( )>k(x,y) = <Φ(x),Φ(y)> • this implies that if:
• we pick any l, and any sequence {x1, ..., xl}, ⎤⎡ My y { 1 l}• and let K be the associated Gram matrix • then, for ∀ c ≠ 0 ⎥
⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡=
M
LL
M
),( ji xxkK
15
![Page 16: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/16.jpg)
Positive definite kernels
( ),= ∑ij
jijiT xxkccKcc
( ) ( ),ΦΦ= ∑ij
jiji
j
xxcc (k is dot product)
( ) ( ), ΦΦ= ∑∑j
jji
ii xcxc (<.,.> is a bilinear form)
( ) 02
≥Φ= ∑ ii
j
xc (from def of dot product)i product)
16
![Page 17: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/17.jpg)
Positive definite kernelsthe converse is also true but more difficult to proveLemma 2: Let k(x y) x y∈ X be a positive definiteLemma 2: Let k(x,y), x,y∈ X, be a positive definite kernel. Then k(x,y) is a dot product kernelproof:p• we need to show that there is a transformation Φ, a vector space
H = Φ(X), and a dot product <.,.>* in H such that k(x y) <Φ(x) Φ(y)> x2k(x,y) = <Φ(x),Φ(y)>*
• we proceed in three steps1. construct a vector space H
x xx
x
xxx
x
xxx
x
ooo
o
oooo
oooo
x1
x2
Φ
X
p H
2. define the dot-product <.,.>* on H
3. show that k(x,y) = <Φ(x),Φ(y)>* holdsx xx
x
xxx
x
xxx
x
ooo
ooo
oox1
Φ
H
17
ooo
oo ox1
x3 x2xn
![Page 18: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/18.jpg)
The vector space Hwe define H as the space spanned by linear combinations of k(.,xi)( , i)
H = ( )⎭⎬⎫
⎩⎨⎧
∈∀∀= ∑=
m
iiii Xxmxkff
1,,.,(.)|(.) α
notation: by k(.,xi) we mean a function of g(y) = k(y,xi) of y, xi is fixed.homework: check that H is a vector space• e.g. 2) ( )
⎪⎪⎫
= ∑ .,(.) xkfm
iiα ( )
( )∈+=+
⎪⎪⎭
⎪⎪⎬
= ∑
∑= (.)(.)'(.)'(.)
'.,(.)'
.,(.)
'
1
1 ffffxkf
xkf
m
ijj
iii
β
αH
18
⎪⎭=1i
![Page 19: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/19.jpg)
Examplewhen we use the Gaussian kernel
2. ix−
k( ) i G i t d ith i I
2)(., σi
i exK−
=
k(.,xi) is a Gaussian centered on xi with covariance σIand
⎪⎫⎪⎧−m x i.
H =
is the space of all linear⎪⎭
⎪⎬⎫
⎪⎩
⎪⎨⎧
∀∀= ∑=
−m
iii xmeff
1
,,(.)|(.) 2 σα
is the space of all linear combinations of Gaussiansnote that these are not mixtures
e.g.
19
but close
![Page 20: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/20.jpg)
The operator <.,.>*
if f(.) and g(.) ∈ H, with'mm
( ) ( )∑∑==
==11
'.,(.).,(.)m
jjj
m
iii xkgxkf βα (**)
we define the operator <.,.>* as
'
( )∑∑= =
=m
iji
m
jji xxkgf
1
'
1*
',, βα (***)
20
![Page 21: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/21.jpg)
Examplewhen we use the Gaussian kernel
2. ix−
th t < > i i ht d f G i t
2)(., σi
i exK−
=
the operator <.,.>* is a weighted sum of Gaussian terms
∑∑−
−m xxm ji
f''
2
2
β
you can look at this as either:
∑∑= =
=i j
ji egf1 1
*
2, σβα
you can look at this as either:• a dot product in H (still need to prove this)• a non-linear measure of similarity in X, somewhat related to
21
ylikelihoods
![Page 22: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/22.jpg)
The operator <.,.>*
important note: for f(.) and g(.) ∈ H, the operator <.,.>*
m m '
( )∑∑= =
=m
iji
m
jji xxkgf
1
'
1*
',, βα
has the property
( )xxkxkxk ')'()( = (****)
proof: just make
( )jiji xxkxkxk ,)(.,),(.,*
= ( )
p j
⎩⎨⎧
≠∀==≠∀==
jkik
kj
ki 0,10,1
ββαα
22
⎩ ≠∀ jkkj 0,1 ββ
![Page 23: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/23.jpg)
The operator <.,.>*
assume that <.,.>* is a dot product in H (proof in moments))since
( )jiji xxkxkxk ,)(.,),(.,*
=
then, clearly
( )*
)(),(, jiji xxxxk ΦΦ=
with
( )*
)(),(, jiji
Φ: X → Hx → k( x)
i.e. the kernel is a dot-product on H, which results from the feature transformation Φ
x → k(.,x)
23
this proves Lemma 2
![Page 24: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/24.jpg)
Examplewhen we use the Gaussian kernel• the point xi ∈ ℜd is mapped into the Gaussian G(x xi σI)
σ
2
),(ixx
i exxK−
−=
• the point xi ∈ ℜ is mapped into the Gaussian G(x,xi,σI)• H is the space of all functions that are linear combinations of
Gaussians• this has infinite dimension• the kernel is a dot product in H, and a non-linear
similarity on X
xx
x
x
xx
x
xx2
ℜd H*
x
xx
x
oo
o
o
o
o oo
o oo
ox1
xx
x
x
x
xx
x
xx
x
x
o
o
ooo
o Φ
*
24
o o
x3x2
xn
x oo
o
ooo
x1*
![Page 25: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/25.jpg)
In summaryto show that k(x,y), x,y∈ X, positive definite ⇒ k(x,y) is a dot product kernelpwe need to show that
( )m m '
( )∑∑= =
=i
jij
ji xxkgf1 1
*',, βα
is a dot product on
H = ( ) ⎬⎫
⎨⎧
∈∀∀= ∑m
Xxmxkff ( )|( ) αH =
this reduces to verifying the dot product conditions
( )⎭⎬
⎩⎨ ∈∀∀= ∑
=iiii Xxmxkff
1,,.,(.)|(.) α
25
y g p
![Page 26: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/26.jpg)
The operator <.,.>*
1) is <.,.>* a bilinear form on H ?
by definition of f( ) and g( ) in (**)by definition of f(.) and g(.) in ( )
( ) ( )'
*'.,,.,, j
m
j
m
ii xkxkgf ∑∑= βα
on the other hand,*11 ji
∑∑==
m m '
( )
( )
∑∑= =
=
m m
m
iji
m
jji xxkgf
'
1 1*
',, βα from (***)
equality of the two left hand sides is the definition of
( ) ( )∑∑= =
=m
iji
m
jji xkxk
1*
1'.,,.,βα from (****)
26
q ybilinearity
![Page 27: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/27.jpg)
The operator <.,.>*
2) is <.,.>* symmetric?note thatnote that
( )*
1
'
1*
,,', gfxxkfgm
iij
m
jji == ∑∑
= =
βα
if and only if k(xi,xj’) = k(xj’,xi) for all xi, xj’.b t thi f ll f th iti d fi it f k( )
1 1i j
but this follows from the positive definiteness of k(x,y)we have seen that a PD kernel is always symmetrich i t ihence, <.,.>* is symmetric
27
![Page 28: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/28.jpg)
The operator <.,.>*
3) is <f,f>* ≥ 0, ∀ f ∈ H ?
by definition of f( ) in (**)by definition of f(.) in ( )
( ) αααα Kxxkff Tm
ji
m
ji == ∑∑*,,
where α ∈ ℜm and K is the Gram matrixsince k(x y) is positive definite K is positive definite by
i j= =1 1
since k(x,y) is positive definite, K is positive definite by definition and <f,f>* ≥ 0the only non-trivial part of the proof is to show that
(x)
<f,f>* = 0 ⇒ f = 0we need two more results
28
![Page 29: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/29.jpg)
The operator <.,.>*
Lemma 3: <.,.>* is itself a positive definite kernel on H x H
proof:proof:• consider any sequence {f1, ..., fm}, fi ∈ H
• then
=∑ ∑∑*
*,, ffff
ij jjj
iiijiji γγγγ (by bilinearity of <.,.>*)
0
≥
=*21
*
,ggj j
(for some g1,g2 ∈ H )
(by (x) )
• hence the Gram matrix is always PD and the kernel <.,.>* is PD
( y ( ) )
29
![Page 30: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/30.jpg)
The operator <.,.>*
Lemma 4: ∀ f ∈ H, <k(.,x),f(.)>* = f(x)
proof:proof:
)(.,),(.,(.)),(.,*
xkxkfxk ii= ∑α (by (**) )
)(.,),(.,*
*
xkxki
ii
i
= ∑α (by bilinearity of <.,.>*)
)(),( xfxxki
ii == ∑α (by (****) )
30
![Page 31: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/31.jpg)
The operator <.,.>*
4) we are now ready to prove that <f,f>* = 0 ⇒ f = 0proof:proof:• since <.,.>* is a PD kernel (lemma 3) we can apply Cauchy-
Schwarz Xyxyykxxkyxk ∈∀≤ ,),,(),(),( 2 • using k(.,x) as x and f(.) as y this becomes
( )2
***),(.,,)(.,),(., fxkffxkxk ≥
• and using lemma 4
)(,),( 2*
xfffxxk ≥
• from which <f,f>* = 0 ⇒ f = 0
31
![Page 32: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/32.jpg)
In summarywe have shown that
( )∑∑= =
=m
iji
m
jji xxkgf
1
'
1*
',, βα
is a dot product on
( ) ⎬⎫
⎨⎧
∀∀∑m
Xxmxkff ( )|( )H =
and this shows that if k(x y) x y∈ X is a positive definite
( )⎭⎬
⎩⎨ ∈∀∀= ∑
=iiii Xxmxkff
1,,.,(.)|(.) α
and this shows that if k(x,y), x,y∈ X, is a positive definite kernel, then k(x,y) is a dot product kernel.since we had initially proven the converse, we have the
32
y p ,following theorem.
![Page 33: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/33.jpg)
Dot product kernelsTheorem: k(x,y), x,y∈ X, is a dot-product kernel if and only if it is a positive definite kernely p
this is interesting because it allows us to check whether a k l i d t d t t!kernel is a dot product or not!• check if the Gram matrix is positive definite for all possible
sequences {x1, ..., xl}, xi∈ X
but the proof is much more interesting than this result aloneit actually gives us insight on what the kernel is doinglet’s summarize
33
![Page 34: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/34.jpg)
Dot product kernelsa dot product kernel k(x,y), x,y∈ X:• applies a feature transformationapplies a feature transformation
Φ: X → Hx → k(.,x)
• to the vector space
H =
x → k(.,x)
( )⎭⎬⎫
⎩⎨⎧
∈∀∀= ∑m
iii Xxmxkff ,,.,(.)|(.) α
• where the kernel implements the dot product
( )⎭⎬
⎩⎨ ∑
=iiii
1,,,( )|( )
m m '
( )∑∑= =
=m
iji
m
jji xxkgf
1 1*
',, βα
34
![Page 35: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/35.jpg)
Dot product kernelsthe dot product
m m '
h th d i t
( )∑∑= =
=m
iji
m
jji xxkgf
1
'
1*
',, βα
has the reproducing property
)((.)),(.,*
xffxk =you can think of this as analog to the convolution with a Dirac deltawe will talk about this a lot in the coming lecturesfinally, <.,.>* is itself a positive definite kernel on H x H
35
![Page 36: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/36.jpg)
A good picture to rememberwhen we use the Gaussian kernel• the point xi ∈ ℜd is mapped into the Gaussian G(x xi σI)
σ
2
),(ixx
i exxK−
−=
• the point xi ∈ ℜ is mapped into the Gaussian G(x,xi,σI)• H is the space of all functions that are linear combinations of
Gaussiansth k l i d t d t i• the kernel is a dot product in H
• the dot product with one of the Gaussianshas the reproducing property
xx
x
x
xx
x
xx2
ℜd H*
x
xx
x
oo
o
o
o
o oo
o oo
ox1
xx
x
x
x
xx
x
xx
x
x
o
o
ooo
o Φ
*
36
o o
x3x2
xn
x oo
o
ooo
x1*
![Page 37: Nuno Vasconcelos ECE Depp,artment, UCSDPositive definite kernels it is not hard to show that all dot product kernels are PD Lemma 1:Lemma 1: Let k(x y)k(x,y) be a dotbe a dot-product](https://reader035.vdocuments.us/reader035/viewer/2022070706/5e9d3d33a2f94f4e105d24d0/html5/thumbnails/37.jpg)
37