Learning with Kernels on Graphs, Groups and Manifolds (~risi/papers/montreal.pdf, 2009-11-07)
TRANSCRIPT
Learning with Kernels on Graphs, Groups and Manifolds

Risi Kondor, Columbia University, New York, USA.
Collaborators
John Lafferty, Guy Lebanon, Mikhail Belkin, Alex Smola, Tony Jebara, Count Laplace (1749-1827)
Batch Learning
Input space: X, e.g. X = R^d
Output space: Y; Y = R (regression), Y = {-1, +1} (classification)

Learn f : X → Y from examples (x_1, y_1), (x_2, y_2), ..., (x_m, y_m)
A Naive Approach

Look for f minimizing

$$ R_{\mathrm{reg}}[f] = \frac{1}{m}\sum_{i=1}^m \underbrace{(f(x_i) - y_i)^2}_{\text{Loss function}} + \underbrace{\Omega[f]}_{\text{Complexity penalty}} $$

Harmonic example:

$$ \Omega[f] = \int e^{\|\omega\|^2/2}\, \|\hat f(\omega)\|^2 \, d\omega $$

$$ f(x) = \frac{1}{\sqrt{2\pi}^{\,d}} \int \hat f(\omega)\, e^{i\omega\cdot x}\, d\omega \qquad \hat f(\omega) = \frac{1}{\sqrt{2\pi}^{\,d}} \int f(x)\, e^{-i\omega\cdot x}\, dx $$
A space of functions

The functions f naturally form a linear space H.

Impose an inner product such that ⟨f, f⟩ = Ω[f], e.g.

$$ \langle f, f' \rangle = \int e^{\|\omega\|^2/2}\, \hat f(\omega)\, \hat f'(\omega) \, d\omega $$

If H is complete, it is said to be a Hilbert space.
Looking for f ∈ H minimizing

$$ R_{\mathrm{reg}}[f] = \frac{1}{m}\sum_{i=1}^m (f(x_i) - y_i)^2 + \langle f, f \rangle $$
Looking for f ∈ H minimizing

$$ R_{\mathrm{reg}}[f] = \frac{1}{m}\sum_{i=1}^m (f(x_i) - y_i)^2 + \langle f, f \rangle $$

Wouldn't it be neat if ⟨f, k_x⟩ = f(x)? Then

$$ R_{\mathrm{reg}}[f] = \frac{1}{m}\sum_{i=1}^m (\langle f, k_{x_i} \rangle - y_i)^2 + \langle f, f \rangle $$

f only interacts with the k_{x_i}'s ⇒ f ∈ span(k_{x_1}, k_{x_2}, ..., k_{x_m})
Now plug in f(x) = Σ_i α_i k_{x_i}(x):

$$ R_{\mathrm{reg}}[f] = \frac{1}{m}\sum_{i=1}^m (\langle f, k_{x_i} \rangle - y_i)^2 + \langle f, f \rangle $$

↓

$$ R_{\mathrm{reg}}[f] = \frac{1}{m}\sum_{i=1}^m \Bigl( \sum_{j=1}^m \alpha_j \langle k_{x_j}, k_{x_i} \rangle - y_i \Bigr)^2 + \sum_{i=1}^m \sum_{j=1}^m \alpha_i \alpha_j \langle k_{x_i}, k_{x_j} \rangle $$

Letting K_ij = ⟨k_{x_i}, k_{x_j}⟩:

$$ R_{\mathrm{reg}} = (K\alpha - y)^\top (K\alpha - y) + \alpha^\top K \alpha $$
To find f(x) = Σ_i α_i k_{x_i}(x) minimizing

$$ R_{\mathrm{reg}} = (K\alpha - y)^\top (K\alpha - y) + \alpha^\top K \alpha $$

set ∂R_reg/∂α = 0:

$$ 2K(K\alpha - y) + 2K\alpha = 0 \qquad\Rightarrow\qquad \alpha = (K + I)^{-1} y $$
So what are k_x and K?

Recall ⟨f, k_x⟩ = f(x). It is easy to show that for our example

$$ k_x(x') = \frac{1}{\sqrt{2\pi}^{\,d}}\, e^{-\| x - x' \|^2/2} $$

and

$$ K_{ij} = \langle k_{x_i}, k_{x_j} \rangle = k_{x_i}(x_j) = \frac{1}{\sqrt{2\pi}^{\,d}}\, e^{-\| x_i - x_j \|^2/2} $$
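The derivation of the last few slides (representer expansion, α = (K + I)⁻¹y, Gaussian k_x) can be sketched numerically. A minimal kernel ridge regression, dropping the (2π)^{-d/2} normalization since it only rescales α; function names here are illustrative, not from the slides:

```python
import numpy as np

def gaussian_kernel(X, Xp, sigma=1.0):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)); the (2 pi)^(-d/2)
    # factor from the slides is dropped, which only rescales alpha.
    d2 = ((X[:, None, :] - Xp[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def fit(X, y):
    # Minimizer of (K a - y)^T (K a - y) + a^T K a  is  a = (K + I)^{-1} y.
    K = gaussian_kernel(X, X)
    return np.linalg.solve(K + np.eye(len(X)), y)

def predict(X_train, alpha, X_new):
    # f(x) = sum_i alpha_i k_{x_i}(x)
    return gaussian_kernel(X_new, X_train) @ alpha
```

On the training inputs themselves, predict(X, fit(X, y), X) returns K(K + I)⁻¹y, a smoothed version of y.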
K is the kernel!!!
Kernel Methods

General regularization network form:

$$ f = \operatorname*{argmin}_{f \in H} \Bigl[ \underbrace{\frac{1}{m}\sum_{i=1}^m L(f(x_i), y_i)}_{\text{Empirical risk}} + \underbrace{\langle f, f \rangle}_{\text{Regularizer}} \Bigr] $$

SVM classification: L = max(0, 1 − y_i f(x_i))
SVM regression: L = | f(x_i) − y_i |_ε
Gaussian process MAP: L = (1/σ_0²) (f(x_i) − y_i)²
Conventional Explanation

K : X × X → R positive definite function: a similarity measure.

There exists some mapping Φ : X → H such that

$$ K(x, x') = \langle \Phi(x), \Phi(x') \rangle $$

Find some geometric criterion to optimize in H, e.g. maximum margin.
Nonlinear SVMs

Figure courtesy of B. Schölkopf and A. Smola, © MIT Press
Connection between two views

• One-to-one correspondence between kernel and regularizer
• Kernel is algorithmic
• Form of regularizer really explains what's going on, e.g.

$$ K(x, x') = \frac{1}{\sqrt{2\pi}^{\,d}}\, e^{-\| x - x' \|^2/(2\sigma^2)} \quad \text{(smoothing)} $$

$$ \Updownarrow $$

$$ \langle f, f \rangle = \frac{1}{\sqrt{2\pi}^{\,d}} \int e^{\|\omega\|^2 \sigma^2/2}\, \|\hat f(\omega)\|^2 \, d\omega \quad \text{(roughening)} $$
Connection in general

Regularization operator P : H → H (self-adjoint)

$$ \Omega[f] = \langle f, f \rangle = \int (Pf)(x) \cdot (Pf)(x) \, dx $$

Kernel operator K : H → H

$$ (Kg)(x) = \int K(x, x')\, g(x') \, dx' \quad\Rightarrow\quad (K\delta_x)(x') = K(x, x') $$

$$ \langle f, k_x \rangle = \langle f, K(x, \cdot) \rangle = \int f(x')\, (P^2 K \delta_x)(x') \, dx' = f(x) $$

hence K = P^{-2}.
References so far
Girosi, Jones & Poggio, Regularization Theory and Neural Network Architectures (Neural Computation, 1995)
Smola & Schölkopf, From Regularization Operators to Support Vector Kernels (NIPS 1998)
Aronszajn, Theory of Reproducing Kernels (1950)
Kimeldorf & Wahba, Some Results on Tchebycheffian Spline Functions (1971)
Link to Diffusion

$$ K_\beta(x, x') = \frac{1}{\sqrt{2\pi\beta}^{\,d}}\, e^{-\| x - x' \|^2/(2\beta)} $$

is the solution to the diffusion equation

$$ \frac{\partial}{\partial \beta} K_\beta(x, x_0) = \Delta K_\beta(x, x_0) $$

where

$$ \Delta = \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \ldots + \frac{\partial^2}{\partial x_d^2} $$

is the Laplacian.

Generally, one may define a Laplacian operator Δ : H → H and the kernel operator by

$$ \frac{\partial}{\partial \beta} K_\beta = \Delta K_\beta . $$

Formally K = e^{βΔ} and P = e^{−βΔ/2}.
How do we generalize to discrete spaces?
1. Graphs
Graphs
Looking for a positive definite K : V × V → R, now just a matrix.
Try Random Walks

$$ A_{ij} = \begin{cases} 1 & i \sim j \\ 0 & \text{otherwise} \end{cases} $$

A symmetric ⇒ even powers are positive definite.

K = A²?  A⁴?  A^∞?

K = α₁A² + α₂A⁴ + ... ?
Diffusion

Infinite number of infinitesimal steps:

$$ K = \lim_{n\to\infty} \Bigl( 1 + \frac{\beta}{n} L \Bigr)^n = e^{\beta L} $$

$$ L_{ij} = \begin{cases} 1 & i \sim j \\ -d_i & i = j \\ 0 & \text{otherwise} \end{cases} \quad \text{(Laplacian)} $$
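Since L is symmetric, K_β = e^{βL} can be computed directly from the eigendecomposition of L. A minimal sketch (function names are mine, not from the talk):

```python
import numpy as np

def graph_laplacian(A):
    # Slide convention: L_ij = 1 for i ~ j, L_ii = -d_i, else 0, i.e. L = A - D.
    return A - np.diag(A.sum(axis=1))

def diffusion_kernel(A, beta):
    # K_beta = exp(beta L), via the spectral decomposition of the symmetric L.
    lam, V = np.linalg.eigh(graph_laplacian(A))
    return (V * np.exp(beta * lam)) @ V.T
```

On a path graph, for example, the kernel entries decay with graph distance, and K_0 = I as required.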
Exponential Kernels

$$ K_\beta = e^{\beta L} = \lim_{n\to\infty} \Bigl( I + \frac{\beta}{n} L \Bigr)^n = I + \beta L + \frac{\beta^2}{2!} L^2 + \frac{\beta^3}{3!} L^3 + \ldots $$

$$ \frac{d}{d\beta} K_\beta = L K_\beta \qquad K_0 = I $$
For any symmetric L, K = e^{βL} is positive definite:

$$ e^{\beta L} = \lim_{n\to\infty} \Bigl( I + \frac{\beta}{n} L \Bigr)^n = \lim_{n\to\infty} \Bigl( I + \frac{\beta}{2n} L \Bigr)^{2n} $$

Conversely,

$$ K = \bigl( K^{1/n} \bigr)^n = \lim_{n\to\infty} \Bigl( I + \frac{1}{n} L \Bigr)^n = e^{L} $$

for any infinitely divisible (or finite) K.
Properties of Diffusion Kernels

• Positive definite
• Analogy with the continuous case
• Local relationships L induce global relationships K_β by

$$ \frac{d}{d\beta} K_\beta = L K_\beta \qquad K_0 = I $$

• Works for undirected weighted graphs with weights w_ij = w_ji
Complete graphs

$$ K(i, j) = \begin{cases} \dfrac{1 + (n-1)\, e^{-n\beta}}{n} & \text{for } i = j \\[6pt] \dfrac{1 - e^{-n\beta}}{n} & \text{for } i \neq j , \end{cases} $$

For n = 2, K_β(i, j) ∝ (tanh β)^{d(i,j)}.
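The closed form can be checked against a direct matrix exponential of the complete-graph Laplacian; a minimal sketch (helper names are mine):

```python
import numpy as np

def complete_graph_kernel(n, beta):
    # Closed form for K_beta = exp(beta L) on the complete graph K_n.
    diag = (1 + (n - 1) * np.exp(-n * beta)) / n
    off = (1 - np.exp(-n * beta)) / n
    return off * np.ones((n, n)) + (diag - off) * np.eye(n)

def expm_sym(M):
    # Matrix exponential of a symmetric matrix via eigendecomposition.
    lam, V = np.linalg.eigh(M)
    return (V * np.exp(lam)) @ V.T
```

With A the all-ones-off-diagonal adjacency and L = A − D, expm_sym(β L) agrees with the closed form entrywise.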
Closed chains

$$ K(i, j) = \frac{1}{n} \sum_{\nu=0}^{n-1} e^{-\omega_\nu \beta} \cos\frac{2\pi\nu (i - j)}{n} $$
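ω_ν is not spelled out on this slide; for the n-cycle with the Laplacian above, the eigenvalues are 2cos(2πν/n) − 2, so the natural reading is ω_ν = 2 − 2cos(2πν/n). A sketch under that assumption:

```python
import numpy as np

def chain_kernel(n, beta):
    # K(i, j) = (1/n) sum_nu exp(-omega_nu beta) cos(2 pi nu (i - j)/n),
    # assuming omega_nu = 2 - 2 cos(2 pi nu / n) (negated cycle eigenvalues).
    nu = np.arange(n)
    omega = 2 - 2 * np.cos(2 * np.pi * nu / n)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    phase = 2 * np.pi * nu[None, None, :] * (i - j)[:, :, None] / n
    return (np.exp(-beta * omega) * np.cos(phase)).sum(axis=-1) / n
```

Under this reading the formula matches exp(βL) for the cycle graph exactly, since the Fourier modes diagonalize the circulant Laplacian.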
Tensor product kernels

K^{(1)} a kernel on X₁, K^{(2)} a kernel on X₂:

$$ K^{(1,2)} = K^{(1)} \otimes K^{(2)} \quad \text{kernel on } X_1 \otimes X_2 $$

$$ K^{(1,2)}\bigl( (x_1, x_2), (x'_1, x'_2) \bigr) = K^{(1)}(x_1, x'_1)\, K^{(2)}(x_2, x'_2) $$

$$ L^{(1,2)} = L^{(1)} \otimes I^{(2)} + L^{(2)} \otimes I^{(1)} $$
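A quick numerical check of the product construction: because L⊗I and I⊗L′ commute, exponentiating the product-space Laplacian factorizes into the Kronecker product of the factor kernels. A sketch using arbitrary symmetric matrices as stand-in Laplacians:

```python
import numpy as np

def expm_sym(M):
    # Matrix exponential of a symmetric matrix via eigendecomposition.
    lam, V = np.linalg.eigh(M)
    return (V * np.exp(lam)) @ V.T

rng = np.random.default_rng(1)
L1 = rng.normal(size=(3, 3)); L1 = L1 + L1.T   # stand-in for L^{(1)}
L2 = rng.normal(size=(4, 4)); L2 = L2 + L2.T   # stand-in for L^{(2)}

# Product-space Laplacian: L^{(1,2)} = L^{(1)} (x) I + I (x) L^{(2)}
L12 = np.kron(L1, np.eye(4)) + np.kron(np.eye(3), L2)

# The kernel factorizes: exp(L^{(1,2)}) = exp(L^{(1)}) (x) exp(L^{(2)})
K12 = expm_sym(L12)
assert np.allclose(K12, np.kron(expm_sym(L1), expm_sym(L2)))
```

Note the Kronecker ordering: with np.kron(L1, I) the first factor indexes X₁, so the kernels combine in the same order.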
Hypercubes, etc.

Hypercube:

$$ K(x, x') = (\tanh\beta)^{d(x,x')} $$

Alphabet A:

$$ K(x, x') = \left( \frac{1 - e^{-|A|\beta}}{1 + (|A| - 1)\, e^{-|A|\beta}} \right)^{d(x,x')} $$
k-regular trees

$$ K(x, x') = K(d(x, x')) = \frac{2}{\pi (k-1)} \int_0^\pi e^{-\beta \left( 1 - \frac{2\sqrt{k-1}}{k} \cos x \right)} \, \frac{\sin x \, \bigl[ (k-1) \sin(d+1)x - \sin(d-1)x \bigr]}{k^2 - 4(k-1)\cos^2 x} \, dx $$
Combinatorial view

f : V → R, a function on the graph, or a vector (f_1, f_2, ..., f_n)^⊤

$$ f^\top L f = -\sum_{i \sim j} (f_i - f_j)^2 $$

the total weight of "edge violations"
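The identity is easy to verify numerically with the sign convention L = A − D used throughout (the small example graph below is arbitrary):

```python
import numpy as np

# Adjacency of a small arbitrary graph and the slide-convention Laplacian L = A - D
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
L = A - np.diag(A.sum(axis=1))

f = np.array([0.5, -1.0, 2.0, 0.0])

# f^T L f = -sum over edges (each counted once) of (f_i - f_j)^2
edges = [(i, j) for i in range(4) for j in range(i + 1, 4) if A[i, j]]
assert np.isclose(f @ L @ f, -sum((f[i] - f[j]) ** 2 for i, j in edges))
```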
Spectral view

Eigenvalues 0 = λ_0 ≥ λ_1 ≥ λ_2 ≥ ... ≥ λ_{n-1} and corresponding eigenvectors v_0, v_1, v_2, ..., v_{n-1}.

v_0 = const
v_1: the "Fiedler vector", the smoothest function on the graph orthogonal to v_0.
In general, v_i minimizes

$$ -\frac{v^\top L v}{v^\top v} \quad \text{for } v \perp \operatorname{span}(v_0, v_1, \ldots, v_{i-1}) $$

$$ L = \sum_i v_i\, \lambda_i\, v_i^\top \qquad K = e^{\beta L} = \sum_i v_i\, e^{\beta \lambda_i}\, v_i^\top $$
Regularization operator

Recall K = P^{-2}.

Now simply ⟨f, f⟩ = (Pf)^⊤(Pf) = f^⊤ P² f and P = K^{-1/2}:

$$ K = e^{\beta L} = \sum_i v_i\, e^{\beta \lambda_i}\, v_i^\top \qquad P = e^{-\beta L/2} = \sum_i v_i\, e^{-\beta \lambda_i/2}\, v_i^\top $$
Generalization

[Smola & Kondor, COLT 2003]

$$ K = \sum_i v_i\, r(\lambda_i)\, v_i^\top $$

• Diffusion kernel: r(λ) = exp(βλ)
• p-step random walk kernel: r(λ) = (a + λ)^p
• Regularized Laplacian kernel: r(λ) = (1 − σ²λ)^{-1}
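The family K = Σ_i v_i r(λ_i) v_i^⊤ is easy to instantiate. Note that Smola & Kondor parameterize everything through the normalized Laplacian, so the transfer functions below, adapted to the λ ≤ 0 convention of these slides, are illustrative rather than exact:

```python
import numpy as np

def spectral_kernel(L, r):
    # K = sum_i v_i r(lambda_i) v_i^T for a symmetric L.
    lam, V = np.linalg.eigh(L)
    return (V * r(lam)) @ V.T

# Transfer functions r(lambda), with lambda <= 0 (slide convention L = A - D):
diffusion   = lambda lam: np.exp(0.5 * lam)        # r = exp(beta lambda), beta = 0.5
random_walk = lambda lam: (3.0 + lam) ** 2         # r = (a + lambda)^p, a = 3, p = 2
reg_laplace = lambda lam: 1.0 / (1.0 - lam)        # r = (1 - sigma^2 lambda)^{-1}, sigma = 1
```

Any r that is nonnegative on the spectrum of L yields a positive semidefinite K.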
Applications

• Natural graphs: internet, web, social contacts, citations, scientific collaborations, etc.
• Objects with graph-like structure: strings, etc.
• Objects with unknown global structure: sets of organic molecules
• Bioinformatics: networks of molecular pathways in cells [J-P Vert and Kanehisa, NIPS 2002]
• Incorporating unlabelled data
2. Groups
Finite Groups

A finite set G with an operation G × G → G such that

• x₁x₂ ∈ G for any x₁, x₂ ∈ G (closure)
• x₁(x₂x₃) = (x₁x₂)x₃ (associativity)
• xe = ex = x for some e ∈ G and any x ∈ G (identity)
• x⁻¹x = xx⁻¹ = e (inverses)
Symmetric groups S_n

x₁ = (12)(3)(4)(5)    x₁(ABCDE) = BACDE
x₂ = (1)(324)(5)      x₂(ABCDE) = ACDBE
x₃ = x₁x₂             x₃(ABCDE) = x₁(x₂(ABCDE))

rankings, orderings, allocation, etc.; a natural sense of distance
Stationary kernels

$$ K(x_1, x_2) = f(x_2 x_1^{-1}) \quad \sim \quad K(x_1, x_2) = f(x_2 - x_1) $$

e.g. K ∼ e^{-(x_1 - x_2)^2/(2σ^2)}

with f a positive definite function on G.
Bochner's Theorem

f is positive definite and symmetric on R^d iff f̂(ω) > 0, where

$$ \hat f(\omega) = \frac{1}{\sqrt{2\pi}^{\,d}} \int e^{-i\omega\cdot x} f(x)\, dx $$

Is there an analog for finite groups?
Representation theory

$$ \rho : G \to \mathbb{C}^{d\times d} \qquad \rho(x_1 x_2) = \rho(x_1)\,\rho(x_2) $$

Equivalence:

$$ \rho_1(x) = t^{-1} \rho_2(x)\, t \quad \forall x \in G $$

Reducibility:

$$ t^{-1} \rho(x)\, t = \begin{pmatrix} \rho_1(x) & 0 \\ 0 & \rho_2(x) \end{pmatrix} \quad \forall x \in G $$
Irreducible representations of S₅

ρ_trivial(x) ≡ (1) → ρ_(5)
ρ_sign(x) ≡ (sgn(x)) → ρ_(1,1,1,1,1)
ρ_def.(x) ∈ C^{5×5}:

$$ \begin{pmatrix} e_{x(1)} \\ e_{x(2)} \\ e_{x(3)} \\ e_{x(4)} \\ e_{x(5)} \end{pmatrix} = \rho_{\mathrm{def.}}(x) \begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \end{pmatrix} $$
Irreducible representations of S₅

$$ t^{-1} \rho_{\mathrm{def.}}(x)\, t = \begin{pmatrix} \rho_{\mathrm{tr.}}(x) & \\ & \rho_{(4,1)}(x) \end{pmatrix} $$

$$ \rho_{\mathrm{reg.}} = \rho_{(5)} \oplus 4\rho_{(4,1)} \oplus 5\rho_{(3,2)} \oplus 6\rho_{(3,1,1)} \oplus 5\rho_{(2,2,1)} \oplus 4\rho_{(2,1,1,1)} \oplus \rho_{(1,1,1,1,1)} $$
Fourier transforms on groups

$$ \hat f(\rho) = \sum_{x \in G} f(x)\, \rho(x) \qquad \rho \in \mathcal{R} $$

$$ \mathcal{F} : \mathbb{C}^G \to \bigoplus_{\rho \in \mathcal{R}} \mathbb{C}^{d_\rho \times d_\rho} $$

Inversion:

$$ f(x) = \frac{1}{|G|} \sum_{\rho \in \mathcal{R}} d_\rho \operatorname{trace}\bigl[ \hat f(\rho)\, \rho(x^{-1}) \bigr] $$
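For an abelian group such as the cyclic group Z_n every irreducible representation is one-dimensional, ρ_k(x) = e^{−2πikx/n}, so the transform and its inversion formula reduce to the ordinary DFT. A sketch (the choice of Z_n and all names are mine):

```python
import numpy as np

n = 8

def rho(k, x):
    # Irreducible representations of Z_n: all one-dimensional characters.
    return np.exp(-2j * np.pi * k * x / n)

def ft(f):
    # f_hat(rho_k) = sum_{x in G} f(x) rho_k(x)
    return np.array([sum(f[x] * rho(k, x) for x in range(n)) for k in range(n)])

def ift(fh):
    # f(x) = (1/|G|) sum_k d_k trace[f_hat(rho_k) rho_k(x^{-1})],
    # with d_k = 1 and x^{-1} = -x mod n.
    return np.array([sum(fh[k] * rho(k, -x) for k in range(n)) for x in range(n)]) / n
```

Applying ift after ft recovers the original function, which is exactly the inversion formula above specialized to one-dimensional irreps.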
Bochner on Groups
The function f is positive definite on G if and only if the matrices f̂(ρ) are all positive definite.
Conjugacy classes

Conjugacy classes: x₁ ≅ x₂ iff x₁ = t⁻¹ x₂ t for some t.

E.g. on S_n, the classes are the cycle types:
(·)(·)(·)(·)...
(· ·)(·)(·)(·)...
(· · ·)(·)(·)...
(· ·)(· ·)(·)(·)...
...
Corollary to Bochner

The function f is positive definite on G and constant on conjugacy classes if and only if

$$ f(x) = \sum_{\rho \in \mathcal{R}} c_\rho\, \chi_\rho(x) \qquad c_\rho > 0 . $$

Characters:

$$ \chi_\rho(x) = \operatorname{trace}[\rho(x)] $$
Diffusion on the Cayley graph of S₄
3. Manifolds
Riemannian Manifolds

A surface M with a sense of distance that locally looks like R^d.

In a neighborhood of x, points can be represented as vectors x + δx, with

$$ d(x, x + \delta x)^2 = (\delta x)^\top G(x)\, \delta x + O(\| \delta x \|^4) $$

Metric tensor: G(x) ∈ R^{d×d}, positive definite for all x ∈ M.

For a path p : [0, 1] → R^d,

$$ \ell(p) = \int_0^1 \Bigl( \sum_{i=1}^d \sum_{j=1}^d \dot p_i(\gamma)\, G(p(\gamma))_{ij}\, \dot p_j(\gamma) \Bigr)^{1/2} d\gamma $$
Laplacian on a Riemannian Manifold

Flat space: Δ = ∂₁² + ∂₂² + ... + ∂_d²

Manifold:

$$ \Delta = \frac{1}{\sqrt{\det G}} \sum_{ij} \partial_i \sqrt{\det G}\, (G^{-1})_{ij}\, \partial_j $$

gives rise to an operator Δ : L²(M) → L²(M) as before.
Diffusion Kernel on M

Solution of ∂_t K_t = Δ K_t with K_0 = I:

1. K_t(x, x') = K_t(x', x)
2. lim_{t→0} K_t(x, x') = δ_x(x')
3. (Δ − ∂/∂t) K = 0
4. K_t(x, x') = ∫_M K_{t−s}(x, x'') K_s(x'', x') dx''
5. K_t(x, x') = Σ_{i=0}^∞ e^{−λ_i t} φ_i(x) φ_i(x')
Manifold Structures in Data

• Even when data at first seems very high dimensional, it is often constrained to a low dimensional manifold, i.e. it only has a few internal degrees of freedom
• Constraining the kernel to the manifold is expected to help
• The graph Laplacian of a graph sampled from M approximates the Laplacian of M [Belkin & Niyogi, NIPS 2001]
• A natural use for unlabeled data points is to help construct the kernel [Belkin & Niyogi, NIPS 2002]
The Statistical Manifold

[Lafferty & Lebanon, NIPS 2002]

For a family { p(x|θ) : θ ∈ R^d } of statistical models, the Fisher metric is

$$ G_{ij}(\theta) = E(\partial_i \ell_\theta\, \partial_j \ell_\theta) = \int (\partial_i \log p(x|\theta))\, (\partial_j \log p(x|\theta))\, p(x|\theta)\, dx $$

or equivalently

$$ G_{ij} = 4 \int \bigl( \partial_i \sqrt{p(x|\theta)} \bigr) \bigl( \partial_j \sqrt{p(x|\theta)} \bigr)\, dx . $$

Locally approximated by the Kullback-Leibler divergence.
The Multinomial

$$ p(x|\theta) = \frac{(n+1)!}{x_1!\, x_2! \cdots x_{n+1}!}\, \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_{n+1}^{x_{n+1}} \qquad \sum_{i=1}^{n+1} \theta_i = 1 $$

$$ G_{ij}(\theta) = \sum_{k=1}^{n+1} \frac{1}{\theta_k}\, (\partial_i \theta_k)\, (\partial_j \theta_k) \qquad \theta \in \mathbb{P}^d $$

Consider the map T : P^d → S^d via T : θ ↦ (√θ₁, √θ₂, ..., √θ_{n+1}).

On S^d the metric becomes the natural metric of the sphere, hence

$$ d(\theta, \theta') = 2 \arccos \Bigl( \sum_{i=1}^{n+1} \sqrt{\theta_i\, \theta'_i} \Bigr) $$
Diffusion on the Sphere

$$ K_t(x, x') = (4\pi t)^{-n/2} \exp\Bigl( -\frac{d^2(x, x')}{4t} \Bigr) \Bigl[ \sum_{i=0}^{N} \psi_i(x, x')\, t^i + O(t^N) \Bigr] $$

$$ K_t(\theta, \theta') \approx (4\pi t)^{-n/2} \exp\Bigl( -\frac{1}{t} \arccos^2 \Bigl( \sum_{i=1}^{n+1} \sqrt{\theta_i\, \theta'_i} \Bigr) \Bigr) $$

Proposed and applied to text data in [Lafferty & Lebanon, NIPS 2002].
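Since d(θ, θ′) = 2 arccos(Σ√(θ_iθ′_i)), the exponent −(1/t)arccos²(·) is just −d²/(4t): the approximate kernel is a Gaussian in the Fisher geodesic distance. A minimal sketch (function names are mine):

```python
import numpy as np

def fisher_distance(theta, theta_p):
    # Geodesic distance on the multinomial simplex:
    # d = 2 arccos(sum_i sqrt(theta_i theta'_i))
    c = np.clip(np.sum(np.sqrt(theta * theta_p)), -1.0, 1.0)
    return 2.0 * np.arccos(c)

def info_diffusion_kernel(theta, theta_p, t, n):
    # Leading-order approximation K_t ~ (4 pi t)^{-n/2} exp(-d^2 / (4 t)).
    d = fisher_distance(theta, theta_p)
    return (4 * np.pi * t) ** (-n / 2) * np.exp(-d**2 / (4 * t))
```

The clip guards against floating-point sums marginally above 1 when θ = θ′.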
Conclusions
"The Laplace operator in its various manifestations is the most beautiful and central object in all of mathematics. Probability theory, mathematical physics, Fourier analysis, partial differential equations, the theory of Lie groups, and differential geometry all revolve around this sun, and its light even penetrates such obscure regions as number theory and algebraic geometry." (Nelson, 1968)
Conclusions

• Kernel methods: algorithm + kernel
• The kernel alone encapsulates all knowledge about X
• The Laplacian is a unifying concept in mathematics
• The connection with diffusion is intuitively appealing, but the real justification for exponential kernels lies deep in operator-land
• Exponentially induced kernels lift knowledge of local structure to the global level
• Opens new links to abstract algebra and information geometry
References

Belkin & Niyogi, Laplacian Eigenmaps for Dimensionality Reduction (NIPS 2001)
Kondor & Lafferty, Diffusion Kernels on Graphs and Other Discrete Structures (ICML 2002)
Lafferty & Lebanon, Information Diffusion Kernels (NIPS 2002)
Belkin & Niyogi, Using Manifold Structure for Partially Labelled Classification (NIPS 2002)
Vert & Kanehisa, Graph-driven Feature Abstraction from Microarray Data Using Diffusion Kernels and Kernel CCA (NIPS 2002)
Smola & Kondor, Kernels and Regularization on Graphs (COLT 2003)