CMPSCI 791BB: Advanced ML: Laplacian Learning (Spring 2006)
Sridhar Mahadevan
UNIVERSITY OF MASSACHUSETTS, AMHERST • DEPARTMENT OF COMPUTER SCIENCE
Outline
- Spectral graph operators
  - Combinatorial graph Laplacian
  - Normalized graph Laplacian
  - Random walks
- Machine learning on graphs
  - Clustering
  - Regression
Operators on Graphs
Laplacian Spectra of Complete Graph
[Figure: plot of the Laplacian eigenvalues of the complete graph K5]
Some basic results
- Theorem: Given a connected graph G = (V, E), the eigenspace of the first eigenvalue has dimension 1.
- Proof:
  - We know that λ0 = 0.
  - Let x be the associated eigenvector, so L x = λ0 x = 0.
  - We also know that x^T L x = Σ_{(u,v) ∈ E} (x_u − x_v)^2 = 0.
  - So x_u = x_v across every edge; since G is connected, x must be a constant eigenvector, and the eigenspace has dimension 1.
- Corollary: The multiplicity of the first eigenvalue (λ = 0) gives the number of connected components of G.
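The corollary is easy to check numerically. A small sketch (the example graph is an assumption, not from the slides): build a graph with two components and count the near-zero eigenvalues of its combinatorial Laplacian.

```python
import numpy as np

# Adjacency matrix of a graph with two components:
# a triangle {0,1,2} and an edge {3,4}.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

D = np.diag(A.sum(axis=1))       # degree matrix
L = D - A                        # combinatorial Laplacian
eigvals = np.linalg.eigvalsh(L)  # real eigenvalues (L is symmetric), ascending

# Multiplicity of the zero eigenvalue = number of connected components.
num_components = int(np.sum(np.abs(eigvals) < 1e-8))
print(num_components)  # -> 2
```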
Some basic results
- Theorem: The Laplacian of the complete graph Kn has n−1 eigenvalues equal to n and one eigenvalue equal to 0.
- Proof:
  - We already know that the constant vector 1 is an eigenvector of L with eigenvalue 0.
  - Any eigenvector associated with λ > 0 must be perpendicular to 1.
  - Consider any vector x such that x^T 1 = 0. Then Σ_{j ≠ i} x_j = −x_i, so (L x)_i = (n−1) x_i − Σ_{j ≠ i} x_j = (n−1) x_i + x_i = n x_i.
- Naming convention: Spielman (and others) use Kn to mean the complete graph on n vertices (each of degree n−1).
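A quick numerical check of the theorem (the choice n = 6 is arbitrary): the Laplacian of K_n should have eigenvalue 0 once and eigenvalue n with multiplicity n − 1.

```python
import numpy as np

n = 6
A = np.ones((n, n)) - np.eye(n)   # adjacency matrix of K_n
L = np.diag(A.sum(axis=1)) - A    # combinatorial Laplacian
eigvals = np.sort(np.linalg.eigvalsh(L))

# Smallest eigenvalue is 0; all remaining n-1 eigenvalues equal n.
print(eigvals)
```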
Laplacian as an Operator
- Note that the Laplacian acts differently than the adjacency operator:
  L f(u) = Σ_{v ~ u} (f(u) − f(v)) w_uv
- Theorem (matrix-tree): For any connected graph G on n vertices, the number of spanning trees is (1/n) Π_{i=1}^{n−1} σ_i, where the σ_i are the nonzero eigenvalues of the combinatorial Laplacian.
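The matrix-tree theorem can be sketched numerically (the choice of K_4 is an assumption for illustration). For K_4, Cayley's formula gives 4^(4−2) = 16 spanning trees, which should match the eigenvalue product.

```python
import numpy as np

n = 4
A = np.ones((n, n)) - np.eye(n)         # adjacency matrix of K_4
L = np.diag(A.sum(axis=1)) - A          # combinatorial Laplacian
sigma = np.sort(np.linalg.eigvalsh(L))  # eigenvalues, smallest (zero) first

# Number of spanning trees = (1/n) * product of the n-1 nonzero eigenvalues.
num_trees = round(np.prod(sigma[1:]) / n)
print(num_trees)  # -> 16, agreeing with Cayley's formula 4^(4-2)
```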
Combinatorial Laplacian on a Path
[Figure: eigenfunctions (left panels) and spectrum of the combinatorial Laplacian on a path graph with 20 vertices; the first eigenfunction is constant (≈ 0.2236 = 1/√20), and higher eigenfunctions oscillate with increasing frequency]
Normalized Laplacian
- The normalized Laplacian is 𝓛 = D^{-1/2} (D − A) D^{-1/2}.
- For a weighted graph, the normalized Laplacian is defined entrywise as
  𝓛(u,v) = 1 − w(u,v)/d(u) if u = v, −w(u,v)/√(d(u) d(v)) if u ~ v, and 0 otherwise,
  where d(u) = Σ_v w(u,v).
Normalized Laplacian
- 𝓛 = D^{-1/2} L D^{-1/2} = I − D^{-1/2} A D^{-1/2}
- Note that for a k-regular graph, 𝓛 = I − (1/k) A.
- Unlike the combinatorial Laplacian, the normalized Laplacian takes the degree of each vertex into account.
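A minimal sketch (the 5-cycle is an assumed example, not from the slides): build the normalized Laplacian and verify the k-regular identity on a 2-regular graph.

```python
import numpy as np

n = 5
A = np.zeros((n, n))
for i in range(n):                 # 5-cycle: every vertex has degree 2
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_norm = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt  # I - D^{-1/2} A D^{-1/2}

# For a k-regular graph (k = 2 here), the normalized Laplacian
# reduces to I - (1/k) A.
print(np.allclose(L_norm, np.eye(n) - A / 2))  # -> True
```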
Random Walk on Graphs
- For any graph G = (V, E), we can associate a natural random walk defined as P(u,v) = w(u,v) / Σ_{v'} w(u,v').
- Let P*(u) be the long-term probability of being at vertex u under the random walk defined by P.
- It can be shown that this invariant distribution is reversible, and is given by
  P*(u) = w(u) / W, where w(u) = Σ_v w(u,v) and W = Σ_u w(u).
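Both claims are easy to verify on a small weighted graph (the weight matrix below is an assumed example): stationarity means π P = π, and reversibility is the detailed-balance condition π(u) P(u,v) = π(v) P(v,u).

```python
import numpy as np

# Symmetric weight matrix of a small weighted graph (assumed example).
W_mat = np.array([
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 3.0],
    [1.0, 3.0, 0.0],
])
w = W_mat.sum(axis=1)            # w(u) = sum_v w(u,v)
P = W_mat / w[:, None]           # random-walk transitions P(u,v) = w(u,v)/w(u)

pi = w / w.sum()                 # claimed invariant distribution P*(u) = w(u)/W

print(np.allclose(pi @ P, pi))   # -> True: pi is stationary
# Detailed balance: pi(u) P(u,v) = w(u,v)/W is symmetric in (u,v).
flow = pi[:, None] * P
print(np.allclose(flow, flow.T))  # -> True: the walk is reversible
```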
Random Walks and the Normalized Laplacian
- Another way to define the random walk on a graph: P = D^{-1} A.
- Note that this is not a symmetric matrix.
- However, its eigenvalues are all real, because it is closely related to the normalized Laplacian.
- Define two matrices A and B as similar if A = M B M^{-1}, where M is any invertible matrix.
- Note that if x is an eigenvector of A, then A x = λ x = M B M^{-1} x.
- This implies that λ M^{-1} x = B M^{-1} x, so M^{-1} x is an eigenvector of B with the same eigenvalue.
Random Walks and Normalized Laplacian
- Since 𝓛 = I − D^{-1/2} A D^{-1/2}, we get I − 𝓛 = D^{-1/2} A D^{-1/2}.
- Hence D^{-1/2} (I − 𝓛) D^{1/2} = D^{-1} A.
- This shows that the random walk operator is similar to the symmetric matrix I − 𝓛, which is why its spectrum is real.
- The eigenvalues of the random walk operator are the same as those of I − 𝓛.
- The eigenvectors of the random walk operator? By the similarity argument, they are D^{-1/2} x, where x is an eigenvector of I − 𝓛.
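The similarity argument can be checked numerically (the 4-vertex path is an assumed example, chosen because its degrees are non-uniform): D^{-1} A shares its spectrum with I − 𝓛, and D^{-1/2} x is an eigenvector of the walk.

```python
import numpy as np

A = np.array([          # a path on 4 vertices (degrees 1, 2, 2, 1)
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
d = A.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)

S = D_inv_sqrt @ A @ D_inv_sqrt   # symmetric form: I - L_norm = D^{-1/2} A D^{-1/2}
P = np.diag(1.0 / d) @ A          # random-walk matrix D^{-1} A (not symmetric)

lam, X = np.linalg.eigh(S)        # real spectrum of the symmetric form
# Same eigenvalues, despite P being non-symmetric:
print(np.allclose(np.sort(lam), np.sort(np.linalg.eigvals(P).real)))  # -> True

# D^{-1/2} x is an eigenvector of P with the same eigenvalue:
v = D_inv_sqrt @ X[:, 0]
print(np.allclose(P @ v, lam[0] * v))  # -> True
```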
Spectral Clustering using the Laplacian (Ng, Jordan, and Weiss, NIPS 2002)
- Given a set of instances D, compute the similarity between each pair of points using some local kernel:
  - Gaussian kernel (on pairwise distances)
  - Nearest-neighbor kernel
- Define a graph G whose edges are weighted by these pairwise similarities.
- Compute the eigenvectors of the normalized graph Laplacian.
- Given any point, compute its embedding using the k eigenvectors associated with the smallest eigenvalues.
- Use any standard clustering method on the embedded points (e.g., k-means).
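The steps above can be sketched end to end. This is a compact illustration, not the NJW implementation: the data, the kernel width sigma, and the tiny k-means loop are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs, 15 points each (assumed toy data).
X = np.vstack([rng.normal(0, 0.3, (15, 2)),
               rng.normal(5, 0.3, (15, 2))])

# Step 1-2: Gaussian-kernel affinity graph (sigma = 0.5 is an assumption).
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
W = np.exp(-sq / (2 * 0.5 ** 2))
np.fill_diagonal(W, 0)

# Step 3: normalized graph Laplacian.
d = W.sum(1)
D_is = np.diag(d ** -0.5)
L_norm = np.eye(len(X)) - D_is @ W @ D_is

# Step 4: embed with the k = 2 eigenvectors of smallest eigenvalue.
vals, vecs = np.linalg.eigh(L_norm)
emb = vecs[:, :2]

# Step 5: a tiny k-means (k = 2) on the embedded points.
centers = emb[[0, -1]]
for _ in range(20):
    labels = np.argmin(((emb[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([emb[labels == j].mean(0) for j in range(2)])

print(labels)  # each blob should land in its own cluster
```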
Spectral Clustering using Graph Laplacian
[Figure: scatter of the two clusters, embedded using the 2nd and 3rd eigenvectors of the graph Laplacian]
Cluster 1: Adler, Barrington, Immerman, Kurose, Rosenberg, Shenoy, Sitaraman, Towsley, Weems
Cluster 2: Adrion, Allan, Avrunin, Barto, Brock, Clarke, Cohen, Croft, Grupen, Hanson, Jensen, Lehnert, Lesser, Levine, Mahadevan, Manmatha, McCallum, Moll, Moss, Osterweil, Riseman, Rissland, Schultz, Utgoff, Woolf, Zilberstein
Partitioning Graphs using the Combinatorial Laplacian
[Figure: a spatial environment and the 2nd eigenvector of its graph Laplacian]
The sign of the second eigenvector can separate the vertices in Room 1 from those in Rooms 2 and 3.
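The same sign-based split can be demonstrated on an assumed toy graph (not the slides' environment): two triangles joined by a single bridge edge, partitioned by the sign of the second eigenvector (the Fiedler vector).

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1

L = np.diag(A.sum(1)) - A
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]      # eigenvector of the 2nd-smallest eigenvalue

side = fiedler > 0        # sign pattern partitions the graph
print(side)               # vertices 0-2 on one side, 3-5 on the other
```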
Regression using the Graph Laplacian (Belkin and Niyogi, STOC 2005; Mahadevan, ICML 2005)
Regression using the Graph Laplacian
Comparison of Polynomial and Laplacian Basis Representations
[Figure: the desired function (left); mean squared error vs. number of basis functions for the polynomial and Laplacian basis approximations (right)]
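Least-squares approximation in a Laplacian eigenvector basis can be sketched as follows (the path graph, target function, and sizes are assumptions for illustration, not the slides' data): project the target onto the first k eigenvectors and watch the error fall as k grows.

```python
import numpy as np

n = 20
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # path graph
L = np.diag(A.sum(1)) - A
_, Phi = np.linalg.eigh(L)     # columns = Laplacian eigenvectors, smoothest first

x = np.arange(n)
f = np.sin(2 * np.pi * x / n)  # an assumed smooth target function on the vertices

errors = []
for k in range(1, 8):
    B = Phi[:, :k]                               # first k basis functions
    f_hat = B @ np.linalg.lstsq(B, f, rcond=None)[0]  # least-squares fit
    errors.append(np.mean((f - f_hat) ** 2))

# Nested subspaces: the error never increases as basis functions are added.
print(all(e2 <= e1 + 1e-12 for e1, e2 in zip(errors, errors[1:])))  # -> True
```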
Approximation on a Grid
[Figure: the optimal value function on the grid; its least-squares approximation using automatically learned proto-value functions; and the mean-squared error of the Laplacian vs. polynomial state encoding as a function of the number of basis functions]
Nonlinear Function Approximation
[Figure: the target function; its Laplacian least-squares function approximation; and its polynomial least-squares function approximation]