
University of Massachusetts, Amherst · Department of Computer Science

CMPSCI 791BB: Advanced ML: Laplacian Learning

Sridhar Mahadevan


Outline

• Spectral graph operators
  - Combinatorial graph Laplacian
  - Normalized graph Laplacian
  - Random walks
• Machine learning on graphs
  - Clustering
  - Regression


Operators on Graphs


Laplacian Spectra of Complete Graph

[Figure: eigenvalues of the complete graph K5 (one eigenvalue at 0, the remaining four equal to 5).]


Some basic results

• Theorem: Given a connected graph G = (V, E), the eigenspace of the first eigenvalue has dimension 1.
• Proof:
  - We know that λ_0 = 0.
  - Let x be an associated eigenvector, so L x = λ_0 x = 0.
  - We also know that x^T L x = ∑_{(u,v) ∈ E} (x_u − x_v)^2 = 0.
  - So x_u = x_v across every edge; since G is connected, x must be a constant vector, and the eigenspace has dimension 1.
• Corollary: The multiplicity of the first eigenvalue (zero) gives the number of connected components of G (checked numerically in the sketch below).
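A quick numerical check of the corollary, as a minimal numpy sketch (not from the slides; the six-vertex, two-component example graph is made up for illustration):

```python
import numpy as np

# Hypothetical 6-vertex graph with two connected components:
# a triangle {0, 1, 2} and a path 3-4-5.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5)]:
    A[u, v] = A[v, u] = 1.0

L = np.diag(A.sum(axis=1)) - A      # combinatorial Laplacian L = D - A

eigvals = np.linalg.eigvalsh(L)     # real eigenvalues in ascending order
# Multiplicity of the zero eigenvalue = number of connected components (here, 2).
print(int(np.sum(np.isclose(eigvals, 0.0))))
```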


Some basic results

• Theorem: The Laplacian of the complete graph K_n has n−1 eigenvalues equal to n and one eigenvalue equal to 0.
• Proof:
  - We already know that the constant vector 1 is an eigenvector of L (with eigenvalue 0).
  - Any eigenvector associated with λ > 0 must be perpendicular to 1, so consider any vector x such that x^T 1 = 0.
  - Note that (L x)_i = (n−1) x_i − ∑_{j ≠ i} x_j = (n−1) x_i + x_i = n x_i, since ∑_{j ≠ i} x_j = −x_i.
• Naming convention: Spielman (and others) use K_n to mean the complete graph on n vertices (each of degree n−1).
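A one-line numerical confirmation of this spectrum for K_5 (an illustrative sketch, not part of the original slides):

```python
import numpy as np

n = 5
A = np.ones((n, n)) - np.eye(n)     # adjacency matrix of the complete graph K_n
L = np.diag(A.sum(axis=1)) - A      # combinatorial Laplacian D - A

print(np.round(np.linalg.eigvalsh(L), 6))
# Expected: [0. 5. 5. 5. 5], i.e. one zero eigenvalue and n-1 eigenvalues equal to n
```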


Laplacian as an Operator

• Note that the Laplacian acts differently than the adjacency operator:

    (L f)(u) = ∑_{v ~ u} w_{uv} (f(u) − f(v))

• Theorem (Matrix-Tree): For any graph G on n vertices, the number of spanning trees is

    (1/n) ∏_{i=2}^{n} σ_i

  where 0 = σ_1 ≤ σ_2 ≤ … ≤ σ_n are the eigenvalues of the combinatorial Laplacian.
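As a quick check of the spanning-tree formula (a sketch, with K_4 chosen for illustration): K_4 has 4^(4−2) = 16 spanning trees by Cayley's formula, and the eigenvalue product gives the same count.

```python
import numpy as np

n = 4
A = np.ones((n, n)) - np.eye(n)            # complete graph K_4
L = np.diag(A.sum(axis=1)) - A

sigma = np.sort(np.linalg.eigvalsh(L))     # 0 = sigma_1 <= ... <= sigma_n
num_trees = np.prod(sigma[1:]) / n         # (1/n) * product of the nonzero eigenvalues
print(round(num_trees))                    # 16, matching Cayley's formula n^(n-2)
```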


Combinatorial Laplacian on a Path

[Figure: the spectrum and the eigenfunctions of the combinatorial Laplacian on a path graph, plotted over the vertices; the low-order eigenfunctions are smooth, cosine-like functions of position on the path.]


Normalized Laplacian

• The normalized Laplacian is ℒ = D^{-1/2} (D − A) D^{-1/2}.
• For a weighted graph with weight matrix W and weighted degrees d(u) = ∑_v w(u,v), the normalized Laplacian is ℒ = D^{-1/2} (D − W) D^{-1/2}, i.e.

    ℒ(u,v) = 1 − w(u,v)/d(u)         if u = v
    ℒ(u,v) = −w(u,v)/√(d(u) d(v))    if u ~ v
    ℒ(u,v) = 0                       otherwise


Normalized Laplacian

• ℒ = D^{-1/2} L D^{-1/2} = I − D^{-1/2} A D^{-1/2}
• Note that for a k-regular graph, ℒ = I − (1/k) A.
• Unlike the combinatorial Laplacian, the normalized Laplacian takes the degree of each vertex into account.
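A minimal sketch (not from the slides) of building the normalized Laplacian from a weight matrix, assuming all weighted degrees are nonzero; the k-regular special case ℒ = I − (1/k) A is used as the test.

```python
import numpy as np

def normalized_laplacian(W):
    """Return I - D^{-1/2} W D^{-1/2} for a symmetric weight matrix W
    whose weighted degrees are all strictly positive."""
    d = W.sum(axis=1)                        # weighted degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt

# Check on a 3-regular graph (K_4): the normalized Laplacian should equal I - (1/3) A.
A = np.ones((4, 4)) - np.eye(4)
print(np.allclose(normalized_laplacian(A), np.eye(4) - A / 3.0))   # True
```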


Random Walk on Graphs

• For any graph G = (V, E), we can associate a natural random walk defined as

    P(u,v) = w(u,v) / ∑_{v'} w(u,v')

• Let P*(u) be the long-term probability of being at vertex u under the random walk defined by P.
• It can be shown that this invariant distribution is reversible, and is given by

    P*(u) = w(u) / W,   where w(u) = ∑_v w(u,v) and W = ∑_u w(u)
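A small numerical check that P*(u) = w(u)/W is indeed invariant under P (a sketch; the weight matrix below is a made-up example):

```python
import numpy as np

# Hypothetical symmetric weight matrix of a small weighted graph.
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])

w = W.sum(axis=1)                 # w(u) = sum_v w(u,v)
P = W / w[:, None]                # random walk: P(u,v) = w(u,v) / w(u)
p_star = w / w.sum()              # claimed invariant distribution w(u) / W

print(np.allclose(p_star @ P, p_star))   # True: p_star is stationary under P
```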


Random Walks and the Normalized Laplacian

• Another way to define the random walk on a graph: P = D^{-1} A.
• Note that this is not a symmetric matrix.
• However, its eigenvalues are all real, because it is closely related to the normalized Laplacian.
• Define two matrices A and B to be similar if A = M B M^{-1}, where M is any invertible matrix.
• Note that if x is an eigenvector of A, then A x = λ x = M B M^{-1} x.
• This implies that λ M^{-1} x = B (M^{-1} x), so M^{-1} x is an eigenvector of B with the same eigenvalue.


Random Walks and Normalized Laplacian

• Since ℒ = I − D^{-1/2} A D^{-1/2}, we get

    I − ℒ = D^{-1/2} A D^{-1/2}
    D^{-1/2} (I − ℒ) D^{1/2} = D^{-1} A

• This shows that the random walk operator is spectrally similar to the normalized Laplacian!
• The eigenvalues of the random walk operator are the same as those of I − ℒ.
• The eigenvectors of the random walk operator are???
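A numerical sketch of this similarity (the small weighted graph is made up for illustration): the spectra of D^{-1} A and I − ℒ coincide, and if x is an eigenvector of I − ℒ then D^{-1/2} x is a matching eigenvector of D^{-1} A, which answers the question above.

```python
import numpy as np

# Hypothetical symmetric weight matrix of a small weighted graph.
W = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))

sym = D_inv_sqrt @ W @ D_inv_sqrt          # I - L_norm (symmetric form)
walk = W / d[:, None]                      # random walk operator D^{-1} W

lam, X = np.linalg.eigh(sym)               # eigenpairs of I - L_norm
print(np.allclose(np.sort(np.linalg.eigvals(walk).real), lam))   # same spectrum

# D^{-1/2} x is an eigenvector of the walk operator with the same eigenvalue.
x = X[:, -1]
print(np.allclose(walk @ (D_inv_sqrt @ x), lam[-1] * (D_inv_sqrt @ x)))
```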


Spectral Clustering using the Laplacian
(Ng, Jordan, and Weiss, NIPS 2002)

• Given a set of instances D, compute the distance between each pair of points and convert it into an edge weight (affinity) using some local kernel:
  - Gaussian kernel
  - Nearest-neighbor kernel
• Define a graph G whose edges are weighted by these pairwise affinities.
• Compute the eigenvectors of the normalized graph Laplacian.
• Given any point, compute its embedding using the k eigenvectors associated with the smallest eigenvalues.
• Use any standard clustering method on the embedded points (e.g., k-means), as in the sketch below.
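A compact sketch of this pipeline (illustrative only: the kernel width sigma, the toy data, and the plain k-eigenvector embedding follow the description above rather than every detail of the Ng-Jordan-Weiss algorithm):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(X, k, sigma=1.0):
    """Cluster the rows of X into k groups via the normalized graph Laplacian."""
    # Gaussian-kernel affinities between all pairs of points.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt

    # Embed each point using the k eigenvectors with the smallest eigenvalues.
    _, vecs = np.linalg.eigh(L_norm)
    embedding = vecs[:, :k]

    # Standard clustering (k-means) on the embedded points.
    _, labels = kmeans2(embedding, k, minit="++")
    return labels

# Toy usage: two well-separated Gaussian blobs should come back as two clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
print(spectral_clustering(X, k=2))
```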


Spectral Clustering using Graph Laplacian

[Figure: embedding of UMass CS faculty using the 2nd and 3rd eigenvectors of the graph Laplacian, with two clusters marked.
Cluster 1: Adler, Barrington, Immerman, Kurose, Rosenberg, Shenoy, Sitaraman, Towsley, Weems.
Cluster 2: Adrion, Allan, Avrunin, Barto, Brock, Clarke, Cohen, Croft, Grupen, Hanson, Jensen, Lehnert, Lesser, Levine, Mahadevan, Manmatha, McCallum, Moll, Moss, Osterweil, Riseman, Rissland, Schultz, Utgoff, Woolf, Zilberstein.]


Partitioning Graphs using the Combinatorial Laplacian

[Figure: a spatial environment with three rooms and the 2nd eigenvector of its graph Laplacian. The sign of the second eigenvector separates the vertices in Room 1 from those in Rooms 2 and 3.]
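A sketch of the same idea of partitioning by the sign of the second eigenvector (the Fiedler vector), using a hypothetical two-clique graph in place of the rooms environment from the slide:

```python
import numpy as np

# Hypothetical graph: two 4-vertex cliques joined by a single bridge edge.
A = np.zeros((8, 8))
for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for u in group:
        for v in group:
            if u != v:
                A[u, v] = 1.0
A[3, 4] = A[4, 3] = 1.0                 # the bridge between the two cliques

L = np.diag(A.sum(axis=1)) - A          # combinatorial Laplacian
_, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]                    # eigenvector of the second-smallest eigenvalue

# The sign of the Fiedler vector recovers the two natural pieces of the graph.
print(np.where(fiedler >= 0)[0], np.where(fiedler < 0)[0])
```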


Regression using the Graph Laplacian (Belkin and Niyogi, STOC 2005; Mahadevan, ICML 2005)


Regression using the Graph Laplacian


Comparison of Polynomial and Laplacian Basis Representations

[Figure: a desired target function on a small discrete domain, its polynomial basis approximation, and the mean squared error of the Laplacian vs. polynomial bases as a function of the number of basis functions (legend: Laplacian, Polynomial).]
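A minimal sketch of least-squares function approximation with a Laplacian eigenvector basis, here on a chain (path) graph; the chain length, target function, and number of basis functions are made up for illustration and are not the lecture's experiment:

```python
import numpy as np

n, k = 50, 10                              # number of vertices, number of basis functions

# Combinatorial Laplacian of an undirected chain (path) graph on n vertices.
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(A.sum(axis=1)) - A

_, vecs = np.linalg.eigh(L)
Phi = vecs[:, :k]                          # the k smoothest eigenvectors as basis functions

# Hypothetical target function defined on the vertices of the chain.
t = np.linspace(0.0, 1.0, n)
f = np.sin(2.0 * np.pi * t) + 0.5 * t

w, *_ = np.linalg.lstsq(Phi, f, rcond=None)    # least-squares fit of the basis weights
f_hat = Phi @ w
print("mean squared error:", np.mean((f - f_hat) ** 2))
```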


Approximation on a Grid

[Figure: the optimal value function on a grid, its least-squares approximation using automatically learned proto-value functions, and the mean-squared error of the Laplacian vs. polynomial state encodings as a function of the number of basis functions (legend: Laplacian, Polynomial).]


Nonlinear Function Approximation

[Figure: a nonlinear target function, its Laplacian least-squares function approximation, and its polynomial least-squares function approximation.]