
Page 1

Spectral Clustering

Royi Itzhak

Page 2

Spectral Clustering

• Algorithms that cluster points using eigenvectors of matrices derived from the data
• Obtain a data representation in a low-dimensional space that can be easily clustered
• A variety of methods use the eigenvectors differently
• Difficult to understand…

Page 3

Elements of Graph Theory

• A graph G = (V, E) consists of a vertex set V and an edge set E.
• If G is a directed graph, each edge is an ordered pair of vertices.
• A bipartite graph is one in which the vertices can be divided into two groups, so that all edges join vertices in different groups.

Page 4

Similarity Graph

• Represent the dataset as a weighted graph G(V, E)
• As distance decreases, similarity increases

[Figure: similarity graph on six vertices 1–6; edge weights $w_{12}=0.8$, $w_{13}=0.6$, $w_{15}=0.1$, $w_{23}=0.8$, $w_{34}=0.2$, $w_{45}=0.8$, $w_{46}=0.7$, $w_{56}=0.8$]

$V = \{v_1, v_2, \ldots, v_6\}$: set of $n$ vertices representing data points

$E = \{w_{ij}\}$: set of weighted edges indicating pair-wise similarity between points

Page 5

Similarity Graph

• $w_{ij}$ represents the similarity between vertices $i$ and $j$
• $w_{ij} = 0$ when there is no similarity
• $w_{ii} = 0$
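As a concrete illustration (not from the slides), a minimal numpy sketch that builds such a similarity matrix from a set of points with a Gaussian kernel; the bandwidth sigma and the toy data are hypothetical choices:

```python
import numpy as np

def similarity_matrix(X, sigma=1.0):
    """Gaussian-kernel similarity W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)),
    with W_ii = 0 as the slide requires."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)   # W_ii = 0
    return W

# toy usage: six 2-D points
X = np.random.default_rng(0).normal(size=(6, 2))
W = similarity_matrix(X, sigma=0.5)
assert np.allclose(W, W.T)     # symmetric, as expected of a similarity graph
```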

Page 6

Graph Partitioning

• Clustering can be viewed as partitioning a similarity graph
• Bi-partitioning task:
  – Divide the vertices into two disjoint groups (A, B), with $V = A \cup B$

[Figure: the six-vertex example graph split into groups A = {1, 2, 3} and B = {4, 5, 6}]

Graph partitioning is NP-hard.

Page 7

Clustering Objectives

• Traditional definition of a “good” clustering:
  1. Points assigned to the same cluster should be highly similar.
  2. Points assigned to different clusters should be highly dissimilar.
• Apply these objectives to our graph representation: minimize the weight of between-group connections.

[Figure: the six-vertex example graph; the low-weight edges $w_{15} = 0.1$ and $w_{34} = 0.2$ connect the two natural groups]

Page 8

Graph Cuts

• Express partitioning objectives as a function of the “edge cut” of the partition.
• Cut: the set of edges with only one vertex in a group. We want to find the minimal cut between groups; the grouping that achieves the minimal cut gives the partition:

$$\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} w_{ij}$$

[Figure: the six-vertex example graph with A = {1, 2, 3} and B = {4, 5, 6}; the cut edges are $w_{15} = 0.1$ and $w_{34} = 0.2$]

cut(A, B) = 0.3
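A small numpy sketch (illustrative, not part of the slides) that evaluates this cut on the example graph and reproduces cut(A, B) = 0.3:

```python
import numpy as np

# Adjacency matrix of the six-vertex example graph (symmetric, W_ii = 0)
W = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])

def cut(W, A, B):
    """cut(A, B) = sum of w_ij over i in A, j in B."""
    return W[np.ix_(A, B)].sum()

print(cut(W, A=[0, 1, 2], B=[3, 4, 5]))  # 0.3 (edges 1-5 and 3-4)
```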

Page 9

Graph Cut Criteria

• Criterion: Minimum-cut
  – Minimize the weight of connections between groups: min cut(A, B)
• Problem:
  – Only considers external cluster connections
  – Does not consider internal cluster density
• Degenerate case:

[Figure: a degenerate case where the minimum cut isolates a single outlying vertex, while the optimal cut separates the two natural groups]

Page 10

Graph Cut Criteria (continued)

• Criterion: Normalized-cut (Shi & Malik, ’97)
  – Consider the connectivity between groups relative to the density of each group:

$$\min \mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(B)}$$

  – Normalize the association between groups by volume.
    • vol(A): the total weight of the edges originating from group A.
• Why use this criterion?
  – Minimizing the normalized cut is equivalent to maximizing the normalized association.
  – Produces more balanced partitions.
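Continuing the illustrative numpy sketch, the normalized cut of the example partition:

```python
import numpy as np

# Rebuild the six-vertex example graph from its edge list
edges = {(0, 1): 0.8, (0, 2): 0.6, (0, 4): 0.1, (1, 2): 0.8,
         (2, 3): 0.2, (3, 4): 0.8, (3, 5): 0.7, (4, 5): 0.8}
W = np.zeros((6, 6))
for (i, j), w in edges.items():
    W[i, j] = W[j, i] = w

def ncut(W, A, B):
    """Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B)."""
    cut = W[np.ix_(A, B)].sum()
    vol = lambda S: W[S, :].sum()   # total weight of edges originating from S
    return cut / vol(A) + cut / vol(B)

print(ncut(W, A=[0, 1, 2], B=[3, 4, 5]))   # 0.3/4.7 + 0.3/4.9, about 0.125
```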

Page 11

Second Option

The previous criterion normalized the cut by the weight of each group; the following criterion instead normalizes by the size of each group:

$$\min \mathrm{Rcut}(A, B) = \frac{\mathrm{cut}(A, B)}{|A|} + \frac{\mathrm{cut}(A, B)}{|B|}$$

where $|A|$ is the number of vertices in A, $|B|$ is the number of vertices in B, and $|A| + |B| = N$.

Page 12

Example – 2 Spirals

[Figure: two interleaved spirals in the plane, with both coordinates ranging over roughly −2 to 2]

The dataset exhibits complex cluster shapes. K-means performs very poorly in this space due to its bias toward dense spherical clusters.

[Figure: the same points in the embedded space given by the two leading eigenvectors]

In the embedded space given by the two leading eigenvectors, the clusters are trivial to separate.

Page 13

Spectral Graph Theory

• Possible approach:
  – Represent a similarity graph as a matrix
  – Apply knowledge from linear algebra…
• Spectral Graph Theory:
  – Analyze the “spectrum” of the matrix representing a graph.
  – Spectrum: the eigenvectors of a graph, ordered by the magnitude (strength) of their corresponding eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$.
• The eigenvalues and eigenvectors of a matrix provide global information about its structure:

$$\begin{pmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \lambda \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$

Page 14

Matrix Representations

• Adjacency matrix (A)
  – $n \times n$ matrix
  – $A = [w_{ij}]$: edge weight between vertex $x_i$ and vertex $x_j$

        x1    x2    x3    x4    x5    x6
  x1    0     0.8   0.6   0     0.1   0
  x2    0.8   0     0.8   0     0     0
  x3    0.6   0.8   0     0.2   0     0
  x4    0     0     0.2   0     0.8   0.7
  x5    0.1   0     0     0.8   0     0.8
  x6    0     0     0     0.7   0.8   0

[Figure: the six-vertex example graph]

• Important properties:
  – Symmetric matrix
  – Eigenvalues are real
  – Eigenvectors can span an orthogonal basis

Page 15

Matrix Representations (continued)

• Degree matrix (D)
  – $n \times n$ diagonal matrix
  – $D(i, i) = \sum_j w_{ij}$: total weight of edges incident to vertex $x_i$
• Important application:
  – Normalizing the adjacency matrix

        x1    x2    x3    x4    x5    x6
  x1    1.5   0     0     0     0     0
  x2    0     1.6   0     0     0     0
  x3    0     0     1.6   0     0     0
  x4    0     0     0     1.7   0     0
  x5    0     0     0     0     1.7   0
  x6    0     0     0     0     0     1.5

[Figure: the six-vertex example graph]

Page 16

Matrix Representations (continued)

• Laplacian matrix (L)
  – $n \times n$ symmetric matrix: L = D − A

        x1     x2     x3     x4     x5     x6
  x1    1.5   −0.8   −0.6    0     −0.1    0
  x2   −0.8    1.6   −0.8    0      0      0
  x3   −0.6   −0.8    1.6   −0.2    0      0
  x4    0      0     −0.2    1.7   −0.8   −0.7
  x5   −0.1    0      0     −0.8    1.7   −0.8
  x6    0      0      0     −0.7   −0.8    1.5

[Figure: the six-vertex example graph]

• Important properties:
  – Eigenvalues are non-negative real numbers
  – Eigenvectors are real and orthogonal
  – Eigenvalues and eigenvectors provide an insight into the connectivity of the graph…
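A minimal numpy sketch (illustrative) that builds D and L = D − A for the six-vertex example and checks the stated properties:

```python
import numpy as np

# Adjacency matrix of the six-vertex example graph
A = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])
D = np.diag(A.sum(axis=1))        # degree matrix: D(i,i) = sum_j w_ij
L = D - A                         # unnormalized graph Laplacian

evals = np.linalg.eigvalsh(L)     # eigenvalues of a symmetric matrix, ascending
print(np.round(evals, 1))         # all non-negative; the smallest is 0
```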

Page 17

Another Option – the Normalized Laplacian

• Normalized Laplacian matrix
  – $n \times n$ symmetric matrix: $D^{-0.5} (D - A) D^{-0.5}$

   1.00  −0.52  −0.39   0.00  −0.06   0.00
  −0.52   1.00  −0.50   0.00   0.00   0.00
  −0.39  −0.50   1.00  −0.12   0.00   0.00
   0.00   0.00  −0.12   1.00  −0.47  −0.44
  −0.06   0.00   0.00  −0.47   1.00  −0.50
   0.00   0.00   0.00  −0.44  −0.50   1.00

• Important properties:
  – Eigenvectors are real and normalized
  – Each off-diagonal entry ($i \neq j$) equals $-\dfrac{A_{ij}}{\sqrt{D_{ii} D_{jj}}}$

[Figure: the six-vertex example graph]
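A short numpy sketch (illustrative) computing this normalized Laplacian for the same example graph:

```python
import numpy as np

A = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L_norm = D_inv_sqrt @ (np.diag(deg) - A) @ D_inv_sqrt   # D^-1/2 (D - A) D^-1/2
print(np.round(L_norm, 2))   # matches the slide, e.g. entry (x1, x2) = -0.52
```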

Page 18

Find an Optimal Min-Cut (Hall ’70, Fiedler ’73)

• Express a bi-partition (A, B) as a vector:

$$p_i = \begin{cases} +1 & \text{if } x_i \in A \\ -1 & \text{if } x_i \in B \end{cases}$$

• We can minimize the cut of the partition by minimizing

$$f(p) = \sum_{(i,j) \in E} w_{ij}\, (p_i - p_j)^2 = p^T L p \qquad (L \text{ is the Laplacian matrix})$$

• The Laplacian is positive semi-definite.
• The Rayleigh Theorem shows:
  – The minimum value of f(p) under the constraints is given by the 2nd smallest eigenvalue of the Laplacian L.
  – The optimal solution for p is given by the eigenvector corresponding to $\lambda_2$, referred to as the Fiedler vector.
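A quick numeric check (illustrative) of this identity: for any ±1 partition vector, $p^T L p = \sum_{(i,j) \in E} w_{ij}(p_i - p_j)^2 = 4\,\mathrm{cut}(A, B)$, which the proof slides below also use. The random graph here is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
W = np.triu(rng.random((8, 8)), 1)       # random upper-triangular weights
W = W + W.T                              # symmetric similarity matrix, zero diagonal
L = np.diag(W.sum(axis=1)) - W           # Laplacian

p = rng.choice([-1.0, 1.0], size=8)      # an arbitrary +-1 bi-partition vector
A = np.where(p > 0)[0]
B = np.where(p < 0)[0]
cut = W[np.ix_(A, B)].sum()
print(np.isclose(p @ L @ p, 4 * cut))    # True: p^T L p = 4 * cut(A, B)
```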

Page 19

Proof

• Based on “Consistency of Spectral Clustering” by Ulrike von Luxburg, Mikhail Belkin, and Olivier Bousquet, Max Planck Institute for Biological Cybernetics, pages 2–6.

Page 20

Proof

Some definitions:

$$\deg(i) = \sum_j w_{ij}, \qquad \mathrm{vol}(S) = \sum_{i \in S} \deg(i), \qquad b_w(g) = \min_S \mathrm{cut}(S, \bar{S})$$

Define $f_s$ as follows:

$$f_i = \begin{cases} +1 & i \in S \\ -1 & i \notin S \end{cases}$$

Then

$$f_s^T L f_s = \sum_{(i,j) \in E} w_{ij} (f_i - f_j)^2 = 4\,\mathrm{cut}(S, \bar{S})$$

(only vertices that have an edge between them and lie in different sets contribute), and

$$f_s^T D f_s = \sum_i \deg(i) = \mathrm{vol}(V)$$

(for each edge the sum appears on the diagonal). Also

$$\mathbf{1}^T D f_s = \sum_{i \in S} \deg(i) - \sum_{i \notin S} \deg(i) = \mathrm{vol}(S) - \mathrm{vol}(\bar{S}),$$

hence it equals zero only when $\mathrm{vol}(S) = \mathrm{vol}(\bar{S})$. The balanced min-cut can now be defined as

$$b_w(g) = \min_{f \in \{+1,-1\}^N,\ \mathbf{1}^T D f = 0} f^T L f$$

Page 21

Continued…

From simple algebra,

$$b_w(g) = \min_{f \in \{+1,-1\}^N,\ \mathbf{1}^T D f = 0} \frac{\mathrm{vol}(V)\ f^T L f}{4\ f^T D f}$$

and the relaxation lets $f$ range over all real vectors:

$$b_w(g) \approx \min_{f \in \mathbb{R}^n,\ \mathbf{1}^T D f = 0} \frac{f^T L f}{f^T D f}$$

The minimizers of this Rayleigh quotient satisfy the generalized eigenvalue problem $L f = \lambda\, D f$. Because the minimal eigenvalue is 0 it does not give us any information, and that is why $b_w(g)$ corresponds to the 2nd eigenvalue.
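The relaxed problem can be solved directly as a generalized eigenproblem. A scipy sketch (illustrative) on the six-vertex example; eigh(L, D) solves $Lf = \lambda Df$ with eigenvalues in ascending order:

```python
import numpy as np
from scipy.linalg import eigh

A = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])
D = np.diag(A.sum(axis=1))
L = D - A

evals, evecs = eigh(L, D)        # generalized problem L f = lambda D f
f = evecs[:, 1]                  # skip the uninformative 0 eigenvalue; take the 2nd
print(np.sign(f))                # signs separate {x1,x2,x3} from {x4,x5,x6}, up to a global sign
```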

Page 22

Spectral Clustering Algorithms

• Three basic stages:
  1. Pre-processing
     • Construct a matrix representation of the dataset.
  2. Decomposition
     • Compute the eigenvalues and eigenvectors of the matrix.
     • Map each point to a lower-dimensional representation based on one or more eigenvectors.
  3. Grouping
     • Assign points to two or more clusters, based on the new representation.

Page 23

Spectral Bi-Partitioning Algorithm

1. Pre-processing
   – Build the Laplacian matrix L of the graph:

        x1     x2     x3     x4     x5     x6
  x1    1.5   −0.8   −0.6    0     −0.1    0
  x2   −0.8    1.6   −0.8    0      0      0
  x3   −0.6   −0.8    1.6   −0.2    0      0
  x4    0      0     −0.2    1.7   −0.8   −0.7
  x5   −0.1    0      0     −0.8    1.7   −0.8
  x6    0      0      0     −0.7   −0.8    1.5

2. Decomposition
   – Find the eigenvalues Λ and eigenvectors X of the matrix L:

   Λ = (0.0, 0.4, 2.2, 2.3, 2.5, 3.0), with X the matrix of corresponding eigenvectors (next slide)

   – Map the vertices to the corresponding components of the eigenvector for $\lambda_2$:

  x1:  0.2
  x2:  0.2
  x3:  0.2
  x4: −0.4
  x5: −0.7
  x6: −0.7

Page 24

Spectral Bi-Partitioning Algorithm

The matrix of eigenvectors of the Laplacian; each column is the eigenvector matched to the corresponding eigenvalue, with the eigenvalues in increasing order:

  x1:  0.41  −0.41  −0.65  −0.31  −0.38   0.11
  x2:  0.41  −0.44   0.01   0.30   0.71   0.22
  x3:  0.41  −0.37   0.64   0.04  −0.39  −0.37
  x4:  0.41   0.37   0.34  −0.45   0.00   0.61
  x5:  0.41   0.41  −0.17  −0.30   0.35  −0.65
  x6:  0.41   0.45  −0.18   0.72  −0.29   0.09

Page 25

Spectral Bi-Partitioning (continued)

• Grouping
  – Sort the components of the reduced 1-dimensional vector.
  – Identify clusters by splitting the sorted vector in two.
• How to choose a splitting point?
  – Naïve approaches:
    • Split at 0, or at the mean or median value
  – More expensive approaches:
    • Attempt to minimize the normalized cut criterion in 1 dimension

Split at 0: Cluster A takes the positive points, Cluster B the negative points:

  A: x1 (0.2), x2 (0.2), x3 (0.2)
  B: x4 (−0.4), x5 (−0.7), x6 (−0.7)
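Putting the three stages together, a compact numpy sketch (illustrative) of the whole bi-partitioning algorithm on the example graph:

```python
import numpy as np

A = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])

# 1. Pre-processing: build the Laplacian matrix
L = np.diag(A.sum(axis=1)) - A

# 2. Decomposition: eigenvectors, sorted by increasing eigenvalue
_, X = np.linalg.eigh(L)
fiedler = X[:, 1]                          # components for the 2nd smallest eigenvalue

# 3. Grouping: split at 0 (the overall sign of the eigenvector is arbitrary)
cluster_A = np.where(fiedler > 0)[0]
cluster_B = np.where(fiedler <= 0)[0]
print(cluster_A, cluster_B)                # {x1,x2,x3} vs {x4,x5,x6}, up to a sign flip
```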

Page 26

3 Clusters

Let's assume the following data points:

[Figure: the example data points, forming three clusters]

Page 27

• If we use the 2nd eigenvector
• If we use the 3rd eigenvector

Page 28

K-Way Spectral Clustering

• How do we partition a graph into k clusters?
• Two basic approaches:
  1. Recursive bi-partitioning (Hagen et al., ’91)
     • Recursively apply the bi-partitioning algorithm in a hierarchical divisive manner.
     • Disadvantages: inefficient (each eigen-decomposition costs $O(n^3)$), unstable.
  2. Cluster multiple eigenvectors (Shi & Malik, ’00)
     • Build a reduced space from multiple eigenvectors.
     • Commonly used in recent papers.
     • A preferable approach… but it is like doing PCA and then k-means.

Page 29

Recursive Bi-Partitioning (Hagen et al., ’91)

• Partition using only one eigenvector at a time
• Use the procedure recursively
• Example: image segmentation
  – Uses the 2nd (smallest) eigenvector to define the optimal cut
  – Recursively generates two clusters with each cut

Page 30

Why Use Multiple Eigenvectors?

1. Approximates the optimal cut (Shi & Malik, ’00)
   – Can be used to approximate the optimal k-way normalized cut.
2. Emphasizes cohesive clusters (Brand & Huang, ’02)
   – Increases the unevenness in the distribution of the data.
   – Associations between similar points are amplified, associations between dissimilar points are attenuated.
   – The data begins to “approximate a clustering”.
3. Well-separated space
   – Transforms the data to a new “embedded space”, consisting of k orthogonal basis vectors.
• NB: Multiple eigenvectors prevent instability due to information loss.

Page 31

K-Eigenvector Clustering

• K-eigenvector algorithm (Ng et al., ’01)
  1. Pre-processing
     – Construct the scaled adjacency matrix $A' = D^{-1/2} A D^{-1/2}$
  2. Decomposition
     • Find the eigenvalues and eigenvectors of A′.
     • Build the embedded space from the eigenvectors corresponding to the k largest eigenvalues.
  3. Grouping
     • Apply k-means to the reduced $n \times k$ space to produce k clusters.
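A sketch of this algorithm (illustrative; Ng et al. additionally normalize the rows of the embedding, a step the slides leave implicit, and scipy's kmeans2 stands in for any k-means implementation):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def k_eigenvector_clustering(A, k, seed=0):
    """Spectral k-way clustering in the style of Ng et al. '01 (sketch)."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_scaled = D_inv_sqrt @ A @ D_inv_sqrt              # A' = D^-1/2 A D^-1/2

    _, evecs = np.linalg.eigh(A_scaled)                 # ascending eigenvalues
    U = evecs[:, -k:]                                   # eigenvectors of the k largest
    U /= np.linalg.norm(U, axis=1, keepdims=True)       # row normalization (Ng et al.)

    _, labels = kmeans2(U, k, minit='++', seed=seed)    # k-means in the n x k space
    return labels
```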

Page 32

Aside: How to Select k?

• Eigengap: the difference between two consecutive eigenvalues, $\Delta_k = |\lambda_k - \lambda_{k+1}|$.
• The most stable clustering is generally given by the value k that maximizes this expression.

[Figure: the largest eigenvalues of the Cisi/Medline data plotted for k = 1…20; $\lambda_1$ and $\lambda_2$ stand well above the rest]

Here the eigengap is maximized at k = 2, so choose k = 2.
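A tiny sketch (illustrative; the eigenvalues are made up) of this eigengap rule:

```python
import numpy as np

def choose_k(eigenvalues):
    """Return the k maximizing the eigengap |lambda_k - lambda_{k+1}|
    over eigenvalues sorted in decreasing order."""
    lam = np.sort(np.asarray(eigenvalues))[::-1]
    gaps = np.abs(lam[:-1] - lam[1:])        # gaps[i] is the gap after lambda_{i+1}
    return int(np.argmax(gaps)) + 1

print(choose_k([48.0, 44.0, 4.0, 3.5, 3.0]))  # 2: the big drop comes after lambda_2
```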

Page 33

Conclusion

• Clustering as a graph partitioning problem
  – The quality of a partition can be determined using graph cut criteria.
  – Identifying an optimal partition is NP-hard.
• Spectral clustering techniques
  – An efficient approach to calculate near-optimal bi-partitions and k-way partitions.
  – Based on well-known cut criteria and a strong theoretical background.

Page 34

Selecting relevant genes with a spectral approach

Page 35

Genomics

[Figure: gene-expression matrix; rows are gene expressions, columns are tissue samples]

The microarray technology provides many measurements of gene expressions for different sample tissues.

Goal: recognizing the relevant genes that separate between cells with different biological characteristics (normal vs. tumor, different subclasses of tumor cells).

• Classification of tissue samples (type of cancer, normal vs. tumor)
• Find novel subclasses (unsupervised)
• Find genes responsible for classification (new insights for drug design)

Few samples (~50) and large dimension (~10,000).

Page 36

Problem Definition

Let the microarray data matrix be defined by $M = [M_1, \ldots, M_q] \in R^{n \times q}$: each column $M_i$ is a tissue vector, and the gene expression levels form the rows $m_1^T, m_2^T, \ldots, m_n^T$ of M. Each gene vector is normalized, $\sum_j m_{ij}^2 = 1$.

The problem is to select the subset of rows $m_{i_1}^T, \ldots, m_{i_s}^T$ which are most “relevant” with respect to an inference (learning) task.

Page 37

Feature Subset Relevance - Key Idea

[Figure: selecting a subset of the rows $m_1^T, \ldots, m_n^T$ of $M = [M_1, \ldots, M_q]$ maps each column $M_i \in R^n$ to a reduced column $\hat{M}_i \in R^l$, with $l \ll n$]

Working assumption: the relevant subset of rows induces columns that are coherently clustered.

Page 38

• How to measure cluster coherency? We wish to avoid explicitly clustering for each subset of rows, and we want a measure which is amenable to continuous functional analysis.

Key idea: use spectral information from the affinity matrix $\hat{M}^T \hat{M}$.

• How to represent $\hat{M}^T \hat{M}$? For a subset of features $s = \{i_1, \ldots, i_l\}$, let

$$\alpha_i = \begin{cases} 1 & i \in s \\ 0 & \text{otherwise} \end{cases}$$

so that

$$A = \hat{M}^T \hat{M} = \sum_{i=1}^n \alpha_i\, m_i m_i^T.$$

Entry (i, j) of A is the correlation between the i'th and j'th columns (A is symmetric positive).

Page 39

Definition of Relevancy: The Standard Spectrum

General idea: select a subset of rows from the sample matrix M such that the resulting affinity matrix has high values associated with its first k eigenvalues.

With $s = \{i_1, \ldots, i_l\}$, $\alpha_i = 1$ if $i \in s$ and $0$ otherwise, and $A = \sum_{i=1}^n \alpha_i\, m_i m_i^T$:

$$\mathrm{rel}(x_{i_1}, \ldots, x_{i_l}) = \mathrm{trace}(Q^T A^T A Q) = \sum_{j=1}^k \lambda_j^2$$

where Q consists of the first k eigenvectors of A.
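A numpy sketch (illustrative; the matrix M and the chosen subset are hypothetical) of this relevance score, which equals the sum of the k largest eigenvalues squared:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(20, 10))                     # 20 hypothetical genes x 10 tissues
M /= np.linalg.norm(M, axis=1, keepdims=True)     # normalized gene rows

def relevance(M, alpha, k):
    """rel = trace(Q^T A^T A Q), with A = sum_i alpha_i m_i m_i^T
    and Q the first k eigenvectors of A."""
    A = (M * alpha[:, None]).T @ M                # = sum_i alpha_i m_i m_i^T
    evals, evecs = np.linalg.eigh(A)              # ascending order
    Q = evecs[:, -k:]                             # top-k eigenvectors
    return np.trace(Q.T @ A.T @ A @ Q)

alpha = np.zeros(20)
alpha[:5] = 1.0                                   # a hypothetical subset: genes 0..4
print(relevance(M, alpha, k=2))
```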

Page 40

Optimization Problem

$$\max_{Q,\, \alpha_1, \ldots, \alpha_n} \mathrm{trace}(Q^T A^T A Q) \quad \text{subject to} \quad A = \sum_{i=1}^n \alpha_i\, m_i m_i^T, \qquad Q^T Q = I, \qquad \sum_{i=1}^n \alpha_i^2 = 1$$

Let $\alpha = (\alpha_1, \ldots, \alpha_n)^T$ for some unknown real scalars.

Motivation: from spectral clustering it is known that the eigenvectors tend to be discontinuous, and that may lead to an effortless sparsity property.

Page 41

The Algorithm

$$\max_{Q,\, \alpha_1, \ldots, \alpha_n} \mathrm{trace}(Q^T A^T A Q) \quad \text{subject to} \quad Q^T Q = I, \qquad \alpha^T \alpha = 1$$

• If $\alpha$ were known, then A is known, and Q is simply the first k eigenvectors of A.
• If Q were known, then the problem becomes

$$\max_\alpha\ \alpha^T G \alpha \quad \text{subject to} \quad \alpha^T \alpha = 1, \qquad \text{where} \quad G_{ij} = (m_i^T m_j)\,(m_i^T Q Q^T m_j),$$

and $\alpha$ is the largest eigenvector of G.

Page 42

The Algorithm: Power-Embedded Q-α

1. Let $G^{(r)}$ be defined by $G^{(r)}_{ij} = (m_i^T m_j)\,\big(m_i^T Q^{(r-1)} Q^{(r-1)T} m_j\big)$
2. Let $\alpha^{(r)}$ be the largest eigenvector of $G^{(r)}$
3. Let $A^{(r)} = \sum_{i=1}^n \alpha_i^{(r)}\, m_i m_i^T$
4. Let $Z^{(r)} = A^{(r)} Q^{(r-1)}$ (orthogonal iteration)
5. $Z^{(r)} \xrightarrow{QR} Q^{(r)} R^{(r)}$ (“QR” factorization step)
6. Increment r

Note that r is the index of iterations.
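A compact numpy sketch of this Q-α orthogonal iteration, under stated assumptions: the data M is random and hypothetical, Q is initialized from a random orthonormal matrix, the iteration count is fixed rather than convergence-tested, and the eigenvector sign ambiguity is resolved by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(50, 12))                       # 50 hypothetical genes x 12 samples
M /= np.linalg.norm(M, axis=1, keepdims=True)       # normalized gene rows m_i^T
k = 2

Q = np.linalg.qr(rng.normal(size=(M.shape[1], k)))[0]   # random orthonormal start
for r in range(30):
    P = Q @ Q.T
    inner = M @ M.T                                  # m_i^T m_j
    G = inner * (M @ P @ M.T)                        # G_ij = (m_i^T m_j)(m_i^T QQ^T m_j)
    evals, evecs = np.linalg.eigh(G)
    alpha = evecs[:, -1]                             # largest eigenvector of G
    if alpha.sum() < 0:
        alpha = -alpha                               # fix the arbitrary eigenvector sign
    A = (M * alpha[:, None]).T @ M                   # A = sum_i alpha_i m_i m_i^T
    Z = A @ Q                                        # orthogonal iteration step
    Q, _ = np.linalg.qr(Z)                           # "QR" factorization step

print(np.round(alpha[:10], 3))                       # feature weights; tend to be sparse
```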

Page 43

The algorithm satisfies three conditions:

• The algorithm converges to a local maximum
• At the local maximum, $\alpha_i \geq 0$
• The vector $\alpha$ is sparse

Page 44

The Experiment

• Given some datasets: blood cells
  – Myeloid cells: HL-60, U937
  – T cells: Jurkat
  – Leukemia cells: NB4
• The dimensionality of the expression data was 7,229 genes over 17 samples
• The goal is to find clusters of the expression levels of the genes without any restriction

Page 45

SOM

Principle of SOMs: the initial geometry of nodes in a 3 × 2 rectangular grid is indicated by solid lines connecting the nodes. Hypothetical trajectories of nodes as they migrate to fit the data during successive iterations of the SOM algorithm are shown. Data points are represented by black dots, the six nodes of the SOM by large circles, and trajectories by arrows.

Copyright ©1999 by the National Academy of Sciences.

Page 46

S.O.M. Results

• The time-course data was clustered with SOM after pre-processing of the data
• The result was 24 clusters, each consisting of 6–113 genes

Page 47

Q-α

After applying the Q-α algorithm, the set of relevant genes was found to consist of a small number of relevant genes. In the plot of the sorted α values, the profile indicates sparsity, meaning that around 95% of the values are an order of magnitude smaller than the remaining 5%.

Page 48

Continued

A plot of 6 of the top 40 genes that correspond to clusters 20, 1, 22/23, 4, 15, and 21. In each of the six panels the time courses of all four cell lines are shown (left to right): HL-60, U937, NB4, Jurkat.

Page 49

Another Example

• Other datasets:
  1. DLCL, referred to as “lymphoma”: 7,129 genes over 56 samples
  2. Childhood medulloblastomas, referred to as “brain”: the dimensionality of this dataset was 7,129 and there were 60 samples
  3. Breast tumors, referred to as “breast met”: the dimensionality of this dataset was 24,624 and there were 44 samples
  4. Breast tumors for which the corresponding lymph nodes either were cancerous or not, referred to as “lymph status”: the dimensionality of this dataset is 12,600 with 90 samples

Page 50

The Results, Using the Leave-One-Out Algorithm

• Compared to unsupervised methods like PCA and GS, and to supervised methods like SNR, RMB, and RFE

Page 51

The slide you've all been waiting for