data mining with graphs and matrices · fei wang, tao li, chris ding data mining with graphs &...
TRANSCRIPT
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Data Mining with Graphs and Matrices
Fei Wang1 Tao Li1 Chris Ding2
1School of Computing and Information SciencesFlorida International University
2Department of Computer Science and EngineeringUniversity of Texas at Arlington
Tutorial at SDM 2009, Sparks, Nevada
http://feiwang03.googlepages.com/sdm-tutorial
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Table of Contents
1 Graphs and Matrices are Everywhere
2 Unsupervised Learning with Graphs & Matrices
Dimensionality Reduction
Clustering
Co-Clustering
3 Semi-supervised Learning with Graphs & Matrices
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
4 Future Research Directions
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Internet Graph
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Citation Graph
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Friendship Graph
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Airline Graph
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Protein Interaction Graph
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Social Network Analysis
Email network
Represents the email communications between users
Cluster usersIdentify communities
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Document-Term Matrices
A collection of documents is represented by annDoc-by-nTerm matrix (bag-of-words model).
Cluster or classify documents
Find a subset of terms that (accurately) clusters orclassifies documents
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Recommendation Systems
Collaborative filtering
Given the users’ historical data, predict the ratings of a
specific user to a new movie
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Bioinformatics
Gene expression data
Pick a subset of genes (if it exists) that suffices in order to
identify the “cancer type" of a patient
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Some Notations & Preliminaries
The data matrix X = [x1,x2, · · · ,xn] ! Rd!n
Generally, a graph G = "V, E# can be described as a matrix
The columns and rows are indexed by VThe elements are the strengths on the corresponding
edges in EAnalyzing graphs is usually equivalent to perform analysis
on matrices
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Table of Contents
1 Graphs and Matrices are Everywhere
2 Unsupervised Learning with Graphs & Matrices
Dimensionality Reduction
Clustering
Co-Clustering
3 Semi-supervised Learning with Graphs & Matrices
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
4 Future Research Directions
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Singular Value Decomposition
x1 x2 xn u1 uk s1
sk
vT1
vTk
Best rank-k approximation in Frobenius norm
Exact computation of SVD takes O(min(dn2,d2n)) time.
The top k left/right singular vectors/values can be
computed faster using Lanczos/Arnoldi methods.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Singular Value Decomposition
x1 x2 xn u1 uk s1
sk
vT1
vTk
Best rank-k approximation in Frobenius norm
Exact computation of SVD takes O(min(dn2,d2n)) time.
The top k left/right singular vectors/values can be
computed faster using Lanczos/Arnoldi methods.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Singular Value Decomposition
x1 x2 xn u1 uk s1
sk
vT1
vTk
Best rank-k approximation in Frobenius norm
Exact computation of SVD takes O(min(dn2,d2n)) time.
The top k left/right singular vectors/values can be
computed faster using Lanczos/Arnoldi methods.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Singular Value Decomposition
x1 x2 xn u1 uk s1
sk
vT1
vTk
Best rank-k approximation in Frobenius norm
Exact computation of SVD takes O(min(dn2,d2n)) time.
The top k left/right singular vectors/values can be
computed faster using Lanczos/Arnoldi methods.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Latent Semantic Analysis
k-dimensional semantic structure
Similarity on reduced-space: D-D, D-T, T-T
Folding-in queries: q̂ = S"1k Vkq
X
Do
cu
me
nts
Terms LSA TermVectors
Uk
SingularValues
Sk
LSA DocumentVectors
VTk
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Principal Component Analysis
Find a projection vector u ! Rd!1, such that the projected
data points Y = uTX own the largest variance, i.e., we
should solve the following optimization problem
maxu uT1
n
!n"
i=1
(xi $m)(xi $m)T#
u
s.t . %u%2 = 1 (1)
From the standard theorem of Rayleigh-Ritz, we know that
the optimal u is the eigenvector of the data covariance
matrix C corresponding to its largest eigenvalue.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Principal Component Analysis
Find a projection vector u ! Rd!1, such that the projected
data points Y = uTX own the largest variance, i.e., we
should solve the following optimization problem
maxu uT1
n
!n"
i=1
(xi $m)(xi $m)T#
u
s.t . %u%2 = 1 (1)
From the standard theorem of Rayleigh-Ritz, we know that
the optimal u is the eigenvector of the data covariance
matrix C corresponding to its largest eigenvalue.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
PCA & SVD
If X is centralized, then the covariance matrix C = 1nXX
T
Eigenvalue decomposition C = U!UT = 1nXX
T
SVD of X: X = USVT1nXXT = 1
nUSV
TVSU
T = U 1nS2UT
Let ! = 1nS
2, then PCA=SVD
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
PCA & SVD
If X is centralized, then the covariance matrix C = 1nXX
T
Eigenvalue decomposition C = U!UT = 1nXX
T
SVD of X: X = USVT1nXXT = 1
nUSV
TVSU
T = U 1nS2UT
Let ! = 1nS
2, then PCA=SVD
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
PCA & SVD
If X is centralized, then the covariance matrix C = 1nXX
T
Eigenvalue decomposition C = U!UT = 1nXX
T
SVD of X: X = USVT1nXXT = 1
nUSV
TVSU
T = U 1nS2UT
Let ! = 1nS
2, then PCA=SVD
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
PCA & SVD
If X is centralized, then the covariance matrix C = 1nXX
T
Eigenvalue decomposition C = U!UT = 1nXX
T
SVD of X: X = USVT1nXXT = 1
nUSV
TVSU
T = U 1nS2UT
Let ! = 1nS
2, then PCA=SVD
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
PCA & SVD
If X is centralized, then the covariance matrix C = 1nXX
T
Eigenvalue decomposition C = U!UT = 1nXX
T
SVD of X: X = USVT1nXXT = 1
nUSV
TVSU
T = U 1nS2UT
Let ! = 1nS
2, then PCA=SVD
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
PCA & SVD
If X is centralized, then the covariance matrix C = 1nXX
T
Eigenvalue decomposition C = U!UT = 1nXX
T
SVD of X: X = USVT1nXXT = 1
nUSV
TVSU
T = U 1nS2UT
Let ! = 1nS
2, then PCA=SVD
First PCSecond PC
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Nonlinear Embedding
PCA is a linear method to project the data points
What should we do if the data are nonlinearly distributed?
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Manifold & Graph
We usually assume that the high-dimensional data points
reside (nearly) on a low-dimensional nonlinear manifold
Find the low-dimensional embeddings of the data which
preserve the graph structure
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Manifold & Graph
We usually assume that the high-dimensional data points
reside (nearly) on a low-dimensional nonlinear manifold
Find the low-dimensional embeddings of the data which
preserve the graph structure
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Local Linear Embedding (LLE)
Assume each data point can be linearly reconstructed from
its neighborhood, i.e., for each xi , we minimize
!i ="
xj#Ni
%xi $ wijxj%2
s.t ."
jwij = 1 (2)
Then we use all {wij} to recover the low-dimensional
embedding of the data points Y by solving
J ="n
i=1%yi $
"yj#Ni
wijyj%2
s.t . YTY = I (3)Y = [y1, · · · , yn] is the low-dimensional embedded datamatrix
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
An Example of LLE
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Laplacian Eigenmaps (LE)
The embedded data should be sufficiently smooth with
respect to the intrinsic data manifold.
We minimize
minY"
i$jwij%yi $ yj%2
s.t . YTY = I (4)
wij represents the similarity between xi and xj
Writing in matrix form$
i$j wij%yi $ yj%2 = tr(Y(D$W)YT )
W(i, j) = wij is the similarity matrix
D = diag($
j w1j , · · · ,$
j w2j)
We call L = D$W the Laplacian matrix
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Laplacian Eigenmaps (LE)
The embedded data should be sufficiently smooth with
respect to the intrinsic data manifold.
We minimize
minY"
i$jwij%yi $ yj%2
s.t . YTY = I (4)
wij represents the similarity between xi and xj
Writing in matrix form$
i$j wij%yi $ yj%2 = tr(Y(D$W)YT )
W(i, j) = wij is the similarity matrix
D = diag($
j w1j , · · · ,$
j w2j)
We call L = D$W the Laplacian matrix
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Laplacian Eigenmaps (LE)
The embedded data should be sufficiently smooth with
respect to the intrinsic data manifold.
We minimize
minY"
i$jwij%yi $ yj%2
s.t . YTY = I (4)
wij represents the similarity between xi and xj
Writing in matrix form$
i$j wij%yi $ yj%2 = tr(Y(D$W)YT )
W(i, j) = wij is the similarity matrix
D = diag($
j w1j , · · · ,$
j w2j)
We call L = D$W the Laplacian matrix
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Graph Similarities
Node similarities: sij = exp%$%xi"xj%2
2!2
&
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Locality Preserving Projections (LPP)
Linear version of Laplacian embedding
Let P be the projection matrix, then the goal of LPP is just
to solve the following problem
minP tr(PTXLXTP)
s.t . PTP = I (5)
Locality Preserving Indexing
Laplacianface
...
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Graph Embedding: A General Framework
A general graph embedding framework:
miny"
i$jpij%yi $ yj%2
s.t . yTAy = c (6)
i & j denotes that there is an edge connecting xi and xjc is a constant
Linearization:
minp"
i$jpij%pTxi $ pTxj%2
s.t . pTXAXTp = c (7)
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Graph Embedding: A General Framework
A general graph embedding framework:
miny"
i$jpij%yi $ yj%2
s.t . yTAy = c (6)
i & j denotes that there is an edge connecting xi and xjc is a constant
Linearization:
minp"
i$jpij%pTxi $ pTxj%2
s.t . pTXAXTp = c (7)
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Summarization of Different Methods from a GE
Perspective (Shuicheng Yan et al. CVPR’05)
Algorithm P A
PCA pij = 1/n, 'i (= j A = I
LDA pij = "l1,lj/nli A = I$ eeT
LLE P = W+WT $WTW A = I
LPP pij = exp($%xi $ xj%2/(2#2)) A = D
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Table of Contents
1 Graphs and Matrices are Everywhere
2 Unsupervised Learning with Graphs & Matrices
Dimensionality Reduction
Clustering
Co-Clustering
3 Semi-supervised Learning with Graphs & Matrices
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
4 Future Research Directions
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
K-means
The data points X comes from C clusters. We aim to find
the cluster centers {fi}Ci=1 together with the clusters such
that the following criterion is minimized
min
C"
i=1
"
xj#"i
%xj $ fi%2 (8)
$i denotes the i-th cluster
We can resort to iterative procedures to solve the problem.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
K-means Procedure
The figures come from
http://www.ccs.neu.edu/home/rjw/csg220/lectures/k-means.pdf.Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Graph Clustering
Partition the nodes V in graph G into disjoint clusters
Cut: Set of edges with points belonging to different clusters
Association: Set of edges with points belonging to the
same cluster
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Graph Cut Criteria
MinCut: Minimize the association between groups
min cut(A,B)Normalized graph cut criterions:
RatioAssociation: maxasso(A,A)
|A| + asso(B,B)|B|
RatioCut: mincut(A,B)
|A| + cut(B,A)|B|
NormalizedCut: mincut(A,B)vol(A) + cut(B,A)
vol(B)
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Graph Cut Criteria
MinCut: Minimize the association between groups
min cut(A,B)Normalized graph cut criterions:
RatioAssociation: maxasso(A,A)
|A| + asso(B,B)|B|
RatioCut: mincut(A,B)
|A| + cut(B,A)|B|
NormalizedCut: mincut(A,B)vol(A) + cut(B,A)
vol(B)
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Some Definitions on Graphs
Weight Matrix W: Wij is the weight on the edge eij
Degree Matrix D: Dii =$
jWij
Partition Matrix P: Pij = 1 if xi belongs to partition j ;
Otherwise Pij = 0
Scaled Partition Matrix 'P: 'Pij = 1/)nj if xi belongs to
partition j , nj is the size of the j-th cluster; Otherwise
P̃ij = 0
The goal of graph clustering is to solve P or 'P
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Some Definitions on Graphs
Weight Matrix W: Wij is the weight on the edge eij
Degree Matrix D: Dii =$
jWij
Partition Matrix P: Pij = 1 if xi belongs to partition j ;
Otherwise Pij = 0
Scaled Partition Matrix 'P: 'Pij = 1/)nj if xi belongs to
partition j , nj is the size of the j-th cluster; Otherwise
P̃ij = 0
The goal of graph clustering is to solve P or 'P
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Spectral Clustering
The solutions of the above optimization problems can be
finally obtained by spectral analysis of some matrices
Ratio association: Do eigenvalue decomposition toW
Ratio cut: Do eigenvalue decomposition to L = D$WNormalized cut: Do eigenvalue decomposition to
L̂ = I$ D"1/2WD"1/2
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Spectral Clustering
The solutions of the above optimization problems can be
finally obtained by spectral analysis of some matrices
Ratio association: Do eigenvalue decomposition toW
Ratio cut: Do eigenvalue decomposition to L = D$WNormalized cut: Do eigenvalue decomposition to
L̂ = I$ D"1/2WD"1/2
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Spectral Clustering II
Figure from Shi & Malik. PAMI 2000.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
The Eigenvectors of The Normalized Laplacian Matrix
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Nonnegative Matrix Factorization
Analyzing nonnegative matrices (document-word matrix,
image matrix...)
For a nonnegative matrix X, we decompose it into two
nonnegative matrices
minF!0,G!0
%X$ FGT%2 (9)
Multiplicative update rule to solve the problem
Fij *$ Fij(XG)ij
(FGTG)ij, Gij *$ Gij
(FTX)ij
(FTFGT )ij
Parts-based representation
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Nonnegative Matrix Factorization
Analyzing nonnegative matrices (document-word matrix,
image matrix...)
For a nonnegative matrix X, we decompose it into two
nonnegative matrices
minF!0,G!0
%X$ FGT%2 (9)
Multiplicative update rule to solve the problem
Fij *$ Fij(XG)ij
(FGTG)ij, Gij *$ Gij
(FTX)ij
(FTFGT )ij
Parts-based representation
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
NMF: An Illustrative Example
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Clustering Results on TDT Data
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Clustering Results on Reuters Data
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
NMF Variants
If the data matrix X has mixed signs, then
Singular Value Decomposition: X± + F±GT±
Semi-NMF: X± + F±GT+
Convex-NMF: X± + X±W+GT+
Kernel-NMF: %(X±) + %(X±)W+GT+
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
The Relationships Between NMF and K-means
K-means objective:
Jkm ="
c
"
xi#"c
%xi $ fc%2 =n"
i=1
c"
c=1
gic%xi $ fc%2
=(((X$ FGT
(((2
F
Cluster center matrix: F = [f1, f2, · · · , fC ] ! Rn!C
G ! Rn!C with Gij = gij , such that gij = 1, if xi ! $j ;Gij = 0,otherwise.
K-means and NMF: the same objective, only differentconstraint
NMF: F , 0, G , 0K-means: Gij ! {0, 1}, G1 = 1
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
The Relationships Between K-means and PCA
!k =$nk
i=1 %x(k)i $mk%2 = %Xk $mke
T%2
!k = trace)Xk(Ink $ eeT/nk )X
Tk
*
Finally,
! =$C
k=1 !k =$c
k=1
%trace(XTi Xi) $
%eT&nk
&XTk Xk
%eT&nk
&&
Let P̃ = diag(en1&n1
, · · · ,enC&nC
)
Then ! = trace(XTX) $ trace(P̃TXTXP̃) subject to P̃T P̃ = I
Therefore we need to maximize trace(P̃TXTXP̃) and get P̃.
According to the Ky Fan therorem, P̃ is composed of the
eigenvectors of XTX corresponding to its largest C
eigevalues
If X is centralized, then it is equivalent to PCA
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
The Relationships Between K-means and Spectral
Clustering
From last slide we can see that the relaxed solution of
kmeans is equivalent to analyze the eigenstructure of
A = XTX
If we define the similarity matrix W = A, then kmeans is
equivalent to ratio association
Define the weighted kmeans criterion
!̃ =$C
k=1
$xi#"k
wi%xi $mk%2
Using similar derivation procedure, we can derive that
optimizing the above criterion is equivalent to solve
maxP̃trace(P̃TD1/2WD1/2P̃) (10)
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
The Relationships Between NMF and Spectral
Clustering
Let the normalized similarity matrix be W̃ = D"1/2WD"1/2
Then we have the following theorem
Theorem
Normalized Cut using similarity W̃ is equivalent to the following
symmetric nonnegative matrix factorization
minP̃!0
J = %W̃$ P̃P̃T%2 (11)
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Table of Contents
1 Graphs and Matrices are Everywhere
2 Unsupervised Learning with Graphs & Matrices
Dimensionality Reduction
Clustering
Co-Clustering
3 Semi-supervised Learning with Graphs & Matrices
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
4 Future Research Directions
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
The Problem
Usually the data we face with are relational, i.e., there are
multiple type of data interrelated with each other
How to cluster those relational data simultaneously?
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
The Problem
Usually the data we face with are relational, i.e., there are
multiple type of data interrelated with each other
How to cluster those relational data simultaneously?
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
A Spectral Approach
Define the similarity matrix on the bi-partite graph
A =
+0 R
RT 0
,
Also the concatenated cluster membership vector
x = [xTI ,xTII ]T
Then the co-clustering problem becomes a graph-cut
problem on the bi-partite graph, i.e., we should solve the
following generalized eigenvalue decomposition problem
Lx = &Dx (12)
where D = diag($
j A1j , · · · ,$
j Anj ), L = D$ A
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
A Spectral Approach
Define the similarity matrix on the bi-partite graph
A =
+0 R
RT 0
,
Also the concatenated cluster membership vector
x = [xTI ,xTII ]T
Then the co-clustering problem becomes a graph-cut
problem on the bi-partite graph, i.e., we should solve the
following generalized eigenvalue decomposition problem
Lx = &Dx (12)
where D = diag($
j A1j , · · · ,$
j Anj ), L = D$ A
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
A Spectral Approach
Define the similarity matrix on the bi-partite graph
A =
+0 R
RT 0
,
Also the concatenated cluster membership vector
x = [xTI ,xTII ]T
Then the co-clustering problem becomes a graph-cut
problem on the bi-partite graph, i.e., we should solve the
following generalized eigenvalue decomposition problem
Lx = &Dx (12)
where D = diag($
j A1j , · · · ,$
j Anj ), L = D$ A
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Nonnegative Matrix Tri-Factorization
Factorize the user-movie rating matrix X into threematrices F,S,G, such that
F represents the cluster memberships on the user sideG represents the cluster memberships on the movie side
By relaxing the integer constrains on F,G, we need to
solve the following optimization problem
minF'0,S'0,G'0
%X$ FSGT%2, s.t . FTF = I, GGT = I (13)
We can derive some multiplicative update rules to solve for
the optimal F,S,G
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
An Example of NMTF
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Dimensionality Reduction
Clustering
Co-Clustering
Other Types of Co-Clustering Methods
Information-Theoretic Co-clustering (Dhillon et al. KDD’03)
Bayesian Co-Clustering (Shan & Banerjee. ICDM’08)
Tensor Method (Banerjee et al. SDM’07)
Collective Factorization on Related Matrices (Long et al.
ICML’06)
Multiple Latent Semantic Analysis (Wang et al. SIGIR’06)
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Why Semi-supervised Learning
Traditional learning problems
Supervised learning: learning with labeled data set
Unsupervised learning: learning with unlabeled data set
Problems
Supervised learning: requires much human effort,
expensive and time consuming
Unsupervised learning: unreliable
Semi-supervised learning
Learning with partially labeled data
Learning with side-information
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Why Semi-supervised Learning
Traditional learning problems
Supervised learning: learning with labeled data set
Unsupervised learning: learning with unlabeled data set
Problems
Supervised learning: requires much human effort,
expensive and time consuming
Unsupervised learning: unreliable
Semi-supervised learning
Learning with partially labeled data
Learning with side-information
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Why Semi-supervised Learning
Traditional learning problems
Supervised learning: learning with labeled data set
Unsupervised learning: learning with unlabeled data set
Problems
Supervised learning: requires much human effort,
expensive and time consuming
Unsupervised learning: unreliable
Semi-supervised learning
Learning with partially labeled data
Learning with side-information
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
The Similarity Between SSL and Ranking
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
The Similarity Between SSL and Collaborative
Filtering
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Table of Contents
1 Graphs and Matrices are Everywhere
2 Unsupervised Learning with Graphs & Matrices
Dimensionality Reduction
Clustering
Co-Clustering
3 Semi-supervised Learning with Graphs & Matrices
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
4 Future Research Directions
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Semi-supervised Assumption
Smoothness Assumption: If two points x1,x2 in a
high-density region are close, then so should be the
corresponding outputs y1, y2
Cluster Assumption: If points are in the same cluster, they
are likely to be of the same class
Manifold Assumption: The (high-dimensional) data lie
(roughly) on a low-dimensional manifold
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Label Propagation
Connect the data points that are close to each other
(Nearest Neighbor Graph)
Propagate the class labels over the connected graph
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Propagation Rules
Initial label vector: y ! Rn!1
yi = ti if xi is labeled as ti ; yi = 0 if xi is unlabeled
f(1)i = yi if xi is labeled; f
(1)i = '
$xj#Ni
Pijyj otherwise
P ! Rn!n is the propagation matrixMatrix form: f(1) = y+ 'Py
f(2) = f(1) + 'Pf(1) = (I+ 'P+ '2P2)y
Finally f(() =$(
i=0 'iPiy = (I$ 'P)"1y
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Propagation Rules
Initial label vector: y ! Rn!1
yi = ti if xi is labeled as ti ; yi = 0 if xi is unlabeled
f(1)i = yi if xi is labeled; f
(1)i = '
$xj#Ni
Pijyj otherwise
P ! Rn!n is the propagation matrixMatrix form: f(1) = y+ 'Py
f(2) = f(1) + 'Pf(1) = (I+ 'P+ '2P2)y
Finally f(() =$(
i=0 'iPiy = (I$ 'P)"1y
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Propagation Rules
Initial label vector: y ! Rn!1
yi = ti if xi is labeled as ti ; yi = 0 if xi is unlabeled
f(1)i = yi if xi is labeled; f
(1)i = '
$xj#Ni
Pijyj otherwise
P ! Rn!n is the propagation matrixMatrix form: f(1) = y+ 'Py
f(2) = f(1) + 'Pf(1) = (I+ 'P+ '2P2)y
Finally f(() =$(
i=0 'iPiy = (I$ 'P)"1y
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Propagation Rules
Initial label vector: y ! Rn!1
yi = ti if xi is labeled as ti ; yi = 0 if xi is unlabeled
f(1)i = yi if xi is labeled; f
(1)i = '
$xj#Ni
Pijyj otherwise
P ! Rn!n is the propagation matrixMatrix form: f(1) = y+ 'Py
f(2) = f(1) + 'Pf(1) = (I+ 'P+ '2P2)y
Finally f(() =$(
i=0 'iPiy = (I$ 'P)"1y
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
An Example
!1.5 !1 !0.5 0 0.5 1 1.5 2 2.5!0.8
!0.6
!0.4
!0.2
0
0.2
0.4
0.6
0.8
1
1.2
!1.5 !1 !0.5 0 0.5 1 1.5 2 2.5!0.8
!0.6
!0.4
!0.2
0
0.2
0.4
0.6
0.8
1
1.2
!1.5 !1 !0.5 0 0.5 1 1.5 2 2.5!0.8
!0.6
!0.4
!0.2
0
0.2
0.4
0.6
0.8
1
1.2
!1.5 !1 !0.5 0 0.5 1 1.5 2 2.5!0.8
!0.6
!0.4
!0.2
0
0.2
0.4
0.6
0.8
1
1.2
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
The Calculation of P
Asymmetrically Normalized Similarity Matrix:
P = D"1W
Symmetrically Normalized Similarity Matrix:
P = D"1/2WD"1/2
How to determine the optimal # when computing Wij?
Linear Neighborhood Similarity
minWij%xi $
"
xj#Ni
Wijxj%2
s.t ."
j
Wij = 1, Wij , 0
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
The Calculation of P
Asymmetrically Normalized Similarity Matrix:
P = D"1W
Symmetrically Normalized Similarity Matrix:
P = D"1/2WD"1/2
How to determine the optimal # when computing Wij?
Linear Neighborhood Similarity
minWij%xi $
"
xj#Ni
Wijxj%2
s.t ."
j
Wij = 1, Wij , 0
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
The Calculation of P
Asymmetrically Normalized Similarity Matrix:
P = D"1W
Symmetrically Normalized Similarity Matrix:
P = D"1/2WD"1/2
How to determine the optimal # when computing Wij?
Linear Neighborhood Similarity
minWij%xi $
"
xj#Ni
Wijxj%2
s.t ."
j
Wij = 1, Wij , 0
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
A Regularization Framework
Label consistency: the predicted labels should be
sufficiently close to the initial labels on the labeled data
points
Label smoothness: the predicted labels should be
sufficiently smooth with respect to the data manifold
(graph)
minf
l"
i=1
(fi $ ti)2 +
n"
i=l+1
f 2i + µ
"
i$jwij(fi $ fj)
2
The first term reflects label consistencyThe second term guarantees the predicted label values
should fall in a reasonable range for numerical stabilityThe third term reflects label smoothness
f = (I+ µL)"1y
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
A Regularization Framework
Label consistency: the predicted labels should be
sufficiently close to the initial labels on the labeled data
points
Label smoothness: the predicted labels should be
sufficiently smooth with respect to the data manifold
(graph)
minf
l"
i=1
(fi $ ti)2 +
n"
i=l+1
f 2i + µ
"
i$jwij(fi $ fj)
2
The first term reflects label consistencyThe second term guarantees the predicted label values
should fall in a reasonable range for numerical stabilityThe third term reflects label smoothness
f = (I+ µL)"1y
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Experimental Results on 20Newsgroup Data
autos, motorcycles, baseball, and hockey under rec
4 10 15 20 25 30 35 40 45 500.2
0.3
0.4
0.5
0.6
0.7
0.8
Classification Results on 20 Newsgroup Data
(a)
LNPConsistencyGaussian FieldsSVMNN
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Table of Contents
1 Graphs and Matrices are Everywhere
2 Unsupervised Learning with Graphs & Matrices
Dimensionality Reduction
Clustering
Co-Clustering
3 Semi-supervised Learning with Graphs & Matrices
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
4 Future Research Directions
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
What is Side-Information
Types of side-information
Must-link: a pair of points should belong to the same classCannot-link: a pair of points should not appear in the same
class
Side-information is a type of prior knowledge weaker thanpartial labeling
Knowing the partial labeling, we can transform it into
side-informationBut not vice versa
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Pairwise Constrained K-means Clustering
K-means objective: Jkm =$
c
$xi#"c %xi $ fc%
2
Matrix form: Jkm =(((X$ FGT
(((2
F
Cluster center matrix: F = [f1, f2, · · · , fC ] ! Rn!C
G ! Rn!C with Gij = 1, if xi ! $j ; Gij = 0,otherwise.
The objective of PCKM
J($) ="
c
"
xi#"c
%xi $ fc%2 +"
xi ,xj!M
s.t. li "=lj
(ij +"
xi ,xj!C
s.t. li=lj
(̃ij ,
{(ij ! 0}: penalties for violating the must-link constraints
{(̃ij ! 0}: penalties for violating the cannot-link constraints
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Pairwise Constrained K-means Clustering
K-means objective: Jkm =$
c
$xi#"c %xi $ fc%
2
Matrix form: Jkm =(((X$ FGT
(((2
F
Cluster center matrix: F = [f1, f2, · · · , fC ] ! Rn!C
G ! Rn!C with Gij = 1, if xi ! $j ; Gij = 0,otherwise.
The objective of PCKM
J($) ="
c
"
xi#"c
%xi $ fc%2 +"
xi ,xj!M
s.t. li "=lj
(ij +"
xi ,xj!C
s.t. li=lj
(̃ij ,
{(ij ! 0}: penalties for violating the must-link constraints
{(̃ij ! 0}: penalties for violating the cannot-link constraints
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Penalized Matrix Factorization
Changing the penalties of violations in the constraints in Minto the awards as
J($) ="
c
"
xi"!c
%xi $ fc%2 $"
xi ,xj!M
s.t. li=lj
(ij +"
xi ,xj!C
s.t. li=lj
(̃ij
="
c
"
xi
Gic%xi $ fc%2 +"
c
"
i,j
GicGjc!ij
!ij =
-.
/
(̃ij , (xi , xj) ! C$(ij , (xi , xj) ! M0, otherwise
Penalized matrix factorization objective
minF,G J($) =(((X$ FGT
(((2
F+ tr(GT
"G)
s.t . G ! 0 (14)
" ! Rn!n with its (i, j)-th entry "ij = !ij
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Penalized Matrix Factorization
Changing the penalties of violations in the constraints in Minto the awards as
J($) ="
c
"
xi"!c
%xi $ fc%2 $"
xi ,xj!M
s.t. li=lj
(ij +"
xi ,xj!C
s.t. li=lj
(̃ij
="
c
"
xi
Gic%xi $ fc%2 +"
c
"
i,j
GicGjc!ij
!ij =
-.
/
(̃ij , (xi , xj) ! C$(ij , (xi , xj) ! M0, otherwise
Penalized matrix factorization objective
minF,G J($) =(((X$ FGT
(((2
F+ tr(GT
"G)
s.t . G ! 0 (14)
" ! Rn!n with its (i, j)-th entry "ij = !ij
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Updating Rules for PMF
Table: Penalized Matrix Factorization for Constrained Clustering
Inputs: Data matrix X, Constraints matrix ".
Outputs: F, G.
1. Initialize G;
2. Repeat the following steps until convergence:
(a). Fixing G, updating F by F = XG(GTG)"1;
(b). Fixing F, updating G by
Gij *$ Gij
0112(XT F)+
ij+[G(FTF)#]
ij+
%"
#G
&
ij
(XT F)#ij
+[G(FTF)+]ij+
%"
+G
&
ij
.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Side-Information on Bi-partite Graph
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
PMF on Bi-partite Graph
minG1!0,G2!0
J = %R12 $G1SGT2 %2 + tr(GT
1 "1G1) + tr(GT2 "2G2)
Table: PMF on Bi-partite GraphInputs: Relation matrix R12, Constraints matrices "1, "2.Outputs: G1, S, G2.1. Initialize G1, G2;2. Repeat the following steps until convergence:
(a). Fixing G1,G2, updating S using
S)" (GT1G1)
#1GT1R12G2(G
T2G2)
#1;(b). Fixing S,G2, updating G1 using
G1ij ) G1 ij
3(R12G2S
T )+ij
+[G1(STGT2G2S)"]ij+(""
1G1)ij
(R12G2ST )"
ij+[G1(STGT
2G2S)+]ij+("+
1 G1)ij;
(c). Fixing G1,S, updating G2 using
G2ij ) G2 ij
3(RT
12G1S)+
ij+[G2(SGT
1G1S
T )"]ij+(""
2G2)ij
(RT12G1S)"
ij+[G2(SGT
1G1S
T )+]ij+("+2 G2)ij
.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Semi-supervised Learning with Partially Labeled Data
Semi-supervised Learning Using Side-Information
Table: The F measure of three algorithms on the BBS data set
Data Sets Algorithm d = 3 d = 4 d = 5 d = 6
1 MLSA 0.7019 0.7079 0.7549 0.7541
1 SRC 0.7281 0.6878 0.6183 0.6183
1 Tri-SPMF 0.7948 0.8011 0.8021 0.79932 MLSA 0.7651 0.7429 0.7581 0.7309
2 SRC 0.7627 0.7226 0.7280 0.6965
2 Tri-SPMF 0.8007 0.7984 0.7938 0.78963 MLSA 0.6689 0.6511 0.6987 0.7301
3 SRC 0.7556 0.7666 0.7472 0.7125
3 Tri-SPMF 0.8095 0.8034 0.7993 0.7874
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Tensor & Hypergraph Based Methods
In knowledge & information management, we usually facewith multi-relational data
Graph based methods can capture the pairwise
relationships
Matrix is also only composed of two dimensions
Hypergraph is more efficient in describing the multiple-wise
relationships
Tensor is also a structure that can capture multiple-wise
relationships
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Efficient & Large Scale Methods
Matrix & Graph based methods usually involve highcomputational cost
eigenvalue decompositionsolving large scale linear equation systems
constrained optimization
How to make the algorithm more efficient?
Exploring the sparsity
How to improve scalability?
Smart sampling
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Efficient & Large Scale Methods
Matrix & Graph based methods usually involve highcomputational cost
eigenvalue decompositionsolving large scale linear equation systems
constrained optimization
How to make the algorithm more efficient?
Exploring the sparsity
How to improve scalability?
Smart sampling
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Efficient & Large Scale Methods
Matrix & Graph based methods usually involve highcomputational cost
eigenvalue decompositionsolving large scale linear equation systems
constrained optimization
How to make the algorithm more efficient?
Exploring the sparsity
How to improve scalability?
Smart sampling
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Efficient & Large Scale Methods
Matrix & Graph based methods usually involve highcomputational cost
eigenvalue decompositionsolving large scale linear equation systems
constrained optimization
How to make the algorithm more efficient?
Exploring the sparsity
How to improve scalability?
Smart sampling
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Probabilistic Interpretations
Potential problems of describing the data with matrices
Too large
Too complicatedMissing entries
Noisy entries· · · · · ·
Probabilistic interpretations & graphical models
Discover latent structures
Relationships with matrix based methods?
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Knowledge Transfer Across Different Domains
The multi-relational data contain data points from differentdomains
We may easily get some prior knowledge on some domains
How to transfer the knowledge from one domain toanother?
What knowledge to transfer?How?
Is it really helps?
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Knowledge Transfer Across Different Domains
The multi-relational data contain data points from differentdomains
We may easily get some prior knowledge on some domains
How to transfer the knowledge from one domain toanother?
What knowledge to transfer?How?
Is it really helps?
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Mikhail Belkin, Partha Niyogi. Laplacian eigenmaps and spectral techniques forembedding and clustering. NIPS 2001.
A. Banerjee, S. Basu, S. Merugu. Multi-way Clustering on Relation Graphs. SDM2007.
I. S. Dhillon, Y. Guan, and B. Kulis. Weighted Graph Cuts without Eigenvectors: AMultilevel Approach. PAMI 2007.
I. S. Dhillon, S. Mallela, and D. S. Modha. Information-Theoretic Co-clustering.KDD 2003.
I. S. Dhillon. Co-Clustering Documents and Words Using Bipartite Spectral GraphPartitioning. KDD 2001.
Chris Ding, Xiaofeng He, and Horst D. Simon. On the Equivalence ofNonnegative Matrix Factorization and Spectral Clustering. SDM 2005.
Chris Ding, Tao Li, Wei Peng, Haesun Park. Orthogonal Nonnegative MatrixTri-factorizations for Clustering. KDD 2006.
Chris Ding, Tao Li, Michael I. Jordan. Convex and Semi-Nonnegative MatrixFactorizations. Technical Report. 2006.
Chris Ding, Rong Jin, Tao Li, and Horst D. Simon. A Learning Framework UsingGreen’s Function and Kernel Regularization with Application for RecommenderSystem KDD 2007.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Xiaofei He and Partha Niyogi. Locality Preserving Projections. NIPS 2003.
Xiaofei He, Deng Cai, Haifeng Liu, and Wei-Ying Ma. Locality PreservingIndexing for Document Representation. SIGIR 2004.
D. D. Lee and H. S. Seung. Learning the parts of objects with nonnegative matrixfactorization. Nature 1999.
D. D. Lee and H. S. Seung. Algorithms for Non-negative Matrix Factorization.NIPS 2000.
Tao Li, Chris Ding, Yi Zhang, and Bo Shao. Knowledge Transformation from WordSpace to Document Space. SIGIR 2008.
Tao Li, Chris Ding, and Michael Jordan. Solving Consensus and Semi-supervisedClustering Problems Using Nonnegative Matrix Factorization ICDM 2007.
Tao Li and Chris Ding. The Relationships Among Various Nonnegative MatrixFactorization Methods for Clustering. ICDM 2006.
Sam Roweis and Lawrence Saul. Nonlinear dimensionality reduction by locallylinear embedding. Science 2000.
Hanhuai Shan and Arindam Banerjee. Bayesian Co-clustering. ICDM 2008.
J. Shi and J. Malik. Normalized Cuts and Image Segmentation. PAMI 2000.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Fei Wang, Shouchun Chen, Tao Li, Changshui Zhang. Semi-Supervised MetricLearning by Maximizing Constraint Margin. CIKM 2008.
Fei Wang, Tao Li and Changshui Zhang. Semi-Supervised Clustering via MatrixFactorization. SDM 2008.
Fei Wang, Sheng Ma, Liuzhong Yang, Tao Li. Recommendation on Item Graphs.ICDM 2006.
Fei Wang, Changshui Zhang. Label Propagation Through Linear Neighborhoods.ICML 2006.
X. Wang, J. Sun, Z. Chen, and C. Zhai. Latent semantic analysis for multiple-typeinterrelated data objects. SIGIR 2006.
Wei Xu, Xin Liu, Yihong Gong. Document Clustering Based on Non-negativeMatrix Factorization. SIGIR 2003.
S. Yan, D. Xu, B. Zhang and H. Zhang. Graph Embedding: A General Frameworkfor Dimensionality Reduction. CVPR 2005.
D. Zhou , O. Bousquet, T.N. Lal, J. Weston and B. Schölkopf. Learning with Localand Global Consistency. NIPS 2003.
Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty. Semi-supervised learningusing Gaussian fields and harmonic functions. ICML 2003.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled datawith label propagation. Technical Report CMU-CALD-02-107, Carnegie MellonUniversity, 2002.
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices
Graphs and Matrices are Everywhere
Unsupervised Learning with Graphs & Matrices
Semi-supervised Learning with Graphs & Matrices
Future Research Directions
Thank You!http://feiwang03.googlepages.com/sdm-tutorial
Fei Wang, Tao Li, Chris Ding Data Mining with Graphs & Matrices