SCS CMU
Joint Work by
Hanghang Tong, Spiros Papadimitriou, Jimeng Sun,
Philip S. Yu, Christos Faloutsos
Speaker: Hanghang Tong
Aug. 24-27, 2008, Las Vegas KDD 2008
Colibri: Fast Mining of Large Static and Dynamic Graphs
Motivation
• Q: How to find patterns?
– e.g., community, anomaly, etc.
• A: Low-Rank Approximation (LRA) for Adjacency Matrix of the Graph.
$A \approx L \times M \times R$
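LRA for Graph Mining: Example
[Figure: bipartite Author-Conf. graph. Authors: John, Tom, Bob, Carl, Van, Roy. Conferences: KDD, RECOMB, ISMB, ICDM.]
Adj. matrix: $A \approx L \times M \times R$
• L: author clusters
• M: interaction
• R: conference clusters
• Reconstruction error is high → ‘Carl’ is abnormal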
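To make the example concrete, here is a minimal sketch in Python (NumPy). The paper counts in `A` are invented for illustration, and a rank-2 truncated SVD stands in for the low-rank approximation, since the slide does not say which method produced its L, M, R. The author whose row has the largest reconstruction error is flagged as abnormal.

```python
import numpy as np

authors = ["John", "Tom", "Bob", "Carl", "Van", "Roy"]
confs = ["KDD", "ICDM", "RECOMB", "ISMB"]

# Hypothetical paper counts (rows: authors, columns: conferences).
# These numbers are NOT from the slides; they only mimic two communities
# plus one author who does not fit either of them.
A = np.array([
    [2, 2, 0, 0],   # John
    [3, 3, 0, 0],   # Tom
    [2, 2, 0, 0],   # Bob
    [2, 0, 0, 2],   # Carl
    [0, 0, 2, 2],   # Van
    [0, 0, 3, 3],   # Roy
], dtype=float)

k = 2  # number of clusters (assumed)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
L = U[:, :k]          # author clusters       (n x k)
M = np.diag(s[:k])    # interaction           (k x k)
R = Vt[:k, :]         # conference clusters   (k x m)

A_hat = L @ M @ R                               # low-rank reconstruction
row_error = np.linalg.norm(A - A_hat, axis=1)   # one error per author
most_abnormal = authors[int(np.argmax(row_error))]
print(most_abnormal, row_error.round(2))
```

With these made-up counts, the row for Carl is the one the two-cluster structure explains worst.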
Challenges
• How to get (L, M, R)
+ Efficiently (both time and space);
+ Intuitively (easy for interpretation);
+ Dynamically (track patterns over time)?
Roadmap
• Motivation
• Existing Methods
– SVD
– CUR/CX
• Proposed Methods: Colibri
• Experimental Results
• Conclusion
Matrix & Column Space
• Matrix: $B = [b_1 \; b_2] = \begin{bmatrix} 3 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix}$
• Column Space of a Matrix: $b_1$, $b_2$ are vectors in 3-d space!
[Figure: $b_1$ and $b_2$ drawn as vectors]
Projection, Projection Matrix & Core Matrix
$\tilde{v} = B (B^T B)^{+} B^T v$
• $\tilde{v}$: projection of v
• $B (B^T B)^{+} B^T$: projection matrix of B
• $(B^T B)^{+}$: core matrix
• $v$: an arbitrary vector
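The projection formula can be checked numerically. The sketch below assumes a small matrix `B` with two columns in 3-d space and an arbitrary vector `v`; both sets of values are chosen for illustration only. `np.linalg.pinv` computes the pseudo-inverse.

```python
import numpy as np

# Example matrix with columns b1, b2 in 3-d space (entries are an assumption).
B = np.array([[3.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
v = np.array([1.0, 2.0, 3.0])   # an arbitrary vector (hypothetical values)

core = np.linalg.pinv(B.T @ B)  # core matrix (B^T B)^+
P = B @ core @ B.T              # projection matrix of B
v_proj = P @ v                  # projection of v onto the column space of B

print(v_proj)
```

Because both columns of this `B` lie in the plane of the first two coordinates, the projection simply drops the third coordinate of `v`.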
Singular-Value-Decomposition (SVD)
$A \approx U \Sigma V^T$
• A: n x m, with columns $a_1, a_2, a_3, \dots, a_m$
• U: left singular vectors $u_1, \dots, u_k$
• $\Sigma$: singular values $\sigma_1, \dots, \sigma_k$
• V: right singular vectors $v_1, \dots, v_k$
SVD: How to
• #1: Find the left matrix U, where
$u_i = \frac{A v_i}{\sigma_i} = \frac{a_1 v_{i,1} + a_2 v_{i,2} + \dots + a_m v_{i,m}}{\sigma_i}$
• #2: Project A into the column space of U
$A \approx U (U^T U)^{+} U^T A = \dots = U \Sigma V^T$
where $U (U^T U)^{+} U^T$ is the projection matrix of the column space of U
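Both steps can be verified on a small example. This is a sketch on a random 5 x 3 matrix (an arbitrary choice, not data from the talk), using `np.linalg.svd` as the reference decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 3))                       # hypothetical n x m matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 1: each left singular vector is u_i = A v_i / sigma_i,
# i.e. a linear combination of the columns of A.
U_from_V = (A @ Vt.T) / s                    # column i is A v_i / sigma_i

# Step 2: project A onto the column space of U.
P = U @ np.linalg.pinv(U.T @ U) @ U.T        # projection matrix
A_proj = P @ A

print(np.allclose(U_from_V, U), np.allclose(A_proj, U @ np.diag(s) @ Vt))
```

Here all singular vectors are kept, so the projection reproduces A exactly; keeping only the top k of them gives the rank-k approximation.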
SVD: drawbacks
• Efficiency
– Time: $O(\min(n^2 m, n m^2))$
– Space: (U, V) are dense
• Interpretation
[Figure: 1st singular vector and 2nd singular vector of $A = U \Sigma V^T$]
• Dynamic: not easy
CUR (CX) decomposition
$A \approx C \times U \times R$, with $U = (C^T C)^{+}$ and $R = C^T A$
• A: n x m
• Sample columns from A to form C
• Project A onto the column space of C
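The two steps can be sketched as follows. The slide only says "sample columns"; the sampling scheme used here (with replacement, probability proportional to squared column norm) is one common choice and is an assumption, as are the function name `cx_decomposition` and the test matrix.

```python
import numpy as np

def cx_decomposition(A, c, rng):
    """Sample c columns of A and project A onto their span.

    Sampling with replacement, proportional to squared column norms,
    is an assumed scheme; the slides do not specify one.
    """
    col_norms = np.sum(A * A, axis=0)
    probs = col_norms / col_norms.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=probs)
    C = A[:, idx]                               # sampled columns (may repeat)
    U = np.linalg.pinv(C.T @ C, rcond=1e-10)    # (C^T C)^+, tolerant of duplicates
    R = C.T @ A
    return C, U, R, idx

rng = np.random.default_rng(0)
A = rng.random((6, 2)) @ rng.random((2, 8))     # hypothetical rank-2 matrix
C, U, R, idx = cx_decomposition(A, c=5, rng=rng)

A_hat = C @ U @ R                               # projection of A onto span(C)
rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
print(idx, rel_err)
```

Because columns are drawn with replacement, `idx` can contain repeats, so `C` may hold duplicate columns.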
CUR (CX): advantages
• Efficiency (better than SVD)
– Time: $O(c^2 n)$ or $O(c^3 + cm)$ (c is # of sampled cols)
– Space: (C, R) are sparse
• Interpretation
CUR (CX): drawbacks
• Redundancy in C, wasting both time and space
– 3 copies of green,
– 2 copies of red,
– 2 copies of purple
– purple = 0.5*green + red …
• Dynamic: not easy
Roadmap
• Motivation
• Existing Methods
• Colibri
– Colibri-S for static graphs
– Colibri-D for dynamic graphs
• Experimental Results
• Conclusion
Colibri-S: Basic Idea
[Figure: Original Matrix, its CUR (CX) sampled columns, and the Colibri-S factors $L \times M \times R$]
• CUR (CX) keeps redundant columns: 3 copies of green, 2 copies of red, 2 copies of purple; purple = 0.5*green + red …
• We want the columns in L to be linearly independent of each other!
Input: initially sampled matrix C
Output:
• L: linearly independent cols of C
• M = $(L^T L)^{-1}$: core matrix
• R = $L^T \times A$
Q: How to find L & M from C efficiently?
A: Find L & M iteratively!
• Start from the initially sampled matrix C and the current L & M
• For each col. v in C:
– Project it on L
– Redundant? If yes, discard v
– Otherwise, expand L & M
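The loop can be sketched as below. This is an illustrative reading of the slide, not the authors' code: the function name `colibri_s`, the relative threshold `eps`, and the dense NumPy arrays are assumptions. The core matrix is kept equal to $(L^T L)^{-1}$ and is grown with the standard block-inverse (Schur complement) identity, so no full inversion is repeated.

```python
import numpy as np

def colibri_s(C, eps=1e-12):
    """Keep only linearly independent columns of C (illustrative sketch).

    Returns L (kept columns), M = (L^T L)^{-1}, and the kept column indices.
    """
    n = C.shape[0]
    L = np.zeros((n, 0))
    M = np.zeros((0, 0))
    kept = []
    for j in range(C.shape[1]):
        v = C[:, j]
        y = M @ (L.T @ v)            # coefficients of v's projection on L
        residual = v - L @ y         # part of v outside span(L)
        delta = residual @ residual  # squared distance from v to span(L)
        if delta <= eps * (v @ v):
            continue                 # redundant (or zero) column: discard v
        # Expand M via the block-inverse identity; delta is the Schur complement.
        k = M.shape[0]
        M_new = np.empty((k + 1, k + 1))
        M_new[:k, :k] = M + np.outer(y, y) / delta
        M_new[:k, k] = -y / delta
        M_new[k, :k] = -y / delta
        M_new[k, k] = 1.0 / delta
        M = M_new
        L = np.column_stack([L, v])  # expand L
        kept.append(j)
    return L, M, kept

# Toy input in the spirit of the slide: 3 green, 2 red, 2 purple columns,
# with purple = 0.5*green + red (the vectors themselves are made up).
green = np.array([1.0, 0.0, 2.0, 0.0])
red = np.array([0.0, 1.0, 0.0, 1.0])
purple = 0.5 * green + red
C = np.column_stack([green, green, red, purple, green, red, purple])

L, M, kept = colibri_s(C)
print(kept)
```

On this toy input only the first green and the first red column survive, and `L @ M @ L.T @ C` reproduces `C`, so nothing is lost by dropping the redundant columns.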
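Colibri-S vs. CUR(CX)
• Quality: Colibri-S = CUR(CX)
• Time: Colibri-S >= CUR(CX), i.e., Colibri-S is at least as fast
– $O(\tilde{c}^3 + \tilde{c}\tilde{m})$ vs. $O(c^3 + cm)$, where $\tilde{c} \le c$, $\tilde{m} \le m$
• Space: Colibri-S >= CUR(CX), i.e., Colibri-S needs at most as much space
• Illustrations
[Figure: factor matrices of Colibri-S next to those of CUR (CX)]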
Colibri-D for dynamic graphs
[Figure: initially sampled matrix at time t, with known $(L_t, M_t, R_t)$, and at time t+1, with $(L_{t+1}, M_{t+1}, R_{t+1})$ to be found]
Q: How to update L and M efficiently?
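Colibri-D: How-To
[Figure: initially sampled matrix at t and at t+1. At each time step the columns are marked Selected or Redundant; some columns at t+1 are changed from t. $(L_t, M_t, R_t)$ are known; $(L_{t+1}, M_{t+1}, R_{t+1})$ are to be found.]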
Colibri-D: How-To
[Figure: initially sampled matrix at t and at t+1, with columns marked Selected or Redundant; $(L_t, M_t)$ at t and $(L_{t+1}, M_{t+1})$ at t+1]
• $\tilde{L}$: subspace by blue cols at t+1
• Unchanged Cols!
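One way to read this update is sketched below, under several assumptions: the previously selected columns that did not change are kept as the starting basis $\tilde{L}$, and every other column is then tested against it as in the static case. The slides do not spell out how the core matrix for $\tilde{L}$ is obtained, so this sketch simply recomputes it with `np.linalg.inv`. The names `expand` and `colibri_d_step` and the toy data are hypothetical.

```python
import numpy as np

def expand(L, M, C, candidates, eps=1e-12):
    """Add each non-redundant candidate column of C to (L, M)."""
    added = []
    for j in candidates:
        v = C[:, j]
        y = M @ (L.T @ v)
        residual = v - L @ y
        delta = residual @ residual
        if delta <= eps * (v @ v):
            continue                       # redundant: discard
        k = M.shape[0]
        M_new = np.empty((k + 1, k + 1))   # block-inverse update of (L^T L)^{-1}
        M_new[:k, :k] = M + np.outer(y, y) / delta
        M_new[:k, k] = -y / delta
        M_new[k, :k] = -y / delta
        M_new[k, k] = 1.0 / delta
        L, M = np.column_stack([L, v]), M_new
        added.append(j)
    return L, M, added

def colibri_d_step(C_next, selected_prev, changed):
    """One time step: reuse unchanged selected columns, re-test the rest."""
    unchanged_sel = [j for j in selected_prev if j not in changed]
    L_tilde = C_next[:, unchanged_sel]
    # Assumption: recompute the core matrix directly for the reused columns.
    if unchanged_sel:
        M_tilde = np.linalg.inv(L_tilde.T @ L_tilde)
    else:
        M_tilde = np.zeros((0, 0))
    reused = set(unchanged_sel)
    rest = [j for j in range(C_next.shape[1]) if j not in reused]
    L, M, added = expand(L_tilde, M_tilde, C_next, rest)
    return L, M, unchanged_sel + added

# Toy data (made up): at time t columns 0 and 1 were selected;
# at time t+1 only column 1 has changed.
green = np.array([1.0, 0.0, 2.0, 0.0])
red_new = np.array([0.0, 1.0, 0.0, 3.0])          # changed column
purple = np.array([0.5, 1.0, 1.0, 1.0])           # unchanged, redundant at t
C_next = np.column_stack([green, red_new, purple, green])

L_next, M_next, selected = colibri_d_step(C_next, selected_prev=[0, 1], changed={1})
print(selected)
```

In this toy run the purple column, redundant at time t, becomes a selected column at t+1 because the column it depended on has changed.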
Experimental Setup
• Datasets
– Network traffic
– 21,837 sources/destinations
– 1,222 consecutive hours
– 22,800 edges per hour
• Accuracy: Accu =
• Space Cost:
Performance of Colibri-S
[Figure: two panels, Time and Space, each comparing Ours, CUR, and CMD]
• Accuracy
– Same 91%+
• Time
– 12x faster than CMD
– 28x faster than CUR
• Space
– ~1/3 of CMD
– ~10% of CUR
Performance of Colibri-D
[Figure: Time vs. # of changed cols, for CMD, Colibri-S, and Colibri-D]
Colibri-D achieves up to 112x speedups
A Family of Low-Rank Approximations for Fast Graph Mining
• Colibri-S
– For static graphs
– Removes redundancy
– Significant savings in time & space for “free”
• Colibri-D
– For dynamic graphs
– Explores “smoothness”
– Up to 112x faster than best known methods