![Page 1: Weiren Yu 1, Xuemin Lin 1, Wenjie Zhang 1, Ying Zhang 1 Jiajin Le 2, SimFusion+: Extending SimFusion Towards Efficient Estimation on Large and Dynamic](https://reader034.vdocuments.us/reader034/viewer/2022051211/551b3fa6550346d31b8b46da/html5/thumbnails/1.jpg)
Weiren Yu¹, Xuemin Lin¹, Wenjie Zhang¹, Ying Zhang¹, Jiajin Le²
SimFusion+: Extending SimFusion Towards Efficient Estimation on Large and Dynamic Networks
¹ University of New South Wales & NICTA, Australia
² Donghua University, China
SIGIR 2012
Contents
1. Introduction
2. Problem Definition
3. Optimization Techniques
4. Experimental Results
1. Introduction
Many applications require a measure of “similarity” between objects.
Similarity search applications:
Citation of scientific papers (citeseer.com)
Recommender systems (amazon.com)
Graph clustering
Web search engines (google.com)
SimFusion: A New Link-based Similarity Measure
Structural Similarity Measure
PageRank [Page et al., 99]
SimRank [Jeh and Widom, KDD 02]
SimFusion similarity
A new promising structural measure [Xi et al., SIGIR 05]
Extension of Co-Citation and Coupling metrics
Basic Philosophy
Following the Reinforcement Assumption:
The similarity between objects is reinforced by the similarity of
their related objects.
SimFusion Overview
Features
Uses a Unified Relationship Matrix (URM) to represent relationships among heterogeneous data
Defined recursively and computed iteratively
Applicable to any domain with object-to-object relationships
Challenges
The URM may cause SimFusion to yield a trivial solution or to diverge
Rather costly to compute SimFusion on large graphs
Naïve iteration: matrix-matrix multiplication, requiring $O(Kn^3)$ time and $O(n^2)$ space [Xi et al., SIGIR 05]
No incremental algorithms when edges update
Existing SimFusion: URM and USM
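Data Space: a finite set of data objects (vertices), $\mathcal{D} = \{o_1, o_2, \dots\}$
Data Relation (edges): given an entire space $\mathcal{D} = \bigcup_{i=1}^{N} \mathcal{D}_i$
Intra-type Relation $\mathcal{R}_{i,i} \subseteq \mathcal{D}_i \times \mathcal{D}_i$, carrying info. within one space
Inter-type Relation $\mathcal{R}_{i,j} \subseteq \mathcal{D}_i \times \mathcal{D}_j$, carrying info. between spaces
Unified Relationship Matrix (URM):
$$
L_{URM} =
\begin{pmatrix}
\lambda_{1,1} L_{1,1} & \lambda_{1,2} L_{1,2} & \cdots & \lambda_{1,N} L_{1,N} \\
\lambda_{2,1} L_{2,1} & \lambda_{2,2} L_{2,2} & \cdots & \lambda_{2,N} L_{2,N} \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_{N,1} L_{N,1} & \lambda_{N,2} L_{N,2} & \cdots & \lambda_{N,N} L_{N,N}
\end{pmatrix},
\qquad
L_{i,j}(x, y) =
\begin{cases}
\dfrac{1}{n_j}, & \text{if } N_j(x) = \varnothing; \\[4pt]
\dfrac{1}{|N_j(x)|}, & \text{if } (x, y) \in \mathcal{R}_{i,j}; \\[4pt]
0, & \text{otherwise.}
\end{cases}
$$
Here $n_j = |\mathcal{D}_j|$ and $N_j(x)$ is the set of neighbors of $x$ in $\mathcal{D}_j$.
$\lambda_{i,j}$ is the weighting factor between $\mathcal{D}_i$ and $\mathcal{D}_j$
Unified Similarity Matrix (USM):
$$
S =
\begin{pmatrix}
s_{1,1} & \cdots & s_{1,n} \\
\vdots & \ddots & \vdots \\
s_{n,1} & \cdots & s_{n,n}
\end{pmatrix}
\in \mathbb{R}^{n \times n}
\quad \text{s.t.} \quad
S = L \cdot S \cdot L^{T}
$$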
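The URM definition above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the helper name `urm_block`, the two-space toy relations, and the weights in `lam` are all hypothetical.

```python
import numpy as np

def urm_block(adj, weight):
    """Return lambda_ij * L_ij for one relation R_ij.

    adj is a 0/1 array of shape (n_i, n_j); adj[x, y] == 1 iff (x, y) is in R_ij.
    """
    adj = np.asarray(adj, dtype=float)
    n_j = adj.shape[1]
    degree = adj.sum(axis=1, keepdims=True)               # |N_j(x)| for every x
    normalised = adj / np.maximum(degree, 1.0)            # 1/|N_j(x)| on existing edges
    block = np.where(degree == 0, 1.0 / n_j, normalised)  # no neighbour: uniform 1/n_j
    return weight * block

# Hypothetical toy input: D_1 has two objects, D_2 has one.
lam = np.array([[0.25, 0.75],
                [0.50, 0.50]])                            # each row of weights sums to 1
relations = [[np.array([[0, 1], [0, 0]]), np.array([[1], [1]])],
             [np.array([[1, 1]]),          np.array([[0]])]]

L = np.block([[urm_block(relations[i][j], lam[i, j]) for j in range(2)]
              for i in range(2)])
print(L)
print(L.sum(axis=1))   # row normalisation: every row sums to 1
```

Because every block is row-normalised and each row of weights sums to 1, the assembled matrix is row-stochastic.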
Example.
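Figure: a heterogeneous graph with six data objects $v_1, \dots, v_6$ in three data spaces $\mathcal{D}_1$, $\mathcal{D}_2$, $\mathcal{D}_3$, connected by intra-type and inter-type relationships.
Weighting matrix (rows and columns indexed by $\mathcal{D}_1$, $\mathcal{D}_2$, $\mathcal{D}_3$):
$$
\Lambda =
\begin{pmatrix}
\tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{2} \\
\tfrac{1}{8} & \tfrac{1}{4} & \tfrac{5}{8} \\
\tfrac{1}{5} & \tfrac{3}{5} & \tfrac{1}{5}
\end{pmatrix}
$$
The corresponding $6 \times 6$ URM $L_{URM}$ is row-normalized: each of its rows sums to 1.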
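SimFusion Similarity on Heterogeneous Domain
$$
S_{USM} =
\begin{pmatrix}
1 & s_{1,2} & \cdots & s_{1,n} \\
s_{2,1} & 1 & \cdots & s_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
s_{n,1} & s_{n,2} & \cdots & 1
\end{pmatrix}
\in \mathbb{R}^{n \times n}
\quad \text{s.t.} \quad
S = L \cdot S \cdot L^{T}
$$
High complexity!!! $O(Kn^3)$ time, $O(n^2)$ space
Trivial solution!!! $S = [1]_{n \times n}$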
Contributions
Revising the existing SimFusion model, avoiding
non-semantic convergence
divergence issue
Optimizing the computation of SimFusion+
O(Km) pre-computation time, plus O(1) time and O(n) space
Better accuracy guarantee
Incremental computation on edge updates
O(δn) time and O(n) space for handling δ edge updates
Revised SimFusion
Motivation: Two issues of the existing SimFusion model
Trivial Solution on Heterogeneous Domain
Divergent Solution on Homogeneous Domain
Root cause: row normalization of URM !!!
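Why row normalization leads to the trivial solution can be checked numerically. The sketch below uses a random row-stochastic matrix as a stand-in for a URM (an assumption for illustration, not data from the talk). It shows that the all-ones matrix satisfies $S = L \cdot S \cdot L^T$, and that iterating from the identity drives all scores to one common value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
L = rng.random((n, n))
L /= L.sum(axis=1, keepdims=True)        # row normalisation, as in the URM

# Since L @ 1 = 1 for a row-stochastic L, the all-ones matrix is a fixed point.
ones = np.ones((n, n))
is_fixed_point = np.allclose(L @ ones @ L.T, ones)

# Naive SimFusion iteration S <- L S L^T, starting from the identity.
S = np.eye(n)
for _ in range(200):
    S = L @ S @ L.T
spread = S.max() - S.min()               # gap between largest and smallest score

print(is_fixed_point, spread)
```

A vanishing `spread` means every pair of objects ends up with the same score, so the result carries no ranking information.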
From URM to UAM
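Unified Adjacency Matrix (UAM):
$$
A_{i,j}(x, y) =
\begin{cases}
\dfrac{1}{n_j}, & \text{if } N_j(x) = \varnothing; \\[4pt]
1, & \text{if } (x, y) \in \mathcal{R}_{i,j}; \\[4pt]
0, & \text{otherwise.}
\end{cases}
$$
Example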
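A minimal NumPy sketch of one UAM block, following the case definition above. The helper name `uam_block`, the toy relation, and the weight are hypothetical. Unlike a URM block, an existing edge keeps weight 1 instead of being divided by the number of neighbors.

```python
import numpy as np

def uam_block(adj, weight):
    """Return weight * A_ij for one relation R_ij given as a 0/1 array (n_i x n_j)."""
    adj = np.asarray(adj, dtype=float)
    n_j = adj.shape[1]
    has_neighbour = adj.sum(axis=1, keepdims=True) > 0
    # Rows with neighbours keep their 0/1 entries; empty rows become uniform 1/n_j.
    return weight * np.where(has_neighbour, adj, 1.0 / n_j)

# Hypothetical relation: object 0 links to two of three objects, object 1 to none.
adj = np.array([[1, 1, 0],
                [0, 0, 0]])
block = uam_block(adj, weight=0.5)
print(block)
```

In this toy block the first row sums to 1.0 rather than to the weight 0.5, which shows that the rows are no longer normalized.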
Revised SimFusion+
Basic Intuition
replace URM with UAM to postpone “row normalization”
in a delayed fashion while preserving the reinforcement
assumption of the original SimFusion
Revised SimFusion+ Model vs. Original SimFusion
Squeeze similarity scores in S into [0, 1].
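One way to postpone normalization while keeping scores in [0, 1] is to divide each update by its spectral norm. The sketch below assumes the update $S \leftarrow A S A^T / \|A S A^T\|_2$. That form, the function name `simfusion_plus_naive`, and the toy matrix are assumptions for illustration, not taken from the slide text.

```python
import numpy as np

def simfusion_plus_naive(A, iterations=50):
    """Iterate S <- A S A^T / ||A S A^T||_2 (assumed form), starting from the identity."""
    S = np.eye(A.shape[0])
    for _ in range(iterations):
        M = A @ S @ A.T
        S = M / np.linalg.norm(M, 2)     # ord=2 is the spectral norm
    return S

# Hypothetical non-negative adjacency-like matrix.
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
S = simfusion_plus_naive(A)
print(S.min(), S.max())
```

With a non-negative `A`, each iterate is non-negative and positive semidefinite with spectral norm 1, so no entry can exceed 1.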
Optimizing SimFusion+ Computation
Conventional Iterative Paradigm
Matrix-matrix multiplication, requiring $O(kn^3)$ time and $O(n^2)$ space
Our approach: convert SimFusion+ computation into finding the dominant eigenvector of the UAM A.
Matrix-vector multiplication, requiring $O(km)$ time and $O(n)$ space
Pre-compute $\sigma_{\max}(A)$ only once, and cache it for later reuse
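The eigenvector view can be sketched with plain power iteration. The sketch assumes that a pairwise score is read off as the product of two entries of the unit-length dominant eigenvector; this reading, the function name, and the toy matrix are assumptions. A dense array is used for brevity. With a sparse UAM of m non-zeros, each matrix-vector product would cost O(m).

```python
import numpy as np

def dominant_eigenvector(A, iterations=200):
    """Power iteration: only matrix-vector products and O(n) extra memory."""
    n = A.shape[0]
    x = np.full(n, 1.0 / np.sqrt(n))
    for _ in range(iterations):
        x = A @ x
        x /= np.linalg.norm(x)
    return x

# Hypothetical non-negative adjacency-like matrix.
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

xi = dominant_eigenvector(A)     # pre-computed once and cached
score_0_2 = xi[0] * xi[2]        # single-pair lookup in O(1) under the assumption above
print(xi, score_0_2)
```

On this toy matrix the outer product of `xi` with itself matches the limit of the normalized matrix iteration $S \leftarrow A S A^T / \|A S A^T\|_2$, so the $n \times n$ matrix never needs to be stored.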
Example
Conventional Iteration:
Our approach:
Assume with
Key Observation
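Kronecker product “⊗”:
e.g.
$$
X = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad
Y = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}, \quad
X \otimes Y =
\begin{pmatrix}
1 \cdot Y & 2 \cdot Y \\
3 \cdot Y & 4 \cdot Y
\end{pmatrix}
=
\begin{pmatrix}
5 & 6 & 10 & 12 \\
7 & 8 & 14 & 16 \\
15 & 18 & 20 & 24 \\
21 & 24 & 28 & 32
\end{pmatrix}
$$
Vec operator:
e.g.
$$
\mathrm{vec}(X) = [1 \;\; 3 \;\; 2 \;\; 4]^{T}
$$
Two important properties: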
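The two operators can be reproduced with NumPy for the same X and Y as in the example. `np.kron` computes the Kronecker product, and flattening in column-major order (`order="F"`) stacks the columns as the vec operator does.

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4]])
Y = np.array([[5, 6],
              [7, 8]])

kron_XY = np.kron(X, Y)          # each entry x_ij is replaced by the block x_ij * Y
vec_X = X.flatten(order="F")     # stack the columns of X on top of each other

print(kron_XY)
print(vec_X)
```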
Key Observation
Two important Properties:
P1.
P2.
Our main idea:
(1)
(2)Power Iteration
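Two standard identities fit the argument here and can be checked numerically. The first is $\mathrm{vec}(BCD) = (D^T \otimes B)\,\mathrm{vec}(C)$. The second is that an eigen-pair $(\alpha, \xi)$ of $A$ gives the eigen-pair $(\alpha^2, \xi \otimes \xi)$ of $A \otimes A$. Treating these as P1 and P2 is an assumption, and the random matrices below are purely illustrative.

```python
import numpy as np

def vec(M):
    """Column-stacking vec operator."""
    return M.flatten(order="F")

rng = np.random.default_rng(1)
A = rng.random((4, 4))           # positive matrix, so its dominant eigenvalue is real
S = rng.random((4, 4))

# Identity 1 with B = A, C = S, D = A^T: vec(A S A^T) = (A kron A) vec(S).
lhs = vec(A @ S @ A.T)
rhs = np.kron(A, A) @ vec(S)

# Identity 2: the dominant eigen-pair of A yields an eigen-pair of A kron A.
eigvals, eigvecs = np.linalg.eig(A)
p = np.argmax(np.abs(eigvals))
alpha, xi = eigvals[p].real, eigvecs[:, p].real
kron_residual = np.kron(A, A) @ np.kron(xi, xi) - alpha**2 * np.kron(xi, xi)

print(np.allclose(lhs, rhs), np.abs(kron_residual).max())
```

Together the two identities turn the matrix recurrence on S into a power iteration on $A \otimes A$, whose dominant eigenvector is determined by the dominant eigenvector of the much smaller A.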
Accuracy Guarantee
Conventional Iterations: No accuracy guarantee !!!
Question: $\| S^{(k+1)} - S \| \le\ ?$
Our Method: Utilize Arnoldi decomposition to build an
order-k orthogonal subspace for the UAM A.
Due to $T_k$'s small size and almost “upper-triangularity”, computing $\sigma_{\max}(T_k)$ is less costly than computing $\sigma_{\max}(A)$.
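A textbook Arnoldi process illustrates how such a subspace can be built; it is a generic sketch, not the authors' exact procedure. The names `arnoldi`, `V`, `T`, the random test matrix, and the choice k = 4 are assumptions. The last lines check the standard Arnoldi residual identity, which gives a computable error estimate for the approximate eigenvector.

```python
import numpy as np

def arnoldi(A, v0, k):
    """Return V (n x (k+1)) and T ((k+1) x k) with A @ V[:, :k] == V @ T."""
    n = A.shape[0]
    V = np.zeros((n, k + 1))
    T = np.zeros((k + 1, k))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(k):
        w = A @ V[:, j]                       # one matrix-vector product per step
        for i in range(j + 1):                # modified Gram-Schmidt
            T[i, j] = V[:, i] @ w
            w = w - T[i, j] * V[:, i]
        T[j + 1, j] = np.linalg.norm(w)
        if T[j + 1, j] < 1e-12:               # breakdown: invariant subspace found
            break
        V[:, j + 1] = w / T[j + 1, j]
    return V, T

rng = np.random.default_rng(2)
A = rng.random((6, 6))
k = 4
V, T = arnoldi(A, np.ones(6), k)
T_k = T[:k, :k]                               # small and upper Hessenberg

# Dominant eigen-pair of the small matrix, lifted back to n dimensions.
theta_all, Y = np.linalg.eig(T_k)
p = np.argmax(np.abs(theta_all))
theta, y = theta_all[p], Y[:, p]
x_k = V[:, :k] @ y

residual = np.linalg.norm(A @ x_k - theta * x_k)
estimate = abs(T[k, k - 1] * y[-1])           # needs only the k x k problem
print(residual, estimate)
```

The estimate uses one subdiagonal entry of T and the last component of y, so it costs nothing beyond solving the small problem.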
Accuracy Guarantee
Arnoldi Decomposition:
k-th iterative similarity
Estimate Error:
Example
Arnoldi Decomposition:
Assume with
Given
(1)
(2)
(3)
Edge Update on Dynamic Graphs
Incremental UAM
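Given an old graph G = (D, R) and a new graph G′ = (D, R′), the incremental UAM is a list of edge updates, i.e., the difference between the new UAM and the old one.
Main idea
To reuse the incremental UAM and the eigen-pair ($\alpha_p$, $\xi_p$) of the old A to compute the new similarities.
The incremental UAM is a sparse matrix when the number δ of edge updates is small.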
Incrementally computing SimFusion+
O(δn) time
O(n) space
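The sketch below shows how a sparse list of edge updates and a cached eigenvector can be reused. It is not the IncSimFusion+ algorithm itself. It swaps in a simpler technique: power iteration warm-started from the old eigenvector, with the δ updates applied on the fly. The names, the toy graph, and the update triples are hypothetical.

```python
import numpy as np

def power_iteration(matvec, x0, tol=1e-12, max_iter=1000):
    """Power iteration on an implicit matrix; returns (unit vector, steps used)."""
    x = x0 / np.linalg.norm(x0)
    for step in range(1, max_iter + 1):
        y = matvec(x)
        y /= np.linalg.norm(y)
        if np.linalg.norm(y - x) < tol:
            return y, step
        x = y
    return x, max_iter

# Hypothetical old graph: complete graph on 5 vertices (illustrative only).
A_old = np.ones((5, 5)) - np.eye(5)
xi_old, _ = power_iteration(lambda x: A_old @ x, np.ones(5))

# Incremental UAM as delta = 2 updates (row, col, change): remove (0,1) and (1,0).
updates = [(0, 1, -1.0), (1, 0, -1.0)]

def updated_matvec(x):
    y = A_old @ x                    # reuse the old matrix unchanged
    for r, c, d in updates:          # O(delta) extra work per product
        y[r] += d * x[c]
    return y

# Warm start from the cached eigenvector of the old matrix.
xi_new, steps = power_iteration(updated_matvec, xi_old)
print(xi_new, steps)
```

The old matrix is never modified, and each product costs only δ extra operations, which is why small δ favors an incremental scheme.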
Example
Suppose edges (P1,P2) and (P2,P1) are removed.
Experimental Setting
Datasets
Synthetic data (RAND 0.5M-3.5M)
Real data (DBLP, WEBKB)
Compared Algorithms
SimFusion+ and IncSimFusion+;
SF, a SimFusion algorithm via matrix iteration [Xi et al., SIGIR 05];
CSF, a variant of SF using PageRank distribution [Cai et al., SIGIR 10];
SR, a SimRank algorithm via partial sums [Lizorkin et al., VLDBJ 10];
PR, a P-Rank encoding both in- and out-links [Zhao et al., CIKM 09].
DBLP
WEBKB
Experiment (1): Accuracy
On DBLP and WEBKB
SF+ accuracy is consistently stable on different datasets.
SF hardly produces sensible similarities, as all of its similarities asymptotically approach the same value as K grows.
Experiment (2): CPU Time and Space
On DBLP
On WEBKB
SF+ outperforms the other approaches, due to the use of $\sigma_{\max}(T_k)$.
Experiment (3): Edge Updates
IncSF+ outperforms SF+ when δ is small.
For larger δ, IncSF+ is less effective, because only a small δ preserves the sparseness of the incremental UAM.
Varying δ
Experiment (4) : Effects of
A small value of this parameter imposes more iterations on computing $T_k$ and $v_k$, and hence increases the estimation cost.
Conclusions
A revised model, SimFusion+, that prevents the trivial solution and the divergence issue of the original SimFusion.
Efficient techniques to improve the time and space of SimFusion+ with accuracy guarantees.
An incremental algorithm to compute SimFusion+ on dynamic graphs when edges are updated.
Future Work
Devise vertex-updating methods for incrementally computing SimFusion+.
Parallelize SimFusion+ computation on GPUs.