18th International Conference on Database and Expert Systems Applications
Journey to the Centre of the Star: Various Ways of Finding Star Centers in Star Clustering
Tok Wee Hyong, Derry Tanti Wijaya, Stéphane Bressan
Vector Space Clustering
• Naturally translates into a graph clustering problem for a dense graph
• Vertices are the vectors
• Weight of an edge is the cosine of the corresponding vectors
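As a minimal sketch of this translation, the snippet below builds the dense similarity graph from a handful of vectors. The function names `cosine` and `similarity_graph` and the toy vectors are illustrative assumptions, not part of the original slides.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_graph(vectors):
    """Return the dense graph as {(i, j): weight} for every pair i < j."""
    n = len(vectors)
    return {(i, j): cosine(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)}

# Hypothetical toy data: three document vectors in a 3-term space.
docs = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
graph = similarity_graph(docs)
```

Every pair of vectors gets an edge, which is why the resulting graph is dense before any pruning.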
Star Clustering for Graph [1]
• Computes vertex cover by a simple computation of star-shaped dense sub-graphs
1. Lower weight edges are pruned
2. Vertices with higher degree (that are not satellites) are chosen in turn as Star centers
3. Vertices connected to a center become satellites
4. Algorithm terminates when every vertex is either a center or a satellite
5. Each center and its satellites form a cluster
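The five steps above can be sketched as follows. This is an illustrative reading of the slide, assuming that degree is measured once on the pruned graph and that ties are broken by vertex index; the function name `star_clustering` and the toy weights are hypothetical.

```python
def star_clustering(weights, n, sigma):
    """weights: {(i, j): similarity} with i < j; n: number of vertices; sigma: pruning threshold."""
    # Step 1: prune edges whose weight is below the threshold.
    adj = {v: set() for v in range(n)}
    for (i, j), w in weights.items():
        if w >= sigma:
            adj[i].add(j)
            adj[j].add(i)

    # Step 2: visit vertices by decreasing degree (ties broken by index, an assumption).
    order = sorted(range(n), key=lambda v: (-len(adj[v]), v))
    centers, satellites = [], set()
    for v in order:
        if v in satellites:
            continue  # a satellite cannot become a center
        centers.append(v)
        # Step 3: every vertex connected to the new center becomes a satellite.
        satellites |= adj[v]

    # Step 4 holds once the loop ends: each vertex is a center or a satellite.
    # Step 5: each center together with its satellites forms a cluster.
    return {c: {c} | adj[c] for c in centers}

# Hypothetical toy graph with five vertices.
toy_weights = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.1, (2, 3): 0.7, (3, 4): 0.9}
clusters = star_clustering(toy_weights, 5, 0.5)
```

In this toy run vertex 2 is a satellite of both centers, so the two clusters overlap.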
Star Clustering
• Does not require the indication of an a priori number of clusters
• Allows clusters to overlap
• Analytically guarantees a lower bound on the similarity between objects in each cluster
• Computes more accurate clusters than either the single or average link hierarchical clustering
Star Clustering
• Two critical elements:
  • Threshold for pruning edges (σ)
  • Metrics for selecting Star centers
• Aslam et al. [1] derived the theoretical lower bound on the expected similarity between two satellites in a cluster
• Empirically shown to be a good estimate of the actual similarity
• Current metrics for selecting Star centers do not leverage this finding
Our focus is on the metrics for selecting Star centers
Extended Star Clustering
• Choose Star centers using complement degree of vertices
• Allow Star centers to be adjacent to one another
• Has two versions: unrestricted and restricted
Our proposal
• Degree may not be the best metric
• We propose metrics that consider the weights of edges in order to maximize intra-cluster similarity:
  • Markov Stationary Distribution
  • Lower Bound
  • Average
  • Sum
Markov Stationary Distribution
• Similar to the idea of Google’s PageRank algorithm [2]
Method:
• Similarity graph is normalized into a symmetric Markov matrix
• Compute the stationary distribution of the matrix
$A^* = (I - A)^{-1}$
• Vertices are sorted by their stationary values and chosen in turn as Star centers
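For illustration only, the sketch below estimates a stationary distribution by power iteration, which is a swapped-in technique and not the closed form shown on the slide. It assumes the similarity matrix is row-normalized into a transition matrix and that every vertex has at least one positive-weight edge; the function name and the toy matrix are hypothetical.

```python
def stationary_distribution(weights, iterations=1000, tol=1e-12):
    """weights: square list-of-lists similarity matrix with positive row sums."""
    n = len(weights)
    row_sums = [sum(row) for row in weights]
    # Row-normalize so each row of P sums to 1 (an assumed normalization).
    P = [[w / row_sums[i] for w in weights[i]] for i in range(n)]

    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        # One step of the chain: nxt = pi * P
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(nxt, pi)) < tol
        pi = nxt
        if converged:
            break
    return pi

# Hypothetical toy similarity matrix for three vertices (no self-loops).
W = [[0.0, 0.8, 0.2],
     [0.8, 0.0, 0.5],
     [0.2, 0.5, 0.0]]
pi = stationary_distribution(W)
# Candidate Star centers in decreasing order of stationary value.
center_order = sorted(range(len(pi)), key=lambda v: -pi[v])
```

Under this particular normalization of a symmetric matrix, the stationary value of a vertex is proportional to the total weight of its edges, which the toy matrix makes easy to check by hand.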
Lower Bound
• Theoretical lower bound on expected similarity between satellite vertices:
$\cos(\gamma_{i,j}) \ge \cos(\alpha_i)\cos(\alpha_j) + \frac{\sigma}{\sigma + 1}\sin(\alpha_i)\sin(\alpha_j)$
• Can be used to estimate the average intra-cluster similarity
• Lower bound metric is the estimated average intra-cluster similarity when v is a Star center and v.adj are its satellites:
$\mathrm{lb}(v) = \left( \left( \sum_{v_i \in v.adj} \cos(\alpha_i) \right)^2 + \frac{\sigma}{\sigma + 1} \left( \sum_{v_i \in v.adj} \sin(\alpha_i) \right)^2 \right) / n^2$
• Computed on the pruned graph
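A small sketch of the lower bound metric follows. It assumes that cos(α_i) is the weight of the pruned edge between v and its neighbour v_i, that n is the number of vertices in v.adj, and that all angles lie in [0, π/2] so that sin(α_i) can be recovered from the cosine; the function name `lb` mirrors the slide, while the sample values are hypothetical.

```python
import math

def lb(cosines, sigma):
    """cosines: cos(alpha_i) for each vertex in v.adj; sigma: pruning threshold."""
    n = len(cosines)  # assumed to be the number of satellites of v
    if n == 0:
        return 0.0
    sum_cos = sum(cosines)
    # sin(alpha) = sqrt(1 - cos(alpha)^2) for angles in [0, pi/2] (an assumption).
    sum_sin = sum(math.sqrt(max(0.0, 1.0 - c * c)) for c in cosines)
    return (sum_cos ** 2 + (sigma / (sigma + 1.0)) * sum_sin ** 2) / n ** 2

# Hypothetical vertex with two satellites at similarities 0.6 and 0.8, sigma = 0.5.
value = lb([0.6, 0.8], 0.5)
```

For these values both sums equal 1.4, so the metric is 1.96 × (1 + 1/3) / 4, roughly 0.653.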
Average and Sum
• Approximations of the lower bound metric
• Computed on the pruned graph
• For each vertex v,
$\mathrm{ave}(v) = \sum_{v_i \in v.adj} \cos(\alpha_i) / \mathrm{degree}(v)$
$\mathrm{sum}(v) = \sum_{v_i \in v.adj} \cos(\alpha_i)$
• Average metric is the square root of the first term in the lower bound metric
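The two metrics can be sketched as below, again assuming that cos(α_i) is the weight of the pruned edge between v and v_i. The adjacency dictionary, the function names, and the tie-breaking by vertex index are illustrative assumptions.

```python
def sum_metric(neighbours):
    """neighbours: {neighbour: edge weight} for one vertex of the pruned graph."""
    return sum(neighbours.values())

def ave_metric(neighbours):
    """Sum of edge weights divided by the degree of the vertex (0.0 if isolated)."""
    return sum_metric(neighbours) / len(neighbours) if neighbours else 0.0

# Hypothetical pruned graph as {vertex: {neighbour: weight}}.
pruned = {
    0: {1: 0.9, 2: 0.8},
    1: {0: 0.9},
    2: {0: 0.8, 3: 0.7},
    3: {2: 0.7, 4: 0.9},
    4: {3: 0.9},
}

# Order in which vertices would be considered as Star centers under each metric.
by_ave = sorted(pruned, key=lambda v: (-ave_metric(pruned[v]), v))
by_sum = sorted(pruned, key=lambda v: (-sum_metric(pruned[v]), v))
```

The two orderings differ noticeably on this toy graph: the average favours vertices with few but strong edges, while the sum favours vertices with many edges.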
Markov, Lower Bound, Average, Sum Metrics
• We integrate our proposed metrics in the Star algorithm and its variants to produce:
  • Star-lb
  • Star-sum
  • Star-ave
  • Star-markov
  • Star-extended-sum-(r)
  • Star-extended-ave-(r)
  • Star-extended-sum-(u)
  • Star-extended-ave-(u)
  • Star-online-sum
  • Star-online-ave
Experiments
• Compare performance with off-line and on-line Star clustering and restricted and unrestricted Extended Star clustering
• Use data from Reuters-21578, Tipster-AP, and our original collection: Google
• Measure effectiveness: recall, precision, F1
• Measure efficiency: running time
• Measure sensitivity to σ
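As a reminder of how the effectiveness measures relate, the sketch below scores one cluster against one reference class using the standard definitions of precision, recall, and F1. How the authors pair clusters with reference classes is not stated on the slides, so the pairing, the function name, and the sample sets here are assumptions.

```python
def precision_recall_f1(cluster, relevant):
    """cluster: ids placed in a cluster; relevant: ids of the reference class."""
    cluster, relevant = set(cluster), set(relevant)
    hits = len(cluster & relevant)
    precision = hits / len(cluster) if cluster else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical cluster of four documents against a class of three.
p, r, f1 = precision_recall_f1({1, 2, 3, 4}, {3, 4, 5})
```

Here two of the four clustered documents belong to the class, so precision is 0.5, recall is 2/3, and F1 is 4/7.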
Off-line Algorithms
• Star-lb and Star-ave are most effective but Star-ave is much more efficient
• Star-random performs comparably to original Star when threshold σ is the average similarity
Off-line Algorithms
Effectiveness comparison
[Figure: bar chart of Precision, Recall, and F1 (axis 0 to 1.2) for star, star-ave, star-sum, star-markov, star-random, and star-lb on reuters, tipster-ap, and google]
Off-line Algorithms
Efficiency comparison
[Figure: bar chart of Time (ms) (axis 0 to 400000) for star, star-ave, star-sum, star-markov, star-random, and star-lb on reuters, tipster-ap, and google]
Order of Stars
• We empirically demonstrate that Star-ave approximates Star-lb better than the other algorithms do, in that it makes a similar choice of Star centers
Order of Stars (on Tipster-AP)
[Figure: expected similarity rank (axis 0 to 2000) against iteration (1 to 52) for star, star-sum, star-markov, star-ave, and star-lb]
Sensitivity to σ
As compared to the original Star:
• Star-ave and Star-markov converge to a maximum F1 at a lower threshold
• The maximum F1 of Star-ave is higher
• F1 gradient of Star-ave and Star-markov is smaller
Sensitivity to σ (F1 on Reuters)
[Figure: F1 (axis 0 to 1.2) against σ (ticks from 0 to 20, with the tick 1 labelled "mean") for star, star-ave, star-markov, star-sum, and star-lb]
Sensitivity to σ (F1 gradient on Reuters)
[Figure: F1 gradient (axis 0 to 0.6) against σ (ticks from 0 to 20, with the tick 1 labelled "mean") for star, star-ave, star-markov, star-sum, and star-lb]
Extended Star
• Star-ave is more effective and efficient than Star-extended-(r)
• Star-extended-ave-(r) improves the effectiveness of Star-extended-(r)
• Similar findings are observed with Star-extended-(u)
Extended Star
Effectiveness comparison
[Figure: bar chart of Precision, Recall, and F1 (axis 0 to 1.2) for star-ave, star-extended-(r), star-extended-ave-(r), and star-extended-sum-(r) on reuters, tipster-ap2, and google]
Extended Star
Efficiency comparison
[Figure: bar chart of Time (ms) (axis 0 to 50000) for star-ave, star-extended-(r), star-extended-ave-(r), and star-extended-sum-(r) on reuters, tipster-ap2, and google]
On-line Algorithms
• Star-online-ave is more effective and efficient than the original Star on-line algorithm
On-line Algorithms
Effectiveness comparison
[Figure: bar chart of Precision, Recall, and F1 (axis 0 to 1.2) for star-online, star-online-ave, star-online-sum, and star-online-random on reuters, tipster-ap, and google]
On-line Algorithms
Efficiency comparison
[Figure: bar chart of Time (ms) (axis 0 to 400000) for star-online, star-online-ave, star-online-sum, and star-online-random on reuters, tipster-ap, and google]
Conclusion
• Current metrics for selecting Star centers are not optimal
• We propose various new metrics for selecting Star centers that maximize intra-cluster similarity
• The average metric is a fast and good approximation of the lower bound metric
• Since intra-cluster similarity is maximized, it is precision that is mostly improved
• Our proposed average metric yields up to 19.1% improvement on precision for off-line algorithms, 20.9% improvement on precision for on-line algorithms, and 102% improvement on precision for the Extended Star algorithm
References
1. Aslam, J., Pelekhov, K., Rus, D.: The Star Clustering Algorithm. Journal of Graph Algorithms and Applications 8(1), 95–129 (2004)
2. Brin, S., Page, L.: The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proceedings of the Seventh International Conference on World Wide Web 7, 107–117 (1998)
Credits
This work was funded by the National University of Singapore ARG project R-252-000-285-112, "Mind Your Language: Corpora and Algorithms for Fundamental Natural Language Processing Tasks in Information Retrieval and Extraction for the Indonesian and Malay languages"
Copyright © 2007 by Stéphane Bressan