StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization
DESCRIPTION
StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization. Suqi Cheng, Research Center of Web Data Sciences & Engineering, Institute of Computing Technology, Chinese Academy of Sciences. http://www.nascgroup.org/~chengsuqi
TRANSCRIPT
![Page 1: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/1.jpg)
StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization
Suqi Cheng
Research Center of Web Data Sciences & Engineering
Institute of Computing Technology, Chinese Academy of Sciences
http://www.nascgroup.org/~chengsuqi
Authors: Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, Xueqi Cheng
![Page 2: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/2.jpg)
Outline
• Background
• Preliminaries
• Motivation
• StaticGreedy algorithm
• Experiments
![Page 3: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/3.jpg)
Information Cascade
• An action or idea is adopted by users one after another due to social influence
  – it cascades through social relationships
• Main applications
  – Word-of-mouth marketing
  – Outbreak detection
  – Popularity prediction
social network
![Page 4: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/4.jpg)
Word-of-Mouth Marketing
• To promote a product by seeding a few users; users adopting the product will recommend it
• Advantages: efficient; cost-effective
[Figure: a company gives a free product or discount to seed users, who then influence follow-up activated users]
How to select the optimal seed users?
![Page 5: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/5.jpg)
Influence Maximization for Viral Marketing
• Objective function
  – Influence spread I(S): expected number of activated (influenced/adopted) nodes
  – Maximize I(S)
• Input:
  – A social influence graph G=(V, E)
  – An information cascade model
  – An integer k, with |S| ≤ k
• Output: A seed set S
![Page 6: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/6.jpg)
Information Cascade Model
• Independent cascade (IC) model
  – each edge (u, v) has a propagation probability p(u, v)
  – each newly activated node u independently activates its out-neighbor v with probability p(u, v)
  – a discrete-time model
• Influence spread estimation on the IC model
  – Monte Carlo simulation
  – Heuristic methods
[Figure: social influence graph with a propagation probability (0.1 to 0.5) on each edge [Leskovec, 2008]]
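As a rough illustration of the IC model and of Monte Carlo spread estimation, the following is a minimal sketch, not the authors' implementation. The adjacency format, the function names `simulate_ic` and `estimate_spread`, and the toy graph with its probabilities are all assumptions made for this example.

```python
import random
from collections import deque

def simulate_ic(graph, seeds, rng):
    """Run one independent cascade from `seeds`; return the set of activated nodes.

    `graph` maps each node u to a list of (v, p_uv) out-edges (assumed format).
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # Each newly activated u gets exactly one chance to activate v.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active

def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of I(S): the average cascade size over `runs` simulations."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs

# Hypothetical toy graph; the probabilities are illustrative only.
toy_graph = {"a": [("b", 0.5), ("c", 0.1)], "b": [("c", 0.4)], "c": []}
spread = estimate_spread(toy_graph, ["a"])
```

Each call to `estimate_spread` scores a single seed set, which is why simulation-based greedy algorithms need many runs per candidate.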
![Page 7: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/7.jpg)
Difficulties in Influence Maximization
Difficulty 1: The influence maximization problem is NP-hard [Kempe, KDD’03].
Existing solutions
• Greedy approximation algorithm [Kempe, KDD’03]: accurate but inefficient
  – (1-1/e-ε)-approximation
  – iteratively selects the node with the largest marginal influence spread
  – guaranteed by the submodularity and monotonicity of the influence spread function
• Heuristics (Degree, PageRank, Betweenness): efficient but inaccurate
![Page 8: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/8.jpg)
Difficulties in Influence Maximization
Difficulty 2: Computing the influence spread exactly is #P-hard [Chen, KDD’10].
Existing solutions
• Heuristic methods: efficient but inaccurate
  – DegreeDiscount [Chen, KDD’09]
  – CGA [Wang, KDD’10]
  – PMIA [Chen, KDD’10]
  – IRIE [Jung, ICDM’12]
• Monte Carlo simulation: accurate but time-consuming
  – CELF optimization [Leskovec, KDD’07]
  – NewGreedy [Chen, KDD’09]
  – CELF++ optimization [Goyal, WWW’11]
A scalability-accuracy dilemma!
![Page 9: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/9.jpg)
Our work
• Objective: to propose an influence maximization algorithm that solves the scalability-accuracy dilemma

| Category | Algorithm | Accuracy | Scalability |
| --- | --- | --- | --- |
| Approximation algorithms | Greedy [Kempe, KDD’03] | guaranteed | low |
| Approximation algorithms | GreedyCELF [Leskovec, KDD’07] | guaranteed | low |
| Approximation algorithms | GreedyCELF++ [Goyal, WWW’11] | guaranteed | low |
| Approximation algorithms | NewGreedy/MixedGreedy [Chen, KDD’09] | guaranteed | low |
| Approximation algorithms | StaticGreedy [Cheng, CIKM’13] | guaranteed | high |
| Heuristics | Degree | unguaranteed | high |
| Heuristics | PageRank [Page, 1999] | unguaranteed | high |
| Heuristics | DegreeDiscount [Chen, KDD’09] | unguaranteed | high |
| Heuristics | PMIA [Chen, KDD’10] | unguaranteed | high |
| Heuristics | IRIE [Jung, ICDM’12] | unguaranteed | high |
| Heuristics | SP1M [Kimura, PKDD’06] | unguaranteed | relatively low |
![Page 10: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/10.jpg)
Preliminaries-1
• Social influence graph: G=(V, E), n=|V|, m=|E|
• Influence spread: I(S)
• Marginal influence spread: $M(v|S) = I(S \cup \{v\}) - I(S)$
• Greedy approximation algorithm
  – iteratively selects the node with the largest marginal influence spread
  – provides a (1-1/e-ε)-approximation
  – requires influence spread estimation
• Properties of I(S) under the independent cascade model, which guarantee the approximation ratio
  – submodularity: $I(S \cup \{v\}) - I(S) \ge I(T \cup \{v\}) - I(T)$ for all $v \in V$ and $S \subseteq T \subseteq V$
  – monotonicity: $I(S \cup \{v\}) \ge I(S)$
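The greedy selection loop can be sketched as follows. This is a minimal illustration that takes any spread estimator as a parameter; the names `greedy_select`, `coverage` and `toy_spread` are hypothetical, and the toy spread function (size of a union of fixed sets) merely stands in for I(S) because it is monotone and submodular.

```python
def greedy_select(nodes, k, spread):
    """Pick up to k seeds, each time adding the node with the largest marginal gain.

    `spread(S)` is any estimator of I(S) supplied by the caller.
    """
    seeds = []
    current = spread(seeds)
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            gain = spread(seeds + [v]) - current  # M(v|S) = I(S ∪ {v}) - I(S)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k candidates were available
            break
        seeds.append(best)
        current += best_gain
    return seeds

# Hypothetical deterministic spread: size of the union of fixed coverage sets.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}

def toy_spread(S):
    return len(set().union(*(coverage[v] for v in S)))

chosen = greedy_select(["a", "b", "c"], 2, toy_spread)
```

The loop calls `spread` once per candidate per iteration, so the cost of a single estimate dominates the running time.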
![Page 11: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/11.jpg)
Preliminaries-2
• Monte Carlo simulation for influence spread estimation
  – approximates the true influence spread by averaging over random realizations

| Method | An instance | Advantage | Disadvantage |
| --- | --- | --- | --- |
| Simulation | modeling the information cascade process | relatively low time complexity | estimates one seed set at a time |
| Snapshot [Chen, KDD’09] | removing each edge (u, v) from G with probability 1-p(u, v) | can estimate any seed set simultaneously | relatively high time complexity |

The two methods are equivalent.
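The snapshot method can be sketched like this, assuming the graph is stored as a dictionary from each node to its `(neighbor, probability)` out-edges. The helper names `sample_snapshot`, `reachable` and `snapshot_spread` and the toy graph are assumptions for illustration.

```python
import random

def sample_snapshot(graph, rng):
    """Keep each edge (u, v) with probability p(u, v), i.e. remove it with 1 - p(u, v)."""
    return {u: [v for v, p in edges if rng.random() < p] for u, edges in graph.items()}

def reachable(snapshot, sources):
    """Nodes reachable from `sources` in a snapshot (iterative depth-first search)."""
    seen = set(sources)
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def snapshot_spread(snapshots, seeds):
    """Estimate I(S) as the average number of nodes reachable from S."""
    return sum(len(reachable(g, seeds)) for g in snapshots) / len(snapshots)

# Hypothetical toy graph; the probabilities are illustrative only.
toy_graph = {"a": [("b", 0.5), ("c", 0.1)], "b": [("c", 0.4)], "c": []}
rng = random.Random(0)
snapshots = [sample_snapshot(toy_graph, rng) for _ in range(200)]
```

Once the list `snapshots` exists, `snapshot_spread(snapshots, S)` can score any seed set S without drawing new random numbers.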
![Page 12: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/12.jpg)
Motivation
• In existing greedy algorithms
  – there is a risk that the estimated influence spread function is neither submodular nor monotone
[Figure: an influence graph, with snapshot 1 used in iteration 1 and snapshot 2 used in iteration 2]
Submodularity is broken! With $S_0 = \emptyset \subseteq S_1 = \{v_2\}$, the later marginal gain is the larger one:
$$I(S_0 \cup \{v_4\}) - I(S_0) = I(\{v_4\}) - I(\emptyset) = 1$$
$$I(S_1 \cup \{v_4\}) - I(S_1) = I(\{v_2, v_4\}) - I(\{v_2\}) = 3$$
  – caused by using different Monte Carlo simulation results for different influence spread estimations
  – a very large value of R is required, e.g. R=20000
R: number of Monte Carlo simulations for estimation
![Page 13: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/13.jpg)
StaticGreedy algorithm
• Core idea: always use the same snapshots for influence spread estimation
  – the influence spread function is submodular and monotone
  – a small value of R is required, e.g. R=100
Part 1: Generate R static snapshots
Part 2: Greedy selection
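The two parts can be sketched together as below. This is a deliberately unoptimized illustration and not the authors' code: it recomputes reachability for every candidate instead of using CELF or the dynamic update. The function names and the requirement that every node appears as a key of `graph` are assumptions.

```python
import random

def sample_snapshot(graph, rng):
    """Part 1 helper: keep each edge (u, v) with probability p(u, v)."""
    return {u: [v for v, p in edges if rng.random() < p] for u, edges in graph.items()}

def reachable(snapshot, source):
    """Nodes reachable from `source` in one snapshot, including `source` itself."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def static_greedy(graph, k, R=100, seed=0):
    """Return (seeds, estimated spread). Every node must be a key of `graph`."""
    rng = random.Random(seed)
    # Part 1: generate R static snapshots once and reuse them throughout.
    snapshots = [sample_snapshot(graph, rng) for _ in range(R)]
    seeds = []
    covered = [set() for _ in snapshots]  # nodes already reached by the seeds, per snapshot
    # Part 2: greedy selection on the same snapshots in every iteration.
    for _ in range(min(k, len(graph))):
        best, best_gain = None, -1
        for v in graph:
            if v in seeds:
                continue
            # Marginal gain of v summed over snapshots: newly reached nodes only.
            gain = sum(len(reachable(g, v) - c) for g, c in zip(snapshots, covered))
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
        for g, c in zip(snapshots, covered):
            c |= reachable(g, best)
    return seeds, sum(len(c) for c in covered) / R
```

Because the snapshots never change, the estimated spread is an average of coverage functions, which is monotone and submodular by construction.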
![Page 14: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/14.jpg)
Performance analysis: Convergence rate
• provides a (1-1/e-ε)-approximation with a small value of R
$$d_{R,k} = \frac{I(S_k^*) - I(S_{R,k})}{I(S_k^*)}$$
[Figure: $d_{R,k}$ versus log R, seed set size = 50]
NetHEPT: a benchmark network
uniform independent cascade (UIC) model: p(u, v) = p = 0.01
weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v)
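The two probability assignments can be written down directly. This is a small sketch that assumes the graph is given as a list of distinct directed edges; the function names and the three-edge example are hypothetical.

```python
from collections import Counter

def uic_probabilities(edges, p=0.01):
    """UIC model: every edge gets the same propagation probability p."""
    return {(u, v): p for u, v in edges}

def wic_probabilities(edges):
    """WIC model: p(u, v) = 1 / (number of in-neighbors of v)."""
    in_degree = Counter(v for _, v in edges)
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}

# Hypothetical directed edges (u, v), assumed to be distinct.
edges = [("a", "c"), ("b", "c"), ("a", "b")]
uic = uic_probabilities(edges)
wic = wic_probabilities(edges)
```

Here node `c` has two in-neighbors, so both of its incoming edges get probability 0.5 under WIC, while the single edge into `b` gets probability 1.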
![Page 15: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/15.jpg)
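Performance analysis: Scalability
$$R_{\min} = \min\{R \mid d_{R,k} \le 0.005\}$$
[Figure, "Minimal R required": log $R_{\min}$ versus seed set size; R is significantly reduced]
[Figure, "Running time": log running time (sec) versus seed set size; running time is significantly reduced]
The plots are annotated with reductions of ≈$10^3$ times and ≈$10^2$ times.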
![Page 16: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/16.jpg)
Performance analysis: Complexity
$$R' \approx 10^{-2} R, \qquad m' = \sum_{(u,v) \in E} p_{u,v} \le m$$
n: number of nodes in the social influence graph
m: number of edges in the social influence graph
m’: expected number of edges in a snapshot
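To make the size of m’ concrete, the following worked computation applies linearity of expectation to the two cascade models used in this deck. The notation $N_{\mathrm{in}}(v)$ for the in-neighbors of v is an assumption introduced here, and the edges are assumed to be distinct.

```latex
\begin{align*}
m' &= \sum_{(u,v) \in E} p(u,v)
   && \text{(each edge survives independently with probability } p(u,v)\text{)} \\
\text{UIC:}\quad m' &= p \cdot m = 0.01\, m \\
\text{WIC:}\quad m' &= \sum_{v \in V} \sum_{u \in N_{\mathrm{in}}(v)} \frac{1}{|N_{\mathrm{in}}(v)|}
   = \bigl|\{\, v \in V : N_{\mathrm{in}}(v) \neq \emptyset \,\}\bigr| \le n
\end{align*}
```

In both models a snapshot is therefore much sparser than the original graph whenever m is large relative to n or p is small.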
![Page 17: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/17.jpg)
Speed up StaticGreedy
• A dynamic update strategy
  – calculates the marginal gain in an efficient incremental manner
    • at each step t, for each snapshot: $M(v) \leftarrow M(v) - |R(v) \cap R(v_t^*)|$, $R(v) \leftarrow R(v) - R(v) \cap R(v_t^*)$
  – trades space for time
R(v): reachable nodes from v in the snapshot
[Figure: a snapshot over nodes v1 to v8, initial state]
M(v1)=4, M(v2)=3, M(v3)=2, M(v4)=1, M(v5)=1, M(v6)=1, M(v7)=2, M(v8)=1
![Page 18: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/18.jpg)
Speed up StaticGreedy
After selecting v* = v1, the marginal gains are updated directly from the reachable sets in the snapshot:
M(v1)=0, M(v2)=2, M(v3)=0, M(v4)=0, M(v5)=1, M(v6)=0, M(v7)=2, M(v8)=1
[Figure: the same snapshot after the update; the gains of v1, v2, v3, v4 and v6 drop by 4, 1, 2, 1 and 1]
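The update rule can be checked on a single snapshot with a short sketch. The edge list below is hypothetical: the figure's edges are not recoverable from the transcript, so the edges were chosen only to be consistent with the M values listed on the slides. The function names are also assumptions.

```python
def reach_sets(snapshot):
    """R(v) for every node v of a snapshot given as {node: [out-neighbors]}."""
    result = {}
    for s in snapshot:
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in snapshot[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        result[s] = seen
    return result

def select_and_update(R, M, chosen):
    """Apply M(v) <- M(v) - |R(v) ∩ R(v*)| and R(v) <- R(v) - R(v*), in place."""
    covered = set(R[chosen])  # copy, because R[chosen] itself shrinks below
    for v in R:
        overlap = R[v] & covered
        M[v] -= len(overlap)
        R[v] -= overlap

# Hypothetical edges, chosen to reproduce the M values shown on the slides.
snapshot = {
    "v1": ["v3", "v4"], "v2": ["v4", "v5"], "v3": ["v6"], "v4": [],
    "v5": [], "v6": [], "v7": ["v8"], "v8": [],
}
R = reach_sets(snapshot)
M = {v: len(R[v]) for v in R}
initial = dict(M)
select_and_update(R, M, "v1")
```

In the full algorithm the same update would run on each of the R snapshots and the per-snapshot gains would be summed, so storing every R(v) is the space cost that buys the faster selection.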
![Page 19: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/19.jpg)
Experiments: setup
• Algorithms:
  – Our algorithms: StaticGreedyCELF, StaticGreedyDU
  – Baselines: CELFGreedy, SP1M, PMIA, Degree, DegreeDiscount
• Tested datasets
• Independent cascade models
  – uniform independent cascade (UIC) model: p(u, v) = p = 0.01
  – weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v)
• Metrics: influence spread, running time
![Page 20: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/20.jpg)
Experiments: influence spread
• StaticGreedy achieves better accuracy than the heuristics
[Figures: influence spread on NetPHY and DBLP, each under the UIC model and the WIC model]
![Page 21: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/21.jpg)
Experiments: running time
• StaticGreedy runs >$10^3$ times faster than CELFGreedy
• StaticGreedy has comparable scalability to state-of-the-art heuristics
• StaticGreedyDU always runs faster than StaticGreedyCELF
[Figures: log running time (sec) under the UIC model and the WIC model]
![Page 22: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/22.jpg)
Conclusion
• Essential reason for the inefficiency of existing greedy algorithms
  – a risk of unguaranteed submodularity and monotonicity
  – caused by using different Monte Carlo simulations across different estimations
  – a very large value of R is required → guaranteed accuracy + inefficiency
• StaticGreedy algorithm
  – guaranteed submodularity and monotonicity
  – uses the same Monte Carlo simulations across different estimations
  – a small value of R is required → guaranteed accuracy + high scalability
  – runs >$10^3$ times faster than conventional greedy algorithms
• A dynamic update strategy to speed up StaticGreedy
  – about 10 times faster
![Page 23: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization](https://reader036.vdocuments.us/reader036/viewer/2022062803/56814644550346895db34f75/html5/thumbnails/23.jpg)
Thank you!
Q & A