CMU SCS
Mining Billion-node Graphs
Christos Faloutsos
CMU

Related Tasks for this presentation
• E2.1
• E3.1
• I3.1
INARC CUNY'10 C. Faloutsos (CMU) 3

Big picture: large graph mining
• Patterns / anomalies
– Static graphs
– Dynamic graphs
– Weighted graphs
– ‘heterogeneous’ graphs (multiple-type nodes/edges)
• Generators
– Kronecker; Random Typing; Tensors
• Influence/Virus Propagation

Big picture: large graph mining
• Patterns / anomalies
– Static graphs
– Dynamic graphs
– Weighted graphs
– ‘heterogeneous’ graphs (multiple-type nodes/edges)
• Generators
– Kronecker; Random Typing; Tensors
• Influence/Virus Propagation
Duration of phonecalls (E2.1)
(I3.1)

Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms
ECML-PKDD 2010, Barcelona, Spain
B. Aditya Prakash*, Hanghang Tong*^, Nicholas Valler+, Michalis Faloutsos+, Christos Faloutsos*
* Carnegie Mellon University, Pittsburgh USA
+ University of California – Riverside USA
^ IBM Research, Hawthorne USA
PKDD, 2010 demo
INARC and IRC

Q1: threshold?
• Strong Virus: Epidemic!
• Weak Virus: Small infection

Q2: Immunization?
Which nodes to immunize?

Our Framework
Standard, static graph setting:
• Simple stochastic framework (Flu-like – ‘SIS’)
• FIXED underlying contact-network – ‘who-can-infect-whom’
OUR CASE:
– Changes in time – alternating behaviors!
– E.g., day vs night

Reminder: ‘Flu-like’ (SIS)
• ‘S’ Susceptible (= healthy); ‘I’ Infected
• No immunity (cured nodes -> ‘S’)
[Figure: state diagram. Susceptible -> Infected: infected by neighbor. Infected -> Susceptible: cured internally.]

SIS model (continued)
• Virus birth rate β
• Host cure rate δ
[Figure: node X with neighbors N1, N2, N3; legend: Infected / Healthy; arrows labeled Prob. β, Prob. β, Prob. δ]
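
As a small worked example of the two rates, assume a discrete-time reading of the model in which, at every step, each infected neighbor independently transmits with probability β and each infected node is cured with probability δ. The independence assumption and the symbol k (number of infected neighbors) are illustrative and not stated on the slide.

```latex
% Assumption: independent transmission attempts within one time step.
% k = number of currently infected neighbors of a healthy node.
\[
  P(\text{healthy} \to \text{infected}) = 1 - (1-\beta)^{k},
  \qquad
  P(\text{infected} \to \text{healthy}) = \delta .
\]
% Example: \beta = 0.1 and k = 2 give 1 - 0.9^{2} = 0.19.
```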

Alternating Behaviors
• DAY (e.g., work): 8 x 8 adjacency matrix
• NIGHT (e.g., home): 8 x 8 adjacency matrix

Outline
√ Our Framework
√ SIS epidemic model
√ Time varying graphs
• Problem Descriptions
• Epidemic Threshold
• Immunization
• Conclusion

Formally, given
• SIS model
– cure rate δ
– infection rate β
• Set of T arbitrary graphs (N x N adjacency matrices): day, night, …, weekend, …
[Figure: node X with neighbors N1, N2, N3; legend: Infected / Healthy; arrows labeled Prob. β, Prob. β, Prob. δ]
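
The inputs above (cure rate δ, infection rate β, and a repeating set of T graphs) can be sketched as a small simulation. This is a minimal illustration, not the authors' simulator: the function names, the dict-of-sets graph representation, and the rule that a node cured in a step cannot be re-infected in that same step are all assumptions.

```python
import random

def sis_step(adj, infected, beta, delta, rng):
    """One discrete SIS step on one contact graph (hypothetical helper).

    adj: dict mapping node -> set of neighbor nodes
    infected: set of currently infected nodes
    """
    nxt = set()
    for node in adj:
        if node in infected:
            # An infected node is cured with probability delta.
            if rng.random() >= delta:
                nxt.add(node)
        else:
            # Each infected neighbor transmits independently with probability beta.
            for nb in adj[node]:
                if nb in infected and rng.random() < beta:
                    nxt.add(node)
                    break
    return nxt

def simulate(graphs, beta, delta, steps, seed_nodes, seed=0):
    """Cycle through the T graphs (day, night, ...) and record # infected."""
    rng = random.Random(seed)
    infected = set(seed_nodes)
    history = [len(infected)]
    for t in range(steps):
        adj = graphs[t % len(graphs)]
        infected = sis_step(adj, infected, beta, delta, rng)
        history.append(len(infected))
    return history

def clique(n):
    return {i: set(range(n)) - {i} for i in range(n)}

def chain(n):
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

if __name__ == "__main__":
    # Alternate a 100-node clique ("day") with a 100-node chain ("night").
    print(simulate([clique(100), chain(100)], beta=0.01, delta=0.5,
                   steps=20, seed_nodes=range(10)))
```

Plotting the returned history on a log scale against time gives the kind of curve that distinguishes die-out from take-off.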

Find…
Q1: Epidemic Threshold: Fast die-out?
Q2: Immunization: best k?
[Figure: # infected (I) vs. time (t), for a virus above and below the threshold]

Q1: Threshold - Main result
• NO epidemic if $\mathrm{eig}(S) = \lambda_S < 1$
Single number!
Largest eigenvalue of the “system matrix”

Details
NO epidemic if $\mathrm{eig}(S) < 1$
$S = \prod_i S_i$
[Figure: the factors $S_i$ (day, night, ……..), annotated with cure rate, infection rate, and N x N adjacency matrix]
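
The threshold test can be sketched in a few lines. The per-graph factor used here, S_i = (1 - δ) I + β A_i, is an assumption: it is consistent with the labels on the slide (cure rate, infection rate, adjacency matrix) but is not spelled out in this transcript. The function names are hypothetical, and the plain power iteration assumes a non-negative matrix with a positive diagonal (so δ < 1); with NumPy available, `numpy.linalg.eigvals` would be the more usual choice.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def system_matrix(adjs, beta, delta):
    """S = prod_i S_i with the ASSUMED factor S_i = (1 - delta) I + beta A_i."""
    n = len(adjs[0])
    S = [[float(i == j) for j in range(n)] for i in range(n)]
    for A in adjs:
        S_i = [[(1.0 - delta) * (i == j) + beta * A[i][j] for j in range(n)]
               for i in range(n)]
        S = mat_mul(S_i, S)  # later graphs multiply on the left
    return S

def largest_eigenvalue(S, iters=500):
    """Power iteration from the all-ones vector (max-norm normalization)."""
    n = len(S)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam

def below_threshold(adjs, beta, delta):
    """True when the largest eigenvalue of the system matrix is below 1."""
    return largest_eigenvalue(system_matrix(adjs, beta, delta)) < 1.0

if __name__ == "__main__":
    day = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]    # 3-node clique
    night = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # 3-node chain
    print(below_threshold([day, night], beta=0.1, delta=0.5))
```

Because the test reduces to a single number, checking a candidate (β, δ) pair against a given day/night schedule needs only T matrix products and one eigenvalue computation.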

Q1: Simulation experiments
• Synthetic
– 100 nodes
– Clique; Chain
• MIT Reality Mining
– 104 mobile devices
– September 2004 – June 2005
– 12-hr adjacency matrices

‘Take-off’ plots
[Figure: two panels, Synthetic and MIT Reality Mining. Vertical axis: footprint (# infected @ steady state), log scale. In each panel, our threshold separates NO EPIDEMIC from EPIDEMIC.]

Time-plots
[Figure: two panels, Synthetic and MIT Reality Mining. log(# infected) vs. Time, with curves for < threshold, @ threshold, and > threshold.]

Outline
√ Motivation
√ Our Framework
√ SIS epidemic model
√ Time varying graphs
√ Problem Descriptions
√ Epidemic Threshold
• Immunization

Q2: Immunization
• Our solution
– reduce $\lambda_{\prod_i S_i}$ ( == $\lambda$ )
– goal: max ‘eigendrop’ $\Delta\lambda$
• Comparison - But: No competing policy
• We propose and evaluate many policies
$\Delta\lambda = \lambda_{\text{before}} - \lambda_{\text{after}}$
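
The eigendrop objective can be illustrated with a brute-force greedy loop: repeatedly remove the node whose removal lowers the largest eigenvalue of the system matrix the most. This is only a sketch of the objective, not one of the policies named in the talk (Optimal, Greedy-S, Greedy-DavgA). It assumes the per-graph factor S_i = (1 - δ) I + β A_i, models immunization as deleting a node from every graph, and uses hypothetical function names.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def largest_eigenvalue(S, iters=500):
    """Power iteration; assumes S is non-negative with a positive diagonal."""
    n = len(S)
    if n == 0:
        return 0.0
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam

def system_lambda(adjs, keep, beta, delta):
    """Largest eigenvalue of prod_i S_i restricted to the nodes in `keep`."""
    m = len(keep)
    S = [[float(r == c) for c in range(m)] for r in range(m)]
    for A in adjs:
        # ASSUMED factor: S_i = (1 - delta) I + beta A_i
        S_i = [[(1.0 - delta) * (r == c) + beta * A[u][v]
                for c, v in enumerate(keep)] for r, u in enumerate(keep)]
        S = mat_mul(S_i, S)
    return largest_eigenvalue(S)

def greedy_immunize(adjs, beta, delta, k):
    """Pick up to k nodes greedily; return (chosen nodes, eigendrop)."""
    alive = list(range(len(adjs[0])))
    lam_before = system_lambda(adjs, alive, beta, delta)
    chosen = []
    for _ in range(min(k, len(alive))):
        best_node, best_lam = None, float("inf")
        for cand in alive:
            keep = [u for u in alive if u != cand]
            lam = system_lambda(adjs, keep, beta, delta)
            if lam < best_lam:
                best_node, best_lam = cand, lam
        chosen.append(best_node)
        alive.remove(best_node)
    lam_after = system_lambda(adjs, alive, beta, delta)
    return chosen, lam_before - lam_after

if __name__ == "__main__":
    # Star graph: node 0 is the hub, nodes 1..4 are leaves.
    star = [[0, 1, 1, 1, 1]] + [[1, 0, 0, 0, 0] for _ in range(4)]
    print(greedy_immunize([star, star], beta=0.2, delta=0.5, k=1))
```

Each greedy round recomputes the eigenvalue once per remaining node, so this version is only practical for small graphs; it is meant to make the Δλ objective concrete rather than to scale.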

[Figure: comparison of immunization policies: Optimal, Greedy-S, Greedy-DavgA. Lower is better.]

Conclusion
• Time-varying Graphs
• SIS (flu-like) propagation model
√ Q1: Epidemic Threshold - $\lambda < 1$
– Only first eigen-value of system matrix!
√ Q2: Immunization Policies – max. $\Delta\lambda$
– Optimal
– Greedy-S
– Greedy-DavgA
– etc.

Goal: large graph mining
• Patterns / anomalies
– Static graphs
– Dynamic graphs
– Weighted graphs
– ‘heterogeneous’ graphs (multiple-type nodes/edges)
• Generators
– Kronecker; Random Typing; Tensors
• Influence/Virus Propagation
Duration of phonecalls (E2.1)
(I3.1) ✔

Duration of phonecalls
Surprising Patterns for the Call Duration Distribution of Mobile Phone Users
Pedro O. S. Vaz de Melo, Leman Akoglu, Christos Faloutsos, Antonio A. F. Loureiro
PKDD 2010

Probably, power law (?)

No Power Law!

‘TLaC: Lazy Contractor’
• The longer a task (phonecall) has taken,
• The even longer it will take
Odds ratio = Casualties(<x) : Survivors(>=x) == power law
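
The odds-ratio statement can be checked numerically. If the odds ratio is a power law, OR(x) = (x/α)^ρ, then solving F/(1 - F) = (x/α)^ρ gives the log-logistic CDF F(x) = 1 / (1 + (x/α)^(-ρ)), and a log-log plot of the empirical odds ratio should be close to a straight line with slope ρ. The sketch below uses hypothetical function names, and the parameters α and ρ and the synthetic durations are illustrative only, not values from the talk.

```python
import math
from bisect import bisect_left

def odds_ratio_points(durations):
    """Empirical odds ratio Casualties(<x) : Survivors(>=x) at each distinct x."""
    xs = sorted(durations)
    n = len(xs)
    points = []
    for x in sorted(set(xs)):
        casualties = bisect_left(xs, x)  # calls with duration < x
        survivors = n - casualties       # calls with duration >= x
        if casualties > 0:               # skip the minimum, where the ratio is 0
            points.append((x, casualties / survivors))
    return points

def loglog_slope(points):
    """Least-squares slope of log(odds ratio) against log(x)."""
    lx = [math.log(x) for x, _ in points]
    ly = [math.log(r) for _, r in points]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

def log_logistic_quantiles(alpha, rho, m):
    """Deterministic synthetic durations: m evenly spaced log-logistic quantiles."""
    return [alpha * (u / (1.0 - u)) ** (1.0 / rho)
            for u in ((i - 0.5) / m for i in range(1, m + 1))]

if __name__ == "__main__":
    # Illustrative parameters only.
    durations = log_logistic_quantiles(alpha=60.0, rho=1.5, m=1000)
    print(loglog_slope(odds_ratio_points(durations)))
```

On real call records the same two functions can be applied per user; a fitted line that tracks the empirical points well is what "followed" the model would look like in this view.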

Data Description
• Data from a private mobile operator of a large city
– 4 months of data
– 3.1 million users
– more than 1 billion phone records
• Among users with >30 phonecalls
– 96% followed TLAC
– Rest: anomalies (too many 1h phonecalls)

Goal: large graph mining
• Patterns / anomalies
– Static graphs
– Dynamic graphs
– Weighted graphs
– ‘heterogeneous’ graphs (multiple-type nodes/edges)
• Generators
– Kronecker; Random Typing; Tensors
• Influence/Virus Propagation
Duration of phonecalls (E2.1) ✔
(I3.1) ✔

Project info
www.cs.cmu.edu/~pegasus
Akoglu, Leman
Chau, Polo
Kang, U
McGlohon, Mary
Tsourakakis, Babis
Tong, Hanghang
Prakash, Aditya
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, INTEL, HP