mining billion-node graphs: patterns, generators …christos/talks/10-psu/foils/faloutsospsu...cmu...

127
CMU SCS Mining Billion-node Graphs: Patterns, Generators and Tools Christos Faloutsos CMU

Upload: others

Post on 23-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

Mining Billion-node Graphs: Patterns, Generators and

Tools Christos Faloutsos

CMU

Page 2: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

THANK YOU! •  Prof. Lee Giles

PSU'10 C. Faloutsos (CMU) 2

•  Louise Troxell

Page 3: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 3

Our goal:

Open source system for mining huge graphs:

PEGASUS project (PEta GrAph mining System)

•  www.cs.cmu.edu/~pegasus •  code and papers

PSU'10

Page 4: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 4

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools •  Problem#3: Scalability •  Conclusions

PSU'10

Page 5: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 5

Graphs - why should we care?

Internet Map [lumeta.com]

Food Web [Martinez ’91]

Protein Interactions [genomebiology.com]

Friendship Network [Moody ’01]

PSU'10

Page 6: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 6

Graphs - why should we care? •  IR: bi-partite graphs (doc-terms)

•  Citeseer: doc/authors/terms/…

•  web: hyper-text graph

•  ... and more:

D1

DN

T1

TM

... ...

PSU'10

Page 7: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 7

Graphs - why should we care? •  network of companies & board-of-directors

members •  ‘viral’ marketing •  web-log (‘blog’) news propagation •  computer network security: email/IP traffic

and anomaly detection •  ....

PSU'10

Page 8: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 8

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs – Weighted graphs – Time evolving graphs

•  Problem#2: Tools •  Problem#3: Scalability •  Conclusions

PSU'10

Page 9: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 9

Problem #1 - network and graph mining

•  How does the Internet look like? •  How does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

PSU'10

Page 10: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 10

Problem #1 - network and graph mining

•  How does the Internet look like? •  How does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

–  To spot anomalies (rarities), we have to discover patterns

PSU'10

Page 11: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 11

Problem #1 - network and graph mining

•  How does the Internet look like? •  How does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

–  To spot anomalies (rarities), we have to discover patterns

–  Large datasets reveal patterns/anomalies that may be invisible otherwise…

PSU'10

Page 12: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 12

Graph mining •  Are real graphs random?

PSU'10

Page 13: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 13

Laws and patterns •  Are real graphs random? •  A: NO!!

– Diameter –  in- and out- degree distributions –  other (surprising) patterns

•  So, let’s look at the data

PSU'10

Page 14: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 14

Solution# S.1 •  Power law in the degree distribution

[SIGCOMM99]

log(rank)

log(degree)

-0.82

internet domains

att.com

ibm.com

PSU'10

Page 15: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 15

Solution# S.2: Eigen Exponent E

•  A2: power law in the eigenvalues of the adjacency matrix

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

PSU'10

Page 16: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 16

Solution# S.2: Eigen Exponent E

•  [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

PSU'10

Page 17: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 17

But: How about graphs from other domains?

PSU'10

Page 18: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 18

More power laws: •  web hit counts [w/ A. Montgomery]

Web Site Traffic

in-degree (log scale)

Count (log scale)

Zipf

users sites

``ebay’’

PSU'10

Page 19: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 19

epinions.com •  who-trusts-whom

[Richardson + Domingos, KDD 2001]

(out) degree

count

trusts-2000-people user

PSU'10

Page 20: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

And numerous more •  # of sexual contacts •  Income [Pareto] –’80-20 distribution’ •  Duration of downloads [Bestavros+] •  Duration of UNIX jobs (‘mice and

elephants’) •  Size of files of a user •  … •  ‘Black swans’ PSU'10 C. Faloutsos (CMU) 20

Page 21: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 21

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs •  degree, diameter, eigen, •  triangles •  cliques

– Weighted graphs – Time evolving graphs

•  Problem#2: Tools PSU'10

Page 22: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 22

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles

PSU'10

Page 23: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 23

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles –  Friends of friends are friends

•  Any patterns?

PSU'10

Page 24: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 24

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

ASN HEP-TH

Epinions X-axis: # of participating triangles Y-axis: count

PSU'10

Page 25: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 25

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

ASN HEP-TH

Epinions

PSU'10

X-axis: # of participating triangles Y-axis: count

Page 26: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 26

Triangle Law: #S.4 [Tsourakakis ICDM 2008]

SN Reuters

Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles

PSU'10

Page 27: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 27

Triangle Law: Computations [Tsourakakis ICDM 2008]

But: triangles are expensive to compute (3-way join; several approx. algos)

Q: Can we do that quickly?

details

PSU'10

Page 28: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 28

Triangle Law: Computations [Tsourakakis ICDM 2008]

But: triangles are expensive to compute (3-way join; several approx. algos)

Q: Can we do that quickly? A: Yes!

#triangles = 1/6 Sum ( λi3 )

(and, because of skewness, we only need the top few eigenvalues!

details

PSU'10

Page 29: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 29

Triangle Law: Computations [Tsourakakis ICDM 2008]

1000x+ speed-up, >90% accuracy

details

PSU'10

Page 30: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

EigenSpokes B. Aditya Prakash, Mukund Seshadri, Ashwin

Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, 21-24 June 2010.

C. Faloutsos (CMU) 30 PSU'10

Page 31: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

EigenSpokes • Eigenvectors of adjacency matrix

  equivalent to singular vectors (symmetric, undirected graph)

31 C. Faloutsos (CMU) PSU'10

Page 32: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

EigenSpokes • Eigenvectors of adjacency matrix

  equivalent to singular vectors (symmetric, undirected graph)

32 C. Faloutsos (CMU) PSU'10

N

N

details

Page 33: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

EigenSpokes •  EE plot: •  Scatter plot of

scores of u1 vs u2 •  One would expect

– Many points @ origin

– A few scattered ~randomly

C. Faloutsos (CMU) 33

u1

u2

PSU'10

1st Principal component

2nd Principal component

Page 34: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

EigenSpokes •  EE plot: •  Scatter plot of

scores of u1 vs u2 •  One would expect

– Many points @ origin

– A few scattered ~randomly

C. Faloutsos (CMU) 34

u1

u2 90o

PSU'10

Page 35: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

EigenSpokes - pervasiveness • Present in mobile social graph

 across time and space

• Patent citation graph

35 C. Faloutsos (CMU) PSU'10

Page 36: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

36 C. Faloutsos (CMU) PSU'10

Page 37: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

EigenSpokes - explanation

37 C. Faloutsos (CMU) PSU'10

Near-cliques, or near-bipartite-cores, loosely connected

Page 38: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected So what?

 Extract nodes with high scores

  high connectivity  Good “communities”

spy plot of top 20 nodes

38 C. Faloutsos (CMU) PSU'10

Page 39: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

Bipartite Communities!

magnified bipartite community

patents from same inventor(s)

cut-and-paste bibliography!

39 C. Faloutsos (CMU) PSU'10

Page 40: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 40

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs •  degree, diameter, eigen, •  triangles •  cliques

– Weighted graphs – Time evolving graphs

•  Problem#2: Tools PSU'10

Page 41: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 41

Observations on weighted graphs?

•  A: yes - even more ‘laws’!

M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008

PSU'10

Page 42: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 42

Observation W.1: Fortification Q: How do the weights of nodes relate to degree?

PSU'10

Page 43: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 43

Observation W.1: Fortification

More donors, more $ ?

$10

$5

PSU'10

‘Reagan’

‘Clinton’ $7

Page 44: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

Edges (# donors)

In-weights ($)

C. Faloutsos (CMU) 44

Observation W.1: fortification: Snapshot Power Law

•  Weight: super-linear on in-degree •  exponent ‘iw’: 1.01 < iw < 1.26

Orgs-Candidates

e.g. John Kerry, $10M received, from 1K donors

More donors, even more $

$10

$5

PSU'10

Page 45: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 45

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs – Weighted graphs – Time evolving graphs

•  Problem#2: Tools •  …

PSU'10

Page 46: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 46

Problem: Time evolution •  with Jure Leskovec (CMU ->

Stanford)

•  and Jon Kleinberg (Cornell – sabb. @ CMU)

PSU'10

Page 47: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 47

T.1 Evolution of the Diameter •  Prior work on Power Law graphs hints

at slowly growing diameter: –  diameter ~ O(log N) –  diameter ~ O(log log N)

•  What is happening in real data?

PSU'10

Page 48: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 48

T.1 Evolution of the Diameter •  Prior work on Power Law graphs hints

at slowly growing diameter: –  diameter ~ O(log N) –  diameter ~ O(log log N)

•  What is happening in real data? •  Diameter shrinks over time

PSU'10

Page 49: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 49

T.1 Diameter – “Patents”

•  Patent citation network

•  25 years of data •  @1999

–  2.9 M nodes –  16.5 M edges

time [years]

diameter

PSU'10

Page 50: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 50

T.2 Temporal Evolution of the Graphs

•  N(t) … nodes at time t •  E(t) … edges at time t •  Suppose that

N(t+1) = 2 * N(t) •  Q: what is your guess for

E(t+1) =? 2 * E(t)

PSU'10

Page 51: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 51

T.2 Temporal Evolution of the Graphs

•  N(t) … nodes at time t •  E(t) … edges at time t •  Suppose that

N(t+1) = 2 * N(t) •  Q: what is your guess for

E(t+1) =? 2 * E(t) •  A: over-doubled!

– But obeying the ``Densification Power Law’’ PSU'10

Page 52: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 52

T.2 Densification – Patent Citations

•  Citations among patents granted

•  @1999 –  2.9 M nodes –  16.5 M edges

•  Each year is a datapoint

N(t)

E(t)

1.66

PSU'10

Page 53: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 53

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs – Weighted graphs – Time evolving graphs

•  Problem#2: Tools •  …

PSU'10

Page 54: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 54

More on Time-evolving graphs

M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008

PSU'10

Page 55: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 55

Observation T.3: NLCC behavior Q: How do NLCC’s emerge and join with

the GCC?

(``NLCC’’ = non-largest conn. components) – Do they continue to grow in size? –  or do they shrink? –  or stabilize?

PSU'10

Page 56: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 56

Observation T.3: NLCC behavior •  After the gelling point, the GCC takes off, but

NLCC’s remain ~constant (actually, oscillate).

IMDB

CC size

Time-stamp PSU'10

Page 57: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 57

Timing for Blogs

•  with Mary McGlohon (CMU) •  Jure Leskovec (CMU->Stanford) •  Natalie Glance (now at Google) •  Mat Hurst (now at MSR) [SDM’07]

PSU'10

Page 58: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 58

T.4 : popularity over time

Post popularity drops-off – exponentially?

lag: days after post

# in links

1 2 3

@t

@t + lag

PSU'10

Page 59: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 59

T.4 : popularity over time

Post popularity drops-off – exponentially? POWER LAW! Exponent?

# in links (log)

1 2 3 days after post (log)

PSU'10

Page 60: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 60

T.4 : popularity over time

Post popularity drops-off – exponentially? POWER LAW! Exponent? -1.6 •  close to -1.5: Barabasi’s stack model •  and like the zero-crossings of a random walk

# in links (log)

1 2 3

-1.6

days after post (log)

PSU'10

Page 61: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 61

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

– CenterPiece Subgraphs; G-Ray – OddBall (anomaly detection) – PEGASUS

•  Problem#3: Scalability •  Conclusions

PSU'10

Page 62: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

CenterPiece Subgraphs •  Hanghang TONG et al,

KDD’06

C. Faloutsos (CMU) 62 PSU'10

Page 63: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS Center-Piece Subgraph Discovery

[Tong+ KDD 06]

Original Graph

Q: Who is the most central node wrt the black nodes?

(e.g., master-mind criminal, common advisor/collaborator, etc)

Input

B

A

C

63 C. Faloutsos (CMU) PSU'10

Page 64: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

64

B

A

C B

A

C

Center-Piece Subgraph Discovery [Tong+ KDD 06]

Q: How to find hub for the query nodes?

Input: original graph Output: CePS

CePS Node

C. Faloutsos (CMU) A: Combine proximity scores (RWR) PSU'10

Page 65: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS CePS: Example (AND Query)

65

?  

C. Faloutsos (CMU) PSU'10

DBLP co-authorship network: - 400,000 authors, 2,000,000 edges

Code at: http://www.cs.cmu.edu/~htong/soft.htm

Page 66: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS CePS: Example (AND Query)

66 C. Faloutsos (CMU)

DBLP co-authorship network: - 400,000 authors, 2,000,000 edges

Code at: http://www.cs.cmu.edu/~htong/soft.htm PSU'10

Page 67: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 67

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

– CenterPiece Subgraphs; G-Ray – OddBall (anomaly detection) – PEGASUS

•  Problem#3: Scalability •  Conclusions

PSU'10

Page 68: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

Graph X-Ray: Fast Best-Effort Pattern Matching

in Large Attributed Graphs

Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad

KDD’07

Page 69: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

69

Output Input

Attributed Data Graph

Query Graph

Matching Subgraph

PSU'10 C. Faloutsos (CMU)

Page 70: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

70

Effectiveness: star-query

Query Result PSU'10 C. Faloutsos (CMU)

Page 71: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 71

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

– CenterPiece Subgraphs – OddBall (anomaly detection)

•  Problem#3: Scalability - PEGASUS

•  Conclusions

PSU'10

Page 72: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

OddBall: Spotting Anomalies in Weighted Graphs

Leman Akoglu, Mary McGlohon, Christos Faloutsos

Carnegie Mellon University School of Computer Science

To appear in PAKDD 2010, Hyderabad, India

Page 73: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

Main idea For each node, •  extract ‘ego-net’ (=1-step-away neighbors) •  Extract features (#edges, total weight, etc

etc) •  Compare with the rest of the population

C. Faloutsos (CMU) 73 PSU'10

Page 74: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS What is an egonet?

ego

74

egonet

C. Faloutsos (CMU) PSU'10

Page 75: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

Selected Features   Ni: number of neighbors (degree) of ego i   Ei: number of edges in egonet i   Wi: total weight of egonet i   λw,i: principal eigenvalue of the weighted

adjacency matrix of egonet I

75 C. Faloutsos (CMU) PSU'10

Page 76: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS Near-Clique/Star

76 PSU'10 C. Faloutsos (CMU)

Page 77: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS Near-Clique/Star

77 C. Faloutsos (CMU) PSU'10

Page 78: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 78

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

– CenterPiece Subgraphs – OddBall (anomaly detection)

•  Problem#3: Scalability -PEGASUS

•  Conclusions

PSU'10

Page 79: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 79

Centralized Hadoop/PEGASUS

Degree Distr. old old

Pagerank old old

Diameter/ANF old DONE

Conn. Comp old DONE

Triangles DONE Visualization STARTED

Outline – Algorithms & results

PSU'10

Page 80: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

HADI for diameter estimation •  Radius Plots for Mining Tera-byte Scale

Graphs U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM’10

•  Naively: diameter needs O(N**2) space and up to O(N**3) time – prohibitive (N~1B)

•  Our HADI: linear on E (~10B) – Near-linear scalability wrt # machines – Several optimizations -> 5x faster

C. Faloutsos (CMU) 80 PSU'10

Page 81: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Largest publicly available graph ever studied.

???? ??

19+? [Barabasi+] (~106 nodes)

81 C. Faloutsos (CMU)

Radius

Count

PSU'10

Page 82: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Largest publicly available graph ever studied.

????

82 C. Faloutsos (CMU)

Radius

Count

PSU'10

14 (dir.) ~7 (undir.)

19+? [Barabasi+] (~106 nodes)

Page 83: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality: probably mixture of cores .

83 C. Faloutsos (CMU) PSU'10

Page 84: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

84 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality: probably mixture of cores .

PSU'10

Page 85: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

Radius Plot of GCC of YahooWeb.

85 C. Faloutsos (CMU) PSU'10

Page 86: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

Running time - Kronecker and Erdos-Renyi Graphs with billions edges.

details

Page 87: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 87

Centralized Hadoop/PEGASUS

Degree Distr. old old

Pagerank old old

Diameter/ANF old DONE

Conn. Comp old DONE

Triangles DONE Visualization STARTED

Outline – Algorithms & results

PSU'10

Page 88: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS Generalized Iterated Matrix

Vector Multiplication (GIMV)

C. Faloutsos (CMU) 88

PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. (ICDM) 2009, Miami, Florida, USA. Best Application Paper (runner-up).

PSU'10

Page 89: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS Generalized Iterated Matrix

Vector Multiplication (GIMV)

C. Faloutsos (CMU) 89

•  PageRank •  proximity (RWR) •  Diameter •  Connected components •  (eigenvectors, •  Belief Prop. •  … )

Matrix – vector Multiplication

(iterated)

PSU'10

details

Page 90: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

90

Example: GIM-V At Work •  Connected Components

Size

Count

C. Faloutsos (CMU) PSU'10

Page 91: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

91

Example: GIM-V At Work •  Connected Components

Size

Count

C. Faloutsos (CMU) PSU'10

~0.7B singleton nodes

Page 92: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

92

Example: GIM-V At Work •  Connected Components

Size

Count

C. Faloutsos (CMU) PSU'10

Page 93: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

93

Example: GIM-V At Work •  Connected Components

Size

Count 300-size

cmpt X 500. Why? 1100-size cmpt

X 65. Why?

C. Faloutsos (CMU) PSU'10

Page 94: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

94

Example: GIM-V At Work •  Connected Components

Size

Count

suspicious financial-advice sites

(not existing now)

C. Faloutsos (CMU) PSU'10

Page 95: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

95

GIM-V At Work •  Connected Components over Time •  LinkedIn: 7.5M nodes and 58M edges

Stable tail slope after the gelling point

C. Faloutsos (CMU) PSU'10

Page 96: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 96

Centralized Hadoop/PEGASUS

Degree Distr. old old

Pagerank old old

Diameter/ANF old DONE

Conn. Comp old DONE

Triangles DONE Visualization STARTED

Outline – Algorithms & results

PSU'10

Page 97: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 97

Triangles : Computations [Tsourakakis ICDM 2008]

But: triangles are expensive to compute (3-way join; several approx. algos)

Q: Can we do that quickly? A: Yes!

#triangles = 1/6 Sum ( λi3 )

(and, because of skewness, we only need the top few eigenvalues!

Mentioned already

PSU'10

Page 98: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 98

Triangle Law: #1 [Tsourakakis ICDM 2008]

ASN HEP-TH

Epinions X-axis: # of Triangles a node participates in

Y-axis: count of such nodes

Mentioned already

PSU'10

Page 99: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 99

Centralized Hadoop/PEGASUS

Degree Distr. old old

Pagerank old old

Diameter/ANF old DONE

Conn. Comp old DONE

Triangles DONE Visualization STARTED

Outline – Algorithms & results

PSU'10

Page 100: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

Visualization: ShiftR •  Supporting Ad Hoc Sensemaking:

Integrating Cognitive, HCI, and Data Mining Approaches Aniket Kittur, Duen Horng (‘Polo’) Chau, Christos Faloutsos, Jason I. Hong Sensemaking Workshop at CHI 2009, April 4-5. Boston, MA, USA.

C. Faloutsos (CMU) 100 PSU'10

Page 101: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

101 C. Faloutsos (CMU) PSU'10

Page 102: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 102

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools •  Problem#3: Scalability •  (additional topics, skipped) •  Conclusions

PSU'10

Page 103: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 103

Other topics - part#1 - tools

•  Community detection – how many? – Cross-Associations [Chakrabarti +, KDD 2004]

•  Time-evolving graphs – Tensors [Sun+, KDD’06], –  [Kolda+ ICDM’05] – GraphScope [Sun+, KDD’07]

•  Graph compression – CUR decomposition [Sun+ SDM’07]

PSU'10

Page 104: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 104

Other topics - part#1 - tools

•  Community detection – how many? – Cross-Associations [Chakrabarti +, KDD 2004]

•  Time-evolving graphs – Tensors [Sun+, KDD’06], –  [Kolda+ ICDM’05] – GraphScope [Sun+, KDD’07]

•  Graph compression – CUR decomposition [Sun+ SDM’07]

PSU'10

Page 105: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

105

Tensors

•  Adjacency matrices, stacked (over time, and/or edge-type – ‘composite networks’)

keyword

1990

Author PSU'10 C. Faloutsos (CMU)

Page 106: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

106

Tensors

•  Adjacency matrices, stacked (over time, and/or edge-type – ‘composite networks’)

keyword

1991 1992

1990

Author PSU'10 C. Faloutsos (CMU)

Page 107: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

107

Tensors

•  Adjacency matrices, stacked (over time, and/or edge-type – ‘composite networks’)

~ +

PARAFAC tensor decomposition (generalization of SVD)

keyword

1991 1992

1990

Author PSU'10 C. Faloutsos (CMU)

Page 108: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 108

Other topics – part#2 - generators

•  Kronecker [PKDD’05]; •  Random Typing [Akoglu+, PKDD’09]

PSU'10

Page 109: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

PSU'10 C. Faloutsos (CMU) 109

Kronecker Product – a Graph •  One of most realistic generators, with provable

properties

Page 110: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 110

Other topics - part#3 – virus propagation

•  Epidemic threshold for SIS: depends only on first eigenvalue of adjacency matrix

•  [Chakrabarti+, TISSEC’07]

PSU'10

Page 111: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 111

Other topics - part#3 – virus propagation

•  Ditto for epidemic threshold for – SIR (mumps – lifetime immunity) – SEIR (incubation) – MSEIR (temp. immunity by birth) – S I1 I2 R (HIV)

•  In all cases, the epid. threshold depends on

•  [B.A. Prakash ++, 2010] •  http://arxiv.org/abs/1004.0060 PSU'10

Page 112: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 112

Other topics - part#3 – virus propagation

•  Immunization policies [Tong+, under review]

•  Drinking water sensor placement [KDD’07]

PSU'10

Page 113: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

More info Tutorial on graph mining: KDD’09 (w/ Gary Miller and C. Tsourakakis) www.cs.cmu.edu/~christos/TALKS/09-KDD-tutorial/

Tutorial on tensors: SIGMOD’07 (w/ T. Kolda and J. Sun): www.cs.cmu.edu/~christos/TALKS/SIGMOD-07-tutorial/

PSU'10 C. Faloutsos (CMU) 113

Page 114: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 114

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools •  Problem#3: Scalability •  (additional topics, skipped) •  Conclusions

PSU'10

Page 115: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 115

OVERALL CONCLUSIONS – low level:

•  Several new patterns (fortification, triangle-laws, conn. components, etc)

•  New tools: – CenterPiece Subgraphs, G-Ray, anomaly

detection (OddBall)

•  Scalability: PEGASUS / hadoop

PSU'10

Page 116: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 116

OVERALL CONCLUSIONS – high level

•  Large datasets may reveal patterns/outliers that would be invisible otherwise

•  Terrific opportunities – Large datasets, easily(*) available PLUS

–  s/w and h/w developments

•  Promising collaborations between DB/Sys, AI/Stat, sociology, marketing, epidemiology, ++

PSU'10

Page 117: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 117

References •  Leman Akoglu, Christos Faloutsos: RTG: A Recursive

Realistic Graph Generator Using Random Typing. ECML/PKDD (1) 2009: 13-28

•  Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1): (2006)

PSU'10

Page 118: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 118

References •  Deepayan Chakrabarti, Yang Wang, Chenxi Wang,

Jure Leskovec, Christos Faloutsos: Epidemic thresholds in real networks. ACM Trans. Inf. Syst. Secur. 10(4): (2008)

•  Deepayan Chakrabarti, Jure Leskovec, Christos Faloutsos, Samuel Madden, Carlos Guestrin, Michalis Faloutsos: Information Survival Threshold in Sensor and P2P Networks. INFOCOM 2007: 1316-1324

PSU'10

Page 119: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 119

References •  Christos Faloutsos, Tamara G. Kolda, Jimeng Sun:

Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174

PSU'10

Page 120: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 120

References •  T. G. Kolda and J. Sun. Scalable Tensor

Decompositions for Multi-aspect Data Mining. In: ICDM 2008, pp. 363-372, December 2008.

PSU'10

Page 121: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 121

References •  Jure Leskovec, Jon Kleinberg and Christos Faloutsos

Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005 (Best Research paper award).

•  Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, Christos Faloutsos: Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. PKDD 2005: 133-145

PSU'10

Page 122: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

References •  B. Aditya Prakash, Deepayan Chakrabarti,

Michalis Faloutsos, Nicholas Valler, Christos Faloutsos: Got the Flu (or Mumps)? Check the Eigenvalue! Apr 2010 arXiv:1004.0060v1

PSU'10 C. Faloutsos (CMU) 122

Page 123: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 123

References •  Jimeng Sun, Yinglian Xie, Hui Zhang, Christos

Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM, Minneapolis, Minnesota, Apr 2007.

•  Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos, GraphScope: Parameter-free Mining of Large Time-evolving Graphs ACM SIGKDD Conference, San Jose, CA, August 2007

PSU'10

Page 124: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

References •  Jimeng Sun, Dacheng Tao, Christos

Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006: 374-383

PSU'10 C. Faloutsos (CMU) 124

Page 125: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 125

References •  Hanghang Tong, Christos Faloutsos, and

Jia-Yu Pan, Fast Random Walk with Restart and Its Applications, ICDM 2006, Hong Kong.

•  Hanghang Tong, Christos Faloutsos, Center-Piece Subgraphs: Problem Definition and Fast Solutions, KDD 2006, Philadelphia, PA

PSU'10

Page 126: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 126

References •  Hanghang Tong, Christos Faloutsos, Brian

Gallagher, Tina Eliassi-Rad: Fast best-effort pattern matching in large attributed graphs. KDD 2007: 737-746

PSU'10

Page 127: Mining Billion-node Graphs: Patterns, Generators …christos/TALKS/10-PSU/FOILS/faloutsosPSU...CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food

CMU SCS

C. Faloutsos (CMU) 127

Project info www.cs.cmu.edu/~pegasus

Akoglu, Leman

Chau, Polo

Kang, U

McGlohon, Mary

Tsourakakis, Babis

Tong, Hanghang

Prakash, Aditya

PSU'10

Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, INTEL, HP