fast graphlet decomposition: theory, algorithms, and applications

47

Upload: nesreen-k-ahmed

Post on 11-Feb-2017

183 views

Category:

Data & Analytics


5 download

TRANSCRIPT

Page 1: Fast Graphlet Decomposition: Theory, Algorithms, and Applications
Page 2: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Jennifer'NevillePurdue&University

Ryan'A.'RossiPARC

Nick'DuffieldTexas&A&M&University

Ted'WillkeIntel&Research&Labs

Page 3: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Social'Network Internet'(AS)

BiologicalPolitical'Blogs

Graph Mining

Page 4: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Network(Motifs:(Simple(Building(Blocks(of(Complex(Networks(– [Milo&et.&al&– Science&2002]The(Structure(and(Function(of(Complex(Networks(– [Newman&– Siam&Review&2003]

2"node'Graphlets

3"node'Graphlets

4"node'Graphlets

Connected(

Disconnected

Page 5: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! Small&k9vertex&induced&subgraphs

Network(Motifs:(Simple(Building(Blocks(of(Complex(Networks(– [Milo&et.&al&– Science&2002]The(Structure(and(Function(of(Complex(Networks(– [Newman&– Siam&Review&2003]

2"node'Graphlets

3"node'Graphlets

4"node'Graphlets

Connected(

Disconnected

Page 6: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! Small&k9vertex&induced&subgraphs

! Motifs:&Occur& in&real9world&networks&with&frequencies&significantly(higher than&randomly&generated&networks

Network(Motifs:(Simple(Building(Blocks(of(Complex(Networks(– [Milo&et.&al&– Science&2002]The(Structure(and(Function(of(Complex(Networks(– [Newman&– Siam&Review&2003]

2"node'Graphlets

3"node'Graphlets

4"node'Graphlets

Connected(

Disconnected

Page 7: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! Small&k9vertex&induced&subgraphs

! Motifs:&Occur& in&real9world&networks&with&frequencies&significantly(higher than&randomly&generated&networks

! Applied&to&food&web,&genetic,& neural,&web,&and&other&networks• Found&distinct&graphlets in&each&case

Network(Motifs:(Simple(Building(Blocks(of(Complex(Networks(– [Milo&et.&al&– Science&2002]The(Structure(and(Function(of(Complex(Networks(– [Newman&– Siam&Review&2003]

2"node'Graphlets

3"node'Graphlets

4"node'Graphlets

Connected(

Disconnected

Page 8: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

AISTATS'2009

Page 9: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

AISTATS'2009

Bioinformatics'2006

Page 10: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

99999

! Biological&Networks&• network&alignment,&protein&function&prediction

! Social&Networks&• triad&analysis,&community&detection,&Exp.&Random&Models

! Computer&Networks

! Internet&AS

! Cyber&Security&• spam&detection

! Ecology

.(.(.

Page 11: Fast Graphlet Decomposition: Theory, Algorithms, and Applications
Page 12: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Ex:(Given(an(input(graph(G

9 How%many%triangles% in%G?9 How%many%cliques%of%size%49nodes%in%G?9 How%many%cycles%of%size%49nodes% in%G?

Page 13: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Ex:(Given(an(input(graph(G

9 How%many%triangles% in%G?9 How%many%cliques%of%size%49nodes%in%G?9 How%many%cycles%of%size%49nodes% in%G?

" In%practice,%we%would%like%to%count%all%k9vertex%graphlets

Page 14: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! Enumerate& all&possible&graphlets

Page 15: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! Enumerate& all&possible&graphlets

" Exhaustive%enumeration% is%too%expensive%

Page 16: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! Enumerate& all&possible&graphlets

" Exhaustive%enumeration% is%too%expensive%

! Count&graphlets for&each&node&– and&combine&all&node&counts

[Shervashidze et.%al%– AISTAT%2009]%

Page 17: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! Enumerate& all&possible&graphlets

" Exhaustive%enumeration% is%too%expensive%

! Count&graphlets for&each&node&– and&combine&all&node&counts

" Still%expensive% for%relatively% large%k%%%[Shervashidze et.%al%– AISTAT%2009]%

Page 18: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! Enumerate& all&possible&graphlets

" Exhaustive%enumeration% is%too%expensive%

! Count&graphlets for&each&node&– and&combine&all&node&counts

" Still%expensive% for%relatively% large%k% [Shervashidze et.%al%– AISTAT%2009]%

! Other&recent&work&counts&only&connected& graphlets of&size&k=4

[Marcus%&%Shavitt – Computer%Networks%2012]%

Not(practical(– scales&only&for&small&graphs&with&few&hundred/thousand&nodes/edges9 taking%2400%secs for%a%graph%with%26K%nodes

Page 19: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Most&work&focused&on&graphlets of&k=3&nodes&

In&this&work,&we&focus&on&graphlets of&k=3,4&nodes

Efficient%Graphlet Counting%for%Large%Networks%%[Ahmed%et%al.,%ICDM%2015]

Graphlet Decomposition:% Framework,%Algorithms,%and%Applications[Ahmed%et%al.,%KAIS%Journal%2016% (to%appear)]

Page 20: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Searching(Edge(Neighborhoods

① For(each(edge(do

u v

v2 v3v1 v4

v6 v7

edge

Page 21: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Searching(Edge(Neighborhoods

① For(each(edge(do

• Count(All(3<node(graphlets

② Merge(counts(from(all(edges

u v

v2 v3v1 v4

v6 v7

edge

Triangle 2<star 1<edge Independent(

Page 22: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Searching(Edge(Neighborhoods

① For(each(edge(do

• Count(All(3<node(graphlets

② Merge(counts(from(all(edges

u v

v2 v3v1 V4

v6 v7

edge

Triangle 2<star 1<edge Independent(

# We(only(need(to(find/count(triangles

# Use(equations to(get(counts(of(others(in(o(1) Triangle

Page 23: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Edge"centric,'Parallel,'Memory"efficient'Framework'

Page 24: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

How to count all 4-node graphlets?

4<Clique 4<Cycle4<Chrodal<Cycle Tailed<triangle 4<Path 3<Star

4<node<triangle 4<node<2star 4<node<2edge 4<node<1edge Independent(

Page 25: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Step(1 Step(2 Step(3Searching(Edge(Neighborhoods

For%each%edge%Find%the%triangles

Count(4Gnode(graphlets

For%each%edge%Count%49node%cliques%and%49node% cycles only

Count(4Gnode(graphlets

For%each%edge%Use%combinatorial% %%

relationships%to%compute%counts%of%other%graphlets

in%constant(time

Step(4 Merge(counts(from(all(edges(

Page 26: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

± 1&edge4<Node(Graphlet Transition(Diagram(

Page 27: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

4<Node(Graphlet Transition(Diagram(± 1&edge

Count(Cliques(&(Cycles(ONLY

Use(relationships(&(transitions(to(count(all(other(graphlets in(constant(time

4<Cliques

4<Cycles

Maximum&no.&triangles&Incident&to&an&edge

Maximum&no.&starsIncident&to&an&edge

Page 28: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

T T

Relationship(between(4<cliques(&(4<ChordalCycles4<Cliques 4<ChordalCycle

e

T T

e

No.&49ChordalCycles No.&&49Cliques

Proof'in'Lemma'1'" Ahmed'et'al.,'ICDM'2015

Page 29: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

T T

Relationship(between(4<cliques(&(4<ChordalCycles

T T

No.&49ChordalCycles No.&&49Cliques

4<Cliques 4<ChordalCyclee e

Proof'in'Lemma'1'" Ahmed'et'al.,'ICDM'2015

Page 30: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Experiments & Results

Page 31: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! Shared&Memory&Implementation

! Tested&on&graphs&with&over&a&billion&edges

! Largest&systematic&investigation&on&300+&networks• Social,&web,&technological,& biological,&co9authorship,& infrastructure…&

• Facebook&100&networks&from&a&variety&of&US&schools

• Dense&graphs& from&the&DIMACS&challenge&

• Large&collections&of&biological&and&chemical&graphs

Details'in'the'paperData/code'online

Page 32: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Comparison(to(RAGE( [Marcus'&'Shavitt – J.'Computer'Networks'2011]'''Facebook100 Networks from US Schools

Ours RAGETime-in-Seconds

Page 33: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

|V| |E| Ours RAGETime-in-Seconds

Baseline% (RAGE)%did%not%finish% for%most%graphs

We'take'~45'mins for'socSorkut (117M'edges)

We'take'~40'secs for'caSdblp (15M'edges)

Most'graphlet counts'in'orders'of'106'– 1015

Page 34: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

|V| |E| Ours RAGETime-in-Seconds

Baseline% (RAGE)%did%not%finish% for%most%graphs

We'take'~4.5'secs for'webSgoogle (4.3M'edges)

We'take'~4'secs for'infSroadSusa (29M'edges)

Most'graphlet counts'in'orders'of'106'– 1015

Page 35: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

0 1 2 4 8 160

5

10

15

Number of Processing Units

Sp

ee

dup

socfb−Texassocfb−ORsocfb−UCLAsocfb−Berkeley13socfb−MITsocfb−Penn94

0 1 2 4 8 160

5

10

15

Number of Processing Units

Sp

ee

dup

0 1 2 4 8 160

2

4

6

8

10

12

14

Number of Processing UnitsS

pee

dup

tech−internet−astech−WHOISweb−it−2004web−spam

0 1 2 4 8 160

2

4

6

8

10

12

14

Number of Processing UnitsS

pee

dup

Strong'scaling'results

Intel%Xeon%3.10%Ghz E592687W%server,%16%cores

Page 36: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Applications

Page 37: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Label'1

Label'0

Enzyme

NonSEnzyme

Collection'of'Graphs(e.g.'Protein'Graphs)

.

.

.

Graphs&

Each%Protein%is%represented%by%a%graph

Binary%label%represents%the%function%of%the%protein

Page 38: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Label'1

Label'0

Enzyme

NonSEnzyme

?

?...

Graphs&

?

?

?

Collection'of'Graphs(e.g.'Protein'Graphs)

Assume%we%know%the%labels%of%a%few%graphs

How%to%predict%the%labels%of%the%unlabeled%graphs?

Page 39: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Features

GraphsGraphlet

Feature&Extraction

ModelLearning

Predict&Labels&of&Unlabeled&Graphs

Label'1

Label'0

?

?

?

?...

Graphs&

?

?

?

Protein'Graphs

Page 40: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! D&D&– 1178&protein&graphs.&Binary&labeled&as&Enzymes&vs.&Non.&Enzymes

! MUTAG&– 188&mutagenic&compounds. Binary&labeled&(whether&or&not&they&have&a&mutagenic&effect&on&the&Gram9negative&bacterium)

! 109fold&validation,&Support&Vector&Machine

! Used&2,3,4&node&graphlets as&features

Page 41: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Previous)work: in)machine)learning)&)biological)networks)Shervashidze et.al [AISTATS'2009]

Feature'Extraction'Time:D&D' 2'hours,'45'minsMUTAG 4.73'secs

Page 42: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Ranking'by'graphlet counts

Links'are'colored/weighted'by'stars'of'size'4'nodes

Nodes'are'colored/weighted'by'triangle'counts

Leukemia

Colon(cancer

Deafness

Page 43: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! Local&Graphlet Decomposition

Role'discovery,'relational'learning,'multi"label'classification

Page 44: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! Unbiased&Estimation&of&Graphlet Counts

104

105

0.85

0.9

0.95

1

1.05

1.1

1.15

soc−orkut−dir

Sample Size10

410

5

0.85

0.9

0.95

1

1.05

1.1

1.15

soc−orkut−dir

Sample Size

x/y

104

105

0.9

0.95

1

1.05

1.1

1.15

soc−flickr

Sample Size10

410

5

0.9

0.95

1

1.05

1.1

1.15

soc−flickr

Sample Size

Estimation'of'counts'of'4"vertex'clique

Page 45: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

! Framework&&&Algorithms&• One&of&the&first&parallel&approaches& for&graphlet counting

• On&average&460x&faster&than&current&methods

• Edge9centric& computations&(only&requires&access&to&edge&neighborhood)

• Time&and&space9efficient

• Sampling/estimation&methods&

• Local/global& counting

! Applications• Large<scale graph&comparison,& classification,&and&anomaly&detection

• Visual&analytics&and&real<time graphlet mining

Page 46: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Codehttp://nesreenahmed.com/graphletshttps://github.com/nkahmed/PGD

Datahttp://networkrepository.com

" Email%us%for%questions%

Page 47: Fast Graphlet Decomposition: Theory, Algorithms, and Applications

Thank&you!

Questions?

[email protected]

http://nesreenahmed.com