fast graphlet decomposition: theory, algorithms, and applications
TRANSCRIPT
Jennifer'NevillePurdue&University
Ryan'A.'RossiPARC
Nick'DuffieldTexas&A&M&University
Ted'WillkeIntel&Research&Labs
Social'Network Internet'(AS)
BiologicalPolitical'Blogs
Graph Mining
Network(Motifs:(Simple(Building(Blocks(of(Complex(Networks(– [Milo&et.&al&– Science&2002]The(Structure(and(Function(of(Complex(Networks(– [Newman&– Siam&Review&2003]
2"node'Graphlets
3"node'Graphlets
4"node'Graphlets
Connected(
Disconnected
! Small&k9vertex&induced&subgraphs
Network(Motifs:(Simple(Building(Blocks(of(Complex(Networks(– [Milo&et.&al&– Science&2002]The(Structure(and(Function(of(Complex(Networks(– [Newman&– Siam&Review&2003]
2"node'Graphlets
3"node'Graphlets
4"node'Graphlets
Connected(
Disconnected
! Small&k9vertex&induced&subgraphs
! Motifs:&Occur& in&real9world&networks&with&frequencies&significantly(higher than&randomly&generated&networks
Network(Motifs:(Simple(Building(Blocks(of(Complex(Networks(– [Milo&et.&al&– Science&2002]The(Structure(and(Function(of(Complex(Networks(– [Newman&– Siam&Review&2003]
2"node'Graphlets
3"node'Graphlets
4"node'Graphlets
Connected(
Disconnected
! Small&k9vertex&induced&subgraphs
! Motifs:&Occur& in&real9world&networks&with&frequencies&significantly(higher than&randomly&generated&networks
! Applied&to&food&web,&genetic,& neural,&web,&and&other&networks• Found&distinct&graphlets in&each&case
Network(Motifs:(Simple(Building(Blocks(of(Complex(Networks(– [Milo&et.&al&– Science&2002]The(Structure(and(Function(of(Complex(Networks(– [Newman&– Siam&Review&2003]
2"node'Graphlets
3"node'Graphlets
4"node'Graphlets
Connected(
Disconnected
AISTATS'2009
AISTATS'2009
Bioinformatics'2006
99999
! Biological&Networks&• network&alignment,&protein&function&prediction
! Social&Networks&• triad&analysis,&community&detection,&Exp.&Random&Models
! Computer&Networks
! Internet&AS
! Cyber&Security&• spam&detection
! Ecology
.(.(.
Ex:(Given(an(input(graph(G
9 How%many%triangles% in%G?9 How%many%cliques%of%size%49nodes%in%G?9 How%many%cycles%of%size%49nodes% in%G?
Ex:(Given(an(input(graph(G
9 How%many%triangles% in%G?9 How%many%cliques%of%size%49nodes%in%G?9 How%many%cycles%of%size%49nodes% in%G?
" In%practice,%we%would%like%to%count%all%k9vertex%graphlets
! Enumerate& all&possible&graphlets
! Enumerate& all&possible&graphlets
" Exhaustive%enumeration% is%too%expensive%
! Enumerate& all&possible&graphlets
" Exhaustive%enumeration% is%too%expensive%
! Count&graphlets for&each&node&– and&combine&all&node&counts
[Shervashidze et.%al%– AISTAT%2009]%
! Enumerate& all&possible&graphlets
" Exhaustive%enumeration% is%too%expensive%
! Count&graphlets for&each&node&– and&combine&all&node&counts
" Still%expensive% for%relatively% large%k%%%[Shervashidze et.%al%– AISTAT%2009]%
! Enumerate& all&possible&graphlets
" Exhaustive%enumeration% is%too%expensive%
! Count&graphlets for&each&node&– and&combine&all&node&counts
" Still%expensive% for%relatively% large%k% [Shervashidze et.%al%– AISTAT%2009]%
! Other&recent&work&counts&only&connected& graphlets of&size&k=4
[Marcus%&%Shavitt – Computer%Networks%2012]%
Not(practical(– scales&only&for&small&graphs&with&few&hundred/thousand&nodes/edges9 taking%2400%secs for%a%graph%with%26K%nodes
Most&work&focused&on&graphlets of&k=3&nodes&
In&this&work,&we&focus&on&graphlets of&k=3,4&nodes
Efficient%Graphlet Counting%for%Large%Networks%%[Ahmed%et%al.,%ICDM%2015]
Graphlet Decomposition:% Framework,%Algorithms,%and%Applications[Ahmed%et%al.,%KAIS%Journal%2016% (to%appear)]
Searching(Edge(Neighborhoods
① For(each(edge(do
u v
v2 v3v1 v4
v6 v7
edge
Searching(Edge(Neighborhoods
① For(each(edge(do
• Count(All(3<node(graphlets
② Merge(counts(from(all(edges
u v
v2 v3v1 v4
v6 v7
edge
Triangle 2<star 1<edge Independent(
Searching(Edge(Neighborhoods
① For(each(edge(do
• Count(All(3<node(graphlets
② Merge(counts(from(all(edges
u v
v2 v3v1 V4
v6 v7
edge
Triangle 2<star 1<edge Independent(
# We(only(need(to(find/count(triangles
# Use(equations to(get(counts(of(others(in(o(1) Triangle
Edge"centric,'Parallel,'Memory"efficient'Framework'
How to count all 4-node graphlets?
4<Clique 4<Cycle4<Chrodal<Cycle Tailed<triangle 4<Path 3<Star
4<node<triangle 4<node<2star 4<node<2edge 4<node<1edge Independent(
Step(1 Step(2 Step(3Searching(Edge(Neighborhoods
For%each%edge%Find%the%triangles
Count(4Gnode(graphlets
For%each%edge%Count%49node%cliques%and%49node% cycles only
Count(4Gnode(graphlets
For%each%edge%Use%combinatorial% %%
relationships%to%compute%counts%of%other%graphlets
in%constant(time
Step(4 Merge(counts(from(all(edges(
± 1&edge4<Node(Graphlet Transition(Diagram(
4<Node(Graphlet Transition(Diagram(± 1&edge
Count(Cliques(&(Cycles(ONLY
Use(relationships(&(transitions(to(count(all(other(graphlets in(constant(time
4<Cliques
4<Cycles
Maximum&no.&triangles&Incident&to&an&edge
Maximum&no.&starsIncident&to&an&edge
T T
Relationship(between(4<cliques(&(4<ChordalCycles4<Cliques 4<ChordalCycle
e
T T
e
No.&49ChordalCycles No.&&49Cliques
Proof'in'Lemma'1'" Ahmed'et'al.,'ICDM'2015
T T
Relationship(between(4<cliques(&(4<ChordalCycles
T T
No.&49ChordalCycles No.&&49Cliques
4<Cliques 4<ChordalCyclee e
Proof'in'Lemma'1'" Ahmed'et'al.,'ICDM'2015
Experiments & Results
! Shared&Memory&Implementation
! Tested&on&graphs&with&over&a&billion&edges
! Largest&systematic&investigation&on&300+&networks• Social,&web,&technological,& biological,&co9authorship,& infrastructure…&
• Facebook&100&networks&from&a&variety&of&US&schools
• Dense&graphs& from&the&DIMACS&challenge&
• Large&collections&of&biological&and&chemical&graphs
Details'in'the'paperData/code'online
Comparison(to(RAGE( [Marcus'&'Shavitt – J.'Computer'Networks'2011]'''Facebook100 Networks from US Schools
Ours RAGETime-in-Seconds
|V| |E| Ours RAGETime-in-Seconds
Baseline% (RAGE)%did%not%finish% for%most%graphs
We'take'~45'mins for'socSorkut (117M'edges)
We'take'~40'secs for'caSdblp (15M'edges)
Most'graphlet counts'in'orders'of'106'– 1015
|V| |E| Ours RAGETime-in-Seconds
Baseline% (RAGE)%did%not%finish% for%most%graphs
We'take'~4.5'secs for'webSgoogle (4.3M'edges)
We'take'~4'secs for'infSroadSusa (29M'edges)
Most'graphlet counts'in'orders'of'106'– 1015
0 1 2 4 8 160
5
10
15
Number of Processing Units
Sp
ee
dup
socfb−Texassocfb−ORsocfb−UCLAsocfb−Berkeley13socfb−MITsocfb−Penn94
0 1 2 4 8 160
5
10
15
Number of Processing Units
Sp
ee
dup
0 1 2 4 8 160
2
4
6
8
10
12
14
Number of Processing UnitsS
pee
dup
tech−internet−astech−WHOISweb−it−2004web−spam
0 1 2 4 8 160
2
4
6
8
10
12
14
Number of Processing UnitsS
pee
dup
Strong'scaling'results
Intel%Xeon%3.10%Ghz E592687W%server,%16%cores
Applications
Label'1
Label'0
Enzyme
NonSEnzyme
Collection'of'Graphs(e.g.'Protein'Graphs)
.
.
.
Graphs&
Each%Protein%is%represented%by%a%graph
Binary%label%represents%the%function%of%the%protein
Label'1
Label'0
Enzyme
NonSEnzyme
?
?...
Graphs&
?
?
?
Collection'of'Graphs(e.g.'Protein'Graphs)
Assume%we%know%the%labels%of%a%few%graphs
How%to%predict%the%labels%of%the%unlabeled%graphs?
Features
GraphsGraphlet
Feature&Extraction
ModelLearning
Predict&Labels&of&Unlabeled&Graphs
Label'1
Label'0
?
?
?
?...
Graphs&
?
?
?
Protein'Graphs
! D&D&– 1178&protein&graphs.&Binary&labeled&as&Enzymes&vs.&Non.&Enzymes
! MUTAG&– 188&mutagenic&compounds. Binary&labeled&(whether&or¬&they&have&a&mutagenic&effect&on&the&Gram9negative&bacterium)
! 109fold&validation,&Support&Vector&Machine
! Used&2,3,4&node&graphlets as&features
Previous)work: in)machine)learning)&)biological)networks)Shervashidze et.al [AISTATS'2009]
Feature'Extraction'Time:D&D' 2'hours,'45'minsMUTAG 4.73'secs
Ranking'by'graphlet counts
Links'are'colored/weighted'by'stars'of'size'4'nodes
Nodes'are'colored/weighted'by'triangle'counts
Leukemia
Colon(cancer
Deafness
! Local&Graphlet Decomposition
Role'discovery,'relational'learning,'multi"label'classification
! Unbiased&Estimation&of&Graphlet Counts
104
105
0.85
0.9
0.95
1
1.05
1.1
1.15
soc−orkut−dir
Sample Size10
410
5
0.85
0.9
0.95
1
1.05
1.1
1.15
soc−orkut−dir
Sample Size
x/y
104
105
0.9
0.95
1
1.05
1.1
1.15
soc−flickr
Sample Size10
410
5
0.9
0.95
1
1.05
1.1
1.15
soc−flickr
Sample Size
Estimation'of'counts'of'4"vertex'clique
! Framework&&&Algorithms&• One&of&the&first¶llel&approaches& for&graphlet counting
• On&average&460x&faster&than¤t&methods
• Edge9centric& computations&(only&requires&access&to&edge&neighborhood)
• Time&and&space9efficient
• Sampling/estimation&methods&
• Local/global& counting
! Applications• Large<scale graph&comparison,& classification,&and&anomaly&detection
• Visual&analytics&and&real<time graphlet mining
Codehttp://nesreenahmed.com/graphletshttps://github.com/nkahmed/PGD
Datahttp://networkrepository.com
" Email%us%for%questions%