Scale-free networks are rare
Anna Broido (1) and Aaron Clauset (2, 3)
1. Applied Math Dept., CU Boulder
2. Computer Science Dept. & BioFrontiers Institute, CU Boulder
3. External Faculty, Santa Fe Institute
21 June 2017
© 2017 Anna Broido & Aaron Clauset
18 years of scale-free networks
A classic idea and network growth model in network science
• claim: scale-free networks are everywhere
But definitions are inconsistent across the literature:
• a network built by preferential attachment
• a network with a power-law (PL) degree distribution, $p(k) \propto k^{-\alpha}$
• a PL with $2 < \alpha < 3$
• a PL in the upper tail only
• PL merely a better model than alternatives
Empirically and theoretically controversial
No broad evaluation of their existence
"All" the network data
A large, diverse corpus of real-world networks:
• extracted from the Index of Complex Networks (icon.colorado.edu)
• 927 real-world network datasets
• 5 domains: Biological, Information, Social, Technological, Transportation
• diverse sizes: $10^2$ to $10^6$ vertices
"All" the network data… is complicated
The scale-free property is defined for simple graphs
• but real-world networks may not be simple: directed, weighted, bipartite, multiplex, and multigraph
Solution
• convert input data set → (multiple) simple degree sequences
• for example:
    Directed? YES → in-degree, out-degree, both (3 simple seqs.)
    Directed? NO → degree (1 simple seq.)
• we implement a consistent set of conversion rules that guarantee sparse and simple graphs
• some data sets produce many simple graphs
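The directed example above can be sketched in a few lines of Python. This is only an illustration, not the authors' conversion rules: the function name `simple_degree_sequences` is hypothetical, and reading "both" as the degree in the undirected projection of the graph is an assumption.

```python
from collections import Counter

def simple_degree_sequences(edges, directed):
    """Return the simple degree sequence(s) of an edge list (hypothetical helper)."""
    # Drop self-loops so the resulting graph is simple.
    edges = [(u, v) for u, v in edges if u != v]
    # Undirected projection: repeated and reciprocal edges collapse into one.
    undirected = {frozenset(e) for e in edges}
    nodes = sorted({n for e in undirected for n in e})
    degree = Counter(n for e in undirected for n in e)
    if not directed:
        return {"degree": [degree[n] for n in nodes]}
    # Directed case: collapse repeated arcs, then count heads and tails.
    arcs = set(edges)
    out_deg = Counter(u for u, _ in arcs)
    in_deg = Counter(v for _, v in arcs)
    return {
        "in": [in_deg[n] for n in nodes],
        "out": [out_deg[n] for n in nodes],
        "both": [degree[n] for n in nodes],  # assumption: undirected projection
    }

edges = [(1, 2), (2, 1), (1, 3), (1, 3), (3, 3)]
print(simple_degree_sequences(edges, directed=True))
print(simple_degree_sequences(edges, directed=False))
```

A directed input yields three sequences and an undirected input yields one, matching the counts on the slide.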
Testing the scale-free property
For each degree sequence $\vec{k}^{(i)}$ from a data set:
• estimate PL model $\{\hat{x}_{\min}, \hat{\alpha}\}$, compute $n_{\text{tail}}$, and $p$-value for KS hypothesis test
• estimate alternative distributions: exponential, log-normal, Weibull, and PL + exponential cut-off
• perform likelihood-ratio test (LRT) against alternatives
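The first step above can be illustrated with a minimal Python sketch. It makes simplifying assumptions: it treats the data as continuous (degree data are discrete, so a real analysis would use the discrete power law), it takes the lower cutoff as given instead of estimating it, and it omits the bootstrap that turns the KS distance into a $p$-value. The function names are hypothetical and this is not the authors' code.

```python
import math
import random

def fit_alpha(xs, xmin):
    """Maximum-likelihood exponent of a continuous power law on x >= xmin."""
    tail = [x for x in xs if x >= xmin]
    n_tail = len(tail)
    alpha = 1.0 + n_tail / sum(math.log(x / xmin) for x in tail)
    return alpha, n_tail

def ks_distance(xs, xmin, alpha):
    """Largest gap between the empirical tail CDF and the fitted power-law CDF."""
    tail = sorted(x for x in xs if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare against the empirical CDF just before and just after x.
        d = max(d, abs(model - i / n), abs(model - (i + 1) / n))
    return d

def sample_power_law(n, xmin, alpha, seed=0):
    """Inverse-transform samples from a continuous power law (synthetic data)."""
    rng = random.Random(seed)
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

xs = sample_power_law(20000, xmin=1.0, alpha=2.5, seed=1)
alpha_hat, n_tail = fit_alpha(xs, xmin=1.0)
print(alpha_hat, n_tail, ks_distance(xs, 1.0, alpha_hat))
```

On synthetic power-law data the estimated exponent should land close to the true value and the KS distance should be small; on data that are not power-law distributed, the KS distance is what the hypothesis test would flag.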
SIAM REVIEW, Vol. 51, No. 4, pp. 661–703. © 2009 Society for Industrial and Applied Mathematics
Power-Law Distributions in Empirical Data∗
Aaron Clauset†
Cosma Rohilla Shalizi‡
M. E. J. Newman§
Abstract. Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution, the part of the distribution representing large but rare events, and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.
Key words. power-law distributions, Pareto, Zipf, maximum likelihood, heavy-tailed distributions, likelihood ratio test, model selection
AMS subject classifications. 62-07, 62P99, 65C05, 62F99
DOI. 10.1137/070710111
1. Introduction. Many empirical quantities cluster around a typical value. The speeds of cars on a highway, the weights of apples in a store, air pressure, sea level, the temperature in New York at noon on a midsummer’s day: all of these things vary somewhat, but their distributions place a negligible amount of probability far from the typical value, making the typical value representative of most observations. For instance, it is a useful statement to say that an adult male American is about 180 cm tall because no one deviates very far from this height. Even the largest deviations, which are exceptionally rare, are still only about a factor of two from the mean in
∗Received by the editors December 2, 2007; accepted for publication (in revised form) February 2, 2009; published electronically November 6, 2009. This work was supported in part by the Santa Fe Institute (AC) and by grants from the James S. McDonnell Foundation (CRS and MEJN) and the National Science Foundation (MEJN). http://www.siam.org/journals/sirev/51-4/71011.html
†Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, and Department of Computer Science, University of New Mexico, Albuquerque, NM 87131.
‡Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213.
§Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109.
Testing the scale-free property
Notation: degree sequence $\vec{k}^{(i)}$, fitted PL model $\{\hat{x}_{\min}, \hat{\alpha}\}$, KS $p$-value $p$, tail size $n_{\text{tail}}$.
Nested categories of evidence, each adding its requirement to those before it:
• Weakest: $p > 0.1$ for 50% of graphs
• Weak: $n_{\text{tail}} > 50$ for 50% of graphs
• Strong: no alternative distributions favored over power-law for 50% of graphs
• Strongest: $p > 0.1$ and $n_{\text{tail}} > 50$ for 90% and no alternatives favored for 95% of graphs
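A small Python sketch shows how such nested categories could be assigned to one data set from its per-graph test results. The function name, the dictionary keys, and the reading of "for 50% of graphs" as "for at least 50% of graphs, with the conditions holding jointly on the same graph" are assumptions made for illustration, not the authors' implementation.

```python
def classify_dataset(graphs):
    """Assign an evidence category from per-graph results (hypothetical helper).

    Each graph is a dict with keys:
      "p"            KS p-value of the power-law fit
      "n_tail"       number of observations in the fitted tail
      "alt_favored"  True if some alternative beats the power law in the LRT
    """
    n = len(graphs)

    def frac(pred):
        # Fraction of the data set's graphs that satisfy a predicate.
        return sum(1 for g in graphs if pred(g)) / n

    def plausible(g):
        return g["p"] > 0.1

    def big_tail(g):
        return g["n_tail"] > 50

    def no_alt(g):
        return not g["alt_favored"]

    if frac(lambda g: plausible(g) and big_tail(g)) >= 0.9 and frac(no_alt) >= 0.95:
        return "Strongest"
    if frac(lambda g: plausible(g) and big_tail(g) and no_alt(g)) >= 0.5:
        return "Strong"
    if frac(lambda g: plausible(g) and big_tail(g)) >= 0.5:
        return "Weak"
    if frac(plausible) >= 0.5:
        return "Weakest"
    return "Not scale-free"

example = [{"p": 0.4, "n_tail": 120, "alt_favored": True}]
print(classify_dataset(example))
```

Checking the strictest category first means a data set is reported under the strongest label it qualifies for, which matches the nesting of the categories.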
Results by domain
[Figure: one panel each for Biological, Social, and Technological networks; axis label "proportion"; bars grouped into Scale-Free and Not Scale-Free.]
Results by scaling parameter
[Figure: number of datasets vs. scaling parameter $\alpha$, shown for each category in turn.]
• All: 927 datasets (100%)
• Weakest: 309 datasets (33%); $p > 0.1$ for 50% of graphs
• Weak: 224 datasets (24%); $p > 0.1$ and $n_{\text{tail}} > 50$ for 50% of graphs
• Strong: 98 datasets (10%); $p > 0.1$, $n_{\text{tail}} > 50$, and no alternative distributions favored for 50% of graphs
• Strongest: 35 datasets (4%); $p > 0.1$ and $n_{\text{tail}} > 50$ for 90% of graphs and no alternatives favored for 95% of graphs
Conclusions
Genuine scale-free networks are rare:
• only 4% of network datasets are "strongly scale-free"
• only 33% of network datasets are "weakly scale-free"
• of the remaining 67%, the majority favor a non-PL distribution over the power law
Are any structural patterns "universal"? Maybe not.
Some domains have more scale-free networks than others
• e.g., Biological and Technological networks [good theories for why]
• Social networks are at best weakly scale-free
• we need new mechanistic models of general structural patterns
Preprint of these results on arXiv in late July
Future directions
Acknowledgements
Kansuke Ikehara (Colorado)
Ellen Tucker (Northwestern)
Matthias Sainz (Colorado)
McKenzie Weller (Colorado)
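Is the PL a better model than others? Generally, no.
likelihood-ratio test: best power-law vs. best alternative fit
[Figure: four pie charts; slice percentages as extracted, with slice and chart labels not recoverable: 36% / 37% / 27%; 12% / 43% / 45%; 33% / 25% / 42%; 0% / 49% / 51%.]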
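To make the comparison concrete, here is a minimal Python sketch of a likelihood-ratio test between a fitted power law and a fitted exponential on the same tail. It assumes continuous data and a known lower cutoff, uses a normal approximation for the two-sided $p$-value, and covers only one of the alternatives; the function names are hypothetical and this is not the authors' code.

```python
import math
import random

def loglik_power_law(tail, xmin):
    """Per-point log-likelihoods under the MLE continuous power law."""
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    c = math.log((alpha - 1.0) / xmin)
    return [c - alpha * math.log(x / xmin) for x in tail]

def loglik_exponential(tail, xmin):
    """Per-point log-likelihoods under the MLE exponential shifted to xmin."""
    n = len(tail)
    lam = 1.0 / (sum(tail) / n - xmin)
    return [math.log(lam) - lam * (x - xmin) for x in tail]

def likelihood_ratio_test(xs, xmin):
    """Return (R, p): R > 0 favors the power law, p is its two-sided significance."""
    tail = [x for x in xs if x >= xmin]
    n = len(tail)
    pl = loglik_power_law(tail, xmin)
    ex = loglik_exponential(tail, xmin)
    diffs = [a - b for a, b in zip(pl, ex)]
    R = sum(diffs)
    mean = R / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    # Normal approximation: R has standard deviation sqrt(n) * sigma under the null.
    p = math.erfc(abs(R) / (math.sqrt(2.0 * n) * sigma))
    return R, p

rng = random.Random(0)
# Synthetic power-law sample with alpha = 2.5 and xmin = 1.
xs = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(5000)]
R, p = likelihood_ratio_test(xs, xmin=1.0)
print(R, p)
```

The sign of R says which model fits better, and a small p says the sign is unlikely to be a fluctuation; a large p means the test cannot distinguish the two models on that tail.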