
Scale-free networks are rare

Anna Broido (1) and Aaron Clauset (2, 3)

1. Applied Math Dept., CU Boulder
2. Computer Science Dept. & BioFrontiers Institute, CU Boulder
3. External Faculty, Santa Fe Institute

18 years of scale-free networks

A classic idea and network growth model in network science
• claim: scale-free networks are everywhere

But definitions are inconsistent across the literature:
• a network built by preferential attachment
• a network with a power-law (PL) degree distribution, $p(k) \propto k^{-\alpha}$
• a PL with $2 < \alpha < 3$
• a PL in the upper tail only
• PL merely a better model than alternatives
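A short worked calculation shows why the range $2 < \alpha < 3$ is singled out. This is only a sketch under a continuous approximation with a lower cutoff $k_{\min} > 0$, which is an assumption made here and not something the slide specifies:

```latex
\langle k^m \rangle
  = \int_{k_{\min}}^{\infty} k^m \, p(k) \, dk
  \propto \int_{k_{\min}}^{\infty} k^{\,m-\alpha} \, dk ,
\qquad \text{finite only if } m < \alpha - 1 .
% m = 1: the mean degree is finite only if alpha > 2
% m = 2: the second moment is finite only if alpha > 3
```

For $2 < \alpha < 3$ the mean degree is therefore finite while the variance diverges in the infinite-size limit.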

Empirically and theoretically controversial

No broad evaluation of their existence


"All" the network data

A large, diverse corpus of real-world networks:
• extracted from the Index of Complex Networks (icon.colorado.edu)
• 927 real-world network datasets
• 5 domains: Biological, Information, Social, Technological, Transportation
• diverse sizes: $10^2$ to $10^6$ vertices

The scale-free property is defined for simple graphs
• but real-world networks may not be simple: directed, weighted, bipartite, multiplex, and multigraph

"All" the network data… is complicated


Solution
• convert each input data set into (multiple) simple degree sequences
• for example:
    • Directed: in-degree, out-degree, both (3 simple seqs.)
    • degree (1 simple seq.)
• we implement a consistent set of conversion rules that guarantee sparse and simple graphs
• some data sets produce many simple graphs
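The directed case can be sketched in a few lines of Python. This is not the authors' conversion code: the function name is hypothetical, and reading "both" as the degree of the graph with edge directions ignored is an assumption.

```python
from collections import defaultdict


def directed_to_simple_sequences(edges):
    """Hypothetical helper: directed edge list -> three simple degree sequences."""
    # Drop self-loops and collapse repeated edges so the graph is simple.
    simple_edges = {(u, v) for u, v in edges if u != v}
    nodes = sorted({n for edge in simple_edges for n in edge})

    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for u, v in simple_edges:
        out_deg[u] += 1
        in_deg[v] += 1

    # Assumption: "both" ignores direction, so u->v and v->u are one edge.
    undirected = {frozenset(edge) for edge in simple_edges}
    both_deg = defaultdict(int)
    for edge in undirected:
        for n in edge:
            both_deg[n] += 1

    return {
        "in": [in_deg[n] for n in nodes],
        "out": [out_deg[n] for n in nodes],
        "both": [both_deg[n] for n in nodes],
    }


# Example with a repeated edge (1, 2) and a self-loop (3, 3).
seqs = directed_to_simple_sequences([(1, 2), (2, 1), (1, 3), (3, 3), (1, 2)])
print(seqs)
```

Each of the three lists is then treated as its own degree sequence in the tests that follow.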

For each degree sequence from a data set:

• estimate the PL model $\{\hat{x}_{\min}, \hat{\alpha}\}$, compute $n_{\text{tail}}$, and the $p$-value for the KS hypothesis test

• estimate alternative distributions: exponential, log-normal, Weibull, and PL + exponential cut-off

• perform a likelihood-ratio test (LRT) against the alternatives
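The first step can be illustrated with a minimal sketch. It assumes the lower cutoff is already given, whereas the method in the Clauset, Shalizi, and Newman paper also selects it by minimizing the KS distance. It uses that paper's approximate maximum-likelihood estimator for discrete data, $\hat{\alpha} \simeq 1 + n \left[ \sum_i \ln \frac{x_i}{x_{\min} - 1/2} \right]^{-1}$. The function name is hypothetical, and the KS $p$-value is not computed here.

```python
import math


def fit_alpha_fixed_xmin(degrees, x_min):
    """Hypothetical helper: approximate discrete MLE of alpha for a given x_min."""
    if x_min < 1:
        raise ValueError("x_min must be at least 1 for degree data")
    # Only the upper tail k >= x_min is modeled as a power law.
    tail = [k for k in degrees if k >= x_min]
    n_tail = len(tail)
    if n_tail == 0:
        raise ValueError("no observations at or above x_min")
    # Approximate MLE for discrete data; every term is positive
    # because k >= x_min > x_min - 0.5.
    log_sum = sum(math.log(k / (x_min - 0.5)) for k in tail)
    alpha_hat = 1.0 + n_tail / log_sum
    return alpha_hat, n_tail


alpha_hat, n_tail = fit_alpha_fixed_xmin([1, 1, 2, 3, 5, 8, 13], x_min=2)
print(f"alpha_hat = {alpha_hat:.3f}, n_tail = {n_tail}")
```

The approximation is known to be less accurate for small cutoffs, so this should be read as an illustration and not a replacement for a full fit.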

SIAM REVIEW, Vol. 51, No. 4, pp. 661–703
© 2009 Society for Industrial and Applied Mathematics

Power-Law Distributions in Empirical Data∗

Aaron Clauset†

Cosma Rohilla Shalizi‡

M. E. J. Newman§

Abstract. Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.

Key words. power-law distributions, Pareto, Zipf, maximum likelihood, heavy-tailed distributions, likelihood ratio test, model selection

AMS subject classifications. 62-07, 62P99, 65C05, 62F99

DOI. 10.1137/070710111

1. Introduction. Many empirical quantities cluster around a typical value. The speeds of cars on a highway, the weights of apples in a store, air pressure, sea level, the temperature in New York at noon on a midsummer’s day: all of these things vary somewhat, but their distributions place a negligible amount of probability far from the typical value, making the typical value representative of most observations. For instance, it is a useful statement to say that an adult male American is about 180cm tall because no one deviates very far from this height. Even the largest deviations, which are exceptionally rare, are still only about a factor of two from the mean in

∗Received by the editors December 2, 2007; accepted for publication (in revised form) February 2, 2009; published electronically November 6, 2009. This work was supported in part by the Santa Fe Institute (AC) and by grants from the James S. McDonnell Foundation (CRS and MEJN) and the National Science Foundation (MEJN). http://www.siam.org/journals/sirev/51-4/71011.html

†Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, and Department of Computer Science, University of New Mexico, Albuquerque, NM 87131.

‡Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213.

§Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109.

Testing the scale-free property

[Diagram: the fitted quantities $\{\hat{x}_{\min}, \hat{\alpha}\}$, $p$, and $n_{\text{tail}}$ place each data set in one of four levels of evidence, listed here from strongest to weakest]

• Strongest: $p > 0.1$ and $n_{\text{tail}} > 50$ for 90% of graphs, and no alternatives favored for 95% of graphs
• Strong: no alternative distributions favored over the power law for 50% of graphs
• Weak: $n_{\text{tail}} > 50$ for 50% of graphs
• Weakest: $p > 0.1$ for 50% of graphs

Strong, Weak, and Weakest are cumulative: each adds its condition to those of the level below it.
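The levels of evidence can be read as a decision rule over the simple graphs of one data set. The sketch below uses the cumulative definitions given on the results slide. It makes several assumptions that are not from the slides: "for 50% of graphs" is read as "at least 50%", the levels are checked from strongest to weakest, and the function name and dictionary keys are hypothetical.

```python
def evidence_level(results):
    """Hypothetical helper: classify one data set from its per-graph test results.

    Each item is a dict with keys 'p' (KS p-value), 'n_tail' (tail size),
    and 'alt_favored' (True if any alternative beats the power law).
    """
    n = len(results)
    if n == 0:
        raise ValueError("need at least one simple graph")

    def frac(pred):
        # Fraction of this data set's simple graphs satisfying pred.
        return sum(1 for r in results if pred(r)) / n

    def plausible(r):
        return r["p"] > 0.1

    def big_tail(r):
        return plausible(r) and r["n_tail"] > 50

    def unbeaten(r):
        return not r["alt_favored"]

    # Assumption: thresholds are inclusive ("at least" the stated share).
    if frac(big_tail) >= 0.9 and frac(unbeaten) >= 0.95:
        return "strongest"
    if frac(lambda r: big_tail(r) and unbeaten(r)) >= 0.5:
        return "strong"
    if frac(big_tail) >= 0.5:
        return "weak"
    if frac(plausible) >= 0.5:
        return "weakest"
    return "not scale-free"


example = [{"p": 0.5, "n_tail": 100, "alt_favored": True}]
print(evidence_level(example))
```

In the example, the single graph passes the plausibility and tail-size checks but an alternative is favored, so the data set lands in the "weak" level.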

Results by domain

[Figure: proportion of data sets that are Scale-Free vs. Not Scale-Free, by domain; panels shown: Biological, Social, Technological]

Results by scaling parameter

[Figure: distribution of the scaling parameter $\alpha$ at each level of evidence]

• All: 927 datasets (100%)
• Weakest ($p > 0.1$ for 50% of graphs): 309 datasets (33%)
• Weak ($p > 0.1$ and $n_{\text{tail}} > 50$ for 50% of graphs): 224 datasets (24%)
• Strong ($p > 0.1$, $n_{\text{tail}} > 50$, and no alternative distributions favored for 50% of graphs): 98 datasets (10%)
• Strongest ($p > 0.1$ and $n_{\text{tail}} > 50$ for 90% of graphs, and no alternatives favored for 95% of graphs): 35 datasets (4%)

Genuine scale-free networks are rare:
• only 4% of network datasets are "strongly scale-free"
• only 33% of network datasets are "weakly scale-free"
• of the remaining 67%, the majority favor a non-PL distribution over the power law

Are any structural patterns "universal"? Maybe not.

Some domains have more scale-free networks than others
• e.g., Biological and Technological networks [good theories for why]
• Social networks at best weakly scale-free
• we need new mechanistic models of general structural patterns

Preprint of these results on arXiv in late July

Conclusions

Future directions

Acknowledgements

• Kansuke Ikehara (Colorado)
• Ellen Tucker (Northwestern)
• Matthias Sainz (Colorado)
• McKenzie Weller (Colorado)

Is the PL a better model than others? Generally, no.

Likelihood-ratio test: best power-law fit vs. best alternative fit

[Figure values, row and column labels not recovered: 37%, 27%; 12%, 43%; 33%, 25%]
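For reference, the likelihood-ratio comparison can be written as a log-likelihood ratio over the fitted tail. This is a sketch of the standard form of the test in the Clauset, Shalizi, and Newman paper cited earlier; the symbols $p_{\text{PL}}$ and $p_{\text{Alt}}$ are notation introduced here, and both models are assumed to be fit to the same observations $k_i \ge \hat{x}_{\min}$:

```latex
\mathcal{R} = \sum_{i=1}^{n_{\text{tail}}}
  \left[ \ln p_{\text{PL}}(k_i) - \ln p_{\text{Alt}}(k_i) \right]
% R > 0: the power law fits the tail better
% R < 0: the alternative fits the tail better
% R not significantly different from 0: neither model is favored
```

A data set counts against the scale-free hypothesis only when $\mathcal{R}$ is negative and the difference is statistically significant.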
