Scale-free networks are rare. Anna Broido (1) and Aaron Clauset (2, 3). 1. Applied Math Dept., CU Boulder; 2. Computer Science Dept. & BioFrontiers Institute, CU Boulder; 3. External Faculty, Santa Fe Institute. 21 June 2017. © 2017 Anna Broido & Aaron Clauset


Page 1: Scale-free networks are rare - Santa Fe Institute, tuvalu.santafe.edu/~aaronc/slides/Broido_Clauset...

Scale-free networks are rare

Anna Broido (1) and Aaron Clauset (2, 3)

1. Applied Math Dept., CU Boulder
2. Computer Science Dept. & BioFrontiers Institute, CU Boulder
3. External Faculty, Santa Fe Institute

21 June 2017
© 2017 Anna Broido & Aaron Clauset

Page 2:

18 years of scale-free networks

Page 3:

18 years of scale-free networks

A classic idea and network growth model in network science
• claim: scale-free networks are everywhere

Page 4:

18 years of scale-free networks

A classic idea and network growth model in network science
• claim: scale-free networks are everywhere

But, definitions are inconsistent across literature:
• a network built by preferential attachment
• a network with a power-law (PL) degree distribution, $p(k) \propto k^{-\alpha}$
• a PL with $2 < \alpha < 3$
• a PL in the upper tail only
• PL merely a better model than alternatives

Page 5:

18 years of scale-free networks

A classic idea and network growth model in network science
• claim: scale-free networks are everywhere

But, definitions are inconsistent across literature:
• a network built by preferential attachment
• a network with a power-law (PL) degree distribution, $p(k) \propto k^{-\alpha}$
• a PL with $2 < \alpha < 3$
• a PL in the upper tail only
• PL merely a better model than alternatives

Empirically and theoretically controversial

No broad evaluation of their existence
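One reason the range $2 < \alpha < 3$ is often singled out can be seen from the moments of the degree distribution. The block below is a sketch under a continuous approximation of $p(k) \propto k^{-\alpha}$; the lower cutoff $k_{\min}$ is a symbol introduced here for the derivation and does not appear on the slide.

```latex
% Continuous approximation, valid for k >= k_min (k_min is an assumed lower cutoff).
p(k) = \frac{\alpha - 1}{k_{\min}} \left( \frac{k}{k_{\min}} \right)^{-\alpha}

% The m-th moment is finite only when m < \alpha - 1.
\langle k^m \rangle
  = \int_{k_{\min}}^{\infty} k^m \, p(k) \, dk
  = \frac{\alpha - 1}{\alpha - 1 - m} \, k_{\min}^{m}
  \quad \text{for } m < \alpha - 1

% For 2 < \alpha < 3 the mean is finite but the second moment diverges.
\langle k \rangle = \frac{\alpha - 1}{\alpha - 2} \, k_{\min} < \infty ,
\qquad
\langle k^2 \rangle = \infty
```

In this regime the mean degree is well defined while the variance grows without bound as the network grows, which is the sense in which such a distribution has no characteristic scale.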

Page 6:

"All" the network data

A large, diverse corpus of real-world networks:
• extracted from the Index of Complex Networks (icon.colorado.edu)
• 927 real-world network datasets
• 5 domains: Biological, Information, Social, Technological, Transportation
• diverse sizes: $10^2$ to $10^6$ vertices

Page 7:

"All" the network data… is complicated

The scale-free property is defined for simple graphs
• but real-world networks may not be simple: directed, weighted, bipartite, multiplex, and multigraph

Page 8:

"All" the network data… is complicated

The scale-free property is defined for simple graphs
• but real-world networks may not be simple: directed, weighted, bipartite, multiplex, and multigraph

Solution
• convert input data set to (multiple) simple degree sequences
• for example:
  Directed? YES: in-degree, out-degree, both (3 simple seqs.)
  Directed? NO: degree (1 simple seq.)
• we implement a consistent set of conversion rules that guarantee sparse and simple graphs
• some data sets produce many simple graphs
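The directed example above can be sketched in a few lines of Python. This is an illustration only: the function name `simple_degree_sequences` is hypothetical, and the simplification rules (drop self-loops, count repeated edges once, treat "both" as the degree after ignoring edge direction) are assumptions, not the authors' actual conversion rules.

```python
from collections import Counter


def simple_degree_sequences(edges, directed):
    """Return named degree sequences for an edge list of (u, v) pairs.

    Assumed simplification rules (not the authors' exact rules):
    self-loops are dropped and repeated edges are counted once.
    """
    edges = list(edges)
    nodes = {n for u, v in edges for n in (u, v)}

    if directed:
        # Keep each distinct arc once and drop self-loops.
        arcs = {(u, v) for u, v in edges if u != v}
        out_deg = Counter(u for u, _ in arcs)
        in_deg = Counter(v for _, v in arcs)
        # "both": ignore direction, so reciprocal arcs collapse to one edge.
        pairs = {frozenset(arc) for arc in arcs}
    else:
        pairs = {frozenset((u, v)) for u, v in edges if u != v}

    deg = Counter(n for pair in pairs for n in pair)

    def seq(counter):
        # Counter returns 0 for nodes that never appear, so isolated
        # nodes (for example, nodes with only a self-loop) get degree 0.
        return sorted((counter[n] for n in nodes), reverse=True)

    if directed:
        return {
            "in-degree": seq(in_deg),
            "out-degree": seq(out_deg),
            "both": seq(deg),
        }
    return {"degree": seq(deg)}
```

A directed input yields three sequences and an undirected input yields one, matching the YES and NO branches on the slide. Weighted, bipartite, and multiplex inputs would need further rules that this sketch does not attempt.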

Page 9:

Testing the scale-free property

For each degree sequence $\vec{k}^{(i)}$ from a data set:
• estimate PL model $\{\hat{x}_{\min}, \hat{\alpha}\}$, compute $n_{\text{tail}}$, and $p$-value for KS hypothesis test
• estimate alternative distributions: exponential, log-normal, Weibull, and PL+exponential cut-off
• perform likelihood-ratio test (LRT) against alternatives

Reference shown on the slide:

Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman. Power-Law Distributions in Empirical Data. SIAM Review, Vol. 51, No. 4, pp. 661–703. © 2009 Society for Industrial and Applied Mathematics. DOI 10.1137/070710111

Abstract. Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution (the part of the distribution representing large but rare events) and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.

Key words. power-law distributions, Pareto, Zipf, maximum likelihood, heavy-tailed distributions, likelihood ratio test, model selection

AMS subject classifications. 62-07, 62P99, 65C05, 62F99

1. Introduction. Many empirical quantities cluster around a typical value. The speeds of cars on a highway, the weights of apples in a store, air pressure, sea level, the temperature in New York at noon on a midsummer's day: all of these things vary somewhat, but their distributions place a negligible amount of probability far from the typical value, making the typical value representative of most observations. For instance, it is a useful statement to say that an adult male American is about 180cm tall because no one deviates very far from this height. Even the largest deviations, which are exceptionally rare, are still only about a factor of two from the mean in ...

Received by the editors December 2, 2007; accepted for publication (in revised form) February 2, 2009; published electronically November 6, 2009. This work was supported in part by the Santa Fe Institute (AC) and by grants from the James S. McDonnell Foundation (CRS and MEJN) and the National Science Foundation (MEJN). http://www.siam.org/journals/sirev/51-4/71011.html

Aaron Clauset: Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, and Department of Computer Science, University of New Mexico, Albuquerque, NM 87131.
Cosma Rohilla Shalizi: Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213.
M. E. J. Newman: Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109.
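The first fitting step for a single degree sequence can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the continuous-data maximum-likelihood estimate of the exponent for a fixed lower cutoff, whereas degree data are discrete and would normally call for the discrete form. The function names are hypothetical, and the bootstrap that produces the KS $p$-value is omitted.

```python
import math


def fit_power_law_tail(values, xmin):
    """Continuous-approximation MLE of alpha on the tail values >= xmin.

    Returns (alpha_hat, n_tail). Assumes a continuous power law, which is
    only an approximation for integer degree data.
    """
    tail = [x for x in values if x >= xmin]
    n_tail = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    if n_tail == 0 or log_sum == 0.0:
        raise ValueError("tail is empty or has no spread above xmin")
    alpha_hat = 1.0 + n_tail / log_sum
    return alpha_hat, n_tail


def ks_distance(values, xmin, alpha):
    """Largest gap between the empirical and fitted CDF on the tail."""
    tail = sorted(x for x in values if x >= xmin)
    n = len(tail)
    gap = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return gap


def choose_xmin(values, candidates):
    """Pick the candidate cutoff whose fitted tail has the smallest KS gap."""
    best = None
    for xmin in candidates:
        alpha, n_tail = fit_power_law_tail(values, xmin)
        d = ks_distance(values, xmin, alpha)
        if best is None or d < best[0]:
            best = (d, xmin, alpha, n_tail)
    return best[1], best[2], best[3]
```

Calling `choose_xmin(degrees, sorted(set(degrees)))` would scan every observed degree as a candidate cutoff, assuming the largest values still leave a tail with some spread. Turning the resulting KS gap into a $p$-value requires comparing it against gaps from synthetic data sets, which this sketch leaves out.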

Page 10:

Testing the scale-free property

• Strongest: $p > 0.1$ and $n_{\text{tail}} > 50$ for 90% of graphs, and no alternatives favored for 95% of graphs

Page 11:

Testing the scale-free property

• Strongest: $p > 0.1$ and $n_{\text{tail}} > 50$ for 90% of graphs, and no alternatives favored for 95% of graphs
• Strong: no alternative distributions favored over power-law for 50% of graphs

Page 12:

Testing the scale-free property

• Strongest: $p > 0.1$ and $n_{\text{tail}} > 50$ for 90% of graphs, and no alternatives favored for 95% of graphs
• Strong: no alternative distributions favored over power-law for 50% of graphs
• Weak: $n_{\text{tail}} > 50$ for 50% of graphs
• Weakest: $p > 0.1$ for 50% of graphs

Page 13:

Results by domain

[Figure: proportion of Scale-Free vs. Not Scale-Free data sets; panel: Biological; axis: proportion]

Page 14:

Results by domain

[Figure: proportion of Scale-Free vs. Not Scale-Free data sets; panels: Social, Biological; axis: proportion]

Page 15:

Results by domain

[Figure: proportion of Scale-Free vs. Not Scale-Free data sets; panels: Technological, Social, Biological; axis: proportion]

Page 16:

Results by scaling parameter

[Figure: histogram; x-axis: scaling parameter $\alpha$; y-axis: number of datasets]

927 datasets (100%)

Page 17:

Results by scaling parameter

[Figure: histogram; x-axis: scaling parameter $\alpha$; y-axis: number of datasets]

Weakest: $p > 0.1$ for 50% of graphs
309 datasets (33%)

Page 18:

Results by scaling parameter

[Figure: histogram; x-axis: scaling parameter $\alpha$; y-axis: number of datasets]

Weak: $p > 0.1$ and $n_{\text{tail}} > 50$ for 50% of graphs
224 datasets (24%)

Page 19:

Results by scaling parameter

[Figure: histogram; x-axis: scaling parameter $\alpha$; y-axis: number of datasets]

Strong: $p > 0.1$, $n_{\text{tail}} > 50$, and no alternative distributions favored for 50% of graphs
98 datasets (10%)

Page 20:

Results by scaling parameter

[Figure: histogram; x-axis: scaling parameter $\alpha$; y-axis: number of datasets]

Strongest: $p > 0.1$ and $n_{\text{tail}} > 50$ for 90% of graphs, and no alternatives favored for 95% of graphs
35 datasets (4%)
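The four evidence categories can be written as a small classifier over the per-graph results of one data set. This is a sketch under stated assumptions: the `GraphFit` record and the function names are hypothetical, "for 50% of graphs" is read as "at least 50%", the conditions $p > 0.1$ and $n_{\text{tail}} > 50$ are required jointly on the same graph, and the "no alternatives favored" fraction is counted separately. The slides do not spell out these details.

```python
from dataclasses import dataclass


@dataclass
class GraphFit:
    """Per-graph fit summary (hypothetical record for this sketch)."""
    p_value: float             # KS goodness-of-fit p-value
    n_tail: int                # number of degrees at or above the cutoff
    alternative_favored: bool  # True if any alternative beats the power law


def fraction(flags):
    """Share of True values in a non-empty iterable of booleans."""
    flags = list(flags)
    if not flags:
        raise ValueError("need at least one graph")
    return sum(flags) / len(flags)


def evidence_categories(fits):
    """Return every category label that a data set satisfies."""
    plausible = fraction(f.p_value > 0.1 for f in fits)
    plausible_big = fraction(f.p_value > 0.1 and f.n_tail > 50 for f in fits)
    no_alternative = fraction(not f.alternative_favored for f in fits)

    labels = []
    if plausible >= 0.5:
        labels.append("Weakest")
    if plausible_big >= 0.5:
        labels.append("Weak")
    if plausible_big >= 0.5 and no_alternative >= 0.5:
        labels.append("Strong")
    if plausible_big >= 0.9 and no_alternative >= 0.95:
        labels.append("Strongest")
    return labels
```

Under these assumptions the categories nest: any data set labeled Strongest also satisfies Strong, Weak, and Weakest, which is consistent with the shrinking counts (309, 224, 98, 35) reported for the four categories.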

Page 21:

Genuine scale-free networks are rare:
• only 4% of network datasets are "strongly scale-free"
• only 33% of network datasets are "weakly scale-free"
• of the remaining 77%, the majority favor a non-PL distribution over the power law

Are any structural patterns "universal"? Maybe not.

Some domains have more scale-free networks than others
• e.g., Biological and Technological networks [good theories for why]
• Social networks at best weakly scale-free
• we need new mechanistic models of general structural patterns

Preprint of these results on arXiv in late July

Conclusions

Future directions

Page 22:

Acknowledgements

• Kansuke Ikehara (Colorado)
• Ellen Tucker (Northwestern)
• Matthias Sainz (Colorado)
• McKenzie Weller (Colorado)

Page 23:

Is the PL a better model than others? Generally, no.

likelihood-ratio test: best power-law vs. best alternative fit

[Figure: four charts, each with three percentage labels, in extraction order: 36%, 37%, 27%; 12%, 43%, 45%; 33%, 25%, 42%; 0%, 49%, 51%]
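A likelihood-ratio comparison of this kind can be sketched for one alternative, the exponential. The code below is an illustration under assumptions, not the authors' implementation: it treats the data as continuous, fits both models by maximum likelihood on the same tail, and uses a normal approximation for the significance of the summed log-likelihood ratio. The function name is hypothetical.

```python
import math


def loglik_ratio_pl_vs_exp(values, xmin):
    """Log-likelihood ratio of a power law against an exponential tail.

    Returns (R, p_value). R > 0 favors the power law, R < 0 favors the
    exponential, and a large p_value means the sign of R is not reliable.
    Assumes continuous data with some spread above xmin.
    """
    tail = [x for x in values if x >= xmin]
    n = len(tail)

    # Maximum-likelihood fits of both models on the same tail.
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    lam = 1.0 / (sum(tail) / n - xmin)

    diffs = []
    for x in tail:
        log_pl = math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin)
        log_exp = math.log(lam) - lam * (x - xmin)
        diffs.append(log_pl - log_exp)

    R = sum(diffs)
    mean = R / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    # Two-sided p-value from the normal approximation to R.
    p_value = math.erfc(abs(R) / (sigma * math.sqrt(2.0 * n)))
    return R, p_value
```

Each data set then falls into one of three outcomes per alternative: the power law is favored, the alternative is favored, or the test is inconclusive, which may be what the three percentage labels in each chart summarize.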