scale free network paper

Upload: tirumanivamshi

Post on 04-Jun-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/13/2019 Scale Free network paper

    1/10

    THEINTERNET,mapped onthe opposite page, is a scale-free network in thatsome siteS starbursts and detail above have a seemingly unlimited

    number of connections to other sites. Thismap, made on February 6, 2003,traces the shortest routes from atest WebsinHo about 1 others,using like colors for similar Webaddresses.

    a -Scientistshaverecentlydiscoveredthat variouscomplexsystemshaveantlnderlyihg~..'~tJ;i~e~tu eg~Ye'l rne(;lb9. haedorganili ngprincipies

    Thisinsighthas important impli~ationsfor a hostofapplications fromdrugdevelopmentto InternetsecurityBYALBERT U\SZLOBARABASINDERICBONABE

    50 SCIENTIFIC AMERICAN MAY2 3

  • 8/13/2019 Scale Free network paper

    2/10

    i

  • 8/13/2019 Scale Free network paper

    3/10

    The brain isa network of nerve cells con-nected by axons, and cells themselves arenetworks ofmolecules connected by bio-chemical reactions. Societies, too, are net-works of people linked by friendships,familial relationships and professionalties. On a larger scale, food webs and eco-systems can be represented as networksof species. And networks pervade tech-nology: the Internet, power grids andtransportation systems are but a few examples. Even the language we are usingto convey these thoughts to you is a net-work, made up of words connected bysyntactic relationships,

    Yet despite the importance and per-vasiveness of networks, scientists havehad little understanding of their structureand properties. How do the interactionsof several malfunctioning llodes in a com-plex genetic network resl).lt in cancer?How does diffusion occur so rapidly.incertain social and communications sys-tems, leading to epidemics ()fdiseases andcomputer viruses? How do ~ome net-works continue to function even after thevast majority of their nodes have failed?

    Recent research has begun to ansWersuch questions. Over the past feWyears,investigators from a variety of fields havediscovered that many .networks-froIIlthe World Wide Web to a cell's metabol-ic system to actors in Hollywood-aredominated by a relatively sm.allnumberof nodes that ani c:oll1l~ctedto many other sites. Networks containing suchim-portant nodes, or hubs, tend to be whatwe call scale-free, in the sense thatsome hubs have a seemingly unlimited

    number of links and no node is typical ofthe others. These networks also behave incertain predictable ways; for example,they are remarkably resistant to acciden-tal failures but extremely vulnerable tocoordinated attacks.Such discoveries have dramaticallychanged what we thought we knew aboutthe complex interconnected world aroundus. Unexplained by previous network the~ories, hubs offer convincing proof thatvarious complex systems have a strict ar-chitecture, ruled by fundamental laws-laws that appear to apply equally to cells,col11puters, languages and society. Fur-thermore, these organizing principles havesignificant implications for developingbetter drugs, defending the Internet fromhackers, and halting the spread of deadlyepidemics, among other applicati()ns.etworks without caleFOR MORE THAN 40 YEARS, sciencetreated all complex net\'\(orks as beingcompletely randoIIl. This paradigIIl has itsroots in the work of two H1ir1garianmath~ematicians, the inimitable Paul Erdos andhis close collaborator Alfred Renyi. In1959, aiming to describe networks seen incommunications and the lif~ sciences,Erdos and Renyi suggested that suc:hsys-tems could be effectively l110deledby con-necting their nodes withrandornlyplacedlinks. The simplicitYof their approach andthe elegance of so~e of their related theo-rems revitalized graph theory, leading tothe emergence of a field in mathematicsthat focuses on random networks.

    An important prediction of random-

    S IENTIFI MERI N

    network theory is that, despite the ran-dom placement of links, the resulting sys-tem will be deeply democratic: mostnodes will have approximately the samenumber of links. Indeed, in a random net-work the nodes follow a Poisson distrib-ution with a bell shape, and itis extreme-lyrare to find nodes that have significant-ly more or fewer links than the average.Random networks are also called expo-nential, because the probability that anode is connected to k other sites de-creases exponentially for large k

    So in 1998, when we, together withHawoong Jeong and Reka Albert of theUniversity of Notre Dame, embarked ona project to map the World Wide Web,we expected to find a random network.Here's why: people follow their uniqueinterests when deciding what sites to linktheir Web documents to, and giventhe di-versity of everyone's .interests and thetremendous number of pages they canchoose from, the resulting pattern of con-nections should appear fairly random.The measurements, however, defiedthat expectation. Software designed forthis project hopped from one Web pageto another and collected all the links itcould. Although this virtu:alrobot reachedonly a tiny fraction of the entire Web, themap it assembled revealed somethingquite surprising: a few highly connectedpages are essentially holding the WorldWide Web together. More than 80 per-cent ()fthe pages on the map had feWerthan four links, but a small minority, lessthan 0.01 percent of all nodes, had morethan 1,000. (A subsequent Web surveywould uncover one document that hadbeen referenced bymore than two millionother pages )

    Counting how many Web pages haveex'1stly k links showed that the distribu-tion followed a so-called power law: theprobability that any node was connectedto k other nodes was proportional to knThe value of n for incoming links was ap-proximately 2, so, for instance, any nodewas roughly four times as likely to havejust half the number of incoming links asanother node. Power laws are quite dif-ferent from the bell-shaped distributionsthat characterize random networks.

    M Y 2003

  • 8/13/2019 Scale Free network paper

    4/10

    R NDOMVERSUSSC LE FREENETWORKS IRANDOMNETWORKS hich resemble the U.S.highway systemsimplified in left map , consist of nodes with randomly placedconnections. Insuch systems, aplot of the distribution of nodelinkages will follow a bell-shaped curve left graph , with mostnodes having approximately the same number of links.

    In contrast, scale-free networks, which resemble,the U.S.airline system simplified in right map . contain hubs [red -

    RandomNetwork

    nodes with a very high number of links. In such networks, thedistribution of node linkages follows a power law [center graphin that most nodes have just a few connections and some havea tremendous number of links. In that sense, the system has noscale. The defining characterist ic of such networks is that thedistribution of links, if plotted on a double-logarithmic scale[right graph , results in a straight line.

    Scale FreeNetwork

    BellCurve ~istribution of NodeLinkagesIf)QJ-c0Z0c;;..cE:::JZ

    Number of L inks

    ~~;;:~

    Specifically, a power law does not have apeak, asa bellcurve does, but isinstead de-scribed by a continuously decreasing func-tion. When plotted on a double-logarith-mic scale, a power law is a straight line[see illustration above] In contrast to thedemocratic distribution of links seen inrandom networks, power laws describesystems in which a few hubs, such as Ya-hoo and Google, dominate.Hubs are simply forbidden in randomnetworks. When we began to map theWeb, we expected the nodes to follow abell-shaped distribution, as do people sheights. Instead we discovered certainnodes that defied explanation, almost asifwe had stumbled on a significant num-ber of people who were 100 feet tall , thusprompting us to coin the term scale-free.www.sciam.com

    PowerLawDistribution of NodeLinkages~

    L. ~~

    ~ 0 0 QJZ zCij 00 0 If)~ ~ 011~ ~~E E~:::J. :::JZ Z

    Number of L inks Number of Links (log scale)

    Scale Free Networks AboundOVER THE PAST several years, re-searchers have uncovered scale-free structures in a stunning range of systems.When we studied the World Wide Web,we looked at the virtual network of Webpages connected to one another by hy-perlinks. In contrast, .ty1ichalisFaloutsosof the University of California at River-side, Petros Falotitsos of theUniversity ofToronto arid Christos Faloutsos of Car-negie MelloQ Uq~versity .analyzed tbephysical structure of the Internet. Thesethree computer-scientist brothers investi-gated the routers connected by optical orother communications lines and foundthat the topology of that network, too, isscale-free.

    Researchers have also discovered that

    some social networks are scale-free. A col-laboration between scientists from BostonUniversity and Stockholm University, forinstance, has shown that a netWork ofsexual relationships among people inSweden followed a poWer law: althoughmost individuals had only a few sexualpartners during their lifetime, a few (thehubs) had hundreds. A recent study ledby Stefan Bornholdt of the University ofKiel in Germany concluded that the net-work of people connected bye-mail islikewise scahfree. Sidney Redner ofBoston University demonstrated that thenetwork of scientific papers, connectedby citations, follows a power law aswell.And Mark Newman of the University ofMichigan at Ann Arbor examined col-laborations among scientists in several

    SCIENTIFIC AMERICAN 5

  • 8/13/2019 Scale Free network paper

    5/10

    disciplines,indudingphysicians and com-puter scientists, and found that those net-works were also scale-free, corroboratinga study we conducted focusing on math-ematicians and neurologists. (Interesting-ly, one of the largest hubs in the mathe-matics community is Erdos himself, whowrote more than 1,400 papers with nofewer than 500 co-authors.)Scale-free networks can occur in busi-ness. Walter W. Powell of Stanford Uni-versity, Douglas R. White of the Univer-sity of California at Irvine, Kenneth W.Koput of the University of Arizona, andJason-Owen Smith 6f the University ofMichigan studied the formation of al-liance networks in the U.S. biotechnolo:gy industry and discovered definite hubs~for instance, companies such as Genzyme,Chiron and Genentech had a dispropor-tionately large number of partnershipswith other firms. Researchers in Italy tooka deeper look at that network. Using datacollected by the University of Siena's Phar-maceuticallndustry Database, which nowprovides information for around 20,100R D agreements among more than 7,200organizations, they found that the hubsdetected byPowell and his colleagues wereactually part of a scale-free network.

    Even the network of actors in Holly-wood-,.,popularized by the game Six De-grees of Kevin Bacon, in which playerstry to connect actors to Bacon via themovies in which they have appeared to-gether-is scale-free. A quantitative analy-

    sis of that network showed that it, too,is dominated by hubs. Specifically, al-though most actors have only a few linksto others, a handful of actors, includingRod Steiger and Donald Pleasence, havethousands of connections. (Incidentally,on a list of most connected actors, Baconranked just 876th.)

    On a more serious note, scale-freenetworks are present in the biologicalrealm. With Zoltan Oltvai, a cellbiologistfrom Northwestern University, we founda scale-free structure in the cellular meta-bolic networks of 43 different organismsfrom all three domains of life, including rchaeoglobus fulgidus{an archaeb

  • 8/13/2019 Scale Free network paper

    6/10

    BIRTHOF SC LE FREENETWORK SC LE FREENETWORKgrows increment ally f rom two to 11 nodes in t his example. When deciding where t o establish a link a new node green) pr ef er s to at tach t o an exist ing node red tha t al re ad y ha s man y o th er c on nec tio ns . T hes e two b as ic mec ha ni sms-growthand pref erential at tachment-will event ually lead to the system s being dominat ed by hubs nodes having an enormous number of links.

    .---- -1 ~

    ~::;:~;;:~

    connected actors are more likely to bechosen for new roles. On the Internet themore connected routers, which typicallyhave greater bandwidth, are more desir-able for new users. In the U.S. biotech in-dustry, well-established companies such asGenzyme tend to attract more alliances,which further increases their desirabilityfor future partnerships. Likewise, themost cited articles in the scientific litera-ture stimulate even more researchers toread and cite them, a phenomenon thatnoted sociologist Robert K. Mertoncalled the Matthew effect, after a passagein the New Testament: For unto everyone that hath shall be given, and he shallhave abundance.

    These two mechanisms-growth andpreferential attachment-help to explainthe existence of hubs: as new nodes ap-pear, they tend to connect to the moreconnected sites, and these popular loca-tions thus acquire more links over timethan their less connected neighbors. Andthis rich get richer process will gener-allyfavor the early nodes, which are morelikely to eventually become hubs.Along with Reka Albert, we have usedcomputer simulations and calculations toshow that a growing network with pref-erential attachment will indeed becomescale-free, with its distribution of nodesfollowing a power law. Although this the-oretical model is simplistic and needs tobe adapted to specific situations, it doesappear to confirm our explanation forwww.sciam.com

    why scale-free networks are so ubiquitousin the real world.Growth and preferential attachment

    can even help explicate the presence ofscale-free networks in biological systems.Andreas Wagner of the University ofNew Mexico and David A. Fellof OxfordBrookes University in England havefound, for instance, that the most-con-nected molecules in the E. coli metabolicnetwork tend to have an early evolution-ary history: some are believed to be rem-nants of the so-called RNA world (theevolutionary step before the eInergence ofDNA), and others are coiJ;].poneptsohremost ancient metabolic pathways.

    Interestingly, the mechanism of pref-erential attachment tends to be linear. Inother words, a new node is twice as Hke-ly to link to an existing node that hastwice as many connections as its neigh-bor. Redner and his colleagues at BostonUniversity and elsewhere have investigat-ed different types of preferential attach-ment and have learned that ifthe mecha-nism is faster than linear (for example, anew node is four times as likely to link to

    ~ ~~~

    an existing node that has twice as manyconnections), one hub will tend to runaway with the lion s share of connections.In such winner take all scenarios, thenetwork eventually assumes a star topol-ogy with a central hub.AnAchilles HeelAS HUMANITY BECOMES increasing-ly dependent on power grids and com-munications webs, a much-voiced con-cern arises: Exactly how reliable are thesetypes of networks? The good news isthatcomplex systems can be amazingly re-silient against accidental failures. In fact,although hundreds of routers routinelymalfunction on the Internet at any mo-ment, the network rarely suffers majordisruptions. A similar degree of robust-ness characterizes. living systems: peoplerarely notice the consequences of thou-sands of errors in their cells, ranging frommutations to misfolded proteins. What isthe origin of this robustness?Intuition tells us that the breakdownofa substantial number of nodes will re-sult in a network s inevitable fragmenta-

    ALBERT L4SZL.6BARABASI. and ERIC BONABEAU s tu dy the b eh av io r a nd c ha rac teris ti cs o fmyria d co mp le x sy stems ran gin g fro m the Interne t to in se ct c ol oni es . Ba ra ba si is Emil T.Ho fman . Profes so r of P hys ic s a t the. U ni versi ty o f No tre Da me w he re h e d irec ts res earchon complex net works. He is author of Linked: The New Science of Networks. Bonabeau ischief scientist at Icosystem a consulting firm based in Cambridge Mass. that applies thetoo ls o f co mp le xity s cie nce to the d is co very o f b us ine ss o ppo rtun it ie s. H e i s c o-au th or o fSwarmIntelligence: From Natural to ArtificialSystems. Thisis Bonabeau s second articlefor Scientific American.

    SCIENTIFIC AMERICAN

  • 8/13/2019 Scale Free network paper

    7/10

    tion. This iscertainly trUefor random net-works: if a critical fraction of nodes is re-moved, these systems break into tiny,noncommunicating islands. Yet simula-tions of scale-free networks tell a differentstory: as many as 80 percent of random-ly selected Internet routers can failand theremaining ones will still form a compactcluster in which there will still be a pathbetween any two nodes. It is equally dif-ficult to disrupt a cell s protein-interactionnetwork: our measurements indicate thateven after a high level of random muta-tions are introduced, the unaffected pro-teins will continue to work together.

    In general, scale-free networks dis-play an amazing robustness against ac-cidental failures, a property that is root-ed in their inhomogeneous topology. Therandom removal of nodes will take outmainly the small ones because they aremuch more plentiful than hubs. And theelimination of small nodes will not dis-rupt the network topology significantly,because they contain few links comparedwith the hubs, which connect to nearlyeverything. But a reliance on hubs has aserious drawback: vulnerability to attacks.

    In a series of simulations, we found

    that the removal of just a few key hubsfrom the Internet splintered the systeminto tiny groups of hopelessly isolatedrouters. Similarly, knockout experimentsin yeast have shown that the removal ofthe more highly connected proteins has asignificantly greater chance of killing theorganism than does the deletion of othernodes. These hubs are crucial-if muta-tions make them dysfunctional, the cellwill most likely die.A reliance on hubs can be advanta-geous or not, depending on the system.Certainly, resistance to random break-down is good news for both the Internetand the caLIn addition, the cell s relianceon hubs provides pharmaceutical re-searchers with new strategies for selectingdrug targets, potentially leading to curesthat would kill only harmful cells or bac-teria by selectively targeting their hubs,while leaving healthy tissue unaffected.But the ability of a small group of well-in-formed hackers to crash the entire com-munications infrastructure by targetingits hubs isa major reason for concern.The Achilles heel of scale-free net-works raises a compelling question: Howmany hubs are essential? Recent research

    S IENTIFI MERI N

    suggests that, generally speaking, the si-multaneous elimination of as few as 5 to15 percent of all hubs can crash a system.For the Internet, our experiments implythat a highly coordinated attack-first re-moving the largest hub, then the nextlargest, and so on-could cause signifi-cant disruptions after the elimination ofjust several hubs. Therefore, protectingthe hubs is perhaps the most effectiveway to avoid large-scale disruptionscaused by malicious cyber-attacks. Butmuch more work is required to deter-mine just how fragile specific networksare. For instance, could the failure of sev-eral hubs like Genzyme and Genentechlead to the collapse of the entire U.S. bio-tech industry?

    Scale Free pidemicsKNOWLEDGE ABOUT scale-free net-works has implications for understandingthe spread of computer viruses, diseasesand fads. Diffusion theories, intensivelystudied for decades by both epidemi-ologists and marketing experts, predict acritical threshold for the propagation ofa contagion throughout a population.Any virus, disease or fad that is less in-fectious than that well-defined thresholdwill inevitably die out, whereas thoseabove the threshold will multiply expo-nentially, eventually penetrating the en-tire system.

    Recently, though, Romualdo Pastor-Satorras of the Polytechnic University ofCatalonia in Barcelona and AlessandroVespigniani of the International Centerfor Theoretical Physics in Trieste, Italy,reached a disturbing conclusion. Theyfound that in a scale-free network thethreshold is zero. That is, all viruses, eventhose that are weakly contagious, willspread and persist in the system. This re-sult explains why Love Bug, the mostdamaging computer virus thus far it shutdown the British Parliament in 2000),was still one of the most pervasive virus-es a year after its supposed eradication.

    Because hubs are connected to manyother nodes, at least one hub will tend tobe infected by any corrupted node. Andonce a hub has been infected, it will passthe virus to numerous other sites, eventu-allycompromising other hubs, which will

    MAY2003

  • 8/13/2019 Scale Free network paper

    8/10

    HOWROBUSTARERANDOMANDSCALE FREENETWORKS ITHE IDENT LF ILUREof a number ofnodes in arandomnetwork top panels can fracture the system into non-communicating islands. In contrast, scale-free networks are

    RandomNetwork AccidentalNodeFailure

    Before

    Scale Free Network Accidental Node Failure

    BeforeScale.FreeNetwork Attackon Hubs

    Before

    more robust inthe face of such failures middle panels .But they are highly vulnerable to a coordinated attack againsttheir hubs bottom panels .

    0

    After

    0 Failed nodep~ .00*

    After

    After

    :::i~-Z:J

    then spread the virus throughout the en-tire system.

    The fact that biological viruses spreadin social networks, which in many casesappear to be scale-free, suggests that sci-entists should take a second look at thevolumes of research writ ten on the inter-play of network topology and epidemics.Speci fically, in a scale-free network, the

    traditionalpublichealth approach of ran-dom immunization could easily fail be-cause it would very likely neglect a num-ber ofthe hubs. In fact, nearly everyonewould have to be treated to ensure thatthe hubs were not missed. A vaccinationfor measles, for instance, must reach 90percent of the population to be effective.

    Instead of random immunizat ions,www.sciam.com

    .0

    thoggh what if doctors targeted the hubsor the most connected individuals? Research in scale free networks indicatesthat this alternative approach could beef-fective even if the immunizations reachedonly a small fractionof the overall population provided that the fraction contained the hubs.

    But identifyingthe hubs in a socialS IENTIFI MERI N 57

  • 8/13/2019 Scale Free network paper

    9/10

    network is much more difficult than de-tecting them in other types of systems.Nevertheless, Reuven Cohen and ShlomoHavlin of Bar-Han University in Israel, to-gether with Daniel ben-Avraham ofClarkson University, have proposed aclever solution: immunize a small fractionof the random acquaintances of arbitrar- .ily selected individuals, a procedure thatselects hubs with a high probability be-cause they are linked to many people.That approach, though, leads to a num-ber of ethical dilemmas. For instance,even if the hubs could be identified,should they have priority for immuniza-

    tions and cures? Such issues notwith-standing, targeting hubs could be themost pragmatic solution for the futuredistribution ofAIDS or smallpox vaccinesin countries and regions that do not havethe resources to treat everyone.

    In many business contexts, peoplewant to start, not stop, epidemics. Viralmarketing campaigns, for instance, oftenspecifically try to target hubs to speed theadoption of a product. Obviously, such astrategy is not new. Back in the 1950s, astudy funded by pharmaceutical giantPfizer discovered the important role thathubs play in how quickly a community of

    It s a SmallWor d After ll

    doctors begins using a new drug. Indeed,marketers have intuitively known forsome time that certain customers outshineothers in spreading promotional buzzabout products and fads. But recent workin scale-free networks provides the scien-tific framework and mathematical tools toprobe that phenomenon more rigorously.FromTheoryto PracticeALTHOUGH SC LE FREEnetworksare pervasive, numerous prominent ex-ceptions exist. For example, the highwaysystem and power grid in the u.s. are notscale-free. Neither are most networks

    one another.. - -.The smalt~world property does not necessarily indicate the

    presence of a.nymagic organizing principle. Even a large networkwiturely random connections will be a small world. Considert,n u ~i avin als know~anotherl.000,thena million, -:be just two handshakes away from you, a billion will be just threeaway, and the earth's entire population willbe well withfi:lfour,Given that fact, the notion that any two.strangers in the world are,

    d average of . egr,IV tihntveal e c

    Ours imple calculation assumes that the people you know areall strangers to one another. In reali ty, there is much overlap.Indeed, society is fragmented into clusters of individuals-havingsimilar c' uc 'ncome.or interests). aJ~atureihathas t~ --followingthe seminal workin the 1970-sof MarkGranovetter,then a graduate student at Harvard,Clustering\s also a generalproperty ofmany othe,rtypes of networks, In1998 DuncanWattsand Steven Strqgat en both at CornellUniversity,significant 2liJsteri varityil'6fy~~ems:'fr'iimtlgridto the he'IIralnetworkofthe Caenorhabditisefegans Worm.

    Atfirst glance, isolated clusters ofhighly interconnectednodes appear to run counter t01he topology ofscale-freenetworks, inwhich a number ofhubs raoiate throughout thesyst~m,lin verytRing~ Re~ently,'H~weverrwe.havl.. ,i' . Mthat the two operties are compatible:.a network can be bothhighlyclustered and scale-free when small,tightly interlinkedclusters of nodes are connected into larger, less cohesive groupsteft , Thistype ofhierarchy appears to exist ina number ofsystems, fr~~Jhe WorldWide)Yeb(io,yV/)jch sare:,groupings ofWebpages devotei: tothe same t )to a cell(inwhichclusters are teams ofmolecules responsible fora specific function). -A.-L.B. ande.B:

    010pieasking them to forwardthe correspondence to acquaintanceswhomight be able to shepherd itcloserto a target recipient: astockbroker inBoston. Totrack each ofthe different paths,

    11.,1i1 ask e p ant\.~o mai w ey p d th erto,somec ,the letters that eventually arrived at the finaldestination hadpassed through an ave-rageofsixindividuals-the basis ofthepopular notion of sixdegrees ofsepar' tion between everyone.

    Althou am rk rdly:onclu-'-cletter~ ney de th y stofkbrokerecen~ly learned that other networks exhibit this small worldproperty, Wehave found, for instance, that a path ofjust threereactions willconnect almost any pair of chemicals ina cel l. And

    HIERARCHICALC ~STE cludpag the FrankLio w .,be I to otherclusters'(green) sin Wright;Tamous homes orPennsylvania' s attract ions. Those sites, in turn, could be connected toclusters red on famous architects or architecture ingeneral.

    58 SCIENTIFICAMERICAN

    --~--

    ~

    MAY 2003

  • 8/13/2019 Scale Free network paper

    10/10

    M POFINTER TINGPROTEINSn yeast highlights tl'\ discoverytha.~high1ylinked,or hub,proteinstend to be crucial for a cel l's survival. Red denotes essential proteir js(thei r removal wi ll cause the cellto die). Orange represents proteins of some importance [thei r removal will slow cel l growth). Greenand yellow represent proteins of less.er or unknoWn significance, respectively.

    ':z~':z03on sca.le-free networks at www.nd.edu/-networks

    S IENTIFI MERI N