complex networks - los alamos national...

94 Los Alamos Science Number 29 2005

Complex NetworksThe Challenge of Interaction Topology

Zoltán Toroczkai

The roadways of Portland, Oregon.

Networks have recently become a paradigmatic way of representing complex systems in which the pattern of interactions between a system’s constituent partsis itself complex and is evolving together with the system’s dynamics. Transport is the main function of these dynamic networks. It is therefore crucial that weunderstand the coupling between the network structure and the efficiency androbustness of the transport processes on the structure. Such understanding willhave a huge impact, allowing us to control signaling processes in the cell and todesign robust information and energy-transmission infrastructures, such as theInternet or the power grid. However, achieving this type of understanding is rather challenging, because of the discrete and random nature of network topology. This article reports on some of our results that connect network dynamics andtransport efficiency. It also illustrates the power behind the ability to control thetopology of the interactions in the design of scalable computer networks.

Systems of many interacting par-ticles typically exhibit complexbehavior. In most well-known

complex systems, the topology of theinteractions between particles can bedescribed by simple structures, suchas regular crystalline lattices or a con-tinuum, and the complex behaviorarises from nonlinearity and nonlocal-ity, which describe the nature of theinteractions themselves. There is,however, a large class of systemscalled complex networks, in which theinteractions are mediated not by acontinuum (or a simple regular struc-ture) but by a complex graph, whosestructure may evolve as part of thedynamics of the interactions.

Familiar systems in almost everyarea of life form such complex net-works. Here are a few examples:transport and transportation infrastruc-tures (electric power grids, water-ways, natural gas pipelines, roadways,

airlines, and others) social interactions(acquaintance networks, scientific col-laboration networks, terrorist net-works, sex webs, and others), commu-nications networks (the World WideWeb, the Internet, microwave back-bone, and telephone networks), bio-logical networks (metabolic networks,gene regulatory networks, proteininteraction networks), and networks inecology (food webs). Although thesesystems have been known for a while,their complexity has been exploredonly recently because the large data-bases and the immense computationalpower required to analyze networkdata were almost nonexistent twodecades ago.

Even a cursory “look” at the struc-ture of real-world networks creates abreathtaking impression: These arelarge objects containing thousands, orsometimes, even hundreds of mil-lions, of nodes with an intricate mesh

of connections among them. For thelast decade, the science of complexnetworks has focused on describingthe structural complexity of real-world network topologies. By lookingat the three images on these openingpages, one can easily surmise, thatstatistical and probabilistic methodsare essential to that description.Today, the focus has expandedbeyond network structure to anunderstanding of the relationshipbetween structure and dynamics andthe implications of that relationshipfor network design. The first half ofthis article traces the main ideas ingraph theory over the past two cen-turies, which are at the basis of themathematical approach to networks,and the second half is devoted tosome very recent developments: computer network design and theconnection between network dynamics and structure.

Number 29 2005 Los Alamos Science 95

Complex Networks

The metabolic pathway. (Gerhard Michal:Biochemical Pathways,1999 © Elsevier GmbH,Spektrum Akademischer Verlag, Heidelberg.)

Macroscopic snapshot of Internetconnectivity (skitter data) withselected backbone Internet serviceproviders. (This photo is courtesy of theLumeta Corporation.)

The Problem of the Königsberg Bridges

Network images can be quite strik-ing. But one might question whetherthinking about complex systems interms of networks leads to more thanpretty pictures. Ironically, the funda-mentals of the theory of networkstructures were introduced by a blindmathematician.

It all began with the puzzle ofseven bridges, an entertaining brain-teaser for people who strolled throughKönigsberg, the Prussian city at theBaltic Sea, in the 18th century. Theriver Pregel divides the city into fourland areas connected by sevenbridges. The burghers of Königsbergwondered if one could visit all thefour areas by crossing each bridgeexactly once (see Figure 1).

The puzzle was solved in 1736 byLeonhard Euler, who at the time, wasa mathematics professor in St.Petersburg. The power of Euler’ssolution lies not in the answer itself(which is negative) but in the way itwas derived. Euler’s revolutionaryidea was to represent the pieces ofland separated by bridges as the nodes(dots) A, B, C, and D and to representthe bridges as the edges (line seg-ments) a, b, c, d, e, f, and g, connect-ing the nodes (see Figure 2). Thestructure formed by the set of nodesand edges, called a graph, is a simpli-fied representation of the puzzle,encoding the relationships betweenthe pieces of land and the paths ofaccess between them (see Figure 2inset). In this representation, the prob-lem translates into the following one:Find a path that visits all nodes butpasses through all edges exactly once.Obviously, the intermediate nodesmust have an even number of incidentedges (if one visits an intermediatenode, one must also leave it).

Because the Königsberg puzzle has4 (>2) nodes, all with an odd numberof edges, there can be no such path.


Complex Networks

Figure 1. The Königsberg PuzzleThis woodcut shows the ancient Prussian city of Königsberg (now known asKaliningrad) with its seven bridges across the river Pregel. The possibility ofstrolling across the city by crossing each bridge once only became the object of afamous brainteaser.

Figure 2. Euler’s Solution to the Königsberg PuzzleA simple representation of the problem by a graph helps realize that there is nosuch path that visits all nodes and passes through all seven edges exactly once.

Euler’s representation of the relation-ships between a discrete set of entitiesas a graph led to the development of aparticular type of mathematicalnomenclature and ultimately to a newfield of discrete mathematics calledgraph theory.

A Hard Problem: TheRamsey Numbers

For nearly 200 years, graph theorywas concerned with topologicaland/or geometrical properties of smallstructures, or regular structures (suchas a lattice). Then, the 1951 seminalpaper by Ron Solomonoff and AnatolRapoport (1951) and the 1959–1960series of papers by Pál Erdo″s andAlfréd Rényi caused the rebirth ofgraph theory. These papers introducedthe notion of a random graph and,more important, that of graph ensem-bles, which are sets of graphs thatshare a given property Q. To under-stand this notion, let us look briefly atthe famous Party Problem and theRamsey numbers. This problem,inspired from social interactions, isstated very simply:

What is the minimum number ofguests, R, one should invite to a partythat would surely have k people whoall know each other or k who do notknow each other (at all)?

For k = 3, it is easy to prove thatR(3) = 6. We will use Euler’s method:Let us denote the six people by thenodes A, B, C, D, E, and F. Let usrepresent the fact that two peopleknow each other by drawing a redlink (or edge) between them and use ablue edge to link two people who donot know each other. Since pairs ofpeople either know each other or donot, the graph obtained is complete,which means that all possible edgesare drawn—see Figure 3(a).Specifically, a complete graph with nnodes, denoted here by Kn, always hasn(n – 1)/2 edges. The graph theoretic

version of the Party Problem is thus todetermine the minimum number ofnodes n, such that a complete graphwith n nodes and with edges of twocolor always has at least one completesubgraph of k nodes with all edges ofthe same color. For k = 3, a completesubgraph is a monochromatic triangle.

If there are n = 5 people present, onecan easily color a complete graphwith no such triangle present, rulingout n = 5 by inspection.

For n = 6, however, there is alwaysat least one such triangle. To provethis statement, let us assume the oppo-site, namely, that there can be no suchtriangles. Since for every node thereare n – 1 = 5 incident edges but onlytwo colors, there must be 3 edges ofthe same color incident on the node.For example, consider the edges AC,AD, and AE in Figure 3(b) to be thesame color, for example, blue. Sincethe triangles ACE, ACD, and ADEcannot have all three of their edges ofthe same color, CE, CD, and DE mustbe red. Then CDE is a triangle allwith the same color edges (red), acontradiction. Hence, R(3) = 6.

For k = 4, the answer is R(4) = 18,which is hard to prove. For k = 5 andhigher, the answers are not known;only some bounds exist. Although wehave no proof for k = 5, one mightthink that we would surely be able touse today’s supercomputers to find thevalue of R(5). However, as Bollobás,


Complex Networks

Leonhard EulerLeonhard Euler, the most prolific mathe-matician of all times, was born inSwitzerland in 1707 and spent his life inBerlin and St. Petersburg. Opera Omniais an incomplete collection of his worksthat has 73 volumes, each over 600pages in length.

Figure 3. The Party Problem for k = 3Six is the minimum number of people that always contains a group of three, all ofwhom either know each other (red links) or do not know each other (blue links). (a)A complete graph for six people is shown. Note that it contains the complete sub-graph CDE. (b) This figure demonstrates that there is no way to draw a completegraph without constructing a complete single-color three-node subgraph within it.Suppose that blue links indicate that A does not know C, D, or E. In that case, CD,DE, and EC must be red links so that a complete blue subgraph should not beformed. But then, as shown in (b), those three red links form a complete subgraph,which means that C, D, and E know each other.

an eminent graph theoretician, hasstated (1998) “…a head-on attack bycomputers for R(5) is doomed to fail-ure . . . .” This failure is largely due tothe combinatorial explosion in thenumber of ways we can draw a com-plete graph with n nodes using twocolors for the edges: On the face of it,a computer would have to search atotal of 2n(n–1)/2 such graphs for com-plete subgraphs with k nodes. For k =3, when n = R(3) = 6, there are 215 =32,768 complete graphs, for k = 4, theanalytic solution gives n = 18, whichmeans that there are 2153, or approxi-mately 1.46 × 1046 graphs. For k = 5,the best known bounds are 43 ≤ R(5)≤ 49, which would mean approxi-mately 2903 to 21176 graphs (or on theorder of 10301 graphs). For k = 5 andn = 43, the “ultimate laptop” of SethLloyd (2000), which operates at thephysical limit of computation (asdetermined by the speed of light, thePlanck constant, and the gravitationalconstant), performing f = 5.4258 ×1050 operations per second, wouldhave to work for at least 2.693 ×10213 years, a mighty long time (theage of the universe is estimated tobe between 1.1 × 109 and 2 × 109

years).So, can we hope ever to solve the

Party Problem? The key idea is tounderstand how different colorings ofKn relate to one another via transfor-mations, which would allow us to par-tition the set of two colorings of Kninto a smaller number of classes and,in the absence of a full mathematicaltheory, to program the computer tosearch for the monochromatic com-plete subgraphs on the set of classesinstead of the full set. Although stillunsolved, intense activity in this arealed to a number of generalizations ofthis problem and to the developmentof a huge branch of mathematics, theRamsey theory (Graham et al. 1990).That theory has a number of verydeep results that go well beyondgraph theory, affecting set theory in

the form of partition calculus, combi-natorics, ergodic theory, logic, analy-sis, algebra, geometry and computerscience.

Ultimately, the Party Problem sug-gests that, if we partition a set into afixed number of classes, order mustemerge for large enough sets. Thisprinciple is also illustrated by van derWaerden’s theorem (Bollobás 1998),which states that, for a given k and p,if we partition the first w integers intok classes, we will always find a classthat contains an arithmetic progres-sion with p terms for large enough w.Problems like the Party Problem leadto a simple conclusion: In order tounderstand properties of graphs, onehas to think in terms of ensembles ofgraphs that share a certain property, Q.

A Revolutionary Idea

The Hungarian mathematician PálErdo″s was one of the main pioneers ofthe ensemble approach. His completedisregard for the notion of possession

and ownership and his habit of livingout of a suitcase and visiting onemathematician friend after the nextwere symptoms, perhaps, of his totaldedication to mathematics. Erdo″s isconsidered by many to be the secondmost productive mathematician of alltimes, after Euler. Possibly the great-est contribution of Erdo″s is his intro-duction of the probabilistic methodin discrete mathematics. For graphtheory, this means that, instead ofasking for detailed properties of allgraphs in an ensemble, we are askingfor average properties, or the proba-bility that a graph from an ensemblehas the property Q. The probabilisticmethod was definitely not new whenErdo″s introduced it to discrete math-ematics: By the end of the 19th cen-tury, Boltzmann, Gibbs, and othershad laid down the foundations ofequilibrium statistical mechanics,which is based on applying the prob-abilistic method to ensembles ofmicrostates and characterizingmacroscopic properties of the systemby the properties of the “typical”microstates. This natural connectionbetween statistical mechanics andgraph theory is currently beingexploited by some research groupsworldwide, including the StatisticalPhysics of Infrastructure Networksteam at Los Alamos. Besides thecombinatorial explosion in the num-ber of possible graphs (or states),there is a second strong reason thatcalls for the use of the probabilisticmethod: incomplete information.Real-world networks, as we will seefrom the following sections, are inmany cases very dynamic, with newedges and nodes appearing and oldones disappearing as a result of sto-chastic processes. In addition, insome cases, it is hard, or even impos-sible, to identify precisely the graphstructure at a given moment. Again, agood example is supplied by a prob-lem related to social networks, name-ly, the Gossip Problem:


Complex Networks

Pál Erdo″s (1913–1996)Pál Erdo″s, who introduced the notion ofrandom graphs, is probably the secondmost prolific mathematician of all times,having produced over 1500 publicationswith 507 coauthors.

Suppose that person A in a set of Npeople has a very interesting piece ofinformation or gossip. On average,how many acquaintances must everyperson in N have such that the gossipbecomes known to all?

Since we do not know who knowswhom, we determine the answer byconsidering all graphs of N peoplewith the nodes representing the indi-viduals and the edges representing theacquaintanceships, or social links, fortransmitting gossip. In other words, ifpersons A and B are linked and one ofthem knows the gossip, we canassume the other knows it too. Theanswer to the Gossip Problem mustbe probabilistic in nature: It is theclass of graphs having nodes with acertain average number of links andcharacterized by the property thateveryone knows the gossip in theend. Erdo″s and Rényi came up with arather surprising solution: Once anode has on average one link, thegossip becomes known to all! In thejargon of social scientists, the set ofpeople represented by that graphforms a society. The class of graphsthat Erdo″s and Rényi introduced andthat helped give the answer is calledrandom graphs, a subject with a hugemathematical literature. For a review,see the book by Bollobás (2001).

The Binomial Random Graph.Because we need it for later discussion,we introduce the binomial randomgraph G(N,p) and present some of itsproperties. G(N,p) is a class of graphswith N vertices, whose edges aredrawn at random and independently,according to a uniform distributionwith probability p. Therefore, theaverage number of links incident on anode is λ = p(N – 1), or for largegraphs, it is approximately λ = pN.Thus, according to the answer for theGossip Problem, when λ = 1, or theprobability for a node to have an edgeis p = pc = 1/N, a giant cluster, orgiant component, emerges that con-

tains most of the nodes, and the prob-ability for a node not to belong to thiscluster decreases exponentially fast forp > pc. Physicists call this phenome-non percolation. Passing through pc(by the process of increasing the aver-age number of incident edges), thenetwork suffers a drastic change,which is called a phase transition inthe language of physics.

We now introduce one of the mostimportant characteristics of randomgraphs, namely, their degree distribu-tion. The degree of a node x is thenumber k(x) of incident edges on thatnode. The degree distribution of thebinomial random graph G(N,p) is theprobability that the number of nodesXk with degree k is y. In a G(N,p), theprobability of a node being connectedto k specific other nodes and not con-nected to the rest of N – 1 – k nodes ispk(1 – p)N–1–k. Because the number ofways to connect those k nodes isequal to the binomial coefficient

the probability of a node havingexactly k incident edges in G(N,p)becomes

(1)

Note that, as edges are drawn inci-dent to a node, that node will influ-ence the number of edges around theother nodes, and thus, in principle, thedistribution of Xk will not be exactlythe same as if all the nodes were inde-pendent, and the calculation of theexact form of the degree distributionbecomes a hard task. It was Bollobás(2001) who showed that, for largeenough N, the nodes can be treated asif they were independent, and thus,with good approximation, the degreedistribution of G(N,p) is described by

the binomial distribution in Equation(1). In the limit of N → ∞ and p → 0such that λ = pN = constant, the bino-mial goes into the Poisson distribu-tion:

(2)

Figure 4 shows a comparisonbetween the formula in (2) and themeasured degree distribution for abinomial graph of N = 20,000 nodesand a link probability p = 20/N =0.001. It shows that, indeed, theapproximation is good. The Poissondistribution P(k) has a “bell curve”shape, with a peak at λ = pN, and fastdecaying tails. The degree of a nodecharacterizes how a node “sees” itsimmediate neighborhood in the net-work. According to the formula in (2),if we keep λ = pN a constant, whileincreasing the size of the network, the distribution of edges in the immediate neighborhood of a nodebecomes independent of N for large N.However, keeping λ = pN a constant,means scaling the link probability pwith N – 1. The average node degree is

and the standard deviation of the dis-tribution around the average is

This result shows that the binomialgraph has a characteristic scaledefined by λ.

Real-World Networks

The latest revolution in networksscience happened toward the end ofthe 1990s, when powerful computersmade it possible to gather and analyzedata for systems containing a large


Complex Networks

number of components: from theWorld Wide Web and the Internet,phone call networks, networks ofmovie actors, large-company boards ofdirectors, scientific collaboration net-works, language networks, crimewebs, epidemic networks, and the sexweb to biological networks such as themetabolic network, protein interactionnetworks, cell-signaling, and foodwebs. The first important observationis that most of these networks are verydifferent from the random graphs ofErdo″s and Rényi. In hindsight, thisdeparture is not unexpected: In therandom graphs of Erdo″s and Rényi,the edges are assumed to exist com-pletely independently from each other,whereas in real-world networks, theexistence of edges is typically condi-tioned by nonindependent processes,or constraints, such as spatial embed-ding and interaction range depend-ency. The real surprise is that, in spiteof their diversity, real-world networkscan be classified into a small numberof different classes of graphs, eachcharacterized by certain structuralproperties of the interaction topology

in these systems. The most usefulproperties for this purpose are degreedistributions, clustering, assortativity,and shortest paths.Instead of listing the classes of thesenetworks and enumerating their prop-erties, we will discuss one ubiquitousclass, the so-called scale-free networks, originally introduced byAlbert-László Barabási. These net-works have power-law degree distri-butions (see Figure 5), as opposed tothe Gaussian or Poisson degree distri-butions of random graphs (for exam-ple, Figure 4). These real-world,scale-free networks include the net-work of movie actors, scientific col-laboration networks, the sex web, themetabolic network in the cell (on allthree levels of life—archaea, bacteria,and eukaryotes), the protein interac-tions network, the language networkdefined by synonyms (in which case,the nodes are the words, and the edgesconnect the synonyms), and virtuallyall large-scale information networks:the Internet (router and alsoautonomous domain level), the WorldWide Web, some e-mail networks and

phone call networks. Why do thesereal-world networks have similardegree distributions? Is there a univer-sal mechanism that generates thesestructures? The first crucial observa-tion is that, in most cases, these struc-tures result from dynamic processeswith a strong stochastic component,just like the random graph model ofErdo″s and Rényi. However, to deviatefrom the random graph model, thenetwork evolution process mustinclude stochastic dependency andbias. The question is then, “What sto-chastic processes will generate scale-free networks?”

Most current models generatingscale-free networks identify a mecha-nism for network growth and evolu-tion. Among the notable ones are thepreferential attachment model ofAlbert and Barabási (2002), in whicha newly arriving node connects to anode in the existing network withprobability proportional to the currentdegree of the node in the network; thefitness-based network growth modelof Caldarelli et al. (2002); the Chung-Lu model of power-law randomgraphs (2002); the model of the WorldWide Web by Menczer (2002); theinitial attractiveness model ofDorogovtsev et al. (2000); and others.Although these models producegraphs that have power-law degreedistributions, they either have beenbuilt for a specific type of network(for example, the model by Menczer)or are mathematical abstractions inwhich the stochastic network-growthprocess has little to do with the actual,often quite complicated, evolutionmechanism of the real-world network.The stochastic dynamics for theappearance and disappearance ofInternet routers, which has manyunknown factors, is another example.Most real-world networks are alsostrongly coupled to other networks orother large-scale complex systems,and thus, in order to identify the net-work evolution mechanism, one can-


Complex Networks

Figure 4. The Degree Distribution of the Binomial Random GraphThe red circles show the degree distribution, or Xk/N, the fraction of nodes with klinks vs k, for a single instance of the binomial random graph G(N,p) with the num-ber of nodes N = 2 ×× 104 and the probability for a link to exist between two nodesbeing p = 10–3. The continuous black line is a plot of the Poisson distribution P(k) informula (2) with λλ = pN = 20. Note the similarity between the two distributions.

not study these networks in isolation.To add to the complexity of the prob-lem, the evolution of the networkstructure can depend on the dynamicsor flow on the network. Most studiesof complex networks have been staticand structural as they try to identifytheir graph-theoretic properties. It hasbecome clear, however, that, to solveeven this problem, we must look atthe full dynamics of the complex net-work, that is, at the flow on these

structures and the coupling of theflow to the structural evolution.

The Problem of Epidemics.Before presenting some recent resultsthat take into account the couplingbetween structure and dynamics, Iwill briefly mention an interesting andimportant real-world problem that iscomplex in the sense mentionedabove. I am referring to epidemics, ordisease propagation in living popula-

tions, a topic heavily studied at LosAlamos in the past decade. The usual,classic approach to epidemics imposesa number of assumptions that makeanalytic and numerical treatment rela-tively straightforward; however, atleast in some cases, that approachmay cause a departure from reality.One such assumption is uniform mix-ing, whereby the individuals of a pop-ulation are assumed to come in con-tact with equal probability, independ-


Complex Networks

Figure 5. Degree Distribution of Various Scale-Free Networks(a) Shown here is the cumulative degree distribution for citation networks (after Redner 1998); (b) the sex web (after Liljeros etal. 2001); (c) the Internet at the router level (after Faloutsos et al. 1999); (d) the in-link and (e) out-link degree distributions for theWorld Wide Web (after Albert et al. 1999); and (f) the metabolic networks for three species (after Jeong et al. 2000). [Plot (a) is courtesy

of the European Physical Journal B. Plot (c) is courtesy of Computer Communication Review, ACM Publications 1999. Plots (b), (d), (e), (f), (g), and (h) are courtesy of Nature.]

http://www.nature.com/

ent of their locations. In order to relaxthis assumption, we observe that con-tact processes, such as disease trans-mission, are well localized in spaceand require that the two or more indi-viduals be no farther apart than sometypical distance characteristic of thedisease transmission process. In heav-ily populated urban areas, disease isusually transmitted within such loca-tions as buildings and mass transitareas (waiting areas and mass transitcars). Using census data and mobilitydiaries that specify the times ofentrance and exit to and from a loca-tion for all locations that a specificperson visited during the day, oneobtains a graph that has the desireddetailed resolution for contact patternsbetween people moving around in anurban area. This movement is largelyconstrained by the roadway networkand the traffic on it. Since the road-way network is itself a complex net-work, the disease transmission prob-lem is that of coupled complexdynamic networks (see Figure 6).

In this network, there are twotypes of nodes: people and locations,and an edge is drawn between a per-

son and a location if that person vis-ited that location during the day. Theedge has a weight associated with it,called “timestamp,” which is theunion of time intervals during whichthe person was at that location. Whatcan we learn, analyzing such a net-work, pertinent to disease outbreaksin a city? How can knowledge aboutthis network be exploited to designeffective vaccination and quarantinestrategies? Conclusions for the spe-cific case of Portland, Oregon, withapproximately 1.6 million people and181,000 locations can be found inEubank et al. (2004).

Scale-Free Networks:Coincidence or Universality?

This section presents a differentapproach to understanding the emer-gence of the scale-free property forreal-world networks. As mentionedpreviously, so far no one has found anobvious universal mechanism leadingto power-law degree distributions forreal-world networks. We actually sug-gest that, for a large class of networks

(to be specified below), there is nouniversal evolutionary mechanism.Instead, the network structure evolvesaccording to a selection principle thatpromotes the global efficiency oftransport and flow processing throughthese structures (Toroczkai andBassler 2004). In other words, regard-less of the specific evolutionarymechanism, that mechanism workswithin the constraints of the selectionprinciple. And the operation of theselection principle on evolution oftenresults in scale-free networks.

Most real-world networks (exceptthose that are defined by artificialassociations) serve as transport sub-strates for various entities such asinformation, energy, material, andforces. Some networks have evolvedspontaneously (without globaldesign), and it makes sense to enquirewhether their dynamics obey a selec-tion principle toward some kind ofoptimal or efficient behavior. Such aprinciple would be analogous to theone of natural selection that shapesboth the biological networks at thecellular level and the food web.

Looking at the Internet, we notethat, if a router receives too muchtraffic and causes constant congestionof the packets, engineers will fix theproblem locally by bringing up morerouters or modifying the routing algo-rithm. Similarly, in social networks, ifan acquaintance does not satisfy ourexpectations about some set of socialnorms, that link will “naturally” bedropped from our own social network.

To explore this trend toward effi-ciency more formally, we first need todefine a flow process on the network.Among the most ubiquitous flowprocesses in Nature are those generat-ed by local variations, or gradients, ofscalar quantities. Particle concentra-tion, temperature, electric or gravita-tional potential, and pressure are just afew examples. The gradient-inducedflow processes include granular flow,fluid flow, electric current, diffusion


Complex Networks

Figure 6. The Roadways of Portland, Oregon The roadway network of Portland forms the substrate for a coupled complex dynam-ic network to simulate movements and disease transmission in this highly populat-ed urban area. (This image is courtesy of the TRANSIMS Project at Los Alamos.)

processes, heat flow, and so on.Naturally, the same local-gradientmechanism will generate flows incomplex networks. Two less obviouslocal-gradient examples are diffusiveload balancing schemes used in dis-tributed computation (Rabani et al.1998), which are also employed inpacket routing on the Internet, and thereinforcement learning mechanism insocial networks with competitivedynamics (Anghel et al. 2004). In thefirst example, a computer (or a router)will ask its neighbors on the networkfor their current job load (or packetload), and the router will balance itsload with the neighbor that has theminimum number of jobs to run (orpackets to route). In this case, thescalar is the negative of the number ofjobs, or packets, at nodes, and theflow will be along the direction of thegradient of this scalar in the node’snetwork neighborhood. In the secondexample, a number of agents/playersin the social network play an interac-tive competition game with limitedinformation. At every step of thegame, each agent has to decide whoseadvice to follow before taking anaction (such as placing a bet), in itscircle of acquaintances (networkneighborhood). Typically, an agentwill try to follow the advice of thatneighbor who in the past proved to bethe most successful in predicting thegame. That neighbor is recognized byusing a scoring mechanism, which isthe simplest form of reinforcementlearning: Every agent has a successscore that changes in time, coupled tothe game’s evolution. An agent willfollow the advice of that agent whohas the highest score in its networkneighborhood at that moment (Anghelet al. 2004). In this case, the scalar isthe past success score of the agents,and an agent will act based on theinformation received along the linkthat is in the direction of the gradientof this scalar.

To construct a simple and general

model of a transport process, weassume that there are N nodes andthat the transport takes place on afixed substrate network S(V, E),where V is the node set and E is theedge set that describes the intercon-nections of the nodes. Associatedwith each node i, there is a scalar hithat describes the “potential” of thenode. Then a gradient network G canbe constructed as the collection ofdirected links that point from eachnode to the nearest neighbor on thesubstrate network S that has the high-est potential (see Figure 7). Thus,only one directed link points awayfrom each node in G, and G consistsof N directed links. Note that, if thepotential of a node is higher than thepotential of all its nearest neighbornodes, the gradient link of that nodeis a loop that points back to itself(“self-loop”). In general, the potentialfor each point can evolve in time, andas a result, the gradient network G

will be time dependent. If we further-more assume that all links have thesame conductance, or transport prop-erties, the gradient network willdescribe the instantaneous substruc-ture carrying the maximum flow.Consequently, we can hope to usegradient networks as a tool to analyzethe flow efficiency or susceptibility tojamming on the corresponding sub-strate networks.

Note that, if there are two or morenodes in the network neighborhoodof a node i that share the maximumvalue, the gradient in i is calleddegenerate. If each neighborhoodhas only one maximum, it is callednondegenerate, and is easier to ana-lyze. In the discussions below, wewill restrict ourselves to the nonde-generate condition, which is easilyrealized if, for example, the scalarsare continuous random variables.Since every node has exactly onegradient direction from it (even f itis a self-loop), G has exactly Nnodes and N edges (and there is atleast one self-loop, corresponding tomaxi{hi}). A simple but very impor-tant property of nondegenerate gra-dient networks is that they formforests, that is, each gradient net-work is a collection of tree graphscontaining no loops (except for self-loops). We can therefore hope toanalyze network flow processesusing the techniques of statisticalmechanics that have been welldeveloped for treelike structures.

Gradient Networks on Randomvs Scale-free Networks. Let us firstconsider a gradient network for a ran-dom graph substrate S. In particular,we choose for S the binomial randomgraph, G(N,p) consisting of N nodes,each pair of nodes being linked withprobability p. We next assume that thescalar potentials of the various nodesare independent random variablesidentically distributed according to adistribution η(h). The distribution of


Complex Networks

Figure 7. Definition of a GradientEdgeThe gradient edge is a directed linkfrom node i to that neighbor on the sub-strate graph that has the largest valueof the scalar in the neighborhood of i. Ifi has the largest value, then the gradientedge is a self-loop.

the number of links l pointing to eachnode, the so-called in-degree distribu-tion R(l) of the gradient network G,can be exactly calculated, and it yieldsthe following expression:

(3)

Thus, this in-degree distribution isindependent of the particular form ofthe distribution for the scalar poten-tials η(h). It is possible to show thatin the limit N → ∞ and p → 0 suchthat Np = λ = constant >> 1, theexpression in Equation (3) becomesthe power law R(l) ≈ 1/(λl), with afinite-size cutoff at lc = z; refer toFigure 8(a). Therefore, in this limit,gradient networks are scale-freegraphs (up to their cutoff)! Thispower-law degree distribution for thegradient network is a rather surprisingresult because, in the same limit, thesubstrate graph S is a binomial ran-dom graph having a Poisson degreedistribution with a well-defined aver-age degree λ (setting the scale of thesubstrate graph), as well as rapidlydecaying tails.1

If, instead, the substrate network Sis a scale-free graph, the gradientgraph will still have a power-lawdegree distribution. Figure 8(b) com-pares degree distributions P(k) for


Complex Networks

1A similar finding was reported byLakhina et al. (2003), who repeated onbinomial random graphs the trace-routemeasurements used to sample the struc-ture of the Internet. Lakhina and col-leagues found that the spanning treesobtained in this way have a degree distri-bution that obeys the 1/k law. Later,Clauset and Moore (2003) have presentedan analytical approach to derive the 1/klaw. This approach suggests the possibil-ity of mapping between graphs generatedby trace-route sampling and gradient net-works. Although it is not an exact map-ping, a close connection can indeed bemade by interpreting trace-route trees assuitably constructed gradient networks.

Figure 8. Gradient-Graph Degree Distributions for Random and Scale-Free Substrate Networks(a) The in-degree distribution is shown for the substrate binomial random graphG(N,p), where N = 1000, and p = 0.1 (z = 100). The numerical values are obtainedafter averaging over 104 sample runs. (b) The in-degree and degree P(k) distribu-tions are for the substrate Barabási-Albert scale-free graph with parameter m (m =1, 3). In this case N = 105, and the average is performed over 103 samples.

scale-free substrate networks, whichwe generated by the Barabási-Albertnetwork with parameter m (minimumdegree, see Albert and Barabási 2002)with the in-degree distributions for thecorresponding gradient networks. Oneimmediate conclusion is that the gra-dient network is the same type ofstructure as the substrate. In this case,it is a scale-free (power-law) graphwith the same exponent.

Flow Properties on Random vsScale-Free Networks. Using theproperties of gradient networks, wecan define a transport characteristicrelated to congestion or jamming inthe substrate network. In particular,we compare the average number ofnodes with in-links with the averagenumber of nodes with out-links. IfNl

(in) denotes the number of nodeswith l in-links, the total number ofnodes receiving gradient flow will be

The total number of gradient out-linksis simply Nsend = N because everynode has exactly one out-link.Naturally, the ratio Nreceive/Nsend willbe related to the instantaneous globalcongestion in the network. The small-er the number of nodes receiving theflow (given the same number ofsenders), the more congestion is in thesubstrate network at that instant. If theflow received by a node requires anonzero processing time (such asrouting of a packet by the router), asmall ratio of Nreceive/Nsend translatesinto large delay times and thus ineffi-cient flow processing. Let us definethe congestion (or jamming) factor asfollows:

(4)

where ⟨ ⟩n means averaging over thedisorder in the network structure and ⟨ ⟩h means averaging over the ran-domness in the scalar field. The valueof J is always between 0 and 1, with J= 1 corresponding to maximal conges-tion and J = 0 corresponding to nocongestion. Note that J is a congestionpressure characteristic generated bygradients rather than an actualthroughput characteristic. For a bino-mial random substrate networkG(N,p), we use Equations (3) and (4)to obtain the corresponding jammingfactor:

(5)

In the scaling limit N → ∞ and p =constant, the jamming factor assumesthe asymptotic behavior

That is, the random graph becomesmaximally congested. It is easy toshow that, in the other limit, when z = Np >> 1 is kept constant while N → ∞,

Once again, the random graph asymp-tomatically becomes maximally con-gested, or jammed.

For scale-free networks, however,the conclusion about jamming isentirely different. We find that thejamming coefficient J becomes inde-pendent of N, and it is always a

constant less than unity for large networks. In other words, scale-freenetworks are not prone to maximalcongestion. (This is true for allpower-law networks for which theaverage degree does not grow withN.) Figure 9 shows the congestionfactors as a function of network sizefor random and scale-free substratenetworks. Many real-world networksevolve more or less spontaneously(for example, the Internet or theWorld Wide Web), and they can reachsizes of about 108 nodes. At suchlarge N, the scaling limit studiedabove applies, and random networkshave maximal congestion. Thus, suchsubstrates are very inefficient forflow processing. Scale-free networks,on the other hand, have congestionthat stays bounded away from unityas the number of nodes grows verylarge, and they are therefore muchmore efficient substrates for transportand flow processing. Thus, it appearsthat the scale-free property of manyreal-world networks is not accidental.Topology may develop quite naturallyfrom a selection rule that tends tomaximize the global efficiency of theflow along the network.

Small-World Magic:Synchronized Computing

Networks

We have seen that many real-worldinteractions are mediated across com-plex network topologies and that thestructure and dynamics of those com-plex networks are becoming betterunderstood. It is therefore natural towonder whether network concepts canbe put to practical use. For example,can those concepts help us design systems that exhibit certain desiredproperties? In this last section of thearticle, I will show how complex net-work concepts were used to solve aproblem in distributed, or parallel,computation.


Complex Networks

We consider the class of systemsmade of a large number of interactingelements or individuals, each having afinite number of attributes, or localstate variables, that can assume acountable number (typically finite) ofvalues. The dynamics of the localstate variables are discrete eventsoccurring in continuous time, and theinteractions between individuals, orelements, have a finite range. Thereare many examples of such systems:magnetic systems, epidemics, somefinancial markets, wireless communi-cations, queuing systems, and so on.Virtually all agent-based systems canbe considered to belong to this classof discrete-event complex systems.Often, the dynamics of such systemsis inherently stochastic and asynchro-

nous. Simulating the systems is nontrivial, and in most cases, thecomplexity of the problem requiresthe use of distributed computer archi-tectures. These problems define thefield of parallel discrete-event simu-lations (PDES).

Conceptually, the computationaltask is divided among N processingelements (PEs), each of which evolvesthe dynamics of the allocated piece ofthe system. Because of the interac-tions among the individual elementsof the real system (spins, atoms, pack-ets, calls, and so on), the PEs mustcoordinate with a subset of other PEsduring the simulation.

At present, large parallel comput-ers for performing PDES have thou-sands of nodes and soon will have

tens of thousands: the Nippon ElectricCompany’s 5120-node Earth simula-tor producing 35.86 teraflops, the8192-node Q-machine at Los Alamoswith 13.88 teraflops, Virginia Tech’sX machine, which is a 2200-nodeapple G5 cluster with 10.28 teraflops,and so on. IBM is currently buildingthe Blue Gene/L parallel computerwith 360 teraflops and 65,000 nodes.Blue Gene/P, the next-generationcomputer, is expected to surpass thepetaflop barrier in 2006.

The design of efficient, scalableupdate schemes for performing PDESon these large parallel computers is arather challenging problem becausethe simulation scheme itself becomesa complex system whose propertiesare hard to deduce using classicalmethods of algorithm analysis.Korniss et al. (2003) introduced a lessconventional approach to analyzingthe efficiency and scalability of paral-lel discrete-event simulation schemes.The authors constructed an exactmapping between the parallel compu-tational process itself and a nonequi-librium surface growth model. As aresult, questions about efficiency andscalability can be mapped into certaintopological properties of this nonequi-librium surface. Then, using methodsfrom statistical mechanics, we cansolve the scalability problem of thecomputation PDES schemes. We nowbriefly sketch the scalability problemand its solution.

In order to simulate the dynamicsof the underlying system, the PDESscheme must track the physical-timevariable of the complex system. Incase of asynchronous dynamics ondistributed architectures, each PE gen-erates its own physical (also calledvirtual) time τ, which is the physicaltime variable of the particular compu-tational domain handled by that PE.Because of the varying complexity ofthe computation at different PEs, at agiven wall-clock instant, the simulat-ed, or virtual, times of the PEs can


Complex Networks

Figure 9. Congestion Factors for Random and Scale-Free SubstrateNetworks Congestion factors are shown as a function of size for random graphs and scale-free networks. For random binomial graph substrates, the jamming coefficient tendsto unity with increasing network size, indicating that these networks will becomeextremely congested in this limit. For scale-free substrates, however, the congestionfactor becomes independent on the network size, and thus arbitrarily large networkscan be considered without increasing their congestion level.

differ, a phenomenon called “timehorizon roughening.” Let us denotethe virtual time at PEi measured atwall-clock time t by τi(t). The set ofvirtual times {τi(t)}

Ni=1 forms the vir-

tual time horizon of the PDES schemeafter t parallel updates. In conserva-tive PDES schemes, a PE will per-form its next update only if it canobtain the correct information from itsneighbors to evolve the local configu-ration (local state) of the underlyingphysical system it simulates withoutviolating causality. Otherwise, it idles.Specifically, the PEi can only update(become “active”) at wall-clockinstant t if

(6)

That is, the PE’s virtual time is a localminimum among the virtual times ofits neighboring PEs (specified as theset <i>). Once the PE at site i canupdate, it will advance its local simu-

lated time to the new value τi(t + 1),and the process is repeated for allactive sites, generating the dynamicsof the virtual time horizon {τi(t)}

Ni=1.

The average of the time horizon aftert parallel steps is obviously

Thus, the rate of progress of the timehorizon average, or the average uti-lization of the PEs ⟨u(t)⟩ = ⟨τ

_(t + 1)⟩

– ⟨τ_

(t)⟩ is proportional to the numberof nonidling, or active, PEs. The aver-age ⟨·⟩ is taken over the stochasticevent dynamics. The PDES scheme iscomputationally scalable if there is aconstant c > 0, such that

(7)

That is, the average rate of progress ofthe time horizon does not vanish even

after very long times, as the simulatedsystem size and, therefore, the numberof PEs are taken to infinity.

We solved this computational scal-ability problem by drawing an analogywith the statistical mechanics of non-equilibrium surface-growth processes.Thin films are grown on solid sub-strates by deposition of atoms or mol-ecules from surrounding vapors.Because the vapors are fairly hot, theatoms reaching the solid surface fol-low a stochastic path until they areincorporated into the surface, typicallyin an irregular fashion. The resultingthin film has mounds and valleys thatcan be described by the fluctuations ofthe local height variable h(x,t) of thefilm measured from the surface of thesubstrate. Using an approach based onthe Langevin equation, physicists havedeveloped extended theoreticalmachinery to describe the statistics ofthe fluctuations of the variable h(x,t).The simulated time variable τi(t) inthe computational scalability problem


Complex Networks

Figure 10. A Fully Scalable Small-World PDES SchemeThe small-world PDES scheme is fully scalable. By introducing more shortcuts into the communication network (increasing p),the algorithm becomes measurement scalable (a), and it stays computationally scalable (b).

behaves much like the surface heightvariable h(x,t) in that τi(t) evolvesaccording to the stochastic updatedynamics of the PDES scheme withthe index i of the PE corresponding tox in the height variable. In many largecomplex systems, the dynamics of thestochastic events can be characterizedby a Poisson distributed stream. Thismeans that, when simulating such sys-tems, the updates at individual PEscorrespond to adding height incre-ments that follow a Poisson distribu-tion. Using statistical mechanicsmethods to analyze the resulting surface-growth model, one can showthat the fluctuations of the virtualtime horizon in the continuum limitcan be described by the so-calledKardar-Parisi-Zhang (KPZ) equationof surface growth:

(8)

where τ̂ is a coarse-grained form ofthe virtual-time variable and η is awhite noise term.2 We then use theKPZ equation to verify that the uti-lization of the PEs satisfies Equation(7). The existence of a constant c > 0,as in (7), is the result of the slope-slope correlations of the surface beingshort ranged and not scaling with N.Our numerical evaluation of this con-stant yields c = 0.2461 ± (7 × 10–6),which shows that the basic conserva-tive PDES scheme is indeed computa-tionally scalable.

There is, however, a fundamentalproblem with the basic PDES updatescheme. The KPZ equation for thetime horizon fluctuations predicts thatthe average spread of those fluctua-tions, w2(N,t), diverges with anincreasing number of processing

elements N in the long time limit (t → ∞). Therefore, if we try to meas-ure a global property of simulatedsystem at a given simulated time τand wait until all processors have sim-ulated their local state correspondingto that time, the waiting period inwall-clock time would diverge withthe number of processing elements! In other words, even though a parallelcomputer with infinitely many pro-cessing elements can simulate thedynamics of an infinitely large systemat nonzero speed (computational scal-ability), the basic PDES scheme couldnot produce a single measurement ofthe global state of the system! Thebasic conservative scheme is compu-tationally scalable but measurementnonscalable.

How can we surmount this prob-lem? Can the PDES scheme be modi-fied such that the new update schemeis also measurement scalable? Theanswer is affirmative, and the key tothe solution is the notion of the small-world property of complex networks(Korniss et al 2003).

In order to decorrelate the fluctua-tions in the time horizon, we modifythe update topology in the followingway: for every node i, we assign arandomly chosen communication link,r(i). According to its definition, theresulting communication topology (aregular lattice plus random links)forms a small-world network. When a node is allowed toupdate—its virtual time satisfies thecondition in (6)—it will make, withprobability p, an extra check for thecondition τi(t) ≤ τr(i)(t) and update ifthat condition is satisfied. With proba-bility 1 – p, it will make this extracheck and thus behave as the basicPDES scheme. Here p has the role ofa tuning parameter: For p = 0, wehave the basic PDES scheme, whereasp = 1 corresponds to the fully scalablesmall-world PDES scheme. Note thatthese extra checks do not affect thecorrectness of the simulation, and

causality is preserved in just the sameway. These checks only synchronizethe PEs. Using the same coarse-grain-ing methods as for the basic PDESscheme, we now find that the timehorizon fluctuations are described by

(9)

with γ(0) = 0 and γ(p) > 0 for0 < p ≤ 1. This equation differs fromEquation (8) in the strong dampingterm, –γ(p)τ̂ , which is ultimatelyresponsible for the nondivergence ofthe average spread, and thus the newupdate scheme is measurement scala-ble as shown in Figure 10.

Concluding Remarks

The list of problems, challenges, andapplications that I presented above israther biased toward my particularresearch interests, and it is not, by far,exhaustive of this area. My goal was togive the reader a feeling for the type ofcomplexity one encounters when deal-ing with networks. I also wanted toshow that this is a novel area with manyinteresting and potentially powerfulapplications awaiting discovery. n

Further Reading

Albert, R., and A.-L. Barabási. 2002. StatisticalMechanics of Complex Networks. Rev.Mod. Phys. 74: 47.

Albert, R., H. Jeong, and A.-L. Barabási. 1999.Diameter of the World-Wide Web. Nature401: 130.

Anghel, M., Z. Toroczkai, K. E. Bassler, and G.Korniss. 2004. Competition-DrivenNetwork Dynamics: Emergence of a Scale-Free Leadership Structure and CollectiveEfficiency. Phys. Rev. Lett. 92: 058701.


Complex Networks

2 The deterministic part of this equation is easi-ly related to the well-known Burgers equationthrough the slope variables φ̂ = ∂τ̂ /∂x.


Complex Networks

Bollobás, B. 1998. Modern Graph Theory:Graduate Texts in Mathematics. Vol. 184.Edited by F. W. Gehring, and S. Axler. NewYork: Springer-Verlag.

———. 2001. Random Graphs. SecondEdition. Cambridge; New York: CambridgeUniversity Press.

Caldarelli, G., A. Capocci, P. De Los Rios, andM. A. Muñoz. 2002. Scale-Free Networksfrom Varying Vertex Intrinsic Fitness. Phys.Rev. Lett. 89: 258702.

Chung, F., and L. Lu. 2002. ConnectedComponents in Random Graphs with GivenExpected Degree Sequences. Ann. Comb. 6:125.

Clauset, A. and C. Moore. 2004. TracerouteSampling Makes Random Graphs Appear to Have Power Law Degree Distributions.[Online]:http://arxiv.org/abscondmat/0312674

Dorogovtsev, S. N., J. F. F. Mendes, and A. N.Samukhin. 2000. Structure of GrowingNetworks with Preferential Linking. Phys.Rev. Lett. 85 (21): 4633.

Eubank, S., H. Guclu, V. S. A. Kumar, M. V.Maratho, A. Srinivasan, Z. Toroczkai, andN. Wang. 2004. Modelling DiseaseOutbreaks in Realistic Urban SocialNetworks. Nature 429: 180.

Faloutsos, M., P. Faloutsos, and C. Faloutsos.1999. On Power-Law Relationships of theInternet Topology. Comput. Commun. Rev.29 (4): 251.

Graham, R. L., B. L. Rothschild, and J. H.Spencer. 1990. Ramsey Theory. SecondEdition. New York: John Wiley and Sons, Inc.

Jeong, H., B. Tombor, R. Albert, Z. N. Oltvai,and A.-L. Barabási. 2000. The Large-ScaleOrganization of Metabolic Networks.Nature 407: 651.

Kilian, J., Y. Rabani, A. Sinclair, and R. Wanka.1998. Local Divergence of Markov Chainsand the Analysis of Iterative LoadBalancing Schemes. In Proceedings of the39th Annual Symposium on the Foundationsof Computer Science (FOCS). p. 694. LosAlamitos, CA: IEEE Computer Society.

Korniss, G., M. A. Novotny, H. Guclu, Z.Toroczkai, and P. A. Rikvold. 2003.Suppressing Roughness of Virtual Times inParallel Discrete-Event Simulations. Science299 (5607): 677.

Lakhina, A., J. W. Byers, M. Crovella, and P.Xie. 2003. Sampling Biases in IP TopologyMeasurements. In Proceedings of theAnnual Joint Conference of the IEEEComputer and Communications Societies(INFOCOM). p. 332. New York: IEEE.

Liljeros, F., C. R. Edling, L. A. N. Amaral, H.E. Stanley, and Y. Åberg. 2001. The Web ofHuman Sexual Contacts. Nature 411: 907.

Lloyd, S. 2000. Ultimate Physical Limits toComputation. Nature 406: 1047.

Menczer, F. 2002. Growing and Navigating theSmall World Web by Local Content. Proc.Natl. Acad. Sci. U.S.A. 99 (22): 14014.

Redner, S. 1998. How Popular Is Your Paper?An Empirical Study of the CitationDistribution. Eur. Phys. J. B, 4 (2): 132.

Solomonoff, R., and A. Rapoport. 1951.Connectivity of Random Nets. Bull. Math.Biophys. 13: 107.

Toroczkai, Z., and K. E. Bassler. 2004.Jamming is Limited in Scale-Free Systems.Nature 428: 716.

For further information, contact Zoltán Toroczkai (505) 667-3218([email protected]).

complex networks - los alamos national...

Documents