
Are biological systems poised at criticality?

Thierry Mora1∗ and William Bialek1,2

1 Joseph Henry Laboratories of Physics, Lewis–Sigler Institute for Integrative Genomics, and 2 Princeton Center for Theoretical Science, Princeton University, Princeton, New Jersey 08544 USA

(Dated: December 13, 2010)

Many of life's most fascinating phenomena emerge from interactions among many elements—many amino acids determine the structure of a single protein, many genes determine the fate of a cell, many neurons are involved in shaping our thoughts and memories. Physicists have long hoped that these collective behaviors could be described using the ideas and methods of statistical mechanics. In the past few years, new, larger scale experiments have made it possible to construct statistical mechanics models of biological systems directly from real data. We review the surprising successes of this "inverse" approach, using examples from families of proteins, networks of neurons, and flocks of birds. Remarkably, in all these cases the models that emerge from the data are poised at a very special point in their parameter space—a critical point. This suggests there may be some deeper theoretical principle behind the behavior of these diverse systems.

I. INTRODUCTION

One of the great triumphs of twentieth century science was the identification of the molecular building blocks of life. From the DNA molecules whose sequence and structure control the flow of genetic information, to the ion channels and receptors whose dynamics govern the flow of information in the brain, these building blocks are, to a remarkable extent, universal, shared among all forms of life on earth. Despite the importance of this reduction to elementary constituents, most of what we recognize as the phenomena of life are not properties of single molecules, but rather emerge from the interactions among many molecules. Almost by definition, what we find especially interesting about the behavior of multicellular organisms (like us) emerges from interactions among many cells, and the most striking behaviors of animal (and human) populations are similarly collective.

For decades, physicists have hoped that the emergent, collective phenomena of life could be captured using ideas from statistical mechanics. The stationary states of biological systems have a subtle structure, neither "frozen" into a well ordered crystal, nor chaotic and disordered like a gas. Further, these states are far from equilibrium, maintained by a constant flow of energy and material through the system. There is something special about the states corresponding to functional, living systems, but at the same time it cannot be that function depends on a fine tuning of parameters. Of the many ideas rooted in statistical physics that have been suggested to characterize these states, perhaps the most intriguing—and the most speculative—is the idea of self–organized criticality.

The theory of self–organized criticality has its origin in models for inanimate matter (sandpiles, earthquakes, etc.) [1], but the theory was then extended and adapted

∗ Present address: Laboratoire de Physique Statistique de l'École Normale Supérieure, UMR 8550 of CNRS associated with Universities Paris 6 and Paris 7, 24 rue Lhomond, 75231 Paris Cedex 05, France

to encompass biological systems through the analysis of simple toy models [2]. As an example, simple models for the evolution of interacting species can self–organize to a critical state in which periods of quiescence are interrupted by "avalanches" of all sizes [3], which reminds us of the idea of punctuated equilibria in evolution [4]. Similarly, it was suggested that the brain is in a self–organized critical state, at the boundary between being nearly dead and being fully epileptic [5]. It now seems unlikely that some of the initial ideas were correct (e.g., real sand behaves very differently from the models), but the possibility that biological systems poise themselves at or near a critical point remains tantalizing.

Despite the enthusiasm for using ideas from statistical physics to think about biological systems, the connections between the models and the experimentally measurable quantities have often been tenuous. Even in the case of neural networks, where statistical physics approaches are perhaps best developed [6–9], the relationship between the models and the dynamics of real neurons is somewhat loose. For the ideas of criticality, it might not be too harsh to suggest that much of what has been done is at the level of metaphor, rather than calculations which could be tested against real data.

In the past decade or so, there has been an important development in the experimental investigation of biological networks, and this suggests a very different route to the use of ideas from statistical physics. While it has long been conventional to monitor the activity or state of individual elements in a network, it is now possible to monitor many elements in parallel. The technologies are specific to each class of systems—large arrays of electrodes recording from many neurons in parallel [10, 11], high throughput sequencing to probe large ensembles of amino acid sequences [12], accurate imaging to track individual animals in large groups [13–17]—and each measurement of course has its own limitations. Nonetheless, the availability of these new experiments has led several groups to try constructing statistical physics models directly from the data. A remarkable feature of these analyses, scattered across many levels of organization, is the appearance of signatures of criticality. Whereas twenty–five years ago we had a grand theory with little connection to data, we now have many isolated discussions of particular experiments hinting at similar conclusions. Our goal here is to bring these analyses together, perhaps rekindling the hopes for a more general theory.

II. ZIPF’S LAW AND CRITICALITY

In the usual examples of critical phenomena, there are some natural macroscopic variables with a singular dependence on parameters that we can control experimentally. A familiar example is that we can identify the liquid/gas critical point by measuring the density of the fluid as a function of temperature and pressure. It is worth noting that, sometimes, doing experiments that couple to the correct macroscopic variables is difficult, as in the Bishop–Reppy experiments on superfluid helium films [18]. In many cases one can also identify criticality in purely thermodynamic measurements, as a singularity in the heat capacity as a function of temperature, or through the behavior of the correlation function of fluctuations in some local variable, such as the magnetization in a magnet.

The difficulty in biological systems is that they are not really equilibrium statistical mechanics problems, so there is no guarantee that we can find relevant macroscopic variables, and certainly it is not clear how to change the temperature. Even if an Ising spin glass is the correct description of a neural network, for example [19–21], it is not clear how to measure the analog of the magnetic susceptibility. Nonetheless it may be true that the probability of finding the system in a particular state is governed by a probability distribution that is mathematically equivalent to the Boltzmann distribution for a system poised at a critical point.

Let us denote by σ the state of a system. Typically, σ is a multi-dimensional variable σ = (σ_1, . . . , σ_N), where σ_i can be a spin, a letter in a word, the spiking activity of a neuron, an amino acid in a peptide chain, or the vector velocity of a bird in a flock. Let us then denote by P(σ) the probability of finding the system in the state σ. One can formally write this probability as a Boltzmann distribution:

P(σ) = \frac{1}{Z} e^{−E(σ)/k_B T},    (1)

where k_B is Boltzmann's constant and Z the partition function. Without loss of generality we can set the temperature to k_B T = 1 and Z to 1, which leads to the following definition for the energy:

E(σ) = − log P(σ).    (2)

With the availability of large datasets of biological systems, it now seems possible to construct P(σ) directly from the data, and to take the corresponding energy function E(σ) seriously as a statistical mechanics problem. In this section we explore the consequences of that idea, by showing the equivalence between Zipf's law of language and the critical properties of the associated statistical mechanics model.

In our modern understanding of critical phenomena in equilibrium systems, a central role is played by power law dependencies. Indeed, the exponents of these power laws—describing the dependence of correlations on distance, or the divergence of thermodynamic quantities as a function of temperature—are universal, and reflect fundamental features of the underlying field theory that describes the long wavelength behavior of the system. Self–organized critical systems also exhibit power laws, for example in the distribution of sizes of the avalanches that occur as a sandpile relaxes [1]. Power laws have also been observed empirically in a wide variety of non–equilibrium systems [22], although many of these claims do not survive a rigorous assessment [23]. It is also fair to note that, in contrast to the case of equilibrium critical phenomena, the observation of power laws in these more exotic cases has not led to anything like a general theory.

There is a very old observation of a power law in a biological system, and this is Zipf's law in language [24], first observed by Auerbach in 1913 [25]. In contrast to examples such as avalanches, where power laws describe the dynamics of the system, Zipf's law really refers to the distribution over states of the system, in the same way that the Boltzmann distribution describes the distribution over states of an equilibrium system. Specifically, in written language we can think of the state of the system as being a single word σ, and as texts or conversations proceed they sample many such states. If one orders (ranks) words σ by their decreasing frequency P(σ), Zipf's law states that the frequency of words P(σ) decays as the inverse of their rank r(σ):

P(σ) ∝ \frac{1}{r(σ)}.    (3)

This distribution cannot be normalized when the number of words is infinite. This can be corrected either by introducing a cutoff corresponding to a finite vocabulary, or by slightly modifying the law to P = r^{−α}/ζ(α), with α > 1, where ζ(α) is Riemann's zeta function. Since its introduction in the context of language, Zipf's law has been observed in all branches of science, but has also attracted a lot of criticism, essentially for the same reasons as other power laws, but also because of the controversial claim by Zipf himself that his law was characteristic of human language.

Despite all our concerns, Zipf's law is, in a certain precise sense, a signature of criticality [26]. To see this, consider the density of states, obtained by counting the number of states within a small energy bracket δE:

ρ_{δE}(E) = \frac{1}{δE} \sum_σ I[E < E(σ) < E + δE],    (4)


where I[x] is the indicator function. This density of states is the exponential of the entropy, and in the thermodynamic limit the energy and the entropy both should scale with the system's size N:

S(E) ≡ log ρ_{δE}(E) = N s(ε = E/N) + s_1,    (5)

where s_1 is sub–extensive, that is lim_{N→∞}(s_1/N) = 0. The bin size δE only affects the sub–extensive corrections as δE → 0, and can be ignored for very large systems. But for real data and finite N, the choice of the bin size δE can be problematic, and it is useful to consider instead the cumulative density of states:

\mathcal{N}(E) = \sum_σ I[E(σ) < E] = \int_{−∞}^{E} dE′ ρ_{δE=0}(E′).    (6)

For large systems, this integral is dominated by the maximum of the integrand, and the two definitions for the density of states become equivalent:

\mathcal{N}(E) = \int_{−∞}^{E} dE′ e^{N s(E′/N)}    (7)

= N \int_{−∞}^{E/N} dε′ \exp[N (s(ε′) + s_1/N)]    (8)

∼ e^{N s(ε)},    (9)

⇒ log \mathcal{N}(E) ∼ N s(E/N) = S(E).    (10)

But the rank r(σ) is exactly the cumulative density of states at the energy of σ:

r(σ) = \mathcal{N}[E = E(σ)],

that is, the number of states that are more frequent (or of lower energy) than σ, and so in general we expect that, for large systems,

S[E(σ)] ≈ log r(σ). (11)

Zipf's law tells us that probabilities are related to ranks,

P(σ) = \frac{1}{ζ(α)} r^{−α}(σ) ⇒ − log P(σ) = α log r(σ) + log ζ(α).    (12)

But now we can connect probabilities to energy, from Eq (2), and ranks to entropy, from Eq (11), to give

S(E) = \frac{E}{α} + · · · ,    (13)

where again · · · is sub–extensive. In words, Zipf's law for a very large system is equivalent to the statement that the entropy is an exactly linear function of the energy.
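This equivalence is easy to verify numerically. The following short Python sketch (our own illustration, not an analysis from the text) builds a finite Zipf distribution, treats −log P as the energy of Eq. (2) and the log-rank as the entropy of Eq. (11), and recovers the slope 1/α of Eq. (13); numpy and scipy are assumed to be available.

```python
# A minimal numerical check that Zipf's law implies a linear entropy-energy
# relation.  We take P(r) = r^(-alpha)/zeta(alpha) for ranks r = 1..R, define
# E = -log P as in Eq. (2) and S = log(rank) as in Eq. (11), and fit S against
# E; the slope should be 1/alpha, cf. Eq. (13).
import numpy as np
from scipy.special import zeta

alpha, R = 1.3, 100000                  # Zipf exponent and number of states (assumed values)
r = np.arange(1, R + 1)
P = r ** (-alpha) / zeta(alpha)         # normalized Zipf distribution
E = -np.log(P)                          # "energy" of each state, Eq. (2)
S = np.log(r)                           # log-rank, i.e. microcanonical entropy, Eq. (11)

slope = np.polyfit(E, S, 1)[0]
print(f"fitted dS/dE = {slope:.3f}, expected 1/alpha = {1/alpha:.3f}")
```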

A perfectly linear relation between entropy and energy is very unusual. To see why—and to make the connection to criticality—let's recall the relation of the (canonical) partition function to the energy/entropy relationship. As usual we have

Z(T) = \sum_σ e^{−E(σ)/k_B T},    (14)

where we have reintroduced a fictitious temperature T. The "operating temperature," i.e. the temperature of the original distribution, is k_B T = 1. Then we have

Z(T) = \int dE ρ(E) e^{−E/k_B T},    (15)

where ρ(E) is the density of states as before. But in the same large N approximations used above, we can write

Z(T) = \int dE ρ(E) e^{−E/k_B T} = \int dE e^{S(E)} e^{−E/k_B T}    (16)

∼ \int dε \exp[N (s(ε) − ε/k_B T)].    (17)

For large N, this integral is dominated by the largest term of the integrand, which is the point where ds/dε = 1/k_B T; this much is standard, and true for all systems. But in the special case of Zipf's law, we have ds/dε = 1/α, for all energies. What this really means is that k_B T = α is a (very!) critical point: for any k_B T < α, the system freezes into a ground state of zero energy and zero entropy, while for k_B T > α the system explores higher energies with ever higher probabilities, and all thermodynamic quantities diverge if Zipf's law holds exactly.

Clearly, not all critical systems are described by a density of states as restrictive as in Eq (13). Systems exhibiting a first order transition have at least one energy E for which S′′(E) > 0, and systems with a second order phase transition are characterized by the existence of an energy where S′′(E) = 0. The specific heat, whose divergence serves to detect second order phase transitions, can be related to the second derivative of the microcanonical entropy:

C(T) = \frac{N}{T^2} \left[ −\frac{d^2 S(E)}{dE^2} \right]^{−1}.    (18)

What is truly remarkable about Zipf's law, and its correlate Eq (13), is that S′′(E) = 0 at all energies, making Zipf's law a very strong signature of criticality. A tangible consequence of this peculiar density of states is that the entropy is sub–extensive below the critical point, S/N → 0. For real data, finite size effects will complicate this simple picture, but this argument suggests that critical behaviour can considerably reduce the space of explored states, as measured by the entropy. In later sections, we will see examples of biological data which obey Zipf's law with surprising accuracy, and this observation will turn out to have practical biological consequences.

III. MAXIMUM ENTROPY MODELS

Systems with many degrees of freedom have a dauntingly large number of states, which grows exponentially with the system's size, a phenomenon sometimes called the 'curse of dimensionality'. Because of that, getting a good estimate of P(σ) from data can be impractical. The principle of maximum entropy [27, 28] is a strategy for dealing with this problem by assuming a model that is as random as possible, but that agrees with some average observables of the data. As we will see, maximum entropy models naturally map onto known statistical physics models, which will ease the study of their critical properties.

In the maximum entropy approach, the real (but unknown) distribution P_r(σ) is approximated by a model distribution P_m(σ) that maximizes the entropy [29]:

S[P] = − \sum_σ P(σ) log P(σ),    (19)

and that satisfies

〈O_a(σ)〉_m = 〈O_a(σ)〉_r,    (20)

where O_1, O_2, . . . are observables of the system, and 〈·〉_r and 〈·〉_m are averages taken with P_r and P_m respectively. The key point is that often average observables 〈O_a〉_r can be estimated accurately from the data, even when the whole distribution P_r(σ) cannot. O_a is typically a moment of one or a few variables, but it can also be a global quantity of the system. Using the technique of Lagrange multipliers, one can write the explicit form of the model distribution:

P_m(σ) = \frac{1}{Z} e^{\sum_a β_a O_a(σ)}.    (21)

β_1, β_2, . . . are the Lagrange multipliers associated to the constraints (20) and constitute the fitting parameters of the model. When the maximum entropy model is constrained only by the mean value of the energy, O(σ) = −E(σ), we recover the Boltzmann distribution, P_m(σ) = Z^{−1} e^{−βE(σ)}, where β = 1/k_B T is the inverse temperature. More generally, the exponential form of the distribution (21) suggests defining the energy as E(σ) = − \sum_a β_a O_a(σ).

There exists a unique set of Lagrange multipliers that satisfies all the constraints, but finding them is a computationally difficult inverse problem. Inverse problems in statistical mechanics have a long history, which goes at least as far back as Keller and Zumino, who inferred microscopic interaction potentials from thermodynamic quantities [30]. The special case of binary variables constrained by pairwise correlations was formulated in 1985 by Ackley, Hinton, and Sejnowski in their discussion of "Boltzmann machines" as models for neural networks [31]. Solving the inverse problem is equivalent to minimizing the Kullback–Leibler divergence between the real and the model distribution (21), defined as:

D_{KL}(P_r‖P_m) = \sum_σ P_r(σ) log \frac{P_r(σ)}{P_m(σ)},    (22)

or equivalently, to maximizing the log-likelihood L that the experimental data (given by M independent draws σ^1, . . . , σ^M) was produced by the model:

L = log \prod_{a=1}^{M} P_m(σ^a) = M \sum_σ P_r(σ) log P_m(σ) = −M \{ S[P_r] + D_{KL}(P_r‖P_m) \},    (23)

where, by definition, P_r(σ) = (1/M) \sum_{a=1}^{M} δ_{σ,σ^a}. In fact, one has:

\frac{∂D_{KL}(P_r‖P_m)}{∂β_a} = 〈O_a〉_m − 〈O_a〉_r,    (24)

which ensures that the constraints (20) are satisfied at the minimum. This explicit expression of the derivatives suggests using a gradient descent algorithm, with the following update rules for the model parameters:

β_a ← β_a + η (〈O_a〉_r − 〈O_a〉_m),    (25)

where η is a small constant, the "learning rate." Note that in this framework, the inverse problem is in fact broken down into two tasks: estimating the mean observables 〈O_a〉_m within the model distribution for a given set of parameters β_a (direct problem); and implementing an update rule such as (25) that will converge to the right β_a's (inverse problem). The direct problem is computationally costly, as it requires summing over all possible states σ. Approximate methods have been proposed to circumvent this difficulty. Monte Carlo algorithms have been commonly used [20, 32, 33] and have been improved by techniques such as histogram sampling [34]. Approximate analytic methods, such as high temperature expansions [35, 36] or message-passing algorithms [37, 38], were also developed, and shown to be fast and accurate in the perturbative regime of weak correlations.
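As a concrete illustration of this two-step scheme (a minimal sketch of our own, not the code used in the studies reviewed below), the function below fits a pairwise model for binary variables σ_i = ±1: the direct problem is solved by brute-force enumeration of all 2^N states, which is feasible only for small N and would be replaced by Monte Carlo sampling in practice, and the parameters follow the update rule of Eq. (25). The function name and its arguments are illustrative.

```python
# Sketch of the inverse problem for a pairwise (Ising-like) maximum entropy
# model on N binary spins: the direct problem is solved by exact enumeration,
# the inverse problem by the gradient rule of Eq. (25).
import itertools
import numpy as np

def fit_pairwise_maxent(mean_si, mean_sisj, eta=0.1, steps=2000):
    """Fit fields h_i and couplings J_ij to target <s_i> and <s_i s_j>."""
    N = len(mean_si)
    states = np.array(list(itertools.product([-1, 1], repeat=N)))   # all 2^N states
    h = np.zeros(N)
    J = np.zeros((N, N))
    iu = np.triu_indices(N, k=1)
    for _ in range(steps):
        # direct problem: Boltzmann weights and model averages
        E = -states @ h - np.einsum('si,sj,ij->s', states, states, np.triu(J, 1))
        p = np.exp(-E); p /= p.sum()
        model_si = p @ states
        model_sisj = np.einsum('s,si,sj->ij', p, states, states)
        # inverse problem: gradient update, Eq. (25)
        h += eta * (mean_si - model_si)
        J[iu] += eta * (mean_sisj[iu] - model_sisj[iu])
    return h, J
```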

Note that even when a solution to the inverse problem can be found, one still needs to evaluate whether the maximum entropy distribution correctly describes the data, for example by testing its predictions on local and global observables that were not constrained by the model. In the following two sections we present examples in which maximum entropy models were successfully fitted to real biological data, and analyzed to reveal their critical properties. We then turn to other approaches that also point to the criticality of different biological systems.

IV. NETWORKS OF NEURONS

Throughout the nervous systems of almost all animals, neurons communicate with one another through discrete, stereotyped electrical pulses called action potentials or spikes [39]. Thus, if we look in a brief window of time ∆τ, the activity of a neuron (denoted by i) is binary: in this brief window, a neuron either spikes, in which case we assign it σ_i = 1, or it does not, and then σ_i = −1. In this notation the binary string or 'spike word' σ = (σ_1, . . . , σ_N) entirely describes the spiking activity of a network of N neurons, and the probability distribution P(σ) over all 2^N possible spiking states describes the correlation structure of the network, as well as defining the "vocabulary" that the network has at its disposal to use in representing sensations, thoughts, memories or actions.

For large networks, sampling all 2^N words is of course impractical. For many years, much attention was focused on the behavior of single neurons, and then on pairs. An important observation is that correlations between any two neurons typically are weak, so that the correlation coefficient between σ_i and σ_{j≠i} is on the order of 0.1 or less. It is tempting to conclude that, physicists' prejudices notwithstanding, neurons are approximately independent, and there are no interesting collective effects. As soon as it became possible to record simultaneously from many neurons, however, it became clear that this was wrong, and that, for example, larger groups of neurons spike simultaneously much more frequently than would be expected if spiking were independent in every cell [40]. It is not clear, however, how to interpret such data. It might be that there are specific sub–circuits in the network that link special groups of many cells, and it is these groups which dominate the patterns of simultaneous spiking. Alternatively, the network could be statistically homogeneous, and simultaneous spiking of many cells could emerge as a collective effect. An important hint is that while correlations are weak, they are widespread, so that any two neurons that plausibly are involved in the same task are equally likely to have a significant correlation.

To make this discussion concrete, it is useful to think about the vertebrate retina. The retina is an ideal place in which to test ideas about correlated activity, because it is possible to make long and stable recordings of many retinal ganglion cells—the output cells of the retina, whose axons bundle together to form the optic nerve—as they respond to visual stimuli. In particular, because the retina is approximately flat, one can record from the output layer of cells by placing a piece of the retina on an array of electrodes that have been patterned onto a glass slide, using conventional methods of microfabrication. Such experiments routinely allow measurements on ∼ 100 neurons, in some cases sampling densely from a small region of the retina, so that this represents a significant fraction of all the cells in the area covered by the electrode array [10, 11].

The average rate at which neuron i generates spikes is given by r_i = 〈(1 + σ_i)/2〉/∆τ, so that knowing the average rates is the same as knowing the local magnetizations 〈σ_i〉. The maximum entropy model consistent with these averages, but with no other constraints, is a model of independently firing cells, from Eq. (21):

P_1(σ) = \prod_i p_i(σ_i) = Z^{−1} \exp\left[ \sum_i h_i σ_i \right],    (26)

where h_i is the Lagrange multiplier associated to the average observable 〈σ_i〉. Although the independent model may correctly describe the activity of small groups of neurons, it is often inconsistent with some global properties of the network. For example, for the retina stimulated by natural movies [19], the distribution of the total number of spikes K = \sum_{i=1}^{N} (1 + σ_i)/2 is observed to be approximately exponential [P(K) ≈ e^{−K/\bar{K}}], while an independent model predicts Gaussian tails. This suggests that correlations strongly determine the global state of the network.
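The contrast between the two predictions is easy to reproduce. The sketch below (ours; the per-cell spike probabilities are made-up values, not data from Ref. [19]) computes the exact spike-count distribution of an independent model by convolution and compares it with an exponential of the same scale, purely to visualize how much faster the independent tails fall off.

```python
# Spike-count distribution P(K) of an independent model (a Poisson-binomial,
# built by convolving one Bernoulli factor per cell) compared with an
# exponential distribution of the same scale.
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.1, size=40)       # assumed per-cell spike probabilities per window

PK = np.array([1.0])                      # start with P(K=0) = 1 for zero cells
for pi in p:
    PK = np.convolve(PK, [1 - pi, pi])    # add one cell at a time

K = np.arange(len(PK))
Kmean = (K * PK).sum()
P_exp = np.exp(-K / Kmean)
P_exp /= P_exp.sum()                      # exponential comparison with the same scale

for k in (0, 5, 10, 15):
    print(f"K={k:2d}  independent={PK[k]:.2e}  exponential={P_exp[k]:.2e}")
```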

As the first step beyond an independent model, one can look for the maximum entropy distribution that is consistent not only with 〈σ_i〉, but also with pairwise correlation functions between neurons 〈σ_i σ_j〉. The distribution then takes a familiar form:

P_2(σ) = \frac{1}{Z} e^{−E(σ)}, \qquad E(σ) = − \sum_{i=1}^{N} h_i σ_i − \sum_{i<j} J_{ij} σ_i σ_j,    (27)

where J_{ij} is the Lagrange multiplier associated to 〈σ_i σ_j〉. Remarkably, this model is mathematically equivalent to a disordered Ising model, where h_i are external local fields, and J_{ij} exchange couplings. Ising models were first introduced by Hopfield in the context of neural networks to describe associative memory [6]. The maximum entropy approach allows for a direct connection to experiments, since all the parameters h_i and J_{ij} are determined from data.

Maximum entropy distributions consistent with pairwise correlations, as in Eq (27), were fitted for subnetworks of up to N = 15 neurons [19] by direct summation of the partition function coupled with gradient descent [Eq. (25)]. These models did a surprisingly good job of predicting the collective firing patterns across the population of all N neurons, as illustrated in Fig. 1. Importantly, the model of independent neurons makes errors of many orders of magnitude in predicting relative frequencies of the N−neuron patterns, despite the fact that pairwise correlations are weak, and these errors are largely corrected by the maximum entropy model. The accuracy of the model can be further evaluated by asking how much of the correlative structure is captured. The overall strength of correlations in the network is measured by the multi-information [41], defined as the total reduction in entropy relative to the independent model, I = S[P_1] − S[P_r]. The ratio:

\frac{I_2}{I} = \frac{S[P_1] − S[P_2]}{S[P_1] − S[P_r]}    (28)

thus gives the fraction of the correlations captured by the model. When N is small enough (≤ 10), S[P_r] can be evaluated by directly estimating P_r(σ) from data. In the salamander retina I_2/I ≈ 90%, indicating excellent performance of the model.
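For such small populations the ratio of Eq. (28) can be computed directly. The sketch below (ours) estimates P_r from the observed words, builds the independent model P_1 from the mean rates, and takes the fitted pairwise distribution P_2 (the probability of each of the 2^N states under the fitted model) as an input; the array `words` of observed binary words is an assumed input.

```python
# Fraction of correlations captured by the pairwise model, Eq. (28), evaluated
# by exact enumeration for a small population of binary neurons (+/-1 coding).
import itertools
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()                   # entropy in bits

def multi_information_ratio(words, P2):
    """words: (M, N) array of observed words; P2: fitted pairwise model over 2^N states."""
    M, N = words.shape
    # empirical distribution P_r over the 2^N possible words
    idx = (((words + 1) // 2) @ (2 ** np.arange(N))).astype(int)
    Pr = np.bincount(idx, minlength=2 ** N) / M
    # independent model P_1 built from the mean rates alone
    p_on = (words.mean(axis=0) + 1) / 2
    states = np.array(list(itertools.product([-1, 1], repeat=N)))
    P1 = np.prod(np.where(states == 1, p_on, 1 - p_on), axis=1)
    # orderings of Pr and P1/P2 differ, which does not matter for entropies
    return (entropy(P1) - entropy(P2)) / (entropy(P1) - entropy(Pr))
```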

FIG. 1. The Ising model greatly improves the prediction of retinal activity over the independent model [19]. A. Neuronal activity is summarized by a binary word σ = (σ_1, . . . , σ_N) obtained by binning spikes into 20 ms windows. B. The frequencies of all spike words σ of a subnetwork of N = 10 neurons are compared between the experiment (x axis) and the prediction (y axis) of the independent model (gray dots) and the maximum entropy model with pairwise interactions (black dots). The straight line represents identity.

The generality of the maximum entropy approach suggests that its validity should extend beyond the special case of the salamander retina, and much subsequent work has been devoted to testing it in other contexts. In an effort parallel to [19], the activity of the retina of macaque monkeys [42] was analyzed with maximum entropy methods. The behaviour of small populations (N = 3 to 7) of ON and OFF parasol cells was accurately explained by an Ising model, with 98 to 99% of the correlations captured. Mammalian retinal ganglion cells can be classified into well-defined types, and cells of a given type tile the visual space like a mosaic [43]; this stands in contrast to the salamander retina, where cells are not well typed and are grouped in large patches responding to the same area of the visual space. It was found that restricting interactions to adjacent pairs in the mosaic did not significantly alter the performance of the model, at least under a limited set of stimulus conditions, a result later confirmed for larger networks [32].

The maximum entropy framework was also extended to other (non retinal) areas of the brain. In cultured cortical neurons [19, 21] and cortical slices [21], Ising models performed as well as in the retina (88 to 95% of the correlations captured). Ising models also proved useful for studying neural activity in the visual cortex of cats [44] and macaque monkeys [45, 46]. In monkeys, the Ising model agreed well with data when neurons were far apart from each other (> 600 µm, tens of micro-columns), but failed at shorter separations (< 300 µm, a few micro-columns), where higher order correlations prevail [46]. This emphasizes the importance of testing the model predictions systematically on local as well as global observables, and, if necessary, adding constraints to the model.

Most of the work reviewed so far was restricted to small population sizes, partly because of the difficulty of recording from many neurons simultaneously, but also because of the computational problems mentioned in the previous section. In the salamander retina [19], extrapolations from small networks (N ≤ 15) have suggested that the constraints imposed by pairwise correlations considerably limit the space of possible patterns (measured by the entropy) as N grows, effectively confining it to a few highly correlated states when N ≈ 200—roughly the size of a patch of retinal ganglion cells with overlapping receptive fields. This led to the proposal that the network might be poised near a critical point.

To test that idea, an Ising model of the whole population of ganglion cells recorded in [19] (N = 40) was fitted using Monte Carlo methods and gradient descent [20, 47]. Although the large size of the population forbids computing global information-theoretic quantities such as I_2/I, the validity of the model can still be tested on local observables not fitted by the model. Specifically, the model was found to be a good predictor of the three-point correlation functions 〈σ_i σ_j σ_k〉 measured in the data, as well as of the distribution of the total number of spikes across the population.

Armed with an explicit model (27) for the whole network, one can explore its thermodynamics along the lines sketched in section II. The introduction of a fictitious temperature T [as in Eq. (14)] corresponds to a global rescaling of the fitting parameters, h_i → h_i/k_B T, J_{ij} → J_{ij}/k_B T. As seen in Fig. 2, the heat capacity versus temperature is found to be more and more sharply peaked around the operating temperature k_B T = 1 as one increases the network size N. One can also use these "thermodynamic" measurements to show that the observed networks of N ≤ 40 cells are very similar to networks that are generated by choosing mean spike probabilities and correlations at random from the observed distributions of these quantities. This raises the possibility that criticality could be diagnosed directly from the distribution of pairwise correlations, rather than their precise arrangement across cells. More concretely, it gives us a path to simulate what we expect to see from larger networks, assuming that the cells that have been recorded from in this experiment are typical of the larger population of cells in the neighborhood. The result for N = 120 is an even clearer demonstration that the system is operating near a critical point in its parameter space, as shown by the huge enhancement of the peak in specific heat, shown in the top curve of Fig. 2.

FIG. 2. Divergence of the heat capacity is a classical signature of criticality. This plot represents the heat capacity versus temperature for Ising models of retinal activity for increasing population sizes N [47]. The "N = 20, rand" and N = 120 curves were obtained by inferring Ising models for fictitious networks whose correlations were randomly drawn from real data. Error bars show the standard deviation when choosing different subsets of N neurons among the 40 available.
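The temperature scan behind Fig. 2 amounts to a simple computation once the fields and couplings are known. The sketch below (ours, not the analysis code of Ref. [47]) rescales the fitted parameters by 1/T, which is equivalent to reweighting the fixed energies by e^{−E/T}, and evaluates C(T) from the variance of the energy; exact enumeration restricts it to modest N.

```python
# Heat capacity C(T) of a fitted pairwise model, computed by brute-force
# enumeration of all 2^N states and the fluctuation formula C = Var[E]/T^2
# (with k_B = 1).  Monte Carlo sampling would replace the enumeration for
# larger networks.
import itertools
import numpy as np

def heat_capacity(h, J, temperatures):
    N = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=N)))
    E0 = -states @ h - np.einsum('si,sj,ij->s', states, states, np.triu(J, 1))
    C = []
    for T in temperatures:
        w = np.exp(-E0 / T)
        p = w / w.sum()
        meanE = p @ E0
        C.append((p @ E0**2 - meanE**2) / T**2)
    return np.array(C)

# e.g. Ts = np.logspace(-0.5, 1, 50); C = heat_capacity(h, J, Ts)
# A peak of C(T) near T = 1 that sharpens with N is the signature discussed above.
```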

This diverging heat capacity is further evidence that the system is near a critical point, but one might be worried that this is an artifact of the model or of the fitting procedure. As we have seen in section II, the critical properties of the distribution P(σ) can be also explored directly, without recourse to the maximum entropy approximation, by plotting the probability of firing patterns versus their rank. Figure 3, which shows such plots for increasing network sizes, reveals good agreement with Zipf's law, especially for larger N.

FIG. 3. The activity of populations of retinal ganglion cells obeys Zipf's law (from the data in Ref [19]). Shown is the probability of activity patterns (or 'words') against their rank for various population sizes. Error bars show the variability across different choices of subpopulations. Note that the agreement with Zipf's law, P(σ) ∝ 1/rank, is best for larger N.

Some of the inferred couplings J_{ij} were negative, indicating an effective mutual inhibition between two cells. We know from spin glass theory [48] that negative couplings can lead to frustration and the emergence of many locally stable, or metastable, states. Formally, a metastable state is defined as a state whose energy is lower than that of any of its adjacent states, where adjacency is defined by single spin flips. Said differently, metastable states are local "peaks" in the probability landscape. In the retina responding to natural movies, up to four metastable states were reported in the population (N = 40). These states appeared at precise times of the repeated movie [20], suggesting that they might code for specific stimulus features. The synthetic network of N = 120 cells displayed a much larger number of metastable states, and the distribution over the basins corresponding to these states also followed Zipf's law. At this point, however, the exact relation between the proliferation of metastable states and criticality is still not well understood.
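Finding the metastable states of the fitted model is a matter of checking single-spin flips, as in the definition above. The following sketch (ours) does this by brute force for the pairwise energy of Eq. (27); enumeration of all 2^N states again limits it to small networks.

```python
# Metastable states of a pairwise model: states whose energy increases under
# every single-spin flip, i.e. local peaks of the probability landscape.
import itertools
import numpy as np

def energy(s, h, J):
    return -s @ h - s @ np.triu(J, 1) @ s

def metastable_states(h, J):
    N = len(h)
    found = []
    for s in itertools.product([-1, 1], repeat=N):
        s = np.array(s)
        E = energy(s, h, J)
        # flip each spin in turn and require that the energy always goes up
        if all(energy(s * np.where(np.arange(N) == i, -1, 1), h, J) > E
               for i in range(N)):
            found.append(s)
    return found
```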

In summary, these analyses give strong support to the idea that neural networks might be poised near a critical state. However, it is still not clear whether the observed signatures of criticality will hold for larger N, especially when it is of the order of a correlated patch (∼ 200). The next generation of retinal experiments, which will record from ≈ 100−200 cells simultaneously, should be able to settle that question.

V. ENSEMBLES OF SEQUENCES

The structure and function of proteins are determined by their amino acid sequence, but we have made relatively little progress in understanding the nature of this mapping; indeed, to solve this problem completely would be equivalent to solving the protein folding problem [49–51]. An oblique way to tackle that question is to remark that a single function or structure often is realized by many different protein sequences. Can we use the statistics of these related proteins to understand how physical interactions constrain sequences through selection?

To make progress, one first needs to define protein families. Since only a fraction of known proteins have a resolved structure or identified function, defining these families must rely on simplifying assumptions. The standard method for constructing a family is to start from a few well identified proteins or protein domains with a common structure or function [52]. A hidden Markov model is then inferred from that small pool of sequences, and used to scan huge protein databases to search for new members. Clearly, this method only works if the model can set a sharp boundary between members and non–members, and an implicit hypothesis underlying the whole approach is that families are indeed well separated from each other.

Once a protein family has been defined, it is interesting to study its statistical properties. The data on a particular family consists of a multiple sequence alignment, so that for each member of the family we have a string σ = (σ_1, . . . , σ_N), where N is the number of amino acids in the protein and σ_i is one of the 20 possible amino acids at position i in the alignment, or alternatively an alignment gap '–'; cf. Fig. 4A. It is useful to think of the family as a probabilistic object, described by a distribution P(σ) from which sequences are drawn. As for networks of neurons, sampling P(σ) exhaustively is impossible, so one must have recourse to approximations.

Models of independent residues, P_1(σ) = \prod_{i=1}^{N} p_i(σ_i), have been widely used in the literature. Physically, however, residues do not simply contribute to the free energy additively [53], emphasizing the importance of correlations. Indeed, statistical analyses of protein families reveal strong correlations among the amino acid substitutions at different residue positions in short protein domains [54]. To illustrate this, we represent in Fig. 4B the mutual information between all pairs of positions in a multiple sequence alignment of the "WW domain" family of proteins. WW domains are 30 amino acid long protein regions present in many unrelated proteins. They fold as stable, triple stranded beta sheets and bind proline rich peptide motifs. The mutual information gives a measure of correlations between non–numerical variables—here, the residue identity at given positions—defined by

MI[p_{ij}] = \sum_{σ_i, σ_j} p_{ij}(σ_i, σ_j) \log_2 \left[ \frac{p_{ij}(σ_i, σ_j)}{p_i(σ_i) p_j(σ_j)} \right]    (29)

for a pair of positions i and j in the alignment, where

p_i(σ_i) = \sum_{\{σ_k\}_{k≠i}} P(σ),    (30)

p_{ij}(σ_i, σ_j) = \sum_{\{σ_k\}_{k≠i,j}} P(σ),    (31)

are the one and two point marginals of the distribution, respectively. Some pairs have as much as 1 bit of mutual information among them, which means that one residue can inform a binary decision about the other.
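The quantity in Eq. (29) is straightforward to estimate from an alignment. The sketch below (ours) computes the mutual information between two columns of a multiple sequence alignment given as equal-length strings; finite-sample bias corrections, which matter for real alignments, are ignored here.

```python
# Mutual information (in bits) between two columns of a multiple sequence
# alignment, estimated from the empirical single- and two-site frequencies.
import numpy as np
from collections import Counter

def mutual_information(alignment, i, j):
    """alignment: list of equal-length strings over the amino acid alphabet plus '-'."""
    M = len(alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)   # joint counts at (i, j)
    pi = Counter(seq[i] for seq in alignment)                # marginal counts at i
    pj = Counter(seq[j] for seq in alignment)                # marginal counts at j
    mi = 0.0
    for (a, b), n in pairs.items():
        p_ab = n / M
        mi += p_ab * np.log2(p_ab * M * M / (pi[a] * pj[b]))
    return mi
```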

FIG. 4. Network of correlations between residue positions in the protein family of WW domains. A. A protein sequence is a string σ of amino acids in the multiple sequence alignment of the family. Here is shown a small sample of co-aligned sequences. B. The mutual information between amino acid positions reveals a tightly connected network of correlations between residues all across the sequence.

How important are these correlations for specifying the fold and function of proteins? In a groundbreaking pair of papers [55, 56], Ranganathan and his collaborators showed that random libraries of sequences consistent with pairwise correlations of WW domains reproduced the functional properties of their native counterpart with high frequency. In contrast, sequences that were drawn from an independent distribution failed to fold. Technically, a random library consistent with pairwise correlations was constructed using a simulated annealing procedure. The algorithm started from the native library and randomly permuted residues within columns of the multiple sequence alignment, thereby leaving the one point functions p_i(σ_i) unchanged. The Metropolis rejection rate was designed to constrain the two point functions p_{ij}(σ_i, σ_j): a cost was defined to measure the total difference between the correlation functions of the native and artificial libraries:

C = \sum_{i,j,σ,σ′} \left| \log \frac{p^{native}_{ij}(σ, σ′)}{p^{artificial}_{ij}(σ, σ′)} \right| ,    (32)

and moves were accepted with probability e^{−∆C/T}, where the algorithm temperature T was exponentially cooled to zero until convergence.
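The logic of the procedure can be summarized in a few lines of Python. The sketch below (ours, not the original implementation of Refs. [55, 56]) shuffles residues within alignment columns, which preserves the single-site frequencies exactly, and accepts swaps with a Metropolis rule at a decreasing temperature; for simplicity the cost is the summed absolute difference of pair counts rather than the log-ratio of Eq. (32), and the recomputation of all pair counts at every step is wasteful but keeps the code short.

```python
# Simulated annealing of an artificial sequence library: within-column swaps
# preserve single-site frequencies, and a Metropolis rule keeps the pairwise
# statistics close to those of the native library.
import numpy as np

def pair_counts(msa):
    """Co-occurrence counts n_ij(a, b) for all column pairs of an integer-coded MSA."""
    q = msa.max() + 1
    onehot = np.eye(q)[msa]                        # shape (M, N, q)
    return np.einsum('mia,mjb->ijab', onehot, onehot)

def anneal_library(msa, steps=100000, T0=1.0, Tf=1e-3, rng=np.random.default_rng(0)):
    M, N = msa.shape
    lib = msa.copy()
    target = pair_counts(msa)
    cost = 0.0                                     # lib starts identical to the native library
    for t in range(steps):
        T = T0 * (Tf / T0) ** (t / steps)          # exponential cooling schedule
        col = rng.integers(N)
        r1, r2 = rng.integers(M, size=2)
        new = lib.copy()
        new[r1, col], new[r2, col] = new[r2, col], new[r1, col]    # within-column swap
        new_cost = np.abs(pair_counts(new) - target).sum()         # simplified proxy for Eq. (32)
        if new_cost < cost or rng.random() < np.exp(-(new_cost - cost) / T):
            lib, cost = new, new_cost
    return lib
```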


In spirit, this procedure seems similar to the maximum entropy principle: random changes make the library as random as possible, but with the constraint that the one and two point functions match those of the native library. That intuition was formalized in Ref [57], where the two approaches were shown to be mathematically equivalent. However, to this day no explicit model for the maximum entropy distribution of the WW domains has been constructed.

The results from [55, 56] generated a lot of interest, and since then several studies have tried to explore the collective properties of proteins using similar ideas. We now review three of these recent efforts [33, 38, 58]. All of these examples support the utility of maximum entropy methods in drawing meaningful conclusions about sequence families, while the last focuses our attention back on the question of criticality in these ensembles.

"Two component signaling" is a ubiquitous system for the detection and transduction of environmental cues in bacteria. It consists of a pair of cognate proteins, a sensor histidine kinase (SK) which detects cellular and environmental signals, and a response regulator (RR) to which signal is communicated by SK via the transfer of a phosphoryl group; the activated RR then triggers other biochemical processes in the cell, including in many cases the expression of other proteins. Many different versions of the two component system are present within and across species, with about 10 per genome on average. A natural question about this system is how the specificity of coupling between particular SK and RR proteins is determined, especially when the different family members have so much in common. To approach this problem, Weigt et al. studied a large collection of cognate SK/RR pairs, and built a maximum entropy model for the (joint) variations in sequence [38, 59]. The maximum entropy distribution consistent with two point correlation functions p_{ij}(σ_i, σ_j) takes the form of a disordered Potts model:

P(σ) = \frac{1}{Z} e^{\sum_i h_i(σ_i) + \sum_{ij} J_{ij}(σ_i, σ_j)},    (33)

with the gauge constraints \sum_σ h_i(σ) = 0 and \sum_{σ′} J_{ij}(σ, σ′) = \sum_σ J_{ij}(σ, σ′) = 0. The distribution was approximately fitted to the data using mean field techniques [37, 59].

A key point, familiar from statistical mechanics, is that a relatively sparse set of interactions J_{ij} can generate widespread correlations. It seems plausible that amino acids on the SK and RR proteins which govern the specificity of their contact actually have to interact in the sense of the Potts model, while other residues may become correlated even if they don't have this essential role in specificity. The maximum entropy method allows for the distinction of the two cases. A 'Direct Information' (DI) was defined as the mutual information between two residues when all other residues are ignored:

DI_{ij} = MI[p^{direct}_{ij}]    (34)

FIG. 5. The maximum entropy model distinguishes between correlations arising from direct pairwise interactions, and correlations arising from collective effects [38]. A. The mutual information (29) between pairs of amino acid positions is plotted versus the direct information (34), which measures the mutual information directly contributed by the pairwise interaction. Among highly correlated pairs, one distinguishes between strongly interacting pairs (red area) and pairs whose correlations result from collective effects (green area). B. Direct interactions dominate in the binding domain, while collectively induced correlations are mostly present in the phosphotransfer site.

where in

p^{direct}_{ij}(σ, σ′) = \frac{1}{z_{ij}} e^{J_{ij}(σ, σ′) + h^{(j)}_i(σ) + h^{(i)}_j(σ′)},    (35)

the 'fields' h^{(j)}_i and h^{(i)}_j are chosen such that \sum_σ p^{direct}_{ij}(σ, σ′) = p_j(σ′) and \sum_{σ′} p^{direct}_{ij}(σ, σ′) = p_i(σ).

ij (σ, σ′) = pi(σ).This direct information, which is zero only for Jij(·, ·) =0, can be viewed as an effective measure of the interactionstrength between two residues. Fig. 5 shows direct infor-mation versus mutual information for all pairs of residuepositions in the protein complex. Direct pairwise interac-tions (large DI, large MI, red) were found to dominate inthe binding domain. In contrast, collective effects arisingfrom many weak interactions (low DI, large MI, green)characterized the phosphotransfer domain. Quite natu-rally, strong interactions (large DI, or equivalently largeJij ’s) were hypothesized to correspond to direct contactbetween residues that play a key role in the determina-tion of specificity.


To validate the connection between specificity and the J_{ij}, the inferred interacting residue pairs were used to predict the structure of the transient complex formed by the two proteins upon binding. The prediction was shown to agree within crystal resolution accuracy with existing crystallographic data [60]. However efficient, this use of the method only focuses on the strongly interacting pairs involved in binding, leaving out collective (and possibly critical) behaviors present in the phosphotransfer domain, where strong correlations arise from weak but distributed interactions. It would be interesting to explore the collective properties of the network as a whole through a more systematic study of the model's thermodynamic properties.

In a parallel effort, Halabi et al. [58] showed that variability in protein families could be decomposed into a few collective modes of variation involving non–overlapping groups of residues, called 'sectors', which are functionally and historically independent. To find these sectors, an estimator of the correlation strength was defined as:

C_{ij} = D_i D_j \left| p_{ij}(σ^{cons}_i, σ^{cons}_j) − p_i(σ^{cons}_i) p_j(σ^{cons}_j) \right| ,    (36)

where σ^{cons} is the consensus sequence made of the most common residues at each position. The role of the weights

D_i = \log \frac{p_i(σ^{cons}_i) [1 − q(σ^{cons}_i)]}{[1 − p_i(σ^{cons}_i)] q(σ^{cons}_i)},    (37)

where q(σ) is the background probability of residues in all proteins, is to give more importance to highly conserved positions. The matrix C_{ij} was diagonalized, and the projection of each position i onto the second, third and fourth largest eigenmodes (the first mode being discarded because it was attributed to historical effects) was represented in a three dimensional space. In that space, which concentrates the main directions of evolutionary variation, residue positions can easily be clustered into a few groups, called sectors.
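The construction of the sectors can be sketched as follows (our schematic version, not the code of Ref. [58]): build the weighted correlation matrix of Eqs. (36)–(37) at the consensus residues, diagonalize it, and project the positions onto the second through fourth eigenmodes, where sectors appear as clusters. The integer-coded alignment `msa` and the background frequencies `q_bg` are assumed inputs, and fully conserved positions would need regularization to avoid divergent weights.

```python
# Sector analysis: weighted correlation matrix at the consensus residues,
# Eqs. (36)-(37), followed by projection onto its leading eigenmodes.
import numpy as np

def sector_projection(msa, q_bg):
    M, N = msa.shape
    cons = np.array([np.bincount(msa[:, i]).argmax() for i in range(N)])   # consensus residue
    x = (msa == cons).astype(float)             # 1 where a sequence carries the consensus residue
    p = x.mean(axis=0)                          # p_i(consensus); assumed strictly between 0 and 1
    D = np.log(p * (1 - q_bg[cons]) / ((1 - p) * q_bg[cons]))              # weights, Eq. (37)
    C = np.abs(x.T @ x / M - np.outer(p, p))    # |p_ij - p_i p_j| at the consensus residues
    C *= np.outer(D, D)                         # weighted correlation matrix, Eq. (36)
    w, v = np.linalg.eigh(C)
    order = np.argsort(w)[::-1]                 # eigenmodes sorted by decreasing eigenvalue
    return v[:, order[1:4]]                     # modes 2-4; clusters of rows define sectors
```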

This approach was applied to the S1A serine protease family, for which three sectors were found (Fig. 6A). Remarkably, two of these sectors are related to two distinct biochemical properties of the protein, namely its thermal stability and catalytic power, and experiments showed that mutations in each sector affected the two properties independently (Fig. 6B). The mutants used for these experiments were randomly generated by an Ising model identical to (27) for each sector. The model was fitted to the data after sequences σ were simplified to binary strings, with σ_i replaced by δ(σ_i, σ^{cons}_i). Although no systematic study of the many body properties of this model was carried out, the non–additive effect of mutations on the protein's properties was demonstrated by experiments on double mutants.

FIG. 6. Independent sectors in the S1A serine protease family [58]. A. Residue positions i are plotted in the eigenspace of the weighted correlation matrix C_{ij}. Three clusters of positions, called sectors, emerge (blue, red and green). B. Mutations in different sectors result in independent changes in the biochemical properties of the protein. Mutations in the red sector (red dots) affect the catalytic power (x axis), while mutations in the blue sector (blue dots) change the thermal stability Tm (y axis).

Finally, in recent work, the maximum entropy approach was used to study the diversity of an unambiguously defined family of proteins: the repertoire of B cell receptors in a single individual [33]. B cells are components of the immune system; each individual has many B cells, each of which expresses its own specific surface receptor (an antibody) whose task is to recognize antigens. Thus, the diversity of the repertoire of B cell receptors carries an important biological function, as it sets the range of pathogens against which the organism can defend itself. The mechanisms by which diversity is generated in the repertoire are complex and not entirely elucidated [61]. Recently, Weinstein et al. have sequenced almost exhaustively the repertoire of B cell receptors of single zebrafish [12], allowing for the first time for a detailed analysis of repertoire diversity.

A main source of the diversity is generated through a process called recombination, which pieces together different segments of the antibody sequence (called V, D and J segments), each of which is encoded in the genome in several versions. Additional diversity is generated at the VD and DJ junctions by random addition and removal of nucleotides during recombination. Finally, antibody sequences undergo random somatic hypermutations, mostly in and around the D segment, throughout the lifetime of the cell. Thus, most of the diversity is concentrated around the D segments, which also constitute one of the three main loops involved in the pathogen recognition process. The D region (defined as the D segment plus its flanking junctions) is therefore an excellent place to study repertoire diversity.

FIG. 7. A translation invariant maximum entropy model of non-aligned sequences correctly predicts amino acid frequencies at absolute positions [33]. Top: the sequence is made of three segments, V, D and J, of which only D and its flanking junctions are fitted by a translation invariant maximum entropy model. Left: for each position i from the left, the frequency table P^1_i for all 20 residues is represented by a histogram. Right: comparison of these frequencies between data and model prediction (after rescaling by the translation-invariant independent model P^1(σ_i)).

Compared to the previous cases, the definition of the family here is straightforward: all D region sequences of a single individual. However, and in contrast to other protein families, D sequences cannot be aligned, and their length varies considerably (from 0 to 8 amino acids). To circumvent this problem, a maximum entropy distribution consistent with translation invariant observables was defined. This leads to writing a model similar to Eq. (33), but where h_i = h and J_ij = J_{i−j} do not depend on the absolute position of the residues along the sequence. In addition, in order to account for the variable length, the length distribution itself was added to the list of fitted observables, resulting in a chemical potential µ[L(σ)] being added to the Potts energy, where L(σ) is the sequence length.
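To make the structure of this model concrete, here is a minimal sketch of its energy function, assuming fields h(a), couplings J_k(a, b) restricted to separations k = 1, 2 (nearest and next nearest neighbors, as in the text), and a length potential µ(L) that have already been inferred; the variable names and data layout are illustrative choices, not the authors' code.

```python
import numpy as np

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"          # 20-state Potts alphabet
AA = {a: i for i, a in enumerate(AMINO_ACIDS)}

def potts_energy(seq, h, J, mu, max_k=2):
    """Energy of a variable-length sequence in a translation invariant
    Potts model with a chemical potential on length:
    E = mu[L] - sum_i h[s_i] - sum_{k<=max_k} sum_i J[k][s_i, s_{i+k}],
    so that P(seq) is proportional to exp(-E)."""
    s = [AA[a] for a in seq]
    L = len(s)
    E = mu[L] - sum(h[a] for a in s)
    for k in range(1, max_k + 1):
        E -= sum(J[k][s[i], s[i + k]] for i in range(L - k))
    return E
```

The probability of a sequence is then exp(-E) divided by the partition function, which in practice is estimated by Monte Carlo.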

The model was fitted by gradient descent combined with Monte Carlo simulations. Pairwise correlations between nearest and second nearest neighbors alone explained 70 to 90% of the correlations, contributing to a large drop in entropy compared to the independent model, from 15 to 9 bits on average. Thus, correlations limited the size of the repertoire by a factor of ∼ 2^6 = 64. Despite being translation invariant, the model could also reproduce local observables through simple end effects,

such as the > 10× variation in amino acid frequencies at given absolute positions, as shown in Fig. 7.

FIG. 8. The repertoire of antibody D regions of zebrafish follows Zipf's law [33]. For a single fish, the probability of a small antibody segment involved in pathogen recognition is plotted versus its frequency rank, as in Fig. 3. The data (cyan) is compared with the prediction of a maximum entropy model consistent with nearest and next nearest neighbor correlations (red), and also with a model of independent residues (green). Inset: the same curve plotted for multiple individuals.
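The fitting procedure mentioned above can be written down schematically: for a maximum entropy model, the gradient of the log-likelihood with respect to the parameter conjugate to an observable is the difference between the data average and the model average of that observable. The sketch below is a generic illustration of this idea, not the authors' code; `model_average` stands in for a Monte Carlo estimate under the current parameters, and the single learning rate is an arbitrary choice.

```python
def fit_max_ent(params, data_avg, model_average, lr=0.1, n_iter=1000):
    """Gradient ascent on the likelihood of a maximum entropy model.
    params: dict mapping observable name -> conjugate parameter (field,
    coupling, or chemical potential).  data_avg: dict of empirical averages.
    model_average(params, name): Monte Carlo estimate of the same average
    under the current model.  Parameters are nudged until the two agree."""
    for _ in range(n_iter):
        for name in params:
            grad = data_avg[name] - model_average(params, name)
            params[name] += lr * grad
    return params
```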

One striking prediction of the model is that the repertoire follows Zipf's law, in close analogy to results obtained for the activity of neural networks. Since the exhaustive sampling of P(σ) is possible in this case, that prediction can be directly tested against the data, and was found to be in excellent agreement (Fig. 8). Importantly, pairwise correlations between residues are essential for explaining this behavior, as evidenced by the failure of the independent model to reproduce it. The law also seems to be universal, as it varies little from individual to individual, despite substantial differences in the details of their repertoires.
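Testing Zipf's law in this setting requires nothing more than counting: rank the observed sequences by frequency and check whether probability falls off roughly as the inverse of rank. A minimal sketch, assuming a hypothetical list of observed D region strings:

```python
from collections import Counter
import numpy as np

def rank_frequency(sequences):
    """Return ranks and empirical probabilities, most common first.
    Zipf's law corresponds to probability ~ 1/rank, i.e. a slope near -1
    in log-log coordinates."""
    counts = np.array(sorted(Counter(sequences).values(), reverse=True), dtype=float)
    probs = counts / counts.sum()
    ranks = np.arange(1, len(probs) + 1)
    return ranks, probs

# ranks, probs = rank_frequency(d_region_sequences)        # hypothetical data
# slope = np.polyfit(np.log(ranks), np.log(probs), 1)[0]   # ~ -1 under Zipf
```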

In addition, the model was used to look for metastable states, following an analysis similar to the one done for the retina in the previous section. About ten relevant metastable states were found for each individual. Not all these states could be mapped onto a genomic template, and it was hypothesized that these non-templated states might reflect the history of antigenic stimulation and thus "code" for an efficient defense against future infections. Furthermore, continuous mutation paths existed between almost all metastable states, showing that the repertoire efficiently covers gaps between metastable states, and emphasizing the surprising plasticity of the repertoire.

These results suggest that correlations in protein families build up to create strongly correlated, near-critical states. A practical consequence for protein diversity is that collective effects limit the space of functional proteins much more dramatically than previously thought. This should invite us to revisit previously studied families (WW domains, SK/RR pairs, serine proteases, but also PDZ, SH2, and SH3 domains) to investigate their thermodynamical properties, with the help of maximum entropy models, in search of critical signatures.

VI. FLOCKS OF BIRDS

Groups of animals such as schooling fish, swarming insects or flocking birds move with fascinating coordination [62]. Rather than being dictated by a leader or in response to a common stimulus, the collective patterns of flock dynamics tend to be self organized, and arise from local interactions between individuals, which propagate information through the whole group. Flocks, schools and swarms also are highly responsive and cohesive in the face of predatory threat. This balance between order and high susceptibility points to the idea of criticality. Recent field work and theoretical analysis pioneered by the STARFLAG team [13–17] (see also [63] for a review in relation to previous models) have framed this idea in precise mathematical terms, culminating in the first empirical evidence that flock behaviour may indeed be critical in the sense of statistical physics [64]. Before embarking on the description of these results, we first review the technical advances that have made these developments possible.

Three dimensional studies of flocks were pioneered by Cullen et al. [65]. Until recently, such experiments had focused on small populations of a few tens of individuals, which is insufficient to investigate the large scale properties of flocks. The accurate reconstruction of the three dimensional positions of large flocks is impeded by many technical challenges and has been a major bottleneck. In principle, one can infer the three dimensional coordinates of any object from two photographs taken simultaneously from different viewpoints. But in the presence of a large number of indistinguishable birds, individuals first need to be identified between photographs before that simple geometric argument can be used; this is the so-called matching problem. Use of three cameras can help, but in the presence of noise the matching problem is still highly challenging. In Ref. [15], new techniques were developed to aid the resolution of the matching problem. The main idea is to compare the patterns formed by the immediate neighborhood of each individual between different photographs. The best match is then chosen as the one maximizing the overlap between these patterns in the different photographs.
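The matching idea can be made concrete with a toy version: for a bird in one image, compare the pattern of displacements to its nearest neighbors with the corresponding pattern around each candidate bird in the other image, and keep the candidate whose pattern fits best. This is only a schematic sketch of the idea, not the STARFLAG algorithm; the particular mismatch score (each displacement matched to its closest partner) is an assumption made here.

```python
import numpy as np

def neighborhood_pattern(points, i, k=8):
    """Displacement vectors from bird i to its k nearest neighbors in one image."""
    d = points - points[i]
    order = np.argsort(np.linalg.norm(d, axis=1))[1:k + 1]   # skip the bird itself
    return d[order]

def best_match(i, image_a, image_b, candidates, k=8):
    """Among candidate indices in image_b, pick the bird whose local
    neighborhood pattern most resembles that of bird i in image_a."""
    pat_a = neighborhood_pattern(image_a, i, k)
    def mismatch(j):
        pat_b = neighborhood_pattern(image_b, j, k)
        dists = np.linalg.norm(pat_a[:, None] - pat_b[None, :], axis=-1)
        return dists.min(axis=1).sum()   # each vector paired with its closest partner
    return min(candidates, key=mismatch)
```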

With the help of this technique, triplets of carefully calibrated, high resolution photographs of flocks of starlings taken from three different viewpoints were processed and analysed to yield accurate positions and velocities for all the individuals of flocks comprising up to 2700 birds; see Fig. 9 for an example. Preliminary analysis focused on the overall size, shape, density, homogeneity and flying direction of entire flocks [13, 14]. A subsequent study [17] demonstrated that birds interact with their neighbors according to their topological distance (measured in units of average bird separation), rather than to their metric distance (measured in units of length). The reasoning leading to that conclusion is quite indirect and is worth explaining in some detail. The distribution of neighbors around an average bird is not uniform: birds tend to have closer neighbors on their sides than behind or in front of them. There are biological reasons for this. Birds have lateral vision, and can monitor their lateral neighbors with better accuracy. In addition, keeping a larger distance from frontal neighbors may be a good strategy for avoiding collisions. The main assumption of [17] is that this heterogeneity is a result of interactions between individuals, and can be used to estimate the range of these interactions, defined as the distance at which the neighborhood of an average bird becomes uniform. Plotting this range for various flock densities, both in topological and in metric units (Fig. 10), clearly showed that birds interact with a fixed number (∼ 7) of neighbors rather than with birds within a fixed radius, as was previously thought.

FIG. 9. Two dimensional projection of a typical 3D reconstruction of the positions and velocities of every bird in a flock of 1,246 starlings [64]. Left: the absolute velocities ~v_i show a high degree of order in bird orientation. Right: the velocity fluctuations, ~u_i = ~v_i − (1/N) Σ_{j=1}^N ~v_j, are long ranged, and form only two coherent domains of opposite directions.

FIG. 10. Topological versus metric: flocking birds interact with a finite and fixed number of neighbors [17]. The interaction range is plotted in terms of the number of interacting neighbors n_c (left) and in terms of the metric distance r_c (right), as a function of the sparseness r_1, defined as the average separation between neighbors. The topological range n_c ∼ 7 is invariant, while the metric range r_c scales with the linear sparseness.

How does global order emerge across the whole flock from local interactions? Clearly, if each bird perfectly mimics its neighbors, then a preferred orientation will propagate without errors through the flock, which will align along that direction. In reality, alignment with neighbors is not perfect, and noise could impede the emergence of global order. This situation is similar to that encountered in physics, where increasing the temperature destroys the ordered state (melting). Consider for example a uniform, fully connected Ising model—the simplest model of ferromagnetism—defined by Eq. (27) with J_ij = J/N and h_i = h. At equilibrium, its mean magnetization m = (1/N) Σ_i ⟨σ_i⟩ satisfies m = tanh(Jm + h) [66]. Under a small field h = 0^+, the system is completely disordered (m = 0) when the control parameter J (inverse temperature) is smaller than 1, but becomes ordered (m > 0) for J > 1. Interestingly, a similar phase transition occurs in simple models of flock dynamics [67], where the external control parameter can be the noise, the flock density, or the size of the alignment zone. This phase transition, and the concomitant spontaneous symmetry breaking, were analyzed analytically in a continuum dynamical model which reduces exactly to the XY model in the limit of vanishing velocities [68, 69].
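The onset of order in this mean field model is easy to see numerically: iterate the self-consistency equation m = tanh(Jm + h) to its fixed point and watch a nonzero magnetization appear as J crosses 1. A minimal sketch (the iteration count and the tiny symmetry-breaking field are arbitrary choices):

```python
import numpy as np

def magnetization(J, h=1e-6, m0=0.5, n_iter=10_000):
    """Fixed point of the mean field self-consistency equation m = tanh(J m + h)."""
    m = m0
    for _ in range(n_iter):
        m = np.tanh(J * m + h)
    return m

# Below J = 1 the magnetization is essentially zero; above J = 1 it is finite
# even for a vanishingly small field, signaling the ordered (flocking) phase.
# for J in (0.5, 0.9, 1.1, 1.5):
#     print(J, round(magnetization(J), 4))
```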

Order is not exclusive to self organized systems, and can instead result from an external forcing (in language appropriate to flocks, from a leader or a shared environmental stimulus). In the Ising model, this corresponds for example to J = 0 and h ≫ 1. To better discriminate between self-organization and global forcing, one can examine the response function of the system, or equivalently (by virtue of the fluctuation-dissipation theorem) the correlation functions of small local fluctuations around the ordered state. In the context of flocks, a large response function means that the flock is not only ordered, but also responds collectively to external perturbations. It is tempting to suggest that this property is desirable from an evolutionary point of view, as it implies a stronger responsiveness of the group to predatory attacks. We will see that this is indeed how flocks of birds behave. Note that in physical systems, high susceptibility is only achieved near a critical point. In the disordered phase, variables are essentially independent of each other, while in the ordered phase, variables are aligned but their fluctuations become independent as the temperature is lowered. What is the situation for bird flocks?

FIG. 11. Velocity fluctuations are scale free [64]. A. The correlation length ξ scales linearly with the system's size L, indicating that no scale other than L is present in the system. B. Correlation function C versus rescaled distance r/ξ, where ξ is defined as the radius for which C = 0. The slope at r = ξ (inset) seems to depend only weakly upon ξ. This suggests that coherence can in principle be preserved over extremely long ranges.

To explore these ideas empirically, Cavagna et al. [64] analyzed the velocity correlations of large flocks, using the same dataset as in previous studies. At this point it should be stressed that here, at variance with the previous cases of neurons and proteins, learning the probability distribution of the system's state is impractical, because only one example of the flock's state is available to us. On the other hand, translation invariance (if one excludes the edges of the flock) and homogeneity in the birds' behavior can be invoked to make statistical statements across the population. Let us call ~v_i the 3D velocity vector of a bird i = 1, . . . , N. The amount of order in the flock is typically measured by the polarization ‖(1/N) Σ_i ~v_i/‖~v_i‖‖, whose value here is very close to 1 (0.96 ± 0.03), in agreement with previous studies. But as discussed earlier, more interesting are the fluctuations around the global orientation, defined by the velocities in the reference frame of the center of mass: ~u_i = ~v_i − (1/N) Σ_{j=1}^N ~v_j. Correlations in these fluctuations are captured by the distance dependent correlation function:

C(r) = (1/c_0) [Σ_{i,j} ~u_i · ~u_j δ(r − r_ij)] / [Σ_{i,j} δ(r − r_ij)],   (38)

where r_ij is the distance between birds i and j, δ(·) is a (smoothed) Dirac delta function, and c_0 is chosen such


that C(r = 0) = 1. The correlation function C(r) is plotted in Fig. 11A for different flock sizes as a function of the rescaled distance r/ξ, where ξ is a characteristic length defined by C(ξ) = 0. All points seem to fall onto a single curve.
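Estimating Eq. (38) from a single snapshot is just a binned average over pairs of birds; the sketch below is a minimal illustration, with the bin width standing in for the smoothed delta function (an arbitrary choice here).

```python
import numpy as np

def velocity_correlation(positions, velocities, bin_width=1.0):
    """Binned estimate of C(r) in Eq. (38): average of u_i . u_j over pairs
    of birds at mutual distance r, where u_i are the velocity fluctuations
    about the flock average, normalized so that C(0) = 1."""
    u = velocities - velocities.mean(axis=0)               # fluctuations
    r = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    dots = u @ u.T                                          # u_i . u_j for all pairs
    bins = (r / bin_width).astype(int)
    nbins = bins.max() + 1
    num = np.bincount(bins.ravel(), weights=dots.ravel(), minlength=nbins)
    den = np.bincount(bins.ravel(), minlength=nbins)
    C = num / den                                           # empty bins give NaN
    return np.arange(nbins) * bin_width, C / C[0]           # normalize: C(0) = 1

# The correlation length xi is then read off as the first distance where C crosses zero.
```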

The results of Fig. 11A are consistent with what we know from scaling theory in physics [66]. Near a critical point, correlation functions are given by a universal function,

C(r) = (1/r^γ) f(r/ξ),   (39)

where ξ is the correlation length, which diverges as the critical point is approached. Strikingly, in bird flocks the correlation length ξ is found to scale with the linear size of the flock L (Fig. 11B). This indicates that the correlation function is in fact scale free, in the sense that no scale is present except for the system size. Replacing ξ = αL into (39) and taking L → ∞ yields a power law decay for the correlation function, C(r) ∝ r^{−γ}, characteristic of a critical point. The exponent γ can in principle be evaluated from data through the derivative of C at r = ξ: ξ ∂C/∂r|_{r=ξ} ∝ −ξ^{−γ}. However, as evident from the inset of Fig. 11A, γ is almost indistinguishable from zero. This implies that the correlation function is not only scale free, but also decays very slowly, implying extremely strong and long ranged coherence across the flock.

The same analysis was carried out on the correlations of the modulus of the velocity, rather than its orientation, yielding essentially the same results. A physical system with a spontaneously broken symmetry, such as its overall orientation, can display scale free ("massless") behavior of the quantity associated with that symmetry, even when no critical point is present (Goldstone modes). However, the modulus of velocity is a much stiffer mode than velocity orientation, and corresponds to no obvious symmetry. The fact that it also exhibits scale free behavior thus is stronger evidence that the system indeed is close to a critical point.

One must be cautious when extrapolating from finite system sizes, and conclusions drawn from these extrapolations must be examined with increased scrutiny. Nonetheless, the wealth of evidence in favor of criticality makes it a very useful and pertinent concept for understanding complex flock dynamics. We expect that continued improvements in experimental technique and data analysis methods will test the hypothesis of criticality much more sharply.

VII. DYNAMICAL VS. STATISTICAL CRITICALITY

So far, we have assumed that states of a biological system were drawn from a stationary probability distribution P(σ), and we have explored questions of criticality in the associated statistical mechanics model. Criticality, however, can also be meant as a dynamical concept. For example, in the models of self-organized criticality mentioned in the introduction, avalanches are by nature a dynamical phenomenon [1]. We now discuss two lines of work in this direction: the observation of critical avalanches of activity in networks of cultured neurons, and dynamical criticality close to a Hopf bifurcation in the auditory system.

We start with avalanches in neural networks [70–72]. Consider a control parameter for neuronal excitability, which sets how much a spike in one neuron excites its neighbors. If this parameter is too low, a spike in one neuron may propagate to its direct neighbors, but the associated wave of activity will quickly go extinct. Conversely, if the excitability parameter is too high, the wave will explode through the whole population and cause something reminiscent of an epileptic seizure. To function efficiently, a neural population must therefore poise itself near the critical point between these two regimes. The analogy with sandpiles and earthquakes is straightforward: when a grain falls, it dissipates some of its mechanical energy to its neighbors, which may fall in response, provoking an avalanche of events [1]. A similar argument applies to earthquakes and the propagation of slips [73].

The most striking feature of self-organized criticality is the distribution of avalanche sizes, which typically follows a power law. Beggs and Plenz [74] were the first to report such power laws in the context of neural networks. In their experiment, a 60-channel multielectrode array was used to measure local field potentials (a coarse grained measure of neural activity) in cortical cultures and acute slices. Activity occurred in avalanches—bursts of activity lasting tens of milliseconds, separated by seconds-long silent episodes—that propagated across the array (Fig. 12A). For each event, the total number of electrodes involved was counted as a measure of avalanche size. The distribution of this size s followed a power law with an exponent close to −3/2 (Fig. 12B). Although that exponent was first speculated to be universal, it was later shown that it depends on the details of the measurement method [75].

The critical properties of neural avalanches can be explained by a simple branching process [76]. Assume that when a neuron fires at time t, each of its neighbors has a certain probability of firing at time t + 1, such that the average number of neighbors firing at t + 1 is given by the branching parameter β. That parameter is exactly what we called "excitability" earlier; β < 1 leads to an exponential decay of the avalanche, β > 1 to its exponential and unlimited growth, and β = 1 defines the critical point. To support this simple theory, the parameter β was estimated directly from the data, and was found to be 1 within error bars.
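The role of β = 1 as the boundary between dying and runaway activity is easy to verify by direct simulation of the branching process; the sketch below is a minimal illustration (the choice of two potential descendants per unit and the cap on avalanche size are arbitrary).

```python
import numpy as np

def avalanche_size(beta, n_children=2, max_size=10**6, rng=None):
    """One avalanche of a branching process: every active unit activates
    each of its n_children potential descendants independently with
    probability beta / n_children, so beta is the mean number of offspring."""
    rng = np.random.default_rng() if rng is None else rng
    p = beta / n_children
    active, size = 1, 1
    while active and size < max_size:
        offspring = rng.binomial(active * n_children, p)   # next generation
        size += offspring
        active = offspring
    return size

# beta < 1: avalanches die quickly; beta > 1: they tend to run away;
# beta = 1: the size distribution develops a power law tail ~ s**(-3/2).
# sizes = [avalanche_size(1.0) for _ in range(10_000)]
```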

Can we connect this notion of criticality in a neural network to the ideas discussed in Section II? Consider, for example, the simple mean field branching process on an infinite tree analyzed in [77] and summarized by Fig. 13. When p = 1/2 (β = 1), one can show, by recursively calculating the generating function of the avalanche size s, that the distribution of avalanche sizes becomes

P(s ≫ 1) = √(2/π) s^{−3/2}.   (40)

FIG. 12. The distribution of avalanche sizes follows a power law. A. Sample avalanche propagating on the 8 × 8 multielectrode array. B. Probability distribution of avalanche sizes (measured in number of electrodes) in log-log space. The distribution follows a power law with a cutoff set by the size of the array.

FIG. 13. A simple branching process on a tree [77]. Starting from the root, activity propagates to its two descendants with probability p = 1/2, or to none with probability 1 − p. The process repeats itself for all active descendants. In this example black nodes are active, while white nodes are inactive. The size of the avalanche is s = 7.

Although the resemblance of the exponent 3/2 to that found in [74] is coincidental, this simple process nonetheless predicts a power law in the avalanche size. Similar models defined on lattices or on completely connected graphs were proposed to explore the functional properties of neural avalanches [74, 78, 79]. When p = 1/2, the probability of any particular avalanche event σ is easy to estimate, and is 2^{−s}, where s is the size of the avalanche; note that there are many states σ that correspond to the same size s. Using our definition of the "energy" from Eq. (2), we have E(σ) = s log(2). By virtue of Eq. (40), however, in this dynamically critical state the probability that a random configuration has energy E decays less rapidly than an exponential, and this must result from a near perfect balance between energy and entropy:

P(E) = (1/Z) e^{S(E)−E} = [√(2/π)/(log 2)^{3/2}] E^{−3/2},   (41)

which implies

S(E) = E − (3/2) log(E) + . . . ,   (42)

and this is (for large E) Zipf's law once again. Note that this result is driven solely by the fact that the distribution of avalanche sizes has a long tail, and not by any specific power law behaviour. To summarize, in the space of avalanche configurations we have the same signature of criticality that we have seen in the retina (Figs. 2 and 3), although in different tissues, with different measurement methods, and assuming different models of activity. This emphasizes the potential generality of Zipf's law and criticality for brain function.

The space of possible avalanches is huge, and one might wonder whether avalanches can serve as a basis for a neural code. In a simple branching process, each avalanche of a given length occurs completely at random and is as probable as any other. But in a real network, with disordered excitabilities and branching parameters, some types of avalanches may be more likely than others, forming attractors in avalanche space. Such attractors were detected in the experimental data by clustering all observed avalanche patterns [80]. Remarkably, simulations of disordered branching processes show that a large number of attractors is only possible when the system is close to the critical point (average branching parameter 1) [79]. These results are reminiscent of those found in the retina, with the difference that attractors are now defined in a dynamical space rather than as metastable states in the space of configurations. As in the retina, the exact function of these attractors for coding is still elusive.

We now turn to another example where dynamical criticality plays an important role, although in a different way, in the context of the auditory system. Our ear is remarkably sensitive to weak sounds, responding to motions of the same magnitude as thermal noise. As early as 1948, Gold [81] proposed that this sensitivity is achieved by compensating damping through an active process. Several observations support this hypothesis. Hair cells, which convert mechanical movement into electrical current, respond to sounds in a highly nonlinear manner, strongly amplifying low amplitude stimuli at some frequency that is characteristic of each cell. In addition, hair cells display small spontaneous oscillations even in the absence of a stimulus. Most dramatically, the ear can actually emit sounds, spontaneously, presumably as the result of damping being (pathologically) over-compensated at some points in the inner ear [82, 83].


A series of recent works, both theoretical and experimental, has shown that the mechanosensing system of hair cells is tuned close to a Hopf bifurcation, where the system is highly sensitive to stimulation (see [84] for a recent review). Before going into the specifics of hair cell biophysics, let us first explain the basic idea.

A Hopf oscillator is described by two essential dynamical variables, often collected into a single complex number z. Hopf oscillators form a universality class of dynamical systems, and in response to small forces near the bifurcation point, the dynamical equations can always be written as

dz/dt = (µ + iω_0) z − |z|^2 z + F e^{iωt}.   (43)

In the absence of forcing, F = 0, self-sustained oscillations appear for µ > 0: z = r e^{iω_0 t}, with r = √µ. When a stimulus is applied at the resonant frequency ω_0, the system simplifies to

z = r e^{iω_0 t},   dr/dt = r(µ − r^2) + F.   (44)

Precisely at the bifurcation point µ = 0, there is no regime of linear response; instead we have r = F^{1/3}. The key point is that the "gain" of the system, r/F = F^{−2/3}, diverges at small amplitudes, providing high sensitivity to weak forcings (Fig. 14). This very high gain does not extend much beyond the resonant frequency ω_0: it drops already by half from its peak when |ω − ω_0| = 3√7 F^{2/3}/4.
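The F^{1/3} response follows directly from the steady state of Eq. (44), and can be checked numerically by solving the steady-state cubic; a minimal sketch:

```python
import numpy as np

def steady_response(F, mu=0.0):
    """Steady-state amplitude of Eq. (44): the largest real root of
    r*(mu - r**2) + F = 0, i.e. of r**3 - mu*r - F = 0."""
    roots = np.roots([1.0, 0.0, -mu, -F])
    real_roots = roots[np.isclose(roots.imag, 0.0, atol=1e-8)].real
    return real_roots.max()

# Exactly at the bifurcation (mu = 0) the response is r = F**(1/3), so the
# gain r/F ~ F**(-2/3) grows without bound as the forcing becomes weaker.
# for F in (1e-6, 1e-4, 1e-2):
#     print(F, steady_response(F), F**(1/3))
```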

How does this theory fit into what we know about the auditory system? As we have seen, an active process is necessary for amplification. In hair cells, this active process is provided by hair bundle motility powered by molecular motors, which causes spontaneous oscillations at a characteristic frequency that depends on the geometry of the hair bundle. Hence each cell will be highly selective for one particular frequency. The signal is transduced by the opening of channels upon deflection of the hair bundle, which has the effect of depolarizing the cell. The interplay of hair bundle motility and external forcing provides the basic ingredients for an excitable Hopf oscillator. The relevance of the Hopf bifurcation in hair cells was suggested in [86], and its consequences in terms of signal processing were explored in [87]. In a parallel effort [88], an explanation was proposed for how the system tunes itself near the critical point in the oscillating regime (µ = 0^+). The idea is that feedback is provided by the activity of the channels themselves, notably through the calcium ion concentration C, which controls the activity of the motors responsible for bundle motility. At first approximation one can write C = C(µ). Channel activity regulates C through

dC/dt = −C/τ + J(x), (45)

where τ is the relaxation time, x = Re(z) the hair bundle displacement, and J(x) the displacement-dependent ion flux. For an oscillatory input x = r cos(ωt), the time averaged flux J̄(r) = ⟨J(x)⟩ is an increasing function of r (assuming that J(x) is convex). Thus, the non-oscillatory part of C will tune itself to a value such that C(µ) = τ J̄(r) = τ J̄(√µ). One can show that, for relevant physical parameters, this drives the system to small values of µ, that is, close to the bifurcation point.

FIG. 14. Response to an oscillatory input of a Hopf oscillator near its critical point [84]. A. Response (displacement) as a function of input frequency, for increasing input amplitudes from 0 dB (lower curve) to 80 dB (top curve). This plot emphasizes the amplification of small inputs, as well as the shrinking width of the frequency range where amplification is present. B. Displacement as a function of stimulus amplitude, plotted in log space. The red curve, of slope 1/3, shows the enhanced response at the critical (resonant) frequency. For other frequencies (whose colors correspond to the frequencies marked by lines in A), the response is linear.

Experiments were able to confirm this picture quantitatively by measuring the voltage response of frog hair cells to an input current. The results showed an enhanced gain for small amplitudes at the resonance frequency (Fig. 15), as predicted by the theory. There are also classical experiments in auditory perception that are explained by the Hopf scenario. In particular, in the presence of two tones at frequencies f_1 and f_2, we hear a combination tone at frequency 2f_1 − f_2, but the apparent intensity of this sound scales linearly with the intensity of the primary tones. This is completely inconsistent with a system that has a linear response and perturbative nonlinear corrections, but agrees with the 1/3 power response at the critical point.

FIG. 15. Experimental evidence of a Hopf bifurcation in hair cells [85]. Shown is the membrane potential as a function of the input current for different input frequencies. Two top curves: an input current oscillating at the resonant frequency (126 Hz) is amplified in a nonlinear way. Bottom curve: the relation becomes linear when the input frequency is 20 Hz above the resonant frequency.

Again, we can ask how the Hopf bifurcation relates to the equilibrium notion of criticality we have explored before. If we examine the equation governing the evolution of the amplitude r as a function of time, Eq. (44), we can formally rewrite it as the overdamped motion of a coordinate r in a potential U:

dr/dt = −∂U/∂r,   U(r) = g r^4/4 − µ r^2/2 − F r,   (46)

with g = 1. The form of U is familiar: it describes Landau's theory of second order phase transitions. In Landau theory, µ is a function of the model parameters (notably the temperature), and vanishes at the critical point. One might object that this dynamical model is not really a many body system, and so cannot have a true phase transition. But in all ears, and especially in the mammalian cochlea, there are many hair cells, tuned to different frequencies, and they are mechanically coupled to one another. Maximal amplification at each frequency thus requires that the whole system be set such that a macroscopic fraction of the dynamical degrees of freedom are at criticality [89, 90]. Presumably this interacting system should exhibit departures from mean field or Landau behavior, although this has not been explored.

VIII. LOOKING AHEAD

We write at a fortunate moment in the development of our subject, when experiments are emerging that hold the promise of connecting decades of theoretical discussion to the real phenomena of life, on many scales. We hope to have conveyed our reasons for thinking that this is a remarkable development, but also to have conveyed the challenges inherent in this attempt to bring theory and experiment into more meaningful dialogue.

The first challenge is that we do have somewhat different notions of criticality in different systems, even at the level of theory, and these differences are amplified as we examine the many different approaches to data analysis. This is a deep problem, not necessarily limited to biological systems. Except in a few cases, the mathematical language that we use to describe criticality in statistical systems is quite different from the language that we use in dynamical systems. Efforts to understand, for example, current data on networks of neurons will force us to address the relations between statistical and dynamical criticality more clearly.

The second major challenge is that using the maximum entropy method to analyze real data requires us to solve an inverse statistical mechanics problem. This problem is tractable far away from critical points, but near criticality it seems very difficult. If we had more analytic understanding of the problem, it might be possible to identify the signatures of criticality more directly from the measurable correlations, perhaps even allowing us to draw conclusions without explicit construction of the underlying model. Absent this understanding, there is a serious need for better algorithms.

A third set of challenges comes from the nature of the data itself. While we have celebrated the really revolutionary changes in the scale and quality of data now available, there are limitations. In some cases, such as the flocks of birds, we have relatively few independent samples of the network state; even if we had access to longer time series, the topology of the network is changing as individual birds move through the flock, and we would be forced back to analyzing the system almost snapshot by snapshot. In other cases, such as protein sequences, we have access to very large data sets but there are unknown biases (the organisms that have been chosen for sequencing).

A more subtle problem is that, in all cases, the correlations that we observe have multiple origins, some of which are intrinsic to the function of the system and some of which reflect external influences. For many of the systems we have considered, most of the literature about the analysis of correlations has sought to disentangle these effects, but this work makes clear that it might not be possible to do this without introducing rather detailed model assumptions (e.g., about the mechanisms generating diversity in the antibody repertoire vs. the dynamics of selection in response to antigenic challenge). In the case of the retina, we know that, quantitatively, roughly half the entropy reduction in the network relative to independent neurons is intrinsic, and half arises in response to the visual stimulus [19], but even the "extrinsic" correlations are not passively inherited from the outside world, since the strength and form of these correlations depend on the adaptation state of the underlying neural circuitry. If the networks that we observe, reflecting both intrinsic and extrinsic effects, operate near a critical point, this fact may be more fundamental than the microscopic origins of the correlations.

Hopefully the discussion thus far has struck the correct balance, exposing the many pieces of evidence pointing toward critical behavior in different systems, but at the same time emphasizing that criticality of biological networks remains a hypothesis whose most compelling tests are yet to come. To conclude our review, let's take the evidence for criticality at face value, and discuss two questions which are raised by these observations.

The first question is why biological systems should be nearly critical. What benefits does operation at this special point in parameter space provide for these systems? For birds, we have seen that criticality confers high susceptibility to external perturbations, and this enhanced reactivity endows them with a better defense mechanism against predators. Similarly, in the auditory system, being close to a bifurcation point allows for arbitrarily high gains and accurate frequency selectivity in response to weak sounds.

In neural populations, the naive idea underlying the theory of branching processes makes criticality seem almost inevitable — a middle point between death and epilepsy. However, the function of neural networks is not only to be reactive, but also to carry and process complex information in a collective manner through their patterns of activity. The observation and analysis of metastable states, both in retinal activity analyzed within the maximum entropy framework [20] and in the activity of cortical slices analyzed with the theory of branching processes [79], suggest that criticality may be coupled to the explosion of these states, allowing for a wider set of coding options. A more detailed analysis is needed to support this speculation, and to better understand how metastable states can be learned and used in practice for efficient decoding. More generally, criticality runs counter to simple notions of efficiency in neural coding, suggesting that other principles may be operating, as discussed in Refs. [20, 47]. In the case of immune proteins, criticality could be useful for preparedness against attacks, and could result from a tight balance between the expected—prior experience with antigens, as well as hereditary information encoded in the genomic templates—and the unknown. As in the case of neural coding, the existence of metastable states and their potential for encoding pathogen history may be enhanced by criticality.

The second question is how criticality can be achieved, apparently in so many very different systems. Critical systems occupy only a thin region (sometimes even a single point) of the parameter space, and it is not clear how biological systems find this region. In some cases, a feedback mechanism can be invoked to explain this adaptation, as in the case of hair cells, where the active process is itself regulated by the amplitude of the oscillations it produces. In networks of neurons, synaptic plasticity is a good candidate, and there are models that use (more or less) known mechanisms of synaptic dynamics to stabilize a near-critical state [91]. In other cases, however, no obvious explanation comes to mind.

Bird flocks display coherence over very large length scales, which suggests that the strength of the underlying interactions (that is, the precision with which each bird matches its velocity vector to its neighbors) is tuned very precisely, but we have no idea how this tuning could be achieved. In the case of the immune system, feedback does not seem a plausible explanation, because the immune repertoire is constantly renewing itself and is out of equilibrium. It is worth noting that a simple mechanism of exponential growth with introduction of random novelty, called the Yule process [92], predicts Zipf's law. However, such a model suffers from the same flaws as the branching processes mentioned above: the resulting states are uniformly random, and cannot carry information about their environment, as one would want from an adaptive immune system. Besides, the Yule process does not account for the existence of a constant source of genomic antibody segments. Therefore, the mechanism by which the repertoire maintains criticality remains largely elusive and requires further investigation.

To summarize, we have discussed experimental evidence of criticality in a wide variety of systems, spanning all possible biological scales, from individual proteins to whole populations of animals with high cognitive capacity, in stationary as well as dynamical systems. The wide applicability of the concepts exposed here, fueled by an increasing amount of high quality data, makes for an exciting time. Ideas which once seemed tremendously speculative are now emerging, independently, from the analysis of real data on many different systems, and the common features seen across so many levels of biological organization encourage us to think that there really are general principles governing the function of these complex systems.

ACKNOWLEDGMENTS

We thank our many collaborators for the pleasure of working together on these ideas: D Amodei, MJ Berry II, CG Callan, O Marre, M Mezard, SE Palmer, R Ranganathan, E Schneidman, R Segev, GJ Stephens, S Still, G Tkacik, and AM Walczak. In addition, we are grateful to our colleagues who have taken time to explain their own ideas: A Cavagna, I Giardina, MO Magnasco, and M Weigt. Speculations, confusions, and errors, of course, remain our fault and not theirs. This work was supported in part by NSF Grants PHY–0650617 and PHY–0957573, by NIH Grant P50 GM071598, and by the Swartz Foundation; T. M. was supported in part by the Human Frontiers Science Program.

[1] P Bak, C Tang, and K Wiesenfeld, "Self-organized criticality: An explanation of the 1/f noise," Phys Rev Lett 59, 381–384 (Jul 1987).
[2] Per Bak, How Nature Works (Springer, New York, 1996).


[3] P Bak and K Sneppen, "Punctuated equilibrium and criticality in a simple model of evolution," Phys Rev Lett 71, 4083–4086 (Dec 1993).
[4] Stephen Gould and Niles Eldredge, "Punctuated equilibria: The tempo and mode of evolution reconsidered," Paleobiology 3, 115–151 (Apr 1977).
[5] M Usher, M Stemmler, and Z Olami, "Dynamic pattern formation leads to 1/f noise in neural populations," Phys Rev Lett 74, 326–329 (Jan 1995).
[6] J J Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc Natl Acad Sci USA 79, 2554–8 (Apr 1982).
[7] J J Hopfield and D W Tank, "Computing with neural circuits: a model," Science 233, 625–33 (Aug 1986).
[8] Daniel J. Amit, Modeling Brain Function: The World of Attractor Neural Networks (Cambridge University Press, Cambridge, 1989).
[9] John Hertz, Anders Krogh, and Richard G. Palmer, Introduction to the Theory of Neural Computation (Addison-Wesley, 1991).
[10] Ronen Segev, Joe Goodhouse, Jason Puchalla, and Michael J Berry, "Recording spikes from a large fraction of the ganglion cells in a retinal patch," Nat Neurosci 7, 1154–61 (Oct 2004).
[11] A M Litke, N Bezayiff, E J Chichilnisky, W Cunningham, W Dabrowski, A A Grillo, M Grivich, P Grybos, P Hottowy, S Kachiguine, R S Kalmar, K Mathieson, D Petrusca, M Rahman, and A Sher, "What does the eye tell the brain?: Development of a system for the large-scale recording of retinal output activity," Nuclear Science, IEEE Transactions on 51, 1434–1440 (2004).
[12] Joshua A Weinstein, Ning Jiang, Richard A White, Daniel S Fisher, and Stephen R Quake, "High-throughput sequencing of the zebrafish antibody repertoire," Science 324, 807–10 (May 2009).
[13] Michele Ballerini, Nicola Cabibbo, Raphael Candelier, Andrea Cavagna, Evaristo Cisbani, Irene Giardina, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale, and Vladimir Zdravkovic, "Empirical investigation of starling flocks: a benchmark study in collective animal behaviour," Anim Behav 76, 201–215 (Jan 2008).
[14] Andrea Cavagna, Irene Giardina, Alberto Orlandi, Giorgio Parisi, and Andrea Procaccini, "The STARFLAG handbook on collective animal behaviour: 2. Three-dimensional analysis," (Jan 2008).
[15] Andrea Cavagna, Irene Giardina, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Massimiliano Viale, and Vladimir Zdravkovic, "The STARFLAG handbook on collective animal behaviour: 1. Empirical methods," (Jan 2008).
[16] Andrea Cavagna, Alessio Cimarelli, Irene Giardina, Alberto Orlandi, Giorgio Parisi, Andrea Procaccini, Raffaele Santagati, and Fabio Stefanini, "New statistical tools for analyzing the structure of animal groups," Math Biosci 214, 32–37 (Jan 2008).
[17] M Ballerini, N Cabibbo, R Candelier, A Cavagna, E Cisbani, I Giardina, V Lecomte, A Orlandi, G Parisi, A Procaccini, M Viale, and V Zdravkovic, "Interaction ruling animal collective behavior depends on topological rather than metric distance: evidence from a field study," Proc Natl Acad Sci USA 105, 1232–7 (Jan 2008).
[18] D Bishop and J Reppy, "Study of the superfluid transition in two-dimensional 4He films," Phys Rev Lett 40, 1727–1730 (Jun 1978).
[19] Elad Schneidman, Michael J Berry, Ronen Segev, and William Bialek, "Weak pairwise correlations imply strongly correlated network states in a neural population," Nature 440, 1007–12 (Apr 2006).
[20] Gasper Tkacik, Elad Schneidman, Michael J Berry II, and William Bialek, "Spin glass models for a network of real neurons," arXiv 0912.5409v1 (Jan 2009).
[21] Aonan Tang, David Jackson, Jon Hobbs, Wei Chen, Jodi L Smith, Hema Patel, Anita Prieto, Dumitru Petrusca, Matthew I Grivich, Alexander Sher, Pawel Hottowy, Wladyslaw Dabrowski, Alan M Litke, and John M Beggs, "A maximum entropy model applied to spatial and temporal correlations from cortical networks in vitro," J Neurosci 28, 505–18 (Jan 2008).
[22] M E J Newman, "Power laws, Pareto distributions and Zipf's law," Contemporary Physics 46, 323 (Sep 2005).
[23] Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J Newman, "Power-law distributions in empirical data," SIAM Rev 51, 661–703 (Jan 2009).
[24] George Kingsley Zipf, Human Behavior and the Principle of Least Effort (Addison-Wesley, Cambridge, 1949).
[25] F. Auerbach, "Das Gesetz der Bevölkerungskonzentration," Petermanns Geographische Mitteilungen 59, 74–76 (1913).
[26] Greg J Stephens, Thierry Mora, Gasper Tkacik, and William Bialek, "Thermodynamics of natural images," arXiv 0806.2694v1 (Jun 2008).
[27] E. T Jaynes, "Information theory and statistical mechanics," Physical Review 106, 620 (May 1957).
[28] E. T Jaynes, "Information theory and statistical mechanics. II," Physical Review 108, 171 (Oct 1957).
[29] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley, New York, 1991).
[30] Joseph B Keller and Bruno Zumino, "Determination of intermolecular potentials from thermodynamic data and the law of corresponding states," J Chem Phys 30, 1351 (Aug 1959).
[31] D Ackley, G Hinton, and T Sejnowski, "A learning algorithm for Boltzmann machines," Cognitive Science 9, 147–169 (Jan 1985).
[32] Jonathon Shlens, Greg D Field, Jeffrey L Gauthier, Martin Greschner, Alexander Sher, Alan M Litke, and E J Chichilnisky, "The structure of large-scale synchronized firing in primate retina," J Neurosci 29, 5022–31 (Apr 2009).
[33] Thierry Mora, Aleksandra M Walczak, William Bialek, and Curtis G Callan, "Maximum entropy models for antibody diversity," Proc Natl Acad Sci USA 107, 5405–10 (Mar 2010).
[34] Tamara Broderick, Miroslav Dudik, Gasper Tkacik, Robert E Schapire, and William Bialek, "Faster solutions of the inverse pairwise Ising problem," arXiv 0712.2437v2 (Jan 2007).
[35] Vitor Sessak and Remi Monasson, "Small-correlation expansions for the inverse Ising problem," J Phys A-Math Theor 42, 055001 (Jan 2009).
[36] S Cocco, S Leibler, and R Monasson, "Neuronal couplings between retinal ganglion cells inferred by efficient inverse statistical physics methods," Proc Natl Acad Sci USA 106, 14058–62 (Jul 2009).
[37] Marc Mezard and Thierry Mora, "Constraint satisfaction problems and neural networks: A statistical physics perspective," J Physiol Paris 103, 107–13 (Jan 2009).

[37] Marc Mezard and Thierry Mora, “Constraint satisfac-tion problems and neural networks: A statistical physicsperspective,” J Physiol Paris 103, 107–13 (Jan 2009).


[38] Martin Weigt, Robert A White, Hendrik Szurmant, James A Hoch, and Terence Hwa, "Identification of direct residue contacts in protein-protein interaction by message passing," Proc Natl Acad Sci USA 106, 67–72 (Jan 2009).
[39] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek, Spikes: Exploring the Neural Code (MIT Press, Cambridge, 1997).
[40] M Meister, L Lagnado, and D A Baylor, "Concerted signaling by retinal ganglion cells," Science 270, 1207–10 (Nov 1995).
[41] Elad Schneidman, Susanne Still, Michael J Berry, and William Bialek, "Network information and connected correlations," Phys Rev Lett 91, 238701 (Dec 2003).
[42] Jonathon Shlens, Greg D Field, Jeffrey L Gauthier, Matthew I Grivich, Dumitru Petrusca, Alexander Sher, Alan M Litke, and E J Chichilnisky, "The structure of multi-neuron firing patterns in primate retina," J Neurosci 26, 8254–66 (Aug 2006).
[43] H Wassle, L Peichl, and B B Boycott, "Mosaics and territories of cat retinal ganglion cells," Prog Brain Res 58, 183–90 (Jan 1983).
[44] Shan Yu, Debin Huang, Wolf Singer, and Danko Nikolic, "A small world of neuronal synchrony," Cereb Cortex 18, 2891–901 (Dec 2008).
[45] Ifije E Ohiorhenuan and Jonathan D Victor, "Information-geometric measure of 3-neuron firing patterns characterizes scale-dependence in cortical networks," Journal of Computational Neuroscience, published online ahead of print (Jul 2010).
[46] Ifije E Ohiorhenuan, Ferenc Mechler, Keith P Purpura, Anita M Schmid, Qin Hu, and Jonathan D Victor, "Sparse coding and high-order correlations in fine-scale cortical networks," Nature 466, 617–21 (Jul 2010).
[47] Gasper Tkacik, Elad Schneidman, Michael J Berry II, and William Bialek, "Ising models for networks of real neurons," arXiv q-bio/0611072v1 (Nov 2006).
[48] M. Mezard, G. Parisi, and M. A. Virasoro, Spin-Glass Theory and Beyond, Lecture Notes in Physics, Vol. 9 (World Scientific, Singapore, 1987).
[49] C B Anfinsen, "Principles that govern the folding of protein chains," Science 181, 223–30 (Jul 1973).
[50] M H Cordes, A R Davidson, and R T Sauer, "Sequence space, folding and protein design," Curr Opin Struct Biol 6, 3–10 (Feb 1996).
[51] Valerie Daggett and Alan Fersht, "The present view of the mechanism of protein folding," Nat Rev Mol Cell Biol 4, 497–502 (Jun 2003).
[52] Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge University Press, Cambridge, 2005).
[53] A Horovitz and A R Fersht, "Co-operative interactions during protein folding," J Mol Biol 224, 733–40 (Apr 1992).
[54] S W Lockless and R Ranganathan, "Evolutionarily conserved pathways of energetic connectivity in protein families," Science 286, 295–9 (Oct 1999).
[55] Michael Socolich, Steve W Lockless, William P Russ, Heather Lee, Kevin H Gardner, and Rama Ranganathan, "Evolutionary information for specifying a protein fold," Nature 437, 512–8 (Sep 2005).
[56] William P Russ, Drew M Lowery, Prashant Mishra, Michael B Yaffe, and Rama Ranganathan, "Natural-like function in artificial WW domains," Nature 437, 579–83 (Sep 2005).
[57] William Bialek and Rama Ranganathan, "Rediscovering the power of pairwise interactions," arXiv 0712.4397v1 (Jan 2007).
[58] Najeeb Halabi, Olivier Rivoire, Stanislas Leibler, and Rama Ranganathan, "Protein sectors: evolutionary units of three-dimensional structure," Cell 138, 774–86 (Aug 2009).
[59] Bryan Lunt, Hendrik Szurmant, Andrea Procaccini, James A Hoch, Terence Hwa, and Martin Weigt, "Inference of direct residue contacts in two-component signaling," Meth Enzymol 471, 17–41 (Jan 2010).
[60] Alexander Schug, Martin Weigt, Jose N Onuchic, Terence Hwa, and Hendrik Szurmant, "High-resolution protein complexes from integrating genomic information with molecular simulation," Proc Natl Acad Sci USA 106, 22124–9 (Dec 2009).
[61] Kenneth P. Murphy, Paul Travers, Charles Janeway, and Mark Walport, Janeway's Immunobiology (Garland, New York, 2008).
[62] Jens Krause and Graeme D. Ruxton, Living in Groups (Oxford University Press, Oxford, 2002).
[63] Irene Giardina, "Collective behavior in animal groups: theoretical models and empirical studies," HFSP J 2, 205–19 (Aug 2008).
[64] Andrea Cavagna, Alessio Cimarelli, Irene Giardina, Giorgio Parisi, Raffaele Santagati, Fabio Stefanini, and Massimiliano Viale, "Scale-free correlations in starling flocks," Proc Natl Acad Sci USA 107, 11865–70 (Jun 2010).
[65] J M Cullen, E Shaw, and H A Baldwin, "Methods for measuring the three-dimensional structure of fish schools," Anim Behav 13, 534–43 (Oct 1965).
[66] Kerson Huang, Statistical Mechanics, 2nd Ed. (Wiley, New York, 1987).
[67] T Vicsek, A Czirok, E Ben-Jacob, I Cohen, and O Shochet, "Novel type of phase transition in a system of self-driven particles," Phys Rev Lett 75, 1226–1229 (Aug 1995).
[68] J Toner and Y Tu, "Long-range order in a two-dimensional dynamical XY model: How birds fly together," Phys Rev Lett 75, 4326–4329 (Dec 1995).
[69] J Toner and YH Tu, "Flocks, herds, and schools: A quantitative theory of flocking," Phys Rev E 58, 4828–4858 (Jan 1998).
[70] A Corral, C Perez, A Díaz-Guilera, and A Arenas, "Self-organized criticality and synchronization in a lattice model of integrate-and-fire oscillators," Phys Rev Lett 74, 118–121 (Jan 1995).
[71] A Herz and J Hopfield, "Earthquake cycles and neural reverberations: Collective oscillations in systems with pulse-coupled threshold elements," Phys Rev Lett 75, 1222–1225 (Aug 1995).
[72] Dan-Mei Chen, S Wu, A Guo, and Z Yang, "Self-organized criticality in a cellular automaton model of pulse-coupled integrate-and-fire neurons," Journal of Physics A: Mathematical and General 28, 5177 (Sep 1995).
[73] Per Bak and Chao Tang, "Earthquakes as a self-organized critical phenomenon," J Geophys Res 94, 15635–15637 (1989).

[74] John M Beggs and Dietmar Plenz, "Neuronal avalanches in neocortical circuits," J Neurosci 23, 11167–77 (Dec 2003).
[75] John M Beggs, "The criticality hypothesis: how local cortical networks might optimize information processing," Philos Transact A Math Phys Eng Sci 366, 329–43 (Feb 2008).

[76] Theodore E. Harris, The Theory of Branching Processes (Springer, Berlin, 1949).
[77] S Zapperi, K Bækgaard Lauritsen, and H Stanley, "Self-organized branching processes: Mean-field theory for avalanches," Phys Rev Lett 75, 4071–4074 (Nov 1995).
[78] Wei Chen, Jon P Hobbs, Aonan Tang, and John M Beggs, "A few strong connections: optimizing information retention in neuronal avalanches," BMC Neurosci 11, 3 (Jan 2010).
[79] Clayton Haldeman and John M Beggs, "Critical branching captures activity in living neural networks and maximizes the number of metastable states," Phys Rev Lett 94, 058101 (Feb 2005).
[80] John M Beggs and Dietmar Plenz, "Neuronal avalanches are diverse and precise activity patterns that are stable for many hours in cortical slice cultures," J Neurosci 24, 5216–29 (Jun 2004).
[81] T. Gold, "Hearing. II. The physical basis of the action of the cochlea," Proc R Soc Lond B Biol Sci 135, 492–498 (1948).
[82] D T Kemp, "Stimulated acoustic emissions from within the human auditory system," J Acoust Soc Am 64, 1386–91 (Nov 1978).
[83] PM Zurek, "Spontaneous narrowband acoustic signals emitted by human ears," J Acoust Soc Am 69, 514–523 (1981).
[84] A J Hudspeth, Frank Julicher, and Pascal Martin, "A critique of the critical cochlea: Hopf–a bifurcation–is better than none," J Neurophysiol 104, 1219–29 (Sep 2010).
[85] M Ospeck, V M Eguíluz, and M O Magnasco, "Evidence of a Hopf bifurcation in frog hair cells," Biophys J 80, 2597–607 (Jun 2001).
[86] Y Choe, M O Magnasco, and A J Hudspeth, "A model for amplification of hair-bundle motion by cyclical binding of Ca2+ to mechanoelectrical-transduction channels," Proc Natl Acad Sci USA 95, 15321–6 (Dec 1998).
[87] V M Eguíluz, M Ospeck, Y Choe, A J Hudspeth, and M O Magnasco, "Essential nonlinearities in hearing," Phys Rev Lett 84, 5232–5 (May 2000).
[88] S Camalet, T Duke, F Julicher, and J Prost, "Auditory sensitivity provided by self-tuned critical oscillations of hair cells," Proc Natl Acad Sci USA 97, 3183–8 (Mar 2000).
[89] Marcelo O Magnasco, "A wave traveling over a Hopf instability shapes the cochlear tuning curve," Phys Rev Lett 90, 058101 (Feb 2003).
[90] Thomas Duke and Frank Julicher, "Active traveling wave in the cochlea," Phys Rev Lett 90, 158101 (Apr 2003).
[91] MO Magnasco, O Piro, and GA Cecchi, "Self-tuned critical anti-Hebbian networks," Phys Rev Lett 102, 258102 (2009).
[92] G Yule, "A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S.," Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a Biological Character 213, 21–87 (Jan 1925).