
Chapter 3

Predicting the uncertain

©2013 by Alessandro Codello

All epistemological value of the theory of probability is based on this: the large-scale random phenomena in their collective action create strict, non-random regularity.

B.V. Gnedenko and A.N. Kolmogorov [1]

3.1 Random variables

Everything around us is random, from the temperature inside our room to the height of the next person who comes in through the door. The recognition and description of randomness is the first step in the direction of understanding: it is a way to parametrize the uncertain.

A random variable is anything that can be measured an arbitrary number of times, the outcome of the measurement being random (this outcome can be an integer or a real number).

A random variable X can be described by the probability density function (pdf) p_X(x), defined by the relation:

P(a ≤ X ≤ b) = ∫_a^b dx p_X(x) .   (3.1)


The probability that X takes some value is one, P(−∞ ≤ X ≤ ∞) = 1; thus we have the normalization condition on the pdf:

∫_{−∞}^{∞} dx p_X(x) = 1 .   (3.2)

A discrete random variable can be described by a pdf of the form p(x) = Σ_i p_i δ(x − x_i). The cumulative distribution function F_X(x) of the random variable X is defined by

F_X(x) = P(X ≥ x)   (3.3)

and we have F′_X(x) = −p_X(x). Expectations are defined as:

⟨f(X)⟩ = ∫_{−∞}^{∞} dx p_X(x) f(x)

and can be evaluated if the pdf is known.

The moments of the random variable X are given by the expectation values of the powers of X:

m_n = ⟨X^n⟩ .   (3.4)

The zeroth moment reflects the normalization of the pdf, m_0 = 1, while the first moment is just the mean, m_1 = m. Higher moments can be infinite. Moments can be conveniently calculated from the moment generating function, defined as:

Z_X(t) = ⟨e^{tX}⟩ ,   m_n = Z_X^{(n)}(0) ,   (3.5)

which is just the Laplace transform of the pdf¹. Much more important than the moments are the cumulants, which are generated by

W_X(t) = log Z_X(t) ,   c_n = W_X^{(n)}(0) .   (3.6)

The first two cumulants are c_0 = 0 and c_1 = m; the next is the variance²

c_2 = m_2 − m² = ⟨(x − m)²⟩ ≡ σ²   (3.7)

¹We will use field-theory conventions.
²The positive square root of the variance, σ, is called the standard deviation.


and measures the deviations from the mean. The variance sets the scale of the pdf and quantifies the resolution at which we are "observing" the pdf. The next few cumulants are

c_3 = m_3 − 3 m_2 m + 2 m³ = ⟨(x − m)³⟩
c_4 = m_4 − 4 m_1 m_3 − 3 m_2² + 12 m_1² m_2 − 6 m_1⁴ = ⟨(x − m)⁴⟩ − 3σ⁴
c_5 = ... ,   (3.8)

and so on. In general the cumulant c_n is a polynomial in the moments m_p of order p ≤ n.
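As a concrete check of the relations (3.7)-(3.8), here is a minimal Monte Carlo sketch (the exponential test distribution and the sample size are arbitrary choices) that estimates the raw moments and assembles the cumulants from them:

```python
import numpy as np

# Monte Carlo check of the moment-cumulant relations (3.7)-(3.8) for an
# exponential random variable of unit mean, whose exact cumulants are
# c_n = (n-1)!.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10_000_000)

m = [np.mean(x**n) for n in range(5)]   # raw moments m_0 .. m_4

c2 = m[2] - m[1]**2
c3 = m[3] - 3*m[2]*m[1] + 2*m[1]**3
c4 = m[4] - 4*m[1]*m[3] - 3*m[2]**2 + 12*m[1]**2*m[2] - 6*m[1]**4

print(c2, c3, c4)   # should be close to the exact values 1, 2, 6
```

For an exponential variable of unit mean the printed values should be close to 1, 2 and 6.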

A problem with the moment generating function is that it is not always finite; for this reason it is useful to introduce the characteristic function, which is the Fourier transform³ of the pdf:

p̂_X(t) = ⟨e^{itX}⟩ .   (3.9)

The nice thing about the characteristic function is that it is always finite, since:

|p̂_X(t)| = |⟨e^{itX}⟩| ≤ ⟨|e^{itX}|⟩ = ⟨1⟩ = 1 ,

where we used the inequality |⟨e^A⟩| ≤ ⟨|e^A|⟩. Once we have calculated the characteristic function we can extract the moments from the relation

m_n = (−i)^n p̂_X^{(n)}(0) .   (3.10)

Again, from the normalization of the pdf we have p̂_X(0) = 1. Cumulants can also be extracted from the characteristic function as follows:

c_n = (−i)^n (d^n/dt^n) log p̂_X(t) |_{t=0} .   (3.11)

³Our Fourier-transform conventions are: f̂(t) = ∫ dx f(x) e^{itx} and f(x) = ∫ (dt/2π) f̂(t) e^{−itx}.

It is useful to define the normalized or "dimensionless" cumulants (remember that the standard deviation σ sets the scale of the pdf):

c̃_n = c_n / σ^n .   (3.12)

The first two nontrivial dimensionless cumulants are called the skewness

ς ≡ c̃_3 = ⟨(x − m)³⟩ / σ³   (3.13)

and the kurtosis

κ ≡ c̃_4 = ⟨(x − m)⁴⟩ / σ⁴ − 3 .   (3.14)

The kurtosis is bounded from below: it is possible to prove that κ ≥ −2 for any pdf.
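A short numerical illustration (a sketch; the two test distributions are arbitrary choices): the exponential distribution has ς = 2 and κ = 6, while a fair-coin random variable saturates the bound at κ = −2.

```python
import numpy as np

rng = np.random.default_rng(1)

def skew_kurt(x):
    """Normalized cumulants (3.13)-(3.14) estimated from samples."""
    d = x - x.mean()
    s2 = np.mean(d**2)
    return np.mean(d**3) / s2**1.5, np.mean(d**4) / s2**2 - 3.0

print(skew_kurt(rng.exponential(size=1_000_000)))              # ~ (2, 6)
print(skew_kurt(rng.integers(0, 2, 1_000_000).astype(float)))  # ~ (0, -2)
```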

We will see that the Gaussian pdf has only m and σ nonzero, so higher cumulants, like the skewness (for asymmetrical pdfs) and the kurtosis (for symmetrical pdfs), measure the first deviations from gaussianity. The cumulants play the role of the connected autocorrelation functions.

3.2 Sums of iid random variables

We now consider independent identically distributed (iid) random variables of mean m and standard deviation σ. The pdf of the sum of two iid random variables X₁ + X₂ is given by the sum of the products p_{X₁}(x₁) p_{X₂}(x₂) over all values of x₁ and x₂ such that x₁ + x₂ = x:

p_{X₁+X₂}(x) = ∫ dx₁ dx₂ p_{X₁}(x₁) p_{X₂}(x₂) δ(x − x₁ − x₂) = ∫ dx₁ p_{X₁}(x₁) p_{X₂}(x − x₁) .   (3.15)

In other words, the pdf of the sum of iid random variables is given by convolution. In terms of the characteristic functions we simply have

p̂_{X₁+X₂}(t) = p̂_{X₁}(t) p̂_{X₂}(t) ,   (3.16)


since the Fourier transform of a convolution is just the product of the Fourier transforms. We also have

p_{aX+b}(x) = (1/a) p_X((x − b)/a)   (3.17)

or, in terms of the characteristic function,

p̂_{aX+b}(t) = e^{itb} p̂_X(at) .

From (3.16) we can deduce that cumulants sum up: c_{X₁+X₂} = c_{X₁} + c_{X₂}.
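Equation (3.15) can be checked directly on a grid (a sketch; the uniform test pdf and the step size are arbitrary): the convolution of two uniform pdfs on [0, 1] must give the triangular pdf on [0, 2].

```python
import numpy as np

# pdf of X1 + X2 by discretized convolution, eq. (3.15): two uniform
# pdfs on [0,1] convolve to the triangular pdf min(x, 2-x) on [0,2].
dx = 0.001
x = np.arange(0.0, 1.0, dx)
p = np.ones_like(x)                # uniform pdf on [0,1]

p_sum = np.convolve(p, p) * dx     # grid version of the integral in (3.15)
x_sum = np.arange(len(p_sum)) * dx

# maximum deviation from the exact triangular pdf is O(dx)
print(np.max(np.abs(p_sum - np.minimum(x_sum, 2.0 - x_sum))))
```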

3.2.1 Exact RG transformations

We now develop the RG theory for iid random variables [2, 3]. The random variables X_i are independent and thus non-interacting⁴. The coarse-graining is done by grouping the X_i in groups of two (the number b of grouped random variables is a scheme freedom) and summing them. The aim of the rescaling is to keep the linear size of the system constant, i.e. to keep σ² unchanged. This means that we have to rescale x so as to have constant variance, x → x/2^ν, where ν is a scaling exponent to be determined self-consistently.

The functional space of the pdf is the space of positive functions on the real line which satisfy

∫ dx p(x) = 1 ,   ∫ dx x p(x) = 0 ,   ∫ dx x² p(x) = σ² .   (3.18)

We will consider pdfs with finite moments. The requirements (3.18) and p(x) ≥ 0 fix theory space.

Using the relations (3.15) and (3.17) we can write the exact RG transformation as

Figure 3.1: Coarse-graining by convolution of neighboring iid random variables.

Rp(x) = 2^ν ∫ dy p(y) p(2^ν x − y) .   (3.19)

⁴What really makes the "structure of space" is the "structure of the interactions". We can imagine the iid random variables on a one-dimensional lattice, but this is only fictitious: what we are actually doing is field theory in zero dimensions.

We can determine the value of the exponent ν by requiring that the RG transformation respects the properties (3.18). We have

∫ dx Rp(x) = 2^ν ∫ dx dy p(y) p(2^ν x − y) ;

changing variable to u = 2^ν x − y (so that du = 2^ν dx) gives

∫ dx Rp(x) = ∫ du dy p(y) p(u) = 1 .

By the same steps we find

∫ dx x Rp(x) = 0 .

Since it is the variance that sets the scale, it is the condition on the variance that fixes the value of ν:

∫ dx x² Rp(x) = 2^ν ∫ dx dy x² p(y) p(2^ν x − y) = ∫ dx dy ((x + y)/2^ν)² p(y) p(x) = 2^{1−2ν} σ²   ⇒   ν = 1/2 .   (3.20)

This last relation tells us that Rσ² = 2σ², which could also have been derived from the fact that cumulants sum up. The exact RG transformation is

Rp(x) = √2 ∫ dy p(y) p(√2 x − y) .   (3.21)

If we were grouping b variables instead of two we would have found √2 → √b.

In terms of the characteristic function the exact RG transformation becomes

Rp̂(t) = [p̂(t/√2)]² .   (3.22)
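The transformation (3.21) is easy to iterate numerically (a sketch on a grid; the centered-exponential starting pdf and the grid parameters are arbitrary choices). Watching the third cumulant shrink by a factor √2 per step anticipates the eigenvalue λ₃ = 2^{−1/2} found below.

```python
import numpy as np

# Iterate the exact RG transformation (3.21) on a grid:
# Rp(x) = sqrt(2) * ∫ dy p(y) p(sqrt(2) x - y).
L, n = 20.0, 4001
x = np.linspace(-L, L, n)
dx = x[1] - x[0]

def R(p):
    conv = np.convolve(p, p, mode="same") * dx                   # (p * p)(x)
    return np.sqrt(2.0) * np.interp(np.sqrt(2.0) * x, x, conv)   # x -> sqrt(2) x

# start from a centered exponential: zero mean, unit variance, c3 = 2
p = np.where(x > -1.0, np.exp(-(x + 1.0)), 0.0)
for step in range(6):
    c3 = np.sum(x**3 * p) * dx   # third cumulant (zero mean, unit variance)
    print(step, c3)              # shrinks by sqrt(2) per step: lambda_3 = 2**(-1/2)
    p = R(p)
```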

3.2.2 Fixed point: Gaussian pdf

The first thing to do is to find the fixed-point pdf p∗(x), i.e. the solution of

Rp∗(x) = p∗(x) .   (3.23)

To solve (3.23) we use the Fourier representation (3.22) to obtain:

p̂∗(t) = [p̂∗(t/√2)]² .   (3.24)

Taking the logarithm of (3.24) and defining f(t) = log p̂∗(t) gives

f(t) = 2 f(t/√2) .   (3.25)

Equation (3.25) shows that f(t) is a homogeneous function with α = 2 and λ = 1/√2; thus f(t) = C t² and we find:

p̂∗(t) = e^{C t²} .   (3.26)

  • CHAPTER 3. PREDICTING THE UNCERTAIN 34

    The constant C can be fixed by imposing (3.18). Using (3.11) this is equiva-lent to impose p̂∗(0) = 1, p̂′∗(0) = 0 and p̂

    ′′∗(0) = −σ2, from this last relation

    we find:

    C = −1

    2σ2 . (3.27)

    If we reintroduce the mean we finally find:

    p̂G(t) = eiµt− 12 t

    2σ2 . (3.28)

In (3.28) we called the fixed-point solution Gaussian, since this is the name of the pdf we have found. Using the Gaussian integration formula

∫_{−∞}^{∞} dx e^{−ax²/2 + bx} = √(2π/a) e^{b²/2a} ,   (3.29)

we can Fourier-transform back (3.28):

p_G(x) = ∫ (dt/2π) p̂_G(t) e^{−itx} = ∫ (dt/2π) e^{−it(x−µ) − σ²t²/2} ,

to obtain:

p_G(x) = (1/√(2πσ²)) e^{−(x−µ)²/2σ²} .   (3.30)

Using (3.29) we can easily prove that ∫ dx p_G(x) = 1. The cumulative distribution function of the Gaussian distribution is:

F_G(x) = (1/√(2πσ²)) ∫_x^∞ du e^{−(u−µ)²/2σ²} = 1/2 − (1/2) Erf((x − µ)/√2σ) ,   (3.31)

where Erf(x) is the error function.


3.2.3 Linearizing the RG transformation: the CLT

To test the stability properties of the fixed point we linearize the RG transformation around the Gaussian pdf:

R(p_G + εh)(x) = √2 ∫ dy [p_G(y) + εh(y)][p_G(√2 x − y) + εh(√2 x − y)]
= Rp_G(x) + 2√2 ε ∫ dy p_G(y) h(√2 x − y) + O(ε²)
= p_G(x) + ε L_G h(x) + O(ε²) ,   (3.32)

where the linear RG operator L_G at the Gaussian pdf, defined in the last line, is the following:

L_G h(x) = (2/√π σ) ∫ dy e^{−y²/2σ²} h(√2 x − y) .   (3.33)

We need to study the RG eigenvalue problem:

L_G h_n(x) = λ_n h_n(x) ;   (3.34)

to do this we switch to Fourier space, where the stability operator acts as

L_G ĥ(t) = 2 e^{−σ²t²/4} ĥ(t/√2)   (3.35)

and the eigenvalue problem becomes:

λ_n ĥ_n(t) = 2 e^{−σ²t²/4} ĥ_n(t/√2) .   (3.36)

It is easy to check, by following the same steps used to solve the fixed-point equation, that the functions

ĥ_n(t) = e^{−σ²t²/2} (it)^n   (3.37)

solve equation (3.36) if we fix the eigenvalues to

λ_n = 2^{1−n/2} ,   (3.38)

for n = 0, 1, 2, 3, ....
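The eigenvalue relation (3.36)-(3.38) can be verified pointwise (a minimal sketch with σ = 1; the grid of t values is an arbitrary choice):

```python
import numpy as np

# Pointwise check of (3.36) with sigma = 1:
# 2 exp(-t^2/4) h_n(t/sqrt(2)) = 2^(1 - n/2) h_n(t),  h_n(t) = exp(-t^2/2) (it)^n.
t = np.linspace(-3.0, 3.0, 7)
for n in range(5):
    h = lambda t: np.exp(-t**2 / 2) * (1j * t)**n
    lhs = 2 * np.exp(-t**2 / 4) * h(t / np.sqrt(2))
    rhs = 2**(1 - n / 2) * h(t)
    print(n, np.allclose(lhs, rhs))   # True for every n
```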


The perturbations ĥ_0(t) and ĥ_1(t) are amplified by an RG transformation, since the respective eigenvalues λ_0 = 2 and λ_1 = √2 are bigger than one, and are called relevant; the direction ĥ_2(t) is marginal (λ_2 = 1), while all the others ĥ_3(t), ĥ_4(t), ... are suppressed (λ_3 = 1/√2, λ_4 = 1/2, ...) and are termed irrelevant.

In coordinate space the eigenfunctions (3.37) are given by the Chebyshev-Hermite polynomials, h_n(x) = σ^{−n} p_G(x) H_n(x/σ). The first few are:

H_0(x) = 1
H_1(x) = x
H_2(x) = x² − 1
H_3(x) = x³ − 3x
H_4(x) = x⁴ − 6x² + 3 ,   (3.39)

and in general we have:

H_n(x) = (−1)^n e^{x²/2} (d^n/dx^n) e^{−x²/2} .   (3.40)

These eigenfunctions describe the "tangent space" to the fixed-point Gaussian pdf, around which we can write:

p(x) = p_G(x) [1 + (ε₃/σ³) H₃(x/σ) + (ε₄/σ⁴) H₄(x/σ) + ...] ,

which is the first example of a perturbative expansion around a fixed point. What is the relation between the "couplings" ε_i and the cumulants c_i?

Not all eigen-perturbations are within our theory space. In fact one has:

∫ dx [p_G(x) + ε₀ h₀(x)] = ∫ dx p_G(x) + ε₀ ∫ dx h₀(x) = 1 + ε₀   ⇒   ε₀ = 0

and

∫ dx x [p_G(x) + ε₁ h₁(x)] = ∫ dx x p_G(x) + (ε₁/σ²) ∫ dx x² p_G(x) = ε₁   ⇒   ε₁ = 0 .

The marginal direction does not contribute either: a perturbation along h₂ would shift the variance, which is held fixed at σ² by (3.18), so ε₂ = 0. Thus the only two relevant directions are orthogonal to our theory space and the Gaussian fixed-point pdf attracts all other directions: the long-range collective behavior of any collection of iid random variables is described by a Gaussian! This is the central limit theorem and a manifestation of universality.

3.2.4 Convergence to the Gaussian

The CLT is valid only in the limit N → ∞, where N = 2^n is the number of random variables summed and n is the number of RG transformations performed. To find the finite-N corrections we consider a general pdf in the basin of attraction of the Gaussian pdf, with zero mean and with all the other cumulants finite. This has a characteristic function of the following general "cumulant expansion" form:

p̂(t) = exp[ Σ_{k=2}^∞ (c_k/k!) (it)^k ] ,

since c_0 = c_1 = 0. To study the convergence of a general pdf to the Gaussian pdf, we iterate the RG transformation n times:

R^n p̂(t) = [p̂(t/√(2^n))]^{2^n} = [p̂(t/√N)]^N
= exp[ N Σ_{k=2}^∞ (c_k/k!) (it/√N)^k ]
= exp[ −σ²t²/2 + Σ_{k=3}^∞ (c_k/k!) N^{1−k/2} (it)^k ]
= exp[ −σ²t²/2 + Σ_{k=3}^∞ (c_k^N/k!) (it)^k ] ,   (3.41)

where c_k^N = N^{1−k/2} c_k are the scale-dependent (running) cumulants (or couplings). We can write these "beta functions" as c_k^N = λ_k^n c_k, where the λ_k are the RG eigenvalues (3.38). We see again from (3.41) that in the limit N → ∞ the pdf p̂_N(t) = R^n p̂(t) converges to the Gaussian pdf.


Generally we are more interested in the behavior for large but finite N, the situation that we encounter in practice. In this regime we are already converging to the Gaussian pdf and thus we can assume the cumulants to be small (we are assuming they are all finite). If we fix σ = 1 and have c̃₃ ≪ 1 and c̃₄ ≪ 1, we can expand the exponential in (3.41). The terms can be arranged in powers of N^{−1/2}:

p̂_N(t) = e^{−t²/2} { 1 + (c̃₃/6√N) (it)³ + (c̃₄/24N) (it)⁴ + (c̃₃²/72N) (it)⁶ + O(N^{−3/2}) } .   (3.42)

In terms of the coordinate-space pdf we find:

p_N(x) = (e^{−x²/2}/√2π) { 1 + (1/√N) q_{1/2}(x) + (1/N) q_1(x) + O(N^{−3/2}) } ,   (3.43)

where the q_k(x) are polynomials depending on the normalized cumulants. Using (3.43) we can also calculate the cumulative distribution function (with F as defined in (3.3)):

F(x) = F_G(x) + (e^{−x²/2}/√2π) { (1/√N) Q_{1/2}(x) + (1/N) Q_1(x) + O(N^{−3/2}) } ,   (3.44)

where F_G(x) is given in (3.31) and the first two Q_k(x) polynomials are:

Q_{1/2}(x) = (ς/6)(x² − 1)
Q_1(x) = (ς²/72) x⁵ + (κ/24 − 5ς²/36) x³ + (5ς²/24 − κ/8) x ,   (3.45)

with ς ≡ c̃₃ and κ ≡ c̃₄ the skewness and the kurtosis as defined in (3.13) and (3.14).

Note that the convergence to the Gaussian takes place in the central part of the pdf; the tails converge only for N = ∞.
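As an example, here is a sketch (N, the threshold x and the exponential test distribution are arbitrary choices) comparing the empirical tail probability of a standardized sum of N centered exponentials with the plain Gaussian tail (3.31) and with the first correction in (3.44):

```python
import numpy as np
from scipy.special import erf

# Finite-N check of (3.44): tail probability of a standardized sum of N
# centered exponentials (skewness 2) vs. Gaussian tail plus the Q_1/2 term.
rng = np.random.default_rng(2)
N, samples, x = 16, 1_000_000, 1.5
s = (rng.exponential(size=(samples, N)).sum(axis=1) - N) / np.sqrt(N)

FG = 0.5 - 0.5 * erf(x / np.sqrt(2))       # Gaussian tail, eq. (3.31)
Q12 = (2.0 / 6.0) * (x**2 - 1)             # Q_1/2 with skewness = 2
corrected = FG + np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) * Q12 / np.sqrt(N)

print(np.mean(s > x), FG, corrected)   # the correction moves FG towards the data
```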


3.2.5 Law of large numbers

The law of large numbers is obtained instead by imposing

∫ dx p(x) = 1 ,   ∫ dx x p(x) = m ,

and working in the corresponding functional space. In this case the direction h₁(x) belongs to theory space and a pdf is attracted towards p_m(x) = δ(x − m). You can work out the details as an exercise.

3.2.6 Stable distributions

If we drop the requirement that the pdf has finite moments, in particular finite variance, then the exponent ν in the RG transformation (3.19) is not determined. In fact we are considering a different theory space for each different value of ν, and in each of these spaces we are interested in finding fixed points and in studying the RG flow around them.

In Fourier space the RG transformation (3.19) becomes:

Rp̂(t) = [p̂(t/2^ν)]² ,   (3.46)

and the fixed-point equation is now:

p̂∗(t) = [p̂∗(t/2^ν)]² .   (3.47)

Proceeding as we did before we find the general form:

p̂∗(t) = e^{−c|t|^α} ,   α = 1/ν .   (3.48)

To have an everywhere positive pdf we must demand 0 < α ≤ 2. This general class of fixed-point pdfs are called stable distributions; they are described by the following characteristic function:

p̂_{Lα}(t) = e^{iµt − c|t|^α} ,   (3.49)


Figure 3.2: Lévy probability density functions for (from top) α = 1/2, 1, 3/2, 2. Note that for smaller values of α the pdf is more peaked around zero but has progressively thicker tails.

with also c ≥ 0 and µ real. These distributions are also called Lévy distributions, of which the Gaussian distribution is the particular case α = 2 and c = σ²/2. For a generic value of α it is not usually possible to calculate the inverse Fourier transform analytically.

A case where this is possible is α = 1, where we recover the Cauchy (or Lorentzian) distribution:

p_{L1}(x) = A / (π²A² + (x − µ)²) ,   (3.50)

where c = πA.

Note that all the Lévy distributions with 0 < α < 2 have infinite variance: they are scale-free distributions, in the sense that there is no "characteristic scale" like the one set by a finite variance. For the distributions with 0 < α < 1 not even the mean is defined.

One can prove a generalized central limit theorem for Lévy distributions along the lines of our RG proof of the standard CLT.
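Stability is easy to see numerically (a sketch; the sample sizes are arbitrary): the sample mean of N standard Cauchy variables is again standard Cauchy for every N, so it never concentrates.

```python
import numpy as np

# Stability of the Cauchy (alpha = 1) fixed point: the sample mean of N
# standard Cauchy variables is again standard Cauchy, for any N.
rng = np.random.default_rng(3)
for N in (1, 10, 10_000):
    means = rng.standard_cauchy(size=(100_000, N)).mean(axis=1)
    # for a standard Cauchy the median of |X| is exactly 1
    print(N, np.median(np.abs(means)))   # stays near 1: no law of large numbers
```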


3.3 Entropy and information

The entropy, or information, associated to a random variable described by a pdf p(x) is the following:

I[p] = −∫ dx p(x) log p(x) .   (3.51)

The coarse-graining procedure burns information: the RG flow drives towards the pdf which maximizes the functional (3.51) within the functional space specified by (3.18).

In particular, a fixed-point pdf must be an extremum of (3.51) subject to the constraints (3.18). We can implement these constraints by employing Lagrange multipliers:

δ{ I[p] + α ∫ dx p(x) + β ∫ dx x p(x) + γ ∫ dx x² p(x) } = 0 .   (3.52)

Equation (3.52) is solved by:

p(x) = e^{−1+α+βx+γx²} ,   (3.53)

where

e^{−1+α} = 1/√(2πσ²) ,   β = 0 ,   γ = −1/2σ² ,

which gives back the Gaussian pdf (3.30) as expected. It is easy to calculate the information of the Gaussian:

I[p_G] = 1/2 + (1/2) log 2π + log σ = 1.41894... + log σ .

Can we prove that I[Rp] ≥ I[p]?
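A numerical experiment suggests the answer (a sketch reusing the grid RG step of section 3.2.1; the centered-exponential starting pdf and the grid are arbitrary choices):

```python
import numpy as np

# Check that the entropy/information (3.51) grows under the exact RG
# step (3.21), approaching the Gaussian value 1.41894... for sigma = 1.
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]

def info(p):
    q = np.where(p > 0, p, 1.0)        # p log p -> 0 where p = 0
    return -np.sum(p * np.log(q)) * dx

def R(p):                              # exact RG step, eq. (3.21)
    conv = np.convolve(p, p, mode="same") * dx
    return np.sqrt(2.0) * np.interp(np.sqrt(2.0) * x, x, conv)

p = np.where(x > -1.0, np.exp(-(x + 1.0)), 0.0)   # centered exponential, I[p] = 1
for step in range(5):
    print(step, info(p))               # increases towards 1.41894...
    p = R(p)
```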

3.4 The effective action and Cramér's theorem

The moment generating function of a random variable φ is⁵:

Z(J) = ⟨e^{Jφ}⟩ ,   Z(0) = 1 ,   Z^{(n)}(0) = ⟨φ^n⟩ ;

⁵We switch notation X → φ and t → J.

  • CHAPTER 3. PREDICTING THE UNCERTAIN 42

    the cumulant distribution function is the logarithm of the moment generatingfunction:

    W (J) = logZ(J) W (0) = 0

    W ′(0) = Z ′(0) W ′′(0) = Z ′′(0)− Z ′(0)2 .We can prove that W (J) is convex:

    W ′′(J) =Z(J)Z ′′(J)− Z ′(J)2

    Z(J)2=

    eJφ〉 〈

    φ2eJφ〉

    −〈

    φeJφ〉2

    〈eJφ〉2,

then using the Cauchy-Schwarz inequality ⟨φψ⟩² ≤ ⟨φ²⟩⟨ψ²⟩ in the form ⟨φe^{Jφ}⟩² ≤ ⟨e^{Jφ}⟩⟨φ²e^{Jφ}⟩ gives W″(J) ≥ 0 for all J. Thus one can define the so-called Cramér function, or rate function, as the Legendre transform of W(J):

Γ(ϕ) = sup_J [Jϕ − W(J)] .

Γ(ϕ) is also convex, and we call J_ϕ the solution of W′(J) = ϕ, so that Γ(ϕ) = J_ϕϕ − W(J_ϕ). We will call Γ(ϕ) the effective action.

The fundamental result of the theory of large deviations, due to Cramér, states that the probability of a deviation from the law of large numbers is given by:

P( (1/N) Σ_{i=1}^N φ_i > ϕ ) → e^{−NΓ(ϕ)}   for N → ∞ ,

and similarly for (1/N) Σ_{i=1}^N φ_i < ϕ. For a proof see [4]. We now look at two examples.

It is easy to find the rate function for a Gaussian random variable. Starting from the moment generating function (with m = 0):

Z(J) = (1/√(2πσ²)) ∫ dφ e^{−φ²/2σ²} e^{Jφ} = e^{σ²J²/2} ,

which gives W(J) = σ²J²/2. The average field is ϕ_J = W′(J) = Jσ², from which we obtain the current J_ϕ = ϕ/σ². The rate function is then

Γ(ϕ) = J_ϕϕ − W(J_ϕ) = ϕ²/σ² − (1/2) ϕ²/σ² = (1/2σ²)(ϕ − m)² .


Figure 3.3: Cramér's function, i.e. the effective action, for a discrete Bernoulli random variable, compared to a simulation.

Note that for a Gaussian pdf Γ(ϕ) = −log p(ϕ) − (1/2) log(2πσ²). What is the effective action of Lévy random variables?

The Bernoulli discrete random variable φ assumes the values 0, 1 with probability p = 1 − p = 1/2. The moment generating function is simply:

Z(J) = (1 + e^J)/2 ,

while the cumulant generating function is:

W(J) = −log 2 + log(1 + e^J) .

To perform the Legendre transform we solve W′(J) = ϕ:

W′(J) = e^J/(1 + e^J) = ϕ_J   ⇒   J_ϕ = log(ϕ/(1 − ϕ)) .

The effective action is:

Γ(ϕ) = J_ϕϕ − W(J_ϕ) = ϕ log(ϕ/(1 − ϕ)) + log 2 − log(1 + ϕ/(1 − ϕ)) = log 2 + ϕ log ϕ + (1 − ϕ) log(1 − ϕ) .


Cramér's theorem predicts:

(1/N) log P( (1/N) Σ_{i=1}^N φ_i > ϕ ) → −log 2 − ϕ log ϕ − (1 − ϕ) log(1 − ϕ) .

It is nice to compare this relation with a simulation; this is shown in Figure 3.3. For ϕ close to the mean we recover the CLT:

Γ(ϕ) = 2(ϕ − 1/2)² + O((ϕ − 1/2)⁴) = (1/2σ²)(ϕ − m)² + O((ϕ − m)⁴) ,

where we used m = p = 1/2 and σ² = p(1 − p) = 1/4, valid for a Bernoulli variable.
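A simulation of the kind behind Figure 3.3 can be set up in a few lines (a sketch; N, ϕ and the sample size are arbitrary choices, and the subexponential 1/√N prefactor of e^{−NΓ} makes the empirical rate approach Γ(ϕ) only slowly):

```python
import numpy as np

# Empirical large-deviation rate -log P(mean > phi) / N for fair Bernoulli
# variables, compared with Cramer's prediction Gamma(phi).
rng = np.random.default_rng(4)
phi = 0.6
gamma = np.log(2) + phi*np.log(phi) + (1 - phi)*np.log(1 - phi)  # = 0.0201...

for N in (50, 100, 200, 400):
    means = rng.binomial(N, 0.5, size=10_000_000) / N
    rate = -np.log(np.mean(means > phi)) / N
    print(N, rate)                     # decreases slowly towards gamma
print("Gamma(0.6) =", gamma)
```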

Bibliography

[1] B.V. Gnedenko and A.N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables, Addison-Wesley, Cambridge, MA, 1954.

[2] G. Jona-Lasinio, Renormalization group and probability theory, Physics Reports (2001) 1-31.

[3] P. Castiglione, M. Falcioni, A. Lesne and A. Vulpiani, Chaos and Coarse Graining in Statistical Mechanics, Cambridge University Press, 2008.

[4] R. Ellis, Entropy, Large Deviations, and Statistical Mechanics, Springer-Verlag, 2000.