
Computational Statistical Physics
Part I: Statistical Physics and Phase Transitions

Lucas Böttcher
August 2, 2019

ETH Zürich, Institute for Theoretical Physics, HIT K 11.3, Wolfgang-Pauli-Strasse 27, 8093 Zürich

402-0812-00L, FS 2019

Dates and Further Information

• Lectures: Friday 10.45–12.30 in HIT H51
• Exercises: Friday 8.45–10.30 in HIT F21
• Oral exams take place in August 2019
• Webpage: http://www.itp.phys.ethz.ch/education/computational-statphys.html

1

Teaching Assistants

• Maximilian Holst: holstm@student.ethz.ch
• Giovanni Balduzzi: gbalduzz@itp.phys.ethz.ch

2

Course Offered in (Selection)

• Mathematics, Computer Science (Master)
• Physics (Master)
• Material Science (Spezialvorlesung, Master)
• Civil Engineering (Spezialvorlesung)
• Biomedical Engineering (Master)

3

Learning Objectives

Students are able to

• distinguish between different computational methods to study statistical physics models.

• apply appropriate computational techniques to efficiently simulate a given equilibrium or non-equilibrium system.

• differentiate between different simulation methods to study particle and molecular interactions.

• apply appropriate molecular dynamics modeling concepts.

4

Phase Transitions

• 22.02. Introduction to statistical physics and the Ising model
• 01.03. Monte Carlo methods
• 08.03. Finite size methods and cluster algorithms
• 15.03. Histogram methods and renormalization group
• 22.03. Renormalization group and Boltzmann machines
• 29.03. Non-equilibrium phase transitions and summary

5

Molecular Dynamics

• 05.04. Molecular Dynamics, Verlet and Leapfrog methods
• 12.04. Optimization, Composed Particle Systems
• 19.04. and 26.04. ETH vacations
• 03.05. Arbitrary Shapes, Long-range Potentials
• 10.05. Nosé-Hoover thermostat, stochastic method, constant pressure ensemble

6

Event-driven Dynamics

• 17.05. Event driven, inelastic collisions, friction
• 24.05. Contact dynamics, Particles in Fluids (if time allows: ab initio MD, Car-Parrinello)
• 31.05. shifted to 24.05

7

Classical Statistical Mechanics

Classical Statistical Mechanics

8

Phase Space

Let us consider a classical physical system with N particles whose canonical coordinates and the corresponding conjugate momenta are given by q1, . . . , q3N and p1, . . . , p3N, respectively. The 6N-dimensional space Γ spanned by these coordinates defines the phase space.

Ludwig Boltzmann (1844-1906)

9

Phase Space

Phase spaces of an undamped (blue) and a damped (orange) harmonic oscillator (axes: position q and momentum p).

Ensemble Average

The assumption that all states in an ensemble are reached by the time evolution of the corresponding system is referred to as the ergodicity hypothesis. We define the ensemble average of a quantity Q(p, q) as

⟨Q⟩ = ∫ Q(p, q) ρ(p, q) dp dq / ∫ ρ(p, q) dp dq ,   (1)

where ρ(p, q) denotes the phase space density and dp dq is a shorthand notation for dp^3N dq^3N.

11

Hamiltonian

The dynamics of the considered N particles is described by their Hamiltonian H(p, q), i.e., the equations of motion are

ṗi = −∂H/∂qi  and  q̇i = ∂H/∂pi   (i = 1, . . . , 3N).   (2)

Moreover, the temporal evolution of a phase space element of volume V and boundary ∂V is given by

∂t ∫_V ρ dV + ∫_∂V ρ v · dA = 0,   (3)

where v = (ṗ1, . . . , ṗ3N, q̇1, . . . , q̇3N) is a generalized velocity vector.

12

Liouville Theorem

Applying the divergence theorem to Eq. (3), we find that ρ satisfies the continuity equation

∂ρ/∂t + ∇ · (ρv) = 0,   (4)

where ∇ = (∂/∂p1, . . . , ∂/∂p3N, ∂/∂q1, . . . , ∂/∂q3N). We can further simplify Eq. (4) since

∇ · v = Σ_{i=1}^{3N} ( ∂q̇i/∂qi + ∂ṗi/∂pi ) = Σ_{i=1}^{3N} ( ∂/∂qi ∂H/∂pi − ∂/∂pi ∂H/∂qi ) = 0.   (5)

13

Liouville Theorem

Rewriting Eq. (4) using Poisson brackets¹ yields Liouville's theorem

∂ρ/∂t = {H, ρ},   (7)

which describes the time evolution of the phase space density ρ.

¹The Poisson bracket is defined as

{u, v} = Σ_i ( ∂u/∂qi ∂v/∂pi − ∂u/∂pi ∂v/∂qi ) = −{v, u}.   (6)

14

Thermal Equilibrium

In thermal equilibrium, the system reaches a steady state in which the distribution of the configurations is constant and time-independent, i.e., ∂ρ/∂t = 0. Liouville's theorem leads to the following condition:

v · ∇ρ = {H, ρ} = 0.   (8)

The last equation is satisfied if ρ depends on quantities which are conserved during the time evolution of the system. We then use such a phase space density ρ to replace the time average

⟨Q⟩ = lim_{T→∞} (1/T) ∫_0^T Q(p(t), q(t)) dt   (9)

by its ensemble average as defined by Eq. (1).

Ensemble Average

In the subsequent sections, we also consider discrete configurations X. In this case, we define the ensemble average as

⟨Q⟩ = (1/Ω) Σ_X Q(X) ρ(X),   (10)

where Ω is the normalizing volume such that Ω⁻¹ Σ_X ρ(X) = 1. With the help of ensemble averages, systems can be described by means of some macroscopic quantities, such as temperature, energy and pressure.

16

Ensembles

• Microcanonical ensemble: constant E, V, N
• Canonical ensemble: constant T, V, N
• Canonical pressure ensemble: constant T, p, N
• Grandcanonical ensemble: constant T, V, µ

Josiah W. Gibbs (1839-1903)

17

Microcanonical Ensemble

The microcanonical ensemble is defined by a constant number of particles, volume and energy. Thus, any configuration X of the system has the same energy E(X) = const. The phase space density is also constant and given by

ρ(X) = (1/Zmc) δ(H(X) − E),   (11)

with Zmc being the partition function of the microcanonical ensemble, Zmc = Σ_X δ(H(X) − E).

18

Canonical Ensemble

In a canonical ensemble setup, the system we study (system 1) is coupled to a heat reservoir (system 2) that guarantees a constant temperature.

19

Canonical Ensemble

At a given temperature T, the probability for a system to be in a certain configuration X with energy E(X) is given by

ρ(X) = (1/ZT) exp[−E(X)/(kBT)],   (12)

with

ZT = Σ_X exp[−E(X)/(kBT)]   (13)

being the partition function of the canonical ensemble. According to the prior definition in Eq. (10), the ensemble average of a quantity Q is then given by

⟨Q⟩ = (1/ZT) Σ_X Q(X) e^{−E(X)/(kBT)}.   (14)

20

Ising Model

Ising Model

An illustration of the interaction of a magnetic dipole (black) with its nearest neighbors (dark grey) on a two-dimensional lattice.

Ising Model

We consider a two-dimensional lattice with sites σi ∈ {1, −1} which only interact with their nearest neighbors. Their interaction is described by the Hamiltonian

H(σ) = −J Σ_{⟨i,j⟩} σi σj − H Σ_{i=1}^{N} σi,   (15)

where the first term denotes the interaction between all nearest neighbors, represented by a sum over ⟨i, j⟩, and the second one the interaction of each site with an external magnetic field H.

• Ferromagnetic case: J > 0 (parallel spins)
• Antiferromagnetic case: J < 0 (anti-parallel spins)
• No interaction: J = 0

22

Phase Transition in the Ferromagnetic Case

The magnetization M(T, H) for different fields H ≥ 0 as a function of T. For T ≤ Tc, the system is characterized by a ferromagnetic phase. The black solid line represents the spontaneous magnetization MS(T) for H = 0 and should be interpreted in the sense that MS(T) = lim_{H→0⁺} M(T, H).

23

Phase Transition in the Ferromagnetic Case

The first-order transition as a consequence of a sign change of the external field.

24

Order Parameter

The magnetization is defined as

M(T, H) = ⟨ (1/N) Σ_{i=1}^{N} σi ⟩,   (16)

and corresponds to the ensemble average of the mean value of all spins.

On average, M(T) vanishes since for every configuration there exists one of opposite sign which neutralizes the other one. As a consequence, we define the order parameter of the Ising model as

MS(T) = lim_{H→0⁺} ⟨ (1/N) Σ_{i=1}^{N} σi ⟩   (17)

and refer to it as the spontaneous magnetization.

Magnetic Domains

The simulations have been performed on a square lattice with 512 × 512 sites using https://mattbierbaum.github.io/ising.js/.

26

Critical Exponents I

In the vicinity of the critical temperature, for T < Tc, the spontaneous magnetization scales as

MS(T) ∝ (Tc − T)^β.   (18)

For T = Tc and H → 0, we find the following scaling:

M(T = Tc, H) ∝ H^{1/δ}.   (19)

The exponents β and δ are so-called critical exponents and characterize, together with other exponents, the underlying phase transition.

2D: β = 1/8, δ = 15
3D: β ≈ 0.326, δ ≈ 4.790

27

Fluctuations

The magnetic susceptibility is defined as the change of the magnetization M in response to an applied magnetic field H, i.e.,

χ(T) = ∂M(T, H)/∂H.   (20)

We now use the definition of the spontaneous magnetization given by Eq. (17) and plug it into Eq. (20), leading to

χ(T) = lim_{H→0⁺} ∂⟨M(T, H)⟩/∂H
     = lim_{H→0⁺} ∂/∂H [ (1/N) Σ_σ (Σ_{i=1}^{N} σi) exp[(E0 + H Σ_{i=1}^{N} σi)/(kBT)] / Σ_σ exp[(E0 + H Σ_{i=1}^{N} σi)/(kBT)] ],

where the denominator is the partition function ZT(H).

28

Fluctuations

Using the product rule yields

χ(T) = lim_{H→0⁺} { 1/(N kBT) Σ_σ (Σ_{i=1}^{N} σi)² exp[(E0 + H Σi σi)/(kBT)] / ZT(H)
       − 1/(N kBT) [ Σ_σ (Σ_{i=1}^{N} σi) exp[(E0 + H Σi σi)/(kBT)] ]² / [ZT(H)]² }
     = N/(kBT) [ ⟨MS(T)²⟩ − ⟨MS(T)⟩² ] ≥ 0.   (21)

The last equation defines the fluctuation-dissipation theorem for the magnetic susceptibility. Analogously, the specific heat is connected to energy fluctuations by

C(T) = ∂⟨E⟩/∂T = 1/(kBT²) [ ⟨E(T)²⟩ − ⟨E(T)⟩² ].   (22)

29

Critical Exponents II

Similarly to the power-law scaling of the spontaneous magnetization defined in Eq. (18), we find for the magnetic susceptibility and the specific heat in the vicinity of Tc

χ(T) ∝ |Tc − T|^{−γ},   (23)
C(T) ∝ |Tc − T|^{−α}.   (24)

The corresponding exponents are
2D: γ = 7/4, α = 0
3D: γ ≈ 1.24, α ≈ 0.11

30

Critical Exponents II

Susceptibility χ(T) and specific heat C(T) as a function of temperature for the three-dimensional Ising model. Both quantities diverge at the critical temperature Tc in the thermodynamic limit.

31

Correlation Length

The correlation function is defined by

G(r1, r2; T, H) = ⟨σ1σ2⟩ − ⟨σ1⟩⟨σ2⟩,   (25)

where the vectors r1 and r2 point to the lattice sites 1 and 2.

32

Correlation Length

The correlation function is defined by

G(r1, r2; T, H) = ⟨σ1σ2⟩ − ⟨σ1⟩⟨σ2⟩,   (26)

where the vectors r1 and r2 point to the lattice sites 1 and 2. If the system is translationally and rotationally invariant, the correlation function only depends on r = |r1 − r2|. At the critical point, the correlation function decays as

G(r; Tc, 0) ∝ r^{−d+2−η},   (27)

where η is another critical exponent and d the dimension of the system.

2D: η = 1/4
3D: η ≈ 0.036

33

Correlation Length

For temperatures away from the critical temperature, the correlation function exhibits an exponential decay

G(r; T, 0) ∝ r^{−ϑ} e^{−r/ξ},   (28)

where ξ defines the correlation length. The exponent ϑ equals 2 above and 1/2 below the transition point. In the vicinity of Tc, the correlation length ξ diverges since

ξ(T) ∝ |T − Tc|^{−ν}.   (29)

2D: ν = 1
3D: ν ≈ 0.63

34

Critical Exponents and Universality

The aforementioned six critical exponents are connected by four scaling laws

α + 2β + γ = 2   (Rushbrooke),   (30)
γ = β(δ − 1)   (Widom),   (31)
γ = (2 − η)ν   (Fisher),   (32)
2 − α = dν   (Josephson),   (33)

which have been derived in the context of the phenomenological scaling theory for ferromagnetic systems [1, 2]. Due to these relations, the number of independent exponents reduces to two.

35

Critical Exponents and Universality

Universal scaling for five different gases. The figure is taken from Ref. [3].

Critical Exponents and Universality

The critical exponents of the Ising model in two and three dimensions [4].

Exponent   d = 2   d = 3
α          0       0.110(1)
β          1/8     0.3265(3)
γ          7/4     1.2372(5)
δ          15      4.789(2)
η          1/4     0.0364(5)
ν          1       0.6301(4)

37

Monte Carlo Methods

Monte Carlo Methods

The main steps of the Monte Carlo sampling are

1. Choose randomly a new configuration in phase space based on a Markov chain.

2. Accept or reject the new configuration, depending on the strategy used (e.g., Metropolis or Glauber dynamics).

3. Compute the physical quantity and add it to the averaging procedure.

4. Repeat the previous steps.

38

Markov Chains

Example of an energy distribution with a system-size dependence of the distribution width, which scales as ∝ √(L^d), where d is the system dimension.

Markov Chains

In terms of a Markov chain, the transition probability from one state to another is given by the probability of a new state to be proposed (T) and the probability of this state to be accepted (A). Specifically, T(X → Y) is the probability that a new configuration Y is proposed, starting from configuration X. For the thermodynamic systems we consider, the transition probability fulfills three conditions:

1. Ergodicity: any configuration in the phase space must be reachable within a finite number of steps,
2. Normalization: Σ_Y T(X → Y) = 1,
3. Reversibility: T(X → Y) = T(Y → X).

40

Markov Chains

Once a configuration is proposed, we can accept the new configuration with probability A(X → Y) or reject it with probability 1 − A(X → Y). The probability of the Markov chain is then given by

W(X → Y) = T(X → Y) · A(X → Y).   (34)

41

Markov Chains

We denote the probability to find the system in a certain configuration X at virtual time τ by p(X, τ). The master equation describes the time evolution of p(X, τ) and is given by

dp(X, τ)/dτ = Σ_Y p(Y) W(Y → X) − Σ_Y p(X) W(X → Y).   (35)

42

Markov Chains

A stationary state pst is reached if dp(X, τ)/dτ = 0. The probability of the Markov chain fulfills the following properties:

1. Ergodicity: any configuration must be reachable: ∀X, Y: W(X → Y) ≥ 0,
2. Normalization: Σ_Y W(X → Y) = 1,
3. Homogeneity: Σ_Y pst(Y) W(Y → X) = pst(X).

43

Markov Chains

To efficiently sample the relevant regions of the phase space, the probability of the Markov chain W(·) has to depend on the system properties. To achieve that, we set the stationary distribution pst equal to the equilibrium distribution of the physical system peq (a real and measurable distribution):

dp(X, τ)/dτ = 0  ⇔  pst = peq.   (36)

44

Markov Chains

45

Markov Chains

It then follows from the stationary state condition of the Markov chain that

Σ_Y peq(Y) W(Y → X) = Σ_Y peq(X) W(X → Y).   (37)

A sufficient condition for this to be true is

peq(Y) W(Y → X) = peq(X) W(X → Y),   (38)

which is referred to as the condition of detailed balance.

46

Markov Chains

As an example, in a canonical ensemble at fixed temperature T, the equilibrium distribution is given by the Boltzmann distribution

peq(X) = (1/ZT) exp[−E(X)/(kBT)]   (39)

with the partition function ZT = Σ_X exp[−E(X)/(kBT)].

47

M(RT)2 Algorithm

Nicholas C. Metropolis (1915-1999) introduced the M(RT)2 sampling technique.

M(RT)2 Algorithm

One possible choice of the acceptance probability fulfilling the detailed balance condition is given by

A(X → Y) = min[1, peq(Y)/peq(X)],   (40)

which can be obtained by rewriting Eq. (38).

49

M(RT)2 Algorithm

In the case of the canonical ensemble with peq(X) = (1/ZT) exp[−E(X)/(kBT)], the acceptance probability becomes

A(X → Y) = min[1, exp(−∆E/(kBT))],   (41)

where ∆E = E(Y) − E(X). The last equation implies that the Monte Carlo step is always accepted if the energy decreases, and if the energy increases, it is accepted with probability exp(−∆E/(kBT)).

50

M(RT)2 Algorithm

In summary, the steps of the M(RT)2 algorithm applied to the Ising model are (a minimal code sketch follows below):

M(RT)2 Algorithm

• Randomly choose a lattice site i,
• Compute ∆E = E(Y) − E(X) = 2Jσihi,
• Flip the spin if ∆E ≤ 0, otherwise accept the flip with probability exp(−∆E/(kBT)),

with hi = Σ_{⟨i,j⟩} σj and E = −J Σ_{⟨i,j⟩} σiσj.

51

Glauber Dynamics

The Metropolis algorithm is not the only possible choice to fulfill the detailed balance condition. Another acceptance probability, given by

AG(X → Y) = exp(−∆E/(kBT)) / [1 + exp(−∆E/(kBT))],   (42)

has been suggested by Glauber.

52

Glauber Dynamics

A comparison of the acceptance probabilities A(X → Y) of M(RT)2 and Glauber dynamics as a function of β∆E.

53

Glauber Dynamics

In contrast to the M(RT)2 acceptance probability, updates with ∆E = 0 are not always accepted but only with probability 1/2.

To prove that Eq. (42) satisfies the condition of detailed balance, we have to show that

peq(Y) AG(Y → X) = peq(X) AG(X → Y),   (43)

since T(Y → X) = T(X → Y).

54

Glauber Dynamics

The last equation is equivalent to

peq(Y)/peq(X) = AG(X → Y)/AG(Y → X),   (44)

which is fulfilled since

peq(Y)/peq(X) = exp(−∆E/(kBT))   (45)

and

AG(X → Y)/AG(Y → X) = { exp(−∆E/(kBT)) / [1 + exp(−∆E/(kBT))] } · { exp(∆E/(kBT)) / [1 + exp(∆E/(kBT))] }^{−1} = exp(−∆E/(kBT)).   (46)

55

Glauber Dynamics

As in the M(RT)2 algorithm, only the local configuration around the lattice site is relevant for the update procedure. Furthermore, with J = 1, the probability to flip spin σi is

AG(X → Y) = exp(−2σihi/(kBT)) / [1 + exp(−2σihi/(kBT))]   (47)

with hi = Σ_{⟨i,j⟩} σj being the local field, and X = (. . . , σi−1, σi, σi+1, . . .) and Y = (. . . , σi−1, −σi, σi+1, . . .) the initial and final configuration, respectively.

56

Glauber Dynamics

We abbreviate the probability defined by Eq. (47), evaluated for σi = −1 (i.e., the probability of setting the spin to +1), as pi. The spin flip and no-flip probabilities can then be expressed as

p_flip = pi for σi = −1, 1 − pi for σi = +1,
p_no flip = 1 − pi for σi = −1, pi for σi = +1.   (48)

57

Glauber Dynamics

A possible implementation is

σi(τ + 1) = −σi(τ) · sign(pi − z),   (49)

with z ∈ (0, 1) being a uniformly distributed random number, or

σi(τ + 1) = +1 with probability pi, −1 with probability 1 − pi,  where  pi = exp(2βhi) / [1 + exp(2βhi)].   (50)

This method does not depend on the spin value at time τ and is called heat bath Monte Carlo.

58

Binary Mixtures

An example of a binary mixture consisting of two different atoms A and B.

59

Binary Mixtures

Kawasaki dynamics

• Choose an A−B bond,
• Compute ∆E for the exchange A−B → B−A,
• Metropolis: if ∆E ≤ 0 exchange, else exchange with probability p = exp(−∆E/(kBT)),
• Glauber: exchange with probability p = exp(−∆E/(kBT)) / [1 + exp(−∆E/(kBT))].

60

Creutz Algorithm

61

Creutz Algorithm

The movement in phase space is therefore not strictly constrained to a subspace of constant energy, but there is a certain additional volume in which we can move freely. The condition of constant energy is softened by introducing a so-called demon, which corresponds to a small reservoir of energy ED that can store a certain maximum energy Emax.

62

Creutz Algorithm

Creutz Algorithm

• Choose a site,
• Compute ∆E for the spin flip,
• Accept the change if Emax ≥ ED − ∆E ≥ 0.

A minimal code sketch of this update follows below.
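The sketch below implements one attempted Creutz update for the two-dimensional Ising model (J = 1, periodic boundaries); the function name and interface are illustrative assumptions.

```python
import numpy as np

def creutz_step(spins, E_demon, E_max, rng):
    """One attempted spin flip of the Creutz (demon) algorithm for J = 1."""
    L = spins.shape[0]
    i, j = rng.integers(0, L, size=2)
    h = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j] +
         spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
    dE = 2 * spins[i, j] * h
    # accept only if the demon can absorb or provide the energy difference
    if 0 <= E_demon - dE <= E_max:
        spins[i, j] *= -1
        E_demon -= dE
    return E_demon
```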

63

Creutz Algorithm

The demon energy ED is exponentially distributed. Based on the Boltzmann factor, it is possible to extract the inverse temperature, here β = (kBT)⁻¹ = 2.25.

Q2R

In the case of Emax → 0, the Creutz algorithm resembles a totalistic cellular automaton called Q2R [5]. The update rules on a square lattice for spins σij ∈ {0, 1} are given by

σij(τ + 1) = f(xij) ⊕ σij(τ)   (51)

with

xij = σi−1j + σi+1j + σij−1 + σij+1  and  f(x) = 1 if x = 2, 0 if x ≠ 2.   (52)
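The following sketch applies the Q2R rule of Eqs. (51) and (52) to one checkerboard sublattice of a periodic square lattice; the sublattice scheduling is an assumption not stated above, but it is a common way to keep the dynamics energy-conserving.

```python
import numpy as np

def q2r_sublattice_update(sigma, parity):
    """Apply the Q2R rule to one checkerboard sublattice (sigma holds 0/1 spins)."""
    x = (np.roll(sigma, 1, axis=0) + np.roll(sigma, -1, axis=0) +
         np.roll(sigma, 1, axis=1) + np.roll(sigma, -1, axis=1))
    f = (x == 2).astype(sigma.dtype)              # f(x) = 1 iff exactly two neighbours are 1
    i, j = np.indices(sigma.shape)
    mask = (i + j) % 2 == parity                  # update only one sublattice
    sigma[mask] = np.bitwise_xor(f, sigma)[mask]  # sigma <- f(x) XOR sigma
    return sigma
```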

65

Boundary Conditions

For finite lattices, the following boundary conditions may be used:

• open boundaries, i.e., no neighbors at the edges of the system,
• fixed boundary conditions,
• and periodic boundaries.

66

Temporal Correlations

According to the definition of a Markov chain, the dependence of a quantity A on virtual time τ is given by

⟨A(τ)⟩ = Σ_X p(X, τ) A(X) = Σ_X p(X, τ0) A(X(τ)).   (53)

In the second step of the last equation, we used the fact that the average is taken over an ensemble of initial configurations X(τ0) which evolve according to Eq. (35) [6].

67

Temporal Correlations

For some τ0 < τ, the non-linear correlation function

Φ^nl_A(τ) = [⟨A(τ)⟩ − ⟨A(∞)⟩] / [⟨A(τ0)⟩ − ⟨A(∞)⟩]   (54)

is a measure to quantify the deviation of A(τ) from A(∞) relative to the deviation of A(τ0) from A(∞).

68

Temporal Correlations

The non-linear correlation time τ^nl_A describes the relaxation towards equilibrium and is defined as²

τ^nl_A = ∫_0^∞ Φ^nl_A(τ) dτ.   (56)

²If we consider an exponential decay of Φ^nl_A(τ), we find that this definition is meaningful since ∫_0^∞ exp(−τ/τ^nl_A) dτ = τ^nl_A.   (55)

69

Temporal Correlations

In the vicinity of the critical temperature Tc, we observe the so-called critical slowing down of our dynamics, i.e., the non-linear correlation time is described by the power law

τ^nl_A ∝ |T − Tc|^{−z^nl_A}   (57)

with z^nl_A being the non-linear dynamical critical exponent. This is very bad news, because the last equation implies that the time needed to reach equilibrium diverges at Tc.

70

Temporal Correlations

The linear correlation function of two quantities A and B in equilibrium is defined as

Φ_AB(τ) = [⟨A(τ0)B(τ)⟩ − ⟨A⟩⟨B⟩] / [⟨AB⟩ − ⟨A⟩⟨B⟩]   (58)

with

⟨A(τ0)B(τ)⟩ = Σ_X p(X, τ0) A(X(τ0)) B(X(τ)).

As τ goes to infinity, Φ_AB(τ) decreases from unity to zero.

71

Temporal Correlations

If A = B, we call Eq. (58) the autocorrelation function. For the spin-spin correlation in the Ising model we obtain

Φ_σ(τ) = [⟨σ(τ0)σ(τ)⟩ − ⟨σ(τ0)⟩²] / [⟨σ²(τ0)⟩ − ⟨σ(τ0)⟩²].   (59)

72

Temporal Correlations

The linear correlation time τ_AB describes the relaxation in equilibrium:

τ_AB = ∫_0^∞ Φ_AB(τ) dτ.   (60)

73

Temporal Correlations

As in the case of the non-linear correlation time, in the vicinity of Tc we observe a critical slowing down, i.e.,

τ_AB ∝ |T − Tc|^{−z_A},   (61)

with z_A being the linear dynamical critical exponent.

74

Temporal Correlations

The dynamical exponents for spin correlations turn out to be

z_σ = 2.16 (2D),   (62)
z_σ = 2.09 (3D).   (63)

There is a conjectured relation between the Ising critical exponents and the critical dynamical exponents for spin (σ) and energy (E) correlations. The relations

z_σ − z^nl_σ = β,   (64)
z_E − z^nl_E = 1 − α,   (65)

are numerically well established but have not yet been proven analytically.

75

Decorrelated Configurations

Connecting this behavior with the one observed for the correlation time described by Eq. (60) yields

τ_AB ∝ |T − Tc|^{−z_AB} ∝ L^{z_AB/ν}.   (66)

76

Decorrelated Configurations

77

Decorrelated Configurations

To ensure that we do not sample correlated configurations, one should

• first reach equilibrium (discard the first n0 = c τ^nl(T) configurations),
• only sample every ne-th configuration, with ne = c τ(T),
• and at Tc use n0 = c L^{z^nl/ν} and ne = c L^{z/ν},

where c ≈ 3 is a "safety factor" to make sure to discard enough samples.

78

Finite Size Methods

Finite size methods

Divergent behavior at Tc

χ(T) ∝ |Tc − T|^{−γ},
C(T) ∝ |Tc − T|^{−α},
ξ(T) ∝ |Tc − T|^{−ν}.   (67)

79

Finite size methods

The system-size dependence of the susceptibility χ(T, L) for L = 8, 12, 16, 20 and the corresponding finite size scaling: plotting χ(T, L) L^{−γ/ν} against L^{1/ν}(T − Tc)/Tc collapses the curves onto a single scaling function.

80

Finite size methods

The finite size scaling relation of the susceptibility is given by

χ(T, L) = L^{γ/ν} F_χ[(T − Tc) L^{1/ν}],   (68)

where F_χ is called the susceptibility scaling function³.

³Based on Eq. (23), we can infer that F_χ[(T − Tc) L^{1/ν}] ∝ (|T − Tc| L^{1/ν})^{−γ} as L → ∞.

81

Finite size methods

In the case of the magnetization, the corresponding finite size scaling relation is

MS(T, L) = L^{−β/ν} F_MS[(T − Tc) L^{1/ν}].   (69)

82

Binder Cumulant

We still need a method to better determine Tc and therefore make use of the Binder cumulant

UL(T) = 1 − ⟨M⁴⟩_L / (3⟨M²⟩²_L),   (70)

which is independent of the system size L at Tc since

⟨M⁴⟩_L / (3⟨M²⟩²_L) = L^{−4β/ν} F_{M⁴}[(T − Tc) L^{1/ν}] / ( 3 ( L^{−2β/ν} F_{M²}[(T − Tc) L^{1/ν}] )² ) = F_C[(T − Tc) L^{1/ν}].   (71)
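A minimal sketch of how UL can be estimated from a series of magnetization samples recorded during a simulation; the synthetic Gaussian test data merely illustrate the T > Tc limit, where UL ≈ 0.

```python
import numpy as np

def binder_cumulant(magnetization_samples):
    """Estimate U_L = 1 - <M^4> / (3 <M^2>^2) from Monte Carlo samples of M."""
    m = np.asarray(magnetization_samples, dtype=float)
    m2 = np.mean(m ** 2)
    m4 = np.mean(m ** 4)
    return 1.0 - m4 / (3.0 * m2 ** 2)

# Example with synthetic samples: a Gaussian M (as for T > T_c) gives U_L close to 0.
rng = np.random.default_rng(0)
print(binder_cumulant(rng.normal(0.0, 0.1, size=100_000)))
```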

83

Binder Cumulant

The distribution P(M) of the magnetization M above and below the critical temperature Tc.

84

Binder Cumulant

For T > Tc, the magnetization is described by a Gaussian distribution

PL(M) = √(L^d/(π σL)) exp[−M² L^d / σL],   (72)

with σL = 2 kB T χL. Since the fourth moment equals three times the second moment squared, i.e.,

⟨M⁴⟩_L = 3 ⟨M²⟩²_L,   (73)

it follows that UL(T) must be zero for T > Tc.

85

Binder Cumulant

Below the critical temperature (T < Tc), there exist one ground state with positive and one with negative magnetization, and the corresponding distribution is given by

PL(M) = (1/2) √(L^d/(π σL)) { exp[−(M − MS)² L^d / σL] + exp[−(M + MS)² L^d / σL] }.   (74)

86

Binder Cumulant

For this distribution, it holds that ⟨M⁴⟩_L = ⟨M²⟩²_L and therefore UL(T) = 2/3 for T < Tc. In summary, we demonstrated that

UL(T) = 2/3 for T < Tc,  const. for T = Tc,  0 for T > Tc.   (75)

87

Binder Cumulant

The Binder cumulant UL(T) as a function of temperature for L = 8, 12, 16, 20 (left: full temperature range; right: zoom into the interval 4.49 ≤ T ≤ 4.52 around the intersection point).

88

Corrections to Scaling

Far away from Tc we cannot observe a clear power law behavior anymore, and corrections to scaling have to be employed, i.e.,

M(T) = A0 (Tc − T)^β + A1 (Tc − T)^{β1} + . . . ,   (76)
ξ(T) = C0 (Tc − T)^{−ν} + C1 (Tc − T)^{−ν1} + . . . ,   (77)

with β1 > β and ν1 < ν.

89

Corrections to Scaling

These corrections are very important for high-quality data, where the errors are small and the deviations become visible. The scaling functions must also be generalized as

M(T, L) = L^{−β/ν} F_M[(T − Tc) L^{1/ν}] + L^{−x} F¹_M[(T − Tc) L^{1/ν}] + . . .   (78)

with x = max[β1/ν, β/ν1, β/ν − 1].

90

First Order Transition

The magnetization exhibits a switching behavior if the field vanishes. For non-zero magnetic fields, the magnetization is driven in the direction of the field. The figure is taken from Ref. [7].

91

First Order Transition

Hysteresis can be found by varying the field from negative to positive values and back. The figure is taken from Ref. [7].

92

First Order Transition

Binder showed that the magnetization as a function of the field H is described by tanh(αL^d) if the distribution of the magnetization is given by Eq. (74) [7]. Specifically, we find for the magnetization and the susceptibility

M(H) = χ^D_L H + ML tanh(βH ML L^d),   (79)
χL(H) = ∂M/∂H = χ^D_L + β ML² L^d / cosh²(βH ML L^d).   (80)

93

First Order Transition

Similarly to the scaling of a second-order transition, we can scale the maximum of the susceptibility (χL(H = 0) ∝ L^d) and the width of the peak (∆χL ∝ L^{−d}). To summarize, a first order phase transition is characterized by

1. a bimodal distribution of the order parameter,
2. stochastic switching between the two states in small systems,
3. hysteresis of the order parameter when changing the field,
4. a scaling of the order parameter or response function according to Eq. (80).

94

Cluster Algorithms

Potts Model

The Hamiltonian of the system is defined as

H = −J∑⟨i,j⟩

δσiσj −H∑i

σi, (81)

where σi ∈ 1, ..., q and δσiσj is unity when nodes i and j arein the same state. The Potts model exhibits a first ordertransition at the critical temperature in two dimensions forq > 4, and for q > 2 for dimensions larger than two.

95

The Kasteleyn and Fortuin Theorem

We consider the Potts model not on a square lattice but on an arbitrary graph of nodes connected with bonds ν. Each node has q possible states, and each connection leads to an energy cost of unity if two connected nodes are in a different state and of zero if they are in the same state, i.e.,

E = J Σ_ν ε_ν  with  ε_ν = 0 if the endpoints are in the same state, 1 otherwise.   (82)

96

The Kasteleyn and Fortuin Theorem

Contraction and deletion on a graph.

97

The Kasteleyn and Fortuin Theorem

The partition function is the sum over all possible configurations weighted by the Boltzmann factor and thus given by

Z = Σ_X e^{−βE(X)} = Σ_X e^{−βJ Σ_ν ε_ν} = Σ_X Π_ν e^{−βJ ε_ν},   (83)

where we used Eq. (82) in the second step.

98

The Kasteleyn and Fortuin Theorem

We now consider a graph where bond ν1 connects two nodes i and j with states σi and σj, respectively. If we delete bond ν1, the partition function is

Z_D = Σ_X Π_{ν≠ν1} e^{−βJ ε_ν}.   (84)

99

The Kasteleyn and Fortuin Theorem

We can thus rewrite Eq. (83) as

Z = Σ_X e^{−βJ ε_{ν1}} Π_{ν≠ν1} e^{−βJ ε_ν}
  = Σ_{X: σi=σj} Π_{ν≠ν1} e^{−βJ ε_ν} + e^{−βJ} Σ_{X: σi≠σj} Π_{ν≠ν1} e^{−βJ ε_ν},

where the first part is the partition function of the contracted graph Z_C and the second part is given by the identity

Σ_{X: σi≠σj} Π_{ν≠ν1} e^{−βJ ε_ν} = Σ_X Π_{ν≠ν1} e^{−βJ ε_ν} − Σ_{X: σi=σj} Π_{ν≠ν1} e^{−βJ ε_ν} = Z_D − Z_C.   (85)

100

The Kasteleyn and Fortuin Theorem

Summarizing the last results, we find

Z = Z_C + e^{−βJ}(Z_D − Z_C) = p Z_C + (1 − p) Z_D,   (86)

where p = 1 − e^{−βJ}. To be more precise, we expressed the partition function Z in terms of the contracted and deleted partition functions at bond ν1. We apply the same procedure to another bond ν2 and find

Z = p² Z_{Cν1,Cν2} + p(1 − p) Z_{Cν1,Dν2} + (1 − p)p Z_{Dν1,Cν2} + (1 − p)² Z_{Dν1,Dν2}.   (87)

101

The Kasteleyn and Fortuin Theorem

After applying these operations to every bond, the graph is reduced to a set of separated points corresponding to clusters of nodes which are connected and in the same one of the q states. The partition function reduces to

Z = Σ_{configurations of bond percolation} q^{# of clusters} p^c (1 − p)^d = ⟨q^{# of clusters}⟩_b,   (88)

where c and d are the numbers of contracted and deleted bonds, respectively. In the limit of q → 1, one obtains the partition function of bond percolation⁴.

⁴In bond percolation, an edge of a graph is occupied with probability p and vacant with probability 1 − p.

102

Coniglio-Klein Clusters

The probability of a given cluster C to be in a certain state σ0 is independent of the state itself, i.e.,

p(C, σ0) = p^{c_C} (1 − p)^{d_C} Σ_{bond percolation without cluster C} q^{# of clusters} p^c (1 − p)^d.   (89)

103

Coniglio-Klein Clusters

This implies that flipping this particular cluster has no effect on the partition function (and therefore the energy), so that it is possible to accept the flip with probability one. This can be seen by looking at the detailed balance condition of the system

p(C, σ1) W[(C, σ1) → (C, σ2)] = p(C, σ2) W[(C, σ2) → (C, σ1)]   (90)

and using p(C, σ1) = p(C, σ2).

104

Coniglio-Klein Clusters

We then obtain for the acceptance probabilities

A[(C, σ2) → (C, σ1)] = min[1, p(C, σ2)/p(C, σ1)] = 1   (M(RT)2),
A[(C, σ2) → (C, σ1)] = p(C, σ2) / [p(C, σ1) + p(C, σ2)] = 1/2   (Glauber).   (91)

Based on these insights, we introduce cluster algorithms which are much faster than single-spin-flip algorithms and less prone to the problem of critical slowing down.

105

Swendsen-Wang Algorithm

Swendsen-Wang algorithm

• Occupy the bonds with probability p = 1 − e^{−βJ} if sites are in the same state.
• Identify the clusters with the Hoshen-Kopelman algorithm.
• Flip the clusters with probability 1/2 for Ising, or always choose a new state for q > 2.
• Repeat the procedure.

106

Wolff Algorithm

Wolff algorithm

• Choose a site randomly.
• If the neighboring sites are in the same state, add them to the cluster with probability p = 1 − e^{−βJ}.
• Repeat this for any site on the boundaries of the cluster, until all the bonds of the cluster have been checked exactly once.
• Choose a new state for the cluster.
• Repeat the procedure (a minimal code sketch follows below).

107

Other Ising-like Models

One of the possible generalizations of the Ising model is the so-called n-vector model.

The Hamiltonian resembles the one of the Potts model in the sense that it favors spin alignment:

H = −J Σ_{⟨i,j⟩} Si · Sj + H Σ_i Si,   (92)

with Si = (Si¹, Si², . . . , Siⁿ) and |Si| = 1.

108

Other Ising-like Models

The dependence of the critical temperature on the number of vector components n.

109

Other Ising-like Models

For Monte Carlo simulations with vector-valued spins we have to adapt our simulation methods. The classical strategy is to flip spins by modifying the spin locally through adding a small ∆S such that S′i = Si + ∆S and ∆S ⊥ Si. The classical Metropolis algorithm can then be used in the same fashion as in the Ising model.

110

Histogram Methods

Histogram Methods

For computing the thermal average defined by Eq. (14), we need to sample different configurations at different temperatures. Another possibility is to determine an average at a certain temperature T0 and extrapolate to another temperature T. In the case of a canonical ensemble, such an extrapolation can be achieved by reweighting the histogram of energies pT0(E) with the factor exp[E/(kBT0) − E/(kBT)].

111

Histogram Methods

Such histogram methods have first been described in Ref. [8]. We now reformulate the computation of the thermal average of a quantity Q and of the partition function as a sum over all possible energies instead of over all possible configurations and find

Q(T0) = (1/ZT0) Σ_E Q(E) pT0(E)  with  ZT0 = Σ_E pT0(E),   (93)

where pT0(E) = g(E) e^{−E/(kBT0)}, with g(E) defining the degeneracy of states, i.e., the number of states with energy E.

112

Histogram Methods

This takes into account the fact that multiple configurations can have the same energy. The goal is to compute the quantity Q at another temperature T:

Q(T) = (1/ZT) Σ_E Q(E) pT(E).   (94)

The degeneracy of states contains all the information needed. Using the definition of g(E) yields

pT(E) = g(E) e^{−E/(kBT)} = pT0(E) exp[−E/(kBT) + E/(kBT0)]   (95)

and with fT0,T(E) = exp[−E/(kBT) + E/(kBT0)] we finally obtain

Q(T) = Σ_E Q(E) pT0(E) fT0,T(E) / Σ_E pT0(E) fT0,T(E).   (96)
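A minimal sketch of single-histogram reweighting following Eq. (96). It works directly on per-sample energies and measurements recorded at T0, so the histogram pT0(E) is represented by the samples themselves; the interface is an illustrative assumption.

```python
import numpy as np

def reweight(energies, quantities, T0, T, kB=1.0):
    """Estimate <Q>(T) from samples recorded at T0 via single-histogram reweighting."""
    E = np.asarray(energies, dtype=float)
    Q = np.asarray(quantities, dtype=float)
    # f_{T0,T}(E) = exp(-E/(kB*T) + E/(kB*T0)); subtract the maximum exponent for stability
    log_f = -E / (kB * T) + E / (kB * T0)
    w = np.exp(log_f - log_f.max())
    return np.sum(Q * w) / np.sum(w)
```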

113

Histogram Methods

An example of the histogram and the broad histogram method for different system sizes. The figure is taken from Ref. [9].

Broad Histogram Method

Let N_up and N_down be the numbers of processes which lead to an increasing and decreasing energy, respectively. Furthermore, we have to keep in mind that the degeneracy of states increases exponentially with energy E, because the number of possible configurations increases with energy. To explore all energy regions equally, we find a condition equivalent to the one of detailed balance, i.e.,

g(E + ∆E) N_down(E + ∆E) = g(E) N_up(E).   (97)

115

Broad Histogram Method

The motion in phase space towards higher energies can then be penalized with a Metropolis-like dynamics:

• Choose a new configuration,
• if the new energy is lower, accept the move,
• if the new energy is higher, accept with probability N_down(E + ∆E)/N_up(E).

116

Broad Histogram Method

We obtain the function g(E) by taking the logarithm of Eq. (97):

log[g(E + ∆E)] − log[g(E)] = log[N_up(E)] − log[N_down(E + ∆E)].   (98)

117

Broad Histogram Method

In the limit of small energy differences, we can approximate the last equation by

∂ log[g(E)]/∂E = (1/∆E) log[ N_up(E) / N_down(E + ∆E) ],   (99)

which we can numerically integrate to obtain g(E).

118

Broad Histogram Method

Distributions of N_up and N_down can be obtained by keeping track of these numbers for each configuration at a certain energy. In addition, we also need to store the values of the quantity Q(E) we wish to compute as a thermal average according to

Q(T) = Σ_E Q(E) g(E) e^{−E/(kBT)} / Σ_E g(E) e^{−E/(kBT)}.   (100)

Based on a known degeneracy of states g(E), we can now compute quantities at any temperature.

119

Flat Histogram Method

Flat histogram method

• Start with g(E) = 1 and set f = e.
• Make a Monte Carlo update with p(E) = 1/g(E).
• If the attempt is successful at E: g(E) ← f · g(E).
• Obtain a histogram of energies H(E).
• If H(E) is flat enough, then f ← √f.
• Stop when f ≤ 1 + 10⁻⁸.

A minimal code sketch of this iteration follows below.

120

Umbrella Sampling

The basic idea is to multiply transition probabilities with a function that is large at the free energy barrier and to later remove this correction in the averaging step:

p(C) = w(C) e^{−E(C)/(kBT)} / Σ_C w(C) e^{−E(C)/(kBT)}  with  ⟨A⟩ = ⟨A/w⟩_w / ⟨1/w⟩_w.   (101)

121

Summary Histogram Methods

Summarizing, some of the most common techniques related to the histogram methods are

• Wang-Landau method [10, 11],
• Multiple histogram method [12],
• Multicanonical Monte Carlo [13],
• Flat histogram method [14],
• Umbrella sampling [15].

122

Renormalization Group

Self-similarity

The Koch snowflake as an example of a self-similar pattern.

123

Real Space Renormalization

H. J. Maris and L. P. Kadanoff describe in Teaching the renormalization group that "communicating exciting new developments in physics to undergraduate students is of great importance. There is a natural tendency for students to believe that they are a long way from the frontiers of science where discoveries are still being made." [16]

124

Renormalization and Free Energy

To build some intuition for renormalization approaches, we consider a scale transformation of the characteristic length L of our system that leads to a rescaled characteristic length L̃ = L/l. Moreover, we consider the partition function of an Ising system as defined by the Hamiltonian given in Eq. (15). A scale transformation with L̃ = L/l leaves the partition function

Z = Σ_σ e^{−βH}   (102)

and the corresponding free energy invariant [17].

125

Renormalization and Free Energy

We therefore find for the free energy densities

f(ε, H) = l^{−d} f(ε̃, H̃).   (103)

We set ε̃ = l^{yT} ε and H̃ = l^{yH} H to obtain

f(ε̃, H̃) = f(l^{yT} ε, l^{yH} H).   (104)

126

Renormalization and Free Energy

Since renormalization also affects the correlation length

ξ ∝ |T − Tc|^{−ν} = |ε|^{−ν},   (105)

we can relate the critical exponent ν to yT. The renormalized correlation length ξ̃ = ξ/l scales as

ξ̃ ∝ ε̃^{−ν}.   (106)

127

Renormalization and Free Energy

And due to

l^{yT} ε = ε̃ ∝ ξ̃^{−1/ν} = (ξ/l)^{−1/ν} ∝ ε l^{1/ν},   (107)

we find yT = 1/ν.

The critical point is a fixed point of the transformation since ε = 0 at Tc and ε̃ remains zero independent of the value of the scaling factor.

128

Majority Rule

A straightforward example which can be regarded as renormalization of spin systems is the majority rule. Instead of considering all spins in a certain neighborhood separately, one just takes the direction of the net magnetization of these regions as the new spin value, i.e.,

σ̃i = sign( Σ_{region} σi ).   (108)

129

Majority Rule

An illustration of the majority rule renormalization.

130

Decimation of the One-dimensional Ising Model

The spins only interact with their nearest neighbors, and the coupling constant K = J/(kBT) is the same for all spins.

131

Decimation of the One-dimensional Ising Model

An example of a one-dimensional Ising chain.

132

Decimation of the One-dimensional Ising Model

To further analyze this system, we compute its partition function Z and obtain

Z = Σ_σ e^{K Σ_i σi σi+1}
  = Σ_{σ2i=±1} Π_{2i} Σ_{σ2i+1=±1} e^{K(σ2i σ2i+1 + σ2i+1 σ2i+2)}
  = Σ_{σ2i=±1} Π_{2i} 2 cosh[K(σ2i + σ2i+2)]
  = Σ_{σ2i=±1} Π_{2i} z(K) e^{K̃ σ2i σ2i+2}
  = [z(K)]^{N/2} Σ_{σ2i=±1} Π_{2i} e^{K̃ σ2i σ2i+2},   (109)

where we used in the third step that the cosh(·) function only depends on the even spins.

133

Decimation of the One-dimensional Ising Model

According to Eq. (109), the relation

Z(K, N) = [z(K)]^{N/2} Z(K̃, N/2)   (110)

holds as a consequence of the decimation method. The function z(K) is the spin-independent part of the partition function and K̃ is the renormalized coupling constant.

134

Decimation of the One-dimensional Ising Model

We compute the relation z(K) e^{K̃ s2i s2i+2} = 2 cosh[K(s2i + s2i+2)] explicitly and find

z(K) e^{K̃ s2i s2i+2} = 2 cosh(2K) if s2i = s2i+2,  and  2 otherwise.   (111)

135

Decimation of the One-dimensional Ising Model

Dividing and multiplying the last two expressions yields

e^{2K̃} = cosh(2K)  and  z²(K) = 4 cosh(2K).   (112)

The renormalized coupling constant K̃ in terms of K is thus given by

K̃ = (1/2) ln[cosh(2K)].   (113)
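Iterating Eq. (113) numerically illustrates the flow of the coupling towards the stable fixed point K* = 0; a minimal sketch:

```python
import numpy as np

def renormalized_coupling(K):
    """One decimation step of the 1D Ising chain, Eq. (113)."""
    return 0.5 * np.log(np.cosh(2.0 * K))

K = 1.5
for step in range(10):
    K = renormalized_coupling(K)
    print(f"step {step + 1}: K = {K:.6f}")   # the coupling flows towards K* = 0
```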

136

Decimation of the One-dimensional Ising Model

An illustration of the fixed point iteration defined by Eq. (113): K̃ as a function of K on the interval 0 ≤ K ≤ 1.

137

Decimation of the One-dimensional Ising Model

Given the partition function, we now compute the free energy according to F = −kBT N f(K) = −kBT ln(Z), with f(K) being the free energy density. Taking the logarithm of Eq. (110) yields

ln[Z(K, N)] = N f(K) = (1/2) N ln[z(K)] + (1/2) N f(K̃).   (114)

Based on the last equation, we can derive the following recursive relation for the free energy density:

f(K̃) = 2 f(K) − ln[2 cosh(2K)^{1/2}].   (115)

138

Decimation of the One-dimensional Ising Model

There exists one stable fixed point at K* = 0 and another, unstable one at K* → ∞. At a fixed point (K̃ = K = K*), Eq. (115) can be rewritten using f(K̃) = f(K*).

The case K* = 0 corresponds to the high-temperature limit, where the free energy approaches the value

F = −N kBT f(K*) = −N kBT ln(2).   (116)

In this case, the entropy dominates the free energy.

Decimation of the One-dimensional Ising Model

For K* → ∞, the system approaches the low-temperature limit and the free energy is given by

F = −N kBT f(K*) = −N kBT K = −N J,   (117)

i.e., it is given by the internal energy.

140

Generalization

In general, multiple coupling constants are necessary, as for example in the case of the two-dimensional Ising model. Thus, we have to construct a renormalized Hamiltonian based on multiple renormalized coupling constants, i.e.,

H̃ = Σ_{α=1}^{M} K̃_α O_α  with  O_α = Σ_i Π_{k∈c_α} σ_{i+k},   (118)

where c_α is the configuration subset over which we renormalize and

K̃_α = K̃_α(K1, . . . , KM)  with  α ∈ {1, . . . , M}.   (119)

141

Generalization

At Tc there exists a fixed point K*_α = K̃_α(K*_1, . . . , K*_M). A possible ansatz to solve this problem is the linearization of the transformation. Thus, we compute the Jacobian T_{α,β} = ∂K̃_α/∂K_β and obtain

K̃_α − K*_α = Σ_β T_{α,β}|_{K*} (K_β − K*_β).   (120)

142

Generalization

To analyze the behavior of the system close to criticality, we consider eigenvalues λ1, . . . , λM and eigenvectors ϕ1, . . . , ϕM of the linearized transformation defined by Eq. (120). The eigenvectors fulfill ϕ̃_α = λ_α ϕ_α, and the fixed point is unstable if λ_α > 1.

143

Generalization

The largest eigenvalue dominates the iteration, and we can identify the scaling field ε̃ = l^{yT} ε with the eigenvector of the transformation, and the scaling factor with the eigenvalue λT = l^{yT}. Then, we compute the exponent ν according to

ν = 1/yT = ln(l) / ln(λT).   (121)

144

Monte Carlo Renormalization Group

Since we are dealing with generalized Hamiltonians with many interaction terms, we compute the thermal average of O_α according to

⟨O_α⟩ = Σ_σ O_α e^{Σ_β K_β O_β} / Σ_σ e^{Σ_β K_β O_β} = ∂F/∂K_α,   (122)

where F is the free energy.

145

Monte Carlo Renormalization Group

Using the fluctuation-dissipation theorem, we can also numerically compute the response functions

χ_{α,β} = ∂⟨O_α⟩/∂K_β = ⟨O_α O_β⟩ − ⟨O_α⟩⟨O_β⟩,
χ̃_{α,β} = ∂⟨Õ_α⟩/∂K̃_β = ⟨Õ_α Õ_β⟩ − ⟨Õ_α⟩⟨Õ_β⟩.

146

Monte Carlo Renormalization Group

We also find with Eq. (122) that

χ^{(n)}_{α,β} = ∂⟨O^{(n)}_α⟩/∂K_β = Σ_γ (∂K̃_γ/∂K_β) (∂⟨O^{(n)}_α⟩/∂K̃_γ) = Σ_γ T_{γ,β} χ^{(n)}_{α,γ}.   (123)

It is thus possible to derive the values of T_{γ,β} from the correlation functions by solving a set of M coupled linear equations. At the point K = K*, we can apply this method in an iterative manner to compute critical exponents as suggested by Eq. (121).

147

Monte Carlo Renormalization Group

There are many error sources in this technique that originate from the fact that we are using a combination of several tricks to obtain our results:

• Statistical errors,
• Truncation of the Hamiltonian to the M-th order,
• Finite number of scaling iterations,
• Finite size effects,
• No precise knowledge of K*.

148

Boltzmann Machine

Hopfield Network

We begin our excursion to Boltzmann machines with a network consisting of neurons which are fully connected, i.e., every single neuron is connected to all other neurons. A neuron represents a node of a network and is nothing but a function of I different inputs {xi}_{i∈{1,...,I}} which are weighted by {wi}_{i∈{1,...,I}} to compute an output y.

149

Hopfield Network

An illustration of a single neuron with output y, inputs {xi}_{i∈{1,...,I}} and weights {wi}_{i∈{1,...,I}}.

150

Hopfield Network

In terms of a Hopfield network, we consider discrete inputs xi ∈ {−1, 1}. The activation of neuron i is given by

ai = Σ_j wij xj,   (124)

where we sum over the inputs. The weights fulfill wij = wji and wii = 0.

151

Hopfield Network

Similarly to the Ising model, the associated energy is given by

E = −(1/2) Σ_{i,j} wij xi xj − Σ_i bi xi,   (125)

where bi is the bias term.

152

Hopfield Network

The dynamics of a Hopfield network is given by

xi(ai) = 1 if ai ≥ 0, −1 otherwise.   (126)
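A minimal sketch of asynchronous, deterministic Hopfield updates following Eqs. (124) and (126); the random update order and the explicit bias vector are illustrative assumptions.

```python
import numpy as np

def hopfield_step(x, w, b, rng):
    """Update every neuron once, in random order, with the deterministic rule of Eq. (126)."""
    n = len(x)
    for i in rng.permutation(n):
        a = w[i] @ x + b[i]          # activation of neuron i (bias added explicitly)
        x[i] = 1 if a >= 0 else -1
    return x
```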

153

Hopfield Network

The energy difference ∆Ei after neuron i has been updated is

∆Ei = E(xi = −1) − E(xi = 1) = 2 ( bi + Σ_j wij xj ).   (127)

We can absorb the bias bi in the sum by having an extra, always active unit connected to every node in the network. We thus showed that the activation defined by Eq. (124) equals one half of the energy difference ∆Ei.

154

Boltzmann Machine Learning

A comparison of denoising capabilities of Hopfield and Boltzmann machine models. The figure is taken from https://arxiv.org/pdf/1803.08823.pdf.

155

Boltzmann Machine Learning

For some applications, finding a local minimum based on the deterministic update rule defined by Eq. (126) might not be sufficient. Similar to the discussion of Monte Carlo methods for Ising systems, we employ an update probability

pi = 1 / [1 + exp(−∆Ei/T)] = σ(2ai/T)   (128)

to set neuron i to unity independent of its state [18]. Here, σ(x) = 1/[1 + exp(−x)] denotes the sigmoid function.

156

Boltzmann Machine Learning

As defined in Eq. (127), the energy difference ∆Ei is the gap between a configuration with an active neuron i and an inactive one. The parameter T acts as a temperature equivalent⁵.

A closer look at Eqs. (125) and (128) tells us that we are simulating a Hamiltonian system with Glauber dynamics. Due to the fulfilled detailed balance condition, we reach thermal equilibrium and find again for the probabilities of the system to be in state X or Y⁶

peq(Y)/peq(X) = exp( −[E(Y) − E(X)]/T ).   (129)

⁵For T → 0, we recover the deterministic dynamics described by Eq. (126).
⁶Here we set kB = 1.

157

Boltzmann Machine Learning

We divide the Boltzmann machine units into visible and hidden units, represented by the non-empty set V and the possibly empty set H, respectively. The visible units are set by the environment, whereas the hidden units are additional variables which might be necessary to model certain outputs.

158

Boltzmann Machine Learning

Let P′(ν) be the probability distribution over the visible units ν in a freely running network. It can be obtained by marginalizing over the corresponding joint probability distribution, i.e.,

P′(ν) = Σ_h P′(ν, h),   (130)

where h represents a hidden unit.

159

Boltzmann Machine Learning

The goal is to come up with a method such that P′(ν) approaches the unknown environment distribution P(ν). We measure the difference between P′(ν) and P(ν) in terms of the Kullback-Leibler divergence (relative entropy)

G = Σ_{ν∈V} P(ν) ln[ P(ν)/P′(ν) ].   (131)

160

Boltzmann Machine Learning

To minimize G, we perform a gradient descent according to

∂G/∂wij = −(1/T) ( pij − p′ij ),   (132)

where pij is the probability that two units are active on average if the environment is determining the states of the visible units, and p′ij is the corresponding probability in a freely running network without a coupling to the environment.

161

Boltzmann Machine Learning

Both probabilities are measured at thermal equilibrium. In the literature, the probabilities pij and p′ij are also often defined in terms of the thermal averages ⟨xixj⟩_data and ⟨xixj⟩_model, respectively. The weights wij of the network are then updated according to

∆wij = ε ( pij − p′ij ) = ε ( ⟨xixj⟩_data − ⟨xixj⟩_model ),   (133)

where ε is the learning rate.

162

Boltzmann Machine Learning

The so-called restricted Boltzmann machine does not take into account mutual connections within the set of hidden units and within the set of visible units, and has turned out to be more suitable for learning tasks. In the case of restricted Boltzmann machines, the weight update is given by

∆wij = ε ( ⟨νihj⟩_data − ⟨νihj⟩_model ),   (134)

where νi and hj represent visible and hidden units, respectively.

163

Boltzmann Machine Learning

Instead of sampling the configurations for computing ⟨νihj⟩_data and ⟨νihj⟩_model at thermal equilibrium, we could also just consider a few relaxation steps. This method is called contrastive divergence and is defined by the following update rule

∆w^CD_ij = ε ( ⟨νihj⟩⁰_data − ⟨νihj⟩^k_model ),   (135)

where the superscript indices indicate the number of updates.

164

Parallelization

Parallelization

Some important concepts are summarized in the lecture notes.

165

Non-Equilibrium Systems

Directed Percolation

Isotropic percolation is an equilibrium process in which a site or bond is occupied with a certain probability p. In the case of directed percolation, however, an additional constraint is imposed: in each time step, occupied sites or bonds occupy their neighbors with probability p only along a given direction [19]. Time is therefore a relevant parameter.

166

Directed Percolation

167

Directed Percolation

168

Directed Percolation

In the vicinity of pc, the percolation order parameter P(p), i.e., the fraction of sites in the largest cluster (isotropic percolation) or the density of active sites (directed percolation), takes the form P(p) ∝ (p − pc)^β.

Although the free energy is not defined in non-equilibrium processes, one can distinguish between first-order and continuous phase transitions based on the behavior of the order parameter.

169

Directed Percolation

Another formulation of directed percolation is given by the contact process. On a lattice with active (si(t) = 1) and inactive (si(t) = 0) sites, we sequentially update the system according to the following dynamics:

Let the number of active neighbors be ni(t) = Σ_{⟨i,j⟩} sj(t). A new value si(t + dt) ∈ {0, 1} is obtained according to the transition rates

w[0 → 1, n] = λn/(2d)  and  w[1 → 0, n] = 1,   (136)

where d is the dimension of the system.

170

Directed Percolation

The critical infection rate λc and the critical exponent β on a square lattice for different dimensions d. At and above the critical dimension dc = 4 the critical exponents are described by mean-field theory [17]. The values are rounded to the second decimal and taken from Refs. [20] and [21].

       d = 1   d = 2   d = 3   dc = 4
λc     3.30    1.65    1.32    1.20
β      0.28    0.59    0.78    1

171

Directed Percolation

The contact process has the same characteristic critical exponents as directed percolation, which implies that they both belong to the same universality class. According to Ref. [17], the directed percolation universality class requires:

1. a continuous phase transition from a fluctuating phase to a unique absorbing state,
2. a transition with a one-component order parameter,
3. local process dynamics,
4. no additional attributes, such as conservation laws or special symmetries.

172

Kinetic Monte Carlo (Gillespie Algorithm)

Non-equilibrium systems are not described by a Hamiltonian, and we cannot use the same algorithms which we employed for studying the Ising model or other equilibrium models. Time is now a physical parameter and not a mere virtual property of a Markov chain.

A standard algorithm for simulating non-equilibrium dynamics is the kinetic Monte Carlo method, which is also often referred to as the Gillespie algorithm.

173

Kinetic Monte Carlo (Gillespie Algorithm)

To give an example, the kinetic Monte Carlo algorithm is applied to the contact process on a two-dimensional square lattice. Recovery and activation define n = 2 processes with corresponding rates R1 = 1 and R2 = λ.

174

Kinetic Monte Carlo (Gillespie Algorithm)

At time t, the subpopulation N1(t) consists of all active nodes, which recover with rate R1. The total recovery rate is then given by Q1(t) = N1(t).

On the square lattice, only nearest-neighbor interactions are considered, and the total rate of the second process (activation) is obtained by computing ni and w[0 → 1, ni] according to Eq. (136) for all nodes.

175

Kinetic Monte Carlo (Gillespie Algorithm)

Kinetic Monte Carlo algorithm

1. Identify all individual rates (per node) {Ri}_{i∈{1,...,n}}.
2. Determine the overall rates (all nodes) {Qi}_{i∈{1,...,n}}, where Qi = RiNi and Ni defines the subpopulation which corresponds to Ri. It is possible that some subpopulations are identical but correspond to different rates.
3. Let η ∈ [0, 1) be a uniformly distributed random number and Q = Σi Qi. The process with rate Rj occurs if Σ_{i=1}^{j−1} Qi ≤ ηQ < Σ_{i=1}^{j} Qi.
4. Let ε ∈ [0, 1) be a uniformly distributed random number. Then the time evolves as t → t + ∆t, where ∆t = −Q⁻¹ log(1 − ε), and we return to step 2.

A minimal code sketch of one such step for the contact process follows below.
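Below is a minimal sketch of one kinetic Monte Carlo step for the contact process on a periodic square lattice (d = 2), following the listed algorithm and the rates of Eq. (136); recomputing all rates in every step is a simplification chosen for clarity, and all names are illustrative.

```python
import numpy as np

def gillespie_step(s, lam, t, rng):
    """One kinetic Monte Carlo (Gillespie) step of the contact process.

    s: 0/1 array of inactive/active sites, lam: infection rate lambda, t: current time.
    """
    n_active_nb = (np.roll(s, 1, 0) + np.roll(s, -1, 0) +
                   np.roll(s, 1, 1) + np.roll(s, -1, 1))
    recovery = np.where(s == 1, 1.0, 0.0)                         # w[1 -> 0] = 1 per active site
    activation = np.where(s == 0, lam * n_active_nb / 4.0, 0.0)   # w[0 -> 1, n] = lam*n/(2d), d = 2
    rates = np.concatenate([recovery.ravel(), activation.ravel()])
    Q = rates.sum()
    if Q == 0.0:                                                  # absorbing state reached
        return s, t, False
    # choose process j with probability Q_j / Q
    j = np.searchsorted(np.cumsum(rates), rng.random() * Q)
    site = np.unravel_index(j % s.size, s.shape)
    s[site] = 1 - s[site]                                         # recovery or activation flips the site
    t += -np.log(1.0 - rng.random()) / Q                          # Delta t = -Q^{-1} log(1 - eps)
    return s, t, True
```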

176

[1] Stanley, H. Scaling, universality, and renormalization: Three pillars of modern critical phenomena (1999). URL http://link.springer.com/chapter/10.1007%2F978-1-4612-1512-7_39.

[2] Stanley, H. Introduction to Phase Transitions and Critical Phenomena (Clarendon, Oxford, 1971).

[3] Sengers, J., Sengers, J. L. & Croxton, C. Progress in Liquid Physics (Wiley, Chichester, 1978).

177

[4] Pelissetto, A. & Vicari, E. Critical phenomena and renormalization-group theory. Phys. Rep. 368, 549–727 (2002).

[5] Vichniac, G. Y. Simulating physics with cellular automata. Phys. D 10, 96–116 (1984).

[6] Binder, K. & Heermann, D. Monte Carlo Simulation in Statistical Physics (Springer, 1997).

[7] Binder, K. & Landau, D. Finite-size scaling at first-order phase transitions. Phys. Rev. B 30, 1477 (1984).

178

[8] Salzburg, Z., Jacobson, J., Fickett, W. & Wood, W. J. Chem. Phys. 30, 65 (1959).

[9] de Oliveira, P., Penna, T. & Herrmann, H. Broad histogram method. Braz. J. Phys. 26, 677 (1996).

[10] Wang, F. & Landau, D. Efficient, multiple-range random walk algorithm to calculate the density of states. Phys. Rev. Lett. 86, 2050 (2001).

179

[11] Zhou, C., Schulthess, T. C., Torbrügge, S. & Landau, D. Wang-Landau algorithm for continuous models and joint density of states. Phys. Rev. Lett. 96, 120201 (2006).

[12] Ferrenberg, A. M. & Swendsen, R. H. Phys. Rev. Lett. 63, 1195 (1989).

[13] Berg, B. A. Introduction to multicanonical Monte Carlo simulations. Fields Inst. Commun. 26, 1–24 (2000).

180

[14] Wang, F. & Landau, D. Determining the density of states for classical statistical models: A random walk algorithm to produce a flat histogram. Phys. Rev. E 64, 056101 (2001).

[15] Torrie, G. & Valleau, J. Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling. J. Comp. Phys. 23, 187–199 (1977). URL http://www.sciencedirect.com/science/article/pii/0021999177901218.

[16] Maris, H. J. & Kadanoff, L. P. Teaching the renormalization group. Am. J. Phys. 46, 652–657 (1978).

181

[17] Henkel, M., Hinrichsen, H. & Lübeck, S. Non-Equilibrium Phase Transitions, Volume 1 – Absorbing Phase Transitions (Springer Science & Business Media, 2008).

[18] Ackley, D. H., Hinton, G. E. & Sejnowski, T. J. A learning algorithm for Boltzmann machines. Cogn. Sci. 9, 147–169 (1985).

[19] Broadbent, S. & Hammersley, J. Percolation processes. I. Crystals and mazes. Proceedings of the Cambridge Philosophical Society (1957).

182

[20] Moreira, A. G. & Dickman, R. Critical dynamics of the contact process with quenched disorder. Phys. Rev. E (1996).

[21] Sabag, M. M. S. & de Oliveira, M. J. Conserved contact process in one to five dimensions. Phys. Rev. E (2002).

183
