Computational Statistical Physics
Part I: Statistical Physics and Phase Transitions

Lucas Böttcher
August 2, 2019

ETH Zürich, Institute for Theoretical Physics
HIT K 11.3, Wolfgang-Pauli-Strasse 27, 8093 Zürich

402-0812-00L, FS 2019
Dates and Further Information

• Lectures: Friday 10.45–12.30 in HIT H51
• Exercises: Friday 8.45–10.30 in HIT F21
• Oral exams take place in August 2019
• Webpage: http://www.itp.phys.ethz.ch/education/computational-statphys.html
Course Offered in (Selection)

• Mathematics, Computer Science (Master)
• Physics (Master)
• Materials Science (Spezialvorlesung, Master)
• Civil Engineering (Spezialvorlesung)
• Biomedical Engineering (Master)
Learning Objectives

Students are able to

• distinguish between different computational methods to study statistical physics models,
• apply appropriate computational techniques to efficiently simulate a given equilibrium or non-equilibrium system,
• differentiate between different simulation methods to study particle and molecular interactions,
• apply appropriate molecular dynamics modeling concepts.
Phase Transitions

• 22.02. Introduction to statistical physics and the Ising model
• 01.03. Monte Carlo methods
• 08.03. Finite size methods and cluster algorithms
• 15.03. Histogram methods and renormalization group
• 22.03. Renormalization group and Boltzmann machines
• 29.03. Non-equilibrium phase transitions and summary
Molecular Dynamics

• 05.04. Molecular dynamics, Verlet and leapfrog methods
• 12.04. Optimization, composed particle systems
• 19.04. and 26.04. ETH vacations
• 03.05. Arbitrary shapes, long-range potentials
• 10.05. Nosé-Hoover thermostat, stochastic method, constant pressure ensemble
Event-driven Dynamics

• 17.05. Event-driven dynamics, inelastic collisions, friction
• 24.05. Contact dynamics, particles in fluids (if time allows: ab initio MD, Car-Parrinello)
• 31.05. shifted to 24.05
Classical Statistical Mechanics
Phase Space
Let us consider a classical physical system with N particles whose canonical coordinates and corresponding conjugate momenta are given by q_1, ..., q_{3N} and p_1, ..., p_{3N}, respectively. The 6N-dimensional space Γ spanned by these coordinates defines the phase space.
Ludwig Boltzmann (1844–1906)
Phase Space

Phase spaces of an undamped (blue) and a damped (orange) harmonic oscillator, plotted in the (q, p) plane.
Ensemble Average
The assumption that all states in an ensemble are reached by the time evolution of the corresponding system is referred to as the ergodicity hypothesis. We define the ensemble average of a quantity Q(p, q) as

\[
\langle Q\rangle = \frac{\int Q(p,q)\,\rho(p,q)\,\mathrm{d}p\,\mathrm{d}q}{\int \rho(p,q)\,\mathrm{d}p\,\mathrm{d}q},
\tag{1}
\]

where ρ(p, q) denotes the phase space density and dp dq is a shorthand notation for \(\mathrm{d}^{3N}p\,\mathrm{d}^{3N}q\).
Hamiltonian
The dynamics of the considered N particles is described by their Hamiltonian H(p, q), i.e., the equations of motion are

\[
\dot p_i = -\frac{\partial \mathcal H}{\partial q_i}
\quad\text{and}\quad
\dot q_i = \frac{\partial \mathcal H}{\partial p_i}
\qquad (i = 1,\dots,3N).
\tag{2}
\]

Moreover, the temporal evolution of a phase space element of volume V and boundary ∂V is given by

\[
\frac{\partial}{\partial t}\int_V \rho\,\mathrm{d}V + \int_{\partial V} \rho\,\mathbf v\cdot\mathrm{d}\mathbf A = 0,
\tag{3}
\]

where \(\mathbf v = (\dot p_1,\dots,\dot p_{3N}, \dot q_1,\dots,\dot q_{3N})\) is a generalized velocity vector.
Liouville Theorem
Applying the divergence theorem to Eq. (3), we find that ρ satisfies the continuity equation

\[
\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf v) = 0,
\tag{4}
\]

where \(\nabla = (\partial/\partial p_1,\dots,\partial/\partial p_{3N}, \partial/\partial q_1,\dots,\partial/\partial q_{3N})\). We can further simplify Eq. (4) since

\[
\nabla\cdot\mathbf v
= \sum_{i=1}^{3N}\left(\frac{\partial \dot q_i}{\partial q_i} + \frac{\partial \dot p_i}{\partial p_i}\right)
= \underbrace{\sum_{i=1}^{3N}\left(\frac{\partial}{\partial q_i}\frac{\partial\mathcal H}{\partial p_i} - \frac{\partial}{\partial p_i}\frac{\partial\mathcal H}{\partial q_i}\right)}_{=0}
= 0.
\tag{5}
\]
Liouville Theorem
Rewriting Eq. (4) using Poisson brackets¹ yields Liouville's theorem

\[
\frac{\partial\rho}{\partial t} = \{\mathcal H, \rho\},
\tag{7}
\]

which describes the time evolution of the phase space density ρ.

¹The Poisson bracket is defined as

\[
\{u, v\} = \sum_i\left(\frac{\partial u}{\partial q_i}\frac{\partial v}{\partial p_i} - \frac{\partial u}{\partial p_i}\frac{\partial v}{\partial q_i}\right) = -\{v, u\}.
\tag{6}
\]
Thermal Equilibrium
In thermal equilibrium, the system reaches a steady state in which the distribution of the configurations is constant in time, i.e., ∂ρ/∂t = 0. Liouville's theorem then leads to the condition

\[
\mathbf v\cdot\nabla\rho = \{\mathcal H, \rho\} = 0.
\tag{8}
\]

The last equation is satisfied if ρ depends only on quantities which are conserved during the time evolution of the system. We then use such a phase space density ρ to replace the time average

\[
\langle Q\rangle = \lim_{T\to\infty}\frac{1}{T}\int_0^T Q(p(t), q(t))\,\mathrm{d}t
\tag{9}
\]

by the ensemble average defined by Eq. (1).
Ensemble Average
In the subsequent sections, we also consider discrete configurations X. In this case, we define the ensemble average as

\[
\langle Q\rangle = \frac{1}{\Omega}\sum_X Q(X)\,\rho(X),
\tag{10}
\]

where Ω is the normalizing volume such that \(\Omega^{-1}\sum_X \rho(X) = 1\). With the help of ensemble averages, systems can be described by means of some macroscopic quantities, such as temperature, energy and pressure.
Ensembles
• Microcanonical ensemble: constant E, V, N
• Canonical ensemble: constant T, V, N
• Canonical pressure ensemble: constant T, p, N
• Grand canonical ensemble: constant T, V, µ

Josiah W. Gibbs (1839–1903)
Microcanonical Ensemble
The microcanonical ensemble is defined by a constant number of particles, volume and energy. Thus, any configuration X of the system has the same energy E(X) = const. The phase space density is also constant and given by

\[
\rho(X) = \frac{1}{Z_{\mathrm{mc}}}\,\delta(\mathcal H(X) - E),
\tag{11}
\]

with \(Z_{\mathrm{mc}} = \sum_X \delta(\mathcal H(X) - E)\) being the partition function of the microcanonical ensemble.
Canonical Ensemble
In a canonical ensemble setup, the system we study (system 1) is coupled to a heat reservoir (system 2) that guarantees a constant temperature.
Canonical Ensemble
At a given temperature T, the probability for a system to be in a certain configuration X with energy E(X) is given by

\[
\rho(X) = \frac{1}{Z_T}\exp\!\left[-\frac{E(X)}{k_B T}\right],
\tag{12}
\]

with

\[
Z_T = \sum_X \exp\!\left[-\frac{E(X)}{k_B T}\right]
\tag{13}
\]

being the partition function of the canonical ensemble. According to the prior definition in Eq. (10), the ensemble average of a quantity Q is then given by

\[
\langle Q\rangle = \frac{1}{Z_T}\sum_X Q(X)\,e^{-\frac{E(X)}{k_B T}}.
\tag{14}
\]
Ising Model
An illustration of the interaction of a magnetic dipole (black) with its nearest neighbors (dark grey) on a two-dimensional lattice.
Ising Model
We consider a two-dimensional lattice with sites σ_i ∈ {+1, −1} which only interact with their nearest neighbors. Their interaction is described by the Hamiltonian

\[
\mathcal H(\sigma) = -J\sum_{\langle i,j\rangle}\sigma_i\sigma_j - H\sum_{i=1}^N \sigma_i,
\tag{15}
\]

where the first term denotes the interaction between all nearest neighbors, represented by a sum over ⟨i, j⟩, and the second one the interaction of each site with an external magnetic field H.

• Ferromagnetic case: J > 0 (parallel spins)
• Antiferromagnetic case: J < 0 (anti-parallel spins)
• No interaction: J = 0
Phase Transition in the Ferromagnetic Case
The magnetization M(T, H) for different fields H ≥ 0 as a function of T. For T ≤ T_c, the system is characterized by a ferromagnetic phase. The black solid line represents the spontaneous magnetization M_S(T) for H = 0 and should be interpreted as the limit M_S(T) = lim_{H→0⁺} M(T, H).
Phase Transition in the Ferromagnetic Case
The first-order transition as a consequence of a sign change of the external field.
Order Parameter
The magnetization is defined as

\[
M(T, H) = \left\langle \frac{1}{N}\sum_{i=1}^N \sigma_i \right\rangle,
\tag{16}
\]

and corresponds to the ensemble average of the mean value of all spins.

On average, M(T) vanishes since for every configuration there exists one of opposite sign which neutralizes the other one. As a consequence, we define the order parameter of the Ising model as

\[
M_S(T) = \lim_{H\to 0^+}\left\langle \frac{1}{N}\sum_{i=1}^N \sigma_i \right\rangle
\tag{17}
\]

and refer to it as the spontaneous magnetization.
Magnetic Domains
The simulations have been performed on a square lattice with 512 × 512 sites using https://mattbierbaum.github.io/ising.js/.
Critical Exponents I
In the vicinity of the critical temperature, for T < T_c, the spontaneous magnetization scales as

\[
M_S(T) \propto (T_c - T)^{\beta}.
\tag{18}
\]

For T = T_c and H → 0, we find the following scaling:

\[
M(T = T_c, H) \propto H^{1/\delta}.
\tag{19}
\]

The exponents β and δ are so-called critical exponents and, together with other exponents, characterize the underlying phase transition.

2D: β = 1/8, δ = 15
3D: β ≈ 0.326, δ ≈ 4.790
Fluctuations
The magnetic susceptibility is defined as the change of the magnetization M in response to an applied magnetic field H, i.e.,

\[
\chi(T) = \frac{\partial M(T, H)}{\partial H}.
\tag{20}
\]

We now use the definition of the spontaneous magnetization given by Eq. (17) and plug it into Eq. (20), leading to

\[
\chi(T) = \lim_{H\to 0^+}\frac{\partial \langle M(T,H)\rangle}{\partial H}
= \lim_{H\to 0^+}\frac{\partial}{\partial H}\,\frac{1}{N}\,
\frac{\sum_\sigma \sum_{i=1}^N \sigma_i\,\exp\!\left(\frac{E_0 + H\sum_{i=1}^N \sigma_i}{k_B T}\right)}
{\underbrace{\sum_\sigma \exp\!\left(\frac{E_0 + H\sum_{i=1}^N \sigma_i}{k_B T}\right)}_{=Z_T(H)}}.
\]
Fluctuations
Using the product rule yields

\[
\chi(T) = \lim_{H\to 0^+}\left[
\frac{1}{N k_B T}\,
\frac{\sum_\sigma \left(\sum_{i=1}^N \sigma_i\right)^2 \exp\!\left(\frac{E_0 + H\sum_{i=1}^N \sigma_i}{k_B T}\right)}{Z_T(H)}
- \frac{1}{N k_B T}\,
\frac{\left[\sum_\sigma \sum_{i=1}^N \sigma_i\,\exp\!\left(\frac{E_0 + H\sum_{i=1}^N \sigma_i}{k_B T}\right)\right]^2}{\left[Z_T(H)\right]^2}
\right]
= \frac{N}{k_B T}\left[\left\langle M_S(T)^2\right\rangle - \left\langle M_S(T)\right\rangle^2\right] \ge 0.
\tag{21}
\]

The last equation defines the fluctuation-dissipation theorem for the magnetic susceptibility. Analogously, the specific heat is connected to energy fluctuations by

\[
C(T) = \frac{\partial\langle E\rangle}{\partial T} = \frac{1}{k_B T^2}\left[\left\langle E(T)^2\right\rangle - \left\langle E(T)\right\rangle^2\right].
\tag{22}
\]
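The fluctuation-dissipation relations above translate directly into estimators for Monte Carlo time series. A minimal sketch in Python, assuming units with k_B = 1 and per-spin magnetization samples (the function names and conventions are our own, not from the lecture):

```python
import numpy as np

def susceptibility(m_samples, N, T):
    """Chi = N/T * (<m^2> - <m>^2) from per-spin magnetization samples m (k_B = 1)."""
    m = np.asarray(m_samples, dtype=float)
    return N / T * (np.mean(m**2) - np.mean(m)**2)

def specific_heat(E_samples, T):
    """C = (<E^2> - <E>^2) / T^2 from total-energy samples E (k_B = 1)."""
    E = np.asarray(E_samples, dtype=float)
    return (np.mean(E**2) - np.mean(E)**2) / T**2
```

Both estimators are non-negative up to floating-point noise, in line with Eq. (21).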
Critical Exponents II
Similarly to the power-law scaling of the spontaneous magnetization defined in Eq. (18), we find for the magnetic susceptibility and the specific heat in the vicinity of T_c

\[
\chi(T) \propto |T_c - T|^{-\gamma},
\tag{23}
\]
\[
C(T) \propto |T_c - T|^{-\alpha}.
\tag{24}
\]

The corresponding exponents are in
2D: γ = 7/4, α = 0
3D: γ ≈ 1.24, α ≈ 0.11
Critical Exponents II
Susceptibility χ(T) and specific heat C(T) as a function of temperature for the three-dimensional Ising model. Both quantities diverge at the critical temperature T_c in the thermodynamic limit.
Correlation Length
The correlation function is defined by

\[
G(r_1, r_2; T, H) = \langle\sigma_1\sigma_2\rangle - \langle\sigma_1\rangle\langle\sigma_2\rangle,
\tag{26}
\]

where the vectors r_1 and r_2 point in the direction of lattice sites 1 and 2. If the system is translationally and rotationally invariant, the correlation function only depends on r = |r_1 − r_2|. At the critical point, the correlation function decays as

\[
G(r; T_c, 0) \propto r^{-d+2-\eta},
\tag{27}
\]

where η is another critical exponent and d the dimension of the system.

2D: η = 1/4
3D: η ≈ 0.036
Correlation Length
For temperatures away from the critical temperature, the correlation function exhibits an exponential decay

\[
G(r; T, 0) \propto r^{-\vartheta} e^{-r/\xi},
\tag{28}
\]

where ξ defines the correlation length. The exponent ϑ equals 2 above and 1/2 below the transition point. In the vicinity of T_c, the correlation length ξ diverges as

\[
\xi(T) \propto |T - T_c|^{-\nu}.
\tag{29}
\]

2D: ν = 1
3D: ν ≈ 0.63
Critical Exponents and Universality
The aforementioned six critical exponents are connected by four scaling laws:

\[
\alpha + 2\beta + \gamma = 2 \quad \text{(Rushbrooke)},
\tag{30}
\]
\[
\gamma = \beta(\delta - 1) \quad \text{(Widom)},
\tag{31}
\]
\[
\gamma = (2 - \eta)\nu \quad \text{(Fisher)},
\tag{32}
\]
\[
2 - \alpha = d\nu \quad \text{(Josephson)},
\tag{33}
\]

which have been derived in the context of the phenomenological scaling theory for ferromagnetic systems [1, 2]. Due to these relations, the number of independent exponents reduces to two.
Critical Exponents and Universality
Universal scaling for five different gases. The figure is taken from Ref. [3].
Critical Exponents and Universality
The critical exponents of the Ising model in two and three dimensions [4].

| Exponent | d = 2 | d = 3     |
|----------|-------|-----------|
| α        | 0     | 0.110(1)  |
| β        | 1/8   | 0.3265(3) |
| γ        | 7/4   | 1.2372(5) |
| δ        | 15    | 4.789(2)  |
| η        | 1/4   | 0.0364(5) |
| ν        | 1     | 0.6301(4) |
Monte Carlo Methods
The main steps of Monte Carlo sampling are:

1. Choose randomly a new configuration in phase space based on a Markov chain.
2. Accept or reject the new configuration, depending on the strategy used (e.g., Metropolis or Glauber dynamics).
3. Compute the physical quantity and add it to the averaging procedure.
4. Repeat the previous steps.
Markov Chains
Example of an energy distribution with a system-size-dependent distribution width which scales as ∝ √(L^d), where d is the system dimension.
Markov Chains
In terms of a Markov chain, the transition probability from one state to another is given by the probability of a new state to be proposed (T) and the probability of this state to be accepted (A). Specifically, T(X → Y) is the probability that a new configuration Y is proposed, starting from configuration X. For the thermodynamic systems we consider, the proposal probability fulfills three conditions:

1. Ergodicity: any configuration in the phase space must be reachable within a finite number of steps,
2. Normalization: ∑_Y T(X → Y) = 1,
3. Reversibility: T(X → Y) = T(Y → X).
Markov Chains
Once a configuration is proposed, we can accept the new configuration with probability A(X → Y) or reject it with probability 1 − A(X → Y). The transition probability of the Markov chain is then given by

\[
W(X \to Y) = T(X \to Y)\cdot A(X \to Y).
\tag{34}
\]
Markov Chains
We denote the probability to find the system in a certain configuration X at virtual time τ by p(X, τ). The master equation describes the time evolution of p(X, τ) and is given by

\[
\frac{\mathrm{d}p(X,\tau)}{\mathrm{d}\tau} = \sum_Y p(Y)\,W(Y \to X) - \sum_Y p(X)\,W(X \to Y).
\tag{35}
\]
Markov Chains
A stationary state p_st is reached if dp(X, τ)/dτ = 0. The transition probability of the Markov chain fulfills the following properties:

1. Ergodicity: any configuration must be reachable: ∀X, Y: W(X → Y) ≥ 0,
2. Normalization: ∑_Y W(X → Y) = 1,
3. Homogeneity: ∑_Y p_st(Y) W(Y → X) = p_st(X).
Markov Chains
To efficiently sample the relevant regions of the phase space, the transition probability W(·) of the Markov chain has to depend on the system properties. To achieve that, we set the stationary distribution p_st equal to the equilibrium distribution of the physical system p_eq (a real and measurable distribution):

\[
\frac{\mathrm{d}p(X,\tau)}{\mathrm{d}\tau} = 0 \;\Leftrightarrow\; p_{\mathrm{st}} \overset{!}{=} p_{\mathrm{eq}}.
\tag{36}
\]
Markov Chains
It then follows from the stationary state condition of the Markov chain that

\[
\sum_Y p_{\mathrm{eq}}(Y)\,W(Y \to X) = \sum_Y p_{\mathrm{eq}}(X)\,W(X \to Y).
\tag{37}
\]

A sufficient condition for this to be true is

\[
p_{\mathrm{eq}}(Y)\,W(Y \to X) = p_{\mathrm{eq}}(X)\,W(X \to Y),
\tag{38}
\]

which is referred to as the condition of detailed balance.
Markov Chains
As an example, in a canonical ensemble at fixed temperature T, the equilibrium distribution is given by the Boltzmann distribution

\[
p_{\mathrm{eq}}(X) = \frac{1}{Z_T}\exp\!\left[-\frac{E(X)}{k_B T}\right]
\tag{39}
\]

with the partition function \(Z_T = \sum_X \exp\left[-\frac{E(X)}{k_B T}\right]\).
M(RT)2 Algorithm
Nicholas C. Metropolis (1915-1999) introduced the M(RT)² sampling technique.
M(RT)2 Algorithm
One possible choice of the acceptance probability fulfilling the detailed balance condition is given by

\[
A(X \to Y) = \min\left[1, \frac{p_{\mathrm{eq}}(Y)}{p_{\mathrm{eq}}(X)}\right],
\tag{40}
\]

which can be obtained by rewriting Eq. (38).
M(RT)2 Algorithm
In the case of the canonical ensemble with \(p_{\mathrm{eq}}(X) = \frac{1}{Z_T}\exp\left[-\frac{E(X)}{k_B T}\right]\), the acceptance probability becomes

\[
A(X \to Y) = \min\left[1, \exp\!\left(-\frac{\Delta E}{k_B T}\right)\right],
\tag{41}
\]

where ΔE = E(Y) − E(X). The last equation implies that the Monte Carlo step is always accepted if the energy decreases; if the energy increases, it is accepted with probability \(\exp\left(-\frac{\Delta E}{k_B T}\right)\).
M(RT)2 Algorithm
In summary, the steps of the M(RT)² algorithm applied to the Ising model are:

M(RT)² Algorithm

• Randomly choose a lattice site i,
• Compute ΔE = E(Y) − E(X) = 2Jσ_i h_i,
• Flip the spin if ΔE ≤ 0, otherwise accept the flip with probability \(\exp\left(-\frac{\Delta E}{k_B T}\right)\),

with \(h_i = \sum_{\langle i,j\rangle}\sigma_j\) and \(E = -J\sum_{\langle i,j\rangle}\sigma_i\sigma_j\).
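The steps above can be sketched as a single-spin-flip sweep for the 2D Ising model. This is an illustrative sketch only; the periodic boundaries, units with k_B = 1, and the function name are our choices, not from the lecture:

```python
import numpy as np

def metropolis_sweep(spins, beta, J=1.0, rng=None):
    """One M(RT)^2 sweep over an L x L Ising lattice with periodic boundaries."""
    rng = rng or np.random.default_rng()
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # Local field h_i: sum over the four nearest neighbors.
        h = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
             + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * J * spins[i, j] * h
        # Always flip if dE <= 0, otherwise flip with probability exp(-beta*dE).
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] = -spins[i, j]
    return spins
```

At low temperature (large β) an ordered initial state stays strongly magnetized, as expected from the acceptance rule.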
Glauber Dynamics
The Metropolis algorithm is not the only possible choice to fulfill the detailed balance condition. Another acceptance probability, given by

\[
A_G(X \to Y) = \frac{\exp\!\left(-\frac{\Delta E}{k_B T}\right)}{1 + \exp\!\left(-\frac{\Delta E}{k_B T}\right)},
\tag{42}
\]

has been suggested by Glauber.
Glauber Dynamics
A comparison of the acceptance probabilities A(X → Y) of M(RT)² and Glauber dynamics as a function of βΔE.
Glauber Dynamics
In contrast to the M(RT)² acceptance probability, updates with ΔE = 0 are not always accepted but only with probability 1/2.

To prove that Eq. (42) satisfies the condition of detailed balance, we have to show that

\[
p_{\mathrm{eq}}(Y)\,A_G(Y \to X) = p_{\mathrm{eq}}(X)\,A_G(X \to Y),
\tag{43}
\]

since T(Y → X) = T(X → Y).
Glauber Dynamics
The last equation is equivalent to

\[
\frac{p_{\mathrm{eq}}(Y)}{p_{\mathrm{eq}}(X)} = \frac{A_G(X \to Y)}{A_G(Y \to X)},
\tag{44}
\]

which is fulfilled since

\[
\frac{p_{\mathrm{eq}}(Y)}{p_{\mathrm{eq}}(X)} = \exp\!\left(-\frac{\Delta E}{k_B T}\right)
\tag{45}
\]

and

\[
\frac{A_G(X \to Y)}{A_G(Y \to X)}
= \frac{\exp\!\left(-\frac{\Delta E}{k_B T}\right)}{1 + \exp\!\left(-\frac{\Delta E}{k_B T}\right)}
\left[\frac{\exp\!\left(\frac{\Delta E}{k_B T}\right)}{1 + \exp\!\left(\frac{\Delta E}{k_B T}\right)}\right]^{-1}
= \exp\!\left(-\frac{\Delta E}{k_B T}\right).
\tag{46}
\]
Glauber Dynamics
As in the M(RT)² algorithm, only the local configuration around the lattice site is relevant for the update procedure. Furthermore, with J = 1, the probability to flip spin σ_i is

\[
A_G(X \to Y) = \frac{\exp\!\left(-\frac{2\sigma_i h_i}{k_B T}\right)}{1 + \exp\!\left(-\frac{2\sigma_i h_i}{k_B T}\right)},
\tag{47}
\]

with \(h_i = \sum_{\langle i,j\rangle}\sigma_j\) being the local field and X = (..., σ_{i−1}, σ_i, σ_{i+1}, ...) and Y = (..., σ_{i−1}, −σ_i, σ_{i+1}, ...) the initial and final configuration, respectively.
Glauber Dynamics
We abbreviate the probability defined by Eq. (47) as p_i. The spin flip and no-flip probabilities can then be expressed as

\[
p_{\mathrm{flip}} =
\begin{cases}
p_i & \text{for } \sigma_i = -1\\
1 - p_i & \text{for } \sigma_i = +1
\end{cases}
\quad\text{and}\quad
p_{\mathrm{no\,flip}} =
\begin{cases}
1 - p_i & \text{for } \sigma_i = -1\\
p_i & \text{for } \sigma_i = +1.
\end{cases}
\tag{48}
\]
Glauber Dynamics
A possible implementation is

\[
\sigma_i(\tau + 1) = -\sigma_i(\tau)\cdot\mathrm{sign}(p_i - z),
\tag{49}
\]

with z ∈ (0, 1) being a uniformly distributed random number, or

\[
\sigma_i(\tau + 1) =
\begin{cases}
+1 & \text{with probability } p_i\\
-1 & \text{with probability } 1 - p_i
\end{cases}
\quad\text{and}\quad
p_i = \frac{\exp(2\beta h_i)}{1 + \exp(2\beta h_i)}.
\tag{50}
\]

The second method does not depend on the spin value at time τ and is called heat bath Monte Carlo.
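Eq. (50) can be written as a short heat-bath update. A sketch assuming J = 1, k_B = 1, and periodic boundaries (the helper name is ours):

```python
import numpy as np

def heat_bath_update(spins, i, j, beta, rng):
    """Set sigma_ij = +1 with p_i = e^{2*beta*h}/(1+e^{2*beta*h}),
    independent of the current spin value (heat bath Monte Carlo)."""
    L = spins.shape[0]
    h = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
         + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
    # 1/(1+e^{-2*beta*h}) equals e^{2*beta*h}/(1+e^{2*beta*h}) from Eq. (50).
    p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
    spins[i, j] = 1 if rng.random() < p_up else -1
```

At large β the spin aligns with its local field almost surely, which matches the limiting behavior of p_i.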
Binary Mixtures
An example of a binary mixture consisting of two different types of atoms, A and B.
Binary Mixtures
Kawasaki dynamics

• Choose an A−B bond,
• Compute ΔE for the exchange A−B → B−A,
• Metropolis: if ΔE ≤ 0 exchange, else exchange with probability \(p = \exp\left(-\frac{\Delta E}{k_B T}\right)\),
• Glauber: exchange with probability \(p = \exp\left(-\frac{\Delta E}{k_B T}\right)\Big/\left[1 + \exp\left(-\frac{\Delta E}{k_B T}\right)\right]\).
Creutz Algorithm
The movement in phase space is therefore not strictly constrained to a subspace of constant energy; there is a certain additional volume in which we can move freely. The condition of constant energy is softened by introducing a so-called demon, which corresponds to a small reservoir of energy E_D that can store a certain maximum energy E_max.
Creutz Algorithm

• Choose a site,
• Compute ΔE for the spin flip,
• Accept the change if E_max ≥ E_D − ΔE ≥ 0.
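The demon update can be sketched as follows; the total energy of lattice plus demon is conserved exactly. The energy bookkeeping helper and parameter choices (J = 1, periodic boundaries) are ours:

```python
import numpy as np

def ising_energy(s):
    """Total energy E = -sum_<ij> s_i s_j on a periodic square lattice (J = 1)."""
    return int(-(s * np.roll(s, 1, 0)).sum() - (s * np.roll(s, 1, 1)).sum())

def creutz_step(spins, E_demon, E_max, rng):
    """One demon update: the flip is accepted only if 0 <= E_D - dE <= E_max."""
    L = spins.shape[0]
    i, j = rng.integers(0, L, size=2)
    h = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
         + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
    dE = 2 * spins[i, j] * h
    if 0 <= E_demon - dE <= E_max:
        spins[i, j] = -spins[i, j]
        E_demon -= dE          # the demon absorbs or supplies the energy
    return E_demon
```

A quick sanity check is that ising_energy(spins) + E_demon never changes during the simulation.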
Creutz Algorithm
The demon energy E_D is exponentially distributed. From the Boltzmann factor, it is possible to extract the inverse temperature, here β = (k_B T)⁻¹ = 2.25.
Q2R
In the limit E_max → 0, the Creutz algorithm resembles a totalistic cellular automaton called Q2R [5]. The update rules on a square lattice for spins σ_ij ∈ {0, 1} are given by

\[
\sigma_{ij}(\tau + 1) = f(x_{ij}) \oplus \sigma_{ij}(\tau)
\tag{51}
\]

with

\[
x_{ij} = \sigma_{i-1\,j} + \sigma_{i+1\,j} + \sigma_{i\,j-1} + \sigma_{i\,j+1}
\quad\text{and}\quad
f(x) =
\begin{cases}
1 & \text{if } x = 2\\
0 & \text{if } x \neq 2.
\end{cases}
\tag{52}
\]
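The Q2R rule can be implemented as a vectorized XOR update. In this sketch the two checkerboard sublattices are updated alternately, so that no site and its neighbors change in the same step; this sublattice scheme and the helper names are our assumptions, and the update then conserves the Ising energy exactly:

```python
import numpy as np

def q2r_step(sigma, parity):
    """Update one checkerboard sublattice (sites with (i + j) % 2 == parity).

    sigma is an L x L array with entries in {0, 1}; a spin flips (XOR with 1)
    exactly when its four-neighbor sum equals 2, i.e. when its local field
    vanishes, so the energy is conserved."""
    x = (np.roll(sigma, 1, 0) + np.roll(sigma, -1, 0)
         + np.roll(sigma, 1, 1) + np.roll(sigma, -1, 1))
    flip = (x == 2).astype(sigma.dtype)          # f(x) from Eq. (52)
    i, j = np.indices(sigma.shape)
    mask = (i + j) % 2 == parity
    sigma = sigma.copy()
    sigma[mask] ^= flip[mask]                    # Eq. (51): sigma <- f(x) XOR sigma
    return sigma
```

Counting mismatched neighbor pairs before and after a step verifies the energy conservation.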
Boundary Conditions
For finite lattices, the following boundary conditions may be used:

• open boundaries, i.e., no neighbors at the edges of the system,
• fixed boundary conditions,
• and periodic boundaries.
Temporal Correlations
According to the definition of a Markov chain, the dependence of a quantity A on the virtual time τ is given by

\[
\langle A(\tau)\rangle = \sum_X p(X, \tau)\,A(X) = \sum_X p(X, \tau_0)\,A(X(\tau)).
\tag{53}
\]

In the second step of the last equation, we used the fact that the average is taken over an ensemble of initial configurations X(τ_0) which evolve according to Eq. (35) [6].
Temporal Correlations
For some τ_0 < τ, the non-linear correlation function

\[
\Phi_A^{\mathrm{nl}}(\tau) = \frac{\langle A(\tau)\rangle - \langle A(\infty)\rangle}{\langle A(\tau_0)\rangle - \langle A(\infty)\rangle}
\tag{54}
\]

is a measure to quantify the deviation of A(τ) from A(∞) relative to the deviation of A(τ_0) from A(∞).
Temporal Correlations
The non-linear correlation time \(\tau_A^{\mathrm{nl}}\) describes the relaxation towards equilibrium and is defined as²

\[
\tau_A^{\mathrm{nl}} = \int_0^\infty \Phi_A^{\mathrm{nl}}(\tau)\,\mathrm{d}\tau.
\tag{56}
\]

²If we consider an exponential decay of \(\Phi_A^{\mathrm{nl}}(\tau)\), we find that this definition is meaningful since

\[
\int_0^\infty \exp\!\left(-\tau/\tau_A^{\mathrm{nl}}\right)\mathrm{d}\tau = \tau_A^{\mathrm{nl}}.
\tag{55}
\]
Temporal Correlations
In the vicinity of the critical temperature T_c, we observe the so-called critical slowing down of our dynamics, i.e., the non-linear correlation time is described by a power law

\[
\tau_A^{\mathrm{nl}} \propto |T - T_c|^{-z_A^{\mathrm{nl}}},
\tag{57}
\]

with \(z_A^{\mathrm{nl}}\) being the non-linear dynamical critical exponent. This is very bad news, because the last equation implies that the time needed to reach equilibrium diverges at T_c.
Temporal Correlations
The linear correlation function of two quantities A and B in equilibrium is defined as

\[
\Phi_{AB}(\tau) = \frac{\langle A(\tau_0)B(\tau)\rangle - \langle A\rangle\langle B\rangle}{\langle AB\rangle - \langle A\rangle\langle B\rangle}
\tag{58}
\]

with

\[
\langle A(\tau_0)B(\tau)\rangle = \sum_X p(X, \tau_0)\,A(X(\tau_0))\,B(X(\tau)).
\]

As τ goes to infinity, Φ_AB(τ) decreases from unity to zero.
Temporal Correlations
If A = B, we call Eq. (58) the autocorrelation function. For the spin-spin correlation in the Ising model, we obtain

\[
\Phi_\sigma(\tau) = \frac{\langle\sigma(\tau_0)\sigma(\tau)\rangle - \langle\sigma(\tau_0)\rangle^2}{\langle\sigma^2(\tau_0)\rangle - \langle\sigma(\tau_0)\rangle^2}.
\tag{59}
\]
Temporal Correlations
The linear correlation time τ_AB describes the relaxation in equilibrium:

\[
\tau_{AB} = \int_0^\infty \Phi_{AB}(\tau)\,\mathrm{d}\tau.
\tag{60}
\]
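In practice, Φ and τ are estimated from a recorded Monte Carlo time series. A sketch for the autocorrelation case A = B, where the integral in Eq. (60) is approximated by a sum truncated at the first zero crossing (the cutoff convention and function names are our choices):

```python
import numpy as np

def autocorrelation(a):
    """Normalized autocorrelation Phi(t) of a time series, with Phi(0) = 1."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    var = np.mean(a * a)
    n = len(a)
    return np.array([np.mean(a[:n - t] * a[t:]) for t in range(n // 2)]) / var

def correlation_time(phi):
    """Integrated correlation time: 1/2 + sum of Phi(t) up to the first zero crossing."""
    cut = int(np.argmax(phi <= 0)) if np.any(phi <= 0) else len(phi)
    return 0.5 + float(np.sum(phi[1:cut]))
```

Truncating at the first zero crossing is a common heuristic to suppress the noise that dominates the tail of the estimated Φ.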
Temporal Correlations
As in the case of the non-linear correlation time, in the vicinity of T_c we observe a critical slowing down, i.e.,

\[
\tau_{AB} \propto |T - T_c|^{-z_A},
\tag{61}
\]

with z_A being the linear dynamical critical exponent.
Temporal Correlations
The dynamical exponents for spin correlations turn out to be

\[
z_\sigma = 2.16 \ \text{(2D)},
\tag{62}
\]
\[
z_\sigma = 2.09 \ \text{(3D)}.
\tag{63}
\]

There is a conjectured relation between the Ising critical exponents and the critical dynamical exponents for spin (σ) and energy (E) correlations. The relations

\[
z_\sigma - z_\sigma^{\mathrm{nl}} = \beta,
\tag{64}
\]
\[
z_E - z_E^{\mathrm{nl}} = 1 - \alpha,
\tag{65}
\]

are numerically well-established but not yet analytically proven.
Decorrelated Configurations
Combining this divergence of the correlation time with that of the correlation length, ξ ∝ |T − T_c|^{−ν}, which is cut off at ξ ≈ L in a finite system, yields

\[
\tau_{AB} \propto |T - T_c|^{-z_{AB}} \propto L^{\frac{z_{AB}}{\nu}}.
\tag{66}
\]
Decorrelated Configurations
To ensure not to sample correlated configurations, one should

• first reach equilibrium (discard the first \(n_0 = c\,\tau^{\mathrm{nl}}(T)\) configurations),
• only sample every \(n_e = c\,\tau(T)\)-th configuration,
• and at T_c use \(n_0 = c\,L^{z^{\mathrm{nl}}/\nu}\) and \(n_e = c\,L^{z/\nu}\),

where c ≈ 3 is a "safety factor" to make sure to discard enough samples.
Finite Size Methods
Divergent behavior at T_c:

\[
\chi(T) \propto |T_c - T|^{-\gamma},\qquad
C(T) \propto |T_c - T|^{-\alpha},\qquad
\xi(T) \propto |T_c - T|^{-\nu}.
\tag{67}
\]
Finite size methods
The system size dependence of the susceptibility χ(T, L) for L = 8, 12, 16, 20, and the corresponding finite size scaling collapse of χ(T, L) L^{−γ/ν} plotted against L^{1/ν}(T − T_c)/T_c.
Finite size methods
The finite size scaling relation of the susceptibility is given by

\[
\chi(T, L) = L^{\frac{\gamma}{\nu}}\,F_\chi\!\left[(T - T_c)\,L^{\frac{1}{\nu}}\right],
\tag{68}
\]

where F_χ is called the susceptibility scaling function³.

³Based on Eq. (23), we can infer that \(F_\chi\left[(T - T_c)L^{1/\nu}\right] \propto \left(|T - T_c|\,L^{1/\nu}\right)^{-\gamma}\) as L → ∞.
Finite size methods
In the case of the magnetization, the corresponding finite size scaling relation is

\[
M_S(T, L) = L^{-\frac{\beta}{\nu}}\,F_{M_S}\!\left[(T - T_c)\,L^{\frac{1}{\nu}}\right].
\tag{69}
\]
Binder Cumulant
We still need a method to better determine T_c and therefore make use of the Binder cumulant

\[
U_L(T) = 1 - \frac{\langle M^4\rangle_L}{3\,\langle M^2\rangle_L^2},
\tag{70}
\]

which is independent of the system size L at T_c since

\[
\frac{\langle M^4\rangle_L}{3\,\langle M^2\rangle_L^2}
= \frac{L^{-\frac{4\beta}{\nu}}\,F_{M^4}\!\left[(T - T_c)L^{\frac{1}{\nu}}\right]}
{3\left(L^{-\frac{2\beta}{\nu}}\,F_{M^2}\!\left[(T - T_c)L^{\frac{1}{\nu}}\right]\right)^2}
= F_C\!\left[(T - T_c)L^{\frac{1}{\nu}}\right].
\tag{71}
\]
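Eq. (70) as an estimator over sampled magnetizations (a minimal sketch; the function name is ours):

```python
import numpy as np

def binder_cumulant(m_samples):
    """U_L = 1 - <M^4> / (3 <M^2>^2), estimated from magnetization samples."""
    m = np.asarray(m_samples, dtype=float)
    return 1.0 - np.mean(m**4) / (3.0 * np.mean(m**2)**2)
```

For a symmetric two-delta distribution (M = ±M_S) this gives 2/3, and for a zero-mean Gaussian it vanishes, matching the limiting cases discussed below.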
Binder Cumulant
The distribution P(M) of the magnetization M above and below the critical temperature T_c.
Binder Cumulant
For T > T_c, the magnetization is described by a Gaussian distribution

\[
P_L(M) = \sqrt{\frac{L^d}{\pi\sigma_L}}\,\exp\!\left[-\frac{M^2 L^d}{\sigma_L}\right],
\tag{72}
\]

with σ_L = 2 k_B T χ_L. Since the fourth moment equals three times the second moment squared, i.e.,

\[
\langle M^4\rangle_L = 3\,\langle M^2\rangle_L^2,
\tag{73}
\]

it follows that U_L(T) must be zero for T > T_c.
Binder Cumulant
Below the critical temperature (T < T_c), there exist one ground state with positive and one with negative magnetization, and the corresponding distribution is given by

\[
P_L(M) = \frac{1}{2}\sqrt{\frac{L^d}{\pi\sigma_L}}
\left(
\exp\!\left[-\frac{(M - M_S)^2 L^d}{\sigma_L}\right]
+ \exp\!\left[-\frac{(M + M_S)^2 L^d}{\sigma_L}\right]
\right).
\tag{74}
\]
Binder Cumulant
For this distribution, it holds that \(\langle M^4\rangle_L = \langle M^2\rangle_L^2\) and therefore U_L(T) = 2/3 for T < T_c. In summary, we demonstrated that

\[
U_L(T) =
\begin{cases}
2/3 & \text{for } T < T_c\\
\text{const.} & \text{for } T = T_c\\
0 & \text{for } T > T_c.
\end{cases}
\tag{75}
\]
Binder Cumulant
The Binder cumulant U_L(T) for L = 8, 12, 16, 20 as a function of temperature; the right panel zooms into the region around T_c, where the curves for different system sizes intersect.
Corrections to Scaling
Far away from T_c we can no longer observe a clear power-law behavior, and corrections to scaling have to be employed, i.e.,

\[
M(T) = A_0\,(T_c - T)^{\beta} + A_1\,(T_c - T)^{\beta_1} + \dots,
\tag{76}
\]
\[
\xi(T) = C_0\,(T_c - T)^{-\nu} + C_1\,(T_c - T)^{-\nu_1} + \dots,
\tag{77}
\]

with β_1 > β and ν_1 < ν.
Corrections to Scaling
These corrections are very important for high quality data, where the errors are small and the deviations become visible. The scaling functions must also be generalized as

\[
M(T, L) = L^{-\frac{\beta}{\nu}}\,F_M\!\left[(T - T_c)L^{\frac{1}{\nu}}\right]
+ L^{-x}\,F_M^1\!\left[(T - T_c)L^{\frac{1}{\nu}}\right] + \dots
\tag{78}
\]

with \(x = \max\left[\frac{\beta_1}{\nu}, \frac{\beta}{\nu_1}, \frac{\beta}{\nu} - 1\right]\).
First Order Transition
The magnetization exhibits a switching behavior if the field vanishes. For non-zero magnetic fields, the magnetization is driven in the direction of the field. The figure is taken from Ref. [7].
First Order Transition
Hysteresis can be found by varying the field from negative to positive values and back. The figure is taken from Ref. [7].
First Order Transition
Binder showed that the magnetization as a function of the field H is described by \(\tanh(\alpha L^d)\) if the distribution of the magnetization is given by Eq. (74) [7]. Specifically, we find for the magnetization and susceptibility

\[
M(H) = \chi_L^D H + M_L\tanh\!\left(\beta H M_L L^d\right),
\tag{79}
\]
\[
\chi_L(H) = \frac{\partial M}{\partial H} = \chi_L^D + \frac{\beta M_L L^d}{\cosh^2\!\left(\beta H M_L L^d\right)}.
\tag{80}
\]
First Order Transition
Similarly to the scaling at a second-order transition, we can scale the maximum of the susceptibility (χ_L(H = 0) ∝ L^d) and the width of the peak (Δχ_L ∝ L^{−d}). To summarize, a first order phase transition is characterized by

1. a bimodal distribution of the order parameter,
2. stochastic switching between the two states in small systems,
3. hysteresis of the order parameter when changing the field,
4. a scaling of the order parameter or response function according to Eq. (80).
Cluster Algorithms
Potts Model
The Hamiltonian of the system is defined as

\[
\mathcal H = -J\sum_{\langle i,j\rangle}\delta_{\sigma_i\sigma_j} - H\sum_i \sigma_i,
\tag{81}
\]

where σ_i ∈ {1, ..., q} and \(\delta_{\sigma_i\sigma_j}\) is unity when nodes i and j are in the same state. The Potts model exhibits a first order transition at the critical temperature for q > 4 in two dimensions, and for q > 2 in dimensions larger than two.
The Kasteleyn and Fortuin Theorem
We consider the Potts model not on a square lattice but on an arbitrary graph of nodes connected by bonds ν. Each node has q possible states, and each connection leads to an energy cost of unity if two connected nodes are in different states and of zero if they are in the same state, i.e.,

\[
E = J\sum_\nu \epsilon_\nu
\quad\text{with}\quad
\epsilon_\nu =
\begin{cases}
0 & \text{if the endpoints are in the same state}\\
1 & \text{otherwise.}
\end{cases}
\tag{82}
\]
The Kasteleyn and Fortuin Theorem
Contraction and deletion of a bond on a graph.
The Kasteleyn and Fortuin Theorem
The partition function is the sum over all possible configurations weighted by the Boltzmann factor and is thus given by

\[
Z = \sum_X e^{-\beta E(X)}
\overset{(82)}{=} \sum_X e^{-\beta J\sum_\nu \epsilon_\nu}
= \sum_X \prod_\nu e^{-\beta J\epsilon_\nu}.
\tag{83}
\]
The Kasteleyn and Fortuin Theorem
We now consider a graph where bond ν_1 connects two nodes i and j with states σ_i and σ_j, respectively. If we delete bond ν_1, the partition function becomes

\[
Z_D = \sum_X \prod_{\nu\neq\nu_1} e^{-\beta J\epsilon_\nu}.
\tag{84}
\]
The Kasteleyn and Fortuin Theorem
We can thus rewrite Eq. (83) as

\[
Z = \sum_X e^{-\beta J\epsilon_{\nu_1}}\prod_{\nu\neq\nu_1} e^{-\beta J\epsilon_\nu}
= \sum_{X:\,\sigma_i=\sigma_j}\prod_{\nu\neq\nu_1} e^{-\beta J\epsilon_\nu}
+ e^{-\beta J}\sum_{X:\,\sigma_i\neq\sigma_j}\prod_{\nu\neq\nu_1} e^{-\beta J\epsilon_\nu},
\]

where the first part is the partition function of the contracted graph Z_C and the second part is given by the identity

\[
\sum_{X:\,\sigma_i\neq\sigma_j}\prod_{\nu\neq\nu_1} e^{-\beta J\epsilon_\nu}
= \sum_X \prod_{\nu\neq\nu_1} e^{-\beta J\epsilon_\nu}
- \sum_{X:\,\sigma_i=\sigma_j}\prod_{\nu\neq\nu_1} e^{-\beta J\epsilon_\nu}
= Z_D - Z_C.
\tag{85}
\]
The Kasteleyn and Fortuin Theorem
Summarizing the last results, we find

\[
Z = Z_C + e^{-\beta J}(Z_D - Z_C) = p\,Z_C + (1 - p)\,Z_D,
\tag{86}
\]

where p = 1 − e^{−βJ}. In other words, we expressed the partition function Z in terms of the partition functions of the graph contracted and deleted at bond ν_1. Applying the same procedure to another bond ν_2 yields

\[
Z = p^2 Z_{C_{\nu_1},C_{\nu_2}}
+ p(1-p)\,Z_{C_{\nu_1},D_{\nu_2}}
+ (1-p)p\,Z_{D_{\nu_1},C_{\nu_2}}
+ (1-p)^2 Z_{D_{\nu_1},D_{\nu_2}}.
\tag{87}
\]
The Kasteleyn and Fortuin Theorem
After applying these operations to every bond, the graph is reduced to a set of separated points corresponding to clusters of nodes which are connected and in the same one of the q states. The partition function reduces to

\[
Z = \sum_{\substack{\text{configurations of}\\ \text{bond percolation}}} q^{\#\text{ of clusters}}\, p^{c}\,(1 - p)^{d}
= \left\langle q^{\#\text{ of clusters}}\right\rangle_b,
\tag{88}
\]

where c and d are the numbers of contracted and deleted bonds, respectively. In the limit q → 1, one obtains the partition function of bond percolation⁴.

⁴In bond percolation, an edge of a graph is occupied with probability p and vacant with probability 1 − p.
Coniglio-Klein Clusters
The probability of a given cluster C to be in a certain state σ_0 is independent of the state itself, i.e.,

\[
p(C, \sigma_0) = p^{c_C}(1 - p)^{d_C}
\sum_{\substack{\text{bond percolation}\\ \text{without cluster } C}}
q^{\#\text{ of clusters}}\, p^{c}\,(1 - p)^{d}.
\tag{89}
\]
Coniglio-Klein Clusters
This implies that flipping this particular cluster has no effect on the partition function (and therefore on the energy), so that it is possible to accept the flip with probability one. This can be seen by looking at the detailed balance condition of the system,

\[
p(C, \sigma_1)\,W[(C, \sigma_1) \to (C, \sigma_2)] = p(C, \sigma_2)\,W[(C, \sigma_2) \to (C, \sigma_1)],
\tag{90}
\]

and using p(C, σ_1) = p(C, σ_2).
Coniglio-Klein Clusters
We then obtain for the acceptance probabilities

\[
A[(C, \sigma_2) \to (C, \sigma_1)] = \min\left[1, \frac{p(C, \sigma_2)}{p(C, \sigma_1)}\right] = 1
\quad \text{(M(RT)}^2\text{)},
\]
\[
A[(C, \sigma_2) \to (C, \sigma_1)] = \frac{p(C, \sigma_2)}{p(C, \sigma_1) + p(C, \sigma_2)} = \frac{1}{2}
\quad \text{(Glauber)}.
\tag{91}
\]

Based on these insights, we introduce cluster algorithms which are much faster than single-spin flip algorithms and less prone to the problem of critical slowing down.
Swendsen-Wang Algorithm
Swendsen-Wang algorithm

• Occupy the bonds with probability p = 1 − e^{−βJ} if sites are in the same state.
• Identify the clusters with the Hoshen-Kopelman algorithm.
• Flip the clusters with probability 1/2 for Ising, or always choose a new state for q > 2.
• Repeat the procedure.
Wolff Algorithm
Wolff algorithm

• Choose a site randomly.
• If the neighboring sites are in the same state, add them to the cluster with probability p = 1 − e^{−βJ}.
• Repeat this for every site on the boundary of the cluster, until all the bonds of the cluster have been checked exactly once.
• Choose a new state for the cluster.
• Repeat the procedure.
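The Wolff procedure above can be sketched for the ±1 Ising model. One convention detail: with the Hamiltonian −J Σ σ_iσ_j for σ = ±1, the bond probability becomes p = 1 − e^{−2βJ}; the p = 1 − e^{−βJ} written above corresponds to the Potts (Kronecker-delta) form of the coupling. The implementation details below (periodic boundaries, stack-based cluster growth) are our own:

```python
import numpy as np

def wolff_step(spins, beta, J=1.0, rng=None):
    """Grow one Wolff cluster and flip it (2D Ising lattice, periodic boundaries)."""
    rng = rng or np.random.default_rng()
    L = spins.shape[0]
    p_add = 1.0 - np.exp(-2.0 * beta * J)   # bond probability for +-1 spins
    i, j = rng.integers(0, L, size=2)
    state = spins[i, j]
    cluster = {(int(i), int(j))}
    stack = [(int(i), int(j))]
    while stack:
        x, y = stack.pop()
        # Check the four neighbors; equal-state neighbors join with p_add.
        for nx, ny in ((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L):
            if (nx, ny) not in cluster and spins[nx, ny] == state and rng.random() < p_add:
                cluster.add((nx, ny))
                stack.append((nx, ny))
    for x, y in cluster:                     # flip the whole cluster at once
        spins[x, y] = -spins[x, y]
    return len(cluster)
```

At very low temperature the cluster spans the whole lattice, so a single step flips the entire system.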
Other Ising-like Models
One of the possible generalizations of the Ising model is the so-called n-vector model.

The Hamiltonian resembles the one of the Potts model in the sense that it favors spin alignment:

\[
\mathcal H = -J\sum_{\langle i,j\rangle}\mathbf S_i\cdot\mathbf S_j + \mathbf H\cdot\sum_i \mathbf S_i,
\tag{92}
\]

with \(\mathbf S_i = (S_i^1, S_i^2, \dots, S_i^n)\) and \(|\mathbf S_i| = 1\).
Other Ising-like Models
The dependence of the critical temperature on the number of vector components n.
Other Ising-like Models
For Monte Carlo simulations with vector-valued spins, we have to adapt our simulation methods. The classical strategy is to modify a spin locally by adding a small ΔS such that S'_i = S_i + ΔS with ΔS ⊥ S_i. The classical Metropolis algorithm can then be used in the same fashion as in the Ising model.
Histogram Methods
For computing the thermal average defined by Eq. (14), we need to sample different configurations at different temperatures. Another possibility is to determine an average at a certain temperature T_0 and extrapolate to another temperature T. In the case of a canonical ensemble, an extrapolation can be achieved by reweighting the histogram of energies \(p_{T_0}(E)\) with the Boltzmann factor \(\exp\left[-E/(k_B T) + E/(k_B T_0)\right]\).
Histogram Methods
Such histogram methods have first been described in Ref. [8]. We now reformulate the computation of the thermal average of a quantity Q and of the partition function as a sum over all possible energies instead of over all possible configurations and find

\[
Q(T_0) = \frac{1}{Z_{T_0}}\sum_E Q(E)\,p_{T_0}(E)
\quad\text{with}\quad
Z_{T_0} = \sum_E p_{T_0}(E),
\tag{93}
\]

where \(p_{T_0}(E) = g(E)\,e^{-\frac{E}{k_B T_0}}\), with g(E) defining the degeneracy of states, i.e., the number of states with energy E.
Histogram Methods
This takes into account the fact that multiple configurations can have the same energy. The goal is to compute the quantity Q at another temperature T:

\[
Q(T) = \frac{1}{Z_T}\sum_E Q(E)\,p_T(E).
\tag{94}
\]

The degeneracy of states contains all the information needed. Using the definition of g(E) yields

\[
p_T(E) = g(E)\,e^{-\frac{E}{k_B T}} = p_{T_0}(E)\exp\!\left[-\frac{E}{k_B T} + \frac{E}{k_B T_0}\right]
\tag{95}
\]

and with \(f_{T_0,T}(E) = \exp\left[-\frac{E}{k_B T} + \frac{E}{k_B T_0}\right]\) we finally obtain

\[
Q(T) = \frac{\sum_E Q(E)\,p_{T_0}(E)\,f_{T_0,T}(E)}{\sum_E p_{T_0}(E)\,f_{T_0,T}(E)}.
\tag{96}
\]
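In a simulation, one usually stores the sampled energies and observables directly, so that Eq. (96) becomes a weighted average over samples drawn at T_0. A sketch assuming k_B = 1; the log-space stabilization and function name are our choices:

```python
import numpy as np

def reweight(E, Q, T0, T, kB=1.0):
    """Extrapolate <Q> measured at T0 to temperature T via Eq. (96).

    E, Q: arrays of sampled energies and of the measured quantity per sample;
    the empirical histogram of E at T0 plays the role of p_T0(E)."""
    E = np.asarray(E, dtype=float)
    Q = np.asarray(Q, dtype=float)
    logf = -E / (kB * T) + E / (kB * T0)   # log of the reweighting factor f_{T0,T}(E)
    logf -= logf.max()                     # stabilize the exponentials
    f = np.exp(logf)
    return np.sum(Q * f) / np.sum(f)
```

For T = T_0 the weights are constant and the plain sample mean is recovered; in practice the extrapolation is only reliable for T close to T_0, where the sampled histogram still overlaps the target distribution.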
Histogram Methods
An example of the histogram and the broad histogram method for different system sizes. The figure is taken from Ref. [9].
Broad Histogram Method
Let N_up and N_down be the numbers of processes which lead to an increasing and decreasing energy, respectively. Furthermore, we have to keep in mind that the degeneracy of states increases exponentially with energy E, because the number of possible configurations increases with energy. To explore all energy regions equally, we find a condition equivalent to the one of detailed balance, i.e.,

\[
g(E + \Delta E)\,N_{\mathrm{down}}(E + \Delta E) = g(E)\,N_{\mathrm{up}}(E).
\tag{97}
\]
Broad Histogram Method
The motion in phase space towards higher energies can then be penalized with a Metropolis-like dynamics:

• Choose a new configuration,
• if the new energy is lower, accept the move,
• if the new energy is higher, accept the move with probability N_down(E + ∆E)/N_up(E).
116
Broad Histogram Method
We obtain the function g(E) by taking the logarithm of Eq. (97):

log[g(E + ∆E)] − log[g(E)] = log[N_up(E)] − log[N_down(E + ∆E)]. (98)
117
Broad Histogram Method
In the limit of small energy differences, dividing by ∆E allows us to approximate the last equation by

∂ log[g(E)]/∂E = (1/∆E) log[N_up(E)/N_down(E + ∆E)], (99)
which we can numerically integrate to obtain g (E).
118
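The numerical integration of Eq. (99) amounts to a cumulative sum of log ratios on a uniform energy grid. A minimal sketch, assuming arrays `n_up` and `n_down` holding N_up(E_k) and N_down(E_k) (names are illustrative):

```python
import numpy as np

def log_g_from_rates(n_up, n_down):
    """Integrate Eq. (98)/(99): log g(E+dE) - log g(E)
    = log N_up(E) - log N_down(E+dE), on a uniform energy grid."""
    n_up = np.asarray(n_up, dtype=float)
    n_down = np.asarray(n_down, dtype=float)
    # increments between consecutive energy levels
    dlog = np.log(n_up[:-1]) - np.log(n_down[1:])
    # cumulative sum, normalizing g at the lowest energy to 1 (log g = 0)
    return np.concatenate(([0.0], np.cumsum(dlog)))
```

Feeding in rates that satisfy Eq. (97) exactly returns log g(E) up to the chosen normalization log g(E_0) = 0.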
Broad Histogram Method
Distributions of N_up and N_down can be obtained by keeping track of these numbers for each configuration at a certain energy. In addition, we also need to store the values of the quantity Q(E) we wish to compute as a thermal average according to

Q(T) = ∑_E Q(E) g(E) e^{−E/(k_B T)} / ∑_E g(E) e^{−E/(k_B T)}. (100)

Based on a known degeneracy of states g(E), we can now compute quantities at any temperature.
119
Flat Histogram Method
Flat histogram method
• Start with g(E) = 1 and set f = e.
• Make a Monte Carlo update with p(E) = 1/g(E).
• If the attempt is successful at E: g(E) ← f · g(E).
• Obtain a histogram of energies H(E).
• If H(E) is flat enough, then f ← √f.
• Stop when f ≤ 1 + 10^{−8}.
120
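The recipe above can be sketched for a toy model in which the exact g(E) is known: n non-interacting binary spins with E equal to the number of up spins, so that g(E) is the binomial coefficient C(n, E). The sketch follows the f = e, f ← √f, f ≤ 1 + 10⁻⁸ schedule from the slide; as a common Wang-Landau variant (and an assumption here), g is multiplied by f at the current energy after every attempt, not only after accepted moves:

```python
import numpy as np

def wang_landau(n_spins=8, ln_f_final=1e-8, flatness=0.8, seed=0):
    """Flat-histogram estimate of ln g(E) for n_spins non-interacting
    binary spins with E = number of up spins (exact g(E) = C(n_spins, E))."""
    rng = np.random.default_rng(seed)
    spins = rng.integers(0, 2, n_spins)
    E = int(spins.sum())
    ln_g = np.zeros(n_spins + 1)      # work with ln g(E) to avoid overflow
    ln_f = 1.0                        # f = e, i.e. ln f = 1
    while ln_f > ln_f_final:          # stop when f is close enough to 1
        H = np.zeros(n_spins + 1)
        while True:
            i = rng.integers(n_spins)
            E_new = E + (1 - 2 * int(spins[i]))   # a flip changes E by +-1
            # accept with probability min(1, g(E)/g(E_new)), i.e. p(E) = 1/g(E)
            if ln_g[E] >= ln_g[E_new] or rng.random() < np.exp(ln_g[E] - ln_g[E_new]):
                spins[i] ^= 1
                E = E_new
            ln_g[E] += ln_f           # g(E) <- f * g(E) at the current energy
            H[E] += 1
            if H.sum() >= 1000 and H.min() > flatness * H.mean():
                break                 # histogram flat enough
        ln_f /= 2.0                   # f <- sqrt(f), then reset H(E)
    return ln_g - ln_g[0]             # normalize so that ln g(0) = 0
```

Since ln g is only determined up to an additive constant, the result is normalized at E = 0 before comparing with ln C(n, E).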
Umbrella Sampling
The basic idea is to multiply transition probabilities with a function w that is large at the free energy barrier and to later on remove this correction in the averaging step:

p(C) = w(C) e^{−E(C)/(k_B T)} / ∑_C w(C) e^{−E(C)/(k_B T)}  with  ⟨A⟩ = ⟨A/w⟩_w / ⟨1/w⟩_w. (101)
121
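The correction step ⟨A⟩ = ⟨A/w⟩_w/⟨1/w⟩_w of Eq. (101) is a one-liner; the sketch below assumes `a_samples` and `w_samples` hold A(C) and w(C) along a trajectory sampled from the biased distribution p(C) (names are illustrative):

```python
import numpy as np

def umbrella_average(a_samples, w_samples):
    """Recover the unbiased average <A> from samples drawn with the
    umbrella weights w, via Eq. (101): <A> = <A/w>_w / <1/w>_w."""
    a = np.asarray(a_samples, dtype=float)
    w = np.asarray(w_samples, dtype=float)
    return np.sum(a / w) / np.sum(1.0 / w)
```

Dividing each sample by its own weight undoes the bias; the denominator restores the normalization.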
Summary Histogram Methods
Summarizing, some of the most common techniques related to the histogram methods are
• Wang-Landau method [10, 11],• Multiple histogram method [12],• Multicanonical Monte Carlo [13],• Flat Histogram method [14],• Umbrella sampling [15].
122
Renormalization Group
Self-similarity
The Koch snowflake as an example of a self-similar pattern.
123
Real Space Renormalization
H. J. Maris and L. P. Kadanoff describe in Teaching the renormalization group that “communicating exciting new developments in physics to undergraduate students is of great importance. There is a natural tendency for students to believe that they are a long way from the frontiers of science where discoveries are still being made.” [16]
124
Renormalization and Free Energy
To build some intuition for renormalization approaches, we consider a scale transformation of the characteristic length L of our system that leads to a rescaled characteristic length L′ = L/l. Moreover, we consider the partition function of an Ising system as defined by the Hamiltonian given in Eq. (15). A scale transformation with L′ = L/l leaves the partition function

Z = ∑_σ e^{−βH} (102)

and the corresponding free energy invariant [17].
125
Renormalization and Free Energy
We therefore find for the free energy densities

f(ϵ, H) = l^{−d} f(ϵ′, H′). (103)

We set ϵ′ = l^{y_T} ϵ and H′ = l^{y_H} H to obtain

f(ϵ′, H′) = f(l^{y_T} ϵ, l^{y_H} H). (104)
126
Renormalization and Free Energy
Since renormalization also affects the correlation length

ξ ∝ |T − T_c|^{−ν} = |ϵ|^{−ν}, (105)

we can relate the critical exponent ν to y_T. The renormalized correlation length ξ′ = ξ/l scales as

ξ′ ∝ ϵ′^{−ν}. (106)
127
Renormalization and Free Energy
And due to

l^{y_T} ϵ = ϵ′ ∝ ξ′^{−1/ν} = (ξ/l)^{−1/ν} ∝ ϵ l^{1/ν}, (107)

we find y_T = 1/ν.

The critical point is a fixed point of the transformation since ϵ = 0 at T_c and ϵ does not change, independent of the value of the scaling factor.
128
Majority Rule
A straightforward example which can be regarded as renormalization of spin systems is the majority rule. Instead of considering all spins in a certain neighborhood separately, one just takes the direction of the net magnetization of these regions as new spin value, i.e.,

σ′_i = sign( ∑_{region} σ_i ). (108)
129
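Eq. (108) can be sketched with a block reshape; the function assumes a square ±1 configuration whose linear size is divisible by the block size b, with ties broken in favor of +1 (one possible convention):

```python
import numpy as np

def majority_rule(spins, b=3):
    """Coarse-grain an Ising configuration (entries +-1) by replacing each
    b x b block with the sign of its net magnetization, Eq. (108)."""
    L = spins.shape[0]
    # group the lattice into (L/b) x (L/b) blocks of size b x b and sum them
    blocks = spins.reshape(L // b, b, L // b, b).sum(axis=(1, 3))
    return np.where(blocks >= 0, 1, -1)
```

Each application reduces the linear size by the factor b, exactly as in the illustration on the next slide.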
Majority Rule
An illustration of the majority rule renormalization.
130
Decimation of the One-dimensional Ising Model
The spins only interact with their nearest neighbors and the coupling constant K = J/(k_B T) is the same for all spins.
131
Decimation of the One-dimensional Ising Model
An example of a one-dimensional Ising chain.
132
Decimation of the One-dimensional Ising Model
To further analyze this system, we compute its partition function Z and obtain

Z = ∑_σ e^{K ∑_i σ_i σ_{i+1}}
  = ∑_{σ_{2i}=±1} ∏_{2i} ∑_{σ_{2i+1}=±1} e^{K(σ_{2i} σ_{2i+1} + σ_{2i+1} σ_{2i+2})}
  = ∑_{σ_{2i}=±1} ∏_{2i} 2 cosh[K(σ_{2i} + σ_{2i+2})]
  = ∑_{σ_{2i}=±1} ∏_{2i} z(K) e^{K′ σ_{2i} σ_{2i+2}}
  = [z(K)]^{N/2} ∑_{σ_{2i}=±1} ∏_{2i} e^{K′ σ_{2i} σ_{2i+2}}, (109)

where we used in the third step that the cosh(·) function only depends on the even spins.
133
Decimation of the One-dimensional Ising Model
According to Eq. (109), the relation

Z(K, N) = [z(K)]^{N/2} Z(K′, N/2) (110)

holds as a consequence of the decimation method. The function z(K) is the spin-independent part of the partition function and K′ is the renormalized coupling constant.
134
Decimation of the One-dimensional Ising Model
We compute the relation z(K) e^{K′ s_{2i} s_{2i+2}} = 2 cosh[K(s_{2i} + s_{2i+2})] explicitly and find

z(K) e^{K′ s_{2i} s_{2i+2}} = 2 cosh(2K)  if s_{2i} = s_{2i+2},
                            = 2           otherwise. (111)
135
Decimation of the One-dimensional Ising Model
Dividing and multiplying the last two expressions yields

e^{2K′} = cosh(2K)  and  z^2(K) = 4 cosh(2K). (112)

And the renormalized coupling constant K′ in terms of K is given by

K′ = (1/2) ln[cosh(2K)]. (113)
136
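Iterating the map K′ = ½ ln[cosh(2K)] numerically makes the flow to the stable fixed point K∗ = 0 visible, i.e., the absence of a finite-temperature phase transition in one dimension. A minimal sketch:

```python
import numpy as np

def renormalize(K, steps=1):
    """Iterate the decimation map of Eq. (113): K -> (1/2) ln cosh(2K)."""
    for _ in range(steps):
        K = 0.5 * np.log(np.cosh(2.0 * K))
    return K
```

Near K = 0 the map behaves as K′ ≈ K², so the iteration converges to the high-temperature fixed point very rapidly; only K → ∞ is (unstably) fixed.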
Decimation of the One-dimensional Ising Model
An illustration of the fixed-point iteration defined by Eq. (113), with K′ plotted against K on [0, 1].
137
Decimation of the One-dimensional Ising Model
Given the partition function, we now compute the free energy according to F = −k_B T N f(K) = −k_B T ln(Z) with f(K) being the free energy density. Taking the logarithm of Eq. (110) yields

ln[Z(K, N)] = N f(K) = (N/2) ln[z(K)] + (N/2) f(K′). (114)

Based on the last equation, we can derive the following recursive relation for the free energy density

f(K′) = 2 f(K) − ln[2 cosh(2K)^{1/2}]. (115)
138
Decimation of the One-dimensional Ising Model
There exists one stable fixed point at K∗ = 0 and another unstable one at K∗ → ∞. At every fixed point (K′ = K = K∗), Eq. (115) can be rewritten using f(K′) = f(K∗).

The case of K∗ = 0 corresponds to the high-temperature limit where the free energy approaches the value

F = −N k_B T f(K∗) = −N k_B T ln(2). (116)

In this case, the entropy dominates the free energy.
139
Decimation of the One-dimensional Ising Model
For K∗ → ∞, the system approaches the low-temperature limit and the free energy is given by

F = −N k_B T f(K∗) = −N k_B T K = −N J, (117)

i.e., it is given by the internal energy.
140
Generalization
In general, multiple coupling constants are necessary, as for example in the case of the two-dimensional Ising model. Thus, we have to construct a renormalized Hamiltonian based on multiple renormalized coupling constants, i.e.,

H = ∑_{α=1}^{M} K_α O_α  with  O_α = ∑_i ∏_{k∈c_α} σ_{i+k}, (118)

where c_α is the configuration subset over which we renormalize and

K′_α = K′_α(K_1, . . . , K_M)  with  α ∈ {1, . . . , M}. (119)
141
Generalization
At T_c there exists a fixed point K∗_α = K′_α(K∗_1, . . . , K∗_M). A possible ansatz to solve this problem is the linearization of the transformation. Thus, we compute the Jacobian T_{α,β} = ∂K′_α/∂K_β and obtain

K′_α − K∗_α = ∑_β T_{α,β}|_{K∗} (K_β − K∗_β). (120)
142
Generalization
To analyze the behavior of the system close to criticality, we consider eigenvalues λ_1, . . . , λ_M and eigenvectors ϕ_1, . . . , ϕ_M of the linearized transformation defined by Eq. (120). The eigenvectors fulfill T ϕ_α = λ_α ϕ_α, and the fixed point is unstable in direction ϕ_α if λ_α > 1.
143
Generalization
The largest eigenvalue dominates the iteration and we can identify the scaling field ϵ′ = l^{y_T} ϵ with the eigenvector of the transformation, and the scaling factor with the eigenvalue λ_T = l^{y_T}. Then, we compute the exponent ν according to

ν = 1/y_T = ln(l)/ln(λ_T). (121)
144
Monte Carlo Renormalization Group
Since we are dealing with generalized Hamiltonians with many interaction terms, we compute the thermal average of O_α according to

⟨O_α⟩ = ∑_σ O_α e^{∑_β K_β O_β} / ∑_σ e^{∑_β K_β O_β} = ∂F/∂K_α, (122)
where F is the free energy.
145
Monte Carlo Renormalization Group
Using the fluctuation-dissipation theorem, we can also numerically compute the response functions

χ_{α,β} = ∂⟨O_α⟩/∂K_β = ⟨O_α O_β⟩ − ⟨O_α⟩⟨O_β⟩,
χ^{(n)}_{α,β} = ∂⟨O^{(n)}_α⟩/∂K_β = ⟨O^{(n)}_α O_β⟩ − ⟨O^{(n)}_α⟩⟨O_β⟩,

where O^{(n)}_α denotes O_α after n renormalization steps.
146
Monte Carlo Renormalization Group
We also find with Eq. (122) that

χ^{(n)}_{α,β} = ∂⟨O^{(n)}_α⟩/∂K_β = ∑_γ (∂K′_γ/∂K_β) (∂⟨O^{(n)}_α⟩/∂K′_γ) = ∑_γ T_{γ,β} χ^{(n)}_{α,γ}. (123)

It is thus possible to derive a value of T_{γ,β} from the correlation functions by solving a set of M coupled linear equations. At the point K = K∗, we can apply this method in an iterative manner to compute critical exponents as suggested by Eq. (121).
147
Monte Carlo Renormalization Group
There are many error sources in this technique that originate from the fact that we are using a combination of several tricks to obtain our results:

• Statistical errors,
• Truncation of the Hamiltonian at the Mth order,
• Finite number of scaling iterations,
• Finite size effects,
• No precise knowledge of K∗.
148
Boltzmann Machine
Hopfield Network
We begin our excursion to Boltzmann machines with a network consisting of neurons which are fully connected, i.e., every single neuron is connected to all other neurons. A neuron represents a node of a network and is nothing but a function of I different inputs {x_i}_{i∈{1,...,I}} which are weighted by {w_i}_{i∈{1,...,I}} to compute an output y.
149
Hopfield Network
An illustration of a single neuron with output y, inputs {x_i}_{i∈{1,...,I}} and weights {w_i}_{i∈{1,...,I}}.
150
Hopfield Network
In terms of a Hopfield network, we consider discrete inputs x_i ∈ {−1, 1}. The activation of neuron i is given by

a_i = ∑_j w_{ij} x_j, (124)

where we sum over the inputs. The weights fulfill w_{ij} = w_{ji} and w_{ii} = 0.
151
Hopfield Network
Similarly to the Ising model, the associated energy is given by
E = −(1/2) ∑_{i,j} w_{ij} x_i x_j − ∑_i b_i x_i, (125)

where b_i is the bias term.
152
Hopfield Network
The dynamics of a Hopfield network is given by
x_i(a_i) =  1  if a_i ≥ 0,
           −1  otherwise. (126)
153
Hopfield Network
The energy difference ∆E_i after neuron i has been updated is

∆E_i = E(x_i = −1) − E(x_i = 1) = 2 ( b_i + ∑_j w_{ij} x_j ). (127)

We can absorb the bias b_i in the sum by having an extra active unit at every node in the network. We thus showed that the activation defined by Eq. (124) equals one half of the energy difference ∆E_i.
154
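The deterministic dynamics of Eq. (126) and the energy of Eq. (125) can be sketched as follows; the asynchronous sweep order and the tie convention a_i = 0 → +1 are implementation choices:

```python
import numpy as np

def hopfield_step(x, W, b):
    """One asynchronous sweep of the update rule Eq. (126),
    x_i <- sign(a_i) with a_i = sum_j w_ij x_j + b_i."""
    for i in range(len(x)):
        a = W[i] @ x + b[i]
        x[i] = 1 if a >= 0 else -1
    return x

def energy(x, W, b):
    """Hopfield energy, Eq. (125)."""
    return -0.5 * x @ W @ x - b @ x
```

Storing a single pattern p via the Hebbian choice w_ij = p_i p_j (zero diagonal, an illustration not taken from the slides) lets a single sweep repair a corrupted copy of p, and each flip can only lower the energy.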
Boltzmann Machine Learning
A comparison of denoising capabilities of Hopfield and Boltzmann machine models. The figure is taken from
https://arxiv.org/pdf/1803.08823.pdf.
155
Boltzmann Machine Learning
For some applications, finding a local minimum based on the deterministic update rule defined by Eq. (126) might not be sufficient. Similar to the discussion of Monte Carlo methods for Ising systems, we employ an update probability

p_i = 1/(1 + exp(−∆E_i/T)) = σ(2a_i/T) (128)

to set neuron i to unity independent of its state [18]. Here, σ(x) = 1/[1 + exp(−x)] denotes the sigmoid function.
156
Boltzmann Machine Learning
As defined in Eq. (127), the energy difference ∆E_i is the gap between a configuration with an active neuron i and an inactive one. The parameter T acts as a temperature equivalent⁵.

A closer look at Eqs. (125) and (128) tells us that we are simulating a Hamiltonian system with Glauber dynamics. Due to the fulfilled detailed balance condition, we reach thermal equilibrium and find again for the probabilities of the system to be in state X or Y⁶

p_eq(Y)/p_eq(X) = exp( −[E(Y) − E(X)]/T ). (129)

⁵For T → 0, we recover deterministic dynamics as described by Eq. (126).
⁶Here we set k_B = 1.
157
Boltzmann Machine Learning
We divide the Boltzmann machine units into visible and hidden units, represented by the non-empty set V and the possibly empty set H, respectively. The visible units are set by the environment whereas the hidden units are additional variables which might be necessary to model certain outputs.
158
Boltzmann Machine Learning
Let P′(ν) be the probability distribution over the visible units ν in a freely running network. It can be obtained by marginalizing over the corresponding joint probability distribution, i.e.,

P′(ν) = ∑_h P′(ν, h), (130)

where h represents a configuration of the hidden units.
159
Boltzmann Machine Learning
The goal is to come up with a method such that P′(ν) approaches the unknown environment distribution P(ν). We measure the difference between P′(ν) and P(ν) in terms of the Kullback-Leibler divergence (relative entropy)

G = ∑_{ν∈V} P(ν) ln[ P(ν)/P′(ν) ]. (131)
160
Boltzmann Machine Learning
To minimize G, we perform a gradient descent according to

∂G/∂w_{ij} = −(1/T) (p_{ij} − p′_{ij}), (132)

where p_{ij} is the probability that two units are active on average if the environment is determining the states of the visible units, and p′_{ij} is the corresponding probability in a freely running network without a coupling to the environment.
161
Boltzmann Machine Learning
Both probabilities are measured at thermal equilibrium. In the literature, the probabilities p_{ij} and p′_{ij} are also often defined in terms of the thermal averages ⟨x_i x_j⟩_data and ⟨x_i x_j⟩_model, respectively. The weights w_{ij} of the network are then updated according to

∆w_{ij} = ϵ (p_{ij} − p′_{ij}) = ϵ (⟨x_i x_j⟩_data − ⟨x_i x_j⟩_model), (133)

where ϵ is the learning rate.
162
Boltzmann Machine Learning
The so-called restricted Boltzmann machine does not take into account mutual connections within the set of hidden units or within the set of visible units, and turned out to be more suitable for learning tasks. In the case of restricted Boltzmann machines, the weight update is given by

∆w_{ij} = ϵ (⟨ν_i h_j⟩_data − ⟨ν_i h_j⟩_model), (134)

where ν_i and h_j represent visible and hidden units, respectively.
163
Boltzmann Machine Learning
Instead of sampling the configurations for computing ⟨ν_i h_j⟩_data and ⟨ν_i h_j⟩_model at thermal equilibrium, we could also just consider a few relaxation steps. This method is called contrastive divergence and is defined by the following update rule

∆w^{CD}_{ij} = ϵ ( ⟨ν_i h_j⟩^0_data − ⟨ν_i h_j⟩^k_model ), (135)

where the superscript indices indicate the number of updates.
164
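A CD-k weight update, Eq. (135), can be sketched for a restricted Boltzmann machine. The sketch assumes binary {0, 1} units, no bias terms, and T = 1; `v0` is a batch of data vectors, and all names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_update(v0, W, rng, lr=0.1, k=1):
    """One contrastive-divergence (CD-k) update of Eq. (135) for an RBM
    with binary {0,1} units and no biases.
    v0: data batch, shape (B, n_v); W: weights, shape (n_v, n_h)."""
    # positive phase: hidden activations clamped to the data
    ph0 = sigmoid(v0 @ W)
    pos = v0.T @ ph0                       # <v_i h_j>_data (batch sum)
    # negative phase: k steps of alternating Gibbs sampling
    v = v0
    for _ in range(k):
        h = (rng.random((len(v0), W.shape[1])) < sigmoid(v @ W)).astype(float)
        v = (rng.random(v0.shape) < sigmoid(h @ W.T)).astype(float)
    phk = sigmoid(v @ W)
    neg = v.T @ phk                        # <v_i h_j>_model after k steps
    return W + lr * (pos - neg) / len(v0)
```

Because the hidden (visible) units are conditionally independent given the visible (hidden) layer, each Gibbs half-step samples a whole layer at once.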
Parallelization
Parallelization
Some important concepts are summarized in the lecture notes.
165
Non-Equilibrium Systems
Directed Percolation
Isotropic percolation is an equilibrium process in which a site or bond is occupied with a certain probability p. However, in the case of directed percolation an additional constraint is defined, namely in each time step occupied sites or bonds occupy their neighbors with probability p only along a given direction [19]. Time is therefore a relevant parameter.
166
Directed Percolation
167
Directed Percolation
168
Directed Percolation
In the vicinity of p_c the percolation order parameter P(p), the fraction of sites of the largest cluster (isotropic percolation) or the density of active sites (directed percolation), takes the form P(p) ∝ (p − p_c)^β.

Although the free energy is not defined in non-equilibrium processes, one can distinguish between first-order and continuous phase transitions based on the behavior of the order parameter.
169
Directed Percolation
Another formulation of directed percolation is given by the contact process. On a lattice with active (s_i(t) = 1) and inactive (s_i(t) = 0) sites, we sequentially update the system according to the following dynamics:

Let n_i(t) = ∑_{⟨i,j⟩} s_j(t) be the number of active neighbors; a new value s_i(t + dt) ∈ {0, 1} is obtained according to the transition rates

w[0 → 1, n] = λn/(2d)  and  w[1 → 0, n] = 1, (136)

where d is the dimension of the system.
170
Directed Percolation
The critical infection rate λ_c and the critical exponent β of the contact process on hypercubic lattices of dimension d. At and above the critical dimension d_c = 4 the critical exponents are described by mean-field theory [17]. The values are rounded to the second decimal and taken from Refs. [20] and [21].

      d = 1   d = 2   d = 3   d_c = 4
λ_c   3.30    1.65    1.32    1.20
β     0.28    0.59    0.78    1
171
Directed Percolation
The contact process has the same characteristic critical exponents as directed percolation, which implies that they both belong to the same universality class. According to Ref. [17], the directed percolation universality class requires:
1. a continuous phase transition from a fluctuating phase toa unique absorbing state,
2. a transition with a one-component order parameter,3. local process dynamics,4. no additional attributes, such as conservation laws or
special symmetries.
172
Kinetic Monte Carlo (Gillespie Algorithm)
Non-equilibrium systems are not described by a Hamiltonian, and we cannot use the same algorithms which we employed for studying the Ising model or other equilibrium models. Time is now a physical parameter and not a mere virtual property of a Markov chain.

A standard algorithm for simulating non-equilibrium dynamics is the kinetic Monte Carlo method, which is also often referred to as the Gillespie algorithm.
173
Kinetic Monte Carlo (Gillespie Algorithm)
To give an example, the kinetic Monte Carlo algorithm is applied to the contact process on a two-dimensional square lattice. Recovery and activation define n = 2 processes with corresponding rates R_1 = 1 and R_2 = λ.
174
Kinetic Monte Carlo (Gillespie Algorithm)
At time t, the subpopulation N_1(t) consists of all active nodes, which recover with rate R_1. The total recovery rate is then given by Q_1(t) = N_1(t).

On the square lattice only nearest-neighbor interactions are considered, and the total rate of the second process (activation) is obtained by computing n_i and w[0 → 1, n_i] according to Eq. (136) for all nodes.
175
Kinetic Monte Carlo (Gillespie Algorithm)
Kinetic Monte Carlo algorithm
1. Identify all individual rates (per node) {R_i}_{i∈{1,...,n}}.
2. Determine the overall rates (all nodes) {Q_i}_{i∈{1,...,n}}, where Q_i = R_i N_i and N_i defines the subpopulation which corresponds to R_i. It is possible that some subpopulations are identical but correspond to different rates.
3. Let η ∈ [0, 1) be a uniformly distributed random number and Q = ∑_i Q_i. The process with rate R_j occurs if ∑_{i=1}^{j−1} Q_i ≤ ηQ < ∑_{i=1}^{j} Q_i.
4. Let ϵ ∈ [0, 1) be a uniformly distributed random number. Then the time evolves as t → t + ∆t, where ∆t = −Q^{−1} log(1 − ϵ), and return to step 2.
176
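The algorithm above can be sketched for the contact process on an L × L square lattice with periodic boundaries. The activation channel is bounded by λ per active site and handled by rejection (null events), a standard thinning trick that leaves the sampled dynamics statistically exact; names and defaults are illustrative:

```python
import numpy as np

def contact_process_kmc(L=32, lam=2.0, t_max=5.0, seed=0):
    """Kinetic Monte Carlo (Gillespie) sketch of the contact process:
    each active site recovers at rate 1 and attempts to activate a random
    nearest neighbor at rate lam, i.e. w[0 -> 1, n] = lam*n/(2d) per
    inactive site as in Eq. (136)."""
    rng = np.random.default_rng(seed)
    d = 2
    s = np.ones((L, L), dtype=int)          # start from the fully active state
    t, n_active = 0.0, L * L
    while 0 < n_active and t < t_max:
        Q1 = float(n_active)                 # total recovery rate
        Q2 = lam * n_active                  # upper bound on the activation rate
        Q = Q1 + Q2
        t += -np.log(1.0 - rng.random()) / Q # exponential waiting time
        xs, ys = np.nonzero(s)               # pick a random active site
        k = rng.integers(len(xs))
        i, j = xs[k], ys[k]
        if rng.random() < Q1 / Q:
            s[i, j] = 0                      # recovery: 1 -> 0 with rate 1
            n_active -= 1
        else:
            # activation attempt on a random neighbor; attempts on already
            # active neighbors are null events (rejection step)
            di, dj = ((1, 0), (-1, 0), (0, 1), (0, -1))[rng.integers(2 * d)]
            ni, nj = (i + di) % L, (j + dj) % L
            if s[ni, nj] == 0:
                s[ni, nj] = 1
                n_active += 1
    return t, s
```

For λ below the critical value λ_c ≈ 1.65 of the table above, the lattice quickly falls into the absorbing state; above it, a finite density of active sites survives on the simulated time scales.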
[1] Stanley, H.Scaling, universality, and renormalization: Three pillarsof modern critical phenomena (1999).URL http://link.springer.com/chapter/10.1007%2F978-1-4612-1512-7_39.
[2] Stanley, H.Introduction to Phase Transitions and Critical Phenomena(Clarendon, Oxford, 1971).
[3] Sengers, J., Sengers, J. L. & Croxton, C.Progress in liquid physics.Wiley, Chichester (1978).
177
[4] Pelissetto, A. & Vicari, E.Critical phenomena and renormalization-group theory.Phys. Rep. 368, 549–727 (2002).
[5] Vichniac, G. Y.Simulating physics with cellular automata.Phys. D 10, 96–116 (1984).
[6] Binder, K. & Heermann, D.Monte Carlo Simulation in Statistical Physics (Springer,1997).
[7] Binder, K. & Landau, D.Finite-size scaling at first-order phase transitions.Phys. Rev. B 30, 1477 (1984).
178
[8] Salsburg, Z. W., Jacobson, J. D., Fickett, W. & Wood, W. W.
Application of the Monte Carlo method to the lattice-gas model.
J. Chem. Phys. 30, 65 (1959).
[9] de Oliveira, P., Penna, T. & Herrmann, H.Broad histogram method.Braz. J. Phys. 677 (1996).
[10] Wang, F. & Landau, D.Efficient, multiple-range random walk algorithm tocalculate the density of states.Phys. Rev. Lett. 86, 2050 (2001).
179
[11] Zhou, C., Schulthess, T. C., Torbrügge, S. & Landau, D.
Wang-Landau algorithm for continuous models and joint
density of states.
Phys. Rev. Lett. 96, 120201 (2006).
[12] Ferrenberg, A. M. & Swendsen, R. H.
Optimized Monte Carlo data analysis.
Phys. Rev. Lett. 63, 1195 (1989).
[13] Berg, B. A.Introduction to multicanonical monte carlo simulations.Fields Inst. Commun. 26, 1–24 (2000).
180
[14] Wang, F. & Landau, D.Determining the density of states for classical statisticalmodels: A random walk algorithm to produce a flathistogram.Phys. Rev. E 64, 056101 (2001).
[15] Torrie, G. & Valleau, J.Nonphysical sampling distributions in monte carlofree-energy estimation: Umbrella sampling.J. Comp. Phys. 23, 187–199 (1977).URL http://www.sciencedirect.com/science/article/pii/0021999177901218.
[16] Maris, H. J. & Kadanoff, L. P.Teaching the renormalization group.Am. J. Phys. 46, 652–657 (1978).
181
[17] Henkel, M., Hinrichsen, H. & Lübeck, S.Non-Equilibrium Phase Transitions Volume 1 – AbsorbingPhase Transitions (Springer Science & Business Media,2008).
[18] Ackley, D. H., Hinton, G. E. & Sejnowski, T. J.
A learning algorithm for Boltzmann machines.
Cogn. Sci. 9, 147–169 (1985).
[19] Broadbent, S. & Hammersley, J.
Percolation processes. I. Crystals and mazes.
Proceedings of the Cambridge Philosophical Society
(1957).
182
[20] Moreira, A. G. & Dickman, R.Critical dynamics of the contact process with quencheddisorder.Phys. Rev. E (1996).
[21] Sabag, M. M. S. & de Oliveira, M. J.Conserved contact process in one to five dimensions.Phys. Rev. E (2002).
183