Langevin-type sampling methods
Based on a summer literature review
Ziming Liu, School of Physics, Peking University
Advisor: Zheng Zhang, UCSB ECE
The big party

A map of the neighborhood where computer science, math, and physics meet: robustness, quantum field theory, statistical physics, quantum information, information theory, tensor networks, quantum dynamics, tensor analysis, partial differential equations, complexity, machine learning, Hamiltonian Monte Carlo, the renormalization group, the Ising model, matrix product states, tensor trains, tensorized neural networks, the Schrödinger equation, the Kohn-Sham equation, entanglement, and the classical-quantum correspondence.

The goal of this talk: Quantum-inspired Hamiltonian Monte Carlo.
Overview

1. Introduction to Bayesian models
2. Introduction to Langevin dynamics (1st, 2nd, 3rd order)
3. Hamiltonian Monte Carlo (HMC)
1. Introduction to Bayesian models
Maxwell-Boltzmann distribution

Description: for an isothermal single-particle system at temperature $T$, a state $x$ with energy $E(x)$ has probability density
$$p(x) \propto \exp\left(-\frac{E(x)}{k_B T}\right)$$

Theory:
• Maximum entropy principle for an isolated system
• Minimum free energy principle for an isothermal system (canonical ensemble)

Examples:
• Ideal gas: $E = \frac{q^2}{2m}$, so $p(q) \propto \exp\left(-\frac{q^2}{2m k_B T}\right)$
• Static isothermal atmosphere: $E = U(x)$, so $p(x) \propto \exp\left(-\frac{U(x)}{k_B T}\right)$
• Gas in a well: $E = U(x) + \frac{q^2}{2m}$, so $p(x, q) \propto \exp\left(-\frac{U(x) + q^2/2m}{k_B T}\right)$

This is the link between a pdf and an energy function.
Bayesian model

What?
$$p(\theta|D) \propto p(D|\theta)\,p(\theta)$$
posterior ∝ likelihood × prior, where $\theta$ denotes the model parameters.

Link to regression models: writing $p(\theta|D) = \exp(-U(\theta))$ gives
$$U(\theta) = -\log p(D|\theta) - \log p(\theta)$$
(regression error + regularization term), so
$$\theta^* = \arg\min_\theta U(\theta) = \arg\max_\theta p(\theta|D)$$
Maximum a posteriori (MAP) estimation: the global optimum.
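The MAP-as-regularized-regression link can be made concrete. A minimal sketch (setup assumed, not from the slides): a linear-Gaussian likelihood with a Gaussian prior makes $U(\theta)$ exactly the ridge-regression objective.

```python
import numpy as np

# Assumed model: y = X @ theta + noise, noise ~ N(0, sigma^2), prior
# theta ~ N(0, tau^2 I). Then, up to a constant, the negative log posterior is
#   U(theta) = ||y - X @ theta||^2 / (2 sigma^2) + ||theta||^2 / (2 tau^2),
# i.e. a data-fit term plus a regularizer (ridge regression).

def neg_log_posterior(theta, X, y, sigma2=1.0, tau2=1.0):
    fit = 0.5 * np.sum((y - X @ theta) ** 2) / sigma2   # -log likelihood
    reg = 0.5 * np.sum(theta ** 2) / tau2               # -log prior
    return fit + reg

# With sigma2 = tau2 = 1, the MAP estimate has the closed form
#   theta_map = (X^T X + I)^{-1} X^T y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=50)
theta_map = np.linalg.solve(X.T @ X + np.eye(3), X.T @ y)
```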
Bayesian model

Why?
$$\theta^* = \arg\min_\theta U(\theta) = \arg\max_\theta p(\theta|D)$$
MAP estimation targets only the global optimum.

Limitations of MAP:
1. No uncertainty quantification (it is a point estimate).
2. Risk of overfitting!

MAP vs. Bayesian: the full Bayesian approach keeps the whole posterior rather than a single point.
Markov Chain Monte Carlo (MCMC)

Idea: connect the Bayesian model to a Markov chain whose steady distribution is the posterior, $P_s(\theta) = P(\theta|D)$.

Given a steady distribution, how do we construct such a Markov chain? The steady-state condition is usually simplified to detailed balance on pairs of states. Note that detailed balance is sufficient but not necessary: a three-state chain with a constant circulation $1 \to 2 \to 3 \to 1$ has a steady distribution but no detailed balance.
Detailed balance

Two states with populations $N_1, N_2$ (or probabilities $p_1, p_2$) and transition rates $Q(1 \to 2) = Q(2 \to 1) = 1/2$: detailed balance equates the two flows, $p_1\,Q(1 \to 2) = p_2\,Q(2 \to 1)$.
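The two-box picture can be checked numerically. A small sketch (the target probabilities and the Metropolis-style acceptance are assumed choices, not from the slides): build the 2-state transition matrix and verify both detailed balance and stationarity.

```python
import numpy as np

# Assumed target p = (2/3, 1/3); symmetric proposal Q(1->2) = Q(2->1) = 1/2,
# with acceptance A = min(1, p_new / p_old) to correct the flows.
p = np.array([2 / 3, 1 / 3])
T = np.zeros((2, 2))
T[0, 1] = 0.5 * min(1, p[1] / p[0])   # propose state 2 from 1, accept sometimes
T[1, 0] = 0.5 * min(1, p[0] / p[1])   # propose state 1 from 2, always accept
T[0, 0] = 1 - T[0, 1]                 # rejected proposals stay put
T[1, 1] = 1 - T[1, 0]

db = np.isclose(p[0] * T[0, 1], p[1] * T[1, 0])  # detailed balance: flows match
stationary = np.allclose(p @ T, p)               # hence p is the steady state
```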
Metropolis-Hastings (MH)

The MH algorithm for sampling from a target distribution $p(x)$, using transition kernel $Q$, consists of the following steps:
• For $t = 1, 2, \dots$
  • Sample $y$ from $Q(y|x_t)$. Think of $y$ as a proposed value of $x_{t+1}$.
  • Compute the acceptance probability
$$A(x_t \to y) = \min\left(1, \frac{p(y)\,Q(x_t|y)}{p(x_t)\,Q(y|x_t)}\right)$$
  • With probability $A$ accept the proposed value and set $x_{t+1} = y$; otherwise set $x_{t+1} = x_t$.

Physical picture: read $p(x)$ as the number of particles at state $x$ and $Q(y|x)$ as the transition rate from $x$ to $y$. Then $p(x)Q(y|x)$ is the number of particles transiting from $x$ to $y$, and $p(x)Q(y|x)A(x \to y)$ is the number of accepted transits. The acceptance rule equates the accepted flows in both directions, which is exactly detailed balance.
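The steps above translate directly into code. A minimal sketch (setup assumed, not from the slides): a standard-normal target with a deliberately asymmetric proposal $Q(y|x) = N(0.5x, 1)$, so that the $Q$-ratio in the acceptance probability actually matters.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_p(x):            # log target density, N(0, 1) up to a constant
    return -0.5 * x * x

def log_q(y, x):         # log proposal density Q(y | x) = N(0.5 x, 1)
    return -0.5 * (y - 0.5 * x) ** 2

x = 0.0
samples = []
for _ in range(20000):
    y = 0.5 * x + rng.normal()                # sample y ~ Q(y | x)
    # log of p(y) Q(x|y) / (p(x) Q(y|x)); accept with prob min(1, exp(...))
    log_A = log_p(y) + log_q(x, y) - log_p(x) - log_q(y, x)
    if np.log(rng.random()) < log_A:
        x = y
    samples.append(x)
samples = np.array(samples[2000:])            # drop burn-in
```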
Metropolis algorithm

The Metropolis algorithm for sampling from a target distribution $p(x)$, transiting through a random walk, consists of the following steps:
• For $t = 1, 2, \dots$
  • Random-walk to $y$ from $x_t$. Think of $y$ as a proposed value of $x_{t+1}$.
  • Compute the acceptance probability
$$A(x_t \to y) = \min\left(1, \frac{p(y)}{p(x_t)}\right)$$
  • With probability $A$ accept the proposed value and set $x_{t+1} = y$; otherwise set $x_{t+1} = x_t$.

Comment: the random walk is symmetric, so $Q(y|x) = Q(x|y)$ and the proposal terms cancel. In thermodynamic models, $P(x) \sim \exp(-U(x)/T)$ (Boltzmann distribution).

Step size:
• large: low acceptance rate
• small: samples correlated, not independent
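The step-size tradeoff can be measured directly. A sketch assuming a standard-normal target: the acceptance rate of random-walk Metropolis at a tiny step size (accepted almost always, but the chain barely moves) versus a large one (most proposals rejected).

```python
import numpy as np

# Target N(0, 1): log p(y) - log p(x) = (x^2 - y^2) / 2.
def acceptance_rate(step, n=20000, seed=0):
    rng = np.random.default_rng(seed)
    x, accepted = 0.0, 0
    for _ in range(n):
        y = x + step * rng.normal()                      # random-walk proposal
        if np.log(rng.random()) < 0.5 * (x * x - y * y): # min(1, p(y)/p(x))
            x, accepted = y, accepted + 1
    return accepted / n

rate_small = acceptance_rate(0.1)    # near 1, but successive samples correlated
rate_large = acceptance_rate(10.0)   # low: proposals overshoot the target
```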
2. Introduction to Langevin dynamics
Zoo of Langevin dynamics

• Stochastic Gradient Langevin Dynamics (cite=718), 1st order, general: Welling, Max, and Yee W. Teh. "Bayesian learning via stochastic gradient Langevin dynamics." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
• Stochastic sampling using Fisher information (cite=207), 1st order, Gaussian: Ahn, Sungjin, Anoop Korattikara, and Max Welling. "Bayesian posterior sampling via stochastic gradient Fisher scoring." arXiv preprint arXiv:1206.6380 (2012).
• Stochastic Gradient Hamiltonian Monte Carlo (cite=300), 2nd order: Chen, Tianqi, Emily Fox, and Carlos Guestrin. "Stochastic gradient Hamiltonian Monte Carlo." International Conference on Machine Learning. 2014.
• Stochastic sampling using a Nosé-Hoover thermostat (cite=140), 3rd order: Ding, Nan, et al. "Bayesian sampling using stochastic gradient thermostats." Advances in Neural Information Processing Systems. 2014.
1st order Langevin dynamics

(also known as Brownian motion, driven by a Wiener process)
$$dx = -\nabla f(x)\,dt + \sqrt{2\beta^{-1}}\,dW(t), \qquad \rho_s \propto \exp(-\beta f(x))$$
where $f$ is the energy function (Bayesian view) or the loss function (optimization view).

The properties of the medium:
• a heat bath at temperature $T$;
• it hits the ball every $t_0$ (saving up for one big kick), transferring momentum $p \sim \exp\left(-\frac{p^2}{2T t_0}\right)$;
• overdamped (large friction coefficient $\gamma$), i.e. small relaxation time.

Physical intuition (take $f(x) = \mathrm{const}$):
① The ball gains a momentum $p$ from the fluctuating particles around it.
② It travels in the damping medium: $m\ddot{x} = -\gamma\dot{x}$, so
$$\dot{x} = \frac{p}{m}\exp\left(-\frac{\gamma}{m}t\right), \qquad x = \frac{p}{\gamma}\left(1 - \exp\left(-\frac{\gamma}{m}t\right)\right)$$
③ In the overdamped regime, $\exp(-\frac{\gamma}{m}t_0) \to 0$, so over each interval the total displacement is $\frac{p}{\gamma} \propto p$, a Gaussian kick: $dx \propto \sqrt{T}\,dW(t_0)$ (with $\beta = \frac{1}{T}$).
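The SDE above can be simulated with a simple Euler-Maruyama discretization. A sketch under assumed choices (not from the slides): $f(x) = x^2/2$ with $\beta = 1$, whose stationary law is $N(0, 1)$.

```python
import numpy as np

# Euler-Maruyama step for dx = -f'(x) dt + sqrt(2 / beta) dW,
# with f(x) = x^2/2 (so f'(x) = x) and beta = 1.
rng = np.random.default_rng(2)
h, beta = 0.01, 1.0
x, xs = 0.0, []
for _ in range(100000):
    x += -x * h + np.sqrt(2 * h / beta) * rng.normal()  # drift + thermal kick
    xs.append(x)
xs = np.array(xs[10000:])   # drop burn-in; remainder ~ N(0, 1)
```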
2nd order Langevin dynamics

$$d\vec{x} = \vec{p}\,dt$$
$$d\vec{p} = -\nabla f(\vec{x})\,dt - A\vec{p}\,dt + \sqrt{2AT}\,dW$$
(conservative force, damping force, thermal "force")

Invariant measure:
$$P_s(\vec{x}, \vec{p}) \propto \exp\left(-\left(\frac{p^2}{2} + f(\vec{x})\right)/T\right)$$
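The same discretization applies to the second-order system. A sketch with assumed choices (not from the slides): $f(x) = x^2/2$, $A = T = 1$, so under the invariant measure both $x$ and $p$ are standard normal.

```python
import numpy as np

# Euler-Maruyama for dx = p dt, dp = -f'(x) dt - A p dt + sqrt(2 A T) dW,
# with f(x) = x^2/2, A = 1, T = 1.
rng = np.random.default_rng(3)
h, A, T = 0.01, 1.0, 1.0
x, p, traj = 0.0, 0.0, []
for _ in range(200000):
    x += p * h                                                   # position update
    p += (-x - A * p) * h + np.sqrt(2 * A * T * h) * rng.normal()  # force + damping + noise
    traj.append((x, p))
xs, ps = np.array(traj[20000:]).T   # burn-in dropped; both ~ N(0, 1)
```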
Fokker-Planck equation for 2nd order LD

One-dimensional random walk (invariance principle): the density $\phi_x$ obeys
$$\frac{\partial \phi}{\partial t} = \frac{1}{2}\left(\phi_{x-1} + \phi_{x+1} - 2\phi_x\right) \approx \frac{1}{2}\frac{\partial^2 \phi}{\partial x^2}$$
so a random walk converges to a diffusion. By the same procedure, the stochastic dynamical equations translate into Fokker-Planck equations for the density $\phi_t$.
3rd order Langevin dynamics (special)

$$d\vec{\theta} = \vec{p}\,dt$$
$$d\vec{p} = -\nabla U(\vec{\theta})\,dt - \zeta\vec{p}\,dt + \sqrt{2AT}\,dW$$
$$d\zeta = \left(\frac{p^2}{n} - T_0\right)dt$$

The thermal term $\zeta$ acts as a thermostat. When $p$ rises, $\frac{p^2}{n} \sim T > T_0$, so $\zeta$ rises, which puts more friction on $p$ and brings $p$ back down: a negative feedback loop.
3rd order Langevin dynamics (general)

$$dq = M^{-1}p\,dt$$
$$dp = -\nabla U(q)\,dt + \sigma_F\sqrt{\Delta t}\,M^{1/2}dW - \zeta p\,dt + \sigma_A M^{1/2}dW$$
$$d\zeta = \frac{1}{\mu}\left(p^T M^{-1} p - N_d k_B T\right)dt - \gamma\zeta\,dt + \sqrt{2 k_B T \gamma}\,dW$$

Invariant measure: $\exp(-\beta U(q))\,\exp(-\beta p^T M^{-1} p/2)\,\exp(-\mu(\zeta - \hat{\gamma})^2/2)$, with $\hat{\gamma} = \beta(\sigma_F^2 + \sigma_A^2)/2$.
Slides from: https://ergodic.org.uk/~bl/Data/Slides/MD4.pdf
3rd order Langevin dynamics

$$dq = M^{-1}p\,dt$$
$$dp = -\nabla U(q)\,dt + \sigma_F\sqrt{\Delta t}\,M^{1/2}dW - \zeta p\,dt + \sigma_A M^{1/2}dW$$
$$d\zeta = \frac{1}{\mu}\left(p^T M^{-1} p - N_d k_B T\right)dt - \gamma\zeta\,dt + \sqrt{2 k_B T \gamma}\,dW$$

Building blocks:
• Hamiltonian dynamics preserves $\exp(-\beta U(q))\exp(-\beta p^T M^{-1}p/2)$.
• The thermostat couples $\exp(-\beta p^T M^{-1}p/2)$ with $\exp(-\mu\zeta^2/2)$.
• The OU process for $\zeta$ preserves $\exp(-\mu\zeta^2/2)$.
• The noise on $p$ shifts $\exp(-\mu\zeta^2/2)$ to $\exp(-\mu(\zeta - \hat{\gamma})^2/2)$.

Invariant measure: $\exp(-\beta U(q))\,\exp(-\beta p^T M^{-1} p/2)\,\exp(-\mu(\zeta - \hat{\gamma})^2/2)$, with $\hat{\gamma} = \beta(\sigma_F^2 + \sigma_A^2)/2$.
3. Hamiltonian Monte Carlo (HMC)

Neal, Radford M. "MCMC using Hamiltonian dynamics." Handbook of Markov Chain Monte Carlo 2.11 (2011): 2.
Betancourt, Michael. "A conceptual introduction to Hamiltonian Monte Carlo." arXiv preprint arXiv:1701.02434 (2017).
2nd order Langevin dynamics

$$d\vec{x} = \vec{p}\,dt$$
$$d\vec{p} = -\nabla f(\vec{x})\,dt - A\vec{p}\,dt + \sqrt{2AT}\,dW$$
(conservative force, damping force, thermal "force")

Invariant measure: $P_s(\vec{x}, \vec{p}) \propto \exp\left(-\left(\frac{p^2}{2} + f(\vec{x})\right)/T\right)$

Setting $A = 0$ removes the damping and thermal terms, leaving
$$d\vec{x} = \vec{p}\,dt, \qquad d\vec{p} = -\nabla f(\vec{x})\,dt$$
Hamiltonian dynamics

Hamiltonian:
$$H(x, p) = \frac{1}{2}p^T M^{-1} p + U(x)$$
(kinetic energy + potential energy)

Hamiltonian equations:
$$d\vec{x} = M^{-1}\vec{p}\,dt \quad \text{(definition of momentum)}$$
$$d\vec{p} = -\nabla U(\vec{x})\,dt \quad \text{(momentum theorem, with conservative force } -\nabla U(\vec{x})\text{)}$$

Energy conservation:
$$dH = p^T M^{-1}dp + dU(x) = -dx^T\nabla U(x) + dU(x) = 0$$
Steady distribution

Hamiltonian equations (position $x$, momentum $q$):
$$d\vec{x} = M^{-1}\vec{q}\,dt, \qquad d\vec{q} = -\nabla U(\vec{x})\,dt$$

Steady distribution:
$$p_s(x, q) \propto \exp\left(-U(\vec{x}) - \frac{1}{2}q^T M^{-1} q\right), \qquad p_s(x) \propto \exp(-U(\vec{x})) = p(x|D)$$

Fokker-Planck equation:
$$\partial_t p + \left(\frac{\partial p}{\partial x}\right)^T \frac{\partial H}{\partial q} - \left(\frac{\partial p}{\partial q}\right)^T \frac{\partial H}{\partial x} = 0$$
(also known as Liouville's theorem in physics)
Example: 1d spring-mass system

$$E = \frac{1}{2}kx^2 + \frac{q^2}{2m}$$
$$x = A\sin(\omega t + \phi_0), \qquad q = m\omega A\cos(\omega t + \phi_0), \qquad \omega = \sqrt{\frac{k}{m}}$$

The trajectory stays on a single energy level: no ergodicity?
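The analytic solution above makes the ergodicity problem explicit. A quick check with assumed constants $m = k = 1$ (so $\omega = 1$), $A = 1$, $\phi_0 = 0$: the energy is exactly constant along the trajectory, so the flow never leaves its energy shell.

```python
import numpy as np

# x(t) = sin t, q(t) = cos t solves the spring-mass system with m = k = 1;
# E = x^2/2 + q^2/2 = 1/2 at every time: one energy level, forever.
t = np.linspace(0.0, 20.0, 2001)
x, q = np.sin(t), np.cos(t)
E = 0.5 * x ** 2 + 0.5 * q ** 2
```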
Example: 1d spring-mass system interacting with a heat bath

① Momentum resampling ($m = k_B T = 1$): draw $q \sim N(0, 1)$
② Travel on an energy level for a certain time ($L$ steps)

Alternating ① and ② hops the system between energy levels. The momenta follow the Maxwell-Boltzmann distribution $p(q) \propto \exp\left(-\frac{q^2}{2mk_BT}\right)$, and the ensemble average agrees with the time average.
2nd LD & HMC

Second-order Langevin dynamics scatters continuously: small momentum kicks at every instant. HMC scatters discretely (saving up for one big kick): a full momentum resampling ①, then a long Hamiltonian trajectory ② along an energy level.
Algorithm

1. Momentum resampling
2. Hamiltonian dynamics (leapfrog scheme)
3. Metropolis-Hastings accept/reject
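The three steps assemble into a complete sampler. A minimal HMC sketch for an assumed target $U(x) = x^2/2$ (a standard normal, unit mass), not the algorithm of any particular library:

```python
import numpy as np

rng = np.random.default_rng(4)

def grad_U(x):                              # U(x) = x^2 / 2
    return x

def hmc_step(x, eps=0.1, L=20):
    q = rng.normal()                        # 1. momentum resampling, q ~ N(0, 1)
    x_new, q_new = x, q
    q_new -= 0.5 * eps * grad_U(x_new)      # 2. leapfrog: initial half kick
    for _ in range(L - 1):
        x_new += eps * q_new                #    full drift
        q_new -= eps * grad_U(x_new)        #    full kick
    x_new += eps * q_new
    q_new -= 0.5 * eps * grad_U(x_new)      #    final half kick
    # 3. Metropolis-Hastings on H = U(x) + q^2 / 2
    dH = (0.5 * x_new ** 2 + 0.5 * q_new ** 2) - (0.5 * x ** 2 + 0.5 * q ** 2)
    return x_new if np.log(rng.random()) < -dH else x

x, samples = 0.0, []
for _ in range(5000):
    x = hmc_step(x)
    samples.append(x)
samples = np.array(samples[500:])           # should be ~ N(0, 1)
```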
Euler vs leap frog

Euler's method:
$$q(t+\epsilon) = q(t) - \epsilon\,\frac{\partial U}{\partial x}(x(t))$$
$$x(t+\epsilon) = x(t) + \epsilon\,\frac{q(t)}{m}$$

Leapfrog method:
$$q\left(t+\tfrac{\epsilon}{2}\right) = q(t) - \frac{\epsilon}{2}\frac{\partial U}{\partial x}(x(t))$$
$$x(t+\epsilon) = x(t) + \epsilon\,\frac{q(t+\epsilon/2)}{m}$$
$$q(t+\epsilon) = q\left(t+\tfrac{\epsilon}{2}\right) - \frac{\epsilon}{2}\frac{\partial U}{\partial x}(x(t+\epsilon))$$

E.g. for $U(x) = \frac{1}{2}kx^2$, Euler's update is
$$\begin{pmatrix} x(t+\epsilon) \\ q(t+\epsilon) \end{pmatrix} = \begin{pmatrix} 1 & \epsilon/m \\ -k\epsilon & 1 \end{pmatrix}\begin{pmatrix} x(t) \\ q(t) \end{pmatrix}$$
with $\det > 1$: not preserving volume! The leapfrog update is
$$\begin{pmatrix} x(t+\epsilon) \\ q(t+\epsilon) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -k\epsilon/2 & 1 \end{pmatrix}\begin{pmatrix} 1 & \epsilon/m \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -k\epsilon/2 & 1 \end{pmatrix}\begin{pmatrix} x(t) \\ q(t) \end{pmatrix}$$
with $\det = 1$: preserving volume!
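The determinant claims can be verified numerically. A sketch with assumed constants $k = m = 1$, $\epsilon = 0.1$: build both update matrices, compare determinants, and iterate each map to watch Euler's energy blow up while leapfrog stays bounded.

```python
import numpy as np

eps = 0.1
euler = np.array([[1.0, eps],
                  [-eps, 1.0]])                     # Euler step on (x, q)
half_kick = np.array([[1.0, 0.0], [-eps / 2, 1.0]]) # q -= (eps/2) * x
drift = np.array([[1.0, eps], [0.0, 1.0]])          # x += eps * q
leap = half_kick @ drift @ half_kick                # leapfrog: kick-drift-kick

det_euler = np.linalg.det(euler)   # 1 + eps^2 > 1: phase-space volume grows
det_leap = np.linalg.det(leap)     # exactly 1: volume preserved

v = np.array([1.0, 0.0])           # iterate both maps from the same state
w = v.copy()
for _ in range(1000):
    v = euler @ v
    w = leap @ w
E_euler = 0.5 * v @ v              # diverges: grows like (1 + eps^2)^n
E_leap = 0.5 * w @ w               # stays near the initial energy 1/2
```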
Euler vs leap frog

Euler's method diverges; leapfrog stays stable.
MCMC & HMC

Random-walk MCMC explores the landscape $U(x)$ through position $x$ alone, with $P(x) \sim \exp(-U(x)/T)$. HMC augments position $x$ with momentum $q$ and targets $P(x, q) \sim \exp(-(U(x) + K(q))/T)$. Because Hamiltonian dynamics conserves energy, proposals are always accepted (in the limit of step size $\to 0$).
MCMC & HMC

Neal, Radford M. "MCMC using Hamiltonian dynamics." Handbook of Markov Chain Monte Carlo 2.11 (2011): 2.
HMC limitations

① Ill-conditioned distributions: need different masses in different directions
② Multimodal distributions: hard to escape from one mode
③ Discontinuous distributions: large energy gap, low acceptance rate
④ Spiky distributions: large gradients, low acceptance rate
⑤ Large training datasets: expensive gradient computation
HMC variants

• Riemannian HMC: Girolami, Mark, Ben Calderhead, and Siu A. Chin. "Riemannian manifold Hamiltonian Monte Carlo." arXiv preprint arXiv:0907.1100 (2009).
• Magnetic HMC: Tripuraneni, Nilesh, et al. "Magnetic Hamiltonian Monte Carlo." Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.
• Wormhole HMC: Lan, Shiwei, Jeffrey Streets, and Babak Shahbaba. "Wormhole Hamiltonian Monte Carlo." Twenty-Eighth AAAI Conference on Artificial Intelligence. 2014.
• Continuously tempered HMC: Graham, Matthew M., and Amos J. Storkey. "Continuously tempered Hamiltonian Monte Carlo." arXiv preprint arXiv:1704.03338 (2017).
• Stochastic Gradient HMC: Chen, Tianqi, Emily Fox, and Carlos Guestrin. "Stochastic gradient Hamiltonian Monte Carlo." International Conference on Machine Learning. 2014.
• Stochastic Gradient Thermostat: Ding, Nan, et al. "Bayesian sampling using stochastic gradient thermostats." Advances in Neural Information Processing Systems. 2014.
• Relativistic Monte Carlo: Lu, Xiaoyu, et al. "Relativistic Monte Carlo." arXiv preprint arXiv:1609.04388 (2016).
• Optics HMC: Afshar, Hadi Mohasel, and Justin Domke. "Reflection, refraction, and Hamiltonian Monte Carlo." Advances in Neural Information Processing Systems. 2015.