
Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining

Markos Katsoulakis

University of Massachusetts & University of Crete

Funding: NSF-DMS, NSF-CMMI, U.S. DOE and E.C. FP7


Overview

1 Stochastic Lattice Systems & Applications

2 Kinetic Monte Carlo (KMC) methods

3 Coarse Graining (CG)

4 Hierarchical Parallel Algorithms

5 Benchmarks, examples and simulations


Stochastic Lattice Systems

Surface processes

Provide information for pattern formation, chemical reactions, phase transitions

$\Lambda_N = \frac{1}{N}\,\mathbb{Z}^d \cap [0, 1)^d$

Lattice size $N \gg 1$

Configurations $\sigma \in \Sigma_N := I^{\Lambda_N}$

$I = \{0, 1\}$ or $I = \{-1, 1\}$


Equilibrium Theory

Hamiltonian: $H_N(\sigma) = -\frac{1}{2}\sum_{x \neq y} J(x, y)\,\sigma(x)\sigma(y) + h \sum_x \sigma(x)$

- h: external field

- J: potential with interaction range L; $V : \mathbb{R} \to \mathbb{R}$ has compact support,
  $J(x - y) = \frac{1}{L}\, V\!\left(\frac{x - y}{L}\right)$.

Nearest-neighbor models (as truncations) and possibly combinations of short-/long-range interactions.

Potentials fitted to Molecular Dynamics simulations or data, e.g. Morse potentials.


Gibbs States

At the inverse temperature $\beta = \frac{1}{kT}$:

$\mu_{\Lambda,\beta}(\sigma = \sigma_0) = \frac{1}{Z_{\Lambda,\beta}}\, \exp\{-\beta H_N(\sigma_0)\}\, P_N(\sigma = \sigma_0)$

[Probability of the configuration $\sigma_0$]

Partition function: $Z_{\Lambda,\beta} = \sum_{\sigma_0} \exp\{-\beta H_N(\sigma_0)\}\, P_N(\sigma = \sigma_0)$

Prior distribution (no interactions, high temperature):

$P_N(\sigma = \sigma_0) = \prod_{x \in \Lambda} P(\sigma(x) = \sigma_0(x))$
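For intuition, the Gibbs measure above can be tabulated by brute force on a very small lattice, which is also a handy check for any sampler. The Python sketch below is an illustration (not code from the talk), assuming a flat single-site prior P_N so that it cancels in the normalization; the potential J is any callable.

import math
from itertools import product

def gibbs_weights(N, J, h, beta):
    """Normalized Gibbs weights mu(sigma) ~ exp(-beta*H_N(sigma)) for all 2^N
    configurations of a small 1-D lattice (brute force: use only for small N).

    H_N(sigma) = -1/2 * sum_{x != y} J(x - y) sigma(x) sigma(y) + h * sum_x sigma(x).
    A flat prior P_N is assumed, so it cancels in the normalization.
    """
    weights = {}
    for sigma in product((0, 1), repeat=N):
        H = h * sum(sigma)
        H -= 0.5 * sum(J(x - y) * sigma[x] * sigma[y]
                       for x in range(N) for y in range(N) if x != y)
        weights[sigma] = math.exp(-beta * H)
    Z = sum(weights.values())                  # partition function Z_{Lambda, beta}
    return {sigma: w / Z for sigma, w in weights.items()}

For example, gibbs_weights(4, lambda r: 1.0 if abs(r) == 1 else 0.0, h=0.0, beta=1.0) enumerates the 16 configurations of a 4-site nearest-neighbor chain.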


Kinetic Monte Carlo (KMC)

Dynamics

Adsorption/Desorption/Reactions/Surface diffusion

Markov Chain modeling with state space Σ = all configurations σ

Generator: $\partial_t\, \mathbb{E}[f(\sigma)] = \mathbb{E}\Big[\underbrace{\textstyle\sum_{x \in \Lambda} c(x, \sigma)\,[f(\sigma^x) - f(\sigma)]}_{L_N f(\sigma)}\Big]$.

Multi-site updates σx for most systems, e.g.

Suchorski et al ChemPhysChem (2010)
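To make the generator concrete, here is a minimal continuous-time KMC (SSA) step in Python, an illustration rather than code from the talk: draw an exponential residence time from the total rate λ(σ) and pick the site to update with probability proportional to its rate. The rate list and the single-site flip are placeholders for whichever dynamics c(x, σ) is being simulated.

import math
import random

def kmc_step(sigma, rates, rng=random):
    """One continuous-time KMC step.

    sigma : list of spins (0/1), one per lattice site
    rates : list of per-site rates c(x, sigma), precomputed by the caller
            (assumes at least one positive rate)
    Returns the updated configuration and the elapsed (exponential) time.
    """
    total = sum(rates)                       # lambda(sigma): total jump rate
    dt = -math.log(rng.random()) / total     # exponential residence time
    # pick a site x with probability c(x, sigma) / lambda(sigma)
    u, acc, x = rng.random() * total, 0.0, 0
    for x, r in enumerate(rates):
        acc += r
        if u <= acc:
            break
    sigma = list(sigma)
    sigma[x] = 1 - sigma[x]                  # single-site update sigma -> sigma^x
    return sigma, dt

For example, kmc_step([0, 1, 0, 1], [1.0, 0.5, 1.0, 0.5]) flips one site and returns the exponential waiting time.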



Kinetic Monte Carlo (KMC)

[Schematic: continuous-time Markov chain. From the present state x, the transition probability p(x, y) to each possible future state y, z, w does not depend on the past states x_1, ..., x_{k-1}; the residence time τ_x is exponentially distributed with rate λ(x).]


Kinetic Monte Carlo: Arrhenius dynamics

Transition rate to the gas phase: $c(x, \sigma) \sim d_0 \exp\big[-\beta U(x, \sigma)\big]$

Energy barrier: $U(x, \sigma) = \sum_{z \neq x} J(x - z)\,\sigma(z) - h$.

- Exponential clock: for each configuration $\sigma$,
  $\lambda(\sigma) = d_1 \big(N - \sum_x \sigma(x)\big) + \sum_x d_0\, \sigma(x)\, e^{-\beta U(x, \sigma)}$.

- Transition rates $\sigma \mapsto \sigma' = \sigma^x$:
  $c(x, \sigma) = \lambda(\sigma)\, p(\sigma, \sigma^x) = d_1 (1 - \sigma(x)) + d_0\, \sigma(x)\, e^{-\beta U(x, \sigma)}$
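A small sketch of these Arrhenius rates for a 1-D periodic lattice follows (again an illustration, not from the talk); the returned rates can be fed directly to a KMC step such as the one sketched earlier. The potential is passed as a list J[r] of couplings at distance r, assumed to have finite range shorter than half the lattice.

import math

def arrhenius_rates(sigma, J, h, beta, d0, d1):
    """Per-site adsorption/desorption rates on a 1-D periodic lattice.

    sigma : occupation numbers (0/1)
    J     : list of couplings, J[r-1] = interaction at distance r = 1..len(J),
            with len(J) < len(sigma) / 2 assumed
    Returns c(x, sigma) = d1*(1 - sigma(x)) + d0*sigma(x)*exp(-beta*U(x, sigma)).
    """
    N = len(sigma)
    rates = []
    for x in range(N):
        # energy barrier U(x, sigma) = sum_{z != x} J(x - z) sigma(z) - h
        U = -h
        for r, Jr in enumerate(J, start=1):
            U += Jr * (sigma[(x + r) % N] + sigma[(x - r) % N])
        rates.append(d1 * (1 - sigma[x]) + d0 * sigma[x] * math.exp(-beta * U))
    return rates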



Kinetic Monte Carlo: Arrhenius dynamics

References: Gillespie (chemical reactions); Bortz, Kalos, Lebowitz (Ising-type systems). The pseudo-algorithm suggests:

divide lattice sites x into classes of equal rates

pick a class using the relative weights

pick from each class a site x uniformly and update the configuration

However: for complex interactions (e.g. long-range),

$U(x, \sigma) = \sum_{z \neq x} J(x - z)\,\sigma(z) - h$

yields a very large number of classes, making the algorithm impractical.
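The class bookkeeping behind the BKL (n-fold way) idea can be sketched in a few lines: group sites by (discretized) rate, pick a class with weight proportional to rate times class size, then pick a member uniformly. For nearest-neighbor models only a handful of local environments occur, so the dictionary below stays small; with long-range J the rates vary almost continuously and the number of classes approaches the number of sites, which is the difficulty noted above. The rounding used to define a class is an illustrative choice, not part of the talk.

import random
from collections import defaultdict

def nfold_way_pick(rates, ndigits=12, rng=random):
    """Pick a site the n-fold-way: classes of (approximately) equal rate."""
    classes = defaultdict(list)            # rate value -> sites with that rate
    for x, r in enumerate(rates):
        if r > 0.0:
            classes[round(r, ndigits)].append(x)
    # choose a class with weight (rate * number of members)
    weights = {r: r * len(sites) for r, sites in classes.items()}
    u, acc = rng.random() * sum(weights.values()), 0.0
    for r, w in weights.items():
        acc += w
        if u <= acc:
            return rng.choice(classes[r])  # uniform pick inside the class
    raise ValueError("all rates are zero")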


Towards accelerating molecular simulations

Bottlenecks in molecular simulation of extended systems.

Cannot simulate realistic spatio-temporal scales:

$1\,\mu\mathrm{m}^2 \approx 10{,}000^2$ lattice

Difficult to carry out ”systems tasks” for engineering applications:

sensitivity analysis, optimization, control



Coarse-Graining: from microscopics to mesoscopics

Spatial acceleration methods:

[Figure: microscopic lattice coarsened to a coarse lattice on a non-uniform mesh; probability distribution of the average lattice coverage at t = 10 s, comparing KMC, multiscale CGMC, CGMC-MF and CGMC-QC. Refs: (1) Chatterjee et al., JCP 121, 11420 (2004); PRE 71, 026702 (2005); (2) Chatterjee and Vlachos, JCP 124, 064110 (2006).]

• Spatial adaptivity (1): error estimates guide mesh refinement
• Multiscale MC methods for high accuracy (2): higher-order closures, multigrid
• Multicomponent interacting systems

$c(x, t) \approx$ local average $v_N(x, t) = \frac{1}{|B_x|} \sum_{y \in B_x} \sigma_t(y)$, as $N \to \infty$.

"Closure": when does c = c(x, t) solve a PDE / stochastic PDE?

E.g. local mean-field limits; connections to Cahn-Hilliard (S)PDE for attractive interactions (J > 0).
Lebowitz, Orlandi, Presutti, JSP '91; Giacomin, Lebowitz, Phys. Rev. Lett. '96; K., Vlachos, Phys. Rev. Lett. '00; J. Chem. Phys. '03.
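The local average entering the closure question is just a windowed mean of occupation numbers; a one-function sketch for a 1-D periodic lattice follows (the neighborhood B_x and its half-width are illustrative choices, not from the talk).

def local_average(sigma, x, half_width):
    """Local empirical average v_N(x) = (1/|B_x|) * sum_{y in B_x} sigma(y)
    over the periodic neighborhood B_x = {x - half_width, ..., x + half_width}."""
    N = len(sigma)
    window = [sigma[(x + r) % N] for r in range(-half_width, half_width + 1)]
    return sum(window) / len(window)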



Hierarchical Coarse-Graining

1. Coarse-graining of polymers; DPD methods

Briels et al., J. Chem. Phys. '01; Doi et al., J. Chem. Phys. '02; Kremer et al., Macromolecules '06; Muller-Plathe, ChemPhysChem '00; Laaksonen et al., Soft Matter '03, etc.

Recent related work on simulating bio-membranes: Deserno et al., Nature '07.

2. Stochastic lattice dynamics / KMC

K., Majda, Vlachos, PNAS '03; K., Plechac, Sopasakis, SIAM Num. Anal. '06; Are, K., Plechac, Rey-Bellet, SIAM J. Sci. Comp. '08; Sinno et al., J. Chem. Phys. '08.



Coarse Graining in Lattice Systems

Divide the lattice of size N into M cells with q particles in each cell.

[Schematic: microscopic lattice partitioned into coarse cells of q sites each; adsorption, desorption and diffusion events; block spin $\eta(k) = \sum_{x \in C_k} \sigma(x)$.]

Coarse map:

$T : \Sigma_N \to \Sigma_M, \qquad \sigma \mapsto \eta := \Big\{\eta(k) = \sum_{x \in C_k} \sigma(x)\Big\}$

Renormalization Group map:

$H(\eta) = -\frac{1}{\beta} \log \int \exp\{-\beta H_N(\sigma)\}\, P(d\sigma|\eta)$
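The coarse map T is simply a block sum over the cells C_k; a minimal sketch for a 1-D lattice of N = Mq sites (illustrative, not code from the talk):

def coarse_map(sigma, q):
    """Block-spin map T: Sigma_N -> Sigma_M with cells C_k of q sites each.

    sigma : microscopic occupation numbers (0/1); len(sigma) must be M*q
    Returns eta with eta[k] = sum of sigma over the k-th cell (0 <= eta[k] <= q).
    """
    assert len(sigma) % q == 0, "lattice size must be a multiple of the cell size q"
    return [sum(sigma[k * q:(k + 1) * q]) for k in range(len(sigma) // q)]

# example: N = 8, q = 4 -> M = 2 coarse cells
# coarse_map([1, 0, 1, 1, 0, 0, 1, 0], 4) == [3, 1]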



1-D example: n.n. Ising model

Approximation of the RG map: H(η) is approximated by a computable $H^{(0)}(\eta)$:

$H_N(\sigma) = \sum_k H_k(\sigma) + \sum_k W_{k,k+1}(\sigma)$

$H_k$: energy of cell k with free boundary conditions; $W_{k,k+1}$: short-range interactions between cell k and cell k+1.

$e^{-\beta H_N}\, P_N(d\sigma|\eta) = \prod_{k\ \mathrm{odd}} \Big[ e^{-\beta(W_{k-1,k} + W_{k,k+1})}\, e^{-\beta H_k}\, P_k(d\sigma_k|\eta(k)) \Big] \times \prod_{k\ \mathrm{even}} e^{-\beta H_k}\, P_k(d\sigma_k|\eta(k))$

1D Operator Splitting Algorithm (OpSpl):

1. Apply SSA on white cells in parallel until the synchronization time
2. Apply SSA on black cells in parallel until the synchronization time
3. Go to 1
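The OpSpl schedule reduces to a checkerboard loop: freeze one color of cells, run SSA on the other color up to the synchronization time, swap, repeat. The sketch below shows only that scheduling skeleton; ssa_run stands in for any serial KMC/SSA kernel restricted to the active sites and is assumed, not provided here.

def operator_splitting(sigma, cells, ssa_run, t_sync, n_cycles):
    """1-D OpSpl schedule: alternate SSA sweeps over 'white' and 'black' cells.

    cells   : list of cells, each a list of site indices (cell k is 'white' if k is even)
    ssa_run : callable (sigma, active_sites, t_sync) -> sigma, advancing the
              configuration on active_sites over a time window t_sync
    """
    white = [x for k, cell in enumerate(cells) if k % 2 == 0 for x in cell]
    black = [x for k, cell in enumerate(cells) if k % 2 == 1 for x in cell]
    for _ in range(n_cycles):
        sigma = ssa_run(sigma, white, t_sync)   # step 1: white cells in parallel
        sigma = ssa_run(sigma, black, t_sync)   # step 2: black cells in parallel
    return sigma                                # step 3: go to 1 (next cycle)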


A simple example

- When the $W_{k,k+1}$ are disregarded (e.g. at high temperatures), there are intra-cell interactions but no CG cell correlations:

$H^{(0)}_m(\eta) = \sum_k U^{(0)}_k(\eta_k) = -\sum_k \frac{1}{\beta} \log \int e^{-\beta H_k(\sigma)}\, P_k(d\sigma_k|\eta(k))$

Sampling over a single coarse cell with free boundary conditions. Inverse Monte Carlo method: Laaksonen et al., Soft Matter '03.


Multi-body terms in Coarse Graining: K., Plechac, Rey-Bellet, Tsagkarogiannis, ESAIM M2AN '07; preprint '10.



Coarse Graining - Approximation heuristics

• CG Hamiltonian - Renormalization Group map ($N = mq$):

$H(\eta) = -\frac{1}{\beta} \log \int \exp\{-\beta H_N(\sigma)\}\, P(d\sigma|\eta)$

• Correction terms around a first "good guess" $H^{(0)}_m$:

$H_m(\eta) = H^{(0)}_m(\eta) - \frac{1}{\beta} \log \mathbb{E}\big[\exp\big(-\beta (H_N - H^{(0)}_m)\big)\,\big|\,\eta\big], \quad m = N, N-1, \dots$

• Expansion of $\exp(\beta \Delta H)$ and of the log:

$= \mathbb{E}[\Delta H|\eta] + \mathbb{E}[(\Delta H)^2|\eta] - \mathbb{E}[\Delta H|\eta]^2 + O\big((\Delta H)^3\big)$

Formal calculations are inadequate, since

$\Delta H \equiv H_N - H^{(0)}_m = N \cdot O(\epsilon)$

• the role of fluctuations and extensivity.

• Rigorous analysis - cluster expansion around $H^{(0)}_m$:
K., Plechac, Rey-Bellet, Tsagkarogiannis, ESAIM M2AN '07



Coarse Graining, Long-range Interactions

- Very costly with KMC, but easy to CG:

$H_N(\sigma) = -\frac{1}{2} \sum_{x \in \Lambda_N} \sum_{y \neq x} J(x - y)\,\sigma(x)\sigma(y) + h \sum_x \sigma(x)$

$J(x - y) = \frac{1}{L^d}\, V\!\left(\frac{|x - y|}{L}\right), \qquad x, y \in \Lambda_N$

$J_{k,l} = \frac{1}{q^2} \sum_{x \in C_k} \sum_{y \in C_l} J(x - y)$


$H^{(0)}(\eta) = -\frac{1}{2} \sum_k \sum_{l \neq k} J_{k,l}\,\eta(k)\eta(l) - \frac{1}{2} \sum_k J_{k,k}\,\big(\eta(k) - q\big) + h \sum_k \eta(k)$
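Computing the compressed couplings is a one-time preprocessing step; a small 1-D sketch of J_{k,l} = (1/q²) Σ_{x∈C_k} Σ_{y∈C_l} J(x − y) follows (illustrative only; the Kac-type potential in the example is an assumption, not taken from the talk).

def coarse_couplings(M, q, J):
    """Coarse couplings Jbar[k][l] = (1/q^2) * sum_{x in C_k, y in C_l, x != y} J(x - y).

    M : number of coarse cells, q : sites per cell, J : callable J(r)
    Cell C_k contains the microscopic sites k*q, ..., (k+1)*q - 1.
    """
    Jbar = [[0.0] * M for _ in range(M)]
    for k in range(M):
        for l in range(M):
            s = 0.0
            for x in range(k * q, (k + 1) * q):
                for y in range(l * q, (l + 1) * q):
                    if x != y:
                        s += J(x - y)
            Jbar[k][l] = s / q ** 2
    return Jbar

# example: Kac-type potential J(r) = (1/L) * V(|r|/L) with V = indicator of [0, 1]
# L = 10
# Jbar = coarse_couplings(M=4, q=5, J=lambda r: (1.0 / L) if abs(r) <= L else 0.0)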



Multi-body terms in Coarse Graining

Corrections to $H^{(0)}$: $H_m(\eta) = H^{(0)}_m(\eta) + H^{(1)}_m(\eta) + \dots$

$H^{(1)}(\eta) = \beta \sum_{k_1} \sum_{k_2 > k_1} \sum_{k_3 > k_2} \Big[ j^2_{k_1 k_2 k_3}\big(-E_1(k_1) E_2(k_2) E_1(k_3)\big) + \dots$

• "Moments" of the interaction potential J:

$j^2_{k_1 k_2 k_3} = \sum_{x \in C_{k_1}} \sum_{y \in C_{k_2}} \sum_{z \in C_{k_3}} \big(J(x - y) - J(k_1, k_2)\big)\big(J(y - z) - J(k_2, k_3)\big)$

Typically omitted, but essential to capture phase transitions and hysteresis.

[Figure: coverage vs. external field h for N = 1024, q = 8, βJ_0 = 5.0, comparing microscopic MC with q = 8 coarse-graining with corrections & potential splitting, with corrections & no potential splitting, and without corrections; inset: the interaction potential J(r).]



Loss of Information & Coarse Graining

Relative entropy: $R(\mu|\nu) = \int \log\left(\frac{d\mu}{d\nu}\right) d\mu$

Theorem [Error estimates]:

1. For $\epsilon = \beta\, \frac{q}{L}\, \|\nabla J\|_1$,

$\frac{1}{N} R(\mu^{(p)}|\mu) = O(\epsilon^{p+2})$

2. Cluster expansions → an a posteriori expansion for the relative entropy:

$\frac{1}{N} R(\mu^{(p)}|\mu) = \mathbb{E}_{\mu^{(0)}}[I(\eta)] + \log\Big(\mathbb{E}_{\mu^{(0)}}\big[e^{-I(\eta)}\big]\Big) + O(\epsilon^3)$

The error indicator $I(\cdot)$ is given by the terms $H^{(1)}$, $H^{(2)}$ and depends only on the coarse variable $\eta \sim \mu^{(0)}$.



Parallel KMC Simulation in Lattice Systems

Markovian Dynamics: Adsorption/Desorption/Reaction/Diffusion

Generator: $\partial_t\, \mathbb{E}[f(\sigma)] = \mathbb{E} \sum_{x \in \Lambda} c(x, \sigma)\,[f(\sigma^x) - f(\sigma)]$.



Parallel KMC Simulation in Lattice Systems

Lubachevsky, JCP ’88, Korniss, Novotny, Rikvold JCP ’01,...

Main idea in geometric parallelization: break up the lattice into smaller sub-lattices.

Run KMC on each sub-lattice on separate processors and communicate across boundaries.

However: asynchronous updates at neighboring sites across processes in standard CTMC implementations.

[Embedded tables and figure caption from the parallel n-fold way reference above:]

Table I. Mean time increments Δt (in MCSP) for the parallel n-fold way algorithm with block size l, and for the serial algorithm, at T = 0.7 Tc, |H|/J = 0.2857. They are approximately independent of the full system size L and of NPE (the serial value is approximately independent of L).

  l:   16    32    64    128   256   512   1024   serial
  Δt:  3.7   6.1   9.2   12.6  15.4  17.4  18.5   19.9

Table II. Mean time increments (in MCSP) for the serial and parallel n-fold way algorithms for different temperatures and magnetic fields (NPE = 64, l = 128).

  |H|/J:                0.1587  0.2222  0.2857  0.3492  0.4127
  T/Tc = 0.6  serial      -      81.5    61.4    46.4    36.3
              parallel    -      23.4    21.4    19.3    17.4
  T/Tc = 0.7  serial     33.8    25.4    19.9    16.5    14.3
              parallel   16.8    14.5    12.6    11.1    10.1
  T/Tc = 0.8  serial     12.5    10.4     9.2     8.5     7.9
              parallel    9.2     8.0     7.4     6.9     6.5

Fig. 1. Schematic of the spatial decomposition of the system and its mapping onto a parallel machine: L = 12, l = 4, NPE = (L/l)² = 9 processing elements, each carrying l² = 16 spins; boundary spins are separated from the kernel spins.


Uniformization and Parallel Simulation in Lattice Systems

One solution is "uniformization": the same process in distribution, but we pick a clock with uniform rate λ* such that

$\max_x \lambda(x) \le \lambda^*$,

and a new skeleton process

$p^*(x, y) = \begin{cases} 1 - \dfrac{\lambda(x)}{\lambda^*} & \text{if } x = y \\[4pt] \dfrac{\lambda(x)}{\lambda^*}\, p(x, y) & \text{if } x \neq y \end{cases}$

$p^*(x, y)$ introduces rejections in the algorithm.

asynchronous algorithms, unless a boundary event occurs
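One step of the uniformized chain can be sketched as Poisson thinning: jump times come from a single clock of rate λ*, and a proposed move from state x is accepted with probability λ(x)/λ*, otherwise the step is a rejection (self-loop). The rate and proposal callables below are placeholders introduced for illustration, not part of the talk.

import math
import random

def uniformized_step(x, lam, propose, lam_star, rng=random):
    """One step of the uniformized (constant-rate) skeleton chain.

    lam      : callable, state-dependent total rate lambda(x), with lam(x) <= lam_star
    propose  : callable, samples y from p(x, .) of the original skeleton chain
    lam_star : uniform rate bounding max_x lambda(x)
    Returns (new_state, dt, accepted); rejected steps keep the state (self-loop).
    """
    dt = -math.log(rng.random()) / lam_star      # exponential clock of uniform rate
    if rng.random() < lam(x) / lam_star:         # accept with prob lambda(x)/lam_star
        return propose(x), dt, True
    return x, dt, False                          # rejection: p*(x, x) = 1 - lambda(x)/lam_star

A loose bound λ* makes λ(x)/λ* small, so most steps are rejections, which is the efficiency issue raised on the next slide.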


Parallel Simulation in Lattice Systems

However,

we still have excessive communication between processors in the case of complex interactions: communication (boundary) regions can be "wide", in contrast to the n.n. case:


many rejections for poor upper bounds $\lambda^* \ge \max_x \lambda(x)$.

Variants: partially rejection-free methods



Synchronous algorithms - Exact Simulation

[Excerpt from G. Nandipati et al., J. Phys.: Condens. Matter 21 (2009) 084214, describing the optimistic synchronous relaxation (OSR) algorithm. Figure 6: square decomposition for Np = 9; solid lines are processor domains, each with boundary and "ghost" regions at least as wide as the interaction range. Figure 7: time evolution of events for OSR and OSRPR with G = 4; in each cycle, processors carry out KMC events independently until a fixed number G of events or a boundary event, a global communication determines t_min, and events after t_min are rolled back (discarded in OSR, carried to the next cycle in OSRPR).]

Shim, Amar, PRB ’05, Merrick, Fichthorn, PRE ’07, etc

Synchronous algorithm: uniform time window for each processor unless a boundary event occurs.

Resolve conflicts at boundary regions by communicating with neighboring processors and restarting the cycle.

Global communication overhead in each cycle.

Previous methods rely on exact simulation of the stochastic process.



Synchronous algorithms: Sub-Lattice Method

[Excerpt from G. Nandipati et al., J. Phys.: Condens. Matter 21 (2009) 084214, describing the OSRPR and synchronous sublattice (SL) algorithms. Figure 8: comparison between parallel (OSRPR, square decomposition, Np = 4) and serial results for a fractal growth model with D/F = 10^5 and G = 7. Figure 9: strip decomposition for Np = 2, each processor domain subdivided into A and B sublattices, with boundary and ghost regions. Figure 10: time evolution in the SL algorithm; in each cycle all processors operate on the same randomly selected sublattice until the next event would exceed the cycle time τ, which must typically be at most the inverse of the fastest single-event rate; only local communication is needed.]

Shim, Amar, PRB ’07

Synchronous algorithm: fixed time window (cycle).

Random choice of sublattice and restart of cycle.

No global communication overhead in each cycle.

Relies on approximation of the stochastic process.
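To make the control flow of an SL cycle explicit, here is a scheduling sketch (illustrative only): every processor uses the same randomly chosen sublattice, runs KMC on it until the next event would exceed the cycle time τ, then exchanges boundary information locally. run_kmc_until and exchange_boundaries are assumed helpers standing in for the per-processor KMC kernel and the local, MPI-style communication.

import random

def sl_cycle(domains, tau, run_kmc_until, exchange_boundaries, rng=random):
    """One synchronous-sublattice (SL) cycle over all processor domains.

    domains             : per-processor state, each holding an 'A' and a 'B' sublattice
    tau                 : cycle time (<= inverse of the fastest single-event rate)
    run_kmc_until       : callable (domain, sublattice, tau) -> domain, running KMC
                          on the chosen sublattice until the next event exceeds tau
    exchange_boundaries : callable (domains) -> domains, local neighbor communication
    """
    sublattice = rng.choice(["A", "B"])         # same choice on every processor
    # in a real run this loop is one MPI rank per domain, executed concurrently
    domains = [run_kmc_until(d, sublattice, tau) for d in domains]
    return exchange_boundaries(domains)         # communicate boundary events, next cycle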

Page 50: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Synchronous algorithms: Sub-Lattice MethodJ. Phys.: Condens. Matter 21 (2009) 084214 G Nandipati et al

Figure 8. Comparison between parallel results using the OSRPRalgorithm with square decomposition (Np = 4) and serial results fora fractal model with D/F = 105 and G = 7.

We note that, in the OSR algorithm for a givenconfiguration, there is an optimal value of G which takes intoaccount the tradeoffs between communication time (which iswasted if there are no boundary events) and rollbacks. While,in general, an adaptive method could be used to attempt tooptimize the value of G from cycle to cycle, in practice we havefound it more efficient to simply use trial and error to find theoptimal fixed value of G for our simulation (see section 4.4).

4.2. Optimistic synchronous relaxation with pseudo-rollback(OSRPR) algorithm

In the OSR algorithm each processor discards all KMC eventswhich occur after tmin. However, this is unnecessary if thereare no boundary events in any of the processors. Therefore, wehave considered a variation of the OSR algorithm (optimisticsynchronous relaxation with pseudo-rollback) in which, whenthere are no boundary events in the system, those events thatwould have been discarded are added to the next cycle. Thiscan reduce the loss of computational time due to undoing andthen ‘redoing’ events and thus enhances the computationalefficiency. As a test of the OSRPR algorithm, we have carriedout parallel simulations using this algorithm for a ‘fractal’model of irreversible submonolayer growth in which onlymonomer deposition and diffusion processes are included [11],with Np = 4. As expected, there is excellent agreementbetween serial and parallel results for the island and monomerdensities (see figure 8).

4.3. Synchronous sublattice (SL) algorithm

In order to maximize the parallel efficiency we have also carried out simulations using the semi-rigorous synchronous sublattice (SL) algorithm recently developed by Shim and Amar [13]. To avoid conflicts between processors, in the SL algorithm each processor domain is divided into subregions or sublattices (see figure 9). A complete synchronous cycle corresponding to a cycle time τ then proceeds as described above (see figure 10).

Figure 9. Schematic diagram of strip decomposition for Np = 2. Each processor domain is subdivided into A and B sublattices. Boundary and ghost regions for B sublattice of processor 1 are also shown.

Figure 10. Time evolution in the SL algorithm. Dashed lines correspond to selected events, while the dashed line with an X corresponds to an event which is rejected since it exceeds the cycle time τ.


Page 51: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Hierarchical Parallel KMC algorithms

M.K., P. Plechac (U of TN and ORNL), G. Arabatzis (U of Crete) and L. Xu (CS, UDel), preprint (2010)

Markovian Dynamics: Adsorption/Desorption/Reaction/Diffusion

Adsorption/Desorption/Reaction Generator: LN f (σ) = ∑x∈Λ c(x, σ)[f (σx) − f (σ)].

Multi-site updates σx for most systems, e.g.

Suchorski et al, ChemPhysChem (2010). Complex behavior: bistability, oscillations, chaos, patterning, etc.

Page 52: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Analogy to coarse-graining

Decompose the particle system into parts that communicate minimally; local information is then either represented by suitable coarse variables or computed on separate processors within a parallel architecture.

Example: A 1-D equilibrium calculation

HsN(σ) = ∑k Hsk(σ) + ∑k Wk,k+1(σ)

Hsk : short-range Hamiltonian for the k-CG cell with free boundary conditions

Wk,k+1: short-range interactions between k- and k + 1-CG cells

e−βHsN PN(dσ|η) = ∏k: odd [e−β(Wk−1,k+Wk,k+1) e−βHsk Pk(dσk|η(k))] × ∏k: even e−βHsk Pk(dσk|η(k))

1D Operator Splitting Algorithm (OpSpl)

1. Apply SSA on white cells in parallel until synchronization time
2. Apply SSA on black cells in parallel until synchronization time
3. Goto 1

Page 53: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Analogy to coarse-graining

Simplified CG: when the Wk,k+1 are disregarded, there are no correlations between CG cells (only intra-cell correlations remain) and the CG Hamiltonian is

H(s,0)m(η) = ∑k U(s,0)k(ηk) = −∑k (1/β) log ∫ e−βHsk(σ) Pk(dσk|η(k))

i.e. independent sampling over each coarse cell with free boundary conditions
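A minimal sketch of this independent, per-cell sampling (Python), assuming the coarse variable η(k) is the number of occupied sites in cell k and that the prior conditioned on η(k) is uniform over such configurations; H_s stands for a user-supplied short-range Hamiltonian of the isolated cell.

import math
import random

def cell_potential(eta_k, cell_size, beta, H_s, n_samples=10000):
    """Monte Carlo estimate of U_k^(s,0)(eta_k) = -(1/beta) log E[exp(-beta H_s(sigma))],
    with sigma sampled from the prior on cell k given eta_k occupied sites
    (free boundary conditions: H_s sees only the sites inside the cell)."""
    sites = range(cell_size)
    acc = 0.0
    for _ in range(n_samples):
        occupied = set(random.sample(sites, eta_k))          # uniform prior given eta_k
        sigma = [1 if s in occupied else 0 for s in sites]
        acc += math.exp(-beta * H_s(sigma))
    return -math.log(acc / n_samples) / beta

# Example with a nearest-neighbor attraction inside a cell of 8 sites:
# U = cell_potential(3, 8, 2.0, lambda s: -sum(s[i] * s[i + 1] for i in range(7)))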

1D Operator Splitting Algorithm (OpSpl)

1. Apply SSA on white cells in parallel until synchronization time
2. Apply SSA on black cells in parallel until synchronization time
3. Goto 1

Parallelization: trivial, no communication between CG cells.

Page 54: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Step 1: Generator decomposition

LN f (σ) = ∑x∈Λ c(x, σ)[f (σx) − f (σ)]
= ∑k ∑x∈Ck c(x, σ)[f (σx) − f (σ)]
= ∑k: odd Lk f (σ) + ∑k: even Lk f (σ) := LO f (σ) + LE f (σ),
where Lk f (σ) = ∑x∈Ck c(x, σ)[f (σx) − f (σ)].

1D Operator Splitting Algorithm (OpSpl)

1. Apply SSA on white cells in parallel until synchronization time
2. Apply SSA on black cells in parallel until synchronization time
3. Goto 1

2D OpSpl

1. Apply SSA on white cells in parallel until synchronization time
2. Apply SSA on black cells in parallel until synchronization time
3. Goto 1
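A minimal 1-D Python sketch of steps 1-3 above, where ssa_on_cell is an illustrative per-cell SSA routine that advances a single coarse cell to the synchronization time while its neighbors are held frozen; in 2-D the same sweep runs over a checkerboard of cells.

def opspl_step(cells, t_sync, ssa_on_cell):
    """One OpSpl step: evolve the odd ("white") cells, then the even ("black") cells.

    cells       -- list of coarse-cell states; cell k interacts only with cells k-1, k+1
    ssa_on_cell -- ssa_on_cell(cells, k, t_sync) returns the new state of cell k,
                   treating the neighboring cells as read-only during the half-step
    """
    # Odd cells: their even neighbors are frozen, so the processes generated by the Lk
    # act on disjoint cells, are mutually independent, and can run on separate processors.
    for k in range(1, len(cells), 2):
        cells[k] = ssa_on_cell(cells, k, t_sync)
    # Even cells: same argument, now with the freshly updated odd cells frozen.
    for k in range(0, len(cells), 2):
        cells[k] = ssa_on_cell(cells, k, t_sync)
    return cells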


Page 56: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Step 2: Trotter product & Fractional Step Approximation

Trotter product for semigroups (Proc. AMS (1958)):

limh→0 (eAh eBh)[t/h] f = e(A+B)t f

Random Trotter product formula for jump processes: Kurtz (Proc. AMS (1972))

Approximation of the Markov semigroup based on the (random) Trotter theorem (Lie or Strang splitting):

eLN∆t ≈ eLO∆t eLE∆t (Lie), or eLN∆t ≈ eLO∆t/2 eLE∆t eLO∆t/2 (Strang)

For short-range interactions, the processes ∼ Lk are independent and can be simulated on separate processors:

eLN∆t ≈ eLO∆t eLE∆t ≈ ∏k: odd eLk∆t × ∏k: even eLk∆t, with each factor eLk∆t simulated on the k-th processor
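As a quick sanity check of the splitting error (not taken from the slides), the sketch below compares exp((A+B)t) with the Lie and Strang products for two small, arbitrary generator matrices using scipy; the Lie error decays at first order in the step size and the Strang error at second order.

import numpy as np
from scipy.linalg import expm

# Two small generator (Q-)matrices: rows sum to zero, off-diagonal entries are nonnegative.
A = np.array([[-1.0, 1.0], [2.0, -2.0]])
B = np.array([[-0.5, 0.5], [0.3, -0.3]])

t = 1.0
exact = expm((A + B) * t)

for n in (10, 100, 1000):
    h = t / n
    lie = np.linalg.matrix_power(expm(A * h) @ expm(B * h), n)
    strang = np.linalg.matrix_power(expm(A * h / 2) @ expm(B * h) @ expm(A * h / 2), n)
    print(n,
          abs(lie - exact).max(),      # O(h)   global error (Lie)
          abs(strang - exact).max())   # O(h^2) global error (Strang)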


Page 60: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Benchmarks and Error Analysis

Mathematical tools and algorithms
Algorithm and Results / Application Performance

o Kinetic Monte Carlo methods amenable to parallelization on GPU clusters
o Benchmark model defined and accuracy tested
o Simulation of real chemical processes (oxidation)
o Distributed (MPI) version implemented
o 1000x speed-up compared to standard implementation
o Controlled approximation of the original Markov jump process

PIs: Katsoulakis, Plechac, Vlachos; Postdoc: Kalligiannaki

Application Performance

Simulation of oxidation process on the 2D lattice. Domain decomposition depicted together with the workload on GPU cells (bottom figure).

Dynamic load balancing: example of an algorithm in 2D.

New parallel algorithms for kinetic Monte Carlo simulations

Figure: Phase diagram of critical 2D Ising model used as a benchmark for accuracy (Onsager solution).

Figure: No load balancing

Figure: With load balancing


Flexibility in choosing suitable decompositions

Controlled error approximations for observables with similar toolbox as in CG

K., Plechac, Sopasakis, SIAM Num. Anal. ’06

Page 61: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Algorithm Performance

Figure: GPU and sequential code execution time (log-log) vs. N (lattice size = N^2); y-axis: execution time in sec. Curves: sequential code and GPU runs (Fermi and Tesla architectures) with dt = 0.01, 0.1, 1, 10.

GPU simulation with various architectures (e.g. Fermi) vs. DNS

Page 62: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Dynamic load balancing


Figure: No load balancing

Figure: With load balancing

Load balancing controlled by the number of jumps executed on each sub-domain

Mass transport towards a uniform histogram

Fractional Step approximation allows for tuning the balancing
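A minimal sketch of the rebalancing step (Python), assuming a 1-D strip decomposition in which boundaries[i] is the first lattice column owned by sub-domain i+1; the column shift and imbalance tolerance are illustrative parameters, not values used in the reported simulations.

def rebalance(jump_counts, boundaries, shift=1, tol=0.1):
    """Move domain boundaries so that the per-domain jump counts flatten out.

    jump_counts -- jumps executed on each sub-domain during the last fractional step
    boundaries  -- boundaries[i] is the first lattice column owned by sub-domain i+1
    shift       -- number of columns transferred per rebalancing step
    tol         -- relative imbalance below which no work is moved
    """
    target = sum(jump_counts) / len(jump_counts)
    for i in range(len(boundaries)):
        left, right = jump_counts[i], jump_counts[i + 1]
        if left > (1 + tol) * target and right < target:
            boundaries[i] -= shift        # shrink the overloaded domain on the left
        elif right > (1 + tol) * target and left < target:
            boundaries[i] += shift        # shrink the overloaded domain on the right
    return boundaries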


Page 63: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Hierarchical Structure and multiple GPUs: 2D OpSpl on multi-GPUs

Hierarchical methods are well suited for current architectures which have sophisticated memory hierarchies, e.g. GPUs.

Hierarchical lattice partitioning on GPU cluster: macro-, meso-, micro-cells

MPI/OpenMP communication between GPUs
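A small Python sketch of the three-level bookkeeping, with illustrative cell sizes (one macro-cell per GPU, one meso-cell per thread block, one micro-cell per thread); the actual partition sizes are not specified in the slides.

def hierarchical_index(x, y, macro=256, meso=32, micro=4):
    """Map a lattice site (x, y) to its (GPU, thread block, thread) cell indices."""
    gpu = (x // macro, y // macro)                        # macro-cell: one per GPU
    block = ((x % macro) // meso, (y % macro) // meso)    # meso-cell: one per thread block
    thread = ((x % meso) // micro, (y % meso) // micro)   # micro-cell: one per thread
    return gpu, block, thread

# Example: hierarchical_index(1000, 77) -> ((3, 0), (7, 2), (2, 3))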


Page 67: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Hierarchical Structure-Algorithm Performance

Figure: GPU and sequential code execution time (log-log) vs. N (lattice size = N^2); y-axis: execution time in sec. Curves: sequential codes (seq-1, seq-2), MPI runs on 64 GPUs and single-GPU (Fermi) runs with dt = 0.01, 0.1, 1, 10.

2D unimolecular reaction/diffusion particle system

Up to 10^5x speed-up compared to standard implementation

With 64 GPUs one can simulate with relative ease 10^8 particles (approx. 1 µm^2).

Page 68: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Hierarchical Structure & Multiple Scales

1. Micromechanisms with (very) different time scales, e.g. fast diffusion in CO oxidation on Pt [Suchorski et al Phys. Rev. Lett. '99]

ε−1 Ldiff + Lreaction, ε ≪ 1

Combine the hierarchical structure with established uses of Trotter products for molecular systems with fast/slow mechanisms (a short sketch follows at the end of this slide). Molecular Dynamics: Tuckerman et al, J. Chem. Phys. '92.

2. Optimizing the algorithms: Computation vs. Communication. K., Plechac, Xu, Taufer (CS, UDel)

Central cells

Simulating white cells on the boundary
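A minimal sketch of the fast/slow splitting in the spirit of multiple-time-step MD (Python), assuming hypothetical diffusion_step and reaction_step routines that advance the configuration under the corresponding generator for a given time; the number of fast sub-steps would scale like 1/ε.

def fast_slow_step(state, dt, n_fast, diffusion_step, reaction_step):
    """One Strang-type step for the generator eps^-1 * L_diff + L_react.

    The slow reaction part takes two half-steps of length dt/2, while the stiff,
    fast diffusion part is resolved with n_fast inner sub-steps of length dt/n_fast
    (n_fast chosen on the order of 1/eps).
    """
    state = reaction_step(state, dt / 2)            # slow half-step
    for _ in range(n_fast):                         # fast inner loop
        state = diffusion_step(state, dt / n_fast)
    state = reaction_step(state, dt / 2)            # slow half-step
    return state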


Page 71: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Conclusions - Further Work

Kinetic Monte Carlo methods amenable to parallelization on GPU clusters

Benchmark model defined and accuracy tested

Distributed (MPI) version implemented

Controlled approximation of the original Markov jump process

Capability to simulate realistic chemical processes at large spatiotemporal scales


Page 75: Accelerated kinetic Monte Carlo methods: Hierarchical ... · Accelerated kinetic Monte Carlo methods: Hierarchical Parallel Algorithms & Coarse-Graining Markos Katsoulakis University

Lattice Systems KMC Coarse-Graining Parallel KMC Conclusions

Conclusions - Further Work

However, challenges remain...

Systems in surface processes with short- and long-range interactions, patterning, etc.

Optimizing the algorithms: Computation vs. Communication

Central cells

Simulating white cells on the boundary
