high performance monte carlo and time-stepping dynamics ...dpplayne/papers/cstn-146.pdf · scale...

Technical Report CSTN-146

High Performance Monte Carlo and Time-Stepping Dynamics for the Classical Spin Heisenberg Model on GPUs

K.A. Hawick and D.P. Playne

Computer Science, Institute for Information and Mathematical Sciences, Massey University, North Shore 102-904, Auckland, New Zealand

email: { k.a.hawick, d.p.playne }@massey.ac.nz

Tel: +64 9 414 0800 Fax: +64 9 441 8181

April 2012

ABSTRACT

The Heisenberg model of classical spins makes use of both Monte Carlo stochastic dynamics and time-integration of its equation of motion. These two schemes have different parallelisation strategies and tradeoffs. We implement both algorithms using a data-parallel approach for Graphical Processing Units (GPUs) and we discuss the resulting performance on various combinations of single and multiple GPUs. In addition to studying Monte Carlo dynamical update schemes, we use our fast simulation code to explore the scaling and time correlations of a large-scale Heisenberg model system using a high-order numerical integration algorithm, which enables accurate study of spin wave phenomena and time-correlation functions. We also discuss various graphical rendering models to appropriately visualise the spin vectors inside an interactive Heisenberg spin simulation.

KEY WORDS

Heisenberg model; classical spin; Monte Carlo dynamics; time-integration dynamics.

1 Introduction

Simulation of complex systems is a powerful means of investigating phase transitions [6] and critical phenomena [16]. Visualising the approach to criticality of such systems is also important to help develop an intuitive understanding of simulation model parameters. A high-performance visual simulation also aids in navigating through the model parameter space to identify interactively those areas that are worth more exhaustive simulation and collection of statistical measurements from appropriate numerical experiments.

Figure 1: A visualization of a sample 3D Heisenberg simulation.

A great deal of work has been done on models such as the Ising model [10, 15], which is based upon applying stochastic Monte Carlo dynamics to a system of spins modelled by individual bits. The Heisenberg model [1] of classical spin systems [13, 24] is interesting because its more realistic continuous individual spins can be simulated dynamically using the Monte Carlo method [3] but also using a more realistic time-integration method based on proper equations of motion. This combined approach means that the Heisenberg model is more appropriate for studying dynamic growth, decay and relaxation properties [19], since the simulation time can be more readily identified as a "real" time variable rather than purely as an artifact of the Monte Carlo algorithm.

In this paper we make use of Graphical Processing Units (GPUs) and NVIDIA's Compute Unified Device Architecture (CUDA) software language and framework to develop very fast numerical simulations of large Heisenberg spin model systems in two and three dimensions [22]. We also develop various graphical rendering models using OpenGL software to visualise and explain the dynamically evolving spin vector field as both Monte Carlo dynamics and time integration dynamics are applied. Figure 1 shows a sample rendering of a three-dimensional Heisenberg system, simulated on a $64^3$ lattice.

Although other workers have reported Monte Carlo simulations of the Heisenberg model [5, 26], it is also important to be able to simulate its time-integration dynamics [2]. A typical numerical experiment consists of rapidly quenching a hot random initial spin pattern using the Monte Carlo dynamics, followed by carefully controlled fine-grained time-stepping of the system using a suitable numerical integration scheme to obtain temporal measurements.

We have developed automatic code generation techniques that have allowed us to generate high order time integration software to solve the equations of motion for the spins to tenth order accuracy. This high degree of accuracy and associated stability allows us to investigate key measured properties such as time correlation functions over longer periods of simulated time than would otherwise be feasible. These measurements then have the potential to be compared with quantities obtained from experiments on real physical magnetic systems [18].

Our article is structured as follows: In Section 2 we summarise the (classical) Heisenberg model of spins and the dynamical schemes that we can apply to it. In Section 3 we describe our GPU implementations of these schemes. In Section 4 we present some visual renderings of the spin system as well as some measurements obtained from our software. In Section 5 we give a discussion of some of the phenomena we observed and we offer some conclusions and areas for further work in Section 6.

2 Heisenberg Model Simulations

The classical Heisenberg model is essentially a continuous spin version of the Ising model.

The Heisenberg system is realised using a d-dimensional array (such as a cubic lattice in 3-D or a square lattice in 2-D) where each lattice site has a spin on it. A spin comprises a vector $\mathbf{s} = (s_x, s_y, s_z)$ where each component is a scalar lying in $[-1, 1]$. In practice, the normalisation of the spin vectors to be unit vectors implies that $|\mathbf{s}| = \sqrt{s_x^2 + s_y^2 + s_z^2} \equiv 1$, and this means there are effectively only two degrees of freedom. Using spherical trigonometry, the unit spin vector can thus be represented by two angles $\theta \in [0, \pi]$, $\phi \in [0, 2\pi]$.
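The two-angle representation can be illustrated with a short sketch. The function names `spin_from_angles` and `angles_from_spin` are hypothetical, and the convention that $\theta$ is measured from the z axis is an assumption; the text does not fix either.

```python
import math

def spin_from_angles(theta, phi):
    """Unit spin (sx, sy, sz) for polar angle theta and azimuth phi.

    Assumes theta is measured from the +z axis (an illustrative convention).
    """
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def angles_from_spin(s):
    """Recover (theta, phi) with theta in [0, pi] and phi in [0, 2*pi)."""
    sx, sy, sz = s
    # Clamp sz to guard against rounding slightly outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, sz)))
    phi = math.atan2(sy, sx) % (2.0 * math.pi)
    return theta, phi
```

Storing two angles rather than three components saves memory, at the cost of trigonometric evaluations whenever a dot or cross product is needed.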

The energy function (the Hamiltonian) of the classical Heisenberg model system is:

$$H = -J \sum_{i,j} \mathbf{s}_i \cdot \mathbf{s}_j \tag{1}$$

where the summation is over the nearest neighbours of the lattice site, and the negative sign (with a positive value of J) means we get ferromagnetism, i.e. alignment of the spins for strong couplings (low temperatures). The dot product is between two neighbouring spins and essentially contributes towards the total energy when two spins couple or closely align in direction with one another.
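As a concrete reading of the Hamiltonian, the following minimal sketch evaluates the energy of a small two-dimensional lattice. The name `total_energy`, the periodic boundaries, and the choice to count each nearest-neighbour bond exactly once are assumptions made for illustration; the text does not state a counting convention.

```python
def total_energy(spins, J=1.0):
    """H = -J * sum of s_i . s_j over nearest-neighbour bonds.

    spins[y][x] is a 3-component tuple on a periodic 2D lattice.
    Each bond is counted once by visiting only the right and down neighbours.
    """
    ny, nx = len(spins), len(spins[0])
    bond_sum = 0.0
    for y in range(ny):
        for x in range(nx):
            s = spins[y][x]
            for t in (spins[y][(x + 1) % nx], spins[(y + 1) % ny][x]):
                bond_sum += s[0] * t[0] + s[1] * t[1] + s[2] * t[2]
    return -J * bond_sum
```

With positive J a fully aligned lattice gives the lowest energy, which is the ferromagnetic behaviour described above.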

We can write the equation of motion for the spins in the form of a differential equation as:

$$\frac{d\mathbf{s}_i}{dt} = \mathbf{s}_i \times \sum_j J \mathbf{s}_j \tag{2}$$

where $A \times B$ is the cross product of the two vectors A and B, so the differential equation is actually a vector equation, with a separate component for each of the x, y, z parts of $d\mathbf{s}_i$.
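The right-hand side of the equation of motion for a single spin can be sketched as follows. The helper names `cross` and `spin_derivative` are hypothetical; the neighbour list is whatever set of nearest neighbours the lattice provides.

```python
def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def spin_derivative(s, neighbours, J=1.0):
    """ds/dt = s x (J * sum of neighbouring spins)."""
    field = tuple(J * sum(n[k] for n in neighbours) for k in range(3))
    return cross(s, field)
```

Because the derivative is a cross product with the spin itself, it is always perpendicular to the spin, so the exact dynamics preserve the unit length of each spin vector.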

This can be transformed into a finite difference equation using standard techniques such as Euler (poor stability) or Runge-Kutta (much better stability). We in fact use a tenth order algorithm due to Hairer [8], which has a very high degree of accuracy and stability. We thus obtain an explicit formula for the change $\Delta\mathbf{s}$ in each spin in terms of its prior value and the values of its nearest neighbours.

$$\mathbf{s}_{\text{new}} = \mathbf{s}_{\text{old}} + \Delta\mathbf{s} \tag{3}$$

This algorithm allows us to update the Heisenberg spins very carefully and with a realistic and meaningful time that can be compared with temporal measurements of real magnetic systems. However, in practice the numerical experiments that are typically performed on the Heisenberg system involve quenching a hot, initially random arrangement of spins to a finite temperature. Time integration is not well suited to this quenching process as it is too slow. In practice a Monte Carlo stochastic algorithm is used to step the system forward in pseudo-time to thermal equilibrium, and the time-stepping algorithm can then be applied.

Monte Carlo thermalisation using one of the standard algorithms such as Metropolis [17] or Glauber [7] is described in detail elsewhere [12]. In summary, these algorithms work as follows. At each Monte Carlo step each spin is considered in turn (usually in random order). A new direction for the spin is generated randomly and the energy consequence $\Delta E$ of this change is computed. If the spin change would decrease the energy then it is immediately accepted. Otherwise the change is accepted according to the Boltzmann probability factor $\exp(-\Delta E / k_B T)$, where $k_B$ is Boltzmann's constant, which we take to be unity for our purposes, and T is the temperature, which is effectively just the reciprocal of the coupling parameter J.
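The acceptance rule summarised above can be sketched in a few lines. The names `random_unit_spin` and `metropolis_accept` are hypothetical, Python's standard `random.Random` stands in for whatever generator a real implementation uses, and $k_B$ is taken as unity as in the text.

```python
import math
import random

def random_unit_spin(rng):
    """Uniformly distributed direction on the unit sphere.

    Uses z uniform in [-1, 1] and azimuth uniform in [0, 2*pi).
    """
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(max(0.0, 1.0 - z * z))
    return (r * math.cos(phi), r * math.sin(phi), z)

def metropolis_accept(delta_e, temperature, rng):
    """Accept a trial spin if the energy does not rise; otherwise accept
    with the Boltzmann probability exp(-delta_e / T), with k_B = 1."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)
```

For example, `metropolis_accept(1.0, 1.0, random.Random(0))` accepts an energy increase of one unit at unit temperature with probability $e^{-1}$.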

3 GPU Implementations

Graphics Processing Units (GPUs) have been shown to be a very effective processing architecture for regular lattice simulations such as the Heisenberg model. Originally designed for rendering real-time graphics for computer games, GPUs have evolved into highly parallel architectures and are being increasingly used for scientific applications. In previous work, GPUs have been used for processing the Ising spin model [11] as well as scalar and vector [21] models.

All simulations discussed in this paper have been executed on Fermi architecture NVIDIA GPUs. Fermi GPUs contain up to 16 multiprocessors, each of which contains 32 scalar processors or SPs. Each multiprocessor contains on-chip memory which allows information to be shared between SPs. All the multiprocessors also have access to global memory, which is the main storage area of the device.

These simulations perform all memory access through global memory, which is automatically cached on Fermi devices. This memory access has been shown to provide the best performance for this access pattern [21]. For more details on GPU architectures and implementing lattice-based simulations on GPUs see [20, 21].

The implementation of the Heisenberg model is different to previous work in that each spin is represented by a three-dimensional vector and requires two phases of computation: the equilibration phase and the spin update phase. The equilibration phase is computed using a Monte-Carlo method while the spin update is computed by integrating the Heisenberg equation of motion over time. Each of these phases must be parallelised for them to be computed on a GPU.

3.1 Equilibration Phase

The equilibration phase of the Heisenberg simulation requires the use of a parallel Monte-Carlo method. The Metropolis algorithm does not parallelise well as it can lead to race-conditions, so instead the checkerboard or red-black update is used. The checkerboard update pattern ensures that no two neighboring lattice cells are changed during the same update, meaning no race-conditions can occur. This red-black checkerboard pattern is shown in Figure 2.

Figure 2: The checkerboard update pattern.

Each update will read the value of a cell and its nearest neighbors and compute the energy of the configuration ($E_1$). It will then randomly generate another spin and compute the energy of this alternative configuration ($E_2$). The change in energy can then be calculated ($\Delta E = E_2 - E_1$) and used to either accept or reject the new configuration. The new configuration is accepted if $\Delta E < 0$, and otherwise with probability $e^{-\Delta E / k_B T}$.
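The update just described can be sketched serially for a two-dimensional periodic lattice. This is an illustrative CPU version under stated assumptions, not the GPU kernel: the names `local_energy` and `checkerboard_sweep` are hypothetical, $k_B$ is taken as unity, and the lattice dimensions are assumed even so that the two colours remain consistent across the periodic boundary. On a GPU, each cell of the active colour would be handled by its own thread.

```python
import math
import random

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def random_unit_spin(rng):
    """Uniformly distributed direction on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(max(0.0, 1.0 - z * z))
    return (r * math.cos(phi), r * math.sin(phi), z)

def local_energy(spins, x, y, s, J=1.0):
    """Energy of spin s placed at (x, y) with its four periodic neighbours."""
    ny, nx = len(spins), len(spins[0])
    nbrs = (spins[y][(x + 1) % nx], spins[y][(x - 1) % nx],
            spins[(y + 1) % ny][x], spins[(y - 1) % ny][x])
    return -J * sum(dot(s, n) for n in nbrs)

def checkerboard_sweep(spins, temperature, rng, J=1.0):
    """One red-black sweep: update all colour-0 cells, then all colour-1 cells.

    Cells of one colour have no neighbours of the same colour, so updating
    them in place cannot change the energy seen by another cell in the pass.
    """
    ny, nx = len(spins), len(spins[0])
    for colour in (0, 1):
        for y in range(ny):
            for x in range(nx):
                if (x + y) % 2 != colour:
                    continue
                e1 = local_energy(spins, x, y, spins[y][x], J)
                trial = random_unit_spin(rng)
                e2 = local_energy(spins, x, y, trial, J)
                delta_e = e2 - e1
                if delta_e < 0.0 or rng.random() < math.exp(-delta_e / temperature):
                    spins[y][x] = trial
```

Repeated calls to `checkerboard_sweep` at a low temperature drive a random initial lattice towards lower energy, which is the quench used before time integration begins.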

This update process requires the use of a random number generator (RNG) to create a new random spin configuration and also to determine if the spin should be accepted when $\Delta E > 0$. For this simulation the RAN random number generator discussed in Numerical Recipes [23] is used. This is a fast random number generator with relatively low storage requirements and has been shown to pass appropriate statistical tests [14]. The random number generation process is parallelised by creating a separate, appropriately initialized RAN RNG for each lattice cell. This way each lattice cell has an independent stream of random numbers.

The checkerboard update pattern ensures that no two neighboring lattice cells are changed during the same update. Unfortunately, GPUs give the best performance when sequential threads access sequential memory addresses, because these memory transactions can be coalesced into a single transaction. This difference between algorithmic requirements and optimal GPU access patterns can be overcome by reordering or crinkling the lattice. The lattice can be rearranged so that cells that are updated at the same time are stored sequentially, which allows the memory access to be as efficient as possible. This process is thoroughly described in [9] and the lattice from Figure 2 is shown in its crinkled form in Figure 3.

Figure 3: The checkerboard update pattern shown on a crinkled lattice.
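One way to realise such a reordering is sketched below for a two-dimensional lattice. This is an assumed layout chosen for illustration (all cells of one colour stored first, then all cells of the other), and it is not necessarily the transform described in [9]; the names `crinkle_index` and `uncrinkle_index` are hypothetical, and the lattice width `nx` is assumed even.

```python
def crinkle_index(x, y, nx, ny):
    """Storage index for cell (x, y): colour-0 cells fill the first half of
    the array and colour-1 cells the second half (nx must be even)."""
    colour = (x + y) % 2
    half = (nx * ny) // 2
    # Row-major position halved: each row holds nx // 2 cells of each colour.
    return colour * half + (y * nx + x) // 2

def uncrinkle_index(i, nx, ny):
    """Inverse mapping from a storage index back to lattice coordinates."""
    half = (nx * ny) // 2
    colour, k = divmod(i, half)
    y, j = divmod(k, nx // 2)
    # Within row y, cells of this colour sit at x = 2 * j + offset.
    x = 2 * j + ((colour + y) % 2)
    return x, y
```

With this layout, consecutive threads updating one colour read consecutive storage indices, which is the access pattern that allows memory transactions to be coalesced.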

3.2 Spin Update Phase

In the spin update phase of the simulation, the equation of motion for each spin is computed and integrated over time to compute the spin at the end of the time step. Every spin is updated each time step, which means that the best memory access pattern is provided by the uncrinkled lattice. The spin update phase also does not require a random number generator.

There are a number of different methods that can be used to integrate the equation of motion (equation 2) over time. In this simulation (as with previous work) the explicit methods from the Runge-Kutta family of integration methods are used. These methods are used because they parallelise well and the higher-order methods can provide very good stability and accuracy. The higher order methods become increasingly complex to implement and for this reason we make use of code-generation techniques to produce template code. The code generator can create integration code for lattice-based simulations from a Butcher tableau [4, 21].

These methods all integrate the motion of the spins by calculating a number of intermediate stages. The derivatives of these intermediate stages are combined to calculate the final spin configuration. Higher order methods are more expensive in terms of memory storage and computational intensity. However, these more stable higher-order methods can often simulate systems with a larger time-step and lead to an overall reduction in computation time.
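The staged structure shared by these methods can be sketched as a generic explicit Runge-Kutta step driven by a Butcher tableau. This is a hand-written interpreter for illustration, not the generated template code; the name `rk_step` is hypothetical, the system is assumed autonomous (as equation 2 is), and the classical fourth order tableau is used as the example.

```python
def rk_step(f, y, h, a, b):
    """One explicit Runge-Kutta step for y' = f(y).

    a is the strictly lower triangular Butcher matrix and b the weights.
    Each stage evaluates f at y plus a weighted sum of earlier stage derivatives.
    """
    n = len(y)
    k = []
    for i in range(len(b)):
        stage = [y[m] + h * sum(a[i][j] * k[j][m] for j in range(i))
                 for m in range(n)]
        k.append(f(stage))
    # Combine the stage derivatives with the weights b to form the new state.
    return [y[m] + h * sum(b[i] * k[i][m] for i in range(len(b)))
            for m in range(n)]

# Classical fourth order Runge-Kutta tableau.
RK4_A = [[0.0, 0.0, 0.0, 0.0],
         [0.5, 0.0, 0.0, 0.0],
         [0.0, 0.5, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
RK4_B = [1.0 / 6.0, 1.0 / 3.0, 1.0 / 3.0, 1.0 / 6.0]
```

A higher order method changes only the tableau: more rows mean more stored stages and more derivative evaluations per step, which is the memory and compute cost noted above.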

4 Results

One important feature of the Heisenberg model is the presence of a phase transition at the critical temperature $T_c$, which can be seen in Figures 6 and 7. These figures show the temperature dependent behavior of three Heisenberg systems in two dimensions (Figure 6) and three dimensions (Figure 7).

The first system has a temperature higher than the critical temperature ($T > T_c$) and exhibits random 'hot' behavior. As the probability of accepting a new random spin approaches 1 the system will become completely random. The temperature of the second system is near the critical temperature ($T \approx T_c$) and shows quite different behavior to the first system. There are clear structures forming in this system while maintaining an element of randomness. The final system has a temperature well below the critical temperature ($T < T_c$) and will simply minimize the energy of the system.

The effect of this temperature dependent behavior can be easily seen by plotting the energy of the system. Figure 4 shows the energy of 1024 x 1024 Heisenberg systems averaged over 20 runs with T = {0.1, 0.2, ..., 1.0}. The checkerboard update is used for 1024 steps until the system reaches equilibrium and then the Runge-Kutta 4th Order integration method is used to integrate the motion of the spins over time. It can be easily seen from the plot that the energy for the different systems quickly reaches equilibrium and remains stable at this value.

Figure 4: The energy of the two-dimensional Heisenberg model for T = {0.1, 0.2, ..., 1.0} vs time (ln scale).

The temperature of the system also affects how quickly the correlation of subsequent systems decays. The correlation of a system at time t to a previous state at time 0 is calculated as the sum of the dot products of each current spin $\mathbf{s}_{t,i}$ and the initial spin $\mathbf{s}_{0,i}$. This can be written as:

$$C_t = \sum_i \mathbf{s}_{0,i} \cdot \mathbf{s}_{t,i} \tag{4}$$
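The correlation measure is a direct sum and can be sketched as below. The function name `correlation` and the flat-list storage of spins are assumptions for illustration.

```python
def correlation(initial, current):
    """C_t: sum over sites of the dot product of the initial and current spin.

    Both arguments are equal-length sequences of 3-component spins.
    """
    return sum(a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
               for a, b in zip(initial, current))
```

For N unit spins the value ranges from N (configuration unchanged) to -N (every spin reversed), so dividing by N gives a measure that is independent of system size.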

The correlation of the two-dimensional Heisenberg model in the spin update phase for T = {0.1, 0.2, ..., 1.0} is shown in Figure 5. The Runge-Kutta 4th Order integration method with a time step of h = 0.01 is used to evolve the system. The correlation of the system evolution shows different behavior depending on the temperature regime.

Figure 5: The correlation of the two-dimensional Heisenberg model for T = {0.1, 0.2, ..., 1.0}.

Figure 6: A series of three two-dimensional Heisenberg systems - below, near and above the critical temperature.

Figure 7: A series of three three-dimensional Heisenberg systems - below, near and above the critical temperature.

5 Discussion

We found that the time-integration method is computationally too inefficient to achieve a thermally equilibrated system in a reasonable number of time steps. The Monte Carlo dynamics can be implemented to rapidly move the quenched system to thermal equilibrium, but otherwise it does not yield good time measurements that can be easily related to real time behaviour in real magnetic systems. Development of a code that can apply both dynamic schemes was therefore necessary.

We experimented extensively with different rendering schemes. The scheme used for the illustrations in this paper is based on a simple mapping of colour hue and value to the two degrees of freedom, namely the two angles $\phi$, $\theta$ that precisely define each unit spin vector.
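A mapping of this kind can be sketched with the standard `colorsys` module. The specific assignment here, hue from $\phi$ and value from $\theta$ with full saturation, is an assumption; the text does not say which angle drives which channel or how the ranges are scaled, and the name `spin_colour` is hypothetical.

```python
import colorsys
import math

def spin_colour(theta, phi):
    """Map spin angles to an (r, g, b) triple with components in [0, 1].

    Assumed mapping: hue = phi / (2*pi), value = 1 - theta / pi, saturation = 1.
    """
    hue = (phi / (2.0 * math.pi)) % 1.0
    value = 1.0 - theta / math.pi
    return colorsys.hsv_to_rgb(hue, 1.0, value)
```

Under this assumed mapping, spins pointing along +z render at full brightness and spins pointing along -z render black, while the azimuth selects the hue.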

Another model, the clock model [25], is similar to the classical Heisenberg system, except it has only two components in each spin vector. Consequently the clock model spins can be specified using only one angle and it could be visualised using an arrow or some simpler colour mapping.

There is scope for further experimentation with different rendering schemes. Some thresholding of the spin values by direction could be used to look at a partial subset of the spins in a 3D hyperbrick at once. The problem of rendering a four-dimensional system effectively is an open one. Although real magnetic systems are only three-dimensional, simulating a four-dimensional model is useful since it allows the dimensional dependence of various structural and phase transitional properties to be studied. This remains an open problem for the present, however.

6 Conclusions

We have described the classical Heisenberg model of spin systems and our implementation of it on two-dimensional and three-dimensional lattice systems using Graphical Processing Units. Using appropriate memory mappings and data structures we obtained a sufficiently fast implementation of the Heisenberg simulations that we are able to explore its properties in near interactive time.

The Heisenberg system is somewhat more difficult to render than the Ising model. We experimented with various graphical renderings of the spin system using colour hue and value to map to the two independent degrees of freedom of the unit spin vectors of the evolving system.

We implemented both Monte Carlo dynamics and a high order time integration dynamical scheme. We were able to use these, respectively, to quench the system from an initial random state and subsequently to integrate it carefully to obtain time correlation functions.

There is scope for a more detailed study of spin wave phenomena using this simulation and metrical analysis. We also expect to be able to adapt our simulation to study damaged and frustrated spin models which have a direct bearing on comparisons with properties of new magnetic materials.

References

[1] Anderson, P.W.: New approach to the theory of superexchange interactions. Phys. Rev. 115, 2–13 (1959)

[2] Bernaschi, M., Parisi, G., Parisi, L.: Benchmarking GPU and CPU codes for Heisenberg spin glass over-relaxation. Computer Physics Communications 182, 1265–1271 (2011)

[3] Binder, K. (ed.): Monte Carlo Methods in Statistical Physics. Topics in Current Physics, number 7, Springer-Verlag, 2 edn. (1986)

[4] Butcher, J.C.: Numerical Methods for Ordinary Differential Equations. Wiley, second edn. (2008), ISBN 978-0-470-72335-7

[5] Campos, A.M., Pecanha, J.P., Pampanelli, P., de Almeida, R.B., Lobosco, M., Vieira, M.B., de O. Dantas, S.: Parallel implementation of the Heisenberg model using Monte Carlo on GPGPU. In: Proc. Computational Science and its Applications (ICCSA 2011). vol. LNCS 6784, pp. 654–667 (2011)

[6] Miranda, E.N., Parga, N.: Dynamical phase transitions in the classical Heisenberg model. J. Phys. A: Math. Gen. 24, 1059–1064 (1991)

[7] Glauber, R.: Time dependent statistics of the Ising model. J. Math. Phys II 228(4), 294–307 (1963)

[8] Hairer, E.: A Runge-Kutta Method of Order 10. J. Inst. Maths. Applics. 21, 47–59 (1978)

[9] Hawick, K.A., Playne, D.P.: Hypercubic Storage Layout and Transforms in Arbitrary Dimensions using GPUs and CUDA. Concurrency and Computation: Practice and Experience 23(10), 1027–1050 (July 2011)

[10] Hawick, K., Leist, A., Playne, D.: Cluster and fast update lattice simulations using graphical processing units. Tech. Rep. CSTN-104, Computer Science, Massey University (November 2009)

[11] Hawick, K., Leist, A., Playne, D.: Regular Lattice and Small-World Spin Model Simulations using CUDA and GPUs. Int. J. Parallel Prog. 39(CSTN-093), 183–201 (2011)

[12] Hawick, K.A.: Domain Growth in Alloys. Ph.D. thesis, Edinburgh University (1991)

[13] Joyce, G.S.: Classical Heisenberg model. Physical Review 155, 478–491 (1967)

[14] L'Ecuyer, P.: Software for uniform random number generation: distinguishing the good and the bad. In: Proc. 2001 Winter Simulation Conference. vol. 2, pp. 95–105 (2001)

[15] Leist, A., Playne, D., Hawick, K.: Interactive visualisation of spins and clusters in regular and small-world Ising models with CUDA on GPUs. Journal of Computational Science 1, 33–40 (2010), www.elsevier.com/locate/jocs

[16] Fisher, M.E.: The theory of equilibrium critical phenomena. Rep. Prog. Phys. 30, 615–730 (1967)

[17] Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)

[18] Steiner, M., Villain, J., Windsor, C.G.: Theoretical and experimental studies on one-dimensional magnetic systems. Advances in Physics 25(2), 87–209 (1976)

[19] Muller-Krumbhaar, H., Binder, K.: Dynamic properties of the Monte Carlo Method in Statistical Physics. J. Stat. Phys. 8(1), 1–24 (1973)

[20] NVIDIA® Corporation: CUDA™ 3.1 Programming Guide (2010), http://www.nvidia.com/, last accessed August 2010

[21] Playne, D.P.: Generative Programming Methods for Parallel Partial Differential Field Equation Solvers. Ph.D. thesis, Computer Science, Massey University (2011)

[22] Peczak, P., Ferrenberg, A.M., Landau, D.P.: High-accuracy Monte Carlo study of the three-dimensional classical Heisenberg ferromagnet. Phys. Rev. B 43(7), 6087–6093 (Mar 1991)

[23] Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes - The Art of Scientific Computing. Cambridge, third edn. (2007), ISBN 978-0-521-88407-5

[24] Watson, R.E., Blume, M., Vineyard, G.H.: Classical Heisenberg magnet in two dimensions. Phys. Rev. B 2(3), 684–690 (1970)

[25] Stanley, H.E.: Introduction to phase transitions and critical phenomena. Oxford Science Publications (1987)

[26] Weigel, M., Yavorskii, T.: GPU accelerated Monte Carlo simulations of lattice spin models. In: Proceedings of the 24th Workshop on Computer Simulation Studies in Condensed Matter Physics (CSP2011). Physics Procedia, vol. 15, pp. 92–96 (2011)
