on the use of graphics processing units (gpus) for

5
On the use of graphics processing units (GPUs) for molecular dynamics simulation of spherical particles R. C. Hidalgo, T. Kanzaki, F. Alonso-Marroquin, and S. Luding Citation: AIP Conf. Proc. 1542, 169 (2013); doi: 10.1063/1.4811894 View online: http://dx.doi.org/10.1063/1.4811894 View Table of Contents: http://proceedings.aip.org/dbt/dbt.jsp?KEY=APCPCS&Volume=1542&Issue=1 Published by the AIP Publishing LLC. Additional information on AIP Conf. Proc. Journal Homepage: http://proceedings.aip.org/ Journal Information: http://proceedings.aip.org/about/about_the_proceedings Top downloads: http://proceedings.aip.org/dbt/most_downloaded.jsp?KEY=APCPCS Information for Authors: http://proceedings.aip.org/authors/information_for_authors Downloaded 08 Aug 2013 to 95.18.147.177. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://proceedings.aip.org/about/rights_permissions

Upload: others

Post on 21-Dec-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

On the use of graphics processing units (GPUs) for molecular dynamicssimulation of spherical particlesR. C. Hidalgo, T. Kanzaki, F. Alonso-Marroquin, and S. Luding Citation: AIP Conf. Proc. 1542, 169 (2013); doi: 10.1063/1.4811894 View online: http://dx.doi.org/10.1063/1.4811894 View Table of Contents: http://proceedings.aip.org/dbt/dbt.jsp?KEY=APCPCS&Volume=1542&Issue=1 Published by the AIP Publishing LLC. Additional information on AIP Conf. Proc.Journal Homepage: http://proceedings.aip.org/ Journal Information: http://proceedings.aip.org/about/about_the_proceedings Top downloads: http://proceedings.aip.org/dbt/most_downloaded.jsp?KEY=APCPCS Information for Authors: http://proceedings.aip.org/authors/information_for_authors

Downloaded 08 Aug 2013 to 95.18.147.177. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://proceedings.aip.org/about/rights_permissions

On the Use of Graphics Processing Units (GPUs) forMolecular Dynamics Simulation of Spherical Particles

R.C. Hidalgo∗, T. Kanzaki†, F. Alonso-Marroquin∗∗ and S. Luding‡

∗Department of Physics and Applied Mathematics, University of Navarra, Pamplona, Navarra, Spain†Gilogiq Internet Services S.L. Girona, Spain

∗∗School of Civil Engineering, The University of Sydney, Sydney NSW 2006, Australia‡Multi Scale Mechanics, CTW, UTwente, P. O. Box 217, 7500 AE Enschede, Netherlands

Abstract. General-purpose computation on Graphics Processing Units (GPU) on personal computers has recently becomean attractive alternative to parallel computing on clusters and supercomputers. We present the GPU-implementation of anaccurate molecular dynamics algorithm for a system of spheres. The new hybrid CPU-GPU implementation takes intoaccount all the degrees of freedom, including the quaternion representation of 3D rotations. For additional versatility, thecontact interaction between particles is defined using a force law of enhanced generality, which accounts for the elastic anddissipative interactions, and the hard-sphere interaction parameters are translated to the soft-sphere parameter set. We provethat the algorithm complies with the statistical mechanical laws by examining the homogeneous cooling of a granular gas withrotation. The results are in excellent agreement with well established mean-field theories for low-density hard sphere systems.This GPU technique dramatically reduces user waiting time, compared with a traditional CPU implementation.

Keywords: MD, DEM, Homogeneous Cooling, Friction and Rotations, Numerical Methods, GPUPACS: 81.05.Rm, 83.10.Rs

INTRODUCTION

In the last years, rapid advances in computer sim-ulations have led to many new developments in mod-eling particulate systems. Molecular dynamics simula-tion (MD) is widely accepted as an effective method inaddressing physical and engineering problems concern-ing dense granular media [1]. The main disadvantagesof molecular dynamics algorithms implemented on cen-tral processing units (CPU) are the maximum number ofparticles and the expensive computing time of the simu-lation.

Graphics processing units (GPU) are designed torapidly manipulate and alter memory. Their highly par-allel structure makes them more effective than general-purpose CPUs for algorithms where processing of largeblocks of data is done in parallel. Thus, general-purposecomputation on (desktop) graphics hardware (GPGPU)[2, 3, 4] has become a serious alternative for parallelcomputing on clusters or supercomputers.

Compute Unified Device Architecture (CUDA) is aparallel computing platform, which has been recentlyintroduced by NVIDIA [2, 3]. This platform notablyimproves the computing performance by exploiting thepower of the GPUs. The open source NVIDIA GPUComputing package provides several code samples thathelp to get started on the path of writing software withCUDA C/C++, OpenCL or DirectCompute. Specifical-ly, the example particles-CUDA is a simple algorithm,which includes discrete elements that move and collide

within an uniform grid data structure. However, this im-plementation is by no means optimal and there are manypossible further optimizations to this algorithm.

MD IMPLEMENTATION OF SPHERESON GPUS INCLUDING ROTATION

We have developed a new hybrid CPU-GPU DiscreteElement based on the CUDA-particles example. The firststep was to replace the Euler’s integrator by a VelocityVerlet integration method. This modification notably im-proved the numerical output of the algorithm and it guar-anties that the total mechanical energy of the system al-ways oscillates around a constant value that correspondsto the exactly solved system. The same goes for otherconservative quantities like linear or angular momentumthat are, at least nearly, preserved using this symplecticintegrator. The original collision rule was replaced by ageneralized contact law that is more realistic than the lin-ear spring contact. Rotational degrees of freedoms wereincluded and are activatd by a tangential force that de-pends on history. Accordingly, a neighbor list and a con-tact list have been also implemented.

The application developed, as most of the GPGPUsoftware, has an heterogeneous architecture. This meansthat some pieces of code run in the CPU and others inthe GPU. The flowchart of the MD method is presentedin Figure 1. The first steps of the program consist in the

Powders and Grains 2013AIP Conf. Proc. 1542, 169-172 (2013); doi: 10.1063/1.4811894

© 2013 AIP Publishing LLC 978-0-7354-1166-1/$30.00

169

Downloaded 08 Aug 2013 to 95.18.147.177. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://proceedings.aip.org/about/rights_permissions

FIGURE 1. Flowchart of the granular gas simulation. Operations in gray run on the CPU, subroutines in blue run on the GPUand the ones in oranges run partially in CPU and GPU, and, in most cases, they require data-interchange between CPU and GPU.

initialization of the CUDA-enabled device, the allocationof the necessary memory –in both CPU and GPU– andloading configuration parameters of the granular gas. Ini-tially, the particles are homogeneously distributed in thesimulation space with a random velocity for translation-al and rotational degrees of freedom (this is done on thehost and then the particles’ information is sent to the G-PU device). With the goal to avoid effects of the initialconfiguration, the dissipation due to particle-particle in-teraction is disabled, and a number of iterations whithvery low dissipation is performed. After that, the energyloss is enabled again and the main loop of the programstarts, calling in each iteration the Update System subrou-tine, and with a periodic (lower) frequency printing outthe particles information. When the simulation finishesthe resources reserved are released and the program ends.In the Update System routine is where the MD processesoccur. Initially, following the Velocity Verlet integrationmethod, the particles’ velocity in the mid-point is calcu-lated and with it the positions are updated. Then, with theaim of minimize the time used by the collisions method,the list with the particles that are neighbors to each oth-er is refreshed. Next, the collisions between particles arecomputed, calculating the forces and torques that eachparticle experiences and the list of contacts is updated.Finally the last step of the Verlet and the leap-frog inte-grator are performed.

As we mentioned above, to define the normal inter-action FN

i j , we use a linear elastic force, depending onthe overlap distance δ of the particles. To introduce dis-sipation, a velocity dependent viscous damping is as-sumed. Hence, the total normal force reads as FN

i j =

−kN · δ − γN · vNrel where kN is the spring constant in

the normal direction, γN is the damping coefficient inthe normal direction and vN

rel is the normal relative ve-locity between i and j. The tangential force FT

i j also

contains an elastic term and a tangential frictional ter-m accounting also for static friction between the grain-s. Taking into account Coulomb’s friction law it read-s as FT

i j = min{−kT · ξ − γT · |vTrel |, μFN

i j } where γT isthe damping coefficient in tangential direction, vT

rel is thetangential component of the relative contact velocity ofthe overlapping pair, μ is the friction coefficient of theparticles, ξ represents the elastic elongation of an imag-inary spring with spring constant kT at the contact [5],which increases as dξ (t)/dt = vT

rel as long as there isa non-sliding overlap between the interacting particles[5, 6, 7]. The tangential spring is chosen to be orthogo-nal to the normal vector [8]. Finally, we solve Newton’sequation of motion for all particles. A quaternion for-malism is used to describe the rotation of the particles.Finally, the equations of motion are integrated using aFincham’s leap-frog algorithm (rotational) [9] and a Ver-let Velocity algorithm (translational) [10].

Validation The numerical accuracy of the algorithmhas been validated by comparing our results with a meanfield model. Specifically, we have examined the homo-geneous cooling of rough and dissipative spherical parti-cles. Luding et al [12] and Herbst et al [13] have foundthat translational T and rotational R kinetic energy ofgranular gas of rough particles, in homogeneous coolingstate, is governed by the following system of equations

d

dτT = −AT 3/2 +BT 1/2R (1)

d

dτR = BT 3/2−CT 1/2R (2)

with the constants A, B and C, whose values dependon the space dimensionality D and the energy dissipa-tion rates as, A = 1−e2

n4 + η(1− η), B = η2

q and C =

ηq

(1− η

q

)where η = q(1+et)/(2q+2) (in 3D q= 2/5)

170

Downloaded 08 Aug 2013 to 95.18.147.177. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://proceedings.aip.org/about/rights_permissions

10-4

10-3

10-2

10-1

11

11 10 102 103

T/T (0) Eq.(51) PRE 58 3416R/R(0) Eq.(51) PRE 58 3416

T/T (0) CUDA (MD)R/R(0) CUDA (MD)

en = et = 0.6

10-4

10-3

10-2

10-1

11

11 10 102 103

T/T (0) Eq.(51) PRE 58 3416R/R(0) Eq.(51) PRE 58 3416

T/T (0) CUDA (MD)R/R(0) CUDA (MD)

en = et = 0.8

0

20

40

60

80

100

120

0.0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1

D(v)

v

at t = 9× 102

at t = 5× 102

at t = 2× 102

Maxwell-Boltzmann fit 1

Maxwell-Boltzmann fit 2

Maxwell-Boltzmann fit 3

FIGURE 2. Evolution of the translational and rotational kinetic energy vs time. a) en = et = 0.6 b) en = et = 0.8. In c) we presentthe speed distribution obtained for en = et = 0.8 at t = 2×102s (red), t = 5×102s (green) and t = 9×102s (black).

and en and et are the restitution coefficients on the nor-mal and tangential direction respectively. The mean timebetween collisions G = 8(2a)2 N

V

√πm g(2a), is used to

rescale real time scale accordingly to τ = 23 GT 1/2(0)t.

The strength of the dissipation can also be included in-to the characteristic time τ = 2

3 (1− e2n)GT 1/2(0)t [14].

To compare the numerical output of our code with thetheoretical predictions (Equations 2), we have to find e-quivalent dissipation parameters (γn, γt and kt ) that cor-respond with specific values of the normal en and tan-gential et restitution coefficients. In the simplest approx-imation, the normal interaction force between two con-tacting particles is a linear spring f n

el = kNδ and a ve-locity depending viscous damping force f n

diss = γnδ̇ [11].Examining the contact evolution one gets a well knowndifferential equation of the damped harmonic oscillator[11]

δ̈ +2ηδ̇ +ω20 δ = 0 (3)

Here ω0 =√

k/m12 is the oscillation frequency of an e-lastic oscillator and η is the effective viscosity, obtainedas γn = 2ηm12 where m12 = m1m2/(m1 +m2) is the re-duced mass. Solving Eq.(3) one can find that the effec-tive restitution coefficient, en = exp(−πη/ω) where isthe oscillation frequency of the damped oscillator. Com-bining the equations the following expression is obtainedγn =

√(4knm12)/((

πln 1

en

)2 +1).

On the other hand, describing the tangential forcebetween to contacting particles, one can also considera tangential spring f t

el = ktδt and a velocity dependentviscous damping f t

diss = γt δ̇t [11]. For simplicity sakehere we examined the case γt = 0; for which an analyticexpression, relating kt and kn, can be derived,

kt =knq

1+q

(arccos(−et)

π

)2

(4)

where q = 2/5 stands for the 3D case [11].

Numerical Results. For validation, we have numeri-cally studied the free cooling kinetics of a dilute systemof N = 4096 spheres confined within a square box withl = 2m, resulting in a volume fraction of Vf = 0.008.Initially, the particles are homogeneously distributed inthe space and their translational and rotational veloci-ties follow Gaussian distributions. To avoid memory ef-fects from the initial conditions, we allow the system toexecute several collisions before starting to analyze theparticles’ temporal evolution. To compare the algorithmperformance with the mean field model [12], system ofparticles with two different restitution coefficient wherestudied, en = et = 0.6 and en = et = 0.8. The valueskn = 108N/m and a density ρ = 2000kg/m3 were used.The corresponding dissipative parameters have been cal-culated using the equations for the normal damping co-efficient γn and for the stiffness of the tangential springkt . The time step was set to dt = 10−6s.

Figure 2 shows the evolution of the translational T androtational R kinetic energies are. Note that in every casethe time scale have been rescaled using the correspond-ing characteristic time, resulting in τ = 2

3 GT1/2

tr (0)t. Aswe start from an equilibrium state and the dissipationis low, the system evolves into a homogeneous coolingstate. For comparison we also show the correspondinganalytic result of Eq. (2) for the same restitution coeffi-cients. The excellent agreement archived for both casesvalidates the numerical performance of our algorithm.

During the cooling process the velocity statistic-s was also examined. Low dissipative particles cooldown uniformly over a wide range of time. Thus, al-l the temporal dependences enter through the meanvalues of the translational and rotational temperature.Such a picture is consistent with the results shown inFig. 2c. (en = et = 0.8) where the speed distribution,D(c) and (c =

√v2

x + v2y + v2

z ) is presented at sever-al times. In all cases, the speed distribution remainsclose to a Maxwell-Boltzmann speed distribution D(c) =

171

Downloaded 08 Aug 2013 to 95.18.147.177. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://proceedings.aip.org/about/rights_permissions

104

2

5

105

2

5

106

2

5

107

10000 20000 30000

C(N

)Cundal’sNumber

N

CPUCPU-GPU Ge-Force9400CPU-GPU Ge-Force430

FIGURE 3. Cundall number against N for different simula-tions. In black color a version that run only on the CPU. In redand blue hybrids versions.

4π(

1πv2

mp

)3/2c2e−c2/v2

mp , where vmp(t) is the most prob-

able velocity. For the rotational degrees of freedom wehave obtained similar results (data not shown).

Computational Performance Benchmarks. In any pro-gram that runs in parallel the execution time depend-s, in conjunction with the hardware, on the number oftasks running simultaneously. We compared the runtimedifference between typical MD-algorithms and hybridCPU-GPU MD-algorithms. A 3D mono-disperse gran-ular gas was used as model example. The first versionof the code was a hybrid CPU-GPU algorithm, whichuses the GPU to calculate the interaction between par-ticles, whilst the second version fully runs on the CPU.The performance of a particle simulation code is mea-sured by the Cundall number defined as C = NT N/TCPU

where NT is the number of time steps, N is the numberof particles and TCPU is the duration of the simulation inreal-time. The CPU version benchmark was performed ina personal computer running Debian GNU/Linux 6.0.2,with a processor Intel® Core™ 2 Quad Q6600 at 2.40GHz. In the case of the GPU version, the benchmark wasperformed in the same PC with an NVIDIA® GeForce®GT 430 graphic card, and in an Apple MacBook Pro®with a processor Intel® Core™ 2 Duo at 2.53GHz anda graphic card NVIDIA® GeForce® 9400M. In all cas-es the mean value for 10 different executions –1000000iterations each one– are presented. The results obtainedare shown in Figure 3. Note that when the number ofparticles is relatively small (from N = 4 to N = 2048)the Cundall Number is approximately constant, denotinga very similar performance for all cases. As the systemsize gets larger, however, the performances of the hybrid

CPU-GPU algorithm are notably enhanced with respectto the ones only running on the CPU. Furthermore, us-ing and NVIDIA GeForce GT 430 graphic card the sim-ulations executed with the hybrid CPU-GPU algorithmrun faster by more than one order of magnitude than onthe CPU. It is important to remark, that today there isa new generation of NVIDIA products, which are basedon Fermi and Kepler architecture and are optimized forscientific applications. The performance of the hybrid al-gorithm, on this novel hardware, should be even better.

In summary, we have described the implementationof an accurate molecular dynamics algorithm for mono-disperse systems of spheres with rotation, using GPUs.Simulations in GPU are notably faster than traditionalCPU methods, the results agree with mean-field theoriesfor low-dense granular systems.

ACKNOWLEDGMENTS

The Spanish MINECO (Projects FIS2011-26675), theUniversity of Navarra (PIUNA Program) and the Univer-sity of Sydney Civil Engineering Research DevelopmentScheme (CERDS) have supported this work.

REFERENCES

1. T. Pöschel and T. Schwager, Computational GranularDynamics. Springer-Verlag Berlin, 2005.

2. J. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger,A. Lefohn, and T. Purcell, Computer Graphics Forum, 26,80-113, (2007)

3. J. Owens, M. Houston, D. Luebke, S. Green, J. Stone, andJ. Phillips, “Gpu computing,” Proceedings of the IEEE, 96,879-899 (2008).

4. Juan-Pierre Longmore, Patrick Marais and Michelle KuttelPowder Technology in Press (2012)

5. P. Cundall and O. Strack, Geotechnique, 29 47-65 (1979)6. G. Duvaut and J.-L. Lions, Les Inéquations en Mécanique

et en Physique. Dunod, Paris, 1972.7. S. Luding Granular Matter 10, 235-246, (2008)8. T. Weinhart, A. R. Thornton, S. Luding, and O. Bokhove,

Granular Matter 14, 531-552 (2012)9. D. Fincham, “Leapfrog rotational algorithms,” Molecular

Simulation, 8 165-178 (1992)10. L. Verlet, Phys. Rev., 165 (201-214 (1968)11. S. Luding, “Collisions & contacts between two particles,”

in Physics of dry granular media - NATO ASI Series E350(285-304) Kluwer Acad. Publ. (Dordrecht) (1998)

12. S. Luding, M. Huthmann, S. McNamara, and A. Zippelius,Phys. Rev. E, 58, (3416-3425) (1998).

13. O. Herbst, R. Cafiero, A. Zippelius, H. J. Herrmann, andS. Luding, Physics of Fluids 17 107102 (2005)

14. S. Miller and S. Luding, Phys. Rev. E, 69 31305 (2004)

172

Downloaded 08 Aug 2013 to 95.18.147.177. This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: http://proceedings.aip.org/about/rights_permissions