Parallelization of Gyrokinetic Particle Code and its Application to Internal Kink Mode Simulation
NAITOU Hiroshi, SONODA Takuya, TOKUDA Shinji* and DECYK Viktor K.**
Department of Electrical and Electronic Engineering, Yamaguchi University, Ube 755, Japan
*Naka Fusion Research Establishment, Japan Atomic Energy Research Institute, Ibaraki 311-01, Japan
**Department of Physics, University of California at Los Angeles, CA 90095-1547, USA
(Received 28 November 1995 / Revised manuscript received 30 January 1996)
Abstract
A three-dimensional magneto-inductive gyrokinetic particle code with the δf method, GYR3D, is ported to a massively parallel computer, the Intel Paragon XP/S15-256, which has 256 scalar processors (nodes) connected with a two-dimensional mesh topology. The Fortran 77 compiler with message-passing subroutines is used. It is found that the speed of the parallelized code scales well with the number of nodes. The speed of the parallelized GYR3D on the Intel Paragon with 256 nodes is more than three times faster than that of the vectorized GYR3D on the NEC SX-3/24R (vector/parallel computer) when one node is used. The parallelized GYR3D code is used to simulate the m=1 (poloidal mode number) and n=1 (toroidal mode number) kinetic internal kink mode in a tokamak. The results of a convergence study of the δf code in the limit of an extremely large number of particles are shown.
Keywords:
gyrokinetic particle simulation, numerical tokamak project, numerical experiment of tokamak, delta-f method, massively parallel computer, distributed memory, message passing model, tokamak, internal kink mode, sawtooth crash
1. Introduction
Massively parallel computers promise an unprecedented improvement in computational physics in many scientific and industrial areas such as astrophysics, molecular dynamics, fluid dynamics, elementary particle physics, molecular biology, and plasma physics (including processing plasmas). In the fusion community as well, it is believed that the use of massively parallel computers (including forthcoming machines) will make it possible to simulate the entire plasma in tokamaks. The Numerical Toka-
mak Project (NTP) [1,2,3], a High Performance Com-
puting and Communications (HPCC) Project sponsored
by the US Department of Energy (DoE), and the Numerical Experiment of Tokamak (NEXT), begun at the Japan Atomic Energy Research Institute (JAERI), are examples of such efforts. The main focus of NEXT
is a simulation study of tokamak plasmas with particle
and MHD-fluid models using the developing technol-
ogy of parallel computers. This project aims to clarify
the physics of fusion plasmas and to contribute to the
design study of fusion reactors such as ITER. The pro-
ject also includes the development of parallel comput-
ing technology for tokamak simulation, and scalar par-
allel computing and vector parallel computing are in-
vestigated on an equal footing. The phenomena which are not well understood in tokamak plasmas are related to kinetic effects of electrons, ions, and energetic ions. Gyrokinetic particle simulation [4] is well suited to such subjects because it includes much of the basic physics without too many assumptions.
Fluid simulation codes that include kinetic effects may also be used, but the validity of their results must be compared with and verified by the gyrokinetic codes. So both approaches are necessary to study kinetic effects in tokamaks.
Recently, three-dimensional electrostatic gyro-
kinetic codes in toroidal geometry were ported to mas-
sively parallel computers. Simulations of ion tempera-
ture gradient driven turbulence in tokamaks were done
by Parker et al. [5] on the CM-200 and CM-5 and by
Sydora [6] on the IBM SP2. In this paper, the three-
dimensional magneto-inductive gyrokinetic code
GYR3D [7], which uses a newly developed noise reduction technique (the δf method [9]), was ported to a mas-
sively parallel computer, the Intel Paragon XP/S15-256, as part of the NEXT project. The Intel Paragon XP/S15-
256 is a distributed memory parallel computer, with
256 nodes connected with a two-dimensional mesh to-
pology. Each node has a 75 Mflops (double precision)
scalar processor and 32 Mbytes of memory, so the total peak speed and memory are 19.2 Gflops and 8 Gbytes,
respectively. The Paragon uses a distributed memory
programming model implemented in Fortran 77 with
message-passing subroutines (NX communication li-
brary). In a distributed memory parallel computer, synchronization among the nodes and access to the memory of another node can only be done by passing messages among processors, so message-passing libraries are usually supported. Because the memory of
one node is small, the technique of domain decomposi-
tion is necessary. Also the transpose method of com-
municating data between nodes is used to execute the
fast Fourier transforms in the direction of the decom-
position. The performance of the parallelized GYR3D
on the Intel Paragon XP/S15-256 is studied and com-
pared with the vectorized GYR3D on the NEC SX-3/
24R (a vector/parallel computer with a peak speed of 12.8 Gflops with two CPUs) in single-CPU mode.
The parallelized GYR3D was run to simulate the
linear and nonlinear phenomena associated with the m=1
(poloidal mode number) and n=1 (toroidal mode num-
ber) kinetic internal kink mode (sawtooth crash) in to-
kamaks. The main purpose is to do a convergence study in the limit of an extremely large number of particles
(markers), which used to be very difficult because the
estimated CPU time was so large. Also, higher n mode numbers are included in the simulation by using a smaller time step size, because the frequencies of the corresponding shear Alfvén waves are large.
The outline of the paper is as follows. A brief summary of the basic equations of the gyrokinetic code with the δf method is given in the following section. The description and performance of the parallelized GYR3D code are presented in Section 3. The application of the parallelized GYR3D code to the simulation of the sawtooth crash in tokamaks is shown in Section 4. The concluding remarks and discussion are given in Section 5.
2. Basic Equations
The derivation of the gyrokinetic equations in the
low beta limit is given by Hahm [8] using the gyro-
kinetic ordering,
$$\frac{\omega}{\omega_{ci}} \sim \frac{\rho_{ti}}{L} \sim \frac{k_\parallel}{k_\perp} \sim \frac{e\phi}{T_e} \sim \frac{\delta B}{B_0} = O(\epsilon), \qquad k_\perp \rho_{ti} = O(1), \quad (1)$$
where ω is a characteristic frequency of a mode, ω_ci is the ion cyclotron frequency, ρ_ti is the thermal ion gyroradius, L is a characteristic length of the gradient of density or temperature, k_∥ and k_⊥ are the wave numbers parallel and perpendicular to the magnetic field, respectively, φ is the potential, T_e is the electron temperature, δB is the variation of the magnetic field from the constant longitudinal magnetic field B_0, and ε is the smallness parameter. The compressional component of the longitudinal magnetic field is neglected in the low beta approximation. In the gyrokinetic particle simulation, we use the lowest order equations in the long wavelength limit. Electrons are treated in the drift-kinetic approximation. We also neglect the finite gyroradius effect of ions, i.e., k_⊥ρ_ti = 0, while keeping the polarization shielding effects of ions in the Poisson equation.
We assume a rectangular system with dimensions L_x, L_y, and L_z. There is a strong and constant magnetic field in the z (toroidal) direction, B = B_0 b̂, where b̂ is the unit vector in the z direction. The system is bounded by a perfectly conducting wall in the x and y directions and connected by the periodic end condition in the z direction. The electrostatic potential φ and the z component of the vector potential, A_z, are zero at the wall. We solve the gyrokinetic Vlasov equation by the method of characteristics. The gyrokinetic code with the δf method [9] simulates the temporal evolution of the deviation δf from the equilibrium distribution function f_0 by following the characteristics of particles (markers) having variable weights w_{aj}:
$$f_a(\mathbf{x}, p_z, t) = f_{0a}(x, y, p_z) + \delta f_a(\mathbf{x}, p_z, t), \quad (2)$$

$$\delta f_a(\mathbf{x}, p_z, t) = \sum_j w_{aj}(t)\, \delta[\mathbf{x} - \mathbf{x}_{aj}(t)]\, \delta[p_z - p_{zaj}(t)], \quad (3)$$
where a denotes the particle species and p_z is the canonical momentum,

$$p_{zaj} = v_{zaj} + \frac{q_a}{m_a} A_z(\mathbf{x}_{aj}, t), \quad (4)$$

and the distribution of the markers is given by

$$g_a(\mathbf{x}, p_z, t) = \sum_j \delta[\mathbf{x} - \mathbf{x}_{aj}(t)]\, \delta[p_z - p_{zaj}(t)]. \quad (5)$$
For the equilibrium distributions we assume a shifted Maxwellian for electrons,

$$f_{0e} = \frac{n_{0e}}{\sqrt{2\pi}\, v_{te}} \exp\left\{ -\frac{[p_z + (e/m_e) A_{z0}(x, y) - v_D(x, y)]^2}{2 v_{te}^2} \right\}, \quad (6)$$

and a non-shifted Maxwellian for ions,

$$f_{0i} = \frac{n_{0i}}{\sqrt{2\pi}\, v_{ti}} \exp\left\{ -\frac{[p_z - (q_i/m_i) A_{z0}(x, y)]^2}{2 v_{ti}^2} \right\}, \quad (7)$$

where A_z0 is related to the equilibrium electron drift velocity v_D by Ampère's law,

$$\nabla_\perp^2 A_{z0}(x, y) = \mu_0 e n_{0e} v_D(x, y). \quad (8)$$

The equations of motion of the markers are given by

$$\frac{d\mathbf{x}_{aj}}{dt} = v_{zaj}\left( \hat{\mathbf{b}} + \frac{\nabla A_z \times \hat{\mathbf{b}}}{B_0} \right) + \frac{\hat{\mathbf{b}} \times \nabla\phi}{B_0}, \quad (9)$$

$$\frac{dp_{zaj}}{dt} = -\frac{q_a}{m_a}\left( \hat{\mathbf{b}} \cdot \nabla\phi - v_{zaj}\, \hat{\mathbf{b}} \cdot \nabla A_z \right). \quad (10)$$

The change of the weights along the trajectories of the markers is given by

$$\frac{dw_{ej}}{dt} = \frac{f_{ej}}{g_{ej}}\, \frac{p_{zj} + (e/m_e) A_{z0} - v_D}{v_{te}^2} \left[ \frac{dp_{zj}}{dt} + \frac{dx_j}{dt}\left(\frac{e}{m_e}\frac{\partial A_{z0}}{\partial x} - \frac{\partial v_D}{\partial x}\right) + \frac{dy_j}{dt}\left(\frac{e}{m_e}\frac{\partial A_{z0}}{\partial y} - \frac{\partial v_D}{\partial y}\right) \right], \quad (11)$$

$$\frac{dw_{ij}}{dt} = \frac{f_{ij}}{g_{ij}}\, \frac{p_{zj} - (q_i/m_i) A_{z0}}{v_{ti}^2} \left[ \frac{dp_{zj}}{dt} - \frac{dx_j}{dt}\frac{q_i}{m_i}\frac{\partial A_{z0}}{\partial x} - \frac{dy_j}{dt}\frac{q_i}{m_i}\frac{\partial A_{z0}}{\partial y} \right], \quad (12)$$

where the value of f_{aj}/g_{aj} is stored at the initial time (t = 0) because f_a and g_a are constant along the trajectory. Note that f_a is the physical distribution function, while g_a is the distribution of the markers, which are loaded numerically and evolve in time. There is freedom in the choice of the distribution of the markers at t = 0, g_{0a}. Kotschenreuther [10] used g_{0a} = const., while Parker et al. [9] selected g_{0a} = f_{0a}. Here we adopted

$$g_{0e} = \frac{n_{0e}}{\sqrt{2\pi}\, v_{te}} \exp\left\{ -\frac{[p_z + (e/m_e) A_{z0}(x, y)]^2}{2 v_{te}^2} \right\}, \quad (13)$$

$$g_{0i} = f_{0i}, \quad (14)$$

because we can expect that the distribution of the markers will stay approximately constant in time, assuming that the spatial motion of the markers is almost incompressible; these markers pick up the contribution from f_{0e} much more precisely even in long runs.

The electrostatic potential is determined by the gyrokinetic Poisson equation,

$$\nabla^2\phi + \frac{\omega_{pi}^2}{\omega_{ci}^2}\, \nabla_\perp^2\phi = \frac{e}{\epsilon_0} \sum_j w_{ej}\, \delta(\mathbf{x} - \mathbf{x}_{ej}) - \frac{q_i}{\epsilon_0} \sum_j w_{ij}\, \delta(\mathbf{x} - \mathbf{x}_{ij}). \quad (15)$$

The deviation of A_z from A_z0, δA_z, is calculated from Ampère's law by using an iterative method:

$$\left( \nabla^2 - \frac{\omega_{pe}^2 + \omega_{pi}^2}{c^2} \right) \delta A_z = \mu_0 e \sum_j p_{zj} w_{ej}\, \delta(\mathbf{x} - \mathbf{x}_{ej}) + \frac{\omega_{pe}^2}{c^2 n_{0e}} (A_{z0} + \delta A_z) \sum_j w_{ej}\, \delta(\mathbf{x} - \mathbf{x}_{ej}) - \mu_0 q_i \sum_j p_{zj} w_{ij}\, \delta(\mathbf{x} - \mathbf{x}_{ij}) + \frac{\omega_{pi}^2}{c^2 n_{0i}} (A_{z0} + \delta A_z) \sum_j w_{ij}\, \delta(\mathbf{x} - \mathbf{x}_{ij}). \quad (16)$$
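To make the marker advance concrete, the following is a minimal Python/NumPy sketch of one time step of Eqs. (9)-(12) for electron markers, with b̂ = ẑ and the field quantities assumed to be already gathered to the marker positions. It is illustrative only: the function and field names are hypothetical, a forward-Euler step stands in for the leap-frog trapezoidal scheme of the actual Fortran 77 code, and the signs follow the equations as reconstructed above.

```python
import numpy as np

def push_electrons(x, y, z, pz, w, f_over_g, F, dt, e_me, B0, vte):
    """One (forward-Euler) step of Eqs. (9)-(12) for electron markers.

    F: dict of field quantities gathered to the marker positions
       (Az, Az0, vD and the needed derivatives of phi, Az, Az0, vD).
    f_over_g: the ratio f_ej/g_ej stored at t = 0.
    """
    vz = pz + e_me * F['Az']                    # v_z = p_z + (e/m_e) A_z
    # Eq. (9): parallel motion, magnetic flutter, E x B drift (b = z-hat)
    dxdt = vz * F['dAzdy'] / B0 - F['dphidy'] / B0
    dydt = -vz * F['dAzdx'] / B0 + F['dphidx'] / B0
    dzdt = vz
    # Eq. (10): note q_e/m_e = -e/m_e for electrons
    dpzdt = e_me * (F['dphidz'] - vz * F['dAzdz'])
    # Eq. (11): weight evolution from d(ln f_0e)/dt along the trajectory
    u = pz + e_me * F['Az0'] - F['vD']
    dudt = (dpzdt
            + dxdt * (e_me * F['dAz0dx'] - F['dvDdx'])
            + dydt * (e_me * F['dAz0dy'] - F['dvDdy']))
    dwdt = f_over_g * u / vte**2 * dudt
    return (x + dt * dxdt, y + dt * dydt, z + dt * dzdt,
            pz + dt * dpzdt, w + dt * dwdt)

# trivial check: with all fields zero the markers stream freely in z
N = 4
F0 = dict.fromkeys(['Az', 'dAzdx', 'dAzdy', 'dAzdz', 'dphidx', 'dphidy',
                    'dphidz', 'Az0', 'vD', 'dAz0dx', 'dAz0dy',
                    'dvDdx', 'dvDdy'], 0.0)
out = push_electrons(np.zeros(N), np.zeros(N), np.zeros(N), np.ones(N),
                     np.zeros(N), np.ones(N), F0, 0.01, 1.0, 1.0, 1.0)
print(out[2])   # z advanced by vz*dt = 0.01
```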
If there are enough markers to represent the change of the distribution function, the total energy ε_T,

$$\epsilon_T = \epsilon_K + \epsilon_E + \epsilon_B, \quad (17)$$

will be conserved, where ε_K is the kinetic energy, ε_E is the sum of the electric field energy and the fluid kinetic (ion sloshing) energy (the latter is dominant), and ε_B is the magnetic field energy:

$$\epsilon_K = \frac{1}{2} m_e v_{te}^2 N_e + \frac{1}{2} m_e n_{0e} \int \left( \frac{e}{m_e}\delta A_z + v_D \right)^2 d\mathbf{x} + \frac{1}{2} m_e \sum_j w_{ej} \left[ p_{zj} + \frac{e}{m_e} A_z(\mathbf{x}_{ej}, t) \right]^2 + \frac{1}{2} m_i v_{ti}^2 N_i + \frac{1}{2} m_i n_{0i} \int \left( \frac{q_i}{m_i}\delta A_z \right)^2 d\mathbf{x} + \frac{1}{2} m_i \sum_j w_{ij} \left[ p_{zj} - \frac{q_i}{m_i} A_z(\mathbf{x}_{ij}, t) \right]^2, \quad (18)$$

$$\epsilon_E = \frac{1}{2} \epsilon_0 \sum_k k^2 |\phi_k|^2 + \frac{1}{2} \epsilon_0 \frac{\omega_{pi}^2}{\omega_{ci}^2} \sum_k k_\perp^2 |\phi_k|^2, \quad (19)$$

$$\epsilon_B = \frac{1}{2\mu_0} \sum_k k_\perp^2 |A_{zk}|^2. \quad (20)$$
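The point of the decomposition (2)-(5) can be seen in a small self-contained numerical experiment (illustrative only, not part of GYR3D): for a perturbation of relative amplitude ε, sampling markers from the full f gives moments with statistical noise of order 1/√N, while carrying only the weights w = δf/g reduces the noise to order ε/√N.

```python
import numpy as np

# Toy full-f vs delta-f comparison: a unit Maxwellian f0 carries a tiny
# mean flow eps; the "signal" to be measured is the mean velocity u = eps.
rng = np.random.default_rng(0)
N, eps = 100_000, 1.0e-3

# full-f: markers sampled directly from the shifted Maxwellian
v_full = rng.standard_normal(N) + eps
u_full = v_full.mean()                      # noise ~ 1/sqrt(N) >> eps

# delta-f: markers from g = f0, weights w = delta-f/g ~ eps*v (linearized)
v = rng.standard_normal(N)
w = eps * v
u_delta = np.sum(w * v) / N                 # noise ~ eps/sqrt(N)

print(f"signal        u = {eps:.2e}")
print(f"full-f  error   = {abs(u_full - eps):.2e}")
print(f"delta-f error   = {abs(u_delta - eps):.2e}")
```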
3. Description and Performance of the Parallelized Code
Before elucidating the method of parallelization,
the original code (the vectorized version of GYR3D) is
briefly described. Time integration is done by the leap-frog trapezoidal scheme, which is one of the predictor-corrector methods. Linear interpolation is used for
charge and current deposits from the particles to the
grids and for the field interpolation from the grids to
the particles. The particle acceleration subroutine is
well vectorized. The charge and current deposit subrou-
tine is divided into three parts. The first involves a calculation of addresses and weights, which is well vectorized. The second is a loop of particle deposits including list vectors (indirect addressing). Here we use the fact that the 8 grid points, which are the corners of a cube (grid cell), are independent (no recursive reference) for the deposit of one particle. The vector length of 8 is rather short for the SX-3/24R, so we used 16 replicas of the field array; we deposit 16 particles at the same time to different replicas. By using the trick of holding the 16 replicas in one array, a vector length of 128 is accomplished; hence, this part was also well vectorized. The third loop is the summation of the charge and current densities over the 16 replicas. The
fast Fourier transform (FFT) is vectorized in the direction perpendicular to the direction of the FFT by using the fact that the system is multi-dimensional. The integration of the gyrokinetic Poisson equation (Eq. (15)) is executed in Fourier space. For the integration of Ampère's law (Eq. (16)), an iterative method is used, including the forward and inverse FFTs. Usually 5 iterations are enough. Also, the calculation of the spatial derivatives of the field quantities is done in Fourier space. The total number of FFTs called to advance one time step is 14 forward FFTs and 24 inverse FFTs.
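The replica trick can be sketched as follows (Python/NumPy and one dimension for brevity, rather than the vectorized Fortran 77 of the SX-3 version; np.add.at merely emulates the hardware scatter): each particle deposits onto one of R copies of the grid, so deposits executed together never touch the same memory location, and the copies are summed afterwards.

```python
import numpy as np

def deposit_with_replicas(x, ngrid, R=16):
    """Linear-interpolation charge deposit using R grid replicas."""
    rho = np.zeros((R, ngrid))
    i = x.astype(int)                  # left grid point of each particle
    s = x - i                          # linear weight of the right point
    r = np.arange(x.size) % R          # replica assigned to each particle
    np.add.at(rho, (r, i % ngrid), 1.0 - s)
    np.add.at(rho, (r, (i + 1) % ngrid), s)
    return rho.sum(axis=0)             # third loop: sum over the replicas

x = np.random.default_rng(2).uniform(0.0, 64.0, 10_000)
rho = deposit_with_replicas(x, 64)
print(rho.sum())                       # 10000.0: total charge is conserved
```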
One of the key issues for parallelization is the decomposition [11] of the spatial region of the system
(domain). In a distributed memory parallel computer, the memory of one node is not large enough to hold all the field quantities of the total domain, so one processor can keep only the field data in a limited sub-domain. From the point of view of load balancing, it is preferable that each node has approximately the same number of particles located in its sub-domain. In this paper, the three-dimensional rectangular simulation box is cut into N_div slices in the z direction. One sub-domain thus represents a limited region in z but includes all the meshes in the x and y directions. It is to be noted that the upper surface of one sub-domain is also the lower surface of the adjacent sub-domain, so the surfaces at the boundary between sub-domains belong to two adjacent nodes. (Those nodes are not necessarily physically adjacent.) To be definite, we determine the owner of the data in the meshes of each sub-domain. The owner of the data in the sub-domain is the node itself, except for the data of the upper surface, whose owner is the adjacent node sharing the same meshes. We will call the meshes in the upper surfaces the guard cells. After the charge and current deposit processes, the charge and current densities in the guard cells are added to the cells of the owner. Also, after the field quantities are calculated at the grids, the data in the lower boundary cells of the owners are sent to the guard cells (upper boundary cells) in the adjacent nodes.
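The guard-cell bookkeeping can be sketched as follows (Python/NumPy, with the nodes emulated by a list of slabs and one guard plane per slab; the layout is a hypothetical simplification, and the actual code moves the data with NX messages):

```python
import numpy as np

ndiv, nz_local = 4, 8                  # sub-domains, z-planes per slab
rng = np.random.default_rng(3)
# each "node" stores nz_local planes plus one guard plane on top
slabs = [rng.random(nz_local + 1) for _ in range(ndiv)]

# (i) after the deposit: add the guard-cell charge to its owner,
#     the first plane of the (periodically) adjacent sub-domain
for k in range(ndiv):
    nxt = (k + 1) % ndiv               # periodic end condition in z
    slabs[nxt][0] += slabs[k][-1]
    slabs[k][-1] = 0.0

# (ii) after the field solve: copy the owner's plane back into the
#      guard cells of the neighbor below
for k in range(ndiv):
    nxt = (k + 1) % ndiv
    slabs[k][-1] = slabs[nxt][0]
print(slabs[0][-1] == slabs[1][0])     # True: guards mirror the owner
```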
The maximum value of N_div is limited by the number of meshes in the z direction, N_z. In our case, the number of nodes, N_node, is usually larger than N_z. Therefore it is necessary to use N_node/N_div replicas of each sub-domain. (We are assuming N_node is a multiple of N_div.) The charge and current densities must be summed up over the replicas. There are N_node/N_div groups of nodes. One group is the assembly of N_div sub-domains, which constitutes a complete copy of the total domain. Particles move between the nodes through communication inside each group. Field quantities are calculated independently in each group. So communication between groups is needed only when the charge and current densities are summed up over the replicas.
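The replica-group layout can be emulated in the same spirit (illustrative NumPy only; on the Paragon the sum over the replicas is an inter-group reduction done by message passing):

```python
import numpy as np

n_node, n_div = 256, 32                # nodes and sub-domains
n_groups = n_node // n_div             # 8 complete copies of the domain
nx, ny = 64, 64

# one (nx, ny) slab of charge per node, indexed [group, sub-domain]
rho = np.random.default_rng(4).random((n_groups, n_div, nx, ny))

# inter-group communication: sum the charge over the replicas
rho_total = rho.sum(axis=0)            # shape (n_div, nx, ny)
print(rho_total.shape)
```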
The parallelized version of the FFT is illustrated here. For the FFTs in the x and y directions, a standard FFT method is used because all the data in those directions reside inside one node. However, for the FFT in the z direction the data are distributed over the nodes. Hence, a transpose method [12] is necessary. The field data are divided into N_div pieces in the x direction and redistributed over the nodes. Because all the data in the z direction are inside the node after the transpose of the data, we can implement the FFT in the z direction very efficiently. After the manipulation of the field quantities in Fourier space, an inverse FFT in the z direction, an inverse transpose of the data, and inverse FFTs in the x and y directions are done in series.
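The transpose method can be sketched with the nodes again emulated as a list of array blocks (Python/NumPy; on the Paragon the redistribution is done with pairwise messages over the mesh):

```python
import numpy as np

nx, ny, nz, ndiv = 16, 16, 8, 4
rng = np.random.default_rng(5)
global_f = rng.random((nx, ny, nz))

# initial decomposition: node k owns a z-slab of thickness nz/ndiv
slabs = [global_f[:, :, k * nz // ndiv:(k + 1) * nz // ndiv].copy()
         for k in range(ndiv)]

# transpose: node k collects its x-strip from every other node's slab,
# so that the full z extent becomes local to the node
strips = [np.concatenate([s[k * nx // ndiv:(k + 1) * nx // ndiv]
                          for s in slabs], axis=2)
          for k in range(ndiv)]

# the z-FFT is now an ordinary serial FFT on each node
strips_hat = [np.fft.fft(st, axis=2) for st in strips]
print(strips_hat[0].shape)             # (4, 16, 8): all of z is local
```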
The performance of the parallelized version of GYR3D on the Paragon XP/S15-256 is summarized in Tables 1 and 2. The 64 × 64 × 32 meshes are used for the x, y, and z directions, respectively. The total number of particles used in the simulation is fixed: 8,388,608 electrons and 8,388,608 ions are used. The code was run for 1000 time steps. The results for the case of N_div = N_z (= 32) are shown because the highest speed is obtained for this number of divisions. To see the scalability of the parallel computer, the results using 128 nodes are compared with the results using 256 nodes. The sizes of the executable programs are 3.3 Gbytes and 4.2 Gbytes for the cases of 128 and 256 nodes, respectively. The other parameters are the same as in the actual simulation shown in the following section. Table 1 shows the CPU times, while Table 2 shows the CPU times per particle per time step. The simulation time was divided into (a) particle deposit time, (b) particle acceleration time, and (c) field calculation time.
Table 1  CPU times for the Intel Paragon XP/S15-256 and the NEC SX-3/24R. The values in parentheses are the times consumed in the subroutines added to the original code to handle the communication between nodes.

                          Paragon        Paragon        SX-3/24R
                          128 nodes      256 nodes      1 node
                          [10^3 sec]     [10^3 sec]     [10^3 sec]
  particle deposit        3.40 (0.0963)  1.80 (0.174)   12.3
  particle acceleration   20.0 (0.868)   9.97 (0.447)   30.7
  field calculation       1.69           1.87           0.378
  total                   25.0           13.6           43.4
Table 2  CPU times per particle per time step for the Intel Paragon XP/S15-256 and the NEC SX-3/24R.

                          Paragon        Paragon        SX-3/24R
                          128 nodes      256 nodes      1 node
                          [μsec]         [μsec]         [μsec]
  particle deposit        0.20           0.11           0.73
  particle acceleration   1.19           0.59           1.83
  field calculation       0.10           0.11           0.02
  total                   1.49           0.81           2.59
The particle deposit time includes the time for the deposit of charge and current to the meshes, the time for the summation of data over the replicas, and the time for adding the data of the guard cells to the owners. The particle acceleration time involves the time for the update of particle positions, canonical momenta, and weights (Eqs. (9)-(12)), and the time for moving particles between neighboring processors. The field calculation time corresponds to the time to update the field quantities by using the parallelized FFT. The times for the initialization and the diagnostics are not included.
The parallelized FFT is about 18 times faster than the single-node calculation. The theoretical speed-up factor without communications would be 32 because N_div = 32. Hence we can estimate that about 45 per cent of the time is used for communication in the calculation of the field. The total time using 256 processors is about 1.8 times faster than that using 128 processors, so no significant saturation is observed as the number of nodes increases; the speed of the code scales very well with the number of nodes. The communication times are at an acceptable level: 7 and 11 per cent for the cases of 128 and 256 nodes, respectively. The times for the particle acceleration are dominant: 80 (128 nodes) and 73 (256 nodes) per cent of the total time is
spent there. While for the particle deposit only the charge and current densities are accumulated on the grids, for the particle acceleration the interpolations of seven field quantities (−∂φ/∂x, −∂φ/∂y, −∂φ/∂z, A_z, ∂A_z/∂x, ∂A_z/∂y, and ∂A_z/∂z) to the particle positions are needed, as well as the calculation of ∂A_z0/∂x, ∂A_z0/∂y, ∂v_D/∂x, and ∂v_D/∂y. Therefore, the particle acceleration time dominates over the particle deposit time.
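The asymmetry between the two phases can be sketched as follows (one-dimensional linear weighting in Python/NumPy, illustrative only): the push gathers seven grid quantities per particle, whereas the deposit scatters only the charge and current densities.

```python
import numpy as np

def gather(field, x):
    """Linear interpolation of grid quantities to particle positions."""
    i = x.astype(int)
    s = x - i
    n = field.shape[-1]
    return field[..., i % n] * (1.0 - s) + field[..., (i + 1) % n] * s

rng = np.random.default_rng(6)
x = rng.uniform(0.0, 64.0, 100_000)
# stand-ins for the seven per-particle quantities used in Eqs. (9)-(10):
# -dphi/dx, -dphi/dy, -dphi/dz, Az, dAz/dx, dAz/dy, dAz/dz
fields = rng.random((7, 64))
per_particle = gather(fields, x)
print(per_particle.shape)              # (7, 100000): 7 gathers per particle
```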
The time on the Paragon XP/S15-256 is compared with the time on the SX-3/24R. The simulation on the SX-3/24R was done by using 2,097,152 electrons and 2,097,152 ions. (If we used the same number of particles as in the case of the Paragon, the size of the executable program would be 2.8 Gbytes. It is not impossible to run this on the SX-3, but it would be a huge run for the SX-3 users; hence we reduced the number of particles to decrease the load on the computer.) In order to compare the results with the case of the Paragon XP/S15-256, in which 8,388,608 electrons and 8,388,608 ions are used, we multiplied the times for the particle deposit and acceleration on the SX-3/24R by a factor of 4. (Usually the simulation time for vector machines scales very well with the number of particles.) The speed of the Paragon XP/S15-256 using the maximum number of nodes is 3.2 times faster than the speed of the SX-3/24R. This speed ratio is consistent with the difference of the peak speeds (19.2 Gflops for the Paragon and 6.4 Gflops for the SX-3/24R (one node)). Here we compare the speed of the Paragon XP/S15-256 using 256 nodes with the speed of the SX-3/24R using one node. The time for the particle deposit is 6.8 times faster because the scatter calculation is the most difficult part to optimize for vector processors. The time for the particle acceleration is 3.1 times faster. However, the time for the field calculation is 4.9 times slower because the communication time is not negligible in the parallel FFT. Also, in our case the optimization of the FFT by parallelization is limited by the factor N_div = N_z. If we use more meshes in the z direction, a further speed-up of the FFT may be expected.
4. Simulation Results
The simulation of the m=1 and n=1 kinetic inter-
nal kink mode is done using the parallelized GYR3D
code. The mode becomes unstable due to the inertial
effects of electrons when the safety factor q at the mag-
netic axis, q_0, is less than unity. We simulate the Wesson-type reconnection [14], which is a modification of the Kadomtsev-type reconnection [15]. While in the former reconnection the electron inertia in the current layer is dominant, in the latter the collisional effect is crucial. Also, the importance of the electron inertial effect for collisionless reconnection has been recognized in astrophysics [16]. The main purpose of the paper is to
implement the convergence study of the GYR3D code in the limit of an extremely large number of particles. In the previous paper [7] we used only about one million particles to simulate the sawtooth crash process coming from the nonlinear development of the internal kink mode. The result is quite satisfactory because the noise level is quite low compared with the standard gyrokinetic code (total-f code). The linear phase of the temporal development of the m=1 and n=1 kinetic internal kink mode is clearly observed even when the amplitude of the mode is quite small. One also observed the nonlinear full reconnection process of the Wesson type and the secondary reconnection [17] after the full reconnection, which creates a new magnetic axis with q_0 less than unity again. While we believed that the results were physically sound, a convergence study of the GYR3D code was felt to be necessary. Numerically it is important to check the convergence properties of the code in the limit of an extremely large number of particles. Theoretically, the gyrokinetic equations with the δf method assure conservation of energy in the collisionless fluid limit in phase space. Because the gyrokinetic Vlasov equation is solved with a finite number of markers (particles), we must verify that if we increase the number of markers, the conservation of energy improves to an acceptable level.
The simulation parameters are the same as those in the previous paper [7] except for the number of particles, the time step size, and the maximum n modes included. The simulation system is a three-dimensional rectangular box of L_x × L_y × L_z = 64Δ⊥ × 64Δ⊥ × 32Δ∥, where Δ⊥ and Δ∥ are the grid sizes perpendicular and parallel to the longitudinal magnetic field. Δ∥ is stretched by a factor of 1000 from Δ⊥. The plasma fills the box uniformly. The initial distribution functions of the markers are generated by using bit-reversed numbers [19], which is one of the quiet start techniques (see the sketch after this paragraph). In the previous paper, we used g_0e = f_0e and g_0i = f_0i, following Parker et al. [9], for the distribution of the markers. Here we used the same choice for ions. For electrons, however, we used the distribution given by Eq. (13), which is uniform in space and slightly different from f_0e. For the initial weights of the electrons, very small values which roughly approximate the charge density profile in the linear phase are given. For ions, the initial weights are set to zero. The number of time steps is 13,000. As the equilibrium current density J_z0 we use the profile given in the paper by Strauss [18]. The equilibrium q-profile for this current profile increases monotonically from the center to the wall. The central q value of q_0 = 0.85 is selected. For the toroidal mode numbers, the modes up to n = 0, ±1, ±2, ±3, ±4 were included; higher n modes are eliminated in Fourier space to reduce the high-frequency shear Alfvén
waves. The other simulation parameters are given as follows: ω_pe/ω_ce = 1, c/ω_pe = 4Δ⊥, v_te/v_A = 0.25, Δ⊥ = ρ_ti = √(T_i/m_i)/ω_ci, m_i/m_e = 1836, T_i/T_e = 1, β = (v_te/v_A)²(m_e/m_i) = 0.000034, and Δt = 0.00233 L_z/v_A. Here τ_A is the Alfvén time, τ_A = L_z/v_A, and v_A is the Alfvén velocity, v_A = c(ω_ci/ω_pi). Finite-size particles with a Gaussian shape are used; particle sizes of a_x = Δ⊥, a_y = Δ⊥, and a_z = Δ∥ are used in the x, y, and z directions.
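A minimal sketch of such a quiet start is given below (Python, using SciPy's inverse error function; the base-2 radix and the mapping to a unit Maxwellian are illustrative assumptions, not necessarily the loading used in GYR3D): the low-discrepancy bit-reversed sequence fills velocity space far more evenly than random numbers, so the initial moments are very quiet.

```python
import numpy as np
from scipy.special import erfinv

def van_der_corput(n, base=2):
    """Bit-reversed (van der Corput) sequence of n numbers in (0, 1)."""
    out = np.empty(n)
    for j in range(n):
        k, f, x = j + 1, 1.0, 0.0
        while k > 0:
            f /= base
            k, r = divmod(k, base)
            x += f * r                 # digit-reversal of the index
        out[j] = x
    return out

u = van_der_corput(4096)
v = np.sqrt(2.0) * erfinv(2.0 * u - 1.0)   # inverse CDF of unit Maxwellian
print(v.mean(), v.var())                   # ~0 and ~1: very quiet moments
```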
The time evolution of the deviation of the various energies is shown in Fig. 1. The results with about two million particles (1,048,576 electrons and 1,048,576 ions) (Fig. 1(a)) are compared with the results with about eight million particles (4,194,304 electrons and 4,194,304 ions) (Fig. 1(b)). These results should be compared with the results of Fig. 7 (including n = 0, ±1) and Fig. 12(a) (including n = 0, ±1, ±2) in reference [7], obtained with about one million particles. The deviation of ε_T is larger than the deviation of ε_E for the one million particle run. In the two million particle run, the deviation of ε_T is less than the deviation of ε_E. The deviation of ε_T for the case with about eight million particles is fairly small and significantly less than the deviations of ε_K, ε_E, and ε_B. It is to be noted that the temporal behaviors of ε_E and ε_B are almost independent of the number of particles. Only ε_K behaves differently depending on the number of particles. This is explained as follows: ε_E and ε_B are calculated from the lowest order moments (density and current density), whereas ε_K includes the higher order moments, and the contribution from the high energy electrons introduces numerical errors. The error in the kinetic energy does not significantly influence the temporal evolution of the collective modes. So the number of particles changes the physics only slightly in the long time runs.
The Roman numerals in Fig. 1(b) show the typical times corresponding to the various phases of the instability. Hereafter we show only the results obtained with the eight million particle run. (I) is the linear phase. (II) and (III) are nonlinear phases before the full reconnection. (IV) is the time of the saturation. (V) and (VI) represent the post-full-reconnection phase. It is clear that until the saturation of the full reconnection phase the magnetic field energy is released to the fluid kinetic energy and the electron kinetic energy in the magnetic field direction.
Fig. 1  Time dependence of the deviations of the kinetic ε_K, fluid kinetic ε_E, magnetic field ε_B, and total ε_T energies. Each energy is normalized by the initial total energy ε_T0. The total numbers of particles (markers) used are (a) 2,097,152 and (b) 8,388,608.
Also it is clear that after the saturation an inverse process occurs, because the fluid kinetic energy goes back to the magnetic field energy. Figure 2 shows the temporal dependence of the amplitude of the electrostatic potential. It is clear that there is no thermal fluctuation, and the growth of the mode is simulated from the very early stage of the instability, when the amplitude of the mode is quite small. The growth rate calculated from Fig. 2 is γ = 0.44 (v_A/L_z), which is equal to the measurement of the one million particle runs. This value of γ is consistent with the estimation by Wesson [14] and may explain the fast sawtooth crash in large tokamaks.
Fig. 2  Time evolution of the amplitude of the electrostatic potential at the z=0 cross section.
The temporal evolution of the magnetic field structure and the potential profiles at the cross section of z=0 are shown in Figs. 3 and 4, where the Roman numerals correspond to those shown in Fig. 1(b). It is clear that the original magnetic axis moves out to the x-point in the q=1 surface with the growth of the m=1 magnetic island, and finally disappears; the axis of the magnetic island forms the new magnetic axis ((I) to (III)). As is obvious from Fig. 4, the potential structure which drives the core plasma in the direction of the x-point by the E × B drift survives even after the saturation of the instability. This is possible because q=1 after the full reconnection, and electron motion along the magnetic field cannot erase the potential profile because the mode structure is flute-like and coincides with the magnetic field structure. So the plasma on the opposite side of the x-point comes into the central region in the post-full-reconnection phase. After the secondary reconnection a new core plasma is formed in time (VI). The central q value of the new core plasma is q_0 = 0.95 and is less than unity again.
For the secondary reconnection we need a flow to the reconnection point. As is clear in Fig. 4 (VI), we observe a quadrupole structure around the reconnection point. Two of the flows (by E × B drifts) are toward the reconnection point. The other two flows come out from the reconnection point; one moves to the center of the plasma and the other moves to the wall. This quadrupole structure around the reconnection point was not observed in the previous paper because only modes up to n = ±3 were included there. The large number of particles used here was also helpful in observing such a clear quadrupole potential profile.
The basic process of collisionless reconnection has also been studied by the two-dimensional particle simulation of the coalescence of two flux bundles [20], which is caused by the magnetic attraction between two flux bundles carrying currents in the same direction. The importance of the electron inertia was pointed out there. The physics of collisionless reconnection is essentially the same for tokamaks and for the coalescence of two flux bundles. The quadrupole potential profile in the simulation of the coalescence can be found in the magnetic reconnection for m ≥ 2 in tokamaks. However, for the m=1 mode presented in this article, the potential profile is dipole, which represents the uniform radial (helical) flow to the x-point only from the inside of the q=1 surface. It is interesting in the simulation of the kinetic internal kink mode that the quadrupole potential profile appears in the strongly nonlinear phase after the full reconnection. Due to the strong E × B flow driven by the dipole potential created in the full reconnection phase, the plasma in the peripheral region at the opposite side of the original x-point comes into the core region; an anti-parallel helical magnetic field structure is formed. There still remains the poloidal flow from both sides of the anti-parallel helical magnetic field. This situation is topologically very similar to the case of the coalescence of two flux bundles. Hence, the quadrupole potential profile is also observed in the nonlinear phase of the m=1 and n=1 kinetic internal kink mode; the new core plasma with q_0 < 1 appears as the result of the magnetic reconnection in the post-full-reconnection phase.
5. Conclusions and Discussion
The three-dimensional magneto-inductive gyrokinetic particle code GYR3D was ported to a massively parallel computer, the Intel Paragon XP/S15-256, which has 256 scalar processors (nodes) connected with a two-dimensional mesh topology. The Fortran 77 compiler with message-passing subroutines (the NX communication library) was used. The standard techniques of parallelization [11,12], such as domain decomposition and data transpose, were used. No special technique was used for load balancing [13] because we used sub-domains of the same size and an almost equal number of particles per sub-domain, since the density profile is uniform in space. Even in the non-uniform case, it may be possible that we do not need the load balancing technique because, in the δf code, we can select the distribution of markers uniformly in space.
Fig. 3  Time evolution of the magnetic field structure at the cross section of z=0. The numbers correspond to the time sequence in Fig. 1(b).
We only decomposed the domain in the z direction. The best results were obtained when the number of meshes in z, N_z, is equal to the number of divisions, N_div. In our case, N_z is usually smaller than the number of nodes, N_node; therefore we used N_node/N_div replicas. It is found that the speed of the parallelized code scales well with the number of nodes. The speed of the parallelized GYR3D with 256 nodes is more than three times faster than that of the vectorized GYR3D on the NEC SX-3/24R (vector/parallel computer) with one node. This speed ratio is consistent with the ratio of the peak speeds of these computers.
Fig. 4  Time evolution of the electrostatic potential profile at the cross section of z=0. Contour plots are shown. The numbers correspond to the time sequence in Fig. 1(b).
It will be interesting from the viewpoint of parallelized particle simulation to port the GYR3D code to a vector parallel computer with distributed memory, the Fujitsu VPP500/42 (42 processors). The parallelization of GYR3D on the VPP500/42 is now in progress and will be reported elsewhere, together with a comparison of the performance of the Paragon and the VPP500.

The parallelized GYR3D code was applied to the simulation of the m=1 and n=1 kinetic internal kink mode in tokamaks. A convergence study was done in the limit of an extremely large number of particles. The results show very good energy conservation when about eight million particles are used. It is important to note that the number of particles did not influence the linear phase of the instability. Also the number of
particles had only a slight influence on the nonlinear phase in the very long runs. This is explained as follows. Usually the error in the energy conservation reflects the error in the total kinetic energy of the electrons, which comes from the contribution of the high energy electrons. The error in the kinetic energy did not significantly influence the temporal evolution of the collective modes. This is because the field quantities are determined by the lower order moments, such as the charge and current densities. The other finding in this paper is the existence of the quadrupole potential profile around the reconnection point in the secondary phase after the full reconnection. This clear mode structure is obtained by including higher n modes and by reducing the time step size. The quadrupole potential structure was clearly observed when modes up to n = ±4 are included in the simulation; it was not observed in the previous paper [7] because only modes up to n = ±3 were included. The large number of particles used in the simulation also helped to obtain a clear observation of the mode pattern.
This simulation was done with a relatively small system size and with less realistic parameters. Because the collisionless skin depth of the electrons is the minimum characteristic length, we must resolve this length; we are forced to use a relatively small system size in the direction perpendicular to the longitudinal magnetic field. Also, the parameter selected is v_te << v_A. However, in actual large tokamaks this value is in the opposite extreme: v_te >> v_A. To simulate the case of v_te >> v_A, we must reduce the time step size such that k_∥ v_te Δt << 1; the simulation time will increase greatly. It is found that three-dimensional gyrokinetic particle codes with the δf method are very powerful tools for simulating the kinetic modification of MHD modes. However, more powerful parallel computers with speeds greater than 1 Tflops are required for simulations with the parameters of existing and next generation tokamaks. The particle simulations for such tokamaks need intensive research on the architecture of parallel computers and on parallel computing algorithms, which is one of the main subjects of the NEXT project.
Acknowledgement
The authors would like to express their thanks to Professors W.W. Lee, Princeton Plasma Physics Laboratory, and R.D. Sydora, University of California at Los Angeles, because the vectorized version of GYR3D with the δf method was developed in collaboration with them. The authors are grateful to Drs. M. Azumi and K. Tani, JAERI, and Professor T. Sato, National Institute for Fusion Science, for supporting this work.
Two of the authors (H.N. and T.S.) express their
thanks to Professor O. Fukumasa, Yamaguchi Univer-
sity for continuous encouragement. The Intel Paragon
XP/S15-256 at JAERI Naka Institute and the NEC SX-3/24R at the National Institute for Fusion Science
were used for the numerical calculations.
References
[1] J.M. Dawson, V. Decyk and R. Sydora, Physics Today, 64, March (1993).
[2] B.J. Cohen, D.C. Barnes, J.M. Dawson, G.W. Hammett, W.W. Lee, G.D. Kerbel, J.-N. Leboeuf, P.C. Liewer, T. Tajima and R.E. Waltz, Comput. Phys. Comm. 87, 1 (1995).
[3] H. Naitou, Journal of Plasma and Fusion Research
70, 135 (1994) [in Japanese].
[4] W.W. Lee, J. Comp. Phys. 72, 243 (1987).
[5] S.E. Parker, W.W. Lee and R.A. Santoro, Physi-
cal Review Letters 71, 2042 (1993).
[6] R.D. Sydora, Physica Scripta 52, 474 (1995).
[7] H. Naitou, K. Tsuda, W.W. Lee and R.D. Sydora,
Physics of Plasmas 2, 4257 (1995).
[8] T.S. Hahm, W.W. Lee, and A. Brizard, Phys. Fluids 31, 1940 (1988).
[9] S.E. Parker and W.W. Lee, Phys. Fluids B 5, 77
(1993).
[10] M. Kotschenreuther, Bull. Am. Phys. Soc. 34, 2107 (1988).
[11] P.C. Liewer and V.K. Decyk, J. Comput. Phys.
85, 302 (1989).
[12] V.K. Decyk, Comput. Phys. Comm. 87, 87 (1995).
[13] R.D. Ferraro, P.C. Liewer and V.K. Decyk, J.
Comput. Phys. 109, 329 (1993).
[14] J.A. Wesson, Nuclear Fusion 30, 2545 (1990).
[15] B.B. Kadomtsev, Fiz. Plazmy 1, 710 (1975) (Sov.
J. Plasma Phys. 1, 389 (1976)).
[16] V.M. Vasyliunas, Rev. Geophys. Space Phys. 13,
303 (1975).
[17] D. Biskamp and J.F. Drake, Physical Review Let-
ters 73, 971 (1994).
[18] H.R. Strauss, Phys. Fluids 19, 134 (1976).
[19] J.M. Hammersley and D.C. Handscomb, Monte Carlo Methods (Methuen, London, 1964), p. 133.
[20] M. Tanaka, Phys. Plasmas 2, 2920 (1995).