Parallelization of Gyrokinetic Particle Code and its Application to Internal Kink Mode Simulation
NAITOU Hiroshi, SONODA Takuya, TOKUDA Shinji* and DECYK Viktor K.**
Department of Electrical and Electronic Engineering, Yamaguchi University, Ube 755, Japan
*Naka Fusion Research Establishment, Japan Atomic Energy Research Institute, Ibaraki 311-01, Japan
**Department of Physics, University of California at Los Angeles, CA 90095-1547, USA
(Received 28 November 1995 / Revised manuscript received 30 January 1996)
Abstract
A three-dimensional magneto-inductive gyrokinetic particle code with the δf method, GYR3D, is ported to a massively parallel computer, the Intel Paragon XP/S15-256, which has 256 scalar processors (nodes) connected with a two-dimensional mesh topology. The Fortran 77 compiler with message-passing subroutines is used. It is found that the speed of the parallelized code scales well with the number of nodes. The speed of the parallelized GYR3D on the Intel Paragon with 256 nodes is more than three times faster than that of the vectorized GYR3D on the NEC SX-3/24R (vector/parallel computer) when one node is used. The parallelized GYR3D code is used to simulate the m=1 (poloidal mode number) and n=1 (toroidal mode number) kinetic internal kink mode in a tokamak. The results of a convergence study of the δf code in the limit of an extremely large number of particles are shown.
Keywords:
gyrokinetic particle simulation, numerical tokamak project, numerical experiment of tokamak, delta-f method, massively parallel computer, distributed memory, message passing model, tokamak, internal kink mode, sawtooth crash
1. Introduction
Massively parallel computers promise an unprecedented improvement in computational physics in many scientific and industrial areas such as astrophysics, molecular dynamics, fluid dynamics, elementary particle physics, molecular biology, and plasma physics (including processing plasmas). In the fusion community as well, it is believed that the use of massively parallel computers (including forthcoming machines) will make it possible to simulate the entire plasma in tokamaks. The Numerical Toka-
mak Project (NTP) [1,2,3], a High Performance Com-
puting and Communications (HPCC) Project sponsored
by the US Department of Energy (DoE), and the Numerical Experiment of Tokamak (NEXT), begun at the Japan Atomic Energy Research Institute (JAERI), are examples of such efforts. The main focus of NEXT
is a simulation study of tokamak plasmas with particle
and MHD-fluid models using the developing technol-
ogy of parallel computers. This project aims to clarify
the physics of fusion plasmas and to contribute to the
design study of fusion reactors such as ITER. The pro-
ject also includes the development of parallel comput-
ing technology for tokamak simulation, and scalar par-
allel computing and vector parallel computing are in-
vestigated on an equal footing. The phenomena which are not well understood in tokamak plasmas are related to kinetic effects of electrons, ions, and energetic ions. Gyrokinetic particle simulation [4] is well suited to such subjects because it includes much of the basic physics without too many assumptions.
Fluid simulation codes that include kinetic effects may also be used, but the validity of their results must be compared with and verified by the gyrokinetic codes. So both approaches are necessary to study kinetic effects in tokamaks.
Recently, three-dimensional electrostatic gyro-
kinetic codes in toroidal geometry were ported to mas-
sively parallel computers. Simulations of ion tempera-
ture gradient driven turbulence in tokamaks were done
by Parker et al. [5] on the CM-200 and CM-5 and by
Sydora [6] on the IBM SP2. In this paper, the three-
dimensional magneto-inductive gyrokinetic code
GYR3D [7], which uses a newly developed noise reduction technique (the δf method [9]), was ported to a mas-
sively parallel computer, the Intel Paragon XP/S15-256, as part of the NEXT project. The Intel Paragon XP/S15-
256 is a distributed memory parallel computer, with
256 nodes connected with a two-dimensional mesh to-
pology. Each node has a 75 Mflops (double precision)
scalar processor and 32 Mbytes of memory, so the total peak speed and memory are 19.2 Gflops and 8 Gbytes,
respectively. The Paragon uses a distributed memory
programming model implemented in Fortran 77 with
message-passing subroutines (NX communication li-
brary). In a distributed memory parallel computer, synchronization among the nodes and access to the memory of another node can only be done by passing messages among processors, so message-passing libraries are usually supported. Because the memory of
one node is small, the technique of domain decomposi-
tion is necessary. Also the transpose method of com-
municating data between nodes is used to execute the
fast Fourier transforms in the direction of the decom-
position. The performance of the parallelized GYR3D
on the Intel Paragon XP/S15-256 is studied and com-
pared with the vectorized GYR3D on the NEC SX-3/
24R (a vector/parallel computer with a peak speed of 12.8 Gflops with two CPUs) in single-CPU mode.
The parallelized GYR3D was run to simulate the
linear and nonlinear phenomena associated with the m=1
(poloidal mode number) and n=1 (toroidal mode num-
ber) kinetic internal kink mode (sawtooth crash) in to-
kamaks. The main purpose is to do a convergence study in the limit of an extremely large number of particles
(markers), which used to be very difficult because the
estimated CPU time was so large. Also, higher n mode numbers are included in the simulation by using a smaller time step size, because the frequencies of the corresponding shear Alfvén waves are large.
The outline of the paper is as follows. A brief summary of the basic equations of the gyrokinetic code with the δf method is given in the following section. The description and performance of the parallelized GYR3D code are presented in Section 3. The application of the parallelized GYR3D code to the simulation of the sawtooth crash in tokamaks is shown in Section 4. The concluding remarks and discussion are given in Section 5.
2. Basic Equations
The derivation of the gyrokinetic equations in the
low beta limit is given by Hahm [8] using the gyro-
kinetic ordering,
$$\frac{\omega}{\omega_{ci}} \sim \frac{\rho_{ti}}{L} \sim \frac{k_\parallel}{k_\perp} \sim \frac{e\phi}{T_e} \sim \frac{\delta B}{B_0} = O(\epsilon), \qquad k_\perp \rho_{ti} = O(1), \quad (1)$$
where ω is a characteristic frequency of a mode, ω_ci is the ion cyclotron frequency, ρ_ti is the thermal ion gyroradius, L is a characteristic length of the gradient of density or temperature, k_∥ and k_⊥ are the wave numbers parallel and perpendicular to the magnetic field, respectively, φ is the potential, T_e is the electron temperature, δB is the variation of the magnetic field from the constant longitudinal magnetic field B_0, and ε is the smallness parameter. The compressional component of the longitudinal magnetic field is neglected in the low beta approximation. In the gyrokinetic particle simulation, we use the lowest order equations in the long wavelength limit. Electrons are treated in the drift-kinetic approximation. We also neglect the finite gyroradius effect of ions, i.e., k_⊥ρ_ti = 0, while keeping the polarization shielding effects of ions in the Poisson equation.
We assume a rectangular system with dimensions L_x, L_y, and L_z. There is a strong and constant magnetic field in the z (toroidal) direction, B = B_0 b̂, where b̂ is the unit vector in the z direction. The system is bounded by a perfectly conducting wall in the x and y directions and connected by the periodic end condition in the z direction. The electrostatic potential φ and the z component of the vector potential, A_z, are zero at the wall. We solve the gyrokinetic Vlasov equation by the method of characteristics. The gyrokinetic code with the δf method [9] simulates the temporal evolution of the deviation δf from the equilibrium distribution function f_0 by following the characteristics of particles (markers) having variable weights w_{aj}:
$$f_a(\mathbf{x}, p_z, t) = f_{0a}(x, y, p_z) + \delta f_a(\mathbf{x}, p_z, t), \quad (2)$$

$$\delta f_a(\mathbf{x}, p_z, t) = \sum_j w_{aj}(t)\, \delta[\mathbf{x} - \mathbf{x}_{aj}(t)]\, \delta[p_z - p_{zaj}(t)], \quad (3)$$
where a denotes the particle species and p_z is the canonical momentum,

$$p_{zaj} = v_{zaj} + \frac{q_a}{m_a} A_z(\mathbf{x}_{aj}, t), \quad (4)$$

and the distribution of the markers is given by

$$g_a(\mathbf{x}, p_z, t) = \sum_j \delta[\mathbf{x} - \mathbf{x}_{aj}(t)]\, \delta[p_z - p_{zaj}(t)]. \quad (5)$$
For the equilibrium distributions we assume a shifted Maxwellian for electrons,

$$f_{0e} = \frac{n_{0e}}{\sqrt{2\pi}\, v_{te}} \exp\left\{ -\frac{[p_z + (e/m_e) A_{z0}(x, y) - v_D(x, y)]^2}{2 v_{te}^2} \right\}, \quad (6)$$

and a non-shifted Maxwellian for ions,

$$f_{0i} = \frac{n_{0i}}{\sqrt{2\pi}\, v_{ti}} \exp\left\{ -\frac{[p_z - (q_i/m_i) A_{z0}(x, y)]^2}{2 v_{ti}^2} \right\}, \quad (7)$$

where A_z0 is related to the equilibrium electron drift velocity v_D by Ampère's law,

$$\nabla_\perp^2 A_{z0}(x, y) = \mu_0 e n_{0e} v_D(x, y). \quad (8)$$

The equations of motion of the markers are given by

$$\frac{d\mathbf{x}_{aj}}{dt} = v_{zaj}\left( \hat{\mathbf{b}} + \frac{\nabla A_z \times \hat{\mathbf{b}}}{B_0} \right) + \frac{\hat{\mathbf{b}} \times \nabla\phi}{B_0}, \quad (9)$$

$$\frac{dp_{zaj}}{dt} = -\frac{q_a}{m_a}\left( \hat{\mathbf{b}} \cdot \nabla\phi - v_{zaj}\, \hat{\mathbf{b}} \cdot \nabla A_z \right). \quad (10)$$

The change of the weights along the trajectories of the markers is given by

$$\frac{dw_{ej}}{dt} = \frac{f_{ej}}{g_{ej}}\, \frac{p_{zj} + (e/m_e) A_{z0} - v_D}{v_{te}^2} \left[ \frac{dp_{zj}}{dt} + \frac{dx_j}{dt}\left(\frac{e}{m_e}\frac{\partial A_{z0}}{\partial x} - \frac{\partial v_D}{\partial x}\right) + \frac{dy_j}{dt}\left(\frac{e}{m_e}\frac{\partial A_{z0}}{\partial y} - \frac{\partial v_D}{\partial y}\right) \right], \quad (11)$$

$$\frac{dw_{ij}}{dt} = \frac{f_{ij}}{g_{ij}}\, \frac{p_{zj} - (q_i/m_i) A_{z0}}{v_{ti}^2} \left[ \frac{dp_{zj}}{dt} - \frac{dx_j}{dt}\frac{q_i}{m_i}\frac{\partial A_{z0}}{\partial x} - \frac{dy_j}{dt}\frac{q_i}{m_i}\frac{\partial A_{z0}}{\partial y} \right], \quad (12)$$

where the value of f_{aj}/g_{aj} is stored at the initial time (t = 0) because f_a and g_a are constant along the trajectory. Note that f_a is the physical distribution function, while g_a is the distribution of the markers, which are loaded numerically and evolve in time. There is freedom in the choice of the distribution of the markers at t = 0, g_{0a}. Kotschenreuther [10] used g_{0a} = const., while Parker et al. [9] selected g_{0a} = f_{0a}. Here we adopted

$$g_{0e} = \frac{n_{0e}}{\sqrt{2\pi}\, v_{te}} \exp\left\{ -\frac{[p_z + (e/m_e) A_{z0}(x, y)]^2}{2 v_{te}^2} \right\}, \quad (13)$$

$$g_{0i} = f_{0i}, \quad (14)$$

because we can expect that the distribution of the markers will stay approximately constant in time, assuming that the spatial motion of the markers is almost incompressible; these markers pick up the contribution from f_{0e} much more precisely even in long runs.

The electrostatic potential is determined by the gyrokinetic Poisson equation,

$$\nabla^2\phi + \frac{\omega_{pi}^2}{\omega_{ci}^2}\, \nabla_\perp^2\phi = \frac{e}{\epsilon_0} \sum_j w_{ej}\, \delta(\mathbf{x} - \mathbf{x}_{ej}) - \frac{q_i}{\epsilon_0} \sum_j w_{ij}\, \delta(\mathbf{x} - \mathbf{x}_{ij}). \quad (15)$$

The deviation of A_z from A_z0, δA_z, is calculated from Ampère's law by using an iterative method:

$$\left( \nabla^2 - \frac{\omega_{pe}^2 + \omega_{pi}^2}{c^2} \right) \delta A_z = \mu_0 e \sum_j p_{zj} w_{ej}\, \delta(\mathbf{x} - \mathbf{x}_{ej}) + \frac{\omega_{pe}^2}{c^2 n_{0e}} (A_{z0} + \delta A_z) \sum_j w_{ej}\, \delta(\mathbf{x} - \mathbf{x}_{ej}) - \mu_0 q_i \sum_j p_{zj} w_{ij}\, \delta(\mathbf{x} - \mathbf{x}_{ij}) + \frac{\omega_{pi}^2}{c^2 n_{0i}} (A_{z0} + \delta A_z) \sum_j w_{ij}\, \delta(\mathbf{x} - \mathbf{x}_{ij}). \quad (16)$$
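To make the marker advance concrete, the following is a minimal Python/NumPy sketch of one time step of Eqs. (9)-(12) for electron markers, with b̂ = ẑ and the field quantities assumed to be already gathered to the marker positions. It is illustrative only: the function and field names are hypothetical, a forward-Euler step stands in for the leap-frog trapezoidal scheme of the actual Fortran 77 code, and the signs follow the equations as reconstructed above.

```python
import numpy as np

def push_electrons(x, y, z, pz, w, f_over_g, F, dt, e_me, B0, vte):
    """One (forward-Euler) step of Eqs. (9)-(12) for electron markers.

    F: dict of field quantities gathered to the marker positions
       (Az, Az0, vD and the needed derivatives of phi, Az, Az0, vD).
    f_over_g: the ratio f_ej/g_ej stored at t = 0.
    """
    vz = pz + e_me * F['Az']                    # v_z = p_z + (e/m_e) A_z
    # Eq. (9): parallel motion, magnetic flutter, E x B drift (b = z-hat)
    dxdt = vz * F['dAzdy'] / B0 - F['dphidy'] / B0
    dydt = -vz * F['dAzdx'] / B0 + F['dphidx'] / B0
    dzdt = vz
    # Eq. (10): note q_e/m_e = -e/m_e for electrons
    dpzdt = e_me * (F['dphidz'] - vz * F['dAzdz'])
    # Eq. (11): weight evolution from d(ln f_0e)/dt along the trajectory
    u = pz + e_me * F['Az0'] - F['vD']
    dudt = (dpzdt
            + dxdt * (e_me * F['dAz0dx'] - F['dvDdx'])
            + dydt * (e_me * F['dAz0dy'] - F['dvDdy']))
    dwdt = f_over_g * u / vte**2 * dudt
    return (x + dt * dxdt, y + dt * dydt, z + dt * dzdt,
            pz + dt * dpzdt, w + dt * dwdt)

# trivial check: with all fields zero the markers stream freely in z
N = 4
F0 = dict.fromkeys(['Az', 'dAzdx', 'dAzdy', 'dAzdz', 'dphidx', 'dphidy',
                    'dphidz', 'Az0', 'vD', 'dAz0dx', 'dAz0dy',
                    'dvDdx', 'dvDdy'], 0.0)
out = push_electrons(np.zeros(N), np.zeros(N), np.zeros(N), np.ones(N),
                     np.zeros(N), np.ones(N), F0, 0.01, 1.0, 1.0, 1.0)
print(out[2])   # z advanced by vz*dt = 0.01
```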
If there are enough markers to represent the change of the distribution function, the total energy ε_T,

$$\epsilon_T = \epsilon_K + \epsilon_E + \epsilon_B, \quad (17)$$

will be conserved, where ε_K is the kinetic energy, ε_E is the sum of the electric field energy and the fluid kinetic (ion sloshing) energy (the latter is dominant), and ε_B is the magnetic field energy:

$$\epsilon_K = \frac{1}{2} m_e v_{te}^2 N_e + \frac{1}{2} m_e n_{0e} \int \left( \frac{e}{m_e}\delta A_z + v_D \right)^2 d\mathbf{x} + \frac{1}{2} m_e \sum_j w_{ej} \left[ p_{zj} + \frac{e}{m_e} A_z(\mathbf{x}_{ej}, t) \right]^2 + \frac{1}{2} m_i v_{ti}^2 N_i + \frac{1}{2} m_i n_{0i} \int \left( \frac{q_i}{m_i}\delta A_z \right)^2 d\mathbf{x} + \frac{1}{2} m_i \sum_j w_{ij} \left[ p_{zj} - \frac{q_i}{m_i} A_z(\mathbf{x}_{ij}, t) \right]^2, \quad (18)$$

$$\epsilon_E = \frac{1}{2} \epsilon_0 \sum_k k^2 |\phi_k|^2 + \frac{1}{2} \epsilon_0 \frac{\omega_{pi}^2}{\omega_{ci}^2} \sum_k k_\perp^2 |\phi_k|^2, \quad (19)$$

$$\epsilon_B = \frac{1}{2\mu_0} \sum_k k_\perp^2 |A_{zk}|^2. \quad (20)$$
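The point of the decomposition (2)-(5) can be seen in a small self-contained numerical experiment (illustrative only, not part of GYR3D): for a perturbation of relative amplitude ε, sampling markers from the full f gives moments with statistical noise of order 1/√N, while carrying only the weights w = δf/g reduces the noise to order ε/√N.

```python
import numpy as np

# Toy full-f vs delta-f comparison: a unit Maxwellian f0 carries a tiny
# mean flow eps; the "signal" to be measured is the mean velocity u = eps.
rng = np.random.default_rng(0)
N, eps = 100_000, 1.0e-3

# full-f: markers sampled directly from the shifted Maxwellian
v_full = rng.standard_normal(N) + eps
u_full = v_full.mean()                      # noise ~ 1/sqrt(N) >> eps

# delta-f: markers from g = f0, weights w = delta-f/g ~ eps*v (linearized)
v = rng.standard_normal(N)
w = eps * v
u_delta = np.sum(w * v) / N                 # noise ~ eps/sqrt(N)

print(f"signal        u = {eps:.2e}")
print(f"full-f  error   = {abs(u_full - eps):.2e}")
print(f"delta-f error   = {abs(u_delta - eps):.2e}")
```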
3. Description and Performance of the Parallelized Code
Before elucidating the method of parallelization,
the original code (the vectorized version of GYR3D) is
briefly described. Time integration is done by the leap-frog trapezoidal scheme, which is one of the predictor-corrector methods. Linear interpolation is used for
charge and current deposits from the particles to the
grids and for the field interpolation from the grids to
the particles. The particle acceleration subroutine is
well vectorized. The charge and current deposit subrou-
tine is divided into three parts. The first involves a calculation of addresses and weights, which is well vectorized. The second is a loop of particle deposits including list vectors (indirect addressing). Here we use the fact that the 8 grid points, which are the corners of a cube (grid cell), are independent (no recursive reference) for the deposit of one particle. The vector length of 8 is rather short for the SX-3/24R, so we used 16 replicas of the field array; we deposit 16 particles at the same time to different replicas. By using the trick of holding the 16 replicas in one array, a vector length of 128 is accomplished; hence, this part was also well vectorized. The third loop is the summation of the charge and current densities over the 16 replicas. The
fast Fourier transform (FFT) is vectorized in the direction perpendicular to the direction of the FFT by using the fact that the system is multi-dimensional. The integration of the gyrokinetic Poisson equation (Eq. (15)) is executed in Fourier space. For the integration of Ampère's law (Eq. (16)), an iterative method is used, including the forward and inverse FFTs. Usually 5 iterations are enough. Also, the calculation of the spatial derivatives of the field quantities is done in Fourier space. The total number of FFTs called to advance one time step is 14 forward FFTs and 24 inverse FFTs.
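The replica trick can be sketched as follows (Python/NumPy and one dimension for brevity, rather than the vectorized Fortran 77 of the SX-3 version; np.add.at merely emulates the hardware scatter): each particle deposits onto one of R copies of the grid, so deposits executed together never touch the same memory location, and the copies are summed afterwards.

```python
import numpy as np

def deposit_with_replicas(x, ngrid, R=16):
    """Linear-interpolation charge deposit using R grid replicas."""
    rho = np.zeros((R, ngrid))
    i = x.astype(int)                  # left grid point of each particle
    s = x - i                          # linear weight of the right point
    r = np.arange(x.size) % R          # replica assigned to each particle
    np.add.at(rho, (r, i % ngrid), 1.0 - s)
    np.add.at(rho, (r, (i + 1) % ngrid), s)
    return rho.sum(axis=0)             # third loop: sum over the replicas

x = np.random.default_rng(2).uniform(0.0, 64.0, 10_000)
rho = deposit_with_replicas(x, 64)
print(rho.sum())                       # 10000.0: total charge is conserved
```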
One of the key issues for parallelization is the decomposition [11] of the spatial region of the system
(domain). In a distributed memory parallel computer, the memory of one node is not large enough to hold all the field quantities of the total domain, so one processor can keep only the field data in a limited sub-domain. From the point of view of load balancing, it is preferable that each node has approximately the same number of particles located in its sub-domain. In this paper, the three-dimensional rectangular simulation box is cut into N_div slices in the z direction. One sub-domain thus represents a limited region in z but includes all the meshes in the x and y directions. It is to be noted that the upper surface of one sub-domain is also the lower surface of the adjacent sub-domain, so the surfaces at the boundary between sub-domains belong to two adjacent nodes. (Those nodes are not necessarily physically adjacent.) To be definite, we determine the owner of the data in the meshes of each sub-domain. The owner of the data in the sub-domain is the node itself, except for the data of the upper surface, whose owner is the adjacent node sharing the same meshes. We will call the meshes in the upper surfaces the guard cells. After the charge and current deposit processes, the charge and current densities in the guard cells are added to the cells of the owner. Also, after the field quantities are calculated at the grids, the data in the lower boundary cells of the owners are sent to the guard cells (upper boundary cells) in the adjacent nodes.
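The guard-cell bookkeeping can be sketched as follows (Python/NumPy, with the nodes emulated by a list of slabs and one guard plane per slab; the layout is a hypothetical simplification, and the actual code moves the data with NX messages):

```python
import numpy as np

ndiv, nz_local = 4, 8                  # sub-domains, z-planes per slab
rng = np.random.default_rng(3)
# each "node" stores nz_local planes plus one guard plane on top
slabs = [rng.random(nz_local + 1) for _ in range(ndiv)]

# (i) after the deposit: add the guard-cell charge to its owner,
#     the first plane of the (periodically) adjacent sub-domain
for k in range(ndiv):
    nxt = (k + 1) % ndiv               # periodic end condition in z
    slabs[nxt][0] += slabs[k][-1]
    slabs[k][-1] = 0.0

# (ii) after the field solve: copy the owner's plane back into the
#      guard cells of the neighbor below
for k in range(ndiv):
    nxt = (k + 1) % ndiv
    slabs[k][-1] = slabs[nxt][0]
print(slabs[0][-1] == slabs[1][0])     # True: guards mirror the owner
```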
The maximum value of N_div is limited by the number of meshes in the z direction, N_z. In our case, the number of nodes, N_node, is usually larger than N_z. Therefore it is necessary to use N_node/N_div replicas of each sub-domain. (We are assuming N_node is a multiple of N_div.) The charge and current densities must be summed up over the replicas. There are N_node/N_div groups of nodes. One group is the assembly of N_div sub-domains, which constitutes a complete copy of the total domain. Particles move between the nodes through communication inside each group. Field quantities are calculated independently in each group. So communication between groups is needed only when the charge and current densities are summed up over the replicas.
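The replica-group layout can be emulated in the same spirit (illustrative NumPy only; on the Paragon the sum over the replicas is an inter-group reduction done by message passing):

```python
import numpy as np

n_node, n_div = 256, 32                # nodes and sub-domains
n_groups = n_node // n_div             # 8 complete copies of the domain
nx, ny = 64, 64

# one (nx, ny) slab of charge per node, indexed [group, sub-domain]
rho = np.random.default_rng(4).random((n_groups, n_div, nx, ny))

# inter-group communication: sum the charge over the replicas
rho_total = rho.sum(axis=0)            # shape (n_div, nx, ny)
print(rho_total.shape)
```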
The parallelized version of the FFT is illustrated here. For the FFTs in the x and y directions, a standard FFT method is used because all the data in those directions reside inside one node. However, for the FFT in the z direction the data are distributed over the nodes. Hence, a transpose method [12] is necessary. The field data are divided into N_div pieces in the x direction and redistributed over the nodes. Because all the data in the z direction are inside the node after the transpose of the data, we can implement the FFT in the z direction very efficiently. After the manipulation of the field quantities in Fourier space, an inverse FFT in the z direction, an inverse transpose of the data, and inverse FFTs in the x and y directions are done in series.
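The transpose method can be sketched with the nodes again emulated as a list of array blocks (Python/NumPy; on the Paragon the redistribution is done with pairwise messages over the mesh):

```python
import numpy as np

nx, ny, nz, ndiv = 16, 16, 8, 4
rng = np.random.default_rng(5)
global_f = rng.random((nx, ny, nz))

# initial decomposition: node k owns a z-slab of thickness nz/ndiv
slabs = [global_f[:, :, k * nz // ndiv:(k + 1) * nz // ndiv].copy()
         for k in range(ndiv)]

# transpose: node k collects its x-strip from every other node's slab,
# so that the full z extent becomes local to the node
strips = [np.concatenate([s[k * nx // ndiv:(k + 1) * nx // ndiv]
                          for s in slabs], axis=2)
          for k in range(ndiv)]

# the z-FFT is now an ordinary serial FFT on each node
strips_hat = [np.fft.fft(st, axis=2) for st in strips]
print(strips_hat[0].shape)             # (4, 16, 8): all of z is local
```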
The performance of the parallelized version of GYR3D on the Paragon XP/S15-256 is summarized in Tables 1 and 2. The 64 × 64 × 32 meshes are used for the x, y, and z directions, respectively. The total number of particles used in the simulation is fixed: 8,388,608 electrons and 8,388,608 ions are used. The code was run for 1000 time steps. The results for the case of N_div = N_z (= 32) are shown because the highest speed is obtained for this number of divisions. To see the scalability of the parallel computer, the results using 128 nodes are compared with the results using 256 nodes. The sizes of the executable programs are 3.3 Gbytes and 4.2 Gbytes for the cases of 128 and 256 nodes, respectively. The other parameters are the same as in the actual simulation shown in the following section. Table 1 shows the CPU times, while Table 2 shows the CPU times per particle per time step. The simulation time was divided into (a) particle deposit time, (b) particle acceleration time, and (c) field calculation time.
Table 1  CPU times for the Intel Paragon XP/S15-256 and the NEC SX-3/24R. The values in parentheses are the times consumed in the subroutines added to the original code to handle the communication between nodes.

                          Paragon        Paragon        SX-3/24R
                          128 nodes      256 nodes      1 node
                          [10^3 sec]     [10^3 sec]     [10^3 sec]
  particle deposit        3.40 (0.0963)  1.80 (0.174)   12.3
  particle acceleration   20.0 (0.868)   9.97 (0.447)   30.7
  field calculation       1.69           1.87           0.378
  total                   25.0           13.6           43.4
Table 2  CPU times per particle per time step for the Intel Paragon XP/S15-256 and the NEC SX-3/24R.

                          Paragon        Paragon        SX-3/24R
                          128 nodes      256 nodes      1 node
                          [μsec]         [μsec]         [μsec]
  particle deposit        0.20           0.11           0.73
  particle acceleration   1.19           0.59           1.83
  field calculation       0.10           0.11           0.02
  total                   1.49           0.81           2.59
The particle deposit time includes the time for the deposit of charge and current to the meshes, the time for the summation of data over the replicas, and the time for adding the data of the guard cells to the owners. The particle acceleration time involves the time for the update of particle positions, canonical momenta, and weights (Eqs. (9)-(12)), and the time for moving particles between neighboring processors. The field calculation time corresponds to the time to update the field quantities by using the parallelized FFT. The times for the initialization and the diagnostics are not included.
The parallelized FFT is about 18 times faster than the single-node calculation. The theoretical speed-up factor without communications would be 32 because N_div = 32. Hence we can estimate that about 45 per cent of the time is used for communication in the calculation of the field. The total time using 256 processors is about 1.8 times faster than that using 128 processors, so no significant saturation is observed as the number of nodes increases; the speed of the code scales very well with the number of nodes. The communication times are at an acceptable level: 7 and 11 per cent for the cases of 128 and 256 nodes, respectively. The times for the particle acceleration are dominant: 80 (128 nodes) and 73 (256 nodes) per cent of the total time is
spent there. While for the particle deposit only the charge and current densities are accumulated on the grids, for the particle acceleration the interpolations of seven field quantities (−∂φ/∂x, −∂φ/∂y, −∂φ/∂z, A_z, ∂A_z/∂x, ∂A_z/∂y, and ∂A_z/∂z) to the particle positions are needed, as well as the calculation of ∂A_z0/∂x, ∂A_z0/∂y, ∂v_D/∂x, and ∂v_D/∂y. Therefore, the particle acceleration time dominates over the particle deposit time.
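The asymmetry between the two phases can be sketched as follows (one-dimensional linear weighting in Python/NumPy, illustrative only): the push gathers seven grid quantities per particle, whereas the deposit scatters only the charge and current densities.

```python
import numpy as np

def gather(field, x):
    """Linear interpolation of grid quantities to particle positions."""
    i = x.astype(int)
    s = x - i
    n = field.shape[-1]
    return field[..., i % n] * (1.0 - s) + field[..., (i + 1) % n] * s

rng = np.random.default_rng(6)
x = rng.uniform(0.0, 64.0, 100_000)
# stand-ins for the seven per-particle quantities used in Eqs. (9)-(10):
# -dphi/dx, -dphi/dy, -dphi/dz, Az, dAz/dx, dAz/dy, dAz/dz
fields = rng.random((7, 64))
per_particle = gather(fields, x)
print(per_particle.shape)              # (7, 100000): 7 gathers per particle
```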
The time on the Paragon XP/S15-256 is compared with the time on the SX-3/24R. The simulation on the SX-3/24R was done by using 2,097,152 electrons and 2,097,152 ions. (If we used the same number of particles as in the case of the Paragon, the size of the executable program would be 2.8 Gbytes. It is not impossible to run this on the SX-3, but it would be a huge run for the SX-3 users; hence we reduced the number of particles to decrease the load on the computer.) In order to compare the results with the case of the Paragon XP/S15-256, in which 8,388,608 electrons and 8,388,608 ions are used, we multiplied the times for the particle deposit and acceleration on the SX-3/24R by a factor of 4. (Usually the simulation time for vector machines scales very well with the number of particles.) The speed of the Paragon XP/S15-256 using the maximum number of nodes is 3.2 times faster than the speed of the SX-3/24R. This speed ratio is consistent with the difference of the peak speeds (19.2 Gflops for the Paragon and 6.4 Gflops for the SX-3/24R (one node)). Here we compare the speed of the Paragon XP/S15-256 using 256 nodes with the speed of the SX-3/24R using one node. The time for the particle deposit is 6.8 times faster because the scatter calculation is the most difficult part to optimize for vector processors. The time for the particle acceleration is 3.1 times faster. However, the time for the field calculation is 4.9 times slower because the communication time is not negligible in the parallel FFT. Also, in our case the optimization of the FFT by parallelization is limited by the factor N_div = N_z. If we use more meshes in the z direction, a further speed-up of the FFT may be expected.
4. Simulation Results
The simulation of the m=1 and n=1 kinetic inter-
nal kink mode is done using the parallelized GYR3D
code. The mode becomes unstable due to the inertial
effects of electrons when the safety factor q at the mag-
netic axis, q_0, is less than unity. We simulate the Wesson-type reconnection [14], which is a modification of the Kadomtsev-type reconnection [15]. While in the former reconnection the electron inertia in the current layer is dominant, in the latter the collisional effect is crucial. Also, the importance of the electron inertial effect for collisionless reconnection has been recognized in astrophysics [16]. The main purpose of the paper is to
implement the convergence study of the GYR3D code in the limit of an extremely large number of particles. In the previous paper [7] we used only about one million particles to simulate the sawtooth crash process coming from the nonlinear development of the internal kink mode. The result is quite satisfactory because the noise level is quite low compared with the standard gyrokinetic code (total-f code). The linear phase of the temporal development of the m=1 and n=1 kinetic internal kink mode is clearly observed even when the amplitude of the mode is quite small. One also observed the nonlinear full reconnection process of the Wesson type and the secondary reconnection [17] after the full reconnection, which creates a new magnetic axis with q_0 less than unity again. While we believed that the results were physically sound, a convergence study of the GYR3D code was felt to be necessary. Numerically it is important to check the convergence properties of the code in the limit of an extremely large number of particles. Theoretically, the gyrokinetic equations with the δf method assure conservation of energy in the collisionless fluid limit in phase space. Because the gyrokinetic Vlasov equation is solved with a finite number of markers (particles), we must verify that if we increase the number of markers, the conservation of energy improves to an acceptable level.
The simulation parameters are the same as those in the previous paper [7] except for the number of particles, the time step size, and the maximum n modes included. The simulation system is a three-dimensional rectangular box of L_x × L_y × L_z = 64Δ⊥ × 64Δ⊥ × 32Δ∥, where Δ⊥ and Δ∥ are the grid sizes perpendicular and parallel to the longitudinal magnetic field. Δ∥ is stretched by a factor of 1000 from Δ⊥. The plasma fills the box uniformly. The initial distribution functions of the markers are generated by using bit-reversed numbers [19], which is one of the quiet start techniques (see the sketch after this paragraph). In the previous paper, we used g_0e = f_0e and g_0i = f_0i, following Parker et al. [9], for the distribution of the markers. Here we used the same choice for ions. For electrons, however, we used the distribution given by Eq. (13), which is uniform in space and slightly different from f_0e. For the initial weights of the electrons, very small values which roughly approximate the charge density profile in the linear phase are given. For ions, the initial weights are set to zero. The number of time steps is 13,000. As the equilibrium current density J_z0 we use the profile given in the paper by Strauss [18]. The equilibrium q-profile for this current profile increases monotonically from the center to the wall. The central q value of q_0 = 0.85 is selected. For the toroidal mode numbers, the modes up to n = 0, ±1, ±2, ±3, ±4 were included; higher n modes are eliminated in Fourier space to reduce the high-frequency shear Alfvén
waves. The other simulation parameters are given as follows: ω_pe/ω_ce = 1, c/ω_pe = 4Δ⊥, v_te/v_A = 0.25, Δ⊥ = ρ_ti = √(T_i/m_i)/ω_ci, m_i/m_e = 1836, T_i/T_e = 1, β = (v_te/v_A)²(m_e/m_i) = 0.000034, and Δt = 0.00233 L_z/v_A. Here τ_A is the Alfvén time, τ_A = L_z/v_A, and v_A is the Alfvén velocity, v_A = c(ω_ci/ω_pi). Finite-size particles with a Gaussian shape are used; particle sizes of a_x = Δ⊥, a_y = Δ⊥, and a_z = Δ∥ are used in the x, y, and z directions.
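A minimal sketch of such a quiet start is given below (Python, using SciPy's inverse error function; the base-2 radix and the mapping to a unit Maxwellian are illustrative assumptions, not necessarily the loading used in GYR3D): the low-discrepancy bit-reversed sequence fills velocity space far more evenly than random numbers, so the initial moments are very quiet.

```python
import numpy as np
from scipy.special import erfinv

def van_der_corput(n, base=2):
    """Bit-reversed (van der Corput) sequence of n numbers in (0, 1)."""
    out = np.empty(n)
    for j in range(n):
        k, f, x = j + 1, 1.0, 0.0
        while k > 0:
            f /= base
            k, r = divmod(k, base)
            x += f * r                 # digit-reversal of the index
        out[j] = x
    return out

u = van_der_corput(4096)
v = np.sqrt(2.0) * erfinv(2.0 * u - 1.0)   # inverse CDF of unit Maxwellian
print(v.mean(), v.var())                   # ~0 and ~1: very quiet moments
```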
The time evolution of the deviation of the various energies is shown in Fig. 1. The results with about two million particles (1,048,576 electrons and 1,048,576 ions) (Fig. 1(a)) are compared with the results with about eight million particles (4,194,304 electrons and 4,194,304 ions) (Fig. 1(b)). These results should be compared with the results of Fig. 7 (including n = 0, ±1) and Fig. 12(a) (including n = 0, ±1, ±2) in reference [7], obtained with about one million particles. The deviation of ε_T is larger than the deviation of ε_E for the one million particle run. In the two million particle run, the deviation of ε_T is less than the deviation of ε_E. The deviation of ε_T for the case with about eight million particles is fairly small and significantly less than the deviations of ε_K, ε_E, and ε_B. It is to be noted that the temporal behaviors of ε_E and ε_B are almost independent of the number of particles. Only ε_K behaves differently depending on the number of particles. This is explained as follows: ε_E and ε_B are calculated from the lowest order moments (density and current density), whereas ε_K includes the higher order moments, and the contribution from the high energy electrons introduces numerical errors. The error in the kinetic energy does not significantly influence the temporal evolution of the collective modes. So the number of particles changes the physics only slightly in the long time runs.
The Roman numerals in Fig. 1(b) show the typical times corresponding to the various phases of the instability. Hereafter we show only the results obtained with the eight million particle run. (I) is the linear phase. (II) and (III) are nonlinear phases before the full reconnection. (IV) is the time of the saturation. (V) and (VI) represent the post-full-reconnection phase. It is clear that until the saturation of the full reconnection phase the magnetic field energy is released to the fluid kinetic energy and the electron kinetic energy in the magnetic field direction.
Fig. 1  Time dependence of the deviations of the kinetic ε_K, fluid kinetic ε_E, magnetic field ε_B, and total ε_T energies. Each energy is normalized by the initial total energy ε_T0. The total numbers of particles (markers) used are (a) 2,097,152 and (b) 8,388,608.
Also it is clear that after the saturation an inverse process occurs, because the fluid kinetic energy goes back to the magnetic field energy. Figure 2 shows the temporal dependence of the amplitude of the electrostatic potential. It is clear that there is no thermal fluctuation, and the growth of the mode is simulated from the very early stage of the instability, when the amplitude of the mode is quite small. The growth rate calculated from Fig. 2 is γ = 0.44 (v_A/L_z), which is equal to the measurement of the one million particle runs. This value of γ is consistent with the estimation by Wesson [14] and may explain the fast sawtooth crash in large tokamaks.
Fig. 2  Time evolution of the amplitude of the electrostatic potential at the z=0 cross section.
The temporal evolution of the magnetic field structure and the potential profiles at the cross section of z=0 are shown in Figs. 3 and 4, where the Roman numerals correspond to those shown in Fig. 1(b). It is clear that the original magnetic axis moves out to the x-point in the q=1 surface with the growth of the m=1 magnetic island, and finally disappears; the axis of the magnetic island forms the new magnetic axis ((I) to (III)). As is obvious from Fig. 4, the potential structure which drives the core plasma in the direction of the x-point by the E × B drift survives even after the saturation of the instability. This is possible because q=1 after the full reconnection, and electron motion along the magnetic field cannot erase the potential profile because the mode structure is flute-like and coincides with the magnetic field structure. So the plasma on the opposite side of the x-point comes into the central region in the post-full-reconnection phase. After the secondary reconnection a new core plasma is formed in time (VI). The central q value of the new core plasma is q_0 = 0.95 and is less than unity again.
For the secondary reconnection we need a flow to the reconnection point. As is clear in Fig. 4 (VI), we observe a quadrupole structure around the reconnection point. Two of the flows (by E × B drifts) are toward the reconnection point. The other two flows come out from the reconnection point; one moves to the center of the plasma and the other moves to the wall. This quadrupole structure around the reconnection point was not observed in the previous paper because only modes up to n = ±3 were included there. The large number of particles used here was also helpful in observing such a clear quadrupole potential profile.
The basic process of collisionless reconnection has also been studied by the two-dimensional particle simulation of the coalescence of two flux bundles [20], which is caused by the magnetic attraction between two flux bundles carrying currents in the same direction. The importance of the electron inertia was pointed out there. The physics of collisionless reconnection is essentially the same for tokamaks and for the coalescence of two flux bundles. The quadrupole potential profile in the simulation of the coalescence can be found in the magnetic reconnection for m ≥ 2 in tokamaks. However, for the m=1 mode presented in this article, the potential profile is dipole, which represents the uniform radial (helical) flow to the x-point only from the inside of the q=1 surface. It is interesting in the simulation of the kinetic internal kink mode that the quadrupole potential profile appears in the strongly nonlinear phase after the full reconnection. Due to the strong E × B flow driven by the dipole potential created in the full reconnection phase, the plasma in the peripheral region at the opposite side of the original x-point comes into the core region; an anti-parallel helical magnetic field structure is formed. There still remains the poloidal flow from both sides of the anti-parallel helical magnetic field. This situation is topologically very similar to the case of the coalescence of two flux bundles. Hence, the quadrupole potential profile is also observed in the nonlinear phase of the m=1 and n=1 kinetic internal kink mode; the new core plasma with q_0 < 1 appears as the result of the magnetic reconnection in the post-full-reconnection phase.
5. Conclusions and Discussion
The three-dimensional magneto-inductive gyrokinetic particle code GYR3D was ported to a massively parallel computer, the Intel Paragon XP/S15-256, which has 256 scalar processors (nodes) connected with a two-dimensional mesh topology. The Fortran 77 compiler with message-passing subroutines (the NX communication library) was used. The standard techniques of parallelization [11,12], such as domain decomposition and data transpose, were used. No special technique was used for load balancing [13] because we used sub-domains of the same size and an almost equal number of particles per sub-domain, since the density profile is uniform in space. Even in the non-uniform case, it may be possible that we do not need the load balancing technique because, in the δf code, we can select the distribution of markers uniformly in space.
Fig. 3  Time evolution of the magnetic field structure at the cross section of z=0. The numbers correspond to the time sequence in Fig. 1(b).
We only decomposed the domain in the z direction. The best results were obtained when the number of meshes in z, N_z, is equal to the number of divisions, N_div. In our case, N_z is usually smaller than the number of nodes, N_node; therefore we used N_node/N_div replicas. It is found that the speed of the parallelized code scales well with the number of nodes. The speed of the parallelized GYR3D with 256 nodes is more than three times faster than that of the vectorized GYR3D on the NEC SX-3/24R (vector/parallel computer) with one node. This speed ratio is consistent with the ratio of the peak speeds of these computers.
Fig. 4  Time evolution of the electrostatic potential profile at the cross section of z=0. Contour plots are shown. The numbers correspond to the time sequence in Fig. 1(b).
It will be interesting from the viewpoint of parallelized particle simulation to port the GYR3D code to a vector parallel computer with distributed memory, the Fujitsu VPP500/42 (42 processors). The parallelization of GYR3D on the VPP500/42 is now in progress and will be reported elsewhere, together with a comparison of the performance of the Paragon and the VPP500.

The parallelized GYR3D code was applied to the simulation of the m=1 and n=1 kinetic internal kink mode in tokamaks. A convergence study was done in the limit of an extremely large number of particles. The results show very good energy conservation when about eight million particles are used. It is important to note that the number of particles did not influence the linear phase of the instability. Also the number of
particles had only a slight influence on the nonlinear phase in the very long runs. This is explained as follows. Usually the error in the energy conservation reflects the error in the total kinetic energy of the electrons, which comes from the contribution of the high energy electrons. The error in the kinetic energy did not significantly influence the temporal evolution of the collective modes. This is because the field quantities are determined by the lower order moments, such as the charge and current densities. The other finding in this paper is the existence of the quadrupole potential profile around the reconnection point in the secondary phase after the full reconnection. This clear mode structure is obtained by including higher n modes and by reducing the time step size. The quadrupole potential structure was clearly observed when modes up to n = ±4 are included in the simulation; it was not observed in the previous paper [7] because only modes up to n = ±3 were included. The large number of particles used in the simulation also helped to obtain a clear observation of the mode pattern.
This simulation was done with a relatively small system size and with less realistic parameters. Because the collisionless skin depth of the electrons is the minimum characteristic length, we must resolve this length; we are forced to use a relatively small system size in the direction perpendicular to the longitudinal magnetic field. Also, the parameter selected is v_te << v_A. However, in actual large tokamaks this value is in the opposite extreme: v_te >> v_A. To simulate the case of v_te >> v_A, we must reduce the time step size such that k_∥ v_te Δt << 1; the simulation time will increase greatly. It is found that three-dimensional gyrokinetic particle codes with the δf method are very powerful tools for simulating the kinetic modification of MHD modes. However, more powerful parallel computers with speeds greater than 1 Tflops are required for simulations with the parameters of existing and next generation tokamaks. The particle simulations for such tokamaks need intensive research on the architecture of parallel computers and on parallel computing algorithms, which is one of the main subjects of the NEXT project.
Acknowledgement
The authors would like to express their thanks to Professors W.W. Lee, Princeton Plasma Physics Laboratory, and R.D. Sydora, University of California at Los Angeles, because the vectorized version of GYR3D with the δf method was developed in collaboration with them. The authors are grateful to Drs. M. Azumi and K. Tani, JAERI, and Professor T. Sato, National Institute for Fusion Science, for supporting this work.
Two of the authors (H.N. and T.S.) express their
thanks to Professor O. Fukumasa, Yamaguchi Univer-
sity for continuous encouragement. The Intel Paragon
XP/S15-256 at JAERI Naka Institute and the NEC SX-3/24R at the National Institute for Fusion Science
were used for the numerical calculations.
References
[1] J.M. Dawson, V. Decyk and R. Sydora, Physics Today, 64, March (1993).
[2] B.J. Cohen, D.C. Barnes, J.M. Dawson, G.W. Hammett, W.W. Lee, G.D. Kerbel, J.-N. Leboeuf, P.C. Liewer, T. Tajima and R.E. Waltz, Comput. Phys. Comm. 87, 1 (1995).
[3] H. Naitou, Journal of Plasma and Fusion Research
70, 135 (1994) [in Japanese].
[4] W.W. Lee, J. Comp. Phys. 72, 243 (1987).
[5] S.E. Parker, W.W. Lee and R.A. Santoro, Physi-
cal Review Letters 71, 2042 (1993).
[6] R.D. Sydora, Physica Scripta 52, 474 (1995).
[7] H. Naitou, K. Tsuda, W.W. Lee and R.D. Sydora,
Physics of Plasmas 2, 4257 (1995).
[8] T.S. Hahm, W.W. Lee, and A. Brizard, Phys. Fluids 31, 1940 (1988).
[9] S.E. Parker and W.W. Lee, Phys. Fluids B 5, 77
(1993).
[10] M. Kotschenreuther, Bull. Am. Phys. Soc. 34, 2107 (1988).
[11] P.C. Liewer and V.K. Decyk, J. Comput. Phys.
85, 302 (1989).
[12] V.K. Decyk, Comput. Phys. Comm. 87, 87 (1995).
[13] R.D. Ferraro, P.C. Liewer and V.K. Decyk, J.
Comput. Phys. 109, 329 (1993).
[14] J.A. Wesson, Nuclear Fusion 30, 2545 (1990).
[15] B.B. Kadomtsev, Fiz. Plazmy 1, 710 (1975) (Sov.
J. Plasma Phys. 1, 389 (1976)).
[16] V.M. Vasyliunas, Rev. Geophys. Space Phys. 13,
303 (1975).
[17] D. Biskamp and J.F. Drake, Physical Review Let-
ters 73, 971 (1994).
[18] H.R. Strauss, Phys. Fluids 19, 134 (1976).
[19] J.M. Hammersley and D.C. Handscomb, Monte Carlo Methods (Methuen, London, 1964), p. 133.
[20] M. Tanaka, Phys. Plasmas 2, 2920 (1995).