Running M3D on Advanced Computing Architectures
Jin Chen
PPPL
M3D Summary
• Multilevel / 3D using potential stream function; MHD / 2-fluid / particle; tokamak / stellarator
• Semi-implicit time step applied
• 13-19 elliptic solver calls per time step
• Poisson equation with Neumann b.c.
• Matrices symmetrized / solvers optimized
• Higher order triangular elements
• Runs on:
  NERSC: seaborg / jacquard
  NLCF: cheetah / ram / phoenix
  PPPL: fcc / mhd
M3D code structure
• Components: M3D (Fortran); elliptic solver & operator (C)
• Code stages: Initialization, Postprocessing, Equilibrium, Checkpoint, Interpolation, Mesh generation, Coordinate setup, Triangle elements
• M3D matrices: -rhoeqn (density), -weqn, -chiaiap_3, -vphieq, -phieqn, -aeqn, -ceqn/, -feqn
• Solver and operator sources: poissc.c, hopoissc.c, hopoissc_diag.c, poissc_fsymm.c, poissc_fsymm_opt.c, dxdrc.c
• Output: checkpoint, hdf5 viz data
Making M3D matrices symmetric
[Equations: the $\Delta^*$ and $\Delta^\dagger$ operators in $(R, Z)$ coordinates, rewritten in conserved form so that the discretized matrices are symmetric.]
J. Chen, et al., Symmetric Solution in M3D, Computer Physics Communications 164, 468 (2004).
compiler options
• Compile
  Level 0: make BOPT=O update
  Level 1: make -f Makefile.fsymm BOPT=O update
  Level 2: make -f Makefile.fsymm_opt BOPT=O update
runtime options
job.old: works for both regular M3D and symmetrized M3D
-poisson_pc_type asm \
-poisson_pc_asm_overlap 1 \
-poisson_sub_pc_type ilu \
-poisson_sub_pc_ilu_levels 3 \
-poisson_ksp_type gmres \
-poissonH1_pc_type asm \
-poissonH1_pc_asm_overlap 1 \
-poissonH1_sub_pc_type ilu \
-poissonH1_sub_pc_ilu_levels 3 \
-poissonH1_ksp_type gmres \
-poissonH2_pc_type asm \
-poissonH2_pc_asm_overlap 1 \
-poissonH2_sub_pc_type ilu \
-poissonH2_sub_pc_ilu_levels 3 \
-poissonH2_ksp_type gmres \
-pc_type asm \
-pc_asm_overlap 1 \
-sub_pc_type ilu \
-sub_pc_ilu_levels 3 \
-ksp_type gmres \
job.opt: works only for symmetrized M3D
-poisson_pc_type hypre \
-poisson_pc_hypre_type boomeramg \
-poisson_ksp_type cg \
-poissonH1_pc_type jacobi \
-poissonH1_ksp_type cg \
-poissonH2_pc_type asm \
-poissonH2_pc_hypre_type boomeramg \
-poissonH2_ksp_type cg \
-pc_type jacobi \
-ksp_type cg \
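The job.opt options pair CG with simple preconditioners, and CG is only valid for symmetric (positive definite) matrices, which is why this option set applies only to the symmetrized build. The following is a minimal NumPy sketch of Jacobi-preconditioned CG to illustrate the idea behind `-pc_type jacobi -ksp_type cg`. It is not the PETSc implementation; the function name `jacobi_pcg` and the 1D test matrix are assumptions made for illustration.

```python
import numpy as np

def jacobi_pcg(A, b, tol=1e-10, max_its=1000):
    """Conjugate gradients with a Jacobi (diagonal) preconditioner.

    A must be symmetric positive definite.
    Returns the solution and the number of iterations used.
    """
    inv_diag = 1.0 / np.diag(A)      # Jacobi preconditioner: M^{-1} = diag(A)^{-1}
    x = np.zeros_like(b)
    r = b - A @ x                    # initial residual
    z = inv_diag * r                 # preconditioned residual
    p = z.copy()                     # first search direction
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for its in range(1, max_its + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)        # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, its
        z = inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p    # next A-conjugate direction
        rz = rz_new
    return x, max_its

# Hypothetical test problem: 1D Dirichlet Laplacian (symmetric positive definite).
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, its = jacobi_pcg(A, b)
print("iterations:", its)
```

On a nonsymmetric matrix the short recurrences above lose their optimality properties, which is the usual reason GMRES is kept for the unsymmetrized code.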
Iteration count comparison (7067 -> 1311 -> 593)
Solver call                        gmres/ilu  gmres/ilu  gmres/ilu  cg/hypre/jacobi
  restart                                 30        200        200
  fill-in level                            0          0          3
Diffpar-operator_4-ibc_1-ID_0              3          3          2                6
poisD-operator_3                         404        188         58                8
Diffpar-operator_4-ibc_1-ID_0              5          5          5                5
Diffpar-operator_4-ibc_0-ID_10            30         30          9              101
Diffpar-operator_4-ibc_0-ID_10            30         30          9              101
poisN-operator_1                        3234        465         89                9
Diffpar-operator_4-ibc_1-ID_0              5          5          5                5
Diffpar-operator_4-ibc_0-ID_0              3          3          3                6
poisD-operator_1                         371        181         56                9
Diffpar-operator_5-ibc_0-ID_0              2          2          2                4
poisD-operator_2                         358        182         56                8
poisN-operator_1                        2593        420         57                9
Improved M3D time components
Time (sec, 500 time steps, 1 node, seaborg)
Component                                                          gmres/ilu   cg/jacobi/hypre
M3D total time (Fortran code, solver, operator, data conversion)        5788              2424
Elliptic solver time                                                    3962              1335
KSPSolve time                                                           3222              1189
M3D-PETSc data conversion (par2m3d, m3d2par, Rpar2m3d, m3d2Rpar)         974               118
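The per-component gains can be read off as simple ratios of the GMRES/ILU time to the CG/Jacobi/hypre time. The short sketch below does that arithmetic. The pairing of each number with its row (in particular 3962/1335 for the elliptic solver and 3222/1189 for KSPSolve) is an assumption about how the flattened slide table should be read.

```python
# Times in seconds as listed on the slide (500 time steps, 1 node).
# Assumption: first value is gmres/ilu, second is cg/jacobi/hypre.
times = {
    "M3D total":            (5788, 2424),
    "elliptic solver":      (3962, 1335),
    "KSPSolve":             (3222, 1189),
    "M3D-PETSc conversion": (974, 118),
}

# Speedup = old time / new time for each component.
speedup = {name: gmres / cg for name, (gmres, cg) in times.items()}

for name, s in speedup.items():
    print(f"{name:22s} speedup = {s:.2f}")
```

Under that reading, the data conversion step shows the largest relative improvement, while the solver accounts for most of the absolute time saved.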
Strong Scaling in direction
Nodes   Eqs per processor   m3d time   KSP time
1       39481               2242 sec   1237 sec
4       9871                1010 sec   482 sec
8       4971                734 sec    353 sec
• Note: runs on seaborg, where 1 node has 16 processors. The number of equations is counted per processor. 100 time steps. 16/16/141/1/4/1-4-8. optimz/opt4_scaling_strong_16p/64p/128p/256p.wxo.
Weak Scaling in direction
Nodes   Eqs per processor   m3d time   KSP time
1       7321                129 sec    22 sec
4       7261                259 sec    42 sec
8       7261                321 sec    58 sec
16      7261                322 sec    68 sec
Note: runs on seaborg. 10 time steps. Optimz/weaking_scaling_1.16_16_061/121/121/121_4/4/8/16_1/4/8/16_1node/4node/8node/16node.wxo
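The strong and weak scaling timings can be turned into speedup and efficiency figures with the usual definitions: for strong scaling, speedup is T(1)/T(N) and efficiency is speedup/N; for weak scaling, efficiency is T(1)/T(N). The sketch below applies these definitions to the m3d times quoted on the two slides. The function names are hypothetical, and the definitions are the standard textbook ones rather than anything stated in the slides.

```python
# (nodes, m3d_time_s, ksp_time_s) as listed on the slides (seaborg).
strong = [(1, 2242, 1237), (4, 1010, 482), (8, 734, 353)]
weak = [(1, 129, 22), (4, 259, 42), (8, 321, 58), (16, 322, 68)]

def strong_scaling(rows):
    """Return (nodes, speedup, efficiency) for a fixed total problem size."""
    base_nodes, base_time, _ = rows[0]
    result = []
    for nodes, m3d_time, _ in rows:
        speedup = base_time / m3d_time
        efficiency = speedup / (nodes / base_nodes)
        result.append((nodes, speedup, efficiency))
    return result

def weak_scaling(rows):
    """Return (nodes, efficiency) for a fixed problem size per processor."""
    base_time = rows[0][1]
    return [(nodes, base_time / m3d_time) for nodes, m3d_time, _ in rows]

for nodes, s, e in strong_scaling(strong):
    print(f"strong: {nodes:2d} nodes  speedup {s:.2f}  efficiency {e:.2f}")
for nodes, e in weak_scaling(weak):
    print(f"weak:   {nodes:2d} nodes  efficiency {e:.2f}")
```

The same functions can be applied to the KSP column by swapping which tuple entry is read.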
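Most time saved … Neumann b.c.
Weakly diagonally dominant matrix
$\Delta u = f, \; x \in R; \qquad \frac{\partial u}{\partial n} = g, \; x \in \partial R \quad \Rightarrow \quad Ax = b$
if $u$ is a solution, $u + c$ is also a solution
Singularity: $Ae = 0$, $\mathrm{nullspace}(A) = \mathrm{span}\{e\}$; $A$ is semi definite, one eigenvalue is $0$
1. Consistent system: $\int_R f\,dx = \int_{\partial R} g\,dl$; discretely, $b \leftarrow b - \frac{(b,e)}{(e,e)}\,e$ so that $b^T e = 0$
2. Unique solution (minimum length solution): $\int_R u\,dx = 0$; discretely, $x^T e = 0$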
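The two conditions for the singular Neumann system can be illustrated with a small sketch: project the right-hand side so it is orthogonal to the constant vector e, solve with CG, then project the solution to pick the minimum length representative. The 1D Neumann Laplacian and the helper names below are assumptions chosen for illustration. M3D itself does this through its C/PETSc solver layer, which is not shown in the slides.

```python
import numpy as np

def project_out_constants(v):
    """Remove the component along e = (1, ..., 1): v - (v,e)/(e,e) e."""
    return v - v.mean()

def cg(A, b, tol=1e-10, max_its=1000):
    """Plain conjugate gradients started from x = 0."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    for _ in range(max_its):
        if np.sqrt(rr) <= tol:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

# Hypothetical model problem: 1D Laplacian with Neumann b.c. at both ends.
n = 40
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0            # rows now sum to zero, so A e = 0

b_raw = np.linspace(0.0, 1.0, n)     # not consistent: b_raw^T e != 0
b = project_out_constants(b_raw)     # step 1: enforce b^T e = 0
x = project_out_constants(cg(A, b))  # step 2: enforce x^T e = 0

print("residual:", np.linalg.norm(A @ x - b), " sum(x):", x.sum())
```

Because A e = 0, subtracting the mean from x leaves the residual unchanged; it only selects one solution out of the family u + c.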
Higher order triangles
2nd order
3rd order
Regular higher order: J. Chen, et al., Solving Anisotropic Transport Equation on Misaligned Grids, LNCS 3516, pp. 1076-1079, 2005.
Lump higher order: G. Cohen, et al., Higher order triangular finite elements with mass lumping for the wave equation, SIAM J. Numer. Anal. 38, pp. 2047-2078, 2001.
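As background on why lumped higher order elements need special construction, the sketch below computes the row-sum lumped mass weights of the standard quadratic (P2) triangle. Row sums equal the integrals of the basis functions, which can be evaluated exactly with the barycentric monomial formula. This is a generic finite element illustration under that assumption, not the M3D implementation and not the specific elements of the cited paper.

```python
from math import factorial

def integrate_barycentric(a, b, c, area):
    """Exact integral of l1^a * l2^b * l3^c over a triangle of the given area."""
    return (2.0 * area * factorial(a) * factorial(b) * factorial(c)
            / factorial(a + b + c + 2))

area = 0.5  # reference triangle with vertices (0,0), (1,0), (0,1)

# Row sums of the P2 mass matrix equal the integrals of the basis functions,
# because the basis functions sum to one.
# Vertex basis function: l1 * (2*l1 - 1) = 2*l1^2 - l1
vertex_weight = (2.0 * integrate_barycentric(2, 0, 0, area)
                 - integrate_barycentric(1, 0, 0, area))
# Edge-midpoint basis function: 4 * l1 * l2
edge_weight = 4.0 * integrate_barycentric(1, 1, 0, area)

print("vertex weight:", vertex_weight)
print("edge weight:  ", edge_weight)
print("total:        ", 3 * vertex_weight + 3 * edge_weight)
```

The vertex weights vanish, so a naive row-sum lumped P2 mass matrix is singular. That is one commonly cited motivation for modified higher order elements designed specifically for mass lumping.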
2nd order meshes with p __from Linda
Run M3D with ho options
• Compiler options
make BOPT=O update
• Runtime options
– Regular 2nd order: -hoelement -horder 2 \
– Regular 3rd order: -hoelement -horder 3 \
– Lump 2nd order: -hoelement -lump -horder 2 \
– Lump 3rd order: -hoelement -lump -horder 3 \
Benchmark ho code: m3d/code/m2.F
c.. determine mesh
      call dmesh

c.. cjtest 1-dec-04 for linda start --
      call cvolea( one, sum )
      write(0,*)"TEST: cvolea sum for one = ", sum
      call cvol( one, sum )
      write(0,*)"TEST: cvol sum for one = ", sum
c.. cjtest 1-dec-04 for linda end --

      call rnetc

c... model test problem
      call ellip
      call circle
c     return

cLS   if(impp.eq.1.and.ioldinp.ne.1) call wread_mpp
Benchmark options
Compiler options to turn on
-DELLIP in m3d/grid/Makefile
-DHELMHOLTZ in m3d/interface/Makefile
-mhd/driver/test.c: mh3d_test
Runtime options
Ellip.F controlled by elist
Circle.F controlled by clist
M3D elliptic solvers
1-3. Poisson equations for the pure, $\Delta^*$ and $\Delta^\dagger$ operators, each of the form (operator) $u = f$, Dirichlet b.c.
4-6. Helmholtz equations for the same three operators combined with $I$, Dirichlet b.c.
7. Poisson equation, Neumann b.c.
M3D operators
• iselect = 11  dudx: 1st order partial derivative
• iselect = 12  dudz: 1st order partial derivative
• iselect = 13  dxdphi: toroidal derivative
• iselect = 14  cvol: total toroidal volume
• iselect = 15  cvolea: toroidal volume contained in each
• iselect = 16  d2udxdz - d2udzdx: 2nd order derivative commute
• iselect = 17  gradsq: vector inner product
• iselect = 18  gcro: vector cross product
• iselect = 19  delsq: laplacian and bdy line integral
• iselect = 20  div: divergence
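To make the derivative operators concrete, here is a minimal sketch of how a first derivative such as dudx or dudz can be evaluated on a single linear triangle: fit the plane u = a + b*x + c*z through the three nodal values and read off the slopes. The function name `p1_gradient` and the sample triangle are hypothetical, and the sketch does not reproduce how the M3D C operators assemble these quantities.

```python
import numpy as np

def p1_gradient(xz, u):
    """Constant gradient (du/dx, du/dz) of the linear interpolant on one triangle.

    xz: array of shape (3, 2) with vertex coordinates (x, z)
    u:  array of shape (3,) with nodal values
    """
    # Solve [1 x z] [a b c]^T = u for the plane u = a + b*x + c*z.
    M = np.column_stack([np.ones(3), xz])
    a, b, c = np.linalg.solve(M, u)
    return b, c

# Hypothetical triangle and a linear test field u = 1 + 2x + 3z.
xz = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
u = 1.0 + 2.0 * xz[:, 0] + 3.0 * xz[:, 1]
dudx, dudz = p1_gradient(xz, u)
print(dudx, dudz)
```

For a field that is exactly linear the result is exact, which makes this kind of check a convenient first test before comparing against higher order elements.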
Numerical Accuracy (2nd order, RMS)
Operator                      Linear      Regular HO   Lump HO
pure poiss                    .3133E-04   .1824E-10    .2778E-10
star poiss                    .7480E-04   .1741E-07    .9668E-11
dagg poiss                    .7689E-05   .1368E-07    .1316E-11
Helmholtz pure poiss          .8375E-04   .5808E-06    .5921E-11
Helmholtz star poiss          .2019E-03   .1122E-04    .1187E-10
Helmholtz dagg poiss          .3648E-04   .1582E-06    .1542E-10
pure poiss Neumann u_x        .3034E-02   .1049E-03    .1157E-03
pure poiss Neumann u_y        .2290E-02   .7860E-04    .8898E-04
dxdr                          .2424E-03   .7718E-11    .4413E-13
dxdz                          .9665E-03   .1709E-09    .2709E-13
d2xdrdz - d2xdzdr             .9787E-03   .1705E-09    .5611E-11
grad                          .4251E-03   .7760E-13    .4639E-13
gcro                          .3830E-02   .6218E-09    .9326E-14
delsq                         .6927E-03   .9690E-10    .1106E-09
Numerical Efficiency (2nd order)
Operator                      Linear      Regular HO   Lump HO
pure poiss                    11.505164   17.993580    15.881487
star poiss                    11.936641   17.842965    15.577935
dagg poiss                    11.487363   17.065694    15.590550
Helmholtz pure poiss          11.593001   17.850698    15.764700
Helmholtz star poiss          11.827986   17.617935    15.462633
Helmholtz dagg poiss          11.127486   17.504207    15.329060
pure poiss Neumann            11.800331   17.994744    15.368874
dxdr                          0.325041    2.822974     0.443981
dxdz                          0.467021    2.539099     0.419528
d2xdrdz - d2xdzdr             0.560459    9.457784     2.098601
grad                          0.680051    2.715444     0.961536
gcro                          0.234130    2.418649     0.544330
Delsq (Laplacian)             0.355726    6.733015     0.554883
poisson solver scales to # of eqs
Lump HO is used.
Gmres/ilu.
Application of ho code to anisotropic transport on misaligned grids
M3D on X1
• m3dp.x
m3dp_vec.x
• m3dp_fsymm.x
m3dp_fsymm_vec.x ??
• m3dp_fsymm_opt.x
m3dp_fsymm_opt_vec.x ??
Optimizing M3D on X1—cont’d
• PETSc, Matrix Vector Product
• MatMult flops (16 MSP):
  Standard PETSc: 6.81 MFlops
  Optimized PETSc: 54.0 MFlops