s3385 porting marine ecosystem model
TRANSCRIPT
-
7/27/2019 S3385 Porting Marine Ecosystem Model
1/18
Porting Marine Ecosystem Model Spin-Up UsingTransport Matrices to GPUs
E. Siewertsen, J. Piwonski, T. Slawig
CAU - Christian-Albrechts-Universitat zu Kiel
19. Marz 2013
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 1 / 18
http://find/http://goback/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
2/18
Outline
Motivation
Algorithm
Operations
Software Porting to GPU
Biogeochemical model (Fortran) Model driver (C)
Results Remarks
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 2 / 18
http://find/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
3/18
Motivation
Global carbon cycle, CO2 uptake of the worlds oceans
Parameter estimation in biogeochemical models
-180 -135 -90 -45 0 45 90 135 180
Longitude [degrees]
-80
-60
-40
-20
0
20
40
60
80
Latitude
[deg
rees]
0.6
0.8
1.0
1.2
1.4
1.6
1.8
2.0
2.2
Simulated concentration of nutrients (phosphate, P O34
) at surface layer in
mmolP/m3. The longitudinal and latitudinal resolution is at 1.0.
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 3 / 18
http://find/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
4/18
Motivation contd
Ocean Biogeochemical Dynamics [Sarmiento and Gruber, 2006]
System of transport equations for biogeochemical tracers:
yit = (yi)
diffusion
(v yi) advection
+ qi(y, u) reaction
, i = 1, . . .
Climatological (annual periodic) forcing
Solution is steady annual periodic state (equilibrium)
Involves integration over thousands of model years
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 4 / 18
http://find/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
5/18
Motivation contd
Transport Matrix Method [Khatiwala et al., 2005]
Discretized and approximated problem:
yj+1 = A imp,j (A exp,j
transport matrices
yj + t qj(yj ,u)), j = 0, . . .
Monthly averaged matrices provided
Interpolation needed:
A imp,j = j Ai[i,j ] + j Ai[i,j ]A exp,j = j Ae[i,j ] + j Ae[i,j ]
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 5 / 18
http://find/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
6/18
Algorithm
Assuming: 1 year 360 days and t 3 h
1: y = y02: repeat
3: for j = 1, . . . , 2880 do4: evaluate biogeochemical model: yq = qj(y,u)5: interpolate matrices to time step j6: perform explicit step: y = Aexp,j y7: perform implicit step: y = Aimp,j(y + tyq)
8: end for9: until steady annual cycle is reached
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 6 / 18
http://find/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
7/18
Operations
Evaluate biogeochemical model: BGCStep(); Including copying between different data alignments
Interpolate matrices:
MatCopy(); MatScale(); MatAXPY();
Apply explicit and implicit step: MatMult();
99.4% of computational effort is spent by these operations.
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 7 / 18
http://find/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
8/18
Porting to GPU
Software: Metos3D [Piwonski and Slawig, 2013] github.com/metos3d PETSc based implementation in C [Balay et al., 1997] Using PETSc matrix and vector operations Coupling biogeochemical models implemented in Fortran
Objectives: Do not change Fortran implementation at all
Do not change C implementation, if possible
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 8 / 18
http://localhost/var/www/apps/conversion/tmp/scratch_9/github.com/metos3dhttp://localhost/var/www/apps/conversion/tmp/scratch_9/github.com/metos3dhttp://find/http://goback/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
9/18
Biogeochemical model
Fortran implementation: Obtained PGI CUDA Fortran compiler license Using wrapper file model.CUF with Fortran kernels Including original through: #include "model.F" Using macro to change subroutine to
attributes(device) subroutine
Copying between data alignments: Thrust iterators
Operator overloading
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 9 / 18
http://find/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
10/18
Model Driver
Objective: Do not change software implementation
Translates to: Change library beneath
PETSc-dev GPU enabled PETSc version MatMult() is already implemented [Minden et al., 2010] MatCopy(), MatScale(), MatAXPY() have to added ..
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 10 / 18
http://find/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
11/18
PETSc contd
PETSc classes
Basic PETSc principles:
Object oriented programming using C language: Data encapsulation Polymorphism Inheritance
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 11 / 18
http://find/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
12/18
PETSc contd
Class Mat
Operations:
struct MatOps {...
PetscErrorCode (*mult)(Mat,Vec,Vec);
PetscErrorCode (*copy)(Mat,Mat,...);
PetscErrorCode (*scale)(Mat,PetscScalar);
PetscErrorCode (*axpy)(Mat,PetscScalar,Mat,...);
...
};
Adding own implementation:
M->ops->scale = MatScale SeqAIJCUSP;
...
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 12 / 18
http://find/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
13/18
Results
GPU: GeForce GTX 480 CPU: Intel Xeon E5520, running at 2.27 GHz
Biogeochemical model: N-DOP [Dutkiewicz et al., 2005]
Min Max Avg StdDevCPU 621.43 s 626.79 s 622.14 s 0.540GPU 28.17 s 28.20 s 28.18 s 0.003
22.06
Overall performance gain simulating one model year using the
N-DOP model at a longitudinal and latitudinal resolution of2.8125.
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 13 / 18
http://goforward/http://find/http://goback/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
14/18
Results contd
Performance gain per operation:
Routine CPU GPU CPU : GPU
BGCStep 469.76 s 13.05 s 36.00MatCopy 34.04 s 3.91 s 8.70MatScale
23.33 s 1.99 s11.70
MatAXPY 37.49 s 2.89 s 12.96MatMult 58.19 s 5.87 s 9.92
Performance ofMatMult in detail: CPU: 1.2 GFlop/s (13 % of 9.08 GFlop/s), 12 GB/s (56.8 % of 21.2 GB/s) GPU: 11.9 GFlop/s (7 % of 168 GFlop/s), 119.4 GB/s (67.4% of 177 GB/s)
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 14 / 18
http://find/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
15/18
Results contd
B G C S t e p
7 5 . 0 %
M a t C o p y
5 . 4 %
M a t S c a l e
3 . 7 %
M a t A X P Y
6 . 0 %
M a t M u l t
9 . 3 %
O t h e r
N - D O P m o d e l , C P U ( 6 2 6 . 0 7 s / a )
B G C S t e p
4 6 . 0 %
M a t C o p y
1 3 . 8 %
M a t S c a l e
7 . 0 %
M a t A X P Y
1 0 . 2 %
M a t M u l t
2 0 . 7 %
O t h e r
2 . 2 %
N - D O P m o d e l , G P U ( 2 8 . 3 4 s / a )
[ b l o c k s i z e 1 6 0 ]
Distribution of computational time per operation during
the simulation of one model year.
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 15 / 18
l d
http://find/http://goback/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
16/18
Results contd
1 01 7
2 0 2 8 3 0 4 0 5 0 5 6 6 0
p r o c e s s o r c o u n t
2 8
5 0
1 0 0
1 5 0
t
i
m
e
p
e
r
c
y
c
l
e
[
s
]
S i m u l a t i o n o f o n e c y c l e ( N - D O P , 2 . 8 1 2 5 , 2 8 8 0 t i m e s t e p s )
2 . 1 0 G H z A M D B a r c e l o n a ( r z c l u s t e r )
2 . 6 7 G H z I n t e l W e s t m e r e ( r z c l u s t e r )
2 . 9 3 G H z I n t e l G a i n e s t o w n ( H L R N )
G e F o r c e G T X 4 8 0
Performance of GPU simulation compared to different CPU clusters.
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 16 / 18
R k
http://find/http://goback/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
17/18
Remarks
PGI Fortran preprocessor didnt like: #define subroutine attributes(device) subroutine
C preprocessor was used Only sequential PETSc operations were implemented
Ongoing work on multi GPU implementation
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 17 / 18
Th k
http://find/ -
7/27/2019 S3385 Porting Marine Ecosystem Model
18/18
Thanks
Thanks for your attention!
Special thanks to Dr. Ulrich Knechtel
E. Siewertsen, J. Piwonski, T. Slawig (CAU) GPU accelerated PETSc 19. Marz 2013 18 / 18
http://find/