running operational canadian nwp models and assimilation … · 2015. 11. 18. · 3....
TRANSCRIPT
Scalability Workshop ECMWF, Reading, UK April 14-15, 2014
Running operational Canadian NWP models and assimilation systems on next-generation supercomputers Mark Buehner, Michel Desgagné, Abdessamad Qaddouri, Janusz Pudykiewicz, Michel Valin, Martin Charron Meteorological Research Division
The GEM model 1. Grid point lat/lon model 2. Finite differences on an Arakawa-C grid 3. Semi-Lagrangian (poles are an issue) 4. Implicit time discretization
1. Direct solver (Nk 2D horizontal elliptic problems) 2. Full 3D iterative solver based on FGMRES
5. Global uniform, Yin-Yang and LAM configurations 6. Hybrid MPI/OpenMP
1. Halo exchanges 2. Array transposes for elliptic problems
7. PE block partitioning for I/O
Global uniform grid: 1) a challenge for DM implementation 2) many more elliptic problems to solve due to implicit horizontal diffusion (transposes) 3) semi-Lagrangian near the poles 4) current DM implementation will not scale
Yin-Yang grid configuration
• Implemented as 2 LAMs communicating at boundaries
• Optimized Schwarz iterative method for solving the elliptic problem.
• Scales a whole lot better • Operational implementation due • in spring 2015 • Communications are an issue • Exchanging a Global Uniform
scalability problem (poles) by another scalability problem
Yin-Yang 10 km scalability Ni=3160, Nj=1078, Nk=158
H960 H3200 H5056 H1920
Y3200 Y8192 Y16384 Y30968 Y5056
H= CMC/hadar: IBM Power7 Y= NCAR/yellowstone: IBM iDataPlex
16x30x1 198x36
79x32x1 40x34
79x49x4 40x22
32x32x8 99x34 32x32x4
99x34 32x30x1 99x36
40x40x1 79x27
Yin-Yang 10 km scalability Dynamics components
The future of GEM
• Yin-Yang 2km on order 100K cores is already feasible on P7 processors or similar
• Yin-Yang exchanges will need work • Using GPUs capabilities is on the table • Improve Omp scalability • Re-partition MPI sub-domain (bni,bnj,ni,nj,nk) • Export SL interpolations to reduce halo size • Processor mapping to reduce the need to communicate
through the switch • Partition NK • MIMD approach for I/O
Investigating scalability and accuracy on an icosahedral geodesic grid
Spatial discretization: finite volume method on icosahedral geodesic grid Time discretization: exponential integration methods which resolve high frequencies to the required level of tolerance without severe time step restriction Shallow water implementation already shows great scalability Vertical coordinates: Generalized quasi-Lagrangian with conservative monotonic remapping
Unstable jet, icosahedral grid number 6, dx= 112 km, dt=7200 sec (typically 30 sec)
Pudykiewicz (2006), J. Comp. Phys., 213, pp 358-390 Pudykiewicz J. (2011), J. Comp. Phys., 230, pp 1956--1991 Qaddouri A., J. et al. (2012), Q. J. Roy. Met. Soc., 138, pp 989--1003 Clancy C., Pudykiewicz J. (2013), Tellus A, vol. 65
Page 9 – April-16-14
Ensemble-Variational Assimilation: EnVar
• EnVar will replace 4D-Var at Environment Canada in 2014 for both global and regional deterministic prediction systems
• EnVar uses a variational assimilation approach in combination with the already available 4D ensemble covariances from the EnKF
• By using 4D ensembles, EnVar performs a 4D analysis without need of tangent-linear/adjoint of forecast model
• Consequently, it is more computationally efficient and easier to maintain/adapt than 4D-Var:
– EnVar: ~10 min, 320 cores, 50km grid spacing – 4D-Var: ~1hr, 640 cores, 100km grid spacing
• Future improvements to EnKF will benefit both ensemble and deterministic forecasts incentive to increase overall effort on EnKF development
• EnKF scalability examined by Houtekamer et al. (2014, MWR)
Page 10 – April-16-14
• In 4D-Var the 3D analysis increment is evolved in time using the TL/AD forecast model (here included in H4D):
• In EnVar the background-error covariances and increment are explicitly 4-dimensional, resulting in cost function: 4D
14D4D4Db4D
14Db4D4D 2
1)][()][(21)( xBxyxHxRyxHxx ∆∆+−∆+−∆+=∆ −− TT HHJ
EnVar formulation
xBxyxHxRyxHxx ∆∆+−∆+−∆+=∆ −− 14Db4D
14Db4D 2
1)][()][(21)( TT HHJ
Page 11 – April-16-14
Current operational systems
Global EnKF
Global ensemble forecasts (GEPS)
Global deterministic
forecast (GDPS)
Global 4D-Var
2013-2017: Toward a Reorganization of the NWP Suites at Environment Canada
Regional ensemble forecasts (REPS)
Regional deterministic
forecast (RDPS)
Regional 4D-Var
global system regional system
Page 12 – April-16-14
2014 implementation: 4D-Var replaced by EnVar
Global EnKF
Global ensemble forecasts (GEPS)
Global deterministic
forecast (GDPS)
Global EnVar
Background error
covariances
2013-2017: Toward a Reorganization of the NWP Suites at Environment Canada
Regional ensemble forecasts (REPS)
Regional deterministic
forecast (RDPS)
Regional EnVar
global system regional system
Page 13 – April-16-14
4D error covariances Temporal covariance evolution
EnVar (4D ensemble covariance):
4D-Var:
-3h 0h +3h
256 pre-computed forecast model integrations provide 4D background-error covariances
~70 TL and AD model integrations performed as part of assimilation
“analysis time”
Page 14 – April-16-14
Spatial covariance localization in EnVar (Buehner 2005, Lorenc 2003)
• Complexity of 4D-Var is replaced by relatively simple use of 4D ensemble covariances to compute analysis increment and gradient during each iteration of the cost function minimization
• If the spatial localization function is horizontally homogeneous and isotropic, then spectral approach is efficient (for global lat-lon system):
∆x4D = ∑k ek4D o (L1/2 ξk) = ∑k ek
4D o (S-1 Lv
1/2 Lh1/2 ξk)
where: • ek
4D is the kth 4D ensemble deviation • ξk is corresponding control vector • S-1 is the inverse spectral transform • Lh = SLhST is the diagonal spectral horizontal localization matrix
• Equivalent to using sqrt of localized sample covariance matrix: L o[∑k ek (ek)T]
Page 15 – April-16-14
Scalability of EnVar • Computation and I/O must be highly parallel, ek
4D is large: • 256 members, 7 times, 4 vars, 800x400x75L ~640GB
• Calculation of analysis increment (and adjoint) currently about 1/2 of overall cost of minimization: ∆x4D = ∑k ek
4D o (S-1 Lv1/2 Lh
1/2 ξk)
• Currently, Schur product is the most expensive (and simplest) step, but scales perfectly: ∆x4D=∑k ek
4D o αk
• Spectral transform is next most expensive, involves a global transpose: αk=S-1 Lv
1/2 Lh1/2 ξk
• Currently only 1D decomposition: • ξk : split by ens member (256) • ∆x : split by latitude (400)
• Improved scalability requires 2D decomposition (3D transposition strategy) under development
seco
nds
Reading ensemble
≈ + +
END