Post on 21-Dec-2015
Cache-Optimal Parallel Solution of PDEs
Ch. Zenger, Informatik V, TU München
Finite Element Solution of PDEs
Christoph Zenger
Nadine Dieminger, Frank Günther, Wolfgang Herder, Andreas Krahnke, Miriam Mehl, Tobias Neckel, Markus Pögl, Markus Langlotz, Tobias Weinzierl
Institut für Informatik TU München
Problem: Numerical solution of partial differential equations by the finite element method
Numerical kernel: Computation of the product of the discrete operator A and the approximate solution u
Desired properties:
• Multilevel scheme
• Adaptive
• Efficient on modern computer architectures
• Parallel with good load balance
• Complex geometries
Concepts:
Hierarchical structures
• Informatics: Stacks and trees
• Geometry: Space trees
• Numerics: Hierarchical bases and generating systems
• Mathematics: Space filling curves
Informatics: Stack and binary tree
[Figure: a stack with numbered entries and a binary tree with numbered nodes]
Ternary tree
[Figure: ternary tree with nodes numbered 1 to 13]
Geometry:
Quadtree and ternary space tree
Dimension-recursive construction of the ternary space tree:
d steps in d dimensions instead of one
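A minimal sketch of the dimension-recursive refinement, assuming axis-aligned cells stored as (lower corner, upper corner) tuples. The function names `split_along` and `refine` are hypothetical and not taken from the slides. One refinement is carried out as d one-dimensional splits, which yields 3^d children.

```python
def split_along(cell, axis):
    """Split an axis-aligned box into three equal parts along one axis."""
    lo, hi = cell
    third = (hi[axis] - lo[axis]) / 3.0
    parts = []
    for k in range(3):
        new_lo, new_hi = list(lo), list(hi)
        new_lo[axis] = lo[axis] + k * third
        new_hi[axis] = lo[axis] + (k + 1) * third
        parts.append((tuple(new_lo), tuple(new_hi)))
    return parts


def refine(cell):
    """Refine a d-dimensional cell in d steps, one split per dimension."""
    cells = [cell]
    for axis in range(len(cell[0])):
        cells = [part for c in cells for part in split_along(c, axis)]
    return cells


children_2d = refine(((0.0, 0.0), (1.0, 1.0)))            # 9 children
children_3d = refine(((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))  # 27 children
```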
Hierarchical structures in Numerics
Hierarchical basis and generating system
Ternary hierarchical basis
ternary generating system
Mathematics: Space filling curves
Basic template (Hilbert):
Recursive construction:
Peano curve (dimension recursive):
Basic template:
Recursive construction:
Works for arbitrary dimension
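A possible sketch of the recursive Peano construction, restricted to two dimensions for brevity. The function name `peano` and the cell indexing are assumptions for illustration. The basic template is a serpentine through 3 × 3 blocks, and each block holds a mirrored copy of the coarser curve so that consecutive cells always share an edge.

```python
def peano(level):
    """Return the cells of a 3**level x 3**level grid in Peano-curve order."""
    if level == 0:
        return [(0, 0)]
    sub = peano(level - 1)          # curve on the coarser grid
    n = 3 ** (level - 1)            # edge length of one block, in cells
    path = []
    for bx in range(3):
        # Serpentine through the blocks: up, down, up.
        ys = range(3) if bx % 2 == 0 else range(2, -1, -1)
        for by in ys:
            for x, y in sub:
                # Mirror the sub-curve so it starts where the previous block ended.
                lx = n - 1 - x if by % 2 == 1 else x
                ly = n - 1 - y if bx % 2 == 1 else y
                path.append((bx * n + lx, by * n + ly))
    return path


curve = peano(2)  # 81 cells of a 9 x 9 grid
```

Every cell is visited exactly once, and the curve runs from cell (0, 0) to the diagonally opposite corner.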
Space-Trees and Element-Oriented Operator Evaluation
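Five-point stencil at grid point (i,j):
$(\Delta_h u_h)_{i,j} = \frac{u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4u_{i,j}}{h^2}$
Stencil weights: 1 for each of the neighbours (i-1,j), (i+1,j), (i,j-1), (i,j+1) and -4 for (i,j).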
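The five-point stencil can be sketched as follows, assuming a regular grid stored as a nested list `u[i][j]` with mesh width `h`. The function name `laplace_5pt` is hypothetical, and boundary points are simply left at zero in this sketch.

```python
def laplace_5pt(u, h):
    """Apply the five-point stencil at all interior grid points."""
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = (u[i - 1][j] + u[i + 1][j]
                         + u[i][j - 1] + u[i][j + 1]
                         - 4.0 * u[i][j]) / h ** 2
    return out


# The stencil is exact for quadratics: u = x^2 + y^2 has Laplacian 4.
h = 0.5
u = [[(i * h) ** 2 + (j * h) ** 2 for j in range(5)] for i in range(5)]
lap = laplace_5pt(u, h)
```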
Space-Trees and Element-Oriented Operator Evaluation
Element-wise evaluation: each of the four cells adjacent to grid point (i,j) contributes
$\frac{\tfrac12 u_{i\pm1,j} - u_{i,j} + \tfrac12 u_{i,j\pm1}}{h^2}$
to $(\Delta_h u_h)_{i,j}$, where the signs select the two neighbours of (i,j) that lie on edges of that cell.
Cell stencil weights: ½, -1, ½ (shown for the four cells in turn).
Each neighbour lies in two of the four cells, so the contributions sum to the weights 1, 1, -4, 1, 1.
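A minimal sketch of the element-oriented evaluation, assuming a regular grid stored as a nested list and a hypothetical function name `laplace_cellwise`. The loop runs over cells rather than grid points, and each cell adds the weights ½, -1, ½ to each of its four corners. At interior points the accumulated result equals the five-point stencil.

```python
def laplace_cellwise(u, h):
    """Accumulate the discrete Laplacian cell by cell."""
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for ci in range(n - 1):              # cell with lower-left corner (ci, cj)
        for cj in range(m - 1):
            for di in (0, 1):
                for dj in (0, 1):
                    i, j = ci + di, cj + dj   # corner receiving the contribution
                    ni = ci + 1 - di          # neighbour in i-direction within the cell
                    nj = cj + 1 - dj          # neighbour in j-direction within the cell
                    out[i][j] += (0.5 * u[ni][j] - u[i][j]
                                  + 0.5 * u[i][nj]) / h ** 2
    return out


# u = i^2 + 2 j^2 + i j on an integer grid with h = 1: five-point result is 6.
u = [[i * i + 2 * j * j + i * j for j in range(5)] for i in range(5)]
result = laplace_cellwise(u, 1.0)
```

Only the data of one cell is needed at a time, which is what makes this form suitable for a traversal along a space-filling curve.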
Space-Trees and Space-Filling Curves
• ordering of cells along the Peano curve
• line stacks with alternating linear (locally deterministic) processing order
Adaptive Space-Trees and Space-Filling Curves
• adaptive grids, generating systems: hiding of points on different levels
• additional colours, point stacks: 8 stacks (independent of refinement depth)
Locality:
Length of a cache line: m bytes
Length of solution vector: s bytes
Minimal number of cache misses: n_min = s/m
Actual number of cache misses: n = 1.1 · n_min
Memory efficiency
Essentially only solution data are stored.
Definition of domain and refinement structure: only 2 bits per degree of freedom!
(10^10 unknowns on a PC for the Laplace equation)
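As a worked example of these figures, the following sketch evaluates n_min = s/m and the 2 bits per degree of freedom. The cache line length of 64 bytes and the vector of 10^6 double-precision values are assumed for illustration only and do not come from the slides.

```python
def min_cache_misses(s_bytes, m_bytes):
    """Lower bound: every cache line of the solution vector is loaded once."""
    return s_bytes / m_bytes


def structure_bytes(n_dof, bits_per_dof=2):
    """Memory for domain and refinement information at 2 bits per unknown."""
    return n_dof * bits_per_dof / 8


m = 64                      # assumed cache line length in bytes
s = 8 * 10 ** 6             # assumed solution vector: 10^6 doubles
n_min = min_cache_misses(s, m)      # 125000 cache misses at least
n_actual = 1.1 * n_min              # about 137500, per the ratio on the slide
overhead = structure_bytes(10 ** 10)  # 2.5e9 bytes for 10^10 unknowns
```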
3D Poisson equation on a cube
[Problem statement garbled in extraction: equations for u(x) involving the values 1 and 0]
Adaptivity for Complicated Geometries
• arbitrary refinements
• automatic boundary detection
tau-extrapolation
                   2nd order                   extrapolation
H       h       ||e||_L2     ||e||_1       ||e||_L2     ||e||_1
3^-2    3^-3    3.3·10^-3    3.8·10^-3     1.6·10^-3    1.8·10^-3
3^-3    3^-4    3.3·10^-4    3.8·10^-4     2.2·10^-5    2.5·10^-5
3^-4    3^-5    3.5·10^-5    4.4·10^-5     1.6·10^-7    3.3·10^-7
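The observed convergence order can be estimated from the tabulated L2 errors. This sketch assumes that the first pair of error columns belongs to the second-order scheme and the second pair to tau-extrapolation, and that h shrinks by a factor of 3 per row. The function name `observed_orders` is hypothetical.

```python
import math


def observed_orders(errors, ratio=3.0):
    """Estimate the order p from e(h) / e(h / ratio) = ratio**p."""
    return [math.log(a / b) / math.log(ratio)
            for a, b in zip(errors, errors[1:])]


l2_second_order = [3.3e-3, 3.3e-4, 3.5e-5]
l2_extrapolated = [1.6e-3, 2.2e-5, 1.6e-7]

orders_second = observed_orders(l2_second_order)   # roughly 2
orders_extrap = observed_orders(l2_extrapolated)   # roughly 4
```

Under these assumptions the estimates are close to 2 without and close to 4 with tau-extrapolation.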
Adaptivity + Full Multigrid
fourth order solution for the actual grid
refinement (hierarchical surplus, tau, dual approach)
additive V-cycles with tau-extrapolation
Parallelization – Partitioning Using the Peano Curve
[Figure: grid partitions assigned to process 1 and process 2]
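One way to sketch partitioning along the curve: the cells are taken in curve order and cut into contiguous pieces. Balancing by equal cell counts is an assumption here, since the slides do not state the balancing criterion. The function name `partition_curve` is hypothetical, and the cells are represented by their position along the curve.

```python
def partition_curve(cells, n_procs):
    """Cut a curve-ordered cell list into n_procs contiguous, balanced pieces."""
    base, extra = divmod(len(cells), n_procs)
    parts, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)   # first `extra` pieces get one more
        parts.append(cells[start:start + size])
        start += size
    return parts


cells_in_curve_order = list(range(81))   # e.g. a 9 x 9 grid along the curve
parts = partition_curve(cells_in_curve_order, 4)
```

Because each process receives a contiguous section of the curve, its cells form a connected region of the grid.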
Parallelization – Communication
[Figure: communication between process 1 and process 2]
Results – Speedup / Efficiency
• Poisson equation
  – Sphere geometry
  – Static non-regular grid
  – # dof: 23,118,848; # cells: 26,329,806
  – Myrinet cluster

# processes   T (all)    T (comm.)   Parallel speedup   Parallel efficiency
1             3155.18    0           1                  1
2             1614.86    5.37        1.95               0.976
4             845.80     26.53       3.73               0.932
8             460.49     27.48       6.85               0.856
16            243.82     22.74       12.93              0.809
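Speedup and efficiency follow from the total run times in the table as S(p) = T(1)/T(p) and E(p) = S(p)/p. The short sketch below recomputes them; the function name `speedup_and_efficiency` is hypothetical. The recomputed values agree with the tabulated ones up to rounding in the last digit.

```python
def speedup_and_efficiency(timings):
    """Map process count p to (speedup, efficiency) relative to p = 1."""
    t1 = timings[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in timings.items()}


# T (all) from the table, indexed by number of processes.
timings = {1: 3155.18, 2: 1614.86, 4: 845.80, 8: 460.49, 16: 243.82}
results = speedup_and_efficiency(timings)
for p, (s, e) in results.items():
    print(f"{p:2d} processes: speedup {s:.2f}, efficiency {e:.3f}")
```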
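Continuity-preserving FE scheme for the Navier-Stokes equation
[Figure: square cells of side h with nodal values u1, u2, u3, u4 and v1, v2, v3, v4 at the corners and additional values u5, v5; the formulas giving u5 and v5 in terms of u1..u4 and v1..v4 are not recoverable from the transcript]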
Time-dependent 2D Navier-Stokes equation
Reynolds number 2
Time-dependent 2D Navier-Stokes equation
Reynolds number 1000
Re = 100, 729 × 81 grid points, velocity (right), pressure (left)
Conclusion
• higher order
• efficient parallelization
• multigrid
• adaptivity
• complicated geometries
• cache-efficiency
• space tree, Peano curve, stacks
• Navier-Stokes
• fluid-structure interactions
• diffusion equation with non-constant coefficients
• financial pricing
• enhanced boundary treatment