Cache-Optimal Parallel Solution of PDEs
Ch. Zenger, Informatik V, TU München

Finite Element Solution of PDEs

Christoph Zenger
Nadine Dieminger, Frank Günther, Wolfgang Herder, Andreas Krahnke, Miriam Mehl, Tobias Neckel, Markus Pögl, Markus Langlotz, Tobias Weinzierl

Institut für Informatik, TU München

Post on 21-Dec-2015



Problem: Numerical solution of partial differential equations by the finite element method

Numerical kernel: Computation of the product: discrete operator A · approx. solution u

Desired properties:

• Multilevel scheme
• Adaptive
• Efficient on modern computer architectures
• Parallel with good load balance
• Complex geometries

Concepts:

Hierarchical structures

• Informatics: Stacks and trees
• Geometry: Space trees
• Numerics: Hierarchical bases and generating systems
• Mathematics: Space-filling curves

Informatics: Stack and binary tree

[Figure: a stack and a binary tree with numbered nodes]

Ternary tree

[Figure: a ternary tree with nodes numbered 1 to 13]

Geometry:

[Figure: quadtree and ternary space tree]

Dimension-recursive construction of the ternary space tree:

d steps in d dimensions instead of one
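The dimension-recursive idea can be illustrated with a short Python sketch. It is only an illustration under simple assumptions: cells are axis-aligned boxes, and the names `Box`, `split_along`, and `refine_ternary` are hypothetical, not taken from the authors' code. One refinement level is carried out as d successive one-dimensional cuts into three parts, which yields 3^d children.

```python
from typing import List, Tuple

# A box is one (lower, upper) interval per dimension (assumed representation).
Box = Tuple[Tuple[float, float], ...]


def split_along(box: Box, axis: int) -> List[Box]:
    """Cut one box into three equal parts along a single axis."""
    lo, hi = box[axis]
    width = (hi - lo) / 3.0
    parts = []
    for k in range(3):
        interval = (lo + k * width, lo + (k + 1) * width)
        parts.append(box[:axis] + (interval,) + box[axis + 1:])
    return parts


def refine_ternary(box: Box) -> List[Box]:
    """One refinement level as d successive one-dimensional cuts."""
    cells = [box]
    for axis in range(len(box)):
        cells = [child for cell in cells for child in split_along(cell, axis)]
    return cells


unit_square = ((0.0, 1.0), (0.0, 1.0))
children = refine_ternary(unit_square)  # 9 cells in 2D, 27 in 3D
```

Because every step is a one-dimensional operation, the same routine works unchanged for any number of dimensions.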

Hierarchical structures in numerics

Hierarchical basis and generating system

Ternary hierarchical basis

Ternary generating system

Mathematics: Space filling curves

Basic template (Hilbert):

Recursive construction:

Peano curve (dimension recursive):

Basic template:

Recursive construction:

Works for arbitrary dimension
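As an illustration of the dimension-recursive construction, the following Python sketch enumerates the cells of a regular 3^level grid along a Peano curve. It uses the classical base-3 digit formulation: a digit is reflected (d becomes 2 - d) whenever the earlier digits belonging to the other axes sum to an odd number. The function names are hypothetical, and this may differ in orientation from the variant shown on the slides.

```python
from typing import List, Tuple


def peano_point(t: int, level: int, dim: int = 2) -> Tuple[int, ...]:
    """Map curve index t in [0, 3**(level*dim)) to integer grid coordinates."""
    n_digits = level * dim
    # Base-3 digits of t, most significant first; they cycle through the axes.
    digits = [(t // 3 ** (n_digits - 1 - i)) % 3 for i in range(n_digits)]
    coords = [0] * dim
    digit_sum = [0] * dim  # sum of the digits seen so far, per axis
    for i, digit in enumerate(digits):
        axis = i % dim
        # Reflect the digit if the earlier digits of the other axes sum to an odd number.
        others = sum(digit_sum) - digit_sum[axis]
        value = 2 - digit if others % 2 else digit
        coords[axis] = 3 * coords[axis] + value
        digit_sum[axis] += digit
    return tuple(coords)


def peano_curve(level: int, dim: int = 2) -> List[Tuple[int, ...]]:
    """All cells of the 3**level grid in the order of the curve."""
    return [peano_point(t, level, dim) for t in range(3 ** (level * dim))]


first_level = peano_curve(1)  # the 3 x 3 serpentine template
```

Consecutive cells always share a face, so the curve never jumps; the same code handles `dim=3` or higher without changes.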

Space-Trees and Element-Oriented Operator Evaluation

Five-point stencil at the grid point (i,j), with weights 1, 1, -4, 1, 1:

\[
(\Delta_h u_h)_{i,j} = \frac{u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4\,u_{i,j}}{h^2}
\]

Element-oriented evaluation: each of the four cells adjacent to (i,j) contributes the weights ½, -1, ½ for its own two neighbours of (i,j) and for (i,j) itself:

\[
\begin{aligned}
\frac{1}{h^2}\Bigl(\tfrac12 u_{i-1,j} - u_{i,j} + \tfrac12 u_{i,j+1}\Bigr), &\qquad
\frac{1}{h^2}\Bigl(\tfrac12 u_{i+1,j} - u_{i,j} + \tfrac12 u_{i,j+1}\Bigr), \\
\frac{1}{h^2}\Bigl(\tfrac12 u_{i-1,j} - u_{i,j} + \tfrac12 u_{i,j-1}\Bigr), &\qquad
\frac{1}{h^2}\Bigl(\tfrac12 u_{i+1,j} - u_{i,j} + \tfrac12 u_{i,j-1}\Bigr)
\end{aligned}
\]

The four contributions add up to the five-point stencil.
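The following Python sketch compares a node-wise five-point stencil with a cell-wise accumulation in which every cell adds the weights ½, -1, ½ to each of its four vertices. It assumes a uniform square grid stored as a nested list; the function names are hypothetical, and the sketch traverses cells in plain row order rather than along a space-filling curve.

```python
def laplace_nodewise(u, h):
    """Five-point stencil evaluated vertex by vertex (interior vertices only)."""
    n = len(u)
    r = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            r[i][j] = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                       - 4.0 * u[i][j]) / h ** 2
    return r


def laplace_cellwise(u, h):
    """Same operator, accumulated cell by cell using only the cell's own vertices."""
    n = len(u)
    r = [[0.0] * n for _ in range(n)]
    for a in range(n - 1):          # cell with lower-left vertex (a, b)
        for b in range(n - 1):
            for di in (0, 1):
                for dj in (0, 1):
                    i, j = a + di, b + dj    # vertex of the cell being updated
                    ni = a + 1 - di          # its neighbour in i-direction inside the cell
                    nj = b + 1 - dj          # its neighbour in j-direction inside the cell
                    r[i][j] += (0.5 * u[ni][j] - u[i][j] + 0.5 * u[i][nj]) / h ** 2
    return r


n, h = 6, 0.2
u = [[(i * h) ** 2 + (j * h) ** 3 for j in range(n)] for i in range(n)]
node_result = laplace_nodewise(u, h)
cell_result = laplace_cellwise(u, h)
```

On interior vertices, which are touched by four cells, both results agree up to rounding. Boundary vertices receive only partial sums in the cell-wise version and would need separate treatment.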

Concept of stacks:

Space-Trees and Space-Filling Curves

• ordering of cells along the Peano curve
• line stacks with alternating linear (locally deterministic) processing order

Adaptive Space-Trees and Space-Filling Curves

• adaptive grids, generating systems: hiding of points on different levels
• additional colours, point stacks: 8 stacks (independent of refinement depth)

Locality:

Length of a cache line: m bytes
Length of solution vector: s bytes
Minimal number of cache misses: n_min = s/m
Actual number of cache misses: n = 1.1 · n_min
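To make the cache-miss estimate concrete, here is a small Python sketch. The cache line length of 64 bytes and the vector of 10^6 unknowns with 8 bytes each are illustrative assumptions, not values from the slides; only the factor 1.1 is taken from the text. The function name is hypothetical.

```python
def cache_misses(solution_bytes, line_bytes, overhead=1.1):
    """Return (minimal, estimated actual) number of cache misses.

    n_min = s / m is the count for a single sequential sweep over the data;
    the overhead factor 1.1 is the value quoted above.
    """
    n_min = solution_bytes / line_bytes
    return n_min, overhead * n_min


# Assumed example: 10**6 unknowns of 8 bytes each, 64-byte cache lines.
n_min, n_actual = cache_misses(solution_bytes=8 * 10 ** 6, line_bytes=64)
```

With these assumed numbers the lower bound is 125000 misses and the estimate is about 137500.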

Memory efficiency

Essentially only solution data are stored.
Definition of domain and refinement structure: only 2 bits per degree of freedom!
(10^10 unknowns on a PC for Laplace equation)

3D Poisson equation on a cube

[Equation not recoverable: a boundary value problem for u(x) on the cube, with data values 0 and 1]

More complicated Domains

Adaptivity for Complicated Geometries

• arbitrary refinements
• automatic boundary detection

tau-extrapolation

                    2nd order                  extrapolation
H       h       ||e||_L2     ||e||_1      ||e||_L2     ||e||_1
3^-2    3^-3    3.3·10^-3    3.8·10^-3    1.6·10^-3    1.8·10^-3
3^-3    3^-4    3.3·10^-4    3.8·10^-4    2.2·10^-5    2.5·10^-5
3^-4    3^-5    3.5·10^-5    4.4·10^-5    1.6·10^-7    3.3·10^-7
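The convergence orders implied by these errors can be checked with a few lines of Python. The sketch assumes that the first pair of error columns belongs to the plain second-order solution and the second pair to the tau-extrapolated one, and it uses only the L2 errors; the mesh width shrinks by a factor of 3 from row to row.

```python
import math

# L2 errors read from the table above (column assignment is an assumption).
errors_second_order = [3.3e-3, 3.3e-4, 3.5e-5]
errors_extrapolated = [1.6e-3, 2.2e-5, 1.6e-7]


def observed_orders(errors, ratio=3.0):
    """Observed order p from e_coarse / e_fine = ratio**p for successive rows."""
    return [math.log(coarse / fine) / math.log(ratio)
            for coarse, fine in zip(errors, errors[1:])]


orders_second = observed_orders(errors_second_order)
orders_extrap = observed_orders(errors_extrapolated)
```

Under these assumptions the plain solution shows an observed order close to 2, while the extrapolated solution shows an order of roughly 4.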

Adaptivity + Full Multigrid

• fourth order solution for the actual grid
• refinement (hierarchical surplus, tau, dual approach)
• additive V-cycles with tau-extrapolation

Adaptive refinement

Parallelization – Partitioning Using the Peano-Curve

[Figure: grid cells along the Peano curve, divided between process 1 and process 2]
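Partitioning along the curve can be sketched in Python as cutting the curve-ordered cell list into contiguous pieces. The sketch assumes that all cells cost the same, which is a simplification; the function name `partition_along_curve` is hypothetical, and the cell list is the 3 × 3 serpentine template written out by hand.

```python
def partition_along_curve(cells, n_processes):
    """Cut a curve-ordered cell list into contiguous chunks of near-equal size."""
    base, extra = divmod(len(cells), n_processes)
    parts, start = [], 0
    for p in range(n_processes):
        size = base + (1 if p < extra else 0)  # the first `extra` chunks get one more cell
        parts.append(cells[start:start + size])
        start += size
    return parts


# Cells of a 3 x 3 grid in Peano (serpentine) order.
curve_order = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0), (2, 0), (2, 1), (2, 2)]
process_1, process_2 = partition_along_curve(curve_order, 2)
```

Since neighbouring cells on the curve are also neighbours in space, each contiguous chunk forms a connected subdomain.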

Parallelization – Communication

[Figure: communication between process 1 and process 2]

Results – Speedup / Efficiency

• Poisson equation
  – Sphere geometry
  – Static non-regular grid
  – # dof: 23,118,848   # cells: 26,329,806
  – Myrinet cluster

# processes   T (all)    T (comm.)   Parallel speedup   Parallel efficiency
1             3155.18    0           1                  1
2             1614.86    5.37        1.95               0.976
4             845.80     26.53       3.73               0.932
8             460.49     27.48       6.85               0.856
16            243.82     22.74       12.93              0.809
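Speedup and efficiency follow directly from the total run times: speedup is T(1)/T(p), and efficiency is speedup divided by p. A minimal Python sketch using the T (all) values from the table (the function name is hypothetical):

```python
# Total run times T (all) from the table, keyed by number of processes.
timings = {1: 3155.18, 2: 1614.86, 4: 845.80, 8: 460.49, 16: 243.82}


def speedup_and_efficiency(timings):
    """Return {p: (speedup, efficiency)} relative to the single-process run."""
    t1 = timings[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in timings.items()}


results = speedup_and_efficiency(timings)
for p, (speedup, efficiency) in sorted(results.items()):
    print(f"{p:2d} processes: speedup {speedup:.2f}, efficiency {efficiency:.3f}")
```

The recomputed values match the tabulated ones up to differences in the last printed digit.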

Continuity-preserving FE scheme for the Navier-Stokes equation

[Figure: square elements of side length h with velocity unknowns u1 to u4 and v1 to v4 at the points numbered 1 to 4, and additional unknowns u5 and v5 defined in terms of u1, ..., u4, v1, ..., v4; the defining formulas are not recoverable]

Time-dependent 2D Navier-Stokes equation

Reynolds number 2

Time-dependent 2D Navier-Stokes equation

Reynolds number 1000

Re = 100, 729 × 81 grid points, velocity (right), pressure (left)

Conclusion

• higher order
• efficient parallelization
• multigrid
• adaptivity
• complicated geometries
• cache-efficiency
• space tree, Peano curve, stacks
• Navier-Stokes
• fluid-structure interactions
• diffusion equation with non-constant coefficients
• financial pricing
• enhanced boundary treatment

Thank You !
