towards achieving gpu-native adaptive mesh reﬁnement€¦ · daino (2016) the amr algorithm...

Towards achieving GPU-native adaptive mesh refinement

Ania Brown

Prof Takayuki Aoki

Why AMR?

AMR is not GPU friendly

• Complicated, time varying data structures

• Can you use AMR and keep GPU performance?My conclusion: yes, but it’s messy

Contents

• Introduction to the algorithm + data structures

• The challenges

• Optimisation possibilities

• The RSE perspective — some lessons learnt

Problem domain

• Stencil calculations on a square structured mesh

• Cell centre values

i, j+1

i+1, ji-1, j i, j

i, j-1

1) Block structured AMR

2) Tree based AMRSimulation mesh Refinement representation

Simulation mesh Refinement representation

2) Tree based AMR

… …

2) Tree based AMR

… …… …

2) Tree based AMR

3) Patches +Tree AMR

Block-structured Enzo CHOMBO

Octree RAMSES

Octree + patches FLASH, using PARAMESH NIRVANA

CPU GPU

Octree + patches GAMER (2011) Daino (2016)

The AMR algorithmInitialize data structures

Create halo regions

Update patch values

Refine/coarsen patches

Output values for visualisation

Main loop:

Update neighbour relations

Initialize data structures

Main loop:

CPU GPU

Create halo regions

Update patch values

Update step

1 CUDA block

Update step

• Tune block size for coalesced access

• Zmarching

Main loop:

CPU GPU

Create halo regions

Update patch values

Main loop:

CPU GPU

Create halo regions

Update patch values

Ordering leaves

Hilbert curve

At each step:

• Divide space into 4

• Replace each quadrant with rotated or reflected versions of the original curve

• Connect such that that start and end points remain the same

Rules for refinement

Neighbour relations

Leaf nodes:Neighbour indices in each direction Parent index

Parent nodes: Child indices

Create halo regionsFind correct neighbour node

Create halo regionsCopy halo values

Interpolating halo values

Reducing halo values

Main loop:

CPU GPU

Refine/coarsen patch values

Defragment value array

Find patches to coarsen/refine

Coarsen/refine step

Defragment value array refine node

Defragment value array coarsen node

Calculate new defragmented position

• Input: for all nodes, flag whether that node is to be refined, coarsened or unchanged

• Refined: +3Coarsened: -3Unchanged: 0

• For each element, sum all preceding elements in the array

• For n nodes, requires n/2 threads and O(log2(n)) serial steps

Multi-GPULoad balancing

Boundaries between subdomains

Node 0

Node 1

Node 2

Node 0

Node 1

Node 2

How to distribute tree

Software design

• Code framework — allow user to edit/add functions for initialisation, resolution criterion, stencil calc

• Code generation — annotated regular data structures

• How much to offer? — cell/node centre, interpolation level, stencil type

Software development process

• Unit testing

• Verification

• Profiling

Code by: T.Shimokawabe, T.Takaki (2011)

Phase field model of dendritic solidification in a binary alloy

7 refinement levels in quad-tree

Regular mesh

Adaptive mesh

Performance testing for dendritic solidification model

256 x 256

L = 1.5⇥ 10�3m

R = 4.5⇥ 10�4m

�xmin = 6⇥ 10�6m

= 1.2⇥ 10�5m

8192 x 8192

L = 1.5⇥ 10�3m

R = 4.5⇥ 10�4m

= 1.2⇥ 10�5m

�xmin = 1.9⇥ 10�7m

Performance testing for dendritic solidification model

Resolution

8192AMR Regular Mesh Worst case AMR

256512

In summary

• Patch based tree-AMR

• For quick gains, offload update step to GPU

• GPU-native version possible — values on GPU, neighbour relations on CPU

• Likely won’t be a one size fits all fix

Governing PDEs for phase field model

�(x, y, t)

c = (1� �)cL + �cSc(x, y, t)

: phase

: liquid concentration: solid concentration

@c@t = r · [DS�rcS +DL(1� �)rcL]

Diffusion Interface anisotropy

Chemical driving force Phase change

= �M

r · (a2r�) +

|r�|2) + @

|r�|2)

|r�|2)� S�T

dp(�)

dq(�)

ap(�), q(�)DS , DL

: mobility: interface anisotropy

: interpolating functionM�

ap(�), q(�)DS , DL

q(�) : double well function: diffusion in solid, liquid

: entropy of fusion

: temperature: height of double well potential

towards achieving gpu-native adaptive mesh reﬁnement€¦ · daino (2016) the amr algorithm...

Documents

fast...

cem: coarsened exact matching in stata - gary king · of...

hbs trigrid v 1€¦ · when prompted to coarsen mesh, just...

efﬁcient iterative linear-quadratic approximations...

the$double$bass:$...

food webs: experts consuming families of...

daino: a high-level framework for parallel and efficient amr...

marble - plitkazavr.ru · marble porcelain clay pasta...

new pterosaur tracks (pteraichnidae) from the late...

school sport in european cities milan -italy€¦ ·...

tms184b - diabetes research institute...where it all began...

self-supervised learning of interpretable keypoints from...

colores · 2019-04-03 · travertino reale travertino reale...

generate, segment and reﬁne: towards generic manipulation...

delfino daino drago dominator eroe orso · 2013. 7. 4. ·...

fotos gnosis workshop · 2020. 7. 14. · fotos2...

ing. gianluca daino -...

geometric constrained joint lane segmentation and lane...

ianni assimo aino appellee - ks courts · daino argued that...

baldocer: tradición y vanguardia. - azulejos calderón ·...