
Parallel Tetrahedral Mesh Generation

Andrey Chernikov, Ph.D.

Department of Computer Science
Old Dominion University
achernik@cs.odu.edu

Center for Real-Time Computing
http://crtc.wm.edu

May 6, 2014


Acknowledgments

Dr. Nikos Chrisochoides
Dr. Christos Antonopoulos
Dr. Kevin Barker
Dr. Filip Blagojevic
Dr. Xiaoning Ding
Dr. Andriy Fedorov
Daming Feng
Dr. Panagiotis Foteinos
Dr. Andriy Kot
Dr. George Karniadakis
Dr. Leonidas Linardakis
Dr. Dimitrios Nikolopoulos
Scott Pardue
Dr. Scott Schneider
George Zagaris


Outline

1 A Taxonomy of Parallel Delaunay Meshing Algorithms

2 Other Parallel Meshing Algorithms

3 Data Decomposition Based Parallel Delaunay Mesh Refinement

4 Parallel Generalized Delaunay Mesh Refinement

5 Exploring Multilevel Parallelism


Guaranteed Quality Delaunay Mesh Refinement

Theoretical proofs of

boundary conformity (fidelity)

element quality in terms of circumradius-to-shortest-edge ratio ρ: in 2D, $\rho = \frac{1}{2 \sin \theta}$, where θ is the minimum angle

good grading: $\forall p \in T : \frac{\mathrm{lfs}(p)}{R(p)} \le C$

Iterative point insertion

Bowyer, Watson, Ruppert, Chew, Shewchuk 1981–2002
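The 2D relation between the circumradius-to-shortest-edge ratio and the minimum angle can be checked numerically. The sketch below is only an illustration: the function names are hypothetical and not taken from the cited work. It computes ρ directly from the edge lengths and compares it with 1/(2 sin θ) on a 3-4-5 right triangle.

```python
import math

def edge_lengths(a, b, c):
    """Return the three edge lengths of triangle abc."""
    return math.dist(a, b), math.dist(b, c), math.dist(c, a)

def radius_edge_ratio(a, b, c):
    """Circumradius divided by the shortest edge length."""
    la, lb, lc = edge_lengths(a, b, c)
    s = 0.5 * (la + lb + lc)
    area = math.sqrt(s * (s - la) * (s - lb) * (s - lc))  # Heron's formula
    circumradius = la * lb * lc / (4.0 * area)
    return circumradius / min(la, lb, lc)

def min_angle(a, b, c):
    """Smallest interior angle in radians (it lies opposite the shortest edge)."""
    shortest, mid, longest = sorted(edge_lengths(a, b, c))
    cos_theta = (mid * mid + longest * longest - shortest * shortest) / (2.0 * mid * longest)
    return math.acos(cos_theta)

# 3-4-5 right triangle: circumradius 2.5, shortest edge 3
tri = ((0.0, 0.0), (4.0, 0.0), (0.0, 3.0))
rho = radius_edge_ratio(*tri)
theta = min_angle(*tri)
print(rho, 1.0 / (2.0 * math.sin(theta)))  # the two values agree up to rounding
```

An upper bound on ρ is therefore equivalent to a lower bound on the minimum angle in 2D.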


Encroachment in 3D

[Figure: encroachment in 3D, with points p, q, r and vertices v, v′, v′′]

Bowyer, Watson, Ruppert, Chew, Shewchuk 1981–2002

Geometric Race Conditions

[Figure: points p_i and p_j]

Chernikov and Chrisochoides: FINEL’10, SISC’06, ACM ICS’08, ACM ICS’04, IMR’06, IMR’05


Speculative (Optimistic) Delaunay Meshing

[Figure: taxonomy axis, degree of coupling from tight to loose; points p_i and p_j]

Foteinos, Nave, Chew, and Chrisochoides: SoCG’02, IJNME’03, CGTA’04, JPDC’14


Parallel Projection-Based Delaunay Meshing

[Figure: taxonomy axis, degree of coupling from tight to loose]

Kadow and Walkington: TUMG’03, http://www.imr.sandia.gov/papers/tumg4/kadow.zip


Parallel Constrained Delaunay Meshing

[Figure: taxonomy axis, degree of coupling from tight to loose]

[Figure: three panels with labeled points p1 to p15]

[Figure: scaled speedup versus number of processes (2 to 144); curves: linear speedup, PCDM (pipe cross-section), PCDM (cylinder flow), PCDM (Chesapeake Bay)]

Scaled speedup: the number of triangles ≈ 10M × P, that is, about 20M for 2 processors and about 1.4B for 144 processors.

(Extension to 3D subject to availability of a 3D domain decomposer)

Chernikov and Chrisochoides, ACM TOMS’08


Domain Decomposition and Decoupling

[Figure: taxonomy axis, degree of coupling from tight to loose]

Given a domain $\Omega \subset \mathbb{R}^n$, construct the separators $S_{ij} \subset \mathbb{R}^{n-1}$ such that the domain is decomposed into subdomains $\Omega_i$:

$$\Omega = \bigcup_{i=1}^{N} \Omega_i, \qquad \partial\Omega_i \cap \partial\Omega_j = S_{ij}, \qquad i, j = 1, \dots, N,\ i \ne j,$$

while the separators do not create very small angles and other features.

(Has not been extended to 3D)

Linardakis and Chrisochoides: SISC’06, SISC’08, TOMS’08


Parallel Advancing Front Meshing

Idea: Given a final surface mesh of domain D, construct a 3D zone using a pre-computed surface S to guide a single layer along S, starting from any external boundary of D.

Given a source-driven AFT, a zone can be constructed from elements whose size will remain invariant throughout the mesh generation process.

No new features or small angles due to decomposition; therefore any decomposition works.

Caveat: termination is not guaranteed for the sub-problems.

Zagaris, Pirzadeh, and Chrisochoides: AIAA’09

Parallel Terminal Edge Bisection

A terminal edge is the longest edge of every element that shares such an edge.

A terminal star is the set of elements that share a terminal edge.

The stopping criterion is the predefined bound for the length of the terminal edges.

The terminal-star algorithm eliminates the management of non-conforming edges both in the interior of the submeshes and in the interfaces, i.e., it eliminates communication.

Rivara, Pizarro, and Chrisochoides: IMR’04
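The two definitions above can be illustrated on a tiny 2D mesh. This is a minimal sketch under stated assumptions: it works on triangles instead of tetrahedra, the function names are hypothetical, and it shows only how terminal edges and terminal stars are identified, not the bisection algorithm of the cited work.

```python
import math
from itertools import combinations

def longest_edge(tri, pts):
    """Longest edge of a triangle, as a sorted pair of vertex indices.
    Ties are broken arbitrarily (first maximum found)."""
    return max(combinations(sorted(tri), 2),
               key=lambda e: math.dist(pts[e[0]], pts[e[1]]))

def terminal_stars(triangles, pts):
    """Map each terminal edge to its terminal star (the triangles sharing it)."""
    sharing = {}
    for tri in triangles:
        for e in combinations(sorted(tri), 2):
            sharing.setdefault(e, []).append(tri)
    # An edge is terminal if it is the longest edge of every triangle sharing it.
    return {e: star for e, star in sharing.items()
            if all(longest_edge(t, pts) == e for t in star)}

# Unit square split along the diagonal (0, 2)
pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
stars = terminal_stars(triangles, pts)
print(stars)  # the diagonal is the only terminal edge; its star is both triangles
```

Bisecting a terminal edge splits every element of its star at once, so no neighbor is left with a hanging node.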


Parallel Terminal Edge Bisection

Quality is measured as normalized volume / (longest edge)^3.

Rivara, Pizarro, and Chrisochoides: IMR’04
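A measure of this form can be sketched as follows. The slide does not state the normalization constant; the code assumes the volume is scaled by 6√2 so that a regular tetrahedron scores 1. The function name is hypothetical.

```python
import math
from itertools import combinations

def tet_quality(a, b, c, d):
    """Volume / (longest edge)^3, scaled so a regular tetrahedron gives 1.
    The factor 6*sqrt(2) is an assumed normalization, not taken from the slide."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    ad = [d[i] - a[i] for i in range(3)]
    cross = (ac[1] * ad[2] - ac[2] * ad[1],
             ac[2] * ad[0] - ac[0] * ad[2],
             ac[0] * ad[1] - ac[1] * ad[0])
    volume = abs(sum(ab[i] * cross[i] for i in range(3))) / 6.0
    longest = max(math.dist(p, q) for p, q in combinations((a, b, c, d), 2))
    return 6.0 * math.sqrt(2.0) * volume / longest ** 3

regular = tet_quality((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1))
flat = tet_quality((0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0))
print(regular, flat)  # about 1 for the regular tetrahedron, 0 for the degenerate one
```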


Data Decomposition Based Parallel Delaunay Mesh Refinement

[Figure: taxonomy axis, degree of coupling from tight to loose]

No rollbacks

No fine-grain synchronization

Does not require solving the domain decomposition problem

Extended to 3D

Code reuse

Chernikov and Chrisochoides: FINEL’10, SISC’06, ACM ICS’08, ACM ICS’04, IMR’06, IMR’05

Delaunay-Independence Criterion (2D)

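Lemma. Points $p_i$ and $p_j$ are Delaunay-independent iff

$$C(p_i) \cap C(p_j) = \emptyset, \quad \text{and} \quad \forall e(p_m p_n) \in \partial C(p_i) \cap \partial C(p_j) : p_i \notin \bigcirc(\triangle(p_j p_m p_n)).$$

[Figure: points p_i and p_j with the shared edge p_m p_n]

Chernikov and Chrisochoides: SISC’06, ACM ICS’04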
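The two conditions of the 2D lemma can be tested directly on a small triangulation. The sketch below assumes that C(p) denotes the cavity of p, that is, the set of triangles whose circumcircles contain p, which the slide does not spell out. All function names are hypothetical, and floating-point predicates are used for brevity where a robust implementation would use exact ones.

```python
import math
from itertools import combinations

def circumcircle(a, b, c):
    """Circumcenter and circumradius of triangle abc."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), a)

def in_circumcircle(p, a, b, c):
    center, radius = circumcircle(a, b, c)
    return math.dist(p, center) < radius

def cavity(p, pts, tris):
    """Assumed meaning of C(p): triangles whose circumcircle contains p."""
    return {t for t in tris if in_circumcircle(p, *(pts[i] for i in t))}

def boundary_edges(cav):
    """Edges that belong to exactly one triangle of the cavity."""
    count = {}
    for t in cav:
        for e in combinations(sorted(t), 2):
            count[e] = count.get(e, 0) + 1
    return {e for e, n in count.items() if n == 1}

def delaunay_independent(pi, pj, pts, tris):
    ci, cj = cavity(pi, pts, tris), cavity(pj, pts, tris)
    if ci & cj:                      # first condition: disjoint cavities
        return False
    shared = boundary_edges(ci) & boundary_edges(cj)
    # second condition: p_i outside the circumcircle of (p_j, p_m, p_n)
    return all(not in_circumcircle(pi, pj, pts[m], pts[n]) for m, n in shared)

h = math.sqrt(3.0) / 2.0  # strip of equilateral triangles
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.5, h), (1.5, h), (2.5, h)]
tris = [(0, 1, 4), (1, 4, 5), (1, 2, 5), (2, 5, 6), (2, 3, 6)]
far = delaunay_independent((0.4, 0.2), (2.6, 0.2), pts, tris)
adjacent = delaunay_independent((0.4, 0.2), (1.0, 0.8), pts, tris)
overlapping = delaunay_independent((0.4, 0.2), (0.6, 0.2), pts, tris)
print(far, adjacent, overlapping)
```

In the example, the far pair passes both conditions, the adjacent pair has disjoint cavities but fails the test on the shared edge, and the overlapping pair fails the first condition.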


Delaunay-Independence Criterion (3D)

Lemma. Points $p_i$ and $p_j$ are Delaunay-independent iff

$$C(p_i) \cap C(p_j) = \emptyset, \quad \text{and} \quad \forall \xi \in \partial C(p_i) \cap \partial C(p_j) : p_i \notin \bigcirc(\tau(p_j \xi)).$$

[Figure: points p_i, p_j, p_k, p_l, p_m, p_n, p_r, the shared face ξ, and the circumspheres ◯(τ(p_r ξ)), ◯(τ(p_j ξ)), ◯(τ(p_m ξ))]

Chernikov and Chrisochoides: ACM ICS’08

Sufficient Condition for Uniform PDR

[Figure: two points separated by a distance ≥ 4r]

Lemma (Sufficient condition of strong Delaunay-independence in 2D). If

$$\|p_i - p_j\| \ge 4r,$$

then $p_i$ and $p_j$ are strongly Delaunay-independent.

Chernikov and Chrisochoides: SISC’06, ACM ICS’04
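The distance condition gives a cheap filter for choosing points that may be inserted concurrently. The sketch below is a greedy selection and not the scheduling used in the cited papers. The bound r is passed in as a parameter because the slide does not define it; the function name is hypothetical.

```python
import math

def select_independent(candidates, r):
    """Greedily keep candidates that are pairwise at least 4r apart."""
    chosen = []
    for p in candidates:
        if all(math.dist(p, q) >= 4.0 * r for q in chosen):
            chosen.append(p)
    return chosen

candidates = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (4.0, 3.0), (8.0, 0.0)]
batch = select_independent(candidates, r=1.0)
print(batch)  # (1, 0) and (4, 3) are rejected: each is closer than 4r to a chosen point
```

Points left out of a batch are simply retried in a later round.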


Uniform PDR: Scaled Speedup (2D)

[Figure: speedup versus number of processors (4 to 121) for the unit square and the pipe cross-section; point annotations: 91M, 179M, 295M, 175M, 352M, 588M, 441M, 874M]

$$\text{Scaled Speedup} = \frac{P \times T_1(W)}{T_P(P \times W)}$$

3.64M triangles per processor (pipe), 8.39M triangles per processor (unit square)

Chernikov and Chrisochoides: SISC’06, ACM ICS’04
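The scaled speedup formula can be evaluated as in the short sketch below. The timings are made-up placeholders and not measurements from the cited papers; only the per-processor triangle count (3.64M for the pipe) comes from the slide.

```python
def scaled_speedup(p, t1_w, tp_pw):
    """P * T_1(W) / T_P(P * W): the work grows in proportion to the processor count."""
    return p * t1_w / tp_pw

def total_triangles(per_processor, p):
    """Mesh size when every processor handles the same amount of work."""
    return per_processor * p

# Hypothetical timings: 10 s for work W on 1 processor, 12.5 s for 4W on 4 processors
speedup = scaled_speedup(4, 10.0, 12.5)
pipe_size = total_triangles(3.64e6, 25)
print(speedup, pipe_size)  # speedup 3.2; about 91M triangles on 25 processors
```

With 3.64M triangles per processor, 25 processors correspond to about 91M triangles.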


Hybrid Speculative / Data Decomposition Approach

Image-to-mesh conversion

Implementing an efficient parallel Delaunay refinement algorithm targeting DSM NUMA architectures will help to improve the understanding of the characteristic challenges of irregular applications on distributed shared memory supercomputers consisting of thousands or millions of cores, and will help the community gain insight into a whole family of problems characterized by unpredictable communication patterns. In mesh generation, the research and development cycle for industrial-strength codes often takes a hundred or more man-years [7]. Therefore, rewriting parallel mesh generation codes is extremely expensive. In addition, due to geometric dependencies, there is no known feasible approach to automatic compile-time analysis and parallelization [8, 9].

In this paper we propose a three-dimensional Parallel Uniform Locality Optimized Speculative Delaunay Image-to-Mesh Conversion algorithm, PLODC. PLODC employs an n-step data locality optimization scheme to reduce the communication overhead caused by a large number of remote memory accesses. It also takes full advantage of our previous image-to-mesh conversion approach, PODM, which recovers the tissues’ boundaries and generates quality tetrahedral meshes. PODM introduces low-level locking mechanisms, carefully designed contention managers, and well-suited load balancing schemes to boost performance at the cost of very little overhead.

The experimental evaluation was done on Blacklight, the cache-coherent NUMA shared memory machine at the Pittsburgh Supercomputing Center. We observed a weak scaling efficiency of more than 67% on 192 cores, compared to only about 30% for PODM, the previous code. PODM suffers from communication overhead caused by a large number of remote memory accesses. To our knowledge, this is the best scalability result for three-dimensional isosurface-based parallel mesh generation algorithms running on NUMA DSM supercomputers with quality and fidelity guaranteed. Fig. 1 shows an example mesh created by our algorithm.

Fig. 1. An example mesh created by our algorithm. The input image is the CT abdominal atlas obtained from IRCAD Laparoscopic Center. The boundaries of all tissues are well recovered. The zoomed-in part of the mesh shows that the good quality of each element is guaranteed.

The rest of the paper is organized as follows. In Section II we present the background of Delaunay mesh refinement and review the related prior work. In Section III we present the implementation of our n-step data locality optimized parallel mesh generation algorithm, PLODC. We also estimate the number of remote accesses of both PODM and PLODC and show that the number of remote accesses of PLODC is much smaller than that of PODM. In Section IV we show and analyze the experimental results of our approach, and Section V concludes the paper.

II. BACKGROUND AND RELATED WORK

In the case of bio-engineering applications, since we start with images, and due to the limitations of PLC-based methods [10, 11], it is best to avoid the initial generation of the PLC and immediately proceed with volume mesh generation. Because isosurface-based methods recover and mesh the isosurface during refinement, they do not suffer from any angle constraints [12, 13].

Delaunay refinement algorithms work by inserting additional (so-called Steiner) points into an existing mesh to improve the quality of the elements. In Delaunay mesh refinement, the computation depends on the input geometry and changes as the algorithm progresses. The basic operation is the insertion of a single point, which leads to the removal of a poor-quality tetrahedron and of several adjacent tetrahedra from the mesh and the insertion of several new tetrahedra. The new tetrahedra may or may not be of poor quality and, hence, may or may not require further point insertions. It is proven that the algorithm eventually terminates after having eliminated all poor-quality tetrahedra, and in addition the termination does not depend on the order of processing of poor-quality tetrahedra, even though the structure of the final meshes may vary [11, 14]. The insertion of a point is often implemented according to the well-known Bowyer-Watson kernel [15, 16]. The parallel insertion of points by different threads needs to be synchronized.

The problem of parallel Delaunay triangulation of a specified point set has been solved by Blelloch et al. [17]. They describe a divide-and-conquer projection-based algorithm for constructing Delaunay triangulations of pre-defined point sets in parallel. One approach, named Parallel Delaunay Refinement (PDR), which we published previously [18, 19], is based on a theoretically proven method to choose the points for insertion, so that we can guarantee their independence and thus avoid runtime data dependencies and overheads. This approach is based on the analysis of the dependencies between the inserted points and requires neither runtime checks nor geometry decomposition. However, PDR is conservative in leveraging available concurrency because its theoretical guarantees require a sufficiently refined mesh before the parallelism can be safely started. PODM, on the other hand, is more aggressive in leveraging parallelism at the cost of run-time checks for data dependencies.

An idea of updating partition boundaries when inserted points happen to be close to them was presented by Chew et al. [20] and extended in [21] as a Parallel Constrained Delaunay Meshing (PCDM) algorithm. In PCDM, the edges on the boundaries of submeshes are fixed (constrained), and if a new

Feng, Chernikov, and Chrisochoides: SC’14 (in review)

Parallel Euclidean Image Distance Transform

Best known previous work

4 Intel Xeon 1.5 GHz processors with 2 SMT threads per processor

2 GB of RAM

Max image size $300^3$

Mean speedup 3.3

Efficiency per thread 0.413

Our recent work

40-core Intel Xeon node at 2.2 GHz

128 GB of RAM

Max image size $1250^3$

Mean speedup 19

Efficiency per thread 0.475

Staubs et al.: IJ’06; Pardue, Chernikov, and Chrisochoides: Capstone’14
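The efficiency figures follow from dividing the mean speedup by the number of threads. The sketch below assumes 8 threads for the earlier setup (4 processors with 2 SMT threads each) and one thread per core for the 40-core node; the second assumption is not stated on the slide.

```python
def efficiency_per_thread(speedup, threads):
    """Parallel efficiency: speedup divided by the number of threads."""
    return speedup / threads

previous = efficiency_per_thread(3.3, 4 * 2)  # 4 processors x 2 SMT threads
recent = efficiency_per_thread(19.0, 40)      # assumed: one thread per core
print(previous, recent)  # close to the 0.413 and 0.475 quoted above
```

The small difference in the first value comes from the mean speedup itself being rounded to 3.3.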


Sufficient Condition for Graded PDR (2D)

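Lemma (Sufficient condition of Delaunay-independence). Points $p_i$ and $p_j$ are Delaunay-independent if there exists a subsegment $s \subseteq \mathcal{L}(p_i p_j)$ such that

$$\forall t \in T : s \cap \bigcirc(t) \ne \emptyset \implies 2r(t) \le |s|.$$

[Figure: points p_i and p_j with subsegment s]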
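For one candidate subsegment, the condition of the lemma can be checked as sketched below. The code assumes that ◯(t) is the open circumdisk of triangle t and takes the circumcircles as precomputed (center, radius) pairs. The names are hypothetical, and the search for a suitable subsegment is left out.

```python
import math

def point_segment_distance(c, a, b):
    """Distance from point c to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(c, a)
    t = ((c[0] - a[0]) * dx + (c[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(c, (a[0] + t * dx, a[1] + t * dy))

def satisfies_graded_condition(s_start, s_end, circumcircles):
    """True if every circumdisk meeting s has diameter at most |s|."""
    length = math.dist(s_start, s_end)
    for center, radius in circumcircles:
        intersects = point_segment_distance(center, s_start, s_end) < radius
        if intersects and 2.0 * radius > length:
            return False
    return True

s = ((0.0, 0.0), (4.0, 0.0))
small = [((2.0, 0.5), 1.0), ((2.0, 10.0), 3.0)]  # the second disk does not meet s
ok = satisfies_graded_condition(*s, small)
too_big = satisfies_graded_condition(*s, small + [((1.0, 0.0), 2.5)])
print(ok, too_big)
```

A large circumdisk is harmless as long as it does not reach the subsegment, which is what allows graded meshes.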


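Sufficient Condition for Graded PDR (3D)

Lemma (Sufficient condition of Delaunay-independence). Points $p_i$ and $p_j$ are Delaunay-independent if there exists a subsegment $s \subseteq \mathcal{L}(p_i p_j)$ such that

$$\forall \tau \in T : \; s \cap \bigcirc(\tau) \neq \emptyset \implies 2 r(\tau) \le |s|.$$

[Figure: 3D configuration with points $p_i$, $p_j$, $p_k$, $p_l$, $p_m$, $p_n$, $p_r$, the subsegment $s$, a point $\xi$, and the circumballs $\bigcirc(\tau(p_r \xi))$ and $\bigcirc(\tau(p_m \xi))$]

Chernikov and Chrisochoides: ACM ICS’08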


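Data Distribution and Buffer Zones

Lemma (Sufficient condition of Delaunay-independence). Points $p_i$ and $p_j$ are Delaunay-independent if there exists a subsegment $s \subseteq \mathcal{L}(p_i p_j)$ such that

$$\forall t \in T : \; s \cap \bigcirc(t) \neq \emptyset \implies 2 r(t) \le |s|.$$

Remark

[Figure: two panels showing $p_i$, $p_j$, and the subsegment $s$, with additional labels $w$ and $r$]

Chernikov and Chrisochoides: ACM ICS’08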


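α-neighborhood

Adjacency directions:

$\Lambda_x = \{\text{Left}, \text{Right}\}$, $\Lambda_y = \{\text{Top}, \text{Bottom}\}$, $\Lambda_z = \{\text{Back}, \text{Front}\}$.

[Figure: leaf $L$]

Definition. The α-neighborhood $N_\alpha(L)$ of leaf $L$ ($\alpha \in \Lambda_x \cup \Lambda_y \cup \Lambda_z$) is the set of octree leaves that share a face with $L$ and are located in the α direction of $L$.

Chernikov and Chrisochoides: ACM ICS’08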
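A simplified illustration of the α-neighborhood, assuming a uniform grid of equally sized cells instead of an octree, so that each neighborhood holds at most one cell. In an octree with leaves of different sizes the set may contain several leaves. The axis sign convention (Top as +y, Front as +z) and all names are assumptions made for this sketch.

```python
# Offsets for the six adjacency directions (sign convention is an assumption).
DIRECTIONS = {
    "Left": (-1, 0, 0), "Right": (1, 0, 0),    # Lambda_x
    "Bottom": (0, -1, 0), "Top": (0, 1, 0),    # Lambda_y
    "Back": (0, 0, -1), "Front": (0, 0, 1),    # Lambda_z
}

def alpha_neighborhood(cell, alpha, shape):
    """Cells that share a face with `cell` in direction `alpha`.

    `cell` is an (i, j, k) index and `shape` the grid dimensions.
    On a uniform grid the result holds one cell, or none at the boundary.
    """
    offset = DIRECTIONS[alpha]
    neighbor = tuple(c + d for c, d in zip(cell, offset))
    inside = all(0 <= c < n for c, n in zip(neighbor, shape))
    return {neighbor} if inside else set()
```

For example, on a 2 × 2 × 2 grid the corner cell (0, 0, 0) has an empty Left neighborhood and a Right neighborhood containing only (1, 0, 0).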


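Buffer Zone (2D)

[Figure: leaf $L$]

Definition. The 2D buffer zone is the set of leaves

$$\mathrm{BUF}(L) = \bigcup_{\alpha \in \Lambda_x} N_\alpha(L) \;\cup\; \bigcup_{\beta \in \Lambda_y} \{\, N_\beta(L') \mid L' \in N_\alpha(L) \,\}$$

under the condition

$$\forall L' \in \mathrm{BUF}(L),\ \forall \tau \in T : \; \bigcirc(\tau) \cap L' \neq \emptyset \implies r(\tau) < \tfrac{1}{4}\,\ell(L').$$

Chernikov and Chrisochoides: FINEL’10, IMR’06, IMR’05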
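The size condition on buffer-zone leaves can be sketched as follows. The sketch assumes that $\ell(L')$ is the side length of an axis-aligned square leaf, that each triangle is represented only by its circumdisk (center and radius), and that the disk is closed. The function names and the `(x0, y0, side)` leaf encoding are hypothetical.

```python
import math

def disk_intersects_square(cx, cy, r, x0, y0, side):
    """True if the closed disk (cx, cy, r) meets the square [x0, x0+side] x [y0, y0+side]."""
    nearest_x = min(max(cx, x0), x0 + side)  # closest point of the square to the center
    nearest_y = min(max(cy, y0), y0 + side)
    return math.hypot(cx - nearest_x, cy - nearest_y) <= r

def leaf_satisfies_condition(leaf, circumdisks, fraction=0.25):
    """Check r < fraction * side for every circumdisk that meets the leaf.

    `leaf` is (x0, y0, side); `circumdisks` is an iterable of (cx, cy, r).
    The default fraction 1/4 is the 2D bound stated on the slide.
    """
    x0, y0, side = leaf
    return all(
        r < fraction * side
        for cx, cy, r in circumdisks
        if disk_intersects_square(cx, cy, r, x0, y0, side)
    )
```

Circumdisks that do not reach the leaf are ignored, so a large triangle far from the buffer zone does not invalidate it.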


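Buffer Zone (3D)

Definition. The 3D buffer zone is the set of leaves

$$\mathrm{BUF}(L) = \bigcup_{\alpha \in \Lambda_x} N_\alpha(L) \;\cup\; \bigcup_{\beta \in \Lambda_y} \{\, N_\beta(L') \mid L' \in N_\alpha(L) \,\} \;\cup\; \bigcup_{\gamma \in \Lambda_z} \{\, N_\gamma(L'') \mid L'' \in N_\beta(L') \mid L' \in N_\alpha(L) \,\}$$

under the condition

$$\forall L' \in \mathrm{BUF}(L),\ \forall \tau \in T : \; \bigcirc(\tau) \cap L' \neq \emptyset \implies r(\tau) < \tfrac{1}{6}\,\ell(L').$$

Chernikov and Chrisochoides: ACM ICS’08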


Delaunay-Separated Regions

Definition. Let two regions (leaves) $L_i$ and $L_j$ be called Delaunay-separated with respect to mesh $\mathcal{M}$ iff arbitrary points $p_i \in L_i$ and $p_j \in L_j$ are strongly Delaunay-independent.

[Figure: leaves $L_i$ and $L_j$]

Lemma. If $L_i$ and $L_j$ are quadtree leaves and $L_j \notin \mathrm{BUF}(L_i)$, then $L_i$ and $L_j$ are Delaunay-separated.

Chernikov and Chrisochoides: FINEL’10, SISC’06, ACM ICS’08, ACM ICS’04, IMR’06, IMR’05


PGDR Software Design Diagram

[Diagram: components labeled Serial Delaunay Refinement, Point Selection, Element Scheduling, Coarse Grain Scheduling, and Multithreaded Memory Allocator]


Multithreaded Memory Management

Allocator: Characteristics

standard: glibc, thread-safe for MPCDM and PDR, or optimized sequential for PCDM, $2^n$ object sizes

tcmalloc (Google): no headers, locks, $2^n$ object size classes

Streamflow: no headers, no locks, lock-free page block recycling, 4|8× object size classes

custom: application-specific

custom + page manager: custom allocator uses Streamflow’s page manager for block allocations

Chernikov et al.: ICNGG’07
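To illustrate what "$2^n$ object size classes" means, the sketch below rounds a request up to the nearest power of two. This is a generic illustration of the idea, not the actual policy of glibc, tcmalloc, or Streamflow; the function name and the minimum class of 8 bytes are assumptions.

```python
def size_class(nbytes: int, min_size: int = 8) -> int:
    """Round a request up to the nearest power-of-two size class.

    `min_size` is assumed to be a power of two itself.
    """
    if nbytes <= 0:
        raise ValueError("request size must be positive")
    size = max(nbytes, min_size)
    # (size - 1).bit_length() is the exponent of the smallest power of two >= size.
    return 1 << (size - 1).bit_length()
```

With power-of-two classes, every request for the same class can be served from the same free list, at the cost of up to roughly half of each block being unused.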


Performance Evaluation (2D)

[Plot, left: time in seconds versus number of compute threads (1 to 4); series: Triangle (sequential) and granularity = 1, 2, 3, 4, ∞; marked values on the time axis: 23.5, 42.2, 54.3]

[Plot, right: time in seconds per thread number (0 to 4), broken down into mesh refinement, quadtree refinement, refinement queue updates, and idle time]

Pipe cross-section, 17M triangles, $A(x, y) = 10^{-4}\left(\sqrt{(x - 200)^2 + (y - 200)^2} + 1\right)$

Chernikov and Chrisochoides: IMR’06


Performance Evaluation (3D)

The bat, 5.8M tetrahedra,

$$r = \begin{cases} 0.25 & \text{if } (-19.89 < x < 6.50) \wedge (-5.65 < y < 8.05) \wedge (-5.61 < z < 5.61) \\ 0.50 & \text{otherwise} \end{cases}$$

Unit cube, 1.9M tetrahedra, $r = 0.015$.

| Model | 1 compute thread | 2 compute threads | 3 compute threads | 4 compute threads |
|---|---|---|---|---|
| Bat | 235.2 | 142.1 | 120.0 | 111.2 |
| Unit cube | 77.2 | 47.0 | 37.06 | 32.29 |

Chernikov and Chrisochoides: ACM ICS’08
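Speedup and efficiency can be derived from the table, assuming its entries are running times and taking the one-thread run of the same code as the baseline (the slide does not state the unit or the baseline). The names below are illustrative.

```python
# Values copied from the table; interpreting them as running times is an assumption.
timings = {
    "Bat": [235.2, 142.1, 120.0, 111.2],
    "Unit cube": [77.2, 47.0, 37.06, 32.29],
}

def speedups(times):
    """Speedup relative to the first (one-thread) entry."""
    return [times[0] / t for t in times]

for model, times in timings.items():
    for threads, s in enumerate(speedups(times), start=1):
        print(f"{model}: {threads} thread(s), speedup {s:.2f}, efficiency {s / threads:.2f}")
```

Under these assumptions, four threads give a speedup of about 2.1 for the bat and about 2.4 for the unit cube.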


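Selection of Steiner Points: Circumcenter

[Figure: diagram with nodes labeled Triangle (circumcenter) and Subsegment (midpoint), and factors $\times \rho\ (\ge \sqrt{2})$, $\times \frac{1}{\sqrt{2}}$, and $\times \frac{1}{2 \cos \alpha}$]

Frey: IJNME’87; Ruppert: JA’95; Chew: SoCG’93; Shewchuk: SoCG’98, CGTA’02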
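A small sketch of the quantities behind circumcenter insertion in 2D: the circumcenter of a triangle and its circumradius-to-shortest-edge ratio. It assumes that ρ on the slide denotes the bound on this ratio, so that a triangle whose ratio exceeds the bound (here $\sqrt{2}$) is a candidate for refinement at its circumcenter. Function names are hypothetical.

```python
import math

def circumcenter(a, b, c):
    """Circumcenter of the 2D triangle abc."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0.0:
        raise ValueError("degenerate triangle")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy)

def radius_edge_ratio(a, b, c):
    """Circumradius divided by the length of the shortest edge."""
    r = math.dist(circumcenter(a, b, c), a)
    shortest = min(math.dist(a, b), math.dist(b, c), math.dist(c, a))
    return r / shortest

def is_skinny(a, b, c, rho_bar=math.sqrt(2.0)):
    """True if the triangle violates the assumed ratio bound rho_bar."""
    return radius_edge_ratio(a, b, c) > rho_bar
```

A right isosceles triangle has a ratio of about 0.71 and passes the test, while a flat triangle such as (0, 0), (1, 0), (0.5, 0.05) has a ratio of about 5 and would be refined.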


Selection of Steiner Points: Other Approaches

[Figure: four panels with point labels such as $c$, $p_i$, $p_k$, $p_l$, $p_m$, $a$, $b$, $o$]

Avoiding slivers [Chew: SoCG’97]

Reducing mesh size [Üngör: LATIN’04]

Removing slivers [Li and Teng: SoDA’01; Li: TCS’03]

Longest edge propagation [Rivara: IMR’06]


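Other Steiner Points?

[Figure: diagram with two nodes labeled "Other points?" and factors $\times \rho\ (\ge \sqrt{2})$, $\times \frac{1}{\sqrt{2}}$, and $\times \frac{1}{2 \cos \alpha}$]

Chernikov, Chrisochoides, and Foteinos: SISC’12, SISC’10, SISC’09, ACM ICS’08, IMR’07, IMR’06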


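Selection Balls

[Figure: three panels with vertices $p$, $q$, $r$ (and $s$ in the third), a point $c$, and radius labels $r(1 - \delta_1)$, $r(1 - \delta_2)$, and $r(1 - \delta_3)$]

Chernikov, Chrisochoides, and Foteinos: SISC’12, SISC’10, SISC’09, ACM ICS’08, IMR’07, IMR’06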
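A minimal sketch of a selection-ball membership test, based only on the figure labels. It assumes that each selection ball is centered at the point $c$ (taken to be the circumcenter of the element being split), that $r$ is the circumradius, and that $0 \le \delta_k < 1$. These are assumptions for illustration; the cited papers give the precise definitions.

```python
import math

def in_selection_ball(point, center, circumradius, delta):
    """True if `point` lies in the ball of radius circumradius * (1 - delta) around `center`.

    Assumption: 0 <= delta < 1, so the ball is a shrunken copy of the circumball.
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta is assumed to lie in [0, 1)")
    return math.dist(point, center) <= circumradius * (1.0 - delta)
```

Under these assumptions the center itself always passes the test, so circumcenter insertion remains one admissible choice among the points of the ball.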


PGDR Approach

sequential methods

conformity to selection balls

parallel methods

Chernikov and Chrisochoides: ACM ICS’08

Selection Balls And Fidelity to Boundaries

Chernikov, Chrisochoides, and Foteinos: SISC’12, SISC’10, SISC’09, ACM ICS’08, IMR’07, IMR’06

Multilevel Architecture

Integration of many processing elements in a single system

Hierarchical, multilevel designs

System-level: Large-scale Clusters

Node-level: Small-scale Nodes

Chip-level: SMT / Multicore / GPU

Antonopoulos, Ding, Chernikov, Blagojevic, Nikolopoulos, and Chrisochoides: JPDC’09a, JPDC’09b, ACM ICS’05


Parallel Constrained Delaunay Meshing — Coarse-grain

Parallelism at the subdomain level

MPI-style implementation

Appropriate for distributed-memory clusters

High degree of parallelism

Coarse granularity (seconds / minutes)

Antonopoulos, Ding, Chernikov, Blagojevic, Nikolopoulos, and Chrisochoides: JPDC’09a, JPDC’09b, ACM ICS’05

Optimistic Delaunay Meshing — Medium-grain

Parallelism at the cavity level

Shared-memory implementation

Medium granularity (2 msec)

Appropriate for shared-memory SMTs / Multicores / GPUs with many execution units per device

Requires conflict detection and rollback

Antonopoulos, Ding, Chernikov, Blagojevic, Nikolopoulos, and Chrisochoides: JPDC’09a, JPDC’09b, ACM ICS’05
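One common way to realize "conflict detection and rollback" for cavity-level parallelism is to lock the elements of a cavity with non-blocking attempts and release everything on the first failure. The sketch below shows that pattern only; it is an assumption for illustration, not necessarily the scheme used in the cited work, and the class and function names are hypothetical.

```python
import threading

class Element:
    """A mesh element guarded by its own lock."""
    def __init__(self, ident):
        self.ident = ident
        self.lock = threading.Lock()

def try_acquire_cavity(cavity):
    """Try to lock every element of the cavity without blocking.

    Returns True if all locks were taken. On the first conflict, releases
    the locks acquired so far (rollback) and returns False.
    Assumes `cavity` lists each element at most once.
    """
    held = []
    for element in cavity:
        if element.lock.acquire(blocking=False):
            held.append(element)
        else:
            for e in held:  # conflict detected: undo partial acquisition
                e.lock.release()
            return False
    return True

def release_cavity(cavity):
    """Release the locks after the cavity has been retriangulated."""
    for element in cavity:
        element.lock.release()
```

A thread whose attempt fails can retry later or pick a different bad element, so two overlapping cavities are never modified at the same time.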


Optimistic Delaunay Meshing — Medium-grain

Available Concurrency

[Plot: concurrently expandable cavities (thousands, 0 to 1000) versus cavities already expanded (thousands, 1 to 451)]

Statistical estimation of the available parallelism throughout the execution life of a medium-grain PCDM, when 32 to 512 processors are used. The lower and upper curves correspond to the minimum and maximum estimation respectively.

High degree of parallelism

Up to 512 execution contexts can be exploited

Average of 400 cavities per thread (worst case scenario)

Promising preliminary results

Antonopoulos, Ding, Chernikov, Blagojevic, Nikolopoulos, and Chrisochoides: JPDC’09a, JPDC’09b, ACM ICS’05


Optimistic Delaunay Meshing — Fine-grain

[Figure: points $p_i$, $p_k$, $p_l$, $p_m$, $p_n$, $p_s$, $p_t$]

Parallelism at the element level

Shared-memory implementation

Extremely fine granularity (4-6 usec), too fine for user-level thread libraries

Parallel / sequential phases alternating every 7-10 usec

Low degree of parallelism

Appropriate for SMTs / CMPs, specifically for the ones with few execution contexts

Antonopoulos, Ding, Chernikov, Blagojevic, Nikolopoulos, and Chrisochoides: JPDC’09a, JPDC’09b, ACM ICS’05

References I

Christos Antonopoulos, Filip Blagojevic, Andrey Chernikov, Nikos Chrisochoides, and Dimitrios Nikolopoulos. A multigrain Delaunay mesh generation method for multicore SMT-based architectures. Journal of Parallel and Distributed Computing, 69:589–600, 2009.

Christos Antonopoulos, Filip Blagojevic, Andrey Chernikov, Nikos Chrisochoides, and Dimitrios Nikolopoulos. Algorithm, software, and hardware optimizations for Delaunay mesh generation on simultaneous multithreaded architectures. Journal of Parallel and Distributed Computing, 69:601–612, 2009.

Christos Antonopoulos, Xiaoning Ding, Andrey Chernikov, Filip Blagojevic, Dimitrios Nikolopoulos, and Nikos Chrisochoides. Multigrain parallel Delaunay mesh generation: challenges and opportunities for multithreaded architectures. In ACM International Conference on Supercomputing, pages 367–376, Cambridge, MA, June 2005.

Adrian Bowyer. Computing Dirichlet tessellations. Computer Journal, 24:162–166, 1981.

Andrey Chernikov, Christos Antonopoulos, Nikos Chrisochoides, Scott Schneider, and Dimitrios Nikolopoulos. Experience with memory allocators for parallel mesh generation on multicore architectures. In International Conference on Numerical Grid Generation in Computational Field Simulations, Forth, Crete, Greece, September 2007. Published on CD-ROM.

Andrey Chernikov and Nikos Chrisochoides. Practical and efficient point insertion scheduling method for parallel guaranteed quality Delaunay refinement. In ACM International Conference on Supercomputing, pages 48–57, Saint-Malo, France, June 2004.

Andrey Chernikov and Nikos Chrisochoides. Parallel 2D graded guaranteed quality Delaunay mesh refinement. In International Meshing Roundtable, pages 505–517, San Diego, CA, September 2005.

Andrey Chernikov and Nikos Chrisochoides. Generalized Delaunay mesh refinement: from scalar to parallel. In International Meshing Roundtable, pages 563–580, Birmingham, AL, September 2006.

References II

Andrey Chernikov and Nikos Chrisochoides. Parallel guaranteed quality Delaunay uniform mesh refinement. SIAM Journal on Scientific Computing, 28:1907–1926, November 2006.

Andrey Chernikov and Nikos Chrisochoides. Three-dimensional semi-generalized point placement method for Delaunay mesh refinement. In International Meshing Roundtable, pages 25–44, Seattle, WA, October 2007.

Andrey Chernikov and Nikos Chrisochoides. Algorithm 872: parallel 2D constrained Delaunay mesh generation. ACM Transactions on Mathematical Software, 34:6–25, January 2008.

Andrey Chernikov and Nikos Chrisochoides. Three-dimensional Delaunay refinement for multi-core processors. In ACM International Conference on Supercomputing, pages 214–224, Island of Kos, Greece, June 2008.

Andrey Chernikov and Nikos Chrisochoides. Generalized two-dimensional Delaunay mesh refinement. SIAM Journal on Scientific Computing, 31:3387–3403, 2009.

Andrey Chernikov and Nikos Chrisochoides. A template for developing next generation parallel Delaunay refinement methods. Finite Elements in Analysis and Design, 46:96–113, 2010.

Andrey Chernikov and Nikos Chrisochoides. Generalized insertion region guides for Delaunay mesh refinement. SIAM Journal on Scientific Computing, 34:A1333–A1350, 2012.

L. Paul Chew. Guaranteed-quality triangular meshes. Technical Report TR89983, Cornell University, Computer Science Department, 1989.

References III

L. Paul Chew. Guaranteed quality mesh generation for curved surfaces. In Proceedings of the 9th ACM Symposium on Computational Geometry, pages 274–280, San Diego, CA, 1993.

L. Paul Chew. Guaranteed-quality Delaunay meshing in 3D. In Proceedings of the 13th ACM Symposium on Computational Geometry, pages 391–393, Nice, France, 1997.

Nikos Chrisochoides and Démian Nave. Parallel Delaunay mesh generation kernel. International Journal for Numerical Methods in Engineering, 58:161–176, 2003.

Panagiotis Foteinos and Nikos Chrisochoides. High quality real-time image-to-mesh conversion for finite element simulations. Journal of Parallel and Distributed Computing, 74(2):2123–2140, 2014.

Panagiotis Foteinos, Andrey Chernikov, and Nikos Chrisochoides. Fully generalized 2D constrained Delaunay mesh refinement. SIAM Journal on Scientific Computing, 32:2659–2686, 2010.

Daming Feng, Andrey Chernikov, and Nikos Chrisochoides. Parallel uniform data-locality optimized speculative Delaunay image-to-mesh conversion. In ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis, New Orleans, LA, 2014. In review.

William H. Frey. Selective refinement: A new strategy for automatic node placement in graded triangular meshes. International Journal for Numerical Methods in Engineering, 24(11):2183–2200, 1987.

Paul-Louis George and Houman Borouchaki. Delaunay Triangulation and Meshing. Application to Finite Elements. HERMES, 1998.

References IV

Clemens Kadow and Noel Walkington. Design of a projection-based parallel Delaunay mesh generation and refinement algorithm. In 4th Symposium on Trends in Unstructured Mesh Generation, Albuquerque, NM, July 2003. http://www.andrew.cmu.edu/user/sowen/usnccm03/agenda.html.

Leonidas Linardakis and Nikos Chrisochoides. Delaunay decoupling method for parallel guaranteed quality planar mesh refinement. SIAM Journal on Scientific Computing, 27(4):1394–1423, 2006.

Leonidas Linardakis and Nikos Chrisochoides. Algorithm 870: A static geometric medial axis domain decomposition in 2D Euclidean space. ACM Transactions on Mathematical Software, 34(1):1–28, 2008.

Leonidas Linardakis and Nikos Chrisochoides. Graded Delaunay decoupling method for parallel guaranteed quality planar mesh generation. SIAM Journal on Scientific Computing, 30(4):1875–1891, March 2008.

Xiang-Yang Li. Generating well-shaped d-dimensional Delaunay meshes. Theoretical Computer Science, 296(1):145–165, 2003.

Xiang-Yang Li and Shang-Hua Teng. Generating well-shaped Delaunay meshes in 3D. In Proceedings of the 12th annual ACM-SIAM symposium on Discrete algorithms, pages 28–37, Washington, D.C., 2001.

Démian Nave, Nikos Chrisochoides, and L. Paul Chew. Guaranteed-quality parallel Delaunay refinement for restricted polyhedral domains. In Proceedings of the 18th ACM Symposium on Computational Geometry, pages 135–144, Barcelona, Spain, 2002.

Démian Nave, Nikos Chrisochoides, and L. Paul Chew. Guaranteed-quality parallel Delaunay refinement for restricted polyhedral domains. Computational Geometry: Theory and Applications, 28:191–215, 2004.

References V

Scott Pardue, Nikos Chrisochoides, and Andrey Chernikov. Scalability of a parallel arbitrary-dimensional image distance transform. In Modeling, Simulation, and Visualization Student Capstone Conference, Suffolk, VA, April 2014. Virginia Modeling, Analysis and Simulation Center.

Maria-Cecilia Rivara. A study on Delaunay terminal edge method. In Proceedings of the 15th International Meshing Roundtable, pages 543–562, Birmingham, AL, September 2006. Springer.

Maria-Cecilia Rivara, Daniel Pizarro, and Nikos Chrisochoides. Parallel refinement of tetrahedral meshes using terminal-edge bisection algorithm. In 13th International Meshing Roundtable, pages 427–436, Williamsburg, VA, September 2004.

Jim Ruppert. A Delaunay refinement algorithm for quality 2-dimensional mesh generation. Journal of Algorithms, 18(3):548–585, 1995.

Robert Staubs, Andriy Fedorov, Leonidas Linardakis, Benjamin Dunton, and Nikos Chrisochoides. Parallel N-dimensional exact signed Euclidean distance transform. Insight Journal, 2006. http://hdl.handle.net/1926/307.

Jonathan Richard Shewchuk. Tetrahedral mesh generation by Delaunay refinement. In Proceedings of the 14th ACM Symposium on Computational Geometry, pages 86–95, Minneapolis, MN, 1998.

Jonathan Richard Shewchuk. Delaunay refinement algorithms for triangular mesh generation. Computational Geometry: Theory and Applications, 22(1–3):21–74, May 2002.

Alper Üngör. Off-centers: A new type of Steiner points for computing size-optimal guaranteed-quality Delaunay triangulations. In Proceedings of LATIN, pages 152–161, Buenos Aires, Argentina, April 2004.

References VI

David F. Watson. Computing the n-dimensional Delaunay tessellation with application to Voronoi polytopes. Computer Journal, 24:167–172, 1981.

George Zagaris, Shahyar Pirzadeh, and Nikos Chrisochoides. A framework for parallel unstructured grid generation for practical aerodynamic simulations. In 47th AIAA Aerospace Sciences Meeting, Orlando, FL, January 2009.