1 Louis Komzsik PARENG- 2011
State of the art distributed parallel computational techniques in industrial finite element analysis
Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering
Dr. Louis Komzsik
Siemens PLM Software, USA
Ajaccio, France
April 12-15, 2011
Scope of presentation
Introduction to industrial analysis
Geometric domain decomposition
Distributed computational solutions
Parallel computational kernels
Application case studies
Conclusions and future work
Industrial complexity – constantly increasing
Engine block: 1,000,000 elements
Car: 30,000 parts
Jet engine: 10,000 parts
Factory: 10,000 machines
Computer hardware – constantly changing
Cray computer: $15 million; O(1) gigaflops; 1,000 sold
Multi-core CPU: $150; O(100) gigaflops; 100 million sold
Lifecycle simulations
Designer view
Analyst view
Multidisciplinary solutions
Designer view
Analyst view
High performance requirements
The constrained stiffness matrix of an analysis problem
• Number of rows: 35,734,709
• Nonzero terms: 1,384,305,995
• Nonzero terms in sparse factor matrix: 43,827,004,000
• Memory used during factorization: 1,080,732,000 (4 byte) words
• Actual elapsed time of sparse factorization on a single high performance processor: 335 minutes
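To put these figures in proportion, the short Python sketch below derives a few ratios from them. The derived quantities (average nonzeros per row, fill ratio of the factor, memory in gigabytes) are not on the slide; they are simple arithmetic on the quoted numbers, and the variable names are chosen here for illustration.

```python
# Figures quoted on the slide
n_rows = 35_734_709
nnz_matrix = 1_384_305_995
nnz_factor = 43_827_004_000
memory_words = 1_080_732_000      # 4-byte words
elapsed_minutes = 335

# Derived quantities (not on the slide, plain arithmetic)
avg_nnz_per_row = nnz_matrix / n_rows        # sparsity of the stiffness matrix
fill_ratio = nnz_factor / nnz_matrix         # growth of nonzeros in the factor
memory_gb = memory_words * 4 / 1e9           # decimal gigabytes
elapsed_hours = elapsed_minutes / 60

print(f"average nonzeros per row: {avg_nnz_per_row:.1f}")
print(f"factor fill ratio:        {fill_ratio:.1f}")
print(f"factorization memory:     {memory_gb:.2f} GB")
print(f"elapsed time:             {elapsed_hours:.2f} hours")
```

The factor holds roughly thirty times as many nonzero terms as the matrix itself, which is why the factorization dominates the cost on a single processor.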
Single level geometric domain decomposition
• Subdivide large geometry domains into a limited number of partitions
• Computations in the geometry partitions are dependent
• Minimize the boundary size of each partition with respect to its interior
• Minimize the total boundary size, since boundary data must be communicated
[Figure: partitions assigned to Proc 1, Proc 2, ..., Proc k]
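The two minimization goals can be made concrete with a small Python sketch. It counts interior and boundary nodes for two hypothetical ways of cutting an 8 x 8 grid into four partitions. The grid, the two cuts, and the definition of a boundary node (a node with at least one neighbour in another partition) are assumptions made for illustration, not taken from the presentation.

```python
from itertools import product

def grid_graph(nx, ny):
    """Adjacency of an nx-by-ny grid of nodes with 4-neighbour connectivity."""
    adj = {}
    for i, j in product(range(nx), range(ny)):
        nbrs = []
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < nx and 0 <= b < ny:
                nbrs.append((a, b))
        adj[(i, j)] = nbrs
    return adj

def boundary_stats(adj, part):
    """Return {partition: (interior_count, boundary_count)}."""
    stats = {}
    for node, nbrs in adj.items():
        p = part[node]
        interior, boundary = stats.get(p, (0, 0))
        if any(part[m] != p for m in nbrs):
            boundary += 1          # touches another partition
        else:
            interior += 1
        stats[p] = (interior, boundary)
    return stats

def total_boundary(stats):
    return sum(boundary for _, boundary in stats.values())

adj = grid_graph(8, 8)
strips = {(i, j): i // 2 for (i, j) in adj}                   # four 2x8 strips
blocks = {(i, j): 2 * (i // 4) + (j // 4) for (i, j) in adj}  # four 4x4 blocks

strip_stats = boundary_stats(adj, strips)
block_stats = boundary_stats(adj, blocks)
print("strips:", strip_stats, "total boundary", total_boundary(strip_stats))
print("blocks:", block_stats, "total boundary", total_boundary(block_stats))
```

Both cuts give partitions of equal size, but the compact blocks have far fewer boundary nodes than the strips, so they would need less communication.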
Multi-level geometry domain decomposition
Single level
• Subdivide large geometry domains into a limited number of partitions
• Subdivide the partitions into sub-partitions and dynamically reduce them to their collectors
• Assemble the multilevel substructures to obtain the engineering solution
• The total number of substructures may exceed the number of processors
Finite element problem domain decomposition
Based on model or matrices

| Graph | Matrix | FE model |
| --- | --- | --- |
| Vertices | Diagonal terms | Node points |
| Edges | Off-diagonals | Elements |
| Undirected | Symmetric | Linear |
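Graphs and matrices
Finite element model and its stiffness matrix
[Figure: membrane element 1 and membrane element 2 on node points 1 to 5]
$$
K = \begin{bmatrix}
2k & -3k & 0 & k & 0 \\
-3k & 6k & 0 & -3k & 0 \\
0 & 0 & 2k & -3k & k \\
k & -3k & -3k & 8k & -3k \\
0 & 0 & k & -3k & 2k
\end{bmatrix}
$$
Graph model and its Laplacian matrix
[Figure: graph on vertices 1 to 5]
$$
L = \begin{bmatrix}
2 & -1 & 0 & -1 & 0 \\
-1 & 2 & 0 & -1 & 0 \\
0 & 0 & 2 & -1 & -1 \\
-1 & -1 & -1 & 4 & -1 \\
0 & 0 & -1 & -1 & 2
\end{bmatrix}
$$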
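As an illustration of the graph-to-matrix correspondence, the Python sketch below assembles a graph Laplacian from an edge list: each vertex degree goes on the diagonal and each edge contributes a -1 off-diagonal pair. The edge list is an assumption, inferred from the nonzero pattern of the 5 x 5 Laplacian on this slide (vertex 4 connected to all others, plus edges 1-2 and 3-5); the function name is hypothetical.

```python
import numpy as np

# Edge list inferred from the Laplacian's nonzero pattern (1-based vertices)
edges = [(1, 2), (1, 4), (2, 4), (3, 4), (3, 5), (4, 5)]

def graph_laplacian(n, edges):
    """Assemble the Laplacian of an undirected graph with n vertices."""
    lap = np.zeros((n, n), dtype=int)
    for a, b in edges:
        i, j = a - 1, b - 1
        lap[i, j] -= 1        # off-diagonal terms: one per edge, symmetric
        lap[j, i] -= 1
        lap[i, i] += 1        # diagonal terms: vertex degree
        lap[j, j] += 1
    return lap

L = graph_laplacian(5, edges)
print(L)
```

Every row sums to zero, and the matrix is symmetric because the graph is undirected.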
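Partitioning technology
Spectral bisection method
$$
L u_2 = \lambda_2 u_2: \quad
\begin{bmatrix}
2 & -1 & 0 & -1 & 0 \\
-1 & 2 & 0 & -1 & 0 \\
0 & 0 & 2 & -1 & -1 \\
-1 & -1 & -1 & 4 & -1 \\
0 & 0 & -1 & -1 & 2
\end{bmatrix}
\begin{bmatrix} -1/2 \\ -1/2 \\ 1/2 \\ 0 \\ 1/2 \end{bmatrix}
= 1 \cdot
\begin{bmatrix} -1/2 \\ -1/2 \\ 1/2 \\ 0 \\ 1/2 \end{bmatrix}
$$
Vertex cut result
[Figure: the five-vertex graph before and after the vertex cut]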
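A minimal numerical sketch of spectral bisection on this 5 x 5 Laplacian follows. It computes the eigenvector of the second smallest eigenvalue with NumPy and splits the vertices by the sign of their components; treating a zero component as a separator vertex is the usual reading of a vertex cut, stated here as an assumption. The tolerance and variable names are illustrative.

```python
import numpy as np

L = np.array([
    [ 2, -1,  0, -1,  0],
    [-1,  2,  0, -1,  0],
    [ 0,  0,  2, -1, -1],
    [-1, -1, -1,  4, -1],
    [ 0,  0, -1, -1,  2],
], dtype=float)

# eigh returns eigenvalues in ascending order for symmetric matrices
eigenvalues, eigenvectors = np.linalg.eigh(L)
lambda2 = eigenvalues[1]          # second smallest eigenvalue
u2 = eigenvectors[:, 1]           # its eigenvector (sign is arbitrary)

tol = 1e-9
separator = [i + 1 for i, v in enumerate(u2) if abs(v) <= tol]
side_a = [i + 1 for i, v in enumerate(u2) if v > tol]
side_b = [i + 1 for i, v in enumerate(u2) if v < -tol]

print("lambda_2 =", round(lambda2, 6))
print("separator vertices:", separator)
print("partitions:", side_a, side_b)
```

Because an eigenvector is only defined up to sign, which of the two vertex groups ends up in `side_a` may differ between runs or libraries; the split itself does not change.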
Recursive graph partitioning
Coarsening, partitioning and refining phases
[Figure: a nine-vertex graph passing through the coarsening, partitioning and refining phases, ending in Partition 1 and Partition 2]
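The coarsening phase can be sketched in Python as one round of greedy vertex matching: each unmatched vertex is merged with one unmatched neighbour, and a partition found on the coarse graph is then projected back to the original vertices. This is a simplified stand-in, not the presenter's algorithm; the six-vertex path graph, the function names and the coarse partition are hypothetical, and the refining phase is omitted.

```python
def coarsen(adj):
    """One coarsening step by greedy matching.

    adj maps each vertex to the set of its neighbours.
    Returns (coarse_adj, mapping) where mapping[v] is the coarse vertex of v.
    """
    mapping = {}
    next_id = 0
    for v in sorted(adj):
        if v in mapping:
            continue
        mapping[v] = next_id
        for w in sorted(adj[v]):
            if w not in mapping:
                mapping[w] = next_id      # merge w with v
                break
        next_id += 1
    coarse = {c: set() for c in range(next_id)}
    for v, nbrs in adj.items():
        for w in nbrs:
            cv, cw = mapping[v], mapping[w]
            if cv != cw:                  # drop edges inside a merged pair
                coarse[cv].add(cw)
                coarse[cw].add(cv)
    return coarse, mapping

def project(mapping, coarse_part):
    """Carry a partition of the coarse graph back to the fine vertices."""
    return {v: coarse_part[c] for v, c in mapping.items()}

# Hypothetical example: a path graph 1-2-3-4-5-6
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
coarse_adj, mapping = coarsen(adj)
coarse_part = {0: "A", 1: "A", 2: "B"}    # a partition chosen on the coarse graph
fine_part = project(mapping, coarse_part)
print(coarse_adj)
print(fine_part)
```

In a full multilevel scheme the coarsening would be repeated until the graph is small, and the projected partition would be improved by a local refinement pass at every level.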
Distributed memory parallel architecture
• Cluster of high performance workstations
• Distributed memory workstation
• Dedicated I/O devices
• High level parallelism
• Feasible number of nodes: 16-1024
Recursive matrix partitioning
[Figure: geometric problem and the corresponding partitioning hierarchy]
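Distributed normal modes analysis
Physical problem
$$(K - \lambda M)\Phi = 0$$
Partitioned form (symmetric, upper triangle shown)
$$
\begin{bmatrix}
K_{oo}^{1}-\lambda M_{oo}^{1} & & K_{ot}^{1,3}-\lambda M_{ot}^{1,3} & & & & \\
 & K_{oo}^{2}-\lambda M_{oo}^{2} & K_{ot}^{2,3}-\lambda M_{ot}^{2,3} & & & & \\
 & & K_{tt}^{3}-\lambda M_{tt}^{3} & & & & K_{tt}^{3,7}-\lambda M_{tt}^{3,7} \\
 & & & K_{oo}^{4}-\lambda M_{oo}^{4} & & K_{ot}^{4,6}-\lambda M_{ot}^{4,6} & \\
 & & & & K_{oo}^{5}-\lambda M_{oo}^{5} & K_{ot}^{5,6}-\lambda M_{ot}^{5,6} & \\
 & & & & & K_{tt}^{6}-\lambda M_{tt}^{6} & K_{tt}^{6,7}-\lambda M_{tt}^{6,7} \\
 & & & & & & K_{tt}^{7}-\lambda M_{tt}^{7}
\end{bmatrix}
\begin{bmatrix}
\phi_{o}^{1} \\ \phi_{o}^{2} \\ \phi_{t}^{3} \\ \phi_{o}^{4} \\ \phi_{o}^{5} \\ \phi_{t}^{6} \\ \phi_{t}^{7}
\end{bmatrix}
= 0
$$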
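The superscripts of the partitioned form imply a hierarchy: components 1 and 2 couple through boundary 3, components 4 and 5 through boundary 6, and boundaries 3 and 6 through the final boundary 7. The Python sketch below derives a phase-by-phase schedule from that tree. The mapping of the four leaf components to processors 1 to 4 is an assumption for illustration, and the function names are hypothetical.

```python
# Child -> parent (boundary) component, read from the coupling superscripts
parent = {1: 3, 2: 3, 4: 6, 5: 6, 3: 7, 6: 7, 7: None}

def children(node):
    return [c for c, p in parent.items() if p == node]

def level(node):
    """Height above the leaves; leaf components are level 1."""
    kids = children(node)
    return 1 if not kids else 1 + max(level(c) for c in kids)

def leaves(node):
    """Leaf components below (or equal to) a node."""
    kids = children(node)
    return [node] if not kids else [leaf for c in kids for leaf in leaves(c)]

# Components that can be processed concurrently in each phase
phases = {}
for node in parent:
    phases.setdefault(level(node), []).append(node)

# Assumed assignment: one processor per leaf component, in tree order
processor = {leaf: k + 1 for k, leaf in enumerate(leaves(7))}
groups = {node: [processor[leaf] for leaf in leaves(node)] for node in parent}

for phase in sorted(phases):
    for node in phases[phase]:
        print(f"phase {phase}: component {node} on processors {groups[node]}")
```

Under these assumptions the four leaf components run on one processor each, each second-level boundary is handled by a pair of processors, and the final boundary by all four.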
Phase 1
Processor 1, Processor 2, Processor 3, Processor 4
Start
Communicate
Phase 2
Processors 1-2
Processors 3-4
Start
Communicate
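Phase 3
Processors 1-2-3-4
Start
Solve reduced order problem
$$(\tilde{K} - \lambda \tilde{M})\tilde{\Phi} = 0$$
Recover physical solution
$$
\tilde{\Phi} =
\begin{bmatrix}
\tilde{\phi}_{q}^{1} \\ \tilde{\phi}_{q}^{2} \\ \tilde{\phi}_{q}^{3} \\ \tilde{\phi}_{q}^{4} \\ \tilde{\phi}_{q}^{5} \\ \tilde{\phi}_{q}^{6} \\ \tilde{\phi}_{q}^{7}
\end{bmatrix}
\rightarrow
\Phi =
\begin{bmatrix}
\phi_{o}^{1} \\ \phi_{o}^{2} \\ \phi_{t}^{3} \\ \phi_{o}^{4} \\ \phi_{o}^{5} \\ \phi_{t}^{6} \\ \phi_{t}^{7}
\end{bmatrix}
$$
(The slide shows the recovery in two steps, passing through an intermediate partitioned vector with the same o and t components.)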
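The solve-then-recover pattern can be sketched on a tiny model in Python. The presentation does not say which reduction is used; the sketch below swaps in static condensation (Guyan reduction) as the simplest example of reducing to a boundary set t, solving the reduced eigenproblem, and expanding back to all degrees of freedom. The 7-DOF spring chain, the choice of retained DOFs and the helper name are all assumptions.

```python
import numpy as np

def generalized_eigh(K, M):
    """Solve K x = lambda M x for symmetric K and positive definite M."""
    C = np.linalg.cholesky(M)            # M = C C^T
    C_inv = np.linalg.inv(C)
    lam, y = np.linalg.eigh(C_inv @ K @ C_inv.T)
    return lam, C_inv.T @ y              # back-transform: x = C^-T y

# Hypothetical model: chain of 7 unit masses and unit springs, fixed ends
n = 7
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.eye(n)

t = [1, 3, 5]                            # retained (boundary) DOFs
o = [i for i in range(n) if i not in t]  # interior DOFs to be reduced

K_oo = K[np.ix_(o, o)]
K_ot = K[np.ix_(o, t)]

# Transformation from boundary DOFs to all DOFs (static condensation)
T = np.zeros((n, len(t)))
T[o, :] = -np.linalg.solve(K_oo, K_ot)
T[t, :] = np.eye(len(t))

# Reduced order problem
K_red = T.T @ K @ T
M_red = T.T @ M @ T
lam_red, q = generalized_eigh(K_red, M_red)

# Recover the physical solution on all DOFs
Phi = T @ q

lam_full, _ = generalized_eigh(K, M)
print("reduced eigenvalues:", np.round(lam_red, 4))
print("lowest full eigenvalues:", np.round(lam_full[:3], 4))
```

The reduced eigenvalues are upper bounds on the corresponding eigenvalues of the full model, which is one way to read the "practically acceptable accuracy" of reduction-based methods; production methods typically add component modes to tighten these bounds.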
Shared memory parallel architecture
• Multi-core processors
• Shared cache
• Shared memory
• Low level parallelism
• Feasible number of cores: 2-16
Sparse factorization
[Figure panels: Matrix connectivity, Reordering, Elimination tree, Factorization]
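As a sketch of the elimination tree step, the Python function below computes the tree from the lower triangular nonzero pattern of a symmetric matrix, using the standard ancestor-compression approach. The 5 x 5 example pattern and the function name are assumptions for illustration; the matrix pictured on the slide is not recoverable from the transcript.

```python
def elimination_tree(n, lower_pattern):
    """Elimination tree of a symmetric sparse matrix.

    lower_pattern[i] lists the column indices k < i with A[i, k] != 0.
    Returns parent, where parent[j] is the parent column of j (-1 for a root).
    """
    parent = [-1] * n
    ancestor = [-1] * n               # shortcut pointers (path compression)
    for i in range(n):
        for k in lower_pattern[i]:
            r = k
            # climb from k towards the current root of its subtree
            while ancestor[r] != -1 and ancestor[r] != i:
                nxt = ancestor[r]
                ancestor[r] = i
                r = nxt
            if ancestor[r] == -1:     # r was a root: attach it under i
                ancestor[r] = i
                parent[r] = i
    return parent

# Hypothetical pattern (0-based): nonzeros below the diagonal, row by row
lower_pattern = [[], [0], [], [0, 1, 2], [2, 3]]
parent = elimination_tree(5, lower_pattern)
print(parent)
```

Columns in different subtrees of the elimination tree can be factorized independently, which is what makes the tree useful for scheduling parallel work.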
Multifrontal factorization
Sparsity pattern
Frontal steps
Front amalgamation
Symbolic reordering
Consecutive columns
Same sparsity pattern
Cache fitting size
Supernodal approach
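The idea of grouping consecutive columns with the same sparsity pattern can be sketched as follows. The code first computes the column structure of the Cholesky factor symbolically, then merges column j-1 and column j whenever the structure of j-1 equals {j} together with the structure of j. The example pattern and function names are hypothetical, and the cache-fitting size limit mentioned on the slide is not modelled.

```python
def symbolic_factor(n, lower_pattern):
    """Row indices below the diagonal in each column of the Cholesky factor.

    lower_pattern[i] lists the column indices k < i with A[i, k] != 0.
    """
    struct = [set() for _ in range(n)]
    for i in range(n):
        for k in lower_pattern[i]:
            struct[k].add(i)
    for j in range(n):
        if struct[j]:
            p = min(struct[j])                # parent in the elimination tree
            struct[p] |= struct[j] - {p}      # fill-in inherited by the parent
    return struct

def supernodes(struct):
    """Group consecutive columns whose factor patterns coincide."""
    groups = [[0]]
    for j in range(1, len(struct)):
        if struct[j - 1] == {j} | struct[j]:
            groups[-1].append(j)              # same pattern: extend supernode
        else:
            groups.append([j])
    return groups

# Hypothetical 5x5 pattern (0-based), nonzeros below the diagonal by row
lower_pattern = [[], [0], [], [0, 1, 2], [2, 3]]
struct = symbolic_factor(5, lower_pattern)
print(struct)
print(supernodes(struct))
```

Columns inside one supernode can be treated as a dense block, so the numerical work on them can use dense matrix kernels instead of column-by-column sparse updates.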
Matrix update
Panel selection
Downstream columns
Different sparsity pattern
BLAS 2.5 operation
High performance workstation cluster
111 IBM P575 nodes with 1.9 GHz
4 dual-core POWER5 CPUs per node
3.5 Terabyte aggregate memory
100 Terabyte total disk space
IBM High Performance Switch (HPS)
8 GB/sec bidirectional bandwidth
AIX OS Version 5.3
Parallel Environment (PE) V4.2
Trimmed car body application
Shell element model
• 1.3 M grid points
• 1.2 M shell elements
• 7.9 M degrees of freedom
Normal modes analysis
• Frequency: 0 – 300 Hz
• ~1000 normal modes
• 512 partitions
![Page 31: State of the art distributed parallel computational ... · 1 PARENG- 2011 Louis Komzsik State of the art distributed parallel computational techniques in industrial finite element](https://reader034.vdocuments.us/reader034/viewer/2022042101/5e7d7a62f17f9d304c7eb812/html5/thumbnails/31.jpg)
31 Louis Komzsik PARENG- 2011
Shortening solution time

| Number of DMP processes | Serial | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Speed up | 1.0 | 4.0 | 7.8 | 29.3 | 49.2 | 77.5 | 96.5 | 104.1 | 105.9 |
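To read the chart numerically, the Python sketch below computes the speed up per process and the gain from each doubling of the process count. The pairing of speed up values with process counts follows the reading order of the chart labels, which is an assumption; the derived ratios are not on the slide.

```python
# Speed up relative to the serial run, as read from the chart
processes = [1, 2, 4, 8, 16, 32, 64, 128]
speedup = [4.0, 7.8, 29.3, 49.2, 77.5, 96.5, 104.1, 105.9]

# Speed up per process (values above 1 mean better than linear vs. serial)
per_process = [s / p for s, p in zip(speedup, processes)]

# Gain obtained from each doubling of the process count
doubling_gain = [b / a for a, b in zip(speedup, speedup[1:])]

for p, s, e in zip(processes, speedup, per_process):
    print(f"{p:4d} processes: speed up {s:6.1f}, per process {e:5.2f}")
print("gain per doubling:", [round(g, 2) for g in doubling_gain])
```

On these numbers the last doubling, from 64 to 128 processes, adds less than two percent, so the curve has effectively saturated for this model size.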
Increased fidelity of analysis

| Frequency range (Hz) | 0 - 100 | 0 - 200 | 0 - 300 | 0 - 400 | 0 - 500 |
| --- | --- | --- | --- | --- | --- |
| Solution time (normalized) | 1.00 | 1.08 | 1.21 | 1.34 | 1.55 |
| Number of modes (normalized) | 1.00 | 2.41 | 4.67 | 7.44 | 10.93 |
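The message of this chart can be checked with a few lines of Python that compute the normalized solution time per mode for each frequency range. Which of the two plotted series is the solution time and which is the number of modes is inferred from the legend order, so treat that assignment as an assumption.

```python
freq_upper_hz = [100, 200, 300, 400, 500]
solution_time = [1.00, 1.08, 1.21, 1.34, 1.55]      # normalized
number_of_modes = [1.00, 2.41, 4.67, 7.44, 10.93]   # normalized

# Relative cost of one mode, compared with the 0 - 100 Hz run
time_per_mode = [t / m for t, m in zip(solution_time, number_of_modes)]

for f, t, m, c in zip(freq_upper_hz, solution_time, number_of_modes, time_per_mode):
    print(f"0 - {f} Hz: time {t:.2f}, modes {m:.2f}, time per mode {c:.3f}")
```

Under that reading, about eleven times as many modes are obtained for roughly one and a half times the solution time, so the cost per mode falls steadily as the frequency range widens.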
Distributed memory workstation
HP Proliant DL320G5 server
64 dual core (1.85 GHz) Xeon CPUs
50GB local SATA disks per node
4 GB memory per node
GigE interconnect with HP MPI
Suse Linux Version 10.3
Automotive engine application
Solid element model
• 3.6 M grid points
• 2.3 M tetrahedral elements
• 10.8 M degrees of freedom
Normal modes analysis
• Frequency: 0 – 10,000 Hz
• ~250 normal modes
• 256 partitions
Shortening solution time

| Number of DMP processes | Serial | 1 | 2 | 4 | 8 | 16 | 32 | 64 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Speed up | 1.00 | 4.00 | 7.11 | 12.47 | 17.15 | 25.78 | 34.58 | 49.27 |
Increased fidelity of analysis

| Frequency range (Hz) | 0 - 10,000 | 0 - 20,000 | 0 - 30,000 | 0 - 40,000 | 0 - 50,000 |
| --- | --- | --- | --- | --- | --- |
| Solution time (normalized) | 1.00 | 1.25 | 1.28 | 1.32 | 1.34 |
| Number of modes (normalized) | 1.00 | 2.95 | 5.61 | 8.79 | 12.57 |
Conclusions
Geometric domain decomposition technologies provide the basis for distributed solutions on modern hardware
Recursive computational solutions can support a wide range of engineering analyses with practically acceptable accuracy
The handling of the local matrix operations with multi-core processors contributes to the overall performance gain
The performance advantages of distributed computational solutions are significant and tremendously accelerate the engineering work
Future work
Extending the distributed finite element technology to a grid computing environment
Overcoming the lack of a node-to-node communication mechanism with a high speed network
Minimizing the need for a high bandwidth connection between the local nodes and storage devices
Synchronizing the completion of components of similar computational complexity in a non-homogeneous grid environment
Thank you for your attention!
www.siemens.com
www.siemens.com/plm
www.siemens.com/plm/nxnastran
Siemens and the Siemens logo are registered trademarks of Siemens AG. NX is a registered trademark of Siemens PLM Software Inc. in the United States and in other countries.
NASTRAN is a registered trademark of the National Aeronautics and Space Administration.
SpaceShip One pictures by courtesy and permission of Quartus Engineering Inc.