2.4 Parallel Performance Enhancements


Page 1: 2.4 Parallel Performance Enhancements

Training Manual 001419, 15 Aug 2000

2.4-1

NEW FEATURES 5.7

• In this section, we will discuss the following topics:

A. New add-on product Parallel Performance for ANSYS

B. Distributed Domain Solver (DDS)

C. Algebraic Multigrid Solver (AMG)

Page 2: 2.4 Parallel Performance Enhancements

Parallel Performance Enhancements

Overview

• Driven by user requirements of higher accuracy and fidelity in solution

– e.g. mesh refinement and adaptive meshing

• Desire to solve assemblies instead of individual component analysis

– e.g. assembly contact problems

Page 3: 2.4 Parallel Performance Enhancements

Parallel Performance Enhancements

A. Parallel Performance for ANSYS

• A new add-on product for shared memory and distributed memory environments

• Offers powerful new solvers enabling quick, accurate solutions to large models using multiple processors

– Algebraic MultiGrid (AMG) solver

• Solves static/transient nonlinear analyses using multiple processors (up to 8) on a single system (shared memory parallel)

– Distributed Domain Solver (DDS)

• Solves large static/transient nonlinear analyses over multiple systems (distributed memory parallel), over multiple processors on a single machine (shared memory parallel), or any combination

Page 4: 2.4 Parallel Performance Enhancements


Parallel Performance Enhancements

B. DDS

What is DDS?

• Breaks large problems (up to 10 million DOFs) into smaller domains (1,000 to 10,000 DOFs) automatically

• Compatibility among the domains is obtained by solving for interface variables (Lagrange multipliers)

Page 5: 2.4 Parallel Performance Enhancements


...What is DDS?

• Transfers and factorizes the subdomains on the slave machines using a direct solver

• The master machine retrieves and assembles the subdomain solutions, solves for the interface variables using an iterative solver, and computes results for the entire model

[Chart: speed-up ratio vs. number of CPUs]

Parallel Performance Enhancements

… DDS

Page 6: 2.4 Parallel Performance Enhancements


[Chart: Carrier problem (3.5 million DOF): solver wall-time (sec.) vs. number of processors; speed-up = 21.0]

Parallel Performance Enhancements … DDS

Why DDS?

• Highly scalable

– More processors / less elapsed time

– Example above shows a 3.5 million-DOF SOLID92 model

• 2,020 subdomains on an SGI Origin 2000 with 12 GB memory

Page 7: 2.4 Parallel Performance Enhancements


Memory / Disk requirements

• Requires 2 to 4 times more memory than PCG; however, this is not a problem for a distributed memory architecture.

– The total memory required is the sum of the master and individual slave machine memories

– In general, the master machine will need a large amount of memory

Parallel Performance Enhancements

… DDS

Page 8: 2.4 Parallel Performance Enhancements


• DDS has two components:

– Domain decomposer

• Embedded in ANSYS

• Divides the domain into n subdomains

• Creates scratch.dds, file.dds, and file.erot

• Issues the ‘mpirun’ command and launches the appropriate ansdds.e57 executable

– ANSDDS.E57

• A stand-alone, MPI-enabled executable

• Computes the solution for a subdomain on the slave processor

• Writes out a file called scratch.u, which is later retrieved by the master to calculate element results

Parallel Performance Enhancements

… DDS - Under the Hood

Page 9: 2.4 Parallel Performance Enhancements

Parallel Performance Enhancements … DDS

• System requirements

– Network must be homogeneous (same operating system)

• Message Passing Interface (MPI) is used to communicate

– Master (where the job is submitted)

• “Parallel Performance for ANSYS” add-on required

• ANSYS 5.7 must be installed (including ansdds.e57)

• Installation of MPI

• 256 MB RAM / 10 GB disk required

– Slave

• Installation of MPI on all slave machines

• The ansdds.e57 executable must be installed

Page 10: 2.4 Parallel Performance Enhancements


How to use DDS

• Specify the “Parallel Performance for ANSYS” add-on when starting ANSYS

– ansys57 -pp

• Choose DDS Solver

– EQSLV,DOMAIN

• Specify information about slave processors

– DDSOPT command*

*The DDSOPT command is covered in Systems Training

Parallel Performance Enhancements

… DDS

Page 11: 2.4 Parallel Performance Enhancements

Parallel Performance Enhancements

… DDS

How to use DDS (cont'd)

• Solve

• Postprocessing

– You get a results file as usual

– /PNUM,DOMAIN,ON will display domains by colors / numbers
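The DDS steps above can be collected into one input fragment. This is a minimal sketch, not a complete job: it assumes the model has already been built and meshed, and the DDSOPT arguments (covered in Systems Training) are left elided.

```
! Launch ANSYS with the Parallel Performance add-on (command line):
!   ansys57 -pp
! Then, in the ANSYS session (model already built and meshed):
/SOLU
EQSLV,DOMAIN          ! choose the Distributed Domain Solver (DDS)
! DDSOPT,...          ! slave-processor info; arguments covered in Systems Training
SOLVE
FINISH
/POST1                ! postprocess the results file as usual
/PNUM,DOMAIN,ON       ! display domains by colors / numbers
EPLOT
```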

Page 12: 2.4 Parallel Performance Enhancements


Are there any modeling restrictions for using DDS?

• Structural static/transient only (linear or nonlinear)

• Symmetric matrices

• “h” elements only

• No coupling / constraint equations

• No inertia relief

Parallel Performance Enhancements

… DDS Solver

Page 13: 2.4 Parallel Performance Enhancements


Benchmark: 2 million-DOF test case

[Chart: CPU time (seconds) vs. number of processors]

Parallel Performance Enhancements

… DDS Solver

Page 14: 2.4 Parallel Performance Enhancements


What is the AMG solver?

• A preconditioned conjugate gradient solver similar to the PCG solver

• The preconditioner used in the AMG solver is derived using the Algebraic MultiGrid technique

– MultiGrid techniques derive a preconditioner that is very close to [K]^-1 by working on a coarser mesh of the supplied FE model

– Algebraic MultiGrid methods work on a coarsened version of the full [K] matrix instead of the mesh (that is, the method is mesh independent)

Parallel Performance Enhancements

C. AMG Solver

Page 15: 2.4 Parallel Performance Enhancements

Parallel Performance Enhancements … AMG Solver

Why do we need the AMG solver?

• Sensitivity to ill-conditioning

– Much less sensitive to ill-conditioned problems than PCG

– Reaches a solution in fewer iterations than PCG for ill-conditioned problems

– Expected to perform as well as PCG for well-conditioned problems

• Scalability

– Speed-up of up to 5 times on 8 processors

– Scales much better than PCG

– Used in shared memory parallel (single machine with multiple processors) only

Page 16: 2.4 Parallel Performance Enhancements


Scalability

Parallel Performance Enhancements

… AMG Solver

AMG benchmark: 500,000-DOF model

[Chart: CPU time (AMG solver) vs. number of CPUs]

Page 17: 2.4 Parallel Performance Enhancements


How to use AMG solver

– Specify the “Parallel Performance for ANSYS” add-on when starting ANSYS

• ansys57 -pp

– Specify the number of processors:

• /CONFIG,NPROC,N

• or config57.ans

• or use the macro SETNPROC

– Choose the AMG solver

• EQSLV,AMG,Toler

– The tolerance defaults to 1e-8, similar to PCG

– Solve

Parallel Performance Enhancements

… AMG Solver
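Collected as a single input fragment, the AMG setup above might look like the following sketch. The processor count is an illustrative choice, and the tolerance shown is simply the stated default.

```
! Launch ANSYS with the Parallel Performance add-on (command line):
!   ansys57 -pp
/CONFIG,NPROC,4       ! use 4 processors (shared memory; illustrative count)
/SOLU
EQSLV,AMG,1E-8        ! AMG solver; tolerance shown is the documented default
SOLVE
FINISH
```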

Page 18: 2.4 Parallel Performance Enhancements


• When to use AMG solver

– Structural Static & Transient analyses

– Nonlinear analyses

– Large aspect ratio elements, reduced integration elements

– Models with a combination of shells, solids, and beams

– Shared memory parallel machines

• When not to use AMG solver

– Non-structural problems (it works but is less efficient)

– Models made only of SHELL63 elements do not seem to be as CPU-efficient as with PCG

Parallel Performance Enhancements

… AMG Solver

Page 19: 2.4 Parallel Performance Enhancements


Memory / Disk requirements

– Requires 1.3 to 2 times more memory than the PCG solver

• Rule of thumb: 130 MB per 100,000 DOF for SOLID92 elements

• Memory required is also a function of the number of processors used (overhead)

– Files created during an AMG solution are very similar to those of PCG and about the same size

Parallel Performance Enhancements

… AMG Solver

Page 20: 2.4 Parallel Performance Enhancements