
High Performance Computing (HPC)
CAEA eLearning Series

Jonathan G. Dudley, Ph.D.
06/09/2015

© 2015 CAE Associates

Agenda

Introduction

HPC Background
— Why HPC
— SMP vs. DMP
— Licensing

HPC Terminology
— Types of HPC: HPC Cluster & Workstation HPC
— Hardware Components: CPU vs. Cores, GPU vs. Phi, HDD vs. SSD
— Interconnects
— GPU Acceleration

CAE Associates Inc.

Engineering Consulting Firm in Middlebury, CT specializing in FEA and CFD analysis.

ANSYS® Channel Partner since 1985, providing sales of the ANSYS® products, training, and technical support.


e-Learning Webinar Series

This presentation is part of a series of e-Learning webinars offered by CAE Associates.

You can view many of our previous e-Learning sessions either on our website or on the CAE Associates YouTube channel.

If you are a New Jersey or New York resident, you can earn continuing education credit for attending the full webinar and completing a survey which will be emailed to you after the presentation.

CAEA Resource Library

Our Resource Library contains over 250 items including:
— Consulting Case Studies
— Conference and Seminar Presentations
— Software demonstrations
— Useful macros and scripts

The content is searchable, and you can download copies of the material to review at your convenience.

CAEA Engineering Advantage Blog

Our Engineering Advantage Blog offers weekly insights from our experienced technical staff.


CAEA ANSYS® Training

Classes can be held at our Training Center at CAE Associates or on-site at your location.

CAE Associates is offering on-line training classes in 2015! Registration is available on our website.

Agenda

Introduction

HPC Background
— Why HPC
— Licensing
— SMP vs. DMP

HPC Terminology
— Types of HPC: HPC Cluster & Workstation HPC
— Hardware Components: CPU vs. Cores, GPU vs. Phi, HDD vs. SSD
— Interconnects
— GPU Acceleration

Why High Performance Computing (HPC)?

Remove computing limitations from engineers in all phases of design, analysis, and testing.

Impact product design
— Faster simulation
— More efficient parametric studies

Larger models
— More accuracy: turbulence modeling, particle tracking
— More refined models

Design optimization
— More runs for a fixed hardware configuration

Why HPC?

Using today's multicore computers is key for companies to remain competitive.

The ANSYS HPC product suite allows scalability to whatever computational level is required, from single-user or small user group options at entry level up to virtually unlimited parallel capacity or large user group options at enterprise level.
— Reduce turnaround time
— Examine more design variants faster
— Simulate larger or more complex models

4 Main Product Licenses

HPC (per-process parallel)

HPC Pack
— HPC product rewarding volume parallel processing for high-fidelity simulations.
— Each simulation consumes one or more Packs.
— Parallel enabled increases quickly with added Packs:

  HPC Packs per simulation:  1    2    3     4     5      6      7
  Parallel enabled (cores):  8    32   128   512   2048   8192   32768

HPC Workgroup
— HPC product rewarding volume parallel processing for increased simulation throughput, shared among engineers throughout a single location or the world.
— 16 to 32768 parallel tasks shared across any number of simulations on a single server.

HPC Parametric Pack
— Enables simultaneous execution of multiple design points while consuming just one set of licenses.
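The core counts in the table above follow a simple pattern: one Pack enables 8 cores, and each additional Pack multiplies the enabled cores by 4. A minimal sketch of that relationship (the formula is inferred from the table, not an official licensing rule):

# Cores enabled per number of HPC Packs assigned to one simulation,
# inferred from the table above: 8, 32, 128, ... (x4 per added Pack).
def cores_enabled(packs):
    return 2 * 4 ** packs

for packs in range(1, 8):
    print(f"{packs} Pack(s) -> {cores_enabled(packs)} cores")
# Prints 8, 32, 128, 512, 2048, 8192, 32768 for 1 through 7 Packs.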

Poll #01


Shared and Distributed Memory

Shared Memory: Single Machine Parallel (SMP) systems share a single global memory image that may be distributed physically across multiple cores, but is globally addressable.

— OpenMP is the industry standard.

Distributed Memory: Distributed memory parallel processing (DMP) assumes that physical memory for each process is separate from all other processes.
— Requires message-passing software to communicate between cores

— MPI is the industry standard
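To make the DMP model concrete, here is a minimal, generic message-passing sketch using the mpi4py package (this illustrates the MPI programming model in general, not ANSYS internals; it assumes mpi4py and an MPI runtime such as Open MPI are installed):

# Each MPI process (rank) owns its own memory; data moves only via messages.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's id
size = comm.Get_size()   # total number of processes

if rank == 0:
    # Rank 0 sends a piece of "boundary" data to every other rank.
    for dest in range(1, size):
        comm.send({"boundary_values": [float(dest)]}, dest=dest, tag=0)
else:
    data = comm.recv(source=0, tag=0)
    print(f"rank {rank} of {size} received {data}")

Launched with, for example, mpiexec -n 4 python demo.py, four separate processes run this script, each in its own memory space.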


Distributed ANSYS Architecture

Domain decomposition approach
— Break the problem into "N" pieces.
— "Solve" the global problem independently within each domain.
— Communicate information across the boundaries as necessary.

The Sparse, PCG & LANPCG solvers all support distributed solution.

DMP runs on a single node or a cluster; SMP is for a single node only.

Benefits
— The entire SOLVE phase is parallel.
— More computations performed in parallel, with faster solution time.
— Better speed-ups than SMP: can achieve > 4x speed-up on 8 cores (try getting that with SMP!).
— Can be used for jobs running on hundreds of cores.
— Can take advantage of resources on multiple machines.
— Memory usage and bandwidth scale.
— Disk (I/O) usage scales (i.e., parallel I/O).

ANSYS Mechanical Scaling

Example: 6M degrees of freedom; plasticity, contact, bolt pretension; 4 load steps (ANSYS v15).

Parallel Settings ANSYS APDL

SMP with GPU acceleration settings

DMP: for multiple-core or multiple-node processing

For GPU acceleration using DMP: Customization Preferences tab -> Additional Parameters, add the command line argument -acc nvidia
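The same switches can be passed when launching Mechanical APDL in batch mode from a script instead of through the GUI. A minimal sketch, assuming a v15 installation whose launcher is on the PATH as ansys150 (the executable name and the input/output file names are placeholders for illustration):

# Launch Mechanical APDL in batch, distributed (DMP) mode with GPU acceleration.
# Adjust the launcher name and file names for the local installation.
import subprocess

cmd = [
    "ansys150",         # MAPDL launcher (version-specific name, placeholder)
    "-b",               # batch mode
    "-dis",             # distributed (DMP) solution
    "-np", "8",         # number of cores
    "-acc", "nvidia",   # NVIDIA GPU acceleration, as on the slide
    "-i", "model.dat",  # input file (placeholder)
    "-o", "model.out",  # output file (placeholder)
]
subprocess.run(cmd, check=True)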


Parallel Settings ANSYS CFX/Fluent

Fluent: Parallel Settings options for multiple-core processing and GPU acceleration.

CFX: Parallel Settings options.

2 Common Types of HPC

HPC Cluster
— Communication via a series of switches and interconnects
  • Infiniband
  • Gigabit (1 Gb/s, 10 Gb/s)
  • Fiber
— Scalable
  • DOE supercomputer: 1.6M cores

Workstation HPC
— Single desktop communication
— More than 2 cores, commonly 8 or more
— Quad-socket current builds
  • Xeon E5-4600 up to 48 cores
  • Up to 1 TB of DDR3 1866 MHz RAM

Poll #02


PC Components


Central Processing Unit and Cores

Intel Xeon E5 Processor Series (quad-socket MOBO)
— E5: 4-18 cores per CPU
— Frequency: 1.8-3.5 GHz
— L3 cache up to 2.5 MB/core
— Bus: 6.4-9 GT/s QPI

Intel Xeon E7 Processor Series
— E7: 4-18 cores per CPU
— Frequency: 1.9-3.2 GHz
— L3 cache up to 2.5 MB/core
— Bus: 6.4-9 GT/s QPI

RAM
— DDR4: supports 2-4k MT/s (10^6 transfers/s)
— DDR3: supports 0.8-2k MT/s
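The MT/s ratings translate directly into theoretical memory bandwidth, which the guidelines later in this deck flag as a common solver bottleneck. A minimal sketch; the 8-byte channel width is the standard DDR3/DDR4 bus width, and the module speeds and channel counts are example values, not figures from the slide:

# Theoretical peak bandwidth per memory channel:
#   bandwidth (GB/s) = transfer rate (MT/s) * 8 bytes per transfer / 1000
def channel_bandwidth_gb_s(mt_per_s, bytes_per_transfer=8):
    return mt_per_s * bytes_per_transfer / 1000.0

# Example: DDR3-1866 (as in the workstation build above) with 4 channels per socket.
for label, rate, channels in [("DDR3-1866", 1866, 4), ("DDR4-2400", 2400, 4)]:
    per_channel = channel_bandwidth_gb_s(rate)
    print(f"{label}: {per_channel:.1f} GB/s per channel, "
          f"{per_channel * channels:.1f} GB/s across {channels} channels")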

Graphical and Co-Processing Units

GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate scientific, analytics, engineering, consumer, and enterprise applications.

Co-processing uses a separate processor (PCI card) to supplement the functions of the primary processor:
— Floating-point arithmetic
— Signal processing

Supported cards
— Mechanical and Fluent only
— 64-bit Windows or Linux x64
  • Tesla K10 and K20 series
  • Quadro 6000
  • Quadro K5000 and K6000
— Xeon Phi 3000, 5000, 7000 series (ANSYS Mechanical only)

Improved Parallel Performance & Scaling

ANSYS Fluent parallel scaling (chart).

GPU Acceleration

ANSYS Mechanical
— For models with solid elements > 500k DOF
— DMP is preferred
— For DOF > 5M, add another card or use a single card with 12 GB (K40, K6000)
— PCG/JCG solver: MSAVE off
— Models with lower Lev_Diff are better suited

ANSYS Fluent
— Higher AMG workloads are ideal for GPU acceleration; coupled problems benefit from GPUs
— The whole problem must fit on the GPU
  • 1e6 cells require ~4 GB GPU RAM
— Better performance with lower CPU core counts
  • 3 to 4 CPU cores per 1 GPU
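As a quick feasibility check for the "whole problem must fit on the GPU" point above, the ~4 GB per million cells rule of thumb can be turned into a small estimate (the example cell counts and card sizes are illustrative, not benchmark data):

# Rough check of whether a Fluent case might fit on one GPU, using the
# slide's rule of thumb of ~4 GB of GPU RAM per 1e6 cells.
GB_PER_MILLION_CELLS = 4.0

def estimated_gpu_ram_gb(n_cells):
    return n_cells / 1.0e6 * GB_PER_MILLION_CELLS

for cells, card_ram_gb in [(1.5e6, 6.0), (2.5e6, 12.0)]:   # hypothetical cases
    need = estimated_gpu_ram_gb(cells)
    verdict = "fits" if need <= card_ram_gb else "does not fit"
    print(f"{cells:.1e} cells -> ~{need:.1f} GB needed; {verdict} on a {card_ram_gb:.0f} GB card")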

GPU/CoProcessing Licensing

Licensing options
— HPC Packs for quick scale-up
— HPC Workgroup for flexibility

GPUs are treated the same as CPU cores in the licensing model. As you scale up, license cost decreases per core.

Poll #03


Hard Disk

Conventional SATA
— 7200 RPM and 10k RPM
— Ideal for volume storage
— Cheapest

Serial Attached SCSI (SAS)
— 15k RPM drives (RAID 0)
— Ideal "scratch space" drives

Solid State Drives (SSD)
— Fastest read/write operations
— Lower power, cooler, quieter
— No mechanical parts
— Ideal for OS drive
— Cost per GB is highest

Interconnects

Internal
— Controlled by the motherboard
— Intel QuickPath Interconnect (QPI)
  • PCIe 2.0 x8 = 32 Gb/s
  • PCIe 3.0 x8 = 63 Gb/s
  • PCIe 4.0 x8 = 125 Gb/s

External
— Gigabit Ethernet (1 Gb/s)
— Infiniband (56 Gb/s)
— Fibre Channel (16 Gb/s)
— Ethernet RDMA (40 Gb/s)

Mechanical/APDL requires at least a 10 Gb/s interconnect for scaling past 1 node.
— Prefer Infiniband QDR/FDR for large clusters
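The PCIe figures above follow directly from each generation's per-lane transfer rate and line encoding. A minimal sketch reproducing them (the encoding overheads are the published PCIe values, not taken from the slide):

# Effective PCIe bandwidth = lanes * transfer rate * encoding efficiency.
# PCIe 2.0 uses 8b/10b encoding; PCIe 3.0 and 4.0 use 128b/130b.
PCIE_GEN = {
    "2.0": (5.0, 8 / 10),      # GT/s per lane, encoding efficiency
    "3.0": (8.0, 128 / 130),
    "4.0": (16.0, 128 / 130),
}

def pcie_bandwidth_gbps(gen, lanes):
    rate, efficiency = PCIE_GEN[gen]
    return rate * efficiency * lanes

for gen in ("2.0", "3.0", "4.0"):
    print(f"PCIe {gen} x8 ~ {pcie_bandwidth_gbps(gen, 8):.0f} Gb/s")
# Prints roughly 32, 63, and 126 Gb/s, in line with the slide's figures.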

Basic Guidelines

Faster cores = faster solution

Faster RAM = faster solution
— Must be aware of memory bandwidth

Faster HD = faster solution
— Especially for intensive I/O
— RAID 0 for multiple disks
— SSD or SAS 15k drives
— Parallel file systems

Faster is better! More is better.
— Must balance budget/performance

Other guidelines
— 4 GB RAM/core for ANSYS CFD
— Hyper-threading: off
— Turbo Boost: only for low core counts

Poll #04


HPC Revolution

Every computer today is a parallel computer.

Every simulation in ANSYS can benefit from parallel processing.


Questions

© 2015 CAE Associates