AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
TRANSCRIPT
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
David Pellerin, Business Development Principal, HPC
November 30, 2016
High Performance
Computing on AWS
CMP207
What to Expect from the Session
Overview of use cases for HPC in science, aerospace, automotive and
manufacturing, life sciences, financial services, and energy.
Overview of HPC capabilities on AWS, including:
• The newest compute-optimized instances
• P2 GPU instances
• F1 FPGA instances
Best practices for running traditional and new/emerging HPC workflows in
the cloud, including graphical pre- and post-processing and workflow
automation
What is HPC?
Use Cases
• High-energy physics simulations
• Weather and climate modeling and prediction
• Analysis of fluids, structures, and materials
• Thermal and electromagnetic simulations
• Genomics, proteomics, and molecular dynamics
• Seismic and reservoir simulations
• 3D rendering and visualizations
• Deep learning training and inference
Cloud unlocks HPC for a broad range of use cases
AWS for High Performance Computing…
Scale Matters for Big Data and Big Compute
Big Data drives Big Compute in Big Science
Big Compute on AWS for Big Science
HPC in Energy Management
Big Data meets Big Compute
“Fugro Roames has enabled Ergon Energy to
reduce the cost of vegetation management from
AU$100 million to AU$60 million per year.”
- Josh Passenger, Technical Architect, Fugro Roames
• Aircraft equipped with cameras, laser sensors
• Repeated overflights of power networks
• Captured data is used to render detailed 3D
models of the power lines, and the environment
• Analytics and simulations are run to generate
actionable reports for directing post-disaster
repair and prioritizing ongoing maintenance
HGST applications for engineering:
Molecular dynamics, CAD, CFD, EDA
Collaboration tools for engineering
Big data for manufacturing yield analysis
HPC for Engineering Simulations
Running drive-head
simulations at scale:
Millions of parallel parameter
sweeps, running months of
simulations in just hours
Over 85,000 Intel cores
running at peak, using Spot
Instances
16M cell, polyhedral,
external aero case
Running on c4.8xlarge
instances
Demonstrates excellent
scalability for typical
CFD models
HPC for Aerospace
Mapping HPC Use-Cases
Data Light: minimal requirements for high performance storage
Data Heavy: benefits from access to high performance storage
Fluid dynamics
Weather forecasting
Materials simulations
Crash simulations
Risk simulations
Molecular modeling
Contextual search
Logistics simulations
Animation and VFX
Semiconductor verification
Image processing/GIS
Genomics
Seismic processing
Metagenomics
Astrophysics
Deep learning
Clustered (Tightly Coupled)
Distributed/Grid (Loosely Coupled)
Cluster HPC and Grid HPC on the Cloud
Cluster HPC
Tightly coupled,
latency sensitive
applications
Use larger EC2
compute instances,
placement groups,
enhanced networking
Grid HPC
Loosely coupled,
pleasingly parallel
Use a variety of EC2
instances, multiple
AZs, Spot, Auto
Scaling, Amazon
SQS
Grids of Clusters
Use a grid strategy on the cloud
to run a group of parallel,
individually clustered HPC jobs
What Does This Mean for Simulations?
Expand the simulation domain
Run larger numbers of parallel, clustered HPC jobs
Performance Testing
for Common Applications
Weather Prediction
WRF Scaling and Performance on AWS
[Charts: WRF 2.5 km CONUS benchmark — solve time (s) and scale-up vs. number of cores, including c4.8xlarge time and scale-up curves]
Structural Analysis
AWG ERIF Test Case 2.1: Fan Blade-Off Rig Test, Generic Fan Rig Model
ANSYS Mechanical FEA Performance
ENGINE BLOCK (V17cg-3), PCG solver: static structural analysis of an engine block without the internal components
Performance for Fluid Dynamics on AWS: ANSYS Fluent
• AWS c4.8xlarge
• 140M cells
• F1 car CFD benchmark
http://www.ansys-blog.com/simulation-on-the-cloud/
Test using larger, real-world examples
• Use large cases for testing: do not benchmark scalability
using only small examples
Domain decomposition
• Choose number of cells per core for either per-core
efficiency or for faster results
Instance types
• C4 or M4 are best choices today
Network
• Use a placement group
• Enable enhanced networking
Performance Considerations for HPC on AWS
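The placement-group advice above can be sketched with boto3. This is an illustrative fragment, not from the talk: the AMI ID and the group name `hpc-pg` are placeholders, and the launch parameters are built as a plain dict (following the EC2 RunInstances API keys) so they can be inspected before any call to `ec2.run_instances(**params)`.

```python
def cluster_launch_params(ami_id: str, group_name: str,
                          instance_type: str = "c4.8xlarge",
                          count: int = 4) -> dict:
    """Build EC2 RunInstances parameters for a tightly coupled HPC cluster."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,  # all-or-nothing launch: either the whole
        "MaxCount": count,  # cluster starts, or nothing does
        "Placement": {"GroupName": group_name},
        # Enhanced networking (SR-IOV) also requires an HVM AMI with the
        # appropriate driver; on C4 it is active by default with such AMIs.
    }

params = cluster_launch_params("ami-12345678", "hpc-pg")
```

The placement group itself would be created first, e.g. with `ec2.create_placement_group(GroupName="hpc-pg", Strategy="cluster")`.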
Choose a cell-to-core
ratio to optimize core
efficiency, to optimize
license costs, or to
achieve faster results
Higher per-core
efficiency
Faster results
Domain Decomposition is Important
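The trade-off on this slide can be made concrete with a little arithmetic. A minimal sketch (illustrative numbers, not vendor guidance): given a target cells-per-core ratio, compute how many cores and nodes a mesh needs.

```python
import math

def decompose(total_cells: int, cells_per_core: int,
              cores_per_node: int = 18) -> tuple:
    """Cores and nodes needed for a mesh at a target cells-per-core ratio.

    A higher cells_per_core target favors per-core efficiency (and fewer
    solver licenses); a lower target spreads the mesh wider for faster
    results, at the cost of more MPI communication. cores_per_node=18
    assumes the physical cores of a c4.8xlarge.
    """
    cores = math.ceil(total_cells / cells_per_core)
    nodes = math.ceil(cores / cores_per_node)
    return cores, nodes
```

For a 16M-cell case like the external aero benchmark above, a 100k cells-per-core target gives `decompose(16_000_000, 100_000) == (160, 9)`; halving the target to 50k doubles the core count for faster turnaround.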
OS version
• Use Amazon Linux or a Linux kernel version 3.10 or later
Processor states and affinity
• Use P-states to reduce processor variability
• Use CPU affinity to pin threads to CPU cores
MPI libraries
• Intel MPI recommended
Hyper-threading
• Test with Hyper-threading on and off
• Usually off is best, but not always
Performance Considerations for HPC on AWS
EC2 Instance Types
for HPC
Broad Set of Compute Instance Types
• General purpose: M3, M4
• Compute optimized: C3, C4, C5
• Memory optimized: R3, R4, X1
• Storage and I/O optimized: I2, I3, D2, HS1
• GPU or FPGA enabled: G2, P2, F1
Diving Deep: m4.16xlarge
CPU-Based Instances for HPC
Intel CPUs
• Up to 2.9 GHz, Turbo enabled up to 3.6 GHz
• Intel® Advanced Vector Extensions (Intel® AVX2)
• Control over C-States, P-States, and Hyper-threading
• C4, M4 are the most common instance types for HPC:
• Up to 64 vCPUs (32 physical cores)
• R3 and X1 for higher memory applications
• Up to 128 vCPUs (64 physical cores), up to 2 TB RAM
• Proprietary network delivering up to 20 Gbps
GPU and FPGA Instances
P2: GPU instance
• Up to 16 NVIDIA GK210 (8 X K80) GPUs in a single instance, with
peer-to-peer PCIe GPU interconnect
• Supporting a wide variety of use cases including deep learning, HPC
simulations, financial computing, and batch rendering
F1: FPGA instance
• Up to 8 Xilinx Virtex® UltraScale+™ VU9P FPGAs in a single
instance, with peer-to-peer PCIe and bidirectional ring interconnects
• Designed for hardware-accelerated applications including financial
computing, genomics, accelerated search, and image processing
P2 GPU Instances
• Up to 16 K80 GPUs in a single instance
• Including peer-to-peer PCIe GPU interconnect
• Supporting a wide variety of use cases including deep
learning, HPC simulations, and batch rendering
Instance size   GPUs   GPU peer-to-peer   vCPUs   Memory (GiB)   Network bandwidth*
p2.xlarge       1      -                  4       61             1.25 Gbps
p2.8xlarge      8      Y                  32      488            10 Gbps
p2.16xlarge     16     Y                  64      732            20 Gbps
*In a placement group
F1 FPGA Instances
• Up to 8 Xilinx Virtex UltraScale+ VU9P FPGAs in a single instance,
with four high-speed DDR4 channels per FPGA
• Largest size includes high performance FPGA interconnects via PCIe
Gen3 (FPGA Direct), and bidirectional ring (FPGA Link)
• Designed for hardware-accelerated applications including financial
computing, genomics, accelerated search, and image processing
Instance size   FPGAs   FPGA Link   FPGA Direct   vCPUs   Memory (GiB)   NVMe instance storage   Network bandwidth*
f1.2xlarge      1       -           -             8       122            1 x 480                 5 Gbps
f1.16xlarge     8       Y           Y             64      976            4 x 960                 30 Gbps
*In a placement group
Why FPGAs? featuring Edico Genome
Genomic Big Data
Scale
• Population scale genomics
• Precision medicine for all
• Liquid biopsy cancer screenings
DNA data doubles every 7 months – CPU speeds double every 2 years
Size
• Each person’s genome is ~100 GB
• Computationally intensive analysis
• Multiple copies stored forever
FPGAs for Genomics HPC
Highly Efficient
• Algorithms implemented in hardware
• Gate-level circuit design
• No instruction set overhead
Massively Parallel
• Massively parallel circuits
• Multiple compute engines
• Rapid FPGA reconfigurability
Speeds analysis of whole human genomes from hours to minutes
Unprecedented low cost for compute and compressed storage
www.edicogenome.com
Deploying HPC
on AWS
Traditional HPC Stack
Shared file storage
HPC cluster
License managers and cluster
head nodes with job schedulers
3D graphics remote desktop servers
Remote
graphics workstations
Storage cache
Remote sites
Remote backup
Migrating HPC to AWS
Shared File Storage
Cloud-based, scaling HPC cluster
on EC2
License managers and cluster
head nodes with job schedulers
3D graphics virtual workstation
AWS Direct Connect
On-Premises IT
Resources
Thin or Zero Client
- No local data -
Storage cache: Amazon S3 and Amazon Glacier
Deploying HPC
on AWS (Legacy)
Deploying HPC
on AWS (Optimized)
Use different on-demand HPC clusters for different applications or end-users
1. Users access resources via secure VPN tunnel
2. Cloud desktops are GPU-enabled for graphics performance
3. Hardened and monitored proxy server used for all access
4. Optional: AWS CodeCommit used for source code repo
5. Continuous Integration server used to manage builds
6. Amazon Simple Queue Service (SQS) used for queue-based job submission
7. Application-specific compute nodes automatically scaled based on demand
8. License server can be on-premises, or in cloud with results and logs pushed to S3
9. Coverage tracking system notified and updated as jobs complete
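Steps 6 and 7 above describe a queue-based submission pattern. A minimal local sketch using only the standard library (a `queue.Queue` stands in for Amazon SQS, and worker threads stand in for auto-scaled compute nodes; with real SQS the put/get calls would become `send_message`/`receive_message`):

```python
import json
import queue
import threading

job_queue = queue.Queue()        # stand-in for an SQS queue
results = []                     # stand-in for results pushed to S3
results_lock = threading.Lock()

def worker():
    """Pull JSON job messages until a None sentinel arrives."""
    while True:
        msg = job_queue.get()
        if msg is None:
            job_queue.task_done()
            return
        job = json.loads(msg)
        # ... run the solver for this case here ...
        with results_lock:
            results.append(job["case"])
        job_queue.task_done()

# Step 6: submit jobs as JSON messages.
for i in range(4):
    job_queue.put(json.dumps({"id": i, "case": f"sweep-{i}"}))

# Step 7: "scale out" two workers, then drain the queue and shut down.
workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
for _ in workers:
    job_queue.put(None)
job_queue.join()
for w in workers:
    w.join()
```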
Automation Capabilities: CfnCluster
• CfnCluster simplifies
deployment of HPC in the
cloud, including integrating
with popular HPC schedulers
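A CfnCluster deployment is driven by an INI-style config file. An illustrative fragment (field names as in CfnCluster-era documentation; the key pair, VPC, and subnet IDs are placeholders — check the CfnCluster docs for your version):

```ini
[global]
cluster_template = default

[cluster default]
key_name = my-keypair
vpc_settings = public
master_instance_type = c4.xlarge
compute_instance_type = c4.8xlarge
initial_queue_size = 2
max_queue_size = 10
scheduler = sge
placement_group = DYNAMIC

[vpc public]
vpc_id = vpc-xxxxxxxx
master_subnet_id = subnet-xxxxxxxx
```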
Amazon S3
Secure, durable,
highly-scalable object
storage. Fast access,
low cost.
For long-term durable
storage of data, in a
readily accessible
get/put access format.
Primary durable and
scalable storage for
HPC data
Amazon Glacier
Secure, durable, long
term, highly cost-
effective object
storage.
For long-term storage
and archival of data
that is infrequently
accessed.
Use for long-term,
lower-cost archival
of HPC data
EC2+EBS
Create a single-AZ
shared file system
using EC2 and EBS,
with third-party or
open source software
(e.g., Intel Lustre).
For near-line storage
of files optimized for
high I/O performance.
Use for high-IOPs,
temporary working
storage
AWS Storage Options for HPC Workloads
EFS
Highly available,
multi-AZ, fully
managed network-
attached elastic file
system.
For near-line, highly-
available storage of
files in a traditional
NFS format (NFSv4).
Use for read-often,
temporary working
storage
Secure Graphics and Collaboration
Cloud can be used for pre- and post-processing as well as HPC
• Use GPUs in the cloud for remote rendering and remote desktops
Cloud is more secure
for collaboration
• Encrypt the data in flight
and at rest
• Manage your own keys
and credentials
• Deliver pixels to your
collaborators, not the
actual data
1) Customer Managed Application Hosting
• Customer has account with AWS and manages virtual infrastructure
• Cloud used for batch jobs via cluster management software
• Customer can also remote log in and collaborate using GPU instances
• Customer maintains traditional software vendor relationships
• Software vendor optionally offers license flexibility for scalable computing
2) Software Vendor Managed Application Hosting
• SaaS or hybrid model for managed engineering apps in the cloud
• Customer pays software vendor for cloud-hosted services
• Customer does not need to manage virtual infrastructure
Either method of software delivery is supported on AWS, and the right method will depend
on customer requirements – for security and governance, ease of deployment, etc.
Options for Software Licensing
Example: ANSYS Enterprise Cloud
Example: Altair HyperWorks on AWS
Virtual screening at Novartis
• 10 million compounds screened
against a cancer target, in only 9 hours
• Approximately 87,000 compute cores
at peak
HPC Partner on AWS: Cycle Computing
Engineering simulations at HGST:
• Millions of parameter sweeps, running
months of simulations in just hours
• Over 85,000 Intel cores running at peak,
using Spot Instances
www.cyclecomputing.com
● Customer:
● Reduced analysis time from 5.3 days to 12 hours
● Instantly scaled up to 48 cores
HPC Partner on AWS: Rescale (APN Advanced Partner)
Rescale’s cloud HPC platform
• Offers native integration with more than
180 simulation and machine learning
applications in a SaaS environment
• Automation of systems tools and
services enables seamless
deployment on AWS
• JL & Associates used Rescale on
AWS to utilize multiphase CFD
analysis for modeling boiling oil
(C12H26)
• The team was able to achieve its
goal of steady-state convergence,
which required 23k iterations at
~20 sec/iteration
www.rescale.com
HPC Partner on AWS: Alces Flight
www.alces-flight.com
Future Trends: Microservice-Based HPC
www.algorithmia.com
Next Steps
Visit aws.amazon.com/hpc
Additional sessions:
• CMP314 - Bringing Deep Learning to the Cloud with Amazon EC2
• CMP317 – Deep Learning, 3D Content Rendering, and Massively
Parallel, Compute-Intensive Workloads in the Cloud
• CMP318 – Building HPC Clusters as Code
• CMP320 – Delivering Graphical Applications on AWS
Thank you!
Remember to complete
your evaluations!