AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
TRANSCRIPT
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
David Pellerin, Business Development Principal, HPC
November 30, 2016
High Performance
Computing on AWS
CMP207
What to Expect from the Session
Overview of use cases for HPC in science, aerospace, automotive and
manufacturing, life sciences, financial services, and energy.
Overview of HPC capabilities on AWS, including:
• The newest compute-optimized instances
• P2 GPU instances
• F1 FPGA instances
Best practices for running traditional and new/emerging HPC workflows in
the cloud, including graphical pre- and post-processing and workflow
automation
What is HPC?
Use Cases
• High-energy physics simulations
• Weather and climate modeling and prediction
• Analysis of fluids, structures, and materials
• Thermal and electromagnetic simulations
• Genomics, proteomics, and molecular dynamics
• Seismic and reservoir simulations
• 3D rendering and visualizations
• Deep learning training and inference
Cloud unlocks HPC for a broad range of use cases
AWS for High Performance Computing…
Scale Matters for Big Data and Big Compute
Big Data drives Big Compute in Big Science
Big Compute on AWS for Big Science
HPC in Energy Management
Big Data meets Big Compute
“Fugro Roames has enabled Ergon Energy to
reduce the cost of vegetation management from
AU$100 million to AU$60 million per year.”
- Josh Passenger, Technical Architect, Fugro Roames
• Aircraft equipped with cameras, laser sensors
• Repeated overflights of power networks
• Captured data is used to render detailed 3D
models of the power lines, and the environment
• Analytics and simulations are run to generate
actionable reports for directing post-disaster
repair and prioritizing ongoing maintenance
HGST applications for engineering:
Molecular dynamics, CAD, CFD, EDA
Collaboration tools for engineering
Big data for manufacturing yield analysis
HPC for Engineering Simulations
Running drive-head
simulations at scale:
Millions of parallel parameter
sweeps, running months of
simulations in just hours
Over 85,000 Intel cores
running at peak, using Spot
Instances
16M cell, polyhedral,
external aero case
Running on c4.8xlarge
instances
Demonstrates excellent
scalability for typical
CFD models
HPC for Aerospace
Mapping HPC Use-Cases
Data Light: minimal requirements for high performance storage
Data Heavy: benefits from access to high performance storage
Fluid dynamics
Weather forecasting
Materials simulations
Crash simulations
Risk simulations
Molecular modeling
Contextual search
Logistics simulations
Animation and VFX
Semiconductor verification
Image processing/GIS
Genomics
Seismic processing
Metagenomics
Astrophysics
Deep learning
Clustered (Tightly Coupled)
Distributed/Grid (Loosely Coupled)
Cluster HPC and Grid HPC on the Cloud
Cluster HPC
Tightly coupled,
latency sensitive
applications
Use larger EC2
compute instances,
placement groups,
enhanced networking
Grid HPC
Loosely coupled,
pleasingly parallel
Use a variety of EC2
instances, multiple
AZs, Spot, Auto
Scaling, Amazon
SQS
Grids of Clusters
Use a grid strategy on the cloud
to run a group of parallel,
individually clustered HPC jobs
What Does This Mean for Simulations?
Expand the simulation domain
Run larger numbers of parallel, clustered HPC jobs
Performance Testing
for Common Applications
Weather Prediction
WRF Scaling and Performance on AWS
[Charts: WRF 2.5 km CONUS benchmark — solve time (s) and scale-up vs. number of cores, including c4.8xlarge time and scale-up curves]
Structural Analysis
AWG ERIF Test Case 2.1: Fan Blade-Off Rig Test, Generic Fan Rig Model
ANSYS Mechanical FEA Performance
ENGINE BLOCK (V17cg-3), PCG solver: static structural analysis of an engine block without the internal components
Performance for Fluid Dynamics on AWS: ANSYS Fluent
• AWS c4.8xlarge
• 140M cells
• F1 car CFD benchmark
http://www.ansys-blog.com/simulation-on-the-cloud/
Test using larger, real-world examples
• Use large cases for testing: do not benchmark scalability
using only small examples
Domain decomposition
• Choose number of cells per core for either per-core
efficiency or for faster results
Instance types
• C4 or M4 are best choices today
Network
• Use a placement group
• Enable enhanced networking
Performance Considerations for HPC on AWS
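The placement-group advice above can be sketched with boto3. This is an illustrative fragment, not from the talk: the AMI ID and the group name `hpc-pg` are placeholders, and the launch parameters are built as a plain dict (following the EC2 RunInstances API keys) so they can be inspected before any call to `ec2.run_instances(**params)`.

```python
def cluster_launch_params(ami_id: str, group_name: str,
                          instance_type: str = "c4.8xlarge",
                          count: int = 4) -> dict:
    """Build EC2 RunInstances parameters for a tightly coupled HPC cluster."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,  # all-or-nothing launch: either the whole
        "MaxCount": count,  # cluster starts, or nothing does
        "Placement": {"GroupName": group_name},
        # Enhanced networking (SR-IOV) also requires an HVM AMI with the
        # appropriate driver; on C4 it is active by default with such AMIs.
    }

params = cluster_launch_params("ami-12345678", "hpc-pg")
```

The placement group itself would be created first, e.g. with `ec2.create_placement_group(GroupName="hpc-pg", Strategy="cluster")`.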
Choose a cell-to-core
ratio to optimize core
efficiency, to optimize
license costs, or to
achieve faster results
Higher per-core
efficiency
Faster results
Domain Decomposition is Important
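The trade-off on this slide can be made concrete with a little arithmetic. A minimal sketch (illustrative numbers, not vendor guidance): given a target cells-per-core ratio, compute how many cores and nodes a mesh needs.

```python
import math

def decompose(total_cells: int, cells_per_core: int,
              cores_per_node: int = 18) -> tuple:
    """Cores and nodes needed for a mesh at a target cells-per-core ratio.

    A higher cells_per_core target favors per-core efficiency (and fewer
    solver licenses); a lower target spreads the mesh wider for faster
    results, at the cost of more MPI communication. cores_per_node=18
    assumes the physical cores of a c4.8xlarge.
    """
    cores = math.ceil(total_cells / cells_per_core)
    nodes = math.ceil(cores / cores_per_node)
    return cores, nodes
```

For a 16M-cell case like the external aero benchmark above, a 100k cells-per-core target gives `decompose(16_000_000, 100_000) == (160, 9)`; halving the target to 50k doubles the core count for faster turnaround.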
OS version
• Use Amazon Linux or a Linux kernel version 3.10 or later
Processor states and affinity
• Use P-states to reduce processor variability
• Use CPU affinity to pin threads to CPU cores
MPI libraries
• Intel MPI recommended
Hyper-threading
• Test with Hyper-threading on and off
• Usually off is best, but not always
Performance Considerations for HPC on AWS
EC2 Instance Types
for HPC
Broad Set of Compute Instance Types
• General purpose: M3, M4
• Compute optimized: C3, C4, C5
• Memory optimized: R3, R4, X1
• Storage and I/O optimized: I2, I3, D2, HS1
• GPU or FPGA enabled: G2, P2, F1
Diving Deep: m4.16xlarge
CPU-Based Instances for HPC
Intel CPUs
• Up to 2.9 GHz, Turbo enabled up to 3.6 GHz
• Intel® Advanced Vector Extensions (Intel® AVX2)
• Control over C-States, P-States, and Hyper-threading
• C4, M4 are the most common instance types for HPC:
• Up to 64 vCPUs (32 physical cores)
• R3 and X1 for higher memory applications
• Up to 128 vCPUs (64 physical cores), up to 2 TB RAM
• Proprietary network delivering up to 20 Gbps
GPU and FPGA Instances
P2: GPU instance
• Up to 16 NVIDIA GK210 (8 X K80) GPUs in a single instance, with
peer-to-peer PCIe GPU interconnect
• Supporting a wide variety of use cases including deep learning, HPC
simulations, financial computing, and batch rendering
F1: FPGA instance
• Up to 8 Xilinx Virtex® UltraScale+™ VU9P FPGAs in a single
instance, with peer-to-peer PCIe and bidirectional ring interconnects
• Designed for hardware-accelerated applications including financial
computing, genomics, accelerated search, and image processing
P2 GPU Instances
• Up to 16 K80 GPUs in a single instance
• Including peer-to-peer PCIe GPU interconnect
• Supporting a wide variety of use cases including deep
learning, HPC simulations, and batch rendering
Instance size   GPUs   GPU peer-to-peer   vCPUs   Memory (GiB)   Network bandwidth*
p2.xlarge       1      -                  4       61             1.25 Gbps
p2.8xlarge      8      Y                  32      488            10 Gbps
p2.16xlarge     16     Y                  64      732            20 Gbps
*In a placement group
F1 FPGA Instances
• Up to 8 Xilinx Virtex UltraScale+ VU9P FPGAs in a single instance,
with four high-speed DDR4 channels per FPGA
• Largest size includes high performance FPGA interconnects via PCIe
Gen3 (FPGA Direct), and bidirectional ring (FPGA Link)
• Designed for hardware-accelerated applications including financial
computing, genomics, accelerated search, and image processing
Instance size   FPGAs   FPGA Link   FPGA Direct   vCPUs   Memory (GiB)   NVMe instance storage   Network bandwidth*
f1.2xlarge      1       -           -             8       122            1 x 480                 5 Gbps
f1.16xlarge     8       Y           Y             64      976            4 x 960                 30 Gbps
*In a placement group
Why FPGAs? featuring Edico Genome
Genomic Big Data
Scale
• Population scale genomics
• Precision medicine for all
• Liquid biopsy cancer screenings
DNA data doubles every 7 months – CPU speeds double every 2 years
Size
• Each person’s genome is ~100 GB
• Computationally intensive analysis
• Multiple copies stored forever
FPGAs for Genomics HPC
Highly Efficient
• Algorithms implemented in hardware
• Gate-level circuit design
• No instruction set overhead
Massively Parallel
• Massively parallel circuits
• Multiple compute engines
• Rapid FPGA reconfigurability
Speeds analysis of whole human genomes from hours to minutes
Unprecedented low cost for compute and compressed storage
www.edicogenome.com
Deploying HPC
on AWS
Traditional HPC Stack
Shared file storage
HPC cluster
License managers and cluster
head nodes with job schedulers
3D graphics remote desktop servers
Remote
graphics workstations
Storage cache
Remote sites
Remote backup
Migrating HPC to AWS
Shared File Storage
Cloud-based, scaling HPC cluster
on EC2
License managers and cluster
head nodes with job schedulers
3D graphics virtual workstation
AWS Direct Connect
On-Premises IT
Resources
Thin or Zero Client
- No local data -
Storage cache: Amazon S3 and Amazon Glacier
Deploying HPC
on AWS (Legacy)
Deploying HPC
on AWS (Optimized)
Use different on-demand HPC clusters for different applications or end-users
1. Users access resources via secure VPN tunnel
2. Cloud desktops are GPU-enabled for graphics performance
3. Hardened and monitored proxy server used for all access
4. Optional: AWS CodeCommit used for source code repo
5. Continuous Integration server used to manage builds
6. Amazon Simple Queue Service (SQS) used for queue-based job submission
7. Application-specific compute nodes automatically scaled based on demand
8. License server can be on-premises, or in cloud with results and logs pushed to S3
9. Coverage tracking system notified and updated as jobs complete
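Steps 6 and 7 above describe a queue-based submission pattern. A minimal local sketch using only the standard library (a `queue.Queue` stands in for Amazon SQS, and worker threads stand in for auto-scaled compute nodes; with real SQS the put/get calls would become `send_message`/`receive_message`):

```python
import json
import queue
import threading

job_queue = queue.Queue()        # stand-in for an SQS queue
results = []                     # stand-in for results pushed to S3
results_lock = threading.Lock()

def worker():
    """Pull JSON job messages until a None sentinel arrives."""
    while True:
        msg = job_queue.get()
        if msg is None:
            job_queue.task_done()
            return
        job = json.loads(msg)
        # ... run the solver for this case here ...
        with results_lock:
            results.append(job["case"])
        job_queue.task_done()

# Step 6: submit jobs as JSON messages.
for i in range(4):
    job_queue.put(json.dumps({"id": i, "case": f"sweep-{i}"}))

# Step 7: "scale out" two workers, then drain the queue and shut down.
workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
for _ in workers:
    job_queue.put(None)
job_queue.join()
for w in workers:
    w.join()
```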
Automation Capabilities: CfnCluster
• CfnCluster simplifies
deployment of HPC in the
cloud, including integrating
with popular HPC schedulers
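A CfnCluster deployment is driven by an INI-style config file. An illustrative fragment (field names as in CfnCluster-era documentation; the key pair, VPC, and subnet IDs are placeholders — check the CfnCluster docs for your version):

```ini
[global]
cluster_template = default

[cluster default]
key_name = my-keypair
vpc_settings = public
master_instance_type = c4.xlarge
compute_instance_type = c4.8xlarge
initial_queue_size = 2
max_queue_size = 10
scheduler = sge
placement_group = DYNAMIC

[vpc public]
vpc_id = vpc-xxxxxxxx
master_subnet_id = subnet-xxxxxxxx
```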
Amazon S3
Secure, durable,
highly-scalable object
storage. Fast access,
low cost.
For long-term durable
storage of data, in a
readily accessible
get/put access format.
Primary durable and
scalable storage for
HPC data
Amazon Glacier
Secure, durable, long
term, highly cost-
effective object
storage.
For long-term storage
and archival of data
that is infrequently
accessed.
Use for long-term,
lower-cost archival
of HPC data
EC2+EBS
Create a single-AZ
shared file system
using EC2 and EBS,
with third-party or
open source software
(e.g., Intel Lustre).
For near-line storage
of files optimized for
high I/O performance.
Use for high-IOPs,
temporary working
storage
AWS Storage Options for HPC Workloads
EFS
Highly available,
multi-AZ, fully
managed network-
attached elastic file
system.
For near-line, highly-
available storage of
files in a traditional
NFS format (NFSv4).
Use for read-often,
temporary working
storage
Secure Graphics and Collaboration
Cloud can be used for pre- and post-processing as well as HPC
• Use GPUs in the cloud for remote rendering and remote desktops
Cloud is more secure
for collaboration
• Encrypt the data in flight
and at rest
• Manage your own keys
and credentials
• Deliver pixels to your
collaborators, not the
actual data
1) Customer Managed Application Hosting
• Customer has account with AWS and manages virtual infrastructure
• Cloud used for batch jobs via cluster management software
• Customer can also remote log in and collaborate using GPU instances
• Customer maintains traditional software vendor relationships
• Software vendor optionally offers license flexibility for scalable computing
2) Software Vendor Managed Application Hosting
• SaaS or hybrid model for managed engineering apps in the cloud
• Customer pays software vendor for cloud-hosted services
• Customer does not need to manage virtual infrastructure
Either method of software delivery is supported on AWS, and the right method will depend
on customer requirements – for security and governance, ease of deployment, etc.
Options for Software Licensing
Example: ANSYS Enterprise Cloud
Example: Altair HyperWorks on AWS
Virtual screening at Novartis
• 10 million compounds screened
against a cancer target, in only 9 hours
• Approximately 87,000 compute cores
at peak
HPC Partner on AWS: Cycle Computing
Engineering simulations at HGST:
• Millions of parameter sweeps, running
months of simulations in just hours
• Over 85,000 Intel cores running at peak,
using Spot Instances
www.cyclecomputing.com
● Customer:
● Reduced analysis time from 5.3 days to 12 hours
● Instantly scaled up to 48 cores
HPC Partner on AWS: Rescale (APN Advanced Partner)
Rescale’s cloud HPC platform
• Offers native integration with more than
180 simulation and machine learning
applications in a SaaS environment
• Automation of systems tools and
services enables seamless
deployment on AWS
• JL & Associates used Rescale on
AWS to utilize multiphase CFD
analysis for modeling boiling oil
(C12H26)
• The team was able to achieve its
goal of steady-state convergence,
which required 23k iterations at
~20 sec/iteration
www.rescale.com
HPC Partner on AWS: Alces Flight
www.alces-flight.com
Future Trends: Microservice-Based HPC
www.algorithmia.com
Next Steps
Visit aws.amazon.com/hpc
Additional sessions:
• CMP314 - Bringing Deep Learning to the Cloud with Amazon EC2
• CMP317 – Deep Learning, 3D Content Rendering, and Massively
Parallel, Compute-Intensive Workloads in the Cloud
• CMP318 – Building HPC Clusters as Code
• CMP320 – Delivering Graphical Applications on AWS
Thank you!
Remember to complete
your evaluations!