High Performance Computing Implementation on AWS

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Pawan Agnihotri Global Financial Services Solutions Architect March 23, 2017 High Performance Computing Implementation on AWS

Upload: amazon-web-services

Post on 07-Apr-2017


TRANSCRIPT

Page 1: High Performance Computing Implementation on AWS

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Pawan Agnihotri – Global Financial Services Solutions Architect

March 23, 2017

High Performance Computing Implementation on AWS

Page 2: High Performance Computing Implementation on AWS

Risk Management for Financial Services

Risk management is essential to the operations of all Financial Services institutions (FSIs).

Types of risk that need to be tested for include credit, market, foreign exchange, liquidity, volatility, and inflation.

Regulatory bodies are requiring FSIs to perform higher levels of stress testing to maintain adequate capital ratios.

Page 3: High Performance Computing Implementation on AWS

Models

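Banks use different models for risk analysis. Some examples include:

• CCAR (Comprehensive Capital Analysis and Review)
• CCR (Counterparty Credit Risk)
• VaR (Value at Risk)
• CVaR (Conditional Value at Risk)

To run these simulations, global FSIs need large amounts of compute resources.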
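To make the VaR and CVaR entries concrete, here is a minimal sketch of how both measures can be read off a set of simulated losses. It is not the method used by any particular bank or by the presenter: the function names, the normal loss distribution, and the simple empirical quantile are all assumptions chosen for illustration.

```python
import random
import statistics


def var_cvar(losses, confidence=0.95):
    """Return (VaR, CVaR) for a list of simulated losses.

    Uses a simple empirical quantile; production risk engines typically
    use more careful quantile estimators.
    """
    ordered = sorted(losses)
    # Position of the loss at the requested confidence level.
    cutoff = min(int(confidence * len(ordered)), len(ordered) - 1)
    var = ordered[cutoff]
    # CVaR (expected shortfall) is the mean of the losses at or beyond VaR.
    cvar = statistics.fmean(ordered[cutoff:])
    return var, cvar


def simulate_losses(n_paths, mean=0.0, stdev=1.0, seed=42):
    """Draw hypothetical portfolio losses from a normal distribution (an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(n_paths)]


if __name__ == "__main__":
    var, cvar = var_cvar(simulate_losses(100_000), confidence=0.99)
    print(f"99% VaR: {var:.3f}, 99% CVaR: {cvar:.3f}")
```

Each simulated path is independent of the others, which is why this kind of workload spreads well across many compute nodes.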

Page 4: High Performance Computing Implementation on AWS

The Challenges

• Datacenter capacity is limited, resulting in simulation backlogs or inadequate risk calculations.
• Financial instruments require flexible compute resources for development and testing.
• Limited capacity results in long run times for simulations.
• Regulatory and market fluctuations require flexible compute capabilities.
• Large upfront investments and ongoing maintenance are required to run on-premises grids.
• Standardized hardware offers limited grid and compute types.

Page 5: High Performance Computing Implementation on AWS

What Is Needed for a Solution

• Security of data (environment isolation and encryption of data at rest).
• Capacity on demand.
• Large amounts of storage capacity for data.
• Availability of different compute types.

Page 6: High Performance Computing Implementation on AWS

The Cluster as Seen by the Application User

[Figure: the cluster from the user's point of view, with the callout "Schedule impact!"]

Page 7: High Performance Computing Implementation on AWS

Security

Page 8: High Performance Computing Implementation on AWS

AWS Compliance

Key Certifications and Assurance Programs

SECURITY & COMPLIANCE

• Access Control
• Identity Management
• Key Management & Storage
• Monitoring & Logs
• Assessment and reporting
• Resource & Usage Auditing
• Configuration Compliance
• Web application firewall

Page 9: High Performance Computing Implementation on AWS

AWS Security: Deep Set of Cloud Security Tools

Encryption
• Key Management Service
• CloudHSM
• Server-side Encryption

Networking
• Virtual Private Cloud
• Web Application Firewall

Compliance
• Config
• CloudTrail
• Inspector
• Service Catalog

Identity
• IAM
• Active Directory Integration
• SAML Federation
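As one illustration of encrypting data at rest with the tools on this slide, the sketch below builds an S3 bucket policy that rejects uploads not requesting server-side encryption with KMS. It only constructs the policy document; it does not call AWS. The function name and the bucket name `example-risk-grid-results` are hypothetical, and a real deployment would need additional statements and review.

```python
import json


def require_kms_encryption_policy(bucket_name):
    """Build a bucket policy that denies PutObject requests lacking SSE-KMS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # Deny when the request does not ask for KMS server-side encryption.
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name, used only for illustration.
    print(json.dumps(require_kms_encryption_policy("example-risk-grid-results"), indent=2))
```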

Page 10: High Performance Computing Implementation on AWS

Compute Performance

Page 11: High Performance Computing Implementation on AWS

Performance Factors: Compute Capacity

Page 12: High Performance Computing Implementation on AWS

Performance Factors: Networks

AWS proprietary 10Gb networking
• Highest performance in .8xlarge instance sizes
• Full bisection bandwidth in placement groups

Enhanced networking
• Available on D2, C3, C4, M4, R3, I2
• Over 1M PPS performance, reduced instance-to-instance latencies, consistent performance

Page 13: High Performance Computing Implementation on AWS

Performance Factors: Storage

Locally attached or “instance storage”

Amazon EBS General Purpose (SSD) volumes

Amazon EBS Provisioned IOPS (SSD) volumes

Amazon EBS Magnetic volumes

Amazon S3 and Amazon Glacier for object storage

Page 14: High Performance Computing Implementation on AWS

Performance Factors: CPU

Intel Xeon E5-2670 (Sandy Bridge) CPUs
• Available on M3, CC2, CR1, and G2 instance types

Intel Xeon E5-2680 v2 (Ivy Bridge) CPUs
• Available on C3, R3, and I2 instance types
• 2.8 GHz in C3, Turbo enabled up to 3.6 GHz
• Supports Enhanced Advanced Vector Extensions (AVX) instructions

Intel Xeon E5-2666 v3 (Haswell – AVX2) CPUs
• Available on C4, D2, and M4 instance types
• 2.9 GHz in C4, Turbo enabled up to 3.5 GHz (with Intel Turbo Boost)
• Supports AVX2 instructions

Page 15: High Performance Computing Implementation on AWS

EC2 Instances: Types and Sizes

Example: c4.large

• c: instance family
• 4: instance generation
• large: instance size
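The naming scheme on this slide (family, generation, size) can be sketched as a small parser. This is an illustration only: the `InstanceType` tuple and `parse_instance_type` function are hypothetical names, and the pattern covers just the simple names that appear in this deck (such as c4.large or p2.16xlarge), not every EC2 instance name.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str
    generation: int
    size: str


# Letters for the family, digits for the generation, then ".size".
_PATTERN = re.compile(r"^([a-z]+)(\d+)\.([a-z0-9]+)$")


def parse_instance_type(name):
    """Split a name like 'c4.large' into family, generation, and size."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type name: {name!r}")
    family, generation, size = match.groups()
    return InstanceType(family, int(generation), size)


if __name__ == "__main__":
    for name in ("c4.large", "m4.10xlarge", "p2.16xlarge"):
        print(name, parse_instance_type(name))
```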

Page 16: High Performance Computing Implementation on AWS

New GPU Instance Types: P2

New EC2 GPU instance type, specifically for accelerated computing:
• Offers up to eight NVIDIA Tesla K80 accelerators

The 16xlarge size provides:
• Combined 192 GB of GPU memory
• 40 thousand CUDA cores
• 70 teraflops of single precision floating point performance
• Over 23 teraflops of double precision floating point performance

Target workloads:
• Deep learning, computational fluid dynamics, computational finance, seismic analysis, molecular modeling, genomics, rendering
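The P2 totals can be checked with a little arithmetic. The per-GPU figures below are not on the slide; they are assumptions taken from the published Tesla K80 specification (two GPUs per accelerator, each with 12 GB of memory and 2,496 CUDA cores). The function name is hypothetical.

```python
# Assumed Tesla K80 characteristics (not stated on the slide).
GPUS_PER_K80 = 2
MEMORY_GB_PER_GPU = 12
CUDA_CORES_PER_GPU = 2496


def p2_totals(k80_accelerators):
    """Aggregate GPU count, GPU memory, and CUDA cores for a number of K80 accelerators."""
    gpus = k80_accelerators * GPUS_PER_K80
    return {
        "gpus": gpus,
        "gpu_memory_gb": gpus * MEMORY_GB_PER_GPU,
        "cuda_cores": gpus * CUDA_CORES_PER_GPU,
    }


if __name__ == "__main__":
    # Eight accelerators corresponds to the largest size described above.
    print(p2_totals(8))
```

With eight accelerators this gives 16 GPUs, 192 GB of GPU memory, and 39,936 CUDA cores, consistent with the rounded "40 thousand" figure.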

Page 17: High Performance Computing Implementation on AWS

P2 Instance Types

Available in three sizes:

Instance Size   GPUs   P2P   vCPUs   Memory (GiB)   Network Bandwidth*
p2.xlarge       1      -     4       61             1.25 Gbps
p2.8xlarge      8      Y     32      488            10 Gbps
p2.16xlarge     16     Y     64      976            20 Gbps

*In a placement group

Page 18: High Performance Computing Implementation on AWS

Grid Reference Architecture

[Architecture diagram. Recoverable labels:]

• virtual private cloud (10.40.0.0/16)
• Subnet (10.40.10.0/20) with a Placement Group
• IAM Role
• MSS Node and Scheduler Node
• Compute Nodes
• Metadata Servers and Datanode Servers
• Amazon S3 and EFS
• Amazon CloudWatch, AWS CloudFormation, AWS CloudTrail, AWS Config, AWS KMS
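The two address ranges in the architecture diagram can be sanity-checked with Python's standard `ipaddress` module. This sketch assumes that 10.40.0.0/16 is the VPC range and 10.40.10.0/20 is the subnet range, which the diagram suggests but does not state; the function name is hypothetical.

```python
import ipaddress


def check_subnet(vpc_cidr, subnet_cidr):
    """Return (inside_vpc, normalized_subnet, address_count) for a subnet CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # strict=False masks off any host bits instead of raising ValueError.
    subnet = ipaddress.ip_network(subnet_cidr, strict=False)
    return subnet.subnet_of(vpc), str(subnet), subnet.num_addresses


if __name__ == "__main__":
    inside, normalized, count = check_subnet("10.40.0.0/16", "10.40.10.0/20")
    print(f"inside VPC: {inside}, normalized: {normalized}, addresses: {count}")
```

Note that 10.40.10.0/20 has host bits set, so the module normalizes it to 10.40.0.0/20. The diagram may simply be using shorthand; a subnet created from a template would need a boundary-aligned block.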

Page 19: High Performance Computing Implementation on AWS

Grid Operation

[Diagram labels: corporate data center, AWS cloud, Amazon S3]

Page 20: High Performance Computing Implementation on AWS

COST

Page 21: High Performance Computing Implementation on AWS

The Old Way: Low Utilization, High Costs

Typical cluster utilization rates are low due to the need to deploy for peak times.

[Chart over time. Labels: Server acquisition (three steps), Total servers deployed, Actual Demand for Computing, Unused IT Resources]

Page 22: High Performance Computing Implementation on AWS

The Cloud Way: Scalability When Needed

Scale higher to reduce time-to-results: shorter wait times, greater agility, faster innovation cycles

[Chart labels: Previous Peak (31K cores), New Peak (62K cores), Reduced Time, Project Acceleration]

Page 23: High Performance Computing Implementation on AWS

Scenarios for 1 Petaflop Cluster

Assumptions:

• 1 Petaflop total computing on AWS
• 636 Gigaflops for each m4.10xlarge instance
• 1572 total m4.10xlarge instances
• 31,447 total Xeon cores (E5-2676 v3 Intel Haswell)
• 251TB total RAM (8GB RAM per core)
• EC2 instance type selected for modeling purposes is m4.10xlarge. Other instance types and sizes are available, and may be recommended for cost optimization or to optimize for specific workloads
• Utilization for comparison purposes is assumed to be 60%
• Storage (1000TB) modeled as a blend of S3 object storage, Glacier, EFS, and EBS
• Persistent (head) nodes and license server nodes are assumed to be 100% utilized
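The sizing figures in the assumptions can be reproduced with simple arithmetic. The sketch below assumes 20 physical cores per m4.10xlarge instance; that number is not stated on the slide and is inferred from the core and instance totals. It also assumes decimal units (1 TB = 1000 GB). The function name is hypothetical.

```python
TARGET_FLOPS = 1e15          # 1 petaflop
FLOPS_PER_INSTANCE = 636e9   # 636 gigaflops per m4.10xlarge (from the slide)
CORES_PER_INSTANCE = 20      # assumption: physical cores per m4.10xlarge
RAM_GB_PER_CORE = 8          # from the slide


def size_cluster(target_flops=TARGET_FLOPS):
    """Estimate instance count, core count, and RAM for a target FLOPS figure."""
    instances = target_flops / FLOPS_PER_INSTANCE
    cores = instances * CORES_PER_INSTANCE
    ram_tb = cores * RAM_GB_PER_CORE / 1000
    return {
        "instances": int(instances),  # whole instances, rounded down as on the slide
        "cores": round(cores),
        "ram_tb": int(ram_tb),
    }


if __name__ == "__main__":
    print(size_cluster())
```

The core count matches the slide only when it is computed from the fractional instance count (about 1572.3), which suggests the slide's totals were derived the same way.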

Page 24: High Performance Computing Implementation on AWS

Scenarios for 1 Petaflop Peak Core Cluster

1. Scenario 1 (50% Reserved Instances, 50% Spot)
2. Scenario 2 (25% Reserved Instances, 75% Spot)

[Charts: each scenario covers 31,447 cores, split between Reserved Instances and Spot as listed above]

Page 25: High Performance Computing Implementation on AWS

Cost Structure 1 – 50% RI, 50% Spot

[Chart: 31,447 cores, 50% Reserved Instances and 50% Spot]

Summary

Total Compute Cost: $0.025 per core, per hour

Page 26: High Performance Computing Implementation on AWS

Cost Structure 2 – 25% RI, 75% Spot

[Chart: 31,447 cores, 25% Reserved Instances and 75% Spot]

Summary

Total Compute Cost: $0.02 per core, per hour
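To compare the two cost structures, the per-core rates can be multiplied out over the full cluster. This is a rough sketch: it assumes all 31,447 cores are running for the hour, and the slides do not say whether the quoted rates already reflect the 60% utilization assumption. The names below are hypothetical.

```python
from decimal import Decimal

TOTAL_CORES = 31447

# Per-core, per-hour rates quoted in the two cost summaries.
RATES = {
    "50% RI / 50% Spot": Decimal("0.025"),
    "25% RI / 75% Spot": Decimal("0.02"),
}


def hourly_cluster_cost(cores, rate_per_core_hour):
    """Cost of running the given number of cores for one hour."""
    return cores * rate_per_core_hour


if __name__ == "__main__":
    for label, rate in RATES.items():
        print(f"{label}: ${hourly_cluster_cost(TOTAL_CORES, rate)} per hour")
```

Decimal is used instead of float so that the currency arithmetic stays exact. On these rates, the 75% Spot mix costs 20% less per core-hour than the 50% Spot mix.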