(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014

November 13, 2014 | Las Vegas, NV
Sérgio Mafra, ONS (Operador Nacional do Sistema Elétrico)
Ricardo Geh, AWS Enterprise Solutions Architect

Upload: amazon-web-services

Post on 30-Jun-2015


DESCRIPTION

Since 2011, ONS (ons.org.br), which is responsible for planning and operating the Brazilian electric sector, has been using AWS to run daily simulations based on complex mathematical models. The MIT StarCluster toolkit makes running HPC on AWS much less complex and lets ONS provision a high-performance cluster in less than 5 minutes. Because how long a large cluster stays up depends on the user, ONS decided to develop an HPC portal where its engineers can work with AWS and MIT StarCluster without writing a line of code or using the command terminal. It is a simple turn-on/turn-off portal. The cluster now gets personal: every engineer runs the models using HPC on AWS as if they were using a PC.

TRANSCRIPT

Page 1: (BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014

November 13, 2014 | Las Vegas, NV

Sérgio Mafra – ONS (Operador Nacional do Sistema Elétrico)

Ricardo Geh – AWS Enterprise Solutions Architect

Page 2:
Page 3:
Page 4:

Shifting the Paradigm

Page 5:

Shifting the Paradigm

Flexibility: How HPC can be used as a utility

Page 6:

HPC as utility

Pay As You Go Model
• Use only what you need
• Multiple pricing models

On-Premises: Capital Expense Model
• High upfront capital cost
• High cost of ongoing support

Page 7:

[Figure: two capacity-versus-demand charts, "Rigid On-Premises Resources" and "Elastic Cloud-Based Resources". Labels: Predicted Demand, Actual Demand, Waste, Customer Dissatisfaction, Resources scaled to demand.]

Page 8:
Page 9:
Page 10:

Scalability on AWS

Scale using Elastic Capacity: <10 cores, >600 cores, >1500 cores

Page 11:

Making Production Cloud HPC easy from 64 cores to … 156,314 cores for better solar panel materials for $33k, not $68M

• Pharma: Johnson & Johnson
• Manufacturing: HGST, a Western Digital Company
• Financial Services: Pacific Life Insurance
• Genomics: Life Technologies
• Research: The Aerospace Corporation

• Amazon EC2: 16,788 Spot Instances
• Amazon S3: 4TB Processed
• Spot Instances on all 8 Regions
• 1.21 PetaFLOPS
• Intel SandyBridge on CC2

Page 12:
Page 13:

Shifting the Paradigm

Flexibility: How HPC can be used as a utility

Cost-optimization: It’s about new cost models and new ways to enable your business to do more.

Page 14:

On-Demand
Pay for compute capacity by the hour with no long-term commitments
For spiky workloads, or to define needs

Reserved
Make a low, one-time payment and receive a significant discount on the hourly charge
For committed utilization

Spot
Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand
For time-insensitive or transient workloads
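
To make the three purchasing options concrete, here is a minimal Python sketch that compares one year of cost for a single instance under each model. Every price in it is a hypothetical placeholder, not an AWS rate. The Reserved option is simplified to a one-time payment plus a discounted hourly rate billed only for hours used, and the Spot price is treated as a fixed average even though the slide notes that it fluctuates.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class Pricing:
    """Hypothetical prices for one instance type (placeholders, not AWS rates)."""
    on_demand_hourly: float
    reserved_upfront: float   # one-time payment
    reserved_hourly: float    # discounted hourly charge
    spot_hourly: float        # assumed average of a fluctuating Spot price


def yearly_cost(pricing, utilization):
    """Return the yearly cost per model for a utilization between 0 and 1."""
    hours = HOURS_PER_YEAR * utilization
    return {
        "on_demand": pricing.on_demand_hourly * hours,
        "reserved": pricing.reserved_upfront + pricing.reserved_hourly * hours,
        "spot": pricing.spot_hourly * hours,
    }


def break_even_utilization(pricing):
    """Utilization above which Reserved becomes cheaper than On-Demand."""
    saving_per_hour = pricing.on_demand_hourly - pricing.reserved_hourly
    return pricing.reserved_upfront / (saving_per_hour * HOURS_PER_YEAR)


# Illustrative numbers only.
EXAMPLE = Pricing(on_demand_hourly=2.00, reserved_upfront=5000.0,
                  reserved_hourly=1.00, spot_hourly=0.50)

if __name__ == "__main__":
    for utilization in (0.1, 0.5, 0.9):
        print(utilization, yearly_cost(EXAMPLE, utilization))
    print("break-even utilization:", break_even_utilization(EXAMPLE))
```

With these placeholder numbers, On-Demand is cheaper than Reserved for a lightly used instance, and Reserved wins once utilization passes the break-even point, which matches the slide's "spiky workloads" versus "committed utilization" guidance.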

Page 15:

[Figure: Percentage of Peak Requirements Over Time. Vertical axis: 20% to 100%; horizontal axis: periods 1 to 28. Series labels: Heavy Utilization Reserved Instances, Light RI, On-Demand, Spot and On-Demand.]

Page 16:

[Figure: EC2 instance families plotted by EC2 Compute Units (HP) (axis ticks 1 to 128) and Memory (GB) (axis ticks 1 to 256). Family labels: Micro, Standard, High CPU, High Memory, Cluster Compute, High I/O, Cluster High Memory, High Storage.]

Page 17:

Shifting the Paradigm

Flexibility: How HPC can be used as a utility

Cost-optimization: It’s about new cost models and new ways to enable your business to do more.

Performance and power: From embarrassingly parallel to tightly coupled

Page 18:

Automation & control
• CLI, API, and console
• Scripted configurations
• Automatic re-sizing of compute clusters based upon demand and policies

Page 19:

Try our HPC AWS CloudFormation-based demo

cfncluster (“CloudFormation cluster”)
• Command Line Interface Tool
• Deploy and demo an HPC cluster

For more info: aws.amazon.com/hpc/resources

Page 20:

Performance for tightly-coupled workloads

Cluster compute instances
• Implement HVM process execution
• Intel® Xeon® processors
• 10 Gigabit Ethernet – Enhanced networking, SR-IOV

c3.8xlarge: 32 vCPUs; 2.8 GHz Intel Xeon E5-2680v2 Ivy Bridge; 60GB RAM; 2 x 320 GB Local SSD
r3.8xlarge: 32 vCPUs; 2.5 GHz Intel Xeon E5-2670v2 Ivy Bridge; 244 GB RAM; 2 x 320 GB Local SSD
i2.8xlarge: 32 vCPUs; 2.5 GHz Intel Xeon E5-2670v2 Ivy Bridge; 244 GB RAM; 8 x 800 GB Local SSD

Page 21:

Performance for tightly-coupled workloads

Network placement groups
Cluster instances deployed in a Placement Group enjoy low latency, full bisection 10 Gbps bandwidth

Page 22:

Performance for tightly-coupled workloads

GPU compute instances

CG1 instances
• Intel® Xeon® X5570 processors
• 2 x NVIDIA Tesla “Fermi” M2050 GPUs
• I/O Performance: Very High
cg1.8xlarge: 33.5 EC2 Compute Units; 20GB RAM; 2x NVIDIA GPU; 448 Cores; 3GB Mem

G2 instances
• Intel® Xeon® E5-2670
• 1 NVIDIA Kepler GK104 GPU
• I/O Performance: Very High
g2.2xlarge: 26 EC2 Compute Units; 16GB RAM; 1x NVIDIA GPU; 1536 Cores; 4GB Mem

Page 23:

Shifting the Paradigm

Flexibility: How HPC can be used as a utility

Cost-optimization: It’s about new cost models and new ways to enable your business to do more

Performance and power: Flexibility to choose platforms

Achieve more: Perform bigger, more complex jobs in a much reduced time

Page 24:

Customers are using AWS for more and more HPC workloads

Oil and Gas
• Seismic Data Processing
• Reservoir Simulations, Modeling
• Geospatial applications
• Predictive Maintenance

Manufacturing & Engineering
• Computational Fluid Dynamics (CFD)
• Finite Element Analysis (FEA)
• Wind Simulation

Life Sciences
• Genome Analysis
• Molecular Modeling
• Protein Docking

Media & Entertainment
• Transcoding and Encoding
• DRM, Encryption
• Rendering

Energy & Scientific Computing
• Computational Chemistry
• High Energy Physics
• Stochastic Modeling
• Quantum Analysis
• Energy Models
• Climate Models

Financial
• Monte Carlo Simulations
• Wealth Management Simulations
• Portfolio, Credit Risk Analytics
• High Frequency Trading Analytics

Page 25:
Page 26:

ONS’s Journey to the Cloud

Who is ONS?

Page 27:

BIPS

Page 28:
Page 29:

• North: Isolated
• Brasilia: Main CC
• Recife: N/NE Branch
• Rio de Janeiro: Southeast Branch
• Florianopolis: South Branch

Page 30:

ONS’s Journey to the Cloud

Who is ONS?
The Challenge

Page 31:

Math Models

Page 32:

Medium Term: NEWAVE
Horizon: 5 years
Stage: month
More uncertainty and fewer details

Short Term: DECOMP
Horizon: 1 to 6 months
Stage: week
Less uncertainty and more details

Updating of operating conditions

Page 33:

Decision
• Use Hydro: OK, or Energy Deficit (load shedding)
• Use Thermal to supplement Hydro: Spillage (waste), or OK

Immediate Cost, Future Cost, Total Cost
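
The trade-off on this slide can be illustrated with a toy calculation. The Python sketch below reads Total Cost as the sum of Immediate Cost and Future Cost, which is an assumption about the slide, and searches for the amount of hydro generation that minimizes it. The cost functions, prices, and reservoir figures are all hypothetical; this is not how NEWAVE or DECOMP work, only a small illustration of why using all hydro or all thermal is rarely the cheapest decision.

```python
def immediate_cost(hydro_mwh, demand_mwh, thermal_price):
    """Cost paid now: thermal generation covers whatever hydro does not."""
    thermal_mwh = max(demand_mwh - hydro_mwh, 0)
    return thermal_price * thermal_mwh


def future_cost(final_storage_mwh, capacity_mwh, scarcity):
    """Hypothetical convex penalty: emptier reservoirs raise the risk of deficit."""
    shortfall = capacity_mwh - final_storage_mwh
    return scarcity * shortfall ** 2


def best_dispatch(initial_storage_mwh=150, demand_mwh=100, capacity_mwh=200,
                  thermal_price=50.0, scarcity=0.25):
    """Grid-search the hydro generation (in whole MWh) with the lowest total cost."""
    best_hydro, best_total = None, None
    for hydro in range(0, min(initial_storage_mwh, demand_mwh) + 1):
        total = (immediate_cost(hydro, demand_mwh, thermal_price)
                 + future_cost(initial_storage_mwh - hydro, capacity_mwh, scarcity))
        if best_total is None or total < best_total:
            best_hydro, best_total = hydro, total
    return best_hydro, best_total


if __name__ == "__main__":
    hydro, total = best_dispatch()
    print("hydro MWh:", hydro, "total cost:", total)
```

With these placeholder values the cheapest decision mixes hydro and thermal: draining the reservoir lowers today's bill but raises the future penalty, and saving all the water does the opposite.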

Page 34:

“We need more power”

Decomp, Newave, Weather Forecast

Parallel Processing

Page 35:

“Sorry, we don’t have any power left”
“My job first… pleease!”

Page 36:

1. Elastic Environment
• Unlimited processing power
• Ideal for unexpected load

2. Low data transfer
• Input and output of small files
• Ideal for internet connection

3. Variable and right cost
• Pay per use
• Don’t need to buy huge servers

Page 37:
Page 38:

http://star.mit.edu/cluster/

Page 39:

Config

Cluster 1: Work A
Cluster 2: Work B
Cluster 3: Work C
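
One way to read this slide is that a single configuration file describes several independent clusters, one per workload. As a rough sketch under that assumption, the Python below generates a StarCluster-style config with one cluster template per workload. The section and option names follow StarCluster's config format as best recalled and should be checked against the StarCluster manual. The key name, key path, AMI ID, and cluster names are hypothetical placeholders, and real credentials are deliberately left out.

```python
import configparser
import io


def build_config(cluster_sizes, instance_type="c3.8xlarge"):
    """Render a StarCluster-style config with one template per cluster.

    cluster_sizes maps a (hypothetical) template name to its node count.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names upper-case instead of lower-casing them

    cfg["global"] = {"DEFAULT_TEMPLATE": next(iter(cluster_sizes))}
    cfg["key hpc-key"] = {"KEY_LOCATION": "~/.ssh/hpc-key.rsa"}  # placeholder path

    for name, size in cluster_sizes.items():
        cfg[f"cluster {name}"] = {
            "KEYNAME": "hpc-key",
            "CLUSTER_SIZE": str(size),
            "CLUSTER_USER": "sgeadmin",
            "NODE_IMAGE_ID": "ami-00000000",  # placeholder, not a real AMI
            "NODE_INSTANCE_TYPE": instance_type,
        }

    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_config({"cluster1": 2, "cluster2": 8, "cluster3": 16}))
```

Generating the file from a dictionary keeps the per-workload differences (here only the node count) in one place, which is the kind of "infrastructure as code" practice the talk later lists among its lessons.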

Page 40:
Page 41:

ONS’s Journey to the Cloud

Who is ONS?
The Challenge
The Journey

Page 42:
Page 43:

Simply put.. SHOW me the results..

Page 44:

Engineers would get lost with the AWS Management Console

They needed an easy, task-oriented portal

Page 45:

1. Self Service
2. Usage Control
3. Accountability

Page 46:

Cluster Name # Nodes On/Off
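
The three fields on this screen (cluster name, number of nodes, on/off) are enough to drive StarCluster from a portal. The Python sketch below shows one hypothetical way a back end could translate such a form into a StarCluster command line. It is not the ONS portal's code. The `ClusterRequest` type and the template name are invented for illustration, and the `start` and `terminate` flags are given as recalled from the StarCluster CLI, so they should be confirmed with `starcluster start --help` before use. The function only builds the command and does not run it.

```python
from dataclasses import dataclass


@dataclass
class ClusterRequest:
    """Hypothetical form data: the three fields shown on the portal screen."""
    name: str
    nodes: int
    on: bool


def starcluster_command(request, template="hpc"):
    """Build (but do not execute) the StarCluster command for a portal request."""
    if not request.name:
        raise ValueError("cluster name is required")
    if request.nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if request.on:
        # Turn on: start a cluster of the requested size from an assumed template.
        return ["starcluster", "start", "-c", template,
                "-s", str(request.nodes), request.name]
    # Turn off: terminate the cluster without an interactive confirmation prompt.
    return ["starcluster", "terminate", "--confirm", request.name]


if __name__ == "__main__":
    print(starcluster_command(ClusterRequest(name="decomp", nodes=4, on=True)))
    print(starcluster_command(ClusterRequest(name="decomp", nodes=4, on=False)))
```

Returning the command as a list keeps the sketch safe to run anywhere; a real portal would pass it to something like `subprocess.run` and record who requested it, in line with the usage-control and accountability goals on the previous slide.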

Page 47:
Page 48:
Page 49:
Page 50:

[Diagram: HPC architecture. Labels: Chrome; Controller; MIT StarCluster; HPC Cluster; Master Node (c3.8xlarge); Computing Nodes Node1, Node2, Node3, Node N (c3.8xlarge, 2 reserved); Work (EBS), 1 Tb; Amazon EC2; Amazon EBS; 1 Gbps; 10 Gbps; Data; Control.]

Page 51:

[Diagram: Virtual Private Cloud. Labels: Private Subnet; Public Subnet; HPC Cluster; Controller; Work; Internet Gateway; Internet/AWS; VPN site-site.]

Page 52:

ONS’s Journey to the Cloud

Who is ONS?
The Challenge
The Journey
Lessons Learned

Page 53:

Infrastructure as code

HPC gets personal

Page 54: