(BDT202) HPC Now Means 'High Personal Computing' | AWS re:Invent 2014
DESCRIPTION
Since 2011, ONS.org.br (responsible for planning and operating the Brazilian Electric Sector) has been using AWS to run daily simulations with complex mathematical models. The MIT StarCluster toolkit makes running HPC on AWS much less complex and lets ONS provision a high-performance cluster in less than 5 minutes. Because how long a big cluster stays running depends on the user, ONS decided to develop an HPC portal through which its engineers can interface with AWS and MIT StarCluster without writing a line of code or using the command terminal. It is just a simple turn-on/turn-off portal. The cluster now gets personal: every engineer runs the models using HPC on AWS as if they were using a PC.
TRANSCRIPT
November 13, 2014 | Las Vegas, NV
Sérgio Mafra – ONS (Operador Nacional do Sistema Elétrico)
Ricardo Geh – AWS Enterprise Solutions Architect
Shifting the Paradigm
Flexibility: How HPC can be used as a utility
Pay As You Go Model
Use only what you need
Multiple pricing models
On-Premises
Capital Expense Model
High upfront capital cost
High cost of ongoing support
HPC as utility
Elastic Cloud-Based Resources
Actual demand
Resources scaled to demand
[Figure: Rigid On-Premises Resources vs. Elastic Cloud-Based Resources, plotted against Predicted Demand and Actual Demand; gaps labeled Waste and Customer Dissatisfaction]
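To make the waste vs. customer-dissatisfaction trade-off concrete, here is a minimal Python sketch that measures both gaps for a capacity plan. It assumes a made-up hourly demand series in cores; the numbers are illustrative only, not ONS or AWS data.

```python
from typing import List, Sequence, Tuple


def provisioning_gap(demand: Sequence[int], capacity: Sequence[int]) -> Tuple[int, int]:
    """Return (wasted, unmet) core-hours for an hourly capacity plan.

    wasted: capacity that was paid for but idle (capacity above demand)
    unmet:  demand that could not be served (demand above capacity)
    """
    if len(demand) != len(capacity):
        raise ValueError("demand and capacity must cover the same hours")
    wasted = sum(max(c - d, 0) for d, c in zip(demand, capacity))
    unmet = sum(max(d - c, 0) for d, c in zip(demand, capacity))
    return wasted, unmet


def fixed_capacity(cores: int, hours: int) -> List[int]:
    """A rigid on-premises plan: the same number of cores every hour."""
    return [cores] * hours


if __name__ == "__main__":
    # Hypothetical hourly demand in cores (illustrative, not ONS data).
    demand = [10, 12, 40, 200, 600, 80, 10, 5]
    for cores in (100, 600):  # sized below the peak, then at the peak
        print("fixed", cores, provisioning_gap(demand, fixed_capacity(cores, len(demand))))
    # Elastic resources scaled to demand leave neither gap.
    print("elastic", provisioning_gap(demand, demand))
```

Sizing below the peak trades waste for unmet demand; sizing at the peak removes unmet demand but multiplies idle core-hours. Only capacity that follows demand drives both to zero.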
Scale using Elastic Capacity
[Figure: Scalability on AWS, from <10 cores to >600 cores and >1500 cores]
Making Production Cloud HPC easy from 64 cores to
…
Pharma: Johnson & Johnson
Manufacturing: HGST, a Western Digital Company
Financial Services: Pacific Life Insurance
Genomics: Life Technologies
Research: The Aerospace Corporation
… 156,314 cores for better solar panel materials for $33k, not $68M
Amazon EC2: 16,788 Spot Instances
Amazon S3: 4 TB processed
Spot Instances in all 8 Regions
1.21 PetaFLOPS
Intel Sandy Bridge on CC2
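Putting the slide's headline numbers side by side gives the following. This is our own arithmetic on the quoted $33k and $68M figures, which are themselves rounded.

```latex
\frac{\$68{,}000{,}000}{\$33{,}000} \approx 2{,}061
\qquad\text{and}\qquad
\frac{\$33{,}000}{156{,}314\ \text{cores}} \approx \$0.21\ \text{per core for the whole run}
```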
Cost-optimization: It’s about new cost models and new ways to enable your business to do more.
On-Demand: Pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs.
Reserved: Make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization.
Spot: Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For time-insensitive or transient workloads.
[Figure: Percentage of Peak Requirements Over Time, with demand covered by layers of Heavy Utilization Reserved Instances, Light RIs, On-Demand, and Spot and On-Demand]
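One way to read the layered chart is as a split of an hourly demand curve into a steady base (Reserved) and a variable top (On-Demand, or Spot when the work is time-insensitive). Below is a minimal Python sketch of that split. The rule used to pick the reserved level (the instance count needed in at least a chosen fraction of hours) and the sample data are our assumptions, not an AWS pricing rule.

```python
import math
from typing import List, Sequence, Tuple


def reserved_baseline(demand: Sequence[int], coverage: float = 0.8) -> int:
    """Highest instance count that is needed in at least `coverage` of the hours.

    Capacity up to this level is busy most of the time, which is the
    "committed utilization" case for Reserved pricing (our assumption).
    """
    if not demand:
        raise ValueError("demand must not be empty")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(demand, reverse=True)
    # Round first so float noise (e.g. 8.000000000000002) does not bump the count.
    hours_needed = max(1, math.ceil(round(coverage * len(ordered), 9)))
    return ordered[hours_needed - 1]


def split_demand(demand: Sequence[int], reserved: int) -> List[Tuple[int, int]]:
    """Split each hour into (served by Reserved, left for On-Demand or Spot)."""
    return [(min(d, reserved), max(d - reserved, 0)) for d in demand]


if __name__ == "__main__":
    demand = [4, 4, 5, 6, 10, 12, 6, 5, 4, 4]  # hypothetical instances per hour
    base = reserved_baseline(demand)
    print("reserved:", base)
    print("peaks:", [peak for _, peak in split_demand(demand, base)])
```

Raising `coverage` lowers the reserved base and pushes more hours into the On-Demand/Spot layer; lowering it does the opposite.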
[Figure: EC2 instance families by EC2 Compute Units (1 to 128) and Memory (GB) (1 to 256): Micro, Standard, High CPU, High Memory, Cluster Compute & High I/O, Cluster High Memory & High Storage]
Performance and power: From embarrassingly parallel to tightly coupled
CLI, API, and console
Scripted configurations
Automation & control
Automatic re-sizing of compute clusters based upon demand and policies
cfncluster (“CloudFormation cluster”)
Command Line Interface Tool
Deploy and demo an HPC cluster
For more info:
aws.amazon.com/hpc/resources
Try our HPC AWS CloudFormation-based demo
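The idea of automatically re-sizing a cluster based on demand and policies can be sketched as a pure function that maps queue depth to a target node count. This is a minimal Python sketch. The policy fields (`jobs_per_node`, `min_nodes`, `max_nodes`) are hypothetical and are not cfncluster or StarCluster settings.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical resize policy; not a cfncluster or StarCluster setting."""

    jobs_per_node: int = 1  # queued/running jobs one compute node can absorb
    min_nodes: int = 0      # floor kept even when the queue is empty
    max_nodes: int = 20     # ceiling that caps cost

    def __post_init__(self) -> None:
        if self.jobs_per_node < 1 or not 0 <= self.min_nodes <= self.max_nodes:
            raise ValueError("invalid scaling policy")

    def target_nodes(self, queued_jobs: int, running_jobs: int) -> int:
        """Nodes needed for all queued and running jobs, clamped to [min, max]."""
        if queued_jobs < 0 or running_jobs < 0:
            raise ValueError("job counts must be non-negative")
        needed = math.ceil((queued_jobs + running_jobs) / self.jobs_per_node)
        return max(self.min_nodes, min(self.max_nodes, needed))


def resize_action(current_nodes: int, target: int) -> str:
    """Describe the change a cluster controller would request."""
    if target > current_nodes:
        return f"add {target - current_nodes} node(s)"
    if target < current_nodes:
        return f"remove {current_nodes - target} node(s)"
    return "no change"
```

For example, with `ScalingPolicy(jobs_per_node=4, min_nodes=1, max_nodes=10)`, nine queued jobs need three nodes, so a two-node cluster would be asked to add one. A real controller would also let busy nodes drain before removing them.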
Cluster compute instances
Implement HVM process execution
Intel® Xeon® processors
10 Gigabit Ethernet – Enhanced networking, SR-IOV
Performance for tightly-coupled workloads
c3.8xlarge: 32 vCPUs, 2.8 GHz Intel Xeon E5-2680v2 (Ivy Bridge), 60 GB RAM, 2 x 320 GB local SSD
r3.8xlarge: 32 vCPUs, 2.5 GHz Intel Xeon E5-2670v2 (Ivy Bridge), 244 GB RAM, 2 x 320 GB local SSD
i2.8xlarge: 32 vCPUs, 2.5 GHz Intel Xeon E5-2670v2 (Ivy Bridge), 244 GB RAM, 8 x 800 GB local SSD
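The three instance types above all have 32 vCPUs and differ mainly in memory and local SSD, so choosing among them can be reduced to a lookup. Below is a minimal Python sketch that uses only the specs listed on the slide. The selection rule (the first listed type that meets both the memory and local-SSD requirements) is our assumption, and prices are deliberately left out.

```python
from typing import NamedTuple, Optional


class InstanceSpec(NamedTuple):
    name: str
    vcpus: int
    ram_gb: int
    local_ssd_gb: int  # total local SSD capacity


# Specs as listed on the slide (2014-generation 8xlarge types).
SPECS = [
    InstanceSpec("c3.8xlarge", 32, 60, 2 * 320),
    InstanceSpec("r3.8xlarge", 32, 244, 2 * 320),
    InstanceSpec("i2.8xlarge", 32, 244, 8 * 800),
]


def pick_instance(ram_gb: int, local_ssd_gb: int = 0) -> Optional[str]:
    """Return the first listed type meeting both requirements, or None."""
    for spec in SPECS:
        if spec.ram_gb >= ram_gb and spec.local_ssd_gb >= local_ssd_gb:
            return spec.name
    return None
```

A job needing 128 GB of RAM maps to r3.8xlarge. The same job with 2 TB of local scratch space needs i2.8xlarge.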
Network placement groups
Cluster instances deployed in a placement group enjoy low latency and full-bisection 10 Gbps bandwidth
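Launching cluster instances into a placement group can be sketched with today's AWS SDK for Python (boto3), which postdates this 2014 talk. `create_placement_group` and `run_instances` are real EC2 client calls. The group name, AMI ID, and node count used below are hypothetical placeholders. The requests are built as plain dictionaries so they can be inspected without AWS credentials.

```python
from typing import Any, Dict, List


def placement_group_request(group_name: str) -> Dict[str, Any]:
    """Arguments for ec2.create_placement_group; 'cluster' packs nodes for low latency."""
    return {"GroupName": group_name, "Strategy": "cluster"}


def launch_request(group_name: str, ami_id: str, count: int,
                   instance_type: str = "c3.8xlarge") -> Dict[str, Any]:
    """Arguments for ec2.run_instances placing every node in the same group."""
    return {
        "ImageId": ami_id,  # hypothetical AMI supplied by the caller
        "InstanceType": instance_type,
        "MinCount": count,  # all-or-nothing: fail rather than start a partial cluster
        "MaxCount": count,
        "Placement": {"GroupName": group_name},
    }


def launch_cluster(group_name: str, ami_id: str, count: int) -> List[str]:
    """Create the group and launch the nodes (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builders work without boto3

    ec2 = boto3.client("ec2")
    ec2.create_placement_group(**placement_group_request(group_name))
    reservation = ec2.run_instances(**launch_request(group_name, ami_id, count))
    return [instance["InstanceId"] for instance in reservation["Instances"]]
```

Setting `MinCount` equal to `MaxCount` matters for tightly coupled jobs. An MPI run sized for N nodes is of little use if only some of them start inside the group.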
GPU compute instances
cg1.4xlarge: 33.5 EC2 Compute Units, 20 GB RAM, 2 x NVIDIA GPU (448 cores and 3 GB memory each)
g2.2xlarge: 26 EC2 Compute Units, 16 GB RAM, 1 x NVIDIA GPU (1536 cores, 4 GB memory)
G2 instances
Intel® Xeon® E5-2670
1 NVIDIA Kepler GK104 GPU
I/O Performance: Very High
CG1 instances
Intel® Xeon® X5570 processors
2 x NVIDIA Tesla “Fermi” M2050 GPUs
I/O Performance: Very High
Achieve more: Perform bigger, more complex jobs in much less time
Performance and power: Flexibility to choose platforms
Oil and Gas
Seismic Data Processing
Reservoir Simulations, Modeling
Geospatial applications
Predictive Maintenance
Manufacturing & Engineering
Computational Fluid Dynamics (CFD)
Finite Element Analysis (FEA)
Wind Simulation
Life Sciences
Genome Analysis
Molecular Modeling
Protein Docking
Media & Entertainment
Transcoding and Encoding
DRM, Encryption
Rendering
Energy & Scientific Computing
Computational Chemistry
High Energy Physics
Stochastic Modeling
Quantum Analysis
Energy Models
Climate Models
Financial
Monte Carlo Simulations
Wealth Management Simulations
Portfolio, Credit Risk Analytics
High Frequency Trading
Analytics
Customers are using AWS for more and more HPC workloads
ONS’s Journey to the Cloud: Who is ONS?
[Figure: Map of the BIPS (Brazilian Interconnected Power System)]
North: Isolated
Brasilia: Main CC
Recife: N/NE Branch
Rio de Janeiro: Southeast Branch
Florianopolis: South Branch
ONS’s Journey to the Cloud: The Challenge
Math Models
NEWAVE (Medium Term): Horizon: 5 years; Stage: month; more uncertainty and fewer details
DECOMP (Short Term): Horizon: 1 to 6 months; Stage: week; less uncertainty and more details
Updating of operating conditions
Use Hydro: OK, or Energy Deficit (load shedding)
Use Thermal to supplement Hydro: Spillage (waste), or OK
Decision: Total Cost = Immediate Cost + Future Cost
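The decision tree above is the classic hydrothermal dilemma. The right choice depends on future inflows that are unknown today, so the operator compares immediate cost plus expected future cost for each option. Below is a minimal Python sketch with two assumed inflow scenarios labeled "wet" and "dry". Every cost and probability is an illustrative assumption, not NEWAVE or DECOMP data, and those models solve a far larger version of this problem.

```python
from typing import Dict, Tuple

# Assumed probabilities of future inflow scenarios (illustrative only).
SCENARIOS: Dict[str, float] = {"wet": 0.6, "dry": 0.4}

# Assumed immediate cost of each decision today (arbitrary units).
IMMEDIATE_COST: Dict[str, float] = {"use_hydro": 0.0, "use_thermal": 50.0}

# Assumed future cost of each outcome, mirroring the tree:
#   use_hydro   + wet -> OK               | + dry -> energy deficit (load shedding)
#   use_thermal + wet -> spillage (waste) | + dry -> OK
FUTURE_COST: Dict[Tuple[str, str], float] = {
    ("use_hydro", "wet"): 0.0,
    ("use_hydro", "dry"): 300.0,
    ("use_thermal", "wet"): 20.0,
    ("use_thermal", "dry"): 0.0,
}


def total_cost(decision: str) -> float:
    """Total cost = immediate cost + probability-weighted future cost."""
    expected_future = sum(
        prob * FUTURE_COST[(decision, scenario)] for scenario, prob in SCENARIOS.items()
    )
    return IMMEDIATE_COST[decision] + expected_future


def best_decision() -> str:
    """Decision with the lowest total cost under the assumed scenarios."""
    return min(IMMEDIATE_COST, key=total_cost)
```

With these numbers, supplementing with thermal (62) beats using hydro alone (120). With a 90% chance of a wet future, hydro becomes the cheaper choice (30 vs. 68). This shows why the decision has to be revisited as operating conditions and forecasts are updated.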
Decomp
“We need more power”
Newave
Weather Forecast
Parallel Processing
“Sorry, we don’t have any power left”
“My job first… pleease!”
1. Elastic Environment
• Unlimited processing power
• Ideal for unexpected load
2. Low data transfer
• Input and output of small files
• Ideal for internet connection
3. Variable and right cost
• Pay per use
• Don’t need to buy huge servers
http://star.mit.edu/cluster/
[Figure: one Config used to launch Cluster 1, Cluster 2, and Cluster 3, running Work A, Work B, and Work C]
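A single StarCluster config file can define several cluster templates, so each piece of work gets its own cluster from one file. Below is a minimal Python sketch that writes such a file with `configparser`. The section and option names (`[global]`, `[key ...]`, `[cluster ...]`, `DEFAULT_TEMPLATE`, `KEY_LOCATION`, `KEYNAME`, `NODE_IMAGE_ID`, `NODE_INSTANCE_TYPE`, `CLUSTER_SIZE`, `EXTENDS`) follow StarCluster's sample config as we recall it and should be checked against the StarCluster documentation. The template names, key name, sizes, and AMI ID are hypothetical, and the AWS credentials section is omitted.

```python
import configparser
import io


def build_starcluster_config() -> configparser.ConfigParser:
    """One config file, one cluster template per workload (values hypothetical)."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep StarCluster's upper-case option names as written
    cfg["global"] = {"DEFAULT_TEMPLATE": "base"}
    cfg["key mykey"] = {"KEY_LOCATION": "~/.ssh/mykey.rsa"}
    # Shared settings that every workload template inherits.
    cfg["cluster base"] = {
        "KEYNAME": "mykey",
        "NODE_IMAGE_ID": "ami-00000000",
        "NODE_INSTANCE_TYPE": "c3.8xlarge",
        "CLUSTER_SIZE": "2",
    }
    # Work A, B, and C each get their own template and override only the size.
    for name, size in (("work-a", 4), ("work-b", 8), ("work-c", 16)):
        cfg[f"cluster {name}"] = {"EXTENDS": "base", "CLUSTER_SIZE": str(size)}
    return cfg


if __name__ == "__main__":
    out = io.StringIO()
    build_starcluster_config().write(out)
    print(out.getvalue())
```

Each cluster would then be started from the same file by naming its template, e.g. `starcluster start -c work-a cluster1`.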
ONS’s Journey to the Cloud: The Journey
Simply put... SHOW me the results...
Engineers would get lost in the AWS Management Console.
They needed an easy, task-oriented portal.
1. Self Service
2. Usage Control
3. Accountability
Portal columns: Cluster Name | # Nodes | On/Off
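Behind each portal row (Cluster Name, # Nodes, On/Off), a controller has to turn a button press into a StarCluster command, enforce a usage limit, and keep a usage record for accountability. Below is a minimal Python sketch. The class names, fields, and node limit are hypothetical. The commands are only built, not executed. The `starcluster start -c TEMPLATE -s SIZE NAME` and `starcluster terminate NAME` invocations reflect the StarCluster CLI as we understand it and should be verified against its help output.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClusterRow:
    """One portal row: Cluster Name, # Nodes, On/Off (hypothetical model)."""
    name: str
    nodes: int
    owner: str
    started_at: Optional[float] = None  # clock time in hours; None means Off


@dataclass
class Portal:
    max_nodes_per_cluster: int = 20  # usage control (hypothetical limit)
    node_hours: Dict[str, float] = field(default_factory=dict)  # accountability

    def turn_on(self, row: ClusterRow, now: float, template: str = "base") -> List[str]:
        """Validate the request, mark the row On, and return the start command."""
        if row.started_at is not None:
            raise RuntimeError(f"{row.name} is already on")
        if not 1 <= row.nodes <= self.max_nodes_per_cluster:
            raise ValueError(f"{row.name}: node count outside the allowed range")
        row.started_at = now
        return ["starcluster", "start", "-c", template, "-s", str(row.nodes), row.name]

    def turn_off(self, row: ClusterRow, now: float) -> List[str]:
        """Charge node-hours to the owner, mark the row Off, return the stop command."""
        if row.started_at is None:
            raise RuntimeError(f"{row.name} is already off")
        used = (now - row.started_at) * row.nodes
        self.node_hours[row.owner] = self.node_hours.get(row.owner, 0.0) + used
        row.started_at = None
        # terminate may ask for interactive confirmation; check the CLI's options.
        return ["starcluster", "terminate", row.name]
```

Keeping the command building separate from execution lets the portal log, review, or queue each request before anything touches AWS.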
[Figure: ONS HPC architecture on AWS]
Engineers use the portal from a browser (Chrome); the HPC Cluster Controller handles Control and Data
Virtual Private Cloud: Public Subnet and Private Subnet, Internet Gateway, site-to-site VPN (Internet/AWS)
MIT StarCluster HPC Cluster on Amazon EC2: Master Node (c3.8xlarge) and Computing Nodes Node1 to Node N (c3.8xlarge, 2 reserved)
Work volume on Amazon EBS (1 TB)
Links: 1 Gbps and 10 Gbps
ONS’s Journey to the Cloud: Lessons Learned
Infrastructure as code
HPC gets personal