
Copyright © 2010 Platform Computing Corporation. All Rights Reserved. TORONTO 10/25/2011

Cloud Computing for HPC

Extending Clusters to Clouds

Solution Briefing


Company Background


Platform Computing, Inc.

The leader in cluster, grid and cloud management software:

o 19 years of profitable growth

o 2,000 of the world’s most demanding client organizations

o 5,000,000 CPUs under management

o 500 professionals working across 13 global centers

o Strategic relationships with Cray, Dell, Fujitsu, HP, IBM, Intel, Microsoft, Red Hat and SAS

Platform Computing: Clusters, Grids, Clouds


Product Leadership

[Diagram: Platform Computing products spanning clusters, grids and clouds, across workload management and resource management. Products shown: Platform HPC, Platform MPI, Platform LSF, Platform Symphony, Platform ISF]

“We believe Platform ISF is perhaps the most complete internal cloud software solution we’ve seen so far,” Staten says.


Cloud Computing for HPC


HPC Market Trends

Key Trends

• Extending HPC to the cloud paradigm

• Management, analysis, and reporting

• Graphics processing units (GPUs)

• ISV application integration


HPC Challenges

Users need:

• More grid resources to run applications faster

• Flexible resources to support multiple applications

• Lower cost per unit of performance

IT needs:

• Contain costs without compromising grid size and performance

• Grid data security

• Better meet their users' needs


Cloud Computing Can Help

Optimally Making Use of the Cloud

• Send workload to the cloud

o When workload queues exceed tolerable thresholds for pending jobs

o When more capacity is required to meet SLAs

o To help small groups easily run their first cluster-enabled jobs

• Not every workload is suitable for the cloud. Avoid it for:

o Jobs whose required data transfer exceeds the acceptable wait time

o Data-intensive computing applications

o Periods when cloud resources are unreliable or unavailable

o Cases where privacy and/or security risks are too high
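
The suitability criteria on this slide can be sketched as a simple check. This is a minimal illustration, not Platform's implementation: the `Job` fields, the uplink speed, and the wait threshold are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job description; field names are illustrative only."""
    name: str
    input_gb: float   # data that must be copied to the cloud before the job runs
    sensitive: bool   # True if privacy/security rules forbid running off site


def transfer_seconds(input_gb: float, uplink_mbps: float) -> float:
    """Time to move the input data over a link of the given speed (megabits/s)."""
    return input_gb * 8 * 1000 / uplink_mbps


def suitable_for_cloud(job: Job, uplink_mbps: float, max_wait_s: float,
                       cloud_available: bool = True) -> bool:
    """Apply the slide's exclusion rules: availability, security, data transfer."""
    if not cloud_available:
        return False
    if job.sensitive:
        return False
    return transfer_seconds(job.input_gb, uplink_mbps) <= max_wait_s
```

For instance, a 10 GB input over an assumed 100 Mbit/s uplink needs 800 seconds to transfer, so the job qualifies only if the acceptable wait is at least that long.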


HPC Cloud - Differentiating the products

Cloud Connect

• LSF, HPC, Symphony plugin

• Schedules infrastructure in multi-tenant environments

• Provides for request fulfillment for sandboxing

• Creates external cloud connection from local clusters

• Accounts for infrastructure consumed

On Demand

• Customer self service

• Pay-per-use

• Amazon EC2 only (no local)

• Web access only

• HPC vertical specific (Life Sciences offering now available)

• Applications pre-installed

• Platform HPC only


Problem #1: Connecting to the cloud

Problem

• Customer workloads are spiky

• Provisioning for peak is highly wasteful (low utilization)

• Relying on desktops or existing servers wastes user time and can be very slow

Alternatives

• Take advantage of cloud resources by building their own solution

• Provision for peak (and live with the cost)

• Wait (wastes valuable engineering time, slow time to market)

Desired solution

• Provide a simple-to-use IaaS connection from an LSF cluster

• Provide a simple policy engine to decide which jobs burst and which wait for local resources
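
A policy engine of the kind described in the desired solution could look roughly like the sketch below. It is a hypothetical illustration only: the `PendingJob` fields, the `burstable` flag, and both thresholds are assumptions, and nothing here reflects the actual LSF policy interface.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PendingJob:
    job_id: int
    pending_s: float   # how long the job has been waiting for local resources
    burstable: bool    # hypothetical flag: the job is allowed to leave the site


def split_queue(jobs: List[PendingJob], max_pending_s: float,
                max_queue_len: int) -> Tuple[List[int], List[int]]:
    """Return (burst_ids, wait_ids) for the current pending queue."""
    queue_too_long = len(jobs) > max_queue_len
    burst: List[int] = []
    wait: List[int] = []
    for job in jobs:
        overdue = job.pending_s > max_pending_s
        # Burst only jobs that may leave the site, and only under pressure.
        if job.burstable and (overdue or queue_too_long):
            burst.append(job.job_id)
        else:
            wait.append(job.job_id)
    return burst, wait
```

The two triggers mirror the earlier slide: a pending-time threshold for individual jobs and a queue-length threshold for the cluster as a whole.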


Problem #2: Only use what is needed

Problem

• IaaS providers usually charge by the instance-hour. Short bursts are very cost-effective; long-running use is expensive.

• Workload varies all the time, so the cloud should be used only for peak demand

Alternatives

• Open Source products: OpenNebula, Nimbus

• Competitive products: AdaptiveCloud & Unicloud

• In-house ELIM (external load information manager) integrated with IaaS APIs

Desired solution

• LSF/HPC/Symphony plugin architecture

• Automated flexup/flexdown based on pending jobs and a time-to-live (TTL) for idle resources
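
The flexup/flexdown behaviour can be sketched as a single planning function. This is a minimal sketch under stated assumptions: the function name, the `jobs_per_instance` sizing rule, and the instance cap are hypothetical and are not taken from any Platform product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScaleDecision:
    launch: int            # number of instances to start (flex up)
    terminate: List[str]   # ids of idle instances to stop (flex down)


def plan_scaling(pending_jobs: int, jobs_per_instance: int,
                 idle_since: Dict[str, float], now: float,
                 idle_ttl_s: float, max_instances: int,
                 running_instances: int) -> ScaleDecision:
    """Flex up while jobs are pending; flex down instances idle past the TTL."""
    if pending_jobs > 0:
        needed = -(-pending_jobs // jobs_per_instance)   # ceiling division
        # Idle instances can absorb pending work before anything new is launched.
        needed = max(0, needed - len(idle_since))
        headroom = max(0, max_instances - running_instances)
        return ScaleDecision(launch=min(needed, headroom), terminate=[])
    expired = sorted(i for i, t in idle_since.items() if now - t >= idle_ttl_s)
    return ScaleDecision(launch=0, terminate=expired)
```

Because billing is per instance-hour, one plausible design choice is to set the idle TTL so that an instance is released shortly before its next billed hour begins; that tuning is left out of the sketch.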


Platform’s Product Line

Products that make up Platform’s cloud solutions

[Diagram: product stack with the labels Test/Dev Private Cloud, Java, Cloud Extension, Workstation, Cluster Extension, HPC Appl. Integrations, Advanced Analytics, EGO, GPU, Symphony, HPC, ISF, LSF, PCM, MPI]


What is Platform HPC?

The easiest and most complete HPC cluster solution

Complete Product

• Feature-rich workload management

• Unified web portal for access anywhere

• Heterogeneous cluster management

Integrated Application Support

• Easy to use job submission portal

• Customizable application templates

Certified

• Certified with server, storage & interconnect vendors

• Best customer support


What is Platform LSF?

The HPC Workload Management Standard

Complete

• Advanced, feature-rich workload scheduling

• Robust set of add-on features

• Integrated application support

Powerful

• Policy & resource-aware scheduling

• Resource consolidation for max performance

• Advanced self-management

Most Scalable

• Support for thousands of concurrent users & jobs

• Delivers a virtualized pool of shared resources to support multiple apps

• Flexible control to support multiple policy centers

Best TCO

• Optimal utilization reduces infrastructure costs

• Improves user productivity for faster time to solution

• Robust operational capabilities improve administrative productivity


What is Platform ISF?

Dynamic Resource Management

• Separate applications from infrastructure by creating an independent management platform to achieve resource sharing, vendor independence, and commodity computing

[Diagram: layered stack with the labels Application workloads; Private Cloud management platform; VM management; Provisioning; Server, Storage, Network; IaaS]


Platform’s Cloud Solutions for HPC

Cloud Bursting: Making it easy to extend to the cloud

1. Integrated Cluster with the Cloud (Platform LSF with VPN cluster management)

2. Multi-Cluster to the Cloud (Platform LSF with Platform MultiCluster)

3. Dynamic Cluster Extension to the Cloud (Platform LSF with Platform ISF)


Use Case #1: Integrated Cluster with the Cloud

[Diagram labels: User; Jobs; Internal Resources; Workload Manager (Platform LSF); Cloud provider; VPC connection; callouts numbered 1 to 6]

• End user submits more jobs

• The existing cluster nodes are already too busy

• A policy determines that the jobs can go to the cloud

• Platform LSF contacts the cloud provider to launch VMs

• Additional resources from Amazon automatically join the existing cluster


Use Case #2: Multi-Cluster to the Cloud

[Diagram labels: User; Jobs; Internal Resources; Cloud on-demand instances; Workload Manager (Platform LSF); MultiCluster orchestrator (MCO); callouts numbered 1 to 5; jobs that may go to the cloud are shown in red]

• LSF asks the cloud provider to create a new cluster of VMs (possibly CCI)

• MCO automatically forwards jobs to the new cluster based upon policies

• Transparent for end users


Use Case #3: Dynamic Cluster Extension to the Cloud

[Diagram labels: User; Jobs; Workload Manager (Platform LSF); Dynamic Resource Manager (Platform ISF); cloud provider; callouts numbered 1 to 7]

• User: "I need a cluster: 1 master and 3 compute nodes to test my new project"

• Workload manager requests resources from dynamic resource manager

• ISF determines that no internal resources are available and that, by policy, QA/TST can go to the cloud

• Cloud provider gets request from ISF to build new test cluster

• ISF gets master location from cloud provider

• ISF passes master location of new nodes to LSF

• Grid is created and then jobs get submitted
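
The placement decision in this use case (internal resources first, cloud only when policy allows) can be sketched as follows. The names `ClusterRequest` and `place_request`, and the "queue" fallback, are assumptions made for illustration; the slide does not describe how ISF represents requests or what happens when a group is not allowed to use the cloud.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class ClusterRequest:
    owner_group: str     # e.g. "QA/TST"
    masters: int
    compute_nodes: int


def place_request(req: ClusterRequest, free_internal_hosts: int,
                  cloud_allowed_groups: Set[str]) -> str:
    """Return "internal", "cloud", or "queue" for a cluster request."""
    hosts_needed = req.masters + req.compute_nodes
    if free_internal_hosts >= hosts_needed:
        return "internal"
    if req.owner_group in cloud_allowed_groups:
        return "cloud"
    # Assumed fallback: hold the request until internal hosts free up.
    return "queue"
```

With the slide's example of 1 master and 3 compute nodes, a QA/TST request goes to the cloud when fewer than 4 internal hosts are free and the policy lists QA/TST as cloud-eligible.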


Platform’s Cloud for HPC Key Benefits

Designed for Extensibility

• Customizable scheduling algorithm

• Open adapter architecture

• Flexible architectural options

On Demand Scalability

• Grow the grid when needed, shrink when not

• Contain capital costs, keep utilization high

Industry-Leading Support

• Large, worldwide development and support team

• Extensive partner ecosystem

• Nearly two decades of HPC experience


Problem #1: No HPC Infrastructure locally

Problem

• Smallest companies and consultants have no access to HPC

• Provisioning for HPC need is unwise / impractical

• Desktop is insufficient to service the workload

• Little or no IT/HPC expertise

Alternatives

• Very few

• Contract with Cycle Computing, Univa, others to build cloud infrastructure (expensive, long lead time)

Desired solution

• Self service, near instantaneous availability, security

• Provide pre-configured SaaS (open source apps) for anyone with an IaaS account

• Applications pre-installed, pre-configured, ready to execute


Platform OnDemand

[Diagram labels: Workstation; User; VPN or MultiCluster; LSF data; file data]

Phase I: Life Sciences

Phase IB: GRE


Roadmap – On Demand

[Table: availability matrix by phase. Columns: HPC Cluster (Linux, Win, Joined, Indep); L/S & Chem; CAE & IM; O/G GEO; GRE; DCC; FS; EDA. Rows: Phase I, Phase IB, Phase II, Phase III, Phase IIIB, Phase IV, with 4, 5, 6, 7, 9 and 10 cells marked respectively]

[Timeline labels: CYQ3 2011; CYQ4 2011; CYQ1 2012; CYQ2 2011; Phase I, IB, Phase II, Phase III, IIIB, Phase IV; Marketplace GA; Independent Offering]


EC2 Instance Sizes

Vertical            Master Host       HPC Cluster
Life Sci. / Chem    Standard Large    HPC
CAE / IM            Standard Large    HPC
Oil & Gas / GEO     Standard Large    HPC
GRE                 Standard Small    High-Mem 4XL
DCC                 TBD               TBD

EC2 Instance Options

Size              Memory (GB)   Cores   HDD (GB)    Price $/hour (Lin / Win)
Standard-Small    1.7           1       160         0.085 / 0.12
Standard-Large    7.5           2       850         0.34 / 0.48
Standard-XL       15            4       1690        0.68 / 0.96
Micro             0.613         1       EBS only    0.02 / 0.03
High-Mem XL       17.1          2       420         0.50 / 0.62
High-Mem 2XL      34.2          4       850         1.00 / 1.24
High-Mem 4XL      68.4          8       1690        2.00 / 2.48
High CPU-med      1.7           2       350         0.17 / 0.29
High CPU-XL       7             8       1690        0.68 / 1.16
HPC               23            8       1690        1.60 / -


Thank You