VMworld 2013: How UC San Francisco Delivered ‘Science as a Service’ with Private Cloud for HPC


DESCRIPTION

VMworld 2013 session by Brad Dispensa, University of California, and Andrew Nelson, VMware. Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare

TRANSCRIPT

Page 1

How UC San Francisco Delivered ‘Science as a Service’ with Private Cloud for HPC

Brad Dispensa, University of California

Andrew Nelson, VMware

VSVC5272

#VSVC5272

Page 2

Agenda

Who we are

Motivation

Project

Architecture

Next steps

Page 3

Who We Are

Andrew Nelson

Staff Systems Engineer

• VMware

• VCDX#33

Page 4

Who We Are

Brad Dispensa

Director of IT/IS, UCSF

• Department of Anesthesia

• Institute for Human Genetics

Page 5

Image: http://www.flickr.com/photos/43021516@N06/7467742818/

Page 6

Page 7

Page 8

What This Is Not…

We are not launching a new product

This is a collaboration to determine the use cases and limitations of running workloads virtually that have historically run on HPC clusters

We will share what we find so you can make your own choices

Page 9

Motivation

A need to deploy HPC as a service

• *Where the use case makes sense

Where could it make sense?

• Jobs that are not dependent on saturating all I/O

• Jobs that don’t require all available resources

• Jobs that require bleeding edge packages

• Users want to run as root (Really?!)

• User wants to run an unsupported OS

• Development / QA

• Job integrity more important than run time

• Funding issue (Grant based)

Page 10

Page 11

Bias?

Why VM people think they wouldn’t do this

• “You will saturate my servers and cause slowdowns in production systems”

• “I don’t have HPC Fabric”

• “VM sprawl would take over my datacenter”

• “How would I begin to scope for a use case that does not fit the usual 20% utilization model?”

Why HPC people think they wouldn’t do this

• “It’s not high performance”

• “It will be slow and unwieldy”

• “My app has to be run on dedicated hardware”

• “Latency introduced by the hypervisor”

• “That won’t work for my weird use case”

Page 12

Motivation

Here’s the thing….

• Most life science jobs are single threaded

• Most “programmers” are grad students

• HPC in Life Sciences is not the same as HPC for oil and gas or other engineering users.

• We are not “critical”; it’s research, so five nines is not our deal.

• When do most long runs start? Friday. It’s nice to use hardware that was just going to idle all weekend.

• How is this any different from any other discussion in HPC?

• We often debate which file system, chipset, or controller is better.

• It’s never one size fits all.

• We spend more time sizing rather than just running it.

• The hardware should really be agnostic.

• Should we buy …. or ….

http://frabz.com/meme-generator/caption/10406-Willy-Wonka/

Page 13

Run Any Software Stacks

[Diagram: VMs running different guest operating systems (App A on OS A, App B on OS B) side by side on a virtualization layer over shared hardware]

Support groups with disparate software requirements

Including root access

Page 14

Separate Workloads

[Diagram: workloads (App A on OS A, App B on OS B) isolated in separate VMs across virtualized hosts]

Secure multi-tenancy

Fault isolation

…and sometimes performance


Page 15

Use Resources More Efficiently

[Diagram: VMs running Apps A, B, and C on different guest OSes consolidated across virtualized hosts]

Avoid killing or pausing jobs

Increase overall throughput

Page 16

Workload Agility

[Diagram: applications tied to a single operating system and physical host versus applications in VMs that can move across virtualized hosts]

Page 17

Multi-tenancy with Resource Guarantees

Define policies to manage resource sharing between groups

[Diagram: VMs from multiple groups (Apps A, B, and C on various guest OSes) sharing virtualized hosts under those policies]

Page 18

Protect Applications from Hardware Failures

[Diagram: a VM (App A) restarting on another virtualized host after a hardware failure]

Reactive Fault Tolerance: “Fail and Recover”


Page 19

Protect Applications from Hardware Failures

[Diagram: MPI rank VMs (MPI-0, MPI-1, MPI-2) moving to healthy virtualized hosts ahead of a hardware failure]

Proactive Fault Tolerance: “Move and Continue”

Page 20

Elastic Application Layers

[Diagram: application VMs (Apps A, B, and C on various guest OSes) scaled elastically across virtualized hosts]

Ability to decouple compute and data and size each appropriately

Multi-threading vs multi-VMs

[Diagram: separate Compute and Data VMs sized independently]

Page 21

http://siliconangle.com/blog/2011/05/24/the-basic-qos-myth-myth-3-of-the-good-enough-network/fuzzy-tv/

Page 22

Agenda

Who we are

Motivation

Project

Architecture

Next steps

Page 23

http://www.examiner.com/article/best-barbecue-books

Page 24

Project Overview

Collaborative research effort between UCSF and the VMware Field and CTO Office

• Additional participation by nVidia, EMC/Isilon and DDN.

Prove out the value of a private cloud solution for HPC Life Sciences workloads

Stand up a small private cloud on customer-supplied hardware

• Dell M1000E Blade Chassis

• Dell M610 Blades

• FDR-IB

• EqualLogic VMDK storage

• DDN GPFS store

• EMC/Isilon store (NFS)

Testing to include an array of Life Sciences applications important to UCSF, including some testing of the use of VMware VDI to move science desktop workloads into the private cloud

Page 25

Project Overview

Desktop visualization

• Could we also replace expensive desktops with thin-client-like devices for users who need to visualize complex imaging datasets or 3D instrument datasets?

Page 26

Page 27

Project Overview – Success Factors

It doesn’t have to be as fast as bare metal, but it can’t be significantly slower

The end product must allow a user to self-provision a host from a vetted list of options (see the sketch after this list)

• “I want 10 Ubuntu machines that I can run as root with X packages installed”

The environment must be agile, allowing different workloads to cohabit a single hardware environment

• i.e., you can run an R workload on the same blade that is running a desktop visualization job

Whatever you could do on metal, you have to be able to do in virtualization (*)

Users must be fully sandboxed to prevent “bad stuff” from leaving their workloads
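To make the self-provisioning requirement concrete, here is a minimal sketch of what a scripted request against a self-service catalog could look like. The endpoint URL, catalog item name, and payload fields are hypothetical placeholders invented for illustration; they are not the vCloud Automation Center API, only the idea of requesting machines from a vetted list of options.

```python
import json
import urllib.request

# Hypothetical self-service catalog endpoint (placeholder, not the real vCAC API).
CATALOG_URL = "https://cloud.example.ucsf.edu/api/catalog/requests"

def request_machines(item, count, packages, run_as_root=False, token="REPLACE_ME"):
    """Submit a request for `count` machines built from a vetted catalog item."""
    payload = {
        "catalogItem": item,        # e.g. a pre-approved "Ubuntu compute node" blueprint
        "count": count,             # "I want 10 Ubuntu machines..."
        "packages": packages,       # "...with X packages installed"
        "rootAccess": run_as_root,  # "...that I can run as root"
    }
    req = urllib.request.Request(
        CATALOG_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    result = request_machines("ubuntu-compute", count=10,
                              packages=["bowtie", "R"], run_as_root=True)
    print(result)
```

The point of the sketch is that the user only ever picks from vetted blueprints; the sandboxing and resource policies stay on the provider side.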

Page 28

Agenda

Who we are

Motivation

Project

Architecture

Next steps

Page 29

VMware vCAC

[Diagram: Secure Private Cloud for HPC. VMware vCloud Automation Center (vCAC) sits between users (Research Group 1 through Research Group m) and IT, providing user portals, catalogs, security (VMware vCNS), and programmatic control and integrations; it provisions Research Cluster 1 through Research Cluster n, each managed by VMware vCenter Server running VMware vSphere, and can extend to public clouds]

Page 30

Architecture

The components are “off the shelf”

• Standard Dell servers

• Mellanox FDR switches

• Isilon and DDN are tuned as normal

No custom workflows

• We tune the nodes the same way you would normally in your virtual and HPC environments.

There is no “next-gen” black-box appliance used; what we have, you can have.
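As a small illustration of “tune the nodes the same way you would normally,” the sketch below applies one tuning profile to both a bare-metal node and a virtual one. The parameter names and the apply_setting() helper are invented placeholders, not actual vSphere or OS options; only the one-profile-for-both idea comes from the slides.

```python
# Illustrative only: parameter names and apply_setting() are hypothetical.
TUNING_PROFILE = {
    "cpu_reservation_pct": 100,      # reserve full CPU capacity for HPC workloads
    "memory_reservation_pct": 100,   # avoid ballooning/swapping under load
    "numa_affinity": "per-socket",   # keep work within a NUMA node where possible
    "large_receive_offload": True,
}

def apply_setting(node, key, value):
    """Placeholder: push a single setting to a node's management interface."""
    print(f"{node}: set {key} = {value}")

def tune(nodes, profile=TUNING_PROFILE):
    for node in nodes:
        for key, value in profile.items():
            apply_setting(node, key, value)

if __name__ == "__main__":
    # The same profile is applied whether the node is physical or a VM.
    tune(["hpc-bare-metal-01", "hpc-vm-01"])
```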

Page 31

Architecture

Why Blades?

• It’s what we have...

• The chassis allows us to isolate more easily for initial testing, and blades are also commonly deployed both in dense virtualization environments and in HPC.

Page 32

Agenda

Who we are

Motivation

Project

Architecture

Next steps

Page 33

Next Steps

The results will report performance comparisons between bare-metal and virtualized hosts for a set of Life Sciences applications important to UCSF:

• BLAST – running a synthetic data set

• Bowtie

• Affymetrix and Illumina genomics pipelines (both with vendor-supplied test datasets)

• R – with a stem-cell dataset (likely) or a hypertension dataset (possibly)

• Desktop virtualization

The results will also report on the use of VDI to move current workstation science applications onto the proof-of-concept server cluster

An important part of this will be an assessment of the hypothesized value props: self-provisioning, multi-tenancy, etc.
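As a rough illustration of the bare-metal versus virtualized comparison, here is a minimal timing-harness sketch. The commands, index names, and dataset paths are hypothetical stand-ins; the slides do not specify the actual UCSF benchmark methodology.

```python
import subprocess
import time

# Hypothetical benchmark commands; real runs would use the UCSF datasets.
BENCHMARKS = {
    "blast":  ["blastn", "-query", "synthetic.fa", "-db", "nt_subset"],
    "bowtie": ["bowtie", "reads_index", "reads.fq"],
}

def time_command(cmd):
    """Run one benchmark command and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

def relative_slowdown(bare_metal_secs, virtual_secs):
    """Virtualized runtime as a multiple of the bare-metal runtime."""
    return virtual_secs / bare_metal_secs

if __name__ == "__main__":
    for name, cmd in BENCHMARKS.items():
        secs = time_command(cmd)
        print(f"{name}: {secs:.1f} s on this host")
    # After running the same harness on both environments:
    # relative_slowdown(1200.0, 1260.0) -> 1.05, i.e. 5% slower virtualized
```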

Page 34

Next Steps

Complete initial benchmarking

• Capture core metrics on the physical hardware and then capture the same data running as a virtualized host.

Does it work?

• What happens when we start to scale it upward: does performance stay linear?
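One simple way to quantify “does performance stay linear” is scaling efficiency: speedup over the single-node run divided by node count, where 1.0 means perfectly linear scaling. A minimal sketch, with made-up timings for illustration only:

```python
def scaling_efficiency(runtimes_by_nodes):
    """runtimes_by_nodes: {node_count: wall_clock_seconds}. 1.0 = perfectly linear."""
    base = runtimes_by_nodes[1]
    return {
        n: (base / t) / n   # speedup over 1 node, normalized by node count
        for n, t in sorted(runtimes_by_nodes.items())
    }

if __name__ == "__main__":
    # Illustrative timings in seconds, not measured results.
    timings = {1: 3600, 2: 1850, 4: 960, 8: 520}
    for nodes, eff in scaling_efficiency(timings).items():
        print(f"{nodes} node(s): efficiency {eff:.2f}")
```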

Page 35

Page 36

Conclusions and Future Directions

http://commons.wikimedia.org/wiki/File:20_questions_1954.JPG

Page 37

THANK YOU

Page 38

Page 39

How UC San Francisco Delivered ‘Science as a Service’ with Private Cloud for HPC

Brad Dispensa, University of California

Andrew Nelson, VMware

VSVC5272

#VSVC5272