HPC in the Cloud BOF Sara Jeanes, Boyd Wilson, Amy Cannon I2 & Omnibond (CloudyCluster) © 2016 Internet2

Post on 29-May-2020


Page 1: HPC in the Cloud BOF

HPC in the Cloud BOF

Sara Jeanes, Boyd Wilson, Amy Cannon
I2 & Omnibond (CloudyCluster)

© 2016 Internet2

Page 2

Things to Consider (Discussion Topics)
• People & Disciplines
• Workloads & Technology
• Funding & Integration
• Cloud vs. Datacenter Costs
• Break
• Optional Demo – Hands On

HPC in the Cloud

Page 3

HPC Support

• People + People
– Researchers
• HPC or Parallel computation opens doors, but a majority of researchers can’t do it alone
– CI Practitioners
• Disciplines
– Sciences, Engineering, Arts, Humanities, Social Sciences
– Machine Learning will extend this
– More will come
• Additional Pressure on resources (funding, support and infrastructure)

Page 4

Workloads

• Pleasingly Parallel (P2) / High Throughput Computing
• Message Passing Interface (MPI)
– Light Communication
– Heavy Communication
• Data Intensive Computing – Big Data (Hadoop Ecosystem +)
• Graphics Processing Unit (GPU)
• Field-programmable gate array (FPGA)
• Interactive Computation (Jupyter)
• Machine Learning
• Real Time

Page 5

Where to start with HPC in the Cloud?

Technology

Page 6

Here – Combine all the services yourself?

Technology

Page 7

Self-Service Elastic HPC

[Diagram labels: Scheduler (Torque, Slurm), CCQ, EFS, S3, Auto-Scaling Compute, OrangeFS HPC Parallel Storage, DDB, HPC Job, Login, WebDAV, Globus]

Create a fully operational HPC Cluster in minutes, complete with:
• Storage: OrangeFS on EBS, S3, EFS
• Compute: Job Driven Elastic Compute through CCQ
• Scheduler: Torque/Maui & SLURM with CCQ Meta-Scheduler
• HPC Libraries: Boost, CUDA Toolkit, Docker, FFTW, FLTK, GCC, Gengetopt, GRIB2, GSL, Hadoop, HDF5, ImageMagick, JasPer, NetCDF, NumPy, Octave, OpenCV, OpenMPI, PROJ, R, Rmpi, SciPy, SWIG, WGRIB, UDUNITS, .NET Core, Singularity, Queue, Picard and xrootd
• HPC Software: Ambertools, ANN, ATLAS, BLAS, Blast, Blender, Burrows-Wheeler Aligner, CESM, GROMACS, LAMMPS, NCAR, NCL, NCO, nwchem, OpenFoam, papi, paraview, Quantum Espresso, SAMtools, WRF, Galaxy, Vtk, Su2, Dakota, Gatk and Jupyter Notebook
• You can also install your own software in a custom AMI or in EFS
• All from an easy to use Web UI from mobile, tablet or desktop
• iRODS and XDMoD are targeted for future release.
• On average for 5% of the instance charges, no upfront costs

Technology

Page 8

Technology

Federated Web Authentication
• Shibboleth
• OAuth

Collaborate
• Have the ability to create collaborations
• Invite other collaborators to CloudyCluster
• Initially can share Google Drive Folders

Page 9

Technology

CCQ - Elastic HPC Dispatching

[Diagram labels: Scheduler (Torque, Slurm), DynamoDB, Login Instance, Public Subnet, Compute Groups, CCQ]

• Submit the job through CCQ
• CCQ holds the job, then determines and launches the instances needed
• CCQ sends the job to the scheduler when ready
• Scheduler launches the job normally
• If no jobs are in the queue for that instance type near the billing hour, instances are terminated
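The scale-down rule on this slide (terminate instances when no jobs are queued for that instance type near the billing hour) can be illustrated with a minimal Python sketch. This is not CCQ's actual code: the `Instance` class, the function names, and the 5-minute window are all hypothetical, and hourly billing measured from launch time is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Instance:
    """Hypothetical record of a running compute instance."""
    instance_type: str
    launched_at: datetime


def near_billing_hour(instance, now, window=timedelta(minutes=5)):
    """Return True if the instance is within `window` of starting a new billed hour.

    Assumes billing in whole hours counted from launch time.
    """
    elapsed = (now - instance.launched_at).total_seconds()
    seconds_into_hour = elapsed % 3600
    return 3600 - seconds_into_hour <= window.total_seconds()


def instances_to_terminate(instances, queued_types, now,
                           window=timedelta(minutes=5)):
    """Pick instances with no queued work for their type that are about to be billed again.

    `queued_types` is the set of instance types that still have jobs waiting.
    """
    return [
        inst for inst in instances
        if inst.instance_type not in queued_types
        and near_billing_hour(inst, now, window)
    ]
```

Waiting until the end of the paid hour keeps an idle instance available for any job that arrives in the meantime, since that hour has already been charged.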

Page 10

Remote Visualization
Flip of the switch enables secure VNC

Technology

Page 11

Serverless
• Launch Code based on Events

Machine Learning as a Service
• natural language understanding (NLU)
• text-to-speech (TTS)

• Amazon – Rekognition, Polly, Lex
• IBM - Watson
• Google - TensorFlow
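As an illustration of the Serverless bullet ("launch code based on events"), here is a minimal Python sketch in the style of an AWS Lambda function triggered by an S3 upload notification. The choice of S3 as the event source and the returned summary are assumptions made for the example; the slides do not name a specific trigger.

```python
def handler(event, context):
    """Collect (bucket, key) pairs from an S3 object-created notification.

    `event` follows the S3 notification layout: a "Records" list whose
    entries carry the bucket name and object key under "s3".
    """
    uploaded = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        uploaded.append((s3["bucket"]["name"], s3["object"]["key"]))

    # A real function would start work here, for example submitting a job
    # for each new object. This sketch only reports what it saw.
    return {"processed": len(uploaded), "objects": uploaded}
```

The function runs only when an event arrives, so there is no idle server to pay for between uploads.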

Technology

Page 12

Funding for Cloud HPC

• NIH Cloud Credits pilot, up to $6m to be released for current NIH Investigators. They get the credits directly from the provider. CloudyCluster is a conformant platform.

• NSF Big Data Sciences and Engineering program: $29m + $9m in Public Cloud Credits (from AWS, Azure and Google) to be given directly to researchers.

Page 13

CCQHub Project
HPC Job Routing

Project goals:
• Route HPC jobs on premise or to the cloud
• Stage data prior to launching the job.
• Return job results when complete.
• Scale cloud resources with CloudyCluster
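The project goals above (route, stage, run, return) can be sketched as a small Python pipeline. Everything here is hypothetical: the `Job` class, the "fits in free on-premise cores" routing rule, and the `stage`, `submit`, and `fetch_results` callables are stand-ins for illustration and do not describe CCQHub's real interface or policy.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job description."""
    name: str
    cores: int
    input_files: list = field(default_factory=list)


def choose_target(job, free_onprem_cores):
    """Route on premise when the job fits in the free cores, otherwise to the cloud.

    This rule is an assumption for the example, not a documented policy.
    """
    return "on-premise" if job.cores <= free_onprem_cores else "cloud"


def run_job(job, free_onprem_cores, stage, submit, fetch_results):
    """Route the job, stage its data, submit it, and return the results.

    `stage`, `submit` and `fetch_results` are caller-supplied callables.
    """
    target = choose_target(job, free_onprem_cores)
    stage(job.input_files, target)      # stage data before launching the job
    job_id = submit(job, target)        # hand the job to the chosen scheduler
    return target, fetch_results(job_id, target)
```

Passing the three steps in as callables keeps the routing decision separate from any particular scheduler or storage system.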

Integration

Page 14

New Features in V1.3

• Shared Home Directories in EFS
• Configurable EBS volumes per instance for OrangeFS
• Encrypted EBS volume options
• Enforce S3 object encryption
• MFA support
• Support for CCQHub
• New Libraries including Machine Learning Codes
– Mlpack, .Net Core, NuPIC, Octave, OpenCV, PICARD, Queue, Scikit-learn, TensorFlow and Theano.

[Diagram labels: Scheduler (Torque, Slurm), EFS, S3, Auto-Scaling Compute, OrangeFS HPC Parallel Storage, DDB, Login, WebDAV, Globus, Multi Factor Authentication, Encrypted EBS Volumes Option, Shared Home Directories, Configurable EBS volumes per Instance, Enforce S3 Object Encryption]

Page 15

COTS Servers + Power + Power Equip + Free DC Building

– $14,375 (5 year, 36 cores across 2 CPUs, 64 GB RAM, Rack, Cabling, PDU) = $0.0091 per core hr (no UPS/Gen, no electrons, no cooling, no sysadmin, no netadmin, no building, 5 yr warranty). ½ kW power per server
– UPS + Gen + Transfer Switch (Power Equipment) = $0.13 per kWh (100% utilized)
– Electron charges: $0.08 per kWh for the server, plus ½ of that for AC units ($0.04 per kWh) = $0.12 per kWh
– Total UPS/Gen/Electron/Cooling = $0.25 per kWh
– Cost per server at ½ kW (at 100% capacity of DC) = $0.125 per hr / 36 cores = $0.0035 per core hr power/cooling (P/C) (free building)
– Network: $2,500 per 10G port = $0.057 per hr / 36 cores = $0.0016 per core hr
– Total = $0.014 per core hr with P/C 100% utilized, servers 100% utilized (No System Admin, No Network Admin)
– Total = $0.021 per core hr if P/C is 50% utilized, servers 100% utilized (Free Building, No System or Network Admin)
– Total = $0.023 per core hr if P/C is 50% utilized and servers 85% utilized (benchmarks, upgrades, offline nodes) (Free Building, No System or Network Admin)
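The per-core-hour arithmetic for the fully utilized case can be reproduced with a short Python sketch. The function and parameter names are hypothetical. Amortizing both the server and the network port over 5 years of 8,760 hours each is an assumption inferred from the slide's $0.0091 and $0.057 figures. The sketch covers only the 100% utilization total, not the 50% and 85% scenarios.

```python
HOURS_5YR = 5 * 365 * 24  # 43,800 hours, ignoring leap days (an assumption)


def on_prem_cost_per_core_hour(server_cost=14_375.0, cores=36,
                               power_kw=0.5, power_rate_per_kwh=0.25,
                               port_cost=2_500.0):
    """Break down the on-premise cost per core hour at 100% utilization.

    Defaults are the figures from the slide; names are illustrative only.
    """
    server = server_cost / HOURS_5YR / cores              # hardware amortization
    power_cooling = power_kw * power_rate_per_kwh / cores  # UPS/gen/electricity/cooling
    network = port_cost / HOURS_5YR / cores                # 10G port amortization
    return {
        "server": server,
        "power_cooling": power_cooling,
        "network": network,
        "total": server + power_cooling + network,
    }


costs = on_prem_cost_per_core_hour()
for name, value in costs.items():
    print(f"{name:>13}: ${value:.4f} per core hr")
```

Changing `power_rate_per_kwh` or `cores` shows how sensitive the total is to local electricity prices and server density.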

Cost

Page 16

Cost

• Cloud (AWS)
– C4.8xlarge – 36 core, 60 GB RAM
– On Demand – $1.591 per instance hr, $0.0442 per core hr
• Can get the latest CPU/GPU as soon as it's available
– Reserved 3 yr – $0.852 per instance hr, $0.0237 per core hr
• Buying like HW
– Current Spot in Oregon – $0.60 per instance hr, $0.0167 per core hr
– Current Spot in Ohio – $0.39 per instance hr, $0.0108 per core hr
• Spot can be interrupted, so checkpoint or use small jobs with restart capability.
– Does not include any quantity discounts, etc. (Netflix doesn’t pay retail)
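The per-core figures on this slide are the hourly instance price divided by the 36 cores of a C4.8xlarge; for example, $1.591 / 36 is about $0.0442. A minimal Python sketch of that division follows, using the prices quoted on the slide. The dictionary keys and variable names are illustrative only.

```python
CORES = 36  # cores in a C4.8xlarge, as stated on the slide

# Hourly instance prices quoted on the slide (2016 figures).
hourly_price = {
    "On Demand": 1.591,
    "Reserved 3 yr": 0.852,
    "Spot (Oregon)": 0.60,
    "Spot (Ohio)": 0.39,
}

# Price per core hour for each purchasing option.
per_core_hour = {name: price / CORES for name, price in hourly_price.items()}

for name, value in per_core_hour.items():
    print(f"{name:>14}: ${value:.4f} per core hr")
```

Putting these next to the on-premise totals from the previous slide gives the cloud vs. datacenter comparison the session is built around.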

Page 17

HPC Center off Web

Includes some sort of labor

Page 18

Electron Costs per State (retail)

Page 19

Thank you…

Optional Demo / Hands on (with Free AWS Credit)