![Page 1: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/1.jpg)
HPC Bursting with Google Cloud Platform
Karan Bhatia, Google
Jonathon LeFaive, University of Michigan
Proprietary + Confidential
SEPTEMBER 28, 2016
![Page 3: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/3.jpg)
![Page 4: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/4.jpg)
Cloud Deployment Manager
● Repeatable
● Declarative
● Template Driven
![Page 5: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/5.jpg)
Cloud Deployment Manager
% gcloud deployment-manager deployments create mycluster --config condor-cluster.yaml
% gcloud compute ssh condor-submit
![Page 6: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/6.jpg)
Cores from Google
![Page 7: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/7.jpg)
![Page 8: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/8.jpg)
Not interested in Infrastructure? Just want to run jobs and get results...
![Page 9: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/9.jpg)
MIT Research w/ VMs
Products used: Google Compute Engine, Cloud Storage, Cloud Datastore
220,000 cores on preemptible VMs
2,250 32-core instances, 60 CPU-years of computation in a single afternoon
Answers in hours vs. months
580,000 cores
![Page 10: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/10.jpg)
Broad FireCloud: WDL, Cromwell, and Google Genomics
WDL: an external DSL used by computational biologists to express the analytical pipelines
Cromwell: a scalable, robust engine for executing WDL against pluggable backends including local, Docker, Grid Engine or …
Google Genomics Pipelines API: co-developed by Broad and Google Genomics, a scalable Docker-as-a-Service with data scheduling
![Page 11: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/11.jpg)
Pipeline definition
{
  "name": "samtools index",
  "description": "Run samtools index to generate a BAM index file",
  "inputParameters": [
    {
      "name": "inputFile",
      "localCopy": {
        "disk": "data",
        "path": "input.bam"
      }
    }
  ],
  "outputParameters": [
    {
      "name": "outputFile",
      "localCopy": {
        "disk": "data",
        "path": "output.bam.bai"
      }
    }
  ],
  "resources": {
    "minimumCpuCores": 1,
    "minimumRamGb": 1,
    "disks": [{
      "name": "data",
      "type": "PERSISTENT_HDD",
      "sizeGb": 200,
      "mountPoint": "/mnt/data"
    }]
  },
  "docker": {
    "imageName": "quay.io/cancercollaboratory/dockstore-tool-samtools-index",
    "cmd": "samtools index /mnt/data/input.bam /mnt/data/output.bam.bai"
  }
}
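How the definition ties its parameters to the Docker command can be checked with a short sketch. It assumes that each `localCopy` path is resolved relative to the `mountPoint` of the disk it names, which is what the paths in `cmd` suggest. The helper `container_path` is hypothetical and is not part of any Google API.

```python
import posixpath

# Mount point and localCopy entries copied from the pipeline definition above.
DISKS = {"data": "/mnt/data"}
LOCAL_COPIES = {
    "inputFile": {"disk": "data", "path": "input.bam"},
    "outputFile": {"disk": "data", "path": "output.bam.bai"},
}


def container_path(local_copy, disks=DISKS):
    """Hypothetical helper: join a disk's mount point with a localCopy path."""
    return posixpath.join(disks[local_copy["disk"]], local_copy["path"])


# Resolve every parameter to the path the container would see.
paths = {name: container_path(lc) for name, lc in LOCAL_COPIES.items()}

# Rebuild the command from the resolved paths; it should match "cmd" above.
cmd = "samtools index {inputFile} {outputFile}".format(**paths)
print(cmd)
```

If a path in `cmd` and the matching `localCopy` entry drift apart, the tool would read or write a file that is never copied in or out, so a check like this is cheap insurance before submitting a pipeline.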
![Page 12: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/12.jpg)
Create, run, monitor, and kill pipelines
Create
$ gcloud alpha genomics pipelines create --pipeline-json-file samtools_index.json
Created samtools index, id: PIPELINE-ID
Run
$ gcloud alpha genomics pipelines run --pipeline_id PIPELINE-ID \
    --logging gs://YOUR-BUCKET/YOUR-DIRECTORY/logs \
    --inputs inputFile=gs://genomics-public-data/gatk-examples/example1/NA12878_chr22.bam \
    --outputs outputFile=gs://YOUR-BUCKET/YOUR-DIRECTORY/output/NA12878_chr22.bam.bai
Running: operations/OPERATION-ID
Status
$ gcloud alpha genomics operations describe OPERATION-ID
Kill
$ gcloud alpha genomics operations cancel OPERATION-ID
![Page 13: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/13.jpg)
dsub (Google Genomics Pipelines)
![Page 14: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/14.jpg)
Task Tailored Resources
![Page 15: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/15.jpg)
Preemptible VM Instances
● What Preemptible VMs are
  ○ Up to 80% cheaper than regular VMs (~$0.01 per core hour)
  ○ Very easy to use: just flip one switch in the UI, API, or command line
  ○ Many of our biggest customers run huge clusters (10k+ cores) with great success and savings
● Things to keep in mind
  ○ Same great disk, OS images, and network
  ○ Google Compute Engine can preempt (i.e., shut down and take away) the VM with 30 seconds of notice
  ○ Maximum 24 hours of uptime
  ○ No SLAs or guarantees of any kind, but we historically see preemption rates of 5-15%
![Page 16: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/16.jpg)
Intel Skylake
● Significant “per core” performance improvements
● Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
  ○ 2x flops/second
● Accelerated IO with Intel® Omni-Path Architecture (Fabric)
● Integrated Intel® QuickAssist Technology (crypto & compression offload)
● Intel® Resource Director Technology (Intel® RDT) for Efficiency & TCO
![Page 17: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/17.jpg)
Hardware Accelerated
● Available Today: NVIDIA K80 GPU, P100s
● Coming Soon: Tensor Processing Unit (TPU)
● Custom ASIC built and optimized for TensorFlow
● Used in production at Google for over 16 months
● 7 years ahead of GPU performance per watt
![Page 18: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/18.jpg)
Tensor Processing Units (TPU) - 180 TFLOPS
![Page 19: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/19.jpg)
Where do I store data for my batch jobs?
● Cloud Storage (Object)
  ○ Good for: input & output, binary or object data (BLOB), long-term storage
  ○ Such as: images, videos, etc.
● Cloud Datastore (NoSQL)
  ○ Good for: hierarchical metadata
● Cloud SQL (Relational)
  ○ Good for: metadata & other related data
● Cloud Bigtable (NoSQL)
  ○ Good for: heavy read + write, events
● BigQuery (Warehouse)
  ○ Good for: data warehouse, analytics, dashboards
● Persistent Disk (GCE) (Block)
  ○ Good for: local VM file storage & work space
![Page 20: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/20.jpg)
Block storage
Reliable, high-performance block storage for any GCE VM instance
● Local SSD: Fastest, Attached, Ephemeral
  ○ Target scenarios: high-performance scratch space; frequently accessed data; excellent for scientific workloads, especially when combined with fast compute VMs like GPU instances
  ○ Features: ephemeral storage; highest performance ($0.218/GB); IOPS: 680k read / 360k write
  ○ Encryption; 3 TB (375 GB per partition, up to 8 partitions)
● Persistent Disk: SSD: Fast, Persistent, Durable, Remote
  ○ Target scenarios: latency-sensitive applications and files; high-performance database and enterprise applications; databases
  ○ Features: persistent storage; performance sensitive ($0.17/GB); IOPS: up to 40k read / 30k write
● Persistent Disk: HDD: Cheapest, Persistent, Durable, Remote
  ○ Target scenarios: large data processing workloads; latency-insensitive tasks with lots of data, such as genomics processing and video transcoding in GCE
  ○ Features: persistent storage; cost sensitive ($0.04/GB); IOPS: 3k read / 15k write
● Persistent Disk (SSD and HDD): Encryption, Snapshots; 64 TB; disk size sets performance (attach larger VMs for max SSD performance)
![Page 21: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/21.jpg)
GCS: Object/Blob store
● Google Cloud Storage is a scalable object storage service suitable for all kinds of unstructured data
● Cloud Storage vs. Persistent Disk:
  ○ Scales to exabytes
  ○ Accessible from anywhere; REST interface
  ○ Higher latency than PD
  ○ Write semantics include insert and overwrite file only
  ○ Offers versioning
  ○ Cheaper: put your data here until you need it
● Lots of guidelines on picking storage on our site
![Page 22: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/22.jpg)
Human Genetics on Google Cloud Platform
Jonathon LeFaive
University of Michigan
Center for Statistical Genetics
![Page 23: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/23.jpg)
Human Genetics, Sample Sizes over Time
![Page 24: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/24.jpg)
TOPMed Sequencing as of April 25, 2017
http://nhlbi.sph.umich.edu/
● 71,713 genomes
  ○ 70,497 pass quality checks (98.3%)
  ○ 823 flagged for low coverage (1.2%)
  ○ 393 fail quality checks (0.5%)
● Mean depth: 38.3x
● Genome covered: 98.7%
● Contamination: 0.27%
9 x 10^15 sequenced bases
![Page 25: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/25.jpg)
9 x 10^15 sequenced bases
On the same scale as the number of grains of sand in a small beach
Image: Wikimedia Commons
![Page 26: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/26.jpg)
What Does Alignment Mean?
Image: Wikimedia Commons
![Page 27: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/27.jpg)
Compute Workload
● 70,000 genomes
● ~20 GiB per genome
● 456 core hours per genome
![Page 28: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/28.jpg)
1.3 PiB
Highly Compressed Input Data
![Page 29: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/29.jpg)
3,644
Core Years
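Both headline totals follow from the per-genome workload figures. The sketch below reproduces them, assuming binary units (1 PiB = 1024^2 GiB) and a 365-day year; the variable names are illustrative only.

```python
# Per-genome figures from the Compute Workload slide.
GENOMES = 70_000
GIB_PER_GENOME = 20
CORE_HOURS_PER_GENOME = 456

# Assumptions: 1 PiB = 1024**2 GiB, and one year = 365 days.
GIB_PER_PIB = 1024 ** 2
HOURS_PER_YEAR = 24 * 365

total_pib = GENOMES * GIB_PER_GENOME / GIB_PER_PIB
core_years = GENOMES * CORE_HOURS_PER_GENOME / HOURS_PER_YEAR

print(f"{total_pib:.1f} PiB, {core_years:.0f} core years")
# prints "1.3 PiB, 3644 core years"
```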
![Page 30: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/30.jpg)
Local Cluster Limitations
● Not enough CPU cores
● NFS saturation
● Shared resources
![Page 31: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/31.jpg)
GCE Price Table
Machine Type    vCPUs   Memory    Price (USD/hr)   Preemptible price (USD/hr)
n1-standard-1   1       3.75 GB   $0.0475          $0.0100
n1-standard-2   2       7.5 GB    $0.0950          $0.0200
n1-standard-4   4       15 GB     $0.1900          $0.0400
n1-standard-8   8       30 GB     $0.3800          $0.0800
PVMs are approximately ⅕ the cost.
![Page 32: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/32.jpg)
$0.0375 x 456 hr = $17.10
Savings of $17 per genome.
![Page 33: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/33.jpg)
$17.10 x 70,000 genomes = $1.2m
Total savings of $1.2m
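The savings figures can be traced back to the price table. The sketch below assumes the $0.0375 rate is the difference between the regular and preemptible n1-standard-1 prices per core hour, and uses `Decimal` so the currency arithmetic stays exact.

```python
from decimal import Decimal

# n1-standard-1 prices in USD per hour, from the GCE price table.
standard = Decimal("0.0475")
preemptible = Decimal("0.0100")

# Workload figures from the Compute Workload slide.
core_hours_per_genome = 456
genomes = 70_000

# Assumption: the per-core-hour saving is regular price minus preemptible price.
saving_per_core_hour = standard - preemptible
saving_per_genome = saving_per_core_hour * core_hours_per_genome
total_saving = saving_per_genome * genomes

# Ratio behind "PVMs are approximately 1/5 the cost".
price_ratio = preemptible / standard

print(f"saving per core hour: ${saving_per_core_hour}")
print(f"saving per genome:    ${saving_per_genome:.2f}")
print(f"total saving:         ${total_saving:,.0f}")
print(f"preemptible/regular:  {price_ratio:.2f}")
```

The total comes to $1,197,000, which the slide rounds to $1.2m.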
![Page 34: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/34.jpg)
PVMs are Important ...
... so how do we utilize them?
● Change nothing
● Process checkpointing (CRIU, BLCR, etc.)
● Linux “suspend to disk” (swsusp)
● Proprietary solution
![Page 35: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/35.jpg)
Pipeline Flow
● Pre-process step outputs chunked files
● Main processing runs on chunked files using preemptible instances
● Chunks are merged before post-processing
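The flow above can be sketched as a toy simulation. This is not the presenters' pipeline: the `Preempted` exception, the uppercase "processing" step, and the 15% preemption rate (the top of the 5-15% range quoted earlier) are stand-ins chosen for illustration. The point is that a preemption costs only the chunk being processed, not the whole genome.

```python
import random


class Preempted(Exception):
    """Stand-in for a preemptible VM being taken away mid-task."""


def pre_process(records, chunk_size):
    """Split the input into fixed-size chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk, rng, preempt_rate=0.15):
    """Toy main-processing step that is sometimes preempted."""
    if rng.random() < preempt_rate:
        raise Preempted()
    return [record.upper() for record in chunk]


def run_pipeline(records, chunk_size=2, seed=0):
    rng = random.Random(seed)
    chunks = pre_process(records, chunk_size)
    results = [None] * len(chunks)
    pending = list(range(len(chunks)))
    while pending:
        index = pending.pop(0)
        try:
            results[index] = process_chunk(chunks[index], rng)
        except Preempted:
            pending.append(index)  # only this chunk is redone
    # Merge chunks in their original order before post-processing.
    return [record for chunk in results for record in chunk]


merged = run_pipeline(["a", "c", "g", "t", "n"])
print(merged)
```

Smaller chunks lose less work per preemption but add more merge and scheduling overhead, so the chunk size is a tuning choice.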
![Page 36: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/36.jpg)
Application Design
● Checks out job from master database
● Downloads input from bucket
● Processes input
● Uploads output to bucket
● Uploads log to bucket
● Checks in job to master database
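A minimal sketch of that worker loop follows, with in-memory dictionaries standing in for the master database and the storage bucket. The status names, object naming scheme, and `run_worker` function are hypothetical; a real worker would call a database client and a Cloud Storage client instead.

```python
def run_worker(jobs, bucket, process):
    """Run jobs until none are pending.

    jobs:   dict mapping job id -> status (stand-in for the master database)
    bucket: dict mapping object name -> bytes (stand-in for a storage bucket)
    """
    while True:
        # Check out the next pending job, if any.
        job_id = next((j for j, status in jobs.items() if status == "pending"), None)
        if job_id is None:
            return
        jobs[job_id] = "checked_out"

        # Download the input from the bucket.
        data = bucket[f"input/{job_id}"]
        log = [f"job {job_id}: read {len(data)} bytes"]

        # Process the input.
        output = process(data)

        # Upload the output, then the log.
        bucket[f"output/{job_id}"] = output
        log.append(f"job {job_id}: wrote {len(output)} bytes")
        bucket[f"logs/{job_id}"] = "\n".join(log).encode()

        # Check the job back in only after its output is stored.
        jobs[job_id] = "done"


jobs = {"chunk-1": "pending", "chunk-2": "pending"}
bucket = {"input/chunk-1": b"acgt", "input/chunk-2": b"tt"}
run_worker(jobs, bucket, bytes.upper)
print(jobs)
```

Because a job is marked done only after its output is uploaded, a worker that is preempted mid-job leaves the job checked out but unfinished, and it can be returned to the queue and run again.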
![Page 37: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/37.jpg)
Advice for New Users
● Use Google Cloud Storage
● Create custom network
● Exponential backoff retries
● Cater to the majority when provisioning resources
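For the exponential backoff advice, one common pattern is sketched below: double the wait after each failure, cap it, and add random jitter so many workers do not retry in lockstep. The function name, default delays, and attempt limit are assumptions for illustration, not values from the talk.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0, max_delay=32.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation(), retrying failures with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt, capped at max_delay, plus jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + rng() * base_delay)


# Demo: an upload that fails twice before succeeding. Delays are recorded
# instead of slept so the example runs instantly.
attempts = {"count": 0}
recorded_delays = []


def flaky_upload():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise IOError("transient error")
    return "uploaded"


result = retry_with_backoff(flaky_upload, sleep=recorded_delays.append)
print(result, len(recorded_delays))
```

In real code it is usually better to catch only the errors known to be transient (for example rate-limit or server-unavailable responses) rather than every exception.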
![Page 38: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/38.jpg)
What Catching Up Looks Like
![Page 39: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/39.jpg)
![Page 40: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/40.jpg)
![Page 41: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/41.jpg)
![Page 42: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/42.jpg)
![Page 43: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/43.jpg)
![Page 44: Google Cloud Platform HPC Bursting with · 11/7/2017 · Karan Bhatia, Google Jonathon LeFaive, University of Michigan Proprietary + Confidential SEPTEMBER 28, 2016. Cloud Deployment](https://reader033.vdocuments.us/reader033/viewer/2022052017/602f7df5fe07443dc66a1bcc/html5/thumbnails/44.jpg)
Thank You!
Acknowledgments: Chris Scheller, Hyun M. Kang, Goncalo Abecasis, GCP Team