Designing and managing scalable HPC Infrastructure to support Public Health in the Genomic Era
Francesco Giannoccaro
A walk-through of the design and implementation phases of a scalable HPC/HTC infrastructure, built to meet the increasing demand for advanced computing platforms driven by the speed at which public health science is evolving and the rate at which medical microbiology is modernising.
Summary
Slide 2 of 14
● Public Health England (PHE) is an executive agency of the Department of Health in the UK. Its main mission is "to protect and improve the nation's health and well-being, and reduce health inequalities"
● PHE is structured in directorates and corporate programs and has a network of specialist microbiology laboratories across England
● PHE services include microbiology services, a regional microbiology network, field epidemiology, and surveillance and control
● Demand for computational power has increased, primarily due to the implementation of whole genome sequencing (WGS) as part of PHE's modernisation
● Geographically distributed IT infrastructure, mainly located on two sites: Colindale (north London) and Porton Down (south-west England)
Background
Slide 3 of 14
High performance and high throughput computing (HPC/HTC) have been used in PHE mainly by three departments:
● Emergency Response uses HPC to better understand, ahead of time, the (eco-)epidemiological, social and behavioural drivers that exacerbate the risks posed by infectious disease threats, including bioterrorism;
● Statistics Modeling and Economics uses HPC to provide real-time models that predict expected pandemic disease dynamics, and to produce data contributing to the body of knowledge and scientific publications informing national policy, including national vaccination policy and the control of antimicrobial resistance;
● Infectious Disease Informatics (Bioinformatics) uses HPC to support whole genome sequencing (WGS) analysis for diagnostics and surveillance of infectious diseases, with hundreds of biological samples received per week from patients with unidentified and potentially aggressive pathogens (bacteria and viruses) that need urgent identification.
Overview of pre-existing HPC systems
Slide 4 of 14
Pre-existing HPC infrastructure
High Performance Computing system used by the Modeling and Economics department (located in Colindale)
High Performance Computing system used by the Bioinformatics unit (located in Colindale)
High Performance Computing system used by the Emergency Response department (located in Porton)
Linux cluster based on RHEL
● Resource manager: Grid Engine
● Provisioning system: xCAT

2 x Management servers IBM x3650
● 2 x Intel X5450, 32 GB of RAM
● 2 x 10 Gb Ethernet
● 6 x 72 GB SAS

16 x HP Blade BL460c Gen8 compute nodes
● 2 x Intel E5-2680, 128 GB of RAM
● 2 x 10 Gb Ethernet
● 2 x 900 GB 6G SAS 10K

10 x IBM Flex System x240 compute nodes
● 2 x Intel E5-2650 v2, 128 GB of RAM
● 1 x CN4022 2-port 10 Gb
● 2 x 900 GB 10K SAS HDD
Lustre filesystem
Linux cluster based on Bull/RHEL
● Resource manager: Slurm
● Provisioning system: Bull

2 x Bull R423-E3 management servers
● 2 x Intel E5-2620, 32 GB of RAM
● 2 x 500 GB SATA3
● 2 x InfiniBand ConnectX-2 QDR IB
● 2 x Gb Ethernet

72 x Bull B500 compute nodes:
● 2 x Intel X5660, 48 GB of RAM
● 1 x 128 GB SATA2 SSD
● InfiniBand adapter
Lustre filesystem
Linux cluster based on Bull/RHEL
● Resource manager: Slurm
● Provisioning system: Bull

28 x Bull B510 compute nodes:
● 2 x Intel E5-2620, 32 GB of RAM
● 1 x 256 GB SSD
● 2 x 1 Gb Ethernet
● 1 x InfiniBand QDR

8 x Bull B500 compute nodes:
● 2 x Intel L5530, 24 GB of RAM
● 1 x 256 GB SSD
● 2 x 1 Gb Ethernet
● 1 x InfiniBand QDR

1 x Bull GPU server:
● 2 x Intel E5620, 20 GB of RAM
● 2 x Nvidia K20c GPU cards
● 1 x InfiniBand QDR, 2 x 1 Gb Ethernet
● 1 TB SAS disk
Lustre filesystem
Slide 5 of 14
Lustre HPS storage tier
HPS system DDN EXAScaler SFA10K
● Lustre filesystem (v. 2.5.41)
● 2 x DDN SFA10K controllers
● 3 x DDN SS7000 enclosures (300 TB usable)
● 4 x Lustre Object Storage Servers
● 2 x Metadata Servers, 1 x MDT on DDN EF3015
● Host interfaces: 4 x 10 GbE SFP+ (per controller)
● 2.5 GB/s read and write performance

HPS system DDN ES7K Lustre
● Lustre version 2.5.42.8 (EXAScaler 2.3.1)
● 40 x 4 TB NL-SAS disks (145 TB raw, 125 TiB usable capacity for data)
● 6 x 300 GB SAS (metadata)
● 2 virtual OSS and 2 virtual MDS
● 3 GB/s read and write performance
● Host interfaces: 4 x InfiniBand FDR or 40 GbE QSFP
● Dual LNETs configured
● LNET1 on InfiniBand FDR, with IPoIB configured in datagram mode
● LNET2 on 40 Gb/s Ethernet (QSFP), with jumbo frames enabled
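A dual-LNET layout like this is typically declared through the lnet kernel module options on the Lustre clients and servers. A minimal sketch follows; the interface names ib0 and eth2 are assumptions, not the actual PHE configuration:

```
# /etc/modprobe.d/lustre.conf
# LNET1 on InfiniBand FDR (o2ib), LNET2 on 40GbE (tcp)
options lnet networks="o2ib0(ib0),tcp1(eth2)"
```

Clients on either fabric then mount the filesystem via the LNET they can reach, e.g. mgs-node@o2ib0 or mgs-node@tcp1.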
Slide 6 of 14
iRODS archive storage tier
[Architecture diagram] Three sites connected over the PHE WAN, each running an iRODS server (with SSL SAN certificates) in front of a DDN WOS object storage system:
● PHE/Colindale: HPC cluster with DDN EXAScaler/Lustre high-performance storage, plus sequencing machines
● PHE/Porton Down: computing and storage system for simple analysis, plus sequencing machines
● PHE/Birmingham: iRODS server and DDN WOS object storage
Slide 7 of 14
Metalnx – iRODS Administrative and Metadata Management WebUI
PHE Cloud Platform goals & objectives
Slide 8 of 14
Applying a holistic approach to increase both capacity and capability of the IT infrastructure, implementing cloud technologies able to:
● provide HPC-on-demand services, offering the ability to expand any of the existing clusters by deploying additional compute resources when needed, and to deploy virtual HPC clusters (e.g. ElastiCluster, Senlin);
● improve orchestration and automation of the existing HPC environments and the new software-defined infrastructure by implementing an end-to-end API solution to support centralised provisioning, configuration and management operations;
● provide IaaS capability to host big data analytics platforms (e.g. Cassandra, Hadoop, Apache Spark), to leverage the value of a number of existing PHE datasets;
● increase resilience and disaster recovery capability by implementing geographically distributed cloud storage tiers and an archive storage tier across multiple sites/regions;
● reduce and limit vendor lock-in constraints by using open-source, enterprise-class technologies;
● allow PHE to scale its computational capacity, if required, above and beyond the on-premises resources by leveraging commercial clouds.
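The virtual-HPC-cluster option is driven by an INI-style configuration in the case of ElastiCluster. A minimal sketch for a Slurm cluster on an OpenStack cloud might look like the following; the cluster name, credentials, image and flavour values are all placeholder assumptions:

```ini
# Hypothetical ElastiCluster configuration: small Slurm cluster on OpenStack
[cloud/phe-openstack]
provider=openstack
auth_url=https://keystone.example.internal:5000/v2.0
username=<username>
password=<password>
project_name=bioinformatics

[login/centos]
image_user=centos
user_key_name=elasticluster
user_key_private=~/.ssh/id_rsa
user_key_public=~/.ssh/id_rsa.pub

[setup/slurm]
provider=ansible
frontend_groups=slurm_master
compute_groups=slurm_worker

[cluster/vhpc]
cloud=phe-openstack
login=centos
setup=slurm
image_id=<glance-image-uuid>
flavor=m1.large
security_group=default
frontend_nodes=1
compute_nodes=8
ssh_to=frontend
```

With a configuration along these lines, `elasticluster start vhpc` brings the cluster up, and the resize subcommand can grow the compute partition on demand.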
Infrastructure capacity     Cores   RAM      HPS     Archive Storage
Initial capacity            1.5k    6.4 TB   390 TB  500 TB
Post-deployment capacity    2.9k    16.4 TB  500 TB  500 TB
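As a quick sanity check, the headline growth in the table above can be computed directly (figures taken from the table, with "1.5k"/"2.9k" cores expanded to absolute counts):

```python
# Infrastructure capacity before and after the cloud deployment,
# taken from the capacity table (RAM and storage in TB).
initial = {"cores": 1500, "ram_tb": 6.4, "hps_tb": 390, "archive_tb": 500}
post    = {"cores": 2900, "ram_tb": 16.4, "hps_tb": 500, "archive_tb": 500}

# Added capacity per resource class.
growth = {k: post[k] - initial[k] for k in initial}
print(growth)
print(f"core count grew by {growth['cores'] / initial['cores']:.0%}")
```

The deployment roughly doubles the core count (+93%) and more than doubles the aggregate RAM, while the archive tier stays at 500 TB.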
Overview of the new architecture
Slide 9 of 14
OpenStack deployment at each cloud region
Slide 10 of 14
OpenStack Undercloud provisioning server:
● Lenovo x3550 M5 with 1 x Intel E5-2620 v3
● 64 GB of RAM
● 2 x 1 TB 10K 6 Gbps SAS HDD
● 1 x ConnectX-3 Pro 2 x 40 GbE/FDR VPI adapter

3 x Controller nodes, Lenovo x3550 M5:
● 2 x Intel E5-2620 v3, 128 GB of RAM
● 2 x 240 GB SATA SSD, 2 x 480 GB SATA SSD
● 1 x ConnectX-3 Pro ML2 2 x 40 GbE/FDR VPI adapter

31 x Compute nodes (*), Lenovo nx360 M5:
● 2 x Intel E5-2640 v3 (16 cores), 128 GB of RAM
● 2 x 120 GB SATA SSD
● 1 x ConnectX-3 Pro ML2 2 x 40 GbE/FDR VPI adapter

1 x Compute node for large cloud instances, Lenovo x3950 X6 8U:
● 8 x Intel E7-8860 v3 (128 cores), 1 TB of RAM
● 2 x 120 GB SATA SSD
● 1 x ConnectX-3 Pro ML2 2 x 40 GbE/FDR VPI adapter

1 x GPU node, Lenovo nx360 M5:
● 2 x Intel E5-2640 v3, 128 GB of RAM
● 2 x 120 GB SATA SSD
● 1 x ConnectX-3 Pro ML2 2 x 40 GbE/FDR VPI adapter
● 1 x Nvidia Tesla K80

The infrastructure is designed to be easily expanded.

(*) Including additional nodes that will be installed by end of March 2017
Network topology at each cloud region
Slide 11 of 14
2 x Mellanox SX1710 spine switches
3 x Mellanox SX1710 leaf switches

The Mellanox switches and the network cards installed on all OpenStack servers are configured with support for SR-IOV.

The Lustre HPS is presented as an external network to Neutron, with a range of floating IP addresses that can be used by tenants.

Note that since the Liberty release Neutron supports RBAC, which solves the inability to share certain network resources with only a subset of projects/tenants. Neutron has supported shared resources in the past, but until now it was all-or-nothing: if a network is marked as shared, it is shared with all tenants.

Access can now be tuned through RBAC on the basis of these features:
● regular port creation permissions on networks (since Liberty)
● binding QoS policies permissions to networks or ports (since Mitaka)
● attaching router gateways to networks (since Mitaka)
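Under the hood, these RBAC rules are policy objects POSTed to Neutron's /v2.0/rbac-policies endpoint. A minimal helper that builds such a request body is sketched below; the endpoint and field names follow the Neutron API, while the network UUID and project name are placeholder assumptions:

```python
def make_rbac_policy(object_id: str, target_project: str,
                     action: str = "access_as_shared") -> dict:
    """Build the request body for POST /v2.0/rbac-policies.

    `action` selects how the network is exposed: "access_as_shared"
    shares it with the target project only; "access_as_external"
    presents it as an external network to that project.
    """
    if action not in ("access_as_shared", "access_as_external"):
        raise ValueError(f"unsupported RBAC action: {action}")
    return {
        "rbac_policy": {
            "object_type": "network",
            "object_id": object_id,           # UUID of the network to share
            "action": action,
            "target_tenant": target_project,  # project granted access
        }
    }

# Expose a (hypothetical) Lustre external network to a single tenant:
body = make_rbac_policy("11111111-2222-3333-4444-555555555555",
                        "bioinformatics", action="access_as_external")
```

The same operation is available from the CLI as `openstack network rbac create`, which avoids marking the network as globally shared.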
CEPH cloud storage deployment at each region
Slide 12 of 14
3 x CEPH servers, Lenovo x3550 M5
● 2 x Intel E5-2630 v3
● 128 GB of RAM
● 9 x 6 TB SATA
● 2 x 480 GB SATA SSD
● 1 x ConnectX-3 Pro 2 x 40 GbE/FDR VPI adapter

40/56 Gb Ethernet network for CEPH data
1 Gb Ethernet network for management
Insights of the adopted solution
Slide 13 of 14
● Using OpenStack Mitaka (RHOSP 9), which provides several improvements, including better Heat support for: resource chains, cleanup actions (filesystem sync), thread-aware CPU pinning, Ceilometer integration improvements, and autoscaling of compute based on Heat/Ceilometer.
● Deployment through Director (based on TripleO and Ironic): two cloud regions deployed using CEPH cloud storage tiers with smart replication and synchronization. Supported upgrade path.
● Fernet tokens for the Keystone authentication and authorisation system (FreeIPA) across multiple regions, allowing users to log in at either site using the same credentials.
● Spine/leaf network architecture with cloud nodes connected at 56 Gb. SR-IOV (Ethernet/IB) and Mellanox configuration for low-latency MPI workloads.
● OpenStack sub-projects that will be used in addition to the core ones:
● Sahara: provides a simple means to provision a data-intensive application cluster (Hadoop or Spark)
● Magnum: provisions container orchestration engines such as Kubernetes to deploy clusters, pods, and container applications
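The thread-aware CPU pinning mentioned above is driven by Nova flavor extra specs (`hw:cpu_policy` and `hw:cpu_thread_policy`). A sketch of a pinned flavor definition follows; the flavor name and sizes are illustrative assumptions, not the actual PHE flavors:

```python
# Flavor definition for low-latency MPI workloads with dedicated,
# thread-aware CPU pinning (Nova flavor extra specs).
# The flavor name and sizes are illustrative assumptions.
pinned_flavor = {
    "name": "hpc.pinned.16",
    "vcpus": 16,
    "ram_mb": 64 * 1024,
    "disk_gb": 40,
    "extra_specs": {
        "hw:cpu_policy": "dedicated",       # pin each vCPU to a host pCPU
        "hw:cpu_thread_policy": "isolate",  # keep hyper-thread siblings free
    },
}
```

On the CLI the same result is achieved with `openstack flavor create` followed by `openstack flavor set --property hw:cpu_policy=dedicated ...`; guests scheduled with such a flavor get exclusive physical cores, which matters for MPI jobs run over SR-IOV.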
Acknowledgments
Team members and key contributions
Technical partners and key contributions
Francesco Giannoccaro, Tim Cairnes, Thomas Stewart, Anna Rance
Andrew Dean, Christopher Brown
Richard Mansfield
Slide 14 of 14
Thanks and keep in touch
[email protected]
www.linkedin.com/in/giannoccaro