magic castle - fosdem...magic castle terraforming the cloud for hpc félix-antoine fortin, fosdem20....

43
Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20

Upload: others

Post on 12-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Magic Castle

Terraforming the Cloud for HPCFélix-Antoine Fortin, FOSDEM20

Page 2: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Why are there more wizards in Harry Potter than in Lord of the Rings?

Page 3: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Context

Page 4: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Canada Digital Research Infrastructure

Page 5: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Education and Training in Compute Canada

● Over 150 workshops / year● Most workshops use the

HPC software environment● HPC clusters require an

account● Account creation process

can take a few days

Could we replicate the HPC environment for training?

Page 6: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

So what is the difference between HP and LotR?

?

Page 7: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

So what is the difference between HP and LotR?

Wizardry Schools

Page 8: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Proposal

Page 9: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

HPC Wizard Tower bySimon Guilbault

Page 10: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

demo

Page 11: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

CC Wizard: Magic Castle Voice Assistant

Page 12: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

CC Wizard: Magic Castle Voice Assistant

Page 13: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Magic Castle

Open source project that instantiates a Compute Canada cluster replica in any major cloud with Terraform and Puppet

● Create instances○ Management nodes○ Login nodes○ Compute nodes

● Create volumes, network, network acls● Create certificates, dns records, passwords● Configuration done via input parameters

https://github.com/computecanada/magic_castle

Page 14: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Terraform

● Tool for building, changing, and versioning infrastructure

● Infrastructure is described using a high-level configuration syntax.

● Create resources that can then be setup by a config management tool.

● Config management tool used for deploying, configuring and managing servers.

● Define configurations for each host

● Continuously check whether the required configuration is in place and is not altered

Puppet

Page 15: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Overview of a Magic Castle Release

Magic Castle

provider*

main.tf

data.tf

variables.tf

output.tf

infrastructure.tf

cloud-init mgmt.yaml

puppet.yaml

provider.tf

*could be any in [aws, azure, gcp, openstack, ovh]

Page 16: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Infrastructure

Page 17: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Overview of a Magic Castle Release

Magic Castle

provider*

main.tf

data.tf

variables.tf

output.tf

infrastructure.tf

cloud-init mgmt.yaml

puppet.yaml

provider.tf

*could be any in [aws, azure, gcp, openstack, ovh]

Page 18: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Architecture

Page 19: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Architecture - login nodes

Page 20: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Architecture - management nodes

Page 21: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Architecture - compute nodes

Page 22: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Main Interface

Page 23: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Overview of a Magic Castle Release

Magic Castle

provider*

main.tf

data.tf

variables.tf

output.tf

infrastructure.tf

cloud-init mgmt.yaml

puppet.yaml

provider.tf

*could be any in [aws, azure, gcp, openstack, ovh]

Page 24: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Magic Castle Terraform Main Module

4 sections

1. Cloud provider selection2. Infrastructure customization3. Cloud Provider specifics inputs4. DNS Configuration (optional)

Page 25: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

MC Module - 1. source

source = "./provider"

Page 26: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

cluster_name = "fosdem" domain = "computecanada.dev" image = "CentOS-7-x64-2019-07" nb_users = 100 public_keys = [file("~/.ssh/id.pub")]

MC Module - 2.1 Infrastructure customization

Page 27: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

MC Module - 2.2 Instance definition

instances = { mgmt = { type = "p4-6gb", count = 1 }, login = { type = "p2-3gb", count = 1 }, node = { type = "p2-3gb", count = 1 } }

Page 28: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

MC Module - 2.3 Storage definition

storage = { type = "nfs" home_size = 100 project_size = 50 scratch_size = 50 }

Page 29: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

MC Module - 3. Cloud Provider Specific Inputs

Examples:

● OpenStack list of floating ips● Google GPU attachment for compute nodes● AWS / Azure / Google Cloud region

Page 30: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

MC Module - 4. DNS Configuration (optional)

source = "./dns/cloudflare" name = module.provider.cluster_name domain = module.provider.domain email = "[email protected]" public_ip = module.provider.ip rsa_public_key = module.provider.rsa_public_key sudoer_username = module.provider.sudoer_username

Page 31: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Apply Plan

$ terraform apply

Apply complete! Resources: 30 added, 0 changed, 0 destroyed.

Outputs:

admin_username = centosguest_passwd = **redacted**guest_usernames = user[01-10]hostnames = [pirate.calculquebec.cloud, pirate1.calculquebec.cloud]public_ip = [206.12.90.97]

Page 32: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Challenges: Infrastructure as Code

● Designing the main user interface that would limit the references to a provider specific implementation / API.

● Terraform configuration language tends to favor repetition over re-use of code.

● Regrouping every components that are common amongst providers

Page 33: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Provisioning

Page 34: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Overview of a Magic Castle Release

Magic Castle

provider*

main.tf

data.tf

variables.tf

output.tf

infrastructure.tf

cloud-init mgmt.yaml

puppet.yaml

provider.tf

*could be any in [aws, azure, gcp, openstack, ovh]

Page 35: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Bootstrap Puppet

1. Inject data from TF

2. Upgrade CentOS

3. Install Puppet rpms

4. Configure Puppet

certificates

5. Setup host

configuration

Page 36: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

login1

node1 node2

mgmt1

node3 node4 node5

node6

Provisioning with Puppet and Consul

Page 37: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Challenges: Provisioning

● Every steps of the provisioning need to work without human intervention.

● Once provisioned, the cluster need to stay healthy on itself - users are not necessarily sys admins.

● Provisioning both master and slave services without proper syncing mechanism.

Page 38: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Software

Page 39: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Batteries Included

● FreeIPA○ Kerberos○ BIND○ 389 DS LDAP

● NFS● Slurm● Globus Endpoint● JupyterHub with BatchSpawner● Compute Canada CVMFS● LMOD

Page 40: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Compute Canada Software Stack - CVMFS

● CernVM File System (CVMFS) provides a scalable, reliable and low-maintenance software distribution service;

● Compute Canada CVMFS repo:○ 600+ scientific applications○ 4,000+ permutations of

version/arch/toolchain○ All compiled with EasyBuild

● Available from anywhere● PEARC19 paper

Page 41: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Key Takeaways

1. Terraform can be used to build complex things and modules simplify that complexity.

2. Magic Castle is a teaching and development meta-platform for HPC.

Page 42: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Magic Castle Replicates a Compute Canada Cluster in 20 min.

Page 43: Magic Castle - FOSDEM...Magic Castle Terraforming the Cloud for HPC Félix-Antoine Fortin, FOSDEM20. Why are there more wizards in Harry Potter than in Lord of the Rings? Context

Questions ?