Containers and HPC
Olli-Pekka Lehto (@ople)
Services for Research Work Together Days
Dec 17th 2015
Or rather...
Making It Easy to Do Custom HPC Environments
What is Docker?
• Helps to manage and run applications with complex dependencies quickly and efficiently
• Management framework for Linux Containers
• Containers:
  – Instances isolated within a namespace
  – Kernel shared with host OS
  – Resources guaranteed using Linux cgroups
• Grown into a complete ecosystem
  – docker-swarm, docker-machine, docker-compose…
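The namespace and cgroup bullets above can be made concrete with a minimal Python sketch that inspects how the kernel sees a process. It assumes a Linux host with /proc mounted; the function names are invented for this example, and it only reads state, it does not create containers.

```python
import os


def parse_cgroup_line(line):
    """Split one line of /proc/<pid>/cgroup into its three fields.

    The kernel writes each line as hierarchy-ID:controller-list:cgroup-path.
    """
    hierarchy_id, controllers, path = line.strip().split(":", 2)
    return {
        "hierarchy": int(hierarchy_id),
        "controllers": controllers.split(",") if controllers else [],
        "path": path,
    }


def read_cgroups(pid="self"):
    """Return the parsed cgroup membership of a process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as handle:
        return [parse_cgroup_line(line) for line in handle if line.strip()]


def read_namespaces(pid="self"):
    """Map each namespace type (pid, net, mnt, ...) to its kernel identifier."""
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}
```

Processes inside the same container typically report the same namespace identifiers, while a process on the host reports different ones, even though both run on the same kernel.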
Docker Adoption is Rapid
Interest in HPC is growing
http://www.lanl.gov/projects/apex/_assets/docs/APEX2020_draft_tech_specs_v1.0.pdf
http://investors.cray.com/phoenix.zhtml?c=98390&p=irol-newsArticle&ID=2112970
Why Docker
• Fast initialization (10-1000x vs VMs)
• Small memory overhead
• Efficient disk usage
• Bare-metal access to devices
• Built-in version control and repository support
  – Simple sharing of containers
• Simple launch mechanism
  – Can be run within batch job queue system
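As a rough illustration of the launch mechanism, the Python sketch below builds a one-shot `docker run` command and wraps it in a minimal SLURM batch script. The image name, paths and job parameters are hypothetical placeholders, and the sketch assumes the Docker client is available on the node where the job runs.

```python
import shlex


def docker_run_argv(image, command, volumes=None):
    """Build the argument list for a one-shot `docker run`."""
    argv = ["docker", "run", "--rm"]  # --rm removes the container on exit
    for host_path, container_path in (volumes or {}).items():
        argv += ["-v", f"{host_path}:{container_path}"]  # bind-mount a host dir
    argv.append(image)
    argv += list(command)
    return argv


def slurm_script(argv, job_name="container-job", ntasks=1,
                 time_limit="00:10:00"):
    """Wrap a command in a minimal SLURM batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={time_limit}",
        "",
        # Quote every argument so spaces survive the shell.
        " ".join(shlex.quote(arg) for arg in argv),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    argv = docker_run_argv(
        "ubuntu:14.04",                       # hypothetical image
        ["/data/my_app", "--input", "case 1.dat"],
        volumes={"/scratch/job": "/data"},    # hypothetical paths
    )
    print(slurm_script(argv))
```

The generated text could be saved to a file and submitted with `sbatch`; from the scheduler's point of view the container is just another command in the job.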
Current Choices for HPC Workloads
[Figure: workload types arranged under Cloud HPC and Bare-metal HPC. Labels: CSC-built apps; User-built CSC-compatible apps; Hosting; Windows; Non-SLURM batch queue system; Ubuntu; “I need/want Root”; VM image app; Web Servers; Preservation of SW stack; Secure access; Complex stack]
Future Choices for HPC Workloads
[Figure: workload types arranged under Cloud HPC, Container HPC and Bare-metal HPC. Labels: CSC-built apps; User-built CSC-compatible apps; Hosting; Windows; Non-SLURM batch queue system; Ubuntu; “I need/want Root”; VM image app; Secure access; Web Servers; Preservation of SW stack; Complex stack; Containerized app]
Example Case
“My HPC application needs Ubuntu”
• Alternative 1: Adapt the application
  – Takes time and work
  – May not be possible with ISV codes
• Alternative 2: Run it in the cloud
  – You may get to play cluster admin!
  – Scheduling is limited in OpenStack (no backfill etc.)
  – Running a short job has large initialization overhead
  – Performance penalties
• Alternative 3: Run it in a container
  – No need to touch the application
  – Nearly as easy to run as a normal job
  – Very little overhead
  – Can use the normal batch job scheduler
Challenges for HPC Use
• Security model is problematic
  – Initially designed for server environments
    • Only trusted users have shell access to server
  – Containers launched as root
  – Access to bare metal & device drivers
• Requires an overlay FS & kernel modules
  – Relatively new Linux OS version needed
• Daemon must run on every compute node
• Low-level driver compatibility?
  – GPU, InfiniBand, Lustre
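The overlay FS requirement above can be checked on a node with a short Python sketch. It assumes a Linux host and relies on two facts: /proc/filesystems lists the filesystem types the running kernel currently knows, and overlayfs entered the mainline kernel in version 3.18 under the type name "overlay". The function names are invented for this example, and the version check is only a heuristic because distribution kernels sometimes backport features.

```python
import re


def parse_proc_filesystems(text):
    """Return the set of filesystem types listed in /proc/filesystems text.

    Each line is either "<tab>name" or "nodev<tab>name", so the type is
    always the last whitespace-separated field.
    """
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[-1])
    return names


def kernel_at_least(release, minimum=(3, 18)):
    """Compare a release string such as '3.10.0-229.el7.x86_64' to a minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


if __name__ == "__main__":
    import platform

    with open("/proc/filesystems") as handle:
        known = parse_proc_filesystems(handle.read())
    # An unloaded module does not show up here until it has been loaded.
    print("overlay registered:", "overlay" in known)
    print("kernel >= 3.18:", kernel_at_least(platform.release()))
```

A check like this says nothing about GPU, InfiniBand or Lustre driver compatibility, which would still have to be tested separately.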
Shifter
• Alternative to Docker daemon
  – Adapts containers to HPC use
    • Repacks into a new filesystem (squashfs)
  – Integrates with batch job queue systems
    • No need to run a daemon on compute nodes
• Developed by NERSC for Cori (Cray XC40)
  – Pre-release version available
  – Cray is productizing it
• Parallel jobs? Driver issues?
https://www.nersc.gov/research-and-development/user-defined-images/
https://bitbucket.org/berkeleylab/shifter
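To give a feel for the workflow described above, here is a small Python sketch that assembles the two commands a user would typically issue: one to pull and repack a Docker image, one to run a program inside it. The `shifterimg pull` and `shifter --image=` forms follow later public Shifter documentation and are an assumption here; the 2015 pre-release may differ. The image and program names are hypothetical.

```python
def shifter_image_ref(docker_image):
    """Prefix a Docker image name with the `docker:` image type."""
    return f"docker:{docker_image}"


def shifter_commands(docker_image, command):
    """Return (pull_argv, run_argv) for a Docker image and a program.

    Assumption: command names and options are taken from later Shifter
    documentation, not from the pre-release mentioned in the slides.
    """
    ref = shifter_image_ref(docker_image)
    pull_argv = ["shifterimg", "pull", ref]            # fetch and repack image
    run_argv = ["shifter", f"--image={ref}"] + list(command)
    return pull_argv, run_argv


if __name__ == "__main__":
    pull, run = shifter_commands("ubuntu:14.04", ["./my_app"])
    print(" ".join(pull))
    print(" ".join(run))
```

Because the run step is an ordinary command with no daemon behind it, it can be placed in a batch job script like any other program.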
Next Steps / Ideas
• Piloting Shifter in 2016
  – Sisu, Taito, Taito-shell
    • Custom interactive containers in taito-shell
• Containerized CSC compute environment?
  – Customers could run on their own laptops, workstations and/or clusters
  – Starting point for users’ own customizations
  – Having our environment in config management would make it easier
Extras
EasyBuild + Docker?
• Dockerfile can be used to define a container
  – Simple flat file with a script-like syntax
  – Specific to Docker
• Using EasyBuild with Docker?
  – Target also VMs or bare-metal with same config
  – Update portions of the stack easily
  – Manage dependencies
  – Leverage the rich set of EasyBuild configs
  – Which way to do it?
    • Using EasyBuild in a container or
    • Building containers with EasyBuild?
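To show what the "simple flat file with a script-like syntax" looks like, the Python sketch below renders a minimal Dockerfile from a small description of the environment. The base image and package names are hypothetical, and the generator is only an illustration of keeping the definition outside Docker so that other targets could reuse it; it is not how EasyBuild works.

```python
import json


def render_dockerfile(base_image, packages, command):
    """Render a minimal Dockerfile for a Debian/Ubuntu-style base image."""
    lines = [f"FROM {base_image}"]
    if packages:
        pkg_list = " ".join(sorted(packages))
        # One RUN step: refresh the index, install, then drop the apt cache.
        lines.append(
            "RUN apt-get update && apt-get install -y " + pkg_list
            + " && rm -rf /var/lib/apt/lists/*"
        )
    # The exec form of CMD is written as a JSON array.
    lines.append("CMD " + json.dumps(list(command)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(
        "ubuntu:14.04",                  # hypothetical base image
        ["gfortran", "openmpi-bin"],     # hypothetical package list
        ["/bin/bash"],
    ))
```

The same package list could in principle feed a VM or bare-metal provisioning step, which is the kind of reuse the EasyBuild question above is after.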