the state of linux containers - hpc...
TRANSCRIPT
Agenda
1. “Linux Container” / “Docker Ecosystem” in a Nutshell
2. Confusion about Ecosystem / Vision to tackle it
3. Docker -> SWARM -> SLURM -> BigData
4. Discussion of Opportunities and Problems
Linux Containers
[Figure: four containers, each with its own Userland (OS): Ubuntu:14.04, Ubuntu:15.10, RHEL7.2, TinyCoreLinux]
[Figure: Traditional Virtualisation (SERVER, HOSTKERNEL, HYPERVISOR, then per VM a KERNEL, a Userland (OS) and a SERVICE) next to Containerisation (SERVER, HOSTKERNEL, then per container a Userland (OS) and a SERVICE)]
Containers do not spin up a distinct kernel: all containers and the host share the same kernel.
User-lands are independent; they are separated by Kernel Namespaces.
Kernel Namespaces
Containers are ‘grouped processes’ isolated by Kernel Namespaces.
Resource restrictions are applicable through CGroups (disk/net IO).
Namespaces: PID, Network, Mount, IPC, UTS
[Figure: the HOST plus container1 to container4, running processes such as bash, ls -l, apache, mysqld, slurmd, ssh and consul, each container within its own set of namespaces]
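To make the namespace idea more tangible, here is a minimal sketch that lists the namespaces a process belongs to. It assumes a Linux host that exposes `/proc/<pid>/ns`; the helper name `list_namespaces` is made up for this example and is not part of Docker.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, read from /proc.

    If /proc/<pid>/ns is not available (non-Linux host, unknown PID,
    missing permissions) an empty dict is returned instead of raising.
    """
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return namespaces
    for name in names:
        try:
            # Each entry is a symlink whose target identifies the namespace.
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return namespaces


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:10s} {ident}")
```

Two processes that print the same identifier for, say, `net` share that namespace; a containerised process would normally show different identifiers than a process on the host.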
Docker Engine
Container runtime daemon: creates/…/removes containers, exposes a REST API.
Handles Namespaces, CGroups, bind-mounts, etc.
IP connectivity by default via a ‘host-only’ network bridge.
[Figure: SERVER with eth0 and the docker0 bridge; container1 and container2 are attached to the bridge and managed by the Docker-Engine]
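As an illustration of the REST API the engine exposes, the following sketch queries the list of running containers using only the Python standard library. It assumes the engine listens on the Unix socket `/var/run/docker.sock` (the usual default, but an assumption here); the class and function names are invented for this example.

```python
import http.client
import json
import socket

DOCKER_SOCKET = "/var/run/docker.sock"  # assumed default socket path


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def list_containers(socket_path=DOCKER_SOCKET):
    """Ask the engine for its running containers (GET /containers/json)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json")
        response = conn.getresponse()
        return json.loads(response.read().decode("utf-8"))
    finally:
        conn.close()


def container_names(containers):
    """Extract the names from the engine's answer, dropping the leading '/'."""
    return [name.lstrip("/") for c in containers for name in c.get("Names", [])]
```

On a host with a running engine, `container_names(list_containers())` would return the names of the running containers; without an engine the call raises an `OSError`.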
Docker Compose
Describes a stack of container configurations: instead of writing a small bash script…
… it holds the runtime configuration as a YAML file.
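The idea of holding the runtime configuration as data rather than as a script can be sketched as follows. The stack below is purely illustrative: the image names and the `CONSUL_HOST` variable are hypothetical, and the layout follows the compose file format version "2" as far as I know it. The sketch emits JSON, which is also valid YAML; whether a given Compose version accepts it is worth verifying.

```python
import json


def build_stack():
    """Describe a small container stack as plain data."""
    return {
        "version": "2",
        "services": {
            "consul": {
                "image": "example/consul",  # hypothetical image name
                "ports": ["8500:8500"],
            },
            "slurmctld": {
                "image": "example/slurmctld",  # hypothetical image name
                "environment": ["CONSUL_HOST=consul"],  # hypothetical variable
                "depends_on": ["consul"],
            },
            "slurmd": {
                "image": "example/slurmd",  # hypothetical image name
                "environment": ["CONSUL_HOST=consul"],
                "depends_on": ["slurmctld"],
            },
        },
    }


def render(stack):
    """Serialise the stack description; JSON is a subset of YAML."""
    return json.dumps(stack, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(build_stack()))
```

Compared to a bash script with a series of `docker run` calls, the description is declarative: the tool, not the author, decides in which order to start things.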
Docker Networking
Spans networks across engines.
KV-store to synchronise (Zookeeper, etcd, Consul).
VXLAN to pass messages along.
[Figure: SERVER0, SERVER1 … SERVER<n>, each running Consul and a Docker-Engine; the Consul agents form one Consul DC, and a ‘global’ network connects container0, container1 … containerN across the servers]
Docker Swarm
Proxies docker-engines: serves an API endpoint in front of multiple docker-engines.
Makes placement decisions.
[Figure: SERVER0, SERVER1 … SERVER<n>, each running a Docker-Engine (:2376) with a swarm-client; the swarm-master serves :2375 in front of them; container1 is placed using -e constraint:node==SERVER0]
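How a placement decision based on an expression like `constraint:node==SERVER0` could work is sketched below. This is a simplified stand-in, not Swarm's actual scheduler: it only handles `==` and `!=` with exact matches, and the node labels in the usage example are invented.

```python
def parse_constraint(expression):
    """Split 'constraint:node==SERVER0' into ('node', '==', 'SERVER0')."""
    prefix = "constraint:"
    if not expression.startswith(prefix):
        raise ValueError(f"not a constraint: {expression!r}")
    body = expression[len(prefix):]
    for operator in ("==", "!="):
        key, found, value = body.partition(operator)
        if found:
            return key, operator, value
    raise ValueError(f"no operator in constraint: {expression!r}")


def eligible_nodes(nodes, expression):
    """Return the names of all nodes that satisfy the constraint.

    `nodes` maps a node name to a dict of labels; the special key 'node'
    is matched against the node name itself.
    """
    key, operator, value = parse_constraint(expression)
    matches = []
    for name, labels in nodes.items():
        actual = name if key == "node" else labels.get(key)
        if (actual == value) == (operator == "=="):
            matches.append(name)
    return matches


if __name__ == "__main__":
    cluster = {"SERVER0": {"storage": "ssd"}, "SERVER1": {"storage": "disk"}}
    print(eligible_nodes(cluster, "constraint:node==SERVER0"))
```

A real scheduler would then pick one of the eligible nodes according to a strategy; this sketch stops at the filtering step.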
Introducing new Tech
… not always the same as the perception of others.
credit: TF2 - Meet the Pyro
Docker Buzzword Chaos!
Distributions
Solutions
Auto-Scaling
On-Premise & OverSpill
Orchestration
self-healing
production-ready
enterprise-grade
Vision
1. No special distributions: useful for certain use-cases, such as elasticity and green-field deployment; not so much for an on-premise datacenter w/ legacy in it.
2. Leverage existing processes/resources: install workflow, syslog, monitoring, security (ssh infrastructure), user auth.
3. Keep up with the docker ecosystem: incorporate new features of engine, swarm, compose, networking, volumes, user-namespaces.
Testbed
Hardware (courtesy of ): 8x Sun Fire x2250, 2x 4core XEON, 32GB, Mellanox ConnectX-2
Software:
CentOS 7.2 base installation (updated from 7-alpha)
Ansible
consul, sensu
docker v1.10, docker-compose
docker SWARM
[Figure: testbed nodes node1, node2 … node8]
Docker Networking
Synchronised by Consul.
[Figure: node1, node2 … node8, each running Consul and a Docker-Engine; the Consul agents form one Consul DC]
Docker SWARM
Synchronised by Consul KV-store.
[Figure: node1, node2 … node8, each running Consul (one Consul DC) and a Docker-Engine; swarm agents plus one swarm master form the SWARM cluster]
SLURM Cluster
SLURM within SWARM.
[Figure: node1, node2 … node8 with Consul (Consul DC), Docker-Engine and swarm (SWARM cluster with one swarm master); SLURM runs on top as containers: one slurmctld plus slurmd instances]
SLURM Cluster [cont]
SLURM within SWARM.
slurmd within app-container.
Pre-stage containers.
[Figure: Consul (Consul DC), Docker-Engine and swarm (SWARM cluster with one swarm master) on every node; SLURM on top with one slurmctld plus slurmd instances, and hpcg app-containers]
MPI Benchmark
http://qnib.org/mpi
http://qnib.org/mpi-paper
SLURM Cluster [cont]
SLURM within SWARM.
slurmd within app-container.
Pre-stage containers.
[Figure: Consul (Consul DC), Docker-Engine and swarm (SWARM cluster with one swarm master) on every node; SLURM on top with one slurmctld plus slurmd instances, and both hpcg and OpenFOAM app-containers]
OpenFOAM Benchmark
http://qnib.org/immutable
http://qnib.org/immutable-paper
Samza Cluster
Distributed Samza:
Zookeeper and Kafka cluster
Samza instances to run jobs
[Figure: Consul (Consul DC), Docker-Engine and swarm (SWARM cluster with one swarm master) on every node; on top three zookeeper, three kafka and three samza containers]
$ cat test.log | awk '{print $1}' | sed -e 's/HPC/BigData/g' | tee out.log
Small vs. Big
1. Where to base images on?
Ubuntu/Fedora: ~200MB
Debian: ~100MB
Alpine Linux: 5MB (musl-libc)
2. Trim the images down at all cost? How about debugging tools?
Possibility to run tools on the host and ‘inspect’ namespaced processes inside of a container.
If PID-sharing arrives, carving out (e.g.) monitoring could be a thing.
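Running tools on the host to ‘inspect’ namespaced processes could start with finding out which processes live in a different namespace than the tool itself. The sketch below does that by comparing namespace identifiers under `/proc`; it assumes a Linux host and sufficient permissions (typically root) to read other processes' entries. The function names are made up for this example.

```python
import os


def namespace_id(pid, namespace="pid"):
    """Return the namespace identifier of a process, or None if unreadable."""
    try:
        return os.readlink(f"/proc/{pid}/ns/{namespace}")
    except OSError:
        return None


def processes_outside_own_namespace(namespace="pid"):
    """List host-visible PIDs whose namespace differs from the caller's.

    Processes whose namespace entry cannot be read are skipped, so an
    unprivileged caller will usually see an incomplete (or empty) list.
    """
    own = namespace_id("self", namespace)
    found = []
    if own is None:
        return found
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        other = namespace_id(entry, namespace)
        if other is not None and other != own:
            found.append(int(entry))
    return sorted(found)


if __name__ == "__main__":
    print(processes_outside_own_namespace())
```

With such a list a debugging or monitoring tool can stay on the host, which is one argument for keeping the images themselves small.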
One vs. Many Processes
1. In an ideal world… a container only runs one process, e.g. the HPC solver.
2. In reality… MPI wants to connect to an sshd within the job-peers; monitoring, syslog, service discovery should be present as well.
3. How fast / aggressive to break traditional approaches?
Reproducibility / Downscaling
Running OpenFOAM on a small scale is cumbersome:
manually install OpenFOAM on a workstation
be confident that the installation works correctly
A containerised OpenFOAM installation tackles both.
http://qnib.org/immutable
http://qnib.org/immutable-paper
Service Discovery
1. Since the environments are rather dynamic… how do the containers discover services?
external registry as part of the framework?
discovery service as part of the container stacks?
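With Consul as the discovery service, a container could look up its peers through Consul's HTTP catalog. The sketch below only builds the lookup URL and interprets an answer; it assumes an agent on the default HTTP port 8500, and the sample entries in the usage example are invented for illustration.

```python
from urllib.parse import quote


def catalog_url(service, host="127.0.0.1", port=8500):
    """Build the URL of Consul's catalog endpoint for one service name."""
    return f"http://{host}:{port}/v1/catalog/service/{quote(service, safe='')}"


def service_endpoints(catalog_entries):
    """Turn catalog entries into (address, port) tuples.

    If a service did not register its own address, the address of the
    node it runs on is used instead.
    """
    endpoints = []
    for entry in catalog_entries:
        address = entry.get("ServiceAddress") or entry.get("Address")
        endpoints.append((address, entry.get("ServicePort")))
    return endpoints


if __name__ == "__main__":
    # Invented sample data in the shape of a catalog answer.
    sample = [
        {"Node": "node1", "Address": "10.0.0.1",
         "ServiceAddress": "", "ServicePort": 6817},
        {"Node": "node2", "Address": "10.0.0.2",
         "ServiceAddress": "10.10.0.2", "ServicePort": 6817},
    ]
    print(catalog_url("slurmctld"))
    print(service_endpoints(sample))
```

Whether the registry sits outside the stack or inside it, the lookup side stays the same; only the `host` the containers point at changes.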
Orchestration Frameworks
With Docker Swarm it is rather easy to spin up a Kubernetes or Mesos cluster within Swarm.
[Figure: SERVER0, SERVER1 … SERVER<n>, each running a Docker-Engine with a swarm-client behind one swarm-master; Kubernetes components run on top as containers: etcd and kubelet on every server, plus scheduler and apiserver]
Immutable vs. Config Mgmt
1. Containers should be controlled via ENV or flags. External access/change of a running container is discouraged.
2. Configuration management: downgraded to bootstrap a host?
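Controlling a container via ENV means the entrypoint reads its settings once at start-up and nothing is changed from outside afterwards. A minimal sketch of such an entrypoint helper follows; the variable names `SOLVER_THREADS`, `CONSUL_HOST` and `LOG_LEVEL` are hypothetical and only serve as examples.

```python
import os

# Hypothetical settings with their defaults.
DEFAULTS = {
    "SOLVER_THREADS": "1",
    "CONSUL_HOST": "consul",
    "LOG_LEVEL": "info",
}


def load_config(environ=None):
    """Read the configuration from the environment, falling back to defaults."""
    environ = os.environ if environ is None else environ
    config = {key: environ.get(key, default) for key, default in DEFAULTS.items()}
    # Fail early on a malformed value instead of in the middle of a job.
    config["SOLVER_THREADS"] = int(config["SOLVER_THREADS"])
    return config


if __name__ == "__main__":
    print(load_config())
```

Started with something like `docker run -e SOLVER_THREADS=8 …`, the same image behaves differently without being modified, which is what keeps it immutable.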
Continuous Dev./Integration
If containers are immutable within the pipeline:
testing/deployment should be automated
developers should have a production replica
Docker Momentum
[Figure: chart with the axes ‘Software Dev’ and ‘Datacenter Ops’, placing: IT Tinkering (Hello World); Continuous Dev/Int/Dep; Microservices, hyper scale; Big Data; High Performance Computing (HPC)]
Disclaimer: subjective exaggeration
Docker in Software Development
Spinning up a production-like environment is great:
MongoDB, PostgreSQL, memcached as separate containers
python2.7, python3.4
Like python’s virtualenv on steroids; iteration speedup through reproducibility.
Docker in HPC development
Spinning up a production-like environment is… …not that easy.
The focus is more on the engineer/scientist, not the software developer.
1. For development it might work close to non-HPC software dev.
2. But is that the iteration focus? Rather job settings / input data?
Separation of Concerns?
Split input iteration / development from operation:
non-distributed stays vanilla
transition to the HPC cluster using tech to foster operation
http://gmkurtzer.github.io/singularity
[Figure: Input/Dev]
containerd Integration
Docker-Engine 1.11 will not be the parent of containers; runC usage under the hood.
Recap aka. IMHO
1. Separate Dev and Ops: don’t block the momentum fostering iteration speed in Development.
2. Using vanilla docker-tech: keep up with the ecosystem and prevent vendor/ecosystem lock-in.
3. 80/20 rule: have caveats on the radar but don’t bother too much; everything is so fast moving, it’s hard to predict.
Q&A
https://github.com/qnib/hpcac-cluster2016
http://qnib.org
eGalea Workshop (Pisa) <plz ping me if you are interested>
23.06.2016