The Docker Multitenancy Problem: A Journey Through Infrastructure Hell
TRANSCRIPT
MULTITENANCY WITH DOCKER
A JOURNEY THROUGH INFRASTRUCTURE HELL
Peter Klipfel
STOP ME IF YOU WANT TO LOOK AROUND
THERE’S A LOT TO SEE
I WILL PAUSE WHEN I SEE A TURTLE
SOME CONTEXT
WHAT WE’RE TRYING TO DO
EACH USER GETS
▸ Private data storage
▸ Notebook (executable code on our servers)
▸ Deployed microservices
WHAT WE’RE TRYING TO DO
WE NEED
▸ Scalability
▸ Fault Tolerance
▸ Security
HOW HARD IS IT TO CREATE A MULTI-TENANT ELASTICSEARCH CLUSTER?
LET’S START WITH A QUESTION
VERY HARD
MULTITENANT ELASTICSEARCH
POSSIBLE SOLUTIONS
▸ Built-in multi-tenancy
▸ Shield
▸ Search-guard
MULTITENANT ELASTICSEARCH
NONE OF THEM WORK
▸ Built-in multi-tenancy: update the YML file for every user, then restart
▸ Shield: not free
▸ Search-guard: SSL was painful
HOW CAN WE DO THAT?
EACH USER GETS THEIR OWN DATABASE
ELASTICSEARCH INSTANCE PER USER
POSSIBLE SOLUTIONS
▸ Use hosted ES: Really expensive
▸ Use a cloud provider: expensive
▸ Use Docker: not as expensive
DOCKER TO THE RESCUE!
HOW DO WE CREATE DOCKER CONTAINERS ON DEMAND?
BUT WAIT
DOCKER CONTAINERS ON DEMAND
POSSIBLE SOLUTIONS
▸ Mesos (+ marathon)
▸ Docker Swarm
▸ Kubernetes
DOCKER CONTAINERS ON DEMAND
WHAT ARE THOSE TOOLS?
▸ Container schedulers
▸ APIs to run a docker container somewhere in the cluster
▸ Uniform cluster nodes
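As a concrete sketch of "an API to run a container somewhere in the cluster": with Mesos + Marathon, you POST an app definition and the scheduler picks a node. The hostname, app id, and image tag below are placeholders, not from the talk.

```shell
# Ask Marathon (the Mesos scheduler) to run one Elasticsearch
# container somewhere in the cluster; Marathon chooses the agent.
curl -X POST http://marathon.example.com:8080/v2/apps \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "/users/alice/elasticsearch",
    "cpus": 1,
    "mem": 2048,
    "instances": 1,
    "container": {
      "type": "DOCKER",
      "docker": {
        "image": "elasticsearch:2.4",
        "network": "BRIDGE",
        "portMappings": [{"containerPort": 9200, "hostPort": 0}]
      }
    }
  }'
```

If the agent running this app dies, the scheduler restarts it on another node, which is what makes the cluster nodes effectively uniform.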
DOCKER CONTAINERS ON DEMAND
WHAT ARE THOSE TOOLS?
[Diagram: one MASTER node dispatching to several AGENT nodes, each running containers]
DOCKER CONTAINERS ON DEMAND
THE PROBLEMS
▸ How do users get to their services (databases)?
▸ What if a node goes down?
▸ How do I separate users?
HOW DO USERS GET TO THEIR DATABASES?
SERVICE ACCESS
WHAT ARE THOSE TOOLS?
[Diagram: MASTER and AGENT nodes running containers, fronted by a REVERSE PROXY]
SERVICE ACCESS
REVERSE PROXY
▸ Nginx (reloads good)
▸ HAProxy (reloads bad)
▸ And we will need Consul
SERVICE ACCESS
CONSUL: THE EASIEST WAY
▸ We need Registrator on every node
▸ consul-dns creates routing
▸ consul-template builds nginx config
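One way the three pieces fit together (image names, addresses, and service names below are placeholders; flags follow recent consul-template releases):

```shell
# 1. Registrator runs on every node, watches the local Docker
#    daemon, and registers each container's published ports in Consul.
docker run -d --name registrator \
  -v /var/run/docker.sock:/tmp/docker.sock \
  gliderlabs/registrator consul://consul.example.local:8500

# 2. A consul-template template renders one nginx upstream per
#    registered service (here, one user's Elasticsearch).
cat > /etc/nginx/upstreams.ctmpl <<'EOF'
upstream es-alice {
  {{ range service "es-alice" }}server {{ .Address }}:{{ .Port }};{{ end }}
}
EOF

# 3. consul-template rewrites the nginx config whenever the set of
#    services in Consul changes, then gracefully reloads nginx.
consul-template \
  -consul-addr consul.example.local:8500 \
  -template "/etc/nginx/upstreams.ctmpl:/etc/nginx/conf.d/upstreams.conf:nginx -s reload"
```

The point of this wiring is that containers can move between nodes freely: Registrator keeps Consul current, and consul-template keeps the reverse proxy current.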
SERVICE ACCESS
NOW OUR REVERSE PROXY WORKS!
[Diagram: REVERSE PROXY routing user traffic to containers across the MASTER and AGENT nodes]
SERVICE ACCESS
POTENTIAL ALTERNATIVES
▸ ETCD
▸ MesosDNS
▸ Zookeeper
WHAT IF A NODE GOES DOWN?
GREAT! USERS CAN ACCESS THINGS!
STATEFUL SERVICES
PROBLEMS
▸ Containers have different fs mounts on each instance
▸ Node spin-up is non-deterministic (which disk will it use?)
▸ Network file systems require implementation changes
STATEFUL SERVICES
SOME SOLUTIONS
▸ We can mount docker container filesystems with volumes
▸ Can specify certain nodes for services
▸ Force stateful services to the same node
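A minimal sketch of those two ideas combined, assuming classic (standalone) Docker Swarm, whose scheduler reads `constraint:` environment variables; the node name and host path are placeholders:

```shell
# Pin one user's Elasticsearch to a known node so it always finds
# the same disk, and mount its data directory from the host so the
# data survives container restarts.
docker run -d --name es-alice \
  -e constraint:node==storage-node-1 \
  -v /mnt/disk1/es-alice:/usr/share/elasticsearch/data \
  elasticsearch:2.4
```

The trade-off is the one the talk names next: pinning defeats the scheduler, so losing storage-node-1 takes the service down with it.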
SOLUTION: CLUSTERING
STATEFUL SERVICES
CLUSTERING
▸ Failure is ok, as long as it’s not the whole cluster
▸ Storage can be ephemeral
▸ Most databases cluster
HOW DO WE KEEP OUR USERS SEPARATED?
GREAT! LET’S CLUSTER
NETWORK ISOLATION
THEY’RE ALL ON THE SAME SYSTEM
[Diagram: every user’s containers share the same MASTER and AGENT nodes behind one REVERSE PROXY]
NETWORK ISOLATION
PROBLEMS WITH CLUSTERING
▸ Reverse proxy works only for HTTP
▸ Don’t want to DOS the internal network
▸ Need isolation between users
NETWORK ISOLATION
SOLUTION: DOCKER OVERLAY NETWORKS
▸ Weave
▸ Calico
▸ Flannel
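With Docker's built-in overlay driver (which needs swarm mode or an external key-value store to span hosts), per-user isolation can be sketched like this; the network and container names are placeholders:

```shell
# One overlay network per user: containers attached to different
# overlays cannot reach each other's traffic, even on the same host.
docker network create -d overlay user-alice-net
docker network create -d overlay user-bob-net

# Each user's services join only their own network.
docker run -d --net user-alice-net --name es-alice elasticsearch:2.4
docker run -d --net user-bob-net   --name es-bob   elasticsearch:2.4
```

The reverse proxy then attaches to each user network it needs to route into, so non-HTTP traffic stays off the shared internal network.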
WE JUST REINVENTED OPENSTACK…
BUT WAIT
NETWORK ISOLATION
PROBLEMS WITH OPENSTACK
▸ Maintaining it sucks
▸ Upgrading it sucks
▸ Paying for it sucks
SO I USED OPENSTACK FOR A WHILE
NETWORK ISOLATION
HOW IT WORKED
▸ User gets their own account
▸ Every user gets their own network
▸ Every user gets their own persistent storage
AND AFTER IT STOPPED SCALING I TRIED
KUBERNETES
AND AFTER IT STOPPED SCALING I TRIED
GOOGLE CONTAINER ENGINE
GOOGLE CONTAINER ENGINE (GKE)
THE BEST SOLUTION I HAVE FOUND
▸ Persistent volumes
▸ Decent library support
▸ Hopeful networking promised land
GOOGLE CONTAINER ENGINE (GKE)
PERSISTENT VOLUMES
▸ I don’t need automated clustering if disks are persistent
▸ Manual deploy for customers that require larger clusters
▸ Can separate disk utilization by service
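A minimal sketch of a GKE persistent volume, assuming the cluster's default storage class; names and sizes are placeholders:

```shell
# A PersistentVolumeClaim: GKE provisions a persistent disk behind
# it, so the data outlives any pod that mounts it.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: es-alice-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
EOF
```

A pod (or deployment) then mounts the claim by name via `persistentVolumeClaim.claimName`, which is why automated clustering stops being necessary: if the pod dies, a replacement reattaches the same disk.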
GOOGLE CONTAINER ENGINE (GKE)
HOPEFUL NETWORKING PROMISED LAND
▸ Configuration defines subnetwork id
▸ Subnets can exist across data centers
▸ Lots of opportunities for more clever reverse proxying
CONCLUSION
WHAT HAVE WE LEARNED?
▸ Docker is a glorified package manager
▸ Complex microservice architectures are still hard
▸ The promised land is close