Bringing Security and Multi-tenancy to Kubernetes

Lei (Harry) Zhang

Post on 13-Feb-2017

About Me

• Lei (Harry) Zhang @resouer

• #CNCF member, #Microsoft MVP

• Previous: VMware, Baidu

• Feature Maintainer of Kubernetes

• HyperCrew: https://hyper.sh/

• Publication: Docker & Kubernetes Under the Hood

• PhD candidate #Large-scale cluster scheduling and management

A survey about “boundary”

• Are you comfortable with Linux containers as an effective boundary?

• Yes, I use containers in my private/safe environment

• No, I use containers to serve the public cloud

As long as we care about security…

• We have to wrap containers inside full-blown virtual machines

• But we lose cloud-native deployment

• Slow startup time

• Huge waste of resources

• Memory tax for every container

• …

[Figure: dream vs. reality]

Revisit container

• Container Runtime

• The dynamic view and boundary of your running process

• Container Image

• The static view of your program, data, dependencies, files and directories

[Figure: namespace, cgroups]

FROM busybox

ADD temp.txt /

VOLUME /data

CMD ["echo", "hello"]

[Figure: Docker Container layers for the Dockerfile above: a read-write layer (with /data) running “echo hello”; an init layer (/etc/hosts, /etc/hostname, /etc/resolv.conf); read-only layers holding /temp.txt, each with its json metadata. Merged view: /bin /dev /etc /home /lib /lib64 /media /mnt /opt /proc /root /run /sbin /sys /tmp /usr /var /data /temp.txt]

HyperContainer

Secure Kubernetes from the runtime level

HyperContainer

• Container Runtime

• RunV

• https://github.com/hyperhq/runv

• An OCI-compatible, hypervisor-based runtime implementation

• Control daemon

• https://github.com/hyperhq/hyperd

• Container Image

• Docker Image Spec

Combine the best parts

• Portable and behaves like a Linux container

• $ hyperctl run -t busybox echo helloworld

• sub-second startup time*, ~12MB memory cost

• Fully isolated sandbox with an independent guest kernel

• $ hyperctl exec -t busybox uname -r

• 4.4.12-hyper (or your provided kernel)

• security, backward compatibility, maturity

See: http://hypercontainer.io/why-hyper.html

HyperContainer is a Pod

• That’s how HyperContainer fits into the Kubernetes philosophy

• Wait, why is the Pod so important?

Pod: lesson learned from Borg

• Should sample.war be packaged with Tomcat?

Pod: lesson learned from Borg

• InitContainers: one or more containers started in sequence before the pod's normal containers are started.

• Share volumes, perform network operations, and perform computation prior to the app containers.

So, Pod is

• The group of super-affinity containers

• The atomic scheduling unit

• The process group in container cloud

• Do right things

• without modifying your container image

• Kubernetes = Spring Framework

• Pod = IoC

Pod

[Figure: a Pod grouping the log and app containers with an infra container, an init container, and a volume]

Pod is not easy to simulate

• log has super affinity with app

• Requirement:

• app: 1G, log: 0.5G

• Available:

• Node_A: 1.25G, Node_B: 2G

• What happens if app is scheduled to Node_A?
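The question above can be worked through with a toy scheduler. This is only a sketch using the memory figures from the slide; the function names and the first-fit policy are assumptions made for illustration, not how the Kubernetes scheduler is implemented.

```python
# Toy illustration of why the Pod, not the container, must be the scheduling unit.
# Memory figures (in GB) come from the slide; everything else is hypothetical.

def schedule_one_by_one(app_need, log_need, nodes):
    """Place app first-fit, then try to co-locate log on the same node."""
    free = dict(nodes)
    app_node = next((n for n, f in free.items() if f >= app_need), None)
    if app_node is None:
        return None, None
    free[app_node] -= app_need
    # log has super affinity with app, so it may only land on app's node.
    log_node = app_node if free[app_node] >= log_need else None
    return app_node, log_node


def schedule_as_pod(app_need, log_need, nodes):
    """Treat app + log as one atomic unit and find a node that fits both."""
    total = app_need + log_need
    return next((n for n, f in nodes.items() if f >= total), None)


nodes = {"Node_A": 1.25, "Node_B": 2.0}
naive = schedule_one_by_one(1.0, 0.5, nodes)
pod = schedule_as_pod(1.0, 0.5, nodes)
print("container by container:", naive)
print("as a Pod:", pod)
```

Scheduled container by container, app lands on Node_A and log can no longer be placed next to it; scheduled as one unit, both fit on Node_B.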

HyperContainer is a Pod

• Linux container based runtimes

• wraps and encapsulates several app containers into a logical group

• Hypervisor container based runtime

• hypervisor serves as a natural boundary of Pod

HyperContainer is a Pod

• Container Runtime Interface

• create sandbox Foo --> create container C --> start container C

• stop container C --> remove container C --> delete sandbox Foo

• Sandbox

• Normally: the infra container

• HyperContainer: hypervisor

• with HyperKernel

• a HyperStart process as PID 1

• sets up the mnt namespace, launches apps from the images, etc.
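The call order listed above can be made concrete with a small in-memory stand-in for a runtime. This is a sketch only: the class and method names are hypothetical and simply mirror the wording on the slide, not the actual CRI gRPC API.

```python
class ToyRuntime:
    """In-memory stand-in for a sandbox-based runtime (hypothetical, not the CRI API)."""

    def __init__(self):
        self.sandboxes = {}  # sandbox name -> {container name: state}
        self.calls = []      # ordered record of accepted operations

    def create_sandbox(self, sandbox):
        self.sandboxes[sandbox] = {}
        self.calls.append(f"create sandbox {sandbox}")

    def create_container(self, sandbox, name):
        if sandbox not in self.sandboxes:
            raise RuntimeError("create the sandbox before its containers")
        self.sandboxes[sandbox][name] = "created"
        self.calls.append(f"create container {name}")

    def _move(self, sandbox, name, expected, new, verb):
        # Enforce the lifecycle: each step is only valid from one prior state.
        state = self.sandboxes[sandbox].get(name)
        if state != expected:
            raise RuntimeError(f"cannot {verb} container {name} in state {state}")
        self.sandboxes[sandbox][name] = new
        self.calls.append(f"{verb} container {name}")

    def start_container(self, sandbox, name):
        self._move(sandbox, name, "created", "running", "start")

    def stop_container(self, sandbox, name):
        self._move(sandbox, name, "running", "stopped", "stop")

    def remove_container(self, sandbox, name):
        self._move(sandbox, name, "stopped", "removed", "remove")
        del self.sandboxes[sandbox][name]

    def delete_sandbox(self, sandbox):
        if self.sandboxes[sandbox]:
            raise RuntimeError("remove all containers before deleting the sandbox")
        del self.sandboxes[sandbox]
        self.calls.append(f"delete sandbox {sandbox}")


rt = ToyRuntime()
rt.create_sandbox("Foo")
rt.create_container("Foo", "C")
rt.start_container("Foo", "C")
rt.stop_container("Foo", "C")
rt.remove_container("Foo", "C")
rt.delete_sandbox("Foo")
print(" --> ".join(rt.calls))
```

With a Linux container runtime the sandbox would be backed by the infra container; with HyperContainer it would be the hypervisor. The toy class does not model that difference.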

Hypernetes

Kubernetes with HyperContainer Runtime

Hypernetes

• Also: h8s

• Kubernetes + HyperContainer runtime

• officially supported by using kubernetes/frakti

• Multi-tenant network and persistent volumes

• battle tested Neutron + Cinder plugin

Multi-tenant Network

• Goal:

• leveraging tenant-aware neutron network for Kubernetes

• following the network plugin workflow

• Non-goal:

• break k8s network model or hack k8s code

Define the Network

• Network

• a top-level API object

• each tenant (created by Keystone) has its own Network

• a Network maps to a Neutron “net”

• a Network Controller is responsible for managing the Network lifecycle

Example

[Figure: api-server, etcd, scheduler, and controller-manager (ControlLoop) on the master; kubelet (SyncLoop) and proxy on each node. API objects: network, pod, replica, namespace, service, job, deployment, volume, petset, … The control loop compares the Desired World with the Real World and calls Neutron to create/delete the network.]
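The Network Controller's job, comparing the desired Network objects with what exists in Neutron and calling Neutron to create or delete nets, can be sketched as a reconcile pass. `FakeNeutron` and `reconcile_networks` are hypothetical names for illustration; the real controller talks to the Neutron API and watches the api-server.

```python
class FakeNeutron:
    """Hypothetical in-memory stand-in for Neutron; only tracks net names."""

    def __init__(self, nets=()):
        self.nets = set(nets)

    def create_net(self, name):
        self.nets.add(name)

    def delete_net(self, name):
        self.nets.discard(name)


def reconcile_networks(desired, neutron):
    """One control-loop pass: make the real world match the desired world."""
    actions = []
    for name in sorted(desired - neutron.nets):   # wanted but missing
        neutron.create_net(name)
        actions.append(("create", name))
    for name in sorted(neutron.nets - desired):   # present but no longer wanted
        neutron.delete_net(name)
        actions.append(("delete", name))
    return actions


neutron = FakeNeutron(nets={"tenant-b-net"})
desired = {"tenant-a-net"}                 # Network objects stored via the api-server
first = reconcile_networks(desired, neutron)
second = reconcile_networks(desired, neutron)  # nothing left to do
print(first, second)
```

Running the pass twice shows the usual property of a control loop: once the real world matches the desired world, further passes take no action.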

Kubernetes Network Model

• Container reach container

• all containers can communicate with all other containers without NAT

• Node reach container

• all nodes can communicate with all containers (and vice-versa) without NAT

• IP addressing

• a Pod in the cluster can be addressed by its IP

How does h8s fit that?

• A Network can be assigned to one or more Namespaces

• Pods belonging to the same Network can reach each other directly through IP

• a Pod’s network maps to a Neutron “port”

• kubelet network plugin is responsible for Pod network setup

Example

[Figure sequence: api-server, etcd, scheduler, two kubelets (SyncLoop), and two proxies; each slide highlights one step]

1 Pod created

2 Pod object added

3.1 New pod object detected

3.2 Bind pod with node

4.1 Detected pod bound to me

4.2 Start containers in pod
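The four steps can be simulated with a shared store standing in for etcd behind the api-server. This is a deliberately simplified sketch: the function names are hypothetical, the components poll instead of watching, and the "least loaded node" policy is an assumption made only so the example has something to decide.

```python
store = {}  # stands in for etcd behind the api-server


def create_pod(name):
    """Steps 1-2: the Pod is created and its object is added to the store."""
    store[name] = {"node": None, "phase": "Pending"}


def scheduler_pass(nodes):
    """Steps 3.1-3.2: detect unbound pods and bind each to a node."""
    for name in sorted(store):
        pod = store[name]
        if pod["node"] is None:
            load = {n: sum(p["node"] == n for p in store.values()) for n in nodes}
            pod["node"] = min(nodes, key=lambda n: load[n])


def kubelet_pass(me):
    """Steps 4.1-4.2: detect pods bound to this node and start their containers."""
    started = []
    for name in sorted(store):
        pod = store[name]
        if pod["node"] == me and pod["phase"] == "Pending":
            pod["phase"] = "Running"
            started.append(name)
    return started


create_pod("web")
create_pod("db")
scheduler_pass(["node-1", "node-2"])
on_node_1 = kubelet_pass("node-1")
on_node_2 = kubelet_pass("node-2")
print(on_node_1, on_node_2)
```

No component calls another directly; each one only reads and writes objects in the store, which is the point the slide sequence makes.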

Design of kubelet

[Figure: kubelet internals. Startup: choose runtime (docker, rkt, hyper/remote) and InitNetworkPlugin. The SyncLoop receives PodUpdate events and dispatches HandlePods{Add, Update, Remove, Delete, …}. Supporting components: NodeStatus, Network Status, status Manager, PLEG, volume Manager, image Manager.]

Pod Update Worker (e.g. ADD)

• generate Pod status

• check volume status (discussed later)

• call runtime to start containers

• set up Pod network (see next slide)

Set Up Pod Network

kubestack

A standalone gRPC daemon

1. to “translate” the SetUpPod request to the Neutron network API

2. to handle the multi-tenant Service proxy
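As a rough idea of what "translating" a SetUpPod request might involve, the sketch below builds a port-create request body of the kind Neutron accepts. The function name and its arguments are hypothetical, and the exact fields kubestack sends are not shown in the slides; only the general shape (a `port` object with `network_id`, `name`, `tenant_id`, `admin_state_up`) is taken from the Neutron port resource.

```python
import json


def setup_pod_to_port_request(pod_namespace, pod_name, network_id, tenant_id):
    """Build a Neutron-style port-create body for one Pod (illustrative only)."""
    return {
        "port": {
            # One Neutron port per Pod, named after the Pod for traceability.
            "name": f"{pod_namespace}_{pod_name}",
            # The Neutron net behind the tenant's Network object.
            "network_id": network_id,
            "tenant_id": tenant_id,
            "admin_state_up": True,
        }
    }


req = setup_pod_to_port_request("default", "web-1", "net-123", "tenant-a")
print(json.dumps(req, indent=2))
```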

Service

$ iptables-save | grep my-service
-A KUBE-SERVICES -d 10.0.0.116/32 -p tcp -m comment --comment "default/my-service: cluster IP" -m tcp --dport 8001 -j KUBE-SVC-KEAUNL7HVWWSEZA6
-A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-6XXFWO3KTRMPKCHZ
-A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-57KPRZ3JQVENLNBRZ
-A KUBE-SEP-6XXFWO3KTRMPKCHZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.2:80
-A KUBE-SEP-57KPRZ3JQVENLNBRZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.3:80

[Figure: portal 10.10.0.116:8001 --> random mode rules --> backend rule_1 (172.17.0.2:80) and backend rule_2 (172.17.0.3:80); the rules are driven by OnServiceUpdate and OnEndpointsUpdate]
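The slide elides the probabilities attached to the random mode rules. A common way to get an even split from rules that are tried in order is to give rule i of n a match probability of 1/(n-i), so the last rule always matches. The sketch below checks that arithmetic; it is an assumption about how such a chain is weighted, not output taken from the slide.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability so that n backends are chosen uniformly."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_distribution(probs):
    """Probability that a packet is caught by each rule when tried in order."""
    remaining = Fraction(1)   # share of traffic that reached this rule
    out = []
    for p in probs:
        out.append(remaining * p)
        remaining *= 1 - p
    return out


backends = ["172.17.0.2:80", "172.17.0.3:80"]
probs = rule_probabilities(len(backends))
dist = dict(zip(backends, effective_distribution(probs)))
for backend, share in dist.items():
    print(backend, share)
```

For the two backends on the slide, the first rule would match half the time and the second would catch everything that falls through, giving each backend half of the traffic.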

Multi-tenant Service

• Default iptables-based kube-proxy is not tenant aware

• Endpoint Pods and Nodes with iptables rules are isolated into different networks

• Hypernetes uses a built-in HAProxy as the Service portal

• to proxy all Service instances within same namespace

• the same OnServiceUpdate and OnEndpointsUpdate process

• ExternalProvider

• an OpenStack LB will be created as the Service

• e.g. curl 58.215.33.98:8078

Kubernetes Persistent Volume

[Figure: the Cinder volume plugin attaches a volume to a Host path, which is then mounted at each Pod's mountPath]

VolumeManager: reconcile against the desired World

• Get mountedVolume from actualStateOfWorld

• Unmount volumes in mountedVolume but not in desiredStateOfWorld

• AttachVolume() if vol in desiredStateOfWorld and not attached

• MountVolume() if vol in desiredStateOfWorld and not in mountedVolume

• Verify devices that should be detached/unmounted are detached/unmounted

• Tips:

1. -v host:path

2. attach VS mount

3. Totally independent from container management
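The reconcile steps listed above can be sketched as one pass over three sets. This is a simplified model under stated assumptions: the function name and the set-based state are hypothetical, and the final detach step is one reading of the "verify devices that should be detached/unmounted" bullet, not the kubelet's actual code.

```python
def reconcile(desired, attached, mounted):
    """One reconcile pass; `attached` and `mounted` are mutated in place."""
    ops = []
    # Unmount volumes that are mounted but no longer desired.
    for vol in sorted(mounted - desired):
        mounted.discard(vol)
        ops.append(("unmount", vol))
    # Attach desired volumes that are not attached yet.
    for vol in sorted(desired - attached):
        attached.add(vol)
        ops.append(("attach", vol))
    # Mount desired volumes that are attached but not mounted yet.
    for vol in sorted(desired - mounted):
        mounted.add(vol)
        ops.append(("mount", vol))
    # Detach whatever is still attached but neither desired nor mounted.
    for vol in sorted(attached - desired - mounted):
        attached.discard(vol)
        ops.append(("detach", vol))
    return ops


desired = {"vol-new"}
attached = {"vol-old"}
mounted = {"vol-old"}
ops = reconcile(desired, attached, mounted)
print(ops)
```

Attach and mount are separate steps here, which matches the second tip: attaching makes the device visible to the host, mounting makes it usable at a path.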

Persistent Volume with HyperContainer

• Enhanced Cinder volume plugin

• Linux container:

1. full OpenStack cluster

2. query Nova to find node

3. attach Cinder volume to host path

4. bind mount host path to Pod containers

• HyperContainer:

• directly attach block devices to Pod

• thanks to the hypervisor based Pod boundary

• eliminates extra time to query Nova

[Figure: Host, vol, and the enhanced Cinder volume plugin; the vol is attached to a Pod's mountPath; VolumeManager reconciles against the desired World]

PV Example

• Create a Cinder volume

• Claim the volume by referencing its volumeID
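The manifest from the slide is not part of this transcript. As a stand-in, the sketch below builds a PersistentVolume object that references a Cinder volume by its volumeID, using the upstream Kubernetes `cinder` volume source fields. The helper name, the default size, and the placeholder ID are assumptions, and Hypernetes may expect a different form.

```python
import json


def cinder_pv(name, volume_id, size="1Gi", fs_type="ext4"):
    """Build a PersistentVolume that points at an existing Cinder volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            # The ID of the volume created in Cinder in the first step.
            "cinder": {"volumeID": volume_id, "fsType": fs_type},
        },
    }


manifest = cinder_pv("pv-demo", "<your-cinder-volume-id>")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to `kubectl create -f`, since kubectl accepts JSON as well as YAML.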

Container Runtime Interface

Future of CRI

• Keep Docker as the one and only default container runtime

• ocid, rktlet, hyperd

• Frakti: the Remote Container Runtime Kit

• https://github.com/kubernetes/frakti

• you are welcome to try it out, star, and fork

“if image becomes non-standard”

• e.g. Docker image becomes somehow Docker specific

• Don’t worry, kubelet.imageManager is becoming runtime-specific

• but then k8s will probably choose

• NO DEFAULT runtime

Full Topology

[Figure: a Master holding the API objects (Object: Network, Object: Pod, Object: …); Nodes each running kubelet, kube-proxy, kubestack, the Neutron L2 Agent, the Cinder Plugin, and Pods; backed by KeyStone, Neutron, Cinder, and Ceph]

Summary

• A new way to build secure and multi-tenant Kubernetes

• Kubernetes + HyperContainer + Neutron Plugin + Cinder Plugin + Keystone

• Roadmap

• Graduate HyperContainer runtime on k8s upstream

• Neutron CNI plugin

• Project URL: https://github.com/hyperhq/hypernetes

• Tip: https://hyper.sh is totally built on Hypernetes, try it out :)

END

Lei (Harry) Zhang

@resouer