Docker & Linux Containers Araf Karsh Hamid

Upload: araf-karsh-hamid

Post on 15-Jul-2015


TRANSCRIPT

Page 1: Docker, LinuX Container

Docker & Linux Containers

Araf Karsh Hamid

Page 2: Docker, LinuX Container

Topics

1. Docker

1. Docker Container

2. Docker Key Concepts

3. Docker Internals

4. Docker Architecture Linux Vs. OS X

5. Docker Architecture Windows

6. Docker Architecture Linux (Docker Daemon and Client)

7. Anatomy of Dockerfile

8. Building a Docker Image

9. Creating and Running a Docker Container

10. Invoking Docker Container using Java ProcessBuilder

2. Linux Containers


Page 3: Docker, LinuX Container

Docker containers are Linux Containers

CGROUPS

• Kernel Feature
• Groups of Processes
• Control Resource Allocation
  • CPU, CPU Sets
  • Memory
  • Disk
  • Block I/O

NAMESPACES

• The real magic behind containers
• It creates barriers between processes
• Different Namespaces
  • PID Namespace
  • Net Namespace
  • IPC Namespace
  • MNT Namespace
• Linux Kernel Namespace introduced between kernel 2.6.15 – 2.6.26

IMAGES

• Not a File System
• Not a VHD
• Basically a tar file
• Has a Hierarchy
  • Arbitrary Depth
• Fits into Docker Registry

DOCKER CONTAINER

docker run lxc-start

Page 4: Docker, LinuX Container

Docker Key Concepts

• Docker images
  • A Docker image is a read-only template.
  • For example, an image could contain an Ubuntu operating system with Apache and your web application installed.
  • Images are used to create Docker containers.
  • Docker provides a simple way to build new images or update existing images, or you can download Docker images that other people have already created.
  • Docker images are the build component of Docker.

• Docker containers
  • Docker containers are similar to a directory.
  • A Docker container holds everything that is needed for an application to run.
  • Each container is created from a Docker image.
  • Docker containers can be run, started, stopped, moved, and deleted.
  • Each container is an isolated and secure application platform.
  • Docker containers are the run component of Docker.

• Docker Registries
  • Docker registries hold images.
  • These are public or private stores from which you upload or download images.
  • The public Docker registry is called Docker Hub.
  • It provides a huge collection of existing images for your use.
  • These can be images you create yourself or you can use images that others have previously created.
  • Docker registries are the distribution component of Docker.

Page 5: Docker, LinuX Container

Docker Architecture Linux Vs. OS X

• In an OS X installation, the docker daemon is running inside a Linux virtual machine provided by Boot2Docker.

• In OS X, the Docker host address is the address of the Linux VM. When you start the boot2docker process, the VM is assigned an IP address. Under boot2docker, ports on a container map to ports on the VM.

Page 6: Docker, LinuX Container

Docker – Somewhere in the Future ……

Docker Running natively in Windows!


Page 7: Docker, LinuX Container

Docker Architecture – Linux

• Docker Daemon
  • The Docker daemon does the heavy lifting of building, running, and distributing your Docker containers.
  • Both the Docker client and the daemon can run on the same system, or you can connect a Docker client to a remote Docker daemon.
  • The Docker client and daemon communicate via sockets or through a RESTful API.

• Docker Client (docker) Commands
  • search (Search images in the Docker Repository)
  • pull (Pull the image)
  • run (Run the container)
  • create (Create the container)
  • build (Build an image using a Dockerfile)
  • images (Show images)
  • push (Push an image to the Docker Repository)
  • import / export
  • start (Start a stopped container)
  • stop (Stop a container)
  • restart (Restart a container)
  • save (Save an image to a tar archive)
  • exec (Run a command in a running container)
  • top (Look at the running processes in a container)
  • ps (List the containers)
  • attach (Attach to a running container)
  • diff (Inspect changes to a container's file system)

Examples

$ docker search applifire
$ docker pull applifire/jdk:7
$ docker images
$ docker run -it applifire/jdk:7 /bin/bash
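A rough sketch of the client / daemon split described above: the Python snippet below talks to a local daemon the way a client would, with plain HTTP over a Unix socket. The socket path /var/run/docker.sock and the GET /version endpoint are the usual defaults, but treat both as assumptions about your installation. The names UnixHTTPConnection and daemon_version are made up for this example.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that uses a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # "localhost" only fills the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def daemon_version(socket_path="/var/run/docker.sock"):
    """Ask the daemon for its version document and return it as a dict.

    The default path is an assumption; it needs a running daemon and
    permission to open the socket.
    """
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/version")
        response = conn.getresponse()
        return json.loads(response.read().decode("utf-8"))
    finally:
        conn.close()
```

On a host with a running daemon, daemon_version() should return a dictionary of version details. Nothing is contacted until the function is called.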

Page 8: Docker, LinuX Container

Docker client examples

Searching in the docker registry for images.

Images in your local registry after the build or directly pulled from docker registry.

To pull an image

docker pull applifire/tomcat

Page 9: Docker, LinuX Container

Analyzing “docker run -it ubuntu /bin/bash” command

In order, Docker does the following:

1. Pulls the ubuntu image:
  • Docker checks for the presence of the ubuntu image and, if it doesn't exist locally on the host, then Docker downloads it from Docker Hub.
  • If the image already exists, then Docker uses it for the new container.
  • Creates a new container: Once Docker has the image, it uses it to create a container.

2. Allocates a filesystem and mounts a read-write layer:
  • The container is created in the file system and a read-write layer is added to the image.

3. Allocates a network / bridge interface:
  • Creates a network interface that allows the Docker container to talk to the local host.

4. Sets up an IP address:
  • Finds and attaches an available IP address from a pool.

5. Executes a process that you specify:
  • Runs your application, and;

6. Captures and provides application output:
  • Connects and logs standard input, outputs and errors for you to see how your application is running.

Page 10: Docker, LinuX Container

Anatomy of a Dockerfile

Command, Description, Example

FROM
Description: The FROM instruction sets the Base Image for subsequent instructions. As such, a valid Dockerfile must have FROM as its first instruction. The image can be any valid image; it is especially easy to start by pulling an image from the public repositories.
Example:
FROM ubuntu
FROM applifire/jdk:7

MAINTAINER
Description: The MAINTAINER instruction allows you to set the Author field of the generated images.
Example:
MAINTAINER arafkarsh

LABEL
Description: The LABEL instruction adds metadata to an image. A LABEL is a key-value pair. To include spaces within a LABEL value, use quotes and backslashes as you would in command-line parsing.
Example:
LABEL version="1.0"
LABEL vendor="Algo"

RUN
Description: The RUN instruction will execute any commands in a new layer on top of the current image and commit the results. The resulting committed image will be used for the next step in the Dockerfile.
Example:
RUN apt-get install -y curl

ADD
Description: The ADD instruction copies new files, directories or remote file URLs from <src> and adds them to the filesystem of the container at the path <dest>.
Example:
ADD hom* /mydir/
ADD hom?.txt /mydir/

COPY
Description: The COPY instruction copies new files or directories from <src> and adds them to the filesystem of the container at the path <dest>.
Example:
COPY hom* /mydir/
COPY hom?.txt /mydir/

ENV
Description: The ENV instruction sets the environment variable <key> to the value <value>. This value will be in the environment of all "descendent" Dockerfile commands and can be replaced inline in many as well.
Example:
ENV JAVA_HOME /JDK8
ENV JRE_HOME /JRE8

EXPOSE
Description: The EXPOSE instruction informs Docker that the container will listen on the specified network ports at runtime. Docker uses this information to interconnect containers using links and to determine which ports to expose to the host when using the -P flag with the docker client.
Example:
EXPOSE 8080

Page 11: Docker, LinuX Container

Anatomy of a Dockerfile

Command, Description, Example

VOLUME
Description: The VOLUME instruction creates a mount point with the specified name and marks it as holding externally mounted volumes from native host or other containers. The value can be a JSON array, VOLUME ["/var/log/"], or a plain string with multiple arguments, such as VOLUME /var/log or VOLUME /var/log /var/db
Example:
VOLUME /data/webapps

USER
Description: The USER instruction sets the user name or UID to use when running the image and for any RUN, CMD and ENTRYPOINT instructions that follow it in the Dockerfile.
Example:
USER applifire

WORKDIR
Description: The WORKDIR instruction sets the working directory for any RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow it in the Dockerfile.
Example:
WORKDIR /home/user

CMD
Description: There can only be one CMD instruction in a Dockerfile. If you list more than one CMD then only the last CMD will take effect. The main purpose of a CMD is to provide defaults for an executing container. These defaults can include an executable, or they can omit the executable, in which case you must specify an ENTRYPOINT instruction as well.
Example:
CMD echo "This is a test." | wc -

ENTRYPOINT
Description: An ENTRYPOINT allows you to configure a container that will run as an executable. Command line arguments to docker run <image> will be appended after all elements in an exec form ENTRYPOINT, and will override all elements specified using CMD. This allows arguments to be passed to the entry point, i.e., docker run <image> -d will pass the -d argument to the entry point. You can override the ENTRYPOINT instruction using the docker run --entrypoint flag.
Example:
ENTRYPOINT ["top", "-b"]
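The interplay between ENTRYPOINT, CMD and the arguments of docker run <image> is easy to get wrong, so here is a minimal Python sketch of the rule stated above. It models the exec form only, and the function name effective_command is an illustrative assumption, not part of Docker.

```python
def effective_command(entrypoint, cmd, run_args):
    """Return the argv a container would start with (exec-form instructions only).

    entrypoint: list from ENTRYPOINT, e.g. ["top", "-b"] (may be empty)
    cmd:        list from CMD, used as default arguments
    run_args:   arguments given after the image name on `docker run`
    """
    # Arguments passed to `docker run <image> ...` replace CMD entirely;
    # they are then appended after the ENTRYPOINT elements.
    trailing = list(run_args) if run_args else list(cmd)
    return list(entrypoint) + trailing
```

For example, with ENTRYPOINT ["top", "-b"] and CMD ["-c"], running the image without arguments starts top -b -c, while docker run <image> -H starts top -b -H.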

Page 12: Docker, LinuX Container

Building a Docker image : Base Ubuntu

• Dockerfile (Text File)

• Create the Dockerfile

• Build image using Dockerfile

• The following command will build the docker image based on the Dockerfile.

• Docker will download any required base image automatically from the Docker registry.

• docker build -t applifire/ubuntu .

• This will build a base ubuntu with enough Linux utilities for the development environment.

Page 13: Docker, LinuX Container

Building a Docker Image : Java 8 (JRE) + Tomcat 8

• Dockerfile (Text File)

1. Create the Java (JRE8) Dockerfile with Ubuntu as the base image.

2. Create the Tomcat Dockerfile with JRE8 as the base image.

• Build image using Dockerfile

1. Build Java 8 (JRE) Docker Image

docker build -t applifire/jre:8 .

2. Build Tomcat 8 Docker Image

docker build -t applifire/tomcat:jre8 .

Page 14: Docker, LinuX Container

Building a Docker Image : Java 7 (JDK) + Gradle 2.3

• Dockerfile (Text File)

1. Create the Java (JDK7) Dockerfile with Ubuntu as the base image.

2. Create the Gradle Dockerfile with Java (JDK7) as the base image.

• Build image using Dockerfile

1. Build Java 7 (JDK) Docker Image

docker build -t applifire/jdk:7 .

2. Build Gradle 2.3 Docker Image

docker build -t applifire/gradle:jdk7 .

Page 15: Docker, LinuX Container

Creating & Running Docker Containers

docker run: Option, Description, Example

-d
Description: Detached mode
Example: To run servers like Tomcat, Apache Web Server

-p
Description: Publish Container’s Port
Example:
IP:HostPort:ContainerPort 192.a.b.c:80:8080
IP::ContainerPort 192.a.b.c::8080
HostPort:ContainerPort 8081:8080

-it
Description: Run Interactive Mode
Example: When you want to log into the container. This mode works fine from a Unix Shell. However, ensure that you don’t use this mode when running it through the ProcessBuilder in Java.

-v
Description: Mount Host File System
Example: -v host-file-system:container-file-system

--name
Description: Name to the container

-w
Description: Working Directory
Example: Working Directory for the Container

-u
Description: User Name
Example: User Name with which you can log into the container.
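To make the three -p forms concrete, the following Python sketch splits a publish spec into its parts. It only covers the forms listed on the slide (IPv4 addresses, single ports). PortMapping and parse_publish are names invented for this example, and the IP address used in the usage note is a placeholder.

```python
from typing import NamedTuple, Optional


class PortMapping(NamedTuple):
    host_ip: Optional[str]      # None means "all host interfaces"
    host_port: Optional[int]    # None means "let Docker pick a host port"
    container_port: int


def parse_publish(spec: str) -> PortMapping:
    """Parse IP:HostPort:ContainerPort, IP::ContainerPort or HostPort:ContainerPort."""
    parts = spec.split(":")
    if len(parts) == 3:
        host_ip, host_port, container_port = parts
    elif len(parts) == 2:
        host_ip = ""
        host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    return PortMapping(
        host_ip or None,
        int(host_port) if host_port else None,
        int(container_port),
    )
```

With a placeholder address, parse_publish("192.168.1.10::8080") yields a mapping with no fixed host port, while parse_publish("8081:8080") binds host port 8081 on all interfaces.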

Page 16: Docker, LinuX Container

Creating & Running Docker Containers - Advanced

docker run: Option, Description, Example

--cpuset=""
Description: CPUs in which to allow execution
Example: 0-3, 0,1

-m
Description: Memory Limit for the Container
Example: Number & Unit (b, k, m, g). 1g = 1GB, 1m = 1MB, 1k = 1KB

--memory-swap
Description: Total Memory Usage (Memory + Swap Space)
Example: Number & Unit (b, k, m, g). 1g = 1GB

-e
Description: Set Environment Variables

--link=[]
Description: Link another container
Example: When your Tomcat Container wants to talk to MySQL DB container.

--ipc
Description: Inter Process Communication

--dns
Description: Set Custom DNS Servers

--dns-search
Description: Set Custom DNS Search Domains

-h
Description: Container Host Name
Example: --hostname="voldermort"

--expose=[]
Description: Expose a container port or a range of ports
Example: --expose=8080-8090

--add-host
Description: Add a custom host-IP mapping
Example: Host:IP
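The "Number & Unit" notation used by -m and --memory-swap can be sketched as a small Python parser. The assumption here, labeled as such in the code, is that the units are binary multiples (1k = 1024 bytes); parse_memory is a name made up for this example.

```python
# Assumption: units are binary multiples, so 1k = 1024 bytes.
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_memory(value: str) -> int:
    """Convert a size such as '512m' or '1g' into a number of bytes."""
    text = value.strip().lower()
    if not text:
        raise ValueError("empty memory value")
    if text[-1] in UNITS:
        number, unit = text[:-1], text[-1]
    else:
        number, unit = text, "b"   # a bare number is taken as bytes
    return int(number) * UNITS[unit]
```

Under that assumption, parse_memory("1g") gives 1073741824 bytes and parse_memory("512m") gives 536870912 bytes.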

Page 17: Docker, LinuX Container

Docker Container Process Management

docker ps

-a Show all the containers. Only running containers are shown by default

-q Only display the Numeric IDs

-s Display the total file sizes

-f Provide Filters to show containers.
   -f status=exited
   -f exited=100

-l Show only the latest Container.

docker start
Starts a stopped Container. For example Tomcat Server
Ex. docker start containerName

docker stop
Stops a container. Start and Stop is mainly used for detached containers like Tomcat, MySQL, and Apache Web Server Containers.
Ex. docker stop containerName

docker restart
Restart a Container
Ex. docker restart containerName

Page 18: Docker, LinuX Container

Docker Container Management – Short Cuts

Remove all Exited Containers

docker rm containerId / name
Removes the Exited Container

docker rm $(docker ps -aq)
docker ps -aq returns the IDs of all containers, running or exited, and passes them to docker rm. docker rm then removes the exited containers and reports an error for any container that is still running.

docker stop
To remove a running container, you need to stop the container first. Ex. Tomcat Server Running.
docker stop containerName

Remove Docker Image

docker rmi imageId
Removes the Docker Image

Remove all Docker images with <none> tag

docker rmi $(docker images | grep "^<none>" | awk '{print $3}')

Note: This command can be made even better, for example with docker rmi $(docker images -q -f dangling=true)
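The grep / awk pipeline above simply picks the IMAGE ID column from rows whose repository is <none>. A Python sketch of the same filtering, run against made-up sample output (the image IDs, sizes and dates below are hypothetical, not captured from a real host):

```python
# Hypothetical `docker images` output, for illustration only.
SAMPLE_OUTPUT = """\
REPOSITORY          TAG       IMAGE ID       CREATED        VIRTUAL SIZE
applifire/jdk       7         3f2a1b4c5d6e   2 days ago     589.2 MB
<none>              <none>    9a8b7c6d5e4f   3 days ago     188.3 MB
<none>              <none>    0123456789ab   5 days ago     188.3 MB
"""


def untagged_image_ids(images_output: str) -> list:
    """Return the IMAGE ID of every row whose repository is <none>."""
    ids = []
    for line in images_output.splitlines()[1:]:    # skip the header row
        columns = line.split()
        if len(columns) >= 3 and columns[0] == "<none>":
            ids.append(columns[2])                 # third column is IMAGE ID
    return ids
```

The resulting list is what would be handed to docker rmi. On a real host the text would come from the docker images command rather than a constant.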

Page 19: Docker, LinuX Container

Invoking Docker Container using Java ProcessBuilder API

When you execute a docker command using the Java ProcessBuilder API, never use run with -it (interactive and terminal) unless you want an interactive session. This will block the container from exiting.

Ex. docker run applifire/maven:jdk7 pom.xml

If you are using a shell script to invoke the docker container, then refer to the following to handle Linux and OS X environments.

[Screenshot callouts: Boot2Docker settings for OS X; $? to get the exit code of the previous command]
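The slide is about Java's ProcessBuilder; the sketch below shows the same idea with Python's subprocess module as a stand-in, so it is an analogy rather than the Java API. The point it illustrates is unchanged: build a docker run command without -i or -t, wait for it to finish, and read the exit code. The helper names are invented for this example, and the demonstration runs a harmless Python one-liner instead of docker.

```python
import subprocess
import sys


def docker_run_command(image, args):
    """Build a non-interactive `docker run` argv: no -i and no -t."""
    return ["docker", "run", image, *args]


def run_and_capture(argv, timeout=300):
    """Run a command to completion and return (exit_code, stdout, stderr)."""
    completed = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return completed.returncode, completed.stdout, completed.stderr


# Demonstration with a harmless command, so no docker installation is needed.
exit_code, out, _ = run_and_capture([sys.executable, "-c", "print('ok')"])
```

On a host with docker installed, run_and_capture(docker_run_command("applifire/maven:jdk7", ["pom.xml"])) would run the example from the slide and return its exit code, the same value a shell script reads from $?.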

Page 20: Docker, LinuX Container

LinuX Container

© Realizing Linux Containers (LXC) Building Blocks, Underpinnings & Motivations : By Boden Russell – IBM Technology Services

Page 21: Docker, LinuX Container

What’s Linux container

Linux Containers (LXC for LinuX Containers) are

• Lightweight virtual machines (VMs)
• Which are realized using features provided by a modern Linux kernel – VMs without the hypervisor

Containerization of:

• (Linux) Operating Systems
• Single or multiple applications (Tomcat, MySQL DB, etc.)

Page 22: Docker, LinuX Container

Why LXC?

Provision in seconds / milliseconds

Near bare metal runtime performance

VM-like agility – it’s still “virtualization”

Flexibility

• Containerize a “system”

• Containerize “application(s)”

Lightweight

• Just enough Operating System (JeOS)

• Minimal per container penalty

Open source – free – lower TCO

Supported with OOTB modern Linux kernel

Growing in popularity

Provision Time

• Manual: Days
• VM: Minutes
• LXC: Seconds / ms

[Chart: linpack performance @ 45000; GFlops vs. vcpus (1 to 31) and BM]

[Charts: Google trends - LXC; Google trends - docker]

“Linux Containers as poised as the next VM in our modern Cloud era…”

Page 23: Docker, LinuX Container

Hypervisors vs. Linux containers

[Diagram: three stacks, each listed from bottom to top]

• Type 1 Hypervisor: Hardware, Hypervisor, Virtual Machines (each with Operating System, Bins / libs, Apps)
• Type 2 Hypervisor: Hardware, Operating System, Hypervisor, Virtual Machines (each with Operating System, Bins / libs, Apps)
• Linux Containers: Hardware, Operating System, Containers (each with Bins / libs, Apps)

Containers share the OS kernel of the host and thus are lightweight.

However, each container must have the same OS kernel.

Containers are isolated, but share OS and, where appropriate, libs / bins.

Page 24: Docker, LinuX Container

LXC Technology Stack

LXCs are built on modern kernel features

• cgroups; limits, prioritization, accounting & control

• namespaces; process based resource isolation

• chroot; apparent root FS directory

• Linux Security Modules (LSM); Mandatory Access Control (MAC)

User space interfaces for kernel functions

LXC tools

• Tools to isolate process(es) virtualizing kernel resources

LXC commoditization

• Dead easy LXC

• LXC virtualization

Orchestration & management

• Scheduling across multiple hosts

• Monitoring

• Uptime


Page 25: Docker, LinuX Container

Linux cgroups

History

• Work started in 2006 by Google engineers

• Merged into upstream 2.6.24 kernel due to wider spread LXC usage

• A number of features still a WIP

Functionality

• Access; which devices can be used per cgroup

• Resource limiting; memory, CPU, device accessibility, block I/O, etc.

• Prioritization; who gets more of the CPU, memory, etc.

• Accounting; resource usage per cgroup

• Control; freezing & check pointing

• Injection; packet tagging

Usage

• cgroup functionality exposed as “resource controllers” (aka “subsystems”)

• Subsystems mounted on FS

• Top-level subsystem mount is the root cgroup; all procs on host

• Directories under top-level mounts created per cgroup

• Procs put in tasks file for group assignment

• Interface via read / write pseudo files in group
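The usage steps above (a directory per cgroup, PIDs written to the tasks file, settings read and written as pseudo files) can be sketched in Python. To stay runnable without root, the demonstration uses a temporary directory as a stand-in for a subsystem mount; on a real host the mount would be something like /sys/fs/cgroup/cpu, and the kernel itself would create the pseudo files when the directory is made. The helper names, the group name my-lxc reused from the slides, and PID 1234 are illustrative.

```python
import tempfile
from pathlib import Path


def create_group(subsystem_mount, name):
    """Create a cgroup by making a directory under the subsystem mount."""
    group = Path(subsystem_mount) / name
    group.mkdir(parents=True, exist_ok=True)
    return group


def add_process(group, pid):
    """Assign a process to the group by writing its PID to the tasks file."""
    with open(Path(group) / "tasks", "a") as tasks:
        tasks.write(f"{pid}\n")


def write_setting(group, filename, value):
    """Write a tunable, e.g. cpu.shares, through its pseudo file."""
    (Path(group) / filename).write_text(f"{value}\n")


def read_setting(group, filename):
    """Read a tunable or statistic back from its pseudo file."""
    return (Path(group) / filename).read_text().strip()


# Stand-in for a real mount point; plain files replace the kernel's pseudo files.
with tempfile.TemporaryDirectory() as fake_mount:
    demo_group = create_group(fake_mount, "my-lxc")
    add_process(demo_group, 1234)
    write_setting(demo_group, "cpu.shares", 512)
    demo_tasks = read_setting(demo_group, "tasks")
    demo_shares = read_setting(demo_group, "cpu.shares")
```

Against a real cgroup mount the same calls need root privileges, and the kernel validates every write.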


Page 26: Docker, LinuX Container

Linux cgroup subsystems

cgroups provided via kernel modules

• Not always loaded / provided by default

• Locate and load with modprobe

Some features tied to kernel version

See: https://www.kernel.org/doc/Documentation/cgroups/

Subsystem Tunable Parameters

blkio - Weighted proportional block I/O access. Group wide or per device.

- Per device hard limits on block I/O read/write specified as bytes per second or IOPS per second.

cpu - Time period (microseconds per second) a group should have CPU access.

- Group wide upper limit on CPU time per second.

- Weighted proportional value of relative CPU time for a group.

cpuset - CPUs (cores) the group can access.

- Memory nodes the group can access and migrate ability.

- Memory hardwall, pressure, spread, etc.

devices - Define which devices and access type a group can use.

freezer - Suspend/resume group tasks.

memory - Max memory limits for the group (in bytes).

- Memory swappiness, OOM control, hierarchy, etc.

hugetlb - Limit HugeTLB size usage.

- Per cgroup HugeTLB metrics.

net_cls - Tag network packets with a class ID.

- Use tc to prioritize tagged packets.

net_prio - Weighted proportional priority on egress traffic (per interface).

Page 27: Docker, LinuX Container

Linux cgroups FS layout


Page 28: Docker, LinuX Container

Linux cgroups Pseudo FS Interface

/sys/fs/cgroup/my-lxc
|-- blkio
|   |-- blkio.io_merged
|   |-- blkio.io_queued
|   |-- blkio.io_service_bytes
|   |-- blkio.io_serviced
|   |-- blkio.io_service_time
|   |-- blkio.io_wait_time
|   |-- blkio.reset_stats
|   |-- blkio.sectors
|   |-- blkio.throttle.io_service_bytes
|   |-- blkio.throttle.io_serviced
|   |-- blkio.throttle.read_bps_device
|   |-- blkio.throttle.read_iops_device
|   |-- blkio.throttle.write_bps_device
|   |-- blkio.throttle.write_iops_device
|   |-- blkio.time
|   |-- blkio.weight
|   |-- blkio.weight_device
|   |-- cgroup.clone_children
|   |-- cgroup.event_control
|   |-- cgroup.procs
|   |-- notify_on_release
|   |-- release_agent
|   `-- tasks
|-- cpu
|   |-- ...
|-- ...
`-- perf_event

echo "8:16 1048576" > blkio.throttle.read_bps_device

cat blkio.weight_device
dev weight
8:1 200
8:16 500

Linux pseudo FS is the interface to cgroups

• Read / write to pseudo file(s) in your cgroup directory

Some libs exist to interface with pseudo FS programmatically
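The two shell examples on this slide use a simple text format: a device is named as major:minor, followed by a value. A Python sketch of producing a throttle rule and reading back a weight listing; the function names are invented here, and the header row "dev weight" is taken from the slide rather than verified against a particular kernel.

```python
def throttle_rule(major, minor, limit):
    """Format a line for files such as blkio.throttle.read_bps_device."""
    return f"{major}:{minor} {limit}"


def parse_weight_device(text):
    """Parse blkio.weight_device style output into {(major, minor): weight}."""
    weights = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or ":" not in fields[0]:
            continue                      # skip headers such as "dev weight"
        major, minor = fields[0].split(":")
        weights[(int(major), int(minor))] = int(fields[1])
    return weights
```

throttle_rule(8, 16, 1048576) produces the string written by the echo command above, limiting reads on device 8:16 to 1 MiB per second.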

Page 29: Docker, LinuX Container

Linux cgroups: CPU Usage

Use CPU shares (and other controls) to prioritize jobs / containers

Carry out complex scheduling schemes

Segment host resources

Adhere to SLAs


Page 30: Docker, LinuX Container

Linux cgroups: CPU Pinning

Pin containers / jobs to CPU cores

Carry out complex scheduling schemes

Reduce core switching costs

Adhere to SLAs


Page 31: Docker, LinuX Container

Linux cgroups: Device Access

Limit device visibility; isolation

Implement device access controls

• Secure sharing

Segment device access

Device whitelist / blacklist


Page 32: Docker, LinuX Container

LXC Realization: Linux cgroups

cgroup created per container (in each cgroup subsystem)

Prioritization, access, limits per container a la cgroup controls

Per container metrics (bean counters)


Page 33: Docker, LinuX Container

Linux namespaces

History

• Initial kernel patches in 2.4.19

• Recent 3.8 patches for user namespace support

• A number of features still a WIP

Functionality

• Provide process level isolation of global resources

• MNT (mount points, file systems, etc.)

• PID (process)

• NET (NICs, routing, etc.)

• IPC (System V IPC resources)

• UTS (host & domain name)

• USER (UID + GID)

• Process(es) in namespace have illusion they are the only processes on the system

• Generally constructs exist to permit “connectivity” with parent namespace

Usage

• Construct namespace(s) of desired type

• Create process(es) in namespace (typically done when creating namespace)

• If necessary, initialize “connectivity” to parent namespace

• Process(es) in namespace internally function as if they are only proc(s) on system
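One way to see which namespaces a process lives in is the /proc/<pid>/ns directory, where each entry is a symbolic link of the form type:[inode]. Two processes share a namespace when the inode numbers match. A minimal Python sketch, assuming a Linux host with /proc mounted; the helper names are made up for this example.

```python
import os
import re

# Link targets look like "pid:[4026531836]".
_NS_LINK = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self"):
    """Return {entry name: inode} for every namespace of the given process.

    Linux only; reading another user's process may need extra privileges.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result
```

Comparing namespaces_of() with namespaces_of(1) from inside a container typically shows different inode numbers than the same comparison on the host, though which entries exist depends on the kernel version.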


Page 34: Docker, LinuX Container

Linux namespaces: Conceptual Overview

Page 35: Docker, LinuX Container

Linux namespaces: MNT namespace

Isolates the mount table – per namespace mounts

mount / unmount operations isolated to namespace

Mount propagation

• Shared; mount objects propagate events to one another

• Slave; one mount propagates events to another, but not vice versa

• Private; no event propagation (default)

Unbindable mount forbids bind mounting itself

Various tools / APIs support the mount namespace such as the mount command

• Options to make shared, private, slave, etc.

• Mount with namespace support

Typically used with chroot or pivot_root for effective root FS isolation

[Diagram: mount table per namespace]

• “global” (i.e. root) namespace, MNT NS: /, /proc, /mnt/fsrd, /mnt/fsrw, /mnt/cdrom, /run2
• “green” namespace, MNT NS: /, /proc, /mnt/greenfs, /mnt/fsrw, /mnt/cdrom
• “red” namespace, MNT NS: /, /proc, /mnt/cdrom, /redns

Page 36: Docker, LinuX Container

Linux namespaces: UTS namespace

Per namespace

• Hostname

• NIS domain name

Reported by commands such as hostname

Processes in namespace can change UTS values – only reflected in the child namespace

Allows containers to have their own FQDN

[Diagram: UTS values per namespace]

• “global” (i.e. root) namespace, UTS NS: globalhost, rootns.com
• “green” namespace, UTS NS: greenhost, greenns.org
• “red” namespace, UTS NS: redhost, redns.com

Page 37: Docker, LinuX Container

Linux namespaces: PID namespace

Per namespace PID mapping

• PID 1 in namespace not the same as PID 1 in parent namespace

• No PID conflicts between namespaces

• Effectively 2 PIDs; the PID in the namespace and the PID outside the namespace

Permits migrating namespace processes between hosts while keeping same PID

Only processes in the namespace are visible within the namespace (visibility limited)

[Diagram: process table (PID, COMMAND) per namespace]

• “global” (i.e. root) namespace, PID NS: 1 /sbin/init, 2 [kthreadd], 3 [ksoftirqd], 4 [cpuset], 5 /sbin/udevd
• “green” namespace, PID NS: 1 /bin/bash, 2 /bin/vim
• “red” namespace, PID NS: 1 /bin/bash, 2 python, 3 node

Page 38: Docker, LinuX Container

Linux namespaces: IPC namespace

System V IPC object & POSIX message queue isolation between namespaces

• Semaphores

• Shared memory

• Message queues

Parent namespace connectivity

• Signals

• Memory polling

• Sockets (if no NET namespace)

• Files / file descriptors (if no mount namespace)

• Events over pipe pair

[Diagram: IPC objects per namespace]

• “global” (i.e. root) namespace, IPC NS: SHMID OWNER: 32452 root, 43321 boden; SEMID OWNER: 0 root, 1 boden
• “green” namespace, IPC NS: SHMID OWNER: (none); SEMID OWNER: 0 root
• “red” namespace, IPC NS: SHMID OWNER: (none); SEMID OWNER: (none); MSQID OWNER: (none)

Page 39: Docker, LinuX Container

Linux namespaces: NET namespace

Per namespace network objects

• Network devices (eths)

• Bridges

• Routing tables

• IP address(es)

• ports

• Etc

Various commands support network namespace such as ip

Connectivity to other namespaces

• veths – create veth pair, move one inside the namespace and configure

• Acts as a pipe between the 2 namespaces

LXCs can have their own IPs, routes, bridges, etc.

[Diagram: network objects per namespace]

• “global” (i.e. root) namespace, NET NS: lo: UNKNOWN…, eth0: UP…, eth1: UP…, br0: UP…; app1 IP:5000, app2 IP:6000, app3 IP:7000
• “green” namespace, NET NS: lo: UNKNOWN…, eth0: UP…; app1 IP:1000, app2 IP:7000
• “red” namespace, NET NS: lo: UNKNOWN…, eth0: DOWN…, eth1: UP; app1 IP:7000, app2 IP:9000

Page 40: Docker, LinuX Container

Linux namespaces: USER namespace

A long work in progress – still in development for XFS and other FS support

• Significant security impacts

• A handful of security holes already found + fixed

Two major features provided:

• Map UID / GID from outside the container to UID / GID inside the container

• Permit non-root users to launch LXCs

• Distros rolling out phased support, with UID / GID mapping typically 1st

First process in USER namespace has full CAPs; perform initializations before other processes are created

• No CAPs in parent namespace

UID / GID map can be pre-configured via FS

Eventually USER namespace will mitigate many perceived LXC security concerns
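The UID / GID map mentioned above is exposed per process as /proc/<pid>/uid_map and /proc/<pid>/gid_map. Each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. A Python sketch of the translation; the mapping "0 100000 65536" used in the usage note is a hypothetical example, and the function names are invented here.

```python
def parse_id_map(text):
    """Parse uid_map / gid_map content into (inside_start, outside_start, length) tuples."""
    extents = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, length = (int(field) for field in fields)
            extents.append((inside, outside, length))
    return extents


def to_outside_id(id_map, inside_id):
    """Translate an ID seen inside the namespace to the ID outside it.

    Returns None when the ID is not covered by any extent.
    """
    for inside_start, outside_start, length in id_map:
        if inside_start <= inside_id < inside_start + length:
            return outside_start + (inside_id - inside_start)
    return None
```

With the hypothetical mapping "0 100000 65536", root inside the container (UID 0) corresponds to unprivileged UID 100000 outside it, which is the property that lets a container run as "root" without host root privileges.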

[Diagram: users (UID:GID) per namespace]

• “global” (i.e. root) namespace, USER NS: root 0:0, ntp 104:109, Mysql 105:110, boden 106:111
• “green” namespace, USER NS: root 0:0, app 106:111
• “red” namespace, USER NS: root 0:0, app 104:109

Page 41: Docker, LinuX Container

LXC Realization: Linux namespaces

A set of namespaces created for the container

Container process(es) “executed” in the namespace set

Process(es) in the container have isolated view of resources

Connectivity to parent where needed (via lxc tooling)


Page 42: Docker, LinuX Container

Linux namespaces & cgroups: Availability

Note: user namespace support in upstream kernel 3.8+, but distributions rolling out phased support:

- Map LXC UID/GID between container and host

- Non-root LXC creation

Page 43: Docker, LinuX Container

Linux chroots

Changes apparent root directory for process and children

• Search paths

• Relative directories

• Etc

A chroot can be escaped given proper capabilities, thus pivot_root is often used instead

• chroot; points the process's file system root to new directory

• pivot_root; detaches the new root and attaches it to process root directory

Often used when building system images

• Chroot to temp directory

• Download and install packages in chroot

• Compress chroot as a system root FS

LXC realization

• Bind mount container root FS (image)

• Launch (unshare or clone) LXC init process in a new MNT namespace

• pivot_root to the bind mount (root FS)


Page 44: Docker, LinuX Container

Linux chroot vs pivot_root

Using pivot_root with MNT namespace addresses escaping chroot concerns

The pivot_root target directory becomes the “new root FS”

Page 45: Docker, LinuX Container

LXC Realization: Images

LXC images provide a flexible means to deliver only what you need – lightweight and minimal footprint

Basic constraints

• Same architecture

• Same endian

• Linux’ish Operating System; you can run different Linux distros on same host

Image types

• System; images intended to virtualize Operating System(s) – standard distro root FS less the kernel

• Application; images intended to virtualize application(s) – only package apps + dependencies (aka JeOS – Just enough Operating System)

Bind mount host libs / bins into LXC to share host resources

Container image init process

• Container init command provided on invocation – can be an application or a full fledged init process

• Init script customized for image – skinny SysVinit, upstart, etc.

• Reduces overhead of lxc start-up and runtime foot print

Various tools to build images

• SuSE Kiwi

• Debootstrap

• Etc.

LXC tooling options often include numerous image templates


Page 46: Docker, LinuX Container

Linux Security Modules & MAC

Linux Security Modules (LSM) – kernel modules which provide a framework for Mandatory Access Control (MAC) security implementations

MAC vs DAC

• In MAC, admin (user or process) assigns access controls to subject / initiator

• Most MAC implementations provide the notion of profiles

• Profiles define access restrictions and are said to “confine” a subject

• In DAC, resource owner (user) assigns access controls to individual resources

Existing LSM implementations include: AppArmor, SELinux, GRSEC, etc.


Page 47: Docker, LinuX Container

Linux Capabilities & Other Security Measures

Linux capabilities

• Per process privileges which define operational (sys call) access

• Typically checked based on process EUID and EGID

• Root processes (i.e. EUID = EGID = 0) bypass capability checks

Capabilities can be assigned to LXC processes to restrict their privileges
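As a rough illustration of capabilities being per-process privilege bits, the sketch below decodes a capability bitmask such as the `CapEff` value shown in `/proc/<pid>/status`, and clears selected bits from it. The helper names `decode_caps` and `drop_caps` and the sample masks are hypothetical; only a small subset of the bit numbers listed in `capabilities(7)` is included.

```python
# Subset of Linux capability bit numbers (see capabilities(7)); not exhaustive.
CAP_BITS = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    5: "CAP_KILL",
    6: "CAP_SETGID",
    7: "CAP_SETUID",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    18: "CAP_SYS_CHROOT",
    21: "CAP_SYS_ADMIN",
}


def decode_caps(hex_mask):
    """Return the known capability names set in a hex bitmask string."""
    mask = int(hex_mask, 16)
    return [name for bit, name in sorted(CAP_BITS.items()) if mask & (1 << bit)]


def drop_caps(hex_mask, names):
    """Return the mask with the named capabilities cleared, as 16 hex digits."""
    mask = int(hex_mask, 16)
    bit_by_name = {name: bit for bit, name in CAP_BITS.items()}
    for name in names:
        mask &= ~(1 << bit_by_name[name])
    return format(mask, "016x")


if __name__ == "__main__":
    sample = "0000000000200401"  # hypothetical mask, for illustration only
    print(decode_caps(sample))
    print(drop_caps(sample, ["CAP_SYS_ADMIN"]))
```

A container runtime that restricts a process in this way is effectively clearing bits from such a mask before the container's init process starts.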

Other LXC security mitigations

• Reduce shared FS access using RO bind mounts

• Keep Linux kernel up to date

• User namespaces in 3.8+ kernel

  • Allows launching containers as a non-root user

  • Map UID / GID inside / outside of container
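The UID / GID mapping can be sketched as a simple range translation. The kernel exposes a user namespace's mapping in `/proc/<pid>/uid_map` as lines of three numbers: first ID inside the namespace, first ID outside, and range length. The sample mapping `0 100000 65536` and the helper names below are assumptions chosen for illustration, not values from the slides.

```python
def parse_id_map(text):
    """Parse 'inside_start outside_start length' lines into tuples of ints."""
    entries = []
    for line in text.strip().splitlines():
        inside, outside, length = (int(field) for field in line.split())
        entries.append((inside, outside, length))
    return entries


def to_host_id(entries, container_id):
    """Translate an ID inside the container to the ID seen on the host."""
    for inside, outside, length in entries:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None  # the ID is not mapped in this namespace


# Hypothetical mapping: container root (0) is unprivileged UID 100000 on the host.
SAMPLE_MAP = "0 100000 65536"

if __name__ == "__main__":
    entries = parse_id_map(SAMPLE_MAP)
    for uid in (0, 1000, 70000):
        print(uid, "->", to_host_id(entries, uid))
```

With a mapping like this, a process that is root inside the container holds no root privileges on the host, which is the point of launching containers as a non-root user.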

LXC Realization

LXC Tooling

LXC is not a kernel feature – it’s a technology enabled via kernel features

• User space tooling required to manage LXCs effectively

Numerous toolsets exist

• Then: add-on patches to upstream kernel due to slow kernel acceptance

• Now: upstream LXC feature support is growing – less need for patches

More popular GNU Linux toolsets include libvirt-lxc and lxc (tools)

• OpenVZ is likely the most mature toolset, but it requires kernel patches

• Note: I would consider docker a commoditization of LXC

Non-GNU Linux based LXC

• Solaris zones

• BSD jails

• Illumos / SmartOS (solaris derivatives)

• Etc.

LXC Industry Tooling

Libvirt-lxc

Perhaps the simplest to learn through a familiar virsh interface

Libvirt provides LXC support by connecting to lxc:///

Many virsh commands work

• virsh -c lxc:/// define sample.xml

• virsh -c lxc:/// start sample

• virsh -c lxc:/// console sample

• virsh -c lxc:/// shutdown sample

• virsh -c lxc:/// undefine sample

No snapshotting, templates…

OpenStack support since Grizzly

No VNC

No Cinder support in Grizzly

Config drive not supported (config drive is an alternative means of accessing metadata: an attached disk rather than HTTP calls)

<domain type='lxc'>
  <name>sample</name>
  <memory>32768</memory>
  <os>
    <type>exe</type>
    <init>/init</init>
  </os>
  <vcpu>1</vcpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/libvirt_lxc</emulator>
    <filesystem type='mount'>
      <source dir='/opt/vm-1-root'/>
      <target dir='/'/>
    </filesystem>
    <interface type='network'>
      <source network='default'/>
    </interface>
    <console type='pty' />
  </devices>
</domain>
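To make the structure of the domain definition easier to read, here is a minimal sketch that parses a trimmed copy of the XML above with Python's standard `xml.etree.ElementTree` and pulls out the fields that matter for a container: the init command and the root filesystem. The function name `summarize_domain` is hypothetical, and the memory value is labeled KiB on the assumption that libvirt's default unit applies because no `unit` attribute is given.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample domain definition shown on the slide.
SAMPLE_XML = """<domain type='lxc'>
  <name>sample</name>
  <memory>32768</memory>
  <os>
    <type>exe</type>
    <init>/init</init>
  </os>
  <devices>
    <filesystem type='mount'>
      <source dir='/opt/vm-1-root'/>
      <target dir='/'/>
    </filesystem>
  </devices>
</domain>"""


def summarize_domain(xml_text):
    """Extract the container-relevant fields from a libvirt LXC domain XML."""
    root = ET.fromstring(xml_text)
    filesystem = root.find("devices/filesystem")
    return {
        "type": root.get("type"),
        "name": root.findtext("name"),
        "memory_kib": int(root.findtext("memory")),  # assumed KiB (no unit attribute)
        "init": root.findtext("os/init"),
        "rootfs": filesystem.find("source").get("dir"),
        "mount_point": filesystem.find("target").get("dir"),
    }


if __name__ == "__main__":
    for key, value in summarize_domain(SAMPLE_XML).items():
        print(f"{key}: {value}")
```

The `exe` OS type with an `<init>` path is what distinguishes this container definition from a hypervisor-backed domain, which would boot a kernel instead of running a single init binary.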

LXC (tools)

A little more functionality

Supported by the major distributions

LXC 1.0 recently released

• Cloning supported: lxc-clone

• Templates… btrfs

• lxc-create -t ubuntu -n CN creates a new ubuntu container

• “template” is downloaded from Ubuntu

• Some support for Fedora <= 14

• Debian is supported

• lxc-start -d -n CN starts the container

• lxc-destroy -n CN destroys the container

• /etc/lxc/lxc.conf has default settings

• /var/lib/lxc/CN is the default place for each container
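The create / start / destroy lifecycle listed above can be sketched as argument lists suitable for `subprocess.run`. This only builds and prints the commands, using exactly the flags quoted on the slide; the function name `lxc_lifecycle` and the container name `web1` are hypothetical, and actually running the commands would require the LXC tools and suitable privileges.

```python
import shlex


def lxc_lifecycle(name, template="ubuntu"):
    """Return the argv lists for creating, starting and destroying a container."""
    return [
        ["lxc-create", "-t", template, "-n", name],  # build from a template
        ["lxc-start", "-d", "-n", name],             # start detached (daemonized)
        ["lxc-destroy", "-n", name],                 # remove the container
    ]


if __name__ == "__main__":
    for argv in lxc_lifecycle("web1"):
        # To execute instead of print: subprocess.run(argv, check=True)
        print(shlex.join(argv))
```

Keeping the commands as lists rather than shell strings avoids quoting problems if a container name ever contains unusual characters.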

LXC Commoditization: docker

Young project with great vibrancy in the industry

Currently based on unmodified LXC – but the goal is to make it dirt easy

As of March 10th, 2014 at v0.9. Monthly releases, 1.0 should be ready for production use

What docker adds to LXC

• Portable deployment across machines

  • In Cloud terms, think of LXC as the hypervisor and docker as the Open Virtualization Appliance (OVA) and the provision engine

  • Docker images can run unchanged on any platform supporting docker

• Application-centric

  • User facing function geared towards application deployment, not VM analogs [!]

• Automatic build

  • Create containers from build files

  • Builders can use chef, maven, puppet, etc.

• Versioning support

  • Think of git for docker containers

  • Only delta of base container is tracked

• Component re-use

  • Any container can be used as a base, specialized and saved

• Sharing

  • Support for public/private repositories of containers

• Tools

  • CLI / REST API for interacting with docker

  • Vendors adding tools daily

Docker containers are self contained – no more “dependency hell”

Docker vs. LXC vs. Hypervisor

Docker: LXC Virtualization?

Docker decouples the LXC provider from the operations

• LXC provider agnostic

Docker “images” run anywhere docker is supported

• Portability

LXC Orchestration & Management

Docker & libvirt-lxc in OpenStack

• Manage containers heterogeneously with traditional VMs… but not w/the level of support & features we might like

CoreOS

• Zero-touch admin Linux distro with docker images as the unit of operation

• Centralized key/value store to coordinate distributed environment

Various other 3rd party apps

• Maestro for docker

• Shipyard for docker

• Fleet for CoreOS

• Etc.

LXC migration

• Container migration via criu

But…

• Still no great way to tie all virtual resources together with LXC – e.g. storage + networking

• IMO; an area which needs focus for LXC to become more generally applicable

Docker in OpenStack

Introduced in Havana

• A nova driver to integrate with docker REST API

• A Glance translator to integrate containers with Glance

• A docker container which implements a docker registry API

The claim is that docker will become a “group A” hypervisor

• In its current form it’s effectively a “tech preview”

LXC Evaluation

Goal: validate the promise with an eye towards practical applicability

Dimensions evaluated:

• Runtime performance benefits

• Density / footprint

• Workload isolation

• Ease of use and tooling

• Cloud Integration

• Security

• Ease of use / feature set

NOTE: tests performed in a passive manner – deeper analysis warranted.

Runtime Performance Benefits - CPU

Tested using libvirt lxc on Ubuntu 13.10 using linpack 11.1

Cpuset was used to limit the number of CPUs that the containers could use

The performance overhead falls within the error of measurement of this test

Bare metal performance is actually lower than some container results

[Figure: linpack performance @ 45000. x-axis: vcpus (1 to 31, plus BM for bare metal); y-axis: GFlops. Labeled values: 220.77 bare metal, 220.5 @ 32 vcpu, 220.9 @ 31 vcpu]

Runtime Performance Benefits – I/O

I/O Tests using libvirt lxc show a < 1 % degradation

Tested with a pass-through mount
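The "< 1 % degradation" claim can be checked against the throughput values charted on this slide. The sketch below assumes the four chart values pair with the four bar labels in the order they appear (lxc write, bare metal write, lxc read, bare metal read); that pairing is an inference from the extracted text, and the function name `degradation_pct` is hypothetical.

```python
# Throughput in MB/s, paired with the chart labels in their listed order (assumed).
THROUGHPUT_MB_S = {
    "write": {"lxc": 1711.2, "bare_metal": 1724.9},
    "read": {"lxc": 1626.4, "bare_metal": 1633.4},
}


def degradation_pct(lxc, bare_metal):
    """Relative throughput loss of the container versus bare metal, in percent."""
    return (bare_metal - lxc) / bare_metal * 100.0


if __name__ == "__main__":
    for test, values in THROUGHPUT_MB_S.items():
        loss = degradation_pct(values["lxc"], values["bare_metal"])
        print(f"{test}: {loss:.2f} % slower than bare metal")
```

Under that pairing, both the write and the read test come out below one percent, which matches the statement on the slide.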

Sync read I/O test: Rw=Write, Size=1024m, Bs=128mb, direct=1, sync=1

Sync write I/O test: Rw=Write, Size=1024m, Bs=128mb, direct=1, sync=1

[Figure: I/O throughput in MB/s by test: lxc write 1711.2, bare metal write 1724.9, lxc read 1626.4, bare metal read 1633.4]

Runtime Performance Benefits – Block I/O

Tested with [standard] AUFS

Density & Footprint – libvirt-lxc

Starting 500 containers

Mon Nov 11 13:38:49 CST 2013 ... all threads done in 157 (sequential I/O bound)

Stopping 500 containers

Mon Nov 11 13:42:20 CST 2013 ... all threads done in 162

Active memory delta: 417.2 KB

Starting 1000 containers

Mon Nov 11 13:59:19 CST 2013 ... all threads done in 335

Stopping 1000 containers

Mon Nov 11 14:14:26 CST 2013 ... all threads done in 339

Active memory delta: 838.4 KB

Using libvirt lxc on RHEL 6.4, we found that empty container overhead was just 840 bytes. A container could be started in about 330ms, which was an I/O bound process

This represents the lower limit of lxc footprint

Containers ran /bin/sh
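The per-container figures quoted on this slide follow from the two test runs by simple division. The sketch below reproduces that arithmetic under two assumptions that the slide does not state explicitly: the "done in N" values are seconds, and 1 KB is taken as 1000 bytes. The names `RUNS` and `per_container` are hypothetical.

```python
# Values copied from the two runs on the slide; units are assumptions (see above).
RUNS = [
    {"containers": 500, "start_s": 157, "mem_delta_kb": 417.2},
    {"containers": 1000, "start_s": 335, "mem_delta_kb": 838.4},
]


def per_container(run):
    """Return (start time in ms, active memory in bytes) per container."""
    start_ms = run["start_s"] * 1000 / run["containers"]
    mem_bytes = run["mem_delta_kb"] * 1000 / run["containers"]
    return start_ms, mem_bytes


if __name__ == "__main__":
    for run in RUNS:
        start_ms, mem_bytes = per_container(run)
        print(f"{run['containers']} containers: "
              f"{start_ms:.0f} ms start, {mem_bytes:.0f} bytes each")
```

Both runs land in the same range as the quoted "about 330ms" and "840 bytes", and the near-linear scaling from 500 to 1000 containers is consistent with start-up being sequential and I/O bound.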

Density & Footprint – Docker

In this test, we created 150 Docker containers with CentOS, started apache & then removed them

Average footprint was ~10MB per container

Average start time was 240ms

Serially booting 150 containers which run apache

• Takes on average 36 seconds

• Consumes about 2 % of the CPU

• Negligible HDD space

• Spawns around 225 processes for create

• Around 1.5 GB of memory ~ 10 MB per container

• Expect faster results once docker addresses performance topics in the next few months

Serially destroying 150 containers running apache

• On average takes 9 seconds

• We would expect destroy to be faster – likely a docker bug; we will triage with the docker community

[Figures: I/O profile and CPU profile during container creation and container deletion]

Workload Isolation: Examples

Using the blkio cgroup (lxc.cgroup.blkio.throttle.read_bps_device) to cap the I/O of a container

Both the total bps and iops_device on read / write could be capped

Better async BIO support in kernel 3.10+

We used fio with oflag=sync, direct to test the ability to cap the reads:

• With limit set to 6 MB / second

  READ: io=131072KB, aggrb=6147KB/s, minb=6295KB/s, maxb=6295KB/s, mint=21320msec, maxt=21320msec

• With limit set to 60 MB / second

  READ: io=131072KB, aggrb=61134KB/s, minb=62601KB/s, maxb=62601KB/s, mint=2144msec, maxt=2144msec

• No read limit

  READ: io=131072KB, aggrb=84726KB/s, minb=86760KB/s, maxb=86760KB/s, mint=1547msec, maxt=1547msec
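As a sketch of how such a cap might be expressed, the helper below builds a container config line for the `lxc.cgroup.blkio.throttle.read_bps_device` key named on this slide. The underlying cgroup file takes a value of the form `major:minor bytes_per_second`; the device numbers `8:0` and the helper names are assumptions for illustration. A second helper checks that the fio run time is what the reported rate predicts.

```python
def read_bps_limit(major, minor, mb_per_s):
    """Build an LXC config line capping read bandwidth on one block device."""
    bytes_per_s = int(mb_per_s * 1024 * 1024)
    return (
        "lxc.cgroup.blkio.throttle.read_bps_device = "
        f"{major}:{minor} {bytes_per_s}"
    )


def expected_read_seconds(io_kb, rate_kb_s):
    """Time needed to read io_kb kilobytes at a sustained rate_kb_s."""
    return io_kb / rate_kb_s


if __name__ == "__main__":
    # 8:0 is a hypothetical device; check the real numbers with `ls -l /dev/<disk>`.
    print(read_bps_limit(8, 0, 6))
    # fio values from the 6 MB / second run: io=131072KB at aggrb=6147KB/s
    print(f"{expected_read_seconds(131072, 6147):.2f} s")
```

For the 6 MB / second run, reading 131072 KB at the reported 6147 KB/s takes a little over 21 seconds, in line with the `mint=21320msec` that fio printed, so the throttle is holding the read rate at the configured cap.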

OpenStack VM Operations

NOTE: orchestration / management overheads cap LXC performance

Who’s Using LXC

Google app engine & infra is said to be using some form of LXC

RedHat OpenShift

dotCloud (now docker inc)

CloudFoundry (early versions)

Rackspace Cloud Databases

• Outperforms AWS (Xen) according to perf results

Parallels Virtuozzo (commercial product)

Etc.

LXC Gaps

There are gaps…

Lack of industry tooling / support

Live migration still a WIP

Full orchestration across resources (compute / storage / networking)

Fears of security

Not a well known technology… yet

Integration with existing virtualization and Cloud tooling

Few, if any, industry standards

Missing skillset

Slower upstream support due to kernel dev process

Etc.

LXC: Use Cases For Traditional VMs

There are still use cases where traditional VMs are warranted.

Virtualization of non Linux based OSs

• Windows

• AIX

• Etc.

LXC not supported on host

VM requires unique kernel setup which is not applicable to other VMs on the host (i.e. per VM kernel config)

Etc.

LXC Recommendations

Public & private Clouds

• Increase VM density 2-3x

• Accommodate Big Data & HPC type applications

• Move the support of Linux distros to containers

PaaS & managed services

• Realize “as a Service” and managed services using LXC

Operations management

• Ease management + increase agility of bare metal components

DevOps

Development & test

• Sandboxes

• Dev / test envs

• Etc.

If you are just starting with LXC and don’t have an in-depth skillset

• Start with LXC for private solutions (trusted code)

LXC Resources

https://www.kernel.org/doc/Documentation/cgroups/

http://www.blaess.fr/christophe/2012/01/07/linux-3-2-cfs-cpu-bandwidth-english-version/

http://atmail.com/kb/2009/throttling-bandwidth/

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Subsystems_and_Tunable_Parameters.html

http://www.janoszen.com/2013/02/06/limiting-linux-processes-cgroups-explained/

http://www.mattfischer.com/blog/?p=399

http://oakbytes.wordpress.com/2012/09/02/cgroup-cpu-allocation-cpu-shares-examples/

http://fritshoogland.wordpress.com/2012/12/15/throttling-io-with-linux/

https://lwn.net/Articles/531114/

https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt

http://www.ibm.com/developerworks/library/l-mount-namespaces/

http://blog.endpoint.com/2012/01/linux-unshare-m-for-per-process-private.html

http://timothysc.github.io/blog/2013/02/22/perprocess/

http://www.evolware.org/?p=293

http://s3hh.wordpress.com/2012/05/10/user-namespaces-available-to-play/

http://libvirt.org/drvlxc.html

https://help.ubuntu.com/lts/serverguide/lxc.html

https://linuxcontainers.org/

https://wiki.ubuntu.com/AppArmor

http://linux.die.net/man/7/capabilities

http://docs.openstack.org/trunk/config-reference/content/lxc.html

https://wiki.openstack.org/wiki/Docker

https://www.docker.io/

http://marceloneves.org/papers/pdp2013-containers.pdf

http://openvz.org/Main_Page

http://criu.org/Main_Page

Thank You…..

Araf Karsh Hamid ([email protected])

Thanks to: Boden Russell – IBM Technology Services ([email protected]) for the fantastic presentation on LinuX Containers
