linux security and isolation apis control groups …...outline 13 cgroups 13-1 13.1 introduction to...

16
Linux Security and Isolation APIs Control Groups (cgroups) Michael Kerrisk, man7.org © 2020 [email protected] February 2020 Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups v1: populating a cgroup 13-24 13.4 Cgroups v1: release notication 13-33 13.5 Cgroups v1: a survey of the controllers 13-43 13.6 Cgroups /proc les 13-65 13.7 Cgroup namespaces 13-68

Upload: others

Post on 03-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Linux Security and Isolation APIs

Control Groups (cgroups)

Michael Kerrisk, man7.org © 2020

[email protected]

February 2020

Outline

13 Cgroups 13-113.1 Introduction to cgroups v1 and v2 13-313.2 Cgroups v1: hierarchies and controllers 13-1713.3 Cgroups v1: populating a cgroup 13-2413.4 Cgroups v1: release notification 13-3313.5 Cgroups v1: a survey of the controllers 13-4313.6 Cgroups /proc files 13-6513.7 Cgroup namespaces 13-68

Page 2: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Outline

13 Cgroups 13-113.1 Introduction to cgroups v1 and v2 13-313.2 Cgroups v1: hierarchies and controllers 13-1713.3 Cgroups v1: populating a cgroup 13-2413.4 Cgroups v1: release notification 13-3313.5 Cgroups v1: a survey of the controllers 13-4313.6 Cgroups /proc files 13-6513.7 Cgroup namespaces 13-68

Goals

Cgroups is a big topicMany controllersV1 versus V2 interfaces

Our goal: understand fundamental semantics of cgroupfilesystem and interfaces

Useful from a programming perspectiveHow do I build container frameworks?What else can I build with cgroups?

And useful from a system engineering perspectiveWhat’s going on underneath my container’s hood?

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-4 §13.1

Page 3: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Focus

We’ll focus on:General principles of operation; goals of cgroupsThe cgroup filesystemInteracting with the cgroup filesystem using shellcommandsProblems with cgroups v1, motivations for cgroups v2Differences between cgroups v1 and v2

We’ll look briefly at some of the controllers

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-5 §13.1

Resources

Kernel Documentation filesDocumentation/cgroup-v1/*.txtDocumentation/admin-guide/cgroup-v2.rst

cgroups(7) man pageNeil Brown’s excellent (2014) LWN.net series on Cgroups:https://lwn.net/Articles/604609/

Thought-provoking commentary on the meaning ofgrouping and hierarchy

https://lwn.net/Articles/484254/ – Tejun Heo’s initialthinking about redesigning cgroupsOther articles at https://lwn.net/Kernel/Index/#Control_groups

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-6 §13.1

Page 4: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

History

2006/2007, “Process Containers”Developed by engineers at Google2007: renamed “control groups” to avoid confusion withalternate meaning for “containers”

January 2008: initial release in mainline kernel (Linux2.6.24)

Three resource controllers in initial mainline releaseFast-forward a few years...

Many new resource controllers addedVarious problems arose from haphazard/uncoordinateddevelopment of cgroup controllers

“Design followed implementation” :-(

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-7 §13.1

History

Sep 2012: work begins on cgroups v2In-kernel changes, but marked experimentalChanges were necessarily incompatible with cgroups v1

⇒ Create new/orthogonal filesystem interface for v2March 2016, Linux 4.5: cgroups version 2 becomes official

Older version (cgroups v1) remainsA.k.a. “legacy cgroups”, but not going away in a hurry

Oct 2019: Fedora 31 is first distro to switch to v2-by-defaultBoot with systemd.unified_cgroup_hierarchy=0 torevert to v1/v2 “hybrid” mode

Cgroups v2 work is ongoingFor now, some functionality remains available only via v1Conversely, v2 offers a number of advantages over v1

Subject to some rules, can use both versions at same time

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-8 §13.1

Page 5: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Cgroups overview

Two principle components:A mechanism for hierarchically grouping processesA set of controllers (kernel components) that manage,control, or monitor processes in cgroups

(Resources such as CPU, memory, block I/O bandwidth)

Interface is via a pseudo-filesystemCgroup manipulation takes form of filesystem operations,which might be done:

Via shell commandsProgrammaticallyVia management daemon (e.g., systemd)Via your container framework’s tools (e.g., LXC, Docker)

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-9 §13.1

What do cgroups allow us to do?

Limit resource usage of groupE.g., limit percentage of CPU available to group

Prioritize group for resource allocationE.g., some group might get greater proportion of CPU

Resource accountingMeasure resources used by processes

Freeze a groupFreeze, restore, and checkpoint a group

And more...

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-10 §13.1

Page 6: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Terminology and semantics

Control group: group of processes bound to set ofparameters or limits(Resource) controller: kernel component that controls ormonitors processes in a cgroup

E.g., memory controller limits memory usage; cpuacctaccounts for CPU usageAlso known as subsystem

(But that term is rather ambiguous)Cgroups for each controller are arranged in a hierarchy

Child cgroups inherit attributes from parent

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-11 §13.1

Filesystem interface

Cgroup filesystem directory structure defines cgroups +cgroup hierarchy

I.e., use mkdir(2) / rmdir(2) (or equivalent shellcommands) to create cgroups

Each subdirectory contains automagically created filesSome files are used to manage the cgroup itselfOther files are controller-specific

Files in cgroup are used to:Define/display membership of cgroupControl behavior of processes in cgroupExpose information about processes in cgroup (e.g.,resource usage stats)

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-12 §13.1

Page 7: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Example: the pids controller (cgroups v1)

pids (“process number”) controller allows us to limitnumber of PIDs in cgroup

Prevent fork() bombs!Use mount to attach pids controller to cgroup filesystem:# mkdir -p /sys/fs/ cgroup /pids # Create mount point# mount -t cgroup -o pids none /sys/fs/ cgroup /pids

� May not be necessarySome systems automatically mount filesystems withcontrollers attached

Specifically, systemd mounts the v1 controllers undersubdirectories of /sys/fs/cgroup, a tmpfs filesystemmounted via:# mount -t tmpfs tmpfs /sys/fs/ cgroup

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-13 §13.1

Example: the pids controller (cgroups v1)

Create new cgroup, and place shell’s PID in that cgroup:# mkdir /sys/fs/ cgroup /pids/g1# echo $$17273# echo $$ > /sys/fs/ cgroup /pids/g1/ cgroup .procs

cgroup.procs defines/displays PIDs in cgroupWhich processes are in cgroup?# cat /sys/fs/ cgroup /pids/g1/ cgroup .procs1727320591

Where did PID 20591 come from?PID 20591 is cat command, created as a child of shell

Child processes inherit parent’s cgroup membership(s)

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-14 §13.1

Page 8: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Example: the pids controller (cgroups v1)

Limit number of processes in cgroup, and show effect:# echo 20 > /sys/fs/ cgroup /pids/g1/pids.max# for a in $(seq 1 20); do sleep 20 & done[1] 20938...[18] 20955bash: fork: retry: Resource temporarily unavailable

pids.max defines/exposes limit on number of PIDs incgroup

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-15 §13.1

Applications

Cgroups (v1) is used in a range of applicationsContainer frameworks such as Docker and LXCFirejailFlatpaksystemd (also knows about cgroups v2)and more...

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-16 §13.1

Page 9: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Outline

13 Cgroups 13-113.1 Introduction to cgroups v1 and v2 13-313.2 Cgroups v1: hierarchies and controllers 13-1713.3 Cgroups v1: populating a cgroup 13-2413.4 Cgroups v1: release notification 13-3313.5 Cgroups v1: a survey of the controllers 13-4313.6 Cgroups /proc files 13-6513.7 Cgroup namespaces 13-68

Cgroup hierarchies

Cgroup == collection of processesCgroup hierarchy == hierarchical arrangement of cgroups

Implemented via a cgroup pseudo-filesystemStructure and membership of cgroup hierarchy is defined by:

1 Mounting a cgroup filesystem2 Creating a subdirectory structure that reflects desired

cgroup hierarchy3 Moving processes within hierarchy by writing their PIDs

to special files in cgroup subdirectoriesE.g., cgroup.procs

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-18 §13.2

Page 10: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Attaching a controller to a hierarchy

A controller is attached to a hierarchy by mounting acgroup filesystem:# mkdir -p /sys/fs/ cgroup /pids # Create mount point# mount -t cgroup -o pids none /sys/fs/ cgroup /pids

Here, pids controller was mountednone can be replaced by any suitable mnemonic name

Not interpreted by system, but appears in /proc/mounts

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-19 §13.2

Attaching a controller to a hierarchy

To see which cgroup filesystems are mounted and theirattached controllers:# mount | grep cgroupnone on /sys/fs/ cgroup /pids type cgroup (rw ,pids)# grep cgroup /proc/ mountsnone /sys/fs/ cgroup /pids cgroup rw ,... , pids 0 0

Unmounting filesystem detaches the controller:# umount /sys/fs/ cgroup /pids

But..., filesystem will remain (invisibly) mounted if itcontains child cgroups

I.e., must move all processes to root cgroup, and removechild cgroups, to truly unmount

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-20 §13.2

Page 11: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Attaching controllers to hierarchies

A controller can be attached to only one hierarchyMounting same controller at different mount point simplycreates second view of same hierarchy

Multiple controllers can be attached to same hierarchy:# mkdir -p /sys/fs/ cgroup / mem_cpu# mount -t cgroup -o memory ,cpu none \

/sys/fs/ cgroup / mem_cpu

In effect, resources associated with those controllers arebeing managed together

Or, all controllers can be attached to one hierarchy:# mount -t cgroup -o all none /some/mount/point

-o all is the default if no controller is specified

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-21 §13.2

Creating cgroups

When a new hierarchy is created, all tasks on system arepart of root cgroup for that hierarchyNew cgroups are created by creating subdirectories undercgroup mount point:# mkdir /sys/fs/ cgroup / memory /g1

Relationships between cgroups are reflected by creatingnested (arbitrarily deep) subdirectory structure

Meaning of hierarchical relationship depends on controller

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-22 §13.2

Page 12: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Destroying cgroups

An empty cgroup can be destroyed by removing directoryEmpty == last process in cgroup terminates or migrates toanother cgroup and last child cgroup is removed

Presence of zombie process does not prevent removal ofcgroup directory

(Notionally, zombies are moved to root cgroup)

Not necessary (or possible) to delete attribute files insidecgroup directory before deleting it

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-23 §13.2

Page 13: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Outline

13 Cgroups 13-113.1 Introduction to cgroups v1 and v2 13-313.2 Cgroups v1: hierarchies and controllers 13-1713.3 Cgroups v1: populating a cgroup 13-2413.4 Cgroups v1: release notification 13-3313.5 Cgroups v1: a survey of the controllers 13-4313.6 Cgroups /proc files 13-6513.7 Cgroup namespaces 13-68

Placing a process in a cgroup

To move a process to a cgroup, we write its PID tocgroup.procs file in corresponding subdirectory# echo $$ > /sys/fs/ cgroup / memory /g1/ cgroup .procs

In multithreaded process, moves all threads to cgroup...� Can write only one PID at a time

write() fails with EINVAL

Writing 0 to cgroup.procs moves writing process to cgroup

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-26 §13.3

Page 14: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Viewing cgroup membership

To see PIDs in cgroup, read cgroup.procs filePIDs are newline-separatedZombie processes do not appear in list

� List is not guaranteed to be sorted or free ofduplicates

PID might be moved out and back into cgroup or recycledwhile reading list

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-27 §13.3

Cgroup membership details

Within a hierarchy, a process can be member of just onecgroup

That association defines attributes / parameters that applyto the process

Adding a process to a different cgroup automaticallyremoves it from previous cgroupA process can be a member of multiple cgroups, each ofwhich is in a different hierarchyOn fork(), child inherits cgroup memberships of parent

Afterward, cgroup memberships of parent and child can beindependently changed

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-28 §13.3

Page 15: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Placing a thread (task) in a cgroup

Writing a PID to cgroup.procs moves all threads inthread group to a cgroupCgroups v1 also supports notion of thread-levelgranularity for cgroup membership

I.e., individual threads in a multithreaded process can beplaced in different cgroups⇒ threads can be subject to different control settings

Each cgroup directory also has a tasks file...Writing a thread ID (TID) to tasks moves that thread tocgroup

Thread ID == kernel thread ID (displayable with ps –L)Reading tasks shows all TIDs in cgroup

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-29 §13.3

Tasks?

Cgroups v1 draws distinction between process and taskTask == kernel scheduling entity

From scheduler’s perspective, “processes” and “threads” arepretty much the same thing....(Threads just share more state than processes)

Multithreaded process == set of tasks with same threadgroup ID (TGID)

TGID == PID!Each thread has unique thread ID (TID)

Here, TID means kernel thread IDI.e., value returned by clone(2) and gettid(2)

And displayed (as “LWP”) by ps –LNot same as POSIX threads pthread_t

(But there is 1:1 relationship in NPTL implementation...)

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-30 §13.3

Page 16: Linux Security and Isolation APIs Control Groups …...Outline 13 Cgroups 13-1 13.1 Introduction to cgroups v1 and v2 13-3 13.2 Cgroups v1: hierarchies and controllers 13-17 13.3 Cgroups

Exercises

(If you have a recent distro that defaults to cgroups v2 only, reboot withsystemd.unified_cgroup_hierarchy=0 to revert to “hybrid” mode.)

1 In this exercise, we create a cgroup, place a process in the cgroup, andthen migrate that process to a different cgroup.

If the memory cgroup is not already mounted, mount it:# grep ’cgroup .* mem ’ /proc/ mounts # Is cgroup mounted ?# mkdir -p /sys/fs/ cgroup / memory# mount -t cgroup -o memory none /sys/fs/ cgroup / memory# cd /sys/fs/ cgroup / memory

Note: some systems (e.g., some Debian releases) provide apatched kernel that disables the memory controller bydefault. If you can’t mount the controller, it may benecessary to reboot with the cgroup_enable=memorykernel command-line option. Alternatively, you could use adifferent controller for this exercise.

[Exercise continues on the next slide]

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-31 §13.3

Exercises

Create two subdirectories, m1 and m2, in the memory cgroup rootdirectory.Execute the following command, and note the PID assigned tothe resulting process:# sleep 300 &

Write the PID of the process created in the previous step into thefile m1/cgroup.procs, and verify by reading the file contents.Now write the PID of the process into the file m2/cgroup.procs.Is the PID still visible in the file m1/cgroup.procs? Explain.Try removing cgroup m1 using the command rm -rf m1. Whydoesn’t this work?Remove the cgroups m1 and m2 using the rmdir command.

Linux Security and Isolation APIs ©2020, Michael Kerrisk Cgroups 13-32 §13.3