Google Confidential and Proprietary
Victor Marmol ([email protected]), Rohit Jnagal ([email protected])
Let Me Contain That For You
LPC 2013
Containers @ Google
● Early users: Scaling process management and isolation.
● What: Linux cgroups + user-space policies and monitoring.
● Everywhere: SaaS, PaaS, IaaS; private and public clouds.
● Containerizing shared machines
  ○ Asymmetric workloads: latency, bandwidth, and priority
  ○ Asymmetric isolation
  ○ High churn
● Goals:
  ○ Performance guarantees.
  ○ High utilization across resources.
  ○ Shared resources.
  ○ Overcommitment: Invisible workload from reclaimed resources.
  ○ Near zero overhead.
● Other use cases: ChromeOS et al.
A Shared Google Machine
[Figure: a shared machine. Labels: I/O, CPU, Mem; Sensitive Front End Job; Back End Job; Allocation; Background Tasks; System Daemons; Batch workload; Soaker workload.]
Resource Isolation
● Quality of service
  ○ Bandwidth - Fair share, progress guarantees, availability.
  ○ Latency - Wakeup, allocation, access times.
  ○ Priority - Order of importance.
  ○ Performance - Microarchitecture interference (CPI2); locality.
● Solution:
  ○ Scheduling a good mix.
  ○ Hierarchical resource management for effective sharing.
  ○ Maximize utilization across all dimensions.
  ○ Cgroup-aware tasks:
    ■ User subcontainers [e.g. Query management]
    ■ User schedulers.
    ■ Self-correcting tasks: Notifications
Scalability
● Churn
  ○ 1 Creation/Deletion per 10 seconds
● Per Container
  ○ Read: O(10) cgroup-based stats per second
  ○ Write: O(1) cgroup-based param per second
● Per Machine
  ○ O(100) containers
  ○ Looks to grow dramatically
● Overall
  ○ Read: 1000’s per second
  ○ Write: 100’s per second
● Users can do a lot more.
● Precise accounting for chargeback
● Monitoring built in at multiple layers
● Extremely low overhead
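The overall figures follow from the per-container rates multiplied by the number of containers on a machine. A rough back-of-envelope check, assuming O(100), O(10), and O(1) are read literally as 100, 10, and 1 (the slide gives only orders of magnitude, so these constants are illustrative):

```python
# Back-of-envelope check of machine-wide cgroup operation rates.
# The concrete values are assumptions; the slide states only orders of magnitude.
CONTAINERS_PER_MACHINE = 100       # O(100) containers
READS_PER_CONTAINER_PER_SEC = 10   # O(10) cgroup stat reads per second
WRITES_PER_CONTAINER_PER_SEC = 1   # O(1) cgroup param writes per second


def machine_rates(containers, reads_each, writes_each):
    """Return (reads/sec, writes/sec) summed over all containers on a machine."""
    return containers * reads_each, containers * writes_each


reads, writes = machine_rates(
    CONTAINERS_PER_MACHINE,
    READS_PER_CONTAINER_PER_SEC,
    WRITES_PER_CONTAINER_PER_SEC,
)
print(f"reads/sec:  {reads}")   # on the order of 1000's per second
print(f"writes/sec: {writes}")  # on the order of 100's per second
```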
Let Me Contain That For You
● Revised container management
  ○ Separate cgroup abstraction from policies.
  ○ Configuring cgroups with an intent-based resource specification.
● Built for scalability and parallel access.
● Also includes extra kernel patches for:
  ○ Improving resource isolation.
  ○ Providing tighter performance guarantees.
  ○ Precise accounting in the face of sharing.
  ○ Cap for global resources.
● Allow users to create subcontainers with restrictions.
● Open-source: Sharing use-cases, problems, and benchmarks.
● Implement policies in a higher layer:
  ○ Continuous monitoring and fine-tuning.
  ○ No critical loops [Remember LPC2011?]
  ○ Machine-level utilization and isolation management.
  ○ Isolated from system APIs.
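To make "intent-based resource specification" concrete, here is a minimal sketch of a layer that accepts what the user wants (CPU and memory amounts) and derives cgroup v1 control-file values from it. The `ResourceSpec` type, its field names, and the 1024-shares-per-core convention are all assumptions made for illustration; this is not the tool's actual API.

```python
from dataclasses import dataclass


@dataclass
class ResourceSpec:
    """Hypothetical intent: what the task needs, not how cgroups are set."""
    cpu_milli: int      # requested CPU in thousandths of a core
    memory_bytes: int   # memory limit in bytes


MIN_CPU_SHARES = 2      # the kernel's lower bound for cpu.shares
SHARES_PER_CORE = 1024  # assumed convention for this sketch


def to_cgroup_settings(spec):
    """Map the intent onto cgroup v1 control files (illustrative mapping)."""
    shares = spec.cpu_milli * SHARES_PER_CORE // 1000
    return {
        "cpu/cpu.shares": max(MIN_CPU_SHARES, shares),
        "memory/memory.limit_in_bytes": spec.memory_bytes,
    }


# Two cores and 4 GiB, matching the size of an allocation like A1.
settings = to_cgroup_settings(ResourceSpec(cpu_milli=2000, memory_bytes=4 << 30))
for control_file, value in settings.items():
    print(control_file, value)
```

Keeping the mapping in one function is what lets policy live in a higher layer: callers state intent, and only this layer knows which control files exist.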
Hierarchical Sharing
[Figure: an allocation A1 with two tasks T1 and T2]
/dev/cgroup/cpu/A1[2048]: T1[1536], T2[512]
/dev/cgroup/mem/A1[4G]: T1[2G], T2[3G]
Task running in an allocation sharing resources with co-located siblings.
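The numbers in the hierarchical sharing diagram can be worked through in a few lines. The sketch below assumes the bracketed CPU values are `cpu.shares` weights and the bracketed memory values are limits; the slide does not label them, so treat that reading as an assumption.

```python
GIB = 1 << 30


def cpu_fractions(shares):
    """Fraction of the parent's CPU each sibling gets when all are busy."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


def memory_overcommit(parent_limit, child_limits):
    """Bytes by which the children's limits exceed the parent's limit."""
    return max(0, sum(child_limits.values()) - parent_limit)


# Allocation A1 with tasks T1 and T2, values taken from the diagram.
fractions = cpu_fractions({"T1": 1536, "T2": 512})
excess = memory_overcommit(4 * GIB, {"T1": 2 * GIB, "T2": 3 * GIB})

print(fractions)     # T1 gets 0.75 of A1's CPU under contention, T2 gets 0.25
print(excess / GIB)  # child limits exceed A1's 4G limit by 1 GiB
```

Under this reading, the siblings' memory limits sum to more than the allocation holds, so T1 and T2 share A1's 4G rather than each owning a fixed slice.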
Managing priority across resources
[Figure: per-resource cgroup hierarchies for tasks T1, T2, and T3]
Block I/O: Default[0.1], T1[0.1], T2[0.8], T3[0.1]
Cpu: Default[2], T1[512], T2[1024], T3[256]
Memory: T1[1G], T2[2G], T3[1G]
Legend: cgroups for low-priority batch tasks; cgroups for a latency sensitive task
Managing priority across resources
[Figure: per-resource cgroup hierarchies for tasks T1 and T2]
Block I/O: Default[0.1], T2[0.1], T1[0.8] with subcontainers T1[0.3] and T1[PRIO][0.5]
Cpu: Default[2], T2[1024], T1[2048]
Memory: T2[2G], T1[4G]
Legend: cgroups for a high I/O priority latency sensitive task; cgroups for a low priority task
A task may require multiple containers for the same resource to balance its workload priorities. I/O server T1 uses two subcontainers to differentiate incoming I/O requests and moves threads to the right subcontainer.
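Moving a thread between subcontainers, as the I/O server T1 does, can be sketched with the cgroup v1 `tasks` file, which accepts a thread ID per write. The subcontainer names and the thread ID below are hypothetical, and the demo writes into a scratch directory rather than a mounted block I/O hierarchy so that it runs without privileges.

```python
import os
import tempfile

# Hypothetical subcontainer names for I/O server T1; the slide shows only
# that one prioritized and one regular block I/O subcontainer exist.
PRIO_SUBCONTAINER = "T1/prio"
REGULAR_SUBCONTAINER = "T1/regular"


def subcontainer_for(high_priority):
    """Pick the block I/O subcontainer for a class of incoming request."""
    return PRIO_SUBCONTAINER if high_priority else REGULAR_SUBCONTAINER


def move_thread(hierarchy_root, container, tid):
    """Move one thread by writing its ID to the container's `tasks` file."""
    with open(os.path.join(hierarchy_root, container, "tasks"), "w") as f:
        f.write(str(tid))


# Demo against a scratch directory standing in for a mounted hierarchy.
with tempfile.TemporaryDirectory() as root:
    for name in (PRIO_SUBCONTAINER, REGULAR_SUBCONTAINER):
        os.makedirs(os.path.join(root, name))
    target = subcontainer_for(high_priority=True)
    move_thread(root, target, tid=1234)  # 1234 is a made-up thread ID
    with open(os.path.join(root, target, "tasks")) as f:
        written = f.read()

print(target, written)
```

On a real system `hierarchy_root` would be the mount point of the block I/O hierarchy, and the write would require permission on the target container.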
Splitting hierarchies for performance
[Figure: per-resource cgroup hierarchies for tasks T1, T2, and T3]
Block I/O: Default[0.1], T1[0.8] with subcontainers T1[0.5|P] and T1[0.3], T2[0.1], T3[0.1]
Cpu: Default[2], T1[2048], T2[1024], T3[1024]
Memory: T1[4G], T2[2G], T3[2G]
Legend: Cpu, Memory and I/O sensitive task; Cpu & Memory sensitive task with low I/O priority; low priority batch task
Splitting hierarchies reduces stranded resources and improves performance for highly sensitive tasks.
User Subcontainers
[Figure: an App Engine task containing server instances Instance1, Instance2, and Instance3. Labels: Protected server app; Subcontainers with tailored spec and priority; OOM.]
App Engine uses on-demand container creation: fair sharing, notifications, and isolation of misbehaving apps.
Takeaways
● Cgroups support goes beyond containerized VMs.
● Sharing and overcommitment are key to higher utilization.
● Managing each resource separately helps fine-tune utilization and performance.
● More power to users means better flexibility and scalability.
Come find us for chat, discussions, BoF, and drinks. Or virtually: [email protected]@google.com