#vmworld
Deep Dive: The Value of Running Kubernetes on vSphere
Frank Denneman, VMware, Inc.
@FrankDenneman
Michael Gasch, VMware, Inc.
@embano1
CNA1553BE
#CNA1553BE
VMworld 2018 Content: Not for publication or distribution
Disclaimer
©2018 VMware, Inc.
This presentation may contain product features or functionality that are currently under development.
This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new features/functionality/technology discussed or presented have not been determined.
Agenda
Kubernetes Primer
Customer Scenario – Making the Case for Bare Metal
Experience Report – Kubernetes on Bare Metal vs. vSphere
Q&A
Kubernetes Primer
The Origin of Kubernetes
Google Search (late 1990s)
The Origin of Kubernetes
Google Search
Revolutionizing the Way we build Distributed (cloud-native) Applications today.
Google Search Pillars:
• Commodity
• Fault-Tolerant Software
• Fraction of the Cost of High-End Servers
Source: https://ai.google/research/pubs/pub49
Platform Engineering Responsibilities
The Origin of Kubernetes
The Datacenter as a Computer
“We must treat the Datacenter itself as one massive Warehouse-scale Computer.”
SSH
The Origin of Kubernetes
Google Search (late 1990s)
Borg (~2003)
The Origin of Kubernetes
Google Search (late 1990s)
Borg (~2003)
Cgroups (2007)
The Origin of Kubernetes
Google Search (late 1990s)
Borg (~2003)
Cgroups (2007)
Omega (~2012)
The Origin of Kubernetes
Containers become Mainstream
In Search of a Common Language
The Origin of Kubernetes
So what is a Container, really?
Kernel Mode
Cgroups
Namespaces
Security Capabilities
Scheduler
Syscall
task_struct
…
Scheduling Entity (se)
“running”
syscall.Exec(ENTRYPOINT/CMD)*
A Structure in Kernel Memory. The Kernel has no Notion of a “Container”. It’s yet another Executable.
User Mode
Docker Engine
ContainerCreate()
* After Container Sandbox Initialization (nsenter.go/nsexec.c)
sched_class: fair.c (CFS)
The Origin of Kubernetes
Google Search (late 1990s)
Borg (~2003)
Cgroups (2007)
Omega (~2012)
Docker (2013)
The Origin of Kubernetes
Google Search (late 1990s)
Borg (~2003)
Cgroups (2007)
Omega (~2012)
Docker (2013)
Kubernetes (2014)
The Origin of Kubernetes
Container Orchestration
"Kubernetes is an open-source System for automating Deployment, Scaling, and Management of containerized Applications."
Kubernetes
High-Level Architecture
Kubernetes Cluster
Control Plane (API)
Worker (Pod, Pod, Pod, Pod, Pod)
Kubernetes Cloud Provider
Infrastructure (Compute, Storage, Networking)
Customer Scenario
Making the Case for Bare Metal
Meet ABC Inc.
ABC Inc. is the Leader in Manufacturing Wayback Machines
Enterprise IT Organization with separate Infrastructure, Linux/Middleware and Development Teams (Silos)
>90% standardized and virtualized on VMware vSphere
Going through Digital Transformation to become more Customer and Feedback driven
• Need to develop (iterate) faster with an agile Approach
Technical Vehicle: Containers and “cloud-native” Application Architectures
• Kubernetes as the Framework to build and run these new Applications
• Embrace and contribute to Open Source Software
ABC Inc.’s Decision to go Bare Metal
Linux Team at ABC Inc. decided to deploy Kubernetes on Bare Metal
Justification:
• New (cloud-native) Applications don’t need vSphere Features like HA and vMotion
• Containers are more lightweight, replacing VMs and the Hypervisor
• Kubernetes provides Hypervisor Functionality, e.g. Resource Management and HA
• Virtualization reduces Performance of containerized Applications
• Reduce Complexity and Costs by eliminating the Hypervisor from the Stack
• IT Infrastructure not agile enough (no Self-Service)
ABC Inc.’s vSphere Team reached out to VMware for Help
Back to 2005
merchoid.com
Experience Report
Kubernetes on Bare Metal vs. vSphere
Terminology
Day 0: Planning and “Green Lights”
Day 1: Experiences with first Deployments
Day 2: Container and Cluster Sprawl
Day 3: Maintenance & Availability
Day 0
Planning and “Green Lights”
Day 0
Planning and “Green Lights”
Kubernetes Cluster
Infrastructure (Compute, Storage, Networking)
Cloud Provider (Custom)
External Dependencies
DNS, DBs, IPAM
Images, CA, Auth
Secrets, Monitoring, Logging
Custom Integrations
Label: AZ=AZ-1
Day 0
Realization: Managing Bare Metal Systems is hard
How vSphere Can Help
Average Time to get HW in DC – Unpredictable Process
Average Time to get Hardware in Data Center
86 Days
Hardware Compatibility
Decoupling the OS from the Hardware reduces operational Overhead
A simple NIC revision change can directly impact the Kubernetes host.
Virtualized hardware decouples the OS from the underlying hardware.
Hardware abstraction reduces the operational overhead of tracking supported firmware versions of components.
Configuration management is done at the physical layer (firmware, drivers, etc.). (Drift?) (Supported?)
Non-Disruptive Patching
vMotion Workload away for Hardware, Firmware or Driver Update
Need to patch: vMotion the workload, no disruption
Need to patch: kill the workload
Kill doesn’t matter for Stateless Workloads
etcd isn’t stateless, and what about the top 10 Workloads in Containers today?
Strong Security Isolation
Strong Isolation between Workloads with efficient Resource Usage
VMs provide strong isolation between guests, allowing multi-tenancy and efficient use of resources.
Containers are processes in the Linux kernel; security concerns can lead to reduced resource utilization.
Tenant A, Tenant B
Functional Use of Hardware
General vs Dedicated
Modern DCs operate various workloads in different packaging formats
vSphere provides a unified platform for these workloads
Use your current tools and skill set to manage these workloads
Focus on creating value
General-purpose hardware allows mixed workloads and superior resource utilization
Dedicating hardware to a particular function hinders resource optimization
Day 1
Experiences with first Deployments
Day 1
In the old Bare Metal Days (Pre-Virtualization Era)
Physical Host
App (OS threads: M G G W W W)
Bins/Libs
Kernel: NUMA, sysctl, nice, IRQ-Balance
Hardware: 16 Cores, 128GB, RAID, NIC-Teaming
Legend: M = OS Thread: main(), G = OS Thread: GC, W = OS Thread: Worker/Pool
Almost exclusive Access to Host Resources for this App (1:1)
Best Practices for this Deployment Type were developed
App uses Host Information for Runtime Tuning
Downside: Utilization & Agility
Day 1
Containers and Kubernetes on Bare Metal to the Rescue?
Physical Host
Kubelet
Container Runtime
Kernel: NUMA, sysctl, Cgroups, IRQ-Balance
Hardware: 64 Cores (HT), 384GB, RAID, NIC-Teaming
Utilization & Agility
Not all Runtimes are Cgroup-aware!
How to tune per Workload?
Resource Contention and Workload Interference!
Isolation (Security)?
kube-scheduler
Node: BM001
Capacity:
  cpus: 64
  memory: 384GB
Allocatable:
  cpus: 60
  memory: 360GB
How much to reserve vs. Waste?
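The gap between Capacity and Allocatable on the example node BM001 can be put into numbers. The following is a minimal sketch that uses only the values shown on the slide; the helper name `reserved_share` is made up for illustration and is not part of any Kubernetes tooling.

```python
def reserved_share(capacity: float, allocatable: float) -> float:
    """Return the fraction of node capacity that is withheld from Pods."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return (capacity - allocatable) / capacity


# Values taken from the slide for node BM001.
capacity = {"cpus": 64, "memory_gb": 384}
allocatable = {"cpus": 60, "memory_gb": 360}

# Absolute amount and relative share reserved for the OS, Kubelet, etc.
reserved = {k: capacity[k] - allocatable[k] for k in capacity}
shares = {k: reserved_share(capacity[k], allocatable[k]) for k in capacity}

for k in capacity:
    print(f"{k}: reserved {reserved[k]} ({shares[k]:.2%} of capacity)")
```

On this node, 4 CPUs and 24GB are set aside. Whether that is too little (node instability) or too much (waste) is exactly the question the slide raises.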
How vSphere Can Help
Hyperthreading
Hyperthreading in vSphere
Consistent performance is obtained by avoiding NUMA boundaries
NUMA Architecture
NUMA Architecture
Day 2
Container and Cluster Sprawl
Day 2
Container and Cluster Sprawl
[Chart axes: Organization, Environment]
Day 2
Container and Cluster Sprawl
[Chart axes: Imbalance, Cost]
How vSphere Can Help
Multi-Tenancy
general purpose allows mixed workloads and superior resource utilization
[Diagram: one vSphere Cluster whose VMs run K8S Prod and K8S Test nodes side by side]
Multi-Tenancy
Increase in Utilization: Scale out & redistribute
general purpose allows mixed workloads and superior resource utilization
[Diagram: one vSphere Cluster whose VMs run K8S Prod and K8S Test nodes, with one more K8S Prod VM than on the previous slide]
Day 3
Maintenance & Availability
Day 3
Maintenance & Availability (Control Plane)
Admission Control and Failover Capacity (MTTR)?
Proactive HA?
Impact of Host Maintenance/Failure on Control Plane?
Day 3
Maintenance & Availability (Workloads)
Kubernetes Control Plane: Controller Manager, Scheduler
[Diagram: three 4-CPU nodes running “Dev”, “QA” and “Prod” Pods of 1 or 2 CPUs each; the Scheduler picks from a queue holding “QA”, “Dev” and “Prod” Pods]
Only considering beta/stable and in-tree Kubernetes Features
Disruptive Pod Priority & Preemption, incl. Priority Queue, beta in v1.11
Example assumes shared File System for Persistent Volumes
pod-eviction-timeout: 5min*
ReconcilerMaxWaitForUnmountDuration: 6min**
* Default, configurable
** Fixed
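The two timers on this slide add up when a node fails while running a Pod with a Persistent Volume. The sketch below sums them under the assumption that they apply one after the other; the `detection` parameter is a hypothetical placeholder for the time it takes to notice the failed node, which the slide does not show.

```python
from datetime import timedelta

# Timer values as shown on the slide.
POD_EVICTION_TIMEOUT = timedelta(minutes=5)   # default, configurable
MAX_WAIT_FOR_UNMOUNT = timedelta(minutes=6)   # fixed


def worst_case_recovery(detection: timedelta = timedelta(0)) -> timedelta:
    """Upper bound before a stateful Pod can restart elsewhere.

    Assumes the eviction timeout and the unmount wait run sequentially.
    `detection` is an illustrative extra for failure detection time.
    """
    return detection + POD_EVICTION_TIMEOUT + MAX_WAIT_FOR_UNMOUNT


total = worst_case_recovery()
print(f"Worst case without detection time: {total}")
```

Under these assumptions a stateful workload can be unavailable for roughly eleven minutes after a host failure, before any application start-up time is counted.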
How vSphere Can Help
Multi Cluster Configuration
Priority
[Diagram: one vSphere Cluster with K8S Prod and K8S Test VMs, each labeled “Important”, “More Important” or “Meh”]
HA Restart Priority
Ensure “Prod” Systems get restarted first
Restart Dependency
Or “HA Orchestrated Restart” as it is also called
Works based on VM to VM rules
Only 1 level, so for A-B-C create two rules
• VM Group B depends on A
• VM Group C depends on B
• ETCD-Masters-Workers
Specify when to start next batch
• Resources allocated
• Powered On
• Guest Heartbeat
• App Heartbeat
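Because a rule only covers one level, a longer chain has to be broken into pairs. The following sketch shows that expansion for the ETCD-Masters-Workers chain from the slide; `restart_rules` is an illustrative helper, not a vSphere API call.

```python
def restart_rules(chain):
    """Expand an ordered chain of VM groups into single-level rules.

    Returns (dependent, prerequisite) pairs, one per adjacent pair
    in the chain, so a chain of n groups yields n - 1 rules.
    """
    return [(dependent, prerequisite)
            for prerequisite, dependent in zip(chain, chain[1:])]


rules = restart_rules(["ETCD", "Masters", "Workers"])
for dependent, prerequisite in rules:
    print(f"VM Group {dependent} depends on {prerequisite}")
```

A three-group chain yields two rules, matching the A-B-C example on the slide.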
Kubernetes Cluster Node Roles
Control Plane (Masters) and Workers
[Diagram: one vSphere Cluster with K8S Prod and K8S Test VMs; the K8S Prod VMs are marked as three Masters and several Workers]
Multi-Tenancy
DRS Affinity Rules
[Diagram: one vSphere Cluster with K8S Prod and K8S Test VMs; the K8S Prod VMs are marked as three Masters and several Workers]
Multiple Fault Domains
Quorum dictates Design
[Diagram: K8S Prod Masters and Workers and K8S Test VMs spread across Fault Domain A and Fault Domain B, placed with VM Anti-Affinity and Host-VM Rules]
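Why quorum dictates the design can be shown with simple majority arithmetic. This is a minimal sketch assuming three masters split 2/1 across the two fault domains; that placement and the function names are assumptions made for illustration.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given member count."""
    return members // 2 + 1


def survives_domain_loss(masters_per_domain: dict, lost: str) -> bool:
    """True if the remaining masters still form a majority."""
    total = sum(masters_per_domain.values())
    remaining = total - masters_per_domain[lost]
    return remaining >= quorum(total)


# Assumed placement: two masters in domain A, one in domain B.
layout = {"A": 2, "B": 1}
for domain in layout:
    ok = survives_domain_loss(layout, domain)
    print(f"Loss of Fault Domain {domain}: quorum kept = {ok}")
```

With only two fault domains, one of them always holds the majority of masters, so losing that domain costs quorum no matter how the anti-affinity rules are written.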
Proactive HA
Moving Workloads away at first Signs of Trouble
DRS proactively avoids the need to use HA
• Integrates with the server vendor’s monitoring software
• Health states are passed to DRS
• DRS reacts based on the health state of the hardware
None of the DRS affinity/anti-affinity rules are violated
Quarantine Mode accepts workloads if performance degradation is imminent
VM Latency Sensitivity
CPU Core Isolation
DON’T FORGET TO FILL OUT YOUR SURVEY.
#vmworld #CNA1553BE
THANK YOU!
#vmworld #CNA1553BE
VMware Cloud-Native Apps
More Sessions on Kubernetes
Tuesday, Nov 6
CNA1656BE Put a Lid on It: Securing Containers and Kubernetes on vSphere
CNA1634BE Container Portfolio at VMware
CNA1816BE Container and Kubernetes 101 for Admins
Wednesday, Nov 7
CNA2755BE Architecting PKS for Production Lessons Learned from PKS Deployments
CNA2084BE Intro to VMware Kubernetes Engine – Managed K8s Service on Public Cloud
DC3845KE Cloud and Developer Keynote: Public Clouds and Kubernetes at Scale
CNA1674BE Deep Dive: Run Kubernetes in Production with PKS
Thursday, Nov 8
CNA3124BE Deep Dive: VMware Kubernetes Engine – Kubernetes as a Service on Public Cloud
CNA2009BE Run Stateful Apps on Kubernetes with PKS: Highlight WebLogic Server
Try HOL
1932 VMware Kubernetes Engine – Getting Started
1931 VMware Pivotal Container Service and Kubernetes – Getting Started
1935 VMware Pivotal Container Service on VMware NSX-T
Follow Us
https://blogs.vmware.com/cloudnative
https://www.youtube.com/c/VMwareCloudNativeApps
@cloudnativeapps