Lecture 11:
Virtual Machines
Department of Electrical Engineering, Stanford University
EE282 – Fall 2008, Mendel Rosenblum, Lecture 11 - 1
http://eeclass.stanford.edu/ee282
Today’s Menu: Architectural Support for Virtual Machines
• Definition and types of virtual machines
• System virtual machines
  – Motivation
  – Virtualizable architecture
  – Architectural support for virtualization
    • CPU, memory, I/O
• Based on 2005 HotChips Tutorial by Jim Smith and Rich Uhlig
  – See http://www.hotchips.org/hc17/program/tutorials.htm for more info
Virtualization in General
• Systems are built on levels of abstraction
  – Higher levels hide details at lower levels
  – Example: files are an abstraction of a disk
• Virtualization is a level of indirection
  – Similar to abstraction, except details are not necessarily hidden
  – Example: construct virtual disks
• Virtual machines (VMs)
  – Create logical structures that seem to operate just like the physical machine
[Figure labels: file, abstraction, virtualization]
What is the “Machine”?
• Many different perspectives:
  – OS developer
    • Full ISA, devices, …
  – Compiler developer
    • User ISA and OS calls (ABI)
  – Application programmer
    • User ISA and library calls (API)
• Leads to many types of VMs
  – System VMs (our focus today)
    • IBM VM/360, VMware, Xen
  – Process VMs
    • IA-32 EL, FX!32, Dynamo
  – HLL VMs
    • JVM, CLR
  – Co-designed VMs
    • Crusoe and IBM Daisy
[Figure: the system stack. Application Programs, Libraries, Operating System, Execution Hardware, Memory Translation, System Interconnect (bus), I/O Devices and Networking, Main Memory]
System Virtual Machines (VMs)
[Figure: Without VMs, a single OS owns all hardware resources: apps run on one operating system with device drivers on the physical host hardware. With VMs, multiple OSes share hardware resources: VM0 and VM1 each run apps on a guest OS, on top of a Virtual Machine Monitor (VMM), a new layer of software, on the physical host hardware (processors, memory, graphics, network, storage, keyboard/mouse).]
• A Virtual Machine Monitor (VMM) honors existing hardware interfaces to create virtual copies of a complete hardware system
  – Full ISA, memory, devices
Motivation for System VMs
[Figure: three panels, Workload Isolation, Workload Aggregation, and Workload Migration; each shows apps and OSes placed on VMMs over hardware HW1 and HW2]
• Manageability, Reliability, Availability
  – Server consolidation, staged upgrade deployment, failure confinement, …
• Security
  – Encapsulate untrusted SW
  – Separate environment for trusted SW
• Keep old SW running forever with old HW
Virtualization Data Center
[Figure: a large number of VMs, each an App on an OS, running above the distributed virtualization layer]
• Distributed Virtualization Layer – Map virtual machines to hardware
VM Terminology
• Host: the physical platform (HW + host OS in some cases)
• Guest: the additional platforms that run on the VM (OS, apps, …)
• Virtual machine monitor (VMM)
  – The thin layer of software that supports virtualization
  – Also known as hypervisor
[Figure: Guest (Applications, OS) on top of the VMM (virtualizing software), on top of the Host (hardware "machine")]
VMM Architecture Options
[Figure: two panels. Hypervisor Architecture: VM0 … VMn (guest OS and apps) run on a hypervisor that contains device models (top) and device drivers (bottom), on host HW. Hosted Architecture: VM0 … VMn (guest OS and apps) run with a user-level VMM and device models alongside user apps, on a host OS with device drivers and a VMM “kernel”, on host HW.]
• Hypervisor architecture provides its own device drivers and services
• Hosted architecture leverages device drivers and services of a “host OS”
VM Compatibility Options
• Full virtualization (e.g., IBM, VMware)
  – Virtualization is transparent to guest OS and apps
  – Binary-level compatibility without any changes at all
  – Advantages: works with any binary, no need to change OS
  – Disadvantage: potential performance issues
    • But not always, more on this later
• Paravirtualization (e.g., Xen)
  – Virtualization is transparent to apps but not to guest OS
    • Must modify guest OS to use API defined by VMM
    • For Xen, they changed 3K lines of Linux code
  – Advantage: addresses performance issues
  – Disadvantage: does not work for proprietary OSes
Basic Virtualization Requirements
• A VMM must meet three conditions:
  – Isolation (safety)
    • Protect itself from guest software
    • Isolate guest software stacks (OS + apps) from one another
  – Equivalency (fidelity)
    • Programs should behave exactly the same with/without virtualization
  – Efficiency (performance), hopefully
• To achieve this, the VMM must fully control access to
  – CPUs, memory, and all I/O devices
• Ways that a VMM can share resources between VMs
  – Time multiplexing
  – Resource partitioning
  – Mediating hardware interfaces
(1) Time Multiplexing
[Figure: VM0 and VM1 on the VMM, sharing one processor]
• VM is allowed direct access to resource for a period of time before being context switched to another VM (e.g., CPU resource)
  – Devil is in the details
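As a rough illustration of time multiplexing, the sketch below round-robins a single CPU between VMs using a fixed time quantum. The function name `run_round_robin`, the work amounts, and the quantum are all assumptions made for this example; a real VMM scheduler must also save and restore the full guest CPU state on every switch.

```python
from collections import deque

def run_round_robin(vm_work, quantum):
    """Time-multiplex one CPU across VMs (toy model, not a real scheduler).

    vm_work maps a VM name to the number of time units it still needs.
    Returns the schedule as a list of (vm, units_run) slices.
    """
    ready = deque(vm_work.items())
    schedule = []
    while ready:
        vm, remaining = ready.popleft()
        ran = min(quantum, remaining)      # VM owns the CPU for this slice
        schedule.append((vm, ran))
        if remaining > ran:
            # Context switch: the VM goes to the back of the queue.
            ready.append((vm, remaining - ran))
    return schedule

schedule = run_round_robin({"VM0": 5, "VM1": 3}, quantum=2)
print(schedule)
```

Each tuple in the printed schedule is one slice during which a VM had direct access to the processor.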
(2) Resource Partitioning
[Figure: VM0 and VM1 on the VMM, which applies a remap / protection mechanism in front of display, storage, and memory]
• VMM allocates “ownership” of physical resources to VMs
  – Typically involves some remapping and protection mechanism
  – Examples: physical memory, disk partitions, graphical display
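One simple way to picture resource partitioning with remapping and protection is a base-and-bound scheme over physical memory. The sketch below is only an assumption-laden toy: the `Partition` class, its field names, and the address ranges are hypothetical, and real VMMs remap at page granularity.

```python
class Partition:
    """A contiguous slice of host memory owned by one VM (hypothetical model)."""

    def __init__(self, base, size):
        self.base = base
        self.size = size

    def remap(self, guest_addr):
        # Protection: reject anything outside the VM's own partition.
        if not 0 <= guest_addr < self.size:
            raise PermissionError(f"guest address {guest_addr:#x} out of bounds")
        # Remapping: guest address 0 lands at the partition base.
        return self.base + guest_addr

partitions = {
    "VM0": Partition(base=0x0000, size=0x4000),
    "VM1": Partition(base=0x4000, size=0x4000),
}
print(hex(partitions["VM1"].remap(0x10)))
```

Both VMs can use guest address 0x10, yet they reach different host locations and cannot touch each other's partition.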
(3) Mediating Hardware Interfaces
[Figure: VM0 and VM1 on the VMM, which owns the network and keyboard / mouse devices]
• VMM retains direct ownership of physical resource
  – VMM hosts device driver as well as a virtualized device interface
  – Virtual interface can be the same as or different from the physical device
When is an Architecture Virtualizable? (Popek & Goldberg 1974)
• Basic requirement: at least two execution modes (user & kernel)
• Additional requirement: all sensitive instructions must be privileged
  – Sensitive instructions: those that change the HW configuration (allocations, mappings, …) or whose outcome depends on the HW configuration
    • E.g., write TLB or read processor mode
  – Privileged instructions: if executed in user mode, they trap to kernel mode
• Notes
  – There can be privileged instructions that are not sensitive
  – Memory accesses must go through a privileged translation stage
    • E.g., paging or segmentation
• Optimizations: an architecture can include further support for VMs
  – More on this in a few slides
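The "all sensitive instructions must be privileged" condition is a subset test, which the sketch below spells out. The instruction names form a made-up toy ISA chosen for illustration (only POPF is taken from the x86 discussion in this lecture), and the helper names are assumptions.

```python
def unvirtualizable_instructions(sensitive, privileged):
    """Return the sensitive instructions that do NOT trap in user mode."""
    return set(sensitive) - set(privileged)

def is_classically_virtualizable(sensitive, privileged):
    """Popek & Goldberg condition: sensitive must be a subset of privileged."""
    return not unvirtualizable_instructions(sensitive, privileged)

# Hypothetical toy ISA, for illustration only.
sensitive = {"write_tlb", "read_mode", "popf"}
privileged = {"write_tlb", "read_mode", "halt"}   # "halt": privileged, not sensitive

print(is_classically_virtualizable(sensitive, privileged))
print(unvirtualizable_instructions(sensitive, privileged))
```

Here the toy ISA fails the test because `popf` is sensitive but executes in user mode without trapping, so a VMM never gets the chance to intervene.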
System Virtualization Process with Unmodified Guests
• On any sensitive case (traps, interrupts, system calls)
  – Transfer to VMM
  – VMM determines appropriate guest OS
  – VMM transfers to guest OS
• Guest performs privileged operation
  – Trap to VMM
  – VMM reads/modifies guest state
  – May modify shadow state
  – Returns to guest
• Guest OS “returns” to user app
  – Transfer to VMM
  – VMM bounces return back to guest app
• Notes:
  – Traps can be very expensive (e.g., 3K cycles)
  – Reducing the frequency of traps is critical
    • E.g., paravirtualization + other techniques
[Figure: the application issues a system call/trap, which enters the VMM at the vector location; the VMM forwards to the guest OS at its virtual vector location; when the guest OS executes a privileged operation, the VMM checks privileges, performs the operation, and returns to the next instruction]
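The trap-and-emulate flow can be sketched as a loop in which de-privileged guest code runs directly until it hits a privileged operation, at which point control transfers to the VMM. Everything named here (`PRIVILEGED_OPS`, `PrivilegedTrap`, `vmm_run`, the `virtual_cpu` dictionary) is a hypothetical stand-in; real hardware delivers the trap, not a Python exception.

```python
PRIVILEGED_OPS = {"cli", "sti"}   # toy subset, for illustration only

class PrivilegedTrap(Exception):
    """Models the hardware trap raised when de-privileged code runs a privileged op."""
    def __init__(self, op):
        super().__init__(op)
        self.op = op

def guest_execute(op):
    # The guest OS runs de-privileged, so privileged ops trap to the VMM.
    if op in PRIVILEGED_OPS:
        raise PrivilegedTrap(op)
    return f"native {op}"

def vmm_emulate(virtual_cpu, op):
    # The VMM applies the effect to the guest's *virtual* state only.
    if op == "cli":
        virtual_cpu["IF"] = 0
    elif op == "sti":
        virtual_cpu["IF"] = 1
    return f"emulated {op}"

def vmm_run(virtual_cpu, ops):
    log = []
    for op in ops:
        try:
            log.append(guest_execute(op))
        except PrivilegedTrap as trap:
            log.append(vmm_emulate(virtual_cpu, trap.op))  # then return to guest
    return log

virtual_cpu = {"IF": 1}
log = vmm_run(virtual_cpu, ["add", "cli", "mov"])
print(log, virtual_cpu)
```

Only the `cli` takes the expensive path through the VMM, which is why reducing trap frequency matters so much for performance.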
Which Architectures Meet Popek’s Requirements? Which Architectures are Virtualizable?
• Power architecture (IBM): Yes, yes
  – Hypervisor mode in addition to user & supervisor
  – Real-mode region (RMR) base register to support relocation of addresses that guest OS believes to be unmapped
  – Power5 and Power6 systems always run a VMM
• Sparc architecture (Sun): Yes, yes
  – Similar to IBM’s approach (at least in philosophy)
  – Sparc v9 systems always run a VMM
• x86 architecture (Intel, AMD): No, yes
  – There are ~17 sensitive but not privileged instructions in x86
  – So, how do you virtualize such an architecture?
x86 virtualization problems
• Unsafe/privileged operations don’t always trap
  – Example: POPF (different semantics in different rings)
• Privilege level is visible to software
  – Example: MOV ax, cs
• Traps destroy state
  – Example: traps reload segment descriptor registers
  – Beware of side effects on hidden state
• No place to hide VMM
  – Example: need to steal part of the address space for monitor
Dynamic Binary Translation (DBT) for Full Virtualization
• Goal: on-the-fly translation of OS code to replace sensitive but unprivileged instructions
  – As the VMM executes OS code, patch problem areas
  – No need to do anything for user code
  – Extensively used in original VMware VMM
• Opportunity: optimize while translating
  – Translation sequences can be faster than native:
    • cli vs. vcpu.flags.IF := 0
  – Avoid privileged instruction traps
• Example: rdtsc
  – Trap-and-emulate: 2030 cycles
  – Callout-and-emulate (paravirtualization): 1254 cycles
  – BT emulation: 216 cycles (but TSC value is stale)
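To make the "cli becomes an update of the virtual CPU flags" idea concrete, here is a toy translator that rewrites problem instructions into operations on virtual CPU state and passes everything else through unchanged. The instruction strings, the `translate_block` and `execute` helpers, and the tuple encoding are assumptions for illustration; a real translator works on x86 machine code and caches translated blocks.

```python
def translate_block(instructions):
    """Rewrite problem instructions; pass all other instructions through."""
    translated = []
    for insn in instructions:
        if insn == "cli":
            translated.append(("set_vcpu_if", 0))   # no trap, just a store
        elif insn == "sti":
            translated.append(("set_vcpu_if", 1))
        else:
            translated.append(("native", insn))
    return translated

def execute(translated, vcpu):
    """Run a translated block; returns the instructions that ran natively."""
    ran_natively = []
    for kind, arg in translated:
        if kind == "set_vcpu_if":
            vcpu["IF"] = arg        # update the virtual interrupt flag only
        else:
            ran_natively.append(arg)
    return ran_natively

vcpu = {"IF": 1}
block = translate_block(["mov", "cli", "add"])
print(block)
print(execute(block, vcpu), vcpu)
```

Because the translated `cli` is an ordinary memory update, no trap into the VMM is needed for it.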
CPU Virtualization: General Principles
[Figure: VM0 and VM1 on the VMM, sharing one processor]
• To virtualize a CPU, a VMM must retain control over:
  – Accesses to privileged state (control regs, debug regs, etc.)
  – Exceptions (page faults, machine-check exceptions, etc.)
  – Interrupts and interrupt masking
  – Address translation (via page tables)
  – CPU access to I/O (via I/O ports or memory-mapped IO)
Memory Virtualization: General Principles
[Figure: VM0 and VM1 each have a guest OS with its own CR3, page directory (PD), and page tables (PT); VMM memory virtualization sits between them and the host hardware TLB and memory]
• Guest OS expects to control address translation
  – Allocates memory, page tables, manages TLB consistency, etc.
• But, VMM must have ultimate control over physical memory
  – Must map guest-physical address space to host-physical space
A Case Study: IA-32 Address Translation
[Figure: IA-32 paging structures. CR3 points to the page directory (PD); page-directory entries (PDEs) point to page tables (PTs); page-table entries (PTEs) point to page frames (F). The TLB caches VPN, PFN, and access bits (R/W, U/S, A, D). Paging-related control registers: CR0 (PE, PG, WP), CR4 (PAE, PSE), CR2 (faulting address). Hardware sets the A / D bits.]
• IA-32 defines a hierarchical page-table structure
  – Defines linear-to-physical address translation
  – After page-table walk, page-table entries (PTEs) are cached in a hardware TLB
• IA-32 address translation is configured via control registers (CR3, etc.)
• Invalidation of PTEs is signaled by the OS via the INVLPG instruction
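The hierarchical walk can be sketched for the classic case of 32-bit linear addresses with 4 KB pages and no PAE, where the address splits into a 10-bit page-directory index, a 10-bit page-table index, and a 12-bit offset. Modeling the page directory and page tables as Python dictionaries is an assumption of this sketch, as are the function names; permission bits and the TLB are left out.

```python
PAGE_SIZE = 4096  # 4 KB pages, non-PAE 32-bit paging

def split_linear(addr):
    """Split a 32-bit linear address into (PD index, PT index, offset)."""
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF

def walk(page_directory, addr):
    """Two-level page-table walk; dicts stand in for in-memory tables."""
    pd_index, pt_index, offset = split_linear(addr)
    page_table = page_directory.get(pd_index)
    if page_table is None:
        raise LookupError("page fault: PDE not present")
    frame = page_table.get(pt_index)
    if frame is None:
        raise LookupError("page fault: PTE not present")
    return frame * PAGE_SIZE + offset

# PD entry 1 -> page table whose entry 3 maps to physical frame 0x1234.
page_directory = {1: {3: 0x1234}}
print(hex(walk(page_directory, 0x00403025)))
```

A missing entry at either level models the page fault whose faulting address the hardware reports in CR2.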
Virtualizing Page Tables: Some Options
• Option 1: Protect access to guest-OS page tables (PTs)
  – Use paging protections or binary translation to detect changes
  – Upon write access, substitute remapped phys address in PTE
  – Also need VM exit on page-table reads (to report original PTE value to guest OS)
  – High overheads
• Option 2: Make a shadow copy of guest page tables
  – Guest OS freely changes its page tables
  – VM exit occurs whenever CR3 changes
  – VMM copies contents of guest page tables to active page tables
  – Copy operation is analogous to a TLB refill, hence: “Virtual TLB”
Virtual TLB: Basic Idea
[Figure: inside the VM, the guest OS has its own CR3 pointing to the guest page table (PD, PTs); in the VMM, the VTLB consists of the hardware CR3 pointing to the active page table (PD, PTs), which feeds the processor TLB]
• VTLB = Processor TLB + Active Page Table
  – VMM initializes an empty VTLB and starts guest execution
  – When guest accesses memory, #PF occurs, and is sent to VMM
  – VMM copies needed translation (VTLB refill) and resumes guest
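The refill-on-fault behavior of a virtual TLB can be modeled with three mappings: the guest page table (virtual page to guest-physical page), the VMM's guest-to-host map, and the initially empty active table. The `VirtualTLB` class and its method names below are hypothetical, and pages are plain integers rather than real page-table entries.

```python
class VirtualTLB:
    """Toy shadow page table: filled lazily, flushed when the guest loads CR3."""

    def __init__(self, guest_page_table, guest_to_host):
        self.guest_page_table = guest_page_table   # virtual page -> guest-physical page
        self.guest_to_host = guest_to_host         # guest-physical -> host-physical page
        self.active = {}                           # starts empty, like a cold TLB
        self.refills = 0

    def access(self, vpn):
        if vpn not in self.active:                 # models the #PF sent to the VMM
            self.refill(vpn)
        return self.active[vpn]

    def refill(self, vpn):
        if vpn not in self.guest_page_table:
            raise LookupError("true guest page fault: forward to guest OS")
        gpn = self.guest_page_table[vpn]
        self.active[vpn] = self.guest_to_host[gpn]  # copy the needed translation
        self.refills += 1

    def guest_cr3_write(self, new_guest_page_table):
        # VM exit on CR3 change: switch guest tables and drop stale translations.
        self.guest_page_table = new_guest_page_table
        self.active.clear()

vtlb = VirtualTLB({0: 5, 1: 6}, {5: 40, 6: 41})
print(vtlb.access(0), vtlb.access(0), vtlb.refills)
```

The second access to the same page hits the active table, so only one refill is counted.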
IO Virtualization: General Principles
[Figure: two panels. Hypervisor Architecture: VM0 … VMn (guest OS and apps) on a hypervisor with device models (top) and device drivers (bottom). Hosted Architecture: VM0 … VMn (guest OS and apps) with a user-level VMM and device models alongside user apps, on a host OS with device drivers and a ring-0 VMM “kernel”. In both, the virtual device interface and model sit above the physical device interface and driver.]
• Virtual device model presents interface to guest operating system
• Physical device driver programs and responds to actual device hardware
Virtual and Physical Device Interfaces
[Figure: in VM0, the guest OS and apps sit on the virtual device interface and model, which sits on the physical device interface and driver, which drives the physical device; one panel shows requests flowing down, the other shows device activity flowing back up]
Guest device driver programs the “virtual device” interface:
• Device configuration accesses
• IO-port accesses
• Memory-mapped device registers
Virtual device model proxies accesses to the physical device driver:
• Possible translation of commands
• Translation of DMA addresses
Device driver programs the actual physical IO device:
• Device configuration
• IO-port and MMIO accesses
Physical device:
• DMA transactions to host physical memory
• Physical device interrupts
Virtual device model proxies device activity back to guest OS:
• Copying (or translation) of DMA buffers
• Injection of “virtual interrupts”
IO Virtualization Issues and Solutions
• How does one intercept accesses to memory-mapped IO?
  – Mark pages for MMIO and cause a trap into the VMM
• How do you interrupt the guest on demand?
  – Virtualize pending interrupt register
• What if the guest OS cannot be interrupted currently?
  – It assumes it has interrupts off
  – Allow guest OS to avoid interrupts until it has enabled them again
  – Expose interrupt enable mask to VMM
• Eliminate copying, interrupts, etc.
  – Same techniques as with accelerating TCP/IP
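A virtualized pending-interrupt register together with an exposed interrupt-enable mask might behave like the sketch below: the VMM queues interrupts while the guest believes interrupts are off and delivers them once the guest re-enables them. The class name, method names, and vector numbers are assumptions for illustration only.

```python
class VirtualInterruptController:
    """Toy model of a virtual pending-interrupt register plus enable flag."""

    def __init__(self):
        self.guest_if = True    # interrupt-enable mask, visible to the VMM
        self.pending = []       # virtual pending-interrupt register
        self.delivered = []     # interrupts actually injected into the guest

    def raise_interrupt(self, vector):
        if self.guest_if:
            self.delivered.append(vector)   # inject immediately
        else:
            self.pending.append(vector)     # guest has interrupts off: defer

    def guest_cli(self):
        self.guest_if = False

    def guest_sti(self):
        self.guest_if = True
        while self.pending:                 # deliver deferred interrupts in order
            self.delivered.append(self.pending.pop(0))

vic = VirtualInterruptController()
vic.guest_cli()
vic.raise_interrupt(32)
vic.raise_interrupt(33)
print(vic.delivered, vic.pending)
vic.guest_sti()
print(vic.delivered, vic.pending)
```

Nothing is delivered while the guest has interrupts disabled, which preserves the guest's assumption that it cannot be interrupted in that window.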
VM Performance: Xen Paravirtualization System
• CPU-intensive applications run at full speed (few traps)
• I/O-intensive applications experience slowdown (lots of traps)
• I/O-bound applications have many traps but can hide the overhead
• Sources of overhead: more instructions, more cache misses, more TLB misses
VM Performance: I/O Intensive Cases
• Additional NICs lead to higher slowdown with virtualization
• Will discuss “Xen driver VM” shortly
VM Performance: Sources of Overhead
• Extra instructions & L2 misses due to indirection & VM bookkeeping
• Extra TLB misses due to lack of super-page support
Extra Architecture Support for VM
Case Study: IA-32 CPU Virtualization
• IA-32 provides 4 privilege levels (rings), but
  – Segmentation: distinguishes between all 4 rings
  – Paging: user mode in ring 3, supervisor mode in ring 0, 1, or 2
• Can maintain CPU control via “de-privileging”
  – VMM runs in the most privileged ring 0
  – Guest OS runs in a higher than usual ring (i.e., not 0)
• Two options
  – Guest OS in ring 3
    • Lose ring protections between guest OS / apps
  – Guest OS in ring 1
    • Can’t use paging to protect VMM from guest OS
    • VMM forced to use segment-based protections
Patching IA-32 for Virtualization (VT-x)
• A new operating mode enabled with VMXON/VMXOFF
  – Root operation: fully privileged, intended for VMM
  – Non-root operation: not fully privileged, intended for guest
• What got fixed
  – Guest OS still runs in ring 0
  – After VMXON, all sensitive instructions can cause traps
    • VMM selects which events should trap
  – Set of control registers per “virtual CPU”
    • Describes state, behavior on entry/exit, interrupt delivery, etc.
    • Allows fast VMM ↔ guest OS interactions
  – Some support for virtualization of virtual memory management
  – Some support for virtualization of I/O
    • Interrupt delivery, marking of pages for memory-mapped IO
Future IA-32 Support for VMs
• Extended page tables (EPT)
  – Architectural support for page table virtualization
  – Translation process
    • First: guest virtual ⇒ guest physical address
    • Then: guest physical ⇒ host physical address
  – EPT is also a multi-level page table structure managed by HW
  – Minimize VMM transitions for page table manipulation
• Virtual-processor identifiers (VPIDs)
  – Currently, TLB flush needed on every guest-VMM transition
  – Tag TLB entries with VPID to avoid flushes
    • Already in AMD’s version of VM support
  – Per-VPID TLB flushing
• A few others
  – Guest-preemption timer
    • Extra timer used to preempt guests without affecting interrupt system
  – Descriptor-table exiting
  – NMI-window exiting
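The two-stage EPT translation and the VPID-tagged TLB can be sketched together as follows. This is a deliberately flattened model: both tables are single-level dictionaries of page numbers, whereas real EPT is multi-level and is also consulted for every guest-physical access made during the guest page-table walk. The names `translate` and `TaggedTLB` are assumptions.

```python
PAGE_SIZE = 4096

def translate(guest_page_table, ept, guest_virtual):
    """Guest virtual -> guest physical -> host physical (flattened toy model)."""
    vpn, offset = divmod(guest_virtual, PAGE_SIZE)
    guest_frame = guest_page_table[vpn]   # stage 1: controlled by the guest OS
    host_frame = ept[guest_frame]         # stage 2: controlled by the VMM
    return host_frame * PAGE_SIZE + offset

class TaggedTLB:
    """TLB whose entries carry a VPID, so VM switches need no full flush."""

    def __init__(self):
        self.entries = {}

    def insert(self, vpid, vpn, host_frame):
        self.entries[(vpid, vpn)] = host_frame

    def lookup(self, vpid, vpn):
        return self.entries.get((vpid, vpn))

    def flush_vpid(self, vpid):
        # Per-VPID flush: drop only the entries that belong to one guest.
        self.entries = {k: v for k, v in self.entries.items() if k[0] != vpid}

print(translate({2: 7}, {7: 99}, 2 * PAGE_SIZE + 16))

tlb = TaggedTLB()
tlb.insert(1, 2, 99)
tlb.insert(2, 2, 55)
tlb.flush_vpid(1)
print(tlb.lookup(1, 2), tlb.lookup(2, 2))
```

Both guests can cache a translation for the same virtual page number, and flushing one guest's entries leaves the other's intact.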
HW Support Vs. Binary Translation
• Binary Translation VMM:
  – Converts traps to callouts
    • Callouts faster than trapping
  – Faster emulation routine
    • VMM does not need to reconstruct state
  – Avoids callouts entirely
• Hardware VMM:
  – Preserves code density
  – No precise exception overhead
  – Faster system calls
Compute-bound Benchmarks
• Bottom line: little difference for SPEC
Mixed Benchmarks
Costs of Basic Operations
Issues Looking Forward
• Dealing with balance of HW & software features
  – HW support for virtualization or dynamic binary translation?
• I/O virtualization (disk, net, graphics, …)
  – How to balance performance & flexibility
  – Passthrough I/O for devices with “multiple personalities”
    • But how about VM mobility?
  – I/O devices controlled by separate VMs
    • Overhead?
• Recursive virtualization
  – The VMM of today will be the guest OS in a few years
  – Can you virtualize the resources used by the VMM?
Summary of Virtual Machines
• Virtual machines: technique to virtualize a system
• Benefits
  – Security, fault-tolerance, load-balancing, …
  – Support for legacy ISAs, OSes, etc.
  – Co-development/optimization of architecture & software stack
• Architectural support for virtual machines
  – Two execution modes, no user-level sensitive instructions
  – VMM and guest OS control state tracking
  – Flexible mechanisms for virtual memory, interrupt delivery, …