Fall 2014 :: CSE 506 :: Section 2 (PhD)
Virtual Machines
Heyi Li and Zhen Cao
(Some of the figures are from the Internet)
Outline
• Basic concepts
• When virtual is better
• Implementation
• When virtual is harder
Basic Concepts
• What is a virtual machine?
– An emulation of a particular computer system
• System VM vs. Process VM
– System VM: supports the execution of a complete OS (Xen)
– Process VM: supports the execution of a single process (JVM)
• Hypervisor (VMM)
– Computer software that creates and runs VMs
• Type I & II Hypervisor
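[Figure: Type 1 (bare-metal): the hypervisor runs directly on the hardware (host) and VM1, VM2 (guests) run on top of it. Examples: VMware ESX, Microsoft Hyper-V, Xen.]
[Figure: Type 2 (hosted): the hypervisor runs alongside ordinary processes on a hosting OS (host), with VM1, VM2 (guests) on top of it. Examples: VMware Workstation, Microsoft Virtual PC, Sun VirtualBox, QEMU, KVM.]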
Applications and Benefits
• Energy efficiency
• Reducing maintenance costs
• Rapid deployment
• Security
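[Figure: Server Consolidation: machines HW0 … HWn, each running its own OS and App, are consolidated into VM1 … VMn (each with OS and App) on a single HW under a VMM.]
[Figure: Test and Development: VMs, each with its own OS and App, run on one HW under a VMM.]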
Virtualization Requirements
• Fidelity
– Software on the VM executes identically to its execution on hardware, barring time effects
• Performance
– Performance overhead must be small
• Safety
– The VMM manages all hardware resources
Obstacles for x86
• Trap-and-emulate
– Requires that all virtualization-sensitive instructions are also privileged instructions
• x86 architecture once thought to be not fully virtualizable
– Certain sensitive instructions silently behave differently, instead of trapping, when run in unprivileged mode (POPF)
– Certain unprivileged instructions can access privileged state (SGDT)
• Techniques to address inability to virtualize x86
– Full virtualization w/o hardware support: Binary Translation (VMware ESX)
– Paravirtualization (Xen)
– Hardware-assisted virtualization
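The trap-and-emulate condition above can be checked mechanically: an instruction set qualifies only if no sensitive instruction escapes trapping. The following is a minimal sketch under stated assumptions: the `Instr` record, the `trap_and_emulate_ok` helper, and the four-entry `TOY_X86` table are hypothetical illustrations, not a real ISA description. The flags for POPF and SGDT follow the slide (classic x86 without later extensions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    sensitive: bool   # reads or changes privileged machine state
    privileged: bool  # traps when executed outside the most-privileged mode


def trap_and_emulate_ok(isa):
    """Return (ok, offenders); ok is True when every sensitive instruction also traps."""
    offenders = [i.name for i in isa if i.sensitive and not i.privileged]
    return (not offenders, offenders)


# Hypothetical, heavily abbreviated instruction table for illustration only.
TOY_X86 = [
    Instr("MOV to CR3", sensitive=True, privileged=True),
    Instr("POPF", sensitive=True, privileged=False),   # silently ignores some flag changes
    Instr("SGDT", sensitive=True, privileged=False),   # exposes privileged state
    Instr("ADD", sensitive=False, privileged=False),
]

if __name__ == "__main__":
    ok, offenders = trap_and_emulate_ok(TOY_X86)
    print("trap-and-emulate possible:", ok)
    print("sensitive but unprivileged:", offenders)
```

The offenders list is exactly the set of instructions that binary translation, paravirtualization, or hardware assistance has to deal with.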
Binary Translation
• Binary: input is binary x86 code, not source code
• On-the-fly: dynamic and on demand
• Only need to translate kernel mode code
– User mode: direct execution
• Even for kernel mode, most instruction sequences don’t change
• Instructions that do change:
– Indirect control flow: call/ret, jmp
– PC-relative addressing
– Privileged instructions
1. A translation unit (TU) stops at 12 instructions or a control-flow instruction
2. Translated into Compiled Code Fragments (CCF) and cached
3. Track the translation cache with a hash table
4. Execute the CCF
5. Continuation (either fall-through or taken-branch)
[Figure: a TU enters the Binary Translator, which emits a CCF into the Translation Cache; a hash table maps PC [x] to ([x], [y]); the CCF is then executed. Arrows are numbered 1 to 5 to match the steps.]
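Steps 1 to 3 can be sketched in a few lines. This is a toy model under stated assumptions, not VMware's translator: guest code is a list of mnemonic strings, the guest PC is a list index, and "translation" just prefixes each mnemonic with `t_`. The names `form_translation_unit`, `TranslationCache`, and `lookup` are hypothetical.

```python
MAX_TU_LEN = 12                         # a TU stops at 12 instructions ...
CONTROL_FLOW = {"jmp", "call", "ret"}   # ... or at a control-flow instruction


def form_translation_unit(code, pc):
    """Collect instructions starting at pc until the TU limit or a control-flow op."""
    unit = []
    while pc < len(code) and len(unit) < MAX_TU_LEN:
        op = code[pc]
        unit.append(op)
        pc += 1
        if op in CONTROL_FLOW:
            break
    return unit


class TranslationCache:
    """Hash table from guest PC to its compiled code fragment (CCF)."""

    def __init__(self, code):
        self.code = code
        self.table = {}   # guest PC -> CCF
        self.misses = 0

    def lookup(self, pc):
        ccf = self.table.get(pc)
        if ccf is None:
            # Miss: translate on demand and cache the result.
            self.misses += 1
            tu = form_translation_unit(self.code, pc)
            ccf = tuple("t_" + op for op in tu)  # stand-in for real translation
            self.table[pc] = ccf
        return ccf


if __name__ == "__main__":
    guest_code = ["mov"] * 20 + ["jmp"]
    cache = TranslationCache(guest_code)
    print(len(cache.lookup(0)), len(cache.lookup(12)), cache.misses)
    cache.lookup(0)  # second lookup hits the cache
    print(cache.misses)
```

Keying the table by guest PC is what makes step 5 cheap: a continuation only needs the next guest PC to find (or create) the next fragment.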
Memory
[Figure: three address spaces, each spanning 0 to 4GB: Guest Virtual Address (gVA) space, Guest Physical Address (gPA) space, and Host Physical Address (hPA) space. The guest page table (visible to guest OS) maps gVA to gPA; the VMM PhysMap (Pmap, maintained by VMM) maps gPA to hPA; the shadow page table (resides in hardware and maintained by VMM) maps gVA directly to hPA.]
Shadow Page Tables
• Translation from gVA to hPA directly by hardware
• If not present, page fault generated by hardware
• Hidden page fault: the mapping is present in the guest page table
– VMM walks the guest page table to determine the gPA backing that gVA
– VMM allocates a physical page, and adds the mapping to Pmap
– Updates the shadow page table
• True page fault: the mapping is not present in the guest page table
– VMM generates an exception on the virtual CPU
– Resume executing on the first instruction of the guest exception handler
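The hidden versus true page fault distinction can be sketched as follows. This is a simplified model under stated assumptions: page tables are flat dictionaries keyed by page number, host pages are handed out by a counter, and a true fault is represented by raising an exception instead of injecting one into a virtual CPU. `ShadowMMU` and `TruePageFault` are hypothetical names.

```python
class TruePageFault(Exception):
    """The guest page table has no mapping; the guest OS must handle the fault."""


class ShadowMMU:
    def __init__(self, guest_pt):
        self.guest_pt = guest_pt  # gVA page -> gPA page (visible to the guest OS)
        self.pmap = {}            # gPA page -> hPA page (maintained by the VMM)
        self.shadow = {}          # gVA page -> hPA page (what the hardware walks)
        self.next_free = 0        # next unallocated host page (toy allocator)
        self.hidden_faults = 0

    def translate(self, gva_page):
        if gva_page in self.shadow:
            return self.shadow[gva_page]      # hit: no VMM involvement
        return self._page_fault(gva_page)     # miss: hardware raises a page fault

    def _page_fault(self, gva_page):
        if gva_page not in self.guest_pt:
            # True page fault: forward to the guest's exception handler.
            raise TruePageFault(gva_page)
        # Hidden page fault: the guest never sees it.
        self.hidden_faults += 1
        gpa_page = self.guest_pt[gva_page]    # walk the guest page table
        if gpa_page not in self.pmap:         # back the gPA with a host page
            self.pmap[gpa_page] = self.next_free
            self.next_free += 1
        hpa_page = self.pmap[gpa_page]
        self.shadow[gva_page] = hpa_page      # update the shadow page table
        return hpa_page


if __name__ == "__main__":
    mmu = ShadowMMU({0x10: 0x2})
    print(mmu.translate(0x10), mmu.hidden_faults)  # first access: hidden fault
    print(mmu.translate(0x10), mmu.hidden_faults)  # second access: shadow hit
```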
I/O Virtualization – Direct I/O Model
• Place drivers for high-performance I/O devices directly into the hypervisor
• Do not attempt to have the virtual hardware match the specific underlying hardware
• Virtualize selected, canonical I/O devices
• Problems
– Larger hypervisor
– Need to protect hypervisor from driver faults
[Figure: Full Virtualization: VM0 … VMn (guest OS and apps) run above a hypervisor that contains the I/O services and device drivers for the shared devices.]
Paravirtualization
CPU Virtualization
• Privilege levels in x86
– Ring 0: Xen
– Ring 1: guest OS
– Ring 3: user apps
• Isolation
– Guest user mode and guest kernel mode: page table “supervisor” bit (PTE_U)
– Guest OS and VMM: segmentation (a problem on x86-64, where segment limits are generally not enforced)
CPU Virtualization (cont.)
• Privileged instructions
– Replaced with hypercalls
– Requires modifying the guest OS source code
– Validated and executed by Xen (e.g., installing a new PT)
• Exceptions
– Handlers registered with Xen once; accepted (validated) if they do not require execution in ring 0
– Called directly without Xen intervention
– All syscalls from apps to guest OS handled this way (and executed in ring 1)
• Page fault handlers are special
– Faulting address can be read only in ring 0
– Xen reads the faulting address and passes it via the stack to the OS handler in ring 1
Memory Virtualization
• Physical memory
– At domain creation, hardware pages “reserved”
– Domain can increase/decrease its quota
– Xen does not guarantee that the hardware pages are contiguous
• Virtual memory
– Register guest OS page tables directly with MMU
– Guest OS allocates and initializes a page from its own memory reservation and registers it with Xen
– Every guest OS has its own address space
– Xen occupies the top 64MB of every address space, to save switching costs between address spaces on hypervisor calls
– Xen involved only in memory updates
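The idea that the guest owns its page tables while Xen only validates updates can be sketched like this. It is a toy model under stated assumptions, not Xen's actual hypercall interface: `Domain`, `ToyXen`, and `toy_pt_update` are hypothetical names, and the only validation rule modeled is "the target frame must belong to the requesting domain's reservation".

```python
class Domain:
    def __init__(self, name, frames):
        self.name = name
        self.frames = set(frames)  # hardware frames reserved for this domain
        self.page_table = {}       # virtual page -> hardware frame


class ToyXen:
    def toy_pt_update(self, dom, vpage, frame):
        """Validate, then apply, a page-table update requested through a hypercall."""
        if frame not in dom.frames:
            return False           # reject: the frame is outside the domain's reservation
        dom.page_table[vpage] = frame
        return True


if __name__ == "__main__":
    # Reserved frames need not be contiguous.
    dom1 = Domain("dom1", frames=[3, 7, 42])
    xen = ToyXen()
    print(xen.toy_pt_update(dom1, vpage=0, frame=7))   # accepted
    print(xen.toy_pt_update(dom1, vpage=1, frame=8))   # rejected
    print(dom1.page_table)
```

Reads of the page table need no hypervisor involvement in this model; only the update path goes through `ToyXen`.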
I/O Virtualization – Indirect I/O Model
• Uses a privileged virtual machine (Domain0) for all device drivers
• Simple interfaces for guest OSes
• Pros
– Higher security
• Cons
– Lower performance
[Figure: Paravirtualization: guest VMs (VM0 … VMn, guest OS and apps) reach the shared devices through service VMs that hold the I/O services and device drivers; the hypervisor sits underneath.]
Hardware-assisted Virtualization (HVM)
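Intel’s VT-x
• More-privileged mode (VMX root) for VMM
• Less-privileged mode for guest OS
• Eliminates ring de-privileging for the guest OS
[Figure: Virtual Machines (VMs), each with apps in ring 3 and an OS in ring 0, run above the VM Monitor (VMM) in VMX root mode; a VM Exit transfers control to the VMM and a VM Entry returns to the guest.]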
VM Control Structure (VMCS)
• Execution controls determine when exits occur
– Access to privileged state, occurrence of exceptions, etc.
– Flexibility provided to avoid unwanted exits
• Guest-state area
– Processor state saved into the guest-state area on VM exits and loaded on VM entries
• Host-state area
– Processor state loaded from the host-state area on VM exits
• Other
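The roles of the three VMCS areas can be illustrated with a toy state machine. This is a sketch under stated assumptions, not the hardware's behavior in detail: processor state is a plain dictionary, execution controls are a set of event names, and `ToyVMCS`, `ToyCPU`, `vm_entry`, and `guest_event` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVMCS:
    guest_state: dict = field(default_factory=dict)  # saved on exit, loaded on entry
    host_state: dict = field(default_factory=dict)   # loaded on exit
    exit_on: set = field(default_factory=set)        # execution controls: events that exit


class ToyCPU:
    def __init__(self):
        self.state = {}
        self.in_guest = False

    def vm_entry(self, vmcs):
        # VM entry: load processor state from the guest-state area.
        self.state = dict(vmcs.guest_state)
        self.in_guest = True

    def guest_event(self, vmcs, event):
        """Return True if the event caused a VM exit."""
        if event not in vmcs.exit_on:
            return False                       # guest keeps running, no exit
        # VM exit: save guest state, then load host state.
        vmcs.guest_state = dict(self.state)
        self.state = dict(vmcs.host_state)
        self.in_guest = False
        return True


if __name__ == "__main__":
    vmcs = ToyVMCS(guest_state={"rip": 0x1000},
                   host_state={"rip": 0xF000},
                   exit_on={"hlt"})
    cpu = ToyCPU()
    cpu.vm_entry(vmcs)
    cpu.state["rip"] = 0x1004                  # guest makes progress
    print(cpu.guest_event(vmcs, "add"))        # no exit configured for this event
    print(cpu.guest_event(vmcs, "hlt"))        # exit: state is swapped
    print(hex(vmcs.guest_state["rip"]), hex(cpu.state["rip"]))
```

Shrinking `exit_on` is the toy analogue of the "flexibility provided to avoid unwanted exits" bullet: fewer configured events means fewer transitions into the VMM.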
Extended Page Table (EPT)
• A new page-table structure, under the control of the VMM
– Defines mapping between GPA & HPA
– EPT base pointer (new VMCS field) points to the EPT page tables
– EPT (optionally) activated on VM entry, deactivated on VM exit
• Guest has full control over its own IA-32 page tables
– No VM exits due to guest page faults, INVLPG, or CR3 changes
[Figure: CR3 selects the guest page tables, which map a guest linear address to a guest physical address; the EPT base pointer (EPTP) selects the extended page tables, which map the guest physical address to a host physical address.]
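The two-stage translation can be sketched as two table lookups. This is a simplified model under stated assumptions: both tables are flat dictionaries keyed by page number with a 4 KiB page size, and the fact that real hardware also sends each guest page-table access through the EPT is ignored. `translate`, `GuestPageFault`, and `EPTViolation` are hypothetical names.

```python
PAGE_SIZE = 4096


class GuestPageFault(Exception):
    """Missing guest mapping: handled by the guest OS, no VM exit."""


class EPTViolation(Exception):
    """Missing EPT mapping: causes a VM exit to the VMM."""


def translate(guest_pt, ept, gla):
    """Map a guest linear address to a host physical address in two stages."""
    page, offset = divmod(gla, PAGE_SIZE)
    if page not in guest_pt:
        raise GuestPageFault(hex(gla))
    gpa_page = guest_pt[page]                  # stage 1: guest page tables (via CR3)
    if gpa_page not in ept:
        raise EPTViolation(hex(gpa_page * PAGE_SIZE + offset))
    hpa_page = ept[gpa_page]                   # stage 2: extended page tables (via EPTP)
    return hpa_page * PAGE_SIZE + offset


if __name__ == "__main__":
    guest_pt = {1: 5}   # guest linear page 1 -> guest physical page 5
    ept = {5: 9}        # guest physical page 5 -> host physical page 9
    print(hex(translate(guest_pt, ept, 1 * PAGE_SIZE + 0x2A)))
```

Because the guest edits `guest_pt` freely and the VMM edits only `ept`, neither needs to trap the other's updates, which is the contrast with shadow page tables.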
I/O Virtualization
[Figure: three I/O models side by side.
Full Virtualization: VM0 … VMn (guest OS and apps) run above a hypervisor that contains the I/O services and device drivers for the shared devices.
Paravirtualization: guest VMs (VM0 … VMn, guest OS and apps) use service VMs that hold the I/O services and device drivers for the shared devices, above the hypervisor.
Pass-through Model: each of VM0 … VMn (guest OS and apps) contains its own device drivers for its assigned devices, above the hypervisor.]
IOMMU
• Device pass through
– Directly assign a physical device to a particular guest OS
– Address space translation handled transparently
• Device isolation
– Safely map a device to a particular guest without risking the integrity of other guests
IOMMU (cont.)
• Translation Control Entry
– Translation from a DMA address to a host memory address
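DMA address translation and device isolation can be sketched with one translation table per device. This is a toy model under stated assumptions, not any vendor's table format: entries are dictionary items keyed by DMA page number with a 4 KiB page size, and `ToyIOMMU`, `map_page`, `translate`, and `DMAFault` are hypothetical names.

```python
PAGE_SIZE = 4096


class DMAFault(Exception):
    """The device issued a DMA to an address it has no translation entry for."""


class ToyIOMMU:
    def __init__(self):
        self.tables = {}  # device id -> {DMA page -> host page}

    def map_page(self, device, dma_page, host_page):
        """Install one translation entry for a device."""
        self.tables.setdefault(device, {})[dma_page] = host_page

    def translate(self, device, dma_addr):
        """Translate a DMA address issued by a device to a host memory address."""
        page, offset = divmod(dma_addr, PAGE_SIZE)
        table = self.tables.get(device, {})
        if page not in table:
            raise DMAFault((device, hex(dma_addr)))  # blocked: protects other guests
        return table[page] * PAGE_SIZE + offset


if __name__ == "__main__":
    iommu = ToyIOMMU()
    iommu.map_page("nic0", dma_page=0, host_page=100)  # nic0 is assigned to one guest
    print(hex(iommu.translate("nic0", 0x10)))
```

Isolation falls out of the per-device tables: a device assigned to one guest has no entries pointing at another guest's host pages, so a stray DMA faults instead of corrupting memory.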
Security Problems
• Transience
– Large numbers of machines appear and disappear from the network sporadically
• Diversity
– Long and painful upgrade cycles
• Identity
– Difficult to establish who owns a VM running on a particular physical host
• Mobility
– VMs can be easily copied over a network or carried on portable storage media
Discussion
Thanks!