virtual ization 101

Upload: -

Post on 03-Apr-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/28/2019 Virtual Ization 101

    1/50

    Copyright 2007 VMware, Inc. All rights reserved.

    MIT IAP CourseLecture #1: Virtualization 101

    Carl Waldspurger (SB SM 89 PhD 95)VMware R&D

    January 16, 2007

  • 7/28/2019 Virtual Ization 101

    2/50

    2Copyright 2007 VMware, Inc. All rights reserved.

    What is Virtualization?

    Virtual systems Abstract physical components using logical objects

    Dynamically bind logical objects to physical configurations

    Examples Network Virtual LAN (VLAN), Virtual Private Network (VPN)

    Storage Storage Area Network (SAN), LUN

    Computer Virtual Machine (VM), simulator

    vir tu al (adj): existing in essence or effect,though not in actual fact

  • 7/28/2019 Virtual Ization 101

    3/50

    3Copyright 2007 VMware, Inc. All rights reserved.

    Overview

    Virtual Machines

    Virtualization Approaches

    Processor Virtualization

    Additional Topics

  • 7/28/2019 Virtual Ization 101

    4/50

    4Copyright 2007 VMware, Inc. All rights reserved.

    Starting Point: A Physical Machine

    Physical Hardware Processors, memory, chipset,

    I/O bus and devices, etc. Physical resources often

    underutilized

    Software Tightly coupled to hardware Single active OS image

    OS controls hardware

  • 7/28/2019 Virtual Ization 101

    5/50

    5Copyright 2007 VMware, Inc. All rights reserved.

    What is a Virtual Machine?

    Hardware-Level Abstraction Virtual hardware: processors,

    memory, chipset, I/O devices, etc. Encapsulates all OS and

    application state

    Virtualization Software Extra level of indirection

    decouples hardware and OS

    Multiplexes physical hardwareacross multiple guest VMs

    Strong isolation between VMs Manages physical resources,

    improves utilization

  • 7/28/2019 Virtual Ization 101

    6/50

    6Copyright 2007 VMware, Inc. All rights reserved.

    VM Isolation

    Secure Multiplexing Run multiple VMs on

    single physical host Processor hardware

    isolates VMs, e.g. MMU

    Strong Guarantees Software bugs, crashes,

    viruses within one VMcannot affect other VMs

    Performance Isolation

    Partition system resources Example: VMware controls

    for reservation, limit, shares

  • 7/28/2019 Virtual Ization 101

    7/50

    7Copyright 2007 VMware, Inc. All rights reserved.

    VM Encapsulation

    Entire VM is a File OS, applications, data

    Memory and device state

    Snapshots and Clones Capture VM state on the fly

    and restore to point-in-time

    Rapid system provisioning,backup, remote mirroring

    Easy Content Distribution Pre-configured apps, demos

    Virtual appliances

  • 7/28/2019 Virtual Ization 101

    8/50

    8Copyright 2007 VMware, Inc. All rights reserved.

    VM Compatibility

    Hardware-Independent Physical hardware hidden

    by virtualization layer Standard virtual hardware

    exposed to VM

    Create Once, Run Anywhere No configuration issues Migrate VMs between hosts

    Legacy VMs Run ancient OS on new platform

    E.g. DOS VM drives virtual IDEand vLance devices, mapped tomodern SAN and GigE hardware

  • 7/28/2019 Virtual Ization 101

    9/50

    9Copyright 2007 VMware, Inc. All rights reserved.

    Common Virtualization Uses Today

    Server Consolidation and Containment Eliminate serversprawl by deploying systems into virtual machines that can runsafely and move transparently across shared hardware

    Test and Development Rapidly provision test anddevelopment servers; store libraries of pre-configured testmachines

    Enterprise Desktop Secure unmanaged PCs withoutcompromising end-user autonomy by layering a security policy insoftware around desktop virtual machines

    Business Continuity Reduce cost and complexity byencapsulating entire systems into single files that can bereplicated and restored onto any target server

  • 7/28/2019 Virtual Ization 101

    10/50

    10Copyright 2007 VMware, Inc. All rights reserved.

    Overview

    Virtual Machines

    Virtualization Approaches Virtual machine monitors (VMMs)

    Virtualization platform types

    Alternative system virtualizations

    Processor VirtualizationAdditional Topics

  • 7/28/2019 Virtual Ization 101

    11/50

    11Copyright 2007 VMware, Inc. All rights reserved.

    What is a Virtual Machine Monitor?

    VMM Characteristics Fidelity

    Performance Isolation / Safety

    An Old Concept Classic definition from

    Popek & Goldberg 74

    IBM mainframes since 60s

  • 7/28/2019 Virtual Ization 101

    12/50

    12Copyright 2007 VMware, Inc. All rights reserved.

    VMM Technology

    So this is just like Java, right? No, a Java VM is very different from the physical machine that runs it

    A hardware-level VM reflects underlying processor architecture

    Like a simulator or emulator that can run old Nintendo games? No, they emulate the behavior of different hardware architectures

    Simulators generally have very high overhead

    A hardware-level VM utilizes the underlying physical processor directly

  • 7/28/2019 Virtual Ization 101

    13/50

    13Copyright 2007 VMware, Inc. All rights reserved.

    VMMs Past

    An Old Idea Hardware-level VMs since 60s

    IBM S/360, IBM VM/370mainframe systems

    Timeshare multiple single-userOS instances on expensivehardware

    Classical VMM Run VM directly on hardware

    Trap and emulate modelfor privileged instructions

    Vendors had vertical controlover proprietary hardware,operating systems, VMM

    From IBM VM/370 product announcement, ca . 1972

  • 7/28/2019 Virtual Ization 101

    14/50

    14Copyright 2007 VMware, Inc. All rights reserved.

    VMMs Present

    Renewed Interest Academic research since 90s

    VMs for commodity systems

    Server consolidation

    VMM for x86

    Industry-standard hardware,from laptops to datacenter

    Run unmodified commodityguest operating systems

    Significant challenges, e.g.

    non-virtualizable instructions Pioneered by VMware in 98

    VMware Fusion for Mac OS X running WinXP, 2006

  • 7/28/2019 Virtual Ization 101

    15/50

    15Copyright 2007 VMware, Inc. All rights reserved.

    VMM Platform Types

    Hosted Architecture Install as application on existing x86 host OS,

    e.g. Windows, Linux, OS X Small context-switching driver

    Leverage host I/O stack and resource management

    Examples: VMware Player/Workstation/Server,

    Microsoft Virtual PC/Server, Parallels DesktopBare-Metal Architecture

    Hypervisor installs directly on hardware

    Acknowledged as preferred architecture for high-end servers

    Examples: VMware ESX Server, Xen, Microsoft Viridian (2008)

  • 7/28/2019 Virtual Ization 101

    16/50

    16Copyright 2007 VMware, Inc. All rights reserved.

    System Virtualization Alternatives

    OS Level Hardware Level

    Virtual machines abstracted using a layer at different places

    Language Level

  • 7/28/2019 Virtual Ization 101

    17/50

    17Copyright 2007 VMware, Inc. All rights reserved.

    System Virtualization Taxonomy

    System Virtualization

    Java Microsoft .NET / Mono Smalltalk

    High-Level LanguageHardware Level

    Bare-Metal/ Hypervisor

    HP Integrity VM IBM zSeries z/VM VMware ESX Server Xen

    Hosted Microsoft Virtual Server Microsoft Virtual PC Parallels Desktop VMware Player VMware Workstation VMware Server

    Para-virtualization Virtual Iron VMware VMI Xen

    OS Level FreeBSD Jail HP Secure Resource

    Partitions Sun Solaris Zones SWsoft Virtuozzo User-Mode Linux

    Bochs Microsoft VPC for Mac

    QEMU Virtutech Simics

    Emulators

  • 7/28/2019 Virtual Ization 101

    18/50

    18Copyright 2007 VMware, Inc. All rights reserved.

    Overview

    Virtual Machines

    Virtualization Approaches

    Processor Virtualization Classical techniques

    Software x86 VMM

    Hardware-assisted x86 VMM Para-virtualization

    Additional Topics

  • 7/28/2019 Virtual Ization 101

    19/50

    19Copyright 2007 VMware, Inc. All rights reserved.

    Classical Instruction Virtualization

    Trap and Emulate Run guest operating system deprivileged

    All privileged instructions trap into VMM VMM emulates instructions against virtual state

    e.g. disable virtual interrupts, not physical interrupts

    Resume direct execution from next guest instruction

    Implementation Technique This is just one technique

    Popek and Goldberg criteria permit others

  • 7/28/2019 Virtual Ization 101

    20/50

    20Copyright 2007 VMware, Inc. All rights reserved.

    Classical Memory Virtualization

    Traditional VMM Approach

    Extra Level of Indirection Virtual Physical

    Guest maps VPN to PPNusing primary page tables

    Physical Machine

    VMM maps PPN to MPNShadow Page Table

    Composite of two mappings

    For ordinary memory references

    Hardware maps VPN to MPN Cached by physical TLB

    VPN

    PPN

    MPN

    hardware TLB

    shadow page table

    guest

    VMM

  • 7/28/2019 Virtual Ization 101

    21/50

    21Copyright 2007 VMware, Inc. All rights reserved.

    Memory Traces

    Shadow Page Table Derived from primary page table in guest

    VMM must keep primary and shadow coherent

    Trace = Coherency Mechanism Write-protect primary page table

    Trap guest writes to primary

    Update or invalidate corresponding shadow

    Transparent to guest

  • 7/28/2019 Virtual Ization 101

    22/50

    22Copyright 2007 VMware, Inc. All rights reserved.

    Classical VMM Performance

    Native Speed Except for Traps No overhead in direct execution

    Overhead = trap frequency average trap cost

    Trap Sources Most frequent: Guest page table traces

    Privileged instructions

    Memory-mapped device traces

  • 7/28/2019 Virtual Ization 101

    23/50

    23Copyright 2007 VMware, Inc. All rights reserved.

    x86 Virtualization Challenges

    Not Classically Virtualizable x86 ISA includes instructions that read or modify privileged state

    But which dont trap in unprivileged mode

    Example: POPF instruction Pop top-of-stack into EFLAGS register

    EFLAGS.IF bit privileged (interrupt enable flag)

    POPF silently ignores attempts to alter EFLAGS.IFin unprivileged mode!

    So no trap to return control to VMM

    Deprivileging not possible with x86!

  • 7/28/2019 Virtual Ization 101

    24/50

    24Copyright 2007 VMware, Inc. All rights reserved.

    How to Virtualize x86?

    Interpretation Problem too inefficient

    x86 decoding slow

    Code Patching Problem not transparent

    Guest can inspect its own code

    Binary Translation (BT) Approach pioneered by VMware

    Run any unmodified x86 OS in VM

    Extend x86 Architecture

  • 7/28/2019 Virtual Ization 101

    25/50

    25Copyright 2007 VMware, Inc. All rights reserved.

    Software VMM: Binary Translation

    Direct execute unprivileged guest application code Will run at full speed until it traps, we get an interrupt, etc.

    Binary translate all guest kernel code, run it unprivileged Since x86 has non-virtualizable instructions,

    proactively transfer control to the VMM (no need for traps)

    Safe instructions are emitted without change

    For unsafe instructions, emit a controlled emulation sequence

    VMM translation cache for good performance

  • 7/28/2019 Virtual Ization 101

    26/50

    26Copyright 2007 VMware, Inc. All rights reserved.

    VMware Translator Properties

    Binary input is x86 hex, not source

    Dynamic interleave translation and execution

    On Demand translate only what about to execute (lazy)

    System Level makes no assumptions about guest code

    Subsetting full x86 to safe subset

    Adaptive adjust translations based on guest behavior

  • 7/28/2019 Virtual Ization 101

    27/50

    27Copyright 2007 VMware, Inc. All rights reserved.

    BT Mechanics

    Each Translator Invocation Consume a basic block (BB)

    Produce a compiled code fragment (CCF)

    Store CCF in Translation Cache Future reuse

    Capture working set of guest kernel

    Amortize translation costs

    Not patching in place

    translator

    Input: BB

    Output: CCF55 ff 33 c7 03 ...

    55 ff 33 c7 03 ...

  • 7/28/2019 Virtual Ization 101

    28/50

    28Copyright 2007 VMware, Inc. All rights reserved.

    Example: IDENT Translation

    80304a69 push %ebp80403a6a push (%ebx)

    80403a6c mov (%ebx), ffffffff80403a72 mov %edx, %esp80403a74 mov %esp, 81c(%ebx)80403a7a push %edx80403a7b mov %ebp, %eax80403a7d call 80460ba4

    25555b0 push %ebp25555b1 push (%ebx)25555b3 mov (%ebx), ffffffff25555b9 mov %edx, %esp25555bb mov %esp, 81c(%ebx)25555c1 push %edx25555c2 mov %ebp, %eax25555c4 push 80403a82

    25555c9 int 3a25555cb data: 80460ba4BB

    CCF25555c4: push return address25555c9: invoke translator on callee

  • 7/28/2019 Virtual Ization 101

    29/50

    29Copyright 2007 VMware, Inc. All rights reserved.

    Adaptive BT

    Translated Code Is Fast Mostly IDENT translations

    Runs at speed

    Except Writes to Traced Memory Page fault (shown as !*!)

    Decode and interpret instruction

    Fire trace callbacks

    Resume execution

    Can take 1000s of cycles

    !*!

    Invoke Translator

    TranslationCache

  • 7/28/2019 Virtual Ization 101

    30/50

    30Copyright 2007 VMware, Inc. All rights reserved.

    Adaptive BT: Fast Trace Handling

    Detect and Track Trace Faults

    Splice in TRACE Translation Execute memory access in software

    Avoid page fault

    No re-decoding

    Faster resumption

    Faster Traces 10x performance improvement

    Adapts to runtime behavior

    JMP

    InvokeTranslator

    TRACE

  • 7/28/2019 Virtual Ization 101

    31/50

    31Copyright 2007 VMware, Inc. All rights reserved.

    Software VMM Evaluation

    Benefits Adaptation

    Fast traces Fast I/O emulation

    Flexibility

    Costs Running translator

    Path lengthening

    System call slowdown

    Complexity

  • 7/28/2019 Virtual Ization 101

    32/50

    32Copyright 2007 VMware, Inc. All rights reserved.

    Hardware-Assisted VMM

    Recent x86 Extension 1998 2005: Software-only VMMs using binary translation

    2005: Intel and AMD start extending x86 to support virtualization

    First-Generation Hardware Enables classical trap-and-emulate VMMs

    Intel VT, aka Vanderpool Technology

    AMD SVM, aka Pacifica

    Performance VT/SVM help avoid BT, but not MMU ops (actually slower!)

    Main problem is efficient virtualization of MMU and I/O,Not executing the virtual instruction stream

  • 7/28/2019 Virtual Ization 101

    33/50

    33Copyright 2007 VMware, Inc. All rights reserved.

    VT/SVM Architecture

    Diagram Y-axis: old school x86 privilege (CPL)

    X-axis: virtualization privilege

    Guest Mode Runs unmodified OS

    Sensitive operations exit(trap out) to host mode

    VMCB Virtual Machine Control Block

    VMM-controlled, hardware-walked

    Buffers simple exits

    CPL 3CPL 3

    CPL 2

    CPL 1

    CPL 0

    CPL 2

    CPL 1

    CPL 0

    Host Guest

  • 7/28/2019 Virtual Ization 101

    34/50

    34Copyright 2007 VMware, Inc. All rights reserved.

    Hardware-Assisted VMM

    Hardware-Assisted Direct ExecCPL 0-3

    VMMCPL 0-3

    Host mode

    Guest mode

    Fault,Trace,

    Interrupt,I/O ...

    Resume Guest

  • 7/28/2019 Virtual Ization 101

    35/50

    35Copyright 2007 VMware, Inc. All rights reserved.

    Hardware-Assisted VMM Evaluation

    Benefits Simplicity (no BT)

    Fast system calls No translator overheads

    Costs Exits: 1000s of cycles for traces and I/O

    No adaptation or software flexibility

    Stateless model

    Future

    Hardware support for fast MMU virtualization Intel EPT, AMD NPT

  • 7/28/2019 Virtual Ization 101

    36/50

    36Copyright 2007 VMware, Inc. All rights reserved.

    What is Paravirtualization?

    Full Virtualization No modifications to guest OS

    Excellent compatibility, good performance, but complex

    Paravirtualization Exports Simpler Architecture Term coined by Denali project in 01, popularized by Xen

    Modify guest OS to be aware of virtualization layer

    Remove non-virtualizable parts of architecture

    Avoid rediscovery of knowledge in hypervisor

    Excellent performance and simple, but poor compatibility

    Ongoing Linux Standards Work Paravirt Ops interface between guest and hypervisor

    Small team from VMware, Xen, IBM LTC, etc.

  • 7/28/2019 Virtual Ization 101

    37/50

    37Copyright 2007 VMware, Inc. All rights reserved.

    Paravirtualization: Conceptual Diagram

    Hardware

    Hypervisor

    Guest

    OS

    Hardware

    Hypervisor

    GuestOS

    Full Virtualization Paravirtualization

    Hypercalls(GOOD)

    System callinterface

    NOT GOOD!

  • 7/28/2019 Virtual Ization 101

    38/50

    38Copyright 2007 VMware, Inc. All rights reserved.

    VMware Vision: Transparent Paravirtualization

    Same OS binary

    Xen 3.0.x VMware ESX

    NativeNative Native

    Dom0VMI

    LinuxDomU

    XenoLinux

    VMILinux

    VMILinux

    WindowsSolaris

  • 7/28/2019 Virtual Ization 101

    39/50

    39Copyright 2007 VMware, Inc. All rights reserved.

    Further Reading

    VMware Publications www.vmware.com/academic/resources.html

    A Comparison of Software and Hardware Techniques for x86Virtualization (ASPLOS 06)

    Fast Transparent Migration for Virtual Machines (USENIX 05)

    Memory Resource Management in VMware ESX Server (OSDI 02)

    Virtualizing I/O Devices on VMware Workstations Hosted VMM(USENIX 01)

    Additional Academic Publications Xen and the Art of Virtualization (SOSP 03)

    Disco: Running Commodity Operating Systems on ScalableMultiprocessors (SOSP 97)

    Many more

  • 7/28/2019 Virtual Ization 101

    40/50

    40Copyright 2007 VMware, Inc. All rights reserved.

    Additional Topics

    I/O Virtualization

    Memory Management

  • 7/28/2019 Virtual Ization 101

    41/50

    41Copyright 2007 VMware, Inc. All rights reserved.

    I/O Virtualization Stack

    Guest Device Driver

    Virtual Device Model existing device, e.g. e1000

    Model an idealized device, e.g. vmxnet

    Virtualization Layer Emulates the virtual device

    Remaps guest and real I/O addresses Multiplexes and drives physical device

    Provides additional features,e.g. transparent NIC teaming

    Real Device Physical hardware, e.g. bcm5700

    Likely to be different than virtual device

    Guest OS

    Device Driver

    Device Driver

    I/O Stack

    DeviceEmulation

  • 7/28/2019 Virtual Ization 101

    42/50

    42Copyright 2007 VMware, Inc. All rights reserved.

    I/O Virtualization Implementations

    Device Driver

    I/O Stack

    Guest OS

    Device Driver

    DeviceEmulation

    Device Driver

    I/O Stack

    Guest OS

    Device Driver

    DeviceEmulation DeviceEmulation

    Host OS/Dom0/Parent Domain

    Guest OS

    Device Driver

    DeviceManager

    Hosted or Split Hypervisor DirectPassthrough I/O

    VMware Workstation, VMware Server,VMware ESX Server (for slow devices),Xen, Microsoft Viridian, Virtual Server

    VMware ESX Server(storage and network)

    A Future OptionMany Challenges

    Emulated I/O

  • 7/28/2019 Virtual Ization 101

    43/50

    43Copyright 2007 VMware, Inc. All rights reserved.

    Passthrough I/O Virtualization

    High Performance Guest drives device directly

    Minimizes CPU utilizationEnabled by HW Assists

    I/O-MMU for DMA isolatione.g. Intel VT-d, AMD IOMMU

    Partitionable I/O devicee.g. PCI-SIG IOV spec

    Challenges Hardware independence

    Migration, suspend/resume Memory overcommitment

    I/O MMU

    DeviceManager

    VF VF VF

    PF

    PF = Physical Function, VF = Virtual FunctionI/O Device

    Guest OS

    Device Driver

    Guest OS

    Device Driver

    Guest OS

    Device Driver

    VirtualizationLayer

  • 7/28/2019 Virtual Ization 101

    44/50

    44Copyright 2007 VMware, Inc. All rights reserved.

    Additional Topics

    I/O Virtualization

    Memory Management

  • 7/28/2019 Virtual Ization 101

    45/50

    45Copyright 2007 VMware, Inc. All rights reserved.

    Memory Management

    Desirable capabilities Efficient memory overcommitment

    Accurate resource controls Exploit sharing opportunities

    Challenges Allocations should reflect both importance and working set

    Best data to guide decisions known only to guest OS

    Guest and meta-level policies may clash

  • 7/28/2019 Virtual Ization 101

    46/50

    46Copyright 2007 VMware, Inc. All rights reserved.

    VMware Memory Management

    Reclamation mechanisms Ballooning guest driver allocates pinned PPNs,

    hypervisor deallocates backing MPNs Swapping hypervisor transparently pages out PPNs,

    paged in on demand

    Page sharing hypervisor identifies identical PPNsbased on content, maps to same MPN copy-on-write

    Allocation policies Proportional sharing revoke memory from VM

    with minimum shares-per-page ratio

    Idle memory tax charge VM more for idle pagesthan for active pages to prevent unproductive hoarding

  • 7/28/2019 Virtual Ization 101

    47/50

    47Copyright 2007 VMware, Inc. All rights reserved.

    Ballooning

    Guest OS

    balloon

    Guest OS

    balloon

    Guest OS

    inflate balloon(+ pressure)

    deflate balloon( pressure)

    may page outto virtual disk

    may page infrom virtual disk

    guest OS manages memoryimplicit cooperation

  • 7/28/2019 Virtual Ization 101

    48/50

    48Copyright 2007 VMware, Inc. All rights reserved.

    Page Sharing

    Motivation Multiple VMs running same OS, apps

    Collapse redundant copies of code, data, zerosTransparent page sharing

    Map multiple PPNs to single MPN copy-on-write

    Pioneered by Disco [Bugnion 97] , but required guest OS hooks

    Content-based sharing General-purpose, no guest OS changes

    Background activity saves memory over time

  • 7/28/2019 Virtual Ization 101

    49/50

    49Copyright 2007 VMware, Inc. All rights reserved.

    Page Sharing: Scan Candidate PPN

    VM 1 VM 2 VM 3

    011010110101010111101100

    MachineMemory 06af

    343f8123b

    Hash:VM:PPN:MPN:

    hint frame

    hashtable

    hash page contents 2bd806af

  • 7/28/2019 Virtual Ization 101

    50/50

    50Copyright 2007 VMware, Inc. All rights reserved.

    Page Sharing: Successful Match

    VM 1 VM 2 VM 3

    MachineMemory 06af

    2123b

    Hash:Refs:MPN:

    shared frame

    hashtable