Windows Virtualization Best Practices And Future Hardware Directions
Benjamin Armstrong
Program Manager
Virtualization
Microsoft Corporation
David Wooten
Hardware Architect
System Integrity Group
Microsoft Corporation
Goals
After this session, you will:
Better understand how a Microsoft Windows virtualization virtual machine (VM) environment differs from a physical machine
Know what to do to ensure that your software works well within a VM
Agenda
Virtual machine hardware
Virtualization impacts on:
Processor
Storage
Networking
Video
Understanding isolation
Development opportunities
Hardware Equivalency
Virtual machines (VMs) should aim to achieve a high level of hardware equivalency
Most software solutions ‘just work’
Not always possible to have 100% equivalency
Awareness of differences in the VM environment can help you to deliver a better solution for your customers on a virtual platform
VSPs And VSCs
Windows virtualization will provide a set of core virtualization service providers (VSPs) and virtualization service clients (VSCs), plus emulated hardware, on supported platforms
Core VSP/VSCs will be included for storage, networking, input, and video
You cannot modify the core VSP/VSCs
Emulated Hardware
The initial release of Windows virtualization will always expose a limited set of emulated hardware
S3 Trio 64 Video card
DEC 21140 Network card
Etc.
It is possible to reconfigure the emulated hardware
It is not possible to change the type of hardware being emulated
Processor Topology
Changing processor type and topology inside of VMs under Windows virtualization is possible
Processor changes require a cold boot of the VM
Do not make assumptions that:
The number of processors won’t change
The core-to-processor ratio won’t change
The processor type won’t change
Hot add of virtual processors is planned for Windows Server 2003 and Windows Server codenamed “Longhorn” guest operating systems
Each VM is a single NUMA node
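One way to honor these cautions in user-mode code is to read the processor count at the point of use rather than storing it at install time or first run. The following is a minimal sketch, not a prescribed method: `current_worker_count` and `run_batch` are hypothetical helpers, and `os.cpu_count()` reports the logical processors the guest sees at the moment of the call.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def current_worker_count(reserve=0):
    """Return a worker count based on the processors visible right now.

    Hypothetical helper: the count is read on every call instead of being
    stored in configuration, because a VM may come back from a cold boot
    with a different number of virtual processors.
    """
    visible = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, visible - reserve)


def run_batch(tasks):
    """Run zero-argument callables in a pool sized at call time."""
    with ThreadPoolExecutor(max_workers=current_worker_count()) as pool:
        return list(pool.map(lambda task: task(), tasks))
```

Sizing the pool inside `run_batch` means a changed processor count is picked up on the next batch without any configuration change.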
Processor Scheduling
Each virtual processor “believes” that it has 100% of its physical processor resources and that time is accurate
This is not always true
Physical processors can be oversubscribed
Resource limits can be configured
Hypervisor is responsible for scheduling of virtual processors
High-precision timing inside of VMs is usually, but not always, accurate
Processor
User-mode code
Mostly, no noticeable change to user-mode code
Use CPUID to determine what is available
Processor features might be a subset of those on the physical machine
Do not assume all processors are always running at the same time
Affects parallel execution code
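The slide does not prescribe a technique for parallel code. One common reading is to prefer blocking synchronization over spin-waiting, since the thread being waited on may sit on a virtual processor that the hypervisor has descheduled. A minimal sketch under that assumption, with hypothetical names `produce` and `wait_for_result`:

```python
import threading


def produce(result, ready):
    """Worker: publish a value, then signal the waiter."""
    result["value"] = 42
    ready.set()


def wait_for_result(timeout_s=5.0):
    """Block on an Event instead of spinning on a shared flag.

    If the virtual processor running the worker is descheduled by the
    hypervisor, a spinning waiter burns its own time slice for nothing;
    a blocking wait hands the processor back to the guest scheduler.
    """
    result = {}
    ready = threading.Event()
    worker = threading.Thread(target=produce, args=(result, ready))
    worker.start()
    signalled = ready.wait(timeout=timeout_s)  # returns False on timeout
    worker.join()
    return result["value"] if signalled else None
```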
Processor
Kernel-mode code
Don’t access processor structures directly (CRs, DRs, MSRs, PMC)
This is very expensive
Don’t use CPUID as a synchronizing instruction
Use fences instead
Don’t assume CLI/STI gives accurate timing
Interrupts will still happen
Don’t use RDTSC accesses for timing
This is highly volatile
Don’t rely on processor performance counters
Counters don’t work outside of the parent partition
Storage
Storage is completely encapsulated and the VM is not aware of this
Unless you are using pass-through storage
Do not assume performance characteristics of storage devices
Do not assume that CDs are slow (ISOs are fast)
Do not assume that hard disks are fast (might be on a network)
Do not assume that floppy disks are slow
Emulated storage controllers are:
Intel 440BX controller
AIC 7870 SCSI controller
Persistency Of Storage
Technologies like differencing disks and snapshots mean that traditionally persistent storage might not be persistent any more
Your software may find itself arbitrarily moved back to an older point in time
Patches may be applied and then ‘undone’
Changes to storage persistency are always user initiated
Networking
Routing through host network adapter performed at OSI Layer 2
Host network security software provides no protection
Unless the host is manually configured to route the VM’s network traffic at a higher OSI Layer
Windows virtualization will only support 802.3 networking devices
Networking
Each virtual network card has its own separate MAC address
This will be changed in the event of a MAC address conflict
MAC addresses can be configured to be static, but default to dynamic
Emulated network controller is:
DEC/Intel 21140 Network controller
Performance not limited to 100 Mbit
Video
In Windows Server virtualization, video capabilities will be targeted at server scenarios
2D video support only
All video will be remoted over RDP
Emulated video controller:
S3 Trio 64 Video controller
VGA and Text Mode performance is not optimized
Non-planar video modes perform best
Isolation
By default, VMs are isolated entities
Child partitions are not able to access memory in any other partitions
Child partitions are not able to crash any other partitions
The only methods for inter-virtual machine communication are:
Traditional networking
Hypercalls
Integration Components
Integration components operate over VMBus to provide basic integration features
Time synchronization
Operating System (OS) shutdown
Registry updating
OS heartbeat
OS identification
Development Opportunities
VM neutral development
Software that is not dependent on specific hardware will continue to function inside of VMs
External VM management
Software can utilize WMI interfaces to control and monitor VMs
Integrated VM solutions
VM-aware solutions can be developed that provide enhanced features for users of VMs
Virtualization Hardware Futures
David Wooten
Hardware Architect
System Integrity Group
Microsoft
david.wooten @ microsoft.com
Future Technologies
The topics in this presentation relate to hardware to support possible features in version 2 of the Windows hypervisor (HV2)
The hardware “requirements” discussed are expected to be needed to support the features of HV2, but future events may change these requirements
Topics
“Execution Environment” and why it needs protection
Protections in Root Complex with DMA Remapping
Protections in Fabric to Regulate Routing
Roots of Trust and the SMM Conundrum
The Environment
The software running on a computer has control of the hardware on which it is running – its “execution environment”
If that software is running on a virtual computer, it is important to preserve the illusion of control over the virtualized execution environment
Prevents unexpected behavior
Preserves meaning of local attestation used for sealing
Preservation Of Environment
The preservation of the apparent execution environment of a virtual computer in a partition is the responsibility of the hypervisor
The hypervisor must be able to enforce isolation between partitions to ensure adequate fidelity of the virtualization
The main isolation tool for the hypervisor is memory management
Memory virtualization by the MMU (and associated registers) can prevent inappropriate changes to the memory of another partition through direct access by the CPU
Memory virtualization extensions are needed in IO hardware to complete the memory protections
The IO Problem
[Figure: Hypervisor, Partition 1, Partition 2, IO, MMU, and Memory. Address labels shown: 100, 4200, FA00. Legend: Address, Control.]
Evolution Of IO Protection
In the initial implementation of Windows virtualization, the IO mapping problem is finessed by:
“Assign” all IO devices to the Parent partition
Give Parent partition a special mapping of Guest Physical = System Physical
Place a lot of “trust” in the Parent
In HV2, the Parent may not have special rights to see into other partitions
Another partition may be a “peer” to the Parent
In HV2, devices may be assigned to partitions other than the Parent
Partitions doing IO may not have the same level of assumed “trust” as the V1 Parent
Mechanisms For IO Protection
Main new mechanism is DMA remapping (DMAr)
Adds address translation to DMA
Lets hypervisor limit device access to memory
PCI Routing Control and ID Checking
Restrict peer-to-peer (P2P) access so that devices can’t do P2P with un-translated addresses
Check ID of requester in switches
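The effect of DMA remapping can be sketched as a toy model: the hypervisor fills in a translation table per device, and any device access without a mapping is refused. This is an illustration only, not the interface of any real DMAr hardware. The class and method names, the 4 KiB page size, and the device labels are all assumptions.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages, for illustration only


class DmaFault(Exception):
    """Raised when a device touches an address it has no mapping for."""


class DmaRemapper:
    """Toy model of a DMAr unit: one translation table per device."""

    def __init__(self):
        self._tables = {}  # device id -> {device page -> system page}

    def map_page(self, device, device_page, system_page):
        """Hypervisor-side call: grant `device` access to one page."""
        self._tables.setdefault(device, {})[device_page] = system_page

    def translate(self, device, device_addr):
        """Translate a device address to a system physical address."""
        page, offset = divmod(device_addr, PAGE_SIZE)
        table = self._tables.get(device, {})
        if page not in table:
            raise DmaFault(f"{device}: no mapping for {device_addr:#x}")
        return table[page] * PAGE_SIZE + offset
```

For example, after `dmar = DmaRemapper()` and `dmar.map_page("nic", 0x0, 0x42)`, the call `dmar.translate("nic", 0x100)` returns `0x42100`, while the same address from an unmapped device raises `DmaFault`.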
DMA Remapping
[Figure: Hypervisor, Partition 1, Partition 2, MMU, IO, DMAr, and Memory. Address labels shown: 100, 4200, FA00.]
HV2 DMAr Requirements
Chipset must support either IOMMU (AMD) or VT-d (Intel)
DMAr is not processor specific, so IOMMU can be used with an Intel processor and VT-d can be used with AMD
All IO devices must access memory through DMAr
Chipset may have more than one DMAr unit, but they must use the same type of programming interface
PCI Routing Control
PCI devices are accessed using System Physical Addresses (SPA)
Drivers will program devices with Device Physical Addresses (DPA) – DPA may be equal to Guest Physical Address (GPA) or be device specific
To prevent a DPA from accessing a PCI device, switches must not route based on DPA
PCI Routing Control
Microsoft is working through the PCI-SIG to define a modification to switches and Functions so that DPA-based routing between PCI Functions can be disabled
Devices that must do P2P can get SPA from RC by using Address Translation Services (ATS)
With ATS, a Function can ask the DMAr in the RC for the SPA corresponding to a DPA and then use that SPA to directly access another device
Requester ID Checking
DMAr hardware uses the Requester ID (Bus-Dev-Func or “BDF”) to choose a translation table
A device could write to the wrong memory address if the BDF is wrong
A switch can check the Requester ID and prevent errors of this sort
Bus number of requester must be >= the secondary bus number and <= the subordinate bus number of a switch port
Microsoft is working with the PCI-SIG to have this checking capability added to switches
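The bus-range rule above can be written as a short check. This sketch assumes the conventional 16-bit Requester ID layout (8-bit bus, 5-bit device, 3-bit function). The function names are hypothetical, and it models only the comparison, not any switch implementation.

```python
def split_bdf(requester_id):
    """Split a 16-bit Requester ID into (bus, device, function)."""
    bus = (requester_id >> 8) & 0xFF
    device = (requester_id >> 3) & 0x1F
    function = requester_id & 0x07
    return bus, device, function


def requester_bus_allowed(requester_id, secondary_bus, subordinate_bus):
    """Apply the switch-port rule: secondary <= requester bus <= subordinate."""
    bus, _device, _function = split_bdf(requester_id)
    return secondary_bus <= bus <= subordinate_bus
```

With a switch port whose secondary bus is 2 and subordinate bus is 5, a request carrying Requester ID `0x0318` (bus 3) passes the check, and one carrying `0x0618` (bus 6) does not.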
Static And Dynamic Roots Of Trust
Static Root of Trust Measurement (SRTM) and Dynamic Root of Trust Measurement (DRTM) are different ways to start a chain of trust
To start a chain of trust, the CPU must be in a known state, running known code, and the system must be in a state in which the code can “defend” itself
From this initial condition, we can measure each of the state changes and be able to make assertions about the state of the computer
Static Root Of Trust Measurement
This is a chain of trust that is started by computer system reset – puts CPU in a known state
The first code executed (The Core Root of Trust for Measurement – CRTM) measures the next thing to be executed – CRTM is known code
Hardware is reset and peripheral access to memory is not allowed – CRTM can “defend” itself
Significant issue with SRTM is that, once trust is lost (e.g., unknown code executed), only way to get it back is to reboot the system
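A measured chain of trust can be illustrated with the extend operation that TPM 1.2 applies to a Platform Configuration Register: the new value is the SHA-1 hash of the old value concatenated with the SHA-1 hash of the component being measured. The sketch below models only that arithmetic. The function names and component byte strings are placeholders, not real firmware images.

```python
import hashlib

PCR_SIZE = 20  # SHA-1 digest length, as used by TPM 1.2 PCRs


def extend(pcr, component):
    """Return the PCR value after measuring `component` into it."""
    measurement = hashlib.sha1(component).digest()
    return hashlib.sha1(pcr + measurement).digest()


def measure_chain(components):
    """Fold a boot sequence into one PCR value, starting from reset."""
    pcr = bytes(PCR_SIZE)  # a static PCR reads as all zeros after reset
    for component in components:
        pcr = extend(pcr, component)
    return pcr
```

Because each value depends on every earlier measurement, `measure_chain([b"crtm", b"bios v1", b"loader"])` and `measure_chain([b"crtm", b"bios v2", b"loader"])` produce different results even though the final component is unchanged. This is why a change early in a static chain affects everything measured after it.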
Dynamic Root Of Trust Measurement
Uses new CPU instructions to put the CPU in a known state
Code to be executed is sent to TPM to be “measured” into a special Platform Configuration Register (PCR)
This PCR is accessible only when in the DRTM initialization state and only by CPU
Initial, measured DRTM code is protected by hardware – method varies by vendor
With DMAr, hypervisor can “defend” itself from IO devices
With DRTM, if trust is lost, can restart chain of trust without rebooting
Secure Launch
“Secure Launch” refers to the act of starting the hypervisor using the DRTM
A Secure Launch allows the hypervisor to come up in a trusted state, with control of the system, regardless of what code has run previously
Allows arbitrary initialization code to run without affecting the trust state of the system
Major benefit of DRTM is that attestation of the platform can exclude lots of meaningless information that can’t be ignored by SRTM
Add-in cards
BIOS updates
Driver code used to boot hypervisor
DRTM And Trust State
The attestation of a partition must include the partition state and anything that can affect the execution of that partition
Would like attestation only to include software that is loaded after the DRTM
This allows sealing to exclude pre-launch actions
Can maintain chain of trust when code is updated
Bring system up in trusted state, verify that changes are within policy, then make changes and update sealed blobs
Isolation And SMM
SMM can be more privileged than the hypervisor
SMM can access any memory location without mediation by the hypervisor
The privilege level of SMM means that SMM code may have to be included in the seal-to state
Because SMM loads before the DRTM is initiated, almost all of the code update problems related to the SRTM are reinserted into the attestation/sealing process
SRTM problems arise because of changes to BIOS code which is not vetted by the OS/hypervisor
When OS/hypervisor loads, the changed BIOS means that PCRs no longer match, which means that blobs can’t be unsealed
What To Do About SMM?
One approach to dealing with SMM is to make it run in a “container” that is controlled by the hypervisor
Hypervisor can prevent SMM from accessing anything that it shouldn’t
Issue that OEMs have with this approach is that it could allow the hypervisor to prevent SMM from accessing the parts of the hardware that it must access
Will CPU melt if hypervisor is broken or rogue?
OEMs consider SMM to be part of the hardware and just as “trustworthy” as the hardware
Trust isn’t the issue; the attestation and security evaluation of SMM is the issue
“SMM is hardware” position raises the question of whether this applies equally to SMM “applications”
SMM In HV2
Microsoft does not yet have a complete solution for dealing with SMM privilege in HV2
Likely to have to evolve the solution by working with processor, chipset, BIOS, and computer system vendors
Call To Action
Chipset vendors: Start planning DMAr deployment
Switch vendors: Look to PCI-SIG for ECRs to implement access controls
Device vendors: Consider impact of DMAr and evaluate need for ATS
BIOS, CPU, system vendors: Help with SMM problem
Attend other virtualization presentations, especially VIR046 – HyperCall APIs Explained
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions,
it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.