IO Memory Management Hardware Goes Mainstream
Mark Hummel
AMD Fellow
Computation Products Group, AMD
Mark.Hummel @ amd.com
Session Overview
Benefits and Function
Topology and Features
Translation Data Structures
Software Interface
Function Of An IOMMU
What does it do?
Translates requests that come from all devices regardless of target
Enforces access rights of devices to system address space
  Page granular protection
  Separate read and write access rights
Maintains cache of translations
  Root of distributed caching hierarchy of address translations
Function Of An IOMMU
What does it not do?
Does not translate CPU originated traffic
  The processor’s traffic is translated by the CPU’s MMU
Does not directly support demand paged IO
  Devices and drivers are not designed to deal with arbitrary delays
  Devices and drivers don’t understand the concept of an “IO Page Fault”
  Support for remote address translation extensions enables indirect, device specific demand paging
System Topology
Where is the IOMMU?
[Figure: system topology diagram. Labeled components: CPU, DRAM, HT links, tunnel, IO hub, IOMMU (with ATC), PCIe bridges, PCI Express devices and switches, PCI, LPC, etc., and a device with an optional remote ATC.]
ATC = Address Translation Cache
HT = HyperTransport
PCIe = PCI Express
System Topology
IOMMUs are at the edge of the system interconnection fabric
  Full Source Identification is available
IOMMUs are distributed and independent
  Creates scalable caching structures
IOMMU supports remote address translation caching extensions
  Allows tuning of the caching hierarchy
Benefits Of An IOMMU
Why have one?
Enhanced virtualization capabilities
  Direct device assignment to Guest OS
  Improved performance and scalability
Enables direct device access by user mode applications
Benefits Of An IOMMU
Direct Device Assignment Example
[Figure: two configurations. Without an IOMMU, each Guest OS runs a Virtual Device Driver, and the Virtual Machine Monitor hosts a Virtual Device Emulator and the Device Drivers for the Device Controllers. With an IOMMU, each Guest OS runs its own Device Driver, and the path to its Device Controller goes through the IOMMU.]
Overhead reduced in path between Guest and Device
Benefits Of An IOMMU
Why have one? (continued)
Enhanced security capabilities
  Adds precise device access control of address space
  Creates IO protection domains
Enhanced system reliability
  Isolation between devices
  Protects system memory from errant device writes
Benefits Of An IOMMU
Security and Isolation Example
[Figure: two configurations of two Device Controllers and their I/O Buffers in System Memory. Without an IOMMU, a malicious or errant write from one controller reaches the other I/O Buffer. With an IOMMU, the controllers and buffers are placed in Protection Domain 1 and Protection Domain 2.]
Write is blocked
Benefits Of An IOMMU
Why have one?
Support for Trusted Input and Output
  Creates a protected channel between a device and driver
Benefits Of An IOMMU
Trusted I/O Example
[Figure: two configurations of an Application, Device Drivers, a Disk Controller, a Graphics Controller, and I/O Buffers in System Memory. Without an IOMMU, content is captured by a 3rd party. With an IOMMU, the drivers and devices communicate over protected channels and 3rd party access is blocked.]
Benefits Of An IOMMU
Why have one?
Support legacy 32-bit devices in large-memory systems
  Eliminates bounce buffers
Benefits Of An IOMMU
Bounce Buffer Example
[Figure: two configurations of a Disk Controller, CPU, and System Memory split into 0 - 4 GB and 4 GB+ regions, with the I/O Buffer in the 4 GB+ region.]
Without an IOMMU
  Controller limited to 32 bit addressing
  Data lands in a Bounce Buffer in the 0 - 4 GB region
  CPU must move data
With an IOMMU
  IOMMU translates address so data can be directly placed
Benefits Of An IOMMU
Why have one?
Synergy with PCI-SIG virtualization efforts
  Address translation service (ATS)
  Single root device virtualization
  Multi-root shared I/O fabric
IOMMU Features
Variable per-device virtual address range
Variable per-device physical page size
Flexible virtual address space sharing options
  Devices can have their own virtual address space
  Devices can share a virtual address space
Can be utilized natively by an enhanced OS
Can be utilized by a virtual machine monitor
Translation Data Structures
Definitions
Requester ID (RID)
  Label identifying the source of a transaction
Address Translation Cache (ATC)
  Local or remote coherent copy of address translations
I/O Translation Look aside Buffer (IOTLB)
  A remote ATC that exists in a device associated with an IOMMU
Address Translation Services (ATS)
  Extensions supporting remote caching of address translations
Translation Data Structures
Definitions
Page Directory Entry (PDE)
  Translation table entry that points at a table
Page Table Entry (PTE)
  Translation table entry that contains a translation
Root translation table
  Translation table at the top of translation hierarchy
Device Table
  Maps Requester ID to root translation table
Translation Data Structures
Device Requests
Contain a Requester ID
  Bus/Device/Function (BDF) used for PCI Express
  Unit ID or BDF (with SRC ID extension) for HyperTransport
Extensions to support remote ATC
  Un-translated (device virtual address) read or write
    Default case
    IOMMU will translate the address of the request
  Translated (system physical address) read or write
    IOMMU uses the address provided without translation
  Translation request
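For PCI Express, the Requester ID is the conventional 16-bit bus/device/function triple (8-bit bus, 5-bit device, 3-bit function). The sketch below shows that packing; the function names are illustrative and are not taken from the slides.

```python
from typing import Tuple


def make_requester_id(bus: int, device: int, function: int) -> int:
    """Pack a PCI bus/device/function triple into a 16-bit Requester ID."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (device << 3) | function


def split_requester_id(rid: int) -> Tuple[int, int, int]:
    """Unpack a 16-bit Requester ID into (bus, device, function)."""
    if not 0 <= rid <= 0xFFFF:
        raise ValueError("Requester ID must fit in 16 bits")
    return (rid >> 8) & 0xFF, (rid >> 3) & 0x1F, rid & 0x7


# Bus 3, device 31, function 2
rid = make_requester_id(bus=3, device=0x1F, function=2)
bdf = split_requester_id(rid)
```

Because the ID is a dense 16-bit value, it can serve directly as an index into a flat table with one entry per possible requester.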
Translation Data Structures
Device Table
Single contiguous block of system memory
Maps Requester ID to a root translation table
  Per device virtual address space supported
  Many to one mappings supported
Each device is assigned a Domain ID
  Devices may share a Domain ID
  IOMMU Invalidations managed on a per Domain basis
Translation Data Structures
Device Table
Device Table entry format (128 bits):
  Bits      Field
  127:104   Reserved
  103:96    Control Bits
  95:80     Reserved
  79:64     Domain ID [15:0]
  63        Res
  62        IW
  61        IR
  60:52     Reserved
  51:32     Page Table Root Pointer [51:32]
  31:12     Page Table Root Pointer [31:12]
  11:9      NL
  8:1       Reserved
  0         V
V – valid bit
IW – I/O Write protection
IR – I/O Read protection
NL – next Level
Res – reserved
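As a rough illustration of the Device Table entry fields, the sketch below decodes a 128-bit entry held in a Python integer. The bit positions are one reading of the slide diagram (in particular IW at bit 62 and IR at bit 61 are assumed), and `DeviceTableEntry`, `bits`, and `decode_dte` are hypothetical names used only for this example.

```python
from typing import NamedTuple


class DeviceTableEntry(NamedTuple):
    valid: bool
    next_level: int
    root_pointer: int
    io_read: bool
    io_write: bool
    domain_id: int
    control: int


def bits(value: int, high: int, low: int) -> int:
    """Return the bit field value[high:low] as an integer."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)


def decode_dte(raw: int) -> DeviceTableEntry:
    """Split a 128-bit entry into its fields (assumed bit positions)."""
    return DeviceTableEntry(
        valid=bool(bits(raw, 0, 0)),            # V
        next_level=bits(raw, 11, 9),            # NL
        root_pointer=bits(raw, 51, 12) << 12,   # 4K-aligned root table address
        io_read=bool(bits(raw, 61, 61)),        # IR (assumed position)
        io_write=bool(bits(raw, 62, 62)),       # IW (assumed position)
        domain_id=bits(raw, 79, 64),            # Domain ID [15:0]
        control=bits(raw, 103, 96),             # Control Bits
    )


example = (
    1                   # V = 1
    | (4 << 9)          # NL = 4: the root table is a level-4 table
    | 0x12345000        # root pointer, 4K aligned
    | (1 << 61)         # IR
    | (1 << 62)         # IW
    | (7 << 64)         # Domain ID = 7
)
entry = decode_dte(example)
```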
Translation Data Structures
Page tables
Translation tables are always 4K byte blocks in system memory
Root translation table base address comes from the Device Table
  May point to either a table of PDE or PTE
Intermediate translation tables
  Point to either a table of PDE or PTE
Translation Data Structures
Page tables
PDE Format
  Bits    Field
  63      Res
  62      IW
  61      IR
  60:52   Res
  51:32   Next Table Addr [51:32]
  31:12   Next Table Addr [31:12]
  11:9    NL
  8:1     Reserved
  0       P
PTE Format
  Bits    Field
  63      Res
  62      IW
  61      IR
  60:58   NS, U, S (one bit each)
  57:52   Res
  51:32   Page Address [51:32]
  31:12   Page Address [31:12]
  11:9    000
  8:1     Reserved
  0       P
IW – I/O Write protection
IR – I/O Read protection
NL – next Level
P – present
S – size
U – ATS attribute bit
NS – ATS attribute bit
Res – reserved
Translation Data Structures
Simplified View
1) IOMMU receives request, but the translation is not cached in the ATC. So
2) The Requester ID from the device request is used to select the root translation table
3) Address from the device request is used to walk page tables
4) and refill the ATC and satisfy the device request
[Figure: a device request carries a Requester ID and a “Virtual” address. The Requester ID is the index into the Device Table, located through the Device Table Base in the IOMMU. The selected entry holds the pointer to the page tables, and the walk yields the “Translated” address held in the ATC.]
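The four steps above can be mimicked with a toy model. This is a sketch under strong simplifying assumptions: the page tables are flattened into a single dictionary per device, the ATC is keyed by Requester ID and virtual page number, and `ToyIommu` is a made-up name rather than anything defined by the hardware.

```python
PAGE_SHIFT = 12  # 4K base pages


class ToyIommu:
    def __init__(self, device_table):
        # Requester ID -> page table, modeled as {virtual page: physical page}
        self.device_table = device_table
        self.atc = {}     # (Requester ID, virtual page) -> physical page
        self.walks = 0    # number of table walks performed

    def translate(self, rid, virt_addr):
        vpn = virt_addr >> PAGE_SHIFT
        offset = virt_addr & ((1 << PAGE_SHIFT) - 1)
        key = (rid, vpn)
        if key not in self.atc:                  # 1) translation not cached
            page_table = self.device_table[rid]  # 2) Requester ID selects root
            self.walks += 1
            ppn = page_table[vpn]                # 3) walk (flattened here)
            self.atc[key] = ppn                  # 4) refill the ATC
        return (self.atc[key] << PAGE_SHIFT) | offset


iommu = ToyIommu({0x0300: {0x10: 0x80}})
first = iommu.translate(0x0300, 0x10ABC)   # miss: walks the table
second = iommu.translate(0x0300, 0x10DEF)  # hit: same page, served from ATC
```

A missing device or page raises `KeyError` in this model; in the real hardware the request would be aborted and an event logged.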
Translation Data Structures
Advanced capabilities
Support for 64-bit device virtual address
  Requires 6 level lookup
Support for variable page sizes
  All powers of 2 from 4K up
IOMMU tables may be shared with CPU MMU
Can be efficiently virtualized
Translation Data Structures
Advanced capabilities
Configurable maximum table depth
  If the virtual address has a group of leading zeros, the lookup depth may be reduced
Level Skipping
  If the virtual address has interior groups of zeros, lookup levels may be skipped
Early Exit
  Exit is possible at any level, with the remaining un-translated address bits used as an offset within a “super page”
  Base is 4K; super pages are at 2M (2^21), 1G (2^30), 512G (2^39), and 2^48
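The super page sizes listed above follow from each table level resolving 9 address bits on top of the 12-bit base page offset. A small sketch of that arithmetic, with `page_size_at_level` as an illustrative helper name:

```python
def page_size_at_level(level: int) -> int:
    """Bytes covered by a translation that ends in a level-`level` table."""
    if level < 1:
        raise ValueError("levels are numbered from 1")
    # 12 offset bits for the 4K base page, plus 9 bits per level below the exit
    return 1 << (12 + 9 * (level - 1))


sizes = {level: page_size_at_level(level) for level in range(1, 6)}
# level 1 -> 4K, 2 -> 2M, 3 -> 1G, 4 -> 512G, 5 -> 2^48
```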
Translation Data Structures
Example with level skipping
Virtual address:
  Bits    Contents                    Note
  63:57   0000000b (1)                Above starting level
  56:48   000000000b (1)              Above starting level
  47:39   Level-4 Page Table Offset   Starting level
  38:30   000000000b (1)              Level 3 skipped
  29:21   Level-2 Page Table Offset   Final level
  20:0    Physical Page Offset        Level 1 skipped, 2M super page
(1) The Virtual Address bits associated with all skipped levels must be zero
Lookup path:
  Device Table entry: Level 4 Page Table Address, NL = 4h
  Level-4 Table: PDE, NL = 2h
  Level-2 Table: PTE, NL = 0h
  Physical Address (2 MB page): bits 51:21 from the PTE, bits 20:0 from the virtual address
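The example can be traced in software. The sketch below assumes a simplified model: each table is a dictionary from index to a `(next_level, address)` pair, a next level of 0 marks a PTE, and every level indexes 9 address bits starting at bit 12. The names `walk`, `field`, and `level_shift` are illustrative only.

```python
def level_shift(level: int) -> int:
    """Lowest virtual address bit indexed by a table at this level."""
    return 12 + 9 * (level - 1)


def field(va: int, level: int) -> int:
    """The 9-bit table index that the address supplies for this level."""
    return (va >> level_shift(level)) & 0x1FF


def walk(tables, root_addr, start_level, va):
    """Translate va, honoring reduced depth, level skipping, and early exit."""
    if va >> level_shift(start_level + 1):
        raise ValueError("address bits above the starting level must be zero")
    level, table_addr = start_level, root_addr
    while True:
        next_level, addr = tables[table_addr][field(va, level)]
        if next_level == 0:
            # PTE: the remaining low bits are the offset within the page
            return addr | (va & ((1 << level_shift(level)) - 1))
        if next_level >= level:
            raise ValueError("next level must be lower than current level")
        for skipped in range(next_level + 1, level):
            if field(va, skipped):
                raise ValueError("address bits of skipped levels must be zero")
        level, table_addr = next_level, addr


# Start at level 4, skip level 3, find a PTE in the level-2 table (2 MB page).
tables = {
    0x1000: {5: (2, 0x2000)},        # level-4 table: PDE with NL = 2
    0x2000: {7: (0, 0x40000000)},    # level-2 table: PTE with NL = 0
}
va = (5 << 39) | (7 << 21) | 0x12345
pa = walk(tables, root_addr=0x1000, start_level=4, va=va)
```

Here the low 21 bits of the virtual address pass through unchanged, which is the early exit at level 2 described above.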
Software Interface
Control Structures
Command Queue
Event Queue
[Figure: three IOMMU registers (Cmd Buffer base register, Device Table base register, Event Log base register) point at the Command Queue, Device Table, and Event Log in system memory. The Device Table leads to the I/O Page Tables.]
Software Interface
Control Structures
Command Queue
  Circular ring buffer in system memory
  Low insertion overhead
  Processed at IOMMU service rate
  16 byte command entries
  Maximum size is 512 KB
Event Log
  Circular ring buffer in system memory
  Low removal overhead
  Processed at CPU service rate
  16 byte log entries
  Maximum size is 512 KB
Software Interface
Command queue
Tail Pointer is incremented by the CPU after writing a command
Tail Pointer write signals IOMMU that new command is ready
Head Pointer is incremented by the IOMMU after reading a command
[Figure: circular command buffer in system memory with 16-byte entries at offsets +0 through +112. System Software (producer) writes entries and the tail pointer; the IOMMU (consumer) reads entries and advances the head pointer.]
IOMMU registers:
  MMIO Offset 0008h: buffer base, buffer size
  MMIO Offset 2000h: head pointer
  MMIO Offset 2008h: tail pointer
  MMIO Offset 2020h: status register
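The producer/consumer protocol of the command queue can be modeled in a few lines. This is a sketch, not the hardware interface: `CommandRing` is a made-up class, head and tail are kept as byte offsets, and the "one slot left empty" rule used to tell a full ring from an empty one is an assumption of this model.

```python
ENTRY_SIZE = 16  # bytes per command entry


class CommandRing:
    def __init__(self, num_entries: int):
        self.size = num_entries * ENTRY_SIZE
        self.buffer = bytearray(self.size)
        self.head = 0  # byte offset, advanced by the IOMMU (consumer)
        self.tail = 0  # byte offset, advanced by software (producer)

    def is_full(self) -> bool:
        # Assumed convention: one slot stays empty so full != empty
        return (self.tail + ENTRY_SIZE) % self.size == self.head

    def submit(self, command: bytes) -> None:
        """Software side: write the entry, then advance the tail pointer."""
        if len(command) != ENTRY_SIZE:
            raise ValueError("commands are 16 bytes")
        if self.is_full():
            raise BufferError("command ring is full")
        self.buffer[self.tail:self.tail + ENTRY_SIZE] = command
        self.tail = (self.tail + ENTRY_SIZE) % self.size

    def consume(self):
        """IOMMU side: read the entry at head, then advance the head pointer."""
        if self.head == self.tail:
            return None  # nothing pending
        command = bytes(self.buffer[self.head:self.head + ENTRY_SIZE])
        self.head = (self.head + ENTRY_SIZE) % self.size
        return command


ring = CommandRing(num_entries=4)
for n in range(3):
    ring.submit(bytes([n]) * ENTRY_SIZE)
first = ring.consume()

# A 512 KB buffer of 16-byte entries holds this many commands:
max_entries = (512 * 1024) // ENTRY_SIZE
```

The same structure, with the roles of producer and consumer swapped, describes the event log.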
Software Interface
Commands
Invalidate Device Table Entry
  Indexed by Device ID
Invalidate IOMMU Pages
  Power of 2 naturally aligned number of 4K pages
  Indexed by Domain
Invalidate IOTLB Pages
  Power of 2 naturally aligned number of 4K pages
  Indexed by Device ID
Completion Wait
  May be used as a fence
  May be used to signal an interrupt
  May be used to write a flag in system memory
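Because the invalidate commands above take a power-of-2, naturally aligned number of 4K pages, software that wants to invalidate an arbitrary range has to round it out to such a block. The sketch below shows only that alignment arithmetic; `covering_block` is a hypothetical helper and says nothing about how the command itself is encoded.

```python
PAGE_SHIFT = 12  # 4K pages


def covering_block(start: int, length: int):
    """Smallest naturally aligned power-of-2 block of 4K pages covering
    [start, start + length). Returns (base address, page count)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = start >> PAGE_SHIFT
    last = (start + length - 1) >> PAGE_SHIFT
    # The highest differing bit between first and last page sets the block size
    count = 1 << (first ^ last).bit_length()
    base = first & ~(count - 1)
    return base << PAGE_SHIFT, count


two_pages = covering_block(0x5000, 0x2000)   # pages 5 and 6
straddle = covering_block(0x3000, 0x2000)    # pages 3 and 4
single = covering_block(0x1234, 1)           # one byte in page 1
```

A range that straddles a large alignment boundary, such as pages 3 and 4, rounds out to a much bigger block than its length suggests, so a driver may prefer to issue two smaller invalidations instead.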
Software Interface
Command ordering and semantics
IOMMU manages ordering interlocks
  Invalidate Device Table commands will complete before subsequent Invalidate IOMMU Pages commands
  Invalidate IOMMU Pages commands will complete before subsequent Invalidate IOTLB Pages commands
Completion semantics
  Invalidation commands are complete when all overlapping DMA transactions that are in flight to system memory are either complete or visible
Completion signaled when Completion Wait command is executed
  Interrupt
  Memory based flag
Software Interface
Event log
Tail Pointer is incremented by the IOMMU after writing an event
IOMMU can be configured to signal an interrupt when event log is written
Head Pointer is incremented by the CPU after reading an event
Head Pointer write signals IOMMU that event has been consumed
[Figure: circular event log in system memory with 16-byte entries at offsets +0 through +112. The IOMMU (producer) writes entries and advances the tail pointer; System Software (consumer) reads entries and writes the head pointer.]
IOMMU registers:
  MMIO Offset 0010h: buffer base, buffer size
  MMIO Offset 2010h: head pointer
  MMIO Offset 2018h: tail pointer
  status register
Software Interface
Events
Translation events
  Invalid Device Table Entry
  IO Page Fault
  Device Table HW Error
  Page Table HW Error
  Invalid Device Request
Command processing events
  Command HW Error
  Illegal Command
  IOTLB Invalidate Timeout
Software Interface
Exception Handling
Translation failure for any reason (e.g. I/O page faults, memory errors during page table walks)
  Request is aborted
  Completer Abort (CA) returned to device where possible
  Details logged
  Interrupt is optionally generated
Command queue failure
  Processing is halted
  Details logged
  Interrupt is optionally generated
Software Interface
OS/Hypervisor Interactions
Initialization
  Done via configuration and MMIO transactions
  Clear caches, set base address and size of domain tables, etc
Runtime operations
  Device table updates, translation cache invalidations
  Combination of MMIO and DRAM accesses
  MP support requires software-managed sharing of command buffer
  Each IOMMU has separate command and event queue
Virtualization of IOMMU
  Intercept MMIO pointer writes to virtual IOMMU
  Process virtual IOMMU command queue and update shadow tables
  Forward Invalidate commands to real IOMMU
Call To Action
Read the “AMD I/O Virtualization (IOMMU) Technology” specification to understand hardware assisted virtualization, available at http://developer.amd.com/documentation.aspx
Driver writers should consider the effects of the change from physical to virtual address assignment
Device vendors should consider the impact on their devices when used with I/O memory management hardware
Sign up for AMD’s development center at http://devcenter.amd.com
Additional Resources
Web Resources
  Main Page http://www.amd.com
  Developer Center http://devcenter.amd.com
  PCI-SIG http://www.pcisig.com
Related Sessions
  PCIe Address Translation Services and I/O Virtualization
  Windows Virtualization Best Practices and Future Hardware Directions
Questions?