IO Memory Management Hardware Goes Mainstream

Mark Hummel
AMD Fellow
Computation Products Group, AMD
Mark.Hummel@amd.com

Session Overview
- Benefits and Function
- Topology and Features
- Translation Data Structures
- Software Interface

Function Of An IOMMU: What does it do?
- Translates requests that come from all devices, regardless of target
- Enforces access rights of devices to system address space
  - Page-granular protection
  - Separate read and write access rights
- Maintains a cache of translations
  - Root of a distributed caching hierarchy of address translations
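
To make the two access-control rules concrete, here is a minimal C sketch of page-granular translation with separate read and write rights. It is a toy model, not the hardware's data layout: the flat `io_mapping` array and the `io_translate` function are hypothetical stand-ins for the multi-level tables described later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                          /* 4 KB pages */
#define PAGE_OFFSET_MASK ((1ull << PAGE_SHIFT) - 1)

/* Hypothetical per-device mapping: one device page onto one system page,
 * with independent read and write rights. */
struct io_mapping {
    uint64_t dev_page;      /* device virtual page number */
    uint64_t sys_page;      /* system physical page number */
    bool can_read;
    bool can_write;
};

/* Translate dev_addr for a read or write. On success store the system
 * address and return true; return false if no mapping covers the page or
 * the access type is not permitted (the IOMMU would abort the request). */
static bool io_translate(const struct io_mapping *map, size_t n,
                         uint64_t dev_addr, bool is_write, uint64_t *sys_addr)
{
    assert(sys_addr != NULL);
    uint64_t page = dev_addr >> PAGE_SHIFT;
    for (size_t i = 0; i < n; i++) {
        if (map[i].dev_page != page)
            continue;
        if (is_write ? !map[i].can_write : !map[i].can_read)
            return false;                       /* access right violated */
        *sys_addr = (map[i].sys_page << PAGE_SHIFT) | (dev_addr & PAGE_OFFSET_MASK);
        return true;
    }
    return false;                               /* no translation for this page */
}
```

A read-only mapping lets a device read the page while its writes to the same page are refused.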

Function Of An IOMMU: What does it not do?
- Does not translate CPU-originated traffic
  - The processor's traffic is translated by the CPU's MMU
- Does not directly support demand-paged I/O
  - Devices and drivers are not designed to deal with arbitrary delays
  - Devices and drivers don't understand the concept of an "I/O page fault"
  - Support for remote address translation extensions enables indirect, device-specific demand paging

System Topology: Where is the IOMMU?
[Figure: system topology showing CPUs with attached DRAM, HyperTransport (HT) links, an IO hub, a tunnel, PCIe bridges, IOMMUs, PCI Express devices and switches, PCI/LPC devices, and device ATCs, including optional remote ATCs.]
ATC = Address Translation Cache; HT = HyperTransport; PCIe = PCI Express

System Topology
- IOMMUs are at the edge of the system interconnection fabric
  - Full source identification is available
- IOMMUs are distributed and independent
  - Creates scalable caching structures
- The IOMMU supports remote address translation caching extensions
  - Allows tuning of the caching hierarchy

Benefits Of An IOMMU: Why have one?
- Enhanced virtualization capabilities
  - Direct device assignment to guest OS
  - Improved performance and scalability
- Enables direct device access by user-mode applications

Benefits Of An IOMMU: Direct Device Assignment Example
[Figure: without an IOMMU, each guest OS's virtual device driver goes through a virtual device emulator in the virtual machine monitor, whose device driver controls the device controllers; with an IOMMU, each guest OS's own device driver controls its assigned device controller through the IOMMU.]
Overhead is reduced in the path between guest and device.

Benefits Of An IOMMU: Why have one? (continued)
- Enhanced security capabilities
  - Adds precise control of device access to the address space
  - Creates I/O protection domains
- Enhanced system reliability
  - Isolation between devices
  - Protects system memory from errant device writes

Benefits Of An IOMMU: Security and Isolation Example
[Figure: without an IOMMU, a malicious or errant write from one device controller can reach another device's I/O buffer in system memory; with an IOMMU, each I/O buffer sits in its own protection domain (Protection Domain 1, Protection Domain 2) and the write is blocked.]

Benefits Of An IOMMU: Why have one?
- Support for trusted input and output
  - Creates a protected channel between a device and its driver

Benefits Of An IOMMU: Trusted I/O Example
[Figure: without an IOMMU, content passing between an application, device drivers, disk controllers, and a graphics controller through I/O buffers in system memory can be captured by a third party; with an IOMMU, protected channels connect each driver to its device and third-party access is blocked.]

Benefits Of An IOMMU: Why have one?
- Support for legacy 32-bit devices in large-memory systems
  - Eliminates bounce buffers

Benefits Of An IOMMU: Bounce Buffer Example
[Figure: system memory is split into 0 to 4 GB and 4 GB+ regions, with the I/O buffer above 4 GB. Without an IOMMU, the disk controller is limited to 32-bit addressing, so it transfers through a bounce buffer below 4 GB and the CPU must move the data; with an IOMMU, the IOMMU translates the address so the data can be placed directly in the I/O buffer.]
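
The case the figure illustrates can be sketched as a range check. The helper name `needs_bounce` and the `dma_bits` parameter are illustrative assumptions: the function reports whether a buffer lies wholly or partly beyond what a device with a given DMA address width can reach, which is exactly when a system without an IOMMU must bounce the data through low memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the buffer [phys, phys + len) is not fully reachable by a device
 * limited to dma_bits of address (e.g. 32). Without an IOMMU such a buffer
 * needs a bounce buffer below the limit; with an IOMMU, a device address
 * below the limit can be mapped onto it instead. */
static bool needs_bounce(uint64_t phys, uint64_t len, unsigned dma_bits)
{
    assert(len > 0 && dma_bits > 0 && dma_bits <= 64);
    if (dma_bits == 64)
        return false;
    uint64_t limit = 1ull << dma_bits;  /* first unreachable address */
    /* Written this way to avoid overflow in phys + len. */
    return phys >= limit || len > limit - phys;
}
```

A 4 KB buffer ending exactly at the 4 GB boundary is still reachable by a 32-bit device; one that crosses it is not.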

Benefits Of An IOMMU: Why have one?
- Synergy with PCI-SIG virtualization efforts
  - Address Translation Services (ATS)
  - Single-root device virtualization
  - Multi-root shared I/O fabric

IOMMU Features
- Variable per-device virtual address range
- Variable per-device physical page size
- Flexible virtual address space sharing options
  - Devices can have their own virtual address space
  - Devices can share a virtual address space
- Can be utilized natively by an enhanced OS
- Can be utilized by a virtual machine monitor

Translation Data Structures: Definitions
- Requester ID (RID): label identifying the source of a transaction
- Address Translation Cache (ATC): local or remote coherent copy of address translations
- I/O Translation Lookaside Buffer (IOTLB): a remote ATC that exists in a device associated with an IOMMU
- Address Translation Services (ATS): extensions supporting remote caching of address translations

Translation Data Structures: Definitions
- Page Directory Entry (PDE): translation table entry that points at a table
- Page Table Entry (PTE): translation table entry that contains a translation
- Root translation table: translation table at the top of the translation hierarchy
- Device Table: maps a Requester ID to a root translation table

Translation Data Structures: Device Requests
- Contain a Requester ID
  - Bus/Device/Function (BDF) for PCI Express
  - Unit ID, or BDF (with the SRC ID extension), for HyperTransport
- Extensions to support a remote ATC
  - Untranslated (device virtual address) read or write
    - Default case: the IOMMU translates the address of the request
  - Translated (system physical address) read or write
    - The IOMMU uses the address provided without translation
  - Translation request
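
For PCI Express, the Requester ID is the 16-bit bus/device/function triple: bus number in bits 15:8, device number in bits 7:3, function number in bits 2:0. A small C sketch of packing and unpacking it (the helper names are hypothetical, and the HyperTransport Unit ID form is not modeled):

```c
#include <assert.h>
#include <stdint.h>

/* Pack bus (8 bits), device (5 bits), and function (3 bits) into a RID. */
static uint16_t make_rid(unsigned bus, unsigned dev, unsigned fn)
{
    assert(bus < 256 && dev < 32 && fn < 8);
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

static unsigned rid_bus(uint16_t rid) { return rid >> 8; }          /* bits 15:8 */
static unsigned rid_dev(uint16_t rid) { return (rid >> 3) & 0x1F; } /* bits 7:3 */
static unsigned rid_fn(uint16_t rid)  { return rid & 0x7; }         /* bits 2:0 */
```

Because the RID is a dense 16-bit number, it can serve directly as an index into a table.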

Translation Data Structures: Device Table
- Single contiguous block of system memory
  - Maps a Requester ID to a root translation table
- Per-device virtual address space supported
  - Many-to-one mappings supported
- Each device is assigned a Domain ID
  - Devices may share a Domain ID
  - IOMMU invalidations are managed on a per-domain basis

Translation Data Structures: Device Table
Device table entry format (128 bits):
  Bits 127:104  Reserved
  Bits 103:96   Control bits
  Bits 95:80    Reserved
  Bits 79:64    Domain ID [15:0]
  Bit  63       Reserved
  Bits 62:61    IR, IW
  Bits 60:52    Reserved
  Bits 51:32    Page Table Root Pointer [51:32]
  Bits 31:12    Page Table Root Pointer [31:12]
  Bits 11:9     NL
  Bits 8:1      Reserved
  Bit  0        V
Legend: V: valid bit; IW: I/O write protection; IR: I/O read protection; NL: next level
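
The following C sketch reads the fields of the 128-bit entry above and locates an entry in the table. It assumes the entry is held as two 64-bit words (`lo` = bits 63:0, `hi` = bits 127:64) and uses a 16-byte stride to match this layout; the names are hypothetical, and the IR, IW, and control bits are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct dte { uint64_t lo, hi; };   /* lo = bits 63:0, hi = bits 127:64 */

#define DTE_ENTRY_SIZE 16u         /* 128-bit entries, per the layout above */

/* The table is one contiguous block indexed directly by the Requester ID. */
static uint64_t dte_addr(uint64_t table_base, uint16_t rid)
{
    return table_base + (uint64_t)rid * DTE_ENTRY_SIZE;
}

static bool     dte_valid(struct dte e)      { return e.lo & 1; }          /* V, bit 0 */
static unsigned dte_next_level(struct dte e) { return (e.lo >> 9) & 0x7; } /* NL, bits 11:9 */

/* Page table root pointer, bits 51:12: a 4 KB-aligned system address. */
static uint64_t dte_root(struct dte e)
{
    return e.lo & 0x000FFFFFFFFFF000ull;
}

/* Domain ID, bits 79:64, i.e. bits 15:0 of the upper word. */
static uint16_t dte_domain(struct dte e)
{
    return (uint16_t)(e.hi & 0xFFFF);
}
```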

Translation Data Structures: Page Tables
- Translation tables are always 4 KB blocks in system memory
- The root translation table's base address comes from the Device Table
  - It may point to either a table of PDEs or a table of PTEs
- Intermediate translation tables
  - Point to either a table of PDEs or a table of PTEs

Translation Data Structures: Page Tables
PDE format (64 bits):
  Bit  63       Reserved
  Bits 62:61    IR, IW
  Bits 60:52    Reserved
  Bits 51:32    Next Table Address [51:32]
  Bits 31:12    Next Table Address [31:12]
  Bits 11:9     NL
  Bits 8:1      Reserved
  Bit  0        P
PTE format (64 bits):
  Bit  63       Reserved
  Bits 62:61    IR, IW
  Bit  59       NS
  Bit  58       U
  Bits 51:32    Page Address [51:32]
  Bits 31:12    Page Address [31:12]
  Bits 11:9     000 (next level field is zero)
  Bits 8:1      Reserved
  Bit  0        P
Legend: P: present; NL: next level; IR: I/O read protection; IW: I/O write protection; S: size; U: ATS attribute bit; NS: ATS attribute bit
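
A C sketch of decoding the fields that the PDE and PTE formats share: the present bit (bit 0), the next-level field (bits 11:9), and the address (bits 51:12). It reads a next-level value of zero as marking a PTE, following the `000` shown in the PTE format; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool     entry_present(uint64_t e)    { return e & 1; }          /* P, bit 0 */
static unsigned entry_next_level(uint64_t e) { return (e >> 9) & 0x7; } /* NL, bits 11:9 */

/* Next table address (PDE) or page address (PTE), bits 51:12. */
static uint64_t entry_addr(uint64_t e)
{
    return e & 0x000FFFFFFFFFF000ull;
}

/* A present entry with NL == 0 is a PTE; a nonzero NL marks a PDE that
 * points at a table of that level. */
static bool entry_is_pte(uint64_t e)
{
    assert(entry_present(e));
    return entry_next_level(e) == 0;
}
```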

Translation Data Structures: Simplified View
1) The IOMMU receives a device request, but the translation is not cached in the ATC, so:
2) The Requester ID from the device request is used to select the root translation table.
3) The address from the device request is used to walk the page tables,
4) and the result refills the ATC and satisfies the device request.
[Figure: a device request carrying a Requester ID and a "virtual" address enters the IOMMU; the Requester ID is the index into the device table (located by the device table base), whose entry is a pointer to the page tables; the walk yields the "translated" address, which is cached in the ATC.]
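
The four steps can be modeled in software. This C sketch is a simulation under stated assumptions, not the hardware algorithm: "system memory" is a small array of 4 KB tables, the device table is cut down to bus 0 and stores only a root address and level, the ATC holds a single entry, and each table level consumes 9 address bits (512 eight-byte entries per 4 KB table). A PDE's next-level field selects the level of the table it points to, so level skipping and early exit (covered on the following slides) fall out of the same loop. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADDR_MASK 0x000FFFFFFFFFF000ull   /* bits 51:12 of a DTE/PDE/PTE */
#define NTABLES   8                       /* size of the simulated memory */

/* Simulated system memory: 4 KB table i lives at address (i + 1) << 12. */
static uint64_t mem[NTABLES][512];

static uint64_t *table_at(uint64_t addr)
{
    assert(addr != 0 && (addr & 0xFFF) == 0 && (addr >> 12) - 1 < NTABLES);
    return mem[(addr >> 12) - 1];
}

/* Device table reduced to bus 0 (RIDs 0-255): root table and its level. */
struct dev_entry { bool valid; uint64_t root; unsigned level; };
static struct dev_entry devtab[256];

/* A one-entry stand-in for the ATC, caching one 4 KB translation. */
static struct { bool valid; uint16_t rid; uint64_t va_pg, pa_pg; } atc;

/* Step 3: walk from the root; each level indexes 9 VA bits. */
static bool walk(uint64_t table, unsigned level, uint64_t va, uint64_t *pa)
{
    while (level > 0) {
        unsigned shift = 12 + 9 * (level - 1);
        uint64_t e = table_at(table)[(va >> shift) & 0x1FF];
        unsigned nl = (unsigned)(e >> 9) & 7;
        if (!(e & 1))
            return false;                 /* not present: I/O page fault */
        if (nl == 0) {                    /* PTE: maps a 2^shift-byte page */
            *pa = (e & ADDR_MASK) | (va & ((1ull << shift) - 1));
            return true;
        }
        if (nl >= level)
            return false;                 /* a PDE must point to a lower level */
        /* Level skipping: the VA bits of any skipped levels must be zero. */
        if (((va >> (12 + 9 * nl)) & ((1ull << (9 * (level - 1 - nl))) - 1)) != 0)
            return false;
        table = e & ADDR_MASK;
        level = nl;
    }
    return false;
}

/* Steps 1-4 for one request from device `rid` at device address `va`. */
static bool iommu_translate(uint16_t rid, uint64_t va, uint64_t *pa)
{
    if (atc.valid && atc.rid == rid && atc.va_pg == va >> 12) {  /* 1: ATC hit */
        *pa = (atc.pa_pg << 12) | (va & 0xFFF);
        return true;
    }
    if (rid >= 256 || !devtab[rid].valid)                       /* 2: device table */
        return false;
    const struct dev_entry *d = &devtab[rid];
    unsigned top = 12 + 9 * d->level;     /* VA bits above the root level must be zero */
    if (top < 64 && (va >> top) != 0)
        return false;
    if (!walk(d->root, d->level, va, pa))                        /* 3: table walk */
        return false;
    atc.valid = true;                                            /* 4: refill the ATC */
    atc.rid = rid;
    atc.va_pg = va >> 12;
    atc.pa_pg = *pa >> 12;
    return true;
}
```

Once cached, a translation keeps being used even if the page table entry behind it changes, until software invalidates it; that is why the software interface provides explicit invalidation commands.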

Translation Data Structures: Advanced Capabilities
- Support for 64-bit device virtual addresses
  - Requires a 6-level lookup
- Support for variable page sizes
  - All powers of 2 from 4 KB up
- IOMMU tables may be shared with the CPU MMU
  - Can be efficiently virtualized
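
The 6-level figure follows from the table geometry: a 4 KB table of 8-byte entries holds 512 entries, so each level resolves 9 address bits above the 12-bit page offset. A one-line C helper (hypothetical name) computes the depth for a given address width:

```c
#include <assert.h>

/* Table levels needed for a va_bits-wide device address, with 4 KB pages
 * and 512-entry (9-bit) tables: ceil((va_bits - 12) / 9). */
static unsigned levels_needed(unsigned va_bits)
{
    assert(va_bits >= 12 && va_bits <= 64);
    return (va_bits - 12 + 8) / 9;
}
```

For example, a 48-bit address needs 4 levels, and a 64-bit address needs ceil(52 / 9) = 6.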

Translation Data Structures: Advanced Capabilities
- Configurable maximum table depth
  - If the virtual address has a group of leading zeros, the lookup depth may be reduced
- Level skipping
  - If the virtual address has interior groups of zeros, lookup levels may be skipped
- Early exit
  - Exit is possible at any level, with the remaining untranslated address bits used as an offset within a "super page"
  - The base page is 4 KB; super pages are 2 MB (2^21), 1 GB (2^30), 512 GB (2^39), and 2^48 bytes
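
Exiting the walk at level L leaves the low 12 + 9(L - 1) address bits as the page offset, which gives the super page sizes listed above. A C sketch (hypothetical helper name):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes mapped by a PTE found at `level` (1 = last level):
 * 2^(12 + 9 * (level - 1)), i.e. 4 KB, 2 MB, 1 GB, 512 GB, 2^48. */
static uint64_t page_size_at_level(unsigned level)
{
    assert(level >= 1 && level <= 5);
    return 1ull << (12 + 9 * (level - 1));
}
```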

Translation Data Structures: Example with Level Skipping
Starting level 4; level 3 skipped; final level 1 skipped (2 MB super page)
Virtual address layout:
  Bits 63:48  Zero (1)
  Bits 47:39  Level-4 page table offset
  Bits 38:30  Zero (1), skipped level 3
  Bits 29:21  Level-2 page table offset
  Bits 20:0   Physical page offset
Walk: the Level 4 Page Table Address (next level 4h) locates the Level-4 table; the indexed PDE (next level 2h) points to the Level-2 table; the indexed PTE (next level 0h) supplies the physical address of a 2 MB page, with VA bits 20:0 as the offset.
(1) The virtual address bits associated with all skipped levels must be zero.
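
A C sketch of how a virtual address is carved up for this particular example, including the check that the bits above the starting level and the index bits of the skipped level 3 are zero. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract virtual address bits hi:lo (inclusive). */
static uint64_t va_field(uint64_t va, unsigned hi, unsigned lo)
{
    assert(lo <= hi && hi < 64);
    unsigned width = hi - lo + 1;
    uint64_t mask = (width == 64) ? ~0ull : ((1ull << width) - 1);
    return (va >> lo) & mask;
}

/* The example: start at level 4, skip level 3, and exit at level 2 with a
 * 2 MB super page. Returns false if the VA is not valid for this walk. */
static bool split_example_va(uint64_t va, unsigned *l4_index,
                             unsigned *l2_index, uint64_t *page_offset)
{
    if (va_field(va, 63, 48) != 0)      /* above the starting level */
        return false;
    if (va_field(va, 38, 30) != 0)      /* index bits of skipped level 3 */
        return false;
    *l4_index = (unsigned)va_field(va, 47, 39);
    *l2_index = (unsigned)va_field(va, 29, 21);
    *page_offset = va_field(va, 20, 0); /* offset within the 2 MB page */
    return true;
}
```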

Software Interface: Control Structures
- Command queue
- Event queue
[Figure: the IOMMU's Device Table base, Cmd Buffer base, and Event Log base registers point to the Device Table, Command Queue, and Event Log in system memory; the Device Table leads to the I/O page tables.]

Software Interface: Control Structures
- Command queue
  - Circular ring buffer in system memory
  - Low insertion overhead
  - Processed at the IOMMU service rate
  - 16-byte command entries
  - Maximum size is 512 KB
- Event log
  - Circular ring buffer in system memory
  - Low removal overhead
  - Processed at the CPU service rate
  - 16-byte log entries
  - Maximum size is 512 KB

Software Interface: Command Queue
- The tail pointer is incremented by the CPU after writing a command
  - The tail pointer write signals the IOMMU that a new command is ready
- The head pointer is incremented by the IOMMU after reading a command
IOMMU registers:
  MMIO offset 0008h  Command buffer base and buffer size
  MMIO offset 2000h  Head pointer
  MMIO offset 2008h  Tail pointer
  MMIO offset 2020h  Status register
[Figure: the registers describe a circular command buffer in system memory with 16-byte slots at offsets +0, +16, ..., +112; system software (producer) writes commands and the tail pointer, and the IOMMU (consumer) reads commands.]
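
The head/tail protocol is a single-producer, single-consumer ring; with 16-byte entries, the 512 KB maximum corresponds to 32,768 commands. The C sketch below simulates both sides in ordinary memory. The `cmd_ring` structure is a hypothetical stand-in for the buffer and the MMIO head and tail registers, and keeping one slot empty (so that head == tail means "empty") is an assumption of this sketch rather than something the slide specifies. A real driver would also need a write barrier between filling the entry and updating the tail register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CMD_SIZE 16u                 /* bytes per command entry */

/* buf stands in for the ring in system memory; head and tail stand in for
 * the MMIO head and tail registers, kept as byte offsets. */
struct cmd_ring {
    uint8_t *buf;
    uint32_t size;                   /* ring size in bytes */
    uint32_t head;                   /* advanced by the IOMMU (consumer) */
    uint32_t tail;                   /* advanced by software (producer) */
};

/* Producer: write one command at the tail, then advance the tail. */
static bool cmd_post(struct cmd_ring *r, const uint8_t cmd[CMD_SIZE])
{
    assert(r->size >= 2 * CMD_SIZE && r->size % CMD_SIZE == 0);
    uint32_t next = (r->tail + CMD_SIZE) % r->size;
    if (next == r->head)
        return false;                /* full: wait for the IOMMU to catch up */
    memcpy(r->buf + r->tail, cmd, CMD_SIZE);
    /* On hardware: a write barrier here, then an MMIO write of the tail. */
    r->tail = next;
    return true;
}

/* Consumer (the IOMMU's role, simulated): read one command at the head. */
static bool cmd_fetch(struct cmd_ring *r, uint8_t out[CMD_SIZE])
{
    if (r->head == r->tail)
        return false;                /* empty */
    memcpy(out, r->buf + r->head, CMD_SIZE);
    r->head = (r->head + CMD_SIZE) % r->size;
    return true;
}
```

With a four-slot ring, three commands fit before `cmd_post` reports the ring full; each command the consumer fetches frees one slot.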

Software Interface: Commands
- Invalidate Device Table Entry
  - Indexed by Device ID
- Invalidate IOMMU Pages
  - Power-of-2, naturally aligned number of 4 KB pages
  - Indexed by domain
- Invalidate IOTLB Pages
  - Power-of-2, naturally aligned number of 4 KB pages
  - Indexed by Device ID
- Completion Wait
  - May be used as a fence
  - May be used to signal an interrupt
  - May be used to write a flag in system memory
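
Because the page invalidation commands name a power-of-2, naturally aligned number of 4 KB pages, software must round an arbitrary buffer up to such a block. One simple way to do that, sketched in C with a hypothetical helper (the command encoding itself is not shown), is to grow the block order until the first and last pages fall in the same aligned block:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest naturally aligned, power-of-2 run of 4 KB pages covering the
 * byte range [addr, addr + len). Returns the first page number and stores
 * log2(number of pages) in *order. */
static uint64_t inval_range(uint64_t addr, uint64_t len, unsigned *order)
{
    assert(len > 0 && addr + (len - 1) >= addr);   /* no wraparound */
    uint64_t first = addr >> 12;
    uint64_t last = (addr + len - 1) >> 12;
    unsigned k = 0;
    /* Double the block until both ends land in the same aligned block. */
    while (k < 52 && (first >> k) != (last >> k))
        k++;
    *order = k;
    return (first >> k) << k;
}
```

The rounding can over-invalidate: an 8 KB buffer at 0x3000 crosses the 16 KB boundary at 0x4000, so the smallest aligned block covering it is 8 pages (32 KB) starting at page 0.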

Software Interface: Command Ordering and Semantics
- The IOMMU manages ordering interlocks
  - Invalidate Device Table commands complete before subsequent Invalidate IOMMU Pages commands
  - Invalidate IOMMU Pages commands complete before subsequent Invalidate IOTLB Pages commands
- Completion semantics
  - Invalidation commands are complete when all overlapping DMA transactions in flight to system memory are either complete or visible
- Completion is signaled when the Completion Wait command is executed
  - Interrupt
  - Memory-based flag

Software Interface: Event Log
- The tail pointer is incremented by the IOMMU after writing an event
  - The IOMMU can be configured to signal an interrupt when the event log is written
- The head pointer is incremented by the CPU after reading an event
  - The head pointer write signals the IOMMU that the event has been consumed
IOMMU registers:
  MMIO offset 0010h  Event log base and buffer size
  MMIO offset 2010h  Head pointer
  MMIO offset 2018h  Tail pointer
  Status register
[Figure: the registers describe a circular event log in system memory with 16-byte slots at offsets +0, +16, ..., +112; the IOMMU (producer) writes events and the tail pointer, and system software (consumer) reads events and writes the head pointer.]
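
The consumer side mirrors the command queue. In this C sketch (hypothetical names), the structure stands in for the log buffer and the head and tail registers, and the function copies out every pending event between head and tail, wrapping at the end of the buffer. Writing the head back once per batch rather than once per event is a design choice of this sketch; either way, the head write is what tells the IOMMU that the slots may be reused.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EVT_SIZE 16u                 /* bytes per event log entry */

/* buf stands in for the log in system memory; tail is what software would
 * read from the tail register, head what it writes back. */
struct evt_log {
    const uint8_t *buf;
    uint32_t size;                   /* log size in bytes */
    uint32_t head;                   /* advanced by software (consumer) */
    uint32_t tail;                   /* advanced by the IOMMU (producer) */
};

/* Copy out up to max pending events, oldest first. Returns the count. */
static unsigned evt_drain(struct evt_log *el, uint8_t (*out)[EVT_SIZE], unsigned max)
{
    assert(el->size % EVT_SIZE == 0 && el->head < el->size);
    uint32_t tail = el->tail;        /* snapshot: one tail register read */
    unsigned n = 0;
    while (n < max && el->head != tail) {
        memcpy(out[n], el->buf + el->head, EVT_SIZE);
        el->head = (el->head + EVT_SIZE) % el->size;
        n++;
    }
    /* On hardware: write el->head to the head register here. */
    return n;
}
```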

Software Interface: Events
- Translation events
  - Invalid Device Table Entry
  - IO Page Fault
  - Device Table HW Error
  - Page Table HW Error
  - Invalid Device Request
- Command processing events
  - Command HW Error
  - Illegal Command
  - IOTLB Invalidate Timeout

Software Interface: Exception Handling
- Translation failure for any reason (e.g., errors due to I/O page faults, or memory errors during page table walks)
  - The request is aborted
  - Completer Abort (CA) is returned to the device where possible
  - Details are logged
  - An interrupt is optionally generated
- Command queue failure
  - Processing is halted
  - Details are logged
  - An interrupt is optionally generated

Software Interface: OS/Hypervisor Interactions
- Initialization
  - Done via configuration and MMIO transactions
  - Clear caches, set the base address and size of domain tables, etc.
- Runtime operations
  - Device table updates, translation cache invalidations
  - Combination of MMIO and DRAM accesses
  - MP support requires software-managed sharing of the command buffer
  - Each IOMMU has a separate command and event queue
- Virtualization of the IOMMU
  - Intercept MMIO pointer writes to the virtual IOMMU
  - Process the virtual IOMMU command queue and update shadow tables
  - Forward Invalidate commands to the real IOMMU

Call To Action
- Read the "AMD I/O Virtualization (IOMMU) Technology" specification to understand hardware-assisted virtualization, available at http://developer.amd.com/documentation.aspx
- Driver writers should consider the effects of the change from physical to virtual address assignment
- Device vendors should consider the impact on their devices when used with I/O memory management hardware
- Sign up for AMD's development center at http://devcenter.amd.com

Additional Resources
- Web resources
  - Main page: http://www.amd.com
  - Developer Center: http://devcenter.amd.com
  - PCI-SIG: http://www.pcisig.com
- Related sessions
  - PCIe Address Translation Services and I/O Virtualization
  - Windows Virtualization Best Practices and Future Hardware Directions

Questions?
