smart data accelerator interface (“sdxi”) specification

109
1 Smart Data Accelerator Interface (“SDXI”) Specification Version 0.9.0 rev 1 ABSTRACT: Smart Data Accelerator Interface (SDXI) is a proposed standard for a memory to memory Data Mover and acceleration interface. Publication of this Working Draft for review and comment has been approved by the SDXI TWG. This draft represents a “best effort” attempt by the SDXI TWG to reach preliminary consensus, and it may be updated, replaced, or made obsolete at any time. This document should not be used as reference material or cited as other than a “work in progress.” Suggestions for revisions should be directed to http://www.snia.org/feedback/ Working Draft July 2021

Upload: others

Post on 04-Dec-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

1

Smart Data Accelerator Interface (“SDXI”) Specification

Version 0.9.0 rev 1

ABSTRACT: Smart Data Accelerator Interface (SDXI) is a proposed standard for a memory to memory Data Mover and acceleration interface.

Publication of this Working Draft for review and comment has been approved by the SDXI TWG. This draft represents a “best effort” attempt by the SDXI TWG to reach preliminary consensus, and it may be

updated, replaced, or made obsolete at any time. This document should not be used as reference material or cited as other than a “work in progress.” Suggestions for revisions should be directed to

http://www.snia.org/feedback/

Working Draft

July 2021

USAGE 2 Copyright © 2021 SNIA. All rights reserved. All other trademarks or registered trademarks are the property of their 3 respective owners. 4

The SNIA hereby grants permission for individuals to use this document for personal use only, and for corporations and 5 other business entities to use this document for internal use only (including internal copying, distribution, and display) 6 provided that: 7

1. Any text, diagram, chart, table or definition reproduced shall be reproduced in its entirety with no alteration, and, 8

2. Any document, printed or electronic, in which material from this document (or any portion hereof) is reproduced 9 shall acknowledge the SNIA copyright on that material, and shall credit the SNIA for granting permission for its 10 reuse. 11

12

Other than as explicitly provided above, you may not make any commercial use of this document or any portion thereof, 13 or distribute this document to third parties. All rights not explicitly granted are expressly reserved to SNIA. 14

Permission to use this document for purposes other than those enumerated above may be requested by e-mailing 15 [email protected]. Please include the identity of the requesting individual and/or company and a brief description of the 16 purpose, nature, and scope of the requested use. 17

18

All code fragments, scripts, data tables, and sample code in this SNIA document are made available under 19 the following license: 20

BSD 3-Clause Software License 21 22 Copyright (c) 2021, The Storage Networking Industry Association. 23 24 Redistribution and use in source and binary forms, with or without modification, are permitted provided that the 25 following conditions are met: 26 27 * Redistributions of source code must retain the above copyright notice, this list of conditions and the following 28 disclaimer. 29 30 * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following 31 disclaimer in the documentation and/or other materials provided with the distribution. 32 33 * Neither the name of The Storage Networking Industry Association (SNIA) nor the names of its contributors may 34 be used to endorse or promote products derived from this software without specific prior written permission. 35 36 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY 37 EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 38 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT 39 SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 40 INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED 41 TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR 42 BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 43 CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY 44 WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 45

46

DISCLAIMER 47

The information contained in this publication is subject to change without notice. The SNIA makes no 48 warranty of any kind with regard to this specification, including, but not limited to, the implied warranties of 49 merchantability and fitness for a particular purpose. The SNIA shall not be liable for errors contained herein 50 or for incidental or consequential damages in connection with the furnishing, performance, or use of this 51 specification. 52

53

54

Revision History 55

56

Revision Date Comments v0.7 September 2020 Starting point Draft Contribution to SNIA SDXI TWG

v0.9 July 2021 First Public preview

57

58

59

Suggestions for revisions should be directed to http://www.snia.org/feedback/. 60 61

62

63

64

65

Smart Data Accelerator Interface (“SDXI”) Specification 66

[Status] 67

July 2021 68

69

70

71

SDXI TWG v0.9 Contributors 72

Philip Ng, AMD 73

Alexandre Romana, Arm 74

Glen Sescila, Dell Inc 75

Shyam Iyer, Dell Inc 76

Paul Von Stamwitz, Fujitsu 77

Curtis Ballard, HPE 78

Dwight Riley, HPE 79

Donald Dutile, Red Hat Inc. 80

Beau Beachamp, MemVerge 81

Jason Wohlgemuth, Microsoft 82

Murali Ravirala, Microsoft 83

Frederick Knight, NetApp 84

Bill Martin, Samsung 85

Santosh Kumar, SK Hynix 86

Rich Brunner, VMware 87

James Leighton, Western Digital 88

Paul Hartke, Xilinx 89

90

91

2 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Table of contents 92

ABSTRACT: SMART DATA ACCELERATOR INTERFACE (SDXI) IS A PROPOSED STANDARD FOR A MEMORY TO 93 MEMORY DATA MOVER AND ACCELERATION INTERFACE. ................................................................................. 1 94

SDXI TWG V0.9 CONTRIBUTORS ............................................................................................................... 1 95

1 SDXI: OVERVIEW .................................................................................................................................... 6 96

1.1 SCOPE 6 97

1.2 OUTSIDE SCOPE 6 98

1.3 DOCUMENTATION CONVENTIONS ......................................................................................................... 6 99

Normative vs. Informative ........................................................................................................... 6 100

Reserved .................................................................................................................................... 6 101

Developer Notes ........................................................................................................................ 7 102

Abbreviations and Terminology .................................................................................................. 7 103

Other Clarifying Notes ................................................................................................................ 8 104

References ................................................................................................................................. 8 105

2 BACKGROUND ........................................................................................................................................ 9 106

2.1 ARCHITECTED PLATFORM DATA MOVER .............................................................................................. 9 107

SDXI Descriptor Ring .............................................................................................................. 10 108

Virtualization Support ............................................................................................................... 11 109

2.2 ADDRESSING MODES ........................................................................................................................ 12 110

PASID Usage ........................................................................................................................... 12 111

Address Spaces and Multi-Address-Space Operation .............................................................. 12 112

SDXI Function Groups ............................................................................................................. 13 113

Unpinned Memory Access ........................................................................................................ 14 114

2.3 MODULARITY AND EXPANDABILITY ..................................................................................................... 14 115

2.4 ENDIAN FORMAT SUPPORT ............................................................................................................... 15 116

3 SYSTEM MEMORY DATA STRUCTURES ............................................................................................ 16 117

3.1 OVERVIEW 16 118

3.2 CONTEXT DATA STRUCTURES ........................................................................................................... 16 119

Context Level 2 Table .............................................................................................................. 18 120

Context Level 1 Table .............................................................................................................. 19 121

Context Control Entry ............................................................................................................... 20 122

AKey Table Entry ..................................................................................................................... 22 123

Context Status Entry ................................................................................................................ 24 124

3.3 ADMINISTRATIVE CONTEXT (CONTEXT 0) ........................................................................................... 24 125

SNIA SDXI Specification Working Draft 3 Version 0.9.0 rev 1

3.4 ERROR LOG 24 126

Error Log Header Entry ............................................................................................................ 25 127

Descriptor Retry ....................................................................................................................... 30 128

3.5 SDXI MULTI-FUNCTION ACCESS ....................................................................................................... 30 129

SDXI Function Group ............................................................................................................... 30 130

RKey Processing ...................................................................................................................... 30 131

RKey Table .............................................................................................................................. 31 132

RKey Table Entry ..................................................................................................................... 31 133

4 SDXI FUNCTION AND CONTEXT STATE ............................................................................................. 33 134

4.1 SDXI FUNCTION STATE .................................................................................................................... 33 135

Stopped State .......................................................................................................................... 34 136

Initializing State ........................................................................................................................ 34 137

Active State .............................................................................................................................. 34 138

Soft Stop Request State ........................................................................................................... 34 139

Hard Stop Request State .......................................................................................................... 34 140

Stopped-On-Error State............................................................................................................ 35 141

4.2 SDXI CONTEXT STATE ..................................................................................................................... 35 142

Context Stopped/NotRunning States ........................................................................................ 35 143

Context Inactive State .............................................................................................................. 35 144

Context Ready State ................................................................................................................ 35 145

Context Running State ............................................................................................................. 37 146

Context Stopping State ............................................................................................................ 37 147

Context Stopping With Errors State .......................................................................................... 37 148

Context Error State ................................................................................................................... 37 149

5 SDXI DESCRIPTOR RING OPERATION ................................................................................................ 38 150

5.1 ENQUEUING ONE OR MORE DESCRIPTORS.......................................................................................... 40 151

Multi-Producer Enqueue ........................................................................................................... 43 152

5.2 DESCRIPTOR PROCESSING ............................................................................................................... 44 153

5.3 DESCRIPTOR ORDERING AND PARALLEL EXECUTION .......................................................................... 44 154

5.4 DESCRIPTOR COMPLETION ............................................................................................................... 45 155

5.5 MEMORY CONSISTENCY MODEL ........................................................................................................ 45 156

5.6 DESCRIPTOR CHAINING .................................................................................................................... 46 157

Extended Descriptors ............................................................................................................... 46 158

4 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

6 SDXI DESCRIPTOR AND OPERATION SPECIFICATION ..................................................................... 47 159

6.1 DESCRIPTOR FORMAT FOR SDXI OPERATIONS .................................................................................. 48 160

Common Header and Footer .................................................................................................... 48 161

Completion Status Block .......................................................................................................... 50 162

Attribute Field ........................................................................................................................... 50 163

6.2 DMA BASE OPERATIONS GROUP (DMABASEGRP) ............................................................................. 52 164

DmaBaseGrp: DS_DMAB_NOP ............................................................................................... 52 165

DmaBaseGrp: DS_DMAB_WRT_IMM Operation ..................................................................... 54 166

DmaBaseGrp: DS_DMAB_COPY Operation ............................................................................ 56 167

DmaBaseGrp: DS_DMAB_REPCOPY Operation ..................................................................... 58 168

6.3 ADMINISTRATIVE OPERATION GROUP (ADMINGRP) ............................................................................. 60 169

AdminGrp: DS_ADM_UPD_FUNC Operation ........................................................................... 61 170

AdminGrp DS_ADM_UPD_CXT Operation .............................................................................. 63 171

AdminGrp DS_ADM_UPD_AKEY Operation ............................................................................ 65 172

AdminGrp DS_ADM_START Operation ................................................................................... 67 173

AdminGrp DS_ADM_STOP Operation ..................................................................................... 69 174

AdminGrp DS_ADM_INTR Operation ....................................................................................... 71 175

AdminGrp DS_ADM_SYNC Operation ..................................................................................... 73 176

AdminGrp DS_ADM_UPD_RKEY Operation ............................................................................ 75 177

6.4 ATOMIC OPERATION GROUP (ATOMICGRP) ........................................................................................ 76 178

6.5 INTRGRP OPERATION GROUP ........................................................................................................... 79 179

IntrGrp DS_INTERRUPT Operation ......................................................................................... 79 180

6.6 VENDOR DEFINED OPERATION GROUP .............................................................................................. 81 181

7 FUNCTION INITIALIZATION AND CONTEXT MANAGEMENT ............................................................. 82 182

7.1 FUNCTION INITIALIZATION .................................................................................................................. 82 183

7.2 CONTEXT MANAGEMENT ................................................................................................................... 82 184

Caching of Memory Data Structures ......................................................................................... 82 185

Modify or Delete a Context Level 2 Table Entry ........................................................................ 83 186

Add Context Level 1 Table ....................................................................................................... 83 187

Modify or Delete a Context Level 1 Table Entry ........................................................................ 83 188

New Context Creation .............................................................................................................. 84 189

Modify a Context Control Entry ................................................................................................. 85 190

Modify or Delete an AKey Table Entry ...................................................................................... 85 191

Start a Context ......................................................................................................................... 86 192

SNIA SDXI Specification Working Draft 5 Version 0.9.0 rev 1

Restore and Resume a Context ............................................................................................... 86 193

Stop/Save a Context ................................................................................................................ 87 194

7.3 ERROR LOG PROCESSING ................................................................................................................. 87 195

Error Log Initialization ............................................................................................................... 87 196

Error Log Overflow Recovery ................................................................................................... 88 197

7.4 RESET 88 198

7.5 MAILBOX INTERFACE ........................................................................................................................ 88 199

7.6 DESCRIPTOR DRIVEN INTERRUPTS .................................................................................................... 88 200

8 SDXI PCI-EXPRESS DEVICE ARCHITECTURE .................................................................................... 89 201

8.1 SDXI FUNCTION CONFIGURATION SPACE REGISTERS ........................................................................ 89 202

Class Code............................................................................................................................... 89 203

BAR Configuration .................................................................................................................... 90 204

Required Capabilities and Extended Capabilities ..................................................................... 91 205

9 MMIO CONTROL REGISTERS .............................................................................................................. 92 206

9.1 GENERAL CONTROL AND STATUS REGISTERS .................................................................................... 93 207

9.2 CONTEXT AND RKEY TABLE REGISTERS ............................................................................................ 97 208

9.3 ERROR LOGGING CONTROL AND STATUS REGISTERS ......................................................................... 98 209

9.4 MBOX MAILBOX REGISTERS ........................................................................................................... 100 210

9.5 PF MAILBOX DATA REGISTERS ........................................................................................................ 102 211

9.6 VF MAILBOX DATA REGISTERS ........................................................................................................ 104 212

9.7 DOORBELL BAR 105 213

214

6 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

1 SDXI: Overview 215

This document describes SDXI, an architectural data-mover device for server platforms. SDXI aims to 216 provide a variety of benefits including architectural stability, modularity, virtualization-friendly programming 217 interface, both user and kernel mode support, and new capabilities designed to accelerate virtualized 218 workloads. 219

1.1 Scope 220

1.2 Outside Scope 221

1.3 Documentation Conventions 222

Shall, Should, May, and Can 223

This specification adheres to Section 13.1 of the IEEE Specifications Style Manual, which dictates use of 224 the words ‘shall’, ‘should’, ‘may’, and ‘can’ in the development of documentation, as follows: 225

• The word ‘shall’ is used to indicate mandatory requirements strictly to be followed in order to conform 226 to the Specification and from which no deviation is permitted (‘shall’ equals ‘is required to’). 227

• The use of the word ‘must’ is deprecated and shall not be used when stating mandatory 228 requirements; ‘must’ is used only to describe unavoidable situations. 229

• The use of the word ‘will’ is deprecated and shall not be used when stating mandatory requirements; 230 will is only used in statements of fact. 231

• The word ‘should’ is used to indicate that among several possibilities one is recommended as 232 particularly suitable, without mentioning or excluding others; or that a certain course of action is 233 preferred but not necessarily required; or that (in the negative form) a certain course of action is 234 deprecated but not prohibited (should equals is recommended that). 235

• The word ‘may’ is used to indicate a course of action permissible within the limits of the Specification 236 (may equals is permitted). 237

• The word ‘can’ is used for statements of possibility and capability, whether material, physical, or 238 causal (‘can’ equals ‘is able to’). 239

Normative vs. Informative 240

All sections are normative, unless they are explicitly indicated to be informative using conventions 241 described in 1.3. 242

Reserved 243

The following applies to the term ‘Reserved’: 244

• The contents, state, or information are not specified at this time. 245

• Any field, feature, capability, etc. marked ‘Reserved’ is subject to change. 246

SNIA SDXI Specification Working Draft 7 Version 0.9.0 rev 1

Developer Notes 247

Developer Notes do not specify normative or optional requirements. They are included for clarification and 248 illustration only. These notes are delineated by: 249

250

Developer Note: This is such a note … 251

Abbreviations and Terminology 252

253

Term Definition Administrative Context

A context that supports the use of administrative operations used for managing SDXI Functions and their contexts.

ATS Address Translation Services. Defined in the PCI Express Base Specification. Context A Context refers to a descriptor ring used for executing operations, along with

all associated memory data structures such as control and status information DMA Direct Memory Access Function Group A collection of SDXI functions that may generate DMA requests using each

other’s PCIe Requester IDs. GPA Guest Physical Address GVA Guest Virtual Address HPA Host Physical Address HVA Host Virtual Address IO Input Output IOMMU IO Memory Management Unit IOVA IO Virtual Address Local Local or Local Space refers to the SFuncHandle=0 value that corresponds to

the SDXI PCIe function hosting the context and descriptor. MMIO Memory mapped IO MMU Memory Management Unit PASID Process Address Space Identifier. The combination of PASID and Requester ID

identifies the address space used by a transaction on the PCIe bus. PIO Programmed I/O PRI Page Request Interface. Defined in the PCI Express Base Specification. Remote Remote or Remote Space refers to the use of a SFuncHandle value that is not

the Local one. Requester ID The PCIe bus, device and function number used by an SDXI function as part of

DMA and address translation operations. SDXI Function A PCIe endpoint function that implements the SDXI specification. This may be

either a physical or virtual function. SFuncHandle SDXI Function Handle is an opaque identifier that maps to an SDXI function’s

PCIe Requester ID. This is used when accessing AKey referenced data structures. The encoding is implementation specific other than the 0 value.

254

255

8 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Other Clarifying Notes 256

1.3.5.1 Index 257

Index as used in this specification refers to a table index to a table-specific entry. In order to address the 258 corresponding table entry in memory, the index is multiplied by the size of the table-specific entry and added to 259 the beginning address of the table. For example, if a table starting at address 0x1000 has entries that are 64 260 bytes in size, then an index into that table needs to be multiplied by 64 and added to 0x1000 to determine the 261 correct memory location of the entry. 262 263

References 264

PCI Express Base Specification, Revision 5.0 265

SNIA SDXI Specification Working Draft 9 Version 0.9.0 rev 1

2 Background 266

The concept of a Direct Memory Access (DMA) data mover device to offload software-based copy loops is 267 well-known. Such offloading is desirable to free up CPU execution cycles. But adoption has been mostly 268 limited to specific privileged software and I/O use cases employing very device-specific interfaces that are 269 not forward compatible. The current limitations make user-mode application usage challenging in a non-270 virtualized environment and nearly impossible in a multi-tenant virtualized environment. The figure below 271 shows a vision for such an architectural data mover. 272

Figure 2-1 Architectural Data Mover 273

MMIO (Memory Mapped I/O)

SCM (Storage Class Memory)

Fabric Attached Memory

SDXI

System Physical Address space• Accelerate Data

Movement (CPU offloaded)

• Secure• Architectural, Stable

Interfaces

SW context isolation layers

Application(Context A) Application(Context B)

1. Leverage a standard specification

Direct User mode

2. Innovate around the spec

3. Add incremental Data acceleration

features

GPU

FPGA

SMART IO

CPU Family B SDXI

SDXI

SDXI

SDXI

CPU CPU Family A

DRAM(Context A)

DRAM(Context B)DRAM (Context B)

DRAM (Context A)

274

We have developed the SDXI Platform Data Mover (aka “SDXI”) to provide an architectural interface to 275 address these limitations: 276

• An extensible, forward-compatible interface that is independent of actual data mover implementations 277 and underlying I/O interconnect technology. 278

• A standard interface for user-mode address-space to address-space data movement without the 279 need of mediation by privileged software once a connection has been established. 280

• A standard interface for privileged software to control the data mover including connection 281 management and data movement between multiple address spaces 282

• An interface that can be abstracted or virtualized by privileged software to allow greater compatibility 283 of workloads or virtual machines across different servers. 284

• A well-defined capability to quiesce, suspend, and resume the architectural state of a per-address-285 space data mover to allow “live” workload or virtual machine migration between servers. 286

• A forward compatible interface that ensures pre-existing software drivers including user-mode drivers 287 can operate on future hardware implementations without modification. 288

• The ability to incorporate additional offloads in the future without modification to the architectural 289 interface. 290

2.1 Architected Platform Data Mover 291

The basic SDXI architecture consists of some number of Smart Data accelerators that are enumerated as 292 one or more PCIe endpoint functions. Each function may support 1 or more contexts with each context 293 having a single descriptor ring. SR-IOV is optionally supported. 294

10 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Figure 2-2 Basic SDXI Architecture 295

SDXI Device

IOMMU

System Memory

SDXI Data Mover

PF0VF0_0VF0_nVF

0_0C

ntxt

0

DataDataData

VF0_

0Cnt

xtm

-1

PF0C

ntxt

n-1

PF0C

ntxt

0

VF0_

nCnt

xt0

VF0_

nCnt

xtm

-1

296

297

SDXI Descriptor Ring 298

An SDXI descriptor is a naturally aligned 64-byte entry that instructs an SDXI function to perform a given 299 operation. Descriptors are placed in a circular ring that starts at a specified address (DescRingBasePtr). 300 The ring is contiguous at the translation level configured for the SDXI function. The ring is configured to 301 contain a given number of descriptor ring entries (DescRingSize). The number of bytes allocated in 302 memory for the ring is: (DescRingSize * 64). 303

A set of related operations form an operations group. An operation requires a particular descriptor format 304 to indicate parameters for the operation. 305

The ring and all of its related system memory data structures comprise a “context”. SDXI uses a 2-level 306 hierarchy of context tables (Context Table Level 2, Context Table Level 1) to enumerate the components 307 of the context. The concatenation of the offsets used to enumerate a context in both Context tables yields 308 the 16-bit “context_number” that is associated with the context. (Note that the Context Tables point to a 309 context but are not themselves part of the context.) 310

SNIA SDXI Specification Working Draft 11 Version 0.9.0 rev 1

Figure 2-3 SDXI Descriptor Ring 311

A circular ring requires “start” and “end” indicators. Rather than using memory pointers to track these, an 312 SDXI descriptor ring uses 64-bit *unsigned* logical indices to indicate the start (ReadIndex) and end 313 (WriteIndex) of the descriptor ring. This simplifies various calculations for SW and HW alike. The logical 314 indices need only be mapped onto descriptor ring addresses when writing or reading a ring entry at a 315 given Index. All rings use the same mechanism to communicate between producer and consumer. 316

From the perspective of software, SDXI has two kinds of descriptor rings: 317

• A software producer ring. Software writes descriptors on to the ring, increments WriteIndex, and 318 writes to the doorbell to signal the SDXI function that a new descriptor has been written. The SDXI 319 function reads from the ring, increments ReadIndex, and performs the requested operation. With a 320 few exceptions noted in the next list item, all rings are software producer rings. 321

• A software consumer ring. The SDXI function writes log messages (using the format of descriptors) 322 on to the ring, increments WriteIndex, and can be configured to generate an interrupt to signal that a 323 new message has been written. Software reads from the log ring, increments ReadIndex, and 324 processes the requested message. The Error Log and the optional Administrative Log are two 325 examples of this kind of ring. 326

Virtualization Support 327

SDXI is architected specifically to allow operation within a virtualized environment. 328

Live migration is supported through a variety of accommodations in the programming model. During live 329 migration, a real hardware SDXI implementation may be stopped by the Hypervisor. Once stopped, there 330 is no hidden state. All critical state is located in either MMIO registers or in system memory. Another 331 instance may be configured with this state and restarted. It is possible to arbitrarily transition between 332 hardware and software emulated implementations. 333

DRBp + (N-1) * 64 DRBp + (1) * 64

DRBp

EntryAddresses

(Wraps every N*64 bytes)

Index = 0, N, 2*N, ...

Index = N-1, 2*N-1, 3*N-1, ...

Index = 1, N+1, 2*N+1, ...

Indices(Do not Wrap)

ReadIndex:

WriteIndex-1:

WriteIndex:

ValidEntry

FreeEntry

FreeEntry

FreeEntry

FreeEntry

ValidEntry

SDXI DescriptorQueue Ring

N = DescQueueSize (Number of entries in Queue)DRBp = DescriptorRingBasePtr

Indices are from 0 to (2^64)-1

EntryAddress = DescRingBasePtr + ((Index % DescQueueSize) << 6)WriteIndex - ReadIndex <= DescQueueSize

Where Consumer can start reading enqueued entries

Index of last enqueued entry to be read by Consumer

Where producer can start enqueueing

more entries

12 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

The programming model allows a Hypervisor to hide features from newer implementations and 334 specifications from a VM. This facilitates migration of a VM built around older versions of the specification 335 onto newer hardware. 336

Specific use cases around virtualized environments may be accelerated by SDXI. See Address Spaces 337 and Multi-Address-Space Operation. 338

Hardware implementations may be virtualized using standard PCIe SR-IOV techniques. There are slight 339 differences in the SDXI programming model between PF and VF implementations described in this 340 specification. In general, it is assumed that a Hypervisor would take control over PFs and allow VFs to be 341 directly mapped into VMs. 342

2.2 Addressing Modes 343

SDXI is designed to support multiple address spaces to facilitate a number of data movement use cases. 344 In combination with a supporting IOMMU, SDXI may make memory accesses using guest virtual, guest 345 physical or system physical addresses. 346

PASID Usage 347

Guest virtual space is accessed by tagging DMA requests with a Process Address Space Identifier 348 (PASID) that indexes into an appropriate IOMMU guest page table external to the SDXI function. This 349 allows DMA directly into user processes. The SDXI programming model allows control structures, 350 descriptors and data buffers to each be optionally located in a guest virtual space facilitating the 351 implementation of user-mode rings and access to user process data. Traditional descriptor rings and data 352 buffers located in kernel space are also supported. 353

Address Spaces and Multi-Address-Space Operation 354

Each SDXI physical or virtual function, like generic PCIe functions, can be considered to support DMA 355 access to one or more address spaces defined by their PCIe Requester ID, and optionally, a set of PASID 356 values. In general, the address spaces will be mapped as appropriate by privileged software. 357

A basic memory copy operation may reference two distinct memory data buffers. Privileged software may 358 want to use the copy operation to copy a data buffer from one address space to another. Traditionally this 359 would require some form of software address translation, or some scheme of mapping multiple address 360 spaces into a shared memory address space. Under SDXI this is greatly simplified. 361

Typically, a PCIe function can only access address spaces mapped using its own Requester ID. A unique 362 capability of SDXI functions is the optional ability to access memory buffers using address spaces 363 accessible by other SDXI functions. SDXI determines the address space of each data buffer using an 364 AKey value which is used to look up an AKey table entry. This entry provides SFuncHandle which maps to 365 the Requester ID of an SDXI function within the SDXI Function Group. The AKey table entry also provides 366 an optional PASID value. 367

Consider the example where a Hypervisor and a number of virtual machines (VMs) each have an SDXI 368 function (physical or virtual) assigned to them - FnCntl, FnSrc, FnDst - and those functions are located 369 within the same SDXI function group. The descriptor ring and all other control structures would be 370 associated with the Hypervisor’s FnCntl function and would be fetched using DMA FnCntl’s Requester ID. 371 The descriptor may then indicate the appropriate address spaces for both the source and destination data 372 buffers as belonging to FnSrc and FnDst. When reading the source buffer, DMA from SDXI will appear as 373 a read from FnSrc, and when writing the destination buffer, DMA from SDXI will appear as writes from 374 FnDst. In all cases, addresses may be either Guest Virtual or Guest Physical. All addresses would be 375 translated by the IOMMU using existing page tables. 376

SNIA SDXI Specification Working Draft 13 Version 0.9.0 rev 1

The figure below shows an example of such an operation. 377

Figure 2-4 Example Operation 378

F(TFFuncHandle)

Akey Table

DMA Read{RID0, PASID_A}DMA Read Completion DMA Write{RID2, PASID_C}

DMA Read{RID1, PASID_B}DMA Read Completion

Descriptor

Desc Type

Attr

DstAkey SrcAkey

SrcAddress

DstAddress

CompletionSignal

Address Space A Address Space CAddress Space B

Implementation defined Function. (Maps SFuncHandle to appropriate Requester IDs)

Akey Table (Allowed address Spaces)

AKEY 0 {SFuncHandleX, PASID_A}

IOMMUSF0: RID0 + PASID_A

SDXI Device

SF 0(“ FnSrc”)

SF 2(“ FnDst ”)

SF 1(“FnCtl”)

IOMMUSF1: RID1 + PASID_B

IOMMUSF2: RID2 + PASID_C

Producer

AKEY 1 {SFuncHandleZ, PASID_C}

F(SFuncHandle)SFuncHandle

RequesterID

379

380

Note that a multi-address space data movement solution can be achieved using two SDXI functions as 381 well as a single SDXI function. If FnDst or FnSrc are not equal to FnCtl, they may protect access to their 382 address spaces using the RKey feature. See 3.5 SDXI Multi-Function Access and 3.5.2 RKey Processing 383 for more details. 384

SDXI Function Groups 385

One of the conditions for allowing an SDXI function to access an address space owned by another SDXI 386 function is that both functions must belong to the same SDXI Function Group. An SDXI PF and its 387 associated VFs always belong to the same function group. A function group may contain multiple PFs and 388 VFs. A system may contain multiple disconnected function groups. 389

Function groups are discovered using the GroupID field in the MMIO registers. The method of 390 communication between SDXI functions within a function group is outside the scope of this specification. 391

392

14 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Figure 2-5 SDXI Function Groups 393

IOMMU 1

SR-IOV SDXI Device 0

IOMMU 0

System Memory

PF1VF1_0VF1_n

VF1_

0Cnt

xt0

DataDataData

Multiple SDXI devices within an SDXI Function Group

VF1_

0Cnt

xtm

-1

PF1C

ntxt

n-1

PF1C

ntxt

0

VF1_

nCnt

xt0

VF1_

nCnt

xtm

-1

PF0VF0_0VF0_n

VF0_

0Cnt

xt0

DataDataData

VF0_

0Cnt

xtm

-1

PF0C

ntxt

n-1

PF0C

ntxt

0

VF0_

nCnt

xt0

VF0_

nCnt

xtm

-1

SR-IOV SDXI Device 1

394

Unpinned Memory Access 395

SDXI supports the use of unpinned memory through the use of the PCIe-defined ATS and PRI features. 396 The SDXI programming model does not explicitly identify whether data structures reside in unpinned 397 memory. If the PCIe ATS and PRI features are enabled within a function, an SDXI implementation must 398 assume that any memory accesses associated with that function’s address spaces may target unpinned 399 memory. 400

2.3 Modularity and Expandability 401

SDXI is architected as a modular and expandable framework for offload functions. While the initial 402 specification is focused on copy operations, additional offloads may be defined and added to newer 403 versions of the specification, as well as newer implementations. All new features shall be explicitly 404 discovered and enabled by supporting software. Implementations may choose to implement different sets 405 of offload capabilities. 406

In general, all SDXI OS/VM and user-level software is expected to operate transparently on any hardware 407 implementation that supports the same or newer version of the specification that the software was 408 designed for, as long as all of the expected offload capabilities are present. 409

SNIA SDXI Specification Working Draft 15 Version 0.9.0 rev 1

2.4 Endian Format Support 410

The SDXI specification is written assuming a little-endian architecture. Support for other endian formats is 411 outside the scope of this specification. 412

16 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

3 System Memory Data Structures 413

3.1 Overview 414

The following figure depicts all of the memory-addressed data structures used by an SDXI function. In 415 order to facilitate efficient software-based virtualization, the majority of an SDXI function’s architectural 416 state resides in system memory and is described in this chapter. The remaining state resides in a small 417 number of MMIO control registers (described in Chapter 9) and PCI Function configuration space registers 418 (described in Chapter 8). This layout of memory structures is designed to facilitate selective trapping by 419 privileged software. 420

Figure 3-1 Memory-Addressed Data Structures 421

422

3.2 Context Data Structures 423

An SDXI context represents the memory structures needed to directly monitor or control the operation of a 424 descriptor ring. An SDXI context consists of a descriptor ring, and the associated Context Control Entry 425 (CCE), Context Status Entry (CSE), AKey Table, and WriteIndex structures. Memory buffers and 426

SNIA SDXI Specification Working Draft 17 Version 0.9.0 rev 1

completion signals which descriptor ring entries operate upon are not considered part of the SDXI function 427 context; however, they are an important part of the additional corresponding state that software manages 428 directly in order to use the descriptor ring. 429

SDXI uses a 2-level hierarchy of context tables (Context Table Level 2, Context Table Level 1) to 430 enumerate the components of the context. The concatenation of the 9-bit Level 2 offset with the 7-bit Level 431 1 offset yields the 16-bit “context_number” that is associated with the context. (Note that the Context 432 Tables point to a context but are not themselves part of the context.) MmioCap1.NContexts indicates the 433 maximum supported context number. 434

An SDXI function will not access portions of the contexts table associated with context numbers greater 435 than MmioCap1.NContexts. See the example below. 436

Figure 3-2 MmioCap1.NContexts Example 437

Context Level 1 Table

Lvl 1 Ptr for contexts 0 to 127Lvl 1 Ptr for contexts 128 to 255

. . .Other Entries Are Invalid

Context 128: CCE and AKey PtrsContext 129: CCE and AKey Ptrs

. . .Other Entries Are Invalid

Context 255: CCE and AKey Ptrs

Context 0: CCE and AKey PtrsContext 1: CCE and AKey Ptrs

. . .Other Entries Are Valid

Context 127: CCE and AKey Ptrs

Context Level 1 Table

Context Level 2 Table

Example: MmioCap1.NContexts = 128

438

439

18 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Context Level 2 Table 440

The Context Base Table register points to the 4KB level 2 table containing an array of level 2 table entries 441 described below. Each of those entries is a pointer to a level 1 table. A level 2 table contains 512 level 1 442 pointers. 443

444

Figure 3-3 Context Level 2 Table Entry 445

446

447

Table 3-1 Context Level 2 Table Entry 448

Bits Field Description

0 V Valid. When 1, indicates the other bits in this data structure are valid. When 0, all other bits in this data structure shall be ignored.

11:1 Reserved

63:12 ContextLvl1TableBasePtr Pointer to the start of a Context Level 1 table. This points to an aligned 4K region of memory.

449

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD0 V

DWORD1

ContextLvl1TableBasePtr[31:12] Rsvd

ContextLvl1TableBasePtr[63:32]

SNIA SDXI Specification Working Draft 19 Version 0.9.0 rev 1

Context Level 1 Table 450

Each of the level 1 table structures is a 4K naturally aligned piece of memory containing an array of 451 context level 1 table entries described below. A Context level 1 table has 128 entries. 452

Figure 3-4 Context Level 1 Table Entry 453

454

Table 3-2 Context Level 1 Table Entry 455

Bits Field Description

0 V Valid. When 1, indicates the other bits in this data structure are valid. When 0, all other bits in this data structure shall be ignored.

1 A Active. 1=Context is active and may execute descriptors. 0=Context is inactive and is blocked from executing descriptors.

2 PV PASID Valid. Indicates the ContextPASID field is valid.

5:3 Reserved

63:6 ContextControlPtr Pointer to a Context Control Entry. This points to a 64B aligned region of memory.

67:64 AKTSize AKeyTableSize. Indicates the size of the AKey table referenced by AKeyTableBasePtr. The table size is encoded as 2(AKTSize+12) bytes or 2(AKTSize+7)

entries. Encodings greater than 0x8 are reserved.

75:68 Reserved

127:76 AKeyTableBasePtr Pointer to the start of an AKey table. This points to a 4Kbyte aligned region of memory.

147:128 ContextPASID If PV=1, indicates the PASID value used in combination with the SDXI function’s Requester ID to access the WriteIndex, Context Status Entry, Completion Signal and Descriptor Ring. If PV=0, this field is not used and the WriteIndex, Context Status Entry, Completion Signal and Descriptor Ring are accessed using the SDXI function’s Requester ID but without a PASID.

151:148 MaxBuffer Indicates the maximum data buffer size supported by this context. The size is encoded as 2(MaxBuffer+21). Descriptor type or the size of the buffer length field within certain types of descriptors may further limit the maximum data buffer size for certain types of operations. This field should not be set to exceed MmioControl2.MaxBufferCntl

159:152 Reserved

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD0 PV A V

DWORD1

DWORD2

DWORD3

DWORD4

DWORD5

DWORD6

DWORD7

ContextControlPtr[63:32]

ContextControlPtr[31:6] Rsvd

Rsvd

Rsvd

AKeyTableBasePtr[31:12]

AKeyTableBasePtr[63:32]

CmdGrp0Support

ContextPASIDMaxBufferRsvd

AKTSizeRsvd

20 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Bits Field Description

191:160 CmdGrp0Support Each bit in this field is set to 1 to indicate whether a certain group of optional descriptors is supported within this context. The encoding of the bits matches the MmioCapabilities1.CmdGrp0Support register.

255:192 Reserved

456

Context Control Entry 457

The Context Control Entry contains control information for a single descriptor ring. 458

Software shall expose the memory containing the context control entry as readable to the SDXI function 459 hosting the context. Software shall expose the memory containing the descriptor ring and the write index 460 as read-write to the SDXI function hosting the context. 461

This structure is intended to be controlled by privileged software. 462

Figure 3-5 Context Control Entry 463

464

465

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD0 R SH R V

DWORD1

DWORD2

DWORD3

DWORD4

DWORD5

DWORD6

DWORD7

DWORD8

DWORD9

DWORD10

DWORD11

DWORD12

DWORD13

DWORD14

DWORD15

DescRingBasePtr[31:6] QoS

DescRingBasePtr[63:32]

DescRingSize

WriteIndexPtr[31:3] Rsvd

ContextStatusPtr[63:32]

ContextStatusPtr[31:4] Rsvd

Rsvd

WriteIndexPtr[63:32]

Rsvd

Rsvd

Rsvd

Rsvd

Rsvd

Rsvd

Rsvd

Rsvd

SNIA SDXI Specification Working Draft 21 Version 0.9.0 rev 1

466

Table 3-3 Context Control Entry 467

Bits Field Description

0 V Valid. When 1, indicates the other bits in this data structure are valid. When 0, all other bits in this data structure shall be ignored.

1 Reserved

3:2 QoS Quality of service indicator.

Value Definition 00b Urgent 01b High 10b Medium 11b Low

The usage of this field is implementation specific.

4 SH Sequential Consistency Hint. 1=Descriptor ring is expected to set the S bit in most descriptors.

5 Reserved

63:6 DescRingBasePtr Pointer to the start of the descriptor ring. This points to a 64-byte aligned region of memory

95:64 DescRingSize Descriptor ring size. Indicates the size of the descriptor ring in 64-byte units.

131:96 Reserved

191:132 ContextStatusPtr Pointer to the Context Status data structure which includes the read index value. This points to a 16B aligned region of memory.

194:192 Reserved

255:195 WriteIndexPtr Pointer to descriptor ring write index. This points to an 8B aligned region of memory.

511:256 Reserved

468

469

22 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

AKey Table Entry 470

The AKey table is a 4-Kbyte aligned contiguous memory structure composed of AKey table entries. The 471 table may be any power-of-2 size from 4-Kbytes up to 1-Mbyte. Software controls the size of the table 472 through the context level 1 entry structure. Software chooses the size of the table based on its contiguous 473 memory allocation capabilities. 474

The descriptor may reference any AKey associated with the same ring context. The AKey table entries 475 encode all of the valid address spaces, PASIDs and interrupts available to the context. 476

Figure 3-6 AKey Table Entry 477

478

Table 3-4 AKey Table Entry 479

Bits Field Description

0 V Valid. When 1, indicates the other bits in this data structure are valid. When 0, all other bits in this data structure shall be ignored.

1 IV Interrupt Valid. When 1, the InterruptNumber field is valid.

2 PV PASIDValid. When 1, the PASID field contains valid information.

3 SE Steering Enable. When 1, memory requests referencing this AKey table entry are enabled for PCIe Transaction Processing Hints* and set PCIe TH=1 if TPH is also enabled through the TPH Requester Extended Capability. 0=TPH is disabled for memory requests referencing this AKey table entry.

14:4 InterruptNumber Interrupts generated using this AKey table entry are issued using the specified SFuncHandle and the MSI-X entry corresponding to InterruptNumber.

15 Reserved

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD0 R SE PV IV V

DWORD1

DWORD2

DWORD3

TargetSFuncHandle InterruptNumber

PASID

SteeringTagPHRsvd

Rsvd

RKeyRsvd

SNIA SDXI Specification Working Draft 23 Version 0.9.0 rev 1

Bits Field Description

31:16 TargetSFuncHandle TargetSFuncHandle specifies the target function within the SDXI function group which owns data buffers associated with the AKey table entry. TargetSFuncHandle is set to the target function’s MmioCapabilities0.SFuncHandle value. TargetSFuncHandle is an opaque identifier that maps to the target function’s PCIe Requester ID which is used when accessing the data buffer. The TargetSFuncHandle encoding of 0 is used to indicate the same function that is executing the descriptor and which owns the AKey table. A function may not use its own MmioCapabilities1.SFuncHandle value in its AKey table entries to reference its own Requester ID. It must use the 0 encoding for this purpose. See 2.2.3 SDXI Function Groups for more details.

51:32 PASID PASID value used for requests using this AKey table entry.

63:52 Reserved

79:64 SteeringTag When SE=1 and ST Mode Select in the TPH Requester Extended Capability = Device Specific, this field provides up to 16 bits of steering tag information through the PCIe ST field for memory requests that reference this AKey table entry. Steering tag information is generated through standard PCIe controls if ST Mode Select is not set to the device specific mode.

81:80 PH Processing Hint. When SE=1, this field supplies the 2-bit PH field for memory requests that reference this AKey table entry.

95:82 Reserved

111:96 RKey Specifies the RKey value used to access another function’s data buffer. This field is only valid if TargetSFuncHandle is non-zero. See 5.7 RKey Processing for more details

*Refer to the PCI-Express Base Specification for further details on Transaction Processing Hints. 480

481

24 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Context Status Entry 482

The Context Status Entry contains status information for a single descriptor Context. If made accessible to 483 a user process, it is exposed as a read-only structure to software. 484

When creating the context, software shall initialize the status entry to all-zero prior to making the context 485 valid. 486

Software shall expose the memory containing the Context Status Entry as read-write to the SDXI function 487 using the context address space. 488

Figure 3-7 Context Status Entry 489

490

Table 3-5 Context Status Entry 491

Bits Field Description

3:0 ContextState Context State. 0000b = Stopped 0001b = Running 0010b = Stopping 1111b = Error All other encodings reserved

7:4 Reserved

63:8 Reserved

127:64 ReadIndex Descriptor ring read index.

3.3 Administrative Context (Context 0) 492

Administrative operations are used by privileged software to manage the SDXI function and are only 493 supported in Context 0 (“Administrative” Context) within each function. 494

An Administrative Context only supports “AdminGrp, ConnectGrp and other specified administrative 495 operations; no other operation groups including the DmaBaseGrp operations are supported. A VF 496 Administrative Context may be used to manage contexts associated with that VF. A PF Administrative 497 Context may be used to manage contexts hosted by the PF or contexts hosted by any VF associated with 498 the PF. 499

500

3.4 Error Log 501

Each SDXI function supports an Error Log message ring configured through MMIO space. 502

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD0

DWORD1

DWORD2

DWORD3 ReadIndex[63:32]

Rsvd

ReadIndex[31:0]

ContextStateRsvd

SNIA SDXI Specification Working Draft 25 Version 0.9.0 rev 1

Even though descriptors may reference data buffers in different address spaces and may detect errors 503 when accessing those buffers, all errors are logged with the function hosting the context. 504

The error log is a software consumer ring and is structured similar to a descriptor ring. 505

An SDXI function writes an error log header entry to the error log whenever an error is detected. For errors 506 associated with a descriptor, a copy of the descriptor is written into the error log after the header entry. 507 Descriptors written into the error log will have their Valid bit set to 1. 508

During descriptor processor errors may be detected and logged at different processing stages for a single 509 descriptor resulting in multiple error logs. Only one error may be logged for errors occurring during the 510 execution of the descriptor operation. Refer to 5.2 Descriptor Processing for more details. 511

The error log is empty when ErrLogWptr is equal to ErrLogRptr. Error log entries are written in increments 512 of 64 bytes. When an error log entry is written, ErrLogWptr will be advanced by 1 for each 64 byte unit 513 written, and ErrLogStatus will be set. 514

An interrupt will be generated when the error log is written if ErrLogInterruptEn=1. If MSI or MSI-X are 515 enabled, vector 0 will be used for the error log interrupt. 516

Software may process error log entries and then advance ErrLogRptr by 1 for each 64byte unit processed. 517

The function may simultaneously process multiple descriptors from the same ring. The order in which error 518 log entries are written is implementation-dependent even for descriptors within the same ring. 519

A context with context-specific errors, as indicated by CV=1 in the error log header will stop processing 520 new descriptors. If the context status entry is accessible, it will start transitioning to the Error state. 521

The error log will overflow if there is insufficient space to write a complete log entry when required. The 522 ErrorLogStatus.ErrLogOverflow bit is set on an overflow event. Error logging is stopped when the overflow 523 bit is set. Contexts may continue to run even when the error log is full or in the overflow state, however 524 detailed error information for new errors will be lost. 525

The function may be halted if an uncorrectable error prevents further safe operation across all of its 526 associated contexts. This is reflected in Status0.GlobalState. The function does not transition all of its 527 contexts into the error state when this occurs. 528

Errors may be reported on DMA write operations. While DMA writes are posted in PCI-Express and do not 529 return any status, write response status may be available in implementations that do not connect using a 530 physical PCIe link. 531

Error Log Header Entry 532

Error log header entries provide error status information about a specific error event. The ErrLogType and 533 ErrLogSubType fields provide information about the error condition. An implementation may use any of the 534 ErrLogSubType encodings in combination with any ErrLogType encodings as appropriate. 535

The CV, AV and IV flags indicate whether detailed information about the context number, memory address 536 and descriptor index are provided. 537

The RT flag indicates that the affected descriptor may be retried by software. See section 0 for more 538 details. 539

Error log header entries are formatted similar to SDXI descriptors but use a unique Type encoding to avoid 540 aliasing with real operations. See 6.1.1 Common Header and Footer for more details. 541

If a descriptor encountering an error is logged with the header, the header entry and the descriptor are 542 linked together by setting the Chain bit in the error log header. See 5.6 Descriptor Chaining for more 543 details on the use of the Chain bit. 544

26 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

An Extended Descriptor may be written into the error log after an error log header entry. If an Extended 545 Descriptor is written into the error log, the header entry ErrIndex field shall point to the starting index of the 546 Extended Descriptor in the context descriptor ring. 547

Figure 3-8 Error Log Header Entry 548

549

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD0 C V

DWORD1 RT IV AV CV

DWORD2

DWORD3

DWORD4

DWORD5

DWORD6

DWORD7

DWORD8

DWORD9

DWORD10

DWORD11

DWORD12

DWORD13

DWORD14

DWORD15

ErrAddress [31:0]

Rsvd

ErrAddress [63:32]

ErrIndex[31:0]

ErrIndex[63:32]

SubType=ErrLogType Type=0x7F7

ErrLogSubType ErrContextNumber

Rsvd

Rsvd

SNIA SDXI Specification Working Draft 27 Version 0.9.0 rev 1

Table 3-6 Error Log Header Entry 550

Bits Field Description

0 V Valid. When 1, indicates the other bits in this data structure are valid. When 0, all other bits in this data structure shall be ignored.

11:1 Type 0x7F7 = Error Log Header Entry

19:12 SubType The SubType field is used to encode the error log type. See below for encodings.

30:20 Reserved

31 C Chain. Indicates whether the error log header is followed by 1 or more descriptors in the error log.

47:32 ErrContextNumber Context number associated with the error log entry.

51:48 Reserved

52 CV Context Number Valid. Indicates whether the context_number field contains valid information.

53 AV Address Valid. 1=The ErrAddress field contains valid information.

54 IV Index Valid. 1=The ErrIndex field contains valid information.

55 RT Retry. 1=The referenced descriptor may be retried. Only valid in conjunction with descriptor errors (IV=1 and C=1).

63:56 ErrLogSubType Error log sub-type. See below for encodings.

127:64 ErrAddress Error Address. Valid if AV=1. Address associated with the error.

191:128 ErrIndex Error Index. Valid if IV=1. Descriptor index associated with the error.

511:192 Reserved

551

552

28 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Table 3-7 Error Log Type Encoding 553

ErrLogType Encoding and Type Allowed Values

Additional Information IV AV CV C

0 Internal Error X X X X DWords 1 to 31 including reserved fields may be used to communicate implementation specific data.

1 Generic Programming Error X X X X Simplified implementations may optionally utilize this generic error type rather than the more specific types.

2 Generic Memory Error X X X X Simplified implementations may optionally utilize this generic error type rather than the more specific types.

3 Context Level 2 Table Entry Error X 1 X X

ErrAddress holds the address of the Context Level 2 Table Entry containing the error. The address format is dependent on the FunctionPASIDValid register.

4 Context Level 1 Table Entry Error X 1 1 X

ErrAddress holds the address of the Context Level 1 Table Entry containing the error. The address format is dependent on the FunctionPASIDValid register

5 Context Control Entry Error X 1 1 X

ErrAddress holds the address of the Context Control Entry containing the error. The address format is dependent on the FunctionPASIDValid register

6 Context Status Entry Error X 1 1 X

ErrAddress holds the address of the Context Status Entry containing the error. The address format is dependent on the FunctionPASIDValid register

7 Generic Descriptor Error 1 0 1 1

8 Descriptor Completion Status Error 1 0 1 1

9 Descriptor AKey Error 1 0 1 1

10 Descriptor Data Buffer Error 1 X 1 1 If AV=1, ErrAddress contains the data buffer address associated with the error.

11 Descriptor Abort on Context Stop 1 0 1 1

12 RKey Error 1 0 1 1

ErrIndex points to the AKey associated with the error. The referenced AKey table entry contains the RKey used in the attempted data buffer access.

Notes: In this table, X refers to a value that can be either 0 or 1.

554

SNIA SDXI Specification Working Draft 29 Version 0.9.0 rev 1

Table 3-8 Error Log Sub-Type 555

ErrLog SubType Summary Description

0 Reserved

1 Generic uncorrectable error The SDXI function must be reset prior to restarting operation.

2 Non-zero reserved field A reserved field contained a non-zero value. Support for this sub-type is optional. Implementations are not required to detect all possible reserved field violations.

3 Unsupported field encoding Unsupported encoding in a supported field.

4 ATS address translation error An ATS request was returned with error status. This encoding is not used for permission faults unless the Page Request Interface (PRI) feature is disabled or halted.

5 ATS address translation returned poisoned data

An ATS request was returned with poisoned data.

6 PASID out of range A PASID value outside the implementation-supported range was detected.

7 PRI error received PRI response received with an error response code.

8 PRI timeout Timeout waiting for PRI response. The timeout time is implementation specific.

9 Reference to invalid AKey table entry

Referenced AKey table entry is either invalid, contains an error, or is outside the range of the AKey table.

10 Reserved

11 Memory access returned with unsupported request status

12 Memory access returned with completer abort status

13 Memory access returned with poisoned data

14 Timeout accessing memory DMA request timed out. The timeout value is implementation specific.

15 Index Error Invalid combination of WriteIndex and ReadIndex values. For example, WriteIndex < ReadIndex or WriteIndex – ReadIndex > DescRingSize. This subtype is also used to indicate an Index rollover at 264.

556

30 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Descriptor Retry 557

Descriptor errors with RT=1, IV=1, and C=1 set in the error log header may be retried by software. An 558 SDXI function may only mark an error log entry as RT=1 if memory consistency has not been violated. 559

Retrying a descriptor generally involves software correcting the underlying cause of the error, copying the 560 descriptor into an appropriate context, possibly with modifications, and re-executing it. Software shall 561 determine whether any underlying error conditions are correctable and whether to attempt a retry. Further 562 details are outside the scope of this specification. 563

While multiple descriptors from the same context may be written into the error log out-of-order, descriptors 564 must be retried in a valid order that reproduces the memory consistency of the original descriptor ring 565 ordering. The original descriptor ordering may be determined using the error log header 566 ErrContextNumber and ErrIndex fields. 567

Software may determine that all error logs for a context have been written by checking whether the context 568 status value has been changed to Error. 569

Retry may not be possible on an error log overflow event. 570

Implementations are encouraged to generate error logs with RT=1 where possible. 571

3.5 SDXI Multi-Function Access 572

There can be multiple interacting SDXI functions in a system if supported by the SDXI implementation. 573 Typically, a PCIe function can only access address spaces mapped using its own Requester ID. A unique 574 capability of an SDXI function is the optional ability to access address spaces mapped by the Requestor 575 ID of another SDXI function. 576

When executing a descriptor operation, the SDXI function determines the address space for each 577 specified data buffer or interrupt target by using the associated AKey value to reference an AKey table 578 entry. This entry provides SFuncHandle which maps to the Requester ID of an SDXI function within the 579 SDXI Function Group. When SFuncHandle is non-zero, the request is to another target or receiving 580 function. 581

The receiving function can optionally control which remote functions are permitted to attempt access using 582 the Receiver-side (function) Key Table (RKEY) mechanism. 583

SDXI Function Group 584

When an SDXI Function returns MmioCapabilities1.SDXI_MF as ‘1’, the function supports SDXI multi-585 function access and provides the Receiver-side (function) Key Table (RKEY) mechanism; otherwise, these 586 mechanisms are not supported. Note that SDXI multi-function access when supported is only possible for 587 SDXI functions that belong to the same SDXI Function Group. An SDXI PF and its associated VFs always 588 belong to the same function group. A function group may contain multiple PFs and VFs. A system may 589 contain multiple disconnected function groups. Function groups are discovered using the 590 MmioControl0.GroupID field in each SDXI function. 591

In a multi-function implementation of SDXI, all functions connected within an SDXI function group reflect 592 the same GroupID value written into any group member’s copy of this register. Software may identify all 593 functions within a function group by writing a non-zero value into the GroupID field of one function and 594 scanning for the same value in other SDXI functions. The GroupID register is only used to identify 595 functions within the same SDXI function group. It is not referenced by other SDXI registers or data 596 structures. 597

The method of coordination and configuration between SDXI functions within a function group is outside 598 the scope of this specification. 599

RKey Processing 600

SNIA SDXI Specification Working Draft 31 Version 0.9.0 rev 1

The Receiver-side (function) Key Table (RKEY) mechanism is used by a target (receiving) function to 601 control access to its data buffers from remote requesting functions within the same SDXI function group. 602

When a data buffer’s AKey table entry specifies an SFuncHandle value of zero (i.e. the access is within a 603 function), RKey processing for that access is skipped. When SFuncHandle is non-zero (i.e. the access is 604 across functions), RKey processing is attempted using the following checks below. 605

1. Requesting function passes its own SFuncHandle value along with the AKey table entry’s RKey 606 value and, if present, PASID, to the target function indicated by TargetSFuncHandle. 607

2. If the target function has MmioCapabilities1.SDXI_MF=0, abort the remote access. 608

3. Target function uses the supplied RKey value as an index into its RKey table to obtain the 609 associated RKey table entry. If the RKey table is not enabled 610 (MmioRKeyTblBase.RKeyTableEn=0), or the RKey table entry is not valid, abort the remote access. 611

4. Target function checks if the RKey table entry RequesterSfuncHandle match the requester 612 SFuncHandle. If not, abort the remote access. 613

5. Target function checks if the RKey table entry has PV=0. If so, it then checks that the requesting 614 function did not supply a PASID value, otherwise the remote access is aborted. 615

6. Target function checks if the RKey table entry has PV=1. If so, it then checks that the requesting 616 function supplied a PASID value and that the value matches the PASID value in the RKey table 617 entry. If not, the data buffer access is aborted. 618

RKey Table 619

The RKey table is a 4-Kbyte aligned contiguous memory structure composed of 8-byte RKey table entries. 620 The table may be any power-of-2 size from 4-Kbytes up to 512-Kbytes. Software controls the size of the 621 table through MmioRKeyTblBase.RKeyTableSize. 622

Software controlling the function may use IOMMU page tables in combination with PASID and RKey 623 values to selectively expose data buffers to different requesting functions. 624

Errors associated with failed RKey table entry checks are logged at the requesting function. No errors are 625 logged at the target function. 626

Error log entries associated with failed RKey table entry checks are marked as RT=0 and cannot be 627 retried. 628

RKey checks are performed for each descriptor referenced data buffer which is owned by another SDXI 629 function. This is implied when the data buffer AKey table entry contains a non-zero TargetSFuncHandle. 630

A connection manager specification is proposed for development by the SNIA SDXI TWG post publication 631 of Smart Data Accelerator Interface (“SDXI”) Specification v1.0. Please refer to it for mechanisms to 632 allocate and exchange RKeys between functions. 633

634

RKey Table Entry 635

Each RKey table entry is an aligned 8-byte structure with the following format. 636

Figure 3-9 RKey Table Entry 637

638

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD0 PV IE V

DWORD1

RsvdRequesterSFuncHandle

Rsvd PASID

32 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Table 3-9 RKey Table Entry 639

Bits Field Description

0 V Valid. When 1, indicates the other bits in this data structure are valid. When 0, all other bits in this data structure shall be ignored and remote requesting functions may not use the RKey value associated with this entry to access data buffers owned by this function.

1 IE Interrupt Enable. When 1, Requesting functions referencing this RKey table entry are permitted to generate interrupt requests via this function. When 0, interrupt requests generated by referencing this RKey table entry are ignored.

2 PV PASIDValid. When 1, the PASID field contains valid information.

15:3 Reserved

31:16 RequesterSFuncHandle RequesterSFuncHandle specifies the SFuncHandle value of the remote requesting function expected to reference this RKey table entry. See <> for more details.

51:32 PASID PASID value used to access data buffers using this RKey entry

63:52 Reserved

*Refer to the PCI-Express Base Specification for further details on Transaction Processing Hints. 640

641

SNIA SDXI Specification Working Draft 33 Version 0.9.0 rev 1

4 SDXI Function and Context State 642

4.1 SDXI Function State 643

An SDXI function operates in one of 6 basic states reflected by the MmioStatus0.GlobalState register and 644 shown in the figure below. Software may use the MmioControl0.Enable register to help transition the 645 function between the various states. 646

Figure 4-1 SDXI Function States and State Transitions 647

648

649

GlobalState = 001b(Initializing)

GlobalState = 000b(Stopped)

GlobalState = 101b(Stopped On Error)

GlobalState = 010b(Active)

GlobalState = 011b(Soft Stop Step)

Reset

GlobalState = 100b(Hard Stop Step)

GlobalState is MmioStatus0.GlobalStateEnable is MmioControl0.Enable

Enable 11b

Function InitComplete

Enable Legend00b: Request Stop 01b: Request Soft Stop10b: Request Hard Stop11b: Request Active

One or more contexts failed to stop

34 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Stopped State 650

The function is idle and not enabled to process any memory data structures or perform DMA operations. 651

From the Stopped state, software may set MmioControl0.Enable=11b. The function will attempt to 652 transition through the Initializing state into the Active state. Software shall program 653 ContextTableBase.ContextLvl2TableBasePtr prior to setting MmioControl0.Enable=11b. 654

Initializing State 655

The function self-initializes in the initializing state. If initialization is successful, the function transitions to 656 the Active state. If an error occurs during the initialization that prevents the processing of all contexts, the 657 function writes an appropriate error log and transitions to the Stopped-On-Error state. 658

It is an error (Generic Programming Error/Invalid Field Encoding) for software to program 659 MmioControl0.Enable to a value other than 11b when the function is in the Initializing state. 660

Active State 661

In the Active state, the function is enabled to process contexts and descriptors. 662

From the Active state, privileged software may request the function stop by writing Mmio0Control0Enable 663 to 10b to request a hard stop or to 01b to request a soft stop. 664

It is an error (Generic Programming Error/Invalid Field Encoding) for software to clear 665 MmioControl0.Enable to 00b when the function is in the Active state. 666

If an error occurs that prevents the processing of all contexts, the function writes an appropriate error log 667 and transitions to the Stopped-On-Error state 668

Soft Stop Request State 669

In the Soft Stop Request state, the function shall stop processing new descriptors and wait for all 670 outstanding descriptors to complete. The function shall wait until all outstanding descriptors complete 671 naturally or until an implementation-defined timeout period has expired at which point any remaining 672 descriptors are aborted. 673

When all contexts are stopped, the function transitions to the Stopped state. If an error occurs that 674 prevents one or more contexts from stopping, errors are logged, if possible, and the function instead 675 transitions to the Stopped-On-Error state. Descriptor errors including aborts alone do not drive a transition 676 to the Stopped-On-Error state. 677

Hard Stop Request State 678

In the Hard Stop Request state, the function shall stop processing new descriptors and wait for all 679 outstanding descriptors to complete. The function shall abort outstanding descriptors and log errors as 680 necessary for each aborted descriptor. 681

When all contexts are stopped, the function transitions to the Stopped state. If an error occurs that 682 prevents one or more contexts from stopping, errors are logged, if possible, and the function instead 683 transitions to the Stopped-On-Error state. Descriptor errors including aborts alone do not drive a transition 684 to the Stopped-On-Error state. 685

SNIA SDXI Specification Working Draft 35 Version 0.9.0 rev 1

Stopped-On-Error State 686

This state indicates that the function has stopped due to an error condition that prevents further context 687 processing. 688

If MmioControl0. FunctionErrIntEn=1, an interrupt to software is generated upon entering this state. 689

Software may return the function to the Stopped state by clearing MmioControl0.Enable to 00b. 690

4.2 SDXI Context State 691

An SDXI context operates in one of 7 basic states. The state is determined through a combination of the 692 value of ContextState in the Context Status Entry and the A (Active) field in the Context Level 1 Table 693 Entry. 694

Figure 4-2 shows the various states and the basic state transitions. Additional state transitions may occur, 695 particularly to the Stopping state, due to various error conditions. 696

Software shall not write to the Context Status Entry of a context in the Ready, Running or Stopping states. 697 A function may non-coherently cache the context status entry while in any of these states. 698

Context Stopped/NotRunning States 699

In these states the context does not process descriptors and does not respond to doorbells. The context 700 requires additional software initialization. 701

Contexts are normally transitioned into the Ready state by executing an AdminGrp.Start operation on the 702 function’s administrative context or, under SR-IOV, from the PF’s administrative context. An administrative 703 context cannot start itself through this mechanism. 704

An administrative context may be bootstrapped into the Running state by having software set, in memory, 705 both the Context Level 1 Table Entry A (Active) bit to 1 and Context Status Entry ContextStatus field to 706 Running. Software may then issue a doorbell to the context. 707

Software shall not use the AdminGrp.Start command from a PF administrative context to start a VF 708 administrative context. 709

Context Inactive State 710

In this state the context does not process descriptors and does not respond to doorbells. The context 711 requires additional software initialization. 712

Context Ready State 713

In this state, the context has been enabled to process descriptors but no descriptors are available. The 714 context may transition to the Running state based upon receiving an appropriate doorbell write that target 715 this context, or optionally by polling the context’s WriteIndex value in memory. 716

717

36 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Figure 4-2 SDXI Context States and Basic State Transitions 718

719

ContextStatus = StoppedS0: Context Stopped

* = Implementations may optionally transition into the Context Running state by reading WriteIndex and ReadIndex from memory. This is shown to indicate the possibility a context may

execute descriptor operations even without receiving a doorbell write. Software must issue doorbell writes in order to ensure correct operation.

Context State Diagram

S5: Context Operation Undefined

SW: Writing ContextStatus

4a. Context Ready

4b. Context Running

ContextStatus = Running, S4: Context Active,

CL1TE.A = 1

Func: All known work completed (LastDoorbellValue =ReadIndex)

SW: ( Doorbell write OR Admin(Start) ) AND DoorbellValue > ReadIndex AND DoorbellValue <= WriteIndex

Func: Implementation Specific*

SW: Writing ContextStatus

SW: ContextStatus !Stopped

SW: Writing ContextStatus

S2

Key “Stops”: Admin(Stop) or Function(Stop) “Start”: Admin(Start) “Inactive”: ContextStatus = Running, CL1TE.A=0, and No pending Descriptors ——— : SW Initiated State Transition ——— : Function Initiated State Transition ——— : State Transition to Operation Undefined

SW: “Stops”

ContextStatus = ErrorS1: Context Error

SW: ContextStatus Running OR Admin(Start) (and DBR if needed)

ContextStatus = StoppingS2: Context Stopping

SW: “Stops” or “Start”

SW: “Stops”

SW: “Stops” or “Start”

or “Start”

and SavedSW: “Error” Context Suspended SW: “Error” Context Modified,

Restored, or Created

Restored, or CreatedSW: “Stopped” Context Modified,

and SavedSW: “Stopped” Context Suspended

and Saved

SW: ContextStatus Stopped

S2Func: No errors detected, All Outstanding Descriptors Complete

Func: Errors detected, All Outstanding Descriptors Complete or Aborted

ContextStatus = Running

S4: Context ActiveCL1TE.A=1

SW: A 1

S3a: Context Inactive CL1TE.A=0

All Outstanding Descriptors Completed

SW: A 0

Func: No errors detected, All Outstanding Descriptors Complete

A = 0

A = 1

Restored, or CreatedSW: “Inactive” Context Modified,

and SavedSW: “Inactive” Context Suspended

and Saved

Error detectedFunc: Context or Descriptor

S3b: Context Going Inactive, CL1TE.A=0

May be Pending Descriptors

SW: A 1

Note: For any software-initiated context state transition, software shall use Admin Update and Admin Sync

operations as appropriate to ensure serialization and synchronization.

SNIA SDXI Specification Working Draft 37 Version 0.9.0 rev 1

Context Running State 720

In this state, the context is actively processing new descriptors. 721

AdminGrp.Start operations targeting the context may update the current doorbell value. 722

Context Stopping State 723

In this state, the context stops processing new descriptors and waits for outstanding descriptors to 724 complete. 725

If a hard stop request caused the transition into this state, or while in this state, a hard stop is requested, 726 outstanding descriptors are aborted, and errors are logged. Otherwise, the function may wait for an 727 implementation-specific amount of time for outstanding descriptors to complete normally. Once the timeout 728 time has expired, any remaining outstanding descriptors are aborted, and errors are logged. 729

A hard stop may be requested through either AdminGrp.Stop or MmioControl0.Enable. 730

Context Stopping With Errors State 731

In this state, the context stops processing new descriptors and waits an implementation-specific amount of 732 time for outstanding descriptors to complete normally. Once the timeout time has expired, any remaining 733 outstanding descriptors are aborted, and errors are logged. 734

While in this state, if a hard stop is requested through AdminGrp.Stop or through MmioControl0.Enable, 735 outstanding descriptors are aborted, and errors are logged. 736

AdminGrp.Start operations targeting the context are ignored while in this state. 737

Context Error State 738

The context is stopped in this state and waits for software to correct any underlying error conditions. 739 Software should overwrite ContextStatus to Stopped to return the context to the Not Running state. 740 Alternatively, software may return the context to the Stopped state by clearing the A (Active) bit in the 741 Context Level 1 Table Entry, performing appropriate AdminGrp.Update and AdminGrp.Sync operations, 742 and then overwriting ContextStatus to Stopped. 743

AdminGrp.Start and AdminGrp.Stop operations targeting the context are ignored while in this state. 744

38 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

5 SDXI Descriptor Ring Operation 745

Figure 5-1 An SDXI Function Context 746

An SDXI function descriptor is a naturally aligned 64-byte entry that instructs an SDXI function to perform 747 a given action. Descriptors are placed in a circular ring buffer that starts at a specified address 748 (DescRingBasePtr). The ring is contiguous at the translation level configured for the SDXI function. The 749 ring is configured to contain a given number of descriptor entries (DescRingSize). The number of bytes 750 allocated in memory for the ring is: (DescRingSize * 64). 751

The ring and all its related system memory data structures comprise a “context”. SDXI uses a 2-level 752 hierarchy of context tables (Context Table Level 2, Context Table Level 1) to enumerate the components 753 of the context. The concatenation of the offsets used to enumerate a context in both Context tables yields 754 the 16-bit “context_number” that is associated with the context. Note that the Context Tables point to a 755 context but are not themselves part of the context. An SDXI function can support multiple concurrent 756 contexts. Contexts are classified into two types: an “unprivileged” type used directly by user applications 757 for data movement; and “administrative” contexts that can be used by privileged software to control all 758 contexts supported by a SDXI function. 759

An SDXI function may optionally generate DMA requests with PASID when accessing non-context data 760 structures, as well as Context Control Entries and AKey Table Entries. This is controlled through the 761 Control0.FunctionPASID and Control0.FunctionPASIDValid MMIO registers. 762

A SDXI function may optionally generate DMA requests with PASID when accessing a context’s descriptor 763 ring, WriteIndex, Context Status Entry, or Completion Statusstructures. This is controlled through the 764 Context Level 1 Table Entry PV and ContextPASID fields. 765

A SDXI function may optionally generate DMA requests with PASID when accessing data buffers. This is 766 controlled through the associated AKey Table Entry PV and PASID fields. 767

A circular ring requires “start” and “end” indicators. Rather than using memory pointers to track these, a 768 SDXI descriptor ring uses 64-bit *unsigned* logical indices to indicate the start (ReadIndex) and end 769 (WriteIndex) of the descriptor ring. This simplifies various calculations for SW and HW alike. The logical 770

A Single SDXI Context

Index = 0, N, 2*N, ...

Descriptor Entry

: CCE.DescBasePtr

: + (1) * 64

: + (N-1) * 64

Descriptor Entry

Descriptor EntryIndex = 1, N+1, 2*N+1, ...

Index = N-1, 2*N-1, 3*N-1, ...Entry AddressWrap-around

ContextStateReadIndex

WriteIndex

CSE:ContextStatus Entry

Control BitsDescRingBasePtrDescQueueSize

. . .ContextStatusPtr

WriteIndexPtrCmdGrp0Support

MaxBuffer

CCE:ContextControl Entry

Descriptor Ring

CL1T:Context Level 1 Table

N = CCE.DescQueueSize(Each entry 64-bytes)

. . .

. . .

. . .

AkeyTable

CL2T:Context Level 2 Table

Context Lvl 2Table Base Ptr

Function MMIO

Context.DoorbellFunction MMIO

Context Lvl 1Table Base Ptr

(each entry 8-bytes)

EntryAddress = CCE.DescRingBasePtr + ((Index % CCE.DescQueueSize) << 6)WriteIndex – CSE.ReadIndex <= CCE.DescQueueSize

ContextNumber[15:0] = (CL2T_byte_offset[11:0] << 4) + (CL1T_byte_offset[11:0] >> 5)

ContextControlPtrAkeyTableBasePtr

ContextPASID. . .

(each entry 32-bytes)

SNIA SDXI Specification Working Draft 39 Version 0.9.0 rev 1

indices need only be mapped onto descriptor ring addresses when writing or reading a ring entry at a 771 given Index. 772

• An SDXI descriptor ring shall be contiguous within the address space used to access it. This is 773 determined by the function’s PCI requester ID and optional PASID. 774

CCE.DescRingBasePtr indicates the start address of the ring within this address space. 775

The logical ReadIndex and WriteIndex indices map to ring addresses when writing or reading a ring entry 776 at a given Index. An index N, maps to an address in the ring by: 777

• ring_entry_address = DescBase + ((N % DescRingSize) << 6) 778

The indices are not expected to reach or exceed 264-1 in practice 779

After software increments WriteIndex and adds entries to the ring, it shall write the context’s Doorbell 780 address with the WriteIndex value; there is no architectural guarantee that new entries on the ring are 781 processed without this important write. 782

If software wants to add N entries to the descriptor ring, it must ensure that: 783

• WriteIndex + N - ReadIndex <= DescRingSize 784

For an enabled context, ReadIndex is constantly being updated by HW as new descriptors are processed. 785 Any value read by SW of the ReadIndex could be stale, but this is not a problem. Since ReadIndex always 786 increases and never wraps in the normal case, a calculation using a stale value of ReadIndex will only 787 cause a more conservative understanding of the number of available indices/entries by software. It will 788 never allow enqueing more entries than the ring can hold. 789

If WriteIndex == ReadIndex, then the SDXI function does not process more entries of that context until 790 WriteIndex and Doorbell.WriteIndex are updated beyond ReadIndex. The SDXI function will stop 791 processing entries when ReadIndex == WriteIndex or ReadIndex points to an invalid entry. The SDXI 792 function shall never process the entry pointed to by WriteIndex or any entry that is logically past 793 WriteIndex, regardless of the value of the descriptor’s Valid field. 794

Software shall not modify valid descriptors located between ReadIndex and WriteIndex. 795

796

40 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

5.1 Enqueuing one or more Descriptors 797

The following procedure may be used to enqueue one or more descriptors into a descriptor ring: 798

1. Check for sufficient space in the descriptor ring by reading the ReadIndex, WriteIndex and ring size. 799

2. Reserve space in the descriptor ring by using an atomic compare-and-swap operation to add a value to 800 the Write Index in memory corresponding to the number of descriptors to be enqueued. 801

3. Descriptors may be written into memory starting at DescBase + (pre_incremented_WriteIndex % 802 DescRingSize )*64. Software shall ensure that the Valid field in a descriptor header is set to 0 (Invalid) 803 until the whole descriptor is written. The function may start processing the whole descriptor immediately 804 once the valid field is set to 1. The function may read the descriptor one or more times prior to the 805 doorbell being written. 806

4. Once all descriptors are written to memory, write the updated WriteIndex value to the context’s doorbell 807 location. 808

SNIA SDXI Specification Working Draft 41 Version 0.9.0 rev 1

Figure 5-2 Example x86-64 SDXI Enqueue Code 809

/* Example code for enqueueing descriptors on an SDXI ring using gcc 9.3 (and later). 810 811

The code should be correct for multiple CPU architectures as it relies solely on: the cpu-independent gcc single-threaded guarantees for 812 instruction re-ordering; gcc atomic built-ins which control compiler instruction-reordering and proper emitting of CPU memory barriers and 813 instructions; and the behavior of the volatile type keyword to ensure that gcc does not optimize away or cache critical SDXI index 814 locations. (Note: the code has been compiled at -O3 and the assembly examined on both x86-64 and ARM64 for correctness wrt these 815 issues.) 816

817 Using a fence (__atomic_thread_fence(__ATOMIC_ACQ_REL)) at key points in the code ensures for any given thread that: the complier 818 doesn't re-order instructions due to optimizations; and the complier emits the minimum but necessary CPU memory barriers to ensure the 819 required read and write ordering within the thread. 820

821 For example, on x86-64 with optimizations enabled, the fence compiles to nothing due to the TSO memory model; however, on ARM64 822 this compiles to "dmb ish". Using __ATOMIC_SEQ_CST instead for the fence penalizes the emitted code in some cases -- e.g. mfence 823 on x86-64 which is too strong. 824

825 In a multi-producer scenario, Write_Index can only be safely updated using __atomic_compare_exchange_n(). This operation also 826 provides the required single-threaded guarantees wrt the update of Write_Index. Using _ATOMIC_SEQ_CST here seems like the right 827 generic choice which gcc compiles to the correct CPU instructions such as "lock cmpxchg" on x86_64 and "ldaxr, cmp, bne, stlxr" on 828 ARM64. 829

*/ 830 831 #define SDXI_DESCR_SIZE 64 832 #define SDXI_DS_NUM_QW (SDXI_DESCR_SIZE / sizeof(uint64_t)) 833 #define _ALIGN64 __attribute__ ((aligned (64))) 834 #define SDXI_MULTI_PRODUCER 1 // Define to 0 if single-producer 835 836 /* Shows a method for writing an array of descriptors to the SDXI ring given a ring index. 837 838

Method: Skip writing the first QW of the first new entry to be added to the ring, but write all the other entry data as normal. This ensures 839 that we don't trigger the SDXI function into reading incomplete entries as we are writing them. Finally, write the first QW of the first new 840 entry to the ring so the SDXI function can start processing the new entries. 841

842 This is safe provided that the zeroth bit of the LSB (VALID bit) of the first new ring entry is '0' *before* WriteIndex is advanced past it 843 (since an SDXI function will not read past an invalid ring entry). This requirement is ensured by SW and the SDXI function in the following 844 way: SW shall always initialize at least every 64th byte of the ring to zero before first use of the ring; and subsequently, the SDXI function 845 shall always clear the zeroth bit of the LSB of every ring entry the function processes. 846

*/ 847 int update_ring( 848 uint64_t * const enq_entries, // Ptr to entries to enqueue 849 uint64_t enq_num, // Number of entries to enqueue 850 uint64_t * ring_base, // Ptr to ring location 851 uint64_t ring_size, // (Ring Size in bytes)/64 852 uint64_t index ) // Starting ring index to update 853 { 854 uint64_t skip = 1; // 'skip' writing the first QW. 855

856 for (uint64_t i = 0; i < enq_num; i++){ 857 uint64_t * ringp = ring_base + ((index + i) % ring_size) * SDXI_DS_NUM_QW; 858 uint64_t * entryp = enq_entries + (i * SDXI_DS_NUM_QW); 859 860 for (uint64_t j = skip; j < SDXI_DS_NUM_QW; j++ ){ 861 *(ringp + j) = *(entryp + j); 862 } 863 864 skip = 0; // Don't skip the first QW for the other entries 865 } 866 867

// Now write the first QW of the first new entry to the ring. 868 __atomic_thread_fence(__ATOMIC_ACQ_REL); 869 *(ring_base + (index % ring_size) * SDXI_DS_NUM_QW) = *enq_entries; 870 871 return 0; 872 } 873

874

42 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Figure 5-3 Example x86-64 SDXI Enqueue Code (cont) 875

/* Enqueues entries onto an SDXI Ring. 876 877

We assume that the OS and/or the application have already setup the SDXI function's context and can pass the locations and sizes of the 878 ring, Write_index, Read_Index, and Door_Bell to this routine. Once WriteIndex can be safely advanced for the desired number of ring 879 entries, this method does so and *then* afterwards populates the new ring entries. 880

*/ 881 882 int sdxi_enqueue( 883 uint64_t * const enq_entries, // Ptr to entries to enqueue 884 uint64_t enq_num, // Number of entries to enqueue 885 uint64_t * ring_base, // Ptr to ring location 886 uint64_t ring_size, // (Ring Size in bytes)/64 887 uint64_t const volatile * const Read_Index, // Ptr to ReadIndex location 888 uint64_t volatile * const Write_Index, // Ptr to WriteIndex location 889 uint64_t volatile * const Door_Bell) // Ptr to Ring Doorbell location 890 { 891 uint64_t read_idx, old_write_idx, new_idx; 892 893

// SW should do intelligent retry & back-off; while loop shown for simplicity 894 while (true){ 895 896

// Get Read_Index before Write_Index to always get consistent values 897 read_idx = *Read_Index; 898 __atomic_thread_fence(__ATOMIC_ACQ_REL); 899 old_write_idx = *Write_Index; 900 901 if (read_idx > old_write_idx){ 902

// Only happens if WriteIndex wraps or ring has bad setup 903 return 1; 904 } 905 906 new_idx = old_write_idx + enq_num; 907 if (new_idx - read_idx > ring_size) { 908 continue; // Not enough free entries, try again 909 } 910 911 if ( SDXI_MULTI_PRODUCER ){ 912 /* Try to atomically update Write_Index before other threads with compare_exchange operation. This built-in also ensures 913

the required single-threaded guarantees wrt the update of Write_Index. 914 */ 915 bool success = 916 __atomic_compare_exchange_n( Write_Index, &old_write_idx, new_idx, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST ); 917 918 if ( success ) break; // Updated Write_Index, no need to try again. 919 } 920 else { 921

// Single-Producer case 922 *Write_Index = new_idx; 923 __atomic_thread_fence(__ATOMIC_ACQ_REL); 924 break; // Always successful for single-producer 925 } 926 927

// Couldn't update Write_Index, try again. 928 } 929 930

// Write_Index is now advanced. Let's write out entries to the ring. 931 update_ring(enq_entries, enq_num, ring_base, ring_size, old_write_idx); 932 933

// Door_Bell write required; only needs ordering wrt update of Write_Index. 934 *Door_Bell = new_idx; 935 936 return 0; 937 } 938

939

SNIA SDXI Specification Working Draft 43 Version 0.9.0 rev 1

Multi-Producer Enqueue 940

Multiple producer entities may use the enqueue procedure described in the prior section in parallel without 941 the use of locks or other higher-level serialization methods. While a context’s WriteIndex value in memory 942 must be monotonically increasing, the SDXI function may see doorbell writes to a context whose values 943 are not monotonically increasing. The function shall discard such doorbell writes. 944

945

44 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

5.2 Descriptor Processing 946

This section describes the specific behavior required of an SDXI function as it processes a descriptor. 947

The function processes descriptors from a ring using the following notional steps: 948

1. Determine space has been allocated in the descriptor ring (WriteIndex>ReadIndex), for example by 949 receiving a doorbell write with an updated WriteIndex value or by polling WriteIndex in memory. 950

2. Poll on the descriptor located at ReadIndex until it is valid. The function may stop polling after an 951 implementation-specific timeout period and log an error. 952

3. Check the Fence bit in the descriptor header. If it is set, wait for all prior descriptors within the same 953 context to complete. 954

4. Set the Valid field in the header to invalid. 955

5. Write incremented ReadIndex value back to memory. 956

6. Execute the descriptor operation including accessing any referenced AKeys and data buffers 957

7. Write an error log entry if an error was detected in any of the previous steps 958

8. Update MiscStatus in the Completion Status Block if necessary 959

9. Write an error log entry if any errors were encountered accessing the Completion Status Block 960

10. Complete the descriptor by atomically decrementing the completion signal in the Completion Status 961 Block, or if an error was previously detected, by writing all bits of the completion signal to 1. 962

11. Write an error log entry if any errors were encountered accessing the completion signal 963

964

If an error occurs during steps 1 to 3, an error log is written and processing for the current descriptor is 965 stopped. Further processing actions such as changing the Valid field or generating a completion signal are 966 not performed. 967

If an error occurs during steps 4, 5, or 6, the function jumps to step 7 and writes an error log entry. 968

If an error is detected during descriptor processing, the function starts transitioning the Context to the Error 969 state. All outstanding descriptors associated with the Context must complete prior to entering the Error 970 state. 971

Descriptor processing may be stopped during steps 1, 2 or 3 without logging an error. This may occur, for 972 example, due to the parallel execution of a Stop Context operation in an administrative context. Stopping 973 descriptor processing at any other time results in an error. 974

5.3 Descriptor Ordering and Parallel Execution 975

An SDXI function may prefetch any number of descriptors from a descriptor ring up to the location prior to 976 that indicated by the WriteIndex value in memory. Dynamic modification of valid descriptors located 977 between ReadIndex and WriteIndex-1 is not supported. 978

Following the model in the prior section, an SDXI function shall process descriptors in the order in which 979 they appear in the descriptor ring until the end of step 4 where the ReadIndex is incremented. An SDXI 980 function may execute steps 5 and later for multiple descriptors of the same context in parallel. 981

The Fence (F) flag may be used to constrain parallel execution. When the Fence flag is set, the function 982 shall wait for all prior descriptor operations to complete before executing the fenced descriptor operation. 983 When the Fence flag is clear, the descriptor operation may be executed while prior descriptor operations 984 are outstanding. 985

SNIA SDXI Specification Working Draft 45 Version 0.9.0 rev 1

The Sequential Consistency (S) flag may also affect parallel execution. See section 5.5 Memory 986 Consistency Model for more details. 987

There are no ordering or consistency requirements for descriptors belonging to different contexts. 988

5.4 Descriptor Completion 989

When each descriptor operation has completed, the function will decrement the referenced completion 990 signal in memory. Descriptors may complete out of order. It is up to software to assign completion signal 991 locations and initial values as appropriate. Some applications may require precise tracking of descriptors 992 while others may track groups of descriptors. 993

When a descriptor encounters an error during processing and causes its context to stop, there may be 994 multiple prior and/or subsequent descriptors from the same context being processed in parallel. These 995 may complete independently with or without error. 996

5.5 Memory Consistency Model 997

Unless otherwise specified, there are no ordering or consistency requirements between an SDXI function’s 998 memory accesses associated with the same descriptor operation. 999

An SDXI function clearing the descriptor header Valid bit shall be made globally visible ahead of making 1000 the ReadIndex update globally visible. Additionally, the clearing of the Valid bit shall be made globally 1001 visible ahead of making any writes associated the descriptor operation globally visible and ahead of 1002 making any writes to the descriptor’s Completion Status globally visible. 1003

All writes associated with a descriptor operation shall be made globally visible prior to making changes to 1004 the descriptor’s Completion Statusglobally visible. Within the Completion Status, changes to MiscStatus 1005 shall be made globally visible prior to making updates to the descriptor’s completion signal globally visible. 1006 Additionally, all error logs for the descriptor, other than those associated with accessing the completion 1007 status block shall be made globally visible prior to making updates to the completion status block globally 1008 visible. 1009

If the Sequential Consistency (S) flag is set, all writes from prior descriptor operations in the same context 1010 shall be made globally visible prior to making writes from the current descriptor operation globally visible, 1011 regardless of address. 1012

If the Sequential Consistency flag is clear, writes associated with the current descriptor operation may be 1013 reordered ahead of, and made globally visible prior to, writes associated with prior descriptor operations 1014 that have not yet been made globally visible. 1015

Unlike the fence flag, the sequential consistency flag allows multiple descriptors to be executed in parallel 1016 as long as the rules around global visibility are maintained. 1017

Developer Note: This note is regarding the use of Sequential Consistency Flag and PCIe Relaxed 1018 Ordering. 1019

In general, writes associated with a single descriptor operation may be issued over the PCIe bus in any 1020 order. 1021

If the S flag is clear, all writes associated with the descriptor operation may be issued as Relaxed Ordered 1022 (RO=1) writes. Additionally, writes associated with the descriptor operation may be issued ahead of writes 1023 associated with prior descriptor operations. 1024

46 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

If the S flag is set, all writes associated with a descriptor operation need to be issued as non-relaxed 1025 (RO=0) over the PCIe bus, and only after writes associated with all prior descriptor operations have been 1026 issued on the bus. 1027

5.6 Descriptor Chaining 1028

The descriptor chaining mechanism may be used for either Extended Descriptors or as part of the Error 1029 Log. See 5.6.1 Extended Descriptors and 3.4 Error Log for more details. 1030

A descriptor chain is formed from 2 or more adjacent 64-byte descriptor entries. Each entry in the chain is 1031 referred to as a link. Each link is formatted as a standard SDXI descriptor with an SDXI header and footer. 1032 Each link except for the last shall set the Chain (C) bit to 1 in their respective headers. The last link in the 1033 chain is indicated by a descriptor with the Chain bit clear. 1034

Software shall reserve enough space in the descriptor ring to enqueue the entire chain. The individual 1035 descriptor entries forming the chain may be written into the descriptor ring in arbitrary order. 1036

When processing a descriptor chain, an SDXI function shall increment ReadIndex by 1 and clear the 1037 header Valid bit for each link in the chain. 1038

Extended Descriptors 1039

Extended descriptors may be utilized by operations that require more than the 52 bytes of operand space 1040 provided by a single SDXI descriptor. Additional operand space may be created by chaining 2 or more 1041 descriptor entries. Each operation definition shall indicate whether the use of an extended descriptor is 1042 required. 1043

Other than the Chain bit, extended descriptors utilize the same descriptor header for each link in the chain. 1044

Only the last link in the chain contains a valid descriptor footer with the pointer to the completion status 1045 block. The descriptor footers for all other links in the chain are reserved. Only one completion signal 1046 decrement is performed for the entire extended descriptor. 1047

Error log entries are generated for the extended descriptor as a whole. 1048

An SDXI function processing an extended descriptor shall wait until all links are valid before executing the 1049 descriptor operation. 1050

SNIA SDXI Specification Working Draft 47 Version 0.9.0 rev 1

6 SDXI Descriptor and Operation Specification 1051

A set of related operations form an operations group. An operation requires a particular descriptor format 1052 to indicate parameters for the operation. These are described in this chapter. 1053

Table 6-1 SDXI Operation Groups, Types, and Subtypes 1054

Operation Group Type Subtype Operation (Reserved) 0x000 ‘-- ‘-- DMA Base Operation Group (DmaBaseGrp)

0x001 0x01 Nop 0x02 WriteImm 0x03 Copy 0x04 RepCopy

Administrative Operation Group (AdminGrp)

0x002 0x00 UpdateFunc 0x01 UpdateContext 0x02 UpdateAkey 0x03 Start 0x04 Stop 0x05 Intr 0x06 Sync

Atomic Operation Group (AtomicGrp)

0x003 0x00 Reserved 0x01 Swap 0x02 Unsigned Add 0x03 Unsigned Subtract 0x04 Reserved 0x05 AND 0x06 OR 0x07 XOR 0x08 Signed Min 0x09 Signed Max 0x0A Unsigned Clamping Increment 0x0B Unsigned Clamping Decrement 0x0C Compare and Swap

Interrupt Operation Group (IntrGrp) 0x004 0x00 Interrupt Connection Operation Group (ConnectGrp)

0x005 ‘-- Reserved

Reserved 0x7F6 – 0x006 ‘-- ‘-- Reserved for Error Log 0x7F7 ‘-- ‘-- Vendor Defined 0x7F8 – 0x7FF ‘-- Reserved

1055 Operation Types and Subtypes are encoded in the first 4-bytes of the descriptor header as: 1056

Figure 6-1 Type & Subtype Encoding in Descriptors 1057

1058

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

* * * * 0000 00000

< - - opcode - - > subtype type

48 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

6.1 Descriptor Format for SDXI Operations 1059

Common Header and Footer 1060

All SDXI operations use the following common header and footer encodings in their descriptor format. 1061

Figure 6-2 Common Header and Footer 1062

1063

1064

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

* * * *DWORD01DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD150 0 0

completion_ptr [63:00]

0000 00000

Body

< - - opcode - - > subtype type

SNIA SDXI Specification Working Draft 49 Version 0.9.0 rev 1

<!-- Table=Desc --!> 1065 Table 6-2 SDXI Operation Common Header and Footer Descriptor Format 1066

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid =* (VL)’ 000 1 = Descriptor is valid. All fields may be processed. 0 = Descriptor is invalid. All other fields within the descriptor shall be ignored. The function shall poll the descriptor until it becomes valid or until an implementation defined timeout period has expired at which point an error is logged.

DW00[01] ‘sequential = * (SE)’ 001 Sequential Consistency 1 = Operation writes are sequentially consistent 0 = Operation writes are not required to be sequentially consistent See section 5.5 for more details.

DW00[02] ‘fence =* (FE)’ 002 1 = All prior descriptor operations shall complete before executing this descriptor’s operation 0 = Execution of this descriptor’s operation is permitted prior to the completion of prior descriptor operations See section 5.3 for more details.

DW00[03] ‘chain = * (CH)’ 003 Chain. 1=Start or middle of a set of chained descriptors. 0=End of chain, or non-chained descriptor. See section 5.6 for more details.

DW00[07:04] ‘= 0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype’ 015:008 Specifies an operation within the Operation Group. DW00[26:16] ‘type’ 026:016 Specifies the Operation Group. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW13:01 ‘Operation’ 447:032 This region is defined separately for each operation.

DW15:14 ‘completion_ptr [63:00]^’ 511:448 A pointer to a 16B aligned region of memory containing the Completion Status block. An invalid pointer may result in an error.

DW14[00] ‘no_pointer (NP)’ 448 When 0, the SDXI function shall update the completion status blocked pointed to by completion_ptr. When 1, there is no completion status block associated with the descriptor to update and the completion_ptr shall be ignored. This mechanism avoids the overhead of completion-block-status processing for a series of descriptor requests that are ordered before a terminal one which will specify a completion status block.

DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0 1067

50 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Completion Status Block 1068

Figure 6-3 Completion Status Block 1069

1070

1071

Table 6-3 Completion Status Block 1072 1073

Location Field Bit Range Description DW01:00 ‘completion_signal[63:00]’ 063:000 The SDXI function shall atomically decrement the Completion

Signal value upon completion regardless of success or failure. An atomic decrement of 0 results in 0xFFFF_FFFF_FFFF_FFFF.

In case of errors that are indicated via the Completion Status Block, an SDXI function will first set the ‘Error Recorded (ER)’ bit prior to decrementing the Completion signal value. Information in DWORD2 through DWORD7 of Completion Status Block can only be relied upon by software once the Completion signal field reaches a value that indicates all associated descriptors have been processed by the hardware.

In cases of errors on multiple descriptors that are associated with a single Completion Status Block, an SDXI function may set ER bit for one or more of the descriptors.

The producer shall initialize the Completion Status Block prior to submitting any associated descriptor.

NOTE: A producer may update the Completion Status Block atomically even after an associated descriptor has been issued.

DW02[30:00] ‘Rsvd’ 094:064 Reserved.

DW02[31] ‘Error Recorded (ER)’ 095 This bit will be set by an SDXI function if it encounters an error during processing of any associated descriptor. An SDXI function will never clear the ER bit if it is already set.

DW07:03 ‘Rsvd’ 255:096 Reserved.

Attribute Field 1074

The attribute (Attr) field controls features related to accessing a data buffer. Within a descriptor definition, 1075 a postfix may be added to the field name to uniquely identify a specific data buffer (ex. Attr (Src), Attr 1076 (Dst)). 1077

Subsequently defined descriptors may contain data buffer pointers which have a corresponding Attr field. 1078

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00

DWORD01

DWORD02 ERDWORD03

DWORD04

DWORD05

DWORD06

DWORD07

Rsvd [000:159]

completion_signal [63:00]

Rsvd [30:00]

SNIA SDXI Specification Working Draft 51 Version 0.9.0 rev 1

Table 6-4 Attribute Field 1079

Bits Field Description

1:0 CoherencyControl 00b : The associated memory location is accessed as I/O coherent. 01b : The associated memory location is accessed as non-coherent. 10b : The associated memory location is accessed as I/O coherent. Cache injection may be used for the associated data buffer if enabled. All other encodings reserved

2 Reserved Shall be set to 0

3 MMIO 1b: The referenced memory buffer may be located in either system memory or MMIO space. 0b: The reference memory buffer is located in system memory and is not located in an MMIO space. This setting may optimize performance in some implementations.

1080

Figure 6-4 AttributeFormat 1081

1082

3 2 1:0MMIO rsvd Coherency Ctl

52 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

6.2 DMA Base Operations Group (DmaBaseGrp) 1083

All SDXI implementations are required to support the DmaBaseGrp operations. 1084

DmaBaseGrp: DS_DMAB_NOP 1085

The DmaBaseGrp’s Nop operation performs no functional DMA operation. A descriptor specifying the 1086 operation is processed and ordered the same as other descriptors, obeys all relevant bits in the header, 1087 and generates the completion signal. 1088

Figure 6-5 DS_DMAB_NOP Descriptor Format 1089

1090

1091

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 * 0 1DWORD01DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD150 0 0

completion_ptr [63:00]

0000 0x01 0x001 00000

rsvd (r)

< - - opcode - - > subtype type

SNIA SDXI Specification Working Draft 53 Version 0.9.0 rev 1

<!-- Table=Desc --!> 1092 Table 6-5 DS_DMAB_NOP Descriptor Format 1093

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid =1 (VL)’ 000 See header description. DW00[01] ‘sequential = 0 (SE)’ 001 Shall be set to 0. DW00[02] ‘fence = * (FE)’ 002 See header description. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘= 0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype=0x01’ 015:008 See header description. DW00[26:16] ‘type=0x001’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW13:01 ‘rsvd (r)’ 447:032 Shall be set to 0

DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1

DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1094 1095

54 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

DmaBaseGrp: DS_DMAB_WRT_IMM Operation 1096

The DmaBaseGrp’s WriteImm operation is used to write up to 32 contiguous bytes starting at a destination 1097 memory address. The data is supplied inside the descriptor. Following little-endian conventions, data byte 1098 0 is written to the first byte of the destination address. Write atomicity guarantees are implementation 1099 specific. 1100

Figure 6-6 DS_DMAB_WRT_IMM Descriptor Format 1101

1102

1103

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 * * 1

DWORD01

DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD15

akey0 (Dst)rsvd (r)

address0 [63:00] (Dst)

data [255:000]

0 0 0 completion_ptr [63:00]

bsize 0 0 0 rsvd (r)< - - attr - - >

attr0 (Dst)rsvd (r) rsvd (r)

0000 0x02 0x001 00000< - - size - - >

< - - opcode - - > subtype type

SNIA SDXI Specification Working Draft 55 Version 0.9.0 rev 1

<!-- Table=Desc --!> 1104 Table 6-6 DS_DMAB_WRT_IMM Descriptor Format 1105

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid =1 (VL)’ 000 See header description. DW00[01] ‘sequential = * (SE)’ 001 See header description. DW00[02] ‘fence = * (FE)’ 002 See header description. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘= 0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype=0x02’ 015:008 See header description. DW00[26:16] ‘type=0x001’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW01[07:00] ‘size*' 039:032 Size DW01[04:00] ‘bsize’ 036:032 Specifies the number of bytes to write minus 1.

0x0 = 1 Byte, . . .0x1F = 32 Bytes DW01[07:05] ‘0 0 0’ 039:037 Shall be set to 0

DW01[31:08] ‘rsvd (r)’ 063:040 Shall be set to 0

DW02[07:00] ‘attr*' 071:064 Attributes DW02[03:00] ‘attr0 (Dst)’ 067:064 Destination buffer attributes DW02[07:04] ‘rsvd (r)’ 071:068 Shall be set to 0

DW02[31:08] ‘rsvd (r)’ 095:072 Shall be set to 0 DW03[15:00] ‘akey0 (Dst)’ 111:096 Destination buffer AKey DW03[31:16] ‘rsvd (r)’ 127:112 Shall be set to 0 DW05:04 ‘address0 [63:00] (Dst)’ 191:128 Starting Destination Buffer Address

DW13:06 ‘data [255:000]’ 447:192 Immediate Data DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1

DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1106

56 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

DmaBaseGrp: DS_DMAB_COPY Operation 1107

The DmaBaseGrp’s Copy operation copies a source data buffer to a destination data buffer. 1108

It is an error (Generic Descriptor Error/Unsupported Field Encoding) for the source and destination buffers 1109 to exceed the size described by CL1TE.MaxBuffer. 1110

Figure 6-7 DS_DMAB_COPY Descriptor Format 1111

1112

1113

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 * * 1DWORD01

DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD15

address0 [63:00] (Src)

address1 [63:00] (Dst)

rsvd (r)

0 0 0 completion_ptr [63:00]

< - - attr - - > attr0 (Src)attr1 (Dst) rsvd (r)

akey0 (Src)akey1 (Dst)

0000 0x03 0x001 00000size

< - - opcode - - > subtype type

SNIA SDXI Specification Working Draft 57 Version 0.9.0 rev 1

<!-- Table=Desc --!> 1114 Table 6-7 DS_DMAB_ COPY Descriptor Format 1115

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid =1 (VL)’ 000 See header description. DW00[01] ‘sequential = * (SE)’ 001 See header description. DW00[02] ‘fence = * (FE)’ 002 See header description. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘= 0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype=0x03’ 015:008 See header description. DW00[26:16] ‘type=0x001’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW01[31:00] ‘size’ 063:032 Specifies the number of bytes to write minus 1. 0x0 = 1 Byte, . . ., 0xFFFF_FFFF = 2**32 Bytes

DW02[07:00] ‘attr*' 071:064 Attributes DW02[03:00] ‘attr0 (Src)’ 067:064 Source buffer attributes DW02[07:04] ‘attr1 (Dst)’ 071:068 Destination buffer attributes

DW02[31:08] ‘rsvd (r)’ 095:072 Shall be set to 0

DW03[15:00] ‘akey0 (Src)’ 111:096 Source buffer AKey

DW03[31:16] ‘akey1 (Dst)’ 127:112 Destination buffer AKey

DW05:04 ‘address0 [63:00] (Src)’ 191:128 Source Buffer Address

DW07:06 ‘address1 [63:00] (Dst)’ 255:192 Destination Buffer Address

DW13:08 ‘rsvd (r)’ 447:256 Shall be set to 0

DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1 DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1116

58 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

DmaBaseGrp: DS_DMAB_REPCOPY Operation 1117

The DmaBaseGrp’s Repeat Copy operation copies a single 4-KiB-aligned source data buffer to multiple 1118 adjacent locations within a 4-KiB-aligned destination buffer. This operation may be used, for example, to 1119 fill a large region of memory. 1120

It is an error (Generic Descriptor Error/Unsupported Field Encoding) to specify a destination buffer size 1121 greater than the size described by CL1TE..MaxBuffer. 1122

Figure 6-8 DS_DMAB_REPCOPY Descriptor Format 1123

1124

1125

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 * * 1

DWORD01

DWORD02DWORD03DWORD04 AZ

DWORD05DWORD06DWORD07

DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD15

rsvd (r)

0 0 0 completion_ptr [63:00]

0 … 0 address1 [63:00] (Dst)

< - - repeat - - > 0 . . . 0 number

akey0 (Src)akey1 (Dst)0 … 0

address0 [63:00] (Src)

0 … 0 nsize 0 … 0 < - - attr - - >

attr0 (Src)attr1 (Dst) rsvd (r)

0000 0x04 0x001 00000< - - size - - >

< - - opcode - - > subtype type

SNIA SDXI Specification Working Draft 59 Version 0.9.0 rev 1

<!-- Table=Desc --!> 1126 Table 6-8 DS_DMAB_REPCOPY Descriptor Format 1127

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid =1 (VL)’ 000 See header description. DW00[01] ‘sequential = * (SE)’ 001 See header description. DW00[02] ‘fence = * (FE)’ 002 See header description. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘= 0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype=0x04’ 015:008 See header description. DW00[26:16] ‘type=0x001’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW01[31:00] ‘size*' 063:032 Size DW01[11:00] ‘0 … 0’ 043:032 Shall be set to 0

DW01[20:12] ‘nsize’ 052:044 The size of the source data buffer in 4-Kbyte increments minus 1. 0x0 = 4 Kbytes, … , 0x1FF = 2 Mbytes

DW01[31:21] ‘0 … 0’ 063:053 Shall be set to 0

DW02[07:00] ‘attr*' 071:064 Attribute Fields DW02[03:00] ‘attr0 (Src)’ 067:064 Source buffer attributes DW02[07:04] ‘attr1 (Dst)’ 071:068 Destination buffer attributes

DW02[31:08] ‘rsvd (r)' 095:072 Shall be set to 0 DW03[15:00] ‘akey0 (Src)’ 111:096 Source buffer AKey DW03[31:16] ‘akey1 (Dst)’ 127:112 Destination buffer AKey DW05:04 ‘address0 [63:00] (Src)^’ 191:128 4KB aligned source buffer starting address.

DW04[00] ‘all_zero (AZ)’ 128

When AZ=1, this indicates that the source buffer is all zero. This allows the SDXI function to optimize the transfer (even avoiding reading the buffer). When AZ=0, no such guarantee is made. When AZ=1, SW shall always initialize the entire source buffer to zero; otherwise the contents of the destination is not defined – and no error is required to be reported in this case.

DW04[11:01] ‘0 … 0~’ 139:129 Shall be set to 0 DW07:06 ‘address1 [63:00] (Dst)^’ 255:192 4KB aligned destination buffer starting address.

DW06[11:00] ‘0 … 0~’ 203:192 Shall be set to 0 DW08[31:00] ‘repeat*’ 287:256 Specifies the number of times the source data buffer is copied

to the destination buffer. 0x0 = 1 copy …. 0xF_FFFF = 220 copies. The total destination size is: 4Kbytes * (Size+1) * (Repeat+1).

DW08[11:00] ‘0 . . . 0’ 267:256 Shall be set to 0 DW08[31:12] ‘number’ 287:268 See above

DW13:09 ‘rsvd (r)’ 447:288 Shall be set to 0 DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1

DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1128

60 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

6.3 Administrative Operation Group (AdminGrp) 1129

Administrative operations are used by software to manage the SDXI function. 1130

AdminGrp’s Update operations are part of the administrative operations group. They are used by 1131 privileged software to indicate changes to various memory data structures. When an Update operation 1132 signals completion, this does not guarantee that the effects of the operation are visible or that the flushing 1133 of any affected caches has completed. 1134

Update operations shall be used even when transitioning data structures from the invalid to valid state. 1135 This is done to simplify software emulated implementations. 1136

The AdminGrp’s Sync operation is used as a barrier after one or more Update operations. After 1137 completion of the Sync, the effects of prior AdminGrp Updates to the structures selected by the Sync 1138 operation are guaranteed to be seen by all new descriptors. AdminGrp’s Sync does not act as a barrier for 1139 AdminGrp Update operations executed in different contexts from the Sync. 1140

The AdminGrp’s Start and Stop operations may be used to control the execution state of a target context. 1141 A context that is stopped may also be started by writing appropriate memory state values followed by 1142 writing an appropriate doorbell. 1143

See section 7.2 for more details. 1144

1145

SNIA SDXI Specification Working Draft 61 Version 0.9.0 rev 1

AdminGrp: DS_ADM_UPD_FUNC Operation 1146

The AdminGrp’s Update Function operation is used by software to indicate a non-specific change in 1147 memory data structure contents. This operation signals the target SDXI function to invalidate all non-1148 coherently cached copies of SDXI data structures across all contexts. See 7.2.1 Caching of Memory Data 1149 Structures for more details. 1150

Figure 6-9 DS_ADM_UPD_FUNC Descriptor Format 1151

1152

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 * * 1

DWORD01 VF

DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD15

rsvd (r)0 . . . 0 vf_number

rsvd (r)

0 0 0 completion_ptr [63:00]

0000 0x00 0x002 00000< - - vflags - - >

< - - opcode - - > subtype type

62 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

<!-- Table=Desc --!> 1153 Table 6-9 DS_ADM_UPD_FUNC Operation Descriptor Format 1154

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid =1 (VL)’ 000 See header description. DW00[01] ‘sequential = * (SE)’ 001 See header description. DW00[02] ‘fence = * (FE)’ 002 See header description. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘= 0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype=0x00’ 015:008 See header description. DW00[26:16] ‘type=0x002’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW01[07:00] ‘rsvd (r)’ 039:032 Shall be set to 0

DW01[15:08] ‘vflags*' 048:040 Flags

DW01[14:08] ‘0 . . . 0’ 046:040 Shall be set to 0 DW01[15] ‘vf_valid (VF)’ 047 1 = Update the VF number indicated by vf_number

0 = Update within the local function context executing this descriptor (For VF administrative contexts, this field shall be set to 0.)

DW01[31:16] ‘vf_number’ 063:048 Only valid for PF administrative contexts. When VF=1, this field

indicates the VF to update. (For VF administrative contexts, this field shall be set to 0.)

DW13:02 ‘rsvd (r)’ 447:096 Shall be set to 0 DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1

DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1155

SNIA SDXI Specification Working Draft 63 Version 0.9.0 rev 1

AdminGrp DS_ADM_UPD_CXT Operation 1156

The AdminGrp’s Update Context operation is used by software to indicate a change to the memory data 1157 describing one or more contexts of a target function. This operation signals the target SDXI function to 1158 invalidate any non-coherently cached copies of the selected SDXI data structures. 1159

The affected contexts are described using the context_number and context_mask fields. The L2, L1 and 1160 CT bits determine whether context level 2 table entries, context level 1 table entries and/or context control 1161 entries are affected. 1162

All non-coherently cached AKeys associated with the affected contexts are invalidated. 1163

See 7.2 Context Management for more details. 1164

Figure 6-10 DS_ADM_UPD_CXT Descriptor Format 1165

1166

1167

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 * * 1

DWORD01 VF CT L1 L2

DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD15

rsvd (r)

0 0 0 completion_ptr [63:00]

0 . . . 0 0 . . . 0 vf_number context_number context_mask

0000 0x01 0x002 00000< - - cflags - - > < - - vflags - - >

< - - opcode - - > subtype type

64 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

<!-- Table=Desc --!> 1168 Table 6-10 DS_ADM_UPD_CXT Descriptor Format 1169

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid =1 (VL)’ 000 See header description. DW00[01] ‘sequential = * (SE)’ 001 See header description. DW00[02] ‘fence = * (FE)’ 002 See header description. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘= 0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype=0x01’ 015:008 See header description. DW00[26:16] ‘type=0x002’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW01[07:00] ‘cflags*' 039:032 Flags DW01[00] ‘CLT2(L2)’ 032 When 1, indicates that context level 2 table entries associated

with the masked context_number have been updated DW01[01] ‘CLT1(L1)’ 033 When 1, indicates that context level 1 table entries associated

with the masked context_number have been updated DW01[02] ‘CCE(CT)’ 034 When 1, indicates that the context control entries specified by the

masked context_number have been updated DW01[07:03] ‘0 . . . 0’ 039:035 Shall be set to 0

DW01[15:08] ‘vflags*' 047:040 Flags DW01[14:08] ‘0 . . . 0’ 046:040 Shall be set to 0 DW01[15] ‘vf_valid (VF)’ 047 1 = Update the VF number indicated by vf_number

0 = Update within the local function context executing this descriptor (For VF administrative contexts, this field shall be set to 0.)

DW01[31:16] ‘vf_number’ 063:048 Only valid for PF administrative contexts. When VF=1, this field indicates the VF to update. (For VF administrative contexts, this field shall be set to 0.)

DW02[15:00] ‘context_number’ 079:064 Indicates the context number to start. This value is masked by context_mask.

DW02[31:16] ‘context_mask’ 095:080 For each bit in context_mask that is set to 1, the corresponding bit in context_number is ignored when matching cached Context entries. When invalidating Context Level 2 Table Entries, aligned blocks of 128 context numbers must be updated using context_mask.

DW13:03 ‘rsvd (r)’ 447:096 Shall be set to 0 DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1

DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1170 1171

SNIA SDXI Specification Working Draft 65 Version 0.9.0 rev 1

AdminGrp DS_ADM_UPD_AKEY Operation 1172

The AdminGrp’s Update AKey operation is used by software to indicate a change to the memory data 1173 describing one or more AKey table entries associated with one or more contexts of a target function. This 1174 operation signals the target SDXI function to invalidate any non-coherently cached copies of the selected 1175 AKey table entries. 1176

The affected contexts are described using the context_number and context_mask fields. The affected 1177 AKey table indices are selected using the AKey and AKeyMask fields. 1178

See 7.2 Context Management for more details. 1179

Figure 6-11 DS_ADM_UPD_AKEY Descriptor Format 1180

1181

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 * * 1

DWORD01 VF

DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD15

akey akey_mask

rsvd (r)

0 0 0 completion_ptr [63:00]

rsvd (r)0 . . . 0 vf_number context_number context_mask

0000 0x02 0x002 00000< - - vflags - - >

< - - opcode - - > subtype type

66 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

<!-- Table=Desc --!> 1182 Table 6-11 DS_ADM_UPD_AKEY Descriptor Format 1183

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid =1 (VL)’ 000 See header description. DW00[01] ‘sequential = * (SE)’ 001 See header description. DW00[02] ‘fence = * (FE)’ 002 See header description. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘= 0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype=0x02’ 015:008 See header description. DW00[26:16] ‘type=0x002’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW01[07:00] ‘rsvd (r)’ 039:032 Shall be set to 0

DW01[15:08] ‘vflags*' 047:040 Flags DW01[14:08] ‘0 . . . 0’ 046:040 Shall be set to 0 DW01[15] ‘vf_valid (VF)’ 047 1 = Update the VF number indicated by vf_number

0 = Update within the local function context executing this descriptor (For VF administrative contexts, this field shall be set to 0.)

DW01[31:16] ‘vf_number’ 063:048 Only valid for PF administrative contexts. When VF=1, this field indicates the VF to update. (For VF administrative contexts, this field shall be set to 0.)

DW02[15:00] ‘context_number’ 079:064 Indicates the context number to start. This value is masked by context_mask.

DW02[31:16] ‘context_mask’ 095:080 For each bit in context_mask that is set to 1, the corresponding bit in context_number is ignored when matching cached Context entries. When invalidating Context Level 2 Table Entries, aligned blocks of 128 context numbers must be updated using context_mask.

DW03[15:00] ‘akey’ 111:096 When combined with akey_mask, indicates the AKey values that have been updated.

DW03[31:16] ‘akey_mask’ 127:112 For each bit in akey_mask that is set to 1, the corresponding bit in AKey is ignored when matching cached AKey entries numbers

DW13:04 ‘rsvd (r)’ 447:128 Shall be set to 0 DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1

DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1184

SNIA SDXI Specification Working Draft 67 Version 0.9.0 rev 1

AdminGrp DS_ADM_START Operation 1185

The AdminGrp’s Start operation: changes the value of ContextState within the target context’s Context 1186 Status Entry to indicate Running; and signals the context to run using the supplied doorbell value. 1187

Figure 6-12 DS_ADM_START Descriptor Format 1188

1189

1190

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 1 * 1

DWORD01 VF DR

DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD15

rsvd (r)

doorbell [63:00]

rsvd (r)

0 0 0 completion_ptr [63:00]

rsvd (r)0 . . . 0 vf_number context_number context_mask

0000 0x03 0x002 00000< - - vflags - - >

< - - opcode - - > subtype type

68 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

<!-- Table=Desc --!> 1191 Table 6-12 DS_ADM_START Descriptor Format 1192

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid=1 (VL)’ 000 See header description. DW00[01] ‘sequential=* (SE)’ 001 See header description. DW00[02] ‘fence=1 (FE)’ 002 Shall be set to 1. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘=0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype=0x03’ 015:008 See header description. DW00[26:16] ‘type=0x002’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW01[07:0] ‘rsvd (r)’ 039:032 Shall be set to 0

DW01[15:08] ‘vflags*' 047:040 Flags

DW01[13:08] ‘0 . . . 0’ 046:040 Shall be set to 0 DW01[14] ‘db_ring (DR)’ Doorbell Ring. When set to 1, after starting the context, the

context’s doorbell is rung using the ‘doorbell’ value. DW01[15] ‘vf_valid (VF)’ 047 1 = Update the VF number indicated by vf_number

0 = Update within the local function context executing this descriptor (For VF administrative contexts, this field shall be set to 0.)

DW01[31:16] ‘vf_number’ 063:048 Only valid for PF administrative contexts. When VF=1, this field indicates the VF to update. (For VF administrative contexts, this field shall be set to 0.)

DW02[15:00] ‘context_number’ 079:064 Indicates the context number to start. This value is masked by context_mask.

DW02[31:16] ‘context_mask’ 095:080 For each bit in context_mask that is set to 1, the corresponding bit in context_number is ignored when matching cached Context entries. When invalidating Context Level 2 Table Entries, aligned blocks of 128 context numbers must be updated using context_mask.

DW03[31:00] ‘rsvd (r)’ 127:096 Shall be set to 0 DW05:04 ‘doorbell [63:00]’ 191:128 64-bit Doorbell value used when DR=1. See the Doorbell BAR

section for details on the doorbell value encoding. DW13:06 ‘rsvd (r)’ 447:192 Shall be set to 0 DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1

DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1193

SNIA SDXI Specification Working Draft 69 Version 0.9.0 rev 1

AdminGrp DS_ADM_STOP Operation 1194

The AdminGrp’s Stop operation stops a target context from processing new descriptors and changes the 1195 target context’s state to a relevant stopped state. The target context shall be stopped at a descriptor 1196 boundary. See 4.2 SDXI Context State for more details. 1197

The AdminGrp’s Stop operation shall complete in a timely manner so that further administrative operations 1198 may be executed. The operation is not required to wait for the target context to stop prior to completing. 1199 Software shall check the target context’s ContextStatus to determine whether the context has stopped. 1200

If a hard stop is specified, the SDXI function may abort outstanding operations rather than wait for them to 1201 complete normally to speed up the stop process. This may result in errors being reported in the error log. 1202

Figure 6-13 DS_ADM_STOP Descriptor Format 1203

1204

1205

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 1 * 1

DWORD01 VF 0 HS

DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD15

rsvd (r)

0 0 0 completion_ptr [63:00]

rsvd (r)0 . . . 0 vf_number context_number context_mask

0000 0x04 0x002 00000< - - vflags - - >

< - - opcode - - > subtype type

70 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

<!-- Table=Desc --!> 1206 Table 6-13 DS_ADM_STOP Descriptor Format 1207

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid=1 (VL)’ 000 See header description. DW00[01] ‘sequential=* (SE)’ 001 See header description. DW00[02] ‘fence=1 (FE)’ 002 Shall be set to 1. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘=0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype=0x04’ 015:008 See header description. DW00[26:16] ‘type=0x002’ 026:016 See header description. DW00[31:27] ‘=00000’ 031:027 Shall be set to 0

DW01[07:0] ‘rsvd (r)’ 039:032 Shall be set to 0

DW01[15:08] ‘vflags*' 047:040 Flags

DW01[12:08] ‘0 . . . 0’ 044:040 Shall be set to 0 DW01[13] ‘hard_stop (HS)’ 045 Hard Stop. When HS=1, outstanding operations in the

affected context may be aborted prior to stopping the context. When HS=0, outstanding operations in the affected context shall complete normally prior to stopping the context.

DW01[14] ‘0’ 046 Shall be set to 0. DW01[15] ‘vf_valid (VF)’ 047 1 = Update the VF number indicated by vf_number

0 = Update within the local function context executing this descriptor (For VF administrative contexts, this field shall be set to 0.)

DW01[31:16] ‘vf_number’ 063:048 Only valid for PF administrative contexts. When VF=1, this field indicates the VF to update. (For VF administrative contexts, this field shall be set to 0.)

DW02[15:00] ‘context_number’ 079:064 Indicates the context number tostop. This value is masked by context_mask.

DW02[31:16] ‘context_mask’ 095:080 For each bit in context_mask that is set to 1, the corresponding bit in context_number is ignored when matching cached Context entries. When invalidating Context Level 2 Table Entries, aligned blocks of 128 context numbers must be updated using context_mask.

DW13:03 ‘rsvd (r)’ 447:096 Shall be set to 0 DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1

DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1208

SNIA SDXI Specification Working Draft 71 Version 0.9.0 rev 1

AdminGrp DS_ADM_INTR Operation 1209

The AdminGrp’s Interrupt operation is used to generate interrupts from within an administrative context. 1210 The interrupt is generated from the local function hosting the administrative context. 1211

Figure 6-14 DS_ADM_INTR Descriptor Format 1212

1213

1214

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 * * 1DWORD01DWORD02

DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD150 0 0

completion_ptr [63:00]

< - - intr_number - - > intr 0 . . . 0 rsvd (r)

rsvd (r)

0000 0x05 0x002 00000

rsvd (r)

< - - opcode - - > subtype type

72 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

<!-- Table=Desc --!> 1215 Table 6-14 DS_ADM_INTR Descriptor Format 1216

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid=1 (VL)’ 000 See header description. DW00[01] ‘sequential=* (SE)’ 001 See header description. DW00[02] ‘fence=* (FE)’ 002 See header description. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘=0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype=0x05’ 015:008 See header description. DW00[26:16] ‘type=0x002’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW02:01 ‘rsvd (r)’ 095:032 Shall be set to 0

DW03[15:00] ‘intr_number*’ 111:096 Interrupt

DW03[11:00] ‘intr’ 107:096 Specifies the MSI-X entry used to generate the interrupt.

DW03[15:12] ‘0 . . . 0’ 111:108 Shall be set to 0

DW03[31:16] ‘rsvd (r)’ 127:112 Shall be set to 0

DW13:04 ‘rsvd (r)’ 447:128 Shall be set to 0

DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1 DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1217

SNIA SDXI Specification Working Draft 73 Version 0.9.0 rev 1

AdminGrp DS_ADM_SYNC Operation 1218

The AdminGrp’s Sync operation acts as a synchronization barrier relative to prior AdminGrp’s Update 1219 operations that target the selected function, contexts, and AKeys. 1220

A successful completion of the Sync operation indicates the successful completion of all prior AdminGrp 1221 Update operations to the selected data structures and that all outstanding requests using prior non-1222 coherently cached copies of those data structures have completed. 1223

An error associated with Sync may indicate an error associated with a prior AdminGrp Update operation. 1224

Figure 6-15 DS_ADM_SYNC Descriptor Format 1225

1226

1227

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 1 * 1

DWORD01 VF

DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD15

akey akey_mask

rsvd (r)

0 0 0 completion_ptr [63:00]

rsvd (r)0 . . . 0 vf_number context_number context_mask

0000 0x06 0x002 00000< - - vflags - - >

< - - opcode - - > subtype type

74 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

<!-- Table=Desc --!> 1228 Table 6-15 DS_ADM_SYNC Descriptor Format 1229

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid=1 (VL)’ 000 See header description. DW00[01] ‘sequential=* (SE)’ 001 See header description. DW00[02] ‘fence=1 (FE)’ 002 Shall be set to 1. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘=0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype=0x06’ 015:008 See header description. DW00[26:16] ‘type=0x002’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW01[07:0] ‘rsvd (r)’ 039:032 Shall be set to 0

DW01[15:08] ‘vflags*' 047:040 Flags

DW01[14:08] ‘0 . . . 0’ 046:040 Shall be set to 0 DW01[15] ‘vf_valid (VF)’ 047 When VF=1, sync the VF number indicated by vf_number.

When VF=0, sync within the local function hosting the context executing this descriptor (For VF administrative contexts, this field shall be set to 0.)

DW01[31:16] ‘vf_number’ 063:048 Only valid for PF administrative contexts. When VF=1, this field indicates the VF to update. (For VF administrative contexts, this field shall be set to 0.)

DW02[15:00] ‘context_number’ 079:064 Indicates the context number to sync. This value is masked by ‘context_mask’.

DW02[31:16] ‘context_mask’ 095:080 For each bit in context_mask that is set to 1, the corresponding bit in context_number is ignored when matching context numbers.

DW03[15:00] ‘akey’ 111:096 Indicates the AKey value to sync. This value is masked by ‘akey_mask’.

DW03[31:16] ‘akey_mask’ 127:112 For each bit in akey_mask that is set to 1, the corresponding bit in ‘akey’ is ignored when matching AKey entry numbers

DW13:04 ‘rsvd (r)’ 447:128 Shall be set to 0 DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1

DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1230

SNIA SDXI Specification Working Draft 75 Version 0.9.0 rev 1

AdminGrp DS_ADM_UPD_RKEY Operation 1231

The AdminGrp’s Update RKey operation is used by software to indicate a change to the memory data 1232 describing one or more RKey table entries a target function. This operation signals the target SDXI 1233 function to invalidate any non-coherently cached copies of the selected RKey table entries. 1234

Figure 6-16 DS_ADM_UPD_RKEY Descriptor Format 1235

1236

The affected RKey table indices are selected using the RKey and RKeyMask fields. 1237

Table 6-16 DS_ADM_UPD_RKEY Descriptor Format 1238

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid=1 (VL)’ 000 See header description. DW00[01] ‘sequential=* (SE)’ 001 See header description. DW00[02] ‘fence=1 (FE)’ 002 Shall be set to 1. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘=0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype=0x06’ 015:008 See header description. DW00[26:16] ‘type=0x002’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW01[07:0] ‘rsvd (r)’ 039:032 Shall be set to 0

DW01[15:08] ‘vflags*' 047:040 Flags

DW01[14:08] ‘0 . . . 0’ 046:040 Shall be set to 0 DW01[15] ‘vf_valid (VF)’ 047 When VF=1, sync the VF number indicated by vf_number.

When VF=0, sync within the local function hosting the context executing this descriptor (For VF administrative contexts, this field shall be set to 0.)

DW01[31:16] ‘vf_number’ 063:048 Only valid for PF administrative contexts. When VF=1, this field indicates the VF to update. (For VF administrative contexts, this field shall be set to 0.)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL0 1 * 1

DWORD01 VFDWORD02

DWORD03

DWORD04

DWORD05

DWORD06

DWORD07

DWORD08

DWORD09

DWORD10

DWORD11

DWORD12

DWORD13

DWORD14 NPDWORD15

⭠ opcode ⭢ type subtype

00000 0x002 0x07 0000⭠ vflags ⭢

vf_number 0 . . . 0 rsvd (r)

rsvd (r)

rkey_mask rkey

rsvd (r)

completion_ptr [63:00]0 0 0

76 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Location Field Bit Range Description DW02[15:00] ‘context_number’ 079:064 Indicates the context number to sync. This value is masked by

‘context_mask’. DW02[31:16] ‘context_mask’ 095:080 For each bit in context_mask that is set to 1, the corresponding

bit in context_number is ignored when matching context numbers.

DW03[15:00] ‘akey’ 111:096 Indicates the AKey value to sync. This value is masked by ‘akey_mask’.

DW03[31:16] ‘akey_mask’ 127:112 For each bit in akey_mask that is set to 1, the corresponding bit in ‘akey’ is ignored when matching AKey entry numbers

DW13:04 ‘rsvd (r)’ 447:128 Shall be set to 0 DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1

DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

1239

6.4 Atomic Operation Group (AtomicGrp) 1240

This operation group performs various atomic operations. All AtomicGrp operations use the same 1241 descriptor format. 1242

Figure 6-17 DS_ATM Descriptor Format 1243

1244

1245

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 * * 1

DWORD01

DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD15

00 ret_data_ptr [63:00]

rsvd (r)

0 0 0 completion_ptr [63:00]

00 address0 [63:00] (Dst)

operand1 [63:00]

operand2 [63:00]

attr0 (Dst)0 . . . 0 rsvd (r)akey0 (Dst)rsvd (r)

0osz 000 rsvd (r)< - - attr - - >

0000 0x003 00000< - - size - - >

< - - opcode - - > subtype type

SNIA SDXI Specification Working Draft 77 Version 0.9.0 rev 1

<!-- Table=Desc --!> 1246 Table 6-17 DS_ATM Operation Descriptor Format 1247

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid =1 (VL)’ 000 See header description. DW00[01] ‘sequential = * (SE)’ 001 See header description. DW00[02] ‘fence = * (FE)’ 002 See header description. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘= 0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype’ 015:008 See header description. DW00[26:16] ‘type=0x003’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW01[07:00] ‘size*' 039:032 Size DW01[01:00] ‘00’ 033:032 Shall be set to 0 DW01[04:02] ‘osz’ 036:034 The size of the operands in 4-byte increments minus 1:

000b = 4 bytes, 001b = 8 bytes. All other encodings reserved DW01[07:05] ‘000’ 039:037 Shall be set to 0

DW01[31:08] ‘rsvd (r)’ 063:040 Shall be set to 0

DW02[07:00] ‘attr*' 071:064 Attributes DW02[03:00] ‘attr0 (Dst)’ 067:064 Destination buffer attributes DW02[07:04] ‘0 . . . 0’ 071:068 Shall be set to 0 DW02[31:08] ‘rsvd (r)’ 095:072 Shall be set to 0 DW03[15:00] ‘akey0 (Dst)’ 111:096 Destination buffer AKey DW03[31:16] ‘rsvd (r)’ 127:112 Shall be set to 0

DW05:04 ‘address0 [63:00] (Dst)^’ 191:128 Destination buffer memory address to target for the atomic operation. The address shall be aligned to the operand size.

DW04[01:00] ‘00~’ 139:128 Shall be set to 0 DW07:06 ‘operand1 [63:00]’ 255:192 Operand 1 Data. If the operand size is set to 4 bytes, the lower 32 bits

of Operand1Data are utilized. DW09:08 ‘operand2 [63:00]’ 319:256 Operand 2 Data. If the operand size is set to 4 bytes, the lower 32 bits

of Operand1Data are utilized. DW11:10 ‘ret_data_ptr [63:00]^’ 383:320 Pointer to a naturally aligned operand size location in memory. The

original memory value at Address will be written back to this location. If this field is set to 0, no data is returned.

DW10[01:00] ‘00~’ 321:320 Shall be set to 0 DW13:12 ‘rsvd (r)’ 447:384 Shall be set to 0

DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1 DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1248

78 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Table 6-18 Atomic Operations Sub-Type1,2,3,4 1249

Sub Type Operation 0x00 Reserved 0x01 Swap (OP_ATM_SWAP) *address0 = operand1; 0x02 Unsigned Add (OP_ATM_UADD) *address0 += operand1; 0x03 Unsigned Subtract (OP_ATM_USUB) *address0 -= operand1; 0x04 Reserved 0x05 AND (OP_ATM_AND) *address0 &= operand1; 0x06 OR (OP_ATM_OR) *address0 |= operand1; 0x07 XOR (OP_ATM_XOR) *address0 ^= operand1; 0x08 Signed Min (OP_ATM_SMIN)5 if (*address0 > operand1){ address0 = operand1; } 0x09 Signed Max (OP_ATM_SMAX)5 if (*address0 < operand1){ address0 = operand1; } 0x0A Unsigned Min (OP_ATM_UMIN)5 if (*address0 > operand1){ address0 = operand1; } 0x0B Unsigned Max (OP_ATM_UMAX)5 if (*address0 < operand1){ address0 = operand1; } 0x0C Unsigned Clamping Increment (OP_ATM_UCLAMPI) *address0 = (*address0 >= operand1) ? operand1 : *address + 1; 0x0D Unsigned Clamping Decrement (OP_ATM_UCLAMPD) *address0 = (*address0 == 0 || *address0 > operand1) ? operand1 : *address - 1; 0x0E Compare and Swap (OP_ATM_CMPSWAP)5 if (*address0 == operand1){ address0 = operand2; } All other encodings are Reserved

1. The sequence of reading and writing (or releasing) address0 by the SDXI function must be atomic 1250 with respect to operations of other agents on address0. 1251

2. ‘*nnn’ means the memory contents pointed to by nnn. 1252

3. All operations are performed at the specified operand size and all operands must be naturally 1253 aligned to that size. 1254

4. Before each operation above, the following pre-operation is always done: 1255 if (ret_data_ptr){ *ret_data_ptr = *address0; } 1256

5. Note: Provided that the atomicity requirements are met, an SDXI implementation is permitted to write 1257 back the unchanged content of address0 when the condition is false. 1258

1259

SNIA SDXI Specification Working Draft 79 Version 0.9.0 rev 1

6.5 IntrGrp Operation Group 1260

IntrGrp DS_INTERRUPT Operation 1261

The IntrGrp’s Interrupt operation is used to generate an interrupt. The interrupt number and address space 1262 are specified in the referenced AKey. 1263

Figure 6-18 DS_INTERRUPT Descriptor Format 1264

1265

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

0 * * 1DWORD01DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD15

akey rsvd (r)

rsvd (r)

0 0 0 completion_ptr [63:00]

0000 0x00 0x004 00000

rsvd (r)

< - - opcode - - > subtype type

80 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

<!-- Table=Desc --!> 1266 Table 6-19 DS_INTERRUPT Descriptor Format 1267

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid =1 (VL)’ 000 See header description. DW00[01] ‘sequential = * (SE)’ 001 See header description. DW00[02] ‘fence = * (FE)’ 002 See header description. DW00[03] ‘chain=0 (CH)’ 003 Shall be set to 0. DW00[07:04] ‘= 0000’ 007:004 Shall be set to 0. DW00[15:08] ‘subtype=0x00’ 015:008 See header description. DW00[26:16] ‘type=0x004’ 026:016 See header description. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW02:01 ‘rsvd (r)’ 095:032 Shall be set to 0

DW03[15:00] ‘akey’ 107:096 AKey used for this operation

DW03[31:16] ‘rsvd (r)’ 127:108 Shall be set to 0

DW13:04 ‘rsvd (r)’ 447:128 Shall be set to 0

DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1 DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1268

SNIA SDXI Specification Working Draft 81 Version 0.9.0 rev 1

6.6 Vendor Defined Operation Group 1269

The desired expansion model for new SDXI function operations is to create a new architectural operation 1270 group within this specification with architected operations. This prevents fragmentation of the architecture. 1271 But there may be rare circumstances where an SDXI function vendor has a need to create a non-1272 standard, vendor-specific operation. Such an operation shall use the format of the DS_VENDOR 1273 descriptor. Using a standard format allows leveraging of the standard software ecosystem for SDXI. 1274

Figure 6-19 DS_VENDOR Descriptor Format 1275

1276 <!-- Table=Desc --!> 1277

Table 6-20 DS_VENDOR Descriptor Format 1278

Location Field Bit Range Description DW00[31:00] ‘opcode:' 031:000 Opcode

DW00[00] ‘valid =1 (VL)’ 000 See header description. DW00[01] ‘sequential = * (SE)’ 001 See header description. DW00[02] ‘fence = * (FE)’ 002 See header description. DW00[03] ‘chain = * (CH)’ 003 See header description. DW00[07:04] ‘= 0000’ 007:004 Shall be set to 0 DW00[15:08] ‘subtype’ 015:008 Vendor-Defined. DW00[26:16] ‘type’ 026:016 Vendor-Defined but must be between 0x7F8 – 0x7FF. DW00[31:27] ‘= 00000’ 031:027 Shall be set to 0

DW01[15:00] ‘vendor_id’ 047:032 PCI vendor ID for the vendor that defined the descriptor encoding.

DW01[31:16] ‘vendor [015:000]’ 063:048 These bits are defined by the vendor based on the VendorID and SubType fields

DW13:02 ‘vendor [399:016]’ 447:064 These bits are defined by the vendor based on the VendorID and SubType fields

DW15:14 ‘completion_ptr [63:00]^’ 511:448 See footer description, Section 6.1.1 DW14[00] ‘no_pointer (NP)’ 448 See footer description, Section 6.1.1 DW14[03:01] ‘0 0 0~’ 451:449 Shall be set to 0

<!-- Table=End --!> 1279

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

DWORD00 CH FE SE VL

* * * 1DWORD01DWORD02DWORD03DWORD04DWORD05DWORD06DWORD07DWORD08DWORD09DWORD10DWORD11DWORD12DWORD13DWORD14 NP

DWORD15

vendor [399:016]

0 0 0 completion_ptr [63:00]

0000 00000vendor_id vendor [015:000]

< - - opcode - - > subtype type

82 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

7 Function Initialization and Context Management 1280

7.1 Function Initialization 1281

In order to initialize an SDXI function, the following basic steps shall be taken: 1282

1. Read the SDXI MMIO Capability registers to discover the supported SDXI features 1283

2. Write the appropriate SDXI MMIO Capability registers to enable supported SDXI features 1284

3. Read the SDXI MMIO Capabilities register to discover the feature set exposed to driver software 1285

4. Initialize the Error Log. Refer to the Error Logging section for more details 1286

5. Allocate an aligned 4KB region of memory for the Context Level 2 Table 1287

6. Initialize all Context Level 2 Table entries to 0 1288

7. Program the Context Table Base register to point to the Context Level 2 Table location in memory. 1289

• If guest virtual addressing is required, set Control0.FunctionPASID and 1290 Control0.FunctionPASIDValid to appropriate values 1291

8. If present, initialize the mailbox registers 1292

9. Set MmioControl0.Enable to 11b 1293

After basic initialization, Context Level 1 Table structures may be added. Subsequently, Contexts may be 1294 created to host descriptor rings. It is recommended that an Administrative Context be the first context 1295 created within an SDXI function. 1296

Software may also need to configure additional PCIe standards-based features such as MSI/MSI-X, ATS, 1297 PRI, or TPH based on the desired use cases. 1298

7.2 Context Management 1299

Caching of Memory Data Structures 1300

The function may cache various memory data structures using private caches that are not coherent with 1301 respect to processor caches. Data in both the valid and invalid states may be cached in this manner. 1302 Software must use the appropriate AdminGrp.Update and AdminGrp.Sync commands to notify the 1303 function when memory data structures have changed. 1304

The function may cache any of the following data structures: 1305

• Context level 1 table entry 1306

• Context level 2 table entry 1307

• Context control entry 1308

• AKey table entry 1309

• Valid descriptors located between the read and write pointer of the associated descriptor ring 1310

• Context Status Entry of a running context 1311

1312

SNIA SDXI Specification Working Draft 83 Version 0.9.0 rev 1

Modify or Delete a Context Level 2 Table Entry 1313

In order to modify or delete an existing Context Level 2 Table entry, the following steps may be followed: 1314

1. Atomically modify the Context Level 2 Table Entry data in memory 1315

2. Issue an Update Context operation targeting the Context Number to an appropriate Administrative 1316 Context with the following settings 1317

• L2=1 1318 • L1=1 1319 • CT=1 1320 • context_mask=0x007F 1321 • context_number={Context Level 2 Table Entry offset, 0000000b} 1322

3. Issue a Sync operation into the same descriptor ring as the prior Update operation 1323

4. Wait for the completion signal of the Sync operation 1324

Add Context Level 1 Table 1325

In order to add a Context Level 1 Table, the following steps may be followed: 1326

1. Identify a free Context Level 2 Table Entry 1327

2. Allocate an aligned 4KB region of memory 1328

3. Initialize the memory region to zero 1329

4. Program the Context Level 2 Table Entry to point to the new memory region 1330

5. Set the Context Level 2 Table entry Valid bit to 1 1331

6. Issue an Update Context operation targeting this Context Number into the descriptor ring of an 1332 Administrative Context with the following settings 1333

• L2=1 1334 • L1=1 1335 • CT=1 1336 • context_mask=0x007F 1337 • context_number={Context Level 2 Table Entry offset, 0000000b} 1338

7. Issue a Sync operation into the same descriptor ring as the prior Update operation 1339

8. Wait for the Completion signal of the Sync operation 1340

Modify or Delete a Context Level 1 Table Entry 1341

In order to modify or delete an existing Context Level 1 Table Entry, the following steps may be followed if 1342 the memory data can be atomically modified: 1343

1. Atomically modify the relevant portion of the Context Level 1 Table Entry data in memory 1344

2. Issue an Update Context operation targeting this Context Number into the descriptor ring of an 1345 Administrative Context with the following settings 1346

• L2=0 1347 • L1=1 1348 • CT=1 1349 • context_mask=0x0000 1350 • context_number={Context Level 2 Table Entry offset, Context Level 1 Table Entry offset} 1351

3. Issue a Sync operation into the same descriptor ring as the prior Update operation 1352

84 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

4. Wait for the completion signal of the Sync operation 1353

1354

As the Context Level 1 Table entry is 32 bytes, it may not be possible to atomically modify the entire entry. 1355 In this case, software may attempt to perform multiple modification sequences with the understanding that 1356 the function may see and utilize the intermediate values if the context referenced by the entry is active. 1357

Alternatively, the associated context may be stopped prior to modifying the Context Level 1 Table entry. 1358 An update operation is still required to be performed. 1359

New Context Creation 1360

The context controls the capabilities and features of the associated descriptor ring. In order to create a 1361 new context, software may use the following procedure: 1362

1. Identify a free Context Number 1363

2. Create a new Context Level 1 Table if required 1364

3. Allocate memory for the Context Control Entry 1365

4. Allocate memory for the AKey table, if required 1366

5. Allocate memory for the Descriptor Ring, if required. Depending on the software model, memory for 1367 some data structures may have been allocated by a user process prior to requesting context 1368 creation. 1369

6. Allocate memory for the Context Status Entry, if required 1370

7. Allocate memory for the Write Index, if required 1371

8. Initialize the Context Status Entry including the Read Index 1372

9. Initialize any AKey table entries as appropriate and set the remaining entries to invalid (V=0) 1373

10. Initialize the Context Control Entry 1374

11. Set the Write Index value to zero 1375

12. Program the selected Context Level 1 Entry associated with the Context Number to point to all of 1376 the relevant data structures 1377

13. Issue an Update Context operation targeting the Context Number into the descriptor ring of an 1378 Administrative Context with the following settings 1379

• L2=0 1380 • L1=1 1381 • CT=1 1382 • context_mask=0x0000 1383 • context_number={Context Level 2 Table Entry offset, Context Level 1 Table Entry offset} 1384

14. Issue a Sync operation into the same descriptor ring as the prior Update operation 1385

15. Wait for the completion signal of the Sync operation 1386

16. Provide the context owner access to the doorbell page associated with the context 1387

• DOORBELL_BAR+DBSTRIDE*context_number 1388

1389

SNIA SDXI Specification Working Draft 85 Version 0.9.0 rev 1

Modify a Context Control Entry 1390

Software may modify a Context Control Entry using the following procedure. 1391

1. Modify the Context Control Entry in memory 1392

2. Issue an Update Context operation targeting this Context Number into the descriptor ring of an 1393 Administrative Context with the following settings 1394

• L2=0 1395 • L1=0 1396 • CT=1 1397 • context_mask=0x0000 1398 • context_number={Context Level 2 Table Entry offset, Context Level 1 Table Entry offset} 1399

3. Issue a Sync operation into the same descriptor ring as the prior Update operation 1400

4. Wait for the completion signal of the Sync operation 1401

1402

It may be difficult to atomically modify the context control entry as it is 64 bytes in size. The function may 1403 see either the old or new value of the context control entry from the time it is modified in memory until the 1404 point at which software sees the completion signal get updated for the Sync operation. There are several 1405 approaches that may be considered: 1406

Software may be able to determine a safe sequence of modifications depending on the fields that need to 1407 be modified. Care shall be taken as intermediate values may be processed by the function if the context is 1408 running. 1409

Software may stop the context, modify the context entry and then restart the context. 1410

Another option is to construct a new version of the context entry in an alternate location in memory and 1411 then modify the context level 1 table entry to point to the new location. Software shall issue an Update 1412 operation for the context level 1 table entry followed by a Sync operation and wait for the Sync completion. 1413 If the context is running, the transition to the new context control entry information is asynchronous relative 1414 to the processing of the context’s descriptors and is not guaranteed to occur on a descriptor boundary. 1415

Modify or Delete an AKey Table Entry 1416

Software may modify or delete an AKey table entry by updating the entry in memory and then enqueueing 1417 either an Update Function or Update AKey Table Entry descriptor into the ring of an associated 1418 Administrative Context. 1419

1. Modify the AKey table entry in memory 1420

2. Issue an Update AKey Table Entry operation into the descriptor ring of an Administrative Context 1421 with the following settings 1422

• context_mask=0x0000 1423 • context_number={Context Level 2 Table Entry offset, Context Level 1 Table Entry offset} 1424 • AKeyMask=0x0000 1425 • AKeyNumber= AKey table entry offset 1426

3. Issue a Sync operation into the same descriptor ring as the prior Update operation 1427

4. Wait for the completion of the Sync operation 1428

1429

If the entry is modified in a non-atomic manner, it is possible for an implementation to see and use an 1430 intermediate value. The function may use either the old or new value of the AKey Table Entry from the 1431

86 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

time it is modified in memory until the point at which software sees the completion signal update for the 1432 Sync operation. Software must take proper care to ensure that these possibilities are accommodated. For 1433 example, software may ensure that no descriptors will use the target AKey Table Entry during the 1434 modification process, or software may ensure that all intermediate states are safe. 1435

Start a Context 1436

In order to execute descriptors, a context must meet all of the following conditions: 1437

1. The function must be in the Active state (MmioStatus0.GlobalState==Active) 1438

2. Context Level 2 Table Entry must be readable by the function and valid (CL2TE.V=1) 1439

3. Context Level 1 Table Entry must be readable by the function, valid (CL1TE.V=1) and active 1440 (CL1TE.A=1) 1441

4. Context Control Entry must be readable by the function and valid (CCE.V=1) 1442

5. Context Status Entry must indicate ContextState==Running 1443

6. Write Index memory location must be readable 1444

7. Write Index memory value must be greater than the Read Index value 1445

8. Doorbell write targeting the context must be received with a value greater than Read Index 1446

In order to set ContextState==Running, software may issue a Start Context descriptor in an appropriate 1447 Administrative Context. Software may also manually modify the Context State Entry memory to indicate 1448 ContextState==Running. This is necessary when starting the first context. 1449

A doorbell write may be generated by performing an MMIO write to the appropriate address within the 1450 doorbell BAR. When using a Start Context descriptor, a doorbell access may also be generated from 1451 within the descriptor by setting the DR flag and supplying an appropriate DoorbellValue. 1452

Refer to 4.2 SDXI Context State for more details. 1453

Restore and Resume a Context 1454

Software may load a previously saved context. This may be done in case time sharing of a context is 1455 required, or as part of live migration. There is no requirement to restore a context to the same context 1456 number that was previously used. Context numbers are only referenced by admin operations and error 1457 logs. They are not intended to be exposed to any user processes that may be controlling a context 1458 descriptor ring. 1459

To restore a context, the following procedure may be used: 1460

1. Locate a free context number 1461

2. Add an additional Context Level 1 Table if necessary 1462

3. Load all of the context data structures into memory 1463

4. Modify AKey table entry contents as necessary 1464

5. Program the Context Level 1 Entry associated with the selected context number to point to the data 1465 structures in memory 1466

6. Enqueue a Context Start descriptor into the ring of an appropriate Administrative Context. Set DR=1 1467 and DoorbellValue = all-F if the context was stopped with pending descriptors. 1468

7. Map the appropriate doorbell page associated with the new context number to the context owner 1469

SNIA SDXI Specification Working Draft 87 Version 0.9.0 rev 1

Stop/Save a Context 1470

A context may be stopped through the following procedure: 1471

1. Enqueue a Context Stop descriptor referencing the desired Context Number into the descriptor ring 1472 of an appropriate Administrative Context. 1473

2. Poll either the Context Status Entry’s ContextState field until it indicates Stopped or wait for the 1474 completion signal for the Context Stop operation. 1475

1476

A context may also be stopped through the following alternate procedure: 1477

1. Set Context Level 1 Table Entry to invalid 1478

2. Issue an Update Context operation targeting this Context Number into the descriptor ring of an 1479 Administrative Context with the following settings 1480

• L2=0 1481 • L1=1 1482 • CT=1 1483 • context_mask=0x0000 1484 • context_number={Context Level 2 Table Entry offset, Context Level 1 Table Entry offset} 1485

3. Issue a Sync operation into the same descriptor ring as the prior Update operation 1486

4. Wait for the Completion signal of the Sync operation 1487

1488

Once a context is safely stopped, all of its system memory data structures may be saved by software. 1489

7.3 Error Log Processing 1490

Error Log Initialization 1491

To initialize or re-initialize the error log, the following procedure may be used by software: 1492

• Clear ErrorLogBase.ErrLogEn=0 1493

• Allocate a contiguous memory buffer to hold the error log. This shall be a 4K aligned region of 1494 N*4Kbytes 1495

• Initialize the error log memory to zero 1496

• Program ErrorLogBase.ErrLogBase to point to the start of the error log memory region 1497

• Program ErrorLogBase.ErrLogSize to the appropriate size of the error log 1498

• Set ErrorLogControl.ErrLogInterruptEn=1 if interrupts on errors are desired 1499

• Set ErrorLogBase.ErrLogEn=1 1500

• Clear the bits in ErrorLogStatus as necessary 1501

1502

88 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Error Log Overflow Recovery 1503

On an error log overflow event, software may use the following sequence free up space in the error log 1504 and resume error loging: 1505

• Process one or more error log entries starting from ErrLogRptr 1506

• Advance ErrLogRptr 1507

• Clear ErrorLogStatus.ErrLogOverflow and ErrorLogStatus.ErrLogStatus 1508

7.4 Reset 1509

When an SDXI function is reset, it is possible that there are outstanding DMA requests from the function to 1510 both local and remote address spaces. The resetting function may receive DMA from other SDXI functions 1511 targeting the function’s local address spaces. Those remote functions may continue to issue new DMA 1512 requests attempting to access the resetting function’s local address spaces during reset. The function 1513 must clean up all outstanding DMAs during the reset process. 1514

After reset has completed, the function does not generate DMA requests until it is reinitialized by software. 1515 All remote function DMA targeting local address spaces is aborted. 1516

It is privileged software’s responsibility to notify the remote software contexts that the target function is 1517 being reset. It shall also ensure either orderly or unorderly removal of all associated connections prior to 1518 re-enabling the VF that was reset. 1519

7.5 Mailbox Interface 1520

The PF/VF mailbox interface allows for low-bandwidth two-way communication between privileged 1521 software and each of the software contexts. This may be used directly during runtime to pass control 1522 messages between privileged software and the VMs. It may also be used to bootstrap higher bandwidth 1523 forms of communication such as shared memory buffers, or even memory ring-buffers and interrupts that 1524 are written or generated by SDXI contexts. 1525

It is recommended that under virtualization, the Hypervisor take ownership of all SDXI PFs in order to 1526 facilitate communication between VFs under the same PF, and between SDXI VFs belonging to different 1527 PFs within the same SDXI function group. 1528

7.6 Descriptor Driven Interrupts 1529

Interrupts generated by descriptors do not set any sort of internal MMIO status as may be present in 1530 traditional devices. Any interrupt status is expected to be maintained in memory. 1531

For these interrupts, the function does not explicitly know when interrupt service is complete. These 1532 interrupts must be edge triggered as the function does not know when to de-assert a level-sensitive 1533 interrupt. Additionally, if MSI or MSI-X Pending bits are implemented, the Pending state will not be change 1534 from Set to Clear while the associated vector is masked. 1535

SNIA SDXI Specification Working Draft 89 Version 0.9.0 rev 1

8 SDXI PCI-Express Device Architecture 1536

SDXI is based on the PCI-Express device and software architecture including the use of configuration and 1537 MMIO register spaces and standard PCI-Express capabilities including, but not limited to, PCI Power 1538 Management, MSI/MSI-X interrupts and SR-IOV. 1539

SDXI may be implemented using a range of function types such as PCIe EP or RCiEP. It may be 1540 implemented as either a Physical Function or Virtual Function. 1541

Each SDXI function shall support standard PCI-Express defined reset mechanisms such as fundamental 1542 reset and hot-reset. 1543

If SR-IOV is supported, PFs and VFs shall support Function Level Reset (FLR). 1544

8.1 SDXI Function Configuration Space Registers 1545

Figure 8-1 PCI Config Space 1546

1547

1548

Class Code 1549

SDXI utilizes the following class code: 1550

Base Class = 0x12, Sub Class = 0x01, Programming Interface = 0x0 – SNIA Smart Data Accelerator 1551 Interface (SDXI) controllerSDXI functions are expected to be identified through the SDXI class code. 1552

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

+000h

+004h

+008h

MF +00Ch

+010h

+014h

+018h

+01Ch

+020h

+024h

+028h

+02Ch

+030h

+034h

+038h

+03Ch

SE PE BM MS I/O0x0

SC0x0

WI0x0

PS0x0

SWC0x0rsvdPIRIS

0x0CAP0x1

660x0MDPSTARTARMA rsvdrsvd

⭠Command ⭢

Capabilities Pointer

Bar 0/1 Prefetchable Memory for MMIO Registers

Bar 2/3 Prefetchable Memory for Doorbells

Bar 4 - Reserved 0x0Bar 5 - Reserved 0x0

Cardbus CIS Pointer - 0x0Subsystem ID Subsystem Vendor ID

Expansion ROM Base Address - 0x0

SSEDPE DEVSEL0x0

ID0x0

FBE0x0

FBC0x0

Interrupt Pin - 0x0 Interrupt Line

Vendor IDDevice ID

Revision ID

Cache Line SizeLatency Timer - 0x0BIST

Base Class - 0x12 Sub-Class - 0x1 Prog I/F - 0x0

Header Layout - 0x0

⭠Status ⭢

⭠Class Code ⭢

Max_Lat - 0x0 Min_Gnt - 0x0

Reserved

90 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

BAR Configuration 1553

Table 8-1 BAR Configuration 1554

PCI Bar Index Width Size Type Description

Bar 0/1 64 bit 512KiB to 4MiB for PF. 512KiB for VF

Prefetchable, Memory

MMIO Registers

Bar 2/3 64 bit Variable Prefetchable, Memory

Doorbell BAR

Bar 4 Reserved

Bar 5 Reserved

1555

If MSI-X is supported, the MSI-X table and PBA Offset table may be placed within any of the BARs 1556 including the reserved ones depending on hardware implementation. A block of MMIO space within 1557 BAR0/1 is reserved for use by MSI-X tables. Implementations are not required to use this reserved MMIO 1558 region. 1559

The MMIO register BAR is marked as prefetchable to allow software to allocate the region within 64-bit 1560 MMIO space in order to avoid exhaustion of 32-bit MMIO space when implementing SR-IOV with many 1561 virtual functions. Please refer to the implementation note in the PCI-Express Base Specification which 1562 provides additional guidance on the use of the prefetchable attribute. 1563

1564

SNIA SDXI Specification Working Draft 91 Version 0.9.0 rev 1

Required Capabilities and Extended Capabilities 1565

SDXI functions are required to support the following capabilities: 1566

• PCI Power Management 1567

• MSI and/or MSI-X 1568

Additional PCIe capabilities such as Advanced Error Reporting, Transaction Processing Hints, and Single 1569 Root I/O Virtualization are optional. See section 7.4 for reset requirements. 1570

92 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

9 MMIO Control Registers 1571

MMIO registers are grouped by function with each function separated within a separate 64Kbyte region to 1572 facilitate CPU MMU-based protection and emulation across a range of possible processor architectures. 1573 PFs may implement an MMIO region of between 512KB and 4MB depending on the amount of MMIO 1574 space required to hold mailboxes for the number of supported VFs. VFs each consume 512KB of MMIO 1575 space 1576

1577

The requirements to access an SDXI function's MMIO register are described here. Software should not 1578 depend on the results of any unsupported access (as described in this section) to these MMIO registers. A 1579 misaligned access that straddles MMIO registers is not supported and results in undefined operation of the 1580 SDXI function. A write to an MMIO register must be ordered with respect to other accesses to the same 1581 register; software must use processor-specific means to ensure that the required memory ordering is 1582 enforced. 1583

The SDXI function's MMIO registers are located in PCIe prefetchable memory space. While it is expected 1584 that most platforms will not merge separate writes to a PCIe endpoint, software may be required on some 1585 platforms to work around this when writing SDXI function MMIO registers. Please consult platform-specific 1586 references and the 5.0 PCIe Base Specification -- Implementation Note: "Additional Guidance on the 1587 Prefetchable Bit in Memory Space BARS". 1588

The SDXI function's MMIO register space is segmented logically into two regions for the purposes of 1589 discussing the access model: the Doorbell MMIO registers, and the remaining MMIO registers. 1590

For the Doorbell MMIO registers, only a naturally-aligned 64-bit write is supported. Any other write access 1591 is ignored by the SDXI function. A read access of a Doorbell MMIO register is not supported; the SDXI 1592 function must return a result but the result is architecturally undefined. 1593

For the remaining MMIO registers, the SDXI function supports all naturally-aligned accesses up to 64-bits 1594 as atomic with one exception: if an implementation can not support 64-bit atomic accesses it will report 1595 MmioCapabilities1.Mmio64 as '0'. Misaligned reads and writes within an MMIO register are not supported; 1596 the SDXI function is allowed to return or write stale data as determined by the access type. Attempting to 1597 write reserved fields in an MMIO register with an illegal value results in an undefined value for the register. 1598

A "torn" read or write of an MMIO register can result when at least two agents - any of the SDXI function or 1599 one or more software threads -- are concurrently accessing the fields of the register in a non-atomic 1600 manner. The torn access can result in stale data being intermixed with more recent data in an 1601 unpredictable way. Software is solely responsible for avoiding torn reads and writes on accesses that 1602 straddle fields within an MMIO register. Software shall solely ensure synchronized access among multiple 1603 software threads; the method for which is beyond the scope of this specification. 1604

In an environment with synchronized software access, the following list describes methods a single 1605 software thread can use to avoid undefined operation and torn accesses with respect to the SDXI function 1606 itself; SDXI implementations shall ensure that these methods are supported. 1607

• If the register is not dynamically updated by the SDXI function, then any naturally-aligned read will 1608 avoid returning torn data. 1609

• When the SDXI function's MmioControl0.GlobalState is "Stopped", any naturally-aligned read or write 1610 will avoid torn data. 1611

• This is the only supported method for MmioContextTblBase (MMIO Offset: 0x1_0000) 1612

• If possible, use a supported naturally-aligned atomic read or write that maps solely to complete fields 1613 in the MMIO register. 1614

SNIA SDXI Specification Working Draft 93 Version 0.9.0 rev 1

If the MMIO register has an associated valid/enable bit (in some cases this is in another MMIO register), then disable or 1615 invalidate the register, read the enable/valid bit back to confirm it has changed value, make modifications, and then 1616

enable or validate the register.Table 9-1 MMIO Control Registers 1617

9.1 General Control and Status Registers 1618

Table 9-2 MmioControl0 (MMIO Offset :0x0_0000) 1619

Bits Field Type Reset Value Description

1:0 Enable RW 0x0 Controls the SDXI function state. See 4.1 SDXI Function State for more detail.

2 FunctionPASIDValid RW 0x0 Indicates whether the FunctionPASID field is used when accessing context level 2 or 1 table entries, Context Control Entries, AKey table entries, and error log entries. If FunctionPASIDValid=0, requests to the referenced data structures are accessed using non-PASID DMA requests.

3 Reserved R 0x0 4 FunctionErrIntEn RW 0x0 1=An interrupt is signaled when hardware transitions the

function to the Stopped-On-Error state as reported through MmioStatus0.GlobalState. When MSI or MSI-X is enabled, message 0 is signaled. 0=No interrupt is generated

7:5 Reserved R 0x0 27:8 FunctionPASID RW 0x0 Function PASID value. 31:28 Reserved R 0x0 39:32 GroupID RW 0x0 In a multi-function implementation of SDXI, all functions

connected within an SDXI function group reflect the same GroupID value written into any group member’s copy of this register. Software may identify all functions within a function group by writing a non-zero value into the GroupID field of one function and scanning for the same value in other SDXI functions. The GroupID register is only used to identify functions within the same SDXI function group. It is not referenced by other SDXI registers or data structures. See 2.2.3 SDXI Function Groups for more details.

64KB Region PF Usage VF Usage

0x00_0000 General Control and Status Registers

0x01_0000 Context Table Registers

0x02_0000 Error Logging Control and Status Registers

0x03_0000 Mailbox Control Registers

0x04_0000 Reserved for MSI-X

0x05_0000 Reserved

0x06_0000 to 0x25_FFFF Alternating 64K regions of Send and Receive Mailboxes

Send Mailbox at 0x6_0000. Receive Mailbox at 0x7_0000.

0x26_0000 to 0x3F_FFFF Reserved N/A

94 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Bits Field Type Reset Value Description

63:40 Reserved R 0x0 1620

Table 9-3 MmioControl1 (MMIO Offset: 0x0_0008) 1621

Bits Field Type Reset Value Description

63:1 Reserved R 0x0

0 AllContextDB W 0x0 Writing this field to 1 is equivalent to generating a doorbell with a value of 0xFFFF_FFFF_FFFF_FFFF to all contexts within this function. This field always reads back as 0.

1622

Table 9-4 MmioControl2 (MMIO Offset: 0x0_0010) 1623

Bits Field Type Reset Value Description

3:0 MaxBufferCntl RW MaxBuffer This field controls the maximum data buffer size enabled for use by this function. The field is encoded the same as MmioCapabilities1.MaxBuffer and must not exceed it in value.

11:5 Reserved R 0x0

7:5 Reserved R 0x0

11:8 Reserved R 0x0

15:12 MaxAKTSizeCntl RW MaxAKTSize This field controls the maximum size of any AKey table useable by any context within this function. The field is encoded the same asMmioCapabilities1.MaxAKTSize and must not exceed it in value.

31:16 NContextsCntl RW NContexts This field controls the maximum number of contexts enabled for use by this function. The field is encoded the sameMmioCapabilities1.NContexts and must not exceed it in value.

63:32 CmdGrp0En RW 0x0 Each bit in this register enables the use of the corresponding operation groups indicated by MmioCapabilities1.CmdGrp0Support. Software shall not set bits in this field whose corresponding bits inMmioCapabilities1.CmdGrp0Support are clear.

SNIA SDXI Specification Working Draft 95 Version 0.9.0 rev 1

Table 9-5 MmioStatus0 (MMIO Offset: 0x0_0100) 1624

Bits Field Type Reset Value Description

2:0 GlobalState R 0x0 This field describes the overall state of the SDXI function. This register does not indicate the state of any specific context 000b – Stopped 001b – Initializing 010b – Active 011b – Soft Stop Request 100b – Hard Stop Request 101b – Stopped-On-Error All other encodings reserved See 4.1 SDXI Function State for more detail.

63:3 Reserved R 0x0

1625

Table 9-6 MmioCapabilities0(MMIO Offset: 0x0_0200) 1626

Bits Field Type Reset Value Description

15:0 SFuncHandle R An opaque identifier that maps to the function’s PCIe Requester ID. Referenced by AKey table entries. SFuncHandle is unique for each SDXI function with an SDXI Function Group. This field may be set to 0 for implementations with MmioCapabilities1.RKeySupport=0.

16 VF R 1=Indicates a virtual function. Software should use the virtual function programming model including MMIO register layout and mailbox programming. 0=Indicates a physical function. Software should use the physical function programming model including MMIO register layout and mailbox programming.

19:17 Reserved R 0x0

22:20 DBStride R Impl Indicates the address stride between doorbell pages. The stride is encoded as 2(DBStride+12).

23 Reserved R 0x0

28:24 MaxDescRingSize R Impl Indicates the maximum size of any descriptor ring for any context within this function. The maximum descriptor ring size that is supported is 2(MaxDescRingSize + 16) bytes. Valid MaxDescRingSize values range from 0 (64 KB or 1K descriptors) to 22 (256 GB or 4G descriptors); all other encodings are reserved. Note that the CCE.DescRingSize field has a maximum value of 4G-1 descriptors.

63:29 Reserved R 0x0

1627

96 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

Table 9-7 MmioCapabilities1 (MMIO Offset: 0x0_0208) 1628

Bits Field Type Reset Value Description

3:0 MaxBuffer R Impl Indicates the maximum data buffer size supported by this function for operations. The maximum data buffer size that is supported is 2(MaxBuffer+21) bytes. Valid MaxBuffer values range from 0 (2 MiB) to 11 (4 GiB); all other encodings are reserved.

4 RKeySupport R Impl Indicates whether RKey functionality is supported. RKey support is required to support data transfers between functions in an SDXI group.

5 RM R Impl Memory Registration Required. Reserved for use with the SDXI Connection Operation Group.

6 Mmio64 R Impl When 1, indicates the SDXI Function implements 64-bit atomic access to MMIO registers; When 0, 32-bit is the largest atomic access to MMIO registers. See section 9 MMIO Control Registers for more details. Behaviors associated with this bit do not apply to the doorbell registers.

7 Reserved R

11:8 MaxErrLogSize R Impl Indicates the maximum size of the error log for this function. The maximum error log size that is supported is 2(MaxErrLogSize + 23) bytes. Valid MaxErrLogSize values range from 0 (8 MB) to 9 (4 GB); all other encodings are reserved.

15:12 MaxAKTSize R Impl Indicates the maximum size of any AKey table referenced by any context within this function. The maximum size that is supported is 2(MaxAKTSize+12) bytes. Values of MaxAKTSize greater than 0x8 are reserved.

31:16 NContexts R Impl The contexts supported by this function range from 0 to NContexts. The maximum number of contexts supported by this function is NContexts+1.

63:32 CmdGrp0Support R Impl Each bit in this field is set to 1 to indicate whether a certain group of optional descriptors is supported. Some descriptor types are required to be supported and their presence is implied and not indicated through this register. Bits 1:0 – Reserved 2 – Administrative Operation Group 3 – Atomic Operation Group 4 - Interrupt Operation Group 5 – Connection Operation Group 31:6 – Reserved

1629

SNIA SDXI Specification Working Draft 97 Version 0.9.0 rev 1

Table 9-8 MmioCapabilities2 (MMIO Offset: 0x0_020C) 1630

Bits Field Type Reset Value Description

7:0 MinVer R Impl Indicates Minor Version number of SDXI function implementation. 0x0 for the version of SDXI described in this document.

15:8 Reserved R 0x0 Reserved.

23:16 MajVer R Impl Indicates Major Version number of SDXI function implementation. 0x1 for the version of SDXI described in this document.

31-24 Reserved R 0 Reserved.

1631

9.2 Context and RKey Table Registers 1632

Table 9-9 MmioContextTblBase (MMIO Offset: 0x1_0000) 1633

Bits Field Type Reset Value Description

11:0 Reserved R 0x0

63:12 ContextLvl2TableBasePtr RW 0x0 Pointer to the context level 2 table.

1634

Table 9-10 MmioRKeyTblBase (MMIO Offset: 0x1_0100) 1635

Bits Field Type Reset Value Description

0 RKeyTableEn RW 0x0 When set to 1, RKey functionality is enabled. Also, RKeyTableSize and RKeyTableBasePtr contain valid information.

3:1 RKeyTableSize RW 0x0 Controls the size of the RKey table. The RKey table size is encoded as 2(12+RKeyTableSize).

11:1 Reserved R 0x0

63:12 RKeyTableBasePtr RW 0x0 Pointer to the start of the RKey table.

1636

98 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

9.3 Error Logging Control and Status Registers 1637

Table 9-11 MmioErrorLogCtl (MMIO Offset:0x2_0000) 1638

Bits Field Type Reset Value Description

0 ErrLogInterruptEn RW 0x0 1=An interrupt is signaled when hardware sets ErrLogStatus. When MSI or MSI-X is enabled, message 0 is signaled. 0=No interrupt is generated

63:1 Reserved R 0x0

1639

Table 9-12 MmioErrorLogStatus (MMIO Offset:0x2_0008) 1640

Bits Field Type Reset Value Description

0 ErrLogStatus RW1C 0x0 1=An error has been recorded and the error log.

1 ErrLogOverflow RW1C 0x0 1=The error log has overflowed, and some error information was lost. ErrStatusHalted will be set on an overflow event. Further error logging is disabled until this bit is cleared.

2 Reserved R 0x0

3 ErrLogError RW1C 0x0 1=An error occurred attempting to write the error log.

63:4 Reserved R 0x0

1641

SNIA SDXI Specification Working Draft 99 Version 0.9.0 rev 1

Table 9-13 MmioErrorLogBase (MMIO Offset:0x2_0010) 1642

Bits Field Type Reset Value Description

0 ErrLogEn RW 0x0 Indicates that ErrLogBase and ErrLogSize contains information.

11:1 ErrLogSize RW 0x0 Error log size in 4K pages. 4K * (ErrLogSize+1).

63:12 ErrLogBase RW 0x0 Pointer to the start of the error log.

1643

Table 9-14 MmioErrorLogWptr (MMIO Offset: 0x2_0020) 1644

Bits Field Type Reset Value Description

63:0 ErrLogWptr RW 0x0 Hardware error log write pointer. Hardware increments the write pointer when an error log entry is written. The error log is empty when ErrLogWptr==ErrLogRptr. The write pointer value is expected to never wrap around to 0. The memory buffer used by the error log may wrap around. The write pointer points to (ErrLogBase + ((ErrLogWptr * 64) % ((ErrLogSize+1) * 4096)). Software may write ErrLogWptr only when error logging is disabled either through ErrLogEn or ErrLogOverflow. Writes to this register at any other time result in undefined behavior.

1645

Table 9-15 MmioErrorLogRptr (MMIO Offset: 0x2_0028) 1646

Bits Field Type Reset Value Description

63:0 ErrLogRptr RW 0x0 Error log read pointer. This points to the start of the first error log entry that has not been consumed by software. The read pointer points to (ErrLogBase + ((ErrLogRptr * 64) % ((ErrLogSize+1) * 4096)). Software increments the read pointer as it consumes error log entries. Software shall clear ELV in each processed error log entry prior to incrementing the read pointer.

1647

1648

100 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

9.4 MBOX Mailbox Registers 1649

Mailbox registers are used under SR-IOV virtualization to communicate between a PF and its associated 1650 VFs. There is no mailbox communication directly between VFs. There is no mailbox communication 1651 directly between functions that belong to different SDXI SR-IOV devices even if they belong to the same 1652 SDXI Group. All mailbox registers are reserved in implementations that do not support SR-IOV. An 1653 implementation is only required to implement enough mailbox registers to support the number of VFs it 1654 implements. 1655

Each mailbox is 16 bytes in size. There are separate transmit and receive mailboxes between each VF 1656 and the PF. 1657

1658

Table 9-16 MmioMBoxCtl (MMIO Offset:0x3_0000) 1659

Bits Field Type Reset Value Description

0 SndMsgAckIntrEn RW 0x0 Send message acknowledge interrupt enable. When 1, an interrupt is generated when any bit in SndMsgAckStatus is set to 1. When MSI or MSI-X is enabled, message 0 is signaled.

1 RcvMsgIntrEn RW 0x0 Receive message interrupt enable. When 1, an interrupt is generated when any bit in RcvMsgStatus is set to 1. When MSI or MSI-X is enabled, message 0 is signaled.

63:2 Reserved R 0x0

1660

Table 9-17 MmioSndMBoxCtl (MMIO Offset: 0x3_0010) 1661

Bits Field Type Reset Value Description

0 SndMsg W Writing this bit to 1 causes the target RcvMsgStatus bit to be set in the target function’s mailbox. VF’s always send to the PF mailbox.

15:1 Reserved R 0x0

31:16 TargetFunction W Indicates the target VF number to be used by the SndMsg field of this register. Shall be set to 0 for VF copies of this register.

63:32 Reserved

1662

1663

SNIA SDXI Specification Working Draft 101 Version 0.9.0 rev 1

Table 9-18 MmioSndMBoxStatus (MMIO Offset: 0x3_1000 to 0x3_2FFF) 1664

Bits Field Type Reset Value Description

63:0 SndMsgAckStatus RW1C Message acknowledgment status. Indicates that the last message sent to the corresponding function was acknowledge by the receiver. VFs implements only bit 0 at offset 0x3_1000 representing a message received by the PF. The PF implements as many bits as there are VFs. This bit should be cleared prior to sending a new message to the target function.

1665

Table 9-19 MmioRcvMBoxStatus (MMIO Offset: 0x3_3000 to 0x3_4FFF) 1666

Bits Field Type Reset Value Description

63:0 RcvMsgStatus RW1C 0x0 Receive message status. Indicates that RcvMsgData from the mailbox corresponding to the status bit is valid. VFs implements only bit 0 at offset 0x3_3000 representing a message received from the PF. The PF implements as many bits as there are VFs. Clearing this status bit returns an acknowledgement to the sender.

1667

1668

102 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

9.5 PF Mailbox Data Registers 1669

Table 9-20 Send Mailbox Data Format 1670

Bits Field Type Reset Value Description

127:0 SndMsgData RW 16 bytes of mailbox message data to be sent to a VF.

1671

Table 9-21 Receive Mailbox Data Format 1672

Bits Field Type Reset Value Description

127:0 RcvMsgData R 16 bytes of mailbox message data received from a VF.

1673

Table 9-22 MMIO Send Mailbox Registers 1674

MMIO Addresses MMIO Send Mailbox Range 0x06_0000 to 0x06_FFFF MmioSndMBox[0000:0FFF]

0x08_0000 to 0x08_FFFF MmioSndMBox[1000:1FFF]

0x0A_0000 to 0x0A_FFFF MmioSndMBox[2000:2FFF]

0x0C_0000 to 0x0C_FFFF MmioSndMBox[3000:3FFF]

0x0E_0000 to 0x0E_FFFF MmioSndMBox[4000:4FFF]

0x10_0000 to 0x10_FFFF MmioSndMBox[5000:5FFF]

0x12_0000 to 0x12_FFFF MmioSndMBox[6000:6FFF]

0x14_0000 to 0x14_FFFF MmioSndMBox[7000:7FFF]

0x16_0000 to 0x16_FFFF MmioSndMBox[8000:8FFF]

0x18_0000 to 0x18_FFFF MmioSndMBox[9000:9FFF]

0x1A_0000 to 0x1A_FFFF MmioSndMBox[A000:AFFF]

0x1C_0000 to 0x1C_FFFF MmioSndMBox[B000:BFFF]

0x1E_0000 to 0x1E_FFFF MmioSndMBox[C000:CFFF]

0x20_0000 to 0x20_FFFF MmioSndMBox[D000:DFFF]

0x22_0000 to 0x22_FFFF MmioSndMBox[E000:EFFF]

0x24_0000 to 0x24_FFFF MmioSndMBox[F000:FFFF]

1675

1676

1677

SNIA SDXI Specification Working Draft 103 Version 0.9.0 rev 1

Table 9-23 MMIO Receive Mailbox Registers 1678

MMIO Addresses MMIO Receive Mailbox Range 0x07_0000 to 0x07_FFFF MmioRcvMBox[0000:0FFF]

0x09_0000 to 0x09_FFFF MmioRcvMBox[1000:1FFF]

0x0B_0000 to 0x0B_FFFF MmioRcvMBox[2000:2FFF]

0x0D_0000 to 0x0D_FFFF MmioRcvMBox[3000:3FFF]

0x0F_0000 to 0x0F_FFFF MmioRcvMBox[4000:4FFF]

0x11_0000 to 0x11_FFFF MmioRcvMBox[5000:5FFF]

0x13_0000 to 0x13_FFFF MmioRcvMBox[6000:6FFF]

0x15_0000 to 0x15_FFFF MmioRcvMBox[7000:7FFF]

0x17_0000 to 0x17_FFFF MmioRcvMBox[8000:8FFF]

0x19_0000 to 0x19_FFFF MmioRcvMBox[9000:9FFF]

0x1B_0000 to 0x1B_FFFF MmioRcvMBox[A000:AFFF]

0x1D_0000 to 0x1D_FFFF MmioRcvMBox[B000:BFFF]

0x1F_0000 to 0x1F_FFFF MmioRcvMBox[C000:CFFF]

0x21_0000 to 0x21_FFFF MmioRcvMBox[D000:DFFF]

0x23_0000 to 0x23_FFFF MmioRcvMBox[E000:EFFF]

0x25_0000 to 0x25_FFFF MmioRcvMBox[F000:FFFF]

1679

104 Working Draft SNIA SDXI Specification Version 0.9.0 rev 1

9.6 VF Mailbox Data Registers 1680

Table 9-24 MmioSndMBoxData (MMIO Offset: 0x6_0000) 1681

Bits Field Type Reset Value Description

127:0 SndMsgData RW 16 bytes of mailbox message data to be sent to the PF.

1682

Table 9-25 MmioRcvMBoxData (MMIO Offset: 0x7_0000) 1683

Bits Field Type Reset Value Description

127:0 RcvMsgData R 16 bytes of mailbox message data received from the PF.

1684

SNIA SDXI Specification Working Draft 105 Version 0.9.0 rev 1

9.7 Doorbell BAR 1685

The doorbell BAR is used by software to indicate that new work is available within a particular context. The 1686 BAR is divided into doorbell stride size sections. Each section, numbered from zero, corresponds to the 1687 context of the same number. This spacing is provided to allow the doorbell region for a particular context 1688 to be mapped to a user process with MMU-based page protections. 1689

Currently only the lower 8 bytes within each section are allocated. Software performs an aligned 64b 1690 MMIO write containing the WriteIndex of the context to its associated doorbell address after new 1691 descriptors have been written to memory and after the WriteIndex value referenced by the context control 1692 entry has been updated. All other offsets within the doorbell BAR are reserved. 1693

Doorbell writes containing the value 0xFFFF_FFFF_FFFF_FFFF are special and signal the function that a 1694 context may have descriptors to process, but that the WriteIndex value needs to be fetched from memory. 1695

Reads to the doorbell BAR always return all-zero. The write pointer values cannot be read back through 1696 the doorbell BAR. They can only be read through the context entry. 1697

The size of the doorbell BAR is dependent on the maximum supported <doorbell stride> and the maximum 1698 number of ring contexts supported. 1699