
Page 1: ISCA final presentation - Queuing Model

HSA QUEUING MODEL
HAKAN PERSSON, SENIOR PRINCIPAL ENGINEER, ARM

Page 2: ISCA final presentation - Queuing Model

HSA QUEUEING, MOTIVATION

Page 3: ISCA final presentation - Queuing Model


MOTIVATION (TODAY’S PICTURE)

[Diagram: today's dispatch flow. Application: Transfer buffer to GPU (Copy/Map Memory), Queue Job -> OS: Schedule Job -> GPU: Start Job, Finish Job -> OS: Schedule Application -> Application: Get Buffer (Copy/Map Memory)]

Page 4: ISCA final presentation - Queuing Model

HSA QUEUEING: REQUIREMENTS

Page 5: ISCA final presentation - Queuing Model


REQUIREMENTS

Three key technologies are used to build the user mode queueing mechanism:
- Shared Virtual Memory
- System Coherency
- Signaling

AQL (Architected Queueing Language) enables any agent to enqueue tasks.

Page 6: ISCA final presentation - Queuing Model

SHARED VIRTUAL MEMORY

Page 7: ISCA final presentation - Queuing Model


SHARED VIRTUAL MEMORY (TODAY)

Multiple virtual memory address spaces

[Diagram: CPU0 uses VIRTUAL MEMORY1 (VA1->PA1) while the GPU uses VIRTUAL MEMORY2 (VA2->PA1); the two separate address spaces map into the same PHYSICAL MEMORY]

Page 8: ISCA final presentation - Queuing Model


SHARED VIRTUAL MEMORY (HSA)

Common virtual memory for all HSA agents

[Diagram: CPU0 and the GPU share a single VIRTUAL MEMORY space; the same VA->PA mapping is used by both agents into PHYSICAL MEMORY]

Page 9: ISCA final presentation - Queuing Model


SHARED VIRTUAL MEMORY

Advantages:
- No mapping tricks, no copying back and forth between different PA addresses
- Send pointers (not data) back and forth between HSA agents

Implications:
- Common page tables (and common interpretation of architectural semantics such as shareability, protection, etc.)
- Common mechanisms for address translation (and for servicing address translation faults)
- Concept of a process address space (PASID) to allow multiple, per-process virtual address spaces within the system

Page 10: ISCA final presentation - Queuing Model


SHARED VIRTUAL MEMORY

Specifics:
- Minimum supported VA width is 48 bits for 64-bit systems, and 32 bits for 32-bit systems
- HSA agents may reserve VA ranges for internal use via system software
- All HSA agents other than the host unit must use the lowest privilege level
- If present, read/write access flags for page tables must be maintained by all agents
- Read/write permissions apply to all HSA agents equally

Page 11: ISCA final presentation - Queuing Model


GETTING THERE …

[Diagram: the same Application/OS/GPU dispatch flow as on the motivation slide, repeated after introducing shared virtual memory]

Page 12: ISCA final presentation - Queuing Model

CACHE COHERENCY

Page 13: ISCA final presentation - Queuing Model


CACHE COHERENCY DOMAINS (1/3)

Data accesses to global memory segment from all HSA Agents shall be coherent without the need for explicit cache maintenance.

Page 14: ISCA final presentation - Queuing Model


CACHE COHERENCY DOMAINS (2/3)

Advantages:
- Composability
- Reduced SW complexity when communicating between agents
- Lower barrier to entry when porting software

Implications:
- Hardware coherency support between all HSA agents
- Can take many forms: stand-alone snoop filters / directories, combined L3/filters, snoop-based systems (no filter), etc.

Page 15: ISCA final presentation - Queuing Model


CACHE COHERENCY DOMAINS (3/3)

Specifics:
- No requirement for instruction memory accesses to be coherent
- Only applies to the Primary memory type
- No requirement for HSA agents to maintain coherency to any memory location where the HSA agents do not specify the same memory attributes
- Read-only image data is required to remain static during the execution of an HSA kernel
  - No double mapping (via different attributes) in order to modify it; it must remain static

Page 16: ISCA final presentation - Queuing Model


GETTING CLOSER …

[Diagram: the same Application/OS/GPU dispatch flow, repeated after introducing cache coherency]

Page 17: ISCA final presentation - Queuing Model

SIGNALING

Page 18: ISCA final presentation - Queuing Model


SIGNALING (1/3)

HSA agents support the ability to use signaling objects
- All creation/destruction of signaling objects occurs via HSA runtime APIs
- From an HSA agent you can directly access signaling objects:
  - Signal a signal object (this will wake up HSA agents waiting upon the object)
  - Query the current object value
  - Wait on the current object (various conditions supported)

Page 19: ISCA final presentation - Queuing Model


SIGNALING (2/3)

Advantages:
- Enables asynchronous events between HSA agents, without involving the kernel
- Common idiom for work offload
- Low power waiting

Implications:
- Runtime support required
- Commonly implemented on top of cache coherency flows

Page 20: ISCA final presentation - Queuing Model


SIGNALING (3/3)

Specifics:
- Only supported within a PASID
- Supported wait conditions are =, !=, < and >=
- Wait operations may return sporadically (no guarantee against false positives); the programmer must test (see the sketch below)
- Wait operations have a maximum duration before returning
- The HSAIL atomic operations are supported on signal objects
- Signal objects are opaque; dedicated HSAIL/HSA runtime operations must be used
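Since waits can return spuriously and have a maximum duration, callers re-test the value after every return. A minimal C++ sketch of that idiom, using a plain std::atomic as a stand-in for the signal value (real signal objects are opaque and must be accessed through HSA runtime/HSAIL operations, so everything below is illustrative rather than the runtime API):

#include <atomic>
#include <cstdint>

// Stand-in for the value carried by an opaque HSA signal object.
std::atomic<int64_t> signalValue{1};

// Wait until the signal value equals 'expected', tolerating spurious returns.
int64_t wait_eq(std::atomic<int64_t>& sig, int64_t expected) {
    for (;;) {
        // A real implementation would use a low-power wait primitive with a
        // bounded duration here; a spurious or timed-out return simply falls
        // through to the re-test below.
        int64_t observed = sig.load(std::memory_order_acquire);
        if (observed == expected) {
            return observed;   // condition met: safe to proceed
        }
    }
}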

Page 21: ISCA final presentation - Queuing Model


ALMOST THERE…

[Diagram: the same Application/OS/GPU dispatch flow, repeated after introducing signaling]

Page 22: ISCA final presentation - Queuing Model

USER MODE QUEUING

Page 23: ISCA final presentation - Queuing Model


ONE BLOCK LEFT

[Diagram: the same Application/OS/GPU dispatch flow, with one block left to address before user mode queueing]

Page 24: ISCA final presentation - Queuing Model


USER MODE QUEUEING (1/3)

User mode queueing:
- Enables user space applications to directly enqueue jobs ("Dispatch Packets") for HSA agents, without OS intervention
- Queues are created/destroyed via calls to the HSA runtime
- One (or many) agents enqueue packets, a single agent dequeues packets
- Requires coherency and shared virtual memory

Page 25: ISCA final presentation - Queuing Model


USER MODE QUEUEING (2/3)

Advantages:
- Avoid involving the kernel/driver when dispatching work for an agent
- Lower latency job dispatch enables finer granularity of offload
- Standard memory protection mechanisms may be used to protect communication with the consuming agent

Implications:
- Packet formats/fields are Architected – standard across vendors! Guaranteed backward compatibility
- Packets are enqueued/dequeued via an Architected protocol (all via memory accesses and signaling)
- More on this later…

Page 26: ISCA final presentation - Queuing Model


SUCCESS!

[Diagram: the original Application/OS/GPU dispatch flow, shown for comparison with the HSA flow on the next slide]

Page 27: ISCA final presentation - Queuing Model


SUCCESS!

[Diagram: the HSA dispatch flow: the Application queues the job directly and the GPU starts and finishes it, with no OS involvement]

Page 28: ISCA final presentation - Queuing Model

ARCHITECTED QUEUEING LANGUAGE, QUEUES

Page 29: ISCA final presentation - Queuing Model


ARCHITECTED QUEUEING LANGUAGE

- HSA queues look just like standard shared memory queues, supporting multi-producer, single-consumer
- Single producer variant defined, with some optimizations possible
- Queues consist of storage, read/write indices, ID, etc.
- Queues are created/destroyed via calls to the HSA runtime
- "Packets" are placed in queues directly from user mode, via an architected protocol
- Packet format is architected

[Diagram: producers write packets into storage in coherent, shared memory via the write index; a single consumer removes them via the read index]

Page 30: ISCA final presentation - Queuing Model


ARCHITECTED QUEUING LANGUAGE

- Packets are read and dispatched for execution from the queue in order, but may complete in any order
- There is no guarantee that more than one packet will be processed in parallel at a time
- There may be many queues; a single agent may also consume from several queues
- Any HSA agent may enqueue packets: CPUs, GPUs, other accelerators

Page 31: ISCA final presentation - Queuing Model


QUEUE STRUCTURE

Offset (bytes)  Size (bytes)  Field           Notes
0               4             queueType       Differentiate different queues
4               4             queueFeatures   Indicate supported features
8               8             baseAddress     Pointer to packet array
16              8             doorbellSignal  HSA signaling object handle
24              4             size            Packet array cardinality
28              4             queueId         Unique per process
32              8             serviceQueue    Queue for callback services
intrinsic       8             writeIndex      Packet array write index
intrinsic       8             readIndex       Packet array read index
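For illustration, a hedged C++ rendering of the structure above; the field order, offsets and sizes follow the table, but the concrete types (for example representing the doorbell signal handle and serviceQueue as raw 64-bit values) are assumptions rather than spec-mandated C types:

#include <cstdint>

// Illustrative layout only: offsets match the table above.
struct HsaQueue {
    uint32_t queueType;       // offset 0:  MULTI or SINGLE
    uint32_t queueFeatures;   // offset 4:  DISPATCH / AGENT_DISPATCH capability bits
    uint64_t baseAddress;     // offset 8:  pointer to the packet array
    uint64_t doorbellSignal;  // offset 16: HSA signaling object handle
    uint32_t size;            // offset 24: packet array cardinality (a power of 2)
    uint32_t queueId;         // offset 28: unique per process
    uint64_t serviceQueue;    // offset 32: queue for callback services (or NULL)
    // readIndex and writeIndex are intrinsic queue properties, not visible in
    // this structure; they are accessed via HSA runtime/HSAIL operations.
};

static_assert(sizeof(HsaQueue) == 40, "matches the 40 visible bytes in the table above");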

Page 32: ISCA final presentation - Queuing Model


QUEUE VARIANTS

queueType and queueFeatures together define queue semantics and capabilities

Two queueType values defined, other values reserved:
- MULTI – queue supports multiple producers
- SINGLE – queue supports single producer

queueFeatures is a bitfield indicating capabilities:
- DISPATCH (bit 0): if set then queue supports DISPATCH packets
- AGENT_DISPATCH (bit 1): if set then queue supports AGENT_DISPATCH packets
- All other bits are reserved and must be 0
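The same information as constants; the feature bit positions come from the slide, while the numeric values chosen for the queueType enumerators are placeholders for illustration:

#include <cstdint>

enum HsaQueueType : uint32_t {
    HSA_QUEUE_TYPE_MULTI  = 0,  // placeholder value: multiple producers supported
    HSA_QUEUE_TYPE_SINGLE = 1,  // placeholder value: single producer only
};

// queueFeatures bits (all other bits reserved, must be 0)
constexpr uint32_t HSA_QUEUE_FEATURE_DISPATCH       = 1u << 0;  // DISPATCH packets supported
constexpr uint32_t HSA_QUEUE_FEATURE_AGENT_DISPATCH = 1u << 1;  // AGENT_DISPATCH packets supported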

Page 33: ISCA final presentation - Queuing Model


QUEUE STRUCTURE DETAILS

Queue doorbells are HSA signaling objects with restrictions:
- Created as part of the queue – lifetime tied to the queue object
- Atomic read-modify-write not allowed

size field value must be a power of 2

serviceQueue can be used by an HSA kernel for callback services:
- Provided by the application when the queue is created
- Can be mapped to an HSA runtime provided serviceQueue, an application serviced queue, or NULL if no serviceQueue is required

Page 34: ISCA final presentation - Queuing Model


READ/WRITE INDICES

readIndex and writeIndex properties are part of the queue, but not visible in the queue structure
- Accessed through the HSA runtime API and HSAIL operations
- HSA runtime/HSAIL operations are defined to:
  - Read the readIndex or writeIndex property
  - Write the readIndex or writeIndex property
  - Add a constant to the writeIndex property (returns the previous writeIndex value)
  - CAS on the writeIndex property

readIndex & writeIndex operations are treated as atomic in the memory model
- relaxed, acquire, release and acquire-release variants defined as applicable

readIndex and writeIndex never wrap

PacketID – the index of a particular packet
- Uniquely identifies each packet of a queue
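Because the indices never wrap and the queue size is a power of two, a packetID maps to a slot in the packet array with a simple mask, which is the same arrayIdx computation used in the algorithm slides below:

#include <cstdint>

// Map a monotonically increasing packetID onto a slot in the packet array.
// Valid because packetIDs never wrap and 'size' is a power of two.
inline uint32_t packet_slot(uint64_t packetID, uint32_t size) {
    return static_cast<uint32_t>(packetID & (uint64_t{size} - 1));
}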

Page 35: ISCA final presentation - Queuing Model


PACKET ENQUEUE

Packet enqueue follows a few simple steps:
- Reserve space
  - Multiple packets can be reserved at a time
- Write the packet to the queue
- Mark the packet as valid
  - Producer no longer allowed to modify the packet
  - Consumer is allowed to start processing the packet
- Notify the consumer of the packet through the queue doorbell
  - Multiple packets can be notified at a time
  - Doorbell signal should be signaled with the last packetID notified
  - On the small machine model the lower 32 bits of the packetID are used

Page 36: ISCA final presentation - Queuing Model


PACKET RESERVATION

Two flows envisaged:

- Atomic add on writeIndex with the number of packets to reserve
  - Producer must wait until packetID < readIndex + size before writing to the packet
  - Queue can be sized so that the wait is unlikely (or impossible)
  - Suitable when many threads use one queue

- Check the queue is not full first, then use atomic CAS to update writeIndex (see the sketch below)
  - Can be inefficient if many threads use the same queue
  - Allows a different failure model if the queue is congested
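A hedged sketch of the check-then-CAS flow, written in the style of the "potential multi-producer algorithm" slide below and against the same q pointer used there (hsa_queue_t is an assumed type name). The load-read-index helper appears on that slide; the load-write-index and CAS helpers are assumed names (the text only says such read and CAS operations exist), with the CAS assumed to return the value it observed:

// Try to reserve one packet slot without over-running a full queue.
// Returns the reserved packetID.
uint64_t reserve_packet_cas(hsa_queue_t* q) {
    for (;;) {
        uint64_t wrIdx = hsa_queue_load_write_index_relaxed(q);   // assumed helper
        uint64_t rdIdx = hsa_queue_load_read_index_relaxed(q);
        if (wrIdx >= rdIdx + q->size) {
            // Queue currently full: this is where an implementation could back
            // off or fail instead of spinning (the "different failure model"
            // mentioned above).
            continue;
        }
        // Claim slot wrIdx only if no other producer advanced writeIndex.
        if (hsa_queue_cas_write_index_relaxed(q, wrIdx, wrIdx + 1) == wrIdx) {  // assumed helper
            return wrIdx;   // packetID reserved
        }
    }
}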

Page 37: ISCA final presentation - Queuing Model


QUEUE OPTIMIZATIONS

Queue behavior is loosely defined to allow optimizations

Some potential producer behavior optimizations:
- Keep a local copy of readIndex, update when required
- For single producer queues:
  - Keep a local copy of writeIndex
  - Use a store operation rather than an add/cas atomic to update writeIndex (see the sketch below)

Some potential consumer behavior optimizations:
- Use the packet format field to determine whether a packet has been submitted, rather than the writeIndex property
- Speculatively read multiple packets from the queue
- Do not update readIndex for each packet processed
- Rely on the value used for doorbellSignal to notify new packets
  - Especially useful for single producer queues
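A hedged sketch of the single-producer optimization, mirroring the multi-producer algorithm on the next slide: the producer keeps writeIndex in a private variable and publishes it with a plain store. The store-write-index helper is an assumed name, mirroring the read-index store used by the consumer; ordering of the final steps is one reasonable choice, not mandated by the deck:

// Single-producer enqueue: no atomic add/CAS needed on writeIndex.
uint64_t packetID = localWriteIndex;                 // producer-private copy
uint32_t arrayIdx = packetID & (q->size - 1);

// (as in the general flow, wait here until packetID < readIndex + q->size)

q->baseAddress[arrayIdx] = pkt;                      // format field still INVALID
q->baseAddress[arrayIdx].hdr.format.store(DISPATCH, std::memory_order_release);

localWriteIndex = packetID + 1;
hsa_queue_store_write_index_relaxed(q, localWriteIndex);   // assumed helper
hsa_signal_send_relaxed(q->doorbellSignal, packetID);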

Page 38: ISCA final presentation - Queuing Model


POTENTIAL MULTI-PRODUCER ALGORITHM

// Allocate a packet
uint64_t packetID = hsa_queue_add_write_index_relaxed(q, 1);

// Wait until the queue is no longer full.
uint64_t rdIdx;
do {
    rdIdx = hsa_queue_load_read_index_relaxed(q);
} while (packetID >= (rdIdx + q->size));

// Calculate the array index
uint32_t arrayIdx = packetID & (q->size - 1);

// Copy over the packet; the format field is still INVALID
q->baseAddress[arrayIdx] = pkt;

// Update the format field with release semantics
q->baseAddress[arrayIdx].hdr.format.store(DISPATCH, std::memory_order_release);

// Ring the doorbell; release semantics were already applied by the format
// store above (could also amortize over multiple packets)
hsa_signal_send_relaxed(q->doorbellSignal, packetID);

Page 39: ISCA final presentation - Queuing Model


POTENTIAL CONSUMER ALGORITHM

// Get the location of the next packet
uint64_t readIndex = hsa_queue_load_read_index_relaxed(q);

// Calculate the array index
uint32_t arrayIdx = readIndex & (q->size - 1);

// Spin while empty (could also perform a low-power wait on the doorbell)
while (INVALID == q->baseAddress[arrayIdx].hdr.format) { }

// Copy over the packet
pkt = q->baseAddress[arrayIdx];

// Set the format field to INVALID
q->baseAddress[arrayIdx].hdr.format.store(INVALID, std::memory_order_relaxed);

// Update the readIndex using the HSA intrinsic
hsa_queue_store_read_index_relaxed(q, readIndex + 1);

// Now process <pkt>!

Page 40: ISCA final presentation - Queuing Model

ARCHITECTED QUEUEING LANGUAGE, PACKETS

Page 41: ISCA final presentation - Queuing Model

© Copyright 2014 HSA Foundation. All Rights Reserved

PACKETS

Packets come in three main types with architected layouts

Always reserved & Invalid
- Do not contain any valid tasks and are not processed (the queue will not progress)

Dispatch
- Specifies kernel execution over a grid

Agent Dispatch
- Specifies a single function to perform with a set of parameters

Barrier
- Used for task dependencies

Page 42: ISCA final presentation - Queuing Model


COMMON PACKET HEADER

Start offset 0, format uint16_t, subdivided into the following bit fields:
- format:8 – Contains the packet type (Always reserved, Invalid, Dispatch, Agent Dispatch, and Barrier). Other values are reserved and should not be used.
- barrier:1 – If set then processing of the packet will only begin when all preceding packets are complete.
- acquireFenceScope:2 – Determines the scope and type of the memory fence operation applied before the packet enters the active phase. Must be 0 for Barrier packets.
- releaseFenceScope:2 – Determines the scope and type of the memory fence operation applied after kernel completion but before the packet is completed.
- reserved:3 – Must be 0.
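A hedged C++ rendering of the 16-bit header; C++ bit-field placement is implementation-defined, so this illustrates the field widths rather than a portable wire encoding:

#include <cstdint>

struct PacketHeader {
    uint16_t format            : 8;  // packet type: Always reserved, Invalid, Dispatch, Agent Dispatch, Barrier
    uint16_t barrier           : 1;  // if set, wait for all preceding packets in the queue to complete
    uint16_t acquireFenceScope : 2;  // fence applied before the active phase (must be 0 for Barrier packets)
    uint16_t releaseFenceScope : 2;  // fence applied after completion of execution, before packet completion
    uint16_t reserved          : 3;  // must be 0
};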

Page 43: ISCA final presentation - Queuing Model


DISPATCH PACKET

Start Offset (bytes)  Format    Field Name               Description
0                     uint16_t  header                   Packet header
2                     uint16_t  dimensions:2             Number of dimensions specified in gridSize. Valid values are 1, 2, or 3.
                                reserved:14              Must be 0.
4                     uint16_t  workgroupSize.x          x dimension of work-group (measured in work-items).
6                     uint16_t  workgroupSize.y          y dimension of work-group (measured in work-items).
8                     uint16_t  workgroupSize.z          z dimension of work-group (measured in work-items).
10                    uint16_t  reserved2                Must be 0.
12                    uint32_t  gridSize.x               x dimension of grid (measured in work-items).
16                    uint32_t  gridSize.y               y dimension of grid (measured in work-items).
20                    uint32_t  gridSize.z               z dimension of grid (measured in work-items).
24                    uint32_t  privateSegmentSizeBytes  Total size in bytes of private memory allocation request (per work-item).
28                    uint32_t  groupSegmentSizeBytes    Total size in bytes of group memory allocation request (per work-group).
32                    uint64_t  kernelObjectAddress      Address of an object in memory that includes an implementation-defined executable ISA image for the kernel.
40                    uint64_t  kernargAddress           Address of memory containing kernel arguments.
48                    uint64_t  reserved3                Must be 0.
56                    uint64_t  completionSignal         Address of HSA signaling object used to indicate completion of the job.
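The same table as a hedged C++ struct; naturally aligned fields reproduce the offsets above, and the header is left as a raw uint16_t rather than the bit-field struct to avoid committing to a particular bit layout:

#include <cstdint>

struct DispatchPacket {
    uint16_t header;                   // offset 0
    uint16_t dimensions;               // offset 2:  bits 0-1 = dimensions (1-3), bits 2-15 reserved (0)
    uint16_t workgroupSizeX;           // offset 4
    uint16_t workgroupSizeY;           // offset 6
    uint16_t workgroupSizeZ;           // offset 8
    uint16_t reserved2;                // offset 10: must be 0
    uint32_t gridSizeX;                // offset 12
    uint32_t gridSizeY;                // offset 16
    uint32_t gridSizeZ;                // offset 20
    uint32_t privateSegmentSizeBytes;  // offset 24: private memory per work-item
    uint32_t groupSegmentSizeBytes;    // offset 28: group memory per work-group
    uint64_t kernelObjectAddress;      // offset 32: kernel ISA image object
    uint64_t kernargAddress;           // offset 40: kernel arguments
    uint64_t reserved3;                // offset 48: must be 0
    uint64_t completionSignal;         // offset 56: completion signaling object
};

static_assert(sizeof(DispatchPacket) == 64, "dispatch packets are 64 bytes");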

Page 44: ISCA final presentation - Queuing Model


AGENT DISPATCH PACKET

Start Offset (bytes)  Format    Field Name        Description
0                     uint16_t  header            Packet header
2                     uint16_t  type              The function to be performed by the destination agent. The type value is split into the following ranges: 0x0000:0x3FFF – Vendor specific; 0x4000:0x7FFF – HSA runtime; 0x8000:0xFFFF – User registered function.
4                     uint32_t  reserved2         Must be 0.
8                     uint64_t  returnLocation    Pointer to the location to store the function return value in.
16                    uint64_t  arg[0]            64-bit direct or indirect arguments.
24                    uint64_t  arg[1]
32                    uint64_t  arg[2]
40                    uint64_t  arg[3]
48                    uint64_t  reserved3         Must be 0.
56                    uint64_t  completionSignal  Address of HSA signaling object used to indicate completion of the job.
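And the agent dispatch packet in the same hedged C++ style, again transcribing the offsets above:

#include <cstdint>

struct AgentDispatchPacket {
    uint16_t header;            // offset 0
    uint16_t type;              // offset 2:  vendor (0x0000-0x3FFF), runtime (0x4000-0x7FFF), user (0x8000-0xFFFF)
    uint32_t reserved2;         // offset 4:  must be 0
    uint64_t returnLocation;    // offset 8:  where the function return value is stored
    uint64_t arg[4];            // offsets 16-47: 64-bit direct or indirect arguments
    uint64_t reserved3;         // offset 48: must be 0
    uint64_t completionSignal;  // offset 56: completion signaling object
};

static_assert(sizeof(AgentDispatchPacket) == 64, "agent dispatch packets are 64 bytes");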

Page 45: ISCA final presentation - Queuing Model


BARRIER PACKET

Used for specifying dependences between packets

The HSA agent will not launch any further packets from this queue until the barrier packet signal conditions are met

Used for specifying dependences on packets dispatched from any queue
- The execution phase completes only when all of the dependent signals (up to five) have been signaled (with the value 0)
- Or if an error has occurred in one of the packets upon which we have a dependence

Page 46: ISCA final presentation - Queuing Model


BARRIER PACKET

Start Offset (bytes)  Format    Field Name        Description
0                     uint16_t  header            Packet header, see 2.8.1 Packet header (p. 16).
2                     uint16_t  reserved2         Must be 0.
4                     uint32_t  reserved3         Must be 0.
8                     uint64_t  depSignal0        Addresses of dependent signaling objects to be evaluated by the packet processor.
16                    uint64_t  depSignal1
24                    uint64_t  depSignal2
32                    uint64_t  depSignal3
40                    uint64_t  depSignal4
48                    uint64_t  reserved4         Must be 0.
56                    uint64_t  completionSignal  Address of HSA signaling object used to indicate completion of the job.

Page 47: ISCA final presentation - Queuing Model


DEPENDENCES

A user may never assume more than one packet is being executed by an HSA agent at a time.

Implications:
- Packets can't poll on shared memory values which will be set by packets issued from other queues, unless the user has ensured the proper ordering
- To ensure all previous packets from a queue have been completed, use the barrier bit
- To ensure specific packets from any queue have completed, use the Barrier packet

Page 48: ISCA final presentation - Queuing Model

HSA QUEUEING, PACKET EXECUTION

Page 49: ISCA final presentation - Queuing Model


PACKET EXECUTION

Launch phase
- Initiated when launch conditions are met:
  - All preceding packets in the queue must have exited the launch phase
  - If the barrier bit in the packet header is set, then all preceding packets in the queue must have exited the completion phase
- Includes the memory acquire fence

Active phase
- Execute the packet
- Barrier packets remain in the Active phase until their conditions are met

Completion phase
- First step is the memory release fence – make results visible
- The completionSignal field is then signaled with a decrementing atomic

Page 50: ISCA final presentation - Queuing Model


PACKET EXECUTION – BARRIER BIT

[Timeline: Pkt1 and Pkt2 launch and execute, overlapping in time; Pkt3 (barrier=1) launches only after both Pkt1 and Pkt2 have completed, then executes. Pkt3 launches when all packets in the queue have completed.]

Page 51: ISCA final presentation - Queuing Model


PUTTING IT ALL TOGETHER (FFT)

[Diagram: an FFT dataflow over inputs X[0]..X[7], computed by Packets 1-6 in successive stages separated by barriers; time runs left to right]

Page 52: ISCA final presentation - Queuing Model


PUTTING IT ALL TOGETHER
AQL Pseudo Code

// Send the packets to do the first stage.
aql_dispatch(pkt1);
aql_dispatch(pkt2);

// Send the next two packets, setting the barrier bit so we
// know packets 1 & 2 will be complete before 3 and 4 are launched.
aql_dispatch_with_barrier_bit(pkt3);
aql_dispatch(pkt4);

// Same as above (make sure 3 & 4 are done before issuing 5 & 6).
aql_dispatch_with_barrier_bit(pkt5);
aql_dispatch(pkt6);

// This packet will notify us when 5 & 6 are complete.
aql_dispatch_with_barrier_bit(finish_pkt);

Page 53: ISCA final presentation - Queuing Model


PACKET EXECUTION – BARRIER PACKET

[Timeline: Queue Q1 runs task T1; queue Q2 holds a Barrier packet followed by task T2. Signal X is initialized to 1 and referenced by the barrier's depSignal0; T1's completionSignal decrements signal X. The barrier completes when signal X is signalled with 0, and T2 launches once the barrier is complete.]

Page 54: ISCA final presentation - Queuing Model


DEPTH FIRST CHILD TASK EXECUTION

Consider two generations of child tasks:
- Task T submits tasks T.1 & T.2
- Task T.1 submits tasks T.1.1 & T.1.2
- Task T.2 submits tasks T.2.1 & T.2.2

Desired outcome: depth first child task execution
- I.e. T, T.1, T.1.1, T.1.2, T.2, T.2.1, T.2.2
- T is passed a signal (allComplete) to decrement when all tasks are complete (T and its children etc.)

[Diagram: task tree – T has children T.1 and T.2; T.1 has children T.1.1 and T.1.2; T.2 has children T.2.1 and T.2.2]

Page 55: ISCA final presentation - Queuing Model


HOW TO DO THIS WITH HSA QUEUES?

Use a separate user mode queue for each recursion level
- Task T submits to queue Q1
- Tasks T.1 & T.2 submit tasks to queue Q2
- Queues could be passed in as parameters to task T

Depth first requires ordering of T.1, T.2 and their children
- Use an additional signal object (childrenComplete) to track completion of the children of T.1 & T.2
- childrenComplete is set to the number of children (i.e. 2) by each of T.1 & T.2

Page 56: ISCA final presentation - Queuing Model


A PICTURE SAYS MORE THAN 1000 WORDS

[Diagram: the task tree from the previous slide, mapped onto queues. Q1 holds T.1, a T.1 Barrier, T.2 and a T.2 Barrier; the barriers wait on childrenComplete, and allComplete is signalled at the end. Q2 holds T.1.1, T.1.2, T.2.1 and T.2.2.]

Page 57: ISCA final presentation - Queuing Model

SUMMARY


Page 58: ISCA final presentation - Queuing Model


KEY HSA TECHNOLOGIES

HSA combines several mechanisms to enable low overhead task dispatch:
- Shared Virtual Memory
- System Coherency
- Signaling
- AQL

User mode queues – from any compatible agent
Architected packet format
Rich dependency mechanism
Flexible and efficient signaling of completion

Page 59: ISCA final presentation - Queuing Model

QUESTIONS?

© Copyright 2014 HSA Foundation. All Rights Reserved