isca final presentation - runtime

HSA RUNTIME YEN-CHING CHUNG, NATIONAL TSING HUA UNIVERSITY

Upload: hsa-foundation

Post on 27-Jun-2015


Page 1: ISCA final presentation - Runtime

HSA RUNTIME

YEN-CHING CHUNG, NATIONAL TSING HUA UNIVERSITY

Page 2: ISCA final presentation - Runtime

OUTLINE
- Introduction
- HSA Core Runtime API (Pre-release 1.0 provisional)
  - Initialization and Shut Down
  - Notifications (Synchronous/Asynchronous)
  - Agent Information
  - Signals and Synchronization (Memory-Based)
  - Queues and Architected Dispatch
- Summary

© Copyright 2014 HSA Foundation. All Rights Reserved

Page 3: ISCA final presentation - Runtime

INTRODUCTION (1)

The HSA core runtime is a thin, user-mode API that provides the interface necessary for the host to launch compute kernels to the available HSA components.

The overall goal of the HSA core runtime design is to provide a high-performance dispatch mechanism that is portable across multiple HSA vendor architectures.

What differentiates the HSA runtime's dispatch mechanism from other language runtimes is that argument setting and kernel launching are architected, that is, defined at the hardware and specification level.

The HSA core runtime API is standard across all HSA vendors, so languages that use the HSA runtime can run on any vendor's platform that supports the API.

The implementation of the HSA runtime may include kernel-level components (required for some hardware components, e.g., AMD Kaveri) or may be entirely user-space (e.g., simulators or CPU implementations).


Page 4: ISCA final presentation - Runtime

INTRODUCTION (2)

[Figure: two software architecture stacks. In both, a Programming Model layer (OpenCL App, Java App, OpenMP App, DSL App) sits over a Language Runtime layer (OpenCL Runtime, Java Runtime, OpenMP Runtime, DSL Runtime). "The software architecture stack without HSA runtime": the language runtimes sit on vendor drivers (Vendor 1 … Vendor m), each driving Component 1 … Component N. "The software architecture stack with HSA runtime": the language runtimes sit on the HSA Runtime and HSA Finalizer of each HSA vendor (HSA Vendor 1 … HSA Vendor m), each driving Component 1 … Component N.]


Page 5: ISCA final presentation - Runtime

INTRODUCTION (3)

[Figure: an OpenCL program flow mapped onto the HSA runtime, with columns OpenCL Runtime, HSA Runtime, and Agent]
Start Program
Platform, Device, and Context Initialization → HSA Runtime Initialization and Topology Discovery
SVM Allocation and Kernel Arguments Setting → HSA Memory Allocation
Build Kernel → HSAIL Finalization and Linking
Command Queue → Enqueue Dispatch Packet
Exit Program: Resource Deallocation → HSA Runtime Close


Page 6: ISCA final presentation - Runtime

INTRODUCTION (4)

- HSA Platform System Architecture Specification support
  - Runtime initialization and shutdown
  - Notifications (synchronous/asynchronous)
  - Agent information
  - Signals and synchronization (memory-based)
  - Queues and architected dispatch
  - Memory management
- HSAIL support
  - Finalization, linking, and debugging
- Image and Sampler support

[Figure: HSA Runtime box listing HSA Runtime Initialization and Topology Discovery, HSA Memory Allocation, HSAIL Finalization and Linking, Enqueue Dispatch Packet, and HSA Runtime Close]


Page 7: ISCA final presentation - Runtime

RUNTIME INITIALIZATION AND SHUTDOWN

Page 8: ISCA final presentation - Runtime

OUTLINE
- Runtime Initialization API: hsa_init
- Runtime Shut Down API: hsa_shut_down
- Examples


Page 9: ISCA final presentation - Runtime

HSA RUNTIME INITIALIZATION

When hsa_init is invoked for the first time in a given process, a runtime instance is created.

A typical runtime instance may contain information about the platform, topology, reference count, queues, signals, etc.

Applications can call the API multiple times, but only a single runtime instance will exist for a given process. Whenever the API is invoked, the reference count is increased by one.


Page 10: ISCA final presentation - Runtime

HSA RUNTIME SHUT DOWN

When hsa_shut_down is invoked, the reference count is decreased by 1.

When the reference count drops below 1, all the resources associated with the runtime instance (queues, signals, topology information, etc.) are considered invalid, and any attempt to reference them in subsequent API calls results in undefined behavior.

The user may call hsa_init to initialize the HSA runtime again. The HSA runtime may release the resources associated with the instance.
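The reference-counting rules for hsa_init and hsa_shut_down can be modeled in a few lines of C. This is a minimal single-threaded sketch, not the runtime's implementation: the names sk_init, sk_shut_down, sk_runtime_t, and the status values are hypothetical, and the fixed two-agent "discovery" is an assumption. A real runtime would also guard the instance with a lock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch (not the real hsa_status_t values). */
typedef enum {
    SK_STATUS_SUCCESS = 0,
    SK_STATUS_ERROR_NOT_INITIALIZED = 1,
    SK_STATUS_ERROR_OUT_OF_RESOURCES = 2
} sk_status_t;

/* Hypothetical per-process runtime instance. */
typedef struct {
    int ref_count;
    size_t num_agents;
    int *agent_ids;              /* stands in for the topology / agent list */
} sk_runtime_t;

static sk_runtime_t *g_runtime = NULL;   /* at most one instance per process */

sk_status_t sk_init(void) {
    if (g_runtime != NULL) {             /* already initialized: only count the call */
        g_runtime->ref_count += 1;
        return SK_STATUS_SUCCESS;
    }
    sk_runtime_t *rt = calloc(1, sizeof *rt);
    if (rt == NULL) return SK_STATUS_ERROR_OUT_OF_RESOURCES;
    rt->num_agents = 2;                  /* assumption: one CPU agent and one GPU agent */
    rt->agent_ids = calloc(rt->num_agents, sizeof *rt->agent_ids);
    if (rt->agent_ids == NULL) {         /* initialization failed: release resources */
        free(rt);
        return SK_STATUS_ERROR_OUT_OF_RESOURCES;
    }
    for (size_t i = 0; i < rt->num_agents; i++) rt->agent_ids[i] = (int)i;
    rt->ref_count = 1;
    g_runtime = rt;
    return SK_STATUS_SUCCESS;
}

sk_status_t sk_shut_down(void) {
    if (g_runtime == NULL) return SK_STATUS_ERROR_NOT_INITIALIZED;
    g_runtime->ref_count -= 1;
    if (g_runtime->ref_count < 1) {      /* last user gone: release everything */
        free(g_runtime->agent_ids);
        free(g_runtime);
        g_runtime = NULL;                /* a later sk_init starts a fresh instance */
    }
    return SK_STATUS_SUCCESS;
}
```

Calling sk_init twice and sk_shut_down once leaves the instance alive with a count of 1; the second sk_shut_down frees it.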


Page 11: ISCA final presentation - Runtime

EXAMPLE – RUNTIME INITIALIZATION (1)

Data structure for runtime instance

If hsa_init is called more than once, increase the ref_count by 1


Page 12: ISCA final presentation - Runtime

EXAMPLE – RUNTIME INITIALIZATION (2)

If hsa_init is called for the first time, allocate resources and set the reference count

Get the number of HSA agents

Initialize agents

Create an empty agent list

If initialization failed, release resources

Create topology table


Page 13: ISCA final presentation - Runtime

EXAMPLE - RUNTIME INSTANCE (1)

Platform Name: Generic (Agent: 2, Memory: 1, Cache: 1, …)

Field                Agent-0    Agent-1
node_id              0          0
id                   0          0
type                 CPU        GPU
vendor               Generic    Generic
name                 Generic    Generic
wavefront_size       0          64
queue_size           200        200
group_memory         0          64
fbarrier_max_count   1          1
is_pic_supported     0          1
…

Memory: node_id 0, id 0, segment_type 111111, address_base 0x0001, size 2048 MB, peak_bandwidth 6553.6 mbps

Cache: node_id 0, id 0, levels 1, associativity 1, cache size 64KB, cache line size 4, is_inclusive 1


Page 14: ISCA final presentation - Runtime

EXAMPLE - RUNTIME INSTANCE (2)

Platform Header File
*base_address = 0x00001
Size = 248
system_timestamp_frequency_mhz = 200
signal_maximum_wait = 1/200
*node_id (no_nodes = 1)
*agent_list (no_agent = 2)
*memory_descriptor_list (no_memory_descriptor = 1)
*cache_descriptor_list (no_cache_descriptor = 1)

Agent-0: node_id = 0, id = 0, agent_type = 1 (CPU), vendor[16] = Generic, name[16] = Generic, wavefront_size = 0, queue_size = 200, group_memory_size_bytes = 0, fbarrier_max_count = 1, is_pic_supported = 0

Agent-1: node_id = 0, id = 0, agent_type = 2 (GPU), vendor[16] = Generic, name[16] = Generic, wavefront_size = 64, queue_size = 200, group_memory_size_bytes = 64, fbarrier_max_count = 1, is_pic_supported = 1, …

Memory: node_id = 0, id = 0, supported_segment_type_mask = 111111, virtual_address_base = 0x0001, size_in_bytes = 2048MB, peak_bandwidth_mbps = 6553.6

Cache: node_id = 0, id = 0, levels = 1, *associativity → 1, *cache_size → 64KB, *cache_line_size → 4, *is_inclusive → 1 (each per-level list is terminated by NULL)
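The agent entries in the runtime instance above could be held in a C struct like the following. The field names are copied from the slide; the field types, the sk_ prefix, and the helper sk_make_agent are assumptions made for illustration, not part of the HSA API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* agent_type values as shown on the slide: 1 = CPU, 2 = GPU. */
typedef enum { SK_AGENT_CPU = 1, SK_AGENT_GPU = 2 } sk_agent_type_t;

/* Field names from the runtime-instance example; types are assumptions. */
typedef struct {
    uint32_t node_id;
    uint32_t id;
    sk_agent_type_t agent_type;
    char vendor[16];
    char name[16];
    uint32_t wavefront_size;          /* 0 for the CPU agent in the example */
    uint32_t queue_size;
    uint32_t group_memory_size_bytes;
    uint32_t fbarrier_max_count;
    int is_pic_supported;
} sk_agent_entry_t;

/* Build an entry with the "Generic" vendor/name used in the example. */
static sk_agent_entry_t sk_make_agent(uint32_t id, sk_agent_type_t type,
                                      uint32_t wavefront_size,
                                      uint32_t group_memory_size_bytes,
                                      int is_pic_supported) {
    sk_agent_entry_t a;
    memset(&a, 0, sizeof a);          /* zeroing guarantees NUL-terminated strings */
    a.node_id = 0;
    a.id = id;
    a.agent_type = type;
    strncpy(a.vendor, "Generic", sizeof a.vendor - 1);
    strncpy(a.name, "Generic", sizeof a.name - 1);
    a.wavefront_size = wavefront_size;
    a.queue_size = 200;
    a.group_memory_size_bytes = group_memory_size_bytes;
    a.fbarrier_max_count = 1;
    a.is_pic_supported = is_pic_supported;
    return a;
}
```

The agent list of the example would then be an array of two such entries, one CPU (wavefront_size 0) and one GPU (wavefront_size 64).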


Page 15: ISCA final presentation - Runtime


EXAMPLE – RUNTIME SHUT DOWN

Decrease ref_count by 1; if it then drops below 1, free the list.

Page 16: ISCA final presentation - Runtime

NOTIFICATIONS (SYNCHRONOUS/ASYNCHRONOUS)

Page 17: ISCA final presentation - Runtime

OUTLINE

- Synchronous Notifications: hsa_status_t, hsa_status_string
- Asynchronous Notifications
- Example


Page 18: ISCA final presentation - Runtime

SYNCHRONOUS NOTIFICATIONS

Notifications (errors, events, etc.) reported by the runtime can be synchronous or asynchronous.

The HSA runtime uses the return values of API functions to pass notifications synchronously.

A status code is defined as an enumeration, hsa_status_t, to capture the return value of any API function that has been executed, except accessors/mutators.

The notification is a status code that indicates success or error. Success is represented by HSA_STATUS_SUCCESS, which is equivalent to zero. An error status is assigned a positive integer, and its identifier starts with the HSA_STATUS_ERROR prefix. The status code can help to determine the cause of an unsuccessful execution.


Page 19: ISCA final presentation - Runtime

STATUS CODE QUERY

Query additional information on a status code

Parameters
  status (input): Status code that the user is seeking more information on
  status_string (output): An ISO/IEC 646 encoded English language string that potentially describes the error status
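A sketch of the synchronous-notification pattern: success is zero, errors are positive, and a status-string query maps a code to an English description. The enumerators and sk_status_string are hypothetical stand-ins, not the real hsa_status_t values or the exact hsa_status_string signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset: success is 0, errors are positive and share an ERROR prefix. */
typedef enum {
    SK_STATUS_SUCCESS = 0,
    SK_STATUS_ERROR_INVALID_ARGUMENT = 1,
    SK_STATUS_ERROR_INVALID_SIGNAL = 2,
    SK_STATUS_ERROR_OUT_OF_RESOURCES = 3
} sk_status_t;

/* Store a pointer to a static English description of 'status' in *status_string. */
sk_status_t sk_status_string(sk_status_t status, const char **status_string) {
    if (status_string == NULL) return SK_STATUS_ERROR_INVALID_ARGUMENT;
    switch (status) {
    case SK_STATUS_SUCCESS:
        *status_string = "The function executed successfully.";
        break;
    case SK_STATUS_ERROR_INVALID_ARGUMENT:
        *status_string = "One of the arguments is invalid.";
        break;
    case SK_STATUS_ERROR_INVALID_SIGNAL:
        *status_string = "The signal handle is invalid.";
        break;
    case SK_STATUS_ERROR_OUT_OF_RESOURCES:
        *status_string = "The runtime could not allocate the required resources.";
        break;
    default:
        return SK_STATUS_ERROR_INVALID_ARGUMENT;   /* unknown status code */
    }
    return SK_STATUS_SUCCESS;
}
```

A caller checks the return value of every call and asks for the string only when the code is not SK_STATUS_SUCCESS.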


Page 20: ISCA final presentation - Runtime

ASYNCHRONOUS NOTIFICATIONS

The runtime passes asynchronous notifications by calling user-defined callbacks. For instance, queues are a common source of asynchronous events because the tasks queued by an application are asynchronously consumed by the packet processor. Callbacks are associated with queues when the queues are created. When the runtime detects an error in a queue, it invokes the callback associated with that queue and passes it an error flag (indicating what happened) and a pointer to the erroneous queue.

The HSA runtime does not implement any default callbacks. Be careful when using blocking functions within a callback: a callback that does not return can leave the runtime in an undefined state.
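The callback mechanism can be sketched as follows, assuming a simplified queue that stores a user callback at creation time. sk_queue_create, sk_runtime_report_error, and the event codes are hypothetical; here the "runtime" side is just a function called directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error flags passed to the callback. */
typedef enum { SK_QUEUE_OK = 0, SK_QUEUE_ERROR_INVALID_PACKET = 1 } sk_queue_event_t;

struct sk_queue;
/* Application-defined callback: receives what happened and which queue it happened on. */
typedef void (*sk_queue_callback_t)(sk_queue_event_t event, struct sk_queue *queue);

typedef struct sk_queue {
    int id;
    sk_queue_callback_t callback;   /* associated at creation; may be NULL */
} sk_queue_t;

/* The callback is associated with the queue when the queue is created. */
void sk_queue_create(sk_queue_t *q, int id, sk_queue_callback_t callback) {
    q->id = id;
    q->callback = callback;
}

/* Invoked by the (simulated) runtime when it detects an error in a queue.
 * There is no default callback: if none was registered, nothing is reported. */
void sk_runtime_report_error(sk_queue_t *q, sk_queue_event_t event) {
    if (q->callback != NULL) q->callback(event, q);
}

/* Example user callback: records the event and returns promptly instead of blocking. */
static sk_queue_event_t last_event = SK_QUEUE_OK;
static int last_queue_id = -1;
static void on_queue_error(sk_queue_event_t event, sk_queue_t *queue) {
    last_event = event;
    last_queue_id = queue->id;
}
```

Keeping the callback short and non-blocking follows the warning above: it only records what happened, and the application reacts outside the callback.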


Page 21: ISCA final presentation - Runtime

EXAMPLE - CALLBACK

Pass the callback function when creating the queue

If the queue is empty, set the event and invoke callback


Page 22: ISCA final presentation - Runtime

AGENT INFORMATION

Page 23: ISCA final presentation - Runtime

OUTLINE

- Agent information: hsa_node_t, hsa_agent_t, hsa_agent_info_t, hsa_component_feature_t
- Agent information manipulation APIs: hsa_iterate_agents, hsa_agent_get_info
- Example


Page 24: ISCA final presentation - Runtime

INTRODUCTION

The runtime exposes a list of agents that are available in the system.

An HSA agent is a hardware component that participates in the HSA memory model. An HSA agent can submit AQL packets for execution. An HSA agent may also be, but is not required to be, an HSA component. A system may include HSA agents that are neither HSA components nor host CPUs.

HSA agents are defined as opaque handles of type hsa_agent_t.

The HSA runtime provides APIs for applications to traverse the list of available agents and query attributes of a particular agent.


Page 25: ISCA final presentation - Runtime

AGENT INFORMATION (1)

Opaque agent handle (hsa_agent_t)

Opaque NUMA node handle (hsa_node_t)
An HSA memory node is a node that delineates a set of system components (host CPUs and HSA components) with "local" access to a set of memory resources attached to the node's memory controller and appropriate HSA-compliant access attributes.


Page 26: ISCA final presentation - Runtime

AGENT INFORMATION (2)

Component features (hsa_component_feature_t)
An HSA component is a hardware or software component that can be a target of the AQL queries and conforms to the memory model of the HSA.

Values
  HSA_COMPONENT_FEATURE_NONE = 0: No component capabilities. The device is an agent, but not a component.
  HSA_COMPONENT_FEATURE_BASIC = 1: The component supports the HSAIL instruction set and all the AQL packet types except Agent dispatch.
  HSA_COMPONENT_FEATURE_ALL = 2: The component supports the HSAIL instruction set and all the AQL packet types.


Page 27: ISCA final presentation - Runtime

AGENT INFORMATION (3)

Agent attributes (hsa_agent_info_t)

Values
  HSA_AGENT_INFO_MAX_GRID_DIM
  HSA_AGENT_INFO_MAX_WORKGROUP_DIM
  HSA_AGENT_INFO_QUEUE_MAX_PACKETS
  HSA_AGENT_INFO_CLOCK
  HSA_AGENT_INFO_CLOCK_FREQUENCY
  HSA_AGENT_INFO_MAX_SIGNAL_WAIT
  HSA_AGENT_INFO_NAME
  HSA_AGENT_INFO_NODE
  HSA_AGENT_INFO_COMPONENT_FEATURES
  HSA_AGENT_INFO_VENDOR_NAME
  HSA_AGENT_INFO_WAVEFRONT_SIZE
  HSA_AGENT_INFO_CACHE_SIZE


Page 28: ISCA final presentation - Runtime

AGENT INFORMATION MANIPULATION (1)

Iterate over the available agents, and invoke an application-defined callback on every iteration

If callback returns a status other than HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and the function returns that status value.

Parameters
  callback (input): Callback to be invoked once per agent
  data (input): Application data that is passed to callback on every iteration. Can be NULL.
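The early-exit traversal rule can be sketched with a callback over a hypothetical two-entry agent table. sk_iterate_agents, the handle-as-index representation, and SK_STATUS_INFO_BREAK are assumptions for illustration; the point is that any non-success return stops the loop and is propagated to the caller.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SK_STATUS_SUCCESS = 0, SK_STATUS_INFO_BREAK = 1 } sk_status_t;
typedef enum { SK_AGENT_CPU = 1, SK_AGENT_GPU = 2 } sk_agent_type_t;
typedef struct { size_t handle; } sk_agent_t;   /* opaque handle; an index in this sketch */

/* Hypothetical agent table standing in for the discovered topology. */
static const sk_agent_type_t g_agent_types[] = { SK_AGENT_CPU, SK_AGENT_GPU };
static const size_t g_num_agents = sizeof g_agent_types / sizeof g_agent_types[0];

/* Invoke callback once per agent; stop and return the first non-success status. */
sk_status_t sk_iterate_agents(sk_status_t (*callback)(sk_agent_t agent, void *data),
                              void *data) {
    for (size_t i = 0; i < g_num_agents; i++) {
        sk_agent_t agent = { i };
        sk_status_t st = callback(agent, data);
        if (st != SK_STATUS_SUCCESS) return st;
    }
    return SK_STATUS_SUCCESS;
}

/* Example callback: remember the first GPU agent and end the traversal. */
static sk_status_t find_gpu(sk_agent_t agent, void *data) {
    if (g_agent_types[agent.handle] == SK_AGENT_GPU) {
        *(sk_agent_t *)data = agent;
        return SK_STATUS_INFO_BREAK;
    }
    return SK_STATUS_SUCCESS;
}
```

Because the break value is not an error, the caller can distinguish "found and stopped early" from "visited every agent" by the returned status.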


Page 29: ISCA final presentation - Runtime

AGENT INFORMATION MANIPULATION (2)

Get the current value of an attribute for a given agent

Parameters
  agent (input): A valid agent
  attribute (input): Attribute to query
  value (output): Pointer to a user-allocated buffer in which to store the value of the attribute. If the buffer passed by the application is not large enough to hold the value of the attribute, the behavior is undefined.


Page 30: ISCA final presentation - Runtime

EXAMPLE - AGENT ATTRIBUTE QUERY

Copy agent attribute information

Get the agent handle of Agent 0
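The attribute query in this example can be sketched as a switch that copies one field into the caller's buffer. sk_agent_get_info, the attribute subset, and the agent table (values taken from the runtime-instance example) are hypothetical; as on the slide, sizing the buffer correctly is the caller's responsibility.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { SK_STATUS_SUCCESS = 0, SK_STATUS_ERROR_INVALID_ARGUMENT = 1 } sk_status_t;

/* Hypothetical subset of agent attributes. */
typedef enum {
    SK_AGENT_INFO_NAME,                /* value buffer: char[16] */
    SK_AGENT_INFO_WAVEFRONT_SIZE,      /* value buffer: uint32_t */
    SK_AGENT_INFO_QUEUE_MAX_PACKETS    /* value buffer: uint32_t */
} sk_agent_info_t;

typedef struct { size_t handle; } sk_agent_t;
typedef struct { char name[16]; uint32_t wavefront_size; uint32_t queue_max_packets; } sk_agent_record_t;

static const sk_agent_record_t g_agents[] = {
    { "Generic", 0, 200 },    /* Agent-0: CPU */
    { "Generic", 64, 200 },   /* Agent-1: GPU */
};

/* Copy the requested attribute into the caller-allocated buffer 'value'. */
sk_status_t sk_agent_get_info(sk_agent_t agent, sk_agent_info_t attribute, void *value) {
    if (agent.handle >= sizeof g_agents / sizeof g_agents[0] || value == NULL)
        return SK_STATUS_ERROR_INVALID_ARGUMENT;
    const sk_agent_record_t *rec = &g_agents[agent.handle];
    switch (attribute) {
    case SK_AGENT_INFO_NAME:
        memcpy(value, rec->name, sizeof rec->name);
        break;
    case SK_AGENT_INFO_WAVEFRONT_SIZE:
        memcpy(value, &rec->wavefront_size, sizeof rec->wavefront_size);
        break;
    case SK_AGENT_INFO_QUEUE_MAX_PACKETS:
        memcpy(value, &rec->queue_max_packets, sizeof rec->queue_max_packets);
        break;
    default:
        return SK_STATUS_ERROR_INVALID_ARGUMENT;
    }
    return SK_STATUS_SUCCESS;
}
```

For example, querying SK_AGENT_INFO_WAVEFRONT_SIZE on the GPU agent into a uint32_t yields 64, matching the example instance.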


Page 31: ISCA final presentation - Runtime

SIGNALS AND SYNCHRONIZATION (MEMORY-BASED)

Page 32: ISCA final presentation - Runtime

OUTLINE
- Signal
- Signal manipulation API: Create/Destroy, Query, Send, Atomic Operations
- Signal wait: Get time out, Signal Condition
- Example


Page 33: ISCA final presentation - Runtime

SIGNAL (1)

HSA agents can communicate with each other by using coherent global memory or by using signals.

A signal is represented by an opaque signal handle.

A signal carries a value, which can be updated or conditionally waited upon via an API call or HSAIL instruction.

The value occupies four or eight bytes depending on the machine model in use.


Page 34: ISCA final presentation - Runtime

SIGNAL (2)

Updating the value of a signal is equivalent to sending the signal.

In addition to the update (store) of signals, the API for sending signals must support other atomic operations with specific memory order semantics:
  Atomic operations: AND, OR, XOR, Add, Subtract, Exchange, and CAS
  Memory order semantics: Release and Relaxed
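The send operations map naturally onto C11 <stdatomic.h> calls, with the memory order selecting release or relaxed semantics. This sketch assumes an 8-byte signal value (large machine model); the sk_signal_* names are hypothetical, and only the release variants of the read-modify-write operations are shown, since the relaxed ones differ only in the memory_order argument.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical signal: one 8-byte atomic value. */
typedef struct { _Atomic int64_t value; } sk_signal_t;

void sk_signal_create(sk_signal_t *s, int64_t initial_value) {
    atomic_init(&s->value, initial_value);
}

/* Load/store: the suffix names the memory order. */
int64_t sk_signal_load_acquire(sk_signal_t *s) { return atomic_load_explicit(&s->value, memory_order_acquire); }
int64_t sk_signal_load_relaxed(sk_signal_t *s) { return atomic_load_explicit(&s->value, memory_order_relaxed); }
void sk_signal_store_release(sk_signal_t *s, int64_t v) { atomic_store_explicit(&s->value, v, memory_order_release); }
void sk_signal_store_relaxed(sk_signal_t *s, int64_t v) { atomic_store_explicit(&s->value, v, memory_order_relaxed); }

/* Read-modify-write "send" operations with release semantics. */
void sk_signal_add_release(sk_signal_t *s, int64_t v)      { atomic_fetch_add_explicit(&s->value, v, memory_order_release); }
void sk_signal_subtract_release(sk_signal_t *s, int64_t v) { atomic_fetch_sub_explicit(&s->value, v, memory_order_release); }
void sk_signal_and_release(sk_signal_t *s, int64_t v)      { atomic_fetch_and_explicit(&s->value, v, memory_order_release); }
void sk_signal_or_release(sk_signal_t *s, int64_t v)       { atomic_fetch_or_explicit(&s->value, v, memory_order_release); }
void sk_signal_xor_release(sk_signal_t *s, int64_t v)      { atomic_fetch_xor_explicit(&s->value, v, memory_order_release); }

/* Exchange: set the value and return the previous one. */
int64_t sk_signal_exchange_release(sk_signal_t *s, int64_t v) {
    return atomic_exchange_explicit(&s->value, v, memory_order_release);
}

/* CAS: store 'desired' only if the value equals 'expected'; return the observed value. */
int64_t sk_signal_cas_release(sk_signal_t *s, int64_t expected, int64_t desired) {
    atomic_compare_exchange_strong_explicit(&s->value, &expected, desired,
                                            memory_order_release, memory_order_relaxed);
    return expected;   /* unchanged on success, overwritten with the observed value on failure */
}
```

Release ordering on a send makes the sender's earlier writes visible to a waiter that reads the signal with acquire semantics; relaxed sends give no such guarantee and suit pure counters.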


Page 35: ISCA final presentation - Runtime

SIGNAL CREATE/DESTROY

Create a signal
Parameters
  initial_value (input): Initial value of the signal.
  signal_handle (output): Signal handle.

Destroy a signal previously created by hsa_signal_create
Parameter
  signal_handle (input): Signal handle.


Page 36: ISCA final presentation - Runtime

SIGNAL LOAD/STORE

Atomically read the current signal value with acquire semantics

Atomically read the current signal value with relaxed semantics

Send and atomically set the value of a signal with release semantics

Send and atomically set the value of a signal with relaxed semantics


Page 37: ISCA final presentation - Runtime

SIGNAL ADD/SUBTRACT

Send and atomically increment the value of a signal by a given amount with release semantics

Send and atomically decrement the value of a signal by a given amount with release semantics

Send and atomically increment the value of a signal by a given amount with relaxed semantics

Send and atomically decrement the value of a signal by a given amount with relaxed semantics


Page 38: ISCA final presentation - Runtime

SIGNAL AND (OR, XOR)/EXCHANGE

Send and atomically perform a bitwise AND operation on the value of a signal and a given value with release semantics

Send and atomically perform a bitwise AND operation on the value of a signal and a given value with relaxed semantics

Send and atomically set the value of a signal and return its previous value with release semantics

Send and atomically set the value of a signal and return its previous value with relaxed semantics


Page 39: ISCA final presentation - Runtime

SIGNAL WAIT (1)

The application may wait on a signal, with a condition specifying the terms of the wait.

Signal wait condition operator
Values
  HSA_EQ: The two operands are equal.
  HSA_NE: The two operands are not equal.
  HSA_LT: The first operand is less than the second operand.
  HSA_GTE: The first operand is greater than or equal to the second operand.


Page 40: ISCA final presentation - Runtime

SIGNAL WAIT (2)

The wait can be done either in the HSA component via an HSAIL wait instruction or via a runtime API defined here. Waiting on a signal returns the current value of the opaque signal object. The wait may have a runtime-defined timeout, which indicates the maximum amount of time that an implementation can spend waiting.

The signal infrastructure allows for multiple senders/waiters on a single signal.

Because a wait reads the signal value, acquire synchronization may be applied to it.


Page 41: ISCA final presentation - Runtime

SIGNAL WAIT (3)

Signal wait

Parameters
  signal_handle (input): A signal handle
  condition (input): Condition used to compare the passed and signal values
  compare_value (input): Value to compare with
  return_value (output): A pointer where the current signal value must be read into


Page 42: ISCA final presentation - Runtime

SIGNAL WAIT (4)

Signal wait with timeout

Parameters
  signal_handle (input): A signal handle
  timeout (input): Maximum wait duration (a value of zero indicates no maximum)
  long_wait (input): Hint indicating that the signal value is not expected to meet the given condition in a short period of time. The HSA runtime may use this hint to optimize the wait implementation.
  condition (input): Condition used to compare the passed and signal values
  compare_value (input): Value to compare with
  return_value (output): A pointer where the current signal value must be read into


Page 43: ISCA final presentation - Runtime

EXAMPLE – SIGNAL WAIT (1)

Timeline of thread_1 and thread_2 sharing one signal:
1. value = 0. thread_1 calls hsa_signal_wait_timeout_acquire(value == 2); thread_1 is blocked.
2. thread_2 calls hsa_signal_add_relaxed (value = value + 3); value = 3.
3. thread_2 calls hsa_signal_subtract_relaxed (value = value - 1); value = 2.
4. The condition is satisfied: the wait returns the signal value, and the execution of thread_1 continues.


Page 44: ISCA final presentation - Runtime

EXAMPLE – SIGNAL WAIT (2)

If signal_handle is invalid, then return the signal-invalid status

Compare tmp->value with compare_value to check whether the condition is satisfied; if the timeout expires (timeout reaches 0), return the signal time-out status

Signal wait condition function

If the condition is satisfied, then return the signal value and status
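The wait logic in this example can be sketched as a polling loop that reads the value with acquire semantics and evaluates one of the four conditions. Assumptions: the timeout is counted in polls rather than time (0 still means no maximum), the long_wait hint is ignored, and sk_signal_wait_acquire and its status codes are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SK_EQ, SK_NE, SK_LT, SK_GTE } sk_signal_condition_t;
typedef enum {
    SK_STATUS_SUCCESS = 0,
    SK_STATUS_ERROR_INVALID_SIGNAL = 1,
    SK_STATUS_INFO_TIMEOUT = 2
} sk_status_t;
typedef struct { _Atomic int64_t value; } sk_signal_t;

/* Signal wait condition function: evaluates "value <cond> compare_value". */
static int sk_condition_met(int64_t value, sk_signal_condition_t cond, int64_t compare_value) {
    switch (cond) {
    case SK_EQ:  return value == compare_value;
    case SK_NE:  return value != compare_value;
    case SK_LT:  return value <  compare_value;
    case SK_GTE: return value >= compare_value;
    }
    return 0;
}

/* Poll until the condition holds or max_polls reads have been made (0 = no maximum).
 * The last value read is always written to *return_value. */
sk_status_t sk_signal_wait_acquire(sk_signal_t *signal, sk_signal_condition_t cond,
                                   int64_t compare_value, uint64_t max_polls,
                                   int64_t *return_value) {
    if (signal == NULL || return_value == NULL) return SK_STATUS_ERROR_INVALID_SIGNAL;
    for (uint64_t polls = 0; ; polls++) {
        int64_t v = atomic_load_explicit(&signal->value, memory_order_acquire);
        *return_value = v;
        if (sk_condition_met(v, cond, compare_value)) return SK_STATUS_SUCCESS;
        if (max_polls != 0 && polls + 1 >= max_polls) return SK_STATUS_INFO_TIMEOUT;
    }
}
```

In the timeline above, thread_1 would call this with SK_EQ and compare_value 2: it keeps polling while the value is 3 and returns once thread_2's subtract brings it to 2.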


Page 45: ISCA final presentation - Runtime

QUEUES AND ARCHITECTED DISPATCH

Page 46: ISCA final presentation - Runtime

OUTLINE

- Queues: Queue Types and Structure, HSA runtime API for Queue Manipulations
- Architected Queuing Language (AQL) Support: Packet type, Packet header
- Examples: Enqueue Packet, Packet Processor


Page 47: ISCA final presentation - Runtime

INTRODUCTION (1)

An HSA-compliant platform supports the allocation of multiple user-level command queues.

A user-level command queue is characterized as runtime-allocated, user-level accessible virtual memory of a certain size, containing packets defined in the Architected Queuing Language (AQL packets).

Queues are allocated by HSA applications through the HSA runtime.

HSA software receives memory-based structures to configure the hardware queues, allowing efficient software management of the hardware queues of the HSA agents.

This queue memory shall be processed by the HSA Packet Processor as a ring buffer.

The queue structure is read-only: writing values directly to it results in undefined behavior. However, HSA agents can directly modify the contents of the buffer pointed to by base_address, or use runtime APIs to access the doorbell signal or the service queue.


Page 48: ISCA final presentation - Runtime

INTRODUCTION (2)

Two queue types, AQL queues and service queues, are supported:
- AQL queue: consumes AQL packets that specify the kernel functions to be executed on the HSA component.
- Service queue: consumes agent dispatch packets that specify runtime-defined or user-registered functions to be executed on the agent (typically, the host CPU).


Page 49: ISCA final presentation - Runtime

INTRODUCTION (3)

[Figure: AQL queue structure]


Page 50: ISCA final presentation - Runtime

INTRODUCTION (4)

In addition to the data held in the queue structure, the queue also defines two properties (readIndex and writeIndex) that define the location of the “head” and “tail” of the queue.

readIndex: The read index is a 64-bit unsigned integer that specifies the packetID of the next AQL packet to be consumed by the packet processor.

writeIndex: The write index is a 64-bit unsigned integer that specifies the packetID of the next AQL packet slot to be allocated.

Neither index is directly exposed to the user, who can only access them by using dedicated HSA core runtime APIs.

The available index functions differ in the index of interest (read or write), the action to be performed (addition, compare and swap, etc.), and the memory consistency model (relaxed, release, etc.).


Page 51: ISCA final presentation - Runtime

INTRODUCTION (5)

The read index is automatically advanced when a packet is read by the packet processor.

When the packet processor observes that:
  the read index matches the write index, the queue can be considered empty;
  the write index is greater than or equal to the sum of the read index and the size of the queue, the queue is full.

The doorbell_signal field of a queue contains a signal that the agent uses to tell the packet processor to process the packets it has written.

The value written to the doorbell signal is the ID of the packet that is ready to be launched.
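The empty and full conditions, and the mapping of monotonically increasing packet IDs onto ring-buffer slots, can be sketched as follows. The queue size is assumed to be a power of two so that the slot is packet_id & (size - 1); the sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Minimal model of an AQL queue's indices (size assumed to be a power of two). */
typedef struct {
    uint64_t size;                 /* number of packet slots */
    _Atomic uint64_t read_index;   /* packet ID of the next packet to consume */
    _Atomic uint64_t write_index;  /* packet ID of the next slot to allocate */
} sk_queue_t;

/* Empty: the read index matches the write index. */
static int sk_queue_is_empty(sk_queue_t *q) {
    return atomic_load_explicit(&q->read_index, memory_order_acquire) ==
           atomic_load_explicit(&q->write_index, memory_order_acquire);
}

/* Full: write index >= read index + queue size. */
static int sk_queue_is_full(sk_queue_t *q) {
    uint64_t r = atomic_load_explicit(&q->read_index, memory_order_acquire);
    uint64_t w = atomic_load_explicit(&q->write_index, memory_order_acquire);
    return w >= r + q->size;
}

/* Packet IDs never wrap in practice (64 bits); the ring-buffer slot does. */
static uint64_t sk_queue_slot(const sk_queue_t *q, uint64_t packet_id) {
    return packet_id & (q->size - 1);
}
```

Keeping the indices as ever-increasing 64-bit packet IDs, rather than slot numbers, is what makes the empty and full tests a simple equality and a subtraction-free comparison.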


Page 52: ISCA final presentation - Runtime

INTRODUCTION (6)

The new task might be consumed by the packet processor even before the doorbell signal has been signaled by the agent. This is because the packet processor might already be processing other packets and observe that new work is available, so it processes the new packets.

In any case, the agent must ring the doorbell for every batch of packets it writes.


Page 53: ISCA final presentation - Runtime

QUEUE CREATE/DESTROY

Create a user mode queue
When a queue is created, the runtime also allocates the packet buffer and the completion signal.
The application should rely only on the returned status code to determine whether the queue is valid.

Destroy a user mode queue
A destroyed queue must not be accessed after being destroyed. When a queue is destroyed, the state of the AQL packets that have not yet been fully processed becomes undefined.


Page 54: ISCA final presentation - Runtime

GET READ/WRITE INDEX

Atomically retrieve the read index of a queue with acquire semantics

Atomically retrieve the write index of a queue with acquire semantics

Atomically retrieve the read index of a queue with relaxed semantics

Atomically retrieve the write index of a queue with relaxed semantics


Page 55: ISCA final presentation - Runtime

SET READ/WRITE INDEX

Atomically set the read index of a queue with release semantics

Atomically set the read index of a queue with relaxed semantics

Atomically set the write index of a queue with release semantics

Atomically set the write index of a queue with relaxed semantics


Page 56: ISCA final presentation - Runtime

COMPARE AND SWAP WRITE INDEX

Atomically compare and set the write index of a queue with acquire/release/relaxed/acquire-release semantics

Parameters
  queue (input): A queue
  expected (input): The expected index value
  val (input): Value to copy to the write index if expected matches the observed write index

Return value: Previous value of the write index


Page 57: ISCA final presentation - Runtime

ADD WRITE INDEX

Atomically increment the write index of a queue by an offset with release/acquire/relaxed/acquire-release semantics

Parameters
  queue (input): A queue
  val (input): The value to add to the write index

Return value: Previous value of the write index
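Two ways a producer might reserve a slot with these index operations, sketched with C11 atomics: an unconditional atomic add, whose returned previous value is the caller's packet ID, and a compare-and-swap loop that refuses to allocate when the queue is full. The function names and queue struct are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    uint64_t size;
    _Atomic uint64_t read_index;
    _Atomic uint64_t write_index;
} sk_queue_t;

/* Add form: the previous write index is the packet ID now owned by the caller.
 * The caller must still wait until that slot is free (id < read_index + size). */
static uint64_t sk_reserve_with_add(sk_queue_t *q) {
    return atomic_fetch_add_explicit(&q->write_index, 1, memory_order_relaxed);
}

/* CAS form: allocate only if the queue is not full.
 * Returns 1 and stores the packet ID in *packet_id on success, 0 if full. */
static int sk_reserve_with_cas(sk_queue_t *q, uint64_t *packet_id) {
    uint64_t w = atomic_load_explicit(&q->write_index, memory_order_relaxed);
    for (;;) {
        uint64_t r = atomic_load_explicit(&q->read_index, memory_order_acquire);
        if (w >= r + q->size) return 0;               /* full: do not allocate */
        /* On failure, w is refreshed with the observed write index and we retry. */
        if (atomic_compare_exchange_weak_explicit(&q->write_index, &w, w + 1,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
            *packet_id = w;
            return 1;
        }
    }
}
```

The add form is a single instruction but can over-allocate past a full queue; the CAS form never does, at the cost of retries when several producers contend.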


Page 58: ISCA final presentation - Runtime

ARCHITECTED QUEUING LANGUAGE (AQL)

An HSA-compliant system provides a command interface for the dispatch of HSA agent commands.

This command interface is provided by the Architected Queuing Language (AQL).

AQL allows HSA agents to build and enqueue their own command packets, enabling fast and low-power dispatch.

AQL also supports HSA component queue submissions: the HSA component kernel can write commands in AQL format.


Page 59: ISCA final presentation - Runtime

AQL PACKET (1)

AQL packet format

Values
  Always reserved packet (0): Packet format is set to always reserved when the queue is initialized.
  Invalid packet (1): Packet format is set to invalid when the readIndex is incremented, making the packet slot available to the HSA agents.
  Dispatch packet (2): Dispatch packets contain jobs for the HSA component and are created by HSA agents.
  Barrier packet (3): Barrier packets can be inserted by HSA agents to delay processing subsequent packets. All queues support barrier packets.
  Agent dispatch packet (4): Agent dispatch packets contain jobs for the HSA agent and are created by HSA agents.


Page 60: ISCA final presentation - Runtime

AQL PACKET (2)

HSA signaling object handle used to indicate completion of the job


Page 61: ISCA final presentation - Runtime

EXAMPLE - ENQUEUE AQL PACKET (1)

An HSA agent submits a task to a queue by performing the following steps:
1. Allocate a packet slot (by incrementing the writeIndex).
2. Initialize the packet and copy it to a queue associated with the Packet Processor.
3. Mark the packet as valid.
4. Notify the Packet Processor of the packet (with the doorbell signal).
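The four steps can be sketched for a single producer as follows. The packet layout is heavily simplified and hypothetical (real AQL packets carry many more fields), as are sk_queue_init and sk_enqueue_dispatch. The key ordering detail is that the format is stored last with release semantics, after the packet body, and the doorbell is rung afterwards with the packet ID.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Packet format values as numbered on the AQL PACKET (1) slide. */
enum { SK_PKT_ALWAYS_RESERVED = 0, SK_PKT_INVALID = 1, SK_PKT_DISPATCH = 2,
       SK_PKT_BARRIER = 3, SK_PKT_AGENT_DISPATCH = 4 };

/* Hypothetical, heavily simplified packet. */
typedef struct {
    _Atomic uint16_t format;   /* written last, with release, to publish the packet */
    uint32_t grid_size;
    uint64_t kernel_object;
} sk_packet_t;

#define SK_QUEUE_SIZE 4u

typedef struct {
    sk_packet_t packets[SK_QUEUE_SIZE];
    _Atomic uint64_t read_index;
    _Atomic uint64_t write_index;
    _Atomic int64_t doorbell;  /* doorbell signal value: ID of the last ready packet */
} sk_queue_t;

static void sk_queue_init(sk_queue_t *q) {
    for (unsigned i = 0; i < SK_QUEUE_SIZE; i++) {
        atomic_init(&q->packets[i].format, SK_PKT_ALWAYS_RESERVED);  /* as at queue init */
        q->packets[i].grid_size = 0;
        q->packets[i].kernel_object = 0;
    }
    atomic_init(&q->read_index, 0);
    atomic_init(&q->write_index, 0);
    atomic_init(&q->doorbell, -1);
}

/* Single-producer sketch. Returns 1 on success, 0 if the queue is full. */
static int sk_enqueue_dispatch(sk_queue_t *q, uint32_t grid_size, uint64_t kernel_object) {
    /* 1. Allocate a packet slot by incrementing the write index. */
    uint64_t id = atomic_load_explicit(&q->write_index, memory_order_relaxed);
    if (id >= atomic_load_explicit(&q->read_index, memory_order_acquire) + SK_QUEUE_SIZE)
        return 0;
    atomic_store_explicit(&q->write_index, id + 1, memory_order_relaxed);
    /* 2. Initialize the packet in its ring-buffer slot. */
    sk_packet_t *pkt = &q->packets[id % SK_QUEUE_SIZE];
    pkt->grid_size = grid_size;
    pkt->kernel_object = kernel_object;
    /* 3. Mark the packet valid; release orders the body writes before the format. */
    atomic_store_explicit(&pkt->format, SK_PKT_DISPATCH, memory_order_release);
    /* 4. Ring the doorbell with the ID of the packet that is ready. */
    atomic_store_explicit(&q->doorbell, (int64_t)id, memory_order_release);
    return 1;
}
```

With several producers, step 1 would need an atomic add or compare-and-swap on the write index instead of the plain load and store used here.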


Page 62: ISCA final presentation - Runtime

EXAMPLE - ENQUEUE AQL PACKET (2)

[Figure: Dispatch Queue with ReadIndex and WriteIndex]

Allocate an AQL packet slot

Initialize packet

Copy the packet into the queue. Note that a lock can be used here to prevent race conditions in a multithreaded environment.

Send doorbell signal


Page 63: ISCA final presentation - Runtime

EXAMPLE - PACKET PROCESSOR

[Figure: Dispatch Queue with ReadIndex and WriteIndex]

Receive doorbell

If there is any packet in the queue, process the packet:

Get packet content

Check if barrier packet

Update readIndex, change packet state to invalid, and send completion signal.


Page 64: ISCA final presentation - Runtime

MEMORY MANAGEMENT

Page 65: ISCA final presentation - Runtime

OUTLINE

Memory registration and deregistration

Memory region and memory segment

APIs for memory region manipulation

APIs for memory registration and deregistration


Page 66: ISCA final presentation - Runtime

INTRODUCTION

One of the key features of HSA is its ability to share global pointers between the host application and code executing on the HSA component. This means that an application can directly pass a pointer to memory allocated on the host to a kernel function dispatched to a component, without an intermediate copy.

When a buffer created in the host is also accessed by a component, programmers are encouraged to register the corresponding address range beforehand.

Registering memory expresses an intention to access (read or write) the passed buffer from a component other than the host. This is a performance hint that allows the runtime implementation to know which buffers will be accessed by some of the components ahead of time.

When an HSA program no longer needs to access a registered buffer in a device, the user should deregister that virtual address range.


Page 67: ISCA final presentation - Runtime

MEMORY REGION/SEGMENT

A memory region represents a virtual memory interval that is visible to a particular agent, and contains properties about how memory is accessed or allocated from that agent.

Memory segments

Values
  HSA_SEGMENT_GLOBAL = 1
  HSA_SEGMENT_PRIVATE = 2
  HSA_SEGMENT_GROUP = 4
  HSA_SEGMENT_KERNARG = 8
  HSA_SEGMENT_READONLY = 16
  HSA_SEGMENT_IMAGE = 32
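Because each segment value is a distinct power of two, a set of supported segments fits in a bitmask. Read as binary, the supported_segment_type_mask = 111111 in the earlier runtime-instance example would then mean all six segments; that reading is an assumption, and the sk_ enumerators and sk_supports_segment are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Segment bit values as listed on the slide. */
typedef enum {
    SK_SEGMENT_GLOBAL   = 1,
    SK_SEGMENT_PRIVATE  = 2,
    SK_SEGMENT_GROUP    = 4,
    SK_SEGMENT_KERNARG  = 8,
    SK_SEGMENT_READONLY = 16,
    SK_SEGMENT_IMAGE    = 32
} sk_segment_t;

/* Each segment is one bit, so membership is a single AND. */
static int sk_supports_segment(uint32_t mask, sk_segment_t segment) {
    return (mask & (uint32_t)segment) != 0;
}
```

A region that supports only the global and group segments would carry the mask SK_SEGMENT_GLOBAL | SK_SEGMENT_GROUP, i.e. 5.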


Page 68: ISCA final presentation - Runtime

MEMORY REGION INFORMATION

Attributes of a memory region

Values
  HSA_REGION_INFO_BASE_ADDRESS
  HSA_REGION_INFO_SIZE
  HSA_REGION_INFO_NODE
  HSA_REGION_INFO_MAX_ALLOCATION_SIZE
  HSA_REGION_INFO_SEGMENT
  HSA_REGION_INFO_BANDWIDTH
  HSA_REGION_INFO_CACHED


Page 69: ISCA final presentation - Runtime

MEMORY REGION MANIPULATION (1)

Get the current value of an attribute of a region

Iterate over the memory regions that are visible to an agent, and invoke an application-defined callback on every iteration

If callback returns a status other than HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and the function returns that status value.


Page 70: ISCA final presentation - Runtime

MEMORY REGION MANIPULATION (2)

Allocate a block of memory

Deallocate a block of memory previously allocated using hsa_memory_allocate

Copy block of memory
Copying a number of bytes larger than the size of the memory regions pointed to by dst or src results in undefined behavior.


Page 71: ISCA final presentation - Runtime

MEMORY REGISTRATION/DEREGISTRATION

Register memory

Parameters
  address (input): A pointer to the base of the memory region to be registered. If a NULL pointer is passed, no operation is performed.
  size (input): Requested registration size in bytes. A size of zero is only allowed if address is NULL.

Deregister memory previously registered using hsa_memory_register

Parameter
  address (input): A pointer to the base of the memory region to be deregistered. If a NULL pointer is passed, no operation is performed.
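The argument rules above (NULL is a no-op, size zero only with NULL) can be sketched with a small hypothetical registry of address ranges. sk_memory_register and sk_memory_deregister are illustrative only; since registration is a performance hint, a real runtime may use the information quite differently.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    SK_STATUS_SUCCESS = 0,
    SK_STATUS_ERROR_INVALID_ARGUMENT = 1,
    SK_STATUS_ERROR_OUT_OF_RESOURCES = 2
} sk_status_t;

/* Hypothetical registry of ranges the application announced as component-accessed. */
#define SK_MAX_REGISTRATIONS 8
static struct { void *base; size_t size; } g_regs[SK_MAX_REGISTRATIONS];
static size_t g_num_regs = 0;

sk_status_t sk_memory_register(void *address, size_t size) {
    if (address == NULL) return SK_STATUS_SUCCESS;           /* NULL: no operation */
    if (size == 0) return SK_STATUS_ERROR_INVALID_ARGUMENT;  /* size 0 only allowed with NULL */
    if (g_num_regs == SK_MAX_REGISTRATIONS) return SK_STATUS_ERROR_OUT_OF_RESOURCES;
    g_regs[g_num_regs].base = address;
    g_regs[g_num_regs].size = size;
    g_num_regs++;
    return SK_STATUS_SUCCESS;
}

sk_status_t sk_memory_deregister(void *address) {
    if (address == NULL) return SK_STATUS_SUCCESS;           /* NULL: no operation */
    for (size_t i = 0; i < g_num_regs; i++) {
        if (g_regs[i].base == address) {
            g_regs[i] = g_regs[g_num_regs - 1];              /* swap-remove the entry */
            g_num_regs--;
            return SK_STATUS_SUCCESS;
        }
    }
    return SK_STATUS_ERROR_INVALID_ARGUMENT;                 /* not previously registered */
}
```

This mirrors the example flow that follows: allocate, register the range, use it from a component, then deregister before freeing.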


Page 72: ISCA final presentation - Runtime

EXAMPLE

Allocate a memory space

Use hsa_region_get_info to get the size in bytes of this memory space

Register this memory space for a performance hint

Finish operation, deregister and free this memory space


Page 73: ISCA final presentation - Runtime

SUMMARY

Page 74: ISCA final presentation - Runtime

SUMMARY

Covered:
- HSA Core Runtime API (Pre-release 1.0 provisional)
  - Runtime Initialization and Shutdown (Open/Close)
  - Notifications (Synchronous/Asynchronous)
  - Agent Information
  - Signals and Synchronization (Memory-Based)
  - Queues and Architected Dispatch
  - Memory Management

Not covered:
- Extension of Core Runtime
  - HSAIL Finalization, Linking, and Debugging
  - Images and Samplers


Page 75: ISCA final presentation - Runtime

QUESTIONS?
