Introduction to HSA
![Page 1: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/1.jpg)
INTRODUCTION TO HETEROGENEOUS SYSTEM ARCHITECTURE
Presenter: BingRu Wu
![Page 2: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/2.jpg)
Outline
◻ Introduction
◻ Goal
◻ Concept
◻ Memory Model
◻ System Components
![Page 3: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/3.jpg)
Introduction
◻ HSA: Heterogeneous System Architecture
◻ Promising future:
  ◻ ARM processor producers
  ◻ GPU vendors: AMD, Imagination
◻ Fully utilize computation resources
◻ By supporting HSA, our system may connect to a major application base
![Page 4: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/4.jpg)
Goal of HSA
◻ Remove programmability barrier
  ◻ Memory space barrier
  ◻ Access latency among devices
◻ Backward compatible
  ◻ Utilize existing programming models
![Page 5: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/5.jpg)
Concept of HSA
![Page 6: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/6.jpg)
Abstract
◻ Two kinds of compute unit
  ◻ LCU: Latency Compute Unit (e.g. CPU)
  ◻ TCU: Throughput Compute Unit (e.g. GPU)
◻ Merged memory space
![Page 7: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/7.jpg)
Memory Management (1/2)
◻ Shared page table
  ◻ Memory is shared by all devices
  ◻ No more host-to-device copies, or vice versa
  ◻ Supports pointer-based data structures (e.g. linked lists)
◻ Page faulting
  ◻ Virtual memory space for all devices
  ◻ e.g. the GPU can now use memory as if it had the whole memory space
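To illustrate why a shared page table makes pointer-based structures usable from every device, here is a minimal Python sketch. It is a toy simulation, not HSA runtime code: `SharedMemory`, `build_list`, and `device_sum` are hypothetical names, and integer keys stand in for virtual addresses. The point is only that the "device" follows the same pointers the "host" wrote, with no copy or address translation step.

```python
class SharedMemory:
    """Toy stand-in for one virtual address space visible to all devices."""

    def __init__(self):
        self._cells = {}          # address -> (value, next_address)
        self._next_free = 0x1000  # arbitrary starting address

    def alloc_node(self, value, next_addr=None):
        addr = self._next_free
        self._next_free += 16     # pretend each node occupies 16 bytes
        self._cells[addr] = (value, next_addr)
        return addr

    def load(self, addr):
        return self._cells[addr]


def build_list(mem, values):
    """Host (LCU) side: build a linked list and return the head pointer."""
    head = None
    for v in reversed(values):
        head = mem.alloc_node(v, head)
    return head


def device_sum(mem, head):
    """Device (TCU) side: walk the same pointers the host created."""
    total = 0
    addr = head
    while addr is not None:
        value, addr = mem.load(addr)
        total += value
    return total


mem = SharedMemory()
head = build_list(mem, [1, 2, 3, 4])
result = device_sum(mem, head)
print(result)  # prints 10
```

In a copy-based model the list would first have to be flattened or its pointers rewritten for the device's own address space; with one shared space the head pointer alone is enough.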
![Page 8: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/8.jpg)
Memory Management (2/2)
◻ Coherent memory regions
  ◻ The memory is coherent
  ◻ Shared among all devices (CUs)
◻ Unified address space
  ◻ Memory type separated by address
  ◻ Private / local / global memory decided by memory region
  ◻ No special instruction is required
![Page 9: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/9.jpg)
User-Level Command Queue
◻ Queues for communication
  ◻ User to device
  ◻ Device to device
◻ HSA runtime handles the queue
  ◻ Allocation & destruction
  ◻ One queue per application
  ◻ Vendor-dependent implementation
◻ Direct access to devices
  ◻ No OS syscall
  ◻ No task managing
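The user-level queue idea can be sketched as a ring buffer that the application writes into directly and the device drains. The Python below is an illustrative model under stated assumptions, not the HSA runtime API: `UserQueue`, `enqueue`, `device_poll`, and the `doorbell` field are hypothetical names, and the power-of-two size is an assumption borrowed from common ring-buffer practice.

```python
class UserQueue:
    """Toy user-mode ring buffer; names are hypothetical, not the HSA API."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots = [None] * size
        self.write_index = 0  # advanced by the producer (application)
        self.read_index = 0   # advanced by the consumer (device)
        self.doorbell = -1    # last write index announced to the device

    def enqueue(self, packet):
        """Producer side: plain memory writes, no OS call involved."""
        if self.write_index - self.read_index == self.size:
            return False                  # queue is full
        self.slots[self.write_index % self.size] = packet
        self.doorbell = self.write_index  # "ring the doorbell"
        self.write_index += 1
        return True

    def device_poll(self):
        """Consumer side: drain every packet up to the doorbell value."""
        done = []
        while self.read_index <= self.doorbell:
            done.append(self.slots[self.read_index % self.size])
            self.read_index += 1
        return done


q = UserQueue(4)
for name in ["kernel_a", "kernel_b", "kernel_c"]:
    q.enqueue(name)
processed = q.device_poll()
print(processed)  # prints ['kernel_a', 'kernel_b', 'kernel_c']
```

Because both indices live in ordinary memory, submitting work costs a few stores rather than a trip through the operating system, which is the property the slide highlights.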
![Page 10: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/10.jpg)
Hardware Scheduler (1/3)
◻ No real scheduling on TCU (GPU)
  ◻ Task scheduling
  ◻ Task preemption
◻ Current implementation
  ◻ Execute without lock:
    ◻ All threads execute
    ◻ Multiple concurrent tasks produce erroneous results
![Page 11: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/11.jpg)
Hardware Scheduler (2/3)
◻ Current implementation
  ◻ Execute with lock:
    ◻ A code exception may leave the resource locked
    ◻ Long-running tasks prevent others from executing
    ◻ We may fail to finish critical jobs
![Page 12: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/12.jpg)
Hardware Scheduler (3/3)
HSA runtime guarantees:
◻ Bounded execution time
  ◻ Any process ceases within a reasonable time
◻ Fast switch among applications
  ◻ Use hardware to save time
◻ Application-level parallelism
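Why preemption bounds the wait of a critical job can be seen in a small round-robin simulation. This is a software toy written for illustration only: HSA relies on hardware for context switching, and `run_round_robin`, the task names, and the work units are all made up here.

```python
from collections import deque


def run_round_robin(tasks, quantum):
    """Run tasks in time slices of `quantum` units.

    tasks: dict mapping task name -> work units still needed.
    Returns a dict mapping task name -> time at which it finished.
    """
    queue = deque(tasks.items())
    clock = 0
    finished = {}
    while queue:
        name, remaining = queue.popleft()
        ran = min(quantum, remaining)
        clock += ran
        remaining -= ran
        if remaining:
            queue.append((name, remaining))  # preempted and requeued
        else:
            finished[name] = clock
    return finished


finish = run_round_robin({"long_job": 100, "critical_job": 2}, quantum=5)
print(finish)  # prints {'critical_job': 7, 'long_job': 102}
```

With a quantum as large as the long job (that is, no effective preemption), the critical job would instead finish at time 102, after the long job has run to completion.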
![Page 13: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/13.jpg)
HSAIL (1/2)
◻ HSA Intermediate Language
  ◻ The language for TCU
  ◻ Similar to “PTX” code
  ◻ No graphics-specific instructions
  ◻ Further translated to HW ISA (by Finalizer)
◻ The abstract platform is similar to OpenCL
  ◻ Work item (thread)
  ◻ Work group (block)
  ◻ NDRange (grid)
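The relationship between work items, work groups, and the NDRange can be made concrete with the usual index arithmetic. The sketch below assumes a one-dimensional NDRange whose size is a multiple of the work-group size; `work_item_ids` is a hypothetical helper, not part of HSAIL or OpenCL.

```python
def work_item_ids(grid_size, group_size):
    """Enumerate (group_id, local_id, global_id) for a 1-D NDRange."""
    if grid_size % group_size != 0:
        raise ValueError("sketch assumes grid_size is a multiple of group_size")
    ids = []
    for group_id in range(grid_size // group_size):
        for local_id in range(group_size):
            # A work item's global position is its group offset plus
            # its position inside the group.
            global_id = group_id * group_size + local_id
            ids.append((group_id, local_id, global_id))
    return ids


ids = work_item_ids(grid_size=8, group_size=4)
print(ids[5])  # prints (1, 1, 5)
```

Each tuple corresponds to one work item (thread); items sharing a `group_id` form a work group (block), and the full list is the NDRange (grid).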
![Page 14: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/14.jpg)
HSAIL (2/2)
![Page 15: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/15.jpg)
Memory Model
![Page 16: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/16.jpg)
Virtual Memory Address
◻ All types of memory use the same address space
◻ Memory access behavior
  ◻ Not all regions are accessible by all devices
    ◻ OS kernel memory should not be accessible
    ◻ Mapping to a region in the kernel is still possible
  ◻ Accessing an identical address may give different values
    ◻ Work item private memory
    ◻ Work group local memory
    ◻ Accessing another item's / group's memory is not valid
![Page 17: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/17.jpg)
Memory Region
◻ Global
  ◻ The memory shared by all LCUs & TCUs
  ◻ Accessible from any work item / work group
◻ Group
  ◻ The memory shared by all work items in the same group
◻ Private
  ◻ The memory visible only to a single work item
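The visibility rules for global, group, and private memory, and the earlier remark that one address may yield different values, can be modeled by keying each store on who is allowed to see it. This is a toy model with hypothetical names (`SegmentedMemory`, `store`, `load`); real hardware resolves segments by address range rather than by dictionary lookup.

```python
class SegmentedMemory:
    """Toy model of global / group / private visibility."""

    def __init__(self):
        self.global_mem = {}   # address -> value, one copy for everyone
        self.group_mem = {}    # (group_id, address) -> value
        self.private_mem = {}  # (group_id, local_id, address) -> value

    def _locate(self, segment, group_id, local_id, addr):
        if segment == "global":
            return self.global_mem, addr
        if segment == "group":
            return self.group_mem, (group_id, addr)
        if segment == "private":
            return self.private_mem, (group_id, local_id, addr)
        raise ValueError(f"unknown segment: {segment}")

    def store(self, segment, group_id, local_id, addr, value):
        table, key = self._locate(segment, group_id, local_id, addr)
        table[key] = value

    def load(self, segment, group_id, local_id, addr):
        table, key = self._locate(segment, group_id, local_id, addr)
        return table.get(key)  # None models "nothing visible here"


mem = SegmentedMemory()
mem.store("private", 0, 0, 0x10, "item (0,0)")
mem.store("private", 0, 1, 0x10, "item (0,1)")
mem.store("group", 0, 0, 0x10, "group 0")
mem.store("global", 0, 0, 0x10, "everyone")

print(mem.load("private", 0, 1, 0x10))  # prints item (0,1)
print(mem.load("group", 0, 1, 0x10))    # prints group 0
print(mem.load("global", 1, 3, 0x10))   # prints everyone
```

The same address `0x10` returns three different values depending on the segment and on which work item asks, while a work item in group 1 sees nothing of group 0's group memory.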
![Page 18: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/18.jpg)
Memory Region
◻ Kernarg
  ◻ The memory for kernel arguments
  ◻ A kernel is the code fragment we ask a device to run
◻ Readonly
  ◻ Read-only type of global memory
◻ Spill
  ◻ Memory for register spills
◻ Arg
  ◻ Memory for function call arguments
![Page 19: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/19.jpg)
Memory Consistency
◻ LCU
  ◻ LCU maintains its own consistency
  ◻ Shares global memory
◻ Work item
  ◻ Memory operations to the same address by a single work item are in order
  ◻ Memory operations to different addresses may be reordered
  ◻ Other than that, nothing is guaranteed
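The two per-work-item rules above can be expressed as a small checker: an observed order is acceptable if operations on the same address keep their program order, while operations on different addresses may appear in any order. This sketch models only those two rules as stated on the slide; it ignores fences and any other synchronization, and `is_allowed_order` is a hypothetical helper.

```python
def is_allowed_order(program, observed):
    """Check one work item's observed memory-operation order.

    program:  list of (op, address) tuples in program order.
    observed: a permutation of indices into `program`.
    """
    if sorted(observed) != list(range(len(program))):
        raise ValueError("observed must be a permutation of program indices")
    last_seen = {}  # address -> highest program index observed so far
    for idx in observed:
        _, addr = program[idx]
        if last_seen.get(addr, -1) > idx:
            return False  # a same-address operation overtook an earlier one
        last_seen[addr] = idx
    return True


program = [("store", "x"), ("store", "y"), ("load", "x")]
print(is_allowed_order(program, [1, 0, 2]))  # prints True
print(is_allowed_order(program, [2, 0, 1]))  # prints False
```

In the first case only the store to `y` moved, which touches a different address; in the second the load of `x` moved ahead of the earlier store to `x`, which the same-address rule forbids.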
![Page 20: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/20.jpg)
System Components
![Page 21: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/21.jpg)
HSA System
![Page 22: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/22.jpg)
Compilation
◻ Frontend
  ◻ LLVM IR
  ◻ No data dependency
◻ Backend
  ◻ Convert IR to HSAIL
  ◻ Optimization happens here
◻ Binary format
  ◻ ELF format
  ◻ Embedded container for HSAIL (BRIG)
![Page 23: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/23.jpg)
Runtime
◻ HSA runtime
  ◻ Issue tasks to device protocol
◻ Device
  ◻ Convert HSAIL to ISA with Finalizer
![Page 24: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/24.jpg)
HSAIL Program Features
◻ Backward compatible
  ◻ A system without HSA support should still run the executable
◻ Function invocation
  ◻ LCU functions may call LCU ones
  ◻ TCU functions may call TCU ones with Finalizer support
  ◻ LCU to TCU / TCU to LCU calls are supported by using queues
◻ C++ compatible
![Page 25: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/25.jpg)
Conclusion
◻ HSA is an open, standard layer between software and hardware
◻ The cardinal feature of HSA is the unified virtual memory space
◻ HSA does not replace current programming frameworks, and no new language is required
![Page 26: Introduction to HSA](https://reader030.vdocuments.us/reader030/viewer/2022020208/55a9a6411a28abb3518b47d5/html5/thumbnails/26.jpg)