Bitfusion Nimbix Dev Summit Heterogeneous Architectures
TRANSCRIPT
![Page 1: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/1.jpg)
HETEROGENEOUS ARCHITECTURES: A SURVEY AND OVERVIEW FOR DEVELOPERS
MAZHAR MEMON
CTO, BITFUSION.IO
![Page 2: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/2.jpg)
The problem in computing
Delivering performance and efficiency to today’s applications is becoming more difficult
[Figure: "abstract and slow" → versus ← "complex and fast", plotted against Time →]
![Page 3: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/3.jpg)
The software world is increasingly abstract
![Page 4: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/4.jpg)
Transistor scaling is ending
![Page 5: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/5.jpg)
Moore’s law slowing -> complexity
Era of frequency, era of multi-core, era of many-core
![Page 6: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/6.jpg)
The problem in computing
[Figure: "abstract and slow" → versus ← "complex and fast", plotted against Time →, annotated "Help!"]
![Page 7: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/7.jpg)
The solution(s)
• Hardware
  • Specialized hardware required to keep up with accelerated performance curve
  • Encourage accessibility: low hourly pricing
• Software
  • Abstractions: libraries, APIs, tool chain up to compiler IR; use translations where possible
  • Ecosystem: learning materials, user groups, university engagement
• What makes this happen: Developers
Remainder of this talk is about the hardware out there and how to develop for it
![Page 8: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/8.jpg)
Current State of Developer Experience for Accelerators
- Update to the right operating system
- Install vendor tool-flows which only work on specific operating systems
- Setting up the environment and licenses
- Installing the board
- Setting up the board
- Numerous pages of documentation
Unhappy developer experience ☹
In many cases developers give up before even starting real work due to this poor developer experience
![Page 9: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/9.jpg)
Overview of available compute devices
…from easiest to hardest
![Page 10: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/10.jpg)
Integrated GPUs
• Architecture: SIMD, shared resource architecture
• Targeted workloads: Medium-sized offloads, latency-sensitive, cost-sensitive, media
• Programming models: OpenCL, DirectCompute, C++ AMP, SPIR, HSAIL
• Ecosystem maturity: High
• Links:
  • https://software.intel.com/en-us/articles/intel-graphics-developers-guides
![Page 11: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/11.jpg)
Discrete GPUs
• Architecture: SIMD, discrete coprocessor configuration
• Targeted workloads: Large-sized offloads, throughput-sensitive, parallel structured
• Programming models: CUDA, OpenCL, DirectCompute, C++ AMP, SYCL, SPIR, HSA
• Ecosystem maturity: High
• Links:
  • http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux
![Page 12: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/12.jpg)
MICs
• Architecture: Many general-purpose (GP) cores, (co)processor configuration
• Targeted workloads: Large-sized offloads, throughput-sensitive, generic HPC
• Programming models: OpenCL, OMP, MPI, general x86
• Ecosystem maturity: High
• Links:
  • https://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-developers-quick-start-guide
![Page 13: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/13.jpg)
FPGAs
• Architecture: LUTs+HPs+Fabric, coprocessor configuration
• Targeted workloads: extreme pipelining or fanout, systolic, fast configuration(?)
• Programming models: VHDL, Verilog, HLS, OpenCL
• Ecosystem maturity: Medium
• Links:
  • https://www.altera.com/products/design-software/embedded-software-developers/opencl/overview.highResolutionDisplay.html
  • http://www.xilinx.com/products/design-tools/software-zone/sdaccel.html
![Page 14: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/14.jpg)
Automata
• Architecture: NFA with programmable fabric
• Targeted workloads: MISD, pattern matching, parallel unstructured
• Programming models: API, ANML, regexp
• Ecosystem maturity: Low
• Links: http://micronautomata.com/
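
To make the NFA architecture concrete, here is a minimal software sketch of how a nondeterministic finite automaton keeps a set of active states and advances all of them on each input symbol. Automata hardware performs that per-symbol step in parallel; this sketch does it with a loop. It is plain Python, not ANML and not the vendor API: the `NFA` class, the transition-table layout, and the example pattern `a b+ c` are all assumptions made for illustration.

```python
from collections import defaultdict


class NFA:
    """Toy NFA without epsilon transitions (hypothetical helper, not a vendor API)."""

    def __init__(self, start, accepting):
        self.start = start
        self.accepting = set(accepting)
        # Maps (state, symbol) to the set of next states.
        self.transitions = defaultdict(set)

    def add(self, src, symbol, dst):
        self.transitions[(src, symbol)].add(dst)

    def find_matches(self, text):
        """Return the end offsets at which the pattern matches anywhere in text."""
        active = set()
        ends = []
        for i, ch in enumerate(text):
            # Re-arm the start state on every symbol so a match may begin anywhere.
            active.add(self.start)
            # Advance every active state on this symbol; hardware does this in parallel.
            nxt = set()
            for state in active:
                nxt |= self.transitions.get((state, ch), set())
            active = nxt
            if active & self.accepting:
                ends.append(i)
        return ends


def build_ab_plus_c():
    """Build an NFA for the illustrative pattern: 'a', one or more 'b', then 'c'."""
    nfa = NFA(start=0, accepting=[3])
    nfa.add(0, "a", 1)
    nfa.add(1, "b", 2)
    nfa.add(2, "b", 2)
    nfa.add(2, "c", 3)
    return nfa
```

Calling `build_ab_plus_c().find_matches(text)` returns the index of the last character of each match. The work per input symbol is proportional to the number of active states, which is the cost a dedicated automata fabric is meant to hide.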
![Page 15: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/15.jpg)
Enabling developers: Accessibility: still a problem
![Page 16: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/16.jpg)
Vision
To bring supercomputing to the masses by:
◦ building software to automatically realize the benefits of heterogeneous hardware
![Page 17: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/17.jpg)
Enabling scaling automatically
• Horizontal Scaling with BF Boost remoting technology
• Vertical Scaling with BF Boost splitting technology
• Heterogeneous Scaling with BF Boost interception technology
[Diagram labels: CPU system, GPU system]
3X Machine learning with Caffe, Torch: 2 local vs. 8 remote GPUs
3.5X Rendering with Blender: 1 local vs. 4 remote GPUs
20X Rendering with Blender: 4 remote GPUs
8X Image Processing with ImageMagick: 1 vs. 12 local GPUs
10X Computer Vision (face detect) with OpenCV: 12 CPU cores vs. 4 GPUs
7X Computational Science with NAMD: 2 remote GPUs
![Page 18: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/18.jpg)
Bitfusion Tech: Remote Virtualization
Remote virtualization enables varied virtual configurations by combining or sharing the resources of local and remote servers
Features
• Scale-out: connect one server to many accelerators to boost performance
• Scale-in: connect many servers to few accelerators to pool resources and lower cost
• Service discovery: local and remote machines can discover each other on demand without complex or time-consuming configuration
• Virtual pools: segment resources by class of users or hardware
System view [Diagram labels: application, local server, remote servers]
• Binary-level API interception
• Distribute work across local and remote machines
• Advanced performance features including synchronization elision and data pipelining
Application view [Diagram labels: application, virtual server with combined resources]
• Software sees all new hardware as if it were directly connected
• No change to software required
Data and compute pipelining
Advanced caching and data directories
Auto service discovery, metering
Function redirection for advanced coprocessors
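
The idea of intercepting API calls and distributing the work across local and remote machines can be illustrated with a small application-level analogy. The slide describes binary-level interception, which this sketch does not attempt; it only shows the shape of the idea, where the caller keeps using the same function name while a layer underneath decides where each call runs. The `Device` and `Interceptor` classes, the device names, and the round-robin policy are all hypothetical and are not Bitfusion's implementation.

```python
import functools
import itertools


class Device:
    """Stand-in for a local or remote accelerator (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.calls = 0

    def run(self, func, *args):
        # A real remote device would serialize the call and ship it over the network.
        self.calls += 1
        return func(*args)


class Interceptor:
    """Redirects calls to an intercepted function across a pool of devices."""

    def __init__(self, devices):
        self.devices = list(devices)
        self._next = itertools.cycle(self.devices)

    def intercept(self, func):
        @functools.wraps(func)
        def wrapper(*args):
            device = next(self._next)  # round-robin placement (assumed policy)
            return device.run(func, *args)
        return wrapper


def scale(vector, factor):
    """The application's kernel; it is unaware of where it runs."""
    return [factor * x for x in vector]


pool = [Device("local-gpu0"), Device("remote-gpu0"), Device("remote-gpu1")]
# Rebind the name so existing call sites are redirected without modification.
scale = Interceptor(pool).intercept(scale)

results = [scale([1, 2, 3], k) for k in range(6)]
```

After the six calls, each of the three devices has handled two of them, and the call sites never changed. A production system would additionally need data movement, synchronization, and failure handling, none of which is shown here.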
![Page 19: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/19.jpg)
Helping to solve accessibility
scale-out pooling
Inexpensive micro-client
Shared Heterogeneous server
![Page 20: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/20.jpg)
offer most affordable
Heterogeneous cloud
Developer machine
high performance developer instances and
• Binary-level API interception
• Distribute work across local and remote machines
• Advanced performance features including synchronization elision and data pipelining
[Diagram labels: application, remote servers, local server]
Data and compute pipelining
Advanced caching and data directories
Auto service discovery, metering
Function redirection for advanced coprocessors
![Page 21: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/21.jpg)
SUPERCOMPUTING TO THE MASSES
![Page 22: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/22.jpg)
Quantum computers
• Architecture:
• Targeted workloads:
• Programming models:
• Ecosystem maturity:
![Page 23: Bitfusion Nimbix Dev Summit Heterogeneous Architectures](https://reader031.vdocuments.us/reader031/viewer/2022030216/5888b2941a28ab80248b5b57/html5/thumbnails/23.jpg)
Application specific processors
• Architecture: Varied
• Targeted workloads: App specific: molecular simulations, DNN
• Programming models: API
• Ecosystem maturity: Zero-ish