OpenCL on FPGA
Marc Gaucheron, Altera
Thematic School (Ecole Thématique): GPU Architecture and Programming
GIPSA-lab, Grenoble campus, 14-17 December 2015
Agenda
FPGA architecture overview
Conventional way of developing with FPGA
OpenCL: abstracting FPGA away
ALTERA BSP: abstracting FPGA development
Examples
A bit of marketing?
FPGA architecture overview
FPGA Architecture: Fine-grained Massively Parallel
- Millions of reconfigurable logic elements
- Thousands of 20 Kb memory blocks
- Thousands of variable-precision DSP blocks
- Dozens of high-speed transceivers
- Multiple high-speed configurable memory controllers
- Multiple ARM cores
- I/O
Let's zoom in
FPGA Architecture: Basic Elements
A Basic Element combines:
- A 1-bit configurable operation, which can be configured to perform any 1-bit operation: AND, OR, NOT, ADD, SUB
- A 1-bit register (stores the result)
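As a rough illustration of the Basic Element described above, the sketch below models it as a small lookup table plus a 1-bit register. The 2-input width, the type `basic_element`, the `CFG_*` constants, and `be_clock` are all assumptions made for this example; the slide does not say how the element is implemented. A 1-bit ADD without carry is the XOR configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one Basic Element: a 2-input lookup table (LUT)
 * plus a 1-bit register. The 2-input width is an assumption for brevity. */
typedef struct {
    uint8_t truth_table; /* bit i holds the output for inputs (b << 1 | a) == i */
    uint8_t reg;         /* 1-bit register storing the last result */
} basic_element;

/* Truth tables that "configure" the element for common 1-bit operations. */
enum { CFG_AND = 0x8, CFG_OR = 0xE, CFG_XOR = 0x6, CFG_NOT_A = 0x5 };

/* Evaluate the configured operation and latch the result, as one clock edge would. */
uint8_t be_clock(basic_element *be, uint8_t a, uint8_t b) {
    assert(a <= 1 && b <= 1);
    uint8_t index = (uint8_t)((b << 1) | a);
    be->reg = (be->truth_table >> index) & 1u;
    return be->reg;
}
```

Reconfiguring the element only means writing a different truth table; the evaluation logic stays the same.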
FPGA Architecture: Flexible Interconnect
Basic Elements are surrounded by a flexible interconnect.
FPGA Architecture: Custom Operations Using Basic Elements
Wider custom operations are implemented by configuring and interconnecting Basic Elements, for example:
- 16-bit add
- 32-bit sqrt
- Your custom 64-bit bit-shuffle and encode
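To make "wider operations from interconnected Basic Elements" concrete, here is a minimal sketch of a 16-bit add built from sixteen 1-bit full adders chained through their carries. The function names `full_adder` and `add16` are hypothetical, and a ripple-carry chain is only one possible structure; real devices typically use dedicated carry logic.

```c
#include <assert.h>
#include <stdint.h>

/* One 1-bit full adder: the kind of operation a single Basic Element can hold. */
void full_adder(uint8_t a, uint8_t b, uint8_t cin, uint8_t *sum, uint8_t *cout) {
    assert(a <= 1 && b <= 1 && cin <= 1);
    *sum  = a ^ b ^ cin;
    *cout = (uint8_t)((a & b) | (cin & (a ^ b)));
}

/* 16-bit add built by chaining 16 full adders through their carry signals. */
uint16_t add16(uint16_t x, uint16_t y) {
    uint16_t result = 0;
    uint8_t carry = 0;
    for (int bit = 0; bit < 16; ++bit) {
        uint8_t s, c;
        full_adder((uint8_t)((x >> bit) & 1u), (uint8_t)((y >> bit) & 1u), carry, &s, &c);
        result |= (uint16_t)(s << bit);
        carry = c;
    }
    return result; /* the final carry is dropped, so the sum wraps modulo 2^16 */
}
```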
FPGA Architecture: Memory Blocks
Each memory block holds 20 Kb and has addr, data_in, and data_out ports.
Memory blocks can be configured and grouped using the interconnect to create various cache architectures:
- Lots of smaller caches
- Few larger caches
FPGA Architecture: Floating Point Multiplier/Adder Blocks
Dedicated floating point multiply and add blocks (data_in to data_out)
DSP block architecture
[Figure panels: Floating-Point DSP; Fixed-Point DSP]
Elementary math functions supporting floating point
Basic Floating Point: FPAdd, FPAddExpert, FPAddN, FPSubExpert, FPAddSub, FPAddSubExpert, FPFusedAddSub, FPMul, FPMulExpert, FPConstMul, FPAcc, FPSqrt, FPDivSqrt, FPRecipSqrt, FPCbrt, FPDiv, FPInverse, FPFloor, FPCeil, FPRound, FPRint, FPFrac, FPMod, FPDim, FPAbs, FPMin, FPMax, FPMinAbs, FPMaxAbs, FPMinMaxFused, FPMinMaxAbsFused, FPCompare, FPCompareFused
Exp, Log and Power: FPLn, FPLn1px, FPLog10, FPLog2, FPExp, FPExpFPC, FPExpM1, FPExp2, FPExp10, FPPowr
Trig with argument reduction: FPSinX, FPCosX, FPSinCosX, FPTanX, FPCotX
Inverse trigonometric functions: FPArcsinX, FPArcsinPi, FPArccosX, FPArccosPi, FPArctanX, FPArctanPi, FPArctan2
Trigonometrics of pi*x: FPSinPiX, FPCosPiX, FPTanPiX, FPCotPiX
Trigonometrics misc: FPHypot, FPRangeReduction
Conversion: FXPToFP, FPToFXP, FPToFXPExpert, FPToFXPFused, FPToFP
Macro Operators: FPFusedHorner, FPFusedHornerExpert, FPFusedHornerMulti, FPFusedMultiFunction
[Figure legend: Fixed and floating point; Floating point only]
- Coverage of ~70 elementary math functions
- Patented & published efficient mapping to FPGA hardware: polynomial approximation, Horner's method, truncated multipliers, …
- Compliant with OpenCL & IEEE 754 accuracy standards
- Rounding mode options for fundamental operators
- Half- to double-precision
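Horner's method, named above as one of the mapping techniques, evaluates a polynomial as a chain of multiply-add steps, which suits hardware multiply-add blocks. The sketch below shows only the textbook method in software; the function name `horner` and the coefficient layout are assumptions, and it says nothing about how the vendor's patented mapping is actually built.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate c[0] + c[1]*x + ... + c[n-1]*x^(n-1) with Horner's method:
 * n-1 multiplies and n-1 adds, each step being one multiply-add. */
double horner(const double *c, size_t n, double x) {
    assert(n > 0);
    double acc = c[n - 1];
    for (size_t i = n - 1; i-- > 0; ) {
        acc = acc * x + c[i];
    }
    return acc;
}
```

For example, with coefficients {1, 2, 3} and x = 2 the steps are 3, then 3*2 + 2 = 8, then 8*2 + 1 = 17, which matches 1 + 2*2 + 3*4.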
FPGA Architecture: Configurable Connectivity = Efficiency
- Blocks are connected into a custom data-path that matches your application.
- A streaming data-path is more efficient than copying to/from global memory.
- 1 GHz core performance
- 5.5M logic elements
- Heterogeneous 3D SiP integration, up to 1 TB/s
- Up to 70% lower power
- Up to 10 TFLOPS
- 14 nm Intel Tri-Gate
- Most comprehensive security
- Quad-core Cortex-A53 ARM processor
Developing with FPGA
Typical Programmable Logic Design Flow
Design specification
Design entry/RTL coding
- Behavioral or structural description of design
RTL simulation
- Functional simulation (Mentor Graphics ModelSim® or other 3rd-party simulators)
- Verify logic model & data flow (no timing delays)
Synthesis (Mapping)
- Translate design into device-specific primitives
- Optimization to meet required area & performance constraints
- Quartus II synthesis, Precision Synthesis, Synplify/Synplify Pro, Design Compiler FPGA
- Result: Post-synthesis netlist
Place & route (Fitting)
- Map primitives to specific locations inside target technology with reference to area & performance constraints
- Specify routing resources to be used
- Result: Post-fit netlist
[Figure labels: LE, M512, M4K/M9K, I/O]
Typical Programmable Logic Design Flow
Timing analysis
- Verify performance specifications were met
- Static timing analysis
Gate level simulation (optional)
- Timing simulation
- Verify design will work in target technology
PC board simulation & test
- Simulate board design
- Program & test device on board
- Use on-chip tools for debugging
[Figure label: tclk]
Application Development Paradigm
[Figure labels, top to bottom:]
- ASIC
- FPGA programmers
- Parallel programmers
- Standard CPU programmers
The magic trick?
FpgaC
HDL Coder
OpenCL: abstracting FPGA away
Altera OpenCL Program Overview
2010: Research project
- Toronto Technology Center
2011: Development started
- Proof of concept
- 9 customer evaluations
2012: Early Access Program
- Demos at Supercomputing '12
- Over 60 customer evaluations
2013: First public release
- Publicly available May 2013
- Passed Conformance Testing
- >8500 programs run properly
Public release 13.1 (Nov 2013)
- Channels (Streaming IO)
- Example Designs
- SoC Support
Release 14.0 (June 2014)
- Platforms
- Emulator/Profiler
- Rapid Prototyping
Release 14.1 (Nov 2014)
- Arria 10 support
- Shared Virtual Memory (PoC)
Release 15.1 (Nov 2015)
- Kernel Update
- Library support
OpenCL Use Model: Abstracting the FPGA away
Host Code (built with a standard gcc compiler into an EXE that runs on the host):
    main() {
        read_data( … );
        manipulate( … );
        clEnqueueWriteBuffer( … );
        clEnqueueNDRangeKernel( …, sum, … );
        clEnqueueReadBuffer( … );
        display_result( … );
    }
OpenCL Accelerator Code (built with the Altera Offline Compiler, through Verilog and Quartus II, into an AOCX that runs on the accelerator):
    __kernel void sum(__global float *a, __global float *b, __global float *y) {
        int gid = get_global_id(0);
        y[gid] = a[gid] + b[gid];
    }
An SoC FPGA combines the host and the accelerator in a single device.
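The relationship between the host's NDRange launch and the `sum` kernel can be sketched in plain C, without an OpenCL runtime. This is only an emulation under stated assumptions: `sum_work_item` and `emulate_ndrange_sum` are hypothetical names, the work-item index that `get_global_id(0)` would return is passed in explicitly, and the sequential loop stands in for work-items that a real device may run in any order or in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C stand-in for the slide's `sum` kernel: the work-item index that
 * get_global_id(0) would return is passed in explicitly as `gid`. */
void sum_work_item(size_t gid, const float *a, const float *b, float *y) {
    y[gid] = a[gid] + b[gid];
}

/* Hypothetical emulation of a 1-D NDRange launch: one kernel invocation per
 * global id. A real device is free to run these in any order or in parallel. */
void emulate_ndrange_sum(size_t global_size, const float *a, const float *b, float *y) {
    assert(a != NULL && b != NULL && y != NULL);
    for (size_t gid = 0; gid < global_size; ++gid) {
        sum_work_item(gid, a, b, y);
    }
}
```

Because each work-item touches only its own index, the result does not depend on the order in which the invocations run.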
Software Development Flow
Modify kernel.cl, then iterate with the tools below:
- x86 Emulator (sec)
- Optimization Report (sec)
- Profiler (hours)
Questions checked along the way:
- Functional bugs?
- Stall-free pipeline?
- Memory coalesced?
- Hardware performance met?
DONE!
OpenCL on GPU/Multi-Core CPU Architectures
- Conceptually many parallel threads
- Each thread runs sequentially on a different processing element (PE)
- Fixed numbers of functional units and registers are available on each PE
- Many processing elements are available to provide significant parallel speedup
[Figure, simplified view: Host CPU, Work Distribution, repeated clusters of TEX L1, TF, SP, IU and Shared Memory blocks, and several L2/Memory partitions]
OpenCL on FPGA
OpenCL kernels are translated into a highly parallel circuit
- A unique functional unit is created for every operation in the kernel: memory loads/stores, computational operations, registers
- Functional units are only connected when there is some data dependence dictated by the kernel
- Pipeline the resulting circuit with a new thread on each clock cycle to keep functional units busy
Amount of parallelism is dictated by the number of pipelined computing operations in the generated hardware
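The effect of issuing a new thread on each clock cycle can be illustrated with a small cycle-count model. The three-stage depth (load, add, store), the names `pipeline_cycles` and `sequential_cycles`, and the assumption that the pipeline never stalls are all simplifications chosen for this sketch, not properties of the generated hardware.

```c
#include <assert.h>
#include <stddef.h>

#define PIPELINE_DEPTH 3  /* assumed stages: load, add, store */

/* Cycle-by-cycle model: each clock, every in-flight work-item advances one
 * stage and a new work-item enters stage 0. Returns the total number of
 * cycles needed to retire `n_items` work-items, assuming no stalls. */
size_t pipeline_cycles(size_t n_items) {
    assert(n_items > 0);
    int occupied[PIPELINE_DEPTH] = {0}; /* 1 if a work-item sits in that stage */
    size_t issued = 0, retired = 0, cycles = 0;
    while (retired < n_items) {
        /* Advance every in-flight work-item by one stage. */
        for (int s = PIPELINE_DEPTH - 1; s > 0; --s) occupied[s] = occupied[s - 1];
        /* Issue a new work-item into stage 0 while any remain. */
        occupied[0] = (issued < n_items);
        if (occupied[0]) issued++;
        cycles++;
        /* The work-item in the last stage completes at the end of this cycle. */
        if (occupied[PIPELINE_DEPTH - 1]) retired++;
    }
    return cycles;
}

/* Without pipelining, each work-item occupies the whole circuit for DEPTH cycles. */
size_t sequential_cycles(size_t n_items) {
    return n_items * PIPELINE_DEPTH;
}
```

In this model, N work-items take N + PIPELINE_DEPTH - 1 cycles when pipelined, against N * PIPELINE_DEPTH when run one at a time, so all functional units are busy once the pipeline has filled.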
Mapping a simple program to an FPGA
High-level code:
    Mem[100] += 42 * Mem[101]
CPU instructions:
    R0 ← Load Mem[100]
    R1 ← Load Mem[101]
    R2 ← Load #42
    R2 ← Mul R1, R2
    R0 ← Add R2, R0
    Store R0 → Mem[100]
CPU activity, step by step
[Figure: along the Time axis, the same CPU hardware (marked A) executes the six instructions one after another]
    R0 ← Load Mem[100]
    R1 ← Load Mem[101]
    R2 ← Load #42
    R2 ← Mul R1, R2
    R0 ← Add R2, R0
    Store R0 → Mem[100]
On the FPGA we unroll the CPU hardware…
[Figure: along the Space axis, each of the six instructions gets its own copy of the hardware (marked A)]
… and specialize by position
1. Instructions are fixed. Remove "Fetch".
2. Remove unused ALU ops.
3. Remove unused Load / Store.
4. Wire up registers properly! And propagate state.
5. Remove dead data.
6. Reschedule!
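The specialization steps above can be mimicked in software by comparing a generic interpreter with a function that contains only what the program needs. The instruction encoding (`instr`, `opcode`), the three-register file, and the function names are hypothetical choices for this sketch; the FPGA tools perform the equivalent transformation on hardware, not on C code.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_LOAD, OP_LOADI, OP_MUL, OP_ADD, OP_STORE } opcode;
typedef struct { opcode op; int dst, src1, src2; } instr; /* hypothetical encoding */

/* CPU-style execution: fetch each instruction, decode it, run it on shared hardware. */
void run_cpu(const instr *prog, size_t n, int *mem) {
    int r[3] = {0};
    for (size_t pc = 0; pc < n; ++pc) {
        const instr *i = &prog[pc];                           /* fetch */
        switch (i->op) {                                      /* decode + execute */
        case OP_LOAD:  r[i->dst] = mem[i->src1];            break;
        case OP_LOADI: r[i->dst] = i->src1;                 break;
        case OP_MUL:   r[i->dst] = r[i->src1] * r[i->src2]; break;
        case OP_ADD:   r[i->dst] = r[i->src1] + r[i->src2]; break;
        case OP_STORE: mem[i->src1] = r[i->dst];            break;
        }
    }
}

/* The slide's six instructions for Mem[100] += 42 * Mem[101]. */
const instr slide_program[6] = {
    { OP_LOAD,  0, 100, 0 },
    { OP_LOAD,  1, 101, 0 },
    { OP_LOADI, 2, 42,  0 },
    { OP_MUL,   2, 1,   2 },
    { OP_ADD,   0, 2,   0 },
    { OP_STORE, 0, 100, 0 },
};

/* Specialized version: no fetch, no decode, no register file; only the two
 * loads, one multiply-add by the constant 42, and one store remain. */
void run_datapath(int *mem) {
    assert(mem != NULL);
    mem[100] = mem[100] + 42 * mem[101];
}
```

Both versions leave the same value in `mem[100]`; the second simply has nothing left that the program does not use.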
Custom data-path on the FPGA matches your algorithm!
Build exactly what you need:
- Operations
- Data widths
- Memory size & configuration
Efficiency:
- Throughput / Latency / Power
High-level code:
    Mem[100] += 42 * Mem[101]
Custom data-path:
[Figure labels: load, load, 42, store]
What Hardware do we produce?
[Figure labels: PCIe, DDRx, CRA, Memory Interconnect, Flow control logic, OpenCL-specific iterators, Load, Load, Mult-add, Store, Done?]
ALTERA BSP: abstracting FPGA development
An adaptable Board Support Package
[Figure: board support package block diagram]
- OpenCL Domain: Kernel IP blocks, built with the Altera OpenCL Compiler
- IO Infrastructure: prebuilt BSP, made with standard HDL tools by an FPGA developer
- Interconnect
- PCIe Gen3 x8 host interface (Host)
- Two DDR3 memory interfaces (DDR)
- Two QDRII memory interfaces (QDR)
- Two 10Gb MAC/UOE data interfaces (10G network)
- JESD204 and XCVRs (A/D, SDI)
Channels Advantage
[Figure: two block diagrams, labelled "Standard OpenCL" and "Altera Vendor Extension". Both show the Host, a host interface with CvP update, the interconnect, OpenCL kernels, two DDR3 interfaces (DDR), four QDRII interfaces (QDR), and two 10Gb interfaces (10G network). The diagram is also labelled "IO and Kernel Channels".]
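The idea behind a channel, data flowing directly from one kernel to the next instead of through a global-memory buffer, can be modelled on the host as a bounded FIFO. Everything in this sketch is an assumption for illustration: the `channel` type, `chan_write`, `chan_read`, `scale_then_sum`, and the depth of 8 are hypothetical and are not the vendor's channel API.

```c
#include <assert.h>
#include <stddef.h>

#define CHANNEL_DEPTH 8  /* assumed FIFO depth */

/* Hypothetical host-side model of a channel: a bounded FIFO between two kernels. */
typedef struct {
    float data[CHANNEL_DEPTH];
    size_t head, count;
} channel;

/* Returns 1 on success, 0 if the FIFO is full (a hardware writer would stall). */
int chan_write(channel *c, float v) {
    if (c->count == CHANNEL_DEPTH) return 0;
    c->data[(c->head + c->count) % CHANNEL_DEPTH] = v;
    c->count++;
    return 1;
}

/* Returns 1 on success, 0 if the FIFO is empty (a hardware reader would stall). */
int chan_read(channel *c, float *v) {
    if (c->count == 0) return 0;
    *v = c->data[c->head];
    c->head = (c->head + 1) % CHANNEL_DEPTH;
    c->count--;
    return 1;
}

/* Producer "kernel" scales its input, consumer "kernel" accumulates; the
 * intermediate values travel through the channel, never through a global buffer. */
float scale_then_sum(const float *in, size_t n, float gain) {
    channel c = { {0}, 0, 0 };
    float total = 0.0f, v;
    size_t produced = 0;
    assert(in != NULL || n == 0);
    while (produced < n || c.count > 0) {
        if (produced < n && chan_write(&c, in[produced] * gain)) produced++;
        if (chan_read(&c, &v)) total += v;
    }
    return total;
}
```

The intermediate array that a two-kernel design would otherwise write to and read back from global memory never exists here; only the small FIFO does.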
Start with OpenCL ready platforms 1/2
[Figure panels: HPC Applications; Network Applications]
Start with OpenCL ready platforms 2/2
Shared Virtual Memory (SVM) Platform Model
OpenCL 1.2: Traditional Hosted Heterogeneous Platform
[Figure: a CPU with Host Memory, and devices DEV 1 … DEV N, each with its own Global Memory]
OpenCL 2.0: New Hosted Heterogeneous Platform with SVM
[Figure: the same CPU, Host Memory, and devices DEV 1 … DEV N with their Global Memory, under a Shared Virtual Memory]
[Figure labels for the host-device links: PCIe, QPI, CAPI (CAPP, PSL)]
Example
Case Study: Image Classification
Deep Learning Algorithm
- Convolutional Neural Networking
- Based on Hinton's CNN
Early Results on Stratix V
- 2X perf./power vs. GPGPU despite soft floating point
- 400 images/s
- 8+ simultaneous kernels vs. 2 on GPGPU
- Exploiting OpenCL channels between kernels
Arria 10 (A10) results
- Hard floating point uses all DSPs
- Better density and frequency
- ~4X performance/watt vs. Stratix V (SV)
- 6800 images/s
- No code change required
Hinton's CNN Algorithm
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The classes in the dataset are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
Multi-Asset Barrier Option Pricing
Monte-Carlo simulation
- No closed form solution possible
- High quality random number generator required
- Billions of simulations required
Used the GPU vendor's example code
Advantage FPGA
- Complex control flow
Optimizations
- Channels, loop pipelining
Results

| Platform | Power (W) | Performance (Bsims/s) | Efficiency (Msims/s/W) |
|---|---|---|---|
| W3690 Xeon Processor | 130 | 0.032 | 0.0025 |
| nVidia Kepler20 | 212 | 10.1 | 48 |
| Bittware S5-PCIe-HQ | 45 | 12.0 | 266 |
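The efficiency column appears to be performance divided by power, with Bsims/s converted to Msims/s. The helper below, whose name `efficiency_msims_per_watt` is hypothetical, makes that assumed relationship explicit so the rows can be cross-checked.

```c
#include <assert.h>

/* Efficiency in Msims/s/W from performance in Bsims/s and power in W,
 * assuming 1 Bsim = 1000 Msims. */
double efficiency_msims_per_watt(double perf_bsims_per_s, double power_w) {
    assert(power_w > 0.0);
    return perf_bsims_per_s * 1000.0 / power_w;
}
```

Under this assumption the GPU row gives about 47.6 and the FPGA row about 266.7, in line with the 48 and 266 shown. The Xeon row gives about 0.25 rather than the 0.0025 shown, so one of the Xeon figures may be expressed in a different unit or may have been transcribed incorrectly on the slide.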
Summary
OpenCL + FPGA Key Benefits
Faster development vs. traditional FPGA design flow
- Puts the FPGA in the software developer's hands
- Familiar C-based development flow
Heterogeneous IO interface
- Multiple 10G Ethernet
- SDI, HDMI, A/D interface
Higher performance/watt vs. CPU/GPGPU
- Implement exactly what you need
- Pipeline parallel structures
- Custom interconnect converging with data processing cores
Portability & obsolescence free
- Code can transfer between different HW accelerators (CPU, GPGPU, FPGA, etc.)
- Code ports seamlessly to new generations of the FPGA
- FPGA life cycle considerably longer than CPUs or GPGPUs
Q & A