Eugene Khvedchenia - Image processing using FPGAs
![Page 1: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/1.jpg)
Image processing on FPGA
Eugene Khvedchenya
https://ua.linkedin.com/in/cvtalks
![Page 2: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/2.jpg)
What is an FPGA and who needs it?
![Page 3: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/3.jpg)
Optimization pyramid
● General implementation
● OpenCL
● Cache tuning
● Multithreading
● SIMD (SSE, NEON)
● FPGA
![Page 4: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/4.jpg)
What’s inside?
LUT
Flip-Flop
ALU
BRAM
IO pads
FPGA
![Page 5: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/5.jpg)
Development efforts
![Page 6: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/6.jpg)
CPU vs FPGA
![Page 7: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/7.jpg)
CPU vs FPGA
![Page 8: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/8.jpg)
CPU vs FPGA
![Page 9: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/9.jpg)
Development efforts
![Page 10: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/10.jpg)
High Level Synthesis
● Converts C++ code to hardware design
● HLS compiler optimizes your code for FPGA
● Automatically optimizes RTL and timing
● Provides #pragmas for fine tuning
● C++ API for arbitrary precision math
● C++ API for stream data processing
● Supports C++11
![Page 11: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/11.jpg)
Things to remember
No branching penalty
![Page 12: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/12.jpg)
Things to remember
No dynamic memory allocation
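A minimal sketch of what this constraint tends to mean in HLS-style C++: every buffer gets a size fixed at compile time, so the tool can map it to on-chip memory. The names `pixel_t`, `MAX_WIDTH` and `buffered_row_sum` are hypothetical and do not come from the slides.

```cpp
#include <cassert>
#include <cstdint>

using pixel_t = std::uint8_t;     // hypothetical alias for an 8-bit pixel

// Hypothetical upper bound, fixed at compile time. A std::vector or new[]
// would need a heap, which has no counterpart in synthesized hardware.
constexpr int MAX_WIDTH = 1920;

// Copies one image row into a statically sized buffer and returns its sum.
// The row width is a run-time value, but it must fit the static bound.
std::uint32_t buffered_row_sum(const pixel_t* row, int width) {
    assert(width >= 0 && width <= MAX_WIDTH);
    pixel_t line[MAX_WIDTH];      // size known at compile time
    std::uint32_t sum = 0;
    for (int x = 0; x < width; ++x) {
        line[x] = row[x];
        sum += line[x];
    }
    return sum;
}
```

The run-time `width` only selects how much of the fixed buffer is used; the hardware cost is set by `MAX_WIDTH`.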
![Page 13: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/13.jpg)
Things to remember
Instantaneous BRAM access
Register-level bandwidth 0.5M-bits / second
BRAM bandwidth 23T-bits / second
Numbers above are for the Xilinx Kintex®-7 410T device
![Page 14: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/14.jpg)
Things to remember
Single producer - single consumer
![Page 15: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/15.jpg)
Things to remember
Pipelining
![Page 16: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/16.jpg)
Things to remember
● No branching penalty
● No cache penalty
● No dynamic memory allocation
● Instantaneous BRAM access
● Single producer - single consumer
● Pipelining
● Task-centric approach
![Page 17: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/17.jpg)
HLS Development cycle
1. Get baseline version
2. Write simulation test
3. Run HLS synthesis
4. Simulate
5. Validate
6. Measure
7. Optimize
8. Goto 3
![Page 18: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/18.jpg)
Sobel Edge Detection
Goal: Process image 1920x1080 @ 60 Hz
![Page 19: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/19.jpg)
Sobel Edge Detection
Baseline implementation
Iterate over image
● Convolve 3x3 window with Gx and Gy kernels
● Compute their absolute sum
● Write to corresponding output pixel
The FPGA frequency in this example is 150 MHz.
To meet the 1920x1080 @ 60 Hz goal we must process data at a rate of 1 cycle/pixel or faster.
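The 1 cycle/pixel requirement follows from dividing the clock rate by the pixel rate. The sketch below works through that arithmetic. The function names are hypothetical, and only active pixels are counted (blanking intervals are ignored), which is an assumption.

```cpp
// Cycles available per pixel = clock frequency / active pixels per second.
// Blanking intervals are ignored here, which is an assumption of this sketch.
constexpr double cycles_per_pixel_budget(double clock_hz, int width,
                                         int height, int fps) {
    return clock_hz / (static_cast<double>(width) * height * fps);
}

// Frame rate reachable when every pixel costs a given number of cycles.
constexpr double frames_per_second(double clock_hz, int width, int height,
                                   double cycles_per_pixel) {
    return clock_hz / (static_cast<double>(width) * height * cycles_per_pixel);
}

// 150 MHz / (1920 * 1080 * 60) = 150,000,000 / 124,416,000, roughly 1.2,
// so one cycle per pixel fits the budget and two cycles per pixel do not.
static_assert(cycles_per_pixel_budget(150e6, 1920, 1080, 60) > 1.0,
              "one cycle per pixel is affordable");
static_assert(cycles_per_pixel_budget(150e6, 1920, 1080, 60) < 2.0,
              "two cycles per pixel would miss 60 Hz");
```

By the same formula, the 40 and 10 cycles/pixel figures reported later in the deck correspond to roughly 2 and 7 frames per second, well short of 60.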
![Page 20: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/20.jpg)
Sobel Edge Detection
Baseline implementation
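The code on this slide did not survive extraction. As a stand-in, here is a minimal sketch of a straightforward Sobel loop that follows the steps listed for the baseline. It is not the author's code: the name `sobel_baseline`, the `pixel_t` alias, the zeroed border and the saturation to 255 are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

using pixel_t = std::uint8_t;  // hypothetical alias for an 8-bit pixel

// Straightforward Sobel: for each interior pixel, convolve the 3x3
// neighbourhood with Gx and Gy and store |Gx| + |Gy|.
// Border pixels are written as 0 (an assumption; the slides do not say).
void sobel_baseline(const pixel_t* src, pixel_t* dst, int width, int height) {
    assert(width >= 3 && height >= 3);
    static const int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int GY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (y == 0 || y == height - 1 || x == 0 || x == width - 1) {
                dst[y * width + x] = 0;
                continue;
            }
            int gx = 0, gy = 0;
            // Nine reads from the source image per output pixel.
            for (int ky = 0; ky < 3; ++ky) {
                for (int kx = 0; kx < 3; ++kx) {
                    const int p = src[(y + ky - 1) * width + (x + kx - 1)];
                    gx += GX[ky][kx] * p;
                    gy += GY[ky][kx] * p;
                }
            }
            const int mag = std::abs(gx) + std::abs(gy);
            // Saturating to 8 bits is an assumption of this sketch.
            dst[y * width + x] = static_cast<pixel_t>(mag > 255 ? 255 : mag);
        }
    }
}
```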
![Page 21: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/21.jpg)
Sobel Edge Detection
Baseline implementation
40 cycles/pixel on FPGA
Timing violation
![Page 22: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/22.jpg)
Sobel Edge Detection
Tuning FPGA implementation
Iterate over image
● Convolve 3x3 window with Gx and Gy kernels
Pipeline: Compute one field in the 3x3 filter window per clock cycle.
● Compute Gx and Gy absolute sum
● Write to corresponding output pixel
![Page 23: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/23.jpg)
Sobel Edge Detection
Tuning FPGA implementation
![Page 24: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/24.jpg)
Sobel Edge Detection
Tuning FPGA implementation
10 cycles/pixel on FPGA
Timing violation
![Page 25: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/25.jpg)
Sobel Edge Detection
Tuning FPGA implementation
Iterate over image
● Pipeline: Apply pipeline to the inner loop (columns)
● Convolve 3x3 window with Gx and Gy kernels
  ○ Loop gets totally unrolled and computed in 1 cycle
● Compute Gx and Gy absolute sum
  ○ Also computed in parallel
● Write to corresponding output pixel
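The steps above can be sketched as follows. This is an illustration under assumptions rather than the code from the talk. `Window`, `sobel_window` and `sobel_row` are hypothetical names, and the `#pragma HLS` lines are Vivado HLS directives that a desktop compiler ignores, possibly with a warning.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

using pixel_t = std::uint8_t;  // hypothetical alias for an 8-bit pixel

struct Window {                // hypothetical 3x3 neighbourhood
    pixel_t px[3][3];
};

// Both convolutions over a fixed 3x3 window. The loop bounds are constants,
// so an HLS tool can unroll them and evaluate all taps in parallel.
pixel_t sobel_window(const Window& w) {
    static const int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int GY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    int gx = 0, gy = 0;
    for (int r = 0; r < 3; ++r) {
#pragma HLS UNROLL
        for (int c = 0; c < 3; ++c) {
#pragma HLS UNROLL
            gx += GX[r][c] * w.px[r][c];
            gy += GY[r][c] * w.px[r][c];
        }
    }
    const int mag = std::abs(gx) + std::abs(gy);
    return static_cast<pixel_t>(mag > 255 ? 255 : mag);  // saturation assumed
}

// Inner (column) loop for one output row, pipelined with a target of one
// pixel per cycle. It still issues nine image reads per pixel.
void sobel_row(const pixel_t* above, const pixel_t* cur, const pixel_t* below,
               pixel_t* out, int width) {
    assert(width >= 3);
    out[0] = 0;                // border handling is an assumption
    out[width - 1] = 0;
    for (int x = 1; x < width - 1; ++x) {
#pragma HLS PIPELINE II=1
        Window w;
        for (int c = 0; c < 3; ++c) {
            w.px[0][c] = above[x + c - 1];
            w.px[1][c] = cur[x + c - 1];
            w.px[2][c] = below[x + c - 1];
        }
        out[x] = sobel_window(w);
    }
}
```

The arithmetic now fits a single pipeline stage, but the nine reads per iteration remain.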
![Page 26: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/26.jpg)
Sobel Edge Detection
Tuning FPGA implementation
![Page 27: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/27.jpg)
Sobel Edge Detection
Tuning FPGA implementation
1 cycle/pixel on FPGA
Memory-access violation
![Page 28: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/28.jpg)
Sobel Edge Detection
Tuning FPGA implementation
Issues
● Nine concurrent memory accesses
● More hardware blocks required
● HLS module can only connect to a single port capable of one transaction per clock
![Page 29: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/29.jpg)
Sobel Edge Detection
Tuning FPGA implementation
● Use BRAM to store intermediate line buffer
● Read data from external memory to line buffer
● Fill memory window (Flip-flop elements)
● Convolve 3x3 window with Gx and Gy kernels
  ○ Loop gets totally unrolled and computed in 1 cycle
● Compute their absolute sum
  ○ Also computed in parallel
● Write to corresponding output pixel
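A minimal sketch of the line-buffer scheme described above, assuming raster-order input: each pixel is read from external memory exactly once, two buffered rows stand in for BRAM, and a 3x3 shift window stands in for flip-flops. The names (`sobel_linebuffer`, `MAX_WIDTH`, `pixel_t`), the zeroed border and the saturation are assumptions. The `#pragma HLS` lines are Vivado HLS directives that a desktop compiler ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

using pixel_t = std::uint8_t;     // hypothetical alias for an 8-bit pixel
constexpr int MAX_WIDTH = 1920;   // hypothetical compile-time row bound

void sobel_linebuffer(const pixel_t* src, pixel_t* dst, int width, int height) {
    assert(width >= 3 && width <= MAX_WIDTH && height >= 3);
    pixel_t line[2][MAX_WIDTH] = {};  // two previous rows (BRAM candidate)
    pixel_t win[3][3] = {};           // sliding window (flip-flop candidate)
#pragma HLS ARRAY_PARTITION variable=win complete dim=0

    // Simplification: zero the output first so border pixels end up as 0.
    for (int i = 0; i < width * height; ++i) dst[i] = 0;

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
#pragma HLS PIPELINE II=1
            const pixel_t in = src[y * width + x];  // the only external read

            // Shift the window one column to the left.
            for (int r = 0; r < 3; ++r) {
                win[r][0] = win[r][1];
                win[r][1] = win[r][2];
            }
            // New right column: two buffered rows plus the incoming pixel.
            win[0][2] = line[0][x];
            win[1][2] = line[1][x];
            win[2][2] = in;
            // Age the line buffers at this column.
            line[0][x] = line[1][x];
            line[1][x] = in;

            // The window is valid once two rows and two columns have passed;
            // it is then centred on pixel (y - 1, x - 1).
            if (y >= 2 && x >= 2) {
                const int gx = (win[0][2] + 2 * win[1][2] + win[2][2])
                             - (win[0][0] + 2 * win[1][0] + win[2][0]);
                const int gy = (win[2][0] + 2 * win[2][1] + win[2][2])
                             - (win[0][0] + 2 * win[0][1] + win[0][2]);
                const int mag = std::abs(gx) + std::abs(gy);
                dst[(y - 1) * width + (x - 1)] =
                    static_cast<pixel_t>(mag > 255 ? 255 : mag);
            }
        }
    }
}
```

Compared with reading the 3x3 neighbourhood directly, the external port sees one read and at most one write per iteration, which is what the single-port limit requires.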
![Page 30: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/30.jpg)
Sobel Edge Detection
Tuning FPGA implementation
1 cycle/pixel on FPGA
Achievement unlocked
![Page 31: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/31.jpg)
The dark side
of FPGA development
● The tools aren’t great
● It works in simulator!
● Learning curve
● Debugging timing violations
![Page 32: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/32.jpg)
Quick start
● FPGA Development board: Altera, Xilinx
● IDE & Samples: Vivado
● OpenCV support
● HLS for OpenCL
![Page 33: Eugene Khvedchenia - Image processing using FPGAs](https://reader035.vdocuments.us/reader035/viewer/2022062412/5882a4c71a28ab92618b6b9b/html5/thumbnails/33.jpg)
Image processing on FPGA
Eugene Khvedchenya
Questions?
https://ua.linkedin.com/in/cvtalks
[email protected]
@cvtalks