Field-Programmable Gate Arrays as tracking devices
Roberto Rodríguez Osorio, Javier Díaz Bruguera
Group of Computer Architecture, Dept. of Electronics and Computer Science
University of Santiago de Compostela
Outline
• Application-specific computing machines
• ASIC vs FPGA
• FPGA technology basics
• Hard cores in FPGAs
• Performance
• Design effort
• Choices
• Applications
Application-specific computing machines
[Figure: block diagrams of two computing machines, each split into a control section and a datapath]
Microprocessor: code memory, data memory, PC, IR, control logic, register file, functional units
Performance: 10 cycles @ 3 GHz. Dissipated power: ~35 W
Application-Specific Integrated Circuit: control logic, MAC
Performance: 1 cycle @ 1 GHz. Dissipated power: ~mW
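The two performance figures above can be turned into time and energy per operation. The sketch below is only a back-of-the-envelope calculation: the slide gives the ASIC power as "~mW", so the value of 1 mW used here is an assumption, and the function names are illustrative.

```python
def time_per_operation_ns(cycles, clock_ghz):
    """Time for one operation in nanoseconds (cycles divided by GHz)."""
    return cycles / clock_ghz


def energy_per_operation_nj(power_w, time_ns):
    """Energy for one operation in nanojoules (watts times nanoseconds)."""
    return power_w * time_ns


# Microprocessor figures from the slide: 10 cycles at 3 GHz, about 35 W.
cpu_time = time_per_operation_ns(10, 3.0)
cpu_energy = energy_per_operation_nj(35.0, cpu_time)

# ASIC figures from the slide: 1 cycle at 1 GHz, power "~mW".
ASSUMED_ASIC_POWER_W = 1e-3  # assumption: exactly 1 mW, not stated on the slide
asic_time = time_per_operation_ns(1, 1.0)
asic_energy = energy_per_operation_nj(ASSUMED_ASIC_POWER_W, asic_time)

print(f"microprocessor: {cpu_time:.2f} ns, {cpu_energy:.1f} nJ per operation")
print(f"ASIC:           {asic_time:.2f} ns, {asic_energy:.3f} nJ per operation")
```

Under these assumptions the ASIC is about 3 times faster per operation despite its lower clock rate, and the energy per operation differs by roughly five orders of magnitude.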
ASIC vs FPGA
[Figure: NRE cost ($1M to $4M) versus technology (0.35, 0.25, 0.2, 0.15, 0.1 and 0.05 micrometers)]
ASIC vs FPGA
[Figure: computational efficiency (Mops/W, logarithmic axis from 10^0 to 10^6) versus technology (2, 1, 0.5, 0.25, 0.13 and 0.07 µm; years 1986 to 2006). One curve marks the maximum efficiency (ASIC). Also labelled: FPGA, ASSP, MPPA, GPGPU, VLIW, ASIP, ManyCore, ...]
Source: Theo A.C.M. Claasen, ISSCC 99
FPGA technology basics – Computing
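| carry input | a | b | s | carry output |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 |

[Figure: full adder (FA) symbol and circuit, with inputs a, b and cin and outputs s and cout]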
FPGA technology basics – Do not compute
Logic blocks
[Figure: two 8x1-bit SRAM memories with inputs cin, a and b; outputs s and cout]
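The "do not compute" idea can be sketched in a few lines: instead of evaluating logic gates, the results of a full adder are stored in two 8x1-bit memories and simply read back. This is an illustrative model, not a description of any vendor's logic block; the address ordering (cin, a, b) and all names are assumptions.

```python
from itertools import product


def full_adder_gates(cin, a, b):
    """Reference full adder computed with XOR, AND and OR gates."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def address(cin, a, b):
    """Pack the three 1-bit inputs into a 3-bit memory address (assumed order)."""
    return (cin << 2) | (a << 1) | b


# "Configuration": fill one 8x1-bit memory per output with precomputed results.
SUM_LUT = [0] * 8
COUT_LUT = [0] * 8
for cin, a, b in product((0, 1), repeat=3):
    SUM_LUT[address(cin, a, b)], COUT_LUT[address(cin, a, b)] = full_adder_gates(cin, a, b)


def full_adder_lut(cin, a, b):
    """Full adder that only reads the two memories; nothing is computed."""
    addr = address(cin, a, b)
    return SUM_LUT[addr], COUT_LUT[addr]


print(SUM_LUT)   # contents of the memory that produces s
print(COUT_LUT)  # contents of the memory that produces cout
```

Loading different contents into the same two memories would implement any other pair of 3-input functions, which is what makes the logic block reconfigurable.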
FPGA technology basics – Interconnect
[Figure: two-dimensional array of blocks]
FPGA technology basics – Interconnect + memory
• The FPGA fabric consists of a huge number of simple memory elements connected by a reconfigurable network
• Design software must break every computing task into 1-bit operations with no more than 4, 5 or 6 variables
• Operations are spatially distributed according to proximity criteria
• Routing may be troublesome
  • Long paths are slow
  • Routing through logic blocks increases area
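As an illustration of breaking a task into 1-bit operations, the sketch below decomposes a multi-bit addition into a chain of 1-bit full adders, each a function of only three variables. It is a simplified model of the idea, not what any particular design tool emits; the function names are hypothetical.

```python
def full_adder(cin, a, b):
    """One 1-bit operation with three input variables."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def ripple_add(x, y, width):
    """Add two unsigned integers using `width` chained 1-bit operations.

    Returns (sum modulo 2**width, final carry).
    """
    carry = 0
    result = 0
    for i in range(width):
        # Each step depends on the carry of the previous one, so the carry
        # has to travel through all `width` stages: a long path.
        s, carry = full_adder(carry, (x >> i) & 1, (y >> i) & 1)
        result |= s << i
    return result, carry


print(ripple_add(5, 3, 4))
print(ripple_add(15, 1, 4))
```

The carry signal crossing every stage is an example of the kind of long path that limits speed once the operations are placed and routed.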
Hard cores in FPGAs
• Memory blocks
• Multipliers
• DSP blocks
• Microprocessors
• Floating point units?
Memory blocks
• Hundreds or thousands of small memory blocks
• Dual-port blocks
• 18 Kbit each for Xilinx
• Flexible configurations
  • Many short words or a few long words
• Independent access
• Huge aggregate bandwidth
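Because every block can be accessed independently, the aggregate bandwidth grows with the number of blocks. The sketch below shows the arithmetic; the block count, port width and clock frequency are hypothetical values chosen for illustration, not figures from a data sheet.

```python
def aggregate_bandwidth_gbps(num_blocks, ports_per_block, port_width_bits, clock_hz):
    """Peak aggregate bandwidth in Gbit/s if every port moves one word per cycle."""
    bits_per_second = num_blocks * ports_per_block * port_width_bits * clock_hz
    return bits_per_second / 1e9


# Hypothetical device: 100 dual-port blocks, 18-bit ports, 200 MHz clock.
example = aggregate_bandwidth_gbps(
    num_blocks=100, ports_per_block=2, port_width_bits=18, clock_hz=200_000_000
)
print(f"{example:.0f} Gbit/s")
```

With these assumed numbers the peak is 720 Gbit/s, reachable only if the design keeps all ports busy on every cycle.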
Multipliers and DSP blocks
As FPGAs became larger, some people tried to implement DSP algorithms on them
• However: multipliers take too much area
• Therefore: hardwired multipliers were introduced
DSP algorithms are often based on
• multiply & add
• multiply & accumulate
DSP blocks in modern FPGAs implement hardwired:
• multiply, multiply & add, multiply & accumulate
• optional addition before multiplying
• three-input add
• 1 large, 2 medium or 4 small operations on the same hardware
• shifting, comparisons, bit-wise operations, …
Up to 2000 DSP blocks in current FPGAs for massive parallelism
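A minimal software model of the multiply & accumulate pattern, with the optional addition before multiplying, might look as follows. The function `dsp_block` and its argument names are hypothetical and do not correspond to the ports of a specific vendor's DSP block.

```python
def dsp_block(a, b, c=0, d=0):
    """Model of one DSP operation: (a + d) * b + c.

    d is the optional pre-adder input, c the value added after multiplying
    (the running sum when accumulating).
    """
    return (a + d) * b + c


def fir_output(samples, coeffs):
    """One FIR filter output as a chain of multiply & accumulate steps."""
    acc = 0
    for x, h in zip(samples, coeffs):
        acc = dsp_block(x, h, c=acc)
    return acc


print(fir_output([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6
print(dsp_block(2, 3, c=4, d=1))         # (2 + 1) * 3 + 4
```

In hardware each step of the loop could be mapped to its own DSP block, so that the products are computed in parallel rather than one after another.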
Microprocessors
Xilinx
• IBM PowerPC processors
  • Virtex-II Pro
  • Virtex-4 FX
  • Virtex-5 FX
• MicroBlaze soft processors
Altera
• ARM RISC processors
• Nios soft processor
Floating point units
Not implemented so far
• Suggested to help accelerate scientific computing
• For engineering, fixed-point arithmetic is usually enough
Will it happen?
☺ It happened with multipliers, transceivers, DSP blocks, …
GPUs already have a strong position in this field
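To illustrate why fixed-point arithmetic is often enough, the sketch below represents fractional values as scaled integers, so that a multiplication needs only an integer multiplier and a shift. The Q15 format (15 fractional bits) is an assumption chosen for the example; the slides do not name a format, and overflow handling is left out.

```python
FRACTIONAL_BITS = 15  # assumed Q15 format: value = integer / 2**15


def to_fixed(x):
    """Convert a real number to its fixed-point integer representation."""
    return int(round(x * (1 << FRACTIONAL_BITS)))


def to_float(v):
    """Convert a fixed-point integer back to a real number."""
    return v / (1 << FRACTIONAL_BITS)


def fixed_mul(a, b):
    """Multiply two fixed-point numbers: integer product, then rescale by a shift."""
    return (a * b) >> FRACTIONAL_BITS


half = to_fixed(0.5)
quarter = to_fixed(0.25)
print(half, quarter)                        # integer representations
print(to_float(fixed_mul(half, quarter)))   # product as a real number
```

Here 0.5 and 0.25 are stored as 16384 and 8192, and their product comes back as 0.125 without any floating-point hardware being involved in the multiplication itself.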
Performance
Compared to an ASIC
• 10 times slower, larger and more power hungry
Compared to a microprocessor
• Fast, depending on:
  • Potential parallelism
  • Required bandwidth
• Small and simple, even standalone
• Reduced power consumption (< 1 W); they may run on batteries
Design effort
Several scenarios:
Pure VHDL or Verilog coding
• Higher flexibility, efficiency and performance
• Long design time
• Costly debugging
Macros combined with VHDL or Verilog
• Libraries of IP blocks ease the design process
• There is no guarantee that the required functionality can be found
High-level languages (DSP logic (Matlab), Impulse-C, Handel-C, …)
• Efficient and simple implementation for simple algorithms
• Lack of expressiveness for complex algorithms
Choices
Xilinx
• Virtex
• Spartan
Altera
• Stratix
• Cyclone
Others
• Actel
• Lattice Semiconductor
• …
Choices - Xilinx
| | Spartan 3 | Spartan 6 | Virtex 6 |
|---|---|---|---|
| Logic cells | 1728 – 74880 | 3840 – 147443 | 74496 – 566784 |
| Block RAM (Kbits) | 12 – 1872 | 216 – 4824 | 5616 – 32832 |
| Multipliers / DSP | 4 – 104, 84 – 126 | 8 – 180 | 288 – 2016 |
| Evaluation board cost | < $200 | $300 – $1000 | $2000 – $2500 |
In the context of these applications
Device choice
• Logic bounded
  • Standard logic
  • Multipliers
• IO bounded
Parallel acquisition
• Switching memory blocks for acquisition and computation
High computing speed
• Via pipelining
Results storage
• Internal or external memory
Power consumption
Configuration
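Switching memory blocks between acquisition and computation is commonly called double (or ping-pong) buffering. The sketch below models the idea sequentially: one block fills with incoming samples while the other, already full, is handed to the computation. In an FPGA both activities would run at the same time; the function name, block size and `compute` callback are hypothetical.

```python
def ping_pong(stream, block_size, compute):
    """Process a sample stream with two alternating memory blocks.

    Samples are written into one block; when it is full, the roles switch and
    the full block is passed to `compute`. Leftover samples that do not fill a
    block are not processed.
    """
    buffers = [[], []]
    write_idx = 0
    results = []
    for sample in stream:
        buffers[write_idx].append(sample)
        if len(buffers[write_idx]) == block_size:
            full_idx = write_idx
            write_idx ^= 1            # acquisition switches to the other block
            buffers[write_idx] = []   # which is cleared before being refilled
            results.append(compute(buffers[full_idx]))
    return results


print(ping_pong(range(8), block_size=4, compute=sum))
```

With eight samples and blocks of four, the computation runs twice, once per full block, while the acquisition side never has to wait for more than a block switch.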