![Page 1: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/1.jpg)
Quantifying the Cost and Benefit of Latency Insensitive Communication on FPGAs
Kevin E. Murray and Vaughn Betz
1
![Page 2: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/2.jpg)
Motivation
2
![Page 6: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/6.jpg)
Local Communication Speed Improving
3
>2.4x
![Page 10: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/10.jpg)
Global Communication Speed Not Improving
4
3.6x
1.3x
System-level timing closure is increasingly difficult!
![Page 17: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/17.jpg)
System Level Timing Closure Issues
5
[Flowchart: Identify Critical Path → Insert Register → Modify & Verify Control Logic → Physical CAD → Closed Timing? (Yes: Done; No: repeat)]
• RTL changes are error prone
• Re-compile can take hours or days
• No guarantee of convergence
![Page 18: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/18.jpg)
Key Problem & Potential Solutions
6
• Limited by Synchronous Assumption:
  • Computation and communication occur in a single clock cycle (if not pipelined)
  • Reasonable when local-global speed gap was small, but not when large
• Many different proposed design schemes:
  • Over-pipelining
  • Asynchronous
  • Globally Asynchronous Locally Synchronous (GALS)
  • Latency Insensitive
![Page 19: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/19.jpg)
What is Latency Insensitive Design?
7
![Page 20: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/20.jpg)
Latency Insensitive System Implementation
8
• Key Idea: Make sub-modules insensitive to their communication latency
• Create an LI module by placing a designer’s synchronous module (Pearl) in a wrapper (Shell), which is also synchronous
• Use Relay Stations (RS) to pipeline interconnect
• Deadlock free and applicable to (nearly) any synchronous module [1]
[Figure: Logical System vs. LI Implementation]
[1] Carloni et al., “Theory of Latency-Insensitive Design”, TCAD, 2001
![Page 21: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/21.jpg)
Latency Insensitive Communication Protocol
9
• Tag each module port with ‘Valid’ and ‘Stop’ bits
• Pearl is paused until all inputs are ‘valid’
• ‘Stop’ signal provides back-pressure to prevent FIFO overflow
[Figure: Sender and Receiver linked by Data and Valid signals feeding a FIFO, with a Stop signal for back-pressure]
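The valid/stop handshake described above can be sketched as a small cycle-by-cycle model. This is a minimal illustration, not the authors' implementation: the function name `simulate_link`, the `fifo_depth` parameter, and the choice to sample 'Stop' at the start of each cycle are all assumptions made for the example.

```python
from collections import deque


def simulate_link(items, fifo_depth, receiver_ready):
    """Toy model of one sender -> FIFO -> receiver link (hypothetical helper).

    items          : data words the sender wants to transmit, in order
    fifo_depth     : capacity of the receiver-side FIFO
    receiver_ready : one boolean per cycle; True means the receiver
                     consumes a word that cycle if the FIFO is non-empty
    Returns the words received and the peak FIFO occupancy.
    """
    fifo = deque()
    pending = deque(items)
    received = []
    peak = 0
    for ready in receiver_ready:
        # 'Stop' is asserted while the FIFO is full at the start of the cycle.
        stop = len(fifo) >= fifo_depth
        # Receiver side: consume one word when ready and data is available.
        if ready and fifo:
            received.append(fifo.popleft())
        # Sender side: drive 'Valid' only when it has data and is not stopped.
        valid = bool(pending) and not stop
        if valid:
            fifo.append(pending.popleft())
        peak = max(peak, len(fifo))
    return received, peak


if __name__ == "__main__":
    words, peak = simulate_link(
        items=[1, 2, 3, 4, 5, 6],
        fifo_depth=2,
        receiver_ready=[False] * 4 + [True] * 10,
    )
    print(words, peak)
```

In this model the sender simply holds its data while 'Stop' is high, so words are delayed but never dropped or reordered, and the FIFO never exceeds its depth.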
![Page 31: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/31.jpg)
Latency Insensitive Communication Protocol
10
[Animation: three LI modules (Shell A, Shell B, Shell C), each a Shell wrapping a Pearl, with a “Backpressure” label]
![Page 33: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/33.jpg)
Latency Insensitive Design
11
Advantages:
• Interconnect pipelining does not affect correctness
• Designers can still reason about system synchronously
• Easy to pipeline late in the design flow
• Enhanced module re-use & composability
• Suitable for automation
• Use existing CAD tools
![Page 36: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/36.jpg)
Interconnect Pipelining Automation
12
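[Flowcharts: the manual flow (Identify Critical Path → Insert Register → Modify & Verify Control Logic → Physical CAD → Closed Timing? Yes: Done; No: repeat) beside the flow labelled “LI Physical CAD” (Identify Critical Path → Insert Register → Physical CAD → Closed Timing? Yes: Done; No: repeat), which has no Modify & Verify Control Logic step]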
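The automated loop can be illustrated with a toy model. This sketch rests on loud assumptions that are not from the slides: link delays are given as plain numbers, each inserted relay station splits a link into equal-delay segments, and relay-station overhead is ignored. The function name `close_timing` and its parameters are hypothetical.

```python
def close_timing(link_delays_ns, target_period_ns, max_iterations=1000):
    """Insert relay stations until every link segment meets the target period.

    link_delays_ns   : dict mapping a link name to its unpipelined delay
    target_period_ns : clock period every segment must fit into
    Returns a dict mapping each link name to its relay station count.
    """
    stations = {name: 0 for name in link_delays_ns}

    def segment_delay(name):
        # Assumption: stations split the link into equal-delay segments.
        return link_delays_ns[name] / (stations[name] + 1)

    for _ in range(max_iterations):
        # "Identify Critical Path": the slowest remaining segment.
        critical = max(stations, key=segment_delay)
        # "Closed Timing?": done once the slowest segment fits the period.
        if segment_delay(critical) <= target_period_ns:
            return stations
        # "Insert Register": no control-logic change is needed, because
        # LI modules tolerate the extra cycle of latency.
        stations[critical] += 1
    raise RuntimeError("timing did not close within max_iterations")


if __name__ == "__main__":
    result = close_timing({"A->B": 7.0, "B->C": 2.0, "C->A": 4.5}, 2.5)
    print(result)
```

The point of the sketch is that the loop body contains no manual RTL edit, which is what makes it a candidate for folding into the physical CAD tools.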
![Page 38: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/38.jpg)
Latency Insensitive Design
13
Advantages:
• Interconnect pipelining does not affect correctness
• Designers can still reason about system synchronously
• Easy to pipeline late in the design flow
• Enhanced module re-use & composability
• Suitable for automation
• Use existing CAD tools
• Fold interconnect pipelining into physical CAD Tools [Future work]
Disadvantages:
• Area/Speed overhead versus hand-tuned design
• Must verify sufficient throughput
![Page 39: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/39.jpg)
Latency Insensitive Design
14
Trade off:
• Implementation efficiency for designer productivity
Key question of this work:
• What are the overheads of LI design on FPGAs?
![Page 40: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/40.jpg)
Latency Insensitive Implementation
15
![Page 42: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/42.jpg)
Baseline Shell Implementation
16
• ASIC LI design stalls modules by clock gating
  • Use ‘Clock Enable’ on FPGAs
• ‘Clock Enable’ becomes timing critical
  • Fans out to all registers in pearl
  • Connected to upstream and downstream modules
[Figure label: High Fan-out]
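A behavioural sketch of the baseline shell may help show why the enable signal is so widely connected. It is an assumption-laden model rather than the authors' RTL: the class name `Shell`, the per-port `deque` FIFOs, and the `(valid, data)` return convention are all invented for illustration.

```python
from collections import deque


class Shell:
    """Wraps a pearl (a plain function of its inputs) with input FIFOs."""

    def __init__(self, pearl, num_inputs, fifo_depth=2):
        self.pearl = pearl
        self.fifos = [deque() for _ in range(num_inputs)]
        self.fifo_depth = fifo_depth

    def stop(self, port):
        # Back-pressure to the upstream module feeding this port.
        return len(self.fifos[port]) >= self.fifo_depth

    def push(self, port, word):
        # A well-behaved sender checks stop() first; overflow is an error.
        if self.stop(port):
            raise OverflowError("sender ignored 'Stop'")
        self.fifos[port].append(word)

    def clock(self, downstream_stop=False):
        """Advance one cycle and return (valid, data) for the output port."""
        # The enable depends on every input port and on the downstream
        # module, and in hardware it drives every register in the pearl.
        enable = all(self.fifos) and not downstream_stop
        if not enable:
            return False, None  # pearl is paused this cycle
        args = [fifo.popleft() for fifo in self.fifos]
        return True, self.pearl(*args)


if __name__ == "__main__":
    adder = Shell(lambda a, b: a + b, num_inputs=2)
    adder.push(0, 1)
    print(adder.clock())   # only one input is valid, so the pearl is paused
    adder.push(1, 10)
    print(adder.clock())   # both inputs valid, so the pearl fires
```

Because `enable` is computed from all input FIFOs plus the downstream stop, a single combinational signal touches upstream state, downstream state, and the whole pearl, which is the high fan-out path the slide points at.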
![Page 46: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/46.jpg)
Optimized Shell Implementation
17
• Break timing path before it becomes high fan-out
• Insert additional registers in Shell
• Improves Timing
• Adds additional cycle of latency to shell
[Figure labels: Single-bit, Multi-bit]
![Page 47: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/47.jpg)
Relay Station (RS) Implementation
18
• Analogous to conventional pipeline register
• Additional logic to:
  • Handle ‘valid’ and ‘stop’ bits
  • Store in-flight data when facing backpressure (avoids stalling)
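The need to store in-flight data can be seen in a small model. This is a sketch under stated assumptions, not the design evaluated in the talk: it assumes a two-slot relay station whose stop output is registered, so the upstream sender only sees back-pressure one cycle late. `RelayStation`, `run_relay`, and the use of `None` for "no valid word" are hypothetical choices.

```python
from collections import deque


class RelayStation:
    """Two-slot relay station with a registered stop output (toy model)."""

    def __init__(self):
        self.slots = deque()    # main register plus one spare slot
        self.stop_out = False   # stop seen by the upstream sender

    def clock(self, in_valid, in_data, stop_in):
        """Advance one cycle; returns the forwarded word or None."""
        out = None
        # Forward the oldest stored word unless downstream says stop.
        if self.slots and not stop_in:
            out = self.slots.popleft()
        # Accept a new word if upstream was not told to stop last cycle.
        if in_valid and not self.stop_out:
            self.slots.append(in_data)
        # Register the stop: upstream only reacts in the next cycle, so
        # the word already in flight lands in the spare slot.
        self.stop_out = stop_in
        return out


def run_relay(words, downstream_stop):
    """Drive one relay station; words must not contain None."""
    rs = RelayStation()
    pending = deque(words)
    delivered, peak = [], 0
    for stop_in in downstream_stop:
        send = bool(pending) and not rs.stop_out
        data = pending.popleft() if send else None
        out = rs.clock(send, data, stop_in)
        if out is not None:
            delivered.append(out)
        peak = max(peak, len(rs.slots))
    return delivered, peak


if __name__ == "__main__":
    stops = [False, False, True, True] + [False] * 6
    print(run_relay([1, 2, 3, 4, 5], stops))
```

With the stop pattern in the usage example, the station briefly holds two words while the stop propagates upstream, and every word is still delivered in order.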
![Page 48: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/48.jpg)
FIR Design Example
19
![Page 49: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/49.jpg)
Cascaded FIR Case Study
20
• Pearl: FIR filter
• Design: 49 cascaded FIR filters
• Used as a high speed design example
  • Investigate the frequency impact of LI design
  • Allow comparison of LI and non-LI pipelining

| Resource | EP4SGX230 Utilization |
|---|---|
| Logic Blocks | 51% |
| DSP Blocks | 99% |
| M9K Blocks | <1% |
| M144K Blocks | 0% |
![Page 50: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/50.jpg)
Cascaded FIR Case Study - Area
21
[Figure: relative area; data labels 1.08x, 1.09x, 1.00x]
![Page 51: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/51.jpg)
Cascaded FIR Case Study - Frequency
22
[Figure: relative frequency; data labels 0.67x, 0.92x, 1.00x]
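The normalized frequencies convert directly into the overhead percentages quoted in the conclusion (33% and 8%). The snippet below only does that arithmetic; reading 0.67x as the baseline LI shell and 0.92x as the optimized one is an assumption based on the conclusion slide, and `overhead_percent` is a hypothetical helper name.

```python
def overhead_percent(normalized_frequency):
    """Frequency overhead relative to the 1.00x reference, in percent."""
    return round((1.0 - normalized_frequency) * 100)


if __name__ == "__main__":
    for label, ratio in [("0.67x", 0.67), ("0.92x", 0.92), ("1.00x", 1.00)]:
        print(label, "->", overhead_percent(ratio), "% overhead")
```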
![Page 53: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/53.jpg)
Cascaded FIR Case Study - Frequency
23
[Figure annotations: High Speed Solutions; 26%]
![Page 54: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/54.jpg)
Cascaded FIR Case Study - Frequency
24
![Page 56: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/56.jpg)
Cascaded FIR Case Study - Frequency
25
42%
![Page 57: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/57.jpg)
Cascaded FIR Case Study - Frequency
26
![Page 58: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/58.jpg)
Cascaded FIR Case Study - Frequency
27
![Page 59: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/59.jpg)
Cascaded FIR Case Study - Frequency
28
![Page 61: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/61.jpg)
Pipelining Overhead Cause
29
• Extra control logic adds delay overhead to each Shell or RS
Increased effective Tsu
![Page 62: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/62.jpg)
Generalized LI Scaling
30
![Page 63: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/63.jpg)
Generalized LI Shell Scaling
31
• Identify what makes Shells expensive
  • Leads to design guidelines to minimize overhead
• Consider impact of scaling three main shell characteristics
  • FIFO Depth
  • Number of Input Ports
  • Port Width
![Page 65: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/65.jpg)
FIFO Depth Scaling
33
• Scaling FIFO Depth costs minimal area
• Block RAMs are underused at shallow depths
Use deep FIFOs to minimize stalling
![Page 70: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/70.jpg)
Port Width and Input Port Scaling
35
• Increasing port width or input ports costs significant area
• Frequency degrades faster as input ports are increased
[Figure labels: 2048 input bits, 160 input bits]
Favour wider ports instead of more ports to maximize frequency
![Page 71: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/71.jpg)
Granularity
36
![Page 72: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/72.jpg)
LI Design Granularity
37
• How fine or coarse should we make LI Systems?
• Trade-off between:
  • Flexibility and productivity benefits
  • Area overhead
• Local communication (e.g. 40K LEs) is still fast
• Flexibility most beneficial for slow system-level communication
![Page 80: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/80.jpg)
Rent’s Rule
38
• Use Rent’s Rule to relate design size to pin count: $P = K N^{R}$
  • P: pins on the design, K: pins per block, N: blocks in the design
  • R: Rent parameter
  • Typical circuits: 0.50 < R < 0.75
[Figure: example designs with R = 0.0 and R = 1.0]
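Rent's Rule is easy to evaluate numerically, which makes the two extremes on the slide concrete. The sketch below assumes the standard form P = K * N^R; the function name `rent_pins` and the sample values of K and N are illustrative only.

```python
def rent_pins(k, n, r):
    """External pin count P = K * N**R for N blocks with K pins each."""
    return k * n ** r


if __name__ == "__main__":
    k, n = 4, 1024  # hypothetical: 4 pins per block, 1024 blocks
    for r in (0.0, 0.5, 0.75, 1.0):
        print(f"R = {r:.2f}: P = {rent_pins(k, n, r):.0f}")
```

With R = 0.0 the pin count stays at K no matter how large the design grows, while with R = 1.0 every block pin becomes an external pin (P = K * N). Typical circuits fall between these extremes, so pin count, and with it the shell's port cost, grows sub-linearly with module size.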
![Page 83: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/83.jpg)
Rent’s Rule Overhead Projections
39
• Combine shell area scaling numbers for various design sizes and ranges of Rent parameters
[Figure label: Cascaded FIR (2.4K LE)]
![Page 85: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/85.jpg)
Hypothetical Design Example 20% Overhead
40
• Consider a 4M LE FPGA at 20% area overhead
[Figure labels: 71 Modules (56K LEs); 307 Modules (13K LEs); 5 Modules (700K LEs)]
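The module counts on this slide are consistent with dividing the 4M LE device by each module size and rounding down. That reading is an inference from the numbers, not something the slides state, and `module_count` is a hypothetical helper.

```python
def module_count(device_les, module_les):
    """Whole modules of a given size that fit in the device."""
    return device_les // module_les


if __name__ == "__main__":
    device = 4_000_000  # 4M LE FPGA from the slide
    for size in (56_000, 13_000, 700_000):
        print(f"{size // 1000}K LE modules: {module_count(device, size)}")
```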
![Page 86: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/86.jpg)
LI Design Granularity
41
• Area overhead is strongly related to communication locality (Rent Parameter)
• Designs with well localized communication will result in low overhead
• Rent parameter varies within different parts of a design
• Careful choice of module boundaries may further reduce overhead
![Page 87: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/87.jpg)
Conclusion and Future Work
42
![Page 88: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/88.jpg)
Conclusion
43
• Illustrated the growing gap between local and global communication speed
  • 3.6x and growing
• Developed optimized LI building blocks for FPGAs
  • Reduced frequency overhead from 33% to 8%
• Quantified the area and frequency overhead of LI communication on FPGAs
• Provided design guidelines to minimize the overheads of LI communication
![Page 89: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/89.jpg)
Future Work
44
• Explore the benefits of LI design
  • LI aware CAD Tools
• Investigate architectural enhancements
  • Hardened FIFOs
  • Fine-grained clock gating
  • Embedded NoC
• Evaluate LI design on a broader range of designs
• Develop lower area/speed overhead LI design techniques
![Page 90: Quantifying the Cost and Benefit of Latency Insensitive](https://reader034.vdocuments.us/reader034/viewer/2022051403/627c93fce072062acc3d4f76/html5/thumbnails/90.jpg)
Thanks! Questions?
Email: [email protected]