Before Placement: Clustering
![Page 1: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/1.jpg)
ECE 506
Reconfigurable Computing
http://www.ece.arizona.edu/~ece506
Lecture 6
Clustering
Ali Akoglu
![Page 2: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/2.jpg)
Before Placement: Clustering
° Intra-cluster connections: fast
° Inter-cluster connections: slow
° Need to pack BLEs
° Goals:
• Reduce stress on routing
• Take advantage of local fast interconnect
• Reduce inter-cluster wiring
• Minimize critical path (timing-driven)
° How do we do this?
• Take advantage of cluster architecture
° Tradeoffs
![Page 3: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/3.jpg)
Basic Clustering (Betz)
° How many distinct inputs should be provided to a cluster of N 4-LUTs?
° How many 4-LUTs should be included in a cluster to create the most area-efficient logic block?
![Page 4: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/4.jpg)
VPACK
![Page 5: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/5.jpg)
Basic Clustering (Betz)
° Flow
• Iterate until all BLEs are consumed
• Start a new cluster by selecting a seed BLE
- Select the currently unclustered BLE with the most used inputs
• Add the BLE that shares the most inputs with the current cluster
- This minimizes the number of inputs that must be routed to each cluster
• Keep adding until either the cluster is full or its input pins are used up
• Hill climbing, if some BLE slots in the cluster are still unused
- Add another BLE even if the cluster input count temporarily overflows
- If the input count is not eventually reduced, select the best choice from before hill climbing
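The greedy part of this flow can be sketched in a few lines of Python. This is a minimal illustration, not the actual VPack code: the netlist format (a dict with `in` and `out` net names per BLE), the function names, and the example netlist are all assumptions made for this sketch, and the hill-climbing step is left out.

```python
def external_inputs(members, bles):
    """Nets that must be routed into the cluster from outside."""
    used = set().union(*(bles[b]["in"] for b in members))
    produced = {bles[b]["out"] for b in members}
    return used - produced


def greedy_pack(bles, cluster_size, max_inputs):
    """Greedy clustering sketch (no hill climbing).

    bles: hypothetical netlist, {name: {"in": set of nets, "out": net}}.
    """
    unclustered = set(bles)
    clusters = []
    while unclustered:
        # Seed: the unclustered BLE with the most used inputs.
        seed = max(sorted(unclustered), key=lambda b: len(bles[b]["in"]))
        cluster = [seed]
        unclustered.remove(seed)
        while len(cluster) < cluster_size:
            # All nets touching the current cluster (inputs and outputs).
            cluster_nets = set().union(*(bles[b]["in"] for b in cluster))
            cluster_nets |= {bles[b]["out"] for b in cluster}
            best, best_shared = None, -1
            for cand in sorted(unclustered):
                # Skip candidates that would exceed the input pin limit.
                if len(external_inputs(cluster + [cand], bles)) > max_inputs:
                    continue
                cand_nets = bles[cand]["in"] | {bles[cand]["out"]}
                shared = len(cand_nets & cluster_nets)
                if shared > best_shared:
                    best, best_shared = cand, shared
            if best is None:
                break  # no legal candidate left for this cluster
            cluster.append(best)
            unclustered.remove(best)
        clusters.append(cluster)
    return clusters


# Hypothetical four-BLE netlist, used only for illustration.
EXAMPLE_BLES = {
    "a": {"in": {"n1", "n2", "n3"}, "out": "x"},
    "b": {"in": {"n1", "n2"}, "out": "y"},
    "c": {"in": {"x", "y"}, "out": "z"},
    "d": {"in": {"n7", "n8"}, "out": "w"},
}
print(greedy_pack(EXAMPLE_BLES, cluster_size=2, max_inputs=4))
```

In this example BLE `a` is the seed because it uses the most inputs, and `b` joins it because it shares two nets with the cluster. A hill-climbing variant would also keep candidates that temporarily exceed `max_inputs` and roll back if the overflow is never resolved.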
![Page 6: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/6.jpg)
Logic Utilization
![Page 7: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/7.jpg)
Number of Inputs per Cluster
• Lots of opportunities for input sharing in large clusters (Betz – CICC’99)
• Reducing inputs reduces the size of the device and makes it faster.
• Most FPGA devices (Xilinx, Lucent) have 4 BLEs per cluster, with more inputs than actually needed.
![Page 8: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/8.jpg)
TVPACK
![Page 9: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/9.jpg)
Architecture Modeling
° Tri-state buffer and pass transistor distribution
° Cluster size vs. routing resources (tile size)
° Transistor and buffer scaling based on segment length
° Flexibility of switches (Fc=W for large cluster size is a waste?)
![Page 10: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/10.jpg)
Logic Cluster Structure
![Page 11: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/11.jpg)
Timing-Driven Clustering – T-VPACK
° Optimization goals of VPack
• Pack each cluster to its capacity
- Minimize number of clusters
• Minimize number of inputs per cluster
- Reduce the number of external connections
![Page 12: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/12.jpg)
Timing-Driven Clustering – T-VPACK
° Optimization goal of T-VPack
• Minimize number of external connections on the critical path
• Why?
- External connections have higher delay than internal connections
- Reducing the number of external nets on the critical path will reduce delay
![Page 13: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/13.jpg)
Timing-Driven Clustering – T-VPACK
° First stage
• Identify connections that are on the critical path
° Second stage
• Pack BLEs sequentially along the critical path
• Recompute criticality of remaining BLEs
![Page 14: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/14.jpg)
Slack and Criticality Calculation
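[Figure: timing graph from primary inputs PI1, PI2, PI3 to primary outputs PO1, PO2, PO3. Delay labels in the figure: 1, 3, 1, 4, 6, 4, 6, 6, 5, 5, 7, 4.]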
![Page 15: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/15.jpg)
Slack and Criticality Calculation
[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Arrival times shown so far: 0, 0, 0.]
![Page 16: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/16.jpg)
Slack and Criticality Calculation
[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Arrival-time labels shown so far: 0, 0, 0, 1, 3, 3, 3, 1.]
![Page 17: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/17.jpg)
Slack and Criticality Calculation
[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Arrival times shown so far: 0, 0, 0, 1, 3, 1, 7, 9, 7, 7.]
![Page 18: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/18.jpg)
Slack and Criticality Calculation
[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Arrival times shown so far: 0, 0, 0, 1, 3, 1, 7, 9, 7, 7, 13, 15, 14.]
![Page 19: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/19.jpg)
Slack and Criticality Calculation
[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Arrival times: 0, 0, 0, 1, 3, 1, 7, 9, 7, 7, 13, 15, 14, 18, 22, 18.]
![Page 20: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/20.jpg)
Slack and Criticality Calculation
[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Arrival times: 0, 0, 0, 1, 3, 1, 7, 9, 7, 7, 13, 15, 14. Arrival time/required time pairs shown so far: 22/22, 18/22, 18/22.]
![Page 21: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/21.jpg)
Slack and Criticality Calculation
[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Arrival times still without a required time: 0, 0, 0, 1, 3, 1, 7, 9, 7, 13. Arrival time/required time pairs shown so far: 7/15, 14/18, 22/22, 18/22, 18/22, 15/15.]
![Page 22: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/22.jpg)
Slack and Criticality Calculation
[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Arrival times still without a required time: 0, 0, 0, 1, 3, 1, 7, 9, 13. Arrival time/required time pairs shown so far: 7/13, 14/18, 22/22, 18/22, 18/22, 15/15, 7/15.]
![Page 23: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/23.jpg)
Slack and Criticality Calculation
[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Arrival times still without a required time: 0, 0, 0, 1, 3, 1, 7, 9. Arrival time/required time pairs shown so far: 7/13, 13/15, 14/18, 22/22, 18/22, 18/22, 15/15, 7/15.]
![Page 24: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/24.jpg)
Slack and Criticality Calculation
[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Arrival times still without a required time: 0, 0, 0, 1, 3, 1. Arrival time/required time pairs shown so far: 7/9, 9/9, 7/13, 13/15, 14/18, 22/22, 18/22, 18/22, 15/15, 7/15.]
![Page 25: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/25.jpg)
Slack and Criticality Calculation
[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Arrival times still without a required time: 0, 0, 0. Arrival time/required time pairs shown so far: 1/5, 3/3, 1/9, 7/9, 9/9, 7/13, 13/15, 14/18, 22/22, 18/22, 18/22, 15/15, 7/15.]
![Page 26: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/26.jpg)
Slack and Criticality Calculation
[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Arrival time/required time pairs: 0/4, 0/0, 0/8, 1/5, 3/3, 1/9, 7/9, 9/9, 7/13, 13/15, 14/18, 22/22, 18/22, 18/22, 15/15, 7/15.]
Slack = required time - arrival time
![Page 27: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/27.jpg)
Slack and Criticality Calculation
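[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels). Slack values, in the same order as the arrival time/required time pairs: 4, 0, 8, 4, 0, 8, 2, 0, 6, 2, 4, 0, 4, 4, 0, 8.]
Slack = required time - arrival time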
![Page 28: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/28.jpg)
Slack and Criticality Calculation
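[Figure: timing graph (PI1-PI3 to PO1-PO3, same delay labels) with the critical path marked. Arrival time/required time pairs: 0/4, 0/0, 0/8, 1/5, 3/3, 1/9, 7/9, 9/9, 7/13, 13/15, 14/18, 22/22, 18/22, 18/22, 15/15, 7/15. The zero-slack entries are 0/0, 3/3, 9/9, 15/15, 22/22.]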
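The slack calculation walked through on these slides can be sketched as a forward pass for arrival times followed by a backward pass for required times. The graph below is a small hypothetical example, not the graph from the slides (its topology is not recoverable from the transcript). The function name, the edge format, and the choice to set every output's required time to the largest arrival time are assumptions of this sketch.

```python
from collections import defaultdict


def timing_analysis(edges):
    """Compute arrival time, required time and slack per node.

    edges: list of (source, sink, delay) tuples describing a DAG.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, d in edges:
        succ[u].append((v, d))
        indeg[v] += 1
        nodes.update((u, v))

    # Topological order (Kahn's algorithm), starting from the primary inputs.
    ready = sorted(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for v, _ in succ[n]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # Forward pass: arrival time is the longest path from any primary input.
    arrival = {n: 0 for n in nodes}
    for n in order:
        for v, d in succ[n]:
            arrival[v] = max(arrival[v], arrival[n] + d)

    # Backward pass: every output must settle by the largest arrival time.
    d_max = max(arrival.values())
    required = {n: d_max for n in nodes}
    for n in reversed(order):
        for v, d in succ[n]:
            required[n] = min(required[n], required[v] - d)

    slack = {n: required[n] - arrival[n] for n in nodes}
    return arrival, required, slack


# Hypothetical graph: three inputs, two internal nodes, two outputs.
EXAMPLE_EDGES = [
    ("PI1", "A", 1), ("PI2", "A", 3), ("PI2", "B", 3), ("PI3", "B", 1),
    ("A", "PO1", 4), ("A", "PO2", 6), ("B", "PO2", 5),
]
arrival, required, slack = timing_analysis(EXAMPLE_EDGES)
critical = sorted(n for n in slack if slack[n] == 0)
print(critical)
```

In this example the zero-slack nodes are `PI2`, `A` and `PO2`, which form the critical path with a total delay of 9.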
![Page 29: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/29.jpg)
Timing-Driven Clustering – T-VPACK
° Cost metric now considers both connectivity and timing criticality
° Perform an analysis of criticality at beginning considering all wires to be inter-cluster
° Determine “Base” BLE criticality
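One way to turn slack into a criticality value and combine it with connectivity is sketched below. It assumes the commonly used normalization criticality = 1 - slack / max_slack and a simple weighted sum for the cost. The function names, the weight `alpha` and its default of 0.75, and the sample slack values are assumptions for illustration, not details taken from these slides.

```python
def connection_criticality(slacks):
    """Map slack to a criticality in [0, 1]; zero slack gives 1.0.

    slacks: dict of {connection name: slack}.
    Assumes criticality = 1 - slack / max_slack.
    """
    max_slack = max(slacks.values())
    if max_slack == 0:
        # Every connection is on a critical path.
        return {name: 1.0 for name in slacks}
    return {name: 1.0 - s / max_slack for name, s in slacks.items()}


def attraction(criticality, shared_nets, max_nets, alpha=0.75):
    """Weighted cost combining timing criticality and net sharing.

    alpha is a hypothetical weight: 1.0 means timing only,
    0.0 means connectivity only.
    """
    return alpha * criticality + (1.0 - alpha) * shared_nets / max_nets


# Hypothetical slack values for four connections.
crit = connection_criticality({"c1": 0, "c2": 4, "c3": 8, "c4": 2})
print(crit)

# A critical BLE sharing one net vs. a half-critical BLE sharing four nets.
print(attraction(crit["c1"], shared_nets=1, max_nets=4))
print(attraction(crit["c2"], shared_nets=4, max_nets=4))
```

With these numbers the critical candidate scores 0.8125 and the heavily shared but less critical one scores 0.625, so timing dominates while connectivity still contributes.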
![Page 30: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/30.jpg)
Base Criticality
![Page 31: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/31.jpg)
How to break ties?
° Initially, many paths may have the same number of BLEs
° Include “tie-breaking” in performance cost function
![Page 32: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/32.jpg)
Results for T-VPACK versus VPACK
Why does the gap between VPack and T-VPack increase as N increases?
![Page 33: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/33.jpg)
Results for T-VPACK versus VPACK
° T-VPack prefers to cluster a BLE with BLEs that are in its fan-in or fan-out
° VPack favors input sharing
° T-VPack completely absorbs many low-fanout nets
• Fewer nets to route!
![Page 34: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/34.jpg)
Results for T-VPACK versus VPACK
Why does area-delay product show an increasing trend beyond cluster size of 10?
![Page 35: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/35.jpg)
Results for T-VPACK versus VPACK
° Increased number of nets that are completely absorbed by T-VPack
° Area-delay product
• Cluster size 7-10 is the best choice (36-34% better than N=1)
° N=7 vs N=1
• 30% less delay, 8% less area
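The N=7 figures are consistent with the quoted area-delay improvement, as a quick check shows. The helper name below is made up for this sketch; it only multiplies the two relative factors.

```python
def area_delay_improvement(area_reduction, delay_reduction):
    """Relative improvement of the area-delay product.

    Both arguments are fractions, e.g. 0.08 for 8% less area.
    """
    return 1.0 - (1.0 - area_reduction) * (1.0 - delay_reduction)


# N=7 vs N=1: 8% less area and 30% less delay.
improvement = area_delay_improvement(0.08, 0.30)
print(f"{improvement:.1%}")
```

Since 0.92 x 0.70 = 0.644, the area-delay product drops by about 36%, which matches the upper end of the range quoted for cluster sizes 7-10.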
![Page 36: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/36.jpg)
Results for T-VPACK, DELAY !!!
Why do we see a circuit speedup?
![Page 37: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/37.jpg)
Results for T-VPACK, DELAY !!!
[Plot annotations: 18%, 40%]
° Intra-cluster: fast, inter-cluster: slow!
° As N increases
• Number of internal connections on the critical path increases
• Number of external connections on the critical path decreases
![Page 38: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/38.jpg)
Why are inter-cluster connections becoming faster?
Reduction in number of external connections (internal connections are faster)
External connections on the critical path are becoming faster
Reduction in routing requirements
![Page 39: Before Placement: Clustering](https://reader036.vdocuments.us/reader036/viewer/2022062305/5681621e550346895dd2447d/html5/thumbnails/39.jpg)
Drawback of VPack and T-VPack