![Page 1: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/1.jpg)
Stratix™10 High Performance Routable Clock NetworksCarl Ebeling, Dana How, Herman Schmit, David Lewis
![Page 2: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/2.jpg)
In the beginning, there was an FPGA
2
and clocks were simple
![Page 3: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/3.jpg)
But Moore’s Law was relentless
3
FPGAs became largerWe needed balanced trees
![Page 4: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/4.jpg)
Clock Trees Became Even Larger
4
![Page 5: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/5.jpg)
Different Sizes of Fixed Clock Trees
5
Quadrant Clocks
![Page 6: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/6.jpg)
6
Peripheral clocks
![Page 7: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/7.jpg)
7
Overlay them all
![Page 8: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/8.jpg)
This Approach Does Not Scale Well
8
Cost: How many fixed clocks? What size trees?
Performance
![Page 9: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/9.jpg)
Clock Distribution and Clock Loss
9
Register setup (tsu) and propagation delay (tco)Clock skew (S)
tco tsuS
Clock period
Useful clock
![Page 10: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/10.jpg)
The Performance Implications of Clock Distribution
10
Advanced technologies Variation: Process, voltage, temperature Jitter Aging/End of Life
tsuS
Clock period
Useful clock
PVT
EOL
Jtco
![Page 11: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/11.jpg)
The Performance Implications of Clock Distribution
11
And now we want to double the clock frequency!
tsuS
Clock period
Useful clock
PVT
EOL
Jtco
![Page 12: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/12.jpg)
The Performance Implications of Clock Distribution
12
All these factors are a function of the path length Reducing the clock tree size reduces clock loss
Clock period
Useful clock
tsutco
![Page 13: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/13.jpg)
Clock Loss in Fixed Trees
13
Clock loss chasms
![Page 14: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/14.jpg)
Clock Region Alignment to Fixed Trees
14
Clock loss chasms
![Page 15: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/15.jpg)
Clock Region Alignment to Fixed Trees
15
Clock loss chasms
![Page 16: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/16.jpg)
Clock Region Alignment to Fixed Trees
16
tco tsu
![Page 17: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/17.jpg)
Clock Region Alignment to Fixed Trees
17
tsutco
![Page 18: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/18.jpg)
Clock Region Alignment to Fixed Trees
18
tsutco
![Page 19: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/19.jpg)
Clock Region Alignment to Fixed Trees
19
tsutco
![Page 20: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/20.jpg)
The Advantage of Customized Clock Trees
20
Fixed clock trees Custom clock trees
Several clock regions can share the same clock planeCustom clock regions minimize divergent insertion delayand clock loss
![Page 21: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/21.jpg)
Stratix 10 Routable Clocks
21
Construct any clock tree Arbitrary size Arbitrary placement Support different types of tree: H-tree, fishbone, …
Flexibility has cost and delay implications But in the end, we’ll come out ahead
![Page 22: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/22.jpg)
22
Stratix 10 is an Array of “Sectors”
Sector
![Page 23: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/23.jpg)
23
The Routable Clocks are in the Seams between Sectors
![Page 24: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/24.jpg)
24
Clock Grid
32 clock wiresClock switchbox
Repeaters
Repeaters+ clock taps
Wires are bi-directional
![Page 25: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/25.jpg)
25
Intra-Sector Clock Distribution
![Page 26: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/26.jpg)
The Routable Clock Challenge
26
Provide sufficient flexibility With a minimum of added cost and delay
![Page 27: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/27.jpg)
Clock “Planes”
27
Label clock segments 0 – 31 Each set of segments labeled N forms a plane: 32 clock planes
![Page 28: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/28.jpg)
Clock “Planes”
28
Label clock segments 0 – 31 Each set of segments labeled N forms a plane: 32 clock planes
![Page 29: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/29.jpg)
Clock “Planes”
29
Label clock segments 0 – 31 Each set of segments labeled N forms a plane: 32 clock planes
![Page 30: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/30.jpg)
Clock “Planes”
30
Label clock segments 0 – 31 Each set of segments labeled N forms a plane: 32 clock planes
![Page 31: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/31.jpg)
Clock “Planes”
31
H-tree can be constructed in one plane No crossovers
![Page 32: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/32.jpg)
Intra-Plane Muxing
32
Simple 3-1 mux on each segmentSmall incremental delaySmall cost
![Page 33: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/33.jpg)
Inter-Plane Muxing
33
Must support cross-overs Route from source to tree root More complex clock trees
Add one more multiplexer Connect plane N+1 to plane N
![Page 34: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/34.jpg)
Clock Tree Synthesis
34
Identify the clock region
![Page 35: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/35.jpg)
Clock Tree Synthesis
35
Identify the clock regionChoose a clock plane and generate a balanced clock tree
![Page 36: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/36.jpg)
Clock Tree Synthesis
36
Identify the clock regionChoose a clock plane and generate a balanced clock treeRoute clock from source to root of clock tree
![Page 37: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/37.jpg)
Non-canonical Clock Trees
37
Clock regions are generally not square Nor 2n x 2m
What about trees of arbitrary size and shape?
![Page 38: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/38.jpg)
Clock Tree Synthesis Algorithm
38
Generate minimal height tree
Example: 2 x 5 clock region
![Page 39: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/39.jpg)
Clock Tree Synthesis – Graphical Representation
39
1. Cover with smallest enclosing 2n x 2n tree
![Page 40: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/40.jpg)
Clock Tree Synthesis – Graphical Representation
40
2. Prune tree
![Page 41: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/41.jpg)
Clock Tree Synthesis – Graphical Representation
41
2. Prune tree
![Page 42: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/42.jpg)
Clock Tree Synthesis – Graphical Representation
42
![Page 43: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/43.jpg)
Clock Tree Synthesis – Graphical Representation
43
3. Remove redundant segments in subtree
![Page 44: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/44.jpg)
Clock Tree Synthesis – Graphical Representation
44
4. Reduce tree height
![Page 45: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/45.jpg)
Clock Tree Synthesis – Graphical Representation
45
Tradeoff tree height for less overlap
![Page 46: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/46.jpg)
Clock Tree Delay
46
Most connections go through a single primary mux Minimal additional delay
A few go through additional secondary mux Where overlap occurs in the tree
Path delays are inherently balanced Leaves are all on plane N, at the same depth All paths from source (plane N+k) to leaves go through the same number
of primary and same number of secondary muxes
![Page 47: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/47.jpg)
Clock Insertion Delay
47
Flexibility does add insertion delay
Bi-directional clock wires add some delay Tradeoff delay for cost
Total delay relative to fixed clocks: ~1.3-1.4x delay for same distance
What is the effect on performance? Routable clock tree can be much smaller than fixed trees But for the same size, fixed trees have less clock loss
![Page 48: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/48.jpg)
Performance Analysis 1
48
Clock loss modeled as a linear function of insertion delayCompare performance based on clock tree sizeAssumptions: Some critical path must cross the worst-case skew chasm
Example:Critical path = 760ps : What Fmax can I expect?
Depends on the size of the clock region
![Page 49: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/49.jpg)
49
Performance vs. Clock Region Size (760ps critical path)
25K LE 400K LE 1.6M LE 3M LE
![Page 50: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/50.jpg)
Performance Analysis 2
50
Monte Carlo analysisLarge set of benchmarks From previous generation FPGA
Generate a variation profile according to modelLocate a design randomly on the dieCompute Fmax 1. Using a fixed clock tree
2. Using a custom clock tree for clock region
Repeat for many designs, variation profiles and placements
![Page 51: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/51.jpg)
Example Variation
51
Analysis captures effect of variation correlation
![Page 52: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/52.jpg)
Monte Carlo Experiment
52
Map design to random locations with fixed Clocks
![Page 53: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/53.jpg)
Monte Carlo Experiment
53
Map using routable clocks
![Page 54: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/54.jpg)
54
Results – Routable Clocks Reduce Worst Case by 6%
0.92
0.93
0.94
0.95
0.96
0.97
0.98
0.99
1
1.01
1.02
0.92 0.94 0.96 0.98 1 1.02 1.04 1.06 1.08
99%ile
Rou
table Clock De
lay
99%ile Global Clock Delay
Scatter Global vs Routable Clocks
![Page 55: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/55.jpg)
Capacity
55
Does the clock grid with 32 planes provide enough clocks?Some designs use well over 100 clocks
![Page 56: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/56.jpg)
Capacity Experiments
56
Used large Arria 10 designs with large clock demandsExtended the benchmarks scaled to Stratix 10 merged designs added many local transceiver clock regions
Mapped each clock region on Arria 10 to corresponding set of sectors on Stratix 10Used prototype clock tree synthesis tools Optimal, balanced clock trees
Generated clock trees using N clock planes N = 16 – 32
![Page 57: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/57.jpg)
57
Capacity Results
Base Benchmarks Expanded
Expanded/ MergedMerged
3clocks/ Expanded
![Page 58: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/58.jpg)
Stratix 10 Clock Network Summary
58
Transition to custom-built clock trees Scalability Performance
Challenges: Provide sufficient flexibility Without too much delay overhead
Performance difference averages about 6% But can be larger if designs are partitioned well
![Page 59: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/59.jpg)
Future Work
59
ASIC design methodologies for FPGAsMany opportunities for CAD tools Design partitioning Clock partitioning Iterative floorplanning, placement, clock tree synthesis Skew-aware place and route Take advantage of “Interesting” clock tree structures
![Page 60: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/60.jpg)
Future Work
60
ASIC design methodologies for FPGAsMany opportunities for CAD tools Design partitioning Iterative floorplanning, placement, clock tree synthesis Skew-aware place and route Take advantage of “Interesting” clock tree structures
Buy Stratix 10!
![Page 61: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/61.jpg)
Thank You
![Page 62: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/62.jpg)
Performance Comparison – Fixed vs. Routable Clocks
62
Infeasible to run real experiments No benchmarks, no CAD tools
Model performance based on size of clock region Fixed global clocks vs. customized clocks
Change in design methodology Global clocks kill performance Long recognized in ASICs
Partition design into communicating clock regions GALS: Globally Asynchronous, Locally Synchronous May be meso-synchronous (same clock, arbitrary skew)
Sectors are large! ~25K LEs
![Page 63: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/63.jpg)
63
Performance vs. Clock Region Size (1200ps critical path)
![Page 64: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/64.jpg)
64
Performance vs. Clock Region Size (800ps critical path)
![Page 65: Stratix™10 High Performance Routable Clock Networksisfpga.org/fpga2016/index_files/Slides/2_2.pdf · Stratix™10 High Performance Routable Clock Networks Carl Ebeling, Dana How,](https://reader033.vdocuments.us/reader033/viewer/2022051202/5a79a38f7f8b9a5a438dc630/html5/thumbnails/65.jpg)
Related Work
65
Very little academic work Limited to fixed clocks with routable source-to-root routing
Xilinx Ultrascale clocks “Clock region” tiles
Similar to sectors, but are not uniform Two separate clock networks:
“Routing”: Route source to clock tree“Distribution”: Build local clock tree
Clock tracks run through the center of CRs Trees not inherently balanced Seems biased towards fishbone clocks: Rib-and-Spine Includes tunable delays