industrial clock synthesis - international symposium on ... · pdf fileregular clock tree...
TRANSCRIPT
Industrial Clock Synthesis
Pei-Hsin HoImplementation GroupSynopsys, Inc.
© 2009 Synopsys, Inc. (2)
Outline
• Problems that our customers care about• Existing solutions and plan-of-record solutions• Improvements required
© 2009 Synopsys, Inc. (3)
Clock Network• Clock network delivers the clock signal to synchronize
every sequential cell in the clock domain
i j
© 2009 Synopsys, Inc. (4)
Clock Network Metrics• Power• Insertion delay for a clock path
Usually longer than the clock periodProxy for variation and powerLogic levelRC delay
• Skew under variationFalse skewVariation
• Multiple corners • OCV (on-chip variation) i j
© 2009 Synopsys, Inc. (5)
Power: Clock Major Culprit
• Power: biggest implementation issue todayConsumer electronicsFrequency limiter: 4GHz ceiling
• Biggest trouble makerClock
• 1/3 total power • Many large and leaky buffers
i j
© 2009 Synopsys, Inc. (6)
Variation: Clock Most Susceptible
• Variation: biggest implementation issue tomorrowFrequency limiter
• >40% dead cycle timeConsumer electronics
• Biggest trouble makerClock
• OCV impact: 2X
i j
Logic time
logic vari. margin
clock skew
clk variation margin
clock cycle time
© 2009 Synopsys, Inc. (7)
Variation: Clock Most Susceptible• MC CTS: how to minimize skew for all corners?
Variation ratios of cells and wires differ in different corners• Variation rations of wires differ in different layers• Failed chips hold violation caused by variable skews
Skew = 5-5 = 0; Skew = 5.7-5.6 = 0.1
1
3
11
1
1
11
1.1
3.3
1.21.2
1.2
1.1
1.1
1.1
© 2009 Synopsys, Inc. (8)
Performance
• Performance: biggest implementation challenge yesterday?
1G Hz ASICs
• Clock skew is still a big issue for performance todayMore clocks, more modes and more IPsComplex clock gatingHigher wire delays
i j
© 2009 Synopsys, Inc. (9)
Clock Is Key
• Not competitive in clock not competitive in timing or power for 45nm and beyond
• State-of-the-art clock-tree synthesis algorithms
i1
i2
i3
i4
i1
i2
i3
i4
© 2009 Synopsys, Inc. (10)
Industrial Solutions for Power
• Clock gatingConventionalSequentialPhysical clock gating
• Register clumping• Register banking
© 2009 Synopsys, Inc. (11)
Please check the techniques your team is using on your current project. 2007 N = 718; Margin of error = +/- 4%
Clock Gating Used by 90% of Synopsys Users
14%
18%
42%
43%
51%
90%
0% 20% 40% 60% 80% 100%
Power NetworkSynthesis
StateRetention/MTCMOS
Multi-ThresholdDesign
Multi-Voltage Design
Gate-Level PowerOptimization
Clock Gating
2007
2006
© 2009 Synopsys, Inc. (12)
Clock Gating
• Automatic clock gating• always@(posedge clk)
if (en) Q <= D• ICGs (Integrated Clock Gating cells)
• Fewer sizes and uneven loads harder to balance skew• Consume power
en
clk
ICG
gclkD
Q
enclk High
activity
D Qen
Low activity
clk gclk
ICG
© 2009 Synopsys, Inc. (13)
Physical Clock Gating• Design Compiler inserts ICGs during logic synthesis
ICG drive flops wide-spread for datapath timing; unbalanced ICG levelsLeave power saving opportunities on the table
m2
m1
i4
r6
r1r2
r3
r5
r4
i1
flop
clock gate
macro
buffer
© 2009 Synopsys, Inc. (14)
ICG Merge
• Merge ICGs of the same enable signal
s1
i4
r6
r1r2
r3
r5
r4
i1
flop
clock gate
macro
buffer
m2
m1
i4
r6
r1r2
r3
r5
r4
© 2009 Synopsys, Inc. (15)
ICG Removal
• Remove ICGs that are ineffective (small fanout and mostly enabled) or causing unbalanced ICG levels
s1
i4
i1
flop
clock gate
macro
buffer
s1
i4
r6
r1r2
r3
r5
r4
© 2009 Synopsys, Inc. (16)
ICG Splitting
• Split ICGs based on timing, DRC and placement
i1
i2
i4
i1
flop
clock gate
macro
buffer
s1
i4
© 2009 Synopsys, Inc. (17)
Issue: Higher ICGs
• Merging ICGs with the same enable may save more clock tree power
Can gate a larger subtree
• Splitting ICGs make enable signal timing more easily satisfied
3a
1a
2a
Merge
Split
© 2009 Synopsys, Inc. (18)
Issue: Higher ICGs
• Does not always save power!Higher ICGs restrict the sharing of the subtreesMay introduce enable timing violations
3a
1a
2a
Merge
Split
© 2009 Synopsys, Inc. (19)
Issue: Multi-Layer ICGs
• Multi-layer ICGs may save more clock tree powerDC PwrC generates multi-layer ICGs by enable factoringMay gate a larger subtree
1a
3c
2b
1a&c
2b&c
Factor
Removal
© 2009 Synopsys, Inc. (20)
Issue: Multi-Layer ICGs
• But does not always save power!Multi-layer ICGs restrict sharing of the subtreesExtra ICG may consume more power than a very small gated subtree
1a
3c
2b
1a&c
2b&c
Factor
Removal
© 2009 Synopsys, Inc. (21)
Flop Placement: Register Clumping• Around 8% reduction in total power on average
© 2009 Synopsys, Inc. (22)
Flop Placement: Register Banking
• Automatically place registers into banksReduce powerReduce clock skewImplemented as RP (relative placement) constraints
Routability may be an issue for some designs
© 2009 Synopsys, Inc. (23)
Industrial Solutions for Variation
• Clock meshes• OCV-aware clustering
© 2009 Synopsys, Inc. (24)
Clock Meshes
• Good skew under variationTree above the meshTrees below the mesh to drive the flops
• Bad for power (~+30% clock power)
More wiresCan only gate the small clock trees below the mesh
• Few SNPS customers mass-produce IC products with clock meshes
• Insight from clock mesh?Regularity good for variation
metal 8 metal 7 metal 6 planGroup boundary periphery IOs
© 2009 Synopsys, Inc. (25)
Dummy ICG Insertion
• Insert dummy ICGs to balance topology
i1
i2
i3
i4
i1
flop
clock gate
macro
buffer
i1
i2
i4
© 2009 Synopsys, Inc. (26)
OCV-Aware Register Clustering
• Try to cluster registers with critical timing paths in between within a 1st or 2nd level cluster minimize variation impact to timing
i1
flop
clock gate
macro
buffer
i1
i2
i3
i4i j
i1
i2
i3
i4
i j
© 2009 Synopsys, Inc. (27)
Register Clumping
• Place registers closer for the leaf-level buffers or ICGs to minimize leaf-level net capacitance (>50% of total net cap)
i1
flop
clock gate
macro
buffer
i4
i3
i1
i2
i1
i2
i3
i4
© 2009 Synopsys, Inc. (28)
Register Banking
• Place registers into rectangular banks (more dramatic form of register clumping)
i4
i3
i1
i2
i1
flop
clock gate
macro
buffer
i4
i3
i1
i2
© 2009 Synopsys, Inc. (29)
Regular Clock Tree Synthesis
• Synthesize regular buffer tree based on DRC and placementBalanced buffer levels, ICG levels, fanout, wire length (by placement), metal layer
i1
flop
clock gate
macro
buffer
i1
i2
i4
i3
i4
i3
i1
i2
© 2009 Synopsys, Inc. (30)
Industrial Solutions for Timing
• Clock routing• Useful skews• Inter-clock delay balancing• CTS for SoCs• Multi-voltage-domain and multi-mode CTS
© 2009 Synopsys, Inc. (31)
Clock Routing
• Signal routing mostly for routability (wire length) and timing
• Clock routing mostly for skew under variation and power (wire length)
• Snaking for skew under variation• Selective shielding clock nets to
control skew
© 2009 Synopsys, Inc. (32)
Detail-Route Clocks with Wire Snaking• Detail route the clock tree using minimal wire snaking (and
shielding) to fine-tune skew
i1
flop
clock gate
macro
buffer
i1
i2
i4i3
i1
i2
i4i3
© 2009 Synopsys, Inc. (33)
Before Physical Synthesis
m2
m1
i4
r6
r1r2
r3
r5
r4
i1
flop
clock gate
macro
buffer
• Irregular gated clock topology bad for variation and power
© 2009 Synopsys, Inc. (34)
After Physical Synthesis
• Regular gated clock topology good for variation and power
i1
flop
clock gate
macro
buffer
i1
i2
i4i3
© 2009 Synopsys, Inc. (35)
Useful Skew
• Clock skews can be used to fix timing violationsSetup: trigger the launcher sooner and/or the capturer laterHold: trigger the launcher later and/or the capturer sooner
• RiskClock skew is hard to control under variation
i j
© 2009 Synopsys, Inc. (36)
Inter-Clock Delay Balancing
• If there are timing paths from one clock domain to another clock
• Inter-clock delay must be balanced based on the timing constraints
• Extra insertion delay is bad for power and variation
i j
© 2009 Synopsys, Inc. (37)
CTS for SoCs
•SoCLarge number of IPs with known clock latencies
• Hard to balance skewsLarge number of placement and/or routing blockages
• Hard to balance topologyMultiple voltage domains and multiple operation modes
• Multiple clocks• Complex requirements
Clock source
Macro clock pins
Routingblockages
Placementblockage
© 2009 Synopsys, Inc. (38)
Multiple Voltage Domains and Multiple Modes• Clock shared by multiple voltage
domainsNo timing path in between insert isolation cells near the top of the clock tree to save power
• Clock going through a voltage domain that may be turned off in an operation mode
Clock buffers powered by always-on power rail
• Implication in power rail synthesis
Clock source
Macro clock pins
Routingblockages
Placementblockage
© 2009 Synopsys, Inc. (39)
Summary
• Competitive clock synthesis technology is key for IC product differentiation
PowerVariationPerformance
• Academic research in this area will make huge impacts to the real world through the realization of cool, robust and fast ICs
© 2009 Synopsys, Inc. (40)
Backup Slides
© 2009 Synopsys, Inc. (41)
• Invalid data (bubbles) may go through the pipelined datapath and consume energy during computation
Pipelined Datapath with Bubbles
+ * &
© 2009 Synopsys, Inc. (42)
• Gate clock upon invalid dataalways @ (posedge clk)
if (vld) begin a1 <= a0; b1 <= b0; end
Conventional Clock Gating
+ * &
© 2009 Synopsys, Inc. (43)
• Introduce a valid-bit pipeline to track valid data and gate the clock to the datapath so that no pipeline stage computes for invalid data
Sequential Clock Gating
+ * &