Low Power Clocking
Through the Use of Dual-Edge Triggered Flip-Flops
Gabriel Ricardo
Theresa Holliday
ACSEL Lab, University of California, Davis
Outline
Dual-edge flip-flop overview
Standard cell characterization
LEON synthesis for single-edge-triggered (SET) design
LEON synthesis for dual-edge-triggered (DET) design
Issues with including dual-edge flip-flops in the synthesis flow
Preliminary comparisons
Conclusions and future work
Questions
Symmetric Pulse Generator Flip-Flop (SPGFF)
The first-stage nodes X and Y are dynamic and the second stage is a static NAND, which results in a small delay.
Transistors can be sized to trade some delay for power.
Operation of SPGFF
The transparency window created by CLK and CLK3 for stage 1 (CLK1 and CLK4 for stage 2) allows X (Y) to conditionally evaluate based on input D.
The output-stage NAND passes X or Y to the output according to the clock value, without the need for a latch.
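A small discrete-time model in Python can illustrate the transparency windows. It is a sketch that rests on assumptions the slides do not state. CLK1 through CLK4 are taken to be successive taps of an inverter chain driven by CLK, as the clock labels in the SPGFF schematic suggest. Each inverter is assumed to add one time unit of delay. The names `inverter_chain`, `transparency_windows`, and `det_flop` are hypothetical. Under these assumptions, CLK AND CLK3 is high for three delays after each rising edge, and CLK1 AND CLK4 is high for three delays after each falling edge.

```python
from typing import List, Tuple

def inverter_chain(clk: List[int], stages: int = 4) -> List[List[int]]:
    """Return [CLK, CLK1, ..., CLKn]: each tap is the previous tap inverted
    and delayed by one time step (assumed unit inverter delay)."""
    taps = [list(clk)]
    for _ in range(stages):
        prev = taps[-1]
        # max(t - 1, 0): assume every node held its initial value before t = 0
        taps.append([1 - prev[max(t - 1, 0)] for t in range(len(prev))])
    return taps

def transparency_windows(clk: List[int]) -> Tuple[List[int], List[int]]:
    """X window = CLK AND CLK3 (just after a rising edge);
    Y window = CLK1 AND CLK4 (just after a falling edge)."""
    c0, c1, _c2, c3, c4 = inverter_chain(clk)
    x_win = [a & b for a, b in zip(c0, c3)]
    y_win = [a & b for a, b in zip(c1, c4)]
    return x_win, y_win

def det_flop(clk: List[int], d: List[int], q0: int = 0) -> List[int]:
    """Behavioral dual-edge flop: Q follows D only while either window is open."""
    x_win, y_win = transparency_windows(clk)
    q, out = q0, []
    for t, bit in enumerate(d):
        if x_win[t] or y_win[t]:
            q = bit
        out.append(q)
    return out

# Clock edges at t = 4 (rise), t = 12 (fall), t = 20 (rise).
clk = [0] * 4 + [1] * 8 + [0] * 8 + [1] * 4
d = [1] * 10 + [0] * 8 + [1] * 6
x_win, y_win = transparency_windows(clk)
print([t for t, w in enumerate(x_win) if w])  # [4, 5, 6, 20, 21, 22]
print([t for t, w in enumerate(y_win) if w])  # [13, 14, 15]
print(det_flop(clk, d))
```

In this model, Q is updated once per clock edge, whether rising or falling. That property is what lets a DET-CSE run at half the clock frequency of a SET-CSE for the same data rate.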
Transmission Gate Master Slave (TGMS)
Comparison between SPGFF and TGMS in 0.18 µm
Flip-flop | Delay | Power | EDP | Clock load
SPGFF | 356 ps | 133 μW | 1.70e-23 J·s | 12 fF
TGMS | 354 ps | 89.9 μW | 1.13e-23 J·s | 16 fF
$$\text{delay} = \max\left(t_{su,r} + t_{clk\text{-}q,f},\; t_{su,f} + t_{clk\text{-}q,r}\right)$$
[Figure: Performance (ps), setup time and total delay. TGMS: setup 110, total delay 354. SPGFF: setup -20, total delay 356.]
[Figure: Power @ 25% activity (μW), broken down into internal, data, and clock power. Total is about 90 for TGMS and 133 for SPGFF.]
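The EDP column appears to equal power × delay². In other words, energy per operation is taken as power × delay and then multiplied by delay again. This relationship is inferred from the numbers, not stated on the slide. The short Python check below recomputes EDP from the table's delay and power columns under that assumption. `edp` and `check_table` are hypothetical helper names.

```python
def edp(power_w: float, delay_s: float) -> float:
    """Energy-delay product, assuming energy per operation = power x delay."""
    return power_w * delay_s ** 2

# (delay in s, power in W, reported EDP in J*s) from the SPGFF vs. TGMS table.
TABLE = {
    "SPGFF": (356e-12, 133e-6, 1.70e-23),
    "TGMS": (354e-12, 89.9e-6, 1.13e-23),
}

def check_table(table=TABLE):
    """Return {name: (computed EDP, relative difference from the reported EDP)}."""
    result = {}
    for name, (delay, power, reported) in table.items():
        computed = edp(power, delay)
        result[name] = (computed, abs(computed - reported) / reported)
    return result

for name, (computed, err) in check_table().items():
    print(f"{name}: {computed:.3e} J*s ({err:.1%} from the table)")
```

Both rows agree with the reported EDP to within about 1%, which is consistent with rounding of the delay and power figures.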
Advantages of SPGFF
Lowest clock energy among DET-CSEs, resulting in larger clock power savings.
Energy-delay product comparable to that of high-performance single-edge-triggered clocked storage elements.
Characterization Methodology – Generating synthesis views
Created an automated process for generating Synopsys Liberty format (.lib) synthesis models, using perl scripts and gspice (a SPICE pre/post-processor).
Characterized for timing and energy.
Can easily be extended to generate Cadence synthesis models (.tlf).
Characterization Methodology – Trip-points
Used the same trip-points as those in the technology library.
Nominal conditions: 25 °C, 1.8 V supply. Best- and worst-case corner models (over temperature and supply variation) can easily be generated.
Cell delay: defined from the clock 50% rise/fall point to the output (Q or QN) 50% rise/fall point.
Transition time: 10%-90% for rise time, 90%-10% for fall time.
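The delay and transition definitions above can be applied to simulated waveforms roughly as follows. This Python sketch is not the lab's perl/gspice flow. It assumes waveforms given as sampled (time, voltage) lists, linear interpolation between samples, a rising active clock edge, and VDD = 1.8 V. All function names are hypothetical, and the ramps in the usage lines are synthetic.

```python
from typing import Sequence

VDD = 1.8  # V, nominal supply

def crossing_time(t: Sequence[float], v: Sequence[float], level: float, rising: bool) -> float:
    """First time `v` crosses `level` in the given direction (linear interpolation)."""
    for i in range(1, len(v)):
        v0, v1 = v[i - 1], v[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")

def clk_to_q_delay(t, v_clk, v_q, q_rising: bool) -> float:
    """Cell delay: clock 50% point (rising clock edge assumed) to output 50% point."""
    half = 0.5 * VDD
    return crossing_time(t, v_q, half, q_rising) - crossing_time(t, v_clk, half, True)

def transition_time(t, v, rising: bool) -> float:
    """10%-90% for a rising edge, 90%-10% for a falling edge."""
    lo, hi = 0.1 * VDD, 0.9 * VDD
    if rising:
        return crossing_time(t, v, hi, True) - crossing_time(t, v, lo, True)
    return crossing_time(t, v, lo, False) - crossing_time(t, v, hi, False)

# Synthetic ramps sampled every 0.5 ns (stand-ins for simulator output).
ts = [0.5 * i for i in range(11)]
v_clk = [min(VDD, VDD * x) for x in ts]                   # clock rises over 0-1 ns
v_q = [min(VDD, max(0.0, 0.9 * (x - 2.0))) for x in ts]   # Q rises over 2-4 ns
print(clk_to_q_delay(ts, v_clk, v_q, q_rising=True), transition_time(ts, v_q, rising=True))
```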
Trip-points - Falling
Trip-points - Rising
Characterization Methodology - Drive Characteristics
Build a 5x5 non-linear delay table.
Clock slope values (ns): 0.03, 0.1, 0.4, 1.5, 3
Output load values (fF): 0.35, 21, 38.5, 147, 311
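The slides do not show the generated .lib text. As an illustration, the Python sketch below renders a 5x5 table over these index values in Liberty lookup-table syntax. It makes several assumptions. A lookup-table template (here named `delay_template_5x5`, a hypothetical name) is assumed to have `input_net_transition` as variable_1 and `total_output_net_capacitance` as variable_2. The library's capacitance unit is assumed to be pF. The values matrix is a placeholder, not characterization data.

```python
# Index values from the slide; table values below are placeholders only.
CLOCK_SLOPES_NS = [0.03, 0.1, 0.4, 1.5, 3]
OUTPUT_LOADS_FF = [0.35, 21, 38.5, 147, 311]

def fmt(xs):
    """Comma-separated list in compact %g form, as used inside Liberty strings."""
    return ", ".join(f"{x:g}" for x in xs)

def liberty_table(group, template, index_1, index_2, values, indent="  "):
    """Render one NLDM lookup-table group (e.g. cell_rise) as Liberty text.
    values[i][j] is the entry for index_1[i] and index_2[j]."""
    if len(values) != len(index_1) or any(len(r) != len(index_2) for r in values):
        raise ValueError("values must be len(index_1) x len(index_2)")
    rows = [f'{indent * 2}"{fmt(row)}"' for row in values]
    return "\n".join([
        f"{indent}{group} ({template}) {{",
        f'{indent * 2}index_1 ("{fmt(index_1)}");',
        f'{indent * 2}index_2 ("{fmt(index_2)}");',
        f"{indent * 2}values ( \\",
        ", \\\n".join(rows) + " );",
        f"{indent}}}",
    ])

loads_pf = [c / 1000.0 for c in OUTPUT_LOADS_FF]  # assumes the library unit is pF
placeholder = [[0.0] * len(loads_pf) for _ in CLOCK_SLOPES_NS]
text = liberty_table("cell_rise", "delay_template_5x5",
                     CLOCK_SLOPES_NS, loads_pf, placeholder)
print(text)
```

In a real flow, the same function would be called once per table (for example cell_rise, cell_fall, rise_transition, and fall_transition), with values filled from the simulation results.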
Characterization Methodology – Trip-points
Setup time: sweep the input transition toward the active clock edge until the clock-to-output delay increases by 10%.
Hold time: sweep the input transition away from the active edge until the clock-to-output delay increases by 10%.
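The setup measurement can be sketched as a search over the data-to-clock separation. This Python version replaces the plain sweep with a bisection. That substitution is an assumption, and it presumes clock-to-output delay grows monotonically as data moves toward the edge. The stopping tolerance is the 3 ps accuracy quoted for the setup/hold tables. A hypothetical `toy_clk_to_q` model stands in for a SPICE run. A hold search would mirror this one on the other side of the active edge.

```python
import math
from typing import Callable

def find_setup(clk_to_q: Callable[[float], float], far: float, near: float,
               pushout: float = 0.10, tol: float = 3.0) -> float:
    """Bisection for the data-to-clock separation (ps) at which clk-to-Q delay
    has grown by `pushout` relative to its value at `far`.
    Assumes clk_to_q is non-increasing in separation and returns math.inf on failure."""
    limit = (1.0 + pushout) * clk_to_q(far)
    good, bad = far, near  # good: within the limit; bad: beyond it or failing
    if clk_to_q(bad) <= limit:
        raise ValueError("`near` does not violate the push-out limit")
    while good - bad > tol:
        mid = 0.5 * (good + bad)
        if clk_to_q(mid) <= limit:
            good = mid
        else:
            bad = mid
    return good  # conservative: last separation that met the 10% limit

def toy_clk_to_q(sep_ps: float) -> float:
    """Hypothetical stand-in for a simulation: 200 ps nominal clk-to-Q,
    push-out below 100 ps of separation, failure below 40 ps."""
    if sep_ps < 40:
        return math.inf
    return 200.0 + 2.0 * max(0.0, 100.0 - sep_ps)

setup = find_setup(toy_clk_to_q, far=500.0, near=0.0)
print(f"setup ~ {setup:.1f} ps")
```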
Characterization Methodology – Setup-hold
[Figure: clock-to-Q delay vs. data-to-clock delay. Constant clk-Q regions lie on both sides, variable clk-Q regions lie closer to the clock edge, and a failure region lies between them. Setup and hold are taken at the 10% push-out points.]
Characterization Methodology – Setup and Hold
Build a 3x2 non-linear delay table (3 ps accuracy).
Clock slope values (ns): 0.03, 3
Data slope values (ns): 0.03, 0.9, 3
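A table this coarse (3 data slopes by 2 clock slopes) is only usable because tools evaluate points between the grid entries. The Python sketch below uses bilinear interpolation, with linear extrapolation outside the grid. This is a common approach, but that the tools used here behave exactly this way is an assumption. The row/column orientation and the `SETUP_PS` entries are hypothetical placeholders.

```python
from bisect import bisect_right
from typing import Sequence

def _bracket(axis: Sequence[float], x: float) -> int:
    """Index i such that axis[i], axis[i + 1] bracket x (clamped to the ends,
    so points outside the table are linearly extrapolated)."""
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)

def lookup(index_1, index_2, values, x1, x2):
    """Bilinear interpolation; values[i][j] sits at (index_1[i], index_2[j])."""
    i, j = _bracket(index_1, x1), _bracket(index_2, x2)
    u = (x1 - index_1[i]) / (index_1[i + 1] - index_1[i])
    w = (x2 - index_2[j]) / (index_2[j + 1] - index_2[j])
    v00, v01 = values[i][j], values[i][j + 1]
    v10, v11 = values[i + 1][j], values[i + 1][j + 1]
    return ((1 - u) * (1 - w) * v00 + (1 - u) * w * v01
            + u * (1 - w) * v10 + u * w * v11)

DATA_SLOPES = [0.03, 0.9, 3]   # ns, from the slide (rows)
CLOCK_SLOPES = [0.03, 3]       # ns, from the slide (columns)
SETUP_PS = [[40, 60], [70, 90], [120, 140]]  # hypothetical placeholder values

print(lookup(DATA_SLOPES, CLOCK_SLOPES, SETUP_PS, 0.9, 1.515))
```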
Characterization Methodology – Internal energy
Internal energy is characterized over the same data points as the drive characteristics (5x5 lookup table).
Data-pin and clock-pin energy tables are generated (1x5 lookup tables).
Characterization Results – single vs. dual-edge – D-to-Q delay
[Figure: surface plots of TGMS delay and SPGFF delay (0 to 0.45 ns) vs. load (number of minimum-sized inverters) and clock slope (0.03, 0.1, 0.4 ns).]
What is the typical output load?
Extracted the output loading from the netlist for all CSEs.
Average load = 24 fF (6.8 minimum-sized inverters).
90% of CSEs have a load below 60 fF (17 minimum-sized inverters).
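The two statistics above can be computed from a list of extracted net loads as sketched below in Python. The conversion factor is implied by the slide's own numbers: 24 fF ≈ 6.8 and 60 fF ≈ 17 minimum-sized inverters, or about 3.5 fF per inverter. The nearest-rank percentile definition is an assumption. The `loads` list is hypothetical, not the LEON netlist data.

```python
import math
import statistics

# Implied by the slide: 24 fF ~ 6.8 and 60 fF ~ 17 minimum-sized inverters (~3.5 fF each).
FF_PER_MIN_INVERTER = 24.0 / 6.8

def load_summary(loads_ff):
    """Average and 90th-percentile (nearest-rank) CSE output load,
    in fF and in minimum-sized-inverter equivalents."""
    ordered = sorted(loads_ff)
    avg = statistics.fmean(ordered)
    p90 = ordered[math.ceil(0.9 * len(ordered)) - 1]
    return {"avg_ff": avg, "avg_inv": avg / FF_PER_MIN_INVERTER,
            "p90_ff": p90, "p90_inv": p90 / FF_PER_MIN_INVERTER}

# Hypothetical net loads in fF; real input would be the loads extracted from the netlist.
loads = [4, 8, 12, 16, 20, 24, 28, 32, 36, 60]
print(load_summary(loads))
```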
Netlist-extracted CSE output loading statistics
[Figure: histogram of output loading on CSEs; x-axis: loading (number of minimum-sized inverters); y-axis: number of nets (0 to 1200).]
Characterization Results – single vs. dual-edge – delay
[Figure: surface plots of TGMS delay and SPGFF delay (ns) vs. load (number of minimum-sized inverters) and clock slope (ns), with the typical region of operation marked.]
Characterization Results – zoomed-in – single vs. dual-edge – delay
[Figure: zoomed-in surface plots of SPGFF delay and TGMS delay (0.14 to 0.21 ns) vs. load (2 to 6 minimum-sized inverters) and clock slope (0.03 to 0.11 ns).]
Characterization Results – single vs. dual-edge – energy-delay product
[Figure: surface plots of SPGFF energy and TGMS energy (0.18 to 0.32 pJ) vs. load (2 to 6 minimum-sized inverters) and clock slope (0.03 to 0.11 ns).]
LEON SPARC core configuration
LEON SPARC synthesis
Synthesized using the TSMC 0.18 µm standard cell library.
Target frequency: 200 MHz.
Limited to a single-sized D-FF.
SET synthesis flow
[Figure: the RTL of the processor (VHDL) and the standard cell library feed synthesis (Design Compiler). Synthesis produces the netlist (.db) and reports (area, timing). The netlist then goes to power analysis (Power Compiler).]
SET-CSE synthesis summary (area and power)
Cell type | Area (mm2) | % total | Power (mW) | % total
Memory blocks | 2.03 | 55% | 214.3 | 72%
Core | 0.71 | 19% | 73 | 24%
Clock tree (ideal net) | N/A | N/A | 11.6 | 4%
Total | 3.7 | | 299 |
Core summary
Core | Area (mm2) | Power (mW) | % of core power
Sequential (1986 CSEs) | 0.47 | 26 | 36%
Combinational + nets | 0.24 | 47 | 64%
Total | 0.71 | 73 |
Approximately 20k gates
Clock tree loading
Clock tree component | Loading (pF)
Sequential cells (1986 cells) | 5.18
Memory macro cells (6) | 1.37
Wire routing* | 11.4
Total | 17.94
* Based on the library wire-load model
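The 11.6 mW "ideal net" clock power in the SET summary appears to be the switching power of this load at 200 MHz and 1.8 V. That follows from P = C·V²·f with an activity of 1, since a clock net charges and discharges once per cycle. This link is inferred from the numbers rather than stated on the slides. The component loads sum to 17.95 pF, versus 17.94 pF in the table, because of rounding.

```python
VDD = 1.8       # V, nominal supply (characterization conditions)
F_CLK = 200e6   # Hz, SET target frequency

# Clock tree loading from the table above, in pF.
LOADS_PF = {"sequential cells": 5.18, "memory macros": 1.37, "wire routing": 11.4}

def switching_power(c_farads, vdd=VDD, f=F_CLK, activity=1.0):
    """Dynamic power activity * C * V^2 * f; a clock net charges and discharges
    its load once per cycle, so activity = 1."""
    return activity * c_farads * vdd ** 2 * f

c_total = sum(LOADS_PF.values()) * 1e-12
p_clk = switching_power(c_total)
print(f"{c_total * 1e12:.2f} pF -> {p_clk * 1e3:.1f} mW")  # 17.95 pF -> 11.6 mW
```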
Clock tree power estimation
High-fanout nets are beyond the interpolation range of the library's wire-load models.
Wire-load models are not meant for estimating balanced distribution nets such as clock nets.
Using the library wire-load models for the clock tree is therefore not valid.
Instead, use an H-tree estimation equation to obtain a ballpark number.
H-tree estimation equation
Equation developed by ACSEL lab member Nikola Nedovic.
It recursively calculates H-tree loading for a given area, number of CSEs in the design, and number of H-tree levels.
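Nedovic's equation is not reproduced in these slides, so the Python sketch below is not that equation. It is a generic recursion that illustrates splitting H-tree load into a CSE term and a wiring term. It assumes an ideal square H-tree in which each H has a crossbar and two vertical bars of half the square's side, ending at the four quadrant centers that feed the next level. It also assumes a uniform wire capacitance per µm. The per-CSE clock load (5.18 pF / 1986 ≈ 2.6 fF) comes from the loading table. The side length and wire capacitance are hypothetical.

```python
import math

def htree_wire_length(side, levels):
    """Total wire length of an ideal H-tree with `levels` levels over a square of
    edge `side`: each H has a crossbar and two vertical bars of side/2 each,
    ending at the four quadrant centers that feed the next, half-size level."""
    if levels == 0:
        return 0.0
    return 1.5 * side + 4 * htree_wire_length(side / 2, levels - 1)

def htree_load_ff(side_um, levels, n_cse, c_cse_ff, c_wire_ff_per_um):
    """Clock load (fF) = load due to CSEs + load due to wiring."""
    return n_cse * c_cse_ff + c_wire_ff_per_um * htree_wire_length(side_um, levels)

# Per-CSE clock load from the loading table: 5.18 pF / 1986 CSEs (~2.6 fF).
C_CSE_FF = 5180.0 / 1986
# Hypothetical geometry and wire capacitance, for illustration only.
side_um = math.sqrt(0.71e6)  # e.g. a square with the 0.71 mm^2 core area
print(htree_load_ff(side_um, levels=4, n_cse=1986, c_cse_ff=C_CSE_FF, c_wire_ff_per_um=0.2))
```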
H-tree estimation method
[Figure: H-tree clock distribution driven from the PLL over a region of side S. At the leaf level L, each segment spans S/2^(L-1) and drives M/4^(L-1) storage elements.]
H-tree estimation method
* Table taken from Nedovic, Nikola, Ph.D. Dissertation, UCD, “CLOCKED STORAGE ELEMENTS FOR HIGH-PERFORMANCE APPLICATIONS”
H-tree estimation method
The equation reduces to two terms: load due to CSEs + load due to wiring.
Total H-tree power
Total H-tree power = load switching power + clock driver power.
SET-CSE synthesis summary with H-tree estimate (area and power)
Cell type | Area (mm2) | % total | Power (mW) | % total
Memory blocks | 2.03 | 55% | 214.3 | 66%
Core | 0.71 | 19% | 63 | 19%
Clock tree (H-tree estimate) | N/A | N/A | 48.5 | 15%
Total | 3.7 | | 325 |
SET-CSE power profile with H-tree estimate
SET power breakdown:
Calculated clock power: 48.507 mW (15%)
Total core power: 63 mW (19%)
Register file: 85.762 mW (26%)
Total cache: 128.5716 mW (40%)
SET-CSE core power profile
SET core power breakdown:
Calculated clock power: 48.507 mW (44%)
Total core power: 63 mW (56%)
Modeling DET-CSEs for Synthesis
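The timing parameters must be modeled for both clock edges.
[Figure: timing diagram of system clock, data, and output. A SET-CSE has a single setup/hold pair (Tsetup, Thold). A DET-CSE has separate rising-edge (Ts-r, Th-r) and falling-edge (Ts-f, Th-f) setup and hold times. Tclk->Q is measured from the clock edge to the output.]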
Modeling DET-CSEs for Synthesis
Complex timing relationships can be modeled for synthesis.
[Figure: DET-CSE with D, CLK, and Q pins and two timing arcs from CLK to Q: a rising-edge timing arc and a falling-edge timing arc.]
Modeling DET-CSEs for Synthesis
The synthesis tool will time the dual-edge-triggered synchronous system and (try to) meet its constraints.
[Figure: CLK and D waveforms.]
Modeling DET-CSEs for Synthesis
The synthesis tool uses the worst timing-arc relationship as the critical-path constraint.
[Figure: rising-edge and falling-edge sample windows with rising-to-falling and falling-to-rising launch/capture arcs; one arc is marked critical and the other not critical.]
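One way to see why a single arc ends up critical is to look at how much time each arc gets. In a DET stage, data launched on a rising edge must be captured on the following falling edge, and vice versa, so each arc only gets one clock phase. The Python sketch below assumes a non-50% duty cycle as the cause of the asymmetry; the slide does not say what makes one arc critical. It uses hypothetical delay values and reports the tighter of the two setup slacks, which is the constraint a tool would work to.

```python
def det_half_cycle_slack(period_ns, duty, t_clkq_ns, t_logic_ns, t_setup_ns):
    """Setup slack (ns) of one DET pipeline stage for both launch/capture pairs.
    Rising-to-falling transfers get the high phase (duty * period);
    falling-to-rising transfers get the low phase ((1 - duty) * period)."""
    path = t_clkq_ns + t_logic_ns + t_setup_ns
    return {
        "rise_to_fall": duty * period_ns - path,
        "fall_to_rise": (1.0 - duty) * period_ns - path,
    }

# Hypothetical values: a 100 MHz DET clock gives the same data rate as 200 MHz SET.
slack = det_half_cycle_slack(period_ns=10.0, duty=0.45,
                             t_clkq_ns=0.4, t_logic_ns=3.9, t_setup_ns=0.1)
critical = min(slack, key=slack.get)
print(critical, round(slack[critical], 3))
```

With a 50% duty cycle both arcs get the same budget, and the worst arc is set by the per-edge timing parameters instead.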
Modeling DET-CSEs for Synthesis
Synthesis tools are not capable of inferring a dual-edge-triggered device from HDL code.
For meeting timing, only the strictest constraint matters anyway (i.e., the one for a single pair of launch and capture edges).
It is therefore unnecessary to model the full complex timing of the device.
Modeling DET-CSEs for Synthesis
Simply model the DET-CSE as a SET-CSE with worst-edge timing parameters.
[Figure: system clock, data, and output waveforms for the equivalent SET model, annotated with Ts-max, Th-max, and Tclk->Q-max.]
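Collapsing the two edges into one SET-style model amounts to taking the worst of each parameter, as in this Python sketch. The `EdgeTiming` and `worst_edge_model` names are hypothetical, and the per-edge values are illustrative, not characterization results.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EdgeTiming:
    setup: float  # ns
    hold: float   # ns
    clk_q: float  # ns

def worst_edge_model(rise: EdgeTiming, fall: EdgeTiming) -> EdgeTiming:
    """Collapse rising- and falling-edge parameters into one SET-style model
    with Ts-max, Th-max, and Tclk->Q-max."""
    return EdgeTiming(setup=max(rise.setup, fall.setup),
                      hold=max(rise.hold, fall.hold),
                      clk_q=max(rise.clk_q, fall.clk_q))

# Hypothetical per-edge values, for illustration only.
rise = EdgeTiming(setup=-0.02, hold=0.08, clk_q=0.37)
fall = EdgeTiming(setup=-0.01, hold=0.06, clk_q=0.39)
print(worst_edge_model(rise, fall))
```

Taking the maximum of every parameter is pessimistic for setup, which is safe for critical-path timing. It is not the right model for hold checks, where a fast clk-Q is the risk.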
Synthesis flow for DET-CSEs
[Figure: automated characterization (perl, hspice) produces the model of the DET-CSE. The RTL of the processor (VHDL), the standard cell library, and the DET-CSE model feed synthesis (Design Compiler). Synthesis produces a netlist with DET-CSEs (.db) for power analysis and timing analysis.]
Synthesis flow for DET-CSEs
Use synthesis directives to force the use of the modeled DET-CSE device.
Synthesize for the target throughput, not the target frequency.
Use worst-case models to meet critical-path timing constraints.
Generate a worst-case hold model to verify the race path: fastest clk-Q with worst-case hold time.
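The hold model pairs opposite extremes: the fastest clk-Q with the largest hold time. This Python sketch shows that pairing and a simple race-path slack check. The names `HoldParams`, `hold_model`, and `race_slack` are hypothetical. The slack formula is the standard same-edge hold check, clk-Q(min) + logic(min) ≥ hold + skew, assumed here to apply per edge. All numbers are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HoldParams:
    hold: float   # ns
    clk_q: float  # ns

def hold_model(rise: HoldParams, fall: HoldParams) -> HoldParams:
    """Worst-case hold model: largest hold time paired with the fastest clk-Q."""
    return HoldParams(hold=max(rise.hold, fall.hold), clk_q=min(rise.clk_q, fall.clk_q))

def race_slack(model: HoldParams, logic_min_ns: float, skew_ns: float = 0.0) -> float:
    """Race-path slack: the fastest new data must arrive after the capturing
    element's hold window (plus clock skew) closes. Negative means a hold violation."""
    return model.clk_q + logic_min_ns - (model.hold + skew_ns)

# Hypothetical per-edge values for illustration.
m = hold_model(HoldParams(hold=0.08, clk_q=0.37), HoldParams(hold=0.06, clk_q=0.33))
print(m, race_slack(m, logic_min_ns=0.0), race_slack(m, logic_min_ns=0.0, skew_ns=0.3))
```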
Modeling DET-CSEs for Synthesis
Race-path modeling.
[Figure: rising-edge and falling-edge sample windows with rising-to-falling and falling-to-rising launch/capture arcs.]
The race path may be under-constrained.
DET-CSE synthesis summary with H-tree estimate (area and power)
Cell type | Area (mm2) | % total | Power (mW) | % total
Memory blocks | 2.03 | 44% | 214.3 | 72%
Core | 1.65 | 36% | 64 | 21%
Clock tree (DET-CSE H-tree estimate @ new freq.) | N/A | N/A | 20.2 | 7%
Total | 4.64 | | 298.5 |
DET-CSE power profile
DET power breakdown:
Calculated clock power: 20.2 mW (7%)
Total core power: 64 mW (21%)
Register file: 85.762 mW (29%)
Total cache: 128.5716 mW (43%)
DET core summary
Core | Area (mm2) | % of core area | Power (mW) | % of core power
Sequential (1986 CSEs) | 1.41 | 85.5% | 22 | 34%
Combinational + nets | 0.24 | 14.5% | 42 | 66%
Total | 1.65 | | 64 |
Approximately 20k gates (based on NAND4)
DET-CSE core power profile
DET core power breakdown:
Calculated clock power: 20.2 mW (24%)
Total core power: 64 mW (76%)
Issues with DET-CSE integration
Memory blocks are single-edge triggered and must be clocked at twice the core clock rate.
Netlist simulations currently use a dual-edge-triggered VHDL behavioral model of the memory blocks.
Possible solutions:
Clock the memory blocks at 2x the nominal clock frequency.
Modify the memory address and data latches to be dual-edge triggered.
Power Comparison of two design netlists
SPGFF: core total = 92.46 mW; TGMS: core total = 106.8 mW
27 mW savings: 24% power savings in the core
SET core power breakdown: calculated clock power 48.507 mW (44%), total core power 63 mW (56%); total = 111 mW
DET core power breakdown: calculated clock power 20.2 mW (24%), total core power 64 mW (76%); total = 84.2 mW
Summary of comparison
24% savings in core power.
Estimated 28% increase in sequential cell area (17% increase in core area).
Both designs meet the specified performance at 200 MHz (both report zero slack).
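The headline savings follow from the two core power breakdowns, taking each total as calculated clock power plus the remaining core power. The Python arithmetic below reproduces the slides' 111 mW (111.5 mW before rounding), 84.2 mW, 27 mW, and 24%. The dictionary and function names are illustrative.

```python
SET = {"clock_mw": 48.507, "core_mw": 63.0}  # SET core power breakdown (TGMS)
DET = {"clock_mw": 20.2, "core_mw": 64.0}    # DET core power breakdown (SPGFF)

def total(breakdown):
    """Core total = calculated clock power + remaining core power (mW)."""
    return breakdown["clock_mw"] + breakdown["core_mw"]

set_total, det_total = total(SET), total(DET)
savings = set_total - det_total
print(f"SET {set_total:.1f} mW, DET {det_total:.1f} mW, "
      f"savings {savings:.1f} mW ({savings / set_total:.0%})")
# SET 111.5 mW, DET 84.2 mW, savings 27.3 mW (24%)
```

Nearly all of the savings come from the clock term (48.5 mW to 20.2 mW). The rest of the core power is essentially unchanged (63 mW vs. 64 mW).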
Summary
Established methods for automated cell characterization.
Developed design flow for DET-CSE integration.
Demonstrated pre-layout results.
Obtained a functional DET-CSE netlist.
Investigated functionally enhanced DET-CSEs (scan, reset).
Future work
Expand the family of DET-CSEs (e.g., sizings, functionalities).
Obtain more accurate clock tree loading.
Perform layout of the cells for a more accurate comparison.
Functionally enhanced Dual-Edge Triggered Flip-Flops
Need to show that functions such as reset, set, and scan can be added to DET-CSEs.
Need to analyze the power and performance impact of the added functionality.
Do DET-CSEs still result in practical power savings?
Scan in SPGFF
[Schematic: SPGFF with scan. Scan transistors (Mns0 to Mns5) with inputs SD and SCAN are added to the X branch (clocked by CLK and CLK3) and the Y branch (clocked by CLK1 and CLK4). The inverter chain generates CLK1 to CLK4, and the output stage drives Q.]
Scan in DFF
Functional Schematic of DFF with Scan
Clear in SPGFF
[Schematic: SPGFF with clear. Transistors Mpr0 to Mpr3, controlled by CLR, are added to the X branch (clocked by CLK and CLK3) and the Y branch (clocked by CLK1 and CLK4). The inverter chain generates CLK1 to CLK4, and the output stage drives Q.]
Clear in DFF
Preliminary Results of Adding Functionalities
$$\text{delay} = \max\left(t_{su,r} + t_{clk\text{-}q,f},\; t_{su,f} + t_{clk\text{-}q,r}\right)$$
Flip-flop | Delay | Power | EDP
SPGFF | 356 ps | 136 μW | 1.73e-23 J·s
SPGFF with scan | 371 ps (+4.2%) | 143 μW (+5%) | 1.97e-23 J·s (+14%)
SPGFF with reset | 407 ps (+14%) | 140 μW (+3%) | 2.32e-23 J·s (+34%)
Flip-flop | Delay | Power | EDP
SETFF | 412 ps | 82 μW | 1.38e-23 J·s
SETFF with scan | 483 ps (+17%) | 82 μW (0%) | 1.89e-23 J·s (+37%)
SETFF with reset | 483 ps (+17%) | 71 μW (-13%) | 1.65e-23 J·s (+20%)
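The percentages in parentheses are changes relative to the plain flip-flop of the same type. Recomputing them from the absolute columns, as in this Python sketch, reproduces the rounded values in the tables. `ROWS`, `pct_change`, and `overhead` are illustrative names.

```python
def pct_change(new, base):
    return 100.0 * (new - base) / base

# (delay ps, power uW, EDP J*s) from the tables above
ROWS = {
    "SPGFF":            (356, 136, 1.73e-23),
    "SPGFF with scan":  (371, 143, 1.97e-23),
    "SPGFF with reset": (407, 140, 2.32e-23),
    "SETFF":            (412, 82, 1.38e-23),
    "SETFF with scan":  (483, 82, 1.89e-23),
    "SETFF with reset": (483, 71, 1.65e-23),
}

def overhead(variant, base):
    """Rounded percent change of (delay, power, EDP) of `variant` relative to `base`."""
    return tuple(round(pct_change(n, b)) for n, b in zip(ROWS[variant], ROWS[base]))

for base in ("SPGFF", "SETFF"):
    for extra in ("scan", "reset"):
        name = f"{base} with {extra}"
        print(name, overhead(name, base))
```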