low power clocking

Post on 21-Jan-2016


ACSEL Lab, University of California, Davis

Low Power Clocking

Through the Use of Dual Edge Triggered Flip-Flops

Gabriel Ricardo

Theresa Holliday

Outline

Dual Edge Flip-Flops overview

Standard Cell Characterization

LEON Synthesis for SET design

LEON Synthesis for DET design

Issues with including Dual edge into synthesis flow

Preliminary comparisons

Conclusions and Future Work

Questions

Symmetric Pulse Generator Flip-Flop (SPGFF)

First stage, X and Y, is dynamic; second stage is a static NAND.

Results in small delay.

Can size to trade some delay for power.

Operation of SPGFF

A transparency window created by CLK and CLK3 for stage 1 (CLK1 and CLK4 for stage 2) allows X (Y) to evaluate conditionally, based on input D.

The output-stage NAND passes X or Y to the output according to the clock value, without the need for a latch.

Transmission Gate Master Slave (TGMS)


Comparison between SPGFF and TGMS in 0.18 μm

        Delay    Power     EDP           Clk load
SPGFF   356 ps   133 μW    1.70e-23 Js   12 fF
TGMS    354 ps   89.9 μW   1.13e-23 Js   16 fF
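The EDP column appears consistent with taking energy as power times delay and multiplying by delay once more. The sketch below checks that reading against the tabulated values; the formula is an assumption inferred from the numbers, not a definition stated on the slide.

```python
def energy_delay_product(delay_s, power_w):
    """Return EDP in joule-seconds, taking energy as power x delay (assumed definition)."""
    energy_j = power_w * delay_s   # energy spent over one delay interval
    return energy_j * delay_s      # multiply by delay again for EDP

# Delay and power values from the SPGFF/TGMS comparison table.
spgff_edp = energy_delay_product(356e-12, 133e-6)
tgms_edp = energy_delay_product(354e-12, 89.9e-6)

print(f"SPGFF EDP = {spgff_edp:.3g} Js")
print(f"TGMS  EDP = {tgms_edp:.3g} Js")
```

Both results land within about 1% of the tabulated EDP values, which suggests the table was built from rounded delay and power figures.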

$\mathrm{delay} = \max\left(t_{su,r} + t_{clk\text{-}q,f},\; t_{su,f} + t_{clk\text{-}q,r}\right)$

[Figure: bar charts comparing TGMS and SPGFF. Performance (ps): setup time 110 (TGMS) and -20 (SPGFF); total delay 354 (TGMS) and 356 (SPGFF). Power at 25% activity (μW), split into clock, data, internal, and total power; total power is 90 (TGMS) and 133 (SPGFF).]

Advantages of SPGFF

Lowest clock energy among DET-CSEs, resulting in higher clock power savings

Energy delay product comparable to high performance single edge triggered clocked storage elements


Characterization Methodology – Generating synthesis views

Created an automated process for generating Synopsys Liberty format (.lib) synthesis models.

Using Perl scripts and gspice (SPICE pre/post-processor).

Characterized for timing and energy.

Can easily extend to generate Cadence synthesis models (.tlf).

Characterization Methodology – Trip-points

Used same trip-points as those in the technology library.

Nominal conditions: 25˚C, 1.8 V supply.

Can easily generate best- and worst-case corner models (over temperature and supply variation).

Cell delay: defined as clock 50% rise/fall to output (Q or QN) 50% rise/fall.

Transition time: 10%-90% rise time, 90%-10% fall time.
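The trip-point definitions can be illustrated with a small measurement sketch. The waveforms below are made-up piecewise-linear ramps, and the function names are hypothetical; the slides do not show how the Perl and SPICE flow extracts these values.

```python
VDD = 1.8  # nominal supply in volts

def crossing_time(times, volts, level, rising=True):
    """First time the sampled waveform crosses `level`, using linear interpolation."""
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            t0, t1 = times[i], times[i + 1]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")

def cell_delay(times, clk, q, clk_rising=True, q_rising=True):
    """Clock 50% point to output 50% point."""
    return (crossing_time(times, q, 0.5 * VDD, q_rising)
            - crossing_time(times, clk, 0.5 * VDD, clk_rising))

def rise_transition(times, v):
    """10% to 90% rise time."""
    return (crossing_time(times, v, 0.9 * VDD, True)
            - crossing_time(times, v, 0.1 * VDD, True))

# Hypothetical waveforms, times in ns (illustrative only).
t = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
clk = [0.0, 1.8, 1.8, 1.8, 1.8, 1.8]   # rising clock edge
q = [0.0, 0.0, 0.0, 0.9, 1.8, 1.8]     # output rises later

delay_ns = cell_delay(t, clk, q)
trans_ns = rise_transition(t, q)
print(delay_ns, trans_ns)
```

A fall-time measurement would follow the same pattern with `rising=False` and the 90% and 10% levels swapped.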

Trip-points - Falling

Trip-points - Rising

Characterization Methodology - Drive Characteristics

Build a 5x5 non-linear delay table.

Clock slope values (ns): 0.03, 0.1, 0.4, 1.5, 3

Output load values (fF): 0.35, 21, 38.5, 147, 311
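A non-linear delay table of this kind is typically read by interpolating between the four surrounding grid points. The sketch below uses the slope and load index values from the slide, but the delay entries come from a made-up linear model (`toy_delay_ns`), since the characterized numbers are not given in the transcript.

```python
from bisect import bisect_right

CLOCK_SLOPES_NS = [0.03, 0.1, 0.4, 1.5, 3.0]   # table index 1 (from the slide)
LOADS_FF = [0.35, 21.0, 38.5, 147.0, 311.0]    # table index 2 (from the slide)

def bracket(axis, x):
    """Index i of the grid cell [axis[i], axis[i+1]] used for x, clamped to the table."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def lookup(table, slope_ns, load_ff):
    """Bilinear interpolation in a 5x5 table; points outside the grid extrapolate."""
    i = bracket(CLOCK_SLOPES_NS, slope_ns)
    j = bracket(LOADS_FF, load_ff)
    s0, s1 = CLOCK_SLOPES_NS[i], CLOCK_SLOPES_NS[i + 1]
    c0, c1 = LOADS_FF[j], LOADS_FF[j + 1]
    u = (slope_ns - s0) / (s1 - s0)
    v = (load_ff - c0) / (c1 - c0)
    return ((1 - u) * (1 - v) * table[i][j] + (1 - u) * v * table[i][j + 1]
            + u * (1 - v) * table[i + 1][j] + u * v * table[i + 1][j + 1])

def toy_delay_ns(slope_ns, load_ff):
    """Hypothetical delay model used only to fill the table for this example."""
    return 0.15 + 0.05 * slope_ns + 0.001 * load_ff

TABLE = [[toy_delay_ns(s, c) for c in LOADS_FF] for s in CLOCK_SLOPES_NS]

print(lookup(TABLE, 0.2, 24.0))
```

Because the toy model is linear in each variable, the interpolated value matches the model exactly; real characterization data would show interpolation error between grid points.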

Characterization Methodology – Trip-points

Setup time: sweep the input transition towards the active edge until the clock-to-output delay increases by 10%.

Hold time: sweep the input transition away from the active edge until the clock-to-output delay increases by 10%.
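The setup sweep can be sketched as a simple loop. Here `toy_clk_to_q` is a hypothetical stand-in for a SPICE run that returns clock-to-output delay for a given data-to-clock spacing, and the 3 ps step is borrowed from the accuracy quoted for the setup and hold tables; none of this is the authors' script.

```python
import math

def toy_clk_to_q(data_to_clock_ps):
    """Hypothetical clk-Q model (ps): flat when data is early, rising near the edge."""
    return 240.0 * (1.0 + math.exp(-(data_to_clock_ps - 60.0) / 15.0))

def find_setup_time(clk_to_q, start_ps=500.0, step_ps=3.0, pushout=0.10, stop_ps=-500.0):
    """Move data toward the clock edge until clk-Q exceeds nominal by `pushout`."""
    nominal = clk_to_q(start_ps)            # data arrives very early: constant clk-Q
    limit = nominal * (1.0 + pushout)       # 10% push-out threshold
    d = start_ps
    # Keep stepping while the next, later data arrival still stays within the limit.
    while d - step_ps >= stop_ps and clk_to_q(d - step_ps) <= limit:
        d -= step_ps
    return d                                 # last spacing that met the criterion

setup_ps = find_setup_time(toy_clk_to_q)
print(setup_ps)
```

A hold-time search would mirror this, sweeping the data transition away from the active edge on the other side of the sampling window.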

Characterization Methodology – Setup-hold

[Figure: clock-to-Q delay versus data-to-clock delay, showing the constant clk-Q regions, the variable clk-Q regions, the failure region, and the 10% push-out points.]

Characterization Methodology – Setup and Hold

Build a 3x2 non-linear delay table (3 ps accuracy).

Clock slope values (ns): 0.03, 3

Data slope values (ns): 0.03, 0.9, 3

Characterization Methodology – Internal energy

Characterized over same data points as drive characteristics for internal energy (5x5 lookup table).

Data pin, clock pin energy tables generated (1x5 lookup table).


Characterization Results - single vs dual-edge – D to Q delay

[Figure: surface plots of TGMS delay and SPGFF delay (ns, 0 to 0.45) versus load (number of minimum-sized inverters) and clock slope (ns).]

What is typical output load?

Extracted output loading from netlist for all CSEs.

Average load = 24 fF (6.8 min. sized inverters)

90% of CSEs have a load of less than 60 fF (17 min. sized inverters)
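The two load figures imply a conversion factor between femtofarads and minimum-sized inverters. The short sketch below derives it from the average-load line and applies it to the 60 fF figure; the per-inverter capacitance is inferred from these numbers rather than stated on the slide.

```python
AVG_LOAD_FF = 24.0    # average CSE output load from the slide
AVG_LOAD_INV = 6.8    # the same load expressed in minimum-sized inverters

# Implied input capacitance of one minimum-sized inverter (about 3.5 fF).
cap_per_inverter_ff = AVG_LOAD_FF / AVG_LOAD_INV

def load_in_inverters(load_ff):
    """Convert a capacitive load in fF to an equivalent count of minimum inverters."""
    return load_ff / cap_per_inverter_ff

print(round(cap_per_inverter_ff, 2))
print(round(load_in_inverters(60.0), 1))
```

Applying the same factor to the characterized load points of 21, 38.5, and 147 fF gives roughly 6, 11, and 42 inverters.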

Netlist extracted CSE output loading statistics

[Figure: histogram of output loading on CSEs; number of nets (0 to 1200) versus loading in number of min. sized inverters.]

Characterization Results - single vs dual-edge – Delay

[Figure: surface plots of TGMS delay and SPGFF delay (ns, 0 to 0.45) versus load (number of minimum-sized inverters) and clock slope (ns), with the typical region of operation marked.]

Characterization Results – zoomed-in - single vs dual-edge – delay

[Figure: surface plots of SPGFF delay and TGMS delay (ns, 0.14 to 0.21) versus load (2 to 6 min. inverters) and clock slope (0.03 to 0.11 ns).]

Characterization Results - single vs dual-edge – Energy delay product

[Figure: surface plots of SPGFF energy and TGMS energy (pJ, 0.18 to 0.32) versus load (2 to 6 min. inverters) and clock slope (0.03 to 0.11 ns).]


Leon SPARC core configuration

Leon SPARC synthesis

Synthesized using TSMC 0.18 μm standard cell library.

Target frequency of 200 MHz.

Limit use to a single-sized D-FF.

SET- Synthesis flow

[Flow diagram with blocks: RTL of processor (VHDL); Standard cell library; Synthesis (Design Compiler); Netlist (.db); Reports (area, timing); Power Analysis (Power Compiler).]

SET-CSE synthesis summary: Area and Power

Cell type                Area (mm2)   % total   Power (mW)   % total
Memory blocks            2.03         55%       214.3        72%
Core                     0.71         19%       73           24%
Clock tree (ideal net)   N/A          N/A       11.6         4%
Total                    3.7                    299

Core summary

Core                     Area (mm2)   Power (mW)   % total core power
Sequential (1986 CSEs)   0.47         26           36%
Combinatorial + nets     0.24         47           64%
Total                    0.71         73

Approximately 20k gates

Clock tree loading

Clock tree components           Loading (pF)
Sequential cells (1986 cells)   5.18
Memory macro cells (6)          1.37
Wire routing*                   11.4
Total                           17.94

* Based on library wire-load model
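The 11.6 mW ideal-net clock tree figure in the synthesis summary appears consistent with a plain C·V²·f estimate on this load, using the 1.8 V nominal supply and the 200 MHz target frequency. The sketch below is that check; treating the clock as a full-swing net charged once per cycle is an assumption, not something the slides state.

```python
VDD_V = 1.8          # nominal supply from the characterization conditions
F_CLK_HZ = 200e6     # target frequency of the SET design

# Clock tree load components in pF, taken from the table.
loads_pf = {
    "sequential cells": 5.18,
    "memory macro cells": 1.37,
    "wire routing": 11.4,
}

def switching_power_mw(cap_pf, vdd_v, f_hz):
    """Dynamic power of a full-swing net switching once per cycle, in mW."""
    return cap_pf * 1e-12 * vdd_v ** 2 * f_hz * 1e3

total_pf = sum(loads_pf.values())
clock_power_mw = switching_power_mw(total_pf, VDD_V, F_CLK_HZ)
print(f"total load = {total_pf:.2f} pF, clock power = {clock_power_mw:.1f} mW")
```

The summed components differ from the tabulated total in the last digit, which looks like rounding in the individual entries.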

Clock tree power estimation

High-fanout nets are beyond the interpolation range of the library's wire-load models.

Wire-load models are not meant for estimating balanced distribution nets such as clock nets.

Using library wire-load models for the clock tree is not valid.

Use an H-tree estimation equation to obtain a ballpark number.

H-tree estimation equation

Equation developed by ACSEL lab member Nikola Nedovic.

Recursively calculates H-tree loading for a given area, number of CSEs in the design, and number of H-tree levels.

H-tree estimation method

[Figure: H-tree driven from the PLL over a square of side S; leaf-level segments have length S/2^(L-1), and each leaf drives M/4^(L-1) storage elements.]

H-tree estimation method

* Table taken from Nedovic, Nikola, Ph.D. Dissertation, UCD, “CLOCKED STORAGE ELEMENTS FOR HIGH-PERFORMANCE APPLICATIONS”

H-tree estimation method

Equation reduces to:

[Equation with two labeled terms: load due to CSEs and load due to wiring.]

Total H-tree power

[Equation with two labeled terms: load switching power and clock driver power.]
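The transcript does not preserve the estimation equation itself, so the following is only a rough sketch of a recursive H-tree load estimate, not the equation from the dissertation. It assumes an idealized tree in which each H spans a square, has three segments of half that square's side, and feeds four half-size squares, with L-1 levels of branching so that leaf segments come out at S/2^(L-1). All parameter values in the example are hypothetical.

```python
def h_tree_wire_mm(side_mm, levels, level=1):
    """Total wire length of an idealized H-tree with `levels - 1` levels of branching."""
    if level >= levels:
        return 0.0                      # leaf level: no further H below this point
    one_h = 3 * side_mm / 2             # three segments, each half the square's side
    return one_h + 4 * h_tree_wire_mm(side_mm / 2, levels, level + 1)

def h_tree_load_pf(side_mm, n_cse, levels, cse_cap_ff, wire_cap_ff_per_mm):
    """Load due to wiring plus load due to CSEs, in pF."""
    wire_ff = wire_cap_ff_per_mm * h_tree_wire_mm(side_mm, levels)
    cse_ff = n_cse * cse_cap_ff
    return (wire_ff + cse_ff) / 1000.0

# Hypothetical inputs: 2 mm square, 1000 CSEs at 2 fF each, 100 fF/mm wire, L = 3.
example_wire_mm = h_tree_wire_mm(2.0, 3)
example_load_pf = h_tree_load_pf(2.0, 1000, 3, 2.0, 100.0)
print(example_wire_mm, example_load_pf)
```

In this idealization the wire length has the closed form (3S/2)(2^(L-1) - 1), which the recursion reproduces. Clock driver power, the second term in the total H-tree power, is not modeled here.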

SET-CSE synthesis summary with H-tree estimate: Area and Power

Cell type                      Area (mm2)   % total   Power (mW)   % total
Memory blocks                  2.03         55%       214.3        66%
Core                           0.71         19%       63           19%
Clock tree (H-tree estimate)   N/A          N/A       48.5         15%
Total                          3.7                    325

SET-CSE power profile with H-tree estimate

SET power breakdown (mW)

Calculated clock power: 48.507 (15%)

Total core power: 63 (19%)

Register file: 85.762 (26%)

Total cache: 128.5716 (40%)

SET-CSE Core power profile

SET Core power breakdown (mW)

Calculated clock power: 48.507 (44%)

Total core power: 63 (56%)


Modeling DET-CSEs for Synthesis

Need to model the timing parameters for both edges.

[Figure: timing diagram of system clock, data, and output. The SET-CSE has a single Tsetup, Thold, and Tclk->Q; the DET-CSE has setup and hold times for both the rising edge (Ts-r, Th-r) and the falling edge (Ts-f, Th-f).]

Modeling DET-CSEs for Synthesis

Can model complex timing relationships for synthesis.

[Figure: D, CLK, and Q waveforms showing a rising-edge timing arc and a falling-edge timing arc.]

Modeling DET-CSEs for Synthesis

Synthesis tool will time, and (try to) meet constraints for the dual-edge triggered synchronous system.


Modeling DET-CSEs for Synthesis

Synthesis tool will use the worst timing arc relationship for critical path constraint.

[Figure: clock waveform with rising-edge and falling-edge sample windows; of the rising-to-falling and falling-to-rising intervals, one is marked critical and the other not critical.]

Modeling DET-CSEs for Synthesis

Synthesis tools are not capable of inferring a dual-edge triggered device from HDL code.

To meet timing, only the strictest constraint matters anyway (i.e., for one pair of launch and capture edges).

Modeling a complex timing device is unnecessary.

Modeling DET-CSEs for Synthesis

Simply model DET-CSE as a SET-CSE with worst-edge timing parameters.

[Figure: timing diagram of system clock, data, and output, annotated with Ts-max, Th-max, and Tclk->Q-max.]
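Collapsing the two edges into one worst-edge model amounts to taking the maximum of each timing parameter. The sketch below shows that reduction; the `EdgeTiming` type and the per-edge numbers are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EdgeTiming:
    """Timing parameters for one active clock edge, in picoseconds."""
    setup_ps: float
    hold_ps: float
    clk_to_q_ps: float

def worst_edge_model(rising: EdgeTiming, falling: EdgeTiming) -> EdgeTiming:
    """Single-edge model carrying the worst (largest) value of each parameter."""
    return EdgeTiming(
        setup_ps=max(rising.setup_ps, falling.setup_ps),
        hold_ps=max(rising.hold_ps, falling.hold_ps),
        clk_to_q_ps=max(rising.clk_to_q_ps, falling.clk_to_q_ps),
    )

# Hypothetical per-edge characterization results (illustrative values only).
rising = EdgeTiming(setup_ps=-20.0, hold_ps=95.0, clk_to_q_ps=360.0)
falling = EdgeTiming(setup_ps=-12.0, hold_ps=110.0, clk_to_q_ps=376.0)

model = worst_edge_model(rising, falling)
print(model)
```

The resulting parameters can then be written into a conventional single-edge library cell, which is what lets an unmodified synthesis tool time the design.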

Synthesis flow for DET-CSEs

[Flow diagram with blocks: RTL of processor (VHDL); Standard cell library; Automated Characterization (perl, hspice); Model of DET-CSE; Synthesis (Design Compiler); Netlist with DET-CSEs (.db); Power Analysis; Timing Analysis.]

Synthesis flow for DET-CSEs

Use synthesis directives to force use of the modeled DET-CSE device.

Synthesize for target throughput, not frequency.

Worst-case models for meeting critical-path timing constraints.

Generate a worst-case hold model to verify the race-path.

Fastest clk-Q with worst-case hold time.
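Two of these points lend themselves to a short sketch: a dual-edge design reaches a given data rate with half the clock frequency, and the race-path model pairs the fastest clk-Q with the largest hold time. The function names and per-edge values below are hypothetical.

```python
def det_clock_hz(throughput_hz):
    """Clock frequency for a DET design: data is captured on both edges of each cycle."""
    return throughput_hz / 2.0

def race_path_model(edges):
    """Pessimistic hold-check model: fastest clk-Q combined with worst-case hold."""
    return {
        "clk_to_q_ps": min(e["clk_to_q_ps"] for e in edges),
        "hold_ps": max(e["hold_ps"] for e in edges),
    }

# 200 MHz single-edge throughput target, as used for the SET synthesis run.
det_clk = det_clock_hz(200e6)

# Hypothetical rising-edge and falling-edge characterization results.
edges = [
    {"clk_to_q_ps": 360.0, "hold_ps": 95.0},
    {"clk_to_q_ps": 376.0, "hold_ps": 110.0},
]
hold_model = race_path_model(edges)
print(det_clk, hold_model)
```

Using the minimum clk-Q here is the opposite choice from the critical-path model, because a short path fails when data arrives too early rather than too late.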

Modeling DET-CSEs for Synthesis

Race-path modeling.

[Figure: clock waveform with rising-edge and falling-edge sample windows across the rising-to-falling and falling-to-rising intervals.]

May have an under-constrained race-path.

DET-CSE synthesis summary with H-tree estimate: Area and Power

Cell type                                           Area (mm2)   % total   Power (mW)   % total
Memory blocks                                       2.03         44%       214.3        72%
Core                                                1.65         36%       64           21%
Clock tree (DET-CSE H-tree estimate) @ new freq.    N/A          N/A       20.2         7%
Total                                               4.64                   298.5

DET-CSE power profile

DET power breakdown (mW)

Calculated clock power: 20.2 (7%)

Total core power: 64 (21%)

Register file: 85.762 (29%)

Total cache: 128.5716 (43%)

DET Core summary

Core                     Area (mm2)   % total core   Power (mW)   % total
Sequential (1986 CSEs)   1.41         85.5%          22           34%
Combinatorial + nets     0.24         14.5%          42           66%
Total                    1.65                        64

Approximately 20k gates (based on nand4)

DET-CSE power profile

DET Core power breakdown (mW)

Calculated clock power: 20.2 (24%)

Total core power: 64 (76%)


Issues with DET-CSE integration

Memory blocks are single-edge triggered and must be clocked at twice the core clock rate.

Currently using a dual-edge triggered VHDL behavioral model for memory blocks for netlist simulations.

Possible solutions:

Clock the memory blocks at 2x nominal.

Modify memory address and data latches to be dual-edge triggered.


Power Comparison of two design netlists

SPGFF: Core Total = 92.46 mW

TGMS: Core Total = 106.8 mW

27 mW savings

24% power savings in core

SET Core power breakdown (mW): calculated clock power 48.507 (44%); total core power 63 (56%); Total = 111 mW

DET Core power breakdown (mW): calculated clock power 20.2 (24%); total core power 64 (76%); Total = 84.2 mW

Summary of comparison

24% savings in core power.

Estimated 28% increase in sequential cell area (17% increase in core area).

Both meet specified performance @ 200 MHz (report zero slack).
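The headline savings follow directly from the core and clock power figures in the two breakdowns. A minimal check, assuming the comparison is between core power plus estimated clock tree power for each design:

```python
# Core logic power plus H-tree clock power estimate, in mW, from the breakdowns.
set_total_mw = 63.0 + 48.507   # single-edge design
det_total_mw = 64.0 + 20.2     # dual-edge design

saving_mw = set_total_mw - det_total_mw
saving_pct = 100.0 * saving_mw / set_total_mw

print(f"saving = {saving_mw:.1f} mW ({saving_pct:.0f}%)")
```

Nearly all of the difference comes from the clock tree term; the core logic power itself is essentially unchanged between the two netlists.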


Summary

Established methods for automated cell characterization.

Developed design flow for DET-CSE integration.

Demonstrated pre-layout results.

Obtained functional DET-CSE netlist.

Investigated functionally enhanced DET-CSEs (scan, reset).

Future work

Expand family of DET-CSEs (i.e., sizings, functionalities).

Obtain more accurate clock tree loading.

Perform layout of cells for more accurate comparison.

Functionally enhanced Dual-Edge Triggered Flip-Flops

Need to show that functions such as reset, set, and scan can be added to DET-CSEs.

Need to analyze the power and performance impact of the added functionality.

Do DET-CSEs still result in practical power savings?

Scan in SPGFF

[Figure: transistor-level schematic of the SPGFF with scan. The X and Y first-stage nodes each take D, SD, and SCAN inputs and are clocked by the delayed clock phases CLK, CLK1, CLK2, CLK3, and CLK4.]

Scan in DFF

Functional Schematic of DFF with Scan

Clear in SPGFF

[Figure: transistor-level schematic of the SPGFF with clear. A CLR input is added to both the X and Y first-stage nodes, which are clocked by the delayed clock phases CLK, CLK1, CLK2, CLK3, and CLK4.]

Clear in DFF


Preliminary Results of Adding Functionalities

$\mathrm{delay} = \max\left(t_{su,r} + t_{clk\text{-}q,f},\; t_{su,f} + t_{clk\text{-}q,r}\right)$

             Delay           Power            EDP
SPGFF        356 ps          136 μW           1.73e-23 Js
With Scan    371 ps (4.2%)   143 μW (5%)      1.97e-23 Js (14%)
With Reset   407 ps (14%)    140 μW (3%)      2.32e-23 Js (34%)

             Delay           Power            EDP
SETFF        412 ps          82 μW            1.38e-23 Js
With Scan    483 ps (17%)    82 μW (0%)       1.89e-23 Js (37%)
With Reset   483 ps (17%)    71 μW (-13%)     1.65e-23 Js (20%)
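The percentages in parentheses read as changes relative to the base flip-flop in each table. The sketch below recomputes a few of them from the tabulated values as a consistency check; the helper name is hypothetical.

```python
def percent_change(base, new):
    """Relative change of `new` with respect to `base`, in percent."""
    return 100.0 * (new - base) / base

# (delay in ps, power in uW, EDP in Js) taken from the SPGFF table.
spgff = (356.0, 136.0, 1.73e-23)
spgff_scan = (371.0, 143.0, 1.97e-23)
spgff_reset = (407.0, 140.0, 2.32e-23)

for name, variant in (("scan", spgff_scan), ("reset", spgff_reset)):
    deltas = [percent_change(b, v) for b, v in zip(spgff, variant)]
    print(name, [round(d, 1) for d in deltas])

# SETFF with reset: the tabulated power drops from 82 uW to 71 uW.
setff_reset_power_change = percent_change(82.0, 71.0)
print(round(setff_reset_power_change))
```

On these numbers, scan costs the SPGFF proportionally less delay than it costs the single-edge flip-flop, while reset is the more expensive addition in terms of EDP.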
