low power clocking

ACSEL Lab University of California, Davis 1 Low Power Clocking Through the Use of Dual Edge Triggered Flip-Flops Gabriel Ricardo Theresa Holliday

Upload: lapis

Post on 21-Jan-2016


Page 1: Low Power Clocking

ACSEL Lab University of California, Davis 1

Low Power Clocking

Through the Use of Dual Edge Triggered Flip-Flops

Gabriel Ricardo

Theresa Holliday

Page 2: Low Power Clocking

Outline

Dual Edge Flip-Flops overview
Standard Cell Characterization
LEON Synthesis for SET design
LEON Synthesis for DET design
Issues with including Dual edge into synthesis flow
Preliminary comparisons
Conclusions and Future Work
Questions

Page 4: Low Power Clocking

Symmetric Pulse Generator Flip-Flop (SPGFF)

First stage (nodes X and Y) is dynamic; second stage is a static NAND.

Results in small delay.

Can be sized to trade some delay for power.

Page 5: Low Power Clocking

Operation of SPGFF

The transparency window created by CLK and CLK3 for stage 1 (CLK1 and CLK4 for stage 2) allows X (Y) to evaluate conditionally, based on input D.

The output-stage NAND passes X and Y to the output based on the clock value, without the need for a latch.

Page 6: Low Power Clocking

Transmission Gate Master Slave (TGMS)

Page 7: Low Power Clocking

Comparison between SPGFF and TGMS in 0.18um

         Delay    Power     EDP           Clk load
SPGFF    356 ps   133 μW    1.70e-23 Js   12 fF
TGMS     354 ps   89.9 μW   1.13e-23 Js   16 fF

$\mathrm{delay} = \max\left(t_{su,r} + t_{clk\text{-}q,f},\; t_{su,f} + t_{clk\text{-}q,r}\right)$

Performance (ps)

              TGMS   SPGFF
Setup Time    110    -20
Total Delay   354    356

Power @ 25% activity (uW)

                 TGMS   SPGFF
Clock Power      12     9.3
Data Power       3      2.0
Internal Power   75     122
Total Power      90     133
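The EDP column of the comparison table appears consistent with power multiplied by the square of the delay. The sketch below recomputes it under that assumption; the function name and dictionary layout are illustrative and not part of the original flow.

```python
# Assumption: EDP = power * delay^2 (energy ~ power * delay, times delay again).

def energy_delay_product(delay_s: float, power_w: float) -> float:
    """Return the energy-delay product in joule-seconds."""
    return power_w * delay_s ** 2

# Delay and power values taken from the comparison table (0.18 um).
cells = {
    "SPGFF": {"delay_s": 356e-12, "power_w": 133e-6},
    "TGMS": {"delay_s": 354e-12, "power_w": 89.9e-6},
}

edp = {name: energy_delay_product(**values) for name, values in cells.items()}

for name, value in edp.items():
    print(f"{name}: EDP = {value:.3e} Js")
```

Both results land within rounding of the tabulated 1.70e-23 Js and 1.13e-23 Js.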

Page 8: Low Power Clocking

Advantages of SPGFF

Lowest clock energy among the dual-edge-triggered clocked storage elements (DET-CSEs), resulting in higher clock power savings.

Energy-delay product comparable to high-performance single-edge-triggered clocked storage elements.

Page 10: Low Power Clocking

Characterization Methodology – Generating synthesis views

Created automated process for generating Synopsys Liberty format (.lib) synthesis models.

Using Perl scripts and gspice (SPICE pre/post-processor).

Characterized for timing and energy.

Can easily extend to generate Cadence synthesis models (.tlf).

Page 11: Low Power Clocking

Characterization Methodology – Trip-points

Used same trip-points as those in technology library.

Nominal conditions: 25˚C, 1.8V supply.

Can easily generate best and worst case corner models (over temp and supply variation).

Cell delay: defined as clock 50% rise/fall to output (Q or QN) 50% rise/fall.

Transition time: 10%-90% rise time, 90%-10% fall time.
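The trip-point definitions above can be applied to sampled waveforms as in the following minimal sketch. It assumes piecewise-linear waveforms and uses made-up ramp data; the helper names are hypothetical and this is not the authors' Perl/gspice implementation.

```python
from typing import Sequence

VDD = 1.8  # nominal supply in volts, from the slide


def crossing_time(t: Sequence[float], v: Sequence[float], level: float) -> float:
    """Time of the first crossing of `level`, found by linear interpolation."""
    for i in range(len(t) - 1):
        v0, v1 = v[i], v[i + 1]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t[i] + (level - v0) * (t[i + 1] - t[i]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def cell_delay(t, v_clk, v_out) -> float:
    """Clock 50% point to output 50% point."""
    return crossing_time(t, v_out, 0.5 * VDD) - crossing_time(t, v_clk, 0.5 * VDD)


def rise_transition(t, v) -> float:
    """10% to 90% rise time."""
    return crossing_time(t, v, 0.9 * VDD) - crossing_time(t, v, 0.1 * VDD)


# Hypothetical ideal ramps (time in ns, voltage in V), for illustration only.
t = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
clk = [0.0, 1.8, 1.8, 1.8, 1.8, 1.8]  # clock rises between 0.0 and 0.1 ns
q = [0.0, 0.0, 0.0, 0.9, 1.8, 1.8]    # output rises between 0.2 and 0.4 ns

print(cell_delay(t, clk, q), rise_transition(t, q))
```

A falling-edge measurement would use the same crossing helper with the 90% and 10% levels swapped.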

Page 12: Low Power Clocking

Trip-points - Falling

Page 13: Low Power Clocking

Trip-points - Rising

Page 14: Low Power Clocking

Characterization Methodology - Drive Characteristics

Build 5x5 non-linear delay table.

Clock slope values (ns): 0.03, 0.1, 0.4, 1.5, 3

Output load values (fF): 0.35, 21, 38.5, 147, 311
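A consumer of such a 5x5 table typically interpolates between the characterized points. The sketch below shows plain bilinear interpolation over the slope and load axes listed above. The delay values are invented (a simple plane) purely to make the example checkable, and real synthesis tools may interpolate differently.

```python
from bisect import bisect_right

CLOCK_SLOPES_NS = [0.03, 0.1, 0.4, 1.5, 3.0]        # axis values from the slide
OUTPUT_LOADS_FF = [0.35, 21.0, 38.5, 147.0, 311.0]  # axis values from the slide


def _segment(axis, x):
    """Indices (i, i+1) of the axis segment used for x; end segments are
    reused for out-of-range points, which gives linear extrapolation."""
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    return i, i + 1


def lookup(table, slope_ns, load_ff):
    """Bilinear interpolation in a table indexed as table[slope][load]."""
    i0, i1 = _segment(CLOCK_SLOPES_NS, slope_ns)
    j0, j1 = _segment(OUTPUT_LOADS_FF, load_ff)
    u = (slope_ns - CLOCK_SLOPES_NS[i0]) / (CLOCK_SLOPES_NS[i1] - CLOCK_SLOPES_NS[i0])
    w = (load_ff - OUTPUT_LOADS_FF[j0]) / (OUTPUT_LOADS_FF[j1] - OUTPUT_LOADS_FF[j0])
    low = table[i0][j0] * (1 - w) + table[i0][j1] * w
    high = table[i1][j0] * (1 - w) + table[i1][j1] * w
    return low * (1 - u) + high * u


# Hypothetical delays in ns: the plane 0.15 + 0.05*slope + 0.002*load.
DELAY_NS = [[0.15 + 0.05 * s + 0.002 * c for c in OUTPUT_LOADS_FF]
            for s in CLOCK_SLOPES_NS]

print(lookup(DELAY_NS, 0.2, 24.0))
```

Because the made-up data is a plane, the interpolated value can be checked against the closed-form expression directly.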

Page 15: Low Power Clocking

Characterization Methodology – Trip-points

Setup time: sweep input transition towards active edge until 10% increase in clock to output delay.

Hold time: sweep input transition away from active edge until 10% increase in clock to output delay.
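The 10% push-out criterion can be sketched as a search over the data-to-clock offset. The example below swaps the linear sweep for a bisection, which assumes clock-to-output delay grows monotonically as data approaches the edge. `toy_clk_to_q_ps` is a hypothetical stand-in for a SPICE run, and the 3 ps default resolution is only a parameter choice.

```python
import math


def toy_clk_to_q_ps(data_to_clock_ps: float) -> float:
    """Hypothetical model: clk-Q rises as data arrives closer to the clock edge."""
    nominal_ps = 200.0
    return nominal_ps * (1.0 + math.exp(-(data_to_clock_ps - 20.0) / 15.0))


def setup_time_ps(measure, relaxed_ps=500.0, tight_ps=0.0,
                  pushout=0.10, resolution_ps=3.0) -> float:
    """Smallest data-to-clock offset (within resolution) whose clk-Q stays
    within `pushout` of the relaxed-case clk-Q."""
    limit = (1.0 + pushout) * measure(relaxed_ps)  # reference far from the edge
    if measure(tight_ps) <= limit:
        raise ValueError("tight bound does not violate the push-out limit")
    passing, failing = relaxed_ps, tight_ps
    while passing - failing > resolution_ps:
        mid = 0.5 * (passing + failing)
        if measure(mid) <= limit:
            passing = mid
        else:
            failing = mid
    return passing


t_setup = setup_time_ps(toy_clk_to_q_ps)
print(f"setup time ~ {t_setup:.1f} ps")
```

Hold time follows the same pattern with the data transition moved to the other side of the active edge.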

Page 16: Low Power Clocking

Characterization Methodology – Setup-hold

[Figure: clock-to-Q delay versus data-to-clock delay, showing the constant clk-Q regions, the variable clk-Q regions bounded by the 10% push-out points, and the failure region.]

Page 17: Low Power Clocking

Characterization Methodology – Setup and Hold

Build 3x2 non-linear delay table (3ps accuracy).

Clock slope values (ns): 0.03, 3

Data slope values (ns): 0.03, 0.9, 3

Page 18: Low Power Clocking

Characterization Methodology – Internal energy

Characterized over same data points as drive characteristics for internal energy (5x5 lookup table).

Data pin, clock pin energy tables generated (1x5 lookup table).

Page 19: Low Power Clocking

Characterization Results- single vs dual-edge – D to Q delay

[Figure: surface plots of TGMS delay and SPGFF delay; delay (ns, 0 to 0.45) versus load (# of minimum sized inverters) and clock slope (ns).]

Page 20: Low Power Clocking

What is typical output load?

Extracted output loading from netlist for all CSEs.

Average load = 24fF (6.8 min. inverters)

90% of CSEs have load less than 60fF (17 min. sized inverters)

Page 21: Low Power Clocking

Netlist extracted CSE output loading statistics

[Figure: histogram of output loading on CSEs; number of nets (0 to 1200) versus loading (# of min. sized inverters).]

Page 22: Low Power Clocking

Characterization Results- single vs dual-edge – Delay

[Figure: surface plots of TGMS delay and SPGFF delay; delay (ns, 0 to 0.45) versus load (# of minimum sized inverters) and clock slope (ns), with the typical region of operation marked.]

Page 23: Low Power Clocking

Characterization Results – zoomed-in- single vs dual-edge – delay

[Figure: SPGFF delay and TGMS delay; delay (ns, 0.14 to 0.21) versus load (2 to 6 min. inverters) and clock slope (0.03 to 0.11 ns).]

Page 24: Low Power Clocking

Characterization Results- single vs dual-edge – Energy delay product

[Figure: SPGFF energy and TGMS energy; energy (pJ, 0.18 to 0.32) versus load (2 to 6 min. inverters) and clock slope (0.03 to 0.11 ns).]

Page 26: Low Power Clocking

Leon SPARC core configuration

Page 27: Low Power Clocking

Leon SPARC synthesis

Synthesized using TSMC 0.18um standard cell library.

Target frequency of 200MHz.

Limit use of single sized D-FF.

Page 28: Low Power Clocking

SET- Synthesis flow

[Flow diagram blocks: RTL of processor (VHDL); Standard cell library; Synthesis (Design Compiler); Netlist (.db); Reports (area, timing); Power Analysis (Power Compiler).]

Page 29: Low Power Clocking

SET-CSE synthesis summary

Area and Power

Cell type                Area (mm2)   % total   Power (mW)   % total
Memory blocks            2.03         55%       214.3        72%
Core                     0.71         19%       73           24%
Clock tree (ideal net)   N/A          N/A       11.6         4%
Total                    3.7                    299

Page 30: Low Power Clocking

Core summary

Core                     Area (mm2)   % total core   Power (mW)
Sequential (1986 CSEs)   0.47         36%            26
Combinatorial + nets     0.24         64%            47
Total                    0.71                        73

Approximately 20k-gates

Page 31: Low Power Clocking

Clock tree loading

Clock tree components           Loading (pF)
Sequential cells (1986 cells)   5.18
Memory macro cells (6)          1.37
Wire routing*                   11.4
Total                           17.94

* - based on library wire-load model
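As a rough cross-check, the tabulated clock load can be turned into switching power with the standard dynamic power expression P = a * C * V^2 * f. The sketch below assumes the 1.8 V nominal supply, the 200 MHz target frequency, and an activity factor of 1 for the clock net; the function name is illustrative.

```python
def dynamic_power_w(cap_f: float, vdd_v: float, freq_hz: float,
                    activity: float = 1.0) -> float:
    """Standard switching-power estimate: activity * C * Vdd^2 * f."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


# Clock tree loading components from the table, in pF.
clock_load_pf = {
    "sequential cells": 5.18,
    "memory macro cells": 1.37,
    "wire routing": 11.4,
}

total_pf = sum(clock_load_pf.values())
p_clk_mw = dynamic_power_w(total_pf * 1e-12, vdd_v=1.8, freq_hz=200e6) * 1e3

print(f"clock load = {total_pf:.2f} pF, clock power = {p_clk_mw:.1f} mW")
```

Under these assumptions the result is close to the 11.6 mW listed for the ideal-net clock tree in the SET-CSE synthesis summary.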

Page 32: Low Power Clocking

Clock tree power estimation

High-fanout nets are beyond the interpolation range of the library's wire-load models.

Wire-load models are not meant for estimating balanced distribution nets such as clock nets.

Using library wire-load models for the clock tree is not valid.

Use an H-tree estimation equation to obtain a ball-park number.

Page 33: Low Power Clocking

H-tree estimation equation

Equation developed by ACSEL lab member Nikola Nedovic.

Recursively calculates H-tree loading for a given area, number of CSEs in design, and number of H-tree levels.

Page 34: Low Power Clocking

H-tree estimation method

[Figure: H-tree clock distribution fed from the PLL over a region of side S; at the leaf level each region has side S/2^(L-1) and holds M/4^(L-1) storage elements.]
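The leaf-level labels in the figure can be evaluated for this design as a quick sanity check. The sketch below assumes S is the side of a square region whose area is the 3.7 mm2 synthesis total, takes M = 1986 CSEs, and uses a hypothetical level count L = 4. It does not reproduce the full loading equation, which is not recoverable from the transcript.

```python
import math


def leaf_level(side_mm: float, num_cses: int, levels: int):
    """Leaf-region side S / 2^(L-1) and storage elements per leaf M / 4^(L-1)."""
    leaf_side_mm = side_mm / 2 ** (levels - 1)
    cses_per_leaf = num_cses / 4 ** (levels - 1)
    return leaf_side_mm, cses_per_leaf


S_MM = math.sqrt(3.7)  # assumption: square region with the total synthesized area
M = 1986               # number of CSEs in the core
L = 4                  # hypothetical number of H-tree levels

leaf_side_mm, cses_per_leaf = leaf_level(S_MM, M, L)
print(f"leaf side = {leaf_side_mm:.3f} mm, CSEs per leaf = {cses_per_leaf:.1f}")
```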

Page 35: Low Power Clocking

H-tree estimation method

* Table taken from Nedovic, Nikola, Ph.D. Dissertation, UCD, “CLOCKED STORAGE ELEMENTS FOR HIGH-PERFORMANCE APPLICATIONS”

Page 36: Low Power Clocking

H-tree estimation method

Equation reduces to:

[Equation with two annotated terms: load due to CSEs, load due to wiring.]

Page 37: Low Power Clocking

Total H-tree power

[Equation with two annotated terms: load switching power, clock driver power.]

Page 38: Low Power Clocking

SET-CSE synthesis summary with H-tree estimate

Area and Power

Cell type                      Area (mm2)   % total   Power (mW)   % total
Memory blocks                  2.03         55%       214.3        66%
Core                           0.71         19%       63           19%
Clock tree (H-tree estimate)   N/A          N/A       48.5         15%
Total                          3.7                    325

Page 39: Low Power Clocking

SET-CSE power profile with H-tree estimate

SET power breakdown

Component            Power (mW)   %
Calculated clk pwr   48.507       15%
Total core power     63           19%
Register file        85.762       26%
Total cache          128.5716     40%

Page 40: Low Power Clocking

SET-CSE Core power profile

SET Core power breakdown

Component            Power (mW)   %
Calculated clk pwr   48.507       44%
Total core power     63           56%

Page 42: Low Power Clocking

Modeling DET-CSEs for Synthesis

Need to model the timing parameters for both edges.

[Timing diagram: system clock, data, and output. SET-CSE: Tsetup, Thold, Tclk->Q. DET-CSE: Ts-r, Th-r, Ts-f, Th-f.]

Page 43: Low Power Clocking

Modeling DET-CSEs for Synthesis

Can model complex timing relationships for synthesis.

[Figure: cell with pins D, CLK, and Q, showing a rising-edge timing arc and a falling-edge timing arc.]

Page 44: Low Power Clocking

Modeling DET-CSEs for Synthesis

Synthesis tool will time, and (try to) meet constraints for the dual-edge triggered synchronous system.

[Figure: CLK and D waveforms.]

Page 45: Low Power Clocking

Modeling DET-CSEs for Synthesis

Synthesis tool will use the worst timing arc relationship for critical path constraint.

[Timing diagram: rising-edge and falling-edge sample windows with rising-to-falling and falling-to-rising paths, one marked critical and the other not critical.]

Page 46: Low Power Clocking

Modeling DET-CSEs for Synthesis

Synthesis tools are not capable of inferring a dual-edge triggered device from HDL code.

For meeting timing we only care about the strictest constraint anyway. (i.e. for one pair of launch and capture edges).

Unnecessary to model complex timing device.

Page 47: Low Power Clocking

Modeling DET-CSEs for Synthesis

Simply model DET-CSE as a SET-CSE with worst-edge timing parameters.

[Timing diagram: system clock, data, and output, annotated with Ts-max, Th-max, and Tclk->Q-max.]
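Collapsing the two per-edge characterizations into one worst-edge model amounts to taking the maximum of each parameter. The following is a minimal sketch of that idea; the `EdgeTiming` type and all numeric values are hypothetical and do not come from the characterized library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeTiming:
    """Timing of one clock edge, in picoseconds."""
    setup_ps: float
    hold_ps: float
    clk_to_q_ps: float


def worst_edge_model(rising: EdgeTiming, falling: EdgeTiming) -> EdgeTiming:
    """Single-edge stand-in using Ts-max, Th-max, and Tclk->Q-max."""
    return EdgeTiming(
        setup_ps=max(rising.setup_ps, falling.setup_ps),
        hold_ps=max(rising.hold_ps, falling.hold_ps),
        clk_to_q_ps=max(rising.clk_to_q_ps, falling.clk_to_q_ps),
    )


# Hypothetical per-edge numbers, for illustration only.
rising = EdgeTiming(setup_ps=-20.0, hold_ps=60.0, clk_to_q_ps=180.0)
falling = EdgeTiming(setup_ps=-5.0, hold_ps=75.0, clk_to_q_ps=170.0)

model = worst_edge_model(rising, falling)
print(model)
```

The resulting model is pessimistic for whichever edge is faster, which is the price of keeping a single-edge description in the library.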

Page 48: Low Power Clocking

Synthesis flow for DET-CSEs

[Flow diagram blocks: RTL of processor (VHDL); Standard cell library; Automated Characterization (perl, hspice); Model of DET-CSE; Synthesis (Design Compiler); Netlist with DET-CSEs (.db); Power Analysis; Timing Analysis.]

Page 49: Low Power Clocking

Synthesis flow for DET-CSEs

Use synthesis directives to force use of DET-CSE modeled device.

Synthesize for target throughput, not frequency.

Worst-case models for meeting critical-path timing constraints.

Generate a worst-case hold model, to verify the race-path.

Fastest clk-Q with worst-case hold time.
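Two of the points above reduce to simple arithmetic: a dual-edge design reaches a given throughput at half the clock frequency, and the race-path check pairs the fastest clk-Q with the worst-case hold time. The sketch below assumes a 50% duty cycle and no clock skew; the function names and picosecond values are hypothetical.

```python
def det_clock_frequency_hz(throughput_hz: float) -> float:
    """A DET-CSE samples on both edges, so the clock runs at half the data rate."""
    return throughput_hz / 2.0


def race_margin_ps(fastest_clk_to_q_ps: float, min_logic_delay_ps: float,
                   worst_hold_ps: float) -> float:
    """Positive margin: the earliest new data arrives after the hold window closes."""
    return fastest_clk_to_q_ps + min_logic_delay_ps - worst_hold_ps


target_throughput_hz = 200e6  # same data rate as the 200 MHz single-edge design
f_clk_hz = det_clock_frequency_hz(target_throughput_hz)

# Time between consecutive sampling edges; this is the critical-path budget.
edge_to_edge_ns = 1e9 / target_throughput_hz

# Hypothetical worst-case hold model: fastest clk-Q, no logic, largest hold time.
margin_ps = race_margin_ps(150.0, 0.0, 75.0)

print(f_clk_hz, edge_to_edge_ns, margin_ps)
```

The critical-path budget stays at 5 ns between sampling edges even though the clock period doubles to 10 ns.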

Page 50: Low Power Clocking

Modeling DET-CSEs for Synthesis

Race-path modeling.

[Timing diagram: rising-edge and falling-edge sample windows with rising-to-falling and falling-to-rising paths.]

May have under-constrained race-path.

Page 51: Low Power Clocking

DET-CSE synthesis summary with H-tree estimate

Area and Power

Cell type                                          Area (mm2)   % total   Power (mW)   % total
Memory blocks                                      2.03         44%       214.3        72%
Core                                               1.65         36%       64           21%
Clock tree (det-cse H-tree estimate) @ new freq.   N/A          N/A       20.2         7%
Total                                              4.64                   298.5

Page 52: Low Power Clocking

DET-CSE power profile

DET power breakdown

Component            Power (mW)   %
Calculated clk pwr   20.2         7%
Total core power     64           21%
Register file        85.762       29%
Total cache          128.5716     43%

Page 53: Low Power Clocking

DET Core summary

Core                     Area (mm2)   % total core   Power (mW)   % total
Sequential (1986 CSEs)   1.41         85.5%          22           34%
Combinatorial + nets     0.24         14.5%          42           66%
Total                    1.65                        64

Approximately 20k-gates (based on nand4)

Page 54: Low Power Clocking

DET-CSE power profile

DET Core power breakdown

Component            Power (mW)   %
Calculated clk pwr   20.2         24%
Total core power     64           76%

Page 56: Low Power Clocking

Issues with DET-CSE integration

Memory blocks are single-edge triggered and must be clocked at twice the core clock rate.

Currently using a dual-edge triggered VHDL behavioral model for memory blocks for netlist simulations.

Possible solutions:

Clock the memory blocks at 2x nominal.

Modify memory address and data latch to be dual-edge triggered.

Page 58: Low Power Clocking

Power Comparison of two design netlists

SPGFF: Core Total = 92.46mW
TGMS: Core Total = 106.8mW

27mW savings
24% power savings in core

SET Core power breakdown

Component            Power (mW)   %
Calculated clk pwr   48.507       44%
Total core power     63           56%
Total = 111mW

DET Core power breakdown

Component            Power (mW)   %
Calculated clk pwr   20.2         24%
Total core power     64           76%
Total = 84.2mW
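The headline savings follow directly from the two core power breakdowns. The short sketch below redoes that arithmetic with the clock and core figures from the pie charts; the variable names are illustrative.

```python
# Core-level power in mW, taken from the SET and DET core power breakdowns.
set_core_mw = {"calculated clk pwr": 48.507, "total core power": 63.0}
det_core_mw = {"calculated clk pwr": 20.2, "total core power": 64.0}

set_total_mw = sum(set_core_mw.values())
det_total_mw = sum(det_core_mw.values())

savings_mw = set_total_mw - det_total_mw
savings_pct = 100.0 * savings_mw / set_total_mw

print(f"SET total = {set_total_mw:.1f} mW, DET total = {det_total_mw:.1f} mW")
print(f"savings = {savings_mw:.1f} mW ({savings_pct:.1f}%)")
```

The totals and the difference agree with the 111 mW, 84.2 mW, 27 mW, and 24% figures quoted on the slide, to within rounding.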

Page 59: Low Power Clocking

Summary of comparison

24% savings in core power.

Estimated 28% increase in sequential cell area (17% increase in core area).

Both meet specified performance @ 200MHz (report zero slack).

Page 61: Low Power Clocking

Summary

Established methods for automated cell characterization.

Developed design flow for DET-CSE integration.

Demonstrated pre-layout results.

Obtained functional DET-CSE netlist.

Investigated functionally enhanced DET-CSEs (scan, reset).

Page 62: Low Power Clocking

Future work

Expand family of DET-CSEs (i.e. sizings, functionalities).

Obtain more accurate clock tree loading.

Perform layout of cells for more accurate comparison.

Page 63: Low Power Clocking

Functionally enhanced Dual-Edge Triggered Flip-Flops

Need to show that functions such as reset, set, and scan can be added to DET-CSEs.

Need to analyze the power and performance impact of the added functionality.

Do DET-CSEs still result in practical power savings?

Page 64: Low Power Clocking

Scan in SPGFF

[Schematic: SPGFF with scan. Inputs D, SD, and SCAN feed both first-stage nodes X and Y; the added devices are labeled Mns0 to Mns5. Clock phases: CLK, CLK1, CLK2, CLK3, CLK4.]

Page 65: Low Power Clocking

Scan in DFF

Functional Schematic of DFF with Scan

Page 66: Low Power Clocking

Clear in SPGFF

[Schematic: SPGFF with clear. A CLR input is added to both first-stage nodes X and Y; the added devices are labeled Mpr0 to Mpr3. Clock phases: CLK, CLK1, CLK2, CLK3, CLK4.]

Page 67: Low Power Clocking

Clear in DFF

Page 68: Low Power Clocking

Preliminary Results of Adding Functionalities

$\mathrm{delay} = \max\left(t_{su,r} + t_{clk\text{-}q,f},\; t_{su,f} + t_{clk\text{-}q,r}\right)$

             Delay           Power          EDP
SPGFF        356 ps          136 μW         1.73e-23 Js
With Scan    371 ps (4.2%)   143 μW (5%)    1.97e-23 Js (14%)
With Reset   407 ps (14%)    140 μW (3%)    2.32e-23 Js (34%)

             Delay           Power          EDP
SETFF        412 ps          82 μW          1.38e-23 Js
With Scan    483 ps (17%)    82 μW (0%)     1.89e-23 Js (37%)
With Reset   483 ps (17%)    71 μW (-13%)   1.65e-23 Js (20%)
