lecture 6 clocked elements - stanford university · 2007-04-23 · m horowitz ee 371 lecture 6 26...

40
EE 371 Lecture 6 M Horowitz 1 Lecture 6 Clocked Elements Computer Systems Laboratory Stanford University [email protected] Copyright © 2006 Mark Horowitz, Ron Ho Some material taken from lecture notes by Vladimir Stojanovic and Ken Mai

Upload: others

Post on 15-Mar-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 1

Lecture 6

Clocked Elements

Computer Systems LaboratoryStanford University

[email protected]

Copyright © 2006 Mark Horowitz, Ron HoSome material taken from lecture notes by Vladimir Stojanovic and Ken Mai

Page 2: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 2

Overview

• Readings (For next lecture on clocking)– Gronowski Alpha clocking paper– Restle Clock grid paper– Harris Variations paper

• Today’s topics– Latches and flops overview– Power and timing metrics– High-performance design and low-energy design– Examples

Page 3: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 3

Why Are Clocked Elements Important?

• A graph from the first lecture, showing cycle time in FO4

• A clock cycle contains less and less time each generation

Cycle in FO4

10

100

85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 00 01 02 03 04 05

intel 386

intel 486

intel pentium

intel pentium 2

intel pentium 3

intel pentium 4

intel i tanium

Alpha 21064

Alpha 21164

Alpha 21264

Spar c

Super Spar c

Spar c64

Mips

HP PA

Power PC

AMD K6

AMD K7

AMD x86-64

Page 4: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 4

We Need Faster Clocked Elements

• Note that the previous graph invites us to extrapolate blindly– “In just a few years, clock frequencies will go to infinity!!”– Reality says clock cycle times will level out

• There is some disagreement on the actual limits– “6-8 FO4 per cycle is optimum” (Hrishikesh, ISCA ’02)– “Yes, but including overhead, more like 18” (Srinivasan, ISM ’02)

• Today: 16 FO4/cycle in Pentium4; 11 FO4/cycle in Sony Cell– If a flop has an overhead of 3FO4, this is 15%-30% overhead– We can fill less of the cycle time with “work”

• Making a faster flop helps performance significantly

Page 5: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 5

We Need Lower Power Clocked Elements

• Recall that chip power density is climbing up– “Rocket nozzle!” “Surface of the sun!” “Oh, my!”– Because power is C*V2f and V is not scaling anymore (V != 10Lgate)

• Part of this power is for clock distribution– Clocking requires 70% of core power in Power4 (2001)– Clocking requires 25% of core power in dual-core Itanium (2005)

• Almost all of this power is in driving the latches/flops at the ends– Only around 20-25% of clock power is in clock transmission– Saving power at the ends is better than saving power in the wires

• Making a more efficient flop helps power significantly

Page 6: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 6

Timing Overhead, Illustrated

• A standard circuit view of a flop system

– Flop timing overhead is the “data-to-output(Q)” delay• Tdata-to-Q = Tsetup + Tclk-to-Q = Tsu + Tcq

• Next look at some basic latch and flop designs and metrics– But first, an analogy…

LogicD Q

Clk

D Q

Clk

Tcycle

Tcq TsuTlogic

Page 7: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 7

An Analogy Of Timing

• Clocked datapaths are like streets with traffic lights– Cars moving down the street are data

• Some cars speed, some cars drive normally, some crawl• Principal rule: cannot have two cars collide into each other

– As a car (data) comes to a light (flop or latch)• It passes through if the light is green and waits if it’s red

• A clocked system has a master controller for the lights– All the lights turn red/green at the same time (as in Manhattan)

• In some systems, the lights are green very briefly (flops)• In other systems, the lights stay green half the time (latches)

– Satisfy principal rule: ensure all cars only pass one block per light

• The point of traffic lights is to slow down fast cars

Page 8: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 8

The Analogy’s Timing Failures

• A traffic light (flop or latch) can fail the principal rule in two ways– A car can be too slow to make it to the next block: Max path

• The car hadn’t reached the intersection when the light turned red– A car can race through more than one block: Min path

• The intersection didn’t stay clear just when the light went red

• Failures often arise because traffic light timing is not perfect– Differences between traffic lights (skew), or local variations (jitter)

• Fixing these failures– Max paths are fixable: Slow down the master controller – Min paths are not: Rebuild the street (or the light control wires)!

• Cost of 3 months fab time and $1M in mask costs

Page 9: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 9

By The Way

• Do you really need traffic lights?– No, if you can guarantee that all cars travel at the same rate– Wave-pipelining ensures all datapaths have the same delay

• Do all the lights need to switch at the same time?– No, and long blocks might benefit from different light timing– Intentional clock skew on chips helps for timing problems

• Do you really need a master controller for all the lights?– No, if you have cars (drivers) negotiate between themselves– Asynchronous circuits handshake between data items

Page 10: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 10

A Note On Terminology

• Avoid saying a latch or flop is “open” or “closed”– There is ambiguity in these terms– Water flows if you “open a valve”– Current stops if you “open a switch”

• The common wording today is a little awkward, but clear– A timing element transfers data when it is transparent– It does not transfer data when it is opaque

• Other choices include “blocking” and “non-blocking”– The oddest I have ever seen was “Permissive” and “Prohibitive”– But don’t use that…

Page 11: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 11

Latches and Flops

• Latch has soft timing: transparent when clock=1

• Flop has hard timing: “transparent” when clock 0 1

D Q

Clk

transparent opaqueClk

D

Q

D Q

sample edge

Clk

D

Q

Page 12: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 12

Latch Timing

• Setup and hold times are defined relative to the clock fall– Setup time: how long before the clock fall must the data arrive– Hold time: how long after the clock fall must the data not change

• Delay depends on arrival time of data relative to clock rise– On early data arrival, delay = Tcq

– On late data arrival, delay = Tdq

D Q

Clk

transparent opaqueClk

D

Q

D

Q

Early

Late

su hold

Page 13: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 13

Flop Timing

• Setup and hold times are defined relative to the clock rise– Setup time: how long before the clock rise must the data arrive– Hold time: how long after the clock rise must the data not change

• Delay is always Tcq, as long as data hits the setup constraint

Clk

D

Q

su hold

D Q

Page 14: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 14

Latch and Flop Timing

• Softness of latch timing edges allows time borrowing– Nominally a latch expects its data when the latch goes transparent– But the latch will accommodate late data

• Until the data runs into the falling edge of the clock (going opaque)– Time-borrowing works backwards (“slack forwarding”) and forwards

• Flops generally do not allow time borrowing

• For some latches and flops, setup time is negative– The data can change just after the latch goes opaque– The latch can still see that data change

Page 15: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 15

Flop Delay

• What’s the delay through a flop?– Data must get to the flop by a setup time Tsu

– Data must traverse the flop and take up Tcq

• So the time available to do logic is what’s left– Tlogic = Tcycle – Tsu – Tcq – Tskew

LogicD Q D Q

Tcycle

Tcq TsuTlogic

Page 16: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 16

0

50

100

150

200

250

300

350

-200 -150 -100 -50 0 50 100 150 200Data-Clk [ps]

Clk

-Out

put [

ps]

Setup Hold

Flop Clk-to-Q Delay Depends On Data-to-Clk

Sampling Window

Source: Stojanovic

Page 17: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 17

Examine The Setup Half Of That Graph

Data-to-clock delay

Dat

a-to

-out

put d

elay

Variable cq regionConstant cq region Failure region

Tcq

Tdq

optimum

45o

Page 18: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 18

What About Power?

• To fairly compare power of various flops or latches, include– External power to drive the clock input– External power to drive the data input– Internal power required to switch interior nodes– Internal power required to drive output load

Q

CLK

D

Qb

VDDVDD

VDD

PD

PCLK PINT

PLOAD

D

CLK

Page 19: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 19

Simplest CMOS Latch

• Basic transparent high latch (Figure 11.2) is simply a passgate

– Very simple and compact– Stores data dynamically – subject to leakage and noise problems

clk

clk_b

dataq_b

Page 20: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 20

Bad Things Can Happen To This Latch

• Various modes of failure, including

Source: Chandrakasan Ch. 11

Page 21: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 21

Buffered Transparent High Latch

• Avoid input noise with a local input inverter

– Still have problems with the storage node

clk

clk_b

dataq

Page 22: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 22

Jam Latch

• Make the storage node static

– Feedback inverter is very weak and loses in a fight– Burns power during the fight until the latch flips– Beware mixed process corners (SF, FS) for overpowering latch!

clk

clk_b

dataq

weak

Page 23: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 23

Tristate Latch

• Prevent the fight by shutting off the feedback device

– Count on constant input drive during latch transparency– Note that input inverter+passgate can be a tristate as well

clk

clk_b

data

clk_b

clk

q

Page 24: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 24

Robust CMOS Latch

• Don’t take the output from the storage node directly

– This is a good, safe latch design– Not the fastest in the world (can trade-off speed for DANGER)– Setup and hold times are close to 0

clk

clk_b

data

clk_b

clk

q

Page 25: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 25

Flip-Flops Come In Several Flavors

• You can make a flip-flop out of two back-to-back latches

• You can make it out of a edge-triggered element, plus SR latch

D QClk

D QClk

clk_b

M-SFlop

EdgeFlop

R Q

clk

SPulseGen

Page 26: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 26

Flip-Flops Come In Several Flavors, con’t

• You can also make a flip-flop out of a pulsed (glitch) latch

• All three flavors are (almost) functionally indistinguishable – Although they may react differently to clock skew– Will look at this in a later lecture

D QClk

clk

GlitchLatch

pulsegen

Page 27: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 27

Flavor 1: Simplest Flop Is Master-Slave

• World’s first LSI calculator chip (Tokyo Shibaura Electric)– Real “old-school” but it’s a master-slave– All tristates instead of passgates

Source: Suzuki, JSSC 1973

D Q

Clk1

Clk

Clk

Clk1

Clk

Clk1

Clk

QMClk

Clk1

Clk1

Clk

Page 28: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 28

Flavor 1: Another Master-Slave

• PowerPC 603 flop – Note this is a negative edge-triggered flop (clks are reversed)– Faster than C2MOS, but at worse input noise (no tristate)– Master clk usually generated from slave (why not vice versa?)

Vdd Vdd

Clk

QClk Clkb

Clkb

D

Source: Gerosa, JSSC 1994

Page 29: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 29

Flavor 1: Yet Another Master-Slave (With Scan)

• IBM Power4 latch with scan-ability (very common today)

Source: Warnock, IBM JRD, 2002

Page 30: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 30

Flavor 2: True Edge-Triggered Flops

• Dec 21264 Alpha flop (Madden & Bowhill, ’90, Matsui ’94)– A sense-amp (pulse generator) followed by a capturing RS latch– Negative setup time available

Clk

D D

RS

QQD=1

D=0

pulse

Pulsegenerator

CapturingLatch

Why?

delays not equal Source: Madden & Bowhill, JSSC, 1990

Page 31: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 31

(Flavor 2) Improved “Strong-Arm Flop”

• Faster S-R stage – Sized for improved switching – Bigger drivers and smaller keepers

• Symmetric Q and Q_b delays

• Reduce the hysterisis of the latch– Slam the node high/low strongly– Then in precharge, hold it weakly– Makes it faster, but at what cost?

• Coupling immunity• Yet another application of NFL

– “No free lunch”

Source: Nikolic & Stojanovic, ISSCC ’99

Page 32: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 32

Flavor 3: Glitch Latch or Pulse Latch

• Just like a standard latch, only with a pulsed (glitched) clock– Looks and smells like a flip-flop– Pulse gives you negative setup time and a soft edge (latch-like)

• Generating the pulsed clock can be tricky over process corners– A “clock chopper” or “single-shot”– Do this locally at the pulse latch for each latch or small group

• Distributing this pulse is dangerous: pulses disappear– So often combine this functionality into the latch itself

clk pulse

Page 33: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 33

(Flavor 3) Glitch Latch

• AMD K6 latch, called a “Hybrid Latch Flip-Flop” (i.e., glitch latch)

Vdd

D

Clk

Q

Q

D=1

D=0

signal atnode X Second

Stage LatchPulse

Generator

D=1

D=0

Source: Partovi ISSCC ’96

Page 34: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 34

(Flavor 3) Abstraction of the HLFF

D=1

D=0

signal atnode X

SecondStage Latch

PulseGenerator

D=1

Clk

D

D=0

Enable

Q

Page 35: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 35

(Flavor 3) A “Semi-Dynamic” Flip-Flop

• Sun UltrasparcIII pulsed latch

– Dynamic pulse generator, unlike the NAND3 in the HLFF– Beware the 1-1 glitch

Q

Clk

Clk1

S

I

D

Clk

Clk Source: Klass VLSI ’98

Page 36: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 36

Delay Comparisons

• Driving a load of 14 inverters, in a 0.18μm technology

Min D-Q Delay Comparison

0.00.51.01.52.02.53.03.54.04.55.0

MSL C2MOS HLFF SDFF SAFF M-SAFF

Del

ay [F

O4]

Pulse latches are faster

Source: Stojanovic

Page 37: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 37

Energy Comparisons

• Driving a load of 14 inverters, in a 0.18μm technology

Energy breakdown (50% activity)

0

20

40

60

80

100

120

MSL C2MOS HLFF SDFF SAFF M-SAFF

Ener

gy [f

J]

Ext. clock Ext. data Int. clockInternal non-clk

Latches are lower energy

Source: Stojanovic

Page 38: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 38

What About Imperfect Clocks?

• Clocks do not always arrive “on time”– Clocks to different clocked elements arrive at different times: SKEW– One clock will arrive at different times from cycle to cycle: JITTER

• Clock performance is a function of the distribution grid (later)

clk1

clk2

tskewt+jit

t-jit

Page 39: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 39

Do Clocked Elements Have Transparency?

200

220

240

260

280

300

-30 -20 -10 0 10 20 30 40 50 60

Clk arrival time [ps]

D-Q

del

ay [p

s]

tCU

DDQm

DDQM

NominalClk

Clk

• Change in D-Q delay < clock uncertainty– The clocked element absorbs some of the clock uncertainty

• There is a range of clk arrivals for which D-Q is constant– For that range the clocked element looks like a combinational gate

Page 40: Lecture 6 Clocked Elements - Stanford University · 2007-04-23 · M Horowitz EE 371 Lecture 6 26 Flip-Flops Come In Several Flavors, con’t • You can also make a flip-flop out

EE 371 Lecture 6M Horowitz 40

“Clock Uncertainty Absorption”

• Helps to mitigate the effects of clock skew (HLFF shown below)

• More skew-tolerant circuits will be discussed in a later lecture