The University of Texas at Austin, EE 382M Class Notes
Gian Gerosa, Intel, Fall 2008
EE-382M
VLSI–II
FLIP-FLOPS
OUTLINE
• Trends
• LATCH Operation
• FLOP Timing Diagrams & Characterization
• Transfer-Gate Master-Slave FLIP-FLOP
• Merged Functions
• Clock Skew
• Other Topologies
• SCAN
• References
• Homework Discussion
Where are we going?
• Trends in high-performance systems
– Higher clock frequency leads to …..
– Deeper pipelines or more parallelism leads to ….
– More transistors which leads to ….
– More sequentials (FLOP or LATCH) which leads to ….
• Consequences
– Increased flip-flop overhead
    • Cycle time in 12-15 stage pipeline microarchitectures: ~22 FO4 delays
    • FLOP overhead: ~3 FO4 delays (D-Q delay), ~14% of the cycle
– Clock uncertainty (jitter & skew) also affects cycle time
– Clock power
Why work on Sequentials ?
In a 3.3 GHz processor (90 nm CMOS), cycle time ≈ 300 ps.
- Typical D-Q delay is ~90 ps.
- If one can design a faster sequential, say with a D-Q delay of ~60 ps, the 30 ps saved represents a ~10% processor performance improvement.
- If in addition one can absorb 15 ps of clock uncertainty and/or embed one level of logic, this yields an additional 5-10% processor performance improvement.
- Attaining a 10-20% performance improvement via architecture enhancements is very expensive (area, power, complexity, etc.)!
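The percentages above can be checked with a few lines of arithmetic. This is only a sketch: it assumes that every picosecond removed from the sequential overhead shortens the cycle one-for-one, and the function and variable names are illustrative.

```python
def cycle_fraction_saved(cycle_ps, saved_ps):
    """Fraction of the clock cycle recovered when saved_ps is removed from it."""
    return saved_ps / cycle_ps

CYCLE_PS = 300.0                      # ~3.3 GHz cycle time from the slide

# D-Q delay improved from ~90 ps to ~60 ps
dq_gain = cycle_fraction_saved(CYCLE_PS, 90.0 - 60.0)

# Absorbing 15 ps of clock uncertainty
uncertainty_gain = cycle_fraction_saved(CYCLE_PS, 15.0)

print(f"D-Q improvement:        {dq_gain:.1%} of the cycle")
print(f"Uncertainty absorption: {uncertainty_gain:.1%} of the cycle")
```

Embedding one level of logic in the sequential would add to the second figure, which is how the slide arrives at a 5-10% range.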
Basic LATCH Operation
[Figure: transparent-low and transparent-high latch symbols (Din, Dout, clock), with clock, Din, and Dout waveforms showing the alternating transparent and opaque phases and the timing parameters Tsu, Th, and Tdq.]
Difference between a LATCH and a FLOP
F-F (inputs Data, Clock; output Q): Edge triggered. Q only changes at the rising edge of the clock.
Latch (inputs Data, Clock; output Q): Transparent / Opaque. Q follows the input Data while the latch is transparent.
[Figure: Clock, Data, and Q waveforms for the F-F and for the latch.]
Building a FLOP with Two Latches
[Figure: schematic of a FLOP built from two latches; signals clock, Din, Dout.]
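A zero-delay behavioral model can make the two-latch construction concrete. The sketch below assumes a transparent-low master feeding a transparent-high slave (giving a rising-edge FLOP); the class names are illustrative, and no setup, hold, or propagation delays are modeled.

```python
class Latch:
    """Level-sensitive latch: Q follows D while the clock is at the transparent level."""

    def __init__(self, transparent_when):
        self.transparent_when = transparent_when  # clock level (0 or 1) that opens the latch
        self.q = 0

    def update(self, clock, d):
        if clock == self.transparent_when:
            self.q = d          # transparent: output follows input
        return self.q           # opaque: output holds its last value


class MasterSlaveFlop:
    """Transparent-low master followed by a transparent-high slave."""

    def __init__(self):
        self.master = Latch(transparent_when=0)
        self.slave = Latch(transparent_when=1)

    def update(self, clock, d):
        m = self.master.update(clock, d)
        return self.slave.update(clock, m)


# (clock, Din) samples; Din toggles both while the clock is high and while it is low.
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0), (1, 0)]

latch = Latch(transparent_when=1)
flop = MasterSlaveFlop()
latch_q = [latch.update(c, d) for c, d in stimulus]
flop_q = [flop.update(c, d) for c, d in stimulus]

print("latch Q:", latch_q)
print("flop  Q:", flop_q)
```

In this model the latch output tracks Din whenever the clock is high, while the FLOP output changes only on samples where the clock has just gone from 0 to 1.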
FLOP Delay
• Sum of setup time and Clk-output delay is the only true
measure of the performance with respect to the system
speed (MAXDELAY)
• Tcycle = Tcq + Tlogic + Tsu + Tskew
• Tlogic contains interconnect delay
[Figure: two FLOPs (D, Q, CLK) with a logic block between them, annotated with Tcq, Tlogic, Tsu, and N loads.]
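The relation Tcycle = Tcq + Tlogic + Tsu + Tskew translates directly into a maximum clock frequency. The sketch below uses made-up delay values purely for illustration; the function names are not from the notes.

```python
def min_cycle_ps(tcq, tlogic, tsu, tskew=0):
    """Smallest cycle time (ps) that satisfies the MAXDELAY relation."""
    return tcq + tlogic + tsu + tskew


def fmax_ghz(tcycle_ps):
    """Convert a cycle time in picoseconds to a frequency in GHz."""
    return 1000.0 / tcycle_ps


# Illustrative (assumed) numbers, all in picoseconds.
tcycle = min_cycle_ps(tcq=65, tlogic=185, tsu=35, tskew=15)
sequential_overhead = (65 + 35) / tcycle   # share of the cycle spent in Tcq + Tsu

print(f"Tcycle = {tcycle} ps, Fmax = {fmax_ghz(tcycle):.2f} GHz")
print(f"Tcq + Tsu take {sequential_overhead:.0%} of the cycle")
```

Because Tcq and Tsu enter the sum together, only their total matters for system speed, which is the point of the first bullet on this slide.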
FLOP Timing Diagrams
Tsu : input setup time
Thold : input hold time
Tcq : clock to out
Tdata to out = Tsu + Tcq
[Figure: simulated Din, Dout, and Clock waveforms (volts vs. picoseconds, 100 to 1000 ps), annotated with Tsu, Thold, and Tcq.]
Functional Pass/Failure vs. Tsu and Th
[Figure: clock and master internal node waveforms showing pass and fail cases, one panel for input setup time and one for input hold time.]
FLOP Characterization
Tsu : input setup time
Thold : input hold time
Tcq : clock to out
Tdata to out = Tsu + Tcq
[Figure: Tcq and Tdata to out (picoseconds) plotted against Data to Clock time (picoseconds, -250 to 250), marking Tsu, Thold, minimum Tcq, and Tcq 10%.]
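The characterization plot marks a minimum Tcq and a 10% Tcq point. One common way to turn such a sweep into a setup number is to take Tsu as the smallest data-to-clock lead at which Tcq is still within 10% of its minimum. The sketch below assumes that criterion, assumes Tcq grows monotonically as Din moves toward the clock edge, and uses invented sweep data; none of the names or numbers come from the notes.

```python
def setup_at_pushout(samples, pushout=0.10):
    """samples: list of (lead_ps, tcq_ps), where lead_ps is how long before the
    clock edge Din switched and tcq_ps is None if the FLOP failed to capture.
    Returns (tsu_ps, tcq_at_tsu_ps, tcq_min_ps)."""
    passing = [(lead, tcq) for lead, tcq in samples if tcq is not None]
    tcq_min = min(tcq for _, tcq in passing)
    limit = tcq_min * (1.0 + pushout)
    # Smallest lead whose Tcq is still within the allowed push-out.
    tsu = min(lead for lead, tcq in passing if tcq <= limit)
    tcq_at_tsu = dict(passing)[tsu]
    return tsu, tcq_at_tsu, tcq_min


# Hypothetical sweep: Din moved progressively closer to the clock edge.
sweep = [(200, 60.0), (100, 60.0), (50, 61.0), (40, 63.0),
         (30, 65.0), (20, 72.0), (10, 90.0), (0, None)]

tsu, tcq, tcq_min = setup_at_pushout(sweep)
print(f"Tsu = {tsu} ps, Tcq = {tcq} ps, Tdata to out = {tsu + tcq} ps")
```

The same sweep run with Din switching after the clock edge would yield Thold in the same way.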
MAXDELAY
[Figure: two FLOPs (D, Q, CLK) with a logic block between them; nodes D1, Q1, D1', Q1'. Waveforms of CLK, D1, Q1, D1', and Q1' are annotated with Tsu, Tcq, Tlogic, and Tcycle.]
Tcycle > Tlogic + Tcq + Tsu
or
Tlogic < Tcycle – (Tcq + Tsu)
MAXDELAY with Clock Skew
[Figure: two FLOPs (D, Q) clocked by CLK and CLK' with a logic block between them; nodes D1, Q1, D1', Q1'. Waveforms of CLK, CLK', D1, Q1, D1', and Q1' are annotated with Tskew, Tsu, Tcq, Tlogic', and Tcycle.]
Tlogic < Tcycle – (Tcq + Tsu + Tskew)
MINDELAY
[Figure: two FLOPs (D, Q, CLK) with a logic block between them; nodes D1, Q1, D1', Q1'. Waveforms of CLK, D1, Q1, D1', and Q1' are annotated with Tsu, Tcq, Thold, Tlogic, and Tcycle.]
Tlogic > Thold – Tcq + Tskew
DESIGN WINDOW
Thold – Tcq + Tskew < Tlogic
and
Tlogic < Tcycle – (Tcq + Tsu + Tskew)
If Tcq > Thold + Tskew, then MINDELAY hazard is removed since Tlogic >= 0 always.
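The two inequalities bound Tlogic from below and above, so a path can be checked against both at once. The sketch below follows the slide's symbols; the numeric values are invented for illustration, and a real analysis would use the appropriate best-case or worst-case corner for each term.

```python
def design_window(tcycle, tcq, tsu, thold, tskew):
    """Return (lower, upper) bounds on Tlogic, in the same unit as the inputs."""
    lower = thold - tcq + tskew            # MINDELAY bound
    upper = tcycle - (tcq + tsu + tskew)   # MAXDELAY bound
    return lower, upper


def check_path(tlogic, window):
    """Classify a logic delay against the design window."""
    lower, upper = window
    if tlogic <= lower:
        return "MINDELAY violation"
    if tlogic >= upper:
        return "MAXDELAY violation"
    return "ok"


# Case 1: Tcq > Thold + Tskew, so the lower bound is negative and even
# back-to-back FLOPs (Tlogic = 0) are safe.
window_a = design_window(tcycle=300, tcq=65, tsu=35, thold=20, tskew=15)

# Case 2: a long hold time and a fast Tcq make short paths hazardous.
window_b = design_window(tcycle=300, tcq=40, tsu=35, thold=80, tskew=15)

print(window_a, check_path(0, window_a))
print(window_b, check_path(30, window_b), check_path(250, window_b))
```

Note that skew tightens both ends of the window: it raises the lower bound and lowers the upper bound.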
T-G Master-Slave FLOP (buffered, non-inverting)
[Figure: schematic; signals clock, Din, Dout.]
Non time-borrowing
Time borrowing keeps the MASTER open longer by ~2 inverter delays; need to be careful about MINDELAYS
Isolates SLAVE latch timing optimization/sensitivities from output load.
TIMING:
Tsu ~ 1 TG + 2 inverters
Th ~ 1 inverter
Tcq ~ 1 TG + 1 inverter
Merged Function inverting FLOP
[Figure: schematic; inputs A and B, clock, output Dout.]
RESETTABLE Master-Slave FLOP (asynchronous)
[Figure: schematic; signals clock, Din, Rb, Dout.]
Clock Skew Impact to Fmax
Tcycle = Tcq + Tlogic + Tsu + Tclock_uncertainty
Tclock_uncertainty = clock skew + clock jitter
clock skew = τ1 – τ2 – τ3
[Figure: a GLOBAL clock feeding local clock buffers (LCB), which drive the master and slave clocks of two FLOPs on the path from Din to Dout; delays τ1, τ2, and τ3 are marked on the clock paths.]
Other Circuit Topologies for M-S FLOPS
• C2MOS
• Hybrid Latch Flip-Flop (HLFF)
• Pulse Latch
• In Backup:
– True Single-Phase Clock FLOP
– K-6 Dual-Rail ETL
– Semi-Dynamic Flip-Flop (SDFF)
C2MOS FLOPS
[Figure: C2MOS master and slave stages clocked by CLK and CLKB (generated from clk); input Din, output Q.]
• Robustness to clock slope
• Low power feedback
• Poor driving capability
Hybrid Latch Flip-Flop (HLFF) (AMD K-6, Partovi, ISSCC 1996)
[Figure: schematic; signals Clk, Dclk_, Din, internal node N, Dout.]
Hybrid Latch Flip-Flop (HLFF) waveforms
[Figure: waveforms of Clk, Dclk_, N, Din, and Dout, with the valid intervals marked.]
TIMING:
Sampling Window ~ 3 inverters
Tsu ~ 0 to slightly negative
Th > sampling window
Tcq ~ 2 inverters
Pulse Latch
[Figure: schematic; signals Din, Clock, pclk, Dout; delay element τ.]
Pulse Latch Waveforms
[Figure: waveforms of Clock, Pclk, Din, and Dout, with the valid intervals marked.]
TIMING:
Sampling Window ~ NAND + τ
Tsu ~ 0 to slightly negative
Th > sampling window
Tcq ~ 2 inverters
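How a delay τ sets the pulse width can be illustrated with a sampled-time model. The sketch below assumes a conventional pulse generator in which pclk is high while Clock is high and a τ-delayed copy of Clock has not yet risen; that structure is an assumption rather than a transcription of the schematic, and the gate delay of the NAND itself is ignored.

```python
def pulse_clock(clock, tau_steps):
    """Derive pclk from sampled clock values (one sample per time step)."""
    pclk = []
    for i, level in enumerate(clock):
        # tau-delayed copy of the clock; assume the clock was low before sample 0.
        delayed = clock[i - tau_steps] if i >= tau_steps else 0
        # pclk = clock AND NOT(delayed clock)
        pclk.append(level & (1 - delayed))
    return pclk


clock = [0] * 10 + [1] * 20 + [0] * 10   # a single clock-high phase
pclk = pulse_clock(clock, tau_steps=3)

pulse_start = pclk.index(1)
pulse_width = sum(pclk)
print(f"pclk rises at step {pulse_start} and stays high for {pulse_width} steps")
```

In this model the latch is transparent only for τ after the rising clock edge, regardless of how long the clock stays high, which is why the sampling window tracks τ.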
FLOP with SCAN
[Figure: FUNCTIONAL FLOP (Din, Dout, clock) with an attached SCAN GADGET (Scan_in, Scan_out) clocked by ACLK/ACLKB and BCLK.]
A Typical Scan Path
[Figure: scan path from SI to SO through sequential elements with data ports DI and DO, clocked by CLK, CLK#, CLK#_P, ACLK, and BCLK, with Store_en controls. Element types shown:]
• Hold_scan FLOPS (non-destructive scan)
• scannable FLOPS
• scannable Latches
QUICK AREA and TIMING budgets in 130nm
Inverting FLIP-FLOP:
Area ~ 60 μm2
Tsu ~ 35 ps
Tcq ~ 65 ps
Total FLOP timing overhead ~ 100ps
Scan Gadget area ~ 35 μm2
TOTAL scan inverting FLOP ~ 95 μm2
This layout does not include scan.
QUICK AREA and TIMING budgets in 65nm
Inverting FLIP-FLOP:
Area ~ 15 μm2
Tsu ~ ? ps
Tcq ~ ? ps
Total FLOP timing overhead ~ ? ps
Scan Gadget area ~ 9 μm2
TOTAL scan inverting FLOP ~ 24 μm2
[Figure: layout sketch showing the input and output stages and the rest of the FLOP, with labeled dimensions 0.45 μm, 0.25 μm, 0.90 μm, and 0.50 μm.]
Design Goals
Characterization:
• Use worst case Tcq + Tsu for MAXDELAY analysis.
• Use worst case Thold for MINDELAY analysis.
• Take into account all sources of power dissipation
Target:
• Small clock load
• Shortest Din to Dout direct path
• Low-power feedback
• Simultaneously optimize both master and slave latches
• High driving capability
• Optimize speed * power product
while:
• Minimizing Tsu + Thold (smallest sampling window)
• Reducing sensitivity to clock slew rate and skew
• Not allowing floating nodes
References
1. A. Chandrakasan, W.J. Bowhill, F. Fox, Design of High-Performance Microprocessor Circuits, IEEE Press, New York, 2001. Chapter 11 “Clocked Storage Elements” by Hamid Partovi, pages 207-234.
2. V. G. Oklobdzija, The Computer Engineering Handbook, CRC Press, Boca Raton, Florida, 2002. Chapter 10.2 “Latches and Flip-Flops” by Fabian Klass, pages 10.34-10.69.
3. R. J. Baker, H.W. Li, D.E. Boyce, CMOS Circuit Design, Layout, and Simulation, IEEE Press, New York, 1998. Chapter 13, pages 255-274.
4. V. G. Oklobdzija et al., Digital System Clocking: High-Performance and Low-Power Aspects, Wiley-IEEE Press, 264 pages, 2003.
Reference 2 has a very nice treatment of FLOPS/LATCHES, MIN/MAXDELAY, SKEW, etc., with plenty of timing diagrams.
BACKUP
Transfer-Gate (T-G) Master-Slave FLOP
• Low power feedback
• Un-buffered inputs
– input capacitance depends on the phase of the clock
– over-shoot and under-shoot with long routes
– Wire length must be restricted at the input
• Buffered input addresses above issues
• Low power
• Small clk-output delay, but positive setup
• Easily embedded scan, mux, other simple functions
Hybrid Latch Flip-Flop Highlights
• Flip-flop features:
– single phase clock
– edge triggered, on one clock edge
• Latch features: Soft clock edge property
– brief transparency, equal to 3 inverter delays
– negative setup time
– allows slack passing
– absorbs skew
– minimum delay between flip-flops must be controlled
• Fully static
• Possible to incorporate logic
ATPG Sequence Timing
[Figure: ATPG timing for a scan path (DI, SI, DO, SO) showing GCLK, CLK, CLK#, ACLK, BCLK, STORE_EN, and SHIFT_EN over the 1st and 2nd system cycles, for DC stuck-at and transition fault testing. Marked events: 1st capture in slave, 2nd launch, capture at speed in master, capture at speed in slave, and observe master (BCLK).]
Aclk, Bclk Freq = 1/16 GCLK
Merged Function MUX-FLOP
[Figure: schematic; data inputs A, B, C, and D with selects SelA_, SelB_, SelC_, and SelD_; clock; output Dout.]
Another RESETTABLE Master-Slave FLOP (synchronous)
[Figure: schematic; signals clock, Din, Rb, Dout.]
True Single-Phase Clock (TSPC) FLOP
[Figure: TSPC schematic with MASTER, PRE-CHARGE, and SLAVE stages; signals Din, clock, internal nodes X and Y, Dout.]
Clock power is low; no local inversion required.
True Single-Phase Clock FLOP Waveforms
TIMING:
Tsu ~ 2 inverters
Th ~ 2 inverters
Tcq ~ 3 inverters
[Figure: waveforms of clock, Din, X, Y, and Dout, with the valid intervals marked.]
Semi-Dynamic Flip-Flop (SDFF)
• Soft edge conditioned by data since first stage is pre-charged - cross-coupled latch is added for robustness
• Small penalty for adding logic
• Latch has one transistor less in stack - faster than HLFF, but 1-1 glitch exists
[Figure: schematic; signal Dclk and internal nodes N and K.]
Semi-Dynamic Flip-Flop Waveforms
TIMING:
Sampling Window ~ 2 inverters + 1 NAND
Tsu ~ 0 to slightly negative
Th > sampling window
Tcq ~ 2 inverters
[Figure: waveforms of Clk, Dclk, N, K, D, and Q, with the valid intervals marked.]
K-6 Dual-Rail ETL
[Figure: schematic; signals Dclk_, A, B, and Pch. Callout: determines A, B, Q, and Q_ pulse widths.]
K-6 Dual-Rail Waveforms
[Figure: waveforms of Clk, Dclk_, A, B, Pch, D, Q, and Q_, with the valid intervals marked; the interval T is determined by 4 inversions.]
TIMING:
Sampling Window ~ 3 inverters
Tsu ~ 0 to slightly negative
Th > sampling window
Tcq ~ 2 inverters
HMK#3 Problem 1.
For both Din transitions (0->1 and 1->0), determine the input setup time Tsu, input hold time Thold, and clock to out delay Tcq for the following 4 FLIP-FLOPS (a, b, c, d). Use a 70 ps slew rate (full rail) for Din and clock; use the 130 nm CMOS transistor models. These designs all drive a 4.2/2.1 inverter.
Show ALL your work; also answer the following questions pertaining to each design:
a. List 3 deficiencies with this design. Hint: look at designs b and c. Will this design work for a cycle time of 450 ps? Why or why not?
b. Is the Din input capacitance lower than in design a? What about the clock capacitance?
c. What are the benefits of placing the slave latch off to the side? Is this a time-borrowing FLOP? Is the clock capacitance lower than in design b? Is there any benefit in clocking the master LATCH feedback?
d. This design is a pulsed LATCH. Describe its behavior with timing diagrams. Compared to a traditional FLIP-FLOP scheme, list ONE advantage and ONE disadvantage.
Simulation Tips:
• Use HSPICE ic statements to properly initialize these sequential circuits.
Homework # 3, Problem #1 FLOP design A
[Figure: test bench (clock, Din, Out) with a 4.2/2.1 inverter and an 18.0 annotation at Out, and the transistor-level schematic of design A annotated with device sizes; signals clock, din, dout.]
Homework # 3, Problem #1 FLOP design B
[Figure: test bench (clock, Din, Out) with a 4.2/2.1 inverter and an 18 annotation at Out, and the transistor-level schematic of design B annotated with device sizes; signals clock, din, Dout_b.]
Homework # 4, Problem #2 FLOP design C
[Figure: test bench (clock, Din, Out) with a 4.2/2.1 inverter and an 18.0 annotation at Out, and the transistor-level schematic of design C annotated with device sizes; signals clock, din, Dout_b.]
Homework # 4, Problem #2 FLOP design D
[Figure: test bench (clock, Din, Out) with a 4.2/2.1 inverter and an 18.0 annotation at Out, and the transistor-level schematic of design D annotated with device sizes; signals clock, pclk, din, Dout.]