clockless logic: asynchronous pipelines

MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines
Singh and Nowick, Intl. Conf. on Computer Design (ICCD), September 2001

Page 1: Clockless Logic:   Asynchronous Pipelines

Clockless Logic: Asynchronous Pipelines

MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines

Singh and Nowick, Intl. Conf. on Computer Design (ICCD), September 2001

Page 2: Clockless Logic:   Asynchronous Pipelines

MOUSETRAP Pipelines

Simple asynchronous implementation style, uses…
- transparent latches
- simple control: 1 gate/pipeline stage

Target datapath = static logic blocks

"MOUSETRAP": uses a "capture protocol"

Latches…
- are normally transparent: before new data arrives
- become opaque: after data arrives ("capture" data)

Control Signaling: transition-signaling = 2-phase
- simple protocol: req/ack = only 2 events per handshake (not 4)
- no "return-to-zero"
- each transition (up/down) signals a distinct operation

Our Goal: very fast cycle time
- simple inter-stage communication
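To make the "2 events per handshake (not 4)" point concrete, here is a minimal Python sketch that counts signal events for both styles. The function names and the strict req-then-ack ordering are illustrative assumptions, not something taken from the slides.

```python
def two_phase_events(n_items):
    """Transition signaling: each data item toggles req once and ack once."""
    req = ack = 0
    events = []
    for _ in range(n_items):
        req ^= 1                      # sender toggles req: new data available
        events.append(("req", req))
        ack ^= 1                      # receiver toggles ack: data captured
        events.append(("ack", ack))
    return events


def four_phase_events(n_items):
    """Return-to-zero signaling: req and ack each rise, then fall, per item."""
    events = []
    for _ in range(n_items):
        events += [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]
    return events


if __name__ == "__main__":
    print(len(two_phase_events(3)), "events for 3 items with 2-phase signaling")
    print(len(four_phase_events(3)), "events for 3 items with 4-phase signaling")
```

In the 2-phase trace a rising and a falling edge on req each announce a different data item, which is what "each transition signals a distinct operation" refers to.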

Page 3: Clockless Logic:   Asynchronous Pipelines

MOUSETRAP: A Basic FIFO

Stages communicate using transition-signaling: 1 transition per data item!

[Figure: three FIFO stages (Stage N-1, Stage N, Stage N+1), each with a Data Latch and a Latch Controller driving the latch enable En; signals reqN, ackN-1, reqN+1, ackN, doneN; Data in and Data out. Animation steps: 1st data item flowing through the pipeline; 2nd data item flowing through the pipeline.]

Page 4: Clockless Logic:   Asynchronous Pipelines

MOUSETRAP: A Basic FIFO (contd.)

Latch controller (XNOR) acts as "phase converter":
- 2 distinct transitions (up or down) produce a pulsed latch enable
- 2 transitions per latch cycle

[Figure: three FIFO stages (Stage N-1, Stage N, Stage N+1), each with a Data Latch and a Latch Controller driving the latch enable En; signals reqN, ackN-1, reqN+1, ackN, doneN; Data in and Data out.]

Latch is disabled when current stage is "done"
Latch is re-enabled when next stage is "done"
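The phase-conversion behavior can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions: `done` and `ack` are hypothetical names for the two XNOR inputs (the stage's own "done" and the "done" of the next stage), both are assumed to start at 0, and all delays are ignored.

```python
def xnor(a, b):
    """Output is 1 when both inputs are at the same level."""
    return 1 if a == b else 0


def latch_enable_trace(n_items):
    """Return the sequence of En values as n_items pass through one stage."""
    done = ack = 0                      # assumed initial state: pipeline empty
    trace = [xnor(done, ack)]           # En = 1: latch starts transparent
    for _ in range(n_items):
        done ^= 1                       # current stage is "done" (one transition)
        trace.append(xnor(done, ack))   # inputs differ, En = 0: latch opaque
        ack ^= 1                        # next stage is "done" (one transition)
        trace.append(xnor(done, ack))   # inputs match, En = 1: transparent again
    return trace


if __name__ == "__main__":
    print(latch_enable_trace(2))
```

Each data item contributes one transition on each input, of either polarity, yet En always shows the same disable-then-enable pulse.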

Page 5: Clockless Logic:   Asynchronous Pipelines

MOUSETRAP: FIFO Cycle Time

Cycle Time = $2\,T_{LATCH} + T_{XNOR}$

[Figure: three FIFO stages (Stage N-1, Stage N, Stage N+1), each with a Data Latch and a Latch Controller driving the latch enable En; signals reqN, ackN-1, reqN+1, ackN, doneN; Data in and Data out. Numbered annotations:
 1: N computes
 2: N+1 computes
 2: Fast self-loop: N disables itself
 3: N re-enabled to compute]
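The cycle-time expression can be read off the three annotated steps. The following is a sketch of that accounting, assuming that "computes" in a FIFO stage means data passing through that stage's latch and that the re-enable costs one XNOR delay:

```latex
\begin{align*}
T_{cycle} &= \underbrace{T_{LATCH}}_{\text{N computes}}
           + \underbrace{T_{LATCH}}_{\text{N+1 computes}}
           + \underbrace{T_{XNOR}}_{\text{N re-enabled}} \\
          &= 2\,T_{LATCH} + T_{XNOR}
\end{align*}
```

The fast self-loop (N disables itself) runs in parallel with the computation of N+1, so it does not appear in the sum.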

Page 6: Clockless Logic:   Asynchronous Pipelines

Detailed Controller Operation

One pulse per data item flowing through:
- down transition: caused by "done" of N
- up transition: caused by "done" of N+1

No minimum pulse width constraint!
- simply, down transition should start "early enough"
- can be "negative width" (no pulse!)

[Figure: Stage N's Latch Controller; inputs: done from N, ack from N+1; output: to Latch]

Page 7: Clockless Logic:   Asynchronous Pipelines

MOUSETRAP: Pipeline With Logic

Simple Extension to FIFO:
- insert logic block + matching delay in each stage

Logic Blocks: can use standard single-rail (non-hazard-free)

"Bundled Data" Requirement:
- each "req" must arrive after data inputs valid and stable

[Figure: Stage N-1, Stage N, Stage N+1; each stage has a Data Latch, a Latch Controller, a logic block, and a delay element on the request path; signals reqN, ackN-1, reqN+1, ackN, doneN]
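One way to picture the bundled-data requirement is as a check that the matching delay covers the slowest path through the logic block. The sketch below is a simplification under stated assumptions: data and req are taken to leave the latch at the same instant, the function names are hypothetical, and all delay values are made-up illustration numbers, not measurements.

```python
def required_matching_delay(path_delays_ps, margin_ps=0.0):
    """Smallest matching delay that keeps req behind the slowest data path."""
    return max(path_delays_ps) + margin_ps


def bundling_holds(path_delays_ps, matching_delay_ps, margin_ps=0.0):
    """True if req arrives only after all data outputs are valid and stable."""
    return matching_delay_ps >= required_matching_delay(path_delays_ps, margin_ps)


if __name__ == "__main__":
    # Hypothetical logic-block path delays in picoseconds.
    paths = [120.0, 310.0, 275.0]
    print(required_matching_delay(paths, margin_ps=20.0))
    print(bundling_holds(paths, matching_delay_ps=350.0, margin_ps=20.0))
    print(bundling_holds(paths, matching_delay_ps=300.0, margin_ps=20.0))
```

Because only the worst-case settling time matters, the logic block itself may glitch, which is why standard non-hazard-free single-rail logic is acceptable.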

Page 8: Clockless Logic:   Asynchronous Pipelines

Special Case: Using "Clocked Logic"

Clocked-CMOS = C2MOS: eliminate explicit latches
- latch folded into logic itself

[Figure: A General C2MOS gate: logic inputs feed a pull-up network and a pull-down network, gated by En and En', with a "keeper" on the logic output]

[Figure: C2MOS AND-gate: inputs A and B, gated by En and En', with a "keeper" on the logic output]

Page 9: Clockless Logic:   Asynchronous Pipelines

Gate-Level MOUSETRAP: with C2MOS

Use C2MOS: eliminate explicit latches

New Control Optimization = "Dual-Rail XNOR"
- eliminate 2 inverters from critical path

[Figure: Stage N-1, Stage N, Stage N+1; each stage has C2MOS logic, a Latch Controller, and a pair of bit latches; control signals reqN, ackN-1, reqN+1, ackN, doneN are each 2 wires wide, carried as dual-rail pairs (En, En'), (done, done'), (ack, ack')]

Page 10: Clockless Logic:   Asynchronous Pipelines

Complex Pipelining: Forks & Joins

Problems with Linear Pipelining:
- handles limited applications; real systems are more complex

Non-Linear Pipelining: has forks/joins

Contribution: introduce efficient circuit structures
- Forks: distribute data + control to multiple destinations
- Joins: merge data + control from multiple sources

Enabling technology for building complex async systems

[Figure: a fork and a join]

Page 11: Clockless Logic:   Asynchronous Pipelines

Forks and Joins: Implementation

Join: merge multiple requests

Fork: merge multiple acknowledges

[Figure: Fork: Stage N with an element marked "C" merging ack1 and ack2; single req. Join: Stage N with an element marked "C" merging req1 and req2; single ack.]
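The merging of requests (join) or acknowledges (fork) can be modeled behaviorally. The sketch below assumes the elements marked "C" in the figure are Muller C-elements, which the slide does not spell out; the class name and the two-input restriction are illustrative choices.

```python
class CElement:
    """Two-input Muller C-element: output follows the inputs only when they agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:          # both inputs at the same level: copy it to the output
            self.out = a
        return self.out     # inputs disagree: hold the previous output


if __name__ == "__main__":
    # Join example with transition signaling: merge req1 and req2.
    c = CElement()
    print(c.update(1, 0))   # only req1 has toggled: merged req unchanged
    print(c.update(1, 1))   # req2 has toggled too: merged req toggles
    print(c.update(1, 0))   # req2 toggles again first: merged req holds
    print(c.update(0, 0))   # req1 follows: merged req toggles back
```

With transition signaling, the merged output makes exactly one transition once every input has made one, so the stage sees a single request (or acknowledge) per data item.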

Page 12: Clockless Logic:   Asynchronous Pipelines

Related Protocols

Day/Woods ('97), and Charlie Boxes ('00)

Similarities: all use…
- transition signaling for handshakes
- phase conversion for latch signals

Differences: MOUSETRAP has…
- higher throughput
- ability to handle fork/join datapaths
- more aggressive timing, less insensitivity to delays

Page 13: Clockless Logic:   Asynchronous Pipelines

Performance, Timing and Optimization

MOUSETRAP with Logic:

Stage Latency = $T_{LATCH} + T_{LOGIC}$

Cycle Time = $2\,T_{LATCH} + T_{LOGIC} + T_{XNOR}$

MOUSETRAP Using C2MOS Gates:

Stage Latency = $T_{C^2MOS}$

Cycle Time = $2\,T_{C^2MOS} + T_{XNOR}$

Page 14: Clockless Logic:   Asynchronous Pipelines

Timing Analysis

Main Timing Constraint: avoid "data overrun"

Data must be safely "captured" by Stage N before new inputs arrive from Stage N-1
- simple 1-sided timing constraint: fast latch disable
- Stage N's "self-loop" faster than entire path through previous stage

[Figure: Stage N-1 and Stage N, each with a Data Latch, Latch Controller, logic block, and delay element; signals reqN, ackN-1, reqN+1, ackN, doneN]
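The data-overrun constraint can be expressed as a one-sided inequality between two path delays. The decomposition below is an assumption made for illustration: the slide says only that Stage N's self-loop must be faster than the entire path through the previous stage, so the individual terms (including the latch hold time) and all numbers are hypothetical.

```python
def overrun_safe(t_xnor_fall_n, t_hold_n,
                 t_xnor_rise_prev, t_latch_prev, t_logic_prev):
    """True if Stage N closes its latch before Stage N-1 can deliver new data.

    All arguments are delays in picoseconds (assumed decomposition).
    """
    # Stage N's self-loop: its XNOR disables the latch, plus latch hold time.
    self_loop = t_xnor_fall_n + t_hold_n
    # Path through Stage N-1: re-enable its latch, pass the latch, pass the logic.
    prev_path = t_xnor_rise_prev + t_latch_prev + t_logic_prev
    return self_loop < prev_path


if __name__ == "__main__":
    # Hypothetical delays for a FIFO (no logic block, so t_logic_prev = 0).
    print(overrun_safe(130, 50, 160, 220, 0))
    # A very slow self-loop would violate the constraint.
    print(overrun_safe(400, 50, 160, 220, 0))
```

The check is local to a pair of neighboring stages and has no lower bound on the self-loop, which is what makes it one-sided.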

Page 15: Clockless Logic:   Asynchronous Pipelines

Timing Optimization: Reducing Cycle Time

Analytical Cycle Time = $2\,T_{LATCH} + T_{LOGIC} + T_{XNOR\uparrow}$

Goal: shorten $T_{XNOR\uparrow}$ (in steady-state operation)
- Steady-state = no undue pipeline congestion

Observation: XNOR switches twice per data item: $T_{XNOR\downarrow}$ and $T_{XNOR\uparrow}$
- only 2nd (up) transition critical for performance: $T_{XNOR\uparrow}$

Solution: reduce XNOR output swing
- degrade "slew" for start of pulse
- allows quick pulse completion: faster rise time

Still safe when congested:
- pulse starts on time
- pulse maintained until congestion clears

Page 16: Clockless Logic:   Asynchronous Pipelines

Timing Optimization (contd.)

[Figure: waveforms of the "unoptimized" XNOR output and the "optimized" XNOR output. Events: N "done", at which N's latch is disabled; N+1 "done", at which N's latch is re-enabled.]

Optimized output: latch only partly disabled; recovers quicker! (no pulse width requirement)

Page 17: Clockless Logic:   Asynchronous Pipelines

Comparison with Wave Pipelining

Two Scenarios:
- Steady State: both MOUSETRAP and wave pipelines act like transparent "flow through" combinational pipelines
- Congestion:
  - right environment stalls: each MOUSETRAP stage safely captures data
  - internal stage slow: MOUSETRAP stages to its left safely capture data
  - congestion properly handled in MOUSETRAP

Conclusion: MOUSETRAP has potential of…
- speed of wave pipelining
- greater robustness and flexibility

Page 18: Clockless Logic:   Asynchronous Pipelines

Timing Issues: Handling Wide Datapaths

Buffers inserted to amplify latch signals (En):

[Figure: Stage N-1 and Stage N with buffered En; signals reqN, doneN, reqN+1]

Reducing Impact of Buffers:
- control uses unbuffered signals
- buffer delay off of critical path!
- datapath skewed w.r.t. control

Timing assumption: buffer delays roughly equal

[Figure: Stage N-1 and Stage N with buffered En; signals reqN, doneN, reqN+1]

Page 19: Clockless Logic:   Asynchronous Pipelines

[Figure: Stage N-1 and Stage N with En; signals reqN, doneN, reqN+1]

Page 20: Clockless Logic:   Asynchronous Pipelines

Preliminary Results

Pre-Layout Simulations of FIFOs:
- do not account for wire delays, parasitics, etc.
- careful transistor sizing/verification of timing constraints

C2MOS FIFO in 0.6 µm HP CMOS (3.3V, 300K, normal corner)

Design                  Latch Delay (ps)   tXNOR↓ (ps)   tXNOR↑ (ps)   Throughput (GHz)
MOUSETRAP               220                130           160           1.67
MOUSETRAP (optimized)   200                180           120           1.92

FIFO in 0.25 µm TSMC (2.5V, 300K, normal corner)

Design                  Latch Delay (ps)   tXNOR↓ (ps)   tXNOR↑ (ps)   Throughput (GHz)
MOUSETRAP               110                65            63            3.51
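As a sanity check, the reported throughputs can be compared against the FIFO cycle-time formula (twice the latch delay plus one XNOR delay). The sketch below assumes that the second of the two XNOR delays in each row is the up-transition delay that enters the formula; the function names are hypothetical.

```python
def fifo_cycle_time_ps(t_latch_ps, t_xnor_up_ps):
    """FIFO cycle time: two latch delays plus the (assumed) up-going XNOR delay."""
    return 2 * t_latch_ps + t_xnor_up_ps


def throughput_ghz(cycle_time_ps):
    """One data item per cycle; 1000 ps per cycle corresponds to 1 GHz."""
    return 1000.0 / cycle_time_ps


# (design, latch delay ps, second XNOR delay ps, reported throughput GHz)
ROWS = [
    ("0.6 MOUSETRAP", 220, 160, 1.67),
    ("0.6 MOUSETRAP (optimized)", 200, 120, 1.92),
    ("0.25 MOUSETRAP", 110, 63, 3.51),
]

if __name__ == "__main__":
    for name, t_latch, t_xnor_up, reported in ROWS:
        cycle = fifo_cycle_time_ps(t_latch, t_xnor_up)
        print(f"{name}: cycle {cycle} ps, "
              f"computed {throughput_ghz(cycle):.2f} GHz, reported {reported} GHz")
```

Under this assumption the two 0.6 rows reproduce the reported figures to two decimals, while the 0.25 row comes out slightly higher than the reported value, which may simply reflect rounding of the listed delays.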

Page 21: Clockless Logic:   Asynchronous Pipelines

Conclusions and Future Work

Introduced a new asynchronous pipeline style:
- Static logic blocks
- Simple latches and control:
  - transparent latches, or C2MOS gates
  - single gate control = 1 XNOR gate/stage
- Highly concurrent event-driven protocol
- High throughputs obtained:
  - 3.5 GHz in 0.25 µm, 1.9 GHz in 0.6 µm
  - comparable to wave pipelines; yet more robust/less design effort
- Correctly handles forks and joins in datapaths
- Timing constraints: local, 1-sided, easily met

Ongoing Work:
- more realistic performance measurement (incl. parasitics)
- layout and fabrication