1 clockless logic montek singh tue, mar 21, 2006


Page 1: 1 Clockless Logic Montek Singh Tue, Mar 21, 2006

Clockless Logic

Montek Singh

Tue, Mar 21, 2006

Page 2:

Dynamic Logic Pipelines (contd.)

Drawbacks of Williams’ PS0 Pipelines
Lookahead Pipelines
High-Capacity Pipelines

Page 3:

Drawbacks of PS0 Pipelining

1. Poor throughput:
   long cycle time: 6 events per cycle
   data “tokens” are forced far apart in time

2. Limited storage capacity:
   max only 50% of stages can hold distinct tokens
   data tokens must be separated by at least one spacer

Our Research Goals:
   address both issues
   still maintain very low latency

Page 4:

Recent Approaches

3 novel styles for high-speed async pipelining:
   MOUSETRAP Pipelines [Singh/Nowick, TAU-00, ICCD-01]
   “Lookahead Pipelines” (LP) [Singh/Nowick, Async-00]
   “High-Capacity Pipelines” (HC) [Singh/Nowick, WVLSI-00]

Goal: significantly improve throughput of PS0

Two Distinct Strategies:
   LP: introduce protocol optimizations
      “shave off” components from critical cycle
   HC: fundamentally new protocol
      greater concurrency: “loosely-coupled” stages

Page 5:

Outline

New Asynchronous Pipelines:
   MOUSETRAP Pipelines
   Lookahead Pipelines (LP)
   High-Capacity Pipelines (HC)

Dynamic circuit style
Static circuit style

Page 6:

Lookahead Pipeline Styles

Singh/Nowick
Async-2000

Page 7:

Lookahead Pipelines: Strategy #1

Use non-neighbor communication:
   stage receives information from multiple later stages
   allows “early evaluation”

Benefit: stage gets head-start on next cycle

Page 8:

Lookahead Pipelines: Strategy #2

Use early completion detection:
   completion detector moved before stage (not after)
   stage indicates “early done” in parallel with computation

Benefit: again, stage gets head-start on next cycle

[Figure: stage with an early completion detector]

Page 9:

Lookahead Pipelines: Overview

5 New Designs:

“Dual-Rail” Data Signaling:
   LP3/1: “early evaluation”
   LP2/2: “early done”
   LP2/1: “early evaluation” + “early done”

“Single-Rail” Bundled-Data Signaling:
   LPSR2/2: “early done”
   LPSR2/1: “early evaluation” + “early done”

Page 10:

Dual-Rail Design #1: LP3/1

Optimization = “early evaluation”
   each stage has two control inputs: from stages N+1 and N+2

Idea: shorten precharge phase
   terminate precharge early: when N+2 is done evaluating

[Figure: stages N, N+1, N+2, each with a Processing Block and a Completion Detector; Data in, Data out; control inputs PC and Eval, with Eval taken from N+2]

Page 11:

LP3/1 Protocol

PRECHARGE N: when N+1 completes evaluation
EVALUATE N: when N+2 completes evaluation (New! Enables “early evaluation!”)

[Figure: event sequence over stages N, N+1, N+2, numbered 1 to 4 with two events sharing number 3. Event labels: N evaluates; N+1 evaluates; N+2 evaluates; N+1 indicates “done”; N+2 indicates “done”]
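The two protocol rules above can be read as a small decision function. The sketch below is only an illustration, assuming that a stage's "done" signal is true after it evaluates and false after it precharges; the function names are hypothetical and the gate-level polarity of the real control signals is not modeled.

```python
def ps0_phase(done_n1: bool) -> str:
    """PS0 rule for comparison: stage N observes only stage N+1.

    Precharge N when N+1 has completed evaluation,
    evaluate N when N+1 has completed precharging.
    """
    return "precharge" if done_n1 else "evaluate"


def lp31_phase(done_n1: bool, done_n2: bool) -> str:
    """LP3/1 rule: stage N observes stages N+1 and N+2.

    Assumption: done_x is True after stage x evaluates, False after it
    precharges.
    """
    # Precharge starts once N+1 has evaluated, but it is terminated early
    # as soon as N+2 has evaluated as well.
    if done_n1 and not done_n2:
        return "precharge"
    return "evaluate"


if __name__ == "__main__":
    for d1 in (False, True):
        for d2 in (False, True):
            print(f"done(N+1)={d1!s:5} done(N+2)={d2!s:5} "
                  f"PS0={ps0_phase(d1):9} LP3/1={lp31_phase(d1, d2)}")
```

The only input combination where the two rules differ is when both N+1 and N+2 have evaluated: PS0 keeps stage N in precharge until N+1 itself precharges, while LP3/1 already releases it.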

Page 12:

LP3/1: Comparison with PS0

PS0: 6 events in cycle
   PRECHARGE N: when N+1 completes evaluation
   EVALUATE N: when N+1 completes precharging

LP3/1: Only 4 events in cycle! Enables “early evaluation!”
   PRECHARGE N: when N+1 completes evaluation
   EVALUATE N: when N+2 completes evaluation

[Figure: event sequences over stages N, N+1, N+2 for PS0 (events 1 to 6) and LP3/1 (events 1 to 4); the stages evaluate in turn and then indicate “done”]

Page 13:

LP3/1 Performance

Cycle Time = $3\,T_{EVAL} + T_{DETECT}$

Savings over PS0: 1 Precharge + 1 Completion Detection

[Figure: LP3/1 event sequence (events 1 to 4) with the saved path marked]
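Adding the stated savings back gives the PS0 cycle time implied by this slide. This is an inference, not a formula shown in the deck: it takes the LP3/1 cycle as three evaluations plus one completion detection (the four events in the cycle) and uses $T_{PRECH}$ for the precharge delay.

```latex
\begin{align*}
T_{\mathrm{LP3/1}} &= 3\,T_{EVAL} + T_{DETECT} \\
T_{\mathrm{PS0}}   &= T_{\mathrm{LP3/1}} + \underbrace{T_{PRECH} + T_{DETECT}}_{\text{saved path}}
                    = 3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}
\end{align*}
```

The result has six terms, which agrees with the "6 events per cycle" quoted for PS0.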

Page 14:

LP3/1: Inside a Stage

Merging 2 Control Inputs:
   [Figure: NAND gate merging PC (From Stage N+1) and Eval (From Stage N+2); paths labeled “old Eval” and “early Eval”]

Timing Issues:
   must satisfy several simple constraints
   Ex.: PC must arrive before Eval de-asserted
   1-sided timing requirement
   easily satisfied in practice

Page 15:

Dual-Rail Design #2: LP2/2

Optimization = “early done”

Idea: move completion detector before processing block
   stage indicates when “about to” precharge/evaluate

[Figure: Data in feeds the “early” Completion Detector and the Processing Block; outputs are Data out and “early done”]

Page 16:

LP2/2 Completion Detector

Modified completion detectors needed:
   Done=1 when stage starts evaluating, and inputs valid
   Done=0 when stage starts precharging

[Figure: asymmetric C-element (C) producing Done; inputs are PC and one OR gate per data bit (bit0, bit1, ..., bitn), with the OR inputs marked “+”]
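The set/reset behavior of this detector can be sketched in a few lines. The model below is an assumption-laden illustration, not the circuit itself: `EarlyCompletionDetector` and its `update` method are hypothetical names, each data bit is a dual-rail pair `(rail0, rail1)` that counts as valid when either rail is 1, and `evaluating` stands for the control input telling the stage to evaluate.

```python
from typing import Sequence, Tuple

DualRailBit = Tuple[int, int]  # (rail0, rail1); (0, 0) is the empty/spacer value


class EarlyCompletionDetector:
    """Behavioral sketch of an asymmetric C-element completion detector."""

    def __init__(self) -> None:
        self.done = False

    def update(self, evaluating: bool, bits: Sequence[DualRailBit]) -> bool:
        # One OR per bit: a dual-rail bit is valid when either rail is high.
        all_valid = all(r0 or r1 for r0, r1 in bits)
        if evaluating and all_valid:
            # Rising condition needs every input: control AND valid data.
            self.done = True
        elif not evaluating:
            # Falling condition is asymmetric: the control input alone resets.
            self.done = False
        # Otherwise the element holds its previous value.
        return self.done


if __name__ == "__main__":
    cd = EarlyCompletionDetector()
    print(cd.update(True, [(0, 0), (1, 0)]))   # data not yet valid
    print(cd.update(True, [(0, 1), (1, 0)]))   # evaluating and valid
    print(cd.update(False, [(0, 1), (1, 0)]))  # stage starts precharging
```

The asymmetry is the point of the sketch: data validity is needed to raise Done, but only the precharge control is needed to lower it.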

Page 17:

LP2/2 Protocol

Completion Detection:
   performed in parallel with evaluation/precharge of stage

[Figure: event sequence (events 1 to 4) over stages N, N+1, N+2. Event labels: N evaluates; N+1 evaluates; “early done” of N+1 eval; “early done” of N+2 eval; “early done” of N+1 prech]

Page 18:

LP2/2 Performance

Cycle Time = $2\,T_{EVAL} + 2\,T_{DETECT}$

LP2/2 savings over PS0: 1 Evaluation + 1 Precharge

[Figure: LP2/2 event sequence (events 1 to 4)]

Page 19:

Dual-Rail Design #3: LP2/1

Hybrid of LP3/1 and LP2/2. Combines:
   early evaluation of LP3/1
   early done of LP2/2

Cycle Time = $2\,T_{EVAL} + T_{DETECT}$
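To compare the dual-rail styles side by side, the cycle-time expressions from the performance slides (3·T_EVAL + T_DETECT for LP3/1, 2·T_EVAL + 2·T_DETECT for LP2/2, 2·T_EVAL + T_DETECT for LP2/1) can be evaluated for any set of delays. In this sketch the PS0 expression is inferred from the stated savings rather than quoted from the deck, and the delay values in the example are hypothetical unit delays, not measurements.

```python
from typing import Dict


def cycle_times(t_eval: float, t_prech: float, t_detect: float) -> Dict[str, float]:
    """Cycle time of each dual-rail style for the given stage delays."""
    return {
        # Inferred: LP3/1 plus the saved precharge and completion detection.
        "PS0": 3 * t_eval + t_prech + 2 * t_detect,
        "LP3/1": 3 * t_eval + t_detect,
        "LP2/2": 2 * t_eval + 2 * t_detect,
        "LP2/1": 2 * t_eval + t_detect,
    }


if __name__ == "__main__":
    # Hypothetical unit delays, chosen only to make the comparison readable.
    for style, t in cycle_times(t_eval=1.0, t_prech=1.0, t_detect=1.0).items():
        print(f"{style:6} cycle time = {t:.1f}")
```

With all delays equal, the cycle counts come out as 6, 4, 4 and 3 units, which matches the event counts quoted for PS0 and LP3/1.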

Page 20:

Lookahead Pipelines: Overview

5 New Designs:

“Dual-Rail” Data Signaling:
   LP3/1: “early evaluation”
   LP2/2: “early done”
   LP2/1: “early evaluation” + “early done”

“Single-Rail” Bundled-Data Signaling:
   LPSR2/2: “early done”
   LPSR2/1: “early evaluation” + “early done”

Page 21:

Single-Rail Design: LPSR2/1

Derivative of LP2/1, adapted to single-rail:
   bundled-data: matched delays instead of completion detectors

“Ack” to previous stages is “tapped off early”
   once in evaluate (precharge), dynamic logic insensitive to input changes

[Figure: pipeline stages, each with a matched delay element]

Page 22:

Inside an LPSR2/1 Stage

“done” generated by an asymmetric C-element
   done=1 when stage evaluates, and data inputs valid
   done=0 when stage precharges

PC and Eval are combined exactly as in LP3/1

[Figure: stage with data in, data out, “req” in, “req” out, “ack”, done; PC (From Stage N+1) and Eval (From Stage N+2) merged by a NAND gate; asymmetric C-element (aC, input marked “+”); matched delay]

Page 23:

LPSR2/1 Protocol

Cycle Time = $2\,T_{EVAL} + T_{aC}$

$T_{aC}$ = delay through asymmetric C-element

[Figure: event sequence (events 1 to 3) over stages N, N+1, N+2. Event labels: N evaluates; N+1 evaluates; N+1 indicates “done”; N+2 evaluates; N+2 indicates “done”]

Page 24:

Results

Designed/simulated FIFO’s for each pipeline style

Experimental Setup:
   design: 4-bit wide, 10-stage FIFO
   technology: 0.6 µm HP CMOS (old!)
   operating conditions: 3.3 V and 300 K

Page 25:

FIFO Results (simulations)

0.6 µm HP CMOS, 3.3 V, 300 K

Design     Signaling     Throughput (Mega items/sec)   Improvement (%)
PS0        dual-rail     350                           -
LP3/1      dual-rail     490                           38%
LP2/2      dual-rail     580                           64%
LP2/1      dual-rail     640                           83%
LPSR2/1    single-rail   1200                          -
HC         single-rail   1300                          -

LP dual-rail: over 80% faster than Williams’ PS0
   comparable latency

LP single-rail: even faster

Page 26:

Practicality of Gate-Level Pipelining

When datapath is wide:
   Can often split into narrow “streams”
   Use “localized” completion detector for each stream:
      need to examine only a few bits: small fan-in
      send “done” to only a few gates: small fan-out
   comp. det. fairly low cost!

[Figure: datapath width = 32 dual-rail bits; comp. det. fan-in = 2; done fan-out = 2]
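The idea of localized completion detection can be illustrated by partitioning a wide dual-rail word into narrow streams and checking validity per stream. This is a minimal sketch under stated assumptions: the helper names `split_streams` and `stream_done` are hypothetical, a dual-rail bit is a `(rail0, rail1)` pair that is valid when either rail is 1, and the stream width of 2 simply mirrors the fan-in shown in the figure.

```python
from typing import List, Sequence, Tuple

DualRailBit = Tuple[int, int]  # (rail0, rail1)


def split_streams(bits: Sequence[DualRailBit], width: int) -> List[Sequence[DualRailBit]]:
    """Partition a wide datapath into consecutive streams of `width` bits."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def stream_done(stream: Sequence[DualRailBit]) -> bool:
    """Local completion detector: examines only the bits of one stream."""
    return all(r0 or r1 for r0, r1 in stream)


if __name__ == "__main__":
    word = [(0, 1)] * 32              # 32 valid dual-rail bits
    streams = split_streams(word, 2)  # detector fan-in of 2 bits per stream
    print(len(streams), [stream_done(s) for s in streams][:4])
```

Each detector sees only its own stream, so its fan-in stays fixed no matter how wide the whole datapath becomes.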

Page 27:

High-Capacity Pipelines

Singh/Nowick
WVLSI-00, ISSCC-02, Async-02

Page 28:

HC Pipeline Style

High-Capacity Pipelines (HC)
   bundled datapaths; dynamic logic function blocks
   latch-free: no explicit latches needed
      dynamic logic provides implicit latching
   novel highly-concurrent protocol maximizes storage capacity
      traditional latch-free approaches: “spacers” limit capacity to 50%

Key Idea: Obtain greater control of stage’s operation
   separate control of pull-up/pull-down
   result = new “isolate phase”
   stage holds outputs/impervious to input changes

Advantage: Each stage can hold a distinct data item
   100% storage capacity

Extra Benefit: Obtain greater concurrency
   High throughput

Page 29:

HC: Basic Structure

Key Idea: 2 independent control signals:
   pc: controls precharge
   eval: controls evaluation

Allows novel 3-phase cycle:
   Evaluate
   “Isolate” (hold)
   Precharge

Single-rail “Bundled Datapath”:
   matched delay: produces delayed “done” signal
   worst-case delay: longer than slowest path for data

[Figure: stages N, N+1, N+2, each with a stage controller driving pc and eval, an ack input, and a matched delay element]
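The matched-delay requirement amounts to one inequality: the delay element must be slower than the slowest data path through the stage. The sketch below shows one way to size it; the function name, the example path delays and the 10% safety margin are all assumptions made for illustration and do not come from the slides.

```python
from typing import Iterable


def matched_delay(path_delays_ns: Iterable[float], margin: float = 0.10) -> float:
    """Return a delay (ns) that exceeds the slowest data path.

    `margin` is a hypothetical safety factor; the slides only require the
    matched delay to be longer than the worst-case data delay.
    """
    delays = list(path_delays_ns)
    if not delays:
        raise ValueError("at least one path delay is required")
    if margin < 0:
        raise ValueError("margin must be non-negative")
    return max(delays) * (1.0 + margin)


if __name__ == "__main__":
    # Hypothetical path delays through one function block, in nanoseconds.
    paths = [0.20, 0.35, 0.30]
    print(f"matched delay = {matched_delay(paths):.3f} ns")
```

A larger margin makes the bundling constraint safer but adds directly to the time before "done" is signaled, so the margin is a design trade-off.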

Page 30:

HC: Inside a Stage

Independent Controls of pull-up and pull-down:
   allows new 3rd phase: “isolate”
   pc asserted: precharge
   eval asserted: evaluate
   pc and eval de-asserted: enter “isolate” (hold) phase

[Figure: dynamic gate with inputs, outputs and a “keeper”; pc controls precharge, eval controls evaluation]
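A behavioral model makes the three phases concrete. In this sketch the class name `HCStage` is hypothetical, `pc` and `eval_` are booleans meaning "asserted" (not voltage levels), and the stage output is taken to be 0 after precharge, which is an assumption about the output polarity.

```python
from typing import Callable, Sequence


class HCStage:
    """Behavioral sketch of a dynamic stage with separate pc and eval controls."""

    def __init__(self, function: Callable[..., int]) -> None:
        self.function = function
        self.output = 0  # assumed value of the output after precharge

    def step(self, pc: bool, eval_: bool, inputs: Sequence[int]) -> int:
        if pc and eval_:
            # Pull-up and pull-down enabled together would fight each other.
            raise ValueError("pc and eval must not be asserted together")
        if pc:
            self.output = 0  # precharge: output reset
        elif eval_:
            # Evaluate: dynamic logic is monotonic, the output can only rise.
            self.output = int(self.output or self.function(*inputs))
        # Neither asserted: isolate phase, output held regardless of inputs.
        return self.output


if __name__ == "__main__":
    stage = HCStage(lambda a, b: a & b)
    print(stage.step(False, True, (1, 1)))   # evaluate
    print(stage.step(False, False, (0, 0)))  # isolate: inputs change, output held
    print(stage.step(True, False, (1, 1)))   # precharge
```

The isolate step is what lets a stage keep a data item while its inputs move on, which is how adjacent stages can hold distinct items without a spacer between them.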

Page 31:

HC: Protocol

Most Existing Protocols: 3 synchronization arcs
   1 forward arc: data dependency
   2 backward arcs: control synchronization

Our protocol: only 2 synchronization arcs
   only 1 backward arc
   once stage N+1 evaluates, N can complete entire next cycle!

[Figure: phase cycles of Stage N and Stage N+1 (Eval, Isolate, Precharge) with the synchronization arcs between them; one arc is crossed out (X). Phase encodings: Eval: pc=1, eval=1; Isolate: pc=1, eval=0; Precharge: pc=0, eval=0]

Page 32:

Formal Specification of Controller

Problem: Specification too concurrent for direct synthesis
   desired precharge condition: N and N+1 have evaluated same data
   problem: this condition not uniquely captured by given signals!
   N may evaluate next data item, while N+1 stuck on current item!

[Figure: controller specification as a graph of signal transitions pc+, eval+, S+, eval-, pc-, S-, annotated (Start evaluate), (Evaluate complete), (Isolate), (Start precharge), (Precharge complete); inputs T+ (Evaluate of N+1 complete) and T- (Precharge of N+1 complete)]

Page 33:

Modified Specification of Controller

Solution: Add a state variable ok2pc
   ok2pc records whether N+1 has “absorbed” N’s data item
   ok2pc resets immediately when N deletes item (N precharges)
   ok2pc is set when N+1 deletes item (N+1 precharges)

[Figure: modified specification with transitions ok2pc+, ok2pc-, pc+, eval+, S+, eval-, pc-, S-; inputs T+ (Evaluate of N+1 complete) and T- (Precharge of N+1 complete)]
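The role of ok2pc can be shown with a small state machine. This is a sketch, not the synthesized controller: the class and method names are hypothetical, the initial value of ok2pc is assumed to be true, and the precharge condition "ok2pc and N+1 has evaluated" is an assumed reading of the set/reset rules listed on the slide.

```python
class PrechargeGuard:
    """Tracks ok2pc for stage N, following the set/reset rules listed above."""

    def __init__(self) -> None:
        self.ok2pc = True  # assumption: the pipeline starts out empty

    def this_stage_precharged(self) -> None:
        # N deleted its item: reset immediately.
        self.ok2pc = False

    def next_stage_precharged(self) -> None:
        # N+1 deleted its item: set.
        self.ok2pc = True

    def may_precharge(self, next_has_evaluated: bool) -> bool:
        # Assumed condition: N+1 has evaluated AND that evaluation belongs to
        # the item N currently holds (which is what ok2pc records).
        return self.ok2pc and next_has_evaluated


if __name__ == "__main__":
    g = PrechargeGuard()
    print(g.may_precharge(True))   # N+1 absorbed N's first item
    g.this_stage_precharged()      # N precharges, then evaluates a new item
    print(g.may_precharge(True))   # N+1 still holds the old item
    g.next_stage_precharged()      # N+1 finally precharges
    print(g.may_precharge(False))  # N+1 has not evaluated the new item yet
    print(g.may_precharge(True))   # N+1 evaluated the new item
```

The second query is the ambiguous case from the original specification: N+1's "evaluated" signal is still high, but it refers to the previous item, and ok2pc is what keeps N from precharging too early.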

Page 34:

Controller implementation

Controller implementation is very simple:
   each signal implemented using a single gate
   ok2pc typically off the critical path

[Figure: controller gates INV, NAND3 and an asymmetric C-element (aC, input marked “+”); signals S, T, ok2pc, pc, eval]

Page 35:

HC: Stage Implementation

[Figure: stage with req, done and ack; controller built from NAND, INV and an asymmetric C-element (input marked “+”) driving pc and eval; matched delay. Annotations: state variable: off the critical path; self-loop: key to fast “isolation”; from current stage; from next stage; early ack]

Page 36:

HC: Operation

[Figure: event sequence (steps 1 to 3) between stages N and N+1. Event labels: N evaluates; N+1 starts to evaluate; N isolates; N precharges; N enables itself for next evaluation. Annotations: (fast self-loop), twice; (early Ack)]

Cycle Time = 8 CMOS gate delays

Page 37:

Performance

Cycle Time = $T_{EVAL} + T_{aC} + T_{NAND3} + T_{PRECH} + T_{INV}$

[Figure: event sequence (events 1 to 3) over stages N, N+1, N+2. Event labels: N evaluates; N+1 evaluates; N isolates; N precharges; N enables itself for next evaluation]

Page 38:

FIFO Results (simulation)

0.6 µm HP CMOS, 3.3 V, 300 K

Design     Signaling     Throughput (Mega items/sec)   Improvement (%)
PS0        dual-rail     350                           -
LP3/1      dual-rail     490                           38%
LP2/2      dual-rail     580                           64%
LP2/1      dual-rail     640                           83%
LPSR2/1    single-rail   1200                          -
HC         single-rail   1300                          -

LP dual-rail: over 80% faster than Williams’ PS0
   comparable latency

HC single-rail: 1.3 Giga items/second

Page 39:

Fabricated Chip: HC FIFO

2.5 GHz in 0.18u

Page 40:

Ripple-Carry Adder: One Stage

Mixed Dual-Rail/Single-Rail Datapath:
   single-rail: sum
   dual-rail: A, B, Carry-in and Carry-out
   must implement binate functions using unate dynamic logic

[Figure: Full-Adder Stage. Inputs: A (a0, a1) and B (b0, b1) with req_ab; Carry-in (cin0, cin1) with req_c. Outputs: Carry-out (cout0, cout1), sum, done]
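The reason for the dual-rail inputs can be seen by writing the full adder with only AND and OR over the individual rails, so that no signal is ever inverted. The sketch below is a logical illustration, not the transistor network from the slides: the function names are hypothetical, and each dual-rail value is a `(rail0, rail1)` pair with `(0, 0)` as the empty (precharged) state.

```python
from itertools import product
from typing import Tuple

DualRail = Tuple[int, int]  # (rail0, rail1)


def dual(x: int) -> DualRail:
    """Encode a binary value on two rails: rail0 = 'is 0', rail1 = 'is 1'."""
    return (1 - x, x)


def full_adder_stage(a: DualRail, b: DualRail, cin: DualRail) -> Tuple[int, DualRail]:
    """Unate (AND/OR only) full adder: single-rail sum, dual-rail carry-out."""
    a0, a1 = a
    b0, b1 = b
    c0, c1 = cin
    # Carry-out is 1 when a majority of the inputs is 1 ...
    cout1 = (a1 & b1) | (a1 & c1) | (b1 & c1)
    # ... and 0 when a majority of the inputs is 0.
    cout0 = (a0 & b0) | (a0 & c0) | (b0 & c0)
    # Sum is 1 for an odd number of ones; the complemented literals come
    # from the 0-rails instead of inverters.
    s = (a1 & b0 & c0) | (a0 & b1 & c0) | (a0 & b0 & c1) | (a1 & b1 & c1)
    return s, (cout0, cout1)


if __name__ == "__main__":
    for x, y, c in product((0, 1), repeat=3):
        s, (k0, k1) = full_adder_stage(dual(x), dual(y), dual(c))
        print(x, y, c, "-> sum", s, "carry", k1)
```

Because every term is a product of rails, all-zero inputs give all-zero outputs, which is the behavior expected of a precharged dynamic stage.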

Page 41:

Final Adder Architecture

[Figure: chain of adder stages from least significant to most significant; each stage takes A,B and carry in and produces sum and carry out; shift-registers provide operand bits; shift-registers accumulate sum bits]

Page 42:

Results

Designed/simulated adder in each pipeline style

Experimental Setup:
   design: 32-bit ripple-carry-adder
   technology: 0.6 µm HP CMOS, @3.3 V and 300 K

Adder Design   Cycle Time (ns)   Throughput (Mega items/sec)
LPSR2/1        1.07              930
HC             0.98              1023

HC is 10% faster than LPSR2/1
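Throughput is the reciprocal of cycle time, so the two columns of the table can be cross-checked with a short calculation. The helper names below are hypothetical; the inputs are the values from the table.

```python
def throughput_mitems_per_sec(cycle_time_ns: float) -> float:
    """Items per second, in millions, for a given cycle time in nanoseconds."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    return 1000.0 / cycle_time_ns  # 1 / ns = 1000 Mega items per second


def speedup_percent(new: float, base: float) -> float:
    """Relative throughput gain of `new` over `base`, in percent."""
    return (new / base - 1.0) * 100.0


if __name__ == "__main__":
    for name, t_ns in (("LPSR2/1", 1.07), ("HC", 0.98)):
        print(f"{name:8} {throughput_mitems_per_sec(t_ns):7.1f} Mega items/sec")
    print(f"HC over LPSR2/1: {speedup_percent(1023, 930):.1f}%")
```

The reciprocals come out near 935 and 1020 Mega items/sec, close to the tabulated 930 and 1023; the small differences would be consistent with the cycle times being rounded to two decimals. The ratio of the tabulated throughputs gives the quoted 10%.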

Page 43:

Conclusions

Introduced 2 new async high-speed pipeline styles:
   Lookahead Pipelines: use novel protocol optimizations
   High-Capacity Pipelines: fundamentally new protocol
      allows 100% storage capacity

Obtain very high throughputs:
   FIFO’s: up to 1.3 GigaHertz in 0.6 µm CMOS
   Adders: ~1.0 GigaHertz in 0.6 µm CMOS
   near-best performance, and…
      significantly simpler and easier-to-construct

Fabricated chip: 2.5 GHz (HC FIFOs in 0.18u)