Low-Latency Interfaces for Mixed-Timing Domains [in DAC-01]
Tiberiu Chelcea, Steven M. Nowick
Department of Computer Science
Columbia University
{tibi,nowick}@cs.columbia.edu
Introduction
Key trend in VLSI systems: systems-on-a-chip (SoC)
Two fundamental challenges:
• mixed-timing domains
• long interconnect delays
Our goal: design of efficient interface circuits
Desirable features:
• arbitrarily robust
• low latency, high throughput
• modularity, scalability
Few satisfactory solutions to date.
Timing Issues in SoC Design
[Figure: (a) single-clock system with long interconnect; (b) mixed-timing domains: Domain #1 and Domain #2 (each sync or async), also joined by long interconnect]
Timing Issues in SoC Design (cont.)
Solution: provide interface circuits
[Figure: (a) single-clock system with long interconnect: “relay stations” (Carloni et al.); (b) mixed-timing domains, Domain #1 and Domain #2 (each sync or async): NEW “mixed-timing FIFO’s”, and with long interconnect: NEW “mixed-timing relay stations”]
Contributions
Complete set of mixed-timing interface circuits: sync-sync, async-sync, sync-async, async-async
Features:
• Arbitrary robustness: with respect to synchronization failures
• High throughput: in steady-state operation, no synchronization overhead
• Low latency (“fast restart”): in an empty FIFO, only synchronization overhead
• Reusability: each interface partitioned into reusable sub-components
Two contributions:
• Mixed-Timing FIFO’s
• Mixed-Timing Relay Stations
Contribution #1: Mixed-Timing FIFO’s
Addresses issue of interfacing mixed-timing domains
Features: token ring architecture
• circular array of identical cells
• shared buses: data + control
• data: “immobile” once enqueued
• distributed control: allows concurrent put/get operations
2 circulating tokens: define tail & head of queue
Potential benefits:
• low latency
• low power
• scalability
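The token-ring organization can be illustrated with a small behavioral model. This is only a sketch, not the circuit from the talk: the class name `TokenRingFifo` and its methods are hypothetical, and clocking and synchronization are ignored. It shows that a data item stays in the cell where it was enqueued while only the two token positions move.

```python
class TokenRingFifo:
    """Behavioral sketch of a circular array of cells with two tokens."""

    def __init__(self, n_cells):
        self.data = [None] * n_cells   # one data register per cell
        self.full = [False] * n_cells  # per-cell status bit
        self.tail = 0                  # cell currently holding the put token
        self.head = 0                  # cell currently holding the get token

    def put(self, item):
        """Enqueue into the tail cell; return False if that cell is still full."""
        if self.full[self.tail]:
            return False
        self.data[self.tail] = item    # data is written once and never moved
        self.full[self.tail] = True
        self.tail = (self.tail + 1) % len(self.data)  # pass the put token
        return True

    def get(self):
        """Dequeue from the head cell; return None if that cell is empty."""
        if not self.full[self.head]:
            return None
        item = self.data[self.head]
        self.full[self.head] = False
        self.head = (self.head + 1) % len(self.data)  # pass the get token
        return item
```

Because puts touch only the tail cell and gets touch only the head cell, the two operations can proceed concurrently as long as the tokens sit in different cells.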
Contribution #2: Mixed-Timing Relay Stations
Addresses issue of long interconnect delays
“Latency-Insensitive Protocols”: safely tolerate long interconnect delays between systems
Prior contribution: introduce “relay stations”
• single-clock domains (Carloni et al., ICCAD-99)
Our contribution: introduce “mixed-timing relay stations”
• mixed-clock (sync-sync)
• async-sync
First proposed solutions to date.
Related Work
Single-Clock Domains: handling clock discrepancies
• clock skew and jitter (Kol98, Greenstreet95)
• long interconnect delays (Carloni99)
Mixed-Timing Domains: 3 common approaches
• Use “Wrapper Logic”: add logic layer to synchronize data/control (Seitz80, Seizovic94)
  - drawback: long latencies in communication
• Modify Receiver’s Clock: stretchable and pausible clocks (Chapiro84, Yun96, Bormann97, Sjogren/Myers97)
  - drawback: penalties in restarting clock
Related Work: Closer Approaches
Mixed-Timing Domains (cont.):
• Interface Circuits: Mixed-Clock FIFO’s (Intel, Jex et al. 1997)
  - drawback: significant area overhead = synchronizer for each cell
• Our approach: mixed-clock FIFO’s with only 2 synchronizers for entire FIFO
Outline
• Mixed-Clock Interfaces
  - FIFO
  - Relay Station
• Async-Sync Interfaces
  - FIFO
  - Relay Station
• Results
• Conclusions
Mixed-Clock FIFO: Block Level
[Figure: Mixed-Clock FIFO block with a synchronous put interface and a synchronous get interface]
Synchronous put interface:
• CLK_put: controls put operations
• req_put: initiates put operations
• data_put: bus for data items
• full: indicates when FIFO full
Synchronous get interface:
• CLK_get: controls get operations
• req_get: initiates get operations
• data_get: bus for data items
• valid_get: indicates data item validity (always 1 in this design)
• empty: indicates when FIFO empty
Mixed-Clock FIFO: Steady-State Simulation
[Figure, shown over several animation frames: FIFO cells with Put Controller, Full Detector, Get Controller and Empty Detector; put side: full, req_put, data_put, CLK_put; get side: CLK_get, data_get, req_get, valid_get, empty; TAIL and HEAD markers move around the cells]
Steady state: FIFO neither full nor empty
Put operation:
• Sender starts a put operation
• Put Controller enables a put operation
• At the end of the clock cycle, the TAIL cell enqueues the data and passes the put token
Get operation (at HEAD)
Steady-state operation: puts and gets “reasonably spaced”
• Zero probability of synchronization failure
• Zero synchronization overhead
Mixed-Clock FIFO: Full Scenario
[Figure, shown over several animation frames: same FIFO architecture and signals, with HEAD and TAIL markers]
• FIFO FULL: put interface stalled
• FIFO NOT FULL (later frame)
Mixed-Clock FIFO: Cell Implementation
[Figure: cell built around a register (REG) with enables (En) and an SR latch; signals: CLK_put, en_put, req_put, data_put, CLK_get, en_get, valid, data_get, ptok_in, ptok_out, gtok_in, gtok_out, f_i, e_i]
• Synchronous Put Part (reusable): en_put enables a put operation
• Synchronous Get Part (reusable): en_get enables a get operation
• Data Validity Controller
• Status bits: f_i = cell FULL, e_i = cell EMPTY
• Token signals: ptok_in, ptok_out, gtok_in, gtok_out
• Inputs: data_put = data item in, req_put = validity bit in
• Outputs: data_get = data item out, valid = validity bit out
Mixed-Clock FIFO: Architecture
[Figure: FIFO cells surrounded by the Put Controller, Full Detector, Get Controller and Empty Detector; put side: full, req_put, data_put, CLK_put; get side: CLK_get, data_get, req_get, valid_get, empty]
Synchronization Issues
Challenge: interfaces are highly concurrent
• Global “FIFO state”: controlled by 2 different clocks
Problem #1: Metastability
• Each FIFO interface needs clean state signals
Solution: synchronize “full” & “empty” signals
• “full” with CLK_put
• “empty” with CLK_get
• Add 2 (or more) synchronizing latches to each signal
Observable “full”/“empty” safely approximate true FIFO state
Synchronization Issues (cont.)
Problem #2: FIFO now may underflow/overflow!
• synchronizing latches add extra latency
Solution: modify definitions of “full” and “empty”
• New FULL: 0 or 1 empty cells left
• New EMPTY: 0 or 1 full cells left
[Figure: New Full Detector: cell status bits e_0 ... e_3 are checked pairwise for two consecutive empty cells; two consecutive empty cells = FIFO not full; full = NO two consecutive empty cells; the result passes through synchronizing latches clocked by CLK_put]
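The new “full” definition can be written as a short combinational function. This is a sketch under stated assumptions: the function name `new_full` is hypothetical, the status bits are treated as a ring (so the last cell is also compared with the first), and the synchronizing latches clocked by CLK_put are left out.

```python
def new_full(e):
    """Return True when no two adjacent cells in the ring are both empty.

    e[i] is the 'cell empty' status bit of cell i (e_0 ... e_{n-1}).
    Synchronizing latches are deliberately omitted from this sketch.
    """
    n = len(e)
    two_adjacent_empty = any(e[i] and e[(i + 1) % n] for i in range(n))
    return not two_adjacent_empty
```

Since the empty cells of a FIFO are contiguous, “no two consecutive empty cells” is equivalent to “0 or 1 empty cells left”, so the detector reports full one cell early and leaves room for a put that is already in flight.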
Synchronization Issues (cont.)
Problem #3: Potential for deadlock
Scenario: suppose only 1 data item in quiescent FIFO
• FIFO still considered “empty” (new definition)
• Get interface: cannot dequeue data item!
Solution: bi-modal “empty detector”, combines:
• “New empty” detector (0 or 1 data items)
• “True empty” detector (0 data items)
Two results folded into single global “empty” signal
Synchronization Issues: Avoiding Deadlock
[Figure: bi-modal empty detector: cell status bits f_0 ... f_3 feed two detectors, each followed by synchronizing latches clocked by CLK_get; other labels: req_get, en_get, empty]
• ne: detects “new empty” (0 or 1 data items)
• oe: detects “true empty” (0 data items)
• Combine into global “empty”
Bi-modal empty detection: select either ne or oe
• Reconfigure whenever the get interface is active
• When reconfigured, use “ne”: FIFO active, avoids underflow
• When NOT reconfigured, use “oe”: FIFO quiescent, avoids deadlock
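The selection between the two detectors can be sketched as follows. The function names and the `get_active` flag are hypothetical; the “new empty” logic is assumed, by analogy with the full detector, to look for two adjacent full cells in the ring; and the synchronizing latches on CLK_get are omitted, so this only models the choice between `ne` and `oe`, not its timing.

```python
def new_empty(f):
    """'ne': True when no two adjacent cells are both full (0 or 1 data items)."""
    n = len(f)
    return not any(f[i] and f[(i + 1) % n] for i in range(n))


def true_empty(f):
    """'oe': True only when no cell is full (0 data items)."""
    return not any(f)


def global_empty(f, get_active):
    """Use 'ne' while the get interface is active, 'oe' when it is quiescent."""
    return new_empty(f) if get_active else true_empty(f)
```

With a single item in a quiescent FIFO, `global_empty` is false, so the get interface is allowed to dequeue it and the deadlock described above does not occur.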
Put/Get Controllers
Put Controller (signals: req_put, full, en_put):
• enables put operation
• disabled when FIFO full
Get Controller (signals: req_get, empty, valid, en_get, valid_get):
• enables get operation
• indicates when data valid
• disabled when FIFO empty
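The controller behavior described above amounts to simple gating. The equations below are an assumption inferred from the signal names and the bullet descriptions, not taken from a schematic; the function names are hypothetical.

```python
def put_controller(req_put, full):
    """Enable a put only when one is requested and the FIFO is not full."""
    en_put = req_put and not full
    return en_put


def get_controller(req_get, empty, valid):
    """Enable a get only when requested and not empty; report data validity."""
    en_get = req_get and not empty
    valid_get = en_get and valid   # data is valid only if a get actually happened
    return en_get, valid_get
```

Here `full` and `empty` are the synchronized, conservative signals from the detectors, so each controller is a purely synchronous block within its own clock domain.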
Relay Stations: Overview
Proposed by Carloni et al. (ICCAD’99)
[Figure: System 1 and System 2 on a common clock (CLK), connected through a chain of relay stations (RS)]
• Without relay stations: system 1 sends “data items” to system 2; delay > 1 cycle
• With relay stations: system 1 now sends “data packets” to system 2; delay = 1 cycle
• Data packet = data item + validity bit
• “stop” control = stopIn + stopOut
  - apply counter-pressure
  - result: stall communication
• Steady state: pass data on every cycle (either valid or invalid)
Problem: works only for single-clock systems!
Relay Stations: Implementation
[Figure: relay station with registers MR and AR, a switch, a mux and a Control block; signals: packetIn, packetOut, stopIn, stopOut]
• In normal operation: packetIn copied to MR and forwarded on packetOut
• When stopped (stopIn=1):
  - stopOut raised on the next clock edge
  - extra packet copied to AR
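The two operating modes can be sketched as a cycle-level model. This is an illustrative sketch only: the class `RelayStation` and its `clock` method are hypothetical, MR and AR are treated here as a main and an auxiliary register, and the sender is assumed to hold its packet while it sees stopOut asserted.

```python
class RelayStation:
    """Cycle-level sketch of a single-clock relay station."""

    def __init__(self):
        self.mr = None          # MR: drives packetOut
        self.ar = None          # AR: holds the extra packet during a stall
        self.stop_out = False   # registered stop signal toward the sender

    def clock(self, packet_in, stop_in):
        """Advance one clock edge; return (packetOut, stopOut) after the edge."""
        if not self.stop_out:
            if not stop_in:
                self.mr = packet_in    # normal operation: copy and forward
            else:
                self.ar = packet_in    # packet already in flight is saved in AR
                self.stop_out = True   # stopOut raised on this edge
        elif not stop_in:
            self.mr = self.ar          # stall released: AR drains into MR
            self.ar = None
            self.stop_out = False
        return self.mr, self.stop_out
```

The auxiliary register is what makes the one-cycle delay of the stop signal safe: the packet sent before the sender could react to stopOut is kept rather than lost.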
Relay Station vs. Mixed-Clock FIFO
Relay Station (single clock CLK; signals: validIn, dataIn, stopOut, validOut, dataOut, stopIn):
• Steady state: always pass data
• Data items: both valid & invalid
• Stopping mechanism: stopIn & stopOut
Mixed-Clock FIFO (clocks CLK_put, CLK_get; signals: req_put, data_put, full, req_get, valid_get, data_get, empty):
• Steady state: only pass data when requested
• Data items: only valid data
• Stopping mechanism: none (only full/empty)
Mixed-Clock Relay Stations (MCRS)
[Figure: chain of relay stations (RS) between System 1 (CLK1) and System 2 (CLK2), with a NEW MCRS where the two clock domains meet]
Mixed-Clock Relay Station derived from the Mixed-Clock FIFO
• Put side (CLK1): packetIn = valid_put + data_put; stopOut
• Get side (CLK2): packetOut = valid_get + data_get; stopIn
Change ONLY Put and Get Controllers
Mixed-Clock Relay Station: Implementation
Mixed-Clock Relay Station vs. Mixed-Clock FIFO:
• Identical:
  - FIFO cells
  - Full/Empty detectors (...or can simplify)
• Only modify: Put & Get Controllers
[Figure: Put Controller and Get Controller; signal labels: validIn, full, en_put (to cells), stopIn, empty, valid, en_get, validOut]
Always enqueue data (unless full)
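One plausible reading of the modified controllers is sketched below. Only “always enqueue data (unless full)” comes from the slide; the remaining equations, in particular `stop_out = full` and the gating of `en_get` by `stop_in`, are assumptions based on the signal names, and the function names are hypothetical.

```python
def mcrs_put_controller(valid_in, full):
    """Relay-station put side: enqueue every packet, valid or not, unless full."""
    en_put = not full        # no request signal: a packet arrives every cycle
    stop_out = full          # assumed: back-pressure toward the sender
    validity_bit = valid_in  # stored in the cell alongside the data item
    return en_put, stop_out, validity_bit


def mcrs_get_controller(stop_in, empty, valid):
    """Relay-station get side: dequeue every cycle unless stopped or empty."""
    en_get = not stop_in and not empty
    valid_out = en_get and valid   # otherwise an invalid packet is sent
    return en_get, valid_out
```

Compared with the FIFO controllers, the request inputs disappear: the put side is driven by packet arrival and the get side by the absence of a stop.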
Async-Sync FIFO: Block Level
[Figure: Async-Sync FIFO between the async domain (put_req, put_ack, put_data) and the sync domain (req_get, valid_get, empty, data_get, CLK_get)]
Asynchronous put interface: uses handshaking communication
• put_req: request operation
• put_ack: acknowledge completion
• no “full” signal
Synchronous get interface: no change
Async-Sync FIFO: Architecture
[Figure: row of cells with Get Controller and Empty Detector; put side: put_req, put_ack, put_data; get side: CLK_get, data_get, req_get, valid_get, empty]
Get interface: exactly as in Mixed-Clock FIFO
Asynchronous put interface:
• No Full Detector or Put Controller
• When FIFO full, acknowledgement withheld until safe to perform the put operation
Async-Sync FIFO: Cell Implementation
[Figure: cell built around a register (REG, with enable En) and blocks labelled C+, OPT and DV; signals: put_req, put_data, put_ack, we, we1, f_i, e_i, gtok_in, gtok_out, CLK_get, en_get, get_data]
• Asynchronous Put Part: reusable, from async FIFO (Async00)
• Synchronous Get Part: reusable (from mixed-clock FIFO)
• Data Validity Controller: new
Async-Sync Relay Stations (ASRS)
[Figure: System 1 (async) connected to System 2 (sync, CLK2) through ARS stages, an ASRS and an RS; labels: Micropipeline, optional]
Results
Each circuit implemented using both academic and industry tools:
• MINIMALIST: Burst-Mode controllers [Nowick et al. ’99]
• PETRIFY: Petri-Net controllers [Cortadella et al. ’97]
Pre-layout simulations: 0.6 µm HP CMOS technology
Experiments:
• various FIFO capacities (4/8/16 cells)
• various data widths (8/16 bits)
Results: Latency
Experimental setup:
- 8-bit data items
- various FIFO capacities (4, 8, 16)
Latency = time from enqueuing a data item into an empty FIFO to dequeuing it
For each design, latency is not uniquely defined: Min/Max

Design            4-place        8-place        16-place
                  Min    Max     Min    Max     Min    Max
Mixed-Clock       5.43   6.34    5.79   6.64    6.14   7.17
Async-Sync        5.53   6.45    6.13   7.17    6.47   7.51
Mixed-Clock RS    5.48   6.41    6.05   7.02    6.23   7.28
Async-Sync RS     5.61   6.35    6.18   7.13    6.57   7.62
Results: Maximum Operating Rate
Units: synchronous interfaces in MHz; asynchronous interfaces in MegaOps/sec

Design            4-place        8-place        16-place
                  Put    Get     Put    Get     Put    Get
Mixed-Clock       565    549     544    523     505    484
Async-Sync        421    549     379    523     357    484
Mixed-Clock RS    580    539     550    517     509    475
Async-Sync RS     421    539     379    517     357    475

Put vs. Get rates:
- sync put faster than sync get
- async put slower than sync get
Conclusions
Introduced several new low-latency interface circuits
Address 2 major issues in SoC design:
• Mixed-timing domains
  - mixed-clock FIFO
  - async-sync FIFO
• Long interconnect delays
  - mixed-clock relay station
  - async-sync relay station
Other designs implemented and simulated:
• Sync-Async FIFO + Relay Station
• Async-Async FIFO + Relay Station
Reusable components: mix & match to build circuits
Provide useful set of interface circuits for SoC design