Posted on 19-Dec-2015
Advanced Digital Design
Practical Example: DARTS
by A. Steininger and M. Delvai, Vienna University of Technology
Lecture "Advanced Digital Design" © A. Steininger & M. Delvai / TU Vienna 2
Outline
- The Clock Distribution Problem
- DARTS Idea & Project Outline
- DARTS Implementation
  - concept & modules
  - complexity issues
  - performance results
  - test concept
  - timing assumptions
- Fundamental Problem of FT Asyn Logic
The Synchronous Paradigm
- concept: precise global notion of time for the entire (system on) chip
- method: discrete, evenly spaced time slices; global, “phase accurate” clock tree; single crystal oscillator
- costs: cumbersome clock tree design; considerable waste of power; single point of failure
[Figure: SoC with the modules DSP, WLAN, Camera, GPRS, GPS]
Phase-Accurate Clocking
Low-skew clock distribution has become a substantial problem:
- non-negligible signal propagation time
- the clock network is widely distributed
- high fan-out of the clock network
- enormous power dissipation in the clock network
- sophisticated techniques for clock routing with little tool support
- …
Current Solutions
- symmetric routing (H-tree, X-tree)
- configurable buffers
- deskewing circuits
- gated clock tree
- half-swing clock
- …
Can we go on like this?
Fault-Tolerant Clocking
- crystal oscillators are not very robust
- increasing extent of the clock net
- higher clock rates
- smaller voltage swing
- shrinking feature size and critical charge
- more demanding applications
Can we accept the clock source & network as single points of failure?
Current Solutions
We need to use independent clock sources and then either
- sacrifice global synchrony: Globally Asynchronous Locally Synchronous (GALS) systems, or
- perform synchronization
  - of clock sources (on microtick level)
  - of local time bases (on macrotick level)
[Figure: SoC with the modules DSP, WLAN, Camera, GPRS, GPS]
The GALS Concept
- partition the system into functional units (FUs)
- apply the synchronous paradigm within FUs
- apply the asynchronous paradigm (handshake) for communication among FUs
pro: the most difficult clock routing problems are eliminated
con: potential metastability problems at syn/asyn boundaries
[Figure: FU1 … FUn communicating via REQ/ACK handshakes]
Synchronization: Principle
[Figure: clock time versus real time t for a reference clock and two drifting clocks C1, C2]
Precision Π: $|C_1(t) - C_2(t)| \le \Pi$
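To make the precision bound concrete, here is a minimal sketch (written for this text, not part of the lecture) of two free-running clocks with constant drift rates, optionally reset to real time at a fixed resynchronization period. The function name `clock_offsets`, the idealized instantaneous resynchronization and the ±50 ppm drift values are assumptions for illustration only.

```python
def clock_offsets(drift1, drift2, horizon_s, dt_s, resync_period_s=None):
    """Sample |C1(t) - C2(t)| every dt_s seconds of real time.

    Clock i advances at rate (1 + drift_i) relative to real time.  If a
    resynchronization period is given, both clocks are (idealistically) set
    to real time at every resynchronization instant."""
    steps = round(horizon_s / dt_s)
    steps_per_resync = round(resync_period_s / dt_s) if resync_period_s else 0
    c1 = c2 = 0.0
    offsets = []
    for i in range(1, steps + 1):
        c1 += (1 + drift1) * dt_s
        c2 += (1 + drift2) * dt_s
        offsets.append(abs(c1 - c2))          # sampled just before any resync
        if steps_per_resync and i % steps_per_resync == 0:
            c1 = c2 = i * dt_s                # both clocks adopt the reference time
    return offsets


# +50 ppm and -50 ppm oscillators observed for 10 s
free = clock_offsets(50e-6, -50e-6, horizon_s=10.0, dt_s=0.01)
synced = clock_offsets(50e-6, -50e-6, horizon_s=10.0, dt_s=0.01, resync_period_s=1.0)
print(max(free), max(synced))   # roughly 1e-3 s versus 1e-4 s
```

Without resynchronization the offset keeps growing; with it, the precision is bounded by roughly |ρ1 − ρ2| · R (here 100 ppm × 1 s = 100 µs).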
Why Synchronize?
- For unsynchronized clocks the distance between corresponding edges („n-th edge“) becomes arbitrarily large.
- Hence the relative timing between the two FUs is completely undefined.
- Therefore a consistent temporal ordering of global events is impossible.
- This may, e.g., cause redundant modules to deliver differing results even in the fault-free case!
HW Clock Synchronization
[Figure: nodes A, B, C, D; each node receives the clock inputs, derives a voted reference clock in its voter, and drives its clock output from a PLL-controlled local clock]
At every node do:
(1) derive the reference clock and phase by voting over all local clocks
(2) adjust the local clock phase by means of a PLL
PLL (Phase-Locked Loop)
- a closed-loop control circuit
- different implementation styles (analog, digital, fully digital)
- potential stability problems
- can sync the local clock only for small deviations from the reference
[Figure: ref in → phase detector → loop filter → voltage-controlled oscillator → out, with feedback from out to the phase detector]
Voter
- One voter is required on every node.
- For a global agreement all voters must derive the same reference clock, otherwise cliques may evolve.
BUT…
- Obviously, the local voting result is influenced by skew!
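The slides do not specify the voting function. The sketch below assumes a fault-tolerant midpoint: drop the f earliest and f latest observed edges, then take the midpoint of the rest. The function name and the edge and skew values are hypothetical. The example shows that a faulty clock is masked, but also that two voters seeing the same clocks through different wire delays arrive at different reference edges, which is exactly the skew problem noted above.

```python
def fault_tolerant_midpoint(edge_times, f):
    """Hypothetical voter: estimate the reference clock edge from the edge
    times observed at one node, tolerating up to f faulty clocks.

    Discards the f earliest and f latest edges and returns the midpoint of
    the remaining range (requires at least 3f + 1 inputs)."""
    if len(edge_times) < 3 * f + 1:
        raise ValueError("need at least 3f+1 clock inputs")
    s = sorted(edge_times)
    kept = s[f:len(s) - f]
    return (kept[0] + kept[-1]) / 2


# Edge times (ns) of the clocks of nodes A, B, C, D; D's clock is faulty.
true_edges = [10.0, 10.2, 9.9, 25.0]
# Wire delays (ns) from each clock source to the voter of node A and of node B
skew_at_a = [0.0, 0.3, 0.4, 0.1]
skew_at_b = [0.4, 0.0, 0.1, 0.2]

ref_a = fault_tolerant_midpoint([t + d for t, d in zip(true_edges, skew_at_a)], f=1)
ref_b = fault_tolerant_midpoint([t + d for t, d in zip(true_edges, skew_at_b)], f=1)
print(ref_a, ref_b)   # roughly 10.4 and 10.3: the voters disagree because of skew
```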
SW Clock Synchronization
- independent local clock sources: „microticks“
- identical instances of a distributed algorithm executed on every node
- global time established by „macroticks“ (i.e., a defined number M of local microticks)
- message exchange between nodes provides knowledge of the other nodes' local time
- the algorithm continuously derives a correction for M to keep the macrotick in synchrony with the others
- a set of N nodes can tolerate f Byzantine faulty nodes if N ≥ 3f+1
Rate Correction
[Figure: macroticks composed of u and v microticks, respectively, on a common time axis]
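The correction of M can be illustrated with a minimal sketch under a strong simplification: here the node knows the agreed macrotick boundaries exactly, whereas a real algorithm estimates them from messages exchanged with the other nodes. The function name and the numbers (9.9 ns microticks instead of a nominal 10 ns, 100 microticks per 1000 ns macrotick) are hypothetical.

```python
def macrotick_errors(microtick_ns, nominal_m, macrotick_ns, rounds, correct=True):
    """Phase error (ns) of one node's macrotick boundaries against an agreed
    time base whose macroticks last macrotick_ns.

    correct=False: the node always counts nominal_m microticks per macrotick.
    correct=True:  before each macrotick the node picks the number M of
    microticks that puts its next boundary closest to the agreed one.
    Idealization: the agreed boundary is known exactly here."""
    t_local = 0.0
    errors = []
    for r in range(1, rounds + 1):
        t_agreed = r * macrotick_ns
        if correct:
            m = round((t_agreed - t_local) / microtick_ns)   # corrected M
        else:
            m = nominal_m
        t_local += m * microtick_ns
        errors.append(t_local - t_agreed)
    return errors


# Local oscillator 1 % fast: 9.9 ns microticks instead of the nominal 10 ns
drifting = macrotick_errors(9.9, 100, 1000.0, rounds=100, correct=False)
corrected = macrotick_errors(9.9, 100, 1000.0, rounds=100)
print(drifting[-1], max(abs(e) for e in corrected))
```

Without correction the error grows by 10 ns per macrotick. With correction it stays within half a microtick, which is why the microtick granularity limits the achievable precision.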
Critical Parameters
- message jitter
- algorithm execution time jitter
- native drift of the clock sources (microtick)
- resynchronization period
- network topology (fully connected, …)
- correction function (algorithm)
Precision of 100 ns achievable.
Consistency-Based Algorithms
- all non-faulty nodes agree on the same view of every (even faulty) node's local time and can thus calculate the global time consistently
- agreement needs several rounds of communication per value
- creates high network traffic
- requires bounded transmission delay
- does not require a fully connected network
- good performance wrt. skew
- agreed value not necessarily correct (only in case of a non-faulty sender)
An Example Algorithm (Srikanth & Toueg, 87)
Rule 1: “Relay”
    when received (tick[k]) from f+1 then send (tick[k]) to all {once}
Rule 2: “Increment”
    when received (tick[k]) from 2f+1 then send (tick[k+1]) to all {once}; local_tick := k+1
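The two rules can be exercised in software before turning to hardware. The following message-level simulation was written for this text and is not taken from the lecture. It assumes reliable delivery in random order, faulty nodes that simply stay silent (a much weaker fault than Byzantine), and that every node also receives its own broadcasts. The unbounded tick numbers and per-tick sender sets it keeps are exactly the book-keeping a hardware version has to avoid.

```python
import random
from collections import defaultdict


def simulate_srikanth_toueg(n, f, crashed, target_tick, seed=0):
    """Run the Relay and Increment rules until every correct node has
    local_tick >= target_tick; return the final local ticks."""
    rng = random.Random(seed)
    correct = [i for i in range(n) if i not in crashed]
    received = {i: defaultdict(set) for i in correct}   # k -> set of senders
    sent = {i: set() for i in correct}                  # ticks already broadcast
    local_tick = {i: 0 for i in correct}
    in_flight = []                                      # (sender, receiver, k)

    def broadcast(sender, k):
        if k in sent[sender]:                           # {once}
            return
        sent[sender].add(k)
        in_flight.extend((sender, r, k) for r in correct)

    for i in correct:                                   # everybody starts with tick[0]
        broadcast(i, 0)

    while min(local_tick.values()) < target_tick:
        # deliver a randomly chosen in-flight message (arbitrary delivery order)
        sender, r, k = in_flight.pop(rng.randrange(len(in_flight)))
        received[r][k].add(sender)
        if len(received[r][k]) >= f + 1:                # Rule 1: Relay
            broadcast(r, k)
        if len(received[r][k]) >= 2 * f + 1:            # Rule 2: Increment
            broadcast(r, k + 1)
            local_tick[r] = max(local_tick[r], k + 1)
    return local_tick


# n = 3f+1 = 4 nodes, node 3 silent
print(simulate_srikanth_toueg(n=4, f=1, crashed={3}, target_tick=5, seed=1))
```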
FT Clocking: Recap
- GALS: sacrifices the global time base; potential metastability problems
- HW clock sync (PCB level): skew can cause inconsistent voting
- SW clock sync (distributed systems): a precision better than 100 ns is not realistic
Distributed System on Chip?
classical VLSI systems:
- complex structure
- communication delay negligible
- fault tolerance rarely needed
distributed systems:
- explicit computing nodes
- pronounced communication delay
- need for fault tolerance
- wealth of existing research
system on chip:
- modular structure
- communication delay dominates
- fault tolerance definitely desired
- new problems
„Tick Synchronization“
- 1 µs is excellent precision for a distributed clock
- at 1 GHz this means a 360,000° phase shift
[Figure: phase synchronisation, tick synchronisation, clock synchronisation]
- tick synchronisation: keep the same frequency for all modules, AND deterministically accommodate significant skew
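The 360,000° figure follows from multiplying the precision Π by the clock frequency and by 360° per period:

$$
\Delta\varphi = \Pi \cdot f_{\mathrm{clk}} \cdot 360^\circ = 10^{-6}\,\mathrm{s} \cdot 10^{9}\,\mathrm{s}^{-1} \cdot 360^\circ = 10^{3} \cdot 360^\circ = 360{,}000^\circ
$$

So clocks that agree to within 1 µs may still be a thousand clock periods apart at 1 GHz.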
New Synchrony on Chip
[Figure: SoC with the modules DSP, WLAN, Camera, GPRS, GPS]
The DARTS Approach
- adopt a distributed synchronization algorithm for SoC clocking
  => inherit all fault-tolerance properties
- implement the algorithm in hardware
  => thus achieve much better precision (a few clock cycles) and clock rate
The DARTS Architecture
- modules FU_i augmented with a simple local clock unit (TG alg)
- TG algs implemented in asynchronous logic style
- TG algs communicate over a dedicated bus (TG network) to generate the local clocks
- need 3f+1 modules to tolerate f arbitrary faults
[Figure: synchronous solution (FU1, FU2, FU3 on a data bus, clocked by a clock tree) versus distributed clock (each FU with its own TG alg, the TG algs connected by the TG network)]
Conceptual Advantages (1)
- best possible synchrony
  - locally: still (phase-)synchronous => remain with the traditional synchronous paradigm
  - globally: frequency-synchronous; the global precision π is known and bounded, determined by delay_max / delay_min (relative!)
- completely avoid metastability issues
Conceptual Advantages (2)
- fault-tolerant clock generation: the algorithm generates the clock => no crystal oscillator required
- distributed algorithm => no single point of failure
- scalable fault tolerance: use n ≥ 3f+1 nodes to tolerate f Byzantine faults
Conceptual Advantages (3)
- weaker timing assumptions: TG-net instead of a costly clock tree; large skew uncritical for operation
- closed-loop timing => frequency adapts to
  - variation of operating conditions
  - type variations from fab
An Example Algorithm (recall)
The Srikanth & Toueg (87) algorithm with Rule 1 “Relay” and Rule 2 “Increment” is our choice for DARTS.
Algorithm Properties
- formal proof for precision exists
- precision depends on relative timing only
- simple to write down
- scalable fault tolerance
- can handle Byzantine failures
- formal proof for booting exists
- time-free => ideal assumption coverage
The Hardware Perspective
Rule 1: “Relay”
    when received (tick[k]) from f+1 then send (tick[k]) to all {once}
Rule 2: “Increment”
    when received (tick[k]) from 2f+1 then send (tick[k+1]) to all {once}; local_tick := k+1
Issues for a hardware implementation:
- message number k of unbounded size
- integer comparison
- atomicity of actions
Further Properties
- fully connected net
- message based
- message book-keeping
- integer comparison
- majority voting on comparison result
High-level proofs imply many properties: are they met by the implementation?
From Algorithm to HW (1)
challenge: “unbounded” size of tick numbers
options:
- serialize transmission of tick numbers: too slow for clock generation
- multiple parallel rails per clock signal: hardware/wiring effort too high
our approach:
(1) adapt the algorithm for single-rail, zero-bit messages
(2) maintain only the difference of local and remote ticks
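A minimal sketch of idea (2), under stated assumptions. A hypothetical `TickDifference` class keeps, per remote node, only the difference between the number of zero-bit ticks received and the number generated locally. The comparisons the algorithm needs (has the remote node reached our tick, or is it ahead?) can be answered from this difference alone. The `bound` parameter stands in for whatever limit the hardware imposes and is an assumption.

```python
class TickDifference:
    """Per-remote-node book-keeping for zero-bit tick messages (hypothetical).

    Instead of unbounded tick numbers, only
        diff = (ticks received from the remote node) - (ticks generated locally)
    is stored.  `bound` is an assumed limit on |diff|; exceeding it raises."""

    def __init__(self, bound):
        self.bound = bound
        self.diff = 0

    def on_remote_tick(self):
        self._set(self.diff + 1)

    def on_local_tick(self):
        self._set(self.diff - 1)

    def _set(self, value):
        if abs(value) > self.bound:
            raise OverflowError("tick difference exceeds the assumed bound")
        self.diff = value

    def geq(self):
        """Remote has sent at least as many ticks as we have generated."""
        return self.diff >= 0

    def gr(self):
        """Remote is strictly ahead of us."""
        return self.diff > 0


d = TickDifference(bound=4)
for _ in range(3):
    d.on_remote_tick()
d.on_local_tick()
print(d.diff, d.geq(), d.gr())   # -> 2 True True
```

The absolute tick counts grow without limit. As long as the nodes stay synchronized, however, their difference stays within a few ticks, so a small fixed-size structure suffices.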
From Algorithm to HW (2)
challenge: atomicity of actions is not implicitly guaranteed by asynchronous HW
options:
- strict serialization of operations: alternating processing of remote & local ticks
- interlocking of the algorithm's rules: separate processing for rising and falling edges
DARTS Project Aims
- algorithm redesign for „zero-bit“ messages: formally proved
- appropriate HW design
- ASIC implementation
- demo application
- experim. evaluation
Formal Proofs
SW behavior assumptions:
- unbounded message length
- unbounded local memory
- strong local atomicity, local lock-step
- high computing performance: the algorithm can contain complex computations
asynchronous HW behavior assumptions:
- 0-bit messages
Results: precision, accuracy, …
DARTS Project Aims (cont.)
- appropriate HW design: circuit design ready
[Figure: TG alg block diagram of node p: pipelines 1 to 3f+1, each with a remote (external) pipe, a local pipe, a Diff-Gate and a pipe compare signal generator built from Muller C-elements; the signals GEQe, GRe, GEQo, GRo feed threshold logic (≥ 2f+1, ≥ f+1 over 3f+1 inputs) that produces clk_out]
DARTS Project Aims (cont.)
- ASIC implementation: FPGA prototype running
DARTS Project Aims (cont.)
- demo application: postponed to follow-up
DARTS Project Aims (cont.)
- experim. evaluation: ongoing…
TG Alg Circuit Principle
[Figure: each clock from the other TG algs, and the local clock, drives a counter; the counters' GEQ and GR outputs feed the threshold functions ≥ 2f+1 and ≥ f+1, whose „OR“ generates the local clock]
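Abstracting away the asynchronous circuitry, the decision the TG alg takes can be sketched in a few lines. This is a hypothetical model. Each counter is represented by an integer `diff` (ticks received from a node minus ticks generated locally), with one entry per node including the node's own clock. GEQ and GR are taken as `diff >= 0` and `diff > 0`, and the ≥ 2f+1 and ≥ f+1 thresholds are mapped to the Increment and Relay rules. The slide does not spell out this mapping, and the even/odd split for rising and falling edges is omitted.

```python
def tg_alg_fires(diffs, f):
    """Return True if a new local tick should be generated.

    diffs[j] = ticks received from node j minus ticks generated locally."""
    geq = sum(1 for d in diffs if d >= 0)   # nodes that reached our current tick
    gr = sum(1 for d in diffs if d > 0)     # nodes already beyond it
    increment = geq >= 2 * f + 1            # Rule 2 ("Increment")
    relay = gr >= f + 1                     # Rule 1 ("Relay"): catch up
    return increment or relay


# n = 4, f = 1
print(tg_alg_fires([0, 0, 0, -1], f=1))    # three nodes caught up      -> True
print(tg_alg_fires([1, -1, -1, -1], f=1))  # a single node ahead is not enough -> False
```

The second case shows why the thresholds matter: one node that is ahead, possibly a faulty one, cannot force a tick on its own.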
TG Alg Block Diagram
[Figure: clock inputs (req_ext/ack_ext, req_int/ack_int) enter pipelines 1 to 3f+1 of node p; each pipeline (counter module) consists of a remote/external pipe, a local pipe, a Diff-Gate and a pipe compare signal generator; the threshold function and tick generation (threshold logic ≥ 2f+1 and ≥ f+1 on GEQe, GRe, GEQo, GRo, followed by C-elements) produces the clock output clk_out]
TG Alg Modules
[Figure: gate-level schematic of pipeline 3f+1 of 3f+1: Difference Module (EP + DG) built from Muller C-elements with Reset and the inputs Rremote,in and Rlocal,in; PCSG Module built from NAND/NOR gates producing GEQe, GRe, GEQo, GRo; Threshold Module with ≥ 2f+1 and ≥ f+1 threshold gates and the C-elements Ctop and Cbottom driving clk_out]
Elastic Pipeline (EP)
- buffers incoming clock edges
- a proof shows that 4 stages are sufficient here
- ack_out is ignored!
- the minimum distance between successive input transitions is t_pipe
[Figure: 4-stage pipeline of Muller C-elements with data_in/ack_out on the input side and data_out/ack_in on the output side; stage delay t_pipe]
Difference Gate (DG)
- asynchronous state machine (see homework)
- removes matching transitions from the tick buffers
- alternating progress on the remote and local buffers to serialize the processing of ticks
[Figure: DG built from two Muller C-elements with Reset; handshake signals Rremote,out/Aremote,out (remote ticks) and Rlocal,out/Alocal,out (local ticks)]
Muller-C-Element
- basic building block for EP and DM
- internal storage loop!
- minimum distance between successive input transitions is t_loop
[Figure: C-element symbol with inputs a, b and output y, an equivalent circuit with a feedback loop (delay t_loop), and a timing diagram with propagation delay t_prop]
Truth table:
a  b  |  y
0  0  |  0
0  1  |  y_old
1  0  |  y_old
1  1  |  1
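A behavioural model shows why the C-element is a storage element rather than a plain gate. The sketch below is an untimed Python model written for this text; t_prop and t_loop are ignored. It reproduces the truth table above and also shows that with one input stuck at 0 the output never rises again. This is the waiting-forever behaviour of non-eager DI gates.

```python
class MullerC:
    """Behavioural (untimed) model of a 2-input Muller C-element."""

    def __init__(self, y=0):
        self.y = y                      # state held by the internal storage loop

    def update(self, a, b):
        if a == b:                      # inputs agree: output follows them
            self.y = a
        return self.y                   # inputs differ: keep y_old


c = MullerC()
print([c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]])
# -> [0, 1, 1, 0]

stuck = MullerC()                       # input b stuck-at-0
print([stuck.update(a, 0) for a in [1, 0, 1, 0]])
# -> [0, 0, 0, 0]
```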
The PCSG (Pipe Compare Signal Generator)
- compares the fill levels of the remote and local pipes
- must not react to dynamic effects during subtraction (glitches!)
- purely combinational logic
Threshold Modules (THM)
- inputs: buffer fill levels (GEQe, GRe, GEQo, GRo) from the PCSGs, evaluated by ≥ 2f+1 and ≥ f+1 threshold gates
- the Tick-Gen. module generates a new tick when a rule fires (= threshold reached)
- separate rising and falling transitions for interlocking
- back-transition from state logic to event logic is problematic wrt. glitches!!
[Figure: threshold gates feeding a Muller C-element that drives clk_out, the local clock output]
Threshold Gates
- building blocks for the THM
- activate the output if at least k of their n inputs are active
- very inconvenient logic function
- several different implementation options (ROM, sum-of-products, custom cell, non-CMOS, …)
- must be free of hazards!
- sum-of-products turned out best
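The sum-of-products form of a "≥ k of n" gate is simply the OR of all AND terms over k inputs. The sketch below (helper names are hypothetical) generates these terms and checks them against the arithmetic definition. Because the function is monotone, these k-input products are exactly its prime implicants. A two-level cover that contains all prime implicants is free of static-1 hazards for single-input changes, which is a standard result for two-level logic.

```python
from itertools import combinations, product


def threshold_sop_terms(n, k):
    """Product terms of the sum-of-products form of a '>= k of n' threshold gate.

    Each term is the tuple of input indices that must all be 1."""
    return list(combinations(range(n), k))


def eval_sop(terms, inputs):
    """Evaluate the OR of all AND terms for one input vector (sequence of 0/1)."""
    return int(any(all(inputs[i] for i in term) for term in terms))


# Example: a 4-input gate with threshold 3
terms = threshold_sop_terms(4, 3)
print(terms)   # [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

# Exhaustive check against the arithmetic definition
assert all(eval_sop(terms, v) == int(sum(v) >= 3) for v in product((0, 1), repeat=4))
```

The number of terms, C(n, k), grows quickly with n, which is one reason the slide calls this a very inconvenient logic function.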
Implementation Results
- FPGA prototype: 24 MHz with 4 ns skew
- non-optimized ASIC simulation: > 200 MHz with 650 ps skew (RadHard UMC018 library)
- a custom cell for the Muller C-element, once available, will further increase performance
Fully Connected…
A fully connected system of n nodes comprises
- 2n(n-1) 4-stage EPs
- n(n-1) DGs and PCSGs
- 4n THGs with n-1 inputs
- n² interconnect lines in the TG net
A reduction to a sparsely connected system would yield substantial savings, but then it is impossible to handle Byzantine failures.
Does hardware fail „Byzantine“?
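The component counts of a fully connected system grow quadratically in n. The small helper below is hypothetical and simply encodes the formulas above. It makes the growth with the fault-tolerance degree f visible, using n = 3f+1:

```python
def darts_resources(n):
    """Component counts of a fully connected DARTS system with n nodes,
    encoding the formulas from the slide."""
    return {
        "ep_4stage": 2 * n * (n - 1),   # elastic pipelines (4 stages each)
        "dg": n * (n - 1),              # difference gates
        "pcsg": n * (n - 1),            # pipe compare signal generators
        "thg": 4 * n,                   # threshold gates ...
        "thg_inputs": n - 1,            # ... each with n-1 inputs
        "tg_net_lines": n * n,          # interconnect lines in the TG net
    }


for f in (1, 2, 3):
    n = 3 * f + 1                       # minimum number of nodes for f Byzantine faults
    print(f, n, darts_resources(n))
```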
Failure Models
Distributed systems people talk about
- omission failures
- clean/unclean crashes
- Byzantine failures, …
Hardware people care about
- stuck-at faults
- bit-flips
- metastability
- opens and shorts, …
We are currently investigating this issue.
Test Problems
- fault tolerance: implies fault masking
- self-timed behavior: inconvenient for the tester
- delay faults: different effects in a DI circuit
- sequential behavior: due to Muller C-gates
- scan chain: does not naturally exist
- sequential asyn ATPG tool: not available
Temporal Control
- dynamic states are difficult/impossible for the tester to control
- resulting necessities:
  - breaking feedback loops
  - halting the circuit in a dynamic state
- alternative: scan chain
  - lock internal state
  - controlled by tester clock
The “Freeze” Latches
[Figure: (a) state diagram of the DG FSM with states S0 (11), S1 (01), S2 (00), S3 (10) over Ar/Al and Rr/Rl; (b) latches with the enables freeze_r and freeze_l inserted in the Rr and Rl request paths of the DG]
Halting the FSM of the DG inhibits further operation of the TG Alg.
Partitioning
[Figure: TG alg module schematic (pipeline 3f+1 of 3f+1: remote and local pipe, Diff-Gate, pipe compare signal generator, threshold gates, clk_out) partitioned into blocks for testing]
Self-Checking Property
- Any stuck-at fault (SAF) in the EP will inhibit further transitions at the output.
- If we observe a correct response for the input sequence „1010“, then the EP is free of any SAF.
[Figure: elastic pipeline of Muller C-elements with data_in, ack_out, data_out, ack_in; stage delay t_pipe]
Testing PCSG via the EP
- use EP outputs as stimuli for the PCSG
- many dynamic states needed => freezing mandatory
- coverage still only 98%, due to a redundant input
Test vectors:
Stable  Freeze
1010    010101
1010    011001
1010    100101
1010    101010
1001    010101
1001    011001
1001    101010
0110    010101
0110    011001
0101    010101
0101    011010
0101    100110
xx10    010110
xx01    101001
Threshold Modules Testing
- purely combinational logic
- implementation may vary => black-box test desirable
- exhaustive test tractable (11 inputs => 2^11 = 2048 vectors in our case)
- can test all 4 THGs in parallel
- need direct access to the outputs before their combination into the Muller C-element
- can use a counter or an LFSR as test pattern generator
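A maximal-length 11-bit LFSR cycles through all 2047 non-zero patterns. For an exhaustive test, the all-zero vector has to be added separately; a plain counter produces all 2048 directly. The sketch below assumes the feedback taps 11 and 9 (x^11 + x^9 + 1), as listed in common tables of maximal-length LFSRs. The function names are hypothetical, and the hardware generator used in DARTS is not specified on the slide.

```python
def lfsr11_patterns(seed=1):
    """Yield the 2047 non-zero 11-bit states of a maximal-length Fibonacci LFSR
    with feedback taps 11 and 9 (polynomial x^11 + x^9 + 1)."""
    state = seed & 0x7FF                       # keep 11 bits
    if state == 0:
        raise ValueError("seed must be non-zero")
    for _ in range(2047):
        yield state
        bit = (state ^ (state >> 2)) & 1       # XOR of the tap positions 11 and 9
        state = (state >> 1) | (bit << 10)     # shift right, feed back into bit 10


def exhaustive_patterns(seed=1):
    """All 2**11 input vectors: the all-zero vector (which an LFSR never
    reaches) followed by the LFSR sequence."""
    yield 0
    yield from lfsr11_patterns(seed)


patterns = list(exhaustive_patterns())
print(len(patterns), len(set(patterns)))
```

Each 11-bit pattern would be applied to the inputs of all 4 THGs in parallel, and their outputs compared before they are combined in the C-element.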
A Special Scan Cell
[Figure: standard scan cell and special scan cell; signals D, CLK, SCAN_ENABLE, DATA, SCAN_DATA, SCAN_OUT, DATA_OUT]
Why Constraints?
Our event-logic-based DARTS circuit is not fully delay-insensitive:
- residual delay sensitivity in gate-internal storage loops (MCG)
- open ack path for clock ticks
- mixture of event logic (EP, DG) and state logic (PCSG, THM)
- modular fault tolerance is contradictory to the DI principle
Fault-Tolerant Asyn Logic
- Asynchronous logic is based on the handshake principle.
- Before generating its next output, a DI gate with multiple inputs waits for the last one to become valid („non-eager style“).
- Problem: in case of a single input failure (stuck-at), such a gate will wait forever.
- This voids all redundancy concepts (duplication of units, TMR, …).
- This is not an implementation issue but a fundamental dilemma, also in DARTS.
Solving the Dilemma
- We generate a new tick before all pending ticks have arrived: sync loops remain open!
- Now we must prevent old ticks we have missed from being mixed up with responses to the new tick („de-synchronization“).
- With anonymous ticks this can only be solved by timing constraints.
- We can weaken the constraints by treating rising and falling edges separately.
- For DARTS this yields the constraint that the round-trip delay of the slowest path must be no more than twice that of the fastest one.
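Once the round-trip delays of all tick paths are known, the ratio constraint can be checked mechanically. Where those delays come from (e.g. static timing analysis or simulation) is an assumption here. The helper name and the example delays are hypothetical.

```python
def meets_round_trip_ratio(round_trip_delays_ns, max_ratio=2.0):
    """True if the slowest round trip takes at most max_ratio times as long
    as the fastest one (the DARTS constraint uses max_ratio = 2)."""
    if not round_trip_delays_ns or min(round_trip_delays_ns) <= 0:
        raise ValueError("need at least one positive delay")
    return max(round_trip_delays_ns) <= max_ratio * min(round_trip_delays_ns)


print(meets_round_trip_ratio([3.0, 4.5, 5.9]))   # -> True  (5.9 <= 6.0)
print(meets_round_trip_ratio([3.0, 4.5, 6.5]))   # -> False (6.5 > 6.0)
```

Because only the ratio matters, uniformly scaling all delays (e.g. by a change in supply voltage or temperature) does not affect whether the constraint holds.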
Optimal Constraints
- DARTS has been proven on the algorithm level.
- Does the HW implementation meet all proof assumptions without restrictions? Definitely not: we have already identified constraints.
- How can we derive the minimum set of constraints for a given implementation?
- This is a non-trivial „constraint satisfaction“ problem!
- We are currently planning to pursue this issue further.
Summary
- FT clocking is becoming an issue, but current solutions are insufficient.
- DARTS adopts a distributed algorithm for SoC clocking.
- The TG alg implementation points out substantial differences between HW and SW.
- It also exemplifies practical problems and benefits of asynchronous circuits.
- Timing constraints are unavoidable for several reasons, but difficult to derive, let alone minimize.
- There is still much work ahead…