AY-Jan.2001 1
Communicating in Systems with Heterogeneous Timing
Alex Yakovlev,
Asynchronous Systems Laboratory
University of Newcastle upon Tyne
Edinburgh, 11 Jan. 2001
AY-Jan.2001 2
Objectives
• To study a range of asynchronous communication mechanisms (ACMs) that can be used in constructing (distributed and concurrent) systems with heterogeneous timing
• To develop hardware implementations for ACMs, using self-timed circuits for potential use in Systems-On-a-Chip (SOCs) and embedded (miniature, low power and EMC) applications
• Work is done within a collaborative EPSRC research project COMFORT with King’s College London.
AY-Jan.2001 3
Heterogeneously Timed Nets (hets)
[Figure: network of nodes A1 to A4 and C1 to C3]
AY-Jan.2001 4
Hets
[Figure: network of nodes A1 to A4 and C1 to C3]
Time/event/data-driven data processing elements (active)
AY-Jan.2001 5
Hets
[Figure: network of nodes A1 to A4 and C1 to C3]
Data communication elements (passive): ACMs
AY-Jan.2001 6
Previous work
• Real-time networks and the MASCOT approach
  – from RSRE/Phillips (67), BAe/Simpson (86)
  – for software systems
  – high time heterogeneity but relatively low speed
• Globally-Asynchronous-Locally-Synchronous (GALS)
  – Chapiro (84), Muttersbach (00), Ginosar (00)
  – for VLSI circuits
  – high speed but very limited time heterogeneity (mesochronous or source-synchronous)
AY-Jan.2001 7
Interaction between system parts
[Figure: parts A and B connected through a communication mechanism (e.g. shared memory)]
AY-Jan.2001 8
Terminology on timing
• Temporal relationship between parts A and B in a system can be:
  – (Globally, locally for A/B) clocked = synchronous on a (global, local for A/B) clock
  – Self-timed = synchronous on handshakes and/or by some time constraints, e.g. I/O and fundamental modes
  – (Mutually) asynchronous = NOT synchronous (on a global clock or on handshakes); hence asynchronous is neither self-timed nor globally clocked
AY-Jan.2001 9
Globally clocked
[Figure: A and B connected through a communication mechanism (e.g. shared memory), both driven by a global clock]
AY-Jan.2001 10
Self-timed (via handshake)
[Figure: A and B connected through a communication mechanism (e.g. shared memory) via Req/Ack handshake(s), possibly with a bounded buffer in between]
AY-Jan.2001 11
Fully Asynchronous
[Figure: A and B connected through a communication mechanism (e.g. shared memory) that acts as a temporal firewall between the timing for A and the timing for B]
AY-Jan.2001 12
Evolution of timing (1)
• Globally clocked systems:
  Good: deterministic and predictable for real-time, safety-critical systems
  Bad: prone to clock skew; bad for power consumption and EMC (indiscriminate data-crunching)
AY-Jan.2001 13
Evolution of timing (2)
• Self-timed systems (with micropipelines and handshakes):
  Good: no skew problems; good for power and EMC if data-driven
  Bad: temporal non-determinism, lockable handshakes, hence bad for real-time
AY-Jan.2001 14
Evolution of timing (3)
• Fully or partially asynchronous systems:
  Good: distributed and heterogeneous clocking; real-time applied locally and fully predictable; self-timing can be applied where possible for power saving and EMC
  Bad: potential loss of information where full asynchrony (e.g. due to real-time) is applied
AY-Jan.2001 15
Asynchronous Communication mechanisms (ACMs)
[Figure: Writer -> ACM -> Reader]
The level of asynchrony is defined by the WRITE and READ rules
AY-Jan.2001 16
Classification of ACMs
Hugo Simpson’s classification:

                                               Destructive read          Non-destructive read
                                               (read can be held up)     (read cannot be held up)
Destructive write (write cannot be held up)    Signal (event data)       Pool (reference data)
Non-destructive write (write can be held up)   Channel (message data)    Constant (configuration data)
AY-Jan.2001 17
Difficulty with Simpson’s classification
• Destructive/Non-destructive does not intuitively imply a temporal, Wait/No-wait division, but what is meant is that:
  – Destructive (non-destructive) write cannot (can) wait
  – Destructive (non-destructive) read can (cannot) wait
• There is symmetry (duality) between Pool and Channel but no symmetry between Signal and Constant, because Constant allows a ‘constructive’ write only once, yet ‘constructive’ writes are also allowed by Signal
AY-Jan.2001 18
Petri net capture of Simpson’s protocols
[Figure: Petri nets for Signal, Pool, Channel and Constant over the places empty and full, with transitions destr write, non-destr write, destr read and non-destr read; annotated "Constructive writes"]
AY-Jan.2001 19
Another interpretation
[Figure: Petri nets for Signal, Pool, Channel and Command over the places read and unread, with transitions write, over-write, read and re-read]
Constant is a special case of Command
AY-Jan.2001 20
Another interpretation
[Figure: Petri nets for Signal, Pool, Channel and Command, annotated "Busy Writer"]
AY-Jan.2001 21
Another interpretation
[Figure: Petri nets for Signal, Pool, Channel and Command, annotated "Lazy Writer"]
AY-Jan.2001 22
Another interpretation
[Figure: Petri nets for Signal, Pool, Channel and Command, annotated "Busy Reader"]
AY-Jan.2001 23
Another interpretation
[Figure: Petri nets for Signal, Pool, Channel and Command, annotated "Lazy Reader"]
AY-Jan.2001 24
Another classification of ACMs

                                             Lazy read = read only      Busy read = may re-read
                                             previously unread data     data already read
                                             (read can be held up)      (read cannot be held up)
Busy write = may over-write unread data      BW-LR (Signal)             BW-BR (Pool)
(write cannot be held up)                    (event data)               (reference data)
Lazy write = write only if the previous      LW-LR (Channel)            LW-BR (Command)
item has been read (write can be held up)    (message data)             (configuration data)
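The four combinations can be sketched as one single-slot model with two flags. This is only an illustration of the read/write rules, assuming atomic accesses and a capacity of 1; the class name `SingleSlotACM`, the `WouldBlock` exception and the `held_up` helper are hypothetical names, not part of the original material.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class WouldBlock(Exception):
    """Raised where a lazy access would be held up (illustrative only)."""


@dataclass
class SingleSlotACM:
    busy_write: bool            # True: may over-write unread data
    busy_read: bool             # True: may re-read data already read
    item: Optional[Any] = None
    unread: bool = False        # data state: unread (True) or read (False)

    def write(self, value: Any) -> None:
        if self.unread and not self.busy_write:
            raise WouldBlock("lazy write: previous item not yet read")
        self.item, self.unread = value, True

    def read(self) -> Any:
        if not self.unread and not self.busy_read:
            raise WouldBlock("lazy read: no unread item")
        self.unread = False
        return self.item


def signal() -> SingleSlotACM:   # BW-LR
    return SingleSlotACM(busy_write=True, busy_read=False)


def pool() -> SingleSlotACM:     # BW-BR
    return SingleSlotACM(busy_write=True, busy_read=True)


def channel() -> SingleSlotACM:  # LW-LR
    return SingleSlotACM(busy_write=False, busy_read=False)


def command() -> SingleSlotACM:  # LW-BR
    return SingleSlotACM(busy_write=False, busy_read=True)


def held_up(op: Callable[..., Any], *args: Any) -> bool:
    """Return True if the access would be held up, False if it went through."""
    try:
        op(*args)
        return False
    except WouldBlock:
        return True
```

In this model a Signal lets the writer over-write but makes a second read wait, while a Command accepts a new write only after the previous item has been read at least once.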
AY-Jan.2001 25
Signal vs Pool
[Figure: a Pool connects two real-time (busy) domains, Real time 1 and Real time 2; a Signal connects a real-time (busy) domain to a data-driven (lazy) domain, marked "Low Power!"]
AY-Jan.2001 26
Problems with the above Petri net definitions
• These Petri nets assumed:
  – Data capacity (max value of the data state of the ACM) equals 1 (this can be easily generalised to any finite n>0 for Channel, defined as an n-place buffer with a wide range of known hardware implementations); do we semantically need other ACMs with n>1?
  – Write and Read access are held up only by the data state of the ACM and not by the Read and Write operations themselves; those are treated as atomic and taking no time, whereas in reality they are not atomic and should be assumed to take arbitrary time
AY-Jan.2001 27
Breaking the atomicity
[Figure: Petri net for Signal with atomic access (transitions write, over-write, read; places read, unread) next to the Petri net for Signal with non-atomic access (additional places reading, in-writing, not-in-writing)]
AY-Jan.2001 28
Breaking the atomicity
[Figure: Petri net for Signal with atomic access next to the Petri net for Signal with non-atomic access (places in-reading, not-in-reading, in-writing, not-in-writing)]
Read may be held up by a write being in progress … but not write by reading!
AY-Jan.2001 29
But …
[Figure: Petri net for Signal with non-atomic access]
What if Reading begins just before Writing?
Problem with data integrity if only one data slot (one data token) is available
AY-Jan.2001 30
Required Properties of Signal(1)
1. Data states and their updating:
  – Signal’s capacity is 1 (at any time, it has either 0 or 1 unread data items)
  – At the end of write access, Signal’s state is set to unread (1)
  – At the end of read access, Signal’s state is set to read (0)
AY-Jan.2001 31
Required Properties of Signal(2)
2. Conditional asynchrony for the reader:
  – Read access may start only when Signal’s data state is unread (1) and no write access is in progress
  – Read access can be arbitrarily long
3. Unconditional asynchrony for the writer:
  – Write must be allowed to start and complete access at any time, regardless of Signal’s data state and the status of read access.
AY-Jan.2001 32
Required Properties of Signal(3)
4. Data coherence:
  – Any item of data that is read from Signal must not have been changed since it was written (i.e. no writing or reading in part)
5. Data freshness:
  – Any read access must obtain the data item designated as the current unread item in Signal, i.e. the data item made available by the latest completed write access
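As an illustration of property 4, the following sketch checks a recorded trace of slot accesses for coherence violations. The trace format (pairs of an event kind and a slot number) and the function name `coherent` are assumptions made for this example; the check itself is simply that a read and a write never overlap in time on the same slot.

```python
from typing import Iterable, Tuple

Event = Tuple[str, int]  # (kind, slot); kinds: wr_begin, wr_end, rd_begin, rd_end


def coherent(trace: Iterable[Event]) -> bool:
    """Return True if no read overlaps a write on the same slot."""
    writing = set()   # slots with a write access in progress
    reading = set()   # slots with a read access in progress
    for kind, slot in trace:
        if kind == "wr_begin":
            if slot in reading:
                return False   # writing into a slot that is being read
            writing.add(slot)
        elif kind == "wr_end":
            writing.discard(slot)
        elif kind == "rd_begin":
            if slot in writing:
                return False   # reading a slot that is being written
            reading.add(slot)
        elif kind == "rd_end":
            reading.discard(slot)
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return True
```

A trace in which the reader works on slot 0 while the writer fills slot 1 passes the check; the same overlap on a single slot does not, which is the integrity problem of the one-slot case.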
AY-Jan.2001 33
Data slots and Signal
• “Data slot” is a unique portion of the shared memory which may contain one item of data of arbitrary (but bounded) size
• Signal cannot be implemented using One Slot only and satisfy all of the above properties
• Let us construct a Signal with TWO data slots
• First a formal specification, a State Graph (or Transition System), must be built
AY-Jan.2001 34
Formal spec of Signal
[Figure: automaton for Signal with the actions Write slot 0 (wr0), Write slot 1 (wr1), Read slot 0 (rd0) and Read slot 1 (rd1)]
Problem: construct a maximally permissive automaton, on the alphabet {wr0, wr1, rd0, rd1}, satisfying the required properties of the Signal ACM
AY-Jan.2001 35
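State Graph constraints
1. Data states, their updates and asynchrony:
[Figure: two-step event sequences from a state s: wri then rdj, wri then wrj, rdi then rdj, rdi then wrj]
2. Data coherence:
[Figure: wri and rdj both enabled in a state s only if i<>j]
A wr action is enabled in every state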
AY-Jan.2001 36
State Graph constraints
3. Data freshness (slot swapping):
4. No “re-try loops” (persistency in reading):
[Figure: state-graph fragments from a state s over the events wri, rdi, wrj, rdj (i<>j), with an "if ... then ..." condition and the note "there is no rdi on this path"]
AY-Jan.2001 37
State Graph for 2-slot Signal
[Figure: state graph with states s0 to s5 and transitions labelled wr0, wr1, rd0, rd1, with a marked initial state]
AY-Jan.2001 38
How to implement 2-slot Signal?
[Figure: state graph with states s0 to s5 and transitions labelled wr0, wr1, rd0, rd1, with a marked initial state]
• In order to implement Signal we must distribute states and events between the elements of the implementation architecture.
• For that we must first separate states using a behavioural model of the implementation
AY-Jan.2001 39
Implementation architecture
The following structure must be kept in mind:
[Figure: Writer and Reader each have data access to the data slots and control access to the Signal control block; the Writer uses the Wreq/Wack handshake and the Reader the Rreq/Rack handshake; Signal control steers the data slots with wr0, wr1, rd0, rd1]
In a hardware implementation of Signal control, latches and logic will be used to generate signals corresponding to the steering events wri and rdi, events on the handshakes with the writer and reader, and some internal events
AY-Jan.2001 40
Behavioural model for Signal
• Petri nets can be used as a behavioural model (algorithm) for Signal:
  – A 1-safe Petri net can be synthesised from a finite Transition System using the theory of regions (Ehrenfeucht, Rozenberg et al.)
  – A 1-safe Petri net can be implemented in a self-timed circuit using either direct translation techniques or logic synthesis from Signal Transition Graphs (Yakovlev, Koelmans 98)
AY-Jan.2001 41
State Graph refinement
[Figure: state graphs over the events wr0, wr1, rd0, rd1, including the graph with states s0 to s5 and a marked initial state]
This Transition System cannot be synthesised into a 1-safe Petri net with unique event labelling; it requires refinement (it violates some separation conditions). There is also arbitration (a conflict relation) between rdi and wrj events, whereas in a physical implementation one cannot disable output actions
AY-Jan.2001 42
State Graph refinement
[Figure: refined state graph over the events wr0, wr1, rd0, rd1]
Now arbitration is between internal events, while wri and rdj are persistent
AY-Jan.2001 43
Distributing states b/w Write and Read parts
[Figure: refined state graph partitioned into Write superstates, giving Write elementary states 1 to 6]
Write part:
[Figure: Petri net fragment with transitions wr0, wr1 and places 1 to 6]
AY-Jan.2001 44
Distributing states b/w Write and Read parts
[Figure: refined state graph partitioned into Read superstates, giving Read elementary states 7 to 12]
Read part:
[Figure: Petri net fragment with transitions rd0, rd1 and places 7 to 12]
AY-Jan.2001 45
Completing the Petri net model
[Figure: complete Petri net with transitions wr0, wr1, rd0, rd1 and places 1 to 12]
AY-Jan.2001 46
Introducing binary control variables
[Figure: the Petri net with places 1 to 12 and transitions wr0, wr1, rd0, rd1, extended with places w=0, w=1 (transitions w+, w-) and r=0, r=1 (transitions r+, r-)]
‘w’ encodes the slot being accessed for writing
‘r’ encodes the slot being accessed for reading
AY-Jan.2001 47
Towards circuit implementation
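[Figure: block diagram: Data-in and Data-out connect to Slot 0 and Slot 1; the Write part (Wreq/Wack) issues wr0 and wr1, the Read part (Rreq/Rack) issues rd0 and rd1; the control variables w and r are linked to the two parts by set/reset and test connections]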
AY-Jan.2001 48
Direct translation of PNs to circuits
[Figure: Petri net places p1 and p2 and the corresponding circuit cells, annotated with signal values (1) and (0); the cell output goes to the controlled operation]
AY-Jan.2001 49
Direct translation of PNs to circuits
[Figure: the same places p1 and p2 and their circuit cells, with signal changes 0->1 and 1->0 as the token starts to move; output to the operation]
AY-Jan.2001 50
Direct translation of PNs to circuits
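[Figure: the same places p1 and p2 and their circuit cells, with signal changes 1->0, 0->1 and 1->0->1 as the token move completes; output to the operation]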
AY-Jan.2001 51
Direct translation of PNs to circuits
• This method associates places with latches (flip-flops), so the state memory (marking) of the PN is directly mimicked in the circuit’s state memory
• Transitions are associated with controlled actions (e.g. activations of data path units or lower level control blocks, by using handshake protocols)
• Modelling discrepancy (be careful!):
  – in Petri nets, removal of a token from pre-places and adding tokens in post-places is instantaneous (i.e. no intermediate states)
  – in circuits, the “move of a token” has a duration and there is an intermediate state
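The discrepancy can be made concrete with a small token-game sketch. The function names are hypothetical, and the two-step ordering in `fire_in_steps` (mark the post-places first, then clear the pre-places) is one plausible ordering assumed for illustration, not a statement about the actual cell design.

```python
from typing import Dict, Iterable, Iterator

Marking = Dict[str, int]


def enabled(marking: Marking, pre: Iterable[str]) -> bool:
    """A transition is enabled when every pre-place holds a token."""
    return all(marking.get(p, 0) >= 1 for p in pre)


def fire(marking: Marking, pre: Iterable[str], post: Iterable[str]) -> Marking:
    """Petri net semantics: the token move is one instantaneous step."""
    pre, post = list(pre), list(post)
    if not enabled(marking, pre):
        raise ValueError("transition not enabled")
    new = dict(marking)
    for p in pre:
        new[p] -= 1
    for p in post:
        new[p] = new.get(p, 0) + 1
    return new


def fire_in_steps(marking: Marking, pre: Iterable[str],
                  post: Iterable[str]) -> Iterator[Marking]:
    """Circuit-like view of a 1-safe net: the move passes through an
    intermediate state (assumed order: set post-place latches, then
    reset pre-place latches)."""
    pre, post = list(pre), list(post)
    if not enabled(marking, pre):
        raise ValueError("transition not enabled")
    mid = dict(marking)
    for p in post:
        mid[p] = 1
    yield mid                      # intermediate state: both sides marked
    final = dict(mid)
    for p in pre:
        final[p] = 0
    yield final
```

For a token moving from p1 to p2, `fire` goes straight from (1, 0) to (0, 1), while `fire_in_steps` exposes the extra state (1, 1) that a circuit must be designed to tolerate.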
AY-Jan.2001 52
Translation in brief
This method has been used for designing the control of a token ring adaptor
[Yakovlev, Varshavsky, Marakhovsky, Semenov, IEEE Conf. on Asynchronous Design Methodologies, London, 1995]
[Figure, panels (a), (b), (c): Petri net fragment with operations Op1 to Op3, places Q1 to Q7 and a dummy transition; its translation into a circuit with signals a1 to a3, b1 to b3, C1 and C2, annotated with signal values; and the cell implementations]
AY-Jan.2001 53
Refining the Write part
[Figure: Write part of the Petri net (transitions wr0, wr1, w+, w-; places w=0, w=1, r=0, r=1) and its refinement, in which places 21, 23, 41 and 43 are added between places 1 to 4]
AY-Jan.2001 54
Control circuit for Write part
[Figure: control circuit for the Write part, built from cells labelled 2dc, sdc, odc0, odc1 and sync blocks on r and rbar (outputs r_0, r_1, rbar_0, rbar_1), with signals wr, write_start, write_ack, wr0, wr1, ck0, ck1, slot0, slot1, setw and clrw; shown next to the refined Write-part Petri net]
AY-Jan.2001 55
Implementing David cells (1)
[Figure: David cell circuits with signals inr, ina, outr, outa, x and xb, with their signal transition graphs: a speed-independent version (with a "mild" relative timing annotation) and an “aggressive” relative timing version]
AY-Jan.2001 56
Implementing David cells (2)
[Figure: circuit with cells 2dc, odc0, odc1 and signals wr, wr0, wr1, slot0, slot1, annotated with initial signal values]
This is a peep-hole optimised solution for two David cells (places 1 and 3) and the interface to the handshake with the Writer
AY-Jan.2001 57
Implementing ‘sync’ blocks
[Figure: ‘sync’ block circuit with inputs r and ck1 and outputs r_0 and r_1, annotated with initial signal values]
AY-Jan.2001 58
Simulation using Cadence toolkit
[Figure: simulation waveforms showing metastability inside the mutex, the Write response time, and the input and output of sync]
AY-Jan.2001 59
Cycle times (ns) for 0.6 micron

type                   Write (without set-reset of w)   Write (with set-reset of w)   Read (no waiting for Write)
Speed-independent      9.0                              10.4                          9.0
With Relative Timing   4.8                              6.3                           6.6
AY-Jan.2001 60
Improving performance
[Figure: the 2-slot Signal state graphs (states s0 to s5; transitions wr0, wr1, rd0, rd1)]
In the case of repetitive writing (of, e.g., slot 1), read access may have to wait for the completion of a write just because of a timing clash on the same slot, and not because of the absence of new data in the ACM (the original aim of Signal)
This problem cannot be resolved within the TWO-slot ACM because of coherence violation. Can we do it with an extra slot?
AY-Jan.2001 61
Towards 3-slot Signal
[Figure: state graph for the 3-slot Signal with states 1 to 24 and transitions wr1, wr2, wr3, rd1, rd2, rd3; the labels 12, 13, 21, 23, 31, 32 (plain and primed) and "or" also appear]
Idea:
After writing a slot (e.g. 2) for the first time, the writer alternates between 3 and 2
The reader can then, after finishing reading slot 1, read slot 2 or 3, whichever is free
AY-Jan.2001 64
3-slot Signal refined
Control variables: r (read), w (write), l (last)
[Figure: refined states such as 21(32) and 32, annotated with the variable updates w(2->1), l(3->2) and r(3->2)]
Algorithm:
Write part:
  write slot w; l := w; w := differ(l, r)
Read part:
  if (r <> l) r := l else wait; read slot r
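The algorithm can be exercised with a sequential sketch. It treats every statement as atomic, so it says nothing about metastability or interleavings inside an access. The class name, the `try_read` return convention, the initial values of w, l and r, and the set-based `differ` (any slot other than both arguments) are all assumptions made for this illustration.

```python
from typing import Any, Optional, Tuple


def differ(a: int, b: int) -> int:
    """Return a slot in {1, 2, 3} different from both a and b (assumed rule)."""
    return next(s for s in (1, 2, 3) if s not in (a, b))


class ThreeSlotSignal:
    def __init__(self) -> None:
        self.slots = {1: None, 2: None, 3: None}
        # Assumed initial state: nothing unread (r == l), writer starts on slot 2.
        self.w, self.l, self.r = 2, 1, 1

    def write(self, value: Any) -> None:
        self.slots[self.w] = value           # write slot w
        self.l = self.w                      # l := w
        self.w = differ(self.l, self.r)      # w := differ(l, r)

    def try_read(self) -> Tuple[bool, Optional[Any]]:
        if self.r == self.l:                 # no unread item: reader would wait
            return False, None
        self.r = self.l                      # r := l
        return True, self.slots[self.r]      # read slot r
```

Two writes followed by one read return only the second value, and a further read reports that it would wait, matching the busy-write, lazy-read behaviour of Signal.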
AY-Jan.2001 65
3-slot Pool
[Figure: state graph for the 3-slot Pool with states 1 to 24 and transitions wr1, wr2, wr3, rd1, rd2, rd3, including additional rd1, rd2 and rd3 transitions]
In Pool we must have: read asynchrony
Algorithm:
Write part:
  write slot w; l := w; w := differ(l, r)
Read part:
  r := l; read slot r
r: read, w: write, l: last
AY-Jan.2001 66
Three-slot algorithm (due to Hugo Simpson)
Writer:
  wr: d[n] := input
  w0: l := n
  w1: n := differ(l, r)
Reader:
  r0: r := l
  rd: output := d[r]
n (next), l (last), r (read): 3-valued variables
AY-Jan.2001 67
Three-slot algorithm
differ:
        1  2  3
   1    2  3  2
   2    3  3  1
   3    2  1  1
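Reading the differ table as a 3 x 3 lookup indexed by the two arguments, the three-slot statements wr, w0, w1, r0 and rd can be sketched sequentially as follows. The row/column reading of the table, the class name and the initial values of l, r and n are assumptions; each statement is treated as atomic, so the sketch does not model concurrent access.

```python
from typing import Any

# differ(l, r) as a lookup table; every entry differs from both of its arguments.
DIFFER = {
    (1, 1): 2, (1, 2): 3, (1, 3): 2,
    (2, 1): 3, (2, 2): 3, (2, 3): 1,
    (3, 1): 2, (3, 2): 1, (3, 3): 1,
}


class ThreeSlotPool:
    def __init__(self, initial: Any = None) -> None:
        self.d = {1: initial, 2: None, 3: None}
        self.l = self.r = 1                   # assumed initial values
        self.n = DIFFER[(self.l, self.r)]

    def write(self, value: Any) -> None:
        self.d[self.n] = value                # wr: d[n] := input
        self.l = self.n                       # w0: l := n
        self.n = DIFFER[(self.l, self.r)]     # w1: n := differ(l, r)

    def read(self) -> Any:
        self.r = self.l                       # r0: r := l
        return self.d[self.r]                 # rd: output := d[r]
```

Because n is always chosen away from both l and r, the writer never targets the slot the reader has claimed, and the reader can always re-read the last completed item.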
AY-Jan.2001 68
Three-slot Pool
[Figure sequence, slides 68 to 78: Writer and Reader share slots s1, s2 and s3, addressed through the variables next, last and read; over the frames the date-stamped items 23.12, 27.12, 30.12, 02.01, 03.01 and 05.01 are written into and read from the slots]
AY-Jan.2001 79
3-slot ACM design
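[Figure: block diagram: write control and read control arbitrated by a mutex (requests Rw0, Rr0; grants Gw0, Gr0); a differ block producing n from l and r; registers l and r; handshakes w0-req/ack, w1-req/ack and r0-req/ack]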
AY-Jan.2001 80
3-slot ACM design
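[Figure: block diagram: write control and read control arbitrated by a mutex (requests Rw0, Rr0; grants Gw0, Gr0); a differ block producing n from l and r; registers l and r; handshakes w0-req/ack, w1-req/ack and r0-req/ack]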
AY-Jan.2001 81
Differ and register logic
[Figure: differ and register logic with inputs l1, l2, l3, r1, r2, r3 and w1-req, and outputs n1, n2, n3 and w1-ack]
AY-Jan.2001 83
Write control circuit: STG
INPUTS: wr, Gw0, w0_ack, w1_ack
OUTPUTS: w0, wa, w0_req, w1_req
[Figure: signal transition graph over the rising and falling transitions of wr, Gw0, w0, wa, w0_req, w0_ack, w1_req and w1_ack]
AY-Jan.2001 84
Write control ckt: from Petrify
[Figure: gate-level circuit with signals wa, wr, Gw0, Rw0, w0_req, w0_ack, w1_req, w1_ack and internal signals csc1, csc1b, csc2, csc2b]
The writer control circuits of the three-slot ACM
AY-Jan.2001 85
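Four-slot Pool
[Figure: Writer and Reader share four slots d[0,0], d[0,1], d[1,0] and d[1,1] holding date-stamped items (23.12, 28.12, 24.12, 30.12, with 02.01 being written), the slot bits s[0], s[1] and v[0], v[1], and the variables next, last and read]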
AY-Jan.2001 86
Four-slot Pool algorithm (H.Simpson)
Writer:
  wr: d[n, ¬s[n]] := input
  w0: s[n] := ¬s[n]
  w1: l := n || n := ¬r
Reader:
  r0: r := l
  r1: v := s
  rd: output := d[r, v[r]]
n (next), l (last), r (read): binary variables
AY-Jan.2001 87
3-slot vs 4-slot performance
Time for control statements

statements   3-slot min time (ns)   4-slot min time (ns)
w0+w1        4.19                   9.39
r0+(r1)      1.38                   3.47
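The relative cost of the 4-slot control statements follows from the table by simple division. The dictionary layout and the function name `slowdown` are just for this illustration; the numbers are the ones listed above.

```python
# Minimum control-statement times in ns, copied from the table.
control_time_ns = {
    "w0+w1": {"3-slot": 4.19, "4-slot": 9.39},
    "r0+(r1)": {"3-slot": 1.38, "4-slot": 3.47},
}


def slowdown(statement: str) -> float:
    """Ratio of 4-slot to 3-slot control time for one statement group."""
    row = control_time_ns[statement]
    return row["4-slot"] / row["3-slot"]


for name in control_time_ns:
    print(f"{name}: 4-slot takes {slowdown(name):.2f}x the 3-slot time")
```

On these figures the 4-slot control statements take a little over twice as long as the 3-slot ones on both the write and the read side.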
AY-Jan.2001 88
Are we in the end fully asynchronous?
• Circuit implementations involve the use of latches, which may go metastable.
• Metastability always implies a trade-off, in terms of noise, between data-domain and time-domain error.
• In a “truly busy (real-time)” environment, where the ack signal is not used, the corresponding process (e.g. the writer) must allow for a small interval (3-4 ns for 0.6 µm CMOS), sufficient for metastability to be resolved with a probability that is, for practical purposes, 1.
• Our h/w solutions for “busy” domains aim at maximising the “wait-free” aspect of communication but theoretically cannot fully eliminate the mutual dependency between processes (hidden within the ACM control variable circuits).
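The role of the resolution interval can be related to the usual first-order synchroniser model. This is standard background rather than something stated on the slides, and the symbols are assumptions introduced here: \tau is the resolution time constant of the latch, T_w its metastability window, and f_c, f_d the rates of the sampling and data events.

```latex
P(\text{still metastable after } t) \approx e^{-t/\tau},
\qquad
\mathrm{MTBF}(t) = \frac{e^{t/\tau}}{T_w \, f_c \, f_d}
```

Under this model each additional \tau of waiting reduces the failure probability by a factor of e, so a waiting interval of a few nanoseconds can make failure extremely unlikely, but never strictly impossible.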
AY-Jan.2001 89
Concluding remarks
• Constructing ACMs to interface sub-systems with different time and energy requirements, and implementing them in high-speed hardware, proves feasible.
• Application of hets in control or image processing (e.g. via neural networks) is needed to fully assess their potential for future application-specific SOCs
• More work on mathematical modelling of hets and on developing an extensive parametrised library of ACM circuits is needed.
AY-Jan.2001 90
VLSI design layout (chip fab’ed in June 2000 via EUROPRACTICE)
4-slot Pool ACM
AY-Jan.2001 91
4-slot ACM part
Tested (physically) correct (details on testing in the 9th Async UK Forum paper)
AY-Jan.2001 92
Acknowledgements and References
• Members of the COMFORT team:
  At KCL – Tony Davies, Ian Clark, David Fraser, Sergio Velastin
  At NCL – Fei Xia, David Kinniment, Albert Koelmans, Delong Shang, Alex Bystrov
• BAe colleagues: Hugo Simpson and Eric Campbell
• Project COMFORT web site: http://www.eee.kcl.ac.uk/~comfort
• Work supported by EPSRC, EU (ACiD-WG) and reported and published at Async2000, AINT’2000, Async2001 etc.