AY-Jan.2001 1

Communicating in Systems with Heterogeneous Timing

Alex Yakovlev,

Asynchronous Systems Laboratory

University of Newcastle upon Tyne

Edinburgh, 11 Jan. 2001

Objectives

• To study a range of asynchronous communication mechanisms (ACMs) that can be used in constructing (distributed and concurrent) systems with heterogeneous timing

• To develop hardware implementations for ACMs, using self-timed circuits for potential use in Systems-On-a-Chip (SOCs) and embedded (miniature, low power and EMC) applications

• Work is done within a collaborative EPSRC research project COMFORT with King’s College London.

Heterogeneously Timed Nets (hets)

Data processing elements (active): time-, event- or data-driven

Data communication elements (passive): ACMs

Previous work

• Real-time networks and MASCOT approach – from RSRE/Phillips(67), BAe/Simpson(86)
  – for software systems
  – high time heterogeneity but relatively low speed

• Globally-Asynchronous-Locally-Synchronous (GALS) – Chapiro(84), Muttersbach(00), Ginosar(00)
  – for VLSI circuits
  – high speed but very limited time heterogeneity (mesochronous or source synchronous)

Interaction between system parts

[Figure: parts A and B interacting through a communication mechanism (e.g. shared memory)]

Terminology on timing

• Temporal relationship between parts A and B in a system can be:
  – (Globally, locally for A/B) clocked = synchronous on (global, local for A/B) clock
  – Self-timed = synchronous on handshakes and/or by some time constraints, e.g. I/O and fundamental modes
  – (Mutually) asynchronous = NOT synchronous (on global clock or on handshakes); hence asynchronous is neither self-timed nor globally clocked

Globally clocked

Global clock

Self-timed (via handshake)

Req/Ack handshake(s), possibly with bounded buffer in between

Fully Asynchronous

[Figure: A and B, each with its own timing (Timing for A, Timing for B), separated by a temporal firewall]

Evolution of timing (1)

• Globally clocked systems:
  Good: deterministic and predictable for real-time, safety-critical systems
  Bad: prone to clock skew, bad for power consumption and EMC: indiscriminate data-crunching

• Self-timed systems (with micropipelines and handshakes):
  Good: no skew problems, good for power and EMC if data-driven
  Bad: temporal non-determinism, lockable handshakes, hence bad for real-time

• Fully or partially Asynchronous systems:
  Good: distributed and heterogeneous clocking; real-time applied locally – fully predictable; self-timing can be applied where possible for power saving and EMC
  Bad: potential loss of information where full asynchrony (e.g. due to real-time) is applied

Asynchronous Communication mechanisms (ACMs)

[Figure: Writer and Reader connected through an ACM]

Level of asynchrony is defined by WRITE and READ rules

Classification of ACMs

Hugo Simpson’s classification:

                              Destructive read          Non-destructive read
                              (read can be held up)     (read cannot be held up)

Destructive write             Signal                    Pool
(write cannot be held up)     (event data)              (reference data)

Non-destructive write         Channel                   Constant
(write can be held up)        (message data)            (configuration data)

Difficulty with Simpson’s classification

• Destructive/Non-destructive does not intuitively imply a temporal Wait/No-wait division, but what is meant is that:
  – Destructive (non-destructive) write cannot (can) wait
  – Destructive (non-destructive) read can (cannot) wait

• There is symmetry (duality) between Pool and Channel but no symmetry between Signal and Constant, because Constant allows ‘constructive’ write only once - yet ‘constructive’ writes are also allowed by Signal

Petri net capture of Simpson’s protocols

[Figure: Petri nets of the protocols (labels recovered: Signal, Channel, Constant), with transitions destr write, non-destr write, destr read and non-destr read, a place labelled empty, and the annotation "Constructive writes"]

Another interpretation

[Figure: Petri nets of the ACMs (labels recovered: Signal, Command, Channel), with transitions write, read, over-write and re-read, and a place labelled unread]

Constant is a special case of Command

Busy Writer

[Figure: Petri nets of the ACMs (labels recovered: Command, Channel), with transitions write, read, over-write and re-read, and a place labelled unread]

Lazy Writer

[Figure: Petri nets of the ACMs (labels recovered: Command, Channel), with transitions write, read, over-write and re-read, and a place labelled unread]

Busy Reader

[Figure: Petri nets of the ACMs (labels recovered: Command, Channel), with transitions write, read, over-write and re-read, and a place labelled unread]

Lazy Reader

[Figure: Petri nets of the ACMs (labels recovered: Command, Channel), with transitions write, read, over-write and re-read, and a place labelled unread]

Another classification of ACMs

                                   Lazy read = read only            Busy read = may re-read
                                   previously unread data           data already read
                                   (read can be held up)            (read cannot be held up)

Busy write = may over-write        BW-LR (Signal)                   BW-BR (Pool)
unread data                        (event data)                     (reference data)
(write cannot be held up)

Lazy write = write only if         LW-LR (Channel)                  LW-BR (Command)
previous read                      (message data)                   (configuration data)
(write can be held up)
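The four busy/lazy combinations can be sketched in code as a one-item ACM with atomic access. This is only an illustrative sketch: the class name `SingleItemACM`, the `try_write`/`try_read` methods and the four factory functions are hypothetical names, and "held up" is modelled by returning a failure value instead of blocking.

```python
class SingleItemACM:
    """One-item ACM with atomic access; the policy is set by two flags (illustrative only)."""

    def __init__(self, busy_write, busy_read):
        self.busy_write = busy_write
        self.busy_read = busy_read
        self.item = None
        self.unread = False  # data state: True means one unread item

    def try_write(self, item):
        """Return True if the write went through, False if it would be held up."""
        if self.unread and not self.busy_write:
            return False          # lazy write: wait until the previous item is read
        self.item = item          # busy write: may over-write unread data
        self.unread = True
        return True

    def try_read(self):
        """Return (True, item) on success, (False, None) if the read would be held up."""
        if not self.unread and not self.busy_read:
            return (False, None)  # lazy read: wait for previously unread data
        self.unread = False       # busy read: may re-read data already read
        return (True, self.item)


def signal():   # BW-LR
    return SingleItemACM(busy_write=True, busy_read=False)

def pool():     # BW-BR
    return SingleItemACM(busy_write=True, busy_read=True)

def channel():  # LW-LR
    return SingleItemACM(busy_write=False, busy_read=False)

def command():  # LW-BR
    return SingleItemACM(busy_write=False, busy_read=True)
```

For example, two writes followed by two reads succeed on both writes for Signal and Pool (the second over-writes), while Channel and Command hold up the second write; the second read is held up for Signal and Channel and re-reads for Pool and Command.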

Signal vs Pool

[Figure: Pool connects Real time 1 (busy domain) with Real time 2 (busy domain); Signal connects Real time (busy domain) with Data-driven (lazy domain)]

Low Power!

Problems with the above Petri net definitions

• These Petri nets assumed:
  – Data capacity (max value of the data state of the ACM) equals 1 (this can be easily generalised to any finite n>0 for Channel, defined as an n-place buffer with a wide range of known hardware implementations); do we semantically need other ACMs with n>1?
  – Write and Read access are held up only by the data state of the ACM and not by the Read and Write operations themselves – those are treated as atomic and taking no time; in reality they are not and should be assumed to take arbitrary time

Breaking the atomicity

[Figure: two Petri nets. Signal with atomic access: transitions write, over-write and read, place unread. Signal with non-atomic access: additional places in writing, not-in-writing, in reading and not-in-reading]

Read may be held up by write being in progress … but not write by reading!

But …

[Figure: Signal with non-atomic access, with transitions write, over-write and read and places unread, reading, in writing and not-in-writing]

What if Reading begins just before Writing?

Problem with data integrity if only one data slot (one data token) is available

Required Properties of Signal (1)

1. Data states and their updating:
  – Signal’s capacity is 1 (at any time, it has either 0 or 1 unread data items)
  – At the end of write access, Signal’s state is set to unread (1)
  – At the end of read access, Signal’s state is set to read (0)

2. Conditional asynchrony for the reader:
  – Read access may start only when Signal’s data state is unread (1) and no write access is in progress
  – Read access can be arbitrarily long

3. Unconditional asynchrony for the writer:
  – Write must be allowed to start and complete access at any time, regardless of Signal’s data state and the status of read access.

4. Data coherence:
  – Any item of data that is read from Signal must not have been changed since it was written (i.e. no writing or reading in part)

5. Data freshness:
  – Any read access must obtain the data item designated as the current unread item in Signal, i.e. the data item made available by the latest completed write access

Data slots and Signal

• “Data slot” is a unique portion of the shared memory which may contain one item of data of arbitrary (but bounded) size

• Signal cannot be implemented using One Slot only and satisfy all of the above properties

• Let us construct a Signal with TWO data slots

• First a formal specification, State Graph (or Transition System), must be built

Formal spec of Signal

[Figure: Automaton for Signal, with actions Write slot 0 (wr0), Write slot 1 (wr1), Read slot 0 (rd0) and Read slot 1 (rd1)]

Problem: construct a maximally permissive automaton, on the alphabet {wr0,wr1,rd0,rd1}, satisfying the required properties of the Signal ACM

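State Graph constraints

1. Data states, their updates and asynchrony:

[Figure: four state-graph fragments around a state s, with arc-label pairs (wri, rdj), (wri, wrj), (rdi, rdj) and (rdi, wrj)]

2. Data coherence:

[Figure: state-graph fragment annotated "rdj only if i<>j"]

A wr action is enabled in every state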

State Graph constraints

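3. Data freshness (slot swapping):

[Figure: state-graph fragments around a state s, with arc labels wri, rdi, rdj and wrj]

4. No “re-try loops” (persistency in reading):

[Figure: state-graph fragment with a path from s to s’ labelled wrj, wri, …; annotations: "If rdj then there is no rdi on this path", "rdi, i<>j"]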

State Graph for 2-slot Signal

[Figure: state graph with arcs labelled rd0, rd1, wr0 and wr1 and a marked init state]

How to implement 2-slot Signal?

[Figure: state graph of the 2-slot Signal, with arcs labelled rd0, rd1, wr0 and wr1 and a marked init state]

• In order to implement Signal we must distribute states and events between elements of implementation architecture.

• For that we must first separate states using a behavioural model of the implementation

Implementation architecture

The following structure must be kept in mind:

[Figure: Writer and Reader, each with data access to the Data slots and control access to Signal control; steering events wr0, wr1, rd0, rd1]

In a hardware implementation of Signal control, latches and logic will be used to generate signals corresponding to steering events wri and rdi, events on handshakes with writer and reader, and some internal events

Behavioural model for Signal

• Petri nets can be used as a behavioural model (algorithm) for Signal:
  – A 1-safe Petri net can be synthesised from a finite Transition System using theory of regions (Ehrenfeucht, Rozenberg et al)
  – A 1-safe Petri net can be implemented in a self-timed circuit using either direct translation techniques or logic synthesis from Signal Transition Graphs (Yakovlev, Koelmans 98)

State Graph refinement

[Figure: state graph of the 2-slot Signal, with arcs labelled rd0, rd1, wr0 and wr1 and a marked init state]

This Transition System cannot be synthesised into a 1-safe Petri net with unique event labelling – it requires refinement (it violates some separation conditions). There is also arbitration (conflict relation) between rdi and wrj events – in a physical implementation one cannot disable output actions

State Graph refinement

[Figure: refined state graph]

Now arbitration is between internal events while wri and rdj are persistent

Distributing states b/w Write and Read parts

Write part:

[Figure: refined state graph with Write superstates and Write elementary states marked]

Distributing states b/w Write and Read parts

Read part:

[Figure: refined state graph with Read superstates and Read elementary states marked]

Completing the Petri net model

Introducing binary control variables

[Figure: Petri net annotated with control-variable labels such as w=0, w+ and r-]

‘w’ encodes the slot being accessed for writing

‘r’ encodes the slot being accessed for reading

Towards circuit implementation

[Figure: block diagram with Data-in, Slot 0, Slot 1 and Data-out; a Write part issuing wr0 and wr1 and a Read part issuing rd0 and rd1; connections labelled set/reset and test]

Direct translation of PNs to circuits

[Figure: step-by-step example of a token moving between places p1 and p2, shown as latch values changing (1->0, 0->1) around a controlled operation; labels: Operation, Controlled, To Operation]

• This method associates places with latches (flip-flops) – so the state memory (marking) of PN is directly mimicked in the circuit’s state memory

• Transitions are associated with controlled actions (e.g. activations of data path units or lower level control blocks – by using handshake protocols)

• Modelling discrepancy (be careful!):
  – in Petri nets removal of a token from pre-places and adding tokens in post-places is instantaneous (i.e. no intermediate states)
  – in circuits the “move of a token” has a duration and there is an intermediate state
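The instantaneous token move of the Petri net model can be illustrated with a minimal 1-safe token game. This is a sketch under assumptions: the class `PetriNet` and the function `signal_net` are hypothetical names, and the net is the Signal with atomic access (transitions write, over-write, read; place unread), with the complementary place named `empty` here for illustration.

```python
class PetriNet:
    """Minimal 1-safe Petri net: the marking is the set of places holding a token."""

    def __init__(self, transitions, marking):
        # transitions: name -> (set of pre-places, set of post-places)
        self.transitions = transitions
        self.marking = set(marking)

    def enabled(self, name):
        pre, _ = self.transitions[name]
        return pre <= self.marking

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        pre, post = self.transitions[name]
        # Tokens leave the pre-places and enter the post-places in a single
        # step: the model has no intermediate marking, unlike a circuit.
        self.marking = (self.marking - pre) | post


def signal_net():
    """Signal with atomic access; the place name 'empty' is an assumption."""
    return PetriNet(
        transitions={
            "write": ({"empty"}, {"unread"}),
            "over-write": ({"unread"}, {"unread"}),
            "read": ({"unread"}, {"empty"}),
        },
        marking={"empty"},
    )
```

In a direct translation each place would map to a latch, so the single assignment in `fire` corresponds in hardware to a sequence of latch changes with a visible intermediate state.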

Translation in brief

This method has been used for designing the control of a token ring adaptor

[Yakovlev, Varshavsky, Marakhovsky, Semenov, IEEE Conf. on Asynchronous Design Methodologies, London, 1995]

[Figure: Cell Implementations, panels (a) and (b): cells with handshake signals a1/b1, a2/b2 and a3/b3, an operation Op3, and signal sequences a1- b1- a1+ b1+, a2- b2- a2+ b2+, a3- b3- a3+]

Refining the Write part

Control circuit for Write part

[Figure: circuit schematic built from sdc and odc cells; signals write_start, write_ack, r_0, r_1, rbar_0, rbar_1, odc0, odc1, wr0, wr1, ck0, ck1, slot0, slot1, clrw, setw]

Implementing David cells (1)

[Figure: signal transition graphs of a David cell over the signals inr, ina, outr, outa, x and xb, in three versions: speed-independent, "mild" relative timing, and “aggressive” relative timing]

Implementing David cells (2)

[Figure: circuit with a 2dc block and signals odc0, odc1, wr, wr0, wr1, slot0, slot1]

This is a peephole-optimised solution for two David cells (places 1 and 3) and the interface to the handshake with the Writer

Implementing ‘sync’ blocks

Simulation using Cadence toolkit

[Figure: simulation waveforms showing metastability inside the mutex, the Write response time, and the input and output of sync]

Cycle times (ns) for 0.6 micron

Type                    Write, without       Write, with          Read, no waiting
                        set-reset of w       set-reset of w       for Write

Speed-independent       9.0                  10.4                 9.0

With Relative Timing    4.8                  6.3                  6.6

Improving performance

[Figure: state graph of the 2-slot Signal, with arcs labelled rd0, rd1, wr0 and wr1 and a marked init state]

In case of repetitive writing (of, e.g., slot 1), read access may have to wait for the completion of write just because of a timing clash on the same slot – and not because of absence of new data in the ACM (original aim of Signal)

This problem cannot be resolved within the TWO slot ACM because of coherence violation. Can we do it with an extra slot?

Towards 3-slot Signal

[Figure: state graph of the 3-slot Signal, with arcs labelled rd1, rd2 and rd3 and states labelled 20, 21, 22, 21', 23', 13, 13', 12', 32, 32', 31']

After writing a slot (e.g. 2) for the first time, the writer alternates between 3 and 2

The reader can then, after finishing reading slot 1, read slot 2 or 3, whichever is free

3-slot Signal refined

Control variables: r-read, w-write, l-last

[Figure: example transition from state 21 (32) with updates w(2->1), l(3->2), r(3->2)]

Algorithm:

Write part: write slot w; l:=w; w:=differ(l,r)

Read part: if (r<>l) r:=l else wait; read slot r;
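The Write and Read parts above can be sketched sequentially in code. This is a minimal illustration, not the circuit: the class name `ThreeSlotSignal` and the method names are hypothetical, slots are numbered 0..2 rather than 1..3, the initial values are an assumption, `differ` is assumed to return a slot distinct from both of its arguments, and "wait" is modelled by returning `None`.

```python
SLOTS = (0, 1, 2)  # the slides number the slots 1..3; 0-based here


def differ(a, b):
    """Assumed definition: the lowest slot index distinct from both a and b."""
    return next(s for s in SLOTS if s not in (a, b))


class ThreeSlotSignal:
    def __init__(self):
        self.slots = [None, None, None]
        self.l = 0                       # last slot written (assumed initial value)
        self.r = 0                       # slot being read (r == l: nothing unread)
        self.w = differ(self.l, self.r)  # next slot to write

    def write(self, item):
        # Write part: write slot w; l:=w; w:=differ(l,r)
        self.slots[self.w] = item
        self.l = self.w
        self.w = differ(self.l, self.r)

    def try_read(self):
        # Read part: if (r<>l) r:=l else wait; read slot r
        if self.r == self.l:
            return None                  # no unread data: the reader would wait
        self.r = self.l
        return self.slots[self.r]
```

Because `w` is always chosen to differ from `r`, the writer never targets the slot the reader has selected, and a repeated write alternates between the two remaining slots.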

3-slot Pool

[Figure: state graph of the 3-slot Pool, with arcs labelled rd1, rd2 and rd3 and states labelled 20, 21, 22, 21', 23', 13, 13', 12', 32, 32', 31']

In Pool we must have: Read asynchrony

Algorithm:

Write part: write slot w; l:=w; w:=differ(l,r)

Read part: r:=l; read slot r;

r-read, w-write, l-last

Three-slot algorithm (due to Hugo Simpson)

Writer:

wr: d[n]:=input

w0: l:=n

w1: n:=differ(l,r)

Reader:

r0: r:=l

rd: output:=d[r]

n (next), l (last), r (read) – 3-valued variables
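One way to see why the writer and reader never touch the same slot at the same time is to enumerate every interleaving of the statements above. The sketch below is an assumption-laden model, not the authors' verification method: each control statement (w0, w1, r0) is taken to be atomic, the data accesses wr and rd are split into a begin and an end step so that they can overlap, `differ` is assumed to return a slot distinct from both arguments, and the initial values are chosen for illustration.

```python
from collections import deque

SLOTS = (0, 1, 2)


def differ(a, b):
    """Assumed definition: the lowest slot index distinct from both a and b."""
    return next(s for s in SLOTS if s not in (a, b))


# State: (n, l, r, wpc, rpc)
# wpc: 0 = before wr, 1 = wr in progress, 2 = before w0, 3 = before w1
# rpc: 0 = before r0, 1 = before rd, 2 = rd in progress

def writer_step(state, choose):
    n, l, r, wpc, rpc = state
    if wpc == 0:                          # wr begins on slot n
        return (n, l, r, 1, rpc)
    if wpc == 1:                          # wr ends
        return (n, l, r, 2, rpc)
    if wpc == 2:                          # w0: l := n
        return (n, n, r, 3, rpc)
    return (choose(l, r), l, r, 0, rpc)   # w1: n := differ(l, r)


def reader_step(state):
    n, l, r, wpc, rpc = state
    if rpc == 0:                          # r0: r := l
        return (n, l, l, wpc, 1)
    if rpc == 1:                          # rd begins on slot r
        return (n, l, r, wpc, 2)
    return (n, l, r, wpc, 0)              # rd ends


def explore(choose=differ, initial=(1, 0, 0, 0, 0)):
    """Breadth-first search of all interleavings; returns (states, clashes)."""
    seen = {initial}
    queue = deque([initial])
    clashes = []
    while queue:
        state = queue.popleft()
        n, l, r, wpc, rpc = state
        if wpc == 1 and rpc == 2 and n == r:
            clashes.append(state)         # write and read overlap on one slot
        for nxt in (writer_step(state, choose), reader_step(state)):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, clashes
```

With the assumed `differ`, `explore()` reports no clashes; replacing it by a choice that ignores `r`, for example `lambda l, r: (l + 1) % 3`, makes the search find an overlapping access, which shows the check is not vacuous. The model says nothing about control statements that are not atomic with respect to each other.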

Three-slot algorithm

differ:

Three-slot Pool

[Figure: Writer and Reader sides of the three-slot Pool]

3-slot ACM design

[Figure: block diagram with write control, mutex and read control; blocks differ & reg n, reg l and reg r; signals n, l, r, Rw0, Gw0, Gr0; handshakes w0-req/ack, w1-req/ack and r0-req/ack]

Differ and register logic

[Figure: differ logic and register; labels recovered: w1-req, differ, register, w1-ack, n2]

Write control circuit: STG

INPUTS: wr, Gw0, w0_ack, w1_ack

OUTPUTS: w0, wa, w0_req, w1_req

[Figure: signal transition graph over the transitions wr-, Gw0-, w0-, wa+, w0_req+, w0_ack+, w0_req-, w0_ack-, w1_req+, w1_ack+, w1_req-, w1_ack-]

Write control ckt: from Petrify

[Figure: the writer control circuit of the three-slot ACM; labels recovered: wr, Gw0, csc1, csc2, w0_req, w0_ack, w1_req, w1_ack]

Four-slot Pool

[Figure: Writer and Reader with control variables next and read; data slots d[0,0], d[0,1], d[1,0], d[1,1]; slot bits s[0], s[1]; reader copies v[0], v[1]]

Four-slot Pool algorithm (H.Simpson)

Writer:

wr: d[n,¬s[n]]:=input

w0: s[n]:= ¬s[n]

w1: l:=n || n:=¬r

Reader:

r0: r:=l

r1: v:=s

rd: output:=d[r,v[r]]

n (next), l (last), r (read) – binary variables
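The statements above translate almost line for line into a sequential sketch. The class name `FourSlotPool`, the method names and the initial values are assumptions made for illustration; each method runs to completion, so the sketch shows the data flow of the algorithm, not its behaviour under true concurrency.

```python
class FourSlotPool:
    """Sequential sketch of the four-slot Pool statements (illustrative only)."""

    def __init__(self, initial=None):
        self.d = [[initial, initial], [initial, initial]]  # d[pair][slot]
        self.s = [0, 0]   # s[pair]: slot in that pair holding the latest item
        self.v = [0, 0]   # reader's private copy of s
        self.n = 1        # next pair to write (assumed initial value)
        self.l = 0        # last pair written (assumed initial value)
        self.r = 0        # pair being read (assumed initial value)

    def write(self, item):
        n = self.n
        self.d[n][1 - self.s[n]] = item          # wr: d[n, not s[n]] := input
        self.s[n] = 1 - self.s[n]                # w0: s[n] := not s[n]
        self.l, self.n = self.n, 1 - self.r      # w1: l := n || n := not r

    def read(self):
        self.r = self.l                          # r0: r := l
        self.v = list(self.s)                    # r1: v := s
        return self.d[self.r][self.v[self.r]]    # rd: output := d[r, v[r]]
```

The tuple assignment in `write` mirrors the parallel assignment `l:=n || n:=¬r`. A read with no intervening write returns the same item again, which is the busy-read (re-read) behaviour expected of a Pool.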

3-slot vs 4-slot performance

Time for control statements

statements    3-slot min time (ns)    4-slot min time (ns)

w0+w1         4.19                    9.39

r0+(r1)       1.38                    3.47

Are we in the end fully asynchronous?

• Circuit implementations involve use of latches, which may go metastable.

• Metastability always implies a trade-off, in terms of noise, between data-domain and time-domain error.

• In a “truly busy (real-time)” environment, where the ack signal is not used, the corresponding process (e.g., writer) must allow for a small interval (3-4 ns for 0.6 µm CMOS), sufficient for metastability to be resolved with a probability practically equal to 1.

• Our h/w solutions for “busy” domains aim at maximising the “wait-free” aspect of communication but theoretically cannot fully eliminate mutual dependency between processes (hidden within ACM control variable circuits).

Concluding remarks

• Constructing ACMs to interface sub-systems with different time and energy requirements, and implementing them in high-speed hardware, proves feasible.

• Application of hets in control or image processing (e.g. via neural networks) is needed to fully assess their potential for future application-specific SOCs

• More work on mathematical modelling of hets and on developing an extensive parametrised library of ACM circuits is needed.

VLSI design layout (chip fab’ed in June 2000 via EUROPRACTICE)

4-slot Pool ACM

4-slot ACM part

Tested (physically) correct (details on testing in 9th Async UK Forum paper)

Acknowledgements and References

• Members of the COMFORT team:
  At KCL – Tony Davies, Ian Clark, David Fraser, Sergio Velastin
  At NCL – Fei Xia, David Kinniment, Albert Koelmans, Delong Shang, Alex Bystrov

• BAe colleagues: Hugo Simpson and Eric Campbell

• Project COMFORT web site: http://www.eee.kcl.ac.uk/~comfort

• Work supported by EPSRC, EU (ACiD-WG) and reported and published at Async2000, AINT’2000, Async2001 etc.
