
Introduction to Silicon Programming in the Tangram/Haste language

Material adapted from lectures by: Prof.dr.ir Kees van Berkel [Dr. Johan Lukkien] [Dr.ir. Ad Peeters]

at the Technical University of Eindhoven, the Netherlands

Philips Research, Kees van Berkel, Ad Peeters, 2002-09-10

TU/e

Handshake signaling and data

[Figure: push channel versus pull channel. Each connects an active side to a passive side through request ar, acknowledge ak, and data ad.]


Handshake signaling: push channel

[Timing diagram: req ar and ack ak against time, with the early, broad, and late validity windows of data ad.]



Data bundling

In order to maintain event ordering at both sides of a channel, the circuit must satisfy the data bundling constraint:

• for a push channel: the delay along the request wire must exceed the delay of the data wire;

• for a pull channel: the delay along the acknowledge wire must exceed the delay of the data wire.
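
The two bundling constraints can be written down as a small check. This is only an illustrative sketch: the function name, the argument names, and the delay values are assumptions and do not come from the lecture material or the Tangram tools.

```python
def bundling_ok(channel_kind, request_delay, acknowledge_delay, data_delay):
    """Return True when the data bundling constraint holds.

    channel_kind is "push" or "pull"; all delays share one (arbitrary) unit.
    """
    if channel_kind == "push":
        # Push channel: the request wire must be slower than the data wire.
        return request_delay > data_delay
    if channel_kind == "pull":
        # Pull channel: the acknowledge wire must be slower than the data wire.
        return acknowledge_delay > data_delay
    raise ValueError("channel_kind must be 'push' or 'pull'")


# Hypothetical wire delays, for illustration only.
print(bundling_ok("push", request_delay=1.2, acknowledge_delay=0.5, data_delay=1.0))
print(bundling_ok("pull", request_delay=1.2, acknowledge_delay=0.5, data_delay=1.0))
```

With these made-up numbers the push channel satisfies the constraint and the pull channel does not, because only the request wire is slower than the data wire.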



Handshake signaling: pull channel

[Timing diagram: req ar and ack ak against time, with the early, broad, and late validity windows of data ad.]

While the data wires are invalid, multiple and incomplete transitions are allowed.



Tangram assignment x:= f(y,z)

[Figure: handshake circuit for x:= f(y,z), built from variables y and z, operator f, and variable x; port labels yw, zw, xw0, xw1, xr.]



Four-phase data transfer

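[Timing diagram: four-phase data transfer between channels b and c, showing r / br, bd / cd, ba / cr, and ca / a against time, with steps numbered 1 to 5.]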



Handshake latch

[ [ w ; [w : rd:= wd] [] r ; r] ]

• 1-bit handshake latch: wd wr rd

wd wr rd wk = wr

rk = rr

[Figure: 1-bit handshake latch x with ports r and w and wires wd, wr, rd.]



N-bit handshake latch

[Figure: N-bit handshake latch with shared wires wr, wk, rr, rk and data bits wd1 ... wdN, rd1 ... rdN.]

Area, delay, energy:

• area: 2(N+1) gate equivalents

• delay per cycle: 4 gate delays

• energy per write cycle: 4 + 0.5*2N transitions, on average
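
To get a feel for how these figures scale with the word width, the two formulas can be evaluated directly. The sketch below only restates the slide's expressions; the function names are chosen here for illustration.

```python
def latch_area(n_bits):
    """Area of an N-bit handshake latch in gate equivalents: 2(N+1)."""
    return 2 * (n_bits + 1)


def latch_write_energy(n_bits):
    """Average number of transitions per write cycle: 4 + 0.5*2N."""
    return 4 + 0.5 * 2 * n_bits


for n in (1, 8, 32):
    print(n, latch_area(n), latch_write_energy(n))
```

The delay per cycle (4 gate delays) does not depend on N in this model, so only area and energy grow with the width.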



Transferrer

[ [ a : (b ; c)] ; [ a : (b ; cd:= bd ; c ; cd:= )] ]

[Figure: transferrer with ports a (ar, ak), b (br, bk, bd), and c (cr, ck, cd).]



Multiplexer

[ [ a : c ; a : (cd:= ad; c ; cd:= ) [] b : c ; b : (cd:= bd; c ; cd:= ) ] ]

Restriction: ar br must hold at all times!

[Figure: multiplexer (|) with input channels a and b and output channel c.]



Multiplexer realization

[Figure: multiplexer realization, split into a data circuit and a control circuit.]



Logic/arithmetic operator

[ [ a : (b || c) ]; [ a : ((b || c) ; ad:= f(bd , cd ))]]

Cheaper realization (delay sensitive):

[ [ a : (b || c) ]; [ a : ((b || c) ; ad:= f(bd , cd ))]; “delay” ; ad:= ]

[Figure: operator f with ports a, b, and c.]



A one-place fifo buffer

byte = type [0..255]

& BUF1 = main proc(a?chan byte & b!chan byte).
  begin
    x: var byte
  | forever do a?x ; b!x od
  end

[Figure: BUF1 with input channel a and output channel b.]



A one-place fifo buffer


& BUF1 = main proc(a?chan byte & b!chan byte).
  begin
    x: var byte
  | forever do a?x ; b!x od
  end

[Figure: handshake circuit of BUF1, drawn with sequencer (;) components, variable x, and channels a and b.]



2-place buffer


& BUF1 = proc (a?chan byte & b!chan byte).
  begin
    x: var byte
  | forever do a?x ; b!x od
  end

& BUF2: main proc (a?chan byte & c!chan byte).
  begin
    b: chan byte
  | BUF1(a,b) || BUF1(b,c)
  end

[Figure: channel a into BUF1, channel b into a second BUF1, channel c out.]
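
The composition of two one-place buffers can be mimicked in Python with generators. This is a functional sketch only: it shows that BUF2 passes values through in order, and it does not model handshakes, timing, or the fact that each stage holds exactly one value. The names buf1 and buf2 simply mirror the Tangram procedures.

```python
def buf1(source):
    """One-place buffer: forever do a?x ; b!x od."""
    for x in source:   # a?x
        yield x        # b!x


def buf2(source):
    """Two BUF1 stages in series; the inner generator plays internal channel b."""
    return buf1(buf1(source))


print(list(buf2([3, 1, 4, 1, 5])))
```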



Two-place ripple buffer



Two-place wagging buffer

[Figure: wagging buffer between channels a and b.]

& wag2: main proc(a?chan byte & b!chan byte).
  begin
    x,y: var byte
  | a?x ; forever do (a?y || b!x) ; (a?x || b!y) od
  end



Two-place ripple register

…
  begin
    x0, x1: var byte
  | forever do b!x1 ; x1:=x0 ; a?x0 od
  end



4-place ripple register


& rip4: main proc (a?chan byte & b!chan byte).
  begin
    x0, x1, x2, x3: var byte
  | forever do b!x3 ; x3:=x2 ; x2:=x1 ; x1:=x0 ; a?x0 od
  end
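
The behavior of rip4 can be traced with a short sequential model. This is a sketch under one loud assumption: the four variables start at 0, whereas the Tangram program leaves their initial contents unspecified. The function name rip4 mirrors the procedure; the initial parameter is introduced here for illustration.

```python
def rip4(inputs, initial=(0, 0, 0, 0)):
    """Sequential model of the 4-place ripple register.

    Each loop iteration is one pass of: b!x3 ; x3:=x2 ; x2:=x1 ; x1:=x0 ; a?x0
    """
    x0, x1, x2, x3 = initial   # assumed initial contents
    outputs = []
    for value in inputs:
        outputs.append(x3)     # b!x3
        x3 = x2                # x3:=x2
        x2 = x1                # x2:=x1
        x1 = x0                # x1:=x0
        x0 = value             # a?x0
    return outputs


print(rip4([1, 2, 3, 4, 5, 6]))
```

Every input reappears at the output four cycles later, after the four assumed initial values have been shifted out.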



4-place ripple register

• area: $N (A_{var} + A_{seq})$

• cycle time: $T_c = (N+1)\, T_{:=}$

• cycle energy: $E_c = N\, E_{:=}$

[Figure: handshake circuit of the ripple register over variables x0, x1, x2, x3.]



Introducing vacancies

…
  begin
    x0, x1, x2, x3, v: var byte
  | forever do
      (b!x3 ; x3:=x2 ; x2:=v) || (v:=x1 ; x1:=x0 ; a?x0)
    od
  end

• what is wrong?



Introducing vacancies

forever do
  ((b!x3 ; x3:=x2) || (v:=x1 ; x1:=x0 ; a?x0)) ; x2:=v
od

or:

forever do
  ((b!x3 ; x3:=x2) || (v:=x1 ; x1:=x0)) ; (x2:=v || a?x0)
od



“synchronous” 4-p ripple register

forever do
  (s0:=m0 || s1:=m1 || s2:=m2 || b!m3)
; (a?m0 || m1:=s0 || m2:=s1 || m3:=s2)
od

[Figure: register chain m0, s0, m1, s1, m2, s2, m3 between a and b, shown in five successive steps.]
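
The two-phase behavior of this register can be sketched in Python. The model assumes that all m and s variables start at 0 (the Tangram fragment does not say), and it treats each parallel composition as one simultaneous step, which is safe here because no variable is both read and written within a phase. The name sync4 is chosen for illustration.

```python
def sync4(inputs, initial=0):
    """Two-phase model of the "synchronous" 4-place ripple register."""
    m = [initial] * 4   # m0..m3 (assumed initial contents)
    s = [initial] * 3   # s0..s2
    outputs = []
    for value in inputs:
        # Phase 1: s0:=m0 || s1:=m1 || s2:=m2 || b!m3
        s = m[:3]
        outputs.append(m[3])
        # Phase 2: a?m0 || m1:=s0 || m2:=s1 || m3:=s2
        m = [value] + s
    return outputs


print(sync4([1, 2, 3, 4, 5, 6]))
```

All moves within a phase are independent, so a full shift takes two steps regardless of the register length, at the price of the extra s variables.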



4-place wagging register

forever do
  b!x1 ; x1:=x0 ; a?x0
; b!y1 ; y1:=y0 ; a?y0
od

[Figure: wagging register between a and b, alternating between registers x0, x1 and y0, y1.]
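
A short Python model shows that alternating between the two 2-place registers still yields 4-place FIFO behavior. As a labeled assumption, all four variables start at 0; the function name wag4 and the list-based layout are illustrative only.

```python
def wag4(inputs, initial=0):
    """Model of the 4-place wagging register.

    Two 2-place ripple registers, [x0, x1] and [y0, y1], are served in turn.
    """
    registers = [[initial, initial], [initial, initial]]  # [x0, x1], [y0, y1]
    outputs = []
    for i, value in enumerate(inputs):
        reg = registers[i % 2]   # even turns use x, odd turns use y
        outputs.append(reg[1])   # b!x1   (or b!y1)
        reg[1] = reg[0]          # x1:=x0 (or y1:=y0)
        reg[0] = value           # a?x0   (or a?y0)
    return outputs


print(wag4([1, 2, 3, 4, 5, 6]))
```

Each value is moved only once between input and output instead of three times as in the linear register, which is the kind of activity reduction the wagging variants aim for.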



8-place register

4-way wagging

forever do
  b!u1 ; u1:=u0 ; a?u0
; b!v1 ; v1:=v0 ; a?v0
; b!x1 ; x1:=x0 ; a?x0
; b!y1 ; y1:=y0 ; a?y0
od



Four 8×8 shift registers compared

type                 area [gate eq.]   cycle time [ns]   energy/message [nJ]
linear               167               43                0.75
pseudo synchronous   264               23                1.46
4-way wagging        238               26                0.29
wagging              201               34                0.48



Tangram/Haste

• Purpose: programming language for asynchronous VLSI circuits.

• Creator: Tangram team @ Philips Research Labs (proto-Tangram 1986; release 2 in 1998).

• Inspiration: Hoare’s CSP, Dijkstra’s GCL.

• Lectures: no formal introduction; manual hand-out (learn by example, learn by doing).

• Main tools: compiler, analyzer, simulator, viewer.





Median filter

median: main proc (a? chan W & b! chan W).
  begin
    x,y,z: var W
  & xy, yz, zx: var bool
  | forever do
      ((z:=y; y:=x) || yz:=xy)
    ; a?x
    ; (xy:= x<=y || zx:= z<=x)
    ; if zx=xy then b!x
      or xy=yz then b!y
      or yz=zx then b!z
      fi
    od
  end

[Figure: Median with input channel a and output channel b.]
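
The selection logic of the median filter can be checked with a Python transcription. It is a sketch under stated assumptions: x, y, and z start at 0 (so xy starts as true), the guarded if is modeled as if/elif, and the name median_filter is chosen here. Note how yz is not recomputed but inherited from the previous xy, exactly as in the Tangram text.

```python
def median_filter(samples, initial=0):
    """Running median of the last three samples, mirroring the Tangram program."""
    x = y = z = initial   # assumed initial contents
    xy = True             # x <= y holds for equal initial values
    outputs = []
    for sample in samples:
        z, y = y, x       # (z:=y ; y:=x)
        yz = xy           # yz:=xy, the old comparison now describes y <= z
        x = sample        # a?x
        xy = x <= y       # xy:= x<=y
        zx = z <= x       # zx:= z<=x
        if zx == xy:      # x lies between z and y
            outputs.append(x)
        elif xy == yz:    # y lies between x and z
            outputs.append(y)
        else:             # yz == zx: z lies between y and x
            outputs.append(z)
    return outputs


print(median_filter([5, 1, 4, 2, 8, 3]))
```

Only two comparisons are performed per sample; the third ordering bit is reused from the previous round.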



Greatest Common Divisor

gcd: main proc (ab?chan <<byte,byte>> & c!chan byte).
  begin
    x,y: var byte
  | forever do
      ab?<<x,y>>
    ; do x<y then y:= y-x
      or x>y then x:= x-y
      od
    ; c!x
    od
  end

[Figure: GCD with input channel ab and output channel c.]
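
The inner guarded loop is the classic subtractive GCD. A Python rendering of one round (receive a pair, iterate, send the result) might look as follows; the check for positive inputs is an addition made here, since the loop as written would not terminate when exactly one operand is 0.

```python
def gcd(x, y):
    """Subtractive GCD, mirroring: do x<y then y:=y-x or x>y then x:=x-y od."""
    if x <= 0 or y <= 0:
        # Assumption made for this sketch: only positive operands are supplied.
        raise ValueError("operands must be positive")
    while x != y:          # the do-loop ends when neither guard holds
        if x < y:
            y = y - x
        else:              # x > y
            x = x - y
    return x               # c!x


print(gcd(12, 18))
```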



Nacking Arbiter

nack: main proc (a?chan bool & b!chan bool).
  begin
    na,nb: var bool
  | <<na,nb>> := <<true,true>>
  ; forever do
      sel probe(a) then a!nb || na:= na#nb
      or probe(b) then b!na || nb:= nb#na
      les
    od
  end

[Figure: nacking arbiter with channels a and b.]



C : Tangram handshake circuit

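[Figure: C(T) drawn as a component T with ports a and b; C(R;S) composed from R and S with a sequencer (;); C(R||S) composed from R and S with a parallel (||) component. Shared channels pass through | components.]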



Tangram Compilation

[Figure: compilation diagram relating Tangram program T, handshake circuit, VLSI circuit, and handshake process through the mappings C, E, H, and ||.]

$H \cdot T = \| \cdot C \cdot T$



VLSI programming of asynchronous circuits

[Figure: design flow from Tangram program via the compiler to a handshake circuit, and via the expander to an asynchronous circuit (netlist of gates); the simulator provides feedback on behavior, area, time, energy, and test coverage.]



Tangram tool box

Let Rlin4.tg be a Tangram program:

• htcomp -B Rlin4
  – compiles Rlin4.tg into Rlin4.hcl, a handshake circuit

• htmap Rlin4
  – produces Rlin4*.v files, a CMOS standard-cell circuit

• htsim Rlin4 a b
  – executes Rlin4.hcl with files a, b for input/output

• htview Rlin4
  – provides interactive viewing of simulation results



Tangram program “Conway”

B1 = type [0..1]
& B2 = type <<B1,B1>>
& B3 = type <<B1,B1,B1>>
& P = …
& Q = …
& R = …

& conway: main proc (a?chan B2 & d!chan B3).
  begin
    b,c: chan B1
  | P(a,b) || Q(b,c) || R(c,d)
  end

[Figure: channel a into P, channel b into Q, channel c into R, channel d out.]



Tangram program “Conway”

& P = proc(a?chan B2 & b!chan B1).
    begin
      x: var B2
    | forever do a?x; b!x.0; b!x.1 od
    end

& Q = proc(b?chan B1 & c!chan B1).
    begin
      y: var B1
    | forever do b?y; c!y od
    end

& R = proc(c?chan B1 & d!chan B3).
    begin
      x,y,z: var B1
    | forever do c?x; c?y; c?z; d!<<x,y,z>> od
    end
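
The data flow of the three processes can be sketched with Python generators: P unpacks pairs into single bits, Q copies them, and R packs them into triples. The function names mirror the Tangram procedures; the use of tuples for B2 and B3 and the dropping of an incomplete final triple are assumptions of this sketch, which models only the values, not the handshakes.

```python
def P(pairs):
    """forever do a?x; b!x.0; b!x.1 od"""
    for x in pairs:
        yield x[0]
        yield x[1]


def Q(bits):
    """forever do b?y; c!y od"""
    for y in bits:
        yield y


def R(bits):
    """forever do c?x; c?y; c?z; d!<<x,y,z>> od"""
    it = iter(bits)
    # zip over the same iterator three times groups the stream into triples
    # and stops silently when fewer than three bits remain.
    for triple in zip(it, it, it):
        yield triple


def conway(pairs):
    """P(a,b) || Q(b,c) || R(c,d), modeled as a generator pipeline."""
    return R(Q(P(pairs)))


print(list(conway([(1, 0), (1, 1), (0, 0)])))
```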



VLSI programming for …

• Low costs:
  – introduce resource sharing.

• Low delay (high throughput):
  – introduce parallelism.

• Low energy (low power):
  – reduce activity; …



VLSI programming for low costs

• Keep it simple!!

• Introduce resource sharing: commands, auxiliary variables, expressions, operators.

• Enable resource sharing, by:
  – reducing parallelism
  – making similar commands equal



Command sharing

S ; … ; S

P : proc(). S

P() ; … ; P()

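[Figure: two separate copies of S, activated from 0 and from 1, versus a single shared S reached from 0 and 1 through a | component.]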



Command sharing: example

a?x ; … ; a?x

ax : proc(). a?x

ax() ; … ; ax()

[Figure: a?x before and after sharing, with activations 0 and 1, | components, channel a, and write port xw.]



Procedure definition vs declaration

Procedure definition: P = proc (). S
– provides a textual shorthand (expansion)
– each call generates a copy of the resource, i.e. no sharing

Procedure declaration: P : proc (). S
– defines a sharable resource
– each call generates access to this resource



Command sharing

• Applies only to sequentially used commands.

• Saves resources, almost always (i.e. when the command is more costly than a mixer).

• Impact on delay and energy often favorable.

• Introduced by means of procedure declaration.

• Makes the Tangram program less readable. Therefore, apply after the program is correct & sound.

• Should really be applied by the compiler.



Sharing of auxiliary variables

• x:=E is an auto assignment when E depends on x. This is compiled as aux:=E; x:= aux , where aux is a “fresh” auxiliary variable.

• With multiple auto assignments to x, as in:
    x:=E; ... ; x:=F
  auxiliary variables can be shared, as in:
    aux:=E; aux2x(); ... ; aux:=F; aux2x()
  with aux2x : proc(). x:=aux



Expression sharing

x:=E ; … ; a!E

f : func(). E

x:=f() ; … ; a!f()

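[Figure: two separate copies of E, on ports e0 and e1, versus a single shared E serving e0 and e1 through a | component.]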



Expression sharing

• Applies only to sequentially used expressions.

• Often saves resources (i.e. when the expression is more costly than the demultiplexer).

• Introduced by means of function declarations.

• Makes the Tangram program less readable. Therefore, apply after the program is correct & sound.

• Should really be applied by the compiler.



Operator sharing

• Consider x0 := y0+z0 ; … ; x1 := y1+z1 .

• Operator + can be shared by introducing
    add : func(a,b? var T): T. a+b
  and applying it as in
    x0 := add(y0, z0) ; … ; x1 := add(y1,z1) .



Operator sharing: the costs

• Operator sharing may introduce multiplexers to (all) inputs of the operator and a demultiplexer to its output.

• This form of sharing only reduces costs when:
  – the operator is expensive,
  – some input(s) and/or output are common.



Operator sharing: example

• Consider x := y+z0 ; … ; x := y+z1 .

• Operator + can be shared by introducing
    add2y : proc(b? var T). x:=y+b
  and applying it as in
    add2y(z0) ; … ; add2y(z1) .





Assignment: make GCD smaller

• Both assignments (y:= y-x and x:= x-y) are auto assignments and hence require an auxiliary variable.

• The program requires 4 arithmetic resources (twice < and twice –).

• Reduce costs of GCD by saving on auxiliary variables and arithmetic resources. (Beware the costs of multiplexing!)

• Use of ff variables not allowed for this exercise.
