
Geometry of Synthesis III:

Resource Management Through Type Inference

Dan R. Ghica∗, Alex Smith
University of Birmingham, UK

January 5, 2011

Abstract

Geometry of Synthesis is a technique for compiling higher-level programming languages into digital circuits via their game semantic model. Ghica (2007) first presented the key idea, then Ghica and Smith (2010) gave a provably correct compiler into asynchronous circuits for Syntactic Control of Interference (SCI), an affine-typed version of Reynolds’s Idealized Algol. Affine typing has the dual benefits of ruling out race conditions through the type system and having a finite-state game-semantic model for any term, which leads to a natural circuit representation and simpler correctness proofs. In this paper we go beyond SCI to full Idealized Algol, enhanced with shared-memory concurrency and semaphores (Ghica and Murawski, 2008).

Compiling this language, Idealized Concurrent Algol (ICA), proceeds in three stages. First, an intermediate type system called Syntactic Control of Concurrency (SCC) (Ghica et al., 2006) is used to statically determine “concurrency bounds” on all identifiers in the program. Then, a program transformation called serialization is applied to the program to translate it into an equivalent SCC program in which all concurrency bounds are set to the unit. Finally, the resulting program can then be compiled into asynchronous circuits using a slightly enhanced version of the GoS II compiler, which can handle assignable variables used in non-sequential contexts.

1 Geometry of Synthesis

The problem of hardware compilation, synthesising digital circuits from behavioural specifications written in higher-level programming languages, has turned out to be surprisingly difficult. Although the pioneering work of van Berkel and Saeijs (1988), Page and Luk (1991) and Luk et al. (1994) yielded promising initial results, more than a decade later this technology has yet to enter the mainstream of digital design. Several C-to-hardware compilers are available as commercial products from companies such as Mentor, Altera, Synopsys and others, but they all share a common weakness: they provide poor support for functions and procedures, at least when compared with modern conventional (“software”) compilers.

The role of functions (and related concepts such as procedures, subroutines, methods, etc.) in modern languages and compilers is essential in several different ways. Through the functional interface a program will structure itself into reusable components, interact with code written in other programming languages (foreign function interface), and interact with run-time services of the operating system (application binary interface). The functional subset of a modern language will usually be a variant of a typed lambda calculus and can handle functions as arguments (higher-order functions), anonymous functions (abstraction) and partial application (currying). Starting with Algol, at least in the idealized form of Reynolds (1981), such features are common enough to be taken for granted in most contemporary languages and compilers. However, this is not the case in hardware compilation. These methodological considerations are elaborated by Ghica (2009).

The Geometry of Synthesis (GoS) approach is concerned with remedying this situation by compiling higher level languages into hardware in a way that is consistent with a modern functional infrastructure having mature functional interface capabilities. The GoS approach starts from the widely accepted premise that circuit diagrams that are graph isomorphic should be behaviourally equal at the “right” level of abstraction, and builds on the work of Kelly and Laplaza (1980) and their seminal results regarding the representation of diagrams in the language of compact closed categories. Function objects can be encoded in the language of compact closed categories using just a disciplined accounting of boxes and wires, so the first key new insight provided by GoS is that function definition and (linear) application can be represented in hardware in a purely static way by an interconnection of circuits which is consistent with this discipline. The second insight is that in order to compile conventional programming languages, their imperative features and especially contraction (which allows multiple occurrences of identifiers) can be represented at the circuit level using game semantics (Ghica, 2007). In fact, the hardware compilation process is tantamount to a reification of game semantics into hardware (Ghica, 2009).

∗ Supported by an EPSRC Advanced Research Fellowship.

Contribution. In this paper we address what we perceive to be a key inconvenience in the GoS approach, namely restrictions on contraction in concurrent contexts. Ideally, we want to allow the programmer to write programs such as λfλx.f(x); f(x) or λfλx.f(x)||f(x). However, the type system which forms the basis of the programming language allows contraction only in sequential contexts, hence the first term is typeable but the second is not. The programmer needs to serialize the term by hand, writing something like λf1f2λx1x2.f1(x1)||f2(x2); in complicated terms this quickly becomes very difficult to handle. This becomes even more difficult if we want to apply this function to terms which themselves require transformation. For example, if applied to λc.c||c, which needs itself to change to λc1c2.c1||c2, the original function should actually be λf1f2λx1x2x3x4.f1x1x2||f2x3x4.

In this paper we give a systematic method of obtaining such terms, which we then know how to compile into hardware.

2 Type systems

In this section we will present a realistic shared-memory higher-order concurrent programming language with synchronisation primitives, and give several typing systems for the language. The first type system is Idealized Concurrent Algol (ICA), the most general one, which is essentially the simply-typed lambda calculus with special constants for state manipulation and concurrency. It represents an extension of the idealized Algol language proposed by Reynolds (1981) with parallel composition and semaphores (Ghica and Murawski, 2008).

The next typing system, (Basic) Syntactic Control of Interference (SCI), is an affine version of ICA in which contraction is disallowed over function application and parallel execution. SCI was initially proposed by Reynolds as a programming language which would facilitate Hoare-style correctness reasoning because covert interference between terms is disallowed (Reynolds, 1978, 1989). SCI turned out to be semantically interesting and it was studied extensively (Reddy, 1996; O’Hearn et al., 1999; McCusker, 2007, 2010). The restriction on contraction in SCI makes it particularly well suited for hardware compilation because any term in the language has a finite-state model and can therefore be compiled as a static circuit (Ghica, 2007; Ghica and Smith, 2010).

The third type system, Syntactic Control of Concurrency (SCC), is a half-way house between the unrestricted ICA and the constrained SCI (Ghica et al., 2006). In SCC, contraction is allowed in all contexts, but static bounds on the number of contractions in non-sequential contexts are enforced through the type system. In fact, SCC with all contraction bounds set to unit is equivalent to SCI. SCC was initially proposed as a framework for model checking concurrent programs, because this language also enjoys a finite-state model property at all terms (Ghica and Murawski, 2006). Evidently, this property recommends it as a more flexible alternative to SCI for hardware compilation. Shared memory concurrency and semaphores cannot be typed in SCI because they require contraction in non-sequential contexts, but can be handled by the SCC system.

2.1 Idealized Concurrent Algol (ICA)

The primitive types of the language are commands, memory cells, semaphores and expressions. For simplicity we only consider boolean expressions, but finite-integer expressions can be added in a conceptually straightforward manner.

σ ::= com | var | sem | exp.

The type constructors are product and function:

θ ::= θ × θ | θ → θ | σ.

Terms are described by typing judgements of the form

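\[
x_1 : \theta_1, \ldots, x_k : \theta_k \vdash M : \theta,
\]

where we denote the set of identifier type assignments on the left by Γ. By convention, if we write Γ, Γ′ we assume that the two type assignments have disjoint sets of identifiers.

The term formation rules of the language are those of the simply typed lambda calculus:

\[
\dfrac{}{x : \theta \vdash x : \theta}\ \text{Identity}
\qquad
\dfrac{\Gamma \vdash M : \theta}{\Gamma, x : \theta' \vdash M : \theta}\ \text{Weakening}
\]

\[
\dfrac{\Gamma, x : \theta, y : \theta \vdash M : \theta'}{\Gamma, x : \theta \vdash M[x/y] : \theta'}\ \text{Contraction}
\qquad
\dfrac{\Gamma, x : \theta' \vdash M : \theta}{\Gamma \vdash \lambda x.M : \theta' \to \theta}\ \text{Abstraction}
\]

\[
\dfrac{\Gamma \vdash M : \theta \to \theta' \qquad \Gamma \vdash N : \theta}{\Gamma \vdash MN : \theta'}\ \text{Application}
\qquad
\dfrac{\Gamma \vdash M_i : \theta_i}{\Gamma \vdash \langle M_1, M_2 \rangle : \theta_1 \times \theta_2}\ \text{Product}
\]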

The constants of the language are described below:

0, 1 : exp are the boolean constants;

skip : com is the only command constant (“no-op”);

asg : var × exp→ com is assignment to memory cell, denoted by “:=” when used in infix notation;

der : var→ exp is dereferencing of memory cell, also denoted by “!”;

seq : com× com→ com is command sequencing, denoted by “;” when used in infix notation;

seq : com × exp → exp is sequencing of command with expression, denoted by “;” when used in infix notation, resulting in an expression with side effects;

par : com→ com→ com is parallel composition of commands, denoted by “||” when used in infix notation;

neg : exp→ exp is boolean negation;

or : exp× exp→ exp is boolean disjunction;

if : exp× com× com→ com is branching;

while : exp× com→ com is iteration;

grab : sem→ com is semaphore grab;

release : sem→ com is semaphore release;

newvar : (var→ com)→ com is local variable declaration in block command;

newvar : (var→ exp)→ exp is local variable declaration in block expression;

newsem : (sem→ com)→ com is semaphore declaration in block command;

newsem : (sem→ exp)→ exp is semaphore declaration in block expression;

recθ : (θ → θ)→ θ is a fix-point operator.

Local variable and semaphore binding is presented with a quantifier-style type in order to avoid introducing new variable binders in the language. Local variable declaration can be sugared into a more familiar syntax as newvar(λx.M) ≡ newvar x in M.

The language ICA is a highly expressive programming language in which a large variety of algorithms and programming constructs can be coded. Its operational semantics, which defines the imperative and shared-variable concurrency primitives in the usual way in the framework of a call-by-name lambda-calculus, along with a fully abstract game semantic model, are given by Ghica and Murawski (2008).

2.2 Syntactic Control of Interference (SCI)

SCI is the affine version of ICA. It has the same type structure as ICA. However, since contraction in concurrent contexts is not allowed, semaphores can play no meaningful role and can be omitted from the definition of the language. The only changes as compared to ICA are the removal of explicit contraction and a new rule for function application:

\[
\dfrac{\Gamma \vdash M : \theta \to \theta' \qquad \Gamma' \vdash N : \theta}{\Gamma, \Gamma' \vdash MN : \theta'}\ \text{Application (new)}
\]

The immediate consequence of this restriction is that nested application is no longer possible, i.e. terms such as

\[
f : \mathsf{com} \to \mathsf{com} \vdash f(f(\mathsf{skip}))
\]

are illegal. Elimination of nested application means that the usual operational unfolding of recursion no longer preserves typing, therefore the recθ operator must also be eliminated. A restricted recursion operator can be reintroduced but we will not consider it here.
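The disjointness requirement behind the new application rule can be illustrated with a small check. This is only a sketch under stated assumptions: the term representation (`Var`, `Const`, `App`) and the function names are hypothetical, binders are left out, and only the application rule is modelled, not the sharing that SCI permits through products.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def free_identifiers(term):
    """Collect the identifiers of a binder-free term."""
    if isinstance(term, Var):
        return {term.name}
    if isinstance(term, App):
        return free_identifiers(term.fun) | free_identifiers(term.arg)
    return set()  # constants such as skip contribute no identifiers


def sci_application_ok(term):
    """True if every application splits its identifiers into disjoint sets."""
    if isinstance(term, App):
        shared = free_identifiers(term.fun) & free_identifiers(term.arg)
        return (not shared
                and sci_application_ok(term.fun)
                and sci_application_ok(term.arg))
    return True


nested = App(Var("f"), App(Var("f"), Const("skip")))    # f(f(skip))
renamed = App(Var("f"), App(Var("g"), Const("skip")))   # f(g(skip))
print(sci_application_ok(nested), sci_application_ok(renamed))
```

The first term is rejected because `f` occurs both in function and in argument position, which is exactly the situation the rule Γ, Γ′ excludes.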

The second consequence of this restriction plays out in conjunction with the chosen types of sequential and parallel composition:

seq : com × com → com

par : com → com → com.

The uncurried type of seq allows the typing of normal imperative programs because contraction can be achieved via product formation. Terms such as λc.c; c are possible, but λc.c||c are not.

Despite its restrictions SCI is still expressive enough to allow many interesting programs. Its finite state model makes it perfectly suited for hardware compilation (Ghica, 2007; Ghica and Smith, 2010).

2.3 Syntactic Control of Concurrency (SCC)

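The SCC type system allows contraction in all contexts but only when static bounds on the numbers of non-sequential contractions are respected. The types of the language are given by the grammar

\[
\begin{aligned}
\sigma &::= \mathsf{com} \mid \mathsf{exp} \mid \mathsf{var} \mid \mathsf{sem} \\
\gamma &::= \theta^{n}, \quad n \in \mathbb{N} \\
\theta &::= \sigma \mid \theta \times \theta \mid \gamma \to \theta.
\end{aligned}
\]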

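If a bound in $\theta^{n}$ is unit we may omit it. SCC types have the following sub-typing relation:

\[
\dfrac{n_1 \le n_2 \qquad \theta_1 \le \theta_2}{\theta_1^{n_1} \le \theta_2^{n_2}}
\qquad
\dfrac{\gamma_2 \le \gamma_1 \qquad \theta_1 \le \theta_2}{\gamma_1 \to \theta_1 \le \gamma_2 \to \theta_2}
\]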
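The two sub-typing rules can be read as a recursive check. The sketch below is illustrative only: the classes `Base` and `Fun` are hypothetical names, products are left out, and reflexivity on base types is an assumption, since the rules above do not state a base case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    name: str        # "com", "exp", "var" or "sem"


@dataclass(frozen=True)
class Fun:
    arg: object      # theta in  theta^bound -> res
    bound: int
    res: object


def bounded_le(t1, n1, t2, n2):
    """First rule: theta1^n1 <= theta2^n2 if n1 <= n2 and theta1 <= theta2."""
    return n1 <= n2 and is_subtype(t1, t2)


def is_subtype(t1, t2):
    if isinstance(t1, Base) and isinstance(t2, Base):
        return t1 == t2  # assumption: base types are only related to themselves
    if isinstance(t1, Fun) and isinstance(t2, Fun):
        # Second rule: contravariant in the bounded argument (gamma2 <= gamma1),
        # covariant in the result.
        return (bounded_le(t2.arg, t2.bound, t1.arg, t1.bound)
                and is_subtype(t1.res, t2.res))
    return False


com = Base("com")
print(is_subtype(Fun(com, 2, com), Fun(com, 1, com)))
print(is_subtype(Fun(com, 1, com), Fun(com, 2, com)))
```

Under these assumptions a function that tolerates two concurrent uses of its argument can stand in where only one use is assumed, but not the other way round.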

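Type judgements have the form

\[
x_1 : \gamma_1, \ldots, x_k : \gamma_k \vdash M : \theta.
\]

If a type assignment environment is $\Gamma = \{x_i : \theta_i^{n_i} \mid 1 \le i \le k\}$ we define

\[
n \cdot \Gamma \stackrel{\mathrm{def}}{=} \{x_i : \theta_i^{n \cdot n_i} \mid 1 \le i \le k\}.
\]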

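The typing rules are as in ICA except for the management of bounds in contraction and function application, plus a new rule for sub-typing:

\[
\dfrac{\Gamma, x : \theta^{m}, y : \theta^{n} \vdash M : \theta'}{\Gamma, x : \theta^{m+n} \vdash M[x/y] : \theta'}\ \text{Contraction (new)}
\]

\[
\dfrac{\Gamma \vdash M : \theta^{n} \to \theta' \qquad \Gamma' \vdash N : \theta}{\Gamma, n \cdot \Gamma' \vdash MN : \theta'}\ \text{Application (new)}
\]

\[
\dfrac{\Gamma \vdash M : \theta \qquad \theta \le \theta'}{\Gamma \vdash M : \theta'}\ \text{Subtyping}
\]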

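Note that λc.c||c is typeable in SCC, but the type is not the same as for λc.c; c:

\[
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{c : \mathsf{com} \vdash c : \mathsf{com} \qquad \vdash \mathsf{par} : \mathsf{com} \to \mathsf{com} \to \mathsf{com}}
            {c : \mathsf{com} \vdash \mathsf{par}\ c : \mathsf{com} \to \mathsf{com}}
      \qquad
      d : \mathsf{com} \vdash d : \mathsf{com}
    }{c : \mathsf{com}, d : \mathsf{com} \vdash \mathsf{par}\ c\ d : \mathsf{com}}
  }{c : \mathsf{com}^{2} \vdash \mathsf{par}\ c\ c : \mathsf{com}}
}{\vdash \lambda c : \mathsf{com}^{2}.\,\mathsf{par}\ c\ c : \mathsf{com}^{2} \to \mathsf{com}}
\]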
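The bound bookkeeping in this derivation can be traced with a few lines of code. This is a minimal sketch, not the type checker itself: environments are reduced to dictionaries from identifier names to bounds (the underlying types are ignored), and the helper names `scale`, `apply_env` and `contract` are hypothetical.

```python
def scale(n, env):
    """n · Γ: multiply every bound in the environment by n."""
    return {x: n * bound for x, bound in env.items()}


def apply_env(fun_env, n, arg_env):
    """Environment of MN when M : θ^n → θ′, i.e. Γ, n · Γ′ (identifiers must be disjoint)."""
    if fun_env.keys() & arg_env.keys():
        raise ValueError("environments must have disjoint identifiers")
    return {**fun_env, **scale(n, arg_env)}


def contract(env, x, y):
    """Contraction: merge y into x, adding the two bounds."""
    merged = {name: bound for name, bound in env.items() if name != y}
    merged[x] = env[x] + env[y]
    return merged


# par : com → com → com has unit bounds, so each application uses n = 1.
env_par_c = apply_env({}, 1, {"c": 1})            # par c
env_par_c_d = apply_env(env_par_c, 1, {"d": 1})   # par c d
env_par_c_c = contract(env_par_c_d, "c", "d")     # par c c
print(env_par_c_c)
```

The final environment assigns `c` the bound 2, matching the judgement for par c c in the derivation.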

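As in SCI, a general fix-point combinator cannot be typed. All other constants are inherited from ICA and receive, where needed, unit bound, except that newvar and newsem are replaced by a family of constants which can accept a local variable (semaphore, respectively) for any given bound:

\[
\begin{aligned}
\mathsf{newvar} &: (\mathsf{var}^{n} \to \mathsf{com}) \to \mathsf{com} \\
\mathsf{newvar} &: (\mathsf{var}^{n} \to \mathsf{exp}) \to \mathsf{exp} \\
\mathsf{newsem} &: (\mathsf{sem}^{n} \to \mathsf{com}) \to \mathsf{com} \\
\mathsf{newsem} &: (\mathsf{sem}^{n} \to \mathsf{exp}) \to \mathsf{exp}.
\end{aligned}
\]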

The SCC type system can be used to write a large variety of imperative, concurrent or higher order programs such as producer-consumer (Ghica and Murawski, 2006). In practice the restriction is felt mainly because general recursion is ruled out, and in the definition of higher-order functions, where concurrency bounds must be specified by the programmer. Without type inference the programmer needs to specify guarantees as well, which is inconvenient and will also be addressed in this paper. Consider for example a producer-consumer program where the producer and the consumer are given as arguments. In ICA such a program would have signature

\[
\lambda \mathit{prod} : \mathsf{exp}.\,\lambda \mathit{cons} : \mathsf{exp} \to \mathsf{com}.\,M
\]

whereas in SCC it must be given as, for example,

\[
\lambda \mathit{prod} : \mathsf{exp}.\,\lambda \mathit{cons} : \mathsf{exp}^{1} \to \mathsf{com}.\,M,
\]

which means that any argument consumer is not allowed to use its own argument more than once in non-sequential contexts.

Example 2.1

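1. $\vdash \lambda f.\lambda x.f(f(x)) : (\mathsf{com}^{n} \to \mathsf{com})^{n+1} \to \mathsf{com}^{n^{2}} \to \mathsf{com}$

2. $\vdash \lambda f.\lambda x.f(x); f(x) : (\mathsf{com}^{n} \to \mathsf{com})^{1} \to \mathsf{com}^{n} \to \mathsf{com}$

3. $\vdash \lambda f.\lambda x.f(x) \,||\, f(x) : (\mathsf{com}^{n} \to \mathsf{com})^{2} \to \mathsf{com}^{2n} \to \mathsf{com}$

4. $\vdash \lambda f.f(f(\mathsf{skip})) : (\mathsf{com}^{n} \to \mathsf{com})^{n+1} \to \mathsf{com}$

5. $\vdash \lambda g.g(\lambda x.g(\lambda y.x)) : ((\mathsf{com}^{n} \to \mathsf{com})^{n} \to \mathsf{com})^{n+1} \to \mathsf{com}$.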

In the above, n is a positive integer constant which must be provided by the environment (the user).

The term “concurrency bound” is meant to encompass two kinds of run-time interleaving. The first is the genuine concurrency expressed in a term such as f(x)||f(x) and the second is that occurring in a term such as f(f(x)). We feel justified in calling the latter “concurrent” because the computations associated with the two instances of f are interleaved during execution. In fact, in loc. cit. it is shown how a term of that form can be syntactically “pulled apart” using semaphores and side-effects into a term of shape · · · f · · · || · · · f · · · || · · · x · · · which has precisely the same interleaving of effects as the original.

Note that some terms are not typeable, for example the application of term (5) to term (4) above. Many ICA programs have SCC typing; in fact all beta-normal form ICA programs and all ICA programs with beta-redexes of first order or base types are SCC typeable (Ghica et al., 2006, Lemma 10). Loc. cit. also gives a fully abstract game semantic model of SCC.

3 Type inference for SCC

SCC is an assume-guarantee type system where the assume bounds must be provided by the context but the guarantee can be computed via type inference, rather than being supplied by the programmer. In this section we will give a decidability result for SCC type inference. Without loss of generality we will restrict the algorithm to higher-order closed terms. ICA typing can be determined with a variant of a Hindley-Milner-style algorithm, and we assume that this has been done. It is well known that affine typing plus explicit contraction is as expressive as conventional typing and we will assume our ICA type inference algorithm uses this strategy. The details of such an algorithm are standard and will be omitted.

We further assume that each type θ → θ′ is decorated with a fresh variable n, as in $\theta^{n} \to \theta'$, to form a skeleton for an SCC type. For instance, the type (com → com) → com will be annotated as $(\mathsf{com}^{n_1} \to \mathsf{com})^{n_2} \to \mathsf{com}$. The variables which occur in covariant positions in the type of the term are called assumes and those that occur in contravariant positions guarantees.

Once this annotated type expression is constructed the next step is to obtain a set of numeric constraints on the bounds. This is done by defining a function recursively on the derivation tree, which produces a set of constraints as a result. First we define the constraints at the level of the type system:

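\[
\begin{aligned}
|\theta^{n}| &= \{n\} \\
|\theta \times \theta'| &= |\theta| \cup |\theta'| \\
|\theta^{n} \to \theta'| &= \{n\} \cup |\theta'| \\
|\theta_1^{n_1} \ge \theta_2^{n_2}| &= (n_1 \ge n_2) \wedge |\theta_1| \ge |\theta_2| \\
|\gamma_1 \to \theta_1 \ge \gamma_2 \to \theta_2| &= |\gamma_2 \ge \gamma_1| \wedge |\theta_1 \ge \theta_2| \\
|\Gamma \ge \Gamma'| &= \bigwedge_{\substack{x : \gamma \in \Gamma \\ x : \gamma' \in \Gamma'}} |\gamma \ge \gamma'|
\end{aligned}
\]

The notation can be extended to |Γ = Γ′| in the obvious way.

The required constraints are indicated as annotations on the derivation rules: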

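\[
x : (\theta')^{n} \vdash x : \theta \ \blacktriangleright\ n \ge 1 \wedge |\theta' \ge \theta|
\qquad
\emptyset \vdash k : \theta \ \blacktriangleright\ \bigwedge_{n \in |\theta|} n \ge 1
\]

\[
\dfrac{\Gamma' \vdash M : \theta \ \blacktriangleright\ C}{\Gamma, x : (\theta')^{n} \vdash M : \theta \ \blacktriangleright\ |\Gamma = \Gamma'|}
\qquad
\dfrac{\Gamma', x : \gamma \vdash M : \theta \ \blacktriangleright\ C}{\Gamma \vdash \lambda x : \gamma.M : \gamma \to \theta \ \blacktriangleright\ |\Gamma = \Gamma'|}
\]

\[
\dfrac{\Gamma' \vdash M_i : \theta_i \ \blacktriangleright\ C_i}{\Gamma \vdash \langle M_1, M_2 \rangle : \theta_1 \times \theta_2 \ \blacktriangleright\ |\Gamma = \Gamma'|}
\]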

The key rules are for contraction

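\[
\dfrac{\Gamma', x : \theta^{n_1}, y : \theta^{n_2} \vdash M : \theta' \ \blacktriangleright\ C}{\Gamma, x : \theta^{n} \vdash M[x/y] : \theta' \ \blacktriangleright\ |\Gamma = \Gamma'| \wedge n \ge n_1 + n_2}
\]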

and application

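\[
\dfrac{\Gamma \vdash M : \theta_1^{k} \to \theta_2 \ \blacktriangleright\ C \qquad \Delta \vdash N : \theta_1' \ \blacktriangleright\ C'}
{\Gamma', \Delta' \vdash MN : \theta_2' \ \blacktriangleright\ |\Delta \le \Delta'| \wedge |\Gamma = \Gamma'| \wedge \bigwedge_{\substack{x : \theta^{n} \in \Delta \\ x : \theta^{n'} \in \Delta'}} n' \ge k \cdot n}
\]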

Note that the environments Γ, Γ′ and ∆, ∆′ in each rule above differ only in the choice of variables used as bounds.

In Fig. 1 we show an example annotated derivation tree, for the term λfx.f(f(x)).

Given a closed term M and its derivation tree, let C(M) be the constraint system generated from the conjunction of all the annotations in the derivation tree. For the example in Fig. 1, C(λfx.f(f(x))) is

\[
n_2 \ge n_4 + n_5 \wedge n_5 \ge n_1 \cdot n_7 \wedge n_6 \ge n_1 \wedge n_3 \ge n_1 \cdot n_8 \wedge n_8 \ge n_6 \cdot n_9 \wedge n_9 \ge 1 \wedge n_7 \ge 1 \wedge n_4 \ge 1
\]
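As an illustration, this constraint system can be written down directly and a candidate assignment of bounds checked against it. The encoding as Python predicates and the candidate values (with the assume n1 set to 2, an arbitrary choice made here for the example) are assumptions of this sketch, not part of the inference algorithm.

```python
# Each entry is one conjunct of C(λfx.f(f(x))), as a predicate over an assignment v.
CONSTRAINTS = [
    lambda v: v["n2"] >= v["n4"] + v["n5"],
    lambda v: v["n5"] >= v["n1"] * v["n7"],
    lambda v: v["n6"] >= v["n1"],
    lambda v: v["n3"] >= v["n1"] * v["n8"],
    lambda v: v["n8"] >= v["n6"] * v["n9"],
    lambda v: v["n9"] >= 1,
    lambda v: v["n7"] >= 1,
    lambda v: v["n4"] >= 1,
]


def satisfies(assignment):
    """True if the assignment of bounds satisfies every conjunct."""
    return all(constraint(assignment) for constraint in CONSTRAINTS)


# Hypothetical candidate, worked out by hand for the assume n1 = 2.
candidate = {"n1": 2, "n2": 3, "n3": 4, "n4": 1, "n5": 2,
             "n6": 2, "n7": 1, "n8": 2, "n9": 1}
print(satisfies(candidate))
print(satisfies({**candidate, "n3": 3}))  # lowering n3 violates n3 >= n1 * n8
```

A checker of this kind only validates a proposed solution; finding one is the subject of the algorithm that follows.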

We say that a constraint system is solved for a given mapping of assumes into non-negative integer constants if constant bounds for all the guarantees consistent with the constraint system can be found if they exist. The mapping of guarantees into non-negative integers is the solution of the system.

The first result of this paper is:

Theorem 3.1 For any closed ICA term M it is decidable whether C(M) can be solved, in which case a solution can be constructed.

Proof. This theorem is proved by giving an algorithm for solving the constraint system then showing that the algorithm terminates.

We start by substituting all assumes with the provided constants. Note that all inequations in the constraint system have one of the following forms:

\[
n \ge n' + n'' \qquad n \ge n' \cdot n'' \qquad n \ge n' \qquad n \ge k,\ k \in \mathbb{N}.
\]

We define the relation n ≻ n′ to hold if n appears on the left of a constraint and n′ on the right. (In what follows, the order of arguments to + and · is irrelevant for the purpose of working out if a constraint is of a particular form.) The algorithm is:

1. Construct the set S of solutions of the system in the abstract domain of positive integers formed by quotienting $\mathbb{Z}_{0,+}$ over the equivalence relation a ≡ b ⟺ a = b ∨ (a ≥ 2 ∧ b ≥ 2), defining +, ·, ≥ on this domain in the obvious way.

2. For all constraint systems in S repeat:

   (a) Replace all variables assigned 0 (1, respectively) at (1) with 0 (1, respectively).

   (b) Add equations n = 0 or n = 1 respectively for each such replacement.

   (c) Delete all inequations of the form n ≥ 0 · n′, and replace all inequations of the form n ≥ 0 + n′ and n ≥ 1 · n′ with n ≥ n′.

   (d) Construct the ≻ relation on its set of variables.

   (e) Repeat until ≻ is well founded:

       i. Pick a cycle $n_1 \succ n_2 \cdots n_j \succ n_1$, or a single variable $n_1$ where $n_1 \succ n_1$ (treating it as a one-element cycle).

       ii. If the system contains any inequations of the form n ≥ n′ + n′′, n ≥ n′ + k, n ≥ n′ · n′′, or n ≥ n′ · k, where n, n′, n′′ are variables involved in the cycle and k is a constant, discard the current solution and break to (2).

       iii. Pick a fresh variable n.

       iv. Replace all occurrences of $n_i$ with n, adding equations of the form $n_i = n$ for each such replacement.

       v. Delete all inequations of the form n ≥ n.

   (f) Repeat until all RHSs of all equations are constant expressions:

       i. Choose a ≻-minimal element n.

       ii. Let E be the set of RHSs of inequations having n on the LHS, and which are all constants.

       iii. Replace all occurrences of n on the RHS of any equation or inequation with the maximum element of E.

   (g) Report the resulting set of equations and inequations as one possible solution to the constraint system.

Correctness. The key point of the correctness argument is that at step (2e), all remaining constraints of the form n ≥ n′ + n′′ and n ≥ n′ · n′′ imply that n > n′ and n > n′′, and thus n ≻ n′ implies that n ≥ n′. This has two consequences:

• In any cycle $n_1 \succ n_2 \cdots n_k \succ n_1$ all variables are greater than or equal to each other, and so must be equal, hence step (2(e)iii).

• Constraints as in step (2(e)ii) cannot be satisfied, and so show that the solution to the constraints in the {0, 1, ≥ 2} number system that is currently being tried is impossible.

If the dependency is well founded then given the form of the inequations in the system we can proceed to variable elimination via substitution in a straightforward way.
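The final substitution phase can be sketched for the simple case in which the dependency relation is already well founded. The code below is a rough illustration, not the full algorithm: it skips the abstract-domain search and the cycle elimination, the tuple encoding of inequations is hypothetical, and the assume n1 = 2 is chosen here only as sample input. It evaluates the constraint system of the term λfx.f(f(x)), taking for each variable the maximum of its lower bounds.

```python
from math import prod

# Each inequation "lhs >= op(args)"; args are variable names or integer constants.
# A bare bound such as n >= 1 or n >= n' is written as a one-argument sum.
SYSTEM = [
    ("n2", "+", ["n4", "n5"]),
    ("n5", "*", ["n1", "n7"]),
    ("n6", "+", ["n1"]),
    ("n3", "*", ["n1", "n8"]),
    ("n8", "*", ["n6", "n9"]),
    ("n9", "+", [1]),
    ("n7", "+", [1]),
    ("n4", "+", [1]),
]


def least_solution(system, assumes):
    """Solve an acyclic system by substitution, resolving dependencies first."""
    values = dict(assumes)

    def value(term):
        if isinstance(term, int):
            return term
        if term not in values:
            bounds = [combine(op, args)
                      for lhs, op, args in system if lhs == term]
            values[term] = max(bounds, default=0)
        return values[term]

    def combine(op, args):
        operands = [value(arg) for arg in args]
        return sum(operands) if op == "+" else prod(operands)

    for lhs, _, _ in system:
        value(lhs)
    return values


solution = least_solution(SYSTEM, {"n1": 2})
print(solution["n2"], solution["n3"])
```

The recursion terminates only because no variable depends on itself here; on a cyclic system it would not, which is why the cycle-elimination step has to come first.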

Termination. The algorithm contains three loops; the outside loop always terminates because it iterates over a finite set, and the inside loops always terminate because they always reduce either the number of variables, or the number of constraints. We take the empty relation to be trivially well founded. □

In our running example suppose we take the assume to be $n_1 = 2$, i.e. function f can use its argument in two concurrent contexts. Solving the system of constraints gives the following typing where the guarantees $n_2, n_3$ are given the smallest possible values:

\[
\lambda f.\lambda x.f(f(x)) : (\mathsf{com}^{2} \to \mathsf{com})^{3} \to \mathsf{com}^{4} \to \mathsf{com}.
\]

This means that the term will use f in at most 3 non-sequential contexts and x in 4.

Finally, note that a purely symbolic solution of the constraint system, which does not require the assumes given as constants, is not always straightforward, so types cannot always be presented as in Ex. 2.1. The reason is that for some (pathological) terms, giving the type symbolically would require a large number of cases. There are even terms that type in some such cases but not others; one such example is the term

λq.(λg.g(λx.g(qx)))(λb.(λk.((k(λu.u))(λl.((kb)(λt.(l(t skip)))))))(λv.λw.wv))

: (comm → comn → com)n2 → com,

which types only if m ≤ 1.

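\[
\dfrac{
  \dfrac{
    \dfrac{
      f : (\mathsf{com}^{n_1} \to \mathsf{com})^{n_4} \vdash f : \mathsf{com}^{n_1} \to \mathsf{com} \ \blacktriangleright\ n_4 \ge 1
      \qquad
      \dfrac{
        g : (\mathsf{com}^{n_6} \to \mathsf{com})^{n_7} \vdash g : \mathsf{com}^{n_6} \to \mathsf{com} \ \blacktriangleright\ n_7 \ge 1
        \qquad
        x : \mathsf{com}^{n_9} \vdash x : \mathsf{com} \ \blacktriangleright\ n_9 \ge 1
      }{
        g : (\mathsf{com}^{n_6} \to \mathsf{com})^{n_7}, x : \mathsf{com}^{n_8} \vdash g(x) : \mathsf{com} \ \blacktriangleright\ n_8 \ge n_6 \cdot n_9
      }
    }{
      f : (\mathsf{com}^{n_1} \to \mathsf{com})^{n_4}, g : (\mathsf{com}^{n_1} \to \mathsf{com})^{n_5}, x : \mathsf{com}^{n_3} \vdash f(g(x)) : \mathsf{com} \ \blacktriangleright\ n_5 \ge n_1 \cdot n_7 \wedge n_6 \ge n_1 \wedge n_3 \ge n_1 \cdot n_8
    }
  }{
    f : (\mathsf{com}^{n_1} \to \mathsf{com})^{n_2}, x : \mathsf{com}^{n_3} \vdash f(f(x)) : \mathsf{com} \ \blacktriangleright\ n_2 \ge n_4 + n_5
  }
}{
  \vdash \lambda f \lambda x.f(f(x)) : (\mathsf{com}^{n_1} \to \mathsf{com})^{n_2} \to \mathsf{com}^{n_3} \to \mathsf{com} \ \blacktriangleright\ \mathit{true}
}
\]

Figure 1: Annotated SCC derivation tree for λfx.f(f(x))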

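\[
\dfrac{
  \dfrac{
    \dfrac{
      f : (\mathsf{com}^{2} \to \mathsf{com})^{1} \vdash f \ \Longmapsto\ f_3 : \mathsf{com} \to \mathsf{com} \to \mathsf{com} \vdash f_3
      \qquad
      \dfrac{
        g : (\mathsf{com}^{2} \to \mathsf{com}) \vdash g \ \Longmapsto\ g : \mathsf{com} \to \mathsf{com} \to \mathsf{com} \vdash g
        \qquad
        x : \mathsf{com}^{1} \vdash x \ \Longmapsto\ x : \mathsf{com} \vdash x
      }{
        g : (\mathsf{com}^{2} \to \mathsf{com}), x : \mathsf{com}^{2} \vdash g(x) \ \Longmapsto\ g : \mathsf{com} \to \mathsf{com} \to \mathsf{com}, x_1, x_2 : \mathsf{com} \vdash g(x_1 x_2)
      }
    }{
      f : (\mathsf{com}^{2} \to \mathsf{com})^{1}, g : (\mathsf{com}^{2} \to \mathsf{com})^{2}, x : \mathsf{com}^{4} \vdash f(fx) \ \Longmapsto\ f_3, g, g' : \mathsf{com} \to \mathsf{com} \to \mathsf{com}, x_1, x_2, x_1', x_2' : \mathsf{com} \vdash f_3(g x_1 x_2)(g' x_1' x_2')
    }
  }{
    f : (\mathsf{com}^{2} \to \mathsf{com})^{3}, x : \mathsf{com}^{4} \vdash f(fx) \ \Longmapsto\ f_3, f_1, f_2 : (\mathsf{com} \to \mathsf{com} \to \mathsf{com}) \to \mathsf{com}, x_1, x_2, x_1', x_2' : \mathsf{com} \vdash f_3(f_1 x x)(f_2 x x)
  }
}{
  \vdash \lambda f x.f(f(x)) \ \Longmapsto\ \vdash \lambda f_3 f_2 f_1 x_1 x_2 x_1' x_2'.\, f_3(f_1 x_1 x_2)(f_2 x_1' x_2')
}
\]

Figure 2: Serialization of λfx.f(f(x))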

4 Mapping to SCC(1): serialisation

Using GoS, we know how to compile into hardware terms that only use contraction in sequential contexts (Ghica and Smith, 2010). However, contraction in concurrent contexts cannot be compiled and must be replaced by systematic replication of resources.

We do that by translating any SCC-typed term into another SCC-typed term in which all bounds are set to the unit value. We call this type system SCC(1). Perhaps surprisingly, this is actually possible provided that we introduce new, multivariate binders for assignable variables. In this section we present the translation, which we call serialisation because it results in a term in which all identifiers are used sequentially.

First an informal introduction. Suppose that we want to compile the term in our running example, λf.λx.f(f(x)). We know that if $f : \mathsf{com}^{2} \to \mathsf{com}$ then the term has type

\[
(\mathsf{com}^{2} \to \mathsf{com})^{3} \to \mathsf{com}^{4} \to \mathsf{com},
\]

i.e. it uses 3 instances of f non-sequentially and 4 of x.

In hardware, contraction is used to mediate access to a shared piece of circuitry from several points in a client. When sharing is not possible then a circuit can be replicated as much as needed. We will take the same approach in the programming language, by replicating identifiers with bounds larger than the unit. The SCC bounds in fact tell us precisely how many instances of an identifier must be generated, because the bounds represent the maximum number of identifiers used in parallel at any given moment.

At the level of types, $\theta^{n} \to \theta'$ becomes θ → θ → · · · → θ′; note that the expanded type must not be θ × θ → θ′, as product allows contraction.

This means that our argument f must be changed to have type com → com → com, and we will need three instances of it, f1, f2, f3. The type of x does not change, but we will need 4 instances of it, x1, . . . , x4. The serialised form of the term is λf1f2f3x1x2x3x4.f1(f2x1x2)(f3x3x4).

An obstacle in the way of a straightforward transformation is the existence of storage types (var, sem) which can be used from non-sequential contexts. For example, how can the term

\[
\mathsf{newvar}\ x.\ x := 0 \,||\, x := 1 : \mathsf{com} \ \equiv\ \mathsf{newvar}(\lambda x : \mathsf{var}^{2}.\ x := 0 \,||\, x := 1) : \mathsf{com}
\]

be serialized when the obvious serialization of the body is λx1x2.x1 := 0 || x2 := 1? We can do that simply by generalising the local variable binder itself and allowing it to bind several identifiers to the same memory location.

We can now define the transformation in a systematic manner, inductively on the SCC type derivation inferred in the previous section. We denote the transformation operation by ⟾ and we define type level translation as

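\[
\begin{aligned}
\overline{\sigma} &= \sigma \\
\overline{\theta^{1} \to \theta'} &= \overline{\theta} \to \overline{\theta'} \\
\overline{\theta^{n} \to \theta'} &= \overline{\theta} \to \overline{\theta^{n-1} \to \theta'} \\
\overline{\theta \times \theta'} &= \overline{\theta} \times \overline{\theta'}.
\end{aligned}
\]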
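The type level translation can be sketched as a short recursive function. The representation below (`Base`, `Prod`, `Fun`, `serialise`, `show`) is hypothetical and only meant to make the unfolding of a bounded arrow into unit-bound arrows concrete; it assumes every bound is at least 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    name: str


@dataclass(frozen=True)
class Prod:
    left: object
    right: object


@dataclass(frozen=True)
class Fun:
    arg: object      # theta in  theta^bound -> res
    bound: int
    res: object


def serialise(t):
    """Translate an SCC type into one in which every bound is 1."""
    if isinstance(t, Base):
        return t
    if isinstance(t, Prod):
        return Prod(serialise(t.left), serialise(t.right))
    # theta^n -> theta' unfolds into n unit-bound arrows over the translated theta.
    result = serialise(t.res)
    arg = serialise(t.arg)
    for _ in range(t.bound):
        result = Fun(arg, 1, result)
    return result


def show(t):
    """Print a serialised type; bounds are omitted because they are all 1."""
    if isinstance(t, Base):
        return t.name
    if isinstance(t, Prod):
        return f"({show(t.left)} x {show(t.right)})"
    arg = show(t.arg)
    if isinstance(t.arg, Fun):
        arg = f"({arg})"
    return f"{arg} -> {show(t.res)}"


com = Base("com")
running_example = Fun(Fun(com, 2, com), 3, Fun(com, 4, com))
print(show(serialise(running_example)))
```

For the running example type this yields three arguments of type com -> com -> com followed by four arguments of type com, in line with the three instances of f and four of x described informally above.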

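For constants we define $\overline{k : \theta} = k : \overline{\theta}$ except for

\[
\begin{aligned}
\overline{\mathsf{newvar} : (\mathsf{var}^{n} \to \mathsf{com}) \to \mathsf{com}}
&= \mathsf{newvar}_{n} : \overline{(\mathsf{var}^{n} \to \mathsf{com}) \to \mathsf{com}} \\
&= \mathsf{newvar}_{n} : (\underbrace{\mathsf{var} \to \cdots \to \mathsf{var}}_{n\ \text{times}} \to \mathsf{com}) \to \mathsf{com}.
\end{aligned}
\]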

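Similarly for the other binders.

We need an ancillary transformation in cases when multiple variables could be assigned different SCC types. A simple example is λg.g(λx.x); g(λy.y||y), because both occurrences of g must be assigned the same SCC type, so its arguments have to have the same SCC type as well. Because the second argument is transformed to λy1y2.y1||y2, the first argument must receive a dummy variable and be rewritten to λx2x1.x1. The subtype() construction inserts dummy variables whenever needed:

\[
\begin{aligned}
\mathit{subtype}_{\theta \le \theta}(M) &\stackrel{\mathrm{def}}{=} M \\
\mathit{subtype}_{\theta_1^{m} \to \theta_3 \le \theta_2^{n} \to \theta_3}(M) &\stackrel{\mathrm{def}}{=} \lambda x.\,\mathit{subtype}_{\theta_1^{m+1} \to \theta_3 \le \theta_2^{n} \to \theta_3}(M) \\
\mathit{subtype}_{\theta_1^{n} \to \theta_3 \le \theta_2^{n} \to \theta_3}(M) &\stackrel{\mathrm{def}}{=} \lambda x_1 \ldots x_n.\,M\big(\mathit{subtype}_{\theta_2 \le \theta_1}(x_1) \ldots \mathit{subtype}_{\theta_2 \le \theta_1}(x_n)\big) \\
\mathit{subtype}_{\theta_3^{n} \to \theta_1 \le \theta_3^{n} \to \theta_2}(M) &\stackrel{\mathrm{def}}{=} \lambda x_1 \ldots x_n.\,\mathit{subtype}_{\theta_1 \le \theta_2}(M x_1 \ldots x_n) \\
\mathit{subtype}_{\theta_1 \to \theta_2 \le \theta_3 \to \theta_4}(M) &\stackrel{\mathrm{def}}{=} \mathit{subtype}_{\theta_3 \to \theta_2 \le \theta_3 \to \theta_4}\big(\mathit{subtype}_{\theta_1 \to \theta_2 \le \theta_3 \to \theta_2}(M)\big)
\end{aligned}
\]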
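The effect of inserting dummy variables can be mimicked with ordinary curried functions. This is only an analogy in Python, not the subtype() construction itself: `pad_arguments` is a hypothetical helper that wraps a curried function so that it first accepts, and ignores, a given number of extra arguments, as in the rewriting of λx.x to λx2x1.x1.

```python
def pad_arguments(fn, extra):
    """Wrap curried `fn` so that it first accepts and ignores `extra` dummy arguments."""
    if extra == 0:
        return fn
    # One more binder in front, then recurse, mirroring the clause that
    # adds a single λx and continues with a bound raised by one.
    return lambda _dummy: pad_arguments(fn, extra - 1)


identity = lambda x1: x1                # stands for λx.x
padded = pad_arguments(identity, 1)     # behaves like λx2.λx1.x1
print(padded("unused")("kept"))
```

After padding, both arguments of g in the example take the same number of parameters, which is what lets the two occurrences of g share one serialised type.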

We define the transformation as follows:

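\[
x : \theta^{n} \vdash x : \theta \ \Longmapsto\ x : \overline{\theta} \vdash x : \overline{\theta}
\qquad
\vdash k : \theta \ \Longmapsto\ \overline{k : \theta}
\]

\[
\dfrac{\Gamma \vdash M : \theta_2 \ \Longmapsto\ \Gamma' \vdash M' : \overline{\theta_2}}
{\Gamma, x : \theta_1^{n} \vdash M : \theta_2 \ \Longmapsto\ \Gamma', x_1 : \overline{\theta_1}, \ldots, x_n : \overline{\theta_1} \vdash M' : \overline{\theta_2}}
\]

\[
\dfrac{\Gamma, x : \theta_1^{n} \vdash M : \theta_2 \ \Longmapsto\ \Gamma', x_1 : \overline{\theta_1}, \ldots, x_n : \overline{\theta_1} \vdash M' : \overline{\theta_2}}
{\Gamma \vdash \lambda x.M : \theta_1^{n} \to \theta_2 \ \Longmapsto\ \Gamma' \vdash \lambda x_1 \cdots x_n.M' : \overline{\theta_1^{n} \to \theta_2}}
\]

\[
\dfrac{\Gamma \vdash M_i : \theta_i \ \Longmapsto\ \Gamma' \vdash M_i' : \overline{\theta_i}}
{\Gamma \vdash \langle M_1, M_2 \rangle : \theta_1 \times \theta_2 \ \Longmapsto\ \Gamma' \vdash \langle M_1', M_2' \rangle : \overline{\theta_1 \times \theta_2}}
\]

\[
\dfrac{\Gamma \vdash M : \theta_1 \ \Longmapsto\ \Gamma' \vdash M' : \overline{\theta_1} \qquad \theta_1 \le \theta_2}
{\Gamma \vdash M : \theta_2 \ \Longmapsto\ \Gamma \vdash \mathit{subtype}_{\theta_1 \le \theta_2}(M') : \overline{\theta_2}}
\]

\[
\dfrac{\Gamma, x : \theta_1^{n_1}, y : \theta_1^{n_2} \vdash M : \theta_2 \ \Longmapsto\ \Gamma', x_1 : \overline{\theta_1}, \ldots, x_{n_1} : \overline{\theta_1}, y_1 : \overline{\theta_1}, \ldots, y_{n_2} : \overline{\theta_1} \vdash M' : \overline{\theta_2}}
{\Gamma, x : \theta_1^{n_1 + n_2} \vdash M[x/y] : \theta_2 \ \Longmapsto\ \Gamma', x_1 : \overline{\theta_1}, \ldots, x_{n_1 + n_2} : \overline{\theta_1} \vdash M'[x_{n_1 + 1}/y_1] \cdots [x_{n_1 + n_2}/y_{n_2}] : \overline{\theta_2}}
\]

\[
\dfrac{\Gamma \vdash M : \theta_1^{n} \to \theta_2 \ \Longmapsto\ \Gamma' \vdash M' : \overline{\theta_1^{n} \to \theta_2}
\qquad
\Delta \vdash N : \theta_1 \ \Longmapsto\ \Delta' \vdash N' : \overline{\theta_1}}
{\Gamma, n \cdot \Delta \vdash MN : \theta_2 \ \Longmapsto\ \Gamma', \Delta_1', \ldots, \Delta_n' \vdash M'(N'[\Delta_1'/\Delta']) \cdots (N'[\Delta_n'/\Delta']) : \overline{\theta_2}}
\]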

In the last rule, given an identifier type assignment ∆′, by $\Delta_k'$ we understand an identifier type assignment isomorphic to ∆′ where all the identifiers are fresh. The substitution $N'[\Delta_k'/\Delta']$ replaces all the identifiers in N′ which occur in dom(∆′) with the corresponding fresh identifier from dom($\Delta_k'$).
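The replication performed by the application rule can be sketched on a toy term representation. Everything here is an assumption made for illustration: terms are binder-free (`Var` and `App` only), every identifier of the argument is taken to belong to the argument's environment, and freshness is approximated by appending the copy index to the name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def fresh_copy(term, k):
    """Rename every identifier x of the argument to x_k (the k-th fresh copy)."""
    if isinstance(term, Var):
        return Var(f"{term.name}_{k}")
    return App(fresh_copy(term.fun, k), fresh_copy(term.arg, k))


def replicate_application(fun, arg, n):
    """Apply `fun` to n copies of `arg`, each over its own fresh identifiers."""
    result = fun
    for k in range(1, n + 1):
        result = App(result, fresh_copy(arg, k))
    return result


def show(term):
    if isinstance(term, Var):
        return term.name
    arg = show(term.arg)
    if isinstance(term.arg, App):
        arg = f"({arg})"
    return f"{show(term.fun)} {arg}"


inner = replicate_application(Var("g"), Var("x"), 2)
outer = replicate_application(Var("f"), inner, 2)
print(show(inner))
print(show(outer))
```

Each copy of the argument ends up with its own identifiers, so no identifier is shared between the replicated arguments, which is the property the SCC(1) application rule needs.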

The correctness of the transformation is formulated as:

Theorem 4.1 If Γ ⊢ M : θ is a valid SCC term and Γ ⊢ M : θ ⟾ Γ′ ⊢ M′ : θ′ then Γ′ ⊢ M′ : θ′ is a valid SCC(1) term.

Moreover, if M is a program then M may terminate if and only if M′ may terminate.

Proof. The proof of the first part of the theorem is by structural induction; that of the second, by showing that any sequence of reductions in the operational semantics of an SCC term corresponds to a sequence of reductions in the corresponding SCC(1) term, and vice versa. The small-step operational semantics of ICA is given in Ghica and Murawski (2008) and is the obvious one. We do not include it here for lack of space. SCC and SCI have essentially the same operational semantics as ICA.

Typing. As our induction hypothesis, we take the following (stronger) hypothesis: for Γ ⊢ M : θ a valid SCC term, and Γ ⊢ M : θ ⟾ Γ′ ⊢ M′ : θ′, then Γ′ ⊢ M′ : θ′ is a valid SCC(1) term, $\theta' = \overline{\theta}$ (where $\overline{\theta}$ denotes the type level translation of θ), and Γ′ is Γ with all variables $x : \theta_1^{n}$ replaced with at most n copies of $x_i : \overline{\theta_1}$.

It is first necessary to prove that for any SCC type θ, $\overline{\theta}$ is an SCC(1) type, but this is obvious from the definition of type level translation, because it cannot produce any bounds greater than 1.

The base case is trivially true by definition for the second rule; for the first rule, the knowledge that the LHS is typed correctly implies that n is at least 1, and thus the typing is correct (via the identity axiom), $\theta' = \overline{\theta}$ by definition, and the use of 1 copy of x fulfils the requirement to have at most n copies with n ≥ 1.

In the case of Contraction note that $N'[\Delta_k'/\Delta']$ must have the same SCC type as N′ because it is a replacement of variables in N′ with other variables of the same type. None of the $N'[\Delta_k'/\Delta']$ share free variables with M′ by definition, and because the type of M′ is $\overline{\theta_1^{n} \to \theta_2}$, the type of $M'(N'[\Delta_1'/\Delta'])$ is $\overline{\theta_1^{n-1} \to \theta_2}$ etc., until the type of $M'(N'[\Delta_n'/\Delta'])$ is $\overline{\theta_2}$, which proves the case of Contraction.

The key cases are Subtyping and Application.

For Application, observe that $N'[\Delta_k'/\Delta']$ must have the same SCC type as N′ because it is a replacement of variables in N′ with other variables of the same type. None of the $N'[\Delta_k'/\Delta']$ share free variables with M′ by definition, and because the type of M′ is $\overline{\theta_1^{n} \to \theta_2}$, the type of $M'(N'[\Delta_1'/\Delta'])$ is $\overline{\theta_1^{n-1} \to \theta_2}$, and so on, until the type of $M'(N'[\Delta_n'/\Delta'])$ is $\overline{\theta_2}$, proving that the eighth rule creates a correctly typed SCC(1) expression.

Correctness of Subtyping is proved by induction on the hypothesis, i.e. if $M' : \overline{\theta_1}$, then $\mathit{subtype}_{\theta_1 \le \theta_2}(M') : \overline{\theta_2}$. Note that the cases in the definition of subtype mirror the cases in the definition of ≤ on SCC types (with the third and fourth cases being special cases of the fifth, needed to make the recursion well-founded), and thus $\mathit{subtype}_{\theta_1 \le \theta_2}$ is defined whenever $\theta_1 \le \theta_2$. The base case (the first definition) is degenerately true, the fifth case is true by definition, and the second, third, and fourth cases are obviously true if the type of x is taken to be $\theta_1$, $\theta_2$, and $\theta_3$ respectively.

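It is important to note also that this transformation is compositional on the syntax, i.e. for any term $\Gamma \vdash M : \theta$ and context $C[-]$ such that $\emptyset \vdash C[M] : \mathsf{com}$, if $\Gamma \vdash M : \theta \Longmapsto \Gamma' \vdash \widetilde{M}' : \theta'$ there exists a context $C'$ such that

$$\emptyset \vdash C[M] : \mathsf{com} \Longmapsto \emptyset \vdash C'[M'] : \mathsf{com}. \tag{1}$$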

Soundness. The following uses the operational semantics (OS) of SCC, which is essentially the same as that of ICA, the obvious combination of CBN lambda calculus, simple imperative language and parallel execution (Ghica and Murawski, 2008). The reduction rules are small-step and the semantics is non-deterministic because of possible race conditions. By “termination” of a program (closed term of com type) M⇓ we mean may-termination, i.e. the existence of a chain of reductions leading to skip. We do not define the OS for lack of space.

The first step of the proof is to eliminate the multivariate binders from the SCC(1) term $\Gamma' \vdash M' : \theta'$; we do that simply by replacing all terms of form $\mathsf{new}_n(\lambda x_1 \ldots x_n.M)$ with $\mathsf{new}(\lambda x.M[x/x_i])$ where $x$ is chosen fresh. We can do this because of the previous result (the correctness of typing). The resulting term is not pure SCC(1) but it must be operationally equivalent by the very definition of the multivariate binder. We denote this transformation by $\widetilde{M}$.
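The binder-elimination step can be pictured as a small syntactic rewrite. The following Python is only a minimal sketch: the tuple-based term representation (`("var", x)`, `("app", f, a)`, `("lam", x, body)`, `("new", x, body)`, `("newn", [x1, ..., xn], body)`) and the helper names `rename` and `eliminate` are assumptions made for illustration and do not come from the paper.

```python
import itertools

_fresh = itertools.count()

def rename(term, mapping):
    """Rename free variables of `term` according to `mapping` (old -> new)."""
    tag = term[0]
    if tag == "var":
        return ("var", mapping.get(term[1], term[1]))
    if tag == "app":
        return ("app", rename(term[1], mapping), rename(term[2], mapping))
    # Binders: "lam"/"new" bind one name, "newn" binds a list of names.
    bound = term[1] if tag == "newn" else [term[1]]
    inner = {k: v for k, v in mapping.items() if k not in bound}
    return (tag, term[1], rename(term[2], inner))

def eliminate(term):
    """Replace every new_n(λx1...xn.M) by new(λx.M[x/xi]) with x fresh."""
    tag = term[0]
    if tag == "var":
        return term
    if tag == "app":
        return ("app", eliminate(term[1]), eliminate(term[2]))
    if tag == "newn":
        x = f"_v{next(_fresh)}"  # assumed not to clash with any source name
        body = rename(term[2], {xi: x for xi in term[1]})
        return ("new", x, eliminate(body))
    return (tag, term[1], eliminate(term[2]))  # "lam" or "new"

example = ("newn", ["x1", "x2"], ("app", ("var", "x1"), ("var", "x2")))
result = eliminate(example)
```

Here `result` is a single-variable `new` binder whose body uses the one fresh name in place of both `x1` and `x2`.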

We define a logical relation $\Gamma \vdash M \mathrel{R_\theta} M'$ between SCC terms such that there exist $\Gamma', \theta'$ such that $\Gamma \vdash M : \theta \Longmapsto \Gamma' \vdash \widetilde{M}' : \theta'$.

For programs, $\emptyset \vdash M \mathrel{R_{\mathsf{com}}} M'$ holds exactly when $M{\Downarrow}$ if and only if $\widetilde{M}'{\Downarrow}$. This is lifted to open terms in the usual manner, i.e. for all command contexts $C$ which accept a term of type $\theta$ and trap free variables in $\Gamma$, $C[M]{\Downarrow}$ if and only if $C'[\widetilde{M}']{\Downarrow}$. We can write this because of (1) and because there is no point in using multivariate binders in the context, so we eliminate them only in the original term.

We induct on the reduction rules of the operational semantics. The base cases (for identity and constants) are true by definition, as the transformation in those cases does not change them or their behaviour. The only new construct, the multivariate binder, has been syntactically eliminated. Most in-context reductions are also trivial. In fact this is also the reason why the rules of the OS can be omitted, because this proof is entirely parametric on them.

The only interesting case is function application when the function has been reduced to a normal form:

$$(\lambda x.M)N, s \longrightarrow M[N/x], s'. \tag{2}$$

The serialised term is

$$\bigl(\mathsf{subtype}_{(\theta_1)^m \to \theta_2 \,\le\, (\theta'_1)^n \to \theta'_2}(\lambda x_1 \cdots x_m.M')\bigr)(N'_1) \cdots (N'_n),$$

where each $N'_k$ is a copy of the same term but with free variables changed.

An argument on the definition of subtype() will show immediately that it is semantically innocuous, only introducing dummy variables and lambdas so that the types match.

Then we notice that the only difference between the two is that reduction 2 executes one reduction consisting of n substitutions, whereas reduction 4 executes n reductions consisting of one substitution each. □

The serialization of λfx.f(f(x)) is shown in Fig. 2, with term types omitted for brevity.

5 Compilation and correctness

Here we will discuss compilation into asynchronous circuits using the Geometry of Synthesis approach.

Consider the typical implementation of a digital half-adder:

[Figure: half-adder with inputs A, B and outputs S, C]

The inputs are A and B and the outputs are the sum S and the carry C:

S = A ⊕ B, C = A ∧ B.

Suppose that the circuit is in an initial state, where A = B = C = S = 0, and we want to change the input values to A = B = 1. In a synchronous (clocked) circuit, the system clock has a period longer than the propagation delay of signals through wires and gates, and values are only considered meaningful on the falling (or rising) edge of the clock, giving them time to stabilise at the correct values of S = 0, C = 1. However, in an asynchronous (clock-less) circuit the new input signals will propagate along the wires and reach the four gate inputs at different times. Depending on the relative wire delays, there are 8 different orders in which this can happen. The two gates will see a sequence of four distinct inputs, and produce the corresponding outputs, before settling on the correct values. As its inputs change from 0 to 1, the AND gate outputs the sequence 001, which corresponds to a “clean” transition from 0 to 1. However, as the inputs of the XOR gate change from 0 to 1, its output goes through the sequence 010. Before settling on the correct value of 0, the circuit shows a spurious value of 1, a so-called hazard. If this adder is connected to other circuits then these circuits will consider the hazard value as a genuine value and propagate it, leading to more spurious values and ultimately a rather chaotic circuit behaviour.
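The hazard can be reproduced with a few lines of Python. This is a minimal sketch that only models the order in which the two inputs of a single gate switch from 0 to 1; the function name `gate_outputs` is an assumption for illustration, and wire and gate delays are not modelled.

```python
from itertools import permutations

def gate_outputs(gate, arrival_order):
    """Output sequence of a 2-input gate as its inputs switch 0 -> 1 one at a time."""
    inputs = [0, 0]
    seen = [gate(*inputs)]
    for pin in arrival_order:
        inputs[pin] = 1
        seen.append(gate(*inputs))
    return seen

AND = lambda a, b: a & b
XOR = lambda a, b: a ^ b

for order in permutations([0, 1]):
    print(order, "AND:", gate_outputs(AND, order), "XOR:", gate_outputs(XOR, order))
```

Whichever input arrives first, the AND gate produces 0, 0, 1 while the XOR gate produces 0, 1, 0, the intermediate 1 being the hazard.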

In a nutshell, this is the main problem of asynchronous circuit design, and there exist a variety of theoretical and practical approaches to mitigating it (Hauck, 1995).

5.1 Event logic

A particularly interesting and clean solution was proposed by Sutherland (1989) in his seminal Turing Award lecture. At the foundation of his approach lies the observation that boolean logic is not particularly well suited to implementing asynchronous circuits, suggesting instead an event logic: a logic of pure control, dealing not with “true” and “false” but with the more fundamental notions that “something happened” or “nothing happened”. The basic logical functions on events can be (efficiently) implemented as special gates or modules. At the level of physical implementation, an event is either a high-to-low or a low-to-high transition (edge) on a wire, the so-called two-phase event encoding.

XOR provides an OR-like function for events, producing an output event when an event arrives on any of the input ports.

C is the so-called Muller C-element (Miller, 1965, Chap. 10), a fundamental gate in asynchronous design. It has an AND-like functionality on events, producing an output when events arrive on both input ports.

Figure 3: Logic modules for events (C, TOGGLE, SELECT, CALL, ARBITER, XOR)

TOGGLE steers events to its outputs alternately, starting with the dot.

SELECT steers its input event to the true or the false output according to the value of input S.

CALL remembers which “client”, R1 or R2, called more recently and it steers the matching D back to D1 or D2 as is the case. CALL can be generalised to the case where R, D, Ri, Di represent sets of ports.

ARBITER grants service G1 or G2 to only one input request R1 or R2 at a time, delaying subsequent grants until the matching done event D1 or D2.

It also makes sense to consider as primitives WIRE, the simple connector, and FORK, the forking connector with one input and two outputs.
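The behaviour of some of these modules can be sketched as small state machines. The Python below is an untimed illustration only: the class names, the `event` method and the choice of X as the dotted TOGGLE output are assumptions, while the port names follow the labels used in the text.

```python
class XorGate:
    """OR-like on events: every input event produces an output event."""
    def event(self, port):
        return ["Z"]

class CElement:
    """AND-like on events: fires once an event has arrived on both inputs."""
    def __init__(self):
        self.pending = set()
    def event(self, port):
        self.pending.add(port)
        if self.pending == {"A", "B"}:
            self.pending.clear()
            return ["X"]
        return []

class Toggle:
    """Steers successive input events to outputs X and Y alternately."""
    def __init__(self):
        self.next_out = "X"  # assumption: X is the dotted output
    def event(self, port="A"):
        out = self.next_out
        self.next_out = "Y" if out == "X" else "X"
        return [out]

class Call:
    """Remembers the most recent caller and routes the done event back to it."""
    def __init__(self):
        self.caller = None
    def event(self, port):
        if port in ("R1", "R2"):
            self.caller = port[1]
            return ["R"]
        if port == "D":
            return ["D" + self.caller]
        raise ValueError(port)
```

For instance, a `Call` instance that receives `R2` emits `R`, and the next `D` event is steered to `D2`.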

Ghica and Smith (2010) give a compositional trace model for event logic. Composition of event-logic circuits is rather subtle because the interaction between two circuits can lead to “unsafe” traces. We illustrate this with the following example.

[Figure: FORK;XOR. A FORK with input A and outputs X, Y feeds an XOR with output Z.]

Given the obvious trace semantics for the two circuits, the composition FORK;XOR might be expected to produce input-output traces of the form (AZZ)*. However, if we consider traces including the internal channels X and Y, we can see that these observable traces might correspond to interactions AXYZZ, which are from a physical point of view unsafe: if events X and Y arrive very close to each other temporally, then it is possible that the sequence ZZ consists of two events that happen faster than the inertial delay of the wire or the gate and may be suppressed (Sparsø and Furber, 2001, Sec. 6.1.3).

However, composition disallows such unsafe traces. The set of traces of a composite system only contains those traces that can only be produced safely. Intuitively, the safe composition of two circuits involves only those traces in which, at the interface, each output produced by a circuit can be immediately consumed as an input by the other circuit. From this point of view, there are no safe traces in the composition above, i.e. FORK;XOR = ∅. This can be shown by analysing all possible interactions. The interaction between the two is unsafe because after input A and interface event X the next output, on Y, cannot be consumed by XOR before it outputs on Z. The interaction beginning with A · Y is unsafe for similar reasons. Since all interactions are prefixed by a sequence of events A · X or A · Y, there are no safe interactions. This can be proved formally in the model of Ghica and Smith (2010).

On the other hand we can show that FORK;C has the same behaviour as a wire connecting input A and output Z, and all interactions in the composite circuit are safe.

[Figure: FORK;C. A FORK with input A and outputs X, Y feeds a C-element with output Z.]
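The difference between the two compositions can be seen by counting output events. The sketch below is untimed and does not implement the trace model or its safety condition; the function names are assumptions made for illustration.

```python
def fork(event):
    """FORK: one input event A becomes an event on each of X and Y."""
    return ["X", "Y"]

def run_xor(events):
    """XOR emits Z for every event on either input."""
    return ["Z" for _ in events]

def run_c(events):
    """C emits Z only once events have arrived on both X and Y."""
    seen, out = set(), []
    for e in events:
        seen.add(e)
        if seen == {"X", "Y"}:
            out.append("Z")
            seen.clear()
    return out

fork_xor = run_xor(fork("A"))  # two Z events per A, possibly too close together
fork_c = run_c(fork("A"))      # exactly one Z per A, like a wire
```

FORK;XOR tries to emit two Z events for a single A, which is the situation the safe composition rules out, whereas FORK;C emits exactly one.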

5.2 Compiling SCI into event logic

The game model for SCI can be represented using only the XOR, C, CALL, WIRE and FORK fragment of event logic.

The concrete representation of types follows the game-semantic model of the language so that each game-semantic move corresponds to a distinct port of a circuit. Concretely, the port structures generated during compilation are as follows:

Figure 4: Event-logic circuits for SCI imperative constants

com : the type of commands corresponds to one input port Q and one output port A. Intuitively, Q represents a request from the environment to execute the command and A an acknowledgment of termination.

exp : the type of expressions corresponds to one input port Q and two output ports T and F. As before, Q is a request to evaluate the expression while T and F are the two possible outcomes, true or false.

var : the type of variables corresponds to input ports Q, WT, WF and output ports T, F, A. Q is a read request, answered by T (true) or F (false), while WT (WF) is a request to write true (false) and is acknowledged by A.

θ1 × θ2 : the product type is the disjoint sum of the port structures corresponding to θ1, θ2.

θ1 → θ2 : the function type is the disjoint sum of the port structures corresponding to θ1, θ2 except that the input-output polarities associated with θ1 are reversed.
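The port structure of a type can be computed by a direct recursion on the type. The Python below is a minimal sketch: the nested-tuple encoding of types and the `1.`/`2.` prefixes used to keep the disjoint sum disjoint are assumptions made for illustration.

```python
def ports(ty, prefix=""):
    """Return (inputs, outputs) for a type given as a string or nested tuple.

    Types: "com", "exp", "var", ("prod", t1, t2), ("fun", t1, t2).
    """
    base = {
        "com": (["Q"], ["A"]),
        "exp": (["Q"], ["T", "F"]),
        "var": (["Q", "WT", "WF"], ["T", "F", "A"]),
    }
    if isinstance(ty, str):
        ins, outs = base[ty]
        return [prefix + p for p in ins], [prefix + p for p in outs]
    kind, t1, t2 = ty
    if kind not in ("prod", "fun"):
        raise ValueError(kind)
    in1, out1 = ports(t1, prefix + "1.")
    in2, out2 = ports(t2, prefix + "2.")
    if kind == "fun":
        in1, out1 = out1, in1  # polarities of the argument are reversed
    return in1 + in2, out1 + out2
```

For the type of sequential composition, `ports(("fun", ("prod", "com", "com"), "com"))` gives the two argument acknowledgments and the top-level request as inputs, and the two argument requests and the top-level acknowledgment as outputs.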

The base-type constants are

[Figure: circuits for the base-type constants, with ports Q, A for com and Q, T, F for exp]


Note that in both cases the request Q is immediately propagated to the corresponding acknowledgment. The implementation of 0 : exp is analogous.

The representations of games for the imperative language constants are given in Fig. 4, and are simply event logic representations of the game semantic model of SCI. The ports on these circuits are decorated in the same way as the types in the signature in order to make the correspondence obvious. The input-output behaviour of these circuits is operationally intuitive. We only discuss sequential composition, seq : com × com′ → com′′. The initial input request is Q′′, corresponding to the return type com′′. This request is simply propagated as an output request Q, corresponding to the operator requesting the evaluation of its first argument. When the first argument acknowledges termination on A, the event is in turn simply propagated to Q′, requesting the evaluation of the second argument. When the second argument acknowledges termination on A′, the operator can acknowledge termination to the environment on A′′.

Here we only consider the representation of lazy sequential operators, such as orl : exp × exp′ → exp′′:

[Figure: event-logic circuit for orl, with ports Q′′, Q, T, F, Q′, T′, F′, T′′, F′′ and one XOR gate]

Parallel and (eager) sequential operators raise certain technical problems discussed in Ghica and Smith (2010). Note that this way of encoding boolean values using different ports, called dual rail, is standard in asynchronous circuit design and can be extended to integers.
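The lazy OR circuit can be read as a routing table on dual-rail events. The sketch below assumes the routing Q′′ to Q, T to T′′, F to Q′, T′ to T′′ and F′ to F′′ (with T and T′ merged by the XOR gate); this routing is reconstructed from the meaning of lazy evaluation rather than stated explicitly in the text, and the names `ROUTES` and `lazy_or_trace` are hypothetical.

```python
ROUTES = {
    "Q''": "Q",    # a request for the result starts the first argument
    "T": "T''",    # first argument true: answer true, skip the second
    "F": "Q'",     # first argument false: evaluate the second argument
    "T'": "T''",   # second argument true (merged with T by the XOR gate)
    "F'": "F''",   # both arguments false
}

def lazy_or_trace(first, second):
    """Return the sequence of port events produced by one evaluation."""
    # The environment answers requests Q and Q' with dual-rail events.
    environment = {
        "Q": lambda: "T" if first() else "F",
        "Q'": lambda: "T'" if second() else "F'",
    }
    event, trace = "Q''", ["Q''"]
    while event not in ("T''", "F''"):
        event = ROUTES[event]       # the circuit reacts to an input event
        trace.append(event)
        if event in environment:    # an argument answers a request
            event = environment[event]()
            trace.append(event)
    return trace
```

When the first argument is true, the second thunk is never called, so the trace contains no event on Q′.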

The CALL module is used to implement the family of diagonal strategies δθ : θ → θ′ × θ′′ used in contraction. The implementation of δcom : com → (com1 × com2) is simply the CALL module. Higher-order contraction is implemented using a more complex generalised CALL module, as shown in loc. cit.

The local-variable binder newvar : (var → com′) → com′′ can also be implemented by taking advantage of the stateful nature of the CALL module.

[Figure: circuit for newvar built around a CALL module, with ports WT, WF, T, F, A, Q, Q′, A′, Q′′, A′′]

SCI terms can be interpreted inductively on the syntax, where terms are formed from constants, contraction (described above), function application, function declaration and free identifiers. Given circuits F, M which are compilations of terms $\Gamma \vdash_r F : \theta \to \theta'$ and, respectively, $\Delta \vdash_r M : \theta$, the circuit for $\Gamma, \Delta \vdash_r F(M) : \theta'$ is constructed as a certain interconnect for the two circuits, as discussed in Sec. 1 (connectors labelled by G, D, T, T′ are multi-line input-output bundles).

[Figure: interconnect for function application, joining circuits F and M through the bundles G, D, T, T′]

The greyed-out circuit connecting the argument to the function is the evaluation morphism, the uncurrying of the identity at type θ → θ′. The entire construction can be expressed in the language of compact closed categories in a canonical way. Function declaration is the currying relabelling of ports discussed earlier, and free identifiers are the identity (wires).

A simple but useful program which illustrates the compilation of open higher-order programs is in-place map, which applies a function f to all elements of a data structure, modifying them in place. Consider an iterator over some data structure, provided with the following interface:

init : com   initialise an iterator over the data structure;

curr : var   get the current element in the data structure;

next : com   advance the iterator to the next element;

more : exp   return false if the end of the data structure has been reached and true otherwise.

Note that, SCI being a call-by-name language, these identifiers represent thunks, i.e. parameter-less procedures. The program for in-place map is:

$$\mathit{init} : \mathsf{com},\ \mathit{curr} : \mathsf{var},\ \mathit{next} : \mathsf{com},\ \mathit{more} : \mathsf{exp} \vdash_r \lambda f : \mathsf{exp} \to \mathsf{exp}.\ \mathit{init};\ \mathsf{while}\ (\mathit{more})\ (\mathit{curr} := f(!\mathit{curr});\ \mathit{next}) : \mathsf{com}.$$
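The same program can be rendered in a conventional language to make its behaviour concrete. The Python below is a software sketch, not the compiled circuit: identifiers are modelled as thunks (parameter-less functions), the variable `curr` is split into hypothetical `read_curr` and `write_curr` functions, and the list-backed iterator is an assumption for illustration.

```python
def in_place_map(init, read_curr, write_curr, advance, more):
    """Return the command λf. init; while (more) (curr := f(!curr); next)."""
    def command(f):
        init()
        while more():
            write_curr(f(read_curr))  # curr := f(!curr); f receives a thunk
            advance()
    return command

# Hypothetical iterator over a Python list of booleans.
data = [True, False, True]
state = {"pos": 0}

def init(): state["pos"] = 0
def advance(): state["pos"] += 1
def more(): return state["pos"] < len(data)
def read_curr(): return data[state["pos"]]
def write_curr(v): data[state["pos"]] = v

negate = lambda arg: not arg()  # f : exp -> exp, forcing its by-name argument
in_place_map(init, read_curr, write_curr, advance, more)(negate)
```

After the call, every element of `data` has been negated in place.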

The structure of the resulting circuit is shown in Fig. 5, along with the concrete circuit, which is strikingly simple. Ports are annotated with the variable name for readability; top-level ports are top.q and top.a. For function f : exp′ → exp the ports corresponding to the argument are primed. Technically, variable curr should go through a contraction circuit δvar : var → var × var; however, because the first occurrence uses only the “write” ports and the second only the “read” ports, no connectors need to be actually reused and contraction can be omitted.

In loc. cit. we show that compiled circuits are both logically and physically correct, i.e. delay-insensitive. Any term is compiled into an event logic circuit that has the same input-output behaviour as the (sound) game semantic model of the language. Moreover, the circuits constructed by the compiler are always safe in the sense discussed earlier in Sec. 5.1.

5.3 Contraction and local variables in concurrent contexts

We can still use the method described above, except that the local variable binder must handle the binding of multiple identifiers to the same memory cell. Moreover, the identifiers may occur in concurrent contexts. It also makes sense now to use semaphores.

Figure 5: In-place map, overall structure and event-logic implementation

Figure 6: Event logic implementation for sequential contraction δvar : var → var1 × var2.

Figure 7: A 3-way arbiter

In loc. cit. variable contraction can only be done in a sequential setting; the diagonal δvar : var → var1 × var2 is shown in Fig. 6.

In the new setting, we do not have contraction of var-typed identifiers in the syntax, but the simultaneous binding of several identifiers to the same memory cell amounts to the same thing in the implementation if the identifiers belong to sequential contexts. This is not the case in non-sequential contexts, because the CALL module cannot handle concurrent requests, which amount to a race condition. The standard solution in asynchronous design, which we will apply, is to guard the CALL modules by using n-way ARBITERs to mediate all inputs. Such circuits are not basic event logic gates but can be constructed from the 2-way arbiters. In Fig. 7 we show a standard 3-way arbiter construction due to Seitz (1980). However, more efficient n-way arbiters can be designed directly rather than compositionally from smaller arbiters (Martin, 1990). The new multivariate binding circuit is represented in Fig. 8.
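The idea of building a larger arbiter out of 2-way arbiters can be sketched behaviourally. The Python below is an assumption-laden illustration: it cascades two hypothetical `Arbiter2` objects, which is not the three-module construction of Fig. 7, and it models only the request, grant and done protocol, not timing or metastability.

```python
from collections import deque

class Arbiter2:
    """Grants one request at a time; later requests wait for `done`."""
    def __init__(self, on_grant):
        self.on_grant = on_grant
        self.owner = None
        self.waiting = deque()
    def request(self, port):
        if self.owner is None:
            self.owner = port
            self.on_grant(port)
        else:
            self.waiting.append(port)
    def done(self, port):
        assert port == self.owner, "done must match the current grant"
        self.owner = None
        if self.waiting:
            self.request(self.waiting.popleft())

class Arbiter3:
    """Hypothetical cascade: `inner` picks between clients 0 and 1,
    `outer` picks between that winner (port 0) and client 2 (port 1)."""
    def __init__(self):
        self.log = []  # clients in the order they were granted
        self.outer = Arbiter2(self._outer_grant)
        self.inner = Arbiter2(lambda port: self.outer.request(0))
    def _outer_grant(self, port):
        self.log.append(self.inner.owner if port == 0 else 2)
    def request(self, client):
        if client == 2:
            self.outer.request(1)
        else:
            self.inner.request(client)
    def done(self, client):
        if client == 2:
            self.outer.done(1)
        else:
            self.outer.done(0)
            self.inner.done(client)
```

A usage example: after requests from clients 0, 2 and 1 in that order, only client 0 is granted; each subsequent `done` releases exactly one further grant.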

We will not provide semantic-directed implementations for semaphores, noting that they can be implemented in the language using shared memory in standard ways, e.g. the Peterson (1981) tie-breaker algorithm. As Murawski (2010) points out, the first-class semaphore provided in the original ICA is needed mostly for technical reasons related to definability and not algorithmic considerations.
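For reference, the two-process tie-breaker algorithm can be sketched with ordinary shared variables. The Python below is an illustration only: the names `flag`, `turn`, `lock`, `unlock` and `run` are assumptions, and the sketch relies on CPython executing the reads and writes of each thread in program order, which the algorithm needs.

```python
import threading
import time

flag = [False, False]  # flag[i]: thread i wants to enter
turn = 0               # tie-breaker: which thread must wait on a tie
counter = 0            # shared resource protected by the lock

def lock(i):
    global turn
    other = 1 - i
    flag[i] = True
    turn = other
    while flag[other] and turn == other:
        time.sleep(0)  # busy-wait, yielding to the other thread

def unlock(i):
    flag[i] = False

def worker(i, rounds):
    global counter
    for _ in range(rounds):
        lock(i)
        tmp = counter       # deliberately non-atomic read-modify-write
        time.sleep(0)
        counter = tmp + 1
        unlock(i)

def run(rounds=200):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(i, rounds)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With mutual exclusion in place no increment is lost, so `run(rounds)` returns twice the number of rounds.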

The correctness of the compilation is, as in Ghica and Smith (2010), the correctness of representation of the game model in event logic gates.

Theorem 5.1 Let M : com be an SCC(1) term and K its event-logic representation.

Correctness: If K receives an input event on its Q port it will produce an output event on its A port iff M terminates.

Safety: Circuit K is delay-insensitive.

Proof. The proof of this theorem is a corollary of Thm. 5.3 in loc. cit., which is essentially proving a logical relation between the input-output behaviour of any circuit and the game semantic model of the corresponding term. Since we are still within the SCC(1) game semantic model the proof stands, but it has to be extended with a new case, the family of variable binders for multiple identifiers. Because of the type of the multivariate binder ((var → var → com) → com), the concurrent usage of the variables does not violate the seriality constraint of the game model (Def. 2.7 in loc. cit.). The n-way arbiter then ensures that the underlying contraction circuit is actually used sequentially, because all read and write requests to the variable are serialized. The two XOR gates take mutually exclusive input events, therefore they are always used safely. □

This, together with the correctness of type inference from ICA to SCC (Thm. 3.1) and program transformation from SCC to SCC(1) (Thm. 4.1), leads to the main result,

Theorem 5.2 Programs in ICA which have an SCC type can be effectively mapped into delay-insensitive event-logic circuits.

5.4 Example

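We show the compilation of three terms with identical ICA types but distinct SCC and serialised versions. The terms, the inferred SCC types and the SCC(1) versions are given below, assuming $f : \mathsf{com}^1 \to \mathsf{com}$.

$$\lambda f x.\, f(f x) : (\mathsf{com}^1 \to \mathsf{com})^2 \to \mathsf{com}^1 \to \mathsf{com} \ \Longmapsto\ \lambda f_1 f_2 x.\, f_1(f_2 x)$$

$$\lambda f x.\, f x;\, f x : (\mathsf{com}^1 \to \mathsf{com})^1 \to \mathsf{com}^1 \to \mathsf{com} \ \Longmapsto\ \lambda f x.\, f x;\, f x$$

$$\lambda f x.\, f x \parallel f x : (\mathsf{com}^1 \to \mathsf{com})^2 \to \mathsf{com}^2 \to \mathsf{com} \ \Longmapsto\ \lambda f_1 f_2 x_1 x_2.\, f_1 x_1;\, f_2 x_2$$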
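The effect of serialisation on the first term can be observed by counting how often each copy of an identifier is used. The Python below is a minimal sketch: commands are modelled as thunks, `make_f` is a hypothetical instance of f that runs its by-name argument once, and the counters are bookkeeping added for illustration.

```python
calls = {"f": 0, "f1": 0, "f2": 0}
log = []

def make_f(name):
    """A hypothetical f : com -> com that runs its by-name argument once."""
    def f(arg):
        calls[name] += 1
        arg()
    return f

def x():
    log.append("x")

original = lambda f, x: f(lambda: f(x))           # λfx.f(fx)
serialised = lambda f1, f2, x: f1(lambda: f2(x))  # λf1f2x.f1(f2x)

original(make_f("f"), x)
serialised(make_f("f1"), make_f("f2"), x)
```

In the original term the single identifier f is used twice, while in the serialised term each of the copies f1 and f2 is used once; both runs execute x exactly once.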

Figure 8: Event logic for contraction in concurrent contexts.

The compiled versions are in Fig. 9. The actual synthesised circuits are inside the grey box. The circuits marked Fi, Xi, F, X are instances of the arguments that must be supplied by the designer to create a working circuit or, equivalently, arguments for f, x to lead to programs. Note the trade-offs in the last two designs. The second circuit contains two fairly expensive diagonal circuits but it only requires one instance of F and X, while the third consists only of connectors, but requires two instances of each of F and X.

6 Related and further work

There exist other higher-level approaches to hardware synthesis: SystemC¹ or CoWare², hardware compilers based on process calculi, such as (van Berkel et al., 1991), or higher-order structural languages such as Lava (Bjesse et al., 1998); these are interesting and useful, but conceptually different ways of approaching VLSI design.

Hardware compilation in the behavioural style we are pursuing in GoS has a substantial literature which we cannot discuss extensively; some entry points to the literature are Budiu and Goldstein (2002); Buyukkurt et al. (2006). This line of work is in some sense parallel with ours and focuses almost exclusively on optimisation techniques such as automated parallelisation whereas we are concerned with problems of a structural nature. This difference of focus is discussed extensively in Ghica (2009).

Type inference for SCI has been studied before, but for a richer version of the type system which we do not need (Yang and Huang, 1998). A program transformation similar in spirit to our serialisation is linearisation, due to Kfoury (2000). The first main difference is that linearisation replicates every variable occurrence, without permitting contraction at all. The second one is that replication of variables of higher-order type does not have to be uniform, but can result in occurrences with different linearized types. The first difference is conceptually significant but technically rather minor, whereas the second one is conceptually minor but, perhaps surprisingly, technically significant, resulting in the existence of normal forms which can be linearized but cannot be serialized. From the point of view of compiler support for separate compilation, foreign function calls and run-time services we consider it crucial to offer a consistent interface between a serialized term and its context, hence our decision. Also note that the soundness argument of Thm. 4.1 is simplified by the fact that serialization is uniform across copies of identifiers, leading to a very simple inductive step in the proof.

However, inside the serialised term itself perhaps a more flexible approach which mixes serialisationand linearization could be used at the expense of some complication in the algorithms. Even so, it is worthnoting that non-typeable terms are somewhat pathological and unlikely to be found in algorithmicallyrelevant programs. Combining serialization with a selective form of linearization can lead to interestingoptimisations techniques. Various performance parameters can be calculated at compile-time, e.g. foot-

¹ www.systemc.org
² www.coware.com


References

[1] D. R. Ghica. Geometry of Synthesis: a structured approach to VLSI design. In POPL, pages 363–375, 2007.

[2] D. R. Ghica and A. Murawski. Angelic semantics of fine-grained concurrency. Annals of Pure and Applied Logic, 151(2-3):89–114, 2008.

[3] D. R. Ghica and A. S. Murawski. Compositional model extraction for higher-order concurrent programs. In TACAS, pages 303–317, 2006.

[4] D. R. Ghica, A. S. Murawski, and C.-H. L. Ong. Syntactic control of concurrency. Theor. Comput. Sci., 350(2-3):234–251, 2006.

[5] D. R. Ghica and A. Smith. Geometry of Synthesis II: From games to delay-insensitive circuits. In MFPS XXVI, 2010. (forthcoming).

[6] S. Hauck. Asynchronous design methodologies: an overview. Proceedings of the IEEE, 83(1):69–93, Jan 1995.

[7] A. J. Martin. Developments in concurrency and communication, chapter Programming in VLSI: From communicating processes to delay-insensitive circuits, pages 1–64. Addison-Wesley, 1990.

[8] G. McCusker. Categorical models of syntactic control of interference revisited, revisited. LMS Journal of Computation and Mathematics, 10:176–216, 2007.

[9] G. McCusker. A graph model for imperative computation. Logical Methods in Computer Science, 6(1), 2010. DOI: 10.2168/LMCS-6(1:2)2010.

[10] R. E. Miller. Sequential Circuits. Wiley, NY, 1965.

[11] A. Murawski. Full abstraction without synchronization primitives. In MFPS XXVI, 2010. (forthcoming).

[12] P. W. O’Hearn, J. Power, M. Takeyama, and R. D. Tennent. Syntactic control of interference revisited. Theor. Comput. Sci., 228(1-2):211–252, 1999.

[13] G. L. Peterson. Myths about the mutual exclusion problem. Information Processing Letters, 12:115–116, 1981.

[14] U. S. Reddy. Global state considered unnecessary: An introduction to object-based semantics. Lisp and Symbolic Computation, 9(1):7–76, 1996.

[15] J. C. Reynolds. Syntactic control of interference. In POPL, pages 39–46, 1978.

[16] J. C. Reynolds. The essence of Algol. In Proceedings of the 1981 International Symposium on Algorithmic Languages, pages 345–372. North-Holland, 1981.

[17] J. C. Reynolds. Syntactic control of interference, part 2. In ICALP, pages 704–722, 1989.

[18] C. L. Seitz. Ideas about arbiters. Lambda, 1(1):10–14, 1980.

[19] J. Sparsø and S. Furber, editors. Principles of Asynchronous Circuit Design: A Systems Perspective. European Low-Power Initiative for Electronic System Design. Kluwer Academic Publishers, 2001.

[20] I. E. Sutherland. Micropipelines. Commun. ACM, 32(6):720–738, 1989. Turing Award Paper.

11 2010/7/11




6.1 Example

We show the compilation of three terms with identical ICA types but distinct SCC and serialised versions. The terms, the inferred SCC types and the SCC(1) versions are given below, assuming $f : \mathsf{com}^1 \to \mathsf{com}$.

$$\lambda f x.f(f x) : (\mathsf{com}^1 \to \mathsf{com})^2 \to \mathsf{com}^1 \to \mathsf{com} \;\Longrightarrow\; \lambda f_1 f_2 x.f_1(f_2 x)$$

$$\lambda f x.f x; f x : (\mathsf{com}^1 \to \mathsf{com})^1 \to \mathsf{com}^1 \to \mathsf{com} \;\Longrightarrow\; \lambda f x.f x; f x$$

$$\lambda f x.f x \,||\, f x : (\mathsf{com}^1 \to \mathsf{com})^2 \to \mathsf{com}^2 \to \mathsf{com} \;\Longrightarrow\; \lambda f_1 f_2 x_1 x_2.f_1 x_1; f_2 x_2$$

The compiled versions are in Fig. 9. The actual synthesised circuits are inside the grey box. The circuits marked $F_i$, $F$, $X$ are instantiations of the arguments that must be supplied by the designer to create a working circuit or, equivalently, arguments for $f$ and $x$ that lead to complete programs. Note the trade-off between the first two designs. The first consists only of connectors but requires two instances of $F$; the second contains two fairly expensive diagonal circuits but requires only one instance of $F$.
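The bounds in the three example types can be reproduced with a toy calculation. The sketch below is not the SCC inference algorithm. It assumes a simplified reading of the typing rules in which sequential composition shares its context (bounds are combined by maximum), parallel composition and application add the bounds of their sub-terms, and the bounds of an argument are scaled by the bound the function places on its argument (1 here, since f : com^1 -> com). The tuple-based term encoding and the names var, app, seq, par and bounds are hypothetical.

```python
# Hypothetical mini-AST: terms are tuples tagged with a constructor name.
def var(name):
    return ("var", name)

def app(fun, arg, arg_bound=1):
    # arg_bound is the bound n in the function type theta^n -> theta'.
    return ("app", fun, arg, arg_bound)

def seq(left, right):
    return ("seq", left, right)

def par(left, right):
    return ("par", left, right)

def bounds(term):
    """Return a dict mapping each free identifier to its concurrency bound
    under the simplified rules stated in the lead-in (an assumption)."""
    tag = term[0]
    if tag == "var":
        return {term[1]: 1}
    if tag == "app":
        _, fun, arg, n = term
        result = dict(bounds(fun))
        # The argument may be run up to n times concurrently by the function.
        for name, k in bounds(arg).items():
            result[name] = result.get(name, 0) + n * k
        return result
    if tag in ("seq", "par"):
        left, right = bounds(term[1]), bounds(term[2])
        names = set(left) | set(right)
        if tag == "seq":
            # Sequential composition: the two sides never overlap in time.
            return {v: max(left.get(v, 0), right.get(v, 0)) for v in names}
        # Parallel composition: both sides may be active at once.
        return {v: left.get(v, 0) + right.get(v, 0) for v in names}
    raise ValueError(f"unknown term constructor: {tag!r}")

f, x = var("f"), var("x")
nested = app(f, app(f, x))            # body of  \fx. f(fx)
sequential = seq(app(f, x), app(f, x))  # body of  \fx. fx; fx
parallel = par(app(f, x), app(f, x))    # body of  \fx. fx || fx
```

Under these assumptions, `bounds(nested)`, `bounds(sequential)` and `bounds(parallel)` agree with the bounds on f and x in the three SCC types listed above.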
























Figure 9: Three example circuits

print (number of gates) or latency (longest delay). Linearization of an identifier makes a trade-off between duplicating arguments to functions and using expensive contraction circuitry, as can be seen in Fig. 9. In this sense, serialization is the extreme scenario in which contraction is always favoured over replication. Introducing a controlled form of linearization will be investigated in further optimised implementations of the compiler.
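As a rough illustration of the replication side of this trade-off, the following sketch linearises one identifier by giving every occurrence its own fresh name, so that each copy is used once but the caller has to supply as many arguments as there were occurrences. This is a simplification for illustration only, not the compiler's transformation. The tuple-based term encoding and the function names linearise and show are hypothetical.

```python
from itertools import count

def linearise(term, name):
    """Rename each occurrence of `name` to name1, name2, ... (left to right).

    Returns the rewritten term and the number of copies introduced, i.e. the
    number of argument instances the caller would now have to supply.
    """
    fresh = count(1)

    def walk(t):
        if t[0] == "var":
            return ("var", f"{name}{next(fresh)}") if t[1] == name else t
        # Rebuild any other node, visiting its sub-terms left to right.
        return (t[0],) + tuple(walk(sub) for sub in t[1:])

    new_term = walk(term)
    copies = next(fresh) - 1
    return new_term, copies

def show(t):
    """Render a term as text, writing application as fun(arg)."""
    if t[0] == "var":
        return t[1]
    if t[0] == "app":
        return f"{show(t[1])}({show(t[2])})"
    separator = {"seq": "; ", "par": " || "}[t[0]]
    return f"{show(t[1])}{separator}{show(t[2])}"

# Body of \fx. f(fx): linearising f needs two instances of the argument.
nested = ("app", ("var", "f"), ("app", ("var", "f"), ("var", "x")))
term, copies = linearise(nested, "f")
```

Here `show(term)` yields `f1(f2(x))` with `copies == 2`, which mirrors the first example of Fig. 9: no contraction circuitry is needed for f, at the price of two instances of F.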

Finally, the approach here can be extended to synchronous circuits using the round abstraction methodology for low-latency encoding of asynchronous specifications into synchronous circuits (Ghica and Menaa, 2010). This is forthcoming work.

References

Per Bjesse, Koen Claessen, Mary Sheeran, and Satnam Singh. Lava: hardware design in Haskell. In ICFP, pages 174–184, 1998.

Mihai Budiu and Seth Copen Goldstein. Compiling application-specific hardware. In FPL, pages 853–863, 2002.

Betul Buyukkurt, Zhi Guo, and Walid A. Najjar. Impact of loop unrolling on area, throughput and clock frequency in Roccc: C to VHDL compiler for FPGAs. In ARC, pages 401–412, 2006.

Dan R. Ghica. Geometry of Synthesis: a structured approach to VLSI design. In POPL, pages 363–375, 2007.

Dan R. Ghica. Function interface models for hardware compilation: Types, signatures, protocols. CoRR, abs/0907.0749, 2009.

Dan R. Ghica and Mohamed N. Menaa. On the compositionality of round abstraction. In CONCUR, pages 417–431, 2010.

Dan R. Ghica and Andrzej Murawski. Angelic semantics of fine-grained concurrency. Annals of Pure and Applied Logic, 151(2-3):89–114, 2008.

Dan R. Ghica and Andrzej S. Murawski. Compositional model extraction for higher-order concurrent programs. In TACAS, pages 303–317, 2006.

Dan R. Ghica and Alex Smith. Geometry of Synthesis II: From games to delay-insensitive circuits. Electr. Notes Theor. Comput. Sci., 265:301–324, 2010.

Dan R. Ghica, Andrzej S. Murawski, and C.-H. Luke Ong. Syntactic control of concurrency. Theor. Comput. Sci., 350(2-3):234–251, 2006.

S. Hauck. Asynchronous design methodologies: an overview. Proceedings of the IEEE, 83(1):69–93, Jan 1995.

G. M. Kelly and M. L. Laplaza. Coherence for compact closed categories. Journal of Pure and Applied Algebra, 19:193–213, 1980.

A. J. Kfoury. A linearization of the lambda-calculus and consequences. J. Log. Comput., 10(3):411–436, 2000.

Wayne Luk, David Ferguson, and Ian Page. Structured hardware compilation of parallel programs. In Will Moore and Wayne Luk, editors, More FPGAs. Abingdon EE&CS Books, 1994.

A. J. Martin. Developments in concurrency and communication, chapter Programming in VLSI: From communicating processes to delay-insensitive circuits, pages 1–64. Addison-Wesley, 1990.

Guy McCusker. Categorical models of syntactic control of interference revisited, revisited. LMS Journal of Computation and Mathematics, 10:176–216, 2007.

Guy McCusker. A graph model for imperative computation. Logical Methods in Computer Science, 6(1), 2010.

R. E. Miller. Sequential Circuits. Wiley, NY, 1965.

Andrzej S. Murawski. Full abstraction without synchronization primitives. Electr. Notes Theor. Comput. Sci., 265:423–436, 2010.

Peter W. O’Hearn, John Power, Makoto Takeyama, and Robert D. Tennent. Syntactic control of interference revisited. Theor. Comput. Sci., 228(1-2):211–252, 1999.

Ian Page and Wayne Luk. Compiling Occam into FPGAs. In W. Moore and W. Luk, editors, FPGAs, pages 271–283. Abingdon EE&CS Books, 1991.

G. L. Peterson. Myths about the mutual exclusion problem. Information Processing Letters, 12:115–116, 1981.

Uday S. Reddy. Global state considered unnecessary: An introduction to object-based semantics. Lisp and Symbolic Computation, 9(1):7–76, 1996.

John C. Reynolds. Syntactic control of interference. In POPL, pages 39–46, 1978.

John C. Reynolds. The essence of Algol. In Proceedings of the 1981 International Symposium on Algorithmic Languages, pages 345–372. North-Holland, 1981.

John C. Reynolds. Syntactic control of interference, part 2. In ICALP, pages 704–722, 1989.

C. L. Seitz. Ideas about arbiters. Lambda, 1(1):10–14, 1980.

J. Sparsø and S. Furber, editors. Principles of Asynchronous Circuit Design: A Systems Perspective. European Low-Power Initiative for Electronic System Design. Kluwer Academic Publishers, 2001.

Ivan E. Sutherland. Micropipelines. Commun. ACM, 32(6):720–738, 1989. Turing Award Paper.

C. H. van Berkel and R. W. J. J. Saeijs. Compilation of communicating processes into delay-insensitive circuits. In Proceedings of ICCD, 1988.

Kees van Berkel, Joep Kessels, Marly Roncken, Ronald Saeijs, and Frits Schalij. The VLSI-programming language Tangram and its translation into handshake circuits. In EURO-DAC, pages 384–389, 1991.

Hongseok Yang and Howard Huang. Type reconstruction for syntactic control of interference, part 2. In ICCL, pages 164–173, 1998.
