Technical Report

Computer Laboratory

UCAM-CL-TR
ISSN 1476-2986

List of technical reports

June 2020

15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom
phone +44 1223 763500
https://www.cl.cam.ac.uk/


Technical reports published by the University of Cambridge Computer Laboratory are freely available via the Internet:

https://www.cl.cam.ac.uk/techreports/

ISSN 1476-2986

UCAM-CL-TR-1

M.F. Challis:

The JACKDAW database package
October 1974, 15 pages, PDF

Abstract: This report describes a general database package which has been implemented in BCPL on an IBM 370/165 at the University of Cambridge. One current application is the provision of an administrative database for the Computing Service.

Entries within a database may include (in addition to primitive fields such as ‘salary’ and ‘address’) links to other entries: each link represents a relationship between two entries and is always two-way.

Generality is achieved by including within each database class definitions which define the structure of the entries within it; these definitions may be interrogated by program.

The major part of the package presents a procedural interface between an application program and an existing database, enabling entries and their fields to be created, interrogated, updated and deleted. The creation of a new database (or modification of an existing one) by specifying the class definitions is handled by a separate program.

The first part of the report describes the database structure and this is followed by an illustration of the procedural interface. Finally, some of the implementation techniques used to ensure integrity of the database are described.
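The two-way links described in the abstract can be sketched as follows (a hypothetical Python miniature for illustration only; the actual package was written in BCPL and none of its names are preserved here):

```python
class Entry:
    """A database entry: primitive fields plus two-way links to other entries."""

    def __init__(self, **fields):
        self.fields = fields     # primitive fields such as 'salary', 'address'
        self.links = set()       # entries related to this one

    def link(self, other):
        # each link represents a relationship and is always two-way:
        # storing it on one entry automatically stores the reverse
        self.links.add(other)
        other.links.add(self)

# a minimal example in the spirit of the administrative database
alice = Entry(name="Alice", salary=1200)
service = Entry(name="Computing Service")
alice.link(service)
assert service in alice.links and alice in service.links
```

Maintaining both directions inside `link` is what keeps every relationship navigable from either end without a separate reverse index.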

UCAM-CL-TR-2

J. Larmouth:

Scheduling for a share of the machine
October 1974, 29 pages, PDF

Abstract: This paper describes the mechanism used to schedule jobs and control machine use on the IBM 370/165 at Cambridge University, England. The same algorithm is currently being used in part at the University of Bradford, and implementations are in progress or under study for a number of other British universities.

The system provides computer management with a simple tool for controlling machine use. The managerial decision allocates a share of the total machine resources to each user of the system, either directly, or via a hierarchical allocation scheme. The system then undertakes to vary the turnaround of user jobs to ensure that those decisions are effective, no matter what sort of work the user is doing.

At the user end of the system there is great flexibility in the way in which each user spends the resources he has received, allowing him a rapid turnaround for those (large or small) jobs which require it, and a slower turnaround for other jobs. Provided he does not work at a rate exceeding that appropriate to his share of the machine, he can request, for every job he submits, the ‘deadline’ by which he wants it running, and the system will usually succeed in running his job at about the requested time – rarely later, and only occasionally sooner.

Every job in the machine has its own ‘deadline’, and the machine is not underloaded. Within limits, each user can request his jobs back when he wants them, and the system keeps his use within the share of the machine he has been given. The approach is believed to be an original one and to have a number of advantages over more conventional scheduling and controlling algorithms.
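The share-and-deadline idea can be illustrated with a toy sketch (hypothetical Python; Larmouth's actual algorithm varies turnaround dynamically rather than ordering a fixed batch, and `run_order` is an invented name):

```python
def run_order(jobs, shares):
    """Order jobs by requested deadline, deferring jobs from users who
    have already used up their share of the machine.

    jobs:   list of (deadline, user, cost) tuples
    shares: mapping user -> total cost that user's share permits
    Returns two lists: jobs run on time, and jobs deferred."""
    used = {user: 0 for user in shares}
    on_time, deferred = [], []
    for deadline, user, cost in sorted(jobs):   # earliest deadline first
        if used[user] + cost <= shares[user]:
            used[user] += cost
            on_time.append((deadline, user))
        else:                                   # over share: turnaround degrades
            deferred.append((deadline, user))
    return on_time, deferred

on_time, deferred = run_order(
    [(10, "a", 5), (20, "a", 5), (15, "b", 4)],
    {"a": 8, "b": 8},
)
# user "a" exceeds their share with the second job, so only it is deferred
```

The point of the sketch is the separation the abstract describes: the managerial share decision (`shares`) is enforced by the system, while the per-job deadlines remain under the user's control.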

UCAM-CL-TR-3

A.J.M. Stoneley:

A replacement for the OS/360 disc space management routines
April 1975, 7 pages, PDF

Abstract: In the interest of efficiency, the IBM disc space management routines (Dadsm) have been completely replaced in the Cambridge 370/165.

A large reduction in the disc traffic has been achieved by keeping the lists of free tracks in a more compact form and by keeping lists of free VTOC blocks. The real time taken in a typical transaction has been reduced by a factor of twenty.

By writing the code in a more appropriate form than the original, the size has been decreased by a factor of five, thus making it more reasonable to keep it permanently resident. The cpu requirement has decreased from 5% to 0.5% of the total time during normal service.

The new system is very much safer than the old in the face of total system crashes. The old system gave little attention to the consequences of being stopped in mid-flight, and it was common to discover an area of disc allocated to two files. This no longer happens.
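The “more compact form” for free-track lists can be illustrated with run-length extents (a hypothetical Python sketch; the report does not specify the exact representation used):

```python
def to_extents(free_tracks):
    """Compress a collection of free track numbers into (start, length)
    extents, so a long run of free tracks costs one list entry, not one
    entry per track."""
    extents = []
    for track in sorted(free_tracks):
        if extents and extents[-1][0] + extents[-1][1] == track:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)   # extend the current run
        else:
            extents.append((track, 1))          # start a new run
    return extents

# six free tracks collapse into three extents
assert to_extents([5, 6, 7, 20, 21, 40]) == [(5, 3), (20, 2), (40, 1)]
```

A smaller free list means fewer blocks to read and write per allocation, which is consistent with the disc-traffic reduction the abstract reports.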

UCAM-CL-TR-4

A.J.M. Stoneley:

The dynamic creation of I/O paths under OS/360-MVT
April 1975, 16 pages, PDF

Abstract: In a large computer it is often desirable and convenient for an ordinary program to be able to establish for itself a logical connection to a peripheral device. This ability is normally provided through a routine within the operating system which may be called by any user program at any time. OS/360 lacks such a routine. For the batch job, peripheral connections can only be made through the job control language, and this cannot be done dynamically at run-time. In the restricted context of TSO (IBM’s terminal system) a routine for establishing peripheral connections does exist, but it is extremely inefficient and difficult to use.

This paper describes how a suitable routine was written and grafted into the operating system of the Cambridge 370/165.

UCAM-CL-TR-5

P. Hazel, A.J.M. Stoneley:

Parrot – A replacement for TCAM
April 1976, 25 pages, PDF

Abstract: The terminal driving software and hardware for the Cambridge TSO (Phoenix) system is described. TCAM and the IBM communications controller were replaced by a locally written software system and a PDP-11 complex. This provided greater flexibility, reliability, efficiency and a better “end-user” interface than was possible under a standard IBM system.

UCAM-CL-TR-6

Andrew D. Birrell:

System programming in a high level language
125 pages, paper copy
PhD thesis (Trinity College, December 1977)

UCAM-CL-TR-7

Andrew Hopper:

Local area computer communication networks
April 1978, 192 pages, PDF
PhD thesis (Trinity Hall, April 1978)

Abstract: In this thesis a number of local area network architectures are studied and the feasibility of an LSI design for a universal local network chip is considered. The thesis begins with a survey of current network technologies and a discussion of some of the problems encountered in local network design. Existing implementations of local networks are then discussed, and their performance compared. Ultimately, the design considerations for a general purpose, microprogrammed, LSI network chip are discussed. Such a circuit is able to handle a range of network architectures and can be reconfigured to suit various traffic patterns. Finally, some of the protocol requirements of local networks are discussed, leading to a redesign of the Cambridge ring to provide hardware support for protocol implementation.

UCAM-CL-TR-9

Douglas John Cook:

Evaluation of a protection system
181 pages, paper copy
PhD thesis (Gonville & Caius College, April 1978)

UCAM-CL-TR-10

Mark Theodore Pezarro:

Prediction oriented description of database systems
190 pages, paper copy
PhD thesis (Darwin College, October 1978)

UCAM-CL-TR-11

Branimir Konstantinov Boguraev:

Automatic resolution of linguistic ambiguities
222 pages, PDF
PhD thesis (Trinity College, August 1979)

Abstract: The thesis describes the design, implementation and testing of a natural language analysis system capable of performing the task of generating paraphrases in a highly ambiguous environment. The emphasis is on incorporating strong semantic judgement in an augmented transition network grammar: the system provides a framework for examining the relationship between syntax and semantics in the process of text analysis, especially while treating the related phenomena of lexical and structural ambiguity. Word-sense selection is based on global analysis of context within a semantically well-formed unit, with primary emphasis on the verb choice. In building structures representing text meaning, the analyser relies not on screening through many alternative structures – intermediate, syntactic or partial semantic – but on dynamically constructing only the valid ones. The two tasks of sense selection and structure building are procedurally linked by the application of semantic routines derived from Y. Wilks’ preference semantics, which are invoked at certain well chosen points of the syntactic constituent analysis – this delimits the scope of their action and provides context for a particular disambiguation technique. The hierarchical process of sentence analysis is reflected in the hierarchical organisation of application of these semantic routines – this allows the efficient coordination of various disambiguation techniques, and the reduction of syntactic backtracking, non-determinism in the grammar, and semantic parallelism. The final result of the analysis process is a dependency structure providing a meaning representation of the input text, with labelled components centred on the main verb element, each characterised in terms of semantic primitives and expressing both the meaning of a constituent and its function in the overall textual unit. The representation serves as an input to the generator, organised around the same underlying principle as the analyser – the verb is central to the clause. Currently the generator works in paraphrase mode, but it is specifically designed so that with minimum effort and virtually no change in the program control structure and code it could be switched over to perform translation.

The thesis discusses the rationale for the approach adopted, comparing it with others, describes the system and its machine implementation, and presents experimental results.

UCAM-CL-TR-12

M.R.A. Oakley, P. Hazel:

HASP “IBM 1130” multileaving remote job entry protocol with extensions as used on the University of Cambridge IBM 370/165
September 1979, 28 pages, PDF

Abstract: This document brings together most of the information required to design, write and operate a HASP Remote Job Entry Terminal program. Most of the document describes facilities available using any host computer supporting the HASP protocols. The remainder of the document describes improvements to these facilities which have been made in order to enhance the reliability of the system, to make it easier to run, and to provide for a wider range of peripherals than the basic system.

UCAM-CL-TR-13

Philip Hazel:

Resource allocation and job scheduling
41 pages, PDF

Abstract: The mechanisms for sharing the resources of the Cambridge IBM 370/165 computer system among many individual users are described. File store is treated separately from other resources such as central processor and channel time. In both cases, flexible systems that provide incentives to thrifty behaviour are used. The method of allocating resources directly to users, rather than in a hierarchical manner via faculties and departments, is described, and its social acceptability is discussed.

UCAM-CL-TR-14

J.S. Powers:

Store to store swapping for TSO under OS/MVT
June 1980, 28 pages, PDF

Abstract: A system of store-to-store swapping incorporated into TSO on the Cambridge IBM 370/165 is described. Unoccupied store in the dynamic area is used as the first stage of a two-stage backing store for swapping time-sharing sessions; a fixed-head disc provides the second stage. The performance and costs of the system are evaluated.

UCAM-CL-TR-15

I.D. Wilson:

The implementation of BCPL on a Z80 based microcomputer
68 pages, PDF

Abstract: The main aim of this project was to achieve as full an implementation as possible of BCPL on a floppy disc based microcomputer running CP/M or CDOS (the two being essentially compatible). On the face of it there seemed so many limiting factors that, when the project was started, it was not at all clear which one (if any) would become a final stumbling block. As it happened, the major problems that cropped up could be programmed round, or altered in such a way as to make them soluble.

The main body of the work splits comfortably into three sections, and the writer hopes, by covering each section separately, to show how the whole project fits together into the finished implementation.

UCAM-CL-TR-16

Jeremy Dion:

Reliable storage in a local network
February 1981, 142 pages, PDF
PhD thesis (Darwin College, February 1981)

Abstract: A recent development in computer science has been the advent of local computer networks: collections of autonomous computers in a small geographical area connected by a high-speed communications medium. In such a situation it is natural to specialise some of the computers to provide useful services to others in the network. These server machines can be economically advantageous if they provide shared access to expensive mechanical devices such as discs.

This thesis discusses the problems involved in designing a file server to provide a storage service in a local network. It is based on experience gained from the design and implementation of a file server for the Cambridge ring.

An important aspect of the design of a file server is the choice of the service which is provided to client machines. The spectrum of choice ranges from providing a simple remote disc, with operations such as read and write block, to a remote file system with directories and textual names. The interface chosen for the Cambridge file server is “universal” in that the services it provides are intended to allow easy implementation of both virtual memory systems and filing systems.

The second major aspect of the file server design concerns reliability. If the server is to store important information for clients, then it is essential that it be resistant to transient errors such as communications or power failures. The general problems of reliability and crash resistance are discussed in terms of a model developed for this purpose. Different reliability strategies used in current data base and filing systems are related to the model, and a mechanism for providing atomic transactions in the Cambridge file server is described in detail. An improved mechanism which allows atomic transactions on multiple files is also described and contrasted with the first version. The revised design allows several file servers in a local network to cooperate in atomic updates to arbitrary collections of files.

UCAM-CL-TR-17

B.K. Boguraev, K. Sparck Jones, J.I. Tait:

Three papers on parsing
1982, 22 pages, PDF

Abstract: This collection of three papers examines current problems in the parsing of natural language. The first paper investigates the parsing of compound nouns, and suggests that the existing strategies are inadequate. Accepting that better approaches are needed, the paper then proceeds to examine the implications for natural language processing systems.

The second paper in the collection examines the task of recognising conjunctions within an ATN grammar. To do this only through the grammar specification is difficult and results in a bulky grammar. The paper therefore presents some ideas for extending the ATN mechanism to better deal with conjunctions.

The final paper considers ways in which semantic parsers can exploit syntactic constraints. Two specific semantic parsers are considered: those of Cater and Boguraev, which are regarded as being representative of two styles of parsing. The main conclusion to be drawn is that there are significant disadvantages to semantic parsing without complete syntactic processing of the input.

UCAM-CL-TR-18

Burkard Wordenweber:

Automatic mesh generation of 2 & 3 dimensional curvilinear manifolds
November 1981, 128 pages, paper copy
PhD thesis (St John’s College, November 1981)

UCAM-CL-TR-19

Arthur William Sebright Cater:

Analysis and inference for English
September 1981, 223 pages, paper copy
PhD thesis (Queens’ College, September 1981)

UCAM-CL-TR-20

Avra Cohn, Robin Milner:

On using Edinburgh LCF to prove the correctness of a parsing algorithm
February 1982, 23 pages, PDF

Abstract: The methodology of Edinburgh LCF, a mechanized interactive proof system, is illustrated through a problem suggested by Gloess – the proof of a simple parsing algorithm. The paper is self-contained, giving only the relevant details of the LCF proof system. It is shown how tactics may be composed in LCF to yield a strategy which is appropriate for the parser problem but which is also of a generally useful form. Also illustrated is a general mechanized method of deriving structural induction rules within the system.

UCAM-CL-TR-21

A. Cohn:

The correctness of a precedence parsing algorithm in LCF
April 1982, 38 pages, PDF

Abstract: This paper describes the proof in the LCF system of a correctness property of a precedence parsing algorithm. The work is an extension of a simpler parser and proof by Cohn and Milner (Cohn & Milner 1982). Relevant aspects of the LCF system are presented as needed. In this paper, we emphasize (i) that although the current proof is much more complex than the earlier one, many of the same metalanguage strategies and aids developed for the first proof are used in this proof, and (ii) that (in both cases) a general strategy for doing some limited forward search is incorporated neatly into the overall goal-oriented proof framework.

UCAM-CL-TR-22

M. Robson:

Constraints in CODD
18 pages, PDF

Abstract: The paper describes the implementation of the data structuring concepts of domains, intra-tuple constraints and referential constraints in the relational DBMS CODD. All of these constraints capture some of the semantics of the database’s application.

Each class of constraint is described briefly and it is shown how each of them is specified. The constraints are stored in the database, giving a centralised data model which contains descriptions of procedures as well as of static structures. Some extensions to the notion of referential constraint are proposed, and it is shown how generalisation hierarchies can be expressed as sets of referential constraints. It is shown how the stored data model is used in enforcement of the constraints.

UCAM-CL-TR-23

J.I. Tait:

Two papers about the Scrabble summarising system
12 pages, PDF

Abstract: This report contains two papers which describe parts of the Scrabble English summarizing system. The first, “Topic identification techniques for predictive language analyzers”, has been accepted as a short communication for the 9th International Conference on Computational Linguistics, in Prague. The second, “General summaries using a predictive language analyser”, is an extended version of a discussion paper which will be presented at the European Conference on Artificial Intelligence in Paris. Both conferences will take place during July 1982.

The [second] paper describes a computer system capable of producing coherent summaries of English texts even when they contain sections which the system has not understood completely. The system employs an analysis phase which is not dissimilar to a script applier, together with a rather more sophisticated summariser than previous systems. Some deficiencies of earlier systems are pointed out, and ways in which the current implementation overcomes them are discussed.

UCAM-CL-TR-24

B.K. Boguraev, K. Sparck Jones:

Steps towards natural language to data language translation using general semantic information
March 1982, 8 pages, PDF

Abstract: The aim of the work reported here is to maximise the use of general semantic information in an AI task processor, specifically in a system front end for converting natural language questions into formal database queries. The paper describes the translation component of such a front end, which is designed to work from the question meaning representation produced by a language analyser exploiting only general semantics and syntax, to a formal query relying on database-specific semantics and syntax. Translation is effected in three steps, and the paper suggests that the rich and explicit meaning representations using semantic primitives produced for input sentences by the analyser constitute a natural and effective base for further processing.

UCAM-CL-TR-25

Hiyan Alshawi:

A clustering technique for semantic network processing
May 1982, 9 pages, PDF

Abstract: This paper describes techniques for performing serial processing on the type of semantic network exemplified by NETL. They make use of an indexing scheme that can be based on semantic clustering. The basic algorithm is aimed at performing fast intersection operations. It is claimed that the scheme is suitable for its current application in text processing. The semantic criteria for clustering that have been tried are briefly described. Extensions of the scheme are suggested for use with large networks.
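The indexing idea can be sketched roughly as follows (hypothetical Python; the paper's NETL-style marker operations are reduced here to plain set intersection, and the function names are invented):

```python
def cluster_index(nodes, cluster_of):
    """Group nodes by semantic cluster so an intersection need only
    visit clusters that occur on both sides."""
    index = {}
    for node in nodes:
        index.setdefault(cluster_of(node), set()).add(node)
    return index

def fast_intersect(index_a, index_b):
    result = set()
    for cluster in index_a.keys() & index_b.keys():   # shared clusters only
        result |= index_a[cluster] & index_b[cluster]
    return result

# toy clustering criterion: group node ids by their hundreds digit
by_hundreds = lambda n: n // 100
a = cluster_index({101, 102, 305, 407}, by_hundreds)
b = cluster_index({102, 305, 999}, by_hundreds)
assert fast_intersect(a, b) == {102, 305}
```

The saving comes from skipping whole clusters with no counterpart on the other side, which matters most when the network is large and the clusters are semantically coherent.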

UCAM-CL-TR-26

Brian James Knight:

Portable system software for personal computers on a network
204 pages, paper copy
PhD thesis (Churchill College, April 1982)

UCAM-CL-TR-27

Martyn Alan Johnson:

Exception handling in domain based systems
129 pages, paper copy
PhD thesis (Churchill College, September 1981)

UCAM-CL-TR-28

D.C.J. Matthews:

Poly report
August 1982, 17 pages, PDF

Abstract: Poly was designed to provide a programming system with the same flexibility as a dynamically typed language but without the run-time overheads. The type system, based on that of Russell, allows polymorphic operations to be used to manipulate abstract objects, but with all the type checking being done at compile-time. Types may be passed explicitly or by inference as parameters to procedures, and may be returned from procedures. Overloading of names and generic types can be simulated by using the general procedure mechanism. Despite the generality of the language, or perhaps because of it, the type system is very simple, consisting of only three classes of object. There is an exception mechanism, similar to that of CLU, and the exceptions raised in a procedure are considered as part of its ‘type’. The construction of abstract objects and hiding of internal details of the representation come naturally out of the type system.

UCAM-CL-TR-29

D.C.J. Matthews:

Introduction to Poly
May 1982, 24 pages, PDF

Abstract: This report is a tutorial introduction to the programming language Poly. It describes how to write and run programs in Poly using the VAX/UNIX implementation. Examples given include polymorphic list functions, a double precision integer package and a subrange type constructor.

UCAM-CL-TR-30

John Wilkes:

A portable BCPL library
October 1982, 31 pages, PDF

Abstract: Too often, programs written in BCPL are difficult to port from one system to another, not because of the language, but because of differences between ‘standard’ libraries. Almost without exception, the definitions of these libraries are loose, woolly and inaccurate – the proposed BCPL standards document being a prime example. The author has developed and implemented a new BCPL library which is explicitly designed to aid the portability of programs between systems. In addition to being largely portable itself, it has two other features of interest: it uses an exception handling system instead of return codes, and it makes no distinction between system and user defined stream handlers. This paper defines the interface to the package.
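The two library features mentioned – exceptions instead of return codes, and no distinction between system and user defined stream handlers – can be sketched like this (hypothetical Python; the actual BCPL interface is the one defined in the report, and all names here are invented):

```python
class StreamError(Exception):
    """Signalled by a handler instead of returning an error code."""

def list_reader(items):
    """A user-defined stream handler. Any callable works: the copying
    code below cannot tell it from a system-supplied handler."""
    iterator = iter(items)
    def read():
        try:
            return next(iterator)
        except StopIteration:
            raise StreamError("end of stream")
    return read

def copy_stream(read, write):
    """Copy until the reader signals the end by raising, so the main
    loop contains no return-code checks at all."""
    try:
        while True:
            write(read())
    except StreamError:
        pass

out = []
copy_stream(list_reader([1, 2, 3]), out.append)
assert out == [1, 2, 3]
```

Treating handlers as interchangeable callables is what lets one `copy_stream` serve files, devices and purely in-memory streams alike.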

UCAM-CL-TR-31

J. Fairbairn:

Ponder and its type system
November 1982, 42 pages, PDF

Abstract: This note describes the programming language “Ponder”, which is designed according to the principles of referential transparency and “orthogonality” as in [vWijngaarden 75]. Ponder is designed to be simple, being functional with normal order semantics. It is intended for writing large programmes, and to be easily tailored to a particular application. It has a simple but powerful polymorphic type system.

The main objective of this note is to describe the type system of Ponder. As with the whole of the language design, the smallest possible number of primitives is built in to the type system. Hence, for example, unions and pairs are not built in, but can be constructed from other primitives.

UCAM-CL-TR-32

B.K. Boguraev, K. Sparck Jones:

How to drive a database front end using general semantic information
November 1982, 20 pages, PDF

Abstract: This paper describes a front end for natural language access to databases making extensive use of general, i.e. domain-independent, semantic information for question interpretation. In the interests of portability, initial syntactic and semantic processing of a question is carried out without any reference to the database domain, and domain-dependent operations are confined to subsequent, comparatively straightforward, processing of the initial interpretation. The different modules of the front end are described, and the system’s performance is illustrated by examples.

UCAM-CL-TR-33

John A. Carroll:

An island parsing interpreter for Augmented Transition Networks
October 1982, 50 pages, PDF

Abstract: This paper describes the implementation of an ‘island parsing’ interpreter for an Augmented Transition Network (ATN). The interpreter provides more complete coverage of Woods’ original ATN formalism than his later island parsing implementation; it is written in LISP and has been modestly tested.

UCAM-CL-TR-34

Larry Paulson:

Recent developments in LCF: examples of structural induction
January 1983, 15 pages, PDF

Abstract: Manna and Waldinger have outlined a large proof that probably exceeds the power of current theorem-provers. The proof establishes the unification algorithm for terms composed of variables, constants, and other terms. Two theorems from this proof, involving structural induction, are performed in the LCF proof assistant. These theorems concern a function that searches for an occurrence of one term inside another, and a function that lists the variables in a term.

Formally, terms are regarded as abstract syntax trees. LCF automatically builds the first-order theory, with equality, of this recursive data structure.

The first theorem has a simple proof: induction followed by rewriting. The second theorem requires a case split and substitution throughout the goal. Each theorem is proved by reducing the initial goal to simpler and simpler subgoals. LCF provides many standard proof strategies for attacking goals; the user can program additional ones in LCF’s meta-language, ML. This flexibility allows users to take ideas from such diverse fields as denotational semantics and logic programming.

UCAM-CL-TR-35

Larry Paulson:

Rewriting in Cambridge LCF
February 1983, 32 pages, DVI

Abstract: Many automatic theorem-provers rely on rewriting. Using theorems as rewrite rules helps to simplify the subgoals that arise during a proof.

LCF is an interactive theorem-prover intended for reasoning about computation. Its implementation of rewriting is presented in detail. LCF provides a family of rewriting functions, and operators to combine them. A succession of functions is described, from pattern matching primitives to the rewriting tool that performs most inferences in LCF proofs.

The design is highly modular. Each function performs a basic, specific task, such as recognizing a certain form of tautology. Each operator implements one method of building a rewriting function from simpler ones. These pieces can be put together in numerous ways, yielding a variety of rewriting strategies.

The approach involves programming with higher-order functions. Rewriting functions are data values, produced by computation on other rewriting functions. The code is in daily use at Cambridge, demonstrating the practical use of functional programming.
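The combinator style described – rewriting functions as data values, built from simpler ones by operators – can be sketched in Python (a loose analogy, not LCF's actual ML code; `Fail`, `orelse` and `repeat` are invented names):

```python
class Fail(Exception):
    """Raised when a rewriting function does not apply to a term."""

def orelse(r1, r2):
    """Operator: try r1, fall back to r2 on failure."""
    def rewrite(term):
        try:
            return r1(term)
        except Fail:
            return r2(term)
    return rewrite

def repeat(r):
    """Operator: apply r until it fails, returning the last result."""
    def rewrite(term):
        while True:
            try:
                term = r(term)
            except Fail:
                return term
    return rewrite

# two basic rewriting functions on tuple-shaped terms
def add_zero(term):                  # rule:  x + 0  ->  x
    if isinstance(term, tuple) and term[0] == "+" and term[2] == 0:
        return term[1]
    raise Fail(term)

def mul_one(term):                   # rule:  x * 1  ->  x
    if isinstance(term, tuple) and term[0] == "*" and term[2] == 1:
        return term[1]
    raise Fail(term)

simplify = repeat(orelse(add_zero, mul_one))
assert simplify(("+", ("*", "x", 1), 0)) == "x"
```

Because `orelse` and `repeat` return ordinary functions, strategies compose freely, which is the modularity the abstract emphasises.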

UCAM-CL-TR-36

Lawrence Paulson:

The revised logic PPLAMBDA
A reference manual
March 1983, 28 pages, PDF

Abstract: PPLAMBDA is the logic used in the Cambridge LCF proof assistant. It allows Natural Deduction proofs about computation, in Scott’s theory of partial orderings. The logic’s syntax, axioms, primitive inference rules, derived inference rules and standard lemmas are described, as are the LCF functions for building and taking apart PPLAMBDA formulas.

PPLAMBDA’s rule of fixed-point induction admits a wide class of inductions, particularly where flat or finite types are involved. The user can express and prove these type properties in PPLAMBDA. The induction rule accepts a list of theorems, stating type properties to consider when deciding to admit an induction.

UCAM-CL-TR-37

Christopher Gray Girling:

Representation and authentication on computer networks
154 pages, paper copy
PhD thesis (Queens’ College, April 1983)

UCAM-CL-TR-38

Mike Gray:

Views and imprecise information in databases
119 pages, paper copy
PhD thesis (November 1982)

UCAM-CL-TR-39

Lawrence Paulson:

Tactics and tacticals in Cambridge LCF
July 1983, 26 pages, PDF

Abstract: The tactics and tacticals of Cambridge LCF are described. Tactics reason about logical connectives, substitution and rewriting; tacticals combine tactics into more powerful tactics. LCF’s package for managing an interactive proof is discussed. This manages the subgoal tree, presenting the user with unsolved goals and assembling the final proof.

While primarily a reference manual, the paper contains a brief introduction to goal-directed proof. An example shows typical use of the tactics and subgoal package.
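The tactic/tactical distinction can be sketched in a hypothetical Python miniature (a tactic here maps a goal to a list of subgoals, and `then_tac` plays the role of an LCF tactical such as THEN; none of these names come from LCF itself):

```python
class TacticFails(Exception):
    """Raised when a tactic does not apply to a goal."""

def then_tac(t1, t2):
    """Tactical: apply t1 to the goal, then t2 to every subgoal it
    produces. A goal is proved when the subgoal list comes back empty."""
    def tactic(goal):
        return [sub2 for sub1 in t1(goal) for sub2 in t2(sub1)]
    return tactic

# two toy tactics on tuple-shaped goals
def split_and(goal):
    if isinstance(goal, tuple) and goal[0] == "and":
        return [goal[1], goal[2]]     # a conjunction yields two subgoals
    raise TacticFails(goal)

def assume(goal):
    return []                         # solves any remaining goal outright

prove_conj = then_tac(split_and, assume)
assert prove_conj(("and", "p", "q")) == []   # no subgoals remain
```

As with rewriting functions, tactics are ordinary values, so a subgoal package only needs to track the tree of goals each tactic leaves behind.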

UCAM-CL-TR-40

W. Stoye:

The SKIM microprogrammer’s guide
October 1983, 33 pages, PDF

Abstract: This paper describes the design and imple-mentation of the SKIM micorprocessor. The processorhas a 24 bit ALU with 16 general purpose registers.The main unique feature is a large microcode store ofup to 64K 40 bit words, with the intention that the mi-crocode could be used like the machine code on a con-ventional processor, with operating system primitivesbeing programmed in microcode.

The processor has been constructed from TTL logic, with a microcode assembler running on Phoenix. A debugger for both the hardware and microcode programs runs on the host machine, currently a BBC Microcomputer.

The processor architecture is discussed, with examples of microcode programming. Comparisons with other processors are made, and some of the limitations of the present design are noted.

UCAM-CL-TR-41

Mike Gordon:

LCF LSM, A system for specifying and verifying hardware
September 1983, 53 pages, PDF

Abstract: The LCF LSM system is designed to show that it is practical to prove the correctness of real hardware. The system consists of a programming environment (LCF) and a specification language (LSM). The environment contains tools for manipulating and reasoning about the specifications. Verification consists in proving that a low-level (usually structural) description is behaviourally equivalent to a high-level functional description. Specifications can be fully hierarchical, and at any level devices can be specified either functionally or structurally.

As a first case study a simple microcoded computer has been verified. This proof is described in a companion report. In this we also illustrate the use of the system for other kinds of manipulation besides verification. For example, we show how to derive an implementation of a hard-wired controller from a microprogram and its decoding and sequencing logic. The derivation is done using machine checked inference; this ensures that the hard-wired controller is equivalent to the microcoded one. We also show how to code a microassembler. These examples illustrate our belief that LCF is a good environment for implementing a wide range of tools for manipulating hardware specifications.

This report has two aims: first to give an overview of the ideas embodied in LCF LSM, and second, to be a user manual for the system. No prior knowledge of LCF is assumed.

UCAM-CL-TR-42

Mike Gordon:

Proving a computer correct with the LCF LSM hardware verification system
September 1983, 49 pages, PDF

Abstract: A machine generated correctness proof of asimple computer is described.

At the machine code level the computer has a memory and two registers: a 13-bit program counter and a 16-bit accumulator. There are 8 machine instructions: halt, unconditional jump, jump when the accumulator contains 0, add contents of a memory location to accumulator, subtract contents of a location from accumulator, load accumulator from memory, store contents of accumulator in memory, and skip. The machine can be interrupted by pushing a button on its front panel.

The implementation which we prove correct has 6 data registers, an ALU, a memory, and a microcode controller. The controller consists of a ROM holding 26 30-bit microinstructions, a microprogram counter, and some combinatorial microinstruction decode logic.

Formal specifications of the target and host machines are given, and we describe the main steps in proving that the host correctly fetches, decodes and executes machine instructions.

The utility of LCF LSM for general manipulation is illustrated in two appendices. In appendix 1 we show how to code a microassembler. In appendix 2 we use the LCF LSM inference rules to design a hard-wired controller equivalent to the original microcoded one.

N.B. This report should be read in conjunction with LCF LSM: A system for specifying and verifying hardware, University of Cambridge Computer Laboratory technical report number 41.

UCAM-CL-TR-43

Ian Malcom Leslie:

Extending the local area network
February 1983, 71 pages, PDF
PhD thesis (Darwin College, February 1983)

Abstract: This dissertation is concerned with the development of a large computer network which has many properties associated with local area computer networks, including high bandwidth and low error rates. The network is made up of component local area networks, specifically Cambridge rings, which are connected either through local ring-ring bridges or through a high capacity satellite link. In order to take advantage of the characteristics of the resulting network, the protocols used are the same simple protocols as those used on a single Cambridge ring. This in turn allows many applications, which might have been thought of as local area network applications, to run on the larger network.

Much of this work is concerned with an interconnection strategy which allows hosts of different component networks to communicate in a flexible manner without building an extra internetwork layer into the protocol hierarchy. The strategy arrived at is neither a datagram approach nor a system of concatenated error and flow controlled virtual circuits. Rather, it is a lightweight virtual circuit approach which preserves the order of blocks sent on a circuit, but which makes no other guarantees about the delivery of these blocks. An extra internetwork protocol layer is avoided by modifying the system used on a single Cambridge ring which binds service names to addresses, so that it now binds service names to routes across the network.

UCAM-CL-TR-44

Lawrence Paulson:

Structural induction in LCF
November 1983, 35 pages, paper copy

UCAM-CL-TR-45

Karen Sparck Jones:

Compound noun interpretation problems
July 1983, 16 pages, PDF

Abstract: This paper discusses the problems of compound noun interpretation in the context of automatic language processing. Given that compound processing implies identifying the senses of the words involved, determining their bracketing, and establishing their underlying semantic relations, the paper illustrates the need, even in comparatively favourable cases, for inference using pragmatic information. This has consequences for language processor architectures and, even more, for speech processors.

UCAM-CL-TR-46

Nicholas Henry Garnett:

Intelligent network interfaces
140 pages, paper copy
PhD thesis (Trinity College, May 1983)

UCAM-CL-TR-47

John Irving Tait:

Automatic summarising of English texts
137 pages, PDF
PhD thesis (Wolfson College, December 1982)

Abstract: This thesis describes a computer program called Scrabble which can summarise short English texts. It uses large bodies of predictions about the likely contents of texts about particular topics to identify the commonplace material in an input text. Pre-specified summary templates, each associated with a different topic, are used to condense the commonplace material in the input. Filled-in summary templates are then used to form a framework into which unexpected material in the input may be fitted, allowing unexpected material to appear in output summary texts in an essentially unreduced form. The system’s summaries are in English.

The program is based on technology not dissimilar to a script applier. However, Scrabble represents a significant advance over previous script-based summarising systems. It is much less likely to produce misleading summaries of an input text than some previous systems and can operate with less information about the subject domain of the input than others.

These improvements are achieved by the use of three main novel ideas. First, the system incorporates a new method for identifying the topic or topics of an input text. Second, it allows a section of text to have more than one topic at a time, or at least a composite topic which may be dealt with by the computer program simultaneously applying the text predictions associated with more than one simple topic. Third, Scrabble incorporates new mechanisms for the incorporation of unexpected material in the input into its output summary texts. The incorporation of such material in the output summary is motivated by the view that it is precisely unexpected material which is likely to form the most salient matter in the input text.

The performance of the system is illustrated by means of a number of example input texts and their Scrabble summaries.

UCAM-CL-TR-48

Hiyan Alshawi:

A mechanism for the accumulation and application of context in text processing
November 1983, 17 pages, PDF

Abstract: The paper describes a mechanism for the representation and application of context information for automatic natural language processing systems. Context information is gathered gradually during the reading of the text, and the mechanism gives a way of combining the effect of several different types of context factors. Context factors can be managed independently, while still allowing efficient access to entities in focus. The mechanism is claimed to be more general than the global focus mechanism used by Grosz for discourse understanding. Context affects the interpretation process by choosing the results, and restricting the processing, of a number of important language interpretation operations, including lexical disambiguation and reference resolution. The types of context factors that have been implemented in an experimental system are described, and examples of the application of context are given.

UCAM-CL-TR-49

David Charles James Matthews:

Programming language design withpolymorphism

143 pages, paper copy
PhD thesis (Wolfson College, 1983)

UCAM-CL-TR-50

Lawrence Paulson:

Verifying the unification algorithm in LCF
March 1984, 28 pages, PDF, DVI

Abstract: Manna and Waldinger’s theory of substitutions and unification has been verified using the Cambridge LCF theorem prover. A proof of the monotonicity of substitution is presented in detail, as an example of interaction with LCF. Translating the theory into LCF’s domain-theoretic logic is largely straightforward. Well-founded induction on a complex ordering is translated into nested structural inductions. Correctness of unification is expressed using predicates for such properties as idempotence and most-generality. The verification is presented as a series of lemmas. The LCF proofs are compared with the original ones, and with other approaches. It appears difficult to find a logic that is both simple and flexible, especially for proving termination.
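The algorithm whose correctness is verified is standard first-order unification. As a rough illustration only (the report’s own development is in LCF’s domain-theoretic logic; the representation below, with variables as ‘?’-prefixed strings and compound terms as tuples, is purely an assumption for the sketch):

```python
def is_var(t):
    """Convention for this sketch: variables are strings beginning with '?'."""
    return isinstance(t, str) and t.startswith("?")

def resolve(t, subst):
    """Follow variable bindings in subst until a non-bound term is reached."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    """Occurs check: does variable v appear in term t under subst?"""
    t = resolve(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, subst) for a in t[1:])
    return False

def bind(v, t, subst):
    """Extend subst with v -> t, failing on a circular binding."""
    if occurs(v, t, subst):
        return None
    s2 = dict(subst)
    s2[v] = t
    return s2

def unify(t1, t2, subst):
    """Return a most general unifier extending subst, or None on failure."""
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return bind(t1, t2, subst)
    if is_var(t2):
        return bind(t2, t1, subst)
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and len(t1) == len(t2) and t1[0] == t2[0]):
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None
```

The occurs check in `bind` is what makes termination and idempotence of the result delicate to prove formally, which is exactly the kind of property the report expresses as predicates and establishes as a series of lemmas.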

UCAM-CL-TR-51

Glynn Winskel, Kim Guldstrand Larsen:

Using information systems to solve recursive domain equations effectively
July 1984, 41 pages, PDF

Abstract: This paper aims to make two main contributions. One is to show how to use the concrete nature of Scott’s information systems to advantage in solving recursive domain equations. The method is based on the substructure relation between information systems. This essentially makes a complete partial order (cpo) of information systems. Standard domain constructions like function space can be made continuous on this cpo, so the solution of recursive domain equations reduces to the more familiar construction of forming the least fixed point of a continuous function. The second contribution again relies on the concrete nature of information systems, this time to develop a basic theory of effectively given information systems and through this present a simple treatment of effectively given domains.

UCAM-CL-TR-52

Steven Temple:

The design of a ring communication network
132 pages, PDF
PhD thesis (Corpus Christi College, January 1984)

Abstract: This dissertation describes the design of a high speed local area network. Local networks have been in use now for over a decade and there is a proliferation of different systems, experimental ones which are not widely used and commercial ones installed in hundreds of locations. For a new network design to be of interest from the research point of view it must have a feature or features which set it apart from existing networks and make it an improvement over existing systems. In the case of the network described, the research was started to produce a network which was considerably faster than current designs, but which retained a high degree of generality.

As the research progressed, other features were considered, such as ways to reduce the cost of the network and the ability to carry data traffic of many different types. The emphasis on high speed is still present but other aspects were considered and are discussed in the dissertation. The network has been named the Cambridge Fast Ring, and the network hardware is currently being implemented as an integrated circuit at the University of Cambridge Computer Laboratory.

The aim of the dissertation is to describe the background to the design and the decisions which were made during the design process, as well as the design itself. The dissertation starts with a survey of the uses of local area networks and examines some established networks in detail. It then proceeds by examining the characteristics of a current network installation to assess what is required of the network in that and similar applications. The major design considerations for a high speed network controller are then discussed and a design is presented. Finally, the design of computer interfaces and protocols for the network is discussed.

UCAM-CL-TR-53

Jon Fairbairn:

A new type-checker for a functional language
July 1984, 16 pages, PDF

Abstract: A polymorphic type checker for the functional language Ponder [Fairbairn 82] is described. The initial sections give an overview of the syntax of Ponder, and some of the motivation behind the design of the type system. This is followed by a definition of the relation of ‘generality’ between these types, and of the notion of type-validity of Ponder programs. An algorithm to determine whether a Ponder program is type-valid is then presented. The final sections give examples of useful types which may be constructed within the type system, and describe some of the areas in which it is thought to be inadequate.

UCAM-CL-TR-54

Lawrence Paulson:

Lessons learned from LCF
August 1984, 16 pages, PDF

Abstract: The history and future prospects of LCF are discussed. The introduction sketches basic concepts such as the language ML, the logic PPLAMBDA, and backwards proof. The history discusses LCF proofs about denotational semantics, functional programs, and digital circuits, and describes the evolution of ideas about structural induction, tactics, logics of computation, and the use of ML. The bibliography contains thirty-five references.

UCAM-CL-TR-55

Ben Moszkowski:

Executing temporal logic programs
August 1984, 27 pages, PDF

Abstract: Over the last few years, temporal logic has been investigated as a tool for reasoning about computer programs, digital circuits and message-passing systems. In the case of programs, the general feeling has been that temporal logic is an adjunct to existing languages. For example, one might use temporal logic to specify and prove properties about a program written in, say, CSP. This leads to the annoyance of having to simultaneously use two separate notations.

In earlier work we proposed that temporal logic itself directly serve as the basis for a programming language. Since then we have implemented an interpreter for such a language called Tempura. We are developing Tempura as a tool for directly executing suitable temporal logic specifications of digital circuits and other discrete time systems. Since every Tempura statement is also a temporal formula, we can use the entire temporal logic formalism for our assertion language and semantics. Tempura has the two seemingly contradictory properties of being a logic programming language and having imperative constructs such as assignment statements.

The presentation given here first describes the syntax of a first order temporal logic having the operators ○ (next) and □ (always). This serves as the basis for the Tempura programming language. The lesser known temporal operator chop is subsequently introduced, resulting in Interval Temporal Logic. We then show how to incorporate chop and related constructs into Tempura.
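As a rough illustration of the interval semantics involved (a sketch, not Tempura itself), the operators can be modelled in Python over finite intervals represented as lists of states; this encoding of states as dictionaries and formulas as predicates on intervals is an assumption made purely for the sketch:

```python
def now(p):
    """Lift a state predicate to an interval formula: p holds of the first state."""
    return lambda iv: p(iv[0])

def nxt(f):
    """next f: the interval has a successor state and f holds on the tail interval."""
    return lambda iv: len(iv) > 1 and f(iv[1:])

def always(f):
    """always f: f holds on every suffix interval."""
    return lambda iv: all(f(iv[i:]) for i in range(len(iv)))

def chop(f, g):
    """f;g: the interval divides at some state shared by both subintervals,
    with f holding on the first part and g on the second."""
    return lambda iv: any(f(iv[:k + 1]) and g(iv[k:]) for k in range(len(iv)))
```

The shared dividing state in `chop` is what makes Interval Temporal Logic suitable for sequential composition: the final state of one phase is the initial state of the next.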

UCAM-CL-TR-56

William Stoye:

A new scheme for writing functional operating systems
September 1984, 30 pages, PDF

Abstract: A scheme is described for writing nondeterministic programs in a functional language. The scheme is based on message passing between a number of expressions being evaluated in parallel. I suggest that it represents a significant improvement over previous methods employing a nondeterministic merge primitive, and overcomes numerous drawbacks in that approach. The scheme has been designed in a practical context, and is being used to write an operating system for SKIM, a functionally programmed machine. It is not yet well understood in a mathematical sense.

UCAM-CL-TR-57

Lawrence C. Paulson:

Constructing recursion operators in intuitionistic type theory
October 1984, 46 pages, PDF, DVI

Abstract: Martin-Löf’s Intuitionistic Theory of Types is becoming popular for formal reasoning about computer programs. To handle recursion schemes other than primitive recursion, a theory of well-founded relations is presented. Using primitive recursion over higher types, induction and recursion are formally derived for a large class of well-founded relations. Included are < on natural numbers, and relations formed by inverse images, addition, multiplication, and exponentiation of other relations. The constructions are given in full detail to allow their use in theorem provers for Type Theory, such as Nuprl. The theory is compared with work in the field of ordinal recursion over higher types.

UCAM-CL-TR-58

Glynn Winskel:

Categories of models for concurrency
October 1984, 35 pages, PDF

Abstract: It is shown how a variety of models for concurrent processes can be viewed as categories in which familiar constructions turn out to be significant categorically. Constructions to represent various parallel compositions are often based on a product construction, for instance. In many cases different models can be related by a pair of functors forming an adjunction between the two categories. Because of the way in which such pairs of functors preserve categorical constructions, the adjunction serves to translate between the different models, so it is seen how semantics expressed in terms of one model translate to semantics in terms of another.

UCAM-CL-TR-59

Glynn Winskel:

On the composition and decomposition of assertions
November 1984, 35 pages, PDF

Abstract: Recently there has been a great deal of interest in the problem of how to compose modal assertions, in order to deduce the truth of an assertion for a composition of processes in a parallel programming language, from the truth of certain assertions for its components.

This paper addresses that problem from a theoretical standpoint. The programming language used is Robin Milner’s Synchronous Calculus of Communicating Systems (called SCCS), while the language of assertions is a fragment of dynamic logic which, despite its simplicity, is expressive enough to characterise observational equivalence. It is shown how, with respect to each operation ‘op’ in SCCS, every assertion has a decomposition which reduces the problem of proving the assertion holds of a compound process built up using ‘op’ to proving assertions about its components. These results provide the foundations of a proof system for SCCS with assertions.

UCAM-CL-TR-60

Hiyan Alshawi:

Memory and context mechanisms for automatic text processing
192 pages, paper copy
PhD thesis (Trinity Hall, December 1983)

UCAM-CL-TR-61

Karen Sparck Jones:

User models and expert systems
December 1984, 44 pages, PDF

Abstract: This paper analyses user models in expert systems in terms of the many factors involved: user roles, user properties, model types, model functions in relation to different aspects of system performance, and sources, e.g. linguistic or non-linguistic, of modelling information. The aim of the detailed discussion, with extensive examples illustrating the complexity of modelling, is to clarify the issues involved in modelling, as a necessary preliminary to model building.

UCAM-CL-TR-62

Michael Robson:

Constraint enforcement in a relational database management system
106 pages, paper copy
PhD thesis (St John’s College, March 1984)

UCAM-CL-TR-63

David C.J. Matthews:

Poly manual
February 1985, 46 pages, PDF

Abstract: Poly is a general purpose, high-level programming language. It has a simple type system which is also very powerful. Higher order procedures, polymorphic operations, parameterised abstract types and modules are all supported by a single mechanism.

Poly is strongly typed. All objects have a specification which the compiler can use to check that operations applied to them are sensible. Type errors cannot cause run time faults. The language is safe, meaning that any faults occurring at run time will result in exceptions which can be caught. All variables must be initialised before use, so faults due to undefined variables cannot occur. Poly allows higher order procedures to be declared and used; these take another procedure as a parameter, or return a procedure as the result. Since Poly is statically scoped, this may still refer to the arguments and local variables of the procedure which returned it.

Poly allows polymorphic operations. Thus, it is possible to write one program to perform an operation on data of any type, provided only that the operation is available for the data type. Abstract types may be created and manipulated. These can be specified in such a way that only the functions to manipulate these objects are available to the user. This has the advantage that the implementation can easily be changed, provided that it has the same external properties. Abstract types can be parameterised so that a set of types can be defined in a single definition. Types in Poly are similar to modules in other languages. For example, types can be separately compiled. An abstract type which makes use of other types can be written as though it were polymorphic; it will work if it is given any type which has the required operations. Its operation may be to return a new type which may be used directly or as a parameter to other polymorphic abstract types.

UCAM-CL-TR-64

Branimir K. Boguraev, Karen Sparck Jones:

A framework for inference in natural language front ends to databases
February 1985, 73 pages, paper copy

UCAM-CL-TR-65

Mark Tillotson:

Introduction to the programming language “Ponder”
May 1985, 57 pages, paper copy

UCAM-CL-TR-66

M.J.C. Gordon, J. Herbert:

A formal hardware verification methodology and its application to a network interface chip
May 1985, 39 pages, PDF

Abstract: We describe how the functional correctness of a circuit design can be verified by machine checked formal proof. The proof system used is LCF LSM [1], a version of Milner’s LCF [2] with a different logical calculus called LSM. We give a tutorial introduction to LSM in the paper.

Our main example is the ECL chip of the Cambridge Fast Ring (CFR) [3]. Although the ECL chip is quite simple (about 360 gates) it is nevertheless real. Minor errors were discovered as we performed the formal proof, but when the corrected design was eventually fabricated it was functionally correct first time. The main steps in verification were: (1) Writing a high-level behavioural specification in the LSM notation. (2) Translating the circuit design from its Modula-2 representation in the Cambridge Design Automation System [4] to LSM. (3) Using the LCF LSM theorem proving system to mechanically generate a proof that the behaviour determined by the design is equivalent to the specified behaviour.

In order to accomplish the second of these steps, an interface between the Cambridge Design Automation System and the LCF LSM system was constructed.

UCAM-CL-TR-67

Lawrence C. Paulson:

Natural deduction theorem proving via higher-order resolution
May 1985, 22 pages, PDF

Abstract: An experimental theorem prover is described. Like LCF it is embedded in the metalanguage ML and supports backward proof using tactics and tacticals. The prover allows a wide class of logics to be introduced using Church’s representation of quantifiers in the typed lambda-calculus. The inference rules are expressed as a set of generalized Horn clauses containing higher-order variables. Depth-first subgoaling along inference rules is essentially linear resolution, but using higher-order unification instead of first-order. This constitutes a higher-order Prolog interpreter.

The rules of Martin-Löf’s Constructive Type Theory have been entered into the prover. Special tactics inspect a goal and decide which type theory rules may be appropriate, avoiding excessive backtracking. These tactics can automatically derive the types of many Type Theory expressions. Simple functions can be derived interactively.
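The depth-first subgoaling strategy described above can be caricatured in a few lines. The Python sketch below is deliberately simplified to the propositional case (the actual prover works with generalized Horn clauses and higher-order unification, not this toy):

```python
# Horn clauses as a map from each conclusion to a list of alternative
# premise lists. Depth-first subgoaling: to prove a goal, pick a rule
# concluding it and prove its premises in turn, backtracking over
# the alternatives.

def prove(goal, rules, depth=20):
    """Return True if goal is derivable within the given depth bound."""
    if depth == 0:
        return False  # crude bound guarding against looping rule sets
    for premises in rules.get(goal, []):
        if all(prove(p, rules, depth - 1) for p in premises):
            return True
    return False

# Toy rule set: A is an axiom; B follows from A; C follows from A and B.
toy_rules = {"A": [[]], "B": [["A"]], "C": [["A", "B"]]}
```

The report’s “special tactics” correspond to choosing which clause to try for a goal; in the toy above that choice is just left-to-right search over the alternatives.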

UCAM-CL-TR-68

Mike Gordon:

HOL
A machine oriented formulation of higher order logic
July 1985, 52 pages, PDF

Abstract: In this paper we describe a formal language intended as a basis for hardware specification and verification. The language is not new; the only originality in what follows lies in the presentation of the details. Considerable effort has gone into making the formalism suitable for manipulation by computer.

The logic described here underlies an automated proof generator called HOL. The HOL logic is derived from Church’s Simple Type Theory by: making the syntax more readable, allowing types to contain variables, and building in the Axiom of Choice via Hilbert’s ε-operator.

The exact syntax of the logic is defined relative to a theory, which determines the types and constants that are available. Theories are developed incrementally starting from the standard theories of truth-values or booleans, and of individuals. This paper describes the logic underlying the HOL system.

UCAM-CL-TR-69

Lawrence C. Paulson:

Proving termination of normalization functions for conditional expressions
June 1985, 16 pages, PDF, DVI

Abstract: Boyer and Moore have discussed a recursive function that puts conditional expressions into normal form. It is difficult to prove that this function terminates on all inputs. Three termination proofs are compared: (1) using a measure function, (2) in domain theory using LCF, (3) showing that its “recursion relation”, defined by the pattern of recursive calls, is well-founded. The last two proofs are essentially the same though conducted in markedly different logical frameworks. An obviously total variant of the normalize function is presented as the ‘computational meaning’ of those two proofs.

A related function makes nested recursive calls. The three termination proofs become more complex: termination and correctness must be proved simultaneously. The recursion relation approach seems flexible enough to handle subtle termination proofs where previously domain theory seemed essential.
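The recursive function in question can be sketched in Python as follows. This is a hedged reconstruction of the Boyer–Moore normalisation function, with expressions represented (purely as an assumption for the sketch) as atoms or ("if", test, then, else) tuples; its subtle termination argument is the subject of the report:

```python
def normalize(e):
    """Put a conditional expression into normal form: no 'if' in test position."""
    if not isinstance(e, tuple):
        return e  # an atom is already normal
    _, t, x, y = e
    if isinstance(t, tuple):
        # (if (if a b c) x y)  ==>  (if a (if b x y) (if c x y))
        # The argument grows here, which is why termination is not obvious;
        # a suitable measure function nevertheless decreases on each call.
        _, a, b, c = t
        return normalize(("if", a, ("if", b, x, y), ("if", c, x, y)))
    return ("if", t, normalize(x), normalize(y))
```

The rewriting step duplicates the branches x and y, so the expression can get larger at each recursive call even though the recursion always terminates; that tension is exactly what the three proof styles compared in the report must resolve.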

UCAM-CL-TR-70

Kenneth Graham Hamilton:

A remote procedure call system
109 pages, paper copy
PhD thesis (Wolfson College, December 1984)

UCAM-CL-TR-71

Ben Moszkowski:

Executing temporal logic programs
August 1985, 96 pages, paper copy

UCAM-CL-TR-72

W.F. Clocksin:

Logic programming and the specification of circuits
May 1985, 13 pages, PDF

Abstract: Logic programming (see Kowalski, 1979) can be used for specification and automatic reasoning about electrical circuits. Although propositional logic has long been used for describing the truth functions of combinational circuits, the more powerful Predicate Calculus on which logic programming is based has seen relatively little use in design automation. Previous researchers have introduced a number of techniques similar to logic programming, but many of the useful consequences of the logic programming methodology have not been exploited. This paper first reviews and compares three methods for representing circuits, which will be called here the functional method, the extensional method, and the definitional method. The latter method, which conveniently admits arbitrary sequential circuits, is then treated in detail. Some useful consequences of using this method for writing directly executable specifications of circuits are described. These include the use of quantified variables, verification of hypothetical states, and sequential simulation.
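The definitional method can be caricatured in executable form: each device is a predicate over its port values, a circuit is the conjunction of its parts with internal wires existentially quantified, and “verification of hypothetical states” amounts to asking which port valuations satisfy the definition. The Python sketch below is only an illustration of that idea, not the Prolog-style notation the paper itself uses:

```python
from itertools import product

def nand(a, b, out):
    """A NAND gate as a predicate on its port values."""
    return out == (not (a and b))

def inverter(i, out):
    """An inverter defined structurally as a NAND gate with tied inputs."""
    return nand(i, i, out)

def and_gate(a, b, out):
    """AND = NAND followed by inverter; the wire w is internal, so it is
    existentially quantified over the possible boolean values."""
    return any(nand(a, b, w) and inverter(w, out) for w in (False, True))

# Verification of hypothetical states: enumerate every port valuation
# that satisfies the circuit's definition.
solutions = [(a, b, o) for a, b, o in product((False, True), repeat=3)
             if and_gate(a, b, o)]
```

Because the definition is a relation rather than a function, the same description answers both directions of question: “what output does this input produce?” and “which inputs are consistent with this output?”, which is the flexibility the abstract attributes to quantified variables.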

UCAM-CL-TR-73

Daniel Hammond Craft:

Resource management in a distributed computing system
116 pages, paper copy
PhD thesis (St John’s College, March 1985)

UCAM-CL-TR-74

Mike Gordon:

Hardware verification by formal proof
August 1985, 6 pages, PDF

Abstract: The use of mathematical proof to verify hardware designs is explained and motivated. The hierarchical verification of a simple n-bit CMOS counter is used as an example. Some speculations are made about when and how formal proof will become used in industry.

UCAM-CL-TR-75

Jon Fairbairn:

Design and implementation of a simple typed language based on the lambda-calculus
May 1985, 107 pages, PDF
PhD thesis (Gonville & Caius College, December 1984)

Abstract: Despite the work of Landin and others as long ago as 1966, almost all recent programming languages are large and difficult to understand. This thesis is a re-examination of the possibility of designing and implementing a small but practical language based on very few primitive constructs.

The text records the syntax and informal semantics of a new language called Ponder. The most notable features of the work are a powerful type-system and an efficient implementation of normal order reduction.

In contrast to Landin’s ISWIM, Ponder is statically typed, an expedient that increases the simplicity of the language by removing the requirement that operations must be defined for incorrect arguments. The type system is a powerful extension of Milner’s polymorphic type system for ML in that it allows local quantification of types. This extension has the advantage that types that would otherwise need to be primitive may be defined.

The criteria for the well-typedness of Ponder programmes are presented in the form of a natural deduction system in terms of a relation of generality between types. A new type checking algorithm derived from these rules is proposed.

Ponder is built on the λ-calculus without the need for additional computation rules. In spite of this abstract foundation an efficient implementation based on Hughes’ super-combinator approach is described. Some evidence of the speed of Ponder programmes is included.

The same strictures have been applied to the design of the syntax of Ponder, which, rather than having many pre-defined clauses, allows the addition of new constructs by the use of a simple extension mechanism.

UCAM-CL-TR-76

R.C.B. Cooper, K.G. Hamilton:

Preserving abstraction in concurrent programming
August 1985, 16 pages, PDF

Abstract: Recent programming languages have attempted to provide support for concurrency and for modular programming based on abstract interfaces. Building on our experience of adding monitors to CLU, a language orientated towards data abstraction, we explain how these two goals conflict. In particular we discuss the clash between conventional views on interface abstraction and the programming style required for avoiding monitor deadlock. We argue that the best compromise between these goals is a combination of a fine grain locking mechanism together with a method for explicitly defining concurrency properties for selected interfaces.

UCAM-CL-TR-77

Mike Gordon:

Why higher-order logic is a good formalisation for specifying and verifying hardware
September 1985, 28 pages, PDF

Abstract: Higher order logic was originally developed as a foundation for mathematics. In this paper we show how it can be used as: 1. a hardware description language, and 2. a formalism for proving that designs meet their specifications.

Examples are given which illustrate various specification and verification techniques. These include a CMOS inverter, a CMOS full adder, an n-bit ripple-carry adder, a sequential multiplier and an edge-triggered D-type register.

UCAM-CL-TR-78

Glynn Winskel:

A complete proof system for SCCS with modal assertions
September 1985, 23 pages, PDF

Abstract: This paper presents a proof system for Robin Milner’s Synchronous Calculus of Communicating Systems (SCCS) with modal assertions. The language of assertions is a fragment of dynamic logic, sometimes called Hennessy-Milner logic after they brought it to attention; while rather weak from a practical point of view, its assertions are expressive enough to characterise observation equivalence, central to the work of Milner et al. on CCS and SCCS. The paper includes a completeness result and a proof of equivalence between an operational and denotational semantics for SCCS. Its emphasis is on the theoretical issues involved in the construction of proof systems for parallel programming languages.

UCAM-CL-TR-79

Glynn Winskel:

Petri nets, algebras and morphisms
38 pages, PDF

Abstract: It is shown how a category of Petri nets can be viewed as a subcategory of two sorted algebras over multisets. This casts Petri nets in a familiar framework and provides a useful idea of morphism on nets different from the conventional definition – the morphisms here respect the behaviour of nets. The categorical constructions which result provide a useful way to synthesise nets and reason about nets in terms of their components; for example various forms of parallel composition of Petri nets arise naturally from the product in the category. This abstract setting makes plain a useful functor from the category of Petri nets to a category of spaces of invariants and provides insight into the generalisations of the basic definition of Petri nets – for instance the coloured and higher level nets of Kurt Jensen arise through a simple modification of the sorts of the algebras underlying nets. Further it provides a smooth formal relation with other models of concurrency such as Milner’s Calculus of Communicating Systems (CCS) and Hoare’s Communicating Sequential Processes (CSP).

UCAM-CL-TR-80

Lawrence C. Paulson:

Interactive theorem proving with Cambridge LCF
A user’s manual
November 1985, 140 pages, paper copy

UCAM-CL-TR-81

William Robert Stoye:

The implementation of functional languages using custom hardware
December 1985, 151 pages, PDF
PhD thesis (Magdalene College, May 1985)

Abstract: In recent years functional programmers have produced a great many good ideas but few results. While the use of functional languages has been enthusiastically advocated, few real application areas have been tackled and so the functional programmer’s views and ideas are met with suspicion.

The prime cause of this state of affairs is the lack of widely available, solid implementations of functional languages. This in turn stems from two major causes: (1) Our understanding of implementation techniques was very poor only a few years ago, and so any implementation that is “mature” is also likely to be unusably slow. (2) While functional languages are excellent for expressing algorithms, there is still considerable debate in the functional programming community over the way in which input and output operations should be represented to the programmer. Without clear guiding principles implementors have tended to produce ad-hoc, inadequate solutions.

My research is concerned with strengthening the case for functional programming. To this end I constructed a specialised processor, called SKIM, which could evaluate functional programs quickly. This allowed experimentation with various implementation methods, and provided a high performance implementation with which to experiment with writing large functional programs.

This thesis describes the resulting work and includes the following new results: (1) Details of a practical Turner-style combinator reduction implementation featuring greatly improved storage use compared with previous methods. (2) An implementation of Kennaway’s director string idea that further enhances performance and increases understanding of a variety of reduction strategies. (3) Comprehensive suggestions concerning the representation of input, output, and nondeterministic tasks using functional languages, and the writing of operating systems. Details of the implementation of these suggestions developed on SKIM. (4) A number of observations concerning functional programming in general based on considerable practical experience.

UCAM-CL-TR-82

Lawrence C. Paulson:

Natural deduction proof as higher-order resolution
December 1985, 25 pages, PDF, DVI

Abstract: An interactive theorem prover, Isabelle, is under development. In LCF, each inference rule is represented by one function for forwards proof and another (a tactic) for backwards proof. In Isabelle, each inference rule is represented by a Horn clause. Resolution gives both forwards and backwards proof, supporting a large class of logics. Isabelle has been used to prove theorems in Martin-Löf’s Constructive Type Theory.

Quantifiers pose several difficulties: substitution, bound variables, Skolemization. Isabelle’s representation of logical syntax is the typed lambda-calculus, requiring higher-order unification. It may have potential for logic programming. Depth-first search using inference rules constitutes a higher-order Prolog.

UCAM-CL-TR-83

Ian David Wilson:

Operating system design for large personal workstations
203 pages, paper copy
PhD thesis (Darwin College, July 1985)

UCAM-CL-TR-84

Martin Richards:

BSPL: a language for describing the behaviour of synchronous hardware
April 1986, 56 pages, paper copy

UCAM-CL-TR-85

Glynn Winskel:

Category theory and models for parallel computation
April 1986, 16 pages, PDF

Abstract: This report will illustrate two uses of category theory: Firstly the use of category theory to define semantics in a particular model. How semantic constructions can often be seen as categorical ones, and, in particular, how parallel compositions are derived from a categorical product and a non-deterministic sum. These categorical notions can provide a basis for reasoning about computations and will be illustrated for the model of Petri nets.

Secondly, the use of category theory to relate different semantics will be examined; specifically, how the relations between various concrete models like Petri nets, event structures, trees and state machines are expressed as adjunctions. This will be illustrated by showing the coreflection between safe Petri nets and trees.

UCAM-CL-TR-86

Stephen Christopher Crawley:

The Entity System: an object based filing system
April 1986, 120 pages, paper copy
PhD thesis (St John’s College, December 1985)

UCAM-CL-TR-87

Kathleen Anne Carter:

Computer-aided type face design
May 1986, 172 pages, PDF
PhD thesis (King’s College, November 1985)

Abstract: This thesis tackles the problems encountered when trying to carry out a creative and intuitive task, such as type face design, on a computer. A brief history of printing and type design sets the scene for a discussion of digital type. Existing methods for generating and handling digital type are presented and their relative merits are discussed. Consideration is also given to the nature of designing, independent of the tools used. The importance of intuition and experience in such a task is brought out. Any new tools must allow the designer to exercise his skills of hand and eye, and to judge the results visually. The different abstractions that can be used to represent a typeface in a computer are discussed with respect to the manner of working that they force upon the designer.

In the light of this discussion some proposals are made for a new system for computer-aided type face design. This system must be highly interactive, providing rapid visual feedback in response to the designer’s actions. Designing is a very unstructured task, frequently with a number of activities being pursued at once. Hence the system must also be able to support multiple activities, with the user free to move between them at any time.

The characteristics of various types of interactive graphical environment are then considered. This discussion leads on to proposals for an environment suitable for supporting type face design. The proposed environment is based on the provision of a number of windows on the screen, each supporting a different activity. A mouse, graphics tablet and keyboard are all continuously available for interaction with the system. The rest of the thesis discusses the implementation of this graphical environment and the type face design system that makes use of it. The final chapter evaluates the success of both the underlying software and of the type face design system itself.

UCAM-CL-TR-88

David Maclean Carter:

A shallow processing approach to anaphor resolution
May 1986, 233 pages, paper copy
PhD thesis (King’s College, December 1985)

UCAM-CL-TR-89

Jon Fairbairn:

Making form follow function
An exercise in functional programming style
June 1986, 9 pages, PDF

Abstract: The combined use of user-defined infix operators and higher order functions allows the programmer to invent new control structures tailored to a particular problem area.

This paper is to suggest that such a combination has beneficial effects on the ease of both writing and reading programmes, and hence can increase programmer productivity. As an example, a parser for a simple language is presented in this style.

It is hoped that the presentation will be palatable to people unfamiliar with the concepts of functional programming.
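The style the abstract describes, building control structures, here a parser, out of higher-order functions, can be illustrated with a minimal parser-combinator sketch. This is illustrative Python only, not Fairbairn’s notation or language; the names `symbol`, `then` and `alt` are invented for the example.

```python
# A parser maps an input string to a list of (result, rest) pairs;
# an empty list signals failure. Higher-order functions combine
# small parsers into larger ones, acting as bespoke control structures.

def symbol(c):
    """Parser accepting exactly the single character c."""
    return lambda s: [(c, s[1:])] if s[:1] == c else []

def then(p, q):
    """Sequencing: run p, then run q on what p left over."""
    return lambda s: [((a, b), s2) for (a, s1) in p(s) for (b, s2) in q(s1)]

def alt(p, q):
    """Alternation: collect the results of either parser."""
    return lambda s: p(s) + q(s)

# A parser for the two-character strings "ab" or "ac".
p = alt(then(symbol('a'), symbol('b')),
        then(symbol('a'), symbol('c')))

print(p('ab'))  # [(('a', 'b'), '')]
print(p('ad'))  # []
```

In a language with user-defined infix operators, as advocated above, the combinators can be written between their operands so that the grammar reads almost like BNF; Python’s prefix calls only approximate that style.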

UCAM-CL-TR-90

Andy Hopper, Roger M. Needham:

The Cambridge Fast Ring networking system (CFR)
June 1986, 25 pages, PDF

Abstract: Local area networks have developed from slow systems operating at below 1MBs to fast systems at 50MBs or more. We discuss the choices facing a designer as faster speeds for networks are contemplated. The 100MBs Cambridge Fast Ring is described. The ring protocol allows one of a number of fixed size slots to be used once or repeatedly. The network design allows sets of rings to be constructed by pushing the bridge function to the lowest hardware level. Low cost and ease of use is normally achieved by design of special chips and we describe a two-chip VLSI implementation. This VLSI hardware forms the basis of a kit-of-parts from which many different network components can be constructed.

UCAM-CL-TR-91

Albert Camilleri, Mike Gordon,Tom Melham:

Hardware verification using higher-order logic
September 1986, 25 pages, PDF

Abstract: The Hardware Verification Group at the University of Cambridge is investigating how various kinds of digital systems can be verified by mechanised formal proof. This paper explains our approach to representing behaviour and structure using higher order logic. Several examples are described including a ripple carry adder and a sequential device for computing the factorial function. The dangers of inaccurate models are illustrated with a CMOS exclusive-or gate.

UCAM-CL-TR-92

Stuart Charles Wray:

Implementation and programming techniques for functional languages
June 1986, 117 pages, PDF
PhD thesis (Christ’s College, January 1986)

Abstract: In this thesis I describe a new method of strictness analysis for lazily evaluated functional languages, and a method of code generation making use of the information provided by this analysis. I also describe techniques for practical programming in lazily evaluated functional languages, based on my experience of writing substantial functional programs.

My new strictness analyser is both faster and more powerful than that of Mycroft. It can be used on any program expressed as super-combinator definitions and it uses the additional classifications absent and dangerous as well as strict and lazy. This analyser assumes that functional arguments to higher order functions are completely lazy.

I describe an extension of my analyser which discovers more strictness in the presence of higher order functions, and I compare this with higher order analysers based on Mycroft’s work. I also describe an extension of my analyser to lazy pairs and discuss strictness analysers for lazy lists.

Strictness analysis brings useful performance improvements for programs running on conventional machines. I have implemented my analyser in a compiler for Ponder, a lazily evaluated functional language with polymorphic typing. Results are given, including the surprising result that higher order strictness analysis is no better than first order strictness analysis for speeding up real programs on conventional machines.

I have written substantial programs in Ponder and describe in some detail the largest of these which is about 2500 lines long. This program is an interactive spreadsheet using a mouse and bitmapped display. I discuss programming techniques and practical problems facing functional languages with illustrative examples from programs I have written.
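First order strictness analysis in the style of Mycroft, which the abstract above takes as its baseline, can be sketched with a two-point boolean abstraction. This is an illustrative sketch only, not Wray’s analyser; the example functions are invented.

```python
# Two-point abstraction for strictness (after Mycroft): 0 abstracts a
# possibly-undefined argument, 1 a defined one. Argument i of f is
# strict if feeding 0 in position i (and 1 elsewhere) forces result 0,
# i.e. the function cannot produce an answer without evaluating it.

def is_strict(abstract_f, arity, i):
    args = [1] * arity
    args[i] = 0
    return abstract_f(*args) == 0

# plus x y = x + y needs both arguments: its abstraction is AND.
abs_plus = lambda x, y: x & y
# fst x y = x ignores its second argument entirely.
abs_fst = lambda x, y: x

print([is_strict(abs_plus, 2, i) for i in (0, 1)])  # [True, True]
print([is_strict(abs_fst, 2, i) for i in (0, 1)])   # [True, False]
```

A compiler can evaluate strict arguments eagerly; the classifications absent and dangerous mentioned above refine this two-point domain further.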

UCAM-CL-TR-93

J.P. Bennett:

Automated design of an instruction set for BCPL
June 1986, 56 pages, paper copy

UCAM-CL-TR-94

Avra Cohn, Mike Gordon:

A mechanized proof of correctness of a simple counter
June 1986, 80 pages, paper copy

UCAM-CL-TR-95

Glynn Winskel:

Event structures
Lecture notes for the Advanced Course on Petri Nets
July 1986, 69 pages, PDF

Abstract: Event structures are a model of computational processes. They represent a process as a set of event occurrences with relations to express how events causally depend on others. This paper introduces event structures, shows their relationship to Scott domains and Petri nets, and surveys their role in denotational semantics, both for modelling languages like CCS and CSP and languages with higher types.

UCAM-CL-TR-96

Glynn Winskel:

Models and logic of MOS circuits
Lectures for the Marktoberdorf Summerschool, August 1986
October 1986, 47 pages, PDF

Abstract: Various models of hardware have been proposed though virtually all of them do not model circuits adequately enough to support and provide a formal basis for many of the informal arguments used by designers of MOS circuits. Such arguments use rather crude discrete notions of strength – designers cannot be too finicky about precise resistances and capacitances when building a chip – as well as subtle derived notions of information flow between points in the circuit. One model, that of R.E. Bryant, tackles such issues in reasonable generality and has been used as the basis of several hardware simulators. However Bryant’s model is not compositional. These lectures introduce Bryant’s ideas and present a compositional model for the behaviour of MOS circuits when the input is steady, show how this leads to a logic, and indicate the difficulties in providing a full and accurate treatment for circuits with changing inputs.

UCAM-CL-TR-97

Alan Mycroft:

A study on abstract interpretation and “validating microcode algebraically”
October 1986, 22 pages, PDF

Abstract: This report attempts to perform two roles: the first part aims to give a state-of-the-art introduction to abstract interpretation with as little mathematics as possible. The question of the ‘best’ meta-language for abstract interpretation is, however, left open. The second part gives a tutorial introduction to an application of abstract interpretation based on the relational style of Mycroft and Jones (1985). This report does not claim to have introduced any new techniques, but rather aims to make the existing literature understandable to a wider audience.

UCAM-CL-TR-98

E. Robinson:

Power-domains, modalities and the Vietoris monad
October 1986, 16 pages, PDF

Abstract: It is possible to divide the syntax-directed approaches to programming language semantics into two classes, “denotational”, and “proof-theoretic”. This paper argues for a different approach which also has the effect of linking the two methods. Drawing on recent work on locales as formal spaces we show that this provides a way in which we can hope to use a proof-theoretical semantics to give us a denotational one. This paper reviews aspects of the general theory, before developing a modal construction on locales and discussing the view of power-domains as free non-deterministic algebras. Finally, the relationship between the present work and that of Winskel is examined.

UCAM-CL-TR-99

David C.J. Matthews:

An overview of the Poly programming language
August 1986, 11 pages, PDF

Abstract: Poly is a general purpose programming language based on the idea of treating types as first class values. It can support polymorphic operations by passing types as parameters to procedures, and abstract types and parameterised types by returning types as results.

Although Poly is not intended specifically as a database programming language it was convenient to implement it in a persistent storage system. This allows the user to retain data structures from one session to the next, and can support large programming systems such as the Poly compiler and a Standard ML system.

UCAM-CL-TR-100

Jeff Joyce, Graham Birtwistle, Mike Gordon:

Proving a computer correct in higher order logic
December 1986, 57 pages, paper copy

UCAM-CL-TR-101

David Russel Milway:

Binary routing networks
December 1986, 131 pages, paper copy
PhD thesis (Darwin College, December 1986)

UCAM-CL-TR-102

David C.J. Matthews:

A persistent storage system for Poly and ML
January 1987, 16 pages, PDF

Abstract: The conventional strategy for implementing interactive languages has been based on the use of a “workspace” or “core-image” which is read in at the start of a session and written out at the end. While this is satisfactory for small systems it is inefficient for large programs. This report describes how an idea originally invented to simplify database programming, the persistent store, was adapted to support program development in an interactive language.

Poly and ML are both semi-functional languages in the sense that they allow functions as first class objects but they have variables (references) and use call-by-value semantics. Implementing such languages in a persistent store poses some problems but also allows optimisations which would not be possible if their type systems did not apply certain constraints.

The basic system is designed for single users but the problems of sharing data between users are discussed and an experimental system for allowing this is described.

UCAM-CL-TR-103

Mike Gordon:

HOL
A proof generating system for higher-order logic
January 1987, 56 pages, paper copy

UCAM-CL-TR-104

Avra Cohn:

A proof of correctness of the Viper microprocessor: the first level
January 1987, 46 pages, PDF

Abstract: The Viper microprocessor designed at the Royal Signals and Radar Establishment (RSRE) is one of the first commercially produced computers to have been developed using modern formal methods. Viper is specified in a sequence of decreasingly abstract levels. In this paper a mechanical proof of the equivalence of the first two of these levels is described. The proof was generated using a version of Robin Milner’s LCF system.

UCAM-CL-TR-105

Glynn Winskel:

A compositional model of MOS circuits
April 1987, 25 pages, PDF

Abstract: This paper describes a compositional model for MOS circuits. Like the model of Bryant (1984), it covers some of the effects of capacitance and resistance used frequently in designs. Although this has formed the basis of several hardware simulators, it suffers from the inadequacy that it is not compositional, making it difficult to reason in a structured way.

The present paper restricts its attention to the static behaviour of circuits, representing this as the set of possible steady states the circuit can settle into. A good understanding of such static behaviour is necessary to treat sequential circuits. This paper further takes the view that it is useful to have a language to describe the construction of circuits, and to this end borrows ideas from Hoare’s Communicating Sequential Processes, and Milner’s Calculus of Communicating Systems.

UCAM-CL-TR-106

Thomas F. Melham:

Abstraction mechanisms for hardware verification
May 1987, 26 pages, PDF

Abstract: It is argued that techniques for proving the correctness of hardware designs must use abstraction mechanisms for relating formal descriptions at different levels of detail. Four such abstraction mechanisms and their formalisation in higher order logic are discussed.

UCAM-CL-TR-107

Thierry Coquand, Carl Gunter,Glynn Winskel:

DI-domains as a model of polymorphism
May 1987, 19 pages, PDF

Abstract: This paper investigates a model of the polymorphic lambda calculus recently described by Girard (1985). This model differs from earlier ones in that all the types are interpreted as domains rather than closures or finitary projections on a universal domain. The objective in this paper is to generalize Girard’s construction to a larger category called dI-domains, and secondly to show how Girard’s construction (and this generalization) can be done abstractly. It demonstrates that the generalized construction can be used to do denotational semantics in the ordinary way, but with the added feature of type polymorphism.

UCAM-CL-TR-108

Andrew John Wilkes:

Workstation design for distributed computing
June 1987, 179 pages, PDF
PhD thesis (Wolfson College, June 1984)

Abstract: This thesis discusses some aspects of the design of computer systems for local area networks (LANs), with particular emphasis on the way such systems present themselves to their users. Too little attention to this issue frequently results in computing environments that cannot be extended gracefully to accommodate new hardware or software and do not present consistent, uniform interfaces to either their human users or their programmatic clients. Before computer systems can become truly ubiquitous tools, these problems of extensibility and accessibility must be solved. This dissertation therefore seeks to examine one possible approach, emphasising support for program development on LAN based systems.

UCAM-CL-TR-109

Jeffrey Joyce:

Hardware verification of VLSI regular structures
July 1987, 20 pages, PDF

Abstract: Many examples of hardware specification focus on hierarchical specification as a means of controlling structural complexity in design. Another method is the use of iteration. This paper, however, presents a third method, namely the mapping of irregular combinational functions to regular structures.

Regular structures often result in solutions which are economical in terms of area and design time. The automatic generation of a regular structure such as a ROM or PLA from a functional specification usually accommodates minor changes to the functional specification.

The mapping of irregular combinational functions to a regular structure separates function from circuit design. This paper shows how this separation can be exploited to derive a behavioural specification of a regular structure parameterized by the functional specification.

UCAM-CL-TR-110

Glynn Winskel:

Relating two models of hardware
July 1987, 16 pages, PDF

Abstract: The idea of this note is to show how Winskel’s static-configuration model of circuits is related formally to Gordon’s relational model. Once so related, the simpler proofs in the relational model can, for instance, be used to justify results in terms of the static-configurations model. More importantly, we can exhibit general conditions on circuits which ensure that assertions which hold of a circuit according to the simpler model are correct with respect to the more accurate model. The formal translation makes use of a simple adjunction between (partial order) categories associated with the two models, in a way reminiscent of abstract interpretation. Preliminary results suggest similar lines of approach may work for other kinds of abstraction such as temporal abstraction used in e.g. Melham’s work to reason about hardware, and, more generally, make possible a formal algebraic treatment of the relationship between different models of hardware.

UCAM-CL-TR-111

K. Sparck Jones:

Realism about user modelling
June 1987, 32 pages, PDF

Abstract: This paper reformulates the framework for user modelling presented in an earlier technical report, ‘User Models and Expert Systems’, and considers the implications of the real limitations on the knowledge likely to be available to a system for the value and application of user models.

UCAM-CL-TR-112

D.A. Wolfram:

Reducing thrashing by adaptive backtracking
August 1987, 15 pages, PDF

Abstract: Adaptive backtracking dynamically reduces thrashing caused by blind backtracking and recurring failures, by locating early backtrack points and deleting choices which are not part of any solution. Search problems with hereditary bounding properties are soluble by this method. These problems include searches in theorem proving, logic programming, reason maintenance, and planning. The location of a backtrack point uses a particular minimal inconsistent subset, which is called the cause set. A rejection set is computed from the union of cause sets, and rejection sets at a failure are used to locate subsequent backtrack points. A choice is deleted when a rejection set is a singleton. The worst case overhead is O(nf(n)) in time if the bounding property can be tested in O(f(n)) time, and O(n²) in space. An implementation confirms the expected exponential speed-ups for problems whose solution involves much thrashing.
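The idea of jumping straight to a choice point implicated in a failure, rather than backtracking blindly, can be illustrated with conflict-directed backjumping on graph colouring. This sketch is a simpler relative of the scheme described in the abstract, not Wolfram’s algorithm; its conflict sets play a role only loosely analogous to cause and rejection sets, and the function and example are invented.

```python
# Conflict-directed backjumping on graph colouring (illustrative only).
# On failure each level returns the set of earlier variables it blames;
# a level whose own choice is not blamed is skipped entirely, avoiding
# the thrashing of chronological backtracking.

def backjump_colour(graph, colours, order):
    """Colour `graph` (an adjacency dict); return a dict or None."""
    assign = {}

    def solve(i):
        if i == len(order):
            return dict(assign)           # success: snapshot the assignment
        v = order[i]
        blame = set()                     # earlier variables causing failure
        for c in colours:
            clash = {u for u in graph[v] if assign.get(u) == c}
            if clash:
                blame |= clash            # neighbours blocking colour c
                continue
            assign[v] = c
            result = solve(i + 1)
            del assign[v]
            if isinstance(result, dict):
                return result
            if v not in result:           # v irrelevant to the failure:
                return result             # jump straight over this level
            blame |= result - {v}
        return blame                      # every colour failed here

    result = solve(0)
    return result if isinstance(result, dict) else None

triangle = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b']}
print(backjump_colour(triangle, [1, 2], ['a', 'b', 'c']))     # None
print(backjump_colour(triangle, [1, 2, 3], ['a', 'b', 'c']))  # {'a': 1, 'b': 2, 'c': 3}
```

Passing the blame set upward is what lets the search delete whole choice points that cannot contribute to any solution, the behaviour the abstract describes in terms of rejection sets.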

UCAM-CL-TR-113

Lawrence C. Paulson:

The representation of logics in higher-order logic
August 1987, 29 pages, PDF

Abstract: Intuitionistic higher-order logic — the fragment containing implication, universal quantification, and equality — can serve as a meta-logic for formalizing various logics. As an example, axioms formalizing first-order logic are presented, and proved sound and complete by induction on proof trees.

Proofs in higher-order logic represent derivations of rules as well as proofs of theorems. A proof develops by deriving rules using higher-order resolutions. The discharge of assumptions involves derived meta-rules for ‘lifting’ a proposition.

Quantifiers require a similar lifting rule or else Hilbert’s ε-operator. The alternatives are contrasted through several examples. Hilbert’s ε underlies Isabelle’s original treatment of quantifiers, but the lifting rule is logically simpler.

The meta-logic is used in the latest version of the theorem prover Isabelle. It extends the logic used in earlier versions. Compared with other meta-logics, higher-order logic has a weaker type system but seems easier to implement.

UCAM-CL-TR-114

Stephen Ades:

An architecture for integrated services on the local area network
September 1987, 166 pages, PDF
PhD thesis (Trinity College, January 1987)

Abstract: This dissertation concerns the provision of integrated services in a local area context, e.g. on business premises. The term integrated services can be understood at several levels. At the lowest, one network may be used to carry traffic of several media—voice, data, images etc. Above that, the telephone exchange may be replaced by a more versatile switching system, incorporating facilities such as stored voice messages. Its facilities may be accessible to the user through the interface of the workstation rather than a telephone. At a higher level still, new services such as multi-media document manipulation may be added to the capabilities of a workstation.

Most of the work to date has been at the lowest of these levels, under the auspices of the Integrated Services Digital Network (ISDN), which mainly concerns wide area communications systems. The thesis presented here is that all of the above levels are important in a local area context. In an office environment, sophisticated data processing facilities in a workstation can usefully be combined with highly available telecommunications facilities such as the telephone, to offer the user new services which make the working day more pleasant and productive. That these facilities should be provided across one integrated network, rather than by several parallel single medium networks is an important organisational convenience to the system builder.

The work described in this dissertation is relevant principally in a local area context—in the wide area economics and traffic balance dictate that the emphasis will be on only the network level of integration for some time now. The work can be split into three parts:

i) the use of a packet network to carry mixed media. This has entailed design of packet voice protocols which produce delays low enough for the network to interwork with national telephone networks. The system has also been designed for minimal cost per telephone—packet-switched telephone systems have traditionally been more expensive than circuit-switched types. The network used as a foundation for this work has been the Cambridge Fast Ring.

ii) use of techniques well established in distributed computing systems to build an ‘integrated services PABX (Private Automatic Branch Exchange)’. Current PABX designs have a very short life expectancy and an alarmingly high proportion of their costs is due to software. The ideas presented here can help with both of these problems, produce an extensible system and provide a basis for new multi-media services.

iii) development of new user level Integrated Services. Work has been done in three areas. The first is multi-media documents. A voice editing interface is described along with the system structure required to support it. Secondly a workstation display has been built to support a variety of services based upon image manipulation and transmission. Finally techniques have been demonstrated by which a better interface to telephony functions can be provided to the user, using methods of control typical of workstation interfaces.

UCAM-CL-TR-115

I.S. Dhingra:

Formal validation of an integrated circuit design style
August 1987, 29 pages, PDF

Abstract: In dynamic circuit design many rules must be followed which govern the correctness of the design. In this paper a dynamic CMOS design style using a two-phase non-overlapping clock, with its intricate design rules, is presented together with formal means of showing that a circuit follows these rules.

UCAM-CL-TR-116

Thierry Coquand, Carl Gunter,Glynn Winskel:

Domain theoretic models of polymorphism
September 1987, 52 pages, paper copy

UCAM-CL-TR-117

J.M. Bacon, K.G. Hamilton:

Distributed computing with RPC: the Cambridge approach
October 1987, 15 pages, PDF

Abstract: The Cambridge Distributed Computing System (CDCS) is described and its evolution outlined. The Mayflower project allowed CDCS infrastructure, services and applications to be programmed in a high-level, object-oriented language, Concurrent CLU. The Concurrent CLU RPC facility is described in detail. It is a non-transparent, type-checked, type-safe system which employs dynamic binding and passes objects of arbitrary graph structure. Recent extensions accommodate a number of languages and transport protocols. A comparison with other RPC schemes is given.

UCAM-CL-TR-118

B.K. Boguraev, K. Sparck Jones:

Material concerning a study of cases
May 1987, 31 pages, PDF

Abstract: This note describes and illustrates a study of deep cases using a large sample of sentences. We have used a language analyser which builds meaning representations expressing semantic case roles; specifically Boguraev’s (1979) analyser, which builds dependency trees with word senses defined by semantic category primitive formulae, and with case labels, i.e. semantic relation primitives. This note highlights the importance of the source material for those interested in case-based representations of sentence meaning, and indicates the potential utility of the study results.

UCAM-CL-TR-119

Robert Cooper:

Pilgrim: a debugger for distributed systems
July 1987, 19 pages, PDF

Abstract: Pilgrim is a source level debugger for Concurrent CLU programs which execute in a distributed environment. It integrates conventional debugging facilities with features for debugging remote procedure calls and critical-region-based process interactions. Pilgrim is unusual in that it functions on programs in the target environment under conditions of actual use. This has caused a trade-off between providing rich and detailed information to the programmer and avoiding any unwanted alteration to the computation being debugged. Another complication is debugging one client of a network server while avoiding interference with the server’s other clients. A successful methodology for this case requires assistance from the server itself.

UCAM-CL-TR-120

D. Wheeler:

Block encryption
November 1987, 4 pages, PDF

Abstract: A fast and simple way of encrypting computer data is needed. The UNIX crypt is a good way of doing this, although the method is not cryptographically sound for text. The method suggested here is applied to larger blocks than the DES method, which uses 64-bit blocks, so that the speed of encyphering is reasonable. The algorithm is designed for software rather than hardware. This forgoes two advantages of the crypt algorithm, namely that each character can be encoded and decoded independently of other characters and that the identical process is used both for encryption and decryption. However this method is better for coding blocks directly.

UCAM-CL-TR-121

Jonathan Billington:

A high-level Petri net specification of the Cambridge Fast Ring M-Access service
December 1987, 31 pages, PDF

Abstract: Numerical Petri Nets (a high-level inhibitor net) are used to characterise the Cambridge Fast Ring hardware at a high level of abstraction. The NPN model describes the service provided to users of the hardware (stations, monitors, bridges and ring transmission plant), known as the M-Access service definition, in order to remove ambiguities and as a basis for the development and verification of the protocols using the M-Access service.

UCAM-CL-TR-122

John Herbert:

Temporal abstraction of digital designs
February 1988, 34 pages, PDF

Abstract: Formal techniques have been used to verify the function of reasonably large digital devices ([Hunt85], [Cohn87]), and also to describe and reason about digital signal behaviour at a detailed timing level [Hanna85], [Herbert86]. Different models are used: simple synchronous models of components are the basis for verifying high-level functional specifications; more detailed models which capture the behaviour of signals in real time are the basis of proofs about timing. The procedure called temporal abstraction is a technique for formally relating these two behavioural models.

The background to temporal abstraction is presented, together with the details of its implementation in HOL. The HOL language ([Gordon85a]) is a computerised version of higher-order logic which has an associated proof assistant, also called HOL. In HOL one may specify behaviour at both the functional and timing levels. This work describes how the relationship between these levels may also be described in HOL and reasoned about using the HOL system.

The formal transformation of descriptions of behaviour at the timing level to behaviour at the functional level involves generating and verifying timing constraints. This process can be identified with the conventional design activity of timing analysis. This work shows that timing verification can be viewed, not as a separate phase of design, but as part of a single verification process which encompasses functional and timing verification. A single formal language, HOL, is used to describe all aspects of the behaviour and a single verification system provides all the proofs of correctness. The use of uniform, formal techniques is shown to have a number of advantages.

UCAM-CL-TR-123

John Herbert:

Case study of the Cambridge Fast Ring ECL chip using HOL
February 1988, 38 pages, PDF

Abstract: This article describes the formal specification and verification of an integrated circuit which is part of a local area network interface. A single formal language is used to describe the structure and behaviour at all levels in the design hierarchy, and an associated proof assistant is used to generate all formal proofs. The implementation of the circuit, described as a structure of gates and flip-flops, is verified via a number of levels with respect to a high-level formal specification of required behaviour. The high-level formal specification is shown to be close to a precise natural language description of the circuit behaviour.

The specification language used, HOL [Gordon85a], has the advantage of permitting partial specifications. It turns out that partial specification has an important effect on the specification and verification methodology, and this is presented. We have also evaluated aspects of conventional design, such as techniques for locating errors and the use of simulation, within the case study of formal methods. We assert that proof strategies must assist error location and that simulation has a role alongside formal verification.

UCAM-CL-TR-124

John Herbert:

Formal verification of basic memory devices
February 1988, 46 pages, PDF

Abstract: Formal methods have been used recently to verify high-level functional specifications of digital systems. Such formal proofs have used simple models of circuit components. In this article we describe complementary work which uses a more detailed model of components and demonstrates how hardware can be specified and verified at this level.

In this model all circuits can be described as structures of gates, each gate having an independent propagation delay. The behaviour of digital signals in real time is captured closely. The function and timing of asynchronous and synchronous memory elements implemented using gates is derived. Formal proofs of correctness show that, subject to certain constraints on gate delays and signal timing parameters, these devices act as memory elements and exhibit certain timing properties.

All the proofs have been mechanically generated using Gordon’s HOL system.

UCAM-CL-TR-125

Juanito Camilleri:

An operational semantics for Occam
February 1988, 24 pages, PDF

Abstract: Occam is a programming language designed to support concurrent applications, especially those implemented on networks of communicating processors. The aim of this paper is to formulate the meaning of the language constructs of Occam by semantic definitions which are intended as a direct formalisation of the natural language descriptions usually found in programming language manuals [Inmos 3]. This is done by defining a syntax-directed transition system where the transitions associated to a phrase are a function of the transitions associated to its components. This method is by no means novel. The concepts here were introduced in [Plotkin 8] and are applied in [Plotkin 9], where an operational semantics for CSP [Hoare 2] was presented. The operational semantics for a subset of Ada is defined in [Li 6], where tasking and exception handling are modelled. For simplicity only a subset of Occam is defined. Timing, priority, replicators and BYTE subscription are omitted. Other features of Occam which deal with the association of components of an Occam program with a set of physical resources (i.e. configurations) are also omitted since they do not affect the semantic interpretation of a program.

UCAM-CL-TR-126

M.E. Leeser:

Reasoning about the function and timing of integrated circuits with Prolog and temporal logic
February 1988, 50 pages, paper copy

UCAM-CL-TR-127

John Carroll, Bran Boguraev, Claire Grover,Ted Briscoe:

A development environment for large natural language grammars
February 1988, 44 pages, PDF

Abstract: The Grammar Development Environment (GDE) is a powerful software tool designed to help a linguist or grammarian experiment with and develop large natural language grammars. (It is also, however, being used to help teach students on courses in computational linguistics.) This report describes the grammatical formalism employed by the GDE, and contains detailed instructions on how to use the system.

Source code for a Common Lisp version of the software is available from the University of Edinburgh Artificial Intelligence Applications Institute.

UCAM-CL-TR-128

Robert Charles Beaumont Cooper:

Debugging concurrent and distributed programs
February 1988, 111 pages, PDF
PhD thesis (Churchill College, December 1987)

Abstract: This thesis aims to make one aspect of distributed programming easier: debugging. The principles for designing and implementing an interactive debugger for concurrent and distributed programs are presented. These programs are written in a high-level language with type-checked remote procedure calls. They execute on the nodes of a local computer network and interact with the other programs and services which exist on such a network.

The emphasis is on debugging programs in the environment in which they will eventually operate, rather than some simulated environment oriented specifically to the needs of debugging. Thus the debugging facilities impose a low overhead on the program and may be activated at any time.

Ideally the actions of the debugger should be transparent to the execution of the program being debugged. The difficult problem of avoiding any alteration to the relative ordering of inter-process events is examined in detail. A method of breakpointing a distributed computation is presented which achieves a high degree of transparency in the face of arbitrary process interactions through shared memory.

The problems of debugging programs that interact with network services, which are shared concurrently with other users of the distributed environment, are examined. A range of debugging techniques, some of which are directly supported by the debugger, are discussed.

A set of facilities for debugging remote procedure calls is presented, and the functions required of the operating system kernel and runtime system to support debugging are also discussed. A distributed debugger is itself an example of a distributed program and so issues such as functional distribution and authentication are addressed.

These ideas have been implemented in Pilgrim, a debugger for Concurrent CLU programs running under the Mayflower supervisor within the Cambridge Distributed Computing System.

UCAM-CL-TR-129

Jeremy Peter Bennett:

A methodology for automated design of computer instruction sets
March 1988, 147 pages, PDF
PhD thesis (Emmanuel College, January 1987)

Abstract: With semiconductor technology providing scope for increasingly complex computer architectures, there is a need more than ever to rationalise the methodology behind computer design. In the 1970s, byte stream architectures offered a rationalisation of computer design well suited to microcoded hardware. In the 1980s, RISC technology has emerged to simplify computer design and permit full advantage to be taken of very large scale integration. However, such approaches achieve their aims by simplifying the problem to a level where it is within the comprehension of a simple human being. Such an effort is not sufficient. There is a need to provide a methodology that takes the burden of design detail away from the human designer, leaving him free to cope with the underlying principles involved.

In this dissertation I present a methodology for the design of computer instruction sets that is capable of automation in large part, removing the drudgery of individual instruction selection. The methodology does not remove the need for the designer’s skill, but rather allows precise refinement of his ideas to obtain an optimal instruction set.

In developing this methodology a number of pieces of software have been designed and implemented. Compilers have been written to generate trial instruction sets. An instruction set generator program has been written and the instruction set it proposes evaluated. Finally a prototype language for instruction set design has been devised and implemented.

UCAM-CL-TR-130

Lawrence C Paulson:

The foundation of a generic theorem prover
March 1988, 44 pages, PDF, DVI
This paper is a revised version of UCAM-CL-TR-113.

Abstract: Isabelle is an interactive theorem prover that supports a variety of logics. It represents rules as propositions (not as functions) and builds proofs by combining rules. These operations constitute a meta-logic (or ‘logical framework’) in which the object-logics are formalized. Isabelle is now based on higher-order logic – a precise and well-understood foundation.

Examples illustrate use of this meta-logic to formalize logics and proofs. Axioms for first-order logic are shown sound and complete. Backwards proof is formalized by meta-reasoning about object-level entailment.

Higher-order logic has several practical advantages over other meta-logics. Many proof techniques are known, such as Huet’s higher-order unification procedure.

UCAM-CL-TR-131

Karen Sparck Jones:

Architecture problems in the construction of expert systems for document retrieval
December 1986, 28 pages, PDF

Abstract: The idea of an expert system front end offering the user effective direct access to a document retrieval system is an attractive one. The paper considers two specific approaches to the construction of such an expert interface: Belkin, Brooks and their colleagues’ treatment of the functions of such a front end based on the analysis of human intermediaries, and Pollitt’s experimental implementation of a query formulator for searching Cancerline. The distributed expert system model proposed by Belkin and Brooks is a plausible one, and Pollitt’s system can be regarded as a first step towards it. But there are major problems with this type of architecture, and the paper argues in particular that in seeking to develop more powerful front ends of the kind envisaged there is one important issue, the nature of the language used for communication between the contributing experts, that requires more attention than it has hitherto received.

UCAM-CL-TR-132

Miriam Ellen Leeser:

Reasoning about the function and timing of integrated circuits with Prolog and temporal logic
April 1988, 151 pages, paper copy
PhD thesis (Queens’ College, December 1987)

UCAM-CL-TR-133

Lawrence C. Paulson:

A preliminary users manual for Isabelle
May 1988, 81 pages, PDF, DVI

Abstract: This is an early report on the theorem prover Isabelle and several of its object-logics. It describes Isabelle’s operations, commands, data structures, and organization. This information is fairly low-level, but could benefit Isabelle users and implementors of other systems.

UCAM-CL-TR-134

Avra Cohn:

Correctness properties of the Viper block model: the second level
May 1988, 114 pages, paper copy

UCAM-CL-TR-135

Thomas F. Melham:

Using recursive types to reason about hardware in higher order logic
May 1988, 30 pages, PDF

Abstract: The expressive power of higher order logic makes it possible to define a wide variety of data types within the logic and to prove theorems that state the properties of these types concisely and abstractly. This paper describes how such defined data types can be used to support formal reasoning in higher order logic about the behaviour of hardware designs.

UCAM-CL-TR-136

Jeffrey J. Joyce:

Formal specification and verification of asynchronous processes in higher-order logic
June 1988, 45 pages, PDF

Abstract: We model the interaction of a synchronous process with an asynchronous memory process using a four-phase “handshaking” protocol. This example demonstrates the use of higher-order logic to reason about the behaviour of synchronous systems such as microprocessors which communicate requests to asynchronous devices and then wait for unpredictably long periods until these requests are answered. We also describe how our model could be revised to include some of the detailed timing requirements found in real systems such as the M68000 microprocessor. One enhancement uses non-determinism to model minimum setup times for asynchronous inputs. Experience with this example suggests that higher-order logic may also be a suitable formalism for reasoning about more abstract forms of concurrency.

UCAM-CL-TR-137

F.V. Hasle:

Mass terms and plurals: from linguistic theory to natural language processing
June 1988, 171 pages, PDF

Abstract: Two linguistic theories within the tradition of formal semantics are investigated. One is concerned with mass terms, and the other with plurals.

Special attention is paid to the possibility of implementing the theories on a computer. With this goal in mind their basic ideas are examined, and the linguistic implications are discussed. In the process, various features of the theories are made formally precise. This leads to two formal systems, one for representing the meanings of sentences with mass terms, and another similar one for plurals. The systems are specified by machine-executable translation relations from fragments of natural language into logical representations.

The underlying model-theoretic semantics of each theory is partially axiomatised. From the axiomatisations all of the paradigmatic inferences of each theory can be proved in a purely deductive manner. This is demonstrated by a number of rigorous proofs of natural language inferences.

Finally some methodological issues are raised. Both theories recommend a particular approach within formal semantics for natural language. I explore the methodological views underlying the theories, and discuss whether the authors actually follow the methods which they recommend.

UCAM-CL-TR-138

Michael Burrows, Martín Abadi, Roger Needham:

Authentication: a practical study in belief and action
June 1988, 19 pages, PDF

Abstract: Questions of belief and action are essential in the analysis of protocols for the authentication of principals in distributed computing systems. In this paper we motivate, set out and exemplify a logic specifically designed for this analysis; we show how protocols differ subtly with respect to the required initial assumptions of the participants and their final beliefs. Our formalism has enabled us to isolate and express these differences in a way that was not previously possible, and it has drawn attention to features of the protocols of which we were previously unaware. The reasoning about particular protocols has been mechanically verified.

This paper starts with an informal account of the problem, goes on to explain the formalism to be used, and gives examples of its application to real protocols from the literature. The final sections deal with a formal semantics of the logic and conclusions.

UCAM-CL-TR-139

Paul R. Manson:

Petri net theory: a survey
June 1988, 77 pages, PDF

Abstract: The intense interest in concurrent (or “parallel”) computation over the past decade has given rise to a large number of languages for concurrent programming, representing many conflicting views of concurrency.

The discovery that concurrent programming is significantly more difficult than sequential programming has prompted considerable research into determining a tractable and flexible theory of concurrency, with the aim of making concurrent processing more accessible, and indeed the wide variety of concurrent languages merely reflects the many different models of concurrency which have also been developed.

This report therefore introduces Petri nets, discussing their behaviour, interpretation and relationship to other models of concurrency. It defines and discusses several restrictions and extensions of the Petri net model, showing how they relate to basic Petri nets, while explaining why they have been of historical importance. Finally it presents a survey of the analysis methods applied to Petri nets in general and for some of the net models introduced here.

UCAM-CL-TR-140

Albert John Camilleri:

Executing behavioural definitions in higher order logic
July 1988, 183 pages, PDF
PhD thesis (Darwin College, February 1988)

Abstract: Over the past few years, computer scientists have been using formal verification techniques to show the correctness of digital systems. The verification process, however, is complicated and expensive. Even proofs of simple circuits can involve thousands of logical steps. Often it can be extremely difficult to find correct device specifications, and it is desirable that one sets off to prove a correct specification from the start, rather than repeatedly backtrack from the verification process to modify the original definitions after discovering they were incorrect or inadequate.

The main idea presented in the thesis is to amalgamate the techniques of simulation and verification, rather than have the latter replace the former. The result is that behavioural definitions can be simulated until one is reasonably sure that the specification is correct. Furthermore, proving the correctness with respect to these simulated specifications avoids the inadequacies of simulation, where it may not be computationally feasible to demonstrate correctness by exhaustive testing. Simulation here has a different purpose: to get specifications correct as early as possible in the verification process. Its purpose is not to demonstrate the correctness of the implementation – this is done in the verification stage when the very same specifications that were simulated are proved correct.

The thesis discusses the implementation of an executable subset of the HOL logic, the version of Higher Order Logic embedded in the HOL theorem prover. It is shown that hardware can be effectively described using both relations and functions; relations being suitable for abstract specification and functions being suitable for execution. The differences between relational and functional specifications are discussed and illustrated by the verification of an n-bit adder. Techniques for executing functional specifications are presented and various optimisation strategies are shown which make the execution of the logic efficient. It is further shown that the process of generating optimised functional definitions from relational definitions can be automated. Example simulations of three hardware devices (a factorial machine, a small computer and a communications chip) are presented.

UCAM-CL-TR-141

Roy Want:

Reliable management of voice in a distributed system

July 1988, 127 pages, PDF
PhD thesis (Churchill College, December 1987)

Abstract: The ubiquitous personal computer has found its way into most office environments. As a result, widespread use of the Local Area Network (LAN) for the purposes of sharing distributed computing resources has become common. Another technology, the Private Automatic Branch Exchange (PABX), has benefited from large research and development by the telephone companies. As a consequence, it is cost effective and has widely infiltrated the office world. Its primary purpose is to switch digitised voice but, with the growing need for communication between computers, it is also being adapted to switch data. However, PABXs are generally designed around a centralised switch in which bandwidth is permanently divided between its subscribers. Computing requirements need much larger bandwidths and the ability to connect to several services at once, thus making the conventional PABX unsuitable for this application.

Some LAN technologies are suitable for switching voice and data. The additional requirement for voice is that the point to point delay for network packets should have a low upper bound. The 10 Mb/s Cambridge Ring is an example of this type of network, but its relatively low bandwidth gives it limited application in this area. Networks with larger bandwidths (up to 100 Mb/s) are now becoming available commercially and could support a realistic population of clients requiring voice and data communication.

Transporting voice and data in the same network has two main advantages. Firstly, from a practical point of view, wiring is minimised. Secondly, applications which integrate both media are made possible, and hence digitised voice may be controlled by client programs in new and interesting ways.

In addition to the new applications, the original telephony facilities must also be available. They should, at least by default, appear to work in an identical way to our tried and trusted impression of a telephone. However, the control and management of a network telephone is now in the domain of distributed computing. The voice connections between telephones are virtual circuits. Control and data information can be freely mixed with voice at a network interface. The new problems that result are the management issues related to the distributed control of real-time media.

This thesis describes the issues as a distributed computing problem and proposes solutions, many of which have been demonstrated in a real implementation. Particular attention has been paid to the quality of service provided by the solutions. This amounts to the design of helpful operator interfaces, flexible schemes for the control of voice from personal workstations and, in particular, a high reliability factor for the backbone telephony service. This work demonstrates the advantages and the practicality of integrating voice and data services within the Local Area Network.

UCAM-CL-TR-142

Peter Newman:

A fast packet switch for the integrated services backbone network
July 1988, 24 pages, PDF

Abstract: With the projected growth in demand for bandwidth and telecommunications services will come the requirement for a multi-service backbone network of far greater efficiency, capacity and flexibility than the ISDN is able to satisfy. This class of network has been termed the Broadband ISDN, and the design of the switching node of such a network is the subject of much current research. This paper investigates one possible solution. The design and performance, for multi-service traffic, of a fast packet switch based upon a non-buffered, multi-stage interconnection network are presented. It is shown that for an implementation in current CMOS technology, operating at 50 MHz, switches with a total traffic capacity of up to 150 Gbit/sec may be constructed. Furthermore, if the reserved service traffic load is limited on each input port to a maximum of 80% of switch port saturation, then a maximum delay across the switch of the order of 100 µsecs may be guaranteed for 99% of the reserved service traffic, regardless of the unreserved service traffic load.

UCAM-CL-TR-143

Lawrence C. Paulson:

Experience with Isabelle
A generic theorem prover
August 1988, 20 pages, PDF, DVI

Abstract: The theorem prover Isabelle is described briefly and informally. Its historical development is traced from Edinburgh LCF to the present day. The main issues are unification, quantifiers, and the representation of inference rules. The Edinburgh Logical Framework is also described, for a comparison with Isabelle. An appendix presents several Isabelle logics, including set theory and Constructive Type Theory, with examples of theorems.

UCAM-CL-TR-144

Juanito Camilleri:

An operational semantics for occam
August 1988, 27 pages, PDF
This is an extended version of UCAM-CL-TR-125, in which we include the operational semantics of priority alternation.

Abstract: Occam is a programming language designed to support concurrent applications, especially those implemented on networks of communicating processors. The aim of this paper is to formulate the meaning of the language constructs of Occam by semantic definitions which are intended as a direct formalisation of the natural language descriptions usually found in programming language manuals [Inmos 5]. This is done by defining a syntax-directed transition system where the transitions associated to a phrase are a function of the transitions associated to its components. This method is by no means novel. The concepts here were introduced in [Plotkin 10] and are applied in [Plotkin 11], where an operational semantics for CSP [Hoare 4] was presented. The operational semantics for a subset of Ada is defined in [Li 6], where tasking and exception handling are modelled. For simplicity only a subset of Occam is defined. Timing, replicators and BYTE subscription are omitted. Other features of Occam which deal with the association of components of an Occam program with a set of physical resources (i.e. configurations) are also omitted since they do not affect the semantic interpretation of a program.

UCAM-CL-TR-145

Michael J.C. Gordon:

Mechanizing programming logics in higher order logic
September 1988, 55 pages, PDF

Abstract: Formal reasoning about computer programs can be based directly on the semantics of the programming language, or done in a special purpose logic like Hoare logic. The advantage of the first approach is that it guarantees that the formal reasoning applies to the language being used (it is well known, for example, that Hoare’s assignment axiom fails to hold for most programming languages). The advantage of the second approach is that the proofs can be more direct and natural.

In this paper, an attempt to get the advantages of both approaches is described. The rules of Hoare logic are mechanically derived from the semantics of a simple imperative programming language (using the HOL system). These rules form the basis for a simple program verifier in which verification conditions are generated by LCF-style tactics whose validations use the derived Hoare rules. Because Hoare logic is derived, rather than postulated, it is straightforward to mix semantic and axiomatic reasoning. It is also straightforward to combine the constructs of Hoare logic with other application-specific notations. This is briefly illustrated for various logical constructs, including termination statements, VDM-style ‘relational’ correctness specifications, weakest precondition statements and dynamic logic formulae.

The theory underlying the work presented here is well known. Our contribution is to propose a way of mechanizing this theory in a way that makes certain practical details work out smoothly.
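The style of verification-condition generation the abstract describes can be illustrated with a hedged sketch (this is not the report’s HOL implementation; the command encoding is invented, and string substitution stands in for proper capture-avoiding substitution over terms):

```python
# Toy verification-condition generator for a simple imperative language.
# Commands are tuples; assertions are strings purely for illustration.

def vc(cmd, post):
    """Return (precondition, side-condition VCs) for a command and postcondition."""
    op = cmd[0]
    if op == "assign":                  # Hoare's axiom: {Q[e/x]} x := e {Q}
        _, x, e = cmd
        return post.replace(x, f"({e})"), []
    if op == "seq":                     # {P} c1 {R} and {R} c2 {Q} give {P} c1;c2 {Q}
        _, c1, c2 = cmd
        mid, vcs2 = vc(c2, post)
        pre, vcs1 = vc(c1, mid)
        return pre, vcs1 + vcs2
    if op == "while":                   # invariant-annotated loop rule
        _, inv, cond, body = cmd
        pre, vcs = vc(body, inv)
        return inv, vcs + [f"({inv}) & ({cond}) --> ({pre})",
                           f"({inv}) & ~({cond}) --> ({post})"]
    raise ValueError(op)
```

For example, `vc(("assign", "x", "x + 1"), "x > 0")` yields the precondition `(x + 1) > 0` with no side conditions, while a `while` command yields its invariant plus two implications to be discharged by proof.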

UCAM-CL-TR-146

Thomas F. Melham:

Automating recursive type definitions in higher order logic
September 1988, 64 pages, paper copy

UCAM-CL-TR-147

Jeffrey Joyce:

Formal specification and verification of microprocessor systems
September 1988, 24 pages, PDF

Abstract: This paper describes the use of formal methods to verify a very simple microprocessor. The hierarchical structure of the microprocessor implementation is formally specified in higher-order logic. The behaviour of the microprocessor is then derived from a switch level model of MOS (Metal Oxide Semiconductor) behaviour using inference rules of higher-order logic with assistance from a mechanical theorem proving system. The complexity of the formal proof is controlled by a multi-level approach based on increasingly abstract views of time and data. While traditional methods such as multi-level simulation may reveal errors or inconsistencies, formal verification can provide greater certainty about the correctness of a design. The main difference with formal verification, and its strength, is that behaviour at one level is formally derived from lower levels with a precise statement of the conditions under which one level accurately models lower levels.

UCAM-CL-TR-148

Jonathan Billington:

Extending coloured petri nets
September 1988, 82 pages, PDF

Abstract: Jensen’s Coloured Petri Nets (CP-nets) are taken as the starting point for the development of a specification technique for complex concurrent systems. To increase its expressive power CP-nets are extended by including capacity and inhibitor functions. A class of extended CP-nets, known as P-nets, is defined that includes the capacity function and the threshold inhibitor extension. The inhibitor extension is defined in a totally symmetrical way to that of the usual pre place map (or incidence function). Thus the inhibitor and pre place maps may be equated by allowing a marking to be purged by a single transition occurrence, useful when specifying the abortion of various procedures. A chapter is devoted to developing the theory and notation for the purging of a place’s marking or part of its marking.

Two transformations from P-nets to CP-nets are presented and it is proved that they preserve interleaving behaviour. These are based on the notion of complementary places defined for PT-nets and involve the definition and proof of a new extended complementary place invariant for CP-nets.

The graphical form of P-nets, known as a P-Graph, is presented formally and draws upon the theories developed for algebraic specification. Arc inscriptions are multiples of tuples of terms generated by a many-sorted signature. Transition conditions are Boolean expressions derived from the same signature. An interpretation of the P-Graph is given in terms of a corresponding P-net. The work is similar to that of Vautherin but includes the inhibitor and capacity extension and a number of significant differences: in the P-Graph concrete sets are associated with places, rather than sorts, and likewise there are concrete initial marking and capacity functions. Vautherin associates equations with transitions rather than the more general Boolean expressions. P-Graphs are useful for specification at a concrete level. Classes of the P-Graph, known as Many-sorted Algebraic Nets and Many-sorted Predicate/Transition nets, are defined and illustrated by a number of examples. An extended place capacity notation is developed to allow for the convenient representation of resource bounds in the graphical form.

Some communications-oriented examples are presented including queues and the Demon Game of international standards fame.

The report concludes with a discussion of future work. In particular, an abstract P-Graph is defined that is very similar to Vautherin’s Petri net-like schema, but including the capacity and inhibitor extensions and associating Boolean expressions with transitions. This will be useful for more abstract specifications (e.g. classes of communications protocols) and for their analysis.

It is believed that this is the first coherent and formal presentation of these extensions in the literature.

UCAM-CL-TR-149

Paul Ashley Karger:

Improving security and performance for capability systems
October 1988, 273 pages, PDF, PostScript
PhD thesis (Wolfson College, March 1988)

Abstract: This dissertation examines two major limitations of capability systems: an inability to support security policies that enforce confinement and a reputation for relatively poor performance when compared with non-capability systems.

The dissertation examines why conventional capability systems cannot enforce confinement and proposes a new secure capability architecture, called SCAP, in which confinement can be enforced. SCAP is based on the earlier Cambridge Capability System, CAP. The dissertation shows how a non-discretionary security policy can be implemented on the new architecture, and how the new architecture can also be used to improve traceability of access and revocation of access.

The dissertation also examines how capability systems are vulnerable to discretionary Trojan horse attacks and proposes a defence based on rules built into the command-language interpreter. System-wide garbage collection, commonly used in most capability systems, is examined in the light of the non-discretionary security policies and found to be fundamentally insecure. The dissertation proposes alternative approaches to storage management to provide at least some of the benefits of system-wide garbage collection, but without the accompanying security problems.

Performance of capability systems is improved by two major techniques. First, the doctrine of programming generality is addressed as one major cause of poor performance. Protection domains should be allocated only for genuine security reasons, rather than at every subroutine boundary. Compilers can better enforce modularity and good programming style without adding the expense of security enforcement to every subroutine call. Second, the ideas of reduced instruction set computers (RISC) can be applied to capability systems to simplify the operations required. The dissertation identifies a minimum set of hardware functions needed to obtain good performance for a capability system. This set is much smaller than previous research had indicated necessary.

A prototype implementation of some of the capability features is described. The prototype was implemented on a re-microprogrammed VAX-11/730 computer. The dissertation examines the performance and software compatibility implications of the new capability architecture, both in the context of conventional computers, such as the VAX, and in the context of RISC processors.

UCAM-CL-TR-150

Albert John Camilleri:

Simulation as an aid to verification using the HOL theorem prover
October 1988, 23 pages, PDF

Abstract: The HOL theorem proving system, developed by Mike Gordon at the University of Cambridge, is a mechanisation of higher order logic, primarily intended for conducting formal proofs of digital system designs. In this paper we show that hardware specifications written in HOL logic can be executed to enable simulation as a means of supporting formal proof. Specifications of a small microprocessor are described, showing how HOL logic sentences can be transformed into executable code with minimum risk of introducing inconsistencies. A clean and effective optimisation strategy is recommended to make the executable specifications practical.

UCAM-CL-TR-151

Inderpreet-Singh Dhingra:

Formalising an integrated circuit design style in higher order logic
November 1988, 195 pages, PDF
PhD thesis (King’s College, March 1988)

Abstract: If the activities of an integrated circuit designer are examined, we find that rather than keeping track of all the details, he uses simple rules of thumb which have been refined from experience. These rules of thumb are guidelines for deciding which blocks to use and how they are to be connected. This thesis gives a formal foundation, in higher order logic, to the design rules of a dynamic CMOS integrated circuit design style.

Correctness statements for the library of basic elements are formulated. These statements are based on a small number of definitions which define the behaviour of transistors and capacitors and the necessary axiomatisation of the four valued algebra for signals. The correctness statements of large and complex circuits are then derived from the library of previously proved correctness statements, using logical inference rules instead of rules of thumb. For example, one gate from the library can drive another only if its output constraints are satisfied by the input constraints of the gate that it drives. In formalising the design rules, these constraints are captured as predicates and are part of the correctness statements of these gates. So when two gates are to be connected, it is only necessary to check that the predicates match. These ideas are fairly general and widely applicable for formalising the rules of many systems.

A number of worked examples are presented based on these formal techniques. Proofs are presented at various stages of development to show how the correctness statement for a device evolves and how the proof is constructed. In particular it is demonstrated how such formal techniques can help improve and sharpen the final specifications.

As a major case study to test all these techniques, a new design for a digital phase-locked loop is presented. This has been designed down to the gate level using the above dynamic design style, and has been described and simulated using ELLA. Some of the subcomponents have been formally verified down to the detailed circuit level while others have merely been specified without formal proofs of correctness. An informal proof of correctness of this device is also presented based on the formal specifications of the various submodules.


UCAM-CL-TR-152

Andrew Mark Pullen:

Motion development for computer animation
November 1988, 163 pages, paper copy
PhD thesis (Churchill College, August 1987)

UCAM-CL-TR-153

Michael Burrows:

Efficient data sharing
December 1988, 99 pages, PDF
PhD thesis (Churchill College, September 1988)

Abstract: As distributed computing systems become widespread, the sharing of data between people using a large number of computers becomes more important. One of the most popular ways to facilitate this sharing is to provide a common file system, accessible by all the machines on the network. This approach is simple and reasonably effective, but the performance of the system can degrade significantly if the number of machines is increased. By using a hierarchical network, and arranging that machines typically access files stored in the same section of the network, it is possible to build very large systems. However, there is still a limit on the number of machines that can share a single file server and a single network effectively.

A good way to decrease network and server load is to cache file data on client machines, so that data need not be fetched from the centralized server each time it is accessed. This technique can improve the performance of a distributed file system and is used in a number of working systems. However, caching brings with it the overhead of maintaining consistency, or cache coherence. That is, each machine in the network must see the same data in its cache, even though one machine may be modifying the data as others are reading it. The problem is to maintain consistency without dramatically increasing the number of messages that must be passed between machines on the network.

Some existing file systems take a probabilistic approach to consistency, some explicitly prevent the activities that can cause inconsistency, while others provide consistency only at some cost in functionality or performance. In this dissertation, I examine how distributed file systems are typically used, and the degree to which caching might be expected to improve performance. I then describe a new file system that attempts to cache significantly more data than other systems, provides strong consistency guarantees, yet requires few additional messages for cache management.

This new file system provides fine-grain sharing of a file concurrently open on multiple machines on the network, at the granularity of a single byte. It uses a simple system of multiple-reader, single-writer locks held in a centralized server to ensure cache consistency. The problems of maintaining client state in a centralized server are solved by using efficient data structures and crash recovery techniques.
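The locking discipline described above can be sketched as follows (a minimal illustration of multiple-reader, single-writer locks at per-byte granularity; the class and method names are invented, not taken from the dissertation, and real servers would block and queue rather than simply refuse):

```python
# Toy centralized lock server: many readers may share a byte of a file,
# but a writer excludes all other clients, which is the invariant a
# cache-consistency protocol of this kind must maintain.

class LockServer:
    def __init__(self):
        # (file, byte) -> ("read", reader_count) or ("write", client_id)
        self.locks = {}

    def acquire(self, client, file, byte, mode):
        """Try to grant a lock on one byte; return True on success."""
        key = (file, byte)
        held = self.locks.get(key)
        if held is None:
            self.locks[key] = ("write", client) if mode == "write" else ("read", 1)
            return True
        kind, state = held
        if mode == "read" and kind == "read":
            self.locks[key] = ("read", state + 1)   # readers may share
            return True
        return False                                # writer excludes everyone

    def release(self, client, file, byte):
        """Drop one holder's lock on a byte."""
        key = (file, byte)
        kind, state = self.locks[key]
        if kind == "read" and state > 1:
            self.locks[key] = ("read", state - 1)
        else:
            del self.locks[key]
```

A write lock is only granted once every reader has released, which is what forces dirty cached copies to be invalidated before modification.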

UCAM-CL-TR-154

I.B. Crabtree, R.S. Crouch, D.C. Moffat,N.J. Pirie, S.G. Pulman, G.D. Ritchie,B.A. Tate:

A natural language interface to an intelligent planning system
January 1989, 14 pages, PDF

Abstract: An intelligent planning system is an example of a software aid which, although developed by specialists, is intended to be used by non-programmers for a wide variety of tasks. There is therefore a need for a communication medium which allows the application specialist and the non-expert user to specify their needs without knowing the details of the system.

This kind of system is one where the ‘mice and menus’ approach is unlikely to be able to provide a very flexible interface, since the range and type of potential queries is not predictable in advance. Clearly, therefore, some kind of language is a necessity here. The aim of this project is to experiment with the use of English language as the medium of communication. The kind of system we would eventually be able to build would be one where the user could use the planner to organise some external activity, trying out alternative scenarios, and then interact with the system during the execution of the resulting plans, making adjustments where necessary.

UCAM-CL-TR-155

S.G. Pulman, G.J. Russell, G.D. Ritchie,A.W. Black:

Computational morphology of English
January 1989, 15 pages, PDF

Abstract: This paper describes an implemented computer program which uses various kinds of linguistic knowledge to analyse existing or novel word forms in terms of their components. Three main types of knowledge are required (for English): knowledge about spelling or phonological changes consequent upon affixation (notice we are only dealing with isolated word forms); knowledge about the syntactic or semantic properties of affixation (i.e. inflexional and derivational morphology), and knowledge about the properties of the stored base forms of words (which in our case are always themselves words, rather than more abstract entities). These three types of information are stored as data files, represented in exactly the form a linguist might employ. These data files are then compiled by the system to produce a run-time program which will analyse arbitrary word forms presented to it in a way consistent with the original linguistic description.
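The compile-the-data-files architecture described above can be sketched in miniature (a toy illustration only: the lexicon, suffix table and spelling-rule format here are invented, not those of the actual program):

```python
# Declarative "data files": a lexicon of stored base forms, and a suffix
# table pairing each affix with a rule that undoes its spelling change.

LEXICON = {"move": "verb", "big": "adj"}

SUFFIXES = [
    # (affix, undo-spelling-change, grammatical property)
    ("ing", lambda s: s + "e" if s + "e" in LEXICON else s, "prog"),
    ("er",  lambda s: s, "comparative"),
]

def compile_analyser(lexicon, suffixes):
    """'Compile' the declarative tables into a run-time analyser function."""
    def analyse(word):
        if word in lexicon:                        # a base form needs no affix
            return (word, lexicon[word], None)
        for affix, undo, prop in suffixes:
            if word.endswith(affix):
                stem = undo(word[: -len(affix)])   # reverse the spelling change
                if stem in lexicon:
                    return (stem, lexicon[stem], prop)
        return None
    return analyse
```

For example, the compiled analyser maps "moving" back to the base form "move" plus the progressive suffix, by undoing the e-deletion rule.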

UCAM-CL-TR-156

Steve Pulman:

Events and VP modifiers
January 1989, 10 pages, PDF

Abstract: This paper concerns the analysis of adverbial and PP modifiers of VP suggested by Davidson, where verbs are regarded as introducing reference to events, and such modifiers are predicates of these events. Several problems raised by it are described and a solution is presented. The paper then goes on to suggest some extensions of the theory in order to be able to cope with temporal and aspectual modification of VPs.

UCAM-CL-TR-157

Juanito Camilleri:

Introducing a priority operator to CCS
January 1989, 19 pages, PDF

Abstract: In this paper we augment the syntax of CCS by introducing a priority operator. We present a syntax directed operational semantics of the language as a labelled transition system. A new equivalence relation which is based on Milner’s strong observational equivalence [11] is defined and proved to be a congruence. We also give some examples which illustrate the use of the operator and emphasise the novelty of the approach used to introduce the notion of priority to process algebras.

UCAM-CL-TR-158

Karen Sparck Jones:

Tailoring output to the user: What does user modelling in generation mean?
August 1988, 21 pages, PDF

Abstract: This paper examines the implications, for linguistic output generation tailored to the interactive system user, of earlier analyses of the components of user modelling and of the constraints realism imposes on modelling. Using a range of detailed examples it argues that tailoring based only on the actual dialogue and on the decision model required for the system task is quite adequate, and that more ambitious modelling is both dangerous and unnecessary.

UCAM-CL-TR-159

Andrew M. Pitts:

Non-trivial power types can’t be subtypes of polymorphic types
January 1989, 12 pages, PostScript

Abstract: This paper establishes a new, limitative relation between the polymorphic lambda calculus and the kind of higher-order type theory which is embodied in the logic of toposes. It is shown that any embedding in a topos of the cartesian closed category of (closed) types of a model of the polymorphic lambda calculus must place the polymorphic types well away from the powertypes σ → Ω of the topos, in the sense that σ → Ω is a subtype of a polymorphic type only in the case that σ is empty (and hence σ → Ω is terminal). As corollaries we obtain strengthenings of Reynolds’ result on the non-existence of set-theoretic models of polymorphism.

UCAM-CL-TR-160

Andrew Gordon:

PFL+: A kernel scheme for functional I/O
February 1989, 26 pages, PDF

Abstract: In place of the common separation of functional I/O into continuation and stream based schemes, an alternative division between data driven and strictness driven mechanisms for I/O is proposed. The data driven mechanism determines I/O actions by the Weak Head Normal Form of programs, while strictness driven I/O is based on suspensions – I/O actions are triggered when demand arises for the value of a suspension during normal order reduction. The data driven and strictness driven I/O mechanisms are exemplified by the output list and input list, respectively, in Landin’s stream based I/O scheme.

PFL+ is a functional I/O scheme, able to express arbitrary I/O actions and both data driven and strictness driven constructs in terms of a small kernel of primitives. PFL+ could be added to any functional language. It is based on Holmstrom’s PFL [5], a parallel functional language with embedded communication and concurrency operators from CCS. PFL+ adds non-strict communication, behaviours with results and primitives to make suspensions.

Examples are given of how PFL+ can derive from these primitives both stream based I/O and the representation of the file system as a function.


UCAM-CL-TR-161

D.C.J. Matthews:

Papers on Poly/ML
February 1989, 150 pages, paper copy

UCAM-CL-TR-162

Claire Grover, Ted Briscoe, John Carroll,Bran Boguraev:

The Alvey natural language tools grammar (2nd Release)
April 1989, 90 pages, paper copy

UCAM-CL-TR-163

Ann Copestake, Karen Sparck Jones:

Inference in a natural language front end for databases
February 1989, 87 pages, PDF

Abstract: This report describes the implementation and initial testing of knowledge representation and inference capabilities within a modular database front end designed for transportability.

UCAM-CL-TR-164

Li Gong, David J. Wheeler:

A matrix key distribution system
October 1988, 20 pages, PDF

Abstract: A new key distribution scheme is presented. It is based on the distinctive idea that lets each node have a set of keys of which it shares a distinct subset with every other node. This has the advantage that the numbers of keys that must be distributed and maintained are reduced by a square root factor; moreover, two nodes can start conversation with virtually no delay. Two versions of the scheme are given. Their performance and security analysis shows it is a practical solution to some key distribution problems.

UCAM-CL-TR-165

Peter Newman:

Fast packet switching for integrated services
March 1989, 145 pages, paper copy
PhD thesis (Wolfson College, December 1988)

UCAM-CL-TR-166

Jean Bacon:

Evolution of operating system structures
March 1989, 28 pages, PDF

Abstract: The development of structuring within operating systems is reviewed and related to the simultaneous evolution of concurrent programming languages. First, traditional multi-user systems are considered and their evolution from monolithic closed systems to general domain structured systems is traced. Hardware support for protected sharing is emphasised for this type of system.

The technology directed trend towards single user workstations requires a different emphasis in system design. The requirement for protection in such systems is less strong than in multi-user systems and, in a single language system, may to some extent be provided by software at compile time rather than hardware at run time. Distributed systems comprising single user workstations and dedicated server machines are considered and the special requirements for efficient implementation of servers are discussed.

The concepts of closed but structured and open system designs are helpful. It is argued that the open approach is most suited to the requirements of single user and distributed systems. Experiences of attempting to implement systems over a closed operating system base are presented.

Progress towards support for heterogeneity in distributed systems, so that interacting components written in a range of languages may interwork and may run on a variety of hardware, is presented.

The benefits of taking an object orientated view for system-level as well as language-level objects and for specification, generation and design of systems are discussed and work in this area is described.

An outline of formal approaches aimed at specification, verification and automatic generation of software is given.

Finally, design issues are summarised and conclusions drawn.

UCAM-CL-TR-167

Jeffrey J. Joyce:

A verified compiler for a verified microprocessor
March 1989, 67 pages, paper copy


UCAM-CL-TR-168

J.M. Bacon, I.M. Leslie, R.M. Needham:

Distributed computing with a processor bank
April 1989, 15 pages, PDF

Abstract: The Cambridge Distributed Computing System (CDCS) was designed some ten years ago and was in everyday use at the Computer Laboratory until December 1988. An overview of the basic design of CDCS is given, an outline of its evolution and a description of the distributed systems research projects that were based on it. Experience has shown that a design based on a processor bank leads to a flexible and extensible distributed system.

UCAM-CL-TR-169

Andrew Franklin Seaborne:

Filing in a heterogeneous network
April 1989, 131 pages, paper copy
PhD thesis (Churchill College, July 1987)

UCAM-CL-TR-170

Ursula Martin, Tobias Nipkow:

Ordered rewriting and confluence
May 1989, 18 pages, paper copy

UCAM-CL-TR-171

Jon Fairbairn:

Some types with inclusion properties in ∀, →, µ
June 1989, 10 pages, PDF

Abstract: This paper concerns the ∀, →, µ type system used in the non-strict functional programming language Ponder. While the type system is akin to the types of Second Order Lambda-calculus, the absence of type application makes it possible to construct types with useful inclusion relationships between them.

To illustrate this, the paper contains definitions of a natural numbers type with many definable subtypes, and of a record type with inheritance.

UCAM-CL-TR-172

Julia Rose Galliers:

A theoretical framework for computer models of cooperative dialogue, acknowledging multi-agent conflict
July 1989, 226 pages, paper copy

UCAM-CL-TR-173

Roger William Stephen Hale:

Programming in temporal logic
July 1989, 182 pages, paper copy
PhD thesis (Trinity College, October 1988)

UCAM-CL-TR-174

James Thomas Woodchurch Clarke:

General theory relating to the implementation of concurrent symbolic computation
August 1989, 113 pages, paper copy
PhD thesis (Trinity College, January 1989)

UCAM-CL-TR-175

Lawrence C. Paulson:

A formulation of the simple theory of types (for Isabelle)
August 1989, 32 pages, PDF, DVI

Abstract: Simple type theory is formulated for use with the generic theorem prover Isabelle. This requires explicit type inference rules. There are function, product, and subset types, which may be empty. Descriptions (the eta-operator) introduce the Axiom of Choice. Higher-order logic is obtained through reflection between formulae and terms of type bool. Recursive types and functions can be formally constructed.

Isabelle proof procedures are described. The logic appears suitable for general mathematics as well as computational problems.

UCAM-CL-TR-176

T.J.W. Clarke:

Implementing aggregates in parallel functional languages
August 1989, 13 pages, paper copy


UCAM-CL-TR-177

P.A.J. Noel:

Experimenting with Isabelle in ZF Set Theory
September 1989, 40 pages, paper copy

UCAM-CL-TR-178

Jeffrey J. Joyce:

Totally verified systems: linking verified software to verified hardware
September 1989, 25 pages, PDF

Abstract: We describe exploratory efforts to design and verify a compiler for a formally verified microprocessor, as one aspect of the eventual goal of building totally verified systems. Together with a formal proof of correctness for the microprocessor, this yields a precise and rigorously established link between the semantics of the source language and the execution of compiled code by the fabricated microchip. We describe in particular: (1) how the limitations of real hardware influenced this proof; and (2) how the general framework provided by higher order logic was used to formalize the compiler correctness problem for a hierarchically structured language.

UCAM-CL-TR-179

Ursula Martin, Tobias Nipkow:

Automating Squiggol
September 1989, 16 pages, paper copy

UCAM-CL-TR-180

Tobias Nipkow:

Formal verification of data type refinement: theory and practice
September 1989, 31 pages, paper copy

UCAM-CL-TR-181

Tobias Nipkow:

Proof transformations for equational theories
September 1989, 17 pages, paper copy

UCAM-CL-TR-182

John M. Levine, Lee Fedder:

The theory and implementation of a bidirectional question answering system
October 1989, 27 pages, paper copy

UCAM-CL-TR-183

Rachel Cardell-Oliver:

The specification and verification of sliding window protocols in higher order logic
October 1989, 25 pages, paper copy

UCAM-CL-TR-184

David Lawrence Tennenhouse:

Site interconnection and the exchange architecture
October 1989, 225 pages, paper copy
PhD thesis (Darwin College, September 1988)

UCAM-CL-TR-185

Guo Qiang Zhang:

Logics of domains
December 1989, 250 pages, paper copy
PhD thesis (Trinity College, May 1989)

UCAM-CL-TR-186

Derek Robert McAuley:

Protocol design for high speed networks
January 1990, 100 pages, PostScript
PhD thesis (Fitzwilliam College, September 1989)

Abstract: Improvements in fibre optic communication and in VLSI for network switching components have led to the consideration of building digital switched networks capable of providing point to point communication in the gigabit per second range. Provision of bandwidths of this magnitude allows the consideration of a whole new range of telecommunications services, integrating video, voice, image and text. These multi-service networks have a range of requirements not met by traditional network architectures designed for digital telephony or computer applications. This dissertation describes the design, and an implementation, of the Multi-Service Network architecture and protocol family, which is aimed at supporting these services.

Asynchronous transfer mode networks provide the basic support required for these integrated services, and the Multi-Service Network architecture is designed primarily for these types of networks. The aim of the Multi-Service protocol family is to provide a complete architecture which allows use of the full facilities of asynchronous transfer mode networks by multi-media applications. To maintain comparable performance with the underlying media, certain elements of the MSN protocol stack are designed with implementation in hardware in mind. The interconnection of heterogeneous networks, and networks belonging to different security and administrative domains, is considered vital, so the MSN architecture takes an internetworking approach.

UCAM-CL-TR-187

Ann Copestake, Karen Sparck Jones:

Natural language interfaces to databases
September 1989, 36 pages, PostScript

Abstract: This paper reviews the state of the art in natural language access to databases. This has been a long-standing area of work in natural language processing. But though some commercial systems are now available, providing front ends has proved much harder than was expected, and the necessary limitations on front ends have to be recognised. The paper discusses the issues, both general to language and task-specific, involved in front end design, and the way these have been addressed, concentrating on the work of the last decade. The focus is on the central process of translating a natural language question into a database query, but other supporting functions are also covered. The points are illustrated by the use of a single example application. The paper concludes with an evaluation of the current state, indicating that future progress will depend on the one hand on general advances in natural language processing, and on the other on expanding the capabilities of traditional databases.

UCAM-CL-TR-188

Timothy E. Leonard:

Specification of computer architectures: a survey and annotated bibliography
January 1990, 42 pages, paper copy

UCAM-CL-TR-189

Lawrence C. Paulson, Tobias Nipkow:

Isabelle tutorial and user’s manual
January 1990, 142 pages, PDF, DVI

Abstract: This (obsolete!) manual describes how to use the theorem prover Isabelle. For beginners, it explains how to perform simple single-step proofs in the built-in logics. These include first-order logic, a classical sequent calculus, ZF set theory, Constructive Type Theory, and higher-order logic. Each of these logics is described. The manual then explains how to develop advanced tactics and tacticals and how to derive rules. Finally, it describes how to define new logics within Isabelle.

UCAM-CL-TR-190

Ann Copestake:

Some notes on mass terms and plurals
January 1990, 65 pages, PostScript

Abstract: This report describes a short investigation into some possible treatments of mass nouns and plurals. It aims to provide a grammar and axiomatisation with a reasonable coverage of these phenomena, so that a range of sentences can be parsed, and inferences made automatically.

The previous work on the subject, mainly due to Hasle (1988), is reviewed, and the limitations of both the original theories and Hasle’s implementation are demonstrated. Some more recent work, especially that relevant to Link’s theory, is also discussed.

The present grammar and axiomatisation is described. Although it is not the implementation of any particular theory, it draws on the work of Link, Krifka and Roberts. Some of the problems with the present approach are discussed, although possible solutions would need to be considered in a wider context. The aim is to show what types of phenomena can be treated by a relatively simple approach.

The implemented grammar covers everything that was treated by Hasle’s implementation, and extends that coverage in a variety of ways, while providing a better integration of the treatment of mass nouns and plurals than the earlier work. It was written in the CFG+ formalism, and some parts of the axiomatisation have been tested using the HOL system.


UCAM-CL-TR-191

Cosmos Nicolaou:

An architecture for real-time multimedia communications systems
February 1990, 30 pages, PDF

Abstract: An architecture for real-time multimedia communications systems is presented. A multimedia communication system includes both the communication protocols used to transport the real-time data and also the Distributed Computing System (DCS) within which any applications using these protocols must execute. The architecture presented attempts to integrate these protocols with the DCS in a smooth fashion in order to ease the writing of multimedia applications. Two issues are identified as being essential to the success of this integration: namely the synchronisation of related real-time data streams, and the management of heterogeneous multimedia hardware. The synchronisation problem is tackled by defining explicit synchronisation properties at the presentation level and by providing control and synchronisation operations within the DCS which operate in terms of these properties. The heterogeneity problems are addressed by separating the data transport semantics (the protocols themselves) from the control semantics (protocol interfaces). The control semantics are implemented using a distributed, typed interface scheme within the DCS (i.e. above the presentation layer), whilst the protocols themselves are implemented within the communication subsystem. The interface between the DCS and communications subsystem is referred to as the orchestration interface and can be considered to lie in the presentation and session layers.

A conforming prototype implementation is currently under construction.

UCAM-CL-TR-192

Lawrence C. Paulson:

Designing a theorem prover
May 1990, 57 pages, PDF, DVI

Abstract: The methods and principles of theorem prover design are presented through an extended example. Starting with a sequent calculus for first-order logic, an automatic prover (called Folderol) is developed. Folderol can prove quite a few complicated theorems, although its search strategy is crude and limited. Folderol is coded in Standard ML and consists largely of pure functions. Its complete listing is included.

The report concludes with a survey of other research in theorem proving: the Boyer/Moore theorem prover, Automath, LCF, and Isabelle.

UCAM-CL-TR-193

Julia Rose Galliers:

Belief revision and a theory of communication
May 1990, 30 pages, paper copy

UCAM-CL-TR-194

Julia Rose Galliers:

Proceedings of the First Belief Representation and Agent Architectures Workshop
March 1990, 199 pages, paper copy

UCAM-CL-TR-195

Jeffrey J. Joyce:

Multi-level verification of microprocessor-based systems
May 1990, 163 pages, paper copy
PhD thesis (Pembroke College, December 1989)

UCAM-CL-TR-196

John Peter Van Tassell:

The semantics of VHDL with VAL and HOL: towards practical verification tools
June 1990, 77 pages, paper copy

UCAM-CL-TR-197

Thomas Clarke:

The semantics and implementation of aggregates, or how to express concurrency without destroying determinism
July 1990, 25 pages, paper copy


UCAM-CL-TR-198

Andrew M. Pitts:

Evaluation Logic
August 1990, 31 pages, PostScript

Abstract: A new typed, higher-order logic is described which appears particularly well fitted to reasoning about forms of computation whose operational behaviour can be specified using the Natural Semantics style of structural operational semantics. The logic’s underlying type system is Moggi’s computational metalanguage, which enforces a distinction between computations and values via the categorical structure of a strong monad. This is extended to a (constructive) predicate logic with modal formulas about evaluation of computations to values, called evaluation modalities. The categorical structure corresponding to this kind of logic is explained and a couple of examples of categorical models given.

As a first example of the naturalness and applicability of this new logic to program semantics, we investigate the translation of a (tiny) fragment of Standard ML into a theory over the logic, which is proved computationally adequate for ML’s Natural Semantics. Whilst it is tiny, the ML fragment does however contain both higher-order functional and imperative features, about which the logic allows us to reason without having to mention global states explicitly.

UCAM-CL-TR-199

Richard Boulton, Mike Gordon,John Herbert, John Van Tassel:

The HOL verification of ELLA designs
August 1990, 22 pages, PostScript

Abstract: HOL is a public domain system for generating proofs in higher order predicate calculus. It has been in experimental and commercial use in several countries for a number of years.

ELLA is a hardware design language developed at the Royal Signals and Radar Establishment (RSRE) and marketed by Computer General Electronic Design. It supports simulation models at a variety of different abstraction levels.

A preliminary methodology for reasoning about ELLA designs using HOL is described. Our approach is to semantically embed a subset of the ELLA language in higher order logic, and then to make this embedding convenient to use with parsers and pretty-printers. There are a number of semantic issues that may affect the ease of verification. We discuss some of these briefly. We also give a simple example to illustrate the methodology.

UCAM-CL-TR-200

Tobias Nipkow, Gregor Snelting:

Type classes and overloading resolution via order-sorted unification
August 1990, 16 pages, paper copy

UCAM-CL-TR-201

Thomas Frederick Melham:

Formalizing abstraction mechanisms for hardware verification in higher order logic
August 1990, 233 pages, PDF
PhD thesis (Gonville & Caius College, August 1989)

Abstract: Recent advances in microelectronics have given designers of digital hardware the potential to build devices of remarkable size and complexity. Along with this, however, it becomes increasingly difficult to ensure that such systems are free from design errors, where complete simulation of even moderately sized circuits is impossible. One solution to these problems is that of hardware verification, where the functional behaviour of the hardware is described mathematically and formal proof is used to show that the design meets rigorous specifications of the intended operation.

This dissertation therefore seeks to develop this, showing how reasoning about the correctness of hardware using formal proof can be achieved using fundamental abstraction mechanisms to relate specifications of hardware at different levels. A systematic method is described for defining any instance of a wide class of concrete data types in higher order logic. This process has been automated in the HOL theorem prover, and provides a firm logical basis for representing data in formal specifications.

Further, these abstractions have been developed into a new technique for modelling the behaviour of entire classes of hardware designs. This is based on a formal representation in logic for the structure of circuit designs using the recursive types defined by the above method. Two detailed examples are presented showing how this work can be applied in practice.

Finally, some techniques for temporal abstraction are explained, and the means for asserting the correctness of a model containing time-dependent behaviour is described. This work is then illustrated using a case study: the formal verification in HOL of a simple ring communication network.

[Abstract by Nicholas Cutler (librarian), as none was submitted with the report.]


UCAM-CL-TR-202

Andrew Charles Harter:

Three-dimensional integrated circuit layout
August 1990, 179 pages, paper copy
PhD thesis (Corpus Christi College, April 1990)

UCAM-CL-TR-203

Valeria C.V. de Paiva:

Subtyping in Ponder (preliminary report)
August 1990, 35 pages, PDF

Abstract: This note starts the formal study of the type system of the functional language Ponder. Some of the problems of proving soundness and completeness are discussed and some preliminary results, about fragments of the type system, shown.

It consists of 6 sections. In section 1 we review briefly Ponder’s syntax and describe its typing system. In section 2 we consider a very restricted fragment of the language for which we can prove soundness of the type inference mechanism, but not completeness. Section 3 describes possible models of this fragment and some related work. Section 4 describes the type-inference algorithm for a larger fragment of Ponder and in section 5 we come up against some problematic examples. Section 6 is a summary of further work.

UCAM-CL-TR-204

Roy L. Crole, Andrew M. Pitts:

New foundations for fixpoint computations: FIX-hyperdoctrines and the FIX-logic
August 1990, 37 pages, PostScript

Abstract: This paper introduces a new higher-order typed constructive predicate logic for fixpoint computations, which exploits the categorical semantics of computations introduced by Moggi and contains a strong version of Martin-Löf’s ‘iteration type’. The type system enforces a separation of computations from values. The logic contains a novel form of fixpoint induction and can express partial and total correctness statements about evaluation of computations to values. The constructive nature of the logic is witnessed by strong metalogical properties which are proved using a category-theoretic version of the ‘logical relations’ method.

UCAM-CL-TR-205

Lawrence C. Paulson, Andrew W. Smith:

Logic programming, functional programming and inductive definitions
29 pages, PDF, DVI

Abstract: This paper reports an attempt to combine logic and functional programming. It also questions the traditional view that logic programming is a form of first-order logic, arguing instead that the essential nature of a logic program is an inductive definition. This revised view of logic programming suggests the design of a combined logic/functional language. A slow but working prototype is described.
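The inductive-definition reading of a logic program can be illustrated with a small sketch (not taken from the report; the relation names are illustrative): a Prolog-style program denotes the least relation closed under its clauses, which can be computed by fixpoint iteration.

```python
# A logic program read as an inductive definition: its meaning is the
# least relation closed under the clauses. Illustrative sketch only.

parent = {("ann", "bob"), ("bob", "col"), ("col", "dee")}

def ancestor(parent):
    """Least fixpoint of:
         ancestor(X, Y) :- parent(X, Y).
         ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z)."""
    anc = set(parent)                       # base clause
    while True:
        # one application of the recursive clause
        new = {(x, z) for (x, y) in parent for (y2, z) in anc if y == y2}
        if new <= anc:                      # closed under both clauses
            return anc
        anc |= new

print(sorted(ancestor(parent)))
```

The iteration stops exactly when the relation is closed under every clause, which is the least-fixpoint semantics the paper argues is the essential nature of a logic program.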

UCAM-CL-TR-206

Rachel Cardell-Oliver:

Formal verification of real-time protocols using higher order logic
August 1990, 36 pages, paper copy

UCAM-CL-TR-207

Stuart Philip Hawkins:

Video replay in computer animation
October 1990, 161 pages, paper copy
PhD thesis (Queens’ College, December 1989)

UCAM-CL-TR-208

Eike Ritter:

Categorical combinators for the calculus of constructions
October 1990, 43 pages, paper copy

UCAM-CL-TR-209

Andrew William Moore:

Efficient memory-based learning for robot control
November 1990, 248 pages, PDF
PhD thesis (Trinity Hall, October 1990)


Abstract: This dissertation is about the application of machine learning to robot control. A system which has no initial model of the robot/world dynamics should be able to construct such a model using data received through its sensors, an approach which is formalized here as the SAB (State-Action-Behaviour) control cycle. A method of learning is presented in which all the experiences in the lifetime of the robot are explicitly remembered. The experiences are stored in a manner which permits fast recall of the closest previous experience to any new situation, thus permitting very quick predictions of the effects of proposed actions and, given a goal behaviour, permitting fast generation of a candidate action. The learning can take place in high-dimensional non-linear control spaces with real-valued ranges of variables. Furthermore, the method avoids a number of shortcomings of earlier learning methods in which the controller can become trapped in inadequate performance which does not improve. Also considered is how the system is made resistant to noisy inputs and how it adapts to environmental changes. A well founded mechanism for choosing actions is introduced which solves the experiment/perform dilemma for this domain with adequate computational efficiency, and with fast convergence to the goal behaviour. The dissertation explains in detail how the SAB control cycle can be integrated into both low and high complexity tasks. The methods and algorithms are evaluated with numerous experiments using both real and simulated robot domains. The final experiment also illustrates how a compound learning task can be structured into a hierarchy of simple learning tasks.
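The core memory-based idea described above can be sketched as follows (an illustrative toy, not the thesis code; the thesis stores experiences in a structure supporting fast recall, whereas a linear scan stands in here): every experience (state, action, next state) is kept, and the effect of a proposed action is predicted from the nearest remembered experience in (state, action) space.

```python
# Memory-based prediction sketch: remember every experience and predict
# an action's effect from the closest stored (state, action) pair.
import math

memory = []  # list of (state, action, next_state) tuples

def remember(state, action, next_state):
    memory.append((state, action, next_state))

def predict(state, action):
    """Return the next_state of the nearest remembered (state, action)."""
    def dist(experience):
        s, a, _ = experience
        return math.dist(s + (a,), state + (action,))
    return min(memory, key=dist)[2]

remember((0.0, 0.0), 0.5, (0.5, 0.1))
remember((1.0, 0.0), -0.5, (0.5, -0.1))
print(predict((0.1, 0.0), 0.4))   # recalls the first, closer experience
```

Candidate-action generation works the same way in reverse: given a goal behaviour, search the memory for the experience whose next state is closest to the goal and reuse its action.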

UCAM-CL-TR-210

Tobias Nipkow:

Higher-order unification, polymorphism, and subsorts
15 pages, paper copy

UCAM-CL-TR-211

Karen Sparck Jones:

The role of artificial intelligence in information retrieval
November 1990, 13 pages, paper copy

UCAM-CL-TR-212

K.L. Wrench:

A distributed and-or parallel Prolog network
82 pages, paper copy

UCAM-CL-TR-213

Valeria Correa Vaz de Paiva:

The Dialectica categories
January 1991, 82 pages, PDF
PhD thesis (Lucy Cavendish College, November 1988)

Abstract: This work consists of two main parts. The first one, which gives it its name, presents an internal categorical version of Gödel’s “Dialectica interpretation” of higher-order arithmetic. The idea is to analyse the Dialectica interpretation using a category DC where objects are relations on objects of a basic category C and maps are pairs of maps of C satisfying a pullback condition. If C is finitely complete, DC exists and has a very natural symmetric monoidal structure. If C is locally cartesian closed then DC is symmetric monoidal closed. If we assume C with stable and disjoint coproducts, DC has cartesian products and weak-coproducts and satisfies a weak form of distributivity. Using the structure above, DC is a categorical model for intuitionistic linear logic.

Moreover, if C has free monoids then DC has cofree comonoids and the corresponding comonad “!” on DC, which has some special properties, can be used to model the exponential “of course!” in Intuitionistic Linear Logic. The category of “!”-coalgebras is isomorphic to the category of comonoids in DC and, if we assume commutative monoids in C, the “!”-Kleisli category, which is cartesian closed, corresponds to the Diller-Nahm variant of the Dialectica interpretation.

The second part introduces the categories GC. The objects of GC are the same objects of DC, but morphisms are easier to handle, since they are maps in C in opposite directions. If C is finitely complete, the category GC exists. If C is cartesian closed, we can define a symmetric monoidal structure, and if C is locally cartesian closed as well, we can define internal homs in GC that make it a symmetric monoidal closed category. Supposing C with stable and disjoint coproducts, we can define cartesian products and coproducts in GC and, more interestingly, we can define a dual operation to the tensor product bifunctor, called “par”. The operation “par” is a bifunctor and has a unit “⊥”, which is a dualising object. Using the internal hom and ⊥ we define a contravariant functor “(−)⊥” which behaves like negation and thus is used to model linear negation. We show that the category GC, with all the structure above, is a categorical model for Linear Logic, but not exactly the classical one.

In the last chapter a comonad and a monad are defined to model the exponentials “!” and “?”. To define these endofunctors, we use Beck’s distributive laws in an interesting way. Finally, we show that the Kleisli category GC! is cartesian closed and that the categories DC and GC are related by a Kleisli construction.


UCAM-CL-TR-214

J.A. Bradshaw, R.M. Young:

Integrating knowledge of purpose and knowledge of structure for design evaluation
February 1991, 20 pages, paper copy

UCAM-CL-TR-215

Paul Curzon:

A structured approach to the verification of low level microcode
265 pages, PDF
PhD thesis (Christ’s College, May 1990)

Abstract: Errors in microprograms are especially serious since all higher level programs on the machine depend on the microcode. Formal verification presents one avenue which may be used to discover such errors. Previous systems which have been used for formally verifying microcode may be categorised by the form in which the microcode is supplied. Some demand that it be written in a high level microprogramming language. Conventional software verification techniques are then applied. Other methods allow the microcode to be supplied in the form of a memory image. It is treated as data to an interpreter modelling the behaviour of the microarchitecture. The proof is then performed by symbolic execution. A third solution is for the code to be supplied in an assembly language and modelled at that level. The assembler instructions are converted to commands in a modelling language. The resulting program is verified using traditional software verification techniques.

In this dissertation I present a new universal microprogram verification system. It achieves many of the advantages of the other kinds of systems by adopting a hybrid approach. The microcode is supplied as a memory image, but it is transformed by the system to a high level program which may be verified using standard software verification techniques. The structure of the high level program is obtained from user supplied documentation. I show that this allows microcode to be split into small, independently validatable portions even when it was not written in that way. I also demonstrate that the techniques allow the complexity of detail due to the underlying microarchitecture to be controlled at an early stage in the validation process. I suggest that the system described would combine well with other validation tools and provide help throughout the firmware development cycle. Two case studies are given. The first describes the verification of Gordon’s computer. This example, being fairly simple, provides a good illustration of the techniques used by the system. The second case study is concerned with the High Level Hardware Orion computer, which is a commercially produced machine with a fairly complex microarchitecture. This example shows that the techniques scale well to production microarchitectures.

UCAM-CL-TR-216

Carole Susan Klein:

Exploiting OR-parallelism in Prolog using multiple sequential machines
250 pages, paper copy
PhD thesis (Wolfson College, October 1989)

UCAM-CL-TR-217

Bhaskar Ramanathan Harita:

Dynamic bandwidth management
160 pages, paper copy
PhD thesis (Wolfson College, October 1990)

UCAM-CL-TR-218

Tobias Nipkow:

Higher-order critical pairs
15 pages, paper copy

UCAM-CL-TR-219

Ian M. Leslie, Derek M. McAuley,Mark Hayter, Richard Black, Reto Beller,Peter Newman, Matthew Doar:

Fairisle project working documents: Snapshot 1
March 1991, 15 pages, paper copy

UCAM-CL-TR-220

Cosmos Andrea Nicolaou:

A distributed architecture for multimedia communication systems
192 pages, paper copy
PhD thesis (Christ’s College, December 1990)


UCAM-CL-TR-221

Robert Milne:

Transforming axioms for data types into sequential programs
44 pages, PDF

Abstract: A process is proposed for refining specifications of abstract data types into efficient sequential implementations. The process needs little manual intervention. It is split into three stages, not all of which need always be carried out. The three stages entail interpreting equalities as behavioural equivalences, converting functions into procedures and replacing axioms by programs. The stages can be performed as automatic transformations which are certain to produce results that meet the specifications, provided that simple conditions hold. These conditions describe the adequacy of the specifications, the freedom from interference between the procedures, and the mode of construction of the procedures. Sufficient versions of these conditions can be checked automatically. Varying the conditions could produce implementations for different classes of specification. Though the transformations could be automated, the intermediate results, in styles of specification which cover both functions and procedures, have interest in their own right and may be particularly appropriate to object-oriented design.

UCAM-CL-TR-222

Jonathan Billington:

Extensions to coloured Petri nets and their application to protocols
190 pages, paper copy
PhD thesis (Clare Hall, May 1990)

UCAM-CL-TR-223

Philip Gladwin, Stephen Pulman,Karen Sparck Jones:

Shallow processing and automatic summarising: a first study
May 1991, 65 pages, paper copy

UCAM-CL-TR-224

Ted Briscoe, John Carroll:

Generalised probabilistic LR parsing of natural language (corpora) with unification-based grammars
45 pages, paper copy

UCAM-CL-TR-225

Valeria de Paiva:

Categorical multirelations, linear logic and Petri nets (draft)
May 1991, 29 pages, PDF

Abstract: This note presents a categorical treatment of multirelations, which is, in a loose sense, a generalisation of both our previous work on the categories GC, and of Chu’s construction A NC [Barr’79]. The main motivation for writing this note was the utilisation of the category GC by Brown and Gurr [BG90] to model Petri nets. We wanted to extend their work to deal with multirelations, as Petri nets are usually modelled using multirelations pre and post. That proved easy enough, and people interested mainly in concurrency theory should refer to our joint work [BGdP’91]; this note deals with the mathematics underlying [BGdP’91]. The upshot of this work is that we build a model of Intuitionistic Linear Logic (without modalities) over any symmetric monoidal category C with a distinguished object (N, ≤, , e −) – a closed poset. Moreover, if the category C is cartesian closed with free monoids, we build a model of Intuitionistic Linear Logic with a non-trivial modality ‘!’ over it.

UCAM-CL-TR-226

Kwok-yan Lam:

A new approach for improving system availability
June 1991, 108 pages, paper copy
PhD thesis (Churchill College, January 1991)

UCAM-CL-TR-227

Juanito Albert Camilleri:

Priority in process calculi
June 1991, 203 pages, paper copy
PhD thesis (Trinity College, October 1990)

UCAM-CL-TR-228

Mark Hayter, Derek McAuley:

The desk area network
May 1991, 11 pages, PostScript

Abstract: A novel architecture for use within an end computing system is described. This attempts to extend the concepts used in modern high speed networks into computer system design. A multimedia workstation is being built based on this concept to evaluate the approach.


UCAM-CL-TR-229

David J. Brown:

Abstraction of image and pixel: the thistle display system
August 1991, 197 pages, paper copy
PhD thesis (St John’s College, February 1991)

UCAM-CL-TR-230

J. Galliers:

Proceedings of the Second Belief Representation and Agent Architectures Workshop (BRAA ’91)
August 1991, 255 pages, paper copy

UCAM-CL-TR-231

Raphael Yahalom:

Managing the order of transactions in widely-distributed data systems
August 1991, 133 pages, paper copy
PhD thesis (Jesus College, October 1990)

UCAM-CL-TR-232

Francisco Corella:

Mechanising set theory
July 1991, 217 pages, PDF
PhD thesis (Corpus Christi College, June 1989)

Abstract: Set theory is today the standard foundation of mathematics, but most proof development systems (PDS) are based on type theory rather than set theory. This is due in part to the difficulty of reducing the rich mathematical vocabulary to the economical vocabulary of set theory. It is known how to do this in principle, but traditional explanations of mathematical notations in set-theoretic terms do not lend themselves easily to mechanical treatment.

We advocate the representation of mathematical notations in a formal system consisting of the axioms of any version of ordinary set theory, such as ZF, but within the framework of higher-order logic with λ-conversion (H.O.L.) rather than first-order logic (F.O.L.). In this system each notation can be represented by a constant, which has a higher-order type when the notation binds variables. The meaning of the notation is given by an axiom which defines the representing constant, and the correspondence between the ordinary syntax of the notation and its representation in the formal language is specified by a rewrite rule. The collection of rewrite rules comprises a rewriting system of a kind which is computationally well behaved.

The formal system is justified by the fact that set theory within H.O.L. is a conservative extension of set theory within F.O.L. Besides facilitating the representation of notations, the formal system is of interest because it permits the use of mathematical methods which do not seem to be available in set theory within F.O.L.

A PDS, called Watson, has been built to demonstrate this approach to the mechanization of mathematics. Watson embodies a methodology for interactive proof which provides both flexibility of use and a relative guarantee of correctness. Results and proofs can be saved, and can be perused and modified with an ordinary text editor. The user can specify his own notations as rewrite rules and adapt the mix of notations to suit the problem at hand; it is easy to switch from one set of notations to another. As a case study, Watson has been used to prove the correctness of a latch implemented as two cross-coupled nor-gates, with an approximation of time as a continuum.
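The notation-as-rewrite-rule idea can be sketched in miniature (a toy illustration only, not Watson itself; the rule patterns and constant names are hypothetical): each surface notation is rewritten to its representing constant by applying rules until no rule matches, i.e. until a normal form is reached.

```python
# Toy rewriting system: surface notation -> representing constant.
# Rules and constant names here are invented for illustration.
import re

rules = [
    (re.compile(r"(\w+) U (\w+)"), r"Union(\1, \2)"),   # A U B -> Union(A, B)
    (re.compile(r"(\w+) n (\w+)"), r"Inter(\1, \2)"),   # A n B -> Inter(A, B)
]

def normalise(term):
    """Apply rewrite rules until none matches (normal form)."""
    while True:
        for pattern, replacement in rules:
            new = pattern.sub(replacement, term, count=1)
            if new != term:
                term = new
                break           # restart from the first rule
        else:
            return term         # no rule fired: normal form reached

print(normalise("A U B"))
```

Termination and confluence of such a system are exactly the "computationally well behaved" properties the abstract alludes to; this sketch simply assumes them.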

UCAM-CL-TR-233

John Carroll, Ted Briscoe, Claire Grover:

A development environment for large natural language grammars
July 1991, 65 pages, paper copy

UCAM-CL-TR-234

Karen Sparck Jones:

Two tutorial papers: Information retrieval & Thesaurus
August 1991, 31 pages, PDF

Abstract: The first paper describes the characteristics of information retrieval from documents or texts, the development and status of automatic indexing and retrieval, and the actual and potential relations between information retrieval and artificial intelligence. The second paper discusses the properties, construction and actual and potential uses of thesauri, as semantic classifications or terminological knowledge bases, in information retrieval and natural language processing.

UCAM-CL-TR-235

Heng Wang:

Modelling and image generation
145 pages, paper copy
PhD thesis (St John’s College, July 1991)


UCAM-CL-TR-236

John Anthony Bradshaw:

Using knowledge of purpose and knowledge of structure as a basis for evaluating the behaviour of mechanical systems
153 pages, paper copy
PhD thesis (Gonville & Caius College, June 1991)

UCAM-CL-TR-237

Derek G. Bridge:

Computing presuppositions in an incremental language processing system
212 pages, paper copy
PhD thesis (Wolfson College, April 1991)

UCAM-CL-TR-238

Ted Briscoe, Ann Copestake,Valeria de Paiva:

Proceedings of the ACQUILEX Workshop on Default Inheritance in the lexicon
October 1991, 180 pages, paper copy

UCAM-CL-TR-239

Mark Thomas Maybury:

Planning multisentential English text using communicative acts
December 1991, 329 pages, PDF
PhD thesis (Wolfson College, July 1991)

Abstract: The goal of this research is to develop explanation presentation mechanisms for knowledge based systems which enable them to define domain terminology and concepts, narrate events, elucidate plans, processes, or propositions, and argue to support a claim or advocate action. This requires the development of devices which select, structure, order and then linguistically realize explanation content as coherent and cohesive English text.

With the goal of identifying generic explanation presentation strategies, a wide range of naturally occurring texts were analyzed with respect to their communicative structure, function, content and intended effects on the reader. This motivated an integrated theory of communicative acts which characterizes text at the level of rhetorical acts (e.g. describe, define, narrate), illocutionary acts (e.g. inform, request), and locutionary acts (ask, command). Taken as a whole, the identified communicative acts characterize the structure, content and intended effects of four types of text: description, narration, exposition, argument. These text types have distinct effects such as getting the reader to know about entities, to know about events, to understand plans, processes, or propositions, or to believe propositions or want to perform actions. In addition to identifying the communicative function and effect of text at multiple levels of abstraction, this dissertation details a tripartite theory of focus of attention (discourse focus, temporal focus and spatial focus) which constrains the planning and linguistic realization of text.

To test the integrated theory of communicative acts and tripartite theory of focus of attention, a text generation system TEXPLAN (Textual EXplanation PLANner) was implemented that plans and linguistically realizes multisentential and multiparagraph explanations from knowledge based systems. The communicative acts identified during text analysis were formalized as over sixty compositional and (in some cases) recursive plan operators in the library of a hierarchical planner. Discourse, temporal and spatial models were implemented to track and use attentional information to guide the organization and realization of text. Because the plan operators distinguish between the communicative function (e.g. argue for a proposition) and the expected effect (e.g. the reader believes the proposition) of communicative acts, the system is able to construct a discourse model of the structure and function of its textual responses as well as a user model of the expected effects of its responses on the reader’s knowledge, beliefs, and desires. The system uses both the discourse model and user model to guide subsequent utterances. To test its generality, the system was interfaced to a variety of domain applications including a neuropsychological diagnosis system, a mission planning system, and a knowledge based mission simulator. The system produces descriptions, narratives, expositions and arguments from these applications, thus exhibiting a broader range of rhetorical coverage than previous text generation systems.

UCAM-CL-TR-240

Juanito Camilleri:

Symbolic compilation and execution of programs by proof: a case study in HOL
31 pages, paper copy


UCAM-CL-TR-241

Thomas Ulrich Vogel:

Learning in large state spaces with an application to biped robot walking
December 1991, 204 pages, paper copy
PhD thesis (Wolfson College, November 1991)

UCAM-CL-TR-242

Glenford Ezra Mapp:

An object-oriented approach to virtual memory management
January 1992, 150 pages, PDF
PhD thesis (Clare Hall, September 1991)

Abstract: Advances in computer technology are being pooled together to form a new computing environment which is characterised by powerful workstations with vast amounts of memory connected to high speed networks. This environment will provide a large number of diverse services such as multimedia communications, expert systems and object-oriented databases. In order to develop these complex applications in an efficient manner, new interfaces are required which are simple, fast and flexible and allow the programmer to use an object-oriented approach throughout the design and implementation of an application. Virtual memory techniques are increasingly being used to build these new facilities.

In addition, since CPU speeds continue to increase faster than disk speeds, an I/O bottleneck may develop in which the CPU may be idle for long periods waiting for paging requests to be satisfied. To overcome this problem it is necessary to develop new paging algorithms that better reflect how different objects are used. Thus a facility to page objects on a per-object basis is required, and a testbed is also needed to obtain experimental data on the paging activity of different objects.

Virtual memory techniques, previously only used in mainframe and minicomputer architectures, are being employed in the memory management units of modern microprocessors. With very large address spaces becoming a standard feature of most systems, the use of memory mapping is seen as an effective way of providing greater flexibility as well as improved system efficiency.

This thesis presents an object-oriented interface for memory-mapped objects. Each object has a designated object type. Handles are associated with different object types and the interface allows users to define and manage new object types. Moving data between the object and its backing store is done by user-level processes called object managers. Object managers interact with the kernel via a specified interface, thus allowing users to build their own object managers. A framework to compare different algorithms was also developed and an experimental testbed was designed to gather and analyse data on the paging activity of various programs. Using the testbed, conventional paging algorithms were applied to different types of objects and the results were compared. New paging algorithms were designed and implemented for objects that are accessed in a highly sequential manner.

UCAM-CL-TR-243

Alison Cawsey, Julia Galliers, Steven Reece, Karen Sparck Jones:

Automating the librarian: a fundamental approach using belief revision
January 1992, 39 pages, paper copy

UCAM-CL-TR-244

T.F. Melham:

A mechanized theory of the π-calculus in HOL
31 pages, paper copy

UCAM-CL-TR-245

Michael J. Dixon:

System support for multi-service traffic
January 1992, 108 pages, PDF
PhD thesis (Fitzwilliam College, September 1991)

Abstract: Digital network technology is now capable of supporting the bandwidth requirements of diverse applications such as voice, video and data (so-called multi-service traffic). Some media, for example voice, have specific transmission requirements regarding the maximum packet delay and loss which they can tolerate. Problems arise when attempting to multiplex such traffic over a single channel. Traditional digital networks based on the Packet- (PTM) and Synchronous- (STM) Transfer Modes prove unsuitable due to their media access contention and inflexible bandwidth allocation properties respectively. The Asynchronous Transfer Mode (ATM) has been proposed as a compromise between the PTM and STM techniques. The current state of multimedia research suggests that a significant amount of multi-service traffic will be handled by computer operating systems. Unfortunately, conventional operating systems are largely unsuited to such a task. This dissertation is concerned with the system organisation necessary in order to extend the benefits of ATM networking through the endpoint operating system and up to the application level. A locally developed microkernel, with ATM network protocol support, has been used as a testbed for the ideas presented. Practical results over prototype ATM networks, including the 512 MHz Cambridge Backbone Network, are presented.

UCAM-CL-TR-246

Victor Poznanski:

A relevance-based utterance processing system
February 1992, 295 pages, PDF
PhD thesis (Girton College, December 1990)

Abstract: This thesis presents a computational interpretation of Sperber and Wilson’s relevance theory, based on the use of non-monotonic logic supported by a reason maintenance system, and shows how the theory, when given a specific form in this way, can provide a unique and interesting account of discourse processing.

Relevance theory is a radical theory of natural language pragmatics which attempts to explain the whole of human cognition using a single maxim: the Principle of Optimal Relevance. The theory is seen by its originators as a computationally more adequate alternative to Gricean pragmatics. Much as it claims to offer the advantage of a unified approach to utterance comprehension, Relevance Theory is hard to evaluate because Sperber and Wilson only provide vague, high-level descriptions of vital aspects of their theory. For example, the fundamental idea behind the whole theory is that, in trying to understand an utterance, we attempt to maximise significant new information obtained from the utterance whilst consuming as little cognitive effort as possible. However, Sperber and Wilson do not make the nature of information and effort sufficiently clear.

Relevance theory is attractive as a general theory of human language communication and as a potential framework for computational language processing systems. The thesis seeks to clarify and flesh out the problem areas in order to develop a computational implementation which is used to evaluate the theory.

The early chapters examine and criticise the important aspects of the theory, emerging with a schema for an ideal relevance-based system. Crystal, a computational implementation of an utterance processing system based on this schema, is then described. Crystal performs certain types of utterance disambiguation and reference resolution, and computes implicatures according to relevance theory.

An adequate reasoning apparatus is a key component of a relevance-based discourse processor, so a suitable knowledge representation and inference engine are required. Various candidate formalisms are considered, and a knowledge representation and inference engine based on autoepistemic logic is found to be the most suitable. It is then shown how this representation can be used to meet particular discourse processing requirements, and how it provides a convenient interface to a separate abduction system that supplies non-demonstrative inferences according to relevance theory. Crystal’s powers are illustrated with examples, and the thesis shows how the design not only implements the less precise areas of Sperber and Wilson’s theory, but overcomes problems with the theory itself.

Crystal uses rather crude heuristics to model notions such as salience and degrees of belief. The thesis therefore presents a proposal and outline for a new kind of reason maintenance system that supports a non-monotonic logic whose formulae are labelled with upper/lower probability ranges intended to represent strength of belief. This system should facilitate measurements of change in semantic information and shed some light on notions such as expected utility and salience.

The thesis concludes that the design and implementation of Crystal provide evidence that relevance theory, as a generic theory of language processing, is a viable alternative theory of pragmatics. It therefore merits a greater level of investigation than has been applied to it to date.

UCAM-CL-TR-247

Roy Luis Crole:

Programming metalogics with a fixpoint type
February 1992, 164 pages, paper copy
PhD thesis (Churchill College, January 1992)

UCAM-CL-TR-248

Richard J. Boulton:

On efficiency in theorem provers which fully expand proofs into primitive inferences
February 1992, 23 pages, DVI

Abstract: Theorem provers which fully expand proofs into applications of primitive inference rules can be made highly secure, but have been criticized for being orders of magnitude slower than many other theorem provers. We argue that much of this relative inefficiency is due to the way proof procedures are typically written, and not all is inherent in the way the systems work. We support this claim by considering a proof procedure for linear arithmetic. We show that straightforward techniques can be used to significantly cut down the computation required. An order of magnitude improvement in the performance is shown by an implementation of these techniques.


UCAM-CL-TR-249

John P. Van Tassel:

A formalisation of the VHDL simulation cycle
March 1992, 24 pages, PDF

Abstract: The VHSIC Hardware Description Language (VHDL) has been gaining wide acceptance as a unifying HDL. It is, however, still a language in which the only way of validating a design is by careful simulation. With the aim of better understanding VHDL’s particular simulation process and eventually reasoning about it, we have developed a formalisation of VHDL’s simulation cycle for a subset of the language. It has also been possible to embed our semantics in the Cambridge Higher-Order Logic (HOL) system and derive interesting properties about specific VHDL programs.

UCAM-CL-TR-250

Innes A. Ferguson:

TouringMachines: autonomous agents with attitudes
April 1992, 19 pages, PostScript

Abstract: It is becoming widely accepted that neither purely reactive nor purely deliberative control techniques are capable of producing the range of behaviours required of intelligent robotic agents in dynamic, unpredictable, multi-agent worlds. We present a new architecture for controlling autonomous, mobile agents – building on previous work addressing reactive and deliberative control methods. The proposed multi-layered control architecture allows a resource-bounded, goal-directed agent to react promptly to unexpected changes in its environment; at the same time it allows the agent to reason predictively about potential conflicts by constructing and projecting theories which hypothesise other agents’ goals and intentions.

The line of research adopted is very much a pragmatic one. A single common architecture has been implemented which, being extensively parametrized, allows an experimenter to study functionally- and behaviourally-diverse agent configurations. A principal aim of this research is to understand the role different functional capabilities play in constraining an agent’s behaviour under varying environmental conditions. To this end, we have constructed an experimental testbed comprising a simulated multi-agent world in which a variety of agent configurations and behaviours have been investigated. Some experience with the new control architecture is described.

UCAM-CL-TR-251

Xiaofeng Jiang:

Multipoint digital video communications
April 1992, 124 pages, paper copy
PhD thesis (Wolfson College, December 1991)

UCAM-CL-TR-252

Andrew M. Pitts:

A co-induction principle for recursively defined domains
25 pages, PostScript

Abstract: This paper establishes a new property of predomains recursively defined using the cartesian product, disjoint union, partial function space and convex powerdomain constructors. We prove that the partial order on such a recursive predomain D is the greatest fixed point of a certain monotone operator associated to D. This provides a structurally defined family of proof principles for these recursive predomains: to show that one element of D approximates another, it suffices to find a binary relation containing the two elements that is a post-fixed point for the associated monotone operator. The statement of the proof principles is independent of any of the various methods available for explicit construction of recursive predomains. Following Milner and Tofte, the method of proof is called co-induction. It closely resembles the way bisimulations are used in concurrent process calculi.
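As a sketch, in notation assumed here for illustration (not necessarily the paper’s own): write $\sqsubseteq$ for the partial order on the predomain $D$ and $\Phi_D$ for the associated monotone operator on binary relations over $D$; the principle then reads:

```latex
% Sketch of the co-induction principle; the symbols are illustrative.
\[
  {\sqsubseteq} \;=\; \nu\,\Phi_D
  \;=\; \bigcup \{\, R \subseteq D \times D \mid R \subseteq \Phi_D(R) \,\}
\]
\[
  \text{so to prove } x \sqsubseteq y \text{ it suffices to exhibit one }
  R \text{ with } (x,y) \in R \text{ and } R \subseteq \Phi_D(R).
\]
```

A post-fixed point $R$ here plays the same role as a bisimulation in process calculi, which is the resemblance the abstract notes.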

Two specific instances of the co-induction principle already occur in the work of Abramsky in the form of ‘internal full abstraction’ theorems for denotational semantics of SCCS and the lazy lambda calculus. In the first case post-fixed binary relations are precisely Abramsky’s partial bisimulations, whereas in the second case they are his applicative bisimulations. The co-induction principle also provides an apparently useful tool for reasoning about the equality of elements of recursively defined datatypes in (strict or lazy) higher-order functional programming languages.

UCAM-CL-TR-253

Antonio Sanfilippo:

The (other) Cambridge ACQUILEX papers
141 pages, paper copy


UCAM-CL-TR-254

Richard J. Boulton:

A HOL semantics for a subset of ELLA
April 1992, 104 pages, DVI

Abstract: Formal verification is an important tool in the design of computer systems, especially when the systems are safety or security critical. However, the formal techniques currently available are not well integrated into the set of tools more traditionally used by designers. This work is aimed at improving the integration by providing a formal semantics for a subset of the hardware description language ELLA, and by supporting this semantics in the HOL theorem proving system, which has been used extensively for hardware verification.

A semantics for a subset of ELLA is described, and an outline of a proof of the equivalence of parallel and recursive implementations of an n-bit adder is given as an illustration of the semantics. The proof has been performed in an extension of the HOL system. Some proof tools written to support the verification are also described.

UCAM-CL-TR-255

Rachel Mary Cardell-Oliver:

The formal verification of hard real-time systems
1992, 151 pages, paper copy
PhD thesis (Queens’ College, January 1992)

UCAM-CL-TR-256

Martin Richards:

MCPL programming manual
May 1992, 32 pages, paper copy

UCAM-CL-TR-257

Rajeev Prabhakar Goré:

Cut-free sequent and tableau systems for propositional normal modal logics
May 1992, 160 pages, PDF

Abstract: We present a unified treatment of tableau, sequent and axiomatic formulations for many propositional normal modal logics, thus unifying and extending the work of Hanson, Segerberg, Zeman, Mints, Fitting, Rautenberg and Shvarts. The primary emphasis is on tableau systems as the completeness proofs are easier in this setting. Each tableau system has a natural sequent analogue defining a finitary provability relation for each axiomatically formulated logic L. Consequently, any tableau proof can be converted into a sequent proof which can be read downwards to obtain an axiomatic proof. In particular, we present cut-free sequent systems for the logics S4.3, S4.3.1 and S4.14. These three logics have important temporal interpretations and the sequent systems appear to be new.

All systems are sound and (weakly) complete with respect to their known finite frame Kripke semantics. By concentrating almost exclusively on finite tree frames we obtain finer characterisation results, particularly for the logics with natural temporal interpretations. In particular, all proofs of tableau completeness are constructive and yield the finite model property and decidability for each logic.

Most of these systems are cut-free, giving a Gentzen cut-elimination theorem for the logic in question. But even when the cut rule is required, all uses of it remain analytic. Some systems do not possess the subformula property. But in all such cases the class of “superformulae” remains bounded, giving an analytic superformula property. Thus all systems remain totally amenable to computer implementation and immediately serve as nondeterministic decision procedures for the logics they formulate. Furthermore, the constructive completeness proofs yield deterministic decision procedures for all the logics concerned.

In obtaining these systems we demonstrate that the subformula property can be broken in a systematic and analytic way while still retaining decidability. This should not be surprising since it is known that modal logic is a form of second-order logic and that the subformula property does not hold for higher-order logics.

UCAM-CL-TR-258

David J. Greaves, Derek McAuley:

Private ATM networks
May 1992, 12 pages, paper copy

UCAM-CL-TR-259

Samson Abramsky, C.-H. Luke Ong:

Full abstraction in the Lazy Lambda Calculus
104 pages, paper copy


UCAM-CL-TR-260

Henrik Reif Andersen:

Local computation of alternating fixed-points
21 pages, paper copy

UCAM-CL-TR-261

Neil Anthony Dodgson:

Image resampling
August 1992, 264 pages, PDF
PhD thesis (Wolfson College)

Abstract: Image resampling is the process of geometrically transforming digital images. This report considers several aspects of the process.

We begin by decomposing the resampling process into three simpler sub-processes: reconstruction of a continuous intensity surface from a discrete image, transformation of that continuous surface, and sampling of the transformed surface to produce a new discrete image. We then consider the sampling process, and the subsidiary problem of intensity quantisation. Both these are well understood, and we present a summary of existing work, laying a foundation for the central body of the report where the sub-process of reconstruction is studied.
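As a toy illustration of this reconstruct/transform/sample decomposition (this is not code from the report; the linear reconstructor, the 1-D setting and all names are our own assumptions), a minimal resampler might look like:

```python
# Illustrative sketch: the three-stage resampling decomposition for a
# 1-D "image", using piecewise-linear interpolation as the reconstructor.

def reconstruct_linear(samples):
    """Return a continuous intensity function built from discrete samples
    by piecewise-linear interpolation (clamped at the edges)."""
    def f(x):
        if x <= 0:
            return samples[0]
        if x >= len(samples) - 1:
            return samples[-1]
        i = int(x)
        t = x - i
        return (1 - t) * samples[i] + t * samples[i + 1]
    return f

def resample(samples, inverse_transform, n_out):
    """Sample the transformed surface at n_out integer output positions.
    inverse_transform maps an output position back to a source position
    (the inverse of the geometric transform being applied)."""
    f = reconstruct_linear(samples)
    return [f(inverse_transform(x)) for x in range(n_out)]

# Example: scale a 4-sample ramp up by a factor of 2 (7 output samples).
src = [0.0, 1.0, 2.0, 3.0]
dst = resample(src, lambda x: x / 2.0, 7)
print(dst)  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```

Intensity quantisation (rounding the resulting values back to a discrete range) would follow as a separate step.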

The work on reconstruction divides into four parts, two general and two specific:

1. Piecewise local polynomials: the most studied group of reconstructors. We examine these, and the criteria used in their design. One new derivation is of two piecewise local quadratic reconstructors.

2. Infinite extent reconstructors: we consider these and their local approximations, the problem of finite image size, the resulting edge effects, and the solutions to these problems. Amongst the reconstructors discussed are the interpolating cubic B-spline and the interpolating Bezier cubic. We derive the filter kernels for both of these, and prove that they are the same. Given this kernel we demonstrate how the interpolating cubic B-spline can be extended from a one-dimensional to a two-dimensional reconstructor, providing a considerable speed improvement over the existing method of extension.

3. Fast Fourier transform reconstruction: it has long been known that the fast Fourier transform (FFT) can be used to generate an approximation to perfect scaling of a sample set. Donald Fraser (in 1987) took this result and generated a hybrid FFT reconstructor which can be used for general transformations, not just scaling. We modify Fraser’s method to tackle two major problems: its large time and storage requirements, and the edge effects it causes in the reconstructed intensity surface.

4. A priori knowledge reconstruction: first considering what can be done if we know how the original image was sampled, and then considering what can be done with one particular class of image coupled with one particular type of sampling. In this latter case we find that exact reconstruction of the image is possible. This is a surprising result as this class of images cannot be exactly reconstructed using classical sampling theory.

The final section of the report draws all of the strands together to discuss transformations and the resampling process as a whole. Of particular note here is work on how the quality of different reconstruction and resampling methods can be assessed.

UCAM-CL-TR-262

Nick Benton, Gavin Bierman,Valeria de Paiva:

Term assignment for intuitionistic linear logic (preliminary report)
August 1992, 57 pages, PDF

Abstract: In this paper we consider the problem of deriving a term assignment system for Girard’s Intuitionistic Linear Logic for both the sequent calculus and natural deduction proof systems. Our system differs from previous calculi (e.g. that of Abramsky) and has two important properties which they lack. These are the substitution property (the set of valid deductions is closed under substitution) and subject reduction (reduction on terms is well typed).

We define a simple (but more general than previous proposals) categorical model for Intuitionistic Linear Logic and show how this can be used to derive the term assignment system.

We also consider term reduction arising from cut-elimination in the sequent calculus and normalisation in natural deduction. We explore the relationship between these, as well as with the equations which follow from our categorical model.

UCAM-CL-TR-263

C.-H. Luke Ong:

The Lazy Lambda Calculus: an investigation into the foundations of functional programming
August 1992, 256 pages, paper copy
PhD thesis (Imperial College London, May 1988)

UCAM-CL-TR-264

Juanito Camilleri:

CCS with environmental guards
August 1992, 19 pages, paper copy


UCAM-CL-TR-265

Juanito Camilleri, Tom Melham:

Reasoning with inductively defined relations in the HOL theorem prover
August 1992, 49 pages, paper copy

UCAM-CL-TR-266

Carole Klein:

Automatic exploitation of OR-parallelism in Prolog
18 pages, paper copy

UCAM-CL-TR-267

Christine Ernoult, Alan Mycroft:

Untyped strictness analysis
13 pages, paper copy

UCAM-CL-TR-268

Paul W. Jardetzky:

Network file server design for continuous media
October 1992, 101 pages, PostScript
PhD thesis (Darwin College, August 1992)

Abstract: This dissertation concentrates on issues related to the provision of a network-based storage facility for digital audio and video data. The goal is to demonstrate that a distributed file service in support of these media may be built without special purpose hardware. The main objective is to identify those parameters that affect file system performance and provide the criteria for making desirable design decisions.

UCAM-CL-TR-269

Alan Mycroft, Arthur Norman:

Optimising compilation
23 pages, paper copy

UCAM-CL-TR-270

Chaoying Ma:

Designing a universal name service
133 pages, PDF
PhD thesis (Newnham College, October 1992)

Abstract: Generally speaking, naming in computing systems deals with the creation of object identifiers at all levels of system architecture and the mapping among them. Two of the main purposes of having names in computer systems are (a) to identify objects; (b) to accomplish sharing. Without naming no computer system design can be done.

The rapid development in the technology of personal workstations and computer communication networks has placed a great number of demands on designing large computer naming systems. In this dissertation, issues of naming in large distributed computing systems are addressed. Technical aspects as well as system architecture are examined. A design of a Universal Name Service (UNS) is proposed and its prototype implementation is described. Three major issues in designing a global naming system are studied. Firstly, it is observed that none of the existing name services provides enough flexibility in restructuring name spaces, so more research has to be done. Secondly, it is observed that although using stale naming data (hints) at the application level is acceptable in most cases as long as it is detectable and recoverable, stronger naming data integrity should be maintained to provide a better guarantee of finding objects, especially when a high degree of availability is required. Finally, configuring the name service is usually done in an ad hoc manner, leading to unexpected interruptions or a great deal of human intervention when the system is reconfigured. It is necessary to make a systematic study of automatic configuration and reconfiguration of name services.

This research is based on a distributed computing model, in which a number of computers work cooperatively to provide the service. The contributions include: (a) the construction of a Globally Unique Directory Identifier (GUDI) name space. Flexible name space restructuring is supported by allowing directories to be added to or removed from the GUDI name space. (b) The definition of a two-class name service infrastructure which exploits the semantics of naming. It makes the UNS replication control more robust, reliable as well as highly available. (c) The identification of two aspects in the name service configuration: one is concerned with the replication configuration, and the other is concerned with the server configuration. It is notable that previous work only studied these two aspects individually but not in combination. A distinguishing feature of the UNS is that both issues are considered at the design stage and novel methods are used to allow dynamic service configuration to be done automatically and safely.


UCAM-CL-TR-271

Lawrence C. Paulson:

Set theory as a computational logic: I. From foundations to functions
November 1992, 28 pages, PDF, DVI

Abstract: A logic for specification and verification is derived from the axioms of Zermelo-Fraenkel set theory. The proofs are performed using the proof assistant Isabelle. Isabelle is generic, supporting several different logics. Isabelle has the flexibility to adapt to variants of set theory. Its higher-order syntax supports the definition of new binding operators. Unknowns in subgoals can be instantiated incrementally. The paper describes the derivation of rules for descriptions, relations and functions, and discusses interactive proofs of Cantor’s Theorem, the Composition of Homomorphisms challenge, and Ramsey’s Theorem. A generic proof assistant can stand up against provers dedicated to particular logics.

UCAM-CL-TR-272

Martin David Coen:

Interactive program derivation
November 1992, 100 pages, PDF, DVI
PhD thesis (St John’s College, March 1992)

Abstract: As computer programs are increasingly used in safety critical applications, program correctness is becoming more important; as the size and complexity of programs increases, the traditional approach of testing is becoming inadequate. Proving the correctness of programs written in imperative languages is awkward; functional programming languages, however, offer more hope. Their logical structure is cleaner, and it is practical to reason about terminating functional programs in an internal logic.

This dissertation describes the development of a logical theory called TPT for reasoning about the correctness of terminating functional programs, its implementation using the theorem prover Isabelle, and its use in proving formal correctness. The theory draws both from Martin-Löf’s work in type theory and Manna and Waldinger’s work in program synthesis. It is based on classical first-order logic, and it contains terms that represent classes of behaviourally equivalent programs, types that denote sets of terminating programs, and well-founded orderings. Well-founded induction is used to reason about general recursion in a natural way and to separate conditions for termination from those for correctness.

The theory is implemented using the generic theorem prover Isabelle, which allows correctness proofs to be checked by machine and partially automated using tactics. In particular, tactics for type checking use the structure of programs to direct proofs. Type checking allows both the verification and derivation of programs, reducing specifications of correctness to sets of correctness conditions. These conditions can be proved in typed first-order logic, using well-known techniques of reasoning by induction and rewriting, and then lifted up to TPT. Examples of program termination are asserted and proved, using simple types. Behavioural specifications are expressed using dependent types, and the correctness of programs asserted and then proved. As a non-trivial example, a unification algorithm is specified and proved correct by machine.

The work in this dissertation clearly shows how a classical theory can be used to reason about program correctness, how general recursion can be reasoned about, and how programs can direct proofs of correctness.

UCAM-CL-TR-273

Innes A. Ferguson:

TouringMachines: an architecture for dynamic, rational, mobile agents
November 1992, 206 pages, PDF, PostScript
PhD thesis (Clare Hall, October 1992)

Abstract: It is becoming widely accepted that neither purely reactive nor purely deliberative control techniques are capable of producing the range of behaviours required of intelligent computational or robotic agents in dynamic, unpredictable, multi-agent worlds. We present a new architecture for controlling autonomous, mobile agents – building on previous work addressing reactive and deliberative control methods. The proposed multi-layered control architecture allows a resource-bounded, goal-directed agent to react promptly to unexpected changes in its environment; at the same time it enables the agent to reason predictively about potential conflicts by constructing and projecting causal models or theories which hypothesise other agents’ goals and intentions.

The line of research adopted is very much a pragmatic one. A single, common architecture has been implemented which, being extensively parametrized, allows an experimenter to study functionally- and behaviourally-diverse agent configurations. A principal aim of this research is to understand the role different functional capabilities play in constraining an agent’s behaviour under varying environmental conditions. To this end, we have constructed an experimental testbed comprising a simulated multi-agent world in which a variety of agent configurations and behaviours have been investigated. Experience with the new control architecture is described.


UCAM-CL-TR-274

Paul Curzon:

Of what use is a verified compiler specification?
23 pages, paper copy

UCAM-CL-TR-275

Barney Pell:

Exploratory learning in the game of GO
18 pages, PostScript

Abstract: This paper considers the importance of exploration to game-playing programs which learn by playing against opponents. The central question is whether a learning program should play the move which offers the best chance of winning the present game, or if it should play the move which has the best chance of providing useful information for future games. An approach to addressing this question is developed using probability theory, and then implemented in two different learning methods. Initial experiments in the game of Go suggest that a program which takes exploration into account can learn better against a knowledgeable opponent than a program which does not.
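One simple way to make the trade-off concrete (our own sketch under assumed names and weights, not Pell’s actual probability-theoretic treatment) is to score each move by its immediate winning chance plus an information bonus scaled by the number of games still to come:

```python
# Illustrative exploration/exploitation sketch; the scoring scheme and
# all names here are our own assumptions, not the paper's formulation.

def move_value(p_win, info_gain, games_remaining, info_weight=0.1):
    """Score a move: chance of winning the present game, plus a bonus
    for the information it is expected to yield for future games."""
    return p_win + info_weight * info_gain * games_remaining

def choose_move(moves, games_remaining):
    """moves: list of (name, p_win, info_gain) tuples."""
    return max(moves, key=lambda m: move_value(m[1], m[2], games_remaining))[0]

moves = [
    ("safe",        0.60, 0.05),  # best immediate chance of winning
    ("exploratory", 0.45, 0.80),  # riskier, but highly informative
]

# With many games still to play, the informative move is preferred;
# in the last game the program simply tries to win.
print(choose_move(moves, games_remaining=10))  # -> exploratory
print(choose_move(moves, games_remaining=0))   # -> safe
```

The point the abstract makes is exactly this dependence on the horizon: a move that looks inferior for the present game can dominate once future games are counted.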

UCAM-CL-TR-276

Barney Pell:

METAGAME: a new challenge for games and learning
15 pages, PostScript

Abstract: In most current approaches to Computer Game-Playing, including those employing some form of machine learning, the game analysis is performed mainly by humans. Thus, we are largely sidestepping the interesting (and difficult) questions. Human analysis also makes it difficult to evaluate the generality and applicability of different approaches.

To address these problems, we introduce a new challenge: Metagame. The idea is to write programs which take as input the rules of a set of new games within a pre-specified class, generated by a program which is publicly available. The programs compete against each other in many matches on each new game, and they can then be evaluated based on their overall performance and improvement through experience.

This paper discusses the goals, research areas, and general concerns for the idea of Metagame.

UCAM-CL-TR-277

Barney Pell:

METAGAME in symmetric chess-like games
30 pages, PostScript

Abstract: I have implemented a game generator that generates games from a wide but still restricted class. This class is general enough to include most aspects of many standard games, including Chess, Shogi, Chinese Chess, Checkers, Draughts, and many variants of Fairy Chess. The generator, implemented in Prolog, is transparent and publicly available, and generates games using probability distributions for parameters such as piece complexity, types of movement, board size, and locality.

The generator is illustrated by means of a new game it produced, which is then subjected to a simple strategic analysis. This form of analysis suggests that programs to play Metagame well will either learn or apply very general game-playing principles. But because the class is still restricted, it may be possible to develop a naive but fast program which can outplay more sophisticated opponents. Performance in a tournament between programs is the deciding criterion.

UCAM-CL-TR-278

Monica Nesi:

A formalization of the process algebra CCS in high order logic
42 pages, PDF

Abstract: This paper describes a mechanization in higher order logic of the theory for a subset of Milner’s CCS. The aim is to build a sound and effective tool to support verification and reasoning about process algebra specifications. To achieve this goal, the formal theory for pure CCS (no value passing) is defined in the interactive theorem prover HOL, and a set of proof tools, based on the algebraic presentation of CCS, is provided.

UCAM-CL-TR-279

Victor A. Carreno:

The transition assertions specification method
18 pages, paper copy


UCAM-CL-TR-280

Lawrence C. Paulson:

Introduction to Isabelle
January 1993, 61 pages, DVI

Abstract: Isabelle is a generic theorem prover, supporting formal proof in a variety of logics. Through a variety of examples, this paper explains the basic theory and demonstrates the most important commands. It serves as the introduction to other Isabelle documentation.

UCAM-CL-TR-281

Sape J. Mullender, Ian M. Leslie, Derek McAuley:

Pegasus project description
September 1992, 23 pages, paper copy

UCAM-CL-TR-282

Ian M. Leslie, Derek McAuley, Sape J. Mullender:

Pegasus – Operating system support for distributed multimedia systems
December 1992, 14 pages, paper copy

UCAM-CL-TR-283

Lawrence C. Paulson:

The Isabelle reference manual
February 1993, 78 pages, DVI

Abstract: This manual is a comprehensive description of Isabelle, including all commands, functions and packages. It is intended for reference rather than for reading through, and is certainly not a tutorial. The manual assumes familiarity with the basic concepts explained in Introduction to Isabelle. Functions are organized by their purpose, by their operands (subgoals, tactics, theorems), and by their usefulness. In each section, basic functions appear first, then advanced functions, and finally esoteric functions.

UCAM-CL-TR-284

Claire Grover, John Carroll, Ted Briscoe:

The Alvey Natural Language Tools grammar (4th Release)
January 1993, 260 pages, paper copy

UCAM-CL-TR-285

Andrew Donald Gordon:

Functional programming and input/output
February 1993, 163 pages, paper copy
PhD thesis (King’s College, August 1992)

UCAM-CL-TR-286

Lawrence C. Paulson:

Isabelle’s object-logics
February 1993, 161 pages, DVI

Abstract: Several logics come with Isabelle. Many of them are sufficiently developed to serve as comfortable reasoning environments. They are also good starting points for defining new logics. Each logic is distributed with sample proofs, some of which are presented in the paper. The logics described include first-order logic, Zermelo-Fraenkel set theory, higher-order logic, constructive type theory, and the classical sequent calculus LK. A final chapter explains the fine points of defining logics in Isabelle.

UCAM-CL-TR-287

Andrew D. Gordon:

A mechanised definition of Silage in HOL
February 1993, 28 pages, DVI

Abstract: If formal methods of hardware verification are to have any impact on the practices of working engineers, connections must be made between the languages used in practice to design circuits, and those used for research into hardware verification. Silage is a simple dataflow language marketed for specifying digital signal processing circuits. Higher Order Logic (HOL) is extensively used for research into hardware verification. This paper presents a formal definition of a substantial subset of Silage, by mapping Silage declarations into HOL predicates. The definition has been mechanised in the HOL theorem prover to support the transformational design of Silage circuits as theorem proving in HOL.

UCAM-CL-TR-288

Rajeev Gore:

Cut-free sequent and tableau systems for propositional Diodorean modal logics
February 1993, 19 pages, paper copy


UCAM-CL-TR-289

David Alan Howard Elworthy:

The semantics of noun phrase anaphora
February 1993, 191 pages, paper copy
PhD thesis (Darwin College, February 1993)

UCAM-CL-TR-290

Karen Sparck Jones:

Discourse modelling for automatic summarising
February 1993, 30 pages, paper copy

UCAM-CL-TR-291

J.R. Galliers, K. Sparck Jones:

Evaluating natural language processing systems
February 1993, 187 pages, PostScript

Abstract: This report presents a detailed analysis and review of NLP evaluation, in principle and in practice. Part 1 examines evaluation concepts and establishes a framework for NLP system evaluation. This makes use of experience in the related area of information retrieval and the analysis also refers to evaluation in speech processing. Part 2 surveys significant evaluation work done so far, for instance in machine translation, and discusses the particular problems of generic system evaluation. The conclusion is that evaluation strategies and techniques for NLP need much more development, in particular to take proper account of the influence of system tasks and settings. Part 3 develops a general approach to NLP evaluation, aimed at methodologically-sound strategies for test and evaluation motivated by comprehensive performance factor identification. The analysis throughout the report is supported by extensive illustrative examples.

UCAM-CL-TR-292

Cormac John Sreenan:

Synchronisation services for digital continuous media
March 1993, 123 pages, PostScript
PhD thesis (Christ’s College, October 1992)

Abstract: The development of broadband ATM networking makes it attractive to use computer communication networks for the transport of digital audio and motion video. Coupled with advances in workstation technology, this creates the opportunity to integrate these continuous information media within a distributed computing system. Continuous media have an inherent temporal dimension, resulting in a set of synchronisation requirements which have real-time constraints. This dissertation identifies the role and position of synchronisation, in terms of the support which is necessary in an integrated distributed system. This work is supported by a set of experiments which were performed in an ATM inter-network using multi-media workstations, each equipped with an Olivetti Pandora Box.

UCAM-CL-TR-293

Jean Bacon, Ken Moody:

Objects and transactions for modelling distributed applications: concurrency control and commitment
April 1993, 39 pages, paper copy

UCAM-CL-TR-294

Ken Moody, Jean Bacon, Noha Adly, Mohamad Afshar, John Bates, Huang Feng, Richard Hayton, Sai Lai Lo, Scarlet Schwiderski, Robert Sultana, Zhixue Wu:

OPERA
Storage, programming and display of multimedia objects
April 1993, 9 pages, paper copy

UCAM-CL-TR-295

Jean Bacon, John Bates, Sai Lai Lo, Ken Moody:

OPERA
Storage and presentation support for multimedia applications in a distributed, ATM network environment
April 1993, 12 pages, paper copy


UCAM-CL-TR-296

Z. Wu, K. Moody, J. Bacon:

A persistent programming language for multimedia databases in the OPERA project
April 1993, 9 pages, paper copy

UCAM-CL-TR-297

Eike Ritter:

Categorical abstract machines for higher-order typed lambda calculi
April 1993, 149 pages, paper copy
PhD thesis (Trinity College)

UCAM-CL-TR-298

John Matthew Simon Doar:

Multicast in the asynchronous transfer mode environment
April 1993, 168 pages, PostScript
PhD thesis (St John’s College, January 1993)

Abstract: In future multimedia communication networks, the ability to multicast information will be useful for many new and existing services. This dissertation considers the design of multicast switches for Asynchronous Transfer Mode (ATM) networks and proposes one design based upon a slotted ring. Analysis and simulation studies of this design are presented and details of its implementation for an experimental ATM network (Project Fairisle) are described, together with the modifications to the existing multi-service protocol architecture necessary to provide multicast connections. Finally, a short study of the problem of multicast routing is presented, together with some simulations of the long-term effect upon the routing efficiency of modifying the number of destinations within a multicast group.

UCAM-CL-TR-299

Bjorn Gamback, Manny Rayner, Barney Pell:

Pragmatic reasoning in bridge
April 1993, 23 pages, PostScript

Abstract: In this paper we argue that bidding in the game of Contract Bridge can profitably be regarded as a micro-world suitable for experimenting with pragmatics. We sketch an analysis in which a “bidding system” is treated as the semantics of an artificial language, and show how this “language”, despite its apparent simplicity, is capable of supporting a wide variety of common speech acts parallel to those in natural languages; we also argue that the reason for the relatively unsuccessful nature of previous attempts to write strong Bridge playing programs has been their failure to address the need to reason explicitly about knowledge, pragmatics, probabilities and plans. We give an overview of Pragma, a system currently under development, which embodies these ideas in concrete form, using a combination of rule-based inference, stochastic simulation, and “neural-net” learning. Examples are given illustrating the functionality of the system in its current form.

UCAM-CL-TR-300

Wai Wong:

Formal verification of VIPER’s ALU
April 1993, 78 pages, paper copy

UCAM-CL-TR-301

Zhixue Wu, Ken Moody, Jean Bacon:

The dual-level validation concurrency control method
June 1993, 24 pages, paper copy

UCAM-CL-TR-302

Barney Pell:

Logic programming for general game-playing
June 1993, 15 pages, PostScript

Abstract: Meta-Game Playing is a new approach to games in Artificial Intelligence, where we construct programs to play new games in a well-defined class, which are output by an automatic game generator. As the specific games to be played are not known in advance, a degree of human bias is eliminated, and playing programs are required to perform any game-specific optimisations without human assistance.

The attempt to construct a general game-playing program is made difficult by the opposing goals of generality and efficiency. This paper shows how application of standard techniques in logic-programming (abstract interpretation and partial evaluation) makes it possible to achieve both of these goals. Using these techniques, we can represent the semantics of a large class of games in a general and declarative way, but then have the program transform this representation into a more efficient version once it is presented with the rules of a new game. This process can be viewed as moving some of the responsibility for game analysis (that concerned with efficiency) from the researcher to the program itself.

UCAM-CL-TR-303

Andrew Kennedy:

Drawing trees — a case study in functional programming
June 1993, 9 pages, paper copy

UCAM-CL-TR-304

Lawrence C. Paulson:

Co-induction and co-recursion in higher-order logic
July 1993, 35 pages, PDF, PostScript, DVI

Abstract: A theory of recursive and corecursive definitions has been developed in higher-order logic (HOL) and mechanised using Isabelle. Least fixedpoints express inductive data types such as strict lists; greatest fixedpoints express co-inductive data types, such as lazy lists. Well-founded recursion expresses recursive functions over inductive data types; co-recursion expresses functions that yield elements of co-inductive data types. The theory rests on a traditional formalization of infinite trees. The theory is intended for use in specification and verification. It supports reasoning about a wide range of computable functions, but it does not formalize their operational semantics and can also express non-computable functions. The theory is demonstrated using lists and lazy lists as examples. The emphasis is on using co-recursion to define lazy list functions, and on using co-induction to reason about them.
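The distinction the abstract draws between recursion over finite lists and co-recursion producing lazy lists can be illustrated informally in any language with lazy streams. The following minimal Python sketch (our illustration, not drawn from the report; the names `nats_from` and `lmap` are invented) uses generators as lazy lists: `lmap` is defined corecursively, producing each element of an infinite list only on demand.

```python
from itertools import islice

def nats_from(n):
    """A corecursive definition: the infinite lazy list n, n+1, n+2, ..."""
    while True:
        yield n
        n += 1

def lmap(f, xs):
    """Map over a lazy list: each output element is produced on demand,
    so lmap is total even when xs is infinite."""
    for x in xs:
        yield f(x)

# Observe a finite prefix of the infinite lazy list [0, 2, 4, ...]
print(list(islice(lmap(lambda x: 2 * x, nats_from(0)), 5)))
# [0, 2, 4, 6, 8]
```

In the report’s setting the analogue of observing such finite prefixes is co-induction: two lazy lists are proved equal by exhibiting a bisimulation, rather than by structural induction.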

UCAM-CL-TR-305

P.N. Benton:

Strong normalisation for the linear term calculus
July 1993, 13 pages, paper copy

UCAM-CL-TR-306

Wai Wong:

Recording HOL proofs
July 1993, 57 pages, paper copy

UCAM-CL-TR-307

David D. Lewis, Karen Sparck Jones:

Natural language processing for information retrieval
July 1993, 22 pages, PostScript

Abstract: The paper summarizes the essential properties of document retrieval and reviews both conventional practice and research findings, the latter suggesting that simple statistical techniques can be effective. It then considers the new opportunities and challenges presented by the ability to search full text directly (rather than e.g. titles and abstracts), and suggests appropriate approaches to doing this, with a focus on the role of natural language processing. The paper also comments on possible connections with data and knowledge retrieval, and concludes by emphasizing the importance of rigorous performance testing.

UCAM-CL-TR-308

Jacob Frost:

A case study of co-induction in Isabelle HOL
August 1993, 27 pages, PDF, PostScript, DVI

Abstract: The consistency of the dynamic and static semantics for a small functional programming language was informally proved by R. Milner and M. Tofte. The notions of co-inductive definitions and the associated principle of co-induction played a pivotal role in the proof. With emphasis on co-induction, the work presented here deals with the formalisation of this result in the higher-order logic of the generic theorem prover Isabelle.

UCAM-CL-TR-309

Peter Nicholas Benton:

Strictness analysis of lazy functional programs
August 1993, 154 pages, paper copy
PhD thesis (Pembroke College, December 1992)


UCAM-CL-TR-310

Noha Adly:

HARP: a hierarchical asynchronous replication protocol for massively replicated systems
August 1993, 34 pages, PostScript

Abstract: This paper presents a new asynchronous replication protocol that is especially suitable for wide area and mobile systems, and allows reads and writes to occur at any replica. Updates reach other replicas using a propagation scheme based on nodes organized into a logical hierarchy. The hierarchical structure enables the scheme to scale well for thousands of replicas, while ensuring reliable delivery. A new service interface is proposed that provides different levels of asynchrony, allowing strong consistency and weak consistency to be integrated into the same framework. Further, due to the hierarchical pattern of propagation, the scheme provides the ability to locate replicas that are more up-to-date than others, depending on the needs of various applications. Also, it allows a selection from a number of reconciliation techniques based on delivery order mechanisms. Restructuring operations are provided to build and reconfigure the hierarchy dynamically without disturbing normal operations. The scheme tolerates transmission failures and network partitions.
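The hierarchical propagation idea can be sketched with a toy model: a write originating at any node floods to its parent and children, reaching every replica in the tree exactly once. This Python sketch is our own illustration of tree-structured flooding, with invented names; it is not HARP’s actual interface and omits reliability, asynchrony levels and reconciliation.

```python
class Replica:
    """A node in a logical replica hierarchy (toy model, not HARP's API)."""
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        self.log = []          # updates applied at this replica
        if parent:
            parent.children.append(self)

    def propagate(self, update, source=None):
        """Apply an update, then forward it to every neighbour except the
        one it came from; each replica in the tree is reached exactly once."""
        self.log.append(update)
        for peer in [self.parent, *self.children]:
            if peer is not None and peer is not source:
                peer.propagate(update, source=self)

# Three-level hierarchy: root -> {a, b}, a -> {a1}
root = Replica("root")
a, b = Replica("a", root), Replica("b", root)
a1 = Replica("a1", a)

a1.propagate("write x=1")  # a write may originate at any replica
print(all("write x=1" in r.log for r in (root, a, b, a1)))  # True
```

The tree topology is what gives the scheme its scaling behaviour: each node forwards to a bounded number of neighbours regardless of the total replica count.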

UCAM-CL-TR-311

Paul Curzon:

A verified Vista implementation
September 1993, 56 pages, paper copy

UCAM-CL-TR-312

Lawrence C. Paulson:

Set theory for verification: II
Induction and recursion
September 1993, 46 pages, PDF

Abstract: A theory of recursive definitions has been mechanized in Isabelle’s Zermelo-Fraenkel (ZF) set theory. The objective is to support the formalization of particular recursive definitions for use in verification, semantics proofs and other computational reasoning.

Inductively defined sets are expressed as least fixedpoints, applying the Knaster-Tarski Theorem over a suitable set. Recursive functions are defined by well-founded recursion and its derivatives, such as transfinite recursion. Recursive data structures are expressed by applying the Knaster-Tarski Theorem to a set that is closed under Cartesian product and disjoint sum.

Worked examples include the transitive closure of a relation, lists, variable-branching trees and mutually recursive trees and forests. The Schroder-Bernstein Theorem and the soundness of propositional logic are proved in Isabelle sessions.
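The transitive-closure example can be mimicked computationally: for a monotone operator over a finite set, the least fixedpoint guaranteed by the Knaster-Tarski Theorem is reached by iterating the operator from the empty set until it stabilises. A minimal Python sketch (ours, not from the report; function names are invented):

```python
def least_fixedpoint(f, bottom=frozenset()):
    """Iterate a monotone operator from the bottom element until it
    stabilises; over a finite carrier this reaches the least fixedpoint."""
    current = bottom
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt

def transitive_closure(r):
    """Transitive closure as the least fixedpoint of
    S |-> r union (r composed with S)."""
    compose = lambda s: {(a, d) for (a, b) in r for (c, d) in s if b == c}
    return least_fixedpoint(lambda s: frozenset(r | compose(s)))

print(sorted(transitive_closure({(1, 2), (2, 3)})))
# [(1, 2), (1, 3), (2, 3)]
```

The inductive definitions in the report are the same construction carried out inside ZF, where Knaster-Tarski guarantees the least fixedpoint exists without any finiteness assumption.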

UCAM-CL-TR-313

Yves Bertot, Gilles Kahn, Laurent Thery:

Proof by pointing
October 1993, 27 pages, paper copy

UCAM-CL-TR-314

John Andrew Carroll:

Practical unification-based parsing of natural language
173 pages, PostScript
PhD thesis (September 1993)

Abstract: The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two of the major problems in this area: firstly, the fact that parsing realistic inputs with such grammars can be computationally very expensive, and secondly, the observation that many analyses are often assigned to an input, only one of which usually forms the basis of the correct interpretation.

The thesis starts by presenting a new unification algorithm, justifies why it is well-suited to practical NL parsing, and describes a bottom-up active chart parser which employs this unification algorithm together with several other novel processing and optimisation techniques. Empirical results demonstrate that an implementation of this parser has significantly better practical performance than a comparable, state-of-the-art unification-based parser. Next, techniques for computing an LR table for a large unification grammar are described, a context free non-deterministic LR parsing algorithm is presented which has better time complexity than any previously reported using the same approach, and a unification-based version is derived. In experiments, the performance of an implementation of the latter is shown to exceed both the chart parser and also that of another efficient LR-like algorithm recently proposed.

Building on these methods, a system for parsing text taken from a given corpus is described which uses probabilistic techniques to identify the most plausible syntactic analyses for an input from the often large number licensed by the grammar. New techniques implemented include an incremental approach to semi-supervised training, a context-sensitive method of scoring sub-analyses, the accurate manipulation of probabilities during parsing, and the identification of the highest ranked analyses without exhaustive search. The system attains a similar success rate to approaches based on context-free grammar, but produces analyses which are more suitable for semantic processing.

The thesis includes detailed analyses of the worst-case space and time complexities of all the main algorithms described, and discusses the practical impact of the theoretical complexity results.

UCAM-CL-TR-315

Barney Darryl Pell:

Strategy generation and evaluation for meta-game playing
November 1993, 289 pages, PostScript
PhD thesis (Trinity College, August 1993)

Abstract: Meta-Game Playing (METAGAME) is a new paradigm for research in game-playing in which we design programs to take in the rules of unknown games and play those games without human assistance. Strong performance in this new paradigm is evidence that the program, instead of its human designer, has performed the analysis of each specific game.

SCL-METAGAME is a concrete METAGAME research problem based around the class of symmetric chess-like games. The class includes the games of chess, checkers, noughts and crosses, Chinese-chess, and Shogi. An implemented game generator produces new games in this class, some of which are objects of interest in their own right.

METAGAMER is a program that plays SCL-METAGAME. The program takes as input the rules of a specific game and analyses those rules to construct for that game an efficient representation and an evaluation function, both for use with a generic search engine. The strategic analysis performed by the program relates a set of general knowledge sources to the details of the particular game. Among other properties, this analysis determines the relative value of the different pieces in a given game. Although METAGAMER does not learn from experience, the values resulting from its analysis are qualitatively similar to values used by experts on known games, and are sufficient to produce competitive performance the first time the program actually plays each game it is given. This appears to be the first program to have derived useful piece values directly from analysis of the rules of different games.

Experiments show that the knowledge implemented in METAGAMER is useful on games unknown to its programmer in advance of the competition and make it seem likely that future programs which incorporate learning and more sophisticated active-analysis techniques will have a demonstrable competitive advantage on this new problem. When playing the known games of chess and checkers against humans and specialised programs, METAGAMER has derived from more general principles some strategies which are familiar to players of those games and which are hard-wired in many game-specific programs.

UCAM-CL-TR-316

Ann Copestake:

The Compleat LKB
August 1993, 126 pages, PostScript

Abstract: This report is a full description of the lexical knowledge base system (LKB) and the representation language (LRL) developed on the Esprit ACQUILEX project. The LKB system is designed to allow the representation of multilingual lexical information in a way which integrates lexical semantics with syntax and formal semantics. The LRL is a typed feature structure language which makes it possible to represent the lexicon as a highly structured object and to capture relationships between individual word senses by (default) inheritance and by lexical rules. The extension to multilingual representation allows a concise and natural description of translation mismatches. Most of this report consists of a detailed formal description of the LRL — this is augmented with appendices containing the user manual, an implementation outline and a discussion of some of the algorithms used, and a bibliography of papers which describe the LKB and its use within ACQUILEX. (Some of this material has been published previously, but is included here to make this report a convenient reference source.)

UCAM-CL-TR-317

John Peter Van Tassel:

Femto-VHDL: the semantics of a subset of VHDL and its embedding in the HOL proof assistant
November 1993, 122 pages, paper copy
PhD thesis (Gonville & Caius College, July 1993)

UCAM-CL-TR-318

Jim Grundy:

A method of program refinement
November 1993, 207 pages, PostScript
PhD thesis (Fitzwilliam College, November 1993)


Abstract: A method of specifying the desired behaviour of a computer program, and of refining such specifications into imperative programs is proposed. The refinement method has been designed with the intention of being amenable to tool support, and of being applicable to real-world refinement problems.

Part of the refinement method proposed involves the use of a style of transformational reasoning called ‘window inference’. Window inference is particularly powerful because it allows the information inherent in the context of a subexpression to be used in its transformation. If the notion of transformational reasoning is generalised to include transformations that preserve relationships weaker than equality, then program refinement can be regarded as a special case of transformational reasoning. A generalisation of window inference is described that allows non-equivalence preserving transformations. Window inference was originally proposed independently from, and as an alternative to, traditional styles of reasoning. A correspondence between the generalised version of window inference and natural deduction is described. This correspondence forms the basis of a window inference tool that has been built on top of the HOL theorem proving system.

This dissertation adopts a uniform treatment of specifications and programs as predicates. A survey of the existing approaches to the treatment of programs as predicates is presented. A new approach is then developed based on using predicates of a three-valued logic. This new approach can distinguish more easily between specifications of terminating and nonterminating behaviour than can the existing approaches.

A method of program refinement is then described by combining the unified treatment of specifications and programs as three-valued predicates with the window inference style of transformational reasoning. The result is a simple method of refinement that is well suited to the provision of tool support.

The method of refinement includes a technique for developing recursive programs. The proof of such developments is usually complicated because little can be assumed about the form and termination properties of a partially developed program. These difficulties are side-stepped by using a simplified meaning for recursion that compels the development of terminating programs. Once the development of a program is complete, the simplified meaning for recursion is refined into the true meaning.

The dissertation concludes with a case study which presents the specification and development of a simple line-editor. The case study demonstrates the applicability of the refinement method to real-world problems. The line editor is a nontrivial example that contains features characteristic of large developments, including complex data structures and the use of data abstraction. Examination of the case study shows that window inference offers a convenient way of structuring large developments.

UCAM-CL-TR-319

Mark David Hayter:

A workstation architecture to support multimedia
November 1993, 99 pages, PostScript
PhD thesis (St John’s College, September 1993)

Abstract: The advent of high speed networks in the wide and local area enables multimedia traffic to be easily carried between workstation class machines. The dissertation considers an architecture for a workstation to support such traffic effectively. In addition to presenting the information to a human user the architecture allows processing to be done on continuous media streams.

The proposed workstation architecture, known as the Desk Area Network (DAN), extends ideas from Asynchronous Transfer Mode (ATM) networks into the end-system. All processors and devices are connected to an ATM interconnect. The architecture is shown to be capable of supporting both multimedia data streams and more traditional CPU cache line traffic. The advocated extension of the CPU cache which allows caching of multimedia data streams is shown to provide a natural programming abstraction and a mechanism for synchronising the processor with the stream.

A prototype DAN workstation has been built. Experiments have been done to demonstrate the features of the architecture. In particular the use of the DAN as a processor-to-memory interconnect is closely studied to show the practicality of using ATM for cache line traffic in a real machine. Simple demonstrations of the stream cache ideas are used to show its utility in future applications.

UCAM-CL-TR-320

Lawrence C. Paulson:

A fixedpoint approach to implementing (co)inductive definitions (updated version)
July 1995, 29 pages, PDF, DVI

Abstract: Several theorem provers provide commands for formalizing recursive datatypes or inductively defined sets. This paper presents a new approach, based on fixedpoint definitions. It is unusually general: it admits all monotone inductive definitions. It is conceptually simple, which has allowed the easy implementation of mutual recursion and other conveniences. It also handles coinductive definitions: simply replace the least fixedpoint by a greatest fixedpoint. This represents the first automated support for coinductive definitions.

The method has been implemented in Isabelle’s formalization of ZF set theory. It should be applicable to any logic in which the Knaster-Tarski Theorem can be proved. The paper briefly describes a method of formalizing non-well-founded data structures in standard ZF set theory.

Examples include lists of n elements, the accessible part of a relation and the set of primitive recursive functions. One example of a coinductive definition is bisimulations for lazy lists. Recursive datatypes are examined in detail, as well as one example of a “codatatype”: lazy lists. The appendices are simple user’s manuals for this Isabelle/ZF package.

UCAM-CL-TR-321

Andrew M. Pitts:

Relational properties of domains
December 1993, 38 pages, PostScript

Abstract: New tools are presented for reasoning about properties of recursively defined domains. We work within a general, category-theoretic framework for various notions of ‘relation’ on domains and for actions of domain constructors on relations. Freyd’s analysis of recursive types in terms of a property of mixed initiality/finality is transferred to a corresponding property of invariant relations. The existence of invariant relations is proved under completeness assumptions about the notion of relation. We show how this leads to simpler proofs of the computational adequacy of denotational semantics for functional programming languages with user-declared datatypes. We show how the initiality/finality property of invariant relations can be specialized to yield an induction principle for admissible subsets of recursively defined domains, generalizing the principle of structural induction for inductively defined sets. We also show how the initiality/finality property gives rise to the co-induction principle studied by the author (in UCAM-CL-TR-252), by which equalities between elements of recursively defined domains may be proved via an appropriate notion of ‘bisimulation’.

UCAM-CL-TR-322

Guangxing Li:

Supporting distributed realtime computing
December 1993, 113 pages, paper copy
PhD thesis (King’s College, August 1993)

UCAM-CL-TR-323

J. von Wright:

Representing higher-order logic proofs in HOL
January 1994, 28 pages, paper copy

UCAM-CL-TR-324

J. von Wright:

Verifying modular programs in HOL
January 1994, 25 pages, paper copy

UCAM-CL-TR-325

Richard Crouch:

The temporal properties of English conditionals and modals
January 1994, 248 pages, PDF
PhD thesis (April 1993)

Abstract: This thesis deals with the patterns of temporal reference exhibited by conditional and modal sentences in English, and specifically with the way that past and present tenses can undergo deictic shift in these contexts. This shifting behaviour has consequences both for the semantics of tense and for the semantics of conditionals and modality.

Asymmetries in the behaviour of the past and present tenses under deictic shift are explained by positing a primary and secondary deictic centre for tenses. The two deictic centres, the assertion time and the verification time, are given independent motivation through an information based view of tense. This holds that the tense system not only serves to describe the way that the world changes over time, but also the way that information about the world changes. Information change takes place in two stages. First, it is asserted that some fact holds. And then, either at the same time or later, it is verified that this assertion is correct.

Typically, assertion and verification occur simultaneously, and most sentences convey verified information. Modals and conditionals allow delayed assertion and verification. “If A, then B” means roughly: suppose you were now to assert A; if and when A is verified, you will be in a position to assert B, and in due course this assertion will also be verified. Since A and B will both be tensed clauses, the shifting of the primary and secondary deictic centres leads to shifted interpretations of the two clauses.

The thesis presents a range of temporal properties of indicative and subjunctive conditionals that have not previously been discussed, and shows how they can be explained. A logic is presented for indicative conditionals, based around an extension of intuitionistic logic to allow for both verified and unverified assertions. This logic naturally gives rise to three forms of epistemic modality, corresponding to “must”, “may” and “will”.


UCAM-CL-TR-326

Sai-Lai Lo:

A modular and extensible network storage architecture
January 1994, 147 pages, PostScript
PhD thesis (Darwin College, November 1993)

Abstract: Most contemporary distributed file systems are not designed to be extensible. This work asserts that the lack of extensibility is a problem because:

– New data types, such as continuous-medium data and structured data, are significantly different from conventional unstructured data, such as text and binary, that contemporary distributed file systems are built to support.

– Value-adding clients can provide functional enhancements, such as convenient and reliable persistent programming and automatic and transparent file indexing, but cannot be integrated smoothly with contemporary distributed file systems.

– New media technologies, such as the optical jukebox and RAID disk, can extend the scale and performance of a storage service but contemporary distributed file systems do not have a clear framework to incorporate these new technologies and to provide the necessary user level transparency.

Motivated by these observations, the new network storage architecture (MSSA) presented in this dissertation is designed to be extensible. Design modularity is taken as the key to achieving service extensibility. This dissertation examines a number of issues related to the design of the architecture. New ideas, such as a flexible access control mechanism based on temporary capabilities, a low level storage substrate that uses non-volatile memory to provide atomic update semantics at high performance, and a concept of sessions to differentiate performance requirements of different data types, are introduced. Prototype implementations of the key components are evaluated.

UCAM-CL-TR-327

Siani L. Baker:

A new application for explanation-based generalisation within automated deduction
February 1994, 18 pages, paper copy

UCAM-CL-TR-328

Paul Curzon:

The formal verification of the Fairisle ATM switching element: an overview
March 1994, 46 pages, paper copy

UCAM-CL-TR-329

Paul Curzon:

The formal verification of the Fairisle ATM switching element
March 1994, 105 pages, paper copy

UCAM-CL-TR-330

Pierre David Wellner:

Interacting with paper on the DigitalDesk
March 1994, 96 pages, PDF
PhD thesis (Clare Hall, October 1993)

Abstract: In the 1970’s Xerox PARC developed the “desktop metaphor,” which made computers easy to use by making them look and act like ordinary desks and paper. This led visionaries to predict the “paperless office” would dominate within a few years, but the trouble with this prediction is that people like paper too much. It is portable, tactile, universally accepted, and easier to read than a screen. Today, we continue to use paper, and computers produce more of it than they replace.

Instead of trying to use computers to replace paper, the DigitalDesk takes the opposite approach. It keeps the paper, but uses computers to make it more powerful. It provides a Computer Augmented Environment for paper.

The DigitalDesk is built around an ordinary physical desk and can be used as such, but it has extra capabilities. A video camera is mounted above the desk, pointing down at the work surface. This camera’s output is fed through a system that can detect where the user is pointing, and it can read documents that are placed on the desk. A computer-driven electronic projector is also mounted above the desk, allowing the system to project electronic objects onto the work surface and onto real paper documents — something that can’t be done with flat display panels or rear-projection. The system is called DigitalDesk because it allows pointing with the fingers.

Several applications have been prototyped on the DigitalDesk. The first was a calculator where a sheet of paper such as an annual report can be placed on the desk allowing the user to point at numbers with a finger or pen. The camera reads the numbers off the paper, recognizes them, and enters them into the display for further calculations. Another is a translation system which allows users to point at unfamiliar French words to get their English definitions projected down next to the paper. A third is a paper-based paint program (PaperPaint) that allows users to sketch on paper using traditional tools, but also be able to select and paste these sketches with the camera and projector to create merged paper and electronic documents. A fourth application is the DoubleDigitalDesk, which allows remote colleagues to “share” their desks, look at each other’s paper documents and sketch on them remotely.

This dissertation introduces the concept of Computer Augmented Environments, describes the DigitalDesk and applications for it, and discusses some of the key implementation issues that need to be addressed to make this system work. It describes a toolkit for building DigitalDesk applications, and it concludes with some more ideas for future work.

UCAM-CL-TR-331

Noha Adly, Akhil Kumar:

HPP: a hierarchical propagation protocol for large scale replication in wide area networks
March 1994, 24 pages, paper copy

UCAM-CL-TR-332

David Martin Evers:

Distributed computing with objects
March 1994, 154 pages, paper copy
PhD thesis (Queens’ College, September 1993)

UCAM-CL-TR-333

G.M. Bierman:

What is a categorical model of intuitionistic linear logic?
April 1994, 15 pages, paper copy

UCAM-CL-TR-334

Lawrence C. Paulson:

A concrete final coalgebra theorem for ZF set theory
May 1994, 21 pages, PDF, PostScript, DVI

Abstract: A special final coalgebra theorem, in the style of Aczel (1988), is proved within standard Zermelo-Fraenkel set theory. Aczel’s Anti-Foundation Axiom is replaced by a variant definition of function that admits non-well-founded constructions. Variant ordered pairs and tuples, of possibly infinite length, are special cases of variant functions. Analogues of Aczel’s Solution and Substitution Lemmas are proved in the style of Rutten and Turi (1993).

The approach is less general than Aczel’s; non-well-founded objects can be modelled only using the variant tuples and functions. But the treatment of non-well-founded objects is simple and concrete. The final coalgebra of a functor is its greatest fixedpoint. The theory is intended for machine implementation and a simple case of it is already implemented using the theorem prover Isabelle.

UCAM-CL-TR-335

G.J.F. Jones, J.T. Foote, K. Sparck Jones,S.J. Young:

Video mail retrieval using voice: report on keyword definition and data collection (deliverable report on VMR task No. 1)
April 1994, 38 pages, PDF

Abstract: This report describes the rationale, design, collection and basic statistics of the initial training and test database for the Cambridge Video Mail Retrieval (VMR) project. This database is intended to support both training for the wordspotting processes and testing for the document searching methods using these that are being developed for the project’s message retrieval task.

UCAM-CL-TR-336

Barnaby P. Hilken:

Towards a proof theory of rewriting: the simply-typed 2-λ calculus
May 1994, 28 pages, paper copy

UCAM-CL-TR-337

Richard John Boulton:

Efficiency in a fully-expansive theorem prover
May 1994, 126 pages, DVI
PhD thesis (Churchill College, December 1993)

Abstract: The HOL system is a fully-expansive theorem prover: Proofs generated in the system are composed of applications of the primitive inference rules of the underlying logic. This has two main advantages. First, the soundness of the system depends only on the implementations of the primitive rules. Second, users can be given the freedom to write their own proof procedures without the risk of making the system unsound. A full functional programming language is provided for this purpose. The disadvantage with the approach is that performance is compromised. This is partly due to the inherent cost of fully expanding a proof but, as demonstrated in this thesis, much of the observed inefficiency is due to the way the derived proof procedures are written.

This thesis seeks to identify sources of non-inherent inefficiency in the HOL system and proposes some general-purpose and some specialised techniques for eliminating it. One area that seems to be particularly amenable to optimisation is equational reasoning. This is significant because equational reasoning constitutes large portions of many proofs. A number of techniques are proposed that transparently optimise equational reasoning. Existing programs in the HOL system require little or no modification to work faster.

The other major contribution of this thesis is a framework in which part of the computation involved in HOL proofs can be postponed. This enables users to make better use of their time. The technique exploits a form of lazy evaluation. The critical feature is the separation of the code that generates the structure of a theorem from the code that justifies it logically. Delaying the justification allows some non-local optimisations to be performed in equational reasoning. None of the techniques sacrifice the security of the fully-expansive approach.

A decision procedure for a subset of the theory of linear arithmetic is used to illustrate many of the techniques. Decision procedures for this theory are commonplace in theorem provers due to the importance of arithmetic reasoning. The techniques described in the thesis have been implemented and execution times are given. The implementation of the arithmetic procedure is a major contribution in itself. For the first time, users of the HOL system are able to prove many arithmetic lemmas automatically in a practical amount of time (typically a second or two).

The applicability of the techniques to other fully-expansive theorem provers and possible extensions of the ideas are considered.

UCAM-CL-TR-338

Zhixue Wu:

A new approach to implementing atomic data types
May 1994, 170 pages, paper copy
PhD thesis (Trinity College, October 1993)

UCAM-CL-TR-339

Brian Logan, Steven Reece, Alison Cawsey,Julia Galliers, Karen Sparck Jones:

Belief revision and dialogue management in information retrieval
May 1994, 227 pages, PDF

Abstract: This report describes research to evaluate a theory of belief revision proposed by Galliers in the context of information-seeking interaction as modelled by Belkin, Brooks and Daniels and illustrated by user-librarian dialogues. The work covered the detailed assessment and development, and computational implementation and testing, of both the belief revision theory and the information retrieval model. Some features of the belief theory presented problems, and the original ‘multiple expert’ retrieval model had to be drastically modified to support rational dialogue management. But the experimental results showed that the characteristics of literature seeking interaction could be successfully captured by the belief theory, exploiting important elements of the retrieval model. Thus, though the system’s knowledge and dialogue performance were very limited, it provides a useful base for further research. The report presents all aspects of the research in detail, with particular emphasis on the implementation of belief and intention revision, and the integration of revision with domain reasoning and dialogue interaction.

UCAM-CL-TR-340

Eoin Andrew Hyden:

Operating system support for quality of service
June 1994, 102 pages, PDF
PhD thesis (Wolfson College, February 1994)

Abstract: The deployment of high speed, multiservice networks within the local area has meant that it has become possible to deliver continuous media data to a general purpose workstation. This, in conjunction with the increasing speed of modern microprocessors, means that it is now possible to write application programs which manipulate continuous media in real-time. Unfortunately, current operating systems do not provide the resource management facilities which are required to ensure the timely execution of such applications.

This dissertation presents a flexible resource management paradigm, based on the notion of Quality of Service, with which it is possible to provide the scheduling support required by continuous media applications. The mechanisms which are required within an operating system to support this paradigm are described, and the design and implementation of a prototypical kernel which implements them is presented.

It is shown that, by augmenting the interface between an application and the operating system, the application can be informed of varying resource availabilities, and can make use of this information to vary the quality of its results. In particular an example decoder application is presented, which makes use of such information and exploits some of the fundamental properties of continuous media data to trade video image quality for the amount of processor time which it receives.


UCAM-CL-TR-341

John Bates:

Presentation support for distributed multimedia applications
June 1994, 140 pages, PostScript

Abstract: Distributed computing environments can now support digital continuous media (such as audio and video) in addition to still media (such as text and pictures). The work presented in this dissertation is motivated by the desire of application developers to create applications which utilise these multimedia environments. Many important application areas are emerging such as Computer-Aided Instruction (CAI) and Computer-Supported Cooperative Working (CSCW).

Building multimedia applications is currently a difficult and time consuming process. At run-time, an application must manage connections to a range of heterogeneous services to access data. Building applications directly on top of environment specific features roots them to those features. Continuous media introduces new problems into application management such as control of Quality of Service (QoS) and synchronisation of data items. An application may also be required to analyse, process or display data. Some multimedia applications are event-driven, i.e. they must perform actions in response to asynchronous run-time occurrences. They may also be required to control many workspaces and involve multiple users.

The thesis of this dissertation is based on two principles. Firstly, despite the heterogeneity between and within multimedia environments, their functionality should be provided in a uniform way to application developers. By masking the control differences with generic abstractions, applications can easily be developed and ported. Secondly, it is possible to develop such abstractions to support a wide range of multimedia applications. Extensible and configurable facilities can be provided to access and present multimedia data and to support event-driven applications including cooperative ones.

The approach taken in this work is to provide a presentation support platform. To application developers this platform offers an authoring interface based on data modelling and specification using a script language. Using these facilities, the parts of an application involving interactive presentation of multimedia can be specified. Services have been built to support the run-time realisation of authored presentations on top of environments. Experiments show that a wide range of applications can be supported.

UCAM-CL-TR-342

Stephen Martin Guy Freeman:

An architecture for distributed user interfaces
July 1994, 127 pages, PDF
PhD thesis (Darwin College, 1994)

Abstract: Computing systems have changed rapidly since the first graphical user interfaces were developed. Hardware has become faster and software architectures have become more flexible and more open; a modern computing system consists of many communicating machines rather than a central host. Understanding of human-computer interaction has also become more sophisticated and places new demands on interactive software; these include, in particular, support for multi-user applications, continuous media, and ‘ubiquitous’ computing. The layer which binds user requirements and computing systems together, the user interface, has not changed as quickly; few user interface architectures can easily support the new requirements placed on them and few take advantage of the facilities offered by advanced computing systems.

Experience of implementing systems with unusual user interfaces has shown that current window system models are only a special case of possible user interface architectures. These window systems are too strongly tied to assumptions about how users and computers interact to provide a suitable platform for further evolution. Users and application builders may reasonably expect to be able to use multiple input and output devices as their needs arise. Experimental applications show that flexible user interface architectures, which support multiple devices and users, can be built without excessive implementation and processing costs.

This dissertation describes Gemma, a model for a new generation of interactive systems that are not confined to virtual terminals but allow collections of independent devices to be bound together for the task at hand. It provides mediated shared access to basic devices and higher-level virtual devices so that people can share computational facilities in the real world, rather than in a virtual world. An example window system shows how these features may be exploited to provide a flexible, collaborative and mobile interactive environment.

UCAM-CL-TR-344

Martin John Turner:

The contour tree image encoding technique and file format
July 1994, 154 pages, paper copy
PhD thesis (St John’s College, April 1994)

UCAM-CL-TR-345

Siani L. Baker:

A proof environment for arithmetic with the Omega rule
August 1994, 17 pages, paper copy


UCAM-CL-TR-346

G.M. Bierman:

On intuitionistic linear logic
August 1994, 191 pages, paper copy
PhD thesis (Wolfson College, December 1993)

Abstract: In this thesis we carry out a detailed study of the (propositional) intuitionistic fragment of Girard’s linear logic (ILL). Firstly we give sequent calculus, natural deduction and axiomatic formulations of ILL. In particular our natural deduction is different from others and has important properties, such as closure under substitution, which others lack. We also study the process of reduction in all three local formulations, including a detailed proof of cut elimination. Finally, we consider translations between Intuitionistic Logic (IL) and ILL.

We then consider the linear term calculus, which arises from applying the Curry-Howard correspondence to the natural deduction formulation. We show how the various proof theoretic formulations suggest reductions at the level of terms. The properties of strong normalization and confluence are proved for these reduction rules. We also consider mappings between the extended λ-calculus and the linear term calculus.

Next we consider a categorical model for ILL. We show how by considering the linear term calculus as an equational logic, we can derive a model: a linear category. We consider two alternative models: firstly, one due to Seely and then one due to Lafont. Surprisingly, we find that Seely’s model is not sound, in that equal terms are not modelled with equal morphisms. We show how after adapting Seely’s model (by viewing it in a more abstract setting) it becomes a particular instance of a linear category. We show how Lafont’s model can also be seen as another particular instance of a linear category. Finally we consider various categories of coalgebras, whose construction can be seen as a categorical equivalent of the translation of IL into ILL.

UCAM-CL-TR-347

Karen Sparck Jones:

Reflections on TREC
July 1994, 35 pages, PostScript, DVI

Abstract: This paper discusses the Text REtrieval Conferences (TREC) programme as a major enterprise in information retrieval research. It reviews its structure as an evaluation exercise, characterises the methods of indexing and retrieval being tested within it in terms of the approaches to system performance factors these represent; analyses the test results for solid, overall conclusions that can be drawn from them; and, in the light of the particular features of the test data, assesses TREC both for generally-applicable findings that emerge from it and for directions it offers for future research.

UCAM-CL-TR-348

Jane Louise Hunter:

Integrated sound synchronisation for computer animation
August 1994, 248 pages, paper copy
PhD thesis (Newnham College, August 1994)

UCAM-CL-TR-349

Brian Graham:

A HOL interpretation of Noden
September 1994, 78 pages, paper copy

UCAM-CL-TR-350

Jonathan P. Bowen, Michael G. Hinchey:

Ten commandments of formal methods
September 1994, 18 pages, paper copy

UCAM-CL-TR-351

Subir Kumar Biswas:

Handling realtime traffic in mobile networks
September 1994, 198 pages, PostScript
PhD thesis (Darwin College, August 1994)

Abstract: The rapidly advancing technology of cellular communication and wireless LAN makes ubiquitous computing feasible where the mobile users can have access to the location independent information and the computing resources. Multimedia networking is another emerging technological trend of the 1990s and there is an increasing demand for supporting continuous media traffic in a wireless personal communication environment. In order to guarantee the strict performance requirements of realtime traffic, the connection-oriented approaches are proving to be more efficient compared to the conventional datagram based networking. This dissertation deals with a network architecture and its design issues for implementing the connection-oriented services in a mobile radio environment.

The wired backbone of the proposed wireless LAN comprises high speed ATM switching elements, connected in a modular fashion, where the new switches and the user devices can be dynamically added and reconnected for maintaining a desired topology. A dynamic reconfiguration protocol, which can cope with these changing network topologies, is proposed for the present network architecture. The details about a prototype implementation of the protocol and a simulation model for its performance evaluation are presented.

CSMA/AED, a single frequency and carrier sensing based protocol, is proposed for the radio medium access operations. A simulation model is developed in order to investigate the feasibility of this statistical and reliable access scheme for the proposed radio network architecture. The effectiveness of a per-connection window based flow control mechanism, for the proposed radio LAN, is also investigated. A hybrid technique is used, where the medium access and the radio data-link layers are modelled using the mentioned simulator; an upper layer end-to-end queueing model, involving flow dependent servers, is solved using an approximate Mean Value Analysis technique which is augmented for faster iterative convergence.

A distributed location server, for managing mobile users’ location information and for aiding the mobile connection management tasks, is proposed. In order to hide the effects of mobility from the non-mobile network entities, the concept of a per-mobile software entity, known as a “representative”, is introduced. A mobile connection management scheme is also proposed for handling the end-to-end network layer connections in the present mobile environment. The scheme uses the representatives and a novel connection caching technique for providing the necessary realtime traffic support functionalities.

A prototype system, comprising the proposed location and connection managers, has been built for demonstrating the feasibility of the presented architecture for transporting continuous media traffic. A set of experiments has been carried out in order to investigate the impacts of various design decisions and to identify the performance-critical parts of the design.

UCAM-CL-TR-352

P.N. Benton:

A mixed linear and non-linear logic: proofs, terms and models
October 1994, 65 pages, paper copy

UCAM-CL-TR-353

Mike Gordon:

Merging HOL with set theory
November 1994, 40 pages, PDF

Abstract: Set theory is the standard foundation for mathematics, but the majority of general purpose mechanized proof assistants support versions of type theory (higher order logic). Examples include Alf, Automath, Coq, Ehdm, HOL, IMPS, Lambda, LEGO, Nuprl, PVS and Veritas. For many applications type theory works well and provides for specification the benefits of type-checking that are well known in programming. However, there are areas where types get in the way or seem unmotivated. Furthermore, most people with a scientific or engineering background already know set theory, whereas type theory may appear inaccessible and so be an obstacle to the uptake of proof assistants based on it. This paper describes some experiments (using HOL) in combining set theory and type theory; the aim is to get the best of both worlds in a single system. Three approaches have been tried, all based on an axiomatically specified type V of ZF-like sets: (i) HOL is used without any additions besides V; (ii) an embedding of the HOL logic into V is provided; (iii) HOL axiomatic theories are automatically translated into set-theoretic definitional theories. These approaches are illustrated with two examples: the construction of lists and a simple lemma in group theory.

UCAM-CL-TR-354

Sten Agerholm:

Formalising a model of the λ-calculus in HOL-ST
November 1994, 31 pages, PDF

Abstract: Many new theorem provers implement strong and complicated type theories which eliminate some of the limitations of simple type theories such as the HOL logic. A more accessible alternative might be to use a combination of set theory and simple type theory as in HOL-ST, which is a version of the HOL system supporting a ZF-like set theory in addition to higher order logic. This paper presents a case study on the use of HOL-ST to build a model of the λ-calculus by formalising the inverse limit construction of domain theory. This construction is not possible in the HOL system itself, or in simple type theories in general.

UCAM-CL-TR-355

David Wheeler, Roger Needham:

Two cryptographic notes
November 1994, 6 pages, PDF

Abstract: A large block DES-like algorithm
DES was designed to be slow in software. We give here a DES type of code which applies directly to single blocks comprising two or more words of 32 bits. It is thought to be at least as secure as performing DES separately on two word blocks, and has the added advantage of not requiring chaining etc. It is about 8m/(12+2m) times as fast as DES for an m word block and has a greater gain for Feistel codes where the number of rounds is greater. We use the name GDES for the codes we discuss. The principle can be used on any Feistel code.


TEA, a Tiny Encryption Algorithm
We design a short program which will run on most machines and encipher safely. It uses a large number of iterations rather than a complicated program. It is hoped that it can easily be translated into most languages in a compatible way. The first program is given below. It uses little set up time and does a weak nonlinear iteration enough rounds to make it secure. There are no preset tables or long set up times. It assumes 32 bit words.

UCAM-CL-TR-356

S.E. Robertson, K. Sparck Jones:

Simple, proven approaches to text retrieval
December 1994, 8 pages, PDF

Abstract: This technical note describes straightforward techniques for document indexing and retrieval that have been solidly established through extensive testing and are easy to apply. They are useful for many different types of text material, are viable for very large files, and have the advantage that they do not require special skills or training for searching, but are easy for end users.

UCAM-CL-TR-357

Jonathan P. Bowen, Michael G. Hinchey:

Seven more myths of formal methods
December 1994, 12 pages, paper copy

UCAM-CL-TR-358

Simon William Moore:

Multithreaded processor design
February 1995, 125 pages, paper copy
PhD thesis (Trinity Hall, October 1994)
This report was also published as a book of the same title (Kluwer/Springer-Verlag, 1996, ISBN 0-7923-9718-5).

UCAM-CL-TR-359

Jacob Frost:

A case study of co-induction in Isabelle
February 1995, 48 pages, PDF, PostScript, DVI

Abstract: The consistency of the dynamic and static semantics for a small functional programming language was informally proved by R. Milner and M. Tofte. The notions of co-inductive definitions and the associated principle of co-induction played a pivotal role in the proof. With emphasis on co-induction, the work presented here deals with the formalisation of this result in the generic theorem prover Isabelle.

UCAM-CL-TR-360

W.F. Clocksin:

On the calculation of explicit polymetres
March 1995, 12 pages, PDF

Abstract: Computer scientists take an interest in objects or events which can be counted, grouped, timed and synchronised. The computational problems involved with the interpretation and notation of musical rhythm are therefore of particular interest, as the most complex time-stamped structures yet devised by humankind are to be found in music notation. These problems are brought into focus when considering explicit polymetric notation, which is the concurrent use of different time signatures in music notation. While not in common use, the notation can be used to specify complicated cross-rhythms, simple versus compound metres, and unequal note values without the need for tuplet notation. From a computational point of view, explicit polymetric notation is a means of specifying synchronisation relationships amongst multiple time-stamped streams. Human readers of explicit polymetric notation use the time signatures together with the layout of barlines and musical events as clues to determine the performance. However, if the aim is to lay out the notation (such as might be required by an automatic music notation processor), the location of barlines and musical events will be unknown, and it is necessary to calculate them given only the information conveyed by the time signatures. Similar problems arise when trying to perform the notation (i.e. animate the specification) in real time. Some problems in the interpretation of explicit polymetric notation are identified and a solution is proposed. Two different interpretations are distinguished, and methods for their automatic calculation are given. The solution given may be applied to problems which involve the synchronisation or phase adjustment of multiple independent threads of time-stamped objects.

UCAM-CL-TR-361

Richard John Black:

Explicit network scheduling
April 1995, 121 pages, PostScript
PhD thesis (Churchill College, December 1994)


Abstract: This dissertation considers various problems associated with the scheduling and network I/O organisation found in conventional operating systems for effective support for multimedia applications which require Quality of Service.

A solution for these problems is proposed in a micro-kernel structure. The pivotal features of the proposed design are that the processing of device interrupts is performed by user-space processes which are scheduled by the system like any other, that events are used for both inter- and intra-process synchronisation, and the use of a specially developed high performance I/O buffer management system.

An evaluation of an experimental implementation is included. In addition to solving the scheduling and networking problems addressed, the prototype is shown to out-perform the Wanda system (a locally developed micro-kernel) on the same platform.

This dissertation concludes that it is possible to construct an operating system where the kernel provides only the fundamental job of fine grain sharing of the CPU between processes, and hence synchronisation between those processes. This enables processes to perform task specific optimisations; as a result system performance is enhanced, both with respect to throughput and the meeting of soft real-time guarantees.

UCAM-CL-TR-362

Mark Humphrys:

W-learning: competition among selfish Q-learners
April 1995, 30 pages, PostScript

Abstract: W-learning is a self-organising action-selection scheme for systems with multiple parallel goals, such as autonomous mobile robots. It uses ideas drawn from the subsumption architecture for mobile robots (Brooks), implementing them with the Q-learning algorithm from reinforcement learning (Watkins). Brooks explores the idea of multiple sensing-and-acting agents within a single robot, more than one of which is capable of controlling the robot on its own if allowed. I introduce a model where the agents are not only autonomous, but are in fact engaged in direct competition with each other for control of the robot. Interesting robots are ones where no agent achieves total victory, but rather the state-space is fragmented among different agents. Having the agents operate by Q-learning proves to be a way to implement this, leading to a local, incremental algorithm (W-learning) to resolve competition. I present a sketch proof that this algorithm converges when the world is a discrete, finite Markov decision process. For each state, competition is resolved with the most likely winner of the state being the agent that is most likely to suffer the most if it does not win. In this way, W-learning can be viewed as ‘fair’ resolution of competition. In the empirical section, I show how W-learning may be used to define spaces of agent-collections whose action selection is learnt rather than hand-designed. This is the kind of solution-space that may be searched with a genetic algorithm.
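The competition rule above can be sketched as a deterministic toy (hypothetical names, not the report's code; in W-learning proper the W values are learnt by Q-learning rather than computed from fixed Q-tables): each agent estimates how much it would lose if the strongest other agent acted instead, and the agent standing to lose the most takes control.

```python
# agents: list of dicts mapping action -> Q(state, action) for one state.

def best_action(q):
    return max(q, key=q.get)

def select_action(agents):
    ws = []
    for i, q in enumerate(agents):
        # Action taken if agent i does NOT win: the preferred action
        # of the strongest rival (highest peak Q among the others).
        rival = max((j for j in range(len(agents)) if j != i),
                    key=lambda j: max(agents[j].values()))
        a_rival = best_action(agents[rival])
        # W_i: how much agent i expects to suffer by not winning.
        ws.append(q[best_action(q)] - q.get(a_rival, 0.0))
    winner = max(range(len(agents)), key=ws.__getitem__)
    return best_action(agents[winner])
```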

UCAM-CL-TR-363

Ian Stark:

Names and higher-order functions
April 1995, 140 pages, PostScript, DVI
PhD thesis (Queens’ College, December 1994)

Abstract: Many functional programming languages rely on the elimination of ‘impure’ features: assignment to variables, exceptions and even input/output. But some of these are genuinely useful, and it is of real interest to establish how they can be reintroduced in a controlled way. This dissertation looks in detail at one example of this: the addition to a functional language of dynamically generated “names”. Names are created fresh, they can be compared with each other and passed around, but that is all. As a very basic example of “state”, they capture the graduation between private and public, local and global, by their interaction with higher-order functions.

The vehicle for this study is the “nu-calculus”, an extension of the simply-typed lambda-calculus. The nu-calculus is equivalent to a certain fragment of Standard ML, omitting side-effects, exceptions, datatypes and recursion. Even without all these features, the interaction of name creation with higher-order functions can be complex and subtle.

Various operational and denotational methods for reasoning about the nu-calculus are developed. These include a computational metalanguage in the style of Moggi, which distinguishes in the type system between values and computations. This leads to categorical models that use a strong monad, and examples are devised based on functor categories.

The idea of “logical relations” is used to derive powerful reasoning methods that capture some of the distinction between private and public names. These techniques are shown to be complete for establishing contextual equivalence between first-order expressions; they are also used to construct a correspondingly abstract categorical model.

All the work with the nu-calculus extends cleanly to Reduced ML, a larger language that introduces integer references: mutable storage cells that are dynamically allocated. It turns out that the step up is quite simple, and both the computational metalanguage and the sample categorical models can be reused.

UCAM-CL-TR-364

Ole Rasmussen:

The Church-Rosser theorem in Isabelle:


a proof porting experiment
April 1995, 27 pages, PostScript

Abstract: This paper describes a proof of the Church-Rosser theorem for the pure lambda-calculus formalised in the Isabelle theorem prover. The initial version of the proof is ported from a similar proof done in the Coq proof assistant by Gérard Huet, but a number of optimisations have been performed. The development involves the introduction of several inductive and recursive definitions and thus gives a good presentation of the inductive package of Isabelle.

UCAM-CL-TR-365

P.N. Benton, G.M. Bierman, V.C.V. de Paiva:

Computational types from a logical perspective I
May 1995, 19 pages, paper copy

UCAM-CL-TR-366

K. Sparck Jones, G.J.F. Jones, J.T. Foote,S.J. Young:

Retrieving spoken documents: VMR Project experiments
May 1995, 28 pages, paper copy

UCAM-CL-TR-367

Andrew M. Pitts:

Categorical logic
May 1995, 94 pages, PostScript, DVI

Abstract: This document provides an introduction to the interaction between category theory and mathematical logic which is slanted towards computer scientists. It will be a chapter in the forthcoming Volume VI of: S. Abramsky, D. M. Gabbay, and T. S. E. Maibaum (eds), “Handbook of Logic in Computer Science”, Oxford University Press.

UCAM-CL-TR-368

Burkhard Stiller:

CogPiT – configuration of protocols in TIP
June 1995, 73 pages, PostScript

Abstract: The variety of upcoming applications in terms of their performance and Quality-of-Service (QoS) requirements is increasing. Besides well-known applications, such as teleconferencing, audio- and video-transmission, even more contemporary ones, such as medical imaging, Video-on-Demand, and interactive tutoring systems, are being introduced and applied to existing networks. In contrast, traditional data-oriented applications, such as file transfer and remote login, are considerably different in terms of their QoS requirements. The consequences of this evolution therefore affect the architectures of end-systems, e.g., workstations that have to be capable of handling all different kinds of multi-media data, and of intermediate systems as well.

Therefore, a configuration approach for communication protocols has been developed to support this variety of applications. The approach offers the possibility to configure communication protocols automatically, depending on the application requirements expressed in various QoS parameters. The result, an application-tailored communication protocol, matches the requested application requirements as far as possible. Additionally, network and system resources (NSR) are taken into account for a well-suited configuration.

The Configuration of Protocols in TIP is called CogPiT and is part of the Transport and Internetworking Package (TIP). As an example, in the TIP environment the transport protocol TEMPO is used for configuration purposes.

UCAM-CL-TR-369

Sten Agerholm:

A comparison of HOL-ST and Isabelle/ZF
July 1995, 23 pages, PDF

Abstract: The use of higher order logic (simple type theory) is often limited by its restrictive type system. Set theory allows many constructions on sets that are not possible on types in higher order logic. This paper presents a comparison of two theorem provers supporting set theory, namely HOL-ST and Isabelle/ZF, based on a formalization of the inverse limit construction of domain theory; this construction cannot be formalized in higher order logic directly. We argue that whilst the combination of higher order logic and set theory in HOL-ST has advantages over the first order set theory in Isabelle/ZF, the proof infrastructure of Isabelle/ZF has better support for set theory proofs than HOL-ST. Proofs in Isabelle/ZF are both considerably shorter and easier to write.

UCAM-CL-TR-370


Sten Agerholm:

A package for non-primitive recursive function definitions in HOL
July 1995, 36 pages, paper copy

UCAM-CL-TR-371

Kim Ritter Wagner:

LIMINF convergence in Ω-categories
June 1995, 28 pages, paper copy

UCAM-CL-TR-372

Stefan G. Hild:

A brief history of mobile telephony
January 1995, 19 pages, PDF

Abstract: Mobile telephony has gone through a decade of tremendous change and progress. Today, mobile phones are an indispensable tool to many professionals, and have great potential to become vital components in mobile data communication applications. In this survey we will attempt to present some of the milestones from the route which mobile telephony has taken over the past decades while developing from an experimental system with limited capabilities to a mature technology (section 1), followed by a more detailed introduction to the modern pan-European GSM standard (section 2). Section 3 is devoted to the data communication services, covering two packet-oriented data-only networks as well as data services planned for the GSM system. Section 4 covers some security issues and section 5 gives an insight into the realities today with details of some networks available in the UK. Finally, section 6 concludes this overview with a brief look into the future.

UCAM-CL-TR-373

Benjamín Macías, Stephen G. Pulman:

Natural-language processing and requirements specifications
July 1995, 73 pages, paper copy

UCAM-CL-TR-374

Burkhard Stiller:

A framework for QoS updates in a networking environment
July 1995, PostScript

Abstract: The support of sufficient Quality-of-Service (QoS) for applications residing in a distributed environment and running on top of high performance networks is a demanding issue. Currently, the areas to provide this support adequately include communication protocols, operating systems support, and offered network services. A configurable approach to communication protocols offers the needed protocol flexibility to react accordingly to various different requirements.

Communication protocols and operating systems have to be parametrized using internal configuration parameters, such as window sizes, retry counters, or scheduling mechanisms, that rely closely on requested application-oriented or network-dependent QoS, such as bandwidth or delay. Moreover, these internal parameters have to be recalculated from time to time due to network changes (such as congestion or line breakdown) or due to application-specific alterations (such as enhanced bandwidth requirements or increased reliability) to adjust a temporary or semi-permanent “out-of-tune” service behavior.

Therefore, a rule-based evaluation and QoS updating framework for configuration parameters in a networking environment has been developed. The resulting “rulework” can be used within highly dynamic environments in a communication subsystem that offers the possibility to specify for every QoS parameter both a bounding interval of values and an average value. As an example, the framework has been integrated in the Function-based Communication Subsystem (F-CSS). Especially, an enhanced application service interface is offered, allowing for the specification of various QoS parameters that are used to configure a sufficient application-tailored communication protocol.

UCAM-CL-TR-375

Feng Huang:

Restructuring virtual memory to support distributed computing environments
July 1995, 135 pages, paper copy
PhD thesis (Clare Hall, July 1995)


UCAM-CL-TR-376

Timothy Roscoe:

The structure of a multi-service operating system
August 1995, 113 pages, PostScript
PhD thesis (Queens’ College, April 1995)

Abstract: Increases in processor speed and network bandwidth have led to workstations being used to process multimedia data in real time. These applications have requirements not met by existing operating systems, primarily in the area of resource control: there is a need to reserve resources, in particular the processor, at a fine granularity. Furthermore, guarantees need to be dynamically renegotiated to allow users to reassign resources when the machine is heavily loaded. There have been few attempts to provide the necessary facilities in traditional operating systems, and the internal structure of such systems makes the implementation of useful resource control difficult.

This dissertation presents a way of structuring an operating system to reduce crosstalk between applications sharing the machine, and enable useful resource guarantees to be made: instead of system services being located in the kernel or server processes, they are placed as much as possible in client protection domains and scheduled as part of the client, with communication between domains only occurring when necessary to enforce protection and concurrency control. This amounts to multiplexing the service at as low a level of abstraction as possible. A mechanism for sharing processor time between resources is also described. The prototype Nemesis operating system is used to demonstrate the ideas in use in a practical system, and to illustrate solutions to several implementation problems that arise.

Firstly, structuring tools in the form of typed interfaces within a single address space are used to reduce the complexity of the system from the programmer’s viewpoint and enable rich sharing of text and data between applications.

Secondly, a scheduler is presented which delivers useful Quality of Service guarantees to applications in a highly efficient manner. Integrated with the scheduler is an inter-domain communication system which has minimal impact on resource guarantees, and a method of decoupling hardware interrupts from the execution of device drivers.

Finally, a framework for high-level inter-domain and inter-machine communication is described, which goes beyond object-based RPC systems to permit both Quality of Service negotiation when a communication binding is established, and services to be implemented straddling protection domain boundaries as well as locally and in remote processes.

UCAM-CL-TR-377

Larry Paulson, Krzysztof Grabczewski:

Mechanising set theory: cardinal arithmetic and the axiom of choice
July 1995, 33 pages, PDF, PostScript

Abstract: Fairly deep results of Zermelo-Fraenkel (ZF) set theory have been mechanised using the proof assistant Isabelle. The results concern cardinal arithmetic and the Axiom of Choice (AC). A key result about cardinal multiplication is K*K=K, where K is any infinite cardinal. Proving this result required developing theories of orders, order-isomorphisms, order types, ordinal arithmetic, cardinals, etc.; this covers most of Kunen, Set Theory, Chapter I. Furthermore, we have proved the equivalence of 7 formulations of the Well-ordering Theorem and 20 formulations of AC; this covers the first two chapters of Rubin and Rubin, Equivalents of the Axiom of Choice. The definitions used in the proofs are largely faithful in style to the original mathematics.

UCAM-CL-TR-378

Noha Adly:

Performance evaluation of HARP: a hierarchical asynchronous replication protocol for large scale systems
August 1995, 94 pages, PostScript

Abstract: This report evaluates the performance of HARP, a hierarchical replication protocol based on nodes organised into a logical hierarchy. The scheme is based on communication with nearby replicas and scales well for thousands of replicas. It proposes a new service interface that provides different levels of asynchrony, allowing strong consistency and weak consistency to be integrated into the same framework. Further, it provides the ability to offer different levels of staleness, by querying from different levels of the hierarchy. We present results from a detailed simulation analysis evaluating the benefits and losses in performance resulting from using synchronous versus asynchronous operation within HARP under different system configurations and load mixes. Further, the performance is evaluated on different network topologies. An analytical solution based on the Open Queueing Network Model with Multiple Job Classes is carried out for the verification of the simulation model and the results are presented.


UCAM-CL-TR-379

Lawrence Paulson:

Proceedings of the First Isabelle Users Workshop
September 1995, 265 pages, paper copy

UCAM-CL-TR-380

Burkhard Stiller:

Quality-of-Service issues in networking environments
September 1995, 68 pages, PostScript

Abstract: Quality-of-Service (QoS) issues in networking environments cover various separate areas and topics. They include at least the specification of application requirements, the definition of network services, QoS models, resource reservation methods, negotiation and transformation methods for QoS, and operating system support for guaranteed services. An embracing approach for handling, dealing with, and supporting QoS in different scenarios and technical set-ups is required to manage forthcoming communication and networking tasks sufficiently. Modern telecommunication systems require an integrated architecture for applications, communication subsystems, and network perspectives to overcome drawbacks of traditional communication architectures, such as redundant protocol functionality, weakly designed interfaces between the end-system and a network adapter, or the impossibility of specifying and guaranteeing QoS parameters.

This work contains the discussion of a number of interconnected QoS issues, e.g., QoS mapping, QoS negotiation, QoS-based configuration of communication protocols, or QoS aspects in Asynchronous Transfer Mode (ATM) signaling protocols, which have been dealt with during a one-year research fellowship. This report is not intended to be a complete description of every technical detail, but tries to provide a brief overall picture of the emerging and explosively developing QoS issues in telecommunication systems. Additionally, some of these issues are investigated in closer detail. It is mainly focussed on QoS mapping, negotiation, and updating in the communication protocol area.

UCAM-CL-TR-381

Uwe Michael Nimscheck:

Rendering for free form deformations
October 1995, 151 pages, paper copy
PhD thesis (Wolfson College)

UCAM-CL-TR-382

Oliver M. Castle:

Synthetic image generation for a multiple-view autostereo display
October 1995, 184 pages, paper copy
PhD thesis (Wolfson College, April 1995)

UCAM-CL-TR-383

Noha Adly:

Management of replicated data in large scale systems
November 1995, 182 pages, paper copy
PhD thesis (Corpus Christi College, August 1995)

UCAM-CL-TR-384

Shaw-Cheng Chuang:

Securing ATM networks
January 1995, 30 pages, PostScript

Abstract: This is an interim report on the investigations into securing Asynchronous Transfer Mode (ATM) networks. We look at the challenge of providing such a secure ATM network and identify the important issues in achieving such a goal. In this paper, we discuss the issues and problems involved and outline some techniques for solving these problems. The network environment is first examined and we also consider the correct placement of security mechanisms in such an environment. Following the analysis of the security requirements, we introduce and describe a key agile cryptographic device for ATM. The protection of the ATM data plane is extremely important to provide data confidentiality and data integrity. Techniques for providing synchronisation, dynamic key change, dynamic initialisation vector change and Message Authentication Codes on ATM data are also considered. Next, we discuss the corresponding control functions. A few key exchange protocols are given as possible candidates for the establishment of the session key. The impact of such key exchange protocols on the design of an ATM signalling protocol has also been examined, and security extensions to an existing signalling protocol are discussed. We also talk about securing other control plane functions such as NNI routing, Inter-Domain Policy Routing, authorisation and auditing, firewalls and intrusion detection, and Byzantine robustness. Management plane functions are also examined, with discussions on bootstrapping, authenticated neighbour discovery, ILMI security, PVC security, VPI security and the ATM Forum management model.


UCAM-CL-TR-385

Sanjay Saraswat:

Performance evaluation of the Delphi machine
December 1995, 187 pages, paper copy
PhD thesis (St Edmund’s College, October 1995)

UCAM-CL-TR-386

Andrew D. Gordon, Gareth D. Rees:

Bisimilarity for a first-order calculus of objects with subtyping
January 1996, 78 pages, paper copy

UCAM-CL-TR-387

Scarlet Schwiderski, Andrew Herbert,Ken Moody:

Monitoring composite events in distributed systems
February 1996, 20 pages, paper copy

UCAM-CL-TR-388

P.N. Benton:

A unified approach to strictness analysis and optimising transformations
February 1996, 21 pages, paper copy

UCAM-CL-TR-389

Wai Wong:

A proof checker for HOL
March 1996, 165 pages, paper copy

UCAM-CL-TR-390

Richard J. Boulton:

Syn: a single language for specifying abstract syntax trees, lexical analysis, parsing and pretty-printing
March 1996, 25 pages, PostScript, DVI

Abstract: A language called Syn is described in which all aspects of context-free syntax can be specified without redundancy. The language is essentially an extended BNF grammar. Unusual features include high-level constructs for specifying lexical aspects of a language and specification of precedence by textual order. A system has been implemented for generating lexers, parsers, pretty-printers and abstract syntax tree representations from a Syn specification.

UCAM-CL-TR-391

Andrew John Kennedy:

Programming languages and dimensions
April 1996, 149 pages, PDF
PhD thesis (St Catherine’s College, November 1995)

Abstract: Scientists and engineers must ensure that the equations and formulae which they use are dimensionally consistent, but existing programming languages treat all numeric values as dimensionless. This thesis investigates the extension of programming languages to support the notion of physical dimension.

A type system is presented similar to that of the programming language ML but extended with polymorphic dimension types. An algorithm which infers most general dimension types automatically is then described and proved correct.

The semantics of the language is given by a translation into an explicitly-typed language in which dimensions are passed as arguments to functions. The operational semantics of this language is specified in the usual way by an evaluation relation defined by a set of rules. This is used to show that if a program is well-typed then no dimension errors can occur during its evaluation.

More abstract properties of the language are investigated using a denotational semantics: these include a notion of invariance under changes in the units of measure used, analogous to parametricity in the polymorphic lambda calculus. Finally the dissertation is summarised and many possible directions for future research in dimension types and related type systems are described.
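The underlying algebra — a dimension is a vector of integer exponents over base dimensions, multiplication adds exponents, and addition is only legal at equal dimensions — can be illustrated with a small dynamic checker (a hypothetical sketch; the thesis enforces this statically in an ML-style type system):

```python
class Quantity:
    """A number tagged with a dimension: a map from base dimensions
    (e.g. 'm', 's') to non-zero integer exponents."""
    def __init__(self, value, dims=None):
        self.value = value
        self.dims = {d: e for d, e in (dims or {}).items() if e}

    def __add__(self, other):
        if self.dims != other.dims:
            raise TypeError("dimension mismatch")  # e.g. metres + seconds
        return Quantity(self.value + other.value, self.dims)

    def _merge(self, other, sign):
        # Multiplication adds exponent vectors; division subtracts them.
        dims = dict(self.dims)
        for d, e in other.dims.items():
            dims[d] = dims.get(d, 0) + sign * e
        return dims

    def __mul__(self, other):
        return Quantity(self.value * other.value, self._merge(other, +1))

    def __truediv__(self, other):
        return Quantity(self.value / other.value, self._merge(other, -1))

# speed = distance / time carries dimension m * s^-1
speed = Quantity(10.0, {"m": 1}) / Quantity(2.0, {"s": 1})
```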


UCAM-CL-TR-392

Uwe Nestmann, Benjamin C. Pierce:

Decoding choice encodings
April 1996, 54 pages, paper copy

UCAM-CL-TR-393

Simon Andrew Crosby:

Performance management in ATM networks
April 1996, 215 pages, PostScript
PhD thesis (St John’s College, May 1995)

Abstract: The Asynchronous Transfer Mode (ATM) has been identified as the technology of choice amongst high speed communication networks for its potential to integrate services with disparate resource needs and timing constraints. Before it can successfully deliver integrated services, however, significant problems remain to be solved. They centre around two major issues. First, there is a need for a simple, powerful network service interface capable of meeting the communications needs of new applications. Second, within the network there is a need to dynamically control a mix of diverse traffic types to ensure that they meet their performance criteria.

Addressing the first concern, this dissertation argues that a simple network control interface offers significant advantages over the traditional, heavyweight approach of the telecommunications industry. A network control architecture based on a distributed systems approach is presented which locates both the network control functions and its services outside the network. The network service interface uses the Remote Procedure Call (RPC) paradigm and enables more complicated service offerings to be built from the basic primitives. A formal specification and verification of the user-network signalling protocol is presented. Implementations of the architecture, both on Unix and the Wanda micro-kernel, used on the Fairisle ATM switch, are described. The implementations demonstrate the feasibility of the architecture, and feature a high degree of experimental flexibility. This is exploited in the balance of the dissertation, which presents the results of a practical study of network performance under a range of dynamic control mechanisms.

Addressing the second concern, results are presented from a study of the cell delay variation suffered by ATM connections when multiplexed with real ATM traffic in an uncontrolled network, and from an investigation of the expansion of bursts of ATM traffic as a result of multiplexing. The results are compared with those of analytical models. Finally, results from a study of the performance delivered to delay-sensitive traffic by priority and rate based cell scheduling algorithms, and the loss experienced by different types of traffic under several buffer allocation strategies, are presented.

UCAM-CL-TR-394

Lawrence C. Paulson:

A simple formalization and proof for the mutilated chess board
April 1996, 11 pages, PDF, PostScript, DVI

Abstract: The impossibility of tiling the mutilated chess board has been formalized and verified using Isabelle. The formalization is concise because it is expressed using inductive definitions. The proofs are straightforward except for some lemmas concerning finite cardinalities. This exercise is an object lesson in choosing a good formalization that is applicable in a variety of domains.
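The fact being formalized rests on the classic colouring argument, which can be checked directly (a sketch of the argument, not of the Isabelle proof): every domino covers one square of each colour, but removing two opposite corners removes two squares of the same colour, so the counts cannot match.

```python
def colour_counts(removed):
    """Count (black, white) squares of an 8x8 board minus `removed`."""
    black = white = 0
    for r in range(8):
        for c in range(8):
            if (r, c) in removed:
                continue
            if (r + c) % 2 == 0:
                black += 1
            else:
                white += 1
    return black, white

# Opposite corners (0,0) and (7,7) are both the same colour, so the
# mutilated board has 30 squares of one colour and 32 of the other:
# no set of 31 dominoes, each covering one of each, can tile it.
print(colour_counts({(0, 0), (7, 7)}))  # (30, 32)
```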

UCAM-CL-TR-395

Torben Brauner, Valeria de Paiva:

Cut-elimination for full intuitionisticlinear logicMay 1996, 27 pages, paper copy

UCAM-CL-TR-396

Lawrence C. Paulson:

Generic automatic proof tools
May 1996, 28 pages, PDF, PostScript

Abstract: This paper explores a synthesis between two distinct traditions in automated reasoning: resolution and interaction. In particular it discusses Isabelle, an interactive theorem prover based upon a form of resolution. It aims to demonstrate the value of proof tools that, compared with traditional resolution systems, seem absurdly limited. Isabelle’s classical reasoner searches for proofs using a tableau approach. The reasoner is generic: it accepts rules proved in applied theories, involving defined connectives. New constants are not reduced to first-order logic; the reasoner

UCAM-CL-TR-397

Borut Robic:

Optimal routing in 2-jump circulant networks
June 1996, 7 pages, PostScript

Abstract: An algorithm for routing a message along the shortest path between a pair of processors in a 2-jump circulant (undirected double fixed step) network is given. The algorithm requires O(d) time for preprocessing, and l = O(d) routing steps, where l is the distance between the processors and d is the diameter of the network.
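For reference only, the routing problem itself is easy to state as code. This brute-force BFS router (our sketch, not the paper's O(d) algorithm) finds a shortest jump sequence in the 2-jump circulant network with n nodes and jumps ±s1, ±s2 (mod n):

```python
from collections import deque

def route(n, s1, s2, src, dst):
    """Shortest sequence of jumps (each +-s1 or +-s2, mod n) from src
    to dst.  Brute-force BFS over all n nodes, for checking only;
    assumes dst is reachable from src."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for step in (s1, -s1, s2, -s2):
            v = (u + step) % n
            if v not in parent:
                parent[v] = (u, step)
                queue.append(v)
    steps = []
    v = dst
    while parent[v] is not None:
        u, step = parent[v]
        steps.append(step)
        v = u
    return list(reversed(steps))
```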


UCAM-CL-TR-398

N.A. Dodgson, J.R. Moore:

Design and implementation of an autostereoscopic camera system
June 1996, 20 pages, PDF

Abstract: An autostereoscopic display provides the viewer with a three-dimensional image without the need for special glasses, and allows the user to look around objects in the image by moving the head left-right. The time-multiplexed autostereo display developed at the University of Cambridge has been in operation since late 1991.

An autostereoscopic camera system has been designed and implemented. It is capable of taking video input from up to sixteen cameras, and multiplexing these into a video output stream with a pixel rate an order of magnitude faster than the individual input streams. Testing of the system with eight cameras and a Cambridge Autostereo Display has produced excellent live autostereoscopic video.

This report describes the design of this camera system, which has been successfully implemented and demonstrated. Problems which arose during this process are discussed, and a comparison with similar systems is made.

UCAM-CL-TR-399

Richard Hayton:

OASIS: An open architecture for secure interworking services
June 1996, 102 pages, PDF
PhD thesis (Fitzwilliam College, March 1996)

Abstract: An emerging requirement is for applications and distributed services to cooperate or inter-operate. Mechanisms have been devised to hide the heterogeneity of the host operating systems and abstract the issues of distribution and object location. However, in order for systems to inter-operate securely there must also be mechanisms to hide differences in security policy, or at least negotiate between them.

This would suggest that a uniform model of access control is required. Such a model must be extremely flexible with respect to the specification of policy, as different applications have radically different needs. In a widely distributed environment this situation is exacerbated by the differing requirements of different organisations, and in an open environment there is a need to interwork with organisations using alternative security mechanisms.

Other proposals for the interworking of security mechanisms have concentrated on the enforcement of access policy, and neglected the concerns of freedom of expression of this policy. For example it is common to associate each request with a user identity, and to use this as the only parameter when performing access control. This work describes an architectural approach to security. By reconsidering the role of the client and the server, we may reformulate access control issues in terms of client naming.

We think of a client as obtaining a name issued by a service; either based on credentials already held by the client, or by delegation from another client. A grammar has been devised that allows the conditions under which a client may assume a name to be specified, and the conditions under which use of the name will be revoked. This allows complex security policies to be specified that define how clients of a service may interact with each other (through election, delegation and revocation), how clients interact with a service (by invoking operations or receiving events) and how clients and services may inter-operate. (For example, a client of a Login service may become a client of a file service.)

This approach allows great flexibility when integrating a number of services, and reduces the mismatch of policies common in heterogeneous systems. A flexible security definition is meaningless if not backed by a robust and efficient implementation. In this thesis we present a systems architecture that can be implemented efficiently, but that allows individual services to ‘fine tune’ the trade-offs between security, efficiency and freedom of policy expression. The architecture is inherently distributed and scalable, and includes mechanisms for rapid and selective revocation of privileges which may cascade between services and organisations.

UCAM-CL-TR-400

Scarlet Schwiderski:

Monitoring the behaviour of distributed systems
July 1996, 161 pages, PDF
PhD thesis (Selwyn College, April 1996)

Abstract: Monitoring the behaviour of computing systems is an important task. In active database systems, a detected system behaviour leads to the triggering of an ECA (event-condition-action) rule. ECA rules are employed for supporting database management system functions as well as external applications. Although distributed database systems are becoming more commonplace, active database research has to date focussed on centralised systems. In distributed debugging systems, a detected system behaviour is compared with the expected system behaviour. Differences illustrate erroneous behaviour. In both application areas, system behaviours are specified in terms of events: primitive events represent elementary occurrences and composite events represent complex occurrence patterns. At system runtime, specified primitive and composite events are monitored and event occurrences are detected. However, in active database systems events are monitored in terms of physical time and in distributed debugging systems events are monitored in terms of logical time. The notion of physical time is difficult in distributed systems because of their special characteristics: no global time, network delays, etc.

This dissertation is concerned with monitoring the behaviour of distributed systems in terms of physical time, i.e. the syntax, the semantics, the detection, and the implementation of events are considered.

The syntax of primitive and composite events is derived from the work of both active database systems and distributed debugging systems; differences and necessities are highlighted.

The semantics of primitive and composite events establishes when and where an event occurs; the semantics depends largely on the notion of physical time in distributed systems. Based on the model for an approximated global time base, the ordering of events in distributed systems is considered, and the structure and handling of timestamps are illustrated. In specific applications, a simplified version of the semantics can be applied which is easier and therefore more efficient to implement.

Algorithms for the detection of composite events at system runtime are developed; event detectors are distributed to arbitrary sites and composite events are evaluated concurrently. Two different evaluation policies are examined: asynchronous evaluation and synchronous evaluation. Asynchronous evaluation is characterised by the ad hoc consumption of signalled event occurrences. However, since the signalling of events involves variable delays, the events may not be evaluated in the system-wide order of their occurrence. On the other hand, synchronous evaluation enforces events to be evaluated in the system-wide order of their occurrence. But, due to site failures and network congestion, the evaluation may block on a fairly long-term basis.

The prototype implementation realises the algorithms for the detection of composite events with both asynchronous and synchronous evaluation. For the purpose of testing, primitive event occurrences are simulated by distributed event simulators. Several tests are performed illustrating the differences between asynchronous and synchronous evaluation: the first is ‘fast and unreliable’ whereas the latter is ‘slow and reliable’.

UCAM-CL-TR-401

Gavin Bierman:

A classical linear λ-calculus
July 1996, 41 pages, paper copy

UCAM-CL-TR-402

G.J.F. Jones, J.T. Foote, K. Sparck Jones,S.J. Young:

Video mail retrieval using voice: report on collection of naturalistic requests and relevance assessments
September 1996, 21 pages, paper copy

UCAM-CL-TR-403

Paul Ronald Barham:

Devices in a multi-service operating system
October 1996, 131 pages, PostScript, DVI
PhD thesis (Churchill College, June 1996)

Abstract: Increases in processor speed and network and device bandwidth have led to general purpose workstations being called upon to process continuous media data in real time. Conventional operating systems are unable to cope with the high loads and strict timing constraints introduced when such applications form part of a multi-tasking workload. There is a need for the operating system to provide fine-grained reservation of processor, memory and I/O resources and the ability to redistribute these resources dynamically. A small group of operating systems researchers have recently proposed a “vertically-structured” architecture where the operating system kernel provides minimal functionality and the majority of operating system code executes within the application itself. This structure greatly simplifies the task of accounting for processor usage by applications. The prototype Nemesis operating system embodies these principles and is used as the platform for this work.

This dissertation extends the provision of Quality of Service guarantees to the I/O system by presenting an architecture for device drivers which minimises crosstalk between applications. This is achieved by clearly separating the data-path operations, which require careful accounting and scheduling, and the infrequent control-path operations, which require protection and concurrency control. The approach taken is to abstract and multiplex the I/O data-path at the lowest level possible so as to simplify accounting, policing and scheduling of I/O resources and enable application-specific use of I/O devices.

The architecture is applied to several representative classes of device including network interfaces, network connected peripherals, disk drives and framestores. Of these, disks and framestores are of particular interest since they must be shared at a very fine granularity but have traditionally been presented to the application via a window system or file-system with a high-level and coarse-grained interface.

A device driver for the framestore is presented which abstracts the device at a low level and is therefore able to provide each client with guaranteed bandwidth to the framebuffer. The design and implementation of a novel client-rendering window system is then presented which uses this driver to enable rendering code to be safely migrated into a shared library within the client.

A low-level abstraction of a standard disk drive is also described which efficiently supports a wide variety of file systems and other applications requiring persistent storage, whilst providing guaranteed rates of I/O to individual clients. An extent-based file system is presented which can provide guaranteed-rate file access and enables clients to optimise for application-specific access patterns.

UCAM-CL-TR-404

Kam Hong Shum:

Adaptive parallelism for computing on heterogeneous clusters
November 1996, 147 pages, paper copy
PhD thesis (Darwin College, August 1996)

UCAM-CL-TR-405

Richard J. Boulton:

A tool to support formal reasoning about computer languages
November 1996, 21 pages, PostScript, DVI

Abstract: A tool to support formal reasoning about computer languages and specific language texts is described. The intention is to provide a tool that can build a formal reasoning system in a mechanical theorem prover from two specifications, one for the syntax of the language and one for the semantics. A parser, pretty-printer and internal representations are generated from the former. Logical representations of syntax and semantics, and associated theorem proving tools, are generated from the combination of the two specifications. The main aim is to eliminate tedious work from the task of prototyping a reasoning tool for a computer language, but the abstract specifications of the language also assist the automation of proof.

UCAM-CL-TR-406

Lawrence C. Paulson:

Tool support for logics of programs
November 1996, 31 pages, PDF, PostScript

Abstract: Proof tools must be well designed if they are to be more effective than pen and paper. Isabelle supports a range of formalisms, two of which are described (higher-order logic and set theory). Isabelle’s representation of logic is influenced by logic programming: its “logical variables” can be used to implement step-wise refinement. Its automatic proof procedures are based on search primitives that are directly available to users. While emphasizing basic concepts, the article also discusses applications such as an approach to the analysis of security protocols.

UCAM-CL-TR-407

Sebastian Schoenberg:

The L4 microkernel on Alpha
Design and implementation
September 1996, 51 pages, PostScript

Abstract: The purpose of a microkernel is to cover the lowest level of the hardware and to provide a more general platform to operating systems and applications than the hardware itself. This has made microkernel development increasingly interesting. Different types of microkernels have been developed, ranging from kernels which merely deal with the hardware interface (Windows NT HAL), kernels especially for embedded systems (RTEMS), to kernels for multimedia streams and real time support (Nemesis) and general purpose kernels (L4, Mach).

The common opinion that microkernels lead to deterioration in system performance has been disproved by recent research. L4 is an example of a fast and small, multi-address-space, message-based microkernel, developed originally for Intel systems only. Based on the L4 interface, which should be as similar as possible on different platforms, the L4 Alpha version has been developed.

This work describes design decisions, implementation and interfaces of the L4 version for 64-bit Alpha processors.

UCAM-CL-TR-408

John Robert Harrison:

Theorem proving with the real numbers
November 1996, 147 pages, PostScript, DVI
PhD thesis (Churchill College, June 1996)

Abstract: This thesis discusses the use of the real numbers in theorem proving. Typically, theorem provers only support a few ‘discrete’ datatypes such as the natural numbers. However the availability of the real numbers opens up many interesting and important application areas, such as the verification of floating point hardware and hybrid systems. It also allows the formalization of many more branches of classical mathematics, which is particularly relevant for attempts to inject more rigour into computer algebra systems.

Our work is conducted in a version of the HOL theorem prover. We describe the rigorous definitional construction of the real numbers, using a new version of Cantor’s method, and the formalization of a significant portion of real analysis. We also describe an advanced derived decision procedure for the ‘Tarski subset’ of real algebra as well as some more modest but practically useful tools for automating explicit calculations and routine linear arithmetic reasoning.


Finally, we consider in more detail two interesting application areas. We discuss the desirability of combining the rigour of theorem provers with the power and convenience of computer algebra systems, and explain a method we have used in practice to achieve this. We then move on to the verification of floating point hardware. After a careful discussion of possible correctness specifications, we report on two case studies, one involving a transcendental function.

We aim to show that a theory of real numbers is useful in practice and interesting in theory, and that the ‘LCF style’ of theorem proving is well suited to the kind of work we describe. We hope also to convince the reader that the kind of mathematics needed for applications is well within the abilities of current theorem proving technology.

UCAM-CL-TR-409

Lawrence C. Paulson:

Proving properties of security protocols by induction
December 1996, 24 pages, PDF, PostScript, DVI

Abstract: Security protocols are formally specified in terms of traces, which may involve many interleaved protocol runs. Traces are defined inductively. Protocol descriptions model accidental key losses as well as attacks. The model spy can send spoof messages made up of components decrypted from previous traffic.

Correctness properties are verified using the proof tool Isabelle/HOL. Several symmetric-key protocols have been studied, including Needham-Schroeder, Yahalom and Otway-Rees. A new attack has been discovered in a variant of Otway-Rees (already broken by Mao and Boyd). Assertions concerning secrecy and authenticity have been proved.

The approach rests on a common theory of messages, with three operators. The operator “parts” denotes the components of a set of messages. The operator “analz” denotes those parts that can be decrypted with known keys. The operator “synth” denotes those messages that can be expressed in terms of given components. The three operators enjoy many algebraic laws that are invaluable in proofs.
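The three operators lend themselves to a small executable illustration. The following Python sketch is a toy model only: the tuple encoding of messages and the symmetric-key decryption rule are assumptions made for this example, not Paulson’s Isabelle definitions. It computes “parts” and “analz” as closure operations and tests membership of “synth”:

```python
# Toy message datatype: ('atom', x) | ('pair', m1, m2) | ('crypt', key_name, m)

def parts(H):
    """Close H under taking components of pairs and bodies of encryptions."""
    result, changed = set(H), True
    while changed:
        changed = False
        for m in list(result):
            if m[0] == 'pair':
                new = {m[1], m[2]}
            elif m[0] == 'crypt':
                new = {m[2]}
            else:
                new = set()
            if not new <= result:
                result |= new
                changed = True
    return result

def analz(H):
    """Like parts, but an encrypted body is exposed only if its key is known."""
    result, changed = set(H), True
    while changed:
        changed = False
        for m in list(result):
            if m[0] == 'pair':
                new = {m[1], m[2]}
            elif m[0] == 'crypt' and ('atom', m[1]) in result:
                new = {m[2]}
            else:
                new = set()
            if not new <= result:
                result |= new
                changed = True
    return result

def synthesizable(m, H):
    """Membership test for synth: can m be built from H by pairing and
    encryption with known keys? (synth itself is an infinite set.)"""
    if m in H:
        return True
    if m[0] == 'pair':
        return synthesizable(m[1], H) and synthesizable(m[2], H)
    if m[0] == 'crypt':
        return ('atom', m[1]) in H and synthesizable(m[2], H)
    return False
```

For instance, parts of {Crypt K Na} contains Na, but analz reveals Na only once K is also in the set; distinctions of this kind are what the algebraic laws capture.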

UCAM-CL-TR-410

John Harrison:

Proof style
January 1997, 22 pages, PostScript, DVI

Abstract: We are concerned with how to communicate a mathematical proof to a computer theorem prover. This can be done in many ways, while allowing the machine to generate a completely formal proof object. The most obvious choice is the amount of guidance required from the user, or from the machine perspective, the degree of automation provided. But another important consideration, which we consider particularly significant, is the bias towards a ‘procedural’ or ‘declarative’ proof style. We will explore this choice in depth, and discuss the strengths and weaknesses of declarative and procedural styles for proofs in pure mathematics and for verification applications. We conclude with a brief summary of our own experiments in trying to combine both approaches.

UCAM-CL-TR-411

Monica Nesi:

Formalising process calculi in Higher Order Logic
January 1997, 182 pages, paper copy
PhD thesis (Girton College, April 1996)

UCAM-CL-TR-412

G.M. Bierman:

Observations on a linear PCF (preliminary report)
January 1997, 30 pages, paper copy

UCAM-CL-TR-413

Lawrence C. Paulson:

Mechanized proofs of security protocols: Needham-Schroeder with public keys
January 1997, 20 pages, PDF, PostScript, DVI

Abstract: The inductive approach to verifying security protocols, previously applied to shared-key encryption, is here applied to the public-key version of the Needham-Schroeder protocol. As before, mechanized proofs are performed using Isabelle/HOL. Both the original, flawed version and Lowe’s improved version are studied; the properties proved highlight the distinctions between the two versions. The results are compared with previous analyses of the same protocol. The analysis reported below required only 30 hours of the author’s time. The proof scripts execute in under three minutes.


UCAM-CL-TR-414

Martín Abadi, Andrew D. Gordon:

A calculus for cryptographic protocols
The SPI calculus
January 1997, 105 pages, PostScript

Abstract: We introduce the spi calculus, an extension of the pi calculus designed for the description and analysis of cryptographic protocols. We show how to use the spi calculus, particularly for studying authentication protocols. The pi calculus (without extension) suffices for some abstract protocols; the spi calculus enables us to consider cryptographic issues in more detail. We represent protocols as processes in the spi calculus and state their security properties in terms of coarse-grained notions of protocol equivalence.

UCAM-CL-TR-415

Steven Leslie Pope:

Application support for mobile computing
February 1997, 145 pages, paper copy
PhD thesis (Jesus College, October 1996)

UCAM-CL-TR-416

Donald Syme:

DECLARE: a prototype declarative proof system for higher order logic
February 1997, 25 pages, paper copy

UCAM-CL-TR-417

Peter J.C. Brown:

Selective mesh refinement for interactive terrain rendering
February 1997, 18 pages, paper copy

Abstract: Terrain surfaces are often approximated by geometric meshes to permit efficient rendering. This paper describes how the complexity of an approximating irregular mesh can be varied across its domain in order to minimise the number of displayed facets while ensuring that the rendered surface meets pre-determined resolution requirements. We first present a generalised scheme to represent a mesh over a continuous range of resolutions using the output from conventional single-resolution approximation methods. We then describe an algorithm which extracts a surface from this representation such that the resolution of the surface is enhanced only in specific areas of interest. We prove that the extracted surface is complete, minimal, satisfies the given resolution constraints and meets the Delaunay triangulation criterion if possible. In addition, we present a method of performing smooth visual transitions between selectively-refined meshes to permit efficient animation of a terrain scene.

An HTML version of this report is at
https://www.cl.cam.ac.uk/research/rainbow/publications/pjcb/tr417/

UCAM-CL-TR-418

Lawrence C. Paulson:

Mechanized proofs for a recursive authentication protocol
March 1997, 30 pages, PDF, PostScript

Abstract: A novel protocol has been formally analyzed using the prover Isabelle/HOL, following the inductive approach described in earlier work. There is no limit on the length of a run, the nesting of messages or the number of agents involved. A single run of the protocol delivers session keys for all the agents, allowing neighbours to perform mutual authentication. The basic security theorem states that session keys are correctly delivered to adjacent pairs of honest agents, regardless of whether other agents in the chain are compromised. The protocol’s complexity caused some difficulties in the specification and proofs, but its symmetry reduced the number of theorems to prove.

UCAM-CL-TR-419

James Quentin Stafford-Fraser:

Video-augmented environments
April 1997, 91 pages, PDF
PhD thesis (Gonville & Caius College, February 1996)

Abstract: In the future, the computer will be thought of more as an assistant than as a tool, and users will increasingly expect machines to make decisions on their behalf. As with a human assistant, a machine’s ability to make informed choices will often depend on the extent of its knowledge of activities in the world around it. Equipping personal computers with a large number of sensors for monitoring their environment is, however, expensive and inconvenient, and a preferable solution would involve a small number of input devices with a broad scope of application. Video cameras are ideally suited to many real-world monitoring applications for this reason. In addition, recent reductions in the manufacturing costs of simple cameras will soon make their widespread deployment in the home and office economically viable. The use of video as an input device also allows the creation of new types of user-interface, more suitable in some circumstances than those afforded by the conventional keyboard and mouse.

This thesis examines some examples of these ‘Video-Augmented Environments’ and related work, and then describes two applications in detail. The first, a ‘software cameraman’, uses the analysis of one video stream to control the display of another. The second, ‘BrightBoard’, allows a user to control a computer by making marks on a conventional whiteboard, thus ‘augmenting’ the board with many of the facilities common to electronic documents, including the ability to fax, save, print and email the image of the board. The techniques which were found to be useful in the construction of these applications are common to many systems which monitor real-world video, and so they were combined in a toolkit called ‘Vicar’. This provides an architecture for ‘video plumbing’, which allows standard video-processing components to be connected together under the control of a scripting language. It is a single application which can be programmed to create a variety of simple Video-Augmented Environments, such as those described above, without the need for any recompilation, and so should simplify the construction of such applications in the future. Finally, opportunities for further exploration on this theme are discussed.

UCAM-CL-TR-420

Jonathan Mark Sewell:

Managing complex models for computer graphics
April 1997, 206 pages, PDF
PhD thesis (Queens’ College, March 1996)

Abstract: Three-dimensional computer graphics is becoming more common as increasing computational power becomes more readily available. Although the images that can be produced are becoming more complex, users’ expectations continue to grow. This dissertation examines the changes in computer graphics software that will be needed to support continuing growth in complexity, and proposes techniques for tackling the problems that emerge.

Increasingly complex models will involve longer rendering times, higher memory requirements, longer data transfer periods and larger storage capacities. Furthermore, even greater demands will be placed on the constructors of such models. This dissertation aims to describe how to construct scalable systems which can be used to visualise models of any size without requiring dedicated hardware. This is achieved by controlling the quality of the results, and hence the costs incurred. In addition, the use of quality controls can become a tool to help users handle the large volume of information arising from complex models.

The underlying approach is to separate the model from the graphics application which uses it, so that the model exists independently. By doing this, an application is free to access only the data which is required at any given time. For the application to function in this manner, the data must be in an appropriate form. To achieve this, approximation hierarchies are defined as a suitable new model structure. These utilise multiple representations of both objects and groups of objects at all levels in the model.

In order to support such a structure, a novel method is proposed for rapidly constructing simplified representations of groups of complex objects. By calculating a few geometrical attributes, it is possible to generate replacement objects that preserve important aspects of the originals. Such objects, once placed into an approximation hierarchy, allow rapid loading and rendering of large portions of a model. Extensions to rendering algorithms are described that take advantage of this structure.

The use of multiple representations encompasses not only different quality levels, but also different storage formats and types of objects. It provides a framework within which such aspects are hidden from the user, facilitating the sharing and re-use of objects. A model manager is proposed as a means of encapsulating these mechanisms. This software gives, as far as possible, the illusion of direct access to the whole complex model, while at the same time making the best use of the limited resources available.

UCAM-CL-TR-421

Michael Norrish:

An abstract dynamic semantics for C
May 1997, 31 pages, PDF, PostScript, DVI

Abstract: This report is a presentation of a formal semantics for the C programming language. The semantics has been defined operationally in a structured semantics style and covers the bulk of the core of the language. The semantics has been developed in a theorem prover (HOL), where some expected consequences of the language definition

UCAM-CL-TR-422

Antony Rowstron:

Using the BONITA primitives: a case study
May 1997, 19 pages, paper copy


UCAM-CL-TR-423

Karl F. MacDorman:

Symbol grounding: Learning categorical and sensorimotor predictions for coordination in autonomous robots
May 1997, 170 pages, paper copy
PhD thesis (Wolfson College, March 1997)

UCAM-CL-TR-424

Fabio Massacci:

Simplification with renaming: a general proof technique for tableau and sequent-based provers
May 1997, 26 pages, DVI

Abstract: Tableau and sequent calculi are the basis for most popular interactive theorem provers for hardware and software verification.

Yet, when it comes to decision procedures or automatic proof search, tableaux are orders of magnitude slower than Davis-Putnam, SAT-based procedures or other techniques based on resolution.

To meet this challenge, this paper proposes a theoretical innovation: the rule of simplification, which plays the same role for tableaux as subsumption does for resolution, and the unit rule does for Davis-Putnam.

This technique gives a unifying view of a number of tableaux-like calculi such as DPLL, KE, HARP, hyper-tableaux etc. For instance the stand-alone nature of the first-order Davis-Putnam-Logemann-Loveland procedure can be explained away as a case of Smullyan tableaux with propositional simplification.

Besides its computational effectiveness, the simplicity and generality of simplification make its extension possible in a uniform way. We define it for propositional and first-order logic and a wide range of modal logics. For a full-fledged first-order simplification we combine it with another technique, renaming, which subsumes the use of free universal variables in sequent and tableau calculi.

New experimental results are given for random SAT and the IFIP benchmarks for hardware verification.
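The propositional case of the simplification rule is easy to illustrate. The sketch below is a toy, not Massacci’s calculus: the formula representation is an assumption for this example. It rewrites the other formulas on a tableau branch using a literal already on that branch (replacing the literal by True and its negation by False, then boolean-simplifying), which is the sense in which DPLL’s unit rule appears as a special case:

```python
# Formulas: ('var', name) | ('not', f) | ('and', f, g) | ('or', f, g) | True | False

def subst(f, name, value):
    """Replace variable `name` by boolean `value` throughout formula f."""
    if f is True or f is False:
        return f
    if f[0] == 'var':
        return value if f[1] == name else f
    if f[0] == 'not':
        return ('not', subst(f[1], name, value))
    return (f[0], subst(f[1], name, value), subst(f[2], name, value))

def simplify(f):
    """Evaluate connectives whose arguments have become constants."""
    if f is True or f is False or f[0] == 'var':
        return f
    if f[0] == 'not':
        g = simplify(f[1])
        return (not g) if isinstance(g, bool) else ('not', g)
    a, b = simplify(f[1]), simplify(f[2])
    if f[0] == 'and':
        if a is False or b is False:
            return False
        if a is True:
            return b
        return a if b is True else ('and', a, b)
    # f[0] == 'or'
    if a is True or b is True:
        return True
    if a is False:
        return b
    return a if b is False else ('or', a, b)

def simplify_branch(formulas, literal_name, literal_value):
    """The simplification rule: use a literal known on the branch to
    reduce every other formula on that branch."""
    return [simplify(subst(f, literal_name, literal_value)) for f in formulas]
```

Knowing p on a branch reduces p ∨ q to True and ¬p ∧ q to False, closing or shortening branches before any tableau expansion rule fires.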

UCAM-CL-TR-425

Leslie Lamport, Lawrence C. Paulson:

Should your specification language be typed?
May 1997, 30 pages, PDF

Abstract: Most specification languages have a type system. Type systems are hard to get right, and getting them wrong can lead to inconsistencies. Set theory can serve as the basis for a specification language without types. This possibility, which has been widely overlooked, offers many advantages. Untyped set theory is simple and is more flexible than any simple typed formalism. Polymorphism, overloading, and subtyping can make a type system more powerful, but at the cost of increased complexity, and such refinements can never attain the flexibility of having no types at all. Typed formalisms have advantages too, stemming from the power of mechanical type checking. While types serve little purpose in hand proofs, they do help with mechanized proofs. In the absence of verification, type checking can catch errors in specifications. It may be possible to have the best of both worlds by adding typing annotations to an untyped specification language.

We consider only specification languages, not programming languages.

UCAM-CL-TR-426

Mark Humphrys:

Action selection methods using reinforcement learning
June 1997, 195 pages, PostScript
PhD thesis (Trinity Hall)

Abstract: The Action Selection problem is the problem of run-time choice between conflicting and heterogeneous goals, a central problem in the simulation of whole creatures (as opposed to the solution of isolated uninterrupted tasks). This thesis argues that Reinforcement Learning has been overlooked in the solution of the Action Selection problem. Considering a decentralised model of mind, with internal tension and competition between selfish behaviors, this thesis introduces an algorithm called “W-learning”, whereby different parts of the mind modify their behavior based on whether or not they are succeeding in getting the body to execute their actions. This thesis sets W-learning in context among the different ways of exploiting Reinforcement Learning numbers for the purposes of Action Selection. It is a ‘Minimize the Worst Unhappiness’ strategy. The different methods are tested and their strengths and weaknesses analysed in an artificial world.
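The ‘Minimize the Worst Unhappiness’ idea can be sketched in a few lines: each selfish behaviour reports a W-value for the current state (roughly, how badly it suffers if its preferred action is not executed), and the body obeys the behaviour with the highest W. The behaviours, states and numbers below are invented for illustration and this is not Humphrys’ actual W-learning update rule:

```python
# Toy arbitration between selfish behaviours competing for one body.
# Each behaviour is a (name, preferred_action, w_value) triple, where the
# last two are functions of the state.

def select_action(state, behaviours):
    """Obey the behaviour with the highest W, so the worst unhappiness
    across the competing behaviours is minimized."""
    name, action_fn, _w = max(behaviours, key=lambda b: b[2](state))
    return name, action_fn(state)

# Two invented behaviours: their W rises when their concern is urgent.
feed = ("feed", lambda s: "move_to_food", lambda s: 0.9 if s["hunger"] > 0.5 else 0.1)
flee = ("flee", lambda s: "run_away",     lambda s: 0.8 if s["danger"] > 0.5 else 0.0)

winner, action = select_action({"hunger": 0.9, "danger": 0.2}, [feed, flee])
```

In the full algorithm the W-values are learned from reinforcement signals rather than hand-coded as here; only the arbitration step is shown.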

UCAM-CL-TR-427

Don Syme:

Proving Java type soundness
June 1997, 35 pages, paper copy

UCAM-CL-TR-428

John Harrison:

Floating point verification in HOL Light: the exponential function
June 1997, 112 pages, PostScript, DVI

Abstract: In that they often embody compact but mathematically sophisticated algorithms, operations for computing the common transcendental functions in floating point arithmetic seem good targets for formal verification using a mechanical theorem prover. We discuss some of the general issues that arise in verifications of this class, and then present a machine-checked verification of an algorithm for computing the exponential function in IEEE-754 standard binary floating point arithmetic. We confirm (indeed strengthen) the main result of a previously published error analysis, though we uncover a minor error in the hand proof and are forced to confront several subtle issues that might easily be overlooked informally.

Our main theorem connects the floating point exponential to its abstract mathematical counterpart. The specification we prove is that the function has the correct overflow behaviour and, in the absence of overflow, the error in the result is less than 0.54 units in the last place (0.77 if the answer is denormalized) compared against the exact mathematical exponential function. The algorithm is expressed in a simple formalized programming language, intended to be a subset of real programming and hardware description languages. It uses underlying floating point operations (addition, multiplication etc.) that are assumed to conform to the IEEE-754 standard for binary floating point arithmetic.

The development described here includes, apart from the proof itself, a formalization of IEEE arithmetic, a mathematical semantics for the programming language in which the algorithm is expressed, and the body of pure mathematics needed. All this is developed logically from first principles using the HOL Light prover, which guarantees strict adherence to simple rules of inference while allowing the user to perform proofs using higher-level derived rules. We first present the main ideas and conclusions, and then collect some technical details about the prover and the underlying mathematical theories in appendices.
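Algorithms of this class typically have the shape: reduce the argument to a small remainder, approximate on the reduced range, then reconstruct. The sketch below shows that generic shape only; it is not Harrison's verified algorithm, and the term count and reduction constant are illustrative choices.

```python
# Generic shape of a floating-point exponential (assumed details,
# not the verified algorithm): write x = k*ln(2) + r with |r| small,
# approximate e^r by a truncated Taylor series, reconstruct 2**k * e^r.
import math

def exp_approx(x, terms=12):
    k = round(x / math.log(2.0))       # argument reduction: nearest multiple of ln 2
    r = x - k * math.log(2.0)          # remainder, |r| <= ln(2)/2
    poly = sum(r**n / math.factorial(n) for n in range(terms))
    return math.ldexp(poly, k)         # exact scaling by 2**k
```

The point of the verification effort is precisely that the rounding committed in each of these steps must be bounded; this unrounded sketch only shows where those errors would accumulate.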

UCAM-CL-TR-429

Andrew D. Gordon, Paul D. Hankin,Søren B. Lassen:

Compilation and equivalence of imperative objects
June 1997, 64 pages, PostScript, DVI

Abstract: We adopt the untyped imperative object calculus of Abadi and Cardelli as a minimal setting in which to study problems of compilation and program equivalence that arise when compiling object-oriented languages. We present both a big-step and a small-step substitution-based operational semantics for the calculus. Our first two results are theorems asserting the equivalence of our substitution-based semantics with a closure-based semantics like that given by Abadi and Cardelli. Our third result is a direct proof of the correctness of compilation to a stack-based abstract machine via a small-step decompilation algorithm. Our fourth result is that contextual equivalence of objects coincides with a form of Mason and Talcott’s CIU equivalence; the latter provides a tractable means of establishing operational equivalences. Finally, we prove correct an algorithm, used in our prototype compiler, for statically resolving method offsets. This is the first study of correctness of an object-oriented abstract machine, and of operational equivalence for the imperative object calculus.

UCAM-CL-TR-430

G.J.F. Jones, J.T. Foote, K. Sparck Jones,S.J. Young:

Video mail retrieval using voice
Report on topic spotting
July 1997, 73 pages, paper copy

UCAM-CL-TR-431

Martin Richards:

The MCPL programming manual and user guide
July 1997, 70 pages, paper copy

UCAM-CL-TR-432

Lawrence C. Paulson:

On two formal analyses of the Yahalom protocol
July 1997, 16 pages, PDF, PostScript, DVI

Abstract: The Yahalom protocol is one of those analyzed by Burrows et al. in the BAN paper. Based upon their analysis, they have proposed modifications to make the protocol easier to understand and analyze. Both versions of Yahalom have now been proved, using Isabelle/HOL, to satisfy strong security goals. The mathematical reasoning behind these machine proofs is presented informally.

The new proofs do not rely on a belief logic; they use an entirely different formal model, the inductive method. They confirm the BAN analysis and the advantages of the proposed modifications. The new proof methods detect more flaws than BAN and analyze protocols in finer detail, while remaining broadly consistent with the BAN principles. In particular, the proofs confirm the explicitness principle of Abadi and Needham.

UCAM-CL-TR-433

Martin Richards:

Backtracking algorithms in MCPL using bit patterns and recursion
July 1997, 80 pages, paper copy

UCAM-CL-TR-434

Martin Richards:

Demonstration programs for CTL and µ-calculus symbolic model checking
August 1997, 41 pages, paper copy

UCAM-CL-TR-435

Peter Sewell:

Global/local subtyping for a distributed π-calculus
August 1997, 57 pages, PostScript

Abstract: In the design of mobile agent programming languages there is a tension between the implementation cost and the expressiveness of the communication mechanisms provided. This paper gives a static type system for a distributed π-calculus in which the input and output of channels may be either global or local. This allows compile-time optimization where possible but retains the expressiveness of channel communication. Subtyping allows all communications to be invoked uniformly. Recursive types and products are included. The distributed π-calculus used integrates location and migration primitives from the Distributed Join Calculus with asynchronous π communication, taking a simple reduction semantics. Some alternative calculi are discussed.

UCAM-CL-TR-436

W.F. Clocksin:

A new method for estimating optical flow
November 1997, 20 pages, PDF

Abstract: Accurate and high density estimation of optical flow vectors in an image sequence is accomplished by a method that estimates the velocity distribution function for small overlapping regions of the image. Because the distribution is multimodal, the method can accurately estimate the change in velocity near motion contrast borders. Large spatiotemporal support without sacrificing spatial resolution is a feature of the method, so it is not necessary to smooth the resulting flow vectors in a subsequent operation, and there is a certain degree of resistance to aperture and aliasing effects. Spatial support also provides for the accurate estimation of long-range displacements, and subpixel accuracy is achieved by a simple weighted mean near the mode of the velocity distribution function.

The method is demonstrated using image sequences obtained from the analysis of ceramic and metal materials under stress. The performance of the system under degenerate conditions is also analysed to provide insight into the behaviour of optical flow methods in general.
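The final estimation step mentioned in the abstract, a weighted mean near the mode of the velocity distribution, can be sketched in one dimension. This is an illustrative reduction under assumed details (the paper works with 2-D distributions over image regions; the function name and the neighbourhood radius are inventions for this sketch).

```python
# 1-D sketch of subpixel velocity estimation (assumed details): find
# the mode of a sampled velocity distribution, then refine it with a
# weighted mean over the mode's immediate neighbourhood.

def subpixel_velocity(dist, radius=1):
    """dist: list of likelihoods indexed by integer velocity."""
    mode = max(range(len(dist)), key=lambda v: dist[v])
    lo = max(0, mode - radius)
    hi = min(len(dist), mode + radius + 1)
    weight = sum(dist[v] for v in range(lo, hi))
    return sum(v * dist[v] for v in range(lo, hi)) / weight

dist = [0.0, 0.1, 0.6, 0.3, 0.0]   # peak skewed towards v = 3
v = subpixel_velocity(dist)
```

Because only samples near the mode contribute, secondary modes (e.g. from the far side of a motion boundary) do not drag the estimate, which is the multimodality point the abstract makes.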

UCAM-CL-TR-437

William S. Harbison:

Trusting in computer systems
December 1997, 95 pages, PDF
PhD thesis (Wolfson College, May 1997)

Abstract: We need to be able to reason about large systems, and not just about their components. For this we need new conceptual tools, and this dissertation therefore indicates the need for a new methodology which will allow us to better identify areas of possible conflict or lack of knowledge in a system.

In particular, it examines the concept of trust, and how this can help us to understand the basic security aspects of a system. The main proposal of the present work is that systems are viewed in a manner which analyses the conditions under which they have been designed to perform, and the circumstances under which they have been implemented, and then compares the two. This problem is then examined from the point of view of what is being trusted in a system, or what it is being trusted for.

Starting from an approach developed in a military context, we demonstrate how this can lead to unanticipated risks when applied inappropriately. We further suggest that ‘trust’ be considered a relative concept, in contrast to the more usual usage, and that it is not the result of knowledge but a substitute for it. The utility of these concepts is in their ability to quantify the risks associated with a specific participant, whether these are explicitly accepted by them, or not.

We finally propose a distinction between ‘trust’ and ‘trustworthy’ and demonstrate that most current uses of the term ‘trust’ are more appropriately viewed as statements of ‘trustworthiness’. Ultimately, therefore, we suggest that the traditional “Orange Book” concept of trust resulting from knowledge can violate the security policy of a system.

UCAM-CL-TR-438

Feng Shi:

An architecture for scalable and deterministic video servers
November 1997, 148 pages, PDF
PhD thesis (Wolfson College, June 1997)

Abstract: A video server is a storage system that can provide a repository for continuous media (CM) data and sustain CM stream delivery (playback or recording) through networks. The voluminous nature of CM data demands a video server to be scalable in order to serve a large number of concurrent client requests. In addition, deterministic services can be provided by a video server for playback because the characteristics of variable bit rate (VBR) video can be analysed in advance and used in run-time admission control (AC) and data retrieval.

Recent research has made gigabit switches a reality, and the cost/performance ratio of microprocessors and standard PCs is dropping steadily. It would be more cost effective and flexible to use off-the-shelf components inside a video server with a scalable switched network as the primary interconnect than to make a special purpose or massively parallel multiprocessor based video server. This work advocates and assumes such a scalable video server structure in which data is striped to multiple peripherals attached directly to a switched network.

However, most contemporary distributed file systems do not support data distribution across multiple networked nodes, let alone providing quality of service (QoS) to CM applications at the same time. It is the observation of this dissertation that the software system framework for network striped video servers is as important as the scalable hardware architecture itself. This leads to the development of a new system architecture, which is scalable, flexible and QoS aware, for scalable and deterministic video servers. The resulting architecture is called Cadmus, from sCAlable and Deterministic MUltimedia Servers.

Cadmus also provides integrated solutions to AC and actual QoS enforcement in storage nodes. This is achieved by considering resources such as CPU, buffer, disk, and network simultaneously but not independently, and by including both real-time (RT) and non-real-time (NRT) activities. In addition, the potential to smooth the variability of VBR videos using read-ahead under client buffer constraints is identified. A new smoothing algorithm is presented, analysed, and incorporated into the Cadmus architecture.

A prototype implementation of Cadmus has been constructed based on distributed object computing and hardware modules directly connected to an Asynchronous Transfer Mode (ATM) network. Experiments were performed to evaluate the implementation and demonstrate the utility and feasibility of the architecture and its AC criteria.

UCAM-CL-TR-439

David A. Halls:

Applying mobile code to distributed systems
December 1997, 158 pages, paper copy
PhD thesis (June 1997)

UCAM-CL-TR-440

Lawrence C. Paulson:

Inductive analysis of the internet protocol TLS
December 1997, 19 pages, PDF, PostScript

Abstract: Internet browsers use security protocols to protect confidential messages. An inductive analysis of TLS (a descendant of SSL 3.0) has been performed using the theorem prover Isabelle. Proofs are based on higher-order logic and make no assumptions concerning beliefs or finiteness. All the obvious security goals can be proved; session resumption appears to be secure even if old session keys have been compromised. The analysis suggests modest changes to simplify the protocol.

TLS, even at an abstract level, is much more complicated than most protocols that researchers have verified. Session keys are negotiated rather than distributed, and the protocol has many optional parts. Nevertheless, the resources needed to verify TLS are modest. The inductive approach scales up.

UCAM-CL-TR-441

Lawrence C. Paulson:

A generic tableau prover and its integration with Isabelle
January 1998, 16 pages, PDF, PostScript, DVI

Abstract: A generic tableau prover has been implemented and integrated with Isabelle. It is based on leanTAP but is much more complicated, with numerous modifications to allow it to reason with any supplied set of tableau rules. It has a higher-order syntax in order to support the binding operators of set theory; unification is first-order (extended for bound variables in obvious ways) instead of higher-order, for simplicity.

When a proof is found, it is returned to Isabelle as a list of tactics. Because Isabelle verifies the proof, the prover can cut corners for efficiency’s sake without compromising soundness. For example, it knows almost nothing about types.

UCAM-CL-TR-442

Jacques Fleuriot, Lawrence C. Paulson:

A combination of nonstandard analysis and geometry theorem proving, with application to Newton’s Principia
January 1998, 13 pages, PostScript, DVI

Abstract: The theorem prover Isabelle is used to formalise and reproduce some of the styles of reasoning used by Newton in his Principia. The Principia’s reasoning is resolutely geometric in nature but contains “infinitesimal” elements and the presence of motion that take it beyond the traditional boundaries of Euclidean Geometry. These present difficulties that prevent Newton’s proofs from being mechanised using only the existing geometry theorem proving (GTP) techniques.

Using concepts from Robinson’s Nonstandard Analysis (NSA) and a powerful geometric theory, we introduce the concept of an infinitesimal geometry in which quantities can be infinitely small or infinitesimal. We reveal and prove new properties of this geometry that only hold because infinitesimal elements are allowed, and use them to prove lemmas and theorems from the Principia.

UCAM-CL-TR-443

Lawrence C. Paulson:

The inductive approach to verifying cryptographic protocols
February 1998, 46 pages, PDF, PostScript

Abstract: Informal arguments that cryptographic protocols are secure can be made rigorous using inductive definitions. The approach is based on ordinary predicate calculus and copes with infinite-state systems. Proofs are generated using Isabelle/HOL. The human effort required to analyze a protocol can be as little as a week or two, yielding a proof script that takes a few minutes to run.

Protocols are inductively defined as sets of traces. A trace is a list of communication events, perhaps comprising many interleaved protocol runs. Protocol descriptions incorporate attacks and accidental losses. The model spy knows some private keys and can forge messages using components decrypted from previous traffic. Three protocols are analyzed below: Otway-Rees (which uses shared-key encryption), Needham-Schroeder (which uses public-key encryption), and a recursive protocol (which is of variable length).

One can prove that event ev always precedes event ev′, or that property P holds provided X remains secret. Properties can be proved from the viewpoint of the various principals: say, if A receives a final message from B then the session key it conveys is good.
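The trace model described above can be rendered as a toy executable sketch. The actual work is carried out in Isabelle/HOL; the Python names (`Says`, `observed`, `precedes`) are inventions for this illustration.

```python
# Toy rendering of the inductive trace model (assumed names): a trace
# is a list of communication events; a spy sees all traffic, and
# precedence properties over events can be checked directly.
from dataclasses import dataclass

@dataclass(frozen=True)
class Says:
    sender: str
    receiver: str
    msg: str

def observed(trace):
    """Messages visible to a spy who sees all traffic."""
    return {ev.msg for ev in trace}

def precedes(trace, p, q):
    """Every occurrence of an event satisfying q is preceded by one satisfying p."""
    seen_p = False
    for ev in trace:
        if p(ev):
            seen_p = True
        if q(ev) and not seen_p:
            return False
    return True

trace = [Says("A", "S", "request"), Says("S", "A", "key"), Says("A", "B", "key")]
```

In the real formalism such properties are proved by induction over the set of all traces the protocol can generate, rather than checked on a single finite trace as here.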

UCAM-CL-TR-444

Peter Sewell:

From rewrite rules to bisimulation congruences
May 1998, 72 pages, PostScript

Abstract: The dynamics of many calculi can be most clearly defined by reduction semantics. To work with a calculus, however, an understanding of operational congruences is fundamental; these can often be given tractable definitions or characterisations using a labelled transition semantics. This paper considers calculi with arbitrary reduction semantics of three simple classes: firstly ground term rewriting, then left-linear term rewriting, and then a class which is essentially the action calculi lacking substantive name binding. General definitions of labelled transitions are given in each case, uniformly in the set of rewrite rules, and without requiring the prescription of additional notions of observation. They give rise to bisimulation congruences. As a test of the theory it is shown that bisimulation for a fragment of CCS is recovered. The transitions generated for a fragment of the Ambient Calculus of Cardelli and Gordon, and for SKI combinators, are also discussed briefly.

UCAM-CL-TR-445

Michael Roe, Bruce Christianson,David Wheeler:

Secure sessions from weak secrets
July 1998, 12 pages, PDF

Abstract: Sometimes two parties who share a weak secret k (such as a password) wish to share a strong secret s (such as a session key) without revealing information about k to a (possibly active) attacker. We assume that both parties can generate strong random numbers and forget secrets, and present three protocols for secure strong secret sharing, based on RSA, Diffie-Hellman and El-Gamal. As well as being simpler and quicker than their predecessors, our protocols also have slightly stronger security properties: in particular, they make no cryptographic use of s and so impose no subtle restrictions upon the use which is made of s by other protocols.
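The general idea, bootstrapping a strong secret from a weak one, can be illustrated with a deliberately toy Diffie-Hellman exchange whose public values are masked under a key derived from k. This is NOT one of the paper's three protocols and is insecure as written (toy modulus, XOR masking); every name and parameter below is an invention for illustration only.

```python
# Toy, insecure illustration (assumed details, not the paper's
# protocols): a Diffie-Hellman exchange whose public values are
# masked with a pad derived from the weak secret k, so both sides
# end up with the same strong secret s.
import hashlib, secrets

P = 0xFFFFFFFFFFFFFFC5   # toy 64-bit prime; real protocols need far larger, safe primes
G = 5

def mask(value, k, tag):
    """XOR an integer with a pad derived from the weak secret k (toy masking)."""
    pad = int.from_bytes(hashlib.sha256((k + tag).encode()).digest()[:8], "big")
    return value ^ pad    # XOR is its own inverse, so mask() also unmasks

def dh_share(k):
    """Pick a private exponent and return it with the masked public value."""
    x = secrets.randbelow(P - 2) + 1
    return x, mask(pow(G, x, P), k, "dh")

k = "weak password"
xa, masked_a = dh_share(k)
xb, masked_b = dh_share(k)
# Each side unmasks the peer's value with k and derives the strong secret s.
s_a = pow(mask(masked_b, k, "dh"), xa, P)
s_b = pow(mask(masked_a, k, "dh"), xb, P)
```

The property the paper emphasises, that s itself is never used cryptographically inside the protocol, holds in this sketch too: s is only the output.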

UCAM-CL-TR-446

K. Sparck Jones, S. Walker, S.E. Robertson:

A probabilistic model of information retrieval:
development and status
August 1998, 74 pages, PostScript, DVI

Abstract: The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Each step in the argument is matched by comparative retrieval tests, to provide a single coherent account of a major line of research. The experiments demonstrate, for a large test collection, that the probabilistic model is effective and robust, and that it responds appropriately, with major improvements in performance, to key features of retrieval situations.
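One widely known instance of this line of work is the BM25 term weight. The sketch below shows that standard form as a point of reference; the parameter defaults are common conventions, not values prescribed by this report.

```python
# Sketch of the BM25 term weight, a well-known instance of the
# probabilistic retrieval model (common default parameters, not
# values taken from the report).
import math

def bm25_weight(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    # Inverse document frequency with the Robertson-Sparck Jones 0.5 corrections.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    # Document-length normalisation of the term-frequency saturation.
    norm = k1 * ((1 - b) + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / (tf + norm)
```

A document's score for a query is the sum of these weights over the query terms it contains; the saturation in tf means repeated occurrences yield diminishing returns.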

UCAM-CL-TR-447

Giampaolo Bella, Lawrence C. Paulson:

Are timestamps worth the effort?
A formal treatment
September 1998, 12 pages, PDF

Abstract: Theorem proving provides formal and detailed support to the claim that timestamps can give better freshness guarantees than nonces do, and can simplify the design of crypto-protocols. However, since they rely on synchronised clocks, their benefits are still debatable. The debate should gain from our formal analysis, which is achieved through the comparison of a nonce-based crypto-protocol, Needham-Schroeder, with its natural modification by timestamps, Kerberos.

UCAM-CL-TR-448

G.M. Bierman:

A computational interpretation of the λµ calculus
September 1998, 10 pages, paper copy

UCAM-CL-TR-449

Florian Kammuller, Markus Wenzel:

Locales
A sectioning concept for Isabelle
October 1998, 16 pages, paper copy

UCAM-CL-TR-450

Jacobus Erasmus van der Merwe:

Open service support for ATM
November 1998, 164 pages, paper copy
PhD thesis (St John’s College, September 1997)

UCAM-CL-TR-451

Sean Rooney:

The structure of open ATM control architectures
November 1998, 183 pages, paper copy
PhD thesis (Wolfson College, February 1998)

UCAM-CL-TR-452

Florian Kammuller, Lawrence C. Paulson:

A formal proof of Sylow’s theorem
An experiment in abstract algebra with Isabelle HOL
November 1998, 30 pages, PDF

Abstract: The theorem of Sylow is proved in Isabelle HOL. We follow the proof by Wielandt that is more general than the original and uses a non-trivial combinatorial identity. The mathematical proof is explained in some detail, leading on to the mechanization of group theory and the necessary combinatorics in Isabelle. We present the mechanization of the proof in detail, giving reference to theorems contained in an appendix. Some weak points of the experiment with respect to a natural treatment of abstract algebraic reasoning give rise to a discussion of the use of module systems to represent abstract algebra in theorem provers. Drawing from that, we present tentative ideas for further research into a section concept for Isabelle.
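For reference, a compact statement of the existence part of Sylow's theorem, the standard mathematical content being mechanised (stated here in textbook notation, not the report's Isabelle formulation):

```latex
% Sylow's theorem (existence of a Sylow p-subgroup): for a finite
% group G whose order factors as p^a * m with p prime and p not
% dividing m, there is a subgroup of order p^a.
\[
  |G| = p^{a} m,\quad p \nmid m
  \;\Longrightarrow\;
  \exists H \le G.\; |H| = p^{a}
\]
```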

UCAM-CL-TR-453

Michael Norrish:

C formalised in HOL
December 1998, 156 pages, PDF
PhD thesis (August 1998)

Abstract: We present a formal semantics of the C programming language, covering both the type system and the dynamic behaviour of programs. The semantics is wide-ranging, covering most of the language, with its most significant omission being the C library. Using a structural operational semantics we specify transition relations for C’s expressions, statements and declarations in higher order logic.

The consistency of our definition is assured by its specification in the HOL theorem prover. With the theorem prover, we have used the semantics as the basis for a set of proofs of interesting theorems about C. We investigate properties of expressions and statements separately.

In our chapter of results about expressions, we begin with two results about the interaction between the type system and the dynamic semantics. We have both type preservation, that the values produced by expressions conform to the type predicted for them; and type safety, that typed expressions will not block, but will either evaluate to a value, or cause undefined behaviour. We then also show that two broad classes of expression are deterministic. This last result is of considerable practical value as it makes later verification proofs significantly easier.

In our chapter of results about statements, we prove a series of derived rules that provide C with Floyd-Hoare style “axiomatic” rules for verifying properties of programs. These rules are consequences of the original semantics, not independently stated axioms, so we can be sure of their soundness. This chapter also proves the correctness of an automatic tool for constructing post-conditions for loops with break and return statements.

Finally, we perform some simple verification case studies, going some way towards demonstrating practical utility for the semantics and accompanying tools.

This technical report is substantially the same as the PhD thesis I submitted in August 1998. The minor differences between that document and this are principally improvements suggested by my examiners Andy Gordon and Tom Melham, whom I thank for their help and careful reading.

UCAM-CL-TR-454

Andrew M. Pitts:

Parametric polymorphism and operational equivalence
December 1998, 39 pages, paper copy

UCAM-CL-TR-455

G.M. Bierman:

Multiple modalities
December 1998, 26 pages, paper copy

UCAM-CL-TR-456

Joshua Robert Xavier Ross:

An evaluation based approach to process calculi
January 1999, 206 pages, paper copy
PhD thesis (Clare College, July 1998)

UCAM-CL-TR-457

Andrew D. Gordon, Paul D. Hankin:

A concurrent object calculus: reduction and typing
February 1999, 63 pages, paper copy

UCAM-CL-TR-458

Lawrence C. Paulson:

Final coalgebras as greatest fixed points in ZF set theory
March 1999, 25 pages, PDF

Abstract: A special final coalgebra theorem, in the style of Aczel (1988), is proved within standard Zermelo-Fraenkel set theory. Aczel’s Anti-Foundation Axiom is replaced by a variant definition of function that admits non-well-founded constructions. Variant ordered pairs and tuples, of possibly infinite length, are special cases of variant functions. Analogues of Aczel’s solution and substitution lemmas are proved in the style of Rutten and Turi (1993). The approach is less general than Aczel’s, but the treatment of non-well-founded objects is simple and concrete. The final coalgebra of a functor is its greatest fixedpoint. Compared with previous work (Paulson, 1995a), iterated substitutions and solutions are considered, as well as final coalgebras defined with respect to parameters. The disjoint sum construction is replaced by a smoother treatment of urelements that simplifies many of the derivations. The theory facilitates machine implementation of recursive definitions by letting both inductive and coinductive definitions be represented as fixedpoints. It has already been applied to the theorem prover Isabelle (Paulson, 1994).

UCAM-CL-TR-459

Mohamad Afshar:

An open parallel architecture for data-intensive applications
July 1999, 225 pages, PostScript, DVI
PhD thesis (King’s College, December 1998)

Abstract: Data-intensive applications consist of both declarative data-processing parts and imperative computational parts. For applications such as climate modelling, scale hits both the computational aspects, which are typically handled in a procedural programming language, and the data-processing aspects, which are handled in a database query language. Although parallelism has been successfully exploited in the data-processing parts by parallel evaluation of database queries associated with the application, current database query languages are poor at expressing the computational aspects, which are also subject to scale.

This thesis proposes an open architecture that delivers parallelism shared between the database, system and application, thus enabling the integration of the conventionally separated query and non-query components of a data-intensive application. The architecture is data-model independent and can be used in a variety of different application areas including decision-support applications, which are query based, and complex applications, which comprise procedural language statements with embedded queries. The architecture encompasses a unified model of parallelism and the realisation of this model in the form of a language within which it is possible to describe both the query and non-query components of data-intensive applications. The language enables the construction of parallel applications by the hierarchical composition of platform-independent parallel forms, each of which implements a form of task or data parallelism. These forms may be used to determine both query and non-query actions.

Queries are expressed in a declarative language based on “monoid comprehensions”. The approach of using monoids to model data types and monoid homomorphisms to iterate over collection types enables mathematically provable compile-time optimisations whilst also facilitating multiple collection types and data type extensibility. Monoid comprehension programs are automatically transformed into parallel programs composed of applications of the parallel forms, one of which is the “monoid homomorphism”. This process involves identifying the parts of a query where task and data parallelism are available and mapping that parallelism onto the most suitable form. Data parallelism in queries is mapped onto a form that implements combining tree parallelism for query evaluation and dividing tree parallelism to realise data partitioning. Task parallelism is mapped onto two separate forms that implement pipeline and independent parallelism. This translation process is applied to all comprehension queries including those in complex applications. The result is a skeleton program in which both the query and non-query parts are expressed within a single language. Expressions in this language are amenable to the application of optimising skeleton rewrite rules.

A complete prototype of the decision-support architecture has been constructed on a 128-cell MIMD parallel computer. A demonstration of the utility of the query framework is performed by modelling some of OQL and a substantial subset of SQL. The system is evaluated for query speedup with a number of hardware configurations using a large music catalogue database. The results obtained show that the implementation delivers the performance gains expected while offering a convenient definition of the parallel environment.
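The "monoid homomorphism" idea underlying the query language can be sketched generically: a collection is folded with a monoid's zero and merge operation, applying a per-element function, so that sums, collection-building and existence tests all share one form. The Python below is an illustration under assumed names, not the thesis's language; its parallelisability rests on merge being associative with zero as identity, which is what licenses the combining-tree evaluation mentioned above.

```python
# Generic monoid homomorphism (illustrative sketch): fold the
# unit-mapped elements of a collection with the monoid's merge,
# starting from its zero.

def hom(zero, merge, unit, xs):
    """Monoid homomorphism over xs: merge must be associative, zero its identity."""
    acc = zero
    for x in xs:
        acc = merge(acc, unit(x))
    return acc

# Three different "queries" as instances of the same homomorphism:
total  = hom(0, lambda a, b: a + b, lambda x: x, [1, 2, 3, 4])          # aggregation
bag    = hom([], lambda a, b: a + b, lambda x: [x * x], [1, 2, 3])      # mapping into a list
exists = hom(False, lambda a, b: a or b, lambda x: x > 2, [1, 2, 3])    # existence test
```

Because merge is associative, the fold may be split across partitions of xs and the partial results merged in a tree, which is the data-parallel evaluation strategy the abstract describes.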

UCAM-CL-TR-460

Giampaolo Bella:

Message reception in the inductive approach
March 1999, 16 pages, PDF

Abstract: Cryptographic protocols can be formally analysed in great detail by means of Paulson’s Inductive Approach, which is mechanised by the theorem prover Isabelle. The approach only relied on message sending (and noting) in order to keep the models simple. We introduce a new event, message reception, and show that the price paid in terms of runtime is negligible because old proofs can be reused. On the other hand, the new event enhances the global expressiveness, and makes it possible to define an accurate notion of agents’ knowledge, which extends and replaces Paulson’s notion of spy’s knowledge. We have designed new guarantees to assure each agent that the peer does not know the crucial message items of the session. This work thus extends the scope of the Inductive approach. Finally, we provide general guidance on updating the protocols analysed so far, and give examples for some cases.
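The gain from a reception event can be illustrated with a toy sketch: once traces record what each agent received, not just what was sent, a per-agent knowledge set becomes definable. The Python names below (`"Says"`, `"Gets"`, `knows`) are illustrative stand-ins; the real development is in Isabelle.

```python
# Toy illustration of per-agent knowledge (assumed names): with
# reception ("Gets") events in the trace, an agent's knowledge is
# computable from the messages it sent or received, rather than
# only from the spy's global view.

def knows(agent, trace):
    """Messages known to `agent`: those it sent or received."""
    known = set()
    for event, actor, msg in trace:
        if event == "Says" and actor == agent:
            known.add(msg)
        elif event == "Gets" and actor == agent:
            known.add(msg)
    return known

trace = [("Says", "A", "m1"), ("Gets", "B", "m1"), ("Says", "B", "m2")]
```

Without the "Gets" event, B's receipt of m1 would be invisible in the trace, so only a global spy's knowledge, not B's, could be defined, which is the limitation the abstract addresses.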

UCAM-CL-TR-461

Joe Hurd:

Integrating Gandalf and HOL
March 1999, 11 pages, PDF

Abstract: Gandalf is a first-order resolution theorem prover, optimized for speed and specializing in manipulations of large clauses. In this paper I describe GANDALF TAC, a HOL tactic that proves goals by calling Gandalf and mirroring the resulting proofs in HOL. This call can occur over a network, and a Gandalf server may be set up servicing multiple HOL clients. In addition, the translation of the Gandalf proof into HOL fits in with the LCF model and guarantees logical consistency.

UCAM-CL-TR-462

Peter Sewell, Paweł T. Wojciechowski,Benjamin C. Pierce:

Location-independent communication for mobile agents: a two-level architecture
April 1999, 31 pages, PostScript

Abstract: We study communication primitives for interaction between mobile agents. They can be classified into two groups. At a low level there are location dependent primitives that require a programmer to know the current site of a mobile agent in order to communicate with it. At a high level there are location independent primitives that allow communication with a mobile agent irrespective of its current site and of any migrations. Implementation of these requires delicate distributed infrastructure. We propose a simple calculus of agents that allows implementation of such distributed infrastructure algorithms to be expressed as encodings, or compilations, of the whole calculus into the fragment with only location dependent communication. These encodings give executable descriptions of the algorithms, providing a clean implementation strategy for prototype languages. The calculus is equipped with a precise semantics, providing a solid basis for understanding the algorithms and reasoning about their correctness and robustness. Two sample infrastructure algorithms are presented as encodings.

UCAM-CL-TR-463

Peter Sewell, Jan Vitek:

Secure composition of insecure components
April 1999, 44 pages, PostScript

Abstract: Software systems are becoming heterogeneous: instead of a small number of large programs from well-established sources, a user’s desktop may now consist of many smaller components that interact in intricate ways. Some components will be downloaded from the network from sources that are only partially trusted. A user would like to know that a number of security properties hold, e.g. that personal data is not leaked to the net, but it is typically infeasible to verify that such components are well-behaved. Instead they must be executed in a secure environment, or wrapper, that provides fine-grain control of the allowable interactions between them, and between components and other system resources.

In this paper we study such wrappers, focussing on how they can be expressed in a way that enables their security properties to be stated and proved rigorously. We introduce a model programming language, the box-π calculus, that supports composition of software components and the enforcement of security policies. Several example wrappers are expressed using the calculus; we explore the delicate security properties they guarantee.

UCAM-CL-TR-464

Boaz Lerner, William Clocksin, Seema Dhanjal, Maj Hulten, Christopher Bishop:

Feature representation for the automatic analysis of fluorescence in-situ hybridization images
May 1999, 36 pages, paper copy

UCAM-CL-TR-465

Boaz Lerner, Seema Dhanjal, Maj Hulten:

Gelfish – graphical environment for labelling FISH images
May 1999, 20 pages, paper copy

UCAM-CL-TR-466

Boaz Lerner, William Clocksin, Seema Dhanjal, Maj Hulten, Christopher Bishop:

Automatic signal classification in fluorescence in-situ hybridization images
May 1999, 24 pages, paper copy

UCAM-CL-TR-467

Lawrence C. Paulson:

Mechanizing UNITY in Isabelle
June 1999, 22 pages, PDF



Abstract: UNITY is an abstract formalism for proving properties of concurrent systems, which typically are expressed using guarded assignments [Chandy and Misra 1988]. UNITY has been mechanized in higher-order logic using Isabelle, a proof assistant. Safety and progress primitives, their weak forms (for the substitution axiom) and the program composition operator (union) have been formalized. To give a feel for the concrete syntax, the paper presents a few extracts from the Isabelle definitions and proofs. It discusses a small example, two-process mutual exclusion. A mechanical theory of unions of programs supports a degree of compositional reasoning. Original work on extending program states is presented and then illustrated through a simple example involving an array of processes.
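The guarded-assignment execution model underlying UNITY can be illustrated with a toy interpreter (my sketch, not the Isabelle formalization): a program is a set of guarded assignments, one enabled statement is chosen nondeterministically at each step, and execution reaches a fixed point when no guard is enabled.

```python
import random

# Toy UNITY-style execution (illustrative only): repeatedly pick an
# enabled guarded assignment until no guard holds (a fixed point).

def run_unity(state, statements, max_steps=10_000):
    rng = random.Random(0)
    for _ in range(max_steps):
        enabled = [(g, u) for g, u in statements if g(state)]
        if not enabled:                    # fixed point reached
            return state
        g, u = rng.choice(enabled)         # nondeterministic choice
        u(state)
    return state

# Example program: two counters that may each rise to 5.  The progress
# property "eventually x = 5 and y = 5" holds under fair execution.
statements = [
    (lambda s: s["x"] < 5, lambda s: s.__setitem__("x", s["x"] + 1)),
    (lambda s: s["y"] < 5, lambda s: s.__setitem__("y", s["y"] + 1)),
]
final = run_unity({"x": 0, "y": 0}, statements)
```

UNITY proper regards every statement as always executable (a statement whose guard is false acts as a skip) with an unconditional fairness assumption; the guarded form above is the operationally equivalent view that is easiest to animate.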

UCAM-CL-TR-468

Stephen Paul Wilcox:

Synthesis of asynchronous circuits
July 1999, 250 pages, PDF
PhD thesis (Queens’ College, December 1998)

Abstract: The majority of integrated circuits today are synchronous: every part of the chip times its operation with reference to a single global clock. As circuits become larger and faster, it becomes progressively more difficult to coordinate all actions of the chip to the clock. Asynchronous circuits do not suffer from this problem, because they do not require global synchronization; they also offer other benefits, such as modularity, lower power and automatic adaptation to physical conditions.

The main disadvantage of asynchronous circuits is that there are few tools to help with design. This thesis describes a new synthesis tool for asynchronous modules, which combines a number of novel ideas with existing methods for finite state machine synthesis. Connections between modules are assumed to have unbounded finite delays on all wires, but fundamental mode is used inside modules, rather than the pessimistic speed-independent or quasi-delay-insensitive models. Accurate technology-specific verification is performed to check that circuits work correctly.

Circuits are described using a language based upon the Signal Transition Graph, which is a well-known method for specifying asynchronous circuits. Concurrency reduction techniques are used to produce a large number of circuits that conform to a given specification. Circuits are verified using a simulation algorithm derived from the work of Brzozowski and Seger, and then performance estimations are obtained by a gate-level simulator utilising a new estimation of waveform slopes. Circuits can be ranked in terms of high speed, low power dissipation or small size, and then the best circuit for a particular task chosen.

Results are presented that show significant improvements over most circuits produced by other synthesis tools. Some circuits are twice as fast and dissipate half the power of equivalent speed-independent circuits. Specification examples are provided which show that the front-end specification is easier to use than current specification approaches. The price that must be paid for the improved performance is decreased reliability and technology dependence of the circuits produced; the proposed tool can also take a very long time to produce a result.

UCAM-CL-TR-469

Jacques Desire Fleuriot:

A combination of geometry theorem proving and nonstandard analysis, with application to Newton’s Principia
August 1999, 135 pages, paper copy
PhD thesis (Clare College, March 1999)

UCAM-CL-TR-470

Florian Kammüller:

Modular reasoning in Isabelle
August 1999, 128 pages, paper copy
PhD thesis (Clare College, April 1999)

UCAM-CL-TR-471

Robert M. Brady, Ross J. Anderson, Robin C. Ball:

Murphy’s law, the fitness of evolving species, and the limits of software reliability
September 1999, 14 pages, PDF

Abstract: We tackle two problems of interest to the software assurance community. Firstly, existing models of software development (such as the waterfall and spiral models) are oriented towards one-off software development projects, while the growth of mass market computing has led to a world in which most software consists of packages which follow an evolutionary development model. This leads us to ask whether anything interesting and useful may be said about evolutionary development. We answer in the affirmative. Secondly, existing reliability growth models emphasise the Poisson distribution of individual software bugs, while the empirically observed reliability growth for large systems is asymptotically slower than this. We provide a rigorous explanation of this phenomenon. Our reliability growth model is inspired by statistical thermodynamics, but also applies to biological evolution. It is in close agreement with experimental measurements of the fitness of an evolving species and the reliability of commercial software products. However, it shows that there are significant differences between the evolution of software and the evolution of species. In particular, we establish maximisation properties corresponding to Murphy’s law which work to the advantage of a biological species, but to the detriment of software reliability.
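The flower-than-Poisson growth can be illustrated with a simple simulation; this is my own sketch under stated assumptions, not the paper's model. Suppose a system contains many bugs whose failure rates are spread over several orders of magnitude (log-uniformly). A bug of rate lam survives debugging time t with probability exp(-lam*t), so the expected residual failure rate is the sum of lam*exp(-lam*t); with log-uniform rates this decays roughly like 1/t, far slower than the exponential decay of any single Poisson-distributed bug.

```python
import math
import random

# Illustrative simulation (not the paper's model): residual failure rate
# of a system of bugs with log-uniformly spread rates, after debugging
# time t.  Expected: rate falls roughly like 1/t over the middle range.

def residual_rate(rates, t):
    return sum(lam * math.exp(-lam * t) for lam in rates)

rng = random.Random(1)
# 20,000 bugs with rates log-uniform between 1e-4 and 1e2 (assumed values)
rates = [10 ** rng.uniform(-4, 2) for _ in range(20_000)]

r10 = residual_rate(rates, 10.0)
r100 = residual_rate(rates, 100.0)
ratio = r10 / r100   # close to 10 under a 1/t law; far larger if exponential
```

Debugging ten times longer therefore only improves the failure rate about tenfold, which is the kind of asymptotically slow reliability growth the abstract describes.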

UCAM-CL-TR-472

Ben Y. Reis:

Simulating music learning with autonomous listening agents: entropy, ambiguity and context
September 1999, 200 pages, paper copy
PhD thesis (Queens’ College, July 1999)

UCAM-CL-TR-473

Clemens Ballarin:

Computer algebra and theorem proving
October 1999, 122 pages, paper copy
PhD thesis (Darwin College)

UCAM-CL-TR-474

Boaz Lerner:

A Bayesian methodology and probability density estimation for fluorescence in-situ hybridization signal classification
October 1999, 31 pages, paper copy

UCAM-CL-TR-475

Boaz Lerner, Neil D. Lawrence:

A comparison of state-of-the-art classification techniques with application to cytogenetics
October 1999, 34 pages, paper copy

UCAM-CL-TR-476

Mark Staples:

Linking ACL2 and HOL
November 1999, 23 pages, paper copy

UCAM-CL-TR-477

Gian Luca Cattani, Glynn Winskel:

Presheaf models for CCS-like languages
November 1999, 46 pages, paper copy

UCAM-CL-TR-478

Peter Sewell, Jan Vitek:

Secure composition of untrusted code: wrappers and causality types
November 1999, 36 pages, PostScript

Abstract: We consider the problem of assembling concurrent software systems from untrusted or partially trusted off-the-shelf components, using wrapper programs to encapsulate components and enforce security policies. In previous work we introduced the box-π process calculus with constrained interaction to express wrappers and discussed the rigorous formulation of their security properties. This paper addresses the verification of wrapper information flow properties. We present a novel causal type system that statically captures the allowed flows between wrapped possibly-badly-typed components; we use it to prove that a unidirectional-flow wrapper enforces a causal flow property.

UCAM-CL-TR-479

Geraint Price:

The interaction between fault tolerance and security
December 1999, 144 pages, PDF
PhD thesis (Wolfson College, June 1999)

Abstract: This dissertation studies the effects on system design when including fault tolerance design principles within security services.

We start by looking at the changes made to the trust model within protocol design, and how moving away from trusted server design principles affects the structure of the protocol. Taking the primary results from this work, we move on to study how control in protocol execution can be used to increase assurances in the actions of legitimate participants. We study some examples, defining two new classes of attack, and note that by increasing client control in areas of protocol execution, it is possible to overcome certain vulnerabilities.

We then look at different models in fault tolerance, and how their adoption into a secure environment can change the design principles and assumptions made when applying the models.



We next look at the application of timing checks in protocols. There are some classes of timing attack that are difficult to thwart using existing techniques, because of the inherent unreliability of networked communication. We develop a method of converting the Quality of Service mechanisms built into ATM networks in order to achieve another layer of protection against timing attacks.

We then study the use of primary-backup mechanisms within server design, as previous work on server replication in security centres on the use of the state machine approach for replication, which provides a higher degree of assurance in system design, but adds complexity.

We then provide a design for a server to reliably and securely store objects across a loosely coupled, distributed environment. The main goal behind this design was to realise the ability for a client to exert control over the fault tolerance inherent in the service.

The main conclusions we draw from our research are that fault tolerance has a wider application within security than current practices, which are primarily based on replicating servers, and clients can exert control over the protocols and mechanisms to achieve resilience against differing classes of attack. We promote some new ideas on how, by challenging the prevailing model for client-server architectures in a secure environment, legitimate clients can have greater control over the services they use. We believe this to be a useful goal, given that the client stands to lose if the security of the server is undermined.

UCAM-CL-TR-480

Mike Gordon:

Programming combinations of deduction and BDD-based symbolic calculation
December 1999, 24 pages, paper copy

UCAM-CL-TR-481

Mike Gordon, Ken Friis Larsen:

Combining the Hol98 proof assistant with the BuDDy BDD package
December 1999, 71 pages, paper copy

UCAM-CL-TR-482

John Daugman:

Biometric decision landscapes
January 2000, 15 pages, PDF

Abstract: This report investigates the “decision landscapes” that characterize several forms of biometric decision making. The issues discussed include: (i) Estimating the degrees-of-freedom associated with different biometrics, as a way of measuring the randomness and complexity (and therefore the uniqueness) of their templates. (ii) The consequences of combining more than one biometric test to arrive at a decision. (iii) The requirements for performing identification by large-scale exhaustive database search, as opposed to mere verification by comparison against a single template. (iv) Scenarios for Biometric Key Cryptography (the use of biometrics for encryption of messages). These issues are considered here in abstract form, but where appropriate, the particular example of iris recognition is used as an illustration. A unifying theme of all four sets of issues is the role of combinatorial complexity, and its measurement, in determining the potential decisiveness of biometric decision making.
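The arithmetic behind point (ii) can be sketched with the standard decision rules for combining two independent tests; this is textbook combination arithmetic under an independence assumption, not necessarily the report's own analysis, and the rates in the example are invented. Requiring both tests to accept ("AND") lowers the false accept rate but raises the false reject rate; requiring either ("OR") does the opposite.

```python
# Standard decision arithmetic (illustrative; assumes independent tests).
# FAR = false accept rate, FRR = false reject rate.

def and_rule(far1, frr1, far2, frr2):
    # Accept only if BOTH tests accept.
    far = far1 * far2
    frr = 1 - (1 - frr1) * (1 - frr2)
    return far, frr

def or_rule(far1, frr1, far2, frr2):
    # Accept if EITHER test accepts.
    far = 1 - (1 - far1) * (1 - far2)
    frr = frr1 * frr2
    return far, frr

# Hypothetical example: a strong test combined with a much weaker one.
far, frr = and_rule(1e-6, 0.01, 1e-3, 0.05)
```

In this example the AND rule improves the combined FAR to 1e-9 but worsens the FRR to about 0.0595, illustrating that naively adding a weaker biometric to a stronger one can degrade one side of the decision landscape.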

UCAM-CL-TR-483

Hendrik Jaap Bos:

Elastic network control
January 2000, 184 pages, paper copy
PhD thesis (Wolfson College, August 1999)

UCAM-CL-TR-484

Richard Tucker:

Automatic summarising and the CLASP system
January 2000, 190 pages, PDF
PhD thesis (1999)

Abstract: This dissertation discusses summarisers and summarising in general, and presents CLASP, a new summarising system that uses a shallow semantic representation of the source text called a “predication cohesion graph”.

Nodes in the graph are “simple predications” corresponding to events, states and entities mentioned in the text; edges indicate related or similar nodes. Summary content is chosen by selecting some of these predications according to criteria of “importance”, “representativeness” and “cohesiveness”. These criteria are expressed as functions on the nodes of a weighted graph. Summary text is produced either by extracting whole sentences from the source text, or by generating short, indicative “summary phrases” from the selected predications.
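A miniature of "criteria expressed as functions on the nodes of a weighted graph" can be sketched as follows. This is purely illustrative (CLASP's actual criteria are more elaborate, and the node names below are invented): each predication node is scored by the total weight of its incident edges, a crude representativeness-like measure, and the top-k nodes are selected as summary content.

```python
# Purely illustrative: select summary content by scoring predication
# nodes in a weighted graph (not the actual CLASP criteria).

def select_predications(nodes, edges, k):
    score = {n: 0.0 for n in nodes}
    for (a, b), w in edges.items():
        score[a] += w          # each edge contributes to both endpoints
        score[b] += w
    return sorted(nodes, key=lambda n: -score[n])[:k]

# Hypothetical predication nodes and similarity/relatedness edge weights.
nodes = ["rates_rose", "fed_met", "stocks_fell", "weather_fine"]
edges = {
    ("rates_rose", "fed_met"): 2.0,
    ("rates_rose", "stocks_fell"): 1.5,
    ("fed_met", "stocks_fell"): 1.0,
}
summary_nodes = select_predications(nodes, edges, 2)
```

The weakly connected "weather_fine" node is never selected, mirroring the intuition that isolated predications are poor summary material.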

CLASP uses linguistic processing but no domain knowledge, and therefore does not restrict the subject matter of the source text. It is intended to deal robustly with complex texts that it cannot analyse completely accurately or in full. Experiments in summarising stories from the Wall Street Journal suggest there may be a benefit in identifying important material in a semantic representation rather than a surface one, but that, despite the robustness of the source representation, inaccuracies in CLASP’s linguistic analysis can dramatically affect the readability of its summaries. I discuss ways in which this and other problems might be overcome.

UCAM-CL-TR-485

Daryl Stewart, Myra VanInwegen:

Three notes on the interpretation of Verilog
January 2000, 47 pages, paper copy

UCAM-CL-TR-486

James Richard Thomas:

Stretching a point: aspect and temporal discourse
February 2000, 251 pages, paper copy
PhD thesis (Wolfson College, January 1999)

UCAM-CL-TR-487

Tanja Vos, Doaitse Swierstra:

Sequential program composition in UNITY
March 2000, 20 pages, paper copy

UCAM-CL-TR-488

Giampaolo Bella, Fabio Massacci, Lawrence Paulson, Piero Tramontano:

Formal verification of card-holder registration in SET
March 2000, 15 pages, paper copy

UCAM-CL-TR-489

Jong-Hyeon Lee:

Designing a reliable publishing framework
April 2000, 129 pages, PDF
PhD thesis (Wolfson College, January 2000)

Abstract: Due to the growth of the Internet and the widespread adoption of easy-to-use web browsers, the web provides a new environment for conventional as well as new businesses. Publishing on the web is a fundamental and important means of supporting various activities on the Internet such as commercial transactions, personal home page publishing, medical information distribution, public key certification and academic scholarly publishing. Along with the dramatic growth of the web, the number of reported frauds is increasing sharply. Since the Internet was not originally designed for web publishing, it has some weaknesses that undermine its reliability.

How can we rely on web publishing? In order to resolve this question, we need to examine what makes people confident when reading conventional publications printed on paper, to investigate what attacks can erode confidence in web publishing, and to understand the nature of publishing in general.

In this dissertation, we examine security properties and policy models, and their applicability to publishing. We then investigate the nature of publishing so that we can extract its technical requirements. To help us understand the practical mechanisms which might satisfy these requirements, some applications of electronic publishing are discussed and some example mechanisms are presented.

We conclude that guaranteed integrity, verifiable authenticity and persistent availability of publications are required to make web publishing more reliable. Hence we design a framework that can support these properties. To analyse the framework, we define a security policy for web publishing that focuses on the guaranteed integrity and authenticity of web publications, and then describe some technical primitives that enable us to achieve our requirements. Finally, the Jikzi publishing system, an implementation of our framework, is presented with descriptions of its architecture and possible applications.

UCAM-CL-TR-490

Peter John Cameron Brown:

Selective mesh refinement for rendering
April 2000, 179 pages, paper copy
PhD thesis (Emmanuel College, February 1998)

Abstract: A key task in computer graphics is the rendering of complex models. As a result, there exist a large number of schemes for improving the speed of the rendering process, many of which involve displaying only a simplified version of a model. When such a simplification is generated selectively, i.e. detail is only removed in specific regions of a model, we term this selective mesh refinement.

Selective mesh refinement can potentially produce a model approximation which can be displayed at greatly reduced cost while remaining perceptually equivalent to a rendering of the original. For this reason, the field of selective mesh refinement has been the subject of dramatically increased interest recently. The resulting selective refinement methods, though, are restricted in both the types of model which they can handle and the form of output meshes which they can generate.

Our primary thesis is that a selectively refined mesh can be produced by combining fragments of approximations to a model without regard to the underlying approximation method. Thus we can utilise existing approximation techniques to produce selectively refined meshes in n-dimensions. This means that the capabilities and characteristics of standard approximation methods can be retained in our selectively refined models.

We also show that a selectively refined approximation produced in this manner can be smoothly geometrically morphed into another selective refinement in order to satisfy modified refinement criteria. This geometric morphing is necessary to ensure that detail can be added and removed from models which are selectively refined with respect to their impact on the current view frustum. For example, if a model is selectively refined in this manner and the viewer approaches the model then more detail may have to be introduced to the displayed mesh in order to ensure that it satisfies the new refinement criteria. By geometrically morphing this introduction of detail we can ensure that the viewer is not distracted by “popping” artifacts.

We have developed a novel framework within which these proposals have been verified. This framework consists of a generalised resolution-based model representation, a means of specifying refinement criteria and algorithms which can perform the selective refinement and geometric morphing tasks. The framework has allowed us to demonstrate that these twin tasks can be performed both on the output of existing approximation techniques and with respect to a variety of refinement criteria.

An HTML version of this thesis is at
https://www.cl.cam.ac.uk/research/rainbow/publications/pjcb/thesis/

UCAM-CL-TR-491

Anna Korhonen, Genevieve Gorrell, Diana McCarthy:

Is hypothesis testing useful for subcategorization acquisition?
May 2000, 9 pages, paper copy

UCAM-CL-TR-492

Paweł Tomasz Wojciechowski:

Nomadic Pict: language and infrastructure design for mobile computation
June 2000, 184 pages, PDF, PostScript
PhD thesis (Wolfson College, March 2000)

Abstract: Mobile agents – units of executing computation that can migrate between machines – are likely to become an important enabling technology for future distributed systems. We study the distributed infrastructures required for location-independent communication between migrating agents. These infrastructures are problematic: the choice or design of an infrastructure must be somewhat application-specific – any given algorithm will only have satisfactory performance for some range of migration and communication behaviour; the algorithms must be matched to the expected properties (and robustness demands) of applications and the failure characteristic of the communication medium. To study this problem we introduce an agent programming language – Nomadic Pict. It is designed to allow infrastructure algorithms to be expressed clearly, as translations from a high-level language to a lower level. The levels are based on rigorously-defined process calculi, which provide sharp levels of abstraction. In this dissertation we describe the language and use it to develop a distributed infrastructure for an example application. The language and examples have been implemented; we conclude with a description of the compiler and runtime system.

UCAM-CL-TR-493

Giampaolo Bella:

Inductive verification of cryptographic protocols
July 2000, 189 pages, PDF
PhD thesis (Clare College, March 2000)

Abstract: The dissertation aims at tailoring Paulson’s Inductive Approach for the analysis of classical cryptographic protocols towards real-world protocols. The aim is pursued by extending the approach with new elements (e.g. timestamps and smart cards), new network events (e.g. message reception) and more expressive functions (e.g. agents’ knowledge). Hence, the aim is achieved by analysing large protocols (Kerberos IV and Shoup-Rubin), and by studying how to specify and verify their goals.

More precisely, the modelling of timestamps and of a discrete time are first developed on BAN Kerberos, while comparing the outcomes with those of the BAN logic. The machinery is then applied to Kerberos IV, whose complicated use of session keys requires a dedicated treatment. Three new guarantees limiting the spy’s abilities in case of compromise of a specific session key are established. Also, it is discovered that Kerberos IV is subject to an attack due to the weak guarantees of confidentiality for the protocol responder.

We develop general strategies to investigate the goals of authenticity, key distribution and non-injective agreement, which is a strong form of authentication. These strategies require formalising the agents’ knowledge of messages. Two approaches are implemented. If an agent creates a message, then he knows all components of the message, including the cryptographic key that encrypts it. Alternatively, a broad definition of agents’ knowledge can be developed if a new network event, message reception, is formalised.

The concept of smart card as a secure device that can store long-term secrets and perform easy computations is introduced. The model cards can be stolen and/or cloned by the spy. The kernel of their built-in algorithm works correctly, so the spy cannot acquire unlimited knowledge from their use. However, their functional interface is unreliable, so they send correct outputs in an unspecified order. The provably secure protocol based on smart cards designed by Shoup & Rubin is mechanised. Some design weaknesses (unknown to the authors’ treatment by Bellare & Rogaway’s approach) are unveiled, while feasible corrections are suggested and verified.

We realise that the evidence that a protocol achieves its goals must be available to the peers. In consequence, we develop a new principle of prudent protocol design, goal availability, which holds of a protocol when suitable guarantees confirming its goals exist on assumptions that both peers can verify. Failure to observe our principle raises the risk of attacks, as is the case, for example, of the attack on Kerberos IV.

UCAM-CL-TR-494

Mark David Spiteri:

An architecture for the notification, storage and retrieval of events
July 2000, 165 pages, paper copy
PhD thesis (Darwin College, January 2000)

UCAM-CL-TR-495

Mohammad S.M. Khorsheed:

Automatic recognition of words in Arabic manuscripts
July 2000, 242 pages, PDF
PhD thesis (Churchill College, June 2000)

Abstract: The need to transliterate large numbers of historic Arabic documents into machine-readable form has motivated new work on offline recognition of Arabic script. Arabic script presents two challenges: orthography is cursive and letter shape is context sensitive.

This dissertation presents two techniques to achieve high word recognition rates: the segmentation-free technique and the segmentation-based technique. The segmentation-free technique treats the word as a whole. The word image is first transformed into a normalised polar image. The two-dimensional Fourier transform is then applied to the polar image. This results in a Fourier spectrum that is invariant to dilation, translation, and rotation. The Fourier spectrum is used to form the word template, or train the word model in the template-based and the multiple hidden Markov model (HMM) recognition systems, respectively. The recognition of an input word image is based on the minimum distance measure from the word templates and the maximum likelihood probability for the word models.
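The rotation invariance claimed above rests on a standard fact, not specific to this thesis: rotating the word image becomes a cyclic shift along the angle axis of its polar image, and the magnitude of the 2-D discrete Fourier transform is unchanged by cyclic shifts. A tiny pure-Python check of that fact (my sketch, not the thesis code):

```python
import cmath

# Standard shift-theorem check (illustrative; not code from the thesis):
# the 2-D DFT magnitude is invariant under cyclic shifts, which is why
# a rotation (a cyclic shift of the polar image's angle axis) leaves
# the Fourier spectrum magnitude unchanged.

def dft2_magnitude(img):
    n, m = len(img), len(img[0])
    out = []
    for u in range(n):
        row = []
        for v in range(m):
            s = sum(img[x][y] *
                    cmath.exp(-2j * cmath.pi * (u * x / n + v * y / m))
                    for x in range(n) for y in range(m))
            row.append(abs(s))
        out.append(row)
    return out

def cyclic_shift_rows(img, k):
    return img[-k:] + img[:-k]

# A small stand-in for a polar image; rows play the role of angle bins.
img = [[0, 1, 2, 1], [1, 3, 1, 0], [0, 1, 0, 1], [2, 0, 1, 1]]
m1 = dft2_magnitude(img)
m2 = dft2_magnitude(cyclic_shift_rows(img, 2))
diff = max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
```

The maximum difference between the two magnitude spectra is zero up to floating-point error, confirming the invariance the recognition scheme relies on.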

The segmentation-based technique uses a single hidden Markov model, which is composed of multiple character-models. The technique implements the analytic approach in which words are segmented into smaller units, not necessarily characters. The word skeleton is decomposed into a number of links in orthographic order; it is then transferred into a sequence of discrete symbols using vector quantisation. The training of each character-model is performed using either state assignment in the lexicon-driven configuration or the Baum-Welch method in the lexicon-free configuration. The observation sequence of the input word is given to the hidden Markov model and the Viterbi algorithm is applied to provide an ordered list of the candidate recognitions.
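The decoding step named above is the standard Viterbi algorithm: given an HMM and a sequence of discrete observation symbols, it recovers the most likely hidden state path by dynamic programming. A minimal generic version (the textbook algorithm, not the thesis implementation; the two-state example below is invented):

```python
# Minimal Viterbi decoder (textbook algorithm, illustrative parameters).

def viterbi(obs, states, start_p, trans_p, emit_p):
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, p = max(((r, V[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda x: x[1])
            V[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Tiny hypothetical HMM over symbols 0/1: state "A" favours 0, "B" favours 1.
states = ["A", "B"]
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {"A": {0: 0.9, 1: 0.1}, "B": {0: 0.2, 1: 0.8}}
path = viterbi([0, 0, 1, 1], states, start_p, trans_p, emit_p)
```

For the observation sequence 0, 0, 1, 1 the decoder switches from the 0-favouring state to the 1-favouring one, yielding the path A, A, B, B. In the thesis' lexicon-driven setting the states are drawn from concatenated character-models and an n-best variant supplies the ordered candidate list.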

UCAM-CL-TR-496

Gian Luca Cattani, James J. Leifer, Robin Milner:

Contexts and embeddings for closed shallow action graphs
July 2000, 56 pages, PostScript

Abstract: Action calculi, which have a graphical presentation, were introduced to develop a theory shared among different calculi for interactive systems. The π-calculus, the λ-calculus, Petri nets, the Ambient calculus and others may all be represented as action calculi. This paper develops a part of the shared theory.

A recent paper by two of the authors was concerned with the notion of reactive system, essentially a category of process contexts whose behaviour is presented as a reduction relation. It was shown that one can, for any reactive system, uniformly derive a labelled transition system whose associated behavioural equivalence relations (e.g. trace equivalence or bisimilarity) will be congruential, under the condition that certain relative pushouts exist in the reactive system. In the present paper we treat closed, shallow action calculi (those with no free names and no nested actions) as a generic application of these results. We define a category of action graphs and embeddings, closely linked to a category of contexts which forms a reactive system. This connection is of independent interest; it also serves our present purpose, as it enables us to demonstrate that appropriate relative pushouts exist.

Complemented by work to be reported elsewhere, this demonstration yields labelled transition systems with behavioural congruences for a substantial class of action calculi. We regard this work as a step towards comparable results for the full class.

UCAM-CL-TR-497

G.M. Bierman, A. Trigoni:

Towards a formal type system for ODMG OQL
September 2000, 20 pages, paper copy

UCAM-CL-TR-498

Peter Sewell:

Applied π – a brief tutorial
July 2000, 65 pages, PDF, PostScript

Abstract: This note provides a brief introduction to π-calculi and their application to concurrent and distributed programming. Chapter 1 introduces a simple π-calculus and discusses the choice of primitives, operational semantics (in terms of reductions and of indexed early labelled transitions), operational equivalences, Pict-style programming and typing. Chapter 2 goes on to discuss the application of these ideas to distributed systems, looking informally at the design of distributed π-calculi with grouping and interaction primitives. Chapter 3 returns to typing, giving precise definitions for a simple type system and soundness results for the labelled transition semantics. Finally, Chapters 4 and 5 provide a model development of the metatheory, giving first an outline and then detailed proofs of the results stated earlier. The note can be read in the partial order 1.(2+3+4.5).
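The core of the reduction semantics such a tutorial introduces is the communication rule: an output and an input on the same channel synchronize, and the transmitted name is substituted into the receiver. A toy animation of that single rule (my illustration, not code from the note; the process shapes are deliberately minimal):

```python
from dataclasses import dataclass

# Toy animation of the pi-calculus communication rule (illustrative):
#   xbar<a>.P | x(b).Q   -->   P | Q{a/b}

@dataclass
class Nil:                       # the inert process 0
    pass

@dataclass
class Send:                      # chbar<msg>.0, an output with no continuation
    ch: str
    msg: str

@dataclass
class Out:                       # chbar<msg>.cont
    ch: str
    msg: str
    cont: object

@dataclass
class In:                        # ch(var).cont
    ch: str
    var: str
    cont: object

def subst(p, var, name):
    """Substitute name for var in p (only the cases this sketch needs)."""
    if isinstance(p, Send):
        return Send(name if p.ch == var else p.ch,
                    name if p.msg == var else p.msg)
    return p                     # Nil; deeper cases omitted in this sketch

def communicate(out, inp):
    assert out.ch == inp.ch, "no redex: the channels differ"
    return out.cont, subst(inp.cont, inp.var, out.msg)

# Example: xbar<a>.0 | x(b).bbar<c>.0  reduces to  0 | abar<c>.0
p, q = communicate(Out("x", "a", Nil()), In("x", "b", Send("b", "c")))
```

The received name a replaces the bound name b in the receiver's continuation, so the receiver goes on to output on the channel it was just told about; this name-passing is what distinguishes π-calculi from simpler value-passing process calculi.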

UCAM-CL-TR-499

James Edward Gain:

Enhancing spatial deformation for virtual sculpting
August 2000, 161 pages, PDF
PhD thesis (St John’s College, June 2000)

Abstract: The task of computer-based free-form shape design is fraught with practical and conceptual difficulties. Incorporating elements of traditional clay sculpting has long been recognised as a means of shielding a user from the complexities inherent in this form of modelling. The premise is to deform a mathematically-defined solid in a fashion that loosely simulates the physical moulding of an inelastic substance, such as modelling clay or silicone putty. Virtual sculpting combines this emulation of clay sculpting with interactive feedback.

Spatial deformations are a class of powerful modelling techniques well suited to virtual sculpting. They indirectly reshape an object by warping the surrounding space. This is analogous to embedding a flexible shape within a lump of jelly and then causing distortions by flexing the jelly. The user controls spatial deformations by manipulating points, curves or a volumetric hyperpatch. Directly Manipulated Free-Form Deformation (DMFFD), in particular, merges the hyperpatch- and point-based approaches and allows the user to pick and drag object points directly.

This thesis embodies four enhancements to the versatility and validity of spatial deformation:

1. We enable users to specify deformations by manipulating the normal vector and tangent plane at a point. A first derivative frame can be tilted, twisted and scaled to cause a corresponding distortion in both the ambient space and inset object. This enhanced control is accomplished by extending previous work on bivariate surfaces to trivariate hyperpatches.

2. We extend DMFFD to enable curve manipulation by exploiting functional composition and degree reduction. Although the resulting curve-composed DMFFD introduces some modest and bounded approximation, it is superior to previous curve-based schemes in other respects. Our technique combines all three forms of spatial deformation (hyperpatch, point and curve), can maintain any desired degree of derivative continuity, is amenable to the automatic detection and prevention of self-intersection, and achieves interactive update rates over the entire deformation cycle.

3. The approximation quality of a polygon-mesh object frequently degrades under spatial deformation to become either oversaturated or undersaturated with polygons. We have devised an efficient adaptive mesh refinement and decimation scheme. Our novel contributions include: incorporating fully symmetrical decimation, reducing the computation cost of the refinement/decimation trigger, catering for boundary and crease edges, and dealing with sampling problems.

4. The potential self-intersection of an object is a serious weakness in spatial deformation. We have developed a variant of DMFFD which guards against self-intersection by subdividing manipulations into injective (one-to-one) mappings. This depends on three novel contributions: analytic conditions for identifying self-intersection, and two injectivity tests (one exact but computationally costly and the other approximate but efficient).

UCAM-CL-TR-500

Jianxin Yan, Alan Blackwell, Ross Anderson, Alasdair Grant:


The memorability and security of passwords – some empirical results
September 2000, 13 pages, PDF

Abstract: There are many things that are ‘well known’ about passwords, such as that users can’t remember strong passwords and that the passwords they can remember are easy to guess. However, there seems to be a distinct lack of research on the subject that would pass muster by the standards of applied psychology.

Here we report a controlled trial in which, of four sample groups of about 100 first-year students, three were recruited to a formal experiment and of these two were given specific advice about password selection. The incidence of weak passwords was determined by cracking the password file, and the number of password resets was measured from system logs. We observed a number of phenomena which run counter to the established wisdom. For example, passwords based on mnemonic phrases are just as hard to crack as random passwords yet just as easy to remember as naive user selections.
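
To make the mnemonic-phrase idea concrete, one common recipe (a hedged sketch; the example phrase and function name are invented here, not taken from the study) builds a password from the initial characters of a memorable sentence, so the result looks random to a cracker but is easy for its owner to regenerate:

```python
def mnemonic_password(phrase):
    """Take the first character of each word, preserving case and digits,
    so a memorable sentence yields a hard-to-guess but recallable string."""
    return "".join(word[0] for word in phrase.split())

# "My dog Rex eats 2 bones every day" -> "MdRe2bed"
print(mnemonic_password("My dog Rex eats 2 bones every day"))
```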

UCAM-CL-TR-501

David Ingram:

Integrated quality of service management
September 2000, 90 pages, paper copy
PhD thesis (Jesus College, August 2000)

UCAM-CL-TR-502

Thomas Marthedal Rasmussen:

Formalizing basic number theory
September 2000, 20 pages, paper copy

UCAM-CL-TR-503

Alan Mycroft, Richard Sharp:

Hardware/software co-design using functional languages
September 2000, 8 pages, PDF

Abstract: In previous work we have developed and prototyped a silicon compiler which translates a functional language (SAFL) into hardware. Here we present a SAFL-level program transformation which: (i) partitions a specification into hardware and software parts and (ii) generates a specialised architecture to execute the software part. The architecture consists of a number of interconnected heterogeneous processors. Our method allows a large design space to be explored by systematically transforming a single SAFL specification to investigate different points on the area-time spectrum.

UCAM-CL-TR-504

Oi Yee Kwong:

Word sense selection in texts: an integrated model
September 2000, 177 pages, PostScript
PhD thesis (Downing College, May 2000)

Abstract: Early systems for word sense disambiguation (WSD) often depended on individual tailor-made lexical resources, hand-coded with as much lexical information as needed, but of severely limited vocabulary size. Recent studies tend to extract lexical information from a variety of existing resources (e.g. machine-readable dictionaries, corpora) for broad coverage. However, this raises the issue of how to combine the information from different resources.

Thus while different types of resource could make different contributions to WSD, studies to date have not shown what contribution they make, how they should be combined, and whether they are equally relevant to all words to be disambiguated. This thesis proposes an Integrated Model as a framework to study the interrelatedness of three major parameters in WSD: Lexical Resource, Contextual Information, and Nature of Target Words. We argue that it is their interaction which shapes the effectiveness of any WSD system.

A generalised, structurally-based sense-mapping algorithm was designed to combine various types of lexical resource. This enables information from these resources to be used simultaneously and compatibly, while respecting their distinctive structures. In studying the effect of context on WSD, different semantic relations available from the combined resources were used, and a recursive filtering algorithm was designed to overcome combinatorial explosion. We then investigated, from two directions, how the target words themselves could affect the usefulness of different types of knowledge. In particular, we modelled WSD with the cloze test format, i.e. as texts with blanks and all senses for one specific word as alternative choices for filling the blank.

A full-scale combination of WordNet and Roget’s Thesaurus was done, linking more than 30,000 senses. Using these two resources in combination, a range of disambiguation tests was done on more than 60,000 noun instances from corpus texts of different types, and 60 blanks from real cloze texts. Results show that combining resources is useful for enriching lexical information, and hence making WSD more effective though not completely. Also, different target words make different demands on contextual information, and this interaction is closely related to text types. Future work is suggested for expanding the analysis on target nature and making the combination of disambiguation evidence sensitive to the requirements of the word being disambiguated.


UCAM-CL-TR-505

Gian Luca Cattani, Peter Sewell:

Models for name-passing processes: interleaving and causal
September 2000, 42 pages, PDF, PostScript, DVI

Abstract: We study syntax-free models for name-passing processes. For interleaving semantics, we identify the indexing structure required of an early labelled transition system to support the usual π-calculus operations, defining Indexed Labelled Transition Systems. For noninterleaving causal semantics we define Indexed Labelled Asynchronous Transition Systems, smoothly generalizing both our interleaving model and the standard Asynchronous Transition Systems model for CCS-like calculi. In each case we relate a denotational semantics to an operational view, for bisimulation and causal bisimulation respectively. We establish completeness properties of, and adjunctions between, categories of the two models. Alternative indexing structures and possible applications are also discussed. These are first steps towards a uniform understanding of the semantics and operations of name-passing calculi.

UCAM-CL-TR-506

Peter Sewell:

Modules, abstract types, and distributed versioning
September 2000, 46 pages, PDF, PostScript, DVI

Abstract: In a wide-area distributed system it is often impractical to synchronise software updates, so one must deal with many coexisting versions. We study static typing support for modular wide-area programming, modelling separate compilation/linking and execution of programs that interact along typed channels. Interaction may involve communication of values of abstract types; we provide the developer with fine-grain versioning control of these types to support interoperation of old and new code. The system makes use of a second-class module system with singleton kinds; we give a novel operational semantics for separate compilation/linking and execution and prove soundness.

UCAM-CL-TR-507

Lawrence Paulson:

Mechanizing a theory of program composition for UNITY
November 2000, 28 pages, PDF

Abstract: Compositional reasoning must be better understood if non-trivial concurrent programs are to be verified. Chandy and Sanders [2000] have proposed a new approach to reasoning about composition, which Charpentier and Chandy [1999] have illustrated by developing a large example in the UNITY formalism. The present paper describes extensive experiments on mechanizing the compositionality theory and the example, using the proof tool Isabelle. Broader issues are discussed, in particular, the formalization of program states. The usual representation based upon maps from variables to values is contrasted with the alternatives, such as a signature of typed variables. Properties need to be transferred from one program component’s signature to the common signature of the system. Safety properties can be so transferred, but progress properties cannot be. Using polymorphism, this problem can be circumvented by making signatures sufficiently flexible. Finally the proof of the example itself is outlined.

UCAM-CL-TR-508

James Leifer, Robin Milner:

Shallow linear action graphs and their embeddings
October 2000, 16 pages, PostScript

Abstract: In previous work, action calculus has been presented in terms of action graphs. Many calculi, or at least their salient features, can be expressed as specific action calculi; examples are Petri nets, λ-calculus, π-calculus, fusion calculus, ambient calculus and spi calculus.

We here offer linear action graphs as a primitive basis for action calculi. Linear action graphs have a simpler theory than the non-linear variety. This paper presents the category of embeddings of shallow linear action graphs (those without nesting), using a novel form of graphical reasoning which simplifies some otherwise complex manipulations in regular algebra. The work is done for undirected graphs, and adapted in a few lines to directed graphs.

The graphical reasoning used here will be applied in future work to develop behavioural congruences for action calculi.

UCAM-CL-TR-509

Wojciech Basalaj:

Proximity visualisation of abstract data
January 2001, 117 pages, PDF
PhD thesis (October 2000)


Abstract: Data visualisation is an established technique for exploration, analysis and presentation of data. A graphical presentation is generated from the data content, and viewed by an observer, engaging vision – the human sense with the greatest bandwidth, and the ability to recognise patterns subconsciously. For instance, a correlation present between two variables can be elucidated with a scatter plot. An effective visualisation can be difficult to achieve for an abstract collection of objects, e.g. a database table with many attributes, or a set of multimedia documents, since there is no immediately obvious way of arranging the objects based on their content. Thankfully, similarity between pairs of elements of such a collection can be measured, and a good overview picture should respect this proximity information, by positioning similar elements close to one another, and far from dissimilar objects. The resulting proximity visualisation is a topology preserving map of the underlying data collection, and this work investigates various methods for generating such maps. A number of algorithms are devised, evaluated quantitatively by means of statistical inference, and qualitatively in a case study for each type of data collection. Other graphical representations for abstract data are surveyed and compared to proximity visualisation.

A standard method for modelling proximity relations is multidimensional scaling (MDS) analysis. The result is usually a two- or three-dimensional configuration of points – each representing a single element from a collection – with inter-point distances approximating the corresponding proximities. The quality of this approximation can be expressed as a loss function, and the optimal arrangement can be found by minimising it numerically – a procedure known as least-squares metric MDS. This work presents a number of algorithmic instances of this problem, using established function optimisation heuristics: Newton-Raphson, Tabu Search, Genetic Algorithm, Iterative Majorization, and Simulated Annealing. Their effectiveness at minimising the loss function is measured for a representative sample of data collections, and the relative ranking established. The popular classical scaling method serves as a benchmark for this study.
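
A minimal sketch of least-squares metric MDS, using plain gradient descent on the raw stress loss (the thesis evaluates more sophisticated optimisers on this kind of loss; the function names and learning-rate choices here are illustrative assumptions, not the thesis’s code):

```python
import math
import random

def stress(X, D):
    """Raw stress: squared mismatch between configuration distances
    and the target proximities D[i][j]."""
    s = 0.0
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(X[i], X[j])
            s += (d - D[i][j]) ** 2
    return s

def mds(D, dim=2, iters=2000, lr=0.01, seed=0):
    """Least-squares metric MDS by gradient descent on raw stress."""
    rng = random.Random(seed)
    n = len(D)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    for _ in range(iters):
        G = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # d/dx_i of (d_ij - D_ij)^2 is 2(d_ij - D_ij)(x_i - x_j)/d_ij
                d = math.dist(X[i], X[j]) or 1e-12
                c = 2.0 * (d - D[i][j]) / d
                for k in range(dim):
                    G[i][k] += c * (X[i][k] - X[j][k])
        for i in range(n):
            for k in range(dim):
                X[i][k] -= lr * G[i][k]
    return X

# Three mutually equidistant elements should land on an equilateral triangle.
D = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
X = mds(D)
print(stress(X, D))
```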

The computational cost of conventional MDS makes it unsuitable for visualising a large data collection. Incremental multidimensional scaling solves this problem by considering only a carefully chosen subset of all pairwise proximities. Elements that make up cluster diameters at a certain level of the single link cluster hierarchy are identified, and are subject to standard MDS, in order to establish the overall shape of the configuration. The remaining elements are positioned independently of one another with respect to this skeleton configuration. For very large collections the skeleton configuration can itself be built up incrementally. The incremental method is analysed for the compromise between solution quality and the proportion of proximities used, and compared to Principal Components Analysis on a number of large database tables.

In some applications it is convenient to represent individual objects by compact icons of fixed size, for example the use of thumbnails when visualising a set of images. Because the MDS analysis only takes the position of icons into account, and not their size, its direct use for visualisation may lead to partial or complete overlap of icons. Proximity grid – an analogue of MDS in a discrete domain – is proposed to overcome this deficiency. Each element of an abstract data collection is represented within a single cell of the grid, and thus considerable detail can be shown without overlap. The proximity relationships are preserved by clustering similar elements in the grid, and keeping dissimilar ones apart. Algorithms for generating such an arrangement are presented and compared in terms of output quality to one another as well as standard MDS.

UCAM-CL-TR-510

Richard Mortier, Rebecca Isaacs, Keir Fraser:

Switchlets and resource-assured MPLS networks
May 2000, 16 pages, PDF, PostScript

Abstract: MPLS (Multi-Protocol Label Switching) is a technology with the potential to support multiple control systems, each with guaranteed QoS (Quality of Service), on connectionless best-effort networks. However, it does not provide all the capabilities required of a multi-service network. In particular, although resource-assured VPNs (Virtual Private Networks) can be created, there is no provision for inter-VPN resource management. Control flexibility is limited because resources must be pinned down to be guaranteed, and best-effort flows in different VPNs compete for the same resources, leading to QoS crosstalk.

The contribution of this paper is an implementation on MPLS of a network control framework that supports inter-VPN resource management. Using resource partitions known as switchlets, it allows the creation of multiple VPNs with guaranteed resource allocations, and maintains isolation between these VPNs. Devolved control techniques permit each VPN a customised control system.

We motivate our work by discussing related efforts and example scenarios of effective deployment of our system. The implementation is described and evaluated, and we address interoperability with external IP control systems, in addition to interoperability of data across different layer 2 technologies.

UCAM-CL-TR-511

Calum Grant:

Software visualization in Prolog
December 1999, 193 pages, PDF
PhD thesis (Queens’ College, 1999)


Abstract: Software visualization (SV) uses computer graphics to communicate the structure and behaviour of complex software and algorithms. One of the important issues in this field is how to specify SV, because existing systems are very cumbersome to specify and implement, which limits their effectiveness and hinders SV from being integrated into professional software development tools.

In this dissertation the visualization process is decomposed into a series of formal mappings, which provides a formal foundation, and allows separate aspects of visualization to be specified independently. The first mapping specifies the information content of each view. The second mapping specifies a graphical representation of the information, and a third mapping specifies the graphical components that make up the graphical representation. By combining different mappings, completely different views can be generated.

The approach has been implemented in Prolog to provide a very high level specification language for information visualization, and a knowledge engineering environment that allows data queries to tailor the information in a view. The output is generated by a graphical constraint solver that assembles the graphical components into a scene.

This system provides a framework for SV called Vmax. Source code and run-time data are analyzed by Prolog to provide access to information about the program structure and run-time data for a wide range of highly interconnected browsable views. Different views and means of visualization can be selected from menus. An automatic legend describes each view, and can be interactively modified to customize how data is presented. A text window for editing source code is synchronized with the graphical view. Vmax is a complete Java development environment and end user SV system.

Vmax compares favourably to existing SV systems in many taxonometric criteria, including automation, scope, information content, graphical output form, specification, tailorability, navigation, granularity and elision control. The performance and scalability of the new approach is very reasonable.

We conclude that Prolog provides a formal and high level specification language that is suitable for specifying all aspects of an SV system.

UCAM-CL-TR-512

Anthony Fox:

An algebraic framework for modelling and verifying microprocessors using HOL
March 2001, 24 pages, PDF

Abstract: This report describes an algebraic approach to the specification and verification of microprocessor designs. Key results are expressed and verified using the HOL proof tool. Particular attention is paid to the models of time and temporal abstraction, culminating in a number of one-step theorems. This work is then explained with a small but complete case study, which verifies the correctness of a datapath with microprogram control.

UCAM-CL-TR-513

Tetsuya Sakai, Karen Sparck Jones:

Generic summaries for indexing in information retrieval – Detailed test results
May 2001, 29 pages, PostScript

Abstract: This paper examines the use of generic summaries for indexing in information retrieval. Our main observations are that:

– With or without pseudo-relevance feedback, a summary index may be as effective as the corresponding fulltext index for precision-oriented search of highly relevant documents. But a reasonably sophisticated summarizer, using a compression ratio of 10–30%, is desirable for this purpose.

– In pseudo-relevance feedback, using a summary index at initial search and a fulltext index at final search is possibly effective for precision-oriented search, regardless of relevance levels. This strategy is significantly more effective than the one using the summary index only and probably more effective than using summaries as mere term selection filters. For this strategy, the summary quality is probably not a critical factor, and a compression ratio of 5–10% appears best.

UCAM-CL-TR-514

Asis Unyapoth:

Nomadic π-calculi: expressing and verifying communication infrastructure for mobile computation
June 2001, 316 pages, PDF, PostScript
PhD thesis (Pembroke College, March 2001)

Abstract: This thesis addresses the problem of verifying distributed infrastructure for mobile computation. In particular, we study language primitives for communication between mobile agents. They can be classified into two groups. At a low level there are “location dependent” primitives that require a programmer to know the current site of a mobile agent in order to communicate with it. At a high level there are “location independent” primitives that allow communication with a mobile agent irrespective of any migrations. Implementation of the high level requires delicate distributed infrastructure algorithms. In earlier work of Sewell, Wojciechowski and Pierce, the two levels were made precise as process calculi, allowing such algorithms to be expressed as encodings of the high level into the low level; a distributed programming language “Nomadic Pict” has been built for experimenting with such encodings.

This thesis turns to semantics, giving a definition of the core language (with a type system) and proving correctness of an example infrastructure. This involves extending the standard semantics and proof techniques of process calculi to deal with the new notions of sites and agents. The techniques adopted include labelled transition semantics, operational equivalences and preorders (e.g., expansion and coupled simulation), “up to” equivalences, and uniform receptiveness. We also develop two novel proof techniques for capturing the design intuitions regarding mobile agents: we consider “translocating” versions of operational equivalences that take migration into account, allowing compositional reasoning; and “temporary immobility”, which captures the intuition that while an agent is waiting for a lock somewhere in the system, it will not migrate.

The correctness proof of an example infrastructure is non-trivial. It involves analysing the possible reachable states of the encoding applied to an arbitrary high-level source program. We introduce an intermediate language for factoring out as many ‘house-keeping’ reduction steps as possible, and focusing on the partially-committed steps.

UCAM-CL-TR-515

Andrei Serjantov, Peter Sewell,Keith Wansbrough:

The UDP calculus: rigorous semantics for real networking
July 2001, 70 pages, PostScript

Abstract: Network programming is notoriously hard to understand: one has to deal with a variety of protocols (IP, ICMP, UDP, TCP, etc.), concurrency, packet loss, host failure, timeouts, the complex sockets interface to the protocols, and subtle portability issues. Moreover, the behavioural properties of operating systems and the network are not well documented.

A few of these issues have been addressed in the process calculus and distributed algorithm communities, but there remains a wide gulf between what has been captured in semantic models and what is required for a precise understanding of the behaviour of practical distributed programs that use these protocols.

In this paper we demonstrate (in a preliminary way) that the gulf can be bridged. We give an operational model for socket programming with a substantial fraction of UDP and ICMP, including loss and failure. The model has been validated by experiment against actual systems. It is not tied to a particular programming language, but can be used with any language equipped with an operational semantics for system calls – here we give such a language binding for an OCaml fragment. We illustrate the model with a few small network programs.

UCAM-CL-TR-516

Rebecca Isaacs:

Dynamic provisioning of resource-assured and programmable virtual private networks
September 2001, 145 pages, PostScript
PhD thesis (Darwin College, December 2000)

Abstract: Virtual Private Networks (VPNs) provide dedicated connectivity to a closed group of users on a shared network. VPNs have traditionally been deployed for reasons of economy of scale, but have either been statically defined, requiring manual configuration, or else unable to offer any quality of service (QoS) guarantees.

This dissertation describes VServ, a service offering dynamic and resource-assured VPNs that can be acquired and modified on demand. In VServ, a VPN is both a subset of physical resources, such as bandwidth and label space, together with the means to perform fine-grained management of those resources. This network programmability, combined with QoS guarantees, enables the multiservice network – a single universal network that can support all types of service and thus be efficient, cost-effective and flexible.

VServ is deployed over a network control framework known as Tempest. The Tempest explicitly distinguishes between inter- and intra-VPN resource management mechanisms. This makes the dynamic resource reallocation capabilities of VServ viable, whilst handling highly dynamic VPNs or a large number of VPNs. Extensions to the original implementation of the Tempest to support dynamically reconfigurable QoS are detailed.

A key part of a dynamic and responsive VPN service is fully automated VPN provisioning. A notation for VPN specification is described, together with mechanisms for incorporating policies of the service provider and the current resource availability in the network into the design process. The search for a suitable VPN topology can be expressed as an optimisation problem that is not computationally tractable except for very small networks. This dissertation describes how the search is made practical by tailoring it according to the characteristics of the desired VPN.

Availability of VServ is addressed with a proposal for distributed VPN creation. A resource revocation protocol exploits the dynamic resource management capabilities of VServ to allow adaptation in the control plane on a per-VPN basis. Managed resource revocation supports highly flexible resource allocation and reallocation policies, allowing VServ to efficiently provision for short-lived or highly dynamic VPNs.

UCAM-CL-TR-517

Karen Sparck Jones, P. Jourlin, S.E. Johnson, P.C. Woodland:

The Cambridge Multimedia Document Retrieval Project: summary of experiments
July 2001, 30 pages, PostScript, DVI

Abstract: This report summarises the experimental work done under the Multimedia Document Retrieval (MDR) project at Cambridge from 1997–2000, with selected illustrations. The focus is primarily on retrieval studies, and on speech tests directly related to retrieval, not on speech recognition itself. The report draws on the many and varied tests done during the project, but also presents a new series of results designed to compare strategies across as many different data sets as possible by using consistent system parameter settings.

The project tests demonstrate that retrieval from files of audio news material transcribed using a state of the art speech recognition system can match the reference level defined by human transcriptions; and that expansion techniques, especially when applied to queries, can be very effective means for improving basic search performance.

UCAM-CL-TR-518

Jeff Jianxin Yan, Yongdong Wu:

An attack on a traitor tracing scheme
July 2001, 14 pages, PDF

Abstract: In Crypto’99, Boneh and Franklin proposed a public key traitor tracing scheme, which was believed to be able to catch all traitors while not accusing any innocent users (i.e., full-tracing and error-free). Assuming that the Decision Diffie-Hellman problem is unsolvable in Gq, Boneh and Franklin proved that a decoder cannot distinguish valid ciphertexts from invalid ones that are used for tracing. However, our novel pirate decoder P3 manages to make some invalid ciphertexts distinguishable without violating their assumption, and it can also frame innocent user coalitions to fool the tracer. Neither the single-key nor arbitrary pirate tracing algorithm presented in [1] can identify all keys used by P3 as claimed. Instead, it is possible for both algorithms to catch none of the traitors. We believe that the construction of our novel pirate also demonstrates a simple way to defeat some other black-box traitor tracing schemes in general.

UCAM-CL-TR-519

Martin Choquette:

Local evidence in document retrieval
August 2001, 177 pages, paper copy
PhD thesis (Fitzwilliam College, November 2002)

UCAM-CL-TR-520

Mohamed Hassan, Neil A. Dodgson:

Ternary and three-point univariate subdivision schemes
September 2001, 8 pages, PDF

Abstract: The generating function formalism is used to analyze the continuity properties of univariate ternary subdivision schemes. These are compared with their binary counterparts.
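
For illustration only (this is not a scheme from the report): a ternary scheme refines a control polygon by replacing each segment with three, so one refinement step triples the number of pieces, in contrast to a binary scheme’s doubling. The simplest instance is the C0 ternary linear scheme, which inserts new points at parameters 1/3 and 2/3 of each edge:

```python
def ternary_linear_subdivide(points, steps=1):
    """Refine a 1D control polygon with the ternary linear scheme:
    each edge (a, b) contributes a, (2a+b)/3 and (a+2b)/3, and the
    last point is kept.  One step triples the number of segments."""
    for _ in range(steps):
        fine = []
        for a, b in zip(points, points[1:]):
            fine += [a, (2 * a + b) / 3, (a + 2 * b) / 3]
        fine.append(points[-1])
        points = fine
    return points

print(ternary_linear_subdivide([0.0, 3.0]))  # [0.0, 1.0, 2.0, 3.0]
```

In the generating function formalism the scheme is summarised by the mask polynomial of these coefficients, and (as in the binary case) continuity analysis proceeds by factoring out the ternary smoothing factor (1 + z + z²)/3.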

UCAM-CL-TR-521

James Leifer:

Operational congruences for reactive systems
September 2001, 144 pages, PostScript
PhD thesis (Trinity College, March 2001)

Abstract: The dynamics of process calculi, e.g. CCS, have often been defined using a labelled transition system (LTS). More recently it has become common when defining dynamics to use reaction rules – i.e. unlabelled transition rules – together with a structural congruence. This form, which I call a reactive system, is highly expressive but is limited in an important way: LTSs lead more naturally to operational equivalences and preorders.

So one would like to derive from reaction rules a suitable LTS. This dissertation shows how to derive an LTS for a wide range of reactive systems. A label for an agent (process), a, is defined to be any context, F, which intuitively is just large enough so that the agent Fa (“a in context F”) is able to perform a reaction. The key contribution of my work is the precise definition of “just large enough”, in terms of the categorical notion of relative pushout (RPO), which ensures that several operational equivalences and preorders (strong bisimulation, weak bisimulation, the traces preorder, and the failures preorder) are congruences when sufficient RPOs exist.

I present a substantial example of a family of reactive systems based on closed, shallow action calculi (those with no free names and no nesting). I prove that RPOs exist for a category of such contexts. The proof is carried out indirectly in terms of a category of action graphs and embeddings and gives precise (necessary and sufficient) conditions for the existence of RPOs. I conclude by arguing that these conditions are satisfied for a wide class of reaction rules. The thrust of this dissertation is, therefore, towards easing the burden of exploring new models of computation by providing a general method for achieving useful operational congruences.

UCAM-CL-TR-522

Mark F.P. Gillies:

Practical behavioural animation based on vision and attention
September 2001, 187 pages, PDF, PostScript

Abstract: The animation of human like characters is a vital aspect of computer animation. Most animations rely heavily on characters of some sort or other. This means that one important aspect of computer animation research is to improve the animation of these characters both by making it easier to produce animations and by improving the quality of animation produced. One approach to animating characters is to produce a simulation of the behaviour of the characters which will automatically animate the character.

The dissertation investigates the simulation of behaviour in practical applications. In particular it focuses on models of visual perception for use in simulating human behaviour. A simulation of perception is vital for any character that interacts with its surroundings. Two main aspects of the simulation of perception are investigated:

– The use of psychology for designing visual algorithms.

– The simulation of attention in order to produce both behaviour and gaze patterns.

Psychological theories are a useful starting point for designing algorithms for simulating visual perception. The dissertation investigates their use and presents some algorithms based on psychological theories.

Attention is the focusing of a person’s perception on a particular object. The dissertation presents a simulation of what a character is attending to (looking at). This is used to simulate behaviour and for animating eye movements.

The algorithms for the simulation of vision and attention are applied to two tasks in the simulation of behaviour. The first is a method for designing generic behaviour patterns from simple pieces of motion. The second is a behaviour pattern for navigating a cluttered environment. The simulation of vision and attention gives advantages over existing work on both problems. The approaches to the simulation of perception will be evaluated in the context of these examples.

UCAM-CL-TR-523

Robin Milner:

Bigraphical reactive systems: basic theory
September 2001, 87 pages, PDF

Abstract: A notion of bigraph is proposed as the basis for a model of mobile interaction. A bigraph consists of two independent structures: a topograph representing locality and a monograph representing connectivity. Bigraphs are equipped with reaction rules to form bigraphical reactive systems (BRSs), which include versions of the π-calculus and the ambient calculus. Bigraphs are shown to be a special case of a more abstract notion, wide reactive systems (WRSs), not assuming any particular graphical or other structure but equipped with a notion of width, which expresses that agents, contexts and reactions may all be widely distributed entities.

A behavioural theory is established for WRSs using the categorical notion of relative pushout; it allows labelled transition systems to be derived uniformly, in such a way that familiar behavioural preorders and equivalences, in particular bisimilarity, are congruential under certain conditions. Then the theory of bigraphs is developed, and they are shown to meet these conditions. It is shown that, using certain functors, other WRSs which meet the conditions may also be derived; these may, for example, be forms of BRS with additional structure.

Simple examples of bigraphical systems are discussed; the theory is developed in a number of ways in preparation for deeper application studies.

UCAM-CL-TR-524

Giampaolo Bella, Fabio Massacci,Lawrence C. Paulson:

Verifying the SET purchase protocols
November 2001, 14 pages, PDF

Abstract: The Secure Electronic Transaction (SET) protocol has been proposed by a consortium of credit card companies and software corporations to guarantee the authenticity of e-commerce transactions and the confidentiality of data. When the customer makes a purchase, the SET dual signature keeps his account details secret from the merchant and his choice of goods secret from the bank. This paper reports verification results for the purchase step of SET, using the inductive method. The credit card details do remain confidential. The customer, merchant and bank can confirm most details of a transaction even when some of those details are kept from them. The usage of dual signatures requires repetition in protocol messages, making proofs more difficult but still feasible. The formal analysis has revealed a significant defect. The dual signature lacks explicitness, giving rise to potential vulnerabilities.

UCAM-CL-TR-525

Timothy L. Harris:

Extensible virtual machines
December 2001, 209 pages, PDF
PhD thesis

Abstract: Virtual machines (VMs) have enjoyed a resurgence as a way of allowing the same application program to be used across a range of computer systems. This flexibility comes from the abstraction that the VM provides over the native interface of a particular computer. However, this also means that the application is prevented from taking the features of particular physical machines into account in its implementation.

This dissertation addresses the question of why, where and how it is useful, possible and practicable to provide an application with access to lower-level interfaces. It argues that many aspects of implementation can be devolved safely to untrusted applications, and demonstrates this through a prototype which allows control over run-time compilation, object placement within the heap and thread scheduling. The proposed architecture separates these application-specific policy implementations from the application itself. This allows one application to be used with different policies on different systems and also allows naïve or premature optimizations to be removed.

UCAM-CL-TR-526

Andrew J. Penrose:

Extending lossless image compression
December 2001, 137 pages, PDF
PhD thesis (Gonville & Caius College, January 2001)

Abstract: “It is my thesis that worthwhile improvements can be made to lossless image compression schemes, by considering the correlations between the spectral, temporal and inter-view aspects of image data, in extension to the spatial correlations that are traditionally exploited.”

Images are an important part of today’s digital world. However, due to the large quantity of data needed to represent modern imagery, the storage of such data can be expensive. Thus, work on efficient image storage (image compression) has the potential to reduce storage costs and enable new applications.

Many image compression schemes are lossy; that is, they sacrifice image information to achieve very compact storage. Although this is acceptable for many applications, some environments require that compression not alter the image data. This lossless image compression has uses in medical, scientific and professional video processing applications.

Most of the work on lossless image compression has focused on monochrome images and has made use of the spatial smoothness of image data. Only recently have researchers begun to look specifically at the lossless compression of colour images and video. By extending compression schemes for colour images and video, the storage requirements for these important classes of image data can be further reduced.

Much of the previous research into lossless colour image and video compression has been exploratory. This dissertation studies the problem in a structured way. Spatial, spectral and temporal correlations are all considered to facilitate improved compression. This has led to a greater data reduction than many existing schemes for lossless colour image and colour video compression.

Furthermore, this work has considered the application of extended lossless image coding to more recent image types, such as multiview imagery. Thus, systems that use multiple views of the same scene to provide 3D viewing have been provided with a completely novel solution for the compression of multiview colour video.

UCAM-CL-TR-527

Umar Saif:

Architectures for ubiquitous systems
January 2002, 271 pages, PDF
PhD thesis

Abstract: Advances in digital electronics over the last decade have made computers faster, cheaper and smaller. This, coupled with the revolution in communication technology, has led to the development of sophisticated networked appliances and handheld devices. “Computers” are no longer boxes sitting on a desk; they are all around us, embedded in every nook and corner of our environment. This increasing complexity in our environment leads to the desire to design a system that could allow this pervasive functionality to disappear in the infrastructure, automatically carrying out everyday tasks of the users.

Such a system would enable devices embedded in the environment to cooperate with one another to make a wide range of new and useful applications possible, not originally conceived by the manufacturer, to achieve greater functionality, flexibility and utility.

The compelling question then becomes “what software needs to be embedded in these devices to enable them to participate in such a ubiquitous system?” This is the question addressed by the dissertation.

Based on the experience with home automation systems, as part of the AutoHAN project, the dissertation presents two compatible but different architectures: one to enable dumb devices to be controlled by the system, and the other to enable intelligent devices to control, extend and program the system.


Control commands for dumb devices are managed using an HTTP-based publish/subscribe/notify architecture: devices publish their control commands to the system as XML-typed discrete messages; applications discover and subscribe interest in these events to send and receive control commands from these devices, as typed messages, to control their behavior. The architecture handles mobility and failure of devices by using soft-state, redundant subscriptions and “care-of” nodes. The system is programmed with event scripts that encode automation rules as condition-action bindings. Finally, the use of XML and HTTP allows devices to be controlled by a simple Internet browser.

While publish/subscribe/notify defines a simple architecture to enable interoperability of limited-capability devices, intelligent devices can afford more complexity, which can be utilized to support user applications and services to control, manage and program the system. However, the operating system embedded in these devices needs to address the heterogeneity, longevity, mobility and dynamism of the system.

The dissertation presents the architecture of an embedded distributed operating system that lends itself to safe context-driven adaptation. The operating system is instrumented with four artifacts to address the challenges posed by a ubiquitous system. 1) An XML-based directory service captures and notifies the applications and services about changes in the device context, as resources move, fail, leave or join the system, to allow context-driven adaptation. 2) A Java-based mobile agent system allows new software to be injected in the system and moved and replicated with the changing characteristics of the system to define a self-organizing system. 3) A subscribe/notify interface allows context-specific extensions to be dynamically added to the operating system to enable it to interoperate efficiently in its current context according to application requirements. 4) Finally, a Dispatcher module serves as the context-aware system call interface for the operating system; when requested to invoke a service, the Dispatcher invokes the resource that best satisfies the requirements given the characteristics of the system.

Definition alone is not sufficient to prove the validity of an architecture. The dissertation therefore describes a prototype implementation of the operating system and presents both a quantitative comparison of its performance with related systems and its qualitative merit by describing new applications made possible by its novel architecture.
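The publish/subscribe/notify pattern on which the first architecture rests can be sketched in a few lines. The class and method names below are illustrative only; the AutoHAN system operates over HTTP with XML-typed messages, which this in-process sketch elides.

```python
# Minimal in-process sketch of publish/subscribe/notify.
# EventBus, subscribe and publish are hypothetical names, not AutoHAN's API.
from collections import defaultdict

class EventBus:
    """Devices publish typed control events; applications subscribe by type."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        # An application registers interest in one event type.
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        # Notify every subscriber registered for this event type.
        for callback in self._subscribers[event_type]:
            callback(payload)

bus = EventBus()
received = []
bus.subscribe("lamp.switch", received.append)   # application subscribes
bus.publish("lamp.switch", {"state": "on"})     # device publishes a command
```

A real deployment would add the soft-state and redundant-subscription machinery the abstract mentions, so that subscriptions expire and are re-established as devices move or fail.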

UCAM-CL-TR-528

Andrew William Moore:

Measurement-based management of network resources
April 2002, 273 pages, PDF
PhD thesis

Abstract: Measurement-based estimators are able to characterise data flows, enabling improvements to existing management techniques and access to previously impossible management techniques. It is the thesis of this dissertation that, in addition to making practical adaptive management schemes possible, measurement-based estimators can be practical within current limitations of resource.

Examples of network management include the characterisation of current utilisation for explicit admission control and the configuration of a scheduler to divide link-capacity among competing traffic classes. Without measurements, these management techniques have relied upon the accurate characterisation of traffic – without accurate traffic characterisation, network resources may be under- or over-utilised.

Embracing measurement-based estimation in admission control, Measurement-Based Admission Control (MBAC) algorithms have allowed characterisation of new traffic flows while adapting to changing flow requirements. However, many MBAC algorithms have been proposed, often with no clear differentiation between them. This has motivated the need for a realistic, implementation-based comparison in order to identify an ideal MBAC algorithm.

This dissertation reports on an implementation-based comparison of MBAC algorithms conducted using a purpose-built test environment. The use of an implementation-based comparison has allowed the MBAC algorithms to be tested under realistic conditions of traffic load and realistic limitations on memory, computational resources and measurements. Alongside this comparison is a decomposition of a group of MBAC algorithms, illustrating the relationship among MBAC algorithm components, as well as highlighting common elements among different MBAC algorithms.

The MBAC algorithm comparison reveals that, while no single algorithm is ideal, the specific resource demands, such as computation overheads, can dramatically impact on the MBAC algorithm’s performance. Further, due to the multiple timescales present in both traffic and management, the estimator of a robust MBAC algorithm must base its estimate on measurements made over a wide range of timescales. Finally, a reliable estimator must account for the error resulting from random properties of measurements.

Further identifying that the estimator components used in MBAC algorithms need not be tied to the admission control problem, one of the estimators (originally constructed as part of an MBAC algorithm) is used to continuously characterise resource requirements for a number of classes of traffic. Continuous characterisation of traffic, whether requiring similar or orthogonal resources, leads to the construction and demonstration of a network switch that is able to provide differentiated service while being adaptive to the demands of each traffic class. The dynamic allocation of resources is an approach unique to a measurement-based technique that would not be possible if resources were based upon static declarations of requirement.
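The multi-timescale estimation idea above can be illustrated with a small sketch: one exponentially weighted moving average per timescale, with admission decided against the most conservative estimate. The EWMA estimator, the smoothing weights and the capacity figures are illustrative assumptions, not the algorithms compared in the dissertation.

```python
# Hedged sketch of a measurement-based admission decision using
# load estimates maintained at several timescales.

class MultiTimescaleEstimator:
    def __init__(self, alphas=(0.5, 0.1, 0.01)):
        # One EWMA per timescale: large alpha tracks fast changes,
        # small alpha remembers long-term behaviour.
        self.alphas = alphas
        self.estimates = [0.0] * len(alphas)

    def measure(self, load):
        self.estimates = [a * load + (1 - a) * e
                          for a, e in zip(self.alphas, self.estimates)]

    def estimate(self):
        # Conservative: take the largest estimate across timescales.
        return max(self.estimates)

def admit(estimator, requested_rate, capacity):
    # Admit the new flow only if the estimated load leaves room for it.
    return estimator.estimate() + requested_rate <= capacity

est = MultiTimescaleEstimator()
for load in [40, 60, 50, 55]:      # measured link load samples
    est.measure(load)
```

With these samples the fast EWMA settles near 50, so a 10-unit flow fits on a 100-unit link while a 60-unit flow is refused.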


UCAM-CL-TR-529

Neil Johnson:

The triVM intermediate language reference manual
February 2002, 83 pages, PDF
This research was sponsored by a grant from ARM Limited.

Abstract: The triVM intermediate language has been developed as part of a research programme concentrating on code space optimization. The primary aim in developing triVM is to provide a language that removes the complexity of high-level languages, such as C or ML, while maintaining sufficient detail, at as simple a level as possible, to support research and experimentation into code size optimization. The basic structure of triVM is a notional Static Single Assignment-based three-address machine. A secondary aim is to develop an intermediate language that supports graph-based translation, using graph rewrite rules, in a textual, human-readable format. Experience has shown that text-format intermediate files are much easier to use for experimentation, while the penalty in translating this human-readable form to the internal data structures used by the software is negligible. Another aim is to provide a flexible language in which features and innovations can be evaluated; for example, this is one of the first intermediate languages directly based on the Static Single Assignment technique, and which explicitly exposes the condition codes as a result of arithmetic operations. While this paper is concerned solely with the description of triVM, we present a brief summary of other research-orientated intermediate languages.
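The two features the abstract highlights can be illustrated with a small, hypothetical sketch (this is not triVM's actual syntax): three-address instructions in Static Single Assignment form, where each arithmetic operation also names the condition codes it produces as an explicit result.

```python
# Illustrative sketch, not triVM syntax: SSA three-address instructions
# with condition codes exposed as an explicitly named result.
from dataclasses import dataclass

@dataclass
class Instr:
    op: str        # e.g. 'add', 'sub'
    dest: str      # destination value, assigned exactly once (SSA)
    src1: str
    src2: str
    flags: str     # name for the condition codes this operation produces

# x = a + b; y = x - c, with every destination (value or flags) written once:
prog = [
    Instr('add', 'x1', 'a0', 'b0', 'f1'),
    Instr('sub', 'y1', 'x1', 'c0', 'f2'),
]

# SSA invariant: no destination is defined twice.
defs = [i.dest for i in prog] + [i.flags for i in prog]
assert len(defs) == len(set(defs))
```

Exposing the flags as a named SSA value (here `f1`, `f2`) means a later conditional branch can consume them like any other operand, rather than reading implicit machine state.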

UCAM-CL-TR-530

Anna Korhonen:

Subcategorization acquisition
February 2002, 189 pages, PDF
PhD thesis (Trinity Hall, September 2001)

Abstract: Manual development of large subcategorised lexicons has proved difficult because predicates change behaviour between sublanguages, domains and over time. Yet access to a comprehensive subcategorization lexicon is vital for successful parsing capable of recovering predicate-argument relations, and probabilistic parsers would greatly benefit from accurate information concerning the relative likelihood of different subcategorisation frames (SCFs) of a given predicate. Acquisition of subcategorization lexicons from textual corpora has recently become increasingly popular. Although this work has met with some success, resulting lexicons indicate a need for greater accuracy. One significant source of error lies in the statistical filtering used for hypothesis selection, i.e. for removing noise from automatically acquired SCFs.

This thesis builds on earlier work in verbal subcategorization acquisition, taking as a starting point the problem with statistical filtering. Our investigation shows that statistical filters tend to work poorly because not only is the underlying distribution Zipfian, but there is also very little correlation between the conditional distribution of SCFs specific to a verb and the unconditional distribution regardless of the verb. More accurate back-off estimates are needed for SCF acquisition than those provided by the unconditional distribution.

We explore whether more accurate estimates could be obtained by basing them on linguistic verb classes. Experiments are reported which show that in terms of SCF distributions, individual verbs correlate more closely with syntactically similar verbs, and even more closely with semantically similar verbs, than with all verbs in general. On the basis of this result, we suggest classifying verbs according to their semantic classes and obtaining back-off estimates specific to these classes.

We propose a method for obtaining such semantically based back-off estimates, and a novel approach to hypothesis selection which makes use of these estimates. This approach involves automatically identifying the semantic class of a predicate, using subcategorization acquisition machinery to hypothesise the conditional SCF distribution for the predicate, smoothing the conditional distribution with the back-off estimates of the respective semantic verb class, and employing a simple method for filtering, which uses a threshold on the estimates from smoothing. Adopting Briscoe and Carroll’s (1997) system as a framework, we demonstrate that this semantically-driven approach to hypothesis selection can significantly improve the accuracy of large-scale subcategorization acquisition.
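The smoothing-and-threshold step can be illustrated with a short sketch. The frame names, the interpolation weight and the threshold below are invented for illustration; they are not values from the thesis, and the thesis's actual smoothing method may differ from simple linear interpolation.

```python
# Hedged sketch: smooth a verb's noisy conditional SCF distribution with
# back-off estimates from its semantic class, then filter by threshold.

def smooth(conditional, class_backoff, lam=0.7):
    # Linear interpolation of verb-specific and class back-off estimates.
    frames = set(conditional) | set(class_backoff)
    return {f: lam * conditional.get(f, 0.0)
               + (1 - lam) * class_backoff.get(f, 0.0)
            for f in frames}

def select(smoothed, threshold=0.1):
    # Hypothesis selection: keep frames whose estimate clears the threshold.
    return {f for f, p in smoothed.items() if p >= threshold}

# A noisy acquired distribution for one verb vs. its semantic class:
verb_scfs  = {'NP': 0.6, 'NP_PP': 0.3, 'SCOMP': 0.1}   # 'SCOMP' likely noise
class_scfs = {'NP': 0.7, 'NP_PP': 0.3}

kept = select(smooth(verb_scfs, class_scfs))   # 'SCOMP' falls below threshold
```

The class back-off pulls the spurious frame's estimate down (here to 0.07), so a single threshold removes it while the genuine frames survive.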

UCAM-CL-TR-531

Giampaolo Bella, Fabio Massacci,Lawrence C. Paulson:

Verifying the SET registration protocols
March 2002, 24 pages, PDF

Abstract: SET (Secure Electronic Transaction) is an immense e-commerce protocol designed to improve the security of credit card purchases. In this paper we focus on the initial bootstrapping phases of SET, whose objective is the registration of customers and merchants with a SET certification authority. The aim of registration is twofold: getting the approval of the cardholder’s or merchant’s bank, and replacing traditional credit card numbers with electronic credentials that customers can present to the merchant, so that their privacy is protected. These registration sub-protocols present a number of challenges to current formal verification methods. First, they do not assume that each agent knows the public keys of the other agents. Key distribution is one of the protocols’ tasks. Second, SET uses complex encryption primitives (digital envelopes) which introduce dependency chains: the loss of one secret key can lead to potentially unlimited losses. Building upon our previous work, we have been able to model and formally verify SET’s registration with the inductive method in Isabelle/HOL, solving its challenges with very general techniques.

UCAM-CL-TR-532

Richard Mortier:

Internet traffic engineering
April 2002, 129 pages, PDF
PhD thesis (Churchill College, October 2001)

Abstract: Due to the dramatically increasing popularity of the services provided over the public Internet, problems with current mechanisms for control and management of the Internet are becoming apparent. In particular, it is increasingly clear that the Internet and other networks built on the Internet protocol suite do not provide sufficient support for the efficient control and management of traffic, i.e. for traffic engineering.

This dissertation addresses the problem of traffic engineering in the Internet. It argues that traffic management techniques should be applied at multiple timescales, and not just at data timescales as is currently the case. It presents and evaluates mechanisms for traffic engineering in the Internet at two further timescales: flow admission control and control of per-flow packet marking, enabling control-timescale traffic engineering; and support for load-based inter-domain routeing in the Internet, enabling management-timescale traffic engineering.

This dissertation also discusses suitable policies for the application of the proposed mechanisms. It argues that the proposed mechanisms are able to support a wide range of policies useful to both users and operators. Finally, in a network of the size of the Internet, consideration must also be given to the deployment of proposed solutions. Consequently, arguments for and against the deployment of these mechanisms are presented and the conclusion drawn that there are a number of feasible paths toward deployment.

The work presented argues the following: firstly, it is possible to implement mechanisms within the Internet framework that enable traffic engineering to be carried out by operators; secondly, that applying these mechanisms with suitable policies can ease the management problems faced by operators and at the same time improve the efficiency with which the network can be run; thirdly, that these improvements can correspond to increased network performance as viewed by the user; and finally, that not only the resulting deployment but also the deployment process itself are feasible.

UCAM-CL-TR-533

Aline Villavicencio:

The acquisition of a unification-based generalised categorial grammar
April 2002, 223 pages, PDF
PhD thesis (Hughes Hall, September 2001)

Abstract: The purpose of this work is to investigate the process of grammatical acquisition from data. In order to do that, a computational learning system is used, composed of a Universal Grammar with associated parameters, and a learning algorithm, following the Principles and Parameters Theory. The Universal Grammar is implemented as a Unification-Based Generalised Categorial Grammar, embedded in a default inheritance network of lexical types. The learning algorithm receives input from a corpus of spontaneous child-directed transcribed speech annotated with logical forms and sets the parameters based on this input. This framework is used as a basis to investigate several aspects of language acquisition. In this thesis I concentrate on the acquisition of subcategorisation frames and word order information from data. The data to which the learner is exposed can be noisy and ambiguous, and I investigate how these factors affect the learning process. The results obtained show a robust learner converging towards the target grammar given the input data available. They also show how the amount of noise present in the input data affects the speed of convergence of the learner towards the target grammar. Future work is suggested for investigating the developmental stages of language acquisition as predicted by the learning model, with a thorough comparison with the developmental stages of a child. This is primarily a cognitive computational model of language learning that can be used to investigate and gain a better understanding of human language acquisition, and can potentially be relevant to the development of more adaptive NLP technology.

UCAM-CL-TR-534

Austin Donnelly:

Resource control in network elements
April 2002, 183 pages, PDF
PhD thesis (Pembroke College, January 2002)

Abstract: Increasingly, substantial data path processing is happening on devices within the network. At or near the edges of the network, data rates are low enough that commodity workstations may be used to process packet flows. However, the operating systems such machines use are not suited to the needs of data-driven processing. This dissertation shows why this is a problem, how current work fails to address it, and proposes a new approach.


The principal problem is that crosstalk occurs in the processing of different data flows when they contend for a shared resource and their accesses to this resource are not scheduled appropriately; typically the shared resource is located in a server process. Previous work on vertically structured operating systems reduces the need for such shared servers by making applications responsible for performing as much of their own processing as possible, protecting and multiplexing devices at the lowest level consistent with allowing untrusted user access.

However, shared servers remain on the data path in two circumstances: firstly, dumb network adaptors need non-trivial processing to allow safe access by untrusted user applications. Secondly, shared servers are needed wherever trusted code must be executed for security reasons.

This dissertation presents the design and implementation of Expert, an operating system which avoids crosstalk by removing the need for such servers.

This dissertation describes how Expert handles dumb network adaptors to enable applications to access them via a low-level interface which is cheap to implement in the kernel, and retains application responsibility for the work involved in running a network stack.

Expert further reduces the need for application-level shared servers by introducing paths which can trap into protected modules of code to perform actions which would otherwise have to be implemented within a server.

Expert allows traditional compute-bound tasks to be freely mixed with these I/O-driven paths in a single system, and schedules them in a unified manner. This allows the processing performed in a network element to be resource controlled, both for background processing tasks such as statistics gathering, and for data path processing such as encryption.

UCAM-CL-TR-535

Claudia Faggian, Martin Hyland:

Designs, disputes and strategies
May 2002, 21 pages, PDF

Abstract: Important progress in logic is leading to interactive and dynamical models. Geometry of Interaction and Games Semantics are two major examples. Ludics, initiated by Girard, is a further step in this direction.

The objects of Ludics which correspond to proofs are designs. A design can be described as the skeleton of a sequent calculus derivation, where we do not manipulate formulas, but their locations (the address where the formula is stored). To study the traces of the interactions between designs as primitive leads to an alternative presentation, which is to describe a design as the set of its possible interactions, called disputes. This presentation has the advantage of making precise the correspondence between the basic notions of Ludics (designs, disputes and chronicles) and the basic notions of Games Semantics (strategies, plays and views).

UCAM-CL-TR-536

Sergei Skorobogatov:

Low temperature data remanence in static RAM
June 2002, 9 pages, PDF

Abstract: Security processors typically store secret key material in static RAM, from which power is removed if the device is tampered with. It is commonly believed that, at temperatures below −20 °C, the contents of SRAM can be ‘frozen’; therefore, many devices treat temperatures below this threshold as tampering events. We have done some experiments to establish the temperature dependency of data retention time in modern SRAM devices. Our experiments show that the conventional wisdom no longer holds.

UCAM-CL-TR-537

Mantsika Matooane:

Parallel systems in symbolic and algebraic computation
June 2002, 139 pages, PDF
PhD thesis (Trinity College, August 2001)

Abstract: This report describes techniques to exploit distributed memory massively parallel supercomputers to satisfy the peak memory demands of some very large computer algebra problems (over 10 GB). The memory balancing is based on a randomized hashing algorithm for dynamic data distribution. Fine-grained partitioning is used to provide flexibility in the memory allocation, at the cost of higher communication overhead. The main problem areas are multivariate polynomial algebra, and linear algebra with polynomial matrices. The system was implemented and tested on a Hitachi SR2201 supercomputer.

UCAM-CL-TR-538

Mark Ashdown, Peter Robinson:

The Escritoire: A personal projected display for interacting with documents
June 2002, 12 pages, PDF


Abstract: The Escritoire is a horizontal desk interface that uses two projectors to create a foveal display. Items such as images, documents, and the interactive displays of other conventional computers, can be manipulated on the desk using pens in both hands. The periphery covers the desk, providing ample space for laying out the objects relevant to a task, allowing them to be identified at a glance and exploiting human spatial memory for rapid retrieval. The fovea is a high resolution focal area that can be used to view any item in detail. The projected images are continuously warped with commodity graphics hardware before display, to reverse the effects of misaligned projectors and ensure registration between fovea and periphery. The software is divided into a hardware-specific client driving the display, and a platform-independent server imposing control.

UCAM-CL-TR-539

N.A. Dodgson, M.A. Sabin, L. Barthe,M.F. Hassan:

Towards a ternary interpolating subdivision scheme for the triangular mesh
July 2002, 12 pages, PDF

Abstract: We derive a ternary interpolating subdivision scheme which works on the regular triangular mesh. It has quadratic precision and fulfils the standard necessary conditions for C2 continuity. Further analysis is required to determine its actual continuity class and to define its behaviour around extraordinary points.

UCAM-CL-TR-540

N.A. Dodgson, J.R. Moore:

The use of computer graphics rendering software in the analysis of a novel autostereoscopic display design
August 2002, 6 pages, PDF

Abstract: Computer graphics ‘ray tracing’ software has been used in the design and evaluation of a new autostereoscopic 3D display. This software complements the conventional optical design software and provides a cost-effective method of simulating what is actually seen by a viewer of the display. It may prove a useful tool in similar design problems.

UCAM-CL-TR-541

L. Barthe, N.A. Dodgson, M.A. Sabin, B. Wyvill, V. Gaildrat:

Different applications of two-dimensional potential fields for volume modeling
August 2002, 26 pages, PDF

Abstract: Current methods for building models using implicit volume techniques present problems defining accurate and controllable blend shapes between implicit primitives. We present new methods to extend the freedom and controllability of implicit volume modeling. The main idea is to use a free-form curve to define the profile of the blend region between implicit primitives.

The use of a free-form implicit curve, controlled point-by-point in the Euclidean user space, allows us to group boolean composition operators with sharp transitions or smooth free-form transitions in a single modeling metaphor. This idea is generalized for the creation, sculpting and manipulation of volume objects, while providing the user with simplicity, controllability and freedom in volume modeling.

Bounded volume objects, known as “Soft objects” or “Metaballs”, have specific properties. We also present binary Boolean composition operators that give more control over the form of the transition when these objects are blended.

To finish, we show how our free-form implicit curves can be used to build implicit sweep objects.

UCAM-CL-TR-542

I.P. Ivrissimtzis, N.A. Dodgson, M.A. Sabin:

A generative classification of mesh refinement rules with lattice transformations
September 2002, 13 pages, PDF
An updated, improved version of this report has been published in Computer Aided Geometric Design 22(1):99–109, January 2004 [doi:10.1016/j.cagd.2003.08.001]. The classification scheme is slightly different in the CAGD version. Please refer to the CAGD version in any work which you produce.

Abstract: We give a classification of the subdivision refinement rules using sequences of similar lattices. Our work expands and unifies recent results in the classification of primal triangular subdivision [Alexa, 2001], and results on the refinement of quadrilateral lattices [Sloan, 1994, 1989]. In the examples we concentrate on the cases with low ratio of similarity and find new univariate and bivariate refinement rules with the lowest possible such ratio, showing that this very low ratio usually comes at the expense of symmetry.

UCAM-CL-TR-543

Kerry Rodden:

Evaluating similarity-based visualisations as interfaces for image browsing
September 2002, 248 pages, PDF
PhD thesis (Newnham College, 11 October 2001)

Abstract: Large collections of digital images are becoming more and more common, and the users of these collections need computer-based systems to help them find the images they require. Digital images are easy to shrink to thumbnail size, allowing a large number of them to be presented to the user simultaneously. Generally, current image browsing interfaces display thumbnails in a two-dimensional grid, in some default order, and there has been little exploration of possible alternatives to this model.

With textual document collections, information visualisation techniques have been used to produce representations where the documents appear to be clustered according to their mutual similarity, which is based on the words they have in common. The same techniques can be applied to images, to arrange a set of thumbnails according to a defined measure of similarity. In many collections, the images are manually annotated with descriptive text, allowing their similarity to be measured in an analogous way to textual documents. Alternatively, research in content-based image retrieval has made it possible to measure similarity based on low-level visual features, such as colour.

The primary goal of this research was to investigate the usefulness of such similarity-based visualisations as interfaces for image browsing. We concentrated on visual similarity, because it is applicable to any image collection, regardless of the availability of annotations. Initially, we used conventional information retrieval evaluation methods to compare the relative performance of a number of different visual similarity measures, both for retrieval and for creating visualisations.

Thereafter, our approach to evaluation was influenced more by human-computer interaction: we carried out a series of user experiments where arrangements based on visual similarity were compared to random arrangements, for different image browsing tasks. These included finding a given target image, finding a group of images matching a generic requirement, and choosing subjectively suitable images for a particular purpose (from a shortlisted set). As expected, we found that similarity-based arrangements are generally more helpful than random arrangements, especially when the user already has some idea of the type of image she is looking for.

Images are used in many different application domains; the ones we chose to study were stock photography and personal photography. We investigated the organisation and browsing of personal photographs in some depth, because of the inevitable future growth in usage of digital cameras, and a lack of previous research in this area.

UCAM-CL-TR-544

I.P. Ivrissimtzis, M.A. Sabin, N.A. Dodgson:

On the support of recursive subdivision
September 2002, 20 pages, PDF
An updated, improved version of this report has been published in ACM Trans. Graphics 23(4):1043–1060, October 2004 [doi:10.1145/1027411.1027417]

Abstract: We study the support of subdivision schemes, that is, the area of the subdivision surface that will be affected by the displacement of a single control point. Our main results cover the regular case, where the mesh induces a regular Euclidean tessellation of the parameter space. If n is the ratio of similarity between the tessellation at step k and step k−1 of the subdivision, we show that this number determines whether the support is polygonal or fractal. In particular, if n=2, as it is in most schemes, the support is a polygon whose vertices can be easily determined. If n is not equal to two, as, for example, in the square root of three scheme, the support is usually fractal and on its boundary we can identify sets like the classic ternary Cantor set.

UCAM-CL-TR-545

Anthony C.J. Fox:

A HOL specification of the ARM instruction set architecture
June 2001, 45 pages, PDF

Abstract: This report gives details of a HOL specification of the ARM instruction set architecture. It is shown that the HOL proof tool provides a suitable environment in which to model the architecture. The specification is used to execute fragments of ARM code generated by an assembler. The specification is based primarily around the third version of the ARM architecture, and the intent is to provide a target semantics for future microprocessor verifications.


UCAM-CL-TR-546

Jonathan David Pfautz:

Depth perception in computer graphics
September 2002, 182 pages, PDF
PhD thesis (Trinity College, May 2000)

Abstract: With advances in computing and visual display technology, the interface between man and machine has become increasingly complex. The usability of a modern interactive system depends on the design of the visual display. This dissertation aims to improve the design process by examining the relationship between human perception of depth and three-dimensional computer-generated imagery (3D CGI).

Depth is perceived when the human visual system combines various different sources of information about a scene. In Computer Graphics, linear perspective is a common depth cue, and systems utilising binocular disparity cues are of increasing interest. When these cues are inaccurately and inconsistently presented, the effectiveness of a display will be limited. Images generated with computers are sampled, meaning they are discrete in both time and space. This thesis describes the sampling artefacts that occur in 3D CGI and their effects on the perception of depth. Traditionally, sampling artefacts are treated as a Signal Processing problem. The approach here is to evaluate artefacts using Human Factors and Ergonomics methodology; sampling artefacts are assessed via performance on relevant visual tasks.

A series of formal and informal experiments were performed on human subjects to evaluate the effects of spatial and temporal sampling on the presentation of depth in CGI. In static images with perspective information, the relative size of an object can be inconsistently presented across depth. This inconsistency prevented subjects from making accurate relative depth judgements. In moving images, these distortions were most visible when the object was moving slowly, pixel size was large, the object was located close to the line of sight and/or the object was located a large virtual distance from the viewer. When stereo images are presented with perspective cues, the sampling artefacts found in each cue interact. Inconsistencies in both size and disparity can occur as the result of spatial and temporal sampling. As a result, disparity can vary inconsistently across an object. Subjects judged relative depth less accurately when these inconsistencies were present. An experiment demonstrated that stereo cues dominated in conflict situations for static images. In moving imagery, the number of samples in stereo cues is limited. Perspective information dominated the perception of depth for unambiguous (i.e., constant in direction and velocity) movement.

Based on the experimental results, a novel method was developed that ensures the size, shape and disparity of an object are consistent as it moves in depth.

This algorithm manipulates the edges of an object (at the expense of positional accuracy) to enforce consistent size, shape and disparity. In a time-to-contact task using only stereo and perspective depth cues, velocity was judged more accurately using this method. A second method manipulated the location and orientation of the viewpoint to maximise the number of samples of perspective and stereo depth in a scene. This algorithm was tested in a simulated air traffic control task. The experiment demonstrated that knowledge about where the viewpoint is located dominates any benefit gained in reducing sampling artefacts.

This dissertation provides valuable information for the visual display designer in the form of task-specific experimental results and computationally inexpensive methods for reducing the effects of sampling.
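The size-sampling artefact this abstract describes can be reproduced with a toy pinhole-projection calculation. This is illustrative only; the focal length, depths and pixel sizes below are invented, not taken from the experiments in the report:

```python
# Toy illustration of perspective size sampling: rounding a projected size
# to whole pixels distorts the relative size of equally-sized objects at
# different depths, and the distortion grows with pixel size.

def projected_size(width, depth, focal=100.0, pixel_size=1.0):
    """Return (ideal, displayed) projected sizes for a pinhole camera."""
    ideal = focal * width / depth                       # continuous projection
    displayed = round(ideal / pixel_size) * pixel_size  # quantized to pixels
    return ideal, displayed

for pixel_size in (1.0, 4.0):
    near_ideal, near_shown = projected_size(1.0, 7.0, pixel_size=pixel_size)
    far_ideal, far_shown = projected_size(1.0, 9.0, pixel_size=pixel_size)
    # The ideal size ratio is always 9/7; the displayed ratio drifts from it.
    print(pixel_size, near_ideal / far_ideal, near_shown / far_shown)
```

With 1-unit pixels the displayed ratio is 14/11 rather than 9/7; with 4-unit pixels it becomes 16/12, a larger error, consistent with the observation that the distortions are most visible when pixel size is large.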

UCAM-CL-TR-547

Agathoniki Trigoni:

Semantic optimization of OQL queries
October 2002, 171 pages, PDF
PhD thesis (Pembroke College, October 2001)

Abstract: This work explores all the phases of developing a query processor for OQL, the Object Query Language proposed by the Object Data Management Group (ODMG 3.0). There has been a lot of research on the execution of relational queries and their optimization using syntactic or semantic transformations. However, there is no context that has integrated and tested all the phases of processing an object query language, including the use of semantic optimization heuristics. This research is motivated by the need for query execution tools that combine two valuable properties: i) the expressive power to encompass all the features of the object-oriented paradigm and ii) the flexibility to benefit from the experience gained with relational systems, such as the use of semantic knowledge to speed up query execution.

The contribution of this work is twofold. First, it establishes a rigorous basis for OQL by defining a type inference model for OQL queries and proposing a complete framework for their translation into calculus and algebraic representations. Second, in order to enhance query execution it provides algorithms for applying two semantic optimization heuristics: constraint introduction and constraint elimination techniques. By taking into consideration a set of association rules with exceptions, it is possible to add or remove predicates from an OQL query, thus transforming it to a more efficient form.

We have implemented this framework, which enables us to measure the benefits and the cost of exploiting semantic knowledge during query execution. The experiments showed significant benefits, especially in the application of the constraint introduction technique. In contexts where queries are optimized once and are then executed repeatedly, we can ignore the cost of optimization, and it is always worth carrying out the proposed transformation. In the context of ad-hoc queries the cost of the optimization becomes an important consideration. We have developed heuristics to estimate the cost as well as the benefits of optimization. The optimizer will carry out a semantic transformation only when the overhead is less than the expected benefit. Thus transformations are performed safely even with ad-hoc queries. The framework can often speed up the execution of an OQL query to a considerable extent.

UCAM-CL-TR-548

Anthony Fox:

Formal verification of the ARM6 micro-architecture
November 2002, 59 pages, PDF

Abstract: This report describes the formal verification of the ARM6 micro-architecture using the HOL theorem prover. The correctness statement compares the micro-architecture with an abstract, target instruction set semantics. Data and temporal abstraction maps are used to formally relate the state spaces and to capture the timing behaviour of the processor. The verification is carried out in HOL and one-step theorems are used to provide the framework for the proof of correctness. This report also describes the formal specification of the ARM6’s three-stage pipelined micro-architecture.

UCAM-CL-TR-549

Ross Anderson:

Two remarks on public key cryptology
December 2002, 7 pages, PDF

Abstract: In some talks I gave in 1997-98, I put forward two observations on public-key cryptology, concerning forward-secure signatures and compatible weak keys. I did not publish a paper on either of them as they appeared to be rather minor footnotes to public key cryptology. But the work has occasionally been cited, and I’ve been asked to write a permanent record.

UCAM-CL-TR-550

Karen Sparck Jones:

Computer security – a layperson’s guide, from the bottom up
June 2002, 23 pages, PDF

Abstract: Computer security as a technical matter is complex, and opaque for those who are not themselves computer professionals but who encounter, or are ultimately responsible for, computer systems. This paper presents the essentials of computer security in non-technical terms, with the aim of helping people affected by computer systems to understand what security is about and to withstand the blinding with science mantras that too often obscure the real issues.

UCAM-CL-TR-551

Lawrence C. Paulson:

The relative consistency of the axiom of choice — mechanized using Isabelle/ZF
December 2002, 63 pages, PDF

Abstract: The proof of the relative consistency of the axiom of choice has been mechanized using Isabelle/ZF. The proof builds upon a previous mechanization of the reflection theorem. The heavy reliance on metatheory in the original proof makes the formalization unusually long, and not entirely satisfactory: two parts of the proof do not fit together. It seems impossible to solve these problems without formalizing the metatheory. However, the present development follows a standard textbook, Kunen’s “Set Theory”, and could support the formalization of further material from that book. It also serves as an example of what to expect when deep mathematics is formalized.

UCAM-CL-TR-552

Keir A. Fraser, Steven M. Hand, Timothy L. Harris, Ian M. Leslie, Ian A. Pratt:

The Xenoserver computing infrastructure
January 2003, 11 pages, PDF

Abstract: The XenoServer project will build a public infrastructure for wide-area distributed computing. We envisage a world in which XenoServer execution platforms will be scattered across the globe and available for any member of the public to submit code for execution. Crucially, the code’s sponsor will be billed for all the resources used or reserved during its execution. This will encourage load balancing, limit congestion, and make the platform self-financing.

Such a global infrastructure is essential to address the fundamental problem of communication latency. By enabling principals to run programs at points throughout the network they can ensure that their code executes close to the entities with which it interacts. As well as reducing latency this can be used to avoid network bottlenecks, to reduce long-haul network charges and to provide a network presence for transiently-connected mobile devices.

This project will build and deploy a global XenoServer test-bed and make it available to authenticated external users; initially members of the scientific community and ultimately of the general public. In this environment accurate resource accounting and pricing is critical – whether in an actual currency or one that is fictitious. As with our existing work on OS resource management, pricing provides the feedback necessary for applications that can adapt, and prevents over-use by applications that cannot.

UCAM-CL-TR-553

Paul R. Barham, Boris Dragovic, Keir A. Fraser, Steven M. Hand, Timothy L. Harris, Alex C. Ho, Evangelos Kotsovinos, Anil V.S. Madhavapeddy, Rolf Neugebauer, Ian A. Pratt, Andrew K. Warfield:

Xen 2002
January 2003, 15 pages, PDF

Abstract: This report describes the design of Xen, the hypervisor developed as part of the XenoServer wide-area computing project. Xen enables the hardware resources of a machine to be virtualized and dynamically partitioned such as to allow multiple different ‘guest’ operating system images to be run simultaneously.

Virtualizing the machine in this manner provides flexibility, allowing different users to choose their preferred operating system (Windows, Linux, NetBSD), and also enables use of the platform as a testbed for operating systems research. Furthermore, Xen provides secure partitioning between these ‘domains’, and enables better resource accounting and QoS isolation than can be achieved within a conventional operating system. We show these benefits can be achieved at negligible performance cost.

We outline the design of Xen’s main sub-systems, and the interface exported to guest operating systems. Initial performance results are presented for our most mature guest operating system port, Linux 2.4. This report covers the initial design of Xen, leading up to our first public release which we plan to make available for download in April 2003. Further reports will update the design as our work progresses and present the implementation in more detail.

UCAM-CL-TR-554

Jon Crowcroft:

Towards a field theory for networks
January 2003, 9 pages, PDF

Abstract: It is often claimed that Internet traffic patterns are interesting because the Internet puts few constraints on sources. This leads to innovation. It also makes the study of Internet traffic, what we might call the search for the Internet Erlang, very difficult. At the same time, traffic control (congestion control) and engineering are both hot topics.

What if “flash crowds” (a.k.a. slashdot), cascades, epidemics and so on are the norm? What if the trend continues for network link capacity to become flatter, with more equal capacity in the access and core, or even more capacity in the access than the core (as in the early 1980s with 10Mbps LANs versus Kbps links in the ARPANET)? How could we cope?

This is a paper about the use of field equations (e.g. gravitational, electrical, magnetic, strong and weak atomic and so forth) as a future model for managing network traffic. We believe that in the future, one could move from this model to a very general prescriptive technique for designing network control on different timescales, including traffic engineering and the set of admission and congestion control laws. We also speculate about the use of the same idea in wireless networks.

UCAM-CL-TR-555

Jon Crowcroft, Richard Gibbens, Stephen Hailes:

BOURSE – Broadband Organisation of Unregulated Radio Systems through Economics
January 2003, 10 pages, PDF

Abstract: This is a technical report about an idea for research in the intersection of active nets, cognitive radio and power laws of network topologies.

UCAM-CL-TR-556

Jon Crowcroft:

Turing Switches – Turing machines for all-optical Internet routing
January 2003, 7 pages, PDF

Abstract: This is a technical report outlining an idea for basic long-term research into the architectures for programmable all-optical Internet routers.

We are revisiting some of the fundamental tenets of computer science to carry out this work, and so it is necessarily highly speculative.

Currently, the processing elements in all-electronic routers are typically fairly conventional von Neumann architecture computers with processors that have large, complex instruction sets (even RISC is relatively complex compared with the actual requirements for packet processing) and Random Access Memory.


As the need for speed increases, first this architecture, and then the classical computing hardware components, and finally, electronics cease to be able to keep up.

At this time, optical device technology is making great strides, and we see the availability of gates, as well as a plethora of invention in providing buffering mechanisms.

However, a critical problem we foresee is the ability to re-program devices for different packet processing functions such as classification and scheduling. This proposal is aimed at researching one direction for adding optical domain programmability.

UCAM-CL-TR-557

G.M. Bierman, P. Sewell:

Iota: A concurrent XML scripting language with applications to Home Area Networking
January 2003, 32 pages, PDF

Abstract: Iota is a small and simple concurrent language that provides native support for functional XML computation and for typed channel-based communication. It has been designed as a domain-specific language to express device behaviour within the context of Home Area Networking.

In this paper we describe Iota, explaining its novel treatment of XML and describing its type system and operational semantics. We give a number of examples including Iota code to program Universal Plug ’n’ Play (UPnP) devices.

UCAM-CL-TR-558

Yolanta Beresnevichiene:

A role and context based security model
January 2003, 89 pages, PDF
PhD thesis (Wolfson College, June 2000)

Abstract: Security requirements approached at the enterprise level initiate the need for models that capture the organisational and distributed aspects of information usage. Such models have to express organisation-specific security policies and internal controls aiming to protect information against unauthorised access and modification, and against usage of information for unintended purposes. This technical report describes a systematic approach to modelling the security requirements from the perspective of job functions and tasks performed in an organisation. It deals with the design, analysis, and management of security abstractions and mechanisms in a unified framework.

The basis of access control policy in this framework is formulated around a semantic construct of a role. Roles are granted permissions according to the job functions that exist in an organisation, and then users are assigned to roles on the basis of their specific job responsibilities. In order to ensure that permissions included in the roles are used by users only for purposes corresponding to the organisation’s present business needs, a novel approach of “active” context-based access control is proposed. The usage of role permissions in this approach is controlled according to the emerging context associated with progress of various tasks in the organisation.

The work explores formally the security properties of the established model, in particular support for the separation of duty and least privilege principles that are important requirements in many commercial systems. Results have implications for understanding different variations of separation of duty policy that are currently used in role-based access control.

Finally, a design architecture of the defined security model is presented detailing the components and processing phases required for successful application of the model to distributed computer environments. The model provides opportunities for the implementers, based on application requirements, to choose between several alternative design approaches.
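The role-and-context idea sketched in this abstract can be illustrated in a few lines of Python. This is a toy model with invented role, permission and task names; the report's formal model is far richer:

```python
# Toy sketch of "active" context-based access control: a permission is
# usable only if a role grants it AND the current task context activates it.
# All names here are invented for illustration.
ROLE_PERMISSIONS = {
    "clerk":   {"enter_invoice"},
    "manager": {"approve_invoice"},
}
USER_ROLES = {"alice": {"clerk"}, "bob": {"manager"}}

def allowed(user, permission, context):
    """Grant access only when some role of the user carries the permission
    and the task context has activated that permission."""
    granted = any(permission in ROLE_PERMISSIONS[r] for r in USER_ROLES[user])
    return granted and context.get(permission, False)

# bob may approve only once the workflow has reached the approval stage,
# even though his role carries the permission at all times.
ctx = {"approve_invoice": False}
print(allowed("bob", "approve_invoice", ctx))    # blocked: context inactive
ctx["approve_invoice"] = True
print(allowed("bob", "approve_invoice", ctx))    # now permitted
print(allowed("alice", "approve_invoice", ctx))  # blocked: no such role
```

The second gate is what distinguishes the "active" model from plain role-based access control: holding a role is necessary but not sufficient.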

UCAM-CL-TR-559

Eiko Yoneki, Jean Bacon:

Pronto: MobileGateway with publish-subscribe paradigm over wireless network
February 2003, 22 pages, PDF

Abstract: This paper presents the design, implementation, and evaluation of Pronto, a middleware system for mobile applications with messaging as a basis. It provides a solution for mobile application specific problems such as resource constraints, network characteristics, and data optimization. Pronto consists of three main functions: 1) MobileJMS Client, a lightweight client of Message Oriented Middleware (MOM) based on Java Message Service (JMS), 2) Gateway for reliable and efficient transmission between mobile devices and a server with pluggable components, and 3) Serverless JMS based on IP multicast. The publish-subscribe paradigm is ideal for mobile applications, as mobile devices are commonly used for data collection under conditions of frequent disconnection and changing numbers of recipients. This paradigm provides greater flexibility due to the decoupling of publisher and subscriber. Adding a gateway as a message hub to transmit information in real-time or with store-and-forward messaging provides powerful optimization and data transformation. Caching is an essential function of the gateway, and SmartCaching is designed for generic caching in an N-tier architecture. Serverless JMS aims at a decentralized messaging model, which supports an ad-hoc network, as well as creating a high-speed messaging BUS. Pronto is an intelligent MobileGateway, providing a useful MOM intermediary between a server and mobile devices over a wireless network.
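The publisher-subscriber decoupling that the abstract relies on can be shown in miniature. This is a generic topic-based sketch, not Pronto's JMS-based API:

```python
# Minimal topic-based publish-subscribe broker (a generic sketch; Pronto
# itself builds on JMS, gateway caching and IP multicast).
class Broker:
    def __init__(self):
        self.subscribers = {}   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        # The publisher neither knows nor cares how many subscribers exist,
        # which is the decoupling that suits changing numbers of recipients.
        for cb in self.subscribers.get(topic, []):
            cb(message)

broker = Broker()
received = []
broker.subscribe("sensor/temp", received.append)
broker.subscribe("sensor/temp", lambda m: received.append(m.upper()))
broker.publish("sensor/temp", "21c")
print(received)   # both subscribers receive the one published message
```

A store-and-forward gateway, as in Pronto, would sit between `publish` and the callbacks, queuing messages for subscribers that are temporarily disconnected.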

UCAM-CL-TR-560

Mike Bond, Piotr Zielinski:

Decimalisation table attacks for PIN cracking
February 2003, 14 pages, PDF

Abstract: We present an attack on hardware security modules used by retail banks for the secure storage and verification of customer PINs in ATM (cash machine) infrastructures. By using adaptive decimalisation tables and guesses, the maximum amount of information is learnt about the true PIN upon each guess. It takes an average of 15 guesses to determine a four digit PIN using this technique, instead of the 5000 guesses intended. In a single 30 minute lunch-break, an attacker can thus discover approximately 7000 PINs rather than 24 with the brute force method. With a £300 withdrawal limit per card, the potential bounty is raised from £7200 to £2.1 million and a single motivated attacker could withdraw £30–50 thousand of this each day. This attack thus presents a serious threat to bank security.
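The figures quoted in the abstract can be checked with simple arithmetic. A sketch, assuming a rate of roughly 60 PIN verifications per second against the security module (this rate is inferred from the quoted numbers, not stated in the report):

```python
# Back-of-the-envelope check of the abstract's figures. The 60 guesses per
# second rate is an assumption inferred from the numbers quoted.
GUESSES_PER_SEC = 60
LUNCH_BREAK_SEC = 30 * 60          # a single 30 minute lunch-break
WITHDRAWAL_LIMIT = 300             # pounds per card

total_guesses = GUESSES_PER_SEC * LUNCH_BREAK_SEC   # 108,000 guesses

pins_brute_force = total_guesses // 5000   # 21 cards (abstract: ~24)
pins_dec_table = total_guesses // 15       # 7200 cards (abstract: ~7000)

print(pins_brute_force * WITHDRAWAL_LIMIT)  # 6300 (abstract: 7200 pounds)
print(pins_dec_table * WITHDRAWAL_LIMIT)    # 2160000 (abstract: 2.1 million)
```

The roughly 300-fold improvement comes entirely from the drop from 5000 to 15 expected guesses per PIN.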

UCAM-CL-TR-561

Paul B. Menage:

Resource control of untrusted code in an open network environment
March 2003, 185 pages, PDF
PhD thesis (Magdalene College, June 2000)

Abstract: Current research into Active Networks, Open Signalling and other forms of mobile code has made use of the ability to execute user-supplied code at locations within the network infrastructure, in order to avoid the inherent latency associated with wide area networks or to avoid sending excessive amounts of data across bottleneck links or nodes. Existing research has addressed the design and evaluation of programming environments, and testbeds have been implemented on traditional operating systems. Such work has deferred issues regarding resource control; this has been reasonable, since this research has been conducted in a closed environment.

In an open environment, which is required for widespread deployment of such technologies, the code supplied to the network nodes may not be from a trusted source. Thus, it cannot be assumed that such code will behave non-maliciously, nor that it will avoid consuming more than its fair share of the available system resources.

The computing resources consumed by end-users on programmable nodes within a network are not free, and must ultimately be paid for in some way. Programmable networks allow users substantially greater complexity in the way that they may consume network resources. This dissertation argues that, due to this complexity, it is essential to be able to control and account for the resources used by untrusted user-supplied code if such technology is to be deployed effectively in a wide-area open environment.

The Resource Controlled Active Node Environment (RCANE) is presented to facilitate the control of untrusted code. RCANE supports the allocation, scheduling and accounting of the resources available on a node, including CPU and network I/O scheduling, memory allocation, and garbage collection overhead.

UCAM-CL-TR-562

Carsten Moenning, Neil A. Dodgson:

Fast Marching farthest point sampling
April 2003, 16 pages, PDF

Abstract: Using Fast Marching for the incremental computation of distance maps across the sampling domain, we obtain an efficient farthest point sampling technique (FastFPS). The method is based on that of Eldar et al. (1992, 1997) but extends more naturally to the case of non-uniform sampling and is more widely applicable. Furthermore, it can be applied to both planar domains and curved manifolds and allows for weighted domains in which different cost is associated with different points on the surface. We conclude by considering the extension of FastFPS to the sampling of point clouds without the need for prior surface reconstruction.
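The farthest point principle behind FastFPS can be illustrated with a naive Euclidean variant. The sketch below uses plain point-to-point distances on a planar point set; the report's contribution is computing and incrementally updating the distance maps efficiently with Fast Marching, which also handles curved and weighted domains:

```python
import math

def farthest_point_sample(points, k):
    """Greedy farthest point sampling: repeatedly select the point whose
    distance to the nearest already-chosen sample is maximal."""
    samples = [points[0]]                       # arbitrary seed point
    # dist[i] = distance from points[i] to its nearest chosen sample
    dist = [math.dist(p, samples[0]) for p in points]
    while len(samples) < k:
        i = max(range(len(points)), key=dist.__getitem__)
        samples.append(points[i])
        # only the newly added sample can reduce a nearest-sample distance
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return samples

grid = [(x, y) for x in range(5) for y in range(5)]
print(farthest_point_sample(grid, 3))   # widely spread points, corners first
```

Each iteration touches every point, giving O(nk) work overall; replacing the per-point distance update with a Fast Marching distance-map update is what makes the FastFPS variant efficient.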

UCAM-CL-TR-563

G.M. Bierman, M.J. Parkinson, A.M. Pitts:

MJ: An imperative core calculus for Java and Java with effects
April 2003, 53 pages, PDF

Abstract: In order to study rigorously object-oriented languages such as Java or C#, a common practice is to define lightweight fragments, or calculi, which are sufficiently small to facilitate formal proofs of key properties. However, many of the current proposals for calculi lack important language features. In this paper we propose Middleweight Java, MJ, as a contender for a minimal imperative core calculus for Java. Whilst compact, MJ models features such as object identity, field assignment, constructor methods and block structure. We define the syntax, type system and operational semantics of MJ, and give a proof of type safety. In order to demonstrate the usefulness of MJ to reason about operational features, we consider a recent proposal of Greenhouse and Boyland to extend Java with an effects system. This effects system is intended to delimit the scope of computational effects within a Java program. We define an extension of MJ with a similar effects system and instrument the operational semantics. We then prove the correctness of the effects system; a question left open by Greenhouse and Boyland. We also consider the question of effect inference for our extended calculus, detail an algorithm for inferring effects information and give a proof of correctness.

UCAM-CL-TR-564

Ulrich Lang:

Access policies for middleware
May 2003, 138 pages, PDF
PhD thesis (Wolfson College, March 2003)

Abstract: This dissertation examines how the architectural layering of middleware constrains the design of a middleware security architecture, and analyses the complications that arise from that. First, we define a precise notion of middleware that includes its architecture and features. Our definition is based on the Common Object Request Broker Architecture (CORBA), which is used throughout this dissertation both as a reference technology and as a basis for a proof of concept implementation. In several steps, we construct a security model that fits the described middleware architecture. The model facilitates conceptual reasoning about security. The results of our analysis indicate that the cryptographic identities available on the lower layers of the security model are only of limited use for expressing fine-grained security policies, because they are separated from the application layer entities by the middleware layer. To express individual application layer entities in access policies, additional more fine-grained descriptors are required. To solve this problem for the target side (i.e., the receiving side of an invocation), we propose an improved middleware security model that supports individual access policies on a per-target basis. The model is based on so-called “resource descriptors”, which are used in addition to cryptographic identities to describe application layer entities in access policies. To be useful, descriptors need to fulfil a number of properties, such as local uniqueness and persistency. Next, we examine the information available at the middleware layer for its usefulness as resource descriptors, in particular the interface name and the instance information inside the object reference. Unfortunately neither fulfils all required properties. However, it is possible to obtain resource descriptors on the target side through a mapping process that links target instance information to an externally provided descriptor. We describe both the mapping configuration when the target is instantiated and the mapping process at invocation time. A proof of concept implementation, which contains a number of technical improvements over earlier attempts to solve this problem, shows that this approach is usable in practice, even for complex architectures, such as CORBA and CORBASec (the security services specified for CORBA). Finally, we examine the security approaches of several related middleware technologies that have emerged since the specification of CORBA and CORBASec, and show the applicability of the resource descriptor mapping.

UCAM-CL-TR-565

Carsten Moenning, Neil A. Dodgson:

Fast Marching farthest point sampling for point clouds and implicit surfaces
May 2003, 15 pages, PDF

Abstract: In a recent paper (Moenning and Dodgson, 2003), the Fast Marching farthest point sampling strategy (FastFPS) for planar domains and curved manifolds was introduced. The version of FastFPS for curved manifolds discussed in the paper deals with surface domains in triangulated form only. Due to a restriction of the underlying Fast Marching method, the algorithm further requires the splitting of any obtuse triangles into acute ones to ensure the consistency of the Fast Marching approximation. In this paper, we overcome these restrictions by using Memoli and Sapiro’s (Memoli and Sapiro, 2001 and 2002) extension of the Fast Marching method to the handling of implicit surfaces and point clouds. We find that the extended FastFPS algorithm can be applied to surfaces in implicit or point cloud form without the loss of the original algorithm’s computational optimality and without the need for any preprocessing.

UCAM-CL-TR-566

Joe Hurd:

Formal verification of probabilistic algorithms
May 2003, 154 pages, PDF
PhD thesis (Trinity College, December 2001)

Abstract: This thesis shows how probabilistic algorithms can be formally verified using a mechanical theorem prover.

We begin with an extensive foundational development of probability, creating a higher-order logic formalization of mathematical measure theory. This allows the definition of the probability space we use to model a random bit generator, which informally is a stream of coin-flips, or technically an infinite sequence of IID Bernoulli(1/2) random variables.

Probabilistic programs are modelled using the state-transformer monad familiar from functional programming, where the random bit generator is passed around in the computation. Functions remove random bits from the generator to perform their calculation, and then pass back the changed random bit generator with the result.
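This state-passing style can be sketched concretely. The following Python fragment (an illustration of the idea only, not the thesis's higher-order-logic formalisation) threads a bit stream through each probabilistic function, which removes the bits it needs and returns the remainder with its result.

```python
# A minimal sketch of the state-transformer style described above
# (illustrative Python, not the thesis's HOL formalisation): each
# probabilistic function takes the random bit generator (here a list
# standing in for an infinite bit stream), consumes some bits, and
# returns its result together with the changed generator.

def flip(bits):
    """Remove one coin-flip from the stream; return (bit, rest)."""
    return bits[0], bits[1:]

def uniform4(bits):
    """Use two coin-flips to sample uniformly from {0, 1, 2, 3}."""
    b1, bits = flip(bits)
    b2, bits = flip(bits)
    return 2 * b1 + b2, bits

value, rest = uniform4([1, 0, 1, 1, 0, 0])
print(value, rest)   # -> 2 [1, 1, 0, 0]
```

Because every function returns the unconsumed remainder of the stream, sequential composition of probabilistic programs is just ordinary function composition on the (result, generator) pairs.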

Our probability space modelling the random bit generator allows us to give precise probabilistic specifications of such programs, and then verify them in the theorem prover.

We also develop technical support designed to expedite verification: probabilistic quantifiers; a compositional property subsuming measurability and independence; a probabilistic while loop together with a formal concept of termination with probability 1. We also introduce a technique for reducing properties of a probabilistic while loop to properties of programs that are guaranteed to terminate: these can then be established using induction and standard methods of program correctness.

We demonstrate the formal framework with some example probabilistic programs: sampling algorithms for four probability distributions; some optimal procedures for generating dice rolls from coin flips; the symmetric simple random walk. In addition, we verify the Miller-Rabin primality test, a well-known and commercially used probabilistic algorithm. Our fundamental perspective allows us to define a version with strong properties, which we can execute in the logic to prove compositeness of numbers.
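For reference, the probabilistic algorithm named above can be stated compactly. The following is a standard textbook Miller-Rabin test in Python (not Hurd's verified HOL version): a suitable witness a proves n composite, and k independent rounds leave an error probability of at most 4^-k.

```python
import random

# A standard textbook Miller-Rabin compositeness test (illustrative only;
# not the thesis's verified HOL formalisation): a witness a proves n
# composite, and k rounds bound the error probability by 4^-k.

def miller_rabin(n, k=20):
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1                          # n - 1 = d * 2^s with d odd
    for _ in range(k):
        a = random.randrange(2, n) if n > 3 else 2
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                # a witnesses that n is composite
    return True                         # probably prime

print([p for p in range(2, 30) if miller_rabin(p)])
# -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Note that a "False" answer is a proof of compositeness, while "True" is only probabilistic; this asymmetry is exactly what makes the algorithm usable inside a logic to prove numbers composite.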

UCAM-CL-TR-567

Joe Hurd:

Using inequalities as term ordering constraints
June 2003, 17 pages, PDF

Abstract: In this paper we show how linear inequalities can be used to approximate Knuth-Bendix term ordering constraints, and how term operations such as substitution can be carried out on systems of inequalities. Using this representation allows an off-the-shelf linear arithmetic decision procedure to check the satisfiability of a set of ordering constraints. We present a formal description of a resolution calculus where systems of inequalities are used to constrain clauses, and implement this using the Omega test as a satisfiability checker. We give the results of an experiment over problems in the TPTP archive, comparing the practical performance of the resolution calculus with and without inherited inequality constraints.

UCAM-CL-TR-568

Gavin Bierman, Michael Hicks, Peter Sewell,Gareth Stoyle, Keith Wansbrough:

Dynamic rebinding for marshalling and update, with destruct-time λ

February 2004, 85 pages, PDF

Abstract: Most programming languages adopt static binding, but for distributed programming an exclusive reliance on static binding is too restrictive: dynamic binding is required in various guises, for example when a marshalled value is received from the network, containing identifiers that must be rebound to local resources. Typically it is provided only by ad-hoc mechanisms that lack clean semantics.

In this paper we adopt a foundational approach, developing core dynamic rebinding mechanisms as extensions to the simply-typed call-by-value λ-calculus. To do so we must first explore refinements of the call-by-value reduction strategy that delay instantiation, to ensure computations make use of the most recent versions of rebound definitions. We introduce redex-time and destruct-time strategies. The latter forms the basis for a λ-marsh calculus that supports dynamic rebinding of marshalled values, while remaining as far as possible statically-typed. We sketch an extension of λ-marsh with concurrency and communication, giving examples showing how wrappers for encapsulating untrusted code can be expressed. Finally, we show that a high-level semantics for dynamic updating can also be based on the destruct-time strategy, defining a λ-update calculus with simple primitives to provide type-safe updating of running code. We thereby establish primitives and a common semantic foundation for a variety of real-world dynamic rebinding requirements.

UCAM-CL-TR-569

James J. Leifer, Gilles Peskine, Peter Sewell,Keith Wansbrough:

Global abstraction-safe marshalling with hash types
June 2003, 86 pages, PDF

Abstract: Type abstraction is a key feature of ML-like languages for writing large programs. Marshalling is necessary for writing distributed programs, exchanging values via network byte-streams or persistent stores. In this paper we combine the two, developing compile-time and run-time semantics for marshalling, that guarantee abstraction-safety between separately-built programs.

We obtain a namespace for abstract types that is global, i.e. meaningful between programs, by hashing module declarations. We examine the scenarios in which values of abstract types are communicated from one program to another, and ensure, by constructing hashes appropriately, that the dynamic and static notions of type equality mirror each other. We use singleton kinds to express abstraction in the static semantics; abstraction is tracked in the dynamic semantics by coloured brackets. These allow us to prove preservation, erasure, and coincidence results. We argue that our proposal is a good basis for extensions to existing ML-like languages, pragmatically straightforward for language users and for implementors.

UCAM-CL-TR-570

Ole Høgh Jensen, Robin Milner:

Bigraphs and mobile processes
July 2003, 121 pages, PDF

Abstract: A bigraphical reactive system (BRS) involves bigraphs, in which the nesting of nodes represents locality, independently of the edges connecting them; it also allows bigraphs to reconfigure themselves. BRSs aim to provide a uniform way to model spatially distributed systems that both compute and communicate. In this memorandum we develop their static and dynamic theory.

In Part I we illustrate bigraphs in action, and show how they correspond to process calculi. We then develop the abstract (non-graphical) notion of wide reactive system (WRS), of which BRSs are an instance. Starting from reaction rules —often called rewriting rules— we use the RPO theory of Leifer and Milner to derive (labelled) transition systems for WRSs, in a way that leads automatically to behavioural congruences.

In Part II we develop bigraphs and BRSs formally. The theory is based directly on graphs, not on syntax. Key results in the static theory are that sufficient RPOs exist (enabling the results of Part I to be applied), that parallel combinators familiar from process calculi may be defined, and that a complete algebraic theory exists at least for pure bigraphs (those without binding). Key aspects in the dynamic theory —the BRSs— are the definition of parametric reaction rules that may replicate or discard parameters, and the full application of the behavioural theory of Part I.

In Part III we introduce a special class: the simple BRSs. These admit encodings of many process calculi, including the π-calculus and the ambient calculus. A still narrower class, the basic BRSs, admits an easy characterisation of our derived transition systems. We exploit this in a case study for an asynchronous π-calculus. We show that structural congruence of process terms corresponds to equality of the representing bigraphs, and that classical strong bisimilarity corresponds to bisimilarity of bigraphs. At the end, we explore several directions for further work.

UCAM-CL-TR-571

James Hall:

Multi-layer network monitoring and analysis
July 2003, 230 pages, PDF
PhD thesis (King’s College, April 2003)

Abstract: Passive network monitoring offers the possibility of gathering a wealth of data about the traffic traversing the network and the communicating processes generating that traffic. Significant advantages include the non-intrusive nature of data capture and the range and diversity of the traffic and driving applications which may be observed. Conversely there are also associated practical difficulties which have restricted the usefulness of the technique: increasing network bandwidths can challenge the capacity of monitors to keep pace with passing traffic without data loss, and the bulk of data recorded may become unmanageable.

Much research based upon passive monitoring has in consequence been limited to that using a subset of the data potentially available, typically TCP/IP packet headers gathered using Tcpdump or similar monitoring tools. The bulk of data collected is thereby minimised, and with the possible exception of packet filtering, the monitor’s processing power is available for the task of collection and storage. As the data available for analysis is drawn from only a small section of the network protocol stack, detailed study is largely confined to the associated functionality and dynamics in isolation from activity at other levels. Such lack of context severely restricts examination of the interaction between protocols, which may in turn lead to inaccurate or erroneous conclusions.

The work described in this report attempts to address some of these limitations. A new passive monitoring architecture — Nprobe — is presented, based upon ‘off the shelf’ components and which, by using clusters of probes, is scalable to keep pace with current high bandwidth networks without data loss. Monitored packets are fully captured, but are subject to the minimum processing in real time needed to identify and associate data of interest across the target set of protocols. Only this data is extracted and stored. The data reduction ratio thus achieved allows examination of a wider range of encapsulated protocols without straining the probe’s storage capacity.

Full analysis of the data harvested from the network is performed off-line. The activity of interest within each protocol is examined and is integrated across the range of protocols, allowing their interaction to be studied. The activity at higher levels informs study of the lower levels, and that at lower levels infers detail of the higher. A technique for dynamically modelling TCP connections is presented, which, by using data from both the transport and higher levels of the protocol stack, differentiates between the effects of network and end-process activity.


The balance of the report presents a study of Web traffic using Nprobe. Data collected from the IP, TCP, HTTP and HTML levels of the stack is integrated to identify the patterns of network activity involved in downloading whole Web pages: by using the links contained in HTML documents observed by the monitor, together with data extracted from the HTML headers of downloaded contained objects, the set of TCP connections used, and the way in which browsers use them, are studied as a whole. An analysis of the degree and distribution of delay is presented and contributes to the understanding of performance as perceived by the user. The effects of packet loss on whole page download times are examined, particularly those losses occurring early in the lifetime of connections before reliable estimations of round trip times are established. The implications of such early packet losses for page downloads using persistent connections are also examined by simulations using the detailed data available.

UCAM-CL-TR-572

Tim Harris:

Design choices for language-based transactions
August 2003, 7 pages, PDF

Abstract: This report discusses two design choices which arose in our recent work on introducing a new ‘atomic’ keyword as an extension to the Java programming language. We discuss the extent to which programs using atomic blocks should be provided with an explicit ‘abort’ operation to roll back the effects of the current block. We also discuss mechanisms for supporting blocks that perform I/O operations or external database transactions.

UCAM-CL-TR-573

Lawrence C. Paulson:

Mechanizing compositional reasoning for concurrent systems: some lessons
August 2003, 20 pages, PDF

Abstract: The paper reports on experiences of mechanizing various proposals for compositional reasoning in concurrent systems. The work uses the UNITY formalism and the Isabelle proof tool. The proposals investigated include existential/universal properties, guarantees properties and progress sets. The paper mentions some alternative proposals that are also worthy of investigation. The conclusions are that many of these methods work and are suitable candidates for further development.

UCAM-CL-TR-574

Ivan Edward Sutherland:

Sketchpad: A man-machine graphical communication system
September 2003, 149 pages, PDF
PhD thesis (Massachusetts Institute of Technology, January 1963)
New preface by Alan Blackwell and Kerry Rodden.

Abstract: The Sketchpad system uses drawing as a novel communication medium for a computer. The system contains input, output, and computation programs which enable it to interpret information drawn directly on a computer display. It has been used to draw electrical, mechanical, scientific, mathematical, and animated drawings; it is a general purpose system. Sketchpad has shown the most usefulness as an aid to the understanding of processes, such as the notion of linkages, which can be described with pictures. Sketchpad also makes it easy to draw highly repetitive or highly accurate drawings and to change drawings previously drawn with it. The many drawings in this thesis were all made with Sketchpad.

A Sketchpad user sketches directly on a computer display with a “light pen.” The light pen is used both to position parts of the drawing on the display and to point to them to change them. A set of push buttons controls the changes to be made such as “erase,” or “move.” Except for legends, no written language is used.

Information sketched can include straight line segments and circle arcs. Arbitrary symbols may be defined from any collection of line segments, circle arcs, and previously defined symbols. A user may define and use as many symbols as he wishes. Any change in the definition of a symbol is at once seen wherever that symbol appears.

Sketchpad stores explicit information about the topology of a drawing. If the user moves one vertex of a polygon, both adjacent sides will be moved. If the user moves a symbol, all lines attached to that symbol will automatically move to stay attached to it. The topological connections of the drawing are automatically indicated by the user as he sketches. Since Sketchpad is able to accept topological information from a human being in a picture language perfectly natural to the human, it can be used as an input program for computation programs which require topological data, e.g., circuit simulators.

Sketchpad itself is able to move parts of the drawing around to meet new conditions which the user may apply to them. The user indicates conditions with the light pen and push buttons. For example, to make two lines parallel, he successively points to the lines with the light pen and presses a button. The conditions themselves are displayed on the drawing so that they may be erased or changed with the light pen language. Any combination of conditions can be defined as a composite condition and applied in one step.

It is easy to add entirely new types of conditions to Sketchpad’s vocabulary. Since the conditions can involve anything computable, Sketchpad can be used for a very wide range of problems. For example, Sketchpad has been used to find the distribution of forces in the members of truss bridges drawn with it.

Sketchpad drawings are stored in the computer in a specially designed “ring” structure. The ring structure features rapid processing of topological information with no searching at all. The basic operations used in Sketchpad for manipulating the ring structure are described.

UCAM-CL-TR-575

Tim Granger:

Reconfigurable wavelength-switched optical networks for the Internet core
November 2003, 184 pages, PDF
PhD thesis (King’s College, August 2003)

Abstract: With the quantity of data traffic carried on the Internet doubling each year, there is no let-up in the demand for ever-increasing network capacity. Optical fibres have a theoretical capacity of many tens of terabits per second. Currently six terabits per second has been achieved using Dense Wavelength Division Multiplexing: multiple signals at different wavelengths carried on the same fibre.

This large available bandwidth moves the performance bottlenecks to the processing required at each network node to receive, buffer, route, and transmit each individual packet. For the last 10 years the speed of electronic routers has, in relative terms, been increasing more slowly than optical capacity. The space required and power consumed by these routers is also becoming a significant limitation.

One solution examined in this dissertation is to create a virtual topology in the optical layer by using all-optical switches to create lightpaths across the network. In this way nodes that are not directly connected can appear to be a single virtual hop away, and no per-packet processing is required at the intermediate nodes. With advances in optical switches it is now possible for the network to reconfigure lightpaths dynamically. This allows the network to share the resources available between the different traffic streams flowing across the network, and track changes in traffic volumes by allocating bandwidth on demand.

This solution is inherently a circuit-switched approach, but it takes into account the characteristics of optical switching, in particular waveband switching (where a contiguous range of wavelengths is switched as a single unit) and the latency required to achieve non-disruptive switching.

This dissertation quantifies the potential gain from such a system and how that gain is related to the frequency of reconfiguration. It outlines possible network architectures which allow reconfiguration and, through simulation, measures the performance of these architectures. It then discusses the possible interactions between a reconfiguring optical layer and higher-level network layers.

This dissertation argues that the optical layer should be distinct from higher network layers, maintaining stable full-mesh connectivity, and dynamically reconfiguring the sizes and physical routes of the virtual paths to take advantage of changing traffic levels.

UCAM-CL-TR-576

David R. Spence:

An implementation of a coordinate based location system
November 2003, 12 pages, PDF

Abstract: This paper explains the co-ordinate based location system built for XenoSearch, a resource discovery system in the XenoServer Open Platform. The system builds on the work of GNP, Lighthouse and many more recent schemes. We also present results from various combinations of algorithms to perform the actual co-ordinate calculation based on GNP, Lighthouse and spring-based systems, and show that our implementations of the various algorithms give similar prediction errors.
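A spring-based coordinate calculation of the kind the paper compares can be sketched as follows (an illustration in Python of the general idea, not the paper's implementation): every node pair is joined by a virtual spring whose rest length is the measured latency, and coordinates are iteratively relaxed to shrink the prediction error.

```python
import random

# Illustrative sketch of a spring-based coordinate scheme in the spirit
# of the systems the paper compares (not the paper's own code): each pair
# of nodes is joined by a virtual spring whose rest length is the measured
# latency, and coordinates are relaxed until predicted distances fit.

def relax(coords, latency, rounds=5000, step=0.01):
    for _ in range(rounds):
        for i in coords:
            fx = fy = 0.0
            for j in coords:
                if i == j:
                    continue
                dx = coords[i][0] - coords[j][0]
                dy = coords[i][1] - coords[j][1]
                d = max((dx * dx + dy * dy) ** 0.5, 1e-9)
                err = d - latency[(i, j)]   # spring force ∝ prediction error
                fx -= step * err * dx / d
                fy -= step * err * dy / d
            coords[i] = (coords[i][0] + fx, coords[i][1] + fy)
    return coords

# Three nodes whose pairwise latencies are exactly embeddable in the plane
# (a 3-4-5 triangle), so relaxation can drive the prediction error to ~0.
latency = {}
for a, b, l in [("A", "B", 3.0), ("B", "C", 4.0), ("A", "C", 5.0)]:
    latency[(a, b)] = latency[(b, a)] = l
random.seed(1)
coords = {n: (random.random(), random.random()) for n in "ABC"}
coords = relax(coords, latency)
worst = max(abs(((coords[a][0] - coords[b][0]) ** 2 +
                 (coords[a][1] - coords[b][1]) ** 2) ** 0.5 - l)
            for (a, b), l in latency.items())
print("worst prediction error:", round(worst, 3))
```

With real Internet latencies the metric is generally not exactly embeddable, so the relaxation settles at a non-zero residual; the paper's comparison is essentially about how small that residual is for each algorithm.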

UCAM-CL-TR-577

Markus G. Kuhn:

Compromising emanations: eavesdropping risks of computer displays
December 2003, 167 pages, PDF
PhD thesis (Wolfson College, June 2002)

Abstract: Electronic equipment can emit unintentional signals from which eavesdroppers may reconstruct processed data at some distance. This has been a concern for military hardware for over half a century. The civilian computer-security community became aware of the risk through the work of van Eck in 1985. Military “Tempest” shielding test standards remain secret and no civilian equivalents are available at present. The topic is still largely neglected in security textbooks due to a lack of published experimental data.

This report documents eavesdropping experiments on contemporary computer displays. It discusses the nature and properties of compromising emanations for both cathode-ray tube and liquid-crystal monitors. The detection equipment used matches the capabilities to be expected from well-funded professional eavesdroppers. All experiments were carried out in a normal unshielded office environment. They therefore focus on emanations from display refresh signals, where periodic averaging can be used to obtain reproducible results in spite of varying environmental noise.

Additional experiments described in this report demonstrate how to make information emitted via the video signal more easily receivable, how to recover plaintext from emanations via radio-character recognition, how to remotely estimate precise video-timing parameters, and how to protect displayed text from radio-frequency eavesdroppers by using specialized screen drivers with a carefully selected video card. Furthermore, a proposal for a civilian radio-frequency emission-security standard is outlined, based on path-loss estimates and published data about radio noise levels.

Finally, a new optical eavesdropping technique is demonstrated that reads CRT displays at a distance. It observes high-frequency variations of the light emitted, even after diffuse reflection. Experiments with a typical monitor show that enough video signal remains in the light to permit the reconstruction of readable text from signals detected with a fast photosensor. Shot-noise calculations provide an upper bound for this risk.

UCAM-CL-TR-578

Robert Ennals, Richard Sharp, Alan Mycroft:

Linear types for packet processing (extended version)
January 2004, 31 pages, PDF

Abstract: We present PacLang: an imperative, concurrent, linearly-typed language designed for expressing packet processing applications. PacLang’s linear type system ensures that no packet is referenced by more than one thread, but allows multiple references to a packet within a thread. We argue (i) that this property greatly simplifies compilation of high-level programs to the distributed memory architectures of modern Network Processors; and (ii) that PacLang’s type system captures the style in which imperative packet processing programs are already written. Claim (ii) is justified by means of a case study: we describe a PacLang implementation of the IPv4 unicast packet forwarding algorithm.

PacLang is formalised by means of an operational semantics, and a Unique Ownership theorem formalises its correctness with respect to the type system.

UCAM-CL-TR-579

Keir Fraser:

Practical lock-freedom
February 2004, 116 pages, PDF
PhD thesis (King’s College, September 2003)

Abstract: Mutual-exclusion locks are currently the most popular mechanism for interprocess synchronisation, largely due to their apparent simplicity and ease of implementation. In the parallel-computing environments that are increasingly commonplace in high-performance applications, this simplicity is deceptive: mutual exclusion does not scale well with large numbers of locks and many concurrent threads of execution. Highly-concurrent access to shared data demands a sophisticated ‘fine-grained’ locking strategy to avoid serialising non-conflicting operations. Such strategies are hard to design correctly and with good performance because they can harbour problems such as deadlock, priority inversion and convoying. Lock manipulations may also degrade the performance of cache-coherent multiprocessor systems by causing coherency conflicts and increased interconnect traffic, even when the lock protects read-only data.

In looking for solutions to these problems, interest has developed in lock-free data structures. By eschewing mutual exclusion it is hoped that more efficient and robust systems can be built. Unfortunately the current reality is that most lock-free algorithms are complex, slow and impractical. In this dissertation I address these concerns by introducing and evaluating practical abstractions and data structures that facilitate the development of large-scale lock-free systems.

Firstly, I present an implementation of two useful abstractions that make it easier to develop arbitrary lock-free data structures. Although these abstractions have been described in previous work, my designs are the first that can be practically implemented on current multiprocessor systems.

Secondly, I present a suite of novel lock-free search structures. This is interesting not only because of the fundamental importance of searching in computer science and its wide use in real systems, but also because it demonstrates the implementation issues that arise when using the practical abstractions I have developed.

Finally, I evaluate each of my designs and compare them with existing lock-based and lock-free alternatives. To ensure the strongest possible competition, several of the lock-based alternatives are significant improvements on the best-known solutions in the literature. These results demonstrate that it is possible to build useful data structures with all the perceived benefits of lock-freedom and with performance better than sophisticated lock-based designs. Furthermore, and contrary to popular belief, this work shows that existing hardware primitives are sufficient to build practical lock-free implementations of complex data structures.
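As background to the designs evaluated here, the basic lock-free idiom can be illustrated with the classic compare-and-swap retry loop of a Treiber-style stack (a simplified sketch, not one of Fraser's structures). Python lacks a hardware CAS instruction, so the sketch simulates one with a small internal lock; on real hardware this would be a single atomic instruction.

```python
import threading

# Simplified illustration of the lock-free idiom the dissertation builds
# on (the classic Treiber stack, not one of Fraser's designs): threads
# retry a compare-and-swap on the head pointer instead of taking a lock.
# Python has no hardware CAS, so we simulate one with a small lock.

class Node:
    def __init__(self, value, next_node):
        self.value = value
        self.next = next_node

class LockFreeStack:
    def __init__(self):
        self.head = None
        self._cas_lock = threading.Lock()   # stands in for an atomic CAS

    def _cas(self, expected, new):
        """Atomically: if head is expected, set head = new."""
        with self._cas_lock:
            if self.head is expected:
                self.head = new
                return True
            return False

    def push(self, value):
        while True:                 # retry loop: never blocks on contention
            old = self.head
            if self._cas(old, Node(value, old)):
                return

    def pop(self):
        while True:
            old = self.head
            if old is None:
                return None
            if self._cas(old, old.next):
                return old.value

s = LockFreeStack()
for v in (1, 2, 3):
    s.push(v)
print(s.pop(), s.pop(), s.pop())    # -> 3 2 1
```

A thread whose CAS fails has simply observed a concurrent update and retries with the new head; no thread ever waits for another, which is the essence of the lock-freedom property studied in the dissertation.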

UCAM-CL-TR-580

Ole Høgh Jensen, Robin Milner:

Bigraphs and mobile processes (revised)
February 2004, 131 pages, PDF


Abstract: A bigraphical reactive system (BRS) involves bigraphs, in which the nesting of nodes represents locality, independently of the edges connecting them; it also allows bigraphs to reconfigure themselves. BRSs aim to provide a uniform way to model spatially distributed systems that both compute and communicate. In this memorandum we develop their static and dynamic theory.

In Part I we illustrate bigraphs in action, and show how they correspond to process calculi. We then develop the abstract (non-graphical) notion of wide reactive system (WRS), of which BRSs are an instance. Starting from reaction rules —often called rewriting rules— we use the RPO theory of Leifer and Milner to derive (labelled) transition systems for WRSs, in a way that leads automatically to behavioural congruences.

In Part II we develop bigraphs and BRSs formally. The theory is based directly on graphs, not on syntax. Key results in the static theory are that sufficient RPOs exist (enabling the results of Part I to be applied), that parallel combinators familiar from process calculi may be defined, and that a complete algebraic theory exists at least for pure bigraphs (those without binding). Key aspects in the dynamic theory —the BRSs— are the definition of parametric reaction rules that may replicate or discard parameters, and the full application of the behavioural theory of Part I.

In Part III we introduce a special class: the simple BRSs. These admit encodings of many process calculi, including the π-calculus and the ambient calculus. A still narrower class, the basic BRSs, admits an easy characterisation of our derived transition systems. We exploit this in a case study for an asynchronous π-calculus. We show that structural congruence of process terms corresponds to equality of the representing bigraphs, and that classical strong bisimilarity corresponds to bisimilarity of bigraphs. At the end, we explore several directions for further work.

UCAM-CL-TR-581

Robin Milner:

Axioms for bigraphical structure
February 2004, 26 pages, PDF

Abstract: This paper axiomatises the structure of bigraphs, and proves that the resulting theory is complete. Bigraphs are graphs with double structure, representing locality and connectivity. They have been shown to represent dynamic theories for the π-calculus, mobile ambients and Petri nets, in a way that is faithful to each of those models of discrete behaviour. While the main purpose of bigraphs is to understand mobile systems, a prerequisite for this understanding is a well-behaved theory of the structure of states in such systems. The algebra of bigraph structure is surprisingly simple, as the paper demonstrates; this is because bigraphs treat locality and connectivity orthogonally.

UCAM-CL-TR-582

Piotr Zielinski:

Latency-optimal Uniform Atomic Broadcast algorithm
February 2004, 28 pages, PDF

Abstract: We present a new asynchronous Uniform Atomic Broadcast algorithm with a delivery latency of two communication steps in optimistic settings, which is faster than any other known algorithm and has been shown to be the lower bound. It also has the weakest possible liveness requirements (the Ω failure detector and a majority of correct processes) and achieves three new lower bounds presented in this paper. Finally, we introduce a new notation and several new abstractions, which are used to construct and present the algorithm in a clear and modular way.

UCAM-CL-TR-583

Cédric Gérot, Loïc Barthe, Neil A. Dodgson, Malcolm A. Sabin:

Subdivision as a sequence of sampled Cp surfaces and conditions for tuning schemes
March 2004, 68 pages, PDF

Abstract: We deal with practical conditions for tuning a subdivision scheme in order to control its artifacts in the vicinity of a mark point. To do so, we look for good behaviour of the limit vertices rather than good mathematical properties of the limit surface. The good behaviour of the limit vertices is characterised with the definition of C2-convergence of a scheme. We propose necessary explicit conditions for C2-convergence of a scheme in the vicinity of any mark point being a vertex of valency n or the centre of an n-sided face with n greater or equal to three. These necessary conditions concern the eigenvalues and eigenvectors of subdivision matrices in the frequency domain. The components of these matrices may be complex. If we could guarantee that they were real, this would simplify numerical analysis of the eigenstructure of the matrices, especially in the context of scheme tuning where we manipulate symbolic terms. In this paper we show that an appropriate choice of the parameter space combined with a substitution of vertices lets us transform these matrices into pure real ones. The substitution consists in replacing some vertices by linear combinations of themselves. Finally, we explain how to derive conditions on the eigenelements of the real matrices which are necessary for the C2-convergence of the scheme.


UCAM-CL-TR-584

Stephen Brooks:

Concise texture editing
March 2004, 164 pages, PDF
PhD thesis (Jesus College, October 2003)

Abstract: Many computer graphics applications remain in the domain of the specialist. They are typically characterized by complex user-directed tasks, often requiring proficiency in design, colour spaces, computer interaction and file management. Furthermore, the demands of this skill set are often exacerbated by an equally complex collection of image or object manipulation commands embedded in a variety of interface components. The complexity of these graphic editing tools often requires that the user possess a correspondingly high level of expertise.

Concise Texture Editing is aimed at addressing the over-complexity of modern graphics tools and is based on the intuitive notion that the human user is skilled at high level decision making while the computer is proficient at rapid computation. This thesis has focused on the development of interactive editing tools for 2D texture images and has led to the development of a novel texture manipulation system that allows:

– the concise painting of a texture;
– the concise cloning of textures;
– the concise alteration of texture element size.

The system allows complex operations to be performed on images with minimal user interaction. When applied to the domain of image editing, this implies that the user can instruct the system to perform complex changes to digital images without having to specify copious amounts of detail. In order to reduce the user’s workload, the inherent self-similarity of textures is exploited to interactively replicate editing operations globally over an image. This unique image system thereby reduces the user’s workload through semi-automation, resulting in an acutely concise user interface.

UCAM-CL-TR-585

Mark S. D. Ashdown:

Personal projected displays
March 2004, 150 pages, PDF
PhD thesis (Churchill College, September 2003)

Abstract: Since the inception of the personal computer, the interface presented to users has been defined by the monitor screen, keyboard, and mouse, and by the framework of the desktop metaphor. It is very different from a physical desktop which has a large horizontal surface, allows paper documents to be arranged, browsed, and annotated, and is controlled via continuous movements with both hands. The desktop metaphor will not scale to such a large display; the continuing profusion of paper, which is used as much as ever, attests to its unsurpassed affordances as a medium for manipulating documents; and despite its proven manual and cognitive benefits, two-handed input is still not used in computer interfaces.

I present a system called the Escritoire that uses a novel configuration of overlapping projectors to create a large desk display that fills the area of a conventional desk and also has a high resolution region in front of the user for precise work. The projectors need not be positioned exactly: the projected imagery is warped using standard 3D video hardware to compensate for rough projector positioning and oblique projection. Calibration involves computing planar homographies between the 2D co-ordinate spaces of the warped textures, projector framebuffers, desk, and input devices.

The video hardware can easily perform the necessary warping and achieves 30 frames per second for the dual-projector display. Oblique projection has proved to be a solution to the problem of occlusion common to front-projection systems. The combination of an electromagnetic digitizer and an ultrasonic pen allows simultaneous input with two hands. The pen for the non-dominant hand is simpler and coarser than that for the dominant hand, reflecting the differing roles of the hands in bimanual manipulation. I give a new algorithm for calibrating a pen that uses piecewise linear interpolation between control points. I also give an algorithm to calibrate a wall display at distance using a device whose position and orientation are tracked in three dimensions.

The Escritoire software is divided into a client that exploits the video hardware and handles the input devices, and a server that processes events and stores all of the system state. Multiple clients can connect to a single server to support collaboration. Sheets of virtual paper on the Escritoire can be put in piles which can be browsed and reordered. As with physical paper this allows items to be arranged quickly and informally, avoiding the premature work required to add an item to a hierarchical file system. Another interface feature is pen traces, which allow remote users to gesture to each other. I report the results of tests with individuals and with pairs collaborating remotely. Collaborating participants found an audio channel and the shared desk surface much more useful than a video channel showing their faces.

The Escritoire is constructed from commodity components, and unlike multi-projector display walls its cost is feasible for an individual user and it fits into a normal office setting. It demonstrates a hardware configuration, calibration algorithm, graphics warping process, set of interface features, and distributed architecture that can make personal projected displays a reality.
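The planar-homography calibration mentioned in the abstract can be illustrated with a minimal sketch: the standard direct linear transform (DLT) recovers a 3×3 homography from four point correspondences. This is not the Escritoire’s actual code, and the corner coordinates below are hypothetical.

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 planar homography H with dst ~ H @ src,
    from >= 4 point correspondences, via the standard DLT method."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A (smallest singular vector).
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, p):
    """Apply H to a 2D point via homogeneous coordinates."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Hypothetical calibration: four desk corners as seen in framebuffer coords.
desk = [(0, 0), (1, 0), (1, 1), (0, 1)]
fb   = [(10, 20), (310, 25), (305, 245), (8, 240)]
H = fit_homography(desk, fb)
```

Chaining such homographies (texture → framebuffer → desk → input device) is what lets roughly positioned, obliquely projecting projectors still produce a registered desk image.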

UCAM-CL-TR-586

András Belokosztolszki:


Role-based access control policy administration
March 2004, 170 pages, PDF
PhD thesis (King’s College, November 2003)

Abstract: The wide proliferation of the Internet has set new requirements for access control policy specification. Due to the demand for ad-hoc cooperation between organisations, applications are no longer isolated from each other; consequently, access control policies face a large, heterogeneous, and dynamic environment. Policies, while maintaining their main functionality, go through many minor adaptations, evolving as the environment changes.

In this thesis we investigate the long-term administration of role-based access control (RBAC) – in particular OASIS RBAC – policies.

With the aim of encapsulating persistent goals of policies we introduce extensions in the form of meta-policies. These meta-policies, whose expected lifetime is longer than the lifetime of individual policies, contain extra information and restrictions about policies. It is expected that successive policy versions are checked at policy specification time to ensure that they comply with the requirements and guidelines set by meta-policies.

In the first of the three classes of meta-policies we group together policy components by annotating them with context labels. Based on this grouping and an information flow relation on context labels, we limit the way in which policy components may be connected to other component groups. We use this to partition conceptually disparate portions of policies, and reference these coherent portions to specify policy restrictions and policy enforcement behaviour.

In our second class of meta-policies – compliance policies – we specify requirements on an abstract policy model. We then use this for static policy checking. As compliance tests are performed at policy specification time, compliance policies may include restrictions that either cannot be included in policies, or whose inclusion would result in degraded policy enforcement performance. We also indicate how to use compliance policies to provide information about organisational policies without disclosing sensitive information.

The final class of our meta-policies, called interface policies, is used to help set up and maintain cooperation among organisations by enabling them to use components from each other’s policies. Being based on compliance policies, they use an abstract policy component model, and can also specify requirements for both component exporters and importers. Using such interface policies we can reconcile compatibility issues between cooperating parties automatically.

Finally, building on our meta-policies, we consider policy evolution and self-administration, according to which we treat RBAC policies as distributed resources to which access is specified with the help of RBAC itself.

This enables environments where policies are maintained by many administrators who have varying levels of competence, trust, and jurisdiction.

We have tested all of these concepts in Desert, our proof of concept implementation.

UCAM-CL-TR-587

Paul Alexander Cunningham:

Verification of asynchronous circuits
April 2004, 174 pages, PDF
PhD thesis (Gonville and Caius College, January 2002)

Abstract: The purpose of this thesis is to introduce proposition-oriented behaviours and apply them to the verification of asynchronous circuits. The major contribution of proposition-oriented behaviours is their ability to extend existing formal notations to permit the explicit use of both signal levels and transitions.

This thesis begins with the formalisation of proposition-oriented behaviours in the context of gate networks, and with the set-theoretic extension of both regular-expressions and trace-expressions to reason over proposition-oriented behaviours. A new trace-expression construct, referred to as biased composition, is also introduced. Algorithmic realisation of these set-theoretic extensions is documented using a special form of finite automata called proposition automata. A verification procedure for conformance of gate networks to a set of proposition automata is described in which each proposition automaton may be viewed either as a constraint or a specification. The implementation of this procedure as an automated verification program called Veraci is summarised, and a number of example Veraci programs are used to demonstrate contributions of proposition-oriented behaviour to asynchronous circuit design. These contributions include level-event unification, event abstraction, and relative timing assumptions using biased composition. The performance of Veraci is also compared to an existing event-oriented verification program called Versify, the result of this comparison being a consistent performance gain using Veraci over Versify.

This thesis concludes with the design and implementation of a 2048-bit dual-rail asynchronous Montgomery exponentiator, MOD EXP, in a 0.18µm standard-cell process. The application of Veraci to the design of MOD EXP is summarised, and the practical benefits of proposition-oriented verification are discussed.

UCAM-CL-TR-588

Panit Watcharawitch:

MulTEP: A MultiThreaded Embedded Processor


May 2004, 190 pages, PDF
PhD thesis (Newnham College, November 2003)

Abstract: Conventional embedded microprocessors have traditionally followed the footsteps of high-end processor design to achieve high performance. Their underlying architectures prioritise tasks by time-critical interrupts and rely on software to perform scheduling tasks. Single threaded execution relies on instruction-based probabilistic techniques, such as speculative execution and branch prediction, which are unsuitable for embedded systems when real-time performance guarantees need to be met. Multithreading appears to be a feasible solution for embedded processors. Thread-level parallelism has a potential to overcome the limitations of insufficient instruction-level parallelism to hide the increasing memory latencies. MulTEP is designed to provide high performance thread-level parallelism, real-time characteristics, a flexible number of threads and low incremental cost per thread for the embedded system. In its architecture, a matching-store synchronisation mechanism allows a thread to wait for multiple data items. A tagged up/down dynamic-priority hardware scheduler is provided for real-time scheduling. Pre-loading, pre-fetching and colour-tagging techniques are implemented to allow context switches without any overhead. The architecture provides four additional multithreading instructions for programmers and advanced compilers to create programs with low-overhead multithreaded operations. Experimental results demonstrate that multithreading can be effectively used to improve performance and system utilisation. Latency operations that would otherwise stall the pipeline are hidden by the execution of the other threads. The hardware scheduler provides priority scheduling, which is suitable for real-time embedded applications.
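As a rough software analogue of the dynamic-priority scheduling described above (MulTEP implements this in hardware; the class and thread names here are hypothetical), a priority scheduler can be sketched with a heap:

```python
import heapq
from itertools import count

class PriorityScheduler:
    """Minimal software sketch of a dynamic-priority scheduler:
    always dispatch the runnable thread with the highest priority,
    breaking ties first-in first-out."""
    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-break counter for equal priorities

    def add(self, thread_id, priority):
        # heapq is a min-heap, so negate: higher priority pops first.
        heapq.heappush(self._heap, (-priority, next(self._seq), thread_id))

    def pick(self):
        """Return the highest-priority runnable thread, or None if idle."""
        return heapq.heappop(self._heap)[2] if self._heap else None

sched = PriorityScheduler()
sched.add("audio", priority=9)
sched.add("logging", priority=1)
sched.add("network", priority=5)
```

The hardware version differs in the essentials the abstract highlights: selection happens every cycle with no software overhead, and context switches are hidden by pre-loading and colour-tagging.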

UCAM-CL-TR-589

Glynn Winskel, Francesco Zappa Nardelli:

new-HOPLA — a higher-order process language with name generation
May 2004, 16 pages, PDF

Abstract: This paper introduces new-HOPLA, a concise but powerful language for higher-order nondeterministic processes with name generation. Its origins as a metalanguage for domain theory are sketched but for the most part the paper concentrates on its operational semantics. The language is typed, the type of a process describing the shape of the computation paths it can perform. Its transition semantics, bisimulation, congruence properties and expressive power are explored. Encodings of π-calculus and HOπ are presented.

UCAM-CL-TR-590

Peter R. Pietzuch:

Hermes: A scalable event-based middleware
June 2004, 180 pages, PDF
PhD thesis (Queens’ College, February 2004)

Abstract: Large-scale distributed systems require new middleware paradigms that do not suffer from the limitations of traditional request/reply middleware. These limitations include tight coupling between components, a lack of information filtering capabilities, and support for one-to-one communication semantics only. We argue that event-based middleware is a scalable and powerful new type of middleware for building large-scale distributed systems. However, it is important that an event-based middleware platform includes all the standard functionality that an application programmer expects from middleware.

In this thesis we describe the design and implementation of Hermes, a distributed, event-based middleware platform. The power and flexibility of Hermes is illustrated throughout for two application domains: Internet-wide news distribution and a sensor-rich, active building. Hermes follows a type- and attribute-based publish/subscribe model that places particular emphasis on programming language integration by supporting type-checking of event data and event type inheritance. To handle dynamic, large-scale environments, Hermes uses peer-to-peer techniques for autonomic management of its overlay network of event brokers and for scalable event dissemination. Its routing algorithms, implemented on top of a distributed hash table, use rendezvous nodes to reduce routing state in the system, and include fault-tolerance features for repairing event dissemination trees. All this is achieved without compromising scalability and efficiency, as is shown by a simulational evaluation of Hermes routing.

The core functionality of an event-based middleware is extended with three higher-level middleware services that address different requirements in a distributed computing environment. We introduce a novel congestion control service that avoids congestion in the overlay broker network during normal operation and recovery after failure, and therefore enables a resource-efficient deployment of the middleware. The expressiveness of subscriptions in the event-based middleware is enhanced with a composite event service that performs the distributed detection of complex event patterns, thus taking the burden away from clients. Finally, a security service adds access control to Hermes according to a secure publish/subscribe model. This model supports fine-grained access control decisions so that separate trust domains can share the same overlay broker network.
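The rendezvous-node idea behind Hermes routing can be sketched in a few lines: hashing an event type deterministically selects one broker, so publishers and subscribers of that type meet at the same node without knowing about each other. The broker names are hypothetical, and Hermes’ real routing runs over a distributed hash table rather than a fixed list.

```python
import hashlib

BROKERS = ["broker-a", "broker-b", "broker-c", "broker-d"]  # hypothetical overlay

def rendezvous_broker(event_type: str) -> str:
    """Deterministically map an event type to one broker.
    Both publish and subscribe messages for the type route towards this
    node, which keeps per-type routing state in one place (sketch only)."""
    digest = hashlib.sha1(event_type.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(BROKERS)
    return BROKERS[index]
```

Because the mapping is a pure function of the type name, any broker can compute it locally; this is what reduces routing state compared with flooding subscriptions everywhere.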


UCAM-CL-TR-591

Silas S. Brown:

Conversion of notations
June 2004, 159 pages, PDF
PhD thesis (St John’s College, November 2003)

Abstract: Music, engineering, mathematics, and many other disciplines have established notations for writing their documents. The effectiveness of each of these notations can be hampered by the circumstances in which it is being used, or by a user’s disability or cultural background. Adjusting the notation can help, but the requirements of different cases often conflict, meaning that each new document will have to be transformed between many versions. Tools that support the programming of such transformations can also assist by allowing the creation of new notations on demand, which is an under-explored option in the relief of educational difficulties.

This thesis reviews some programming tools that can be used to manipulate the tree-like structure of a notation in order to transform it into another. It then describes a system “4DML” that allows the programmer to create a “model” of the desired result, from which the transformation is derived. This is achieved by representing the structure in a geometric space with many dimensions, where the model acts as an alternative frame of reference.

Example applications of 4DML include the transcription of songs and musical scores into various notations, the production of specially-customised notations to assist a sight-impaired person in learning Chinese, an unusual way of re-organising personal notes, a “website scraping” system for extracting data from on-line services that provide only one presentation, and an aid to making mathematics and diagrams accessible to people with severe print disabilities. The benefits and drawbacks of the 4DML approach are evaluated, and possible directions for future work are explored.

UCAM-CL-TR-592

Mike Bond, Daniel Cvrček, Steven J. Murdoch:

Unwrapping the Chrysalis
June 2004, 15 pages, PDF

Abstract: We describe our experiences reverse engineering the Chrysalis-ITS Luna CA3 – a PKCS#11 compliant cryptographic token. Emissions analysis and security API attacks are viewed by many to be simpler and more efficient than a direct attack on an HSM. But how difficult is it to actually “go in the front door”? We describe how we unpicked the CA3 internal architecture and abused its low-level API to impersonate a CA3 token in its cloning protocol – and extract PKCS#11 private keys in the clear. We quantify the effort involved in developing and applying the skills necessary for such a reverse-engineering attack. In the process, we discover that the Luna CA3 has far more undocumented code and functionality than is revealed to the end-user.

UCAM-CL-TR-593

Piotr Zielinski:

Paxos at war
June 2004, 30 pages, PDF

Abstract: The optimistic latency of Byzantine Paxos can be reduced from three communication steps to two, without using public-key cryptography. This is done by making a decision when more than (n+3f)/2 acceptors report to have received the same proposal from the leader, with n being the total number of acceptors and f the number of the faulty ones. No further improvement in latency is possible, because every Consensus algorithm must take at least two steps even in benign settings. Moreover, if the leader is correct, our protocol achieves the latency of at most three steps, even if some other processes fail. These two properties make this the fastest Byzantine agreement protocol proposed so far.

By running many instances of this algorithm in parallel, we can implement Vector Consensus and Byzantine Atomic Broadcast in two and three steps, respectively, which is two steps faster than any other known algorithm.
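The fast-path decision rule stated in the abstract is a simple threshold test. The following sketch (function name hypothetical) checks only that rule, not the full protocol:

```python
def can_decide(same_proposal_reports: int, n: int, f: int) -> bool:
    """Fast-path rule from the abstract: decide once strictly more than
    (n + 3f)/2 acceptors report the same proposal from the leader,
    where n is the total number of acceptors and f the number of
    Byzantine-faulty ones."""
    return same_proposal_reports > (n + 3 * f) / 2

# With n = 7 acceptors tolerating f = 1 Byzantine failure,
# (n + 3f)/2 = 5, so 6 matching reports suffice but 5 do not.
```

The strict inequality matters: exactly (n+3f)/2 matching reports could still be explained by faulty acceptors lying about a different proposal.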

UCAM-CL-TR-594

George Danezis:

Designing and attacking anonymous communication systems
July 2004, 150 pages, PDF
PhD thesis (Queens’ College, January 2004)

Abstract: This report contributes to the field of anonymous communications over widely deployed communication networks. It describes novel schemes to protect anonymity; it also presents powerful new attacks and new ways of analysing and understanding anonymity properties.

We present Mixminion, a new generation anonymous remailer, and examine its security against all known passive and active cryptographic attacks. We use the secure anonymous replies it provides to describe a pseudonym server, as an example of the anonymous protocols that Mixminion can support. The security of mix systems is then assessed against a compulsion threat model, in which an adversary can request the decryption of material from honest nodes. A new construction, the fs-mix, is presented that makes tracing messages by such an adversary extremely expensive.

Moving beyond the static security of anonymous communication protocols, we define a metric based on information theory that can be used to measure anonymity. The analysis of the pool mix serves as an example of its use. We then create a framework within which we compare the traffic analysis resistance provided by different mix network topologies. A new topology, based on expander graphs, proves to be efficient and secure. The rgb-mix is also presented; this implements a strategy to detect flooding attacks against honest mix nodes and neutralise them by the use of cover traffic.

Finally a set of generic attacks are studied. Statistical disclosure attacks model the whole anonymous system as a black box, and are able to uncover the relationships between long-term correspondents. Stream attacks trace streams of data travelling through anonymizing networks, and uncover the communicating parties very quickly. They both use statistical methods to drastically reduce the anonymity of users. Other minor attacks are described against peer discovery and route reconstruction in anonymous networks, as well as the naïve use of anonymous replies.
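Information-theoretic anonymity metrics of the kind referred to above are commonly formulated as the Shannon entropy of the attacker’s probability distribution over possible senders. A minimal sketch (illustrative only, with made-up distributions):

```python
import math

def anonymity_entropy(probs):
    """Effective anonymity of a message, in bits: the Shannon entropy of
    the attacker's probability distribution over candidate senders.
    Zero means the sender is certain; log2(N) means N equally likely
    candidates."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A perfectly mixed batch of 8 equally likely senders gives 3 bits;
# any skew the attacker can establish reduces the entropy.
uniform = [1 / 8] * 8
skewed = [0.65] + [0.05] * 7
```

Statistical disclosure attacks can be read in these terms: repeated observations skew the distribution towards the true correspondents, driving the entropy towards zero.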

UCAM-CL-TR-595

Pablo J. Arrighi:

Representations of quantum operations, with applications to quantum cryptography
July 2004, 157 pages, PDF
PhD thesis (Emmanuel College, 23 September 2003)

Abstract: Representations of quantum operations – We start by introducing a geometrical representation (real vector space) of quantum states and quantum operations. To do so we exploit an isomorphism from positive matrices to a subcone of the Minkowski future light-cone. Pure states map onto certain light-like vectors, whilst the axis of revolution encodes the overall probability of occurrence for the state. This extension of the Generalized Bloch Sphere enables us to cater for non-trace-preserving quantum operations, and in particular to view the per-outcome effects of generalized measurements. We show that these consist of the product of an orthogonal transform about the axis of the cone of revolution and a positive real symmetric linear transform. In the case of a qubit the representation becomes all the more interesting since it elegantly associates, to each measurement element of a generalized measurement, a Lorentz transformation in Minkowski space. We formalize explicitly this correspondence between ‘observation of a quantum system’ and ‘special relativistic change of inertial frame’. To end this part we review the state-operator correspondence, which was successfully exploited by Choi to derive the operator-sum representation of quantum operations. We go further and show that all of the important theorems concerning quantum operations can in fact be derived as simple corollaries of those concerning quantum states.

Using this methodology we derive novel composition laws upon quantum states and quantum operations, Schmidt-type decompositions for bipartite pure states and some powerful formulae relating to the correspondence.
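The light-cone picture above can be illustrated numerically with the standard Pauli decomposition of a qubit density matrix: mapping ρ to the 4-vector (tr ρ, tr ρσx, tr ρσy, tr ρσz) sends a trace-one pure state onto the light cone t² = x² + y² + z². This is a sketch of the general idea only; the thesis’s exact conventions may differ.

```python
import numpy as np

# Identity and Pauli matrices
I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def conal_vector(rho):
    """Map a qubit density matrix to (t, x, y, z): t = tr(rho) and
    (x, y, z) the Pauli components. A trace-one pure state satisfies
    t**2 == x**2 + y**2 + z**2, i.e. it lies on the light cone."""
    return np.real([np.trace(rho @ P) for P in (I2, SX, SY, SZ)])

# Pure state |+> = (|0> + |1>) / sqrt(2)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(plus, plus.conj())
t, x, y, z = conal_vector(rho)
```

A sub-normalised state (trace below one, as produced by a non-trace-preserving operation) simply sits further inside the cone, which is what lets the representation handle per-outcome measurement effects.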

Quantum cryptography – The key principle of quantum cryptography could be summarized as follows. Honest parties communicate using quantum states. To the eavesdropper these states are random and non-orthogonal. In order to gather information she must measure them, but this may cause irreversible damage. Honest parties seek to detect her mischief by checking whether certain quantum states are left intact. Thus the tradeoff between the eavesdropper’s information gain, and the disturbance she necessarily induces, can be viewed as the power engine behind quantum cryptographic protocols. We begin by quantifying this tradeoff in the case of a measure distinguishing two non-orthogonal equiprobable pure states. A formula for this tradeoff was first obtained by Fuchs and Peres, but we provide a shorter, geometrical derivation (within the framework of the above mentioned conal representation). Next we proceed to analyze the information gain versus disturbance tradeoff in a scenario where Alice and Bob interleave, at random, pairwise superpositions of two message words within their otherwise classical communications. This work constitutes one of the few results currently available regarding d-level systems quantum cryptography, and seems to provide a good general primitive for building such protocols. The proof crucially relies on the state-operator correspondence formulae derived in the first part, together with some methods by Banaszek. Finally we make use of this analysis to prove the security of a ‘blind quantum computation’ protocol, whereby Alice gets Bob to perform some quantum algorithm for her, but prevents him from learning her input to this quantum algorithm.

UCAM-CL-TR-596

Keir Fraser, Steven Hand, Rolf Neugebauer, Ian Pratt, Andrew Warfield, Mark Williamson:

Reconstructing I/O
August 2004, 16 pages, PDF

Abstract: We present a next-generation architecture that addresses problems of dependability, maintainability, and manageability of I/O devices and their software drivers on the PC platform. Our architecture resolves both hardware and software issues, exploiting emerging hardware features to improve device safety. Our high-performance implementation, based on the Xen virtual machine monitor, provides an immediate transition opportunity for today’s systems.


UCAM-CL-TR-597

Advaith Siddharthan:

Syntactic simplification and text cohesion
August 2004, 195 pages, PDF
PhD thesis (Gonville and Caius College, November 2003)

Abstract: Syntactic simplification is the process of reducing the grammatical complexity of a text, while retaining its information content and meaning. The aim of syntactic simplification is to make text easier to comprehend for human readers, or process by programs. In this thesis, I describe how syntactic simplification can be achieved using shallow robust analysis, a small set of hand-crafted simplification rules and a detailed analysis of the discourse-level aspects of syntactically rewriting text. I offer a treatment of relative clauses, apposition, coordination and subordination.

I present novel techniques for relative clause and appositive attachment. I argue that these attachment decisions are not purely syntactic. My approaches rely on a shallow discourse model and on animacy information obtained from a lexical knowledge base. I also show how clause and appositive boundaries can be determined reliably using a decision procedure based on local context, represented by part-of-speech tags and noun chunks.

I then formalise the interactions that take place between syntax and discourse during the simplification process. This is important because the usefulness of syntactic simplification in making a text accessible to a wider audience can be undermined if the rewritten text lacks cohesion. I describe how various generation issues like sentence ordering, cue-word selection, referring-expression generation, determiner choice and pronominal use can be resolved so as to preserve conjunctive and anaphoric cohesive-relations during syntactic simplification.

In order to perform syntactic simplification, I have had to address various natural language processing problems, including clause and appositive identification and attachment, pronoun resolution and referring-expression generation. I evaluate my approaches to solving each problem individually, and also present a holistic evaluation of my syntactic simplification system.
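As a toy illustration of the kind of rewriting a relative-clause rule performs (a naive regex sketch, nothing like the thesis’s shallow-analysis machinery, which also handles attachment, boundary detection and cohesion):

```python
import re

def split_relative_clause(sentence: str) -> str:
    """Detach a non-restrictive relative clause ("X, who/which ..., ...")
    into a separate sentence, repeating the head noun phrase as the
    referring expression. Purely illustrative; real simplification needs
    part-of-speech tags, noun chunks and discourse context."""
    m = re.match(r"(.+?), (?:who|which) (.+?), (.+)", sentence)
    if not m:
        return sentence
    head, clause, rest = m.groups()
    return f"{head} {rest} {head} {clause}."
```

For example, "The cat, which was black, slept quietly." becomes two shorter sentences, preserving the information while reducing grammatical complexity.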

UCAM-CL-TR-598

James J. Leifer, Robin Milner:

Transition systems, link graphs and Petri nets
August 2004, 64 pages, PDF

Abstract: A framework is defined within which reactive systems can be studied formally. The framework is based upon s-categories, a new variety of categories, within which reactive systems can be set up in such a way that labelled transition systems can be uniformly extracted. These lead in turn to behavioural preorders and equivalences, such as the failures preorder (treated elsewhere) and bisimilarity, which are guaranteed to be congruential. The theory rests upon the notion of relative pushout previously introduced by the authors. The framework is applied to a particular graphical model known as link graphs, which encompasses a variety of calculi for mobile distributed processes. The specific theory of link graphs is developed. It is then applied to an established calculus, namely condition-event Petri nets. In particular, a labelled transition system is derived for condition-event nets, corresponding to a natural notion of observable actions in Petri net theory. The transition system yields a congruential bisimilarity coinciding with one derived directly from the observable actions. This yields a calibration of the general theory of reactive systems and link graphs against known specific theories.

UCAM-CL-TR-599

Mohamed F. Hassan:

Further analysis of ternary and 3-point univariate subdivision schemes
August 2004, 9 pages, PDF

Abstract: The precision set, approximation order and Hölder exponent are derived for each of the univariate subdivision schemes described in Technical Report UCAM-CL-TR-520.

UCAM-CL-TR-600

Brian Ninham Shand:

Trust for resource control: Self-enforcing automatic rational contracts between computers
August 2004, 154 pages, PDF
PhD thesis (Jesus College, February 2004)

Abstract: Computer systems need to control access to their resources, in order to give precedence to urgent or important tasks. This is increasingly important in networked applications, which need to interact with other machines but may be subject to abuse unless protected from attack. To do this effectively, they need an explicit resource model, and a way to assess others’ actions in terms of it. This dissertation shows how the actions can be represented using resource-based computational contracts, together with a rich trust model which monitors and enforces contract compliance.

Related research in the area has focused on individual aspects of this problem, such as resource pricing and auctions, trust modelling and reputation systems, or resource-constrained computing and resource-aware middleware. These need to be integrated into a single model, in order to provide a general framework for computing by contract.

This work explores automatic computerized contracts for negotiating and controlling resource usage in a distributed system. Contracts express the terms under which client and server promise to exchange resources, such as processor time in exchange for money, using a constrained language which can be automatically interpreted. A novel, distributed trust model is used to enforce these promises, and this also supports trust delegation through cryptographic certificates. The model is formally proved to have appropriate properties of safety and liveness, which ensure that cheats cannot systematically gain resources by deceit, and that mutually profitable contracts continue to be supported.

The contract framework has many applications, in automating distributed services and in limiting the disruptiveness of users’ programs. Applications such as resource-constrained sandboxes, operating system multimedia support and automatic distribution of personal address book entries can all treat the user’s time as a scarce resource, to trade off computational costs against user distraction. Similarly, commercial Grid services can prioritise computations with contracts, while a cooperative service such as distributed composite event detection can use contracts for detector placement and load balancing. Thus the contract framework provides a general purpose tool for managing distributed computation, allowing participants to take calculated risks and rationally choose which contracts to perform.

UCAM-CL-TR-601

Hasan Amjad:

Combining model checking and theorem proving
September 2004, 131 pages, PDF
PhD thesis (Trinity College, March 2004)

Abstract: We implement a model checker for the modal mu-calculus as a derived rule in a fully expansive mechanical theorem prover, without causing an unacceptable performance penalty.

We use a restricted form of a higher order logic representation calculus for binary decision diagrams (BDDs) to interface the model checker to a high-performance BDD engine. This is used with a formalised theory of the modal mu-calculus (which we also develop) for model checking in which all steps of the algorithm are justified by fully expansive proof.

This provides a fine-grained integration of model checking and theorem proving using a mathematically rigorous interface. The generality of our theories allows us to perform much of the proof offline, in contrast with earlier work. This substantially reduces the inevitable performance penalty of doing model checking by proof.

To demonstrate the feasibility of our approach, optimisations to the model checking algorithm are added. We add naive caching and also perform advanced caching for nested non-alternating fixed-point computations.

Finally, the usefulness of the work is demonstrated. We leverage our theory by proving translations to simpler logics that are in more widespread use. We then implement an executable theory for counterexample-guided abstraction refinement that also uses a SAT solver. We verify properties of a bus architecture in use in industry as well as a pedagogical arithmetic and logic unit. The benchmarks show an acceptable performance penalty, and the results are correct by construction.

UCAM-CL-TR-602

Hasan Amjad:

Model checking the AMBA protocol in HOL
September 2004, 27 pages, PDF

Abstract: The Advanced Microcontroller Bus Architecture (AMBA) is an open System-on-Chip bus protocol for high-performance buses on low-power devices. In this report we implement a simple model of AMBA and use model checking and theorem proving to verify latency, arbitration, coherence and deadlock freedom properties of the implementation.

UCAM-CL-TR-603

Robin Milner:

Bigraphs whose names have multiple locality
September 2004, 15 pages, PDF

Abstract: The previous definition of binding bigraphs is generalised so that local names may be located in more than one region, allowing more succinct and flexible presentation of bigraphical reactive systems. This report defines the generalisation, verifies that it retains relative pushouts, and introduces a new notion of bigraph extension; this admits a wider class of parametric reaction rules. Extension is shown to be well-behaved algebraically; one consequence is that, as in the original definition of bigraphs, discrete parameters are sufficient to generate all reactions.

UCAM-CL-TR-604

Andrei Serjantov:

On the anonymity of anonymity systems
October 2004, 162 pages, PDF
PhD thesis (Queens’ College, March 2003)

Abstract: Anonymity on the Internet is a property commonly identified with privacy of electronic communications. A number of different systems exist which claim to provide anonymous email and web browsing, but their effectiveness has hardly been evaluated in practice. In this thesis we focus on the anonymity properties of such systems. First, we show how the anonymity of anonymity systems can be quantified, pointing out flaws with existing metrics and proposing our own. In the process we distinguish the anonymity of a message and that of an anonymity system.

Secondly, we focus on the properties of building blocks of mix-based (email) anonymity systems, evaluating their resistance to powerful blending attacks, their delay, their anonymity under normal conditions and other properties. This leads us to methods of computing anonymity for a particular class of mixes – timed mixes – and a new binomial mix.

Next, we look at the anonymity of a message going through an entire anonymity system based on a mix network architecture. We construct a semantics of a network with threshold mixes, define the information observable by an attacker, and give a principled definition of the anonymity of a message going through such a network.

We then consider low latency connection-based anonymity systems, giving concrete attacks and describing methods of protection against them. In particular, we show that Peer-to-Peer anonymity systems provide less anonymity against the global passive adversary than ones based on a “classic” architecture.

Finally, we give an account of how anonymity can be used in censorship resistant systems. These are designed to provide availability of documents, while facing threats from a powerful adversary. We show how anonymity can be used to hide the identity of the servers where each of the documents is stored, thus making them harder to remove from the system.

UCAM-CL-TR-605

Peter Sewell, James J. Leifer, Keith Wansbrough, Mair Allen-Williams, Francesco Zappa Nardelli, Pierre Habouzit, Viktor Vafeiadis:

Acute: High-level programming language design for distributed computation

Design rationale and language definition
October 2004, 193 pages, PDF

Abstract: This paper studies key issues for distributed programming in high-level languages. We discuss the design space and describe an experimental language, Acute, which we have defined and implemented.

Acute extends an OCaml core to support distributed development, deployment, and execution, allowing type-safe interaction between separately-built programs. It is expressive enough to enable a wide variety of distributed infrastructure layers to be written as simple library code above the byte-string network and persistent store APIs, disentangling the language runtime from communication.

This requires a synthesis of novel and existing features:

(1) type-safe marshalling of values between programs;

(2) dynamic loading and controlled rebinding to local resources;

(3) modules and abstract types with abstraction boundaries that are respected by interaction;

(4) global names, generated either freshly or based on module hashes: at the type level, as runtime names for abstract types; and at the term level, as channel names and other interaction handles;

(5) versions and version constraints, integrated with type identity;

(6) local concurrency and thread thunkification; and

(7) second-order polymorphism with a namecase construct.

We deal with the interplay among these features and the core, and develop a semantic definition that tracks abstraction boundaries, global names, and hashes throughout compilation and execution, but which still admits an efficient implementation strategy.

UCAM-CL-TR-606

Nicholas Nethercote:

Dynamic binary analysis and instrumentation
November 2004, 177 pages, PDF
PhD thesis (Trinity College, November 2004)

Abstract: Dynamic binary analysis (DBA) tools such as profilers and checkers help programmers create better software. Dynamic binary instrumentation (DBI) frameworks make it easy to build new DBA tools. This dissertation advances the theory and practice of dynamic binary analysis and instrumentation, with an emphasis on the importance of the use and support of metadata.

The dissertation has three main parts.

The first part describes a DBI framework called Valgrind which provides novel features to support heavyweight DBA tools that maintain rich metadata, especially location metadata—the shadowing of every register and memory location with a metavalue. Location metadata is used in shadow computation, a kind of DBA where every normal operation is shadowed by an abstract operation.

The second part describes three powerful DBA tools. The first tool performs detailed cache profiling. The second tool does an old kind of dynamic analysis—bounds-checking—in a new way. The third tool produces dynamic data flow graphs, a novel visualisation that cuts to the essence of a program’s execution. All three tools were built with Valgrind, and rely on Valgrind’s support for heavyweight DBA and rich metadata, and the latter two perform shadow computation.

The third part describes a novel system of semi-formal descriptions of DBA tools. It gives many example descriptions, and also considers in detail exactly what dynamic analysis is.

The dissertation makes six main contributions.

First, the descriptions show that metadata is the key component of dynamic analysis; in particular, whereas static analysis predicts approximations of a program’s future, dynamic analysis remembers approximations of a program’s past, and these approximations are exactly what metadata is.

Second, the example tools show that rich metadata and shadow computation make for powerful and novel DBA tools that do more than the traditional tracing and profiling.

Third, Valgrind and the example tools show that a DBI framework can make it easy to build heavyweight DBA tools, by providing good support for rich metadata and shadow computation.

Fourth, the descriptions are a precise and concise way of characterising tools, provide a directed way of thinking about tools that can lead to better implementations, and indicate the theoretical upper limit of the power of DBA tools in general.

Fifth, the three example tools are interesting in theirown right, and the latter two are novel.

Finally, the entire dissertation provides many details, and represents a great deal of condensed experience, about implementing DBI frameworks and DBA tools.

UCAM-CL-TR-607

Neil E. Johnson:

Code size optimization for embedded processors
November 2004, 159 pages, PDF
PhD thesis (Robinson College, May 2004)

Abstract: This thesis studies the problem of reducing code size produced by an optimizing compiler. We develop the Value State Dependence Graph (VSDG) as a powerful intermediate form. Nodes represent computation, and edges represent value (data) and state (control) dependencies between nodes. The edges specify a partial ordering of the nodes—sufficient ordering to maintain the I/O semantics of the source program, while allowing optimizers greater freedom to move nodes within the program to achieve better (smaller) code. Optimizations, both classical and new, transform the graph through graph rewriting rules prior to code generation. Additional (semantically inessential) state edges are added to transform the VSDG into a Control Flow Graph, from which target code is generated.

We show how procedural abstraction can be advantageously applied to the VSDG. Graph patterns are extracted from a program’s VSDG. We then select repeated patterns giving the greatest size reduction, generate new functions from these patterns, and replace all occurrences of the patterns in the original VSDG with calls to these abstracted functions. Several embedded processors have load- and store-multiple instructions, representing several loads (or stores) as one instruction. We present a method, benefiting from the VSDG form, for using these instructions to reduce code size by provisionally combining loads and stores before code generation. The final contribution of this thesis is a combined register allocation and code motion (RACM) algorithm. We show that our RACM algorithm formulates these two previously antagonistic phases as one combined pass over the VSDG, transforming the graph (moving or cloning nodes, or spilling edges) to fit within the physical resources of the target processor.

We have implemented our ideas within a prototype C compiler and suite of VSDG optimizers, generating code for the Thumb 32-bit processor. Our results show improvements for each optimization and that we can achieve code sizes comparable to, and in some cases better than, that produced by commercial compilers with significant investments in optimization technology.

UCAM-CL-TR-608

Walt Yao:

Trust management for widely distributed systems
November 2004, 191 pages, PDF
PhD thesis (Jesus College, February 2003)

Abstract: In recent years, we have witnessed the evolutionary development of a new breed of distributed systems. Systems of this type share a number of characteristics – highly decentralized, of Internet-grade scalability, and autonomous within their administrative domains. Most importantly, they are expected to operate collaboratively across both known and unknown domains. Prime examples include peer-to-peer applications and open web services. Typically, authorization in distributed systems is identity-based, e.g. access control lists. However, approaches based on predefined identities are unsuitable for the new breed of distributed systems because of the need to deal with unknown users, i.e. strangers, and the need to manage a potentially large number of users and/or resources. Furthermore, effective administration and management of authorization in such systems requires: (1) natural mapping of organizational policies into security policies; (2) managing collaboration of independently administered domains/organizations; (3) decentralization of security policies and policy enforcement.

This thesis describes Fidelis, a trust management framework designed to address the authorization needs for the next-generation distributed systems. A trust management system is a term coined to refer to a unified framework for the specification of security policies, the representation of credentials, and the evaluation and enforcement of policy compliances. Based on the concept of trust conveyance and a generic abstraction for trusted information as trust statements, Fidelis provides a generic platform for building secure, trust-aware distributed applications. At the heart of the Fidelis framework is a language for the specification of security policies, the Fidelis Policy Language (FPL), and the inference model for evaluating policies expressed in FPL. With the policy language and its inference model, Fidelis is able to model recommendation-style policies and policies with arbitrarily complex chains of trust propagation.

Web services have rapidly been gaining significance both in industry and research as a ubiquitous, next-generation middleware platform. The second half of the thesis describes the design and implementation of the Fidelis framework for the standard web service platform. The goal of this work is twofold: first, to demonstrate the practical feasibility of Fidelis, and second, to investigate the use of a policy-driven trust management framework for Internet-scale open systems. An important requirement in such systems is trust negotiation that allows unfamiliar principals to establish mutual trust and interact with confidence. Addressing this requirement, a trust negotiation framework built on top of Fidelis is developed.

This thesis examines the application of Fidelis in three distinctive domains: implementing generic role-based access control, trust management in the World Wide Web, and an electronic marketplace comprising unfamiliar and untrusted but collaborative organizations.

UCAM-CL-TR-609

Eleanor Toye, Anil Madhavapeddy, Richard Sharp, David Scott, Alan Blackwell, Eben Upton:

Using camera-phones to interact with context-aware mobile services
December 2004, 23 pages, PDF

Abstract: We describe an interaction technique for controlling site-specific mobile services using commercially available camera-phones, public information displays and visual tags. We report results from an experimental study validating this technique in terms of pointing speed and accuracy. Our results show that even novices can use camera-phones to “point-and-click” on visual tags quickly and accurately. We have built a publicly available client/server software framework for rapid development of applications that exploit our interaction technique. We describe two prototype applications that were implemented using this framework and present findings from user-experience studies based on these applications.

UCAM-CL-TR-610

Tommy Ingulfsen:

Influence of syntax on prosodic boundary prediction
December 2004, 49 pages, PDF
MPhil thesis (Churchill College, July 2004)

Abstract: In this thesis we compare the effectiveness of different syntactic features and syntactic representations for prosodic boundary prediction, setting out to clarify which representations are most suitable for this task. The results of a series of experiments show that it is not possible to conclude that a single representation is superior to all others. Three representations give rise to similar experimental results. One of these representations is composed only of shallow features, which were originally thought to have less predictive power than deep features. Conversely, one of the deep representations that seemed to be best suited for our purposes (syntactic chunks) turns out not to be among the three best.

UCAM-CL-TR-611

Neil A. Dodgson:

An heuristic analysis of the classification of bivariate subdivision schemes
December 2004, 18 pages, PDF

Abstract: Alexa [*] and Ivrissimtzis et al. [+] have proposed a classification mechanism for bivariate subdivision schemes. Alexa considers triangular primal schemes; Ivrissimtzis et al. generalise this both to quadrilateral and hexagonal meshes and to dual and mixed schemes. I summarise this classification and then proceed to analyse it in order to determine which classes of subdivision scheme are likely to contain useful members. My aim is to ascertain whether there are any potentially useful classes which have not yet been investigated or whether we can say, with reasonable confidence, that all of the useful classes have already been considered. I apply heuristics related to the mappings of element types (vertices, face centres, and mid-edges) to one another, to the preservation of symmetries, to the alignment of meshes at different subdivision levels, and to the size of the overall subdivision mask. My conclusion is that there are only a small number of useful classes and that most of these have already been investigated in terms of linear, stationary subdivision schemes. There is some space for further work, particularly in the investigation of whether there are useful ternary linear, stationary subdivision schemes, but it appears that future advances are more likely to lie elsewhere.

[*] M. Alexa. Refinement operators for triangle meshes. Computer Aided Geometric Design, 19(3):169-172, 2002.

[+] I. P. Ivrissimtzis, N. A. Dodgson, and M. A. Sabin. A generative classification of mesh refinement rules with lattice transformations. Computer Aided Geometric Design, 22(1):99-109, 2004.

UCAM-CL-TR-612

Alastair R. Beresford:

Location privacy in ubiquitous computing
January 2005, 139 pages, PDF
PhD thesis (Robinson College, April 2004)

Abstract: The field of ubiquitous computing envisages an era when the average consumer owns hundreds or thousands of mobile and embedded computing devices. These devices will perform actions based on the context of their users, and therefore ubiquitous systems will gather, collate and distribute much more personal information about individuals than computers do today. Much of this personal information will be considered private, and therefore mechanisms which allow users to control the dissemination of these data are vital. Location information is a particularly useful form of context in ubiquitous computing, yet its unconditional distribution can be very invasive.

This dissertation develops novel methods for providing location privacy in ubiquitous computing. Much of the previous work in this area uses access control to enable location privacy. This dissertation takes a different approach and argues that many location-aware applications can function with anonymised location data and that, where this is possible, its use is preferable to that of access control.

Suitable anonymisation of location data is not a trivial task: under a realistic threat model simply removing explicit identifiers does not anonymise location information. This dissertation describes why this is the case and develops two quantitative security models for anonymising location data: the mix zone model and the variable quality model.

A trusted third-party can use one, or both, models to ensure that all location events given to untrusted applications are suitably anonymised. The mix zone model supports untrusted applications which require accurate location information about users in a set of disjoint physical locations. In contrast, the variable quality model reduces the temporal or spatial accuracy of location information to maintain user anonymity at every location.

Both models provide a quantitative measure of the level of anonymity achieved; therefore any given situation can be analysed to determine the amount of information an attacker can gain through analysis of the anonymised data. The suitability of both these models is demonstrated and the level of location privacy available to users of real location-aware applications is measured.

UCAM-CL-TR-613

David J. Scott:

Abstracting application-level security policy for ubiquitous computing
January 2005, 186 pages, PDF
PhD thesis (Robinson College, September 2004)

Abstract: In the future world of Ubiquitous Computing, tiny embedded networked computers will be found in everything from mobile phones to microwave ovens. Thanks to improvements in technology and software engineering, these computers will be capable of running sophisticated new applications constructed from mobile agents. Inevitably, many of these systems will contain application-level vulnerabilities; errors caused by either unanticipated mobility or interface behaviour. Unfortunately existing methods for applying security policy – network firewalls – are inadequate to control and protect the hordes of vulnerable mobile devices. As more and more critical functions are handled by these systems, the potential for disaster is increasing rapidly.

To counter these new threats, this report champions the approach of using new application-level security policy languages in combination to protect vulnerable applications. Policies are abstracted from main application code, facilitating both analysis and future maintenance. As well as protecting existing applications, such policy systems can help as part of a security-aware design process when building new applications from scratch.

Three new application-level policy languages are contributed, each addressing a different kind of vulnerability. Firstly, the policy language MRPL allows the creation of Mobility Restriction Policies, based on a unified spatial model which represents both physical location of objects as well as virtual location of mobile code. Secondly, the policy language SPDL-2 protects applications against a large number of common errors by allowing the specification of per-request/response validation and transformation rules. Thirdly, the policy language SWIL allows interfaces to be described as automata which may be analysed statically by a model-checker before being checked dynamically in an application-level firewall. When combined together, these three languages provide an effective means for preventing otherwise critical application-level vulnerabilities.

Systems implementing these policy languages have been built; an implementation framework is described and encouraging performance results and analysis are presented.

UCAM-CL-TR-614

Robin Milner:

Pure bigraphs
January 2005, 66 pages, PDF

Abstract: Bigraphs are graphs whose nodes may be nested, representing locality, independently of the edges connecting them. They may be equipped with reaction rules, forming a bigraphical reactive system (Brs) in which bigraphs can reconfigure themselves. Brss aim to unify process calculi, and to model applications — such as pervasive computing — where locality and mobility are prominent. The paper is devoted to the theory of pure bigraphs, which underlie various more refined forms. It begins by developing a more abstract structure, a wide reactive system Wrs, of which a Brs is an instance; in this context, labelled transitions are defined in such a way that the induced bisimilarity is a congruence.

This work is then specialised to Brss, whose graphical structure allows many refinements of the dynamic theory. Elsewhere it is shown that behavioural analysis for Petri nets, π-calculus and mobile ambients can all be recovered in the uniform framework of bigraphs. The latter part of the paper emphasizes the parts of bigraphical theory that are common to these applications, especially the treatment of dynamics via labelled transitions. As a running example, the theory is applied to finite pure CCS, whose resulting transition system and bisimilarity are analysed in detail.

The paper also discusses briefly the use of bigraphs to model both pervasive computing and biological systems.

UCAM-CL-TR-615

Evangelos Kotsovinos:

Global public computing
January 2005, 229 pages, PDF
PhD thesis (Trinity Hall, November 2004)

Abstract: High-bandwidth networking and cheap computing hardware are leading to a world in which the resources of one machine are available to groups of users beyond their immediate owner. This trend is visible in many different settings. Distributed computing, where applications are divided into parts that run on different machines for load distribution, geographical dispersion, or robustness, has recently found new fertile ground. Grid computing promises to provide a common framework for scheduling scientific computation and managing the associated large data sets. Proposals for utility computing envision a world in which businesses rent computing bandwidth in server farms on-demand instead of purchasing and maintaining servers themselves.

All such architectures target particular user and application groups or deployment scenarios, where simplifying assumptions can be made. They expect centralised ownership of resources, cooperative users, and applications that are well-behaved and compliant to a specific API or middleware. Members of the public who are not involved in Grid communities or wish to deploy out-of-the-box distributed services, such as game servers, have no means to acquire resources on large numbers of machines around the world to launch their tasks.

This dissertation proposes a new distributed computing paradigm, termed global public computing, which allows any user to run any code anywhere. Such platforms price computing resources, and ultimately charge users for resources consumed. This dissertation presents the design and implementation of the XenoServer Open Platform, putting this vision into practice. The efficiency and scalability of the developed mechanisms are demonstrated by experimental evaluation; the prototype platform allows the global-scale deployment of complex services in less than 45 seconds, and could scale to millions of concurrent sessions without presenting performance bottlenecks.

To facilitate global public computing, this work addresses several research challenges. It introduces reusable mechanisms for representing, advertising, and supporting the discovery of resources. To allow flexible and federated control of resource allocation by all stakeholders involved, it proposes a novel role-based resource management framework for expressing and combining distributed management policies. Furthermore, it implements effective service deployment models for launching distributed services on large numbers of machines around the world easily, quickly, and efficiently. To keep track of resource consumption and pass charges on to consumers, it devises an accounting and charging infrastructure.

UCAM-CL-TR-616

Donnla Nic Gearailt:

Dictionary characteristics in cross-language information retrieval
February 2005, 158 pages, PDF
PhD thesis (Gonville and Caius College, February 2003)

Abstract: In the absence of resources such as a suitable MT system, translation in Cross-Language Information Retrieval (CLIR) consists primarily of mapping query terms to a semantically equivalent representation in the target language. This can be accomplished by looking up each term in a simple bilingual dictionary. The main problem here is deciding which of the translations provided by the dictionary for each query term should be included in the query translation. We tackled this problem by examining different characteristics of the system dictionary. We found that dictionary properties such as scale (the average number of translations per term), translation repetition (providing the same translation for a term more than once in a dictionary entry, for example, for different senses of a term), and dictionary coverage rate (the percentage of query terms for which the dictionary provides a translation) can have a profound effect on retrieval performance. Dictionary properties were explored in a series of carefully controlled tests, designed to evaluate specific hypotheses. These experiments showed that (a) contrary to expectation, smaller-scale dictionaries resulted in better performance than large-scale ones, and (b) when appropriately managed, e.g. through strategies to ensure adequate translational coverage, dictionary-based CLIR could perform as well as other CLIR methods discussed in the literature. Our experiments showed that it is possible to implement an effective CLIR system with no resources other than the system dictionary itself, provided this dictionary is chosen with careful examination of its characteristics, removing any dependency on outside resources.

UCAM-CL-TR-617

Augustin Chaintreau, Pan Hui, Jon Crowcroft, Christophe Diot, Richard Gass, James Scott:

Pocket Switched Networks: Real-world mobility and its consequences for opportunistic forwarding
February 2005, 26 pages, PDF

Abstract: Opportunistic networks make use of human mobility and local forwarding in order to distribute data. Information can be stored and passed, taking advantage of the device mobility, or forwarded over a wireless link when an appropriate contact is met. Such networks fall into the fields of mobile ad-hoc networking and delay-tolerant networking. In order to evaluate forwarding algorithms for these networks, accurate data is needed on the intermittency of connections.

In this paper, the inter-contact time between two transmission opportunities is observed empirically using four distinct sets of data, two having been specifically collected for this work, and two provided by other research groups.

We discover that the distribution of inter-contact time follows an approximate power law over a large time range in all data sets. This observation is at odds with the exponential decay expected by many currently used mobility models. We demonstrate that opportunistic transmission schemes designed around these current models have poor performance under approximate power-law conditions, but could be significantly improved by using limited redundant transmissions.
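The practical gap between the two tail assumptions can be illustrated with a small numerical sketch (illustrative only, not taken from the paper; the parameters here are our own, chosen so that both distributions have unit mean):

```python
import math

def exp_ccdf(t, rate=1.0):
    # P(inter-contact time > t) under an exponential model with mean 1/rate
    return math.exp(-rate * t)

def pareto_ccdf(t, alpha=1.5, x_min=1/3):
    # P(inter-contact time > t) under a Pareto (power-law) model;
    # alpha = 1.5, x_min = 1/3 give mean alpha*x_min/(alpha-1) = 1 as well
    return (x_min / t) ** alpha if t > x_min else 1.0

# At t = 10 mean inter-contact times, the power-law tail is roughly two
# orders of magnitude heavier: long disconnections that an exponential
# mobility model treats as negligible are in fact fairly common.
ratio = pareto_ccdf(10) / exp_ccdf(10)
```

Under these assumptions, a forwarding scheme that budgets transmissions for exponentially distributed gaps will systematically underestimate how often long disconnections occur, which is the failure mode the paper reports.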

UCAM-CL-TR-618

Mark R. Shinwell:

The Fresh Approach: functional programming with names and binders
February 2005, 111 pages, PDF
PhD thesis (Queens’ College, December 2004)

Abstract: This report concerns the development of a language called Fresh Objective Caml, which is an extension of the Objective Caml language providing facilities for the manipulation of data structures representing syntax involving α-convertible names and binding operations.

After an introductory chapter which includes a survey of related work, we describe the Fresh Objective Caml language in detail. Next, we proceed to formalise a small core language which captures the essence of Fresh Objective Caml; we call this Mini-FreshML. We provide two varieties of operational semantics for this language and prove them equivalent. Then in order to prove correctness properties of representations of syntax in the language we introduce a new variety of domain theory called FM-domain theory, based on the permutation model of name binding from Pitts and Gabbay. We show how classical domain-theoretic constructions—including those for the solution of recursive domain equations—fall naturally into this setting, where they are augmented by new constructions to handle name-binding.

After developing the necessary domain theory, we demonstrate how it may be exploited to give a monadic denotational semantics to Mini-FreshML. This semantics in itself is quite novel and demonstrates how a simple monad of continuations is sufficient to model dynamic allocation of names. We prove that our denotational semantics is computationally adequate with respect to the operational semantics—in other words, equality of denotation implies observational equivalence. After this, we show how the denotational semantics may be used to prove our desired correctness properties.

In the penultimate chapter, we examine the implementation of Fresh Objective Caml, describing detailed issues in the compiler and runtime systems. Then in the final chapter we close the report with a discussion of future avenues of research and an assessment of the work completed so far.

UCAM-CL-TR-619

James R. Bulpin:

Operating system support for simultaneous multithreaded processors
February 2005, 130 pages, PDF
PhD thesis (King’s College, September 2004)

Abstract: Simultaneous multithreaded (SMT) processors are able to execute multiple application threads in parallel in order to improve the utilisation of the processor’s execution resources. The improved utilisation provides a higher processor-wide throughput at the expense of the performance of each individual thread.

Simultaneous multithreading has recently been incorporated into the Intel Pentium 4 processor family as “Hyper-Threading”. While there is already basic support for it in popular operating systems, that support does not take advantage of any knowledge about the characteristics of SMT, and therefore does not fully exploit the processor.

SMT presents a number of challenges to operating system designers. The threads’ dynamic sharing of processor resources means that there are complex performance interactions between threads. These interactions are often unknown, poorly understood, or hard to avoid. As a result such interactions tend to be ignored, leading to a lower processor throughput.

In this dissertation I start by describing simultaneous multithreading and the hardware implementations of it. I discuss areas of operating system support that are either necessary or desirable.

I present a detailed study of a real SMT processor, the Intel Hyper-Threaded Pentium 4, and describe the performance interactions between threads. I analyse the results using information from the processor’s performance monitoring hardware.

Building on the understanding of the processor’s operation gained from the analysis, I present a design for an operating system process scheduler that takes into account the characteristics of the processor and the workloads in order to improve the system-wide throughput. I evaluate designs exploiting various levels of processor-specific knowledge.

I finish by discussing alternative ways to exploit SMT processors. These include the partitioning onto separate simultaneous threads of applications and hardware interrupt handling. I present preliminary experiments to evaluate the effectiveness of this technique.

UCAM-CL-TR-620

Eleftheria Katsiri:

Middleware support for context-awareness in distributed sensor-driven systems
February 2005, 176 pages, PDF
PhD thesis (Clare College, January 2005)

Abstract: Context-awareness concerns the ability of computing devices to detect, interpret and respond to aspects of the user’s local environment. Sentient Computing is a sensor-driven programming paradigm which maintains an event-based, dynamic model of the environment which can be used by applications in order to drive changes in their behaviour, thus achieving context-awareness. However, primitive events, especially those arising from sensors, e.g., that a user is at position x,y,z are too low-level to be meaningful to applications. Existing models for creating higher-level, more meaningful events, from low-level events, are insufficient to capture the user’s intuition about abstract system state. Furthermore, there is a strong need for user-centred application development, without undue programming overhead. Applications need to be created dynamically and remain functional independently of the distributed nature and heterogeneity of sensor-driven systems, even while the user is mobile. Both issues combined necessitate an alternative model for developing applications in a real-time, distributed sensor-driven environment such as Sentient Computing.

This dissertation describes the design and implementation of the SCAFOS framework. SCAFOS has two novel aspects. Firstly, it provides powerful tools for inferring abstract knowledge from low-level, concrete knowledge, verifying its correctness and estimating its likelihood. Such tools include Hidden Markov Models, a Bayesian Classifier, Temporal First-Order Logic, the theorem prover SPASS and the production system CLIPS. Secondly, SCAFOS provides support for simple application development through the XML-based SCALA language. By introducing the new concept of a generalised event, an abstract event, defined as a notification of changes in abstract system state, expressiveness compatible with human intuition is achieved when using SCALA. The applications that are created through SCALA are automatically integrated and operate seamlessly in the various heterogeneous components of the context-aware environment even while the user is mobile or when new entities or other applications are added or removed in SCAFOS.

UCAM-CL-TR-621

Mark R. Shinwell, Andrew M. Pitts:

Fresh Objective Caml user manual
February 2005, 21 pages, PDF

Abstract: This technical report is the user manual for the Fresh Objective Caml system, which implements a functional programming language incorporating facilities for manipulating syntax involving names and binding operations.

UCAM-CL-TR-622

Jorg H. Lepler:

Cooperation and deviation in market-based resource allocation
March 2005, 173 pages, PDF
PhD thesis (St John’s College, November 2004)

Abstract: This thesis investigates how business transactions are enhanced through competing strategies for economically motivated cooperation. To this end, it focuses on the setting of a distributed, bilateral allocation protocol for electronic services and resources. Cooperative efforts like these are often threatened by transaction parties who aim to exploit their competitors by deviating from so-called cooperative goals. We analyse this conflict between cooperation and deviation by presenting the case of two novel market systems which use economic incentives to solve the complications that arise from cooperation.

The first of the two systems is a pricing model which is designed to address the problematic resource market situation, where supply exceeds demand and perfect competition can make prices collapse to level zero. This pricing model uses supply functions to determine the optimal Nash-equilibrium price. Moreover, in this model the providers’ market estimations are updated with information about each of their own transactions. Here, we implement the protocol in a discrete event simulation, to show that the equilibrium prices are above competitive levels, and to demonstrate that deviations from the pricing model are not profitable.

The second of the two systems is a reputation aggregation model, which seeks the subgroup of raters that (1) contains the largest degree of overall agreement and (2) derives the resulting reputation scores from their comments. In order to seek agreement, this model assumes that not all raters in the system are equally able to foster an agreement. Based on the variances of the raters’ comments, the system derives a notion of the reputation for each rater, which is in turn fed back into the model’s recursive scoring algorithm. We demonstrate the convergence of this algorithm, and show the effectiveness of the model’s ability to discriminate between poor and strong raters. Then with a series of threat models, we show how resilient this model is in terms of finding agreement, despite large collectives of malicious raters. Finally, in a practical example, we apply the model to the academic peer review process in order to show its versatility at establishing a ranking of rated objects.
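The variance-driven feedback loop described above can be caricatured in a few lines (a hypothetical sketch of the general idea only: the function name, the inverse-variance weighting rule and the smoothing constant are ours, not the thesis’s actual algorithm):

```python
def score_with_reputation(ratings, iterations=20):
    """ratings[i][j] is the comment (score) on object i from rater j.
    Returns (consensus scores per object, reputation weight per rater)."""
    n_obj, n_raters = len(ratings), len(ratings[0])
    weights = [1.0] * n_raters          # initially trust every rater equally
    for _ in range(iterations):
        total = sum(weights)
        # consensus score: reputation-weighted mean of the raters' comments
        scores = [sum(weights[j] * ratings[i][j] for j in range(n_raters)) / total
                  for i in range(n_obj)]
        # a rater's reputation shrinks as the variance of their disagreement
        # with the consensus grows; the new weights feed back into the next
        # round, making the scoring recursive
        weights = [
            1.0 / (1e-6 + sum((ratings[i][j] - scores[i]) ** 2
                              for i in range(n_obj)) / n_obj)
            for j in range(n_raters)
        ]
    return scores, weights
```

With three consistent raters and one adversarial rater who inverts every score, the adversary’s weight collapses after a few iterations and the consensus converges to the majority ranking, mirroring the discrimination between poor and strong raters discussed in the abstract.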

UCAM-CL-TR-623

Keith Wansbrough:

Simple polymorphic usage analysis
March 2005, 364 pages, PDF
PhD thesis (Clare Hall, March 2002)

Abstract: Implementations of lazy functional languages ensure that computations are performed only when they are needed, and save the results so that they are not repeated. This frees the programmer to describe solutions at a high level, leaving details of control flow to the compiler.

This freedom however places a heavy burden on the compiler; measurements show that over 70% of these saved results are never used again. A usage analysis that could statically detect values used at most once would enable these wasted updates to be avoided, and would be of great benefit. However, existing usage analyses either give poor results or have been applied only to prototype compilers or toy languages.

This thesis presents a sound, practical, type-based usage analysis that copes with all the language features of a modern functional language, including type polymorphism and user-defined algebraic data types, and addresses a range of problems that have caused difficulty for previous analyses, including poisoning, mutual recursion, separate compilation, and partial application and usage dependencies. In addition to well-typing rules, an inference algorithm is developed, with proofs of soundness and a complexity analysis.

In the process, the thesis develops simple polymorphism, a novel approach to polymorphism in the presence of subtyping that attempts to strike a balance between pragmatic concerns and expressive power. This thesis may be considered an extended experiment into this approach, worked out in some detail but not yet conclusive.

The analysis described was designed in parallel with a full implementation in the Glasgow Haskell Compiler, leading to informed design choices, thorough coverage of language features, and accurate measurements of its potential and effectiveness when used on real code. The latter demonstrate that the analysis yields moderate benefit in practice.


UCAM-CL-TR-624

Steve Bishop, Matthew Fairbairn, Michael Norrish, Peter Sewell, Michael Smith, Keith Wansbrough:

TCP, UDP, and Sockets: rigorous and experimentally-validated behavioural specification
Volume 1: Overview
March 2005, 88 pages, PDF

Abstract: We have developed a mathematically rigorous and experimentally-validated post-hoc specification of the behaviour of TCP, UDP, and the Sockets API. It characterises the API and network-interface interactions of a host, using operational semantics in the higher-order logic of the HOL automated proof assistant. The specification is detailed, covering almost all the information of the real-world communications: it is in terms of individual TCP segments and UDP datagrams, though it abstracts from the internals of IP. It has broad coverage, dealing with arbitrary API call sequences and incoming messages, not just some well-behaved usage. It is also accurate, closely based on the de facto standard of (three of) the widely-deployed implementations. To ensure this we have adopted a novel experimental semantics approach, developing test generation tools and symbolic higher-order-logic model checking techniques that let us validate the specification directly against several thousand traces captured from the implementations.

The resulting specification, which is annotated for the non-HOL-specialist reader, may be useful as an informal reference for TCP/IP stack implementors and Sockets API users, supplementing the existing informal standards and texts. It can also provide a basis for high-fidelity automated testing of future implementations, and a basis for design and formal proof of higher-level communication layers. More generally, the work demonstrates that it is feasible to carry out similar rigorous specification work at design-time for new protocols. We discuss how such a design-for-test approach should influence protocol development, leading to protocol specifications that are both unambiguous and clear, and to high-quality implementations that can be tested directly against those specifications.

This document (Volume 1) gives an overview of the project, discussing the goals and techniques and giving an introduction to the specification. The specification itself is given in the companion Volume 2 (UCAM-CL-TR-625), which is automatically typeset from the (extensively annotated) HOL source. As far as possible we have tried to make the work accessible to four groups of intended readers: workers in networking (implementors of TCP/IP stacks, and designers of new protocols); in distributed systems (implementors of software above the Sockets API); in distributed algorithms (for whom this may make it possible to prove properties about executable implementations of those algorithms); and in semantics and automated reasoning.

UCAM-CL-TR-625

Steve Bishop, Matthew Fairbairn, Michael Norrish, Peter Sewell, Michael Smith, Keith Wansbrough:

TCP, UDP, and Sockets: rigorous and experimentally-validated behavioural specification
Volume 2: The Specification
March 2005, 386 pages, PDF

Abstract: See Volume 1 (UCAM-CL-TR-624).

UCAM-CL-TR-626

Meng How Lim, Adam Greenhalgh, Julian Chesterfield, Jon Crowcroft:

Landmark Guided Forwarding: A hybrid approach for Ad Hoc routing
March 2005, 28 pages, PDF

Abstract: Wireless Ad Hoc network routing presents some extremely challenging research problems, trying to optimize parameters such as energy conservation vs connectivity and global optimization vs routing overhead scalability. In this paper we focus on the problems of maintaining network connectivity in the presence of node mobility whilst providing globally efficient and robust routing. The common approach among existing wireless Ad Hoc routing solutions is to establish a global optimal path between a source and a destination. We argue that establishing a globally optimal path is both unreliable and unsustainable as the network diameter, traffic volume, and number of nodes all increase in the presence of moderate node mobility. To address this we propose Landmark Guided Forwarding (LGF), a protocol that provides a hybrid solution of topological and geographical routing algorithms. We demonstrate that LGF is adaptive to unstable connectivity and scalable to large networks. Our results indicate therefore that Landmark Guided Forwarding converges much faster, scales better and adapts well within a dynamic wireless Ad Hoc environment in comparison to existing solutions.


UCAM-CL-TR-627

Keith Vertanen:

Efficient computer interfaces using continuous gestures, language models, and speech
March 2005, 46 pages, PDF
MPhil thesis (Darwin College, July 2004)

Abstract: Despite advances in speech recognition technology, users of dictation systems still face a significant amount of work to correct errors made by the recognizer. The goal of this work is to investigate the use of a continuous gesture-based data entry interface to provide an efficient and fun way for users to correct recognition errors. Towards this goal, techniques are investigated which expand a recognizer’s results to help cover recognition errors. Additionally, models are developed which utilize a speech recognizer’s n-best list to build letter-based language models.

UCAM-CL-TR-628

Moritz Y. Becker:

A formal security policy for an NHS electronic health record service
March 2005, 81 pages, PDF

Abstract: The ongoing NHS project for the development of a UK-wide electronic health records service, also known as the ‘Spine’, raises many controversial issues and technical challenges concerning the security and confidentiality of patient-identifiable clinical data. As the system will need to be constantly adapted to comply with evolving legal requirements and guidelines, the Spine’s authorisation policy should not be hard-coded into the system but rather be specified in a high-level, general-purpose, machine-enforceable policy language.

We describe a complete authorisation policy for the Spine and related services, written for the trust management system Cassandra, and comprising 375 formal rules. The policy is based on the NHS’s Output-based Specification (OBS) document and deals with all requirements concerning access control of patient-identifiable data, including legitimate relationships, patients restricting access, authenticated express consent, third-party consent, and workgroup management.

UCAM-CL-TR-629

Meng How Lim, Adam Greenhalgh, Julian Chesterfield, Jon Crowcroft:

Hybrid routing: A pragmatic approach to mitigating position uncertainty in geo-routing
April 2005, 26 pages, PDF

Abstract: In recent years, research in wireless Ad Hoc routing seems to be moving towards the approach of position based forwarding. Amongst proposed algorithms, Greedy Perimeter Stateless Routing has gained recognition for guaranteed delivery with modest network overheads. Although this addresses the scaling limitations with topological routing, it has limited tolerance for position inaccuracy or stale state reported by a location service. Several researchers have demonstrated that the inaccuracy of the positional system could have a catastrophic effect on position based routing protocols. In this paper, we evaluate how the negative effects of position inaccuracy can be countered by extending position based forwarding with a combination of restrictive topological state, adaptive route advertisement and hybrid forwarding. Our results show that a hybrid of the position and topology approaches used in Landmark Guided Forwarding yields a high goodput and timely packet delivery, even with 200 meters of position error.

UCAM-CL-TR-630

Sergei P. Skorobogatov:

Semi-invasive attacks – A new approach to hardware security analysis
April 2005, 144 pages, PDF
PhD thesis (Darwin College, September 2004)

Abstract: Semiconductor chips are used today not only to control systems, but also to protect them against security threats. A continuous battle is waged between manufacturers who invent new security solutions, learning their lessons from previous mistakes, and the hacker community, constantly trying to break implemented protections. Some chip manufacturers do not pay enough attention to the proper design and testing of protection mechanisms. Even where they claim their products are highly secure, they do not guarantee this and do not take any responsibility if a device is compromised. In this situation, it is crucial for the design engineer to have a convenient and reliable method of testing secure chips.

This thesis presents a wide range of attacks on hardware security in microcontrollers and smartcards. This includes already known non-invasive attacks, such as power analysis and glitching, and invasive attacks, such as reverse engineering and microprobing. A new class of attacks – semi-invasive attacks – is introduced. Like invasive attacks, they require depackaging the chip to get access to its surface. But the passivation layer remains intact, as these methods do not require electrical contact to internal lines. Semi-invasive attacks stand between non-invasive and invasive attacks. They represent a greater threat to hardware security, as they are almost as effective as invasive attacks but can be low-cost like non-invasive attacks.

This thesis’ contribution includes practical fault-injection attacks to modify SRAM and EEPROM content, or change the state of any individual CMOS transistor on a chip. This leads to almost unlimited capabilities to control chip operation and circumvent protection mechanisms. A second contribution consists of experiments on data remanence, which show that it is feasible to extract information from powered-off SRAM and erased EPROM, EEPROM and Flash memory devices.

A brief introduction to copy protection in microcontrollers is given. Hardware security evaluation techniques using semi-invasive methods are introduced. They should help developers to make a proper selection of components according to the required level of security. Various defence technologies are discussed, from low-cost obscurity methods to new approaches in silicon design.

UCAM-CL-TR-631

Wenjun Hu, Jon Crowcroft:

MIRRORS: An integrated framework for capturing real world behaviour for models of ad hoc networks
April 2005, 16 pages, PDF

Abstract: The simulation models used in mobile ad hoc network research have been criticised for lack of realism. While credited with ease of understanding and implementation, they are often based on theoretical models, rather than real world observations. Criticisms have centred on radio propagation or mobility models.

In this work, we take an integrated approach to modelling the real world that underlies a mobile ad hoc network. While pointing out the correlations between the space, radio propagation and mobility models, we use mobility as a focal point to propose a new framework, MIRRORS, that captures real world behaviour. We give the formulation of a specific model within the framework and present simulation results that reflect topology properties of the networks synthesised. Compared with the existing models studied, our model better represents real world topology properties and presents a wider spectrum of variation in the metrics examined, due to the model encapsulating more detailed dynamics. While the common approach is to focus on performance evaluation of existing protocols using these models, we discuss protocol design opportunities across layers in view of the simulation results.

UCAM-CL-TR-632

R.I. Tucker, K. Sparck Jones:

Between shallow and deep: an experiment in automatic summarising
April 2005, 34 pages, PDF

Abstract: This paper describes an experiment in automatic summarising using a general-purpose strategy based on a compromise between shallow and deep processing. The method combines source text analysis into simple logical forms with the use of a semantic graph for representation and operations on the graph to identify summary content.

The graph is based on predications extracted from the logical forms, and the summary operations apply three criteria, namely importance, representativeness, and cohesiveness, in choosing node sets to form the content representation for the summary. This is used in different ways for output summaries. The paper presents the motivation for the strategy, details of the CLASP system, and the results of initial testing and evaluation on news material.

UCAM-CL-TR-633

Alex Ho, Steven Smith, Steven Hand:

On deadlock, livelock, and forward progress
May 2005, 8 pages, PDF

Abstract: Deadlock and livelock can happen at many different levels in a distributed system. We unify both around the concept of forward progress and standstill. We describe a framework capable of detecting the lack of forward progress in distributed systems. Our prototype can easily solve traditional deadlock problems where synchronization is via a custom network protocol; however, many interesting research challenges remain.

UCAM-CL-TR-634

Kasim Rehman:

Visualisation, interpretation and use of location-aware interfaces
May 2005, 159 pages, PDF
PhD thesis (St Catharine’s College, November 2004)


Abstract: Ubiquitous Computing (Ubicomp), a term coined by Mark Weiser in the early 1990’s, is about transparently equipping the physical environment and everyday objects in it with computational, sensing and networking abilities. In contrast with traditional desktop computing the “computer” moves into the background, unobtrusively supporting users in their everyday life.

One of the instantiations of Ubicomp is location-aware computing. Using location sensors, the “computer” reacts to changes in location of users and everyday objects. Location changes are used to infer user intent in order to give the user the most appropriate support for the task she is performing. Such support can consist of automatically providing information or configuring devices and applications deemed adequate for the inferred user task.

Experience with these applications has uncovered a number of usability problems that stem from the fact that the “computer” in this paradigm has become unidentifiable for the user. More specifically, these arise from lack of feedback from, loss of user control over, and the inability to provide a conceptual model of the “computer”.

Starting from the proven premise that feedback is indispensable for smooth human-machine interaction, a system that uses Augmented Reality in order to visually provide information about the state of a location-aware environment and devices in it, is designed and implemented.

Augmented Reality (AR) as it is understood for the purpose of this research uses a see-through head-mounted display, trackers and 3-dimensional (3D) graphics in order to give users the illusion that 3-dimensional graphical objects specified and generated on a computer are actually located in the real world.

The system described in this thesis can be called a Graphical User Interface (GUI) for a physical environment. Properties of GUIs for desktop environments are used as a valuable resource in designing a software architecture that supports interactivity in a location-aware environment, understanding how users might conceptualise the “computer” and extracting design principles for visualisation in a Ubicomp environment.

Most importantly this research offers a solution to fundamental interaction problems in Ubicomp environments. In doing so this research presents the next step from reactive environments to interactive environments.

UCAM-CL-TR-635

John Daugman:

Results from 200 billion iris cross-comparisons
June 2005, 8 pages, PDF

Abstract: Statistical results are presented for biometric recognition of persons by their iris patterns, based on 200 billion cross-comparisons between different eyes. The database consisted of 632,500 iris images acquired in the Middle East, in a national border-crossing protection programme that uses the Daugman algorithms for iris recognition. A total of 152 different nationalities were represented in this database. The set of exhaustive cross-comparisons between all possible pairings of irises in the database shows that with reasonable acceptance thresholds, the False Match rate is less than 1 in 200 billion. Recommendations are given for the numerical decision threshold policy that would enable reliable identification performance on a national scale in the UK.
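The headline figure is consistent with simple combinatorics: the number of unordered pairings of 632,500 images is just over 200 billion (a back-of-the-envelope check, not code from the report):

```python
from math import comb

n_images = 632_500
# every image compared against every other: n*(n-1)/2 unordered pairings
pairings = comb(n_images, 2)
# pairings == 200_027_808_750, i.e. just over 200 billion comparisons
```

A False Match rate below 1 in 200 billion therefore means fewer than one false match would be expected across the entire exhaustive comparison set, which is what makes the database’s scale statistically meaningful.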

UCAM-CL-TR-636

Rana Ayman el Kaliouby:

Mind-reading machines: automated inference of complex mental states
July 2005, 185 pages, PDF
PhD thesis (Newnham College, March 2005)

Abstract: People express their mental states all the time, even when interacting with machines. These mental states shape the decisions that we make, govern how we communicate with others, and affect our performance. The ability to attribute mental states to others from their behaviour, and to use that knowledge to guide one’s own actions and predict those of others, is known as theory of mind or mind-reading.

The principal contribution of this dissertation is the real time inference of a wide range of mental states from head and facial displays in a video stream. In particular, the focus is on the inference of complex mental states: the affective and cognitive states of mind that are not part of the set of basic emotions. The automated mental state inference system is inspired by and draws on the fundamental role of mind-reading in communication and decision-making.

The dissertation describes the design, implementation and validation of a computational model of mind-reading. The design is based on the results of a number of experiments that I have undertaken to analyse the facial signals and dynamics of complex mental states. The resulting model is a multi-level probabilistic graphical model that represents the facial events in a raw video stream at different levels of spatial and temporal abstraction. Dynamic Bayesian Networks model observable head and facial displays, and corresponding hidden mental states over time.

The automated mind-reading system implements the model by combining top-down predictions of mental state models with bottom-up vision-based processing of the face. To support intelligent human-computer interaction, the system meets three important criteria. These are: full automation so that no manual preprocessing or segmentation is required, real time execution, and the categorization of mental states early enough after their onset to ensure that the resulting knowledge is current and useful.

The system is evaluated in terms of recognition accuracy, generalization and real time performance for six broad classes of complex mental states (agreeing, concentrating, disagreeing, interested, thinking and unsure) on two different corpora. The system successfully classifies and generalizes to new examples of these classes with an accuracy and speed that are comparable to those of human recognition.

The research I present here significantly advances the nascent ability of machines to infer cognitive-affective mental states in real time from nonverbal expressions of people. By developing a real time system for the inference of a wide range of mental states beyond the basic emotions, I have widened the scope of human-computer interaction scenarios in which this technology can be integrated. This is an important step towards building socially and emotionally intelligent machines.

UCAM-CL-TR-637

Shishir Nagaraja, Ross Anderson:

The topology of covert conflict
July 2005, 15 pages, PDF

Abstract: Often an attacker tries to disconnect a network by destroying nodes or edges, while the defender counters using various resilience mechanisms. Examples include a music industry body attempting to close down a peer-to-peer file-sharing network; medics attempting to halt the spread of an infectious disease by selective vaccination; and a police agency trying to decapitate a terrorist organisation. Albert, Jeong and Barabasi famously analysed the static case, and showed that vertex-order attacks are effective against scale-free networks. We extend this work to the dynamic case by developing a framework based on evolutionary game theory to explore the interaction of attack and defence strategies. We show, first, that naive defences don’t work against vertex-order attack; second, that defences based on simple redundancy don’t work much better, but that defences based on cliques work well; third, that attacks based on centrality work better against clique defences than vertex-order attacks do; and fourth, that defences based on complex strategies such as delegation plus clique resist centrality attacks better than simple clique defences. Our models thus build a bridge between network analysis and evolutionary game theory, and provide a framework for analysing defence and attack in networks where topology matters. They suggest definitions of efficiency of attack and defence, and may even explain the evolution of insurgent organisations from networks of cells to a more virtual leadership that facilitates operations rather than directing them.
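The static vertex-order attack referred to above can be sketched in a few lines (my own minimal illustration, not the authors' simulation): repeatedly delete the highest-degree node and measure how the largest connected component shrinks. The example graph is an invented one chosen to show the effect.

```python
# Vertex-order attack sketch: remove highest-degree nodes first and
# track the size of the largest connected component.
from collections import defaultdict

def largest_component(adj):
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:                     # iterative depth-first search
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def vertex_order_attack(edges, rounds):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    sizes = [largest_component(adj)]
    for _ in range(rounds):
        hub = max(adj, key=lambda n: len(adj[n]))   # highest-degree node
        for v in adj.pop(hub):                      # delete it and its edges
            adj[v].discard(hub)
        sizes.append(largest_component(adj))
    return sizes

# A hub with five leaves plus a short chain: killing the hub shatters it.
edges = [(0, i) for i in range(1, 6)] + [(5, 6), (6, 7)]
print(vertex_order_attack(edges, 1))   # [8, 3]
```

Scale-free networks concentrate connectivity in a few hubs, which is why this degree-ordered deletion is so effective against them.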

Finally, we draw some conclusions and present possibledirections for future research.

UCAM-CL-TR-638

Piotr Zielinski:

Optimistic Generic Broadcast
July 2005, 22 pages, PDF

Abstract: We consider an asynchronous system with the Ω failure detector, and investigate the number of communication steps required by various broadcast protocols in runs in which the leader does not change. Atomic Broadcast, used for example in state machine replication, requires three communication steps. Optimistic Atomic Broadcast requires only two steps if all correct processes receive messages in the same order. Generic Broadcast requires two steps if no messages conflict. We present an algorithm that subsumes both of these approaches and guarantees two-step delivery if all conflicting messages are received in the same order, and three-step delivery otherwise. Internally, our protocol uses two new algorithms. First, a Consensus algorithm which decides in one communication step if all proposals are the same, and needs two steps otherwise. Second, a method that allows us to run infinitely many instances of a distributed algorithm, provided that only finitely many of them are different. We assume that fewer than a third of all processes are faulty (n > 3f).

UCAM-CL-TR-639

Chris Purcell, Tim Harris:

Non-blocking hashtables with open addressing
September 2005, 23 pages, PDF

Abstract: We present the first non-blocking hashtable based on open addressing that provides the following benefits: it combines good cache locality, accessing a single cacheline if there are no collisions, with short straight-line code; it needs no storage overhead for pointers and memory allocator schemes, having instead an overhead of two words per bucket; it does not need to periodically reorganise or replicate the table; and it does not need garbage collection, even with arbitrary-sized keys. Open problems include resizing the table and replacing, rather than erasing, entries. The result is a highly-concurrent set algorithm that approaches or outperforms the best externally-chained implementations we tested, with fixed memory costs and no need to select or fine-tune a garbage collector or locking strategy.
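For readers unfamiliar with open addressing, the underlying layout can be sketched as follows. This is a single-threaded toy, not the report's algorithm: keys live directly in the bucket array and a collision walks linearly to the next slot. The report's contribution, making such probing non-blocking with per-bucket state words updated by compare-and-swap, is deliberately not reproduced here.

```python
# Single-threaded open-addressing set with linear probing.

class LinearProbeSet:
    def __init__(self, capacity=8):
        self.slots = [None] * capacity   # keys stored inline, no pointers

    def _probe(self, key):
        """Yield each bucket index in the key's probe sequence once."""
        i = hash(key) % len(self.slots)
        for _ in range(len(self.slots)):
            yield i
            i = (i + 1) % len(self.slots)

    def add(self, key):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i] == key:
                self.slots[i] = key
                return True
        return False                     # table full: resizing is an open problem

    def contains(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                return False             # an empty slot ends the probe sequence
            if self.slots[i] == key:
                return True
        return False

s = LinearProbeSet()
s.add("a"); s.add("b")
print(s.contains("a"), s.contains("z"))   # True False
```

The cache-locality benefit the abstract cites comes from this inline storage: an uncontended lookup touches only the one cacheline holding the home bucket.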

UCAM-CL-TR-640

Feng Hao, Ross Anderson, John Daugman:

Combining cryptography with biometrics effectively
July 2005, 17 pages, PDF

Abstract: We propose the first practical and secure way to integrate the iris biometric into cryptographic applications. A repeatable binary string, which we call a biometric key, is generated reliably from genuine iris codes. A well-known difficulty has been how to cope with the 10 to 20% of error bits within an iris code and derive an error-free key. To solve this problem, we carefully studied the error patterns within iris codes, and devised a two-layer error correction technique that combines Hadamard and Reed-Solomon codes. The key is generated from a subject’s iris image with the aid of auxiliary error-correction data, which do not reveal the key, and can be saved in a tamper-resistant token such as a smart card. The reproduction of the key depends on two factors: the iris biometric and the token. The attacker has to procure both of them to compromise the key. We evaluated our technique using iris samples from 70 different eyes, with 10 samples from each eye. We found that an error-free key can be reproduced reliably from genuine iris codes with a 99.5% success rate. We can generate up to 140 bits of biometric key, more than enough for 128-bit AES. The extraction of a repeatable binary string from biometrics opens new possible applications, where a strong binding is required between a person and cryptographic operations. For example, it is possible to identify individuals without maintaining a central database of biometric templates, to which privacy objections might be raised.
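The helper-data pattern described here can be illustrated with a toy sketch (my own simplification, not the report's scheme): at enrolment, XOR the error-correcting codeword of the key with the iris bits to obtain public helper data; at reproduction, XOR a fresh noisy reading with the helper data and decode. A 5x repetition code stands in for the report's Hadamard and Reed-Solomon layers, and the 20-bit "iris" is invented.

```python
# Toy fuzzy-commitment sketch with a repetition code.

def encode(bits):                 # repetition code: repeat each bit 5 times
    return [b for b in bits for _ in range(5)]

def decode(bits):                 # majority vote over each 5-bit group
    return [int(sum(bits[i:i+5]) >= 3) for i in range(0, len(bits), 5)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

key = [1, 0, 1, 1]
enrolled_iris = [0, 1, 1, 0] * 5               # 20 reference biometric bits
helper = xor(encode(key), enrolled_iris)       # public helper data

noisy_iris = list(enrolled_iris)               # a fresh, noisy reading
noisy_iris[3] ^= 1
noisy_iris[11] ^= 1
print(decode(xor(noisy_iris, helper)) == key)  # True
```

XORing the noisy reading with the helper data recovers the codeword plus the reading's errors, which the code then corrects; the helper data alone reveals neither the key nor the iris bits.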

UCAM-CL-TR-641

Ross Anderson, Mike Bond, Jolyon Clulow,Sergei Skorobogatov:

Cryptographic processors – a survey
August 2005, 19 pages, PDF

Abstract: Tamper-resistant cryptographic processors are becoming the standard way to enforce data-usage policies. Their history began with military cipher machines, and hardware security modules used to encrypt the PINs that bank customers use to authenticate themselves to ATMs. In both cases, the designers wanted to prevent abuse of data and key material should a device fall into the wrong hands. From these specialist beginnings, cryptoprocessors spread into devices such as prepayment electricity meters, and the vending machines that sell credit for them. In the 90s, tamper-resistant smartcards became integral to GSM mobile phone identification and to key management in pay-TV set-top boxes, while secure microcontrollers were used in remote key entry devices for cars. In the last five years, dedicated crypto chips have been embedded in devices from games console accessories to printer ink cartridges, to control product and accessory aftermarkets. The ‘Trusted Computing’ initiative will soon embed cryptoprocessors in PCs so that they can identify each other remotely.

This paper surveys the range of applications of tamper-resistant hardware, and the array of attack and defence mechanisms which have evolved in the tamper-resistance arms race.

UCAM-CL-TR-642

Gavin Bierman, Alisdair Wren:

First-class relationships in an object-oriented language
August 2005, 53 pages, PDF

Abstract: In this paper we investigate the addition of first-class relationships to a prototypical object-oriented programming language (a “middleweight” fragment of Java). We provide language-level constructs to declare relationships between classes and to manipulate relationship instances. We allow relationships to have attributes and provide a novel notion of relationship inheritance. We formalize our language, giving both the type system and operational semantics, and prove certain key safety properties.

UCAM-CL-TR-643

Nathan E. Dimmock:

Using trust and risk for access control in Global Computing
August 2005, 145 pages, PDF
PhD thesis (Jesus College, April 2005)

Abstract: Global Computing is a vision of a massively networked infrastructure supporting a large population of diverse but cooperating entities. Similar to ubiquitous computing, entities of global computing will operate in environments that are dynamic and unpredictable, requiring them to be capable of dealing with unexpected interactions and previously unknown principals using an unreliable infrastructure.

These properties will pose new security challenges that are not adequately addressed by existing security models and mechanisms. Traditionally, privileges are statically encoded as security policy, and while role-based access control introduces a layer of abstraction between privilege and identity, roles, privileges and context must still be known in advance of any interaction taking place.

Human society has developed the mechanism of trust to overcome initial suspicion and gradually evolve privileges. Trust successfully enables collaboration amongst human agents — a computational model of trust ought to be able to enable the same in computational agents. Existing research in this area has concentrated on developing trust management systems that permit the encoding of, and reasoning about, trust beliefs, but the relationship between these and privilege is still hard coded. These systems also omit any explicit reasoning about risk and its relationship to privilege, and do not permit the automated evolution of trust over time.

This thesis examines the relationship between trust, risk and privilege in an access control system. An outcome-based approach is taken to risk modelling, using explicit costs and benefits to model the relationship between risk and privilege. This is used to develop a novel model of access control — trust-based access control (TBAC) — firstly for the limited domain of collaboration between Personal Digital Assistants (PDAs), and later for more general global computing applications using the SECURE computational trust framework.

This general access control model is also used to extend an existing role-based access control system to explicitly reason about trust and risk. A further refinement is the incorporation of the economic theory of decision-making under uncertainty by expressing costs and benefits as utility, or preference-scaling, functions. It is then shown how Bayesian trust models can be used in the SECURE framework, and how these models enable a better abstraction to be obtained in the access control policy. It is also shown how the access control model can be used to take such decisions as whether the cost of seeking more information about a principal is justified by the risk associated with granting the privilege, and to determine whether a principal should respond to such requests upon receipt. The use of game theory to help in the construction of policies is also briefly considered.

Global computing has many applications, all of which require access control to prevent abuse by malicious principals. This thesis develops three in detail: an information sharing service for PDAs, an identity-based spam detector and a peer-to-peer collaborative spam detection network. Given the emerging nature of computational trust systems, in order to evaluate the effectiveness of the TBAC model it was first necessary to develop an evaluation methodology. This takes the approach of a threat-based analysis, considering possible attacks at the component and system level, to ensure that components are correctly integrated and that system-level assumptions made by individual components are valid. Applying the methodology to the implementation of the TBAC model demonstrates its effectiveness in the scenarios chosen, with good promise for further, untested, scenarios.

UCAM-CL-TR-644

Paul Youn, Ben Adida, Mike Bond, Jolyon Clulow, Jonathan Herzog, Amerson Lin, Ronald L. Rivest, Ross Anderson:

Robbing the bank with a theorem prover
August 2005, 26 pages, PDF

Abstract: We present the first methodology for analysis and automated detection of attacks on security application programming interfaces (security APIs) – the interfaces to hardware cryptographic services used by developers of critical security systems, such as banking applications. Taking a cue from previous work on the formal analysis of security protocols, we model APIs purely according to specifications, under the assumption of ideal encryption primitives. We use a theorem prover tool and adapt it to the security API context. We develop specific formalization and automation techniques that allow us to fully harness the power of a theorem prover. We show how, using these techniques, we were able to automatically re-discover all of the pure API attacks originally documented by Bond and Anderson against banking payment networks, since their discovery of this type of attack in 2000. We conclude with a note of encouragement: the complexity and unintuitiveness of the modelled attacks make a very strong case for continued focus on automated formal analysis of cryptographic APIs.

UCAM-CL-TR-645

Frank Stajano:

RFID is X-ray vision
August 2005, 10 pages, PDF

Abstract: Making RFID tags as ubiquitous as barcodes will enable machines to see and recognize any tagged object in their vicinity, better than they ever could with the smartest image processing algorithms. This opens many opportunities for “sentient computing” applications.

However, in so far as this new capability has some of the properties of X-ray vision, it opens the door to abuses. To promote discussion, I won’t elaborate on low level technological solutions; I shall instead discuss a simple security policy model that addresses most of the privacy issues. Playing devil’s advocate, I shall also indicate why it is currently unlikely that consumers will enjoy the RFID privacy that some of them vociferously demand.

UCAM-CL-TR-647

Sam Staton:

An agent architecture for simulation of end-users in programming-like tasks
October 2005, 12 pages, PDF

Abstract: We present some motivation and technical details for a software simulation of an end-user performing programming-like tasks. The simulation uses an agent/agenda model by breaking tasks down into subgoals, based on work of A. Blackwell. This document was distributed at the CHI 2002 workshop on Cognitive Models of Programming-Like Processes.

UCAM-CL-TR-648

Moritz Y. Becker:

Cassandra: flexible trust management and its application to electronic health records
October 2005, 214 pages, PDF
PhD thesis (Trinity College, September 2005)

Abstract: The emergence of distributed applications operating on large-scale, heterogeneous and decentralised networks poses new and challenging problems of concern to society as a whole, in particular for data security, privacy and confidentiality. Trust management and authorisation policy languages have been proposed to address access control and authorisation in this context. Still, many key problems have remained unsolved. Existing systems are often not expressive enough, or are so expressive that access control becomes undecidable; their semantics is not formally specified; and they have not been shown to meet the requirements set by actual real-world applications.

This dissertation addresses these problems. We present Cassandra, a role-based language and system for expressing authorisation policy, and the results of a substantial case study, a policy for a national electronic health record (EHR) system, based on the requirements of the UK National Health Service’s National Programme for Information Technology (NPfIT).

Cassandra policies are expressed in a language derived from Datalog with constraints. Cassandra supports credential-based authorisation (e.g. between administrative domains), and rules can refer to remote policies (for credential retrieval and trust negotiation). The expressiveness of the language (and its computational complexity) can be tuned by choosing an appropriate constraint domain. The language is small and has a formal semantics for both query evaluation and the access control engine.

There has been a lack of real-world examples of complex security policies: our NPfIT case study fills this gap. The resulting Cassandra policy (with 375 rules) demonstrates that the policy language is expressive enough for a real-world application. We thus demonstrate that a general-purpose trust management system can be designed to be highly flexible, expressive, formally founded and able to meet the complex requirements of real-world applications.

UCAM-CL-TR-649

Mark Grundland, Neil A. Dodgson:

The decolorize algorithm for contrast enhancing, color to grayscale conversion
October 2005, 15 pages, PDF

Abstract: We present a new contrast enhancing color to grayscale conversion algorithm which works in real-time. It incorporates novel techniques for image sampling and dimensionality reduction, sampling color differences by Gaussian pairing and analyzing color differences by predominant component analysis. In addition to its speed and simplicity, the algorithm has the advantages of continuous mapping, global consistency, and grayscale preservation, as well as predictable luminance, saturation, and hue ordering properties. We give an extensive range of examples and compare our method with other recently published algorithms.
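A loose sketch of the two named ideas follows; this is my own reading, not the authors' code, and it takes several liberties: pixel pairs are drawn uniformly at random rather than by the paper's Gaussian displacement pairing, the "predominant" axis is found by plain power iteration, the `effect` weight is an invented parameter, and the luminance weights are the standard Rec. 601 values.

```python
# Sketch: grayscale = luminance + projection onto the predominant
# axis of sampled color differences.
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predominant_axis(diffs, iters=50):
    """Dominant direction of the difference vectors, by power iteration."""
    v = [1.0, 1.0, 1.0]
    for _ in range(iters):               # iterate v <- (sum d d^T) v, normalised
        w = [sum(d[i] * dot(d, v) for d in diffs) for i in range(3)]
        norm = sum(x * x for x in w) ** 0.5 or 1.0
        v = [x / norm for x in w]
    return v

def decolorize_sketch(pixels, effect=0.5, seed=0):
    rng = random.Random(seed)
    pairs = [(rng.choice(pixels), rng.choice(pixels)) for _ in range(256)]
    diffs = [[a[i] - b[i] for i in range(3)] for a, b in pairs]
    axis = predominant_axis(diffs)
    lum = lambda p: 0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]
    return [lum(p) + effect * dot(p, axis) for p in pixels]

# Chromatic contrast that plain luminance would flatten survives the mapping.
img = [(0.9, 0.3, 0.2), (0.2, 0.7, 0.2), (0.5, 0.5, 0.5)]
print(decolorize_sketch(img))
```

The point of the chromatic term is that two colors of equal luminance can still map to distinct grays if they differ along the image's predominant color-difference axis.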

UCAM-CL-TR-650

Rashid Mehmood, Jon Crowcroft:

Parallel iterative solution method for large sparse linear equation systems
October 2005, 22 pages, PDF

Abstract: Solving sparse systems of linear equations is at the heart of scientific computing. Large sparse systems often arise in science and engineering problems. One such problem we consider in this paper is the steady-state analysis of Continuous Time Markov Chains (CTMCs). CTMCs are a widely used formalism for the performance analysis of computer and communication systems. A large variety of useful performance measures can be derived from a CTMC via the computation of its steady-state probabilities. A CTMC may be represented by a set of states and a transition rate matrix containing state transition rates as coefficients, and can be analysed using probabilistic model checking. However, CTMC models for realistic systems are very large. We address this largeness problem in this paper by considering parallelisation of symbolic methods. In particular, we consider Multi-Terminal Binary Decision Diagrams (MTBDDs) to store CTMCs and, using the Jacobi iterative method, present a parallel method for the CTMC steady-state solution. Employing a 24-node processor bank, we report results for sparse systems with over a billion equations and eighteen billion nonzeros.
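The Jacobi iteration the paper parallelises updates x_i^(k+1) = (b_i - sum over j != i of a_ij x_j^(k)) / a_ii; because each update reads only the previous iterate, the rows can be partitioned across processors. A sequential sketch on a tiny dense system (not an MTBDD-stored CTMC) follows; the example matrix is an assumption chosen to be diagonally dominant so the iteration converges.

```python
# Sequential Jacobi iteration for Ax = b.

def jacobi(A, b, iterations=100):
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Every component is updated from the *previous* iterate x,
        # which is what makes the rows independently computable.
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# 4x + y = 9, x + 3y = 5 has the exact solution x = 2, y = 1.
x = jacobi([[4.0, 1.0], [1.0, 3.0]], [9.0, 5.0])
print([round(v, 6) for v in x])   # [2.0, 1.0]
```

Gauss-Seidel converges faster on many systems but reads the current iterate, creating dependencies between rows; Jacobi's independence is exactly why it suits the paper's parallel setting.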

UCAM-CL-TR-651

Rob Hague:

End-user programming in multiple languages
October 2005, 122 pages, PDF
PhD thesis (Fitzwilliam College, July 2004)

Abstract: Advances in user interface technology have removed the need for the majority of users to program, but they do not allow the automation of repetitive or indirect tasks. End-user programming facilities solve this problem without requiring users to learn and use a conventional programming language, but must be tailored to specific types of end user. In situations where the user population is particularly diverse, this presents a problem.

In addition, studies have shown that the performance of tasks based on the manipulation and interpretation of data depends on the way in which the data is represented. Different representations may facilitate different tasks, and there is not necessarily a single, optimal representation that is best for all tasks. In many cases, the choice of representation is also constrained by other factors, such as display size. It would be advantageous for an end-user programming system to provide multiple, interchangeable representations of programs.

This dissertation describes an architecture for providing end-user programming facilities in the networked home, a context with a diverse user population and a wide variety of input and output devices. The Media Cubes language, a novel end-user programming language, is introduced as the context that led to the development of the architecture. A framework for translation between languages via a common intermediate form is then described, with particular attention paid to the requirements of mappings between languages and the intermediate form. The implementation of Lingua Franca, a system realizing this framework in the given context, is described.

Finally, the system is evaluated by considering several end-user programming languages implemented within this system. It is concluded that translation between programming languages, via a common intermediate form, is viable for systems within a limited domain, and the wider applicability of the technique is discussed.

UCAM-CL-TR-652

Roongroj Nopsuwanchai:

Discriminative training methods and their applications to handwriting recognition
November 2005, 186 pages, PDF
PhD thesis (Downing College, August 2004)

Abstract: This thesis aims to improve the performance of handwriting recognition systems by introducing the use of discriminative training methods. Discriminative training methods use data from all competing classes when training the recogniser for each class. We develop discriminative training methods for two popular classifiers: Hidden Markov Models (HMMs) and a prototype-based classifier. At the expense of additional computations in the training process, discriminative training has demonstrated significant improvements in recognition accuracy over classifiers that are not discriminatively optimised. Our studies focus on isolated character recognition problems with an emphasis on, but not limited to, off-line handwritten Thai characters.

The thesis is organised as follows. First, we develop an HMM-based classifier that employs a Maximum Mutual Information (MMI) discriminative training criterion. HMMs have an increasing number of applications to character recognition, in which they are usually trained by Maximum Likelihood (ML) using the Baum-Welch algorithm. However, ML training does not take into account the data of other competing categories, and thus is considered non-discriminative. By contrast, MMI provides an alternative training method with the aim of maximising the mutual information between the data and their correct categories. One of our studies highlights the efficiency of MMI training, which improves the recognition results over ML training despite being applied to a highly constrained system (tied-mixture density HMMs). Various aspects of MMI training are investigated, including its optimisation algorithms and a set of optimised parameters that yields maximum discriminabilities.

Second, a system for Thai handwriting recognition based on HMMs and MMI training is introduced. In addition, novel feature extraction methods using block-based PCA and composite images are proposed and evaluated. A technique to improve generalisation of the MMI-trained systems and the use of N-best lists to efficiently compute the probabilities are described. By applying these techniques, the results from extensive experiments are compelling, showing up to 65% relative error reduction compared to conventional ML training without the proposed features. The best results are comparable to those achieved by other high performance systems.

Finally, we focus on the Prototype-Based Minimum Error Classifier (PBMEC), which uses a discriminative Minimum Classification Error (MCE) training method to generate the prototypes. MCE tries to minimise recognition errors during the training process using data from all classes. Several key findings are revealed, including the setting of smoothing parameters and a proposed clustering method that are more suitable for PBMEC than the conventional methods. These studies reinforce the effectiveness of discriminative training and are essential as a foundation for its application to the more difficult problem of cursive handwriting recognition.

UCAM-CL-TR-653

Richard Clayton:

Anonymity and traceability in cyberspace
November 2005, 189 pages, PDF
PhD thesis (Darwin College, August 2005)

Abstract: Traceability is the ability to map events in cyberspace, particularly on the Internet, back to real-world instigators, often with a view to holding them accountable for their actions. Anonymity is present when traceability fails.

I examine how traceability on the Internet actually works, looking first at a classical approach from the late 1990s that emphasises the role of activity logging, and reporting on the failures that are known to occur. Failures of traceability, with consequent unintentional anonymity, have continued as the technology has changed. I present an analysis that ascribes these failures to the mechanisms at the edge of the network being inherently inadequate for the burden that traceability places upon them. The underlying reason for this continuing failure is a lack of economic incentives for improvement. The lack of traceability at the edges is further illustrated by a new method of stealing another person’s identity on an Ethernet Local Area Network that existing tools and procedures would entirely fail to detect.

Preserving activity logs is seen, especially by Governments, as essential for the traceability of illegal cyberspace activity. I present a new and efficient method of processing email server logs to detect machines sending bulk unsolicited email “spam” or email infected with “viruses”. This creates a clear business purpose for creating logs, but the new detector is so effective that the logs can be discarded within days, which may hamper general traceability.

Preventing spam would be far better than tracing its origin or detecting its transmission. Many analyse spam in economic terms, and wish to levy a small charge for sending each email. I consider an oft-proposed approach using computational “proof-of-work” that is elegant and anonymity preserving. I show that, in a world of high profit margins and insecure end-user machines, it is impossible to find a payment level that stops the spam without affecting legitimate usage of email.
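The proof-of-work idea discussed here can be sketched in the style of hashcash (this is an illustration of the general scheme, not code from the thesis): the sender searches for a counter whose hash over the message has a given number of leading zero bits, so each mail costs CPU time while verification stays cheap. The message string and 12-bit difficulty below are arbitrary choices.

```python
# Hashcash-style proof-of-work sketch.
import hashlib

def mint(message, bits):
    """Find a counter giving `bits` leading zero bits in SHA-256 (~2**bits tries)."""
    counter = 0
    while True:
        digest = hashlib.sha256(f"{message}:{counter}".encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - bits) == 0:
            return counter
        counter += 1

def verify(message, counter, bits):
    """One hash suffices to check a stamp: minting is hard, verifying is cheap."""
    digest = hashlib.sha256(f"{message}:{counter}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0

stamp = mint("to:alice@example.com", 12)           # ~4096 hash attempts
print(verify("to:alice@example.com", stamp, 12))   # True
```

The thesis's economic objection applies regardless of the hash chosen: the difficulty parameter that deters a spammer with a botnet's worth of stolen CPU also prices out legitimate high-volume senders.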

Finally, I consider a content-blocking system with a hybrid design that has been deployed by a UK Internet Service Provider to inhibit access to child pornography. I demonstrate that the two-level design can be circumvented at either level, that content providers can use the first level to attack the second, and that the selectivity of the first level can be used as an “oracle” to extract a list of the sites being blocked. Although many of these attacks can be countered, there is an underlying failure that cannot be fixed. The system’s database holds details of the traceability of content, as viewed from a single location at a single time. However, a blocking system may be deployed at many sites and must track content as it moves in space and time; functions which traceability, as currently realized, cannot deliver.

UCAM-CL-TR-654

Matthew J. Parkinson:

Local reasoning for Java
November 2005, 120 pages, PDF
PhD thesis (Churchill College, August 2005)

Abstract: This thesis develops the local reasoning approach of separation logic for common forms of modularity such as abstract datatypes and objects. In particular, this thesis focuses on the modularity found in the Java programming language.

We begin by developing a formal semantics for a core imperative subset of Java, Middleweight Java (MJ), and then adapt separation logic to reason about this subset. However, a naive adaptation of separation logic is unable to reason about encapsulation or inheritance: it provides no support for modularity.

First, we address the issue of encapsulation with the novel concept of an abstract predicate, which is the logical analogue of an abstract datatype. We demonstrate how this method can encapsulate state, and provide a mechanism for ownership transfer: the ability to transfer state safely between a module and its client. We also show how abstract predicates can be used to express the calling protocol of a class.

However, the encapsulation provided by abstract predicates is too restrictive for some applications. In particular, it cannot reason about multiple datatypes that have shared read-access to state, for example list iterators. To compensate, we alter the underlying model to allow the logic to express properties about read-only references to state. Additionally, we provide a model that allows both sharing and disjointness to be expressed directly in the logic.

Finally, we address the second modularity issue: inheritance. We do this by extending the concept of abstract predicates to abstract predicate families. This extension allows a predicate to have multiple definitions that are indexed by class, which allows subclasses to have a different internal representation while remaining behavioural subtypes. We demonstrate the usefulness of this concept by verifying a use of the visitor design pattern.


UCAM-CL-TR-655

Karen Sparck Jones:

Wearing proper combinations
November 2005, 27 pages, PDF

Abstract: This paper discusses the proper treatment of multiple indexing fields, representations, or streams, in document retrieval. Previous experiments by Robertson and his colleagues have shown that, with a widely used type of term weighting and fields that share keys, document scores should be computed using term frequencies over fields rather than by combining field scores. Here I examine a wide range of document and query indexing situations, and consider their implications for this approach to document scoring.
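The distinction the abstract draws can be made concrete with a small sketch. The saturating weight, field weights and frequencies below are invented for illustration and are not the paper’s exact weighting scheme; the point is only that combining per-field scores and scoring combined term frequencies give different results.

```python
# Illustrative sketch (not the paper's exact scheme): two ways to score a
# document whose "title" and "body" fields both contain the query term.
# Field weights and term frequencies are invented for the example.

def score(tf, k=1.5):
    """A saturating term-frequency weight, BM25-style."""
    return tf * (k + 1) / (tf + k)

field_tf = {"title": 1, "body": 3}       # hypothetical term frequencies
field_weight = {"title": 2.0, "body": 1.0}

# (a) Combining per-field scores: saturation applies within each field.
combined_scores = sum(w * score(field_tf[f]) for f, w in field_weight.items())

# (b) Scoring term frequencies over fields: weight the counts first, then
# apply the saturating function once over the whole document.
merged_tf = sum(w * field_tf[f] for f, w in field_weight.items())
combined_tf_score = score(merged_tf)

print(combined_scores, combined_tf_score)  # the two methods disagree
```

Because the saturating function is non-linear, the order in which frequencies are merged matters, which is why the choice of method affects ranking.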

UCAM-CL-TR-656

Pablo Vidales:

Seamless mobility in 4G systems
November 2005, 141 pages, PDF
PhD thesis (Girton College, May 2005)

Abstract: The proliferation of radio access technologies, wireless networking devices, and mobile services has encouraged intensive nomadic computing activity. When travelling, mobile users experience connectivity disturbances, particularly when they handoff between two access points that belong to the same wireless network and when they change from one access technology to another. Nowadays, an average mobile user might connect to many different wireless networks in the course of a day to obtain diverse services, whilst demanding transparent operation. Current protocols offer portability and transparent mobility. However, they fail to cope with huge delays caused by different link-layer characteristics when roaming between independent disparate networks. In this dissertation, I address this deficiency by introducing and evaluating practical methods and solutions that minimise connection disruptions and support transparent mobility in future communication systems.

UCAM-CL-TR-657

Hyun-Jin Choi:

Security protocol design by composition
January 2006, 155 pages, PDF
PhD thesis (Churchill College, December 2004)

Abstract: The aim of this research is to present a new methodology for the systematic design of compound protocols from their parts. Some security properties can be made accumulative, i.e. can be put together without interfering with one another, by carefully selecting the mechanisms which implement them. Among them are authentication, secrecy and non-repudiation. Based on this observation, a set of accumulative protocol mechanisms called protocol primitives are proposed and their correctness is verified. These protocol primitives are obtained from common mechanisms found in many security protocols, such as challenge and response. They have been carefully designed not to interfere with each other. This feature makes them flexible building blocks in the proposed methodology. Equipped with these protocol primitives, a scheme for the systematic construction of a complicated protocol from simple protocol primitives is presented, namely, design by composition. This design scheme allows the combination of several simple protocol parts into a complicated protocol without destroying the security properties established by each independent part. In other words, the composition framework permits the specification of a complex protocol to be decomposed into the specifications of simpler components, and thus makes the design and verification of the protocol easier to handle. Benefits of this approach are similar to those gained when using a modular approach to software development.

The applicability and practicality of the proposed methodology are validated through many design examples of protocols found in different environments and with various initial assumptions. The method does not aim to cover all existing design issues, but it addresses a reasonable range of protocols.

UCAM-CL-TR-658

Carsten Moenning:

Intrinsic point-based surface processing
January 2006, 166 pages, PDF
PhD thesis (Queens’ College, January 2005)

Abstract: The need for the processing of surface geometry represents a ubiquitous problem in computer graphics and related disciplines. It arises in numerous important applications such as computer-aided design, reverse engineering, rapid prototyping, medical imaging, cultural heritage acquisition and preservation, video gaming and the movie industry. Existing surface processing techniques predominantly follow an extrinsic approach, using combinatorial mesh data structures in the embedding Euclidean space to represent, manipulate and visualise the surfaces. This thesis advocates, firstly, the intrinsic processing of surfaces, i.e. processing directly across the surface rather than in its embedding space. Secondly, it continues the trend towards the use of point primitives for the processing and representation of surfaces.


The discussion starts with the design of an intrinsic point sampling algorithm template for surfaces. This is followed by the presentation of a module library of template instantiations for surfaces in triangular mesh or point cloud form. The latter is at the heart of the intrinsic meshless surface simplification algorithm also put forward. This is followed by the introduction of intrinsic meshless surface subdivision, the first intrinsic meshless surface subdivision scheme, and a new method for the computation of geodesic centroids on manifolds. The meshless subdivision scheme uses an intrinsic neighbourhood concept for point-sampled geometry also presented in this thesis. Its main contributions can therefore be summarised as follows:

– An intrinsic neighbourhood concept for point-sampled geometry.

– An intrinsic surface sampling algorithm template with sampling density guarantee.

– A modular library of template instantiations for the sampling of planar domains and surfaces in triangular mesh or point cloud form.

– A new method for the computation of geodesic centroids on manifolds.

– An intrinsic meshless surface simplification algorithm.

– The introduction of the notion of intrinsic meshless surface subdivision.

– The first intrinsic meshless surface subdivision scheme.

The overall result is a set of algorithms for the processing of point-sampled geometry, centering around a generic sampling template for surfaces in the most widely-used forms of representation. The intrinsic nature of these point-based algorithms helps to overcome limitations associated with the more traditional extrinsic, mesh-based processing of surfaces when dealing with highly complex point-sampled geometry as is typically encountered today.

UCAM-CL-TR-659

Viktor Vafeiadis, Maurice Herlihy,Tony Hoare, Marc Shapiro:

A safety proof of a lazy concurrent list-based set implementation
January 2006, 19 pages, PDF

Abstract: We prove the safety of a practical concurrent list-based implementation due to Heller et al. It exposes an interface of an integer set with methods contains, add, and remove. The implementation uses a combination of fine-grain locking, optimistic and lazy synchronisation. Our proofs are hand-crafted. They use rely-guarantee reasoning and thereby illustrate its power and applicability, as well as some of its limitations. For each method, we identify the linearisation point, and establish its validity. Hence we show that the methods are safe, linearisable and implement a high-level specification. This report is a companion document to our PPoPP 2006 paper entitled “Proving correctness of highly-concurrent linearisable objects”.

UCAM-CL-TR-660

Jeremy Singer:

Static program analysis based on virtual register renaming
February 2006, 183 pages, PDF
PhD thesis (Christ’s College, March 2005)

Abstract: Static single assignment form (SSA) is a popular program intermediate representation (IR) for static analysis. SSA programs differ from equivalent control flow graph (CFG) programs only in the names of virtual registers, which are systematically transformed to comply with the naming convention of SSA. Static single information form (SSI) is a recently proposed extension of SSA that enforces a greater degree of systematic virtual register renaming than SSA. This dissertation develops the principles, properties, and practice of SSI construction and data flow analysis. Further, it shows that SSA and SSI are two members of a larger family of related IRs, which are termed virtual register renaming schemes (VRRSs). SSA and SSI analyses can be generalized to operate on any VRRS family member. Analysis properties such as accuracy and efficiency depend on the underlying VRRS.
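The renaming idea behind SSA can be sketched for the simple case of straight-line code, where no phi-functions are needed. The three-address instruction format below is invented for illustration and is not the dissertation’s IR.

```python
# A minimal sketch of SSA-style virtual register renaming for straight-line
# code (no control flow, so no phi-functions arise). Instructions are
# (destination, opcode, argument-tuple) triples, a format invented here.

def to_ssa(instrs):
    """Rename each assignment to a fresh version: x -> x0, x1, ..."""
    version = {}          # current version number of each virtual register
    out = []
    for dest, op, args in instrs:
        # Uses refer to the latest version of each source register;
        # registers never defined (function inputs) keep their names.
        new_args = tuple(f"{a}{version[a]}" if a in version else a
                         for a in args)
        # Each definition gets a fresh name, enforcing single assignment.
        version[dest] = version.get(dest, -1) + 1
        out.append((f"{dest}{version[dest]}", op, new_args))
    return out

prog = [("x", "add", ("a", "b")),
        ("x", "mul", ("x", "c")),    # redefines x: uses x0, defines x1
        ("y", "sub", ("x", "a"))]
print(to_ssa(prog))
# x is split into x0 and x1; the final use of x refers to x1
```

The program text is otherwise unchanged, which mirrors the abstract’s point that SSA differs from the underlying CFG program only in register names.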

This dissertation makes four significant contributions to the field of static analysis research.

First, it develops the SSI representation. Although SSI was introduced five years ago, it has not yet received widespread recognition as an interesting IR in its own right. This dissertation presents a new SSI definition and an optimistic construction algorithm. It also sets SSI in context among the broad range of IRs for static analysis.

Second, it demonstrates how to reformulate existing data flow analyses using new sparse SSI-based techniques. Examples include liveness analysis, sparse type inference and program slicing. It presents algorithms, together with empirical results of these algorithms when implemented within a research compiler framework.

Third, it provides the only major comparative evaluation of the merits of SSI for data flow analysis. Several qualitative and quantitative studies in this dissertation compare SSI with other similar IRs.

Last, it identifies the family of VRRSs, which are all CFGs with different virtual register naming conventions. Many extant IRs are classified as VRRSs. Several new IRs are presented, based on a consideration of previously unspecified members of the VRRS family. General analyses can operate on any family member. The required level of accuracy or efficiency can be selected by working in terms of the appropriate family member.


UCAM-CL-TR-661

Anna Ritchie:

Compatible RMRS representations from RASP and the ERG
March 2006, 41 pages, PDF

Abstract: Various applications could potentially benefit from the integration of deep and shallow processing techniques. A universal representation, compatible between deep and shallow parsers, would enable such integration, allowing the advantages of both to be combined. This paper describes efforts to make RMRS such a representation. This work was done as part of DeepThought, funded under the 5th Framework Program of the European Commission (contract reference IST-2001-37836).

UCAM-CL-TR-662

Ted Briscoe:

An introduction to tag sequence grammars and the RASP system parser
March 2006, 30 pages, PDF

Abstract: This report describes the tag sequence grammars released as part of the Robust Accurate Statistical Parsing (RASP) system. It is intended to help users of RASP understand the linguistic and engineering rationale behind the grammars and prepare them to customise the system for their application. It also contains a fairly exhaustive list of references to extant work utilising the RASP parser.

UCAM-CL-TR-663

Richard Bergmair:

Syntax-driven analysis of context-free languages with respect to fuzzy relational semantics
March 2006, 49 pages, PDF

Abstract: A grammatical framework is presented that augments context-free production rules with semantic production rules that rely on fuzzy relations as representations of fuzzy natural language concepts. It is shown how the well-known technique of syntax-driven semantic analysis can be used to infer, from an expression in a language defined in such a semantically augmented grammar, a weak ordering on the possible worlds it describes. Considering the application of natural language query processing, we show how to order elements in the domain of a relational database scheme according to the degree to which they fulfill the intuition behind a given natural language statement like “Carol lives in a small city near San Francisco”.

UCAM-CL-TR-664

Alan F. Blackwell:

Designing knowledge: An interdisciplinary experiment in research infrastructure for shared description
April 2006, 18 pages, PDF

Abstract: The report presents the experimental development, evaluation and refinement of a method for doing adventurous design work, in contexts where academics must work in collaboration with corporate and public policy strategists and researchers. The intention has been to do applied social science, in which a reflective research process has resulted in a “new social form”, as expressed in the title of the research grant that funded the project. The objective in doing so is not simply to produce new theories, or to enjoy interdisciplinary encounters (although both of those have been side effects of this work). My purpose in doing the work and writing this report is purely instrumental – working as a technologist among social scientists, the outcome described in this report is intended for adoption as a kind of social technology. I have given this product a name: the “Blackwell-Leach Process” for interdisciplinary design. The Blackwell-Leach Process has since been applied and proven useful in several novel situations, and I believe it is now sufficiently mature to justify publication of the reports that describe both the process and its development.

UCAM-CL-TR-665

Huiyun Li:

Security evaluation at design time for cryptographic hardware
April 2006, 81 pages, PDF
PhD thesis (Trinity Hall, December 2005)

Abstract: Consumer security devices are becoming ubiquitous, from pay-TV through mobile phones, PDAs and prepayment gas meters to smart cards. There are many ongoing research efforts to keep these devices secure from opponents who try to retrieve key information by observation or manipulation of the chip’s components. In common industrial practice, it is after the chip has been manufactured that security evaluation is performed. Due to design-time oversights, however, weaknesses are often revealed in fabricated chips. Furthermore, post-manufacture security evaluation is time consuming, error prone and very expensive. This evokes the need for “design time security evaluation” techniques in order to identify avoidable mistakes in design.


This thesis proposes a set of “design time security evaluation” methodologies covering the well-known non-invasive side-channel analysis attacks, such as power analysis and electromagnetic analysis attacks. The thesis also covers the recently published semi-invasive optical fault injection attacks. These security evaluation technologies examine the system under test by reproducing attacks through simulation and observing its subsequent response.

The proposed “design time security evaluation” methodologies can be easily implemented into the standard integrated circuit design flow, requiring only commonly used EDA tools. They therefore add little non-recurrent engineering (NRE) cost to the chip design, but help identify security weaknesses at an early stage, avoid costly silicon re-spins, and help the design succeed in industrial evaluation for faster time-to-market.

UCAM-CL-TR-666

Mike Bond, George Danezis:

A pact with the Devil
June 2006, 14 pages, PDF

Abstract: We study malware propagation strategies which exploit not the incompetence or naivety of users, but instead their own greed, malice and short-sightedness. We demonstrate that interactive propagation strategies, for example bribery and blackmail of computer users, are effective mechanisms for malware to survive and entrench, and present an example employing these techniques. We argue that in terms of propagation, there exists a continuum between legitimate applications and pure malware, rather than a quantised scale.

UCAM-CL-TR-667

Piotr Zielinski:

Minimizing latency of agreement protocols
June 2006, 239 pages, PDF
PhD thesis (Trinity Hall, September 2005)

Abstract: Maintaining consistency of fault-tolerant distributed systems is notoriously difficult to achieve. It often requires non-trivial agreement abstractions, such as Consensus, Atomic Broadcast, or Atomic Commitment. This thesis investigates implementations of such abstractions in the asynchronous model, extended with unreliable failure detectors or eventual synchrony. The main objective is to develop protocols that minimize the number of communication steps required in failure-free scenarios but remain correct if failures occur. For several agreement problems and their numerous variants, this thesis presents such low-latency algorithms and lower-bound theorems proving their optimality.

The observation that many agreement protocols share the same round-based structure helps to cope with a large number of agreement problems in a uniform way. One of the main contributions of this thesis is “Optimistically Terminating Consensus” (OTC) – a new lightweight agreement abstraction that formalizes the notion of a round. It is used to provide simple modular solutions to a large variety of agreement problems, including Consensus, Atomic Commitment, and Interactive Consistency. The OTC abstraction tolerates malicious participants and has no latency overhead; agreement protocols constructed in the OTC framework require no more communication steps than their ad-hoc counterparts.

The attractiveness of this approach lies in the fact that the correctness of OTC algorithms can be tested automatically. A theory developed in this thesis allows us to quickly evaluate OTC algorithm candidates without the time-consuming examination of their entire state space. This technique is then used to scan the space of possible solutions in order to automatically discover new low-latency OTC algorithms. From these, one can now easily obtain new implementations of Consensus and similar agreement problems such as Atomic Commitment or Interactive Consistency.

Because of its continuous nature, Atomic Broadcast is considered separately from other agreement abstractions. I first show that no algorithm can guarantee a latency of less than three communication steps in all failure-free scenarios. Then, I present new Atomic Broadcast algorithms that achieve the two-step latency in some special cases, while still guaranteeing three steps for other failure-free scenarios. The special cases considered here are: Optimistic Atomic Broadcast, (Optimistic) Generic Broadcast, and closed-group Atomic Broadcast. For each of these, I present an appropriate algorithm and prove its latency to be optimal.

UCAM-CL-TR-668

Piotr Zielinski:

Optimistically Terminating Consensus
June 2006, 35 pages, PDF

Abstract: Optimistically Terminating Consensus (OTC) is a variant of Consensus that decides if all correct processes propose the same value. It is surprisingly easy to implement: processes broadcast their proposals and decide if sufficiently many processes report the same proposal. This paper shows an OTC-based framework which can reconstruct all major asynchronous Consensus algorithms, even in Byzantine settings, with no overhead in latency or the required number of processes. This result does not only deepen our understanding of Consensus, but also reduces the problem of designing new, modular distributed agreement protocols to choosing the parameters of OTC.
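The optimistic decision rule described above can be sketched directly. The threshold used here (n − f identical reports out of n) is an assumption chosen for illustration, not the paper’s exact quorum condition, which is one of the OTC parameters.

```python
# A sketch of the optimistic decision rule: decide only if sufficiently many
# processes report the same proposal. The quorum threshold n - f below is an
# illustrative assumption, not the paper's exact condition.
from collections import Counter

def otc_decide(reports, n, f):
    """Return the decided value, or None if no value has enough support."""
    counts = Counter(reports)
    value, support = counts.most_common(1)[0]
    return value if support >= n - f else None

# n = 4 processes, tolerating f = 1 failure: 3 identical reports suffice.
print(otc_decide(["v", "v", "v", "w"], n=4, f=1))  # "v" is decided
print(otc_decide(["v", "v", "w", "w"], n=4, f=1))  # no quorum: None
```

If no value reaches the quorum, the abstraction simply does not terminate optimistically, and a fallback round is needed.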


UCAM-CL-TR-669

David M. Eyers:

Active privilege management for distributed access control systems
June 2006, 222 pages, PDF
PhD thesis (King’s College, June 2005)

Abstract: The last decade has seen the explosive uptake of technologies to support true Internet-scale distributed systems, many of which will require security.

The policy dictating authorisation and privilege restriction should be decoupled from the services being protected: (1) policy can be given its own independent language syntax and semantics, hopefully in an application-independent way; (2) policy becomes portable – it can be stored away from the services it protects; and (3) the evolution of policy can be effected dynamically.

Management of dynamic privileges in wide-area distributed systems is a challenging problem. Supporting fast credential revocation is a simple example of dynamic privilege management. More complex examples include policies that are sensitive to the current state of a principal, such as dynamic separation of duties.

The Open Architecture for Secure Interworking Services (OASIS), an expressive distributed role-based access control system, is traced to the development of the Clinical and Biomedical Computing Limited (CBCL) OASIS implementation. Two OASIS deployments are discussed – an Electronic Health Record framework, and an inter-organisational distributed courseware system.

The Event-based Distributed Scalable Authorisation Control architecture for the 21st century (EDSAC21, or just EDSAC) is then presented along with its four design layers. It builds on OASIS, adding support for the collaborative enforcement of distributed dynamic constraints, and incorporating publish/subscribe messaging to allow scalable and flexible deployment. The OASIS policy language is extended to support delegation, dynamic separation of duties, and obligation policies.

An EDSAC prototype is examined. We show that our architecture is ideal for experiments in location-aware access control. We then demonstrate how event-based features specific to EDSAC facilitate the integration of an ad hoc workflow monitor into an access control system.

The EDSAC architecture is powerful, flexible and extensible. It is intended to have widespread applicability as the basis for designing next-generation security middleware and implementing distributed, dynamic privilege management.

UCAM-CL-TR-670

Sarah Thompson:

On the application of program analysis and transformation to high reliability hardware
July 2006, 215 pages, PDF
PhD thesis (St Edmund’s College, April 2006)

Abstract: Safety- and mission-critical systems must be both correct and reliable. Electronic systems must behave as intended and, where possible, do so at the first attempt – the fabrication costs of modern VLSI devices are such that the iterative design/code/test methodology endemic to the software world is not financially feasible. In aerospace applications it is also essential to establish that systems will, with known probability, remain operational for extended periods, despite being exposed to very low or very high temperatures, high radiation, large G-forces, hard vacuum and severe vibration.

Hardware designers have long understood the advantages of formal mathematical techniques. Notably, model checking and automated theorem proving both gained acceptance within the electronic design community at an early stage, though more recently the research focus in validation and verification has drifted toward software. As a consequence, the newest and most powerful techniques have not been significantly applied to hardware; this work seeks to make a modest contribution toward redressing the imbalance.

An abstract interpretation-based formalism is introduced, transitional logic, that supports formal reasoning about the dynamic behaviour of combinational asynchronous circuits. The behaviour of majority voting circuits with respect to single-event transients is analysed, demonstrating that such circuits are not SET-immune. This result is generalised to show that SET immunity is impossible for all delay-insensitive circuits.
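For readers unfamiliar with majority voting circuits, the following schematic Boolean model shows the intuition. It captures only the static 2-out-of-3 vote; the thesis’s transitional-logic result concerns dynamic transients, including those striking the voter itself, which a static truth-table model like this cannot express.

```python
# A schematic triple-modular-redundancy (TMR) majority voter. A single-event
# transient (SET) that flips one replica's output is masked by the vote, but
# this static Boolean model cannot represent a transient striking the voter
# itself, which is where the thesis's transitional-logic analysis applies.

def majority(a, b, c):
    """2-out-of-3 vote, as a combinational Boolean function."""
    return (a and b) or (b and c) or (a and c)

correct = True
assert majority(correct, correct, correct) is True
# An SET flips one of the three replicated outputs: the vote masks it.
assert majority(not correct, correct, correct) is True
print("single-replica upsets are masked by the voter")
```

The thesis’s point is precisely that this static picture is incomplete: analysed transitionally, the voter and its surrounding asynchronous logic are not immune to single-event transients.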

An experimental hardware partial evaluator, HarPE, is used to demonstrate the 1st Futamura projection in hardware – a small CPU is specialised with respect to a ROM image, yielding results that are equivalent to compiling the program into hardware. HarPE is then used alongside an experimental non-clausal SAT solver to implement an automated transformation system that is capable of repairing FPGAs that have suffered cosmic ray damage. This approach is extended to support automated configuration, dynamic testing and dynamic error recovery of reconfigurable spacecraft wiring harnesses.

UCAM-CL-TR-671

Piotr Zielinski:

Low-latency Atomic Broadcast in the presence of contention
July 2006, 23 pages, PDF


Abstract: The Atomic Broadcast algorithm described in this paper can deliver messages in two communication steps, even if multiple processes broadcast at the same time. It tags all broadcast messages with the local real time, and delivers all messages in order of these timestamps. The Ω-elected leader simulates processes it suspects to have crashed (♦S). For fault-tolerance, it uses a new cheap Generic Broadcast algorithm that requires only a majority of correct processes (n > 2f) and, in failure-free runs, delivers all non-conflicting messages in two steps. The main algorithm satisfies several new lower bounds, which are proved in this paper.
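The timestamp-ordered delivery rule can be sketched on its own, separate from the fault-tolerance machinery. The tie-break by sender id below is an illustrative assumption to make the order total; the class and method names are invented for the example.

```python
# A sketch of the ordering rule only: tag each broadcast with the sender's
# local real time and deliver in timestamp order, breaking ties by sender id
# so every process derives the same total order. Fault tolerance (leader
# simulation, Generic Broadcast) is deliberately omitted.
import heapq

class TimestampOrder:
    def __init__(self):
        self.pending = []

    def broadcast(self, timestamp, sender, msg):
        # Heap entries compare lexicographically: (timestamp, sender).
        heapq.heappush(self.pending, (timestamp, sender, msg))

    def deliverable(self):
        """Drain buffered messages in (timestamp, sender) order."""
        out = []
        while self.pending:
            out.append(heapq.heappop(self.pending)[2])
        return out

ts = TimestampOrder()
ts.broadcast(10.2, "p2", "b")
ts.broadcast(10.1, "p1", "a")
ts.broadcast(10.2, "p1", "c")   # same time as "b": sender id breaks the tie
print(ts.deliverable())  # ['a', 'c', 'b']
```

In the real protocol a message only becomes deliverable once it is known that no smaller timestamp can still arrive; this sketch assumes all messages are already buffered.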

UCAM-CL-TR-672

Calicrates Policroniades-Borraz:

Decomposing file data into discernible items
August 2006, 230 pages, PDF
PhD thesis (Hughes Hall, December 2005)

Abstract: The development of the different persistent data models shows a constant pattern: the higher the level of abstraction a storage system exposes, the greater the payoff for programmers. The file API offers a simple storage model that is agnostic of any structure or data types in file contents. As a result, developers employ substantial programming effort in writing persistent code. At the other extreme, orthogonally persistent programming languages reduce the impedance mismatch between the volatile and the persistent data spaces by exposing persistent data as conventional programming objects. Consequently, developers spend considerably less effort in developing persistent code.

This dissertation addresses the lack of ability in the file API to exploit the advantages of gaining access to the logical composition of file content. It argues that the trade-off between efficiency and ease of programmability of persistent code in the context of the file API is unbalanced. Accordingly, in this dissertation I present and evaluate two practical strategies to disclose structure and type in file data.

First, I investigate to what extent it is possible to identify specific portions of file content in diverse data sets through the implementation and evaluation of techniques for data redundancy detection. This study is interesting not only because it characterises redundancy levels in storage systems content, but also because redundant portions of data at a sub-file level can be an indication of internal file data structure. Although these techniques have been used by previous work, my analysis of data redundancy is the first that makes an in-depth comparison of them and highlights the trade-offs in their employment.
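One of the simplest such techniques can be sketched as follows. Fixed-size chunk fingerprinting is only the most basic of the approaches such work compares (content-defined chunking, for instance, resists insertions that shift the data); the chunk size and helper names below are invented for the example.

```python
# A sketch of sub-file redundancy detection by chunk fingerprinting: split
# the data into fixed-size chunks and count duplicate chunk hashes. This is
# the simplest of the family of techniques such comparisons cover; the
# 8-byte chunk size is an illustrative choice.
import hashlib

def chunk_hashes(data, size=8):
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

def redundancy(data, size=8):
    """Fraction of chunks whose contents duplicate another chunk."""
    hashes = chunk_hashes(data, size)
    return 1 - len(set(hashes)) / len(hashes)

print(redundancy(b"ABCDEFGH" * 4))    # repeated chunks: high redundancy
print(redundancy(bytes(range(32))))   # all chunks distinct: 0.0
```

Runs of identical fingerprints at a sub-file granularity are exactly the kind of signal the abstract describes as hinting at internal file structure.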

Second, I introduce a novel storage system API, called Datom, that departs from the view of file content as a monolithic object. Through a minimal set of commonly-used abstract data types, it discloses a judicious degree of structure and type in the logical composition of files and makes the data access semantics of applications explicit. The design of the Datom API weighs the addition of advanced functionality against the overheads introduced by its employment, taking into account the requirements of the target application domain. The implementation of the Datom API is evaluated according to different criteria such as usability, impact at the source-code level, and performance. The experimental results demonstrate that the Datom API reduces work-effort and improves software quality by providing a storage interface based on high-level abstractions.

UCAM-CL-TR-673

Judita Preiss:

Probabilistic word sense disambiguation
Analysis and techniques for combining knowledge sources
August 2006, 108 pages, PDF
PhD thesis (Trinity College, July 2005)

Abstract: This thesis shows that probabilistic word sense disambiguation systems based on established statistical methods are strong competitors to current state-of-the-art word sense disambiguation (WSD) systems.

We begin with a survey of approaches to WSD, and examine their performance in the systems submitted to the SENSEVAL-2 WSD evaluation exercise. We discuss existing resources for WSD, and investigate the amount of training data needed for effective supervised WSD.

We then present the design of a new probabilistic WSD system. The main feature of the design is that it combines multiple probabilistic modules using both Dempster-Shafer theory and Bayes’ Rule. Additionally, the use of Lidstone’s smoothing provides a uniform mechanism for weighting modules based on their accuracy, removing the need for an additional weighting scheme.
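Lidstone smoothing itself is simple to state. The sketch below shows the standard estimator; the sense labels, counts and lambda value are invented for illustration, and the way the thesis uses the smoothed estimates to weight modules is of course more involved.

```python
# A sketch of Lidstone smoothing: add a pseudo-count lambda to every sense
# count before normalising (lam = 1 gives Laplace smoothing). The counts and
# lambda below are invented for illustration.

def lidstone(counts, lam):
    """Smoothed P(sense) over the observed sense inventory."""
    total = sum(counts.values()) + lam * len(counts)
    return {s: (c + lam) / total for s, c in counts.items()}

# A module with plenty of evidence keeps a sharp distribution...
print(lidstone({"bank/river": 9, "bank/money": 1}, lam=0.5))
# ...while the same 9:1 ratio backed by little evidence is pulled
# towards uniform, discounting the less reliable module.
print(lidstone({"bank/river": 0.9, "bank/money": 0.1}, lam=0.5))
```

This is what makes the smoothing double as a weighting mechanism: modules with little supporting evidence are automatically flattened towards uniform and so contribute less to the combined decision.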

Lastly, we evaluate our probabilistic WSD system using traditional evaluation methods, and introduce a novel task-based approach. When evaluated on the gold standard used in the SENSEVAL-2 competition, the performance of our system lies between the first and second ranked WSD systems submitted to the English all-words task.

Task-based evaluations are becoming more popular in natural language processing, being an absolute measure of a system’s performance on a given task. We present a new evaluation method based on subcategorization frame acquisition. Experiments with our probabilistic WSD system give an extremely high correlation between subcategorization frame acquisition performance and WSD performance, thus demonstrating the suitability of SCF acquisition as a WSD evaluation task.

UCAM-CL-TR-674

Meng How Lim:

Landmark Guided Forwarding
October 2006, 109 pages, PDF
PhD thesis (St Catharine’s College, September 2006)

Abstract: Wireless mobile ad hoc network routing presents some extremely challenging research problems. While primarily trying to provide connectivity, algorithms may also be designed to minimise resource consumption such as power, or to trade off global optimisation against the routing protocol overheads. In this thesis, we focus on the problems of maintaining network connectivity in the presence of node mobility whilst providing a balance between global efficiency and robustness. The common design goal among existing wireless ad hoc routing solutions is to search for an optimal topological path between a source and a destination for some shortest path metric. We argue that the goal of establishing an end-to-end globally optimal path is unsustainable as the network diameter, traffic volume and number of nodes all increase in the presence of moderate node mobility.

Some researchers have proposed using geographic position-based forwarding, rather than a topological-based approach. In position-based forwarding, besides knowing about its own geographic location, every node also acquires the geographic position of its surrounding neighbours. Packet delivery in general is achieved by first learning the destination position from a location service. This is followed by addressing the packet with the destination position before forwarding the packet on to a neighbour that, amongst all other neighbours, is geographically nearest to the destination. It is clear that in the ad hoc scenario, forwarding only by geodesic position could result in situations that prevent the packet from advancing further. To resolve this, some researchers propose improving delivery guarantees by routing the packet along a planar graph constructed from a Gabriel Graph (GG) or a Relative Neighbour Graph (RNG). This approach however has been shown to fail frequently when position information is inherently inaccurate, or neighbourhood state is stale, such as is the case in many plausible deployment scenarios, e.g. due to relative mobility rates being higher than location service update frequency.

We propose Landmark Guided Forwarding (LGF), an algorithm that harnesses the strengths of both topological and geographical routing algorithms. LGF is a hybrid scheme that leverages the scaling property of the geographic approach while using local topology knowledge to mitigate location uncertainty. We demonstrate through extensive simulations that LGF is suited both to situations where there are high mobility rates, and to deployment where there is inherently less accurate position data. Our results show that Landmark Guided Forwarding converges faster, scales better and is more flexible in a range of plausible mobility scenarios than representative protocols from the leading classes of existing solutions, namely GPSR, AODV and DSDV.

UCAM-CL-TR-675

Paula J. Buttery:

Computational models for first language acquisition
November 2006, 176 pages, PDF
PhD thesis (Churchill College, March 2006)

Abstract: This work investigates a computational model of first language acquisition: the Categorial Grammar Learner, or CGL. The model builds on the work of Villavicencio, who created a parametric Categorial Grammar learner that organises its parameters into an inheritance hierarchy, and also on the work of Buszkowski and Kanazawa, who demonstrated the learnability of a k-valued Classic Categorial Grammar (which uses only the rules of function application) from strings. The CGL is able to learn a k-valued General Categorial Grammar (which uses the rules of function application, function composition and Generalised Weak Permutation). Sentence Objects (simple strings, augmented strings, unlabelled structures and functor-argument structures) are presented as novel potential points from which learning may commence. Augmented strings (which are strings augmented with some basic syntactic information) are suggested as a sensible input to the CGL as they are cognitively plausible objects and have greater information content than strings alone. Building on the work of Siskind, a method for constructing augmented strings from unordered logic forms is detailed, and it is suggested that augmented strings are simply a representation of the constraints placed on the space of possible parses by a string's associated semantic content. The CGL makes crucial use of a statistical Memory Module (constructed from a Type Memory and Word Order Memory) that is used both to constrain hypotheses and to handle data which is noisy or parametrically ambiguous. A consequence of the Memory Module is that the CGL learns in an incremental fashion. This echoes real child learning as documented in Brown's Stages of Language Development and also as alluded to by an included corpus study of child speech. Furthermore, the CGL learns faster when initially presented with simpler linguistic data; a further corpus study of child-directed speech suggests that this echoes the input provided to children.
The CGL is demonstrated to learn from real data. It is evaluated against previous parametric learners (the Triggering Learning Algorithm of Gibson and Wexler and the Structural Triggers Learner of Fodor and Sakas) and is found to be more efficient.
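The function application rule mentioned above can be illustrated with a toy encoding (the tuple representation of categories is invented for this sketch and is not the CGL's data structure):

```python
# Toy sketch of forward function application in a Classic Categorial Grammar:
# a functor of category X/Y combines with an argument of category Y on its
# right to yield X. Categories are encoded as nested tuples (result, slash, arg).
def forward_apply(functor, argument):
    """Apply X/Y to Y, returning X, or None if the rule does not apply."""
    if isinstance(functor, tuple) and len(functor) == 3 \
            and functor[1] == '/' and functor[2] == argument:
        return functor[0]
    return None

# Example: a transitive verb (S\NP)/NP consumes its object NP to give S\NP.
verb = (('S', '\\', 'NP'), '/', 'NP')
```

A General Categorial Grammar of the kind the CGL learns would add composition and Generalised Weak Permutation as further combination rules alongside this one.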

UCAM-CL-TR-676

R.J. Gibbens, Y. Saacti:

Road traffic analysis using MIDAS data: journey time prediction
December 2006, 35 pages, PDF
Department for Transport Horizons Research Programme “Investigating the handling of large transport related datasets” (project number H05-217)

Abstract: The project described in this report was undertaken within the Department for Transport’s second call for proposals in the Horizons research programme under the theme of “Investigating the handling of large transport related datasets”. The project looked at the variability of journey times across days in three day categories: Mondays, midweek days and Fridays. Two estimators using real-time data were considered: a simple-to-implement regression-based method and a more computationally demanding k-nearest neighbour method. Our example scenario of UK data was taken from the M25 London orbital motorway during 2003 and the results compared in terms of the root-mean-square prediction error. It was found that where the variability was greatest (typically during the rush hour periods or periods of flow breakdown) the regression and nearest neighbour estimators reduced the prediction error substantially compared with a naive estimator constructed from the historical mean journey time. Only as the lag between the decision time and the journey start time increased to beyond around 2 hours did the potential to improve upon the historical mean estimator diminish. Thus, there is considerable scope for prediction methods combined with access to real-time data to improve the accuracy of journey time estimates. In so doing, they reduce the uncertainty in estimating the generalized cost of travel. The regression-based prediction estimator has a particularly low computational overhead, in contrast to the nearest neighbour estimator, which makes it entirely suitable for an online implementation. Finally, the project demonstrates both the value of preserving historical archives of transport related datasets and the value of providing access to real-time measurements.
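The k-nearest neighbour idea can be sketched as follows (invented toy data and function names; the report's estimator works on MIDAS loop-detector series, not this simplified form):

```python
# Sketch of k-NN journey-time prediction: match today's partial journey-time
# profile against historical days and average what those days did next.
import math

def rmse(predictions, actuals):
    """Root-mean-square prediction error, the comparison metric used above."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predictions, actuals))
                     / len(predictions))

def knn_predict(history, observed_so_far, k=3):
    """Predict the next journey time by averaging the k historical days whose
    early profile is closest (in squared error) to today's observations."""
    t = len(observed_so_far)
    nearest = sorted(history,
                     key=lambda day: sum((day[i] - observed_so_far[i]) ** 2
                                         for i in range(t)))[:k]
    return sum(day[t] for day in nearest) / k
```

A naive estimator would instead return the historical mean of `day[t]` over all days; the report finds that k-NN and regression beat it mainly during high-variability periods such as rush hours.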

UCAM-CL-TR-677

Eiko Yoneki:

ECCO: Data centric asynchronous communication
December 2006, 210 pages, PDF
PhD thesis (Lucy Cavendish College, September 2006)

Abstract: This dissertation deals with data centric networking in distributed systems, which relies on content addressing instead of host addressing for participating nodes, thus providing network independence for applications. Publish/subscribe asynchronous group communication realises the vision of data centric networking that is particularly important for networks supporting mobile clients over heterogeneous wireless networks. In such networks, client applications prefer to receive specific data and require selective data dissemination. Underlying mechanisms such as asynchronous message passing, distributed message filtering and query/subscription management are essential. Furthermore, recent progress in wireless sensor networks has brought a new dimension of data processing in ubiquitous computing, where sensors are used to gather high volumes of different data types and to feed them as contexts to a wide range of applications.

Particular emphasis has been placed on the fundamental design of event representation. Besides existing event attributes, event order and continuous context information, such as time or geographic location, can be incorporated within an event description. Data representation of events and queries will be even more important in future ubiquitous computing, where events flow over heterogeneous networks. This dissertation presents a multidimensional event representation (i.e., a hypercube structure in an R-tree) for efficient indexing, filtering, matching, and scalability in publish/subscribe systems. The hypercube event with a typed content-based publish/subscribe system for wide-area networks is demonstrated to improve the event filtering process.

As a primary focus, this dissertation investigates structureless, asynchronous group communication over wireless ad hoc networks, named ‘ECCO Pervasive Publish/Subscribe’ (ECCO-PPS). ECCO-PPS uses context-adaptive controlled flooding, which takes a cross-layer approach between the middleware and network layers and provides a content-based publish/subscribe paradigm. Traditionally, events have been payload data within network layer components; the network layer never touches the data contents. However, application data have more influence on data dissemination in ubiquitous computing scenarios.

The state information of the local node may be the event forwarding trigger. Thus, the model of publish/subscribe must become more symmetric, with events being disseminated based on rules and conditions defined by the events themselves. An event can thus choose its destinations instead of relying on the potential receivers’ decision. The publish/subscribe system offers a data centric approach, where the destination address is not described by any explicit network address. The symmetric publish/subscribe paradigm brings another level to the data-centric paradigm, leading to a fundamental change in functionality at the network level of asynchronous group communication and membership maintenance.

To add an additional dimension of event processing in global computing, it is important to understand event aggregation, filtering and correlation. Temporal ordering of events is essential for event correlation over distributed systems. This dissertation introduces generic composite event semantics with interval-based semantics for event detection. This precisely defines complex timing constraints among correlated event instances.

In conclusion, this dissertation provides advanced data-centric asynchronous communication that offers efficiency, reliability and robustness while adapting to the underlying network environments.

UCAM-CL-TR-678

Andrew D. Twigg:

Compact forbidden-set routing
December 2006, 115 pages, PDF
PhD thesis (King’s College, June 2006)

Abstract: We study the compact forbidden-set routing problem. We describe the first compact forbidden-set routing schemes that do not suffer from the non-convergence problems often associated with Bellman-Ford iterative schemes such as the interdomain routing protocol, BGP. For degree-d n-node graphs of treewidth t, our schemes use space O(t² d polylog(n)) bits per node; a trivial scheme uses O(n²) and routing trees use Ω(n) per node (these results have since been improved and extended; see [Courcelle, Twigg, Compact forbidden-set routing, 24th Symposium on Theoretical Aspects of Computer Science, Aachen 2007]). We also show how to do forbidden-set routing on planar graphs between nodes whose distance is less than a parameter l. We prove a lower bound on the space requirements of forbidden-set routing for general graphs, and show that the problem is related to constructing an efficient distributed representation of all the separators of an undirected graph. Finally, we consider routing while taking into account path costs of intermediate nodes and show that this requires large routing labels. We also study a novel way of approximating forbidden-set routing using quotient graphs of low treewidth.
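As a point of reference for the problem statement (not the compact scheme of the thesis), forbidden-set routing can be phrased as ordinary path-finding in the graph with the forbidden nodes deleted; the whole difficulty is answering such queries with only small per-node tables. A naive, non-compact baseline:

```python
# Naive forbidden-set routing baseline: BFS over the graph with the
# client-specified forbidden nodes removed. Storing the whole graph at every
# node costs O(n^2) bits; the thesis's contribution is achieving the same
# queries with O(t^2 d polylog(n)) bits per node for treewidth-t graphs.
from collections import deque

def forbidden_set_route(adj, s, t, forbidden):
    """Return a path from s to t avoiding `forbidden`, or None."""
    if s in forbidden or t in forbidden:
        return None
    prev = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in prev and v not in forbidden:
                prev[v] = u
                q.append(v)
    return None
```

The lower-bound result above says, roughly, that any scheme answering these queries compactly must encode information about all separators of the graph.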

UCAM-CL-TR-679

Karen Sparck Jones:

Automatic summarising: a review and discussion of the state of the art
January 2007, 67 pages, PDF

Abstract: This paper reviews research on automatic summarising over the last decade. This period has seen a rapid growth of work in the area, stimulated by technology and by several system evaluation programmes.

The review makes use of several frameworks to organise the material: for summarising, for systems, for the task factors affecting summarising, and for evaluation design and practice.

The review considers the evaluation strategies that have been applied to summarising and the issues they raise, and the major summary evaluation programmes. It examines the input, purpose and output factors that have been investigated in summarising research in the last decade, and discusses the classes of strategy, both extractive and non-extractive, that have been explored, illustrating the range of systems that have been built. This analysis of strategies is amplified by accounts of specific exemplar systems.

The conclusions drawn from the review are that automatic summarisation research has made valuable progress in the last decade, with some practically useful approaches, better evaluation, and more understanding of the task. However, as the review also makes clear, summarising systems are often poorly motivated in relation to the factors affecting summaries, and evaluation needs to be taken significantly further so as to engage with the purposes for which summaries are intended and the contexts in which they are used.

A reduced version of this report, entitled ‘Automatic summarising: the state of the art’, will appear in Information Processing and Management, 2007.

UCAM-CL-TR-680

Jing Su, James Scott, Pan Hui, Eben Upton,Meng How Lim, Christophe Diot,Jon Crowcroft, Ashvin Goel, Eyal de Lara:

Haggle: Clean-slate networking for mobile devices
January 2007, 30 pages, PDF

Abstract: Haggle is a layerless networking architecture for mobile devices. It is motivated by the infrastructure dependence of applications such as email and web browsing, even in situations where infrastructure is not necessary to accomplish the end user goal, e.g. when the destination is reachable by ad hoc neighbourhood communication. In this paper we present details of Haggle’s architecture, and of the prototype implementation which allows existing email and web applications to become infrastructure-independent, as we show with an experimental evaluation.

UCAM-CL-TR-681

Piotr Zielinski:

Indirect channels: a bandwidth-saving technique for fault-tolerant protocols
April 2007, 24 pages, PDF

Abstract: Sending large messages known to the recipient is a waste of bandwidth. Nevertheless, many fault-tolerant agreement protocols send the same large message between each pair of participating processes. This practical problem has recently been addressed in the context of Atomic Broadcast by presenting a specialized algorithm.

This paper proposes a more general solution by providing virtual indirect channels that physically transmit message ids instead of full messages if possible. Indirect channels are transparent to the application; they can be used with any distributed algorithm, even with unreliable channels or malicious participants. At the same time, they provide rigorous theoretical properties.
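The core mechanism can be sketched as follows (a minimal illustration with invented names; the paper's channels additionally handle unreliable links and malicious participants, which this sketch ignores):

```python
# Indirect-channel sketch: a sender transmits the full message only once per
# id; afterwards it sends just the id, and the receiver reassembles the
# message from its local store.
class IndirectChannel:
    def __init__(self):
        self.known_by_peer = set()   # ids we believe the peer has stored
        self.store = {}              # id -> full message (receiver side)

    def encode(self, msg_id, payload):
        """Sender side: full message the first time, bare id thereafter."""
        if msg_id in self.known_by_peer:
            return ("id", msg_id)
        self.known_by_peer.add(msg_id)
        return ("full", msg_id, payload)

    def decode(self, frame):
        """Receiver side: resolve an id against the local message store."""
        if frame[0] == "full":
            _, msg_id, payload = frame
            self.store[msg_id] = payload
            return payload
        return self.store[frame[1]]  # conservative: id only sent when known
```

The "conservative" property from the abstract corresponds to `encode` never emitting a bare id unless the full message has already been transmitted.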

Indirect channels are conservative: they do not allow manipulating message ids if the full messages are not known. This paper also investigates the consequences of relaxing this assumption on the latency and correctness of Consensus and Atomic Broadcast implementations: new algorithms and lower bounds are shown.

UCAM-CL-TR-682

Juliano Iyoda:

Translating HOL functions to hardware
April 2007, 89 pages, PDF
PhD thesis (Hughes Hall, October 2006)

Abstract: Delivering error-free products is still a major challenge for hardware and software engineers. Due to the ever-growing complexity of computing systems, there is a demand for higher levels of automation in formal verification.

This dissertation proposes an approach to generate formally verified circuits automatically. The main outcome of our project is a compiler, implemented on top of the theorem prover HOL4, which translates a subset of higher-order logic to circuits. The subset of the logic is a first-order tail-recursive functional language. The compiler takes a function f as argument and automatically produces the theorem “⊢ C implements f”, where C is a circuit and “implements” is a correctness relation between a circuit and a function. We achieve full mechanisation of proofs by defining theorems which are composable. The correctness of a circuit can be mechanically determined from the correctness of its subcircuits. This technology allows the designer to focus on higher levels of abstraction instead of reasoning about and verifying systems at the gate level.

A pretty-printer translates netlists described in higher-order logic to structural Verilog. Our compiler is integrated with Altera tools to run our circuits in FPGAs. Thus the theorem prover is used as an environment for supporting the development process from formal specification to implementation.

Our approach has been tested with fairly substantial case studies. We describe the design and the verification of a multiplier and a simple microcomputer, which has shown us that the compiler supports small and medium-sized applications. Although this approach does not yet scale to industrial-sized applications, it is a first step towards the implementation of a new technology that can raise the level of mechanisation in formal verification.

UCAM-CL-TR-683

Martin Kleppmann:

Simulation of colliding constrained rigid bodies
April 2007, 65 pages, PDF

Abstract: I describe the development of a program to simulate the dynamic behaviour of interacting rigid bodies. Such a simulation may be used to generate animations of articulated characters in 3D graphics applications. Bodies may have an arbitrary shape, defined by a triangle mesh, and may be connected with a variety of different joints. Joints are represented by constraint functions which are solved at run-time using Lagrange multipliers. The simulation performs collision detection and prevents penetration of rigid bodies by applying impulses to colliding bodies and reaction forces to bodies in resting contact.

The simulation is shown to be physically accurate and is tested on several different scenes, including one of an articulated human character falling down a flight of stairs.

An appendix describes how to derive arbitrary constraint functions for the Lagrange multiplier method. Collisions and joints are both represented as constraints, which allows them to be handled with a unified algorithm. The report also includes some results relating to the use of quaternions in dynamic simulations.
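The Lagrange multiplier idea can be illustrated on the simplest case, a point mass constrained to a circle of fixed radius (a worked toy example, not the report's general method): for C(x) = x·x − L², requiring d²C/dt² = 2v·v + 2x·a = 0 fixes the multiplier λ in a = (F + λ∇C)/m, where ∇C = 2x.

```python
# Circle-constraint sketch: solve for the Lagrange multiplier lam such that
# the constrained acceleration keeps C(x) = x.x - L**2 at zero.
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def constraint_lambda(x, v, force, mass):
    """From x.a = -v.v with a = (force + 2*lam*x)/mass, solve for lam."""
    return (-mass * dot(v, v) - dot(x, force)) / (2 * dot(x, x))

def constrained_accel(x, v, force, mass):
    """Acceleration including the constraint force lam * gradC, gradC = 2x."""
    lam = constraint_lambda(x, v, force, mass)
    return tuple((force[i] + 2 * lam * x[i]) / mass for i in range(2))
```

For a unit-mass pendulum bob at x = (1, 0) with v = (0, 1) under gravity (0, −9.8), this yields a = (−1, −9.8): the radial −1 is exactly the centripetal term −|v|²/r, while gravity acts tangentially.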

UCAM-CL-TR-684

Pan Hui, Jon Crowcroft:

Bubble Rap: Forwarding in small world DTNs in ever decreasing circles
May 2007, 44 pages, PDF

Abstract: In this paper we seek to improve understanding of the structure of human mobility, and to use this in the design of forwarding algorithms for Delay Tolerant Networks for the dissemination of data amongst mobile users.

Cooperation binds but also divides human society into communities. Members of the same community interact with each other preferentially. There is structure in human society. Within society and its communities, individuals have varying popularity. Some people are more popular and interact with more people than others; we may call them hubs. Popularity ranking is one facet of the population. In many physical networks, some nodes are more highly connected to each other than to the rest of the network. The sets of such nodes are usually called clusters, communities, cohesive groups or modules. There is structure to social networking. Different metrics can be used, such as information flow, Freeman betweenness, closeness and inference power, but for all of them, each node in the network can be assigned a global centrality value.

What can be inferred about individual popularity, and the structure of human society, from measurements within a network? How can the local and global characteristics of the network be used practically for information dissemination? We present and evaluate a sequence of designs for forwarding algorithms for Pocket Switched Networks, culminating in Bubble, which exploit increasing levels of information about mobility and interaction.

UCAM-CL-TR-685

John Daugman, Cathryn Downing:

Effect of severe image compression on iris recognition performance
May 2007, 20 pages, PDF

Abstract: We investigate three schemes for severe compression of iris images, in order to assess what their impact would be on the recognition performance of the algorithms deployed today for identifying persons by this biometric feature. Currently, standard iris images are 600 times larger than the IrisCode templates computed from them for database storage and search; but it is administratively desired that iris data should be stored, transmitted, and embedded in media in the form of images rather than as templates computed with proprietary algorithms. To reconcile that goal with its implications for bandwidth and storage, we present schemes that combine region-of-interest isolation with JPEG and JPEG2000 compression at severe levels, and we test them using a publicly available government database of iris images. We show that it is possible to compress iris images to as little as 2 KB with minimal impact on recognition performance. Only some 2% to 3% of the bits in the IrisCode templates are changed by such severe image compression. Standard performance metrics such as error trade-off curves document very good recognition performance despite this reduction in data size by a net factor of 150, approaching a convergence of image data size and template size.

UCAM-CL-TR-686

Andrew C. Rice:

Dependable systems for Sentient Computing
May 2007, 150 pages, PDF

Abstract: Computers and electronic devices are continuing to proliferate throughout our lives. Sentient Computing systems aim to reduce the time and effort required to interact with these devices by composing them into systems which fade into the background of the user’s perception. Failures are a significant problem in this scenario because their occurrence will pull the system into the foreground as the user attempts to discover and understand the fault. However, attempting to exist and interact with users in a real, unpredictable, physical environment, rather than a well-constrained virtual environment, makes failures inevitable.

This dissertation describes a study of dependability. A dependable system permits applications to discover the extent of failures and to adapt accordingly, such that their continued behaviour is intuitive to users of the system.

Cantag, a reliable marker-based machine-vision system, has been developed to aid the investigation of dependability. The description of Cantag includes specific contributions for marker tracking, such as rotationally invariant coding schemes and reliable back-projection for circular tags. An analysis of Cantag’s theoretical performance is presented and compared to its real-world behaviour. This analysis is used to develop optimised tag designs and performance metrics. The use of validation is proposed to permit runtime calculation of observable metrics and verification of system components. Formal proof methods are combined with a logical validation framework to show the validity of performance optimisations.

UCAM-CL-TR-687

Viktor Vafeiadis, Matthew Parkinson:

A marriage of rely/guarantee and separation logic
June 2007, 31 pages, PDF

Abstract: In the quest for tractable methods for reasoning about concurrent algorithms, both rely/guarantee logic and separation logic have made great advances. They both seek to tame, or control, the complexity of concurrent interactions, but neither is the ultimate approach. Rely/guarantee copes naturally with interference, but its specifications are complex because they describe the entire state. Conversely, separation logic has difficulty dealing with interference, but its specifications are simpler because they describe only the relevant state that the program accesses.

We propose a combined system which marries the two approaches. We can describe interference naturally (using a relation as in rely/guarantee), and where there is no interference, we can reason locally (as in separation logic). We demonstrate the advantages of the combined approach by verifying a lock-coupling list algorithm, which actually disposes/frees removed nodes.

UCAM-CL-TR-688

Sam Staton:

Name-passing process calculi: operational models and structural operational semantics
June 2007, 245 pages, PDF
PhD thesis (Girton College, December 2006)

Abstract: This thesis is about the formal semantics of name-passing process calculi. We study operational models by relating various different notions of model, and we analyse structural operational semantics by extracting a congruence rule format from a model theory. All aspects of structural operational semantics are addressed: behaviour, syntax, and rule-based inductive definitions.

A variety of models for name-passing behaviour are considered and developed. We relate classes of indexed labelled transition systems, proposed by Cattani and Sewell, with coalgebraic models proposed by Fiore and Turi. A general notion of structured coalgebra is introduced and developed, and a natural notion of structured bisimulation is related to Sangiorgi’s open bisimulation for the π-calculus. At first the state spaces are organised as presheaves, but it is reasonable to constrain the models to sheaves in a category known as the Schanuel topos. This sheaf topos is exhibited as equivalent to a category of named-sets proposed by Montanari and Pistore for efficient verification of name-passing systems.

Syntax for name-passing calculi involves variable binding and substitution. Gabbay and Pitts proposed nominal sets as an elegant model for syntax with binding, and we develop a framework for substitution in this context. The category of nominal sets is equivalent to the Schanuel topos, and so syntax and behaviour can be studied within one universe.

An abstract account of structural operational semantics was developed by Turi and Plotkin. They explained the inductive specification of a system by rules in the GSOS format of Bloom et al., in terms of initial algebra recursion for lifting a monad of syntax to a category of behaviour. The congruence properties of bisimilarity can be observed at this level of generality. We study this theory in the general setting of structured coalgebras, and then for the specific case of name-passing systems, based on categories of nominal sets.

At the abstract level of category theory, classes of rules are understood as natural transformations. In the concrete domain, though, rules for name-passing systems are formulae in a suitable logical framework. By imposing a format on rules in Pitts’s nominal logic, we characterise a subclass of rules in the abstract domain. Translating the abstract results, we conclude that, for a name-passing process calculus defined by rules in this format, a variant of open bisimilarity is a congruence.

UCAM-CL-TR-689

Ursula H. Augsdorfer, Neil A. Dodgson, Malcolm A. Sabin:

Removing polar rendering artifacts in subdivision surfaces
June 2007, 7 pages, PDF

Abstract: There is a belief that subdivision schemes require the subdominant eigenvalue, λ, to be the same around extraordinary vertices as in the regular regions of the mesh. This belief is owing to the polar rendering artifacts which occur around extraordinary points when λ is significantly larger than in the regular regions. By constraining the tuning of subdivision schemes to solutions which fulfil this condition, we may prevent ourselves from finding the optimal limit surface. We show that the perceived problem is purely a rendering artifact and that it does not reflect the quality of the underlying limit surface. Using the bounded curvature Catmull-Clark scheme as an example, we describe three practical methods by which this rendering artifact can be removed, thereby allowing us to tune subdivision schemes using any appropriate values of λ.

UCAM-CL-TR-690

Russell Glen Ross:

Cluster storage for commodity computation
June 2007, 178 pages, PDF
PhD thesis (Wolfson College, December 2006)

Abstract: Standards in the computer industry have made basic components and entire architectures into commodities, and commodity hardware is increasingly being used for the heavy lifting formerly reserved for specialised platforms. Now software and services are following. Modern updates to virtualization technology make it practical to subdivide commodity servers and manage groups of heterogeneous services using commodity operating systems and tools, so services can be packaged and managed independently of the hardware on which they run. Computation as a commodity is soon to follow, moving beyond the specialised applications typical of today’s utility computing.

In this dissertation, I argue for the adoption of service clusters (clusters of commodity machines under central control, but running services in virtual machines for arbitrary, untrusted clients) as the basic building block for an economy of flexible commodity computation. I outline the requirements this platform imposes on its storage system and argue that they are necessary for service clusters to be practical, but are not found in existing systems.

Next I introduce Envoy, a distributed file system for service clusters. In addition to meeting the needs of a new environment, Envoy introduces a novel file distribution scheme that organises metadata and cache management according to runtime demand. In effect, the file system is partitioned and control of each part given to the client that uses it the most; that client in turn acts as a server with caching for other clients that require concurrent access. Scalability is limited only by runtime contention, and clients share a perfectly consistent cache distributed across the cluster. As usage patterns change, the partition boundaries are updated dynamically, with urgent changes made quickly and more minor optimisations made over a longer period of time.

Experiments with the Envoy prototype demonstrate that service clusters can support cheap and rapid deployment of services, from isolated instances to groups of cooperating components with shared storage demands.

UCAM-CL-TR-691

Neil A. Dodgson, Malcolm A. Sabin, Richard Southern:

Preconditions on geometrically sensitive subdivision schemes
August 2007, 13 pages, PDF

Abstract: Our objective is to create subdivision schemes whose limit surfaces are surfaces useful in engineering (spheres, cylinders, cones, etc.) without resorting to special cases. The basic idea, explored by us previously in the curve case, is that if the property that all vertices lie on an object of the required class can be preserved through the subdivision refinement, it will be preserved into the limit surface also. The next obvious step was to try a bivariate example. We therefore identified the simplest possible scheme and implemented it. However, this misbehaved quite dramatically. This report, by doing the limit analysis, identifies why the misbehaviour occurred, and draws conclusions about how the problems should be avoided.

UCAM-CL-TR-692

Alan F. Blackwell:

Toward an undergraduate programme in Interdisciplinary Design
July 2007, 13 pages, PDF

Abstract: This technical report describes an experimental syllabus proposal that was developed for the Cambridge Computer Science Tripos (the standard undergraduate degree programme in Computer Science at Cambridge). The motivation for the proposal was to create an innovative research-oriented taught course that would be compatible with the broader policy goals of the Crucible network for research in interdisciplinary design. As the course is not proceeding, the syllabus is published here for use by educators and educational researchers with interests in design teaching.

UCAM-CL-TR-693

Piotr Zielinski:

Automatic classification of eventual failure detectors
July 2007, 21 pages, PDF

Abstract: Eventual failure detectors, such as Ω or ♦P, can make arbitrarily many mistakes before they start providing correct information. This paper shows that any detector implementable in a purely asynchronous system can be implemented as a function of only the order of most-recently heard-from processes. The finiteness of this representation means that eventual failure detectors can be enumerated and their relative strengths tested automatically. The results for systems with two and three processes are presented.

Implementability can also be modelled as a game between Prover and Disprover. This approach not only speeds up automatic implementability testing, but also results in shorter and more intuitive proofs. I use this technique to identify the new weakest failure detector anti-Ω and prove its properties. Anti-Ω outputs process ids and, while not necessarily stabilizing, it ensures that some correct process is eventually never output.

UCAM-CL-TR-694

Piotr Zielinski:

Anti-Ω: the weakest failure detector for set agreement
July 2007, 24 pages, PDF

Abstract: In the set agreement problem, n processes have to decide on at most n−1 of the proposed values. This paper shows that the anti-Ω failure detector is both sufficient and necessary to implement set agreement in an asynchronous shared-memory system equipped with registers. Each query to anti-Ω returns a single process id; the specification ensures that there is a correct process whose id is returned only finitely many times.
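The anti-Ω specification is easy to state operationally. As an illustration (the finite modelling here is mine, not the paper's), an eventually-periodic output trace of query results can be represented as a finite prefix plus an infinitely repeated cycle, and checked against the specification:

```python
# Hypothetical sketch: model an eventually-periodic anti-Omega output trace
# as (prefix, cycle). The specification holds iff some correct process id
# is returned only finitely often, i.e. it never appears in the cycle.

def satisfies_anti_omega(prefix, cycle, correct):
    """True iff some correct process id occurs only in the finite prefix."""
    return any(p not in cycle for p in correct)

# Process 2 is correct and is output only during the prefix:
assert satisfies_anti_omega([1, 2, 2, 3], [1, 3], correct={2, 3})
# Every correct process recurs forever, so the specification fails:
assert not satisfies_anti_omega([1, 2], [2, 3], correct={2, 3})
```

Note that, unlike Ω, the detector need not converge on a single id; it only promises that some correct process eventually stops being output.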


UCAM-CL-TR-695

Karen Su, Inaki Berenguer, Ian J. Wassell, Xiaodong Wang:

Efficient maximum-likelihood decoding of spherical lattice codes
July 2007, 29 pages, PDF

Abstract: A new framework for efficient and exact Maximum-Likelihood (ML) decoding of spherical lattice codes is developed. It employs a double-tree structure: the first is that which underlies established tree-search decoders; the second plays the crucial role of guiding the primary search by specifying admissible candidates and is our focus in this report. Lattice codes have long been of interest due to their rich structure, leading to numerous decoding algorithms for unbounded lattices, as well as those with axis-aligned rectangular shaping regions. Recently, spherical Lattice Space-Time (LAST) codes were proposed to realize the optimal diversity-multiplexing tradeoff of MIMO channels. We address the so-called boundary control problem arising from the spherical shaping region defining these codes. This problem is complicated because of the varying number of candidates potentially under consideration at each search stage; it is not obvious how to address it effectively within the frameworks of existing schemes. Our proposed strategy is compatible with all sequential tree-search detectors, as well as auxiliary processing such as the MMSE-GDFE and lattice reduction. We demonstrate the superior performance and complexity profiles achieved when applying the proposed boundary control in conjunction with two current efficient ML detectors and show an improvement of 1 dB over the state-of-the-art at a comparable complexity.

UCAM-CL-TR-696

Oliver J. Woodman:

An introduction to inertial navigation
August 2007, 37 pages, PDF

Abstract: Until recently the weight and size of inertial sensors have prohibited their use in domains such as human motion capture. Recent improvements in the performance of small and lightweight micromachined electromechanical systems (MEMS) inertial sensors have made the application of inertial techniques to such problems possible. This has resulted in an increased interest in the topic of inertial navigation; however, current introductions to the subject fail to sufficiently describe the error characteristics of inertial systems.

We introduce inertial navigation, focusing on strapdown systems based on MEMS devices. A combination of measurement and simulation is used to explore the error characteristics of such systems. For a simple inertial navigation system (INS) based on the Xsens Mtx inertial measurement unit (IMU), we show that the average error in position grows to over 150 m after 60 seconds of operation. The propagation of orientation errors caused by noise perturbing gyroscope signals is identified as the critical cause of such drift. By simulation we examine the significance of individual noise processes perturbing the gyroscope signals, identifying white noise as the process which contributes most to the overall drift of the system.

Sensor fusion and domain-specific constraints can be used to reduce drift in INSs. For an example INS we show that sensor fusion using magnetometers can reduce the average error in position obtained by the system after 60 seconds from over 150 m to around 5 m. We conclude that whilst MEMS IMU technology is rapidly improving, it is not yet possible to build a MEMS-based INS which gives sub-meter position accuracy for more than one minute of operation.
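The mechanism behind this drift can be sketched numerically: integrating white gyroscope noise produces a random-walk tilt error, the tilt misprojects gravity into the horizontal accelerometer channel, and double integration turns that acceleration error into rapidly growing position error. The model and parameter values below are illustrative assumptions of mine, not the report's simulator:

```python
# Illustrative single-axis drift model (not the report's simulation):
# white gyro noise -> random-walk tilt -> gravity misprojection -> position.

import math
import random

def simulate_drift(seconds, noise_sigma_rad, dt=0.01, seed=0):
    """Return absolute position error (m) after integrating gyro noise."""
    rng = random.Random(seed)
    g = 9.81                    # gravitational acceleration, m/s^2
    tilt = vel = pos = 0.0
    for _ in range(int(seconds / dt)):
        # White noise integrates into a random walk on orientation.
        tilt += rng.gauss(0.0, noise_sigma_rad) * math.sqrt(dt)
        # Tilt error leaks a component of gravity into the horizontal axis.
        vel += g * math.sin(tilt) * dt
        pos += vel * dt
    return abs(pos)

drift_60s = simulate_drift(60, noise_sigma_rad=1e-3)
assert drift_60s > 0.0   # some drift always accumulates
```

Because the position error is gravity doubly integrated through a random-walk tilt, it grows super-linearly with time, which is consistent with the large 60-second errors the report measures.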

UCAM-CL-TR-697

Chris J. Purcell:

Scaling Mount Concurrency: scalability and progress in concurrent algorithms
August 2007, 155 pages, PDF
PhD thesis (Trinity College, July 2007)

Abstract: As processor speeds plateau, chip manufacturers are turning to multi-processor and multi-core designs to increase performance. As the number of simultaneous threads grows, Amdahl's Law means the performance of programs becomes limited by the cost that does not scale: communication, via the memory subsystem. Algorithm design is critical in minimizing these costs.

In this dissertation, I first show that existing instruction set architectures must be extended to allow general scalable algorithms to be built. Since it is impractical to entirely abandon existing hardware, I then present a reasonably scalable implementation of a map built on the widely-available compare-and-swap primitive, which outperforms existing algorithms for a range of usages.
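The compare-and-swap (CAS) retry loop that such lock-free maps are built from can be sketched in a few lines. The single-threaded model below illustrates only the primitive's semantics; it is not the dissertation's algorithm (which avoids naively copying the whole map):

```python
# A minimal model of the compare-and-swap retry pattern underlying
# lock-free data structures. Names here are illustrative, not an API
# from the dissertation.

class Cell:
    def __init__(self, value):
        self.value = value

    def compare_and_swap(self, expected, new):
        """Atomically replace value iff it is still `expected`."""
        if self.value is expected:
            self.value = new
            return True
        return False

def insert(cell, key, val):
    # Classic lock-free update: snapshot, build the new version off to
    # the side, then CAS it in; retry if another thread won the race.
    while True:
        snapshot = cell.value
        updated = dict(snapshot)          # copy-on-write map version
        updated[key] = val
        if cell.compare_and_swap(snapshot, updated):
            return

m = Cell({})
insert(m, "a", 1)
insert(m, "b", 2)
assert m.value == {"a": 1, "b": 2}
```

On real hardware the CAS is a single atomic instruction, and the scalability question the dissertation studies is how contention on such cells behaves as thread counts grow.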

Thirdly, I introduce a new primitive operation, and show that it provides efficient and scalable solutions to several problems before proving that it satisfies strong theoretical properties. Finally, I outline possible hardware implementations of the primitive with different properties and costs, and present results from a hardware evaluation, demonstrating that the new primitive can provide good practical performance.


UCAM-CL-TR-698

Simon J. Hollis:

Pulse-based, on-chip interconnect
September 2007, 186 pages, PDF
PhD thesis (Queens' College, June 2007)

Abstract: This thesis describes the development of an on-chip point-to-point link, with particular emphasis on the reduction of its global metal area footprint.

To reduce its metal footprint, the interconnect uses a serial transmission approach. 8-bit data is sent using just two wires, through a pulse-based technique, inspired by the GasP interconnect from Sun Microsystems. Data and control signals are transmitted bi-directionally on a wire using this double-edged, pulse-based signalling protocol, and formatted using a variant of dual-rail encoding. These choices enable a reduction in the number of wires needed, an improvement in the acknowledgement overhead of the asynchronous protocol, and the ability to cross clock domains without synchronisation hazards.

New, stateful repeaters are demonstrated, and results from SPICE simulations of the system show that data can be transferred at over 1 Gbit/s over 1 mm of minimum-sized, minimally-spaced metal 5 wiring, on a 180 nm (0.18 µm) technology. This reduces to only 926 Mbit/s when 10 mm of wiring is considered, and represents a channel utilisation of a very attractive 45% of theoretical capacity at this length. Analyses of latency, energy consumption, and area use are also provided.

The point-to-point link is then expanded with the invention and demonstration of a router and an arbitrated merge element, to produce a Network-on-Chip (NoC) design, called RasP. The full system is then evaluated, and peak throughput is shown to be 763 Mbit/s for 1 mm of wiring, reducing to 599 Mbit/s for 10 mm of the narrow metal 5 interconnect.

Finally, RasP is compared in performance with the Chain interconnect from the University of Manchester. Results for the metrics of throughput, latency, energy consumption and area footprint show that the two systems perform very similarly: the maximum absolute deviation is under 25% for throughput, latency and area, and the energy-efficiency of RasP is approximately twice that of Chain. Between the two systems, RasP has the smaller latency, energy and area requirements and is shown to be a viable alternative NoC design.

UCAM-CL-TR-699

Richard Southern, Neil A. Dodgson:

A smooth manifold based construction of approximating lofted surfaces
October 2007, 17 pages, PDF

Abstract: We present a new method for constructing a smooth manifold approximating a curve network or control mesh. In our two-step method, smooth vertex patches are initially defined by extrapolating and then blending a univariate or bivariate surface representation. Each face is then constructed by blending together the segments of each vertex patch corresponding to the face corners. By approximating the input curve network, rather than strictly interpolating it, we have greater flexibility in controlling surface behaviour and have local control. Additionally, no initial control mesh fitting or fairing needs to be performed, and no derivative information is needed to ensure continuity at patch boundaries.

UCAM-CL-TR-700

Maja Vukovic:

Context aware service composition
October 2007, 225 pages, PDF
PhD thesis (Newnham College, April 2006)

Abstract: Context aware applications respond and adapt to changes in the computing environment. For example, they may react when the location of the user or the capabilities of the device used change. Despite the increasing importance and popularity of such applications, advances in application models to support their development have not kept up. Legacy application design models, which embed contextual dependencies in the form of if-then rules specifying how applications should react to context changes, are still widely used. Such models are impractical for accommodating the large variety of possible, even unanticipated, context types and their values.

This dissertation proposes a new application model for building context aware applications, considering them as dynamically composed sequences of calls to services: software components that perform well-defined computational operations and export open interfaces through which they can be invoked. This work employs goal-oriented inferencing from planning technologies for selecting the services and assembling the sequence of their execution, allowing different compositions to result from different context parameters such as resources available, time constraints, and user location. Contextual changes during the execution of the services may trigger further re-composition, causing the application to evolve dynamically.

An important challenge in providing a context aware service composition facility is dealing with failures that may occur, for instance as a result of context changes or missing service descriptions. To handle composition failures, this dissertation introduces GoalMorph, a system which transforms failed composition requests into alternative ones that can be solved.

This dissertation describes the design and implementation of the proposed framework for context aware service composition. Experimental evaluation of a realistic infotainment application demonstrates that the framework provides an efficient and scalable solution. Furthermore, it shows that GoalMorph transforms goals successfully, increasing the utility of achieved goals without imposing a prohibitive composition time overhead.

By developing the proposed framework for fault-tolerant, context aware service composition, this work ultimately lowers the barrier for building extensible applications that automatically adapt to the user's context. This represents a step towards a new paradigm for developing adaptive software to accommodate the increasing dynamicity of computing environments.

UCAM-CL-TR-701

Jacques Jean-Alain Fournier:

Vector microprocessors for cryptography
October 2007, 174 pages, PDF
PhD thesis (Trinity Hall, April 2007)

Abstract: Embedded security devices like ‘Trusted Platforms’ require both scalability (of power, performance and area) and flexibility (of software and countermeasures). This thesis illustrates how data parallel techniques can be used to implement scalable architectures for cryptography. Vector processing is used to provide high performance, power efficient and scalable processors. A programmable vector 4-stage pipelined co-processor, controlled by a scalar MIPS compatible processor, is described. The instruction set of the co-processor is defined for cryptographic algorithms like AES and Montgomery modular multiplication for RSA and ECC. The instructions are assessed using an instruction set simulator based on the ArchC tool. This instruction set simulator is used to see the impact of varying the vector register depth (p) and the number of vector processing units (r). Simulations indicate that for vector versions of AES, RSA and ECC the performance improves in O(log(r)). A cycle-accurate synthesisable Verilog model of the system (VeMICry) is implemented in TSMC's 90 nm technology and used to show that the best area/power/performance tradeoff is reached for r = (p/4). Also, this highly scalable design allows area/power/performance trade-offs to be made for a panorama of applications ranging from smartcards to servers. This thesis is, to my best knowledge, the first attempt to implement embedded cryptography using vector processing techniques.
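Montgomery modular multiplication, one of the target operations above, can be sketched in scalar form. This is the textbook REDC formulation, assuming an odd modulus n and R = 2^r_bits; it is not the thesis's vectorised instruction sequence:

```python
# Scalar sketch of Montgomery multiplication (textbook REDC), the core
# modular operation for RSA/ECC that the co-processor vectorises.

def mont_mul(a, b, n, r_bits):
    """Return a*b*R^-1 mod n, where R = 2**r_bits and n is odd."""
    R = 1 << r_bits
    n_prime = -pow(n, -1, R) % R       # n * n_prime == -1 (mod R)
    t = a * b
    m = (t * n_prime) % R              # chosen so t + m*n is divisible by R
    u = (t + m * n) >> r_bits          # exact division by R
    return u - n if u >= n else u      # single conditional subtraction

# Round-trip check against plain modular arithmetic: converting into and
# out of the Montgomery domain recovers a*b mod n.
n, r_bits = 97, 8
R = 1 << r_bits
a, b = 55, 61
a_bar, b_bar = (a * R) % n, (b * R) % n    # Montgomery representations
prod_bar = mont_mul(a_bar, b_bar, n, r_bits)
assert mont_mul(prod_bar, 1, n, r_bits) == (a * b) % n
```

The appeal for hardware is that REDC replaces a trial division by the modulus with shifts and multiplications modulo a power of two, which map naturally onto the word-sliced vector datapath the thesis describes.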

UCAM-CL-TR-702

Alisdair Wren:

Relationships for object-oriented programming languages
November 2007, 153 pages, PDF
PhD thesis (Sidney Sussex College, March 2007)

Abstract: Object-oriented approaches to software design and implementation have gained enormous popularity over the past two decades. However, whilst models of software systems routinely allow software engineers to express relationships between objects, object-oriented programming languages lack this ability. Instead, relationships must be encoded using complex reference structures. When the model cannot be expressed directly in code, it becomes more difficult for programmers to see a correspondence between design and implementation: the model no longer faithfully documents the code. As a result, programmer intuition is lost, and error becomes more likely, particularly during maintenance of an unfamiliar software system.

This thesis explores extensions to object-oriented languages so that relationships may be expressed with the same ease as objects. Two languages with relationships are specified: RelJ, which offers relationships in a class-based language based on Java, and QSigma, which is an object calculus with heap query.

In RelJ, relationship declarations exist at the same level as class declarations: relationships are named, they may have fields and methods, they may inherit from one another and their instances may be referenced just like objects. Moving into the object-based world, QSigma is based on the sigma-calculi of Abadi and Cardelli, extended with the ability to query the heap. Heap query allows objects to determine how they are referenced by other objects, such that single references are sufficient for establishing an inter-object relationship observable by all participants. Both RelJ and QSigma are equipped with a formal type system and semantics to ensure type safety in the presence of these extensions.

By giving formal models of relationships in both class- and object-based settings, we can obtain general principles for relationships in programming languages and, therefore, establish a correspondence between implementation and design.

UCAM-CL-TR-703

Jon Crowcroft, Tim Deegan, Christian Kreibich, Richard Mortier, Nicholas Weaver:

Lazy Susan: dumb waiting as proof of work
November 2007, 23 pages, PDF

Abstract: The open nature of Internet services has been of great value to users, enabling dramatic innovation and evolution of services. However, this openness permits many abuses of open-access Internet services such as web, email, and DNS. To counteract such abuses, a number of so-called proof-of-work schemes have been proposed. They aim to prevent or limit such abuses by demanding potential clients of the service to prove that they have carried out some amount of work before they will be served. In this paper we show that existing resource-based schemes have several problems, and instead propose latency-based proof-of-work as a solution. We describe centralised and distributed variants, introducing the problem class of non-parallelisable shared secrets in the process. We also discuss application of this technique at the network layer as a way to prevent Internet distributed denial-of-service attacks.
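For context, the resource-based schemes this paper argues against typically follow the hashcash pattern: the client burns CPU searching for a nonce whose hash has enough leading zero bits, and the server verifies with a single hash. The sketch below is illustrative of that critiqued class only (names and the difficulty parameter are my assumptions), not of the latency-based scheme the paper proposes:

```python
# Minimal hashcash-style, resource-based proof-of-work sketch.
# (Lazy Susan instead charges clients in elapsed time, which a
# CPU-bound puzzle like this cannot guarantee: faster or parallel
# hardware solves it disproportionately quickly.)

import hashlib

def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Cheap server-side check: do the top difficulty_bits hash bits vanish?"""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0

def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Expensive client-side search: about 2**difficulty_bits hashes."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce

nonce = solve(b"ticket-42", difficulty_bits=12)
assert verify(b"ticket-42", nonce, 12)
```

The asymmetry (expensive solve, one-hash verify) is what makes such puzzles attractive, and the dependence on raw compute is what the paper identifies as their weakness.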

UCAM-CL-TR-704

Paul William Hunter:

Complexity and infinite games on finite graphs
November 2007, 170 pages, PDF
PhD thesis (Hughes Hall, July 2007)

Abstract: This dissertation investigates the interplay between complexity, infinite games, and finite graphs. We present a general framework for considering two-player games on finite graphs which may have an infinite number of moves, and we consider the computational complexity of important related problems. Such games are becoming increasingly important in the field of theoretical computer science, particularly as a tool for formal verification of non-terminating systems. The framework introduced enables us to simultaneously consider problems on many types of games easily, and this is demonstrated by establishing previously unknown complexity bounds on several types of games.

We also present a general framework which uses infinite games to define notions of structural complexity for directed graphs. Many important graph parameters, from both a graph theoretic and algorithmic perspective, can be defined in this system. By considering natural generalizations of these games to directed graphs, we obtain a novel feature of digraph complexity: directed connectivity. We show that directed connectivity is an algorithmically important measure of complexity by showing that when it is limited, many intractable problems can be efficiently solved. Whether it is structurally an important measure is yet to be seen; however, this dissertation makes a preliminary investigation in this direction.

We conclude that infinite games on finite graphs play an important role in the area of complexity in theoretical computer science.

UCAM-CL-TR-705

Alan C. Lawrence:

Optimizing compilation with the Value State Dependence Graph
December 2007, 183 pages, PDF
PhD thesis (Churchill College, May 2007)

Abstract: Most modern compilers are based on variants of the Control Flow Graph. Developments on this representation, specifically SSA form and the Program Dependence Graph (PDG), have focused on adding and refining data dependence information, and these suggest the next step is to use a purely data-dependence-based representation such as the VDG (Ernst et al.) or VSDG (Johnson et al.).

This thesis studies such representations, identifying key differences in the information carried by the VSDG and several restricted forms of PDG, which relate to functional programming and continuations. We unify these representations in a new framework for specifying the sharing of resources across a computation.

We study the problems posed by using the VSDG, and argue that existing techniques have not solved the sequentialization problem of mapping VSDGs back to CFGs. We propose a new compiler architecture breaking sequentialization into several stages which focus on different characteristics of the input VSDG, and tend to be concerned with different properties of the output and target machine. The stages integrate a wide variety of important optimizations, exploit opportunities offered by the VSDG to address many common phase-order problems, and unify many operations previously considered distinct.

Focusing on branch-intensive code, we demonstrate how effective control flow, sometimes superior to that of the original source code and comparable to the best CFG optimization techniques, can be reconstructed from just the dataflow information comprising the VSDG. Further, a wide variety of more invasive optimizations involving the duplication and specialization of program elements are eased because the VSDG relaxes the CFG's overspecification of instruction and branch ordering. Specifically, we identify the optimization of nested branches as generalizing the problem of minimizing boolean expressions.

We conclude that it is now practical to discard the control flow information rather than maintain it in parallel, as is done in many previous approaches (e.g. the PDG).

UCAM-CL-TR-706

Steven J. Murdoch:

Covert channel vulnerabilities in anonymity systems
December 2007, 140 pages, PDF
PhD thesis (Girton College, August 2007)

Abstract: The spread of wide-scale Internet surveillance has spurred interest in anonymity systems that protect users' privacy by restricting unauthorised access to their identity. This requirement can be considered as a flow control policy in the well established field of multilevel secure systems. I apply previous research on covert channels (unintended means to communicate in violation of a security policy) to analyse several anonymity systems in an innovative way.

One application for anonymity systems is to prevent collusion in competitions. I show how covert channels may be exploited to violate these protections and construct defences against such attacks, drawing from previous covert channel research and collusion-resistant voting systems.

In the military context, for which multilevel secure systems were designed, covert channels are increasingly eliminated by physical separation of interconnected single-role computers. Prior work on the remaining network covert channels has been solely based on protocol specifications. I examine some protocol implementations and show how the use of several covert channels can be detected and how channels can be modified to resist detection.

I show how side channels (unintended information leakage) in anonymity networks may reveal the behaviour of users. While drawing on previous research on traffic analysis and covert channels, I avoid the traditional assumption of an omnipotent adversary. Rather, these attacks are feasible for an attacker with limited access to the network. The effectiveness of these techniques is demonstrated by experiments on a deployed anonymity network, Tor.

Finally, I introduce novel covert and side channels which exploit thermal effects. Changes in temperature can be remotely induced through CPU load and measured by their effects on crystal clock skew. Experiments show this to be an effective attack against Tor. This side channel may also be usable for geolocation and, as a covert channel, can cross supposedly infallible air-gap security boundaries.

This thesis demonstrates how theoretical models and generic methodologies relating to covert channels may be applied to find practical solutions to problems in real-world anonymity systems. These findings confirm the existing hypothesis that covert channel analysis, vulnerabilities and defences developed for multilevel secure systems apply equally well to anonymity systems.

UCAM-CL-TR-707

Ian Caulfield:

Complexity-effective superscalar embedded processors using instruction-level distributed processing
December 2007, 130 pages, PDF
PhD thesis (Queens' College, May 2007)

Abstract: Modern trends in mobile and embedded devices require ever increasing levels of performance, while maintaining low power consumption and silicon area usage. This thesis presents a new architecture for a high-performance embedded processor, based upon the instruction-level distributed processing (ILDP) methodology. A qualitative analysis of the complexity of an ILDP implementation as compared to both a typical scalar RISC CPU and a superscalar design is provided, which shows that the ILDP architecture eliminates or greatly reduces the size of a number of structures present in a superscalar architecture, allowing its complexity and power consumption to compare favourably with a simple scalar design.

The performance of an implementation of the ILDP architecture is compared to some typical processors used in high-performance embedded systems. The effect on performance of a number of the architectural parameters is analysed, showing that many of the parallel structures used within the processor can be scaled to provide less parallelism with little cost to the overall performance. In particular, the size of the register file can be greatly reduced with little average effect on performance: a size of 32 registers, with 16 visible in the instruction set, is shown to provide a good trade-off between area/power and performance.

Several novel developments to the ILDP architecture are then described and analysed. Firstly, a scheme to halve the number of processing elements and thus greatly reduce silicon area and power consumption is outlined, but proves to result in a 12–14% drop in performance. Secondly, a method to reduce the area and power requirements of the memory logic in the architecture is presented which can achieve similar performance to the original architecture with a large reduction in area and power requirements or, at an increased area/power cost, can improve performance by approximately 24%. Finally, a new organisation for the register file is proposed, which reduces the silicon area used by the register file by approximately three-quarters and allows even greater power savings, especially in the case where processing elements are power gated.

Overall, it is shown that the ILDP methodology is a viable approach for future embedded system design, and several new variants on the architecture are contributed. Several areas of useful future research are highlighted, especially with respect to compiler design for the ILDP paradigm.

UCAM-CL-TR-708

Chi-Kin Chau, Jon Crowcroft, Kang-Won Lee, Starsky H.Y. Wong:

IDRM: Inter-Domain Routing Protocol for Mobile Ad Hoc Networks
January 2008, 24 pages, PDF


Abstract: Inter-domain routing is an important component to allow interoperation among heterogeneous network domains operated by different organizations. Although inter-domain routing has been extensively studied in the Internet, it remains relatively unexplored in the Mobile Ad Hoc Networks (MANETs) space. In MANETs, the inter-domain routing problem is challenged by: (1) dynamic network topology, and (2) diverse intra-domain ad hoc routing protocols. In this paper, we propose a networking protocol called IDRM (Inter-Domain Routing Protocol for MANETs) to enable interoperation among MANETs. IDRM can handle the dynamic nature of MANETs and support policy-based routing similarly to BGP. We first discuss the design challenges for inter-domain routing in MANETs, and then present the design of IDRM with illustrative examples. Finally, we present a simulation-based study to understand the operational effectiveness of inter-domain routing and show that the overhead of IDRM is moderate.

UCAM-CL-TR-709

Ford Long Wong:

Protocols and technologies for security in pervasive computing and communications
January 2008, 167 pages, PDF
PhD thesis (Girton College, August 2007)

Abstract: As the state-of-the-art edges towards Mark Weiser's vision of ubiquitous computing (ubicomp), we found that we have to revise some previous assumptions about security engineering for this domain. Ubicomp devices have to be networked together to be able to realize their promise. To communicate securely amongst themselves, they have to establish secret session keys, but this is a difficult problem when this is done primarily over radio in an ad-hoc scenario, i.e. without the aid of an infrastructure (such as a PKI), and when it is assumed that the devices are resource-constrained and cannot perform complex calculations. Secondly, when ubicomp devices are carried by users as personal items, their permanent identifiers inadvertently allow the users to be tracked, to the detriment of user privacy. Unless there are deliberate improvements in designing for location privacy, ubicomp devices can be trivially detected, and linked to individual users, with discomfiting echoes of a surveillance society. Our findings and contributions are thus as follows. In considering session key establishment, we learnt that asymmetric cryptography is not axiomatically infeasible, and may in fact be essential, to counter possible attackers, for some of the more computationally capable (and important) devices. We next found existing attacker models to be inadequate, along with existing models of bootstrapping security associations, in ubicomp. We address the inadequacies with a contribution which we call ‘multi-channel security protocols’, by leveraging multiple channels, with different properties, existing in the said environment. We gained an appreciation of the fact that location privacy is really a multi-layer problem, particularly so in ubicomp, where an attacker often may have access to different layers. Our contributions in this area are to advance the design for location privacy by introducing a MAC-layer proposal with stronger unlinkability, and a physical-layer proposal with stronger unobservability.

UCAM-CL-TR-710

Lucy G. Brace-Evans:

Event structures with persistence
February 2008, 113 pages, PDF
PhD thesis (St. John's College, October 2007)

Abstract: Increasingly, the style of computation is changing. Instead of one machine running a program sequentially, we have systems with many individual agents running in parallel. The need for mathematical models of such computations is therefore ever greater.

There are many models of concurrent computations. Such models can, for example, provide a semantics to process calculi and thereby suggest behavioural equivalences between processes. They are also key to the development of automated tools for reasoning about concurrent systems. In this thesis we explore some applications and generalisations of one particular model: event structures. We describe a variety of kinds of morphism between event structures. Each kind expresses a different sort of behavioural relationship. We demonstrate the way in which event structures can model both processes and types of processes by recalling a semantics for Affine HOPLA, a higher order process language. This is given in terms of asymmetric spans of event structures. We show that such spans support a trace construction. This allows the modelling of feedback and suggests a semantics for non-deterministic dataflow processes in terms of spans. The semantics given is shown to be consistent with Kahn's fixed point construction when we consider spans modelling deterministic processes.

A generalisation of event structures to include persistent events is proposed. Based on previously described morphisms between classical event structures, we define several categories of event structures with persistence. We show that, unlike for the corresponding categories of classical event structures, all are isomorphic to Kleisli categories of monads on the most restricted category. Amongst other things, this provides us with a way of understanding the asymmetric spans mentioned previously as symmetric spans where one morphism is modified by a monad. Thus we provide a general setting for future investigations involving event structures.


UCAM-CL-TR-711

Saar Drimer, Steven J. Murdoch, Ross Anderson:

Thinking inside the box: system-level failures of tamperproofing
February 2008, 37 pages, PDF

Abstract: PIN entry devices (PEDs) are critical security components in EMV smartcard payment systems as they receive a customer’s card and PIN. Their approval is subject to an extensive suite of evaluation and certification procedures. In this paper, we demonstrate that the tamper proofing of PEDs is unsatisfactory, as is the certification process. We have implemented practical low-cost attacks on two certified, widely-deployed PEDs – the Ingenico i3300 and the Dione Xtreme. By tapping inadequately protected smartcard communications, an attacker with basic technical skills can expose card details and PINs, leaving cardholders open to fraud. We analyze the anti-tampering mechanisms of the two PEDs and show that, while the specific protection measures mostly work as intended, critical vulnerabilities arise because of the poor integration of cryptographic, physical and procedural protection. As these vulnerabilities illustrate a systematic failure in the design process, we propose a methodology for doing it better in the future. They also demonstrate a serious problem with the Common Criteria. We discuss the incentive structures of the certification process, and show how they can lead to problems of the kind we identified. Finally, we recommend changes to the Common Criteria framework in light of the lessons learned.

An abridged version of this paper is to appear at the IEEE Symposium on Security and Privacy, May 2008, Oakland, CA, US.

UCAM-CL-TR-712

Christian Richardt:

Flash-exposure high dynamic range imaging: virtual photography and depth-compensating flash
March 2008, 9 pages, PDF

Abstract: I present a revised approach to flash-exposure high dynamic range (HDR) imaging and demonstrate two applications of this image representation. The first application enables the creation of realistic ‘virtual photographs’ for arbitrary flash-exposure settings, based on a single flash-exposure HDR image. The second application is a novel tone mapping operator for flash-exposure HDR images based on the idea of an ‘intelligent flash’. It compensates for the depth-related brightness fall-off occurring in flash photographs by taking the ambient illumination into account.

UCAM-CL-TR-713

Pan Hui:

People are the network: experimental design and evaluation of social-based forwarding algorithms
March 2008, 160 pages, PDF
PhD thesis (Churchill College, July 2007)

Abstract: Cooperation binds but also divides human society into communities. Members of the same community interact with each other preferentially. There is structure in human society. Within society and its communities, individuals have varying popularity. Some people are more popular and interact with more people than others; we may call them hubs. I develop methods to extract this kind of social information from experimental traces and use it to choose the next hop forwarders in Pocket Switched Networks (PSNs). I find that by incorporating social information, forwarding efficiency can be significantly improved. For practical reasons, I also develop distributed algorithms for inferring communities.

Forwarding in Delay Tolerant Networks (DTNs), or more particularly PSNs, is a challenging problem since human mobility is usually difficult to predict. In this thesis, I aim to tackle this problem using an experimental approach by studying real human mobility. I perform six mobility experiments in different environments. The resultant experimental datasets are valuable for the research community. By analysing the experimental data, I find out that the inter-contact time of humans follows a power-law distribution with coefficient smaller than 1 (over the range of 10 minutes to 1 day). I study the limits of “oblivious” forwarding in the experimental environment and also the impact of the power-law coefficient on message delivery.

In order to study social-based forwarding, I develop methods to infer human communities from the data and use these in the study of social-aware forwarding. I propose several social-aware forwarding schemes and evaluate them on different datasets. I find out that by combining community and centrality information, forwarding efficiency can be significantly improved, and I call this scheme BUBBLE forwarding, with the analogy that each community is a BUBBLE, with big bubbles containing smaller bubbles. For practical deployment of these algorithms, I propose distributed community detection schemes, and also propose methods to approximate node centrality in the system.
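The community-plus-centrality decision rule described above can be sketched as follows. This is a minimal illustration of a BUBBLE-style next-hop choice, not the thesis’s implementation; the community labels and the global/local centrality rankings are assumed inputs that would in practice come from the distributed detection and approximation schemes.

```python
def bubble_forward(carrier, candidates, dest, community, global_rank, local_rank):
    """Pick the next hop for a message to `dest`, or None to keep carrying.

    community[n]   -- community label of node n
    global_rank[n] -- global centrality of n (higher = more popular)
    local_rank[n]  -- centrality of n within its own community
    """
    best = None
    for n in candidates:
        if n == dest:
            return n  # direct delivery
        if community[carrier] == community[dest]:
            # Already inside the destination community: bubble up the
            # local (intra-community) ranking only.
            if community[n] == community[dest] and local_rank[n] > local_rank[carrier]:
                if best is None or local_rank[n] > local_rank[best]:
                    best = n
        else:
            # Outside the destination community: hand over to any member
            # of that community, otherwise bubble up the global ranking.
            if community[n] == community[dest]:
                return n
            if global_rank[n] > global_rank[carrier]:
                if best is None or global_rank[n] > global_rank[best]:
                    best = n
    return best
```

A node thus climbs the global popularity ranking until the message reaches the destination’s community, then climbs the local ranking inside it, which is the “bubble up” intuition behind the scheme.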

Besides the forwarding study, I also propose a layerless data-centric architecture for the PSN scenario to address the problem with the status quo in communication (e.g. an infrastructure-dependent and synchronous API), which brings PSN one step closer to real-world deployment.


UCAM-CL-TR-714

Tim Moreton:

A wide-area file system for migrating virtual machines
March 2008, 163 pages, PDF
PhD thesis (King’s College, February 2007)

Abstract: Improvements in processing power and core bandwidth set against fundamental constraints on wide-area latency increasingly emphasise the position in the network at which services are deployed. The XenoServer project is building a platform for distributed computing that facilitates the migration of services between hosts to minimise client latency and balance load in response to changing patterns of demand. Applications run inside whole-system virtual machines, allowing the secure multiplexing of host resources.

Since services are specified in terms of a complete root file system and kernel image, a key component of this architecture is a substrate that provides an abstraction akin to local disks for these virtual machines, whether they are running, migrating or suspended. However, the same combination of wide-area latency, constrained bandwidth and global scale that motivates the XenoServer platform itself impedes the location, management and rapid transfer of storage between deployment sites. This dissertation describes Xest, a novel wide-area file system that aims to address these challenges.

I examine Xest’s design, centred on the abstraction of virtual disks, volumes that allow only a single writer yet are transparently available despite migration. Virtual disks support the creation of snapshots and may be rapidly forked into copies that can be modified independently. This encourages an architectural separation into node-local file system and global content distribution framework and reduces the dependence of local operations on wide-area interactions.

I then describe how Xest addresses the dual problem of latency and scale by managing, caching, advertising and retrieving storage on the basis of groups, sets of files that correspond to portions of inferred working sets of client applications. Coarsening the granularity of these interfaces further decouples local and global activity: fewer units can lead to fewer interactions and the maintenance of less addressing state. The precision of these interfaces is retained by clustering according to observed access patterns and, in response to evidence of poor clusterings, selectively degrading groups into their constituent elements.

I evaluate a real deployment of Xest over a wide-area testbed. Doing so entails developing new tools for capturing and replaying traces to simulate virtual machine workloads. My results demonstrate the practicality and high performance of my design and illustrate the trade-offs involved in modifying the granularity of established storage interfaces.

UCAM-CL-TR-715

Feng Hao:

On using fuzzy data in security mechanisms
April 2008, 69 pages, PDF
PhD thesis (Queens’ College, April 2007)

Abstract: Under the microscope, every physical object has unique features. It is impossible to clone an object, reproducing exactly the same physical traits. This unclonability principle has been applied in many security applications. For example, the science of biometrics is about measuring unique personal features. It can authenticate individuals with a high level of assurance. Similarly, a paper document can be identified by measuring its unique physical properties, such as randomly-interleaving fiber structure.

Unfortunately, when physical measurements are involved, errors arise inevitably and the obtained data are fuzzy by nature. This causes two main problems: 1) fuzzy data cannot be used as a cryptographic key, as cryptography demands the key be precise; 2) fuzzy data cannot be sorted easily, which prevents efficient information retrieval. In addition, biometric measurements create a strong binding between a person and his unique features, which may conflict with personal privacy. In this dissertation, we study these problems in detail and propose solutions.

First, we propose a scheme to derive error-free keys from fuzzy data, such as iris codes. There are two types of errors within iris codes: background-noise errors and burst errors. Accordingly, we devise a two-layer error correction technique, which first corrects the background-noise errors using a Hadamard code, then the burst errors using a Reed-Solomon code. Based on a database of 700 iris images, we demonstrate that an error-free key of 140 bits can be reliably reproduced from genuine iris codes with a 99.5% success rate. In addition, despite the irrevocability of the underlying biometric data, the keys produced using our technique can be easily revoked or updated.
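As a rough illustration of the first layer, a Hadamard code can be decoded by correlating the received word against every codeword and picking the best match: with 2^k codewords of length n = 2^k, any pattern of fewer than n/4 bit errors is corrected. This is a generic textbook sketch, not the concrete code or parameters used in the thesis.

```python
import random

def hadamard(k):
    # Sylvester construction: 2^k x 2^k matrix with +1/-1 entries
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def encode(msg, H):
    # Codeword = row `msg` of H, mapped {+1 -> 0, -1 -> 1}
    return [(1 - v) // 2 for v in H[msg]]

def decode(word, H):
    # Correlate against every row: with e bit errors the true row scores
    # n - 2e while any other row scores at most 2e, so e < n/4 decodes.
    signs = [1 - 2 * b for b in word]
    scores = [sum(s * v for s, v in zip(signs, row)) for row in H]
    return max(range(len(scores)), key=scores.__getitem__)

# Flip 7 of 32 bits (fewer than 32/4 = 8): the index is still recovered.
H = hadamard(5)
noisy = encode(11, H)
for i in random.sample(range(32), 7):
    noisy[i] ^= 1
assert decode(noisy, H) == 11
```

The exhaustive correlation here is what a fast Walsh–Hadamard transform computes in n log n time; the burst errors that remain after this layer are what the Reed–Solomon outer code then removes.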

Second, we address the search problem for a large fuzzy database that stores iris codes or data with a similar structure. Currently, the algorithm used in all public deployments of iris recognition is to search exhaustively through a database of iris codes, looking for a match that is close enough. We propose a much more efficient search algorithm: Beacon Guided Search (BGS). BGS works by indexing iris codes, adopting a “multiple colliding segments principle” along with an early termination strategy to reduce the search range dramatically. We evaluate this algorithm using 632,500 real-world iris codes, showing a substantial speed-up over exhaustive search with a negligible loss of precision. In addition, we demonstrate that our empirical findings match theoretical analysis.


Finally, we study the anonymous-veto problem, which is more commonly known as the Dining Cryptographers problem: how to perform a secure multi-party computation of the boolean-OR function, while preserving the privacy of each input bit. The solution to this problem has general applications in security going way beyond biometrics. Even though there have been several solutions presented over the past 20 years, we propose a new solution called Anonymous Veto Network (AV-net). Compared with past work, the AV-net protocol provides the strongest protection of each delegate’s privacy against collusion; it requires only two rounds of broadcast, fewer than any other solutions; the computational load and bandwidth usage are the lowest among the available techniques; and our protocol does not require any private channels or third parties. Overall, it seems unlikely that, with the same underlying technology, there can be any other solutions significantly more efficient than ours.
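The two-round structure can be illustrated with a toy simulation. This is a sketch under simplifying assumptions: a 64-bit prime modulus stands in for a proper Diffie-Hellman group, and the zero-knowledge proofs that accompany each broadcast in the real protocol are omitted.

```python
import random

P = (1 << 64) - 59   # largest 64-bit prime (toy modulus only)
G = 5

def av_net(vetoes, rng):
    """Simulate one AV-net round. Returns True iff some participant
    vetoed (with overwhelming probability)."""
    n = len(vetoes)
    # Round 1: participant i picks a secret x_i and broadcasts g^x_i.
    x = [rng.randrange(1, P - 1) for _ in range(n)]
    gx = [pow(G, xi, P) for xi in x]
    sent = []
    for i in range(n):
        # g^y_i = (prod_{j<i} g^x_j) / (prod_{j>i} g^x_j), all mod P;
        # the exponents satisfy sum_i x_i * y_i = 0 exactly.
        num = den = 1
        for j in range(i):
            num = num * gx[j] % P
        for j in range(i + 1, n):
            den = den * gx[j] % P
        gy = num * pow(den, -1, P) % P
        # Round 2: broadcast (g^y_i)^x_i for "no veto",
        # or a random group element to veto.
        if vetoes[i]:
            sent.append(pow(G, rng.randrange(1, P - 1), P))
        else:
            sent.append(pow(gy, x[i], P))
    prod = 1
    for v in sent:
        prod = prod * v % P
    return prod != 1  # product is exactly 1 iff nobody vetoed
```

If no one vetoes, the published values multiply to g^(Σ x_i·y_i) = g^0 = 1; any vetoer’s random value destroys the cancellation, so the product differs from 1 except with negligible probability. Note that only two broadcast rounds occur and no private channels or third parties are involved, matching the claims above.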

UCAM-CL-TR-716

Gavin M. Bierman, Matthew J. Parkinson, James Noble:

UpgradeJ: Incremental typechecking for class upgrades
April 2008, 35 pages, PDF

Abstract: One of the problems facing developers is the constant evolution of components that are used to build applications. This evolution is typical of any multi-person or multi-site software project. How can we program in this environment? More precisely, how can language design address such evolution? In this paper we attack two significant issues that arise from constant component evolution: we propose language-level extensions that permit multiple, co-existing versions of classes and the ability to dynamically upgrade from one version of a class to another, whilst still maintaining type safety guarantees and requiring only lightweight extensions to the runtime infrastructure. We show how our extensions, whilst intuitive, provide a great deal of power by giving a number of examples. Given the subtlety of the problem, we formalize a core fragment of our language and prove a number of important safety properties.

UCAM-CL-TR-717

Stephen Julian Rymill:

Psychologically-based simulation of human behaviour
June 2008, 250 pages, PDF
PhD thesis (Jesus College, 2006)

Abstract: The simulation of human behaviour is a key area of computer graphics as there is currently a great demand for animations consisting of virtual human characters, ranging from film special effects to building design. Currently, animated characters can either be laboriously created by hand, or by using an automated system: however, results from the latter may still look artificial and require much further manual work.

The aim of this work is to improve the automated simulation of human behaviour by making use of ideas from psychology research; the ways in which this research has been used are made clear throughout this thesis. It has influenced all aspects of the design:
• Collision avoidance techniques are based on observed practices.
• Actors have simulated vision and attention.
• Actors can be given a variety of moods and emotions to affect their behaviour.

This thesis discusses the benefits of the simulation of attention; this technique recreates the eye movements of each actor, and allows each actor to build up its own mental model of its surroundings. It is this model that the actor then uses in its decisions on how to behave: techniques for collision prediction and collision avoidance are discussed. On top of this basic behaviour, variability is introduced by allowing all actors to have different sets of moods and emotions, which influence all aspects of their behaviour. The real-time 3D simulation created to demonstrate the actors’ behaviour is also described.

This thesis demonstrates that the use of techniques based on psychology research leads to a qualitative and quantitative improvement in the simulation of human behaviour; this is shown through a variety of pictures and videos, and by results of numerical experiments and user testing. Results are compared with previous work in the field, and with real human behaviour.

UCAM-CL-TR-718

Tyler Moore:

Cooperative attack and defense in distributed networks
June 2008, 172 pages, PDF
PhD thesis (St. John’s College, March 2008)

Abstract: The advance of computer networking has made cooperation essential to both attackers and defenders. Increased decentralization of network ownership requires devices to interact with entities beyond their own realm of control. The distribution of intelligence forces decisions to be taken at the edge. The exposure of devices makes multiple, simultaneous attacker-chosen compromise a credible threat. Motivation for this thesis derives from the observation that it is often easier for attackers to cooperate than for defenders to do so. I describe a number of attacks which exploit cooperation to devastating effect. I also propose and evaluate defensive strategies which require cooperation.

I first investigate the security of decentralized, or ‘ad-hoc’, wireless networks. Many have proposed pre-loading symmetric keys onto devices. I describe two practical attacks on these schemes. First, attackers may compromise several devices and share the pre-loaded secrets to impersonate legitimate users. Second, whenever some keys are not pre-assigned but exchanged upon deployment, a revoked attacker can rejoin the network.

I next consider defensive strategies where devices collectively decide to remove a malicious device from the network. Existing voting-based protocols are made resilient to the attacks I have developed, and I propose alternative strategies that can be more efficient and secure. First, I describe a reelection protocol which relies on positive affirmation from peers to continue participation. Then I describe a more radical alternative called suicide: a good device removes a bad one unilaterally by declaring both devices dead. Suicide offers significant improvements in speed and efficiency compared to voting-based decision mechanisms. I then apply suicide and voting to revocation in vehicular networks.

Next, I empirically investigate attack and defense in another context: phishing attacks on the Internet. I have found evidence that one group responsible for half of all phishing, the rock-phish gang, cooperates by pooling hosting resources and by targeting many banks simultaneously. These cooperative attacks are shown to be far more effective.

I also study the behavior of defenders – banks and Internet service providers – who must cooperate to remove malicious sites. I find that phishing-website lifetimes follow a long-tailed lognormal distribution. While many sites are removed quickly, others remain much longer. I examine several feeds from professional ‘take-down’ companies and find that a lack of data sharing helps many phishing sites evade removal for long time periods.

One anti-phishing organization has relied on volunteers to submit and verify suspected phishing sites. I find its voting-based decision mechanism to be slower and less comprehensive than unilateral verification performed by companies. I also note that the distribution of user participation is highly skewed, leaving the scheme vulnerable to manipulation.

UCAM-CL-TR-719

William H. Billingsley:

The Intelligent Book: technologies for intelligent and adaptive textbooks, focussing on Discrete Mathematics
June 2008, 156 pages, PDF
PhD thesis (Wolfson College, April 2007)

Abstract: An “Intelligent Book” is a Web-based textbook that contains exercises that are backed by computer models or reasoning systems. Within the exercises, students work using appropriate graphical notations and diagrams for the subject matter, and comments and feedback from the book are related into the content model of the book. The content model can be extended by its readers. This dissertation examines the question of how to provide an Intelligent Book that can support undergraduate questions in Number Theory, and particularly questions that allow the student to write a proof as the answer. Number Theory questions pose a challenge not only because the student is working on an unfamiliar topic in an unfamiliar syntax, but also because there is no straightforward procedure for how to prove an arbitrary Number Theory problem.

The main contribution is a system for supporting student-written proof exercises, backed by the Isabelle/HOL automated proof assistant and a set of teaching scripts. Students write proofs using MathsTiles: a graphical notation consisting of composable tiles, each of which can contain an arbitrary piece of mathematics or logic written by the teacher. These tiles resemble parts of the proof as it might be written on paper, and are translated into Isabelle/HOL’s Isar syntax on the server. Unlike traditional syntax-directed editors, MathsTiles allow students to freely sketch out parts of an answer and do not constrain the order in which an answer is written. They also allow details of the language to change between or even during questions.

A number of smaller contributions are also presented. By using the dynamic nature of MathsTiles, a type of proof exercise is developed where the student must search for the statements he or she wishes to use. This allows questions to be supported by informal modelling, making them much easier to write, but still ensures that the interface does not act as a prop for the answer. The concept of searching for statements is extended to develop “massively multiple choice” questions: a mid-point between the multiple choice and short answer formats. The question architecture that is presented is applicable across different notational forms and different answer analysis techniques. The content architecture uses an informal ontology that enables students and untrained users to add and adapt content within the book, including adding their own chapters, while ensuring the content can also be referred to by the models and systems that advise students during exercises.

UCAM-CL-TR-720

Lauri I.W. Pesonen:

A capability-based access control architecture for multi-domain publish/subscribe systems
June 2008, 175 pages, PDF
PhD thesis (Wolfson College, December 2007)


Abstract: Publish/subscribe is emerging as the favoured communication paradigm for large-scale, wide-area distributed systems. The publish/subscribe many-to-many interaction model together with asynchronous messaging provides an efficient transport for highly distributed systems in high latency environments with direct peer-to-peer interactions amongst the participants.

Decentralised publish/subscribe systems implement the event service as a network of event brokers. The broker network makes the system more resilient to failures and allows it to scale up efficiently as the number of event clients increases. In many cases such distributed systems will only be feasible when implemented over the Internet as a joint effort spanning multiple administrative domains. The participating members will benefit from the federated event broker networks both with respect to the size of the system as well as its fault-tolerance.

Large-scale, multi-domain environments require access control; users will have different privileges for sending and receiving instances of different event types. Therefore, we argue that access control is vital for decentralised publish/subscribe systems, consisting of multiple independent administrative domains, to ever be deployable at large scale.

This dissertation presents MAIA, an access control mechanism for decentralised, type-based publish/subscribe systems. While the work concentrates on type-based publish/subscribe, the contributions are equally applicable to both topic and content-based publish/subscribe systems.

Access control in distributed publish/subscribe requires secure, distributed naming, and mechanisms for enforcing access control policies. The first contribution of this thesis is a mechanism for names to be referenced unambiguously from policy without risk of forgeries. The second contribution is a model describing how signed capabilities can be used to grant domains and their members access rights to event types in a scalable and expressive manner. The third contribution is a model for enforcing access control in the decentralised event service by encrypting event content.

We illustrate the design and implementation of MAIA with a running example of the UK Police Information Technology Organisation and the UK police forces.

UCAM-CL-TR-721

Ben W. Medlock:

Investigating classification for natural language processing tasks
June 2008, 138 pages, PDF
PhD thesis (Fitzwilliam College, September 2007)

Abstract: This report investigates the application of classification techniques to four natural language processing (NLP) tasks. The classification paradigm falls within the family of statistical and machine learning (ML) methods and consists of a framework within which a mechanical ‘learner’ induces a functional mapping between elements drawn from a particular sample space and a set of designated target classes. It is applicable to a wide range of NLP problems and has met with a great deal of success due to its flexibility and firm theoretical foundations.

The first task we investigate, topic classification, is firmly established within the NLP/ML communities as a benchmark application for classification research. Our aim is to arrive at a deeper understanding of how class granularity affects classification accuracy and to assess the impact of representational issues on different classification models. Our second task, content-based spam filtering, is a highly topical application for classification techniques due to the ever-worsening problem of unsolicited email. We assemble a new corpus and formulate a state-of-the-art classifier based on structured language model components. Thirdly, we introduce the problem of anonymisation, which has received little attention to date within the NLP community. We define the task in terms of obfuscating potentially sensitive references to real world entities and present a new publicly-available benchmark corpus. We explore the implications of the subjective nature of the problem and present an interactive model for anonymising large quantities of data based on syntactic analysis and active learning. Finally, we investigate the task of hedge classification, a relatively new application which is currently of growing interest due to the expansion of research into the application of NLP techniques to scientific literature for information extraction. A high level of annotation agreement is obtained using new guidelines and a new benchmark corpus is made publicly available. As part of our investigation, we develop a probabilistic model for training data acquisition within a semi-supervised learning framework which is explored both theoretically and experimentally.

Throughout the report, many common themes of fundamental importance to classification for NLP are addressed, including sample representation, performance evaluation, learning model selection, linguistically-motivated feature engineering, corpus construction and real-world application.

UCAM-CL-TR-722

Mbou Eyole-Monono:

Energy-efficient sentient computing
July 2008, 138 pages, PDF
PhD thesis (Trinity College, January 2008)

Abstract: In a bid to improve the interaction between computers and humans, it is becoming necessary to make increasingly larger deployments of sensor networks. These clusters of small electronic devices can be embedded in our surroundings and can detect and react to physical changes. They will make computers more proactive in general by gathering and interpreting useful information about the physical environment through a combination of measurements. Increasing the performance of these devices will mean more intelligence can be embedded within the sensor network. However, most conventional ways of increasing performance often come with the burden of increased power dissipation, which is not an option for energy-constrained sensor networks. This thesis proposes, develops and tests a design methodology for performing greater amounts of processing within a sensor network while satisfying the requirement for low energy consumption. The crux of the thesis is that there is a great deal of concurrency present in sensor networks which, when combined with a tightly-coupled group of small, fast, energy-conscious processors, can result in a significantly more efficient network. The construction of a multiprocessor system aimed at sensor networks is described in detail. It is shown that a routine critical to sensor networks can be sped up with the addition of a small set of primitives. The need for a very fast inter-processor communication mechanism is highlighted, and the hardware scheduler developed as part of this effort forms the cornerstone of the new sentient computing framework by facilitating thread operations and minimising the time required for context-switching. The experimental results also show that end-to-end latency can be reduced in a flexible way through multiprocessing.

UCAM-CL-TR-723

Richard Southern:

Animation manifolds for representing topological alteration
July 2008, 131 pages, PDF
PhD thesis (Clare Hall, February 2008)

Abstract: An animation manifold encapsulates an animation sequence of surfaces contained within a higher dimensional manifold with one dimension being time. An iso-surface extracted from this structure is a frame of the animation sequence.

In this dissertation I make an argument for the use of animation manifolds as a representation of complex animation sequences. In particular, animation manifolds can represent transitions between shapes with differing topological structure and polygonal density.

I introduce the animation manifold, and show how it can be constructed from a keyframe animation sequence and rendered using raytracing or graphics hardware. I then adapt three Laplacian editing frameworks to the higher dimensional context. I derive new boundary conditions for both primal and dual Laplacian methods, and present a technique to adaptively regularise the sampling of a deformed manifold after editing.

The animation manifold can be used to represent a morph sequence between surfaces of arbitrary topology. I present a novel framework for achieving this by connecting planar cross sections in a higher dimension with a new constrained Delaunay triangulation. Topological alteration is achieved by using the Voronoi skeleton, a novel structure which provides a fast medial axis approximation.

UCAM-CL-TR-724

Ulrich Paquet:

Bayesian inference for latent variable models
July 2008, 137 pages, PDF
PhD thesis (Wolfson College, March 2007)

Abstract: Bayes’ theorem is the cornerstone of statistical inference. It provides the tools for dealing with knowledge in an uncertain world, allowing us to explain observed phenomena through the refinement of belief in model parameters. At the heart of this elegant framework lie intractable integrals, whether in computing an average over some posterior distribution, or in determining the normalizing constant of a distribution. This thesis examines both deterministic and stochastic methods in which these integrals can be treated. Of particular interest shall be parametric models where the parameter space can be extended with additional latent variables to get distributions that are easier to handle algorithmically.

Deterministic methods approximate the posterior distribution with a simpler distribution over which the required integrals become tractable. We derive and examine a new generic α-divergence message passing scheme for a multivariate mixture of Gaussians, a particular modeling problem requiring latent variables. This algorithm minimizes local α-divergences over a chosen posterior factorization, and includes variational Bayes and expectation propagation as special cases.
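For reference, the α-divergence family in question is commonly written as follows (one standard convention; the thesis’s exact normalisation may differ):

```latex
D_\alpha(p \,\|\, q) \;=\; \frac{1}{\alpha(1-\alpha)}
  \left( 1 - \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, \mathrm{d}x \right)
```

In the limit α → 0 this recovers KL(q‖p), the objective minimised by variational Bayes; α → 1 recovers KL(p‖q), the divergence minimised locally by expectation propagation; and α = 1/2 yields four times the squared Hellinger distance, which is why these methods arise as special cases of the one message passing scheme.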

Stochastic (or Monte Carlo) methods rely on a sample from the posterior to simplify the integration tasks, giving exact estimates in the limit of an infinite sample. Parallel tempering and thermodynamic integration are introduced as ‘gold standard’ methods to sample from multimodal posterior distributions and determine normalizing constants. A parallel tempered approach to sampling from a mixture of Gaussians posterior through Gibbs sampling is derived, and novel methods are introduced to improve the numerical stability of thermodynamic integration.

A full comparison with parallel tempering and thermodynamic integration shows variational Bayes, expectation propagation, and message passing with the Hellinger distance α = 1/2 to be perfectly suitable for model selection, and for approximating the predictive distribution with high accuracy.

Variational and stochastic methods are combined in a novel way to design Markov chain Monte Carlo (MCMC) transition densities, giving a variational transition kernel, which lower bounds an exact transition kernel. We highlight the general need to mix variational methods with other MCMC moves, by proving that the variational kernel does not necessarily give a geometrically ergodic chain.

UCAM-CL-TR-725

Hamed Haddadi, Damien Fay, Almerima Jamakovic, Olaf Maennel, Andrew W. Moore, Richard Mortier, Miguel Rio, Steve Uhlig:

Beyond node degree: evaluating AS topology models
July 2008, 16 pages, PDF

Abstract: Many models have been proposed to generate Internet Autonomous System (AS) topologies, most of which make structural assumptions about the AS graph. In this paper we compare AS topology generation models with several observed AS topologies. In contrast to most previous works, we avoid making assumptions about which topological properties are important to characterize the AS topology. Our analysis shows that, although matching degree-based properties, the existing AS topology generation models fail to capture the complexity of the local interconnection structure between ASs. Furthermore, we use BGP data from multiple vantage points to show that additional measurement locations significantly affect local structure properties, such as clustering and node centrality. Degree-based properties, however, are not notably affected by additional measurement locations. These observations are particularly valid in the core. The shortcomings of AS topology generation models stem from an underestimation of the complexity of the connectivity in the core, caused by inappropriate use of BGP data.

UCAM-CL-TR-726

Viktor Vafeiadis:

Modular fine-grained concurrency verification
July 2008, 148 pages, PDF
PhD thesis (Selwyn College, July 2007)

Abstract: Traditionally, concurrent data structures are protected by a single mutual exclusion lock so that only one thread may access the data structure at any time. This coarse-grained approach makes it relatively easy to reason about correctness, but it severely limits parallelism. More advanced algorithms instead perform synchronisation at a finer grain. They employ sophisticated synchronisation schemes (both blocking and non-blocking) and are usually written in low-level languages such as C.

This dissertation addresses the formal verification of such algorithms. It proposes techniques that are modular (and hence scalable), easy for programmers to use, and yet powerful enough to verify complex algorithms. In doing so, it makes two theoretical and two practical contributions to reasoning about fine-grained concurrency.

First, building on rely/guarantee reasoning and separation logic, it develops a new logic, RGSep, that subsumes these two logics and enables simple, modular proofs of fine-grained concurrent algorithms that use complex dynamically allocated data structures and may explicitly deallocate memory. RGSep allows for ownership-based reasoning and ownership transfer between threads, while maintaining the expressiveness of binary relations to describe inter-thread interference.

Second, it describes techniques for proving linearisability, the standard correctness condition for fine-grained concurrent algorithms. The main proof technique is to introduce auxiliary single-assignment variables to capture the linearisation point and to inline the abstract effect of the program at that point as auxiliary code.
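To illustrate what a linearisation point is, consider the classic Treiber lock-free stack (an illustrative example, not necessarily one of the dissertation's case studies): each operation retries until a single compare-and-swap succeeds, and that successful CAS is the instant at which the operation appears to take effect. CPython has no CAS on attributes, so it is simulated here with a lock:

```python
import threading

class _Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next):
        self.value, self.next = value, next

class TreiberStack:
    """Treiber stack sketch. On real hardware _cas_top would be a single
    atomic compare-and-swap instruction; the lock only simulates it."""
    def __init__(self):
        self._top = None
        self._cas_lock = threading.Lock()

    def _cas_top(self, expected, new):
        with self._cas_lock:  # stands in for an atomic CAS
            if self._top is expected:
                self._top = new
                return True
            return False

    def push(self, value):
        while True:
            old = self._top
            # A successful CAS here is the linearisation point of push.
            if self._cas_top(old, _Node(value, old)):
                return

    def pop(self):
        while True:
            old = self._top
            if old is None:
                return None  # linearisation point: reading an empty top
            # A successful CAS here is the linearisation point of pop.
            if self._cas_top(old, old.next):
                return old.value
```

The proof technique described above would record, in an auxiliary single-assignment variable, exactly which CAS succeeded, and inline the abstract push/pop effect at that point.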

Third, it demonstrates this approach by proving linearisability of a collection of concurrent list and stack algorithms, as well as providing the first correctness proofs of the RDCSS and MCAS implementations of Harris et al.

Finally, it describes a prototype safety checker, SmallfootRG, for fine-grained concurrent programs that is based on RGSep. SmallfootRG proves simple safety properties for a number of list and stack algorithms and verifies the absence of memory leaks.

UCAM-CL-TR-727

Ruoshui Liu, Ian J. Wassell:

A novel auto-calibration system for wireless sensor motes
September 2008, 65 pages, PDF

Abstract: In recent years, Wireless Sensor Networks (WSNs) research has undergone a quiet revolution, providing a new paradigm for sensing and disseminating information from various environments. In reality, the wireless propagation channel in many harsh environments has a significant impact on the coverage range and quality of the radio links between the wireless nodes (motes). Therefore, the use of diversity techniques (e.g., frequency diversity and spatial diversity) must be considered to ameliorate the notoriously variable and unpredictable point-to-point radio communication links. However, in order to determine the space and frequency diversity characteristics of the channel, accurate measurements need to be made. The most representative and inexpensive solution is to use motes; however, they suffer poor accuracy owing to their low-cost and compromised radio frequency (RF) performance.

In this report we present a novel automated calibration system for characterising mote RF performance. The proposed strategy provides us with good knowledge of the actual mote transmit power, RSSI characteristics and receive sensitivity by establishing calibration tables for transmitting and receiving mote pairs over their operating frequency range. The validated results show that our automated calibration system can achieve an increase of ±1.5 dB in the RSSI accuracy. In addition, measurements of the mote transmit power show a significant difference from that claimed in the manufacturer’s data sheet. The proposed calibration method can also be easily applied to wireless sensor motes from virtually any vendor, provided they have an RF connector.

UCAM-CL-TR-728

Peter J.C. Brown, Christopher T. Faigle:

A robust efficient algorithm for point location in triangulations
February 1997, 16 pages, PDF

Abstract: This paper presents a robust alternative to previous approaches to the problem of point location in triangulations represented using the quadedge data structure. We generalise the reasons for the failure of an earlier routine to terminate when applied to certain non-Delaunay triangulations. This leads to our new deterministic algorithm which we prove is guaranteed to terminate. We also present a novel heuristic for choosing a starting edge for point location queries and show that this greatly enhances the efficiency of point location for the general case.

UCAM-CL-TR-729

Damien Fay, Hamed Haddadi, Steve Uhlig, Andrew W. Moore, Richard Mortier, Almerima Jamakovic:

Weighted spectral distribution
September 2008, 13 pages, PDF

Abstract: Comparison of graph structures is a frequently encountered problem across a number of problem domains. Comparing graphs requires a metric to discriminate which features of the graphs are considered important. The spectrum of a graph is often claimed to contain all the information within a graph, but the raw spectrum contains too much information to be directly used as a useful metric. In this paper we introduce a metric, the weighted spectral distribution, that improves on the raw spectrum by discounting those eigenvalues believed to be unimportant and emphasizing the contribution of those believed to be important.

We use this metric to optimize the selection of parameter values for generating Internet topologies. Our metric leads to parameter choices that appear sensible given prior knowledge of the problem domain: the resulting choices are close to the default values of the topology generators and, in the case of the AB generator, fall within the expected region. This metric provides a means for meaningfully optimizing parameter selection when generating topologies intended to share structure with, but not match exactly, measured graphs.
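A metric of this flavour can be sketched as follows: bin the eigenvalues of the normalised Laplacian and down-weight those near 1, which are believed to carry less structural information. This is a minimal illustration with NumPy; the bin count, weight exponent, and function names are assumptions, not the paper's exact definition:

```python
import numpy as np

def weighted_spectral_distribution(adj, n_bins=20, weight_power=4):
    """Weighted histogram of the normalised-Laplacian spectrum of a graph
    given as a dense adjacency matrix. Bins near eigenvalue 1 are
    discounted by the weight (1 - lambda)^N."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    # Normalised Laplacian L = I - D^{-1/2} A D^{-1/2}; eigenvalues lie in [0, 2]
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigs = np.linalg.eigvalsh(lap)
    hist, edges = np.histogram(eigs, bins=n_bins, range=(0.0, 2.0), density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return (1.0 - centres) ** weight_power * hist

def wsd_distance(adj_a, adj_b, **kw):
    """Squared difference between two graphs' weighted spectra."""
    d = weighted_spectral_distribution(adj_a, **kw) - weighted_spectral_distribution(adj_b, **kw)
    return float(np.sum(d * d))
```

Such a distance could then be minimised over a topology generator's parameters, as in the optimisation described above.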

UCAM-CL-TR-730

Robert J. Ennals:

Adaptive evaluation of non-strict programs
August 2008, 243 pages, PDF
PhD thesis (King’s College, June 2004)

Abstract: Most popular programming languages are strict. In a strict language, the binding of a variable to an expression coincides with the evaluation of the expression.

Non-strict languages attempt to make life easier for programmers by decoupling expression binding and expression evaluation. In a non-strict language, a variable can be bound to an unevaluated expression, and such expressions can be passed around just like values in a strict language. This separation allows the programmer to declare a variable at the point that makes most logical sense, rather than at the point at which its value is known to be needed.

Non-strict languages are usually evaluated using a technique called Lazy Evaluation. Lazy Evaluation will only evaluate an expression when its value is known to be needed. While Lazy Evaluation minimises the total number of expressions evaluated, it imposes a considerable bookkeeping overhead, and has unpredictable space behaviour.
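The bookkeeping that lazy evaluation imposes can be illustrated with a minimal thunk (an illustrative analogue in Python, not the thesis's Haskell machinery): a suspended computation that is evaluated at most once, on first demand, and whose saved closure is a source of the unpredictable space behaviour mentioned above.

```python
class Thunk:
    """A suspended computation, forced at most once (call-by-need)."""
    __slots__ = ("_fn", "_value", "_forced")

    def __init__(self, fn):
        # Binding: store the closure, do NOT evaluate it yet.
        self._fn, self._value, self._forced = fn, None, False

    def force(self):
        if not self._forced:  # evaluate on first demand only
            self._value, self._forced = self._fn(), True
            self._fn = None   # drop the closure to release captured memory
        return self._value
```

Every use of a lazy value pays the `_forced` check, and every binding allocates a thunk; eager evaluation avoids both costs, which is the trade-off an optimistic evaluator exploits.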

In this thesis, we present a new evaluation strategy which we call Optimistic Evaluation. Optimistic Evaluation blends lazy and eager evaluation under the guidance of an online profiler. The online profiler observes the running program and decides which expressions should be evaluated lazily, and which should be evaluated eagerly. We show that the worst case performance of Optimistic Evaluation relative to Lazy Evaluation can be bounded with an upper bound chosen by the user. Increasing this upper bound allows the profiler to take greater risks and potentially achieve better average performance.

This thesis describes both the theory and practice of Optimistic Evaluation. We start by giving an overview of Optimistic Evaluation. We go on to present a formal model, which we use to justify our design. We then detail how we have implemented Optimistic Evaluation as part of an industrial-strength compiler. Finally, we provide experimental results to back up our claims.

UCAM-CL-TR-731

Matthew Johnson:

A new approach to Internet banking
September 2008, 113 pages, PDF
PhD thesis (Trinity Hall, July 2008)

Abstract: This thesis investigates the protection landscape surrounding online banking. First, electronic banking is analysed for vulnerabilities and a survey of current attacks is carried out. This is represented graphically as an attack tree describing the different ways in which online transactions can be attacked.

The discussion then moves on to various defences which have been developed, categorizing them and analyzing how successful they are at protecting against the attacks given in the first chapter. This covers everything from TLS encryption through phishing site detection to two-factor authentication.

Having declared all current schemes for protecting online banking lacking in some way, the key aspects of the problem are identified. This is followed by a proposal for a more robust defence system which uses a small security device to create a trusted path to the customer, rather than depend upon trusting the customer’s computer. The protocol for this system is described along with all the other restrictions required for actual use. This is followed by a description of a demonstration implementation of the system.

Extensions to the system are then proposed, designed to afford extra protection for the consumer and also to support other types of device. There is then a discussion of ways of managing keys in a heterogeneous system, rather than one managed by a single entity.

The conclusion discusses the weaknesses of the proposed scheme and evaluates how successful it is likely to be in practice and what barriers there may be to adoption in the banking system.

UCAM-CL-TR-732

Alban Rrustemi:

Computing surfaces – a platform for scalable interactive displays
November 2008, 156 pages, PDF
PhD thesis (St Edmund’s College, November 2008)

Abstract: Recent progress in electronic, display and sensing technologies makes possible a future with omnipresent, arbitrarily large interactive display surfaces. Nonetheless, current methods of designing display systems with multi-touch sensitivity do not scale. This thesis presents computing surfaces as a viable platform for resolving forthcoming scalability limitations.

Computing surfaces are composed of a homogeneous network of physically adjoined, small sensitive displays with local computation and communication capabilities. In this platform, inherent scalability is provided by a distributed architecture. The regular spatial distribution of resources presents new demands on the way surface input and output information is managed and processed.

Direct user input with touch-based gestures needs to account for the distributed architecture of computing surfaces. A scalable middleware solution that conceals the tiled architecture is proposed for reasoning with touch-based gestures. The validity of this middleware is proven in a case study, where a fully distributed algorithm for online recognition of unistrokes – a particular class of touch-based gestures – is presented and evaluated.

Novel interaction techniques based around interactive display surfaces involve direct manipulation with displayed digital objects. In order to facilitate such interactions in computing surfaces, an efficient distributed algorithm to perform 2D image transformations is introduced and evaluated. The performance of these transformations is heavily influenced by the arbitration policies of the interconnection network. One approach for improving the performance of these transformations in conventional network architectures is proposed and evaluated.

More advanced applications in computing surfaces require the presence of some notion of time. An efficient algorithm for internal time synchronisation is presented and evaluated. A hardware solution is adopted to minimise the delay uncertainty of special timestamp messages. The proposed algorithm allows efficient, scalable time synchronisation among clusters of tiles. A hardware reference platform is constructed to demonstrate the basic principles and features of computing surfaces. This platform and a complementary simulation environment are used for extensive evaluation and analysis.

UCAM-CL-TR-733

Darren Edge:

Tangible user interfaces for peripheral interaction
December 2008, 237 pages, PDF
PhD thesis (Jesus College, January 2008)

Abstract: Since Mark Weiser’s vision of ubiquitous computing in 1988, many research efforts have been made to move computation away from the workstation and into the world. One such research area focuses on “Tangible” User Interfaces or TUIs – those that provide both physical representation and control of underlying digital information.

This dissertation describes how TUIs can support a “peripheral” style of interaction, in which users engage in short, dispersed episodes of low-attention interaction with digitally-augmented physical tokens. The application domain in which I develop this concept is the office context, where physical tokens can represent items of common interest to members of a team whose work is mutually interrelated, but predominantly performed independently by individuals at their desks.

An “analytic design process” is introduced as a way of developing TUI designs appropriate for their intended contexts of use. This process is then used to present the design of a bimanual desktop TUI that complements the existing workstation, and encourages peripheral interaction in parallel with workstation-intensive tasks. Implementation of a prototype TUI is then described, comprising “task” tokens for work-time management, “document” tokens for face-to-face sharing of collaborative documents, and “contact” tokens for awareness of other team members’ status and workload. Finally, evaluation of this TUI is presented via description of its extended deployment in a real office context.

The main empirically-grounded results of this work are a categorisation of the different ways in which users can interact with physical tokens, and an identification of the qualities of peripheral interaction that differentiate it from other interaction styles. The foremost benefits of peripheral interaction were found to arise from the freedom with which tokens can be appropriated to create meaningful information structures of both cognitive and social significance, in the physical desktop environment and beyond.

UCAM-CL-TR-734

Philip Tuddenham:

Tabletop interfaces for remote collaboration
December 2008, 243 pages, PDF
PhD thesis (Gonville and Caius College, June 2008)

Abstract: Effective support for synchronous remote collaboration has long proved a desirable yet elusive goal for computer technology. Although video views showing the remote participants have recently improved, technologies providing a shared visual workspace of the task still lack support for the visual cues and work practices of co-located collaboration.

Researchers have recently demonstrated shared workspaces for remote collaboration using large horizontal interactive surfaces. These remote tabletop interfaces may afford the beneficial work practices associated with co-located collaboration around tables. However, there has been little investigation of remote tabletop interfaces beyond limited demonstrations. There is currently little theoretical basis for their design, and little empirical characterisation of their support for collaboration. The construction of remote tabletop applications also presents considerable technical challenges.

This dissertation addresses each of these areas. Firstly, a theory of workspace awareness is applied to consider the design of remote tabletop interfaces and the work practices that they may afford.

Secondly, two technical barriers to the rapid exploration of useful remote tabletop applications are identified: the low resolution of conventional tabletop displays; and the lack of support for existing user interface components. Techniques from multi-projector display walls are applied to address these problems. The resulting method is evaluated empirically and used to create a number of novel tabletop interfaces.

Thirdly, an empirical investigation compares remote and co-located tabletop interfaces. The findings show how the design of remote tabletop interfaces leads to collaborators having a high level of awareness of each other’s actions in the workspace. This enables smooth transitions between individual and group work, together with anticipation and assistance, similar to co-located tabletop collaboration. However, remote tabletop collaborators use different coordination mechanisms from co-located collaborators. The results have implications for the design and future study of these interfaces.

UCAM-CL-TR-735

Diarmuid Ó Séaghdha:

Learning compound noun semantics
December 2008, 167 pages, PDF
PhD thesis (Corpus Christi College, July 2008)

Abstract: This thesis investigates computational approaches for analysing the semantic relations in compound nouns and other noun-noun constructions. Compound nouns in particular have received a great deal of attention in recent years due to the challenges they pose for natural language processing systems. One reason for this is that the semantic relation between the constituents of a compound is not explicitly expressed and must be retrieved from other sources of linguistic and world knowledge.

I present a new scheme for the semantic annotation of compounds, describing in detail the motivation for the scheme and the development process. This scheme is applied to create an annotated dataset for use in compound interpretation experiments. The results of a dual-annotator experiment indicate that good agreement can be obtained with this scheme relative to previously reported results and also provide insights into the challenging nature of the annotation task.

I describe two corpus-driven paradigms for comparing pairs of nouns: lexical similarity and relational similarity. Lexical similarity is based on comparing each constituent of a noun pair to the corresponding constituent of another pair. Relational similarity is based on comparing the contexts in which both constituents of a noun pair occur together with the corresponding contexts of another pair. Using the flexible framework of kernel methods, I develop techniques for implementing both similarity paradigms.

A standard approach to lexical similarity represents words by their co-occurrence distributions. I describe a family of kernel functions that are designed for the classification of probability distributions. The appropriateness of these distributional kernels for semantic tasks is suggested by their close connection to proven measures of distributional lexical similarity. I demonstrate the effectiveness of the lexical similarity model by applying it to two classification tasks: compound noun interpretation and the 2007 SemEval task on classifying semantic relations between nominals.

To implement relational similarity I use kernels on strings and sets of strings. I show that distributional set kernels based on a multinomial probability model can be computed many times more efficiently than previously proposed kernels, while still achieving equal or better performance. Relational similarity does not perform as well as lexical similarity in my experiments. However, combining the two models brings an improvement over either model alone and achieves state-of-the-art results on both the compound noun and SemEval Task 4 datasets.

UCAM-CL-TR-736

Mike Dodds, Xinyu Feng, Matthew Parkinson, Viktor Vafeiadis:

Deny-guarantee reasoning
January 2009, 82 pages, PDF

Abstract: Rely-guarantee is a well-established approach to reasoning about concurrent programs that use parallel composition. However, parallel composition is not how concurrency is structured in real systems. Instead, threads are started by ‘fork’ and collected with ‘join’ commands. This style of concurrency cannot be reasoned about using rely-guarantee, as the lifetime of a thread can be scoped dynamically. With parallel composition the scope is static.

In this paper, we introduce deny-guarantee reasoning, a reformulation of rely-guarantee that enables reasoning about dynamically scoped concurrency. We build on ideas from separation logic to allow interference to be dynamically split and recombined, in a similar way that separation logic splits and joins heaps. To allow this splitting, we use deny and guarantee permissions: a deny permission specifies that the environment cannot do an action, and a guarantee permission allows us to do an action. We illustrate the use of our proof system with examples, and show that it can encode all the original rely-guarantee proofs. We also present the semantics and soundness of the deny-guarantee method.

UCAM-CL-TR-737

Na Xu:

Static contract checking for Haskell
December 2008, 175 pages, PDF
PhD thesis (Churchill College, August 2008)

Abstract: Program errors are hard to detect and are costly, both to programmers who spend significant effort in debugging, and to systems that are guarded by runtime checks. Static verification techniques have been applied to imperative and object-oriented languages, like Java and C#, for checking basic safety properties such as memory leaks. In a pure functional language, many of these basic properties are guaranteed by design, which suggests the opportunity for verifying more sophisticated program properties. Nevertheless, few automatic systems for doing so exist. In this thesis, we show the challenges and solutions to verifying advanced properties of a pure functional language, Haskell. We describe a sound and automatic static verification framework for Haskell that is based on contracts and symbolic execution. Our approach gives precise blame assignments at compile-time in the presence of higher-order functions and laziness.

First, we give a formal definition of contract satisfaction which can be viewed as a denotational semantics for contracts. We then construct two contract checking wrappers, which are dual to each other, for checking contract satisfaction. We prove the soundness and completeness of the construction of the contract checking wrappers with respect to the definition of contract satisfaction. This part of my research shows that the two wrappers are projections with respect to a partial ordering crashes-more-often and, furthermore, that they form a projection pair and a closure pair. These properties give contract checking a strong theoretical foundation.
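Checking wrappers of this kind are in the spirit of the classical dynamic higher-order contract wrappers, in which the blame labels for a function's argument are swapped relative to its result. A minimal dynamic sketch follows (illustrative only: the thesis checks such wrappers statically via symbolic execution, and the names here are assumptions):

```python
class ContractError(Exception):
    def __init__(self, blamed):
        super().__init__(f"contract violated: blame {blamed}")
        self.blamed = blamed

def flat(pred):
    """First-order contract: check the value, blaming the positive party."""
    def check(value, pos, neg):
        if pred(value):
            return value
        raise ContractError(pos)
    return check

def func(dom, rng):
    """Higher-order contract dom -> rng.  The argument is checked with the
    blame labels swapped: a bad argument is the caller's fault, a bad
    result is the function's fault."""
    def check(f, pos, neg):
        def wrapped(x):
            return rng(f(dom(x, neg, pos)), pos, neg)
        return wrapped
    return check

# Hypothetical usage: a function contracted to map positive ints to positive ints.
pos_int = flat(lambda n: isinstance(n, int) and n > 0)
inc = func(pos_int, pos_int)(lambda n: n + 1, "inc", "caller")
```

Here `inc(3)` passes both checks, `inc(-1)` blames the caller, and a function that returns a non-positive result would be blamed itself, mirroring the precise blame assignment described above.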

As the goal is to detect bugs during compile time, we symbolically execute the code constructed by the contract checking wrappers and prove the soundness of this approach. We also develop a technique named counter-example-guided (CEG) unrolling which only unrolls function calls on demand. This technique speeds up the checking process.

Finally, our verification approach makes error tracing much easier compared with the existing set-based analysis. Thus equipped, we are able to tell programmers during compile-time which function to blame and why if there is a bug in their program. This is a breakthrough for lazy languages because it is known to be difficult to report such informative messages either at compile-time or run-time.

UCAM-CL-TR-738

Scott Fairbanks:


High precision timing using self-timed circuits
January 2009, 99 pages, PDF
PhD thesis (Gonville and Caius College, September 2004)

Abstract: Constraining the events that demarcate periods on a VLSI chip to precise instances of time is the task undertaken in this thesis. High speed sampling and clock distribution are two example applications. Foundational to my approach is the use of self-timed data control circuits.

Specially designed self-timed control circuits deliver high frequency timing signals with precise phase relationships. The frequency and the phase relationships are controlled by varying the number of self-timed control stages and the number of tokens they control.

The self-timed control circuits are constructed with simple digital logic gates. The digital logic gates respond to a range of analog values with a continuum of precise and controlled delays. The control circuits implement their functionality efficiently. This allows the gates to drive long wires and distribute the timing signals over a large area. Also, gate delays are short and few, allowing for high frequencies.

The self-timed control circuits implement the functionality of a FIFO that is then closed into a ring. Timing tokens ripple through the rings. The FIFO stages use digital handshaking protocols to pass the timing tokens between the stages. The FIFO control stage detects the phase between the handshake signals on its inputs and produces a signal that is sent back to the producers with a delay that is a function of the phase relationship of the input signals.

The methods described are not bound to the same process and systematic skew limitations of existing methods. For a certain power budget, timing signals are generated and distributed with significantly less power with the approaches to be presented than with conventional methods.

UCAM-CL-TR-739

Salman Taherian:

State-based Publish/Subscribe for sensor systems
January 2009, 240 pages, PDF
PhD thesis (St John’s College, June 2008)

Abstract: Recent technological advances have enabled the creation of networks of sensor devices. These devices are typically equipped with basic computational and communication capabilities. Systems based on these devices can deduce high-level, meaningful information about the environment that may be useful to applications. Due to their scale, distributed nature, and the limited resources available to sensor devices, these systems are inherently complex. Shielding applications from this complexity is a challenging problem.

To address this challenge, I present a middleware called SPS (State-based Publish/Subscribe). It is based on a combination of a State-Centric data model and a Publish/Subscribe (Pub/Sub) communication paradigm. I argue that a state-centric data model allows applications to specify environmental situations of interest in a more natural way than existing solutions. In addition, Pub/Sub enables scalable many-to-many communication between sensors, actuators, and applications.

This dissertation initially focuses on Resource-constrained Sensor Networks (RSNs) and proposes State Filters (SFs), which are lightweight, stateful, event filtering components. Their design is motivated by the redundancy and correlation observed in sensor readings produced close together in space and time. By performing context-based data processing, SFs increase Pub/Sub expressiveness and improve communication efficiency.

Secondly, I propose State Maintenance Components (SMCs) for capturing more expressive conditions in heterogeneous sensor networks containing more resourceful devices. SMCs extend SFs with data fusion and temporal and spatial data manipulation capabilities. They can also be composed together (in a DAG) to deduce higher level information. SMCs operate independently from each other and can therefore be decomposed for distributed processing within the network.

Finally, I present a Pub/Sub protocol called QPS (Quad-PubSub) for location-aware Wireless Sensor Networks (WSNs). QPS is central to the design of my framework as it facilitates messaging between state-based components, applications, sensors, and actuators. In contrast to existing data dissemination protocols, QPS has a layered architecture. This allows for the transparent operation of routing protocols that meet different Quality of Service (QoS) requirements.

UCAM-CL-TR-740

Tal Sobol-Shikler:

Analysis of affective expression in speech
January 2009, 163 pages, PDF
PhD thesis (Girton College, March 2007)

Abstract: This dissertation presents analysis of expressions in speech. It describes a novel framework for dynamic recognition of acted and naturally evoked expressions and its application to expression mapping and to multi-modal analysis of human-computer interactions.

The focus of this research is on analysis of a wide range of emotions and mental states from non-verbal expressions in speech. In particular, it addresses inference of complex mental states, beyond the set of basic emotions, including naturally evoked subtle expressions and mixtures of expressions.


This dissertation describes a bottom-up computational model for processing of speech signals. It combines the application of signal processing, machine learning and voting methods with novel approaches to the design, implementation and validation. It is based on a comprehensive framework that includes all the development stages of a system. The model represents paralinguistic speech events using temporal abstractions borrowed from various disciplines such as musicology, engineering and linguistics. The model consists of a flexible and expandable architecture. The validation of the model extends its scope to different expressions, languages, backgrounds, contexts and applications.

The work adopts the view that an utterance is not an isolated entity but rather a part of an interaction and should be analysed in this context. The analysis in context includes relations to events and other behavioural cues. Expressions of mental states are related not only in time but also by their meaning and content. This work demonstrates the relations between the lexical definitions of mental states, taxonomies and theoretical conceptualisation of mental states and their vocal correlates. It examines taxonomies and theoretical conceptualisation of mental states in relation to their vocal characteristics. The results show that a very wide range of mental state concepts can be mapped, or described, using a high-level abstraction in the form of a small subset of concepts which are characterised by their vocal correlates.

This research is an important step towards comprehensive solutions that incorporate social intelligence cues for a wide variety of applications and for multi-disciplinary research.

UCAM-CL-TR-741

David N. Cottingham:

Vehicular wireless communication
January 2009, 264 pages, PDF
PhD thesis (Churchill College, September 2008)

Abstract: Transportation is vital in everyday life. As a consequence, vehicles are increasingly equipped with onboard computing devices. Moreover, the demand for connectivity to vehicles is growing rapidly, both from business and consumers. Meanwhile, the number of wireless networks available in an average city in the developed world is several thousand. Whilst this theoretically provides near-ubiquitous coverage, the technology type is not homogeneous.

This dissertation discusses how the diversity in communication systems can be best used by vehicles. Focussing on road vehicles, it first details the technologies available, the difficulties inherent in the vehicular environment, and how intelligent handover algorithms could enable seamless connectivity. In particular, it identifies the need for a model of the coverage of wireless networks.

In order to construct such a model, the use of vehicular sensor networks is proposed. The Sentient Van, a platform for vehicular sensing, is introduced, and details are given of experiments carried out concerning the performance of IEEE 802.11x, specifically for vehicles. Using the Sentient Van, a corpus of 10 million signal strength readings was collected over three years. This data, and further traces, are used in the remainder of the work described, thus distinguishing it in using entirely real world data.

Algorithms are adapted from the field of 2-D shape simplification to the problem of processing thousands of signal strength readings. By applying these to the data collected, coverage maps are generated that contain extents. These represent how coverage varies between two locations on a given road. The algorithms are first proven fit for purpose using synthetic data, before being evaluated for accuracy of representation and compactness of output using real data.
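The flavour of shape-simplification used here can be sketched with the classic Ramer-Douglas-Peucker algorithm applied to (position, signal-strength) samples along a road. The readings, tolerance and mixed units below are invented for illustration; this is not the dissertation's actual algorithm or data.

```python
import math

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification: keep only the points needed
    so every original sample lies within `epsilon` of the simplified curve."""
    if len(points) < 3:
        return list(points)
    (x0, y0), (xn, yn) = points[0], points[-1]
    dx, dy = xn - x0, yn - y0
    norm = math.hypot(dx, dy)

    def dist(p):  # perpendicular distance of p from the chord
        x, y = p
        if norm == 0:
            return math.hypot(x - x0, y - y0)
        return abs(dy * (x - x0) - dx * (y - y0)) / norm

    i_max, d_max = max(((i, dist(p)) for i, p in enumerate(points[1:-1], 1)),
                       key=lambda t: t[1])
    if d_max <= epsilon:
        return [points[0], points[-1]]        # the chord summarises this span
    left = rdp(points[:i_max + 1], epsilon)   # keep the worst point, recurse
    right = rdp(points[i_max:], epsilon)
    return left[:-1] + right

# (metres along road, signal strength in dBm) -- synthetic readings
readings = [(0, -90), (10, -88), (20, -60), (30, -58), (40, -59), (50, -85)]
print(rdp(readings, epsilon=10.0))   # -> [(0, -90), (30, -58), (50, -85)]
```

Six raw readings collapse to three points that still capture the coverage "bump" in the middle of the road segment, which is the compactness-versus-accuracy trade the evaluation measures.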

The problem of how to select the optimal network to connect to is then addressed. The coverage map representation is converted into a multi-planar graph, where the coverage of all available wireless networks is included. This novel representation also includes the ability to hand over between networks, and the penalties so incurred. This allows the benefits of connecting to a given network to be traded off with the cost of handing over to it.

In order to use the multi-planar graph, shortest path routing is used. The theory underpinning multi-criteria routing is reviewed, and a family of routing metrics is developed. These generate efficient solutions to the problem of calculating the sequence of networks that should be connected to over a given geographical route. The system is evaluated using real traces, finding that in 75% of the test cases proactive routing algorithms provide better QoS than a reactive algorithm. Moreover, the system can also be run to generate geographical routes that are QoS-aware.
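The multi-planar idea can be pictured as plain shortest-path search over a layered graph: one plane per network, travel edges costing that network's per-segment QoS penalty, and vertical edges between planes costing a handover penalty. The networks, costs and penalty below are made-up illustrations, not the dissertation's metrics.

```python
import heapq

def best_route_cost(planes, handover_penalty, n_waypoints):
    """Cheapest traversal of a multi-planar graph. planes[net][i] is the
    cost of segment i -> i+1 on network `net` (None = no coverage there);
    switching network at a waypoint costs `handover_penalty`."""
    nets = list(planes)
    dist = {}
    pq = [(0, net, 0) for net in nets]          # may start on any network
    while pq:
        d, net, i = heapq.heappop(pq)
        if (net, i) in dist:
            continue
        dist[(net, i)] = d
        if i == n_waypoints - 1:
            continue
        seg = planes[net][i]
        if seg is not None:                     # stay on this network
            heapq.heappush(pq, (d + seg, net, i + 1))
        for other in nets:                      # or hand over at this waypoint
            if other != net and (other, i) not in dist:
                heapq.heappush(pq, (d + handover_penalty, other, i))
    return min(dist.get((net, n_waypoints - 1), float("inf")) for net in nets)

# Wi-Fi is cheap but covers only the first two segments; cellular covers all.
planes = {"wifi": [1, 1, None, None], "cellular": [3, 3, 3, 3]}
print(best_route_cost(planes, handover_penalty=2, n_waypoints=5))  # -> 10
```

Riding Wi-Fi while it lasts and paying one handover (1+1+2+3+3 = 10) beats staying on cellular throughout (12), which is exactly the benefit-versus-handover-cost trade the graph encodes.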

This dissertation concludes by examining how coverage mapping can be applied to other types of data, and avenues for future research are proposed.

UCAM-CL-TR-742

Thomas Ridge, Michael Norrish,Peter Sewell:

TCP, UDP, and Sockets:
Volume 3: The Service-level Specification
February 2009, 305 pages, PDF

Abstract: Despite more than 30 years of research on protocol specification, the major protocols deployed in the Internet, such as TCP, are described only in informal prose RFCs and executable code. In part this is because the scale and complexity of these protocols makes them challenging targets for formal descriptions, and because techniques for mathematically rigorous (but appropriately loose) specification are not in common use.

In this work we show how these difficulties can be addressed. We develop a high-level specification for TCP and the Sockets API, describing the byte-stream service that TCP provides to users, expressed in the formalised mathematics of the HOL proof assistant. This complements our previous low-level specification of the protocol internals, and makes it possible for the first time to state what it means for TCP to be correct: that the protocol implements the service. We define a precise abstraction function between the models and validate it by testing, using verified testing infrastructure within HOL. Some errors may remain, of course, especially as our resources for testing were limited, but it would be straightforward to use the method on a larger scale. This is a pragmatic alternative to full proof, providing reasonable confidence at a relatively low entry cost.

Together with our previous validation of the low-level model, this shows how one can rigorously tie together concrete implementations, low-level protocol models, and specifications of the services they claim to provide, dealing with the complexity of real-world protocols throughout.

Similar techniques should be applicable, and even more valuable, in the design of new protocols (as we illustrated elsewhere, for a MAC protocol for the SWIFT optically switched network). For TCP and Sockets, our specifications had to capture the historical complexities, whereas for a new protocol design, such specification and testing can identify unintended complexities at an early point in the design.

UCAM-CL-TR-743

Rebecca F. Watson:

Optimising the speed and accuracy of a Statistical GLR Parser
March 2009, 145 pages, PDF
PhD thesis (Darwin College, September 2007)

Abstract: The focus of this thesis is to develop techniques that optimise both the speed and accuracy of a unification-based statistical GLR parser. However, we can apply these methods within a broad range of parsing frameworks. We first aim to optimise the level of tag ambiguity resolved during parsing, given that we employ a front-end PoS tagger. This work provides the first broad comparison of tag models as we consider both tagging and parsing performance. A dynamic model achieves the best accuracy and provides a means to overcome the trade-off between tag error rates in single tag per word input and the increase in parse ambiguity over multiple tag per word input. The second line of research describes a novel modification to the inside-outside algorithm, whereby multiple inside and outside probabilities are assigned for elements within the packed parse forest data structure. This algorithm enables us to compute a set of ‘weighted GRs’ directly from this structure. Our experiments demonstrate substantial increases in parser accuracy and throughput for weighted GR output.

Finally, we describe a novel confidence-based training framework that can, in principle, be applied to any statistical parser whose output is defined in terms of its consistency with a given level and type of annotation. We demonstrate that a semi-supervised variant of this framework outperforms both Expectation-Maximisation (when both are constrained by unlabelled partial-bracketing) and the extant (fully supervised) method. These novel training methods utilise data automatically extracted from existing corpora. Consequently, they require no manual effort on behalf of the grammar writer, facilitating grammar development.

UCAM-CL-TR-744

Anna Ritchie:

Citation context analysis for information retrieval
March 2009, 119 pages, PDF
PhD thesis (New Hall, June 2008)

Abstract: This thesis investigates taking words from around citations to scientific papers in order to create an enhanced document representation for improved information retrieval. This method parallels how anchor text is commonly used in Web retrieval. In previous work, words from citing documents have been used as an alternative representation of the cited document but no previous experiment has combined them with a full-text document representation and measured effectiveness in a large scale evaluation.

The contributions of this thesis are twofold: firstly, we present a novel document representation, along with experiments to measure its effect on retrieval effectiveness, and, secondly, we document the construction of a new, realistic test collection of scientific research papers, with references (in the bibliography) and their associated citations (in the running text of the paper) automatically annotated. Our experiments show that the citation-enhanced document representation increases retrieval effectiveness across a range of standard retrieval models and evaluation measures.

In Chapter 2, we give the background to our work, discussing the various areas from which we draw together ideas: information retrieval, particularly link structure analysis and anchor text indexing, and bibliometrics, in particular citation analysis. We show that there is a close relatedness of ideas between these areas but that these ideas have not been fully explored experimentally. Chapter 3 discusses the test collection paradigm for evaluation of information retrieval systems and describes how and why we built our test collection. In Chapter 4 we introduce the ACL Anthology, the archive of computational linguistics papers that our test collection is centred around. The archive contains the most prominent publications since the beginning of the field in the early 1960s, consisting of one journal plus conferences and workshops, resulting in over 10,000 papers. Chapter 5 describes how the PDF papers are prepared for our experiments, including identification of references and citations in the papers, once converted to plain text, and extraction of citation information to an XML database. Chapter 6 presents our experiments: we show that adding citation terms to the full-text of the papers improves retrieval effectiveness by up to 7.4%, that weighting citation terms higher relative to paper terms increases the improvement and that varying the context from which citation terms are taken has a significant effect on retrieval effectiveness. Our main hypothesis that citation terms enhance a full-text representation of scientific papers is thus proven.
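The core idea of a citation-enhanced representation can be sketched as merging a paper's own term counts with up-weighted terms drawn from the contexts of citations to it. The weight and the toy documents below are illustrative only, not the thesis's tuned settings or its Lemur-based implementation.

```python
from collections import Counter

def enhanced_representation(full_text_terms, citation_context_terms, weight=2):
    """Bag-of-words for a paper: its own terms, plus the words found
    around citations *to* it in other papers, counted `weight` times."""
    rep = Counter(full_text_terms)
    for term in citation_context_terms:
        rep[term] += weight          # citing authors' words count extra
    return rep

paper = "statistical parsing with a glr parser".split()
contexts = "seminal glr parsing results".split()
rep = enhanced_representation(paper, contexts)
print(rep["glr"], rep["parsing"], rep["statistical"])   # -> 3 3 1
```

Terms that citing authors use to describe the paper ("glr", "parsing") dominate the representation, mirroring how anchor text boosts Web pages.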

There are some limitations to these experiments. The relevance judgements in our test collection are incomplete but we have experimentally verified that the test collection is, nevertheless, a useful evaluation tool. Using the Lemur toolkit constrained the method that we used to weight citation terms; we would like to experiment with a more realistic implementation of term weighting. Our experiments with different citation contexts did not establish an optimal citation context; we would like to extend the scope of our investigation. Now that our test collection exists, we can address these issues in our experiments and leave the door open for more extensive experimentation.

UCAM-CL-TR-745

Scott Owens, Susmit Sarkar, Peter Sewell:

A better x86 memory model: x86-TSO
(extended version)
March 2009, 52 pages, PDF

Abstract: Real multiprocessors do not provide the sequentially consistent memory that is assumed by most work on semantics and verification. Instead, they have relaxed memory models, typically described in ambiguous prose, which lead to widespread confusion. These are prime targets for mechanized formalization. In previous work we produced a rigorous x86-CC model, formalizing the Intel and AMD architecture specifications of the time, but those turned out to be unsound with respect to actual hardware, as well as arguably too weak to program above. We discuss these issues and present a new x86-TSO model that suffers from neither problem, formalized in HOL4. We believe it is sound with respect to real processors, better reflects the vendors' intentions, and is also better suited for programming. We give two equivalent definitions of x86-TSO: an intuitive operational model based on local write buffers, and an axiomatic total store ordering model, similar to that of the SPARCv8. Both are adapted to handle x86-specific features. We have implemented the axiomatic model in our memevents tool, which calculates the set of all valid executions of test programs, and, for greater confidence, verify the witnesses of such executions directly, with code extracted from a third, more algorithmic, equivalent version of the definition.
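The intuition behind the operational model can be conveyed with a toy simulation (a sketch for illustration, not the HOL4 model): give each hardware thread a FIFO store buffer, let loads forward from the local buffer, and the classic store-buffering litmus test then exhibits the outcome that sequential consistency forbids.

```python
from collections import deque

class TSOMachine:
    """Toy x86-TSO-style operational model: per-thread FIFO store buffers;
    loads read the newest matching entry in the local buffer, else memory."""
    def __init__(self):
        self.mem = {"x": 0, "y": 0}
        self.buf = {0: deque(), 1: deque()}

    def store(self, tid, addr, val):
        self.buf[tid].append((addr, val))      # buffered, not yet globally visible

    def load(self, tid, addr):
        for a, v in reversed(self.buf[tid]):   # store forwarding from own buffer
            if a == addr:
                return v
        return self.mem[addr]

    def flush_one(self, tid):
        addr, val = self.buf[tid].popleft()    # oldest buffered store drains first
        self.mem[addr] = val

# Store-buffering (SB) litmus test: forbidden under SC, allowed under TSO.
m = TSOMachine()
m.store(0, "x", 1)       # thread 0: x := 1
r0 = m.load(0, "y")      # thread 0: r0 := y
m.store(1, "y", 1)       # thread 1: y := 1
r1 = m.load(1, "x")      # thread 1: r1 := x
print(r0, r1)            # -> 0 0, because neither buffer has drained yet
m.flush_one(0); m.flush_one(1)
print(m.mem)             # -> {'x': 1, 'y': 1}
```

Both threads read the stale value 0 even though both stores "happened first" in program order; this is precisely the behaviour an axiomatic total store order must also admit.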

UCAM-CL-TR-746

Shishir Nagaraja, Ross Anderson:

The snooping dragon:
social-malware surveillance of the Tibetan movement
March 2009, 12 pages, PDF

Abstract: In this note we document a case of malware-based electronic surveillance of a political organisation by the agents of a nation state. While malware attacks are not new, two aspects of this case make it worth serious study. First, it was a targeted surveillance attack designed to collect actionable intelligence for use by the police and security services of a repressive state, with potentially fatal consequences for those exposed. Second, the modus operandi combined social phishing with high-grade malware. This combination of well-written malware with well-designed email lures, which we call social malware, is devastatingly effective. Few organisations outside the defence and intelligence sector could withstand such an attack, and although this particular case involved the agents of a major power, the attack could in fact have been mounted by a capable motivated individual. This report is therefore of importance not just to companies who may attract the attention of government agencies, but to all organisations. As social-malware attacks spread, they are bound to target people such as accounts-payable and payroll staff who use computers to make payments. Prevention will be hard. The traditional defence against social malware in government agencies involves expensive and intrusive measures that range from mandatory access controls to tiresome operational security procedures. These will not be sustainable in the economy as a whole. Evolving practical low-cost defences against social-malware attacks will be a real challenge.

UCAM-CL-TR-747

Fei Song, Hongke Zhang, Sidong Zhang,Fernando Ramos, Jon Crowcroft:

An estimator of forward and backward delay for multipath transport
March 2009, 16 pages, PDF

Abstract: Multipath transport protocols require awareness of the capability of different paths being used for transmission. It is well known that round trip time (RTT) can be used to estimate retransmission timeout with reasonable accuracy. However, using RTT to evaluate the delay of forward or backward paths is not always suitable. In fact, these paths are usually dissimilar, and therefore the packet delay can be significantly different in each direction.

We propose a forward and backward delay estimator that aims to solve this problem. Based on the results of the estimator, a new retransmission heuristic mechanism for multipath transport is proposed. With this same technique we also build two other heuristics: a bottleneck bandwidth estimator and a shared congestion detector. These help the sender to choose the high bandwidth path in retransmission and ensure TCP-friendliness in multipath transport, respectively.
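The asymmetry such an estimator exploits can be sketched with four timestamps per probe: sender send/receive times and receiver receive/send times. With unsynchronized clocks the absolute one-way delays are hidden by an unknown offset, but their changes across probes are observable. The timestamps and delays below are fabricated for illustration and do not come from the report.

```python
def delay_split(t1, t2, t3, t4):
    """One probe: sender sends at t1 and hears back at t4 (sender clock);
    receiver receives at t2 and replies at t3 (receiver clock).
    Returns (rtt, raw_forward, raw_backward); the raw one-way values
    include the unknown clock offset, so only their *differences*
    across probes are meaningful."""
    rtt = (t4 - t1) - (t3 - t2)      # processing time removed
    return rtt, t2 - t1, t4 - t3

# Two probes over a path whose forward direction got 30 ms slower while
# the backward direction stayed put (receiver clock runs 1000 ms ahead).
rtt1, f1, b1 = delay_split(0.0, 1040.0, 1045.0, 85.0)     # true fwd 40, back 40
rtt2, f2, b2 = delay_split(100.0, 1170.0, 1175.0, 215.0)  # true fwd 70, back 40
print(rtt2 - rtt1, f2 - f1, b2 - b1)   # -> 30.0 30.0 0.0
```

The whole 30 ms RTT increase is correctly attributed to the forward path, which is the kind of per-direction signal a multipath sender can use when choosing a retransmission path.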

UCAM-CL-TR-748

Marco Canini, Wei Li, Andrew W. Moore:

GTVS: boosting the collection of application traffic ground truth
April 2009, 20 pages, PDF

Abstract: Interesting research in the areas of traffic classification, network monitoring, and application-oriented analysis cannot proceed without real trace data labeled with actual application information. However, hand-labeled traces are an extremely valuable but scarce resource in the traffic monitoring and analysis community, as a result of both privacy concerns and technical difficulties: hardly any possibility exists for payloaded data to be released to the public, while the intensive labor required for getting the ground-truth application information from the data severely constrains the feasibility of releasing anonymized versions of hand-labeled payloaded data.

The usual way to obtain the ground truth is fragile, inefficient and not directly comparable from one work to another. This chapter proposes and details a methodology that significantly boosts the efficiency of compiling the application traffic ground truth. In contrast with other existing work, our approach maintains the high certainty of hand-verification, while striving to save the time and labor that it requires. Further, it is implemented as an easy hands-on tool suite which is now freely available to the public.

In this paper we present a case study using a 30-minute real data trace to guide the readers through our ground-truth classification process. We also present a method, an extension of GTVS, that efficiently classifies HTTP traffic by its purpose.

UCAM-CL-TR-749

Pan Hui, Eiko Yoneki, Jon Crowcroft,Shu-Yan Chan:

Identifying social communities in complex communications for network efficiency
May 2009, 14 pages, PDF

Abstract: Complex communication networks, more particularly Mobile Ad Hoc Networks (MANETs) and Pocket Switched Networks (PSNs), rely on short range radio and device mobility to transfer data across the network. These kinds of mobile networks are dual in nature: they are radio networks and at the same time human networks, and hence knowledge from social networks is also applicable here. In this paper, we demonstrate how identifying social communities can significantly improve forwarding efficiency in terms of delivery ratio and delivery cost. We verify our hypothesis using data from five human mobility experiments and test on two application scenarios, asynchronous messaging and publish/subscribe service.

UCAM-CL-TR-750

Marco Canini, Wei Li, Martin Zadnik,Andrew W. Moore:

AtoZ: an automatic traffic organizer using NetFPGA
May 2009, 27 pages, PDF

Abstract: This paper introduces AtoZ, an automatic traffic organizer that provides end-users with control of how their applications use network resources. Such an approach contrasts with the moves of many ISPs towards network-wide application throttling and provider-centric control of an application's network-usage. AtoZ provides seamless per-application traffic-organizing on gigabit links, with minimal packet-delays and no unintended packet drops.

AtoZ combines the high-speed packet processing of the NetFPGA with an efficient flow-behavior identification method. Currently users can enable AtoZ control over network resources by prohibiting certain applications and controlling the priority of others. We discuss deployment experience and use real traffic to illustrate how such an architecture enables several distinct features: high accuracy, high throughput, minimal delay, and efficient packet labeling – all in a low-cost, robust configuration that works alongside the home or enterprise access-router.

UCAM-CL-TR-751

David C. Turner:

Nominal domain theory for concurrency
July 2009, 185 pages, PDF
PhD thesis (Clare College, December 2008)

Abstract: Domain theory provides a powerful mathematical framework for describing sequential computation, but the traditional tools of domain theory are inapplicable to concurrent computation. Without a general mathematical framework it is hard to compare developments and approaches from different areas of study, leading to time and effort wasted in rediscovering old ideas in new situations.

A possible remedy to this situation is to build a denotational semantics based directly on computation paths, where a process denotes the set of paths that it may follow. This has been shown to be a remarkably powerful idea, but it lacks certain computational features. Notably, it is not possible to express the idea of names and name-generation within this simple path semantics.

Nominal set theory is a non-standard mathematical foundation that captures the notion of names in a general way. Building a mathematical development on top of nominal set theory has the effect of incorporating names into its fabric at a low level. Importantly, nominal set theory is sufficiently close to conventional foundations that it is often straightforward to transfer intuitions into the nominal setting.

Here the original path-based domain theory for concurrency is developed within nominal set theory, which has the effect of systematically adjoining name-generation to the model. This gives rise to an expressive metalanguage, Nominal HOPLA, which supports a notion of name-generation. Its denotational semantics is given entirely in terms of universal constructions on domains. An operational semantics is also presented, and relationships between the denotational and operational descriptions are explored.

The generality of this approach to including name generation into a simple semantic model indicates that it will be possible to apply the same techniques to more powerful domain theories for concurrency, such as those based on presheaves.

UCAM-CL-TR-752

Gerhard P. Hancke:

Security of proximity identification systems
July 2009, 161 pages, PDF
PhD thesis (Wolfson College, February 2008)

Abstract: RFID technology is the prevalent method for implementing proximity identification in a number of security sensitive applications. The perceived proximity of a token serves as a measure of trust and is often used as a basis for granting certain privileges or services. Ensuring that a token is located within a specified distance of the reader is therefore an important security requirement. In the case of high-frequency RFID systems the limited operational range of the near-field communication channel is accepted as implicit proof that a token is in close proximity to a reader. In some instances, it is also presumed that this limitation can provide further security services.

The first part of this dissertation presents attacks against current proximity identification systems. It documents how eavesdropping, skimming and relay attacks can be implemented against HF RFID systems. Experimental setups and practical results are provided for eavesdropping and skimming attacks performed against RFID systems adhering to the ISO 14443 and ISO 15693 standards. These attacks illustrate that the limited operational range cannot prevent unauthorised access to stored information on the token, or ensure that transmitted data remains confidential. The practical implementation of passive and active relay attacks against an ISO 14443 RFID system is also described. The relay attack illustrates that proximity identification should not rely solely on the physical characteristics of the communication channel, even if it could be shown to be location-limited. As a result, it is proposed that additional security measures, such as distance-bounding protocols, should be incorporated to verify proximity claims. A new method, using cover noise, is also proposed to make the backward communication channel more resistant to eavesdropping attacks.

The second part of this dissertation discusses distance-bounding protocols. These protocols determine an upper bound for the physical distance between two parties. A detailed survey of current proposals, investigating their respective merits and weaknesses, identifies general principles governing secure distance-bounding implementations. It is practically shown that an attacker can circumvent the distance bound by implementing attacks at the packet and physical layer of conventional communication channels. For this reason the security of a distance bound depends not only on the cryptographic protocol, but also on the time measurement provided by the underlying communication. Distance-bounding protocols therefore require special channels. Finally, a new distance-bounding protocol and a practical implementation of a suitable distance-bounding channel for HF RFID systems are proposed.
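The arithmetic behind any distance bound is easy to show (a generic illustration, not the protocol proposed in the dissertation): since no signal travels faster than light, the prover can be no further away than the distance light covers in half the measured round-trip time, so timing precision converts directly into distance precision.

```python
C = 299_792_458.0        # speed of light in m/s

def distance_upper_bound(t_round_s, t_proc_s=0.0):
    """Upper bound on the prover's distance from one timed
    challenge-response exchange: signals propagate at most at c."""
    return C * (t_round_s - t_proc_s) / 2.0

# A 100 ns round trip bounds the prover to roughly 15 m ...
print(distance_upper_bound(100e-9))
# ... and each extra nanosecond of timing slack loosens the bound by
# about 15 cm, which is why an ordinary packet-based channel, with its
# microsecond-scale jitter, is far too coarse for distance bounding.
print(distance_upper_bound(101e-9) - distance_upper_bound(100e-9))
```

This is the quantitative reason the dissertation argues for a dedicated low-latency distance-bounding channel rather than reusing the normal communication layer.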

UCAM-CL-TR-753

John L. Miller, Jon Crowcroft:

Carbon: trusted auditing for P2P distributed virtual environments
August 2009, 20 pages, PDF

Abstract: Many Peer-to-Peer Distributed Virtual Environments (P2P DVEs) have been proposed, but none are widely deployed. One significant barrier to deployment is lack of security. This paper presents Carbon, a trusted auditing system for P2P DVEs which provides reasonable security with low per-client overhead. DVEs using Carbon perform offline auditing to evaluate DVE client correctness. Carbon audits can be used to catch DVE clients which break DVE rules – cheaters – so the DVE can punish them. We analyze the impact of applying Carbon to a peer-to-peer game with attributes similar to World of Warcraft. We show that 99.9% of cheaters – of a certain profile – can be caught with guided auditing and 2.3% bandwidth overhead, or 100% of cheaters can be caught with exhaustive auditing and 27% bandwidth overhead. The surprisingly low overhead for exhaustive auditing is the result of the small payload in most DVE packet updates, compared to the larger aggregate payloads in audit messages. Finally, we compare Carbon to PeerReview, and show that for DVE scenarios Carbon consumes significantly fewer resources – in typical cases by an order of magnitude – while sacrificing little protection.

UCAM-CL-TR-754

Frank Stajano, Paul Wilson:

Understanding scam victims:
seven principles for systems security
August 2009, 22 pages, PDF
An updated, abridged and peer-reviewed version of this report appeared in Communications of the ACM 54(3):70-75, March 2011 [doi:10.1145/1897852.1897872]. Please cite the refereed CACM version in any related work.

Abstract: The success of many attacks on computer systems can be traced back to the security engineers not understanding the psychology of the system users they meant to protect. We examine a variety of scams and “short cons” that were investigated, documented and recreated for the BBC TV programme The Real Hustle and we extract from them some general principles about the recurring behavioural patterns of victims that hustlers have learnt to exploit.

We argue that an understanding of these inherent “human factors” vulnerabilities, and the necessity to take them into account during design rather than naïvely shifting the blame onto the “gullible users”, is a fundamental paradigm shift for the security engineer which, if adopted, will lead to stronger and more resilient systems security.

UCAM-CL-TR-755

Yujian Gao, Aimin Hao, Qinping Zhao,Neil A. Dodgson:

Skin-detached surface for interactive large mesh editing
September 2009, 18 pages, PDF

Abstract: We propose a method for interactive deformation of large detailed meshes. Our method allows the users to manipulate the mesh directly using freely-selected handles on the mesh. To best preserve surface details, we introduce a new surface representation, the skin-detached surface. It represents a detailed surface model as a peeled “skin” added over a simplified surface model. The “skin” contains the details of the surface while the simplified mesh maintains the basic shape. The deformation process consists of three steps: At the mesh loading stage, the “skin” is precomputed according to the detailed mesh and detached from the simplified mesh. Then we deform the simplified mesh following the nonlinear gradient domain mesh editing approach to satisfy the handle position constraints. Finally the detailed “skin” is remapped onto the simplified mesh, resulting in a deformed detailed mesh. We investigate the advantages as well as the limitations of our method by implementing a prototype system and applying it to several examples.
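A 2-D analogue of the detach/deform/remap pipeline (a hand-rolled sketch, not the paper's mesh representation): record each detail point as an offset in the local frame of a base segment, deform the base, then re-apply the stored offsets so the detail rides along. The index-based pairing of detail points to base segments is a simplifying assumption.

```python
import math

def frame(p, q):
    """Unit tangent and normal of the base segment p -> q."""
    tx, ty = q[0] - p[0], q[1] - p[1]
    n = math.hypot(tx, ty)
    tx, ty = tx / n, ty / n
    return (tx, ty), (-ty, tx)

def detach(detail, base):
    """Store each 'skin' point as (s, t) coordinates in the local
    frame of its base segment (paired by index for simplicity)."""
    skin = []
    for d, p, q in zip(detail, base, base[1:]):
        (tx, ty), (nx, ny) = frame(p, q)
        dx, dy = d[0] - p[0], d[1] - p[1]
        skin.append((dx * tx + dy * ty, dx * nx + dy * ny))
    return skin

def reattach(skin, base):
    """Re-express the stored (s, t) offsets over a (deformed) base."""
    return [(p[0] + s * tx + t * nx, p[1] + s * ty + t * ny)
            for (s, t), p, q in zip(skin, base, base[1:])
            for (tx, ty), (nx, ny) in [frame(p, q)]]

base = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
detail = [(0.5, 0.2), (1.5, -0.1)]           # bumps riding on the base
skin = detach(detail, base)
# Deform the base (bend the second segment upward); the detail follows.
bent = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(reattach(skin, bent))                  # -> [(0.5, 0.2), (1.1, 0.5)]
```

The first bump is unchanged (its segment did not move) while the second rotates with its segment, keeping its offset relative to the base shape rather than to world space.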

UCAM-CL-TR-756

Hamed Haddadi, Damien Fay, Steve Uhlig,Andrew W. Moore, Richard Mortier,Almerima Jamakovic:

Analysis of the Internet’s structural evolution
September 2009, 13 pages, PDF

Abstract: In this paper we study the structural evolution of the AS topology as inferred from two different datasets over a period of seven years. We use a variety of topological metrics to analyze the structural differences revealed in the AS topologies inferred from the two different datasets. In particular, to focus on the evolution of the relationship between the core and the periphery, we make use of the weighted spectral distribution.

We find that the traceroute dataset has increasing difficulty in sampling the periphery of the AS topology, largely due to limitations inherent to active probing. Such a dataset has too limited a view to properly observe topological changes at the AS-level compared to a dataset largely based on BGP data. We also highlight limitations in current measurements that require a better sampling of particular topological properties of the Internet. Our results indicate that the Internet is changing from a core-centered, strongly customer-provider oriented, disassortative network, to a soft-hierarchical, peering-oriented, assortative network.

UCAM-CL-TR-757

Mark Adcock:

Improving cache performance by runtime data movement
July 2009, 174 pages, PDF
PhD thesis (Christ’s College, June 2009)

Abstract: The performance of a recursive data structure (RDS) increasingly depends on good data cache behaviour, which may be improved by software/hardware prefetching or by ensuring that the RDS has a good data layout. The latter is harder but more effective, and requires solving two separate problems: firstly ensuring that new RDS nodes are allocated in a good location in memory, and secondly preventing a degradation in layout when the RDS changes shape due to pointer updates.

The first problem has been studied in detail, but only two major classes of solutions to the second exist. Layout degradation may be side-stepped by using a ‘cache-aware’ RDS, one designed to have inherently good cache behaviour (e.g. using a B-Tree in place of a binary search tree), but such structures are difficult to devise and implement. A more automatic solution in some languages is to use a ‘layout-improving’ garbage collector, which attempts to improve heap data layout during collection using online profiling of data access patterns. This may carry large performance, memory and latency overheads.

In this thesis we investigate the insertion of code into a program which attempts to move RDS nodes at runtime to prevent or reduce layout degradation. Such code affects only the performance of a program, not its semantics. The body of this thesis is a thorough and systematic evaluation of three different forms of data movement. The first method adapts existing work on static RDS data layout, performing ad-hoc single node movements at a program’s pointer-update sites, which is simple to apply and effective in practice, but the performance gain may be hard to predict. The second method performs infrequent movement of larger groups of nodes, borrowing techniques from garbage collection but also embedding data movement in existing traversals of the RDS; the benefit of performing additional data movement to compact the heap is also demonstrated. The third method restores a pre-chosen layout after each RDS pointer update, which is a complex but effective technique, and may be viewed both as an optimisation and as a way of synthesising new cache-aware RDSs.

Concentrating on both maximising performance while minimising latency and extra memory usage, two fundamental RDSs are used for the investigation, representative of two common data access patterns (linear and branching). The methods of this thesis compare favourably to upper bounds on performance and to the canonical cache-aware solutions. This thesis shows the value of runtime data movement; the methods, as well as producing optimisations useful in their own right, may be used to guide the design of future cache-aware RDSs and layout-improving garbage collectors.

UCAM-CL-TR-758

Alexey Gotsman:

Logics and analyses for concurrent heap-manipulating programs
October 2009, 160 pages, PDF
PhD thesis (Churchill College, September 2009)

Abstract: Reasoning about concurrent programs is difficult because of the need to consider all possible interactions between concurrently executing threads. The problem is especially acute for programs that manipulate shared heap-allocated data structures, since heap manipulation provides more ways for threads to interact. Modular reasoning techniques sidestep this difficulty by considering every thread in isolation under some assumptions on its environment.

In this dissertation we develop modular program logics and program analyses for the verification of concurrent heap-manipulating programs. Our approach is to exploit reasoning principles provided by program logics to construct modular program analyses, and to use this process to obtain further insights into the logics. In particular, we build on concurrent separation logic, a Hoare-style logic that allows modular manual reasoning about concurrent programs written in a simple heap-manipulating programming language.

Our first contribution is to show the soundness of concurrent separation logic without the conjunction rule and the restriction that resource invariants be precise, and to construct an analysis for concurrent heap-manipulating programs that exploits this modified reasoning principle to achieve modularity. The analysis can be used to automatically verify a number of safety properties, including memory safety, data-structure integrity, data-race freedom, the absence of memory leaks, and the absence of assertion violations. We show that we can view the analysis as generating proofs in our variant of the logic, which enables the use of its results in proof-carrying code or theorem proving systems.

Reasoning principles expressed by program logics are most often formulated for only idealised programming constructs. Our second contribution is to develop logics and analyses for modular reasoning about features present in modern languages and libraries for concurrent programming: storable locks (i.e., locks dynamically created and destroyed in the heap), first-order procedures, and dynamically-created threads.

UCAM-CL-TR-759

Eric Koskinen, Matthew Parkinson, Maurice Herlihy:

Coarse-grained transactions (extended version)
August 2011, 34 pages, PDF

Abstract: Traditional transactional memory systems suffer from overly conservative conflict detection, yielding so-called false conflicts, because they are based on fine-grained, low-level read/write conflicts. In response, the recent trend has been toward integrating various abstract data-type libraries using ad-hoc methods of high-level conflict detection. These proposals have led to improved performance, but a lack of a unified theory has led to confusion in the literature.

We clarify these recent proposals by defining a generalization of transactional memory in which a transaction consists of coarse-grained (abstract data-type) operations rather than simply memory read/write operations. We provide semantics for both pessimistic (e.g. transactional boosting) and optimistic (e.g. traditional TMs and recent alternatives) execution. We show that both are included in the standard atomic semantics, yet find that the choice imposes different requirements on the coarse-grained operations: pessimistic execution requires operations to be left-movers, optimistic requires them to be right-movers. Finally, we discuss how the semantics applies to numerous TM implementation details discussed widely in the literature.

UCAM-CL-TR-760

Alan F. Blackwell, Lee Wilson, Alice Street, Charles Boulton, John Knell:

Radical innovation: crossing knowledge boundaries with interdisciplinary teams
November 2009, 124 pages, PDF

Abstract: Interdisciplinary innovation arises from the positive effects that result when stepping across the social boundaries that we structure knowledge by. Those boundaries include academic disciplines, government departments, companies’ internal functions, companies and sectors, and the boundaries between these domains. In the knowledge economy, it is often the case that the right knowledge to solve a problem is in a different place to the problem itself, so interdisciplinary innovation is an essential tool for the future. There are also many problems today that need more than one kind of knowledge to solve them, so interdisciplinary innovation is also an essential tool for the challenging problems of today.

This report presents the results of an in-depth study into successful interdisciplinary innovation, focusing on the personal experiences of the people who achieve it. It is complementary to organisational research, and to research on the economic impact of innovation, but has primarily adopted perspectives and methods from other disciplines. Instead, this report has been developed by a team that is itself interdisciplinary, with a particular focus on anthropology, design research, and strategic policy. It also draws on reports from expert witnesses and invited commentators in many other fields.

UCAM-CL-TR-761

Jonathan J. Davies:

Programming networks of vehicles
November 2009, 292 pages, PDF
PhD thesis (Churchill College, September 2008)

Abstract: As computers become smaller in size and advances in communications technology are made, we hypothesise that a new range of applications involving computing in road vehicles will emerge. These applications may be enabled by the future arrival of general-purpose computing platforms in vehicles. Many of these applications will involve the collection, processing and distribution of data sampled by sensors on large numbers of vehicles. This dissertation is primarily concerned with addressing how these applications can be designed and implemented by programmers.

We explore how a vehicular sensor platform may be built and how data from a variety of sensors can be sampled and stored. Applications exploiting such platforms will infer higher-level information from the raw sensor data collected. We present the design and implementation of one such application, which involves processing vehicles’ location histories into an up-to-date road map.

Our experience shows that there is a problem with programming this kind of application: the number of vehicles and the nature of the computational infrastructure available are not known until the application is executed. By comparison, existing approaches to programming applications in wireless sensor networks tend to assume that the nature of the network architecture is known at design-time. This is not an appropriate assumption to make in vehicular sensor networks. Instead, this dissertation proposes that the functionality of applications is designed and implemented at a higher level, and the problem of deciding how and where its components are to be executed is left to a compiler. We call this ‘late physical binding’.

This approach brings the benefit that applications can be automatically adapted and optimised for execution in a wide range of environments. We describe a suite of transformations which can change the order in which components of the program are executed whilst preserving its semantic integrity. These transformations may affect several of the application’s characteristics, such as its execution time or energy consumption.

The practical utility of this approach is demonstrated through a novel programming language based on Java. Two examples of diverse applications are presented which demonstrate that the language and compiler can be used to create non-trivial applications. Performance measurements show that the compiler can introduce parallelism to make more efficient use of resources and reduce an application’s execution time. One of the applications belongs to a class of distributed systems beyond merely processing vehicular sensor data, suggesting that the late physical binding paradigm has broader application to other areas of distributed computing.

UCAM-CL-TR-762

Evangelia Kalyvianaki:

Resource provisioning for virtualized server applications
November 2009, 161 pages, PDF
PhD thesis (Lucy Cavendish College, August 2008)

Abstract: Data centre virtualization creates an agile environment for application deployment. Applications run within one or more virtual machines and are hosted on various servers throughout the data centre. One key mechanism provided by modern virtualization technologies is dynamic resource allocation. Using this technique, virtual machines can be allocated resources as required and therefore occupy only the necessary resources for their hosted application. In fact, two of the main challenges faced by contemporary data centres, server consolidation and power saving, can be tackled efficiently by capitalising on this mechanism.

This dissertation shows how to dynamically adjust the CPU resources allocated to virtualized server applications in the presence of workload fluctuations. In particular, it employs a reactive approach to resource provisioning based on feedback control and introduces five novel controllers. All five controllers adjust the application allocations based on past utilisation observations.

A subset of the controllers integrate the Kalman filtering technique to track the utilisations, based on which they predict the allocations for the next interval. This approach is particularly attractive for the resource management problem since the Kalman filter uses the evolution of past utilisations to adjust the allocations. In addition, the adaptive Kalman controller, which adjusts its parameters online and dynamically estimates the utilisation dynamics, is able to differentiate substantial workload changes from small fluctuations for unknown workloads.
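As a rough illustration of the filtering idea (a sketch, not the thesis’s actual controllers: the noise parameters q and r, the headroom factor, and the utilisation trace are all illustrative assumptions), a scalar Kalman filter can track a CPU utilisation trace and set each interval’s allocation from the filtered estimate plus headroom:

```python
def kalman_allocations(utilisations, q=1.0, r=4.0, headroom=1.2):
    """Return per-interval CPU allocations (% of a core) for a utilisation trace."""
    x, p = utilisations[0], 1.0           # state estimate and its variance
    allocs = []
    for z in utilisations:
        p += q                            # predict: utilisation as a random walk
        k = p / (p + r)                   # Kalman gain
        x += k * (z - x)                  # update with the new observation z
        p *= 1.0 - k
        allocs.append(min(100.0, x * headroom))
    return allocs

# A workload that jumps mid-trace: the filter smooths small fluctuations
# but follows the sustained change.
trace = [50, 52, 48, 51, 80, 82, 81, 79]
allocs = kalman_allocations(trace)
```

Tuning q (process noise) against r (measurement noise) sets how quickly the allocation follows a genuine workload shift versus ignoring transient spikes, which is the distinction the adaptive controller above estimates online.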

In addition, this dissertation captures, models, and builds controllers based on the CPU resource coupling of application components. In the case of multi-tier applications, these controllers collectively allocate resources to all application tiers, detecting saturation points across components. This results in them acting faster to workload variations than their single-tier counterparts.

All controllers are evaluated against the Rubis benchmark application deployed on a prototype virtualized cluster built for this purpose.

UCAM-CL-TR-763

Saar Drimer:

Security for volatile FPGAs
November 2009, 169 pages, PDF
PhD thesis (Darwin College, August 2009)

Abstract: With reconfigurable devices fast becoming complete systems in their own right, interest in their security properties has increased. While research on “FPGA security” has been active since the early 2000s, few have treated the field as a whole, or framed its challenges in the context of the unique FPGA usage model and application space. This dissertation sets out to examine the role of FPGAs within a security system and how solutions to security challenges can be provided. I offer the following contributions:

I motivate authenticating configurations as an additional capability of FPGA configuration logic, and then describe a flexible security protocol for remote reconfiguration of FPGA-based systems over insecure networks. Non-volatile memory devices are used for persistent storage when required, and complement the lack of features in some FPGAs with tamper proofing in order to maintain specified security properties. A unique advantage of the protocol is that it can be implemented on some existing FPGAs (i.e., it does not require FPGA vendors to add functionality to their devices). Also proposed is a solution to the “IP distribution problem”, where designs from multiple sources are integrated into a single bitstream, yet must maintain their confidentiality.

I discuss the difficulty of reproducing and comparing FPGA implementation results reported in the academic literature. Concentrating on cryptographic implementations, problems are demonstrated through designing three architecture-optimized variants of the AES block cipher and analyzing the results to show that single figures of merit, namely “throughput” or “throughput per slice”, are often meaningless without the context of an application. To set a precedent for reproducibility in our field, the HDL source code, simulation testbenches and compilation instructions are made publicly available for scrutiny and reuse.

Finally, I examine payment systems as ubiquitous embedded devices, and evaluate their security vulnerabilities as they interact in a multi-chip environment. Using FPGAs as an adversarial tool, a man-in-the-middle attack against these devices is demonstrated. An FPGA-based defense is also demonstrated: the first secure wired “distance bounding” protocol implementation. This is then put in the context of securing reconfigurable systems.

UCAM-CL-TR-764

Caroline V. Gasperin:

Statistical anaphora resolution in biomedical texts
December 2009, 124 pages, PDF
PhD thesis (Clare Hall, August 2008)

Abstract: This thesis presents a study of anaphora in biomedical scientific literature and focuses on tackling the problem of anaphora resolution in this domain. Biomedical literature has been the focus of many information extraction projects; there are, however, very few works on anaphora resolution in biomedical scientific full-text articles. Resolving anaphora is an important step in the identification of mentions of biomedical entities about which information could be extracted.

We have identified coreferent and associative anaphoric relations in biomedical texts. Among associative relations we were able to distinguish 3 main types: biotype, homolog and set-member relations. We have created a corpus of biomedical articles that are annotated with anaphoric links between noun phrases referring to biomedical entities of interest. Such noun phrases are typed according to a scheme that we have developed based on the Sequence Ontology; it distinguishes 7 types of entities: gene, part of gene, product of gene, part of product, subtype of gene, supertype of gene and gene variant.

We propose a probabilistic model for the resolution of anaphora in biomedical texts. The model seeks to find the antecedents of anaphoric expressions, both coreferent and associative, and also to identify discourse-new expressions. The model secures good performance despite being trained on a small corpus: it achieves 55-73% precision and 57-63% recall on coreferent cases, and reasonable performance on different classes of associative cases. We compare the performance of the model with a rule-based baseline system that we have also developed, a naive Bayes system and a decision trees system, showing that ours outperforms the others.

We have experimented with active learning in order to select training samples to improve the performance of our probabilistic model. It was not, however, more successful than random sampling.

UCAM-CL-TR-765

Magnus O. Myreen:

Formal verification of machine-code programs
December 2009, 109 pages, PDF
PhD thesis (Trinity College, December 2008)

Abstract: Formal program verification provides mathematical means of increasing assurance for the correctness of software. Most approaches to program verification are either fully automatic and prove only weak properties, or alternatively are manual and labour intensive to apply; few target realistically modelled machine code. The work presented in this dissertation aims to ease the effort required in proving properties of programs on top of detailed models of machine code. The contributions are novel approaches for both verification of existing programs and methods for automatically constructing correct code.

For program verification, this thesis presents a new approach based on translation: the problem of proving properties of programs is reduced, via fully-automatic deduction, to a problem of proving properties of recursive functions. The translation from programs to recursive functions is shown to be implementable in a theorem prover both for simple while-programs as well as real machine code. This verification-after-translation approach has several advantages over established approaches of verification condition generation. In particular, the new approach does not require annotating the program with assertions. More importantly, the proposed approach separates the verification proof from the underlying model so that specific resource names, some instruction orderings and certain control-flow structures become irrelevant. As a result, proof reuse is enabled to a greater extent than in currently used methods. The scalability of this new approach is illustrated through the verification of ARM, x86 and PowerPC implementations of a copying garbage collector.
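The translation idea can be illustrated with a toy example (an assumed simplification, far removed from the thesis’s machine-code setting and HOL4 mechanisation): an imperative loop becomes a recursive function over the loop’s state, and the property is then proved of that function:

```python
# Imperative program: sum the integers 1..n with a while loop.
def sum_loop(n):
    acc = 0
    while n > 0:
        acc, n = acc + n, n - 1
    return acc

# The "translated" form: a recursive function mapping loop state to result.
# Proving sum_rec(0, n) == n*(n+1)//2 establishes the property of the loop,
# without annotating the loop itself with assertions.
def sum_rec(acc, n):
    return acc if n <= 0 else sum_rec(acc + n, n - 1)
```

The point of the separation is that `sum_rec` mentions no machine registers or instruction ordering, so a proof about it survives changes to those details.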

For construction of correct code, this thesis presents a new compiler which maps functions from logic, via proof, down to multiple carefully modelled commercial machine languages. Unlike previously published work on compilation from higher-order logic, this compiler allows input functions to be partially specified and supports a broad range of user-defined extensions. These features enabled the production of formally verified machine-code implementations of a LISP interpreter, as a case study.

The automation and proofs have been implemented in the HOL4 theorem prover, using a new machine-code Hoare triple instantiated to detailed specifications of ARM, x86 and PowerPC instruction set architectures.

UCAM-CL-TR-766

Julian M. Smith:

Towards robust inexact geometric computation
December 2009, 186 pages, PDF
PhD thesis (St. Edmund’s College, July 2009)

Abstract: Geometric algorithms implemented using rounded arithmetic are prone to robustness problems. Geometric algorithms are often a mix of arithmetic and combinatorial computations, arising from the need to create geometric data structures that are themselves a complex mix of numerical and combinatorial data. Decisions that influence the topology of a geometric structure are made on the basis of certain arithmetic calculations, but the inexactness of these calculations may lead to inconsistent decisions, causing the algorithm to produce a topologically invalid result or to fail catastrophically. The research reported here investigates ways to produce robust algorithms with inexact computation.

I present two algorithms for operations on piecewise linear (polygonal/polyhedral) shapes. Both algorithms are topologically robust, meaning that they are guaranteed to generate a topologically valid result from a topologically valid input, irrespective of numerical errors in the computations. The first algorithm performs the Boolean operation in 3D, and also in 2D. The main part of this algorithm is a series of interdependent operations. The relationship between these operations ensures a consistency which, I prove, guarantees the generation of a shape representation with valid topology. The basic algorithm may generate geometric artifacts such as gaps and slivers, which generally can be removed by a data-smoothing post-process. The second algorithm presented performs simplification in 2D, converting a geometrically invalid (but topologically valid) shape representation into one that is fully valid. This algorithm is based on a variant of the Bentley-Ottmann sweep line algorithm, but with additional rules to handle situations not possible under an exact implementation.

Both algorithms are presented in the context of what is required of an algorithm in order for it to be classed as robust in some sense. I explain why the formulaic approach used for the Boolean algorithm cannot readily be used for the simplification process. I also give essential code details for a C++ implementation of the 2D simplification algorithm, and discuss the results of extreme tests designed to show up any problems. Finally, I discuss floating-point arithmetic, present error analysis for the floating-point computation of the intersection point between two segments in 2D, and discuss how such errors affect both the simplification algorithm and the basic Boolean algorithm in 2D.

UCAM-CL-TR-767

Massimo Ostilli, Eiko Yoneki, Ian X. Y. Leung, Jose F. F. Mendes, Pietro Lio, Jon Crowcroft:

Ising model of rumour spreading in interacting communities
January 2010, 24 pages, PDF

Abstract: We report a preliminary investigation on interactions between communities in a complex network using the Ising model to analyse the spread of information among real communities. The inner opinion of a given community is forced to change through the introduction of a unique external source, and we analyse how the other communities react to this change. We model two conceptual external sources, namely “strong-belief” and “propaganda”, by an infinitely strong inhomogeneous external field and a finite uniform external field, respectively. In the former case, the community changes independently from other communities, while in the latter case it changes according also to interactions with the other communities. We apply our model to synthetic networks as well as various real-world data ranging from human physical contact networks to online social networks. The experimental results using real-world data clearly demonstrate two distinct scenarios of phase transitions, characterised by the presence of strong memory effects when the graph and coupling parameters are above a critical threshold.
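A deterministic caricature of this setup (an assumed simplification: two fully connected spin communities under zero-temperature majority-rule dynamics rather than the paper’s finite-temperature Ising model; all parameter values are illustrative) shows the threshold behaviour, with the second community either resisting or following an external field applied to the first:

```python
def spread(n=10, j_intra=1.0, j_inter=0.4, h=-2.0, rounds=10):
    """Two coupled spin communities; community A feels an external field h."""
    a, b = [1] * n, [1] * n                    # both start with opinion +1
    for _ in range(rounds):
        ma, mb = sum(a) / n, sum(b) / n        # community magnetisations
        # Each spin aligns with the sign of its mean local field.
        a = [1 if j_intra * ma + j_inter * mb + h > 0 else -1 for _ in a]
        b = [1 if j_intra * mb + j_inter * ma > 0 else -1 for _ in b]
    return sum(a) / n, sum(b) / n

weak = spread(j_inter=0.4, h=-2.0)    # weak coupling: B resists A's change
strong = spread(j_inter=1.5, h=-3.0)  # strong coupling: B follows A
```

Below the coupling threshold the second community keeps its opinion; above it, the field-driven change propagates, echoing the two scenarios in the abstract.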

UCAM-CL-TR-768

Cecily Morrison, Adona Iosif, Miklos Danka:

Report on existing open-source electronic medical records
February 2010, 12 pages, PDF

Abstract: In this report we provide an overview of existing open-source electronic medical records and assess them against the criteria established by the EViDence group.

UCAM-CL-TR-769

Sriram Srinivasan:

Kilim: A server framework with lightweight actors, isolation types and zero-copy messaging
February 2010, 127 pages, PDF
PhD thesis (King’s College, February 2010)

Abstract: Internet services are implemented as hierarchical aggregates of communicating components: networks of data centers, networks of clusters in a data center, connected servers in a cluster, and multiple virtual machines on a server machine, each containing several operating system processes. This dissertation argues for extending this structure to the intra-process level, with networks of communicating actors. An actor is a single-threaded state machine with a private heap and a thread of its own. It communicates with other actors using well-defined and explicit messaging protocols. Actors must be light enough to comfortably match the inherent concurrency in the problem space, and to exploit all available parallelism. Our aims are two-fold: (a) to treat SMP systems as they really are: distributed systems with eventual consistency, and (b) to recognize from the outset that a server is always part of a larger collection of communicating components, thus eliminating the mindset mismatch between concurrent programming and distributed programming.

Although the actor paradigm is by no means new, our design points are informed by drawing parallels between the macro and micro levels. As with components in a distributed system, we expect that actors must be isolatable in a number of ways: memory isolation, fault isolation, upgrade isolation, and execution isolation. The application should be able to have a say in actor placement and scheduling, and actors must be easily monitorable.

Our primary contribution is in showing that these requirements can be satisfied in a language and environment such as Java, without changes to the source language or to the virtual machine, and without leaving much of the idiomatic ambit of Java, with its mindset of pointers and mutable state. In other words, one does not have to move to a concurrency-oriented language or to an entirely immutable object paradigm.

We demonstrate an open-source toolkit called Kilim that provides (a) ultra-lightweight actors (faster and lighter than extant environments such as Erlang), (b) a type system that guarantees memory isolation between threads by separating internal objects from exportable messages and by enforcing ownership and structural constraints on the latter (linearity and tree-structure, respectively) and (c) a library with I/O support and customizable synchronization constructs and schedulers.

We show that this solution is simpler to program than extant solutions, yet statically guaranteed to be free of low-level data races. It is also faster, more scalable and more stable (in increasing scale) in two industrial-strength evaluations: interactive web services (comparing Kilim Web Server to Jetty) and databases (comparing Berkeley DB to a Kilim variant of it).

UCAM-CL-TR-770

Jatinder Singh:

Controlling the dissemination and disclosure of healthcare events
February 2010, 193 pages, PDF
PhD thesis (St. John’s College, September 2009)

Abstract: Information is central to healthcare: for proper care, information must be shared. Modern healthcare is highly collaborative, involving interactions between users from a range of institutions, including primary and secondary care providers, researchers, government and private organisations. Each has specific data requirements relating to the service they provide, and must be informed of relevant information as it occurs.

Personal health information is highly sensitive. Those who collect/hold data as part of the care process are responsible for protecting its confidentiality, in line with patient consent, codes of practice and legislation. Ideally, one should receive only that information necessary for the tasks they perform, on a need-to-know basis.

Healthcare requires mechanisms to strictly control information dissemination. Many solutions fail to account for the scale and heterogeneity of the environment. Centrally managed data services impede the local autonomy of health institutions, impacting security by diminishing accountability and increasing the risks/impacts of incorrect disclosures. Direct, synchronous (request-response) communication requires an enumeration of every potential information source/sink. This is impractical when considering health services at a national level. Healthcare presents a data-driven environment highly amenable to an event-based infrastructure, which can inform, update and alert relevant parties of incidents as they occur. Event-based data dissemination paradigms, while efficient and scalable, generally lack the rigorous access control mechanisms required for health infrastructure.

This dissertation describes how publish/subscribe, an asynchronous, push-based, many-to-many middleware communication paradigm, is extended to include mechanisms for actively controlling information disclosure. We present Interaction Control: a data-control layer above a publish/subscribe service allowing the definition of context-aware policy rules to authorise information channels, transform information and restrict data propagation according to the circumstances. As dissemination policy is defined at the broker level and enforced by the middleware, client compliance is ensured. Although policy enforcement involves extra processing, we show that in some cases the control mechanisms can actually improve performance over a general publish/subscribe implementation. We build Interaction Control mechanisms into integrated database-brokers to provide a rich representation of state, while facilitating audit, which is essential for accountability.
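A minimal sketch of broker-enforced disclosure policy (an illustrative reconstruction, not the thesis’s Interaction Control implementation: the event fields, policy rule and class names are invented for the example) shows a broker applying each subscriber’s policy, which may veto an event or transform it before delivery:

```python
class Broker:
    """Publish/subscribe broker that enforces per-subscriber policy rules."""
    def __init__(self):
        self.subs = []                      # (callback, policy) pairs

    def subscribe(self, callback, policy):
        self.subs.append((callback, policy))

    def publish(self, event):
        for callback, policy in self.subs:
            out = policy(dict(event))       # policy may transform or veto
            if out is not None:
                callback(out)

# Example policy: a research subscriber may see lab results,
# but patient identifiers are stripped and other events are withheld.
def research_policy(event):
    if event.get("type") != "lab_result":
        return None                         # restrict propagation
    event.pop("patient_id", None)           # transform: remove identifier
    return event

received = []
broker = Broker()
broker.subscribe(received.append, research_policy)
broker.publish({"type": "lab_result", "patient_id": "P1", "value": 7.2})
broker.publish({"type": "admission", "patient_id": "P2"})
```

Because the rule runs in the broker rather than the client, the subscriber never sees the withheld event or the stripped identifier, which is the sense in which client compliance is ensured.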

Healthcare requires the sharing of sensitive information across federated domains of administrative control. Interaction Control provides the means for balancing the competing concerns of information sharing and protection. It enables those responsible for information to meet their data management obligations, through specification of fine-grained disclosure policy.

UCAM-CL-TR-771

Cecily Morrison:

Bodies-in-Space: investigating technology usage in co-present group interaction
March 2010, 147 pages, PDF
PhD thesis (Darwin College, August 2009)

Abstract: With mobile phones in people’s pockets, digital devices in people’s homes, and information systems in group meetings at work, technology is frequently present when people interact with each other. Unlike devices used by a single person at a desk, people, rather than machines, are the main focus in social settings. An important difference then between these two scenarios, individual and group, is the role of the body. Although non-verbal behaviour is not part of human-computer interaction, it is very much part of human-human interaction. This dissertation explores bodies-in-space: people’s use of spatial and postural positioning of their bodies to maintain a social interaction when technology is supporting the social interaction of a co-present group.

I begin this dissertation with a review of literature, looking at how and when bodies-in-space have been accounted for in research and design processes of technology for co-present groups. I include examples from both human-computer interaction, as well as the social sciences more generally. Building on this base, the following four chapters provide examples and discussion of methods to: (1) see (analytically), (2) notate, (3) adjust (choreograph), and (4) research in the laboratory, bodies-in-space. I conclude with reflections on the value of capturing bodies-in-space in the process of designing technology for co-present groups, and emphasise a trend towards end-user involvement and its consequences for the scope of human-computer interaction research.

All of the research in this dissertation derives from, and relates to, the real-world context of an intensive care unit of a hospital, and was part of assessing the deployment of an electronic patient record.

UCAM-CL-TR-772

Matthew R. Lakin:

An executable meta-language for inductive definitions with binders
March 2010, 171 pages, PDF
PhD thesis (Queens’ College, March 2010)

Abstract: A testable prototype can be invaluable for identifying bugs during the early stages of language development. For such a system to be useful in practice it should be quick and simple to generate prototypes from the language specification.

This dissertation describes the design and development of a new programming language called alphaML, which extends traditional functional programming languages with specific features for producing correct, executable prototypes. The most important new features of alphaML are for the handling of names and binding structures in user-defined languages. To this end, alphaML uses the techniques of nominal sets (due to Pitts and Gabbay) to represent names explicitly and handle binding correctly up to alpha-renaming. The language also provides built-in support for constraint solving and non-deterministic search.
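The binding problem that alphaML addresses can be illustrated with a toy alpha-equivalence check on lambda terms (an invented example in a general-purpose language, not alphaML’s nominal-sets machinery; the term encoding is an assumption of the example): two terms are equal only up to consistent renaming of bound names.

```python
def alpha_eq(t1, t2, env1=None, env2=None, depth=0):
    """Alpha-equivalence of terms: ("var", x) | ("lam", x, body) | ("app", f, a)."""
    env1, env2 = env1 or {}, env2 or {}
    if t1[0] != t2[0]:
        return False
    if t1[0] == "var":
        # Bound names must map to the same binder level; free names must match.
        return env1.get(t1[1], t1[1]) == env2.get(t2[1], t2[1])
    if t1[0] == "lam":
        # Record both binders at the same level, de Bruijn style.
        return alpha_eq(t1[2], t2[2],
                        {**env1, t1[1]: depth}, {**env2, t2[1]: depth},
                        depth + 1)
    return (alpha_eq(t1[1], t2[1], env1, env2, depth) and
            alpha_eq(t1[2], t2[2], env1, env2, depth))

same = alpha_eq(("lam", "x", ("var", "x")), ("lam", "y", ("var", "y")))  # λx.x vs λy.y
diff = alpha_eq(("lam", "x", ("var", "y")), ("lam", "y", ("var", "y")))  # λx.y vs λy.y
```

Getting this right for every construct of a user-defined language is exactly the tedium that a binding-aware meta-language is meant to remove.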

We begin by presenting a generalised notion of systems defined by a set of schematic inference rules. This is our model for the kind of languages that might be implemented using alphaML. We then present the syntax, type system and operational semantics of the alphaML language and proceed to define a sound and complete embedding of schematic inference rules. We turn to program equivalence and define a standard notion of operational equivalence between alphaML expressions and use this to prove correctness results about the representation of data terms involving binding and about schematic formulae and inductive definitions.

The fact that binding can be represented correctly in alphaML is interesting for technical reasons, because the language dispenses with the notion of globally distinct names present in most systems based on nominal methods. These results, along with the encoding of inference rules, constitute the main technical payload of the dissertation. However, our approach complicates the solving of constraints between terms. Therefore, we develop a novel algorithm for solving equality and freshness constraints between nominal terms which does not rely on standard devices such as swappings and suspended permutations. Finally, we discuss an implementation of alphaML, and conclude with a summary of the work and a discussion of possible future extensions.
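alphaML itself is not reproduced here, but the nominal-sets treatment of binding that the abstract describes can be illustrated in miniature: alpha-equivalence of lambda terms decided by swapping each binder with a common fresh name. The following is a language-neutral Python sketch; the term encoding is invented for illustration and is not alphaML syntax.

```python
import itertools

def swap(a, b, t):
    """Apply the name transposition (a b) to every name in term t.
    Terms are ('var', x), ('lam', x, body) or ('app', f, arg)."""
    kind = t[0]
    if kind == 'var':
        x = t[1]
        return ('var', b if x == a else a if x == b else x)
    if kind == 'lam':
        x, body = t[1], t[2]
        x2 = b if x == a else a if x == b else x
        return ('lam', x2, swap(a, b, body))
    return ('app', swap(a, b, t[1]), swap(a, b, t[2]))

def names(t):
    """All names occurring anywhere in t (bound or free)."""
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'lam':
        return {t[1]} | names(t[2])
    return names(t[1]) | names(t[2])

def alpha_eq(t, s):
    """Equality of terms up to alpha-renaming, in the nominal style."""
    if t[0] != s[0]:
        return False
    if t[0] == 'var':
        return t[1] == s[1]
    if t[0] == 'app':
        return alpha_eq(t[1], s[1]) and alpha_eq(t[2], s[2])
    # lam: bodies must agree after swapping each binder with
    # one name fresh for both terms
    fresh = next(n for n in (f'v{i}' for i in itertools.count())
                 if n not in names(t) | names(s))
    return alpha_eq(swap(t[1], fresh, t[2]), swap(s[1], fresh, s[2]))
```

The key nominal idea is that `swap` is a bijective renaming, so comparing bodies after swapping both binders with a single fresh name captures equality up to alpha-renaming without ever needing globally distinct names.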

UCAM-CL-TR-773

Thomas J. Cashman:

NURBS-compatible subdivision surfaces
March 2010, 99 pages, PDF
PhD thesis (Queens’ College, January 2010)

Abstract: Two main technologies are available to design and represent freeform surfaces: Non-Uniform Rational B-Splines (NURBS) and subdivision surfaces. Both representations are built on uniform B-splines, but they extend this foundation in incompatible ways, and different industries have therefore established a preference for one representation over the other. NURBS are the dominant standard for Computer-Aided Design, while subdivision surfaces are popular for applications in animation and entertainment. However, there are benefits of subdivision surfaces (arbitrary topology) which would be useful within Computer-Aided Design, and features of NURBS (arbitrary degree and non-uniform parametrisations) which would make good additions to current subdivision surfaces.

I present NURBS-compatible subdivision surfaces, which combine topological freedom with the ability to represent any existing NURBS surface exactly. Subdivision schemes that extend either non-uniform or general-degree B-spline surfaces have appeared before, but this dissertation presents the first surfaces able to handle both challenges simultaneously. To achieve this I develop a novel factorisation of knot insertion rules for non-uniform, general-degree B-splines.
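The factorisation itself is the thesis’s contribution and is not given in the abstract; the operation being factorised, single knot insertion for a non-uniform B-spline, follows the standard Boehm rules, sketched here in Python for a curve. The example data is invented, and the sketch assumes the new knot lies strictly inside the knot vector.

```python
def insert_knot(p, knots, ctrl, u):
    """Boehm single-knot insertion for a degree-p B-spline curve.
    knots: non-decreasing knot vector; ctrl: list of control points.
    Returns a refined (knots, ctrl) describing the same curve."""
    # Find the knot span k with knots[k] <= u < knots[k+1].
    k = max(i for i in range(len(knots) - 1) if knots[i] <= u)
    new_knots = knots[:k + 1] + [u] + knots[k + 1:]
    new_ctrl = []
    for i in range(len(ctrl) + 1):
        if i <= k - p:
            new_ctrl.append(ctrl[i])            # unaffected, keep
        elif i > k:
            new_ctrl.append(ctrl[i - 1])        # unaffected, shifted
        else:
            # Affine combination with the non-uniform weight alpha_i.
            a = (u - knots[i]) / (knots[i + p] - knots[i])
            new_ctrl.append(tuple(a * ci + (1 - a) * cj
                                  for ci, cj in zip(ctrl[i], ctrl[i - 1])))
    return new_knots, new_ctrl
```

Because each weight `a` depends on the local knot intervals, repeated insertion is where a factorised, selective scheme of the kind the thesis develops gains its flexibility over inserting knots indiscriminately.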

Many subdivision surfaces have poor second-order behaviour near singularities. I show that it is possible to bound the curvatures of the general-degree subdivision surfaces created using my factorisation. Bounded-curvature surfaces have previously been created by ‘tuning’ uniform low-degree subdivision schemes; this dissertation shows that general-degree schemes can be tuned in a similar way. As a result, I present the first general-degree subdivision schemes with bounded curvature at singularities.

Previous subdivision schemes, both uniform and non-uniform, have inserted knots indiscriminately, but the factorised knot insertion algorithm I describe in this dissertation grants the flexibility to insert knots selectively. I exploit this flexibility to preserve convexity in highly non-uniform configurations, and to create locally uniform regions in place of non-uniform knot intervals. When coupled with bounded-curvature modifications, these techniques give the first non-uniform subdivision schemes with bounded curvature.

I conclude by combining these results to present NURBS-compatible subdivision surfaces: arbitrary-topology, non-uniform and general-degree surfaces which guarantee high-quality second-order surface properties.

UCAM-CL-TR-774

John Wickerson, Mike Dodds,Matthew Parkinson:

Explicit stabilisation for modular rely-guarantee reasoning
March 2010, 29 pages, PDF

Abstract: We propose a new formalisation of stability for Rely-Guarantee, in which an assertion’s stability is encoded into its syntactic form. This allows two advances in modular reasoning. Firstly, it enables Rely-Guarantee, for the first time, to verify concurrent libraries independently of their clients’ environments. Secondly, in a sequential setting, it allows a module’s internal interference to be hidden while verifying its clients. We demonstrate our approach by verifying, using RGSep, the Version 7 Unix memory manager, uncovering a twenty-year-old bug in the process.

UCAM-CL-TR-775

Anil Madhavapeddy:

Creating high-performance, statically type-safe network applications
March 2010, 169 pages, PDF
PhD thesis (Robinson College, April 2006)

Abstract: A typical Internet server finds itself in the middle of a virtual battleground, under constant threat from worms, viruses and other malware seeking to subvert the original intentions of the programmer. In particular, critical Internet servers such as OpenSSH, BIND and Sendmail have had numerous security issues ranging from low-level buffer overflows to subtle protocol logic errors. These problems have cost billions of dollars as the growth of the Internet exposes increasing numbers of computers to electronic malware. Despite the decades of research on techniques such as model-checking, type-safety and other forms of formal analysis, the vast majority of server implementations continue to be written unsafely and informally in C/C++.

In this dissertation we propose an architecture for constructing new implementations of standard Internet protocols which integrates mature formal methods not currently used in deployed servers: (i) static type systems from the ML family of functional languages; (ii) model checking to verify safety properties exhaustively about aspects of the servers; and (iii) generative meta-programming to express high-level constraints for the domain-specific tasks of packet parsing and constructing non-deterministic state machines. Our architecture—dubbed MELANGE—is based on Objective Caml and contributes two domain-specific languages: (i) the Meta Packet Language (MPL), a data description language used to describe the wire format of a protocol and output statically type-safe code to handle network traffic using high-level functional data structures; and (ii) the Statecall Policy Language (SPL) for constructing non-deterministic finite state automata which are embedded into applications and dynamically enforced, or translated into PROMELA and statically model-checked.

Our research emphasises the importance of delivering efficient, portable code which is feasible to deploy across the Internet. We implemented two complex protocols—SSH and DNS—to verify our claims, and our evaluation shows that they perform faster than their standard counterparts OpenSSH and BIND, in addition to providing static guarantees against some classes of errors that are currently a major source of security problems.

UCAM-CL-TR-776

Kathryn E. Gray, Alan Mycroft:

System tests from unit tests
March 2010, 27 pages, PDF

Abstract: Large programs have bugs; software engineering practices reduce the number of bugs in deployed systems by relying on a combination of unit tests, to filter out bugs in individual procedures, and system tests, to identify bugs in an integrated system.

Our previous work showed how Floyd-Hoare triples, {P}C{Q}, could also be seen as unit tests, i.e. formed a link between verification and test-based validation. A transactional-style implementation allows test post-conditions to refer to values of data structures both before and after test execution. Here we argue that this style of specifications, with a transactional implementation, provides a novel source of system tests.

Given a set of unit tests for a system, we can run programs in test mode on real data. Based on an analysis of the unit tests, we intersperse the program’s execution with the pre- and post-conditions from the test suite to expose bugs or incompletenesses in either the program or the test suite itself. We use the results of these tests, as well as branch-trace coverage information, to identify and report anomalies in the running program.

UCAM-CL-TR-777

Thomas Dinsdale-Young, Mike Dodds,Philippa Gardner, Matthew Parkinson,Viktor Vafeiadis:

Concurrent Abstract Predicates
April 2010, 43 pages, PDF

Abstract: Abstraction is key to understanding and reasoning about large computer systems. Abstraction is simple to achieve if the relevant data structures are disjoint, but rather difficult when they are partially shared, as is the case for concurrent modules. We present a program logic for reasoning abstractly about data structures that provides a fiction of disjointness and permits compositional reasoning. The internal details of a module are completely hidden from the client by concurrent abstract predicates. We reason about a module’s implementation using separation logic with permissions, and provide abstract specifications for use by client programs using concurrent abstract predicates. We illustrate our abstract reasoning by building two implementations of a lock module on top of hardware instructions, and two implementations of a concurrent set module on top of the lock module.

UCAM-CL-TR-778

Viktor Vafeiadis:

Automatically proving linearizability
September 2016, 19 pages, PDF

Abstract: This technical report presents a practical automatic verification procedure for proving linearizability (i.e., atomicity and functional correctness) of concurrent data structure implementations. The procedure uses a novel instrumentation to verify logically pure executions, and is evaluated on a number of standard concurrent stack, queue and set algorithms.

UCAM-CL-TR-779

Eric K. Henderson:

A text representation language for contextual and distributional processing
April 2010, 207 pages, PDF
PhD thesis (Fitzwilliam College, 2009)

Abstract: This thesis examines distributional and contextual aspects of linguistic processing in relation to traditional symbolic approaches. Distributional processing is more commonly associated with statistical methods, while an integrated representation of context spanning document and syntactic structure is lacking in current linguistic representations. This thesis addresses both issues through a novel symbolic text representation language.

The text representation language encodes information from all levels of linguistic analysis in a semantically motivated form. Using object-oriented constructs in a recursive structure that can be derived from the syntactic parse, the language provides a common interface for symbolic and distributional processing. A key feature of the language is a recursive treatment of context at all levels of representation. The thesis gives a detailed account of the form and syntax of the language, as well as a treatment of several important constructions. Comparisons are made with other linguistic and semantic representations, and several of the distinguishing features are demonstrated through experiments.

The treatment of context in the representation language is discussed at length. The recursive structure employed in the representation is explained and motivated by issues involving document structure. Applications of the contextual representation in symbolic processing are demonstrated through several experiments.

Distributional processing is introduced using traditional statistical techniques to measure semantic similarity. Several extant similarity metrics are evaluated using a novel evaluation metric involving adjective antonyms. The results provide several insights into the nature of distributional processing, and this motivates a new approach based on characteristic adjectives.

Characteristic adjectives are distributionally derived and semantically differentiated vectors associated with a node in a semantic taxonomy. They are significantly lower-dimensioned than their undifferentiated source vectors, while retaining a strong correlation to their position in the semantic space. Their properties and derivation are described in detail and an experimental evaluation of their semantic content is presented.

Finally, the distributional techniques to derive characteristic adjectives are extended to encompass symbolic processing. Rules involving several types of symbolic patterns are distributionally derived from a source corpus, and applied to the text representation language.

Polysemy is addressed in the derivation by limiting distributional information to monosemous words. The derived rules show a significant improvement at disambiguating nouns in a test corpus.

UCAM-CL-TR-780

Michael Roe:

Cryptography and evidence
May 2010, 75 pages, PDF
PhD thesis (Clare College, April 1997)

Abstract: The invention of public-key cryptography led to the notion that cryptographically protected messages could be used as evidence to convince an impartial adjudicator that a disputed event had in fact occurred. Information stored in a computer is easily modified, and so records can be falsified or retrospectively modified. Cryptographic protection prevents modification, and it is hoped that this will make cryptographically protected data acceptable as evidence. This usage of cryptography to render an event undeniable has become known as non-repudiation. This dissertation is an enquiry into the fundamental limitations of this application of cryptography, and the disadvantages of the techniques which are currently in use. In the course of this investigation I consider the converse problem, of ensuring that an instance of communication between computer systems leaves behind no unequivocal evidence of its having taken place. Features of communications protocols that were seen as defects from the standpoint of non-repudiation can be seen as benefits from the standpoint of this converse problem, which I call “plausible deniability”.

UCAM-CL-TR-781

Minor E. Gordon:

Stage scheduling for CPU-intensive servers
June 2010, 119 pages, PDF
PhD thesis (Jesus College, December 2009)

Abstract: The increasing prevalence of multicore, multiprocessor commodity hardware calls for server software architectures that are cycle-efficient on individual cores and can maximize concurrency across an entire machine. In order to achieve both ends this dissertation advocates stage architectures that put software concurrency foremost and aggressive CPU scheduling that exploits the common structure and runtime behavior of CPU-intensive servers. For these servers, user-level scheduling policies that multiplex one kernel thread per physical core can outperform those that utilize pools of worker threads per stage on CPU-intensive workloads. Boosting the hardware efficiency of servers in userspace means a single machine can handle more users without tuning, operating system modifications, or better hardware.

UCAM-CL-TR-782

Jonathan M. Hayman:

Petri net semantics
June 2010, 252 pages, PDF
PhD thesis (Darwin College, January 2009)

Abstract: Petri nets are a widely-used model for concurrency. By modelling the effect of events on local components of state, they reveal how the events of a process interact with each other, and whether they can occur independently of each other by operating on disjoint regions of state.

Despite their popularity, we are lacking systematic syntax-driven techniques for defining the semantics of programming languages inside Petri nets in a way analogous to how Plotkin’s Structural Operational Semantics defines a transition-system semantics. The first part of this thesis studies a generally-applicable framework for the definition of the net semantics of a programming language.

The net semantics is used to study concurrent separation logic, a Hoare-style logic used to prove partial correctness of pointer-manipulating concurrent programs. At the core of the logic is the notion of separation of ownership of state, allowing us to infer that proven parallel processes operate on the disjoint regions of the state that they are seen to own. In this thesis, a notion of validity of the judgements capturing the subtle notion of ownership is given and soundness of the logic with respect to this model is shown. The model is then used to study the independence of processes arising from the separation of ownership. Following from this, a form of refinement is given which is capable of changing the granularity assumed of the program’s atomic actions.

Amongst the many different models for concurrency, there are several forms of Petri net. Category theory has been used in the past to establish connections between them via adjunctions, often coreflections, yielding common constructions across the models and relating concepts such as bisimulation. The most general forms of Petri net have, however, fallen outside this framework. Essentially, this is due to the most general forms of net having an implicit symmetry in their behaviour that other forms of net cannot directly represent.

The final part of this thesis shows how an abstract framework for defining symmetry in models can be applied to obtain categories of Petri net with symmetry. This is shown to recover, up to symmetry, the universal characterization of unfolding operations on general Petri nets, allowing coreflections up to symmetry between the category of general Petri nets and other categories of net.

UCAM-CL-TR-783

Dan O’Keeffe:

Distributed complex event detection for pervasive computing
July 2010, 170 pages, PDF
PhD thesis (St. John’s College, December 2009)

Abstract: Pervasive computing is a model of information processing that augments computers with sensing capabilities and distributes them into the environment. Many pervasive computing applications are reactive in nature, in that they perform actions in response to events (i.e. changes in state of the environment). However, these applications are typically interested in high-level complex events, in contrast to the low-level primitive events produced by sensors. The goal of this thesis is to support the detection of complex events by filtering, aggregating, and combining primitive events.

Supporting complex event detection in pervasive computing environments is a challenging problem. Sensors may have limited processing, storage, and communication capabilities. In addition, battery-powered sensing devices have limited energy resources. Since they are embedded in the environment, recharging may be difficult or impossible. To prolong the lifetime of the system, it is vital that these energy resources are used efficiently. Further complications arise due to the distributed nature of pervasive computing systems. The lack of a global clock can make it impossible to order events from different sources. Events may be delayed or lost en route to their destination, making it difficult to perform timely and accurate complex event detection. Finally, pervasive computing systems may be large, both geographically and in terms of the number of sensors. Architectures to support pervasive computing applications should therefore be highly scalable.

We make several contributions in this dissertation. Firstly, we present a flexible language for specifying complex event patterns. The language provides developers with a variety of parameters to control the detection process, and is designed for use in an open distributed environment. Secondly, we provide the ability for applications to specify a variety of detection policies. These policies allow the system to determine the best way of handling lost and delayed events. Of particular interest is our ‘no false-positive’ detection policy. This allows a reduction in detection latency while ensuring that only correct events are generated for applications sensitive to false positives. Finally, we show how complex event detector placement can be optimized over a federated event-based middleware. In many cases, detector distribution can reduce unnecessary communication with resource-constrained sensors.
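The dissertation’s pattern language is not shown in the abstract, but the core operation it specifies — filtering and combining time-stamped primitive events into a higher-level complex event — can be sketched as a simple windowed detector. The event schema and threshold rule below are invented for illustration.

```python
from collections import deque

def detect(events, etype, threshold, count, window):
    """Emit a complex event whenever `count` primitive events of type
    `etype` exceed `threshold` within `window` time units.
    Assumes `events` arrive sorted by timestamp."""
    recent = deque()
    for ev in events:
        if ev['type'] != etype or ev['value'] <= threshold:
            continue                        # filter primitive events
        recent.append(ev['time'])
        while recent and ev['time'] - recent[0] > window:
            recent.popleft()                # drop events outside window
        if len(recent) >= count:
            yield {'type': f'{etype}-alert', 'time': ev['time']}
            recent.clear()                  # restart after reporting
```

A distributed implementation of the kind the thesis studies would push such detectors towards the sensors, so that only the rare complex events, not every primitive reading, cross the network.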

UCAM-CL-TR-784

Ripduman Sohan, Andrew Rice,Andrew W. Moore, Kieran Mansley:

Characterizing 10 Gbps network interface energy consumption
July 2010, 10 pages, PDF

Abstract: Understanding server energy consumption is fast becoming an area of interest given the increase in the per-machine energy footprint of modern servers and the increasing number of servers required to satisfy demand. In this paper we (i) quantify the energy overhead of the network subsystem in modern servers by measuring, reporting and analyzing power consumption in six 10 Gbps and four 1 Gbps interconnects at a fine-grained level; (ii) introduce two metrics for calculating the energy efficiency of a network interface from the perspective of network throughput and host CPU usage; (iii) compare the efficiency of multiport 1 Gbps interconnects as an alternative to 10 Gbps interconnects; and (iv) conclude by offering recommendations for improving network energy efficiency for system deployment and network interface designers.
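The paper’s two metric definitions are not given in the abstract; the general shape of such metrics — useful work per unit of energy or of CPU — might look like the following sketch. The function names and example figures are hypothetical, not measurements from the paper.

```python
def bits_per_joule(throughput_bps, power_watts):
    """Throughput-side efficiency: bits transferred per joule consumed."""
    return throughput_bps / power_watts

def throughput_per_cpu(throughput_bps, cpu_utilisation):
    """Host-side efficiency: throughput achieved per unit of CPU usage
    (cpu_utilisation as a fraction in (0, 1])."""
    return throughput_bps / cpu_utilisation

# A hypothetical comparison: one 10 Gbps port drawing 12 W at line rate
# versus a four-port 1 Gbps card drawing 6 W at 4 Gbps aggregate.
ten_g = bits_per_joule(10e9, 12.0)
quad_one_g = bits_per_joule(4e9, 6.0)
```

On these invented numbers the 10 Gbps port moves more bits per joule, which is the style of comparison the paper makes between multiport 1 Gbps and 10 Gbps interconnects.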

UCAM-CL-TR-785

Aaron R. Coble:

Anonymity, information, and machine-assisted proof
July 2010, 171 pages, PDF
PhD thesis (King’s College, January 2010)

Abstract: This report demonstrates a technique for proving the anonymity guarantees of communication systems, using a mechanised theorem-prover. The approach is based on Shannon’s theory of information and can be used to analyse probabilistic programs. The information-theoretic metrics that are used for anonymity provide quantitative results, even in the case of partial anonymity. Many of the developments in this text are applicable to information leakage in general, rather than solely to privacy properties. By developing the framework within a mechanised theorem-prover, all proofs are guaranteed to be logically and mathematically consistent with respect to a given model. Moreover, the specification of a system can be parameterised and desirable properties of the system can quantify over those parameters; as a result, properties can be proved about the system in general, rather than specific instances.

In order to develop the analysis framework described in this text, the underlying theories of information, probability, and measure had to be formalised in the theorem-prover; those formalisations are explained in detail. That foundational work is of general interest and not limited to the applications illustrated here. The meticulous, extensional approach that has been taken ensures that mathematical consistency is maintained.

A series of examples illustrate how formalised information theory can be used to analyse and prove the information leakage of programs modelled in the theorem-prover. Those examples consider a number of different threat models and show how they can be characterised in the framework proposed.
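The formalisation itself lives inside the theorem-prover, but the basic information-theoretic quantity involved — the Shannon entropy of an attacker’s probability distribution over possible senders, often normalised to a degree of anonymity — can be computed directly. This is a plain-Python illustration, not the mechanised development.

```python
from math import log2

def entropy(dist):
    """Shannon entropy H(X) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * log2(p) for p in dist if p > 0)

def anonymity_degree(dist):
    """Normalised anonymity H(X) / log2(N): 1.0 means the attacker's
    view is uniform over all N candidates, 0.0 means fully identified
    (the metric style of Diaz et al. / Serjantov and Danezis)."""
    n = len(dist)
    return entropy(dist) / log2(n) if n > 1 else 0.0
```

A quantitative metric like this is what allows the framework to report partial anonymity, rather than only a yes/no verdict.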

Finally, the tools developed are used to prove the anonymity of the dining cryptographers (DC) protocol, thereby demonstrating the use of the framework and its applicability to proving privacy properties; the DC protocol is a standard benchmark for new methods of analysing anonymity systems. This work includes the first machine-assisted proof of anonymity of the DC protocol for an unbounded number of cryptographers.

UCAM-CL-TR-786

Arnab Banerjee:

Communication flows in power-efficient Networks-on-Chips
August 2010, 107 pages, PDF
PhD thesis (Girton College, March 2009)

Abstract: Networks-on-Chips (NoCs) represent a scalable wiring solution for future chips, with dynamic allocation-based networks able to provide good utilisation of the scarce available resources. This thesis develops power-efficient, dynamic, packet-switched NoCs which can support on-chip communication flows.

Given the severe power constraint already present in VLSI, a power-efficient NoC design direction is first developed. To accurately explore the impact of various design parameters on NoC power dissipation, 4 different router designs are synthesised, placed and routed in a 90nm process. This demonstrates that the power demands are dominated by the data-path and not the control-path, leading to the key finding that, from the energy perspective, it is justifiable to use more computation to optimise communication.

A review of existing research shows the near-ubiquitous nature of stream-like communication flows in future computing systems, making support for flows within NoCs critically important. It is shown that in several situations, current NoCs make highly inefficient use of network resources in the presence of communication flows. To resolve this problem, a scalable mechanism is developed to enable the identification of flows, with a flow defined as all packets going to the same destination. The number of virtual-channels that can be used by a single flow is then limited to the minimum required, ensuring efficient resource utilisation.

The issue of fair resource allocation between flows is next investigated. The locally fair, packet-based allocation strategies of current NoCs are shown not to provide fairness between flows. The mechanism already developed to identify flows by their destination nodes is extended to enable flows to be identified by source-destination address pairs. Finally, a modification to the link scheduling mechanism is proposed to achieve max-min fairness between flows.

UCAM-CL-TR-787

Daniel Bernhardt:

Emotion inference from human body motion
October 2010, 227 pages, PDF
PhD thesis (Selwyn College, January 2010)

Abstract: The human body has evolved to perform sophisticated tasks from locomotion to the use of tools. At the same time our body movements can carry information indicative of our intentions, inter-personal attitudes and emotional states. Because our body is specialised to perform a variety of everyday tasks, in most situations emotional effects are only visible through subtle changes in the qualities of movements and actions. This dissertation focuses on the automatic analysis of emotional effects in everyday actions.

In the past most efforts to recognise emotions from the human body have focused on expressive gestures which are archetypal and exaggerated expressions of emotions. While these are easier to recognise by humans and computational pattern recognisers, they very rarely occur in natural scenarios. The principal contribution of this dissertation is hence the inference of emotional states from everyday actions such as walking, knocking and throwing. The implementation of the system draws inspiration from a variety of disciplines including psychology, character animation and speech recognition. Complex actions are modelled using Hidden Markov Models and motion primitives. The manifestation of emotions in everyday actions is very subtle and even humans are far from perfect at picking up and interpreting the relevant cues, because emotional influences are usually minor compared to constraints arising from the action context or differences between individuals.

This dissertation describes a holistic approach which models emotional, action and personal influences in order to maximise the discriminability of different emotion classes. A pipeline is developed which incrementally removes the biases introduced by different action contexts and individual differences. The resulting signal is described in terms of posture and dynamic features and classified into one of several emotion classes using statistically trained Support Vector Machines. The system also goes beyond isolated expressions and is able to classify natural action sequences. I use Level Building to segment action sequences and combine component classifications using an incremental voting scheme which is suitable for online applications. The system is comprehensively evaluated along a number of dimensions using a corpus of motion-captured actions. For isolated actions I evaluate the generalisation performance to new subjects. For action sequences I study the effects of reusing models trained on the isolated cases vs adapting models to connected samples. The dissertation also evaluates the role of modelling the influence of individual user differences. I develop and evaluate a regression-based adaptation scheme. The results bring us an important step closer to recognising emotions from body movements, embracing the complexity of body movements in natural scenarios.
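The incremental voting scheme itself is not specified in the abstract; one minimal form such an online vote over per-segment classifications could take is the following, which is purely illustrative.

```python
from collections import Counter

def incremental_vote(labels):
    """Yield the running winning emotion label after each new
    per-segment classification arrives; ties go to the label seen
    first, so a decision is available online at every step."""
    votes = Counter()
    for label in labels:
        votes[label] += 1
        # max over insertion-ordered keys: first-seen label wins ties
        yield max(votes, key=votes.get)
```

Because a current winner is emitted after every segment, a consumer need not wait for the whole action sequence to finish, which is the property the abstract highlights for online applications.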

UCAM-CL-TR-788

Byron Cook, Eric Koskinen, Moshe Vardi:

Branching-time reasoning for programs (extended version)
July 2011, 38 pages, PDF

Abstract: We describe a reduction from temporal property verification to a program analysis problem. Our reduction is an encoding which, with the use of procedures and nondeterminism, enables existing interprocedural program analysis tools to naturally perform the reasoning necessary for proving temporal properties (e.g. backtracking, eventuality checking, tree counterexamples for branching-time properties, abstraction refinement, etc.). Our reduction is state-based in nature but also forms the basis of an efficient algorithm for verifying trace-based properties, when combined with an iterative symbolic determinization technique, due to Cook and Koskinen.

In this extended version, we formalize our encoding as a guarded transition system G, parameterized by a finite set of ranking functions and the temporal logic property. We establish soundness between a safety property of G and the validity of a property in the branching-time temporal logic ∀CTL. ∀CTL is a sufficient logic for proving properties written in the trace-based Linear Temporal Logic via the iterative algorithm.

Finally, using examples drawn from the PostgreSQL database server, Apache web server, and Windows OS kernel, we demonstrate the practical viability of our work.

UCAM-CL-TR-789

Byron Cook, Eric Koskinen:

Making prophecies with decision predicates
November 2010, 29 pages, PDF

Abstract: We describe a new algorithm for proving temporal properties expressed in LTL of infinite-state programs. Our approach takes advantage of the fact that LTL properties can often be proved more efficiently using techniques usually associated with the branching-time logic CTL than they can with native LTL algorithms. The caveat is that, in certain instances, nondeterminism in the system’s transition relation can cause CTL methods to report counterexamples that are spurious with respect to the original LTL formula. To address this problem we describe an algorithm that, as it attempts to apply CTL proof methods, finds and then removes problematic nondeterminism via an analysis on the potentially spurious counterexamples. Problematic nondeterminism is characterized using decision predicates, and removed using a partial, symbolic determinization procedure which introduces new prophecy variables to predict the future outcome of these choices. We demonstrate—using examples taken from the PostgreSQL database server, Apache web server, and Windows OS kernel—that our method can yield enormous performance improvements in comparison to known tools, allowing us to automatically prove properties of programs where we could not prove them before.

UCAM-CL-TR-790

Ted Briscoe, Ben Medlock, Øistein Andersen:

Automated assessment of ESOL free text examinations
November 2010, 31 pages, PDF

Abstract: In this report, we consider the task of automated assessment of English as a Second Language (ESOL) examination scripts written in response to prompts eliciting free text answers. We review and critically evaluate previous work on automated assessment for essays, especially when applied to ESOL text. We formally define the task as discriminative preference ranking and develop a new system trained and tested on a corpus of manually-graded scripts. We show experimentally that our best performing system is very close to the upper bound for the task, as defined by the agreement between human examiners on the same corpus. Finally, we argue that our approach, unlike extant solutions, is relatively prompt-insensitive and resistant to subversion, even when its operating principles are in the public domain. These properties make our approach significantly more viable for high-stakes assessment.

UCAM-CL-TR-791

Andreas Vlachos:

Semi-supervised learning for biomedical information extraction
November 2010, 113 pages, PDF
PhD thesis (Peterhouse College, December 2009)

Abstract: This thesis explores the application of semi-supervised learning to biomedical information extraction. The latter has emerged in recent years as a challenging application domain for natural language processing techniques. The challenge stems partly from the lack of appropriate resources that can be used as labeled training data. Therefore, we choose to focus on semi-supervised learning techniques which enable us to take advantage of human supervision combined with unlabeled data.

We begin with a short introduction to biomedical information extraction and semi-supervised learning in Chapter 1. Chapter 2 focuses on the task of biomedical named entity recognition. Using raw abstracts and a dictionary of gene names we develop two systems for this task. Furthermore, we discuss annotation issues and demonstrate how the performance can be improved using user feedback in realistic conditions. In Chapter 3 we develop two biomedical event extraction systems: a rule-based one and a machine-learning based one. The former needs only an annotated dictionary and syntactic parsing as input, while the latter additionally requires partial event annotation. Both systems achieve performance comparable to systems utilizing fully annotated training data. Chapter 4 discusses the task of lexical-semantic clustering using Dirichlet process mixture models. We review the unsupervised learning method used, which allows the number of clusters discovered to be determined by the data. Furthermore, we introduce a new clustering evaluation measure that addresses some shortcomings of the existing measures. Chapter 5 introduces a method of guiding the clustering solution using pairwise links between instances. Furthermore, we present a method of selecting these pairwise links actively in order to decrease the amount of supervision required. Finally, Chapter 6 assesses the contributions of this thesis and highlights directions for future work.

UCAM-CL-TR-792

James P. Bridge:

Machine learning and automated theorem proving
November 2010, 180 pages, PDF
PhD thesis (Corpus Christi College, October 2010)

Abstract: Computer programs to find formal proofs of theorems have a history going back nearly half a century. Originally designed as tools for mathematicians, modern applications of automated theorem provers and proof assistants are much more diverse. In particular they are used in formal methods to verify software and hardware designs to prevent costly, or life-threatening, errors being introduced into systems from microchips to controllers for medical equipment or space rockets.

Despite this, the high level of human expertise required in their use means that theorem proving tools are not widely used by non-specialists, in contrast to computer algebra packages which also deal with the manipulation of symbolic mathematics. The work described in this dissertation addresses one aspect of this problem, that of heuristic selection in automated theorem provers. In theory such theorem provers should be automatic and therefore easy to use; in practice the heuristics used in the proof search are not universally optimal for all problems, so human expertise is required to determine heuristic choice and to set parameter values.

Modern machine learning has been applied to the automation of heuristic selection in a first order logic theorem prover. One objective was to find if there are any features of a proof problem that are both easy to measure and provide useful information for determining heuristic choice. Another was to determine and demonstrate a practical approach to making theorem provers truly automatic.

In the experimental work, heuristic selection based on features of the conjecture to be proved and the associated axioms is shown to do better than any single heuristic. Additionally, a comparison has been made between static features, measured prior to the proof search process, and dynamic features that measure changes arising in the early stages of proof search. Further work was done on determining which features are important, demonstrating that good results are obtained with only a few features.

UCAM-CL-TR-793

Shazia Afzal:

Affect inference in learning environments: a functional view of facial affect analysis using naturalistic data
December 2010, 146 pages, PDF
PhD thesis (Murray Edwards College, May 2010)

Abstract: This research takes an application-oriented stance on affective computing and addresses the problem of automatic affect inference within learning technologies. It draws from the growing understanding of the centrality of emotion in the learning process and the fact that, as yet, this crucial link is not addressed in the design of learning technologies. This dissertation specifically focuses on examining the utility of facial affect analysis to model the affective state of a learner in a one-on-one learning setting.

Although facial affect analysis using posed or acted data has been studied in great detail for a couple of decades now, research using naturalistic data is still a challenging problem. The challenges are derived from the complexity in conceptualising affect, the methodological and technical difficulties in measuring it, and the emergent ethical concerns in realising automatic affect inference by computers. However, as the context of this research is derived from, and relates to, a real-world application environment, it is based entirely on naturalistic data. The whole pipeline – of identifying the requirements, to collection of data, to the development of an annotation protocol, to labelling of data, and the eventual analyses, both quantitative and qualitative – is described in this dissertation. In effect, a framework for conducting research using natural data is set out and the challenges encountered at each stage identified.

Apart from the challenges associated with the perception and measurement of affect, this research emphasises that there are additional issues that require due consideration by virtue of the application context. As such, in light of the discussed observations and results, this research concludes that we need to understand the nature and expression of emotion in the context of technology use, and pursue creative exploration of what is perhaps a qualitatively different form of emotion expression and communication.

UCAM-CL-TR-794

Øistein E. Andersen:

Grammatical error prediction
January 2011, 163 pages, PDF
PhD thesis (Girton College, 2010)

Abstract: In this thesis, we investigate methods for automatic detection, and to some extent correction, of grammatical errors. The evaluation is based on manual error annotation in the Cambridge Learner Corpus (CLC), and automatic or semi-automatic annotation of error corpora is one possible application, but the methods are also applicable in other settings, for instance to give learners feedback on their writing or in a proofreading tool used to prepare texts for publication.

Apart from the CLC, we use the British National Corpus (BNC) to get a better model of correct usage, WordNet for semantic relations, other machine-readable dictionaries for orthography/morphology, and the Robust Accurate Statistical Parsing (RASP) system to parse both the CLC and the BNC and thereby identify syntactic relations within the sentence. An ancillary outcome of this is a syntactically annotated version of the BNC, which we have made publicly available.

We present a tool called GenERRate, which can be used to introduce errors into a corpus of correct text, and evaluate to what extent the resulting synthetic error corpus can complement or replace a real error corpus.

Different methods for detection and correction are investigated, including: sentence-level binary classification based on machine learning over n-grams of words, n-grams of part-of-speech tags and grammatical relations; automatic identification of features which are highly indicative of individual errors; and development of classifiers aimed more specifically at given error types, for instance concord errors based on syntactic structure and collocation errors based on co-occurrence statistics from the BNC, using clustering to deal with data sparseness. We show that such techniques can detect, and sometimes even correct, at least certain error types as well as or better than human annotators.
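As a generic sketch of the first technique listed above (sentence-level binary classification over word n-grams), not the thesis's actual classifier or feature set, a tiny multinomial Naive Bayes model over word bigrams can separate toy "correct" from "incorrect" sentences:

```python
import math
from collections import Counter

def ngrams(tokens, n=2):
    """All contiguous word n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

class NgramNB:
    """Tiny multinomial Naive Bayes over word bigrams, with add-one
    smoothing, for sentence-level correct (0) / incorrect (1) labels."""

    def fit(self, sentences, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.prior = Counter(labels)
        for sent, y in zip(sentences, labels):
            self.counts[y].update(ngrams(sent.lower().split()))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, sentence):
        feats = ngrams(sentence.lower().split())
        n = sum(self.prior.values())
        best, best_score = None, -math.inf
        for y in (0, 1):
            # Laplace-smoothed log-likelihood plus class log-prior.
            total = sum(self.counts[y].values()) + len(self.vocab)
            score = math.log(self.prior[y] / n)
            score += sum(math.log((self.counts[y][f] + 1) / total)
                         for f in feats)
            if score > best_score:
                best, best_score = y, score
        return best

# Invented toy data: label 1 marks a subject-verb agreement error.
train = ["he goes home", "she goes home", "he go home", "she go home"]
clf = NgramNB().fit(train, [0, 0, 1, 1])
print(clf.predict("it goes home"), clf.predict("it go home"))
```

The bigram ("go", "home") occurs only in the error class, so a previously unseen subject still yields the right label; this is the same generalisation pressure that motivates the richer part-of-speech and grammatical-relation features described above.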

We finally present an annotation experiment in which a human annotator corrects and supplements the automatic annotation, which confirms the high detection/correction accuracy of our system and furthermore shows that such a hybrid set-up gives higher-quality annotation with considerably less time and effort expended compared to fully manual annotation.

UCAM-CL-TR-795

Aurelie Herbelot:

Underspecified quantification
February 2011, 163 pages, PDF
PhD thesis (Trinity Hall, 2010)

Abstract: Many noun phrases in text are ambiguously quantified: syntax doesn’t explicitly tell us whether they refer to a single entity or to several and, in main clauses, what portion of the set denoted by the subject Nbar actually takes part in the event expressed by the verb. For instance, when we utter the sentence ‘Cats are mammals’, it is only world knowledge that allows our hearer to infer that we mean ‘All cats are mammals’, and not ‘Some cats are mammals’. This ambiguity effect is interesting at several levels. Theoretically, it raises cognitive and linguistic questions. To what extent does syntax help humans resolve the ambiguity? What problem-solving skills come into play when syntax is insufficient for full resolution? How does ambiguous quantification relate to the phenomenon of genericity, as described by the linguistic literature? From an engineering point of view, the resolution of quantificational ambiguity is essential to the accuracy of some Natural Language Processing tasks.

We argue that the quantification ambiguity phenomenon can be described in terms of underspecification and propose a formalisation for what we call ‘underquantified’ subject noun phrases. Our formalisation is motivated by inference requirements and covers all cases of genericity.

Our approach is then empirically validated by human annotation experiments. We propose an annotation scheme that follows our theoretical claims with regard to underquantification. Our annotation results strengthen our claim that all noun phrases can be analysed in terms of quantification. The produced corpus allows us to derive a gold standard for quantification resolution experiments and is, as far as we are aware, the first attempt to analyse the distribution of null quantifiers in English.

We then create a baseline system for automatic quantification resolution, using syntax to provide discriminating features for our classification. We show that results are rather poor for certain classes and argue that some level of pragmatics is needed, in combination with syntax, to perform accurate resolution. We explore the use of memory-based learning as a way to approximate the problem-solving skills available to humans at the level of pragmatic understanding.

UCAM-CL-TR-796

Jonathan Mak:

Facilitating program parallelisation: a profiling-based approach
March 2011, 120 pages, PDF
PhD thesis (St. John’s College, November 2010)

Abstract: The advance of multi-core architectures signals the end of universal speed-up of software over time. To continue exploiting hardware developments, effort must be invested in producing software that can be split up to run on multiple cores or processors. Many solutions have been proposed to address this issue, ranging from explicit to implicit parallelism, but consensus has yet to be reached on the best way to tackle such a problem.

In this thesis we propose a profiling-based interactive approach to program parallelisation. Profilers gather dependence information on a program, which is then used to automatically parallelise the program at source level. The programmer can then examine the resulting parallel program and, using critical path information from the profiler, identify and refactor parallelism bottlenecks to enable further parallelism. We argue that this is an efficient and effective method of parallelising general sequential programs.

Our first contribution is a comprehensive analysis of limits of parallelism in several benchmark programs, performed by constructing Dynamic Dependence Graphs (DDGs) from execution traces. We show that average available parallelism is often high, but realising it would require various changes in compilation, language or computation models. As an example, we show how using a spaghetti stack structure can lead to a doubling of potential parallelism.

The rest of our thesis demonstrates how some of this potential parallelism can be realised under the popular fork-join parallelism model used by Cilk, TBB, OpenMP and others. We present a tool-chain with two main components: Embla 2, which uses DDGs from profiled dependences to estimate the amount of task-level parallelism in programs; and Woolifier, a source-to-source transformer that uses Embla 2’s output to parallelise the programs. Using several case studies, we demonstrate how this tool-chain greatly facilitates program parallelisation by performing an automatic best-effort parallelisation and presenting critical paths in a concise graphical form so that the programmer can quickly locate parallelism bottlenecks, which when refactored can lead to even greater potential parallelism and significant actual speed-ups (up to around 25 on a 32-effective-core machine).

UCAM-CL-TR-797

Boris Feigin:

Interpretational overhead in system software
April 2011, 116 pages, PDF
PhD thesis (Homerton College, September 2010)

Abstract: Interpreting a program carries a runtime penalty: the interpretational overhead. Traditionally, a compiler removes interpretational overhead by sacrificing inessential details of program execution. However, a broad class of system software is based on non-standard interpretation of machine code or a higher-level language. For example, virtual machine monitors emulate privileged instructions; program instrumentation is used to build dynamic call graphs by intercepting function calls and returns; and dynamic software updating technology allows program code to be altered at runtime. Many of these frameworks are performance-sensitive, and several efficiency requirements, both formal and informal, have been put forward over the last four decades. Largely independently, the concept of interpretational overhead received much attention in the partial evaluation (“program specialization”) literature. This dissertation contributes a unifying understanding of efficiency and interpretational overhead in system software.

Starting from the observation that a virtual machine monitor is a self-interpreter for machine code, our first contribution is to reconcile the definition of efficient virtualization due to Popek and Goldberg with Jones optimality, a measure of the strength of program specializers. We also present a rational reconstruction of hardware virtualization support (“trap-and-emulate”) from context-threaded interpretation, a technique for implementing fast interpreters due to Berndl et al.

As a form of augmented execution, virtualization shares many similarities with program instrumentation. Although several low-overhead instrumentation frameworks are available on today’s hardware, there has been no formal understanding of what it means for instrumentation to be efficient. Our second contribution is a definition of efficiency for program instrumentation in the spirit of Popek and Goldberg’s work. Instrumentation also incurs an implicit overhead because instrumentation code needs access to intermediate execution states and this is antagonistic to optimization. The third contribution is to use partial equivalence relations (PERs) to express the dependence of instrumentation on execution state, enabling an instrumentation/optimization trade-off. Since program instrumentation, applied at runtime, constitutes a kind of dynamic software update, we can similarly restrict allowable future updates to be consistent with existing optimizations. Finally, treating “old” and “new” code in a dynamically-updatable program as being written in different languages permits a semantic explanation of a safety rule that was originally introduced as a syntactic check.

UCAM-CL-TR-798

Periklis Akritidis:

Practical memory safety for C
June 2011, 136 pages, PDF
PhD thesis (Wolfson College, May 2010)

Abstract: Copious amounts of high-performance and low-level systems code are written in memory-unsafe languages such as C and C++. Unfortunately, the lack of memory safety undermines security and reliability; for example, memory-corruption bugs in programs can breach security, and faults in kernel extensions can bring down the entire operating system. Memory-safe languages, however, are unlikely to displace C and C++ in the near future; thus, solutions for future and existing C and C++ code are needed.

Despite considerable prior research, memory-safety problems in C and C++ programs persist because the existing proposals that are practical enough for production use cannot offer adequate protection, while comprehensive proposals are either too slow for practical use, or break backwards compatibility by requiring significant porting or generating binary-incompatible code.

To enable practical protection against memory-corruption attacks and operating system crashes, I designed new integrity properties preventing dangerous memory corruption at low cost instead of enforcing strict memory safety to catch every memory error at high cost. Then, at the implementation level, I aggressively optimised for the common case, and streamlined execution by modifying memory layouts as far as allowed without breaking binary compatibility.

I developed three compiler-based tools for analysing and instrumenting unmodified source code to automatically generate binaries hardened against memory errors: BBC and WIT to harden user-space C programs, and BGI to harden and to isolate Microsoft Windows kernel extensions. The generated code incurs low performance overhead and is binary-compatible with uninstrumented code. BBC offers strong protection with lower overhead than previously possible for its level of protection; WIT further lowers overhead while offering stronger protection than previous solutions of similar performance; and BGI improves backwards compatibility and performance over previous proposals, making kernel extension isolation practical for commodity systems.

UCAM-CL-TR-799

Thomas Tuerk:

A separation logic framework for HOL
June 2011, 271 pages, PDF
PhD thesis (Downing College, December 2010)

Abstract: Separation logic is an extension of Hoare logic due to O’Hearn and Reynolds. It was designed for reasoning about mutable data structures. Because separation logic supports local reasoning, it scales better than classical Hoare logic and can easily be used to reason about concurrency. There are automated separation logic tools as well as several formalisations in interactive theorem provers. Typically, the automated separation logic tools are able to reason about shallow properties of large programs. They usually consider just the shape of data structures, not their data content. The formalisations inside theorem provers can be used to prove interesting, deep properties. However, they typically lack automation. Another shortcoming is that there are many slightly different separation logics. For each programming language and each interesting property, a new kind of separation logic seems to be invented.

In this thesis, a general framework for separation logic is developed inside the HOL4 theorem prover. This framework is based on Abstract Separation Logic, an abstract, high-level variant of separation logic. Abstract Separation Logic is a general separation logic such that many other separation logics can be based on it. This framework is instantiated in a first step to support a stack with read and write permissions, following ideas of Parkinson, Bornat and Calcagno. Finally, the framework is further instantiated to build a separation logic tool called Holfoot. It is similar to the tool Smallfoot, but extends it from reasoning about shape properties to fully functional specifications.

To my knowledge this work presents the first formalisation of Abstract Separation Logic inside a theorem prover. By building Holfoot on top of this formalisation, I could demonstrate that Abstract Separation Logic can be used as a basis for realistic separation logic tools. Moreover, this work demonstrates that it is feasible to implement such separation logic tools inside a theorem prover. Holfoot is highly automated. It can verify Smallfoot examples automatically inside HOL4. Moreover, Holfoot can use the full power of HOL4. This allows Holfoot to verify fully functional specifications. Simple fully functional specifications can be handled automatically using HOL4’s tools and libraries or external SMT solvers. More complicated ones can be handled using interactive proofs inside HOL4. In contrast, most other separation logic tools can reason just about the shape of data structures. Others reason only about data properties that can be solved using SMT solvers.

UCAM-CL-TR-800

James R. Srinivasan:

Improving cache utilisation
June 2011, 184 pages, PDF
PhD thesis (Jesus College, April 2011)

Abstract: Microprocessors have long employed caches to help hide the increasing latency of accessing main memory. The vast majority of previous research has focussed on increasing cache hit rates to improve cache performance, while lately decreasing power consumption has become an equally important issue. This thesis examines the lifetime of cache lines in the memory hierarchy, considering whether they are live (will be referenced again before eviction) or dead (will not be referenced again before eviction). Using these two states, the cache utilisation (proportion of the cache which will be referenced again) can be calculated.
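The live/dead definition above can be made concrete with a toy experiment (an illustrative sketch only, not the thesis's simulator): simulate a small fully-associative LRU cache over an address trace, then compute with hindsight, for each post-access snapshot, which resident lines will be referenced again before they are evicted.

```python
from collections import OrderedDict

def cache_utilisation(trace, capacity):
    """Average, over all post-access snapshots of a fully-associative
    LRU cache, of the fraction of resident lines that are live, i.e.
    referenced again before their eviction (computed with hindsight)."""
    cache = OrderedDict()   # addr -> time it was loaded (residency id)
    evicted_at = {}         # (addr, load_time) -> eviction time
    snapshots = []          # (time, {addr: load_time}) after each access
    for t, addr in enumerate(trace):
        if addr in cache:
            cache.move_to_end(addr)            # LRU hit, same residency
        else:
            if len(cache) >= capacity:
                victim, load_t = cache.popitem(last=False)
                evicted_at[(victim, load_t)] = t
            cache[addr] = t                    # new residency starts
        snapshots.append((t, dict(cache)))
    ratios = []
    for t, resident in snapshots:
        live = 0
        for addr, load_t in resident.items():
            evict_t = evicted_at.get((addr, load_t), float("inf"))
            # Live iff the next reference to addr lands before eviction.
            nxt = next((u for u in range(t + 1, len(trace))
                        if trace[u] == addr), None)
            live += nxt is not None and nxt < evict_t
        ratios.append(live / len(resident))
    return sum(ratios) / len(ratios)

print(cache_utilisation(["A", "B", "A", "B", "C"], 2))
```

A thrashing trace such as A B A B with capacity 1 yields utilisation 0 (every resident line dies before its reuse), which is exactly the kind of wasted cache occupancy the predictors in this thesis aim to detect.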

This thesis demonstrates that cache utilisation is relatively poor over a wide range of benchmarks and cache configurations. By focussing on techniques to improve cache utilisation, cache hit rates are increased while overall power consumption may also be decreased.

Key to improving cache utilisation is an accurate predictor of the state of a cache line. This thesis presents a variety of such predictors, mostly based upon the mature field of branch prediction, and compares them against previously proposed predictors. The most appropriate predictors are then demonstrated in two applications: improving victim cache performance through filtering, and reducing cache pollution during aggressive prefetching.

These applications are primarily concerned with improving cache performance and are analysed using a detailed microprocessor simulator. Related applications, including decreasing power consumption, are also discussed, as is the applicability of these techniques to multiprogrammed and multiprocessor systems.

UCAM-CL-TR-801

Amitabha Roy:

Software lock elision for x86 machine code
July 2011, 154 pages, PDF
PhD thesis (Emmanuel College, April 2011)

Abstract: More than a decade after becoming a topic of intense research there is no transactional memory hardware nor any examples of software transactional memory use outside the research community. Using software transactional memory in large pieces of software needs copious source code annotations and often means that standard compilers and debuggers can no longer be used. At the same time, overheads associated with software transactional memory fail to motivate programmers to expend the needed effort to use software transactional memory. The only way around the overheads in the case of general unmanaged code is the anticipated availability of hardware support. On the other hand, architects are unwilling to devote power and area budgets in mainstream microprocessors to hardware transactional memory, pointing to transactional memory being a “niche” programming construct. A deadlock has thus ensued that is blocking transactional memory use and experimentation in the mainstream.

This dissertation covers the design and construction of a software transactional memory runtime system called SLE x86 that can potentially break this deadlock by decoupling transactional memory from programs using it. Unlike most other STM designs, the core design principle is transparency rather than performance. SLE x86 operates at the level of x86 machine code, thereby becoming immediately applicable to binaries for the popular x86 architecture. The only requirement is that the binary synchronise using known locking constructs or calls such as those in Pthreads or OpenMP libraries. SLE x86 provides speculative lock elision (SLE) entirely in software, executing critical sections in the binary using transactional memory. Optionally, the critical sections can also be executed without using transactions by acquiring the protecting lock.

The dissertation makes a careful analysis of the impact on performance due to the demands of the x86 memory consistency model and the need to transparently instrument x86 machine code. It shows that both of these problems can be overcome to reach a reasonable level of performance, where transparent software transactional memory can perform better than a lock. SLE x86 can ensure that programs are ready for transactional memory in any form, without being explicitly written for it.

UCAM-CL-TR-802

Johanna Geiß:

Latent semantic sentence clustering for multi-document summarization
July 2011, 156 pages, PDF
PhD thesis (St. Edmund’s College, April 2011)

Abstract: This thesis investigates the applicability of Latent Semantic Analysis (LSA) to sentence clustering for Multi-Document Summarization (MDS). In contrast to more shallow approaches like measuring similarity of sentences by word overlap in a traditional vector space model, LSA takes word usage patterns into account. So far LSA has been successfully applied to different Information Retrieval (IR) tasks like information filtering and document classification (Dumais, 2004). In the course of this research, different parameters essential to sentence clustering using a hierarchical agglomerative clustering algorithm (HAC) in general and in combination with LSA in particular are investigated. These parameters include, inter alia, information about the type of vocabulary, the size of the semantic space and the optimal numbers of dimensions to be used in LSA. These parameters have not previously been studied and evaluated in combination with sentence clustering (chapter 4).

This thesis also presents the first gold standard for sentence clustering in MDS. To be able to evaluate sentence clusterings directly and classify the influence of the different parameters on the quality of sentence clustering, an evaluation strategy is developed that includes gold standard comparison using different evaluation measures (chapter 5). Therefore the first compound gold standard for sentence clustering was created. Several human annotators were asked to group similar sentences into clusters following guidelines created for this purpose (section 5.4). The evaluation of the human generated clusterings revealed that the human annotators agreed on clustering sentences above chance. Analysis of the strategies adopted by the human annotators revealed two groups – hunters and gatherers – who differ clearly in the structure and size of the clusters they created (chapter 6).

On the basis of the evaluation strategy the parameters for sentence clustering and LSA are optimized (chapter 7). A final experiment in which the performance of LSA in sentence clustering for MDS is compared to the simple word matching approach of the traditional Vector Space Model (VSM) revealed that LSA produces better quality sentence clusters for MDS than VSM.
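The core advantage of LSA over plain word matching can be shown with a self-contained toy example (invented vocabulary, not the thesis's corpus or pipeline): truncated SVD of a term-sentence matrix lets two sentences with no words in common still come out similar, because their words share usage patterns.

```python
import numpy as np

# Toy term-sentence count matrix (rows = terms, columns = sentences).
# Sentence 0 ("car") and sentence 1 ("automobile") share no words, but
# both co-occur with sentence 2 ("car automobile"); sentence 3 is
# "flower petal", an unrelated topic.
A = np.array([
    [1, 0, 1, 0],   # car
    [0, 1, 1, 0],   # automobile
    [0, 0, 0, 1],   # flower
    [0, 0, 0, 1],   # petal
], dtype=float)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Plain VSM similarity: zero word overlap means zero similarity.
raw_sim = cos(A[:, 0], A[:, 1])

# LSA: keep k latent dimensions of the SVD (the dimensionality
# parameter whose choice the thesis investigates).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
sent_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim vector per sentence

lsa_sim = cos(sent_vecs[0], sent_vecs[1])
print(raw_sim, lsa_sim)
```

Here the raw cosine is 0 while the latent-space cosine is close to 1: both "vehicle" sentences collapse onto the same latent direction, which is precisely the behaviour that makes LSA-based sentence vectors a better input to hierarchical agglomerative clustering than raw term vectors.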

UCAM-CL-TR-803

Ekaterina V. Shutova:

Computational approaches to figurative language
August 2011, 219 pages, PDF
PhD thesis (Pembroke College, March 2011)

Abstract: The use of figurative language is ubiquitous in natural language text and it is a serious bottleneck in automatic text understanding. A system capable of interpreting figurative language would be extremely beneficial to a wide range of practical NLP applications. The main focus of this thesis is on the phenomenon of metaphor. I adopt a statistical data-driven approach to its modelling, and create the first open-domain system for metaphor identification and interpretation in unrestricted text. In order to verify that similar methods can be applied to modelling other types of figurative language, I then extend this work to the task of interpretation of logical metonymy.

The metaphor interpretation system is capable of discovering literal meanings of metaphorical expressions in text. For the metaphors in the examples “All of this stirred an unfathomable excitement in her” or “a carelessly leaked report” the system produces interpretations “All of this provoked an unfathomable excitement in her” and “a carelessly disclosed report” respectively. It runs on unrestricted text and to my knowledge is the only existing robust metaphor paraphrasing system. It does not employ any hand-coded knowledge, but instead derives metaphorical interpretations from a large text corpus using statistical pattern-processing. The system was evaluated with the aid of human judges and it operates with an accuracy of 81%.

The metaphor identification system automatically traces the analogies involved in the production of a particular metaphorical expression in a minimally supervised way. The system generalises over the analogies by means of verb and noun clustering, i.e. identification of groups of similar concepts. This generalisation makes it capable of recognising previously unseen metaphorical expressions in text; e.g. having once seen the metaphor ‘stir excitement’, the system concludes that ‘swallow anger’ is also used metaphorically. The system identifies metaphorical expressions with a high precision of 79%.

The logical metonymy processing system produces a list of metonymic interpretations disambiguated with respect to their word sense. It then automatically organises them into a novel class-based model of logical metonymy inspired by both empirical evidence and linguistic theory. This model provides more accurate and generalised information about possible interpretations of metonymic phrases than previous approaches.

UCAM-CL-TR-804

Sean B. Holden:

The HasGP user manual
September 2011, 18 pages, PDF

Abstract: HasGP is an experimental library implementing methods for supervised learning using Gaussian process (GP) inference, in both the regression and classification settings. It has been developed in the functional language Haskell as an investigation into whether the well-known advantages of the functional paradigm can be exploited in the field of machine learning, which traditionally has been dominated by the procedural/object-oriented approach, particularly involving C/C++ and Matlab. HasGP is open-source software released under the GPL3 license. This manual provides a short introduction on how to install the library, and how to apply it to supervised learning problems. It also provides some more in-depth information on the implementation of the library, which is aimed at developers. In the latter, we also show how some of the specific functional features of Haskell, in particular the ability to treat functions as first-class objects, and the use of typeclasses and monads, have informed the design of the library. This manual applies to HasGP version 0.1, which is the initial release of the library.
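As a rough illustration of the kind of computation a GP regression library such as HasGP performs (this is not HasGP's API, and Python rather than Haskell is used here purely for brevity), the posterior mean of a GP regressor is K* (K + σ²I)⁻¹ y:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise=0.1):
    """GP regression posterior mean: K* (K + noise^2 I)^-1 y."""
    K = rbf_kernel(x_train, x_train) + noise**2 * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train)
    return K_star @ np.linalg.solve(K, y_train)

x = np.linspace(0, 2 * np.pi, 20)
y = np.sin(x)
mean = gp_posterior_mean(x, y, x)
# With low noise, the posterior mean should track the training data closely.
print(bool(np.max(np.abs(mean - y)) < 0.3))
```

The function and variable names here are illustrative; the report itself documents the actual HasGP interface.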


UCAM-CL-TR-805

Simon Hay:

A model personal energy meter
September 2011, 207 pages, PDF
PhD thesis (Girton College, August 2011)

Abstract: Every day each of us consumes a significant amount of energy, both directly through transport, heating and use of appliances, and indirectly from our needs for the production of food, manufacture of goods and provision of services.

This dissertation investigates a personal energy meter which can record and apportion an individual’s energy usage in order to supply baseline information and incentives for reducing our environmental impact.

If the energy costs of large shared resources are split evenly without regard for individual consumption, each person minimises his own losses by taking advantage of others. Context awareness offers the potential to change this balance and apportion energy costs to those who cause them to be incurred. This dissertation explores how sensor systems installed in many buildings today can be used to apportion energy consumption between users, including an evaluation of a range of strategies in a case study and elaboration of the overriding principles that are generally applicable. It also shows how second-order estimators combined with location data can provide a proxy for fine-grained sensing.
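The contrast the abstract draws can be sketched in a toy example (names, numbers and function names are made up; the thesis evaluates far richer strategies): an even split ignores who caused the consumption, while a usage-proportional split assigns costs to those who incurred them.

```python
def apportion_evenly(total_kwh, users):
    """Even split: everyone pays the same regardless of behaviour."""
    share = total_kwh / len(users)
    return {u: share for u in users}

def apportion_by_usage(total_kwh, usage):
    """Proportional split: each user pays in proportion to sensed usage."""
    total_usage = sum(usage.values())
    return {u: total_kwh * v / total_usage for u, v in usage.items()}

usage = {"alice": 8.0, "bob": 2.0}   # hypothetical sensed consumption (kWh)
print(apportion_evenly(10.0, usage))    # {'alice': 5.0, 'bob': 5.0}
print(apportion_by_usage(10.0, usage))  # {'alice': 8.0, 'bob': 2.0}
```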

A key ingredient for apportionment mechanisms is data on energy usage. This may come from metering devices or buildings directly, or from profiling devices and using secondary indicators to infer their power state. A mechanism for profiling devices to determine the energy costs of specific activities, particularly applicable to shared programmable devices, is presented which can make this process simpler and more accurate. By combining crowd-sourced building-inventory information and a simple building energy model it is possible to estimate an individual’s energy use disaggregated by device class with very little direct sensing.

Contextual information provides crucial cues for apportioning the use and energy costs of resources, and one of the most valuable sources from which to infer context is location. A key ingredient for a personal energy meter is a low cost, low infrastructure location system that can be deployed on a truly global scale. This dissertation presents a description and evaluation of the new concept of inquiry-free Bluetooth tracking that has the potential to offer indoor location information with significantly less infrastructure and calibration than other systems.

Finally, a suitable architecture for a personal energy meter on a global scale is demonstrated using a mobile phone application to aggregate energy feeds based on the case studies and technologies developed.

UCAM-CL-TR-806

Damien Fay, Jerome Kunegis, Eiko Yoneki:

On joint diagonalisation for dynamic network analysis
October 2011, 12 pages, PDF

Abstract: Joint diagonalisation (JD) is a technique used to estimate an average eigenspace of a set of matrices. Whilst it has been used successfully in many areas to track the evolution of systems via their eigenvectors, its application in network analysis is novel. The key focus in this paper is the use of JD on matrices of spanning trees of a network. This is especially useful in the case of real-world contact networks in which a single underlying static graph does not exist. The average eigenspace may be used to construct a graph which represents the ‘average spanning tree’ of the network or a representation of the most common propagation paths. We then examine the distribution of deviations from the average and find that this distribution in real-world contact networks is multi-modal, thus indicating several modes in the underlying network. These modes are identified and are found to correspond to particular times. Thus JD may be used to decompose the behaviour, in time, of contact networks and produce average static graphs for each time. This may be viewed as a mixture between a dynamic and static graph approach to contact network analysis.
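A minimal sketch of the idea behind an "average eigenspace", under the simplifying assumption that the matrices commute exactly (real spanning-tree matrices would only be approximately jointly diagonalisable, and the paper uses a proper JD algorithm rather than this shortcut): for a commuting symmetric family, the eigenvectors of the average matrix diagonalise every member.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build three symmetric matrices sharing one eigenbasis Q: an exactly
# jointly diagonalisable (commuting) family.
n = 5
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
mats = [Q @ np.diag(rng.uniform(1, 10, n)) @ Q.T for _ in range(3)]

# Eigenvectors of the average matrix give a joint diagonaliser:
# the "average eigenspace" of the set.
_, V = np.linalg.eigh(sum(mats) / len(mats))

def off_diag_norm(M):
    """Frobenius norm of the off-diagonal part of M."""
    return np.linalg.norm(M - np.diag(np.diag(M)))

# Rotating each matrix into the average eigenbasis leaves it diagonal.
residual = max(off_diag_norm(V.T @ M @ V) for M in mats)
print(residual < 1e-8)
```

Deviations of individual matrices from this shared eigenspace are exactly the quantity whose distribution the paper studies in real contact networks.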

UCAM-CL-TR-807

Ola Mahmoud:

Second-order algebraic theories
October 2011, 133 pages, PDF
PhD thesis (Clare Hall, March 2011)

Abstract: Second-order universal algebra and second-order equational logic respectively provide a model theory and a formal deductive system for languages with variable binding and parameterised metavariables. This dissertation completes the algebraic foundations of second-order languages from the viewpoint of categorical algebra.

In particular, the dissertation introduces the notion of second-order algebraic theory. A main role in the definition is played by the second-order theory of equality M, representing the most elementary operators and equations present in every second-order language. We show that M can be described abstractly via the universal property of being the free cartesian category on an exponentiable object. Thereby, in the tradition of categorical algebra, a second-order algebraic theory consists of a cartesian category TH and a strict cartesian identity-on-objects functor from M to TH that preserves the universal exponentiable object of M.


At the syntactic level, we establish the correctness of our definition by showing a categorical equivalence between second-order equational presentations and second-order algebraic theories. This equivalence, referred to as the Second-Order Syntactic Categorical Type Theory Correspondence, involves distilling a notion of syntactic translation between second-order equational presentations that corresponds to the canonical notion of morphism between second-order algebraic theories. Syntactic translations provide a mathematical formalisation of notions such as encodings and transforms for second-order languages.

On top of the aforementioned syntactic correspondence, we furthermore establish the Second-Order Semantic Categorical Type Theory Correspondence. This involves generalising Lawvere’s notion of functorial model of algebraic theories to the second-order setting. By this semantic correspondence, second-order functorial semantics is shown to correspond to the model theory of second-order universal algebra.

We finally show that the core of the theory surrounding Lawvere theories generalises to the second order as well. Instances of this development are the existence of algebraic functors and monad morphisms in the second-order universe. Moreover, we define a notion of translation homomorphism that allows us to establish a 2-categorical type theory correspondence.

UCAM-CL-TR-808

Matko Botincan, Mike Dodds, Suresh Jagannathan:

Resource-sensitive synchronisation inference by abduction
January 2012, 57 pages, PDF

Abstract: We present an analysis which takes as its input a sequential program, augmented with annotations indicating potential parallelization opportunities, and a sequential proof, written in separation logic, and produces a correctly-synchronized parallelized program and proof of that program. Unlike previous work, ours is not an independence analysis; we insert synchronization constructs to preserve relevant dependencies found in the sequential program that may otherwise be violated by a naïve translation. Separation logic allows us to parallelize fine-grained patterns of resource-usage, moving beyond straightforward points-to analysis.

Our analysis works by using the sequential proof to discover dependencies between different parts of the program. It leverages these discovered dependencies to guide the insertion of synchronization primitives into the parallelized program, and ensure that the resulting parallelized program satisfies the same specification as the original sequential program. Our analysis is built using frame inference and abduction, two techniques supported by an increasing number of separation logic tools.

UCAM-CL-TR-809

John L. Miller:

Distributed virtual environment scalability and security
October 2011, 98 pages, PDF
PhD thesis (Hughes Hall, October 2011)

Abstract: Distributed virtual environments (DVEs) have been an active area of research and engineering for more than 20 years. The most widely deployed DVEs are network games such as Quake, Halo, and World of Warcraft (WoW), with millions of users and billions of dollars in annual revenue. Deployed DVEs remain expensive centralized implementations despite significant research outlining ways to distribute DVE workloads.

This dissertation shows previous DVE research evaluations are inconsistent with deployed DVE needs. Assumptions about avatar movement and proximity – fundamental scale factors – do not match WoW’s workload, and likely the workload of other deployed DVEs. Alternate workload models are explored and preliminary conclusions presented. Using realistic workloads it is shown that a fully decentralized DVE cannot be deployed to today’s consumers, regardless of its overhead.

Residential broadband speeds are improving, and this limitation will eventually disappear. When it does, appropriate security mechanisms will be a fundamental requirement for technology adoption.

A trusted auditing system (“Carbon”) is presented which has good security, scalability, and resource characteristics for decentralized DVEs. When performing exhaustive auditing, Carbon adds 27% network overhead to a decentralized DVE with a WoW-like workload. This resource consumption can be reduced significantly, depending upon the DVE’s risk tolerance. Finally, the Pairwise Random Protocol (PRP) is described. PRP enables adversaries to fairly resolve probabilistic activities, an ability missing from most decentralized DVE security proposals.

Thus, this dissertation’s contribution is to address two of the obstacles to deploying research on decentralized DVE architectures: first, the lack of evidence that research results apply to existing DVEs; second, the lack of security systems combining appropriate security guarantees with acceptable overhead.

UCAM-CL-TR-810

Nick Barrow-Williams:

Proximity Coherence for chip-multiprocessors
November 2011, 164 pages, PDF
PhD thesis (Trinity Hall, January 2011)


Abstract: Many-core architectures provide an efficient way of harnessing the growing numbers of transistors available in modern fabrication processes; however, the parallel programs run on these platforms are increasingly limited by the energy and latency costs of communication. Existing designs provide a functional communication layer but do not necessarily implement the most efficient solution for chip-multiprocessors, placing limits on the performance of these complex systems. In an era of increasingly power limited silicon design, efficiency is now a primary concern that motivates designers to look again at the challenge of cache coherence.

The first step in the design process is to analyse the communication behaviour of parallel benchmark suites such as Parsec and SPLASH-2. This thesis presents work detailing the sharing patterns observed when running the full benchmarks on a simulated 32-core x86 machine. The results reveal considerable locality of shared data accesses between threads with consecutive operating system assigned thread IDs. This pattern, although of little consequence in a multi-node system, corresponds to strong physical locality of shared data between adjacent cores on a chip-multiprocessor platform.

Traditional cache coherence protocols, although often used in chip-multiprocessor designs, have been developed in the context of older multi-node systems. By redesigning coherence protocols to exploit new patterns such as the physical locality of shared data, improving the efficiency of communication, specifically in chip-multiprocessors, is possible. This thesis explores such a design – Proximity Coherence – a novel scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links rather than always being indirected via a directory structure.

UCAM-CL-TR-811

A. Theodore Markettos:

Active electromagnetic attacks on secure hardware
December 2011, 217 pages, PDF
PhD thesis (Clare Hall, March 2010)

Abstract: The field of side-channel attacks on cryptographic hardware has been extensively studied. In many cases it is easier to derive the secret key from these attacks than to break the cryptography itself. One such side-channel attack is the electromagnetic side-channel attack, giving rise to electromagnetic analysis (EMA).

EMA, when otherwise known as ‘TEMPEST’ or ‘compromising emanations’, has a long history in the military context over almost the whole of the twentieth century. The US military also mention three related attacks, believed to be: HIJACK (modulation of secret data onto conducted signals), NONSTOP (modulation of secret data onto radiated signals) and TEAPOT (intentional malicious emissions).

In this thesis I perform a fusion of TEAPOT and HIJACK/NONSTOP techniques on secure integrated circuits. An attacker is able to introduce one or more frequencies into a cryptographic system with the intention of forcing it to misbehave or to radiate secrets.

I demonstrate two approaches to this attack:

To perform the reception, I assess a variety of electromagnetic sensors to perform EMA. I choose an inductive hard drive head and a metal foil electric field sensor to measure near-field EM emissions.

The first approach, named the re-emission attack, injects frequencies into the power supply of a device to cause it to modulate up baseband signals. In this way I detect data-dependent timing from a ‘secure’ microcontroller. Such up-conversion enables a more compact and more distant receiving antenna.

The second approach involves injecting one or more frequencies into the power supply of a random number generator that uses jitter of ring oscillators as its random number source. I am able to force injection locking of the oscillators, greatly diminishing the entropy available.

I demonstrate this with the random number generators on two commercial devices. I cause a 2004 EMV banking smartcard to fail statistical test suites by generating a periodicity. For a secure 8-bit microcontroller that has been used in banking ATMs, I am able to reduce the random number entropy from 2^32 to 2^25. This enables a 50% probability of a successful attack on cash withdrawal in 15 attempts.

UCAM-CL-TR-812

Pedro Brandao:

Abstracting information on body area networks
January 2012, 144 pages, PDF
PhD thesis (Magdalene College, July 2011)

Abstract: Healthcare is changing; correction, healthcare is in need of change. The population ageing, the increase in chronic and heart diseases and just the increase in population size will overwhelm the current hospital-centric healthcare.

There is a growing interest by individuals in monitoring their own physiology: not only for sport activities, but also to control their own diseases. They are changing from passive healthcare receivers to proactive self-healthcare takers. The focus is shifting from hospital-centred treatment to patient-centric healthcare monitoring.

Continuous, everyday, wearable monitoring and actuating is part of this change. In this setting, sensors that monitor the heart, blood pressure, movement, brain activity, dopamine levels, and actuators that pump insulin, ‘pump’ the heart, deliver drugs to specific organs, stimulate the brain are needed as pervasive components in and on the body. They will tend to people’s need for self-monitoring and facilitate healthcare delivery.

These components around a human body that communicate to sense and act in a coordinated fashion make a Body Area Network (BAN). In most cases, and in our view, a central, more powerful component will act as the coordinator of this network. These networks aim to augment the power to monitor the human body and react to problems discovered with this observation. One key advantage of these systems is their overarching view of the whole network. That is, the central component can have an understanding of all the monitored signals and correlate them to better evaluate and react to problems. This is the focus of our thesis.

In this document we argue that this multi-parameter correlation of the heterogeneous sensed information is not being handled in BANs. The current view depends exclusively on the application that is using the network and its understanding of the parameters. This means that every application will oversee the BAN’s heterogeneous resources, managing them directly without taking into consideration other applications, their needs and knowledge.

There are several physiological correlations already known to the medical field. Correlating blood pressure and cross-sectional area of blood vessels to calculate blood velocity, or estimating oxygen delivery from cardiac output and oxygen saturation, are such examples. This knowledge should be available in a BAN and shared by the several applications that make use of the network. This architecture implies a central component that manages the knowledge and the resources. And this is, in our view, missing in BANs.

Our proposal is a middleware layer that abstracts the underlying BAN’s resources from the application, providing instead an information model to be queried. The model describes the correlations the middleware knows about for producing new information. Naturally, the raw sensed data is also part of the model. The middleware hides the specificities of the nodes that constitute the BAN by making their sensed production available. Applications are able to query for information, attaching requirements to these requests. The middleware is then responsible for satisfying the requests while optimising the resource usage of the BAN.

Our architecture proposal is divided into two corresponding layers: one that abstracts the nodes’ hardware (hiding nodes’ particularities) and an information layer that describes the information available and how it is correlated. A prototype implementation of the architecture was done to illustrate the concept.

UCAM-CL-TR-813

Andrew B. Lewis:

Reconstructing compressed photo and video data
February 2012, 148 pages, PDF
PhD thesis (Trinity College, June 2011)

Abstract: Forensic investigators sometimes need to verify the integrity and processing history of digital photos and videos. The multitude of storage formats and devices they need to access also presents a challenge for evidence recovery. This thesis explores how visual data files can be recovered and analysed in scenarios where they have been stored in the JPEG or H.264 (MPEG-4 AVC) compression formats.

My techniques make use of low-level details of lossy compression algorithms in order to tell whether a file under consideration might have been tampered with. I also show that limitations of entropy coding sometimes allow us to recover intact files from storage devices, even in the absence of filesystem and container metadata.

I first show that it is possible to embed an imperceptible message within a uniform region of a JPEG image such that the message becomes clearly visible when the image is recompressed at a particular quality factor, providing a visual warning that recompression has taken place.

I then use a precise model of the computations involved in JPEG decompression to build a specialised compressor, designed to invert the computations of the decompressor. This recompressor recovers the compressed bitstreams that produce a given decompression result, and, as a side-effect, indicates any regions of the input which are inconsistent with JPEG decompression. I demonstrate the algorithm on a large database of images, and show that it can detect modifications to decompressed image regions.

Finally, I show how to rebuild fragmented compressed bitstreams, given a syntax description that includes information about syntax errors, and demonstrate its applicability to H.264/AVC Baseline profile video data in memory dumps with randomly shuffled blocks.

UCAM-CL-TR-814

Arjuna Sathiaseelan, Jon Crowcroft:

The free Internet: a distant mirage or near reality?
February 2012, 10 pages, PDF

Abstract: Through this short position paper, we hope to convey our thoughts on the need for free Internet access and describe possible ways of achieving this – hoping this stimulates a useful discussion.

UCAM-CL-TR-815

Christian Richardt:

Colour videos with depth: acquisition, processing and evaluation
March 2012, 132 pages, PDF
PhD thesis (Gonville & Caius College, November 2011)

Abstract: The human visual system lets us perceive the world around us in three dimensions by integrating evidence from depth cues into a coherent visual model of the world. The equivalent in computer vision and computer graphics are geometric models, which provide a wealth of information about represented objects, such as depth and surface normals. Videos do not contain this information, but only provide per-pixel colour information. In this dissertation, I hence investigate a combination of videos and geometric models: videos with per-pixel depth (also known as RGBZ videos). I consider the full life cycle of these videos: from their acquisition, via filtering and processing, to stereoscopic display.

UCAM-CL-TR-816

Jean E. Martina:

Verification of security protocols based on multicast communication
March 2012, 150 pages, PDF
PhD thesis (Clare College, February 2011)

Abstract: Over an insecure network, agents need means to communicate securely. These means are often called security protocols. Security protocols, although constructed through the arrangement of simple security blocks, normally yield complex goals. They seem simple at first glance, but hide subtleties that allow them to be exploited.

One way of trying to systematically capture such subtleties is through the use of formal methods. The maturity of some methods for protocol verification is a fact today. But these methods are still not able to capture the whole set of security protocols being designed. With the convergence to an online world, new security goals are proposed and new protocols need to be designed. The evolution of formal verification methods becomes a necessity to keep pace with this ongoing development.

This thesis covers the Inductive Method and its extensions. The Inductive Method is a formalism to specify and verify security protocols based on structural induction and higher-order logic proofs. This account of our extensions enables the Inductive Method to reason about non-Unicast communication and threshold cryptography.

We developed a new set of theories capable of representing the entire set of known message casting frameworks. Our theories enable the Inductive Method to reason about a whole new set of protocols. We also specified a basic abstraction of threshold cryptography as a way of proving the extensibility of the method to new cryptographic primitives. We showed the feasibility of our specifications by revisiting a classic protocol, now verified under our framework. Secrecy verification under a mixed environment of Multicast and Unicast was also done for a Byzantine security protocol.

UCAM-CL-TR-817

Joseph Bonneau, Cormac Herley,Paul C. van Oorschot, Frank Stajano:

The quest to replace passwords:
a framework for comparative evaluation of Web authentication schemes
March 2012, 32 pages, PDF

Abstract: We evaluate two decades of proposals to replace text passwords for general-purpose user authentication on the web using a broad set of twenty-five usability, deployability and security benefits that an ideal scheme might provide. The scope of proposals we survey is also extensive, including password management software, federated login protocols, graphical password schemes, cognitive authentication schemes, one-time passwords, hardware tokens, phone-aided schemes and biometrics. Our comprehensive approach leads to key insights about the difficulty of replacing passwords. Not only does no known scheme come close to providing all desired benefits: none even retains the full set of benefits which legacy passwords already provide. In particular, there is a wide range between schemes offering minor security benefits beyond legacy passwords, to those offering significant security benefits in return for being more costly to deploy or difficult to use. We conclude that many academic proposals have failed to gain traction because researchers rarely consider a sufficiently wide range of real-world constraints. Beyond our analysis of current schemes, our framework provides an evaluation methodology and benchmark for future web authentication proposals.

This report is an extended version of the peer-reviewed paper by the same name. In about twice as many pages it gives full ratings for 35 authentication schemes rather than just 9.

UCAM-CL-TR-818

Robert N. M. Watson:

New approaches to operating system security extensibility
April 2012, 184 pages, PDF
PhD thesis (Wolfson College, October 2010)

Abstract: This dissertation proposes new approaches to commodity computer operating system (OS) access control extensibility that address historic problems with concurrency and technology transfer. Access control extensibility addresses a lack of consensus on operating system policy model at a time when security requirements are in flux: OS vendors, anti-virus companies, firewall manufacturers, smart phone developers, and application writers require new tools to express policies tailored to their needs. By proposing principled approaches to access control extensibility, this work allows OS security to be “designed in” yet remain flexible in the face of diverse and changing requirements.

I begin by analysing system call interposition, a popular extension technology used in security research and products, and reveal fundamental and readily exploited concurrency vulnerabilities. Motivated by these failures, I propose two security extension models: the TrustedBSD Mandatory Access Control (MAC) Framework, a flexible kernel access control extension framework for the FreeBSD kernel, and Capsicum, practical capabilities for UNIX.

The MAC Framework, a research project I began before starting my PhD, allows policy modules to dynamically extend the kernel access control policy. The framework allows policies to integrate tightly with kernel synchronisation, avoiding race conditions inherent to system call interposition, as well as offering reduced development and technology transfer costs for new security policies. Over two chapters, I explore the framework itself, and its transfer to and use in several products: the open source FreeBSD operating system, nCircle’s enforcement appliances, and Apple’s Mac OS X and iOS operating systems.

Capsicum is a new application-centric capability security model extending POSIX. Capsicum targets application writers rather than system designers, reflecting a trend towards security-aware applications such as Google’s Chromium web browser, that map distributed security policies into often inadequate local primitives. I compare Capsicum with other sandboxing techniques, demonstrating improved performance, programmability, and security.

This dissertation makes original contributions to challenging research problems in security and operating system design. Portions of this research have already had a significant impact on industry practice.

UCAM-CL-TR-819

Joseph Bonneau:

Guessing human-chosen secrets
May 2012, 161 pages, PDF
PhD thesis (Churchill College, May 2012)

Abstract: Authenticating humans to computers remains a notable weak point in computer security despite decades of effort. Although the security research community has explored dozens of proposals for replacing or strengthening passwords, they appear likely to remain entrenched as the standard mechanism of human-computer authentication on the Internet for years to come. Even in the optimistic scenario of eliminating passwords from most of today’s authentication protocols using trusted hardware devices or trusted servers to perform federated authentication, passwords will persist as a means of “last-mile” authentication between humans and these trusted single sign-on deputies.

This dissertation studies the difficulty of guessing human-chosen secrets, introducing a sound mathematical framework modeling human choice as a skewed probability distribution. We introduce a new metric, alpha-guesswork, which can accurately model the resistance of a distribution against all possible guessing attacks. We also study the statistical challenges of estimating this metric using empirical data sets which can be modeled as a large random sample from the underlying probability distribution.

This framework is then used to evaluate several representative data sets from the most important categories of human-chosen secrets to provide reliable estimates of security against guessing attacks. This includes collecting the largest-ever corpus of user-chosen passwords, with nearly 70 million, the largest list of human names ever assembled for research, the largest data sets of real answers to personal knowledge questions and the first data published about human choice of banking PINs. This data provides reliable numbers for designing security systems and highlights universal limitations of human-chosen secrets.
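To illustrate why a skewed distribution resists guessing poorly, here is the classical guesswork quantity (expected number of sequential guesses in descending-probability order) rather than the dissertation's alpha-guesswork metric; the data and function names are made up for the example.

```python
from collections import Counter

def expected_guesses(samples):
    """Expected number of sequential guesses against the empirical
    distribution: sum of i * p_i with probabilities sorted descending."""
    counts = Counter(samples)
    total = sum(counts.values())
    probs = sorted((c / total for c in counts.values()), reverse=True)
    return sum(i * p for i, p in enumerate(probs, start=1))

# A skewed 'password' distribution is easier to guess than a uniform
# distribution over the same number of distinct values.
skewed = ["123456"] * 50 + ["password"] * 30 + ["letmein"] * 15 + ["qwerty"] * 5
uniform = ["a", "b", "c", "d"] * 25

print(round(expected_guesses(skewed), 2))   # 1.75
print(round(expected_guesses(uniform), 2))  # 2.5
```

An attacker trying the most common values first needs fewer guesses on average against the skewed set, which is the intuition the dissertation's framework makes precise across all guessing-attack models.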

UCAM-CL-TR-820

Eiko Yoneki, Amitabha Roy:

A unified graph query layer for multiple databases
August 2012, 22 pages, PDF

Abstract: There is increasing demand to store and query data with an inherent graph structure. Examples of such data include those from online social networks, the semantic web and from navigational queries on spatial data such as maps. Unfortunately, traditional relational databases have fallen short where such graph structured data is concerned. This has led to the development of specialised graph databases such as Neo4j. However, traditional databases continue to have a wide usage base and have desirable properties such as the capacity to support a high volume of transactions while offering ACID semantics. In this paper we argue that it is in fact possible to unify different database paradigms together in the case of graph structured data through the use of a common query language and data loader that we have named Crackle (a wordplay on Gra[ph]QL). Crackle provides an expressive and powerful query library in Clojure (a functional LISP dialect for JVMs). It also provides a data loader that is capable of interfacing transparently with various data sources such as PostgreSQL databases and the Redis key-value store. Crackle shields programmers from the backend database by allowing them to write queries in Clojure. Additionally, its graph-focused prefetchers are capable of closing the hitherto large gap between a PostgreSQL database and a specialised graph database such as Neo4j from as much as 326x (with a SQL query) to as low as 6x (when using Crackle). We also include a detailed performance analysis that identifies ways to further reduce this gap with Crackle. This brings into question the performance argument for specialised graph databases such as Neo4j by providing comparable performance on supposedly legacy data sources.

UCAM-CL-TR-821

Charles Reams:

Modelling energy efficiency for computation
October 2012, 135 pages, PDF
PhD thesis (Clare College, October 2012)

Abstract: In the last decade, efficient use of energy has become a topic of global significance, touching almost every area of modern life, including computing. From mobile to desktop to server, energy efficiency concerns are now ubiquitous. However, approaches to the energy problem are often piecemeal and focus on only one area for improvement.

I argue that the strands of the energy problem are inextricably entangled and cannot be solved in isolation. I offer a high-level view of the problem and, building from it, explore a selection of subproblems within the field. I approach these with various levels of formality, and demonstrate techniques to make improvements on all levels. The original contributions are as follows:

Chapter 3 frames the energy problem as one of optimisation with constraints, and explores the impact of this perspective for current commodity products. This includes considerations of the hardware, software and operating system. I summarise the current situation in these respects and propose directions in which they could be improved to better support energy management.

Chapter 4 presents mathematical techniques to compute energy-optimal schedules for long-running computations. This work reflects the server-domain concern with energy cost, producing schedules that exploit fluctuations in power cost over time to minimise expenditure rather than raw energy. This assumes certain idealised models of power, performance, cost, and workload, and draws precise formal conclusions from them.

Chapter 5 considers techniques to implement energy-efficient real-time streaming. Two classes of problem are considered: first, hard real-time streaming with fixed, predictable frame characteristics; second, soft real-time streaming with a quality-of-service guarantee and probabilistic descriptions of per-frame workload. Efficient algorithms are developed for scheduling frame execution in an energy-efficient way while still guaranteeing hard real-time deadlines. These schedules determine appropriate values for power-relevant parameters, such as dynamic voltage–frequency scaling.
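The kind of power-relevant parameter such schedules set can be illustrated by the textbook dynamic voltage–frequency scaling trade-off (a generic sketch under standard CMOS assumptions, not the scheduling algorithms developed in the chapter; the function name is invented here):

```python
def slowest_feasible_frequency(cycles, deadline_s, freq_levels_hz):
    """Return the lowest frequency (in Hz) at which 'cycles' of work
    still completes within a hard deadline, or None if even the fastest
    level misses it. Dynamic CMOS power scales roughly with f * V^2 and
    V scales with f, so running as slowly as the deadline allows
    minimises dynamic energy for the frame."""
    for f in sorted(freq_levels_hz):
        if cycles / f <= deadline_s:
            return f
    return None  # infeasible even at the top frequency
```

For a frame of 10^9 cycles with a one-second deadline and levels of 0.5, 1 and 2 GHz, the energy-minimal feasible choice is 1 GHz: 0.5 GHz would miss the deadline, and 2 GHz would waste energy.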

A key challenge for future work will be unifying these diverse approaches into one “Theory of Energy” for computing. The progress towards this is summarised in Chapter 6. The thesis concludes by sketching future work towards this Theory of Energy.

UCAM-CL-TR-822

Richard A. Russell:

Planning with preferences using maximum satisfiability
October 2012, 160 pages, PDF
PhD thesis (Gonville and Caius College, September 2011)

Abstract: The objective of automated planning is to synthesise a plan that achieves a set of goals specified by the user. When achieving every goal is not feasible, the planning system must decide which ones to plan for and find the lowest cost plan. The system should take as input a description of the user’s preferences and the costs incurred through executing actions. Goal utility dependencies arise when the utility of achieving a goal depends on the other goals that are achieved with it. This complicates the planning procedure because achieving a new goal can alter the utilities of all the other goals currently achieved.

In this dissertation we present methods for solving planning problems with goal utility dependencies by compiling them to a variant of satisfiability known as weighted partial maximum satisfiability (WPMax-SAT). An optimal solution to the encoding is found using a general-purpose solver. The encoding is constructed such that its optimal solution can be used to construct a plan that is most preferred amongst other plans whose length fits within a prespecified horizon. We evaluate this approach against an integer programming based system using benchmark problems taken from past international planning competitions.
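For readers unfamiliar with the target formalism: a weighted partial MaxSAT (WPMax-SAT) instance separates hard clauses, which any solution must satisfy, from weighted soft clauses, whose satisfied weight is maximised. The exhaustive toy solver below, with an invented two-goal example, only illustrates the formalism; the report’s encodings are far larger and solved with a general-purpose WPMax-SAT solver.

```python
from itertools import product

def wpmaxsat(num_vars, hard, soft):
    """Exhaustive solver for tiny WPMax-SAT instances. Clauses are
    tuples of non-zero ints in DIMACS style: literal v means variable
    v is true, -v means it is false (variables are 1-indexed). Returns
    (best_weight, assignment) over assignments satisfying every hard
    clause, or None if the hard clauses are unsatisfiable."""
    def satisfied(clause, assignment):
        return any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)

    best = None
    for assignment in product([False, True], repeat=num_vars):
        if not all(satisfied(c, assignment) for c in hard):
            continue
        weight = sum(w for w, c in soft if satisfied(c, assignment))
        if best is None or weight > best[0]:
            best = (weight, assignment)
    return best

# Invented toy "goal utility" encoding: goals x1 and x2 carry utilities
# 3 and 2, and a hard mutual-exclusion clause forbids achieving both.
hard_clauses = [(-1, -2)]
soft_clauses = [(3, (1,)), (2, (2,))]
```

`wpmaxsat(2, hard_clauses, soft_clauses)` picks the higher-utility goal alone, returning weight 3 with x1 true and x2 false.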

We study how a WPMax-SAT solver might benefit from incorporating a procedure known as survey propagation. This is a message passing algorithm that estimates the probability that a variable is constrained to be a particular value in a randomly selected satisfying assignment. These estimates are used to influence variable/value decisions during search for a solution. Survey propagation is usually presented with respect to the satisfiability problem, and its generalisation, SP(y), with respect to the maximum satisfiability problem. We extend the argument that underpins these two algorithms to derive a new set of message passing equations for application to WPMax-SAT problems. We evaluate the success of this method by applying it to our encodings of planning problems with goal utility dependencies.


Our results indicate that planning with preferences using WPMax-SAT is competitive and sometimes more successful than an integer programming approach – solving two to three times more subproblems in some domains, while being outperformed by a smaller margin in others. In some domains, we also find that using information provided by survey propagation in a WPMax-SAT solver to select variable/value pairs for the earliest decisions can, on average, direct search to lower cost solutions than a uniform sampling strategy combined with a popular heuristic.

UCAM-CL-TR-823

Amitabha Roy, Karthik Nilakant,Valentin Dalibard, Eiko Yoneki:

Mitigating I/O latency in SSD-based graph traversal
November 2012, 27 pages, PDF

Abstract: Mining large graphs has now become an important aspect of many applications. Recent interest in low cost graph traversal on single machines has led to the construction of systems that use solid state drives (SSDs) to store the graph. An SSD can be accessed with far lower latency than magnetic media, while remaining cheaper than main memory. Unfortunately SSDs are slower than main memory and algorithms running on such systems are hampered by large IO latencies when accessing the SSD. In this paper we present two novel techniques to reduce the impact of SSD IO latency on semi-external memory graph traversal. We introduce a variant of the Compressed Sparse Row (CSR) format that we call Compressed Enumerated Encoded Sparse Offset Row (CEESOR). CEESOR is particularly efficient for graphs with hierarchical structure and can reduce the space required to represent connectivity information by amounts varying from 5% to as much as 76%. CEESOR allows a larger number of edges to be moved for each unit of IO transfer from the SSD to main memory and more effective use of operating system caches. Our second contribution is a runtime prefetching technique that exploits the ability of solid state drives to service multiple random access requests in parallel. We present a novel Run Along SSD Prefetcher (RASP). RASP is capable of hiding the effect of IO latency in single threaded graph traversal in breadth-first and shortest path order to the extent that it improves iteration time for large graphs by amounts varying from 2.6X-6X.
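For context, the baseline the report’s format compresses is the standard Compressed Sparse Row (CSR) layout, in which all adjacency lists are packed contiguously with one offset per vertex. The sketch below shows plain CSR only; CEESOR’s enumerated, encoded offsets, which yield the space savings quoted above, are described in the report itself.

```python
def to_csr(num_vertices, edges):
    """Build a Compressed Sparse Row adjacency representation of a
    directed graph: the slice column_indices[row_offsets[v]:
    row_offsets[v + 1]] holds vertex v's out-neighbours."""
    adjacency = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        adjacency[src].append(dst)
    row_offsets = [0]
    column_indices = []
    for neighbour_list in adjacency:
        column_indices.extend(sorted(neighbour_list))
        row_offsets.append(len(column_indices))
    return row_offsets, column_indices

def neighbours(csr, v):
    """Out-neighbours of v, read directly from the packed arrays."""
    row_offsets, column_indices = csr
    return column_indices[row_offsets[v]:row_offsets[v + 1]]
```

Because each vertex’s edges are contiguous, a traversal reads them with one sequential transfer, which is exactly the property that makes packing more edges per unit of IO (as CEESOR does) pay off on an SSD.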

UCAM-CL-TR-824

Simon Frankau:

Hardware synthesis from a stream-processing functional language
November 2012, 202 pages, PDF
PhD thesis (St. John’s College, July 2004)

Abstract: As hardware designs grow exponentially larger, there is an increasing challenge to use transistor budgets effectively. Without higher-level synthesis tools, so much effort may be spent on low-level details that it becomes impractical to effectively design circuits of the size that can be fabricated. This possibility of a design gap has been documented for some time now.

One solution is the use of domain-specific languages. This thesis covers the use of software-like languages to describe algorithms that are to be implemented in hardware. Hardware engineers can use the tools to improve their productivity and effectiveness in this particular domain. Software engineers can also use this approach to benefit from the parallelism available in modern hardware (such as reconfigurable systems and FPGAs), while retaining the convenience of a software description.

In this thesis a statically-allocated pure functional language, SASL, is introduced. Static allocation makes the language suited to implementation in fixed hardware resources. The I/O model is based on streams (linear lazy lists), and implicit parallelism is used in order to maintain a software-like approach. The thesis contributes constraints which allow the language to be statically-allocated, and synthesis techniques for SASL targeting both basic CSP and a graph-based target that may be compiled to a register-transfer level (RTL) description.

Further chapters examine the optimisation of the language, including the use of lenient evaluation to increase parallelism, the introduction of closures and general lazy evaluation, and the use of non-determinism in the language. The extensions are examined in terms of the restrictions required to ensure static allocation, and the techniques required to synthesise them.

UCAM-CL-TR-825

Jonathan Anderson:

Privacy engineering for social networks
December 2012, 255 pages, PDF
PhD thesis (Trinity College, July 2012)

Abstract: In this dissertation, I enumerate several privacy problems in online social networks (OSNs) and describe a system called Footlights that addresses them. Footlights is a platform for distributed social applications that allows users to control the sharing of private information. It is designed to compete with the performance of today’s centralised OSNs, but it does not trust centralised infrastructure to enforce security properties.

Based on several socio-technical scenarios, I extract concrete technical problems to be solved and show how the existing research literature does not solve them. Addressing these problems fully would fundamentally change users’ interactions with OSNs, providing real control over online sharing.

I also demonstrate that today’s OSNs do not provide this control: both user data and the social graph are vulnerable to practical privacy attacks.

Footlights’ storage substrate provides private, scalable, sharable storage using untrusted servers. Under realistic assumptions, the direct cost of operating this storage system is less than one US dollar per user-year. It is the foundation for a practical shared filesystem, a perfectly unobservable communications channel and a distributed application platform.

The Footlights application platform allows third-party developers to write social applications without direct access to users’ private data. Applications run in a confined environment with a private-by-default security model: applications can only access user information with explicit user consent. I demonstrate that practical applications can be written on this platform.

The security of Footlights’ user data is based on public-key cryptography, but users are able to log in to the system without carrying a private key on a hardware token. Instead, users authenticate to a set of authentication agents using a weak secret such as a user-chosen password or randomly-assigned 4-digit number. The protocol is designed to be secure even in the face of malicious authentication agents.

UCAM-CL-TR-826

Fernando M. V. Ramos:

GREEN IPTV: a resource and energy efficient network for IPTV
December 2012, 152 pages, PDF
PhD thesis (Clare Hall, November 2012)

Abstract: The distribution of television is currently dominated by three technologies: over-the-air broadcast, cable, and satellite. The advent of IP networks and the increased availability of broadband access created a new vehicle for the distribution of TV services. The distribution of digital TV services over IP networks, or IPTV, offers carriers flexibility and added value in the form of additional services. The rapid roll-out of IPTV services by operators worldwide in the past few years therefore comes as no surprise.

IPTV distribution imposes stringent requirements on both performance and reliability. It is therefore challenging for an IPTV operator to guarantee the quality of experience expected by its users, and to do so in an efficient manner. In this dissertation I investigate some of the challenges faced by IPTV distribution network operators, and I propose novel techniques to address these challenges.

First, I address one of the major concerns of IPTV network deployment: channel change delay. This is the latency experienced by users when switching between TV channels. Synchronisation and buffering of video streams can cause channel change delays of several seconds. I perform an empirical analysis of a particular solution to the channel change delay problem, namely, predictive pre-joining of TV channels. In this scheme each Set Top Box simultaneously joins additional multicast groups (TV channels) along with the one requested by the user. If the user switches to any of these channels next, switching latency is virtually eliminated, and user experience is improved. The results show that it is possible to eliminate zapping delay for a significant percentage of channel switching requests with little impact on access network bandwidth cost.

Second, I propose a technique to increase the resource and energy efficiency of IPTV networks. This technique is based on a simple paradigm: avoiding waste. To reduce the inefficiencies of current static multicast distribution schemes, I propose a semi-dynamic scheme where only a selection of TV multicast groups is distributed in the network, instead of all. I perform an empirical evaluation of this method and conclude that its use results in significant bandwidth reductions without compromising service performance. I also demonstrate that these reductions may translate into significant energy savings in the future.

Third, to increase energy efficiency further, I propose a novel energy and resource friendly protocol for core optical IPTV networks. The idea is for popular IPTV traffic to optically bypass the network nodes, avoiding electronic processing. I evaluate this proposal empirically and conclude that the introduction of optical switching techniques results in a significant increase in the energy efficiency of IPTV networks.

All the schemes I present in this dissertation are evaluated by means of trace-driven analyses using a dataset from an operational IPTV service provider. Such thorough and realistic evaluation enables the assessment of the proposed techniques with an increased level of confidence, and is therefore a strength of this dissertation.

UCAM-CL-TR-827

Omar S. Choudary:

The smart card detective: a hand-held EMV interceptor
December 2012, 55 pages, PDF

Abstract: Several vulnerabilities have been found in the EMV system (also known as Chip and PIN). Saar Drimer and Steven Murdoch have successfully implemented a relay attack against EMV using a fake terminal. Recently the same authors have found a method to successfully complete PIN transactions without actually entering the correct PIN. The press has published this vulnerability but they reported such a scenario as being hard to execute in practice because it requires specialized and complex hardware.


As proposed by Ross Anderson and Mike Bond in 2006, I decided to create a miniature man-in-the-middle device to defend smartcard users against relay attacks.

As a result of my MPhil project work I created a hand-held device, called the Smart Card Detective (SCD), which intercepts the communication between smartcard and terminal. The device has been built using a low-cost ATMEL AT90USB1287 microcontroller and other readily available electronic components. The total cost of the SCD has been around £100, but an industrial version could be produced for less than £20.

I implemented several applications using the SCD, including the defense against the relay attack as well as the recently discovered vulnerability to complete a transaction without using the correct PIN.

All the applications have been successfully tested on CAP readers and live terminals. Furthermore, I have performed real tests using the SCD at several shops in town.

From the experiments using the SCD, I have noticed some particularities of the CAP protocol compared to the EMV standard. I have also discovered that the smartcard does not follow the physical transport protocol exactly. Such findings are presented in detail, along with a discussion of the results.

UCAM-CL-TR-828

Rosemary M. Francis:

Exploring networks-on-chip for FPGAs
January 2013, 121 pages, PDF
PhD thesis (Darwin College, July 2009)

Abstract: Developments in fabrication processes have shifted the cost ratio between wires and transistors to allow new trade-offs between computation and communication. Rising clock speeds have led to multi-cycle cross-chip communication and pipelined buses. It is then a small step from pipelining to switching and the development of multi-core networked systems-on-chip. Modern FPGAs are also now home to complex systems-on-chip. A change in the way we structure the computation demands a change in the way we structure the communication on-chip.

This thesis looks at Network-on-Chip design for FPGAs beyond the trade-offs between hard (silicon) and soft (configurable) designs. FPGAs are capable of extremely flexible, statically routed bit-based wiring, but this flexibility comes at a high area, latency and power cost. Soft NoCs are able to maintain this flexibility, but do not necessarily make good use of the computation-communication trade-off. Hard NoCs are more efficient when used, but are forced to operate below capacity by the soft IP cores. It is also difficult to design hard NoCs with the flexibility needed without wasting silicon when the network is not used.

In the first part of this thesis I explore the capability of Time-Division Multiplexed (TDM) wiring to bridge the gap between the fine-grain static FPGA wiring and the bus-based dynamic routing of a NoC. By replacing some of the static FPGA wiring with TDM wiring I am able to time division multiplex hard routers and make better use of the non-configurable area. The cost of a hard network is reduced by moving some of the area cost from the routers into reusable TDM wiring components. The TDM wiring improves the interface between the hard routers and soft IP blocks which leads to higher logic density overall. I show that TDM wiring makes hard routers a flexible and efficient alternative to soft interconnect.

The second part of this thesis looks at the feasibility of replacing all static wiring on the FPGA with TDM wiring. The aim was to increase the routing capacity of the FPGA whilst decreasing the area used to implement it. An ECAD flow was developed to explore the extent to which the amount of wiring can be reduced. The results were then used to design the TDM circuitry.

My results show that an 80% reduction in the amount of wiring is possible through time-division multiplexing. This reduction is sufficient to increase the routing capacity of the FPGA whilst maintaining similar or better logic density. This TDM wiring can be used to implement area and power-efficient hard networks-on-chip with good flexibility, as well as improving the performance of other hard IP blocks.

UCAM-CL-TR-829

Philip Christopher Paul:

Microelectronic security measures
February 2013, 177 pages, PDF
PhD thesis (Pembroke College, January 2009)

Abstract: In this dissertation I propose the concept of tamper protection grids for microelectronic security devices made from organic electronic materials. As security devices have become ubiquitous in recent years, they are becoming targets for criminal activity. One general attack route to breach the security is to carry out a physical attack after depackaging a device. Commercial security devices use a metal wire mesh within the chip to protect against these attacks. However, as a microchip is physically robust, the mesh is not affected by depackaging.

As a better way of protecting security devices against attacks requiring the chip package to be removed, I investigate a protection grid that is vulnerable to damage if the packaging is tampered with. The protection grid is connected directly to standard bond pads on the microchip, to allow direct electronic measurements, saving the need for complex sensor structures. That way, a security device can monitor the package for integrity, and initiate countermeasures if required.

The feasibility of organic tamper protection grids was evaluated. To establish the viability of the concept, a fabrication method for these devices was developed, the sensitivity to depackaging was assessed, and practical implementation issues were evolved. Inkjet printing was chosen as the fabrication route, as devices can be produced at low cost while preserving flexibility of layout. A solution to the problem of adverse surface interaction was found to ensure good print quality on the hydrophobic chip surface. Standard contacts between chip and grid are non-linear and degrade between measurements; however, it was shown that stable ohmic contacts are possible using a silver buffer layer. The sensitivity of the grid to reported depackaging methods was tested, and improvements to the structure were found to maximise damage to the grid upon tampering with the package. Practical issues such as measurement stability with temperature and age were evaluated, as well as a first prototype to assess the achievable measurement accuracy. The evaluation of these practical issues shows directions for future work that can develop organic protection grids beyond the proof of concept.

Apart from the previously mentioned invasive attacks, there is a second category of attacks, non-invasive attacks, that do not require the removal of the chip packaging. The most prominent non-invasive attack is power analysis, in which the power consumption of a device is used as an oracle to reveal the secret key of a security device. Logic gates were designed and fabricated with data-independent power consumption in each clock cycle. However, it is shown that this is not sufficient to protect the secret key. Despite balancing the discharged capacitances in each clock cycle, the power consumed still depends on the data input. While the overall charge consumed in each clock cycle matches to a few percent, differences within a clock cycle can easily be measured. It was shown that the dominant cause for this imbalance is early propagation, which can be mitigated by ensuring that evaluation in a gate only takes place after all inputs are present. The second major source of imbalance is mismatched discharge paths in logic gates, which result in data-dependent evaluation times of a gate. This source of imbalance is not as trivial to remove, as it conflicts with balancing the discharged capacitances in each clock cycle.

UCAM-CL-TR-830

Paul J. Fox:

Massively parallel neural computation
March 2013, 105 pages, PDF
PhD thesis (Jesus College, October 2012)

Abstract: Reverse-engineering the brain is one of the US National Academy of Engineering’s ‘Grand Challenges’. The structure of the brain can be examined at many different levels, spanning many disciplines from low-level biology through psychology and computer science. This thesis focusses on real-time computation of large neural networks using the Izhikevich spiking neuron model.

Neural computation has been described as ‘embarrassingly parallel’ as each neuron can be thought of as an independent system, with behaviour described by a mathematical model. However, the real challenge lies in modelling neural communication. While the connectivity of neurons has some parallels with that of electrical systems, its high fan-out results in massive data processing and communication requirements when modelling neural communication, particularly for real-time computations.

It is shown that memory bandwidth is the most significant constraint to the scale of real-time neural computation, followed by communication bandwidth, which leads to a decision to implement a neural computation system on a platform based on a network of Field Programmable Gate Arrays (FPGAs), using commercial off-the-shelf components with some custom supporting infrastructure. This brings implementation challenges, particularly lack of on-chip memory, but also many advantages, particularly high-speed transceivers. An algorithm to model neural communication that makes efficient use of memory and communication resources is developed and then used to implement a neural computation system on the multi-FPGA platform.

Finding suitable benchmark neural networks for a massively parallel neural computation system proves to be a challenge. A synthetic benchmark that has biologically-plausible fan-out, spike frequency and spike volume is proposed and used to evaluate the system. It is shown to be capable of computing the activity of a network of 256k Izhikevich spiking neurons with a fan-out of 1k in real-time using a network of 4 FPGA boards. This compares favourably with previous work, with the added advantage of scalability to larger neural networks using more FPGAs.

It is concluded that communication must be considered as a first-class design constraint when implementing massively parallel neural computation systems.

UCAM-CL-TR-831

Meredydd Luff:

Communication for programmability and performance on multi-core processors
April 2013, 89 pages, PDF
PhD thesis (Gonville & Caius College, November 2012)

Abstract: The transition to multi-core processors has yielded a fundamentally new sort of computer. Software can no longer benefit passively from improvements in processor technology, but must perform its computations in parallel if it is to take advantage of the continued increase in processing power. Software development has yet to catch up, and with good reason: parallel programming is hard, error-prone and often unrewarding.

In this dissertation, I consider the programmability challenges of the multi-core era, and examine three angles of attack.

I begin by reviewing alternative programming paradigms which aim to address these changes, and investigate two popular alternatives with a controlled pilot experiment. The results are inconclusive, and subsequent studies in that field have suffered from similar weaknesses. This leads me to conclude that empirical user studies are poor tools for designing parallel programming systems.

I then consider one such alternative paradigm, transactional memory, which has promising usability characteristics but suffers performance overheads so severe that they mask its benefits. By modelling an ideal inter-core communication mechanism, I propose using our embarrassment of parallel riches to mitigate these overheads. By pairing “helper” processors with application threads, I offload the overheads of software transactional memory, thereby greatly mitigating the problem of serial overhead.

Finally, I address the mechanics of inter-core communication. Due to the use of cache coherence to preserve the programming model of previous processors, explicitly communicating between the cores of any modern multi-core processor is painfully slow. The schemes proposed so far to alleviate this problem are complex, insufficiently general, and often introduce new resources which cannot be virtualised transparently by a time-sharing operating system. I propose and describe an asynchronous remote store instruction, which is issued by one core and completed asynchronously by another into its own local cache. I evaluate several patterns of parallel communication, and determine that the use of remote stores greatly increases the performance of common synchronisation kernels. I quantify the benefit to the feasibility of fine-grained parallelism. To finish, I use this mechanism to implement my parallel STM scheme, and demonstrate that it performs well, reducing overheads significantly.

UCAM-CL-TR-832

Gregory A. Chadwick:

Communication centric, multi-core, fine-grained processor architecture
April 2013, 165 pages, PDF
PhD thesis (Fitzwilliam College, September 2012)

Abstract: With multi-core architectures now firmly entrenched in many application areas, both computer architects and programmers now face new challenges. Computer architects must increase core count to increase the explicit parallelism available to the programmer in order to provide better performance, whilst leaving the programming model presented tractable. The programmer must find ways to exploit this explicit parallelism that scale well with increasing core and thread availability.

A fine-grained computation model allows the programmer to expose a large amount of explicit parallelism, and the greater the level of parallelism exposed, the better increasing core counts can be utilised. However, a fine-grained approach implies many interworking threads, and the overhead of synchronising and scheduling these threads can eradicate any scalability advantages a fine-grained program may have.

Communication is also a key issue in multi-core architecture. Wires do not scale as well as gates, making communication relatively more expensive compared to computation, so optimising communication between cores on chip becomes important.

This dissertation presents an architecture designed to enable scalable fine-grained computation that is communication aware (allowing a programmer to optimise for communication). By combining a tagged memory, where each word is augmented with a presence bit signifying whether or not data is present in that word, with a hardware-based scheduler, which allows a thread to wait upon a word becoming present with low overhead, a flexible and scalable architecture well suited to fine-grained computation can be created, one which enables this without needing the introduction of many new architectural features or instructions. Communication is made explicit by enforcing that accesses to a given area of memory will always go to the same cache, removing the need for a cache coherency protocol.
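The presence-bit idea can be caricatured in software (illustrative only: the class below is an invented stand-in, and Mamba implements this in hardware, with its scheduler descheduling a waiting thread cheaply rather than blocking an OS thread):

```python
import threading

class TaggedWord:
    """A memory word augmented with a presence bit. A read of an empty
    word waits until a write marks it present, giving the low-overhead
    producer/consumer synchronisation the architecture builds on."""
    def __init__(self):
        self._value = None
        self._present = threading.Event()  # the presence bit

    def write(self, value):
        self._value = value
        self._present.set()                # word becomes present

    def read(self):
        self._present.wait()               # wait upon the word becoming present
        return self._value
```

A consumer calling `read()` before any `write()` simply waits; once a producer stores a value, every waiting reader resumes with it, with no separate lock or flag protocol in the program.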

The dissertation begins by reviewing the need for multi-core architecture and discusses the major issues faced in their construction. It moves on to look at fine-grained computation in particular. The proposed architecture, known as Mamba, is then presented in detail, with several software techniques suitable for use with it introduced. An FPGA implementation of Mamba is then evaluated against a similar architecture that lacks the extensions Mamba has for assisting in fine-grained computation (namely a memory tagged with presence bits and a hardware scheduler). Microbenchmarks examining the performance of FIFO-based communication, MCS locks (an efficient spin-lock implementation based around queues) and barriers demonstrate Mamba’s scalability and insensitivity to thread count. A SAT solver implementation demonstrates that these benefits have a real impact on an actual application.

UCAM-CL-TR-833

Alan F. Blackwell, Ignatios Charalampidis:

Practice-led design and evaluation of a live visual constraint language
May 2013, 16 pages, PDF


Abstract: We report an experimental evaluation of Palimpsest, a novel purely-visual programming language. A working prototype of Palimpsest had been developed following a practice-led process, in order to assess whether tools for use in the visual arts can usefully be created by adopting development processes that emulate arts practice. This initial prototype was received more positively by users who have high self-efficacy in both visual arts and computer use. A number of potential usability improvements are identified, structured according to the Cognitive Dimensions of Notations framework.

UCAM-CL-TR-834

John Wickerson:

Concurrent verification for sequential programs
May 2013, 149 pages, PDF
PhD thesis (Churchill College, December 2012)

Abstract: This dissertation makes two contributions to the field of software verification. The first explains how verification techniques originally developed for concurrency can be usefully applied to sequential programs. The second describes how sequential programs can be verified using diagrams that have a parallel nature.

The first contribution involves a new treatment of stability in verification methods based on rely-guarantee. When an assertion made in one thread of a concurrent system cannot be invalidated by the actions of other threads, that assertion is said to be ‘stable’. Stability is normally enforced through side-conditions on rely-guarantee proof rules. This dissertation instead proposes to encode stability information into the syntactic form of the assertion. This approach, which we call explicit stabilisation, brings several benefits. First, we empower rely-guarantee with the ability to reason about library code for the first time. Second, when the rely-guarantee method is redeployed in a sequential setting, explicit stabilisation allows more details of a module’s implementation to be hidden when verifying clients. Third, explicit stabilisation brings a more nuanced understanding of the important issue of stability in concurrent and sequential verification; such an understanding grows ever more important as verification techniques grow ever more complex.

The second contribution is a new method of presenting program proofs conducted in separation logic. Building on work by Jules Bean, the ribbon proof is a diagrammatic alternative to the standard ‘proof outline’. By emphasising the structure of a proof, ribbon proofs are intelligible and hence pedagogically useful. Because they contain less redundancy than proof outlines, and allow each proof step to be checked locally, they are highly scalable; this we illustrate with a ribbon proof of the Version 7 Unix memory manager. Where proof outlines are cumbersome to modify, ribbon proofs can be visually manoeuvred to yield proofs of variant programs. We describe the ribbon proof system, prove its soundness and completeness, and outline a prototype tool for mechanically checking the diagrams it produces.

UCAM-CL-TR-835

Maximilian C. Bolingbroke:

Call-by-need supercompilation
May 2013, 230 pages, PDF
PhD thesis (Robinson College, April 2013)

Abstract: This thesis shows how supercompilation, a powerful technique for transformation and analysis of functional programs, can be effectively applied to a call-by-need language. Our setting will be core calculi suitable for use as intermediate languages when compiling higher-order, lazy functional programming languages such as Haskell.

We describe a new formulation of supercompilation which is more closely connected to operational semantics than the standard presentation. As a result of this connection, we are able to exploit a standard Sestoft-style operational semantics to build a supercompiler which, for the first time, is able to supercompile a call-by-need language with unrestricted recursive let bindings.

We give complete descriptions of all of the (surprisingly tricky) components of the resulting supercompiler, showing in detail how standard formulations of supercompilation have to be adapted for the call-by-need setting.

We show how the standard technique of generalisation can be extended to the call-by-need setting. We also describe a novel generalisation scheme which is simpler to implement than standard generalisation techniques, and describe a completely new form of generalisation which can be used when supercompiling a typed language to ameliorate the phenomenon of supercompilers overspecialising functions on their type arguments.

We also demonstrate a number of non-generalisation-based techniques that can be used to improve the quality of the code generated by the supercompiler. Firstly, we show how let-speculation can be used to ameliorate the effects of the work-duplication checks that are inherent to call-by-need supercompilation. Secondly, we demonstrate how the standard idea of ‘rollback’ in supercompilation can be adapted to our presentation of the supercompilation algorithm.

We have implemented our supercompiler as an optimisation pass in the Glasgow Haskell Compiler. We perform a comprehensive evaluation of our implementation on a suite of standard call-by-need benchmarks. We improve the runtime of the benchmarks in our suite by a geometric mean of 42%, and reduce the amount of memory which the benchmarks allocate by a geometric mean of 34%.


UCAM-CL-TR-836

Janina Voigt, Alan Mycroft:

Aliasing contracts: a dynamic approach to alias protection
June 2013, 27 pages, PDF

Abstract: Object-oriented programming languages allow multiple variables to refer to the same object, a situation known as aliasing. Aliasing is a powerful tool which enables sharing of objects across a system. However, it can cause serious encapsulation breaches if not controlled properly; through aliasing, internal parts of aggregate objects can be exposed and potentially modified by any part of the system.

A number of schemes for controlling aliasing have been proposed, including Clarke et al.’s ownership types and Boyland et al.’s capabilities. However, many existing systems lack flexibility and expressiveness, making it difficult in practice to program common idioms or patterns which rely on sharing, such as iterators.

We introduce aliasing contracts, a dynamic alias protection scheme which is highly flexible and expressive. Aliasing contracts allow developers to express assumptions about which parts of a system can access particular objects. Aliasing contracts attempt to be a universal approach to alias protection; they can be used to encode various existing schemes.
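The flavour of a dynamically checked access assumption can be sketched as follows. This is only an illustration: the wrapper class, the predicate encoding and all names are invented here, and the paper’s actual contract language (and its encodings of ownership types and capabilities) is considerably richer.

```python
class ContractViolation(Exception):
    """Raised when an access breaks the object's aliasing contract."""

class Contracted:
    """Pairs an object with an aliasing contract: a predicate over the
    would-be accessor, evaluated dynamically on every access.
    """
    def __init__(self, obj, contract):
        self._obj = obj
        self._contract = contract

    def access(self, accessor):
        if not self._contract(accessor):
            raise ContractViolation(f"{accessor!r} may not access this object")
        return self._obj

# Only the owning aggregate may reach the internal list through an alias.
books = Contracted([], contract=lambda who: who == "Library")
books.access("Library").append("TR-836")   # permitted
try:
    books.access("Borrower")               # encapsulation breach caught
except ContractViolation:
    print("access denied")
```

Because the contract is a run-time predicate rather than a static annotation, sharing-heavy idioms such as iterators can be admitted simply by writing a more permissive predicate, which is the flexibility the abstract emphasises.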

UCAM-CL-TR-837

Hamed Haddadi, Richard Mortier, Derek McAuley, Jon Crowcroft:

Human-data interaction
June 2013, 9 pages, PDF

Abstract: The time has come to recognise the emerging topic of Human-Data Interaction (HDI). It arises from the need, both ethical and practical, to engage users to a much greater degree with the collection, analysis, and trade of their personal data, in addition to providing them with an intuitive feedback mechanism. HDI is inherently inter-disciplinary, encapsulating elements not only of traditional computer science ranging across data processing, systems design, visualisation and interaction design, but also of law, psychology, behavioural economics, and sociology. In this short paper we elaborate the motivation for studying the nature and dynamics of HDI, and we give some thought to challenges and opportunities in developing approaches to this novel discipline.

UCAM-CL-TR-838

Silvia Breu:

Mining and tracking in evolving software
June 2013, 104 pages, PDF
PhD thesis (Newnham College, April 2011)

Abstract: Every large program contains a small fraction of functionality that resists clean encapsulation. For example, code for debugging or locking is hard to keep hidden using object-oriented mechanisms alone. This problem gave rise to aspect-oriented programming: such cross-cutting functionality is factored out into so-called aspects and these are woven back into mainline code during compilation. However, for existing software systems to benefit from AOP, the cross-cutting concerns must be identified first (aspect mining) before the system can be re-factored into an aspect-oriented design.

This thesis on mining and tracking cross-cutting concerns makes three contributions: firstly, it presents aspect mining as both a theoretical idea and a practical and scalable application. By analysing where developers add code to a program, our history-based aspect mining (HAM) identifies and ranks cross-cutting concerns. Its effectiveness and high precision was evaluated using industrial-sized open-source projects such as ECLIPSE.

Secondly, the thesis takes the work on software evolution one step further. Knowledge about a concern’s implementation can become invalid as the system evolves. We address this problem by defining structural and textual patterns among the elements identified as relevant to a concern’s implementation. The inferred patterns are documented as rules that describe a concern in a formal (intensional) rather than a merely textual (extensional) manner. These rules can then be used to track an evolving concern’s implementation in conjunction with the development history.

Finally, we implemented this technique for Java in an Eclipse plug-in called ISIS4J and evaluated it using a number of concerns. For that we again used the development history of an open-source project. The evaluation shows not only the effectiveness of our approach, but also to what extent our approach supports the tracking of a concern’s implementation despite, for example, program code extensions or refactorings.

UCAM-CL-TR-839

Colin Kelly:

Automatic extraction of property norm-like data from large text corpora
September 2013, 154 pages, PDF
PhD thesis (Trinity Hall, September 2012)


Abstract: Traditional methods for deriving property-based representations of concepts from text have focused on extracting unspecified relationships (e.g., ”car — petrol”) or only a sub-set of possible relation types, such as hyponymy/hypernymy (e.g., ”car is-a vehicle”) or meronymy/metonymy (e.g., ”car has wheels”).

We propose a number of varied approaches towards the extremely challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms (in the form ”concept relation feature”, e.g., ”elephant has trunk”, ”scissors used for cutting”, ”banana is yellow”) from large text corpora. We present four distinct extraction systems for our task. In our first two experiments we manually develop syntactic and lexical rules designed to extract property norm-like information from corpus text. We explore the impact of corpus choice, investigate the efficacy of reweighting our output through WordNet-derived semantic clusters, introduce a novel entropy calculation specific to our task, and test the usefulness of other classical word-association metrics.

In our third experiment we employ semi-supervised learning to generalise from our findings thus far, viewing our task as one of relation classification in which we train a support vector machine on a known set of property norms. Our feature extraction performance is encouraging; however the generated relations are restricted to those found in our training set. Therefore in our fourth and final experiment we use an improved version of our semi-supervised system to initially extract only features for concepts. We then use the concepts and extracted features to anchor an unconstrained relation extraction stage, introducing a novel backing-off technique which assigns relations to concept/feature pairs using probabilistic information.

We also develop and implement an array of evaluations for our task. In addition to the previously employed ESSLLI gold standard, we offer five new evaluation techniques: fMRI activation prediction, EEG activation prediction, a conceptual structure statistics evaluation, a human-generated semantic similarity evaluation and a WordNet semantic similarity comparison. We also comprehensively evaluate our three best systems using human annotators.

Throughout our experiments, our various systems’ output is promising but our final system is by far the best-performing. When evaluated against the ESSLLI gold standard it achieves a precision of 44.1%, compared to the 23.9% precision of the current state of the art. Furthermore, our final system’s Pearson correlation with human-generated semantic similarity measurements is strong at 0.742, and human judges marked 71.4% of its output as correct/plausible.

UCAM-CL-TR-840

Marek Rei:

Minimally supervised dependency-based methods for natural language processing
September 2013, 169 pages, PDF
PhD thesis (Churchill College, December 2012)

Abstract: This work investigates minimally-supervised methods for solving NLP tasks, without requiring explicit annotation or training data. Our motivation is to create systems that require substantially reduced effort from domain and/or NLP experts, compared to annotating a corresponding dataset, and also offer easier domain adaptation and better generalisation properties.

We apply these principles to four separate language processing tasks and analyse their performance compared to supervised alternatives. First, we investigate the task of detecting the scope of speculative language, and develop a system that applies manually-defined rules over dependency graphs. Next, we experiment with distributional similarity measures for detecting and generating hyponyms, and describe a new measure that achieves the highest performance on hyponym generation. We also extend the distributional hypothesis to larger structures and propose the task of detecting entailment relations between dependency graph fragments of various types and sizes. Our system achieves relatively high accuracy by combining distributional and lexical similarity scores. Finally, we describe a self-learning framework for improving the accuracy of an unlexicalised parser, by calculating relation probabilities using its own dependency output. The method requires only a large in-domain text corpus and can therefore be easily applied to different domains and genres.
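The general shape of a directional distributional measure for hyponym detection can be sketched as below. This is not the thesis’s specific measure (which it defines itself), only a generic WeedsPrec-style illustration of scoring hyponymy asymmetrically from dependency-context feature vectors; the toy vectors and weights are hand-made, not corpus counts.

```python
def directional_sim(narrow, broad):
    """Share of the narrower term's feature weight that also occurs with
    the broader term. Feature vectors map dependency-context features
    (e.g. 'dobj:feed') to weights.
    """
    total = sum(narrow.values())
    shared = sum(w for f, w in narrow.items() if f in broad)
    return shared / total if total else 0.0

# Toy vectors: a hypernym appears in most of its hyponym's contexts,
# but also in many contexts the hyponym never occurs in.
dog = {"dobj:feed": 3.0, "nsubj:bark": 5.0, "amod:loyal": 2.0}
animal = {"dobj:feed": 4.0, "nsubj:bark": 1.0,
          "dobj:protect": 5.0, "nsubj:migrate": 5.0}

print(directional_sim(dog, animal))   # 0.8: 'dog' contexts mostly covered
print(directional_sim(animal, dog))   # lower: 'animal' is the broader term
```

The asymmetry is the point: a symmetric similarity would give the same score in both directions and so could detect relatedness but not the direction of the hyponymy relation.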

While fully supervised approaches generally achieve the highest results, our experiments found minimally supervised methods to be remarkably competitive. By moving away from explicit supervision, we aim to better understand the underlying patterns in the data, and to create systems that are not tied to any specific domains, tasks or resources.

UCAM-CL-TR-841

Arjuna Sathiaseelan, Dirk Trossen, Ioannis Komnios, Joerg Ott, Jon Crowcroft:

Information centric delay tolerant networking: an internet architecture for the challenged
September 2013, 11 pages, PDF

Abstract: Enabling universal Internet access is one of the key issues that is currently being addressed globally. However the existing Internet architecture is seriously challenged to ensure universal service provisioning. This technical report puts forth our vision to make the Internet more accessible by architecting a universal communication architectural framework combining two emerging architecture and connectivity approaches: Information Centric Networking (ICN) and Delay/Disruption Tolerant Networking (DTN). Such a unified architecture will aggressively seek to widen the connectivity options and provide flexible service models beyond what is currently pursued in the field of universal service provisioning.

UCAM-CL-TR-842

Helen Yannakoudakis:

Automated assessment of English-learner writing
October 2013, 151 pages, PDF
PhD thesis (Wolfson College, December 2012)

Abstract: In this thesis, we investigate automated assessment (AA) systems of free text that automatically analyse and score the quality of writing of learners of English as a second (or other) language. Previous research has employed techniques that measure, in addition to writing competence, the semantic relevance of a text written in response to a given prompt. We argue that an approach which does not rely on task-dependent components or data, and directly assesses learner English, can produce results as good as prompt-specific models. Furthermore, it has the advantage that it may not require re-training or tuning for new prompts or assessment tasks. We evaluate the performance of our models against human scores, manually annotated in the Cambridge Learner Corpus, a subset of which we have released in the public domain to facilitate further research on the task.

We address AA as a supervised discriminative machine learning problem, investigate methods for assessing different aspects of writing prose, examine their generalisation to different corpora, and present state-of-the-art models. We focus on scoring general linguistic competence and discourse coherence and cohesion, and report experiments on detailed analysis of appropriate techniques and feature types derived automatically from generic text processing tools, on their relative importance and contribution to performance, and on comparison with different discriminative models, whilst also experimentally motivating novel feature types for the task. Using outlier texts, we examine and address validity issues of AA systems and, more specifically, their robustness to subversion by writers who understand something of their workings. Finally, we present a user interface that visualises and uncovers the ‘marking criteria’ represented in AA models, that is, textual features identified as highly predictive of a learner’s level of attainment. We demonstrate how the tool can support their linguistic interpretation and enhance hypothesis formation about learner grammars, in addition to informing the development of AA systems and further improving their performance.

UCAM-CL-TR-843

Robin Message:

Programming for humans: a new paradigm for domain-specific languages
November 2013, 140 pages, PDF
PhD thesis (Robinson College, March 2013)

Abstract: Programming is a difficult, specialist skill. Despite much research in software engineering, programmers still work like craftsmen or artists, not engineers. As a result, programs cannot easily be modified, joined together or customised. However, unlike a craft product, once a programmer has created their program, it can be replicated infinitely and perfectly. This means that programs are often not a good fit for their end-users because this infinite duplication gives their creators an incentive to create very general programs.

My thesis is that we can create better paradigms,languages and data structuring techniques to enableend-users to create their own programs.

The first contribution is a new paradigm for programming languages which explicitly separates control and data flow. For example, in a web application, the control level would handle user clicks and database writes, while the data level would handle form inputs and database reads. The language is strongly typed, with type reconstruction. We believe this paradigm is particularly suited to end-user programming of interactive applications.

The second contribution is an implementation of this paradigm in a specialised visual programming language for novice programmers to develop web applications. We describe our programming environment, which has a novel layout algorithm that maps control and data flow onto separate dimensions. We show that experienced programmers are more productive in this system than the alternatives.

The third contribution is a novel data structuring technique which infers fuzzy types from example data. This inference is theoretically founded on Bayesian statistics. Our inference aids programmers in moving from semi-structured data to typed programs. We discuss how this data structuring technique could be visualised and integrated with our visual programming environment.

UCAM-CL-TR-844

Wei Ming Khoo:

Decompilation as search
November 2013, 119 pages, PDF
PhD thesis (Hughes Hall, August 2013)


Abstract: Decompilation is the process of converting programs in a low-level representation, such as machine code, into high-level programs that are human readable, compilable and semantically equivalent. The current de facto approach to decompilation is largely modelled on compiler theory and only focusses on one or two of these desirable goals at a time.

This thesis makes the case that decompilation is more effectively accomplished through search. It is observed that software development is seldom a clean slate process and much software is available in public repositories. To back this claim, evidence is presented from three categories of software development: corporate software development, open source projects and malware creation. Evidence strongly suggests that code reuse is prevalent in all categories.

Two approaches to search-based decompilation are proposed. The first approach borrows inspiration from information retrieval, and constitutes the first contribution of this thesis. It uses instruction mnemonics, control-flow sub-graphs and data constants, which can be quickly extracted from a disassembly, and relies on the popular text search engine CLucene. The time taken to analyse a function is small enough to be practical and the technique achieves an F2 measure of above 83.0% for two benchmarks.

The second approach and contribution of this thesis is perturbation analysis, which is able to differentiate between algorithms implementing the same functionality, e.g. bubblesort versus quicksort, and between different implementations of the same algorithm, e.g. quicksort from Wikipedia versus quicksort from Rosetta Code. Test-based indexing (TBI) uses random testing to characterise the input-output behaviour of a function; perturbation-based indexing (PBI) is TBI with additional input-output behaviour obtained through perturbation analysis. TBI/PBI achieves an F2 measure of 88.4% on five benchmarks involving different compilers and compiler options.
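The TBI idea of indexing a function by its input-output behaviour can be sketched in a few lines. The thesis applies this to machine code; the Python version below is only an illustration of the principle, with invented function names and a hash over a fixed pseudo-random input set.

```python
import hashlib
import random

def bubblesort(xs):
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def quicksort(xs):
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    return (quicksort([x for x in rest if x < pivot]) + [pivot]
            + quicksort([x for x in rest if x >= pivot]))

def tbi_signature(func, n_tests=16, seed=0):
    """Hash a function's outputs on a fixed pseudo-random input set.
    Functions computing the same mapping collide regardless of
    algorithm -- which is exactly why the thesis adds perturbation
    analysis (PBI) to tell e.g. bubblesort and quicksort apart.
    """
    rng = random.Random(seed)
    h = hashlib.sha256()
    for _ in range(n_tests):
        xs = [rng.randrange(-100, 100) for _ in range(8)]
        h.update(repr(func(xs)).encode())
    return h.hexdigest()

# Same input-output behaviour => same index entry under TBI.
assert tbi_signature(bubblesort) == tbi_signature(quicksort)
# Different functionality lands in a different bucket.
assert tbi_signature(bubblesort) != tbi_signature(lambda xs: xs[:])
```

Fixing the random seed makes the signature deterministic, so independently computed signatures of semantically equivalent binaries can be compared directly, as a search index requires.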

To perform perturbation analysis, function prototyping is needed, the standard way comprising liveness and reaching-definitions analysis. However, it is observed that in practice actual prototypes fall into one of a few possible categories, enabling the type system to be simplified considerably. The third and final contribution is an approach to prototype recovery that follows the principle of conformant execution, in the form of inlined data source tracking, to infer arrays, pointer-to-pointers and recursive data structures.

UCAM-CL-TR-845

Stephen Kell:

Black-box composition of mismatched software components
December 2013, 251 pages, PDF
PhD thesis (Christ’s College, December 2010)

Abstract: Software is expensive to develop. Much of that expense can be blamed on difficulties in combining, integrating or re-using separate pieces of software, and in maintaining such compositions. Conventional development tools approach composition in an inherently narrow way. Specifically, they insist on modules that are plug-compatible, meaning that they must fit together down to a very fine level of detail, and that are homogeneous, meaning that they must be written according to the same conventions and (usually) in the same programming language. In summary, modules must have matched interfaces to compose. These inflexibilities, in turn, motivate more software creation and concomitant expense: they make programming approaches based on integration and re-use unduly expensive. This means that reimplementation from scratch is often chosen in preference to adaptation of existing implementations.

This dissertation presents several contributions towards lessening this problem. It centres on the design of a new special-purpose programming language, called Cake. This language is specialised to the task of describing how components having mismatched interfaces (i.e., not plug-compatible, and perhaps not homogeneous) may be adapted so that they compose as required. It is a language significantly more effective at capturing relationships between mismatched interfaces than general-purpose programming languages. Firstly, we outline the language’s design, which centres on reconciling interface differences in the form of high-level correspondence rules which relate different interfaces. Secondly, since Cake is designed to be a practical tool which can be conveniently and easily integrated under existing development practices, we describe an implementation of Cake in detail and explain how it achieves this integration. Thirdly, we evaluate Cake on real tasks: by applying it to integration tasks which have already been performed under conventional approaches, we draw meaningful comparisons demonstrating a smaller (quantitative) size of required code and lesser (qualitative) complexity of the code that is required. Finally, Cake applies to a wide range of input components; we sketch extensions to Cake which render it capable of composing components that are heterogeneous with respect to a carefully identified set of stylistic concerns which we describe in detail.

UCAM-CL-TR-846

Daniel Bates:

Exploiting tightly-coupled cores
January 2014, 162 pages, PDF
PhD thesis (Robinson College, July 2013)

Abstract: As we move steadily through the multicore era, and the number of processing cores on each chip continues to rise, parallel computation becomes increasingly important. However, parallelising an application is often difficult because of dependencies between different regions of code which require cores to communicate. Communication is usually slow compared to computation, and so restricts the opportunities for profitable parallelisation. In this work, I explore the opportunities provided when communication between cores has a very low latency and low energy cost. I observe that there are many different ways in which multiple cores can be used to execute a program, allowing more parallelism to be exploited in more situations, and also providing energy savings in some cases. Individual cores can be made very simple and efficient because they do not need to exploit parallelism internally. The communication patterns between cores can be updated frequently to reflect the parallelism available at the time, allowing better utilisation than specialised hardware which is used infrequently.

In this dissertation I introduce Loki: a homogeneous, tiled architecture made up of many simple, tightly-coupled cores. I demonstrate the benefits in both performance and energy consumption which can be achieved with this arrangement and observe that it is also likely to have lower design and validation costs and be easier to optimise. I then determine exactly where the performance bottlenecks of the design are, and where the energy is consumed, and look into some more-advanced optimisations which can make parallelism even more profitable.

UCAM-CL-TR-847

Jatinder Singh, Jean Bacon:

SBUS: a generic policy-enforcing middleware for open pervasive systems
February 2014, 20 pages, PDF

Abstract: Currently, application components tend to be bespoke and closed, running in vertical silos (single applications/systems). To realise the potential of pervasive systems, and emerging distributed systems more generally, it must be possible to use components system-wide, perhaps in ways and for purposes not envisaged by their designers. It follows that while the infrastructure and resources underlying applications still require management, so too do the applications themselves, in terms of how and when they (inter)operate. To achieve such context-dependent, personalised operation we believe that the application logic embodied in components should be separated from the policy that coordinates them, specifying where and how they should be used.

SBUS is an open, decentralised, application-independent policy-enforcing middleware, developed towards this aim. To enable the flexible and complex interactions required by pervasive systems, it supports a wide range of interaction patterns, including event-driven operation, request-response, and data (message) streaming, and features a flexible security model. Crucially, SBUS is dynamically reconfigurable, allowing components to be managed from outside application logic, by authorised third-parties. This paves the way for policy-driven systems, where policy can operate across infrastructure and applications to realise both traditional and new functionality.

This report details the SBUS middleware and the role of policy enforcement in enabling pervasive, distributed systems.

UCAM-CL-TR-848

James G. Jardine:

Automatically generating reading lists
February 2014, 164 pages, PDF
PhD thesis (Robinson College, August 2013)

Abstract: This thesis addresses the task of automatically generating reading lists for novices in a scientific field. Reading lists help novices to get up to speed in a new field by providing an expert-directed list of papers to read. Without reading lists, novices must resort to ad-hoc exploratory scientific search, which is an inefficient use of time and poses a danger that they might use biased or incorrect material as the foundation for their early learning.

The contributions of this thesis are fourfold. The first contribution is the ThemedPageRank (TPR) algorithm for automatically generating reading lists. It combines Latent Topic Models with Personalised PageRank and Age Adjustment in a novel way to generate reading lists that are of better quality than those generated by state-of-the-art search engines. TPR is also used in this thesis to reconstruct the bibliography for scientific papers. Although not designed specifically for this task, TPR significantly outperforms a state-of-the-art system purpose-built for the task. The second contribution is a gold-standard collection of reading lists against which TPR is evaluated, and against which future algorithms can be evaluated. The eight reading lists in the gold-standard were produced by experts recruited from two universities in the United Kingdom. The third contribution is the Citation Substitution Coefficient (CSC), an evaluation metric for evaluating the quality of reading lists. CSC is better suited to this task than standard IR metrics such as precision, recall, F-score and mean average precision because it gives partial credit to recommended papers that are close to gold-standard papers in the citation graph. This partial credit results in scores that have more granularity than those of the standard IR metrics, allowing the subtle differences in the performance of recommendation algorithms to be detected. The final contribution is a light-weight algorithm for Automatic Term Recognition (ATR). As will be seen, technical terms play an important role in the TPR algorithm. This light-weight algorithm extracts technical terms from the titles of documents without the need for the complex apparatus required by most state-of-the-art ATR algorithms. It is also capable of extracting very long technical terms, unlike many other ATR algorithms.
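The partial-credit idea behind a CSC-style metric can be sketched as follows. This is an illustration only: the thesis defines the exact coefficient itself, whereas the 1/(d+1) decay, the distance cut-off, and following only outgoing citation links are assumptions made for this sketch.

```python
from collections import deque

def csc_like(recommended, gold, cites, max_dist=3):
    """Credit each recommended paper by its citation-graph distance to
    the nearest gold-standard paper (full credit at distance 0,
    decaying with distance, none beyond max_dist).
    `cites` maps a paper to the papers it cites.
    """
    def nearest_gold(src):
        # Breadth-first search outward along citation links.
        seen, frontier = {src}, deque([(src, 0)])
        while frontier:
            node, d = frontier.popleft()
            if node in gold:
                return d
            if d < max_dist:
                for nb in cites.get(node, ()):
                    if nb not in seen:
                        seen.add(nb)
                        frontier.append((nb, d + 1))
        return None

    credit = 0.0
    for paper in recommended:
        d = nearest_gold(paper)
        if d is not None:
            credit += 1.0 / (d + 1)
    return credit / len(recommended)

# C cites B, which cites gold-standard paper A.
graph = {"B": ["A"], "C": ["B"]}
score = csc_like(["A", "B", "C"], {"A"}, graph)
print(round(score, 3))   # (1 + 1/2 + 1/3) / 3, i.e. 0.611
```

Under strict precision this recommendation list would score only 1/3, since B and C are not themselves gold-standard papers; the graded credit is what gives the metric the finer granularity described above.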

Four experiments are presented in this thesis. The first experiment evaluates TPR against state-of-the-art search engines in the task of automatically generating reading lists that are comparable to expert-generated gold-standards. The second experiment compares the performance of TPR against a purpose-built state-of-the-art system in the task of automatically reconstructing the reference lists of scientific papers. The third experiment involves a user study to explore the ability of novices to build their own reading lists using two fundamental components of TPR: automatic technical term recognition and topic modelling. A system exposing only these components is compared against a state-of-the-art scientific search engine. The final experiment is a user study that evaluates the technical terms discovered by the ATR algorithm and the latent topics generated by TPR. The study enlists thousands of users of Qiqqa, research management software independently written by the author of this thesis.

UCAM-CL-TR-849

Marcelo Bagnulo Braun, Jon Crowcroft:

SNA: Sourceless Network Architecture
March 2014, 12 pages, PDF

Abstract: Why are there source addresses in datagrams? What alternative architecture can one conceive to provide all of the current, and some new, functionality currently dependent on a conflicting set of uses for this field? We illustrate how this can be achieved by re-interpreting the 32-bit field in IPv4 headers to help the Internet solve a range of current and future problems.

UCAM-CL-TR-850

Robert N.M. Watson, Peter G. Neumann, Jonathan Woodruff, Jonathan Anderson, David Chisnall, Brooks Davis, Ben Laurie, Simon W. Moore, Steven J. Murdoch, Michael Roe:

Capability Hardware Enhanced RISC Instructions: CHERI Instruction-set architecture
April 2014, 131 pages, PDF

Abstract: This document describes the rapidly maturing design for the Capability Hardware Enhanced RISC Instructions (CHERI) Instruction-Set Architecture (ISA), which is being developed by SRI International and the University of Cambridge. The document is intended to capture our evolving architecture, as it is being refined, tested, and formally analyzed. We have now reached 70% of the way through our research and development cycle.

CHERI is a hybrid capability-system architecture that combines new processor primitives with the commodity 64-bit RISC ISA, enabling software to efficiently implement fine-grained memory protection and a hardware-software object-capability security model. These extensions support incrementally adoptable, high-performance, formally based, programmer-friendly underpinnings for fine-grained software decomposition and compartmentalization, motivated by and capable of enforcing the principle of least privilege. The CHERI system architecture purposefully addresses known performance and robustness gaps in commodity ISAs that hinder the adoption of more secure programming models centered around the principle of least privilege. To this end, CHERI blends traditional paged virtual memory with a per-address-space capability model that includes capability registers, capability instructions, and tagged memory that have been added to the 64-bit MIPS ISA via a new capability coprocessor.

CHERI’s hybrid approach, inspired by the Capsicum security model, allows incremental adoption of capability-oriented software design: software implementations that are more robust and resilient can be deployed where they are most needed, while leaving less critical software largely unmodified, but nevertheless suitably constrained to be incapable of having adverse effects. For example, we are focusing conversion efforts on low-level TCB components of the system: separation kernels, hypervisors, operating system kernels, language runtimes, and userspace TCBs such as web browsers. Likewise, we see early-use scenarios (such as data compression, image processing, and video processing) that relate to particularly high-risk software libraries, which are concentrations of both complex and historically vulnerability-prone code combined with untrustworthy data sources, while leaving containing applications unchanged.

This report describes the CHERI architecture and design, and provides reference documentation for the CHERI instruction-set architecture (ISA) and potential memory models, along with their requirements. It also documents our current thinking on integration of programming languages and operating systems. Our ongoing research includes two prototype processors employing the CHERI ISA, each implemented as an FPGA soft core specified in the Bluespec hardware description language (HDL), for which we have integrated the application of formal methods to the Bluespec specifications and the hardware-software implementation.

UCAM-CL-TR-851

Robert N.M. Watson, David Chisnall, Brooks Davis, Wojciech Koszek, Simon W. Moore, Steven J. Murdoch, Peter G. Neumann, Jonathan Woodruff:

Capability Hardware Enhanced RISC Instructions: CHERI User’s guide
April 2014, 26 pages, PDF

Abstract: The CHERI User’s Guide documents the software environment for the Capability Hardware Enhanced RISC Instructions (CHERI) prototype developed by SRI International and the University of Cambridge. The User’s Guide is targeted at hardware and software developers working with capability-enhanced software. It describes the CheriBSD operating system, a version of the FreeBSD operating system that has been adapted to support userspace capability systems via the CHERI ISA, and the CHERI Clang/LLVM compiler suite. It also describes the earlier Deimos demonstration microkernel.

UCAM-CL-TR-852

Robert N.M. Watson, Jonathan Woodruff, David Chisnall, Brooks Davis, Wojciech Koszek, A. Theodore Markettos, Simon W. Moore, Steven J. Murdoch, Peter G. Neumann, Robert Norton, Michael Roe:

Bluespec Extensible RISC Implementation: BERI Hardware reference
April 2014, 76 pages, PDF

Abstract: The BERI Hardware Reference documents the Bluespec Extensible RISC Implementation (BERI) developed by SRI International and the University of Cambridge. The reference is targeted at hardware and software developers working with the BERI1 and BERI2 processor prototypes in simulation and synthesized to FPGA targets. We describe how to use the BERI1 and BERI2 processors in simulation, the BERI1 debug unit, the BERI unit-test suite, how to use BERI with Altera FPGAs and Terasic DE4 boards, the 64-bit MIPS and CHERI ISAs implemented by the prototypes, the BERI1 and BERI2 processor implementations themselves, and the BERI Programmable Interrupt Controller (PIC).

UCAM-CL-TR-853

Robert N.M. Watson, David Chisnall, Brooks Davis, Wojciech Koszek, Simon W. Moore, Steven J. Murdoch, Peter G. Neumann, Jonathan Woodruff:

Bluespec Extensible RISC Implementation: BERI Software reference
April 2014, 34 pages, PDF

Abstract: The BERI Software Reference documents how to build and use FreeBSD on the Bluespec Extensible RISC Implementation (BERI) developed by SRI International and the University of Cambridge. The reference is targeted at hardware and software programmers who will work with BERI or BERI-derived systems.

UCAM-CL-TR-854

Dominic Orchard:

Programming contextual computations
May 2014, 223 pages, PDF
PhD thesis (Jesus College, January 2013)

Abstract: Modern computer programs are executed in a variety of different contexts: on servers, handheld devices, graphics cards, and across distributed environments, to name a few. Understanding a program’s contextual requirements is therefore vital for its correct execution. This dissertation studies contextual computations, ranging from application-level notions of context to lower-level notions of context prevalent in common programming tasks. It makes contributions in three areas: mathematically structuring contextual computations, analysing contextual program properties, and designing languages to facilitate contextual programming.

Firstly, existing work which mathematically structures contextual computations using comonads (in programming and semantics) is analysed and extended. Comonads are shown to exhibit a shape preservation property which restricts their applicability to a subset of contextual computations. Subsequently, novel generalisations of comonads are developed, including the notion of an indexed comonad, relaxing shape-preservation restrictions.
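
As a loose illustration of the comonadic pattern (a toy example, not drawn from the thesis), a list with a focused position supports `extract` (read the value in the current context) and `extend` (run a whole-context computation at every possible focus):

```python
class PointedList:
    """A nonempty list with a focused position: a toy comonad."""
    def __init__(self, items, pos=0):
        self.items, self.pos = list(items), pos

    def extract(self):
        """Counit: the value under the current focus."""
        return self.items[self.pos]

    def extend(self, f):
        """Coextension: apply the context-consuming function f at every
        refocusing of the structure, preserving its shape."""
        refocused = [PointedList(self.items, i) for i in range(len(self.items))]
        return PointedList([f(ctx) for ctx in refocused], self.pos)

def local_avg(ctx):
    """A context-dependent computation: average the focus with its neighbours."""
    window = ctx.items[max(0, ctx.pos - 1): ctx.pos + 2]
    return sum(window) / len(window)

smoothed = PointedList([0, 3, 6, 9]).extend(local_avg)  # items: [1.5, 3.0, 6.0, 7.5]
```

The shape-preservation property mentioned above is visible here: `extend` always returns a list of the same length and focus, which is precisely the kind of restriction that generalisations such as indexed comonads relax.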

Secondly, a general class of static analyses called coeffect systems is introduced to describe the propagation of contextual requirements throughout a program. Indexed comonads, with some additional structure, are shown to provide a semantics for languages whose contextual properties are captured by a coeffect analysis.

Finally, language constructs are presented to ease the programming of contextual computations. The benefits of these language features, the mathematical structuring, and coeffect systems are demonstrated by a language for container programming which guarantees optimisations and safety invariants.

UCAM-CL-TR-855

Patrick K.A. Wollner, Isak Herman, Haikal Pribadi, Leonardo Impett, Alan F. Blackwell:

Mephistophone
June 2014, 8 pages, PDF

Abstract: The scope of this project is the creation of a controller for composition, performance and interaction with sound. Interactions can be classified to one of three types: (i) end-user triggering, controlling, editing, and manipulation of sounds with varying temporal dimensions; (ii) inclusion of multi-sensor feedback mechanisms including end-user biological monitoring; and (iii) integration of sensed, semi-random, environmental factors as control parameters to the output of the system.

The development of the device has been completed in two stages: (i) conceptual scoping has defined the interaction space for the development of this machine; (ii) prototype development has resulted in the creation of a functioning prototype and culminated in a series of live performances. The final stage presupposes a custom interaction design for each artistic partner, reinforcing the conceptual role of the device as a novel mechanism for personalized, visualizable, tangible interaction with sound.

UCAM-CL-TR-856

Awais Athar:

Sentiment analysis of scientific citations
June 2014, 114 pages, PDF
PhD thesis (Girton College, April 2014)

Abstract: While there has been growing interest in the field of sentiment analysis for different text genres in the past few years, relatively less emphasis has been placed on extraction of opinions from scientific literature, more specifically, citations. Citation sentiment detection is an attractive task as it can help researchers in identifying shortcomings and detecting problems in a particular approach, determining the quality of a paper for ranking in citation indexes by including negative citations in the weighting scheme, and recognising issues that have not been addressed as well as possible gaps in current research approaches.

Current approaches assume that the sentiment present in the citation sentence represents the true sentiment of the author towards the cited paper and do not take further informal mentions of the citations elsewhere in the article into account. There have also been no attempts to evaluate citation sentiment on a large corpus.

This dissertation focuses on the detection of sentiment towards the citations in a scientific article. The detection is performed using the textual information from the article. I address three sub-tasks and present new large corpora for each of the tasks.

Firstly, I explore different feature sets for detection of sentiment in explicit citations. For this task, I present a new annotated corpus of more than 8,700 citation sentences which have been labelled as positive, negative or objective towards the cited paper. Experimenting with different feature sets, I show the best result of micro-F score 0.760 is obtained using n-grams of length and dependency relations.

Secondly, I show that the assumption that sentiment is limited only to the explicit citation is incorrect. I present a citation context corpus where more than 200,000 sentences from 1,034 paper–reference pairs have been annotated for sentiment. These sentences contain 1,741 citations towards 20 cited papers. I show that including the citation context in the analysis increases the subjective sentiment by almost 185%. I propose new features which help in extracting the citation context and examine their effect on sentiment analysis.

Thirdly, I tackle the task of identifying significant citations. I propose features which help discriminate these from citations in passing, and show that they provide statistically significant improvements over a rule-based baseline.

UCAM-CL-TR-857

Heidi Howard:

ARC: Analysis of Raft Consensus
July 2014, 69 pages, PDF
BA dissertation (Pembroke College, May 2014)

Abstract: The Paxos algorithm, despite being synonymous with distributed consensus for a decade, is famously difficult to reason about and implement due to its non-intuitive approach and underspecification. In response, this project implemented and evaluated a framework for constructing fault-tolerant applications, utilising the recently proposed Raft algorithm for distributed consensus. Constructing a simulation framework for our implementation enabled us to evaluate the protocol on everything from understandability and efficiency to correctness and performance in diverse network environments. We propose a range of optimisations to the protocol and have released to the community a testbed for developing further optimisations and investigating optimal protocol parameters for real-world deployments.

UCAM-CL-TR-858

Jonathan D. Woodruff:

CHERI: A RISC capability machine for practical memory safety
July 2014, 112 pages, PDF
PhD thesis (Clare Hall, March 2014)

Abstract: This work presents CHERI, a practical extension of the 64-bit MIPS instruction set to support capabilities for fine-grained memory protection.

Traditional paged memory protection has proved inadequate in the face of escalating security threats, and proposed solutions include fine-grained protection tables (Mondrian Memory Protection) and hardware fat-pointer protection (Hardbound). These have emphasised transparent protection for C executables but have lacked flexibility and practicality. Intel’s recent memory protection extensions (iMPX) attempt to adopt some of these ideas and are flexible and optional but lack the strict correctness of these proposals.

Capability addressing has been the classical solution to efficient and strong memory protection, but it has been thought to be incompatible with common instruction sets and also with modern program structure which uses a flat memory space with global pointers.

CHERI is a fusion of capabilities with a paged flat memory, producing a program-managed fat-pointer capability model. This protection mechanism scales from application sandboxing to efficient byte-level memory safety with per-pointer permissions. I present an extension to the 64-bit MIPS architecture on FPGA that runs standard FreeBSD and supports self-segmenting applications in user space.
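
CHERI's checks are performed in hardware, but the fat-pointer idea (base, bounds and permissions carried with the pointer, checked on every access, and only ever shrinkable) can be sketched as a software analogue (all names here are illustrative, not CHERI's):

```python
class CapabilityError(Exception):
    pass

class Capability:
    """Software sketch of a fat pointer: base, length and permissions
    travel with the pointer, and every access is checked against them."""
    def __init__(self, memory, base, length, perms=frozenset({"load", "store"})):
        self.memory, self.base, self.length, self.perms = memory, base, length, perms

    def _check(self, offset, perm):
        if perm not in self.perms or not 0 <= offset < self.length:
            raise CapabilityError(f"{perm} at offset {offset} denied")

    def load(self, offset):
        self._check(offset, "load")
        return self.memory[self.base + offset]

    def store(self, offset, value):
        self._check(offset, "store")
        self.memory[self.base + offset] = value

    def restrict(self, base_off, length, perms):
        """Derive a new capability whose rights never exceed the original's."""
        if base_off + length > self.length or not frozenset(perms) <= self.perms:
            raise CapabilityError("cannot broaden a capability")
        return Capability(self.memory, self.base + base_off, length, frozenset(perms))

mem = bytearray(16)
cap = Capability(mem, base=4, length=8)
cap.store(0, 0xAB)                       # in bounds, permitted
readonly = cap.restrict(0, 4, {"load"})  # monotonically reduced rights
```

An out-of-bounds `cap.load(8)` or a `readonly.store(...)` raises an error instead of silently touching adjacent memory, which is the per-pointer safety property the paragraph above describes.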

Unlike other recent proposals, the CHERI implementation is open-source and of sufficient quality to support software development as well as community extension of this work. I compare with published memory safety mechanisms and demonstrate competitive performance while providing assurance and greater flexibility with simpler hardware requirements.

UCAM-CL-TR-859

Lucian Carata, Oliver Chick, James Snee, Ripduman Sohan, Andrew Rice, Andy Hopper:

Resourceful: fine-grained resource accounting for explaining service variability
September 2014, 12 pages, PDF

Abstract: Increasing server utilization in modern datacenters also increases the likelihood of contention on physical resources and unexpected behavior due to side-effects from interfering applications. Existing resource accounting mechanisms are too coarse-grained for allowing services to track the causes of such variations in their execution. We make the case for measuring resource consumption at system-call level and outline the design of Resourceful, a system that offers applications the ability of querying this data at runtime with low overhead, accounting for costs incurred both synchronously and asynchronously after a given call.

UCAM-CL-TR-860

Steffen Loesch:

Program equivalence in functional metaprogramming via nominal Scott domains
October 2014, 164 pages, PDF
PhD thesis (Trinity College, May 2014)

Abstract: A prominent feature of metaprogramming is to write algorithms in one programming language (the meta-language) over structures that represent the programs of another programming language (the object-language). Whenever the object-language has binding constructs (and most programming languages do), we run into tedious issues concerning the semantically correct manipulation of binders.

In this thesis we study a semantic framework in which these issues can be dealt with automatically by the meta-language. Our framework takes the user-friendly ‘nominal’ approach to metaprogramming in which bound objects are named.

Specifically, we develop mathematical tools for giving logical proofs that two metaprograms (of our framework) are equivalent. We consider two programs to be equivalent if they always give the same observable results when they are run as part of any larger codebase. This notion of program equivalence, called contextual equivalence, is examined for an extension of Plotkin’s archetypal functional programming language PCF with nominal constructs for metaprogramming, called PNA. Historically, PCF and its denotational semantics based on Scott domains were hugely influential in the study of contextual equivalence. We mirror Plotkin’s classical results with PNA and a denotational semantics based on a variant of Scott domains that is modelled within the logic of nominal sets. In particular, we prove the following full abstraction result: two PNA programs are contextually equivalent if and only if they denote equal elements of the nominal Scott domain model. This is the first full abstraction result we know of for languages combining higher-order functions with some form of locally scoped names, which uses a domain theory based on ordinary extensional functions, rather than using the more intensional approach of game semantics.

To obtain full abstraction, we need to add two new programming language constructs to PNA, one for existential quantification over names and one for ‘definite description’ over names. Adding only one of them is insufficient, as we give proofs that full abstraction fails if either is left out.

UCAM-CL-TR-861

Tadas Baltrusaitis:

Automatic facial expression analysis
October 2014, 218 pages, PDF
PhD thesis (Fitzwilliam College, March 2014)

Abstract: Humans spend a large amount of their time interacting with computers of one type or another. However, computers are emotionally blind and indifferent to the affective states of their users. Human-computer interaction which does not consider emotions ignores a whole channel of available information.

Faces contain a large portion of our emotionally expressive behaviour. We use facial expressions to display our emotional states and to manage our interactions. Furthermore, we express and read emotions in faces effortlessly. However, automatic understanding of facial expressions is a very difficult task computationally, especially in the presence of highly variable pose, expression and illumination. My work furthers the field of automatic facial expression tracking by tackling these issues, bringing emotionally aware computing closer to reality.

Firstly, I present an in-depth analysis of the Constrained Local Model (CLM) for facial expression and head pose tracking. I propose a number of extensions that make location of facial features more accurate.

Secondly, I introduce a 3D Constrained Local Model (CLM-Z) which takes full advantage of depth information available from various range scanners. CLM-Z is robust to changes in illumination and shows better facial tracking performance.

Thirdly, I present the Constrained Local Neural Field (CLNF), a novel instance of CLM that deals with the issues of facial tracking in complex scenes. It achieves this through the use of a novel landmark detector and a novel CLM fitting algorithm. CLNF outperforms state-of-the-art models for facial tracking in the presence of difficult illumination and varying pose.

Lastly, I demonstrate how tracked facial expressions can be used for emotion inference from videos. I also show how the tools developed for facial tracking can be applied to emotion inference in music.

UCAM-CL-TR-862

Henrik Lieng:

Surface modelling for 2D imagery
October 2014, 177 pages, PDF
PhD thesis (Fitzwilliam College, June 2014)

Abstract: Vector graphics provides powerful tools for drawing scalable 2D imagery. With the rise of mobile computers, of different types of displays and image resolutions, vector graphics is receiving an increasing amount of attention. However, vector graphics is not the leading framework for creating and manipulating 2D imagery. The reason for this reluctance to employ vector graphical frameworks is that it is difficult to handle complex behaviour of colour across the 2D domain.

A challenging problem within vector graphics is to define smooth colour functions across the image. In previous work, two approaches exist. The first approach, known as diffusion curves, diffuses colours from a set of input curves and points. The second approach, known as gradient meshes, defines smooth colour functions from control meshes. These two approaches are incompatible: diffusion curves do not support the local behaviour provided by gradient meshes, and gradient meshes do not support freeform curves as input. My research aims to narrow the gap between diffusion curves and gradient meshes.

With this aim in mind, I propose solutions to create control meshes from freeform curves. I demonstrate that these control meshes can be used to render a vector primitive similar to diffusion curves using subdivision surfaces. With the use of subdivision surfaces, instead of a diffusion process, colour gradients can be locally controlled using colour gradient curves associated with the input curves.

The advantage of local control is further explored in the setting of vector-centric image processing. I demonstrate that a certain contrast enhancement profile, known as the Cornsweet profile, can be modelled via surfaces in images. This approach does not produce the saturation artefacts associated with previous filter-based methods. Additionally, I demonstrate various approaches to artistic filtering, where the artist locally models given artistic effects.

Gradient meshes are restricted to rectangular topology of the control meshes. I argue that this restriction hinders the applicability of the approach and its potential to be used with control meshes extracted from freeform curves. To this end, I propose a mesh-based vector primitive that supports arbitrary manifold topology of the control mesh.

UCAM-CL-TR-863

Jatinder Singh, Jean Bacon, Jon Crowcroft, Anil Madhavapeddy, Thomas Pasquier, W. Kuan Hon, Christopher Millard:

Regional clouds: technical considerations
November 2014, 18 pages, PDF

Abstract: The emergence and rapid uptake of cloud computing services raise a number of legal challenges. Recently, there have been calls for regional clouds, where policy makers from various states have proposed cloud computing services that are restricted to serving (only) their particular geographic region. At a technical level, such rhetoric is rooted in the means for control.

This paper explores the technical considerations underpinning a regional cloud, including the current state of cloud provisioning, what can be achieved using existing technologies, and the potential of ongoing research. Our discussion covers technology at various system levels, including network-centric controls, cloud platform management, and governance mechanisms (including encryption and information flow control) for cloud providers, applications, tenants, and end-users.

UCAM-CL-TR-864

Robert N. M. Watson, Peter G. Neumann, Jonathan Woodruff, Jonathan Anderson, David Chisnall, Brooks Davis, Ben Laurie, Simon W. Moore, Steven J. Murdoch, Michael Roe:

Capability Hardware Enhanced RISC Instructions: CHERI Instruction-set architecture
December 2014, 142 pages, PDF

Abstract: This technical report describes CHERI ISAv3, the third version of the Capability Hardware Enhanced RISC Instructions (CHERI) Instruction-Set Architecture (ISA). CHERI is being developed by SRI International and the University of Cambridge. This design captures four years of research, development, refinement, formal analysis, and testing, and is a substantial enhancement to the ISA version described in UCAM-CL-TR-850. Key improvements lie in tighter C-language integration, and more mature support for software object-capability models; these changes result from experience gained in adapting substantial software stacks to run on prototype hardware.

The CHERI instruction set is based on a hybrid capability-system architecture that adds new capability-system primitives to a commodity 64-bit RISC ISA, enabling software to efficiently implement fine-grained memory protection and a hardware-software object-capability security model. These extensions support incrementally adoptable, high-performance, formally based, programmer-friendly underpinnings for fine-grained software decomposition and compartmentalization, motivated by and capable of enforcing the principle of least privilege.

The CHERI system architecture purposefully addresses known performance and robustness gaps in commodity ISAs that hinder the adoption of more secure programming models centered around the principle of least privilege. To this end, CHERI blends traditional paged virtual memory with a per-address-space capability model that includes capability registers, capability instructions, and tagged memory that have been added to the 64-bit MIPS ISA via a new capability coprocessor. CHERI also learns from the C-language fat-pointer literature: CHERI capabilities can describe not only regions of memory, but can also capture C pointer semantics, allowing capabilities to be substituted for pointers in generated code.

CHERI’s hybrid system approach, inspired by the Capsicum security model, allows incremental adoption of capability-oriented software design: software implementations that are more robust and resilient can be deployed where they are most needed, while leaving less critical software largely unmodified, but nevertheless suitably constrained to be incapable of having adverse effects. For example, we are focusing conversion efforts on low-level TCB components of the system: separation kernels, hypervisors, operating system kernels, language runtimes, and userspace TCBs such as web browsers. Likewise, we see early-use scenarios (such as data compression, protocol parsing, image processing, and video processing) that relate to particularly high-risk software libraries, which are concentrations of both complex and historically vulnerability-prone code combined with untrustworthy data sources, while leaving containing applications unchanged.

This report describes the CHERI Instruction-Set Architecture (ISA) and design, and provides reference documentation and potential memory models, along with their requirements. It also briefly addresses the CHERI system hardware-software architecture, documenting our current thinking on integrating programming languages and operating systems with the CHERI hardware.

UCAM-CL-TR-865

Christopher S.F. Smowton:

I/O Optimisation and elimination via partial evaluation
December 2014, 129 pages, PDF
PhD thesis (Churchill College, November 2014)

Abstract: Computer programs commonly repeat work. Short programs go through the same initialisation sequence each time they are run, and long-running servers may be given a sequence of similar or identical requests. In both cases, there is an opportunity to save time by re-using previously computed results; however, programmers often do not exploit that opportunity because to do so would cost development time and increase code complexity.

Partial evaluation is a semi-automatic technique for specialising programs or parts thereof to perform better in particular circumstances, and can reduce repeated work by generating a program variant that is specialised for use in frequently-occurring circumstances. However, existing partial evaluators are limited in both their depth of analysis and their support for real-world programs, making them ineffective at specialising practical software.

In this dissertation, I present a new, more accurate partial evaluation system that can specialise programs, written in low-level languages including C and C++, that interact with the operating system to read external data. It is capable of specialising programs that are challenging to analyse, including those which use arbitrarily deep pointer indirection, unsafe type casts, and multi-threading. I use this partial evaluator to specialise programs with respect to files that they read from disk, or data they consume from the network, producing specialised variants that perform better when that external data is as expected, but which continue to function like the original program when it is not. To demonstrate the system’s practical utility, I evaluate the system specialising real-world software, and show that it can achieve significant runtime improvements with little manual assistance.
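
The thesis targets low-level binaries; the underlying idea is easier to see in the classic textbook example (not Smowton's implementation): specialising `power(x, n)` for a statically known `n` unrolls the recursion at specialisation time, leaving a residual program in `x` alone:

```python
def power(x, n):
    """The general program: performs the same control flow on every call."""
    return 1 if n == 0 else x * power(x, n - 1)

def specialise_power(n):
    """A tiny hand-rolled partial evaluator: with n known ('static'),
    unroll the recursion now and emit a residual program over x only."""
    if n == 0:
        return lambda x: 1
    body = " * ".join(["x"] * n)
    return eval(f"lambda x: {body}")  # e.g. n=3 yields lambda x: x * x * x

cube = specialise_power(3)  # residual program: no recursion, no test on n
```

The residual program agrees with the original wherever both are defined, but all work that depended only on the static input has been paid for once, up front; the dissertation's contribution is doing this automatically for external data such as file contents.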

UCAM-CL-TR-866

Ilias Giechaskiel, George Panagopoulos, Eiko Yoneki:

PDTL: Parallel and distributed triangle listing for massive graphs
April 2015, 14 pages, PDF

Abstract: This paper presents the first distributed triangle listing algorithm with provable CPU, I/O, Memory, and Network bounds. Finding all triangles (3-cliques) in a graph has numerous applications for density and connectivity metrics. The majority of existing algorithms for massive graphs are sequential, and distributed versions of these algorithms do not guarantee their CPU, I/O, Memory or Network requirements. Our Parallel and Distributed Triangle Listing (PDTL) framework focuses on efficient external-memory access in distributed environments instead of fitting subgraphs into memory. It works by performing efficient orientation and load-balancing steps, and replicating graphs across machines by using an extended version of Hu et al.’s Massive Graph Triangulation algorithm. As a result, PDTL suits a variety of computational environments, from single-core machines to high-end clusters. PDTL computes the exact triangle count on graphs of over 6 billion edges and 1 billion vertices (e.g. Yahoo graphs), outperforming and using fewer resources than the state-of-the-art systems PowerGraph, OPT, and PATRIC by 2 times to 4 times. Our approach highlights the importance of I/O considerations in a distributed environment, which has received less attention in the graph processing literature.
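The orientation step mentioned in the abstract is a standard device in exact triangle listing: directing every edge from its lower-degree endpoint to its higher-degree one yields an acyclic orientation in which each triangle is discovered exactly once, at its lowest-ranked vertex. The sketch below is a minimal in-memory Python illustration of that idea, not PDTL’s external-memory, distributed implementation; the function name is invented for the example.

```python
from itertools import combinations

def list_triangles(edges):
    """List each triangle of an undirected graph exactly once."""
    # Adjacency sets for the undirected graph.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Orientation step: direct every edge from the endpoint of lower
    # (degree, id) rank to the higher one. The result is acyclic, and
    # each triangle has a unique lowest-rank vertex.
    rank = lambda v: (len(adj[v]), v)
    out = {v: {w for w in adj[v] if rank(v) < rank(w)} for v in adj}

    triangles = []
    for u in adj:
        # Any adjacent pair of u's out-neighbours closes a triangle,
        # and it is discovered only here, at its lowest-rank vertex u.
        for v, w in combinations(sorted(out[u]), 2):
            if w in adj[v]:
                triangles.append(tuple(sorted((u, v, w))))
    return triangles
```

Because each triangle is enumerated exactly once, no deduplication pass is needed; distributed variants such as PDTL additionally partition the oriented edge lists across machines.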

UCAM-CL-TR-867

Nikolai Sultana:

Higher-order proof translation
April 2015, 158 pages, PDF
PhD thesis (Trinity College, April 2014)

Abstract: The case for interfacing logic tools together has been made countless times in the literature, but it is still an important research question. There are various logics and respective tools for carrying out formal developments, but practitioners still lament the difficulty of reliably exchanging mathematical data between tools.

Writing proof-translation tools is hard. The problem has both a theoretical side (to ensure that the translation is adequate) and a practical side (to ensure that the translation is feasible and usable). Moreover, the source and target proof formats might be less documented than desired (or even necessary), and this adds a dash of reverse-engineering to what should be a system integration task.

This dissertation studies proof translation for higher-order logic. We will look at the qualitative benefits of locating the translation close to the source (where the proof is generated), the target (where the proof is consumed), and in between (as a tool independent of the proof producer and consumer).

Two ideas are proposed to alleviate the difficulty of building proof translation tools. The first is a proof translation framework that is structured as a compiler. Its target is specified as an abstract machine, which captures the essential features of its implementations. This framework is designed to be performant and extensible. Second, we study proof transformations that convert refutation proofs from a broad class of consistency-preserving calculi (such as those used by many proof-finding tools) into proofs in validity-preserving calculi (the kind used by many proof-checking tools). The basic method is very simple, and involves applying a single transformation uniformly to all of the source calculi’s inferences, rather than applying ad hoc (rule-specific) inference interpretations.

UCAM-CL-TR-868

Robert N. M. Watson, Jonathan Woodruff, David Chisnall, Brooks Davis, Wojciech Koszek, A. Theodore Markettos, Simon W. Moore, Steven J. Murdoch, Peter G. Neumann, Robert Norton, Michael Roe:

Bluespec Extensible RISC Implementation: BERI Hardware reference
April 2015, 82 pages, PDF

Abstract: The BERI Hardware Reference describes the Bluespec Extensible RISC Implementation (BERI) prototype developed by SRI International and the University of Cambridge. The reference is targeted at hardware and software developers working with the BERI1 and BERI2 processor prototypes in simulation and synthesized to FPGA targets. We describe how to use the BERI1 and BERI2 processors in simulation; the BERI1 debug unit; the BERI unit-test suite; how to use BERI with Altera FPGAs and Terasic DE4 boards; the 64-bit MIPS and CHERI ISAs implemented by the prototypes; the BERI1 and BERI2 processor implementations themselves; and the BERI Programmable Interrupt Controller (PIC).

UCAM-CL-TR-869

Robert N. M. Watson, David Chisnall, Brooks Davis, Wojciech Koszek, Simon W. Moore, Steven J. Murdoch, Peter G. Neumann, Jonathan Woodruff:

Bluespec Extensible RISC Implementation: BERI Software reference
April 2015, 27 pages, PDF

Abstract: The BERI Software Reference documents how to build and use the FreeBSD operating system on the Bluespec Extensible RISC Implementation (BERI) developed by SRI International and the University of Cambridge. The reference is targeted at hardware and software programmers who will work with BERI or BERI-derived systems.

UCAM-CL-TR-870

Ali Mustafa Zaidi:

Accelerating control-flow intensive code in spatial hardware
May 2015, 170 pages, PDF
PhD thesis (St. Edmund’s College, February 2014)

Abstract: Designers are increasingly utilizing spatial (e.g. custom and reconfigurable) architectures to improve both efficiency and performance in increasingly heterogeneous systems-on-chip. Unfortunately, while such architectures can provide orders of magnitude better efficiency and performance on numeric applications, they exhibit poor performance when implementing sequential, control-flow intensive code. This thesis studies the problem of improving sequential code performance in spatial hardware without sacrificing its inherent efficiency advantage.

I propose (a) switching from a statically scheduled to a dynamically scheduled, dataflow execution model, and (b) utilizing a newly developed compiler intermediate representation (IR) designed to expose ILP in spatial hardware, even in the presence of complex control flow. I describe this new IR – the Value State Flow Graph (VSFG) – and how it statically exposes ILP from control-flow intensive code by enabling control-dependence analysis, execution along multiple flows of control, as well as aggressive control-flow speculation. I also present a High-Level Synthesis (HLS) toolchain that compiles unmodified high-level language code to dataflow custom hardware, via the LLVM compiler infrastructure.

I show that for control-flow intensive code, VSFG-based custom hardware performance approaches, or even exceeds, the performance of a complex superscalar processor, while consuming only 1/4x the energy of an efficient in-order processor, and 1/8x that of a complex out-of-order processor. I also present a discussion of compile-time optimizations that may be attempted to further improve both efficiency and performance for VSFG-based hardware, including using alias analysis to statically partition and parallelize memory operations.

This work demonstrates that it is possible to use custom and/or reconfigurable hardware in heterogeneous systems to improve the efficiency of frequently executed sequential code, without compromising performance relative to an energy-inefficient out-of-order superscalar processor.

UCAM-CL-TR-871

Peter R. Calvert:

Architecture-neutral parallelism via the Join Calculus
July 2015, 150 pages, PDF
PhD thesis (Trinity College, September 2014)

Abstract: Ever since the UNCOL efforts in the 1960s, compilers have sought to use both source-language-neutral and architecture-neutral intermediate representations. The advent of web applets led to the JVM, where such a representation was also used for distribution. This trend has continued and now many mainstream applications are distributed using the JVM or .NET formats. These languages can be efficiently run on a target architecture (e.g. using JIT techniques). However, such intermediate languages have been predominantly sequential, supporting only rudimentary concurrency primitives such as threads. This thesis proposes a parallel intermediate representation with analogous goals. The specific contributions made in this work are based around a join calculus abstract machine (JCAM). These can be broadly categorised into three sections.

The first contribution is the definition of the abstract machine itself. The standard join calculus is modified to prevent implicit sharing of data, as this is undesirable in non-shared-memory architectures. It is then condensed into three primitive operations that can be targeted by optimisations and analyses. I claim that these three simple operations capture all the common styles of concurrency and parallelism used in current programming languages.

The work goes on to show how the JCAM intermediate representation can be implemented on shared-memory multi-core machines with acceptable overheads. This process illustrates some key program properties that can be exploited to give significant benefits in certain scenarios.

Finally, conventional control-flow analyses are adapted to the join calculus to allow the properties required for optimising compilation to be inferred. Along with the prototype compiler, this illustrates the JCAM’s capabilities as a universal representation for parallelism.
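The firing rule of the join calculus (a join pattern runs when every channel it mentions holds a pending message) can be illustrated with a toy scheduler. The class and method names below are hypothetical, and the sketch deliberately ignores the JCAM’s three-primitive decomposition and all real concurrency; it shows only the pattern-matching behaviour the abstract refers to.

```python
from collections import deque

class JoinDefinition:
    """Toy join-calculus scheduler: a pattern fires when every one of
    its channels has at least one queued message (hypothetical API)."""

    def __init__(self):
        self.queues = {}    # channel name -> deque of pending messages
        self.patterns = []  # (tuple of channel names, reaction function)

    def when(self, channels, reaction):
        self.patterns.append((tuple(channels), reaction))

    def send(self, channel, message):
        self.queues.setdefault(channel, deque()).append(message)
        self._try_fire()

    def _try_fire(self):
        fired = True
        while fired:
            fired = False
            for channels, reaction in self.patterns:
                # Fire only if every channel in the pattern is non-empty.
                if all(self.queues.get(c) for c in channels):
                    args = [self.queues[c].popleft() for c in channels]
                    reaction(*args)
                    fired = True
                    break

# Classic example: a buffer in which put and get synchronise pairwise.
results = []
d = JoinDefinition()
d.when(("put", "get"), lambda value, _: results.append(value))
d.send("get", None)   # nothing fires yet: "put" is empty
d.send("put", 42)     # both channels now pending, so the pattern fires
```

The `put`/`get` rendezvous is the standard introductory join-calculus example; channels here are unordered message queues, not shared variables, which is precisely why the calculus maps naturally onto non-shared-memory targets.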

UCAM-CL-TR-872

Jia Meng:

The integration of higher order interactive proof with first order automatic theorem proving
July 2015, 144 pages, PDF
PhD thesis (Churchill College, April 2005)

Abstract: Interactive and automatic theorem proving are the two most widely used computer-assisted theorem proving methods. Interactive proof tools such as HOL, Isabelle and PVS have been highly successful. They support expressive formalisms and have been used for verifying hardware, software, protocols, and so forth. Unfortunately, interactive proof requires much effort from a skilled user. Many other tools are completely automatic, such as Vampire, SPASS and Otter. However, they cannot be used to verify large systems because their logic is inexpressive. This dissertation focuses on how to combine these two types of theorem proving to obtain the advantages of each of them. This research is carried out by investigating the integration of Isabelle with Vampire and SPASS.

Isabelle is an interactive theorem prover and it supports a multiplicity of logics, such as ZF and HOL. Vampire and SPASS are first order untyped resolution provers. The objective of this research is to design an effective method to support higher order interactive proof with any first order resolution prover. This integration can simplify the formal verification procedure by reducing the user interaction required during interactive proofs: many goals will be proved by automatic provers.

For such an integration to be effective, we must bridge the many differences between a typical interactive theorem prover and a resolution theorem prover. Examples of the differences are higher order versus first order, and typed versus untyped.

Through experiments, we have designed and implemented a practical method to convert Isabelle’s formalisms (ZF and HOL) into untyped first-order clauses. Isabelle/ZF’s formulae that are not first-order need to be reformulated as first-order formulae before clause normal form transformation. For Isabelle/HOL, a sound modelling of its type system is designed first, before translating its formulae into first-order clauses with their type information encoded. This method of formalization makes it possible to integrate Isabelle with resolution provers.

A large set of axioms is usually required to support interactive proofs, but can easily overwhelm an automatic prover. We have experimented with various methods to solve this problem, including using different settings of an automatic prover and automatically eliminating irrelevant axioms.

The invocation of background automatic provers should be invisible to users, hence we have also designed and implemented an automatic calling procedure, which extracts all necessary information and sends it to an automatic prover at an appropriate point during an Isabelle proof.

Finally, the results and knowledge gained from this research are generally applicable and can be applied to future integration of any other interactive and automatic theorem provers.

UCAM-CL-TR-873

Khilan Gudka, Robert N.M. Watson, Jonathan Anderson, David Chisnall, Brooks Davis, Ben Laurie, Ilias Marinos, Peter G. Neumann, Alex Richardson:

Clean application compartmentalization with SOAAP (extended version)
August 2015, 35 pages, PDF

Abstract: Application compartmentalization, a vulnerability mitigation technique employed in programs such as OpenSSH and the Chrome web browser, decomposes software into sandboxed components to limit privileges leaked or otherwise available to attackers. However, compartmentalizing applications – and maintaining that compartmentalization – is hindered by ad hoc methodologies and significantly increased programming effort. In practice, programmers stumble through (rather than overtly reason about) compartmentalization spaces of possible decompositions, unknowingly trading off correctness, security, complexity, and performance.

We present a new conceptual framework embodied in an LLVM-based tool: the Security-Oriented Analysis of Application Programs (SOAAP), which allows programmers to reason about compartmentalization using source-code annotations (compartmentalization hypotheses). We demonstrate considerable benefit when creating new compartmentalizations for complex applications, and analyze existing compartmentalized applications to discover design faults and maintenance issues arising from application evolution.

This technical report is an extended version of the similarly named conference paper presented at the 2015 ACM Conference on Computer and Communications Security (CCS).

UCAM-CL-TR-874

Zhen Bai:

Augmented Reality interfaces for symbolic play in early childhood
September 2015, 292 pages, PDF
PhD thesis (Jesus College, September 2014)

Abstract: Augmented Reality (AR) is an umbrella term for technologies that superimpose virtual contents onto the physical world. There is an emerging research focus on AR applications to improve quality of life for special user groups with diverse levels of age, skill, disabilities and knowledge.

Symbolic play is an early childhood activity that requires children to interpret elements of the real world in a non-literal way. Much research effort has been spent to enhance symbolic play due to its close link with critical development such as symbolic thought, creativity and social understanding.

In this thesis, I identified an analogy between the dual representational characteristics of AR and symbolic play. This led me to explore to what extent AR can promote cognitive and social development in symbolic play for young children with and without autism spectrum condition (ASC). To address this research goal, I developed a progressive AR design approach that requires progressive levels of mental effort to conceive symbolic thought during play. I investigated the usability of AR displays with the magic mirror metaphor to support physical object manipulation. Based on the progressive AR design approach and preparatory usability investigation, I designed an AR system to enhance solitary symbolic play, and another AR system to enhance social symbolic play. The effectiveness of each system was rigorously evaluated with reference to psychology literature.

Empirical results show that children with ASC aged 4-7 years produced more solitary symbolic play with higher theme relevance using the first AR system as compared with an equivalent non-AR natural play setting. Typically developing children aged 4-6, using the second AR system, demonstrated improved social symbolic play in terms of comprehending emotional states of pretend roles, and constructing joint pretense on symbolic transformations using specially designed AR scaffoldings.

UCAM-CL-TR-875

Theodosia Togia:

The language of collaborative tagging
September 2015, 203 pages, PDF
PhD thesis (Trinity Hall, September 2015)

Abstract: Collaborative tagging is the process whereby people attach keywords, known as tags, to digital resources, such as text and images, in order to render them retrievable in the future. This thesis investigates how tags submitted by users in collaborative tagging systems function as descriptors of a resource’s perceived content. Using computational and theoretical tools, I compare collaborative tagging with natural language description in order to determine whether or to what extent the former behaves as the latter.

I start the investigation by collecting a corpus of tagged images and exploring the relationship between a resource and a tag using theories from different disciplines, such as Library Science, Semiotics and Information Retrieval. Then, I study the lexical characteristics of individual tags, suggesting that tags behave as natural language words. The next step is to examine how tags combine when annotating a resource. It will be shown that similar combinatory constraints hold for tags assigned to a resource and for words as used in coherent text. This realisation will lead to the question of whether the similar combinatory patterns between tags and words are due to implicit semantic relations between the tags. To provide an answer, I conduct an experiment asking humans to submit both tags and textual descriptions for a set of images, constructing a parallel corpus of more than one thousand tags-text annotations. Analysis of this parallel corpus provides evidence that a large number of tag pairs are connected via implicit semantic relations, whose nature is described. Finally, I investigate whether it is possible to automatically identify the semantically related tag pairs and make explicit their relationship, even in the absence of supporting image-specific text. I construct and evaluate a proof-of-concept system to demonstrate that this task is attainable.

UCAM-CL-TR-876

Robert N. M. Watson, Peter G. Neumann, Jonathan Woodruff, Michael Roe, Jonathan Anderson, David Chisnall, Brooks Davis, Alexandre Joannou, Ben Laurie, Simon W. Moore, Steven J. Murdoch, Robert Norton, Stacey Son:

Capability Hardware Enhanced RISC Instructions: CHERI Instruction-Set Architecture
September 2015, 198 pages, PDF

Abstract: This technical report describes CHERI ISAv4, the fourth version of the Capability Hardware Enhanced RISC Instructions (CHERI) Instruction-Set Architecture (ISA). CHERI is being developed by SRI International and the University of Cambridge. This design captures four years of research, development, refinement, formal analysis, and testing, and is a substantial enhancement to the ISA version described in UCAM-CL-TR-850. Key improvements lie in tighter C-language integration, and more mature support for software object-capability models; these changes result from experience gained in adapting substantial software stacks to run on prototype hardware.

The CHERI instruction set is based on a hybrid capability-system architecture that adds new capability-system primitives to a commodity 64-bit RISC ISA, enabling software to efficiently implement fine-grained memory protection and a hardware-software object-capability security model. These extensions support incrementally adoptable, high-performance, formally based, programmer-friendly underpinnings for fine-grained software decomposition and compartmentalization, motivated by and capable of enforcing the principle of least privilege.

The CHERI system architecture purposefully addresses known performance and robustness gaps in commodity ISAs that hinder the adoption of more secure programming models centered around the principle of least privilege. To this end, CHERI blends traditional paged virtual memory with a per-address-space capability model that includes capability registers, capability instructions, and tagged memory that have been added to the 64-bit MIPS ISA via a new capability coprocessor. CHERI also learns from the C-language fat-pointer literature: CHERI capabilities can describe not only regions of memory, but can also capture C pointer semantics, allowing capabilities to be substituted for pointers in generated code.

CHERI’s hybrid system approach, inspired by the Capsicum security model, allows incremental adoption of capability-oriented software design: software implementations that are more robust and resilient can be deployed where they are most needed, while leaving less critical software largely unmodified, but nevertheless suitably constrained to be incapable of having adverse effects. For example, we are focusing conversion efforts on low-level TCB components of the system: separation kernels, hypervisors, operating system kernels, language runtimes, and userspace TCBs such as web browsers. Likewise, we see early-use scenarios (such as data compression, protocol parsing, image processing, and video processing) that relate to particularly high-risk software libraries, which are concentrations of both complex and historically vulnerability-prone code combined with untrustworthy data sources, while leaving containing applications unchanged.

This report describes the CHERI Instruction-Set Architecture (ISA) and design, and provides reference documentation and potential memory models, along with their requirements. It also briefly addresses the CHERI system hardware-software architecture, documenting our current thinking on integrating programming languages and operating systems with the CHERI hardware.

UCAM-CL-TR-877

Robert N. M. Watson, David Chisnall, Brooks Davis, Wojciech Koszek, Simon W. Moore, Steven J. Murdoch, Peter G. Neumann, Jonathan Woodruff:

Capability Hardware Enhanced RISC Instructions: CHERI Programmer’s Guide
September 2015, 58 pages, PDF

Abstract: This work presents CHERI, a practical extension of the 64-bit MIPS instruction set to support capabilities for fine-grained memory protection.

Traditional paged memory protection has proved inadequate in the face of escalating security threats, and proposed solutions include fine-grained protection tables (Mondrian Memory Protection) and hardware fat-pointer protection (Hardbound). These have emphasised transparent protection for C executables but have lacked flexibility and practicality. Intel’s recent memory protection extensions (iMPX) attempt to adopt some of these ideas and are flexible and optional, but lack the strict correctness of these proposals.

Capability addressing has been the classical solution to efficient and strong memory protection, but it has been thought to be incompatible with common instruction sets and also with modern program structure, which uses a flat memory space with global pointers.

CHERI is a fusion of capabilities with a paged flat memory, producing a program-managed fat-pointer capability model. This protection mechanism scales from application sandboxing to efficient byte-level memory safety with per-pointer permissions. I present an extension to the 64-bit MIPS architecture on FPGA that runs standard FreeBSD and supports self-segmenting applications in user space.

Unlike other recent proposals, the CHERI implementation is open-source and of sufficient quality to support software development as well as community extension of this work. I compare with published memory safety mechanisms and demonstrate competitive performance while providing assurance and greater flexibility with simpler hardware requirements.

UCAM-CL-TR-878

Marios O. Choudary:

Efficient multivariate statistical techniques for extracting secrets from electronic devices
September 2015, 164 pages, PDF
PhD thesis (Darwin College, July 2014)

Abstract: In 2002, Suresh Chari, Rao Josyula and Pankaj Rohatgi presented a very powerful method, known as the ‘Template Attack’, to infer secret values processed by a microcontroller, by analysing its power-supply current, generally known as its ‘side-channel leakage’. This attack uses a profiling step to compute the parameters of a multivariate normal distribution from the leakage of a training device, and an attack step in which these parameters are used to infer a secret value (e.g. a cryptographic key) from the leakage of a target device. This has important implications for many industries, such as pay-TV or banking, that use a microcontroller executing a cryptographic algorithm to authenticate their customers.

In this thesis, I describe efficient implementations of this template attack, that can push its limits further, by using efficient multivariate statistical analysis techniques. Firstly, I show that, using a linear discriminant score, we can avoid some numerical obstacles, and use a large number of leakage samples to improve the attack, while also drastically decreasing its computation time. I evaluate my implementations on an 8-bit microcontroller, using different compression methods, including Principal Component Analysis (PCA) and Fisher’s Linear Discriminant Analysis (LDA), and I provide guidance for the choice of attack algorithm. My results show that we can determine almost perfectly an 8-bit target value, even when this value is manipulated by a single LOAD instruction.
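As a toy illustration of the profiling and attack steps described above (not the thesis’s optimised implementation), the sketch below fits per-value mean vectors and a pooled covariance from a simulated training device, then ranks candidates by a linear discriminant score summed over attack traces. The leakage model, trace dimensions, and all names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical leakage model: each trace has two sample points whose
# means depend linearly on a secret 4-bit value, plus Gaussian noise.
def trace(value):
    signal = np.array([0.3 * value, 1.0 + 0.2 * value])
    return signal + rng.normal(0.0, 0.3, size=2)

K = 16  # candidate secret values 0..15

# Profiling step: from a "training device", estimate one mean vector
# per candidate value and a single pooled covariance matrix.
n_prof = 200
means = []
pooled = np.zeros((2, 2))
for v in range(K):
    T = np.array([trace(v) for _ in range(n_prof)])
    means.append(T.mean(axis=0))
    pooled += np.cov(T.T) * (n_prof - 1)   # accumulate scatter matrices
pooled /= K * (n_prof - 1)
S_inv = np.linalg.inv(pooled)

# Attack step: score each candidate k with the linear discriminant
#   d_k(x) = m_k^T S^{-1} x - (1/2) m_k^T S^{-1} m_k,
# summed over all attack traces, and guess the best-scoring value.
def recover(attack_traces):
    scores = np.zeros(K)
    for x in attack_traces:
        for k in range(K):
            m = means[k]
            scores[k] += m @ S_inv @ x - 0.5 * (m @ S_inv @ m)
    return int(np.argmax(scores))

secret = 11
guess = recover([trace(secret) for _ in range(100)])
```

Because the pooled covariance is shared by all candidates, the quadratic term of the Gaussian log-likelihood cancels, which is what makes the score linear in the trace and cheap to evaluate over many samples.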

Secondly, I show that variability caused by the use of either different devices or different acquisition campaigns can have a strong impact on the performance of these attacks. Using four different Atmel XMEGA 256 A3U 8-bit devices, I explore several variants of the template attack to compensate for this variability, and I show that, by adapting PCA and LDA to this context, we can reduce the entropy of an unknown 8-bit value to below 1.5 bits, even when using one device for profiling and another one for the attack.

Then, using factor analysis, I identify the main factors that contribute to the correlation between leakage samples, and analyse the influence of this correlation on template attacks. I show that, in some cases, by estimating the covariance matrix only from these main factors, we can improve the template attack. Furthermore, I show how to use factor analysis in order to generate arbitrary correlation matrices for the simulation of leakage traces that are similar to the real leakage.

Finally, I show how to implement PCA and LDA efficiently with the stochastic model presented by Schindler et al. in 2005, resulting in the most effective kind of profiled attack. Using these implementations, I demonstrate a profiled attack on a 16-bit target.

UCAM-CL-TR-879

Ramana Kumar:

Self-compilation and self-verification
February 2016, 148 pages, PDF
PhD thesis (Peterhouse College, May 2015)

Abstract: This dissertation presents two pieces of work, one building on the other, that advance the state of the art of formal verification. The focus, in both cases, is on proving end-to-end correctness for realistic implementations of computer software. The first piece is a verified compiler for a stateful higher-order functional programming language, CakeML, which is packaged into a verified read-eval-print loop (REPL). The second piece is a verified theorem-prover kernel for higher-order logic (HOL), designed to run in the verified REPL.

Self-compilation is the key idea behind the verification of the CakeML REPL, in particular, the new technique of proof-grounded bootstrapping of a verified compiler. The verified compiler is bootstrapped within the theorem prover used for its verification, and then packaged into a REPL. The result is an implementation of the REPL in machine code, verified against machine-code semantics. All the end-to-end correctness theorems linking this low-level implementation to its desired semantics are proved within the logic of the theorem prover. Therefore the trusted computing base (TCB) of the final implementation is smaller than for any previously verified compiler.

Just as self-compilation is a benchmark by which to judge a compiler, I propose self-verification as a benchmark for theorem provers, and present a method by which a theorem prover could verify its own implementation. By applying proof-grounded compilation (i.e., proof-grounded bootstrapping applied to something other than a compiler) to an implementation of a theorem prover, we obtain a theorem prover whose machine-code implementation is verified. To connect this result back to the semantics of the logic of the theorem prover, we need to formalise that semantics and prove a refinement theorem. I present some advances in the techniques that can be used to formalise HOL within itself, as well as demonstrating that the theorem prover, and its correctness proof, can be pushed through the verified compiler.


My thesis is that verification of a realistic implementation can be produced mostly automatically from a verified high-level implementation, via the use of verified compilation. I present a verified compiler and explain how it was bootstrapped to achieve a small TCB, and then explain how verified compilation can be used on a larger application, in particular, a theorem prover for higher-order logic. The application has two parts, one domain-specific and the other generic. For the domain-specific part, I formalise the semantics of higher-order logic and prove its inference system sound. For the generic part, I apply proof-grounded compilation to produce the verified implementation.

UCAM-CL-TR-880

Janina Voigt:

Access contracts: a dynamic approach to object-oriented access protection
February 2016, 171 pages, PDF
PhD thesis (Trinity College, May 2014)

Abstract: In object-oriented (OO) programming, variables do not contain objects directly but addresses of objects on the heap. Thus, several variables can point to the same object; we call this aliasing.

Aliasing is a central feature of OO programming that enables efficient sharing of objects across a system. This is essential for the implementation of many programming idioms, such as iterators. On the other hand, aliasing reduces modularity and encapsulation, making programs difficult to understand, debug and maintain.
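A few lines make the aliasing problem concrete; Python is used here for brevity, but Java object references behave the same way. The class name is invented for the example.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

a = Account(100)
b = a            # no copy is made: b is an alias of the same heap object
b.balance = 0    # a mutation through one alias ...

assert a.balance == 0   # ... is observable through the other alias
assert a is b           # both variables refer to one and the same object
```

This is exactly the encapsulation hazard the abstract describes: if a class hands out a reference to an internal object, callers can mutate state the class assumed only it controlled.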

Much research has been done on controlling aliasing. Alias protection schemes (such as Clarke et al.’s influential ownership types) limit which references can exist, thus guaranteeing the protection of encapsulated objects. Unfortunately, existing schemes are significantly restrictive and consequently have not been widely adopted by software developers.

This thesis makes three contributions to the area of alias protection. Firstly, it proposes aliasing contracts, a novel, dynamically-checked alias protection scheme for object-oriented programming languages. Aliasing contracts are highly flexible and expressive, addressing the limitations of existing work. We show that they can be used to model many existing alias protection schemes, providing a unifying approach to alias protection.

Secondly, we develop a prototype implementation of aliasing contracts in Java and use it to quantify the run-time performance of aliasing contracts. Since aliasing contracts are checked dynamically, they incur run-time performance overheads; however, our performance evaluation shows that using aliasing contracts for testing and debugging is nevertheless feasible.

Thirdly, we propose a static analysis which can verify simple aliasing contracts at compile time, including those contracts which model ownership types. Contracts which can be verified in this way can subsequently be removed from the program before it is executed. We show that such a combination of static and dynamic checking significantly improves the run-time performance of aliasing contracts.

UCAM-CL-TR-881

Juan M. Tirado, Ovidiu Serban, Qiang Guo, Eiko Yoneki:

Web data knowledge extraction
March 2016, 60 pages, PDF

Abstract: A constantly growing amount of information is available through the web. Unfortunately, extracting useful content from this massive amount of data still remains an open issue. The lack of standard data models and structures forces developers to create ad-hoc solutions from scratch. The advice of an expert is still needed in many situations where developers do not have the correct background knowledge, forcing them to spend time acquiring it from the expert. In another direction, there are promising solutions employing machine learning techniques. However, increasing accuracy requires an increase in system complexity that cannot be endured in many projects.

In this work, we approach the web knowledge extraction problem using an expert centric methodology. This methodology defines a set of configurable, extendible and independent components that permit the reutilisation of large pieces of code among projects. Our methodology differs from similar solutions in its expert-driven design. This design makes it possible for a subject specialist to drive the knowledge extraction for a given set of documents. Additionally, we propose the utilization of machine assisted solutions that guide the expert during this process. To demonstrate the capabilities of our methodology, we present a real use case scenario in which public procurement data is extracted from the web-based repositories of several public institutions across Europe. We provide insightful details about the challenges we had to deal with in this case and additional discussions about how to apply our methodology.

UCAM-CL-TR-882

Niall Murphy:

Discovering and exploiting parallelism in DOACROSS loops
March 2016, 129 pages, PDF
PhD thesis (Darwin College, September 2015)


Abstract: Although multicore processors have been the norm for a decade, programmers still struggle to write parallel general-purpose applications, resulting in underutilised on-chip resources. Automatic parallelisation is a promising approach to improving the performance of such applications without burdening the programmer. I explore various techniques for automatically extracting parallelism which span the spectrum from conservative parallelisation, where transformations must be proven correct by the compiler, to optimistic parallelisation, where speculative execution allows the compiler to generate potentially unsafe code, with runtime supports to ensure correct execution.

Firstly I present a limit study of conservative parallelisation. I use a novel runtime profiling technique to find all data dependences which occur during execution and build an oracle data dependence graph. This oracle is used as input to the HELIX compiler which automatically generates parallelised code. In this way, the compiler is no longer limited by the inadequacies of compile-time dependence analysis and the performance of the generated code represents the upper limit of what can be achieved by HELIX-style parallelisation. I show that, despite shortcomings in the compile-time analysis, the oracle-parallelised code is no better than that ordinarily produced by HELIX.

Secondly I present a limit study of optimistic parallelisation. I implement a dataflow timing model that allows each instruction to execute as early as possible in an idealistic, zero-overhead machine, thus giving a theoretical limit to the parallelism which can be exploited. The study shows that considerable extra parallelism is available which, due to its dynamic nature, cannot be exploited by HELIX, even with the oracle dependence analysis.

Finally I demonstrate the design of a practical parallelisation system which combines the best aspects of both the conservative and optimistic parallelisation styles. I use runtime profiling to detect which HELIX-identified dependences cause frequent conflicts and synchronise these while allowing other code to run speculatively. This judicious speculation model achieves superior performance to HELIX and approaches the theoretical limit in many cases.

UCAM-CL-TR-883

Richard Russell, Sean B. Holden:

Survey propagation applied to weighted partial maximum satisfiability
March 2016, 15 pages, PDF

Abstract: We adapt the survey propagation method for application to weighted partial maximum satisfiability (WPMax-SAT) problems consisting of a mixture of hard and soft clauses. The aim is to find the truth assignment that minimises the total cost of unsatisfied soft clauses while satisfying all hard clauses. We use fixed points of the new set of message passing equations in a decimation procedure to reduce the size of WPMax-SAT problems. We evaluate this strategy on randomly generated WPMax-SAT problems and find that message passing frequently converges to non-trivial fixed points, and in many cases decimation results in simplified problems for which local search solvers are able to find truth assignments of comparable cost faster than without decimation.
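As a concrete statement of the objective being optimised, the following brute-force sketch finds the minimum-cost assignment for a tiny WPMax-SAT instance. It is purely illustrative (the report's contribution is message-passing decimation, not exhaustive search), and the clause encoding and function names are our own:

```python
from itertools import product

def wpmax_sat(n_vars, hard, soft):
    """Exhaustively find the truth assignment that satisfies every hard
    clause while minimising the total weight of unsatisfied soft clauses.

    A clause is a tuple of non-zero integer literals: literal v means
    variable v is true, -v means it is false (variables number from 1).
    `soft` is a list of (weight, clause) pairs.
    """
    def satisfied(clause, assign):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    best_cost, best_assign = None, None
    for bits in product([False, True], repeat=n_vars):
        assign = dict(enumerate(bits, start=1))
        if not all(satisfied(c, assign) for c in hard):
            continue  # hard clauses are mandatory
        cost = sum(w for w, c in soft if not satisfied(c, assign))
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign

# x1 must be true (hard); two soft clauses pull x2 in opposite
# directions with weights 3 and 1, so violating the weight-1 clause
# is the cheapest consistent choice.
cost, assign = wpmax_sat(2, hard=[(1,)], soft=[(3, (2,)), (1, (-2,))])
```

Decimation, as described in the abstract, would repeatedly fix the variables that message passing identifies as most biased, shrinking the instance before handing the remainder to a local search solver.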

UCAM-CL-TR-884

Zongyan Huang:

Machine learning and computer algebra
April 2016, 113 pages, PDF
PhD thesis (Lucy Cavendish College, August 2015)

Abstract: Computer algebra is a fundamental research field of computer science, and computer algebra systems are used in a wide range of problems, often leading to significant savings of both time and effort. However, many of these systems offer different heuristics, decision procedures, and parameter settings to tackle any given problem, and users need to manually select them in order to use the system.

In this context, the algorithm selection problem is the problem of selecting the most efficient setting of the computer algebra system when faced with a particular problem. These choices can dramatically affect the efficiency, or even the feasibility of finding a solution. Often we have to rely on human expertise to pick a suitable choice as there are no fixed rules that determine the best approach, and even for experts, the relationship between the problem at hand and the choice of an algorithm is far from obvious.

Machine learning techniques have been widely applied in fields where decisions are to be made without the presence of a human expert, such as in web search, text categorization, or recommender systems. My hypothesis is that machine learning can also be applied to help solve the algorithm selection problem for computer algebra systems.

In this thesis, we perform several experiments to determine the effectiveness of machine learning (specifically using support vector machines) for the problem of algorithm selection in three instances of computer algebra applications over real closed fields. Our three applications are: (i) choosing decision procedures and time limits in MetiTarski; (ii) choosing a heuristic for CAD variable ordering; (iii) predicting the usefulness of Gröbner basis preconditioning. The results show that machine learning can effectively be applied to these applications, with the machine learned choices being superior to both choosing a single fixed individual algorithm, as well as to random choice.


UCAM-CL-TR-885

Sean B. Holden:

HasGP: A Haskell library for Gaussian process inference
April 2016, 6 pages, PDF

Abstract: HasGP is a library providing supervised learning algorithms for Gaussian process (GP) regression and classification. While only one of many GP libraries available, it differs in that it represents an ongoing exploration of how machine learning research and deployment might benefit by moving away from the imperative/object-oriented style of implementation and instead employing the functional programming (FP) paradigm. HasGP is implemented in Haskell and is available under the GPL3 open source license.
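For readers unfamiliar with GP inference, the posterior computation that such a library performs can be sketched in a few lines. The sketch below uses Python with NumPy and a squared-exponential kernel; it illustrates the standard GP regression equations, not HasGP's actual Haskell API:

```python
import numpy as np

def sq_exp_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance matrix between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = sq_exp_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = sq_exp_kernel(x_train, x_test)
    Kss = sq_exp_kernel(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)          # K^{-1} y
    mean = Ks.T @ alpha
    var = np.diag(Kss - Ks.T @ np.linalg.solve(K, Ks))
    return mean, var

x = np.array([-1.0, 0.0, 1.0])
y = np.sin(x)
# At a training input with near-zero noise, the posterior mean
# reproduces the training target and the variance collapses.
mean, var = gp_posterior(x, y, np.array([0.0]))
```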

UCAM-CL-TR-886

Ekaterina Kochmar:

Error detection in content word combinations
May 2016, 170 pages, PDF
PhD thesis (St. John’s College, December 2014)

Abstract: This thesis addresses the task of error detection in the choice of content words focusing on adjective–noun and verb–object combinations. We show that error detection in content words is an under-explored area in research on learner language since (i) most previous approaches to error detection and correction have focused on other error types, and (ii) the approaches that have previously addressed errors in content words have not performed error detection proper. We show why this task is challenging for the existing algorithms and propose a novel approach to error detection in content words.

We note that since content words express meaning, an error detection algorithm should take the semantic properties of the words into account. We use a compositional distributional semantic framework in which we represent content words using their distributions in native English, while the meaning of the combinations is represented using models of compositional semantics. We present a number of measures that describe different properties of the modelled representations and can reliably distinguish between the representations of the correct and incorrect content word combinations. Finally, we cast the task of error detection as a binary classification problem and implement a machine learning classifier that uses the output of the semantic measures as features.

The results of our experiments confirm that an error detection algorithm that uses semantically motivated features achieves good accuracy and precision and outperforms the state-of-the-art approaches. We conclude that the features derived from the semantic representations encode important properties of the combinations that help distinguish the correct combinations from the incorrect ones.

The approach presented in this work can naturally be extended to other types of content word combinations. Future research should also investigate how the error correction component for content word combinations could be implemented.

UCAM-CL-TR-887

Robert M. Norton:

Hardware support for compartmentalisation
May 2016, 86 pages, PDF
PhD thesis (Clare Hall, September 2015)

Abstract: Compartmentalisation is a technique to reduce the impact of security bugs by enforcing the ‘principle of least privilege’ within applications. Splitting programs into separate components that each operate with minimal access to resources means that a vulnerability in one part is prevented from affecting the whole. However, the performance costs and development effort of doing this have so far prevented widespread deployment of compartmentalisation, despite the increasingly apparent need for better computer security. A major obstacle to deployment is that existing compartmentalisation techniques rely either on virtual memory hardware or pure software to enforce separation, both of which have severe performance implications and complicate the task of developing compartmentalised applications.

CHERI (Capability Hardware Enhanced RISC Instructions) is a research project which aims to improve computer security by allowing software to precisely express its memory access requirements using hardware support for bounded, unforgeable pointers known as capabilities. One consequence of this approach is that a single virtual address space can be divided into many independent compartments, with very efficient transitions and data sharing between them.

This dissertation analyses the compartmentalisation features of the CHERI Instruction Set Architecture (ISA). It includes: a summary of the CHERI ISA, particularly its compartmentalisation features; a description of a multithreaded CHERI CPU which runs the FreeBSD Operating System; the results of benchmarks that compare the characteristics of hardware supported compartmentalisation with traditional techniques; and an evaluation of proposed optimisations to the CHERI ISA to further improve domain crossing efficiency.

I find that the CHERI ISA provides extremely efficient, practical support for compartmentalisation and that there are opportunities for further optimisation if even lower overhead is required in the future.


UCAM-CL-TR-888

Sherif Akoush, Ripduman Sohan, Andy Hopper:

Recomputation-based data reliability for MapReduce using lineage
May 2016, 19 pages, PDF

Abstract: Ensuring block-level reliability of MapReduce datasets is expensive due to the spatial overheads of replicating or erasure coding data. As the amount of data processed with MapReduce continues to increase, this cost will increase proportionally. In this paper we introduce Recomputation-Based Reliability in MapReduce (RMR), a system for mitigating the cost of maintaining reliable MapReduce datasets. RMR leverages record-level lineage of the relationships between input and output records in the job for the purposes of supporting block-level recovery. We show that collecting this lineage imposes low temporal overhead. We further show that the collected lineage is a fraction of the size of the output dataset for many MapReduce jobs. Finally, we show that lineage can be used to deterministically reproduce any block in the output. We quantitatively demonstrate that, by ensuring the reliability of the lineage rather than the output, we can achieve data reliability guarantees with a small storage requirement.
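The core idea (store per-block lineage instead of replicating output blocks, and recompute a lost block deterministically from its inputs) can be illustrated with a toy map-only job. All names here are hypothetical sketches, not RMR's interfaces:

```python
def run_map_job(inputs, map_fn, block_size=2):
    """Run a map-only job, grouping outputs into fixed-size blocks and
    recording, for each block, the ids of the input records that
    produced it (record-level lineage)."""
    outputs = [map_fn(rec) for rec in inputs]
    blocks, lineage = {}, {}
    for i in range(0, len(outputs), block_size):
        bid = i // block_size
        blocks[bid] = outputs[i:i + block_size]
        lineage[bid] = list(range(i, min(i + block_size, len(outputs))))
    return blocks, lineage

def recompute_block(bid, inputs, map_fn, lineage):
    """Deterministically rebuild a lost output block from its lineage."""
    return [map_fn(inputs[rid]) for rid in lineage[bid]]

inputs = ["a", "bb", "ccc", "dddd", "e"]
blocks, lineage = run_map_job(inputs, len)   # toy job: record lengths
lost = blocks.pop(1)                          # simulate losing block 1
rebuilt = recompute_block(1, inputs, len, lineage)
```

The storage saving in the paper comes from the lineage (here, a few record ids per block) being much smaller than a replicated copy of the output.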

UCAM-CL-TR-889

Sherif Akoush, Ripduman Sohan, Andrew Rice, Andy Hopper:

Evaluating the viability of remote renewable energy in datacentre computing
May 2016, 26 pages, PDF

Abstract: We investigate the feasibility of loosely-coupled distributed datacentre architectures colocated with and powered by renewable energy sources and interconnected using high-bandwidth low-latency data links. The design of these architectures advocates (i) variable machine availability: the number of machines accessible at a particular site at any given time is proportional to the amount of renewable energy available and (ii) workload deferment and migration: if there is insufficient site capacity, workloads are either halted or shifted to transient energy-rich locations.

While these architectures are attractive from an environmental perspective, their feasibility depends on (i) the requirement for additional hardware, (ii) the associated service interruptions, and (iii) the data and energy overhead of workload migration.

In this work we attempt to broadly quantify these overheads. We define a model of the basic design of the architecture incorporating the energy consumption of machines in the datacentre and the network. We further correlate this energy consumption with renewable energy available at different locations around the globe. Given this model we present two simulation-driven case studies based on data from Google and Facebook production clusters.

Generally we provide insights on the trade-offs associated with this off-grid architecture. For example, we show that an optimised configuration consisting of ten distributed datacentres results in a 2% increase in job completion time at the cost of a 50% increase in the number of machines required.

UCAM-CL-TR-890

Alistair G. Stead:

Using multiple representations to develop notational expertise in programming
June 2016, 301 pages, PDF
PhD thesis (Girton College, May 2015)

Abstract: The development of expertise with notations is an important skill for both creators and users of new technology in the future. In particular, the development of Notational Expertise (NE) is becoming valued in schools, where changes in curricula recommend that students of all ages use a range of programming representations to develop competency that can be used to ultimately create and manipulate abstract text notation. Educational programming environments making use of multiple external representations (MERs) that replicate the transitions occurring in schools within a single system present a promising area of research. They can provide support for scaffolding knowledge using low-abstraction representations, knowledge transfer, and addressing barriers faced during representation transition.

This thesis identifies analogies between the use of notation in mathematics education and programming to construct the Modes of Representational Abstraction (MoRA) framework, which supports the identification of representation transition strategies. These strategies were used to develop an educational programming environment, DrawBridge. Studies of the usage of DrawBridge highlighted the need for assessment mechanisms to measure NE. Empirical evaluation in a range of classrooms found that the MoRA framework provided useful insights and that low-abstraction representations provided a great source of motivation. Comparisons of assessments found that a novel assessment type, Adapted Parsons Problems, produced high student participation while still correlating with code-writing scores. Results strongly suggest that game-like features can encourage the use of abstract notation and increase students’ acquisition of NE. Finally, findings show that students in higher year groups benefitted more from using DrawBridge than students in lower year groups.

UCAM-CL-TR-891

Robert N. M. Watson, Peter G. Neumann, Jonathan Woodruff, Michael Roe, Jonathan Anderson, David Chisnall, Brooks Davis, Alexandre Joannou, Ben Laurie, Simon W. Moore, Steven J. Murdoch, Robert Norton, Stacey Son, Hongyan Xia:

Capability Hardware Enhanced RISC Instructions: CHERI Instruction-Set Architecture (Version 5)
June 2016, 242 pages, PDF

Abstract: This technical report describes CHERI ISAv5, the fifth version of the Capability Hardware Enhanced RISC Instructions (CHERI) Instruction-Set Architecture (ISA) being developed by SRI International and the University of Cambridge. This design captures six years of research, development, refinement, formal analysis, and testing, and is a substantial enhancement to the ISA versions described in earlier technical reports. This version introduces the CHERI-128 “compressed” capability format, adds further capability instructions to improve code generation, and rationalizes a number of ISA design choices (such as system permissions) as we have come to better understand mappings from C programming-language and MMU-based operating-system models into CHERI. It also contains improvements to descriptions, explanations, and rationale.

The CHERI instruction set is based on a hybrid capability-system architecture that adds new capability-system primitives to a commodity 64-bit RISC ISA enabling software to efficiently implement fine-grained memory protection and a hardware-software object-capability security model. These extensions support incrementally adoptable, high-performance, formally based, programmer-friendly underpinnings for fine-grained software decomposition and compartmentalization, motivated by and capable of enforcing the principle of least privilege. Fine-grained memory protection also provides direct mitigation of many widely deployed exploit techniques.

The CHERI system architecture purposefully addresses known performance and robustness gaps in commodity ISAs that hinder the adoption of more secure programming models centered around the principle of least privilege. To this end, CHERI blends traditional paged virtual memory with a per-address-space capability model that includes capability registers, capability instructions, and tagged memory that have been added to the 64-bit MIPS ISA. CHERI learns from the C-language fat-pointer literature: its capabilities describe fine-grained regions of memory and can be substituted for data or code pointers in generated code, protecting data and also providing Control-Flow Integrity (CFI). Strong monotonicity properties allow the CHERI capability model to express a variety of protection properties, from valid C-language pointer provenance and C-language bounds checking to implementing the isolation and controlled communication structures required for compartmentalization.
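The bounds-checking and monotonicity properties described above can be conveyed with a toy software model. This is a conceptual illustration only: real CHERI capabilities are hardware-tagged values with permissions, sealing, and compressed bounds, none of which are modelled here.

```python
class Capability:
    """Toy model of a bounded pointer: loads are checked against
    [base, base + length) and derived capabilities may only shrink."""

    def __init__(self, base, length, memory):
        self.base, self.length, self.memory = base, length, memory

    def load(self, offset):
        if not 0 <= offset < self.length:
            raise MemoryError("capability bounds violation")
        return self.memory[self.base + offset]

    def narrow(self, new_base, new_length):
        # Monotonicity: a derived capability must lie within this one.
        if new_base < self.base or new_base + new_length > self.base + self.length:
            raise ValueError("cannot widen a capability")
        return Capability(new_base, new_length, self.memory)

mem = list(range(16))
cap = Capability(4, 8, mem)   # may touch mem[4..11] only
sub = cap.narrow(6, 2)        # legal: strictly within the original
```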

CHERI’s hybrid system approach, inspired by the Capsicum security model, allows incremental adoption of capability-oriented software design: software implementations that are more robust and resilient can be deployed where they are most needed, while leaving less critical software largely unmodified, but nevertheless suitably constrained to be incapable of having adverse effects. For example, we are focusing conversion efforts on low-level TCB components of the system: separation kernels, hypervisors, operating-system kernels, language runtimes, and userspace TCBs such as web browsers. Likewise, we see early-use scenarios (such as data compression, protocol parsing, image processing, and video processing) that relate to particularly high-risk software libraries, which are concentrations of both complex and historically vulnerability-prone code combined with untrustworthy data sources, while leaving containing applications unchanged.

UCAM-CL-TR-892

A. Daniel Hall:

Pipelined image processing for pattern recognition
July 2016, 121 pages, PDF
PhD thesis (Queen’s College, October 1991)

Abstract: Image processing for pattern recognition is both computationally intensive and algorithmically complex. The objective of the research presented here was to produce a fast inexpensive image processor for pattern recognition. This objective has been achieved by separating the computationally intensive pixel processing tasks from the algorithmically complex feature processing tasks.

The context for this work is explored in terms of image processor architecture, intermediate-level image processing tasks and pattern recognition.

A new language to describe pipelined neighbourhood operations on binary images (‘PiNOLa’: Pipelined Neighbourhood Operator Language) is presented. PiNOLa was implemented in Modula-2 to provide an interactive simulation system (‘PiNOSim’: Pipelined Neighbourhood Operator Simulator). Experiments using PiNOSim were conducted and a design for a topological feature extractor was produced.

A novel algorithm for connected component labelling in hardware is presented. This algorithm was included in the PiNOSim program to enable the component labelling of features extracted using the language. The component labelling algorithm can be used with the topological feature extractor mentioned above. The result is a method of converting a binary raster scan into a stream of topological features grouped by connected object.
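The report's hardware algorithm is not reproduced here, but the classic software formulation of connected component labelling on a binary raster (two passes with union-find, 4-connectivity) conveys the underlying idea:

```python
def label_components(image):
    """Classic two-pass connected-component labelling (4-connectivity)
    with union-find to merge labels later found to be equivalent."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    nxt = 1
    for y in range(h):              # first pass: provisional labels
        for x in range(w):
            if not image[y][x]:
                continue
            up = labels[y - 1][x] if y else 0
            left = labels[y][x - 1] if x else 0
            if up and left:
                labels[y][x] = up
                parent[find(up)] = find(left)   # merge equivalent labels
            elif up or left:
                labels[y][x] = up or left
            else:
                labels[y][x] = nxt
                parent[nxt] = nxt
                nxt += 1
    for y in range(h):              # second pass: resolve equivalences
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels

img = [[1, 1, 0, 1],
       [0, 1, 0, 1],
       [0, 0, 0, 1]]
labels = label_components(img)   # two connected objects
```

A hardware pipeline would perform essentially the first pass on the raster stream, emitting features tagged with provisional labels and merge events rather than holding the whole image.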

To test the potential performance of a system based on these ideas, some hardware (‘GRIPPR’: Generic Real-time Image Processor for Pattern Recognition) was designed. This machine was implemented using standard components linked to a PC based transputer board. To demonstrate an application of GRIPPR an Optical Character Recognition (OCR) system is presented. Finally, results demonstrating a continuous throughput of 1500 characters/second are given.

UCAM-CL-TR-893

Thomas F. J.-M. Pasquier:

Towards practical information flow control and audit
July 2016, 153 pages, PDF
PhD thesis (Jesus College, January 2016)

Abstract: In recent years, pressure from the general public and from policy makers has been for more and better control over personal data in cloud computing environments. Regulations put responsibilities on cloud tenants to ensure that proper measures are effected by their cloud provider. But there is currently no satisfactory mechanism to achieve this, leaving tenants open to potentially costly lawsuits.

Decentralised Information Flow Control (IFC) at system level is a data-centric Mandatory Access Control scheme that guarantees non-interference across security contexts, based on lattices defined by secrecy and integrity properties. Every data flow is continuously monitored to guarantee the enforcement of decentrally specified policies. Applications running above IFC enforcement need not be trusted and can interact. IFC constraints can be used to ensure that proper workflows are followed, as defined by regulations or contracts. For example, to ensure that end users’ personal data are anonymised before being disclosed to third parties.

Information captured during IFC enforcement allows a directed graph representing whole-system data exchange to be generated. The coupling of policy enforcement and audit data capture allows system “noise” to be removed from audit data, and only information relevant to the policies in place to be recorded. It is possible to query these graphs to demonstrate that the system behaved according to regulation. For example, to demonstrate from run-time data that there is no path without anonymisation between an end-user and a third party.
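The example query (every flow from an end-user to a third party passes through anonymisation) reduces to a simple reachability check on the audit graph. The sketch below and its node names are illustrative, not the thesis's query language:

```python
def all_paths_pass_through(graph, source, target, required):
    """Return True iff every directed path from `source` to `target` in
    `graph` (an adjacency dict) visits the node `required`.

    Equivalently: with `required` removed from the graph, `target`
    must be unreachable from `source`.
    """
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return False          # found a path that avoids `required`
        for nxt in graph.get(node, []):
            if nxt != required and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True

# Audit graph of observed data flows: all routes to the third party
# go via the anonymiser, so the policy check succeeds.
flows = {
    "user": ["app", "anonymiser"],
    "app": ["anonymiser"],
    "anonymiser": ["third_party"],
}
ok = all_paths_pass_through(flows, "user", "third_party", "anonymiser")
```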

UCAM-CL-TR-894

Christopher Bryant, Mariano Felice:

Issues in preprocessing current datasets for grammatical error correction
September 2016, 15 pages, PDF

Abstract: In this report, we describe some of the issues encountered when preprocessing two of the largest datasets for Grammatical Error Correction (GEC); namely the public FCE corpus and NUCLE (along with associated CoNLL test sets). In particular, we show that it is not straightforward to convert character level annotations to token level annotations and that sentence segmentation is more complex when annotations change sentence boundaries. These become even more complicated when multiple annotators are involved. We subsequently describe how we handle such cases and consider the pros and cons of different methods.
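To see why character-to-token conversion is awkward, consider a helper that maps a character span onto token indices and reports whether the span aligns with token boundaries. This is a hypothetical illustration of the problem, not the authors' preprocessing code:

```python
def char_span_to_tokens(tokens, span):
    """Map a character-level annotation span onto token indices.

    `tokens` is a list of (token, start_offset) pairs for one sentence.
    Returns the indices of all tokens overlapping [start, end), plus a
    flag saying whether the span aligns exactly with token boundaries;
    misaligned spans are the problematic cases during preprocessing.
    """
    s, e = span
    covered = []
    for i, (tok, start) in enumerate(tokens):
        end = start + len(tok)
        if start < e and s < end:    # token overlaps the span
            covered.append(i)
    if covered:
        first_start = tokens[covered[0]][1]
        last_tok, last_start = tokens[covered[-1]]
        aligned = s == first_start and e == last_start + len(last_tok)
    else:
        aligned = False
    return covered, aligned

tokens = [("He", 0), ("go", 3), ("to", 6), ("school", 9)]
covered, aligned = char_span_to_tokens(tokens, (3, 5))   # exactly "go"
partial = char_span_to_tokens(tokens, (3, 4))            # cuts "go" mid-token
```

Misaligned spans force a policy decision (extend the span, split the token, or discard the annotation), which is exactly where different preprocessing choices diverge.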

UCAM-CL-TR-895

Mariano Felice:

Artificial error generation for translation-based grammatical error correction
October 2016, 155 pages, PDF
PhD thesis (Hughes Hall, October 2016)

Abstract: Automated grammatical error correction for language learners has attracted a lot of attention in recent years, especially after a number of shared tasks that have encouraged research in the area. Treating the problem as a translation task from ‘incorrect’ into ‘correct’ English using statistical machine translation has emerged as a state-of-the-art approach but it requires vast amounts of corrected parallel data to produce useful results. Because manual annotation of incorrect text is laborious and expensive, we can generate artificial error-annotated data by injecting errors deliberately into correct text and thus produce larger amounts of parallel data with much less effort.

In this work, we review previous work on artificial error generation and investigate new approaches using random and probabilistic methods for constrained and general error correction. Our methods use error statistics from a reference corpus of learner writing to generate errors in native text that look realistic and plausible in context. We investigate a number of aspects that can play a part in the error generation process, such as the origin of the native texts, the amount of context used to find suitable insertion points, the type of information encoded by the error patterns and the output error distribution. In addition, we explore the use of linguistic information for characterising errors and train systems using different combinations of real and artificial data.

Results of our experiments show that the use of artificial errors can improve system performance when they are used in combination with real learner errors, in line with previous research. These improvements are observed for both constrained and general correction, for which probabilistic methods produce the best results. We also demonstrate that systems trained on a combination of real and artificial errors can beat other highly-engineered systems and be more robust, showing that performance can be improved by focusing on the data rather than tuning system parameters.

Part of our work is also devoted to the proposal of the I-measure, a new evaluation scheme that scores corrections in terms of improvement on the original text and solves known issues with existing evaluation measures.

UCAM-CL-TR-896

Kumar Sharad:

Learning to de-anonymize social networks
December 2016, 158 pages, PDF
PhD thesis (Churchill College, November 2016)

Abstract: Releasing anonymized social network data for analysis has been a popular idea among data providers. Despite evidence to the contrary, the belief that anonymization will solve the privacy problem in practice refuses to die. This dissertation contributes to the field of social graph de-anonymization by demonstrating that even automated models can be quite successful in breaching the privacy of such datasets. We propose novel machine-learning based techniques to learn the identities of nodes in social graphs, thereby automating manual, heuristic-based attacks. Our work extends the vast literature of social graph de-anonymization attacks by systematizing them.

We present a random-forests based classifier which uses structural node features based on neighborhood degree distribution to predict their similarity. Using these simple and efficient features we design versatile and expressive learning models which can learn the de-anonymization task from just a few examples. Our evaluation establishes their efficacy in transforming de-anonymization to a learning problem. The learning is transferable in that the model can be trained to attack one graph when trained on another.
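A minimal version of such a structural fingerprint (a binned histogram of a node's neighbours' degrees) might look like the sketch below; a classifier such as a random forest would then be trained on pairs of these vectors to predict node similarity. The binning scheme and names are our own illustrative choices, not the dissertation's features:

```python
def neighbour_degree_features(adj, node, bins=(1, 2, 4, 8)):
    """Binned histogram of a node's neighbours' degrees: a cheap
    structural fingerprint that can be compared across graphs."""
    hist = [0] * (len(bins) + 1)
    for neighbour in adj[node]:
        degree = len(adj[neighbour])
        hist[sum(degree > b for b in bins)] += 1   # bin index for degree
    return hist

adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
fa = neighbour_degree_features(adj, "a")   # neighbours: b (deg 3), c (deg 2)
```

Because the fingerprint depends only on local structure, it survives the removal of node identities, which is what makes it usable for de-anonymization.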

Moving on, we demonstrate the versatility and greater applicability of the proposed model by using it to solve the long-standing problem of benchmarking social graph anonymization schemes. Our framework bridges a fundamental research gap by making cheap, quick and automated analysis of anonymization schemes possible, without even requiring their full description. The benchmark is based on comparison of structural information leakage vs. utility preservation. We study the trade-off of anonymity vs. utility for six popular anonymization schemes including those promising k-anonymity. Our analysis shows that none of the schemes are fit for the purpose.

Finally, we present an end-to-end social graph de-anonymization attack which uses the proposed machine learning techniques to recover node mappings across intersecting graphs. Our attack enhances the state of the art in graph de-anonymization by demonstrating better performance than all the other attacks, including those that use seed knowledge. The attack is seedless and heuristic free, which demonstrates the superiority of machine learning techniques as compared to hand-selected parametric attacks.

UCAM-CL-TR-897

Sheharbano Khattak:

Characterization of Internet censorship from multiple perspectives
January 2017, 170 pages, PDF
PhD thesis (Robinson College, January 2017)

Abstract: Internet censorship is rampant, both under the support of nation states and private actors, with important socio-economic and policy implications. Yet many issues around Internet censorship remain poorly understood because of the lack of adequate approaches to measure the phenomenon at scale. This thesis aims to help fill this gap by developing three methodologies to derive censorship ground truths, which are then applied to real-world datasets to study the effects of Internet censorship. These measurements are given foundation in a comprehensive taxonomy that captures the mechanics, scope, and dynamics of Internet censorship, complemented by a framework that is employed to systematize over 70 censorship resistance systems.

The first part of this dissertation analyzes “user-side censorship”, where a device near the user, such as the local ISP or the national backbone, blocks the user’s online communication. This study provides quantified insights into how censorship affects users, content providers, and Internet Service Providers (ISPs), as seen through the lens of traffic datasets captured at an ISP in Pakistan over a period of three years, beginning in 2011.

The second part of this dissertation moves to “publisher-side censorship”. This is a new kind of blocking where the user’s request arrives at the Web publisher, but the publisher (or something working on its behalf) refuses to respond based on some property of the user. Publisher-side censorship is explored in two contexts. The first is in the context of an anonymity network, Tor, involving a systematic enumeration and characterization of websites that treat Tor users differently from other users.

Continuing on the topic of publisher-side blocking, the second case study examines the Web’s differential treatment of users of adblocking software. The rising popularity of adblockers in recent years poses a serious threat to the online advertising industry, prompting publishers to actively detect users of adblockers and subsequently block them or otherwise coerce them to disable the adblocker. This study presents a first characterization of such practices across the Alexa Top 5,000 websites.

This dissertation demonstrates how the censor’s blocking choices can leave behind a detectable pattern in network communications that can be leveraged to establish the exact mechanisms of censorship. This knowledge facilitates the characterization of censorship from different perspectives: uncovering entities involved in censorship, its targets, and the effects of such practices on stakeholders. More broadly, this study complements efforts to illuminate the nature, scale, and effects of opaque filtering practices, equipping policy-makers with the knowledge necessary to systematically and effectively respond to Internet censorship.

UCAM-CL-TR-898

Dongting Yu:

Access control for network management
January 2017, 108 pages, PDF
PhD thesis (Robinson College, May 2016)

Abstract: Network management inherently involves human input. From expressing business logic and network policy to the low-level commands to networking devices, at least some tasks are done manually by operators. These tasks are a source of error whose consequences can be severe, since operators have high levels of access, and can bring down a whole network if they configure it improperly.

Software-Defined Networking (SDN) is a new network technology that brings even more flexibility (and risk) to network operations. Operators can now easily get third-party apps to run in their networks, or even host tenants and give them some control over their portion of the network. However, security has not caught up, and it is easy for these third parties to access network resources without permission.

Access control is a mature concept; it has been studied for decades. In this dissertation I examine how access control can be used to solve the above network management problems. I look at the Border Gateway Protocol (BGP) from an operations perspective and propose a mandatory access control model using role-based policies to separate long-term invariants from day-to-day configurations. As a result, the former would not be accidentally violated when configuring the latter, as this is a significant source of BGP misconfigurations today. Then, for SDN, I propose to add access control to controllers so that virtual controllers and applications that run within them cannot have unlimited access to the network infrastructure, as they do currently.

Adding attribute-based access control makes the system much less fragile while it still retains the essential flexibility provided by SDN. Lastly, I propose a hierarchical architecture which, with SDN, can isolate security compromises even when some devices are physically compromised. This is achieved by using access control to both enable network access and deny unexpected connections.
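The separation of long-term invariants from day-to-day configuration described above can be sketched with a minimal role-based check (a hypothetical illustration only, not the thesis's actual model; the role and section names are invented):

```python
# Minimal sketch of role-based mandatory access control separating
# long-term invariants from day-to-day configuration, in the spirit of
# the BGP proposal above. Roles and section names are invented.

# Policy: each role may only write the configuration sections listed for it.
POLICY = {
    "network-architect": {"invariants", "daily-config"},
    "operator": {"daily-config"},
}

def may_write(role: str, section: str) -> bool:
    """Return True iff `role` is allowed to modify `section`."""
    return section in POLICY.get(role, set())

# An operator cannot touch invariants, so day-to-day edits cannot
# accidentally violate them.
print(may_write("operator", "daily-config"))   # True
print(may_write("operator", "invariants"))     # False
```

Because the invariants section is writable only by a distinct role, routine configuration changes are structurally incapable of violating it, which is the point of the mandatory (rather than discretionary) model.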

UCAM-CL-TR-899

Douwe Kiela:

Deep embodiment: grounding semantics in perceptual modalities
February 2017, 128 pages, PDF
PhD thesis (Darwin College, July 2016)

Abstract: Multi-modal distributional semantic models address the fact that text-based semantic models, which represent word meanings as a distribution over other words, suffer from the grounding problem. This thesis advances the field of multi-modal semantics in two directions. First, it shows that transferred convolutional neural network representations outperform the traditional bag-of-visual-words method for obtaining visual features. It is then shown that these representations may be applied successfully to various natural language processing tasks. Second, it performs the first ever experiments with grounding in the non-visual modalities of auditory and olfactory perception using raw data. Deep learning, a natural fit for deriving grounded representations, is used to obtain the highest-quality representations compared to more traditional approaches. Multi-modal representation learning leads to improvements over language-only models in a variety of tasks. If we want to move towards human-level artificial intelligence, we will need to build multi-modal models that represent the full complexity of human meaning, including its grounding in our various perceptual modalities.

UCAM-CL-TR-900

Valentin Dalibard:

A framework to build bespoke auto-tuners with structured Bayesian optimisation
February 2017, 182 pages, PDF
PhD thesis (St. John’s College, January 2017)

Abstract: Due to their complexity, modern computer systems expose many configuration parameters which users must manually tune to maximise the performance of their applications. To relieve users of this burden, auto-tuning has emerged as an alternative in which a black-box optimiser iteratively evaluates configurations to find efficient ones. A popular auto-tuning technique is Bayesian optimisation, which uses the results to incrementally build a probabilistic model of the impact of the parameters on performance. This allows the optimisation to quickly focus on efficient regions of the configuration space. Unfortunately, for many computer systems, either the configuration space is too large to develop a good model, or the time to evaluate performance is too long for it to be executed many times.
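As a toy illustration of the Bayesian-optimisation loop described above (not BOAT or SBO themselves; the objective, kernel and all settings are invented), the following sketch fits a Gaussian-process surrogate and chooses each next configuration with a lower-confidence-bound rule:

```python
# Minimal Bayesian-optimisation sketch: a Gaussian-process surrogate over a
# 1-D configuration space, with a lower-confidence-bound acquisition rule.
# The "expensive" objective below is a stand-in for a real benchmark.
import numpy as np

def objective(x):
    # Pretend this is a costly system benchmark (lower is better).
    return (x - 0.3) ** 2 + 0.05 * np.sin(20 * x)

def gp_posterior(X, y, Xs, ls=0.15, noise=1e-6):
    """Posterior mean/std of a GP with an RBF kernel at test points Xs."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)  # prior var = 1
    return mu, np.sqrt(np.maximum(var, 0.0))

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 3)           # a few initial random configurations
y = objective(X)
grid = np.linspace(0, 1, 200)      # candidate configurations

for _ in range(10):
    mu, sd = gp_posterior(X, y, grid)
    nxt = grid[np.argmin(mu - 1.5 * sd)]   # trade off mean vs. uncertainty
    X = np.append(X, nxt)
    y = np.append(y, objective(nxt))

best = X[np.argmin(y)]
print(f"best configuration found: x = {best:.3f}")
```

Each iteration spends one expensive evaluation where the surrogate predicts low cost or high uncertainty; the SBO approach described in this abstract replaces the generic surrogate with a bespoke, engineer-supplied probabilistic model of the system.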

In this dissertation, I argue that by extracting a small amount of domain-specific knowledge about a system, it is possible to build a bespoke auto-tuner with significantly better performance than its off-the-shelf counterparts. This could be performed, for example, by a system engineer who has a good understanding of the underlying system behaviour and wants to provide performance portability. This dissertation presents BOAT, a framework to build BespOke Auto-Tuners. BOAT offers a novel set of abstractions designed to make the exposition of domain knowledge easy and intuitive.

First, I introduce Structured Bayesian Optimisation (SBO), an extension of the Bayesian optimisation algorithm. SBO can leverage a bespoke probabilistic model of the system’s behaviour, provided by the system engineer, to rapidly converge to high-performance configurations. The model can benefit from observing many runtime measurements per evaluation of the system, akin to the use of profilers.

Second, I present Probabilistic-C++, a lightweight, high-performance probabilistic programming library. It allows users to declare a probabilistic model of their system’s behaviour and expose it to an SBO. Probabilistic programming is a recent tool from the machine learning community making the declaration of structured probabilistic models intuitive.

Third, I present a new optimisation scheduling abstraction which offers a structured way to express optimisations which themselves execute other optimisations. For example, this is useful to express Bayesian optimisations, which at each iteration execute a numerical optimisation. Furthermore, it allows users to easily implement decompositions which exploit the loose coupling of subsets of the configuration parameters to optimise them almost independently.

UCAM-CL-TR-901

Chao Gao:

Signal maps for smartphone localisation
February 2017, 127 pages, PDF
PhD thesis (King’s College, August 2016)

Abstract: Indoor positioning has been an active research area for 20 years. Systems based on dedicated infrastructure such as ultrasound or ultra-wideband (UWB) radio can provide centimetre accuracy, but they are generally prohibitively expensive to build, deploy and maintain. Today, signal fingerprinting-based indoor positioning techniques, which use existing wireless infrastructure, are arguably the most prevalent. A fingerprinting-based positioning system matches the current signal observations (fingerprints) at a device to position it on a pre-defined fingerprint map. The map is created via some form of survey. However, a major deterrent of these systems is the initial creation and subsequent maintenance of the signal maps. The commonly used map-building method is the so-called manual survey, during which a surveyor visits each point on a regular grid and measures the signal fingerprints there. This traditional method is laborious and not considered scalable. An emerging alternative to the manual survey is the path survey, in which a surveyor moves continuously through the environment and signal measurements are taken by the surveying device along the path. A path survey is intuitively better than a manual survey, at least in terms of speed, but path surveys have not been well studied yet.
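The fingerprint-matching step described above can be sketched as nearest-neighbour search in signal space (the access-point names and RSSI values below are invented for illustration; real systems use richer matching):

```python
# Minimal sketch of fingerprint-based positioning: nearest-neighbour
# matching of a Wi-Fi RSSI observation against a surveyed fingerprint map.
import math

# Survey map: grid position -> RSSI (dBm) per access point.
FINGERPRINT_MAP = {
    (0, 0): {"ap1": -40, "ap2": -70, "ap3": -80},
    (0, 5): {"ap1": -55, "ap2": -50, "ap3": -75},
    (5, 0): {"ap1": -60, "ap2": -72, "ap3": -48},
    (5, 5): {"ap1": -68, "ap2": -55, "ap3": -52},
}

def distance(obs, ref):
    """Euclidean distance in signal space over the APs both have seen."""
    shared = obs.keys() & ref.keys()
    return math.sqrt(sum((obs[ap] - ref[ap]) ** 2 for ap in shared))

def locate(obs):
    """Return the surveyed grid point whose fingerprint best matches obs."""
    return min(FINGERPRINT_MAP, key=lambda p: distance(obs, FINGERPRINT_MAP[p]))

# A reading taken near (0, 0) matches that grid point.
print(locate({"ap1": -42, "ap2": -69, "ap3": -79}))  # -> (0, 0)
```

The survey method (manual or path) only determines how the entries of FINGERPRINT_MAP are collected; the positioning step itself is unchanged, which is why survey cost dominates deployment.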

This thesis assesses the path survey quantitatively and rigorously, and demonstrates that a path survey can approach a manual survey in terms of accuracy if certain guidelines are followed. Automated survey systems have been proposed, with a commodity smartphone as the only survey device required. The proposed systems achieve sub-metre accuracy in recovering the survey trajectory both with and without environmental information (floor plans), and have been found to outperform the state of the art in terms of robustness and scalability.

This thesis concludes that the path survey can be streamlined by the proposed systems to replace the laborious manual survey. The proposed systems can promote the deployment of indoor positioning systems in large-scale and complicated environments, especially in dynamic environments where frequent re-surveying is needed.

UCAM-CL-TR-902

Raoul-Gabriel Urma:

Programming language evolution
February 2017, 129 pages, PDF
PhD thesis (Hughes Hall, September 2015)

Abstract: Programming languages are the way developers communicate with computers, just like natural languages let us communicate with one another. In both cases multiple languages have evolved. However, the programming language ecosystem changes at a much higher rate than natural languages. In fact, programming languages need to constantly evolve in response to user needs, hardware advances and research developments, or they are otherwise displaced by newcomers. As a result, existing code written by developers may stop working with a newer language version. Consequently, developers need to “search” (analyse) and “replace” (refactor) code in their code bases to support new language versions.

Traditionally, tools have focused on the replace aspect (refactoring) to support developers evolving their code bases. This dissertation argues that developers also need machine support focused on the search aspect.

This dissertation starts by characterising factors driving programming language evolution based on external versus internal forces. Next, it introduces a classification of changes introduced by language designers that affect developers. It then contributes three techniques to support developers in analysing their code bases.

First, we show a source code query system, based on graphs, that expresses code queries at a mixture of syntax-tree, type, control-flow-graph and data-flow levels. We demonstrate how this system can support developers and language designers in locating various code patterns relevant in evolution.

Second, we design an optional run-time type system for Python that lets developers manually specify contracts to identify semantic incompatibilities between Python 2 and Python 3.
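The flavour of such an optional run-time contract can be sketched as follows (a hypothetical illustration only, not the thesis's actual system; the decorator and function names are invented). One well-known Python 2/3 incompatibility is the str/bytes split, which a declared return-type contract can catch at run time:

```python
# Hypothetical sketch of an optional run-time contract: a decorator that
# checks a declared return type, catching e.g. code that returns str where
# bytes is required (a common Python 2 -> 3 breakage).
def returns(expected):
    def wrap(fn):
        def checked(*args, **kwargs):
            result = fn(*args, **kwargs)
            if not isinstance(result, expected):
                raise TypeError(
                    f"{fn.__name__} returned {type(result).__name__}, "
                    f"contract requires {expected.__name__}")
            return result
        return checked
    return wrap

@returns(bytes)
def read_payload():
    return "not bytes"       # str under Python 3: violates the contract

try:
    read_payload()
except TypeError as e:
    print("contract violation:", e)
```

The contract is optional in the sense that undecorated functions are unaffected; only annotated call sites pay the run-time check.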

Third, recognising that existing code bases do not have such contracts, we describe a dynamic analysis to automatically generate them.

UCAM-CL-TR-903

Jannis Bulian:

Parameterized complexity of distances to sparse graph classes
February 2017, 133 pages, PDF
PhD thesis (Clare College, February 2016)

Abstract: This dissertation offers contributions to the area of parameterized complexity theory, which studies the complexity of computational problems in a multivariate framework. We demonstrate that for graph problems, many popular parameters can be understood as distances that measure how far a graph is from belonging to a class of sparse graphs. We introduce the term distance parameter for such parameters and demonstrate their value in several ways.

The parameter tree-depth is uncovered as a distance parameter, and we establish several fixed-parameter tractability and hardness results for problems parameterized by tree-depth.

We introduce a new distance parameter, elimination distance to a class C. The parameter measures distance in a way that naturally generalises the notion of vertex deletion, by allowing deletions from every component in each deletion step. We show that tree-depth is a special case of this new parameter, introduce several characterisations of elimination distance, and discuss applications. In particular we demonstrate that Graph Isomorphism is FPT parameterized by elimination distance to bounded degree, extending known results. We also show that checking whether a given graph has elimination distance k to a minor-closed class C is FPT, and, moreover, that if we are given a finite set of minors characterising C, there is an explicit FPT algorithm for this.
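The recursive definition of elimination distance can be rendered directly in code (a brute-force, exponential-time sketch for small graphs, illustrating only the definition, not the FPT algorithms of the dissertation; here C is the class of graphs of maximum degree at most d):

```python
# Brute-force elimination distance to the class of max-degree-<=d graphs.
# Graphs are adjacency dicts: vertex -> set of neighbours (undirected).
def components(g):
    """Split g into its connected components (as adjacency dicts)."""
    seen, comps = set(), []
    for s in g:
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(g[v] - comp)
        seen |= comp
        comps.append({v: g[v] & comp for v in comp})
    return comps

def delete(g, v):
    return {u: nbrs - {v} for u, nbrs in g.items() if u != v}

def elim_dist(g, d):
    """ed(G) = 0 if G in C; max over components if disconnected;
    1 + min over single-vertex deletions if connected and not in C."""
    comps = components(g)
    if len(comps) > 1:
        return max(elim_dist(c, d) for c in comps)
    if all(len(nbrs) <= d for nbrs in g.values()):
        return 0
    return 1 + min(elim_dist(delete(g, v), d) for v in g)

# The star K_{1,3} has a degree-3 centre; deleting it leaves isolated
# vertices, so its elimination distance to max-degree-1 graphs is 1.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(elim_dist(star, 1))  # -> 1
```

Note how the disconnected case takes a maximum rather than a sum: one vertex may be deleted from every component in the same step, which is exactly how the parameter generalises plain vertex deletion.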

We establish an algorithmic meta-theorem for several distance parameters, by showing that all graph problems that are both slicewise nowhere dense and slicewise first-order definable are FPT.

Lastly, several fixed-parameter tractability results are established for two graph problems that have natural distance parameterizations.

UCAM-CL-TR-904

Zheng Yuan:

Grammatical error correction in non-native English
March 2017, 145 pages, PDF
PhD thesis (St. Edmund’s College, September 2016)

Abstract: Grammatical error correction (GEC) is the task of automatically correcting grammatical errors in written text. Previous research has mainly focussed on individual error types, and current commercial proofreading tools only target limited error types. As sentences produced by learners may contain multiple errors of different types, a practical error correction system should be able to detect and correct all errors.

In this thesis, we investigate GEC for learners of English as a Second Language (ESL). Specifically, we treat GEC as a translation task from incorrect into correct English, explore new models for developing end-to-end GEC systems for all error types, study system performance for each error type, and examine model generalisation to different corpora. First, we apply Statistical Machine Translation (SMT) to GEC and prove that it can form the basis of a competitive all-errors GEC system. We implement an SMT-based GEC system which contributes to our winning system submitted to a shared task in 2014. Next, we propose a ranking model to re-rank correction candidates generated by an SMT-based GEC system. This model introduces new linguistic information and we show that it improves correction quality. Finally, we present the first study using Neural Machine Translation (NMT) for GEC. We demonstrate that NMT can be successfully applied to GEC and helps capture new errors missed by an SMT-based GEC system.
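The candidate re-ranking idea above can be sketched with a toy scorer (entirely invented data; the thesis's ranking model uses rich linguistic features, not a bigram count):

```python
# Toy sketch of re-ranking correction candidates: score each candidate by
# how many of its bigrams are attested in a (tiny) reference corpus, and
# pick the best-scoring one.
from collections import Counter

corpus = "he goes to school . she goes to work . they go to school .".split()
bigrams = Counter(zip(corpus, corpus[1:]))

def score(sentence):
    """Number of corpus-attested bigrams in the candidate (higher = better)."""
    words = sentence.split()
    return sum(bigrams[b] > 0 for b in zip(words, words[1:]))

candidates = ["he go to school", "he goes to school", "he goes at school"]
best = max(candidates, key=score)
print(best)  # -> "he goes to school"
```

The SMT decoder proposes the candidate list; the re-ranker then overrides the decoder's own ordering wherever its extra evidence (here, the toy bigram counts) prefers a different candidate.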

While we focus on GEC for English, the methods presented in this thesis can be easily applied to any language.

UCAM-CL-TR-905

William Sonnex:

Fixed point promotion: taking the induction out of automated induction
March 2017, 170 pages, PDF
PhD thesis (Trinity College, September 2015)

Abstract: This thesis describes the implementation of Elea: the first automated theorem prover for properties of observational approximation between terms in a functional programming language. A term approximates another if no program context can tell them apart, except that the approximating term can be less defined than the approximated term. Elea can prove terms equivalent by proving approximation in both directions.

The language Elea proves approximations over is pure, call-by-name, and has non-strict data types: a subset of the Haskell language. In a test set of established properties from the literature, Elea performs comparably well to the current state of the art in automated equivalence proof, but where these provers assume all terms are total, Elea does not.

Elea uses a novel method to prove approximations, fixed-point promotion, a method which does not suffer from some of the weaknesses of cyclic proof techniques such as induction or coinduction. Fixed-point promotion utilises an unfold-fold style term rewriting system, similar to supercompilation, which uses multiple novel term rewriting steps in order to simulate the lemma discovery techniques of automated cyclic provers.

Elea is proven sound using denotational semantics, including a novel method for verifying an unfold-fold style rewrite, a method I have called truncation fusion.

UCAM-CL-TR-906

Tomas Petricek:

Context-aware programming languages
March 2017, 218 pages, PDF
PhD thesis (Clare Hall, March 2017)

Abstract: The development of programming languages needs to reflect important changes in the way programs execute. In recent years, this has included the development of parallel programming models (in reaction to the multi-core revolution) and improvements in data access technologies. This thesis is a response to another such revolution: the diversification of devices and systems where programs run.

The key point made by this thesis is the realization that an execution environment, or context, is fundamental for writing modern applications, and that programming languages should provide abstractions for programming with context and verifying how it is accessed.

We identify a number of program properties that were not connected before, but which all model some notion of context. Our examples include tracking different execution platforms (and their versions) in cross-platform development, and resources available in different execution environments (e.g. a GPS sensor on a phone and a database on the server), but also more traditional notions such as variable usage (e.g. in liveness analysis and linear logics) or past values in stream-based dataflow programming. Our first contribution is the discovery of the connection between the above examples and their novel presentation in the form of calculi (coeffect systems). The presented type systems and formal semantics highlight the relationship between different notions of context.

Our second contribution is the definition of two unified coeffect calculi that capture the common structure of the examples. In particular, our flat coeffect calculus models languages with contextual properties of the execution environment, and our structural coeffect calculus models languages where the contextual properties are attached to variable usage. We define the semantics of the calculi in terms of the category-theoretical structure of an indexed comonad (based on dualisation of the well-known monad structure), use it to define operational semantics, and prove type safety of the calculi.

Our third contribution is a novel presentation of our work in the form of a web-based interactive essay. This provides a simple implementation of three context-aware programming languages and lets the reader write and run simple context-aware programs, but also explore the theory behind the implementation, including the typing derivation and semantics.

UCAM-CL-TR-907

Robert N. M. Watson, Peter G. Neumann, Jonathan Woodruff, Michael Roe, Jonathan Anderson, John Baldwin, David Chisnall, Brooks Davis, Alexandre Joannou, Ben Laurie, Simon W. Moore, Steven J. Murdoch, Robert Norton, Stacey Son, Hongyan Xia:

Capability Hardware Enhanced RISC Instructions: CHERI Instruction-Set Architecture (Version 6)
April 2017, 307 pages, PDF

Abstract: This technical report describes CHERI ISAv6, the sixth version of the Capability Hardware Enhanced RISC Instructions (CHERI) Instruction-Set Architecture (ISA) being developed by SRI International and the University of Cambridge. This design captures seven years of research, development, experimentation, refinement, formal analysis, and validation through hardware and software implementation. CHERI ISAv6 is a substantial enhancement to prior ISA versions: it introduces support for kernel-mode compartmentalization, jump-based rather than exception-based domain transition, architecture-abstracted and efficient tag restoration, and more efficient generated code. A new chapter addresses potential applications of the CHERI model to the RISC-V and x86-64 ISAs, previously described relative only to the 64-bit MIPS ISA. CHERI ISAv6 better explains our design rationale and research methodology.

CHERI is a hybrid capability-system architecture that adds new capability-system primitives to a commodity 64-bit RISC ISA, enabling software to efficiently implement fine-grained memory protection and scalable software compartmentalization. Design goals have included incremental adoptability within current ISAs and software stacks, low performance overhead for memory protection, significant performance improvements for software compartmentalization, formal grounding, and programmer-friendly underpinnings. Throughout, we have focused on providing strong and efficient architectural foundations for the principles of least privilege and intentional use in the execution of software at multiple levels of abstraction, preventing and mitigating vulnerabilities.

The CHERI system architecture purposefully addresses known performance and robustness gaps in commodity ISAs that hinder the adoption of more secure programming models centered around the principle of least privilege. To this end, CHERI blends traditional paged virtual memory with an in-address-space capability model that includes capability registers, capability instructions, and tagged memory. CHERI builds on the C-language fat-pointer literature: its capabilities describe fine-grained regions of memory and can be substituted for data or code pointers in generated code, protecting data and also improving control-flow robustness. Strong capability integrity and monotonicity properties allow the CHERI model to express a variety of protection properties, from enforcing valid C-language pointer provenance and bounds checking to implementing the isolation and controlled communication structures required for software compartmentalization.
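The fat-pointer bounds and monotonicity properties mentioned above can be loosely illustrated in software (a sketch only; CHERI enforces these properties in hardware via tagged memory and capability registers, which a class like this cannot model):

```python
# Illustrative sketch (not CHERI itself): a software "capability" with
# base, length, bounds-checked loads, and monotonically shrinkable bounds.
class Capability:
    def __init__(self, mem, base, length):
        self._mem, self.base, self.length = mem, base, length

    def load(self, offset):
        """Bounds-checked read at base + offset."""
        if not 0 <= offset < self.length:
            raise MemoryError("capability bounds violation")
        return self._mem[self.base + offset]

    def shrink(self, new_base, new_length):
        """Derive a new capability; bounds may only shrink (monotonicity)."""
        if new_base < self.base or new_base + new_length > self.base + self.length:
            raise ValueError("cannot widen capability bounds")
        return Capability(self._mem, new_base, new_length)

mem = list(range(16))
cap = Capability(mem, base=4, length=8)   # may touch mem[4..11] only
sub = cap.shrink(6, 2)                    # may touch mem[6..7] only
print(sub.load(0))                        # -> 6
try:
    sub.load(2)                           # one past the end: trapped
except MemoryError as e:
    print("trapped:", e)
```

Because a derived capability can never regain authority its parent lacked, passing `sub` to a library confines that library to two elements of the buffer, which is the essence of the compartmentalization story above.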

CHERI’s hybrid capability-system approach, inspired by the Capsicum security model, allows incremental adoption of capability-oriented design: software implementations that are more robust and resilient can be deployed where they are most needed, while leaving less critical software largely unmodified, but nevertheless suitably constrained to be incapable of having adverse effects. Potential deployment scenarios include low-level software Trusted Computing Bases (TCBs) such as separation kernels, hypervisors, and operating-system kernels, as well as userspace TCBs such as language runtimes and web browsers. Likewise, we see early-use scenarios (such as data compression, protocol parsing, and image processing) that relate to particularly high-risk software libraries, which are concentrations of both complex and historically vulnerability-prone code exposed to untrustworthy data sources, while leaving containing applications unchanged.

UCAM-CL-TR-908

Raphael L. Proust:

ASAP: As Static As Possible memory management
July 2017, 147 pages, PDF
PhD thesis (Magdalene College, July 2016)

Abstract: Today, there are various ways to manage the memory of computer programs: garbage collectors of all kinds, reference counters, regions, linear types – each with benefits and drawbacks, each fit for specific settings, each appropriate to different problems, each with their own trade-offs.

Despite the plethora of techniques available, system programming (device drivers, networking libraries, cryptography applications, etc.) is still mostly done in C, even though memory management in C is notoriously unsafe. As a result, serious bugs are continuously discovered in system software.

In this dissertation, we study memory management strategies with a view to fitness for system programming.

First, we establish a framework to study memory management strategies. Often perceived as distinct categories, we argue that memory management approaches are actually part of a single design space. To this end, we establish a precise and powerful lexicon to describe memory management strategies of any kind. Using our newly established vocabulary, we further argue that this design space has not been exhaustively explored. We argue that one of the unexplored portions of this space, the static-automatic gap, contributes to the persistence of C in system programming.

Second, we develop ASAP: a new memory management technique that fits in the static-automatic gap. ASAP is fully automatic (not even annotations are required) and makes heavy use of static analysis. At compile time it inserts, in the original program, code that deallocates memory blocks as they become useless. We then show how ASAP interacts with various advanced language features. Specifically, we extend ASAP to support polymorphism and mutability.

Third, we compare ASAP with existing approaches. One of the points of comparison we use is behavioural suitability for system programming. We also explore how the ideas from ASAP can be combined with other memory management strategies. We then show how ASAP handles programs satisfying the linear or region constraints. Finally, we explore the insights gained whilst developing and studying ASAP.

UCAM-CL-TR-909

Laurent Simon:

Exploring new attack vectors for the exploitation of smartphones
July 2017, 167 pages, PDF
PhD thesis (Homerton College, April 2016)

Abstract: Smartphones have evolved from simple candy-bar devices into powerful miniature computing platforms. Today’s smartphones are complex multi-tenant platforms: users, OS providers, manufacturers, carriers, and app developers have to co-exist on a single device. As with other computing platforms, smartphone security research has dedicated a lot of effort, first, to the detection and prevention of ill-intentioned software; second, to the detection and mitigation of operating system vulnerabilities; and third, to the detection and mitigation of vulnerabilities in applications.

In this thesis, I take a different approach. I explore and study attack vectors that are specific to smartphones; that is, attack vectors that do not exist on other computing platforms because they are the result of these phones’ intrinsic characteristics. One such characteristic is the sheer number of sensors and peripherals, such as an accelerometer, a gyroscope and a built-in camera. Their number keeps increasing with new usage scenarios, e.g. for health or navigation. So I show how to abuse the camera and microphone to infer a smartphone’s motion during user input. I then correlate motion characteristics with the keyboard digits touched by a user so as to infer PINs. This can work even if the input is protected through a Trusted Execution Environment (TEE), the industry’s preferred answer to the trusted path problem.

Another characteristic is their form factor, such as their small touch screen. New input methods have been devised to make user input easier, such as “gesture typing”. So I study a new side channel that exploits hardware and software interrupt counters to infer what users type using this widely adopted input method.

Another inherent trait is that users carry smartphones everywhere. This increases the risk of theft or loss. In fact, in 2013 alone, 3.1M devices were stolen in the USA, and 120,000 in London. So I study the effectiveness of anti-theft software for the Android platform, and demonstrate a wide variety of vulnerabilities.

Yet another characteristic of the smartphone ecosystem is the pace at which new devices are released: users tend to replace their phone about every 2 years, compared to 4.5 years for their personal computers. Already, for 60% of users today, the purchase of a new smartphone is partly funded by selling the previous one. This can have privacy implications if the previous owner’s personal data is not properly erased. So I study the effectiveness of the built-in sanitisation features in Android smartphones, lifting the curtains on their problems and their root causes.

UCAM-CL-TR-910

William Denman:

Automated verification of continuous and hybrid dynamical systems
July 2017, 178 pages, PDF
PhD thesis (Clare College, September 2014)

Abstract: The standard method used for verifying the behaviour of a dynamical system is simulation. But simulation can check only a finite number of operating conditions and system parameters, leading to a potentially incomplete verification result. This dissertation presents several automated theorem proving based methods that can, in contrast to simulation, completely guarantee the safety of a dynamical system model.

To completely verify a purely continuous dynamical system requires proving a universally quantified first-order conjecture, which represents all possible trajectories of the system. Such a closed-form solution may contain transcendental functions, rendering the problem undecidable in the general case. The automated theorem prover MetiTarski can be used to solve such a problem by reducing it to one over the real closed fields. The main issue is the doubly exponential complexity of the back-end decision procedures that it depends on. This dissertation proposes several techniques that make the required conjectures easier for MetiTarski to prove. The techniques are shown to be effective at reducing the amount of time required to prove high-dimensional problems and are further demonstrated on a flight collision avoidance case study.

For hybrid systems, which contain both continuous and discrete behaviours, a different approach is proposed. In this case, qualitative reasoning is used to abstract the continuous state space into a set of discrete cells. Then, standard discrete-state formal verification tools are used to verify properties over the abstraction. MetiTarski is employed to determine the feasibility of the cells and the transitions between them. As these checks are reduced to proving inequalities over the theory of the reals, this facilitates the analysis of non-polynomial hybrid systems that contain transcendental functions in their vector fields, invariants and guards. This qualitative abstraction framework has been implemented in the QUANTUM tool and is demonstrated on several hybrid system benchmark problems.
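The flavour of a per-cell feasibility check can be sketched with naive interval arithmetic (an illustration only; MetiTarski decides far richer inequalities, including transcendental ones, and the formula here is invented):

```python
# Toy sketch of checking whether an inequality can hold inside a
# rectangular cell, using naive interval arithmetic. Interval bounds
# overapproximate, so "no" answers are definite while "yes" answers may
# be spurious, matching the conservative cell abstraction described above.
def interval_mul(a, b):
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def interval_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def cell_may_satisfy(x, y):
    """Can x*y + x > 2 hold somewhere in the cell x in [x0,x1], y in [y0,y1]?"""
    lo, hi = interval_add(interval_mul(x, y), x)
    return hi > 2

print(cell_may_satisfy((0, 1), (0, 1)))  # False: max is 1*1 + 1 = 2
print(cell_may_satisfy((1, 2), (1, 2)))  # True: e.g. 2*2 + 2 = 6 > 2
```

A cell whose check returns False can be pruned from the discrete abstraction outright; only cells (and transitions) that may be feasible need to be passed to the discrete-state verifier.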

UCAM-CL-TR-911

Dominic Orchard, Mistral Contrastin, Matthew Danish, Andrew Rice:

Proofs for ‘Verifying Spatial Properties of Array Computations’
September 2017, 8 pages, PDF

Abstract: This technical report provides accompanying proofs for the paper: Verifying Spatial Properties of Array Computations. We show three properties of the lattice model of the stencil specification language: 1) that it is sound with respect to the equational theory of region specifications; 2) that it is sound with respect to the theory of region approximation; 3) that the inference algorithm is sound. We further derive useful identities on the specification language and properties of Union Normal Form, the data structure used to implement the model. Core definitions from the paper are restated here to make the overall report more self-contained.

UCAM-CL-TR-913

Matic Horvat:

Hierarchical statistical semantic translation and realization
October 2017, 215 pages, PDF
PhD thesis (Churchill College, March 2017)

Abstract: Statistical machine translation (SMT) approaches extract translation knowledge automatically from parallel corpora. They additionally take advantage of monolingual text for target-side language modelling. Syntax-based SMT approaches also incorporate knowledge of source and/or target syntax by taking advantage of monolingual grammars induced from treebanks, and semantics-based SMT approaches use knowledge of source and/or target semantics in various forms. However, there has been very little research on incorporating the considerable monolingual knowledge encoded in deep, hand-built grammars into statistical machine translation. Since deep grammars can produce semantic representations, such an approach could be used for realization as well as MT.

In this thesis I present a hybrid approach combining some of the knowledge in a deep hand-built grammar, the English Resource Grammar (ERG), with a statistical machine translation approach. The ERG is used to parse the source sentences to obtain Dependency Minimal Recursion Semantics (DMRS) representations. DMRS representations are subsequently transformed to a form more appropriate for SMT, giving a parallel corpus with transformed DMRS on the source side and aligned strings on the target side. The SMT approach is based on hierarchical phrase-based translation (Hiero). I adapt the Hiero synchronous context-free grammar (SCFG) to comprise graph-to-string rules. The DMRS graph-to-string SCFG is extracted from the parallel corpus and used in decoding to transform an input DMRS graph into a target string either for machine translation or for realization.

I demonstrate the potential of the approach for large-scale machine translation by evaluating it on the WMT15 English-German translation task. Although the approach does not improve on a state-of-the-art Hiero implementation, a manual investigation reveals some strengths and future directions for improvement. In addition to machine translation, I apply the approach to the MRS realization task. The approach produces realizations of high quality, but its main strength lies in its robustness. Unlike the established MRS realization approach using the ERG, the approach proposed in this thesis is able to realize representations that do not correspond perfectly to ERG semantic output, which will naturally occur in practical realization tasks. I demonstrate this in three contexts, by realizing representations derived from sentence compression, from robust parsing, and from the transfer phase of an existing MT system.

In summary, the main contributions of this thesis are a novel architecture combining a statistical machine translation approach with a deep hand-built grammar, and a demonstration of its practical usefulness as a large-scale machine translation system and as a robust realization alternative to the established MRS realization approach.

UCAM-CL-TR-914

Diana Andreea Popescu, Noa Zilberman, Andrew W. Moore:

Characterizing the impact of network latency on cloud-based applications’ performance
November 2017, 20 pages, PDF

Abstract: Businesses and individuals run increasing numbers of applications in the cloud. The performance of an application running in the cloud depends on the data center conditions and upon the resources committed to an application. Small network delays may lead to a significant performance degradation, which affects both the user’s cost and the service provider’s resource usage, power consumption and data center efficiency. In this work, we quantify the effect of network latency on several typical cloud workloads, varying in complexity and use cases. Our results show that different applications are affected by network latency to differing amounts. These insights into the effect of network latency on different applications have ramifications for workload placement and physical host sharing when trying to reach performance targets.

UCAM-CL-TR-915

Andrew Caines, Diane Nicholls, Paula Buttery:

Annotating errors and disfluencies in transcriptions of speech
December 2017, 10 pages, PDF

Abstract: This document presents our guidelines for the annotation of errors and disfluencies in transcriptions of speech. There is a well-established precedent for annotating errors in written texts but the same is not true of speech transcriptions. We describe our coding scheme, discuss examples and difficult cases, and introduce new codes to deal with features characteristic of speech.

UCAM-CL-TR-916

Robert N. M. Watson, Jonathan Woodruff, Michael Roe, Simon W. Moore, Peter G. Neumann:

Capability Hardware Enhanced RISC Instructions (CHERI): Notes on the Meltdown and Spectre Attacks
February 2018, 16 pages, PDF

Abstract: In this report, we consider the potential impact of the recently announced Meltdown and Spectre microarchitectural side-channel attacks, arising out of superscalar (out-of-order) execution, on the Capability Hardware Enhanced RISC Instructions (CHERI) computer architecture. We observe that CHERI’s in-hardware permissions and bounds checking may be an effective form of mitigation for one variant of these attacks, in which speculated instructions can bypass software bounds checking. As with MMU-based techniques, CHERI remains vulnerable to side-channel leakage arising from speculative execution across compartment boundaries, leading us to propose a software-managed compartment ID to mitigate these vulnerabilities for other variants as well.

UCAM-CL-TR-917

Matko Botincan:

Formal verification-driven parallelisation synthesis
March 2018, 163 pages, PDF
PhD thesis (Trinity College, September 2013)

Abstract: Concurrency is often an optimisation, rather than intrinsic to the functional behaviour of a program, i.e., a concurrent program is often intended to achieve the same effect as a simpler sequential counterpart, just faster. Error-free concurrent programming remains a tricky problem, beyond the capabilities of most programmers. Consequently, an attractive alternative to manually developing a concurrent program is to automatically synthesise one.

This dissertation presents two novel formal verification-based methods for safely transforming a sequential program into a concurrent one. The first method, an instance of proof-directed synthesis, takes as input a sequential program and its safety proof, as well as annotations on where to parallelise, and produces a correctly-synchronised parallelised program, along with a proof of that program. The method uses the sequential proof to guide the insertion of synchronisation barriers to ensure that the parallelised program has the same behaviour as the original sequential version. The sequential proof, written in separation logic, need only represent shape properties, meaning we can parallelise complex heap-manipulating programs without verifying every aspect of their behaviour.

The second method proposes specification-directed synthesis: given a sequential program, we extract a rich, stateful specification compactly summarising program behaviour, and use that specification for parallelisation. At the heart of the method is a learning algorithm which combines dynamic and static analysis. In particular, dynamic symbolic execution and the computational learning technique grammar induction are used to conjecture input-output specifications, and counterexample-guided abstraction refinement to confirm or refute the equivalence between the conjectured specification and the original program. Once equivalence checking succeeds, from the inferred specifications we synthesise code that executes speculatively in parallel, enabling automated parallelisation of irregular loops that are not necessarily polyhedral, disjoint or with a static pipeline structure.

UCAM-CL-TR-918

Gregory Y. Tsipenyuk:

Evaluation of decentralized email architecture and social network analysis based on email attachment sharing
March 2018, 153 pages, PDF
PhD thesis (Sidney Sussex College, August 2017)

Abstract: Present day e-mail is provided by centralized services running in the cloud. The services transparently connect users behind middleboxes and provide backup, redundancy, and high availability at the expense of user privacy. In present day mobile environments, users can access and modify e-mail from multiple devices, with updates reconciled on the central server. Prioritizing updates is difficult and may be undesirable. Moreover, legacy email protocols do not provide optimal e-mail synchronization and access. The recent phenomenon of the Internet of Things (IoT) will see the number of interconnected devices grow to 27 billion by 2021. In the first part of my dissertation I propose a decentralized email architecture which takes advantage of a user’s IoT devices to maintain a complete email history. This addresses the e-mail reconciliation issue and places data under user control. I replace legacy email protocols with a synchronization protocol to achieve eventual consistency of email and optimize bandwidth and energy usage. The architecture is evaluated on a Raspberry Pi computer.

There is an extensive body of research on Social Network Analysis (SNA) based on email archives. Typically, the analyzed network reflects either communication between users or a relationship between the e-mail and the information found in the e-mail’s header and the body. This approach discards either all or some email attachments that cannot be converted to text; for instance, images. Yet attachments may use up to 90% of an e-mail archive’s size. In the second part of my dissertation I suggest extracting the network from e-mail attachments shared between users. I hypothesize that the network extracted from shared e-mail attachments might provide more insight into the social structure of the email archive. I evaluate communication and shared e-mail attachment networks by analyzing common centrality measures and classification and clustering algorithms. I further demonstrate how the analysis of the shared attachments network can be used to optimize the proposed decentralized e-mail architecture.

UCAM-CL-TR-919

Nada Amin, Francois-Rene Rideau:

Proceedings of the 2017 Scheme and Functional Programming Workshop
March 2018, 50 pages, PDF
Oxford, UK – 3 September 2017

Abstract: This report aggregates the papers presented at the eighteenth annual Scheme and Functional Programming Workshop, hosted on September 3rd, 2017 in Oxford, UK and co-located with the twenty-second International Conference on Functional Programming.

The Scheme and Functional Programming Workshop is held every year to provide an opportunity for researchers and practitioners using Scheme and related functional programming languages like Racket, Clojure, and Lisp, to share research findings and discuss the future of the Scheme programming language.

Two full papers and three lightning talks were submitted to the workshop, and each submission was reviewed by three members of the program committee. After deliberation, all submissions were accepted to the workshop.

UCAM-CL-TR-920

Advait Sarkar:

Interactive analytical modelling
May 2018, 142 pages, PDF
PhD thesis (Emmanuel College, December 2016)

Abstract: This dissertation addresses the problem of designing tools for non-expert end-users performing analytical modelling, the activity of data analytics through machine learning. The research questions are, firstly, can tools be built for analytical modelling? Secondly, can these tools be made useful for non-experts?

Two case studies are made of building and applying machine learning models, firstly to analyse time series data, and secondly to analyse data in spreadsheets. Respectively, two prototypes are presented, Gatherminer and BrainCel. It is shown how it is possible to visualise general tasks (e.g., prediction, validation, explanation) in these contexts, illustrating how these prototypes embody the research hypotheses, and confirmed through experimental evaluation that the prototypes do indeed have the desirable properties of facilitating analytical modelling and being accessible to non-experts.

These prototypes and their evaluations exemplify three theoretical contributions: (a) visual analytics can be viewed as an instance of end-user programming, (b) interactive machine learning can be viewed as dialogue, and (c) analytical modelling can be viewed as constructivist learning. Four principles for the design of analytical modelling systems are derived: begin the abstraction gradient at zero, abstract complex processes through heuristic automation, build expertise through iteration on multiple representations, and support dialogue through metamodels.

UCAM-CL-TR-921

Toby Moncaster:

Optimising data centre operation by removing the transport bottleneck
June 2018, 130 pages, PDF
PhD thesis (Wolfson College, February 2018)

Abstract: Data centres lie at the heart of almost every service on the Internet. Data centres are used to provide search results, to power social media, to store and index email, to host “cloud” applications, for online retail and to provide a myriad of other web services. Consequently, the more efficient they can be made the better for all of us. The power of modern data centres is in combining commodity off-the-shelf server hardware and network equipment to provide what Google’s Barroso and Hölzle describe as “warehouse scale” computers.

Data centres rely on TCP, a transport protocol that was originally designed for use in the Internet. Like other such protocols, TCP has been optimised to maximise throughput, usually by filling up queues at the bottleneck. However, for most applications within a data centre network, latency is more critical than throughput. Consequently the choice of transport protocol becomes a bottleneck for performance. My thesis is that the solution to this is to move away from the use of one-size-fits-all transport protocols towards ones that have been designed to reduce latency across the data centre and which can dynamically respond to the needs of the applications.

This dissertation focuses on optimising the transport layer in data centre networks. In particular I address the question of whether any single transport mechanism can be flexible enough to cater to the needs of all data centre traffic. I show that one leading protocol (DCTCP) has been heavily optimised for certain network conditions. I then explore approaches that seek to minimise latency for applications that care about it while still allowing throughput-intensive applications to receive a good level of service. My key contributions to this are Silo and Trevi.

Trevi is a novel transport system for storage traffic that utilises fountain coding to maximise throughput and minimise latency while being agnostic to drop, thus allowing storage traffic to be pushed out of the way when latency-sensitive traffic is present in the network. Silo is an admission control system that is designed to give tenants of a multi-tenant data centre guaranteed low-latency network performance. Both of these were developed in collaboration with others.

UCAM-CL-TR-922

Frank Stajano, Graham Rymer, Michelle Houghton:

Raising a new generation of cyber defenders
June 2018, 307 pages, PDF

Abstract: To address the skills gap in cyber security, in the 2015–16 academic year we launched two competitions for university students: the national (UK-wide) Inter-ACE and, in collaboration with MIT, the international Cambridge2Cambridge. After running the competitions for three years and growing them in several dimensions (including duration, budget, gender balance, number of participants, number of universities and number of countries), we distill our experience into a write-up of what we achieved and specific items of advice for our successors, discussing problems encountered and possible solutions in a variety of areas and suggesting future directions for further growth.

UCAM-CL-TR-923

Sam Ainsworth:

Prefetching for complex memory access patterns
July 2018, 146 pages, PDF
PhD thesis (Churchill College, February 2018)

Abstract: Modern-day workloads, particularly those in big data, are heavily memory-latency bound. This is because of both irregular memory accesses, which have no discernible pattern in their memory addresses, and large data sets that cannot fit in any cache. However, this need not be a barrier to high performance. With some data-structure knowledge it is typically possible to bring data into the fast on-chip memory caches early, so that it is already available by the time it needs to be accessed. This thesis makes three contributions. I first contribute an automated software prefetching compiler technique to insert high-performance prefetches into program code to bring data into the cache early, achieving 1.3x geometric mean speedup on the most complex processors, and 2.7x on the simplest. I also provide an analysis of when and why this is likely to be successful, which data structures to target, and how to schedule software prefetches well.

Then I introduce a hardware solution, the configurable graph prefetcher. This uses the example of breadth-first search on graph workloads to motivate how a hardware prefetcher armed with data-structure knowledge can avoid the instruction overheads, inflexibility and limited latency tolerance of software prefetching. The configurable graph prefetcher sits at the L1 cache and observes memory accesses, and can be configured by a programmer to be aware of a limited number of different data access patterns, achieving 2.3x geometric mean speedup on graph workloads on an out-of-order core. My final contribution extends the hardware used for the configurable graph prefetcher to make an event-triggered programmable prefetcher, using a set of very small micro-controller-sized programmable prefetch units (PPUs) to cover a wide set of workloads. I do this by developing a highly parallel programming model that can be used to issue prefetches, thus allowing high-throughput prefetching with low power and area overheads of only around 3%, and a 3x geometric mean speedup for a variety of memory-bound applications. To facilitate its use, I then develop compiler techniques to help automate the process of targeting the programmable prefetcher. These provide a variety of tradeoffs from easiest to use to best performance.

UCAM-CL-TR-924

George Neville-Neil, Jonathan Anderson, Graeme Jenkinson, Brian Kidney, Domagoj Stolfa, Arun Thomas, Robert N. M. Watson:

OpenDTrace Specification version 1.0
August 2018, 235 pages, PDF

Abstract: OpenDTrace is a dynamic tracing facility offering full-system instrumentation, a high degree of flexibility, and portable semantics across a range of operating systems. Originally designed and implemented by Sun Microsystems (now Oracle), user-facing aspects of OpenDTrace, such as the D language and command-line tools, are well defined and documented. However, OpenDTrace’s internal formats, the DTrace Intermediate Format (DIF), DTrace Object Format (DOF) and Compact C Trace Format (CTF), have primarily been documented through source-code comments rather than a structured specification. This technical report specifies these formats in order to better support the development of more comprehensive tests, new underlying execution substrates (such as just-in-time compilation), and future extensions. We not only cover the data structures present in OpenDTrace but also include a complete reference of all the low-level instructions that are used by the byte code interpreter, all the built-in global variables and subroutines. Our goal with this report is to provide not only a list of what is present in the code at any point in time, the what, but also explanations of how the system works as a whole, the how, and motivations for various design decisions that have been made along the way, the why. Throughout this report we use the name OpenDTrace to refer to the open-source project but retain the name DTrace when referring to data structures such as the DTrace Intermediate Format. OpenDTrace builds upon the foundations of the original DTrace code but provides new features which were not present in the original. This document acts as a single source of truth for OpenDTrace as it is currently implemented and deployed.

UCAM-CL-TR-925

Ranjan Pal, Jon Crowcroft, Abhishek Kumar, Pan Hui, Hamed Haddadi, Swades De, Irene Ng, Sasu Tarkoma, Richard Mortier:

Privacy markets in the Apps and IoT age
September 2018, 45 pages, PDF

Abstract: In the era of the mobile apps and IoT, huge quantities of data about individuals and their activities offer a wave of opportunities for economic and societal value creation. However, the current personal data ecosystem is fragmented and inefficient. On one hand, end-users are not able to control access (either technologically, by policy, or psychologically) to their personal data, which results in issues related to privacy, personal data ownership, transparency, and value distribution. On the other hand, this puts the burden of managing and protecting user data on apps and ad-driven entities (e.g., an ad network) at a cost of trust and regulatory accountability. In such a context, data holders (e.g., apps) may take advantage of the individuals’ inability to fully comprehend and anticipate the potential uses of their private information, with detrimental effects for aggregate social welfare. In this paper, we investigate the problem of the existence and design of efficient ecosystems (modeled as markets in this paper) that aim to achieve a maximum social welfare state among competing data holders by preserving the heterogeneous privacy preservation constraints, up to certain compromise levels induced by their clients, and at the same time satisfying requirements of agencies (e.g., advertisers) that collect and trade client data for the purpose of targeted advertising, assuming the potential practical inevitability of some amount of inappropriate data leakage on behalf of the data holders. Using concepts from supply-function economics, we propose the first mathematically rigorous and provably optimal privacy market design paradigm that always results in unique equilibrium (i.e., stable) market states that can be either economically efficient or inefficient, depending on whether privacy trading markets are monopolistic or oligopolistic in nature. Subsequently, we characterize in closed form the efficiency gap (if any) at market equilibrium.

UCAM-CL-TR-926

Ranjan Pal, Konstantinos Psounis, Abhishek Kumar, Jon Crowcroft, Pan Hui, Leana Golubchik, John Kelly, Aritra Chatterjee, Sasu Tarkoma:

Are cyber-blackouts in service networks likely?: implications for cyber risk management
October 2018, 32 pages, PDF

Abstract: Service liability interconnections among networked IT and IoT driven service organizations create potential channels for cascading service disruptions due to modern cybercrimes such as DDoS, APT, and ransomware attacks. The very recent Mirai DDoS and WannaCry ransomware attacks serve as famous examples of cyber-incidents that have caused catastrophic service disruptions worth billions of dollars across organizations around the globe. A natural question that arises in this context is “what is the likelihood of a cyber-blackout?”, where the latter term is defined as: “the probability that all (or a major subset of) organizations in a service chain become dysfunctional in a certain manner due to a cyber-attack at some or all points in the chain”.

The answer to this question has major implications for risk management businesses such as cyber-insurance when it comes to designing policies by risk-averse insurers for providing coverage to clients in the aftermath of such catastrophic network events. In this paper, we investigate this question in general as a function of service chain networks and different loss distribution types. We show somewhat surprisingly (and discuss potential practical implications) that following a cyber-attack, the probability of a cyber-blackout and the increase in total service-related monetary losses across all organizations, due to the effect of (a) network interconnections, and (b) a wide range of loss distributions, are mostly very small, regardless of the network structure – the primary rationale behind the results being attributed to degrees of heterogeneity in wealth base among organizations, and the Increasing Failure Rate (IFR) property of loss distributions.

UCAM-CL-TR-927

Robert N. M. Watson, Peter G. Neumann, Jonathan Woodruff, Michael Roe, Hesham Almatary, Jonathan Anderson, John Baldwin, David Chisnall, Brooks Davis, Nathaniel Wesley Filardo, Alexandre Joannou, Ben Laurie, A. Theodore Markettos, Simon W. Moore, Steven J. Murdoch, Kyndylan Nienhuis, Robert Norton, Alex Richardson, Peter Rugg, Peter Sewell, Stacey Son, Hongyan Xia:

Capability Hardware Enhanced RISC Instructions: CHERI Instruction-Set Architecture (Version 7)
June 2019, 496 pages, PDF

Abstract: This technical report describes CHERI ISAv7, the seventh version of the Capability Hardware Enhanced RISC Instructions (CHERI) Instruction-Set Architecture (ISA) being developed by SRI International and the University of Cambridge. This design captures nine years of research, development, experimentation, refinement, formal analysis, and validation through hardware and software implementation. CHERI ISAv7 is a substantial enhancement to prior ISA versions. We differentiate an architecture-neutral protection model vs. architecture-specific instantiations in 64-bit MIPS, 64-bit RISC-V, and x86-64. We have defined a new CHERI Concentrate compression model. CHERI-RISC-V is more substantially elaborated. A new compartment-ID register assists in resisting microarchitectural side-channel attacks. Experimental features include linear capabilities, capability coloring, temporal memory safety, and 64-bit capabilities for 32-bit architectures.

CHERI is a hybrid capability-system architecture that adds new capability-system primitives to commodity 64-bit RISC ISAs, enabling software to efficiently implement fine-grained memory protection and scalable software compartmentalization. Design goals include incremental adoptability within current ISAs and software stacks, low performance overhead for memory protection, significant performance improvements for software compartmentalization, formal grounding, and programmer-friendly underpinnings. We have focused on providing strong, non-probabilistic, efficient architectural foundations for the principles of least privilege and intentional use in the execution of software at multiple levels of abstraction, preventing and mitigating vulnerabilities.

The CHERI system architecture purposefully addresses known performance and robustness gaps in commodity ISAs that hinder the adoption of more secure programming models centered around the principle of least privilege. To this end, CHERI blends traditional paged virtual memory with an in-address-space capability model that includes capability registers, capability instructions, and tagged memory. CHERI builds on the C-language fat-pointer literature: its capabilities can describe fine-grained regions of memory, and can be substituted for data or code pointers in generated code, protecting data and also improving control-flow robustness. Strong capability integrity and monotonicity properties allow the CHERI model to express a variety of protection properties, from enforcing valid C-language pointer provenance and bounds checking to implementing the isolation and controlled communication structures required for software compartmentalization.

CHERI’s hybrid capability-system approach, inspired by the Capsicum security model, allows incremental adoption of capability-oriented design: software implementations that are more robust and resilient can be deployed where they are most needed, while leaving less critical software largely unmodified, but nevertheless suitably constrained to be incapable of having adverse effects. Potential deployment scenarios include low-level software Trusted Computing Bases (TCBs) such as separation kernels, hypervisors, and operating-system kernels, as well as userspace TCBs such as language runtimes and web browsers. We also see potential early-use scenarios around particularly high-risk software libraries (such as data compression, protocol parsing, and image processing), which are concentrations of both complex and historically vulnerability-prone code exposed to untrustworthy data sources, while leaving containing applications unchanged.
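
The fat-pointer idea behind CHERI capabilities, bounds checking on every access plus monotonic (narrow-only) derivation, can be illustrated with a hypothetical software model. This sketch is not CHERI's actual tagged, compressed capability encoding; it only mirrors the two properties named in the abstract above.

```python
# Illustrative fat-pointer model: loads are bounds-checked against
# [base, base + length), and derived capabilities may only narrow
# their bounds (monotonicity), never widen them.

class Capability:
    def __init__(self, memory, base, length):
        self.memory = memory
        self.base = base
        self.length = length

    def load(self, offset):
        """Bounds-checked read: offset must lie within the capability."""
        if not (0 <= offset < self.length):
            raise ValueError("capability bounds violation")
        return self.memory[self.base + offset]

    def restrict(self, new_base, new_length):
        """Derive a narrower capability; widening attempts are rejected."""
        if new_base < self.base or new_base + new_length > self.base + self.length:
            raise ValueError("monotonicity violation: cannot widen bounds")
        return Capability(self.memory, new_base, new_length)

mem = list(range(100))
cap = Capability(mem, base=10, length=4)   # may touch mem[10..13] only
```

An out-of-bounds `load` or a `restrict` that tries to widen the region fails, which is the software-visible analogue of the integrity and monotonicity properties described above.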

UCAM-CL-TR-928

Noa Zilberman, Łukasz Dudziak, Matthew Jadczak, Thomas Parks, Alessandro Rietmann, Vadim Safronov, Daniel Zuo:

Cut-through network switches: architecture, design and implementation
November 2018, 18 pages, PDF

Abstract: Cut-through switches are increasingly used within data-centres and high-performance networked systems. Despite their popularity, little is known of the architecture of cut-through switches used in today’s networks. In this paper we introduce three different cut-through switch architectures, designed over the NetFPGA platform. Beyond exploring architectural and design considerations, we compare and contrast the different architectures, providing insights into real-world switch design. Last, we provide an evaluation of one successfully implemented cut-through switch design, providing constant switching latency regardless of packet size and cross-traffic, without compromising throughput.
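
The constant-latency property claimed above can be seen with a back-of-the-envelope model (the rate, header size and pipeline delay below are assumed figures, not measurements from the paper): a store-and-forward switch must serialise the entire packet before forwarding, so its latency grows with packet size, whereas a cut-through switch begins forwarding once the header has arrived.

```python
# Simple latency model contrasting store-and-forward and cut-through
# switching. Store-and-forward latency grows with packet size;
# cut-through latency depends only on the header.

def serialization_delay(num_bytes, rate_bps):
    return num_bytes * 8 / rate_bps

def store_and_forward_latency(pkt_bytes, rate_bps, pipeline_s):
    # The whole packet is received before forwarding begins.
    return serialization_delay(pkt_bytes, rate_bps) + pipeline_s

def cut_through_latency(header_bytes, rate_bps, pipeline_s):
    # Forwarding starts once the header has arrived, regardless of size.
    return serialization_delay(header_bytes, rate_bps) + pipeline_s

RATE = 10e9          # assumed 10 Gb/s link
PIPELINE = 500e-9    # assumed fixed processing delay of 500 ns
HEADER = 14          # assumed Ethernet header bytes examined

sf_small = store_and_forward_latency(64, RATE, PIPELINE)
sf_large = store_and_forward_latency(1514, RATE, PIPELINE)
ct_small = cut_through_latency(HEADER, RATE, PIPELINE)
ct_large = cut_through_latency(HEADER, RATE, PIPELINE)
```

Under this model the cut-through figures are identical for 64-byte and 1514-byte packets, while the store-and-forward figures diverge, matching the abstract's claim of switching latency that is constant regardless of packet size.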

UCAM-CL-TR-929

Khaled Baqer:

Resilient payment systems
November 2018, 115 pages, PDF
PhD thesis (St. Edmund’s College, October 2018)

Abstract: There have been decades of attempts to evolve or revolutionise the traditional financial system, but not all such efforts have been transformative or even successful. From Chaum’s proposals in the 1980s for private payment systems to micropayments, previous attempts failed to take off for a variety of reasons, including non-existent markets, or issues pertaining to usability, scalability and performance, resilience against failure, and complexity of protocols.

Towards creating more resilient payment systems, we investigated issues related to security engineering in general, and payment systems in particular. We identified that network coverage, central points of failure, and attacks may cripple system performance. The premise of our research is that offline capabilities are required to produce resilience in critical systems.

We focus on issues related to network problems and attacks, system resilience, and scalability by introducing the ability to process payments offline without relying on the availability of network coverage; a lack of network coverage renders some payment services unusable for their customers. Decentralising payment verification, and outsourcing some operations to users, alleviates the burden of contacting centralised systems to process every transaction. Our secondary goal is to minimise the cost of providing payment systems, so providers can cut transaction fees. Moreover, by decentralising payment verification that can be performed offline, we increase system resilience, and seamlessly maintain offline operations until a system is back online. We also use tamper-resistant hardware to tackle usability issues, by minimising cognitive overhead and helping users to correctly handle critical data, minimising the risks of data theft and tampering.

We apply our research towards extending financial inclusion efforts, since the issues discussed above must be solved to extend mobile payments to the poorest demographics. More research is needed to integrate online payments, offline payments, and delay-tolerant networking. This research extends and enhances not only payment systems, but other electronically-enabled services from pay-as-you-go solar panels to agricultural subsidies and payments from aid donors. We hope that this thesis is helpful for researchers, protocol designers, and policy makers interested in creating resilient payment systems by assisting them in financial inclusion efforts.

UCAM-CL-TR-930

Lucian Carata:

Provenance-based computing
December 2018, 132 pages, PDF
PhD thesis (Wolfson College, July 2016)

Abstract: Relying on computing systems that become increasingly complex is difficult: with many factors potentially affecting the result of a computation or its properties, understanding where problems appear and fixing them is a challenging proposition. Typically, the process of finding solutions is driven by trial and error or by experience-based insights.

In this dissertation, I examine the idea of using provenance metadata (the set of elements that have contributed to the existence of a piece of data, together with their relationships) instead. I show that considering provenance a primitive of computation enables the exploration of system behaviour, targeting both retrospective analysis (root cause analysis, performance tuning) and hypothetical scenarios (what-if questions). In this context, provenance can be used as part of feedback loops, with a double purpose: building software that is able to adapt for meeting certain quality and performance targets (semi-automated tuning) and enabling human operators to exert high-level runtime control with limited previous knowledge of a system’s internal architecture.
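The parenthetical definition above — data plus the elements that contributed to it, with their relationships — can be sketched as a directed graph that a backward walk turns into a root-cause query. The record format and names below are invented for illustration, not the dissertation's representation.

```python
# Minimal sketch: provenance as derivation edges, queried transitively.

from collections import defaultdict

edges = defaultdict(list)   # output -> inputs that contributed to it

def record(output: str, inputs: list) -> None:
    """Record that `output` was derived from `inputs`."""
    edges[output].extend(inputs)

def lineage(item: str) -> set:
    """Everything that transitively contributed to `item`."""
    seen, stack = set(), [item]
    while stack:
        node = stack.pop()
        for parent in edges[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Hypothetical derivation chain:
record("report.pdf", ["results.csv", "template.tex"])
record("results.csv", ["raw_log", "config.yaml"])
print(sorted(lineage("report.pdf")))
# → ['config.yaml', 'raw_log', 'results.csv', 'template.tex']
```

A retrospective root-cause question ("which inputs could have corrupted this output?") is then a single `lineage` call; what-if questions invert the edges.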

My contributions towards this goal are threefold: providing low-level mechanisms for meaningful provenance collection considering OS-level resource multiplexing, proving that such provenance data can be used in inferences about application behaviour and generalising this to a set of primitives necessary for fine-grained provenance disclosure in a wider context.


To derive such primitives in a bottom-up manner, I first present Resourceful, a framework that enables capturing OS-level measurements in the context of application activities. It is the contextualisation that allows tying the measurements to provenance in a meaningful way, and I look at a number of use-cases in understanding application performance. This also provides a good setup for evaluating the impact and overheads of fine-grained provenance collection.

I then show that the collected data enables new ways of understanding performance variation by attributing it to specific components within a system. The resulting set of tools, Soroban, gives developers and operation engineers a principled way of examining the impact of various configuration, OS and virtualization parameters on application behaviour.

Finally, I consider how this supports the idea that provenance should be disclosed at application level and discuss why such disclosure is necessary for enabling the use of collected metadata efficiently and at a granularity which is meaningful in relation to application semantics.

UCAM-CL-TR-931

Jyothish Soman:

A Performance-efficient and practical processor error recovery framework
January 2019, 91 pages, PDF
PhD thesis (Wolfson College, July 2017)

Abstract: Continued reduction in the size of a transistor has affected the reliability of processors built using them. This is primarily due to factors such as inaccuracies while manufacturing, as well as non-ideal operating conditions, causing transistors to slow down consistently, eventually leading to permanent breakdown and erroneous operation of the processor. Permanent transistor breakdown, or faults, can occur at any point in time in the processor’s lifetime. Errors are the discrepancies in the output of faulty circuits. This dissertation shows that the components containing faults can continue operating if the errors caused by them are within certain bounds. Further, the lifetime of a processor can be increased by adding supportive structures that start working once the processor develops these hard errors.

This dissertation has three major contributions, namely REPAIR, FaultSim and PreFix. REPAIR is a fault tolerant system with minimal changes to the processor design. It uses an external Instruction Re-execution Unit (IRU) to perform operations, which the faulty processor might have erroneously executed. Instructions that are found to use faulty hardware are then re-executed on the IRU. REPAIR shows that the performance overhead of such targeted re-execution is low for a limited number of faults.

FaultSim is a fast fault-simulator capable of simulating large circuits at the transistor level. It is developed in this dissertation to understand the effect of faults on different circuits. It performs digital logic based simulations, trading off analogue accuracy with speed, while still being able to support most fault models. A 32-bit addition takes under 15 micro-seconds, while simulating more than 1500 transistors. It can also be integrated into an architectural simulator, which added a performance overhead of 10 to 26 percent to a simulation. The results obtained show that single faults cause an error in an adder in less than 10 percent of the inputs.
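The style of experiment behind that last observation — injecting a single hard fault into an adder and counting how many input pairs actually produce a wrong sum — can be sketched at the gate level. The 4-bit width and the choice of stuck wire are assumptions for the demo; FaultSim itself works at the transistor level.

```python
# Gate-level ripple-carry adder with an optional stuck-at-0 fault on one
# internal carry wire, exhaustively tested over all input pairs.

WIDTH = 4

def ripple_add(a: int, b: int, stuck_carry_bit: int = -1) -> int:
    """Add two WIDTH-bit values bit by bit; if stuck_carry_bit >= 0,
    force that carry wire to 0 to model a permanent fault."""
    carry, result = 0, 0
    for i in range(WIDTH):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry
        carry = (x & y) | (x & carry) | (y & carry)
        if i == stuck_carry_bit:
            carry = 0          # the injected stuck-at-0 fault
        result |= s << i
    return result

# Count how many of the 2^WIDTH * 2^WIDTH inputs the fault corrupts.
total = errors = 0
for a in range(2 ** WIDTH):
    for b in range(2 ** WIDTH):
        total += 1
        if ripple_add(a, b, stuck_carry_bit=1) != ripple_add(a, b):
            errors += 1
print(f"{errors}/{total} input pairs give a wrong sum")
```

Only inputs whose true carry out of the faulty position is 1 are affected, which is why a single fault corrupts a minority of inputs.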

PreFix brings together the fault models created using FaultSim and the design directions found using REPAIR. PreFix performs re-execution of instructions on a remote core, which picks up instructions to execute using a global instruction buffer. Error prediction and detection are used to reduce the number of re-executed instructions. PreFix has an area overhead of 3.5 percent in the setup used, and the performance overhead is within 5 percent of a fault-free case. This dissertation shows that faults in processors can be tolerated without explicitly switching off any component, and minimal redundancy is sufficient to achieve the same.

UCAM-CL-TR-932

Brooks Davis, Robert N. M. Watson, Alexander Richardson, Peter G. Neumann, Simon W. Moore, John Baldwin, David Chisnall, Jessica Clarke, Nathaniel Wesley Filardo, Khilan Gudka, Alexandre Joannou, Ben Laurie, A. Theodore Markettos, J. Edward Maste, Alfredo Mazzinghi, Edward Tomasz Napierala, Robert M. Norton, Michael Roe, Peter Sewell, Stacey Son, Jonathan Woodruff:

CheriABI: Enforcing valid pointer provenance and minimizing pointer privilege in the POSIX C run-time environment
April 2019, 40 pages, PDF

Abstract: The CHERI architecture allows pointers to be implemented as capabilities (rather than integer virtual addresses) in a manner that is compatible with, and strengthens, the semantics of the C language. In addition to the spatial protections offered by conventional fat pointers, CHERI capabilities offer strong integrity, enforced provenance validity, and access monotonicity.
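The monotonicity property mentioned above — a derived capability may only shrink its bounds and drop permissions, never the reverse — can be modelled in a few lines. The field names and permission strings are illustrative, not the CHERI encoding.

```python
# Toy model of capability monotonicity: derivation can only restrict.

from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    base: int
    length: int
    perms: frozenset  # e.g. {"load", "store"} (illustrative permission set)

    def restrict(self, base: int, length: int, perms) -> "Capability":
        """Derive a new capability; reject any request that would widen
        bounds or add permissions (the monotonicity check)."""
        perms = frozenset(perms)
        if base < self.base or base + length > self.base + self.length:
            raise ValueError("cannot widen bounds")
        if not perms <= self.perms:
            raise ValueError("cannot add permissions")
        return Capability(base, length, perms)

heap = Capability(0x1000, 0x1000, frozenset({"load", "store"}))
buf = heap.restrict(0x1800, 0x100, {"load"})   # fine: narrower, fewer perms
try:
    buf.restrict(0x1800, 0x200, {"load"})      # widening is rejected
except ValueError as e:
    print("rejected:", e)
```

In hardware this check is enforced by the instruction set itself, so no software path can manufacture a more powerful capability from a less powerful one.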

The stronger guarantees of these architectural capabilities must be reconciled with the real-world behavior of operating systems, run-time environments, and applications. When the process model, user-kernel interactions, dynamic linking, and memory management are all considered, we observe that simple derivation of architectural capabilities is insufficient to describe appropriate access to memory. We bridge this conceptual gap with a notional abstract capability that describes the accesses that should be allowed at a given point in execution, whether in the kernel or userspace.

To investigate this notion at scale, we describe the first adaptation of a full C-language operating system (FreeBSD) with an enterprise database (PostgreSQL) for complete spatial and referential memory safety. We show that awareness of abstract capabilities, coupled with CHERI architectural capabilities, can provide more complete protection, strong compatibility, and acceptable performance overhead compared with the pre-CHERI baseline and software-only approaches. Our observations also have potentially significant implications for other mitigation techniques.

UCAM-CL-TR-933

Noa Zilberman:

An Evaluation of NDP performance
January 2019, 19 pages, PDF

Abstract: NDP is a novel data centre transport architecture that claims to achieve near-optimal completion times for short transfers and high flow throughput in a wide range of scenarios. This work presents a performance evaluation of NDP, both on the simulation and the hardware level. We show that NDP’s implementation achieves lower switch throughput than simple TCP, and that the simulated performance is highly dependent on the selected parameters.

UCAM-CL-TR-934

Colin L. Rothwell:

Exploitation from malicious PCI Express peripherals
February 2019, 108 pages, PDF

Abstract: The thesis of this dissertation is that, despite widespread belief in the security community, systems are still vulnerable to attacks from malicious peripherals delivered over the PCI Express (PCIe) protocol. Malicious peripherals can be plugged directly into internal PCIe slots, or connected via an external Thunderbolt connection.

To prove this thesis, we designed and built a new PCIe attack platform. We discovered that a simple platform was insufficient to carry out complex attacks, so created the first PCIe attack platform that runs a full, conventional OS. To allow us to conduct attacks against higher-level OS functionality built on PCIe, we made the attack platform emulate in detail the behaviour of an Intel 82574L Network Interface Controller (NIC), by using a device model extracted from the QEMU emulator.

We discovered a number of vulnerabilities in the PCIe protocol itself, and with the way that the defence mechanisms it provides are used by modern OSs. The principal defence mechanism provided is the Input/Output Memory Management Unit (IOMMU). This remaps the address space used by peripherals in 4KiB chunks, and can prevent access to areas of address space that a peripheral should not be able to access. We found that, contrary to belief in the security community, the IOMMUs in modern systems were not designed to protect against attacks from malicious peripherals, but to allow virtual machines direct access to real hardware.

We discovered that use of the IOMMU is patchy even in modern operating systems. Windows effectively does not use the IOMMU at all; macOS opens windows that are shared by all devices; Linux and FreeBSD map windows into host memory separately for each device, but only if poorly documented boot flags are used. These OSs make no effort to ensure that only data that should be visible to the devices is in the mapped windows.
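Why the 4KiB remapping granularity matters for the leakage described above can be shown with simple page arithmetic: mapping any buffer for DMA exposes every byte of the pages it touches. The addresses and sizes below are made up for illustration.

```python
# Page-granularity exposure: a DMA mapping reveals whole 4 KiB pages,
# including unrelated data that happens to share them.

PAGE = 4096

def pages_exposed(buf_addr: int, buf_len: int) -> range:
    """Page numbers a peripheral can read once the buffer is mapped."""
    first = buf_addr // PAGE
    last = (buf_addr + buf_len - 1) // PAGE
    return range(first, last + 1)

buf_addr, buf_len = 0x7f3a_1f80, 256   # a hypothetical 256-byte packet buffer
secret_addr = 0x7f3a_1000              # a hypothetical key sharing the page

exposed = pages_exposed(buf_addr, buf_len)
print("pages exposed:", [hex(p) for p in exposed])
print("secret reachable:", secret_addr // PAGE in exposed)
```

Here a 256-byte buffer straddles a page boundary, so two full pages (8 KiB) become readable by the device, including the unrelated "secret" at the start of the first page.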

We created novel attacks that subverted control flow and read private data against systems running macOS, Linux and FreeBSD with the highest level of relevant protection enabled. These represent the first use of the relevant exploits in each case.

In the final part of this thesis, we evaluate the suitability of a number of proposed general purpose and specific mitigations against DMA attacks, and make a number of recommendations about future directions in IOMMU software and hardware.

UCAM-CL-TR-935

Heidi Howard:

Distributed consensus revised
April 2019, 151 pages, PDF
PhD thesis (Pembroke College, September 2018)

Abstract: We depend upon distributed systems in every aspect of life. Distributed consensus, the ability to reach agreement in the face of failures and asynchrony, is a fundamental and powerful primitive for constructing reliable distributed systems from unreliable components.

For over two decades, the Paxos algorithm has been synonymous with distributed consensus. Paxos is widely deployed in production systems, yet it is poorly understood and it proves to be heavyweight, unscalable and unreliable in practice. As such, Paxos has been the subject of extensive research to better understand the algorithm, to optimise its performance and to mitigate its limitations.

In this thesis, we re-examine the foundations of how Paxos solves distributed consensus. Our hypothesis is that these limitations are not inherent to the problem of consensus but instead specific to the approach of Paxos. The surprising result of our analysis is a substantial weakening to the requirements of this widely studied algorithm. Building on this insight, we are able to prove an extensive generalisation over the Paxos algorithm.

Our revised understanding of distributed consensus enables us to construct a diverse family of algorithms for solving consensus; covering classical as well as novel algorithms to reach consensus where it was previously thought impossible. We will explore the wide reaching implications of this new understanding, ranging from pragmatic optimisations to production systems to fundamentally novel approaches to consensus, which achieve new tradeoffs in performance, scalability and reliability.
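The "requirement" being weakened in this thesis is, in classical Paxos, the rule that any two quorums must intersect, which majority quorums guarantee by construction. The brute-force check below illustrates that classical property only; the thesis's contribution is precisely that a weaker condition suffices.

```python
# Classical quorum intersection, checked exhaustively for majority quorums.

from itertools import combinations

def majority_quorums(nodes):
    """All subsets containing a strict majority of the nodes."""
    k = len(nodes) // 2 + 1
    return [frozenset(c) for r in range(k, len(nodes) + 1)
            for c in combinations(nodes, r)]

def all_pairs_intersect(quorums) -> bool:
    """True if every pair of quorums shares at least one node."""
    return all(q1 & q2 for q1 in quorums for q2 in quorums)

qs = majority_quorums(["a", "b", "c", "d", "e"])
print(len(qs), "quorums; pairwise intersection:", all_pairs_intersect(qs))
```

For five nodes this enumerates 16 quorums (sizes 3, 4 and 5), and any two of them necessarily overlap, which is what lets a later quorum learn any value a previous quorum may have decided.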

UCAM-CL-TR-936

Alexandre J. P. Joannou:

High-performance memory safety: optimizing the CHERI capability machine
May 2019, 132 pages, PDF
PhD thesis (Peterhouse College, September 2017)

Abstract: This work presents optimizations for modern capability machines and specifically for the CHERI architecture, a 64-bit MIPS instruction set extension for security, supporting fine-grained memory protection through hardware enforced capabilities.

The original CHERI model uses 256-bit capabilities to carry information required for various checks helping to enforce memory safety, leading to increased memory bandwidth requirements and cache pressure when using CHERI capabilities in place of conventional 64-bit pointers. In order to mitigate this cost, I present two new 128-bit CHERI capability formats, using different compression techniques, while preserving C-language compatibility lacking in previous pointer compression schemes. I explore the trade-offs introduced by these new formats over the 256-bit format. I produce an implementation in the L3 ISA modelling language, collaborate on the hardware implementation, and provide an evaluation of the mechanism.

Another cost related to CHERI capabilities is the memory traffic increase due to capability-validity tags: to provide unforgeable capabilities, CHERI uses a tagged memory that preserves validity tags for every 256-bit memory word in a shadow space inaccessible to software. The CHERI hardware implementation of this shadow space uses a capability-validity-tag table in memory and caches it at the end of the cache hierarchy. To efficiently implement such a shadow space and improve on CHERI’s current approach, I use sparse data structures in a hierarchical tag-cache that filters unnecessary memory accesses. I present an in-depth study of this technique through a Python implementation of the hierarchical tag-cache, and also provide a hardware implementation and evaluation. I find that validity-tag traffic is reduced for all applications and scales with tag use. For legacy applications that do not use tags, there is near zero overhead.
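The filtering idea behind the hierarchical tag-cache can be sketched as a two-level tag table: a sparse top level records whether any tag exists in a region, so lookups in tag-free regions never touch leaf storage. Granule sizes and the counter are illustrative, not the dissertation's parameters or its Python model.

```python
# Toy two-level tag table: absent leaves filter lookups entirely,
# giving near-zero "memory traffic" for untagged (legacy) regions.

LEAF_GRANULES = 512          # tag bits per leaf (assumed)

class HierarchicalTagTable:
    def __init__(self):
        self.leaves = {}     # leaf index -> list of tag bits (sparse)
        self.leaf_reads = 0  # counts the leaf accesses we pay for

    def set_tag(self, granule: int) -> None:
        leaf = self.leaves.setdefault(granule // LEAF_GRANULES,
                                      [0] * LEAF_GRANULES)
        leaf[granule % LEAF_GRANULES] = 1

    def get_tag(self, granule: int) -> int:
        leaf = self.leaves.get(granule // LEAF_GRANULES)
        if leaf is None:
            return 0         # top level filters the access entirely
        self.leaf_reads += 1
        return leaf[granule % LEAF_GRANULES]

table = HierarchicalTagTable()
table.set_tag(5)
print("tag(5) =", table.get_tag(5))
for g in range(10_000, 10_100):   # untagged region: no leaf reads at all
    table.get_tag(g)
print("leaf reads:", table.leaf_reads)
```

All 100 lookups in the untagged region are answered at the top level, which mirrors the reported result that tag traffic scales with actual tag use.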

Removing these costs through the use of the pro-posed optimizations makes the CHERI architecturemore affordable and appealing for industrial adoption.

UCAM-CL-TR-937

Diana Andreea Popescu:

Latency-driven performance in data centres
June 2019, 190 pages, PDF
PhD thesis (Churchill College, December 2018)

Abstract: Data centre based cloud computing has revolutionised the way businesses use computing infrastructure. Instead of building their own data centres, companies rent computing resources and deploy their applications on cloud hardware. Providing customers with well-defined application performance guarantees is of paramount importance to ensure transparency and to build a lasting collaboration between users and cloud operators. A user’s application performance is subject to the constraints of the resources it has been allocated and to the impact of the network conditions in the data centre.

In this dissertation, I argue that application performance in data centres can be improved through cluster scheduling of applications informed by predictions of application performance for given network latency, and measurements of current network latency in data centres between hosts. Firstly, I show how to use the Precision Time Protocol (PTP), through an open-source software implementation PTPd, to measure network latency and packet loss in data centres. I propose PTPmesh, which uses PTPd, as a cloud network monitoring tool for tenants. Furthermore, I conduct a measurement study using PTPmesh in different cloud providers, finding that network latency variability in data centres is still common. Normal latency values in data centres are in the order of tens or hundreds of microseconds, while unexpected events, such as network congestion or packet loss, can lead to latency spikes in the order of milliseconds. Secondly, I show that network latency matters for certain distributed applications even in small amounts of tens or hundreds of microseconds, significantly reducing their performance. I propose a methodology to determine the impact of network latency on distributed applications performance by injecting artificial delay into the network of an experimental setup. Based on the experimental results, I build functions that predict the performance of an application for a given network latency. Given the network latency variability observed in data centres, applications’ performance is determined by their placement within the data centre. Thirdly, I propose latency-driven, application performance-aware, cluster scheduling as a way to provide performance guarantees to applications. I introduce NoMora, a cluster scheduling architecture that leverages the predictions of application performance dependent upon network latency combined with dynamic network latency measurements taken between pairs of hosts in data centres to place applications. Moreover, I show that NoMora improves application performance by choosing better placements than other scheduling policies.
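The PTP measurement mentioned above rests on the standard delay request-response arithmetic that PTPd implements: from four message timestamps, the mean path delay and clock offset follow under the usual symmetric-path assumption. The timestamp values in the example are invented.

```python
# Standard PTP offset/delay computation from the four exchange timestamps.

def ptp_offset_and_delay(t1, t2, t3, t4):
    """t1: master sends Sync; t2: slave receives it;
    t3: slave sends Delay_Req; t4: master receives it.
    Assumes the forward and reverse path delays are equal."""
    offset = ((t2 - t1) - (t4 - t3)) / 2   # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2    # one-way path delay
    return offset, delay

# Example: slave clock 100 us ahead, one-way path delay 50 us
# (all timestamps in microseconds, local to each clock).
offset, delay = ptp_offset_and_delay(t1=0, t2=150, t3=400, t4=350)
print(f"offset = {offset} us, delay = {delay} us")
```

It is this recovered one-way delay (rather than the clock offset) that a tool in the PTPmesh style can report as the network latency between a pair of hosts.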

UCAM-CL-TR-938

Christopher Bryant:

Automatic annotation of error types for grammatical error correction
June 2019, 138 pages, PDF
PhD thesis (Churchill College, December 2018)

Abstract: Grammatical Error Correction (GEC) is the task of automatically detecting and correcting grammatical errors in text. Although previous work has focused on developing systems that target specific error types, the current state of the art uses machine translation to correct all error types simultaneously. A significant disadvantage of this approach is that machine translation does not produce annotated output and so error type information is lost. This means we can only evaluate a system in terms of overall performance and cannot carry out a more detailed analysis of different aspects of system performance.

In this thesis, I develop a system to automatically annotate parallel original and corrected sentence pairs with explicit edits and error types. In particular, I first extend the Damerau-Levenshtein alignment algorithm to make use of linguistic information when aligning parallel sentences, and supplement this alignment with a set of merging rules to handle multi-token edits. The output from this algorithm surpasses other edit extraction approaches in terms of approximating human edit annotations and is the current state of the art. Having extracted the edits, I next classify them according to a new rule-based error type framework that depends only on automatically obtained linguistic properties of the data, such as part-of-speech tags. This framework was inspired by existing frameworks, and human judges rated the appropriateness of the predicted error types as Good (85%) or Acceptable (10%) in a random sample of 200 edits. The whole system is called the ERRor ANnotation Toolkit (ERRANT) and is the first toolkit capable of automatically annotating parallel sentences with error types.
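The baseline that the thesis extends — aligning an original and a corrected sentence, then reading edits off the alignment — can be sketched with plain token-level Levenshtein alignment (no transposition operation, no linguistically informed costs, no merging rules; those are the thesis's additions).

```python
# Token-level edit extraction via plain Levenshtein alignment and backtrace.

def extract_edits(orig, corr):
    """Return (orig_token, corr_token) pairs where the sentences differ;
    "" marks the empty side of an insertion or deletion."""
    n, m = len(orig), len(corr)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if orig[i - 1] == corr[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # delete orig token
                          d[i][j - 1] + 1,       # insert corr token
                          d[i - 1][j - 1] + sub) # match / substitute
    # Backtrace, collecting every non-match operation as an edit.
    edits, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1]
                and orig[i - 1] == corr[j - 1]):
            i, j = i - 1, j - 1                          # match
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            edits.append((orig[i - 1], corr[j - 1])); i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            edits.append((orig[i - 1], "")); i -= 1      # deletion
        else:
            edits.append(("", corr[j - 1])); j -= 1      # insertion
    return edits[::-1]

print(extract_edits("He go to school".split(), "He went to school".split()))
# → [('go', 'went')]
```

ERRANT's improvements target exactly the weaknesses of this baseline: tie-breaking between equal-cost alignments using linguistic information, and grouping adjacent operations into human-like multi-token edits.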

I demonstrate the value of ERRANT by applying it to the system output produced by the participants of the CoNLL-2014 shared task, and carry out a detailed error type analysis of system performance for the first time. I also develop a simple language model based approach to GEC, that does not require annotated training data, and show how it can be improved using ERRANT error types.

UCAM-CL-TR-939

Christine Guo Yu:

Effects of timing on users’ perceived control when interacting with intelligent systems
August 2019, 284 pages, PDF
PhD thesis (Gonville & Caius College, May 2018)

Abstract: This research relates to the usability of mixed-initiative interaction systems, in which actions can be initiated either through a choice by the user or through intelligent decisions taken by the system. The key issue addressed here is how to preserve the user’s perceived control (‘sense of agency’) when the control of the interaction is being transferred between the system and the user in a back-and-forth manner.

Previous research in social psychology and cognitive neuroscience suggests timing is a factor that can influence perceived control in such back-and-forth interactions. This dissertation explores the hypothesis that in mixed-initiative interaction, a predictable interaction rhythm can preserve the user’s sense of control and enhance their experience during a task (e.g. higher confidence in task performance, stronger temporal alignment, lower perceived levels of stress and effort), whereas irregular interaction timing can have the opposite effect. Three controlled experiments compare alternative rhythmic strategies when users interact with simple visual stimuli, simple auditory stimuli, and a more realistic assisted text labelling task. The results of all three experiments support the hypothesis that a predictable interaction rhythm is beneficial in a range of interaction modalities and applications.

This research contributes to the field of human-computer interaction (HCI) in four ways. Firstly, it builds novel connections between existing theories in cognitive neuroscience, social psychology and HCI, highlighting how rhythmic temporal structures can be beneficial to the user’s experience: particularly, their sense of control. Secondly, it establishes timing as a crucial design resource for mixed-initiative interaction, and provides empirical evidence of how the user’s perceived control and other task experiences (such as reported levels of confidence, stress and effort) can be influenced by the manipulation of timing. Thirdly, it provides quantitative measures for the user’s entrainment behaviours that are applicable to a wide range of interaction timescales. Lastly, it contextualises the design of timing in a realistic application scenario and offers insights to the design of general end-user automation and decision support tools.


UCAM-CL-TR-940

Kyndylan Nienhuis, Alexandre Joannou, Anthony Fox, Michael Roe, Thomas Bauereiss, Brian Campbell, Matthew Naylor, Robert M. Norton, Simon W. Moore, Peter G. Neumann, Ian Stark, Robert N. M. Watson, Peter Sewell:

Rigorous engineering for hardware security: formal modelling and proof in the CHERI design and implementation process
September 2019, 38 pages, PDF

Abstract: The root causes of many security vulnerabilities include a pernicious combination of two problems, often regarded as inescapable aspects of computing. First, the protection mechanisms provided by the mainstream processor architecture and C/C++ language abstractions, dating back to the 1970s and before, provide only coarse-grain virtual-memory-based protection. Second, mainstream system engineering relies almost exclusively on test-and-debug methods, with (at best) prose specifications. These methods have historically sufficed commercially for much of the computer industry, but they fail to prevent large numbers of exploitable bugs, and the security problems that this causes are becoming ever more acute.

In this paper we show how more rigorous engineering methods can be applied to the development of a new security-enhanced processor architecture, with its accompanying hardware implementation and software stack. We use formal models of the complete instruction-set architecture (ISA) at the heart of the design and engineering process, both in lightweight ways that support and improve normal engineering practice – as documentation, in emulators used as a test oracle for hardware and for running software, and for test generation – and for formal verification. We formalise key intended security properties of the design, and establish that these hold with mechanised proof. This is for the same complete ISA models (complete enough to boot operating systems), without idealisation.

We do this for CHERI, an architecture with hardware capabilities that supports fine-grained memory protection and scalable secure compartmentalisation, while offering a smooth adoption path for existing software. CHERI is a maturing research architecture, developed since 2010, with work now underway to explore its possible adoption in mass-market commercial processors. The rigorous engineering work described here has been an integral part of its development to date, enabling more rapid and confident experimentation, and boosting confidence in the design.

UCAM-CL-TR-941

Robert N. M. Watson, Simon W. Moore, Peter Sewell, Peter G. Neumann:

An Introduction to CHERI
September 2019, 43 pages, PDF

Abstract: CHERI (Capability Hardware Enhanced RISC Instructions) extends conventional processor Instruction-Set Architectures (ISAs) with architectural capabilities to enable fine-grained memory protection and highly scalable software compartmentalization. CHERI’s hybrid capability-system approach allows architectural capabilities to be integrated cleanly with contemporary RISC architectures and microarchitectures, as well as with MMU-based C/C++-language software stacks.

CHERI’s capabilities are unforgeable tokens of authority, which can be used to implement both explicit pointers (those declared in the language) and implied pointers (those used by the runtime and generated code) in C and C++. When used for C/C++ memory protection, CHERI directly mitigates a broad range of known vulnerability types and exploit techniques. Support for more scalable software compartmentalization facilitates software mitigation techniques such as sandboxing, which also defend against future (currently unknown) vulnerability classes and exploit techniques.

We have developed, evaluated, and demonstrated this approach through hardware-software prototypes, including multiple CPU prototypes, and a full software stack. This stack includes an adapted version of the Clang/LLVM compiler suite with support for capability-based C/C++, and a full UNIX-style OS (CheriBSD, based on FreeBSD) implementing spatial, referential, and (currently for userspace) non-stack temporal memory safety. Formal modeling and verification allow us to make strong claims about the security properties of CHERI-enabled architectures.

This report is a high-level introduction to CHERI. The report describes our architectural approach, CHERI’s key microarchitectural implications, our approach to formal modeling and proof, the CHERI software model, our software-stack prototypes, further reading, and potential areas of future research.

UCAM-CL-TR-942

Alexander Kuhnle:

Evaluating visually grounded language capabilities using microworlds
January 2020, 142 pages, PDF
PhD thesis (Queens’ College, August 2019)


Abstract: Deep learning has had a transformative impact on computer vision and natural language processing. As a result, recent years have seen the introduction of more ambitious holistic understanding tasks, comprising a broad set of reasoning abilities. Datasets in this context typically act not just as application-focused benchmark, but also as basis to examine higher-level model capabilities. This thesis argues that emerging issues related to dataset quality, experimental practice and learned model behaviour are symptoms of the inappropriate use of benchmark datasets for capability-focused assessment. To address this deficiency, a new evaluation methodology is proposed here, which specifically targets in-depth investigation of model performance based on configurable data simulators. This focus on analysing system behaviour is complementary to the use of monolithic datasets as application-focused comparative benchmarks.

Visual question answering is an example of a modern holistic understanding task, unifying a range of abilities around visually grounded language understanding in a single problem statement. It has also been an early example for which some of the aforementioned issues were identified. To illustrate the new evaluation approach, this thesis introduces ShapeWorld, a diagnostic data generation framework. Its design is guided by the goal to provide a configurable and extensible testbed for the domain of visually grounded language understanding. Based on ShapeWorld data, the strengths and weaknesses of various state-of-the-art visual question answering models are analysed and compared in detail, with respect to their ability to correctly handle statements involving, for instance, spatial relations or numbers. Finally, three case studies illustrate the versatility of this approach and the ShapeWorld generation framework: an investigation of multi-task and curriculum learning, a replication of a psycholinguistic study for deep learning models, and an exploration of a new approach to assess generative tasks like image captioning.

UCAM-CL-TR-943

Mathew P. Grosvenor:

Latency-First datacenter network scheduling
January 2020, 310 pages, PDF
PhD thesis (April 2017)

Abstract: Every day we take for granted that, with the click of a mouse or a tap on a touchscreen, we can instantly access the Internet to globally exchange information, finances and physical goods. The computational machinery behind the Internet is found in datacenters scattered all over the globe. Each datacenter contains many tens of thousands of computers connected together through a datacenter network. Like the Internet, datacenter networks suffer from network interference. Network interference occurs when congestion caused by some applications, delays or interferes with traffic from other applications. Network interference makes it difficult to predict network latency: the time that any given packet will take to traverse the network. The lack of predictability makes it difficult to build fast, efficient, and responsive datacenter applications.

In this dissertation I address the problem of network interference in datacenter networks. I do so primarily by exploiting network scheduling techniques. Network scheduling techniques were previously developed to provide predictability in the Internet. However, they were complex to deploy and administer because few assumptions could be made about the network. Unlike the Internet, datacenter networks are administered by a single entity and have well-known physical properties that rarely change.

The thesis of this dissertation is that it is both possible and practical to resolve network interference in datacenter networks by using network scheduling techniques. I show that it is possible to resolve network interference by deriving a simple, and general, network scheduler and traffic regulator. I take the novel step of basing my scheduler design on a simple, but realistic, model of a datacenter switch. By regulating the flow of traffic into each switch, I show that it is possible to bound latency across the network. I develop the leaky token bucket regulator to perform this function. I show that my network scheduler design is practical by implementing a simple, and immediately deployable, system called QJUMP. QJUMP is thoroughly evaluated and demonstrated to resolve network interference between datacenter applications in both simulation and on a physical test-bed. I further show that QJUMP can be extended and improved upon in a variety of ways. I therefore conclude that it is both possible and practical to control network interference in datacenter networks using network scheduling techniques.
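The token-bucket regulation described in this abstract can be illustrated with a minimal sketch of a standard token bucket (not the thesis's exact leaky variant, and not the QJUMP implementation; all names and parameters here are illustrative). Tokens accrue at a fixed rate up to a capacity, and a packet is admitted only if enough tokens are available, which bounds the burst of traffic entering a switch:

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    """Illustrative token-bucket traffic regulator."""
    rate: float          # tokens (bytes) replenished per second
    capacity: float      # maximum burst size in bytes
    tokens: float = 0.0  # tokens currently available
    last: float = 0.0    # timestamp of the previous update

    def allow(self, size: float, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size  # admit the packet, spend the tokens
            return True
        return False             # defer: not enough tokens yet

bucket = TokenBucket(rate=1000.0, capacity=1500.0)
print(bucket.allow(1500.0, now=2.0))  # admitted: 2 s of refill fills the bucket
print(bucket.allow(1500.0, now=2.5))  # refused: only 500 tokens have accrued
```

Regulating every sender this way caps how much traffic can arrive at a switch in any interval, which is the property the dissertation exploits to bound network latency.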

UCAM-CL-TR-944

Alexander Vetterl:

Honeypots in the age of universal attacks and the Internet of Things
February 2020, 115 pages, PDF
PhD thesis (Churchill College, November 2019)

Abstract: Today’s Internet connects billions of physical devices. These devices are often immature and insecure, and share common vulnerabilities. The predominant form of attacks relies on recent advances in Internet-wide scanning and device discovery. The speed at which (vulnerable) devices can be discovered, and the device monoculture, mean that a single exploit, potentially trivial, can affect millions of devices across brands and continents.


In an attempt to detect and profile the growing threat of autonomous and Internet-scale attacks against the Internet of Things, we revisit honeypots, resources that appear to be legitimate systems. We show that this endeavour was previously limited by a fundamentally flawed generation of honeypots and associated misconceptions.

We show with two one-year-long studies that the display of warning messages has no deterrent effect in an attacked computer system. Previous research assumed that they would measure individual behaviour, but we find that the number of human attackers is orders of magnitude lower than previously assumed.

Turning to the current generation of low- and medium-interaction honeypots, we demonstrate that their architecture is fatally flawed. The use of off-the-shelf libraries to provide the transport layer means that the protocols are implemented subtly differently from the systems being impersonated. We developed a generic technique which can find any such honeypot at Internet scale with just one packet for an established TCP connection.

We then applied our technique and conducted several Internet-wide scans over a one-year period. By logging in to two SSH honeypots and sending specific commands, we not only revealed their configuration and patch status, but also found that many of them were not up to date. As we were the first to knowingly authenticate to honeypots, we provide a detailed legal analysis and an extended ethical justification for our research to show why we did not infringe computer-misuse laws.

Lastly, we present honware, a honeypot framework for rapid implementation and deployment of high-interaction honeypots. Honware automatically processes a standard firmware image and can emulate a wide range of devices without any access to the manufacturers’ hardware. We believe that honware is a major contribution towards re-balancing the economics of attackers and defenders by reducing the period in which attackers can exploit vulnerabilities at Internet scale in a world of ubiquitous networked ‘things’.

UCAM-CL-TR-945

Maxwell Jay Conway:

Machine learning methods for detecting structure in metabolic flow networks
March 2020, 132 pages, PDF
PhD thesis (Selwyn College, August 2018)

Abstract: Metabolic flow networks are large-scale, mechanistic biological models with good predictive power. However, even when they provide good predictions, interpreting the meaning of their structure can be very difficult, especially for large networks which model entire organisms. This is an underaddressed problem in general, and the analytic techniques that currently exist are difficult to combine with experimental data. The central hypothesis of this thesis is that statistical analysis of large datasets of simulated metabolic fluxes is an effective way to gain insight into the structure of metabolic networks. These datasets can be either simulated or experimental, allowing insight on real-world data while retaining the large sample sizes only easily possible via simulation. This work demonstrates that this approach can yield results in detecting structure both in a population of solutions and in the network itself.

This work begins with a taxonomy of sampling methods over metabolic networks, before introducing three case studies of different sampling strategies. Two of these case studies represent, to my knowledge, the largest datasets of their kind, at around half a million points each. Generating these datasets in a reasonable time frame required the creation of custom software, necessitated by the high dimensionality of the sample space.

Next, a number of techniques are described which operate on smaller datasets. These techniques, focused on pairwise comparison, show what can be achieved with these smaller datasets, and how, in these cases, visualisation techniques are applicable which have no simple analogues for larger datasets.

In the next chapter, Similarity Network Fusion is used for the first time to cluster organisms across several levels of biological organisation, resulting in the detection of discrete, quantised biological states in the underlying datasets. This quantisation effect was maintained across both real biological data and Monte-Carlo simulated data, with related underlying biological correlates, implying that this behaviour stems from the network structure itself, rather than from the genetic or regulatory mechanisms that would normally be assumed.

Finally, Hierarchical Block Matrices are used as a model of multi-level network structure, by clustering reactions using a variety of distance metrics: first standard network distance measures, then Local Network Learning, a novel approach of measuring connection strength via the gain in predictive power of each node on its neighbourhood. The clusters uncovered using this approach are validated against pre-existing subsystem labels and found to outperform alternative techniques.

Overall this thesis represents a significant new approach to metabolic network structure detection, both as a theoretical framework and as technological tools, which can readily be expanded to cover other classes of multilayer network, an under-explored datatype across a wide variety of contexts. In addition to the new techniques for metabolic network structure detection introduced, this research has proved fruitful both in its use in applied biological research and in terms of the software developed, which is experiencing substantial usage.


UCAM-CL-TR-946

Michael Schaarschmidt:

End-to-end deep reinforcement learning in computer systems
April 2020, 166 pages, PDF
PhD thesis (Sidney Sussex College, September 2019)

Abstract: The growing complexity of data processing systems has long led systems designers to imagine systems (e.g. databases, schedulers) which can self-configure and adapt based on environmental cues. In this context, reinforcement learning (RL) methods have since their inception appealed to systems developers. They promise to acquire complex decision policies from raw feedback signals. Despite their conceptual popularity, RL methods are scarcely found in real-world data processing systems. Recently, RL has seen explosive growth in interest due to high-profile successes when utilising large neural networks (deep reinforcement learning). Newly emerging machine learning frameworks and powerful hardware accelerators have given rise to a plethora of new potential applications.

In this dissertation, I first argue that in order to design and execute deep RL algorithms efficiently, novel software abstractions are required which can accommodate the distinct computational patterns of communication-intensive and fast-evolving algorithms. I propose an architecture which decouples logical algorithm construction from local and distributed execution semantics. I further present RLgraph, my proof-of-concept implementation of this architecture. In RLgraph, algorithm developers can explore novel designs by constructing a high-level data flow graph through combination of logical components. This dataflow graph is independent of specific backend frameworks or notions of execution, and is only later mapped to execution semantics via a staged build process. RLgraph enables high-performing algorithm implementations while maintaining flexibility for rapid prototyping.

Second, I investigate reasons for the scarcity of RL applications in systems themselves. I argue that progress in applied RL is hindered by a lack of tools for task model design which bridge the gap between systems and algorithms, and also by missing shared standards for evaluation of model capabilities. I introduce Wield, a first-of-its-kind tool for incremental model design in applied RL. Wield provides a small set of primitives which decouple systems interfaces and deployment-specific configuration from representation. Core to Wield is a novel instructive experiment protocol called progressive randomisation which helps practitioners to incrementally evaluate different dimensions of non-determinism. I demonstrate how Wield and progressive randomisation can be used to reproduce and assess prior work, and to guide implementation of novel RL applications.

UCAM-CL-TR-947

Robert N. M. Watson, Alexander Richardson, Brooks Davis, John Baldwin, David Chisnall, Jessica Clarke, Nathaniel Filardo, Simon W. Moore, Edward Napierala, Peter Sewell, Peter G. Neumann:

CHERI C/C++ Programming Guide
June 2020, PDF

Abstract: This document is a brief introduction to the CHERI C/C++ programming languages. We explain the principles underlying these language variants, and their grounding in CHERI’s multiple architectural instantiations: CHERI-MIPS, CHERI-RISC-V, and Arm’s Morello. We describe the most commonly encountered differences between these dialects and C/C++ on conventional architectures, and where existing software may require minor changes. We document new compiler warnings and errors that may be experienced compiling code with the CHERI Clang/LLVM compiler, and suggest how they may be addressed through typically minor source-code changes. We explain how modest language extensions allow selected software, such as memory allocators, to further refine permissions and bounds on pointers. This guidance is based on our experience adapting the FreeBSD operating-system userspace, and applications such as PostgreSQL and WebKit, to run in a CHERI C/C++ capability-based programming environment. We conclude by recommending further reading.

UCAM-CL-TR-948

Dylan McDermott:

Reasoning about effectful programs and evaluation order
June 2020, 150 pages, PDF
PhD thesis (Homerton College, October 2019)

Abstract: Program transformations have various applications, such as in compiler optimizations. These transformations are often effect-dependent: replacing one program with another relies on some restriction on the side-effects of subprograms. For example, we cannot eliminate a dead computation that raises an exception, or a duplicated computation that prints to the screen. Effect-dependent program transformations can be described formally using effect systems, which annotate types with information about the side-effects of expressions.
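The dead-computation example in this abstract can be made concrete with a small hypothetical sketch (not drawn from the thesis): eliminating an unused binding is behaviour-preserving only if the bound expression is free of observable effects.

```python
import io
from contextlib import redirect_stdout

def original():
    # `unused` is dead in the dataflow sense, but computing it prints,
    # so an effect system must flag the binding as effectful.
    unused = print("computing")
    return 42

def transformed():
    # Dead-code elimination removed the binding, and with it the print.
    return 42

# Both programs return the same value, but they differ observably:
buf1, buf2 = io.StringIO(), io.StringIO()
with redirect_stdout(buf1):
    r1 = original()
with redirect_stdout(buf2):
    r2 = transformed()
assert r1 == r2 == 42
assert buf1.getvalue() != buf2.getvalue()  # the output differs
```

An effect system would annotate `original`'s dead binding with an output effect, ruling out the elimination; the same transformation applied to a pure binding would be sound.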

In this thesis, we extend previous work on effect systems and correctness of effect-dependent transformations in two related directions.


First, we consider evaluation order. Effect systems for call-by-value languages are well-known, but are not sound for other evaluation orders. We describe sound and precise effect systems for various evaluation orders, including call-by-name. We also describe an effect system for Levy’s call-by-push-value, and show that this subsumes those for call-by-value and call-by-name. This naturally leads us to consider effect-dependent transformations that replace one evaluation order with another. We show how to use the call-by-push-value effect system to prove the correctness of transformations that replace call-by-value with call-by-name, using an argument based on logical relations. Finally, we extend call-by-push-value to additionally capture call-by-need. We use our extension to show a classic example of a relationship between evaluation orders: if the side-effects are restricted to (at most) nontermination, then call-by-name is equivalent to call-by-need.
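The call-by-name/call-by-need relationship mentioned above can be simulated with thunks (a hypothetical illustration in Python, not the thesis's formal development): the two orders agree on values, but an observable effect in the delayed computation distinguishes them, because call-by-name re-runs it at every use while call-by-need runs it at most once.

```python
def by_name(thunk):
    # Call-by-name: re-evaluate the argument at every use site.
    return thunk() + thunk()

def by_need(thunk):
    # Call-by-need: evaluate at most once and memoise the result.
    cache = []
    def force():
        if not cache:
            cache.append(thunk())
        return cache[0]
    return force() + force()

effects = []
def noisy():
    effects.append("eval")  # an observable side effect
    return 21

assert by_name(noisy) == by_need(noisy) == 42
# Same value either way, but `effects` records three evaluations:
# two under call-by-name, one under call-by-need, so any observable
# effect (such as printing) tells the two orders apart.
```

If `noisy` were pure, or if its only possible "effect" were failing to terminate, no terminating run could distinguish the two strategies, which is the sense in which the thesis's equivalence holds.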

The second direction we consider is non-invertible transformations. A program transformation is non-invertible if only one direction is correct. Such transformations arise, for example, when considering undefined behaviour, nondeterminism, or concurrency. We present a general framework for verifying non-invertible effect-dependent transformations, based on our effect system for call-by-push-value. The framework includes a non-symmetric notion of correctness for effect-dependent transformations, and a denotational semantics based on order-enriched category theory that can be used to prove correctness.
