SOFTWARE—PRACTICE AND EXPERIENCE
Softw. Pract. Exper. 2003; 33:99–120 (DOI: 10.1002/spe.498)

The pervasiveness of evolution in GRUMPS software

Huw Evans1,∗,†, Malcolm Atkinson1, Margaret Brown1, Julie Cargill1, Murray Crease1, Steve Draper2, Phil Gray1 and Richard Thomas3

1Department of Computing Science, University of Glasgow, Glasgow, G12 8RZ, U.K.
2Department of Psychology, University of Glasgow, Glasgow, G12 8QQ, U.K.
3Department of Computer Science and Software Engineering, The University of Western Australia, Crawley WA, 6009, Australia

SUMMARY

This paper describes the evolution of the design and implementation of a distributed run-time system that itself is designed to support the evolution of the topology and implementation of an executing, distributed system. The three different versions of the run-time architecture that have been designed and implemented are presented, together with how each architecture addresses the problems of topological and functional evolution. In addition, the reasons for the rapid evolution of the design and implementation of the architecture are also described.

From the lessons learned in both evolving the design of the architecture and in trying to provide a run-time system that can support run-time evolution, this paper discusses two generally applicable observations: evolution happens all the time, and it is not possible to anticipate how systems will evolve as designs; and large, run-time systems do not follow a predictable path. In addition to this, rapid prototyping has proved to be extremely useful in the production of the three architectures; this kind of prototyping has been made much easier by designing the core set of Java abstractions in terms of interfaces; and building an architecture that allows as many decisions as possible to be made at run-time has produced a support system that is more responsive to the user as well as the distributed environment in which it is executing. Copyright © 2003 John Wiley & Sons, Ltd.

KEY WORDS: run-time evolution; dynamic evolution; change management; distributed systems; experiment; investigation; large-scale

INTRODUCTION

The GRUMPS (Generic Remote Usage Monitoring Production System) project is developing techniques and software to support the human-driven activity of a rapid turnaround in the design, deployment and modification of distributed investigations. Such investigations may try to answer questions about genome sequences, or whether the Higgs boson exists. These kinds of complex, distributed investigations are data-driven; they generate large amounts of data (in the region of terabytes and beyond) and they generate it very quickly. This volume of data needs to be effectively captured and stored, ready for later analysis. In some cases, the data may also need to be processed while the investigation is ongoing.

∗Correspondence to: Huw Evans, Department of Computing Science, University of Glasgow, Glasgow, G12 8RZ, U.K.
†E-mail: [email protected]

Contract/grant sponsor: EPSRC; contract/grant number: GR/N381141

Received 24 July 2002
Revised 20 September 2002
Accepted 30 September 2002

GRUMPS seeks to develop a generic middleware that will assist in the construction, deployment and operation of a wide range of investigations. The GRUMPS approach to supporting such investigations is iterative, reflecting their dynamic nature. A user of GRUMPS will deploy a network of data collection and processing objects which addresses their current ideas about how to perform the investigation. During the course of the investigation, the user may formulate new questions, based on feedback from an analysis of the data collected in an earlier part of the investigation. Rather than deploy a new investigation, which may be costly and error prone, they want to adapt the current investigation, retargeting it to the collection of data that may answer the new question. Supporting this requires a distributed system that is responsive to the rapid turnaround in new requirements that such investigators encounter.

The GRUMPS distributed run-time architecture addresses this need for rapid turnaround by allowing the user to change the topology and the implementation of an already-deployed investigation at run-time. The main aim of the architecture is to support the rapid evolution of an investigation by its users. The designer of an investigation should invest in the flexibility that will ameliorate the cost of responding to unanticipated requirements. Of course, for this to be possible, the general nature of these changes must be understood.

The design and implementation of the run-time architecture has gone through three major revisions since the first was completed in February 2001 (see [1] for more information on the first six months of the project).

The rest of the paper is organized as follows. The remainder of this section provides a definition of terms and introduces the overall architectural requirements of the run-time system. The following three sections describe the different versions of the architecture, together with an evaluation of how they support topological and functional evolution and what the motivators were for changing the architecture at each stage. A section then shows how the third architecture and the GRUMPS approach in general are used to perform an investigation. The last section discusses the lessons learned from iterating through the three different versions.

Definition of terms

At an abstract level a running GRUMPS system consists of a graph of objects that have references between them. Each object performs or supports some part of an empirical investigation (for example, collecting data from a sensor, or cleaning data prior to it being written to a database) and investigation data is exchanged between objects via the references.

GRUMPS supports two main kinds of run-time evolution: topological and functional. The phrase topological evolution refers to the ability of the run-time system to change both the number of objects in the run-time system and how they are interconnected via their references. A topological evolution alters the topology of the object graph. Functional evolution is an evolution of the implementation of the run-time system; that is, both the run-time system that is supporting the investigation and the investigation itself. The code that is being used to implement both of these can be replaced at run-time.


In this paper we refer to investigations rather than experiments. It is felt that the word ‘experiment’ implies a clear prior hypothesis, whereas ‘investigation’ includes the more iterative and discursive nature of the kinds of endeavour described in the introduction. Such a broader approach requires a run-time system that can respond to the rapid iterations and evolutions that users perform as early results stimulate revisions of the investigation’s goals and methods.

In order to perform the investigation, it is assumed that the user will first of all design an investigation using the GRUMPS approach, its software and tools. In order to manage this activity, the GRUMPS team intends to provide a database in which the user’s current view of the investigation will be stored. This is currently an area of ongoing research. In outline, this database and its associated tools will be used to design the investigation, deploy it and manipulate it at run-time. The database will contain information on: the context in which the investigation is being performed; the current goal of the investigation; and a data repository and an analysis repository. The data repository will contain the data model and schema for the current investigation. This will contain information on the network of data-collection and processing entities that should be deployed to perform the investigation, together with details of how they should be interconnected. This network will be generated by processing stored source-code repositories within the context of the data model and schema. The analysis repository will allow the investigator to store and perform queries over the current (possibly executing) investigation. This will allow them to, amongst other things, query the investigation to see if answers that help them meet their goal can be found. The investigator can also mine the data that has already been collected and perform queries over it. The answers gained from searching the data in this way will then be used by the investigator and GRUMPS software to alter the topology and functionality of the investigation, to retarget the ongoing investigation to answer new questions, and to meet new goals.

Overall architectural requirements

The main architectural requirement is that the run-time system should support changes to the implementation of the investigation during program execution. It should not be necessary for the program (or parts of it) to be shut down, changed and started again. In addition to this, the run-time system should also support changes to the topology of the objects in the investigation. It should be easy to connect and disconnect objects from one another, without the need to stop or suspend the program. If the programmer does want to stop part of a program, the run-time system should be capable of supporting this, as this allows the programmer to move (in an off-line sense) objects from one machine to another.

The architecture and run-time system should also be able to scale to the level of the Internet so as not to be constrained by the assumptions of deploying a system in a local area network. This means that the run-time system (guided by its users and tools) should be able to place the components of an investigation wherever they are required on the network; it should not be necessary to alter the requirements of an investigation to suit the computing infrastructure being used to conduct it. Therefore, the run-time system should be capable of operating within the context of firewalls and proxy servers and other kinds of technology that are routinely deployed onto the public Internet, such as network address translators (NATs) and Internet Service Providers (ISPs) that use dynamically-assigned IP addresses.

As the GRUMPS system is a distributed one, component errors and failures need to be addressed. For example, machines may crash or they may be unexpectedly removed from the run-time system. The run-time infrastructure should be capable of handling this situation, thus minimizing the need to redeploy parts of an investigation or to conduct an entire investigation again. In order to support this, problem determination is required so that a user can plan to make the system operable again.

FIRST ARCHITECTURE

The previous section has defined a number of overall architectural requirements. The first architecture concentrated on providing the basic abstractions for the functional evolution of run-time objects. Distributed computation was supported in this version by Java’s RMI [2]. For a more detailed description of the first architecture, see [3].

GRUMPS events and GRUMPS units

A GRUMPS event (GE) in the first version of the architecture was a piece of application-level data that captured information about an event that had occurred in an executing investigation: information such as the time when the event occurred; the origin of the event; investigation identifiers to place the event into an investigation; and event specific information.

GRUMPS events were sent and received by GRUMPS units. A GRUMPS unit (GU) was a container for a single object, referred to as the contained object or CO. The CO performed the processing over a GE before (typically) sending the processed event to another GU. The CO could not be another GU. The contained object could receive events from a number of sources and could send an event to a number of destinations. Each GU defined a single control interface through which tools could communicate to change the CO at run-time. A reference could also be gained to the CO so that programs could directly communicate with it. GU objects could be composed into a graph which itself could be treated and reused as a single GU.

The CO implemented an interface called Application. This interface defined one method, apply, that took two arguments: a reference to the GE and a reference to the CO’s GU. The apply method returned a GE. In this way a simple receive, apply, send triple was defined for the processing of a received event. The CO was passed a reference to its GU in case the processing of the GE needed to call methods on the enclosing GU. This approach has remained in all subsequent designs and implementations and has been generalized as the project has continued.
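The receive, apply, send triple can be sketched in a few lines of Java. This is an illustrative reconstruction rather than the GRUMPS source: only the names Application and apply come from the text, while the field names of GE, the connect and receive methods on this in-process GU, and the convention that a null result means "emit nothing" are assumptions made for the example. The original system sent events between machines using RMI, which the sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event type carrying only the kinds of field the text lists. */
final class GE {
    final long timestamp;          // when the event occurred
    final String origin;           // where it came from
    final String investigationId;  // which investigation it belongs to
    final String payload;          // event-specific information

    GE(long timestamp, String origin, String investigationId, String payload) {
        this.timestamp = timestamp;
        this.origin = origin;
        this.investigationId = investigationId;
        this.payload = payload;
    }
}

/** Interface and method name come from the text; the exact signature is assumed. */
interface Application {
    GE apply(GE event, GU unit);
}

/** Hypothetical in-process stand-in for a GRUMPS unit. */
final class GU {
    private final Application contained;                      // the contained object (CO)
    private final List<GU> destinations = new ArrayList<>();
    final List<GE> received = new ArrayList<>();              // kept only so the sketch can be inspected

    GU(Application contained) {
        this.contained = contained;
    }

    void connect(GU destination) {
        destinations.add(destination);
    }

    /** Receive, apply, send: hand the event to the CO, then forward its result. */
    void receive(GE event) {
        received.add(event);
        GE processed = contained.apply(event, this);
        if (processed == null) {
            return; // assumed convention: the CO chose not to emit anything
        }
        for (GU destination : destinations) {
            destination.receive(processed);
        }
    }
}
```

Passing the enclosing GU into apply mirrors the design choice described in the text: the CO can call back into its unit without holding a permanent reference to it.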

From the beginning of the development of the run-time system, nine fundamental kinds of GU were identified from which it was intended more complex compositions could be made. Examples of the fundamental GUs are: compress that compresses an event or a group of events into a single event; concentrator that takes N input streams and has one output stream; and minable which takes events and generates events that are suitable for data mining.

Support for topological and functional evolution

The first version of the architecture defined four main interfaces that the programmer was to use to define a graph of GUs and to interact with their contents: Callable for managing application-level connections between GUs and for sending events between them; Control for managing the GU control interface and for sending and receiving control events; Access for accessing the object contained within the GU; and finally the Application interface defined above. Both Callable and Control provide support for defining and redefining the topology of a graph. These interfaces allow GUs to connect and disconnect from one another and to exchange and process investigation-level events. A GU connects to a second GU simply by gaining a reference to it and passing a reference for itself to the second GU’s connect method. To pass an event from a source GU to a destination GU, the source GU calls the receive method of the destination GU which is defined by the Callable interface.

Support for functional evolution was provided via the Access interface defined on a GU. This interface defined get and set methods to get and set the contained object. The contained object was an instance of the Application interface. If a programmer wanted to alter the way an already deployed object processed events, they first of all gained a reference to the GU. They could then either call get to retrieve a reference to the contained object to call one of its methods, or they could call set to replace that object with a new one. The old object and the new one would both be instances of the type Application (or a subtype of it); however, the objects could be instantiated from two completely different classes. In this way, the functional implementation of the investigation could be updated at run-time.
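Functional evolution by reference assignment can be illustrated as follows. The sketch assumes a simplified Application that maps one string to another (the GU argument and the GE type are dropped for brevity), and SwappableUnit is a hypothetical class name; only Access, get and set are named in the text. An AtomicReference is used so that a replacement made on one thread is visible to a thread that is processing events, which is a choice made for this example and not something the paper specifies.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

/** Simplified for the sketch: events are plain strings and the GU argument is omitted. */
interface Application {
    String apply(String event);
}

/** Interface name and method names come from the text; signatures are assumed. */
interface Access {
    Application get();
    void set(Application replacement);
}

/** Hypothetical unit whose contained object can be replaced while it is in use. */
final class SwappableUnit implements Access {
    private final AtomicReference<Application> contained;

    SwappableUnit(Application initial) {
        this.contained = new AtomicReference<>(Objects.requireNonNull(initial));
    }

    @Override
    public Application get() {
        return contained.get();
    }

    @Override
    public void set(Application replacement) {
        // The functional evolution itself is a single reference update.
        contained.set(Objects.requireNonNull(replacement));
    }

    /** Each event is processed by whichever object is installed when it arrives. */
    String receive(String event) {
        return contained.get().apply(event);
    }
}
```

Because callers depend only on the Application interface, the old and new objects can come from unrelated classes, which is the property the paragraph above relies on.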

Evaluation

The main contribution of the first version was basic support for topological and functional evolution. The topology of the investigation could be changed by disconnecting and connecting objects and the functionality of a GU could be changed by simply updating an object reference inside it. Providing the support for functional evolution was made much easier by defining the core set of GRUMPS abstractions in terms of interfaces and this lesson has been built on in all subsequent versions.

RMI proved too cumbersome in our application. Briefly, we found objects had to support sets of methods for both local and remote invocations. In addition, this version of the architecture did not support having more than one object inside a GU. This limitation was addressed in the second version of the architecture.

SECOND ARCHITECTURE

In moving from the first to the second version of the architecture, two main areas were addressed: more flexible support for functional evolution by augmenting the existing container model; and consideration for the scale to which the architecture could extend. For a more detailed description of the second architecture, see [4].

Augmenting the container model

To make functional evolution easier in version two, two abstractions were introduced: GRUMPSContainers and GRUMPSEvents.

A GRUMPSContainer is an object that encapsulates a number of GU objects. In addition, a GU could contain a number of objects that processed the events that it received. GRUMPSContainers are a useful abstraction to have in a distributed system as they have a known location in the network of machines, so that they can be referred to at design-time and run-time by users and tools. GRUMPSContainers also allow an investigator and the run-time system to refer to a group of GU objects as a single unit. This is more concise and easier to deal with. In version one, the GU had a single control interface defined on it. In version two, this idea was generalized so that both the GRUMPSContainer and the GU object could define control channels. Communication through the control channel was synchronous as it is useful to receive a reply when updating an object.

[Figure not reproduced. Labels in the diagram: Network; GrumpsContainer; GrumpsContainer Control Channel; A GU; GU Control Channel; An EventProcessingObject; An event queue; A group of threads; Incoming Events; Outgoing Events; External; Stored; Data.]

Figure 1. A partial network of GRUMPSContainers and their GU objects.

In version two of the architecture, each GU object could have connected to it a number of input channels and each GU object could hold a number of output channels. GUs would use these input and output channels to exchange events. Communication through these channels was asynchronous. A GU was further augmented to contain a number of objects. An event queue was provided, into which received events were placed. An interface type, EventProcessingObject, was also defined that allowed implementations to take events from their queue and process them. The GRUMPS software provides a default class that implements this interface using two threads; one to wait for incoming events and place them onto the queue, and the other to take them from the queue and process them‡. An example network of GRUMPSContainers with several GU objects is given in Figure 1. The design of the GRUMPSContainer is expressed in terms of Java interfaces, so that the approach to functional evolution introduced in version one, that of creating objects instantiated from different classes that implement the same interface, applies to this part of the run-time system.
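The two-thread default implementation can be approximated with the standard java.util.concurrent queues. Everything except the name EventProcessingObject is an assumption here: the start, submit and shutdown methods, the use of strings as events, the in-memory queue standing in for an input channel, and the sentinel used to stop both threads are all choices made for the sketch, and upper-casing stands in for real event processing.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Interface name from the text; these three methods are hypothetical. */
interface EventProcessingObject {
    void start();
    void submit(String event);
    void shutdown();
}

/** One thread moves incoming events onto the queue; a second takes and processes them. */
final class TwoThreadProcessor implements EventProcessingObject {
    // Sentinel compared by identity, so an ordinary event with the same text is unaffected.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> channel = new LinkedBlockingQueue<>(); // stands in for an input channel
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();   // the GU's event queue
    final List<String> processed = new CopyOnWriteArrayList<>();               // results, for inspection

    private final Thread receiver = new Thread(this::receiveLoop, "receiver");
    private final Thread worker = new Thread(this::processLoop, "worker");

    private void receiveLoop() {
        try {
            while (true) {
                String event = channel.take();
                queue.put(event);          // the sentinel is passed on so the worker stops too
                if (event == STOP) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void processLoop() {
        try {
            while (true) {
                String event = queue.take();
                if (event == STOP) {
                    return;
                }
                processed.add(event.toUpperCase()); // placeholder for real processing
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void start() {
        receiver.start();
        worker.start();
    }

    @Override
    public void submit(String event) {
        channel.add(event);
    }

    /** Drains everything submitted so far, then waits for both threads to finish. */
    @Override
    public void shutdown() {
        channel.add(STOP);
        try {
            receiver.join();
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Decoupling receipt from processing in this way lets a slow processing step fall behind without blocking the sender, which fits the asynchronous channels described above.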

‡At the Java-type level it is not possible to discover if the EventProcessingObject interface is being implemented using threads or not. Providing the programmer with this information at evolution time is ongoing work in the area of an object-update protocol. (See the second architecture’s Evaluation section.)

For the second version of the architecture, the GRUMPSEvent-type was modified so that they could carry code with them. This kind of event was to be used when evolving the investigation. A supertype called GRUMPSEvent is provided which defines a single abstract method apply(Object o). The programmer subclasses this type and provides a concrete implementation of the apply method. An instance of this class could be created and sent to either a GRUMPSContainer or a GU object via their control channels. On receipt of this object, a reference to the target object is passed to the apply method. The first action in this method is to take the Object o and cast it into a known type, either a GRUMPSContainer or a GU. Then the code in the event’s apply method is executed which manipulates the object through its public interface. In this way, an object can respond to an open-ended, evolvable set of events; in GRUMPS the event is applied to the object. Manipulating an already deployed object in a new way just entails creating a new subclass of GRUMPSEvent, instantiating it and sending it to the object in question.
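A code-carrying event of this kind might look like the sketch below. GRUMPSEvent and apply(Object o) are named in the text; the void return type, the Serializable marker, the SketchContainer class with its control method, and the AddUnitEvent subclass are hypothetical and exist only to show the cast-then-manipulate pattern. The control channel is reduced to a direct method call, so no network transport is shown.

```java
import java.io.Serializable;
import java.util.LinkedHashSet;
import java.util.Set;

/** Supertype and method name from the text; return type and Serializable are assumed. */
abstract class GRUMPSEvent implements Serializable {
    private static final long serialVersionUID = 1L;

    abstract void apply(Object o);
}

/** Hypothetical stand-in for a GRUMPSContainer, tracking GUs by name only. */
final class SketchContainer {
    private final Set<String> unitNames = new LinkedHashSet<>();

    void addUnit(String name) {
        unitNames.add(name);
    }

    boolean hasUnit(String name) {
        return unitNames.contains(name);
    }

    /** Stands in for the synchronous control channel: the event is applied to this object. */
    void control(GRUMPSEvent event) {
        event.apply(this);
    }
}

/** A new kind of manipulation is introduced by writing a new subclass, not by changing the target. */
final class AddUnitEvent extends GRUMPSEvent {
    private static final long serialVersionUID = 1L;
    private final String unitName;

    AddUnitEvent(String unitName) {
        this.unitName = unitName;
    }

    @Override
    void apply(Object o) {
        // First action, as the text describes: cast the target to a known type.
        SketchContainer container = (SketchContainer) o;
        // Then manipulate it only through its public interface.
        container.addUnit(unitName);
    }
}
```

The target never needs a method per event kind; it only needs to hand itself to whatever event arrives, which is what keeps the set of events open-ended.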

Using the container model

When a GRUMPS programmer starts a Java Virtual Machine (VM), the VM only contains the basic GRUMPS run-time system. The programmer then populates the VM with GRUMPSContainer and GUs. GU objects have a name object associated with them which is used to allow individual GUs to be retrieved from their enclosing GRUMPSContainer. The following two sections describe how a new GU is installed in an existing GRUMPSContainer and then how the EventProcessingObject inside this newly installed GU can be replaced.

Installing a new GU

To install a new GU into an existing GRUMPSContainer, a reference to the GRUMPSContainer’s control channel has to first be gained. A GE object is then instantiated from the class ConnectionRequestControlEvent which is made available as part of the run-time system. This object handles the installation of the new GU (which contains a new EventProcessingObject) in the GRUMPSContainer. When the connection request GE object arrives at the remote GRUMPSContainer instance, a reference to this instance is passed to the GE object’s apply method. The code in the apply method then creates the new GU, initializing it with its EventProcessingObject. This new GU is then added to the GRUMPSContainer and its event queueing and processing threads are started. The new GU has a name object created which abstracts over the GU’s control channel. This name object is passed back to the process that sent the ConnectionRequestControlEvent so that it may communicate with the GU.

Performing a functional evolution

To perform the functional evolution of an EventProcessingObject within a GU, the programmer first of all gains a reference to the GRUMPSContainer that contains the GU of interest. A GE object is then created from the run-time system provided class, UpdateEPOControlEvent. This object carries with it the name of the GU that contains the EventProcessingObject that is to be replaced and the code to create the replacement EventProcessingObject. The GRUMPSContainer interface defines a method getConnection that takes the name of the GU as a parameter and returns a reference to the contained GU. This GU has its setProcessingObject method called to replace the old EventProcessingObject with the newly created one. As a new EventProcessingObject is created, its event processing threads are started and a message is passed back to the initiator of the UpdateEPOControlEvent to inform it that the object has been created successfully.
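The update sequence can be sketched as follows. The names UpdateEPOControlEvent, getConnection, setProcessingObject and EventProcessingObject come from the text; their signatures, the SketchGU and SketchContainer classes, the use of a Supplier to stand for "the code to create the replacement", and the string reply are assumptions. Threads and remote transport are left out so that only the lookup and replacement steps are visible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Simplified for the sketch: processing maps one string event to another. */
interface EventProcessingObject {
    String process(String event);
}

/** Hypothetical GU holding a replaceable processing object. */
final class SketchGU {
    private volatile EventProcessingObject epo;

    SketchGU(EventProcessingObject initial) {
        this.epo = initial;
    }

    void setProcessingObject(EventProcessingObject replacement) {
        this.epo = replacement;
    }

    String receive(String event) {
        return epo.process(event);
    }
}

/** Hypothetical container that resolves GU names to GUs. */
final class SketchContainer {
    private final Map<String, SketchGU> units = new HashMap<>();

    void install(String name, SketchGU gu) {
        units.put(name, gu);
    }

    SketchGU getConnection(String name) {
        return units.get(name);
    }
}

/** Carries the GU name and a factory, so the replacement is built where it is needed. */
final class UpdateEPOControlEvent {
    private final String guName;
    private final Supplier<EventProcessingObject> factory;

    UpdateEPOControlEvent(String guName, Supplier<EventProcessingObject> factory) {
        this.guName = guName;
        this.factory = factory;
    }

    /** Returns a reply for the initiator; the reply format is an assumption. */
    String apply(Object o) {
        SketchContainer container = (SketchContainer) o;
        SketchGU gu = container.getConnection(guName);
        if (gu == null) {
            return "unknown GU: " + guName;
        }
        gu.setProcessingObject(factory.get()); // created at the destination, not shipped
        return "updated " + guName;
    }
}
```

Building the replacement inside apply, at the destination, reflects the point made later in the paper's evaluation: objects that cannot be serialized only have to exist where they are used.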


Scaling the architecture

To support an architecture that was capable of scaling beyond the confines imposed by the first, a number of additional abstractions were introduced. The socket abstraction was chosen to replace RMI as it provides a stream abstraction which matches the abstraction of sending streams of events between GUs.

Additional abstractions

The additional abstractions introduced were ExperimentMonitors and MachineMonitors which were represented at run-time as processes. It was intended these processes would provide additional run-time support for the user constructing an investigation. A hierarchical and extensible naming scheme was also introduced to allow the programmer to name the main abstractions in the run-time system.

At version two of the architecture, a GRUMPS investigation was organized hierarchically. At the top of the hierarchy was the ExperimentMonitor, which was made up of a number of MachineMonitors. Each machine had a number of processes running on it and within each process were a number of GRUMPSContainers which themselves contained a group of GU objects. Thus, at the top of the hierarchy was the single ExperimentMonitor and its leaves were the GU objects.

Dynamically-extensible naming scheme

In order to refer to an object in the run-time system, the programmer created a logical name for it. A logical name consisted of the name of the investigation the object was associated with, a 1024-bit value to uniquely identify the logical name, and a physical name that indicated where in the distributed investigation the object was. Once the logical name was created, it was considered to exist forever, i.e. the logical name could never be reused. However, a single logical name could have its physical name replaced. This allowed objects to be repositioned within the investigation. An object with a new location would have a new physical name, but it would always retain its original logical name.

Attribute objects were used to describe the logical name in a way more meaningful to a human. For example, an attribute object called Configuration Data could be associated with a logical name that represented some investigation configuration data. To create a logical name, a programmer passed a physical name to a name server together with a list of attribute objects. To find a logical name, the programmer submitted a list of attribute objects to the name server. Logical names that matched any of the attribute objects (i.e. their equals method returned true) were placed into the result set. In this way, the run-time system defined a boolean OR-valued lookup on logical names.
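The naming scheme and its OR-valued lookup can be illustrated with a small in-memory name server. The class and method names (LogicalName, SketchNameServer, create, find, relocate) are invented for this sketch, physical names are plain strings, and the 1024-bit identifier is drawn from SecureRandom, which is one plausible way to generate such a value and is not stated in the paper.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** A logical name keeps its identity for ever; only its physical name may change. */
final class LogicalName {
    final String investigation;
    final BigInteger id;          // value in the range of a 1024-bit unsigned integer
    private String physicalName;  // where the object currently lives

    LogicalName(String investigation, BigInteger id, String physicalName) {
        this.investigation = investigation;
        this.id = id;
        this.physicalName = physicalName;
    }

    String physicalName() {
        return physicalName;
    }

    /** Repositioning an object replaces the physical name and nothing else. */
    void relocate(String newPhysicalName) {
        this.physicalName = newPhysicalName;
    }
}

/** Hypothetical single-process name server. */
final class SketchNameServer {
    private final SecureRandom random = new SecureRandom();
    private final Map<LogicalName, List<Object>> attributesByName = new LinkedHashMap<>();

    LogicalName create(String investigation, String physicalName, List<?> attributes) {
        LogicalName name = new LogicalName(investigation, new BigInteger(1024, random), physicalName);
        attributesByName.put(name, new ArrayList<>(attributes));
        return name;
    }

    /** OR lookup: a name matches if any wanted attribute equals any of its own attributes. */
    List<LogicalName> find(List<?> wanted) {
        List<LogicalName> result = new ArrayList<>();
        for (Map.Entry<LogicalName, List<Object>> entry : attributesByName.entrySet()) {
            for (Object attribute : wanted) {
                if (entry.getValue().contains(attribute)) { // List.contains uses equals
                    result.add(entry.getKey());
                    break;
                }
            }
        }
        return result;
    }
}
```

A lookup for "Configuration Data" or "Sensor" therefore returns every name carrying either attribute, so adding attributes to a query can only widen the result set.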

Evaluation

The generalization of GRUMPSContainers and the introduction of GRUMPSEvents with their generic apply method has proved very useful in supporting functional change at run-time. GRUMPSContainers are an abstraction over a collection of GU objects that has a well-known location within the network which can be manipulated as a single object. The core set of types, such as GRUMPSContainer and GU objects, are defined in terms of interfaces. This means that functional change can be easily introduced into a running investigation by assigning new objects that implement the same interface (or a subtype of it). Defining the apply method in an event which takes an instance of type Object also allows the programmer to flexibly process any type of object. The set of events that a deployed object responds to is open-ended. The apply method is called at the destination object, which creates the objects that are to be used in the functional evolution. This has the advantage that the pay-load of the GE is kept as small as possible and that certain objects which cannot be serialized, for example, thread objects, only need to be created at the location where they are needed.

As functional evolution is implemented by performing an object reference update, the state may have to be transferred from the old object to the new object before the reference assignment is executed. This is an area of ongoing work which is being considered under the general title of an object-update protocol. The GRUMPS architecture should automate as much of this state transfer as possible (e.g. when there is a topological evolution) and should provide support for those cases that are semantically richer, for example, by transferring the outstanding event processing queue from one GU to another.

Three main criticisms of the support for evolution can be levelled at the second version of the GRUMPS run-time architecture: its reliance on a client/server model; the hierarchical naming scheme; and the use of Attribute objects. To gain a reference to an object at run-time, a programmer must navigate through a number of servers. This can create a dependency between the object that is being searched for and the technology that is being used to find it (the servers). Such a rigid, interdependent architecture reduces the flexibility of the overall run-time system and hinders the investigator in their rapid reconfiguration of an investigation. In addition, no replication of the servers was provided, so that if one failed, it would not be possible to communicate with any objects further down the hierarchy. This approach was felt to reduce the dependability of the run-time architecture.

Although the naming scheme was extensible, it was based on a hierarchy, with the name of the single investigation at the root. This forced the investigator to associate objects with a single investigation, resulting in investigations that were essentially stand-alone and rigid, unable to share objects. As the GRUMPS project progressed, it became clear that objects could profitably be used simultaneously by more than one investigation.

The use of Attribute objects requires the user of the run-time system to invent names that help describe some other object. If any aspect of this name is based on something that may change from one investigation to another, it may not be possible to find that object in a subsequent version of the investigation, possibly breaking the application or, at least, reducing the implementor’s ability to rapidly reconfigure.

For these reasons, the use of the client/server model, the reliance on a hierarchical naming scheme, the need for so many servers, and the reliance on attribute objects were dropped from the third (and current) version of the architecture.

THIRD ARCHITECTURE

The model of functional and topological evolution based on Java interfaces and object-reference assignment has been retained in the third version of the architecture. In addition, the use of GRUMPSContainer objects, GU objects and EventProcessingObjects has also been retained. The rest of architecture two was discarded and redesigned. This section discusses the new object discovery and communication infrastructure introduced in version three. Typically the objects that will be discovered across the communication infrastructure are the GRUMPSContainers and GU objects introduced in the discussion of the second version of the architecture. Also introduced in the third architecture is a meta-data object for the GRUMPSContainers and GU objects. The meta-data object contains information such as: the number of inter-GU references a GU currently holds; the location of that GU; and the kinds of event that the GU responds to and generates. The meta-data is useful in helping to describe objects at run-time and is typically used by a query to find a set of objects, e.g. all those interested in an event of a particular type.

The third run-time architecture is predicated on two main ideas: in order to achieve scale, a tree-based architecture is used that draws on ideas from the peer-to-peer literature; and to find objects at run-time a dynamic object query is routed through the tree. This run-time system is called Teaq§ and is described later. For a more detailed description of the current communication architecture, see [5].

Teaq increases the flexibility of a GRUMPS investigation by making it easier for the programmer to locate objects in the distributed, run-time system. This provides the programmer with a programming model where they can compose an investigation dynamically at run-time by binding objects together based on the results of running queries. By allowing the interobject composition decisions to be made at run-time, a GRUMPS investigation can more flexibly react to the ever-changing state of the distributed system.

In [6], peers are operating system processes that are on equal terms with each other. One peer-process may be receiving services from another process while simultaneously providing services for a number of other processes.

Teaq trees

A Teaq process is an instance of a Java virtual machine. The third version of the architecture places Teaq processes into a spanning-tree [7], connected via sockets across the network. The left-hand side of Figure 2 shows four processes in such a tree where each process has a number of registered objects. In the Teaq spanning-tree each process has at most one parent process and can decide how many child processes it is willing to support. All processes are in the same tree, so it is possible to route a message from one process to any other process, assuming a route between the two is available in the face of process, machine and network failure. More than one process may be started on a machine and in this case, only one process has a parent that is on a remote machine. All the other processes on the machine connect to the same local parent process.

A spanning-tree has a number of desirable properties that make it suitable for use in a peer-to-peer environment. If there are N processes involved in a spanning-tree, there are N − 1 connections between processes, which ensures a relatively low bound on the total number of connections in the tree. The tree contains no cycles and is an easy data-structure to construct in a dynamically changing environment; to connect to a tree a process only needs to contact its potential parent¶.
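The joining rule and the N − 1 bound can be illustrated with a small in-memory model. This is only a sketch: real Teaq processes are JVMs connected by sockets, and the class TreeProcess, its maxChildren limit and the join method are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of a Teaq process in the spanning-tree. */
class TreeProcess {
    final String name;
    final int maxChildren;           // how many children this process will support
    TreeProcess parent;              // at most one parent; null for the root
    final List<TreeProcess> children = new ArrayList<>();

    TreeProcess(String name, int maxChildren) {
        this.name = name;
        this.maxChildren = maxChildren;
    }

    /** A potential parent may refuse the connection when it is full. */
    boolean accept(TreeProcess child) {
        if (children.size() >= maxChildren) {
            return false;
        }
        children.add(child);
        child.parent = this;
        return true;
    }

    /** Tries candidate parents in order; returns the one that accepted, or null. */
    TreeProcess join(List<TreeProcess> candidates) {
        for (TreeProcess candidate : candidates) {
            if (candidate.accept(this)) {
                return candidate;
            }
        }
        return null;
    }
}

class TreeDemo {
    /** Counts parent-child connections by walking down from the root. */
    static int connections(TreeProcess root) {
        int count = root.children.size();
        for (TreeProcess child : root.children) {
            count += connections(child);
        }
        return count;
    }

    public static void main(String[] args) {
        TreeProcess root = new TreeProcess("root", 1);
        TreeProcess a = new TreeProcess("a", 2);
        TreeProcess b = new TreeProcess("b", 2);
        a.join(Arrays.asList(root));
        b.join(Arrays.asList(root, a));   // root is full, so b falls back to a
        System.out.println("connections: " + connections(root));
    }
}
```

Each successful join adds exactly one connection, so a tree of N joined processes (including the root) has N − 1 connections and cannot contain a cycle.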

Each Teaq process has a Capacity value associated with it. This value represents the processing capacity of the machine on which it is executing. If a Teaq process is running on a machine that has a large amount of physical memory, a fast CPU, and it is attached to a fast network with a lot of bandwidth, the Capacity value will reflect the capabilities of this machine.

§Teaq stands for Trees, Evolution and Queries and the word is pronounced the same as ‘teak’.

¶In Teaq, it is possible for this parent to refuse the connection, in which case, another parent has to be chosen. This should be quite rare and the number of alternative parents contacted is low, perhaps less than ten.

Figure 2. Propagating a query through a Teaq tree and initializing a proxy object. [Left-hand side: the query "select gc from GRUMPSContainer gc where gc.processedEvents() > 20;" travels in three hops (Hop 1, Hop 2, Hop 3) from the QueryInitiator to the QueryMatch. Right-hand side: (1) a proxy object is returned to the QueryInitiator and (2) used to invoke the remote GRUMPSContainer object. Key: a Teaq process containing three registered objects.]

The Teaq tree is tolerant of process and machine failure. If a parent process fails, all of its children are temporarily disconnected from the tree. Should this happen, when a child process next needs to communicate with its parent, an error will be received at the child and that child will reattach itself to the tree. In the same way that a remote parent process may fail, a local parent process may fail. If this happens, the first local child process to notice the failure runs a leader election protocol to establish itself as the process that has the remote parent process. All other local processes then make this process their new parent.

Teaq queries

Objects are found in Teaq by running queries across the tree. A programmer makes an object available for local or remote query by registering it (and implicitly, all objects reachable from the registered object) with the Teaq run-time system. Currently, a Teaq query looks similar to a simple OQL [8] query. The query on the left-hand side of Figure 2 will match all instances of registered GRUMPSContainer objects for which the query condition is true.

At the query initiating process, a listener object is created that will be called back when a query result has been found. To send out the query (as a GRUMPSEvent) into the tree, the programmer calls a method on the Teaq run-time. The method returns an object which represents the remote query. The query is then routed through the tree (the three hops on Figure 2). The query is dynamic in the sense that at any one process it can decide whether to copy itself to the current process’ parent (if there is one) or whether to copy itself to any child processes that the current process may possess. In the current version of Teaq, a default query is provided that visits all the processes in the tree. When a query match is found, the matching instances are serialized, ready for sending back to the query initiator. Teaq defines a facility for the programmer to process the result objects as they are being serialized.
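The default routing behaviour, in which a query visits every process and calls back a listener for each match, can be sketched as follows. The class QueryNode and its route method are hypothetical; the sketch runs in a single JVM and omits serialization, sockets and the GRUMPSEvent wrapper described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical in-memory model of a Teaq process holding registered objects. */
class QueryNode {
    final String name;
    QueryNode parent;
    final List<QueryNode> children = new ArrayList<>();
    final List<Object> registered = new ArrayList<>();

    QueryNode(String name) { this.name = name; }

    void addChild(QueryNode child) {
        children.add(child);
        child.parent = this;
    }

    /**
     * Evaluates the query locally, then copies it to the parent and to every
     * child, skipping the neighbour the query arrived from. Pass null as
     * 'from' at the initiating process.
     */
    void route(Predicate<Object> query, Consumer<Object> listener, QueryNode from) {
        for (Object candidate : registered) {
            if (query.test(candidate)) {
                listener.accept(candidate);   // call back the initiator's listener
            }
        }
        if (parent != null && parent != from) {
            parent.route(query, listener, this);
        }
        for (QueryNode child : children) {
            if (child != from) {
                child.route(query, listener, this);
            }
        }
    }
}

class QueryDemo {
    public static void main(String[] args) {
        QueryNode root = new QueryNode("root");
        QueryNode leaf = new QueryNode("leaf");
        root.addChild(leaf);
        root.registered.add(25);
        leaf.registered.add(10);
        List<Object> results = new ArrayList<>();
        leaf.route(o -> o instanceof Integer && (Integer) o > 20, results::add, null);
        System.out.println(results);
    }
}
```

Since the tree has no cycles, skipping the sending neighbour is enough to guarantee that each process evaluates the query exactly once.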


In some cases, the programmer may not want to send back a serialized copy of the matched object. Rather, they would prefer to pass back a proxy to the matched object. Teaq provides this facility so that remote references to matched objects may be established (as on the right-hand side of Figure 2). As time has passed between the left-hand and right-hand sides of the figure, the shape of the graph of registered objects at some of the processes has changed. This is because of normal program execution; some objects have had others assigned to internal fields, others have assigned null to these fields.

Using the Teaq environment

To make a GRUMPSContainer object available for query, a programmer registers it (or a graph of objects containing it) with the Teaq run-time system. A program would then issue a query that searched for instances of GRUMPSContainer. On finding such an instance, a serializable proxy object is created at the matching side which is connected to the GRUMPSContainer’s control channel. This proxy object is sent back to the query initiator. The query initiator then communicates with the GRUMPSContainer using the proxy object. In the current version of Teaq, the proxy object is generated as follows. The GRUMPSContainer class implements an interface that is known to the run-time system. When the matched GRUMPSContainer instance is serialized, the run-time system checks the instance to see if it implements this interface. As it does, a method is called that generates the proxy object and this proxy object is passed to the Java serialization mechanism, to be passed back to the query initiator.
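One way to obtain this behaviour with standard Java serialization is to subclass ObjectOutputStream and substitute objects as they are written. Whether Teaq does it this way is not stated in the text, so the sketch below is an assumption; ProxySource, ContainerProxy, DemoContainer and ProxyingOutputStream are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

/** Hypothetical interface "known to the run-time system". */
interface ProxySource {
    Serializable createProxy();
}

/** Hypothetical proxy; a real one would talk to the container's control channel. */
class ContainerProxy implements Serializable {
    private static final long serialVersionUID = 1L;
    final String controlAddress;
    ContainerProxy(String controlAddress) { this.controlAddress = controlAddress; }
}

/** Stand-in for a GRUMPSContainer; deliberately not Serializable itself. */
class DemoContainer implements ProxySource {
    final String controlAddress;
    DemoContainer(String controlAddress) { this.controlAddress = controlAddress; }
    public Serializable createProxy() { return new ContainerProxy(controlAddress); }
}

/** Output stream that swaps any ProxySource for its proxy while serializing. */
class ProxyingOutputStream extends ObjectOutputStream {
    ProxyingOutputStream(OutputStream out) throws IOException {
        super(out);
        enableReplaceObject(true);   // switch on the replaceObject hook
    }

    @Override
    protected Object replaceObject(Object obj) {
        return (obj instanceof ProxySource) ? ((ProxySource) obj).createProxy() : obj;
    }
}

class ProxyDemo {
    /** Serializes a match and reads back what the query initiator would receive. */
    static Object roundTrip(Object match) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ProxyingOutputStream(bytes)) {
            out.writeObject(match);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Object received = roundTrip(new DemoContainer("host-a:9000"));
        System.out.println(received.getClass().getSimpleName());
    }
}
```

The matched object never leaves its process; only the small proxy crosses the stream, which is why the container in the sketch does not need to be serializable.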

Teaq provides an object discovery service in a peer-to-peer environment. Once the query initiator has the proxy object, it can communicate with the GRUMPSContainer directly. If the process that contains the remote GRUMPSContainer fails, the proxy object will throw an exception when it next attempts to communicate with the remote object. In this case, the query initiator can send a new query into the system to find another GRUMPSContainer object.
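The recovery pattern described here, catching the exception from a dead proxy and issuing a fresh query, might look like the sketch below. RemoteTarget and FailoverSender are hypothetical, and the Supplier stands in for sending a Teaq query and taking one of its results.

```java
import java.io.IOException;
import java.util.function.Supplier;

/** Hypothetical proxy interface; send() fails if the remote process has gone. */
interface RemoteTarget {
    void send(String event) throws IOException;
}

/** Keeps one proxy and replaces it by re-running a query when a send fails. */
class FailoverSender {
    private final Supplier<RemoteTarget> query;   // stands in for issuing a query
    private RemoteTarget current;

    FailoverSender(Supplier<RemoteTarget> query) {
        this.query = query;
        this.current = query.get();
    }

    /** Returns true if delivered, false if no working target could be found. */
    boolean send(String event, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts && current != null; attempt++) {
            try {
                current.send(event);
                return true;
            } catch (IOException failed) {
                current = query.get();            // look for another target
            }
        }
        return false;
    }
}
```

Bounding the number of attempts is a design choice of the sketch; what to do when no replacement exists is a policy decision that the paper leaves to the investigation designer.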

Evaluation

The third architecture is a much more flexible model for the programmer to work with; all the previously identified criticisms of the second version of the architecture have been addressed in the third version. To find an object, a query is sent into the run-time system; no server has to be contacted. No additional names have to be created to describe an object; the object query uses the characteristics inherent in an object to find it (e.g. its type, method signatures, the result of method calls, and any public fields it may define). Thus, objects are self-describing. In addition, no limitation on which investigation may use them is imposed.
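Matching on an object's own characteristics can be approximated with Java reflection. The sketch below selects objects by the result of a named method, in the spirit of the condition gc.processedEvents() > 20 from Figure 2. It is not the Teaq query engine; ReflectiveQuery and CountingContainer are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class ReflectiveQuery {
    /**
     * Returns the registered objects that have a public no-argument method
     * with the given name whose numeric result exceeds the threshold.
     */
    static List<Object> select(List<Object> registered, String methodName, long threshold) {
        List<Object> matches = new ArrayList<>();
        for (Object candidate : registered) {
            try {
                Method method = candidate.getClass().getMethod(methodName);
                Object result = method.invoke(candidate);
                if (result instanceof Number && ((Number) result).longValue() > threshold) {
                    matches.add(candidate);
                }
            } catch (ReflectiveOperationException notAMatch) {
                // No such method, or it could not be called: the object does not match.
            }
        }
        return matches;
    }
}

/** Hypothetical stand-in for a GRUMPSContainer exposing processedEvents(). */
class CountingContainer {
    private final int processed;
    CountingContainer(int processed) { this.processed = processed; }
    public int processedEvents() { return processed; }
}
```

No name has to be invented or registered for the container: the method it already exposes is the only information the query needs.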

In this way, the rigid support structure introduced in version two has been replaced with one that is much more dynamic, allowing a programmer to compose an investigation dynamically at run-time by binding objects together based on the results of running a query. The program discovers an object that performs a certain task by formulating a query to find it. This will still require the programmer to provide information so that the object can be found, but in Teaq, this information is part of the object. Thus, the programmer benefits from being able to use the usual object-oriented programming concepts such as encapsulation. Joining and leaving the tree is handled for processes by the Teaq run-time, although process failure still needs to be handled by the programmer as application-level objects will fail to respond in this case.


Figure 3. An example of a run-time investigation. [Labels recoverable from the figure: Users 1 to 4; user data; control; data; user data collection GU; data cleaning GU; GRUMPSEvents (GEs); cleaned user data. Key: Teaq parent reference; inter-GU reference; Teaq process.]

USING THE THIRD ARCHITECTURE

The discussion so far has focussed on the evolution of the GRUMPS run-time architecture. This section describes how the third architecture might be used to perform an investigation.

An example investigation

A simple investigation based on collecting user action data (key presses, mouse clicks and window focus changes) from instrumented computers is used as an example and is illustrated in Figure 3.

On the left of the figure, the application-level view of the run-time system is shown where the interconnections between the GUs have been emphasized. The application consists of four instrumented computers that send their data to a user data-collection GU. This GU is responsible for collecting the data from the computer and managing the application’s interaction with it, via the control reference, e.g. to stop the collection process on the user’s machine. This GU then turns the raw user-action data into a GRUMPSEvent. A GRUMPSEvent for each kind of user-action is defined, e.g. window focus change, which contains standard information, such as the time the event occurred, together with event specific data, such as the names of the windows changed from and to. These events are then sent to the data-cleaning GU which cleans and transforms the data (e.g. anonymizing the user’s identity), ready for writing the information into a database. At some later stage an analysis on this data will be performed to answer such questions as how long the users spend reading e-mail each day. On the right of Figure 3 is the Teaq process-level view of the same application. In this part of the figure the topology of the Teaq tree has been emphasized to show how the application-level view could be mapped onto a Teaq tree. In the Teaq process-level view it is assumed that the four processes are running on four different computers.

Data-driven investigations

There are many potential investigations that might involve the collection of user action data of the sort described in our example. We may characterize these investigations in terms of their goals, consequent data requirements, data analysis, plus additional contextual information, such as the staff involved. Of particular interest for the topic of this paper are the differences between investigations with respect to the stability or volatility of goals, data requirements and analytic techniques. At one extreme are scientific experiments for which data requirements are typically fixed and well understood, as are the analytic techniques by which data contributes to meeting the investigation goals (usually confirming or disconfirming pre-defined hypotheses). Data mining provides a middle ground in which the data sources and techniques are fixed, but detailed investigation goals may emerge during the course of the investigation. At the other extreme are ‘data-driven’ investigations that arise in part as the result of data collection opportunities, allowing for interests, guesses and hypotheses to develop in tandem with the collection and analysis of data. Although the GRUMPS architecture can support all three investigation types, it is the last that provides the greatest challenge for our architecture because it presents the greatest likelihood of changes to goals, data sources, types of data required, and the types of analysis performed on the data.

On the GRUMPS project, in addition to developing the data-capture system presented in this paper, we are also working on support for the entire process of data-driven investigation. We have developed an investigation repository and associated tools to enable users to develop and manage the investigation process, including the following stages, typically performed interleaved and iteratively: goal formulation; investigation design and preparation; data-collection, evolution and tear-down; and data analysis and investigation reconfiguration (i.e. a change to any of the previous stages).

The repository contains contextual information (e.g. about the investigators, related investigations, and investigation history) together with representations of relevant information about all of the investigation stages (e.g. investigation goals, data-capture system specification, schema for raw and ‘cleaned’ data, and analysis results). Investigators can store and perform queries over the data from the current data collection system, even as data is being collected.

Investigation design and preparation

Investigation design involves deriving data requirements from goals, identifying analytic techniques to be employed, and acquiring or constructing the necessary data collection and analysis instruments and tools. For data-driven investigations, design activity may be highly interactive, using feedback from early data-capture to formulate or clarify questions from which a more focused investigation may take place. This may involve writing the software to capture the data from the user’s machine and the necessary code to convert the raw data into events and send them from the data-collection GU to the data-cleaning GU and into the repository. The schema for the data-collection database also needs to be designed and implemented.


At the design phase, non-technical factors also need to be considered, for example, the privacy of the user. This may be handled in our case by storing an anonymizing value for the user’s identity in the database‖. In Figure 3 there is one data-collection GU per machine. It would be possible to send the data for a number of computers to a single GU. However, in designing it this way, a single GU becomes a proxy for a single machine. To control the data collection at that machine just requires a reference to be obtained to that GU.

Investigation deployment

Once the investigation has been designed, it needs to be deployed. This may involve installing devices onto a network and other infrastructure issues, such as configuring firewalls to allow certain data to pass through. In our example case, the deployment of the software involves installing the data-capture software onto the user’s machine and installing the GU code on other computers. The database also needs to be initialized. Currently, this has to be performed manually; however, in the future, GRUMPS intends to automate parts of these tasks. The investigation repository would contain a description of what needed to be deployed and where, and certain of these deployment tasks could be performed by the run-time system.

Data collection initialization

The data collection is initialized as a separate task from deployment because deployment ensures the system is capable of being run. Investigation initialization ensures that the system can be correctly started. At the initialization phase, there may be a dependency on the order in which processes are started. For example, the data collection database needs to be made available before the data-cleaning GU can make use of it. This kind of information would be stored in the investigation repository and the GRUMPS investigation management tools would have access to it.

There are two main ways to initialize the example system, eagerly and lazily. An eager initialization involves starting the data-collection code and the Teaq processes forming a tree. Once this has been completed, a query would be sent around the tree by the data-collection GUs so they find their data-cleaning GU. After the GUs have connected to each other, the left-hand side of Figure 3 is initialized. However, it is also possible to initialize this system in a more lazy fashion, although this is more complex to design for and implement. When the user starts to use their computer and data is sent to the user data-collection GU, this GU could dynamically find a suitable data-cleaning GU. Thus, rather than initialize the system as a separate task from using it, the system is initialized on an as-needed basis. Such an initialization may be appropriate in a system that has limited resources. The dynamic interconnection of GUs is also applicable in the error case which is discussed in the Data Collection section.
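The difference between the two styles can be reduced to when the binding query is run. In the sketch below the hypothetical class LazyBinding defers the query until the first event needs forwarding; the Supplier stands in for a query that finds a data-cleaning GU.

```java
import java.util.function.Supplier;

/** Binds to a target the first time it is needed rather than at start-up. */
class LazyBinding<T> {
    private final Supplier<T> finder;   // stands in for issuing a query
    private T target;

    LazyBinding(Supplier<T> finder) { this.finder = finder; }

    /** Runs the query on first use only; later calls reuse the result. */
    synchronized T get() {
        if (target == null) {
            target = finder.get();
        }
        return target;
    }

    synchronized boolean isBound() { return target != null; }
}
```

An eager initialization corresponds to calling get() on every binding during start-up, so the same class can serve both styles.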

In Figure 3 there are two distinct communication paths defined: the application-level and the Teaq process-level. Two of the user data-collection GUs are directly connected to the same data-cleaning GU. This interconnection may not reflect the underlying network topology, nor the topology of the Teaq tree. Thus, two GU objects that are close together because they are directly connected may be widely separated in terms of the Teaq tree. Currently, there is no way to optimize either the Teaq tree or the application-level interconnections to take advantage of the other’s topology. This is an area for future work.

‖It is often useful to actually store the user’s identity so that as much information is retained as possible, although if this is done, the designers of the system must ensure the data is treated appropriately. Retaining data in this way can be useful when debugging the system.

Data collection, evolution and tear-down

The next main stage is data collection execution and evolution, which involves an aspect of monitoring. In addition, some investigations will be torn down, which requires terminating the processes and saving any buffered data.

Data collection

The investigation is executed when the user logs in to their machine and starts to use it, causing information on their actions to be sent to the data-collection GU. This information will be anonymized and then forwarded to the data-cleaning GU which may remove some unwanted pieces of information (e.g. the key depression events, retaining only the key raising events), before writing the data to the repository. The choice of whether the data is anonymized is something that could be defined by the investigator, by telling the system their preferences via a user interface that is part of the code executing on their local machine.
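The two processing steps just described can be sketched in a few lines. For brevity the sketch performs both steps in one method, whereas the text assigns them to different GUs. UserAction, Cleaner and the event kind strings are hypothetical, and an unsalted SHA-256 digest is only one possible anonymizing value, chosen here purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical user-action event; the real GRUMPSEvent classes may differ. */
class UserAction {
    final String user;
    final String kind;        // e.g. "KEY_DOWN", "KEY_UP", "MOUSE_CLICK"
    final long timeMillis;

    UserAction(String user, String kind, long timeMillis) {
        this.user = user;
        this.kind = kind;
        this.timeMillis = timeMillis;
    }
}

class Cleaner {
    /** Replaces a user identity with the hex form of its SHA-256 digest. */
    static String anonymize(String user) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(user.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    /** Drops key depression events and anonymizes the remainder. */
    static List<UserAction> clean(List<UserAction> raw) {
        List<UserAction> cleaned = new ArrayList<>();
        for (UserAction action : raw) {
            if (action.kind.equals("KEY_DOWN")) {
                continue;     // keep only the key raising events
            }
            cleaned.add(new UserAction(anonymize(action.user), action.kind, action.timeMillis));
        }
        return cleaned;
    }
}
```

The same identity always maps to the same value, so per-user analysis remains possible on the cleaned data.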

An error may occur in the example system, e.g. the Teaq process that contains the data-cleaning GU for users 1 and 2 may crash or it may be removed from the system. In this case, when these data-collection GUs next attempt to send their data forward, they will receive a Java language-level exception as the outgoing socket will no longer be valid. This exception must be caught and a query issued to find another suitable data-cleaning GU. The designer of the investigation may have decided that if one cannot be found, the data-collection GU should ask the GRUMPS run-time system to deploy a replacement for the crashed GU. This would involve communication with the investigation repository and how such errors are resolved is an issue for future work. Delegating the decision to another component in the system removes the need to hard-wire into the deployed application the code and policy necessary to resolve the problem. The investigator can concentrate on performing investigation-related processing in the application, and place error correction code and policy elsewhere in the system (in this case, in the investigation repository which is capable of changing the topology and implementation of the application at run-time to handle the error situation).

In our current implementation, the programmer has to deal with such error cases and arrange for a solution to be found, e.g. requiring them to write the code to deploy the replacement data-cleaning GU. However, this approach is not always applicable as it can build into the code too much information that may change during the course of an investigation. For example, the code may attempt to start the replacement GU on a machine that has been switched off. Therefore, retaining most of the system configuration inside the investigation repository is a more flexible approach.

Data collection evolution

During the course of the investigation it may be necessary to update one of the GUs. For example, a new requirement may arise for cleaning the data. This can be captured in a new GU which can be deployed into the system at run-time.


This would be managed by the GRUMPS management tools and the current GRUMPS run-time supports the evolution discussed here. An investigator would begin by connecting the appropriate management tool to the executing investigation. They then issue a query to find all the data-cleaning objects in the system. The management tool then has a reference to each data-cleaning object. Using the container model discussed in the section on the second architecture, the investigator would install the new data-cleaning GU at the appropriate places in the system. Where appropriate, new GUs could be installed automatically by the run-time system by following a description held in the investigation database.

In the GRUMPS approach to evolving a system it is assumed that the new GU has been tested and that it is safe to introduce it into the running system. The GRUMPS system does not provide any support to test the new GU. For example, the investigator could run a dummy investigation and update it with the new GU to test its correctness. However, this aspect of evolving a system is outside the scope of the GRUMPS project. One advantage of using a database to store the investigation is that, should the newly installed GU prove to be faulty in some way, an undo can more easily be performed, putting the old GU back in place. Of course, the investigator must program the system so that the replacement of a GU in this way can be tolerated.

As part of evolving a GRUMPS-based investigation, it may be possible to have a single GU take part in a number of investigations simultaneously. In the second version of the architecture this was not possible because the hierarchical naming scheme (see the section Dynamically-extensible naming scheme) had a single root that was the name of the investigation the object was a part of. This would make it difficult to share objects between investigations. In the current architecture, no such hierarchical naming scheme is used; rather, objects are found by running queries. An object can therefore take part in any number of investigations as it will be found by any appropriate query. It would have been possible to extend the hierarchical naming scheme to have a multivalued root. However, the other disadvantages of such a system still exist, e.g. the dependency on needing a name server (see the Evaluation section of the second architecture). The use of the object query system makes the current architecture more flexible; binding to an object only involves running a query and selecting the appropriate object from the results passed back. This is a more responsive solution than using a naming scheme which can hard-wire into source code the location of name servers. The advantage of the distributed query approach is that if an appropriate object is available in the system, a query will eventually find it; thus the supplier of the service and the consumer can be more decoupled from one another. The consumer only has to formulate an appropriate query; no other shared information or agreement between the two parties has to exist, other than the information present in the query.

Currently, the programmer has to manage those aspects of evolution that are a part of working with a distributed system, e.g. concurrent updates. Using the facilities of the Java programming language (such as synchronized methods) it is possible for the programmer to build code that can perform the evolution correctly in a concurrent system. However, as part of future work a richer and more powerful object-evolution and update protocol is being considered.
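As an illustration of the synchronized-method approach mentioned above, the sketch below serializes event processing and handler replacement on one monitor, so an update can never interleave with an event that is part-way through processing. Handler and EvolvableUnit are hypothetical names, and the sketch covers a single JVM only.

```java
/** Hypothetical processing interface implemented by old and new behaviour. */
interface Handler {
    int handle(int value);
}

class EvolvableUnit {
    private Handler handler;
    private int processed;

    EvolvableUnit(Handler initial) { handler = initial; }

    /** Processes one event; holds the monitor for the whole call. */
    synchronized int process(int value) {
        processed++;
        return handler.handle(value);
    }

    /** Replaces the behaviour; waits for any in-flight event to finish first. */
    synchronized void replaceHandler(Handler replacement) {
        handler = replacement;
    }

    synchronized int processedCount() { return processed; }
}
```

Holding one lock for every event limits throughput, which is one reason a richer update protocol might be preferable in practice.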

Investigation tear-down

Once an investigation has been completed, the investigator may decide to remove it from the system it occupies. In our example system, the data-collection GUs could send to their data-cleaning GUs an event to indicate that the user has decided to finish the investigation. When the data-cleaning GU has seen such an event from each of its data-collection GUs it would flush all outstanding events to the database and terminate itself. The investigation application would then have correctly terminated. Alternatively, the investigator may decide to keep their application running and only change it after having performed some analysis of the data.
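The termination rule for the data-cleaning GU, flushing only after every data-collection GU has signalled that it is finished, can be sketched as below. CleaningUnit is hypothetical, sources are identified by plain strings, and a List stands in for the database.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CleaningUnit {
    private final Set<String> awaiting;                  // collection GUs yet to finish
    private final List<String> buffer = new ArrayList<>();
    private final List<String> database;                 // stands in for the repository
    private boolean terminated;

    CleaningUnit(Set<String> sources, List<String> database) {
        this.awaiting = new HashSet<>(sources);
        this.database = database;
    }

    /** Buffers an ordinary event from a data-collection GU. */
    void onEvent(String source, String event) {
        if (terminated) {
            throw new IllegalStateException("already terminated");
        }
        buffer.add(event);
    }

    /** Handles a finish event; flushes and terminates once all sources are done. */
    void onFinished(String source) {
        awaiting.remove(source);
        if (awaiting.isEmpty() && !terminated) {
            database.addAll(buffer);                     // flush outstanding events
            buffer.clear();
            terminated = true;
        }
    }

    boolean isTerminated() { return terminated; }
}
```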

Data analysis and investigation reconfiguration

At any stage during the investigation the collected data can be analysed to address the motivating questions. In our specific case, the investigator would discover how long each user spent reading their e-mail by querying the database to sum up the times from the events that indicate the e-mail application has had one of its windows activated. Answering such a question may motivate another investigation and so the investigator may start the same investigation again, or they may decide to modify the currently deployed investigation in order to answer new questions, based on the collection of different data.
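The e-mail question can be answered from window focus change events alone. The text envisages a database query; the sketch below performs the equivalent summation in memory over time-ordered events. FocusChange and FocusAnalysis are hypothetical, with fields following the event contents named earlier (event time, window changed from, window changed to).

```java
import java.util.List;

/** Hypothetical window focus change event. */
class FocusChange {
    final long timeMillis;
    final String fromWindow;
    final String toWindow;

    FocusChange(long timeMillis, String fromWindow, String toWindow) {
        this.timeMillis = timeMillis;
        this.fromWindow = fromWindow;
        this.toWindow = toWindow;
    }
}

class FocusAnalysis {
    /**
     * Sums the time between focus entering and next leaving the named window.
     * Events must be in time order; a visit still open at the end is ignored.
     */
    static long millisInWindow(List<FocusChange> events, String window) {
        long total = 0;
        long enteredAt = -1;                 // -1 means the window is not focused
        for (FocusChange e : events) {
            if (enteredAt < 0 && e.toWindow.equals(window)) {
                enteredAt = e.timeMillis;
            } else if (enteredAt >= 0 && e.fromWindow.equals(window)) {
                total += e.timeMillis - enteredAt;
                enteredAt = -1;
            }
        }
        return total;
    }
}
```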

THE PERVASIVENESS OF EVOLUTION

This section summarizes the experiences of developing the three versions of the GRUMPS architecture. The approach taken to functional evolution has remained the same in all three versions (see Table I for a summary of the evolution of the GRUMPS architecture). The approach is based on the use of Java interfaces and the provision of different functionality by implementing a given interface with different classes. This approach has been sufficiently flexible and powerful that alternative solutions have not been required. Of course, this needs additional architectural and run-time support to make this usable by an investigator with respect to a given data model and schema.

In contrast, the design and implementation-support for topological evolution has itself evolved during the three versions of the architecture. In the first version, this was based on method invocations. This proved to be too rigid at the type level due to RMI and so was reimplemented, basing it on the socket abstraction. This has been retained from version two to version three and seems flexible enough to use in the future.

Because GRUMPSEvents are applied to target objects, and because the third version provides a flexible object discovery system, key decisions can be delayed until run-time. Certain information that the investigator needs in order to retarget their investigation, or to adapt it to the distributed system over which it is executing, may only be available at run-time.

Over the course of developing the three versions, it was important to throw away models, designs and implementations that would not work in the light of the new requirements (cf. throw-away prototyping in [9]). This can be seen in moving from version one to version two and the need to dispense with using RMI, preferring ease of topology change over ease of method invocation. In moving from version two to version three, the requirement to retain scale was important but not at the expense of being able to support the rapid change of a deployed investigation.

It was possible to throw away so much of versions one and two because of the high degree of separation between the two levels of the architecture and their implementation and the investigation-level code. As the investigation-level code (the GRUMPSContainers, GU objects and the EventProcessingObjects) is typed from interfaces, an architecture can be more easily built using them as long as that architecture only makes use of the interface types. Changing the implementation of the investigation-level code is then easy as it does not have an impact on the architecture. Changing the architecture is also made easier as it only makes use of the interface types, thus there are fewer dependencies between it and the investigation-level code.

Table I. A summary of the evolution of the three versions of the GRUMPS architecture.

            Retained               Introduced                  Dropped
Version 1                          Updating GU                 RMI
                                   Topology Evolution
                                   Functional Evolution
Version 2   Updating GU            GCs                         Client/Server Architecture
            Topology Evolution     GEs                         Hierarchical Naming
            Functional Evolution   Sockets                     Attribute Objects
                                                               Need for servers
Version 3   GCs                    Peer-to-Peer Architecture
            GEs                    Distributed Queries
            Sockets

RELATED WORK

This section describes the main areas of work that are related to GRUMPS: peer-to-peer architectures, the Grid, and Web services.

Peer-to-peer

The defining characteristic of modern peer-to-peer systems is that no single process fulfils a more important role than any other in the system. All processes are equals (peers) of each other in terms of the services they contribute to the system and those that they consume. In addition, there is no central process or computer in a peer-to-peer system, so it is difficult to prevent such a system from delivering a service by removing a single machine from it. This discussion of peer-to-peer systems is divided into two broad categories: file-based peer-to-peer systems, and those based on sharing computation.

File-based peer-to-peer systems

Currently, the most popular application of peer-to-peer systems is file discovery and downloading, as exemplified by Gnutella, FreeNet and Pastry [6]. A user of such a system contributes a number of files to the generally accessible pool of files that all the peers in the system may access. The user in turn may access all the files contributed by the other users of the system. These systems are typically concerned with file discovery, caching (to improve availability and performance) and file access, via downloading. One characteristic of these systems is that the files they store tend to be long-lived, with lifetimes that can be measured in days and weeks. As a result, the run-time of such a system usually maintains some form of table that maps a file identifier to the identifier of a host from which the file may be downloaded.

The Teaq run-time is also concerned with the discovery of objects and providing access to them. However, Teaq is designed to locate and provide access to Java language-level objects, which come into existence and are destroyed at a much higher rate than the files accessed in the file-based peer-to-peer systems described above. The rate at which objects are created and destroyed can be measured in milliseconds and seconds. In such an environment, maintaining tables is not viable, as the system would spend most of its time managing them. Therefore, the Teaq system discovers objects by routing a query through the tree.
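The idea of discovering objects by routing a query through a tree, rather than by consulting an identifier-to-host table, can be sketched as follows. This is an illustration under stated assumptions and not Teaq's implementation: the class QueryTreeNode, its fields and the use of a Predicate as the query are all hypothetical, and the sketch runs in a single process whereas Teaq routes queries between processes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical node in a tree of processes; each node hosts short-lived objects.
class QueryTreeNode {
    final String name;
    final List<QueryTreeNode> children = new ArrayList<>();
    final List<Object> localObjects = new ArrayList<>();

    QueryTreeNode(String name) { this.name = name; }

    QueryTreeNode addChild(QueryTreeNode child) {
        children.add(child);
        return this;
    }

    // Evaluates the query against the objects held here, then forwards it
    // to every subtree. No table of object locations is consulted.
    List<Object> route(Predicate<Object> query) {
        List<Object> matches = new ArrayList<>();
        for (Object o : localObjects) {
            if (query.test(o)) {
                matches.add(o);
            }
        }
        for (QueryTreeNode child : children) {
            matches.addAll(child.route(query));
        }
        return matches;
    }
}

class TreeQueryDemo {
    public static void main(String[] args) {
        QueryTreeNode root = new QueryTreeNode("root");
        QueryTreeNode left = new QueryTreeNode("left");
        QueryTreeNode right = new QueryTreeNode("right");
        root.addChild(left).addChild(right);

        left.localObjects.add("mouse-event-buffer");
        right.localObjects.add(Integer.valueOf(42));
        right.localObjects.add("keyboard-event-buffer");

        // Query: every String object whose text mentions "event".
        List<Object> found =
            root.route(o -> o instanceof String && ((String) o).contains("event"));
        // prints [mouse-event-buffer, keyboard-event-buffer]
        System.out.println(found);
    }
}
```

Because matching happens where the objects live, objects can be created and destroyed without any bookkeeping elsewhere in the system; the cost is that each query visits the nodes of the tree.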

Computation-based peer-to-peer systems

A well-known example of a computation-based peer-to-peer system is the Seti@Home initiative [6]. The search for extraterrestrial intelligence (SETI) analyses radio telescope data, which is collected into work units and stored on a central computer. Users donate cycles to the search by installing the Seti@Home software on their own machines. The software runs as a screen-saver that downloads a work unit from the central computer and searches it for interesting signals. In this way, the idle cycles of 3.7 million users’ machines have been used to search the radio telescope data. Another example of a computation-based system is the network of workstations (NOW) project at Berkeley [10].

Teaq can support the development of a computation-based peer-to-peer system. Queries would be run at idle machines in order to find objects that hold appropriate data. Once found, these objects would be contacted directly to download data over which the idle machine could execute. Additional queries could then be run to find objects capable of accepting the results of the computation.

Teaq can be used as a programming environment for the development of peer-to-peer systems. A Teaq-based system would place its processes into a tree and route queries across that tree to discover relevant objects at run-time. In this way, both the file-based and the computation-based peer-to-peer systems already identified could be built using Teaq. Queries could be executed to find files or data-serving objects in the system. Data-sets (or files) could then be downloaded and worked on, and the results stored in other objects in the system, which could be found by running other queries.
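The query, download, compute and store cycle described above can be sketched as follows. All of the names here (DataSource, ResultSink, Discovery, IdleWorker) are hypothetical and do not come from the Teaq API. In particular, the Discovery class is only a local stand-in: a real Teaq-based system would find these objects by routing queries through its tree of processes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical roles played by objects in the system.
interface DataSource { int[] download(); }
interface ResultSink { void store(long result); }

// Stand-in for run-time discovery; it scans a local list instead of
// routing a query across a distributed tree.
class Discovery {
    private final List<Object> objects = new ArrayList<>();

    void publish(Object o) { objects.add(o); }

    <T> Optional<T> findFirst(Class<T> type) {
        for (Object o : objects) {
            if (type.isInstance(o)) {
                return Optional.of(type.cast(o));
            }
        }
        return Optional.empty();
    }
}

class CollectingSink implements ResultSink {
    final List<Long> results = new ArrayList<>();
    public void store(long result) { results.add(result); }
}

class IdleWorker {
    // One unit of work: query for data, compute over it, then query for
    // an object able to accept the result. Returns false if either query fails.
    static boolean runOnce(Discovery discovery) {
        Optional<DataSource> source = discovery.findFirst(DataSource.class);
        if (!source.isPresent()) {
            return false;
        }
        long sum = 0;
        for (int value : source.get().download()) {
            sum += value;
        }
        Optional<ResultSink> sink = discovery.findFirst(ResultSink.class);
        if (!sink.isPresent()) {
            return false;
        }
        sink.get().store(sum);
        return true;
    }
}

class IdleWorkerDemo {
    public static void main(String[] args) {
        Discovery discovery = new Discovery();
        CollectingSink sink = new CollectingSink();
        discovery.publish((DataSource) () -> new int[] {1, 2, 3});
        discovery.publish(sink);
        // prints true [6]
        System.out.println(IdleWorker.runOnce(discovery) + " " + sink.results);
    }
}
```

The summation stands in for whatever computation the idle machine would perform; the point of the sketch is only that both the data and the destination of the results are located by queries at run-time.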

Researchers working in the area of peer-to-peer systems are currently focusing their attention on issues such as accountability, anonymity and protocols. They are also dealing with the technical problems of device heterogeneity and dynamic restructuring of the computing fabric, as well as the more human-oriented problems of trust and censorship. The Teaq research is currently focused on providing a flexible and robust environment in which objects can be easily discovered and accessed at run-time.

The Grid

The Grid [11,12] is an ambitious international collaborative project to provide infrastructure for, and solutions to, the problem of large-scale resource sharing between computing devices and people that are separated geographically and by different administrative domains. The Grid is concerned with collaboration in computation and data-rich environments that need to share resources flexibly and securely, in a coordinated manner, among dynamic collections of individuals and processing elements. These goals require solutions in the areas of authentication, authorization and resource access and discovery.

Teaq addresses the problem of object discovery in a distributed system. Other issues, such as authentication and authorization, would be more likely to be solved at the application level rather than at the Teaq level, although support for these issues would certainly be useful. Teaq could be used to provide solutions within the context of the Grid; however, easily navigating firewalls is not currently supported. Such support would be required to take advantage of the geographic separation between the processing elements and the data elements of a typical Grid-based application. Teaq’s ability to automatically repair the tree when processes and nodes fail would be a useful facility for a Grid application and would free the programmer from having to manage the repair.
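This paper does not describe how Teaq repairs its tree, so the following is only a sketch of one plausible policy, stated here as an assumption: when a non-root node fails, its children are re-attached to the failed node's parent so that the tree remains connected. The class ProcNode and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical process node; parent and child links form the tree.
class ProcNode {
    final String name;
    ProcNode parent;
    final List<ProcNode> children = new ArrayList<>();

    ProcNode(String name) { this.name = name; }

    void attach(ProcNode child) {
        child.parent = this;
        children.add(child);
    }

    // Assumed repair policy: the children of a failed, non-root node are
    // re-attached to the failed node's parent.
    static void repairAfterFailure(ProcNode failed) {
        ProcNode grandparent = failed.parent;
        if (grandparent == null) {
            throw new IllegalArgumentException("root failure needs a different policy");
        }
        grandparent.children.remove(failed);
        // Iterate over a copy, because the original list is cleared below.
        for (ProcNode orphan : new ArrayList<ProcNode>(failed.children)) {
            grandparent.attach(orphan);
        }
        failed.children.clear();
        failed.parent = null;
    }

    // Counts the nodes reachable from this node, including itself.
    int size() {
        int count = 1;
        for (ProcNode child : children) {
            count += child.size();
        }
        return count;
    }
}

class TreeRepairDemo {
    public static void main(String[] args) {
        ProcNode root = new ProcNode("root");
        ProcNode a = new ProcNode("a");
        ProcNode b = new ProcNode("b");
        ProcNode c = new ProcNode("c");
        root.attach(a);
        a.attach(b);
        a.attach(c);
        ProcNode.repairAfterFailure(a);
        // prints 3 nodes remain; b's parent is root
        System.out.println(root.size() + " nodes remain; b's parent is " + b.parent.name);
    }
}
```

A distributed implementation would also need failure detection and agreement on the new parent, neither of which is modelled here.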

Web services

Web services [13] is an initiative, together with a set of standards and implementations, that allows program functionality to be exposed over the Internet. A client calls a Web service enabled piece of code using the SOAP protocol [14]. The contents of the SOAP request and reply packets are encoded using XML [15]. Web services also use other technologies such as the universal description, discovery and integration system (UDDI) and the Web services description language (WSDL). UDDI allows a business to describe the services it provides and to search for other businesses that offer services it is interested in. WSDL describes what a Web service can do, where it resides, and how to invoke it.
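To make the XML encoding of a SOAP request concrete, the sketch below builds a small envelope using only the XML APIs that ship with the JDK. The envelope namespace is the one defined by the SOAP 1.2 Recommendation (earlier working drafts used different URIs). The application namespace http://example.org/grumps and the operation name findObjects are hypothetical, chosen purely for illustration; they are not part of GRUMPS or Teaq.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SoapEnvelopeSketch {
    // Envelope namespace of the SOAP 1.2 Recommendation.
    static final String SOAP_NS = "http://www.w3.org/2003/05/soap-envelope";
    // Hypothetical application namespace, for illustration only.
    static final String APP_NS = "http://example.org/grumps";

    // Builds an envelope whose body holds one element named after the
    // operation, with the argument as its text content.
    static String buildRequest(String operation, String argument) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        Element envelope = doc.createElementNS(SOAP_NS, "env:Envelope");
        Element body = doc.createElementNS(SOAP_NS, "env:Body");
        Element call = doc.createElementNS(APP_NS, "g:" + operation);
        call.setTextContent(argument);
        doc.appendChild(envelope);
        envelope.appendChild(body);
        body.appendChild(call);

        // Serialise the DOM tree to a string with an identity transform.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildRequest("findObjects", "type=EventBuffer"));
    }
}
```

In practice a Web services toolkit would generate and transport such messages on the client's behalf; the sketch only shows the shape of the XML payload.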

Together, UDDI and WSDL allow one Web service to locate another Web service and to discover what services it provides and how to invoke it. This is an example of object discovery, although the Web services approach has to deal with a large degree of complexity imposed by the need for a generic, widely applicable solution. Teaq is currently targeted more towards building peer-to-peer systems where locating objects at run-time can be done by sending a query through the system.

FUTURE WORK

The next stage of development for the GRUMPS run-time architecture will be to add support for Web services, so that objects can interact across the Internet, and to make Teaq’s query system more expressive. The ongoing work on the investigation database will be progressed, and tools will be built and refined to support investigators interacting with a running GRUMPS-based investigation. Such tools will also allow them to design and deploy their investigation, based on a description held in the investigation database. The GRUMPS team is also building a flexible Java-bytecode instrumentation tool so that hooks into the third version of the run-time system can be added to arbitrary Java code. In this way, GRUMPS will be able to collect data from a wide range of Java-based applications.

ACKNOWLEDGEMENTS

The GRUMPS team gratefully acknowledge the funding provided by the U.K.’s EPSRC (GR/N381141) and thank Iain Darroch and Tony Printezis for reading earlier drafts of this paper.

REFERENCES

1. Atkinson M, Brown M, Cargill J, Crease M, Draper S, Evans H, Gray P, Mitchell C, Ritchie M, Thomas R. Summer anthology, 2001. Technical Report TR-2001-96, Department of Computing Science, Glasgow University, September 2001.

2. Sun Microsystems. Java 2 RMI Web Page. Sun Microsystems: Mountain View, CA. http://java.sun.com/products/jdk/rmi/.

3. Evans H, Grumps Team. Grumps architecture (version 1). A Grumps internal working document. http://grumps.dcs.gla.ac.uk/papers/architecture.pdf.

4. Evans H, Dickman P, Atkinson M. The GRUMPS architecture: Run-time evolution in a large scale distributed system. Workshop on Engineering Complex Object-Oriented Solutions for Evolution (ECOOSE), collocated with OOPSLA, Tampa, FL, October 2001.

5. Evans H, Dickman P. Peer-to-peer programming with Teaq. Workshop on Peer-to-Peer Computing, collocated with Networking 2002, Pisa, Italy, May 2002.

6. Oram A (ed.). Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O’Reilly: Sebastopol, CA, 2001.

7. Atallah MJ (ed.). Algorithms and Theory of Computation Handbook. CRC Press: Boca Raton, FL, 1999.

8. Cattell RGG, Barry D, Bartels D, Berler M, Eastman J, Gamerman S, Jordan D, Springer A, Strickland H, Wade D. The Object Database Standard: ODMG 2.0. Morgan Kaufmann Publishers: Los Altos, CA, 1997.

9. Sommerville I. Software Engineering (6th edn). Addison-Wesley: Reading, MA, 2001.

10. Anderson TE, Culler DE, Patterson DA, NOW Team. A case for NOW (network of workstations). IEEE Micro 1995; 15(1):54–64.

11. Foster I, Kesselman C (eds.). The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann: San Francisco, CA, 1999.

12. Foster I, Kesselman C, Tuecke S. The anatomy of the Grid: Enabling scalable virtual organization. The International Journal of High Performance Computing Applications 2001; 15(3):200–222.

13. Graham S, Simeonov S, Boubez T, Davis D, Daniels G, Nakamura Y, Neyama R. Building Web Services with Java. Howard W. Sams Publishing: Indianapolis, IN, 2002.

14. Mitra N (ed.). SOAP version 1.2 part 0: Primer. http://www.w3.org/TR/soap12-part0/ [2001].

15. Bray T, Paoli J, Sperberg-McQueen CM, Maler E (eds.). Extensible markup language (XML) 1.0 (second edition). http://www.w3.org/TR/2000/REC-xml-20001006 [2000].
