java abs peer to peer design & implementation of a tuple space

PEER TO PEER DESIGN & IMPLEMENTATION OF A TUPLE SPACE

Coordination between nodes within distributed systems is a complex problem and

a current focus of research. It needs to take into account issues of performance,

scalability, dependability and heterogeneity.

One interesting method of coordination that can be utilised is a decentralised tuple

space layer built on top of a peer to peer network. Potentially this solution could be

more efficient, flexible, robust and scalable than other coordination

implementations.

The goal of this project is to investigate this assertion by implementing a

distributed and fully decentralised tuple space co-ordination layer on top of a peer

to peer network.

It will provide a virtual shared space that can be accessed by any computer node

within a peer to peer network regardless of its physical location. Nodes within the

network should be able to post data to the shared space and retrieve data from the

shared space based on the content of the data.

The overall goal of this project is to create a distributed system which implements a tuple space over a peer to peer network. Tuple spaces are a major area of research within the field of distributed computing at the present moment. Their primary concern is the coordination of multiple heterogeneous computers in geographically remote locations in order to achieve a common task.

Communication is achieved through the exchange of tuples in the tuple space, rather than direct communication between nodes. This is known as asynchronous and decoupled communication. This could be useful, for example, in a mobile environment, where there are no guarantees of an ‘always-on’ service. The tuple space paradigm is also concerned with achieving these interactions in a scalable, robust and efficient way.

PROJECT TERMINOLOGY

A tuple space is an example of shared associative memory that provides a repository for bags of tuples (a tuple is an ordered set of typed values - see figure 1.1). Unlike physical memory, where data is stored by its address, a tuple space is associative in that tuples are stored and retrieved by their content or by their type.

An important distinction is that it is logically shared memory rather than physically shared memory. This means that the tuples could be distributed over a set of nodes. The tuple space simply provides the necessary abstraction for higher level applications.

A tuple space provides decoupled asynchronous communication between nodes in a network, i.e. for a node to communicate data to another node, it does not have to establish a permanent connection.

Tanenbaum (2002) states that a “distributed system is a collection of independent

computers that appears to its users as a single coherent system”[1].

This definition works well within the context of the tuple space paradigm as a tuple

space provides a single entry point into a distributed system; higher level

applications do not need to concern themselves with the implementation of the

distributed system underneath.

Problem definition

Developing a tuple space over an underlying peer to peer network provides a

number of interesting challenges, namely:

• How does the tuple space decide where the tuples are stored within the system

in such a way they can reliably and efficiently be retrieved?

• How can a flexible solution be provided that will adapt well to many different

application level problems?

• How can the different concerns of the system be separated into components?

AIMS OF PROJECT

The primary aim of this project is to develop a tuple space coordination layer and to investigate the potential robustness of this solution.

The scalability of the tuple space implementation could also be investigated.

However this is considered to be outside the scope of the project as it would need

considerable time and resources.

A secondary aim is to investigate how this implementation can be mapped on top

of a peer to peer network, more specifically a Chord open network overlay. The

final aim is to investigate how this implementation can be integrated into the Gridkit

architecture as a plug-in for the interaction framework.

Primary aims

• To investigate the role of peer to peer technology in supporting decentralised tuple

space operations.

• To determine an efficient mapping between tuple spaces and Chord like distributed

hash table data structures.

• To investigate issues of flexibility within the system: i.e. how to provide a flexible

solution to the application level without sacrificing other factors.

• To investigate performance issues within the system.

• To investigate how multi-dimensional data can be efficiently retrieved from the tuple

space.

Secondary aims

Consideration will be given to these aims during design, implementation and evaluation of

the system. However they may not necessarily be covered in depth due to the time

constraints of the project.

• To investigate a component based approach that fits within the Gridkit middleware architecture.

• To determine what this approach can provide in terms of configurability and re-configurability, i.e. how the system can be adapted to the application level’s needs.

• To investigate the scalability and robustness of the solution.

Existing Tuple Space Systems

Existing tuple space systems can be classed into two different types: client-server based and peer to peer based.

This section reviews existing peer to peer systems, the various tuple space implementations (both client-server based and peer to peer based) currently available and the different methods available for constructing a peer to peer tuple space implementation. It also provides a look into the Gridkit and OpenCOM architecture.

Peer to Peer systems

To understand the requirements of this project, existing peer to peer networks and their properties must first be examined. The motivation for using peer to peer technology as a method of developing this system will also be considered.

A peer to peer network is one in which all nodes (known as ‘peers’) in the network are equal; there is no single point of failure. Research into peer to peer networks is focused on how to both store and find data within the networks. There are two main schools of thought: structured and unstructured peer to peer networks.

Gnutella

Gnutella is an open source, fully decentralised peer to peer network, originally developed by Nullsoft. It is an example of an unstructured peer to peer network providing methods for distributed searches.

Gnutella used a method called query flooding which, although it provided scalability, was inefficient and did not guarantee lookup results.

Gnutella also provided high fault tolerance, due to its method of sending queries out to every active node that it is connected to. This ensures that the query propagates its way through the network even if connected nodes have failed.
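
The query flooding described above can be sketched in Java as follows. This is only an illustrative model, not Gnutella's protocol code: the class name `QueryFlooding`, the adjacency-map representation of the overlay and the hop limit `ttl` are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of hop-limited query flooding over an unstructured overlay.
final class QueryFlooding {
    // Returns every peer reached by a query started at 'origin' within 'ttl' hops.
    static Set<String> flood(Map<String, List<String>> neighbours, String origin, int ttl) {
        Set<String> reached = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        reached.add(origin);
        frontier.add(origin);
        for (int hop = 0; hop < ttl && !frontier.isEmpty(); hop++) {
            Deque<String> next = new ArrayDeque<>();
            for (String peer : frontier) {
                // Each peer forwards the query to all of its known neighbours.
                for (String n : neighbours.getOrDefault(peer, Collections.emptyList())) {
                    if (reached.add(n)) {
                        next.add(n); // only peers not seen before forward it further
                    }
                }
            }
            frontier = next;
        }
        return reached;
    }
}
```

In this model a peer that lies further away than the hop limit is never asked, which is one way a lookup can fail even though the data exists. A peer with several neighbours is still reached when one of those neighbours has failed, which reflects the fault tolerance noted above.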

Distributed Hash Tables (DHTs)

Distributed Hash Tables have been designed to provide efficient and guaranteed lookups and reliable resource discovery whilst providing the scalability of solutions such as Gnutella. They work by partitioning a set of keys and their respective values over a number of nodes within a network.

DHTs can efficiently route messages to the unique owner of a particular key. Most DHTs use consistent hashing to map keys to nodes. For example, a key is mapped by a hash function to an ID, and a routing mechanism then delivers that key to the node that is responsible for it.

The value of that key can later be retrieved by hashing the key again, as this will produce exactly the same ID. To route messages in a DHT, each node uses a routing table containing a set of links to nodes that are close to it; these links in turn form an overlay network. There are many different DHT implementations, examples being Chord[3] and CAN[4].
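
The key-to-ID step described above might look like the following in Java. This is a minimal sketch under stated assumptions: the class name `KeyHasher`, the choice of SHA-1 and the reduction to an identifier space of 2^bits values are illustrative and are not taken from the project's implementation.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: maps an application key to an identifier in [0, 2^bits).
final class KeyHasher {
    private final int bits;

    KeyHasher(int bits) {
        if (bits < 1 || bits > 160) {
            throw new IllegalArgumentException("bits must be between 1 and 160");
        }
        this.bits = bits;
    }

    BigInteger idFor(String key) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(key.getBytes(StandardCharsets.UTF_8));
            // Treat the digest as a non-negative integer and reduce it to the identifier space.
            return new BigInteger(1, digest).mod(BigInteger.ONE.shiftLeft(bits));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }
}
```

Because the hash is deterministic, the node that stores a value and the node that later looks it up both compute the same ID from the same key, which is what allows the lookup to be routed to the same owner.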

Chord

Stoica, Morris et al (2001) presented a DHT implementation called Chord. Chord envisaged the nodes in the overlay network as being conceptually joined in a circle using a type of doubly-linked list. Chord provided lookups in the network using only O(log n) messages, n being the number of nodes in the network.

Chord introduced the notion of successors and predecessors. The node whose ID is equal to or most closely follows the key (its successor) is responsible for providing storage for that key. If that node were to leave the network, the key would be moved to the next successor. This method ensures a high level of robustness whilst at the same time minimising the load placed on nodes: the network adapts itself and redistributes the keys as the topology of the network changes (i.e. as nodes join and leave).

Each node maintains a routing table with details of nodes logically close to it, for routing Chord

messages. This makes it practical to scale to many nodes. Figure 2.1 shows an example Chord

identifier circle with 3 nodes. Key 1 is located at node 1, key 2 is located at node 2 and key 6 is

located at node 0.
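
The successor rule in the example above can be reproduced with a small Java sketch. It assumes a 3-bit identifier circle (8 identifiers) holding nodes 0, 1 and 2, which is consistent with the key placements just listed but is not stated in the text. The class `ChordRing` is hypothetical and keeps a global view of the ring for clarity; a real Chord node holds only its own routing table and resolves the successor by forwarding messages.

```java
import java.util.TreeSet;

// Hypothetical model of a Chord identifier circle; not the project's Chord component.
final class ChordRing {
    private final int ringSize; // number of identifiers, e.g. 8 for a 3-bit space
    private final TreeSet<Integer> nodes = new TreeSet<>();

    ChordRing(int ringSize) {
        this.ringSize = ringSize;
    }

    void join(int nodeId) {
        nodes.add(Math.floorMod(nodeId, ringSize));
    }

    void leave(int nodeId) {
        nodes.remove(Math.floorMod(nodeId, ringSize));
    }

    // The first node whose ID is equal to or follows the key, wrapping around the circle.
    int successor(int key) {
        if (nodes.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Integer node = nodes.ceiling(Math.floorMod(key, ringSize));
        return node != null ? node : nodes.first();
    }
}
```

With nodes 0, 1 and 2 joined to a ring of size 8, `successor(1)` is 1, `successor(2)` is 2 and `successor(6)` wraps around to 0. If node 1 then leaves, `successor(1)` becomes 2, which corresponds to the key moving to the next successor.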

Motivation for using DHT peer to peer technology

Peer to peer networks have a number of interesting properties compared with traditional communication models such as client-server. They can be more scalable than their client-server counterparts as there is no single bottleneck, i.e. no central server that every peer in the system must communicate with.

They can also be more robust in terms of both searching for data and the storage of data. This is due to the decentralised operation of servers and the possibility of distributed replication of files across the network. These factors potentially provide a system with greater availability than existing approaches.

The DHT variant can provide this functionality combined with efficient guaranteed lookups.

Existing Tuple Space Systems

Existing tuple space systems can be classed into two different types: client-server based and peer to peer based. This section will present the motivations behind investigating a peer to peer approach.

Linda Spaces[5]

The tuple space concept was first introduced by Gelernter (1985). It was developed with the concept of coordination within parallel programming in mind and was designed as an extension to existing programming languages.

It pioneered the concept of using a logical shared associative memory space to store tuples, and the use of three tuple operations to write, read and destructively read tuples from the tuple space. More recently the concept has been adapted for use in coordination within distributed environments.

It also developed the concept of using ‘template tuples’ to provide lookups within the tuple space.

Template tuples can provide all or some of the values required to retrieve tuples from the tuple

space. They also specify the use of wildcard and range searches to provide flexibility for retrieving

tuples.
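
The three operations and template matching described above can be illustrated with a single-node Java sketch. It makes several simplifying assumptions: the class `LocalTupleSpace` and its method names are hypothetical, a `null` field in a template stands for a wildcard, range searches are not covered, and `read` and `take` return `null` instead of blocking when nothing matches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical single-node tuple space; a null template field acts as a wildcard.
final class LocalTupleSpace {
    private final List<Object[]> tuples = new ArrayList<>();

    // write: add a tuple to the space.
    synchronized void write(Object... tuple) {
        tuples.add(tuple.clone());
    }

    // read: return a copy of the first matching tuple, or null if none matches.
    synchronized Object[] read(Object... template) {
        for (Object[] t : tuples) {
            if (matches(template, t)) {
                return t.clone();
            }
        }
        return null;
    }

    // take: destructive read; the matching tuple is removed from the space.
    synchronized Object[] take(Object... template) {
        for (Iterator<Object[]> it = tuples.iterator(); it.hasNext();) {
            Object[] t = it.next();
            if (matches(template, t)) {
                it.remove();
                return t;
            }
        }
        return null;
    }

    // A template matches when lengths agree and every non-null field equals the tuple's field.
    static boolean matches(Object[] template, Object[] tuple) {
        if (template.length != tuple.length) {
            return false;
        }
        for (int i = 0; i < template.length; i++) {
            if (template[i] != null && !template[i].equals(tuple[i])) {
                return false;
            }
        }
        return true;
    }
}
```

For example, after `write("temp", 21)` the template `("temp", null)` matches the tuple by supplying only some of its values. A `read` with that template leaves the tuple in place, whereas a `take` removes it.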

Client-Server based Systems

Many of the tuple space systems currently available are based on the client-server model. JavaSpaces[6] within the Jini technology platform and TSpaces[7] from IBM are examples of this approach.

The advantage of this model is its simplicity: it does not have the problems of coordinating the system over a set of distributed nodes. The primary disadvantage of the client-server model is that it provides a single point of failure and may place a high load on the server.

These two problems limit the potential scalability of such systems, something that decentralised tuple space systems are being designed to address.

Motivation for developing a decentralised peer to peer tuple space

The previous section detailed some of the reasons for using peer to peer technology in this project, notably the potential for greater scalability, performance and robustness. What needs to be considered is the motivation for developing a tuple space over peer to peer technology.

The tuple space paradigm is at its most useful when used in environments with a large number of geographically dispersed nodes which have intermittent availability. The traditional client-server paradigm is therefore a poor fit, as it would be difficult to enable this sort of functionality. Peer to peer technology lends itself well to this functionality, and combined with its other characteristics it makes for an interesting platform.

EXISTING PEER TO PEER TUPLE SPACE SYSTEMS

More recently researchers have been looking to decentralise the operation of tuple

spaces as a way of improving aspects of availability and scalability. This section

will detail some of the approaches that have been considered.

Comet - Li and Parashar (2005) present a system called Comet. Comet makes use of a Hilbert Space Filling curve mapping to map tuples onto underlying Chord nodes. Space filling curves will be described in more detail following this section; however, they basically provide a mapping from multiple dimensions to a single dimension. The Hilbert Space Filling curve is a locality preserving function in that contextually similar tuples are grouped together. This improves the performance of the system when looking up data and performing range or wildcard searches, as similar tuples should be grouped together on similar peers.
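
To make the locality preserving property concrete, the sketch below computes the position of a 2-dimensional grid cell along a Hilbert curve using the widely published iterative formulation. The class `HilbertCurve` is hypothetical and the restriction to two dimensions and a power-of-two grid size is an assumption for brevity; Comet's own mapping may differ in detail.

```java
// Hypothetical helper: maps a cell (x, y) of an n-by-n grid (n a power of two)
// to its position along a Hilbert curve.
final class HilbertCurve {
    static int index(int n, int x, int y) {
        int d = 0;
        for (int s = n / 2; s > 0; s /= 2) {
            int rx = (x & s) > 0 ? 1 : 0;
            int ry = (y & s) > 0 ? 1 : 0;
            d += s * s * ((3 * rx) ^ ry);
            // Rotate or flip the quadrant so the sub-curve has the correct orientation.
            if (ry == 0) {
                if (rx == 1) {
                    x = n - 1 - x;
                    y = n - 1 - y;
                }
                int t = x;
                x = y;
                y = t;
            }
        }
        return d;
    }
}
```

On a 4-by-4 grid, `index(4, 0, 0)` is 0 and `index(4, 3, 0)` is 15, and any two cells with consecutive indices are direct neighbours in the grid. It is this property that tends to place similar tuples at nearby positions in the one-dimensional identifier space.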

PeerGameSpace - Wang, Hsiao et al (2005) present a method of using ‘shortcuts’ within a Chord network to point to various applicable tuples. However, it is not made clear how many shortcuts would be needed to retrieve tuples in a multi-dimensional context or how efficient and flexible range queries could be implemented. They have developed a simple peer to peer game to run on top of this implementation.

Panda - Christian, Durate et al (2004) present an entirely different method in the Panda system. Panda uses a two-tier approach to storing tuples within a tuple space: tuples are stored in the underlying node that is responsible for the ‘signature’ of the tuple. The signature of the tuple is its complete type: an ordered list of the types of the tuple’s fields. Inside the hash table in the node, the tuples are stored by a key that is a hash of the ‘content’ of the data.
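
The two tiers described above could be sketched in Java roughly as follows. This is purely illustrative and is not taken from the Panda implementation: the class `TwoTierKeys`, the textual signature format and the use of `Arrays.hashCode` as the content hash are all assumptions.

```java
import java.util.Arrays;
import java.util.StringJoiner;

// Hypothetical illustration of a two-tier key scheme; fields are assumed to be non-null.
final class TwoTierKeys {
    // Tier 1: the signature is the ordered list of field types,
    // used to choose the node responsible for the tuple.
    static String signature(Object... tuple) {
        StringJoiner joiner = new StringJoiner(",", "<", ">");
        for (Object field : tuple) {
            joiner.add(field.getClass().getSimpleName());
        }
        return joiner.toString();
    }

    // Tier 2: inside that node's hash table, the tuple is keyed by a hash of its content.
    static int contentKey(Object... tuple) {
        return Arrays.hashCode(tuple);
    }
}
```

Under this sketch the tuples `("a", 1)` and `("b", 2)` share the signature `<String,Integer>` and would therefore be placed on the same node, while their differing content gives them different keys within that node's table.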

Pier - Although not a tuple space in the traditional sense, Pier presents a method of using ‘distributed joins’ and database querying techniques to look up tuples in a layer implemented on top of a DHT. It currently uses CAN as an underlying DHT overlay. The method given is designed to be scalable to massively distributed systems.

SPACE FILLING CURVES

The previous section detailed an approach of using space filling curves to map tuple attributes to Chord nodes. Space filling curves[12] were first described by the 19th century mathematician Peano; examples include Hilbert Space Filling curves and Z-Order curves.

They are ‘curves whose ranges contain the entire 2-dimensional unit square’ or, in the case of many dimensions, the ‘n-dimensional hypercube’. Space filling curves have been presented as a method for mapping multi-dimensional data (i.e. n-dimensional tuples) into 1 dimension.

This makes them very useful in a distributed context, as they provide a method of mapping tuples onto the one-dimensional Chord identifier space.

Z-Order curves

Further research indicated the use of Z-Order curves for indexing and querying multi-dimensional data in a database. This approach was pioneered by Tropf and Herzog (1981)[13]. Binary search trees were presented as an efficient method of looking up data, supporting both exact and range searching facilities.

More recently Chawathe, LaMarca et al (2005)[14] have proposed a method of using Z-Order curves to map 2-dimensional geographical information tuples onto a Distributed Hash Table. Binary prefix tries (a type of binary tree where each subsequent node searched leads to a certain piece of data) are used to store and look up data; tuples are only stored in leaf nodes.
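
A Z-Order key is formed by interleaving the bits of the coordinates, which the following Java sketch shows for two dimensions. The class `ZOrder`, the 16-bit coordinate width and the choice to place x in the even bit positions are assumptions made for illustration and are not taken from the cited work.

```java
// Hypothetical helper: interleaves the bits of two 16-bit coordinates
// into a single Z-Order (Morton) key.
final class ZOrder {
    static long encode(int x, int y) {
        long key = 0;
        for (int bit = 0; bit < 16; bit++) {
            key |= (((long) (x >> bit)) & 1L) << (2 * bit);     // x fills the even bit positions
            key |= (((long) (y >> bit)) & 1L) << (2 * bit + 1); // y fills the odd bit positions
        }
        return key;
    }
}
```

For example, `encode(3, 5)` interleaves binary 011 and 101 to give 39. Points that agree in the high-order bits of both coordinates share the leading bits of their keys, which is the property a binary prefix trie can use to group nearby points under a common prefix.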

GRIDKIT / OPENCOM ARCHITECTURE

Gridkit

Coulson, Grace, Blair et al (2004)[16] describe a middleware called Gridkit being developed for use within grid computing. Grid middleware acts as an intermediary between the different components within a distributed system, allowing potential grid applications to make use of this functionality without having to understand the underlying complexity.

Gridkit is a component based architecture that supports a number of middleware services such as interaction types, resource discovery, resource management and security. These middleware services are built on top of an ‘open overlays’ layer, which in turn abstracts over the underlying communications layer to provide various forms of support, e.g. peer to peer communication.

The main concerns for this project are interaction types and overlays. Overlay networks are ‘virtual’ networks layered over an underlying physical network. An example of an overlay network is the Chord distributed hash table described previously. Interaction types are built on top of the overlay networks to provide a particular service desirable to higher level applications. Publish-Subscribe and Tuple Spaces are examples of interaction types that could be used within the Gridkit framework.

Component architecture

Gridkit makes use of OpenCOM v2 (2004)[17] as its component object model. The main concepts that need to be understood are interfaces, receptacles and connections. Interfaces describe a unit of service provision, receptacles describe a unit of service requirement and connections allow for the binding between components with receptacles and interfaces. This component model will be used within the development of this system as it will provide a method of integrating the tuple space system within Gridkit and the existing Chord/DHT components. OpenCOM, however, does not take into account the distribution of the system (i.e. different components situated on different nodes); therefore the Chord layer makes use of Java RMI to provide a method of invoking methods on components situated on different nodes.
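
The relationship between interfaces, receptacles and connections can be pictured with a plain Java sketch. None of the types below belong to the OpenCOM v2 API; `IDht`, `Receptacle`, `InMemoryDht` and `TupleSpaceComponent` are hypothetical names used only to show the idea of one component requiring a service that another provides.

```java
import java.util.HashMap;
import java.util.Map;

// Interface: a unit of service provision (hypothetical, not an OpenCOM type).
interface IDht {
    void put(String key, String value);
    String get(String key);
}

// Receptacle: a unit of service requirement, filled by making a connection.
final class Receptacle<T> {
    private T provider;

    void connect(T provider) {
        this.provider = provider;
    }

    void disconnect() {
        this.provider = null;
    }

    T get() {
        if (provider == null) {
            throw new IllegalStateException("receptacle is not connected");
        }
        return provider;
    }
}

// A stand-in provider; a Chord-backed component could implement the same interface.
final class InMemoryDht implements IDht {
    private final Map<String, String> store = new HashMap<>();

    public void put(String key, String value) {
        store.put(key, value);
    }

    public String get(String key) {
        return store.get(key);
    }
}

// The tuple space component requires some DHT but does not name a concrete one.
final class TupleSpaceComponent {
    final Receptacle<IDht> dht = new Receptacle<>();

    void write(String key, String tuple) {
        dht.get().put(key, tuple);
    }

    String read(String key) {
        return dht.get().get(key);
    }
}
```

Calling `connect(new InMemoryDht())` on the `dht` receptacle of a `TupleSpaceComponent` binds the two components. Connecting a different `IDht` implementation later would change the underlying overlay without touching the tuple space code.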

Motivation for using component architecture

There are many advantages to the use of a Gridkit/component based architecture as described above. Namely, it creates the possibility of a configurable/reconfigurable solution, as set out in the aims. The tuple space does not necessarily have to be dependent on a single overlay network (such as Chord); a different component could be selected depending on the needs of the system. Similarly, differing components could also be created to represent a variety of tuple space algorithms which could be adapted to the needs of the application level.
