a seminar report on grid computing

A Seminar Report on

GRID COMPUTING

Submitted in partial fulfillment of the requirement for the award of

Bachelor of Technology

In

COMPUTER SCIENCE & ENGINEERING

From

JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD

By

AMBADIPUDI RAJESH

08RC1A0503

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

LAQSHYA INSTITUTE OF TECHNOLOGY & SCIENCE

(Approved by AICTE, New Delhi & Affiliated to JNTU, Hyderabad)

TANIKELLA (V), KHAMMAM (M), KHAMMAM (Dt). A.P. India -507305

Ph: 08742 211306

http://www.laqshya.edu.in/



CERTIFICATE

This is to certify that the dissertation entitled “GRID COMPUTING” is a bona fide work

done by “AMBADIPUDI RAJESH, 08RC1A0503” in partial fulfillment of the requirements for the degree of Bachelor of

Technology in Computer Science & Engineering from JNTU, Hyderabad, during the year 2011-2012.

Mr. SK. SHARFUDDIN, M.Tech          Mrs. M. SRIDEVI, M.Tech, (Ph.D)

Assistant Professor                 Associate Professor

Supervisor                          H.O.D., C.S.E.


ACKNOWLEDGEMENT

The satisfaction that accompanies the successful completion of any task would be incomplete

without the mention of the people who made it possible and whose encouragement and guidance

have been a source of inspiration throughout the course of this project.

It is my privilege and pleasure to express my profound sense of gratitude and

indebtedness to Mr. SK. SHARFUDDIN, Seminar Supervisor & Assistant Professor,

Department of Computer Science and Engineering, LAQSHYA Institute of Technology and

Science, for his guidance, cogent discussion, constructive criticism and encouragement

throughout this dissertation work.

I express my sincere gratitude to Associate Professor Mrs. M. SRIDEVI, Head of the

Department of Computer Science and Engineering, LAQSHYA Institute of Technology and

Science, for her precious suggestions, motivation and co-operation for the successful

completion of this seminar work.

In addition I would like to thank all my family members,

friends, and colleagues for giving moral strength and support to complete this dissertation.

GridFTP

Abstract:

Applications of Grid Computing

A Grid can be simply defined as a combination of different components which

function collectively as a part of one large electrical or electronic circuit.

GridFTP is a well-known and robust protocol for fast data transfer on the grid.

GridFTP is exceptionally fast for large volumes of data.

During such transfers we can face the “lots of small files” (LOSF) problem, in which

a large data set is partitioned into many small files. Efficient transmission of these

small files is achieved through “pipelining”.

Pipelining approaches the LOSF problem by trying to minimize the amount of time

between transfers. Pipelining allows the client to have many outstanding transfer commands at a

time. Instead of being forced to wait for the successful-transfer acknowledgement message, the

client is free to send transfer commands at any time. The server processes these requests in

the order they are sent.

GridFTP uses two channels between client and server:

a control channel for commands and a data channel for the data itself.

Key words:

Grid

Grid FTP

LOSF (Lots of small files)

Pipelining

Robust

Server

Control channel

INDEX

1 INTRODUCTION

2 LITERATURE SURVEY

3 EXISTING SYSTEM

Disadvantages in existing system

4 PROPOSED SYSTEM

Advantages in proposed system

5 METHODS

6 CONCLUSION

7 FUTURE WORK

8 BIBLIOGRAPHY

9 WEB SITES

LIST OF FIGURES

s.no Figure name/description Page.no

1

2

INTRODUCTION

A Grid can be simply defined as a combination of different components which function

collectively as a part of one large electrical or electronic circuit. It can also be defined as a

paradigm/infrastructure that enables the sharing, selection and aggregation of geographically

distributed resources such as computers (PCs, workstations, clusters, supercomputers),

software, catalogued data and databases.

The term “Grid Computing” can similarly be applied to a large number of computers

which connect together to collectively solve a problem of very high complexity and magnitude.

The fundamental idea behind the making of any computer based grid is to utilize the idle time of

processor cycles. Simply stated, a processor that would otherwise sit idle now teams

up with other idle processors to tackle complex problems. Grid Computing virtualizes

distributed computing and data resources such as processing, network bandwidth and storage

capacity to create a single system image, granting users and applications seamless access to vast

IT capabilities.

GridFTP

An important type of communication in grid and distributed computing environments is bulk

data transfer. GridFTP has emerged as a de facto standard for secure, reliable, high-performance

data transfer across resources on the Grid.

GridFTP is a well-known and robust protocol for fast data transfer on the Grid. The

GridFTP implementation provided by the Globus Toolkit can scale to network speeds and has

been shown to deliver 27 Gb/s on 30 Gb/s links. The Globus Toolkit is an open source software

toolkit used for building Grid systems and applications.

The protocol is optimized to transfer large volumes of data commonly found in Grid

applications. Datasets of sizes from hundreds of megabytes to terabytes and beyond can be

transferred at close to network speeds by using GridFTP. Given the high-speed networks

commonly found in modern Grid environments, datasets less than 100 MB

are too small for the underlying protocols like TCP to utilize the maximum capacity of the

network. Therefore, GridFTP – and most bulk data transfer protocols – experiences the highest

levels of throughput when transferring large volumes of data. Unfortunately,

conventional implementations of GridFTP have a limitation as to how the data must be

partitioned to reach these high-throughput levels. Not only must the amount of data to

transfer be large enough to allow TCP to reach full throttle, but the data must also be in

large files, ideally in one single file. If the dataset is large but partitioned into many small

files (on gigabit networks we consider any file smaller than 100 MB as a small file), the

performance of GridFTP servers suffers drastically. This problem is known as the “lots of small

files” (LOSF) problem.

In this paper we study the LOSF problem and present a solution known as pipelining.

We have implemented pipelining in the Globus Toolkit.

LOSF PROBLEM

The GridFTP protocol is a backward-compatible extension of the legacy RFC959 FTP protocol.

It maintains the same command/response semantics introduced

by RFC959. It also maintains the two-channel protocol semantics. One channel is for control

messaging (the control channel), such as requesting what files to transfer, and the other is for

streaming the data payload (the data channel). These protocol details have

interesting effects on the LOSF problem.

Channel Establishment

GridFTP servers listen on a well-known and published port for client control channel

connections. Once a client successfully forms a control channel with a server (this often involves

authentication and authorization), it can begin sending commands to the server. In order to

transfer a file, the client must first establish a data channel. This involves sending the server a

series of commands on the control channel describing attributes of the desired data channel such

as: what protocol to use, binary or ASCII data, passive or active connection, and various protocol

specific attributes. Once these commands are successfully sent, a client can request a file

transfer. At this point a separate data channel connection is formed using all of the agreed upon

attributes, and the requested file is sent across it. In standard FTP the data channel can be used

only to transfer one file. Future transfers must again go through the process of setting up a new

data channel.

GridFTP modified this part of the protocol to allow many files to be transferred across a single

data channel. With GridFTP all of the messaging to establish a data channel is done once; the

data channel connection is formed just once, and the client can request several file transfers using

that same data channel. This enhancement is known as data channel caching.
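
The saving from data channel caching can be illustrated with a minimal Python sketch. It is not taken from the Globus Toolkit: the function name control_commands is hypothetical, and a single PASV command stands in for the whole series of data channel setup commands described above.

```python
from typing import List


def control_commands(files: List[str], cache_data_channel: bool) -> List[str]:
    """Return the control-channel commands a client sends to fetch `files`.

    Simplified model (an assumption for illustration): 'PASV' represents the
    complete data channel negotiation, 'RETR <path>' requests one file.
    """
    commands: List[str] = []
    channel_ready = False
    for path in files:
        if not channel_ready:
            commands.append("PASV")  # negotiate a data channel
            # Without caching the channel is gone after one file.
            channel_ready = cache_data_channel
        commands.append(f"RETR {path}")
    return commands


files = ["a.dat", "b.dat", "c.dat"]
plain = control_commands(files, cache_data_channel=False)
cached = control_commands(files, cache_data_channel=True)
print(len(plain), len(cached))
```

With three files, the uncached sequence negotiates the data channel three times, while the cached sequence negotiates it once and then issues only transfer requests.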

File Transfers

File transfer requests are done with the RETR (send) or STOR (receive) command. A client

sends one of these commands to the server across the control channel. Data then begins to flow

between the client and server over the data channel. Once all of the data has been transferred, a

“226 Transfer Complete” acknowledgment message is sent from the server to the client on the

control channel. Only when this acknowledgment is received can the client request another

transfer. This interaction is illustrated in Figure 1.

As the figure shows, there is an entire round-trip time on the control channel between

transfers where the data channel must be idle. Before issuing the next transfer command

the client must first receive the transfer completion acknowledgment, which is one trip across the

network. After receiving the acknowledgment, the client sends the transfer command

immediately. However, the server does not immediately receive it.

Figure 1: GridFTP file transfers with no pipelining

The message must cross the network before the server will begin sending data. This process

involves another trip across the network. Assuming we have the GridFTP data channel caching

enabled, we do not have to worry about the latencies involved with establishing

the data channel. If we do not have it enabled, the delay is significantly longer.

During this time the data channel is idle. The latency between transfers adds to the overall

transfer time and thus detracts from the overall throughput. The problem is exacerbated further

when communicating over high-latency networks where the RTT is very

high. While the idle data channel time is a problem, there is a far greater problem that it causes.

TCP is a window-based protocol. For it to achieve maximum efficiency, the window size of

allowed unacknowledged bytes must grow to the bandwidth-delay product. Various algorithms

in TCP increase or decrease

the window size based on observed events. If a connection is idle for longer than about one RTT, the

window collapses back to its small initial size; and once the connection is used again, it must go through TCP slow start

[14]. When transferring a series of files, the data channel is idle for a control channel RTT in

between transfers. If the control channel RTT and the data channel RTT are similar, it is likely

that data channel TCP connections will have fully

collapsed windows by the time the next transfer begins. When the amount of data sent in

each file is small, the ratio of idle data channel time to transfer time becomes higher and affects

the throughput. Additionally, small files may not be transferred long enough to traverse the slow-

start algorithm and bring TCP to full throttle. Thus, even when data is being transferred, it is not

moving at full speed.
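
The effect of slow start on small files can be estimated with a rough model. The bandwidth-delay product is BDP = bandwidth × RTT; for example, a 1 Gb/s path with a 50 ms RTT has a BDP of about 6.25 MB. The Python sketch below assumes an idealized slow start in which the window doubles every RTT, starting from 10 segments of 1460 bytes. Both values are assumptions chosen for illustration, not measurements from this report, and packet loss, delayed acknowledgments and receiver limits are ignored.

```python
def transfer_time(file_bytes: int, rtt_s: float, bandwidth_bps: float,
                  mss: int = 1460, initial_segments: int = 10) -> float:
    """Estimate the seconds needed to send one file on a fresh TCP connection.

    Idealized model: one window of data is sent per round trip and the
    window doubles each round until it reaches the bandwidth-delay product;
    after that the link is treated as saturated.
    """
    bdp_bytes = bandwidth_bps / 8 * rtt_s
    window = float(mss * initial_segments)
    remaining = float(file_bytes)
    elapsed = 0.0
    while remaining > 0 and window < bdp_bytes:
        remaining -= window  # one window of data per round trip
        elapsed += rtt_s
        window *= 2
    if remaining > 0:
        elapsed += remaining * 8 / bandwidth_bps  # the rest moves at line rate
    return elapsed


def effective_throughput_bps(file_bytes: int, rtt_s: float,
                             bandwidth_bps: float) -> float:
    """Average rate actually achieved for the whole file."""
    return file_bytes * 8 / transfer_time(file_bytes, rtt_s, bandwidth_bps)


small = effective_throughput_bps(1_000_000, 0.05, 1e9)      # 1 MB file
large = effective_throughput_bps(1_000_000_000, 0.05, 1e9)  # 1 GB file
print(f"{small / 1e6:.1f} Mb/s vs {large / 1e6:.1f} Mb/s")
```

Under these assumptions a 1 MB file finishes while the window is still growing, so its effective throughput stays far below the link rate, whereas a 1 GB file spends almost all of its time at full speed.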

PIPELINING

Pipelining approaches the LOSF problem by trying to minimize the amount of time between

transfers. Pipelining allows the client to have many outstanding, unacknowledged transfer

commands at once. Instead of being forced to wait for the “226 Transfer Complete” message,

the client is free to send transfer commands at

any time. The server processes these requests in the order they are sent. Acknowledgments are

returned to the client in the same order. The process is shown in

Figure 2. This process hides the latency of transfer requests by overlapping them with data

transfers. The first transfer request is sent, and data begins to flow across the data

channel. While the file transfer is in progress, the client sends the next n file transfer

requests. The server queues the requests. When the server completes the file

transfer, it sends the acknowledgment to the client and checks the queue for the next transfer

request. If the queue is not empty, the next file transfer begins immediately.

There is some inevitable processing latency between transfers, but it is very small compared to

the entire RTT of network latency that has been eliminated.
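
The gain from pipelining can be approximated with a simple timing model, sketched here in Python. The function name total_time and the example figures (1000 files of 1 MB, a 1 Gb/s link, 100 ms RTT, 1 ms of server processing between files) are assumptions for illustration only. The model charges one RTT for the first request and the last acknowledgment, and ignores TCP slow start, so the real penalty without pipelining would be larger.

```python
def total_time(n_files: int, file_bytes: int, bandwidth_bps: float,
               rtt_s: float, idle_between_s: float) -> float:
    """Seconds to move n_files of equal size over one data channel.

    idle_between_s is the time the data channel sits idle between two
    consecutive files: a full RTT without pipelining, or only the server's
    processing latency with pipelining.
    """
    per_file = file_bytes * 8 / bandwidth_bps
    return rtt_s + n_files * per_file + (n_files - 1) * idle_between_s


n, size, bandwidth, rtt = 1000, 1_000_000, 1e9, 0.1
serialized = total_time(n, size, bandwidth, rtt, idle_between_s=rtt)
pipelined = total_time(n, size, bandwidth, rtt, idle_between_s=0.001)
print(f"serialized: {serialized:.1f} s, pipelined: {pipelined:.1f} s")
```

With these example figures the serialized case takes roughly 108 s and the pipelined case roughly 9 s, because the 999 idle round trips are replaced by 999 short processing gaps.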

Figure 2: GridFTP file transfers with pipelining

According to the proposed pipelining protocol, the client is allowed to send an unlimited number

of outstanding commands. In practice, the number of outstanding commands will be limited by

the GridFTP server implementation and TCP flow control. The client is free to send as many

commands as it wishes on the TCP control channel. However, the GridFTP server will read a

limited number of these commands out of the TCP buffer and into its process space. All other

outstanding commands will remain in the operating system’s TCP buffers. As the server side

buffers get full, the TCP window will close. Ultimately, the sending side TCP buffers will fill up,

and the client’s attempt to send future commands will be stalled. In most cases there is little

performance benefit for a client to have more than three outstanding commands; however,

allowing an unlimited number makes client implementation simpler. The client then waits for the same

number of acknowledgments from the server.
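
A client that limits itself to a fixed number of outstanding commands could be structured as in the following Python sketch. It runs against an in-memory stand-in rather than a real GridFTP server; the names FakeServer and pipelined_fetch are hypothetical, and the default limit of three simply reflects the observation above that more outstanding commands rarely help.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


class FakeServer:
    """Stand-in for a GridFTP server: queues requests, completes them in order."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()

    def receive(self, command: str) -> None:
        self.queue.append(command)

    def complete_next(self) -> str:
        # Finish the oldest queued transfer; the returned path stands for
        # the "226 Transfer Complete" reply for that file.
        return self.queue.popleft().split(" ", 1)[1]


def pipelined_fetch(files: Iterable[str], server: FakeServer,
                    max_outstanding: int = 3) -> Tuple[List[str], int]:
    """Send RETR commands, keeping at most max_outstanding unacknowledged."""
    if max_outstanding < 1:
        raise ValueError("max_outstanding must be at least 1")
    pending = deque(files)
    outstanding = 0
    peak = 0
    acks: List[str] = []
    while pending or outstanding:
        # Top up the pipeline without waiting for earlier acknowledgments.
        while pending and outstanding < max_outstanding:
            server.receive(f"RETR {pending.popleft()}")
            outstanding += 1
        peak = max(peak, outstanding)
        # One transfer finishes; its acknowledgment frees a slot.
        acks.append(server.complete_next())
        outstanding -= 1
    return acks, peak


acks, peak = pipelined_fetch(["f1", "f2", "f3", "f4", "f5"], FakeServer())
print(acks, peak)
```

The acknowledgments come back in the order the requests were sent, and the client ends up waiting for exactly as many acknowledgments as it issued commands.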

GridFTP Pipelining

GridFTP is a high-performance, secure, reliable data transfer protocol optimized for

high-bandwidth wide-area networks. GridFTP is an exceptionally fast transfer protocol for

large volumes of data. Implementations of it are widely deployed and used on well-connected

Grid environments such as those of the TeraGrid because of its ability to scale to network

speeds. However, when the data is partitioned into many small files instead of few large files, it

suffers from lower transfer rates. The latency between the serialized transfer requests of each file

directly detracts from the amount of time data pathways are active, thus lowering achieved

throughput. Further, when a data pathway is inactive, the TCP window closes, and TCP must go

through the slow-start algorithm. The performance penalty can be severe. This situation is known

as the “lots of small files” problem. In this paper we introduce a solution to this problem. This

solution, called pipelining, allows many transfer requests to be sent to the server before any one

completes. Thus, pipelining hides the latency of each transfer request by sending the requests

while a data transfer is in progress. We present an implementation and performance study of the

pipelining solution.

LITERATURE SURVEY

Utility computing is the conceptual core of our analysis but much of the current debate on

this idea is discourse on the concept of “Cloud Computing” – a more marketable vision perhaps

than utility computing. Cloud Computing is a new and confused term. Gartner defines cloud

computing succinctly as “a style of computing where massively scalable IT-related capabilities

are provided ‘as a service’ using Internet technologies to multiple external customers”. Yet our

interest is not in the particulars of cloud computing itself but the opportunities presented for

researchers and practitioners by this new technology. We argue that fundamental to both cloud

computing and utility computing is a decoupling of the physicality of IT infrastructure from the

architecture of such infrastructure’s use. While in the past we thought about the bare-metal

system (a humming grey box in an air-conditioned machine room with physical attributes and a

host of peripherals) today such ideas are conceptual and virtualized – hidden from view. It is this

decoupling which will form the basis of our discussion of the technology of the Grid.

There certainly is a strong element of hype in much of the Utility, Grid and Cloud computing

discourse and perhaps such hype is necessary. As Swanson and Ramiller (1997) remind us, the

organising visions of information and communications technology are formed as much in

extravagant claims and blustering sales talk as they are in careful analysis, determination of

requirements or proven functionality. We can at times observe a distinct tension between the

technologists’ aspiration to develop and define an advanced form of computer infrastructure, and

a social construction of such technology through discourses of marketing and public relations. The

plethora of terms associated with Utility computing within commercial settings includes

Autonomic Computing; Grid Computing; On-Demand Computing; Real-time Enterprise;

Service-Oriented Computing; Adaptive computing (or Adaptive Enterprise) (Goyal and Lawande

2006; Plaszczak and Wellner 2007) and peer-to-peer computing (Foster and Iamnitchi 2003). We

have adopted the term “utility computing” as our categorization of this mixed and confused

definitional landscape.

Many authors who write about Utility Computing start with an attempt to provide a definition,

often accompanied by a comment as to the general “confusion” surrounding the term (e.g.

Gentzsch 2002). It is unrealistic to expect an accepted definition of a technology which is still

emerging, but by tracing the evolution of definitions in currency we can see how the

understanding of new technology is influenced by various technical, commercial and socio-

political forces. Put another way, the computer is not a static thing, but rather a collection of

meanings that are contested by different groups (Bijker 1995), and, like any other technology,

embodies to varying degrees its developers’ and users’ social, political, psychological, and professional

commitments, skills, prejudices, possibilities and constraints.

Computing Utility: The Shifting nature of Computing.

Since Von Neumann defined our modern computing architecture we have seen computers as

consisting of a processing unit (capable of undertaking calculation) and a memory (capable of

storing instructions and data for the processing unit to use). Running on this machine is

operating system software which manages (and abstracts) the way applications software makes

use of this physical machine. The development of computing networks, client-server computing

and ultimately the internet essentially introduced a form of communication into this system –

allowing storage and computing to be shared with other locations or sites - but ultimately the

concept of a "personal computer" or "server computer" remains.

This basic computer architecture no longer represents computing effectively. Firstly the physical

computer is becoming virtualized – represented as software rather than as a physical machine.

Secondly it is being distributed through Grid computing infrastructure such that it is owned by

virtual rather than physical organizations. Finally these two technologies are brought together in

a commoditization of computing infrastructure as cloud computing – where all physicality of the

network and computer is hidden from view. It is for this reason that in 2001 Shirky, at a P2P

Webservices conference, stated that “Thomas Watson’s famous quote that ‘I think there is a

world market for maybe five computers’ was wrong - he overstated the number by four”. For

Shirky the computer was now a single device collectively shared. All PCs, mobile phones and

connected devices share this Cloud of services on demand – and where processing occurs is not

relevant. We now review the key technologies involved in Utility Computing.

1: Internet – Bandwidth and Internet Standards

At the core of the Utility Computing model is the network. The internet and its associated standards have enabled interoperability among systems and provide the foundation for Grid Standards.

2: Virtualisation

Central to the Cloud Computing idea is the concept of virtualising the machine. While we desire services, these are provided by personal-machines (albeit simulated in software).

3: Grid Computing Middleware and Standards

Just as the Internet infrastructure (standards, hardware and software) provides the foundation of the Web, so Grid Standards and Software extend this infrastructure to provide utility computing utilising large clusters of distributed computers.

Internet – Bandwidth and Standards

The internet emerged because of attempts to connect mainframe computers together to undertake

analysis beyond the capability of one machine - for example within the SAGE air-defence

system or ARPANET for scientific analysis (Berman and Hey 2004). Similarly the Web emerged

from a desire to share information globally between various different computers (Berners-Lee

1989). Achieving such distribution of resources is however founded upon a communications

infrastructure (of wires and radio-waves) capable of transferring information at the requisite

speed (bandwidth) and without delays (latency). Until the early 2000s however the bandwidth

required for large applications and processing services to interact was missing. During the dot-

com boom however a huge amount of fibre-optic cable and network routing equipment was

installed across the globe by organisations, such as the failed WorldCom, which reduced costs

dramatically and increased availability.

Having an effective network infrastructure in place is not enough. A set of standards (protocols)

is also required to define mechanisms for resource sharing (Baker, Apon et al. 2005).

Internet standards (HTTP/HTML/TCP-IP) made the Web possible by defining how information

is shared globally through the internet. These standards ensure that a packet of information is

reliably directed between machines. It is this standardised high-speed high-bandwidth Internet

infrastructure upon which Utility Computing is built.

Virtualization

The basic idea of virtualization for cloud computing is to provide a software simulation of an

underlying hardware machine. These simulated machines (so called Virtual Machines) present

themselves to the software running upon them as identical to a real machine of the same

specification. As such the virtual machine must be installed with an operating system (e.g.

Windows or Linux) and can then run applications within it. This is not a new technology and was

first demonstrated in 1967 by IBM’s CP/CMS systems as a means of sharing a mainframe with

many users who are each presented with their own “virtual machine” (Ceruzzi 2002). However

its relevance to modern computing rests in its ability to abstract the computer away from the

physical box and onto the internet. “Today the challenge is to virtualize computing resources

over the Internet. This is the essence of Grid computing, and it is being accomplished by

applying a layer of open Grid protocols to every “local” operating system, for example Linux,

Windows, AIX, Solaris, z/OS” (Wladawsky-Berger 2004). Once such Grid enabled virtualization

is achieved it is possible to decouple the hardware from the now virtualized machine, for

example running multiple virtual machines on one server or moving a virtual machine between

servers using the internet. Crucially for the user it appears they are interacting with a machine

with similar attributes to a desktop machine or server - albeit somewhere within the internet-

cloud.

Grid Computing

The term “Grid” is increasingly used in discussions about the future of ICT infrastructure, or

more generally in discussion of how computing will be done in the future. Unlike “Cloud

computing” which emerges and belongs to an IT industry and marketing domain, the term “Grid

Computing” emerged from the super-computing (High Performance Computing) community

(Armbrust, Fox et al. 2009). Our discussion of Utility computing begins with this concept of

Grids as a foundation. As with the other concepts, however, hyperbole around

Grids abounds, with arguments proposed that they are “the next generation of the internet” or

“the next big thing”, or that they will “overturn strategic and operating assumptions, alter industrial

economics, upset markets (…) pose daunting challenges for every user and vendor” (Carr 2005)

and even “provide the electronic foundation for a global society in business, government,

research, science and entertainment” (Berman, Fox et al. 2003). Equally, Grids have been

accused of faddishness and that “there is nothing new” in comparison to older ideas, or that the

term is used simply to attract funding or to sell a product with little reference to computational

Grids as they were originally conceived (Sottrup and Peterson 2005).

From a technologist’s perspective an overall description might be that Grid technology aims to

provide utility computing as a transparent, seamless and dynamic delivery of computing and data

resources when needed, in a similar way to the electricity power Grid (Chetty and Buyya 2002;

Smarr 2004). Indeed the word grid is directly taken from the idea of an electricity grid, a utility

delivering power as and when needed. To provide that power on demand a Grid is built (held

together) by a set of standards (protocols) specifying the control of such distributed resources.

These standards are embedded in the Grid middleware, the software which powers the Grid. In a

similar way to how Internet Protocols such as FTP and HTTP enable information to be passed

through the internet and displayed on users’ PCs, so Grid protocols enable the integration of

resources such as sensors, data-storage, computing processors etc. (Wladawsky-Berger 2004).

The idea of the Grid is usually traced back to the mid 1990s and the I-Way project to link

together a number of US supercomputers as a ‘metacomputer’ (Abbas, 2004). This was led by

Ian Foster of the University of Chicago and Argonne National Laboratory. Foster and Carl

Kesselman then started the Globus project to develop the tools and middleware for this

metacomputer. This toolkit rapidly took off in the world of supercomputing and Foster

remains a prominent proponent of the Grid. According to Foster and Kesselman’s (1998) “bible

of the grid” a computational Grid is “a hardware and software infrastructure that provides

dependable, consistent, pervasive and inexpensive access to high-end computational

capabilities”. In this Foster highlights “high-end” in order to focus attention on Grids as a

supercomputing resource supporting large scale science; “Grid technologies seek to make this

possible, by providing the protocols, services and software development kits needed to enable

flexible, controlled resource sharing on a large scale” (Foster 2000).

Three years after their first book however the same authors shift their focus, again speaking

of Grids as "coordinated resource sharing and problem solving in dynamic, multi-institutional

virtual organizations" (Foster, Kesselman et al. 2001). The inclusion of “multi-institutional”

within this 2001 definition highlights the scope of the concept as envisaged by these key Grid

proponents, with Berman (2003) further adding that Grids enable resource sharing “on a global

scale”. Such definitions, and the concrete research projects that underlie them, make the

commercial usage of the Grid seem hollow and opportunistic. These authors seem critical of the

contemporaneous re-badging by IT companies of existing computer-clusters and databases as

“Grid enabled” (Goyal and Lawande 2006; Plaszczak and Wellner 2007). This critique seems to

run through the development of Grids within supercomputing research and science where many

lament the use of the term by IT companies marketing clusters of computers in one location.

In 2002 Foster provides a three point checklist to assess a Grid (Foster 2002). A Grid:

1) coordinates resources that are NOT subject to centralized control;

2) uses standard, open, general purpose protocols and interfaces;

3) delivers non-trivial qualities of service.

Foster’s highlighting of ‘NOT’, and the inclusion of ‘open protocols’, appear as a further challenge to the commercialization of centralized, closed grids.

While this checklist was readily accepted by the academic community and is widely

cited, unsurprisingly, it was not well received by the commercial Grid community (Plaszczak

and Wellner 2007). The demand for “decentralization” was seen as uncompromising and

excluded “practically all known ‘grid’ systems in operation in industry” (Plaszczak and Wellner

2007, p57). It is perhaps in response to this definition that the notion of “Enterprise Grids”

(Goyal and Lawande 2006) emerged as a form of Grid operating within an organisation, though

possibly employing resources across multiple corporate locations employing differing

technology. It might ultimately be part of the reason why "Cloud computing" has eclipsed Grid

computing as a concept. The commercial usage of Grid terms such as “Enterprise Grid

Computing” highlights the use of Grids away from the perceived risk of globally distributed

Grids and is the foundation of modern Cloud Computing providers (e.g. Amazon S3). The focus

is not to achieve increased computing power through connecting distributed clusters of

machines, but as a solution to the “Silos of applications and IT systems infrastructure” within an

organisation’s IT function (Goyal and Lawande 2006, p4) through a focus on utility computing

and reduced complexity. Indeed in contrast to most academic Grids such “Enterprise Grids”

demand homogeneity of resources and centralization within Grids as essential components. It is

these Grids which form the backdrop for Cloud Computing and ultimately utility computing, in

which cloud providers essentially maintain a homogeneous server farm providing virtualized cloud

services. In such cases the Grid is far from distributed, rather existing as “a centralized pool of

resources to provide dedicated support for virtualized architecture” (Plaszczak and Wellner

2007, p174) often within data-centers.

Before considering the nature of Grids we discuss their underlying architecture. Foster (Foster,

Kesselman et al. 2001) provides an hour-glass Grid architecture (Figure 1). It begins with the

fabric which provides the interfaces to the local resources of the machines on the Grid (be they

physical or virtual machines). This layer provides the local, resource-specific facilities and could

be computer processors, storage elements, tape-robots, sensors, databases or networks. Above

this is a resource and connectivity layer which defines the communication and authentication

protocols required for transactions to be undertaken on the Grid. The next layer provides a

resource management function including directories, brokering systems, as well as monitoring

and diagnostic resources. In the final layer reside the tools and applications which use the Grid. It

is here that Virtualization software resides to provide services.
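
The layering described above can be written down as a small ordered data structure. This Python sketch is only an illustration of the text: the layer names are a paraphrase of the paragraph, not identifiers from Foster's paper or from any Grid middleware.

```python
from enum import IntEnum
from typing import List


class GridLayer(IntEnum):
    """Layers of the hour-glass Grid architecture, from bottom to top."""
    FABRIC = 1                     # local resources: processors, storage, sensors, networks
    RESOURCE_AND_CONNECTIVITY = 2  # communication and authentication protocols
    RESOURCE_MANAGEMENT = 3        # directories, brokering, monitoring, diagnostics
    APPLICATION = 4                # tools and applications, including virtualization software


def layers_below(layer: GridLayer) -> List[GridLayer]:
    """Return the layers that the given layer builds upon, bottom first."""
    return [other for other in GridLayer if other < layer]


print([l.name for l in layers_below(GridLayer.APPLICATION)])
```

Each layer relies only on the layers beneath it, which is why the tools and applications at the top can treat the fabric as an abstract pool of resources.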

Figure 1: The Layered Grid Architecture.

One of the key challenges of Grids is the management of the resources they provide to

users. Central to achieving this is the concept of a Virtual Organisation (VO). A Virtual

Organisation is a set of individuals and/or institutions defined by the sharing rules for a set of

resources (Foster and Kesselman 1998) or “a set of Grid entities, such as individuals,

applications, services or resources, that are related to each other by some level of trust”

(Plaszczak and Wellner 2007). By necessity these resources must be controlled “with resource

providers and consumers defining clearly and carefully just what is shared, who is allowed to

share, and the conditions under which sharing occurs” (Foster and Kesselman 1998) and for this

purpose VOs are technically defined along with the rules of their resource sharing. A Grid VO

implies the assumptions of “the absence of central location, central control, omniscience, and an

existing trust relationship” (Abbas 2004). It is this ability to control access to resources which is

also vital within Cloud Computing - allowing walled-gardens for security and accounting of

resource usage for billing.
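The sharing rules of a Virtual Organisation can be pictured as a small access-control table. The following Python sketch is illustrative only: the class `VirtualOrganisation`, its methods, and the member and resource names are hypothetical and are not taken from any Grid toolkit. It assumes that the "conditions under which sharing occurs" reduce to a simple CPU-hour quota.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganisation:
    """A toy VO: members plus explicit rules about what each may use."""
    name: str
    members: set = field(default_factory=set)
    # Maps (member, resource) to the maximum CPU hours that member may use.
    rules: dict = field(default_factory=dict)
    # Records CPU hours consumed, which could later feed a billing system.
    usage: dict = field(default_factory=dict)

    def share(self, member, resource, max_cpu_hours):
        """State what is shared, with whom, and under which condition."""
        self.members.add(member)
        self.rules[(member, resource)] = max_cpu_hours

    def request(self, member, resource, cpu_hours):
        """Grant access only if a rule exists and its quota is not exceeded."""
        limit = self.rules.get((member, resource))
        if limit is None:
            return False  # no sharing rule: access denied
        used = self.usage.get((member, resource), 0.0)
        if used + cpu_hours > limit:
            return False  # condition of sharing violated
        self.usage[(member, resource)] = used + cpu_hours
        return True


vo = VirtualOrganisation("physics-vo")
vo.share("alice", "cluster-a", max_cpu_hours=10)
granted = vo.request("alice", "cluster-a", 4)      # within quota
over_quota = vo.request("alice", "cluster-a", 7)   # 4 + 7 > 10
outsider = vo.request("mallory", "cluster-a", 1)   # not covered by any rule
```

The `usage` table doubles as the accounting record that a billing system would need, which is why access control and accounting are naturally kept together.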

Various classes and categories of Grids exist. Abbas categorises Grids by increasing
scale: desktop grids, cluster grids, enterprise grids, and global grids
(Abbas 2004). Desktop Grids are based on existing dispersed desktop PCs and can create a new

computing resource by employing unused processing and storage capacity while the existing user

can continue to use the machine. Cluster Grids describe a form of parallel or distributed

computer system that consists of a collection of interconnected yet standardised computer

nodes working together to act, as far as the user is concerned, as a single unified computing

resource. Many existing supercomputers are clusters which “use Smart Software Systems (SSS)

to virtualise independent operation-system instances to provide an HPC service” (Abbas 2004).

All the above are arguably grids, and potentially can just about live up to Foster's three tests.

However, for the information systems field, for Pegasus, and for those who wish to explore

Cloud Computing, it is the final category of global Grids that is the most significant. Global

Grids employ the public internet infrastructure to communicate between Grid Nodes, and rely on

heterogeneous computing and networking resources. Some global grids have gained a large

amount of publicity by providing social benefits which capture the public imagination. Perhaps
the first such large-scale project was SETI@home, which searches radio-telescope data for signs
of extra-terrestrial intelligence. WorldCommunityGrid.org, which undertakes research for healthcare,
and Folding@home, concerned with protein folding experiments, are other examples.
Folding@home can indeed claim to be the world's most powerful distributed computing network

according to the Guinness Book of Records, with 700,000 Sony PlayStation 3 machines and over

1,000 trillion calculations per second[9]. Each works by dividing a problem into steps and

distributing software over the internet to the computers of those volunteering. Since within the

home and workplace a large number of desktop computers remain idle most of the time, such
donations have little impact on the user. Indeed the average computer is idle for over 90% of the
time, and even when used only a very small part of the CPU’s capability is employed

(Smith 2005).

Another way to categorise Grids is by the types of solutions that they best address (Jacob 2003).

A computational grid is focused on undertaking large numbers of computations rapidly, and

hence the focus is on using high performance processors. A data grid’s focus is upon the

effective storage and distribution of large amounts of data, usually across multiple organisations
or locations. The focus of such systems is upon data integrity, security and ease of access. It
should be stressed that there are no hard boundaries between these two types of grid: one
need often presupposes the other, and real users face both issues.

As an example of a grid project with more of a data orientation, consider the Biomedical
Informatics Research Network (BIRN), a grid infrastructure project that serves biomedical research
needs (http://www.nbirn.net/index.shtm). They express their offerings in terms of 5
complementary elements: a cyber infrastructure, software tools (applications) for biomedical

data gathering, resources of shared data, data integration support, an ontology and support for

multi-site integration of research activity. As they say, “By intertwining concurrent revolutions

occurring in biomedicine and information technology, BIRN is enabling researchers to

participate in large scale, cross-institutional research studies where they are able to acquire,

share, analyze, mine and interpret both imaging and clinical data acquired at multiple sites using

advanced processing and visualization tools.”

Other examples of Grid Computing exist within science, particularly particle physics. The

particle physics community faces the challenge of analyzing the unprecedented amounts of data

- some 15 Petabytes per year - that will be produced by the LHC (Large Hadron Collider)

experiments at CERN[10]. To process this data CERN required around 100,000 computer-equivalents[11]
forming its associated grids by 2007, spread across the globe and incorporating a

number of grid infrastructures (Faulkner, Lowe et al. 2006). In using the Grid physicists submit

their computing-jobs to the Grid which spreads across the globe. Similarly data from the LHC is

initially processed at CERN but is quickly spread to 12 computer centres across the world (so

called Tier-1 Grid sites). From here data is spread to local data-centres at universities within

these countries (Tier-2 sites).


EXISTING SYSTEM

CONVENTIONAL SUPERCOMPUTERS:

“Distributed” or “grid” computing in general is a special type of parallel computing that relies on

complete computers (with onboard CPUs, storage, power supplies, network interfaces, etc.)

connected to a network (private, public or the Internet) by a conventional network interface, such

as Ethernet. This is in contrast to the traditional notion of a supercomputer, which has many

processors connected by a local high-speed computer bus. The primary advantage of distributed

computing is that each node can be purchased as commodity hardware, which, when combined,

can produce a computing resource similar to a multiprocessor supercomputer, but at a lower cost.

This is due to the economies of scale of producing commodity hardware, compared to the lower

efficiency of designing and constructing a small number of custom supercomputers. The primary

performance disadvantage is that the various processors and local storage areas do not have high-

speed connections. This arrangement is thus well-suited to applications in which multiple

parallel computations can take place independently, without the need to communicate

intermediate results between processors. The high-end scalability of geographically dispersed

grids is generally favorable, due to the low need for connectivity between nodes relative to the

capacity of the public Internet.

There are also some differences in programming and deployment. It can be costly and difficult to

write programs that can run in the environment of a supercomputer, which may have a custom

operating system, or require the program to address concurrency issues. If a problem can be

adequately parallelized, a “thin” layer of “grid” infrastructure can allow conventional, standalone

programs, given a different part of the same problem, to run on multiple machines. This makes it

possible to write and debug on a single conventional machine, and eliminates complications due

to multiple instances of the same program running in the same shared memory and storage space

at the same time.
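The idea of giving an ordinary standalone routine a different part of the same problem can be sketched as follows. This is a minimal illustration, not a Grid middleware: the function names are hypothetical, the prime-counting task is chosen only because its parts need no communication, and a local thread pool stands in for the machines of a grid.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(bounds):
    """Ordinary standalone routine: count primes in the range [lo, hi)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(lo, hi, parts):
    """Cut [lo, hi) into contiguous chunks that need no communication."""
    step = (hi - lo + parts - 1) // parts
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


chunks = split_range(0, 1000, parts=4)
# The thread pool is a local stand-in for four grid nodes.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(count_primes, chunks))
total = sum(partial_counts)
```

Because `count_primes` never shares state with another instance, it can be written and debugged on a single machine, and the only grid-specific code is the thin layer that splits the range and sums the partial results.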

Design considerations and variations

One feature of distributed grids is that they can be formed from computing resources belonging

to multiple individuals or organizations (known as multiple administrative domains). This can

facilitate commercial transactions, as in utility computing, or make it easier to

assemble volunteer computing networks.

One disadvantage of this feature is that the computers which are actually performing the

calculations might not be entirely trustworthy. The designers of the system must thus introduce

measures to prevent malfunctions or malicious participants from producing false, misleading, or

erroneous results, and from using the system as an attack vector. This often involves assigning

work randomly to different nodes (presumably with different owners) and checking that at least

two different nodes report the same answer for a given work unit. Discrepancies would identify

malfunctioning and malicious nodes.
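The redundancy check described above can be sketched in a few lines of Python. The function `verify_work_unit`, the node names, and the simulated faulty node are all hypothetical, and the choice of three replicas with a quorum of two is an assumption made for illustration rather than a rule used by any particular project.

```python
import random
from collections import Counter


def verify_work_unit(work_unit, nodes, compute, replicas=3, quorum=2, rng=None):
    """Send one work unit to several randomly chosen nodes and compare answers.

    Returns (accepted_result, suspect_nodes). accepted_result is None when
    fewer than `quorum` nodes agree.
    """
    rng = rng or random.Random()
    chosen = rng.sample(nodes, replicas)
    answers = {node: compute(node, work_unit) for node in chosen}
    result, votes = Counter(answers.values()).most_common(1)[0]
    if votes < quorum:
        return None, sorted(chosen)  # no agreement: trust nobody yet
    suspects = sorted(n for n, a in answers.items() if a != result)
    return result, suspects


def simulated_compute(node, x):
    """Hypothetical node behaviour: 'node-c' always reports a wrong answer."""
    return x * x + 1 if node == "node-c" else x * x


nodes = ["node-a", "node-b", "node-c"]
result, suspects = verify_work_unit(7, nodes, simulated_compute,
                                    rng=random.Random(0))
```

Here two nodes agree on 49, so that answer is accepted and `node-c` is flagged as a suspect. A real system would also need to handle colluding nodes, which a simple majority cannot detect.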

Due to the lack of central control over the hardware, there is no way to guarantee that nodes will

not drop out of the network at random times. Some nodes (like laptops or dialup Internet

customers) may also be available for computation but not network communications for

unpredictable periods. These variations can be accommodated by assigning large work units

(thus reducing the need for continuous network connectivity) and reassigning work units when a

given node fails to report its results in the expected time.
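Reassignment after a missed deadline can be sketched as a small work queue. The class `WorkQueue` and its method names are hypothetical, and time is passed in explicitly as `now` (in seconds) so that the example stays deterministic; a real scheduler would read a clock instead.

```python
class WorkQueue:
    """Hands out work units and reassigns those whose deadline has passed."""

    def __init__(self, units, deadline):
        self.pending = list(units)   # units not currently assigned
        self.in_flight = {}          # unit -> (node, time assigned)
        self.done = {}               # unit -> result
        self.deadline = deadline     # seconds a node may take

    def assign(self, node, now):
        """Give the next unit to `node`, reclaiming overdue units first."""
        for unit, (_, started) in list(self.in_flight.items()):
            if now - started > self.deadline:
                del self.in_flight[unit]
                self.pending.append(unit)
        if not self.pending:
            return None
        unit = self.pending.pop(0)
        self.in_flight[unit] = (node, now)
        return unit

    def report(self, node, unit, result):
        """Accept a result only from the node that currently holds the unit."""
        if self.in_flight.get(unit, (None, None))[0] != node:
            return False  # late answer from a node that was given up on
        del self.in_flight[unit]
        self.done[unit] = result
        return True


queue = WorkQueue(["u1", "u2"], deadline=3600)
first = queue.assign("laptop", now=0)          # laptop takes u1, then vanishes
second = queue.assign("desktop", now=10)       # desktop takes u2
queue.report("desktop", second, "r2")
retry = queue.assign("desktop", now=4000)      # u1 is overdue and reassigned
late = queue.report("laptop", "u1", "stale")   # rejected: no longer the holder
```

A long deadline combined with large work units means a node only needs the network twice per unit, once to fetch it and once to report.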

The impacts of trust and availability on performance and development difficulty can influence

the choice of whether to deploy onto a dedicated computer cluster, to idle machines internal to

the developing organization, or to an open external network of volunteers or contractors. In many

cases, the participating nodes must trust the central system not to abuse the access that is being

granted, by interfering with the operation of other programs, mangling stored information,

transmitting private data, or creating new security holes. Other systems employ measures to

reduce the amount of trust “client” nodes must place in the central system such as placing

applications in virtual machines.

Public systems or those crossing administrative domains (including different departments in the

same organization) often result in the need to run on heterogeneous systems, using

different operating systems and hardware architectures. With many languages, there is a trade off

between investment in software development and the number of platforms that can be supported

(and thus the size of the resulting network). Cross-platform languages can reduce the need to

make this trade off, though potentially at the expense of high performance on any given node

(due to run-time interpretation or lack of optimization for the particular platform). There are

diverse scientific and commercial projects to harness a particular associated grid or for the

purpose of setting up new grids. BOINC is a common one for various academic projects seeking

public volunteers; more are listed at the end of the article.

In fact, the middleware can be seen as a layer between the hardware and the software. On top of

the middleware, a number of technical areas have to be considered, and these may or may not be
middleware independent: [...] management, Trust and Security, Virtual organization management,

License Management, Portals and Data Management. These technical areas may be taken care of

in a commercial solution, though the cutting edge of each area is often found within specific

research projects examining the field.

Disadvantages of Conventional Supercomputers:

The disadvantages are power usage, heat and cost. In overclocked computers, heat can
damage the components, which in turn raises the cost through
replacement parts. 64-bit processors, which can provide better processing
capabilities, can also have compatibility issues with some software.

Proposed system:

Grid:

A grid is a combination of different components which collectively act as part of one large
electrical or electronic circuit.

Figure 1: Architecture of a Grid

Grid computing: The term grid computing means that a large number of computers are connected

together to collectively solve a problem of very high complexity and magnitude.

Grid computing is all about sharing, aggregating, hosting, offering services

across the world for the benefit of mankind.

Grid computing is a form of networking. Unlike conventional networks that

focus on communication among devices, grid computing harnesses unused processing cycles of

all computers in a network for solving problems too intensive for any stand-alone machine.

A well-known grid computing project is the SETI (Search for Extraterrestrial Intelligence)

@Home project, in which PC users worldwide donate unused processor cycles to help the search

for signs of extraterrestrial life by analyzing signals coming from outer space. The project relies

on individual users to volunteer to allow the project to harness the unused processing power of

the user's computer. This method saves the project both money and resources.

Grid computing does require special software that is unique to the computing project for which

the grid is being used.

Figure 2: Architecture Of Grid Computing

1. Current Issues In Grid Computing:

Grid Computing is still very much in its development stage and there are a number of issues

that must be addressed or resolved before it can be considered as a stable technology. Some of

these issues are discussed below.

1.1 The Grid versus Many Grids:

A distinction must be made between the idea of a single, worldwide, ubiquitous grid and the idea

of many separate grids located in businesses and on university campuses. The original intention

of Grid Computing was that it would follow the same architecture as the electricity grid. This

means that whenever and wherever you needed compute power you would simply “plug in” to

The Grid and the processing would be done. There would be no need to know where the

computing was being done - just as there is no need for me to know where the power that is

lighting this room is coming from - only that it was being done. In the same way that I don't need

to know whether the electricity lighting this room is coming from a hydro-electric power plant in

Fiordland or a wind turbine in Wellington, I wouldn't care if my complicated simulation were

being run on a spare machine next door or on an idle server somewhere on the other side of the

world. In fact, The Grid could be viewed as a Grid of Grids, in much the same way as the Internet

is a network of networks. Although work is still being done toward creating a single Grid, it is

already the case that there are many disparate grids worldwide that are all completely isolated

from each other. Having many separate Grids makes issues like authentication and Virtual

Organisations much simpler, which is one of the reasons that The Grid has not emerged. It also

eliminates the need for some sort of global billing system, which is discussed further under
Grid Economics.

Some progress toward creating a single worldwide grid has been made, however. The
PlanetLab project is a distributed testbed for testing new networking protocols, planetary-scale
sharing, and many other ideas which can benefit from being widely distributed. It

involves hundreds of computers at different locations around the world, mostly within academic

institutions, on which researchers at the institutions can run experiments. It is not an initiative

aimed at creating a global Computational Grid but it does provide some of the things that a

Grid must provide, such as authentication and authorisation. It currently has 361 nodes (as

at 20 February 2004)[30] connected to it so it is far short of being a worldwide Grid but it is

certainly an important step toward it, both in the new research initiatives that it has allowed

and in demonstrating that world-wide distributed computing projects are feasible. PlanetLab was
expected to have over 1000 nodes distributed over the world by the end of
2004. Its only node in New Zealand is under the care of the Network Research Group in the
Department of Computer Science and Software Engineering at the University of Canterbury in
Christchurch. So far the only Australian node of PlanetLab is located at the University of
Technology in Sydney.

1.3 No-one wants to share:

One of the biggest problems facing Grid Computing is not a technological one but a social one.

Even when the technology exists for Grid Computing to work easily and flawlessly, people are

still required to donate their spare CPU cycles or Grid Computing will not work at all. Although

one of the major points of Grid Computing is that only spare cycles will be used, it still goes

against human nature to allow others to access their computers and run programs on them. The fear
of viruses is no doubt valid, as systems viewed as secure in the past have
been shown not to be so; much work must therefore go into developing a security infrastructure that can

be completely trusted.

The SETI@home project, and others like it, in which volunteers around the world allow
their computers to be used for scientific research, show that some people at least are willing to
share for no direct benefit to themselves, but it is unlikely that everyone would allow this. Within

single businesses or university departments it is likely that it could be official policy that every

computer must be part of the organisation's Grid, but this would probably not work for The Grid

without some sort of global billing system.

5.3 Grid Economics:

Before all the separate grids can be connected into one `supergrid' some sort of billing system

must be established that is accepted and trusted by everyone. A worldwide Grid is unlikely
to make use of almost all spare CPU time without some incentive for people to make

their computers available. However, in order for a world-wide billing system to work, there will

need to be some way of accurately keeping track of the CPU time used, the CPU time provided

by each user and a way of transferring payment between users. The development of such a

system in a way that is scalable and trusted by everyone is necessary before a global Grid can

become a reality.
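The three requirements named above (tracking CPU time used, tracking CPU time provided, and transferring payment) can be sketched as a simple ledger. The class `CpuLedger`, the site names, and the flat price of 0.25 currency units per CPU-second are hypothetical; the sketch ignores the hard parts, namely making the measurements trustworthy and the system scalable.

```python
from collections import defaultdict


class CpuLedger:
    """Tracks CPU-seconds provided and consumed, and settles the difference."""

    def __init__(self, price_per_cpu_second):
        self.price = price_per_cpu_second
        self.provided = defaultdict(float)  # CPU-seconds a user donated
        self.consumed = defaultdict(float)  # CPU-seconds a user used

    def record_job(self, consumer, provider, cpu_seconds):
        """Log one job that ran on `provider`'s machine for `consumer`."""
        self.consumed[consumer] += cpu_seconds
        self.provided[provider] += cpu_seconds

    def balance(self, user):
        """Positive: the user is owed money. Negative: the user owes money."""
        return (self.provided[user] - self.consumed[user]) * self.price


ledger = CpuLedger(price_per_cpu_second=0.25)
ledger.record_job(consumer="uni-a", provider="uni-b", cpu_seconds=120)
ledger.record_job(consumer="uni-b", provider="uni-a", cpu_seconds=20)
```

In this example `uni-a` used 100 more CPU-seconds than it provided, so its balance is -25.0 and that of `uni-b` is +25.0; the balances always sum to zero because every CPU-second consumed was provided by someone.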

The development of such a system could lead to some sort of global bidding system for
compute power, which would fluctuate like the stock market. The value of CPU time would
vary over time according to supply and demand. Daytime hours in North America during the

working week would probably have the highest demand so would cost more, but could make use

of the servers in Europe and Asia that are not handling their peak capacity. The analogy of the

Computer Grid with the electricity grid can be expanded further: just as it is possible to feed
power back into the electricity grid, it will be possible to feed computing power back into the
Computer Grid. In order for a stock-market-like Grid billing system to succeed, several obstacles

must be overcome. Local resources must be able to be used first, otherwise a company could

incur costs from using The Grid that they wouldn't have otherwise. This includes stopping non-

local users from using the local resources in order to run local Grid applications. In order for a

stock-market system to work it must also be made sure that businesses or universities do not

incur charges that are more than the gain they would have made. If running an application on

The Grid saves several seconds but costs $100, then it is probably not worth it. The ISP charges

as well as the Grid charges must be taken into account when calculating how much it will cost to

run on The Grid, which further complicates the issue. These problems mean that although The

Grid could certainly come into being at some point, it is likely that in the next few years at least the

development of Grid Computing will focus mainly on the simpler task of creating separate Grids

at separate organisations.
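The cost check described in this section can be written as a one-line comparison. The function name, the parameter names, and the assumed value of 0.5 currency units per second saved are all hypothetical; the point is only that both the Grid charge and the ISP charge enter the cost side.

```python
def worth_running_on_grid(local_seconds, grid_seconds, value_per_second_saved,
                          grid_charge, isp_charge):
    """Return True when the time saved is worth more than the total charges."""
    seconds_saved = local_seconds - grid_seconds
    benefit = seconds_saved * value_per_second_saved
    cost = grid_charge + isp_charge
    return benefit > cost


# Saving 5 seconds at a cost of $100 is not worth it (the case from the text).
small_job = worth_running_on_grid(local_seconds=20, grid_seconds=15,
                                  value_per_second_saved=0.5,
                                  grid_charge=100.0, isp_charge=2.0)
# Saving ten hours may be, under the same assumed value of time.
large_job = worth_running_on_grid(local_seconds=40_000, grid_seconds=4_000,
                                  value_per_second_saved=0.5,
                                  grid_charge=100.0, isp_charge=2.0)
```

In a market where the price of CPU time fluctuates, `grid_charge` would itself be a forecast, which is part of what complicates the decision.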

5.4 Performance Forecasting:

One of the problems with scheduling resources on a Grid is that it is hard to know how long a

resource will be available for or how good its performance will be if it is used. Researchers have

implemented a tool known as EveryWare which contains, amongst other things, a performance

forecasting mechanism [21]. With accurate forecasting, scheduling becomes simpler because it is

known that a given resource will react fast to requests or process data quickly. Without accurate

performance forecasting a scheduler could schedule a remote set of CPUs to try and speed up

processing but actually make it slower because those CPUs do not perform as well as expected.

There is still work to be done in this area, however, as the performance forecasting needs to be

incorporated into scheduler algorithms and the accuracy of performance forecasting can no doubt

be improved.
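How a forecast could feed a scheduling decision can be sketched as follows. This is not the mechanism used by EveryWare, which is not described here; simple exponential smoothing is swapped in as a stand-in forecaster, and the function names, rates, and transfer time are assumptions made for the example.

```python
def forecast(history, alpha=0.5):
    """Exponentially smoothed forecast of the next measurement."""
    estimate = history[0]
    for observed in history[1:]:
        estimate = alpha * observed + (1 - alpha) * estimate
    return estimate


def pick_resource(job_size, local_rate, remote_history, transfer_seconds):
    """Choose 'remote' only if its forecast finish time beats the local one.

    Rates are in work units per second; `remote_history` holds past measured
    rates of the remote CPUs, oldest first.
    """
    local_time = job_size / local_rate
    remote_time = transfer_seconds + job_size / forecast(remote_history)
    return "remote" if remote_time < local_time else "local"


# The remote CPUs used to be fast but have recently slowed down.
slowing = [8.0, 8.0, 2.0, 2.0]
choice = pick_resource(job_size=100, local_rate=4.0,
                       remote_history=slowing, transfer_seconds=5)
```

A scheduler that looked only at the remote CPUs' best past rate (8.0) would send the job away and make it slower; with the smoothed forecast of 3.5 the job stays local.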

5.5 The No-Defined Problems Problem:

A vital step in solving problems is identifying what they actually are. With any new technology

it is hard to know what the key problems to be solved for that technology to work are: there
are no forums for putting problems forward to be solved and no systematic attempts by various
researchers to solve them. To encourage the formulation of specific problems and solutions,
some authors propose several problems that they see as holding back the progress of Grid

Computing and challenge other researchers not only to solve those problems but to supply more.

Although Grid Computing has reached a state where a common vocabulary of
Grid Computing terms has formed and various components of any Grid Computing system have been
described, there is still inconsistency in what the different terms mean and when they are used.
When the basic terms related to Grid Computing and the components of Grid systems are agreed
upon, research into Grid Computing will be in much better shape.

5.6 Security:

As mentioned, one of the reasons that people may not want to make their computer available on

a Grid is that they do not trust other users to run code on their machines. Within small scale

Grids this is not too much of a problem as Virtual Organisations at least partially eliminate the

fear of malicious attacks. This is because in a Virtual Organisation you can authorise only those

from within a certain trusted organisation to be able to access your computer. However, there

could potentially be problems with the authorisation systems and it is possible that someone from

within the organisation could act in a malicious way. With larger scale Grids it will be

impossible to know and trust everyone who can access a single computer so the Grid

infrastructure will have to provide guarantees of security in some way.

The Java Sandbox Security Model [14] already provides an environment in which untrusted

users are restricted from making certain system calls which are not considered safe, and from

accessing memory addresses outside of a certain range. Any Grid system will have to provide a

similar mechanism, so that users will be happy to let others access their computer.

5.7 Supercomputing Power For Everyone?

In the past, supercomputing power has been available only to very few people - certain people in

research institutions and some businesses. If The Grid is ever created, though, supercomputing

power will be available to anyone who wishes to access it, although probably at a fairly large

cost. This means that, amongst other things, anyone can do huge password searches or can try

and crack public/private keys. With the creation of The Grid, these issues will have to be

addressed either by somehow restricting users from being able to do such searches or by using

even larger keys and passwords. As [5] shows, what is considered to be an unbreakable key one

year can be inadequate a few years later, and with the advent of The Grid, this situation will be

reinforced further. There are no doubt many other social issues that will arise when everyone

can have access to supercomputing power, and they will have to be addressed as well.
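The effect of aggregate computing power on key search can be made concrete with some arithmetic. The figures below are assumptions chosen for illustration: one million key trials per second per machine, and a grid of 100,000 machines. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_search(key_bits, keys_per_second_per_machine, machines):
    """Worst-case time in years to try every key of the given length."""
    keyspace = 2 ** key_bits
    rate = keys_per_second_per_machine * machines
    return keyspace / rate / SECONDS_PER_YEAR


# Assumed rate: one million key trials per second on each machine.
one_pc = years_to_search(56, 1e6, machines=1)
grid = years_to_search(56, 1e6, machines=100_000)
longer_key = years_to_search(64, 1e6, machines=100_000)
```

Under these assumptions a single machine needs roughly 2,285 years to exhaust a 56-bit keyspace, while the grid needs a little over eight days. Adding eight bits to the key multiplies the work by 256, which is why longer keys are one of the two remedies mentioned above.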

5.8 The Need Not To Centralise:

Any Grid system must have some knowledge of what resources are available in order to provide

Resource Access and Resource Discovery. The logical way to do this would be to have a central

repository listing all resources currently available and who is allowed to access them. The

problem with this centralised solution is that it is not at all scalable and means that the entire

Grid system is subject to a single point of failure. For these reasons, another way of providing

Resource Discovery is required. If there were a central repository containing details on all Grid

resources for a large Grid, the speed at which it would need to operate would be immense. The

dynamic nature of Grid resources would mean that the list of resources available would need to

be constantly updated. Because the availability of resources is dynamic, they can be taken away

from the users at any time which means that users may have to be constantly requesting access to

further resources. In a Grid of world-wide scale, a single server to handle this would not be

possible. As well as the problem of making the central server fast enough, it must also be so

reliable that it can never break down. If it did stop working then the whole Grid would also have

to stop - and even if some of the communication channels between it and certain sections of the

Grid broke, that whole section would have no other server which it could access. Some

distributed form of providing Resource Discovery is required for large Grids to operate reliably.

To solve this problem, the authors of [21] say that they have created a distributed, dynamic
system of `State Exchange Services' called Gossips, which manage resource access and discovery
and create and destroy themselves automatically. However, as stated there, not every Grid can

use that system so more work is required in this area. Other current Grid systems do not address

this problem at all (see, e.g. [1] and [20]) - but rely on centralised managers - so could not be

scaled past a certain point.
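One general way to avoid a central repository is for nodes to exchange what they know with randomly chosen peers. The sketch below shows that generic gossip idea only; it is not the Gossip system of [21], whose design is not given here, and the function and node names are hypothetical.

```python
import random


def gossip_round(views, rng):
    """Each node merges its resource view with one randomly chosen peer."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = views[node] | views[peer]
        views[node] = merged
        views[peer] = set(merged)


def rounds_until_consistent(views, rng, max_rounds=50):
    """Run gossip rounds until every node knows every resource."""
    everything = set().union(*views.values())
    for round_number in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(view == everything for view in views.values()):
            return round_number
    return None


# Initially each node knows only its own (hypothetical) resource.
views = {f"node-{i}": {f"cpu-{i}"} for i in range(8)}
rounds = rounds_until_consistent(views, random.Random(1))
```

No node holds a privileged copy of the resource list, so there is no single point of failure, and the load of keeping the list up to date is spread over all participants.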

5.9 Grid Programming Environments:

Current Computer Aided Software Engineering (CASE) tools and programming languages have

not been designed to facilitate the creation of Grid applications. What is considered to be high
level in standard software development situations - Java, the Message Passing Interface (MPI) -
is referred to as low level in Grid publications. This is because Grid Computing uses the

abstractions provided by what are currently referred to as high-level layers - Virtual Machines,

etc. - and extends them. For example Grid programmers should be able to treat a network

as one huge computer and not have to worry about the individual virtual machine computers

that make it up. This extra layer of abstraction should lead to new development environments

and possibly things like new programming keywords - `remote', `local', `secure', etc. The current

trend toward component based development will continue with Grid applications being made up

of different components at different sites. This could mean that huge data sets are stored at one

place, analysis is done on the Grid, and visualisation is done somewhere else. The component

based structure leads to the need for standard ways of storing and exchanging data, which current

tools like XML provide.
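As a small illustration of components at different sites exchanging data in a standard format, the sketch below serialises analysis results to XML and parses them back using Python's standard `xml.etree.ElementTree` module. The element names (`result`, `sample`), the dataset label, and the function names are hypothetical and do not follow any Grid data standard.

```python
import xml.etree.ElementTree as ET


def to_xml(dataset, samples):
    """Serialise analysis results so another site can read them."""
    root = ET.Element("result", attrib={"dataset": dataset})
    for name, value in samples.items():
        item = ET.SubElement(root, "sample", attrib={"name": name})
        item.text = repr(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(document):
    """Parse the document back into (dataset, samples) at the receiving site."""
    root = ET.fromstring(document)
    samples = {s.get("name"): float(s.text) for s in root.findall("sample")}
    return root.get("dataset"), samples


# Analysis site produces the document; visualisation site consumes it.
message = to_xml("run-42", {"mean": 1.5, "variance": 0.25})
dataset, samples = from_xml(message)
```

The analysis and visualisation components only need to agree on the document layout, not on each other's programming language or platform.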

6 Grid Computing at the University of Canterbury:

Grid Computing is not currently employed at the University of Canterbury (UC), but there are
several research teams who would like to work on projects that could make extensive use of

Grid Computing. This section outlines details of some of those projects and then the ways in

which they could be activated.

6.1 Research Teams and Projects:

These are projects of research teams from Physics and Astronomy (Prof. Philip Butler and

Associate Prof. Lou Reinisch), Forestry (Dr. Hamish Cochrane), Biological Sciences (Associate

Prof. Jack Heinemann) as well as from HIT Laboratory (Dr Mark Billinghurst). Their projects

are considered to be so heavily computational that they are not suitable for desktop processing.

In particular, the following projects have been planned:

Medical Imaging

The Department of Physics and Astronomy is hoping to purchase a PET/CT scanner in

the near future which would be used for Medical Imaging. Currently running the PET/CT

software on a high-end desktop computer means that only about 10% of time is spent doing

the scanning and the other 90% of the time is spent waiting for results. It is hoped that this

ratio of scanning to processing time could be increased greatly using a Grid, with processing
times reduced at least tenfold.
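The effect of a tenfold reduction in processing time on the scanning ratio can be worked out directly. The sketch below uses the 10% and 90% figures from the text; the function name is hypothetical and the calculation assumes that scanning time itself is unchanged.

```python
def scanning_fraction(scan_time, processing_time, speedup):
    """Fraction of total time spent scanning once processing is sped up."""
    faster_processing = processing_time / speedup
    return scan_time / (scan_time + faster_processing)


# From the text: 10% scanning, 90% waiting for processing on a desktop.
before = scanning_fraction(10, 90, speedup=1)
after = scanning_fraction(10, 90, speedup=10)
```

With processing ten times faster, 90 units of waiting shrink to 9, so scanning would account for 10/19, or about 53%, of the total time instead of 10%.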

Bioinformatic Analysis and Genetic Data

Researchers in the New Zealand Institute of Gene Ecology (NZIGE), which includes staff

from the Department of Forestry and the School of Biological Sciences, as well as others,

would also be ready to make use of a computational grid. The research that would use the

grid would mostly involve (in very simple terms) searching for certain patterns in large data
sets. This is a very slow process on standard workstations and any increase in speed would
be considered useful, with a speedup of between 2 and 24 times being regarded as good, and
anything further better still.

As well as these, it is envisaged that other projects would use
the grid if it were available. Some other potential users are:

Proteomics research.

Processing data about imported foods on behalf of MAF. This looks for certain features of

the foods but is currently a very slow process.

Processing astronomical data from the several telescopes that the Department of Physics

and Astronomy has access to.

6.2 Potential Grid Tools For UC

There are several tools that could be used to facilitate Grid Computing at the UC. All of the

projects mentioned above have a focus on data processing rather than data access or any other

Grid function, so this section will focus only on the data processing side of Grid Computing.

Note that although most of these tools are not Computational Grids as defined earlier in this

article they can still provide useful amounts of computing power (and fall into the realm of what

is commonly called Grid Computing).

6.2.1 XGrid

XGrid is a distributed computing system that is currently installed on all Apple Macintoshes at

UC. It claims to automatically detect the presence of other Apple Macs and to be capable of
distributing processing to them without any explicit programming. The degree to which this

works would have to be investigated further, but although most of the computers on campus are

not Macs, enough of them are for a fairly significant amount of processing power to be available

from them if the XGrid system is effective.

6.2.2 Globus

As mentioned earlier, the Globus Toolkit is often referred to as the de facto standard for

creating Computational Grids. It is therefore logical that if a Grid is to be created at UC, the

Globus Toolkit be used. Unlike XGrid, however, the Globus Toolkit is not simply plugged in and

used; it is a toolkit for building Grids. For this reason, if the Globus Toolkit were to be used to

create a Grid at UC, specialist programmers would have to be employed to put it all together.

The advantage of the Globus Toolkit is that it is widely used and well understood and, compared

to other tools, it is at least known to work and work well.

6.3 The Akaroa Project

Akaroa2 is an automated controller of stochastic discrete-event simulation developed at the

University of Canterbury by the Simulation Research Group (the group led by Prof. K. Pawlikowski

from Computer Science and Software Engineering, and Associate Prof. D. McNickle from

Management). When Akaroa2 was designed at the University of Canterbury in 1992, it was one

of the first software packages enabling grid processing. In 1993, it received an international

commendation (in the Science category) in the Computerworld Smithsonian Award for Achievements in

Information Technology, USA. Akaroa2 speeds up simulation experiments by performing

multiple replications of the experiment in parallel (MRIP) on multiple computers of a LAN,

with a simulation being stopped when the overall results have reached the desired level of

statistical precision. It runs the different replications on different machines acting as simulation

engines. Akaroa2 has been designed for working on local area networks consisting of

UNIX/Linux machines. Thus, the degree of its distribution is limited by the number of

workstations in a given LAN. Currently, students of Computer Science and Software

Engineering at the University of Canterbury can use Akaroa2 for distributing simulations

utilizing about 250 workstations. Launching Akaroa2 on a Grid system would certainly be very

desirable, since access to many more hosts could be possible. The next section investigates how

this could be done.
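
The stopping rule behind MRIP can be illustrated with a minimal Python sketch. It is not Akaroa2's code or API: the function names, the toy "simulation" (a mean of exponential samples), the normal-approximation confidence interval and the 5% relative precision are all assumptions made for illustration, and the replications run one after another here rather than on separate machines.

```python
import random
import statistics

def replication(seed, n_obs=200):
    """One independent replication (hypothetical stand-in for a real
    simulation run): returns the mean of n_obs exponential samples."""
    rng = random.Random(seed)
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n_obs)])

def run_mrip(rel_precision=0.05, z=1.96, min_reps=5, max_reps=1000):
    """Keep adding replications until the confidence-interval half-width
    is at most rel_precision times the overall mean.
    Assumes max_reps >= min_reps >= 2."""
    results = []
    for seed in range(max_reps):
        # On a LAN or Grid each replication would run on its own machine.
        results.append(replication(seed))
        if len(results) < min_reps:
            continue
        mean = statistics.fmean(results)
        half_width = z * statistics.stdev(results) / len(results) ** 0.5
        if half_width <= rel_precision * abs(mean):
            break
    return mean, half_width, len(results)

if __name__ == "__main__":
    mean, half_width, reps = run_mrip()
    print(f"mean={mean:.3f} +/- {half_width:.3f} after {reps} replications")
```

With more hosts available, more replications finish per unit of time, so the required precision is reached sooner. That is the benefit a Grid deployment would be expected to bring.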

6.3.1 PlanetLab

As mentioned, PlanetLab is not a Grid Computing system but is a global testbed for distributed

computing systems [30]. The Department of Computer Science and Software Engineering at

UC has maintained a node on PlanetLab, so any Grid projects conducted there could use the

PlanetLab testbed. This could form a very good way of extending the Akaroa2 project - multiple

simulations could be run on different parts of the world instead of on different machines in the

same lab, although issues such as the effect of the increased propagation delay and

unreliable access to machines would need to be investigated. PlanetLab would also provide

access to another several-hundred machines which could further increase the speed of simulation

studies, and allow more complicated simulations to be carried out.

6.3.2 MPICH-G2 and Globus:

MPICH-G2 is a grid-enabled implementation of the MPI standard. MPI is a library

specification for message passing which can be used for constructing portable parallel programs.

Its goals are to provide portability and performance across many platforms and, because it

is aimed at being portable, it could be a good tool to use to modify Akaroa2. MPICH-G2

implements the MPI standard and extends it using tools from the Globus Toolkit, allowing the creation

of Grid applications that run on multiple machines of potentially different architectures. If

Akaroa2 were extended using MPICH-G2, it could be run on multiple environments at once (i.e.

not just UNIX or Linux). This would greatly increase the potential processing power available to

simulation applications. MPICH-G2 has C and C++ bindings which make it ideal for use with

Akaroa2.

Grid Computing means sharing computing resources in order to create super-computing

capabilities out of desktop computers by using their idle CPU time. It also involves sharing other

computing resources such as data sets and disk storage. It has been around for several years and

has reached the stage when there are tools available so that experts can create Computational

Grids and use them to solve problems in many fields.

There are four vital issues which must be resolved in a distributed computing system before

it can be called a Grid. These are Authentication, Authorisation, Resource Access and Resource

Discovery. They lead to the idea of Virtual Organisations of collaborators who share resources

over a Grid. There are currently several tools available to help developers create Grids. The most

widely used of these is the Globus Toolkit, but there are others. There are also several

commercial companies which claim to provide Grid systems to clients. Despite all the progress

that has been made with Grid Computing, a number of challenges still exist. They must be faced

now or in the future if Grid Computing is to succeed as a technology. These include the issue of

many separate Grids versus a single world-wide Grid, addressing social issues resulting from

sharing computing resources (the idea of Grid Economics), security issues (allowing untrusted

others to run code on your machine), problems with allocating resources (forecasting the

performance of resources and creating a way of discovering resources without using a single

central repository), and many others. Grid Computing is well suited to some of the research that

is being done, or is intended to be done, at the University of Canterbury, for example projects in

Physics and Astronomy and in Biological Sciences.

RESULTS:

To show the effectiveness of pipelining, we ran a series of experiments. All of our experiments

were performed on TeraGrid machines. For local-area tests we ran entirely on the University of

Chicago TeraGrid. Our wide-area tests ran between the San

Diego Supercomputer Center TeraGrid site and the University of Chicago TeraGrid site. The

nodes at these sites are dual Itanium 1.5 GHz machines with 4 GB of RAM and 1 Gb/s network

interface cards. We used the Globus GridFTP server with the modifications

described above and a custom client written using the jglobus libraries, also described above. To

avoid anomalies and bottlenecks in the filesystem, we used the standard UNIX devices /dev/zero

and /dev/null as our source and destination files, respectively. The

devices appear as files to the GridFTP server; however, they do no disk or block I/O. Figures 3

through 6 show the results of an experiment that transfers 1 GB of data

partitioned into an increasing number of files. As the number of files increases, the size of each

file decreases, but the total number of bytes transferred remains constant at 1 GB.

The top x-axis shows the number of files, and the bottom x-axis shows the size of each file. The

y-axis shows the achieved throughput in Mb/s. The LAN results in Figures 3 and 4 show how the

legacy transfer request techniques quickly suffer when the data is partitioned into multiple files.

There is a significant dropoff before just 10 files of 100 MB each, and almost all of the

throughput is lost at 1,000 files of 1 MB each. However, the

pipelining solution is unaffected by file partitioning until the point where the file sizes are less

than 100 KB. The wide-area tests in Figures 5 and 6 show how significantly latency affects the

legacy transfers. Since the round-trip times are greater on wide area networks, the delay between

transfers is also greater, and thus the overall transfer time is longer. However, the pipelining case

is again unaffected.

Figure 3: Comparison of the performance of pipelined GridFTP transfers with standard (nonpipelined) GridFTP transfers in a LAN with no security

Figure 4: Comparison of the performance of pipelined GridFTP transfers with standard (nonpipelined) GridFTP transfers in a LAN with security

Security affects the results in a way we did not expect. Since we are caching data channel

connections in both the cached and the pipelining cases, we did not expect the throughput levels

to drop any sooner with security than without security. However, as shown in

Figures 4 and 6, this is not the case. As the number of files increases, the throughput drops off

sooner when sending with GSI authentication. After extensive investigation

we have determined that this result is due not to any data channel handling but rather to message

processing latencies on the control channel.

Figure 5: Comparison of the performance of pipelined GridFTP transfers with standard (nonpipelined) GridFTP transfers in a WAN with no security

Figure 6: Comparison of the performance of pipelined GridFTP transfers with standard (nonpipelined) GridFTP transfers in a WAN with security

Between transfers the server sends a reply to the client. In our implementation the data channel

must be idle while the reply is formatted and passed to the TCP stack for

sending. With nonsecure transfers this time is extremely short. With GSI, however, the reply

must be encrypted, and therefore it takes much longer to format. As more transfers are requested,

more of these replies must be sent. Thus, this idle time becomes great

enough to affect the transfer rate.

APPLICATIONS OF GRIDFTP PIPELINING

Allows many outstanding transfer requests

Send next request before previous completes

Latency is overlapped with the data transfer

Backward compatible

Wire protocol doesn’t change

Client side sends commands sooner

Significant performance improvement for LOSF (lots of small files)
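
The effect of overlapping request latency with data transfer can be approximated with a simple back-of-the-envelope model in Python. This is only a sketch under stated assumptions: one exposed round trip per file for the legacy case, one exposed round trip in total for the pipelined case, and a hypothetical 60 ms round-trip time. It ignores TCP behaviour and control-channel processing, so it does not reproduce the drop-off that the measurements show for very small files.

```python
def legacy_time(n_files, total_bytes, bandwidth_bps, rtt_s):
    """Non-pipelined: every file waits one round trip for its
    request/reply exchange before its data can flow."""
    return n_files * rtt_s + total_bytes * 8 / bandwidth_bps

def pipelined_time(n_files, total_bytes, bandwidth_bps, rtt_s):
    """Pipelined: later requests are sent while earlier data is still
    moving, so only the first round trip is exposed (idealised).
    n_files is kept only so both models share one signature."""
    return rtt_s + total_bytes * 8 / bandwidth_bps

def throughput_mbps(total_bytes, seconds):
    """Achieved throughput in megabits per second."""
    return total_bytes * 8 / seconds / 1e6

if __name__ == "__main__":
    total = 10**9       # 1 GB in total, as in the experiments
    link = 10**9        # 1 Gb/s network interface
    rtt = 0.060         # hypothetical wide-area round-trip time
    for n in (1, 10, 100, 1000):
        old = throughput_mbps(total, legacy_time(n, total, link, rtt))
        new = throughput_mbps(total, pipelined_time(n, total, link, rtt))
        print(f"{n:5d} files: legacy {old:7.1f} Mb/s, pipelined {new:7.1f} Mb/s")
```

Even this crude model shows the legacy throughput collapsing as the same 1 GB is split into more files, while the pipelined figure stays close to the link rate.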

Advantages of Grid Computing:

Grid computing has been around for over 12 years now and its advantages are many. Grid

computing can be defined in many ways but for these discussions let's simply call it a way to

execute compute jobs (e.g. Perl scripts, database queries, etc.) across a distributed set of

resources instead of one central resource. In the past most computing was done in silos or large

SMP-like boxes. Even today you'll still see companies perform calculations on large SMP boxes

(e.g. E10K's, HP Superdomes). But this model can be quite expensive and doesn't scale well.

Grid computing (one of the top five strategic technologies for 2008; see http://outervillage.com/content/top-five-strategic-technologies-2008) changes this: we now have the

ability to distribute jobs to many smaller server components using load-sharing software that

distributes the load evenly based on resource availability and policies. Instead of having

one heavily burdened server, the load can be spread evenly across many smaller computers. The

distributed nature of grid computing is transparent to the user. When a user submits a job they

don't have to think about which machine their job is going to get executed on. The "grid

software" will perform the necessary calculations and decide where to send the job based on

policies. Many research institutions are using some sort of grid computing to address complex

computational challenges. You can also volunteer your workstation to be

part of a grid that attempts to solve some of the world's biggest challenges (see http://outervillage.com/content/volunteer-computing).

Some Advantages of Grid Computing:

1. No need to buy large six-figure SMP servers for applications that can be split up and

farmed out to smaller commodity type servers. Results can then be concatenated and

analyzed upon job(s) completion.

2. Much more efficient use of idle resources. Jobs can be farmed out to idle servers or even

idle desktops. Many of these resources sit idle, especially outside business hours.

Policies can be in place that allow jobs to go only to servers that are lightly loaded or

have the appropriate memory/CPU characteristics for the particular application.

3. Grid environments are much more modular and don't have single points of failure. If one

of the servers/desktops within the grid fails, there are plenty of other resources able to pick up

the load. Jobs can automatically restart if a failure occurs.

4. Policies can be managed by the grid software. The software is really the brains behind the

grid. A client residing on each server sends information back to the master, telling

it what type of availability or resources it has to complete incoming jobs.

5. This model scales very well. Need more compute resources? Just plug them in by

installing grid client on additional desktops or servers. They can be removed just as easily

on the fly. This modular environment really scales well.

6. Upgrading can be done on the fly without scheduling downtime. Since there are so many

resources some can be taken offline while leaving enough for work to continue. This way

upgrades can be cascaded so as not to affect ongoing projects.

7. Jobs can be executed in parallel, speeding up performance. Grid environments are extremely

well suited to run jobs that can be split into smaller chunks and run concurrently on many

nodes. Tools such as MPI allow message passing to occur among compute

resources.
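
The split, farm-out and concatenate pattern described in points 1 and 7 can be sketched in Python. Threads on one machine stand in for grid nodes here, and the function names and the pattern-counting job are hypothetical illustrations rather than part of any particular grid product.

```python
from concurrent.futures import ThreadPoolExecutor

def count_pattern(chunk, pattern):
    """Work done on one node: count the lines of a chunk containing pattern."""
    return sum(1 for line in chunk if pattern in line)

def split(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def run_job(lines, pattern, n_workers=4):
    """Farm the chunks out to workers, then combine the partial results."""
    chunks = split(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(count_pattern, chunks, [pattern] * len(chunks)))
    return sum(partial)

if __name__ == "__main__":
    data = ["GATTACA", "CCCC", "TTGATT", "AAAA", "GATC"]
    print(run_job(data, "GAT"))
```

The job only benefits from the grid because the chunks are independent; work that needs frequent communication between chunks would call for message passing instead.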

Methods of Grid Computing:

1. Drozdowski’s on-line scheduling method:

Our scheduling method is based on the On-Line method presented by Drozdowski

in [1], denoted "OL" hereafter. OL proceeds incrementally, computing the size

αi,j of the chunk to be sent to a worker Ni for each new round j, in order to

try and maintain a constant duration τ for the different rounds and thus avoid

contention at the master.

That is, it allocates comparatively bigger (resp. smaller) chunks to workers with

higher (resp. lower) performance. Hence, this method can take the heterogeneous

nature of computing and communication resources into account, without explicit

knowledge of execution parameters (as equality (1) shows); as Drozdowski states,

"the application itself is a good benchmark" [1] (actually the best one).

Lemma 6.1 in [1] shows that, in a static context, with affine cost models

for communication, the way αi,j is computed using equation (1) ensures the

convergence of σi,j to τ when j increases indefinitely.

Being an estimation of the asymptotic period used for task distribution, τ is

also an upper-bound on the discrepancy between workers. Being able to control

this bound makes it possible to minimize the makespan during the clean-up

phase. round from the master to worker Ni (resp. from Ni to the master).

It should be noted that, unlike previous work [1, 9], this paper introduces computation start-up

times in order to be more realistic when considering grids. As suggested in section 2, the values

of the execution parameters of any worker Ni ensure that sending chunks of any size α to a

worker Ni and receiving the corresponding results cost less than processing these chunks.

The problem with OL is that computation never overlaps communication in any worker node, as

the emission of the chunk of the next round is at best triggered by the return of the result of the

previous one, with no possible anticipation.
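
Equation (1) is not reproduced in this report, so the following Python sketch assumes a proportional update of the form αi,j = αi,j−1 · τ / σi,j−1, which matches the description above (a round that took longer than τ leads to a smaller next chunk, and vice versa). The affine cost function and all parameter values are hypothetical and serve only to show the round duration settling at τ.

```python
def next_chunk_size(alpha_prev, sigma_prev, tau):
    """Assumed OL-style update: rescale the chunk so that the next
    round is expected to last about tau."""
    return alpha_prev * tau / sigma_prev

def round_duration(alpha, startup, per_unit):
    """Hypothetical affine cost model: a fixed start-up time plus a
    time proportional to the chunk size."""
    return startup + per_unit * alpha

def simulate_worker(tau, startup, per_unit, alpha0=1.0, rounds=20):
    """Return the (chunk size, measured duration) pair of each round."""
    alpha, history = alpha0, []
    for _ in range(rounds):
        sigma = round_duration(alpha, startup, per_unit)
        history.append((alpha, sigma))
        alpha = next_chunk_size(alpha, sigma, tau)
    return history

if __name__ == "__main__":
    slow = simulate_worker(tau=10.0, startup=1.0, per_unit=0.5)
    fast = simulate_worker(tau=10.0, startup=1.0, per_unit=0.25)
    print("slow worker, last round:", slow[-1])
    print("fast worker, last round:", fast[-1])
```

Under these assumptions both workers end up with rounds of duration close to τ, and the faster worker receives the larger chunk without the master ever being told its speed.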

2. The OLMR method:

2.1 Overview of the method

Our method is based on OL, but avoids idle time with respect to computing.

When the total load is large compared to the available bandwidth between master and

workers, the workload should be delivered in multiple rounds

[10, 11, 12]. Therefore we will have each worker receive its share of the load

through multiple rounds, hence the name On-Line Multi-Round method [9], denoted "OLMR"

hereafter. OLMR divides the chunk sent to Ni for each round

j into two subchunks "I" and "II" of respective sizes ᾱi,j and αi,j − ᾱi,j. Dividing the chunks in

two parts is enough in order to apply the principle, and

the division allows the computation to overlap the communications, as can be

seen in Figure 1. In order to compute αi,j, we use a value of σi,j−1 derived

from the measurement of the elapsed time (including both communications and

computation) for subchunk I of the previous round: σ̄i,j−1. We will show that,

thanks to this anticipation (compared to OL) in the computation of αi,j, we can avoid the

inter-round starvation.
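
The benefit of splitting a chunk in two can be seen with a small timeline model in Python. It illustrates only the overlap principle, not how OLMR sizes its subchunks: the cost figures are hypothetical, the return of results is ignored, and communication is assumed to be cheaper than computation, as stated earlier.

```python
def finish_time(subchunks, comm_per_unit, comp_per_unit):
    """Time at which a worker finishes computing all subchunks.
    Transfers are sequential on the link; computing a subchunk starts
    once it has fully arrived and the previous one has been processed."""
    link_free = 0.0
    cpu_free = 0.0
    for size in subchunks:
        arrival = link_free + comm_per_unit * size
        link_free = arrival
        start = max(arrival, cpu_free)
        cpu_free = start + comp_per_unit * size
    return cpu_free

if __name__ == "__main__":
    whole = finish_time([10], comm_per_unit=0.2, comp_per_unit=1.0)
    halves = finish_time([5, 5], comm_per_unit=0.2, comp_per_unit=1.0)
    print("one chunk:     ", whole)
    print("two subchunks: ", halves)
```

With a single chunk the processor idles for the whole transfer. With two subchunks it idles only while subchunk I arrives, because subchunk II is transferred while subchunk I is being computed.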

Conclusion:

So far we have given an overview of the Grid Computing topics that are discussed further throughout this book, including the evolution of Grid Computing, its applications, and the infrastructure requirements for any grid environment.

In addition to this, we have discussed when one should use Grid Computing, and the factors developers and providers must consider in the implementation phases. With this introduction we can now explore in more depth the various aspects of a Grid Computing system, its evolution across the industries, and the current architectural efforts underway throughout the world.

The following chapters in this book introduce the reader to this new, evolutionary era of Grid Computing in a concise, hard-hitting, and easy-to-understand manner.

In the past, implementing the concept of Grid Computing achieved goals such as

robustness, throughput, and standardisation. Future work should concentrate on making grids secure, scalable, and

extensible. Finally, a grid in need is a grid indeed.

Future work on Grid Computing:

Grid Computing can be defined as the seamless provision of access to possibly remote,

possibly heterogeneous, possibly untrusting, possibly dynamic computing resources. Analysed

piece by piece, this definition means that Grid Computing provides seamless access to:

1. Possibly Remote Computing Resources

This means that local resources, which are on the same LAN, and remote resources, which are

geographically distant, can be accessed in exactly the same way on the Grid.

2. Possibly Heterogeneous Computing Resources

Computers on the Grid may run different operating systems on different types of

machines. Accessing them via the Grid should be possible without making any special

allowances for this.

3. Possibly Untrusting Computing Resources

This means that the owner of a computing resource on the Grid might not know or trust other

users but should still be confident that they cannot access any non-shared data and cannot make

malicious system calls on their computer. The Grid should handle this security checking without

any specific instruction from the user or from the sharer.

4. Possibly Dynamic Computing Resources

One of the major selling points of Grid Computing is that it makes use of otherwise

wasted CPU cycles. The problem with this is that the availability of computers to the Grid

changes rapidly as computers become busy and then idle as their owner's usage varies. The Grid

system should ensure that this dynamism is hidden from users so that they do not have to

program explicitly to take account of this.

Seamless provision means that Grid users can access such seemingly inaccessible resources

easily without having to worry about all these complications.

Altogether, this definition leads to four main things that any Grid system must provide

seamlessly in order to be considered a Grid:

1. Authentication

2. Authorization

3. Resource Access

4. Resource Discovery

4.1.1 Authentication

Authentication means that each user has an identity which can be trusted as genuine. This is

necessary because some resources may be authorized only to certain users, or certain classes of

users.

Authentication of a user should happen only once when they start using a Grid - they

should not have to sign on separately to each of the many machines that their

computation may use.

4.1.2 Authorization

Authorisation means that each resource, be it the spare computing power on a computer of

an organisation or a set of astronomical data, will have a set of users and groups that can access it.

The Grid needs to first authenticate that the users are who they say they are and then ensure

that they are allowed to access the resources that they are requesting. Having groups authorised

to access certain resources leads to the idea of Virtual Organisations.

4.1.3 Resource Access

Resource Access means that remote resources are accessible to Grid users. These

resources could mean anything from CPU time to disk storage, to visualisation tools and data

sets. As discussed, not everyone should be able to access all resources, but the Grid must provide

users with a way to access those that they are allowed to use. This means that some sort of virtual machine is required

so that machines with different operating systems, etc. can be accessed in a uniform way.

4.1.4 Resource Discovery

Being allowed to access thousands of different CPUs is useless without being able to find

out where they are. Resource Discovery means that users can find remote resources that they can

use. This process should be automated by the Grid so that a user's task can automatically be run

remotely without them having to go through the process of finding CPUs that they can use. The

automation of resource discovery is complicated hugely by the dynamic nature of Grid resources:

what is available at one instant of time may no longer be available a while later. Added to this

complication is the desire to avoid a single central point where all data is stored, because its

failure would bring the whole system down and a single point of control is not a scalable

solution: if the Grid becomes really large, this central point would be badly overloaded.
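
One way to picture discovery without a central registry is a hop-limited query that spreads from neighbour to neighbour. The Python sketch below is a simplified illustration under assumed names (Node, discover, free_cpus). It is not the mechanism of any particular Grid toolkit and it leaves out authentication, timeouts and message costs.

```python
from collections import deque

class Node:
    """A Grid node that knows only its direct neighbours."""
    def __init__(self, name, free_cpus):
        self.name = name
        self.free_cpus = free_cpus   # changes as the owner uses the machine
        self.neighbours = []

def discover(start, min_cpus, max_hops):
    """Breadth-first query: names of nodes within max_hops of start that
    currently offer at least min_cpus free CPUs."""
    seen = {start.name}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if node is not start and node.free_cpus >= min_cpus:
            found.append(node.name)
        if hops < max_hops:
            for nb in node.neighbours:
                if nb.name not in seen:
                    seen.add(nb.name)
                    queue.append((nb, hops + 1))
    return found

if __name__ == "__main__":
    a, b, c, d = Node("a", 0), Node("b", 0), Node("c", 4), Node("d", 8)
    a.neighbours = [b]
    b.neighbours = [a, c]
    c.neighbours = [b, d]
    d.neighbours = [c]
    print(discover(a, min_cpus=2, max_hops=3))
    c.free_cpus = 0                  # the owner of c starts using it again
    print(discover(a, min_cpus=2, max_hops=3))
```

Because each query reads the current state of the nodes it reaches, a machine that becomes busy simply stops appearing in later results, and no single node has to hold a complete picture of the Grid.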

3.2 Virtual Organisations

The idea of a Virtual Organisation (VO) is that on, say, a university campus-wide Grid,

members of the Physics and Biology departments who are working on a project together

could form a Virtual Organisation for that project, in which they could all access the data for that

project and each other's computing resources. However, those who are not members of the

research group would not be members of the VO and so would not be able to access the resources.

Members of the Computer Science department, who would not be part of that VO, may be

working on a different project and could form their own VO with separate access

rights for a different set of resources. Note that different projects within the same departments

could also have separate Virtual Organisations so as to keep some of their data separate but allow

projects from both VOs to use the compute resources.
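
The relationship between authentication, authorisation and Virtual Organisations can be made concrete with a short Python sketch. All names here are hypothetical, and the credential check is a plain placeholder: production Grids built on the Globus security infrastructure use X.509 certificates rather than shared secrets.

```python
class VirtualOrganisation:
    """A named group of users sharing a named set of resources."""
    def __init__(self, name, members, resources):
        self.name = name
        self.members = set(members)
        self.resources = set(resources)

def authenticate(user, credential, known_users):
    """Placeholder single sign-on check, performed once per session."""
    return known_users.get(user) == credential

def authorise(user, resource, vos):
    """A user may access a resource if some VO contains both of them."""
    return any(user in vo.members and resource in vo.resources for vo in vos)

if __name__ == "__main__":
    known = {"alice": "secret-a", "bob": "secret-b"}
    vos = [
        VirtualOrganisation("phys-bio", {"alice"}, {"genome-data", "physics-cluster"}),
        VirtualOrganisation("cs-sim", {"bob"}, {"cs-lab-cpus"}),
    ]
    if authenticate("alice", "secret-a", known):
        print(authorise("alice", "genome-data", vos))
        print(authorise("alice", "cs-lab-cpus", vos))
```

A resource that should be usable by both projects, such as a shared compute pool, would simply be listed in the resource set of both VOs.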

4 Current Grids and Grid Products

There are a number of tools available to help create Computational Grids, both free,

open-source ones and commercial products. There is also a standards body which seeks to put

forward ‘recommendations’ about how best to do Grid Computing. This section gives an

overview of these, and details about several of the many Grids in existence today.

4.1 Tools and Standards

4.1.1 Globus

The Globus Toolkit, designed by the Globus Alliance, contains a set of free software tools -

services, APIs and protocols - to facilitate the construction of Grids. It is the most widely used

toolkit for building Grids and is frequently referred to as the de facto standard. It

includes tools for, among other things, security, resource management and communication. The

Globus Alliance also researches various issues related to Grid Computing, especially issues

relating to the infrastructure of Grids. Almost every Grid which has its details published was

constructed using the Globus Toolkit.

4.1.2 The Global Grid Forum

The Global Grid Forum (GGF) plays a role in the development of Grids similar to that of the W3C

in the development of the World Wide Web [26]. It is a conglomerate of interested

parties including universities, research institutes and industry. It is not an official body, so it does

not put forward standards but only ‘best practices’ for Grid developers. It is important because

it provides a forum for new ideas to be discussed by all interested parties. There are strong links

between the GGF and The Globus Alliance - ideas put forward by the GGF are often

implemented by Globus.

4.1.3 Condor and Condor-G:

Condor is a software tool for distributing computationally intensive jobs over Grids. It works by

using spare CPU cycles on other computers. It provides a way of doing resource discovery using

‘ClassAds’, which match job requests to unused resources. From the Condor product, Condor-G

has been created. Condor-G is an enhanced version of Condor which can be used to make Grids.

It combines Globus tools, which provide “security, resource discovery, and resource access in

multi-domain environments”, with Condor's “management of computation and harnessing of resources

within a single administrative domain.” There has also been work on making separate Condor

pools “self-organising, fault-tolerant, scalable, and locality-aware”, which has proved to be a

successful way for automatic management of larger groups of Condor pools.
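
The idea of matching job requests to unused resources can be sketched in Python. This is not the ClassAd language or Condor's matchmaker: advertisements are modelled as plain dictionaries whose "requirements" entry is a function, the attribute names are made up, and the matching is a simple greedy pass.

```python
def matches(job, machine):
    """A job and a machine match when each satisfies the other's
    requirements (the check is two-sided)."""
    return job["requirements"](machine) and machine["requirements"](job)

def matchmake(jobs, machines):
    """Greedy one-to-one assignment of jobs to currently unused machines."""
    free = list(machines)
    assignments = {}
    for job in jobs:
        for machine in free:
            if matches(job, machine):
                assignments[job["name"]] = machine["name"]
                free.remove(machine)   # safe: we stop iterating right away
                break
    return assignments

if __name__ == "__main__":
    machines = [
        {"name": "lab-pc-1", "memory_mb": 512, "os": "linux",
         "requirements": lambda job: job["owner"] != "guest"},
        {"name": "lab-pc-2", "memory_mb": 2048, "os": "linux",
         "requirements": lambda job: True},
    ]
    jobs = [
        {"name": "sim-1", "owner": "guest",
         "requirements": lambda m: m["os"] == "linux"},
        {"name": "sim-2", "owner": "staff",
         "requirements": lambda m: m["memory_mb"] >= 1024},
    ]
    print(matchmake(jobs, machines))
```

Here the guest job lands on the only machine that accepts guests, which leaves no machine with enough memory for the second job, so it stays unmatched until a suitable machine becomes free.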

4.2 Some current Grids in development and deployment

There are many Grids currently in use and in production; in this section we examine several

of them in detail. These are not claimed to give a representative sample of all current Grids, but

are only to give insight into a few of them. The huge European Data Grid project and the United States

National Fusion Collaboratory are discussed.

4.2.1 European Data Grid:

The European Data Grid is a European Union funded project which aims to create a huge

Grid system for computation and data-sharing. It is aimed at projects in high energy physics, led

by CERN, biology and medical image processing, and astronomy. It is being developed using

and extending the Globus Toolkit. In building the Grid new tools and systems have been

developed in many areas useful for the extension of Grid Computing. For example, a method of

enabling secure access to databases in Grid environments has been developed [18]. New

techniques for searching for patterns in genomic data using the European Data Grid have also

been developed.

4.2.2 The National Fusion Collaboratory:

The National Fusion Collaboratory project exists to help research magnetic fusion. Magnetic

fusion experiments operate on pulses of plasmas which are produced approximately every 15

minutes. The data generated from each measurement must be analysed within the 15 minutes so

that changes can be made to the set up in time for the next pulse. This time limit means

that it would be very useful for the researchers to be able to analyse the data quickly so that more

time can be spent reconfiguring the experimental set up. For this reason, the National Fusion

Collaboratory constructed a Computational Grid. This project was also built using the Globus

Toolkit and the main research focus is on ‘advanced reservations of multiple resources’ - this

means that resources such as computational cycles can be reserved in advance if it is known that

they will be required sometime in the future.
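
Reserving capacity in advance amounts to checking that a new booking never pushes usage above what is available. The following Python sketch shows one way to do that check. The class name, the minute-based time units and the eight-CPU pool are assumptions for illustration and do not describe the National Fusion Collaboratory's actual software.

```python
class ReservationBook:
    """Advance reservations for a pool of identical CPUs."""
    def __init__(self, capacity):
        self.capacity = capacity      # CPUs available in the pool
        self.bookings = []            # list of (start, end, cpus)

    def reserve(self, start, end, cpus):
        """Accept the booking only if capacity is never exceeded during
        the half-open interval [start, end)."""
        # Usage can only rise at `start` or where an existing booking begins.
        points = [start] + [s for s, e, c in self.bookings if start <= s < end]
        for t in points:
            used = sum(c for s, e, c in self.bookings if s <= t < e)
            if used + cpus > self.capacity:
                return False
        self.bookings.append((start, end, cpus))
        return True

if __name__ == "__main__":
    book = ReservationBook(capacity=8)
    print(book.reserve(0, 15, 6))     # analysis slot for the first pulse
    print(book.reserve(10, 20, 4))    # would exceed 8 CPUs from minute 10
    print(book.reserve(15, 30, 8))    # next pulse, whole pool
```

Times are in minutes to mirror the 15-minute pulse cycle described above; a request that cannot be honoured is rejected when it is made rather than when the data arrives.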

4.3 Commercial Grid Products:

There are several Grid products currently listed on various websites.

They claim to easily enable Grid Computing within organisations but it is hard to tell how much

they actually do because they do not publish refereed papers - most of the information available

about them is probably marketing hype rather than verifiable fact. When the NorduGrid was being

constructed in Scandinavia, its developers chose to develop their own Grid system because nothing

existing was suitable [11]. This suggests that, at that stage at least, commercial products were not of

a high enough standard for real use.

BIBLIOGRAPHY:

[1] W. Allcock, J. Bresnahan, R. Kettimuthu, M. Link, C. Dumitrescu, I. Raicu, and I. Foster,

“The Globus striped GridFTP framework and server,” in SC'05, ACM Press, 2005.

[2] Y. Gu and R. L. Grossman, “UDT: UDP-based data transfer for high-speed wide area

networks,” Comput. Networks 51, 7 (May 2007), 1777–1799. DOI:

http://dx.doi.org/10.1016/j.comnet.2006.11.009

[3] C. Kiddle, P. Rizk, and R. Simmonds, “A GridFTP overlay network service,” in Proceedings

of the 7th IEEE/ACM International Conference on Grid Computing,

Barcelona, Spain, 2007.

WEBSITES:

[1] http://www.nlr.net/

[2] http://www.uklight.ac.uk/

[3] http://www.csm.ornl.gov/ultranet/topology.html

[4] http://www.lambdastation.org/

[5] http://www.atlasgrid.bnl.gov/terapaths
