a seminar report on grid computing
A Seminar Report on
UR TITLE NAME
Submitted in partial fulfillment of the requirement for the award of
Bachelor of Technology
In
COMPUTER SCIENCE & ENGINEERING
From
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD
By
AMBADIPUDI RAJESH
08RC1A0503
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
LAQSHYA INSTITUTE OF TECHNOLOGY & SCIENCE
(Approved by AICTE, New Delhi & Affiliated to JNTU, Hyderabad)
TANIKELLA (V), KHAMMAM (M), KHAMMAM (Dt). A.P. India -507305
Ph: 08742 211306
http://www.laqshya.edu.in/
CERTIFICATE
This is to certify that the dissertation entitled “GRID COMPUTING” is a bonafide work
done by “AMBADIPUDI RAJESH, 08RC1A0503” in partial fulfillment of Bachelor of
Technology in Computer Science & Engineering from JNTU, Hyderabad during the year 2011-2012.
Mr.SK.SHARFUDDIN M.Tech Mrs.M.Sri devi M.Tech,(p.hd)
Assistant Professor Associate Professor
Supervisor H.O.D., C.S.E.
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of any task would be incomplete
without the mention of the people who made it possible and whose encouragement and guidance
have been a source of inspiration throughout the course of the project.
It is my privilege and pleasure to express my profound sense of gratitude and
indebtedness to Mr.SK.SHARFUDDIN, Seminar Supervisor & Assistant Professor,
Department of Computer Science and Engineering, LAQSHYA Institute of Technology and
Science, for his guidance, cogent discussion, constructive criticism and encouragement
throughout this dissertation work.
I express my sincere gratitude to Associate Professor Mrs.M.SRIDEVI, Head of the
Department of Computer Science and Engineering, LAQSHYA Institute of Technology and
Science, for her precious suggestions, motivation and co-operation towards the successful
completion of this seminar work.
In addition, I would like to thank all my family members,
friends, and colleagues for giving me the moral strength and support to complete this dissertation.
GridFTP
Abstract:
Applications of Grid Computing
A Grid can be simply defined as a combination of different components which
function collectively as a part of one large electrical or electronic circuit.
GridFTP is a well-known and robust protocol for fast data transfer on the Grid.
GridFTP is an exceptionally fast transfer protocol for large volumes of data.
Such data transfers can run into the “lots of small files” (LOSF) problem, in which a
large dataset is partitioned into many small files. Efficient transmission of these
small files is achieved through the concept of “pipelining”.
Pipelining approaches the LOSF problem by trying to minimize the amount of time
between transfers. Pipelining allows the client to have many outstanding transfer commands at a
time. Instead of being forced to wait for the transfer-successful acknowledgment message, the
client is free to send transfer commands at any time. The server processes these requests in
the order they are sent.
GridFTP uses two channels: a control channel for commands and a data channel for
the data payload.
Key words:
Grid
Grid FTP
LOSF (Lots of small files )
Pipelining
Robust
Server
Control channel
INDEX
1 INTRODUCTION
2 LITERATURE SURVEY
3 EXISTING SYSTEM
Disadvantages in existing system
4 PROPOSED SYSTEM
Advantages in proposed system
5 METHODS
6 CONCLUSION
7 FUTURE WORK
8 BIBLIOGRAPHY
9 WEB SITES
LIST OF FIGURES
s.no Figure name/description Page.no
1
2
INTRODUCTION
A Grid can be simply defined as a combination of different components which function
collectively as a part of one large electrical or electronic circuit. It can also be defined as a
paradigm or infrastructure that enables the sharing, selection and aggregation of geographically
distributed resources such as computers (PCs, workstations, clusters, supercomputers),
software, catalogued data and databases.
The term “Grid Computing” can similarly be applied to a large number of computers
which connect together to collectively solve a problem of very high complexity and magnitude.
The fundamental idea behind any computer-based grid is to utilize idle
processor cycles. Simply stated, a processor that would otherwise sit idle now teams
up with similar idle processors to tackle complex problems. Grid Computing virtualizes
distributed computing and data resources such as processing, network bandwidth and storage
capacity to create a single system image, granting users and applications seamless access to vast
IT capabilities.
GridFTP
An important type of communication in grid and distributed computing environments is bulk
data transfer. GridFTP has emerged as a de facto standard for secure, reliable, high-performance
data transfer across resources on the Grid.
GridFTP is a well-known and robust protocol for fast data transfer on the Grid. The
GridFTP implementation provided by the Globus Toolkit can scale to network speeds and has
been shown to deliver 27 Gb/s on a 30 Gb/s link. The Globus Toolkit is an open source software
toolkit used for building Grid systems and applications.
The protocol is optimized to transfer large volumes of data commonly found in Grid
applications. Datasets of sizes from hundreds of megabytes to terabytes and beyond can be
transferred at close to network speeds by using GridFTP. Given the high-speed networks
commonly found in modern Grid environments, datasets smaller than 100 MB
are too small for underlying protocols like TCP to utilize the maximum capacity of the
network. Therefore, GridFTP, like most bulk data transfer protocols, achieves its highest
levels of throughput when transferring large volumes of data. Unfortunately,
conventional implementations of GridFTP have a limitation as to how the data must be
partitioned to reach these high-throughput levels. Not only must the amount of data to
transfer be large enough to allow TCP to reach full throttle, but the data must also be in
large files, ideally in one single file. If the dataset is large but partitioned into many small
files (on gigabit networks we consider any file smaller than 100 MB a small file), the
performance of GridFTP servers suffers drastically. This problem is known as the “lots of small
files” (LOSF) problem.
In this paper we study the LOSF problem and present a solution known as pipelining.
We have implemented pipelining in the Globus Toolkit.
LOSF PROBLEM
The GridFTP protocol is a backward-compatible extension of the legacy RFC959 FTP protocol.
It maintains the same command/response semantics introduced
by RFC959. It also maintains the two-channel protocol semantics. One channel is for control
messaging (the control channel), such as requesting which files to transfer, and the other is for
streaming the data payload (the data channel). These protocol details have
interesting effects on the LOSF problem.
Channel Establishment
GridFTP servers listen on a well-known and published port for client control channel
connections. Once a client successfully forms a control channel with a server (this often involves
authentication and authorization), it can begin sending commands to the server. In order to
transfer a file, the client must first establish a data channel. This involves sending the server a
series of commands on the control channel describing attributes of the desired data channel such
as: what protocol to use, binary or ASCII data, passive or active connection, and various protocol
specific attributes. Once these commands are successfully sent, a client can request a file
transfer. At this point a separate data channel connection is formed using all of the agreed upon
attributes, and the requested file is sent across it. In standard FTP the data channel can be used
only to transfer one file. Future transfers must again go through the process of setting up a new
data channel.
GridFTP modified this part of the protocol to allow many files to be transferred across a single
data channel. With GridFTP all of the messaging to establish a data channel is done once; the
data channel connection is formed just once, and the client can request several file transfers using
that same data channel. This enhancement is known as data channel caching.
File Transfers
File transfer requests are made with the RETR (the server sends a file) or STOR (the server receives a file) command. A client
sends one of these commands to the server across the control channel. Data then begins to flow
between the client and server over the data channel. Once all of the data has been transferred, a
“226 Transfer Complete” acknowledgment message is sent from the server to the client on the
control channel. Only when this acknowledgment is received can the client request another
transfer. This interaction is illustrated in Figure 1.
As the figure shows, there is an entire round-trip time on the control channel between
transfers during which the data channel must be idle. Before issuing the next transfer command,
the client must first receive the transfer completion acknowledgment, which takes one trip across the
network. After receiving the acknowledgment, the client sends the transfer command
immediately. However, the server does not immediately receive it.
Figure 1: GridFTP file transfers with no pipelining
The message must cross the network before the server will begin sending data. This process
involves another trip across the network. Assuming we have the GridFTP data channel caching
enabled, we do not have to worry about the latencies involved with establishing
the data channel. If we do not have it enabled, the delay is significantly longer.
During this time the data channel is idle. The latency between transfers adds to the overall
transfer time and thus detracts from the overall throughput. The problem is exacerbated
when communicating over high-latency networks, where the RTT is very
high. While the idle data channel time is a problem, it causes a far greater one.
TCP is a window-based protocol. For it to achieve maximum efficiency, the window of
allowed unacknowledged bytes must grow to the bandwidth-delay product. Various algorithms
in the TCP protocol decide to increase or decrease
the window size based on observed events. If a connection is idle for longer than one RTT, the
window size gets reduced to zero; and once it is used again, it must go through TCP slow start
[14]. When transferring a series of files, the data channel is idle for a control channel RTT in
between transfers. If the control channel RTT and the data channel RTT are similar, it is likely
that data channel TCP connections will have entirely
closed windows by the time the next transfer begins. When the amount of data sent in
each file is small, the ratio of idle data channel time to transfer time becomes higher and lowers
the throughput. Additionally, small files may not be transferred long enough to traverse the slow-start
algorithm and bring TCP to full throttle. Thus, even when data is being transferred, it is not
moving at full speed.
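The window arithmetic described above can be made concrete with a short Python sketch. The link speed (1 Gb/s), the round-trip time (50 ms) and the function names are illustrative assumptions, not measurements from this report. The model also assumes, optimistically, that each file moves at full link speed, so it ignores slow start and understates the real penalty.

```python
def bandwidth_delay_product(bandwidth_bps, rtt_s):
    """Bytes that must be in flight to keep the link full."""
    return bandwidth_bps * rtt_s / 8


def idle_fraction(file_bytes, bandwidth_bps, rtt_s):
    """Share of wall-clock time the data channel sits idle when every
    file is followed by one control-channel RTT of waiting.

    Assumption: the file itself moves at full link speed (no slow start).
    """
    transfer_s = file_bytes * 8 / bandwidth_bps
    return rtt_s / (rtt_s + transfer_s)


if __name__ == "__main__":
    BANDWIDTH = 1e9   # assumed 1 Gb/s link
    RTT = 0.05        # assumed 50 ms round-trip time
    print("BDP (bytes):", bandwidth_delay_product(BANDWIDTH, RTT))
    for size_mb in (1, 10, 100):
        frac = idle_fraction(size_mb * 1_000_000, BANDWIDTH, RTT)
        print(f"{size_mb:>4} MB files: data channel idle {frac:.0%} of the time")
```

Under these assumed values the data channel is idle for roughly 86% of the time with 1 MB files, but only about 6% of the time with 100 MB files, which is consistent with the 100 MB threshold for a "small file" used earlier.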
PIPELINING
Pipelining approaches the LOSF problem by trying to minimize the amount of time between
transfers. Pipelining allows the client to have many outstanding, unacknowledged transfer
commands at once. Instead of being forced to wait for the “226 Transfer Complete” message,
the client is free to send transfer commands at
any time. The server processes these requests in the order they are sent. Acknowledgments are
returned to the client in the same order. The process is shown in
Figure 2. This process hides the latency of transfer requests by overlapping them with data
transfers. The first transfer request is sent, and data begins to flow across the data
channel. While the file transfer is in progress, the client sends the next n file transfer
requests. The server queues the requests. When the server completes the file
transfer, it sends the acknowledgment to the client and checks the queue for the next transfer
request. If the queue is not empty, the next file transfer begins immediately.
There is some inevitable processing latency between transfers, but it is very small compared to
the entire RTT of network latency that has been eliminated.
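The saving from pipelining can be estimated with a simple timing model, sketched here in Python. The function names, the per-file transfer time and the `gap_s` processing-latency parameter are assumptions made for illustration. The model treats every file as moving at full speed, so it captures only the round-trip time that pipelining removes, not the additional slow-start penalty.

```python
def sequential_time(n_files, file_s, rtt_s):
    """Total time when each transfer waits for the previous acknowledgment.

    Every file costs its own transfer time plus one control-channel RTT
    (half for the command to reach the server, half for the ack to return).
    """
    return n_files * (file_s + rtt_s)


def pipelined_time(n_files, file_s, rtt_s, gap_s=0.0):
    """Total time when the commands are already queued at the server.

    Only the first command and the last ack cross the network while the
    data channel is idle; between files there is just a small processing
    gap (gap_s, an assumed parameter).
    """
    return rtt_s + n_files * file_s + (n_files - 1) * gap_s


if __name__ == "__main__":
    # Assumed workload: 1000 files of 8 ms each over a 50 ms RTT path.
    print("sequential:", sequential_time(1000, 0.008, 0.05), "s")
    print("pipelined: ", pipelined_time(1000, 0.008, 0.05), "s")
```

For this assumed workload the sequential case takes about 58 s, while the pipelined case takes about 8 s.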
Figure 2: GridFTP file transfers with pipelining
According to the proposed pipelining protocol, the client is allowed to send an unlimited number
of outstanding commands. In practice, the number of outstanding commands will be limited by
the GridFTP server implementation and TCP flow control. The client is free to send as many
commands as it wishes on the TCP control channel. However, the GridFTP server will read a
limited number of these commands out of the TCP buffer and into its process space. All other
outstanding commands will remain in the operating system's TCP buffers. As the server-side
buffers get full, the TCP window will close. Ultimately, the sending-side TCP buffers will fill up,
and the client’s attempts to send further commands will be stalled. In most cases there is little
performance benefit for a client to have more than three outstanding commands; however,
allowing an unlimited number makes client implementation simpler. The client then waits for the same
number of acknowledgments from the server.
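The ordering rules above (requests processed in the order sent, acknowledgments returned in the same order, and a bounded number of outstanding commands) can be sketched in Python as follows. This is an illustrative model only, not the Globus Toolkit implementation: the function name, the event tuples and the file names are hypothetical, and the default limit of three simply follows the observation above that more outstanding commands rarely help.

```python
from collections import deque


def pipeline_commands(files, max_outstanding=3):
    """Return the order of control-channel events for a pipelined client."""
    if max_outstanding < 1:
        raise ValueError("max_outstanding must be at least 1")
    pending = deque(files)   # commands not yet sent
    outstanding = deque()    # commands sent but not yet acknowledged
    events = []
    while pending or outstanding:
        # The client tops up the pipeline without waiting for acks.
        while pending and len(outstanding) < max_outstanding:
            name = pending.popleft()
            outstanding.append(name)
            events.append(("send", name))
        # The server completes the oldest request and acks it (FIFO).
        events.append(("ack", outstanding.popleft()))
    return events


if __name__ == "__main__":
    for kind, name in pipeline_commands(["a.dat", "b.dat", "c.dat", "d.dat"]):
        print(kind, name)
```

Setting `max_outstanding=1` reproduces the non-pipelined behaviour, in which each command is sent only after the previous acknowledgment arrives.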
GridFTP Pipelining
GridFTP is a high-performance, secure, reliable data transfer protocol optimized for
high-bandwidth wide-area networks. GridFTP is an exceptionally fast transfer protocol for
large volumes of data. Implementations of it are widely deployed and used on well-connected
Grid environments such as those of the TeraGrid because of its ability to scale to network
speeds. However, when the data is partitioned into many small files instead of few large files, it
suffers from lower transfer rates. The latency between the serialized transfer requests of each file
directly detracts from the amount of time data pathways are active, thus lowering achieved
throughput. Further, when a data pathway is inactive, the TCP window closes, and TCP must go
through the slow-start algorithm. The performance penalty can be severe. This situation is known
as the “lots of small files” problem. In this paper we introduce a solution to this problem. This
solution, called pipelining, allows many transfer requests to be sent to the server before any one
completes. Thus, pipelining hides the latency of each transfer request by sending the requests
while a data transfer is in progress. We present an implementation and performance study of the
pipelining solution.
LITERATURE SURVEY
Utility computing is the conceptual core of our analysis but much of the current debate on
this idea is discourse on the concept of “Cloud Computing” – a more marketable vision perhaps
than utility computing. Cloud Computing is a new and confused term. Gartner defines cloud
computing succinctly as “a style of computing where massively scalable IT-related capabilities
are provided ‘as a service’ using Internet technologies to multiple external customers”. Yet our
interest is not in the particulars of cloud computing itself but the opportunities presented for
researchers and practitioners by this new technology. We argue that fundamental to both cloud
computing and utility computing is a decoupling of the physicality of IT infrastructure from the
architecture of such infrastructure's use. While in the past we thought about the bare-metal
system (a humming grey box in an air-conditioned machine room with physical attributes and a
host of peripherals) today such ideas are conceptual and virtualized – hidden from view. It is this
decoupling which will form the basis of our discussion of the technology of the Grid.
There certainly is a strong element of hype in much of the Utility, Grid and Cloud computing
discourse and perhaps such hype is necessary. As Swanson and Ramiller (1997) remind us, the
organising visions of information and communications technology are formed as much in
extravagant claims and blustering sales talk as they are in careful analysis, determination of
requirements or proven functionality. We can at times observe a distinct tension between the
technologists’ aspiration to develop and define an advanced form of computer infrastructure, and
a social construction of such technology through discourses of marketing and public relations. The
plethora of terms associated with Utility computing within commercial settings includes
Autonomic Computing; Grid Computing; On-Demand Computing; Real-time Enterprise;
Service-Oriented Computing; Adaptive computing (or Adaptive Enterprise) (Goyal and Lawande
2006; Plaszczak and Wellner 2007) and peer-to-peer computing (Foster and Iamnitchi 2003). We
have adopted the term “utility computing” as our categorization of this mixed and confused
definitional landscape.
Many authors who write about Utility Computing start with an attempt to provide a definition,
often accompanied by a comment as to the general “confusion” surrounding the term (e.g.
(Gentzsch 2002)). It is unrealistic to expect an accepted definition of a technology which is still
emerging, but by tracing the evolution of definitions in currency we can see how the
understanding of new technology is influenced by various technical, commercial and socio-
political forces. Put another way, the computer is not a static thing, but rather a collection of
meanings that are contested by different groups (Bijker 1995), and, like any other technology,
embodies to varying degrees its developers’ and users’ social, political, psychological, and professional
commitments, skills, prejudices, possibilities and constraints.
Computing Utility: The Shifting nature of Computing.
Since Von Neumann defined our modern computing architecture we have seen computers as
consisting of a processing unit (capable of undertaking calculation) and a memory (capable of
storing instructions and data for the processing unit to use). Running on this machine is
operating system software which manages (and abstracts) the way applications software makes
use of this physical machine. The development of computing networks, client-server computing
and ultimately the internet essentially introduced a form of communication into this system –
allowing storage and computing to be shared with other locations or sites - but ultimately the
concept of a "personal computer" or "server computer" remains.
This basic computer architecture no longer represents computing effectively. Firstly the physical
computer is becoming virtualized – represented as software rather than as a physical machine.
Secondly it is being distributed through Grid computing infrastructure such that it is owned by
virtual rather than physical organizations. Finally these two technologies are brought together in
a commoditization of computing infrastructure as cloud computing – where all physicality of the
network and computer is hidden from view. It is for this reason that in 2001 Shirky, at a P2P
Webservices conference, stated that “Thomas Watson’s famous quote that ‘I think there is a
world market for maybe five computers’ was wrong - he overstated the number by four”. For
Shirky the computer was now a single device collectively shared. All PCs, mobile phones and
connected devices share this Cloud of services on demand – and where processing occurs is not
relevant. We now review the key technologies involved in Utility Computing.
1: Internet – Bandwidth and Internet Standards
At the core of the Utility Computing model is the network. The internet and its associated
standards have enabled interoperability among systems and provide the foundation for Grid
Standards.
2: Virtualisation
Central to the Cloud Computing idea is the concept of virtualising the machine. While we desire
services, these are provided by personal machines (albeit simulated in software).
3: Grid Computing Middleware and Standards
Just as the Internet infrastructure (standards, hardware and software) provides the foundation of
the Web, so Grid Standards and Software extend this infrastructure to provide utility computing
utilising large clusters of distributed computers.
Internet – Bandwidth and Standards
The internet emerged because of attempts to connect mainframe computers together to undertake
analysis beyond the capability of one machine - for example within the SAGE air-defence
system or ARPANET for scientific analysis (Berman and Hey 2004). Similarly the Web emerged
from a desire to share information globally between various different computers (Berners-Lee
1989). Achieving such distribution of resources is, however, founded upon a communications
infrastructure (of wires and radio-waves) capable of transferring information at the requisite
speed (bandwidth) and without delays (latency). Until the early 2000s, however, the bandwidth
required for large applications and processing services to interact was missing. During the dot-com
boom, however, a huge amount of fibre-optic cable and network routing equipment was
installed across the globe by organisations, such as the failed WorldCom, which reduced costs
dramatically and increased availability.
Having an effective network infrastructure in place is not enough. A set of standards (protocols)
is also required which defines mechanisms for resource sharing (Baker, Apon et al. 2005).
Internet standards (HTTP/HTML/TCP-IP) made the Web possible by defining how information
is shared globally through the internet. These standards ensure that a packet of information is
reliably directed between machines. It is this standardised high-speed high-bandwidth Internet
infrastructure upon which Utility Computing is built.
Virtualization
Virtualization for cloud computing is, at its core, the idea of providing a software simulation of an
underlying hardware machine. These simulated machines (so called Virtual Machines) present
themselves to the software running upon them as identical to a real machine of the same
specification. As such the virtual machine must be installed with an operating system (e.g.
Windows or Linux) and can then run applications within it. This is not a new technology and was
first demonstrated in 1967 by IBM’s CP/CMS systems as a means of sharing a mainframe with
many users who are each presented with their own “virtual machine” (Ceruzzi 2002). However
its relevance to modern computing rests in its ability to abstract the computer away from the
physical box and onto the internet. “Today the challenge is to virtualize computing resources
over the Internet. This is the essence of Grid computing, and it is being accomplished by
applying a layer of open Grid protocols to every “local” operating system, for example Linux,
Windows, AIX, Solaris, z/OS” (Wladawsky-Berger 2004). Once such Grid-enabled virtualization
is achieved it is possible to decouple the hardware from the now virtualized machine, for
example running multiple virtual machines on one server or moving a virtual machine between
servers using the internet. Crucially for the user it appears they are interacting with a machine
with similar attributes to a desktop machine or server - albeit somewhere within the internet-
cloud.
Grid Computing
The term “Grid” is increasingly used in discussions about the future of ICT infrastructure, or
more generally in discussion of how computing will be done in the future. Unlike “Cloud
computing”, which emerged from and belongs to the IT industry and marketing domain, the term “Grid
Computing” emerged from the super-computing (High Performance Computing) community
(Armbrust, Fox et al. 2009). Our discussion of Utility computing begins with this concept of
Grids as a foundation. As with the other concepts, however, hyperbole around Grids
abounds, with arguments proposed that they are “the next generation of the internet”,
“the next big thing”; or that they will “overturn strategic and operating assumptions, alter industrial
economics, upset markets (…) pose daunting challenges for every user and vendor” (Carr 2005)
and even “provide the electronic foundation for a global society in business, government,
research, science and entertainment” (Berman, Fox et al. 2003). Equally, Grids have been
accused of faddishness and that “there is nothing new” in comparison to older ideas, or that the
term is used simply to attract funding or to sell a product with little reference to computational
Grids as they were originally conceived (Sottrup and Peterson 2005).
From a technologist's perspective an overall description might be that Grid technology aims to
provide utility computing as a transparent, seamless and dynamic delivery of computing and data
resources when needed, in a similar way to the electricity power Grid (Chetty and Buyya 2002;
Smarr 2004). Indeed the word grid is directly taken from the idea of an electricity grid, a utility
delivering power as and when needed. To provide that power on demand a Grid is built (held
together) by a set of standards (protocols) specifying the control of such distributed resources.
These standards are embedded in the Grid middleware, the software which powers the Grid. In a
similar way to how Internet Protocols such as FTP and HTTP enable information to be passed
through the internet and displayed on users' PCs, so Grid protocols enable the integration of
resources such as sensors, data-storage, computing processors etc. (Wladawsky-Berger 2004).
The idea of the Grid is usually traced back to the mid 1990s and the I-Way project to link
together a number of US supercomputers as a ‘metacomputer’ (Abbas, 2004). This was led by
Ian Foster of the University of Chicago and Argonne National Laboratory. Foster and Carl
Kesselman then started the Globus project to develop the tools and middleware for this
metacomputer[3]. This toolkit rapidly took off in the world of supercomputing and Foster
remains a prominent proponent of the Grid. According to Foster and Kesselman’s (1998) “bible
of the grid” a computational Grid is “a hardware and software infrastructure that provides
dependable, consistent, pervasive and inexpensive access to high-end computational
capabilities”. In this Foster highlights “high-end” in order to focus attention on Grids as
a supercomputing resource supporting large scale science; “Grid technologies seek to make this
possible, by providing the protocols, services and software development kits needed to enable
flexible, controlled resource sharing on a large scale” (Foster 2000)[4].
Three years after their first book, however, the same authors shift their focus, now speaking
of Grids as "coordinated resource sharing and problem solving in dynamic, multi-institutional
virtual organizations" (Foster, Kesselman et al. 2001). The inclusion of “multi-institutional”
within this 2001 definition highlights the scope of the concept as envisaged by these key Grid
proponents, with Berman (2003) further adding that Grids enable resource sharing “on a global
scale”. Such definitions, and the concrete research projects that underlie them, make the
commercial usage of the Grid seem hollow and opportunistic. These authors seem critical of the
contemporaneous re-badging by IT companies of existing computer-clusters and databases as
“Grid enabled” (Goyal and Lawande 2006; Plaszczak and Wellner 2007). This critique seems to
run through the development of Grids within supercomputing research and science where many
lament the use of the term by IT companies marketing clusters of computers in one location.
In 2002 Foster provides a three-point checklist to assess a Grid (Foster 2002). A Grid:
1) coordinates resources that are NOT subject to centralized control;
2) uses standard, open, general purpose protocols and interfaces;
3) delivers non-trivial qualities of service.
Foster's highlighting of ‘NOT’, and the inclusion of ‘open protocols’, appear as a further
challenge to the commercialization of centralized, closed grids.
While this checklist was readily accepted by the academic community and is widely
cited, unsurprisingly, it was not well received by the commercial Grid community (Plaszczak
and Wellner 2007). The demand for “decentralization” was seen as uncompromising and
excluded “practically all known ‘grid’ systems in operation in industry” (Plaszczak and Wellner
2007, p57). It is perhaps in response to this definition that the notion of “Enterprise Grids”
(Goyal and Lawande 2006) emerged as a form of Grid operating within an organisation, though
possibly employing resources across multiple corporate locations employing differing
technology. It might ultimately be part of the reason why "Cloud computing" has eclipsed Grid
computing as a concept. The commercial usage of Grid terms such as “Enterprise Grid
Computing” highlights the use of Grids away from the perceived risk of globally distributed
Grids and is the foundation of modern Cloud Computing providers (e.g. Amazon S3). The focus
is not to achieve increased computing power through connecting distributed clusters of
machines, but as a solution to the “Silos of applications and IT systems infrastructure” within an
organisation’s IT function (Goyal and Lawande 2006, p4) through a focus on utility computing
and reduced complexity. Indeed in contrast to most academic Grids such “Enterprise Grids”
demand homogeneity of resources and centralization within Grids as essential components. It is
these Grids which form the backdrop for Cloud Computing and ultimately utility computing in
which cloud providers essentially maintain a homogeneous server-farm providing virtualized cloud
services. In such cases the Grid is far from distributed, rather existing as “a centralized pool of
resources to provide dedicated support for virtualized architecture” (Plaszczak and Wellner
2007,p174) often within data-centers.
Before considering the nature of Grids we discuss their underlying architecture. Foster (Foster,
Kesselman et al. 2001) provides an hour-glass Grid architecture (Figure 1). It begins with the
fabric which provides the interfaces to the local resources of the machines on the Grid (be they
physical or virtual machines). This layer provides the local, resource-specific facilities and could
be computer processors, storage elements, tape-robots, sensors, databases or networks. Above
this is a resource and connectivity layer which defines the communication and authentication
protocols required for transactions to be undertaken on the Grid. The next layer provides a
resource management function including directories, brokering systems, as well as monitoring
and diagnostic resources. In the final layer reside the tools and applications which use the Grid. It
is here that Virtualization software resides to provide services.
Figure 1: The Layered Grid Architecture.
One of the key challenges of Grids is the management of the resources they make available to
users. Central to achieving this is the concept of a Virtual Organisation (VO). A Virtual
Organisation is a set of individuals and/or institutions defined by the sharing rules for a set of
resources (Foster and Kesselman 1998) or “a set of Grid entities, such as individuals,
applications, services or resources, that are related to each other by some level of trust”
(Plaszczak and Wellner 2007). By necessity these resources must be controlled “with resource
providers and consumers defining clearly and carefully just what is shared, who is allowed to
share, and the conditions under which sharing occurs” (Foster and Kesselman 1998) and for this
purpose VOs are technically defined along with the rules of their resource sharing. A Grid VO
implies the assumptions of “the absence of central location, central control, omniscience, and an
existing trust relationship” (Abbas 2004). It is this ability to control access to resources which is
also vital within Cloud Computing - allowing walled-gardens for security and accounting of
resource usage for billing.
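The idea of a VO as a set of sharing rules can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the class names, the resource name "cluster-A/cpu", the member names and the CPU-hour quota are all hypothetical and do not come from any Grid toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class SharingRule:
    resource: str          # what is shared (hypothetical resource name)
    allowed_members: set   # who is allowed to share it
    max_cpu_hours: float   # condition under which sharing occurs


@dataclass
class VirtualOrganisation:
    name: str
    members: set = field(default_factory=set)
    rules: dict = field(default_factory=dict)   # resource -> SharingRule
    usage: dict = field(default_factory=dict)   # (member, resource) -> hours used

    def add_rule(self, rule):
        self.rules[rule.resource] = rule

    def request(self, member, resource, cpu_hours):
        """Grant access only if membership, rule and quota are all satisfied."""
        rule = self.rules.get(resource)
        if member not in self.members or rule is None:
            return False
        if member not in rule.allowed_members:
            return False
        used = self.usage.get((member, resource), 0.0)
        if used + cpu_hours > rule.max_cpu_hours:
            return False
        # Record the usage so that it can later be accounted for and billed.
        self.usage[(member, resource)] = used + cpu_hours
        return True


vo = VirtualOrganisation("physics-vo", members={"alice", "bob"})
vo.add_rule(SharingRule("cluster-A/cpu", {"alice"}, max_cpu_hours=10.0))
```

In this sketch a request is refused when the user is outside the VO, when no rule covers the resource, or when the quota would be exceeded, which mirrors the "what is shared, who is allowed to share, and the conditions" wording above.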
Various classes and categories of Grids exist. Abbas categorises Grids by
increasing scale: desktop grids, cluster grids, enterprise grids, and global grids
(Abbas 2004). Desktop Grids are based on existing dispersed desktop PCs and can create a new
computing resource by employing unused processing and storage capacity while the existing user
can continue to use the machine. Cluster Grids describe a form of parallel or distributed
computer system that consists of a collection of interconnected yet standardised computer
nodes working together to act, as far as the user is concerned, as a single unified computing
resource. Many existing supercomputers are clusters which “use Smart Software Systems (SSS)
to virtualise independent operation-system instances to provide an HPC service” (Abbas 2004).
All the above are arguably grids, and potentially can just about live up to Foster's three tests.
However, for the information systems field, for Pegasus, and for those who wish to explore
Cloud Computing, it is the final category of global Grids that is the most significant. Global
Grids employ the public internet infrastructure to communicate between Grid Nodes, and rely on
heterogeneous computing and networking resources. Some global grids have gained a large
amount of publicity by providing social benefits which capture the public imagination. Perhaps
the first such large-scale project was SETI@home, which searches radio-telescope data for signs
of extra-terrestrial intelligence. WorldCommunityGrid.org, which undertakes research for healthcare,
and Folding@home, which is concerned with protein folding experiments, are other examples.
Folding@home indeed can claim to be the world's most powerful distributed computing network
according to the Guinness Book of Records, with 700,000 Sony PlayStation 3 machines and over
1,000 trillion calculations per second[9]. Each works by dividing a problem into steps and
distributing software over the internet to the computers of those volunteering. Since within the
home and workplace a large number of desktop computers remain idle most of the time such
donations have little impact on the user. Indeed the average computer is idle for over 90% of the
time, and even when used only a very small amount of the CPU’s capabilities are employed
(Smith 2005).
Another way to categorise Grids is by the types of solutions that they best address (Jacob 2003).
A computational grid is focused on undertaking large numbers of computations rapidly, and
hence the focus is on using high performance processors. A data grid’s focus is upon the
effective storage and distribution of large amounts of data, usually across multiple organisations
or locations. The focus of such systems is upon data integrity, security and ease of access. It
should be stressed that there are no hard boundaries between these two types of grid: one
need often presupposes the other, and real users face both issues.
As an example of a grid project with a stronger data orientation, consider the Biomedical
Informatics Research Network (BIRN), a grid infrastructure project that serves biomedical research
needs (http://www.nbirn.net/index.shtm). They express their offerings in terms of 5
complementary elements; a cyber infrastructure, software tools (applications) for biomedical
data gathering, resources of shared data, data integration support, an ontology and support for
multi-site integration of research activity. As they say, “By intertwining concurrent revolutions
occurring in biomedicine and information technology, BIRN is enabling researchers to
participate in large scale, cross-institutional research studies where they are able to acquire,
share, analyze, mine and interpret both imaging and clinical data acquired at multiple sites using
advanced processing and visualization tools.”
Other examples of Grid Computing exist within science, particularly particle physics. The
particle physics community faces the challenge of analyzing the unprecedented amounts of data
- some 15 Petabytes per year - that will be produced by the LHC (Large Hadron Collider)
experiments at CERN[10]. To process this data CERN required around 100,000 computer-equivalents[11]
forming its associated grids by 2007, spread across the globe and incorporating a
number of grid infrastructures (Faulkner, Lowe et al. 2006). In using the Grid physicists submit
their computing-jobs to the Grid which spreads across the globe. Similarly data from the LHC is
initially processed at CERN but is quickly spread to 12 computer centres across the world (so
called Tier-1 Grid sites). From here data is spread to local data-centres at universities within
these countries (Tier-2 sites).
EXISTING SYSTEM
CONVENTIONAL SUPERCOMPUTERS:
“Distributed” or “grid” computing in general is a special type of parallel computing that relies on
complete computers (with onboard CPUs, storage, power supplies, network interfaces, etc.)
connected to a network (private, public or the Internet) by a conventional network interface, such
as Ethernet. This is in contrast to the traditional notion of a supercomputer, which has many
processors connected by a local high-speed computer bus. The primary advantage of distributed
computing is that each node can be purchased as commodity hardware, which, when combined,
can produce a computing resource similar to a multiprocessor supercomputer, but at a lower cost.
This is due to the economies of scale of producing commodity hardware, compared to the lower
efficiency of designing and constructing a small number of custom supercomputers. The primary
performance disadvantage is that the various processors and local storage areas do not have high-
speed connections. This arrangement is thus well-suited to applications in which multiple
parallel computations can take place independently, without the need to communicate
intermediate results between processors. The high-end scalability of geographically dispersed
grids is generally favorable, due to the low need for connectivity between nodes relative to the
capacity of the public Internet.
There are also some differences in programming and deployment. It can be costly and difficult to
write programs that can run in the environment of a supercomputer, which may have a custom
operating system, or require the program to address concurrency issues. If a problem can be
adequately parallelized, a “thin” layer of “grid” infrastructure can allow conventional, standalone
programs, given a different part of the same problem, to run on multiple machines. This makes it
possible to write and debug on a single conventional machine, and eliminates complications due
to multiple instances of the same program running in the same shared memory and storage space
at the same time.
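The "thin layer" idea described above can be sketched as follows. This is a minimal sketch, not a real grid middleware: a thread pool stands in for the grid nodes, and the prime-counting routine is a hypothetical example of a conventional, standalone program that is simply handed a different part of the same problem.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(lo, hi):
    """Conventional standalone routine: counts primes in [lo, hi)."""
    def is_prime(n):
        if n < 2:
            return False
        i = 2
        while i * i <= n:
            if n % i == 0:
                return False
            i += 1
        return True
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split(lo, hi, parts):
    """Divide [lo, hi) into contiguous, independent sub-ranges."""
    step = (hi - lo + parts - 1) // parts
    return [(a, min(a + step, hi)) for a in range(lo, hi, step)]


def run_on_grid(lo, hi, workers=4):
    """Thin dispatch layer: each worker stands in for one grid node."""
    chunks = split(lo, hi, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: count_primes(*c), chunks)
    # No intermediate results are exchanged; only the final counts are summed.
    return sum(partials)
```

Because `count_primes` knows nothing about the dispatcher, it can be written and debugged on a single machine, which is the point made in the paragraph above.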
Design considerations and variations
One feature of distributed grids is that they can be formed from computing resources belonging
to multiple individuals or organizations (known as multiple administrative domains). This can
facilitate commercial transactions, as in utility computing, or make it easier to
assemble volunteer computing networks.
One disadvantage of this feature is that the computers which are actually performing the
calculations might not be entirely trustworthy. The designers of the system must thus introduce
measures to prevent malfunctions or malicious participants from producing false, misleading, or
erroneous results, and from using the system as an attack vector. This often involves assigning
work randomly to different nodes (presumably with different owners) and checking that at least
two different nodes report the same answer for a given work unit. Discrepancies would identify
malfunctioning and malicious nodes.
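The checking scheme described above might look like the following sketch. The node names, the quorum of two and the squaring "work unit" are assumptions made for illustration; real volunteer-computing systems use more elaborate validation.

```python
import random
from collections import Counter


def verify_work_unit(unit, nodes, quorum=2, rng=random):
    """Send `unit` to randomly ordered nodes until `quorum` of them agree."""
    order = list(nodes)
    rng.shuffle(order)
    answers = {}
    for name in order:
        answers[name] = nodes[name](unit)
        value, votes = Counter(answers.values()).most_common(1)[0]
        if votes >= quorum:
            # Nodes that disagreed with the accepted answer are flagged.
            suspects = sorted(n for n, a in answers.items() if a != value)
            return value, suspects
    return None, sorted(answers)   # no agreement reached


# Hypothetical nodes: two honest ones and one that returns a wrong result.
nodes = {
    "honest-1": lambda x: x * x,
    "honest-2": lambda x: x * x,
    "faulty": lambda x: x * x + 1,
}
```

Depending on the random order, the faulty node is either never consulted or is reported as a suspect; in both cases the accepted result is the one that two different nodes agree on.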
Due to the lack of central control over the hardware, there is no way to guarantee that nodes will
not drop out of the network at random times. Some nodes (like laptops or dialup Internet
customers) may also be available for computation but not network communications for
unpredictable periods. These variations can be accommodated by assigning large work units
(thus reducing the need for continuous network connectivity) and reassigning work units when a
given node fails to report its results in expected time.
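Reassigning a work unit when a node misses its deadline can be sketched as below. The node names, timings and the `finish_time` callback are hypothetical; the callback simply simulates how long a node takes, or returns None if the node drops out without reporting.

```python
def schedule(work_units, nodes, deadline, finish_time):
    """Offer each unit to nodes in turn, reassigning it when a deadline is missed."""
    results, log = {}, []
    for unit in work_units:
        for node in nodes:
            took = finish_time(node, unit)
            if took is not None and took <= deadline:
                results[unit] = node
                break
            log.append((unit, node, "reassigned"))
        else:
            # Every node failed or was too slow for this unit.
            log.append((unit, None, "unfinished"))
    return results, log


def finish_time(node, unit):
    """Simulated behaviour: the laptop drops out, the desktop is fast."""
    if node == "laptop":
        return None
    return 5 if node == "desktop" else 50


results, log = schedule(["wu-1", "wu-2"], ["laptop", "desktop"],
                        deadline=10, finish_time=finish_time)
```

Here both work units are first given to the laptop, which never reports, and are then reassigned to the desktop.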
The impacts of trust and availability on performance and development difficulty can influence
the choice of whether to deploy onto a dedicated computer cluster, to idle machines internal to
the developing organization, or to an open external network of volunteers or contractors. In many
cases, the participating nodes must trust the central system not to abuse the access that is being
granted, by interfering with the operation of other programs, mangling stored information,
transmitting private data, or creating new security holes. Other systems employ measures to
reduce the amount of trust “client” nodes must place in the central system such as placing
applications in virtual machines.
Public systems or those crossing administrative domains (including different departments in the
same organization) often result in the need to run on heterogeneous systems, using
different operating systems and hardware architectures. With many languages, there is a trade off
between investment in software development and the number of platforms that can be supported
(and thus the size of the resulting network). Cross-platform languages can reduce the need to
make this trade off, though potentially at the expense of high performance on any given node
(due to run-time interpretation or lack of optimization for the particular platform).
There are
diverse scientific and commercial projects to harness a particular associated grid or for the
purpose of setting up new grids. BOINC is a common one for various academic projects seeking
public volunteers; more are listed at the end of the article.
In fact, the middleware can be seen as a layer between the hardware and the software. On top of
the middleware, a number of technical areas have to be considered, and these may or may not be
middleware independent: [...] management, Trust and Security, Virtual organization management,
License Management, Portals and Data Management. These technical areas may be taken care of
in a commercial solution, though the cutting edge of each area is often found within specific
research projects examining the field.
Disadvantages of conventional supercomputers:
The main disadvantages are power usage, heat and cost. In overclocked computers, heat
can damage the components, which in turn raises the cost through
replacement parts. 64-bit processors (which can provide better processing
capabilities) can have the downside of compatibility issues with some software.
Proposed system:
Grid:
A grid is a combination of different components which collectively act as part of one large
electrical or electronic circuit.
Figure 1: Architecture of a Grid
Grid computing: The term grid computing means that a large number of computers are connected
together to collectively solve a problem of very high complexity and magnitude.
Grid computing is all about sharing, aggregating, hosting, offering services
across the world for the benefit of mankind.
Grid computing is a form of networking. Unlike conventional networks that
focus on communication among devices, grid computing harnesses the unused processing cycles of
all computers in a network for solving problems too intensive for any stand-alone machine.
A well-known grid computing project is the SETI@home (Search for Extraterrestrial Intelligence)
project, in which PC users worldwide donate unused processor cycles to help the search
for signs of extraterrestrial life by analyzing signals coming from outer space. The project relies
on individual users to volunteer to allow the project to harness the unused processing power of
the user's computer. This method saves the project both money and resources.
Grid computing does require special software that is unique to the computing project for which
the grid is being used.
Figure 2: Architecture of Grid Computing
5 Current Issues In Grid Computing:
Grid Computing is still very much in its development stage and there are a number of issues
that must be addressed or resolved before it can be considered as a stable technology. Some of
these issues are discussed below.
5.1 The Grid versus Many Grids:
A distinction must be made between the idea of a single, worldwide, ubiquitous grid and the idea
of many separate grids located in businesses and on university campuses. The original intention
of Grid Computing was that it would follow the same architecture as the electricity grid. This
means that whenever and wherever you needed compute power you would simply “plug in” to
The Grid and the processing would be done. There would be no need to know where the
computing was being done - just as there is no need for me to know where the power that is
lighting this room is coming from - only that it was being done. In the same way that I don't need
to know whether the electricity lighting this room is coming from a hydro-electric power plant in
Fiordland or a wind turbine in Wellington, I wouldn't care if my complicated simulation were
being run on a spare machine next door or on an idle server somewhere on the other side of the
world. In fact, The Grid could be viewed as a Grid of Grids, in much the same way as the Internet
is a network of networks. Although work is still being done toward creating a single Grid, it is
already the case that there are many disparate grids worldwide that are all completely isolated
from each other. Having many separate Grids makes issues like authentication and Virtual
Organisations much simpler, which is one of the reasons that The Grid has not emerged. It also
eliminates the need for some sort of global billing system, which is discussed further in Section
5.3.
Some progress toward creating a single worldwide grid has been made, however. The
PlanetLab project is a distributed testbed for testing new networking protocols, planetary scale
sharing, and many other ideas which can benefit from running at a huge distributed scale. It
involves hundreds of computers at different locations around the world, mostly within academic
institutions, on which researchers at the institutions can run experiments. It is not an initiative
aimed at creating a global Computational Grid but it does provide some of the things that a
Grid must provide, such as authentication and authorisation. It currently has 361 nodes (as
at 20 February 2004)[30] connected to it so it is far short of being a worldwide Grid but it is
certainly an important step toward it, both in the new research initiatives that it has allowed
and in demonstrating that world-wide distributed computing projects are feasible. It is
expected that PlanetLab will have over 1000 nodes distributed over the world by the end of
2004. Its only node in New Zealand is under the care of the Network Research Group in the
Department of Computer Science and Software Engineering at the University of Canterbury in
Christchurch. So far the only Australian node of PlanetLab is located at the University of
Technology in Sydney.
5.2 No-one wants to share:
One of the biggest problems facing Grid Computing is not a technological one but a social one.
Even when the technology exists for Grid Computing to work easily and flawlessly, people are
still required to donate their spare CPU cycles or Grid Computing will not work at all. Although
one of the major points of Grid Computing is that only spare cycles will be used, it still goes
against human nature to allow others to access their computers and run programs on them. Fear
of viruses is no doubt valid, since systems viewed as secure in the past have
been shown not to be so, and much work must go into developing a security infrastructure that can
be completely trusted.
The SETI@home project, and others like it, work by volunteers around the world allowing
their computers to be used for scientific research. This shows that some people at least are willing to
share for no direct benefit to themselves, but it is unlikely that everyone would allow this. Within
single businesses or university departments it is likely that it could be official policy that every
computer must be part of the organisation's Grid, but this would probably not work for The Grid
without some sort of global billing system.
5.3 Grid Economics:
Before all the separate grids can be connected into one ‘supergrid’ some sort of billing system
must be established that is accepted and trusted by everyone. A worldwide Grid is unlikely
to obtain and make use of almost all spare CPU time without some incentive for people to make
their computers available. However, in order for a world-wide billing system to work, there will
need to be some way of accurately keeping track of the CPU time used, the CPU time provided
by each user and a way of transferring payment between users. The development of such a
system in a way that is scalable and trusted by everyone is necessary before a global Grid can
become a reality.
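The three requirements named above (tracking CPU time used, CPU time provided, and transferring payment) can be illustrated with a minimal ledger sketch. The class, the flat price per CPU-second and the user names are assumptions for illustration only; a real system would also need authentication and tamper-proof metering.

```python
from collections import defaultdict


class CpuLedger:
    """Tracks the CPU-seconds each user has provided and consumed."""

    def __init__(self, price_per_cpu_second):
        self.price = price_per_cpu_second      # hypothetical flat price
        self.provided = defaultdict(float)
        self.used = defaultdict(float)

    def record_job(self, consumer, provider, cpu_seconds):
        self.used[consumer] += cpu_seconds
        self.provided[provider] += cpu_seconds

    def balance(self, user):
        """Positive: the user is owed money; negative: the user owes money."""
        return (self.provided[user] - self.used[user]) * self.price


ledger = CpuLedger(price_per_cpu_second=0.5)
ledger.record_job(consumer="alice", provider="bob", cpu_seconds=120)
ledger.record_job(consumer="bob", provider="alice", cpu_seconds=20)
```

In this example alice has used 100 CPU-seconds more than she provided, so her balance is negative by the same amount that bob's is positive.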
The development of such a system could lead to some sort of global bidding system for
compute power, which would fluctuate like the stock market. The value of CPU time would
vary over time according to supply and demand. Daytime hours in North America during the
working week would probably have the highest demand so would cost more, but could make use
of the servers in Europe and Asia that are not handling their peak capacity. The analogy of the
Computer Grid with the electricity grid can be expanded further - just like it is possible to feed
power back into the electricity grid - it will be possible to feed computing power back into the
Computer Grid. In order for a stock-market like Grid billing system to succeed, several obstacles
must be overcome. Local resources must be able to be used first, otherwise a company could
incur costs from using The Grid that they wouldn't have otherwise. This includes stopping non-
local users from using the local resources in order to run local Grid applications. In order for a
stock-market system to work it must also be made sure that businesses or universities do not
incur charges that are more than the gain they would have made. If running an application on
The Grid saves several seconds but costs $100, then it is probably not worth it. The ISP charges
as well as the Grid charges must be taken into account when calculating how much it will cost to
run on The Grid, which further complicates the issue. These problems mean that although The
Grid can certainly come into being at some time, it is likely that in the next few years at least the
development of Grid Computing will focus mainly on the simpler task of creating separate Grids
at separate organisations.
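The cost argument above can be written as a simple decision rule. This is a sketch under stated assumptions: the function name and the idea of assigning a monetary value to each second saved are hypothetical, and the figures in any call are purely illustrative.

```python
def worth_running_on_grid(local_seconds, grid_seconds, grid_charge,
                          isp_charge, value_per_second_saved):
    """Return True if the time saved is worth more than the total charges."""
    saved = local_seconds - grid_seconds
    total_cost = grid_charge + isp_charge   # Grid charges plus ISP charges
    return saved > 0 and saved * value_per_second_saved > total_cost
```

For instance, saving 5 seconds at a Grid charge of $100 is rejected for any realistic value of a second, whereas cutting a ten-hour job to one hour may well be accepted.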
5.4 Performance Forecasting:
One of the problems with scheduling resources on a Grid is that it is hard to know how long a
resource will be available for or how good its performance will be if it is used. Researchers have
implemented a tool known as EveryWare which contains, amongst other things, a performance
forecasting mechanism [21]. With accurate forecasting, scheduling becomes simpler because it is
known that a given resource will react fast to requests or process data quickly. Without accurate
performance forecasting a scheduler could schedule a remote set of CPUs to try and speed up
processing but actually make it slower because those CPUs do not perform as well as expected.
There is still work to be done in this area, however, as the performance forecasting needs to be
incorporated into scheduler algorithms and the accuracy of performance forecasting can no doubt
be improved.
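As an illustration of how a forecast can feed a scheduling decision, the sketch below uses simple exponential smoothing. This is an assumed, generic technique chosen for brevity and is not the forecasting mechanism of EveryWare; the resource names and measurements are hypothetical.

```python
def forecast(history, alpha=0.5):
    """Exponentially smoothed estimate of the next measurement."""
    estimate = history[0]
    for value in history[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def pick_resource(measurements, alpha=0.5):
    """Choose the resource with the lowest forecast time per work unit."""
    return min(measurements, key=lambda name: forecast(measurements[name], alpha))


# Hypothetical seconds per work unit observed on each resource, oldest first.
measurements = {
    "local-cpu": [10.0, 10.0, 10.0],
    "remote-cluster": [4.0, 8.0, 16.0],
}
```

The remote cluster has the better average, but its times are getting worse, so the smoothed forecast favours the local CPU; a scheduler that ignored the trend would pick the remote set and slow the job down.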
5.5 The No-Defined Problems Problem:
A vital step in solving problems is identifying what they actually are. With any new technology
it is hard to know what the key problems to be solved for that technology to work are: there
are no forums for putting problems forward to be solved and no systematic attempts by various
researchers to solve them. To encourage the formulation of specific problems and solutions,
some authors propose several problems that they see as holding back the progress of Grid
Computing and challenge other researchers not only to solve those problems but to supply more.
Although Grid Computing has reached a state where a common vocabulary of
Grid Computing terms has been formed and various components of any Grid Computing system have been
spoken of, there is still inconsistency in what the different terms mean and when they are used.
When basic terms related to Grid Computing and components of Grid systems are agreed
upon, research into Grid Computing will be in a much better shape.
5.6 Security:
As mentioned, one of the reasons that people may not want to make their computer available on
a Grid is that they do not trust other users to run code on their machines. Within small scale
Grids this is not too much of a problem as Virtual Organisations at least partially eliminate the
fear of malicious attacks. This is because in a Virtual Organisation you can authorise only those
from within a certain trusted organisation to be able to access your computer. However, there
could potentially be problems with the authorisation systems and it is possible that someone from
within the organisation could act in a malicious way. With larger scale Grids it will be
impossible to know and trust everyone who can access a single computer so the Grid
infrastructure will have to provide guarantees of security in some way.
The Java Sandbox Security Model [14] already provides an environment in which untrusted
users are restricted from making certain system calls which are not considered safe, and from
accessing memory addresses outside of a certain range. Any Grid system will have to provide a
similar mechanism, so that users will be happy to let others access their computer.
5.7 Supercomputing Power For Everyone?
In the past, supercomputing power has been available only to very few people - certain people in
research institutions and some businesses. If The Grid is ever created, though, supercomputing
power will be available to anyone who wishes to access it, although probably at a fairly large
cost. This means that, amongst other things, anyone can do huge password searches or can try
and crack public/private keys. With the creation of The Grid, these issues will have to be
addressed either by somehow restricting users from being able to do such searches or by using
even larger keys and passwords. As [5] shows, what is considered to be an unbreakable key one
year can be inadequate a few years later, and with the advent of The Grid, this situation will be
reinforced further. There are no doubt many other social issues that will arise when everyone
can have access to supercomputing power, and they will have to be addressed as well.
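The effect of extra computing power on key searches can be shown with a little arithmetic. The search rates used with this sketch are hypothetical round numbers, not measurements; the point is only that a k-times faster attacker is cancelled out by adding about log2(k) bits to the key.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_search(key_bits, keys_per_second):
    """Expected years to find a key by trying half of the key space."""
    return (2 ** key_bits / 2) / keys_per_second / SECONDS_PER_YEAR


def extra_bits_needed(speedup):
    """Key bits to add so that a `speedup`-times faster attacker gains nothing."""
    bits = 0
    while 2 ** bits < speedup:
        bits += 1
    return bits
```

Under the assumed rate of one billion keys per second, a 56-bit key takes a single machine a little over a year on average, while 100,000 such machines need only a few minutes; ten additional key bits would offset a thousandfold increase in computing power.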
5.8 The Need Not To Centralise:
Any Grid system must have some knowledge of what resources are available in order to provide
Resource Access and Resource Discovery. The logical way to do this would be to have a central
repository listing all resources currently available and who is allowed to access them. The
problem with this centralised solution is that it is not at all scalable and means that the entire
Grid system is subject to a single point of failure. For these reasons, another way of providing
Resource Discovery is required. If there were a central repository containing details on all Grid
resources for a large Grid, the speed at which it would need to operate would be immense. The
dynamic nature of Grid resources would mean that the list of resources available would need to
be constantly updated. Because the availability of resources is dynamic, they can be taken away
from the users at any time which means that users may have to be constantly requesting access to
further resources. In a Grid of world-wide scale, a single server to handle this would not be
possible. As well as the problem of making the central server fast enough, it must also be so
reliable that it can never break down. If it did stop working then the whole Grid would also have
to stop - and even if some of the communication channels between it and certain sections of the
Grid broke, that whole section would have no other server which it could access. Some
distributed form of providing Resource Discovery is required for large Grids to operate reliably.
To solve this problem, the authors of [21] say that they have created a distributed, dynamic
‘State Exchange Services’ system called Gossips which manage resource access and discovery
and create and destroy themselves automatically. However, as stated there, not every Grid can
use that system so more work is required in this area. Other current Grid systems do not address
this problem at all (see, e.g. [1] and [20]) - but rely on centralised managers - so could not be
scaled past a certain point.
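Decentralised resource discovery can be illustrated with a generic gossip-style exchange, in which every node repeatedly merges its list of known resources with that of a random peer. This is a textbook-style sketch under stated assumptions and is not the Gossips system of [21]; the node and resource names are hypothetical.

```python
import random


def gossip_round(views, rng):
    """Each node merges its resource view with one randomly chosen peer."""
    names = list(views)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        merged = views[name] | views[peer]
        views[name] = set(merged)
        views[peer] = set(merged)


def rounds_until_converged(views, rng, max_rounds=100):
    """Count gossip rounds until every node knows about every resource."""
    everything = set().union(*views.values())
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(v == everything for v in views.values()):
            return round_no
    return None


# Initially each node only knows about its own resource.
views = {f"node-{i}": {f"resource-{i}"} for i in range(8)}
rounds = rounds_until_converged(views, random.Random(1))
```

No node holds a master list, so there is no single point of failure: if one node disappears, the others still learn about each other's resources through the remaining peers.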
5.9 Grid Programming Environments:
Current Computer Aided Software Engineering (CASE) tools and programming languages have
not been designed to facilitate the creation of Grid applications. What is considered to be high
level in standard software development situations - Java, Message Passing Interfaces (MPI) -
are referred to as low level in Grid publications. This is because Grid Computing uses the
abstractions provided by what are currently referred to as high-level layers - Virtual Machines,
etc. - and extends them. For example, Grid programmers should be able to treat a network
as one huge computer and not have to worry about the individual computers or virtual machines
that make it up. This extra layer of abstraction should lead to new development environments
and possibly things like new programming keywords - ‘remote’, ‘local’, ‘secure’, etc. The current
trend toward component based development will continue with Grid applications being made up
of different components at different sites. This could mean that huge data sets are stored at one
place, analysis is done on the Grid, and visualisation is done somewhere else. The component
based structure leads to the need for standard ways of storing and exchanging data, which current
tools like XML provide.
6 Grid Computing at the University of Canterbury:
Grid Computing is not currently employed at the University of Canterbury (UC), but there are
several research teams who would like to work on projects that could make extensive use of
Grid Computing. This section outlines details of some of those projects and then the ways in
which they could be activated.
6.1 Research Teams and Projects:
These are projects of research teams from Physics and Astronomy (Prof. Philip Butler and
Associate Prof. Lou Reinisch), Forestry (Dr. Hamish Cochrane), Biological Sciences (Associate
Prof. Jack Heinemann) as well as from HIT Laboratory (Dr Mark Billinghurst). Their projects
are considered to be so heavily computational that they are not suitable for desktop processing.
In particular, the following projects have been planned:
Medical Imaging
The Department of Physics and Astronomy is hoping to purchase a PET/CT scanner in
the near future which would be used for Medical Imaging. Currently, running the PET/CT
software on a high-end desktop computer means that only about 10% of the time is spent doing
the scanning and the other 90% of the time is spent waiting for results. It is hoped that this
ratio of scanning to processing time could be increased greatly using a Grid, with processing
times reduced at least tenfold.
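The effect of a tenfold reduction in processing time on the figures quoted above can be worked out directly. The sketch below assumes, as a simplification, that scanning time itself is unchanged and only the waiting (processing) share shrinks.

```python
def scanning_fraction(scan_share, processing_speedup):
    """Fraction of total time spent scanning once processing is sped up."""
    processing_share = (1.0 - scan_share) / processing_speedup
    return scan_share / (scan_share + processing_share)


def overall_speedup(scan_share, processing_speedup):
    """How much faster a complete scan-plus-processing cycle becomes."""
    processing_share = (1.0 - scan_share) / processing_speedup
    return 1.0 / (scan_share + processing_share)
```

With a 10% scanning share and processing ten times faster, the 90% wait shrinks to 9% of the original cycle, so scanning rises to 10/19 (about 53%) of the time and each cycle completes roughly 5.3 times faster.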
Bioinformatic Analysis and Genetic Data
Researchers in the New Zealand Institute of Gene Ecology (NZIGE), which includes staff
from the Department of Forestry and the School of Biological Sciences, as well as others,
would also be ready to make use of a computational grid. The research that would use the
grid would mostly involve (in very simple terms) searching for certain patterns on large data
sets. This is a very slow process on standard workstations and any increase in speed would
be considered useful, with a speedup of between 2 and 24 times being regarded as good, and
anything further better still.
As well as these, it is envisaged that other projects would use
the grid if it were available. Some other potential users are:
Proteomics research.
Processing data about imported foods on behalf of MAF. This looks for certain features of
the foods but is currently a very slow process.
Processing astronomical data from the several telescopes that the Department of Physics
and Astronomy has access to.
6.2 Potential Grid Tools For UC
There are several tools that could be used to facilitate Grid Computing at the UC. All of the
projects mentioned above have a focus on data processing rather than data access or any other
Grid function, so this section will focus only on the data processing side of Grid Computing.
Note that although most of these tools are not Computational Grids as defined earlier in this
article they can still provide useful amounts of computing power (and fall into the realm of what
is commonly called Grid Computing).
6.2.1 XGrid
XGrid is a distributed computing system that is currently installed on all Apple Macintoshes at
UC. It claims to automatically detect the presence of other Apple Macs and to be capable of
distributing processing to them without any explicit programming. The degree to which this
works would have to be investigated further, but although most of the computers on campus are
not Macs, enough of them are for a fairly significant amount of processing power to be available
from them if the XGrid system is effective.
6.2.2 Globus
As mentioned earlier, the Globus Toolkit is often referred to as the de facto standard for
creating Computational Grids. It is therefore logical that if a Grid is to be created at UC, the
Globus Toolkit be used. Unlike XGrid, however, the Globus Toolkit is not simply plugged in and used;
it is a set of tools used to create Grids. For this reason, if the Globus Toolkit were to be used to
create a Grid at UC, specialist programmers would have to be employed to put it all together.
The advantage of the Globus Toolkit is that it is widely used and well understood and, compared
to other tools, it is at least known to work and work well.
6.3 The Akaroa Project
Akaroa2 is an automated controller of stochastic discrete-event simulation developed at the
University of Canterbury by the Simulation Research Group (the group led by Prof. K. Pawlikowski
from Computer Science and Software Engineering, and Associate Prof. D. McNickle from
Management). When Akaroa2 was designed at the University of Canterbury in 1992, it was one
of the first software packages enabling grid processing. In 1993, it received an international
commendation (in the Science category) in the Computerworld Smithsonian Award for Achievements in
Information Technology, USA. Akaroa2 speeds up simulation experiments by performing
multiple replications of the experiment in parallel (MRIP) on multiple computers of a LAN,
with a simulation being stopped when the overall results have reached the desired level of
statistical precision. It runs the different replications on different machines acting as simulation
engines. Akaroa2 has been designed for working on local area networks consisting of
UNIX/Linux machines. Thus, the degree of its distributiveness is limited by the number of
workstations in a given LAN. Currently, students of Computer Science and Software
Engineering at the University of Canterbury can use Akaroa2 for distributing simulations
utilizing about 250 workstations. Launching Akaroa2 on a Grid system would certainly be very
desirable, since access to many more hosts could be possible. The next section investigates how
this could be done.
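The MRIP idea described above can be illustrated with a minimal Python sketch. It is not Akaroa2 code: the "simulation" is a stand-in that samples exponential service times, threads stand in for the simulation engines, and the names replicate and run_mrip, the 95% normal-approximation interval and the 5% precision target are all assumptions made for illustration.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

def replicate(seed, n_obs=200):
    """One independent replication: the mean of n_obs samples.
    Stand-in model (an assumption): exponential service times with mean 2.0."""
    rng = random.Random(seed)
    return statistics.fmean(rng.expovariate(0.5) for _ in range(n_obs))

def run_mrip(engines=4, rel_precision=0.05, max_rounds=50):
    """Run `engines` replications per round and stop as soon as the
    95% confidence half-width is within rel_precision of the mean."""
    results = []
    with ThreadPoolExecutor(max_workers=engines) as pool:
        for rnd in range(max_rounds):
            seeds = [rnd * engines + k for k in range(engines)]
            results.extend(pool.map(replicate, seeds))
            mean = statistics.fmean(results)
            half_width = 1.96 * statistics.stdev(results) / len(results) ** 0.5
            if half_width / mean <= rel_precision:
                break
    return mean, half_width, len(results)

mean, half_width, n = run_mrip()
print(f"estimate={mean:.3f} +/- {half_width:.3f} after {n} replications")
```

Adding engines shortens the wall-clock time to reach the precision target, which is why access to more hosts through a Grid would be attractive.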
6.3.1 PlanetLab
As mentioned, PlanetLab is not a Grid Computing system but is a global testbed for distributed
computing systems [30]. The Department of Computer Science and Software Engineering at
UC has maintained a node on PlanetLab, so any Grid projects conducted there could use the
PlanetLab testbed. This could form a very good way of extending the Akaroa2 project - multiple
simulations could be run in different parts of the world instead of on different machines in the
same lab, although issues such as the effect of the increased propagation delay and
unreliable access to machines would need to be investigated. PlanetLab would also provide
access to another several-hundred machines which could further increase the speed of simulation
studies, and allow more complicated simulations to be carried out.
6.3.2 MPICH-G2 and Globus:
MPICH-G2 is a grid-enabled implementation of the MPI standard. MPI is a library
specification for message-passing which can be used for constructing portable parallel programs.
Its goals are to provide portability and performance across many platforms and, because it
is aimed at being portable, it could be a good tool to use to modify Akaroa2. MPICH-G2
implements the MPI standard and extends it using tools from the Globus Toolkit, allowing the creation
of Grid applications that run on multiple machines of potentially different architectures. If
Akaroa2 were extended using MPICH-G2, it could be run on multiple environments at once (i.e.
not just UNIX or Linux). This would greatly increase the potential processing power available to
simulation applications. MPICH-G2 has C and C++ bindings, which make it well suited for use with
Akaroa2.
Grid Computing means sharing computing resources in order to create super-computing
capabilities out of desktop computers by using their idle CPU time. It also involves sharing other
computing resources such as data sets and disk storage. It has been around for several years and
has reached the stage where there are tools available so that experts can create Computational
Grids and use them to solve problems in many fields.
There are four vital issues which must be resolved in a distributed computing system before
it can be called a Grid. These are Authentication, Authorisation, Resource Access and Resource
Discovery. They lead to the idea of Virtual Organisations of collaborators who share resources
over a Grid. There are currently several tools available to help developers create Grids. The most
widely used of these is the Globus Toolkit, but there are others. There are also several
commercial companies which claim to provide Grid systems to clients.
Despite all the progress that has been made with Grid Computing, a number of challenges still
exist. They must be faced now or in the future if Grid Computing is to succeed as a technology.
These include the issue of many separate Grids versus a single world-wide Grid, social issues
resulting from sharing computing resources (the idea of Grid Economics), security issues (allowing
untrusted others to run code on your machine), problems with allocating resources (forecasting the
performance of resources and creating a way of discovering resources without using a single
central repository), and many others. Grid Computing is well suited to some of the research that
is being done, or is intended to be done, at the University of Canterbury, including projects in
Physics and Astronomy and Biological Sciences.
RESULTS:
To show the effectiveness of pipelining, we ran a series of experiments. All of our experiments
were performed on TeraGrid machines. For local-area tests we ran entirely on the University of
Chicago TeraGrid. Our wide-area tests ran between the San Diego Supercomputer Center TeraGrid
site and the University of Chicago TeraGrid site. The
nodes at these sites are Dual Itanium 1.5 GHz machines with 4 GB of RAM and 1 Gb/s network
interface cards. We used the Globus GridFTP server with the modifications
described above and a custom client written using the jglobus libraries. To
avoid anomalies and bottlenecks in the filesystem, we used the standard UNIX devices /dev/zero
and /dev/null as our source and destination files, respectively. The
devices appear as files to the GridFTP server; however, they do no disk or block I/O. Figures 3
through 6 show the results of an experiment that transfers 1 GB of data
partitioned into an increasing number of files. As the number of files increases, the size of each
file decreases, but the total number of bytes transferred remains constant at 1 GB.
The top x-axis shows the number of files, and the bottom x-axis shows the size of each file. The
y-axis shows the achieved throughput in Mb/s. The LAN results in Figures 3 and 4 show how the
legacy transfer request techniques quickly suffer when the data is partitioned into multiple files.
There is a significant drop-off even before 10 files of 100 MB each, and almost all of the
throughput is lost at 1,000 files of 1 MB each. However, the
pipelining solution is unaffected by file partitioning until the point where the file sizes are less
than 100 KB. The wide-area tests in Figures 5 and 6 show how significantly latency affects the
legacy transfers. Since the round-trip times are greater on wide-area networks, the delay between
transfers is also greater, and thus the overall transfer time is longer. However, the pipelining case
is again unaffected.
Figure 3: Comparison of the performance of pipelined GridFTP transfers with standard (nonpipelined) GridFTP transfers in a LAN with no security
Figure 4: Comparison of the performance of pipelined GridFTP transfers with standard (nonpipelined) GridFTP transfers in a LAN with security
Security affects the results in a way we did not expect. Since we are caching data channel
connections in both the cached and the pipelining cases, we did not expect the throughput levels
to drop any sooner with security than without security. However, as shown in
Figures 4 and 6, this is not the case. As the number of files increases, the throughput drops off
sooner when sending with GSI authentication. After extensive investigation
we have determined that this result is due not to any data channel handling but rather to message
processing latencies on the control channel.
Figure 5: Comparison of the performance of pipelined GridFTP transfers with standard (nonpipelined) GridFTP transfers in a WAN with no security
Figure 6: Comparison of the performance of pipelined GridFTP transfers with standard (nonpipelined) GridFTP transfers in a WAN with security
Between transfers the server sends a reply to the client. In our implementation the data channel
must be idle while the reply is formatted and passed to the TCP stack for
sending. With nonsecure transfers this time is extremely short. With GSI, however, the reply
must be encrypted, and therefore it takes much longer to format. As more transfers are requested,
more of these replies must be sent. Thus, this idle time becomes great
enough to affect the transfer rate.
APPLICATIONS OF GRIDFTP PIPELINING
1. Allows many outstanding transfer requests. The next request is sent before the previous one
completes, so latency is overlapped with the data transfer.
2. Backward compatible. The wire protocol doesn't change; the client side simply sends commands
sooner.
3. Significant performance improvement for lots of small files (LOSF).
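The effect of overlapping latency with data transfer can be made concrete with a small, idealised timing model in Python. It is only a sketch under stated assumptions: every transfer request is assumed to cost one round trip, the 60 ms round-trip time and 1 Gb/s link are illustrative values rather than measurements from the experiments, and the function names are hypothetical. The model ignores the control-channel processing costs discussed above, so it does not reproduce the drop-off seen for very small files.

```python
def transfer_time(n_files, total_bytes, bandwidth_bps, rtt_s, pipelined):
    """Idealised time to move total_bytes split evenly over n_files.

    Assumption: each transfer request costs one round trip. Without
    pipelining the data channel idles for that round trip before every
    file; with pipelining only the first request is exposed.
    """
    data_time = total_bytes * 8 / bandwidth_bps
    exposed_round_trips = 1 if pipelined else n_files
    return data_time + exposed_round_trips * rtt_s

def throughput_mbps(n_files, total_bytes, bandwidth_bps, rtt_s, pipelined):
    """Achieved throughput in Mb/s under the same idealised model."""
    seconds = transfer_time(n_files, total_bytes, bandwidth_bps, rtt_s, pipelined)
    return total_bytes * 8 / seconds / 1e6

GB = 10**9
for n in (1, 10, 100, 1000, 10000):
    legacy = throughput_mbps(n, GB, 1e9, 0.06, pipelined=False)
    piped = throughput_mbps(n, GB, 1e9, 0.06, pipelined=True)
    print(f"{n:>6} files: legacy {legacy:7.1f} Mb/s, pipelined {piped:7.1f} Mb/s")
```

In this model the pipelined throughput is independent of the number of files, while the legacy throughput falls as the per-file round trips accumulate.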
Advantages of Grid Computing:
Grid computing has been around for over 12 years now and its advantages are many. Grid
computing can be defined in many ways, but for this discussion let's simply call it a way to
execute compute jobs (e.g. Perl scripts, database queries, etc.) across a distributed set of
resources instead of one central resource. In the past most computing was done in silos or on large
SMP-like boxes. Even today you'll still see companies perform calculations on large SMP boxes
(e.g. E10Ks, HP Superdomes). But this model can be quite expensive and doesn't scale well.
Along comes grid computing (one of the top five strategic technologies for 2008) and now we have
the ability to distribute jobs to many smaller server components using load sharing software that
distributes the load evenly based on resource availability and policies. Now, instead of having
one heavily burdened server, the load can be spread evenly across many smaller computers. The
distributed nature of grid computing is transparent to the user. When users submit a job they
don't have to think about which machine the job is going to be executed on. The "grid
software" will perform the necessary calculations and decide where to send the job based on
policies. Many research institutions are using some sort of grid computing to address complex
computational challenges. Individuals can also volunteer their workstations to be
part of a grid that attempts to solve some of the world's biggest challenges.
Some Advantages of Grid Computing:
1. No need to buy large six-figure SMP servers for applications that can be split up and
farmed out to smaller commodity type servers. Results can then be concatenated and
analyzed upon job(s) completion.
2. Much more efficient use of idle resources. Jobs can be farmed out to idle servers or even
idle desktops. Many of these resources sit idle especially during off business hours.
Policies can be in place that allow jobs to only go to servers that are lightly loaded or
have the appropriate amount of memory/cpu characteristics for the particular application.
3. Grid environments are much more modular and don't have single points of failure. If one
of the servers/desktops within the grid fails there are plenty of other resources able to pick
up the load. Jobs can automatically restart if a failure occurs.
4. Policies can be managed by the grid software. The software is really the brains behind the
grid. A client resides on each server and sends information back to the master telling
it what availability and resources it has to complete incoming jobs.
5. This model scales very well. Need more compute resources? Just plug them in by
installing grid client on additional desktops or servers. They can be removed just as easily
on the fly. This modular environment really scales well.
6. Upgrading can be done on the fly without scheduling downtime. Since there are so many
resources some can be taken offline while leaving enough for work to continue. This way
upgrades can be cascaded so as not to affect ongoing projects.
7. Jobs can be executed in parallel speeding performance. Grid environments are extremely
well suited to run jobs that can be split into smaller chunks and run concurrently on many
nodes. Using things like MPI will allow message passing to occur among compute
resources.
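The policy-based placement mentioned in points 2 and 4 can be sketched in a few lines of Python. This is a hypothetical illustration, not the behaviour of any particular grid product: the Node and Job classes, the pick_node function and the 50% load threshold are all assumed for the example.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    load: float         # fraction of CPU in use, 0.0 to 1.0 (reported by the client)
    free_mem_gb: float  # memory the node reports as available

@dataclass
class Job:
    name: str
    min_mem_gb: float

def pick_node(job, nodes, max_load=0.5):
    """Return the least-loaded node that satisfies the policy, or None."""
    eligible = [n for n in nodes
                if n.load <= max_load and n.free_mem_gb >= job.min_mem_gb]
    return min(eligible, key=lambda n: n.load, default=None)

nodes = [Node("desk-01", 0.90, 8), Node("desk-02", 0.10, 2), Node("srv-01", 0.30, 16)]
print(pick_node(Job("chunk-7", 4), nodes).name)   # srv-01
```

Here desk-01 is rejected for being too busy and desk-02 for having too little memory, so the job goes to srv-01.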
Methods of Grid Computing:
1. Drozdowski’s on-line scheduling method:
Our scheduling method is based on the On-Line method presented by Drozdowski
in [1], denoted "OL" hereafter. OL proceeds incrementally, computing the size
αi,j of the chunk to be sent to a worker Ni for each new round j, in order to
try and maintain a constant duration τ for the different rounds and thus avoid
contention at the master.
That is, it allocates comparatively bigger (resp. smaller) chunks to workers with
higher (resp. lower) performance. Hence, this method can take the heterogeneous
nature of computing and communication resources into account, without explicit
knowledge of execution parameters (as equality (1) shows); as Drozdowski states,
"the application itself is a good benchmark" [1] (actually the best one).
Lemma 6.1 in [1] shows that, in a static context, with affine cost models
for communication, the way αi,j is computed using equation (1) ensures the
convergence of σi,j to τ when j increases indefinitely.
Being an estimation of the asymptotic period used for task distribution, τ is
also an upper-bound on the discrepancy between workers. Being able to control
this bound makes it possible to minimize the makespan during the clean-up
phase. round from the master to worker Ni (resp. from Ni to the master).
It should be noted that, unlike previous work [1, 9], this paper introduces computation start-up
times in order to be more realistic when considering grids. As suggested in section 2, the values
of the execution parameters of any worker Ni ensure that sending chunks of any size α to a
worker Ni and receiving the corresponding results costs less than processing these chunks.
The problem with OL is that computation never overlaps communication in any worker node, as
the emission of the chunk of the next round is at best triggered by the return of the result of the
previous one, with no possible anticipation.
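The following Python sketch illustrates how OL-style chunk sizing can converge to the target round duration τ for a single worker. Equation (1) is not reproduced in this report, so the update rule used here (rescale the previous chunk by τ divided by the measured duration of the previous round) is an assumption chosen to be consistent with the description above, and the affine cost parameters are illustrative values only.

```python
def ol_rounds(tau, startup, cost_per_unit, alpha0, rounds):
    """Simulate one worker under an affine cost model:
    sigma = startup + cost_per_unit * alpha.

    ASSUMPTION: the update alpha_j = alpha_{j-1} * tau / sigma_{j-1} stands
    in for equation (1), which this report does not reproduce.
    """
    alpha = alpha0
    history = []
    for _ in range(rounds):
        sigma = startup + cost_per_unit * alpha   # measured duration of this round
        history.append((alpha, sigma))
        alpha = alpha * tau / sigma               # rescale the next chunk
    return history

for j, (alpha, sigma) in enumerate(ol_rounds(10.0, 1.0, 2.0, 1.0, 8), start=1):
    print(f"round {j}: chunk={alpha:.4f} duration={sigma:.4f}")
```

Note that the update uses only the measured duration, never the start-up or per-unit costs themselves, which mirrors the remark that the application itself serves as the benchmark.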
2.The OLMR method:
2.1 Overview of the method
Our method is based on OL, but avoids idle time with respect to computing.
When the total load is large compared to the available bandwidth between master and
workers, the workload should be delivered in multiple rounds
[10, 11, 12]. Therefore we will have each worker receive its share of the load
through multiple rounds, hence the name On-Line Multi-Round method [9], denoted "OLMR"
hereafter. OLMR divides the chunk sent to Ni for each round
j into two subchunks "I" and "II" whose sizes together make up αi,j. Dividing the chunks in
two parts is enough to apply the principle, and
the division allows the computation to overlap the communications, as can be
seen in Figure 1. In order to compute αi,j, we use a value of σi,j−1 derived
from the measurement of the elapsed time (including both communications and
computation) for subchunk I of the previous round. We will show that,
thanks to this anticipation (compared to OL) in the computation of αi,j, we can avoid the
inter-round starvation.
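A toy timing model in Python shows why splitting each round into two subchunks removes the idle time. It is a simplification and not the authors' algorithm: it considers one worker, ignores the time taken to return results, splits every chunk into equal halves, and assumes that sends can be issued back to back. The function name and all parameter values are hypothetical.

```python
def makespan(subchunks, comm_per_unit, comp_per_unit, anticipate):
    """Finish time on one worker for a sequence of subchunk sizes.

    anticipate=False: the next send starts only after the previous
    computation has finished (OL-like behaviour).
    anticipate=True: sends are issued back to back, so a transfer can
    overlap the computation of the preceding subchunk.
    """
    link_free = 0.0   # when the master-to-worker link is next idle
    cpu_free = 0.0    # when the worker finishes its current computation
    for size in subchunks:
        send_start = link_free if anticipate else max(link_free, cpu_free)
        arrival = send_start + comm_per_unit * size
        link_free = arrival
        cpu_free = max(arrival, cpu_free) + comp_per_unit * size
    return cpu_free

rounds = [4.0, 4.0, 4.0]
halves = [s / 2 for s in rounds for _ in range(2)]
ol = makespan(rounds, 1.0, 2.0, anticipate=False)
olmr = makespan(halves, 1.0, 2.0, anticipate=True)
print(f"without overlap: {ol}, with overlap: {olmr}")
```

With these values the worker waits for a transfer in every round in the first case, but only for the very first subchunk in the second.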
Conclusion:
So far we have been describing and walking through overview discussion topics on the Grid Computing discipline that will be discussed further throughout this book, including the Grid Computing evolution, the applications, and the infrastructure requirements for any grid environment.
In addition to this, we have discussed when one should use Grid Computing disciplines, and the factors developers and providers must consider in the implementation phases. With this introduction we can now explore deeper into the various aspects of a Grid Computing system, its evolution across the industries, and the current architectural efforts underway throughout the world.
The proceeding chapters in this book introduce the reader to this new, evolutionary era of Grid Computing, in a concise, hard-hitting, and easy-to-understand manner.
In the past, implementations of Grid Computing achieved robustness, throughput, and
standardisation. Future work should concentrate on making Grids secure, scalable, and
extensible. Finally, a grid in need is a grid indeed.
Future work on Grid Computing:
Grid Computing can be defined as the seamless provision of access to possibly remote,
possibly heterogeneous, possibly untrusting, possibly dynamic computing resources. Analysed
piece by piece, this definition means that Grid Computing provides seamless access to:
1. Possibly Remote Computing Resources
Means that local resources, which are on the same LAN, and remote resources, which are
geographically distant, can be accessed in exactly the same way on the Grid.
2. Possibly Heterogeneous Computing Resources
Some computers on the Grid can run different Operating Systems on different types of
machines. Accessing them via the Grid should be possible without making any special
allowances for this.
3. Possibly Untrusting Computing Resources
Means that the owner of a computing resource on the Grid might not know or trust other
users but should still be confident that they cannot access any non-shared data and cannot make
malicious system calls on their computer. The Grid should handle this security checking without
any specific instruction from the user or from the sharer.
4. Possibly Dynamic Computing Resources
One of the major selling points of Grid Computing is that it makes use of otherwise
wasted CPU cycles. The problem with this is that the availability of computers to the Grid
changes rapidly as computers become busy and then idle as their owner's usage varies. The Grid
system should ensure that this dynamism is hidden from users so that they do not have to
program explicitly to take account of this.
Seamless provision means that Grid users can access such seemingly inaccessible resources
easily without having to worry about all these complications.
Altogether, this definition leads to four main things that any Grid system must provide
seamlessly in order to be considered a Grid:
1. Authentication
2. Authorization
3. Resource Access
4. Resource Discovery
4.1.1 Authentication
Authentication means that each user has an identity which can be trusted as genuine. This is
necessary because some resources may be authorized only to certain users, or certain classes of
users.
Authentication of a user should happen only once when they start using a Grid - they
should not have to sign on separately to each of the many machines that their
computation may use.
4.1.2 Authorization
Authorization means that each resource - be it the spare computing power on a computer of
an organisation or a set of astronomical data - will have a set of users and groups that can access it.
The Grid needs to first authenticate that the users are who they say they are and then ensure
that they are allowed to access the resources that they are requesting. Having groups authorised
to access certain resources leads to the idea of Virtual Organisations.
4.1.3 Resource Access
Resource Access means that remote resources can be accessible to Grid users. These
resources could mean anything from CPU time to disk storage, to visualisation tools and data
sets. As discussed, not everyone should be able to access all resources but the Grid must provide
a way to access those that are allowed. This means that some sort of virtual machine is required
so that machines with different operating systems, etc. can be accessed in a uniform way.
4.1.4 Resource Discovery
Being allowed to access thousands of different CPUs is useless without being able to find
out where they are. Resource Discovery means that users can find remote resources that they can
use. This process should be automated by the Grid so that a user's task can automatically be run
remotely without them having to go through the process of finding CPUs that they can use. The
automation of resource discovery is complicated hugely by the dynamic nature of Grid resources:
what is available at one instant of time may no longer be available a while later. Added to this
complication is the desire to avoid a single central point where all data is stored, because its
failure would bring the whole system down. A single point of control is also not a scalable
solution: if the Grid becomes really large, this central point would be badly overloaded.
3.2 Virtual Organisations
The idea of a Virtual Organisation (VO) is that on, say, a university campus-wide Grid,
members of the Physics and Biology departments could be working on a project together so they
could form a Virtual Organisation for that project where they could all access the data for that
project and each other's computing resources. However, those who are not members of the
research group would not be members of the VO so would not be able to access the resources.
Members of the Computer Science department - who would not be part of that VO - may be
working on a different project, and could form a separate VO with separate access
rights for a different set of resources. Note that different projects within the same departments
could also have separate Virtual Organisations, so as to keep some of their data separate but allow
projects from both VOs to use the compute resources.
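The access rule implied by Virtual Organisations can be sketched in Python as a simple membership check. This is a hypothetical illustration only: it assumes the user has already been authenticated, and the class, function, user and resource names are invented for the example rather than taken from any Grid toolkit.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualOrganisation:
    name: str
    members: set = field(default_factory=set)
    resources: set = field(default_factory=set)

def may_access(user, resource, vos):
    """An (already authenticated) user may use a resource if some VO
    contains both the user and the resource."""
    return any(user in vo.members and resource in vo.resources for vo in vos)

phys_bio = VirtualOrganisation("phys-bio-project", {"alice", "bob"},
                               {"project-data", "physics-cluster", "biology-cluster"})
cs = VirtualOrganisation("cs-project", {"carol"}, {"cs-cluster"})
vos = [phys_bio, cs]

print(may_access("alice", "biology-cluster", vos))  # True
print(may_access("carol", "project-data", vos))     # False
```

Sharing compute resources between two VOs while keeping their data separate then amounts to listing the same cluster, but different data sets, in both resource sets.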
4 Current Grids and Grid Products
There are a number of tools available to help create Computational Grids, both free,
open-source ones and commercial products. There is also a standards body which seeks to put
forward 'recommendations' about how best to do Grid Computing. This section gives an
overview of these, and details about several of the many Grids in existence today.
4.1 Tools and Standards
4.1.1 Globus
The Globus Toolkit, designed by the Globus Alliance, contains a set of free software tools -
services, APIs and protocols - to facilitate the construction of Grids. It is the most widely used
toolkit for building Grids and is frequently referred to as the de facto standard. It
includes tools for, among other things, security, resource management and communication. The
Globus Alliance also researches various issues related to Grid Computing, especially issues
relating to the infrastructure of Grids. Almost every Grid which has its details published was
constructed using the Globus Toolkit.
4.1.2 The Global Grid Forum
The Global Grid Forum (GGF) plays a similar role in the development of Grids to that of the W3C
in the development of the World Wide Web [26]. It is a conglomerate of interested
parties including universities, research institutes and industry. It is not an official body so it does
not put forward standards but just 'best practices' for Grid developers. It is important because
it provides a forum for new ideas to be discussed by all interested parties. There are strong links
between the GGF and The Globus Alliance - ideas put forward by the GGF are often
implemented by Globus.
4.1.3 Condor and Condor-G:
Condor is a software tool for distributing computationally intensive jobs over Grids. It works by
using spare CPU cycles on other computers. It provides a way of doing resource discovery using
'ClassAds', which match job requests to unused resources. From the Condor product, Condor-G
has been created. Condor-G is an enhanced version of Condor which can be used to make Grids.
It uses Globus tools to provide "security, resource discovery, and resource access in
multi-domain environments" with Condor's "management of computation and harnessing of resources
within a single administrative domain." There has also been work on making separate Condor
pools "self-organising, fault-tolerant, scalable, and locality-aware", which has proved to be a
successful way of automatically managing larger groups of Condor pools.
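The matching of job requests to unused resources can be illustrated with a small Python sketch. It is not the ClassAd language and does not use any Condor API: advertisements are modelled as plain dictionaries, requirements as Python predicates, and the greedy first-fit pairing in matchmake is an assumption made to keep the example short.

```python
def matches(job_ad, machine_ad):
    """True when every requirement in the job advertisement holds for the machine."""
    return all(check(machine_ad) for check in job_ad["requirements"])

def matchmake(job_ads, machine_ads):
    """Greedily pair each job with the first idle machine that satisfies it."""
    idle = [m for m in machine_ads if m["state"] == "idle"]
    pairs = {}
    for job in job_ads:
        for machine in idle:
            if matches(job, machine):
                pairs[job["name"]] = machine["name"]
                idle.remove(machine)   # a matched machine is no longer unused
                break
    return pairs

machines = [
    {"name": "lab-a", "state": "idle", "os": "linux", "memory_mb": 2048},
    {"name": "lab-b", "state": "busy", "os": "linux", "memory_mb": 8192},
    {"name": "lab-c", "state": "idle", "os": "linux", "memory_mb": 8192},
]
jobs = [
    {"name": "sim-1", "requirements": [lambda m: m["os"] == "linux",
                                        lambda m: m["memory_mb"] >= 4096]},
    {"name": "sim-2", "requirements": [lambda m: m["os"] == "linux"]},
]
print(matchmake(jobs, machines))  # {'sim-1': 'lab-c', 'sim-2': 'lab-a'}
```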
4.2 Some current Grids in development and deployment
There are many Grids currently in use and in production; in this section we examine several
of them in detail. These are not claimed to give a representative sample of all current Grids, but
only to give insight into a few of them. The huge European Data Grid project and the United States
National Fusion Collaboratory are discussed.
4.2.1 European Data Grid:
The European Data Grid is a European Union funded project which aims to create a huge
Grid system for computation and data-sharing. It is aimed at projects in high energy physics, led
by CERN, biology and medical image processing, and astronomy. It is being developed using
and extending the Globus Toolkit. In building the Grid new tools and systems have been
developed in many areas useful for the extension of Grid Computing. For example, a method of
enabling secure access to databases in Grid environments has been developed [18]. New
techniques for searching for patterns in genomic data using the European Data Grid have also
been developed.
4.2.2 The National Fusion Collaboratory:
The National Fusion Collaboratory project exists to help research magnetic fusion. Magnetic
fusion experiments operate on pulses of plasmas which are produced approximately every 15
minutes. The data generated from each measurement must be analysed within the 15 minutes so
that changes can be made to the set-up in time for the next pulse. This time limit means
that it would be very useful for the researchers to be able to analyse the data quickly so that more
time can be spent reconfiguring the experimental set-up. For this reason, the National Fusion
Collaboratory constructed a Computational Grid. This project was also built using the Globus
Toolkit and the main research focus is on 'advanced reservations of multiple resources' - this
means that resources such as computational cycles can be reserved in advance if it is known that
they will be required sometime in the future.
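The idea of reserving several resources in advance can be sketched in Python as an all-or-nothing booking over time windows. This is a hypothetical illustration and not the National Fusion Collaboratory's implementation: the ReservationBook class, the reserve_all function, the resource names and the use of minutes as the time unit are all assumptions.

```python
class ReservationBook:
    """Advance reservations for one resource, stored as (start, end, owner)."""

    def __init__(self):
        self._slots = []

    def is_free(self, start, end):
        """True if the half-open window [start, end) overlaps no booking."""
        return all(end <= s or e <= start for s, e, _ in self._slots)

    def reserve(self, start, end, owner):
        """Book [start, end) for owner; refuse if the window is taken."""
        if start >= end:
            raise ValueError("start must be before end")
        if not self.is_free(start, end):
            return False
        self._slots.append((start, end, owner))
        return True

def reserve_all(books, start, end, owner):
    """Reserve the same window on every resource, or on none of them."""
    if not all(b.is_free(start, end) for b in books):
        return False
    for b in books:
        b.reserve(start, end, owner)
    return True

cluster, storage = ReservationBook(), ReservationBook()
storage.reserve(18, 25, "other-user")
print(reserve_all([cluster, storage], 15, 20, "pulse-2"))  # False
print(reserve_all([cluster, storage], 30, 35, "pulse-3"))  # True
```

Checking every resource before booking any of them avoids holding a cluster slot that cannot be used because the matching storage slot was unavailable.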
4.3 Commercial Grid Products:
There are several Grid products currently listed on various websites.
They claim to easily enable Grid Computing within organisations but it is hard to tell how much
they actually do because they do not publish refereed papers - most of the information available
about them is probably marketing hype and not verifiable fact. When the NorduGrid was being
constructed in Scandinavia, its developers chose to develop their own Grid system because nothing
existing was suitable [11]. This shows that, at this stage at least, commercial products were not of
a high enough standard for real use.
BIBLIOGRAPHY:
[1] W. Allcock, J. Bresnahan, R. Kettimuthu, M. Link, C. Dumitrescu, I. Raicu, and I. Foster,
“The Globus striped GridFTP framework and server,” in SC'05, ACM Press, 2005.
[2] Y. Gu and R. L. Grossman, "UDT: UDP-based data transfer for high-speed wide area
networks," Comput. Networks 51, 7 (May 2007), 1777–1799. DOI:
http://dx.doi.org/10.1016/j.comnet.2006.11.009
[3] C. Kiddle, P. Rizk, and R. Simmonds, "A GridFTP overlay network service," in Proceedings
of the 7th IEEE/ACM International Conference on Grid Computing,
Barcelona, Spain, 2007.
WEBSITES:
[1] http://www.nlr.net/
[2] http://www.uklight.ac.uk/
[3] http://www.csm.ornl.gov/ultranet/topology.html
[4] http://www.lambdastation.org/
[5] http://www.atlasgrid.bnl.gov/terapaths