What do we mean by the Grid and e-research?
An overview of some key aspects and technologies in 30 minutes
Jennifer M. Schopf
UK National eScience Centre
Argonne National Lab
Talk Outline
  Definition of Grid and eResearch
  Globus Toolkit
    Provider of basic infrastructure
    Focus on data tools
  OMII – Open Middleware Infrastructure Institute
    UK repository and distribution of eResearch tools
What is a Grid?
  Many definitions – many differences, especially between academics and industry
    Both use the buzzword to get funding
  My definition
    Resource sharing
    Coordinated problem solving
    Dynamic, multi-institutional virtual organizations
Resource Sharing
  Resources can be anything:
    Computers
    Storage/repositories
    Sensors and networks
    People and software
  Local control of the resources, and local policies for their use
  Sharing is always conditional
    Issues of trust, policy
    Negotiation and payment
Coordinated Problem Solving
  Beyond client-server
    Client-server defines a small set of well-understood interactions as the only ones that can take place
  Actions in this space can include
    Distributed data analysis
    Computation and visualization of results
    Collaboration
Dynamic, Multi-institutional Virtual Organizations
  Crossing administrative domains
    No one has full control over the resources
    Local policy, not global
    Different local policy on different sites
  Community overlays on classic organizational structures
  Large or small, static or dynamic
What is eScience or eResearch?
  Use of distributed resources, in a coordinated way, across multiple administrative domains to do science or further your research
  “Classic” eScience
    Use compute and data resources at many sites to run large-scale simulations for a physics or biology application
  Today’s use cases
    Replicate data across multiple sites to increase reliability, redundancy and performance
    Use one common interface to access a variety of data resources at multiple sites
    Look at a number of available resources to select the one that best suits the application needs at this time
Why is this hard/different?
  Lack of central control
    Where things run
    When they run
  Shared resources
    Contention, variability
  Off-label use
    Resources or software developed for one purpose (or community) are now being used in a way that wasn’t originally planned for
  Communication
    Different sites imply different sys admins, users, institutional goals, and often “strong personalities”
So why do it?
  Work that needs to be done with a time limit
  Data that can’t fit on one site
  Data owned by multiple sites
  Applications that need to be run bigger, faster, more
What functionality is needed to use a Grid?
  Basics:
    Run a job
    Transfer a file
    Find out what’s going on (service and job monitoring)
    All done securely
  Higher-level
    Replication
    Higher-level data movement
    Workflow-scheduling
Grid2003: An Operational Grid
  28 sites (2100-2800 CPUs) & growing
  10 substantial applications + CS experiments
  Running since October 2003, still up today
  [Figure label: Korea]
  http://www.ivdgl.org/grid2003
Globus Toolkit Was Created To Help Applications
  The Globus Toolkit consists of collections of solutions to problems that frequently come up when trying to build collaborative distributed applications
  Heterogeneity
    Focus on simplifying heterogeneity for application developers
    Working towards more “vertical solutions”
  Standards
    Capitalize on and encourage use of existing standards (IETF, W3C, OASIS, GGF)
    Reference implementations of new/proposed standards in these organizations
Globus is Service-Oriented Infrastructure Technology
  Software for service-oriented infrastructure
    Service-enable new & existing resources
    E.g., GRAM on computer, GridFTP on storage system, custom application service
    Uniform abstractions & mechanisms
  Tools to build applications that exploit service-oriented infrastructure
    Registries, security, data management, …
  Open source & open standards
    Each empowers the other
    E.g., monitoring across different protocols is hard
  Enabler of a rich tool & service ecosystem
Our Goals for Globus Toolkit v4
  Usability, reliability, scalability, …
    Web service components have quality equal or superior to pre-WS components
    Documentation at acceptable quality level
  Consistency with latest standards (WS-*, WSRF, WS-N, etc.) and Apache platform
    WS-I Basic (Security) Profile compliant
  New components, platforms, languages
    And links to larger Globus ecosystem
A Model Architecture for Data Grids
  [Figure: data grid architecture with steps numbered 1 to 4. Recoverable labels:]
    Application
    Attribute Specification
    Metadata Catalog
    Logical Collection and Logical File Name
    Replica Location Service (Replica Loc. Svc)
    Multiple Locations
    Replica Selection
    Performance Information & Predictions (NWS, MDS)
    Selected Replica
    GridFTP Control Channel, GridFTP Data Channel
    Replica Location 1, Replica Location 2, Replica Location 3 (Tape Library, Disk Cache, Disk Array, Disk Cache)
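Reading the figure labels as a lookup-then-select flow (logical file name, then replica locations, then a choice based on performance predictions), the idea can be sketched as below. This is a minimal sketch under stated assumptions, not the Globus API: the catalog contents, the example.org URLs, the bandwidth figures, and the names `REPLICA_CATALOG`, `PREDICTED_MBPS`, `host_of` and `select_replica` are all hypothetical stand-ins for what a replica location service and NWS/MDS would supply.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical replica catalog: logical file name -> physical replica URLs.
REPLICA_CATALOG: Dict[str, List[str]] = {
    "lfn://experiment/run42.dat": [
        "gsiftp://site-a.example.org/tape/run42.dat",
        "gsiftp://site-b.example.org/cache/run42.dat",
        "gsiftp://site-c.example.org/array/run42.dat",
    ],
}

# Hypothetical predicted bandwidth to each host, in MB/s.
# In the architecture above this role is played by NWS and MDS.
PREDICTED_MBPS: Dict[str, float] = {
    "site-a.example.org": 12.0,
    "site-b.example.org": 85.0,
    "site-c.example.org": 40.0,
}


def host_of(url: str) -> str:
    """Return the host part of a replica URL (empty string if absent)."""
    return urlparse(url).hostname or ""


def select_replica(logical_name: str) -> str:
    """Look up all replicas of a logical file and pick the fastest one."""
    replicas = REPLICA_CATALOG.get(logical_name, [])
    if not replicas:
        raise LookupError(f"no replicas registered for {logical_name}")
    # Hosts without a prediction are treated as the slowest choice.
    return max(replicas, key=lambda url: PREDICTED_MBPS.get(host_of(url), 0.0))
```

The application only ever names the logical file; which physical copy is used can change from one run to the next as the predictions change.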
GT4 Data Functions
  Find your data: Replica Location Service
    Managing ~40M files in production settings
  Move/access your data: GridFTP, Reliable File Transfer (RFT)
    High-performance striped data movement
  Couple data & execution management
    GRAM uses GridFTP and RFT for staging
  Access databases through standard Grid interfaces: OGSA-DAI
GridFTP in GT4
  Basic file transfer support, and memory-to-memory copies
  Underlying protocol of access can vary
    Functions as an hourglass, offering one interface to different resources
  Allows partial file transfer support
  Can have parallel streams and striping
    Greatly improves performance over most FTP implementations
    On the TeraGrid network, achieved 27 Gb/s on a 30 Gb/s link (90% utilization) with 32 nodes
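Partial file transfer and parallel streams fit together naturally: a file is cut into byte ranges and each range travels on its own stream. The sketch below illustrates only that idea with in-memory bytes and threads; it does not speak the GridFTP protocol, and `split_ranges`, `parallel_copy` and `utilization` are hypothetical helper names. The last helper simply reproduces the 27 out of 30 arithmetic from the slide.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_ranges(size: int, streams: int) -> List[Tuple[int, int]]:
    """Split [0, size) into at most `streams` contiguous half-open ranges."""
    if size <= 0 or streams <= 0:
        return []
    chunk = -(-size // streams)  # ceiling division
    return [(start, min(start + chunk, size)) for start in range(0, size, chunk)]


def parallel_copy(source: bytes, streams: int = 4) -> bytes:
    """Copy `source` with one worker per byte range (one partial transfer each)."""
    target = bytearray(len(source))

    def copy_range(byte_range: Tuple[int, int]) -> None:
        start, end = byte_range
        # Stands in for one data stream writing its own region of the file.
        target[start:end] = source[start:end]

    with ThreadPoolExecutor(max_workers=max(1, streams)) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(copy_range, split_ranges(len(source), streams)))
    return bytes(target)


def utilization(achieved_gbps: float, link_gbps: float) -> float:
    """Fraction of the link capacity actually used."""
    return achieved_gbps / link_gbps
```

Because every range writes to a disjoint region, the workers need no coordination beyond agreeing on the split.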
Reliable File Transfer: Third Party Transfer
  [Figure: RFT Client and RFT Service, linked by SOAP Messages and Notifications (Optional); two GridFTP Servers, each with a Protocol Interpreter, Master DSI, Slave DSI, IPC Receiver, IPC Link, and Data Channels]
  Fire-and-forget transfer
  Web services interface
  Many files & directories
  Integrated failure recovery
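"Fire-and-forget" with integrated failure recovery implies that something remembers how far a transfer got and resumes from there after a failure. The following is a minimal sketch of that retry-and-resume loop under stated assumptions; it is not the RFT service interface, and `TransferInterrupted`, `reliable_transfer` and `make_flaky_sender` are hypothetical names invented for the illustration.

```python
from typing import Callable, Iterable


class TransferInterrupted(Exception):
    """Raised by a sender when the link drops; records the offset reached."""

    def __init__(self, offset: int) -> None:
        super().__init__(f"transfer interrupted at byte {offset}")
        self.offset = offset


def reliable_transfer(send: Callable[[int], int], total: int,
                      max_retries: int = 5) -> int:
    """Drive `send` until `total` bytes are done; return the retry count.

    `send(offset)` transfers from `offset` and returns the offset reached,
    or raises TransferInterrupted carrying the last good offset.
    """
    offset = 0
    retries = 0
    while offset < total:
        try:
            offset = send(offset)
        except TransferInterrupted as exc:
            retries += 1
            if retries > max_retries:
                raise
            offset = exc.offset  # resume from the checkpoint, not from zero
    return retries


def make_flaky_sender(total: int, fail_at: Iterable[int]) -> Callable[[int], int]:
    """Build a simulated sender that drops once at each offset in `fail_at`."""
    pending = set(fail_at)

    def send(offset: int) -> int:
        for position in range(offset, total):
            if position in pending:
                pending.discard(position)
                raise TransferInterrupted(position)
        return total

    return send
```

A caller submits the transfer once and only hears about it again if the retry budget is exhausted.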
OGSA-DAI
  Data access
    Relational & XML databases, semi-structured files
  Data integration
    Multiple data delivery mechanisms, data translation
  Extensible & efficient framework
    Request documents contain multiple tasks
      A task = execution of an activity
      Group work to enable efficient operation
    Extensible set of activities
      > 30 predefined, framework for writing your own
    Moves computation to data
    Pipelined and streaming evaluation
    Concurrent task evaluation
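The notion of a request that names several tasks, each executing one activity, with results streamed from one to the next, can be sketched with Python generators. This is an illustration of the pipelining idea only: it is not OGSA-DAI's request document format or API, and the activity names (`filter`, `project`, `limit`), the `ACTIVITIES` registry and `run_request` are hypothetical.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


def filter_rows(rows: Iterable[Row], field: str, minimum: Any) -> Iterator[Row]:
    """Pass through rows whose `field` is at least `minimum`."""
    for row in rows:
        if row[field] >= minimum:
            yield row


def project(rows: Iterable[Row], fields: List[str]) -> Iterator[Row]:
    """Keep only the named fields of each row."""
    for row in rows:
        yield {name: row[name] for name in fields}


def limit(rows: Iterable[Row], count: int) -> Iterator[Row]:
    """Stop after `count` rows."""
    yield from islice(rows, count)


# Hypothetical registry of predefined activities; new ones are added here.
ACTIVITIES: Dict[str, Callable[..., Iterator[Row]]] = {
    "filter": filter_rows,
    "project": project,
    "limit": limit,
}


def run_request(request: List[Tuple[str, Dict[str, Any]]],
                rows: Iterable[Row]) -> Iterator[Row]:
    """Chain the tasks of one request into a lazy, streaming pipeline."""
    stream: Iterator[Row] = iter(rows)
    for activity_name, params in request:
        stream = ACTIVITIES[activity_name](stream, **params)
    return stream
```

Nothing is evaluated until the caller iterates over the result, and each row flows through every task before the next row is read, which is what keeps intermediate results from piling up.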
Any questions on Data Management?
The Resource Management Challenge
  Enabling secure, controlled remote access to heterogeneous computational resources and management of remote computation
    Authentication and authorization
    Resource discovery & characterization
    Reservation and allocation
    Computation monitoring and control
  Addressed by a set of protocols & services
    GRAM protocol as a basic building block
    Resource brokering & co-allocation services
    GSI for security, MDS for discovery
Execution Management (GRAM)
  Common WS interface to schedulers
    Unix, Condor, LSF, PBS, SGE, …
  More generally: interface for process execution management
    Lay down execution environment
    Stage data
    Monitor & manage lifecycle
    Kill it, clean up
  A basis for application-driven provisioning
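The lifecycle steps listed above (stage data, run, monitor, kill, clean up) amount to a small state machine. The sketch below shows one way to model it; the state names and allowed transitions are assumptions chosen for illustration and are not GRAM's actual job state model, and `JobState`, `TRANSITIONS` and `Job` are hypothetical names.

```python
from enum import Enum
from typing import Dict, List, Set


class JobState(Enum):
    PENDING = "pending"
    STAGE_IN = "stage_in"
    ACTIVE = "active"
    CLEANUP = "cleanup"
    DONE = "done"
    FAILED = "failed"


# Assumed transition table: every non-terminal state can go to CLEANUP,
# so a job can be killed at any point and still be cleaned up.
TRANSITIONS: Dict[JobState, Set[JobState]] = {
    JobState.PENDING: {JobState.STAGE_IN, JobState.CLEANUP},
    JobState.STAGE_IN: {JobState.ACTIVE, JobState.CLEANUP},
    JobState.ACTIVE: {JobState.CLEANUP},
    JobState.CLEANUP: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    """Tracks one job's state and the path it took, for monitoring."""

    def __init__(self) -> None:
        self.state = JobState.PENDING
        self.history: List[JobState] = [JobState.PENDING]

    def advance(self, new_state: JobState) -> None:
        """Move to `new_state`, rejecting transitions the table forbids."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {new_state.name}")
        self.state = new_state
        self.history.append(new_state)

    def kill(self) -> None:
        """Abort the job: clean up if not already doing so, then mark FAILED."""
        if self.state is not JobState.CLEANUP:
            self.advance(JobState.CLEANUP)
        self.advance(JobState.FAILED)
```

Keeping the transitions in one table makes the monitoring question ("what can happen next?") answerable without reading the job logic.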
Monitoring and Discovery Challenges
  Grid Information Service requirements and characteristics
    Uniform, flexible access to information
    Scalable, efficient access to dynamic data
    Access to multiple information sources
    Decentralized maintenance
    Secure information provision
  Basic monitoring for resource selection and notification of errors
The Globus Ecosystem
  Globus components address core issues relating to resource access, monitoring, discovery, security, data movement, etc.
    GT4 being the latest version
  A larger Globus ecosystem of open source and proprietary software provides complementary components
    A growing list of components
  These components can be combined to produce solutions to Grid problems
    We’re building a list of such solutions
Many Tools Build on, or Can Contribute to, GT4-Based Grids
  Condor-G, DAGMan
  MPICH-G2
  GRMS
  Nimrod-G
  Ninf-G
  Open Grid Computing Env.
  Commodity Grid Toolkit
  GriPhyN Virtual Data System
  Virtual Data Toolkit
  GridXpert Synergy
  Platform Globus Toolkit
  VOMS
  PERMIS
  GT4IDE
  Sun Grid Engine
  PBS scheduler
  LSF scheduler
  GridBus
  TeraGrid CTSS
  NEES
  IBM Grid Toolbox
  …
Open Middleware Infrastructure Institute
  Formed at the University of Southampton (2004)
    Focus on an easy-to-install e-Infrastructure solution
    Utilise existing software & standards
  Expanding with new partners in 2006
    OGSA-DAI team at Edinburgh
    myGrid team at Manchester
  To be a leading provider of reliable, interoperable and open-source Grid middleware components, services and tools to support advanced Grid-enabled solutions in academia and industry.
  Slides compliments of Steven Newhouse
Activity
  By providing a software repository of Grid components and tools from e-science projects
  By re-engineering software, hardening it and providing support for components sourced from the community
  By a managed programme to contract the development of “missing” software components necessary in Grid middleware
  By providing an integrated Grid middleware release of the sourced software components
The Managed Programme: Distribution and Repository
  OGSA-DAI (Data Access service)
  GridSAM (Job Submission & Monitoring service)
  Grimoires (Registry service based on UDDI)
  GeodiseLab (Matlab & Jython environments)
  FINS (Notification services using WS-Eventing)
  BPEL (Workflow service)
  MANGO (Managing workflows with BPEL)
  FIRMS (Reliable messaging)
So…
  eResearch is expanding in scope
  Globus Toolkit provides many basic tools, and is incorporated in many projects, especially those focused on data movement
  In the UK, OMII is another useful source of eInfrastructure software
  2nd Edition: www.mkp.com/grid2
Additional Information
  Contact: Jennifer M. Schopf
    [email protected]
    http://www.mcs.anl.gov/~jms
  Globus Alliance: http://www.globus.org
  Information about OMII:
    http://www.omii.ac.uk
    [email protected]