The Anatomy of the Grid: Enabling Scalable Virtual Organizations
John Dyer, TERENA
Acknowledgement to: Ian Foster, Mathematics and Computer Science Division, Argonne National Laboratory
Grids are “hot” … but what are they really about?
Presentation Agenda
• Problem statement
• Architecture
• Globus Toolkit
• Futures
The Grid Problem
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
Elements of the Problem
• Resource sharing
  • Computers, storage, sensors, networks, …
  • Sharing always conditional: issues of trust, policy, negotiation, payment, … (cost vs. performance)
• Coordinated problem solving
  • Beyond client-server: distributed data analysis, computation, collaboration, …
• Dynamic, multi-institutional virtual orgs
  • Community overlays on classic org structures
  • Large or small, static or dynamic
Computational Astrophysics
• Solved EEs for gravitational waves
• Tightly coupled, communications required
• Must communicate 30MB/step between machines
[Figure: two machines coupled over a wide-area link]
• SDSC IBM SP: 1024 procs, 5x12x17 = 1020
• NCSA Origin Array: 256+128+128 procs, 5x12x(4+2+2) = 480
• Gig-E: 100MB/sec
• OC-12 line (but only 2.5MB/sec)
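The numbers on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that 5x12x17 and 5x12x(4+2+2) are the processor-grid decompositions on the two machines, and that the full 30 MB/step crosses the 2.5 MB/sec wide-area link (the slide does not say how the traffic is split). All variable names are illustrative.

```python
from math import prod

# Processor-grid decompositions quoted on the slide.
sdsc_procs = prod((5, 12, 17))          # SDSC IBM SP
ncsa_procs = prod((5, 12, 4 + 2 + 2))   # NCSA Origin Array
total_procs = sdsc_procs + ncsa_procs

# Time to move one step's worth of data, in seconds.
# Assumption: all 30 MB per step goes over a single link.
MB_PER_STEP = 30
wan_seconds_per_step = MB_PER_STEP / 2.5    # OC-12 as actually achieved
gige_seconds_per_step = MB_PER_STEP / 100   # Gig-E rate quoted on the slide

print(sdsc_procs, ncsa_procs, total_procs)
print(wan_seconds_per_step, gige_seconds_per_step)
```

Under these assumptions the wide-area link, not the processor count, sets the pace of each step.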
Data Grids for High Energy Physics
[Figure: tiered data grid for high energy physics. Image courtesy Harvey Newman, Caltech]
• Facilities, from the detector outward:
  • Online System
  • Offline Processor Farm ~20 TIPS
  • CERN Computer Centre
  • Regional centres: FermiLab ~4 TIPS, France, Italy, Germany
  • Tier2 Centres ~1 TIPS each (e.g., Caltech ~1 TIPS)
  • Institutes ~0.25 TIPS, with a physics data cache
  • Physicist workstations
• Tier labels shown in the figure: Tier 0, Tier 1, Tier 2, Tier 4
• Link rates shown in the figure: ~PBytes/sec, ~100 MBytes/sec, ~622 Mbits/sec (or Air Freight, deprecated), ~1 MBytes/sec
• There is a “bunch crossing” every 25 nsecs
• There are 100 “triggers” per second
• Each triggered event is ~1 MByte in size
• Physicists work on analysis “channels”
• Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server
• 1 TIPS is approximately 25,000 SpecInt95 equivalents
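The trigger figures above imply the ~100 MBytes/sec rate shown in the diagram. A minimal sketch of that arithmetic, using only the numbers on the slide; the per-day total assumes continuous running, which the slide does not state.

```python
# Figures quoted on the slide.
BUNCH_CROSSING_INTERVAL_NS = 25
TRIGGERS_PER_SECOND = 100
EVENT_SIZE_MB = 1

# One crossing every 25 ns gives the crossing rate per second.
crossings_per_second = 1_000_000_000 // BUNCH_CROSSING_INTERVAL_NS

# How many crossings occur for each event the trigger keeps.
crossings_per_trigger = crossings_per_second // TRIGGERS_PER_SECOND

# Recorded data rate, and the daily volume if the detector ran all day
# (continuous running is an assumption, not a claim from the slide).
recorded_mb_per_second = TRIGGERS_PER_SECOND * EVENT_SIZE_MB
recorded_mb_per_day = recorded_mb_per_second * 24 * 60 * 60

print(crossings_per_second, crossings_per_trigger)
print(recorded_mb_per_second, recorded_mb_per_day)
```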
Network for Earthquake Engineering Simulation
• NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
• On-demand access to experiments, data streams, computing, archives, collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
Grid Applications: Mathematicians
• Community=an informal collaboration of mathematicians and computer scientists
• Condor-G delivers 3.46E8 CPU seconds in 7 days (600E3 seconds real-time)
• peak 1009 processors in U.S. and Italy (8 sites)
MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
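The Condor-G figures can be turned into an average processor count. A minimal sketch using only the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted on the slide.
cpu_seconds = 3.46e8        # CPU time delivered
wall_seconds = 600e3        # real time, as rounded on the slide
peak_processors = 1009

# Seven days expressed exactly in seconds, for comparison with 600E3.
seven_days_seconds = 7 * 24 * 60 * 60

# Average number of processors busy over the run.
average_processors = cpu_seconds / wall_seconds

# Fraction of the peak that was sustained on average.
average_to_peak = average_processors / peak_processors

print(round(average_processors, 1), round(average_to_peak, 2))
```

On these numbers the run kept a little over half of its peak processor count busy on average.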
Grid Architecture
Isn’t it just the Next Generation Internet, so why bother!
Why Discuss Architecture?
• Descriptive
  • Provide a common vocabulary for use when describing Grid systems
• Guidance
  • Identify key areas in which services are required - FRAMEWORK
• Prescriptive
  • Define standards
  • But in the existing standards framework
  • GGF working with IETF, Internet2 etc.
What Sorts of Standards?
• Need for interoperability when different groups want to share resources
  • E.g., IP lets me talk to your computer, but how do we establish & maintain sharing?
  • How do I discover, authenticate, authorize, describe what I want to do, etc., etc.?
• Need for shared infrastructure services to avoid repeated development, installation, e.g.
  • One port/service for remote access to computing, not one per tool/application
  • X.509 enables sharing of Certificate Authorities
• MIDDLEWARE !
In Defining Grid Architecture, We Must Address . . .
• Development of Grid protocols & services
  • Protocol-mediated access to remote resources
  • New services: e.g., resource brokering
  • Mostly (extensions to) existing protocols
• Development of Grid APIs & SDKs
  • Facilitate application development by supplying higher-level abstractions
• The model is the Internet and Web
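The Role of Grid Services (Middleware) and Tools
[Figure: tools built on Grid services, which run over the net]
• Tools: Collaboration Tools, Data Mgmt Tools, Distributed simulation, . . .
• Services: Information services, Fault detection, Resource mgmt, . . .
• net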
GRID Architecture Status
• No “official” standards exist
• But:
  • Globus Toolkit has emerged as the de facto standard for several important Connectivity, Resource, and Collective protocols
  • GGF has an architecture working group
  • Technical specifications are being developed for architecture elements: e.g., security, data, resource management, information
  • Internet drafts submitted in security area
Layered Grid Architecture
[Figure: Grid layers shown beside the Internet Protocol Architecture]
• Application: DOES THE SCIENCE /….
• Collective: JOB MANAGEMENT (Directory, Discovery, Monitoring)
• Resource: NEGOTIATION & CONTROL (Sharing resources, controlling)
• Connectivity: COMMS & AUTHENTICATION (Single Sign On, Trust . . .)
• Fabric: ALL PHYSICAL RESOURCES (Net, CPUs, Storage, Sensors)
Internet Protocol Architecture
• Application
• Transport
• Internet
• Link
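One way to keep the layer vocabulary straight is to write the stack down as data. This is a minimal sketch: the layer names and role summaries come from the slide, while the `Layer` class and the `layers_below` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    role: str


# Grid layers listed from top to bottom, with roles as summarized on the slide.
GRID_LAYERS = (
    Layer("Application", "does the science"),
    Layer("Collective", "job management: directory, discovery, monitoring"),
    Layer("Resource", "negotiation & control: sharing resources"),
    Layer("Connectivity", "comms & authentication: single sign-on, trust"),
    Layer("Fabric", "all physical resources: net, CPUs, storage, sensors"),
)


def layers_below(name: str) -> List[str]:
    """Return the names of the layers underneath the named layer."""
    names = [layer.name for layer in GRID_LAYERS]
    return names[names.index(name) + 1:]


print(layers_below("Resource"))
```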
Toolkits & Components
• CONDOR - Harnessing the processing capacity of idle workstations
  www.cs.wisc.edu/condor/
• LEGION - developing an object-oriented framework for grid applications
  www.cs.virginia.edu/~legion
• Globus Toolkit SDK - APIs
  www.globus.org/
Architecture: Fabric Layer
• Just what you would expect: the diverse mix of resources that may be shared
  • Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc.
• Few constraints on low-level technology: connectivity and resource level protocols
• Globus toolkit provides a few selected components (e.g., bandwidth broker)
Architecture: Connectivity
• Communication
  • Internet protocols: IP, DNS, routing, etc.
• Security: Grid Security Infrastructure (GSI)
  • Uniform authentication & authorization mechanisms in multi-institutional setting
  • Single sign-on, delegation, identity mapping
  • Public key technology, SSL, X.509, GSS-API (several Internet drafts document extensions)
  • Supporting infrastructure: Certificate Authorities, key management, etc.
GSI Futures
• Scalability in numbers of users & resources
  • Credential management
  • Online credential repositories
  • Account management
• Authorization
  • Policy languages
  • Community authorization
• Protection against compromised resources
  • Restricted delegation, smartcards
Architecture: Resources
• Resource management: remote allocation, reservation, monitoring, control of [compute] resources - GRAM (access & management)
• Data access: GridFTP
  • High-performance data access & transport
• Information:
  • GRIP (cf. LDAP)
  • GRRP – Registration Protocol
  • Access to structure & state information
• & others emerging: catalog access, code repository access, accounting, …
• All integrated with GSI
GRAM Resource Management Protocol
• Grid Resource Allocation & Management
  • Allocation, monitoring, control of computations
• Simple HTTP-based RPC
  • Job request: returns opaque, transferable “job contact” string for access to job
  • Job cancel, Job status, Job signal
  • Event notification (callbacks) for state changes
• Protocol/server address robustness (exactly once execution), authentication, authorization
• Servers for most schedulers; C and Java APIs
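The request/status/cancel/callback pattern listed above can be illustrated with a toy, in-memory job manager. This is a sketch only: `ToyJobManager`, its method names, and the state names are hypothetical and are not the GRAM protocol or any Globus Toolkit API. It only shows how an opaque job contact string and state-change callbacks fit together.

```python
import uuid
from typing import Callable, Dict, List, Optional

StateCallback = Callable[[str, str], None]


class ToyJobManager:
    """Hypothetical in-memory stand-in for a job service (not real GRAM)."""

    def __init__(self) -> None:
        self._states: Dict[str, str] = {}
        self._executables: Dict[str, str] = {}
        self._callbacks: Dict[str, List[StateCallback]] = {}

    def submit(self, executable: str,
               on_change: Optional[StateCallback] = None) -> str:
        # The contact is opaque to the caller: just a string to hand back later.
        contact = f"job-{uuid.uuid4().hex}"
        self._states[contact] = "PENDING"
        self._executables[contact] = executable
        self._callbacks[contact] = [on_change] if on_change else []
        return contact

    def status(self, contact: str) -> str:
        return self._states[contact]

    def _set_state(self, contact: str, state: str) -> None:
        # Record the new state, then notify every registered callback.
        self._states[contact] = state
        for callback in self._callbacks[contact]:
            callback(contact, state)

    def start(self, contact: str) -> None:
        self._set_state(contact, "ACTIVE")

    def cancel(self, contact: str) -> None:
        self._set_state(contact, "CANCELLED")


events: List[str] = []
manager = ToyJobManager()
contact = manager.submit("/bin/hostname",
                         on_change=lambda c, state: events.append(state))
manager.start(contact)
manager.cancel(contact)
print(manager.status(contact), events)
```

Because the contact is a plain string, it can be passed to another process, which is the point of making it "transferable".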
Data Access & Transfer
• GridFTP: extended version of popular FTP protocol for Grid data access and transfer
• Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.:
  • Third-party data transfers, partial file transfers
  • Parallelism, striping (e.g., on PVFS)
  • Reliable, recoverable data transfers
• Reference implementations
  • Existing clients and servers: wuftpd, nicftp
  • Flexible, extensible libraries
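Partial and parallel transfers both depend on cutting a file into byte ranges that can be moved independently. A minimal sketch of that bookkeeping follows; `split_ranges` is a hypothetical helper, not part of GridFTP or any FTP client, and the even split is an assumption rather than the policy of any particular implementation.

```python
from typing import List, Tuple


def split_ranges(file_size: int, streams: int) -> List[Tuple[int, int]]:
    """Split file_size bytes into up to `streams` (offset, length) ranges.

    Ranges are contiguous, non-overlapping, and differ in length by at
    most one byte. Empty ranges are dropped when the file is tiny.
    """
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams >= 1")
    base, extra = divmod(file_size, streams)
    ranges = []
    offset = 0
    for index in range(streams):
        # The first `extra` ranges absorb the remainder, one byte each.
        length = base + (1 if index < extra else 0)
        if length == 0:
            continue
        ranges.append((offset, length))
        offset += length
    return ranges


print(split_ranges(10, 3))
```

A recoverable transfer could retry only the ranges that failed instead of restarting the whole file.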
Architecture: Collective
• Bringing the underlying resources together to provide the requested services
• Resource brokers (e.g., Condor Matchmaker)
  • Resource discovery and allocation
• Replica management and replica selection
  • Optimize aggregate data access performance
• Co-reservation and co-allocation services
  • End-to-end performance
• Etc., etc.
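Resource brokering comes down to matching a job's requirements against descriptions of available resources. The sketch below is a toy illustration of that idea. It is not the Condor Matchmaker or its ClassAd language; the dictionary fields (`arch`, `free_cpus`), the site names, and the "most free CPUs wins" ranking are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def match(job: Dict, resources: List[Dict]) -> Optional[str]:
    """Return the name of the best matching resource, or None."""
    # Keep only resources that satisfy every requirement of the job.
    candidates = [
        resource for resource in resources
        if resource["arch"] == job["arch"]
        and resource["free_cpus"] >= job["cpus"]
    ]
    if not candidates:
        return None
    # Rank the survivors: here, simply prefer the most free CPUs.
    return max(candidates, key=lambda resource: resource["free_cpus"])["name"]


resources = [
    {"name": "site-a", "arch": "x86", "free_cpus": 16},
    {"name": "site-b", "arch": "x86", "free_cpus": 64},
    {"name": "site-c", "arch": "mips", "free_cpus": 256},
]

print(match({"arch": "x86", "cpus": 32}, resources))
```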
Globus Toolkit Solution
• Registration & enquiry protocols, information models, query languages
  • Provides standard interfaces to sensors
  • Supports different “directory” structures supporting various discovery/access strategies
Karl Czajkowski, Steve Fitzgerald, others
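Registration and enquiry can be pictured as two operations on a directory: resources announce themselves with attributes, and clients query by attribute. The sketch below is a toy, in-memory illustration. `ToyDirectory` and its methods are hypothetical names; they are not the Globus information service or its protocols, and exact-match filtering stands in for a real query language.

```python
from typing import Any, Dict, List


class ToyDirectory:
    """Hypothetical in-memory directory with register and enquire operations."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, **attributes: Any) -> None:
        # Registering again under the same name replaces the old entry.
        self._entries[name] = dict(attributes)

    def enquire(self, **filters: Any) -> List[str]:
        # Return names whose attributes equal every requested filter value.
        return sorted(
            name for name, attributes in self._entries.items()
            if all(attributes.get(key) == value
                   for key, value in filters.items())
        )


directory = ToyDirectory()
directory.register("cluster-1", kind="compute", site="anl")
directory.register("archive-1", kind="storage", site="anl")
directory.register("cluster-2", kind="compute", site="ncsa")

print(directory.enquire(kind="compute"))
print(directory.enquire(kind="compute", site="anl"))
```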
Grid Futures
Major Grid Projects
Name (URL; sponsors): focus
• Access Grid (www.mcs.anl.gov/FL/accessgrid; DOE, NSF): Create & deploy group collaboration systems using commodity technologies
• BlueGrid (IBM): Grid testbed linking IBM laboratories
• DISCOM (www.cs.sandia.gov/discom; DOE Defense Programs): Create operational Grid providing access to resources at three U.S. DOE weapons laboratories
• DOE Science Grid (sciencegrid.org; DOE Office of Science): Create operational Grid providing access to resources & applications at U.S. DOE science laboratories & partner universities
• Earth System Grid (ESG) (earthsystemgrid.org; DOE Office of Science): Delivery and analysis of large climate model datasets for the climate research community
• European Union (EU) DataGrid (eu-datagrid.org; European Union): Create & apply an operational grid for applications in high energy physics, environmental science, bioinformatics
Major Grid Projects (continued)
Name (URL; sponsor): focus
• EuroGrid, Grid Interoperability (GRIP) (eurogrid.org; European Union): Create technologies for remote access to supercomputer resources & simulation codes; in GRIP, integrate with Globus
• Fusion Collaboratory (fusiongrid.org; DOE Off. Science): Create a national computational collaboratory for fusion research
• Globus Project (globus.org; DARPA, DOE, NSF, NASA, Msoft): Research on Grid technologies; development and support of Globus Toolkit; application and deployment
• GridLab (gridlab.org; European Union): Grid technologies and applications
• GridPP (gridpp.ac.uk; U.K. eScience): Create & apply an operational grid within the U.K. for particle physics research
• Grid Research Integration Dev. & Support Center (grids-center.org; NSF): Integration, deployment, support of the NSF Middleware Infrastructure for research & education
Major Grid Projects (continued)
Name (URL; sponsor): focus
• Grid Application Dev. Software (hipersoft.rice.edu/grads; NSF): Research into program development technologies for Grid applications
• Grid Physics Network (griphyn.org; NSF): Technology R&D for data analysis in physics expts: ATLAS, CMS, LIGO, SDSS
• Information Power Grid (ipg.nasa.gov; NASA): Create and apply a production Grid for aerosciences and other NASA missions
• International Virtual Data Grid Laboratory (ivdgl.org; NSF): Create international Data Grid to enable large-scale experimentation on Grid technologies & applications
• Network for Earthquake Eng. Simulation Grid (neesgrid.org; NSF): Create and apply a production Grid for earthquake engineering
• Particle Physics Data Grid (ppdg.net; DOE Science): Create and apply production Grids for data analysis in high energy and nuclear physics experiments
Major Grid Projects (continued)
Name (URL; sponsor): focus
• TeraGrid (teragrid.org; NSF): U.S. science infrastructure linking four major resource sites at 40 Gb/s
• UK Grid Support Center (grid-support.ac.uk; U.K. eScience): Support center for Grid projects within the U.K.
• Unicore (BMBFT): Technologies for remote access to supercomputers
Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS
See also www.gridforum.org
The 13.6 TF TeraGrid: Computing at 40 Gb/s
[Figure: four sites, each with site resources and external network connections]
• NCSA/PACI: 8 TF, 240 TB
• SDSC: 4.1 TF, 225 TB
• Caltech
• Argonne
• Storage systems labelled in the figure: HPSS, UniTree
TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne www.teragrid.org
International Virtual Data Grid Lab
[Figure: map of facilities and links. Legend: Tier0/1 facility, Tier2 facility, Tier3 facility, 10 Gbps link, 2.5 Gbps link, 622 Mbps link, other link]
U.S. PIs: Avery, Foster, Gardner, Newman, Szalay www.ivdgl.org
Problem Evolution
• Past-present: O(10^2) high-end systems; Mb/s networks; centralized (or entirely local) control
  • I-WAY (1995): 17 sites, week-long; 155 Mb/s
  • GUSTO (1998): 80 sites, long-term experiment
  • NASA IPG, NSF NTG: O(10) sites, production
• Present: O(10^4-10^6) data systems, computers; Gb/s networks; scaling, decentralized control
  • Scalable resource discovery; restricted delegation; community policy; GriPhyN Data Grid: 100s of sites, O(10^4) computers; complex policies
• Future: O(10^6-10^9) data, sensors, computers; Tb/s networks; highly flexible policy, control
The Future
• We don’t build or buy “computers” anymore, we borrow or lease required resources
  • When I walk into a room, need to solve a problem, need to communicate
• A “computer” is a dynamically, often collaboratively constructed collection of processors, data sources, sensors, networks
  • Similar observations apply for software
And Thus …
• Reduced barriers to access mean that we do much more computing, and more interesting computing, than today => Many more components (& services); massive parallelism
• All resources are owned by others => Sharing (for fun or profit) is fundamental; trust, policy, negotiation, payment
• All computing is performed on unfamiliar systems => Dynamic behaviors, discovery, adaptivity, failure
The Global Grid Forum
• Merger of (US) GridForum & EuroGRID
• Cooperative Forum of Working Groups
• Open to all who show up
• Meets every four months
• Alternate – US and Europe
  • GGF1 – Amsterdam, NL
  • GGF2 – Washington, US
  • GGF3 – Frascati, IT
http://www.gridforum.org
Global Grid Forum History
[Figure: timeline, 1998 to 2002]
• GF BOF (Orlando)
• GF1 (San Jose, NASA Ames)
• GF2 (Chicago, Northwestern)
• eGrid and GF BOFs (Portland)
• GF3 (San Diego, SDSC)
• eGrid1 (Poznan, PSNC)
• GF4 (Redmond, Microsoft)
• eGrid2 (Munich, Europar)
• GF5 (Boston, Sun)
• Global GF BOF (Dallas)
• Asia-Pacific GF Planning (Yokohama)
• GGF-1 (Amsterdam, WTCW)
• GGF-2 (Washington, DC, DOD-MSRC)
• GGF3 (Rome, INFN), 7-10 October 2001
• GGF4 (Toronto, NRC), 17-20 February 2002
• GGF5 (Edinburgh), 21-24 July 2002, jointly with HPDC (24-26 July)
GGF Areas: Working Groups and Research Groups
• Grid Information Services: Grid Object Specification; Grid Notification Framework; Metacomputing Directory Services; Relational Database Information Services
• Scheduling and Resource Management: Advanced Reservation; Scheduling Dictionary; Scheduler Attributes
• Security: Grid Security Infrastructure; Grid Certificate Policy
• Performance: Performance
• Architectures: JINI; Grid Protocol Architecture
• Data: GridFTP; Replica
• Applications, Programming Models, and User Environments: APPS, GUS, GCE, APM
Summary
• The Grid problem: Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
• Grid architecture: Emphasize protocol and service definition to enable interoperability and resource sharing
• Globus Toolkit a source of protocol and API definitions, reference implementations
• See: globus.org, griphyn.org, gridforum.org