TeraGrid’s Integrated Information Service
“IIS”
Grid Computing Environments 2009
Lee Liming, JP Navarro, Eric Blau, Jason Brechin, Charlie Catlett, Maytal Dahan, Diana Diehl, Rion Dooley, Michael Dwyer, Kate Ericson, Ian Foster, Ed Hanna,
David L. Hart, Chris Jordan, Rob Light, Stuart Martin, John McGee, Laura Pearlman, Jason Reilly, Tom Scavo, Michael Shapiro,
Shava Smallen, Warren Smith, Nancy Wilkins-Diehr
TeraGrid Grid Infrastructure Group (GIG)
University of Chicago, Argonne National Laboratory
November 2009
Outline
• Introduction
– Conceived in 2006; production in 2007; presented at GCE’07
– IIS Vision
• 1st: IIS System Architecture
– Distributed local information services operated by CI providers
– Centralized federation-wide information services
– Registries -> XML document entries
• 2nd: IIS Information Architecture
– Registry architecture and data format
– The Capability Kit meta-registry
– Current information registries
• Leveraging IIS Examples – Providers and Consumers
• Conclusion and Future Work
November 20, 2009 GCE09
Vision
Provide an Authoritative Integrated Information Service enabling:
• Human discovery of cyber-infrastructure
– Science Gateways, Portals, Documentation, CLIs
• Software discovery of cyber-infrastructure
– For automated resource, service, and software selection and access
– For auto-configuration (applications, gateways, workflow engines)
• Providers to advertise their cyber-infrastructure offerings
– Advertise any information about any CI capability
– Providers own data, and independently control publishing
• Streamlined operations
– Change integration and management
– Automated testing, and monitoring
Distributed Architecture Components
[Diagram labels: Service Provider Local Information Service; Federation Wide Integrated Information Service; WS MDS4; Tomcat WebMDS; Apache 2.0 HTTPD; XML Repository; TeraGrid Wide Databases; WS/REST; WS/SOAP; Clients]
High-Availability Architecture
[Diagram labels: info.teragrid.org; Dynamic DNS; Clients; Service Provider Publishing; High-Level Aggregation; …]
Information Architecture
• Registry Architecture
– Named Registries, with schema-compliant Registry Entries, which are each an XML Document
• The Capability Deployment Meta-Registry
• Universal Identifiers
– Site and Resource Identifiers
– Capability Identifier
– Registry entry cross-references
• Extensibility
– Meta-Registry Extensions
– New Registries
– XML
[Diagram: Registry A, Registry B, and Registry C, each holding Entry 1, Entry 2, Entry 3, …]
<Reg1.Entry>
  <id>entry1</id>
  <foo>bar</foo>
</Reg1.Entry>
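As a minimal sketch of the registry model above (named registries whose entries are individual XML documents), the Python below indexes the entries of one registry by their id. The <Registry> wrapper element, its name attribute, and the second entry are assumptions made for illustration; only the Reg1.Entry example comes from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry document: the first entry mirrors the slide's example;
# the <Registry> wrapper and the second entry are assumed for this sketch.
REGISTRY_XML = """
<Registry name="Reg1">
  <Reg1.Entry><id>entry1</id><foo>bar</foo></Reg1.Entry>
  <Reg1.Entry><id>entry2</id><foo>baz</foo></Reg1.Entry>
</Registry>
"""

def index_entries(xml_text):
    """Return {entry id: {child tag: text}} for every entry in a registry."""
    root = ET.fromstring(xml_text)
    entries = {}
    for entry in root:
        fields = {child.tag: (child.text or "") for child in entry}
        entries[fields["id"]] = fields
    return entries

entries = index_entries(REGISTRY_XML)
print(entries["entry1"]["foo"])
```

Keying entries by id is what would make cross-references between registries cheap to resolve in a client.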
TeraGrid Capability Meta-Registry
Each Capability Deployment
• Where (site and resource)
• What (name, class, and description)
• Support information
• Status information
• Software and services component information
• Extensions
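The fields listed above could be modeled roughly as follows. This is a sketch, not the IIS schema: every attribute name, the key format, and the sample values (including the site ID and the capability class) are assumptions chosen to mirror the bullet list.

```python
from dataclasses import dataclass, field

@dataclass
class CapabilityDeployment:
    """Hypothetical record mirroring the meta-registry fields on the slide."""
    site_id: str                 # where: site
    resource_id: str             # where: resource
    name: str                    # what: capability name
    capability_class: str        # what: class
    description: str = ""        # what: description
    support: str = ""            # support information
    status: str = ""             # status information
    components: list = field(default_factory=list)   # software and services
    extensions: dict = field(default_factory=dict)   # extension data

    def key(self):
        """Assumed identifier combining capability name and resource."""
        return f"{self.name}/{self.resource_id}"

deployment = CapabilityDeployment(
    site_id="iu.teragrid.org",               # assumed site ID
    resource_id="bigred.iu.teragrid.org",
    name="remote-compute.teragrid.org-4.0.2",
    capability_class="CTSS",                 # assumed class label
)
print(deployment.key())
```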
Capabilities Kit Registry by Class
• CTSS
– Application Development & Runtime
– TeraGrid Core Integration (local info service)
– Co-scheduling, meta-scheduling
– Common Client
– Computation & Scheduling Clients
– Data Collections
– Data Management
– Data Movement servers, Clients
– Distributed Parallel Application Support
– Distributed Programming Systems
– Local Compute
– Login
– Nimbus/Cloud Computing
– Parallel Application Support
– Remote Computation
– Science Gateway Support
– Visualization Software (VTSS)
– WAN GPFS, WAN Lustre file-systems
– Workflow Support
• Gateways
– Renci Portal
– …
• Local
– Local HPC Software
• Central
– Credential Server (MyProxy)
– Integrated Information Services
– User Portal
Other Registries
• TeraGrid Central Database Registries
– Site/Organization and Resource identifiers (IDs) and descriptions
– Project/Allocation to Resource authorization list
– TeraGrid Science Gateway Catalog
– TeraGrid System Outages
• CTSS Extension Registries
– Batch System Load (%)
– Batch Queue Contents (requires authorization)
– OGF GLUE2
• Gateways Registries
– Science Gateway Web Services Application Registry
• Local RP Registries
– Local HPC Software Catalog
Leveraging IIS Examples
• Resource Description Repository Publishing
• TeraGrid User Portal Batch Load & Queue Data
• TeraGrid User Documentation
• Software Discovery
– CTSS Software
– Local HPC Software
– Science Gateway Software
– Science Gateways Web Services “WS” Application Registry
• Advanced Scheduling Information
• Inca Verification & Validation
• User Profile Service
• Discovery CLI Interface
Resource Description Repository “RDR”
TeraGrid Core Services uses RDR to collect and store validated, current and historical resource description information:
• Common Resource Information
• Compute Resource Information
• Data Collections Information
• Storage Information
[Diagram labels: TeraGrid Core Integration; Local Compute; Data Collections; (Storage)]
TGUP Batch Load & Queue Data
IIS provides queue and batch load information from all RP sites for TGUP to use in its system monitor
<LoadRP xmlns="">
  <ComputeResourceLoad xmlns="">
    <ResourceID>pople.psc.teragrid.org</ResourceID>
    <SiteID>psc.teragrid.org</SiteID>
    <LoadInfo hostname="tg-login1.pople.psc.teragrid.org" timestamp="2009-11-11T13:46:19Z">
      <Load><Type>queue</Type><Value>98</Value></Load>
Remote Computation -> Local Compute
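A consumer such as a portal's system monitor might read the load data along these lines. The slide's LoadRP fragment is truncated, so the sample document below closes the open elements as an assumption; the function name and the choice to return a plain dictionary are also illustrative.

```python
import xml.etree.ElementTree as ET

# Sample based on the slide's fragment; the closing tags are assumed because
# the original excerpt is cut off after the first <Load> element.
SAMPLE = """<LoadRP>
  <ComputeResourceLoad>
    <ResourceID>pople.psc.teragrid.org</ResourceID>
    <SiteID>psc.teragrid.org</SiteID>
    <LoadInfo hostname="tg-login1.pople.psc.teragrid.org"
              timestamp="2009-11-11T13:46:19Z">
      <Load><Type>queue</Type><Value>98</Value></Load>
    </LoadInfo>
  </ComputeResourceLoad>
</LoadRP>"""

def queue_load(xml_text):
    """Return {resource ID: queue load value} for each compute resource."""
    root = ET.fromstring(xml_text)
    loads = {}
    for resource in root.iter("ComputeResourceLoad"):
        resource_id = resource.findtext("ResourceID")
        for load in resource.iter("Load"):
            if load.findtext("Type") == "queue":
                loads[resource_id] = float(load.findtext("Value"))
    return loads

print(queue_load(SAMPLE))
```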
http://portal.teragrid.org/
TeraGrid User Documentation
http://www.teragrid.org/
http://www.teragrid.org/userinfo/software/ctss.php
Software Discovery
TeraGrid context:
• >650 CTSS software package deployments
• >1600 Local HPC software package deployments
• >40 Science Gateways offering software packages
Problems:
• How can users discover what software is available, and how to access it?
• How can Science Gateways or Web Applications discover what software is available through web service interfaces and invoke it?
Software Discovery
Solutions:
• Single IIS interface to multiple software repositories, including 3rd party HPC software and Science Gateway software.
• A custom Gateway web services registry.
Which enables, for example:
• Scientists to discover that Gaussian is available both from the command line and through a full service gateway such as GridChem (www.gridchem.org).
• Science Gateways and Applications to discover and invoke Gaussian web services automatically.
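The idea of a single interface over several software registries can be sketched as one search function that tags each match with its source. The registry names follow the slides, but the entries, the field names ("access", "where"), and the resource names are invented for illustration and do not reflect real IIS data.

```python
# Illustrative data only: registry names follow the slides, but the entries
# and the "access" and "where" fields are made up for this sketch.
REGISTRIES = {
    "CTSS Kit Software": [
        {"name": "example-tool", "access": "command line", "where": "resource-a"},
    ],
    "Local HPC Software": [
        {"name": "Gaussian", "access": "command line", "where": "resource-a"},
    ],
    "Gateway Web Services": [
        {"name": "Gaussian", "access": "web service", "where": "GridChem"},
    ],
}

def find_software(name, registries=REGISTRIES):
    """Search every registry and return matches tagged with their source."""
    wanted = name.lower()
    hits = []
    for registry, entries in registries.items():
        for entry in entries:
            if entry["name"].lower() == wanted:
                hits.append({"registry": registry, **entry})
    return hits

for hit in find_software("gaussian"):
    print(hit["registry"], hit["access"], hit["where"])
```

A user looking for Gaussian would see both the command-line deployment and the gateway-hosted service in one result list, which is the outcome the slide describes.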
Software Discovery Design
[Diagram labels: Kit Registry; CTSS Kit Software; Gateways Kit Software; Local HPC Kit Software; Gateway Web Services Registry; Local HPC Software Registry; WS Enabled Software Discovery; Comprehensive Software Discovery]
Gateway WS Application Registry
• Each Gateway hosts a service (RESTful or otherwise) that publishes local web service metadata.
• Information Services aggregates the GAWSR metadata hosted by all configured Gateways, creating a central registry.
• GAWSR metadata is rich enough to let a user or client dynamically launch jobs via web services.
• The following slides demonstrate two clients using the GAWSR: the first shows dynamic execution of web services written in Java, and the second is an RIA Flex application showing the available metadata.
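The aggregation step described above might look like the following sketch. The slides do not specify the metadata format or transport, so JSON payloads, the field names, and the fetcher callables (which stand in for HTTP requests to each gateway) are all assumptions.

```python
import json

def aggregate(gateway_fetchers):
    """Merge per-gateway service metadata into one central list.

    gateway_fetchers maps a gateway name to a callable that returns that
    gateway's metadata as a JSON array (an assumed format).
    """
    registry = []
    for gateway, fetch in gateway_fetchers.items():
        try:
            services = json.loads(fetch())
        except (OSError, ValueError):
            continue  # skip gateways that are unreachable or return bad data
        for service in services:
            registry.append({"gateway": gateway, **service})
    return registry

# Stand-ins for network calls; names and payloads are hypothetical.
def fetch_ok():
    return '[{"application": "demo-app", "endpoint": "http://gateway.example.org/ws"}]'

def fetch_down():
    raise OSError("gateway unreachable")

def fetch_garbled():
    return "not json"

central = aggregate({"gw-a": fetch_ok, "gw-b": fetch_down, "gw-c": fetch_garbled})
print(central)
```

Skipping a failed gateway rather than aborting keeps the central registry available when one publisher is down, at the cost of temporarily missing that gateway's entries.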
Dynamic execution of web services written in Java
RIA Flex application showing the available metadata
Local HPC Software
Advanced Scheduling Information
• CTSS
– Co-scheduling
– Meta-scheduling
– Computation & Scheduling Clients
– Local Compute
– Remote Computation
– Science Gateway Support
– Workflow Support
• GLUE2 Registry
Inca Verification & Validation
• Running on TeraGrid since 2003
• Verifies IIS published information through automated, user-level testing
• Total of ~2200 tests running on 18 login nodes, 2 grid nodes, and 3 servers
• Email notifications for critical services
• Status views from detailed test information to summary and historical reports
• Data published as XML, HTML, or graphed
• IIS compatible REST interface:
http://inca.teragrid.org/
[Diagram labels: XML; CTSS kit registrations; XSL; info.teragrid.org]
http://info.teragrid.org/web-apps/HTML/kit-reg-v1/remote-compute.teragrid.org-4.0.2/bigred.iu.teragrid.org/
http://inca.teragrid.org/inca/HTML/kit-status-v1/remote-compute.teragrid.org-4.0.2/bigred.iu.teragrid.org/
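The two example URLs share a kit, version, and resource path, which suggests a client could derive a kit's registration view and its Inca status view from the same three values. The sketch below assumes that pattern generalizes; it is inferred only from the pair of URLs shown here, and the function name is hypothetical.

```python
# Base paths copied from the two example URLs on the slide.
INFO_BASE = "http://info.teragrid.org/web-apps/HTML/kit-reg-v1"
INCA_BASE = "http://inca.teragrid.org/inca/HTML/kit-status-v1"

def kit_urls(kit, version, resource):
    """Build the registration and status URLs for one kit deployment.

    The "<kit>-<version>/<resource>/" layout is an assumption inferred
    from the slide's examples.
    """
    path = f"{kit}-{version}/{resource}/"
    return {
        "registration": f"{INFO_BASE}/{path}",
        "status": f"{INCA_BASE}/{path}",
    }

urls = kit_urls("remote-compute.teragrid.org", "4.0.2", "bigred.iu.teragrid.org")
print(urls["registration"])
print(urls["status"])
```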
User Profile Service
• Provide authenticated users with user-centric information
• HTTPS with Basic Authentication
• In html, csv, json, perl, and xml formats
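A client request to a service of this kind could be prepared as sketched below. The slides give no endpoint or parameter names, so the URL, the "format" query parameter, and the credentials are placeholders; only the use of HTTPS with Basic Authentication comes from the slide. The request is built but not sent.

```python
import base64
import urllib.request

def profile_request(url, username, password):
    """Build an HTTPS request carrying a Basic Authentication header."""
    if not url.startswith("https://"):
        raise ValueError("Basic Authentication should only be sent over HTTPS")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request

# Placeholder endpoint and credentials, not a real IIS URL.
request = profile_request(
    "https://info.example.org/profile?format=json", "alice", "secret"
)
print(request.get_header("Authorization"))
```

Passing the returned object to urllib.request.urlopen would send it; refusing plain HTTP matters because Basic Authentication transmits credentials in an easily decoded form.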
Discovery CLI Interface
The tginfo CLI: http://info.teragrid.org/tginfo/
Conclusion
• Federation Wide Standards
– Information Integration Identifiers
– Information Discovery REST APIs
– Standard Capability Naming and Description Schemas
• Federation Wide Information Discovery
– Using a Central Federation Wide Index
– Using a DNS/WWW model
– Central Discovery -> Distributed Information Access
• Enable User Interfaces
– Web 2.0, Science Gateways, and traditional Web servers
– ** IIS does not develop those interfaces
Conclusion & Future Work
• Information Architecture
– Capability Definition Meta-Registry (BioMedical Informatics -- BIRN)
– Capability Implementation Registry
– More Capabilities and Capability Classes: Clouds/IaaS, SaaS, Distributed Programming Environments (SAGA), Data Collections
– Science Gateway Security Configuration Information (SAML)
• System Architecture
– Fully REST based registration services (Apache CXF, Globus CRUX)
– Fully REST based aggregation services
– More REST based discovery interfaces (with XPATH, XSLT support)
– More custom REST services, some providing custom user services
• Separate IIS project
– Packaged, documented, and distributed for other projects
More Information
Web Sites
• http://info.teragrid.org/
• http://www.teragrid.org/gateways/
• http://info.teragrid.org/web-apps/html/index/ (REST APIs)
People
• JP Navarro, Lee Liming (IIS Architecture and Coordination)
• Nancy Wilkins-Diehr (Gateway Information)
• Warren Smith (Execution and Scheduling Information)
• Ed Hanna (Resource Description Information)
• Kate Ericson (Monitoring and Validation Information)
• Rion Dooley (Authenticated User Custom Information)