caGrid Overview and Core Services
caGrid Knowledge Center
February 2011
caGrid
• A Grid software middleware infrastructure consisting of services, toolkits, APIs, and a runtime environment
  • Standards-based, open source
• Building blocks to create interoperable, Grid-enabled systems
• Service-Oriented Architecture
  • Web Services Resource Framework standards
• Model-Driven Architecture
  • Object-oriented view, published information models, strongly typed services
  • Rich metadata
• A production Grid deployment of the core services provided by that infrastructure
  • Security, Data Services Infrastructure, Service Development & Deployment, Metadata, Federated Query, Workflow, Advertisement & Discovery
• Provides the software foundation which underlies the tools and applications of caBIG
Application Scenario
• A clinician/researcher is involved in a multi-institutional clinical trial of a new targeted therapeutic
  • Microarray, proteomic, and image data are collected from patients participating in the trial
• The researcher wants to carry out a correlative analysis to assess the treatment
  • Query and analyze microarray, image, and protein data from multiple patients to find interesting patterns
  • Look for similar patterns in other microarray, protein, and image databases
• Patients may have been seen at multiple institutions
• Datasets may have been collected at different institutions
Application Scenario
[Figure: Locations A, B, and C hold microarray, protein, and image data; locations C and D provide image analysis; further microarray and protein databases sit at other institutions. Callouts note the challenges: different database systems, different data representations, and security; different program invocations, remote access, and how to transfer data.]
caGrid Production Environment
Infrastructure Core Capabilities
• Model-driven and metadata
  • Enabling and supporting interoperable services
  • Providing service-oriented metadata
• Service development and deployment
  • Tooling for bringing applications and data to the Grid
• Advertisement and discovery
  • Publishing services to the Grid
  • Enabling search for services based on service metadata
• Security
  • Integrating existing systems and applications with Grid security
  • Lowering the burden of implementing Grid-wide and local policy
• Facilitating Grid-wide operations
  • Federated query, workflow execution
• Making services and core infrastructure more accessible
  • Graphical installation and configuration, higher-level object-oriented APIs, web portals, graphical administrative applications
Model Driven, Interoperable Services
• Client and service APIs are object oriented and operate over well-defined, curated data types
• Objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)
• Object definitions draw from controlled terminology and vocabulary registered in the Enterprise Vocabulary Services (EVS), so their relationships are semantically described
• The XML serialization of objects adheres to XML schemas registered in the Global Model Exchange (GME)
[Figure: A Grid client uses a client API and a Grid service exposes a service API; the service definition (WSDL) and data type definitions (XSD) are registered in the Global Model Exchange (GME). Object definitions are registered in the Cancer Data Standards Repository and semantically described in Enterprise Vocabulary Services; objects serialize to XML that validates against the registered schemas. These registries are part of the caGrid core services.]
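To make the serialization rule concrete, here is a minimal Python sketch, assuming a hypothetical `Gene` class and a made-up namespace URI. It writes an object out as namespaced XML and then runs a crude structural check that only stands in for validation against a GME-registered XSD (the Python standard library has no XSD validator). This is an illustration of the idea, not how caGrid itself serializes objects.

```python
from dataclasses import dataclass, fields
import xml.etree.ElementTree as ET

# Hypothetical namespace; a real data type would use the namespace of the
# schema registered for it in the GME.
NS = "http://example.org/hypothetical/domain"


@dataclass
class Gene:
    """Hypothetical domain class standing in for a UML-defined, caDSR-registered class."""
    symbol: str
    full_name: str
    chromosome: str


def to_xml(obj) -> str:
    """Serialize a dataclass instance: one namespaced root element named after
    the class, with one child element per attribute, in declaration order."""
    root = ET.Element(f"{{{NS}}}{type(obj).__name__}")
    for f in fields(obj):
        child = ET.SubElement(root, f"{{{NS}}}{f.name}")
        child.text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")


def conforms(xml_text: str, cls) -> bool:
    """Crude stand-in for schema validation: the root element must be named
    after the class and its children must match the attribute names in order."""
    root = ET.fromstring(xml_text)
    expected = [f"{{{NS}}}{f.name}" for f in fields(cls)]
    return root.tag == f"{{{NS}}}{cls.__name__}" and [c.tag for c in root] == expected


xml_text = to_xml(Gene("TP53", "tumor protein p53", "17"))
print(conforms(xml_text, Gene))   # True
print(conforms("<Gene/>", Gene))  # False: no namespace, no children
```

The point of the check is the contract the slide describes: both client and service agree on one registered structure, so either side can reject a document that does not match it.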
Global Model Exchange and Metadata Model Services
• Global Model Exchange
  • Provides support to store and retrieve schemas for types used in Grid services
  • Developers should register the schemas defining types used in Grid services with the GME
• Metadata Model Service (MMS)
  • Provides support for developers to generate and add service metadata
  • Developers can augment standard caGrid service metadata with information from metadata registries, such as the caDSR
  • An external registry provides the means to add, modify, delete, or otherwise manage the UML models and their correspondence to XML schemas, which the MMS leverages
Service Development and Deployment: Introduce
• A framework that enables fast and easy creation of Grid services
  • Provides an easy-to-use graphical service authoring tool
  • Hides all “grid-ness” from the developer
  • Handles all core service architecture requirements for strongly typed and highly interoperable Grid services
• Integration with other core Grid services and architecture components
  • GAARDS Security Infrastructure
  • Globus Index Service
  • Global Model Exchange
  • Metadata Model Service
  • Cancer Data Standards Repository
• Extension framework for integrating with other architecture components
Introduce Features
• Supports modification of operations
  • Adding operations
  • Removing operations
  • Updating operations
  • Importing operations
• Graphical configuration
  • Advertisement
  • Security
  • Service metadata specification
  • Service metadata editing
  • Service configuration properties
• Auto-generates code for the service
• Auto-generates a client API for the service
• Graphical deployment of the service
  • Globus
  • Tomcat
  • JBoss
Advertisement and Discovery: Index Service
[Figure: A Grid service registers to the Index Service and publishes service metadata, which references objects defined in the Cancer Data Standards Repository and uses terminology described in Enterprise Vocabulary Services. The Index Service subscribes to and aggregates this metadata, and clients use the Discovery Client API to query the aggregated service metadata.]
• All services register their service metadata with the Index Service
• Clients can discover services using a discovery API that facilitates inspection of data types
• Leveraging semantic information in EVS (from which service metadata is drawn), services can be discovered by the semantics of their data types
Examples:
  • “Find me all the services from Cancer Center X”
  • “Which Analytical services take Genes as input?”
  • “Find me all the services with some metadata mentioning the string ‘macromolecules’”
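The three example questions map naturally onto filters over aggregated service metadata. The Python sketch below assumes a hypothetical, heavily simplified metadata record (URL, hosting center, description, operation input types) and made-up service URLs; it is not the caGrid discovery API, and it matches type names literally rather than through EVS concept semantics.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceMetadata:
    """Hypothetical, heavily simplified view of one service's advertised metadata."""
    url: str
    hosting_center: str
    description: str
    # operation name -> names of the data types it accepts as input
    operations: dict[str, list[str]] = field(default_factory=dict)


def by_center(services, center):
    """'Find me all the services from Cancer Center X'."""
    return [s for s in services if s.hosting_center == center]


def taking_input(services, type_name):
    """'Which services take <type_name> as input?'"""
    return [s for s in services
            if any(type_name in inputs for inputs in s.operations.values())]


def mentioning(services, text):
    """'Find me all the services with some metadata mentioning <text>'."""
    needle = text.lower()

    def searchable(s):
        type_names = [t for inputs in s.operations.values() for t in inputs]
        return " ".join([s.description, *s.operations, *type_names]).lower()

    return [s for s in services if needle in searchable(s)]


# Hypothetical metadata, as if aggregated by an index service.
SERVICES = [
    ServiceMetadata("https://grid.example.org/GeneAnalysis", "Cancer Center X",
                    "Clusters gene expression profiles", {"analyzeGenes": ["Gene"]}),
    ServiceMetadata("https://grid.example.org/ProteinData", "Cancer Center Y",
                    "Data service exposing macromolecules and protein sequences",
                    {"query": ["Query"]}),
    ServiceMetadata("https://grid.example.org/ImageData", "Cancer Center X",
                    "Image data service", {"query": ["Query"]}),
]

print([s.url for s in by_center(SERVICES, "Cancer Center X")])
print([s.url for s in taking_input(SERVICES, "Gene")])
print([s.url for s in mentioning(SERVICES, "macromolecules")])
```

Semantic discovery would replace the literal string comparisons with comparisons of the EVS concepts attached to each data type, so that differently named types with the same meaning still match.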
Service Metadata: Data Service
• Data Service Metadata
• Describes the Domain Model being exposed, in terms of a UML model linked to semantics
• Data types defined in terms of structure and semantics extracted from caDSR and EVS
• Auto-generated by caGrid service authoring toolkit (Introduce)
Security Services
• Authentication
  • How to identify a client (or a service)
  • Secure login
  • Integrate the Grid with existing institutional login systems!
• Enforce data sharing policies and access control
  • Local policies
  • Federated access
• Trust fabric
  • How to trust a client, and to what level
  • Dynamically adapt trust if there is a security breach
caGrid Security Infrastructure (GAARDS)
Provides services and tools for the administration and enforcement of security policy in an enterprise Grid.
• Dorian
  • Allows accounts managed in external domains to be federated and managed in the Grid
  • Allows users to use their existing credentials (external to the Grid) to authenticate to the Grid
• Grid Grouper/CSM
  • Provides a group-based authorization solution for the Grid
• Grid Trust Service
  • Supports applications and services in deciding whether or not signers of digital credentials can be trusted
  • Supports the provisioning of trusted certificate authorities and corresponding certificate revocation lists
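The trust decision the Grid Trust Service supports can be pictured as a lookup against a list of trusted certificate authorities and their revocation lists. The Python sketch below is a hypothetical model with made-up CA names and serial numbers; it is not the GTS API, and a real check would also verify the certificate's signature chain, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class TrustFabric:
    """Hypothetical model of a trust fabric: the set of trusted certificate
    authorities, each with its certificate revocation list (CRL)."""
    # CA name -> serial numbers revoked by that CA
    crls: dict[str, set[int]] = field(default_factory=dict)

    def add_ca(self, ca_name: str) -> None:
        self.crls.setdefault(ca_name, set())

    def remove_ca(self, ca_name: str) -> None:
        # e.g. after a CA is compromised: everything it signed stops being trusted
        self.crls.pop(ca_name, None)

    def revoke(self, ca_name: str, serial: int) -> None:
        self.crls[ca_name].add(serial)

    def is_trusted(self, issuer: str, serial: int) -> bool:
        """Trust a credential only if its issuer is a trusted CA and the CA has
        not revoked it (signature-chain verification is not modeled)."""
        crl = self.crls.get(issuer)
        return crl is not None and serial not in crl


fabric = TrustFabric()
fabric.add_ca("CN=Institution Y CA")
print(fabric.is_trusted("CN=Institution Y CA", 1001))  # True
fabric.revoke("CN=Institution Y CA", 1001)
print(fabric.is_trusted("CN=Institution Y CA", 1001))  # False
print(fabric.is_trusted("CN=Unknown CA", 42))          # False
```

Keeping the trusted-CA list and CRLs in one shared place is what lets trust be adjusted Grid-wide: revoking a credential or dropping a CA changes every relying party's answer at once.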
Secure Clinical Research Support with GAARDS
• Use Dorian for Grid authentication
  • Integrate with my LDAP user database and authentication
• Use Grid Grouper (along with local mechanisms) for Grid authorization
  • I let reviewers from institution X access patient data in the “Watson” research trial for review only
  • Data entry personnel for the research trial have permission to add new data, but not to update existing data
  • I bar institution X from accessing any other data I’m sharing on the Grid
• Use GTS to update the Grid trust fabric
  • I trust institution Y after finalizing data sharing agreements for the Watson research
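The authorization rules in this scenario reduce to group membership plus a default-deny policy table. The Python sketch below assumes hypothetical Grid identities, group names, resource names, and action names; it illustrates group-based authorization in general and is not the Grid Grouper or CSM API.

```python
# Hypothetical Grid identities and group names, for illustration only.
ALICE = "/O=Institution X/CN=Alice Reviewer"
BOB = "/O=My Center/CN=Bob DataEntry"

# Group name -> member Grid identities (the role a group service plays)
GROUPS = {
    "InstitutionX:Reviewers": {ALICE},
    "Watson:DataEntry": {BOB},
}

# (group, resource) -> actions that group may perform on that resource
POLICY = {
    ("InstitutionX:Reviewers", "Watson:patient-data"): {"read"},
    ("Watson:DataEntry", "Watson:patient-data"): {"create"},
}


def groups_of(identity: str) -> set[str]:
    return {g for g, members in GROUPS.items() if identity in members}


def is_authorized(identity: str, resource: str, action: str) -> bool:
    """Default deny: allow only if one of the caller's groups is granted the
    action on the resource. Anything not listed, including other data shared
    on the Grid, is refused."""
    return any(action in POLICY.get((g, resource), set())
               for g in groups_of(identity))


print(is_authorized(ALICE, "Watson:patient-data", "read"))    # True
print(is_authorized(ALICE, "Watson:patient-data", "update"))  # False
print(is_authorized(ALICE, "Other:study-data", "read"))       # False
print(is_authorized(BOB, "Watson:patient-data", "create"))    # True
print(is_authorized(BOB, "Watson:patient-data", "update"))    # False
```

Because the policy is default deny, barring institution X from everything else needs no explicit rule; only the review grant on the Watson data is listed.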
caGrid Data Service Infrastructure
• caGrid Data Services provide the capability to expose data resources to the Grid
  • Specialization of caGrid Grid services to expose data through a common query interface
  • Introduce extensions to create data services from information models and with the caCORE SDK
  • Queries are made with caBIG Query Language (CQL) query objects
    • A query specifies a target object (result) type and selects the instances that satisfy the specified properties and nested object properties
    • Ability to return full objects, a set of attributes, a count of results, or distinct attribute values
• Support for bulk data transport for efficient transfer of large data volumes
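The query model described above (a target type, attribute criteria, criteria on associated objects, and four result forms) can be sketched in memory. The Python below uses hypothetical `Gene` and `Chromosome` classes and a dict-based criteria format of its own; real CQL queries are XML documents, and none of these names come from caGrid.

```python
from dataclasses import dataclass


@dataclass
class Chromosome:
    number: str


@dataclass
class Gene:
    symbol: str
    organism: str
    chromosome: Chromosome  # an association to another object


def matches(obj, criteria: dict) -> bool:
    """criteria maps attribute names to a required value, or to a nested
    criteria dict that the associated object must satisfy."""
    for name, expected in criteria.items():
        value = getattr(obj, name)
        if isinstance(expected, dict):
            if not matches(value, expected):
                return False
        elif value != expected:
            return False
    return True


def run_query(instances, target, criteria, result="objects", attributes=()):
    """Select instances of `target` that satisfy `criteria`, then shape the
    result as full objects, attribute tuples, a count, or distinct values."""
    hits = [o for o in instances if isinstance(o, target) and matches(o, criteria)]
    if result == "objects":
        return hits
    if result == "attributes":
        return [tuple(getattr(o, a) for a in attributes) for o in hits]
    if result == "count":
        return len(hits)
    if result == "distinct":
        (attr,) = attributes  # exactly one attribute
        return list(dict.fromkeys(getattr(o, attr) for o in hits))
    raise ValueError(f"unknown result form: {result!r}")


GENES = [
    Gene("TP53", "human", Chromosome("17")),
    Gene("BRCA1", "human", Chromosome("17")),
    Gene("BRCA2", "human", Chromosome("13")),
    Gene("Trp53", "mouse", Chromosome("11")),
]
HUMAN_CHR17 = {"organism": "human", "chromosome": {"number": "17"}}

print(run_query(GENES, Gene, HUMAN_CHR17, "count"))                    # 2
print(run_query(GENES, Gene, HUMAN_CHR17, "attributes", ("symbol",)))  # [('TP53',), ('BRCA1',)]
print(run_query(GENES, Gene, {}, "distinct", ("organism",)))           # ['human', 'mouse']
```

Returning counts, attributes, or distinct values instead of full objects keeps responses small, which matters when the data service sits across a wide-area network.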
Federated Query Processor Service
• Provides a mechanism to perform basic distributed aggregations and joins of queries over multiple data services
• Can be used to express queries against any combination of caGrid data services, since each service uses CQL
• Federated queries are expressed using DCQL, an extension to CQL
  • Express joins, aggregations, and target data services
• The client API provides a means of expressing DCQL queries
• The Federated Query Processor service partitions a DCQL query into queries to the respective data services, carries out joins and aggregations, and compiles the results
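The partition, aggregate, and join steps can be sketched with data services modeled as plain functions. The Python below assumes hypothetical service names, row dicts, and a simple semi-join on a shared key; it shows the shape of federated query processing, not the Federated Query Processor's implementation or DCQL syntax.

```python
def make_service(rows):
    """Hypothetical data service: answers a query (a dict of required
    attribute values) with the matching rows."""
    def query(q):
        return [r for r in rows if all(r.get(k) == v for k, v in q.items())]
    return query


def aggregate(partials):
    """Aggregation: concatenate the partial results from each service."""
    return [row for rows in partials for row in rows]


def semi_join(left, right, left_key, right_key):
    """Join: keep left rows whose left_key value occurs as right_key in right."""
    keys = {r[right_key] for r in right}
    return [row for row in left if row[left_key] in keys]


def federated_query(services, targets, query, join=None):
    # Partition: send the same sub-query to every target data service.
    rows = aggregate(services[name](query) for name in targets)
    if join is not None:
        # join = (other targets, other sub-query, left key, right key)
        other_targets, other_query, left_key, right_key = join
        right = aggregate(services[name](other_query) for name in other_targets)
        rows = semi_join(rows, right, left_key, right_key)
    return rows


SERVICES = {
    "LocationA-microarray": make_service([{"patient": "P1", "gene": "TP53"},
                                          {"patient": "P3", "gene": "EGFR"}]),
    "LocationB-microarray": make_service([{"patient": "P2", "gene": "TP53"}]),
    "LocationC-images": make_service([{"patient": "P2", "modality": "MRI"}]),
}
MICROARRAY = ["LocationA-microarray", "LocationB-microarray"]

print(federated_query(SERVICES, MICROARRAY, {"gene": "TP53"}))
print(federated_query(SERVICES, MICROARRAY, {"gene": "TP53"},
                      join=(["LocationC-images"], {"modality": "MRI"}, "patient", "patient")))
```

The first call aggregates TP53 rows from two sites; the second keeps only patients who also have imaging at a third site, the kind of cross-institution question in the application scenario.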
Workflow Service
• Provides the capability to describe “orchestrations” of service invocations and data movement
• Supports two workflow execution engines
  • ActiveBPEL (deprecated in caGrid 1.4)
  • Taverna
• Coupled with semantic discovery, service metadata, and registration of data type structures in caGrid, provides a powerful framework for analyzing data
  • Services can be dynamically discovered, and federated queries invoked, as part of a workflow
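An orchestration can be thought of as a dependency graph of steps, each consuming the outputs of the steps before it. The Python sketch below runs such a graph with the standard library's `graphlib`; the step names and lambdas are hypothetical stand-ins for service invocations, and this is a generic DAG runner, not how ActiveBPEL or Taverna execute workflows.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, deps):
    """Run every step after the steps it depends on. Each step receives a dict
    of its dependencies' outputs and returns its own output."""
    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {d: outputs[d] for d in deps.get(name, ())}
        outputs[name] = steps[name](inputs)
    return outputs


# Hypothetical steps standing in for service invocations.
steps = {
    "discover": lambda _: ["LocationA", "LocationB"],
    "expression": lambda i: {site: f"{site}:expression" for site in i["discover"]},
    "images": lambda i: {site: f"{site}:images" for site in i["discover"]},
    "correlate": lambda i: sorted((i["expression"][s], i["images"][s])
                                  for s in i["expression"]),
}
# step -> steps whose output it consumes
deps = {
    "discover": set(),
    "expression": {"discover"},
    "images": {"discover"},
    "correlate": {"expression", "images"},
}

print(run_workflow(steps, deps)["correlate"])
```

Here the first step plays the role of dynamic discovery, the two middle steps could each be federated queries, and the last step is the correlative analysis; a workflow engine adds distribution, data movement, and fault handling on top of this ordering.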
Putting It Together for Example Scenario
[Figure: The sites from the application scenario (microarray, protein, and image data at locations A, B, and C; image analysis at locations C and D; microarray and protein databases at other institutions) now sit behind caGrid service interfaces in the caGrid environment, supported by registered object definitions and advertisement. The researcher logs on to obtain Grid credentials, uses discovery to locate services, and runs a query and analysis workflow.]