caBIG and caGrid: Interoperable Computing Infrastructure for the Nation’s [and World’s] Cancer Research Enterprise

Peter A. Covitz, Ph.D., Chief Operating Officer, National Cancer Institute Center for Bioinformatics

Post on 10-Feb-2016


Page 1: Peter A. Covitz, Ph.D. Chief Operating Officer National Cancer Institute

1

caBIG and caGrid: Interoperable Computing Infrastructure for the Nation’s [and World’s] Cancer Research Enterprise

Peter A. Covitz, Ph.D.

Chief Operating Officer

National Cancer Institute Center for Bioinformatics

Page 2

The Center for Bioinformatics is the NCI’s strategic and tactical arm for research information management

We collaborate with both intramural and extramural groups

Mission to integrate and harmonize disparate biomedical research data

Production, service-oriented organization. Evaluated based upon customer and partner satisfaction.

Page 3

The Problem

1,372,910 new cancer cases and 570,280 deaths due to cancer expected in the U.S. in 2005

Jemal et al., CA Cancer J Clin 2005; 55:10-30

Page 4

A National Response

Enable investigators and research teams nationwide to combine and leverage their findings and expertise.

Create a scalable, actively managed organization that will connect members of the NCI-supported cancer enterprise by building a biomedical informatics network.

The Cancer Biomedical Informatics Grid™ (caBIG™)

Page 5

Scenario from caBIG Strategic Plan

A researcher involved in a phase II clinical trial of a new targeted therapeutic for brain tumors observes that cancers derived from one specific tissue progenitor appear to be strongly affected.

The trial has been generating proteomic and microarray data. The researcher would like to identify potential biochemical and signaling pathways that might be different between this cell type and other potential progenitors in cancer, deduce whether anything similar has been observed in other clinical trials involving agents known to affect these specific pathways, and identify any studies in model organisms involving tissues with similar pathway activity.

Page 6

Interoperability

The ability of a system to access and use the parts or equipment of another system

Syntactic interoperability

Semantic interoperability

Page 7

caBIG Compatibility Guidelines

[Figure: caBIG compatibility guidelines; one part is labeled SYNTACTIC and three parts are labeled SEMANTIC]

Page 8

caCORE

Model Driven Architecture + Computable Semantics =

Platform for Syntactic and Semantic Interoperability

Page 9

Page 10

caCORE

Bioinformatics Objects

Enterprise Vocabulary

Common Data Elements

SECURITY

Page 11

Bioinformatics Objects

Page 12

What do all those UML data Classes and Attributes actually mean, anyway?

UML model components are mapped to semantic concepts drawn from Enterprise Vocabulary sources, then registered in the Cancer Data Standards Repository (caDSR).

caDSR is a metadata registry that implements the ISO/IEC 11179 standard for Common Data Elements (CDEs).

Common Data Elements
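The mapping described on this slide can be pictured with a small data-structure sketch. It is only an illustration under stated assumptions, not the caDSR data model or API: the class names, the `bind_attribute` helper, and the concept codes are all hypothetical. The one borrowed idea is the ISO/IEC 11179 pattern of describing a data element as an object class plus a property plus a value domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A vocabulary concept. The codes used with this class are invented."""
    code: str
    preferred_name: str
    definition: str


@dataclass(frozen=True)
class CommonDataElement:
    """Simplified ISO/IEC 11179-style element: object class + property + value domain."""
    object_class: Concept
    property_: Concept
    value_domain: str

    @property
    def long_name(self) -> str:
        # Human-readable name built from the two bound concepts.
        return f"{self.object_class.preferred_name} {self.property_.preferred_name}"


def bind_attribute(uml_class, uml_attribute, datatype, vocabulary):
    """Map a UML class/attribute pair to concepts and build a data element.

    `vocabulary` is a plain dict from model component name to Concept,
    standing in for a real terminology lookup.
    """
    try:
        return CommonDataElement(vocabulary[uml_class], vocabulary[uml_attribute], datatype)
    except KeyError as missing:
        raise LookupError(f"no concept bound to model component {missing}") from None


# Hypothetical vocabulary entries for a Gene class with a symbol attribute.
vocabulary = {
    "Gene": Concept("X0001", "Gene", "A unit of heredity."),
    "symbol": Concept("X0002", "Symbol", "A short label identifying an entity."),
}
registry = {}
element = bind_attribute("Gene", "symbol", "string", vocabulary)
registry[element.long_name] = element  # "registration" is just a dict insert here
```

The point of the design is that a model attribute without a bound concept fails loudly at registration time, so every registered element carries a definition that other systems can look up.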

Page 13

Preferred Name

Synonyms

Definition

Relationships

Concept Code

Enterprise Vocabulary Description Logic

Page 14

caCORE Software Development Kit

Page 15

caCORE SDK Components

– UML Modeling Tool (any with XMI export)

– Semantic Connector (concept binding utility)

– UML Loader (model registration in caDSR)

– Codegen (middleware code generator)

– Security Adaptor (Common Security Module)

caCORE SDK Generates a caBIG Silver-Compliant System

Page 16

caCORE Architecture

[Figure: layered architecture diagram. Clients: Java Applications, HTTP Clients, SOAP Clients, Perl Clients. Interfaces: Java, SOAP, XML. Middleware: Web Application Server, APIs, Domain Objects (Gene, Disease, Agent, etc.), Data Access Objects, Authorization. Data: Biomedical Data, Enterprise Vocabulary, Common Data Elements]

Page 17

From Silver to Gold:

caGrid

Page 18

Use cases not satisfied by caCORE alone

Advertisement

– Service provider composes service metadata describing the service and publishes it to the grid.

Discovery

– Researcher (or application developer) specifies search criteria describing a service of interest.

– The researcher submits the discovery request to a discovery service, which identifies a list of services matching the criteria and returns the list.

Invocation

– Researcher (or application developer) instantiates the grid service and accesses its resources.
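The three use cases above can be sketched as an in-memory registry. This is a toy model under loud assumptions, not the caGrid API: `ServiceAdvertisement`, `DiscoveryService`, `invoke`, and every field name are hypothetical, and a plain Python callable stands in for a remote grid service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ServiceAdvertisement:
    """Metadata a provider publishes about a service (all fields hypothetical)."""
    name: str
    kind: str                       # free-text category used as a search criterion
    keywords: frozenset             # terms describing what the service offers
    endpoint: Callable[[str], str]  # stands in for a remote service call


class DiscoveryService:
    """Holds advertisements and answers discovery requests."""

    def __init__(self):
        self._advertisements = []

    def advertise(self, advertisement):
        # Advertisement: the provider publishes its metadata.
        self._advertisements.append(advertisement)

    def discover(self, kind=None, keyword=None):
        # Discovery: return every advertisement that matches all given criteria.
        return [
            ad for ad in self._advertisements
            if (kind is None or ad.kind == kind)
            and (keyword is None or keyword in ad.keywords)
        ]


def invoke(advertisement, request):
    # Invocation: the client calls the service it discovered.
    return advertisement.endpoint(request)


registry = DiscoveryService()
registry.advertise(ServiceAdvertisement(
    "gene-lookup", "data", frozenset({"gene"}), lambda query: f"records for {query}"))
registry.advertise(ServiceAdvertisement(
    "cluster", "analytical", frozenset({"microarray"}), lambda query: f"clusters for {query}"))

matches = registry.discover(keyword="gene")
result = invoke(matches[0], "TP53")
```

The client never needs to know a service's location in advance: it states criteria, receives matching advertisements, and invokes one of them.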

Page 19

[Figure: a Gold grid connecting Silver systems. Node labels: NCI, Cancer Center (repeated), Other caBIG Service Providers, Other Toolkits]

Page 20

caGrid 1.0 Architecture

[Figure: architecture diagram. Layer labels: Transport, Grid Communication Protocol, Service Description, Service, Business Process, Service Registry, Security, Semantic Service, Resource Management, Functions, Quality of Service, ID Resolution. Component labels: GSI, DORIAN, GT4, GLOBUS Toolkit, Workflow, caDSR, EVS, Portal, GME, GTS, Index, Introduce, FQE, Grid ID]

Page 21

Data Object Semantics, Metadata, and Schemas

Object oriented, APIs, well-defined data types

Classes defined in UML and converted into ISO/IEC 11179, registered in the caDSR

Definitions drawn from Enterprise Vocabulary Services (EVS), relationships semantically described

XML serialization of objects adheres to XML schemas registered in the Global Model Exchange (GME)

[Figure: Grid Service and Grid Client diagram. The Grid Service exposes a Service API with a Service Definition (WSDL) and Data Type Definitions (XSD); the Grid Client uses a Client API. Objects serialize to XML that validates against schemas registered in the Global Model Exchange (GME). Object definitions are registered in the Cancer Data Standards Repository and semantically described in Enterprise Vocabulary Services. Core Services are shown alongside]
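The serialization rule on this slide can be illustrated with a round trip through XML. This is a minimal sketch under stated assumptions: the `Gene` class, the namespace, and the field list are hypothetical, and because the Python standard library has no XSD validator, a fixed list of expected child elements stands in for a schema that would be registered in the GME.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict

NAMESPACE = "urn:example:gene-model"   # hypothetical namespace, not a registered one
EXPECTED_FIELDS = ("symbol", "taxon")  # stand-in for a schema's element sequence


@dataclass
class Gene:
    """Hypothetical domain object with two string attributes."""
    symbol: str
    taxon: str


def to_xml(gene):
    """Serialize a Gene to namespaced XML, one child element per attribute."""
    root = ET.Element(f"{{{NAMESPACE}}}Gene")
    for field_name, value in asdict(gene).items():
        ET.SubElement(root, f"{{{NAMESPACE}}}{field_name}").text = value
    return ET.tostring(root, encoding="unicode")


def conforms(xml_text):
    """Check the root tag and the child element sequence against the stand-in schema."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{NAMESPACE}}}Gene":
        return False
    child_tags = [child.tag for child in root]
    return child_tags == [f"{{{NAMESPACE}}}{name}" for name in EXPECTED_FIELDS]


document = to_xml(Gene("TP53", "Homo sapiens"))
is_valid = conforms(document)
```

A real deployment would validate against the registered XSD with a schema-aware parser; the structural check here only shows where that validation step sits between sender and receiver.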

Page 22

Service Data Elements

Two types of top-level grid services defined

– Data Services

– Analytical Services

Service Data Elements (SDEs) describe services so clients can discover what they do

Page 23

Integrating with other Grids

caGrid is intentionally focused on federated data and analytic service interoperability, not computing power

Adoption of standard grid tooling is intended to facilitate integration with other grids that focus on compute power

Seeking partnership with established compute grids to install caGrid Analytical Service nodes that would be transparently available to caGrid users

Page 24

Acknowledgements

caCORE

– Denise Warzel

– George Komatsoulis

– Avinash Shanbhag

– Frank Hartel

– Dianne Reeves

– Sherri De Coronado

– Gilberto Fragoso

– SAIC

– Terrapin Systems

– Oracle

– Ekagra

– ScenPro

– Apelon

– MSD

caGrid

– Avinash Shanbhag, NCI

– Joel Saltz and colleagues, Ohio State U.

– Ian Foster and colleagues, U. Chicago/Argonne

– Booz Allen Hamilton

– SAIC

– SemanticBits

Page 25

Links

caBIG: https://cabig.nci.nih.gov

caGrid: https://cabig.nci.nih.gov/News_Folder/caGrid_1.0_Beta_Release

caCORE: http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview