ICR/TBPT Semantic Infrastructure Requirements Update Face-to-Face Meeting May 4-6, 2010 Denise Warzel Associate Director SI Operations Team, CBIIT

ICR/TBPT Semantic Infrastructure Requirements Update

Face-to-Face Meeting, May 4-6, 2010

Denise Warzel

Associate Director

SI Operations Team, CBIIT

2

Outline

• What is the purpose of the next generation Semantic Infrastructure?

• SI Requirements Elicitation Update
• Summary

3

Goal of Semantic Infrastructure Requirements Elicitation

• Simplify use of existing capabilities
• Identify the gaps in capabilities
• Build new capabilities to address the gaps

4

SI Requirements Update

• SI Requirements Elicitation
• Current VCDE Small Group
• Elicit requirements from the community
  • Elicit: derive by reason, as in “elicit a solution” (wordnetweb.princeton.edu/perl/webwn)
  • Involves analysis
• Organize requirements
  • Use Case “leveling”
  • Mapping to NCI Service Categories (Business Process, Business Capability, Core, Infrastructure/Utility)
  • Mapping to NCI Enterprise Services (Periodic Table)
• Hand off business requirements, and the high priority use cases that address the gaps, to the architects

5

Requirements “Leveling”: Summary and Sea Level User Stories

SI Master List: 114+ requirements, representing a broad spectrum of “requirements”, organized into software requirements

Summary
• Cloud – Very high level, involve multiple user goals “Operate a Biospecimen Repository”
• Kite – High level, a process that takes place over several hours, days or weeks involving many steps “Find Usable Samples”

User Goals
• Sea Level – User Goal, something the actor is trying to get done – “one person, one sitting”

Subfunctions
• Underwater – needed to accomplish user goals, typically can be used and reused – “Save as a File”
• Clam – not usually written out in detail as a use case, “insert record into database”

Source: Cockburn, Agile Software Development Series
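The leveling scheme above can be sketched as an ordered enumeration. This is only an illustration: the `GoalLevel` and `group` names are assumptions made for this sketch, and the sample entries are the example phrases quoted on the slide.

```python
from enum import IntEnum


class GoalLevel(IntEnum):
    """Use case levels from the slide, ordered from most detailed to most abstract."""
    CLAM = 1        # "insert record into database"
    UNDERWATER = 2  # "Save as a File"
    SEA_LEVEL = 3   # user goal: "one person, one sitting"
    KITE = 4        # "Find Usable Samples"
    CLOUD = 5       # "Operate a Biospecimen Repository"


def group(level: GoalLevel) -> str:
    """Return the slide's grouping for a level."""
    if level > GoalLevel.SEA_LEVEL:
        return "Summary"
    if level == GoalLevel.SEA_LEVEL:
        return "User Goals"
    return "Subfunctions"


# Example phrases from the slide, each tagged with its level.
requirements = [
    ("Operate a Biospecimen Repository", GoalLevel.CLOUD),
    ("Find Usable Samples", GoalLevel.KITE),
    ("Save as a File", GoalLevel.UNDERWATER),
    ("insert record into database", GoalLevel.CLAM),
]

# Keep only the entries that belong to the "Summary" group.
summary_level = [name for name, level in requirements if group(level) == "Summary"]
print(summary_level)
```

Tagging each master-list entry with a level in this way is one possible route to separating summary stories from sea-level user goals before hand-off.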

6

Candidate Service Categories

NCI Services are classified into four primary types:
• Business/Process
  • Arbitrarily complex services that utilize the other three service types to carry out business functions
• Business Capability
  • Services that provide “business atoms”, the data most business processes utilize
• Core
  • Services that provide information components to capability and business services
• Infrastructure/Utility
  • Services that are required or utilized by virtually all other services
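One way to read the four service types is as a layered dependency model. The sketch below encodes that reading; the `MAY_USE` table and the `may_use` helper are assumptions made for illustration, not part of any NCI specification.

```python
from enum import Enum


class ServiceCategory(Enum):
    BUSINESS_PROCESS = "Business/Process"
    BUSINESS_CAPABILITY = "Business Capability"
    CORE = "Core"
    INFRASTRUCTURE_UTILITY = "Infrastructure/Utility"


BP = ServiceCategory.BUSINESS_PROCESS
BC = ServiceCategory.BUSINESS_CAPABILITY
CORE = ServiceCategory.CORE
INFRA = ServiceCategory.INFRASTRUCTURE_UTILITY

# Assumed reading of the slide: each category may draw on the categories listed.
MAY_USE = {
    BP: {BC, CORE, INFRA},   # "utilize the other three service types"
    BC: {CORE, INFRA},       # core provides components to capability services
    CORE: {INFRA},           # infrastructure is used by virtually all others
    INFRA: set(),
}


def may_use(caller: ServiceCategory, callee: ServiceCategory) -> bool:
    """Return True if the caller's category may depend on the callee's category."""
    return callee in MAY_USE[caller]


print(may_use(BP, CORE))
print(may_use(INFRA, BP))
```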

[Diagram labels: New SI; Domain WS]

7

Periodic Table of Services: Infrastructure/Utility

[Figure: periodic table of NCI Enterprise Services. Row labels: “BUSINESS”, “CAPABILITY”, “CORE”, “Infra / UTILITY”. Elements, as symbol and service name:]

R – Registration
Ae – Adverse Event
Pt – Protocol
S – Specimen
Sc – Scheduling
Tx – Transformation
Au – Audit
Ra – Referral and Authorization
Hx – Hx and Physical
Va – Validation
Ev – Enterprise Vocabulary
O – Organization
P – Person
A – Agent
D – Disease
C – Correlation
Pa – Protocol Abstraction
Dx – Discharge Note
Po – Patient Outcomes
Tp – Treatment Plan
Ds – Decision Support
I – Image
L – Lab
Rx – Pharmacy
Cm – Contract Management
Mp – Master Problem List
Ay – Allergy
Sd – SDTM
E – Eligibility
Km – Knowledge Management
Cr – Credentialing
Oc – Study Outcomes
Qr – Data Query
Id – Id Management
Tr – Trust Management
Aa – Authorization Authentication
Py – Policy

• Knowledge Management service represents a series of capabilities around the storage, versioning, and expression of the semantics supporting key capabilities
• Service Contract Management service represents a series of capabilities around the storage, versioning, and expression of the contract semantics supporting interoperability
• Enterprise Vocabulary services support the management, storage, and mapping of terminologies and value sets
• The Validation service verifies structural and semantic consistency across messages used in interoperability scenarios
• The Transformation service provides a functional end point to manage and enact mappings between syntactically disparate information types
• The Auditing service provides an interface that captures auditing information around the access to sensitive data

8

Summary Level User Stories (from User Stories on VCK wiki)

1) Search for all "pre-cancerous" biospecimens that are available for sharing at Washington University, Thomas Jefferson University, and Fox Chase Cancer Center.

2) Identify samples obtained for glioblastoma multiforme (GBM) and the corresponding CT image information.

3) Support the addition of new types of annotations/attributes that reflect new research or hypotheses; automatically capture and publish the information so it is secure and sharable with others.

4) When defining new datasets for caIntegrator's data-warehouse for biomedical data collection and analysis, automatically record these new datatypes in a well-defined and federated manner so that data can be shared.

5) Discover and orchestrate services to achieve LS research goals; e.g. start with a hypothesis, identify relevant services that provide the necessary analysis and data, create the workflow/pipeline, report findings.

9

Summary User Story 3

3) Support the addition of new types of annotations/attributes that reflect new research or hypothesis; automatically capture and publish the information so it is secure and sharable with others

Domain Description: A teratoma is an encapsulated tumor with tissue or organ components resembling normal derivatives of all three germ layers. Regardless of location in the body, a teratoma is classified according to a cancer staging system (0-3). Teratomas are also classified by their content (solid, cystic, mixed). A cancer researcher would like to extend the pathology annotations associated with tissues in the center's tissue bank by adding Teratoma Content as an additional nonseminomatous germ cell tumor (NSGCT) annotation. The researcher communicates this to the director of the tissue repository, who promptly opens the administrative interface to caTissue and adds the additional pathology annotation. The system is now able to capture NSGCT annotations Teratoma.cancerStage and Teratoma.content, and the data and data descriptions are shareable with other organizations.
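The two annotations named in this story can be sketched as a small validated record. The stage range (0-3) and the content values (solid, cystic, mixed) come from the description above; the class layout is an assumption for illustration and is not the caTissue data model.

```python
from dataclasses import dataclass
from enum import Enum


class TeratomaContent(Enum):
    SOLID = "solid"
    CYSTIC = "cystic"
    MIXED = "mixed"


@dataclass(frozen=True)
class Teratoma:
    cancer_stage: int         # corresponds to Teratoma.cancerStage, allowed 0-3
    content: TeratomaContent  # corresponds to Teratoma.content

    def __post_init__(self):
        # Reject values outside the staging system described in the story.
        if self.cancer_stage not in range(4):
            raise ValueError(f"cancerStage must be 0-3, got {self.cancer_stage}")
        if not isinstance(self.content, TeratomaContent):
            raise ValueError("content must be solid, cystic, or mixed")


annotation = Teratoma(cancer_stage=2, content=TeratomaContent.CYSTIC)
print(annotation.cancer_stage, annotation.content.value)
```

Publishing the permitted values together with the data is what would let another organization interpret a shared record without guessing.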

10

Summary User Story 4

4) When defining new datasets for caIntegrator's data-warehouse for biomedical data collection and analysis, automatically record these new datatypes in a well-defined and federated manner so that data can be shared.

Domain Description: Similar to the teratoma example, but the extensions are broader and may involve the addition of potentially new classes of information and relationships. The researcher needs to be able to sit at his workstation and discover if any of the new datasets have already been defined by others at an institution he is collaborating with, and if so, reuse their dataset descriptions, extend or constrain them to fit his purposes, and automatically record this information in his local system, simultaneously notifying his collaborators that he has just made some changes. The system needs to be able to generate the appropriate service interfaces so that his collaborators can query for and share the data.
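The "reuse, extend or constrain" step can be sketched with a dataset description modeled as a mapping from attribute name to permitted values. This is a minimal illustration under assumptions: the `extend` and `constrain` helpers and the `Teratoma.sizeCm` attribute are hypothetical and do not reflect caIntegrator's actual model.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Attribute name -> permitted values (None means unconstrained).
DatasetDescription = Dict[str, Optional[FrozenSet[str]]]


def extend(base: DatasetDescription, new_attributes: DatasetDescription) -> DatasetDescription:
    """Return a copy of base with extra attributes; the shared description is untouched."""
    clash = set(base) & set(new_attributes)
    if clash:
        raise ValueError(f"already defined: {sorted(clash)}")
    return {**base, **new_attributes}


def constrain(base: DatasetDescription, attribute: str, allowed: Iterable[str]) -> DatasetDescription:
    """Return a copy of base with one attribute narrowed to a subset of its values."""
    narrowed = frozenset(allowed)
    current = base[attribute]
    if current is not None and not narrowed <= current:
        raise ValueError("a constraint may only narrow the permitted values")
    return {**base, attribute: narrowed}


# Description published by a collaborator (values from the teratoma example).
shared = {
    "Teratoma.cancerStage": frozenset({"0", "1", "2", "3"}),
    "Teratoma.content": frozenset({"solid", "cystic", "mixed"}),
}

# Local variant: one hypothetical extra attribute, one narrowed attribute.
local = extend(shared, {"Teratoma.sizeCm": None})
local = constrain(local, "Teratoma.content", {"solid", "mixed"})
print(sorted(local))
```

Returning copies instead of mutating the shared description keeps the collaborator's original definition intact, which matters if both versions have to remain queryable.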

11

Summary User Story 5

5) Discover and orchestrate services to achieve LS research goals; e.g. start with a hypothesis, identify relevant services that provide the necessary analysis and data, create the workflow/pipeline, report findings.

• Domain Description [Revised From ICRi Use Cases]: A scientist is trying to identify a new genetic biomarker for Stage I breast cancer patients. The scientist queries for appropriate tissue specimens using caTissue services at his/her cancer center that also have corresponding microarray experiments. Analysis of the microarray experiments identifies genes that are significantly over-expressed and under-expressed in a number of cases. The scientist decides that these results are significant, and queries for related literature suggest a hypothesis that gene A may serve as a biomarker in these types of specimens. To validate this hypothesis in a significant number of cases the scientist needs a larger data set, so he queries for all the specimens of Stage I breast cancer patients with corresponding microarray data and also for appropriate control data from other cancer centers. After retrieving the microarray experiments the scientist analyzes the data for over-expression of gene A using the same analysis service as before. The semantic infrastructure automatically saves these steps as a workflow for discovery, modification and reuse by others.
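The last sentence of this story, saving the steps as a workflow, can be sketched as a thin recorder around service calls. Everything here is a stand-in: `WorkflowRecorder`, `query_specimens`, and `analyze_expression` are hypothetical names, and the stub services return made-up values, not real caTissue or microarray results.

```python
from typing import Any, Callable, Dict, List, Tuple


class WorkflowRecorder:
    """Calls a service and records the service name and parameters as a workflow step."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Dict[str, Any]]] = []

    def call(self, service: Callable[..., Any], **params: Any) -> Any:
        self.steps.append((service.__name__, params))
        return service(**params)


# Hypothetical stub services standing in for specimen query and analysis services.
def query_specimens(diagnosis: str, centers: List[str]) -> List[str]:
    return [f"{center}:{diagnosis}" for center in centers]


def analyze_expression(specimens: List[str], gene: str) -> Dict[str, Any]:
    return {"gene": gene, "n": len(specimens)}


wf = WorkflowRecorder()
# First pass at the scientist's own center, then the wider query across centers.
local = wf.call(query_specimens, diagnosis="Stage I breast cancer", centers=["local"])
wf.call(analyze_expression, specimens=local, gene="A")
wider = wf.call(query_specimens, diagnosis="Stage I breast cancer",
                centers=["local", "center B", "center C"])
result = wf.call(analyze_expression, specimens=wider, gene="A")

print([name for name, _ in wf.steps])
```

Because each step stores the service name and its parameters, another researcher could in principle inspect, modify, and rerun the sequence.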

12

How will SI meet these needs?

• Semantic Infrastructure Concept of Operations
• Describes the goals of the Semantic Infrastructure
• https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/CaBIG%C2%AE_Semantic_Infrastructure_Concept_of_Operations
• (being re-evaluated/re-factored in terms of HOW we will meet SI needs)
• White Paper - Anticipated early June 2010
• Dave Hau – CBIIT Associate Director, Application Engineering
• Charlie Mead, Cecil Lynch, Raghu Chintalapati, Brian Davis
• Provide easier integration and leverage semantic infrastructure and metadata services to enable ECCF, caEHR, and projects like Transcend (caTissue/caArray)
• caDSR and EVS Software largely “Stabilized” as of Fall 2009
• Move to sSOA SAIF
• Continue to support existing customers
• Content development
• New Terminology and Metadata as needed
• Bug fixes when needed
• Support high priority integration as needed
• caEHR and Imaging new vocabulary development
• Medidata/RAVE Forms Integration
• Children’s Oncology Group – 9k new data elements
• NIDCR (Dental and Craniofacial Research)
• …..

13

NCI Objectives

• Supporting investigator-initiated, hypothesis-driven research into the etiology, treatment, and prevention of cancer
• Generating and publishing novel cancer research findings by mining existing data resources such as TCGA
• Leveraging caGrid-enabled and other data resources and analytic services
• Identifying novel bioinformatics processes and tools to exploit existing data resources
• Assessing gaps in caBIG® tools
  • Infrastructure
  • Data/information resources
  • Analytical services

Adapted from G. Komatsoulis March 2010

14

In summary ….

• The Semantic Infrastructure will provide the ability to find, share and intelligently reuse models, metadata and services to aid in delivering working interoperability

15

Questions?

• ?

16

Summary User Story 1

1) Search for all "pre-cancerous" biospecimens that are available for sharing at Washington University, Thomas Jefferson University, and Fox Chase Cancer Center.

• Domain Description: A cancer researcher sits down at his console with the intention of ordering some biospecimens for use at his organization. He opens the caTissue website at his lab and begins performing the search using the term “pre-cancerous”. Unfortunately, there is currently a shortage at his hospital of suitable pre-cancerous tissue. Therefore, he expands his search to Washington University, Thomas Jefferson University, and Fox Chase Cancer Center, all of which are within driving distance, so he could send a post doc to pick them up. He hits the search button, and the results from all three cancer centers are displayed on his web page. He selects suitable biospecimens, hits the print button, and sends his trusty post doc on his way.
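The expanded search in this story amounts to running one query against several centers and merging the hits. The sketch below uses in-memory stand-ins: the `CATALOGS` contents, specimen ids, and the `search` helper are invented for illustration and do not represent the caTissue API or real inventory.

```python
from typing import Dict, List

# Hypothetical catalogs; specimen ids and annotations are made up.
CATALOGS: Dict[str, List[dict]] = {
    "Washington University": [
        {"id": "WU-1", "annotation": "pre-cancerous", "shareable": True},
        {"id": "WU-2", "annotation": "malignant", "shareable": True},
    ],
    "Thomas Jefferson University": [
        {"id": "TJ-1", "annotation": "pre-cancerous", "shareable": False},
    ],
    "Fox Chase Cancer Center": [
        {"id": "FC-1", "annotation": "pre-cancerous", "shareable": True},
    ],
}


def search(term: str, centers: List[str]) -> List[dict]:
    """Return shareable specimens matching the term, tagged with their center."""
    hits = []
    for center in centers:
        for specimen in CATALOGS.get(center, []):
            if specimen["annotation"] == term and specimen["shareable"]:
                hits.append({"center": center, **specimen})
    return hits


results = search("pre-cancerous", list(CATALOGS))
print([hit["id"] for hit in results])
```

The sketch matches on an exact annotation string; a shared vocabulary would be needed for the three centers to mean the same thing by “pre-cancerous”, which is the gap the semantic infrastructure is meant to address.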

17

Summary User Story 2

2) Identify samples obtained for glioblastoma multiforme (GBM) and the corresponding CT image information

• Domain Description: A cancer researcher has developed a new image detection algorithm for identifying glioblastoma multiforme, which is the most common and most aggressive type of primary brain tumor in humans. When viewed with MRI, glioblastomas often appear as ring-enhancing lesions. The appearance is not specific, however, as other lesions such as abscess, metastasis, tumefactive multiple sclerosis, and other entities may have a similar appearance. The cancer researcher's algorithm should be able to differentiate between cancerous lesions and other lesions, but he needs additional tissues and images to make his testing statistically significant. The cancer researcher sits down at his laptop and, using a caBIG search tool, builds a search on all known tissues that have been identified as glioblastoma multiforme via stereotactic biopsy and have corresponding CT images. He hits the search button, gets a cup of coffee, and returns to a list of 74 tissues with 465 images. He hits the export button, which downloads all the images with associated pathology results.
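The search in this story is effectively a join between tissue records and image records. The following sketch shows that join on made-up data; the record fields, ids, and the `gbm_with_ct` helper are assumptions for illustration, not a caBIG tool interface.

```python
from collections import defaultdict
from typing import Dict, List

# Made-up tissue and image records.
tissues = [
    {"id": "T1", "diagnosis": "glioblastoma multiforme", "method": "stereotactic biopsy"},
    {"id": "T2", "diagnosis": "glioblastoma multiforme", "method": "resection"},
    {"id": "T3", "diagnosis": "abscess", "method": "stereotactic biopsy"},
    {"id": "T4", "diagnosis": "glioblastoma multiforme", "method": "stereotactic biopsy"},
]
images = [
    {"tissue_id": "T1", "modality": "CT", "file": "t1_a"},
    {"tissue_id": "T1", "modality": "CT", "file": "t1_b"},
    {"tissue_id": "T1", "modality": "MRI", "file": "t1_c"},
    {"tissue_id": "T3", "modality": "CT", "file": "t3_a"},
    {"tissue_id": "T4", "modality": "MRI", "file": "t4_a"},
]


def gbm_with_ct(tissues: List[dict], images: List[dict]) -> Dict[str, List[str]]:
    """Map each biopsy-confirmed GBM tissue to its CT image files."""
    ct_files = defaultdict(list)
    for image in images:
        if image["modality"] == "CT":
            ct_files[image["tissue_id"]].append(image["file"])
    return {
        tissue["id"]: ct_files[tissue["id"]]
        for tissue in tissues
        if tissue["diagnosis"] == "glioblastoma multiforme"
        and tissue["method"] == "stereotactic biopsy"
        and tissue["id"] in ct_files
    }


matches = gbm_with_ct(tissues, images)
print(matches)
```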

Interoperability: Standards

• NCI Services are based, wherever possible, on existing standards to enhance interoperability

• NCI Enterprise Service payloads are derived from the HL7 v3 Reference Information Model and reference relevant standards wherever possible

• NCI Enterprise Services utilize ISO 21090 data types

• NCI Enterprise Services utilize standard controlled biomedical terminologies such as LOINC and the NCI Thesaurus and the ISO 11179 metadata specification

Adapted from G. Komatsoulis March 2010
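The role of coded data types and controlled terminologies can be sketched with a simplified coded value and a value-set check. This is loosely inspired by the coded-value idea in ISO 21090 and is not the standard's definition; the field names are simplified, and the codes and code system identifier are placeholders, not real LOINC or NCI Thesaurus entries.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class CodedValue:
    """Simplified coded value: a code plus the terminology it comes from."""
    code: str
    code_system: str            # identifier of the terminology (placeholder here)
    code_system_name: str = ""
    display_name: str = ""


def in_value_set(value: CodedValue, value_set: Set[Tuple[str, str]]) -> bool:
    """Return True if the (code system, code) pair is a member of the value set."""
    return (value.code_system, value.code) in value_set


# Placeholder value set: pairs of (code system id, code). All values are made up.
DIAGNOSIS_VALUE_SET = {
    ("example-system-id", "X-0001"),
    ("example-system-id", "X-0002"),
}

diagnosis = CodedValue(
    code="X-0001",
    code_system="example-system-id",
    code_system_name="ExampleTerminology",
    display_name="Example diagnosis",
)
print(in_value_set(diagnosis, DIAGNOSIS_VALUE_SET))
```

Comparing on the code system and code together, and not on the display text, is what allows two systems with different labels to agree that they mean the same concept.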