Common Biorepository Model (CBM): Specimen Searches Across Real Specimen Collection Data, caBIG® Tissue Banks & Pathology Tools Workspace, TBPT F2F, Houston, Texas, November 3, 2010


1

Common Biorepository Model (CBM): Specimen Searches Across Real Specimen Collection Data

caBIG® Tissue Banks & Pathology Tools Workspace

TBPT F2F

Houston, Texas

November 3, 2010

2

Agenda

• Progress on CBM Challenge with CBM 1.0 Beta

• CBM Use at the University of Colorado Cancer Center

• CBM Use at the Medical University of South Carolina Hollings Cancer Center

• CBM Use at Washington University St. Louis and caB2B Querying

• Model, Next Steps and Discussion

3

Common Biorepository Model (CBM): Update

November 3, 2010

Anna Fernandez, PhD

TBPT Workspace

4

Why CBM? (1/2)

• Clinical research often uses only locally obtained specimens due to limited ability to search for specimens outside an institution.

• The ability to aggregate similar specimens from various sites will expand the validation of pathology research findings and thus, more quickly impact patient care.

• caGrid allows data from caBIG-compatible systems to be connected across sites

  • Its query language (CQL) can interrogate model metadata to navigate and find information about the data on a “grid node”

• caTissue Suite allows individual specimen level queries on caGrid

• How can we get, at minimum, all biorepository systems to share SUMMARY LEVEL information about their specimens, if many use different vendor systems and/or institute-developed solutions?
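As a rough illustration of what a CQL query against a grid node might look like, the sketch below builds a small CQL 1 XML document in Python. The namespace and the `Target`/`Attribute` element names are written from recollection of the CQL 1 schema and should be checked against the caGrid release in use; the class name `example.cbm.SpecimenCollectionSummary` and the attribute `anatomicSource` are hypothetical placeholders, not names taken from the CBM.

```python
import xml.etree.ElementTree as ET

# Assumption: CQL 1 namespace as recalled; verify against your caGrid release.
CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"


def build_cql_query(target_class, attribute, value, predicate="EQUAL_TO"):
    """Return a CQL query string selecting objects of target_class
    whose attribute matches value under the given predicate."""
    root = ET.Element("CQLQuery", {"xmlns": CQL_NS})
    target = ET.SubElement(root, "Target", {"name": target_class})
    ET.SubElement(
        target,
        "Attribute",
        {"name": attribute, "value": value, "predicate": predicate},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical class and attribute names, for illustration only.
    print(build_cql_query("example.cbm.SpecimenCollectionSummary",
                          "anatomicSource", "Breast"))
```

The same query document could in principle be sent to every participating node, which is what makes a shared model useful for cross-site searches.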

5

Why CBM? (2/2)

• A CBM serves as a simple information model for interfacing with systems by sharing key summary-level specimen information, enabling a single search across multiple biorepositories/banks.

• Biorepository software vendors and NCI stakeholders (tissue bank personnel, researchers) convened to develop a CBM, which is now in its final development stage.

• The goal is to reduce the time and effort required by researchers to locate biorepositories with needed specimens.

• Researchers will be able to search via the OBBR’s Specimen Resource Locator to identify specimen resources to fit their research needs.

6

CBM and Specimen Resource Locator

Sharing Data on the Grid – caTissue Suite vs. CBM

Compatibility Path

[Figure: Functionality plotted against Compatibility. Labels: Bronze; caGrid Services; Common Biorepository Model; Specimen Management Services]

9

CBM Challenge (Fall 2010)

ITERATIVE Development PROCESS with CBM Community

NIH SRL Participants and CBM Challenge Participants

Path to CBM 1.0 (to 2009)

[Timeline figure, 2008 to 2009. Period labels on the figure: Sept-Nov (2008), Dec-Jan, Feb-May (2009), June-Aug, November, December. Milestones shown:]

• Vendor workshop developed CBM 0.9

• Internal NCI revisions

• Updated version CBM 0.93; review with stakeholders; engage more vendors

• CBM vendor participants accept CBM Challenge (14)

• SRL stakeholders define initial vocabulary terms desired; initial Silver Level Review feedback incorporated into CBM 0.95

• CBM 0.95 service files generated (Grid-KC, with simple data set from caTissue ETL); CBM vendors ready to begin looking at service docs

Path to CBM 1.0 (2010)

[Timeline figure, 2010. Period labels on the figure: Jan-April, June, July-Oct, Nov. Milestones shown:]

• CBM vendors stand up Grid with test data (IMS, 5AM, caTissue-WashU (early ETL), AIM); FreezerWorks testing code; Daedalus, Healthcare IT, Westat, GenoLogics, Ocimum Biosolutions, and LabVantage looking at code, ready when vendors ready; UCCC interested in testing

• CBM 1.0Beta-May: service files generated for early testers; new term curation started in NCI Thesaurus

• CBM 1.0Beta-Oct: all terms in caDSR; mapping to terms with SNOMED, ICD-9/10 synonyms; corrections to model based on June feedback

• CBM participants mapping to REAL collections: UCCC, using .NET service connected to home-grown system; Hollings Cancer Center – AIM TissueMetrix; WashU – caTissue

2011: CBM 1.0 ECCF-SAIF Compliant, caBIG Compatible Service

13

CBM Experience - University of Colorado Cancer Center

Michael Ames

14

CBM Experience – Hollings Cancer Center - AIM

Bill Morgenweck, Kevin Hooper

15

Where we are today, Tuesday Nov 3

• UCCC has a node on the Training Grid, generated from a CBM service based on the October 2010 CBM1.0Beta

  • ETL mapping done from MS-SQL to MS-SQL

  • Used CBM vocabulary

  • Using CBM1.0Beta-October EA model

• Hollings Cancer Center – with Artificial Intelligence in Medicine (AIM) TissueMetrix

  • ETL mapping from Oracle SQL to MySQL

  • Extensively used the CBM vocabulary (CBM1.0Beta October vocab)

  • Close to showing on the Grid – can query the database

  • Using CBM1.0Beta-May service files

• Washington University

  • Early caTissue Suite-CBM ETL scripts from April 2010 used (not complete mapping)

  • Using CBM1.0Beta-May service files and database

16

CBM Experience – Washington University - caTissue

Rakesh Nagarajan

17

Query Common Biorepository Model (CBM): iGoogle App

Mukesh Sharma

Rakesh Nagarajan

18

Training Grid Portal has CBM Nodes: http://portal.training.cagrid.org/web/guest/home

19

iGoogle Gadget – Querying the CBM Grid Node

• iGoogle Application

  • Queries and uses GridPortlet

• Using “My Gadgets”

  • http://www.neatdev.com/cabig/gadget.xml

• Aggregating data from the Grid:

  • UCCC

  • Washington University

20

How to install the Specimen Counts Google Gadget (connecting to CBM test nodes on the caGrid)

• Works best in the Chrome and Firefox browsers

• Go to http://www.google.com/ig

• Click “add stuff”

• Search for “My gadgets” and click to add this to your iGoogle page

• Install “My gadgets”, then go back to your iGoogle page

• Once installed, enter this address in the “add gadget” bar: http://www.neatdev.com/cabig/gadget.xml

• The Specimen Counts should appear on your iGoogle homepage.

21

Specimen Counts

22

HCC Normal Breast Tissue

HCC

52

Hollings Cancer Center/AIM http://tmxstorefront.hcc.musc.edu/

23

Hollings Cancer Center : Number of cases by specimen

• Waiting for Grid connection – will be up soon (next few days?)

• Specimens that will be exposed are reported here: http://tmxstorefront.hcc.musc.edu/

BIRT (Ex. June 2009) – open-source reporting tool

• Tie into XML output of Grid queries

• Customize reporting

• Could customize what is displayed

BIRT (in the works)

26

Lessons Learned as we continue…

• Working with vendors and multiple systems helps identify issues and iteratively improve

  • Mapping issues – are we all mapping to the same things?

  • Testing with real data – this is when challenges are identified

  • Grid node testing – the various environments and difficulties encountered help improve documentation and expose potential muddy areas!

• Extract-Transform-Load process – must be meticulous, with all terms agreed on

  • Mapping decisions are key in the process – map to a code or map to an NCI preferred name?

  • All must subscribe to the same version – or work out how to map against different versions

• Real testing is when we find these challenges – THANKS to the institutes testing with us in this Challenge
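To make the mapping and versioning points concrete, here is a minimal Python sketch of combining per-site counts. It assumes each site reports counts keyed by a concept code together with the vocabulary version it mapped against; the report layout, the codes such as `DEMO-C1`, and the version labels are all hypothetical. Keying by code means a renamed preferred term does not split a count, and sites on a different vocabulary version are set aside rather than silently merged.

```python
from collections import Counter


def merge_site_counts(site_reports, expected_version):
    """Sum specimen counts keyed by concept code.

    Reports mapped against a different vocabulary version are skipped
    and their site names returned, so they can be remapped explicitly.
    """
    totals = Counter()
    skipped = []
    for report in site_reports:
        if report["vocab_version"] != expected_version:
            skipped.append(report["site"])
            continue
        totals.update(report["counts"])
    return dict(totals), skipped


if __name__ == "__main__":
    # Hypothetical reports; codes and versions are placeholders.
    reports = [
        {"site": "site A", "vocab_version": "v1", "counts": {"DEMO-C1": 3}},
        {"site": "site B", "vocab_version": "v1", "counts": {"DEMO-C1": 4}},
        {"site": "site C", "vocab_version": "v0", "counts": {"DEMO-C1": 9}},
    ]
    print(merge_site_counts(reports, "v1"))
```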

27

Fall 2010 Additions to help ETL

• Tables added with the lists of values from the model

  • Can be directly accessed by ETL processes

  • Enforces integrity via foreign keys

• Mapping table

  • Addresses mappings to ICD9CM, ICD10, and SNOMEDCT
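The idea behind these additions can be sketched with a small example. The snippet below uses Python's built-in SQLite as a stand-in for the MySQL database, and every table name, column name, and code value in it is hypothetical rather than taken from CBM.sql. It shows a list-of-values table, a mapping table from an external vocabulary to that list, and a foreign key that rejects a summary row whose code is not in the list.

```python
import sqlite3

# Hypothetical schema: names and codes are illustrative, not from CBM.sql.
SCHEMA = """
CREATE TABLE diagnosis_lov (
    nci_code TEXT PRIMARY KEY,
    nci_name TEXT NOT NULL
);
CREATE TABLE diagnosis_mapping (
    source_vocab TEXT NOT NULL
        CHECK (source_vocab IN ('ICD9CM', 'ICD10', 'SNOMEDCT')),
    source_code  TEXT NOT NULL,
    nci_code     TEXT NOT NULL REFERENCES diagnosis_lov(nci_code),
    PRIMARY KEY (source_vocab, source_code)
);
CREATE TABLE specimen_summary (
    id             INTEGER PRIMARY KEY,
    diagnosis_code TEXT NOT NULL REFERENCES diagnosis_lov(nci_code),
    specimen_count INTEGER NOT NULL
);
INSERT INTO diagnosis_lov VALUES ('DEMO-C1', 'Demo Diagnosis One');
INSERT INTO diagnosis_mapping VALUES ('ICD10', 'X00.0', 'DEMO-C1');
"""


def build_demo_db():
    """Create an in-memory database with foreign key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def load_summary_row(conn, source_vocab, source_code, specimen_count):
    """ETL step: translate a local code through the mapping table, then insert."""
    row = conn.execute(
        "SELECT nci_code FROM diagnosis_mapping "
        "WHERE source_vocab = ? AND source_code = ?",
        (source_vocab, source_code),
    ).fetchone()
    if row is None:
        raise KeyError(f"no mapping for {source_vocab}:{source_code}")
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO specimen_summary (diagnosis_code, specimen_count) "
            "VALUES (?, ?)",
            (row[0], specimen_count),
        )


if __name__ == "__main__":
    conn = build_demo_db()
    load_summary_row(conn, "ICD10", "X00.0", 12)
    try:
        conn.execute(
            "INSERT INTO specimen_summary (diagnosis_code, specimen_count) "
            "VALUES ('NOT-IN-LIST', 1)"
        )
    except sqlite3.IntegrityError as exc:
        print("rejected by foreign key:", exc)
```

Because the list of values lives in the database itself, an ETL script can validate against it directly instead of carrying its own copy of the vocabulary.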

28

What we have today

Artifact: UML model (CBM with Value Domains.EAP)

Links: https://ncisvn.nci.nih.gov/svn/common_biorepository_model/trunk/caCORE_SDK/models/CBM%20with%20Value%20Domains.EAP

https://ncisvn.nci.nih.gov/svn/common_biorepository_model/trunk/caCORE_SDK/models/CBM%20with%20Value%20Domains.xmi

Description: EA model (with permissible values) and XMI version

Artifact: MySQL database (CBM.sql)

Links: https://ncisvn.nci.nih.gov/WebSVN/filedetails.php?repname=common_biorepository_model&path=%2Ftrunk%2Fdatabase%2FCBM.sql

Description: Database to be used for ETL; now has the NCI concept code and NCI concept name

Artifact: HTML view of EA model

Links: https://ncisvn.nci.nih.gov/svn/common_biorepository_model/trunk/html_documentation/index.htm

Description: Able to navigate through the EA model (without having EA on the machine); can walk through the UML model

Artifact: /cbm service files and Grid deployment instructions

Links: 90% complete. Needs some testing of the /cbm

Description: Files generated from running the model through caCORE-SDK and Introduce (latest version)

29

Anatomic Source

30

Specimen Type

31

Diagnosis

32

Preservation Type

33

Mappings

Where we are headed:

• Specimen Resource Locator: http://biospecimens.cancer.gov/locator

• From 2002 to December 2009, the static SRL received ~14,000 queries

• Q1 2011 – SRL developers will begin

  • CBM 1.0 documentation/service/test package will be developed (caBIG® Compliant ECCF/SAIF service)

  • SRL 2.0 work will begin

    1. Electronic web form = a web-based questionnaire based on CBM

    2. Common Biorepository Model = through CBM challenge adoptees

• SRL will provide names of biorepositories’ contacts that have the specimens researchers are looking for.

  • Additional information will be obtained from direct interaction with the contact

  • Material Transfer Agreements (collaboration, purchase, etc.) will then be discussed between researcher and biorepository

35

CBM 1.0Beta Next Steps in Testing

• Work with our testing vendors to check if we can query them

  • Through querying, determine if the values are appropriate/expected

  • This also helps establish the form in which the Specimen Resource Locator will expect data

• Test whether the .NET-based service matches the caCORE-SDK/Introduce version in terms of querying paths

• caTissue Suite ETL – continue testing with Washington University, to expose all the data types and thus aggregate with the specimen information from other test sites

• Incorporate feedback, more detailed instructions, guidance for ETL

• CBM 1.0Beta – transfer to Specimen Resource Locator development team

  • ECCF/SAIF documentation

  • Comply with the standards set up for new caBIG® services (ISO 21090 data types, etc.)

• Position to be first Specimen Management Service Set component

36

How to Participate

• View CBM Wiki (Latest information): https://wiki.nci.nih.gov/display/TBPT/Common+Biorepository+Model+%28CBM%29

• Review Model and vocabulary – how will it match with your biorepository data?

• Test the service/mapping when final service files are released

  • ETL process can begin today

• Contact TBPT

• If you are using a CBM Participating Vendor system – let them know you are interested in testing/using CBM

• Identify key projects in your institute that could have/will benefit from finding more specimens for their work – or finding uses for the specimens they currently hold

37

Acknowledgements

• Hollings Cancer Center

• Artificial Intelligence in Medicine (AIM)

• University of Colorado Cancer Center

• University of Virginia (.NET service)

TBPT – CBM Team

• Ian Fore, DPhil, NCI-CBIIT

• Anna Fernandez, PhD, Booz Allen Hamilton

• Libby Prince, Sapient Government Solutions

• Andrew Breychak, Sapient Government Solutions

• Ben Fombonne, Kelly Government

• Beth DiGiulian, Booz Allen Hamilton

• caGrid KC, special thanks to Joe George, Bill Stephens, & Justin Permar!

• Tissue Banking Knowledge Center

• Vendor Community

• Biorepository community!

38

CBM Challenge (Fall 2010)

39

Extra Slides

40

Key CBM Sites:

• CBM Site (Latest Information): https://cabig.nci.nih.gov/workspaces/TBPT/CBM/

• CBM 1.0Beta Grid Test Package (May 2010) – email and links will be found on CBM website

• CBM Latest Model (Interactive Enterprise Architect version, IE browser):

https://ncisvn.nci.nih.gov/svn/common_biorepository_model/trunk/html_documentation/index.htm

• Find CBM 1.0Beta Test Nodes!

  • The service is deployed to the Training Grid. Here is the service URL: http://portal.training.cagrid.org/web/guest/discovery

  • Then search for a service using the “Name” field, with name “CBM”

• CBM Site for Vendor/Community Comments:

  • https://cabig-kc.nci.nih.gov/Biospecimen/forums/

  • Find the “Common Biorepository Model” Discussions Forum

caBIG iGoogle Gadget

Queries & displays NCI Common Biorepository Specimen Count Data

Developed by Booz Allen Hamilton

caBIG iGoogle Gadget Flowchart

[Flowchart with numbered steps 1 to 4 linking four boxes: caBIG iGoogle Gadget; PHP Page; caGRID.org REST form; caGRID.org XML output]

First, the caBIG iGoogle Gadget is added to an iGoogle homepage. A Google Gadget is stored in an XML file which contains gadget metadata as well as HTML, CSS and JavaScript.

When the iGoogle homepage loads, the Gadget makes an AJAX request to retrieve data from a PHP page hosted on an external server.

The PHP page makes several queries using the RESTful interface to cagrid.org, located at http://portal-demo.training.cagrid.org/cagridportlets/xml/form. This form returns links to XML files, which contain data retrieved from multiple servers.

Using the links obtained in the previous step, these XML files are retrieved and summed within the PHP page. The PHP page then outputs this summary data in XML format.

The caBIG iGoogle Gadget’s AJAX request is completed as the XML data output by the PHP page is retrieved. A JavaScript callback function parses the summary XML and outputs it to the screen.

Selecting a different option within the caBIG iGoogle Gadget does not make another AJAX request, but instead locally re-parses the already retrieved XML summary data.
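The retrieve-and-sum step performed by the server-side page can be sketched as follows. This is written in Python rather than the PHP the gadget used, and the XML layout (a `specimenCount` element with `category` and `count` attributes) is an assumed stand-in, since the slides do not show the format returned by the grid portal.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def sum_specimen_counts(xml_documents):
    """Add up counts per category across several per-node XML results."""
    totals = Counter()
    for doc in xml_documents:
        root = ET.fromstring(doc)
        # Assumed layout: <specimenCount category="..." count="..."/>
        for item in root.iter("specimenCount"):
            totals[item.get("category")] += int(item.get("count"))
    return totals


def to_summary_xml(totals):
    """Serialize the combined counts as one summary document."""
    root = ET.Element("summary")
    for category in sorted(totals):
        ET.SubElement(
            root,
            "specimenCount",
            {"category": category, "count": str(totals[category])},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample results from two nodes.
    node_a = '<results><specimenCount category="Breast" count="3"/></results>'
    node_b = ('<results><specimenCount category="Breast" count="4"/>'
              '<specimenCount category="Lung" count="2"/></results>')
    print(to_summary_xml(sum_specimen_counts([node_a, node_b])))
```

Producing one summary document up front is what lets the gadget switch views by re-parsing data it already holds instead of issuing a new request.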

44

Key TBPT/NCI Sites:

• The Specimen Resource Locator (website) http://biospecimens.cancer.gov/locator

• caBIG® Tissue Banks & Pathology Tools Workspace:

https://cabig.nci.nih.gov/workspaces/TBPT

• OBBR – Office of Biorepositories and Biospecimen Research:

http://biospecimens.cancer.gov

• OBBR & NCI Best Practices http://biospecimens.cancer.gov/practices/

BIRT (in the works) – open-source reporting tool

• Tie into XML output of Grid queries

• Customize reporting

• Could customize what is displayed

BIRT (in the works)

47

CBM Next Steps

caBIG® Tissue Banks & Pathology Tools Workspace

caBIG® TBPT

National Cancer Institute

48

June 2009 - Challenge announced to use CBM to expose data on the caGrid – Vendor & Cancer Centers (biorepositories) involvement

Fall 2009 – A CBM with caGrid artifacts ready for testing

Winter 2010 – First set of vendors have incorporated a CBM into their systems and can expose test data on the caGrid

Fall 2010 – First set of cancer centers (biorepositories) have deployed CBM at their site and are exposing real data on the caGrid

End of 2010/2011 – Researchers demonstrate real-world cases of successful research with biospecimens located via CBM

2009

2010

Vendor Commitment to identify resources/timeline for CBM testing in their product

NCI-CBIIT releases initial documentation, SW artifacts and tools for vendors to start testing

Vendors/caTissue/NCI-CBIIT pass test suite on test-data.

Biorepositories commitment to expose their own biorepository data using CBM-ready SW

Biorepositories with Vendors/caTissue successfully expose real data on the caGrid

NCI SRL 2.0 Ready to query for specimens across caGrid via CBM

Cancer Centers identify projects that will use CBM in specimen search

Researchers report out on research impact using CBM

Identify future publications using specimens located through CBM

ITERATIVE Development PROCESS with CBM Community

CBM Challenge (DEC 2009 Update)

Where We’ve Been