Development and Experience with Tissue Banking Tools to Support Cancer Research

Waqas Amin M.D, Anil V. Parwani M.D PhD and Michael J. Becich M.D, PhD (1)

(1) Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
(2) Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA, USA

Introduction:

Over the last decade, the Department of Biomedical Informatics (DBMI) at the University of Pittsburgh has developed and deployed various tissue banking informatics tools to expedite translational medicine research.

Tissue banking informatics deals with the management of clinicopathologic annotation, inventory management, and distribution of biospecimens that are collected and stored for translational research use by the scientific community.

Tissue Banking Informatics:

Aggregation: Process to associate tissue samples with valuable data, including demographic, epidemiology, pathology, progression, vital status, therapy, and outcomes-related data.

Standardization: Collected data must be uniform and shareable. A standardized approach to annotation ensures uniformity, consistency, and quality of the collected data, which facilitates information sharing across multiple institutions.

Searchable: An information model supported by a standardized data collection approach allows annotated tissue samples to be matched with research queries, thereby facilitating better understanding of the experimental design and results.
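The matching of annotated samples against a research query can be illustrated with a minimal sketch. It assumes a simple in-memory list of samples; the Sample class, its field names, and the match_samples helper are hypothetical and are not taken from the tools described in this presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical annotated tissue sample (field names are assumptions)."""
    sample_id: str       # de-identified specimen code
    organ: str
    specimen_type: str   # e.g. "paraffin block", "fresh frozen"
    vital_status: str


def match_samples(samples, **criteria):
    """Return the samples whose annotations equal every requested criterion."""
    return [
        s for s in samples
        if all(getattr(s, field) == value for field, value in criteria.items())
    ]


bank = [
    Sample("S-001", "prostate", "paraffin block", "alive"),
    Sample("S-002", "prostate", "fresh frozen", "alive"),
    Sample("S-003", "breast", "paraffin block", "deceased"),
]

# A researcher asks for fresh frozen prostate tissue.
hits = match_samples(bank, organ="prostate", specimen_type="fresh frozen")
print([s.sample_id for s in hits])
```

Exact matching only works because every sample uses the same field names and value spellings, which is the practical reason standardized annotation comes before searchability.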

Data Requirement in Cancer Research:

High-quality, accurate, and comprehensive data are required to support genomic, proteomic, clinical, and translational research.

Data must be acquired in accordance with legal and ethical subject policies.

Type of Data Collection:
Demographic data
Patient clinical data
Pathology block level data
Patient treatment data
Outcome and follow-up data
Biochemical data
Genomic level data
Cell and tissue level data

Data Collection Standards:

Development of Common Data Elements (CDEs):

Standardized clinical annotations are defined in detail utilizing metadata. This allows uniform, consistent, shareable data collection across multiple institutes and systems.

Development of CDEs is supervised by a multidisciplinary team, and the CDE subcommittee developed consensus CDEs incorporating the following standards, as applicable to an organ-specific tissue:

Association of Directors of Anatomic and Surgical Pathology (ADASP) Cancer Reporting Guidelines

American Joint Committee on Cancer (AJCC) Cancer Staging Manual

North American Association of Central Cancer Registries (NAACCR) Data Standards for Cancer Registries
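As a rough illustration of what "annotations defined in detail utilizing metadata" can mean in practice, the sketch below models one common data element as a record with a definition and a set of permissible values. The class, the element name, and the permissible values are all hypothetical; they are not drawn from the ADASP, AJCC, or NAACCR documents or from the actual CDE sets used by these projects.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CommonDataElement:
    """Hypothetical CDE: a named field plus the metadata that constrains it."""
    name: str
    definition: str
    permissible_values: Tuple[str, ...]

    def validate(self, value: str) -> bool:
        """Accept a value only if it is one of the permissible values."""
        return value in self.permissible_values


# Illustrative element; the value list is an assumption, not a published standard.
vital_status = CommonDataElement(
    name="vital_status",
    definition="Whether the patient was alive at last follow-up.",
    permissible_values=("alive", "deceased", "unknown"),
)

print(vital_status.validate("alive"))   # value is in the permissible list
print(vital_status.validate("Alive"))   # different spelling is rejected
```

Rejecting near-miss spellings at entry time is one simple way a shared CDE keeps data from different institutions directly comparable.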

Data Sources:

Data import from automated electronic systems such as AP-LIS, CP-LIS, and the Radiology and Registry Information System (RIS).

Patient questionnaires, patient health records and treatment charts, existing databases, consultation with referring physicians, archived data, and pathology reports.

De-Identification of PHI:

The purpose is to ensure proper confidentiality and privacy of human subjects based upon Institutional Review Board approved protocols.

De-identification of PHI is done by an Honest Broker according to Health Insurance Portability and Accountability Act (HIPAA) regulations, by designating unique codes to patient data related identifiers.
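The idea of designating unique codes to patient identifiers can be sketched as follows. This is only an illustration under stated assumptions, not a HIPAA-compliant implementation and not the Honest Broker system used at Pittsburgh: the HonestBroker class, the "P-" code format, and the record field names are hypothetical.

```python
import secrets


class HonestBroker:
    """Hypothetical broker that swaps patient identifiers for random codes."""

    def __init__(self):
        # Identifier-to-code map; in this sketch only the broker holds it.
        self._code_by_identifier = {}

    def code_for(self, identifier: str) -> str:
        """Return a stable random code for an identifier, creating it once."""
        if identifier not in self._code_by_identifier:
            self._code_by_identifier[identifier] = "P-" + secrets.token_hex(8)
        return self._code_by_identifier[identifier]

    def deidentify(self, record: dict, identifier_field: str, phi_fields: set) -> dict:
        """Copy a record without PHI fields, adding the patient's code instead."""
        clean = {
            key: value
            for key, value in record.items()
            if key not in phi_fields and key != identifier_field
        }
        clean["patient_code"] = self.code_for(record[identifier_field])
        return clean


broker = HonestBroker()
record = {
    "mrn": "MRN-0001",              # assumed identifier field
    "name": "Jane Doe",
    "date_of_birth": "1950-01-01",
    "diagnosis": "mesothelioma",
}
clean = broker.deidentify(record, "mrn", {"name", "date_of_birth"})
print(sorted(clean))
```

Because the code is random rather than derived from the identifier, it cannot be reversed without the broker's map, while repeated records for the same patient still receive the same code.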

Specimen Collection and Standardization:

Biospecimens are collected according to standardized pathology and tissue banking protocols. Biospecimens collected and stored for the tissue banking projects include:

Paraffin blocks
Fresh frozen tissue
Blood products, including:
    Serum
    Plasma
    Buffy coat
    RBC
    WBC

Tissue Banking Information Models and Architecture:

Two types of information models have been utilized in the development of the tissue banks.

Organ-specific databases (OSD)
    Cooperative Prostate Cancer Tissue Resource (CPCTR) (www.cpctr.info)
    Pennsylvania Cancer Alliance for Bioinformatics Consortium (PCABC) (www.pcabc.upmc.edu)
    Early Detection Research Network (EDRN) Colorectal and Pancreatic Neoplasm Database
    SPORE Head and Neck Neoplasm Database

Model Driven Approach (Database)
    National Mesothelioma Virtual Bank (NMVB) (www.mesotissue.org)

OSD (Organ Specific Database):

OSD is a three-tiered architecture, implemented on an Oracle Application Server v10.1.2.3 running on Windows 2003 and an Oracle RDBMS v.10.2.0.2 running on an AIX 5L virtual host definition supported by IBM x3850 system hardware.

Dynamic web pages are generated for the database users using Oracle HTTP Server and mod_plsql extensions.

The data annotation engine is a flexible, dynamic web-based tool, while the data query engine lets investigators search de-identified information within the warehouse through a “point and click” interface.

OSD Multi Tier Architecture:

[Figure: OSD multi-tier architecture diagram. Labels shown: Presentation; Application Data Layer; Physical Data; Metadata Engine; Common Data Elements (CDE) Definitions; Business Rules Engine; Mapping Engine; HELP Builder; Security Engine; Registration; Authorization; Authentication; Security Data Layer; Metadata Data Layer; Metadata Curation; Manual Annotation; Data Query; Data Import/Export; Admin Security.]

OSD (Meta Data Builder Tool):

OSD Feature List:

To address the needs of heterogeneous users, we identified numerous criteria for success. Some requirements and features are listed below:

Quick statistics on overall data.
Multi-mode search: multiplex search and advanced search.
Mechanisms for keeping users oriented (e.g. help, persistence of last entered query text).
Results in tabular form, with sorting on each column and access to the full case report.
Both Honest Broker and de-identified (researcher) access.
Controlled access to subjects for different studies.

Feature List (contd.):

Standard and customized query results of the data.
Individual research and consent based access to information.
Quick search using cases saved in “My Cases”.
Query Builder interface.
Online Help Manual Builder.
The model can support a multi-institutional data enterprise model.
User Management Module helps create, revoke, and control users' access and activities within the database.
Business layer allows for creation of complex/logical data fields based on data interpretation by experts.
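The last feature, creating logical data fields in a business layer, can be pictured as a set of named rules applied to each stored case. The sketch below is an assumption about how such a layer might look; the rule names, the case fields, and the derive_fields helper are hypothetical and do not reproduce the OSD business rules engine.

```python
def derive_fields(case, rules):
    """Return a copy of the case with one extra field per rule.

    Each rule is a function of the original case, so rules do not
    depend on one another in this simplified sketch.
    """
    derived = dict(case)
    for name, rule in rules.items():
        derived[name] = rule(case)
    return derived


# Hypothetical rules that a domain expert might define.
RULES = {
    "has_frozen_tissue": lambda c: c.get("frozen_blocks", 0) > 0,
    "total_specimens": lambda c: (
        c.get("paraffin_blocks", 0)
        + c.get("frozen_blocks", 0)
        + c.get("blood_products", 0)
    ),
}

case = {"case_id": "C-17", "paraffin_blocks": 3, "frozen_blocks": 0, "blood_products": 2}
enriched = derive_fields(case, RULES)
print(enriched["has_frozen_tissue"], enriched["total_specimens"])
```

Keeping the rules separate from the stored data means an expert can revise an interpretation without rewriting the underlying records.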

OSD Model Based Head and Neck Neoplasm Virtual Biorepository:

A bioinformatics-driven system is being developed to utilize multimodal data sets from patient questionnaires and from clinical, pathological, radiology, and molecular systems.

This results in one architecture supported by a set of CDEs to facilitate basic science, clinical, and translational research.

The system is designed to facilitate semantic and syntactic interoperability in the development of data elements (i.e., metadata or data descriptors using controlled vocabulary and ontology).

Provides data entry, data mining, and analysis tools.

OSD Integration with Other Data Sources:

[Figure: data sources integrated with the SPORE H&N Neoplasm Database. Labels shown: Genotype lab data; Bio-marker data; Radiology (PET/CT) data; Patient insurance information; Human Papilloma Virus questionnaire data; Epidemiology Project-1 questionnaire data; AP-LIS/CP-LIS; RIS; BIOS.]

Data Collection & Annotation Tool:

[Screenshots of the tool. Slide callouts, in order:]

User Authentication
User Management Module: the administrator can create, edit, and revoke users and control their access to different applications.
Manual data collection module
Case summary
Users can switch quickly between the available applications according to their access rights.
Quick overall review of statistics on the collected database
Data query template
Standard view
Descriptions of the different views for reference
Data export for statistical analysis packages, such as SAS
Full case report view (identified or de-identified, according to access level)
Users can keep multiple “My Case” lists for different studies.
Users can select any data field to create personalized views and save them under “My Views”.
The administrator can edit or create data views.

OSD Based Database Accruals:

Virtual Biorepository | Tissue type | Total # Cases | Biospecimens: Paraffin Blocks | Biospecimens: Frozen Blocks | Biospecimens: Blood/Serum/Plasma
CPCTR | Prostate | 7000 | 34641 | 17508 | 17508
PCABC | Breast | 3645 | 1760 | 847 | 823
PCABC | Melanoma | 1762 | 1885 | 168 | 112
PCABC | Prostate | 7327 | 5457 | 1642 | 415
EDRN Colorectal and Pancreatic Neoplasm Virtual Biorepository | Pancreas and colon | 2459 | 175 | 942 | 1254
SPORE's Head & Neck Neoplasm Virtual Biorepository | Head and neck neoplasm | 11622 | 2237 | 0 | 1038

(Amin et al., Tissue banking informatics, 2010)

Model Driven Database (MDD):

NMVB is developed using a model-driven approach (MDD).

Application components are generated from UML domain models.

It is a Java-based application designed using a Model-Driven Development framework.

MDD (contd.):

Web Tier: Constructs web pages based on the metadata dictionary.

Business Tier: Provides an object/relational mapping mechanism, a metadata interrogation mechanism, an application programming interface, and a set of shared services.

Data Tier: Consists of the domain database that houses clinically annotated data, indexes to support the query mechanism, and security data.
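The web tier's approach of building pages from a metadata dictionary can be sketched in a few lines. NMVB itself is described as a Java-based application; the Python sketch below is a stand-in for illustration only, and the dictionary layout, field names, and render functions are hypothetical.

```python
from html import escape

# Hypothetical metadata dictionary; the real NMVB dictionary is not shown in the slides.
METADATA = [
    {"name": "age_at_diagnosis", "label": "Age at diagnosis", "type": "number"},
    {"name": "vital_status", "label": "Vital status", "type": "select",
     "options": ["alive", "deceased", "unknown"]},
]


def render_field(entry):
    """Build one labelled form control from a metadata entry."""
    name = escape(entry["name"], quote=True)
    label = escape(entry["label"])
    if entry["type"] == "select":
        options = "".join(
            '<option value="{0}">{0}</option>'.format(escape(opt, quote=True))
            for opt in entry["options"]
        )
        control = '<select name="{}">{}</select>'.format(name, options)
    else:
        field_type = escape(entry["type"], quote=True)
        control = '<input type="{}" name="{}">'.format(field_type, name)
    return "<label>{} {}</label>".format(label, control)


def render_form(metadata):
    """Build a whole data entry form from the metadata dictionary."""
    return "<form>" + "".join(render_field(e) for e in metadata) + "</form>"


form_html = render_form(METADATA)
print(form_html)
```

With this arrangement, adding a data element to the dictionary changes the data entry page without any page-specific code, which is the main appeal of a metadata-driven web tier.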

Virtual Component of NMVB:

Statistical Data Query Interface

Approved Investigator Query Interface

Data Entry Interface

www.mesotissue.org

NMVB Accruals:

Year | Retrospective Cases | Prospective Cases | Overall NMVB Total
2006 | 515 | 8 | 523
2007 | 585 | 50 | 635
2008 | 605 | 105 | 710
2009 | 674 | 162 | 836
2010 (to date) | 674 | 183 | 865

Conclusion:

Informatics-supported tissue banking initiatives act as a large source of annotated biospecimens and facilitate basic and clinical science research.

Tissue banking infrastructure allows efficient governance, standardized capture of data, and detailed standardized annotation at the local institute and across multiple collaborating sites.

Finally, tissue banking tools developed at DBMI (Department of Biomedical Informatics) provide an important knowledge base for the development of integrated tissue banking efforts and benefit other tissue banking initiatives by providing consultation.

Thank you
