provenance of microarray experiments
![Page 1: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/1.jpg)
Provenance of Microarray Experiments for a Better Understanding of Experiment Results
Helena F. Deus, University of Texas
Jun Zhao, University of Oxford
Satya Sahoo, Wright State University
Matthias Samwald, DERI, Galway
Eric Prud’hommeaux, W3C
Michael Miller, Tantric Designs
M. Scott Marshall, Leiden University Medical Center
Kei-Hoi Cheung, Yale University
![Page 2: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/2.jpg)
Outline
Background: microarrays, gene expression, and why provenance is important for experimental biomedical data
Objectives
Data: microarray workflow and gene results
The provenance model
Demo
Future work
Summary
![Page 3: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/3.jpg)
Introduction
High throughput experiments, such as microarray technologies, have revolutionized the way we study disease and basic biology.
Microarray experiments allow scientists to quantify thousands of genomic features in a single experiment
Source: http://www.scq.ubc.ca/
Affymetrix microarray gene chips
Genes can be used as biomarkers for disease
![Page 4: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/4.jpg)
Introduction
Since 1997, the number of published results based on an analysis of gene expression microarray data has grown from 30 to over 5,000 publications per year
Microarray data repositories and standards exist, but they lack provenance and interoperable data access
Source: YJBM (2007) 80(4):165-78
![Page 5: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/5.jpg)
Introduction Cont.
A pilot study of the W3C HCLS BioRDF task force
Bottom-up approach
Use microarray experiments for Alzheimer’s disease as the test bed
Aggregate results across microarray experiments
Combine different types of data
![Page 6: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/6.jpg)
Objectives
To facilitate a better understanding of microarray gene results:
Efficiently query gene results
Efficiently combine existing life science datasets
To transform microarray gene results into a Semantic Web format
To encode provenance information about these gene results in the same format as the data itself
![Page 7: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/7.jpg)
Microarray Workflow
Figure: biological question (differentially expressed genes; sample gathering, etc.) → experiment design → microarray experiment → image analysis → normalization → estimation, clustering, discrimination, t-test, … (stages labeled "data extraction" and "data analysis and modeling")
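The slides do not say which statistic each study used to call a gene differentially expressed. The Python below is a hedged sketch of the "estimation" and "t-test" steps only. For a single gene, it computes the log2 fold change and Welch's t-statistic between disease and control intensities. The function names and sample values are hypothetical. A p-value would also require the t distribution from a statistics package such as SciPy, which is omitted here to keep the example dependency-free.

```python
import math
from statistics import mean, variance

# Sketch only: assumes intensities are already normalized and positive,
# and that each group has at least two replicates (needed for variance).

def log2_values(intensities):
    """Log2-transform intensities, a common step before testing."""
    return [math.log2(x) for x in intensities]

def log2_fold_change(disease, control):
    """Mean log2 expression in disease minus mean log2 expression in control."""
    return mean(log2_values(disease)) - mean(log2_values(control))

def welch_t(disease, control):
    """Welch's t-statistic on log2 values (does not assume equal variances)."""
    a, b = log2_values(disease), log2_values(control)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical intensities for one gene
disease = [8.0, 16.0, 32.0]
control = [1.0, 2.0, 4.0]
print(log2_fold_change(disease, control))  # 3.0
print(round(welch_t(disease, control), 3))  # 3.674
```

Working on the log2 scale makes up- and down-regulation symmetric: a fold change of +1 means doubled expression, and -1 means halved expression.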
![Page 8: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/8.jpg)
An Example of Differentially Expressed Genes
![Page 9: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/9.jpg)
An Example of Gene Lists from Different Studies
![Page 10: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/10.jpg)
What microarray experiments analyze samples taken from the entorhinal cortex region of Alzheimer's patients?
![Page 11: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/11.jpg)
What genes are overexpressed in the entorhinal cortex region and what is their expression fold change and associated p-value?
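The two queries above can be sketched with a tiny basic-graph-pattern matcher. This is a hedged illustration in plain Python, not the BioRDF SPARQL setup. Triples are tuples, and variables start with "?". Every ex: term (ex:region, ex:hasResult, ex:gene, ex:log2FoldChange, ex:pValue) and all data values are hypothetical placeholders, not terms or results from the biordf vocabulary. "Overexpressed" is assumed to mean a positive log2 fold change. The first pattern alone answers the question of which experiments used entorhinal cortex samples. The remaining patterns and the filter answer the fold change and p-value question.

```python
# Minimal conjunctive triple-pattern matcher. All ex: terms are hypothetical.

def match(pattern, triple, binding):
    """Extend `binding` so `pattern` equals `triple`; return None on conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if isinstance(p, str) and p.startswith("?"):  # variable
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:  # constants must match exactly
            return None
    return new

def query(graph, patterns):
    """Return every binding that satisfies all patterns (a join)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for triple in graph:
                nb = match(pattern, triple, b)
                if nb is not None:
                    extended.append(nb)
        bindings = extended
    return bindings

graph = [
    ("ex:genelist1", "ex:region", "ex:entorhinal_cortex"),
    ("ex:genelist1", "ex:hasResult", "ex:r1"),
    ("ex:r1", "ex:gene", "ex:GENE_A"),
    ("ex:r1", "ex:log2FoldChange", 1.4),
    ("ex:r1", "ex:pValue", 0.002),
    ("ex:genelist1", "ex:hasResult", "ex:r2"),
    ("ex:r2", "ex:gene", "ex:GENE_B"),
    ("ex:r2", "ex:log2FoldChange", -0.9),
    ("ex:r2", "ex:pValue", 0.01),
    ("ex:genelist2", "ex:region", "ex:hippocampus"),
    ("ex:genelist2", "ex:hasResult", "ex:r3"),
    ("ex:r3", "ex:gene", "ex:GENE_C"),
    ("ex:r3", "ex:log2FoldChange", 2.0),
    ("ex:r3", "ex:pValue", 0.001),
]

def overexpressed_in(graph, region):
    rows = query(graph, [
        ("?list", "ex:region", region),
        ("?list", "ex:hasResult", "?r"),
        ("?r", "ex:gene", "?gene"),
        ("?r", "ex:log2FoldChange", "?fc"),
        ("?r", "ex:pValue", "?p"),
    ])
    # FILTER step: positive log2 fold change = higher expression in disease
    return [(b["?gene"], b["?fc"], b["?p"]) for b in rows if b["?fc"] > 0]

print(overexpressed_in(graph, "ex:entorhinal_cortex"))  # [('ex:GENE_A', 1.4, 0.002)]
```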
![Page 12: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/12.jpg)
What other diseases may be associated with the same genes found to be linked to AD?
![Page 13: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/13.jpg)
A Bottom-up Approach
Separate concerns/perspectives
Too many existing vocabularies to choose from
Lack of standardization among existing provenance vocabularies
Lack of a clear understanding of what needs to be captured
Process: identify the user query, define the terms, test the query using test data
![Page 14: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/14.jpg)
A Bottom-up Approach
Raw Data
Results
![Page 17: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/17.jpg)
A Bottom-up Approach
Provenance models
Workflow, experimental design
Domain ontologies (DO, GO, …)
Community models
Raw Data
Results
Questions
Which genes are markers for neurodegenerative diseases?
Was gene ALG2 differentially expressed in multiple experiments?
Provenance of Microarray experiment
What software was used to analyse the data?
How can the experiment be replicated?
![Page 18: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/18.jpg)
The Provenance Data Model: Four Types of Provenance
http://purl.org/net/biordfmicroarray/ns#
![Page 19: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/19.jpg)
RDF genelist representation
Institutional level: metadata associated with each gene list, such as the laboratory where the experiments were performed or the reference to the gene list.
Experimental context level: experimental protocols, such as the brain region and the disease (terms were partially mapped to MGED, DO, and NIF).
![Page 20: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/20.jpg)
Data analysis and significance: the statistical analysis methodology used to select the relevant genes.
Dataset descriptions: the version of a source dataset and who published it, described with the Vocabulary of Interlinked Datasets (voiD) and Dublin Core terms (dct).
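The four provenance levels can be made concrete as RDF triples. The sketch below emits N-Triples from plain Python. The rdf:type, dct: (http://purl.org/dc/terms/), void: (http://rdfs.org/ns/void#), and owl:versionInfo terms are real vocabulary terms. Every term under http://example.org/ and every value are hypothetical placeholders. They are not the biordf terms published at http://purl.org/net/biordfmicroarray/ns#.

```python
# Real namespaces: RDF, OWL, Dublin Core Terms, voiD.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
DCT = "http://purl.org/dc/terms/"
VOID = "http://rdfs.org/ns/void#"
# Hypothetical placeholder namespace for model terms and data.
EX = "http://example.org/"

def iri(value):
    return f"<{value}>"

def lit(value):
    """N-Triples string literal with backslashes and quotes escaped."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

genelist, dataset = iri(EX + "genelist1"), iri(EX + "dataset1")

# One list of (subject, predicate, object) triples per provenance level.
provenance = {
    "institutional": [
        (genelist, iri(EX + "laboratory"), lit("Example Lab")),
        (genelist, iri(EX + "reference"), iri(EX + "publication1")),
    ],
    "experimental context": [
        (genelist, iri(EX + "brainRegion"), iri(EX + "entorhinal_cortex")),
        (genelist, iri(EX + "disease"), iri(EX + "alzheimers_disease")),
    ],
    "data analysis and significance": [
        (genelist, iri(EX + "analysisMethod"), lit("t-test")),
        (genelist, iri(EX + "pValueCutoff"), lit("0.05")),
    ],
    "dataset description": [
        (dataset, iri(RDF + "type"), iri(VOID + "Dataset")),
        (dataset, iri(DCT + "publisher"), iri(EX + "publisher1")),
        (dataset, iri(OWL + "versionInfo"), lit("1.0")),
        (genelist, iri(DCT + "source"), dataset),
    ],
}

def to_ntriples(triples):
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

print(to_ntriples(t for level in provenance.values() for t in level))
```

Keeping each level in its own list mirrors the idea that the provenance types are separate perspectives on the same gene list. All levels still serialize into one graph.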
![Page 21: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/21.jpg)
Provenance types are perspectives on the data
![Page 22: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/22.jpg)
![Page 23: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/23.jpg)
![Page 24: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/24.jpg)
![Page 25: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/25.jpg)
Query federation with Diseasome: is there a gene network for AD?
Source: PNAS 104:21, 8685 (2007)
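Federating the AD gene lists with Diseasome amounts to a join on a shared gene identifier. The sketch below answers the question "what other diseases may be associated with the same genes?" It makes three assumptions: both sources already use the same gene symbols (in practice, identifiers may need to be mapped first), the data is hypothetical and held in memory, and no live SPARQL endpoints are involved.

```python
# Hypothetical stand-ins: neither collection reflects real study results.
ad_genes = {"GENE_A", "GENE_B", "GENE_C"}  # genes from local AD gene lists

# (gene symbol, disease) pairs standing in for a remote Diseasome-style source
gene_disease = [
    ("GENE_A", "Alzheimer disease"),
    ("GENE_A", "Disease X"),
    ("GENE_C", "Disease Y"),
    ("GENE_D", "Disease Z"),
]

def other_diseases(genes, associations, exclude):
    """Join local genes with remote associations on the gene symbol,
    keeping only diseases other than `exclude`."""
    hits = {}
    for gene, disease in associations:
        if gene in genes and disease != exclude:
            hits.setdefault(gene, set()).add(disease)
    return hits

print(other_diseases(ad_genes, gene_disease, exclude="Alzheimer disease"))
# {'GENE_A': {'Disease X'}, 'GENE_C': {'Disease Y'}}
```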
![Page 27: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/27.jpg)
Conclusions
Levels of provenance: 1) institutional; 2) experimental context; 3) statistical analysis and significance; 4) dataset description
Provenance as RDF: SPARQL queries can express constraints on both the origins and the context of the data
The data model is driven by the biological question: a bottom-up approach shields the model from rapidly evolving ontologies while still enabling links to widely used ontologies
Mapping is facilitated: mapping to existing provenance vocabularies, such as OPM, PML, and Provenir, is eased by two properties. biordf:has_input_value can be made a sub-property of the inverse of the OPM property used. biordf:derives_from_region can become a sub-property of the OPM property wasDerivedFrom.
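The two sub-property mappings above can be sketched as a one-step inference over triples. In this hedged illustration, "biordf:" and "opm:" are prefixed names written as plain strings for readability, and the example subjects and objects are hypothetical. The slide does not say which end of biordf:has_input_value plays the OPM process and which plays the artifact, so the sketch applies only the formal rule: if p is a sub-property of the inverse of q, then (s p o) entails (o q s). Chains of sub-properties are not followed.

```python
# One-step sketch of the two mappings named on the slide.
SUB_PROPERTY_OF = {
    "biordf:derives_from_region": "opm:wasDerivedFrom",
}
SUB_PROPERTY_OF_INVERSE = {
    "biordf:has_input_value": "opm:used",
}

def infer_opm(triples):
    """Return the input triples plus the OPM triples they entail."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in SUB_PROPERTY_OF:
            # p subPropertyOf q: (s p o) entails (s q o)
            inferred.add((s, SUB_PROPERTY_OF[p], o))
        if p in SUB_PROPERTY_OF_INVERSE:
            # p subPropertyOf inverse(q): (s p o) entails (o q s)
            inferred.add((o, SUB_PROPERTY_OF_INVERSE[p], s))
    return inferred

# Hypothetical data
data = {
    ("ex:sample1", "biordf:derives_from_region", "ex:entorhinal_cortex"),
    ("ex:subject1", "biordf:has_input_value", "ex:object1"),
}
for triple in sorted(infer_opm(data) - data):
    print(triple)
```

Because the mapping only adds triples, existing queries written against the biordf terms keep working, while OPM-aware tools can see the inferred triples.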
![Page 28: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/28.jpg)
Summary and Future Work
Provenance modeling in a Semantic Web application:
Query genes gathered from specific samples, under a given condition, or from given organizations
Query genes produced through a particular statistical analysis process
Query information about genes from the most recent dataset
The bottom-up approach:
Separate concerns of interest
Create a minimum set of terms required by the motivating queries
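The last two query types above can be sketched as filters over flat records. This is a hedged illustration. The record layout, dataset identifiers, dates, and gene symbols are hypothetical. In the RDF model, the date would come from the dataset description level and the method from the statistical analysis level.

```python
from datetime import date

# Hypothetical records: (dataset id, issued date, analysis method, gene symbol)
records = [
    ("ex:dataset_v1", date(2000, 1, 1), "t-test", "GENE_A"),
    ("ex:dataset_v1", date(2000, 1, 1), "t-test", "GENE_B"),
    ("ex:dataset_v2", date(2001, 6, 1), "t-test", "GENE_A"),
    ("ex:dataset_v2", date(2001, 6, 1), "SAM", "GENE_C"),
]

def query_genes(rows, method=None, latest_only=False):
    """Return sorted gene symbols, optionally restricted to one analysis
    method and/or to the most recently issued dataset."""
    if latest_only and rows:
        newest = max(issued for _, issued, _, _ in rows)
        rows = [r for r in rows if r[1] == newest]
    if method is not None:
        rows = [r for r in rows if r[2] == method]
    return sorted({gene for _, _, _, gene in rows})

print(query_genes(records, latest_only=True))                   # ['GENE_A', 'GENE_C']
print(query_genes(records, method="t-test", latest_only=True))  # ['GENE_A']
```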
Future work:
Integrate our model with provenance information generated in scientific workflow workbenches
Integrate provenance information into the Excel spreadsheets in which most biologists report their results
![Page 29: provenance of microarray experiments](https://reader033.vdocuments.us/reader033/viewer/2022061103/53ff840a8d7f7249088b464d/html5/thumbnails/29.jpg)
Acknowledgement
W3C BioRDF group: Kei Cheung, Michael Miller, M. Scott Marshall, Eric Prud’hommeaux, Satya Sahoo, Matthias Samwald
The HCLS IG, as well as Helen Parkinson, James Malone, Misha Kapushesky, and Jonas Almeida.