Ontology application and use at the ENCODE DCC
Venkat Malladi
Data Wrangler, ENCODE DCC
Department of Genetics, Stanford University School of Medicine
Venkat Malladi ENCODE DCC
Overview
• Intro to ENCODE and the DCC
• Metadata model
• Ontologies
• Search
• Future directions
What is ENCODE?
Approximately 30 different assays
Modified from PLoS Biol 9:e1001046, 2011 (M. Pazin)
Role of the Data Coordination Center
| | Production labs / Analysis groups | DCC | Scientific community |
|---|---|---|---|
| Role | Data generation | Data organization | Data access |
| Tasks | Perform assays; perform analyses; validate data; submit data files; submit metadata | Data processing & validation; data file storage; metadata curation | Web-based searches; data downloads |

Diagram labels: Data files, Metadata, ENCODE portal (DCC), Genome Browser, Integrative websites
Challenge: Find common biosamples from data generated by two consortia
356 terms: http://encodeproject.org/ENCODE/cellTypes.html
314 terms: GEO characteristics (common_name, tissue_type, cell_type, lines)
Projects are internally consistent …
Simple text match
360 terms: Cell type
314 terms: GEO
… but only 3 biosample names match exactly between projects: IMR90, PBMC, Th17
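A simple text match of this kind can be sketched as a set intersection. This is a minimal illustration, not the DCC's actual code: the two lists below are short hypothetical excerpts (the real lists hold several hundred names each), and `exact_matches` is a name made up for this example.

```python
# Hypothetical excerpts of the two projects' biosample names.
# The real lists are far longer; these are for illustration only.
project_a_terms = {"IMR90", "PBMC", "Th17", "HCF", "HCM", "Heart_OC"}
project_b_terms = {"IMR90", "PBMC", "Th17", "Fetal Heart", "Right Atrium"}


def exact_matches(a, b):
    """Return the names that appear, character for character, in both lists."""
    return sorted(set(a) & set(b))


shared = exact_matches(project_a_terms, project_b_terms)
print(shared)  # ['IMR90', 'PBMC', 'Th17']
```

Exact matching misses names that differ only in spelling or granularity, which is the gap that ontology annotation is meant to close.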
Metadata annotation using Ontologies
An ontology is a set of words and relationships. All relationships must be true.
[Diagram: the terms cell, nucleus, mitochondrion, chromosome, and mitochondrial chromosome, with parent and child terms labeled, connected by part_of and is_a relationships. One part_of relationship is marked X.]
An ontology is a set of words and relationships. The relationships must be true because inferences can be based upon them.
[Diagram: the terms cell, nucleus, mitochondrion, chromosome, and mitochondrial chromosome, with parent and child terms labeled, connected by part_of and is_a relationships. Inferred part_of relationships are marked True or False (X).]
Source: http://www.geneontology.org/GO.ontology.relations.shtml
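Why one false relationship matters can be shown with a small inference sketch. The exact edges of the slide diagram did not survive in this transcript, so the edges below are an assumption modeled on the Gene Ontology relations example; `inferred_part_of` is a hypothetical helper, not part of any ontology library.

```python
def inferred_part_of(is_a, part_of):
    """Derive part_of pairs by applying two rules until nothing new appears:
    (1) A is_a B    and B part_of C  =>  A part_of C
    (2) A part_of B and B part_of C  =>  A part_of C
    Only direct is_a edges are used; is_a chains are not expanded here.
    """
    facts = set(part_of)
    while True:
        new = set()
        for a, b in is_a:
            new |= {(a, c) for (b2, c) in facts if b2 == b}
        for a, b in facts:
            new |= {(a, c) for (b2, c) in facts if b2 == b}
        if new <= facts:
            return facts
        facts |= new


# Assumed edges (illustrative, see the note above).
is_a = {("mitochondrial chromosome", "chromosome")}
true_part_of = {
    ("nucleus", "cell"),
    ("mitochondrion", "cell"),
    ("mitochondrial chromosome", "mitochondrion"),
}

# With only true relationships, the inference is also true.
good = inferred_part_of(is_a, true_part_of)
print(("mitochondrial chromosome", "cell") in good)  # True

# Asserting "chromosome part_of nucleus" (not true of every chromosome)
# produces a false conclusion about mitochondrial chromosomes.
bad = inferred_part_of(is_a, true_part_of | {("chromosome", "nucleus")})
print(("mitochondrial chromosome", "nucleus") in bad)  # True
```

The second result is the problem: the rule fired correctly, but the conclusion is biologically wrong because one asserted relationship was not universally true.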
Why use ontologies?
Reason 1: Consistent way of describing biological concepts
Reason 2: Consistent language makes related data easy to identify
Reason 3: Consistent data analysis, because relationships between terms allow flexible grouping while everyone uses the same set of metadata
What metadata is annotated with ontologies?
1. the biological sample serving as input (Biosample)
2. the reagents and conditions applied to the biological input (Treatment)
3. the set of methods and conditions to survey the biological input (Assay)
Biosample ontologies
1. Uber-anatomy ontology (Uberon) - structure, location, and heterogeneous mixtures of cells
2. Cell Ontology (CL) - primary cells or stem cells
3. Experimental Factor Ontology (EFO) - biosamples with no directly corresponding anatomical structure or physiological cell type
Challenge: Find all heart-related tissues?
Heart_OC, HCF, HCFaa, HCM, others?
Fetal Heart, Heart, Right Atrium, Right Ventricle, others?
Searching ENCODE metadata
Ontology driven search
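One way such a search could answer the heart-related tissue challenge is to annotate each biosample with an ontology term and then match on the term's ancestors. The sketch below is illustrative only: the mini-ontology, the term names, and the sample-to-term annotations are all assumptions, not the ENCODE portal's data or implementation.

```python
# Hypothetical mini-ontology: child term -> list of (relation, parent term).
PARENTS = {
    "right cardiac atrium": [("part_of", "heart")],
    "heart right ventricle": [("part_of", "heart")],
    "cardiac fibroblast": [("part_of", "heart")],
    "cardiac muscle cell": [("part_of", "heart")],
    "heart": [("part_of", "cardiovascular system")],
    "fibroblast of lung": [("part_of", "lung")],
    "lung": [("part_of", "respiratory system")],
}

# Hypothetical annotations: biosample name -> ontology term name.
ANNOTATIONS = {
    "Heart_OC": "heart",
    "HCF": "cardiac fibroblast",
    "HCM": "cardiac muscle cell",
    "Fetal Heart": "heart",
    "Right Atrium": "right cardiac atrium",
    "Right Ventricle": "heart right ventricle",
    "IMR90": "fibroblast of lung",
}


def ancestors(term):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def search(query):
    """Return biosamples annotated with the query term or any descendant of it."""
    return sorted(
        name
        for name, term in ANNOTATIONS.items()
        if term == query or query in ancestors(term)
    )


print(search("heart"))
```

Searching for "heart" returns the six heart-related samples from both naming schemes and leaves out IMR90, even though none of the sample names were matched as text.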
Future directions
• Additional ontologies
• Ontology-based data validations
Additional ontologies
• Protein Ontology (PRO, http://pir.georgetown.edu/pro/pro.shtml)
  o transforming growth factor beta-1 (human) - PR:P01137
• EDAM Ontology (EDAM, http://edamontology.org)
  o FASTQ - format:1930, BAM - format:2572
  o sequence alignment - data:0863
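As a rough illustration of how EDAM terms might be attached to data files, the sketch below maps a file name to one of the two format identifiers quoted above. The extension-based lookup and the name `edam_format` are assumptions made for this example; the talk does not describe how the DCC would assign these terms.

```python
# EDAM format identifiers as quoted on the slide.
# Keying them by file extension is an assumption for illustration.
EDAM_FORMAT_BY_EXTENSION = {
    ".fastq": "format:1930",
    ".bam": "format:2572",
}


def edam_format(filename):
    """Return the EDAM format ID for a file name, or None if it is not recognized."""
    name = filename.lower()
    if name.endswith(".gz"):
        # Ignore a trailing gzip suffix before checking the extension.
        name = name[: -len(".gz")]
    for extension, term_id in EDAM_FORMAT_BY_EXTENSION.items():
        if name.endswith(extension):
            return term_id
    return None


print(edam_format("reads.fastq.gz"))  # format:1930
print(edam_format("aligned.bam"))     # format:2572
print(edam_format("notes.txt"))       # None
```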
Ontology based validations
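The slide content for this topic is not preserved in the transcript, so the following is only a guess at what an ontology-based check on submitted biosample metadata could look like. The allowed prefixes follow the three biosample ontologies named earlier in the talk; the identifier pattern, the `KNOWN_TERMS` excerpt, and `validate_biosample` are hypothetical.

```python
import re

# Assumed identifier prefixes for Uberon, Cell Ontology, and EFO.
ALLOWED_PREFIXES = {"UBERON", "CL", "EFO"}
TERM_ID = re.compile(r"^([A-Za-z]+):(\d+)$")

# Hypothetical excerpt of ontology labels (term ID -> preferred name).
KNOWN_TERMS = {"UBERON:0000948": "heart"}


def validate_biosample(term_id, term_name):
    """Return a list of problems found in a (term ID, term name) pair."""
    match = TERM_ID.match(term_id)
    if not match:
        return [f"{term_id!r} is not of the form PREFIX:digits"]
    if match.group(1) not in ALLOWED_PREFIXES:
        return [f"{match.group(1)} is not a biosample ontology"]
    if term_id not in KNOWN_TERMS:
        return [f"{term_id} was not found in the loaded ontology"]
    if KNOWN_TERMS[term_id] != term_name:
        return [f"{term_id} is named {KNOWN_TERMS[term_id]!r}, not {term_name!r}"]
    return []


print(validate_biosample("UBERON:0000948", "heart"))  # []
print(validate_biosample("UBERON:0000948", "lung"))
print(validate_biosample("heart", "heart"))
```

A check like this would catch free-text names, terms from the wrong ontology, and ID/name pairs that disagree at submission time rather than at analysis time.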
Acknowledgments
Data Wranglers: Esther Chan, Jean Davidson, Venkat Malladi, Cricket Sloan, J. Seth Strattan
Software Engineers: Nikhil Podduturi, Laurence Rowe, Forrest Tanaka
QA, administration, biocuration: Brian Lee, Stuart Miyasato, Matt Simison, Zhenhua Wang, Marcus Ho
Eurie Hong, Mike Cherry (PI), Jim Kent (co-PI), Ben Hitz
National Institute of General Medical Sciences of the United States National Institutes of Health (GM10331601); U41 grant from the National Human Genome Research Institute at the U.S. National Institutes of Health (HG006992)