Post on 20-Dec-2015
Data Integration and Extraction over Molecular Biological Data
Cui Tao
supported by NSF
Motivation
Online biological data:
Highly diverse in granularity and variety
Various formats
Different terminologies, ID systems, units
How to Build a Gene Extraction Ontology?
Concepts
Relationship sets
Constraints
Data Frames
How to Build a Gene Extraction Ontology?
(G*A*U*C*)*
(G*A*T*C*)*
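
The two expressions above read as data-frame recognizers for nucleotide sequences: the first accepts strings over G, A, U, C (RNA) and the second strings over G, A, T, C (DNA). A minimal sketch of how such recognizers might be applied, assuming Python's `re` module; the function name `classify_sequence` is hypothetical and not part of the original slides:

```python
import re

# Patterns copied from the slide. Each is equivalent to a character class
# repeated zero or more times, i.e. [GAUC]* and [GATC]*.
RNA_PATTERN = re.compile(r"(G*A*U*C*)*")
DNA_PATTERN = re.compile(r"(G*A*T*C*)*")


def classify_sequence(text):
    """Return 'DNA', 'RNA', 'DNA or RNA', or None for a candidate string."""
    if not text:
        # Both patterns accept the empty string; treat it as no match.
        return None
    is_dna = DNA_PATTERN.fullmatch(text) is not None
    is_rna = RNA_PATTERN.fullmatch(text) is not None
    if is_dna and is_rna:
        # Neither T nor U occurs, so the string fits both alphabets.
        return "DNA or RNA"
    if is_dna:
        return "DNA"
    if is_rna:
        return "RNA"
    return None


print(classify_sequence("GATTACA"))  # DNA
print(classify_sequence("GAUUACA"))  # RNA
```

For long inputs the equivalent character-class form (`[GATC]*`) is the safer choice, since the nested-star form can backtrack heavily on strings that do not match.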
Knowledge Sources
Gene Ontology: thousands of terms (Molecular Function, Biological Process, Cellular Component)
All Species Toolkit: 1,231,935 species names
Protein Databases: thousands of protein names
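
One way knowledge sources like these could feed an extraction ontology is as lexicons: term lists that are matched against page text. The sketch below assumes that reading. The `LEXICONS` table holds a handful of sample entries only, and `tag_terms` is a hypothetical helper, not something described in the slides.

```python
# Tiny illustrative lexicons. A real system would load the full term lists
# from the knowledge sources instead of hard-coding a few entries.
LEXICONS = {
    "GO Term": {"dna binding", "apoptotic process", "nucleus"},
    "Species": {"escherichia coli", "homo sapiens"},
}


def tag_terms(text):
    """Return sorted (concept, term) pairs for lexicon terms found in text."""
    lowered = text.lower()
    return sorted(
        (concept, term)
        for concept, terms in LEXICONS.items()
        for term in terms
        if term in lowered  # plain substring match, case-insensitive
    )


print(tag_terms("A DNA binding protein from Escherichia coli"))
```

Plain substring matching is the simplest possible choice here; word-boundary matching or a trie would be more appropriate for lexicons with over a million entries.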
Extraction Rules
Statistical NLP
Machine learning: Naïve Bayes, Hidden Markov Models, Decision Trees
Integration
Integration
Information Hidden behind Links
Query-based Extraction
Query the gene extraction ontology
Find applicable resources
Fill out forms
Extract information
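
The steps above can be sketched as a small planning routine. Everything here is an assumption for illustration: the `RESOURCES` entries, their `example.org` URLs, and their form-field names are placeholders rather than real databases. The final extraction step, which would apply the extraction ontology to the returned pages, is left out.

```python
from urllib.parse import urlencode

# Hypothetical resource descriptions: which concepts each source provides
# and which form field accepts a gene name. Not real databases.
RESOURCES = [
    {
        "name": "gene-db",
        "url": "https://example.org/gene/search",
        "form_field": "gene_name",
        "provides": {"Gene Name", "Gene Sequence"},
    },
    {
        "name": "protein-db",
        "url": "https://example.org/protein/search",
        "form_field": "gene",
        "provides": {"Protein Function"},
    },
]


def find_applicable_resources(wanted):
    """Keep resources that provide at least one requested concept."""
    return [r for r in RESOURCES if r["provides"] & wanted]


def fill_form(resource, gene_name):
    """Build the GET URL that submitting the resource's form would produce."""
    return resource["url"] + "?" + urlencode({resource["form_field"]: gene_name})


def plan_requests(wanted, gene_name):
    """Return one filled-in form URL per applicable resource."""
    return [fill_form(r, gene_name) for r in find_applicable_resources(wanted)]


print(plan_requests({"Gene Sequence", "Protein Function"}, "alfR"))
```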
Query-based Extraction
Example: “Find the alfR gene, its sequence, its protein's function, and any mutant that inhibits this gene.”
[Figure: ontology concepts matched by the query: Gene, Gene Name, Gene Sequence, Protein Function, Mutant, Mutant Function]
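
To illustrate how the example query might be mapped onto ontology concepts, here is a minimal keyword-matching sketch. The cue phrases in `CONCEPT_CUES` and the helper names are hypothetical; the slides do not say how the actual system interprets queries.

```python
import re

QUERY = ("Find the alfR gene, its sequence, its protein's function, "
         "and any mutant that inhibits this gene.")

# Hypothetical cue phrases for a few concepts named on the slide.
CONCEPT_CUES = {
    "Gene": ["gene"],
    "Gene Sequence": ["sequence"],
    "Protein Function": ["protein's function", "protein function"],
    "Mutant": ["mutant"],
}


def concepts_in_query(query):
    """Return the concepts whose cue phrases occur in the query."""
    text = query.lower()
    return [
        concept
        for concept, cues in CONCEPT_CUES.items()
        if any(cue in text for cue in cues)
    ]


def gene_name_in_query(query):
    """Return the word in a 'the <name> gene' phrase, or None if absent."""
    match = re.search(r"\bthe (\w+) gene\b", query)
    return match.group(1) if match else None


print(concepts_in_query(QUERY))
print(gene_name_in_query(QUERY))
```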
Contribution
Provides a way to automatically integrate online biological data from different sources
Provides an approach that finds suitable online resources, fills out online forms, and extracts data according to the user's query