Integration of Biological Data (LifeDB) Presented By Md. Shazzad Hosain ([email protected]) Supervised By Dr. Hasan Jamil ([email protected]) Wayne State University, Detroit, USA

Post on 21-Dec-2015


Integration of Biological Data (LifeDB)

Presented By Md. Shazzad Hosain ([email protected])

Supervised By Dr. Hasan Jamil ([email protected])

Wayne State University, Detroit, USA

04/18/23 2

Outline

Data Integration

WebFusion (our previous work)

LifeDB (our goal)

Research Scope


Data Integration Example

Task: find an air ticket from Detroit to Bologna

Alitalia (Italian airline)

Air France

Northwest Airlines

Lufthansa, etc.


Integration Example cont.

CheapAir.com / Expedia.com

Alitalia Lufthansa Air France Delta

myAirFare.com

CheapAir.com Expedia.com ……


Integration Approaches

Warehouse Integration

Mediator-based Integration

Navigational Integration


Warehouse Integration

Materialize data from all sources to local warehouse

Emphasizes data translation rather than query translation

Advantages: low network bottleneck, efficient

Disadvantages: data may not be the most up to date (reliability), system maintenance
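A minimal sketch of the warehouse idea, assuming two hypothetical sources whose records use different field names. Everything here (the `SOURCE_A`/`SOURCE_B` extracts, the `gene` table, the translation functions) is an illustrative assumption, not part of WebFusion or LifeDB. Each record is translated into one local schema at load time, so later queries run locally and need no query translation.

```python
import sqlite3

# Hypothetical extracts from two sources; field names differ on purpose.
SOURCE_A = [{"gene_id": "103730", "sym": "geneA"}]
SOURCE_B = [{"id": "200001", "symbol": "geneB", "organism": "org-B"}]


def translate_a(rec):
    """Data translation: map a source A record to the warehouse schema."""
    return (rec["gene_id"], rec["sym"], None, "A")


def translate_b(rec):
    """Data translation: map a source B record to the warehouse schema."""
    return (rec["id"], rec["symbol"], rec["organism"], "B")


def build_warehouse():
    """Materialize all source data into one local (in-memory) database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene (gene_id TEXT, symbol TEXT, organism TEXT, source TEXT)"
    )
    rows = [translate_a(r) for r in SOURCE_A] + [translate_b(r) for r in SOURCE_B]
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


warehouse = build_warehouse()
# Queries hit only the local copy; no source is contacted at query time.
symbols = [row[0] for row in warehouse.execute(
    "SELECT symbol FROM gene ORDER BY gene_id")]
```

The local copy is only as fresh as the last load, which is the reliability disadvantage listed above.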


Mediator-based Integration

Concentrates on query translation

Two approaches: GAV (global-as-view) and LAV (local-as-view)


GAV Approach

Query reformulation is easy, but adding or removing sources is difficult

Preferred when the sources are known and stable

S1 S2 S3 S4

Mediator Schema
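A rough sketch of GAV, assuming two hypothetical sources `S1` and `S2` held as in-memory lists. The mediated relation `Gene` and all field names are assumptions made for illustration. The mediator relation is defined as a view over the sources, so answering a query amounts to unfolding that view.

```python
# Hypothetical sources, each a list of rows.
S1 = [{"locus": "1", "symbol": "a"}]
S2 = [{"locus": "2", "symbol": "b"}]


def gene_view():
    """GAV: the mediated relation Gene is defined as a union over S1 and S2."""
    for src_name, src in (("S1", S1), ("S2", S2)):
        for row in src:
            yield {
                "locus_id": row["locus"],
                "symbol": row["symbol"],
                "source": src_name,
            }


# The mediator schema maps each global relation to its defining view.
MEDIATOR_SCHEMA = {"Gene": gene_view}


def query(relation, predicate):
    """Reformulation is just unfolding: run the view, then filter."""
    return [row for row in MEDIATOR_SCHEMA[relation]() if predicate(row)]


hits = query("Gene", lambda r: r["symbol"] == "b")
```

Adding a source S3 in this sketch means editing `gene_view` itself, which mirrors why source changes are the costly part of GAV.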


LAV Approach

Query reformulation is difficult, but adding or removing sources is easy

Appropriate for large-scale ad hoc integration

ARIADNE, Discovery Link, TAMBIS, KIND, etc.

Mediator Schema

S1 S2 S3 S4
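A heavily simplified sketch of LAV, assuming each source is described only by the set of mediated-schema attributes it exposes. The registry, the attribute names, and the single-source covering rule are all illustrative assumptions; real LAV rewriting (answering queries using views) is considerably more involved. The point shown is that registering a source is one independent step, while the mediator has to work out at query time which sources can answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDescription:
    name: str
    attributes: frozenset  # mediated-schema attributes the source exposes


REGISTRY = []


def register(name, attributes):
    """Adding a source touches nothing but its own description."""
    REGISTRY.append(SourceDescription(name, frozenset(attributes)))


def usable_sources(wanted):
    """Toy reformulation: keep sources whose description covers the query."""
    wanted = frozenset(wanted)
    return [s.name for s in REGISTRY if wanted <= s.attributes]


register("S1", {"locus_id", "symbol"})
register("S2", {"locus_id", "symbol", "organism"})
register("S3", {"locus_id", "organism"})

plan = usable_sources({"symbol", "organism"})
```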


Navigational Integration

Some sources provide information that would be inaccessible, or hardly accessible, without point-and-click navigation


WebFusion

Dr. Liangyou Chen


LinkDB

DBGET

KEGG Pathways

Can these be done electronically for a biologist?


Go to: http://www.ncbi.nlm.nih.gov/LocusLink/


Click <Register Web Process> menu


1. Input: 103730

2. Press <Pickup Input> button


1. Press <Next> button

2. Press [Go] button


1. Mark the table

2. Press <Pickup Table> Button

Press the <Create> button

1. Uncheck all boxes except 2~6

2. Press the <Update & Redraw> button


1. Give it a name called: LocusLink

2. Name the columns Link, LocusID, Org, Symbol, and Description, respectively

3. Select appropriate transformations

4. Press <Update & Redraw> button


Press <Confirm & Create Table>


LocusLink web process is created
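What the registration steps above produce can be approximated in code: mark a table on a result page, keep columns 2 to 6, and name them Link, LocusID, Org, Symbol, and Description. The sketch below is an assumption about the general idea, not WebFusion's implementation; the `TableRows` parser, the `extract_locuslink` helper, and the sample HTML with its placeholder values are all hypothetical.

```python
from html.parser import HTMLParser

COLUMN_NAMES = ["Link", "LocusID", "Org", "Symbol", "Description"]


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_locuslink(html):
    """Keep columns 2 to 6 (1-based) of each row and attach the names."""
    parser = TableRows()
    parser.feed(html)
    return [dict(zip(COLUMN_NAMES, row[1:6]))
            for row in parser.rows if len(row) >= 6]


# Placeholder page content; only the ID 103730 comes from the walkthrough.
SAMPLE_HTML = (
    "<table><tr><td>x</td><td>link-1</td><td>103730</td>"
    "<td>org-1</td><td>symbol-1</td><td>description-1</td></tr></table>"
)
records = extract_locuslink(SAMPLE_HTML)
```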


1. Select ‘LocusLink’ table

2. Type in ‘LocusLinkQuery’ as a query name

3. Check these fields to display

4. Double click here


1. Select ‘local_gene_ids’ table

2. Select ‘LID’ field

3. Click here (any place)


Click <This Query> button


Press <Execute> button
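Read as a query, the steps above appear to feed each LID from the local_gene_ids table into the LocusLink web process and display selected fields. A hedged approximation is an ordinary join, shown here with SQLite standing in for both the local table and the web process. The join condition, the displayed fields, and every row value except the ID 103730 are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE local_gene_ids (LID TEXT);
CREATE TABLE LocusLink (
    Link TEXT, LocusID TEXT, Org TEXT, Symbol TEXT, Description TEXT
);
INSERT INTO local_gene_ids VALUES ('103730');
INSERT INTO LocusLink VALUES
    ('link-1', '103730', 'org-1', 'symbol-1', 'description-1');
INSERT INTO LocusLink VALUES
    ('link-2', '999999', 'org-2', 'symbol-2', 'description-2');
""")

# Stand-in for 'LocusLinkQuery': look up every local gene ID in LocusLink.
LOCUSLINK_QUERY = """
SELECT l.LocusID, l.Symbol, l.Description
FROM local_gene_ids AS g
JOIN LocusLink AS l ON l.LocusID = g.LID
"""
results = conn.execute(LOCUSLINK_QUERY).fetchall()
```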


In-progress results are shown here


LifeDB


Resource Discovery

Automatic Schema/Ontology Matching

Query Optimization

WorkFlows

LifeDB

BioFlow (A declarative WorkFlow Language)


Glimpse of BioFlow

GeneBankURL FlyBaseURL

DNA sequence repositories

EMBL format, GeneBank format

Combine these sequences

Reading Frame Predictor (input_seq : FASTA format, species)

Score and predicted DNA region

University of Minnesota


BioFlow

workflow open_reading_frame ;
use ontology BioSystems ;
declare found logical, count int ;
define data sequences_1 at GeneBankURL as (seq_1 DNA) ;
define tool orf at URL
    parameter (seq DNA, target organism)
    results (score int, predicted_region DNA) ;
combine sequences_1, sequences_2 into sequences (seqs) ;
select seqs, orf (seqs, "drosophila") from sequences ;

The goal is to develop a formal BioFlow language syntax with compositionality, the closure property, and type safety
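To make the data flow of the open_reading_frame workflow concrete, here is a rough Python emulation. It is not BioFlow and says nothing about BioFlow's semantics. The two sequence lists stand in for the remote repositories, and the `orf` function is a toy stand-in for the remote reading frame predictor: it scans from the first ATG to the first in-frame stop codon, uses the region length as the score, and ignores the target organism. All of these are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Stand-ins for 'define data ... at <URL>'; the sequences are made up.
sequences_1 = ["CCATGAAATAGGG"]
sequences_2 = ["GGGCCC"]


def orf(seq, target):
    """Toy predictor returning (score, predicted_region); target is unused."""
    start = seq.find("ATG")
    if start == -1:
        return (0, "")
    # Walk codon by codon, in frame with the start codon.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            region = seq[start:i + 3]
            return (len(region), region)
    return (0, "")


# combine sequences_1, sequences_2 into sequences (seqs)
sequences = sequences_1 + sequences_2

# select seqs, orf (seqs, "drosophila") from sequences
results = [(seqs, orf(seqs, "drosophila")) for seqs in sequences]
```

In this emulation each step takes a collection and returns a collection, which gives a small illustration of the closure property named in the goal.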


Research Scope

Resource Discovery

Automatic Schema/Ontology Matching

Query Optimization

WorkFlows

7-8 PhD positions

3-5 years of funding


Thanks to all