towards a semantic web application: ontology-driven ortholog clustering analysis

20
University of Michigan Medical University of Michigan Medical School School 1 Towards a Semantic Web Towards a Semantic Web application: Ontology- application: Ontology- driven ortholog driven ortholog clustering analysis clustering analysis Yu Lin, Zuoshuang Xiang, Yu Lin, Zuoshuang Xiang, Yongqun He Yongqun He

Upload: wilton

Post on 05-Feb-2016

37 views

Category:

Documents


0 download

DESCRIPTION

Towards a Semantic Web application: Ontology-driven ortholog clustering analysis. Yu Lin, Zuoshuang Xiang, Yongqun He. Outline. Background of COG ( Clusters of Orthologous Groups ) database COG-based gene set enrichment analysis COG Analysis Ontology (CAO) - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical SchoolUniversity of Michigan Medical School 11

Towards a Semantic Web Towards a Semantic Web application: Ontology-application: Ontology-

driven ortholog clustering driven ortholog clustering analysisanalysis

Yu Lin, Zuoshuang Xiang, Yu Lin, Zuoshuang Xiang, Yongqun HeYongqun He

Page 2: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical School 2

OutlineOutline

► Background of COG (Background of COG (Clusters of Clusters of Orthologous Groups Orthologous Groups ) database) database

► COG-based gene set enrichment COG-based gene set enrichment analysis analysis

► COG Analysis Ontology (CAO)COG Analysis Ontology (CAO)► OntoCOG, the semantic web OntoCOG, the semantic web

application for COG enrichment application for COG enrichment analysisanalysis

Page 3: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical School 3

Ortholog & COG databaseOrtholog & COG database► ortholog : ortholog : Orthologs are Orthologs are

genes in different genes in different species that have species that have evolved from a common evolved from a common ancestral gene by ancestral gene by speciation. Orthologs speciation. Orthologs usually share the same usually share the same functions in the course functions in the course of evolution. of evolution.

► COG database: COG database: 1) collections of 1) collections of orthologs 2) clusters orthologs 2) clusters orthologs to functional orthologs to functional groups.groups.

► Entry in COG has COG Entry in COG has COG ID, or may have a ID, or may have a functional category functional category assignment.assignment.

Page 4: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

COG vs. GO► Same: Classified categories with gene product

assigned, provide gene function annotation and classification.

► Different: Categories Species:

GO: model animals; COG: 66 genomes.(COG covers more bacteria.)

Only Schizosaccharomyces pombe (fission yeast), Saccharomyces cerevisiae (baker's yeast) and E. coli, have both COG and GO annotations.

In Brucella, only one gene BMEI0467 in B. melitensis has been annotated both in GO and COG. GO:0042803 : protein homodimerization activity COG0408: Coproporphyrinogen III oxidase (Coenzyme transport and metabolism H)

Page 5: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical School 5

COG enrichment analysisCOG enrichment analysis

Given Given list list

Not Not given listgiven list

Total Total

catA catA qq m-qm-q MM

Not Not catAcatA

k-qk-q t-m-(k-q)t-m-(k-q) t-mt-m

totaltotal KK t-kt-k TT

Fisher’s Exact TestContingency table

COG enrichment analysis is to find out the statistical significance of the distribution of the data, particularly, the p-value to test whether COG category catA annotated protein q is enriched (unevenly distributed) among the given protein list t.

Given a list of k COG annotated proteins with a total of t proteins, for a given COG category A, there are q proteins within k and m proteins within t associated with it.

Page 6: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical SchoolUniversity of Michigan Medical School 66

A lot of GO enrichment analysis services are

available, but not the COG enrichment analysis service

Page 7: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical School 7

Design of OntoCOGDesign of OntoCOG

data transformation

RDF data

Input data

User defined protein list, gene list ...

plain txt file

OWL reasoning

OUTPUT: RDF dataUser layer

Application layer

Server

backendRDMS

RDF repository

SPARQLendpoint

COG enrichment analysis

Database layer

OntoCOG : a Semantic Web service application for COG enrichment analysis.

Input data: a list of

protein defined by

user for COG enrichment

analysis

Output data: proteins

grouped by COG

category with a p-value in OWL format.

CAO (COG Analysis

Ontology) supported

Page 8: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical School 8

COG Analysis OntologyCOG Analysis Ontology (CAO))

►Scope1) ontology-based software/service design; 2) supporting data integration and exchange in OWL format.

►Domainstatistical analysisprotein’s COG annotation

Page 9: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

Design of CAO

CAO includes models for major components of the OntoCOG application: input data transformation, Fisher’s exact test analysis, and minimum information of output data. Terms in yellow, light purple, and green boxes denote processes, generically dependent continuants, and independent continuants, respectively.

COG enrichment data transformation using Fisher's exact test

user defined protein list clustered by COG category

COG enrichment Fisher's test p-value

Fisher's exact test

has_specified_output

has_specified_input

has_part

background data set clustered by COG category

has_specified_input

user defined protein list clustered by COG category E

is_a

is about

COG mapping data transformation

user defined gene list

is_specific_input_of

has_specified_output

user defined data set for COG enrichment analysis

is_a

user defined COG annotated protein list

is_a

is_a

COG category E clustered protein, user defined

is_about

has_member

denoted_by

is_a

denoted_by

data transformation

Fisher’s exact test analysis

COG category E clustered protein group, user defined

COG category clustered protein, user defined COG Functional Category

Amino acid transport and metabolism

is_member_of

is_a

Minimum info of output data

datatype propertiesobjective properties

has_size sample size

dataProtperty termclass term

SPAN:process

independent continuant

generically dependent continuant

Page 10: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical School 10

COG Analysis OntologyCOG Analysis Ontology (CAO) :) :Core TermsCore Terms

Page 11: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

Information captured by CAO

► The given list► The proteins grouped by COG

categories► The size of each category in the

given list *► The p-value of each category in

the given list * It captures more information than

traditional COG enrichment analysis (non-SW technology supported)

The traditional output of COG enrichment analysis.

Format: Category: p-value; size; (* denotes p-value < 0.05 significant)

Page 12: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical School 12

New relations in CAONew relations in CAO

► denoted_by denoted_by describes a relation of an independent entity and describes a relation of an independent entity and

a data itema data iteman independent entity “denoted_by” a data iteman independent entity “denoted_by” a data item

Not a reverse relation of “denotes”Not a reverse relation of “denotes”

► is_member_ofis_member_ofhas_memberhas_member Reverse relationsReverse relations Describes relation of object and object aggregateDescribes relation of object and object aggregate

Page 13: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical School 13

Axioms in CAOAxioms in CAO

► COG category clustered protein, user COG category clustered protein, user defined ≡ user defined protein defined ≡ user defined protein andand (denoted_by (denoted_by somesome COG Functional category) COG Functional category)

► COG category E clustered protein, user COG category E clustered protein, user defined ≡ COG category protein and defined ≡ COG category protein and (denoted_by min 1 COG Amino acid (denoted_by min 1 COG Amino acid transport and metabolism)transport and metabolism)

► COG category E clustered protein group, COG category E clustered protein group, user defined ≡ protein group and user defined ≡ protein group and (has_member only COG category E protein)(has_member only COG category E protein)

Page 14: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical School 14

Validation of CAOValidation of CAO

Page 15: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

Summaries on CAO

►An ontology to represent COG enrichment analysis

►An ontology to represent the COG enrichment analysis service : OntoCOG

►It is a use case of IAO (Information Artifact Ontology) and OBI (Ontology for Biomedical Investigation)

►It supports OntoCOG.

Page 16: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical School 16

OntoCOGOntoCOGhttp://ontobat.hegroup.org/ontocog/index.php

Page 17: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical School 17

OntoCOG analysis of OntoCOG analysis of BrucellaBrucella virulence factorsvirulence factors

Page 18: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

Result

Page 19: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

University of Michigan Medical School 19

Final ConclusionFinal Conclusion

►OntoCOG provide a platform OntoCOG provide a platform independent server for COG enrichment independent server for COG enrichment analysisanalysis

►CAO ontology supports the design and CAO ontology supports the design and workflow of OntoCOG. workflow of OntoCOG.

►OntoCOG is the first semantic web OntoCOG is the first semantic web application used for such purpose.application used for such purpose.

►Future work: interface developing; Future work: interface developing; expand to other statistical analysis; expand to other statistical analysis; output data visualization.output data visualization.

Page 20: Towards a Semantic Web application: Ontology-driven ortholog clustering analysis

AcknowledgementAcknowledgement

University of Michigan Medical School 20

• The OntoCOG project is supported by NIH grant 1R01AI081062.

• People: Yu Lin Yongqun “Oliver” He Zuoshuang “Allen” Xiang

• Special thanks to ICBO Committee• Thank Dr. Barry Smith for correcting the English in our manuscript.