towards a semantic web application: ontology-driven ortholog clustering analysis
DESCRIPTION
Towards a Semantic Web application: Ontology-driven ortholog clustering analysis. Yu Lin, Zuoshuang Xiang, Yongqun He. Outline. Background of COG ( Clusters of Orthologous Groups ) database COG-based gene set enrichment analysis COG Analysis Ontology (CAO) - PowerPoint PPT PresentationTRANSCRIPT
University of Michigan Medical SchoolUniversity of Michigan Medical School 11
Towards a Semantic Web Towards a Semantic Web application: Ontology-application: Ontology-
driven ortholog clustering driven ortholog clustering analysisanalysis
Yu Lin, Zuoshuang Xiang, Yu Lin, Zuoshuang Xiang, Yongqun HeYongqun He
University of Michigan Medical School 2
OutlineOutline
► Background of COG (Background of COG (Clusters of Clusters of Orthologous Groups Orthologous Groups ) database) database
► COG-based gene set enrichment COG-based gene set enrichment analysis analysis
► COG Analysis Ontology (CAO)COG Analysis Ontology (CAO)► OntoCOG, the semantic web OntoCOG, the semantic web
application for COG enrichment application for COG enrichment analysisanalysis
University of Michigan Medical School 3
Ortholog & COG databaseOrtholog & COG database► ortholog : ortholog : Orthologs are Orthologs are
genes in different genes in different species that have species that have evolved from a common evolved from a common ancestral gene by ancestral gene by speciation. Orthologs speciation. Orthologs usually share the same usually share the same functions in the course functions in the course of evolution. of evolution.
► COG database: COG database: 1) collections of 1) collections of orthologs 2) clusters orthologs 2) clusters orthologs to functional orthologs to functional groups.groups.
► Entry in COG has COG Entry in COG has COG ID, or may have a ID, or may have a functional category functional category assignment.assignment.
COG vs. GO► Same: Classified categories with gene product
assigned, provide gene function annotation and classification.
► Different: Categories Species:
GO: model animals; COG: 66 genomes.(COG covers more bacteria.)
Only Schizosaccharomyces pombe (fission yeast), Saccharomyces cerevisiae (baker's yeast) and E. coli, have both COG and GO annotations.
In Brucella, only one gene BMEI0467 in B. melitensis has been annotated both in GO and COG. GO:0042803 : protein homodimerization activity COG0408: Coproporphyrinogen III oxidase (Coenzyme transport and metabolism H)
University of Michigan Medical School 5
COG enrichment analysisCOG enrichment analysis
Given Given list list
Not Not given listgiven list
Total Total
catA catA qq m-qm-q MM
Not Not catAcatA
k-qk-q t-m-(k-q)t-m-(k-q) t-mt-m
totaltotal KK t-kt-k TT
Fisher’s Exact TestContingency table
COG enrichment analysis is to find out the statistical significance of the distribution of the data, particularly, the p-value to test whether COG category catA annotated protein q is enriched (unevenly distributed) among the given protein list t.
Given a list of k COG annotated proteins with a total of t proteins, for a given COG category A, there are q proteins within k and m proteins within t associated with it.
University of Michigan Medical SchoolUniversity of Michigan Medical School 66
A lot of GO enrichment analysis services are
available, but not the COG enrichment analysis service
University of Michigan Medical School 7
Design of OntoCOGDesign of OntoCOG
data transformation
RDF data
Input data
User defined protein list, gene list ...
plain txt file
OWL reasoning
OUTPUT: RDF dataUser layer
Application layer
Server
backendRDMS
RDF repository
SPARQLendpoint
COG enrichment analysis
Database layer
OntoCOG : a Semantic Web service application for COG enrichment analysis.
Input data: a list of
protein defined by
user for COG enrichment
analysis
Output data: proteins
grouped by COG
category with a p-value in OWL format.
CAO (COG Analysis
Ontology) supported
University of Michigan Medical School 8
COG Analysis OntologyCOG Analysis Ontology (CAO))
►Scope1) ontology-based software/service design; 2) supporting data integration and exchange in OWL format.
►Domainstatistical analysisprotein’s COG annotation
Design of CAO
CAO includes models for major components of the OntoCOG application: input data transformation, Fisher’s exact test analysis, and minimum information of output data. Terms in yellow, light purple, and green boxes denote processes, generically dependent continuants, and independent continuants, respectively.
COG enrichment data transformation using Fisher's exact test
user defined protein list clustered by COG category
COG enrichment Fisher's test p-value
Fisher's exact test
has_specified_output
has_specified_input
has_part
background data set clustered by COG category
has_specified_input
user defined protein list clustered by COG category E
is_a
is about
COG mapping data transformation
user defined gene list
is_specific_input_of
has_specified_output
user defined data set for COG enrichment analysis
is_a
user defined COG annotated protein list
is_a
is_a
COG category E clustered protein, user defined
is_about
has_member
denoted_by
is_a
denoted_by
data transformation
Fisher’s exact test analysis
COG category E clustered protein group, user defined
COG category clustered protein, user defined COG Functional Category
Amino acid transport and metabolism
is_member_of
is_a
Minimum info of output data
datatype propertiesobjective properties
has_size sample size
dataProtperty termclass term
SPAN:process
independent continuant
generically dependent continuant
University of Michigan Medical School 10
COG Analysis OntologyCOG Analysis Ontology (CAO) :) :Core TermsCore Terms
Information captured by CAO
► The given list► The proteins grouped by COG
categories► The size of each category in the
given list *► The p-value of each category in
the given list * It captures more information than
traditional COG enrichment analysis (non-SW technology supported)
The traditional output of COG enrichment analysis.
Format: Category: p-value; size; (* denotes p-value < 0.05 significant)
University of Michigan Medical School 12
New relations in CAONew relations in CAO
► denoted_by denoted_by describes a relation of an independent entity and describes a relation of an independent entity and
a data itema data iteman independent entity “denoted_by” a data iteman independent entity “denoted_by” a data item
Not a reverse relation of “denotes”Not a reverse relation of “denotes”
► is_member_ofis_member_ofhas_memberhas_member Reverse relationsReverse relations Describes relation of object and object aggregateDescribes relation of object and object aggregate
University of Michigan Medical School 13
Axioms in CAOAxioms in CAO
► COG category clustered protein, user COG category clustered protein, user defined ≡ user defined protein defined ≡ user defined protein andand (denoted_by (denoted_by somesome COG Functional category) COG Functional category)
► COG category E clustered protein, user COG category E clustered protein, user defined ≡ COG category protein and defined ≡ COG category protein and (denoted_by min 1 COG Amino acid (denoted_by min 1 COG Amino acid transport and metabolism)transport and metabolism)
► COG category E clustered protein group, COG category E clustered protein group, user defined ≡ protein group and user defined ≡ protein group and (has_member only COG category E protein)(has_member only COG category E protein)
University of Michigan Medical School 14
Validation of CAOValidation of CAO
Summaries on CAO
►An ontology to represent COG enrichment analysis
►An ontology to represent the COG enrichment analysis service : OntoCOG
►It is a use case of IAO (Information Artifact Ontology) and OBI (Ontology for Biomedical Investigation)
►It supports OntoCOG.
University of Michigan Medical School 16
OntoCOGOntoCOGhttp://ontobat.hegroup.org/ontocog/index.php
University of Michigan Medical School 17
OntoCOG analysis of OntoCOG analysis of BrucellaBrucella virulence factorsvirulence factors
Result
University of Michigan Medical School 19
Final ConclusionFinal Conclusion
►OntoCOG provide a platform OntoCOG provide a platform independent server for COG enrichment independent server for COG enrichment analysisanalysis
►CAO ontology supports the design and CAO ontology supports the design and workflow of OntoCOG. workflow of OntoCOG.
►OntoCOG is the first semantic web OntoCOG is the first semantic web application used for such purpose.application used for such purpose.
►Future work: interface developing; Future work: interface developing; expand to other statistical analysis; expand to other statistical analysis; output data visualization.output data visualization.
AcknowledgementAcknowledgement
University of Michigan Medical School 20
• The OntoCOG project is supported by NIH grant 1R01AI081062.
• People: Yu Lin Yongqun “Oliver” He Zuoshuang “Allen” Xiang
• Special thanks to ICBO Committee• Thank Dr. Barry Smith for correcting the English in our manuscript.