Iowa State University, Department of Computer Science, Artificial Intelligence Research Laboratory
Collaborative Construction of Large Biological Ontologies
Jie Bao (a)
This work is in collaboration with Zhiliang Hu (b), LaRon Hughes (b), Doina Caragea (c), Peter Wong (a), James Reecy (b),
and Vasant G Honavar (a)
(a) Artificial Intelligence Research Laboratory, Department of Computer Science; Center for Computational Intelligence, Learning, and Discovery
(b) Department of Animal Science, Iowa State University, Ames, IA 50011, USA
(c) Department of Computing and Information Sciences, Kansas State University, Manhattan, KS 66506
Email: {baojie, zhu, laron, pwwong, jreecy, honavar}@iastate.edu, [email protected]
2
Outline
• Collaborative Ontology Building (COB) Desiderata
• Limitations of CVS-based Collaboration
• COB based on Modular Ontologies
• The COB Editor
3
Large Biological Ontologies
• Gramineae Taxonomy
• Plant Ontology
• Gene Ontology
• MGED Ontology (microarray)
4
Example: Gene Ontology
5
Non-collaborative Ontology Building
Download Ontology → Local Editing (single curator; e.g., Protégé, OBO-Edit) → Upload Ontology
6
Collaboration In Need
Example: Gene Ontology Consortium
7
Collaboration In Need (2)
Example 2: an animal trait ontology that involves multiple research groups across the world
[Figure: species modules for Swine, Cattle, Chicken, and Horse]
Each group works on the ontology module for a particular species, according to the group's expertise.
8
Challenges
• Knowledge Integration
• Concurrency Management
• Consistency Maintenance
• Privilege Management
• History Maintenance
• Scalability
9
Solutions
1. Pipeline <= Very limited collaboration
• Divide the ontology building process into sequential phases;
• Each phase is assigned to a particular contributor.
2. CVS <= Collaboration with high cost
• CVS = Concurrent Versions System;
• Treat an ontology as a single file/document;
• Use collaborative tools like CVS to build the ontology.
3. Modular Ontology <= Our approach
• Build the ontology with fine-grained modules;
• Different contributors can concurrently edit different modules.
10
Outline
• Collaborative Ontology Building (COB) Desiderata
• Limitations of CVS-based Collaboration
• COB based on Modular Ontologies
• The COB Editor
11
CVS-based Ontology Building
User (submits change suggestions in natural language):
• Get SourceForge Account → Submit Change Request → Track the Request
Curator:
• Get GO CVS Account → Get SourceForge Account → Set Up CVS Access
• Take a Change Request → Download Whole GO Flat File → Local Editing → Make Local Log File → Save GO Flat File → Manual Version Control → Commit the Whole New Ontology to CVS
12
Unprincipled Authorization and Organization
• No principled mechanism to enforce curator privilege assignments;
• No clear organizational division of the whole ontology into smaller manageable units.
13
Risk of Inconsistency
• No principled way to avoid unintended couplings and overwriting.
• The validity and consistency of the ontology depend heavily on
– curator discipline, and
– good community communication (e.g., via email lists).
14
Lack of Partial Editing/Reuse
• A curator has to
– download the entire ontology before editing, and
– submit the entire modified ontology after editing;
• A user cannot download and reuse only a selected subset of the ontology.
• High communication and memory overhead!
15
Expensive History Maintenance
• Even a minor edit of the ontology causes the ontology file to be replicated in its entirety
• Tracing the change history of a term requires comparing entire versions of the ontology file (e.g., with diff)
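To make this cost concrete, here is a minimal Python sketch that reconstructs one term's history the way whole-file versioning forces you to: by diffing every pair of consecutive full snapshots with the standard library's difflib.ndiff. It assumes a hypothetical flat file with one "id name" line per term and hypothetical TERM:000n identifiers; it illustrates the cost argument only and is not how GO or CVS tooling is actually implemented.

```python
import difflib


def term_history(snapshots: list[list[str]], term_id: str) -> list[tuple[int, list[str]]]:
    """Trace edits to one term by diffing consecutive whole-file snapshots.

    Each snapshot is a full copy of the ontology file (one string per line),
    as produced by committing the whole file on every change.
    """
    history = []
    for version in range(1, len(snapshots)):
        # ndiff walks every line of both snapshots, even if only one term changed.
        changed = [
            line
            for line in difflib.ndiff(snapshots[version - 1], snapshots[version])
            if line.startswith(("- ", "+ ")) and term_id in line
        ]
        if changed:
            history.append((version, changed))
    return history


# Hypothetical flat file format: one "id name" line per term.
v0 = ["TERM:0001 egg weight", "TERM:0002 litter size", "TERM:0003 backfat thickness"]
v1 = ["TERM:0001 egg weight", "TERM:0002 number born alive", "TERM:0003 backfat thickness"]
v2 = ["TERM:0001 egg mass", "TERM:0002 number born alive", "TERM:0003 backfat thickness"]

print(term_history([v0, v1, v2], "TERM:0002"))
```

Every comparison touches every line of both snapshots, so the work grows with the size of the ontology times the number of versions, even when only a single term ever changed.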
16
Limited Participation
• Since all editing has global effect, it is difficult to
– grant privileges of different scope to different types of users (e.g., core curators versus normal curators);
– accept/deny/modify/revert local changes made by other curators.
• The curator community has to be limited to a small number of trusted curators.
17
Outline
• Collaborative Ontology Building (COB) Desiderata
• Limitations of CVS-based Collaboration
• COB based on Modular Ontologies
• The COB Editor
18
Basic Strategy
• Localize the interactions among different parts of a large ontology.
• Build an ontology with fine-grained organizational structure.
• Allow group collaboration on different ontology modules.
19
Package-based Ontologies
• The whole ontology consists of a set of packages
• Each package represents a fragment of the whole ontology
• Each term has a "home package"
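[Figure: the Animal Trait Ontology divided into General, Cattle, Pig, and Chicken packages; e.g., the term Egg has home package Chicken, and the term Reproduction has home package General]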
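The package structure above can be modeled with two small Python dataclasses. This is a minimal sketch, not COB's actual data model: the class and method names are hypothetical, and the example terms (Egg in Chicken, Reproduction in General) are one reading of the slide's figure. What it shows is the invariant that each term has exactly one home package.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """One fragment of the whole ontology (hypothetical model, not COB's schema)."""
    name: str
    terms: set[str] = field(default_factory=set)


@dataclass
class PackageOntology:
    """A whole ontology made of packages; every term records its home package."""
    packages: dict[str, Package] = field(default_factory=dict)
    home: dict[str, str] = field(default_factory=dict)  # term -> home package name

    def add_package(self, name: str) -> Package:
        self.packages[name] = Package(name)
        return self.packages[name]

    def add_term(self, term: str, package: str) -> None:
        # A term has exactly one home package; declaring it in a second one is an error.
        current = self.home.get(term)
        if current is not None and current != package:
            raise ValueError(f"{term!r} already has home package {current!r}")
        self.packages[package].terms.add(term)
        self.home[term] = package

    def home_package(self, term: str) -> str:
        return self.home[term]


ato = PackageOntology()
for name in ["General", "Cattle", "Pig", "Chicken"]:
    ato.add_package(name)
ato.add_term("Egg", "Chicken")
ato.add_term("Reproduction", "General")
print(ato.home_package("Egg"))
```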
20
Package Nesting
• A nested package is a part of another package.
• Nesting can be used to represent the organizational structure of an ontology:
– arrange knowledge;
– enforce hierarchical management of knowledge.
[Figure: Animal trait ontology with packages General and Pig; Pig Health nested in Pig]
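Nesting turns the packages into a tree. The minimal Python sketch below, with hypothetical class and method names, stores a parent link per package and answers two questions: where a package sits in the hierarchy, and whether one package is part of another. It assumes General and Pig are siblings under the root and Pig Health is nested in Pig, which is our reading of the slide's figure.

```python
from typing import Optional


class Package:
    """A package that may be nested in a parent package (hypothetical sketch)."""

    def __init__(self, name: str, parent: Optional["Package"] = None):
        self.name = name
        self.parent = parent
        self.children: list[Package] = []
        if parent is not None:
            parent.children.append(self)

    def path(self) -> list[str]:
        """Package names from the outermost package down to this one."""
        names = []
        node: Optional[Package] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]

    def is_part_of(self, other: "Package") -> bool:
        """True if this package is `other` or is (transitively) nested inside it."""
        node: Optional[Package] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


ato = Package("Animal trait ontology")
general = Package("General", ato)
pig = Package("Pig", ato)
pig_health = Package("Pig Health", pig)
print(" / ".join(pig_health.path()))
```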
21
Division of Labor
• A package can be assigned to curators with the best knowledge of the relevant sub-domain
– e.g., Pig Health, Pig Reproduction
• The package hierarchy helps to manage interactions among experts with different degrees of expertise
– e.g., Pig, Pig Health
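One way the package hierarchy could drive privileges: a curator assigned to a package may edit that package and everything nested inside it, but not its siblings or enclosing packages. The Python sketch below is a hypothetical model of that rule, not COB's access-control code; the curator names and the nesting of Pig Health and Pig Reproduction under Pig are assumptions.

```python
from typing import Optional


class Package:
    """Package with its own curator assignments (hypothetical model)."""

    def __init__(self, name: str, parent: Optional["Package"] = None):
        self.name = name
        self.parent = parent
        self.curators: set[str] = set()


def can_edit(user: str, package: Package) -> bool:
    """A curator of a package may also edit every package nested inside it."""
    node: Optional[Package] = package
    while node is not None:
        if user in node.curators:
            return True
        node = node.parent
    return False


ato = Package("Animal trait ontology")
pig = Package("Pig", ato)
pig_health = Package("Pig Health", pig)
pig_reproduction = Package("Pig Reproduction", pig)

ato.curators.add("admin")               # hypothetical ontology administrator
pig.curators.add("pig_expert")          # hypothetical general pig expert
pig_health.curators.add("health_expert")  # hypothetical pig health specialist
```

With this rule, the pig health specialist is confined to Pig Health, while the pig expert and the administrator can reach it from above.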
22
Partial Reuse
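[Figure: reusing the Animal Trait Ontology (packages General, Cattle, Pig, Chicken) in a Pork ontology. With the centralized ontology, the ontology can only be reused as a whole; with the package-based ontology, the Pork ontology uses semantic importing to incorporate only selected packages. Legend: knowledge incorporated in the Pork ontology vs. knowledge not present in the Pork ontology.]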
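Partial reuse amounts to following import links from the reusing ontology and taking only what is reachable. The Python sketch below computes that set with a breadth-first search. The import graph is hypothetical (the Pork ontology imports Pig, which imports General), and plain reachability is a simplification: it only decides which packages are needed, not how the imported axioms are interpreted.

```python
from collections import deque


def import_closure(imports: dict[str, list[str]], start: str) -> set[str]:
    """All packages `start` depends on, following import edges transitively (BFS)."""
    needed: set[str] = set()
    queue = deque(imports.get(start, []))
    while queue:
        pkg = queue.popleft()
        if pkg in needed:
            continue  # already visited; also guards against import cycles
        needed.add(pkg)
        queue.extend(imports.get(pkg, []))
    needed.discard(start)  # a cycle back to `start` does not make it its own dependency
    return needed


# Hypothetical import graph: package -> packages it imports.
imports = {
    "Pork": ["Pig"],
    "Pig": ["General"],
    "Cattle": ["General"],
    "Chicken": ["General"],
    "General": [],
}
print(sorted(import_closure(imports, "Pork")))
```

Under these assumptions, the Cattle and Chicken packages are never transferred or parsed when the Pork ontology is loaded.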
23
Scalability
• Reduction in communication overhead and computational time cost
– parsing
– transferring
– consistency checking
• Reduction in memory requirements
– the ontology can be partially loaded into memory
• Reduction in history tracking cost
– the effect of changes is localized
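Partial loading can be as simple as fetching a package the first time it is used and caching it. The Python sketch below is hypothetical: the `fetch` callback stands in for whatever retrieves one package from the server, and the store contents are made up for illustration.

```python
from typing import Callable


class LazyOntology:
    """Fetches and caches a package only when it is first used (hypothetical sketch)."""

    def __init__(self, fetch: Callable[[str], list[str]]):
        self._fetch = fetch  # assumed callback that retrieves one package's terms
        self._cache: dict[str, list[str]] = {}

    def terms(self, package: str) -> list[str]:
        if package not in self._cache:
            self._cache[package] = self._fetch(package)
        return self._cache[package]

    def loaded_packages(self) -> set[str]:
        return set(self._cache)


# Made-up stand-in for a remote store holding every package of the ontology.
store = {
    "General": ["Reproduction"],
    "Pig": ["Litter size"],
    "Cattle": ["Milk yield"],
    "Chicken": ["Egg"],
}
fetch_log: list[str] = []


def fetch(package: str) -> list[str]:
    fetch_log.append(package)  # record each round trip to the store
    return list(store[package])


onto = LazyOntology(fetch)
onto.terms("Pig")
onto.terms("Pig")  # served from the cache; no second fetch
```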
24
Broadened Participation
• Open-community collaboration has proven successful in DMOZ and Wikipedia
• Package-based ontology management can
– control the scope of an editing action;
– minimize the risk of vandalism.
• Better tradeoff between broader participation and ontology quality
– there are different levels of curators, e.g., ontology admins, pig experts, pig health experts;
– an editing action can be approved or denied by a curator with higher privileges.
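The approve/deny step can be expressed as a check on the package hierarchy: a reviewer counts as higher-privileged only if assigned to a package that strictly encloses the edited one. This Python sketch uses a hypothetical hierarchy, hypothetical curator names, and a made-up edit description; it is one plausible policy, not the policy COB enforces.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical package hierarchy (child -> parent) and curator assignments.
PARENT: dict[str, Optional[str]] = {
    "Animal Trait Ontology": None,
    "Pig": "Animal Trait Ontology",
    "Pig Health": "Pig",
}
ASSIGNED: dict[str, set[str]] = {
    "admin": {"Animal Trait Ontology"},
    "pig_expert": {"Pig"},
    "pig_health_expert": {"Pig Health"},
}


def enclosing_packages(package: str) -> set[str]:
    """Packages that strictly contain `package`."""
    result = set()
    parent = PARENT[package]
    while parent is not None:
        result.add(parent)
        parent = PARENT[parent]
    return result


@dataclass
class Edit:
    author: str
    package: str
    description: str
    status: str = "pending"


def review(edit: Edit, reviewer: str, approve: bool) -> None:
    # Higher privilege = assigned to a package that encloses the edited one.
    if not ASSIGNED.get(reviewer, set()) & enclosing_packages(edit.package):
        raise PermissionError(f"{reviewer} cannot review edits to {edit.package}")
    edit.status = "approved" if approve else "denied"
```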
25
Outline
• Collaborative Ontology Building (COB) Desiderata
• Limitations of CVS-based Collaboration
• COB based on Modular Ontologies
• The COB Editor
26
The COB Editor
[Screenshot: the COB Editor showing the Pig, Cattle, and Chicken packages]
[Bao et al. BIDM06]
27
Collaborative Ontology Building
Ontology modularity facilitates collaborative building:
• Each package can be independently developed.
• Different curators can concurrently edit different packages of the ontology.
• The ontology can be partially loaded.
• Unwanted interactions are minimized by limiting term and axiom visibility.
• Module access privileges can be controlled by the package hierarchy.
28
Work with COB Editor
Download
• http://www.animalgenome.org/bioinfo/projects/ATO/
• http://sourceforge.net/projects/cob (source code)
Curator workflow: Get Ontology Account → Check out a package → Create new package or Lock Package → Edit the Package → Commit the Package → (Auto) Server Change Log
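The lock-edit-commit cycle can be sketched as an in-memory server in Python. Everything here is hypothetical (the class, its methods, and the log format): it assumes a package is locked by one curator at a time, that creating a package gives its creator the lock, and that the server appends to its change log on every operation without curator involvement.

```python
class PackageServer:
    """In-memory stand-in for a shared ontology server (hypothetical, not COB's protocol)."""

    def __init__(self) -> None:
        self.packages: dict[str, list[str]] = {}
        self.locks: dict[str, str] = {}  # package -> curator holding the lock
        self.change_log: list[tuple[str, str, str]] = []  # (curator, package, action)

    def _log(self, curator: str, package: str, action: str) -> None:
        self.change_log.append((curator, package, action))  # recorded automatically

    def create_package(self, curator: str, package: str) -> list[str]:
        if package in self.packages:
            raise ValueError(f"package {package!r} already exists")
        self.packages[package] = []
        self.locks[package] = curator  # the creator starts out holding the lock
        self._log(curator, package, "create")
        return []

    def lock(self, curator: str, package: str) -> list[str]:
        if package not in self.packages:
            raise KeyError(package)
        holder = self.locks.get(package)
        if holder is not None and holder != curator:
            raise RuntimeError(f"{package!r} is locked by {holder!r}")
        self.locks[package] = curator
        self._log(curator, package, "lock")
        return list(self.packages[package])  # working copy of this package only

    def commit(self, curator: str, package: str, terms: list[str]) -> None:
        if self.locks.get(package) != curator:
            raise RuntimeError(f"{curator!r} does not hold the lock on {package!r}")
        self.packages[package] = list(terms)
        del self.locks[package]  # release so other curators can check it out
        self._log(curator, package, "commit")


server = PackageServer()
server.create_package("alice", "Pig")
server.commit("alice", "Pig", ["Litter size"])
working_copy = server.lock("alice", "Pig")
working_copy.append("Backfat thickness")  # local edit on the checked-out package
server.create_package("bob", "Cattle")    # concurrent work on a different package
```

Because locks are per package, bob can create and edit Cattle while alice holds Pig; only a second lock on the same package is refused.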
29
More Features
• Supports import/export from/to OWL and OBO formats
– can be used for the Gene Ontology and others
• The ontology is shared on a database server
• Allows multi-relational hierarchies
– e.g., both is-a and part-of
• Visibility of a term can be controlled by scope limitation modifiers
– e.g., public, private, protected
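A possible reading of the scope modifiers, modeled on their meaning in object-oriented languages: public terms are visible everywhere, private terms only in their home package, and protected terms in the home package and the packages nested inside it. The exact semantics of protected is an assumption here, as are the package names in this minimal Python sketch.

```python
from enum import Enum
from typing import Optional


class Scope(Enum):
    PUBLIC = "public"
    PROTECTED = "protected"
    PRIVATE = "private"


# Hypothetical package hierarchy: child -> parent.
PARENT: dict[str, Optional[str]] = {
    "Animal Trait Ontology": None,
    "Pig": "Animal Trait Ontology",
    "Pig Health": "Pig",
    "Cattle": "Animal Trait Ontology",
}


def is_nested_in(package: str, ancestor: str) -> bool:
    """True if `package` is `ancestor` or lies (transitively) inside it."""
    node: Optional[str] = package
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT[node]
    return False


def is_visible(home: str, scope: Scope, from_package: str) -> bool:
    """Whether a term declared in `home` with `scope` can be referenced from `from_package`."""
    if scope is Scope.PUBLIC:
        return True
    if scope is Scope.PRIVATE:
        return from_package == home
    # PROTECTED (assumed semantics): the home package plus packages nested inside it.
    return is_nested_in(from_package, home)
```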
30
Conclusions
• Modular ontologies can improve collaborative ontology building in many aspects
• Package-based Ontology offers an "importing"-based ontology language.
• COB Editor provides the necessary tool to collaboratively build well-structured, large-scale, biomedical ontologies
31
Future Work
• Support of inference and consistency checking
• Accommodation and modularization of existing ontologies, e.g. GO, EC, SCOP
• Support of ontology mapping and ontology integration
• Support of more expressive ontologies, e.g. UMLS, SNOMED
32
Thanks!