Dave Thau, PASI, Costa Rica, June 7, 2008
Ontologies in Ecology and Biodiversity Informatics
Dave Thau
With some slides by Shawn Bowers and Josh Madin gratefully reused with permission
Four Chapters
I. What are ontologies and why should we care?
II. Some nitty gritty
III. Ontologies in ecology and biodiversity informatics
IV. Tools
Talk Goals
• Learn about ontology successes
• Learn basic terminology / buzz words
• Get a sense for ontology development
• See how they apply to ecology and biodiversity
• Learn what remains to be done
– Bottom line: A LOT!
Ontology Defined
[Diagram: Trapeziid Crab, Pincer, Acropora, and Ocean, linked by "has part" and "lives in" relations]
The Way It’s Been
[Image: notebook]
The Plan
How are the finches doing these days?
1. Find data sets: “give me all data sets describing finch abundance”
– Finches R’ Us, World Finch Database, Finch Fancy Repository
2. Find analysis: “find a way to plot their distribution”
– Plotter Workflow
3. Integrate data, plug it in, get results
Where Ontologies Can Help
Finches R’ Us, World Finch Database, Finch Fancy Repository
Plotter Workflow
• Finding the right data sets
• Integrating the data
• Finding a good analysis and making sure data fits the analysis
• Making the results discoverable
Other Ways Ontologies Help
• Crystallize knowledge
• Lay open assumptions
• Make for great parties
Successes
[Diagram: Simple Assembly, Assembly With Switch, and Assembly-1, linked by "Subclass Of" and "Instance Of" relations]
The GO Ontology: www.geneontology.org
Gene Ontology widely adopted
AgBase
GO Stats
GO
• Over 25,000 terms
• 19 contributing groups
GO Annotations
• UniProtKB O13035 GO:0004098
• UniProtKB O13035 GO:0004336
• UniProtKB O13035 GO:0004348
• Total manual GO annotations: 388,633
• Total proteins with manual annotations: 80,402
• Total number of distinct proteins: 2,971,374
• Total number of taxa: 129,318
Ontologies and You
• User of “invisible” ontologies – like search
• User of created ontologies – annotating data sets
• Collaborator in ontology creation – biologist working with ontologist
• Hands-on ontology builder – you’ll need more than a 1-hour talk…
Chapter I Summary
• Ontologies can help
– Locate data
– Add semantics to data
– Integrate data
– Clarify domains
• There are already good examples
– In genomics
– In the biomedical field
– In engineering
The Nitty Gritty
• XML, RDF, OWL and other 3 letter words
• Ontology Basics
• Reasoning with Ontologies
XML, DTDs, XML Schema
Not good for machines
– tools can’t automatically process
– how do you know it’s valid?
XML, XML Schema
Col.,Ht.,Crabs
hya,1.5,11
XML
<?xml version='1.0'?>
<dataset>
  <dataitem>
    <col>hya</col>
    <ht>1.5</ht>
    <crabs>11</crabs>
  </dataitem>
  …
</dataset>
XML Schema
– col: string
– ht: float
– crabs: integer
XML and XML Schema
• Now any machine can validate an XML document, given a schema
• Languages to translate XML to PDF or HTML exist
• But… can’t relate things
– Like “the data in this file relates to study X”
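As a rough illustration of what schema validation buys you, the sketch below parses the crab data set with Python's standard library and checks each field against the string/float/integer types from the previous slide. The `EXPECTED_TYPES` table and the `validate` function are hypothetical stand-ins for a real XML Schema validator, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET

XML_DOC = """<?xml version='1.0'?>
<dataset>
  <dataitem>
    <col>hya</col>
    <ht>1.5</ht>
    <crabs>11</crabs>
  </dataitem>
</dataset>"""

# Hypothetical stand-in for an XML Schema: element name -> expected type.
EXPECTED_TYPES = {"col": str, "ht": float, "crabs": int}


def validate(xml_text):
    """Return typed rows, or raise ValueError if a field is missing or mistyped."""
    rows = []
    for item in ET.fromstring(xml_text).iter("dataitem"):
        row = {}
        for tag, cast in EXPECTED_TYPES.items():
            text = item.findtext(tag)
            if text is None:
                raise ValueError(f"missing element <{tag}>")
            row[tag] = cast(text)  # int("many") raises ValueError
        rows.append(row)
    return rows


rows = validate(XML_DOC)
print(rows)
```

The check tells a machine that `crabs` is an integer, but nothing about what a crab is or which study the file belongs to, which is the gap RDF addresses.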
RDF and RDF Schema
The Resource Description Framework (RDF)
– individuals (objects), properties, and classes
[Diagram:
– My Crab livesIn That Coral
– My Crab type T. Crab
– That Coral type A. Coral
– T. Crab livesIn A. Coral
– T. Crab subClassOf Crab
– A. Coral subClassOf Coral]
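The crab-and-coral diagram can be written down as subject-property-object triples. The sketch below uses plain Python tuples rather than an RDF library, and which node each edge connects is my reading of the diagram, so treat the triples as an assumption. It shows the one inference RDF Schema adds here: an individual typed with a class also belongs to that class's superclasses.

```python
# Triples as (subject, property, object); names follow the slide's labels.
TRIPLES = {
    ("MyCrab", "livesIn", "ThatCoral"),
    ("MyCrab", "type", "T.Crab"),
    ("ThatCoral", "type", "A.Coral"),
    ("T.Crab", "subClassOf", "Crab"),
    ("A.Coral", "subClassOf", "Coral"),
}


def superclasses(cls, triples):
    """All classes reachable from cls by following subClassOf edges."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(individual, triples):
    """Direct types of an individual plus the types inferred via subClassOf."""
    direct = {o for s, p, o in triples if s == individual and p == "type"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


crab_types = types_of("MyCrab", TRIPLES)
print(sorted(crab_types))
```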
RDF is Useful
• GO is available in RDF
• FOAF - Friend of a Friend
– For example, go to http://xml.mfd-consult.dk/foaf/explorer/
– Enter: http://hello.typepad.com/foaf.rdf
• RSS - Really Simple Syndication
– It’s probably in your browser
– Yahoo Pipes RSS blender
FOAF
[Diagram: Person "David Jacobs" (homepage randomwalks.com, img) knows Person "Jesse James Garrett" (homepage blog.jjg.net, seeAlso)]
<foaf:Person>
  <foaf:weblog rdf:resource="http://hello.typepad.com/" />
  <foaf:homepage rdf:resource="http://www.randomwalks.com" />
  <foaf:name>David Jacobs</foaf:name>
  <bio:olb>I work in New York City with filmmakers, activists and educators. </bio:olb>
  <foaf:img rdf:resource="http://hello.typepad.com/mirrorshot.jpg" />
  <foaf:knows>
    <foaf:Person>
      <foaf:name>Jesse James Garrett</foaf:name>
      <foaf:homepage rdf:resource="http://blog.jjg.net/weblog/" />
      <rdfs:seeAlso rdf:resource="http://blog.jjg.net/foaf.rdf" />
    </foaf:Person>
  </foaf:knows>
</foaf:Person>
Basic Ontology Building Blocks
Instances
– The actual things of interest
– For example, a specimen (that crab)
Classes (concepts)
– A set of instances that share certain characteristics
– For example, the set of all crabs
is-a
– A is-a B means every instance of A is also an instance of B
– A might have additional characteristics; more restrictions
Properties (has-a / part-of)
– Represent a characteristic
– e.g., has Wings, has-color Yellow
[Diagram: "The crab that bit me", T. crab, crab, and crab color, linked by is-a and has-color]
Example of Pollution Ontology
Classes versus Instances - tricky!
– If A is-a B, then every A is B
– Every human, in this case, must also be a species
– But “John” is not a species
[Diagram (Guarino): Species, Human, and John, linked by is-a (Human to Species) and instance (John to Human)]
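A small sketch makes the trap concrete. It assumes a toy model with two dictionaries, `IS_A` and `INSTANCE_OF` (both hypothetical names), and walks the is-a chain to list every class an individual belongs to. Treating Human as an instance of Species in the second model is one common way out, offered here as an assumption rather than as the fix shown on the slide.

```python
def classes_of(individual, instance_of, is_a):
    """Classes an individual belongs to: its direct class, then every is-a ancestor."""
    result = []
    cls = instance_of.get(individual)
    while cls is not None:
        result.append(cls)
        cls = is_a.get(cls)
    return result


# Problematic model: Human is-a Species, so membership propagates to John.
IS_A = {"Human": "Species"}
INSTANCE_OF = {"John": "Human"}
problematic = classes_of("John", INSTANCE_OF, IS_A)

# Alternative model: Human is itself an instance of Species, with no is-a link.
FIXED_INSTANCE_OF = {"John": "Human", "Human": "Species"}
fixed = classes_of("John", FIXED_INSTANCE_OF, {})

print(problematic)  # John ends up classified as a Species
print(fixed)        # John is only a Human
```

Instance-of does not propagate the way is-a does, which is why the second model keeps John out of Species.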
is-a is not part-of
– What are essential properties of Cars?
• E.g., that they accommodate people?
– Are these also essential for Engines?
[Diagram (Guarino): Car, Engine, and Wheel, linked by part-of]
Limitations of RDF-based Ontologies
• No constraints
– “all red things have the color property with value red”
– “Costa Rica has only one President”
• Can’t create definitions by combining other definitions
– Mother = Parent and Female
• Can’t say concepts are equivalent or disjoint
OWL - The Web Ontology Language
• Three different kinds
– Lite - limited, but still powerful
– DL - very expressive, can still reason
– Full - extremely expressive, but unreasonable (reasoning is undecidable)
• Example of reasoning with OWL
– If all apples are red, and apples and manzanas are the same, then all manzanas are red
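The apples-and-manzanas inference can be mimicked in a few lines. This is only a toy illustration with hypothetical names (`EQUIVALENT`, `ALL_ARE`, `entails_all_are`), not how an OWL reasoner is implemented: a statement of the form "all X are Q" is taken to hold for every class declared equivalent to X.

```python
# Declared facts: Apple and Manzana are equivalent classes; all apples are red.
EQUIVALENT = [("Apple", "Manzana")]
ALL_ARE = {("Apple", "Red")}


def equivalence_class(name, pairs):
    """Every class name reachable from `name` through equivalence declarations."""
    group = {name}
    changed = True
    while changed:
        changed = False
        for a, b in pairs:
            if (a in group) != (b in group):  # exactly one side known so far
                group |= {a, b}
                changed = True
    return group


def entails_all_are(cls, quality, all_are, pairs):
    """True if 'all cls are quality' follows from the declared facts."""
    return any((other, quality) in all_are
               for other in equivalence_class(cls, pairs))


manzanas_red = entails_all_are("Manzana", "Red", ALL_ARE, EQUIVALENT)
pears_red = entails_all_are("Pear", "Red", ALL_ARE, EQUIVALENT)
print(manzanas_red, pears_red)
```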
Reasoning about Taxonomy
Peet’s 2005 Ranunculus data set:
– 9 Taxonomies
– 654 Taxa
– 704 Relations
Visualization by Martin Graham
Is This Right?
Peet, 2005:
– B.1948:R.h.stolonifer is congruent to K.2004:R.h.stolonifer
– B.1948:R.h.typicus is congruent to K.2004:R.h.typicus
– B.1948:R. hydrocharoides is congruent to K.2004:R. hydrocharoides
Benson, 1948: Ranunculus hydrocharoides, with children R.h. var. natans, R.h. var. stolonifer, R.h. var. typicus
Kartesz, 2004: Ranunculus hydrocharoides, with children R.h. var. stolonifer, R.h. var. typicus
Assuming disjoint children and complete partitioning of parents
The most likely fix here is to change the congruence relation between the top two nodes to instead state that Benson's R. hydrocharoides includes Kartesz's (Benson, 1948 ⊋ Kartesz, 2004).
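The check on this slide can be sketched with sets. Under the stated assumptions (children are disjoint and completely partition their parent), each parent's extent is the union of its children. Children declared congruent share one region, and any unmatched child contributes a region of its own. The data structures and function names below are hypothetical, and placing var. natans under Benson follows from the "includes" conclusion on the slide.

```python
BENSON_1948 = {"R. hydrocharoides": ["var. natans", "var. stolonifer", "var. typicus"]}
KARTESZ_2004 = {"R. hydrocharoides": ["var. stolonifer", "var. typicus"]}

# Child-level congruences asserted in the data set (Benson name -> Kartesz name).
CONGRUENT_CHILDREN = {"var. stolonifer": "var. stolonifer",
                      "var. typicus": "var. typicus"}


def benson_extent(parent):
    """Regions covered by Benson's parent, named via Kartesz where congruent."""
    return frozenset(
        ("K", CONGRUENT_CHILDREN[c]) if c in CONGRUENT_CHILDREN else ("B", c)
        for c in BENSON_1948[parent]
    )


def kartesz_extent(parent):
    """Regions covered by Kartesz's parent."""
    return frozenset(("K", c) for c in KARTESZ_2004[parent])


def relation(a, b):
    """Set relation between two extents."""
    if a == b:
        return "congruent"
    if a > b:
        return "includes"
    if a < b:
        return "is included in"
    return "overlaps" if a & b else "disjoint"


derived = relation(benson_extent("R. hydrocharoides"),
                   kartesz_extent("R. hydrocharoides"))
print(derived)
```

The derived relation disagrees with the asserted congruence between the two parent taxa, which is the inconsistency the slide points out.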
Getting Crazy with Properties
• Properties can be:
– Transitive (a is inCountry b, b is inCountry c..)
– Inverse (a partOf b, b has_part a)
– Functional (dave’s birthMother is vera)
– Inverse functional (dave’s ssn is ….)
• And you can say stuff like
– Apples are only red
– Some apples are red
– Crabs have 2 pincers
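The property characteristics listed above can be illustrated over plain sets of (subject, object) pairs. This is a minimal sketch with hypothetical example data: the "pincer tip" pairs are invented, and the second birth mother is invented to show a violation. One caveat: a real OWL reasoner would not reject two values for a functional property. Unless the two values are declared different, it would infer that they denote the same individual.

```python
def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def inverse(pairs):
    """Swap subject and object, e.g. partOf -> has_part."""
    return {(b, a) for a, b in pairs}


def is_functional(pairs):
    """True if every subject has at most one object."""
    subjects = [a for a, _ in set(pairs)]
    return len(subjects) == len(set(subjects))


def is_inverse_functional(pairs):
    """True if every object has at most one subject."""
    return is_functional(inverse(pairs))


part_of = {("pincer tip", "pincer"), ("pincer", "crab")}  # hypothetical data
closed = transitive_closure(part_of)
has_part = inverse(part_of)
birth_mother = {("dave", "vera")}

print(sorted(closed))
print(sorted(has_part))
print(is_functional(birth_mother))
```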
Chapter II Summary
• XML is about syntax
• RDF is about relationships
• OWL is about more complex constraints
• Tips:
– If A is-a B, then every instance of A is also an instance of B
– Keep classes and instances separate
– is-a is not part-of
Chapter III: Ontologies in Ecology
• GO and friends are successful but…
• Hard to represent processes
– Show me studies about the flow of nitrogen in highly saline lakes, starting with lake-side nitrate
• Can’t be used for data integration
• Ecologists use complex models that involve many relations beyond is-a and part-of
Reminder: Where Ontologies Can Help
• Crystallizing domain knowledge
• Marking up metadata and data sets
• Marking up analyses, and analysis components
Marking Up Metadata and Data
Taxonomic Databases Working Group (TDWG) standards
http://rs.tdwg.org/ontology/voc/
– Geo.owl
– Species.owl
– Vegetation.owl
– Geography.owl
– Water.owl
– Ecosystem.owl
ALTER-Net
http://www5.umweltbundesamt.at/ALTERNet
Metadata and Data with OBOE
Example data set: the abundance of Trapeziid crabs in coral colonies (Stewart et al. 2006)
Metadata and Data with OBOE
Two measurements of the organism: the name …
– : Observation ofEntity : Organism
– : Observation hasMeasurement : Measurement
– : Measurement ofCharacteristic : TaxonName
– : Measurement usesStandard : TaxonCatalog
– : Measurement hasValue “Acropora hyacinthus”
Metadata and Data with OBOE
Two measurements of the organism: the name … height
– : Observation ofEntity : Organism
– : Observation hasMeasurement : Measurement (name)
  – ofCharacteristic : TaxonName
  – usesStandard : TaxonCatalog
  – hasValue “Acropora hyacinthus”
– : Observation hasMeasurement : Measurement (height)
  – ofCharacteristic : Height
  – usesStandard : Meter
  – hasValue “1.25”
  – hasPrecision “0.01”
Metadata and Data with OBOE
First : Observation (the coral)
– ofEntity : Organism
– hasMeasurement : Measurement (name)
  – ofCharacteristic : TaxonName
  – usesStandard : TaxonCatalog
  – hasValue “Acropora hyacinthus”
– hasMeasurement : Measurement (height)
  – ofCharacteristic : Height
  – usesStandard : Meter
  – hasValue “1.25”
  – hasPrecision “0.01”
Second : Observation (the crabs)
– hasContext the first : Observation
– hasMeasurement : Measurement (name)
  – ofCharacteristic : TaxonName
  – usesStandard : TaxonCatalog
  – hasValue “Trapeziid crab”
– hasMeasurement : Measurement (abundance)
  – ofCharacteristic : Abundance
  – usesStandard : Individual
  – hasValue “11”
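The observation structure built up over these slides maps naturally onto a pair of record types. The sketch below is not OBOE's actual OWL vocabulary or any official API. It mirrors the property names from the diagram (ofEntity, hasMeasurement, hasContext, and so on) as Python dataclass fields. The slide does not label the entity of the crab observation, so "Organism" there is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    of_characteristic: str
    uses_standard: str
    has_value: str
    has_precision: Optional[str] = None


@dataclass
class Observation:
    of_entity: str
    measurements: List[Measurement] = field(default_factory=list)
    has_context: Optional["Observation"] = None


def value_of(observation, characteristic):
    """Value of the first measurement of the given characteristic, or None."""
    for m in observation.measurements:
        if m.of_characteristic == characteristic:
            return m.has_value
    return None


coral = Observation("Organism", [
    Measurement("TaxonName", "TaxonCatalog", "Acropora hyacinthus"),
    Measurement("Height", "Meter", "1.25", "0.01"),
])

crabs = Observation(
    "Organism",  # assumed; the slide does not label this entity
    [Measurement("TaxonName", "TaxonCatalog", "Trapeziid crab"),
     Measurement("Abundance", "Individual", "11")],
    has_context=coral,  # the crabs were counted within this coral colony
)

print(value_of(crabs, "Abundance"), "crabs in a colony of height",
      value_of(crabs.has_context, "Height"))
```

Keeping the context as a link between observations, rather than flattening everything into one row, is what lets the crab count stay attached to the specific colony it was taken in.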
Data Integration with OBOE
Integration of data sets given their observation semantics
(a) : Observation ofEntity : Coral
– hasMeasurement : Measurement
  – ofCharacteristic : Diameter
  – usesStandard : Meter
  – hasValue “1.25”
  – hasPrecision “0.01”
(b) : Observation ofEntity : Animal
– hasMeasurement : Measurement
  – ofCharacteristic : ColonyDiameter
  – usesStandard : Centimeter
  – hasValue “320”
  – hasPrecision “10”
Data Integration with OBOE
Integration involves data set observation structures
(a) : Observation ofEntity : Coral
– hasMeasurement : Measurement
  – ofCharacteristic : Diameter
  – usesStandard : Meter
  – hasValue “1.25”
  – hasPrecision “0.01”
(b) : Observation ofEntity : Animal
– hasMeasurement : Measurement
  – ofCharacteristic : ColonyDiameter
  – usesStandard : Centimeter
  – hasValue “320”
  – hasPrecision “10”
Links between (a) and (b):
– two is-a links relate classes used in (a) and (b)
– : Meter hasDimension : Length
– : Centimeter hasDimension : Length
Data Integration with OBOE
And then applying appropriate conversions, etc.
(c) : Observation ofEntity : Animal
– hasMeasurement : Measurement
  – ofCharacteristic : Diameter
  – usesStandard : Meter
  – hasValue “1.3” (from a), “3.2” (from b)
  – hasPrecision “0.1”
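The conversion step can be sketched as follows, under a few stated assumptions: both units share the Length dimension, the conversion table `TO_METERS` is a hypothetical stand-in for what the unit ontology would supply, precisions are powers of ten, and values are rounded half-up to the coarser of the two precisions. With the inputs from (a) and (b) this yields the values shown in (c).

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical conversion factors for units whose dimension is Length.
TO_METERS = {"Meter": Decimal("1"), "Centimeter": Decimal("0.01")}


def to_meters(value, precision, unit):
    """Convert a (value, precision) pair given as strings into meters."""
    factor = TO_METERS[unit]
    return Decimal(value) * factor, Decimal(precision) * factor


def integrate(measurements):
    """Convert all measurements to meters and round to the coarsest precision."""
    converted = [to_meters(*m) for m in measurements]
    # normalize() strips trailing zeros so quantize() rounds to 0.1, not 0.10
    coarsest = max(p for _, p in converted).normalize()
    values = [v.quantize(coarsest, rounding=ROUND_HALF_UP) for v, _ in converted]
    return values, coarsest


values, precision = integrate([
    ("1.25", "0.01", "Meter"),      # data set (a)
    ("320", "10", "Centimeter"),    # data set (b)
])
print(values, precision)
```

Decimal arithmetic is used instead of floats because the built-in `round(1.25, 1)` rounds half to even and would give 1.2 rather than the 1.3 shown on the slide.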
Marking up Analyses
• Scientific Workflow Systems help:
– Make analyses reproducible
– Make parts of analyses reusable
• But…
– 100’s of workflows and templates
– 1000’s of actors (e.g. actors for web services, data analytics, …)
• Need to find what you want
Semantic Type Annotation in Kepler
• Component input and output port annotation
• Each port can be annotated with multiple terms from multiple ontologies
• Annotations are stored within the actor metadata
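To suggest how port annotations help with finding the right component, here is a minimal sketch of annotation-based search. It does not use Kepler's API. The actor names, ontology terms, and functions are all hypothetical. An actor matches when one of its input-port terms equals the data's term or is an is-a ancestor of it.

```python
# Hypothetical ontology fragment and actor annotations, for illustration only.
IS_A = {"FinchAbundance": "Abundance", "Abundance": "Measurement"}

ACTOR_INPUT_ANNOTATIONS = {
    "AbundancePlotter": {"Abundance"},
    "SpeciesNameResolver": {"TaxonName"},
}


def is_subsumed_by(term, ancestor, is_a):
    """True if term equals ancestor or reaches it by following is-a links."""
    while term is not None:
        if term == ancestor:
            return True
        term = is_a.get(term)
    return False


def actors_accepting(data_term, annotations, is_a):
    """Names of actors with an input port whose annotation subsumes data_term."""
    return sorted(
        actor for actor, terms in annotations.items()
        if any(is_subsumed_by(data_term, t, is_a) for t in terms)
    )


matches = actors_accepting("FinchAbundance", ACTOR_INPUT_ANNOTATIONS, IS_A)
print(matches)
```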
Chapter III Summary
• Taxonomies and partonomies are useful but limiting
• We saw a couple of ontologies for
– Representing a domain
– Describing data
• Again, the focus is always on discovery, integration and reuse
Tools
• For RDF:
– Simile: simile.mit.edu - nice RDF tools
• For OWL:
– Protégé: protege.stanford.edu
• For reasoning:
– Pellet: http://www.mindswap.org/2003/pellet/
– Jena: http://jena.sourceforge.net/inference/
Protégé
OWLViz Tab
Summing Up
• Ontologies are useful for
– Data discovery
– Data integration
– Terminology regulation
– Analysis reuse
• Ontology in ecology and biodiversity is just getting started
Lastly: Back to the Goals
• Learn about ontology successes
• Learn basic terminology / buzz words
• Get a sense for ontology development
• See how and where they apply to ecology and biodiversity studies
• Learn what remains to be done
– Bottom line: A LOT!
Some References
Practical guides/references
– Protégé. Open source ontology editor. http://protege.stanford.edu/
– CO-ODE. Various resources on ontologies, tutorials, best practices, etc. http://www.co-ode.org/
– W3C Semantic Web Activity. Various pointers, standardization efforts, etc. http://www.w3.org/2001/sw/
– OWL Resources: OWL-Guide (http://www.w3.org/TR/owl-guide/), OWL-Reference (http://www.w3.org/TR/owl-ref/)
– Pizza Tutorials. http://www.co-ode.org/resources/tutorials/
Academic Papers/Collections
– Bard and Rhee. Ontologies in biology: Design, applications and future challenges. Nature Reviews Genetics, vol. 5, 2004.
– The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genet. 25: 25-29, 2000.
– Barry Smith, http://ontology.buffalo.edu/smith/, various papers on ontologies (even for ecology)
– Sowa, J. F. Knowledge Representation: Logical, Philosophical, and Computational Foundations. PWS Publishing Co., Boston, 1999.
– Baader F., Calvanese D., McGuinness D., Nardi D., and Patel-Schneider P. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge Univ. Press, 2003.
– Thomas R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. In Formal Ontology in Conceptual Analysis and Knowledge Representation, Kluwer Academic Publishers, 1993.
– Nicola Guarino. Formal ontology and information systems. In Proc. of Formal Ontology in Information Systems, IOS Press, pp. 3-15, 1998.
Exercise: Ontology Engineering
1. Choose the specific “domain” you want to tackle:
• Based on a specific collection of data that you are familiar with
• Based on an existing project/experiment you are working on or understand
• Focus on use: data set markup or describing a domain
2. Define (a part of) an ontology for the domain
• Start with the classes
• Then arrange into an is-a hierarchy
• Then add properties between the classes
• If you feel mighty, try some property constraints
3. Capture your ontology on a whiteboard, poster board, or cmap tool as one or more diagrams
• Transitive
• Inverse
• Functional
• Inverse Functional
• All apples have a color
• Some apples have a color
• All apples are red
• Some apples are red
• Crabs have 2 pincers