The Pragmatics and Formality of Authoring Ontologies, ODLS 2016


Upload: robertstevens65

Post on 13-Apr-2017


Page 1

Formality and Pragmatics in Authoring Ontologies

Robert Stevens

ODLS 2016

School of Computer Science
The University of Manchester

Manchester
United Kingdom

M13 9PL
[email protected]

Page 2

Acknowledgements

• On-going work with Phil Lord on normalising the Gene Ontology
• The Gene Ontology folk for making GO
• Nico Matentzoglu for my slides
• Mercedes Casteleiro for numbers

Page 3

Formality and Pragmatics

• Formality: Acting strictly according to procedure or rules
  – Ontological formality
  – Representational formality

• Pragmatics: Behaviour driven by practical consequences rather than dogma

• There’s a tension between the two

Page 4

Gene Ontology Molecular Function

• D-alanyl carrier activity
• acetylcholine receptor regulator activity
• antioxidant activity
• binding
• calcium channel regulator activity
• catalytic activity
• channel regulator activity
• chemoattractant activity
• chemorepellent activity
• core DNA-dependent RNA polymerase binding promoter specificity activity
• electron carrier activity
• enzyme regulator activity
• guanyl-nucleotide exchange factor activity
• metallochaperone activity
• mitochondrial RNA polymerase binding promoter specificity activity
• molecular function regulator
• molecular transducer activity
• morphogen activity
• negative regulation of molecular function
• neurotransmitter receptor regulator activity
• nucleic acid binding transcription factor activity
• nutrient reservoir activity
• positive regulation of molecular function
• protein tag
• receptor regulator activity
• regulation of molecular function
• signal transducer activity
• structural molecule activity
• transcription factor activity, core RNA polymerase binding
• transcription factor activity, protein binding
• transcription factor activity, transcription factor binding
• translation regulator activity
• transporter activity

NUMBER OF TERMS: ~10k
http://geneontology.org/

Page 5
Page 6

What is Molecular Function in GO?

• Describes “function”…?

GO:0003674 molecular_function

Elemental activities, such as catalysis or binding, describing the actions of a gene product at the molecular level. A given gene product may exhibit one or more molecular functions.

Page 7

Motivation

• Is GO’s molecular function ontology really function, “little” processes or both?
• Documented as a function
• Sometimes looks like a process
• Sometimes treated like a process
• Confusion of the thing that has a function with the function itself
• This can make modelling harder than it need be

Page 8

A Couple of Observations

• Pragmatically, we commit to GO – it’s the only show in town and it works
• There are a lot of chemicals around in GO MF
• We are biochemistry….!
• Probably few functions – strip out all the “non-function” stuff and see what’s left
• Then we can look at the ontological nature of GO MF
• Also, re-create in a more sustainable form

Page 9

It’s all work in progress

Page 10

A “tangled” ontology of amino acids

Page 11

There are several dimensions of classification here

• The amino acids themselves – a chemical dimension
• The size of the amino acid’s side chain
• The charge on the side chain
• The polarity of the side chain
• The hydrophobicity of the side chain
• We can normalise these into separate hierarchies then put them back together again
• Our goal is to put entities into separate trees all formed on the same basis
• Size only talks about size; amino acid only talks about chemical composition (based on an alpha-carbon with an amino and carboxylic acid group); and so on

Page 12

The dimensions separated

Amino Acids: Alanine, Arginine, Asparagine, Cysteine, Glutamate, Glutamine, Glycine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Proline, Serine, Threonine, Tryptophan, Tyrosine, Valine

Charge: Negative, Neutral, Positive

Size: Tiny, Small, Medium, Large

Polarity: Polar, Nonpolar

Hydrophobicity: Hydrophobic, Hydrophilic

Page 13

The process

• Hand-crafted ontologies with a polyhierarchy are “tangled”
• Usually axiomatically lean
• We classify along one axis and use “restrictions” to other modules to capture other axes
• Then re-build the polyhierarchy using the axiomatically rich ontology

Page 14

“Pulling out” dimensions

• Each separate tree must be the same kind of thing

• We don’t mix continuants, processes, qualities, etc

• We don’t mix our classification by, for instance, structure and then charge

• We do that compositionally via defined classes and automated reasoners

Page 15

The amino acid pattern

Class: AminoAcid
    SubClassOf:
        hasSize some Size,
        hasPolarity some Polarity,
        hasCharge some Charge,
        hasHydrophobicity some Hydrophobicity

Page 16

An amino acid

Class: Lysine
    SubClassOf:
        AminoAcid,
        hasSize some Large,
        hasCharge some Positive,
        hasPolarity some Polar,
        hasHydrophobicity some Hydrophilic

Page 17

Rebuilding the hierarchy

• Class: LargeAminoAcid
  – EquivalentTo: AminoAcid and hasSize some Large
• Class: PositiveAminoAcid
  – EquivalentTo: AminoAcid and hasCharge some Positive
• Class: LargePositiveAminoAcid
  – EquivalentTo: LargeAminoAcid and PositiveAminoAcid
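The rebuilding step can be imitated in a few lines of Python. This is only a toy stand-in for an automated reasoner, assuming each amino acid asserts exactly one value per dimension; it ignores OWL's open-world semantics, and the helper names `inferred_superclasses` and `subsumed_by` are hypothetical. Only Lysine's values are taken from the slides.

```python
# Toy stand-in for an OWL reasoner: each amino acid asserts one value per
# dimension, and defined classes are recovered by matching those values.
# Only Lysine's values come from the slides; helper names are hypothetical.

AMINO_ACIDS = {
    "Lysine": {
        "hasSize": "Large",
        "hasCharge": "Positive",
        "hasPolarity": "Polar",
        "hasHydrophobicity": "Hydrophilic",
    },
}

# Defined classes written as necessary-and-sufficient conditions
# (property -> required value), mirroring the EquivalentTo axioms.
DEFINED_CLASSES = {
    "LargeAminoAcid": {"hasSize": "Large"},
    "PositiveAminoAcid": {"hasCharge": "Positive"},
    "LargePositiveAminoAcid": {"hasSize": "Large", "hasCharge": "Positive"},
}


def inferred_superclasses(name):
    """Return the defined classes whose conditions the named amino acid meets."""
    asserted = AMINO_ACIDS[name]
    return sorted(
        cls
        for cls, conditions in DEFINED_CLASSES.items()
        if all(asserted.get(prop) == value for prop, value in conditions.items())
    )


def subsumed_by(sub, sup):
    """True if defined class `sub` carries every condition of defined class `sup`."""
    return DEFINED_CLASSES[sup].items() <= DEFINED_CLASSES[sub].items()


print(inferred_superclasses("Lysine"))
print(subsumed_by("LargePositiveAminoAcid", "LargeAminoAcid"))
```

The point of the sketch is that nobody asserts Lysine under LargePositiveAminoAcid; the polyhierarchy falls out of the per-dimension assertions.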

Page 18

A “tangled” ontology of amino acids

Page 19

Other Ontology Topics as Factors in GO MF

[Diagram: molecular function shown with other ontology topics: chemical, chemical role, reaction, biological process, cellular component, cell, protein, sequence]

40-60% of terms mention chemicals

Page 20

Some GO Terms

[Diagram: GO MF with example terms: glucose import; cytosolic calcium ion transport; hydrolase activity; tyrosine binding; retroviral strand transfer activity; electron carrier activity]

Page 21

Binding

• ~2k terms in the binding bit of GO MF
• Remove the chemicals
• Leaves “binding”
• There is a function “to bind”
• There is a process of “binding”
• Linguistically – an infinitive and a gerund/nominalised verb

Page 22

More “to bind” Functions?

• “to bind” is the basic function
• Specialise to “to bind covalently”, “to bind via hydrogen”, “to bind electrostatically”, but these are built compositionally with reference to other ontologies

Page 23

Chemorepellent - chemoattractant activity

GO:0042056 chemoattractant activity

Providing the environmental signal that initiates the directed movement of a motile cell or organism towards a higher concentration of that signal.

GO:0045499 chemorepellent activity

Providing the environmental signal that initiates the directed movement of a motile cell or organism towards a lower concentration of that signal.

To diffuse

Page 24

GO realisable entities

[Diagram: RealizableEntity with ToCatalyse, ToBind, ToMark, ToStore, ToDiffuse, ToTransport, ToMaintainIntegrity, ToProtect, ToModulate, ToRegulate, ToTransduce]

Page 25

Angels on the head of a pin

Page 26

Distinctions with no (practical) difference

• “Distinction without a difference” – making a distinction where none exists

• Distinctions may exist, but does one need to make them?

• Does a distinction make a practical difference to the use case in hand?

• Make no distinction unless it makes a difference

• Beware of consistency…

Page 27

New function hierarchy

• RealizableEntity
  – ToCatalyse
  – ToBind
    • ToMark
  – ToStore
  – ToDiffuse
  – ToTransport
  – ToMaintainIntegrity
  – ToProtect
  – ToModulate
    • ToRegulate
  – ToTransduce

Page 28

Standard pattern – some and only

Gene product --[has realizable entity]--> Realisable entity --[is realized in]--> Biological process

Page 29

RO candidate: capable_of = shortcut

Gene product --[is capable of]--> Biological process

Page 30

Some patterns

• hasRealisableEntity some (to_bind and realisedIn only (binding and hasInput some chemical))
• Add “playsrole some role” for a chemical role like drug
• hasRealisableEntity some (to_catalyse and realisedIn only (catalysis and hasInput some chemical and hasOutput some chemical))
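Because the patterns above share one shape, they can be generated rather than typed. The following is a minimal sketch in Python that only builds the expression as text; it is not Tawny-OWL and does not check the result against any ontology. The function name `realisable_pattern` and its parameters are hypothetical; the property and class names are the ones on the slide.

```python
def realisable_pattern(realisable, process, inputs=(), outputs=()):
    """Build a Manchester-syntax-style expression for the some/only pattern.

    realisable: name of the realisable entity class, e.g. "to_bind"
    process:    name of the process class it is realised in, e.g. "binding"
    inputs:     class names for `hasInput some ...` restrictions
    outputs:    class names for `hasOutput some ...` restrictions
    """
    parts = [process]
    parts += [f"hasInput some {cls}" for cls in inputs]
    parts += [f"hasOutput some {cls}" for cls in outputs]
    inner = " and ".join(parts)
    return (
        f"hasRealisableEntity some ({realisable} "
        f"and realisedIn only ({inner}))"
    )


# The two patterns from the slide, regenerated from their components.
binding = realisable_pattern("to_bind", "binding", inputs=["chemical"])
catalysis = realisable_pattern(
    "to_catalyse", "catalysis", inputs=["chemical"], outputs=["chemical"]
)
print(binding)
print(catalysis)
```

Generating the text this way keeps the brackets balanced and makes it cheap to swap `chemical` for a more specific class from another ontology.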

Page 31

Actually doing it

• Programmatically using Tawny-OWL
• Asserted tree of molecular realisables and molecular processes
• Defined classes for the actual terms
• May have to restrict to OWL EL for practical reasons
• We shall see…

Page 32

Strategies for Defined Classes

• Total post co-ordination
• Total pre co-ordination
• Pre co-ordinate those classes that have been used in annotation

Page 33

How many GO MF terms are used?

Annotation files:
  goa_human.gaf.gz: Homo sapiens, canonical accessions from UniProt
  goa_uniprot_all.gaf.gz: Unfiltered GOA UniProt gene association file

                                                         goa_human.gaf.gz    goa_uniprot_all.gaf.gz
Total number of GO-UniProt annotations                   354 515 (~354K)     294 208 149 (~294M)
Unique UniProt IDs                                       19 055 (~19K)       45 968 890 (~46M)
Unique active Molecular Function classes                 3 947 (~4K)         7 521 (~7K)
Unique active Molecular Function classes used
more than 5 times                                        1 313 (~1K)
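Counts of this kind can be reproduced from a GAF file with a short script. The sketch below is not the script used for the slide's numbers: it assumes the standard tab-separated GAF layout (GO ID in column 5, aspect in column 9, `F` for Molecular Function, `!` for header lines), reads "used more than 5 times" as more than five annotation rows, and does not check whether a class is still active in GO. The sample rows and the function names are invented for illustration.

```python
import gzip
from collections import Counter


def count_mf_usage(lines):
    """Count annotation rows per Molecular Function term in GAF-formatted lines."""
    usage = Counter()
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # GAF header and comment lines start with '!'
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed rows
        go_id, aspect = cols[4], cols[8]  # columns 5 and 9 in 1-based terms
        if aspect == "F":
            usage[go_id] += 1
    return usage


def summarise(usage, threshold=5):
    """Summarise how many distinct MF terms are used, and how many exceed threshold."""
    return {
        "unique_mf_terms": len(usage),
        "used_more_than_threshold": sum(1 for n in usage.values() if n > threshold),
    }


def count_mf_usage_in_file(path):
    """Apply count_mf_usage to a gzipped GAF file such as goa_human.gaf.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_mf_usage(handle)


def gaf_row(protein, go_id, aspect):
    """Build a minimal, made-up GAF row (first nine columns only)."""
    cols = ["UniProtKB", protein, "SYM", "", go_id, "PMID:0", "IDA", "", aspect]
    return "\t".join(cols) + "\n"


# Invented sample: one MF term used six times, one used once, one BP row.
SAMPLE = ["!gaf-version: 2.2\n"]
SAMPLE += [gaf_row(f"P{i}", "GO:0005515", "F") for i in range(6)]
SAMPLE += [gaf_row("P7", "GO:0003824", "F"), gaf_row("P8", "GO:0008150", "P")]

print(summarise(count_mf_usage(SAMPLE)))
```

The set of terms returned by `count_mf_usage` is also the natural input to the third strategy on the previous slide: pre co-ordinate only the classes that actually appear in annotation.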

Page 34

What have we found?

• Very few functions
• … and some look dispositional
• It looks like physics
• Most functions involve binding – makes sense
• We separate realisables and processes
• We live with a bit of “replication”
• With molecular processes, do we need molecular function?
• We change the upper reaches of GO MF, but…
• Does it make any practical difference?

Page 35

Formality

• Ontological formality
  – Making the right distinctions drives consistent use of relationships
  – Facilitates the kind of analysis we’ve done
  – Can also be a barrier to progress
• Representational formality
  – Knowing what is being said is useful
  – Allows clean interpretation
  – Enables useful reasoning

Page 36

Pragmatic Decisions

• Commit enough to achieve goals
• If re-using, take on the commitments of that ontology
  – If using OBO commit to OBO
  – If what you’re using uses something with which you disagree – get over it
• Axiom pragmatics
• Don’t represent that which isn’t needed
• Truth and beauty
• A counsel of perfection is a counsel of despair
• I’d make “gene product” explicit