![Page 1: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/1.jpg)
Knowledge Management in a Knowledge Based Discipline
Robert Stevens
BioHealth Informatics Group
University of Manchester
![Page 2: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/2.jpg)
Introduction
• How do we do (molecular) biology?
• Managing stamp albums
• A knowledge based discipline
• Representing knowledge computationally
• Ontologies that define what entities are in the domain
• Describing biological knowledge ontologically
• Using ontologies: is it enough?
![Page 3: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/3.jpg)
Ernest Rutherford
“All science is either physics or stamp collecting”
Image: http://en.wikipedia.org/wiki/File:Ernest_Rutherford2.jpg
![Page 4: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/4.jpg)
Mathematical Sciences
![Page 5: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/5.jpg)
Laws in Biology
Charles Darwin
Image: http://en.wikipedia.org/wiki/File:Charles_Darwin_01.jpg
On The Origin of Species - 1859
![Page 6: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/6.jpg)
Classic and Modern Biology
Genotype Phenotype
Modern biology
Classic biology
![Page 7: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/7.jpg)
Central Dogma
Image: http://cellbio.utmb.edu/CELLBIO/DNA-RNA.jpg
![Page 8: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/8.jpg)
Speed of sequencing
• First human genome
– 10+ years to produce
– Cost $500 million
– Huge international effort
• Now done in 10 weeks
– (for $399)
– http://tinyurl.com/genomecost
– http://www.23andme.com
![Page 9: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/9.jpg)
1000+ databases
• according to Nucleic Acids Research
![Page 10: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/10.jpg)
PubMed: 2 papers per minute
• ~700,000 individual papers
• Grows at 2 papers per minute (see http://blogs.bbsrc.ac.uk for details)
![Page 11: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/11.jpg)
UniProt: a protein database?
![Page 12: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/12.jpg)
What is Knowledge?
• Knowledge – all information and an understanding to carry out tasks and to infer new information
• Information -- data equipped with meaning
• Data -- un-interpreted signals that reach our senses
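Figure: from data to knowledge
Data: Michael Ashburner, Professor, University of Cambridge, UK, ISMB
Information: the same values labelled Name, Job, Institution, Country, Conf
Knowledge: man; academic, senior; ancient university, 5 rated; European; important figure in biology
BIOLOGY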
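The data → information → knowledge distinction on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes "knowledge" can be modelled as a table of background rules applied to labelled values. The field names come from the slide. The rule table and the function names are hypothetical.

```python
# Data: un-interpreted signals (plain strings with no meaning attached).
raw = ("Michael Ashburner", "Professor", "University of Cambridge", "UK", "ISMB")

# Information: data equipped with meaning (each value gets a label from the slide).
FIELDS = ("Name", "Job", "Institution", "Country", "Conf")

def to_information(values):
    """Attach the slide's field labels to the raw values."""
    return dict(zip(FIELDS, values))

# Knowledge: hypothetical background rules that let us infer new facts.
BACKGROUND = {
    ("Job", "Professor"): {"academic", "senior"},
    ("Institution", "University of Cambridge"): {"ancient university"},
    ("Country", "UK"): {"European"},
}

def infer(info):
    """Return every fact the background rules allow us to infer from the labelled values."""
    facts = set()
    for label, value in info.items():
        facts |= BACKGROUND.get((label, value), set())
    return facts

info = to_information(raw)
print(info["Job"])          # Professor
print(sorted(infer(info)))  # ['European', 'academic', 'ancient university', 'senior']
```

Without the labels, "UK" is just a string. Without the background rules, nothing new can be inferred from it.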
![Page 13: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/13.jpg)
A Knowledge Based Discipline
• Rather than laws captured in mathematics…
• …we have lots of facts: the discipline’s knowledge
• Rather than “calculating” what a protein does, we investigate and write it down
• Equivalent to writing down the trajectories of all thrown objects and not doing ballistics!
• To do biology one needs “the knowledge”
![Page 14: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/14.jpg)
Heterogeneity
• 28 formats for representing a biological sequence
• …though only one way to represent the bases or amino acids
• Different words, same concept
• Different concepts, same words
• Different and implicit data schemas
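To illustrate "many layouts, one residue alphabet", here is a minimal Python sketch that reads the same short DNA fragment from two layouts. The first is FASTA. The second is a simplified GenBank-style ORIGIN block with position numbers and spaced groups. The fragment, its header and the simplified second layout are invented for illustration. Real parsers handle far more of each format than this.

```python
import re

FASTA = """>example hypothetical fragment
ATGGCGTACGTTAGC
CTGA
"""

# Simplified GenBank-style ORIGIN block: position numbers plus spaced, lower-case groups.
GENBANK_LIKE = """ORIGIN
        1 atggcgtacg ttagcctga
//
"""

def parse_fasta(text):
    """Join all non-header lines into one upper-case residue string."""
    lines = text.strip().splitlines()
    return "".join(line.strip() for line in lines if not line.startswith(">")).upper()

def parse_origin_block(text):
    """Keep only the letters between 'ORIGIN' and '//', dropping numbers and spaces."""
    body = text.split("ORIGIN", 1)[1].split("//", 1)[0]
    return re.sub(r"[^A-Za-z]", "", body).upper()

# Different layouts, identical residues.
assert parse_fasta(FASTA) == parse_origin_block(GENBANK_LIKE)
print(parse_fasta(FASTA))  # ATGGCGTACGTTAGCCTGA
```

Each extra format needs its own reader, even though the bases themselves never change.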
![Page 15: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/15.jpg)
Categories and Category Labels
GO:0000368
U2-type nuclear mRNA 5' splice site recognition
spliceosomal E complex formation
spliceosomal E complex biosynthesis
spliceosomal CC complex formation
U2-type nuclear mRNA 5'-splice site recognition
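One practical consequence is that software should key on the category's identifier, not on any one of its labels. Below is a minimal Python sketch that uses only the identifier and labels shown on this slide. The normalisation rule (case-folding and treating hyphens as spaces) is an assumption for illustration, not GO's own matching procedure.

```python
# Labels shown on the slide, all naming the same category.
LABELS = {
    "GO:0000368": [
        "U2-type nuclear mRNA 5' splice site recognition",
        "spliceosomal E complex formation",
        "spliceosomal E complex biosynthesis",
        "spliceosomal CC complex formation",
        "U2-type nuclear mRNA 5'-splice site recognition",
    ]
}

def normalise(label):
    """Assumed normalisation: case-fold, treat hyphens as spaces, collapse whitespace."""
    return " ".join(label.replace("-", " ").casefold().split())

# Map every normalised label to its category identifier.
INDEX = {normalise(label): term_id for term_id, labels in LABELS.items() for label in labels}

def lookup(label):
    """Return the category identifier for a label, or None if the label is unknown."""
    return INDEX.get(normalise(label))

print(lookup("Spliceosomal E complex formation"))  # GO:0000368
```

Under this rule, the two "splice site" spellings collapse to the same key. Both resolve to GO:0000368, as do the other synonyms.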
![Page 16: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/16.jpg)
An Identity Crisis
• Database entries have identifiers unique within their database
• The type of entity described in an entry doesn’t have an identifier
• Different entries about the same type talk about it differently
• How do we know when an entry in one DB talks about the same thing as another entry in another DB?
• That’s the skill of a bioinformatician
![Page 17: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/17.jpg)
Why: Society of Biologists
• Particle physics necessarily has central organisation
• One central place to generate data
• A communitarian attitude
• It is still possible to do biology in the “garden shed”
• Historically, less need to organise
• Hence…
![Page 19: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/19.jpg)
Biology is Special
• Large quantities of data: no
• Complex data: yes
• Volatile data: the types of data, and what is recorded, change rapidly
• Nothing that special about biology…
• …except that it has all the problems, and often to a large degree
![Page 20: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/20.jpg)
Lots of catalogues
Genome
Proteome
Transcriptome
Interactome
Metabolome
PHENOME
![Page 21: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/21.jpg)
Biology now has lots of facts
![Page 22: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/22.jpg)
Creating Woods, not Trees
Genes
Proteins
Pathways
Interactions
Literature
Complex Machines
Virtual Organism
…from biological facts, we build a system that is a model of a real organism
![Page 23: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/23.jpg)
Networks of Chemicals
Image: http://genome-www.stanford.edu/rap_sir/images/Web_FigF_RAP1_glycolysis.gif
![Page 24: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/24.jpg)
Systems within Systems
Image: http://www.ehponline.org/members/2007/10373/fig1.jpg
![Page 25: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/25.jpg)
A Biologist’s Skills
• By the time a biologist has finished a Ph.D. he/she is about ready for action
• They have a comprehensive knowledge of the facts of a (narrow) domain
• He/she also knows how to do experimentation in that domain
• There are so many facts, it is difficult to move outside one’s sub-discipline
• Yet in a systems view such movement is mandatory
![Page 26: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/26.jpg)
The Role of Knowledge
• A lot of facts
• Perhaps organised into a system
• No equivalent of “laws of mechanics” – we can’t do this biology with mathematics
• Or at least not without knowing what the numbers mean...
• This is why we’ve been using ontologies!
![Page 27: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/27.jpg)
What is an Ontology?
• A description of that which exists (in our data)
• What it means to be a member of a category
• What categories of things exist, and how do I recognise that a particular object is a member of a given category?
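As a minimal sketch of "how do I recognise that a particular object is a member of a given category", each category below has a defining condition over an object's properties. Classification then checks which conditions hold. The categories, properties and objects are hypothetical toy examples. Real bio-ontologies express such definitions in a language such as OWL and rely on a reasoner rather than Python predicates.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    kind: str                                   # e.g. "protein", "small molecule"
    functions: set = field(default_factory=set)

# Hypothetical category definitions: membership criteria written as predicates.
CATEGORIES = {
    "Protein": lambda e: e.kind == "protein",
    "Enzyme": lambda e: e.kind == "protein" and "catalytic activity" in e.functions,
    "Transporter": lambda e: e.kind == "protein" and "transporter activity" in e.functions,
}

def categories_of(entity):
    """Return, sorted, every category whose membership criteria the entity satisfies."""
    return sorted(name for name, is_member in CATEGORIES.items() if is_member(entity))

x = Entity("protein X", "protein", {"catalytic activity"})
y = Entity("compound Y", "small molecule")
print(categories_of(x))  # ['Enzyme', 'Protein']
print(categories_of(y))  # []
```

Because "Enzyme" is defined in terms of "Protein", anything classified as an enzyme is automatically a protein too. That kind of implied hierarchy is what an ontology makes explicit.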
![Page 28: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/28.jpg)
Uses of Ontology in Bioinformatics
![Page 29: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/29.jpg)
Why develop an ontology?
• To make domain assumptions explicit
– Easier to change domain assumptions
– Easier to understand and update legacy data
• To separate domain knowledge from operational knowledge
– Re-use domain and operational knowledge separately
• A community reference for applications
• To share a consistent understanding of what information means
![Page 30: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/30.jpg)
History of Bio-ontologies
Timeline figure (years shown: 1992, 1996, 1998, 2002, 2005, 2006; milestones shown: TAMBIS, MGED, 1st Bio-ontologies meeting, Gene Ontology starts)
![Page 31: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/31.jpg)
Controlled Vocabulary
• An ontology isn’t a controlled vocabulary, but can be used to deliver one
• By agreeing upon the categories in a domain and agreeing upon their labels, we are controlling vocabulary
• Addresses one major problem in biology: inconsistent terminology
• Also forces examination of definitions
• Makes domain assumptions explicit
![Page 32: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/32.jpg)
Transferring Characteristics
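Figure: an uncharacterised protein alongside characterised proteins (Tra1, La2, La3); where similarity is high, their characteristics are transferred to the uncharacterised protein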
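A minimal Python sketch of similarity-based annotation transfer. It assumes the sequences are already aligned to equal length, uses fraction identity as the similarity score, and applies a hypothetical cut-off of 0.8. Real pipelines use alignment tools such as BLAST together with statistical scores. The protein names, sequences and annotations are invented.

```python
# Hypothetical characterised proteins: name -> (pre-aligned sequence, annotations).
KNOWN = {
    "known_A": ("MKTAYIAKQR", {"kinase activity"}),
    "known_B": ("MKTWYIAKQR", {"kinase activity", "ATP binding"}),
    "known_C": ("GSHMLEDPVA", {"DNA binding"}),
}

THRESHOLD = 0.8  # assumed cut-off for "high similarity"

def identity(a, b):
    """Fraction of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def transfer(query):
    """Collect annotations from every known protein at or above the similarity threshold."""
    inherited = set()
    for seq, annotations in KNOWN.values():
        if identity(query, seq) >= THRESHOLD:
            inherited |= annotations
    return inherited

print(sorted(transfer("MKTAYIAKQL")))  # ['ATP binding', 'kinase activity']
```

The threshold is the key design choice. Lowering it characterises more proteins but also propagates more wrong annotations.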
![Page 33: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/33.jpg)
Post-Genomic Biology
• Fly, mouse, yeast, worm all have their own terminologies
• I want to compare genomes. How?
• The genomic sequence is easily dealt with computationally, and comparisons are easy
• This is not true of the annotations, or knowledge, of those sequences
• Need a common understanding
![Page 34: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/34.jpg)
Annotation of Data
• Big effort to create controlled vocabularies using ontologies
• A huge annotation effort – describe the entities in databases with terms from ontologies
• The Gene Ontology (http://www.geneontology.org)
• The Open Biomedical Ontologies Consortium
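A minimal Python sketch of how shared ontology annotation lets entries from different databases be compared. Genes from two species databases keep their own local identifiers but are annotated with terms from one shared ontology. A query for a term also retrieves genes annotated with its more specific descendants, found by following is_a links. All database names, gene IDs and term IDs here are hypothetical.

```python
# Hypothetical shared ontology: term -> parent terms (is_a links).
IS_A = {
    "T:3": {"T:2"},
    "T:2": {"T:1"},
    "T:1": set(),
}

# Hypothetical annotations from two databases, each with its own local identifiers.
ANNOTATIONS = {
    ("FlyDB", "fly_0001"): {"T:3"},
    ("MouseDB", "mm_4242"): {"T:2"},
    ("MouseDB", "mm_0007"): set(),
}

def ancestors(term):
    """Return the term plus every term reachable by following is_a links upwards."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(IS_A.get(t, ()))
    return seen

def entries_for(term):
    """Find entries, in any database, annotated with the term or one of its descendants."""
    return sorted(key for key, terms in ANNOTATIONS.items()
                  if any(term in ancestors(t) for t in terms))

print(entries_for("T:2"))  # [('FlyDB', 'fly_0001'), ('MouseDB', 'mm_4242')]
print(entries_for("T:3"))  # [('FlyDB', 'fly_0001')]
```

The fly and mouse entries share no identifiers. They become comparable only because both point into the same ontology.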
![Page 35: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/35.jpg)
Genotype Phenotype
Sequence
Proteins
Gene products Transcript
Pathways
Cell type
BRENDA tissue / enzyme source
Development
Anatomy
Phenotype
Plasmodium life cycle
- Sequence types and features; Genetic context
- Molecule role; Molecular function; Biological process; Cellular component
- Protein covalent bond; Protein domain; UniProt taxonomy
- Pathway ontology; Event (INOH pathway ontology); Systems Biology; Protein-protein interaction
- Arabidopsis development; Cereal plant development; Plant growth and developmental stage; C. elegans development; Drosophila development (FBdv); Human developmental anatomy, abstract version; Human developmental anatomy, timed version
- Mosquito gross anatomy; Mouse adult gross anatomy; Mouse gross anatomy and development; C. elegans gross anatomy; Arabidopsis gross anatomy; Cereal plant gross anatomy; Drosophila gross anatomy; Dictyostelium discoideum anatomy; Fungal gross anatomy (FAO); Plant structure; Maize gross anatomy; Medaka fish anatomy and development; Zebrafish anatomy and development
- NCI Thesaurus; Mouse pathology; Human disease; Cereal plant trait; PATO (attribute and value); Mammalian phenotype; Habronattus courtship; Loggerhead nesting; Animal natural history and life history
eVOC (Expressed Sequence Annotation for Humans)
![Page 36: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/36.jpg)
The Sequence Ontology
(http://obo.sf.net)
![Page 37: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/37.jpg)
![Page 38: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/38.jpg)
GO in Analysis
• Microarray analysis was one of the original visions for GO
• Clusters of modulated genes group around the functional attributes of their proteins
• GO is also used in, for example, semantic similarity and text analysis
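Semantic similarity measures compare two terms (or two genes) through their positions in the ontology. Below is a minimal Python sketch of one simple, purely graph-based choice: the Jaccard overlap of the two terms' ancestor sets in a hypothetical is_a hierarchy. This measure is an assumption chosen for brevity. Widely used GO measures, such as Resnik's, instead weight shared ancestors by information content derived from annotation frequencies.

```python
# Hypothetical is_a hierarchy: term -> parent terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors(parent)
    return result

def jaccard_similarity(t1, t2):
    """Shared ancestors divided by all ancestors of either term."""
    a1, a2 = ancestors(t1), ancestors(t2)
    return len(a1 & a2) / len(a1 | a2)

print(jaccard_similarity("A1", "A2"))  # 0.5
print(jaccard_similarity("A1", "B"))   # 0.25
```

Sibling terms under "A" score higher than terms that meet only at the root, which matches the intuition that they describe more closely related functions.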
![Page 39: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/39.jpg)
Fact Management
• When “stamp collecting” we’re collecting facts
• Biology is a fact management activity
• Knowing what these facts mean is very important
• Science is performed on data, and the semantics of data enable us to do science
• Semantic e-Science
![Page 40: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/40.jpg)
Summary
• The nature of modern biology gives it interesting knowledge (fact) management issues
• It is a knowledge based discipline
• Not unique, but often extreme
• Ontologies seen as one component in management (but not a panacea)
![Page 41: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/41.jpg)
Acknowledgements
• All these people provided slides and input:
– Duncan Hull
– Simon Jupp
– Phil Lord
– Carole Goble
![Page 42: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/42.jpg)
Genotype to Pathway
Created by Paul Fisher
![Page 43: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/43.jpg)
Pathway to Phenotype
Created by Paul Fisher
![Page 44: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/44.jpg)
Ontology Space
Figure axes: (Axiomatic) Richness; Usage; Representation
![Page 45: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/45.jpg)
Metadata toilet
• Everyone wants to use good metadata but few people want to spend time curating and cleaning metadata
– Like a clean toilet
![Page 46: Knowledge Management in a Knowledge Based Discipline](https://reader036.vdocuments.us/reader036/viewer/2022081413/546fc79faf79594c3d8b45c5/html5/thumbnails/46.jpg)
Biologists Wake up to Standards