
Slide 1 (OpenGALEN)

Integrating Bio and Health Informatics: Ontologies for Bridging Scales, Contexts and Customs

Alan Rector

Bio and Health Informatics Forum / Medical Informatics Group

Department of Computer Science, University of Manchester

[email protected]

www.cs.man.ac.uk/mig

img.man.ac.uk

www.clinical-escience.org

mygrid.man.ac.uk

Slide 2

Organisation of Talk

• Convergence of needs and technology

• Barriers

• A response: the CLEF programme

• A unifying issue: Ontologies & Information Integration

• Summary

Slide 3

The Problem

• The next steps in exploiting our exploding knowledge of basic biology depend on understanding its relation to health and disease.

• Health care is
  – Deluged with information
    • about generalities, policies, and theory
  – Information and knowledge poor
    • about specifics of patient care and outcomes

Slide 4

A Convergence of Need

• Post genomic research

Knowledge is Fractal

• Safe, high quality, evidence based health care

Need more and better clinical information

• Which scales
  – In size
  – In complexity

Slide 5

A Convergence of Technologies

• Web/Grid/Semantic Web

• Ontologies & Information fusion

• Language technology

• Data mining and case based reasoning

• Healthcare records & standards

• Mobile devices

• Post genomic research

• Safe, high quality, evidence based health care

Open Collaborative Research

Slide 6

A Unique Time

• E-Science

• The Grid

• The Semantic Web / Grid

• BioInformatics Genomics/Proteomics…

• Massive investment in population medicine

• Massive investment in NHS computing

• Maturing Electronic Health Records

• …Ride the Whirlwind!

Slide 7

Protocol/Collection-based research

[Diagram labels: Results in vivo; Research idea; Protocol Authoring Tools; Data Collection Tools; Shared Collections; Models & Standards; Protocol Approval Tools; Automatic Patient Screening; Data Analysis Tools; Plausibility in Silico/Collecto]

Slide 8

Accelerating the Knowledge Cycle, Improving Quality of Care

[Diagram labels: Hypotheses Design; Anonymisation; Analysis & Integration; Annotation / Knowledge Representation; Info Sources; Anonymised Repository & Workbench; Information Fusion; Clinical Results; Individualised Medicine; Data Mining; Case-Based Reasoning; Data Capture; Language; Image/Signal; Genomic/Proteomic; Libraries; Re-use; Patient Care; Electronic Patient Records]

Slide 9

An opportunity for E-Science / E-Health

Slide 10

What is e-Science

• ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’

• ‘e-Science will change the dynamic of the way science is undertaken.’

John Taylor, Director General of Research Councils, Office of Science and Technology

Slide 11

The Semantic Web / Grid

• “The Semantic Web is a vision: the idea of having data on the web defined and linked in a way, that it can be used by machines - not just for display purposes, but for using it in various applications.”

www.semanticweb.org

• “Our vision of the infrastructure that is needed to support the full richness of the e-Science vision draws on research and development in both the Grid and the Semantic Web, and adopts a service-oriented approach. We call it the Semantic Grid.”

www.semanticgrid.org

Slide 12

UK e-Science/Grid Activities

• Emphasize co-operation rather than sheer size
  – “Virtual organisations”
  – “Collaboratories”
  – “Collection based science”

• Layered model
  – Organisational layer
  – Knowledge layer
  – Information layer
  – Computational/data layer

myGrid, CLEF

National e-Science Centre www.nesc.ac.uk

Slide 13

but… “Stones in the Road”

• Confidentiality, Privacy and Consent
  – Keeping public confidence while enabling research

• Information capture
  – Speed and ease of use require language technology

• Information integration
  – Need common ontologies to bridge bio and health information

Slide 14

Social Policy Imperative: Confidentiality, Privacy, and Consent

• Keeping public confidence while enabling research
  – Balance individual risks and societal benefits

• Social policy research
  – Evidence-based debate rather than dogmatic disputation

• Technical means for enforcement
  – Grid & web infrastructure not yet adequate

Potential Show Stopper

Good practice rigorously observed & sensibly enforced

Slide 15

The Information Capture Bottleneck: What clinicians have heard, seen, thought, & done

• Speed & ease of use for entering clinicians
  – Care can’t wait
  – Training opportunities minimal

• Language technology
  – Doctors dictate; nurses write; annotators annotate

• Quality
  – Much current information is unreliable
    • possibly even dangerous

Information from people rather than machines…

Slide 16

Information Integration Bottleneck: “Joining up meaning”

• Differences in concepts
  – “What is a gene?”
  – “What is a disease?”

• Differences in purpose & relevance
  – Clinical care vs clinical research

• Differences in context
  – Mouse vs human anatomy

• Differences in granularity
  – Genetic, genomic, … organ … organism

Slide 17

One response: CLEF, “Clinical E-Science Framework”

A Demonstrator in Cancer Care & Research

Slide 18

CLEF: Towards an “end-to-end” solution in an ethical framework

• Patient care

• Formulation of clinical studies

• Information capture

• Information representation

• Information analysis and integration

• Knowledge & hypothesis generation

• Clinical support

Slide 19

CLEF: A meeting of open technologies

• Organisational issues & Information governance
  – Consent, Models of access, balance of research and privacy

• Information capture & quality
  – Language technology + Ontologies + E Health Record (OpenEHR)

• Information use for Care
  – E Health Record + Decision support + Ontologies + Language generation

• Information Re-use for Research
  – Pseudonymised E Health Record + Ontologies + Metadata/repositories

Slide 20

CLEF: Language Technology

• Extraction of simple information from clinical records
  – Measures of reliability

• Pseudonymisation aids

• Language generation
  – Validation
    • “What you see is what you meant”
  – Presentation

Slide 21

CLEF: An attempt at the possible

• Maximising
  – Improvement in information quantity, quality, and reliability

• Minimising
  – Changes in clinicians’ behaviour
  – Additional costs

• Scaling
  – Developing in specialist centres
  – Testing in routine care

Slide 22

CLEF: Information Integration. The role of ontologies

Slide 23

Ontologies and Knowledge Resources

• The common conceptualisation of a field
  – Common language; common facts
    • Anatomy, physiology, structure, drugs, sequences, SNPs

• For…
  – Integration & Information fusion
    • Linking resources
  – Indexing
    • The right information at the right time
  – Annotation and Metadata
    • Significance + Information Meaning

A Common resource requiring Common Effort & Common Tools in Common Use

Slide 24

Logic-based Ontologies: Conceptual Lego

“SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…”

“Hand which is anatomically normal”

Slide 25

Logic Based Ontologies: The basics

Primitives:

Thing
  Structure
    Heart
    MitralValve
    Encrustation
  Feature
    pathological
    red

Descriptions:

MitralValve* ALWAYS partOf: Heart

Encrustation* ALWAYS feature: pathological

Definitions:

Encrustation + involves: MitralValve

Thing + feature: pathological

Structure + feature: pathological + involves: Heart

[Further diagram labels: + (feature: pathological); red + partOf: Heart (shown twice)]

Build-up steps: Primitives, Descriptions, Definitions, Reasoning, Validating (constraining cross products)
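The reasoning step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the OpenGALEN classifier: the placement of Feature under Thing, the function names, and the rule that "involves" is inherited along "partOf" are all assumptions made for the example.

```python
# Primitive hierarchy (child -> parent). Feature under Thing is an assumption.
PARENT = {
    "Structure": "Thing", "Feature": "Thing",
    "Heart": "Structure", "MitralValve": "Structure", "Encrustation": "Structure",
    "pathological": "Feature", "red": "Feature",
}

# Descriptions: facts that ALWAYS hold for a primitive.
ALWAYS = {
    "MitralValve": {("partOf", "Heart")},
    "Encrustation": {("feature", "pathological")},
}

def ancestors(concept):
    """Return the concept followed by all of its ancestors."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def is_a(specific, general):
    return general in ancestors(specific)

def necessary(concept):
    """Collect ALWAYS facts from the concept and its ancestors."""
    facts = set()
    for c in ancestors(concept):
        facts |= ALWAYS.get(c, set())
    return facts

def filler_matches(relation, specific, general):
    """True if the specific filler satisfies a criterion asking for the general one."""
    if is_a(specific, general):
        return True
    if relation == "involves":
        # Assumed rule: involving a part counts as involving its whole.
        return any(r == "partOf" and filler_matches(relation, whole, general)
                   for r, whole in necessary(specific))
    return False

def subsumes(general, specific):
    """Does the general definition cover the specific one?"""
    g_base, g_criteria = general
    s_base, s_criteria = specific
    if not is_a(s_base, g_base):
        return False
    have = set(s_criteria) | necessary(s_base)
    return all(any(r == gr and filler_matches(gr, f, gf) for r, f in have)
               for gr, gf in g_criteria)

# Definitions: a base concept plus criteria, as on the slide.
ENCRUSTED_MITRAL_VALVE = ("Encrustation", frozenset({("involves", "MitralValve")}))
PATHOLOGICAL_THING = ("Thing", frozenset({("feature", "pathological")}))
PATHOLOGICAL_HEART_STRUCTURE = (
    "Structure", frozenset({("feature", "pathological"), ("involves", "Heart")}))
```

Under these assumptions, `subsumes(PATHOLOGICAL_HEART_STRUCTURE, ENCRUSTED_MITRAL_VALVE)` is true even though nobody asserted it: the encrustation is pathological by description, and the mitral valve is part of the heart.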

Slide 26

Bridging Scales with Ontologies

Scales: Genes, Species, Protein, Function, Disease

CFTRGene in humans

Protein coded by (CFTRGene & in humans)

Membrane transport mediated by (Protein coded by (CFTRGene in humans))

Disease caused by (abnormality in (Membrane transport mediated by (Protein coded by (CFTRGene & in humans))))
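The nesting above can be mirrored with a small recursive data structure. The sketch below is illustrative only: the class names, the `scale` tags, and the helper functions are assumptions introduced for the example, and "in humans" is folded into the concept name rather than modelled separately.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Concept:
    name: str
    scale: str  # hypothetical tag for the scale the concept lives at

@dataclass(frozen=True)
class Linked:
    head: Concept
    relation: str
    target: "Expr"

Expr = Union[Concept, Linked]

def render(expr):
    """Write the expression in the slide's bracketed notation."""
    if isinstance(expr, Concept):
        return expr.name
    return f"{expr.head.name} {expr.relation} ({render(expr.target)})"

def scales(expr):
    """List the scales the expression passes through, outermost first."""
    if isinstance(expr, Concept):
        found = [expr.scale]
    else:
        found = [expr.head.scale] + scales(expr.target)
    return list(dict.fromkeys(found))  # drop repeats, keep order

gene = Concept("CFTRGene in humans", "gene")
protein = Linked(Concept("Protein", "protein"), "coded by", gene)
transport = Linked(Concept("Membrane transport", "function"), "mediated by", protein)
abnormality = Linked(Concept("abnormality", "function"), "in", transport)
disease = Linked(Concept("Disease", "disease"), "caused by", abnormality)
```

Here `render(disease)` reproduces the last expression on the slide, and `scales(disease)` gives `['disease', 'function', 'protein', 'gene']`: one expression that reaches from the clinical level down to the gene.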

Slide 27

Avoiding combinatorial explosions

From “phrase book” to “dictionary + grammar”

• The “Exploding Bicycle”
  – 1980 - ICD-9 (E826) 8
  – 1990 - READ-2 (T30..) 81
  – 1995 - READ-3 87
  – 1996 - ICD-10 (V10-19 Australian) 587
    • V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
  – and meanwhile elsewhere in ICD-10
    • W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
    • X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
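The difference between a “phrase book” and a “dictionary + grammar” is easy to see numerically. In the sketch below the axes and their values are hypothetical, loosely modelled on the V31.22 example; they are not actual ICD-10 content, and the function names are made up for the illustration.

```python
from itertools import product

# Hypothetical axes of description; values are illustrative only.
AXES = {
    "victim_vehicle": ["pedal cycle", "three-wheeled motor vehicle", "car"],
    "collided_with": ["pedal cycle", "car", "fixed object"],
    "victim_role": ["driver", "passenger", "person on outside of vehicle"],
    "setting": ["traffic accident", "nontraffic accident"],
    "activity": ["while working for income",
                 "while engaged in sports activity",
                 "while resting"],
}

def phrase_book(axes):
    """Enumerate every combination as its own pre-coordinated entry."""
    return [" / ".join(combo) for combo in product(*axes.values())]

def dictionary_size(axes):
    """Count the terms needed when entries are composed on demand."""
    return sum(len(values) for values in axes.values())

def with_extra_activity(axes, activity):
    """Return a copy of the axes with one more activity added."""
    extended = {name: list(values) for name, values in axes.items()}
    extended["activity"].append(activity)
    return extended
```

With these five small axes the phrase book already has 162 entries against 14 dictionary terms. Adding a single activity grows the dictionary by one term and the phrase book by 54 entries, which is the pattern behind the jump from 8 to 587 above.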

Slide 28

The Cost: Normalising (untangling) Ontologies

[Diagram labels: Structure; Function; Part-whole]

Slide 29

The Cost: Normalising (untangling) Ontologies

Making each meaning explicit and separate

PhysSubstance Protein ProteinHormone Insulin Enzyme Steroid SteroidHormone Hormone ProteinHormone^ Insulin^ SteroidHormone^ Catalyst Enzyme^

Hormone = Substance & playsRole-HormoneRole

ProteinHormone = Protein & playsRole-HormoneRole

SteroidHormone = Steroid & playsRole-HormoneRole

Catalyst = Substance & playsRole CatalystRole

Insulin playsRole HormoneRole

…build it all by combining simple trees

Enzyme ?=? Protein & playsRole-CatalystRole

PhysSubstance Protein ‘ProteinHormone’ Insulin ‘Enzyme’ Steroid ‘SteroidHormone’ ‘Hormone’ ‘ProteinHormone’ Insulin^ ‘SteroidHormone’ ‘Catalyst’ ‘Enzyme’

… ActionRole PhysiologicRole HormoneRole CatalystRole …

… Substance BodySubstance Protein Insulin Steroid …
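A rough Python sketch of “build it all by combining simple trees” follows. It assumes a simplified substance tree (Substance > Protein > Insulin, Substance > Steroid; BodySubstance is omitted because its position is not recoverable from the flattened slide) and uses invented function names. Enzyme is left out because the slide itself leaves its definition open.

```python
# Simple primitive tree (child -> parent); the shape is an assumption.
PARENT = {"Protein": "Substance", "Steroid": "Substance", "Insulin": "Protein"}

# Asserted role facts, as on the slide.
PLAYS = {"Insulin": {"HormoneRole"}}

# Defined classes: name -> (base substance, role played).
DEFINED = {
    "Hormone": ("Substance", "HormoneRole"),
    "ProteinHormone": ("Protein", "HormoneRole"),
    "SteroidHormone": ("Steroid", "HormoneRole"),
    "Catalyst": ("Substance", "CatalystRole"),
}

def ancestors(concept):
    """Return the concept followed by all of its ancestors."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def roles_of(concept):
    """Roles asserted for the concept or inherited from its ancestors."""
    return set().union(*(PLAYS.get(a, set()) for a in ancestors(concept)))

def inferred_classes(substance):
    """Defined classes a primitive substance falls under."""
    kinds, roles = ancestors(substance), roles_of(substance)
    return sorted(name for name, (base, role) in DEFINED.items()
                  if base in kinds and role in roles)

def defined_superclasses(name):
    """Other defined classes that subsume the named defined class."""
    base, role = DEFINED[name]
    return sorted(other for other, (obase, orole) in DEFINED.items()
                  if other != name and orole == role and obase in ancestors(base))
```

Nothing in the input says that Insulin is a ProteinHormone or that ProteinHormone is a kind of Hormone. Both fall out of the definitions: `inferred_classes("Insulin")` gives `['Hormone', 'ProteinHormone']` and `defined_superclasses("ProteinHormone")` gives `['Hormone']`, so the tangled multiple hierarchy is computed rather than maintained by hand.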

Slide 30

Distributed cooperative development: Developed & owned by the community

• Tools to
  – Hide complexity
  – Guide domain experts
  – Re-use common resources
  – Track provenance

• Self-training and support for users

• “Just in time terminology”
  – Distributed loosely coupled development

Slide 31

CLEF: Re-Use & Integration

• Pseudonymised longitudinal repository
  – Fine grained security
    • Authorisation and Consent
  – Integrated clinical, genomics, imaging information
    • What happened? When? Why?
    • What was done? When? Why?

• Clinical E-Science Workstation
  – Common access at varying levels of aggregation
  – Human Factors

• Bio-Clinical problem solving
  – What are the high value scientific questions?

Slide 32

Summary

• Convergence of need in healthcare & post genomic research
  – Matched by convergence of technologies

• E-Science – an opportunity for collaboration
  – Faster, less costly, more effective translation from bioscience to health care

• Barriers to be overcome
  – Privacy, confidentiality, & consent
  – Information capture
  – Information integration – sharing of meaning

• Common “Ontologies” are a key resource

Slide 33

CLEF Consortium: Academic & NHS Partners

• Bio Health Informatics Forum, Department of Computer Science, University of Manchester

• Centre for Health Informatics and Multiprofessional Education, University College London

• Natural Language Group, Department of Computer Science, University of Sheffield

• Judge Institute for Management Studies, University of Cambridge

• Information Technology Research Institute, University of Brighton

• Royal Marsden Hospital Trust

• North and North Central London Cancer Networks