1. Integrating Bio and Health Informatics: Ontologies for Bridging Scales, Contexts and Customs
OpenGALEN
Alan Rector
Bio and Health Informatics Forum / Medical Informatics Group
Department of Computer Science, University of Manchester
[email protected]
www.cs.man.ac.uk/mig
img.man.ac.uk
www.clinical-escience.org
mygrid.man.ac.uk
2. Organisation of Talk
• Convergence of needs and technology
• Barriers
• A response: the CLEF programme
• A unifying issue: Ontologies & Information Integration
• Summary
3. The Problem
• The next steps in exploiting our exploding knowledge of basic biology depend on understanding its relation to health and disease.
• Health care is
– Deluged with information
• about generalities, policies, and theory
– Information and knowledge poor
• about specifics of patient care and outcomes
4. A Convergence of Need
• Post genomic research
Knowledge is Fractal
• Safe, high quality, evidence based health care
Need more and better clinical information
• Which scales
– In Size
– In Complexity
5. A Convergence of Technologies
• Web/Grid/Semantic Web
• Ontologies & Information fusion
• Language technology
• Data mining and case based reasoning
• Healthcare records & standards
• Mobile devices
• Post genomic research
• Safe, high quality, evidence based health care
Open Collaborative Research
6. A Unique Time
• E-Science
• The Grid
• The Semantic Web / Grid
• BioInformatics Genomics/Proteomics…
• Massive investment in population medicine
• Massive investment in NHS computing
• Maturing Electronic Health Records
• …Ride the Whirlwind!
7. Protocol/Collection-based research
Diagram labels:
Results in vivo
Research idea
Protocol Authoring Tools
Data Collection Tools
Shared Collections
Models & Standards
Protocol Approval Tools
Automatic Patient Screening
Data Analysis Tools
Plausibility in Silico/Collecto
8. Accelerating the Knowledge Cycle, Improving Quality of Care
Diagram labels:
Hypotheses Design
Anonymisation
Analysis & Integration
Annotation / Knowledge Representation
Info Sources
Anonymised Repository & Workbench
Information Fusion
Clinical Results
Individualised Medicine
Data Mining
Case-Base Reasoning
Data Capture
Language
Image/Signal
Genomic/Proteomic
Libraries
Re-use
Patient Care
Electronic Patient Records
10. What is e-Science
• ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
• ‘e-Science will change the dynamic of the way science is undertaken.’
John Taylor, Director General of Research Councils, Office of Science and Technology
11. The Semantic Web / Grid
• “The Semantic Web is a vision: the idea of having data on the web defined and linked in a way, that it can be used by machines - not just for display purposes, but for using it in various applications.”
www.semanticweb.org
• “Our vision of the infrastructure that is needed to support the full richness of the e-Science vision draws on research and development in both the Grid and the Semantic Web, and adopts a service-oriented approach. We call it the Semantic Grid.”
www.semanticgrid.org
12. UK e-Science/Grid Activities
• Emphasize co-operation rather than sheer size
– “Virtual organisations”
– “Collaboratories”
– “Collection based science”
• Layered model
– Organisational layer
– Knowledge layer
– Information layer
– Computational/data layer
myGrid, CLEF
National e-Science Centre www.nesc.ac.uk
13. but… “Stones in the Road”
• Confidentiality, Privacy and Consent
– Keeping public confidence while enabling research
• Information capture
– Speed and ease of use require language technology
• Information integration
– Need common ontologies to bridge bio and health information
14. Social Policy Imperative
Confidentiality, Privacy, and Consent
• Keeping public confidence while enabling research
– Balance individual risks and societal benefits
• Social policy research
– Evidence-based debate rather than dogmatic disputation
• Technical means for enforcement
– Grid & web infrastructure not yet adequate
Potential Show Stopper
Good practice rigorously observed & sensibly enforced
15. The Information Capture Bottleneck
What clinicians have heard, seen, thought, & done
• Speed & ease of use for entering clinicians
– Care can’t wait
– Training opportunities minimal
• Language technology
– Doctors dictate; nurses write; annotators annotate
• Quality
– Much current information is unreliable
• possibly even dangerous
Information from people rather than machines…
16. Information Integration Bottleneck
“Joining up meaning”
• Differences in concepts
– “What is a gene?”
– “What is a disease?”
• Differences in purpose & relevance
– Clinical care vs clinical research
• Differences in context
– Mouse vs human anatomy
• Differences in granularity
– Genetic, genomic, … organ…organism
17. One response: CLEF
“Clinical E-Science Framework”
A Demonstrator in Cancer Care & Research
18. CLEF
Towards an “end-to-end” solution in an ethical framework
• Patient care
• Formulation of clinical studies
• Information capture
• Information representation
• Information analysis and integration
• Knowledge & hypothesis generation
• Clinical support
19. CLEF: A meeting of open technologies
• Organisational issues & Information governance
– Consent, Models of access, balance of research and privacy
• Information capture & quality
– Language technology + Ontologies + E Health Record (OpenEHR)
• Information use for Care
– E Health Record + Decision support + Ontologies + Language generation
• Information Re-use for Research
– Pseudonymised E Health Record + Ontologies + Metadata/repositories
20. CLEF: Language Technology
• Extraction of simple information from clinical records
– Measures of reliability
• Pseudonymisation aids
• Language generation
– Validation
• “What you see is what you meant”
– Presentation
21. CLEF: An attempt at the possible
• Maximising
– Improvement in information quantity, quality, and reliability
• Minimising
– Changes in clinicians’ behaviour
– Additional costs
• Scaling
– Developing in specialist centres
– Testing in routine care
22. CLEF: Information Integration
The role of ontologies
23. Ontologies and Knowledge Resources
• The common conceptualisation of a field
– Common language; common facts
• Anatomy, physiology, structure, drugs, sequences, SNPs
• For…
– Integration & Information fusion
• Linking resources
– Indexing
• The right information at the right time
– Annotation and Metadata
• Significance + Information Meaning
A Common resource requiring Common Effort & Common Tools in Common Use
24. Logic-based Ontologies: Conceptual Lego
“SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…”
“Hand which is anatomically normal”
25. Logic Based Ontologies: The basics
Primitives, Descriptions, Definitions, Reasoning, Validating (constraining cross products)
Primitive hierarchy:
Thing
– Structure: Heart, MitralValve, Encrustation
– Feature: pathological, red
Descriptions:
MitralValve * ALWAYS partOf: Heart
Encrustation * ALWAYS feature: pathological
Expressions in the diagram:
Thing + feature: pathological
Structure + feature: pathological
Structure + feature: pathological + involves: Heart
Encrustation + involves: MitralValve
+ (feature: pathological)
red + partOf: Heart
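The reasoning step on this slide, in which "Encrustation + involves: MitralValve" is classified under "Structure + feature: pathological + involves: Heart", can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the GALEN classifier or any real description logic reasoner: `Concept`, `IS_A`, `ALWAYS`, `implied` and `subsumes` are hypothetical names, and the rule that "involves X" plus "X partOf Y" gives "involves Y" is an assumption made for illustration and applied only one step deep.

```python
from dataclasses import dataclass

# Primitive hierarchy from the slide: child -> parent.
IS_A = {
    "Structure": "Thing",
    "Feature": "Thing",
    "Heart": "Structure",
    "MitralValve": "Structure",
    "Encrustation": "Structure",
    "pathological": "Feature",
    "red": "Feature",
}

# Descriptions: statements that ALWAYS hold for a primitive.
ALWAYS = {
    "MitralValve": {("partOf", "Heart")},
    "Encrustation": {("feature", "pathological")},
}


def ancestors(name):
    """Return the name itself followed by all of its primitive ancestors."""
    out = [name]
    while name in IS_A:
        name = IS_A[name]
        out.append(name)
    return out


@dataclass(frozen=True)
class Concept:
    """A definition: a base primitive plus (property, value) criteria."""
    base: str
    criteria: frozenset = frozenset()


def implied(concept):
    """Collect every (property, value) pair that holds for the concept."""
    facts = set(concept.criteria)
    for primitive in ancestors(concept.base):
        facts |= ALWAYS.get(primitive, set())
    # Assumed toy rule: involves X, and X ALWAYS partOf Y => involves Y.
    for prop, value in list(facts):
        if prop == "involves":
            for prop2, value2 in ALWAYS.get(value, set()):
                if prop2 == "partOf":
                    facts.add(("involves", value2))
    return facts


def subsumes(general, specific):
    """True if every instance of `specific` must also be a `general`."""
    if general.base not in ancestors(specific.base):
        return False
    have = implied(specific)
    return all(
        any(p == gp and gv in ancestors(v) for p, v in have)
        for gp, gv in general.criteria
    )


encrusted_valve = Concept("Encrustation", frozenset({("involves", "MitralValve")}))
pathological_heart_structure = Concept(
    "Structure", frozenset({("feature", "pathological"), ("involves", "Heart")})
)
print(subsumes(pathological_heart_structure, encrusted_valve))  # True
```

Nobody states that an encrustation of the mitral valve is a pathological heart structure; it follows from the two ALWAYS descriptions, which is the point of building definitions from primitives.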
26. Bridging Scales with Ontologies
Genes, Species, Protein, Function, Disease
CFTRGene in humans
Protein coded by (CFTRgene & in humans)
Membrane transport mediated by (Protein coded by (CFTRgene in humans))
Disease caused by (abnormality in (Membrane transport mediated by (Protein coded by (CFTRgene & in humans))))
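The nested expressions on this slide reuse each level as the filler of the next, which is how one description reaches from gene to disease. A minimal sketch of that composition, assuming a hypothetical `Expr` type and `chain` helper that are not part of any ontology toolkit:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Expr:
    """head + relation + filler, where the filler may itself be an Expr."""
    head: str
    relation: str
    filler: Union["Expr", str]

    def render(self):
        inner = self.filler.render() if isinstance(self.filler, Expr) else self.filler
        return f"{self.head} {self.relation} ({inner})"


def chain(expr):
    """List the heads from the outermost expression inward."""
    heads = []
    while isinstance(expr, Expr):
        heads.append(expr.head)
        expr = expr.filler
    return heads


# Each scale is built from the one below it.
protein = Expr("Protein", "coded by", "CFTRGene in humans")
transport = Expr("Membrane transport", "mediated by", protein)
abnormality = Expr("abnormality", "in", transport)
disease = Expr("Disease", "caused by", abnormality)

print(disease.render())
print(chain(disease))
```

Because `protein` and `transport` are ordinary values, the same sub-expression can be shared by descriptions at several scales instead of being restated in each.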
27. Avoiding combinatorial explosions
• The “Exploding Bicycle”: from “phrase book” to “dictionary + grammar”
– 1980 - ICD-9 (E826) 8
– 1990 - READ-2 (T30..) 81
– 1995 - READ-3 87
– 1996 - ICD-10 (V10-19 Australian) 587
• V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
– and meanwhile elsewhere in ICD-10
• W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
• X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
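The move from “phrase book” to “dictionary + grammar” can be illustrated with a small count. This is a minimal sketch under stated assumptions: the axes and values in `AXES` are made up for illustration, are not the actual ICD-10 or READ axes, and do not reproduce the figures on the slide. A phrase book must list every combination in advance, while a dictionary plus grammar stores each value once and composes a phrase when it is needed.

```python
from itertools import product
from math import prod

# Illustrative axes only (assumed, not taken from any coding scheme).
AXES = {
    "vehicle": ["pedal cycle", "motorcycle", "three-wheeled motor vehicle", "car"],
    "counterpart": ["pedestrian", "pedal cycle", "car", "heavy vehicle", "fixed object"],
    "role": ["driver", "passenger", "person on outside of vehicle"],
    "setting": ["traffic accident", "nontraffic accident"],
    "activity": ["sports activity", "leisure", "working for income", "resting"],
}


def phrase_book(axes):
    """Enumerate every pre-coordinated phrase: one entry per combination."""
    return [" / ".join(combo) for combo in product(*axes.values())]


def dictionary_size(axes):
    """Count the entries a dictionary needs: each value is stored once."""
    return sum(len(values) for values in axes.values())


def compose(axes, **choice):
    """The 'grammar': build one phrase on demand from dictionary entries."""
    for axis, value in choice.items():
        if value not in axes[axis]:
            raise ValueError(f"{value!r} is not a known value for {axis!r}")
    return ", ".join(f"{axis}: {value}" for axis, value in choice.items())


print(len(phrase_book(AXES)), "phrase-book entries")
print(dictionary_size(AXES), "dictionary entries")
print(compose(AXES, vehicle="pedal cycle", setting="nontraffic accident"))
```

Adding one value to an axis adds one dictionary entry but multiplies the phrase book, which is the growth pattern the code counts above show.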
28. The Cost: Normalising (untangling) Ontologies
Diagram labels: Structure, Function, Part-whole
29. The Cost: Normalising (untangling) Ontologies
Making each meaning explicit and separate
PhysSubstance Protein ProteinHormone Insulin Enzyme Steroid SteroidHormone Hormone ProteinHormone^ Insulin^ SteroidHormone^ Catalyst Enzyme^
Hormone = Substance & playsRole-HormoneRole
ProteinHormone = Protein & playsRole-HormoneRole
SteroidHormone = Steroid & playsRole-HormoneRole
Catalyst = Substance & playsRole-CatalystRole
Insulin playsRole HormoneRole
…build it all by combining simple trees
Enzyme ?=? Protein & playsRole-CatalystRole
PhysSubstance Protein ‘ProteinHormone’ Insulin ‘Enzyme’ Steroid ‘SteroidHormone’ ‘Hormone’ ‘ProteinHormone’ Insulin^ ‘SteroidHormone’ ‘Catalyst’ ‘Enzyme’
… ActionRole PhysiologicRole HormoneRole CatalystRole …
… Substance BodySubstance Protein Insulin Steroid …
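“Build it all by combining simple trees” can be sketched as follows: keep one plain tree for substances and one for roles, state the definitions once, and let a small classifier work out which defined classes each substance falls under. This is a minimal sketch under stated assumptions: the indentation of the two trees was lost in the transcript, so the parent links below are assumed; `defined_parents` is a hypothetical helper; and Enzyme is left out because the slide marks its definition as uncertain (“?=?”).

```python
# Simple tree 1: substances (child -> parent). Nesting is assumed.
SUBSTANCE_TREE = {
    "BodySubstance": "Substance",
    "Protein": "BodySubstance",
    "Steroid": "BodySubstance",
    "Insulin": "Protein",
}

# Simple tree 2: roles (child -> parent). Nesting is assumed.
ROLE_TREE = {
    "PhysiologicRole": "ActionRole",
    "HormoneRole": "PhysiologicRole",
    "CatalystRole": "PhysiologicRole",
}

# Definitions from the slide: name = base & playsRole-role.
DEFINITIONS = {
    "Hormone": ("Substance", "HormoneRole"),
    "ProteinHormone": ("Protein", "HormoneRole"),
    "SteroidHormone": ("Steroid", "HormoneRole"),
    "Catalyst": ("Substance", "CatalystRole"),
}

# Asserted fact from the slide: Insulin playsRole HormoneRole.
PLAYS_ROLE = {"Insulin": {"HormoneRole"}}


def ancestors(name, tree):
    """Return the name itself followed by its ancestors in the given tree."""
    out = [name]
    while name in tree:
        name = tree[name]
        out.append(name)
    return out


def defined_parents(base, roles):
    """Defined classes whose base and role both cover the description."""
    return sorted(
        name
        for name, (def_base, def_role) in DEFINITIONS.items()
        if def_base in ancestors(base, SUBSTANCE_TREE)
        and any(def_role in ancestors(role, ROLE_TREE) for role in roles)
    )


# Insulin is never listed under Hormone by hand; the placement is inferred.
print(defined_parents("Insulin", PLAYS_ROLE["Insulin"]))

# A defined class can be classified the same way, using its own definition.
base, role = DEFINITIONS["ProteinHormone"]
print(defined_parents(base, {role}))
```

The multiple parents that the tangled hierarchy had to maintain by hand (the entries marked ^) come out of the classifier instead, so each fact is stated in exactly one place.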
30. Distributed cooperative development
Developed & owned by the community
• Tools to
– Hide complexity
– Guide domain experts
– Re-use common resources
– Track provenance
• Self-training and support for users
• “Just in time terminology”
– Distributed loosely coupled development
31. CLEF: Re-Use & Integration
• Pseudonymised longitudinal repository
– Fine grained security
• Authorisation and Consent
– Integrated clinical, genomics, imaging information
• What happened? When? Why?
• What was done? When? Why?
• Clinical E-Science Workstation
– Common access at varying levels of aggregation
– Human Factors
• Bio-Clinical problem solving
– What are the high value scientific questions?
32. Summary
• Convergence of need in healthcare & post genomic research
– Matched by convergence of technologies
• E-Science – an opportunity for collaboration
– Faster, less costly, more effective translation from bioscience to health care
• Barriers to be overcome
– Privacy, confidentiality, & consent
– Information capture
– Information integration – sharing of meaning
• Common “Ontologies” are a key resource
33. CLEF Consortium: Academic & NHS Partners
• Bio Health Informatics Forum, Department of Computer Science, University of Manchester
• Centre for Health Informatics and Multiprofessional Education, University College London
• Natural Language Group, Department of Computer Science, University of Sheffield
• Judge Institute for Management Studies, University of Cambridge
• Information Technology Research Institute, University of Brighton
• Royal Marsden Hospital Trust
• North and North Central London Cancer Networks