Phenotype Capture in Genetic Variant Databases
Peng Chen
School of Computer and Information Science
Supervisor: Dr Jan Stanek
Research Fields: Health Informatics
Health Computer Science
Health Information System
Outline
Motivation
Research Question
Literature
Methodology
Phenotype Data Review Result
The openEHR Archetypes Review Result
Phenotype Capture Experiment Result
Conclusion
Motivation
1950s health computer science, EHR (Electronic Health Record)
Slow development
Bio-medical research & EHR systems
Genotype – Phenotype correlation
Research Question
Can the existing standard openEHR be used to capture and store phenotype data/clinical data?
Hypothesis one: most of the phenotype data in genetic variant databases is not coded, has little clinical detail, and is not stored in a consistent manner.
Hypothesis two: openEHR is potentially suitable to store phenotype data as a standard.
Literature
Claustres et al. (2002) ‘Time for a Unified System of Mutation Description and Reporting: A Review of Locus-Specific Mutation Databases’
Mitropoulou et al. (2010) ‘Locus-specific database domain and data content analysis: evolution and content maturation toward clinical use’
Spath & Grimson (2011) ‘Applying the archetype approach to the database of a biobank information management system’
Chen et al. (2009) ‘Archetype-based conversion of EHR content models: pilot experience with a regional EHR system’
Methodology
Criteria form for phenotype review
1. Storage
Collect phenotypes
Internal storage
Proprietary external storage
Foreign external storage
2. Terminology
Formal terminology
Proprietary terms (mapped to a recognised terminology)
External terminology used directly
Recognised terminology
3. Coding standard
Formal coding standard
Proprietary codes (mapped to a recognised coding standard)
External coding standard used directly
Recognised coding standard
4. Granularity
Overall granularity level
Partial fine-grained phenotypes
5. Curation
Curated
6. Multiple phenotypes
Single phenotype
Multiple phenotype
7. Case level
Variant-level phenotypes
Case-level phenotypes
8. Database
Database family
Platform
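To make the criteria form concrete, one reviewed database can be pictured as a single record whose fields follow the eight criteria groups. The sketch below is only an illustration: the class and field names (`ReviewRecord`, `count_where`, and so on) are hypothetical and are not taken from the study's actual review tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Storage(Enum):
    INTERNAL = "internal storage"
    PROPRIETARY_EXTERNAL = "proprietary external storage"
    FOREIGN_EXTERNAL = "foreign external storage"


class Level(Enum):
    VARIANT = "variant-level phenotypes"
    CASE = "case-level phenotypes"


@dataclass
class ReviewRecord:
    """One reviewed database, scored against the criteria form (hypothetical layout)."""
    database_name: str
    collects_phenotypes: bool                 # 1. Storage
    storage: Optional[Storage] = None
    formal_terminology: bool = False          # 2. Terminology
    formal_coding_standard: bool = False      # 3. Coding standard
    fine_grained: bool = False                # 4. Granularity
    curated: bool = False                     # 5. Curation
    multiple_phenotypes: bool = False         # 6. Multiple phenotypes
    level: Optional[Level] = None             # 7. Case level
    database_family: Optional[str] = None     # 8. Database
    platform: Optional[str] = None


def count_where(records: Iterable[ReviewRecord], field: str) -> int:
    """Count phenotype-collecting databases for which a boolean criterion holds."""
    return sum(1 for r in records if r.collects_phenotypes and getattr(r, field))
```

Counting only the databases that collect phenotypes mirrors how the review results are reported: as shares of the phenotype-collecting databases rather than of all reviewed databases.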
Methodology
The openEHR phenotype capture model
Methodology
Data integration workflow towards a proposed health care EHR integration architecture
Phenotype Data Review Result
Reviewed 1224 databases; 978 collect phenotype data, all in internal storage.
Of these 978 databases:
40 (4.1%) use a formal terminology; 30 (3.1%) use a formal coding standard.
959 (98%) store low-granularity phenotype data.
604 (62%) were curated by experts.
534 (54.6%) store single phenotype data; 444 (45.4%) store multiple phenotype data.
757 (77.4%) store phenotypes on a case basis; 221 (22.6%) on a variant basis.
Database:
Database family    Number    Platform
LOVD               614       MySQL
UMD                13        4D SQL DB
63% of databases are LOVD

Platform               Number
MySQL DB               617
Web page table form    209
Web page free text     132
4D SQL DB              13
PDF table form         4
Excel table form       2
Web page bar chart     1
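The shares reported in this review can be rechecked from the raw counts. The sketch below recomputes each percentage against the 978 phenotype-collecting databases and confirms that the platform counts add up to that total. The counts are those given on the slide; the dictionary keys and the helper name `pct` are illustrative only.

```python
COLLECTING = 978  # databases that collect phenotype data (out of 1224 reviewed)

criteria_counts = {
    "formal terminology": 40,
    "formal coding standard": 30,
    "low granularity": 959,
    "curated by experts": 604,
    "single phenotype": 534,
    "multiple phenotype": 444,
    "case basis": 757,
    "variant basis": 221,
    "LOVD family": 614,
}

platform_counts = {
    "MySQL DB": 617,
    "Web page table form": 209,
    "Web page free text": 132,
    "4D SQL DB": 13,
    "PDF table form": 4,
    "Excel table form": 2,
    "Web page bar chart": 1,
}


def pct(count: int, total: int = COLLECTING) -> float:
    """Share of `total`, as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for name, count in criteria_counts.items():
        print(f"{name}: {count} ({pct(count)}%)")
    # Every phenotype-collecting database should fall under exactly one platform.
    print("platform total:", sum(platform_counts.values()))
```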
Phenotype Data Review Result
Phenotype samples:
Sample 1: ‘MRX’, ‘ARRP’, ‘AMD’, ‘arCRD’, ‘CIPA or HSN IV (H406Y + G613V are polymorphisms)’, ‘Type I, type II, non syndromic recessive’
Sample 2: ‘Failure to thrive; Pneumocystis carinii pneumonia; Diarrhea; Marked lymphopenia’
Sample 3: Symptoms Other bacterial infections:
Symptoms Pseudomonas aeruginosa;
Symptoms Escherichia coli;
Symptoms Stenotrophomonas maltophilia;
Symptoms Other; Enterobacter cloacae
Symptoms Other symptoms: perirectal abscess and failure of the
Symptoms umbilical stump to involute, recurrent perirectal
Symptoms abscesses, an infected urachal cyst, a failure to heal
Symptoms surgical wounds, and the absence of pus in infected areas,
Symptoms leucocytosis, neutrophilia, hypochromic anemia
Treatment Bone marrow transplantation: Yes
Treatment Donor: matched sibling
Treatment Outcome: alive and well
Comment D57N mutant behaves in a dominant-negative fashion at the
Comment cellular level
The openEHR Archetypes Review Result
Reviewed 283 existing openEHR archetypes
Multilingual translation mechanism
Term binding mechanism

Criteria                         Result
Number of terms                  7361
Number of term bindings          94
Coding system                    SNOMED-CT, LOINC
Has term binding                 7 (2.5% of archetypes)
Has multilingual translations    83 (29.3% of archetypes)
Languages                        English, German, Arabic, Portuguese, Japanese, Russian, Dutch, Chinese, Spanish, Farsi
Compile failure                  14
Multilingual translation mechanism - example
ontology
terminologies_available = <"SNOMED-CT", ...>
term_definitions = <
…
["zh-cn"] = <
items = <
...
["at0004"] = <
text = <"收缩压">
description = <"一个血液循环周期中,系统性动脉血压高峰值。收缩期血压">
…
["de"] = <
items = <
...
["at0004"] = <
text = <"Systolisch">
description = <"Der höchste arterielle Blutdruck eines Zyklus - gemessen in der systolischen oder Kontraktionsphase des Herzens.">
…
["en"] = <
items = <
...
["at0004"] = <
text = <"Systolic">
description = <"Peak systemic arterial blood pressure - measured in systolic or contraction phase of the heart cycle.">
>
(ADL display)
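The effect of the multilingual mechanism can be sketched in a few lines: the same archetype node code (here `at0004`) resolves to a different display text per language, while the stored data refers only to the code. The dictionary below copies the three texts from the ADL example above. The function name `term_text` and the fall-back-to-English behaviour are assumptions made for illustration, not part of openEHR itself.

```python
# Term texts keyed by language, then by archetype node code.
# Values are copied from the ADL example; only at0004 is shown there.
TERM_DEFINITIONS = {
    "zh-cn": {"at0004": "收缩压"},
    "de": {"at0004": "Systolisch"},
    "en": {"at0004": "Systolic"},
}


def term_text(code: str, language: str, fallback: str = "en") -> str:
    """Return the display text for a node code in the requested language.

    Falls back to `fallback` when the language or the code is missing
    (an assumed policy for this sketch). Raises KeyError if the code is
    unknown in the fallback language as well.
    """
    items = TERM_DEFINITIONS.get(language, {})
    if code in items:
        return items[code]
    return TERM_DEFINITIONS[fallback][code]


if __name__ == "__main__":
    for lang in ("en", "de", "zh-cn", "fr"):
        print(lang, "->", term_text("at0004", lang))
```

Because the recorded value is bound to `at0004` rather than to any one label, changing the display language does not touch the stored data.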
The openEHR Archetypes Review Result
Multilingual translation mechanism - compare
Term binding mechanism
term_bindings = <
["SNOMED-CT"] = <
items = <
["at0000"] = <[SNOMED-CT(2003)::163020007]>
["at0004"] = <[SNOMED-CT(2003)::163030003]>
["at0005"] = <[SNOMED-CT(2003)::163031004]>
["at0013"] = <[SNOMED-CT(2003)::246153002]>
>
>
(ADL display)
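As a rough illustration of what a term binding carries, the lines of the ADL fragment above can be read into a lookup table from archetype node code to external concept. This is a minimal regular-expression sketch that handles only the single-line binding form shown here; it is not a general ADL parser, and the names `BINDING_RE` and `parse_term_bindings` are hypothetical.

```python
import re

# Matches lines such as: ["at0004"] = <[SNOMED-CT(2003)::163030003]>
BINDING_RE = re.compile(
    r'\["(at\d+)"\]\s*=\s*<\[([^(\]]+)\(([^)]*)\)::(\d+)\]>'
)

ADL_FRAGMENT = '''
term_bindings = <
["SNOMED-CT"] = <
items = <
["at0000"] = <[SNOMED-CT(2003)::163020007]>
["at0004"] = <[SNOMED-CT(2003)::163030003]>
["at0005"] = <[SNOMED-CT(2003)::163031004]>
["at0013"] = <[SNOMED-CT(2003)::246153002]>
>
>
'''


def parse_term_bindings(adl_text: str) -> dict:
    """Map each node code to (terminology, version, concept id)."""
    bindings = {}
    for code, terminology, version, concept_id in BINDING_RE.findall(adl_text):
        bindings[code] = (terminology, version, concept_id)
    return bindings


if __name__ == "__main__":
    for code, target in parse_term_bindings(ADL_FRAGMENT).items():
        print(code, "->", target)
```

Two systems that bind their own node codes to the same external concept identifier can recognise that they record the same thing, which is the basis of the interoperability argument made in this deck.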
The openEHR Archetypes Review Result
Phenotype Capture Experiment Result
The chosen sample:
The mapping of concepts:
Diagnosis Wiskott Aldrich syndrome
Symptoms Platelets
Symptoms At date of diagnosis: Count: 28,000/µL
Treatment Bone marrow transplantation: Yes
Treatment Donor: mismatched family donor
Phenotype Capture Experiment Result
The openEHR archetypes mapping:
Evaluation: Diagnosis
Observation: Symptom
Action: Treatment
NO. Archetypes Entry items
1 openEHR-EHR-EVALUATION.problem-diagnosis.v1.adl Diagnosis
2 openEHR-EHR-OBSERVATION.lab_test-full_blood_count.v1.adl Platelet count
3 openEHR-EHR-ACTION.procedure.v1.adl Procedure, Comments
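The mapping in the table above can be sketched as a simple routing step: each row of the chosen sample is sent to the archetype that corresponds to its section. This is a deliberately simplified illustration for this one sample only. Routing every "Symptoms" row to the full blood count archetype works here because the only symptom is a platelet count; it is not a general rule. The names `ARCHETYPE_FOR_SECTION` and `group_by_archetype` are hypothetical.

```python
from collections import defaultdict

# Section of the source record -> openEHR archetype used in the experiment.
ARCHETYPE_FOR_SECTION = {
    "Diagnosis": "openEHR-EHR-EVALUATION.problem-diagnosis.v1",
    "Symptoms": "openEHR-EHR-OBSERVATION.lab_test-full_blood_count.v1",
    "Treatment": "openEHR-EHR-ACTION.procedure.v1",
}

# The chosen sample, as (section, text) rows.
SAMPLE = [
    ("Diagnosis", "Wiskott Aldrich syndrome"),
    ("Symptoms", "Platelets"),
    ("Symptoms", "At date of diagnosis: Count: 28,000/µL"),
    ("Treatment", "Bone marrow transplantation: Yes"),
    ("Treatment", "Donor: mismatched family donor"),
]


def group_by_archetype(rows):
    """Group record rows by target archetype; return (grouped, unmapped)."""
    grouped = defaultdict(list)
    unmapped = []
    for section, text in rows:
        archetype = ARCHETYPE_FOR_SECTION.get(section)
        if archetype is None:
            unmapped.append((section, text))  # no archetype chosen for this section
        else:
            grouped[archetype].append(text)
    return dict(grouped), unmapped


if __name__ == "__main__":
    grouped, unmapped = group_by_archetype(SAMPLE)
    for archetype, texts in grouped.items():
        print(archetype)
        for text in texts:
            print("   ", text)
    print("unmapped:", unmapped)
```

Keeping an explicit `unmapped` list makes it visible when a source record contains sections, such as free-text comments, for which no archetype has been selected.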
Phenotype Capture Experiment Result
Phenotype capture snapshots:
A conceptual patient-centric EHR data warehouse schema
Conclusion
The research results support both hypotheses and match the expected outcomes.
The openEHR standard is potentially suitable for storing clinical data, and even for integrating health information systems.
The multilingual translation mechanism and the term binding mechanism are two strong pieces of evidence that openEHR can support semantic interoperability between heterogeneous systems.
We need international cooperation on managing the archetypes and completing a full set of archetypes for health concepts.
We need international agreement on choosing terminologies and enhancing the terminologies for resolving semantic conflicts.
Conclusion
The philosophy and the future
A health care EHR integration architecture
Archetype-ontology
Cognitive IS
Human friendly
Robust, scalable, integrated
Semantic interoperability
Syntactic consistency
Data modelling neutral
Start from learning terms and concepts
IS essentially for communication
Ubiquitous information computing