Query Languages for SNOMED:
Use Cases and Issues for Binding to Health Records and to ICD
& background for comments on DRAFT SNOMED Query language spec
(locally http://www.cs.man.ac.uk/~rector/temp/SNOMED_TQL_for_comment)
Alan Rector
BioHealth Informatics Group
University of Manchester
[email protected]
http://www.cs.manchester.ac.uk/~rector
Copyright University of Manchester 2012 Licensed under Creative Commons Attribution Non-commercial Licence v3
Background
► Use cases for terminology query languages
► Binding of ontologies to health records: HL7 & EN 13606/Archetypes
• Specifying “value sets” for fields
• Expanding SQL queries to include subsumed concepts
► Use of a common ontology in ICD
► Questions
► Theoretical
• Query expansion for querying databases rather than DL queries on A-Boxes
‣ Negation: “not necessarily” vs “necessarily not”
‣ Natural level of incompleteness – the frame problem and Grice
• Coping with representations in subsets of EL++ without disjointness
► Practical for ICD
• Are there flaws in SNOMED’s proposed query language? Are there alternatives?
‣ “build or borrow” – relation to standards
• Establishing a “reference representation” – what it should have been
‣ and a migration path
► Major issues in Query Language Spec
► Pragmatic requirements for ICD
• “Arbitrary selection of classes”
• Negation – exclusions, residual classes (“other”), with/without
• Using queries to cope with known errors in SNOMED
• Comprehensible rules for assigning cases to codes
Necessary background:
► SNOMED CT
► Binding to EHR
► Separation of Domain Ontology from Data schema
► HL7 and Archetypes
► Three component architecture for ICD11.
► Requirements and status of SNOMED Terminology Query Language (& its Ocean Informatics predecessor)
SNOMED CT (SCT)
► Large terminology formulated in an old description logic
►Roughly EL++ without disjointness
• Logical content available in OWL syntax
• OWL version classifies with ELK or SNOROCKET in a few seconds
►~300K active classes; ~1.2M axioms
• Convenient to extract modules for experiments
‣ Most tools get bogged down in bulk
►Role Group
• Translation into OWL not identical to KRSS original
►Idiosyncratic schema & many errors
• See papers on my website.
►Canonical form mechanism that is often used in lieu of classification
• A good topic for a separate discussion – not for today
Role Groups
► Purpose: group qualifiers (restrictions) together to distinguish
►Cancer originating in breast and metastatic to bone*
• Cancer & RoleGroup some (has_status some primary & has_site some Breast) & RoleGroup some (has_status some metastases & has_site some Bone)
►Cancer originating in bone and metastatic to breast
• Cancer & RoleGroup some (has_status some metastases & has_site some Breast) & RoleGroup some (has_status some primary & has_site some Bone)
► OWL translation pragmatic
►Role groups inserted everywhere for consistency.
• Native syntax omits them when not required
* Easy to understand example. Not literally correct for SNOMED
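A minimal Python sketch of why the grouping matters, using the simplified breast/bone example above. The representation (a role group as a frozenset of attribute/value pairs) and the helper names are assumptions made for illustration only; they are not SNOMED's data model.

```python
# A restriction is an (attribute, value) pair; a role group is a frozenset of them.
def role_group(*pairs):
    return frozenset(pairs)

# Cancer originating in breast and metastatic to bone
breast_primary_bone_mets = frozenset({
    role_group(("has_status", "primary"), ("has_site", "Breast")),
    role_group(("has_status", "metastases"), ("has_site", "Bone")),
})

# Cancer originating in bone and metastatic to breast
bone_primary_breast_mets = frozenset({
    role_group(("has_status", "metastases"), ("has_site", "Breast")),
    role_group(("has_status", "primary"), ("has_site", "Bone")),
})

def flatten(definition):
    """Drop the grouping and keep only the bare restrictions."""
    return frozenset(pair for group in definition for pair in group)

# Without groups the two definitions collapse into the same four restrictions.
same_when_flat = flatten(breast_primary_bone_mets) == flatten(bone_primary_breast_mets)
# With groups they remain distinct.
same_when_grouped = breast_primary_bone_mets == bone_primary_breast_mets

print(same_when_flat, same_when_grouped)  # True False
```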
Major issue: What should a code represent?
The “Condition” vs “Situation” debate (now largely resolved in favour of “situations”)
► Does a code represent
►A “disorder”?
• “Condition” interpretation
►“having a disorder”?
• “Situation” interpretation
‣ “Situation of having a disorder” /
‣ “Patient having the disorder (at a given place and time as observed by a given clinician)”
Example: Fracture of Radius & Ulna (Forearm) – a single code in ICD and SNOMED
► Nothing can be both a “fracture of radius” and “fracture of ulna”
►“Condition interpretation”
► A patient can simultaneously have both a “fracture of radius” and “fracture of ulna”
►“Situation interpretation”
The evidence
► Should responses to queries / rules for patients with “Fracture of Radius” include patients with “Fracture of the radius & ulna”?
►Most doctors say “yes”
►Hierarchies of SNOMED and ICD imply “yes”, i.e.
• “Fracture of Radius and Ulna” is a kind of “Fracture of Radius”
► Which is safer?
Implications in OWL
► Condition interpretation
►sctcode:Fracture_of_ulna sct:Fracture_of_ulna
►sctcode:Fracture_of_radius_and_ulna ?? sct:Fracture_of_radius_and_ulna
►sctcode:Intracranial_bleed_without_skull_fracture ???
► Situation interpretation (reference model)
►sctcode:Fracture_of_ulna Situation & (includes some sct:Fracture_of_ulna)
►sctcode:Fracture_of_radius_and_ulna Situation & (includes some sct:Fracture_of_ulna) & (includes some sct:Fracture_of_radius)
►sctcode:Intracranial_bleed_without_skull_fracture Situation & (includes some sct:Intracranial_bleed) & not (includes some sct:skull_fracture)
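A rough Python sketch of the situation interpretation, to show how it yields “Fracture of radius and ulna is a kind of Fracture of radius”. The `Situation` class and the `is_kind_of` check are hypothetical simplifications: conditions are treated as flat names with no hierarchy of their own, and the negated part is carried as an explicit set rather than reasoned over.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Situation:
    includes: frozenset               # conditions the situation includes
    excludes: frozenset = frozenset() # conditions it explicitly does not include

def is_kind_of(specific, general):
    """True if every requirement of `general` is also stated by `specific`."""
    return (general.includes <= specific.includes
            and general.excludes <= specific.excludes)

fracture_of_radius = Situation(frozenset({"Fracture_of_radius"}))
fracture_of_ulna = Situation(frozenset({"Fracture_of_ulna"}))
fracture_of_radius_and_ulna = Situation(
    frozenset({"Fracture_of_radius", "Fracture_of_ulna"}))
intracranial_bleed = Situation(frozenset({"Intracranial_bleed"}))
bleed_without_skull_fracture = Situation(
    frozenset({"Intracranial_bleed"}), frozenset({"Skull_fracture"}))

print(is_kind_of(fracture_of_radius_and_ulna, fracture_of_radius))   # True
print(is_kind_of(fracture_of_radius, fracture_of_radius_and_ulna))   # False
print(is_kind_of(bleed_without_skull_fracture, intracranial_bleed))  # True
```

Under the condition interpretation no such reading is available, since nothing is both a fracture of the radius and a fracture of the ulna.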
…but
► For the foreseeable future:
►The hierarchies behave as if the codes represented situations
►Separate entities for the condition and the situation will not be created
• It is up to software and users to disambiguate or to manage as best they can
‣ One of the many legacy idiosyncrasies
Most common use case: eHealth records
[Figure: Ontology, Data schema and Data base, linked by dotted arrows. Are the dotted arrows: Class expressions? Queries? Other?]
► To determine what is legal for entries in the database
Consider retrieval from a database
► I want to retrieve all situations with hypertension during pregnancy…
►Pregnancy is only recorded separately if the kind of hypertension does not necessarily involve pregnancy, so we need the union of:
• All situations with kinds of hypertension necessarily involving pregnancy
- e.g. SELECT ?situation, ?diagnosis FROM DiagnosticTable WHERE ?diagnosis IN {SubclassesOf Hypertension_necessarily_involving_pregnancy}
• All situations involving kinds of hypertension not necessarily involving pregnancy but with pregnancy recorded separately.
- e.g. SELECT ?situation, ?diagnosis1 FROM DiagnosticTable WHERE ?diagnosis1 IN {SubclassesOf Hypertension_not_necessarily_involving_pregnancy} & EXISTS ?situation, ?diagnosis2 WHERE ?diagnosis2 IN {SubclassesOf Pregnancy}
►In the terminology query language we need a query for:
• “Kinds of hypertension not necessarily involving X”
• “Kinds of hypertension necessarily involving X”
‣ (but that’s simple: “Subclasses of X” usually abbreviated “ X”)⬇
• “Kinds of hypertension necessarily not involving X”
‣ Straightforward if we had negation and disjointness, which we don’t
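The union described above can be sketched in Python over a toy hierarchy and a toy diagnostic table. Everything here is assumed for illustration: the concept names below the two hypertension classes, the table rows, and the reading of “SubclassesOf” as including the class itself.

```python
# Toy is-a hierarchy: child -> parents (hypothetical content).
PARENTS = {
    "Hypertension_necessarily_involving_pregnancy": ["Hypertension"],
    "Hypertension_not_necessarily_involving_pregnancy": ["Hypertension"],
    "Pre_eclampsia": ["Hypertension_necessarily_involving_pregnancy"],
    "Essential_hypertension": ["Hypertension_not_necessarily_involving_pregnancy"],
    "Twin_pregnancy": ["Pregnancy"],
}

def subclasses_of(concept):
    """Return the concept and all of its descendants in the toy hierarchy."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in PARENTS.items():
            if child not in found and any(p in found for p in parents):
                found.add(child)
                changed = True
    return found

# Toy DiagnosticTable: (situation, diagnosis) rows.
DIAGNOSTIC_TABLE = [
    ("s1", "Pre_eclampsia"),
    ("s2", "Essential_hypertension"),
    ("s2", "Twin_pregnancy"),
    ("s3", "Essential_hypertension"),
]

def situations_with(concept):
    """Expand the concept to its subclasses, then select matching situations."""
    codes = subclasses_of(concept)
    return {situation for situation, diagnosis in DIAGNOSTIC_TABLE if diagnosis in codes}

necessarily = situations_with("Hypertension_necessarily_involving_pregnancy")
recorded_separately = (
    situations_with("Hypertension_not_necessarily_involving_pregnancy")
    & situations_with("Pregnancy")
)
hypertension_in_pregnancy = necessarily | recorded_separately
print(sorted(hypertension_in_pregnancy))  # ['s1', 's2']
```

The sketch presupposes that the two hypertension classes already exist as named classes. Producing them from the terminology is exactly the “necessarily” vs “not necessarily” query the slide asks for.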
Consider specification of “value sets”
► Main cases
► Simple value sets not used elsewhere
• severity in {mild | moderate | severe}
► Complete hierarchies – all descendants
• diagnosis in {SubclassesOf Disorder}
► Ordered hierarchies and defaults, with specialisation
• “Reason for admission” in {Chest pain, Major trauma, Hypothermia,…}
► Arbitrary lists of one or more specific classes
• “Radiation of chest pain” in {left arm, shoulder, neck, axilla, abdomen}
‣ Exist elsewhere and used for many other purposes
► Union, intersection & difference of all of the above
► Other issues
► Declarative specification
• updating with changes in terminology; changes in data schema.
► Addition or removal of values by context (discussion for another day)
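One way to picture a declarative value-set specification is as a small expression tree that is re-evaluated against whatever version of the terminology is current. The sketch below is an assumption-laden illustration: the tuple encoding, the operator names and the toy hierarchy are all hypothetical.

```python
# Toy hierarchy: parent -> children (hypothetical content).
CHILDREN = {
    "Disorder": ["Fracture", "Hypertension"],
    "Fracture": ["Fracture_of_radius", "Fracture_of_ulna"],
}

def subclasses_of(concept):
    """The concept itself plus all of its descendants."""
    result = {concept}
    for child in CHILDREN.get(concept, []):
        result |= subclasses_of(child)
    return result

SET_OPS = {
    "union": set.union,
    "intersection": set.intersection,
    "difference": set.difference,
}

def evaluate(spec):
    """Evaluate a value-set specification given as a nested tuple."""
    op, *args = spec
    if op == "enum":                 # simple list of values
        return set(args)
    if op == "subclasses_of":        # complete hierarchy
        return subclasses_of(args[0])
    if op in SET_OPS:                # union / intersection / difference
        left, right = (evaluate(arg) for arg in args)
        return SET_OPS[op](left, right)
    raise ValueError(f"unknown operator: {op}")

severity = evaluate(("enum", "mild", "moderate", "severe"))
non_fracture_diagnosis = evaluate(
    ("difference", ("subclasses_of", "Disorder"), ("subclasses_of", "Fracture"))
)
print(sorted(non_fracture_diagnosis))  # ['Disorder', 'Hypertension']
```

Because the specification stores the rule rather than the expanded list, adding a new subclass to `CHILDREN` changes the result without editing the specification.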
ICD and ICD-11 (“International Classification of Diseases”)
► ICD is a classification NOT an ontology
►Used for national and international statistical returns
►Also for billing in many jurisdictions
• (including an extra layer of “Clinical Modifications” for each country)
►Lots of legacy idiosyncrasies
• Designed to be printed in books & manuals
► Basic rule: Everything must add up to 100% at each level: therefore…
►Each code has only one parent
►Children of every code mutually exclusive and exhaustive
►Therefore…
• If a code fits logically in two places it must be “excluded” from all but one.
• Residual categories “other” & “not elsewhere classified” are required to make siblings exhaustive
ICD 11 Revision use case: Multi-layer system
[Figure labels: SNOMED CT Common Ontology Subset; Foundation Component (signs, symptoms, causes, …); Ontology Component (kinds); Linearizations: Mortality, Morbidity, Primary Care, …]
ICD 11 Revision
►Aims to provide a persistent structure for computer access
►Foundation component
• An “ontological core” shared with SNOMED
• A “Content model” of other information that folk want
‣ signs, symptoms, effects, relation to disability, …
►“Linearizations” that look like the legacy system
• But can be generated from the Foundation Component and its annotations
‣ Coherent with Foundation Model (except for flagged legacy issues)
‣ A single tree of mutually exclusive and exhaustive subclasses at each level
- Therefore must have
- “Exclusions”
- “Residual categories” – “other”, “not elsewhere classified”
Assumptions
► SNOMED disorder codes to be treated as “situations”
►Conjunctions and negation “wrapped” in code
►Hierarchies consistent with “situation” interpretation
► Queries will be against either the asserted or the inferred form of the ontology, but no reasoner will be used
► To be used with separate data schemas
►For lists of potential values
►For expanding queries for retrieval
► To be used with ICD “Linearizations”
►Specify meaning of each item in a linearization in terms of the ontology
Requirements listed for SNOMED Terminology Query Language (locally http://www.cs.man.ac.uk/~rector/temp/SNOMED_TQL_for_comment)
► Support
► Select class itself only, children, and/or descendants
► Set operations on results – union, intersection, difference
► Differentiate primitive and fully defined concepts; leaf concepts from others
• C SubclassOf … vs C EquivalentTo …; no subclasses vs has subclasses
‣ And possibly other syntactic selection/filtering
► Concepts asserted related to another given concept
• And possibly the reciprocals (‘used in’)
► String matching
► Use results of previous queries in nested queries
• and subsequent queries?
► Other
► Functional & all functions returning a set of concepts
► Easy to use, understand, and implement
► Questions
►What’s missing? How best to satisfy the requirements?
Examples
► /* This query expression returns concepts in the Clinical finding sub-hierarchy */
►DescendantsAndSelf(404684003|Clinical finding|)
► /* This query expression returns all fully defined concepts in the Clinical finding sub-hierarchy */
►FilterOnFullyDefined(DescendantsAndSelf(404684003|Clinical finding|))
► /* This query expression returns the first three levels of the Clinical findings hierarchy. */
►ChildrenAndSelf( ChildrenAndSelf( ChildrenAndSelf(404684003|Clinical finding|)))
► /* This query expression returns all concepts in the ‘Immune hypersensitivity reaction’ hierarchy that have an explicit ungrouped ‘Causative agent’ relationship defined to any target concept. */
► Intersection( DescendantsAndSelf(418925002|Immune hypersensitivity reaction|), HasDirectRel(246075003|Causative agent|, All))
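To make the “functional, every function returns a set of concepts” style concrete, here is a toy Python reading of three of the functions used above. These are illustrative re-implementations over a made-up hierarchy, not the draft specification's definitions; the hierarchy content and the `FULLY_DEFINED` set are assumptions.

```python
# Toy hierarchy: parent -> children (hypothetical content).
CHILDREN = {
    "Clinical finding": ["Disease", "Finding by site"],
    "Disease": ["Infectious disease"],
}
# Concepts assumed to be fully defined (C EquivalentTo ...) in this toy example.
FULLY_DEFINED = {"Infectious disease"}

def children_and_self(concepts):
    """Each input concept plus its direct children."""
    result = set(concepts)
    for concept in concepts:
        result.update(CHILDREN.get(concept, []))
    return result

def descendants_and_self(concepts):
    """Each input concept plus everything below it."""
    result = set(concepts)
    frontier = set(concepts)
    while frontier:
        frontier = {kid for c in frontier for kid in CHILDREN.get(c, [])} - result
        result |= frontier
    return result

def filter_on_fully_defined(concepts):
    """Keep only the fully defined concepts."""
    return {c for c in concepts if c in FULLY_DEFINED}

root = {"Clinical finding"}
print(sorted(children_and_self(root)))
# ['Clinical finding', 'Disease', 'Finding by site']
print(sorted(filter_on_fully_defined(descendants_and_self(root))))
# ['Infectious disease']
```

Because every function takes and returns a set, calls nest freely, which is what lets `ChildrenAndSelf` be applied repeatedly to walk down level by level.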
Inferred & asserted
Use of Role Groups
► /* When run against the inferred view, this query expression returns all concepts that contain a first group with a ‘Finding site’ of ‘Inguinal canal structure’ and an ‘Associated morphology’ of ‘Hernial opening’, and a second group with a ‘Finding site’ of ‘Abdominal cavity structure’ and an ‘Associated morphology’ of ‘Hernia’. Concepts with inherited grouped relationships are also returned. */
►Intersection( HasGroupedRels( 363698007|Finding site|, 90785001|Inguinal canal structure|, 116676008|Associated morphology|, 414402003|Hernial opening|), HasGroupedRels( 363698007|Finding site|, 52731004|Abdominal cavity structure|, 116676008|Associated morphology|, 414403008|Hernia|))
Example using descendants and has rel without role groups
► /* this query expression returns concepts describing infectious arthritis */
►Intersection( Descendants(404684003|Clinical finding|), HasRel(116676008|Associated morphology|, DescendantsAndSelf(23583003|Inflammation|)), HasRel(363698007|Finding site|, DescendantsAndSelf(39352004|Joint structure|)), HasRel(246075003|Causative agent|, DescendantsAndSelf(410607006|Organism|)) )
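A toy Python version of the infectious arthritis query, showing how `HasRel` and `Intersection` might compose. It is a sketch under several assumptions: the concept names and relationships are invented, role groups are ignored, and the `Descendants(Clinical finding)` conjunct is left out for brevity.

```python
# Toy is-a hierarchy for relationship targets: child -> parents (hypothetical).
PARENTS = {
    "Knee joint structure": ["Joint structure"],
    "Bacteria": ["Organism"],
}

def descendants_and_self(concept):
    """The concept plus everything below it in the toy hierarchy."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in PARENTS.items():
            if child not in result and result.intersection(parents):
                result.add(child)
                changed = True
    return result

# Toy defining relationships: concept -> {(attribute, target)} (hypothetical).
RELS = {
    "Bacterial arthritis of knee": {
        ("Associated morphology", "Inflammation"),
        ("Finding site", "Knee joint structure"),
        ("Causative agent", "Bacteria"),
    },
    "Inflamed knee, cause unstated": {
        ("Associated morphology", "Inflammation"),
        ("Finding site", "Knee joint structure"),
    },
}

def has_rel(attribute, targets):
    """Concepts with a relationship of this attribute to any of the targets."""
    return {
        concept
        for concept, rels in RELS.items()
        if any(a == attribute and t in targets for a, t in rels)
    }

def intersection(*concept_sets):
    return set.intersection(*concept_sets)

infectious_arthritis = intersection(
    has_rel("Associated morphology", descendants_and_self("Inflammation")),
    has_rel("Finding site", descendants_and_self("Joint structure")),
    has_rel("Causative agent", descendants_and_self("Organism")),
)
print(infectious_arthritis)  # {'Bacterial arthritis of knee'}
```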