Query Languages for SNOMED: Use Cases and Issues for Binding to Health Records and to ICD & background for comments on DRAFT SNOMED Query language spec (locally http://www.cs.man.ac.uk/~rector/temp/SNOMED_TQL_for_comment ) Alan Rector BioHealth Informatics Group University of Manchester [email protected] http://www.cs.manchester.ac.uk/~rector Copyright University of Manchester 2012 Licensed under Creative Commons Attribution Non-commercial Licence v3


Page 1: Query Languages for SNOMED: Use Cases and Issues for Binding to Health Records and to ICD & background for comments on DRAFT SNOMED Query language spec

Query Languages for SNOMED:
Use Cases and Issues for Binding to Health Records and to ICD
& background for comments on DRAFT SNOMED Query language spec
(locally http://www.cs.man.ac.uk/~rector/temp/SNOMED_TQL_for_comment)

Alan Rector
BioHealth Informatics Group
University of Manchester
[email protected]
http://www.cs.manchester.ac.uk/~rector

Copyright University of Manchester 2012
Licensed under Creative Commons Attribution Non-commercial Licence v3

Page 2:

Background
► Use cases for terminology query languages
  ► Binding of ontologies to health records: HL7 & EN 13606/Archetypes
    • Specifying “value sets” for fields
    • Expanding SQL queries to include subsumed concepts
  ► Use of a common ontology in ICD
► Questions
  ► Theoretical
    • Query expansion for querying databases rather than DL queries on A-Boxes
      ‣ Negation: “not necessarily” vs “necessarily not”
      ‣ Natural level of incompleteness – the frame problem and Grice
    • Coping with representations in subsets of EL++ without disjointness
  ► Practical for ICD
    • Are there flaws in SNOMED’s proposed query language? Are there alternatives?
      ‣ “Build or borrow” – relation to standards
    • Establishing a “reference representation” – what it should have been
      ‣ and a migration path
► Major issues in Query Language Spec
  ► Pragmatic requirements for ICD
    • “Arbitrary selection of classes”
    • Negation – exclusions, residual classes (“other”), with/without
    • Using queries to cope with known errors in SNOMED
    • Comprehensible rules for assigning cases to codes

Page 3:

Necessary background:
► SNOMED CT
► Binding to EHR
  ► Separation of Domain Ontology from Data schema
  ► HL7 and Archetypes
► Three-component architecture for ICD-11
► Requirements and status of SNOMED Terminology Query Language (& its Ocean Informatics predecessor)

Page 4:

SNOMED CT (SCT)
► Large terminology formulated in an old description logic
  ► Roughly EL++ without disjointness
    • Logical content available in OWL syntax
    • OWL version classifies with ELK or SNOROCKET in a few seconds
  ► ~300K active classes; ~1.2M axioms
    • Convenient to extract modules for experiments
      ‣ Most tools get bogged down in bulk
  ► Role Groups
    • Translation into OWL not identical to KRSS original
  ► Idiosyncratic schema & many errors
    • See papers on my website.
  ► Canonical form mechanism that is often used in lieu of classification
    • A good topic for a separate discussion – not for today

Page 5:

Role Groups
► Purpose: group qualifiers (restrictions) together to distinguish
  ► Cancer originating in breast and metastatic to bone*
    • Cancer &
      RoleGroup some (has_status some primary & has_site some breast) &
      RoleGroup some (has_status some metastases & has_site some bone)
  ► Cancer originating in bone and metastatic to breast
    • Cancer &
      RoleGroup some (has_status some metastases & has_site some breast) &
      RoleGroup some (has_status some primary & has_site some bone)
► OWL translation pragmatic
  ► Role groups inserted everywhere for consistency.
    • Native syntax omits them when not required

* Easy-to-understand example. Not literally correct for SNOMED
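The point of the grouping can be illustrated with a small sketch. It assumes a hypothetical toy model (not SNOMED's actual representation) in which an expression is a set of role groups and each group is a set of attribute/value pairs; the names `expression` and `flatten` are invented for this illustration.

```python
# Hypothetical toy model: an expression is a set of role groups,
# and each role group is a frozenset of (attribute, value) pairs.

def expression(*groups):
    """Build a grouped expression from dicts of attribute -> value."""
    return frozenset(frozenset(g.items()) for g in groups)

def flatten(expr):
    """Discard the grouping, keeping only the bare (attribute, value) pairs."""
    return frozenset(pair for group in expr for pair in group)

breast_primary_bone_mets = expression(
    {"has_status": "primary", "has_site": "breast"},
    {"has_status": "metastases", "has_site": "bone"},
)
bone_primary_breast_mets = expression(
    {"has_status": "metastases", "has_site": "breast"},
    {"has_status": "primary", "has_site": "bone"},
)

# With role groups the two cancers are distinct ...
print(breast_primary_bone_mets == bone_primary_breast_mets)  # False
# ... but without grouping they collapse into the same set of qualifiers.
print(flatten(breast_primary_bone_mets) == flatten(bone_primary_breast_mets))  # True
```

Without the groups, both descriptions reduce to the same four qualifiers, which is the ambiguity role groups exist to prevent.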

Page 6:

Major issue: What should a code represent?
The “Condition” vs “Situation” debate
(now largely resolved in favour of “situations”)
► Does a code represent
  ► A “disorder”?
    • “Condition” interpretation
  ► “Having a disorder”?
    • “Situation” interpretation
      ‣ “Situation of having a disorder” /
      ‣ “Patient having the disorder (at a given place and time as observed by a given clinician)”

Page 7:

Example: Fracture of Radius & Ulna (Forearm) – a single code in ICD and SNOMED
► Nothing can be both a “fracture of radius” and “fracture of ulna”
  ► “Condition interpretation”
► A patient can simultaneously have both a “fracture of radius” and “fracture of ulna”
  ► “Situation interpretation”

Page 8:

The evidence
► Should responses to queries / rules for patients with “Fracture of Radius” include patients with “Fracture of the radius & ulna”?
  ► Most doctors say “yes”
  ► Hierarchies of SNOMED and ICD imply “yes”, i.e.
    • “Fracture of Radius and Ulna” is a kind of “Fracture of Radius”
► Which is safer?

Page 9:

Implications in OWL
► Condition interpretation
  ► sctcode:Fracture_of_ulna
      sct:Fracture_of_ulna
  ► sctcode:Fracture_of_radius_and_ulna
      ?? sct:Fracture_of_radius_and_ulna
  ► sctcode:Intracranial_bleed_without_skull_fracture
      ???
► Situation interpretation (reference model)
  ► sctcode:Fracture_of_ulna
      Situation & (includes some sct:Fracture_of_ulna)
  ► sctcode:Fracture_of_radius_and_ulna
      Situation & (includes some sct:Fracture_of_ulna) & (includes some sct:Fracture_of_radius)
  ► sctcode:Intracranial_bleed_without_skull_fracture
      Situation & (includes some sct:Intracranial_bleed) & not (includes some sct:skull_fracture)
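A minimal sketch of why the situation reading makes “Fracture of radius and ulna” a kind of “Fracture of radius”. It assumes a hypothetical toy model in which a code is just the set of conditions a situation must include plus the set it must not include; the hierarchy among the conditions themselves is ignored, and the `Situation` class and `is_kind_of` function are invented for this illustration.

```python
# Hypothetical toy model of the "situation" reading: a code is defined by the
# conditions a situation must include and the conditions it must not include.
from dataclasses import dataclass

@dataclass(frozen=True)
class Situation:
    includes: frozenset
    excludes: frozenset = frozenset()

def is_kind_of(child, parent):
    """child is subsumed by parent if it carries every constraint of parent."""
    return parent.includes <= child.includes and parent.excludes <= child.excludes

fracture_of_radius = Situation(frozenset({"Fracture_of_radius"}))
fracture_of_radius_and_ulna = Situation(
    frozenset({"Fracture_of_radius", "Fracture_of_ulna"})
)
intracranial_bleed = Situation(frozenset({"Intracranial_bleed"}))
bleed_without_skull_fracture = Situation(
    frozenset({"Intracranial_bleed"}), frozenset({"Skull_fracture"})
)

print(is_kind_of(fracture_of_radius_and_ulna, fracture_of_radius))   # True
print(is_kind_of(fracture_of_radius, fracture_of_radius_and_ulna))   # False
print(is_kind_of(bleed_without_skull_fracture, intracranial_bleed))  # True
```

Under this reading a conjunction only adds constraints, so the combined code falls below each of its parts, matching what the SNOMED and ICD hierarchies imply.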

Page 10:

…but
► For the foreseeable future:
  ► The hierarchies behave as if the codes represented situations
  ► Separate entities for the condition and the situation will not be created
    • It is up to software and users to disambiguate or to manage as best they can
      ‣ One of the many legacy idiosyncrasies

Page 11:

Most common use case: eHealth records
[Figure: Data schema and Ontology, connected by dotted arrows]
Are the dotted arrows: Class expressions? Queries? Other?

Page 12:

Most common use case: eHealth records
[Figure: Ontology and Data base]
To determine what is legal for entries in the database

Page 13:

Consider retrieval from a database
► I want to retrieve all situations with hypertension during pregnancy…
  ► Pregnancy is only recorded if the kind of hypertension does not necessarily involve pregnancy, so we need the union of:
    • All situations with kinds of hypertension necessarily involving pregnancy
      - e.g. SELECT ?situation, ?diagnosis FROM DiagnosticTable WHERE ?diagnosis IN {SubclassesOf Hypertension_necessarily_involving_pregnancy}
    • All situations involving kinds of hypertension not necessarily involving pregnancy but with pregnancy recorded separately.
      - e.g. SELECT ?situation, ?diagnosis1 FROM DiagnosticTable WHERE ?diagnosis1 IN {SubclassesOf Hypertension_not_necessarily_involving_pregnancy} & EXISTS ?situation, ?diagnosis2 WHERE ?diagnosis2 IN {SubclassesOf Pregnancy}
  ► In the terminology query language we need a query for:
    • “Kinds of hypertension not necessarily involving X”
    • “Kinds of hypertension necessarily involving X”
      ‣ (but that’s simple: “Subclasses of X”, usually abbreviated “⬇X”)
    • “Kinds of hypertension necessarily not involving X”
      ‣ Straightforward if we had negation and disjointness, which we don’t
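The union described above can be sketched as query expansion against an ordinary SQL table. This is only an illustration under stated assumptions: the toy hierarchy, the concept names and the hand-picked `NECESSARILY_PREGNANCY` class are hypothetical, and the table layout (`situation`, `diagnosis`) is a guess at what `DiagnosticTable` might look like.

```python
import sqlite3

# Hypothetical toy hierarchy: parent -> direct children.
CHILDREN = {
    "Hypertension": ["Essential_hypertension", "Pregnancy_induced_hypertension"],
    "Pregnancy_induced_hypertension": ["Pre_eclampsia"],
    "Pregnancy": ["Twin_pregnancy"],
}
# Assumption: the class of hypertension that necessarily involves pregnancy is
# named by hand; finding it is the job of the terminology query language.
NECESSARILY_PREGNANCY = "Pregnancy_induced_hypertension"

def subclasses_of(code):
    """Return the code plus all of its descendants in the toy hierarchy."""
    result = {code}
    for child in CHILDREN.get(code, []):
        result |= subclasses_of(child)
    return result

def placeholders(codes):
    """One '?' per code, for a parameterised IN (...) list."""
    return ",".join("?" * len(codes))

def hypertension_in_pregnancy(conn):
    necessarily = sorted(subclasses_of(NECESSARILY_PREGNANCY))
    not_necessarily = sorted(subclasses_of("Hypertension") - set(necessarily))
    pregnancy = sorted(subclasses_of("Pregnancy"))
    sql = f"""
        SELECT situation FROM DiagnosticTable
        WHERE diagnosis IN ({placeholders(necessarily)})
        UNION
        SELECT d1.situation FROM DiagnosticTable d1
        WHERE d1.diagnosis IN ({placeholders(not_necessarily)})
          AND EXISTS (SELECT 1 FROM DiagnosticTable d2
                      WHERE d2.situation = d1.situation
                        AND d2.diagnosis IN ({placeholders(pregnancy)}))
    """
    rows = conn.execute(sql, necessarily + not_necessarily + pregnancy).fetchall()
    return sorted(row[0] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DiagnosticTable (situation INTEGER, diagnosis TEXT)")
conn.executemany(
    "INSERT INTO DiagnosticTable VALUES (?, ?)",
    [
        (1, "Pre_eclampsia"),           # pregnancy implied by the code itself
        (2, "Essential_hypertension"),  # pregnancy recorded separately
        (2, "Twin_pregnancy"),
        (3, "Essential_hypertension"),  # no pregnancy recorded
    ],
)
print(hypertension_in_pregnancy(conn))  # [1, 2]
```

The database never sees the ontology: the terminology side only supplies expanded code lists, which is why the hard part is the query that separates “necessarily involving” from “not necessarily involving”.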

Page 14:

Consider specification of “value sets”
► Main cases
  ► Simple value sets not used elsewhere
    • severity in {mild | moderate | severe}
  ► Complete hierarchies – all descendants
    • diagnosis in {SubclassesOf Disorder}
  ► Ordered hierarchies and defaults, with specialisation
    • “Reason for admission” in {Chest pain, Major trauma, Hypothermia,…}
  ► Arbitrary lists of one or more specific classes
    • “Radiation of chest pain” in {left arm, shoulder, neck, axilla, abdomen}
      ‣ Exist elsewhere and used for many other purposes
  ► Union, intersection & difference of all of the above
► Other issues
  ► Declarative specification
    • updating with changes in terminology; changes in data schema.
  ► Addition or removal of values by context (discussion for another day)
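One way to read “declarative specification” is that a value set is stored as an expression and only expanded against the current terminology when needed. The sketch below assumes a hypothetical nested-tuple notation and a toy hierarchy; none of the names come from HL7, archetypes or the draft query language.

```python
# Hypothetical toy hierarchy: parent -> direct children.
CHILDREN = {
    "Disorder": ["Fracture", "Hypertension"],
    "Fracture": ["Fracture_of_radius", "Fracture_of_ulna"],
}

def subclasses_of(code):
    """Return the code plus all of its descendants."""
    result = {code}
    for child in CHILDREN.get(code, []):
        result |= subclasses_of(child)
    return result

def expand(spec):
    """Expand a declarative value-set specification into a set of codes."""
    op, *args = spec
    if op == "enum":        # simple list of explicit values
        return set(args)
    if op == "subclasses":  # complete hierarchy: the class and all descendants
        return subclasses_of(args[0])
    if op in ("union", "intersection", "difference"):
        left, right = expand(args[0]), expand(args[1])
        if op == "union":
            return left | right
        if op == "intersection":
            return left & right
        return left - right
    raise ValueError(f"unknown operator: {op}")

severity = ("enum", "mild", "moderate", "severe")
non_fracture_diagnosis = (
    "difference", ("subclasses", "Disorder"), ("subclasses", "Fracture")
)

print(sorted(expand(severity)))                # ['mild', 'moderate', 'severe']
print(sorted(expand(non_fracture_diagnosis)))  # ['Disorder', 'Hypertension']
```

Because the specification is kept separate from its expansion, a change in the terminology changes the result without anyone editing the data schema.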

Page 15:

ICD and ICD-11 (“International Classification of Diseases”)
► ICD is a classification NOT an ontology
  ► Used for national and international statistical returns
  ► Also for billing in many jurisdictions
    • (including an extra layer of “Clinical Modifications” for each country)
  ► Lots of legacy idiosyncrasies
    • Designed to be printed in books & manuals
► Basic rule: Everything must add up to 100% at each level: therefore…
  ► Each code has only one parent
  ► Children of every code mutually exclusive and exhaustive
  ► Therefore…
    • If a code fits logically in two places it must be “excluded” from all but one.
    • Residual categories “other” & “not elsewhere classified” are required to make siblings exhaustive
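The two consequences of the “adds up to 100%” rule can be sketched as set operations over an ontology that allows multiple parents. Everything here is hypothetical: the disease names, the `linearize` helper and the choice of which parent keeps the shared child are assumptions made for illustration, not ICD content.

```python
# Hypothetical ontology fragment: class -> set of parents (multiple parents allowed).
PARENTS = {
    "Viral_pneumonia": {"Pneumonia", "Viral_infection"},
    "Bacterial_pneumonia": {"Pneumonia"},
    "Aspiration_pneumonia": {"Pneumonia"},
}

def linearize(parent, named_children, exclusions):
    """Assign each ontology child of `parent` to exactly one linearization code."""
    members = {c for c, ps in PARENTS.items() if parent in ps} - exclusions
    codes = {child: {child} for child in named_children if child in members}
    # Residual category: whatever is left over, so that the siblings are exhaustive.
    codes[f"Other_{parent}"] = members - set(named_children)
    return codes

# Assumption: viral pneumonia is counted under Pneumonia and is therefore
# "excluded" from Viral_infection.
pneumonia = linearize("Pneumonia", ["Viral_pneumonia", "Bacterial_pneumonia"], set())
viral_infection = linearize("Viral_infection", [], {"Viral_pneumonia"})

print(pneumonia)
print(viral_infection)
```

Both the exclusion and the residual category are set differences, which is why a query language for this use case needs negation in some form.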

Page 16:

ICD 11 Revision use case
Multi-layer system
[Figure: layered diagram. Labels: SNOMED CT; Common Ontology Subset; Foundation Component (signs, symptoms, causes, …); Ontology Component (kinds); Linearizations: Mortality, Morbidity, Primary Care, …]

Page 17:

ICD 11 Revision
► Aims to provide a persistent structure for computer access
► Foundation component
  • An “ontological core” shared with SNOMED
  • A “Content model” of other information that folk want
    ‣ signs, symptoms, effects, relation to disability, …
► “Linearizations” that look like the legacy system
  • But can be generated from the Foundation Component and its annotations
    ‣ Coherent with Foundation Model (except for flagged legacy issues)
    ‣ A single tree of mutually exclusive and exhaustive subclasses at each level
      - Therefore must have
        - “Exclusions”
        - “Residual categories” – “other”, “not elsewhere classified”

Page 18:

Assumptions
► SNOMED disorder codes to be treated as “situations”
  ► Conjunctions and negation “wrapped” in code
  ► Hierarchies consistent with “situation” interpretation
► Queries will be against either the asserted or the inferred form of the ontology, but no reasoner will be used
► To be used with separate data schemas
  ► For lists of potential values
  ► For expanding queries for retrieval
► To be used with ICD “Linearizations”
  ► Specify meaning of each item in a linearization in terms of the ontology

Page 19:

Requirements listed for SNOMED Terminology Query Language
(locally http://www.cs.man.ac.uk/~rector/temp/SNOMED_TQL_for_comment)
► Support
  ► Select class itself only, children, and/or descendants
  ► Set operations on results – union, intersection, difference
  ► Differentiate primitive and fully defined concepts; leaf concepts from others
    • C SubclassOf … vs C EquivalentTo …; no subclasses vs has subclasses
      ‣ And possibly other syntactic selection/filtering
  ► Concepts asserted related to another given concept
    • And possibly the reciprocals (‘used in’)
  ► String matching
  ► Use results of previous queries in nested queries
    • and subsequent queries?
► Other
  ► Functional & all functions returning a set of concepts
  ► Easy to use, understand, and implement
► Questions
  ► What’s missing? How best to satisfy the requirements?

Page 20:

Examples
► /* This query expression returns concepts in the Clinical finding sub-hierarchy */
  ► DescendantsAndSelf(404684003|Clinical finding|)
► /* This query expression returns all fully defined concepts in the Clinical finding sub-hierarchy */
  ► FilterOnFullyDefined(DescendantsAndSelf(404684003|Clinical finding|))
► /* This query expression returns the first three levels of the Clinical finding hierarchy. */
  ► ChildrenAndSelf(ChildrenAndSelf(ChildrenAndSelf(404684003|Clinical finding|)))
► /* This query expression returns all concepts in the ‘Immune hypersensitivity reaction’ hierarchy that have an explicit ungrouped ‘Causative agent’ relationship defined to any target concept. */
  ► Intersection(DescendantsAndSelf(418925002|Immune hypersensitivity reaction|), HasDirectRel(246075003|Causative agent|, All))
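The “functional, every function returns a set of concepts” requirement is what lets these expressions nest. A rough sketch of that idea follows, using a made-up four-concept hierarchy; the Python function names mirror the draft's operators only loosely, and the concept names are not SNOMED CT identifiers.

```python
# Hypothetical toy data; identifiers are made-up names, not SNOMED CT codes.
CHILDREN = {
    "Clinical_finding": ["Disease", "Finding_by_site"],
    "Disease": ["Arthritis"],
    "Arthritis": ["Infectious_arthritis"],
}
FULLY_DEFINED = {"Arthritis", "Infectious_arthritis"}

def children_and_self(concepts):
    """Each input concept plus its direct children."""
    result = set(concepts)
    for concept in concepts:
        result.update(CHILDREN.get(concept, []))
    return result

def descendants_and_self(concepts):
    """Each input concept plus everything below it."""
    result = set(concepts)
    frontier = set(concepts)
    while frontier:
        frontier = children_and_self(frontier) - result
        result |= frontier
    return result

def filter_on_fully_defined(concepts):
    """Keep only the concepts marked as fully defined."""
    return set(concepts) & FULLY_DEFINED

root = {"Clinical_finding"}
print(sorted(descendants_and_self(root)))
print(sorted(filter_on_fully_defined(descendants_and_self(root))))
print(sorted(children_and_self(children_and_self(root))))
```

Every function takes a set and returns a set, so any result can be fed straight into another call, as in the nested `ChildrenAndSelf` example above.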

Page 21:

Inferred & asserted
Use of Role Groups
► /* When run against the inferred view, this query expression returns all concepts that contain a first group with a ‘Finding site’ of ‘Inguinal canal structure’ and an ‘Associated morphology’ of ‘Hernial opening’, and a second group with a ‘Finding site’ of ‘Abdominal cavity structure’ and an ‘Associated morphology’ of ‘Hernia’. Concepts with inherited grouped relationships are also returned. */
  ► Intersection(
      HasGroupedRels(
        363698007|Finding site|, 90785001|Inguinal canal structure|,
        116676008|Associated morphology|, 414402003|Hernial opening|),
      HasGroupedRels(
        363698007|Finding site|, 52731004|Abdominal cavity structure|,
        116676008|Associated morphology|, 414403008|Hernia|))

Page 22:

Example using descendants and has rel without role groups
► /* This query expression returns concepts describing infectious arthritis */
  ► Intersection(
      Descendants(404684003|Clinical finding|),
      HasRel(116676008|Associated morphology|, DescendantsAndSelf(23583003|Inflammation|)),
      HasRel(363698007|Finding site|, DescendantsAndSelf(39352004|Joint structure|)),
      HasRel(246075003|Causative agent|, DescendantsAndSelf(410607006|Organism|)))
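How such an intersection of relationship filters might evaluate can be sketched over toy data. The hierarchy, the relationship table and the `has_rel` helper below are all hypothetical and ignore role groups entirely; they only illustrate the shape of the query, not the behaviour of the draft language.

```python
# Hypothetical toy hierarchy and relationships; not SNOMED CT content.
CHILDREN = {
    "Clinical_finding": ["Arthritis", "Cellulitis"],
    "Arthritis": ["Infectious_arthritis"],
    "Joint_structure": ["Knee_joint"],
    "Organism": ["Bacterium"],
}
RELS = {
    "Arthritis": {
        ("morphology", "Inflammation"),
        ("finding_site", "Joint_structure"),
    },
    "Infectious_arthritis": {
        ("morphology", "Inflammation"),
        ("finding_site", "Knee_joint"),
        ("causative_agent", "Bacterium"),
    },
    "Cellulitis": {
        ("morphology", "Inflammation"),
        ("causative_agent", "Bacterium"),
    },
}

def descendants_and_self(concept):
    result = {concept}
    for child in CHILDREN.get(concept, []):
        result |= descendants_and_self(child)
    return result

def descendants(concept):
    return descendants_and_self(concept) - {concept}

def has_rel(attribute, targets):
    """Concepts with a relationship `attribute` whose value is in `targets`."""
    return {
        concept
        for concept, rels in RELS.items()
        if any(attr == attribute and value in targets for attr, value in rels)
    }

infectious_arthritis = (
    descendants("Clinical_finding")
    & has_rel("morphology", descendants_and_self("Inflammation"))
    & has_rel("finding_site", descendants_and_self("Joint_structure"))
    & has_rel("causative_agent", descendants_and_self("Organism"))
)
print(infectious_arthritis)  # {'Infectious_arthritis'}
```

In this toy data, plain arthritis drops out for lack of a causative agent and cellulitis for lack of a joint finding site, leaving only the concept that satisfies all three filters.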