semantic interoperability for health network - i~hd d4_3... · 2014-05-15 · semantichealthnet...
Semantic Interoperability for Health Network
Deliverable 4.3: Ontology / Information models covering the public health use cases
[Version 1, March 12, 2014]
Call: FP7-ICT-2011-7
Grant agreement for: Network of Excellence (NoE)
Project acronym: SemanticHealthNet
Project full title: Semantic Interoperability for Health Network
Grant agreement no.: 288408
Budget: 3.222.380 EURO
Funding: 2.945.364 EURO
Start: 01.12.2011 - End: 31.05.2015
Website: www.semantichealthnet.eu
Coordinators:
The SemanticHealthNet project is partially funded by the European Commission.
D4.3 Ontology / Information models covering the public health use cases Page 2 of 44
Document description
Deliverable: D4.3
Publishable summary:
SemanticHealthNet (SHN) assumes that several standards and proprietary implementations for representing the content of electronic health records (EHRs) will co-exist for a long time. Thus, the project focuses on providing an integrative semantic abstraction on top of them that is able to act as a mediator. As practical exemplars, SHN has set the focus on chronic heart failure and cardiovascular prevention, which drive the development of semantic resources by work package 4 (WP4). In the two previous WP4 deliverables, we laid the basis of the proposed interoperability approach: an ontological framework and an initial set of semantic patterns obtained by following a bottom-up approach. As a result of our second deliverable (4.2), we created the Heart Failure Summary (HFS), a minimal dataset that contains the essential information needed to optimize heart failure management. In this deliverable, we describe an extension of the underlying semantic architecture and focus on the semantic interoperability challenges that arise when clinical practice data (e.g. EHRs) are used by public health systems. We exemplify this by extending the HFS with two risk factors for cardiovascular diseases (CVDs) that are of interest for public health studies, viz. tobacco and alcohol use. Thus, the HFS should be understood as a global resource that can be used by both stakeholder communities: from a modelling point of view, a dedicated instrument (e.g. a questionnaire) for public health reporting on heart failure would be very similar to the HFS. Public health use cases require access to heterogeneous information sources, which include data described at different levels of detail, generated within heterogeneous contexts (e.g. smoking cessation clinic record vs. primary care record), and serving different requirements. We demonstrate how semantic patterns can be used to improve semantic interoperability across them.
Semantic patterns are based on a reference ontology model (i.e. the SHN ontological framework) and facilitate the mapping of clinical model data into their semantic representation. Query exemplars are provided in chapter 4 in order to demonstrate (1) that data from heterogeneous models can be queried homogeneously and (2) that data can be retrieved together with their context, which helps public health systems to interpret them correctly. There are also limits to semantic interoperability, e.g. when a code such as Ex-smoker is used both for someone who quit last month after smoking for 30 years and for a person who quit 20 years ago after smoking for only two years. Such limits to comparability and semantic interoperability are rooted in differences in clinical processes and in facts about the world, and they cannot be resolved at the level of information systems. At best, the degree of uncertainty/vagueness can be estimated.
Status:
Version:
Public: x No □ Yes
Deadline:
Contact: Catalina Martínez-Costa [email protected] Stefan Schulz [email protected]
Editors: Catalina Martínez-Costa, Stefan Schulz
Table of contents
1 Introduction and objectives
1.1 Background
1.2 Objectives
1.3 The public health perspective
1.4 Monitoring Risk Factors for preventing Cardiovascular Disease
2 Methodology and General Principles
2.1 SHN semantic infrastructure
2.1.1 Ontological framework
2.1.2 Ontology Content Patterns – Semantic Patterns –
2.1.3 OWL DL representation
3 Extending the Heart Failure Summary (HFS) to support Public Health
3.1 Tobacco Use Summary
3.1.1 Smoking Tobacco Model
3.1.2 Model for other forms of tobacco use (Snuff)
3.2 Alcohol Use Summary
4 Dealing with heterogeneous tobacco use representations
4.1 Objectives
4.2 Description of the models
4.3 Mapping of clinical model data to their semantic representation by using semantic patterns as bridge
4.4 Homogeneous query of data from the three clinical models: Tobacco Use (openEHR), Meaningful Use (HL7 C-CDA) and Tobacco Detailed Use (DCM- HL7v3)
5 Summary and conclusions
1 Introduction and objectives
1.1 Background
SemanticHealthNet (SHN) faces the challenge of improving the semantic interoperability of clinical information. It assumes that several standards and proprietary implementations for representing the content of electronic health records (EHRs) will co-exist for a long time. Thus, the project focuses on providing an integrative semantic abstraction on top of them, representing a homogeneous view that is able to mediate across the heterogeneous underlying representations.
The approach targets the whole range of health-related information in all medical domains. As practical exemplars, SHN has set the focus on chronic heart failure and cardiovascular prevention, which drive the development of semantic resources by work package 4 (WP4).
The two previous deliverables had two main focuses. The first was to clarify the distinction between information entities and clinical entities, in order to determine what has to be represented by information models and what by ontologies (with a focus on SNOMED CT as an ontology-based clinical terminology). As a result, we proposed an ontology engineering approach based on a top-level ontology, description logics, and Semantic Web standards, and investigated how the semantic equivalence of isosemantic expressions could be ascertained once this distinction was properly made. The second focus was the application of this approach to the semantic representation of the Heart Failure Summary (HFS), a minimal dataset that contains essential information in order to optimize heart failure management. As a result, recurring modelling issues were identified and a set of semantic patterns was derived following a bottom-up approach.
In this deliverable, we address the semantic interoperability challenges that arise when clinical practice data (e.g. EHRs) are used by public health systems. We exemplify them by focusing on two risk factors for cardiovascular diseases (CVDs) that are of interest for public health studies, viz. tobacco and alcohol use. Public health and clinical practice requirements are not the same, and EHR data are usually not appropriate for many public health tasks. The Heart Failure Summary should be understood as a global resource that can be used by both stakeholder communities. From a modelling point of view, a dedicated instrument (e.g. a questionnaire) for public health reporting on heart failure would be very similar to the HFS. Therefore, we decided to extend the HFS with detailed information about both risk factors, as described in the following. Furthermore, we describe an extension of the underlying semantic architecture that will be applied to the public health use case.
1.2 Objectives
The goal of this work is to demonstrate that the proposed semantic interoperability solution can be applied to improve semantic interoperability between public health systems and clinical practice records (EHRs). To do so, we will extend the HFS with a set of high-quality representations of additional information required from a public health perspective. Using this extension as a working example, we will provide semantic patterns and apply them to that information in order to make it semantically interoperable.
The main purpose of using semantic patterns is to facilitate the mapping of existing structured data into an ontology-based representation, in a way that can be interpreted by computers independently of the level of granularity at which the data have been provided. These patterns are based on a formal model of meaning (reference model), constituted by a set of OWL DL ontologies under a highly constrained top-level ontology, which assists the modelling task.
Each semantic pattern we create will be accompanied by the formal representation of its meaning in OWL DL. Some patterns will be instantiated with fictitious data for demonstration. The patterns produced will be based on a subset of top-level patterns, which can be reused in different use cases.
We will exemplify the use of semantic patterns by applying them to encode clinical data originating from heterogeneous sources. This will demonstrate that patient information can be homogeneously retrieved for a particular public health use case, independently of the underlying clinical model representation.
1.3 The public health perspective
The focus of public health is to improve health and quality of life through the prevention and treatment of disease and through the promotion of healthy behaviours. While the clinical approach focuses on the treatment of individual patients, public health considers the characteristics of groups of people. Knowledge of the characteristics of a population can be derived not only from the aggregation of clinical records but also from other sources such as social care records, personal health records, service payments, health surveys, etc. Each of these information systems serves different information requirements and contexts. Thus, homogeneous access to them would significantly facilitate and improve the work of public agencies. In the following, we concentrate on public health aspects of cardiovascular diseases (CVDs) and CVD risks.
As reported in deliverable 2.1, there is considerable evidence that more than 50% of the recent reduction in CVD is due to decreases in risk factors. CVD risk is frequently the result of multiple interacting risk factors and is usually expressed in terms of the risk of developing CVD (incidence), of experiencing an event (e.g. a heart attack), or of dying.
The European Society of Cardiology (ESC) guidelines on cardiovascular disease prevention1 provide a set of recommendations to reduce CVD risk factors. Many of the recommendations are related to behavioural factors, such as: (i) advising smokers to quit and offering assistance; (ii) avoiding exposure to passive smoking; (iii) facilitating lifestyle change with cognitive-behavioural strategies; (iv) a healthy diet; (v) weight reduction in overweight or obese patients; (vi) physical exercise; etc. Others are related to the use of prophylactic drug therapies such as statins or antihypertensive treatment.
Although there is overwhelming evidence that most of the reduction in CVD is due to decreases in risk factors, there are still major evidence gaps (e.g. how to help citizens achieve lifestyle changes). Closing them requires “real-world evidence”, i.e. evidence obtained outside the artificial environment of clinical trials, which in turn requires large numbers of EHRs and data from other applications to be analysed. The ESC report highlights this need, as well as the lack of knowledge about people who do not usually participate in clinical trials and about the long-term outcomes of interventions.
1 http://www.escardio.org/guidelines-surveys/esc-guidelines/Pages/cvd-prevention.aspx
In addition, the ESC report emphasizes the need to reach beyond the clinic into citizens’ daily lives in order to understand how to modify CVD risk factors. This requires the collaboration of clinical care and public health, whose overlap varies within and across communities. Patients with behavioural risk factors such as tobacco use or risky alcohol use present themselves to primary care services every day. A coordinated approach between public health and primary care could be a synergistic opportunity to address these behaviours. The challenges to such collaboration2 include barriers such as communication problems, different practice cultures, policy, and funding mechanisms. Communication challenges involve both human and technical barriers. Here, we focus on the technical challenges related to the meaningful communication and reuse of patient data.
The outcomes of public health measures are assessed in terms of processes (e.g. the number of people taking up a smoking cessation service), intermediate risk outcomes (e.g. six-month quit rates), and disease outcomes (CVD event rates; CVD and lung cancer incidence and death rates). The patient-level data that come out of clinical processes are difficult to reuse for public health purposes: they are largely unstructured, coded data are often biased toward billing purposes, and data of interest for public health, such as tobacco use behaviour, are often too coarse-grained, incomplete, or unreliable3. Nevertheless, there is increasing interest in the analysis of aggregated clinical data4. Here, we focus on the reuse of EHR data for public health purposes, assuming that at least parts of the EHR are sufficiently standardized and quality-assured. This can be the case with summaries like the HFS. In order to support public health goals, the underlying data models have to be expanded to incorporate environmental, psychosocial, and other non-medical data elements, which are typically not represented in the EHR in a way that is sufficient for public health.
Since the objective of this deliverable is to show how the EHR can be used to satisfy public health goals, we have expanded the Heart Failure Summary (HFS), produced in the second project year and reported in deliverable 4.2. As working examples, we have selected two behavioural factors, tobacco and alcohol use, which are risk factors for developing CVDs.
Our work consists of providing the new data elements and value restrictions needed to model both risk factors, together with their semantic pattern-based representation, which makes them semantically interoperable across heterogeneous systems based on different standard or proprietary EHR representations.
1.4 Monitoring Risk Factors for preventing Cardiovascular Disease
There is a wide range of data items that may be needed by population health experts to promote
CVD prevention. Table 1-1 shows a summary of possible information sources and examples of their
use by public health systems, extracted from deliverable 2.1.
A public health use case will usually involve more than one information source. As an example, in order to compare the effectiveness of smoking cessation services, we might use three different sources of information to obtain quit rates: (1) national lifestyle surveys; (2) EHRs; and (3) the smoking cessation services themselves.
2 http://www.health.gov.on.ca/en/common/ministry/publications/reports/capacity_review06/phealth_pcare.pdf
3 http://www.ncbi.nlm.nih.gov/pubmed/10912559
4 http://www.sciencedirect.com/science/article/pii/S1532046407000603
Common public health data source | Example of use
Scientific literature | Effect of statin prescribing on CVD risk factors
National surveys | Health surveys with age, sex, ethnicity, household income, education level, lifestyle, etc.
EHRs | CVD events, cholesterol measurement, blood pressure measurement, height, weight, smoking, alcohol use, etc.
Disease registers | CVD incidence
Death certificates and other public administrative data | CVD deaths
Specific health services (e.g. smoking cessation service) | Smoking prevalence and quit rates, intention to quit, etc.
Web and mobile technologies | Daily weight measurement at home, ‘wellbeing’ applications for physical activity monitoring
Table 1-1 Common public health data sources
WP4 has focused on the representation of tobacco and alcohol consumption in the EHR to support public health goals. WP1 clinicians provided us with a dataset produced within the context of the SICA-HF5 EU FP7 project, related to heart failure patients in a clinical setting. In this dataset, the information about smoking and alcohol use consisted of a yes/no answer. In order to make it suitable for public health purposes, we have enriched this representation and added it to the heart failure summary, thereby creating a single combined resource of interest to both the clinical care and the public health stakeholder communities. Both CVD risk factors are described below, using deliverable 2.1 as the source.
Tobacco use, and in particular smoking, is strongly associated with CVD and lung cancer. When a smoker quits, there is a measurable improvement in his/her health. When large groups quit, e.g. after the ban on smoking in public places, mass effects such as significant drops in heart attack rates are observed. As smoking is highly addictive, the process of quitting is complex and not easy to measure. As described above, there are different sources of information on smoking quit rates and on the characteristics of smokers who quit: this information is either gathered directly from those who use smoking cessation services or, for those who quit on their own, estimated from population surveys.
The health risks from smoking are proportional to the amount smoked and the period of exposure. The ideal risk measure is the numeric value often termed “pack years”. The smoking data in EHRs are often much cruder. For example, the same “ex-smoker” code may be used both for someone who quit last month after smoking for 30 years and for another person who quit 20 years ago after smoking for only two years.
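To make the contrast concrete, the sketch below computes pack years with the conventional definition (packs of 20 cigarettes smoked per day multiplied by years smoked) for the two ex-smokers just described. The daily amount of 20 cigarettes for both persons is an assumption added for illustration, since the text gives only the durations; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass

CIGARETTES_PER_PACK = 20  # conventional pack size assumed by the pack-year measure


@dataclass
class SmokingHistory:
    status_code: str           # coarse EHR code, e.g. "ex-smoker"
    cigarettes_per_day: float  # hypothetical: not given in the example
    years_smoked: float
    years_since_quit: float


def pack_years(history: SmokingHistory) -> float:
    """Packs smoked per day multiplied by the number of years smoked."""
    return history.cigarettes_per_day / CIGARETTES_PER_PACK * history.years_smoked


# The two ex-smokers from the text; 20 cigarettes/day is an illustrative assumption.
recent_quitter = SmokingHistory("ex-smoker", 20, years_smoked=30, years_since_quit=1 / 12)
long_ago_quitter = SmokingHistory("ex-smoker", 20, years_smoked=2, years_since_quit=20)

for person in (recent_quitter, long_ago_quitter):
    print(person.status_code, pack_years(person), "pack years")
# Same status code, but 30.0 versus 2.0 pack years.
```

Both records carry the same status code, so any query restricted to that code treats them as equivalent even though their cumulative exposure differs by a factor of 15.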
Both primary care and public health services are paid to help smokers to quit. National health bodies
need to monitor the quality of these services and consider how they should evolve as the character-
istics of smokers change over time.
5 http://www.sica-hf.com/
Many of the phenomena described with regard to tobacco can also be observed in the way alcohol use is represented in the medical record6. Whereas certain protective cardiovascular effects are discussed for light and moderate alcohol use, heavy drinkers are threatened by several cardiovascular risks such as hypertension, stroke, and alcoholic cardiomyopathy. Light tobacco consumption is the exception rather than the rule, but light alcohol consumption is the normal case in Western societies. The distinction between "drinker" and "non-drinker" is therefore difficult to draw and rather irrelevant from a medical point of view, whereas "social drinker" and "heavy drinker" are two points on a continuum. Heavy drinking is certainly more socially stigmatising than social drinking, so the EHR may be especially unreliable in identifying this risk group.
In order to extend the heart failure summary with the tobacco and alcohol use risk factors, we used as input the requirements derived from the above content analysis. In addition, we used two other sources of requirements: (i) an analysis made by the Swedish National Board of Health and Welfare within the project “Common information structure – the application in quality registers” for the health care of adult patients with chronic heart failure, and (ii) a nurse-run HF clinic where patient lifestyle/risk factors are captured.
The current heart failure summary should be seen as an exemplar that aggregates a wide range of phenomena typical of the representation of precise or approximate facts, hypotheses, plans, etc. The inclusion of the alcohol and tobacco use cases will give additional insights into aspects such as: (i) the highly approximate and speculative character of information about recreational drugs reported in the EHR; (ii) deficiencies in the representation of relevant entities in standard terminologies; (iii) imprecision regarding numeric values; (iv) undocumented background assumptions and procedures in history taking.
Table 1-2 lists the tobacco and alcohol use data items that have been selected as relevant for supporting public health systems in preventing CVDs.
Tobacco Use data items: Status; Form; Typical Smoked Amount; Pattern of Use; Date Ceased; Pack Years
Alcohol Use data items: Status; Form; Typical Alcohol Consumption; Pattern of Use; Binge Drinking Pattern; Date Commenced; Date Ceased
Table 1-2 Tobacco and alcohol use data items considered of interest for public health
6 http://www.ncbi.nlm.nih.gov/pubmed/10912559
2 Methodology and General Principles
In the first WP4 deliverable (4.1), we outlined the shared logical framework on which our approach is based. Deliverable 4.2 then provided an overview of the main representational units of the framework and a set of patterns, elaborated following a bottom-up approach, for modelling the heart failure summary. There, we stated that many of the patterns provided could be generalised and converted into top-level patterns that can be composed and specialised. Below, we describe the approach and the progress made in recent months concerning the use of patterns for facilitating the modelling of clinical information based on a formal model of meaning.
2.1 SHN semantic infrastructure
The proposed semantic infrastructure consists of an ontological framework that constitutes the model of meaning and a set of ontology content patterns (henceforth semantic patterns) that use this framework as reference. The framework aims at an unambiguous representation of clinical information by providing a set of top-level concepts (clinical entities and information entities) and an unambiguous way of relating them. It consists of three kinds of ontologies, expressed in OWL 2 DL: (i) a top-level ontology; (ii) an information entity ontology; and (iii) medical domain ontologies. How this framework interacts with semantic patterns, and how the patterns in turn interact with clinical models or structured clinical information in general, is explained in the following.
2.1.1 Ontological framework
Semantic patterns are based on a model of meaning which consists of the following three ontologies:
A top-domain ontology, BioTopLite7 (prefix btl:), providing a set of canonical top-level categories, such as btl:Condition, btl:InformationObject, btl:Quality, and btl:Process, and relations, such as btl:hasPart and btl:bearerOf. We use this top-level ontology especially because it introduces a very clear distinction between information objects and other entities (e.g. clinical entities) and allows their unambiguous binding, guided by the top-level categories.
SNOMED CT (prefix sct:), a large clinical terminology partially built on formal-ontological principles. We use subsets of it as an OWL 2 DL ontology, and selected SNOMED CT content is placed under top-level classes provided by BioTopLite. SNOMED CT does not limit itself to representing medical entities like “myocarditis”; it also has codes for complex statements such as “possible myocarditis”, which combine a domain term with epistemic information about its specific use by the author of the statement. This is typical of SNOMED CT concepts that belong to the “Situation with explicit context” hierarchy, which can be seen as an information model inside SNOMED CT. Here, interoperability requires additional effort: e.g. the code for “myocarditis”, bound to a specific data element within an information model on diagnosis that includes a diagnostic certainty attribute set to “possible”, should be found to be equivalent to the code “possible myocarditis”. This requires re-modelling such concepts guided by the semantic categories of the underlying model of meaning and not primarily by their labels, which can be ambiguous: “myocarditis”, for instance, could be interpreted as a disorder or as a morphology.
An EHR information entity ontology (prefix shn:) for representing pieces of information like diagnostic statements, plans, orders, etc. These are outcomes of clinical actions such as observations, investigations, or evaluations. All classes of this ontology are represented as subclasses of the top-level class btl:InformationObject. Those SNOMED CT concepts that represent statements rather than clinical entities should also be placed under this category after their re-modelling (e.g. “possible myocarditis”).
7 http://purl.org/biotop/biotoplite.owl
Information entities refer to (types of) clinical entities by means of the relation btl:represents, which can be further specialized by shn:isAboutSituation (e.g. shn:InformationItem shn:isAboutSituation sct:SmokingSituation), for referring to a clinical situation of the patient8 (e.g. a smoking situation), and by shn:isAboutQuality (e.g. shn:ObservationResult shn:isAboutQuality sct:DoseForm), for referring to an indirectly observed quality of some material object (e.g. a pharmaceutical product) or process (e.g. the route quality of a medication administration process).
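Because shn:isAboutSituation and shn:isAboutQuality specialise btl:represents, any statement made with one of them should also hold for btl:represents. The following minimal sketch illustrates that inference with plain Python tuples instead of an OWL reasoner; treating the two example statements as simple triples and the relations as sub-properties is a simplifying assumption, not the project's actual OWL encoding.

```python
# Sub-property declarations (assumed): each specialised relation points to its parent.
SUB_PROPERTY_OF = {
    "shn:isAboutSituation": "btl:represents",
    "shn:isAboutQuality": "btl:represents",
}

# The two example statements from the text, written as (subject, predicate, object).
asserted = {
    ("shn:InformationItem", "shn:isAboutSituation", "sct:SmokingSituation"),
    ("shn:ObservationResult", "shn:isAboutQuality", "sct:DoseForm"),
}


def ancestors(prop: str) -> list[str]:
    """Return all super-properties of prop by walking up the sub-property chain."""
    result = []
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        result.append(prop)
    return result


def entailed(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Add (s, super_p, o) for every asserted (s, p, o), as in RDFS rule rdfs7."""
    closure = set(triples)
    for s, p, o in triples:
        for super_p in ancestors(p):
            closure.add((s, super_p, o))
    return closure


for triple in sorted(entailed(asserted)):
    print(triple)
```

A query asking only for btl:represents therefore finds both information entities, whichever specialised relation was used when the data were encoded.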
2.1.2 Ontology Content Patterns – Semantic Patterns –
Semantic patterns act as a bridge between structured data and their semantic representation based on the above ontological framework, and can be used to guide the mapping process between the two. The main rationale for semantic patterns is to facilitate recurring content modelling tasks. Examples of recurring issues are: Who does what, when and where? – What is the location of something (an object or a process)? – What is the time frame of something, e.g. the duration of a process (like a clinical situation) or the life of a material entity? – etc. In their attempt to provide clinical information within a standardized structure that addresses the requirements of a particular use case9, semantic patterns are not very different from clinical archetypes and their use in templates.
Semantic patterns are based on the above ontological framework and are employed to insulate users
– more precisely those who semantically annotate clinical models – from the underlying ontology
formalisms. Semantic patterns are characterized by the following:
They are based on the Closed World Assumption (CWA), under which everything that is not known to be true is assumed to be false, whereas under the Open World Assumption (OWA) everything that is not known is considered undefined. As an example, the question “Was the heart failure caused by a heart attack?” will look for that statement in our model; if the statement is absent, the two approaches interpret its absence differently: the answer is false under the closed world assumption and not computable (unknown) under the open world assumption.
Patterns can be specialized following a paradigm similar to object-oriented design and frame systems, in which the properties defined in a parent class are inherited by all its child classes, which can further constrain them. For instance, if we specialise a diagnosis pattern into one specific to breast cancer diagnoses, we might want to restrict its severity values to those of a cancer staging system such as the TNM system. Thus, semantic patterns can be organized in hierarchies from more general (top-level patterns) to more specific ones.
They can be composed with other patterns to cover larger use cases10 in a semantically consistent way. As an example, the disease expressed by the diagnosis pattern might be further specified by following its own semantic pattern, which includes properties for expressing the affected body site or the cause of the disease.
8 S. Schulz, A. Rector, J. Rodrigues, C. Chute, B. Üstün, K. Spackman. Ontology-based convergence of medical terminologies: SNOMED CT and ICD-11. Proc. of eHealth2012. Vienna, Austria: OCG, 2012.
9 E. Blomqvist, E. Daga, A. Gangemi, V. Presutti. Modelling and using ontology design patterns. [http://www.neon-project.org/web-content/media/book-chapters/Chapter-12.pdf]
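The heart-attack question can be illustrated with a three-valued lookup. In this minimal sketch, which is not part of the SHN tooling and uses hypothetical statement strings, a closed-world query treats a missing statement as false, whereas an open-world query returns None (unknown) unless the negation has been asserted explicitly.

```python
from typing import Optional

# Hypothetical record: statements asserted about the patient's heart failure.
asserted_true = {"heart failure present"}
asserted_false: set[str] = set()  # explicitly negated statements (none recorded here)


def query_cwa(statement: str) -> bool:
    """Closed world: anything not asserted as true is taken to be false."""
    return statement in asserted_true


def query_owa(statement: str) -> Optional[bool]:
    """Open world: an absent statement is unknown (None) unless explicitly negated."""
    if statement in asserted_true:
        return True
    if statement in asserted_false:
        return False
    return None


question = "heart failure caused by heart attack"
print("closed world:", query_cwa(question))  # closed world: False
print("open world:", query_owa(question))    # open world: None
```

Under the open world reading, only an explicitly recorded negation (added to asserted_false in this sketch) would turn the answer into False.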
We propose to use semantic patterns as a “proxy” mediating between clinical models (e.g. archetypes) and the proposed formal model of meaning (i.e. the ontological framework). Thus, they should support the mapping of clinical data rendered according to existing EHR, messaging, or reporting standards, and of other structured representations, into their semantic representation. In addition, they could be used for the semantic validation of the internal consistency of proposed clinical models, or even for guiding the creation of clinical models.
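The proxy role can be sketched as a lookup from source data items to pattern instances. In the hypothetical example below, two records from different clinical models (the model names, element names, values, and the pattern name are placeholders, not real archetype or template paths) are mapped to the same pattern instance, which is what allows them to be queried in the same way. A real mapping would target the OWL DL representation rather than a Python object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternInstance:
    """A filled semantic pattern: which pattern, and which situation it is about."""
    pattern: str
    about_situation: str


# Hypothetical mapping rules, one per (clinical model, data element, coded value).
MAPPING_RULES = {
    ("model_A", "smoking_status", "current smoker"):
        PatternInstance("InformationAboutSituation", "SmokingSituation"),
    ("model_B", "tobacco_use", "smokes cigarettes"):
        PatternInstance("InformationAboutSituation", "SmokingSituation"),
}


def to_semantic_representation(model: str, element: str, value: str) -> PatternInstance:
    """Map one structured data item to its pattern-based representation."""
    try:
        return MAPPING_RULES[(model, element, value)]
    except KeyError:
        raise ValueError(f"no mapping rule for {model}/{element}={value!r}") from None


a = to_semantic_representation("model_A", "smoking_status", "current smoker")
b = to_semantic_representation("model_B", "tobacco_use", "smokes cigarettes")
print(a == b)  # True: both records now share one representation
```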
Our assumption is that a broad range of clinical models can be represented by the specialisation and composition of a limited set of top-level semantic patterns. In deliverable 4.2, we demonstrated the creation and application of such patterns for representing information on heart failure in a bottom-up approach. We found that these patterns could be described as specialisations and compositions of a set of higher-level patterns (top-level patterns).
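Specialisation and composition can be mimicked with ordinary class inheritance, as the frame-system analogy in the list above suggests. The sketch below is hypothetical (the class names, attributes, and stage values are illustrative assumptions): a general diagnosis pattern whose object is filled by a disease pattern (composition), and a breast cancer specialisation that inherits everything but narrows the allowed severity values (specialisation).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiseasePattern:
    """Hypothetical pattern that further describes a disease (composition target)."""
    disease: str
    body_site: Optional[str] = None
    cause: Optional[str] = None


@dataclass
class DiagnosisPattern:
    """Hypothetical general diagnosis pattern; severity is unrestricted here."""
    disease: DiseasePattern  # composition: the object is another pattern
    severity: Optional[str] = None

    # Value restriction on severity; None means "any value" in the general pattern.
    ALLOWED_SEVERITIES = None

    def __post_init__(self) -> None:
        allowed = type(self).ALLOWED_SEVERITIES
        if self.severity is not None and allowed is not None and self.severity not in allowed:
            raise ValueError(f"severity {self.severity!r} not allowed by {type(self).__name__}")


@dataclass
class BreastCancerDiagnosisPattern(DiagnosisPattern):
    """Specialisation: inherits all properties and narrows the severity values."""
    # Illustrative stage groups derived from TNM staging (not a complete value set).
    ALLOWED_SEVERITIES = frozenset({"stage I", "stage II", "stage III", "stage IV"})


dx = BreastCancerDiagnosisPattern(
    disease=DiseasePattern("breast cancer", body_site="breast"),
    severity="stage II",
)

try:
    BreastCancerDiagnosisPattern(DiseasePattern("breast cancer"), severity="severe")
    rejected = False
except ValueError:
    rejected = True  # the specialised pattern refuses a value its parent accepts
```

The specialised pattern rejects a severity value that the general pattern would accept, mirroring the rule that child classes may only further constrain inherited properties.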
Semantic patterns describe information acquired in a medical context about clinical-related issues
(e.g. Diagnosis, Lab results, Examination results, Symptoms, etc.). Their use by clinical modellers
should be supported by tools that provide a user-friendly view of them, independently of their
internal representation. Ideally, their selection should be guided by a tool that suggests a pattern
based on a list of “keywords” introduced by the modeller (e.g. lab result, cholesterol, diagnosis,
etc.).
Here we represent them as Subject-Predicate-Object (SPO) triples, enhanced by a cardinality attrib-
ute. The Subject and the Object correspond to sets of classes of the ontology framework. However,
the predicate does not correspond to an object property but to an OWL DL expression, as we will
describe later in Section 2.1.3.
In the following, we describe some of the top-level patterns we will use to encode the tobacco and
alcohol use clinical models. They can be specialized and composed by following certain cardinality
and value restrictions.
Cardinality constraints restrict the number of object values with which a predicate can be used (e.g.
the cardinality “1..*” of the predicate ‘describes situation’ means that this predicate must have at
least one object value, with the maximum unbounded).
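A cardinality such as “1..*” can be read as a lower and an upper bound on the number of object values a predicate takes in a pattern instance. A minimal Python sketch, assuming cardinalities are written as strings; the function names are hypothetical and not part of any SemanticHealthNet tooling.

```python
import math


def parse_cardinality(card: str) -> tuple:
    """Split a cardinality such as "1..*" into (minimum, maximum)."""
    low, high = card.split("..")
    # "*" means the maximum is unbounded.
    return int(low), (math.inf if high == "*" else int(high))


def satisfies(card: str, n_values: int) -> bool:
    """Check whether a predicate used with n_values object values respects card."""
    low, high = parse_cardinality(card)
    return low <= n_values <= high


# 'describes situation' is declared "1..*" in Table 2-1.
print(satisfies("1..*", 0))  # False: at least one object value is required
print(satisfies("1..*", 3))  # True: no upper bound
print(satisfies("1..1", 2))  # False: exactly one value is allowed
```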
10 A. Gangemi, Ontology Design Patterns for Semantic Web Content. In Proceedings of the Fourth International Semantic Web Conference, 2005, pp. 262-276.
The value restrictions constrain the value of the Object part of the triple. They limit the possible val-
ues for some predicate, allowing another pattern as object part of a triple (e.g. a specific disease or
the pattern that further describes a disease).
Note that instantiation at this (meta-)level means that the instances are object classes, not domain
individuals.
Top-level pattern I_CS_PT (cf. Table 2-1):
The information about clinical situation pattern (I_CS_PT) can be used to represent some piece of
information about a particular clinical situation of the patient. Clinical situations, as described in11,
correspond to SNOMED CT clinical findings. Precisely, they represent the period of a patient’s life
during which a certain clinical condition is present. The pattern consists of the following SPO triples,
described as enumerated in the left column of the table:
- S1: Type of clinical situation in focus (e.g. heart failure, smoking, etc.)
- S2: Process performed to acquire the information. (e.g. diagnostic measures, physical exami-
nation, history taking, etc.)
- S3: Any epistemic and contextual aspect that qualifies or modifies the information related
to the clinical situation in focus (e.g. severity, temporal context, finding context (absence,
certainty, etc.), etc.)
#N Subject Predicate Cardinality Object
S1 shn:InformationItem 'describes situation' 1..* shn:ClinicalSituation (CS_PT)
S2 shn:InformationItem 'results from process' 1..* btl:Process (CP_PT)
S3 shn:InformationItem 'has attribute' 0..* shn:InformationAttribute
Table 2-1 Semantic pattern I_CS_PT
In the above table, the Object part of the triple may contain the name of another pattern in brackets,
which means that the main pattern can be composed with that one.
Examples of the application of this top-level pattern are Diagnoses, Co-morbidities, Symptoms, Aller-
gies, etc. The following table shows its application for representing “Suspected heart failure diagno-
sis”. The Object part of the triples has been constrained by following the cardinality and value con-
straints indicated by the pattern.
#N Subject Predicate Object
S1 shn:InformationItem 'describes situation' sct:HeartFailure
S2 shn:InformationItem 'results from process' sct:DiagnosticProcess
S3 shn:InformationItem 'has attribute' sct:Suspected
Table 2-2 Diagnosis example based on I_CS_PT pattern
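The cardinality and value constraints of I_CS_PT can be checked mechanically against an instance such as Table 2-2. A minimal Python sketch, assuming a toy subclass hierarchy in place of the ontology framework; the Subject column is omitted because it is always shn:InformationItem here, and `validate` is a hypothetical helper.

```python
import math
from dataclasses import dataclass

# Toy subclass links (child -> parent), assumed for illustration only.
PARENT = {
    "sct:HeartFailure": "shn:ClinicalSituation",
    "sct:DiagnosticProcess": "btl:Process",
    "sct:Suspected": "shn:InformationAttribute",
}


def is_subclass(cls, ancestor):
    """Walk the toy hierarchy upwards from cls looking for ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


@dataclass(frozen=True)
class PatternTriple:
    tid: str          # triple identifier, e.g. "S1"
    predicate: str
    cardinality: str  # e.g. "1..*"
    obj: str          # value restriction on the Object part


I_CS_PT = [
    PatternTriple("S1", "describes situation", "1..*", "shn:ClinicalSituation"),
    PatternTriple("S2", "results from process", "1..*", "btl:Process"),
    PatternTriple("S3", "has attribute", "0..*", "shn:InformationAttribute"),
]


def validate(pattern, instance):
    """Return the cardinality and value-restriction violations of an instance,
    given as a list of (triple id, object class) pairs."""
    errors = []
    for t in pattern:
        objs = [o for tid, o in instance if tid == t.tid]
        low, high = t.cardinality.split("..")
        high = math.inf if high == "*" else int(high)
        if not int(low) <= len(objs) <= high:
            errors.append(f"{t.tid}: {len(objs)} value(s) violate {t.cardinality}")
        errors += [f"{t.tid}: {o} is not a {t.obj}" for o in objs if not is_subclass(o, t.obj)]
    return errors


# Table 2-2: "Suspected heart failure diagnosis".
suspected_hf = [("S1", "sct:HeartFailure"), ("S2", "sct:DiagnosticProcess"), ("S3", "sct:Suspected")]
print(validate(I_CS_PT, suspected_hf))  # []
```

Keeping the value restriction as a subclass test mirrors the text: the Object of each triple must be the declared class or one of its descendants.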
Top-level pattern I_NCS_PT (cf. Table 2-3):
The information about no clinical situation pattern (I_NCS_PT) is defined as a specialisation of
I_CS_PT and can be used to represent the absence of a particular patient clinical situation. It consists
of the following SPO triples:
11 S. Schulz, A. Rector, J. Rodrigues, C. Chute, B. Üstün, K. Spackman, Ontology-based convergence of medical terminologies. SNOMED CT and ICD-11. Proc. of eHealth2012. Vienna, Austria: OCG, 2012.
- S1: Clinical situation in focus (e.g. heart failure, smoking, etc.)
- S2: Process performed to acquire the information. (e.g. diagnostic, physical examination, his-
tory taking, etc.)
- S3: Any epistemic and contextual aspect that qualifies or modifies the information related
to the clinical situation in focus (e.g. severity, certainty, temporal context, etc.).
- S4: The information attribute that represents the absence of the clinical situation in focus.
This triple is coloured in grey since it further specialises the triple S3 (shn:InformationItem
‘has attribute’ 0..* shn:InformationAttribute) of the parent pattern (I_CS_PT) for representing
the absence aspect. Thus, the predicate ‘has situation context’ specialises the predicate
‘has attribute’ and constrains its value to sct:Absent, which is placed as a subclass of
shn:InformationAttribute in the ontology framework. The cardinality “1..1” indicates that this
triple is obligatory when this pattern is instantiated. Note that this is a meta-description, so it
has no existential import at the level of individuals.
#N Subject Predicate Cardinality Object
S1 shn:InformationItem 'describes situation' 1..* shn:ClinicalSituation (CS_PT)
S2 shn:InformationItem 'results from process' 1..* btl:Process (CP_PT)
S3 shn:InformationItem 'has attribute' 0..* shn:InformationAttribute
S4 shn:InformationItem 'has situation context' 1..1 sct:Absent
Table 2-3 Semantic pattern I_NCS_PT (shaded triple indicates difference to parent (I_CS_PT))
Examples of application of this top-level pattern are Diagnosis of absence of X, No Symptom X, No
Allergy X, etc. The following table shows its application for representing “Diagnosis of no heart fail-
ure”. The Object part of the triples has been constrained by following the cardinality and value con-
straints indicated by the pattern.
#N Subject Predicate Object
S1 shn:InformationItem 'describes situation' sct:HeartFailure
S2 shn:InformationItem 'results from process' sct:DiagnosticProcess
S4 shn:InformationItem 'has situation context' sct:Absent
Table 2-4 Diagnosis example based on I_NCS_PT pattern
Top-level pattern PH_CS_PT (cf. Table 2-5):
The past history of clinical situation pattern (PH_CS_PT) is defined as a specialisation of I_CS_PT and
can be used to represent any clinical situation of the patient that occurred in the past, whether or
not it continues at present. It consists of the following SPO triples:
- S1: Clinical situation in focus (e.g. heart failure, smoking, etc.)
- S2: Process performed to acquire the information. (e.g. diagnostic, physical examination, his-
tory taking, etc.)
- S3: Any epistemic and contextual aspect that qualifies or modifies the information related
to the clinical situation in focus (e.g. severity, certainty, absence, etc.). Note that temporal
context is no longer among the possible values of this predicate, since it is expressed by triple S4.
- S4: The information attribute that represents the past temporal context. This triple is
coloured in grey since it further specialises the triple S3 of the parent pattern (I_CS_PT). The
predicate ‘has temporal context’ specialises the predicate ‘has attribute’ and constrains its
value to sct:InThePast, which is placed as a subclass of shn:InformationAttribute in the ontology
framework. The cardinality “1..1” indicates that this triple is obligatory when this pattern
is instantiated, i.e. when OWL TBox statements are generated as instances of the pattern.
#N Subject Predicate Cardinality Object
S1 shn:InformationItem 'describes situation' 1..* shn:ClinicalSituation (CS_PT)
S2 shn:InformationItem 'results from process' 1..* btl:Process (CP_PT)
S3 shn:InformationItem 'has attribute' 0..* shn:InformationAttribute
S4 shn:InformationItem 'has temporal context' 1..1 sct:InThePast
Table 2-5 Semantic pattern PH_CS_PT (shaded triple indicates difference to parent (I_CS_PT))
Examples of application of this top-level pattern are Past History of Disease, Past History of Allergy,
Past History of Symptom, etc. The following table shows its application for representing Past history
of heart failure diagnosis. The Object part of the triples has been constrained by following the cardi-
nality and value constraints indicated by the pattern.
#N Subject Predicate Object
S1 shn:InformationItem 'describes situation' sct:HeartFailure
S2 shn:InformationItem 'results from process' sct:DiagnosticProcess
S4 shn:InformationItem 'has temporal context' sct:InThePast
Table 2-6 Diagnosis example based on PH_CS_PT pattern
Top-level pattern NPH_CS_PT (cf. Table 2-7):
The No Past History of Clinical Situation pattern (NPH_CS_PT) is defined as a specialisation of
PH_CS_PT and can be used to represent the absence of some patient clinical situation in the past,
independently of whether it exists at the present time. It consists of the following SPO triples:
- S1: Clinical situation in focus (e.g. heart failure, smoking, etc.)
- S2: Process performed to acquire the information. (e.g. diagnostic, physical examination, his-
tory taking, etc.)
- S3: Any epistemic and contextual aspect that qualifies or modifies the information related
to the clinical situation in focus (e.g. severity, certainty, etc.)
- S4: The information attribute that represents the past temporal context (sct:InThePast)
- S5: The information attribute that represents the absence of the clinical situation in focus
(sct:Absent). This triple is coloured in grey since it further specialises the triple S3 of the parent
pattern (PH_CS_PT). The predicate ‘has situation context’ specialises the predicate ‘has
attribute’ and constrains its value to sct:Absent, which is placed as a subclass of
shn:InformationAttribute in the ontology framework. The cardinality “1..1” indicates that this
triple is obligatory when this pattern is instantiated.
#N Subject Predicate Cardinality Object
S1 shn:InformationItem 'describes situation' 1..* shn:ClinicalSituation (CS_PT)
S2 shn:InformationItem 'results from process' 1..* btl:Process (CP_PT)
S3 shn:InformationItem 'has attribute' 0..* shn:InformationAttribute
S4 shn:InformationItem 'has temporal context' 1..1 sct:InThePast
S5 shn:InformationItem 'has situation context' 1..1 sct:Absent
Table 2-7 Semantic pattern NPH_CS_PT (shaded triple indicates difference to parent (PH_CS_PT))
Examples of application of this top-level pattern are No Past History of Disease, No Past History of
Allergy, No Past History of Symptom, etc. The following table shows its application for representing
“No Past history of heart failure diagnosed”. The Object part of the triples has been constrained by
following the cardinality and value constraints indicated by the pattern.
#N Subject Predicate Object
S1 shn:InformationItem 'describes situation' sct:HeartFailure
S2 shn:InformationItem 'results from process' sct:DiagnosticProcess
S4 shn:InformationItem 'has temporal context' sct:InThePast
S5 shn:InformationItem 'has situation context' sct:Absent
Table 2-8 Diagnosis example based on NPH_CS_PT pattern
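Specialisation, as used for I_NCS_PT, PH_CS_PT and NPH_CS_PT, keeps every triple of the parent pattern and adds a triple that refines S3. A minimal Python sketch of that idea, assuming patterns are plain tuples of triples; the `Pattern` class and its methods are hypothetical and only illustrate how the obligatory triples (minimum cardinality of at least one) accumulate along the chain from NPH_CS_PT up to I_CS_PT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PatternTriple:
    tid: str
    predicate: str
    cardinality: str
    obj: str


@dataclass(frozen=True)
class Pattern:
    name: str
    triples: tuple
    parent: Optional["Pattern"] = None

    def specialise(self, name: str, *extra: PatternTriple) -> "Pattern":
        # A child keeps every parent triple and adds the triples that refine S3.
        return Pattern(name, self.triples + extra, parent=self)

    def lineage(self) -> list:
        # Pattern names from this pattern up to the top-level ancestor.
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def obligatory(self) -> set:
        # IDs of triples whose minimum cardinality is at least 1.
        return {t.tid for t in self.triples if int(t.cardinality.split("..")[0]) >= 1}


I_CS_PT = Pattern("I_CS_PT", (
    PatternTriple("S1", "describes situation", "1..*", "shn:ClinicalSituation"),
    PatternTriple("S2", "results from process", "1..*", "btl:Process"),
    PatternTriple("S3", "has attribute", "0..*", "shn:InformationAttribute"),
))
I_NCS_PT = I_CS_PT.specialise(
    "I_NCS_PT", PatternTriple("S4", "has situation context", "1..1", "sct:Absent"))
PH_CS_PT = I_CS_PT.specialise(
    "PH_CS_PT", PatternTriple("S4", "has temporal context", "1..1", "sct:InThePast"))
NPH_CS_PT = PH_CS_PT.specialise(
    "NPH_CS_PT", PatternTriple("S5", "has situation context", "1..1", "sct:Absent"))

# Table 2-8 uses triples S1, S2, S4 and S5, which covers every obligatory triple.
table_2_8 = {"S1", "S2", "S4", "S5"}
print(NPH_CS_PT.lineage())              # ['NPH_CS_PT', 'PH_CS_PT', 'I_CS_PT']
print(sorted(NPH_CS_PT.obligatory()))   # ['S1', 'S2', 'S4', 'S5']
```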
Top-level pattern OB_CS_PT (cf. Table 2-9):
The observation result pattern (OB_CS_PT) can be used to represent the result of an observation
or assessment about some quality of a given clinical situation. It consists of the following SPO triples:
- O1: The quality observed / assessed (e.g. mass intake)
- O2: The clinical situation to which that quality belongs (e.g. tobacco smoking situation)
- O3: The result of the observation / assessment, expressed as a value region (e.g. 15 cigarettes per day)
- O4: The value of the result (e.g. 15)
- O5: The units in which the value is expressed (e.g. per day)
- O6: The scale in which the observed value is expressed (e.g. qualitative, quantitative (ratio,
interval, ordinal, etc.))
- O7: The observation / assessment process performed to acquire the information.
#N Subject Predicate Cardinality Object
O1 shn:ObservationResult 'describes quality' 1..1 btl:ProcessQuality
O2 btl:Quality 'is quality of' 1..* shn:ClinicalSituation
O3 shn:ObservationResult 'has observed value' 1..1 btl:ValueRegion
O4 btl:ValueRegion 'has value' 0..1 xml:datatype
O5 btl:ValueRegion 'has units' 0..1 shn:MeasurementUnits
O6 btl:ValueRegion 'has scale' 0..1 shn:Scale
O7 shn:ObservationResult 'results from process' 1..* shn:ObservationProcess or shn:AssessmentProcess
Table 2-9 Semantic pattern OB_CS_PT
Examples of application of this top-level pattern are Clinical Test Result, Physical Examination Result,
measurement result, etc. The following table shows its application for representing “10 cigarettes /
day tobacco smoking assessment”. The Object part of the triples has been constrained by following
the cardinality and value constraints indicated by the pattern.
#N Subject Predicate Object
O1 shn:ObservationResult 'describes quality' shn:MassIntake
O2 btl:Quality 'is quality of' sct:CigaretteTobaccoSmokingSituation
O3 shn:ObservationResult 'has observed value' btl:ValueRegion
O4 btl:ValueRegion 'has value' 10
O5 btl:ValueRegion 'has units' sct:PerDay
O7 shn:ObservationResult 'results from process' shn:Assessment
Table 2-10 Observation result example based on OB_CS_PT
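The nesting of OB_CS_PT (a quality, its bearer situation, a value region with value, units and scale, and the producing process) maps naturally onto small record types. The Python sketch below encodes Table 2-10 that way and, as an illustration of why explicit units matter, compares it with the same intake recorded per week; `sct:PerWeek` and the conversion table are assumptions, not codes from the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValueRegion:
    value: float                 # O4 'has value'
    units: Optional[str] = None  # O5 'has units'
    scale: Optional[str] = None  # O6 'has scale'


@dataclass(frozen=True)
class ObservationResult:
    quality: str           # O1 'describes quality'
    situation: str         # O2 'is quality of' (bearer of the quality)
    observed: ValueRegion  # O3 'has observed value'
    process: str           # O7 'results from process'


# Hypothetical unit codes: number of days covered by one unit of time.
DAYS_PER_UNIT = {"sct:PerDay": 1, "sct:PerWeek": 7}


def daily_intake(result: ObservationResult) -> float:
    """Express a mass-intake observation as an amount per day."""
    return result.observed.value / DAYS_PER_UNIT[result.observed.units]


# Table 2-10: "10 cigarettes / day tobacco smoking assessment".
table_2_10 = ObservationResult("shn:MassIntake", "sct:CigaretteTobaccoSmokingSituation",
                               ValueRegion(10, "sct:PerDay"), "shn:Assessment")
# The same intake recorded weekly, as the "/wk" units of Table 3-1 allow.
weekly = ObservationResult("shn:MassIntake", "sct:CigaretteTobaccoSmokingSituation",
                           ValueRegion(70, "sct:PerWeek"), "shn:Assessment")
print(daily_intake(table_2_10), daily_intake(weekly))  # 10.0 10.0
```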
Top-level pattern CP_PT (cf. Table 2-11):
The Clinical Process pattern (CP_PT) is used to describe a clinical process. Unlike the previous
patterns, it does not describe information about some related clinical issue but the clinical process
itself. It is used through composition by the previous patterns.
It consists of the following SPO triples:
- P1: The participants of the process, which play some role as agent (a participant that is active
in the process) or as patient (a participant that is passive in the process). An example of an active
process participant is the clinician in a diagnostic process, whereas passive process participants
could be the patient, in some cases, or the devices involved. This predicate can be further
specialised by the predicates ‘has information subject’ and ‘has information provider’. The
first indicates the person or artefact that is the subject of the information (e.g. patient, patient
relative, etc.). The second one indicates the provider of the information (e.g. patient,
clinician, etc.). Again, we have to distinguish between the meta-model (template) and the
model (OWL ontology). For instance, the predicate ‘has participant’ has a minimal cardinality
of zero. This does not mean that processes have no participants; it only means that at this
place the OWL ontology does not require any explicit axiom on process participants.
- P2: The date/time in which the process takes place. It can be provided as a time interval or as
single date/time points
- P3: Information about where the process occurs (e.g. hospital).
#N Subject Predicate Cardinality Object
P1 btl:Process ‘has participant’ 0..* btl:MaterialObject
P2 btl:Process ‘occurs at’ 0..1 btl:TemporalRegion
P3 btl:Process ‘happens at’ 0..* btl:MaterialObject or btl:InmaterialObject
Table 2-11 Semantic pattern CP_PT
Examples of application of this top-level pattern are a history taking procedure, assessment, observa-
tion, etc. The following table shows its application for representing a history taking procedure carried
out in an outpatient clinic.
#N Subject Predicate Object
P1 sct:HistoryTaking ‘has information provider’ shn:Clinician
P1 sct:HistoryTaking ‘has information subject’ shn:Patient
P3 sct:HistoryTaking ‘happens at’ sct:HospitalBasedOutpatientDepartment
Table 2-12 History taking example based on CP_PT
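Because ‘has information provider’ and ‘has information subject’ specialise ‘has participant’, a query for the participants of a process should also return the fillers of those sub-properties. The Python sketch below illustrates this with a hypothetical sub-property table and the triples of Table 2-12; an OWL reasoner would derive the same result from sub-property axioms.

```python
# Hypothetical sub-property links (child -> parent), following the P1 description:
# both information roles are kinds of process participation.
SUPER_PROPERTY = {
    "has information provider": "has participant",
    "has information subject": "has participant",
}


def specialises(prop, target):
    """True if prop is target or one of its (transitive) sub-properties."""
    while prop is not None:
        if prop == target:
            return True
        prop = SUPER_PROPERTY.get(prop)
    return False


def fillers(triples, subject, prop):
    """Objects linked to subject through prop or any sub-property of it."""
    return {o for s, p, o in triples if s == subject and specialises(p, prop)}


# Table 2-12: history taking carried out in an outpatient clinic.
table_2_12 = [
    ("sct:HistoryTaking", "has information provider", "shn:Clinician"),
    ("sct:HistoryTaking", "has information subject", "shn:Patient"),
    ("sct:HistoryTaking", "happens at", "sct:HospitalBasedOutpatientDepartment"),
]
print(sorted(fillers(table_2_12, "sct:HistoryTaking", "has participant")))
# ['shn:Clinician', 'shn:Patient']
```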
Top-level pattern CS_PT (cf. Table 2-13)
The Clinical Situation pattern (CS_PT) is defined as a specialisation of CP_PT and can be used to de-
scribe a patient clinical situation. It is used also as compositional pattern. It consists of the following
SPO triples:
- C1: The participants of the clinical situation which play some role as agent or patient (e.g. pa-
tient)
- C2: The date/time in which the situation takes place. It can be provided as a time interval or
as single date/time points
- C3: The place where the situation occurs (e.g. hospital).
- C4: The disposition, process, or material object which precedes the clinical situation. The
predicate ‘follows’ can be further refined by ‘is caused by’. For instance, a cause of a heart
failure situation can be a myocardial infarction:
#N Subject Predicate Cardinality Object
C1 shn:ClinicalSituation ‘has participant’ 0..* btl:MaterialObject
C2 shn:ClinicalSituation ‘occurs at’ 0..1 btl:TemporalRegion
C3 shn:ClinicalSituation ‘happens at’ 0..* btl:MaterialObject or btl:InmaterialObject
C4 shn:ClinicalSituation ‘follows’ 0..* shn:ClinicalSituation (CS_PT)
Table 2-13 Semantic pattern CS_PT
Examples of application of this top-level pattern are Heart failure caused by heart attack, Diabetes
mellitus, etc. The following table shows its application for representing a Heart failure situation
caused by heart attack.
#N Subject Predicate Object
C1 sct:HeartFailure ‘has participant’ shn:HumanOrganism
C2 sct:HeartFailure ‘occurs at’ 12/01/2012
C4 sct:HeartFailure ‘follows’ sct:HeartAttack
Table 2-14 Heart failure situation example based on CS_PT
Finally, Figure 2-1 depicts the hierarchical graphical representation of the top-level patterns de-
scribed above. The clinical process and clinical situation semantic patterns can only occur as compo-
nents of other patterns since semantic patterns encode information entities. They will be used
through composition by other patterns such as I_CS_PT, I_NCS_PT, etc. The UML notation has been
used for representing specialisation and composition.
Figure 2-1 Hierarchy of top-level patterns
2.1.3 OWL DL representation
The representation of the above top-level patterns into OWL 2 DL allows the precise formalization of
the ontological framework proposed and the use of DL reasoning. DL reasoning is useful for the
achievement of two important goals.
On the one hand, it can be used for detecting equivalent clinical information from iso-semantic mod-
els. This includes the ability to compare different distributions of content between information mod-
els and ontologies/terminologies, and to test whether they are semantically equivalent. For instance,
there are two possible representations to encode a breast cancer diagnosis when using SNOMED CT:
(1) using one diagnosis information model element and the concept Breast cancer or (2) using two
information model elements for representing the disease diagnosed Cancer and the disease location
Breast structure. With an appropriate representation, a DL reasoner should discover that
both representations are semantically equivalent.
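A DL reasoner establishes this equivalence from the concept definitions. As a much simpler stand-in (not DL reasoning), the Python sketch below expands the pre-coordinated concept using a hypothetical definitions table and compares the resulting (disease, location) pairs; the class names are illustrative and are not actual SNOMED CT identifiers.

```python
from typing import Optional

# Hypothetical definitions table: a pre-coordinated diagnosis concept is
# expanded into a (disease, location) pair.
DEFINITIONS = {
    "sct:BreastCancer": ("sct:Cancer", "sct:BreastStructure"),
}


def normalise(diagnosis: str, site: Optional[str] = None) -> tuple:
    """Map one- or two-element diagnosis encodings to a (disease, location) pair."""
    if site is None and diagnosis in DEFINITIONS:
        return DEFINITIONS[diagnosis]
    return (diagnosis, site)


# (1) one element with the concept Breast cancer
one_element = normalise("sct:BreastCancer")
# (2) two elements: disease Cancer plus location Breast structure
two_elements = normalise("sct:Cancer", "sct:BreastStructure")
print(one_element == two_elements)  # True
```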
On the other hand, DL reasoning can be used to provide an advanced exploitation of clinical infor-
mation by means of semantic query possibilities such as retrieving patients who use tobacco, inde-
pendently of the form of the tobacco (e.g. cigar, pipe, etc.) and of the type of consumption (e.g. snuff
or smoking).
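A DL reasoner answers such a query by classifying the recorded situations under the queried class. The Python sketch below approximates this with a hypothetical, asserted single-parent hierarchy and a transitive walk; the class names and patient records are invented for illustration.

```python
# Hypothetical, simplified class hierarchy (child -> parent).
SUBCLASS_OF = {
    "CigaretteSmoking": "TobaccoSmoking",
    "CigarSmoking": "TobaccoSmoking",
    "PipeSmoking": "TobaccoSmoking",
    "TobaccoSmoking": "TobaccoUse",
    "SnuffUse": "TobaccoUse",
    "TobaccoUse": "SubstanceUse",
    "AlcoholDrinking": "SubstanceUse",
}


def subsumed_by(cls, ancestor):
    """True if cls equals ancestor or lies below it in the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


# Invented records: patient -> recorded situation class.
records = {"p1": "PipeSmoking", "p2": "SnuffUse", "p3": "CigaretteSmoking", "p4": "AlcoholDrinking"}


def patients_with(situation_class):
    """Patients whose recorded situation is subsumed by situation_class."""
    return {p for p, s in records.items() if subsumed_by(s, situation_class)}


print(sorted(patients_with("TobaccoUse")))  # ['p1', 'p2', 'p3']
```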
Table 2-15 and Table 2-16 depict the translation of the patterns into OWL DL, according to the
proposed ontological framework. The first table depicts the translation of the predicates into OWL DL
expressions. Following the triple-based representation of the patterns, the subject (SUBJ)
and object (OBJ) correspond to ontology classes and the predicate to an OWL DL expression. These
DL expressions use one or more object properties from our ontologies, together with different
quantifiers, as a result of the underlying ontological model.
Predicate OWL DL expression
'describes situation' SUBJ subClassOf shn:isAboutSituation only OBJ
'describes quality' SUBJ subClassOf shn:isAboutQuality only OBJ
'results from process' SUBJ subClassOf btl:isOutcomeOf some OBJ
'has attribute' SUBJ subClassOf btl:hasInformationAttribute some OBJ
'is quality of' SUBJ subClassOf btl:inheresIn some OBJ
'has observed value' SUBJ subClassOf btl:Quality and btl:projectsOnto some OBJ
'has value' SUBJ subClassOf btl:isRepresentedBy only
(shn:hasValue some OBJ)
'has units' SUBJ subClassOf btl:isRepresentedBy only
(shn:hasInformationAttribute some OBJ)
'has scale' SUBJ subClassOf btl:isRepresentedBy only
(shn:hasInformationAttribute some OBJ)
‘has participant’ SUBJ subClassOf btl:hasParticipant some OBJ
‘occurs at’ SUBJ subClassOf btl:projectsOnto some OBJ
‘happens at’ SUBJ subClassOf btl:isIncludedIn some OBJ
‘follows’ SUBJ subClassOf btl:isPrecededBy some OBJ
Table 2-15 OWL DL representation of the top-level patterns predicates
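Because each predicate maps to a fixed axiom template, the translation can be mechanised. A minimal Python sketch covering a subset of the rows of Table 2-15 (the `{s}` and `{o}` placeholders stand for SUBJ and OBJ); the dictionary and function names are hypothetical, and the output is plain Manchester-style text rather than axioms of an OWL library.

```python
# Subset of Table 2-15: predicate -> OWL DL axiom template.
PREDICATE_TO_OWL = {
    "describes situation": "{s} subClassOf shn:isAboutSituation only {o}",
    "describes quality": "{s} subClassOf shn:isAboutQuality only {o}",
    "results from process": "{s} subClassOf btl:isOutcomeOf some {o}",
    "is quality of": "{s} subClassOf btl:inheresIn some {o}",
    "has participant": "{s} subClassOf btl:hasParticipant some {o}",
    "occurs at": "{s} subClassOf btl:projectsOnto some {o}",
    "happens at": "{s} subClassOf btl:isIncludedIn some {o}",
    "follows": "{s} subClassOf btl:isPrecededBy some {o}",
}


def to_owl(subject, predicate, obj):
    """Translate one pattern triple into an OWL DL axiom string."""
    return PREDICATE_TO_OWL[predicate].format(s=subject, o=obj)


# Table 2-14: heart failure situation caused by a heart attack.
print(to_owl("sct:HeartFailure", "follows", "sct:HeartAttack"))
# sct:HeartFailure subClassOf btl:isPrecededBy some sct:HeartAttack
```

Note that the information-to-clinical-entity predicates use `only`, while the others use `some`, exactly as in the table.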
Note that the quantifiers used are different. For connecting information entities with clinical entities
we have used the universal quantifier (='only'), because we cannot assume that an instance of X
exists where there is an instance of "information on X". We are aware that there are technical difficulties
in such models, as they lead to the possibility of statements that are not about anything at all.
But to the best of our knowledge, adverse reasoning consequences are avoided as long as we use
very specific object properties (here shn:isAboutSituation instead of a generic 'represents' property).
Note that a false diagnosis is an existing statement that is not about any individual clinical situation
at all. A second possibility is to allow hypothetical entities. More satisfactory but more complicated would be
the use of a higher order logic so that the uncertainty can be correctly targeted on the statement or
belief rather than on the underlying state of the world. However, this is likely to remain beyond the
scope of easily used computational logics for the near future, so that approximations using descrip-
tion logic are required.
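The difference between the two quantifiers can be checked on a small finite interpretation. In the hypothetical Python sketch below, `report2` plays the role of a false diagnosis that is about no individual situation: it satisfies the universal restriction vacuously, whereas an existential restriction would wrongly require a heart failure individual to exist. This is model checking of one fixed interpretation, not the open-world entailment a DL reasoner computes.

```python
# Hypothetical finite interpretation: which situations each information item is about.
IS_ABOUT_SITUATION = {
    "report1": {"hf1"},  # diagnosis of an actual heart failure situation
    "report2": set(),    # false diagnosis: about no individual situation
}
HEART_FAILURE = {"hf1"}  # individuals in the class HeartFailure


def satisfies_only(x, relation, cls):
    """x is an instance of (relation only cls): every filler is in cls."""
    return all(y in cls for y in relation.get(x, set()))


def satisfies_some(x, relation, cls):
    """x is an instance of (relation some cls): at least one filler is in cls."""
    return any(y in cls for y in relation.get(x, set()))


for item in ("report1", "report2"):
    print(item,
          satisfies_only(item, IS_ABOUT_SITUATION, HEART_FAILURE),
          satisfies_some(item, IS_ABOUT_SITUATION, HEART_FAILURE))
# report1 True True
# report2 True False
```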
The second table (cf. Table 2-16) provides the result of applying the OWL DL translation of the
predicates to each pattern; each OWL DL expression can be read as the class expression equivalent
to the corresponding pattern. As can be observed, apart from the translation of the predicates,
additional changes are required for some patterns, such as those that refer to the past history of a
clinical situation (PH_CS_PT) or that include negation (I_NCS_PT, NPH_CS_PT).
Pattern OWL DL expression
I_CS_PT
shn:InformationItem and shn:isAboutSituation only shn:ClinicalSituation and btl:isOutcomeOf some btl:Process and shn:hasInformationAttribute some shn:InformationAttribute
PH_CS_PT
shn:InformationItem and shn:isAboutSituation only (btl:BiologicalLife and btl:hasPart some shn:ClinicalSituation) and btl:isOutcomeOf some btl:Process and shn:hasInformationAttribute some shn:InformationAttribute
I_NCS_PT
shn:InformationItem and shn:isAboutSituation only (shn:ClinicalSituation and not btl:hasPart some shn:ClinicalSituation) and btl:isOutcomeOf some btl:Process and shn:hasInformationAttribute some shn:InformationAttribute
NPH_CS_PT
shn:InformationItem and shn:isAboutSituation only (btl:BiologicalLife and not btl:hasPart some shn:ClinicalSituation) and btl:isOutcomeOf some btl:Process and shn:hasInformationAttribute some shn:InformationAttribute
OB_CS_PT
shn:ObservationResult
and shn:isAboutQuality only (btl:ProcessQuality
and btl:inheresIn some shn:ClinicalSituation
and btl:projectsOnto some (btl:ValueRegion
and btl:isRepresentedBy only (
shn:hasInformationAttribute some shn:MeasurementUnits
and shn:hasValue some xml:datatype
and shn:hasInformationAttribute some shn:Scale)))
CP_PT
btl:Process
and btl:hasParticipant some btl:MaterialObject
and btl:projectsOnto some btl:TemporalRegion
and btl:isIncludedIn some btl:MaterialObject
CS_PT
shn:ClinicalSituation
and btl:hasParticipant some btl:MaterialObject
and btl:projectsOnto some btl:TemporalRegion
and btl:isIncludedIn some btl:MaterialObject
and btl:isPrecededBy only shn:ClinicalSituation
Table 2-16 OWL DL representation of the top-level patterns
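The four information patterns in Table 2-16 share the same conjuncts and differ only in the filler of shn:isAboutSituation. The Python sketch below rebuilds those rows as strings by conjunction; it is a hypothetical helper for generating the expressions, not an OWL parser, and it uses the generic placeholder classes of the table.

```python
CS = "shn:ClinicalSituation"

# Filler of shn:isAboutSituation for each information pattern (Table 2-16).
FILLERS = {
    "I_CS_PT": CS,
    "PH_CS_PT": f"(btl:BiologicalLife and btl:hasPart some {CS})",
    "I_NCS_PT": f"({CS} and not btl:hasPart some {CS})",
    "NPH_CS_PT": f"(btl:BiologicalLife and not btl:hasPart some {CS})",
}

# Conjuncts shared by all four patterns.
COMMON = ("btl:isOutcomeOf some btl:Process",
          "shn:hasInformationAttribute some shn:InformationAttribute")


def conjunction(*parts: str) -> str:
    """Join class expressions with the OWL intersection keyword 'and'."""
    return " and ".join(parts)


def information_pattern(name: str) -> str:
    """Class expression for one of the four information patterns."""
    return conjunction("shn:InformationItem",
                       f"shn:isAboutSituation only {FILLERS[name]}",
                       *COMMON)


print(information_pattern("NPH_CS_PT"))
```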
Examples of how the above two tables are used to transform data modelled according to the semantic
patterns into their OWL DL representation, of how this representation can be queried, and of some of
the implications with regard to negation are given at the end of Chapter 4.
3 Extending the Heart Failure Summary (HFS) to support Public Health
In the following, we will describe the two models created for recording tobacco use and alcohol use,
respectively. For each model, we provide its graphical representation as provided by the openEHR
HFS template12 for readability purposes. This template uses a subset of openEHR archetypes, which
provide textual descriptions for each of the included data elements. Figure 3-1 depicts the main
structure of the template. Then, we will apply the semantic patterns described in Chapter 2 to the
modelling of the tobacco and alcohol use information. The patterns proposed do not cover aspects of
information presentation such as order or indentation, but only the semantics of the information
they represent.
Figure 3-1 Main structure of the OpenEHR Heart Failure Summary extended to support public health
3.1 Tobacco Use Summary
Figure 3-2 depicts the graphical representation of the Tobacco Use Summary Model. The model can
be subdivided into two sub-models, depending on how the tobacco is consumed (i.e. smoked or
snuff). Table 3-1 and Table 3-11 summarize the data elements and value restrictions for each one.
SNOMED CT terms were used when possible for encoding the value restrictions. Next, we analyse
each of the models in separate subsections.
12 http://www.openehr.org/ckm/#showTemplate_1013.26.14_2
Figure 3-2 Graphical view of the OpenEHR Tobacco Use Template
3.1.1 Smoking Tobacco Model
The model consists of a set of findings, probably obtained by asking the patient about a certain clinical
situation (e.g. Do you smoke?) or about a certain quality of the clinical situation, such as the mass
intake (e.g. How many cigarettes do you smoke per week?). In the first case the situation is defined as
the part of a patient's life in which intermittent smoking processes occur; and in the second one we
ask about the amount of cigarettes smoked (i.e. mass intake) in a certain period. The answer to the
questions can be quantitative (e.g. "I smoke 10 cigarettes per day") or qualitative (e.g. "I smoke to-
bacco daily").
In the following we will apply the top-level semantic patterns described in Section 2 to encode the
Smoking Tobacco model clinical data. Table 3-1 shows the model data elements and their value con-
straints as well as the semantic pattern applied for representing them (cf. #PatternID).
In order to decide which pattern to apply for each data element / value constraint combination, the
meaning of the information that we aim to represent has to be analysed. By formulating a set of
questions (ideally posed by a modelling tool), the correct pattern can be chosen.
We have stated that the model consists of a set of findings about patient clinical situations and about
their qualities. At first glance, for representing some information about a clinical situation the top-
level pattern that best fits is the “information about clinical situation” pattern (I_CS_PT). However, by
further analysing the meaning of the clinical situation terms (e.g. smoker, ex-smoker, never smoked
tobacco, heavy smoker, etc.) we have to consider questions such as: Do they refer to a present or past
situation? Do they indicate its presence or absence? If a term refers to a past situation (e.g.
ex-smoker), then the “past history of clinical situation” pattern (PH_CS_PT) has to be used. If, in
addition, the term denies the existence of the situation in the past (e.g. never smoked tobacco), the
“no past history of clinical situation” pattern (NPH_CS_PT) fits this purpose.
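The questions above can be read as a small decision procedure. A hypothetical Python sketch, assuming each term has already been annotated with three yes/no answers; I_NCS_PT, which the smoking model does not need, is included for completeness from its definition in Chapter 2.

```python
def choose_pattern(describes_quality: bool, past: bool, absent: bool) -> str:
    """Pick a top-level pattern from the answers to the modelling questions."""
    if describes_quality:       # e.g. amount smoked, pack years
        return "OB_CS_PT"
    if past:                    # e.g. ex-smoker, never smoked tobacco
        return "NPH_CS_PT" if absent else "PH_CS_PT"
    return "I_NCS_PT" if absent else "I_CS_PT"


# Terms from Table 3-1 with their (hypothetical) answers:
# (describes a quality?, refers to the past?, denies the situation?)
terms = {
    "Smoker (finding)": (False, False, False),
    "Ex-smoker (finding)": (False, True, False),
    "Never smoked tobacco (finding)": (False, True, True),
    "Typical smoked amount": (True, False, False),
}
for term, answers in terms.items():
    print(term, "->", choose_pattern(*answers))
```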
In order to represent qualities of certain clinical situations (e.g. amount of tobacco smoked), the ob-
servation result pattern that describes the result of an observation or assessment about some quality
of a given clinical situation (OB_CS_PT) fits this purpose.
Data Element | Value | #Pattern ID
Smoking status | 77176002 Smoker (finding); 160616005 Trying to give up smoking (finding); 8517006 Ex-smoker (finding); 266919005 Never smoked tobacco (finding) | I_CS_PT, PH_CS_PT, NPH_CS_PT
Form | 66562002 Cigarette smoking tobacco (substance) | I_CS_PT
Form | 26663004 Cigar smoking tobacco (substance) | I_CS_PT
Form | 84498003 Pipe smoking tobacco (substance) | I_CS_PT
Typical Smoked Amount | xml:double (units: /d, /wk) | OB_CS_PT
Typical Smoked Amount | 365982000 Heavy smoker (over 20 per day) | I_CS_PT
Typical Smoked Amount | 230060001 Light cigarette smoker | I_CS_PT
Typical Smoked Amount | 56578002 Moderate smoker (20 or less per day) | I_CS_PT
Pattern of Use | 449868002 Smokes tobacco daily | I_CS_PT
Pattern of Use | 230059006 Occasional cigarette smoker | I_CS_PT
Date Ceased | xml:Date | I_CS_PT
Pack Years | xml:double | OB_CS_PT
Table 3-1 Data Elements and Value Restrictions of the Smoking Tobacco Model
Before describing how the semantic patterns are applied here, it is important to comment on the
SNOMED CT concepts that have been selected for encoding the values of the model data elements.
The first important observation is that some of the names of the SNOMED CT terms are misleading.
In order to correctly place the terms within the pattern, it is important to know to which semantic
category (SNOMED CT hierarchy axis) they belong (e.g. Finding, Procedure, Qualifier Value, etc.).
As an example, the term Smoker does not refer to a person but to a smoking situation, because it is
placed in the SNOMED CT Clinical finding hierarchy. An immediate conclusion is that it is important to
consider the underlying model of meaning (i.e. the SNOMED CT concept model) in order to use the
concept consistently within the EHR and with regards to the information model and thus to use the
right pattern for its modelling. For SNOMED CT, a desideratum would be to revise preferred terms
and fully specified names in order to prevent misleading interpretations.
An additional complication might exist in cases when the model of meaning is incomplete or faulty.
As an example, we find two terms in SNOMED CT for encoding the tobacco form: cigarette smoking
tobacco and cigarette smoker, which are in the substance and finding hierarchies, respectively.
There is a risk that these terms are used as binding values within clinical models without considering
their semantic types: the data element may refer to the finding, whereas the value is in fact a
substance. Apart from the complication derived from using these terms interchangeably without
taking into consideration their intended meaning, the SNOMED CT concept model does not provide
any relationship between them. As a consequence, models that use them differently will not be
semantically interoperable. However, this can be solved if both of them are semantically related, as we
demonstrate here:
sct:CigaretteTobaccoSmokingSituation subClassOf btl:hasParticipant
some sct:CigaretteTobaccoSmokeSubstance
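Once the substance and the finding are linked by the axiom above, values bound to either hierarchy can be mapped to a common situation class. A minimal Python sketch of that lookup, with the btl:hasParticipant links stored as a dictionary; the cigar and pipe class names follow the re-naming suggestion in Table 3-2 and are assumptions, not SNOMED CT identifiers.

```python
# btl:hasParticipant links between smoking situations (findings) and substances.
HAS_PARTICIPANT = {
    "sct:CigaretteTobaccoSmokingSituation": "sct:CigaretteTobaccoSmokeSubstance",
    "sct:CigarTobaccoSmokingSituation": "sct:CigarTobaccoSmokeSubstance",
    "sct:PipeTobaccoSmokingSituation": "sct:PipeTobaccoSmokeSubstance",
}


def situation_for(code: str) -> str:
    """Return the smoking situation for a code from either hierarchy."""
    if code in HAS_PARTICIPANT:  # already a finding (situation)
        return code
    for situation, substance in HAS_PARTICIPANT.items():
        if substance == code:    # a substance: follow the link backwards
            return situation
    raise KeyError(f"no smoking situation known for {code}")


# A model binding the substance and one binding the finding now agree.
print(situation_for("sct:CigaretteTobaccoSmokeSubstance")
      == situation_for("sct:CigaretteTobaccoSmokingSituation"))  # True
```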
For clarity, the left column of Table 3-2 depicts the code and fully specified name (FSN) of the SNOMED
CT concepts we use in the upcoming examples; the right column shows the names we use to refer to
them in order to facilitate the reader's understanding. The re-naming reflects the semantic category of
the concepts.
SNOMED CT code & FSN | Renaming suggestion
77176002 Smoker (finding) | Tobacco smoking situation
160616005 Trying to give up smoking (finding) | Effort of quitting tobacco smoking situation
8517006 Ex-smoker (finding) | Past history tobacco smoking situation
266919005 Never smoked tobacco (finding) | No past history tobacco smoking situation
66562002 cigarette smoking tobacco (substance) | Cigarette tobacco smoke substance
26663004 cigar smoking tobacco (substance) | Cigar tobacco smoke substance
84498003 pipe smoking tobacco (substance) | Pipe tobacco smoke substance
65568007 cigarette smoker (finding) | Cigarette tobacco smoking situation
59978006 cigar smoker (finding) | Cigar tobacco smoking situation
82302008 pipe smoker (finding) | Pipe tobacco smoking situation
365982000 heavy smoker (over 20 per day) | Heavy tobacco smoking situation
230060001 light cigarette smoker | Light cigarette tobacco smoking situation
56578002 moderate smoker (20 or less per day) | Moderate cigarette tobacco smoking situation
449868002 smokes tobacco daily | Daily tobacco smoking situation
230059006 occasional cigarette smoker | Occasional cigarette tobacco smoking situation
Table 3-2 Re-naming of the SNOMED CT concepts used in the smoking tobacco model
Figure 3-3 shows the original SNOMED CT hierarchy of these concepts and how it would look after remodelling them to address the two issues described above: misleading term names and an incomplete or faulty underlying model of meaning.
Figure 3-3 Re-modelling of the original Smoking Tobacco hierarchy
In the renaming suggestion for the SNOMED CT concepts, we have not considered qualifiers such as “over 20 per day” or “20 or less per day”, which are part of concepts such as heavy smoker and moderate smoker, respectively. On the one hand, this kind of knowledge is not formally represented in SNOMED CT and is only included in the name of a primitive concept. On the other hand, what counts as a heavy, light or moderate smoker probably cannot be defined universally. Since this kind of knowledge may vary across institutions or depend on the study purpose, we consider that it should not be included in the terminology. Instead, it could be implemented as a separate set of rules that can be shared in an interoperability scenario. For instance, if in our local implementation a heavy cigarette smoker means “>= 10 cigarettes per day”, we could represent this with the following general OWL DL axiom:
shn:ObservationResult
    and shn:isAboutQuality some (shn:MassIntake
        and btl:inheresIn some sct:CigaretteTobaccoSmokingSituation
        and btl:projectsOnto some (btl:ValueRegion
            and btl:IsRepresentedBy only (shn:hasInformationObjectAttribute some sct:PerDay
                and shn:hasValue some xsd:double[>= "10.0"^^xsd:double])))
SubClassOf (shn:InformationItem and shn:isAboutSituation only sct:HeavyCigaretteTobaccoSmokingSituation)
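Such a locally defined rule can also be expressed outside OWL, for example in an application layer that receives observation results. The following Python sketch shows one possible form. `ObservationResult` and `classify_heavy` are hypothetical names, and the dataclass is a simplified stand-in for the OWL structure above. The threshold is a parameter because, as argued above, it is institution- or study-specific.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObservationResult:
    """Simplified stand-in: a mass-intake quality of a situation, with a value and a unit."""
    situation: str  # class the shn:MassIntake quality inheres in
    value: float    # shn:hasValue
    unit: str       # e.g. "sct:PerDay"


def classify_heavy(obs: ObservationResult, threshold: float = 10.0) -> Optional[str]:
    """Local rule mirroring the axiom: cigarette smoking at >= threshold per day is 'heavy'."""
    if (obs.situation == "sct:CigaretteTobaccoSmokingSituation"
            and obs.unit == "sct:PerDay"
            and obs.value >= threshold):
        return "sct:HeavyCigaretteTobaccoSmokingSituation"
    return None  # the rule does not apply; no heavy-smoker classification is inferred


obs = ObservationResult("sct:CigaretteTobaccoSmokingSituation", 12.0, "sct:PerDay")
print(classify_heavy(obs))                  # local rule (>= 10 per day)
print(classify_heavy(obs, threshold=21.0))  # a hypothetical stricter rule elsewhere
```

Two institutions can exchange the same observation result and apply different thresholds without changing the terminology.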
With these considerations in mind, we now apply the top-level patterns introduced above to the tobacco smoking model. Applying a pattern consists of constraining it according to the cardinality and value restrictions that the semantic pattern provides.
The most frequently used pattern is I_CS_PT. We use it to represent all the data element / value combinations shown in the left column of Table 3-4. Two triples of the pattern are used, S1 and S2, which represent the clinical situation and the process performed to acquire that information, respectively. The cardinality of their predicates has been constrained to 1, and their value ranges (the object part of the triple) to a subclass of clinical finding (i.e. a clinical situation) and to an evaluation procedure, respectively, according to SNOMED CT.
We will constrain it as follows:
#N | Subject | Predicate | Cardinality | Object
S1 | shn:InformationItem | 'describes situation' | 1 | << 404684003 clinical finding (finding)
S2 | shn:InformationItem | 'results from process' | 1 | << 386053000 evaluation procedure (procedure)
S3 | shn:InformationItem | 'has attribute' | 0 | shn:InformationAttribute
Table 3-3 Semantic pattern I_CS_PT constrained
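A constrained pattern such as Table 3-3 can be read as a checkable specification. For each triple it gives the predicate, the allowed cardinality and the value range, where `<<` means "the concept itself or any descendant". The following Python sketch encodes the table as data and validates an information item against it. It is a minimal sketch under stated assumptions. The tiny `IS_A` hierarchy is a toy stand-in for SNOMED CT. `sct:ClinicalFinding` stands for 404684003 and `sct:Evaluation` for 386053000. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy is-a hierarchy (child -> parent); a real system would query SNOMED CT instead.
IS_A: Dict[str, str] = {
    "sct:TobaccoSmokingSituation": "sct:ClinicalFinding",
    "sct:CigaretteTobaccoSmokingSituation": "sct:TobaccoSmokingSituation",
}

# Table 3-3 as data: (triple id, predicate, min, max, root of the value range).
PatternRow = Tuple[str, str, int, int, str]
I_CS_PT: List[PatternRow] = [
    ("S1", "describes situation", 1, 1, "sct:ClinicalFinding"),
    ("S2", "results from process", 1, 1, "sct:Evaluation"),
    ("S3", "has attribute", 0, 0, "shn:InformationAttribute"),
]


def subsumed_by(concept: str, root: str) -> bool:
    """The '<<' operator: concept is root itself or one of its descendants."""
    current: Optional[str] = concept
    while current is not None:
        if current == root:
            return True
        current = IS_A.get(current)  # walk up; None once the top of the toy hierarchy is passed
    return False


def validate(item: Dict[str, List[str]], pattern: List[PatternRow]) -> List[str]:
    """Return the constraint violations of an information item given as predicate -> objects."""
    errors: List[str] = []
    for tid, pred, lo, hi, root in pattern:
        values = item.get(pred, [])
        if not lo <= len(values) <= hi:
            errors.append(f"{tid}: {len(values)} value(s) for '{pred}', expected {lo}..{hi}")
        errors.extend(f"{tid}: {v} is not << {root}" for v in values if not subsumed_by(v, root))
    return errors


ok = {"describes situation": ["sct:CigaretteTobaccoSmokingSituation"],
      "results from process": ["sct:Evaluation"]}
bad = {"describes situation": ["sct:Evaluation"]}  # wrong semantic type, and S2 missing
print(validate(ok, I_CS_PT))  # []
print(validate(bad, I_CS_PT))
```

The second item is rejected for two reasons. Its S1 object is a procedure rather than a clinical finding, and it lacks the mandatory S2 triple.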
We have decided to constrain the value of the process used to acquire the information to an evaluation procedure. However, a more specific procedure concept could be used instead, such as assessment of lifestyle (procedure) or history taking (procedure). More contextual information can be added about this process, such as who provided the information, who is the subject of the information, the purpose, where it was acquired, etc. This can be done by means of the clinical process pattern (cf. CP_PT), which provides a set of triples that can be constrained for that purpose. Here, for simplicity, we limit it to the procedure term. Chapter 4 provides examples of this.
Table 3-4 shows the result of applying the I_CS_PT pattern to each of the data element / value combinations of the clinical model. Each row has three columns: (1) the data element / value combination that is the modelling focus; (2) the triple-based representation of this combination after constraining I_CS_PT; and (3) the corresponding pattern triple number.
The representation of (Date Ceased / xml:DateTime) is an example of pattern composition in which
the semantic pattern that is used to further describe a clinical situation (CS_PT) is used to describe
the end date of the tobacco smoking situation.
Data Element / Value | Triple representation | #N
Smoking Status / Smoker (finding)
    shn:InformationItem 'describes situation' sct:TobaccoSmokingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Form / cigarette smoking tobacco (substance)
    shn:InformationItem 'describes situation' sct:CigaretteTobaccoSmokingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Form / cigar smoking tobacco (substance)
    shn:InformationItem 'describes situation' sct:CigarTobaccoSmokingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Form / pipe smoking tobacco (substance)
    shn:InformationItem 'describes situation' sct:PipeTobaccoSmokingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Typical Smoked Amount / heavy smoker (over 20 per day)
    shn:InformationItem 'describes situation' sct:HeavyTobaccoSmokingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Typical Smoked Amount / light cigarette smoker
    shn:InformationItem 'describes situation' sct:LightCigaretteTobaccoSmokingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Typical Smoked Amount / moderate smoker (20 or less per day)
    shn:InformationItem 'describes situation' sct:ModerateTobaccoSmokingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Pattern of Use / smokes tobacco daily
    shn:InformationItem 'describes situation' sct:DailyTobaccoSmokingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Pattern of Use / occasional cigarette smoker
    shn:InformationItem 'describes situation' sct:OccasionalCigaretteTobaccoSmokingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Date Ceased / xml:Date
    shn:InformationItem 'describes situation' sct:TobaccoSmokingSituation | #S1
    sct:TobaccoSmokingSituation 'has end time' xml:DateTime | #C2
Table 3-4 Pattern-based representation of the Tobacco smoking model (I_CS_PT)
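Producing the rows of Table 3-4 from a clinical model is mostly a lookup followed by a fixed triple template. The following Python sketch illustrates this under stated assumptions. Only three rows are included. Class names stand in for individuals, as they do in the table. The date value is invented for illustration. `i_cs_pt` and `date_ceased` are hypothetical helpers. The second helper mirrors the Date Ceased row, which composes S1 with the CS_PT triple C2.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Excerpt of Table 3-4: (data element, value) -> renamed situation class.
VALUE_TO_SITUATION: Dict[Tuple[str, str], str] = {
    ("Smoking Status", "Smoker (finding)"): "sct:TobaccoSmokingSituation",
    ("Form", "cigarette smoking tobacco (substance)"): "sct:CigaretteTobaccoSmokingSituation",
    ("Pattern of Use", "smokes tobacco daily"): "sct:DailyTobaccoSmokingSituation",
}


def i_cs_pt(item: str, element: str, value: str) -> List[Triple]:
    """Instantiate the constrained I_CS_PT (triples S1 and S2) for one data element / value pair."""
    situation = VALUE_TO_SITUATION[(element, value)]
    return [
        (item, "describes situation", situation),          # S1
        (item, "results from process", "sct:Evaluation"),  # S2
    ]


def date_ceased(item: str, situation: str, date: str) -> List[Triple]:
    """Pattern composition: S1 plus the CS_PT triple C2, whose subject is the situation itself."""
    return [
        (item, "describes situation", situation),  # S1
        (situation, "has end time", date),         # C2
    ]


for triple in i_cs_pt("item1", "Form", "cigarette smoking tobacco (substance)"):
    print(triple)
print(date_ceased("item2", "sct:TobaccoSmokingSituation", "2013-06-01"))
```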
We have used the observation result pattern (OB_CS_PT), which describes the result of an observation or assessment, to represent the typical smoked amount as a quantitative value and to represent pack years. The latter is defined by the clinical model as follows:
As before, we constrain the OB_CS_PT pattern according to its cardinality and value constraints in order to represent both data element / value pairs shown in the left column of Table 3-6. Table 3-5 shows how the pattern is constrained:
#N | Subject | Predicate | Cardinality | Object
O1 | shn:ObservationResult | 'describes quality' | 1 | btl:ProcessQuality
O2 | btl:Quality | 'is quality of' | 1 | << 404684003 clinical finding (finding)
O3 | shn:ObservationResult | 'has observed value' | 1 | btl:ValueRegion
O4 | btl:ValueRegion | 'has value' | 1 | xml:Double
O5 | btl:ValueRegion | 'has units' | 1 | shn:MeasurementUnits
O6 | btl:ValueRegion | 'has scale' | 1 | shn:QuantitativeScale
O7 | shn:ObservationResult | 'results from process' | 1 | evaluation procedure (procedure)
Table 3-5 Semantic pattern OB_CS_PT constrained
All pattern triples constrain their predicate cardinality to one. Triple O2 constrains its value to a subclass (<<) of clinical finding; O4 is constrained to the xml double datatype; O6 to a quantitative value scale; and O7 to an evaluation procedure.
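The following Python sketch shows how the OB_CS_PT constraints translate into an instantiation routine and a cardinality check for triples O1 to O5, the triples listed in Table 3-6. It is an illustration under stated assumptions, not the project's tooling. O6 and O7 are left out. The node identifiers built from the result name are hypothetical, whereas the table itself uses class names as placeholders. `ob_cs_pt` and `satisfies_cardinality` are invented helper names.

```python
from typing import List, Tuple, Union

Triple = Tuple[str, str, Union[str, float]]

# Predicates of triples O1-O5 (Table 3-5); each must occur exactly once (cardinality 1).
O1_TO_O5 = ["describes quality", "is quality of", "has observed value", "has value", "has units"]


def ob_cs_pt(result: str, situation: str, value: float, unit: str) -> List[Triple]:
    """Instantiate O1-O5 of the constrained OB_CS_PT for one mass-intake observation."""
    quality = f"{result}/MassIntake"  # hypothetical node id for the shn:MassIntake quality
    region = f"{result}/ValueRegion"  # hypothetical node id for the btl:ValueRegion
    return [
        (result, "describes quality", quality),  # O1
        (quality, "is quality of", situation),   # O2: situation must be << clinical finding
        (result, "has observed value", region),  # O3
        (region, "has value", float(value)),     # O4: coerced to a double
        (region, "has units", unit),             # O5
    ]


def satisfies_cardinality(triples: List[Triple]) -> bool:
    """True if every O1-O5 predicate appears exactly once."""
    predicates = [p for _, p, _ in triples]
    return sorted(predicates) == sorted(O1_TO_O5)


amount = ob_cs_pt("obs1", "sct:CigaretteTobaccoSmokingSituation", 10, "sct:PerDay")
packs = ob_cs_pt("obs2", "sct:TobaccoSmokingSituation", 2, "sct:PackYears")
print(satisfies_cardinality(amount), satisfies_cardinality(packs))
```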
Table 3-6 shows the result of applying the OB_CS_PT pattern to each of the data element / value combinations of the clinical model.
Data Element / Value | Triple representation | #N
Typical smoked amount / 10 cigarettes / day
    shn:ObservationResult 'describes quality' shn:MassIntake | #O1
    shn:MassIntake 'is quality of' sct:CigaretteTobaccoSmokingSituation | #O2
    shn:ObservationResult 'has observed value' btl:ValueRegion | #O3
    btl:ValueRegion 'has value' 10 | #O4
    btl:ValueRegion 'has units' sct:PerDay | #O5
Pack years / 2
    shn:ObservationResult 'describes quality' shn:MassIntake | #O1
    shn:MassIntake 'is quality of' sct:TobaccoSmokingSituation | #O2
    shn:ObservationResult 'has observed value' btl:ValueRegion | #O3
    btl:ValueRegion 'has value' 2 | #O4
    btl:ValueRegion 'has units' sct:PackYears | #O5
Table 3-6 Pattern-based representation of the Tobacco smoking model (instantiated pattern OB_CS_PT)
Finally, we represent Smoking status / Ex-smoker (finding) and Smoking status / Never smoked tobacco (finding). For the former, we constrain the top-level semantic pattern for representing the past history of a clinical situation (PH_CS_PT). We limit the cardinality of the triples that describe the situation and the process used to acquire the information to one. We also restrict their value ranges to a subclass of clinical finding and to evaluation procedure, respectively (cf. Table 3-7).
#N | Subject | Predicate | Cardinality | Object
S1 | shn:InformationItem | 'describes situation' | 1..1 | << 404684003 clinical finding (finding)
S2 | shn:InformationItem | 'results from process' | 1..1 | evaluation procedure (procedure)
S3 | shn:InformationItem | 'has attribute' | 0 | shn:InformationAttribute
S4 | shn:InformationItem | 'has temporal context' | 1..1 | sct:InThePast
Table 3-7 Semantic pattern PH_CS_PT constrained for representing Smoking status / Ex-smoker (finding)
Table 3-8 shows the triple-based representation of Smoking status / Ex-smoker (finding) after applying the above pattern:
Data Element / Value | Triple representation | #N
Smoking status / ex-smoker (finding)
    shn:InformationItem 'describes situation' sct:TobaccoSmokingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
    shn:InformationItem 'has temporal context' sct:InThePast | #S4
Table 3-8 Pattern-based representation of the Tobacco smoking model (PH_CS_PT)
For representing Smoking status / Never smoked tobacco (finding) we will constrain the top-level semantic
pattern NPH_CS_PT which describes the absence of a clinical situation in the past, as follows (cf. Table 3-9):
#N | Subject | Predicate | Cardinality | Object
S1 | shn:InformationItem | 'describes situation' | 1..1 | << 404684003 clinical finding (finding)
S2 | shn:InformationItem | 'results from process' | 1..1 | evaluation procedure (procedure)
S3 | shn:InformationItem | 'has attribute' | 0 | shn:InformationAttribute
S4 | shn:InformationItem | 'has temporal context' | 1..1 | sct:InThePast
S5 | shn:InformationItem | 'has situation context' | 1..1 | sct:Absent
Table 3-9 Semantic pattern NPH_CS_PT constrained for representing Smoking status / Never smoked tobacco
Table 3-10 shows the triple-based representation after applying the above pattern:
Data Element / Value | Triple representation | #N
Smoking status / Never smoked tobacco (finding)
    shn:InformationItem 'describes situation' sct:TobaccoSmokingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
    shn:InformationItem 'has temporal context' sct:InThePast | #S4
    shn:InformationItem 'has situation context' sct:Absent | #S5
Table 3-10 Pattern-based representation of the Tobacco smoking model (NPH_CS_PT)
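Tables 3-4, 3-8 and 3-10 share the triples S1 and S2 and differ only in the presence of the context triples S4 and S5. A consumer can therefore recover which constrained pattern, and hence which smoking status, an information item expresses. The following Python sketch assumes items arrive as simple predicate-to-object dictionaries. `pattern_for` is a hypothetical name.

```python
from typing import Dict, Optional


def pattern_for(item: Dict[str, str]) -> str:
    """Identify the constrained pattern an information item instantiates from its S4/S5 triples."""
    temporal: Optional[str] = item.get("has temporal context")    # S4
    situation: Optional[str] = item.get("has situation context")  # S5
    if temporal is None and situation is None:
        return "I_CS_PT"    # current situation, e.g. Smoker (finding)
    if temporal == "sct:InThePast" and situation is None:
        return "PH_CS_PT"   # past history, e.g. Ex-smoker (finding)
    if temporal == "sct:InThePast" and situation == "sct:Absent":
        return "NPH_CS_PT"  # no past history, e.g. Never smoked tobacco (finding)
    raise ValueError(f"unexpected context combination: {temporal!r}, {situation!r}")


base = {"describes situation": "sct:TobaccoSmokingSituation", "results from process": "sct:Evaluation"}
print(pattern_for(base))
print(pattern_for({**base, "has temporal context": "sct:InThePast"}))
print(pattern_for({**base, "has temporal context": "sct:InThePast", "has situation context": "sct:Absent"}))
```

The three statuses share one situation class, and only the context triples distinguish them. This is what keeps the three SNOMED CT findings comparable once they have been renamed.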
3.1.2 Model for other forms of tobacco use (Snuff)
The Snuff Tobacco Model is very similar to the Smoke Tobacco Model; therefore, the semantic patterns used, and the way they are specialised and composed, are practically the same. The only difference is the set of SNOMED CT concepts used, which in this case refer to snuff tobacco use instead of smoked tobacco use. Table 3-11 lists the model data elements and value constraints, using SNOMED CT concepts where possible, as well as the top-level pattern applied to each data element / value combination that appears in the clinical model.
Data Element | Value | #Pattern ID
Snuff use status | 228494002 snuff user (finding) | I_CS_PT
Snuff use status | 228493008 ex-snuff user (finding) | PH_CS_PT
Snuff use status | 228492003 never used snuff (finding) | NPH_CS_PT
Typical Snuff Consumption (tins) | xml:Double /wk | OB_CS_PT
Pattern of Use | daily tobacco snuff user | I_CS_PT
Pattern of Use | occasional tobacco snuff user | I_CS_PT
Table 3-11 Data Elements and Value Restrictions of the Snuff Tobacco Model
As with the Smoke Tobacco Model, we have renamed the SNOMED CT concepts used by the model, taking into account their underlying model of meaning. We did not find SNOMED CT concepts for encoding all the model values; these values can be recognised in Table 3-11 because no codes are provided for them. Table 3-12 shows the result of the renaming process.
SNOMED CT code & FSN | Renaming suggestion
228494002 snuff user (finding) | tobacco snuff situation
228493008 ex-snuff user (finding) | past history tobacco snuff situation
228492003 never used snuff (finding) | no past history tobacco snuff situation
Table 3-12 Re-naming of the SNOMED CT concepts used in the snuff tobacco model
Since the semantic patterns have been constrained as in the tobacco smoking model, we directly show their triple-based representation after application (cf. Table 3-13).
Data Element / Value | Triple representation | #N
Snuff use status / snuff user (finding)
    shn:InformationItem 'describes situation' sct:TobaccoSnuffSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Snuff use status / ex-snuff user (finding)
    shn:InformationItem 'describes situation' sct:TobaccoSnuffSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
    shn:InformationItem 'has temporal context' sct:InThePast | #S4
Snuff use status / never used snuff (finding)
    shn:InformationItem 'describes situation' sct:TobaccoSnuffSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
    shn:InformationItem 'has temporal context' sct:InThePast | #S4
    shn:InformationItem 'has situation context' sct:Absent | #S5
Typical Snuff Amount / xml:double /week
    shn:ObservationResult 'describes quality' shn:MassIntake | #O1
    shn:MassIntake 'is quality of' sct:TinTobaccoSnuffSituation | #O2
    shn:ObservationResult 'has observed value' btl:ValueRegion | #O3
    btl:ValueRegion 'has value' 10 | #O4
    btl:ValueRegion 'has units' sct:PerWeek | #O5
Pattern of Use / daily tobacco snuff user
    shn:InformationItem 'describes situation' shn:DailyTobaccoSnuffSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Pattern of Use / occasional tobacco snuff user
    shn:InformationItem 'describes situation' shn:OccasionalTobaccoSnuffSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Table 3-13 Pattern-based representation of the Tobacco snuff model
Comparing the granularity of SNOMED CT for tobacco smoking with that for other forms of tobacco use, it is not surprising that the representation of, e.g., tobacco snuff use is more coarse-grained, as this kind of tobacco use is less relevant from a clinical and public health perspective. For instance, concepts analogous to "daily tobacco smoking" and "occasional tobacco smoking" are missing. We interpret these concepts as process patterns that indicate the regularity of a behaviour, independently of its intensity. This is therefore not the same as an observation result "per day", which represents an average value and should not describe regularity. It is not contradictory to state that someone is an occasional smoker who smokes, on average, 2-3 cigarettes per day: there may be occasions on which he smokes a whole pack, followed by a week without smoking at all.
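The distinction between regularity and average intensity can be made concrete with the example from the text. The following Python sketch is a small worked illustration, not a proposed clinical rule. It assumes a pack of 20 cigarettes and a hypothetical rule that behaviour counts as "daily" only if every recorded day has some intake. The function names are invented.

```python
from statistics import mean
from typing import List


def average_per_day(daily_counts: List[int]) -> float:
    """Observation-result view: average intensity, blind to how intake is distributed."""
    return float(mean(daily_counts))


def regularity(daily_counts: List[int]) -> str:
    """Process-pattern view (hypothetical rule): 'daily' only if every day has some intake."""
    return "daily" if all(c > 0 for c in daily_counts) else "occasional"


# Diary from the example: one 20-cigarette pack on one day, then a week without smoking.
diary = [20, 0, 0, 0, 0, 0, 0, 0]
print(average_per_day(diary), regularity(diary))  # 2.5 occasional

steady = [3] * 8  # a steady smoker with a similar average
print(average_per_day(steady), regularity(steady))  # 3.0 daily
```

Both diaries yield an average of about 2-3 cigarettes per day, but only the second describes daily behaviour. An average-per-day value therefore cannot stand in for the regularity concepts.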
It would be advantageous if SNOMED CT provided concepts like "occasional behaviour" or "daily behaviour", which could then be post-coordinated with all kinds of drug or food intake, as well as with other behaviours, e.g. physical activity. This would help represent information on, e.g., snuff use at the granularity needed, without a further proliferation of primitive concepts.
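The economy of post-coordination can be shown with a small count. With B behaviours and R regularity qualifiers, pre-coordination needs up to B × R primitive concepts, whereas post-coordination needs only B + R building blocks. The following Python sketch is an illustration only. The qualifier and behaviour names are hypothetical and are not SNOMED CT concepts. The label format is not SNOMED CT compositional grammar.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

# Hypothetical regularity qualifiers and behaviours (not SNOMED CT concepts).
REGULARITIES = ["OccasionalBehaviour", "DailyBehaviour"]
BEHAVIOURS = ["TobaccoSmoking", "TobaccoSnuffUse", "AlcoholDrinking", "PhysicalActivity"]


@dataclass(frozen=True)
class PostCoordinated:
    """A behaviour refined by a regularity qualifier, instead of a dedicated primitive concept."""
    behaviour: str
    regularity: str

    def label(self) -> str:
        return f"{self.behaviour} : regularity = {self.regularity}"  # illustrative format only


# Every combination is expressible without adding a new concept to the terminology.
expressions: List[PostCoordinated] = [PostCoordinated(b, r) for b, r in product(BEHAVIOURS, REGULARITIES)]
print(len(expressions), "expressions from", len(BEHAVIOURS) + len(REGULARITIES), "building blocks")
print(PostCoordinated("TobaccoSnuffUse", "DailyBehaviour").label())
```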
Another difficult decision is how to deal with units such as "pack years", "cigarettes per day", (snuff) "tins per day", etc. On the one hand, one could consider them as alternative units for mass intake per time unit (dimension: mass · time⁻¹), adapted to the concrete context and using rough approximations, instead of referring to exact masses such as 10 mg tar or 1 mg nicotine per cigarette. On the other hand, one could consider them as a number per time unit (dimension: time⁻¹), but then the model has to specify exactly what is being counted (e.g. a single smoking event).
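The two readings of "cigarettes per day" lead to different quantities with different dimensions. The following Python sketch makes the contrast explicit. It assumes the rough figure of 1 mg nicotine per cigarette quoted above. `Quantity`, `as_event_rate` and `as_mass_rate` are hypothetical names, and the unit strings are plain labels rather than a formal unit syntax.

```python
from dataclasses import dataclass

# Rough per-cigarette approximation quoted in the text; real products vary widely.
NICOTINE_MG_PER_CIGARETTE = 1.0


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str  # plain label, not a formal unit code


def as_event_rate(cigarettes_per_day: float) -> Quantity:
    """Count reading (dimension time^-1): what is counted must be stated explicitly."""
    return Quantity(cigarettes_per_day, "cigarette/d")


def as_mass_rate(cigarettes_per_day: float, mg_per_unit: float = NICOTINE_MG_PER_CIGARETTE) -> Quantity:
    """Mass-intake reading (dimension mass . time^-1), via a rough per-unit approximation."""
    return Quantity(cigarettes_per_day * mg_per_unit, "mg/d")


print(as_event_rate(10), as_mass_rate(10))
```

In the mass-intake reading, the per-unit factor carries the approximation. Any mapping between datasets that use different factors silently changes the numbers.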
Units like "cigarettes per day" could also be seen as equivalents, e.g. for comparing cigar or pipe smokers with cigarette smokers. In this case, these units would not describe the cardinality of repetitive processes per time unit but would serve as proxies for an estimated amount of tar and nicotine.
Just as cigarettes, cigars and pipe fillings vary in their tar and nicotine content, packages also contain different numbers of units (cigarette packs may contain 12 or 20 cigarettes). The same applies to snuff tins. When these references alone are used, large variations and systematic biases that affect comparability have to be accepted. Ontologies cannot solve this problem and therefore have to define the related concepts as primitive.
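Pack size alone is enough to introduce such a bias. The following Python sketch assumes the widespread convention that one pack-year corresponds to one pack per day for one year. Because the clinical model's own definition is not reproduced here, that convention is itself an assumption. The same smoking history is computed with the two pack sizes mentioned in the text. `pack_years` is a hypothetical helper.

```python
def pack_years(cigarettes_per_day: float, years: float, cigarettes_per_pack: int = 20) -> float:
    """Packs smoked per day multiplied by years smoked; 20 per pack is the usual convention."""
    return cigarettes_per_day / cigarettes_per_pack * years


# The same history (15 cigarettes per day for 10 years) under the two pack sizes in the text.
print(pack_years(15, 10))                          # 7.5 with 20-cigarette packs
print(pack_years(15, 10, cigarettes_per_pack=12))  # 12.5 with 12-cigarette packs
```

A dataset that silently assumes 12-cigarette packs reports two thirds more pack-years for the same exposure. That difference is a systematic bias, not random noise.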
3.2 Alcohol Use Summary
Our second risk factor under scrutiny is alcohol use. Figure 3-4 depicts the graphical representation of the Alcohol Use Summary Model, as provided by the OpenEHR template. Table 3-14 summarizes the data elements and value restrictions used in the clinical model. As before, SNOMED CT terms were used wherever possible to encode the value restrictions.
Figure 3-4 Graphical view of the OpenEHR Alcohol Use Template
Data Element | Value | #Pattern ID
Status | 219006 current drinker of alcohol (finding) | I_CS_PT
Status | 82581004 ex-drinker (finding) | PH_CS_PT
Status | 228274009 lifetime non-drinker (finding) | NPH_CS_PT
Form | 160589000 beer drinker (finding) | I_CS_PT
Form | 160591008 drinks wine (finding) | I_CS_PT
Typical Alcohol Consumption | xml:double /d | OB_CS_PT
Typical Alcohol Consumption | 160576006 moderate drinker - 3-6u/day (finding) | I_CS_PT
Typical Alcohol Consumption | 86933000 heavy drinker (finding) | I_CS_PT
Typical Alcohol Consumption | 228277002 light drinker (finding) | I_CS_PT
Pattern of Use | 228318004 regular drinker (finding) | I_CS_PT
Pattern of Use | 228310006 drinks in morning to get rid of hangover (finding) | I_CS_PT
Binge Drinking Pattern | None | I_NCS_PT
Binge Drinking Pattern | Less than once per month | I_CS_PT
Binge Drinking Pattern | Monthly | I_CS_PT
Binge Drinking Pattern | Weekly | I_CS_PT
Binge Drinking Pattern | Daily | I_CS_PT
Date commenced | xml:Date | I_CS_PT
Date Ceased | xml:Date | I_CS_PT
Standard Drink Definition | xml:double /mg | OB_CS_PT
Table 3-14 Data Elements and Value Restrictions of the Alcohol Use Model
As in the previous models, we have renamed the SNOMED CT concepts used by the model, taking into account their underlying model of meaning. We did not find SNOMED CT concepts for encoding all the model values; these values can be recognised in Table 3-14 because no codes are provided for them. Table 3-15 shows the result of the renaming process.
SNOMED CT code & FSN | Renaming suggestion
219006 current drinker of alcohol (finding) | alcohol drinking situation
82581004 ex-drinker (finding) | past history alcohol drinking situation
228274009 lifetime non-drinker (finding) | no past history alcohol drinking situation
160589000 beer drinker (finding) | beer alcohol drinking situation
160591008 drinks wine (finding) | wine alcohol drinking situation
160576006 moderate drinker - 3-6u/day (finding) | moderate alcohol drinking situation
86933000 heavy drinker (finding) | heavy alcohol drinking situation
228277002 light drinker (finding) | light alcohol drinking situation
228318004 regular drinker (finding) | regular alcohol drinking situation
228310006 drinks in morning to get rid of hangover (finding) | morning alcohol drinking situation follows alcohol drinking hangover situation
228315001 binge drinker (finding) | heavy episodic drinking situation
Table 3-15 Re-naming of the SNOMED CT concepts used in the alcohol use model
Figure 3-5 shows the original SNOMED CT hierarchy of these concepts and how it would look after remodelling them to address the two issues described above: misleading term names and an incomplete or faulty underlying model of meaning.
Figure 3-5 Re-modelling of the original Alcohol use hierarchy
Finally, Table 3-16 shows the triple-based representation after the application of the patterns to the
alcohol use clinical model.
As already discussed for tobacco use, the interoperability of data on alcohol consumption is affected by a series of issues:
- Ill-defined concepts, like "one drink", "glass", "less than once per month", etc.
- Great variety of alcohol content, even among similar products of the same kind such as beer (compare Swedish beer with Belgian beer).
- The question of whether to define references such as "glass per day" as a unit of measurement, or to count drinking events of a certain type ("glass of wine") and then refer to their number per time unit.
- Different process patterns, such as regular consumption, binge consumption, morning consumption etc. These are per se ill-defined. They could, however, be introduced into the terminology at a relatively abstract level and then post-coordinated with a broad range of different behaviours, in a similar way as recommended for the tobacco use cases.
- Like data on tobacco use, alcohol consumption data are not the results of observations but of questions asked to the patient or to relatives, and are therefore subject to considerable bias compared to observable entities in a strict sense.
- Interpretations that are the result of an assessment, such as "moderate" or "heavy" drinker, cannot be formalized.
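The effect of varying alcohol content can be quantified with the usual conversion from volume and strength to ethanol mass. The conversion uses an ethanol density of about 0.789 g/ml. The following Python sketch compares two hypothetical 330 ml beers of different strength. It also converts them into "standard drinks" under a hypothetical definition of 10 g per drink. The Standard Drink Definition element of Table 3-14 exists because this figure is set locally. The function names are invented.

```python
ETHANOL_DENSITY_G_PER_ML = 0.789  # approximate density of ethanol at room temperature


def grams_of_alcohol(volume_ml: float, abv_percent: float) -> float:
    """Ethanol mass in a drink: volume x alcohol by volume x ethanol density."""
    return volume_ml * abv_percent / 100 * ETHANOL_DENSITY_G_PER_ML


def standard_drinks(volume_ml: float, abv_percent: float, grams_per_drink: float) -> float:
    """Number of 'standard drinks' under a locally chosen definition (grams of ethanol per drink)."""
    return grams_of_alcohol(volume_ml, abv_percent) / grams_per_drink


# Two hypothetical 330 ml beers of different strength; hypothetical 10 g standard drink.
for abv in (3.5, 8.0):
    print(abv, round(grams_of_alcohol(330, abv), 1), round(standard_drinks(330, abv, 10.0), 2))
```

"One glass of beer" can thus mean more than twice as much ethanol depending on the product. Counting glasses without recording strength loses exactly the information needed for comparability.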
Data Element / Value | Triple representation | #N
Status / current drinker of alcohol (finding)
    shn:InformationItem 'describes situation' sct:AlcoholDrinkingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Status / ex-drinker (finding)
    shn:InformationItem 'describes situation' sct:AlcoholDrinkingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
    shn:InformationItem 'has temporal context' sct:InThePast | #S4
Status / lifetime non-drinker (finding)
    shn:InformationItem 'describes situation' sct:AlcoholDrinkingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
    shn:InformationItem 'has temporal context' sct:InThePast | #S4
    shn:InformationItem 'has situation context' sct:Absent | #S5
Form / beer drinker (finding)
    shn:InformationItem 'describes situation' sct:BeerAlcoholDrinkingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Form / drinks wine (finding)
    shn:InformationItem 'describes situation' sct:WineAlcoholDrinkingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Typical Alcohol Consumption / xml:double /d
    shn:ObservationResult 'describes quality' shn:MassIntake | #O1
    shn:MassIntake 'is quality of' sct:AlcoholDrinkingSituation | #O2
    shn:ObservationResult 'has observed value' btl:ValueRegion | #O3
    btl:ValueRegion 'has value' some xml:double | #O4
    btl:ValueRegion 'has units' sct:PerDay | #O5
Pattern of Use / regular drinker (finding)
    shn:InformationItem 'describes situation' sct:RegularAlcoholDrinkingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Pattern of Use / drinks in morning to get rid of hangover (finding)
    shn:InformationItem 'describes situation' sct:MorningAlcoholDrinkingSituation | #S1
    sct:MorningAlcoholDrinkingSituation 'follows' sct:AlcoholDrinkingHangoverSituation | #C4
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Binge Drinking Pattern / None
    shn:InformationItem 'describes situation' sct:HeavyEpisodicDrinkingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
    shn:InformationItem 'has attribute' sct:Absent | #S4
Binge Drinking Pattern / Less than once per month
    shn:InformationItem 'describes situation' sct:HeavyEpisodicDrinkingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Binge Drinking Pattern / Monthly
    shn:InformationItem 'describes situation' sct:HeavyEpisodicDrinkingSituation | #S1
    shn:InformationItem 'describes situation' sct:MonthlyDrinkingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Binge Drinking Pattern / Weekly
    shn:InformationItem 'describes situation' sct:HeavyEpisodicDrinkingSituation | #S1
    shn:InformationItem 'describes situation' sct:WeeklyDrinkingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Binge Drinking Pattern / Daily
    shn:InformationItem 'describes situation' sct:HeavyEpisodicDrinkingSituation | #S1
    shn:InformationItem 'describes situation' sct:DailyDrinkingSituation | #S1
    shn:InformationItem 'results from process' sct:Evaluation | #S2
Date commenced / xml:Date
    shn:InformationItem 'describes situation' sct:AlcoholDrinkingSituation | #S1
    sct:AlcoholDrinkingSituation 'has start time' xml:DateTime | #C2
Date Ceased / xml:Date
    shn:InformationItem 'describes situation' sct:AlcoholDrinkingSituation | #S1
    sct:AlcoholDrinkingSituation 'has end time' xml:DateTime | #C2
Table 3-16 Pattern-based representation of the Alcohol use model
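The Binge Drinking Pattern rows of Table 3-16 follow a regular scheme. Every value describes the heavy episodic drinking situation. The monthly, weekly and daily values add a second situation, and "None" adds the absence triple listed in the table. The following Python sketch generates these rows. `binge_triples` is a hypothetical helper, and the item identifier is illustrative.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

BINGE = "sct:HeavyEpisodicDrinkingSituation"
# Frequency values that add a second 'describes situation' triple (Table 3-16).
FREQUENCY_SITUATION: Dict[str, str] = {
    "Monthly": "sct:MonthlyDrinkingSituation",
    "Weekly": "sct:WeeklyDrinkingSituation",
    "Daily": "sct:DailyDrinkingSituation",
}


def binge_triples(item: str, value: str) -> List[Triple]:
    """Triples for one Binge Drinking Pattern value, in the order of the rows of Table 3-16."""
    triples: List[Triple] = [(item, "describes situation", BINGE)]  # S1
    if value in FREQUENCY_SITUATION:
        triples.append((item, "describes situation", FREQUENCY_SITUATION[value]))  # second S1
    elif value not in ("None", "Less than once per month"):
        raise ValueError(f"unknown binge drinking value: {value}")
    triples.append((item, "results from process", "sct:Evaluation"))  # S2
    if value == "None":
        triples.append((item, "has attribute", "sct:Absent"))  # as listed for 'None'
    return triples


for v in ("None", "Less than once per month", "Weekly"):
    print(v, binge_triples("item1", v))
```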
4 Dealing with heterogeneous tobacco use representations
4.1 Objectives
In Chapter 2 we introduced the subset of top-level semantic patterns that we then applied in Chapter 3 to model the tobacco and alcohol use information. We focused on the semantic representation of this information for the models produced as an extension of the heart failure summary. However, it is a known issue that clinical records do not provide information detailed enough to be used directly by public health systems¹³.
In Chapter 1 we mentioned that we focus on EHRs, or more exactly on a highly standardised summary (the HFS) within an EHR, as a source of information for public health purposes, although we were aware that this is not the only possible source, and likely not the primary one. However, data sources produced specifically for public health purposes would probably look very similar. This may justify our reference to the HFS for the sake of simplicity, given that SemanticHealthNet, in its current phase, targets exemplars for feasibility testing rather than ready-to-use artefacts.
Within this framework, we present an interoperability use case in which the EHR information about tobacco and alcohol use has been produced in different contexts and for different requirements. Additionally, each dataset has been recorded using a different EHR standard (i.e. OpenEHR, HL7 CDA and HL7 v3), and all of them use SNOMED CT as terminology.
Our goals here are (1) to demonstrate that data from heterogeneous models can be queried homogeneously, and (2) to show that data can be retrieved together with their context, which helps public health systems interpret them correctly.
The three models are heterogeneous in the sense that they are syntactically different and may provide different levels of detail; however, iso-semantic parts can be identified across them.
To achieve these two goals, we use semantic patterns as a bridge between the three clinical models and the proposed ontological framework. The patterns also guide the mapping between clinical models and ontologies. In this way, the information is represented semantically according to the proposed ontologies, and homogeneous queries can be performed across the different datasets.
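The bridging role of the patterns can be pictured as a set of small mappers. Each mapper turns a model-specific record into the same pattern-shaped item, and a single subsumption-based query then runs over all of them. The following Python sketch rests on several assumptions. The three source records are hypothetical, heavily simplified dictionaries standing in for OpenEHR, C-CDA and HL7 v3 XML. The field names, provider values and the `IS_A` links are assumptions. The code-to-situation mapping follows Tables 3-2 and 3-4.

```python
from typing import Dict, List, Optional

# Code -> renamed situation class (Table 3-2); the substance code is mapped as in the Form rows
# of Table 3-4. The is-a links below are assumptions of this sketch.
CODE_TO_SITUATION: Dict[str, str] = {
    "77176002": "TobaccoSmokingSituation",           # Smoker (finding)
    "66562002": "CigaretteTobaccoSmokingSituation",  # cigarette smoking tobacco (substance)
    "449868002": "DailyTobaccoSmokingSituation",     # smokes tobacco daily
}
IS_A: Dict[str, str] = {
    "CigaretteTobaccoSmokingSituation": "TobaccoSmokingSituation",
    "DailyTobaccoSmokingSituation": "TobaccoSmokingSituation",
}

# Hypothetical, heavily simplified source records (real ones would be XML documents).
openehr_rec = {"form_code": "66562002", "composer": "general practitioner"}
ccda_rec = {"smoking_status_code": "449868002", "author": "hospital physician"}
dcm_rec = {"type_of_tobacco_use": "77176002", "performer": "cessation clinic nurse"}


def to_item(source: str, code: str, provider: str) -> Dict[str, str]:
    """Common I_CS_PT-shaped item (S1 only) that keeps provenance as context."""
    return {"source": source, "describes situation": CODE_TO_SITUATION[code], "provider": provider}


# One small mapper call per model: only the field names differ.
items: List[Dict[str, str]] = [
    to_item("openEHR", openehr_rec["form_code"], openehr_rec["composer"]),
    to_item("C-CDA", ccda_rec["smoking_status_code"], ccda_rec["author"]),
    to_item("DCM", dcm_rec["type_of_tobacco_use"], dcm_rec["performer"]),
]


def subsumed_by(concept: Optional[str], root: str) -> bool:
    """True if concept is root or a descendant of it in IS_A."""
    while concept is not None:
        if concept == root:
            return True
        concept = IS_A.get(concept)
    return False


# Homogeneous query: all items about any tobacco smoking situation, returned with their context.
hits = [it for it in items if subsumed_by(it["describes situation"], "TobaccoSmokingSituation")]
for hit in hits:
    print(hit["source"], hit["describes situation"], hit["provider"])
```

The query never inspects model-specific field names. Only the mappers do, and the provenance carried in each item is what allows the results to be interpreted in context.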
4.2 Description of the models
The first model is part of the heart failure summary and is the one described in Chapter 3. It collects detailed information about tobacco consumption, obtained from different sources. It is considered an extension of the HFS that allows public health systems to investigate tobacco use in heart failure patients. It is represented according to the openEHR specification.
Each of the models described in this section could have been represented according to a different EHR standard (e.g. ISO 13606) or EHR modelling approach (SIAMM). However, we do not focus on syntactic or structural representation aspects. Instead, we focus on formalizing the meaning of the information represented by clinical models and on demonstrating that this formalization enables homogeneous querying of heterogeneous datasets.
13 http://www.j-biomed-inform.com/article/S1532-0464%2807%2900060-3/abstract
The second model is represented according to HL7 CDA and follows one of the templates defined as part of the Consolidated CDA (C-CDA)¹⁴ solution, which provides a library of reusable CDA templates. The template comprises the data elements and vocabulary requirements needed to meet the EHR Certification Criteria in support of U.S. Meaningful Use Stage 2¹⁵, and it may be extended to cover additional information requirements. Without such extensions, this CDA model is very generic and records only the patient's smoking status within the social history section of the patient record. We do not consider any extension here.
The third model is part of a Detailed Clinical Model (DCM) used to encode information related to the patient's smoking assessment and cessation process, and is represented using HL7 v3. The smoking assessment and cessation process consists of the following seven steps: (i) Advice to stop smoking; (ii) Draft smoke profile; (iii) Increase motivation; (iv) Barriers inventory; (v) Patient education and advice; (vi) Make an appointment to stop; and (vii) After care. Here we show a snapshot of the part of the model that records the patient's smoking profile, which corresponds to the second step of the cessation workflow.
Table 4-1 shows an excerpt of some data elements and terminology value requirements for the DCM
and Meaningful Use models. The OpenEHR Tobacco Use data elements and value restrictions were
described in Table 3-1 in Chapter 3.
Tobacco Detailed Use (DCM) Meaningful Use (C-CDA)
Data Element Value Data
Element Value
Number per day
365982000
finding of tobacco smoking consumption
(finding) 259032004
Quantity and units per day
Smoking status
449868002 Current every day smoker
428041000124106 Current some day smoker
8517006 Former smoker
266919005 Never smoker
428071000124103 Heavy Tobacco Smoker, etc.
type of tobacco use
77176002 smoker (finding) <<77176002 smoker (finding)
Table 4-1 Data elements and values (SNOMED CT) of an excerpt of DCM and C-CDA tobacco models
The Meaningful Use model provides only one data element for recording the tobacco smoking status. The status value is constrained to a set of SNOMED CT codes in order to meet the certification criteria in support of Meaningful Use Stage 2 (e.g. Current every day smoker).
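As an illustration of this constraint, the following Python sketch checks a recorded status code against the excerpt of the Meaningful Use value set listed in Table 4-1. The dictionary holds only the five codes shown there (the table itself ends with "etc."), and the names `MU_SMOKING_STATUS_EXCERPT` and `in_value_set_excerpt` are hypothetical; a real implementation would load the complete value set from its published source.

```python
# Excerpt of the Meaningful Use Stage 2 smoking status value set, limited to the
# SNOMED CT codes listed in Table 4-1 (the table itself ends with "etc.").
MU_SMOKING_STATUS_EXCERPT: dict[str, str] = {
    "449868002": "Current every day smoker",
    "428041000124106": "Current some day smoker",
    "8517006": "Former smoker",
    "266919005": "Never smoker",
    "428071000124103": "Heavy tobacco smoker",
}


def in_value_set_excerpt(code: str) -> bool:
    """Hypothetical helper: True if `code` is one of the excerpted status codes.

    A False result only means the code is not in this excerpt; it does not
    prove that the code is absent from the complete value set.
    """
    return code in MU_SMOKING_STATUS_EXCERPT


print(in_value_set_excerpt("8517006"))    # True: Former smoker
print(in_value_set_excerpt("365982000"))  # False: a DCM data element code, not a status
```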
For the DCM we have represented the amount of tobacco consumed per day and the type of tobacco
used. Figure 4-1 and Figure 4-2 depict an excerpt of each clinical model represented in XML.
From the above we can state that it is not possible to impose a single model representation across diverse clinical communities (e.g. public health vs. primary care vs. specialised care) and clinical practices, and that the requirements will dictate the level of information detail needed. As we can see, each of the models provides a different level of information granularity, in addition to being represented according to a different EHR standard. Table 4-2 shows a summary of the names of the tobacco data items of each model that will be used in the interoperability use case.
[14] HL7 Implementation Guide for CDA R2: IHE Health Story Consolidation, R1 (Consolidated CDA, C-CDA): http://www.hl7.org/implement/standards/ (last accessed Jan. 2014)
[15] US Meaningful Use Stage 2: http://www.cms.gov/Regulations-and-Guidance/Legislation/EHRIncentivePrograms/Downloads/Stage2_Guide_EPs_9_23_13.pdf (last accessed Jan. 2014)
Given these clinical limits, the immediate question is what degree of semantic interoperability we can offer, i.e. to what degree the above models can be made semantically interoperable. At the beginning of this chapter we stated two goals: (i) to demonstrate that data from heterogeneous models can be queried homogeneously, and (ii) to show that data can be retrieved within their context, which helps public health systems to interpret them correctly.
Assuming that we have provided the means for recording the contextual information previously described (e.g. information provider, site of care, etc.) (cf. Section 4.1), and that it can be retrieved as part of the answer to each query, the next step is to demonstrate that data rendered according to each model can be queried homogeneously, and that we get an answer even when each model provides the information at a different level of granularity.
Tobacco Use (openEHR) | Meaningful Use (C-CDA) | Tobacco Detailed Use (DCM)
Status | Status | Number per day
Form | | Type of tobacco use
Typical Smoked Amount | |
Pattern of Use | |
Date Ceased | |
Pack Years | | Pack years
Table 4-2 Summary of the data items of each of the three clinical models used for the interoperability use case
Figure 4-1 Excerpt of the DCM expressed in HL7 v3
<component typeCode="COMP">
  <!-- Smoking profile -->
  <organizer moodCode="EVN" classCode="CATEGORY">
    <templateId root="6619D143-F556-41ed-89D5-B74EAF1149C0" />
    <code code="365981007" displayName="finding of tobacco smoking behavior (finding)"
          codeSystemName="SCT" codeSystem="2.16.840.1.113883.6.96" />
    <component typeCode="COMP">
      <!-- Start smoking -->
      <observation moodCode="EVN" classCode="OBS">
        <templateId root="EB2B71AA-4125-4413-8D64-3AB2663217F1" />
        <code code="266929003" displayName="smoking started (finding)" codeSystemName="SCT"
              codeSystem="2.16.840.1.113883.6.96" />
        <value xsi:type="TS" value="" />
      </observation>
    </component>
    <component typeCode="COMP">
      <!-- Number per day -->
      <observation moodCode="EVN" classCode="OBS">
        <templateId root="C16CF761-C456-44a3-A8C1-CD90D1AA3152" />
        <code code="365982000" displayName="tobacco smoking consumption - finding"
              codeSystemName="SCT" codeSystem="2.16.840.1.113883.6.96" />
        <value xsi:type="PQ" unit="" value="" />
      </observation>
    </component>
    <component typeCode="COMP">
      <!-- Pack years -->
      <observation moodCode="EVN" classCode="OBS">
        <templateId root="F3434D44-7D7A-4f24-BA22-EFA0F604CE55" />
        <code code="8664-5" displayName="Cigarettes smoked total (pack per year) - Reported"
              codeSystemName="LOINC" codeSystem="2.16.840.1.113883.6.1" />
        <value xsi:type="PQ" unit="y" value="" />
      </observation>
    </component>
    <component typeCode="COMP">
      <!-- type of tobacco use -->
      <observation moodCode="EVN" classCode="OBS">
        <templateId root="613E318E-5796-4a98-A9F1-556D61029882" />
        <code code="77176002" displayName="smoker (finding)" codeSystemName="SCT"
              codeSystem="2.16.840.1.113883.6.96" />
        <value xsi:type="CD" displayName="" code="" codeSystemName="SoortTabakGebruik"
               codeSystem="" />
      </observation>
    </component>
  </organizer>
</component>

Figure 4-2 Excerpt of the HL7 CDA clinical model
<!-- Smoking status observation -->
<section>
  <!-- Social History Section templateId -->
  <templateId root="2.16.840.1.113883.10.20.22.2.17"/>
  <code code="29762-2" codeSystem="2.16.840.1.113883.6.1"
        displayName="Social history"/>
  <title>Social History</title>
  <text>…</text>
  <entry>
    <observation classCode="OBS" moodCode="EVN">
      <templateId root="2.16.840.1.113883.10.20.22.4.78"/>
      <code code="ASSERTION" codeSystem="2.16.840.1.113883.5.4"/>
      <statusCode code="completed"/>
      <effectiveTime>
        <!-- Date the person began smoking -->
        <low …/>
      </effectiveTime>
      <value xsi:type="CD" code="…" displayName="…" />
    </observation>
  </entry>
</section>
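To show how the coded observations in such an excerpt can be read programmatically before mapping, the following Python sketch uses the standard library's `xml.etree.ElementTree` on a trimmed copy of the Figure 4-1 organizer. The snippet is hypothetical: the xsi namespace is declared (ElementTree rejects undeclared prefixes), the empty values are filled with patient Z's data (15 per day, cigar), and `/d` is assumed as the UCUM unit for "per day". The excerpt carries no default namespace; real HL7 v3 instances normally use `urn:hl7-org:v3`, in which case the tag names would have to be qualified with it.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
XSI_TYPE = f"{{{XSI_NS}}}type"  # Clark notation that ElementTree uses for xsi:type

# Trimmed, hypothetical copy of the Figure 4-1 organizer with patient Z's values.
DCM_SNIPPET = """
<organizer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           moodCode="EVN" classCode="CATEGORY">
  <component typeCode="COMP">
    <observation moodCode="EVN" classCode="OBS">
      <code code="365982000" codeSystem="2.16.840.1.113883.6.96"/>
      <value xsi:type="PQ" unit="/d" value="15"/>
    </observation>
  </component>
  <component typeCode="COMP">
    <observation moodCode="EVN" classCode="OBS">
      <code code="77176002" codeSystem="2.16.840.1.113883.6.96"/>
      <value xsi:type="CD" displayName="cigar" codeSystemName="SoortTabakGebruik"/>
    </observation>
  </component>
</organizer>
"""


def extract_observations(root: ET.Element) -> list[tuple[str, str, str]]:
    """Return (observation code, value data type, value) for each observation."""
    rows = []
    for obs in root.iter("observation"):
        code = obs.find("code")
        value = obs.find("value")
        if code is None or value is None:
            continue  # skip incomplete observations
        dtype = value.get(XSI_TYPE, "")
        # Physical quantities (PQ) carry a number; coded values (CD) a display name.
        shown = value.get("value") if dtype == "PQ" else value.get("displayName")
        rows.append((code.get("code"), dtype, shown or ""))
    return rows


for row in extract_observations(ET.fromstring(DCM_SNIPPET.strip())):
    print(row)
# ('365982000', 'PQ', '15')
# ('77176002', 'CD', 'cigar')
```

The resulting code/value pairs are exactly the input that the pattern-based mapping described in the next section starts from.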
4.3 Mapping of clinical model data to their semantic representation by using semantic patterns as bridge
To query data rendered according to the previous models homogeneously, we have to make their underlying semantics explicit. The following steps lead to the final semantic representation of the data: (i) select the semantic pattern to be applied; (ii) apply the pattern to the clinical model, satisfying the value range and cardinality constraints indicated by the pattern; (iii) instantiate the pattern with the concrete patient data; and (iv) transform the pattern-based representation of the data into OWL DL or RDF so that it can be queried.
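The four steps can be sketched as a small pipeline. The following Python code is a simplified illustration under stated assumptions, not the SemanticHealthNet tooling: a pattern is reduced to a list of triple templates with one variable slot (`?value`) plus an allowed value range, and step (iv) is approximated by serialising the instantiated triples as Turtle-like text. The class `Pattern`, its fields and the two-class value range are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical, simplified semantic pattern: triple templates plus a value range."""
    name: str
    templates: tuple[tuple[str, str, str], ...]  # "?value" marks the variable slot
    allowed_values: frozenset[str]


# (i) select the pattern: information about clinical situation, with an
# illustrative value range of two situation classes.
I_CS_PT = Pattern(
    name="I_CS_PT",
    templates=(
        ("shn:InformationItem", "'describes situation'", "?value"),
        ("shn:InformationItem", "'results from process'", "sct:Evaluation"),
    ),
    allowed_values=frozenset({"sct:TobaccoSmokingSituation",
                              "sct:CigaretteTobaccoSmokingSituation"}),
)


def apply_and_instantiate(pattern: Pattern, value: str) -> list[tuple[str, str, str]]:
    # (ii) enforce the value range constraint indicated by the pattern
    if value not in pattern.allowed_values:
        raise ValueError(f"{value} is outside the value range of {pattern.name}")
    # (iii) instantiate the templates with the concrete patient data
    return [tuple(value if t == "?value" else t for t in tpl) for tpl in pattern.templates]


def to_turtle_like(triples: list[tuple[str, str, str]]) -> str:
    # (iv) stand-in for the OWL DL / RDF transformation: one statement per line
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_turtle_like(apply_and_instantiate(I_CS_PT, "sct:CigaretteTobaccoSmokingSituation")))
```

Keeping steps (ii) and (iii) in one function reflects that a value is only accepted if it satisfies the pattern's constraints at the moment it is filled in.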
Next, we demonstrate the first three steps for an excerpt of the openEHR tobacco use archetype. The fourth step, as well as examples of queries, will be demonstrated in the subsequent section. Figure 4-4 shows the ADL representation of the archetype on the left-hand side and, on the right-hand side, the semantic patterns selected for the mapping, represented as tables. The archetype includes two data elements: smoking status (ELEMENT [at0002]) and smoking form (ELEMENT [at0028]). Three different patterns, each corresponding to a table colour, have been used in the mapping process: the information about clinical situation pattern (I_CS_PT), the no past history of clinical situation pattern (NPH_CS_PT), and the clinical process pattern (CP_PT).
The selection of the semantic pattern to be applied could be supported by keywords such as “past situation”, “absence”, “observation result”, “test result”, “assessment”, “symptom”, etc., together with the semantic category of each of the SNOMED CT terms used as values within the archetype. If the principles explained in Chapter 2 for each of the top-level patterns were implemented in an appropriate tool, that tool could assist the modeller in the selection process.
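A tool of this kind could start from a simple hint table. The following Python sketch is a hypothetical heuristic, not a SemanticHealthNet component: the keyword and category assignments are assumptions derived from the examples in this chapter (absence leading to NPH_CS_PT, an assessment result leading to OB_CS_PT, findings and substances refining a situation in I_CS_PT). It only returns candidates for the modeller to confirm, since a SNOMED CT category alone does not settle the choice, as the smoking status example below shows.

```python
# Hypothetical hint tables; illustrative assumptions, not a normative rule set.
KEYWORD_HINTS = {
    "absence": "NPH_CS_PT",            # no past history of clinical situation
    "observation result": "OB_CS_PT",  # observation result pattern
    "test result": "OB_CS_PT",
    "assessment": "OB_CS_PT",
}
CATEGORY_HINTS = {
    "finding": "I_CS_PT",    # e.g. smoker (finding)
    "substance": "I_CS_PT",  # e.g. cigarette smoking tobacco (substance) refines the situation
}


def suggest_patterns(description: str, semantic_category: str) -> list[str]:
    """Return candidate patterns for the modeller to confirm, keyword hints first."""
    text = description.lower()
    candidates = [pattern for kw, pattern in KEYWORD_HINTS.items() if kw in text]
    hint = CATEGORY_HINTS.get(semantic_category)
    if hint is not None:
        candidates.append(hint)
    # drop duplicates while keeping the order of first appearance
    return list(dict.fromkeys(candidates))


print(suggest_patterns("Typical smoked amount (assessment)", "qualifier value"))  # ['OB_CS_PT']
print(suggest_patterns("Smoking status", "finding"))                              # ['I_CS_PT']
```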
To map the smoking status data element (ELEMENT [at0002]) to a semantic pattern, its possible values (i.e. Current smoker and Never smoked) have to be considered. Current smoker, or tobacco smoking situation, corresponds to a clinical entity, whereas never smoked corresponds to an information entity that refers to the concept Tobacco smoking situation. This means that each value of this data element corresponds to a different semantic category in the ontology (clinical entity vs. information entity), and therefore the same semantic pattern cannot be applied to both of them. This distinction is important for providing semantic interoperability when different modelling possibilities coexist (cf. Figure 4-3).
We assume that modelling cases such as the one shown in this figure will always exist, since they can hardly be avoided when proprietary database schemas or clinical models adapted to particular contexts are used. Therefore, a different pattern is used for representing each value, and the mapping is done at the “value level”. The two tabular pattern representations show how the mapping is done. The cell shaded in red shows the part of the pattern that is filled in with the value of the archetype data element.
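The value-level mapping can be pictured as a lookup from each archetype value to its own pattern. In the following Python sketch, the two keys are the values named in the text; the dictionary layout and the helper `map_status` are hypothetical simplifications. Both values refer to the same situation class, but "Never smoked" is bound to the no past history pattern because it denotes an information entity about that situation rather than the situation itself.

```python
# Value-level mapping for ELEMENT [at0002] (smoking status): each archetype
# value is bound to its own pattern (hypothetical, simplified representation).
STATUS_VALUE_MAP = {
    "Current smoker": ("I_CS_PT", "sct:TobaccoSmokingSituation"),   # clinical entity
    "Never smoked": ("NPH_CS_PT", "sct:TobaccoSmokingSituation"),   # information entity about it
}


def map_status(value: str) -> dict[str, str]:
    """Return the pattern and the situation class that fills its variable part."""
    pattern, situation = STATUS_VALUE_MAP[value]
    return {"pattern": pattern, "situation": situation}


print(map_status("Current smoker"))  # {'pattern': 'I_CS_PT', 'situation': 'sct:TobaccoSmokingSituation'}
print(map_status("Never smoked"))    # {'pattern': 'NPH_CS_PT', 'situation': 'sct:TobaccoSmokingSituation'}
```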
Figure 4-3 Two modelling options for past history of tobacco smoking situation
The smoking form data element (ELEMENT [at0028]) has three possible values (cigar, cigarette, and pipe). In this case, all three belong to the same semantic category (clinical entity), and the mapping to the semantic pattern can be done at the level of the data element. The I_CS_PT pattern has been applied, and its variable part has been filled with an OR expression of the three values. When the model is instantiated, only one of them will be selected, as stated by the cardinality (1) of the triple predicate.
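By contrast with the value-level mapping, the element-level mapping uses one pattern whose variable part is a disjunction. The following Python sketch builds that OR expression as Manchester-style text and enforces the cardinality of exactly one at instantiation. The cigarette and cigar class names appear in Tables 4-3 and 4-5; `sct:PipeTobaccoSmokingSituation` and the helper names are hypothetical.

```python
# Element-level mapping for ELEMENT [at0028] (smoking form): one pattern (I_CS_PT)
# whose variable part is the OR of the allowed situation classes.
# sct:PipeTobaccoSmokingSituation is a hypothetical class name for illustration.
FORM_TO_SITUATION = {
    "cigarette": "sct:CigaretteTobaccoSmokingSituation",
    "cigar": "sct:CigarTobaccoSmokingSituation",
    "pipe": "sct:PipeTobaccoSmokingSituation",
}

# Constraint placed on the pattern, as a Manchester-style class expression.
FORM_CONSTRAINT = "(" + " or ".join(FORM_TO_SITUATION.values()) + ")"


def instantiate_form(selected: list[str]) -> str:
    """Instantiate the pattern; the triple predicate has cardinality exactly 1."""
    if len(selected) != 1:
        raise ValueError(f"expected exactly one form, got {len(selected)}")
    return FORM_TO_SITUATION[selected[0]]


print(FORM_CONSTRAINT)
print(instantiate_form(["cigarette"]))  # sct:CigaretteTobaccoSmokingSituation
```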
Finally, the clinical process pattern (CP_PT) has been used to represent the contextual information related to the process of acquiring the smoking data. The root data element (EVALUATION [at0000]) has been mapped to this pattern, which the other applied semantic patterns use by composition, as depicted by the dotted arrows. In this case, we have indicated that the process used to acquire the information was history taking; the information provider, the subject of information and the site of care have to be instantiated when the model is filled with patient clinical data.
Figure 4-4 shows an excerpt of the mapping of the archetype to a set of semantic patterns.
Figure 4-4 Mapping the OpenEHR archetype with a set of semantic patterns
Having demonstrated an example of “mapping” between a clinical model (and its instantiation) and a set of semantic patterns, we now provide the final pattern-based representation of some data exemplars from each of the three models. As clinical data exemplars we will represent:
- openEHR: Patient X has a past history of heavy cigarette smoking; he typically smoked 10 cigarettes per day and stopped in March 2010 (part of the Heart Failure Summary record, obtained by asking the patient).
- HL7 CDA: Patient Y is a heavy cigarette smoker. According to Meaningful Use, this means >= 10 per day (part of the social history record of the patient).
- DCM in HL7v3: Patient Z is a cigar smoker and typically smokes 15 cigars per day (part of the smoker's profile recorded in a smoking cessation service).
The corresponding data element / value pairs (cf. Table 3-1 and Table 4-1), expressed with SNOMED CT terms, are:
- openEHR: Smoking status / ex-smoker (finding); Form / cigarette smoking tobacco (substance); Typical smoked amount / 10 per day; Date ceased / March 2010.
- HL7 CDA: Smoking status / heavy cigarette smoker (finding).
- HL7v3 DCM: finding of tobacco smoking consumption (finding) / 15 per day; type of tobacco use / cigar smoker (finding).
As already mentioned, for effective public health use, the data have to be retrieved together with their context. The following data items specify the context:
- The process: the process used to acquire the information (e.g. history taking, observation, assessment, etc.).
- The provider of the information: the agent who provided the information. This is usually the patient or the clinician, but may be someone else, or a software application or device.
- The subject of information: the person who is the subject of the information (e.g. patient, relative, etc.).
- The incentive: the reason for acquiring the information (e.g. a screening program for a certain age band).
- The location: the service location where the information was acquired (e.g. outpatient clinic, online platform, cessation clinic, etc.).
The above list is not closed: additional contextual information might be required to support public health, such as the degree of trust or the source of evidence used. An example of recording contextual information with the clinical process pattern (CP_PT) was provided in the previous figure and in Chapter 2 (Table 2-12).
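The contextual data items listed above can be pictured as a record that travels with every retrieved data element / value pair. The following Python sketch uses hypothetical class and field names; the `extra` dictionary reflects that the list is open (degree of trust, source of evidence, and so on). The values for patient X follow the exemplar above and Table 4-8: history taking, information provided by the patient, outpatient consultation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AcquisitionContext:
    """Hypothetical container for the contextual data items listed above."""
    process: str                      # e.g. "sct:HistoryTaking"
    provider: str                     # agent who provided the information
    subject: str                      # person the information is about
    location: str                     # service location
    incentive: Optional[str] = None   # e.g. a screening program; often unrecorded
    extra: dict[str, str] = field(default_factory=dict)  # open list: trust, evidence, ...


@dataclass
class ContextualisedItem:
    """A data element / value pair that is always returned with its context."""
    element: str
    value: str
    context: AcquisitionContext


x_status = ContextualisedItem(
    element="Smoking status",
    value="ex-smoker (finding)",
    context=AcquisitionContext(
        process="sct:HistoryTaking",
        provider="Patient X",
        subject="Patient X",
        location="outpatient consultation",
    ),
)
# Provider and subject coincide: the information was self-reported.
print(x_status.context.provider == x_status.context.subject)  # True
```

Making the context a required field of `ContextualisedItem`, rather than optional metadata, mirrors the requirement that public health queries retrieve data only together with their context.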
Finally, Table 4-3, Table 4-4 and Table 4-5 show the result of applying the top-level semantic patterns from Chapter 2 to represent the openEHR, HL7 CDA and HL7v3 DCM-conforming clinical data, respectively.
The left and right columns of each table show the correspondences between the data element / value pairs of the model and the pattern triples.
In the openEHR model case, the smoking status and the form are both mapped to the I_CS_PT pattern, since the smoking status refers to a patient smoking situation and the form is part of the situation class definition, refining it. The typical amount smoked is mapped to the OB_CS_PT pattern, since it is an assessment result. The same table provides the triples obtained. Triples with minimum cardinality one are mapped to the model (e.g. shn:InformationItem 'describes situation' shn:ClinicalSituation). Value constraints have been applied to restrict the object of such a triple (here shn:ClinicalSituation) to the specific clinical situation (sct:TobaccoSmokingSituation).
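A value constraint of this kind narrows the object of a pattern triple to a subclass of its declared range. The following Python sketch illustrates the idea with a tiny hypothetical class hierarchy; in practice the subsumption test would be delegated to the ontology and a DL reasoner, and the placement of `sct:TobaccoSmokingSituation` under `shn:ClinicalSituation` is assumed here for illustration.

```python
# Tiny hypothetical class hierarchy (child -> parent); a real system would ask
# a DL reasoner whether one class is subsumed by another.
PARENT = {
    "sct:CigaretteTobaccoSmokingSituation": "sct:TobaccoSmokingSituation",
    "sct:CigarTobaccoSmokingSituation": "sct:TobaccoSmokingSituation",
    "sct:TobaccoSmokingSituation": "shn:ClinicalSituation",
}


def is_subclass_of(cls: str, ancestor: str) -> bool:
    """Walk up the (acyclic) hierarchy until `ancestor` or the root is reached."""
    while cls != ancestor:
        if cls not in PARENT:
            return False
        cls = PARENT[cls]
    return True


def specialise(triple: tuple[str, str, str], new_object: str) -> tuple[str, str, str]:
    """Narrow the object of a pattern triple, refusing anything outside its range."""
    subject, predicate, obj = triple
    if not is_subclass_of(new_object, obj):
        raise ValueError(f"{new_object} is not a subclass of {obj}")
    return (subject, predicate, new_object)


S1 = ("shn:InformationItem", "'describes situation'", "shn:ClinicalSituation")
print(specialise(S1, "sct:TobaccoSmokingSituation"))
```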
Data element / value
  #N  Triple representation
Smoking status / ex-smoker (finding)
  #S1 shn:InformationItem 'describes situation' sct:TobaccoSmokingSituation
  #S2 shn:InformationItem 'results from process' sct:Evaluation
  #S4 shn:InformationItem 'has temporal context' sct:InThePast
Form / cigarette smoking tobacco (substance)
  #S1 shn:InformationItem 'describes situation' sct:CigaretteTobaccoSmokingSituation
  #S2 shn:InformationItem 'results from process' sct:Evaluation
Typical smoked amount / 10 cigarettes per day
  #O1 shn:ObservationResult 'describes quality' shn:MassIntake
  #O2 shn:MassIntake 'is quality of' sct:CigaretteTobaccoSmokingSituation
  #O3 shn:ObservationResult 'has observed value' btl:ValueRegion
  #O4 btl:ValueRegion 'has value' 10
  #O5 btl:ValueRegion 'has units' sct:PerDay
Date ceased / xml:Date
  #S1 shn:InformationItem 'describes situation' sct:TobaccoSmokingSituation
  #C2 sct:TobaccoSmokingSituation 'has end time' "2010-03-01"
Table 4-3 OpenEHR: “Smoker, cigarette smoker, 10 cigarettes per day”; Correspondences and Pattern triples
Table 4-4 shows the result of specialising the top-level content patterns and the correspondences with regard to the HL7 CDA data. As in the openEHR case, the smoking status is mapped to the information about clinical situation pattern. The HL7 CDA model defines a heavy smoker as someone who smokes at least 10 cigarettes per day. However, this definition is particular to this HL7 implementation and might vary across institutions or depend on the purposes of a research study.
Data element / value
  #N  Triple representation
Smoking status / Heavy cigarette tobacco smoker (finding)
  #S1 shn:InformationItem 'describes situation' sct:HeavyCigaretteSmokingSituation
  #S2 shn:InformationItem 'results from process' sct:Evaluation
Table 4-4 HL7 CDA “Heavy cigarette tobacco smoker (>=10)”; Correspondences and Pattern triples
Finally, Table 4-5 shows the triple-based representation of the HL7 v3 DCM clinical data. The patterns I_CS_PT and OB_CS_PT have been used to represent the type of tobacco used and the number of cigars per day, respectively.
Data element / value
  #N  Triple representation
type of tobacco use / cigar smoker (finding)
  #S1 shn:InformationItem 'describes situation' sct:CigarTobaccoSmokingSituation
  #S2 shn:InformationItem 'results from process' sct:Evaluation
Number per day / 15 cigars per day
  #O1 shn:ObservationResult 'describes quality' shn:MassIntake
  #O2 shn:MassIntake 'is quality of' sct:CigarTobaccoSmokingSituation
  #O3 shn:ObservationResult 'has observed value' btl:ValueRegion
  #O4 btl:ValueRegion 'has value' 15
  #O5 btl:ValueRegion 'has units' sct:PerDay
Table 4-5 HL7v3 DCM “Cigar tobacco smoker and smokes 15 cigars per day”; Correspondences and Pattern triples
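Tables 4-3 and 4-5 instantiate the same OB_CS_PT pattern with different data. The following Python sketch makes that reuse explicit with a hypothetical generator function for triples #O1 to #O5; the tuple layout is a simplification of the pattern tables, not their actual serialisation.

```python
def ob_cs_pt(situation: str, amount: int, unit: str = "sct:PerDay") -> dict[str, tuple]:
    """Hypothetical generator for the five OB_CS_PT triples #O1-#O5."""
    return {
        "#O1": ("shn:ObservationResult", "'describes quality'", "shn:MassIntake"),
        "#O2": ("shn:MassIntake", "'is quality of'", situation),
        "#O3": ("shn:ObservationResult", "'has observed value'", "btl:ValueRegion"),
        "#O4": ("btl:ValueRegion", "'has value'", amount),
        "#O5": ("btl:ValueRegion", "'has units'", unit),
    }


openehr_x = ob_cs_pt("sct:CigaretteTobaccoSmokingSituation", 10)  # Table 4-3
dcm_z = ob_cs_pt("sct:CigarTobaccoSmokingSituation", 15)          # Table 4-5

# Only #O2 (the situation) and #O4 (the amount) differ between the two sources.
print(sorted(k for k in openehr_x if openehr_x[k] != dcm_z[k]))  # ['#O2', '#O4']
```

Because both sources yield structurally identical triples, a single query over `shn:MassIntake` can address openEHR and HL7 v3 data alike, which is what the next section exploits.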
4.4 Homogeneous query of data from the three clinical models: Tobacco Use (openEHR), Meaningful Use (HL7 C-CDA) and Tobacco Detailed Use (DCM, HL7v3)
Once the patient data obtained from the three heterogeneous systems have been mapped into their semantic pattern-based representation (cf. Table 4-3, Table 4-4 and Table 4-5), we can obtain their OWL DL representation by applying the correspondences provided in Table 2-15 and Table 2-16. This representation allows the data to be queried homogeneously and DL reasoning to be used.
Next, we formulate a set of queries that could be of interest to a public health system that aggregates data about patients diagnosed with heart failure in order to assess the role of tobacco smoking in the development of heart failure.
(This use case is fictitious; it only aims to demonstrate the value of the proposed approach for exploiting EHR information for public health purposes.)
Question exemplars: How many patients diagnosed with heart failure…:
(Q1) … are currently heavy smokers (>10 / day)
(Q2) … quit smoking within the last 20 years
(Q3) … never used tobacco in any of its forms (smoked, snuff, etc.)
Table 4-6 depicts the DL representations of the above questions:
The first question (Q1) should retrieve patients Y and Z. Patient Z's data state explicitly that the patient smokes “15 cigars / day”; patient Y's data state that the patient is a heavy cigarette smoker. The term is used according to Meaningful Use, meaning “>=10 /day” (here >= 10 cigarettes / day). The query does not specify the smoking form, only that the patients smoke tobacco; thus both patients smoke more than 10 / day (cigarettes or cigars), and both data instances will be retrieved together with their contextual information.
The second question (Q2) should retrieve patient X, since he has a past history of smoking and quit in March 2010 (within the last twenty years). The question asks for all past tobacco smokers, independently of the form, whose smoking end date is on or after March 1994 (within the last twenty years).
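The expected answers to Q1 and Q2 can be reproduced with a deliberately simplified Python sketch that stands in for DL reasoning. Each patient's pattern-based data are reduced to a hypothetical `SmokingRecord` (past or current situation, a guaranteed lower bound on the daily amount, an optional end date). Because patient Y's record only guarantees “at least 10 per day”, the Q1 bound is read here as the inclusive Meaningful Use threshold (>= 10); under a strictly greater bound, Y's record would not entail the query. As in both questions, the smoking form is ignored.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SmokingRecord:
    """Simplified view of one patient's pattern-based tobacco data (hypothetical)."""
    patient: str
    in_the_past: bool
    min_per_day: Optional[int]      # guaranteed lower bound on the daily amount
    end_date: Optional[date] = None


RECORDS = [
    SmokingRecord("X", in_the_past=True, min_per_day=10, end_date=date(2010, 3, 1)),
    SmokingRecord("Y", in_the_past=False, min_per_day=10),  # "heavy" = at least 10/day
    SmokingRecord("Z", in_the_past=False, min_per_day=15),
]


def q1(records: list[SmokingRecord], threshold: int = 10) -> list[str]:
    """Current smokers whose recorded amount is at least `threshold` per day."""
    return [r.patient for r in records
            if not r.in_the_past and r.min_per_day is not None and r.min_per_day >= threshold]


def q2(records: list[SmokingRecord], since: date = date(1994, 3, 1)) -> list[str]:
    """Past smokers, of any form, whose smoking ended on or after `since`."""
    return [r.patient for r in records
            if r.in_the_past and r.end_date is not None and r.end_date >= since]


print(q1(RECORDS), q2(RECORDS))  # ['Y', 'Z'] ['X']
```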
Finally, the third query (Q3) will not retrieve any patient. Two of them are current smokers, and the third smoked in the past. Moreover, if it has not been explicitly recorded that a patient never smoked, we cannot infer it (absence of information does not mean negation). Unlike traditional databases, which assume complete information (closed-world assumption), ontologies assume incomplete information (open-world assumption), and this affects how data can be queried: something that has not been explicitly negated is not considered false. This has to be taken into account when formulating queries, which constitutes a barrier for most users. However, based on our present experience, we think that query patterns built on principles similar to those of the semantic patterns described above could hide that complexity.
Q3 explicitly asks for those patients whose records state that they do not have a past history of smoking. We will only count those in the result and will not treat absence of information as a positive result. Where we know that this information can be derived from other statements and that the derivation holds for all patients, a specific data instance about absence will have to be created, probably automatically whenever certain data are checked for presence. However, the decision whether missing data about a specific patient characteristic can be interpreted as an absent condition may only be possible case by case. In that situation, we can only provide mechanisms that make such decisions easier.
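The difference between counting only explicit statements and the database-style reading can be shown with a small Python sketch. `None` stands for “nothing recorded”. The values for X, Y and Z follow the exemplars above (all three have used tobacco); patient W, who has no tobacco data at all, is hypothetical and added only to make the two readings diverge.

```python
from typing import Optional

# "Has ever used tobacco": True/False when explicitly recorded, None when nothing
# was recorded. W is a hypothetical patient without any tobacco data.
EVER_USED_TOBACCO: dict[str, Optional[bool]] = {"X": True, "Y": True, "Z": True, "W": None}


def q3_open_world(facts: dict[str, Optional[bool]]) -> list[str]:
    """Count only explicit 'no past history' statements; missing data stays unknown."""
    return [p for p, used in facts.items() if used is False]


def q3_closed_world(facts: dict[str, Optional[bool]]) -> list[str]:
    """Database-style reading: anything not recorded as true is taken as false."""
    return [p for p, used in facts.items() if used is not True]


print(q3_open_world(EVER_USED_TOBACCO))    # []
print(q3_closed_world(EVER_USED_TOBACCO))  # ['W']
```

Under the open-world reading, W would only be counted once an explicit absence instance (`False`) is created for him, which is exactly the kind of automatically generated instance discussed above.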
As can be observed in the rendering of the DL queries, they also follow the semantic patterns, in this case after their OWL DL transformation. Q1 follows the observation result semantic pattern, Q2 the past history situation pattern, and Q3 the no past history of clinical situation pattern. All three retrieve the data within their context, which is encoded as part of the evaluation procedure (cf. Table 2-12).
#Q1 shn:ObservationResult
      and shn:isAboutQuality only (shn:MassIntake
        and btl:inheresIn some sct:TobaccoSmokingSituation
        and btl:projectsOnto some (btl:ValueRegion
          and btl:isRepresentedBy only (shn:hasInformationAttribute some sct:PerDay
            and shn:hasValue some int[>10])))
#Q2 shn:InformationItem
      and btl:isOutcomeOf some sct:Evaluation
      and shn:isAboutSituation only (btl:BiologicalLife
        and btl:hasPart some (sct:TobaccoSmokingSituation
          and btl:projectsOnto some (btl:PointInTime
            and btl:isRepresentedBy some dateTime[>="1994-03-01T00:00:00Z"])))
      and shn:hasInformationAttribute some sct:InThePast
#Q3 shn:InformationItem
      and shn:isAboutSituation only (btl:BiologicalLife
        and not (btl:hasPart some sct:TobaccoSmokingSituation))
      and btl:isOutcomeOf some sct:Evaluation
      and shn:hasInformationAttribute some sct:InThePast
Table 4-6 DL Query examples
We have demonstrated that, although the data are expressed at different levels of detail, they can be accessed homogeneously thanks to the underlying model of meaning (i.e. the ontological framework).
We have formulated the above queries as DL queries to make it easier for the reader to understand what is being queried. However, query languages (QLs) based on DL are computationally expensive and therefore have limited scalability. Query languages based on RDF graphs, such as SPARQL, perform better but are agnostic with regard to OWL DL semantics and therefore generally do not support DL reasoning. However, combined solutions exist that perform better while partially exploiting DL reasoning capabilities. Besides, QLs such as SPARQL are more expressive, offering query functionality closer to that of traditional database QLs such as SQL (ordering, counting, filtering, string matching, etc.).
Table 4-7 shows the rendering of Q2 in SPARQL, using Turtle syntax and the COUNT operation to obtain the number of patients retrieved by the query (here: ?count = “2”), instead of each individual data instance as retrieved with the DL query (cf. Table 4-8).
#Q2
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
# PREFIX declarations for shn:, btl: and sct: are project-specific and omitted here
SELECT (COUNT(?s1) AS ?count) WHERE {
  ?s1 a [ a owl:Class ;
          owl:intersectionOf ( shn:InformationItem
            [ a owl:Restriction ;
              owl:onProperty btl:isOutcomeOf ;
              owl:someValuesFrom sct:Evaluation ]
            [ a owl:Restriction ;
              owl:onProperty shn:hasInformationAttribute ;
              owl:someValuesFrom sct:InThePast ]
            [ a owl:Restriction ;
              owl:onProperty shn:isAboutSituation ;
              owl:allValuesFrom [ a owl:Class ;
                owl:intersectionOf ( sct:TobaccoSmokingSituation
                  [ a owl:Restriction ;
                    owl:onProperty btl:projectsOnto ;
                    owl:someValuesFrom [ a owl:Class ;
                      owl:intersectionOf ( btl:PointInTime
                        [ a owl:Restriction ;
                          owl:onProperty btl:isRepresentedBy ;
                          owl:someValuesFrom [ a owl:Restriction ;
                            owl:onProperty shn:hasValue ;
                            owl:someValuesFrom [ a rdfs:Datatype ;
                              owl:onDatatype xsd:dateTime ;
                              owl:withRestrictions (
                                [ xsd:minInclusive "1994-03-01T00:00:00Z"^^xsd:dateTime ] ) ] ] ] ) ] ] ) ] ] ) ] .
}
Table 4-7 Q2 rendered in SPARQL, including COUNT option
Finally, Table 4-8 shows a data exemplar retrieved as an answer to Q2:
Individual: Instance_Tobacco_Smoking_Situation_Patient
  Types: shn:InformationItem
    and shn:isAboutSituation (btl:BiologicalLife
      and btl:hasPart some (sct:TobaccoSmokingSituation
        and btl:projectsOnto some (shn:EndPointInTime
          and btl:isRepresentedBy value "1994-03-01T00:00:00Z"^^dateTime)))
    and btl:isOutcomeOf value Instance_Evaluation_Process
    and shn:hasInformationAttribute some sct:InThePast
Individual: Instance_Evaluation_Process
  Types: sct:HistoryTaking
    and shn:informationProvider value Instance_Patient_X
    and shn:informationSubject value Instance_Patient_X
    and shn:siteOfCare value Instance_Outpatient_Consultation_01 …
Table 4-8 OWL DL rendering of an example data instance retrieved for Q2. There are two instances: the one that represents the tobacco smoking situation and its end point in time, and the one that provides some contextual information
5 Summary and conclusions
In this document (SemanticHealthNet deliverable 4.3), we have addressed semantic interoperability challenges that exist when EHRs are used for public-health-specific data analysis. Our objective has not been to address interoperability challenges due to the use of different EHR standards and representations. We have explained that the granularity, completeness and data quality required for public health inquiries are often not provided by EHRs. Besides, clinical information mostly has to be retrieved within its context in order to be interpreted safely: it makes a difference whether a piece of information is provided by a clinician, a patient, or a machine. In addition, data acquisition and recording can be highly biased in cases where the data are not considered of primary importance by the clinician, as is often the case with data on recreational drugs like tobacco and alcohol. Thus, all this contextual information is required in order to assess how far the data can be trusted. Smoking and alcohol use data are usually measured approximately, and data items such as pack years are often used in public health for assessing CVD risks. There are also limits to semantic interoperability when codes such as Ex-smoker are used to refer both to someone who quit last month after smoking for 30 years and to a person who quit 20 years ago after smoking for only two years. Thus, there are limits to comparability and semantic interoperability that are rooted in differences in clinical processes and in facts about the world, and that cannot be resolved at the level of information systems. At best, the degree of uncertainty/vagueness can be estimated.
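To make the Ex-smoker example concrete, the following Python sketch applies the usual pack-year definition (packs of 20 cigarettes smoked per day multiplied by the number of years smoked). Both patients are hypothetical and are assumed to have smoked 20 cigarettes per day; the point is that one and the same code can hide a fifteen-fold difference in cumulative exposure.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years: packs of 20 cigarettes smoked per day times years smoked."""
    return cigarettes_per_day / 20 * years_smoked


# Two hypothetical patients, both coded "Ex-smoker", both assumed to have
# smoked 20 cigarettes per day.
long_term = pack_years(20, 30)   # quit last month after 30 years
short_term = pack_years(20, 2)   # quit 20 years ago after 2 years
print(long_term, short_term)     # 30.0 2.0
```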
We have focused on the reuse of data from EHRs for public health purposes, assuming that at least parts of the EHRs are sufficiently standardized and quality-assured. This can be the case with summaries like the HFS. Since the objective of this deliverable is to show how the EHR can be used to satisfy public health goals, we have expanded the Heart Failure Summary (HFS) with two relevant behavioural factors, tobacco and alcohol use. Furthermore, we have described an extension of the underlying semantic architecture, which has been applied to the public health use case in order to provide the semantic representation of the tobacco and alcohol use data and to demonstrate its benefit for semantic interoperability.
The use of semantic patterns to support and guide the mapping of structured data into their semantic representation, introduced in D4.2, has been expanded and supported by more examples. The selection of the appropriate semantic pattern could be supported by keywords such as “past situation”, “absence”, “observation result”, “test result”, “assessment”, “symptom”, etc., together with the semantic category of each of the SNOMED CT concepts used within the clinical model (i.e. finding, substance, qualifier value, etc.).
We have provided evidence that, when SNOMED CT is used as the underlying domain ontology, there is a considerable risk of being misled by the textual description of a concept (e.g. bumetanide substance vs. product). For these cases, the use of a semantic pattern could also help to guide the terminology binding. We have also shown that some SNOMED CT concepts are placed under the wrong semantic category (e.g. representing information, but placed under a clinical entity category), that some SNOMED CT concepts are underspecified (e.g. no explicit relationship between substance consumption and the related substance), and that SNOMED CT is not always expressive enough for our use case (e.g. there is no concept for representing an occasional tobacco snuff user, but there is one for an occasional tobacco user).
Given the above, the use of SNOMED CT as domain terminology within the EHR requires some decisions to be made, such as (i) to create value sets to be used across medical specialties; (ii) to clarify the role of the terminology within the EHR by defining what kind of entities should be represented (information vs. clinical entities); and (iii) to add axioms to hitherto underspecified concepts. The proposed approach does not aim to forbid the use of complex terms, but to place them correctly in the ontology so that they can be used consistently within information models. This contrasts with other approaches, which are normative and disallow the use of complex terms. The SemanticHealthNet approach, instead, is descriptive rather than normative, and it is suited to mediate across heterogeneous normative approaches, as we have demonstrated.
Although not perfect, SNOMED CT is increasingly being adopted by different countries in their national healthcare systems, and it is increasingly being grounded on, and redesigned according to, formal ontological principles. SemanticHealthNet is keeping pace with SNOMED CT's progress, which, however, should not preclude the use of other medical terminologies. The fact that other terminological standards like ICD, LOINC and ICNP are in the process of being harmonized with SNOMED CT via a common ontology demonstrates an increasing concern to clarify the meaning of medical terms by formal-ontological grounding, thus following a route similar to that of SemanticHealthNet.
Given the state of the art of semantic technologies, the biggest challenge consists in finding intermediate scalable solutions that leave the door open to upcoming progress in the field. This might influence the choice between a logic-based and a non-logic-based language, as well as the type of reasoning done, if any, which may depend on the technological state of the art and on the semantic interoperability requirements. Moreover, several renderings of a rich representation might be required, each serving a different purpose, e.g. data validation vs. data query.
In the third project year, SemanticHealthNet WP 4 will open its scope to the semantic annotation of formalised clinical practice guidelines. It will also explore the relationships with existing modelling approaches (e.g. CIMI, SIAMM, etc.) and standards such as ContSys (CEN ISO/DIS 13940). The latter defines a system of concepts from an enterprise / clinical perspective, which represents both the content and the context of health care services under a process view, together with a generic clinical process model [16]. It seems to be complementary to the proposed approach: it focuses on the modelling of healthcare and clinical processes, considering their workflows, whereas in the SemanticHealthNet approach, as developed so far, the sequence of healthcare processes that end up producing the data has not been modelled in detail. Yet there is some overlap, e.g. in the use of concepts like “health condition”, “observed condition”, “observed condition value” or “health component specification”, for which we have found a parallel with the use of “clinical situation”, “observation result about a clinical situation quality” and “result of the observation of the clinical situation quality”. We will study this standard in more depth in order to identify ways of harmonization.
[16] http://www.standard.no/Global/PDF/Helse/CEN-TC251_N2013028_N13028_WG1_report_of_the_2nd_Madrid_Work.pdf