AeHIN 28 August 2014 - Innovation in Healthcare IT Standards: The Path to Big Data Interchange
DESCRIPTION
AeHIN Hour is our network's regular webinar, where we feature topics on eHealth, health information systems (HIS), and Civil Registration and Vital Statistics. This presentation was given by Dr. Luciana Tricai Cavalini, MD, PhD, and Timothy Wayne Cook, MSc.

Profa. Luciana Tricai Cavalini, MD, PhD. Luciana is a physician with a PhD in Public Health. She is a Professor at the Department of Health Information Technologies, Medical Sciences College, Rio de Janeiro State University, Brazil, and a Professor at the Department of Epidemiology and Biostatistics, Community Health Institute, Fluminense Federal University, Brazil. Luciana is also the Coordinator of the Technological Development Unit in Multilevel Healthcare Information Modeling and Coordinator of the Emergent Group in Research and Innovation on Healthcare Information Technologies. br.linkedin.com/pub/luciana-tricai-cavalini/88/8b6/533/en

Timothy Wayne Cook, MSc. Tim is an Advanced Electronics Technologist with an MSc in Health Informatics. He is the creator and core developer of the Multilevel Healthcare Information Modeling (MLHIM) specifications and Chief Technology Officer at MedWeb 3.0 (The Semantic Med Web). He also serves as International Collaborator at the National Institute of Science and Technology - Medicine Assisted by Scientific Computing, Brazil. https://www.linkedin.com/in/timothywaynecook

TRANSCRIPT
INNOVATION IN HEALTHCARE IT STANDARDS: THE PATH TO BIG DATA INTERCHANGE
LUCIANA TRICAI CAVALINI, MD, PHD
TIMOTHY WAYNE COOK, MSC
BIG DATA IN HEALTHCARE: MYTHS (AND FACTS)
MYTH #1: "BIG DATA" HAS A UNIVERSALLY ACCEPTED, CLEAR DEFINITION
MYTH #2: BIG DATA IS NEW
Collecting, processing and analyzing huge amounts of data is not a new human activity. Example: medieval monks and their concordances (indexes listing every word in the Bible together with the passages where it occurs)
What is new is the volume of data and the speed at which it can be processed and analyzed
MYTH #3: BIGGER DATA IS BETTER
In biomedical science, this is partially true: the bigger the sample size, the more precise the estimates
However, large samples of poor-quality data are dangerously misleading
In healthcare, precision and reliability are equally important
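The sample-size argument can be stated with two standard results. Assume, for illustration only, n independent observations recorded as x_i = μ + b + e_i, where b is a systematic error caused by poor data quality and e_i is random noise with variance σ² (these symbols are introduced here, not taken from the slides).

```latex
% Assumes n independent observations x_i = \mu + b + e_i,
% with systematic (data-quality) error b and Var(e_i) = \sigma^2
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
\qquad
\mathrm{MSE}(\bar{x})
  = \mathbb{E}\big[(\bar{x} - \mu)^{2}\big]
  = \underbrace{b^{2}}_{\text{bias}^{2}}
  + \underbrace{\frac{\sigma^{2}}{n}}_{\text{variance}}
  \;\xrightarrow{\;n \to \infty\;}\; b^{2}
```

The variance term shrinks as n grows, but the bias term b² does not: a very large dataset with systematic recording errors converges, with great apparent precision, on the wrong value.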
MYTH #4: BIG DATA MEANS BIG MARKETING
There is no evidence that analyzing Big Data increases the number of customers, and the number of customers has little relevance in healthcare, especially in universal healthcare systems
Big Data is useful when it helps actionable insights emerge (e.g., a previously unknown relationship between a gene and a disease)
HOW TO GET RELIABLE BIG DATA?
TRADITIONAL STANDARDS VS. INNOVATION
THE TRADITIONAL HEALTHCARE IT STANDARDS
HL7, openEHR, ISO 13606
Primary focus on message exchange among EMRs
All of them predate the emergence of Big Data and the Semantic Web
Top-down data modeling approach: not prepared to deal with the 3 Vs of Big Data (volume, velocity and variety)
SNOMED-CT, LOINC, ICD
Controlled vocabularies
Also predate Big Data and the Semantic Web
Main focus on pre-coordination (a top-down approach)
In other words: the traditional healthcare IT standards are not prepared to deal with Big Data
A DEVELOPMENT REGARDING OPENEHR
The current version of the Archetype Definition Language (ADL) is 1.4
It requires an archetype to be the maximal data set for a given concept
Strictly applied, this means there can be only one archetype for each concept in the whole world
In practice, several archetypes are being developed in isolation, without being submitted to the proper governance tool (the Clinical Knowledge Manager, CKM)
NOW THAT BIG DATA IS BEING PRODUCED, A BIG DATA-AWARE HEALTHCARE IT STANDARD IS:
Compliant with Semantic Web technologies
Respectful of the different points of view coming from different medical schools
Welcoming to all healthcare professionals (and their concepts)
Not limited to EMR data modeling
Prepared to deal with emerging mHealth and the Internet of Things
MULTILEVEL HEALTHCARE INFORMATION MODELING (MLHIM)
AN INNOVATION IN HEALTHCARE IT STANDARDS
THE BACKGROUND - 1
The typical application design locks up semantics in the database structure and application source code
When the semantics are missing, different use cases in different scenarios often interpret seemingly similar data differently
Multilevel modeling provides a way to share the semantics of any medical (healthcare) concept between distributed, independent applications
THE BACKGROUND - 2
MLHIM builds on the core modeling concepts of openEHR to keep semantics external to applications
From openEHR, MLHIM inherited the multilevel model principles
MLHIM also uses certain conceptual principles from HL7 v3
From HL7, MLHIM inherited the XML-based implementation
THE IMPLEMENTATION
MLHIM simplifies the openEHR Reference Model
It is called a ‘minimalistic’ multilevel model
A NOTE ON ADL VS. XML
There is a loss of information when moving between an object model (the Archetype Object Model, AOM) and XML Schema
dADL is the proper instance serialization for the AOM
However, in practice, implementers are serializing openEHR/ISO 13606 data in XML
ADL VS. XML: A COMPARISON
Test suites
• ADL: the openEHR test suite includes approximately 1,600 files in total, with known independent validations of its files
• XML: the XML Schema test suite contains more than 40,000 independently validated tests
Tools
• ADL: openEHR tools are developed by one company, and there is one open source reference model
• XML: there are more than 30 XML editors, open source and proprietary, from as many companies, plus additional tools in the XML family: XSLT, XQuery, XLink and XProc
Validation
• ADL: the FOSS Java RM has not been thoroughly tested and validated
• XML: there are at least 3 widely used XML parsers/validators, open source and proprietary, from different companies and communities
Training
• ADL: the only ADL courses are from Ocean Informatics, plus a few startup courses taught by non-experts
• XML: XML is taught in all computer science courses, as well as online
Books
• ADL: there are zero books on ADL
• XML: O'Reilly has 54 books on XML; Amazon has 11,890 results for Books: "xml"
QUESTION BREAK
MODELING CLINICAL MODELS IN MLHIM
THE HEART OF HEALTHCARE IT STANDARDIZATION
CLINICAL KNOWLEDGE MODELING: FUNDAMENTALS
Modeling clinical data is a complex task
Requires deep knowledge of the specific clinical domain
Requires at least an intermediate understanding of data types
Modeling clinical data is a core activity in healthcare IT
It is the only way to produce Big Data in healthcare responsibly
Even well-designed clinical data models in conventional software are not interoperable
Multilevel model software is interoperable, and it requires thoughtful clinical knowledge modeling
CLINICAL MODELS IN MULTILEVEL MODELING
In multilevel modeling, the information ecosystem is structured in (at least) two levels:
• The Reference Model: a generic information model shared by the ecosystem
• The Domain Model: the definition of constraints on the Reference Model for each medical concept
Multilevel model            openEHR               MLHIM
Domain model                Archetype             Concept Constraint Definition (CCD)
Language                    ADL                   XML Schema 1.1
Domain models per concept   1                     n
Governance                  Top-down, consensus   Bottom-up, merit
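To make the two levels concrete, here is a minimal Python sketch assuming a single count datatype. Only the DvCount name comes from the slides; its fields, the CountConstraint class and the numeric limits are hypothetical, and real MLHIM domain models are XML Schema 1.1 documents rather than Python objects. It shows one generic Reference Model type and two independent domain-level constraints on it for the same concept, mirroring the "n domain models per concept" row of the comparison.

```python
from dataclasses import dataclass
from typing import Optional


# Reference Model level: one generic type shared by every application.
# The name echoes MLHIM's DvCount; the fields are assumptions for this sketch.
@dataclass(frozen=True)
class DvCount:
    magnitude: int
    units: str


# Domain Model level: a constraint on the generic type for one concept.
# CountConstraint is a hypothetical name; in MLHIM this role is played by a CCD.
@dataclass(frozen=True)
class CountConstraint:
    concept: str
    units: str
    min_inclusive: Optional[int] = None
    max_inclusive: Optional[int] = None

    def validate(self, value: DvCount) -> list[str]:
        """Return the constraint violations for a value (empty list if valid)."""
        errors = []
        if value.units != self.units:
            errors.append(f"{self.concept}: expected units {self.units!r}")
        if self.min_inclusive is not None and value.magnitude < self.min_inclusive:
            errors.append(f"{self.concept}: below {self.min_inclusive}")
        if self.max_inclusive is not None and value.magnitude > self.max_inclusive:
            errors.append(f"{self.concept}: above {self.max_inclusive}")
        return errors


# Two independent domain models for the same concept, both constraining the
# same Reference Model type (limits are illustrative, not clinical guidance).
adult_rr = CountConstraint("respiratory rate (adult)", "breaths/min", 4, 60)
neonatal_rr = CountConstraint("respiratory rate (neonatal)", "breaths/min", 10, 120)

reading = DvCount(90, "breaths/min")
print(adult_rr.validate(reading))     # ['respiratory rate (adult): above 60']
print(neonatal_rr.validate(reading))  # []
```

Because both constraints target the same Reference Model type, software that understands DvCount can process data from either domain model, even though the two disagree on the valid range.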
CONCEPT CONSTRAINT DEFINITION (CCD)
In MLHIM, CCDs are XML Schemas that define constraints on the Reference Model in order to model clinical concepts
CCDs can be validated against the corresponding MLHIM Reference Model by third-party applications
The CCD schema informs the application developer of the structure of a valid data instance for each concept modeled for that system
If the CCD is made public, any receiver of a data instance coming from this application can store, validate, query, etc., that data instance
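The receiver's side of this exchange can be sketched with Python's standard library. This is a simplified stand-in, not how MLHIM validation works: a real receiver would validate the instance against the published CCD with an XML Schema 1.1 validator, whereas here a hypothetical registry (PUBLISHED_CCDS) lists the element paths an instance must contain. The readable element names and the ccd-example-0001 identifier are also assumptions; real CCDs use UUID-based names.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry of published CCDs: CCD id -> paths an instance must
# contain. A real receiver would validate against the CCD's XML Schema 1.1
# with a schema-aware validator; this path check only stands in for that step.
PUBLISHED_CCDS = {
    "ccd-example-0001": [
        "entry/cluster/adapter/dvcount/magnitude",
        "entry/cluster/adapter/dvcount/units",
    ],
}


def receive(instance_xml: str) -> dict:
    """Check an incoming instance against its published CCD, then query it."""
    root = ET.fromstring(instance_xml)
    required = PUBLISHED_CCDS.get(root.tag)
    if required is None:
        raise ValueError(f"unknown CCD: {root.tag}")
    missing = [path for path in required if root.find(path) is None]
    if missing:
        raise ValueError(f"invalid instance, missing: {missing}")
    # Once checked, the instance can be queried with ordinary XML tooling.
    return {path.rsplit("/", 1)[-1]: root.findtext(path) for path in required}


INSTANCE = (
    "<ccd-example-0001><entry><cluster><adapter><dvcount>"
    "<magnitude>18</magnitude><units>breaths/min</units>"
    "</dvcount></adapter></cluster></entry></ccd-example-0001>"
)
print(receive(INSTANCE))  # {'magnitude': '18', 'units': 'breaths/min'}
```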
CCD HIGH-LEVEL STRUCTURE
CCD
  CareEntry, DemographicEntry or AdminEntry
    Cluster
      DvAdapter (or a nested Cluster)
        DataType
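The nesting above can be illustrated by building a toy instance with Python's ElementTree. The tag names simply mirror the outline and the DvCount fields are assumptions; actual MLHIM instances use UUID-based element names defined by the CCD, so this shows only the containment order, not a valid MLHIM document.

```python
import xml.etree.ElementTree as ET


def build_instance(magnitude: int, units: str) -> ET.Element:
    """Nest the levels from the outline: CCD > Entry > Cluster > DvAdapter > datatype."""
    ccd = ET.Element("CCD")
    entry = ET.SubElement(ccd, "CareEntry")        # or DemographicEntry / AdminEntry
    cluster = ET.SubElement(entry, "Cluster")      # groups related items
    adapter = ET.SubElement(cluster, "DvAdapter")  # in this sketch, wraps one datatype
    count = ET.SubElement(adapter, "DvCount")      # the actual datatype
    ET.SubElement(count, "magnitude").text = str(magnitude)
    ET.SubElement(count, "units").text = units
    return ccd


root = build_instance(18, "breaths/min")
print(ET.tostring(root, encoding="unicode"))
```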
MLHIM DATATYPES FOR CCDS
Ordered
  Quantified: DvCount, DvQuantity, DvRatio
  DvOrdinal, DvTemporal
Unordered
  DvString (with or without enumeration)
  DvCodedString, DvMedia, DvParsable
DvInterval
ReferenceRange
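Intervals and reference ranges only make sense for ordered datatypes, because they depend on comparing values. The following Python sketch assumes simplified versions of DvInterval, ReferenceRange and DvQuantity; the field names and the potassium-like ranges are illustrative assumptions, not MLHIM's exact attributes or clinical guidance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DvInterval:
    """Interval over an ordered type; None means the bound is open-ended."""
    lower: Optional[float]
    upper: Optional[float]
    lower_included: bool = True
    upper_included: bool = True

    def contains(self, x: float) -> bool:
        if self.lower is not None:
            if x < self.lower or (x == self.lower and not self.lower_included):
                return False
        if self.upper is not None:
            if x > self.upper or (x == self.upper and not self.upper_included):
                return False
        return True


@dataclass(frozen=True)
class ReferenceRange:
    definition: str        # e.g. "normal" or "high"
    interval: DvInterval


@dataclass(frozen=True)
class DvQuantity:
    magnitude: float
    units: str
    reference_ranges: tuple[ReferenceRange, ...] = ()

    def matching_ranges(self) -> list[str]:
        """Names of the reference ranges that contain this quantity."""
        return [r.definition for r in self.reference_ranges
                if r.interval.contains(self.magnitude)]


# Illustrative ranges only, not clinical guidance.
ranges = (
    ReferenceRange("normal", DvInterval(3.5, 5.0)),
    ReferenceRange("high", DvInterval(5.0, None, lower_included=False)),
)
print(DvQuantity(5.8, "mmol/L", ranges).matching_ranges())  # ['high']
```

Excluding 5.0 from the "high" range keeps the two ranges from overlapping at their shared boundary.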
MLHIM ELEMENTS: PRINCIPLES
The element names of a CCD do not carry any semantics
Because element names are purely structural identifiers, this follows the best practices for healthcare knowledge artifact identifiers first proposed by Dr. James Cimino (circa 1988):
Characteristic #3 - Dumb Identifiers
An identifier itself should not have meaning. If an identifier is comprised of other identifiers that have been combined, then the composite identifier is inherently unstable. If the circumstances that related the composite identifiers together in the first place change, the resulting identifier must also change.
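A dumb identifier can be minted with a version 4 (random) UUID, the type the slides name for CCD element names. In this Python sketch, the "el-" prefix is a hypothetical convention: XML names cannot start with a digit and a UUID can, so some letter prefix is needed. The labels dictionary is likewise illustrative, showing meaning kept as separate metadata. A composite identifier such as "BP-SYS-SITTING" would have to change if the concept were redefined; the UUID never does.

```python
import uuid


def new_element_name(prefix: str = "el-") -> str:
    """Mint a meaningless, globally unique element name.

    A version 4 UUID is random, so nothing about the concept is encoded in it.
    XML names cannot start with a digit and a UUID can, hence the letter prefix;
    "el-" is a hypothetical convention for this sketch.
    """
    return f"{prefix}{uuid.uuid4()}"


name = new_element_name()

# Meaning lives in separate metadata keyed by the identifier, not in the name,
# so relabeling the concept never forces the identifier to change.
labels = {name: "Systolic blood pressure"}
labels[name] = "Systolic blood pressure (sitting)"
print(name, "->", labels[name])
```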
MLHIM CCDS: TECHNICAL ASPECTS
CCDs are the equivalent of an archetype in CEN 13606 and openEHR, with the following exceptions:
• They may be defined at any level, for any application use
• complexType definitions may be reused in multiple CCDs
• CCDs persist for all time and are not versioned; this is essential for data integrity across time
• All element names are unique identifiers (Type 4 UUIDs)
CCD GOVERNANCE MODEL
Artifact governance in MLHIM consists of maintaining a copy of the CCDs and Reference Models
This copy can be on the web at the specified location, or stored locally and referenced using the standard XML Catalog tools
Because of the naming conventions, changes to the MLHIM Reference Model do not affect previously defined CCDs or data
This maintains accurate semantics for all time
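The idea behind referencing local copies through an XML Catalog can be sketched in a few lines of Python: the catalog maps a published URL to a local file, and lookups fall back to the original URL. This handles only exact-match `uri` entries of the OASIS XML Catalogs format, not the full resolution rules (`system`, `rewriteURI`, delegation, and so on) that real catalog resolvers implement. The example.org URL and the local path are hypothetical.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog: maps a CCD's published URL to a locally kept copy.
CATALOG_XML = f"""<catalog xmlns="{CATALOG_NS}">
  <uri name="http://example.org/ccd/ccd-example-0001.xsd"
       uri="file:///opt/mlhim/ccd/ccd-example-0001.xsd"/>
</catalog>"""


def load_catalog(text: str) -> dict[str, str]:
    """Collect the exact-match <uri name=... uri=...> entries of a catalog."""
    root = ET.fromstring(text)
    return {e.get("name"): e.get("uri") for e in root.iter(f"{{{CATALOG_NS}}}uri")}


def resolve(url: str, catalog: dict[str, str]) -> str:
    """Return the local copy if one is registered, else the original URL."""
    return catalog.get(url, url)


catalog = load_catalog(CATALOG_XML)
print(resolve("http://example.org/ccd/ccd-example-0001.xsd", catalog))
# file:///opt/mlhim/ccd/ccd-example-0001.xsd
```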
MLHIM RESOURCES: PUTTING INNOVATION INTO PRACTICE
MLHIM REFERENCE MODEL
The release version is available at www.mlhim.org
The development version is available at www.github.com/mlhim
CCD GENERATOR (CCD-GEN)
CCD editor maintained by the MLHIM Laboratory at www.ccdgen.com
Produces CCDs according to the corresponding MLHIM Reference Model
CCDs are automatically validated
Other products include:
• A sample data instance
• A JSON serialization of the data instance
• A sample HTML form
• Modules for the R programming language to pull MLHIM data into R data frames for processing and analysis
OTHER MLHIM TOOLS
MLHIM Application Platform & Learning Environment (MAPLE)
• A MLHIM repository using an SQL DB for persistence with a browser and a REST interface
MLHIM XML Instance Converter (MXIC)
• Utility to convert MLHIM CCD XML instances to use shortuuids, and to convert them to JSON and back again to XML
• It is intended to demonstrate how mobile apps can send smaller data files over the wire to an API that accepts these formats and can convert them back to full XML instances for validation
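The two conversions MXIC performs can be sketched in Python, though this is not MXIC's actual code. First, a UUID's 128-bit value can be written in base 57, shrinking the 36-character text form to 22 characters; the alphabet here is similar in spirit to the shortuuid package, but no byte-compatibility with that package is claimed. Second, a naive XML-to-JSON mapping round-trips simple instances; it assumes unique sibling tags, no attributes and no mixed content, which real MLHIM instances may not satisfy.

```python
import json
import uuid
import xml.etree.ElementTree as ET

# 57 characters; 0, 1, I, O and l are left out because they are easily confused.
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
WIDTH = 22  # 57**22 > 2**128, so 22 characters hold any UUID


def encode_uuid(u: uuid.UUID) -> str:
    """Base-57 encode a UUID's 128-bit integer, most significant digit first."""
    n, digits = u.int, []
    while n:
        n, rem = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)).rjust(WIDTH, ALPHABET[0])


def decode_uuid(s: str) -> uuid.UUID:
    n = 0
    for ch in s:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return uuid.UUID(int=n)


def element_to_obj(el: ET.Element):
    """Leaf -> its text; branch -> dict of children (assumes unique sibling tags)."""
    children = list(el)
    if not children:
        return el.text
    return {child.tag: element_to_obj(child) for child in children}


def obj_to_element(tag: str, value) -> ET.Element:
    el = ET.Element(tag)
    if isinstance(value, dict):
        for child_tag, child_value in value.items():
            el.append(obj_to_element(child_tag, child_value))
    else:
        el.text = value
    return el


def xml_to_json(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_obj(root)})


def json_to_xml(json_text: str) -> str:
    [(tag, value)] = json.loads(json_text).items()
    return ET.tostring(obj_to_element(tag, value), encoding="unicode")
```

For example, `xml_to_json("<dvcount><magnitude>18</magnitude></dvcount>")` gives `{"dvcount": {"magnitude": "18"}}`, and `json_to_xml` turns it back into the original XML.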
Form2CCD
• Web application to build a form and create a CCD from it (work in progress)
Constraint Definition Designer (CDD)
• FOSS CCD editor (work in progress)
IN BRIEF: CONCLUSIONS AND A VIEW OF THE FUTURE
MLHIM IS BIG DATA READY
MLHIM uses standard XML technologies and embedded RDF to define the syntax and semantics
The semantics are in the CCD and can easily be exchanged or referenced via the web
Their RDF can be queried, analyzed and linked using standard tools
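As a small illustration of reading embedded RDF with standard tooling, the Python sketch below pulls (subject, predicate, object) triples out of `rdf:Description` blocks inside an `xs:appinfo` annotation. The CCD fragment, its URLs and the chosen RDFS predicates are hypothetical, and the parser handles only this simple RDF/XML pattern; real work would use a full RDF library (for example rdflib in Python) and SPARQL.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical CCD fragment with RDF embedded in an xs:appinfo annotation.
CCD_FRAGMENT = f"""<xs:schema xmlns:xs="{XS}" xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <xs:complexType name="ct-example">
    <xs:annotation>
      <xs:appinfo>
        <rdf:Description rdf:about="http://example.org/ccd/ct-example">
          <rdfs:label>Systolic blood pressure</rdfs:label>
          <rdfs:isDefinedBy rdf:resource="http://example.org/ontology/systolic-bp"/>
        </rdf:Description>
      </xs:appinfo>
    </xs:annotation>
  </xs:complexType>
</xs:schema>"""


def embedded_triples(schema_text: str) -> list[tuple[str, str, str]]:
    """Extract (subject, predicate, object) triples from rdf:Description blocks.

    Predicates are returned in ElementTree's "{namespace}local" notation.
    """
    root = ET.fromstring(schema_text)
    triples = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # A resource reference if present, otherwise a literal value.
            obj = prop.get(f"{{{RDF}}}resource", prop.text)
            triples.append((subject, prop.tag, obj))
    return triples


for triple in embedded_triples(CCD_FRAGMENT):
    print(triple)
```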
MLHIM data can be stored in SQL or NoSQL databases
Examples are on GitHub for eXist-DB (XML) and SQLite3 (which can easily be ported to PostgreSQL, MySQL, Oracle, etc.)
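A minimal SQL persistence layer for MLHIM instances can be sketched with Python's built-in sqlite3 module. The table layout below (one row per XML instance, indexed by CCD identifier) is a hypothetical design, not the schema used in the GitHub examples. When porting, note that the `?` placeholder is sqlite3's parameter style; other drivers, such as psycopg for PostgreSQL, use `%s`.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table of XML instances keyed by their CCD id."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS instances ("
        " id INTEGER PRIMARY KEY,"
        " ccd_id TEXT NOT NULL,"
        " xml TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_instances_ccd ON instances (ccd_id)")
    return conn


def store(conn: sqlite3.Connection, ccd_id: str, xml: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO instances (ccd_id, xml) VALUES (?, ?)", (ccd_id, xml))


def instances_for(conn: sqlite3.Connection, ccd_id: str) -> list[str]:
    rows = conn.execute(
        "SELECT xml FROM instances WHERE ccd_id = ? ORDER BY id", (ccd_id,)
    )
    return [xml for (xml,) in rows]


conn = open_store()
store(conn, "ccd-example-0001", "<magnitude>18</magnitude>")
store(conn, "ccd-example-0002", "<units>mmHg</units>")
print(instances_for(conn, "ccd-example-0001"))  # ['<magnitude>18</magnitude>']
```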
OUR VISION OF THE FUTURE
There is already an intuition within the healthcare IT world that conventional EMRs are inadequate for collecting reliable data at the point of care
The real Big Data in healthcare will come from purpose-specific applications modeled by domain experts
The hardware platform of choice for those apps is mobile computing
The other source of Big Data in healthcare will be the Internet of Things
All of that data, when MLHIM-compliant, will participate in a semantically interoperable health information ecosystem
THANK YOU!
/mlhim2
http://gplus.to/MLHIMComm
@mlhim2
https://www.youtube.com/user/MLHIMdotORG