
Page 1: Synthea: Massive FHIR Data...2018/11/15  · High demand for EHR datasets • Non-clinical or secondary uses including software development, testing, clinical training, policy analysis,

HL7®, FHIR® and the flame Design mark are the registered trademarks of Health Level Seven International and are used with permission.

Amsterdam, 14-16 November | @HL7 @FirelyTeam | #fhirdevdays18 | www.fhirdevdays.com

Synthea: Massive FHIR Data

Jason Walonoski

© 2018 The MITRE Corporation. ALL RIGHTS RESERVED. Approved for public release. Distribution unlimited 18-1678-1.


Synthea

• Synthetic Patient Simulation
  • Synthea is an open-source synthetic patient generator that simulates the medical history of synthetic patients.

• High Quality Health Records
  • The system outputs high-quality synthetic (realistic but not real) patient data and associated health records covering every aspect of health.

• Freely Available
  • The resulting data is free from legal, cost, privacy, and security restrictions for a variety of secondary uses in academia, research, industry, and government where realistic (but not real) data is sufficient.


Sound useful? Get Started while I’m talking…

• Requirements
  • Java Development Kit 1.8
  • Git Version Control

    git clone https://github.com/synthetichealth/synthea.git
    cd synthea
    ./gradlew build check test
    ./run_synthea
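Before running the commands above, it can help to confirm that both requirements are on the PATH. The following is a minimal sketch, assuming the JDK exposes a `javac` executable and Git exposes `git`; it does not check versions, and the helper name `missing_tools` is made up for this example.

```python
import shutil

# Executable names assumed for the two requirements on the slide.
REQUIRED_TOOLS = ("javac", "git")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH (empty list means ready)."""
    return [tool for tool in tools if which(tool) is None]


# Usage: missing_tools() returns [] when both tools are installed.
```

The `which` parameter only exists so the lookup can be swapped out in a test; in normal use the default is enough.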


Why synthetic data?

• High demand for EHR datasets
  • Non-clinical or secondary uses including software development, testing, clinical training, and policy analysis, where realistic (but not real) data is sufficient

• Lack of Access
  • EHR datasets are difficult to obtain

• Costs and Demand
  • Anonymized records are being bought and sold

• Risks
  • Real patient records carry privacy, confidentiality, consent, policy, and legal risks that effectively prevent use
  • Real patients fear exposure of their intimate health data including lifestyle, family history, and mental health data

• Not Anonymous
  • Deidentified and anonymized records have been successfully reidentified


Not just patient records…

• Access to Care
  • Modeling healthcare facilities and utilization
  • Calculates individual access

• Health Outcomes
  • Calculates Quality Adjusted Life Years (QALY) and Disability Adjusted Life Years (DALY)
  • Quality Measures

• Cost and Price
  • Modeling claims insurance coverage
    • Medicaid, Medicare, Dual-Eligible, Private, None
  • Cost to individual and family burden (annual and lifetime)
  • Health System costs


Diseases

#  | Top 10 Reasons Patients Visit PCP | Top 10 Years of Life Lost
1  | Routine infant/child health check | Ischemic Heart Disease
2  | Essential Hypertension | Lung Cancer
3  | Diabetes Mellitus | Alzheimer’s Disease
4  | Normal Pregnancy | COPD
5  | Respiratory Infections (Pharyngitis, Bronchitis, Sinusitis) | Cerebrovascular Disease
6  | General Adult Medical Examination | Road Injuries
7  | Disorders of Lipoid Metabolism | Self-Harm
8  | Ear Infections (Otitis Media) | Diabetes Mellitus
9  | Asthma | Colorectal Cancer
10 | Urinary Tract Infections | Drug Use Disorders (limited to Opioids)


90 modules with 722 clinical codes…


Disease modules are state machines…


Disease modules are written in JSON

{
  "name": "Ear Infections",
  "states": {
    "Initial": {
      "type": "Initial",
      "direct_transition": "No_Infection"
    },
    "No_Infection": {
      "type": "Delay",
      "direct_transition": "Gets_Ear_Infection",
      "range": { "low": 1, "high": 2, "unit": "months" }
    },
    "Gets_Ear_Infection": {
      "type": "ConditionOnset",
      "target_encounter": "Ear_Infection_Encounter",
      "codes": [{
        "system": "SNOMED-CT", "code": "65363002",
        "display": "Otitis media"
      }],
      "direct_transition": "Ear_Infection_Encounter"
    },
    "Ear_Infection_Encounter": {
      "type": "Encounter",
      "encounter_class": "outpatient",
      "reason": "Gets_Ear_Infection",
      "codes": [{
        "system": "SNOMED-CT", "code": "185345009",
        "display": "Encounter for symptom"
      }],
      "distributed_transition": [
        { "distribution": 0.8, "transition": "Antibiotic" },
        { "distribution": 0.2, "transition": "Painkiller" }
      ]
    },
    "Antibiotic": {
      "type": "MedicationOrder",
      "codes": [{
        "system": "RxNorm", "code": 309310,
        "display": "Ciprofloxacin 100 MG/ML Oral Suspension"
      }],
      "direct_transition": "Terminal"
    },
    "Painkiller": {
      "type": "MedicationOrder",
      "codes": [{
        "system": "RxNorm", "code": 307668,
        "display": "Acetaminophen 32 MG/ML Oral Suspension"
      }],
      "direct_transition": "Terminal"
    },
    "Terminal": { "type": "Terminal" }
  }
}
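To make the state-machine idea concrete, here is a minimal sketch of how a module like the one above could be walked from "Initial" to "Terminal". It is not Synthea's engine: it ignores the delay range, codes and encounters, and handles only the two transition kinds shown on the slide. The function names `next_state` and `walk` are made up for this example.

```python
import random

# Trimmed copy of the slide's module: only the fields the walker needs.
MODULE = {
    "name": "Ear Infections",
    "states": {
        "Initial": {"type": "Initial", "direct_transition": "No_Infection"},
        "No_Infection": {"type": "Delay", "direct_transition": "Gets_Ear_Infection"},
        "Gets_Ear_Infection": {
            "type": "ConditionOnset",
            "direct_transition": "Ear_Infection_Encounter",
        },
        "Ear_Infection_Encounter": {
            "type": "Encounter",
            "distributed_transition": [
                {"distribution": 0.8, "transition": "Antibiotic"},
                {"distribution": 0.2, "transition": "Painkiller"},
            ],
        },
        "Antibiotic": {"type": "MedicationOrder", "direct_transition": "Terminal"},
        "Painkiller": {"type": "MedicationOrder", "direct_transition": "Terminal"},
        "Terminal": {"type": "Terminal"},
    },
}


def next_state(state, rng):
    """Pick the name of the next state for one state definition."""
    if "direct_transition" in state:
        return state["direct_transition"]
    options = state["distributed_transition"]
    names = [option["transition"] for option in options]
    weights = [option["distribution"] for option in options]
    # Weighted random choice, e.g. 80% Antibiotic and 20% Painkiller.
    return rng.choices(names, weights=weights, k=1)[0]


def walk(module, rng):
    """Return the list of state names visited from Initial to Terminal."""
    name = "Initial"
    path = [name]
    while module["states"][name]["type"] != "Terminal":
        name = next_state(module["states"][name], rng)
        path.append(name)
    return path


print(" -> ".join(walk(MODULE, random.Random(0))))
```

Every walk passes through the same four states and then branches once, so repeated walks differ only in whether the patient receives the antibiotic or the painkiller.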


Control States: control the flow


Clinical States: drive disease and care


Examplitis Walk-Through

• 10 minute walk-through
  • “Examplitis is a painful condition that affects only males. Most patients can be cured with Examplitol or an Examplotomy but some never recover.”


https://synthetichealth.github.io/module-builder/
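As a rough illustration of the Examplitis description, the sketch below tallies outcomes for a small population. It is not the module built in the walk-through. The weights are hypothetical, because the slide only says "most" and "some". It also assumes, for simplicity, that every male patient has the condition.

```python
import random
from collections import Counter

# Hypothetical weights: the slide gives no numbers.
OUTCOME_WEIGHTS = {
    "cured by Examplitol": 0.6,
    "cured by Examplotomy": 0.3,
    "never recovers": 0.1,
}


def examplitis_outcome(gender, rng):
    """Return one outcome label for a single patient."""
    if gender != "M":
        # The condition affects only males.
        return "not affected"
    labels = list(OUTCOME_WEIGHTS)
    weights = [OUTCOME_WEIGHTS[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


def tally(genders, seed=0):
    """Count outcomes for a list of genders using a seeded generator."""
    rng = random.Random(seed)
    return Counter(examplitis_outcome(gender, rng) for gender in genders)


print(tally(["M"] * 500 + ["F"] * 500))
```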


Setup

• Requirements
  • Java Development Kit 1.8
  • Git Version Control

    git clone https://github.com/synthetichealth/synthea.git
    cd synthea
    ./gradlew build check test


Generating Data

    example@hostname ~/synthea $ ./run_synthea

    > Task :run
    Loading C:\Users\example\synthea\build\resources\main\modules\allergic_rhinitis.json
    Loading C:\Users\example\synthea\build\resources\main\modules\allergies\allergy_incidence.json
    [... many more lines of Loading ...]
    Loading C:\Users\example\synthea\build\resources\main\modules\wellness_encounters.json
    Loaded 90 modules.
    Running with options:
    Population: 1
    Seed: 1519063214833
    Location: Massachusetts
    1 -- Jerilyn993 Parker433 (10 y/o) Lawrence, Massachusetts
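The last line of the console output summarizes each generated patient. A minimal sketch for pulling that line apart is shown below. It assumes exactly the layout printed on this slide (index, name, age, city, state), which other Synthea versions may not follow, and `parse_summary` is a name made up for this example.

```python
import re

# Pattern for lines such as:
#   1 -- Jerilyn993 Parker433 (10 y/o) Lawrence, Massachusetts
SUMMARY = re.compile(
    r"^(?P<index>\d+) -- (?P<name>.+?) \((?P<age>\d+) y/o\) "
    r"(?P<city>.+), (?P<state>.+)$"
)


def parse_summary(line):
    """Return a dict for a patient summary line, or None for any other line."""
    match = SUMMARY.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["index"] = int(record["index"])
    record["age"] = int(record["age"])
    return record


print(parse_summary("1 -- Jerilyn993 Parker433 (10 y/o) Lawrence, Massachusetts"))
```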


Synthea generates FHIR Resources (DSTU2, STU3, R4)

• Bundle
• Patient
• Encounter
• Condition
• AllergyIntolerance
• Observation
• DiagnosticReport
• Procedure
• ImagingStudy
• Immunization
• CarePlan
• MedicationRequest
• Claim
• ExplanationOfBenefit (STU3 only, BB2.0)
• Coverage (STU3 only)

{
  "resourceType": "Observation",
  "id": "15cbce37-e98d-40b8-8ab2-d57907fa3a2b",
  "status": "final",
  "category": [ { "coding": [ {
    "system": "http://hl7.org/fhir/observation-category",
    "code": "vital-signs", "display": "vital-signs"
  } ] } ],
  "code": { "coding": [ {
    "system": "http://loinc.org", "code": "55284-4",
    "display": "Blood Pressure"
  } ], "text": "Blood Pressure" },
  "subject": { "reference": "urn:uuid:2af35dd1-fb58-40f2-8066-21be17fb420d" },
  "context": { "reference": "urn:uuid:6a1a467c-aeb0-4ca8-9826-af16aaca4dc2" },
  "effectiveDateTime": "2009-01-12T00:03:59-05:00",
  "issued": "2009-01-12T00:03:59.349-05:00",
  "component": [ {
    "code": { "coding": [ {
      "system": "http://loinc.org", "code": "8462-4",
      "display": "Diastolic Blood Pressure"
    } ], "text": "Diastolic Blood Pressure" },
    "valueQuantity": {
      "value": 85.19988531686474, "unit": "mmHg",
      "system": "http://unitsofmeasure.org", "code": "mmHg"
    }
  }, {
    "code": { "coding": [ {
      "system": "http://loinc.org", "code": "8480-6",
      "display": "Systolic Blood Pressure"
    } ], "text": "Systolic Blood Pressure" },
    "valueQuantity": {
      "value": 108.84244941704915, "unit": "mmHg",
      "system": "http://unitsofmeasure.org", "code": "mmHg"
    }
  } ]
}
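Reading values back out of an Observation like the one above mostly means matching LOINC codes in its components. The sketch below uses a trimmed copy of the slide's resource; the helper names `component_value` and `blood_pressure` are made up for this example.

```python
LOINC = "http://loinc.org"
SYSTOLIC = "8480-6"   # Systolic Blood Pressure, as coded on the slide
DIASTOLIC = "8462-4"  # Diastolic Blood Pressure, as coded on the slide

# Trimmed copy of the slide's Observation: only the component array is kept.
OBSERVATION = {
    "resourceType": "Observation",
    "component": [
        {
            "code": {"coding": [{"system": LOINC, "code": "8462-4"}]},
            "valueQuantity": {"value": 85.19988531686474, "unit": "mmHg"},
        },
        {
            "code": {"coding": [{"system": LOINC, "code": "8480-6"}]},
            "valueQuantity": {"value": 108.84244941704915, "unit": "mmHg"},
        },
    ],
}


def component_value(observation, loinc_code):
    """Return the numeric value of the component with the given LOINC code."""
    for component in observation.get("component", []):
        for coding in component.get("code", {}).get("coding", []):
            if coding.get("system") == LOINC and coding.get("code") == loinc_code:
                return component["valueQuantity"]["value"]
    return None


def blood_pressure(observation):
    """Return (systolic, diastolic) in the units stored in the resource."""
    return (
        component_value(observation, SYSTOLIC),
        component_value(observation, DIASTOLIC),
    )


systolic, diastolic = blood_pressure(OBSERVATION)
print(f"{systolic:.0f}/{diastolic:.0f} mmHg")
```

Matching on the code rather than on the position in the array matters here, because the diastolic component comes first in the generated resource.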


POST the data to a server

    curl http://hapi.fhir.org/baseDstu3 --data-binary "@/Users/example/synthea/output/fhir/Maryetta775_Rowe323_2cb7e4dd-9d8b-49cf-b1e4-9839be8bc754.json" -H "Content-Type: application/fhir+json"

{
  "resourceType": "Bundle",
  "id": "ca4d459f-b078-4c04-a152-c7ce76a25179",
  "type": "transaction-response",
  "link": [{
    "relation": "self",
    "url": "http://hapi.fhir.org/baseDstu3"
  }],
  "entry": [{
    "response": {
      "status": "201 Created",
      "location": "Patient/4147259/_history/1",
      "etag": "1",
      "lastModified": "2018-06-06T18:26:06.038+00:00"
    }
  }]
}
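The same POST can be scripted. The sketch below uses only the Python standard library: it builds the request without sending it, and it reads the created locations out of a transaction-response like the one above. The function names are made up for this example, and actually sending the request (for instance with `urllib.request.urlopen`) is left to the reader.

```python
import urllib.request


def build_post(base_url, bundle_bytes):
    """Build (but do not send) a POST of a transaction Bundle to a FHIR base URL."""
    return urllib.request.Request(
        base_url,
        data=bundle_bytes,
        headers={"Content-Type": "application/fhir+json"},
        method="POST",
    )


def created_locations(response_bundle):
    """List the location of every entry the server reports as 201 Created."""
    return [
        entry["response"]["location"]
        for entry in response_bundle.get("entry", [])
        if entry.get("response", {}).get("status", "").startswith("201")
    ]


# Trimmed copy of the response shown on the slide.
RESPONSE = {
    "resourceType": "Bundle",
    "type": "transaction-response",
    "entry": [
        {"response": {"status": "201 Created",
                      "location": "Patient/4147259/_history/1"}}
    ],
}

request = build_post("http://hapi.fhir.org/baseDstu3", b"{}")
print(request.get_method(), request.full_url)
print(created_locations(RESPONSE))
```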


Configuring FHIR Settings

• FHIR export is configured in src/main/resources/synthea.properties

    # Abridged synthea.properties file
    # default FHIR configuration.
    exporter.fhir.export = true
    # transaction bundle 'true' produces transaction Bundles
    # while 'false' produces collection Bundles.
    exporter.fhir.transaction_bundle = true
    # Standard Health Record (SHR) extensions for STU3
    exporter.fhir.use_shr_extensions = true
    # Exporting FHIR DSTU2
    exporter.fhir_dstu2.export = false
    # Exporting FHIR R4
    exporter.fhir_r4.export = false
    # Exporting Hospital Provider Data in STU3 or DSTU2
    exporter.hospital.fhir.export = true
    exporter.hospital.fhir_dstu2.export = false
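To see at a glance which exporters a properties file turns on, a small reader is enough. The sketch below handles only the `key = value` lines and `#` comments used in the abridged file above; it is not a full Java properties parser, and the function names are made up for this example.

```python
# A few of the settings from the abridged file on the slide.
SAMPLE = """
# default FHIR configuration.
exporter.fhir.export = true
exporter.fhir.transaction_bundle = true
exporter.fhir_dstu2.export = false
exporter.fhir_r4.export = false
exporter.hospital.fhir.export = true
exporter.hospital.fhir_dstu2.export = false
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def enabled_exports(settings):
    """Return the '*.export' keys whose value is 'true', in sorted order."""
    return sorted(
        key for key, value in settings.items()
        if key.endswith(".export") and value == "true"
    )


print(enabled_exports(parse_properties(SAMPLE)))
```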


How are other developers using Synthea?

• Healthcare Services Platform Consortium (HSPC)
  • Developer sandbox environment to spin up FHIR servers and load them with Synthea data. They also host a FHIR server preloaded with Synthea data called HSPC Synthea STU3 (3.0.1) (authentication required).
  • https://sandbox.hspconsortium.org/

• SMART Health IT
  • Datasets available for download including Synthea data, which are also available through http://docs.smarthealthit.org/data/stu3-sandbox-data.html
  • They also provide a Docker version of the HAPI FHIR Server preloaded with Synthea data here: https://github.com/smart-on-fhir/hapi
  • Used Synthea data with their Bulk Data Server.


How are other developers using Synthea?

• Algorex Health has used Synthea data to explore Open Clinical Analysis.
  • https://blog.algorexhealth.com/2017/04/open-clinical-analysis-with-mitre-part-2/

• The MITRE Corporation has used Synthea data to create SyntheticMass, a 1/7th scale simulated model of the Commonwealth of Massachusetts, including a FHIR server (FHIR v1.8).
  • https://syntheticmass.mitre.org

• An MSDN blog post illustrates Loading Synthea FHIR Data with Logic Apps and Functions in Azure Government.
  • https://blogs.msdn.microsoft.com/mihansen/2018/05/10/loading-synthea-fhir-data-with-logic-apps-and-functions-in-azure-government/


How are other developers using Synthea?

• Cerner uses Synthea in their Bunsen tutorial, which adds the FHIR data model to Apache Spark queries within Jupyter notebooks
  • https://github.com/cerner/bunsen-tutorial

• Google uses Synthea in their FHIR protobuf examples using BigQuery
  • https://github.com/google/fhir/tree/master/examples/bigquery#example-code-to-upload-fhir-resources-into-bigquery

• What will you use Synthea for?


Synthea Tutorials and Exercises

1. Install, Configure, and Run Synthea
2. Use the Synthetic FHIR Data
3. Explore and Modify the Disease Modules
4. Localize Synthea for Alternative Geographic Locations


Roadmap

• Current FHIR Support
  • DSTU2, STU3, and R4
  • Argonauts IG
  • Blue Button 2.0 IG
  • JSON and Bulk Data (ndjson)
• New FHIR versions
• Additional Implementation Guides
• Terminology Variation
• Splitting the Record
• Clinical Notes
• Updating existing patients periodically (daily?)
• Claims tied to Payers
• Multiple private/government payers
• Care Seeking Behavior
• In and Out of Network
• Variable Care
• Health Disparities and Determinants of Health
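For readers unfamiliar with the Bulk Data (ndjson) format mentioned in the roadmap, it stores one JSON resource per line, typically with one file per resource type. The sketch below shows the idea on a made-up two-resource-type Bundle. It is not Synthea's exporter, and the function name `bundle_to_ndjson` is an assumption for this example.

```python
import json
from collections import defaultdict


def bundle_to_ndjson(bundle):
    """Group a Bundle's resources by type and render each group as ndjson text."""
    rows = defaultdict(list)
    for entry in bundle.get("entry", []):
        resource = entry["resource"]
        # One compact JSON document per line, with no embedded newlines.
        rows[resource["resourceType"]].append(
            json.dumps(resource, separators=(",", ":"))
        )
    return {rtype: "\n".join(lines) + "\n" for rtype, lines in rows.items()}


# Made-up Bundle for illustration only.
BUNDLE = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {"resource": {"resourceType": "Observation", "id": "o1"}},
        {"resource": {"resourceType": "Observation", "id": "o2"}},
    ],
}

for resource_type, text in bundle_to_ndjson(BUNDLE).items():
    print(resource_type, len(text.splitlines()), "line(s)")
```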


Open Source Resources

• Contact
  • Jason Walonoski
  • [email protected]

• Synthea
  • https://github.com/synthetichealth/synthea

• Module Builder
  • https://synthetichealth.github.io/module-builder/
