Using a Systems Approach to Developing Analytics Capability in the Healthcare Safety Net
David Hartzband, D.Sc.
Engineering Systems Division
Path2Analytics: What is It?
• Path2Analytics is a project developed by the RCHN Community Health Foundation with the goal of working with community health centers to integrate the capacity to productively use contemporary analytics capabilities into the work of the health center
• The project will work with both rural & urban health centers to meet this goal
  – The first two centers have been selected & the project is in progress at both
  – The project will consist of both a process & a set of software to assist in meeting this goal
FQHCs
• Health centers started operation under federal grants in 1975
  – Funding consolidated under Section 330 (Public Health Service Act) in 1996 (42 USC §254b, Section 330)
• 1131 FQHCs (2012) with ~6700 sites
  – ~100 “Look-Alikes” with ~500 sites
• Serve mainly uninsured & disadvantaged populations
  – Mainly Medicaid & uninsured
  – Served ~23M patients in 2012
Healthcare Analytics in the Safety Net
Path2Analytics: Assumptions 1
• Much of current healthcare is increasingly an information management effort.
• Analytics (i.e. the systematic analysis of data focused on answering a specific question or set of questions) must be an integral part of a healthcare organization’s administrative & clinical plans.
• Analytics is not an IT function, a software package or a technical methodology.
  – It is a way of thinking about an organization’s goals that makes use of IT, analytic (software) packages & technical methodologies.
Path2Analytics: Assumptions 2
• Analytics is not necessarily about “Big Data”.
  – But it must be about “right data”
  – It can be very effective making use of “little data”, that is, the data that an organization already has.
  – Not all organizations have multiple petabytes of data, but even multiple gigabytes of data can yield productive results.
• An analysis is only as good as:
  – The quality of the data used, &
  – The quality of the questions asked.
• Questions asked are only as good as their alignment with:
  – The quality & content of the available data, &
  – The strategy & goals of the organization.
Path2Analytics: Process
• The project consists of five units:
  – Landscape – what are other healthcare organizations doing with respect to analytics? how does this relate to CHCs?
  – Analysis Planning – what resources are available? what questions to ask? how to address them analytically? what tools to use? setting goals & metrics
  – Data Organization – what data is available, extraction & aggregation, ETL & normalization
  – Analysis & Interpretation – operation of analytic process
  – Outcome & Outreach – how to organize & convey results, who to tell, communication process etc.
LANDSCAPE
Landscape - 1
• Data Warehousing
  – Almost all projects require some data extraction or warehousing
• Point-of-Care Recommendations
  – Mayo Clinic - Large-scale warehousing: semantic normalization, data dictionary, 5M clinical records covering 15 years, 15-25PB of data (2500x the size of the content of the Library of Congress)
    • AWARE “bedside consulting”: delivering diagnosis & best-practice treatment for individual patients based on analysis of anonymized data set
  – BIDMC - Clinical Query: system used directly by providers to analyze 2.2M clinical records to determine best-practice treatment in real-time
  – Kaiser - Clinical, pharma & lab data on 9.1M patients over 10 years
    • Natural Language query system allows providers to get recommendations for best-practice treatment
  – Partners - combined clinical, ops, financial for real-time PoC data & best-practice recommendations (Queriable Patient Inference Dossier)
Landscape - 2
• Outcome & Population Characterizations
  – Intermountain (Deloitte as partner) – 90M patient records, Outcome Miner = factors affecting outcome, Population Miner = relationship between treatment & outcomes
  – McKinsey (BeyondCore) – “next 5% analysis”, 30M claims, characterize “microsegments” for next 5% of patients wrt cost, assign care managers
• Clinical & Financial Optimization
  – Many organizations are aggregating clinical & financial data to be able to do analyses such as:
    • Cost/service (/provider, /location)
    • Cost/outcome (/provider, /location)
    • Cost trends
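A minimal sketch of a cost/service characterization by location and by provider, of the kind listed under Clinical & Financial Optimization. The records, site names, provider names and the helper `mean_cost_by` are all hypothetical and are not taken from any of the organizations above.

```python
from collections import defaultdict

# Hypothetical visit records: (location, provider, cost in dollars)
visits = [
    ("Site A", "Provider 1", 120.0),
    ("Site A", "Provider 1", 80.0),
    ("Site A", "Provider 2", 100.0),
    ("Site B", "Provider 3", 150.0),
]

def mean_cost_by(records, key_index):
    """Return the mean cost per group, grouping on the given tuple position."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of cost, count]
    for record in records:
        group = record[key_index]
        totals[group][0] += record[2]
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}

cost_by_location = mean_cost_by(visits, 0)  # cost/service per location
cost_by_provider = mean_cost_by(visits, 1)  # cost/service per provider
```

The same grouping applied to cost per outcome, or repeated per time period, would give the cost/outcome and cost-trend views.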
Landscape - 3
• Best Practice Guidance
  – Geisinger – ProvenCare: point-of-care system that interactively provides best practice elements for specific treatment areas (congestive heart failure, stroke, etc.)
• Prediction
  – Express Scripts – 1.5B prescriptions/year, modeling to predict which patients are most likely to modify drug use behavior, proactive intervention
• Research
  – Mt Sinai Medical Center (Ayasdi) – topological data analysis, analysis of entire E. coli (1M DNA variants) to determine bacteria’s response to antibiotics
Landscape 3 - What’s Missing
• Operations? – very few projects identified that were strictly non-financial optimizations, opportunities for:
  – Schedule optimization including analysis of no-shows
  – Patient flow for reduced waiting
  – Readmission rates – analysis of multiple dimensions: demographic criteria, location, diagnosis etc.
  – Predictive modeling of changes in:
    • Patient number & type
    • Service utilization
    • Locational utilization
• Others?
ANALYSIS PLANNING
Types of Analysis
• Visualization - Use of visualization tools to characterize relationships
• DB Query (SQL) – Conventional query of relational databases
• MapReduce - Analytic framework using large-scale distributed cluster(s) for storage & processing of ultra-large data sets (Hadoop)
• Algorithmic – Use of programming languages such as R, Python, Pig Latin… to create algorithms for custom analysis (often used with Hadoop)
• All but algorithmic used in project
This is Different!
• It must be emphasized that this is not statistical analysis in the classical sense.
  – It’s also not “report generation” as is required for HRSA, etc.
• It is the empirical characterization of data in the context of specific inquiries
• If a Point-of-Care recommendation system finds 9,274 cases that match a specific set of patient parameters & classifies the outcomes according to treatment plan, that is not a statistical prediction of the best treatment plan. It is an exact characterization based on the data!
  – When we say that a specific treatment plan was associated with a specific outcome in 72% of the cases, this is not a probability, but an exact proportion!
  – The provider may choose not to follow this characterization
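A minimal sketch of what “exact proportion” means here: count the matched cases by treatment plan and outcome, then divide. The six cases, the plan names and the outcome labels are hypothetical; no model is fitted and nothing is estimated.

```python
from collections import Counter, defaultdict

# Hypothetical matched cases: (treatment_plan, outcome)
matched_cases = [
    ("plan_a", "improved"),
    ("plan_a", "improved"),
    ("plan_a", "improved"),
    ("plan_a", "not_improved"),
    ("plan_b", "improved"),
    ("plan_b", "not_improved"),
]

def outcome_proportions(cases):
    """Return, per treatment plan, the exact share of cases with each outcome."""
    counts = defaultdict(Counter)
    for plan, outcome in cases:
        counts[plan][outcome] += 1
    return {
        plan: {outcome: n / sum(c.values()) for outcome, n in c.items()}
        for plan, c in counts.items()
    }

proportions = outcome_proportions(matched_cases)
```

Here `proportions["plan_a"]["improved"]` is 0.75 because 3 of the 4 matched plan_a cases improved; it describes the data in hand rather than predicting a future patient.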
It Must Also Be Emphasized…
• Project goal is to produce initial analyses & leave health center with hardware & software that can be used by their staff to do additional analytic inquiry, BUT…
• Perhaps a more important goal is to introduce the concept of “data as an asset” & to embed this through discussion & example so that data-driven decision making becomes a tool that the health center can routinely use
• This is a way of thinking as much as anything
Analysis Planning
• Identification of areas of inquiry & actual query building most effectively approached as a systems problem
• Inquiry must be aligned with health center strategy in order to be relevant
• System includes not just clinical & financial health center data but also at least: local & regional demographics, social & cultural determinants, local & regional economics, regulatory environment…
• Ontology (OWL) developed & used to guide discussions of inquiry planning
• Discussions held with E-staff, Clinical & Admin/Support Staffs at health center(s)
P2A Ontology (native org.) - OWL 4.1.2
This work was conducted using the Protégé resource, which is supported by grant GM10331601 from the National Institute of General Medical Sciences, NIH
P2A Ontology - OWL 4.1.2
Model for Developing Inquiry
Figure: flowchart with the following steps
• Discussion of Health Center Strategy
• Discussion of Analytic Ontology
• Initial Inquiry Topics
• Evaluation of Existing Data
• Refinement of Strategic Criteria
• Refinement of Inquiry Topics
• Review of External Data & other Criteria
• Summary Review
• Final Inquiry Topics
• Data Normalization & Quality Checks
Use of Systems View
• The ontology was used as a guide to have Health Center participants focus on larger scale & strategic issues for inquiry
  – Prior to use of ontology, most ideas about analysis were highly specific & tactical; related to current reporting issues
  – After the ontology was introduced & the larger-scale system was discussed, health center participants were more likely to focus on strategic issues for inquiry
Inquiry Topics CHC 1
• Cost/visit increasing but revenue declining in dental practice?
  – Characterize trends in both cost/visit & revenue, identify additional expense in dental practice (by location, by provider), compare with trends in medical practice
• Overall patient encounters declining? Are our patients getting care elsewhere?
  – Use external (State, County) data to correlate demographic trends by location with encounter data; use data from area hospitals & other clinics to correlate with patient care outside of CHC
    • Just acquired State hospital admittance/discharge detail data
• What patient populations are more “non-compliant” in managing their health & healthcare?
  – Characterize patterns in declining clinical outcomes, determine if there is a pattern wrt diagnosis, location, demographic data
County Demographics vs. Encounters
Figure: three charts of county population by year (x-axis: Year), data labels listed in extracted order
• County 1 Population 1920-2020: 8560, 6836, 5398, 3739, 3272, 2884, 2814, 2801 (14% decrease since 1980, 67% decrease since 1920)
• County 2 Population 1920-2010: 12802, 10831, 9227, 6679, 5925, 5815, 5409, 4971 (9% decrease since 1980, 61% decrease since 1920)
• County 3 Population 1920-2020: 2003, 5366, 6000, 11323, 9902, 12466, 13181, 14112 (25% increase since 1980, 85% increase since 1920)
Demographics for three counties from State Records - Encounters in Counties 1 & 2 have declined at ~12% rate, but have increased ~57% in County 3. Used as an example to CHC Staff for the relevance of external data
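A minimal worked example of how the “decrease since 1920” figures can be reproduced. It assumes that 8560 and 12802 are the 1920 chart labels for Counties 1 and 2 and that 2801 and 4971 are the most recent labels; the function name `percent_change` is hypothetical.

```python
def percent_change(start, end):
    """Percent change from start to end, relative to the start value."""
    return (end - start) / start * 100.0

# Assumed endpoints read from the chart labels (first and last value per county)
county_1_change = percent_change(8560, 2801)
county_2_change = percent_change(12802, 4971)

print(f"County 1: {county_1_change:.0f}%")  # prints "County 1: -67%"
print(f"County 2: {county_2_change:.0f}%")  # prints "County 2: -61%"
```

Joining figures like these to encounter counts by location is the kind of external-data comparison the slide describes.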
Inquiry Topics CHC 2
• Service utilization, especially enabling services,… by location
  – Analysis of enabling services provided by cost, location, outcome (especially interested in interpretation)
• Overall number of patients steady, but 200-300 new patients per month
  – Characterization of: single visits, episodic visits
• Workflow bottlenecks?
  – Analysis of duration of encounter by treatment & diagnosis codes, patterns in duration? By provider? By location?
DATA & SOFTWARE DEPLOYMENT & ORGANIZATION
Need for Analytic Stack (Hadoop,…)
• There are three characteristics of data that may require the use of a contemporary analytic approach – the three V’s:
  – Volume – Some organizations actually do have petabytes (1024 terabytes) of data… such ultra-large data sets are not easily managed or analyzed by conventional means
  – Variety – Many of the types of inquiries that are relevant to current decision making require that data from multiple different sources & in different formats be analyzed together… e.g. clinical & financial data
  – Velocity – particularly data from sensor nets (NASA is the best example)… 1,000,000 sensors sampled 3x/second
• In healthcare, the issue usually is variety…
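A minimal arithmetic sketch of the velocity example, using only the figures on the slide (1,000,000 sensors sampled 3x/second). The variable names are hypothetical, and no sample size in bytes is assumed.

```python
# Figures from the slide
sensors = 1_000_000
samples_per_sensor_per_second = 3

# Derived rates
samples_per_second = sensors * samples_per_sensor_per_second
samples_per_day = samples_per_second * 60 * 60 * 24

print(f"{samples_per_second:,} samples/second")  # prints "3,000,000 samples/second"
print(f"{samples_per_day:,} samples/day")        # prints "259,200,000,000 samples/day"
```

A health center’s data arrives nowhere near this rate, which is consistent with variety, not velocity, being the usual issue in healthcare.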
2013 Poll - HealthCatalyst
Figure: % Function per Maturity Level (bar chart; x-axis: Maturity Level, y-axis: % Function); bar labels: 20%, 38.70%, 27.80%, 10.80%, 2.60%
• At what level does your organization consistently and reliably function?
• 305 Respondents
Proposed Technical Approach
• Preliminary characterization of data using existing reporting/BI system
  – Can be used to determine if normalization/ETL is needed
• Use of existing data extracts/warehouse if available
  – Analysis of data quality needs to be done
• Layering of existing data into open source analytic stack (Hadoop-based)
  – Data from multiple sources & of different types can be loaded
  – Normalization can be done entirely in Hadoop
• Use of YARN/MapReduce to generate results, &/or
• Use of HBase for data management
• Use of Hive/Impala to do query
• Use of Tableau to visualize raw data
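A minimal sketch of the kind of SQL inquiry that combines clinical & financial data. Python’s built-in `sqlite3` is used here only as a stand-in so the example runs anywhere; it is not Hive or Impala, and the table names, column names and values are hypothetical.

```python
import sqlite3

# In-memory database standing in for data loaded from two different sources
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE encounters (encounter_id INTEGER, location TEXT);
    CREATE TABLE charges (encounter_id INTEGER, cost REAL);
    INSERT INTO encounters VALUES (1, 'Site A'), (2, 'Site A'), (3, 'Site B');
    INSERT INTO charges VALUES (1, 90.0), (2, 110.0), (3, 150.0);
""")

# Join clinical encounters to financial charges and summarize cost/visit by location
query = """
    SELECT e.location, COUNT(*) AS visits, AVG(c.cost) AS cost_per_visit
    FROM encounters e
    JOIN charges c ON c.encounter_id = e.encounter_id
    GROUP BY e.location
    ORDER BY e.location
"""
rows = conn.execute(query).fetchall()
conn.close()
```

The join and GROUP BY are standard SQL, so a query of this shape should carry over to a SQL-on-Hadoop engine with little change, though dialect details differ.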
Deployed Analytic Stack
Figure: stack diagram with the following components
• Tableau Visualization
• Cluster access: Hue
• Inquiry: Impala (SQL)
• Search: Solr
• Workflow: Oozie
• HBase
• Map Reduce 2: YARN
• Group Services & Synch: Zookeeper
• Dist. File System: HDFS
Cloudera Manager running at CHC1
HBase Table Browser running at CHC1
Hive Query running at CHC1
Next Steps - 1
• Once a goal & rationale for an inquiry is decided on, several steps have to take place before analysis can be performed:
  – Preliminary queries/explorations have to be decided on & developed
    • Generally done by consensus with stakeholders
    • Quality & normalization issues should be addressed in the context of preliminary queries so that the entire universe of data does not need to be addressed (infeasible to impossible)
  – Data issues & data quality need to be assessed
  – Semantic normalization needs to be addressed
Next Steps – 2: Data Quality
• The issues of data quality with healthcare information, especially EHR data, are well known
• Previous work with the data warehouse of a PCA & several associated CHCs showed that:
  – Usage patterns both of different providers & different CHCs created quality issues
  – Many providers use the EHR (regardless of which one) in idiosyncratic ways (filling in Notes, but not on forms, etc.) that affect the ability to use the data in standardized ways
  – Many CHCs have idiosyncratic coding practices that make comparisons across health centers infeasible or impossible
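A minimal sketch of a first-pass quality check for the “Notes, but not forms” pattern: measure how often a structured field is filled in, and list records where the information exists only as free text. The records, the field names `diagnosis_code` and `notes`, and both helper functions are hypothetical.

```python
# Hypothetical EHR extract: a structured field plus a free-text notes field
records = [
    {"diagnosis_code": "E11.9", "notes": ""},
    {"diagnosis_code": "", "notes": "diabetes follow-up"},
    {"diagnosis_code": None, "notes": "hypertension"},
    {"diagnosis_code": "I10", "notes": "bp check"},
]

def completeness(rows, field):
    """Share of rows in which the structured field is filled in."""
    filled = sum(1 for row in rows if row.get(field))
    return filled / len(rows)

def notes_only(rows, field):
    """Rows where the structured field is empty but free-text notes exist."""
    return [row for row in rows if not row.get(field) and row.get("notes")]

code_completeness = completeness(records, "diagnosis_code")
free_text_only = notes_only(records, "diagnosis_code")
```

Running the same check per provider or per health center would show whether the gaps follow the usage patterns described above.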
Next Steps – 3: Normalization
• Semantic normalization is rarely addressed
  – e.g. Who is a provider? MDs? D.Psy.? D.NPs? ANPs? RNs? Social Workers? Some of these sometimes?
  – What is a service?
  – What is an encounter?
  – What is an outcome?
• A recent exercise at a medium-size, urban CHC produced the following definitions:
  – Provider = anyone with a license to access & modify the EHR
  – Service = Medical, Dental, Behavioral Health, Medical Specialties, Social Services
  – Encounter = a checked-in visit with an encounter number (exclusions by patient type)
  – Outcome – deferred – a (semi)recent ONC working group agreed unanimously on one outcome… mortality!
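A minimal sketch of how an agreed definition can be written down once and applied uniformly, using the Encounter definition above (a checked-in visit with an encounter number, with exclusions by patient type). The field names, the sample visits and the excluded patient type are hypothetical; the actual exclusion list is not given in the slides.

```python
# Hypothetical exclusion list; the real excluded patient types are not specified
EXCLUDED_PATIENT_TYPES = {"test"}

def is_encounter(visit):
    """Apply the agreed definition: checked in, has an encounter number, not excluded."""
    return (
        visit.get("checked_in") is True
        and visit.get("encounter_number") is not None
        and visit.get("patient_type") not in EXCLUDED_PATIENT_TYPES
    )

visits = [
    {"checked_in": True, "encounter_number": "E-1", "patient_type": "regular"},
    {"checked_in": False, "encounter_number": "E-2", "patient_type": "regular"},
    {"checked_in": True, "encounter_number": None, "patient_type": "regular"},
    {"checked_in": True, "encounter_number": "E-4", "patient_type": "test"},
]

encounters = [v for v in visits if is_encounter(v)]
```

Keeping the definition in one function means every query that counts encounters counts the same thing.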
ANALYSIS & INTERPRETATION
OUTCOME & OUTREACH
Both in progress
Initial Lessons Learned
• Semantic & syntactic normalization of data essential for analysis
  – Even if normalization is done at time of analysis (as in Hadoop), there must be standards for the data to be normalized against
• Deployment of Analytic Stack takes time & extreme attention to detail
• Substantial time is needed for selection of analysis areas & inquiry development
• Large opportunity for operational optimization if data is available
Takeaways
• This technology is VERY powerful, but it CANNOT solve a health center’s issues by itself
• Inquiry is empty unless it is closely aligned with the needs & strategy of the health center
• Projects like this are not IT projects – they require input & participation from all areas of the health center
• Knowing your “data” is key… not just the EHR data, but all of the data that is critical to the health center
  – Focus on the idea of “data as an asset”… All data
• Perspective is important
  – No analysis is a panacea
David Hartzband, D.Sc., Research Affiliate
Engineering Systems Division
Massachusetts Institute of Technology
617.324.6693 (o), 617.501.4611 (m)