how unitedhealth group integrated open data into its mongodb-based information system
DESCRIPTION
TRANSCRIPT
© Talend 2012
Ecosystem
How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System
Cedric CarboneCTO at TalendTwitter: @carbone
Matt AxsomLead Architect at [email protected]
© Talend 2012
Key Facts• Founded in 2006
• $102m raised since inception
• 400+ employees in 7 countries
• Open Core business model• Subscription license• Services & training
Solutions• Highly scalable integration solutions addressing Big Data,
Application Integration, Data Integration, Data Quality, MDM, BPM
• Recognized as a visionary leader in the integration market by Gartner and Forrester
Talend OverviewGrowth-Driven Deployment Model
Revenue: 108% CAGR
2007 2008 2009 2010 2011 2012 2013
BRAND AWARENESS
VIBRANT COMMUNITY
MONETIZATION
CUSTOMER LOYALTY
© Talend 2012© Copyright Talend 2011. All rights reserved. 3
The Talend Difference
© Talend 2012
Talend Open Studio
© Talend 2012© Copyright Talend 2011. All rights reserved. 5
Optum Overview
➜ Clinical Performance and Compliance• Encounter Data Processing• Incremental Data Capture
© Talend 2012© Copyright Talend 2011. All rights reserved. 6
Talend Usage
➜ Talend can be leveraged for combined process automation, orchestration, and execution to get increased business continuity and massive ROI
➜ Empower your business users to make changes to their analytics and business intelligence jobs with little or no IT involvement
© Talend 2012© Copyright Talend 2011. All rights reserved. 8
RADAR Overview
➜ RADAR – Risk Adjustment Data Acquisition and Analysis Repository• Medicare and Retirement Claims Related Data
• EHR – Electronic Health Record• EMR – Electronic Medical Record• HL7 – Health Level 7
© Talend 2012© Copyright Talend 2011. All rights reserved. 9
RADAR Overview – Cont.
➜ Solution Benefits• Trading Partner File Submissions Tracked via GridFS• Transactions via Collections, Correlated w/Metadata• Downstream File Outputs are Stored in GridFS• Audit Traceability from Cradle to Grave
Simplified Auditability / Research / Analysis of Submission Data by Storing in a Consistent State / Platform Removing the Reliance on Network Shares, etc.
© Talend 2012
Trading Partner Submissions
Commercial Payers
Transaction Types
Provider Direct
Hospital Data Capture
Chart Review
Delete Submissions
Government Payers
Hospitals
Physicians
Practice Management/HIS
Clearinghouses
Portal / Direct
Direct
Direct
Direct Connect
Direct Connect
Clearinghouses
Commercial Payers
Direct API
© Talend 2012
RDBMS
RADAR Architecture
MongoDB
Trading Partner Submission
CMS
Input Data Source
TranslationETL
Talend / ETL
Trading Partner
Encounters
Submissions
GridFS
Workflows
Rules
CMS Submission
Status
DetailedReports
TX Submission MMR/MOR
© Talend 2012© Copyright Talend 2011. All rights reserved. 12
How did we get here?
➜ Talend• 30 Minutes Anyone?• Orchestration – 3rd Party
➜ MongoDB• File Systems• The Data Intake Problem• RDBMS Modeling Woes
➜ Hadoop• HBase Does NoSQL Right?• Hadoop Snapshot
© Talend 2012© Copyright Talend 2011. All rights reserved. 13
30 Minutes Anyone?
SELECT * FROM TX_DETAIL WHERE DATE = ‘20140623’;
Requirements:• Input Date Parameter• Output to Excel• Run Daily• Status Email
© Talend 2012© Copyright Talend 2011. All rights reserved. 14
Orchestration – 3rd Party
➜ SAS• Autoexec Injection• System calls / error trapping
➜ SSIS / Other• System calls / error trapping
➜ SAP Web Hooks• REST• SOAP
No Utility Scripting Required
© Talend 2012© Copyright Talend 2011. All rights reserved. 15
File Systems
➜ Solution Options?• BFILE – Link / Pointer to File System Object• HDFS – Hadoop Distributed File System• GridFS - MongoDB
© Talend 2012
The Data Intake Problem
Data Source 1
Application A
Client 6 Data
Client 2 Data
Client 4 Data
Client 3 Data
Client 5 Data
Application D
Application B
Application E
Application C
MS AccessFlat File
Custom File Format
UNIX
SAS
Flat File
Image
SQL Loader
Custom Code
Custom Code
Oracle Data Integrator
UNIX
SSIS
© Talend 2012© Copyright Talend 2011. All rights reserved. 17
RDBMS Modeling Woes
© Talend 2012
HDFS2 (Redundant, Reliable Storage)
YARN (Cluster Resource Management)
BATCH(MapReduce)
INTERACTIVE(Tez)
STREAMING(Storm, Spark)
GRAPH(Giraph)
NoSQL(MongoDB)
Events(Falcon)
ONLINE(HBase)
OTHER(Search)
Tap – Transform – Deliver
TRANSFORM (Data Refinement)
PROFILE PARSEMAP CDCCLEANSE STANDARD-IZE
MACHINELEARNINGMATCH
TAP(Ingestion)
SQOOP
FLUME
HDFS API
Storm
HIVE
800+
DELIVER(as an API)
ActiveMQKaraf
NoSQLCXF/Camel
KafkaStorm
MetaSecurity
MDMiPaaS
GovernHA
© Talend 2012
Generate Pure Map Reduce
© Talend 2012
Pig Latin Script generation
© Talend 2012
HiveQL generation
© Talend 2012© Copyright Talend 2011. All rights reserved. 22
HBase Does NoSQL Right?
➜ Availability and Agility• HDFS Reliance – Upgrade Downtime• HBase Production – 2014Q1
➜ Operational Simplicity• MongoDB Backup & Restore
➜ Secondary Indexing (HBase December 2012)• Primary Key Only = Full Table Scan
© Talend 2012© Copyright Talend 2011. All rights reserved. 23
Hadoop Snapshot
➜ Existing Environment– 20 Table Join – ~4B Total Rows– ~865K Row Input Driver– ~100 Hour Execution
➜ 5 Node Cluster– Hive Query ~18 Hours– Hive Tuned ~45 Minutes
© Talend 2012 24
Hadoop Usage
➜ OLTP Platform Data Load to Hadoop• File to HDFS (via Talend)• Sqoop to HDFS / Hive (via Talend)
➜ Talend• Hive – SQL 92ish• Pig - Transformation• Map-Reduce (Data Read / Shuffle & Sort) Low Level Java• HCatalog* – Shared Schema / Data Typing
• Partition Notifications
➜ SAS• Hive
➜ Tableau / SAP BO 4• Hive
© Talend 2012© Copyright Talend 2011. All rights reserved. 25
RADAR Solution – Current State
➜ 100+ Jobs in TAC – Daily / Weekly / Monthly
➜ Moving / Migrating Jobs to Data Services (Real Time)• Leveraging tMOM and Camel
➜ Big Data• Migrating all core ODS to MongoDB• ETL/ELT and Analytics offload to Hadoop
© Talend 2012© Copyright Talend 2011. All rights reserved. 26
RADAR Solution – Future
➜ User Interface (Meteor/Node.js)• Talend Jobs & Services Exposure / Interaction• Self Service
– Audit and Traceability– Big Data Mashups
➜ Expand Data Services• API/REST/SOAP
➜ Big Data• DI / DQ YARN Tasks
© Talend 2012© Copyright Talend 2011. All rights reserved. 27
NPI Engine
➜ NPI - National Provider Identifier• NPI is a Health Insurance Portability and Accountability Act (HIPAA)
Administrative Simplification Standard. The NPI is a unique identification number for covered health care providers.
➜ Monthly NPI File Size is ~ 4GB• Contains 4+ million records• 329 Columns
• 15 Taxonomies (75 columns – 5 column for each)• 50 Other Provider Identifiers (200 columns – 4 column for each)• 2 Addresses• 3 Names• Additional Attributes
© Talend 2012© Copyright Talend 2011. All rights reserved. 28
NPPES Dataset
➜ NPI Permissible Uses• Cross Reference for Fraud Waste and Abuse• Debt Collection• Transaction Processing and Provider Communication• Coordination of Benefits• Provider to Patient Medical Record Linkage
© Talend 2012© Copyright Talend 2011. All rights reserved. 29
NPPES Dataset – Cont.
➜ Why we care• Source of Truth for NPI
• PECOS + NPPES like Mashups• Augment With Internal Data SSN/TIN
• Provider Lookup Service• Add Geo Location • Augment With Internal Provider Data
• Excluded Provider Services• Fraud Prevention and Detection
© Talend 2012© Copyright Talend 2011. All rights reserved. 30
NPPES and RADAR
➜ OIG List of Excluded Individuals/Entities➜ General Services Administration’s Excluded Parties List
System (Retired) SAM (System for Award Management)
➜ Acceptable Physician Specialty Types
➜ Possible Additions• Death Master• Provider Data Feeds
© Talend 2012© Copyright Talend 2011. All rights reserved. 31
Talend Job
© Talend 2012© Copyright Talend 2011. All rights reserved. 32
Output
© Talend 2012© Copyright Talend 2011. All rights reserved. 33
Output"taxonomies" : { "207X00000X" : { "licence" : "12637", "primary" : "Yes", "state" : "NE" } }, "addresses" : { "Mailing Address" : { "addresstype" : "Mailing Address", "addresstype_id" : 1, "line1" : "PO BOX 2168", "line2" : "", "city" : "KEARNEY", "state" : "NE", "postal_code" : "688482168", "country" : "US", "telephone" : "3088652512", "fax" : "3088652506" }, "Practice Location Address" : { "addresstype" : "Practice Location Address", "addresstype_id" : 2, "line1" : "3500 CENTRAL AVE", "line2" : "", "city" : "KEARNEY", "state" : "NE", "postal_code" : "688472944", "country" : "US", "telephone" : "3088652512", "fax" : "3088652506" } }}
© Talend 2012
Questions
© Talend 2012© Copyright Talend 2011. All rights reserved. 35
Acronyms
➜ RADAR – Risk Adjustment Data Acquisition and Analysis Repository
➜ ASM – Alternative Submission Method➜ NPI – National Provider Identifier➜ NPPES – National Plan and Provider Enumeration
System➜ CMS – Centers for Medicare & Medicaid Services➜ PECOS – Provider Enrollment Chain and Ownership
System
© Talend 2012© Copyright Talend 2011. All rights reserved. 36
References
➜ http://oig.hhs.gov/oei/reports/oei-07-09-00440.pdf - Daniel R. Levinson, Inspector General
➜ http://www.cms.gov/Regulations-and-Guidance/HIPAA-Administrative-Simplification/NationalProvIdentStand/DataDissemination.html
© Talend 2012© Copyright Talend 2011. All rights reserved. 37
Code
➜ https://github.com/Optum/MongoDBWorld2014 - Talend Code for NPI Load / Read
See Reference Links for File Specifications, etc.
© Talend 2012© Copyright Talend 2011. All rights reserved. 38
Session Links
➜ http://nppes.viva-it.com/NPI_Files.html - NPPES Dataset
➜ http://www.cms.gov/Regulations-and-Guidance/HIPAA-Administrative-Simplification/NationalProvIdentStand/DataDissemination.html - NPPES File Layout
➜ http://www.cms.gov/Medicare/Provider-Enrollment-and-Certification/MedicareProviderSupEnroll/Taxonomy.html - Taxonomy Crosswalk
© Talend 2012© Copyright Talend 2011. All rights reserved. 39
Session Links – Cont.
➜ http://www.csscoperations.com/internet/cssc3.nsf/DocsCat/CSSC~CSSC%20Operations~Risk%20Adjustment%20Processing%20System~References~94LTWU2832?open&navmenu=Risk%5EAdjustment%5EProcessing%5ESystem%7C%7C%7C%7C – Acceptable Physician Specialty Types