how unitedhealth group integrated open data into its mongodb-based information system

38
© Talend 2012 Ecosystem How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System Cedric Carbone CTO at Talend Twitter: @carbone Matt Axsom Lead Architect at Optum [email protected]

Upload: mongodb

Post on 29-Aug-2014

153 views

Category:

Technology


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012

Ecosystem

How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

Cedric CarboneCTO at TalendTwitter: @carbone

Matt AxsomLead Architect at [email protected]

Page 2: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012

Key Facts• Founded in 2006

• $102m raised since inception

• 400+ employees in 7 countries

• Open Core business model• Subscription license• Services & training

Solutions• Highly scalable integration solutions addressing Big Data,

Application Integration, Data Integration, Data Quality, MDM, BPM

• Recognized as a visionary leader in the integration market by Gartner and Forrester

Talend OverviewGrowth-Driven Deployment Model

Revenue: 108% CAGR

2007 2008 2009 2010 2011 2012 2013

BRAND AWARENESS

VIBRANT COMMUNITY

MONETIZATION

CUSTOMER LOYALTY

Page 3: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 3

The Talend Difference

Page 4: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012

Talend Open Studio

Page 5: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 5

Optum Overview

➜ Clinical Performance and Compliance• Encounter Data Processing• Incremental Data Capture

Page 6: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 6

Talend Usage

➜ Talend can be leveraged for combined process automation, orchestration, and execution to get increased business continuity and massive ROI

➜ Empower your business users to make changes to their analytics and business intelligence jobs with little or no IT involvement

Page 7: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 8

RADAR Overview

➜ RADAR – Risk Adjustment Data Acquisition and Analysis Repository• Medicare and Retirement Claims Related Data

• EHR – Electronic Health Record• EMR – Electronic Medical Record• HL7 – Health Level 7

Page 8: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 9

RADAR Overview – Cont.

➜ Solution Benefits• Trading Partner File Submissions Tracked via GridFS• Transactions via Collections, Correlated w/Metadata• Downstream File Outputs are Stored in GridFS• Audit Traceability from Cradle to Grave

Simplified Auditability / Research / Analysis of Submission Data by Storing in a Consistent State / Platform Removing the Reliance on Network Shares, etc.

Page 9: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012

Trading Partner Submissions

Commercial Payers

Transaction Types

Provider Direct

Hospital Data Capture

Chart Review

Delete Submissions

Government Payers

Hospitals

Physicians

Practice Management/HIS

Clearinghouses

Portal / Direct

Direct

Direct

Direct Connect

Direct Connect

Clearinghouses

Commercial Payers

Direct API

Page 10: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012

RDBMS

RADAR Architecture

MongoDB

Trading Partner Submission

CMS

Input Data Source

TranslationETL

Talend / ETL

Trading Partner

Encounters

Submissions

GridFS

Workflows

Rules

CMS Submission

Status

DetailedReports

TX Submission MMR/MOR

Page 11: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 12

How did we get here?

➜ Talend• 30 Minutes Anyone?• Orchestration – 3rd Party

➜ MongoDB• File Systems• The Data Intake Problem• RDBMS Modeling Woes

➜ Hadoop• HBase Does NoSQL Right?• Hadoop Snapshot

Page 12: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 13

30 Minutes Anyone?

SELECT * FROM TX_DETAIL WHERE DATE = ‘20140623’;

Requirements:• Input Date Parameter• Output to Excel• Run Daily• Status Email

Page 13: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 14

Orchestration – 3rd Party

➜ SAS• Autoexec Injection• System calls / error trapping

➜ SSIS / Other• System calls / error trapping

➜ SAP Web Hooks• REST• SOAP

No Utility Scripting Required

Page 14: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 15

File Systems

➜ Solution Options?• BFILE – Link / Pointer to File System Object• HDFS – Hadoop Distributed File System• GridFS - MongoDB

Page 15: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012

The Data Intake Problem

Data Source 1

Application A

Client 6 Data

Client 2 Data

Client 4 Data

Client 3 Data

Client 5 Data

Application D

Application B

Application E

Application C

MS AccessFlat File

PDF

Custom File Format

UNIX

SAS

Flat File

Image

SQL Loader

Custom Code

Custom Code

Oracle Data Integrator

UNIX

SSIS

Page 16: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 17

RDBMS Modeling Woes

Page 17: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012

HDFS2 (Redundant, Reliable Storage)

YARN (Cluster Resource Management)

BATCH(MapReduce)

INTERACTIVE(Tez)

STREAMING(Storm, Spark)

GRAPH(Giraph)

NoSQL(MongoDB)

Events(Falcon)

ONLINE(HBase)

OTHER(Search)

Tap – Transform – Deliver

TRANSFORM (Data Refinement)

PROFILE PARSEMAP CDCCLEANSE STANDARD-IZE

MACHINELEARNINGMATCH

TAP(Ingestion)

SQOOP

FLUME

HDFS API

Storm

HIVE

800+

DELIVER(as an API)

ActiveMQKaraf

NoSQLCXF/Camel

KafkaStorm

MetaSecurity

MDMiPaaS

GovernHA

Page 18: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012

Generate Pure Map Reduce

Page 19: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012

Pig Latin Script generation

Page 20: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012

HiveQL generation

Page 21: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 22

HBase Does NoSQL Right?

➜ Availability and Agility• HDFS Reliance – Upgrade Downtime• HBase Production – 2014Q1

➜ Operational Simplicity• MongoDB Backup & Restore

➜ Secondary Indexing (HBase December 2012)• Primary Key Only = Full Table Scan

Page 22: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 23

Hadoop Snapshot

➜ Existing Environment– 20 Table Join – ~4B Total Rows– ~865K Row Input Driver– ~100 Hour Execution

➜ 5 Node Cluster– Hive Query ~18 Hours– Hive Tuned ~45 Minutes

Page 23: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012 24

Hadoop Usage

➜ OLTP Platform Data Load to Hadoop• File to HDFS (via Talend)• Sqoop to HDFS / Hive (via Talend)

➜ Talend• Hive – SQL 92ish• Pig - Transformation• Map-Reduce (Data Read / Shuffle & Sort) Low Level Java• HCatalog* – Shared Schema / Data Typing

• Partition Notifications

➜ SAS• Hive

➜ Tableau / SAP BO 4• Hive

Page 24: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 25

RADAR Solution – Current State

➜ 100+ Jobs in TAC – Daily / Weekly / Monthly

➜ Moving / Migrating Jobs to Data Services (Real Time)• Leveraging tMOM and Camel

➜ Big Data• Migrating all core ODS to MongoDB• ETL/ELT and Analytics offload to Hadoop

Page 25: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 26

RADAR Solution – Future

➜ User Interface (Meteor/Node.js)• Talend Jobs & Services Exposure / Interaction• Self Service

– Audit and Traceability– Big Data Mashups

➜ Expand Data Services• API/REST/SOAP

➜ Big Data• DI / DQ YARN Tasks

Page 26: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 27

NPI Engine

➜ NPI - National Provider Identifier• NPI is a Health Insurance Portability and Accountability Act (HIPAA)

Administrative Simplification Standard. The NPI is a unique identification number for covered health care providers.

➜ Monthly NPI File Size is ~ 4GB• Contains 4+ million records• 329 Columns

• 15 Taxonomies (75 columns – 5 column for each)• 50 Other Provider Identifiers (200 columns – 4 column for each)• 2 Addresses• 3 Names• Additional Attributes

Page 27: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 28

NPPES Dataset

➜ NPI Permissible Uses• Cross Reference for Fraud Waste and Abuse• Debt Collection• Transaction Processing and Provider Communication• Coordination of Benefits• Provider to Patient Medical Record Linkage

Page 28: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 29

NPPES Dataset – Cont.

➜ Why we care• Source of Truth for NPI

• PECOS + NPPES like Mashups• Augment With Internal Data SSN/TIN

• Provider Lookup Service• Add Geo Location • Augment With Internal Provider Data

• Excluded Provider Services• Fraud Prevention and Detection

Page 29: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 30

NPPES and RADAR

➜ OIG List of Excluded Individuals/Entities➜ General Services Administration’s Excluded Parties List

System (Retired) SAM (System for Award Management)

➜ Acceptable Physician Specialty Types

➜ Possible Additions• Death Master• Provider Data Feeds

Page 30: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 31

Talend Job

Page 31: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 32

Output

Page 32: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 33

Output"taxonomies" : { "207X00000X" : { "licence" : "12637", "primary" : "Yes", "state" : "NE" } }, "addresses" : { "Mailing Address" : { "addresstype" : "Mailing Address", "addresstype_id" : 1, "line1" : "PO BOX 2168", "line2" : "", "city" : "KEARNEY", "state" : "NE", "postal_code" : "688482168", "country" : "US", "telephone" : "3088652512", "fax" : "3088652506" }, "Practice Location Address" : { "addresstype" : "Practice Location Address", "addresstype_id" : 2, "line1" : "3500 CENTRAL AVE", "line2" : "", "city" : "KEARNEY", "state" : "NE", "postal_code" : "688472944", "country" : "US", "telephone" : "3088652512", "fax" : "3088652506" } }}

Page 33: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012

Questions

Page 34: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 35

Acronyms

➜ RADAR – Risk Adjustment Data Acquisition and Analysis Repository

➜ ASM – Alternative Submission Method➜ NPI – National Provider Identifier➜ NPPES – National Plan and Provider Enumeration

System➜ CMS – Centers for Medicare & Medicaid Services➜ PECOS – Provider Enrollment Chain and Ownership

System

Page 36: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 37

Code

➜ https://github.com/Optum/MongoDBWorld2014 - Talend Code for NPI Load / Read

See Reference Links for File Specifications, etc.

Page 38: How UnitedHealth Group Integrated Open Data into its MongoDB-Based Information System

© Talend 2012© Copyright Talend 2011. All rights reserved. 39

Session Links – Cont.

➜ http://www.csscoperations.com/internet/cssc3.nsf/DocsCat/CSSC~CSSC%20Operations~Risk%20Adjustment%20Processing%20System~References~94LTWU2832?open&navmenu=Risk%5EAdjustment%5EProcessing%5ESystem%7C%7C%7C%7C – Acceptable Physician Specialty Types