
© 2014 Noblis, Inc.

Detecting Fraud Conspiracy Using Organizational Big Data

June 17, 2014

Presented to the ACT-IAC Analytics and Big Data Committee


Overview of Analytic Approach

Case Examples Exposing Potential Fraud Conspiracy

• Provider K

• Provider H

• Hospital T

Next Steps and Summary



A High-Performance Computing Platform

Cray XMT2

128-node configuration with a high-bandwidth 3D torus interconnect

4 terabytes of shared memory

100 terabytes of Lustre storage

10-gigabit networking

Two Torque clusters

16 nodes and 512 cores with 512 GB of RAM

20 terabytes of usable storage

10-gigabit networking

Fraud detection algorithm development and analysis were enabled by a high-performance computing platform


An Approach to Reveal New Insights from Agency Data

Uses existing agency data sources

Provides a new way to structure and analyze data

Discovers previously unknown patterns and relationships:

• A central provider is more than 100 standard deviations from the peer average in type and volume of connections
• A small private psychiatric hospital has the same volume of distant connections as a major cancer center
• One provider has two distinct geographic clusters and is linked to entirely different kinds of providers in each location


Case Examples

Case          Likely Fraud Type
Provider K    Referral fraud, billing fraud
Provider H    Stolen or “loaned” National Provider Identifier (NPI)
Hospital T    Patient brokerage, billing fraud


Federal Agency Data Set

A public-use data set from the Centers for Medicare & Medicaid Services (CMS) that links medical providers who treat the same Medicare beneficiaries

The data provides a map of how doctors, hospitals, and other providers team together to deliver care to the 65-and-older population

Approximately 1M providers, 50M provider relationships, and over 1B claims

Each relationship records the number of times that two providers billed Medicare for the same patient within any 30-day window in 2011


Extract ‘Latent Social Network’ From Claims Data

Medicare Claims File

Claim ID   Procedure                 Provider ID            Date         Beneficiary ID
0012599    Chiropractic Adjustment   1122334455 (Jones)     2013-07-21   39485922
⁞          ⁞                         ⁞                      ⁞            ⁞
0012784    Lung Capacity Test        1234567890 (Smith)     2013-07-23   39485922
0012785    Ear Exam                  1029384756             2013-07-23   44992029

Same beneficiary: Event_1 (Jones) followed by Event_2 (Smith), with a time lag of 2 days

Does this imply a relationship between Jones and Smith?
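A minimal Python sketch of this extraction step, assuming claim rows shaped like the sample above (claim ID, provider ID, service date, beneficiary ID). Edges point from the provider on the earlier claim (Event_1) to the provider on the later claim (Event_2); the 30-day window comes from the data set description. Counting one edge per pair of claims and ordering same-day claims by provider ID are assumptions, and `latent_edges` is a hypothetical name.

```python
from collections import Counter
from datetime import date
from itertools import combinations

# Sample rows from the claims table: (claim_id, provider_id, service_date, beneficiary_id)
claims = [
    ("0012599", "1122334455", date(2013, 7, 21), "39485922"),  # Jones
    ("0012784", "1234567890", date(2013, 7, 23), "39485922"),  # Smith
    ("0012785", "1029384756", date(2013, 7, 23), "44992029"),
]


def latent_edges(claims, window_days=30):
    """Count directed provider pairs (earlier claim -> later claim) that billed
    for the same beneficiary within `window_days` of each other."""
    by_beneficiary = {}
    for _claim_id, provider, day, beneficiary in claims:
        by_beneficiary.setdefault(beneficiary, []).append((day, provider))

    edges = Counter()
    for events in by_beneficiary.values():
        events.sort()  # by date; same-day ties fall back to provider ID (assumption)
        # combinations() keeps the sorted order, so day1 <= day2 in every pair
        for (day1, p1), (day2, p2) in combinations(events, 2):
            if p1 != p2 and (day2 - day1).days <= window_days:
                edges[(p1, p2)] += 1
    return edges


print(latent_edges(claims))
# Counter({('1122334455', '1234567890'): 1})
```

Grouping by beneficiary first keeps the pairwise comparison confined to each patient's own claims, which is what makes the "same beneficiary" link cheap to find.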


Generated Data Exposes “Normal” and “Anomalous” Behavior

Jones and Smith may not know each other, or Jones may pass beneficiary IDs to Smith and get a kickback when Smith bills for them

How would we know?

A single pair of events shows nothing

Pairs of events occur with a certain frequency and distribution

Pairs of events that happen ‘too often’ are suspect
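The slides do not say how "too often" was measured. One simple possibility, sketched here as an assumption, flags a provider pair when its shared-patient count sits more than `k` standard deviations above the mean count of all pairs. `frequent_pairs` and the sample counts are hypothetical.

```python
from statistics import mean, pstdev


def frequent_pairs(edges, k=3.0):
    """Return provider pairs whose shared-patient count is more than k
    population standard deviations above the mean count of all pairs,
    most frequent first."""
    weights = list(edges.values())
    mu, sigma = mean(weights), pstdev(weights)
    if sigma == 0:
        return []  # every pair occurs equally often; nothing stands out
    flagged = [pair for pair, w in edges.items() if (w - mu) / sigma > k]
    return sorted(flagged, key=lambda pair: edges[pair], reverse=True)


# Hypothetical counts: twenty ordinary pairs plus one pair that co-occurs far more often
counts = [2, 3, 1, 2, 4, 2, 3, 1, 2, 3] * 2
edges = {(f"P{i}", f"Q{i}"): w for i, w in enumerate(counts)}
edges[("Jones", "Smith")] = 40

print(frequent_pairs(edges))
# [('Jones', 'Smith')]
```

A single huge count also inflates the standard deviation, so a robust spread measure such as the median absolute deviation may be a safer choice on real data.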

How do we exploit this data?


Two Complementary Approaches

Whole Graph Analysis
• Analyze the entire graph for outliers based on sets of patterns and normal behaviors
• Requires HPC
• Develop a ranked list of potential leads

Sub-Graph Investigation
• Examine behavioral patterns in specific cases
• Requires understanding of service delivery and types of fraud
• Vet leads for further investigation

Whole graph analysis surfaces anomalous behavior for sub-graph investigation; sub-graph investigation yields new patterns for detection.


Whole Graph Anomaly Detection

Define a measure of provider or beneficiary behavior

Define meaningful peer groups; behavior values must be computable for each member of a group

Calculate:
• the average for each peer group
• the distance of each provider from his/her/its peer average
• the standard deviation of those distances

Analyze providers or beneficiaries whose values differ significantly from their peer average

Validate each “peer-group/behavior-measure” combination using government resources that identify fraudsters
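One plausible formalization of these steps, assuming each provider p in peer group G has a behavior vector b_p, that distance is Euclidean, and that the spread is the population standard deviation of the distances (the slides specify neither the metric nor the normalization):

```latex
\begin{aligned}
\bar{\mathbf{b}}_G &= \frac{1}{|G|} \sum_{p \in G} \mathbf{b}_p
  && \text{average behavior of peer group } G \\
d_p &= \bigl\lVert \mathbf{b}_p - \bar{\mathbf{b}}_G \bigr\rVert_2
  && \text{distance of provider } p \text{ from its peer average} \\
\sigma_G &= \sqrt{\frac{1}{|G|} \sum_{p \in G} \bigl(d_p - \bar{d}_G\bigr)^2},
  \quad \bar{d}_G = \frac{1}{|G|} \sum_{p \in G} d_p
  && \text{standard deviation of the distances} \\
z_p &= \frac{d_p}{\sigma_G}, \quad \text{flag } p \text{ when } z_p > 3
  && \text{standard deviations from the peer average}
\end{aligned}
```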


Example Behavior: Outbound Links to Medical Specialties

Each provider has a set of outbound links, for example:

NPI:12345, podiatry
NPI:23456, radiology
NPI:45678, radiology
NPI:56789, internist
NPI:67890, cardiology
NPI:54321, physical therapy
NPI:65432, dermatology

(The two radiology links carry weights of 95 and 127.)

Specialty          Outbound Weight Sum   Number of Outbound Links
radiology          222                   2
physical therapy   144                   1
dermatology        90                    1
internist          77                    1
cardiology         70                    1

Counting the peers in each specialty gives an idea of the provider’s social network, in terms of the specialties his patients see

Adding up the link weights for each specialty shows the strength of his association with each specialty

Both the size of the social network and the weight of the links contribute to overall behavior
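A minimal sketch of the two per-specialty aggregates in the tables above, assuming each outbound link is an (NPI, specialty, weight) record. Only the radiology total (222) is given, so the 95/127 split across the two radiology NPIs is a hypothetical assignment; the podiatry entry (NPI:12345) is left out on the assumption that it is the provider being profiled. `specialty_profile` is a hypothetical name.

```python
from collections import Counter

# Outbound links of the example provider: (npi, specialty, weight).
# The split of 95 and 127 between the two radiology NPIs is a hypothetical
# assignment; the slide only shows their total of 222.
links = [
    ("23456", "radiology", 95),
    ("45678", "radiology", 127),
    ("56789", "internist", 77),
    ("67890", "cardiology", 70),
    ("54321", "physical therapy", 144),
    ("65432", "dermatology", 90),
]


def specialty_profile(links):
    """Return (number of linked peers, summed link weight) per specialty."""
    counts, weights = Counter(), Counter()
    for _npi, specialty, weight in links:
        counts[specialty] += 1
        weights[specialty] += weight
    return counts, weights


counts, weights = specialty_profile(links)
print(weights.most_common())
# [('radiology', 222), ('physical therapy', 144), ('dermatology', 90), ('internist', 77), ('cardiology', 70)]
print(counts["radiology"])
# 2
```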









Behavior: What Is “Normal”?

For each specialty (e.g., podiatry), calculate the average outbound referral weight to every other specialty

For each provider, calculate the distance from his/her/its specialty average

Calculate the standard deviation of those distances

Ad hoc definition of an outlier:

“A provider whose behavior vector is more than 3 Std. Dev. from his/her/its specialty average.”

[Figure labels: Average for podiatry; 1 Std Dev]
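A minimal sketch of the ad hoc outlier rule, assuming behavior vectors are dictionaries of outbound weight per linked specialty, that a missing specialty counts as zero, that distance is Euclidean, and that a provider's score is its distance divided by the population standard deviation of the distances within its peer group. `peer_outliers` and the sample providers are hypothetical.

```python
import math
from statistics import mean, pstdev


def peer_outliers(vectors, peer_group, threshold=3.0):
    """Return {provider: score} for providers more than `threshold` standard
    deviations from their peer-group average.

    vectors:    provider -> {specialty: outbound weight}
    peer_group: provider -> peer-group label (e.g., the provider's specialty)
    """
    groups = {}
    for provider, group in peer_group.items():
        groups.setdefault(group, []).append(provider)

    outliers = {}
    for members in groups.values():
        dims = sorted({s for p in members for s in vectors[p]})
        # Group average behavior vector; a missing specialty counts as weight 0
        avg = {s: mean(vectors[p].get(s, 0) for p in members) for s in dims}
        # Euclidean distance of each member from the group average
        dist = {
            p: math.sqrt(sum((vectors[p].get(s, 0) - avg[s]) ** 2 for s in dims))
            for p in members
        }
        sigma = pstdev(list(dist.values()))  # standard deviation of the distances
        if sigma == 0:
            continue  # all members equally far from the average: no outliers
        for p, d in dist.items():
            if d / sigma > threshold:
                outliers[p] = d / sigma
    return outliers


# Hypothetical peer group: twenty similar providers plus one extreme provider "K"
vectors = {f"P{i}": {"radiology": 100 + i, "physical therapy": 50 + i % 3} for i in range(20)}
vectors["K"] = {"radiology": 100, "physical therapy": 50, "chiropractic": 5000}
peer_group = {p: "podiatry" for p in vectors}

print(sorted(peer_outliers(vectors, peer_group)))
# ['K']
```

Dividing by the spread of distances, rather than by the spread of any single specialty, yields one score per provider, which is what allows a single ranked list across the whole graph.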


Whole-Graph Anomaly Detection

Surname   State   Peer Group           Number of Standard Deviations From Peer Group Average
K         NY      Chiropractor         122.12
B         NY      Physical Therapy
B3        VT      Family Medicine      79.13
D2        IL      Social Worker        61.15
D         NY      Emergency Medicine
E         TX      Specialist           46.33
S         TX      Optometrist          46.01
K3        IL      OB/GYN               45.23
V         MA      Internal Medicine    43.62
G2        LA      Neurology            40.04

[Figure: number of providers by standard deviations from peer group average (axis 0 to 140), with outliers K, B, B3, D, and D2 labeled]

1% of providers are more than 3 standard deviations from their peer group average

The anomaly detection process naturally generates a ranked list of ‘leads’ by sorting providers in descending order of extremity

The top ten anomalous providers are shown in the table
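Producing the ranked list is then a sort by score. This sketch, with hypothetical scores and the hypothetical function `rank_leads`, also reports the share of providers above the 3-standard-deviation cutoff, the kind of figure quoted above for the full data set.

```python
def rank_leads(scores, top_n=10, threshold=3.0):
    """Sort providers by descending score and report the fraction of all
    providers whose score exceeds `threshold`."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return [], 0.0
    share = sum(1 for _, s in ranked if s > threshold) / len(ranked)
    return ranked[:top_n], share


# Hypothetical scores (standard deviations from each provider's peer average)
scores = {"P1": 0.4, "P2": 1.2, "P3": 7.5, "P4": 3.4, "P5": 2.9}
leads, share = rank_leads(scores, top_n=3)
print(leads)  # [('P3', 7.5), ('P4', 3.4), ('P5', 2.9)]
print(share)  # 0.4
```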


Case #1: Provider K

Identified pattern: Many standard deviations away from the peer average in type and volume of connections

Likely fraud type: Referral fraud, billing fraud

Qualifying questions:
• Is this a data error?
• Who is he linked to in the graph?
• Is it really a large institution in the wrong peer group?
• Has he already been caught?


Examine the Local Subgraphs of Suspicious Cases: Use the Latent Social Network Graph to Build Cases

We assembled “case” descriptions and pulled information from the Web about the persons and organizations behind the anomalous providers at the top of the list

Starting with ‘K’ at the top, a single query revealed that ‘K’, ‘B’, and ‘D’ had extremely heavy links to each other, so they were treated as the core of a single case

Provider ‘K’ has a large latent social network

Provider ‘K’ has large edge weight totals, which reflect how many times his name appeared on Medicare claims

Provider K   Links   Weights
Inbound      446     874,960
Outbound     460     890,565
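A minimal sketch of two queries behind this slide, assuming the latent network is stored as a dictionary of directed edge weights: a per-provider summary of inbound and outbound link counts and weight totals (as in the table above), and a check for pairs of top-ranked leads joined by heavy links in either direction. The function names, the `min_weight` cutoff, and the sample graph are hypothetical.

```python
from collections import defaultdict


def link_summary(edges, node):
    """Number of links and total edge weight into and out of `node`.
    edges: {(source, target): weight}"""
    inbound = [w for (src, dst), w in edges.items() if dst == node]
    outbound = [w for (src, dst), w in edges.items() if src == node]
    return {"inbound": (len(inbound), sum(inbound)),
            "outbound": (len(outbound), sum(outbound))}


def heavy_core(edges, leads, min_weight):
    """Pairs of leads whose combined weight in both directions is at least
    `min_weight`; candidates for the core of a single case."""
    combined = defaultdict(int)
    for (src, dst), w in edges.items():
        if src in leads and dst in leads and src != dst:
            combined[frozenset((src, dst))] += w
    return sorted(tuple(sorted(pair)) for pair, w in combined.items() if w >= min_weight)


# Hypothetical edge weights among a few top-ranked leads and two other providers
edges = {("K", "B"): 900, ("B", "K"): 850, ("K", "D"): 700, ("D", "B"): 650,
         ("K", "X"): 5, ("Y", "K"): 3, ("E", "S"): 4}
print(link_summary(edges, "K"))
# {'inbound': (2, 853), 'outbound': (3, 1605)}
print(heavy_core(edges, {"K", "B", "D", "E", "S"}, min_weight=500))
# [('B', 'D'), ('B', 'K'), ('D', 'K')]
```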


Provider K Case: Network Diagram of Key Associates

The diagram is centered on “Provider K,” who is the most central provider in his social network in terms of edge weights (line thickness reflects edge weight)

• Two providers in K’s network (“N” and “G,” highlighted) are listed in “Final Audit” documents on the website of the New York State Office of the Medicaid Inspector General (OMIG) as having received overpayments from Medicare and Medicaid

• Based on our research, there may be additional anomalous behavior in K’s network of providers (B, D, K2); however, none of these providers is currently listed on the OMIG website


Provider K Case: Case File Includes Known Legal Actions Against Entities

The New York State OMIG website includes ‘Final Audit’ documents describing overpayments ranging from a few thousand dollars to millions


Provider K Case: Google Street View of Business Practice Locations

Four addresses are listed on Provider K’s website:
• Only three can actually be found on Google Maps
• Two of the locations are shown at the right

Were hundreds of thousands of chiropractic adjustments done in these storefronts in 2011?


Case #2: Provider H

Identified pattern: Has connections in distant geographic locations and is linked to entirely different kinds of providers in each location

Likely fraud type: Stolen or “loaned” National Provider Identifier

Qualifying questions:
• How far does the typical rehab doc’s network extend?
• Did the provider move during 2011?
• What does a rehab doc’s ‘social network’ look like?


Whole Graph Analysis Discovered a Physician With Two Distinct Geographic Clusters

Most provider-provider connections are local (<60 miles); some are distant (hospitals, radiologists, labs)

Some providers (2 percent to 10 percent, depending on specialty) move during the course of the year; their connections also show this bifurcated pattern

A rehab physician in Colorado appears to have a compromised provider ID
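Separating local from distant links needs a distance between provider locations. The slides do not say how distance was computed; this sketch assumes geocoded (latitude, longitude) points and uses the haversine great-circle formula, with the 60-mile local cutoff quoted above. Function names and coordinates are illustrative.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def split_links(home, neighbors, local_miles=60):
    """Partition a provider's linked neighbors into local and distant lists.
    home: (lat, lon); neighbors: [(name, lat, lon), ...]"""
    local, distant = [], []
    for name, lat, lon in neighbors:
        miles = haversine_miles(home[0], home[1], lat, lon)
        (local if miles < local_miles else distant).append(name)
    return local, distant


# Approximate coordinates, for illustration only
denver = (39.7392, -104.9903)
neighbors = [("lab in Aurora, CO", 39.7294, -104.8319),
             ("clinic in Houston, TX", 29.7604, -95.3698)]
print(split_links(denver, neighbors))
# (['lab in Aurora, CO'], ['clinic in Houston, TX'])
```

A provider whose distant links concentrate around one remote location, rather than spreading across hospitals, radiologists, and labs, shows the bifurcated pattern described above; telling that apart from a mid-year move would additionally need the claim dates.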


Analysis of Geography + Social Networks

Network data integrated with geographic data revealed new behaviors

“Colorado links” form a profile consistent with a rehab physician: radiology, cardio care, rehab

“Texas links” form a profile consistent with primary care: other primary care providers, hospitals, labs
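To compare the "Colorado links" with the "Texas links", one option (an assumption, not necessarily what the authors did) is to count linked specialties per state and measure how much the two profiles overlap with cosine similarity; a value near 0 means the two locations involve entirely different kinds of providers. The sample links and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def profiles_by_state(links):
    """Count linked specialties separately for each state.
    links: [(linked_provider_state, linked_provider_specialty), ...]"""
    profiles = defaultdict(Counter)
    for state, specialty in links:
        profiles[state][specialty] += 1
    return profiles


def cosine_similarity(c1, c2):
    """Cosine similarity of two specialty-count profiles (0 means no overlap)."""
    dot = sum(c1[k] * c2[k] for k in c1)  # Counter returns 0 for missing keys
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


# Hypothetical links for one provider ID, echoing the Colorado/Texas split
links = [("CO", "radiology"), ("CO", "cardiology"), ("CO", "rehab"), ("CO", "radiology"),
         ("TX", "primary care"), ("TX", "hospital"), ("TX", "lab"), ("TX", "primary care")]
profiles = profiles_by_state(links)
print(dict(profiles["CO"]))
# {'radiology': 2, 'cardiology': 1, 'rehab': 1}
print(cosine_similarity(profiles["CO"], profiles["TX"]))
# 0.0
```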


Case #3: Hospital T

Identified pattern: Same volume of distant, one-way connections as a major cancer center, but is a small community psych facility

Likely fraud type: Patient brokerage, billing fraud

Qualifying questions:
• Given a case with convicted fraudsters, can we find them in our data?
• What does their behavior look like in our data?
• Are other providers linked to them in suspicious ways?


Analyzing a Known Case Using the Network Data

Start with a known fraud case

Find links of other providers to and from the known fraudsters

Based on knowledge of healthcare delivery, evaluate whether the kind, weight, physical distance, etc. of each link is plausible; implausible links become the starting point of further investigation

Refocus on ‘implausibly linked’ providers

Filter out plausibly linked providers

Include and study other linked providers

Consolidate discoveries into a coherent case description

Distill knowledge into a repeatable algorithm
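A minimal sketch of the first filtering pass, assuming the graph is a dictionary of directed edge weights and that domain knowledge is expressed as a caller-supplied plausibility rule. The rule shown (out-of-state links are plausible only when a clinical laboratory is involved), the provider attributes, and `implausible_links` are all hypothetical; real rules would also weigh link kind and weight.

```python
def implausible_links(edges, seeds, is_plausible):
    """Links between a known fraudster and any other provider that fail a
    caller-supplied plausibility test.
    edges: {(source, target): weight}; seeds: known fraudsters"""
    suspects = []
    for (src, dst), w in edges.items():
        if (src in seeds) != (dst in seeds):  # exactly one endpoint is a seed
            if not is_plausible(src, dst, w):
                suspects.append((src, dst, w))
    return suspects


# Hypothetical provider attributes: (state, kind)
info = {"P": ("GA", "psychiatric hospital"), "W1": ("AL", "community hospital"),
        "L1": ("RI", "clinical laboratory"), "G1": ("GA", "primary care")}


def same_state_or_lab(src, dst, _weight):
    """Example rule (an assumption): out-of-state links are plausible only
    when one side is a clinical laboratory."""
    (s_state, s_kind), (d_state, d_kind) = info[src], info[dst]
    return s_state == d_state or "clinical laboratory" in (s_kind, d_kind)


seeds = {"P"}
edges = {("W1", "P"): 40, ("L1", "P"): 15, ("G1", "P"): 22}
suspects = implausible_links(edges, seeds, same_state_or_lab)
print(suspects)  # [('W1', 'P', 40)]

# Refocus: the non-seed ends of implausible links seed the next round
next_seeds = {end for src, dst, _ in suspects for end in (src, dst) if end not in seeds}
print(next_seeds)  # {'W1'}
```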



Start with Known Fraud Case

From 2003 through August 2012, Provider P billed Medicare nearly $70M for services that were not properly rendered, for patients who did not qualify for the services billed, and for patients who were procured through bribes and kickbacks

Four convictions were obtained in June 2013 for conspiracy, healthcare fraud, wire fraud, and kickback offenses

In a separate plea in March 2013, a patient broker pleaded guilty to receiving kickbacks

Can these providers be found in our data?

Can we use the data to understand their fraudulent behavior?


Walk the Graph: Start With Provider P

Nearest-neighbor sub-graph of Provider P

Edges weighted by number of shared patients (line thickness)

Nodes labeled by grouped primary taxonomy and color coded by state:
• Red: Alabama
• Green: California
• Dark Blue: Florida
• Light Blue: Rhode Island (clinical laboratory)
• Black: Colorado (clinical laboratory)

Why are there one-way links to Provider P from a community hospital and a rehab physician in Wetumpka, Alabama?
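A minimal sketch of the two graph queries on this slide, assuming directed edges weighted by shared patients: the nearest-neighbor (radius-1) sub-graph around a provider, and the neighbors whose links to it are one-way. Node names, weights, and function names are hypothetical.

```python
def ego_network(edges, center):
    """Nearest-neighbor sub-graph: the center, every provider linked to it in
    either direction, and all edges among those providers."""
    neighbors = {d for s, d in edges if s == center} | {s for s, d in edges if d == center}
    nodes = neighbors | {center}
    return {(s, d): w for (s, d), w in edges.items() if s in nodes and d in nodes}


def one_way_inbound(edges, center):
    """Providers that link to `center` with no link back from `center`."""
    return sorted(s for s, d in edges if d == center and (center, s) not in edges)


# Hypothetical weights = number of shared patients
edges = {("P", "L1"): 20, ("L1", "P"): 18, ("W_hosp", "P"): 40, ("W_rehab", "P"): 35,
         ("W_hosp", "W_rehab"): 12, ("X", "W_hosp"): 5}
print(len(ego_network(edges, "P")))  # 5
print(one_way_inbound(edges, "P"))  # ['W_hosp', 'W_rehab']
```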


Re-Center Around Wetumpka and Focus on Links Outside of Alabama

• Red: Georgia
• Orange: Florida
• Green: Louisiana

[Figure labels: Known fraud; Previously unknown; Previously unknown other Georgia network; Previously unknown other Louisiana network; Incidental other connections]

We re-center the analysis around the unusual Wetumpka providers and extract their out-of-state linkages

The result is a map of previously unknown, likely fraudulent activity

What is going on here?
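Re-centering amounts to treating the unusual providers as new centers and keeping only their links that cross a state line. This sketch assumes a provider-to-state lookup; `out_of_state_links`, the node names, and the weights are hypothetical.

```python
def out_of_state_links(edges, state_of, centers):
    """Links between a center and a provider outside the center's state,
    grouped by that provider's state.
    edges: {(source, target): weight}; state_of: provider -> state"""
    by_state = {}
    for (src, dst), w in edges.items():
        # Check the link from each end; links between two centers are skipped
        for center, other in ((src, dst), (dst, src)):
            if center in centers and other not in centers and state_of[other] != state_of[center]:
                by_state.setdefault(state_of[other], []).append((src, dst, w))
    return by_state


# Hypothetical re-centered view around two Wetumpka, Alabama providers
state_of = {"W_hosp": "AL", "W_rehab": "AL", "P": "GA", "N1": "GA", "LA1": "LA", "AL2": "AL"}
edges = {("W_hosp", "P"): 40, ("W_rehab", "P"): 35, ("W_hosp", "N1"): 22,
         ("W_rehab", "LA1"): 30, ("W_hosp", "AL2"): 50, ("W_hosp", "W_rehab"): 12}
for state, links in sorted(out_of_state_links(edges, state_of, {"W_hosp", "W_rehab"}).items()):
    print(state, links)
# GA [('W_hosp', 'P', 40), ('W_rehab', 'P', 35), ('W_hosp', 'N1', 22)]
# LA [('W_rehab', 'LA1', 30)]
```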


Graph Analysis Reveals Divergent Behaviors

Louisiana provider draws primarily local patients (Green: Louisiana; Blue: Mississippi)

Georgia provider draws patients from across the US (Dark Red: Alabama; Bright Red: Georgia; Orange: Florida; Dark Blue: Michigan; Azure: Colorado)


Quantifying Divergent Behavior

The observed pattern for Hospital T is most similar to that of major national cancer research centers, yet Hospital T has only 59 beds

The pattern is the number of long-distance, one-way patient links into a small hospital

This pattern analysis enables the identification of other suspect cases

Org Name     ToCity   ToState   All Links In   Links in
Hospital T            GA        34             29
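A minimal sketch of the pattern measure, assuming link distances in miles are already available and taking "long distance" to mean more than 60 miles (the local cutoff used earlier in the deck; the threshold actually used here is not stated). It returns, per receiving provider, all inbound links and the inbound links that are both one-way and long-distance, mirroring the two link counts in the table above. Names and data are hypothetical.

```python
def distant_one_way_inbound(miles, threshold=60):
    """Per target provider: (all inbound links, inbound links that are one-way
    and longer than `threshold` miles), ranked by the second count.
    miles: {(source, target): distance in miles}; keys double as the edge set."""
    counts = {}
    for src, dst in miles:
        all_in, distant = counts.get(dst, (0, 0))
        one_way = (dst, src) not in miles  # no reciprocal link back
        far = miles[(src, dst)] > threshold
        counts[dst] = (all_in + 1, distant + int(one_way and far))
    return sorted(counts.items(), key=lambda item: item[1][1], reverse=True)


# Hypothetical links into a small hospital "T" and a local hospital "H"
miles = {("a", "T"): 400, ("b", "T"): 900, ("c", "T"): 1200, ("d", "T"): 30,
         ("e", "T"): 700, ("T", "e"): 700, ("f", "H"): 10, ("H", "f"): 10, ("g", "H"): 15}
ranking = distant_one_way_inbound(miles)
print(ranking[0])  # ('T', (5, 3))
```

Comparing these counts with facility size (for example, bed count) would single out small facilities whose pattern resembles a national referral center.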





In Summary

Our approach:
• Leverages our high-performance computing platform to rapidly detect and qualify leads
• Exposes previously unknown fraudulent behavior
• Can be applied to almost any large data set
• Increases in power by combining data sets