data mining in health insurance

Data mining in Health Insurance

Introduction

• Rob Konijn, [email protected]– VU University Amsterdam– Leiden Institute of Advanced Computer Science (LIACS)– Achmea Health Insurance

• Currently working here• Delivering leads for other departments to follow up

– Fraud, abuse

• Research topic keywords: data mining/ unsupervised learning / fraud detection

2

Outline

• Intro Application– Health Insurance– Fraud detection

• Part 1: Subgroup discovery • Part 2: Anomaly detection (slides partly

by Z. Slavik, VU)

Intro Application

• Health Insurance Data• Health Insurance in NL

– Obligatory– Only private insurance companies– About 100 euro/month(everyone)+170 euro (income)– Premium increase of 5-12% each year

Achmea: about 6 million customers

Funding of Health Insurance Costs in the Netherlands

vereveningsfonds

verzekerde zorgverzekeraar

rijksbijdrageverzekerden 18-

2 mld

inkomensafh.bijdragewerkgevers 17 mld

30 mld

zorguitgaven

vereveningsbijdrage

18 mld

nominale premie 18+:

- rekenpremie (~€ 947/vrz): 12 mld- opslag (~€ 150/vrz) : 2 mld

vereveningsfondsvereveningsfondsvereveningsfondsvereveningsfondsvereveningsfondsvereveningsfonds

zorgverzekeraar

vereveningsfonds

Verevenings-model• By population

characteristics– Age– Gender– Income, social class– Type of work

• Calculation afterwards– High costs

compensation (>15.000 euro)

30 - 34 jr98035 - 39 jr1,044

50 - 54 jr

2,394

1,639

45 - 49 jr

55 - 59 jr60 - 64 jr 1,885

1,1831,354

40 - 44 jr

25 - 29 jr 870

1,400 0 - 4 jr1,026 5 - 9 jr90710 - 14 jr96415 - 17 jr89218 - 24 jr

905

3,34980 - 84 jr75 - 79 jr

65 - 69 jr

3,42490 jr e.o.

2,8263,244

70 - 74 jr

3,464

Mannen

85 - 89 jr

1,876

1,7131,905

1,366

2,560

1,476

2,201

1,768

1,532

1,232

Vrouwen

2,8863,0183,0343,014

918

1,2141,062

9361,210

Fraude in de zorg

Introduction Application:The Data

• Transactional data– Records of an event– Visit to a medical practitioner

• Charged directly by medical practioner• Patient is not involved• Risk of fraud

Transactional Data

• Transactions: Facts– Achmea:

About 200 mln transactions per year

• Info of customers and practitioners: dimensions

Different levels of hierarchy

• Records represent events• However, for example for fraud detection, we are

interested in customers, or medical practitoners

• See examples next pages• Groups of records: Subgroup Discovery• Individual patients/practioners: outlier detection

Different types of fraud hierarchy

• On a patient level, or on a hospital level:

Handling different hierarchy

• Creating profiles from transactional data• Aggregating costs over a time period

– Each record: patient• Each attribute i =1 to n: cost spent on treatment i

• Feature construction, for example– The ratio of long/short consults (G.P.)– The ratio of 3-way and 2 way fillings (Dentist)– Usually used for one-way analysis

Different types of fraud detection

• Supervised– A labeled fraud set– A labeled non-fraud set– Credit cards, debit cards

• Unsupervised– No labels– Health Insurance, Cargo, telecom, tax etc.

Unsupervised learning in Health Insurance Data

• Anomaly Detection (outlier detection)– Finding individual deviating points

• Subgroup Discovery– Finding (descriptions of) deviating groups

• Focus on differences and uncommon behavior– In contrast to other unsupervised learning methods

• Clustering• Frequent Pattern mining

Subgroup Discovery

• Goal: Find differences in claim behavior of medical practitioners

• To detect inefficient claim behavior– Actions:

• A visit from the account manager• To include in contract negotiations

– In the extreme case: fraud• Investigation by the fraud detection department

• By describing deviations of a practitioner from its peers– Subgroups

Patient-level, Subgroup Discovery

• Subgroup (orange): group of patients• Target (red)

– Indicates whether a patient visited a practitioner (1), or not (0)

Subgroup Discovery: Quality Measures

• Target Dentist: 1672 patiënten– Compare with peer group, 100.000 patients in

total

• Subgroup V11 > 42 euro : 10347 patients– V11: one sided filling

• Crosstable

target dentist rest totaal

V11 >= 42 871 9476 10347rest 801 88852 89653totaal 1672 98328 100000

The cross table

• Cross table in data

• Cross table expected:

• Assuming independence

target dentist rest totalV11 >= 42 173 10174 10347

rest 1499 88154 89653

total 1672 98328 100000

target dentist rest totalV11 >= 42 871 9476 10347rest 801 88852 89653total 1672 98328 100000

Calculating Wracc and Lift

• Size subgroup = P(S) = 0.10347, size target dentist = P(T) = 0.01672• Weighted Relative ACCuracy (WRAcc) = P(ST) – P(S)P(T) = (871 –

173)/100000 = 689/100000• Lift = P(ST)/P(S)P(T) = 871/173 = 5.03

target dentist rest totalV11 >= 42 173 10174 10347

rest 1499 88154 89653

total 1672 98328 100000

target dentist rest totalV11 >= 42 871 9476 10347rest 801 88852 89653total 1672 98328 100000

Example dentistry, at depth 1, one target dentist

ROC analysis, target dentist

Making SD more useful: adding prior knowledge

• Adding prior knowledge– Background variables patient (age, gender, etc.)– Specialism practitioner– For dentistry: choice of insurance

• Adding already known differences– Already detected by domain experts themselves– Already detected during a previous data mining run

Prior Knowledge, Motivation

Example, influence of prior knowledge

The idea: create an expected cross table using prior knowledge

Quality Measures• Ratio (Lift)

• Difference (WRAcc)

• Squared sum (Chi-square statistic)

Example, iterative approach

• Idea: add subgroup to prior knowledge iteratively• Target = single pharmacy• Patients that visited the hospital in last 3 years removed

from data• Compare with peer group (400,000 patients), 2929 patiënts

of target pharmacy• Top subgroup : “B03XA01 (Erythropoietin)>0 euro”

subgroup T F

T 1297 224

F 1632 396,847

B03XA01 > 0

1 ‘target’ pharmacy

rest

rest

http://en.wikipedia.org/wiki/Erythropoietin

Next iteration• Add “B03XA01 (EPO) >0 euro” to prior knowledge• Next best subgroup: “N05AX08 (Risperdal)>= 500 euro”

Figure describing subgroup: N05AX08 > 500

Left: target pharmacy, right: other pharmacies

Addition: adding costs to quality measure

– M55: dental cleaning– V11: 1-way filling– V21: polishing

• Cost of treatments in subgroup 370 euro (average)• 791 more patients than expected• Total quality 791*370 = 292,469 euro

Iterative approach, top 3 subgroups

V12: 2-sided filling V21: polishing V60: indirect pulpa covering

V21 and V60 are not allowed on the same day Claim back (from all dentists): 1.3 million euro

3d isometrics, cost based QM

Other target types: double binary target

• Target 1: year: 2009 or 2008• Target 2: target practitioner

• Pattern:– M59: extensive (expensive) dental cleaning– C12: second consult in one year

• Crosstable:

Other target types: Multiclass target

• Subgroup (orange): group of patients• Target (red), now is a multi-value column, one

value per dentist

Multiclass target, in ROC Space

Anemaly Detection

The example above contains a contextual anomaly...

Outline Anomaly Detection

• Anomalies– Definition– Types– Technique categories– Examples

• Lecture based on– Chandola et al. (2009). Anomaly

Detection: A Survey– Paper in BB

38

Definition

• “Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior”

• Anomalies, aka.– Outliers– Discordant observations– Exceptions– Aberrations– Surprises– Peculiarities– Contaminants

39

Anomaly typesPoint anomalies

– A data point is anomalous with respect to the rest of the data

40

Not covered today

• Other types of anomalies:– Collective anomalies– Contextual anomalies

• Other detection approaches:– Supervised learning– Semi supervised

• Assume training data is from normal class• Use to detect anomalies in the future

We focus on outlier scores

• Scores– You get a ranked list of anomalies– “We investigate the top 10”– “An anomaly has a score of at least 134”– Leads followed by fraud investigators

• Labels

42

ANOMAL

Y

Detection method categorisation

1. Model based2. Depth based3. Distance Based

4. Information theory related (not covered)5. Spectral theory related (not covered)

43

Model based

• Build a (statistical) model of the data

• Data instances occur in high probability regions of a stochastic model, while anomalies occur in low probability regions

• Or: data instances have a high distance to the model are outliers

• Or: data instances have a high influence on the model are outliers

Example: one way outlier detection

• Pharmacy records• Records represent patients• One attribute at a time:

– This example: attribute describing the costs spent on fertility medication (gonodatropin) in a year

• We could use such one way detection for each attribute in the data

Example, model = parametric probability density function

Example, model = non-parametric distribution

• Left: kernel density estimate• Right: boxplot

Example: regression model

Other models possible

• Probabilistic– Bayesian networks

• Regression models– Regression trees/ random forests– Neural networks

• Outlier score = prediction error (residual)

Depth based methods

• Applied on 1-4 dimensional datasets– Or 1-4 attributes at a time

• Objects that have a high distance to the “center of the data” are considered outliers

• Example Pharmacy:– Records represent patients– 2 attributes:

• Costs spent on diabetes medication • Costs spent on diabetes testing material

Example: bagplot, halfspace depth

Distance based (nearest neighbor based)

• Assumption:– Normal data instances occur in dense neighbourhoods,

while anomalies occur far from their closest neighbours

Similarity/distance

• You need a similarity measure between two data points– Numeric attributes: Eucledian, etc.– Nominal: simple match often enough– Multivariate:

• Distance using all attributes• Distance between attribute values, then combine

Example, dentistry data

• Records represent dentists

• Attributes are 14 cost categories– Denote the percentage

of patients that received a claim from the category

Option 1:Distance to kth neighbour as anomaly

score

Option 2:Use relative densities of neighbourhoods

• Density of neighbourhood estimated for each instance

• Instances in the low density neighbourhoods are anomalous, others normal

• Note:– Distance to kth neighbour is an estimate for the

inverse of density (large distance low density)– But this estimates outliers in varying density

neighbourhoods badly

56

LOF• Local Outlier Factor:• Local density:

– k divided by the volume of the smallest hyper-sphere centred around the instance, containing k neighbours

• Anomalous instance:– Local density will be

lower than that ofthe k nearest neighbours

57

Average local density of k nearest neighboursLocal density of instance

Average local density of k nearest neighboursLocal density of instance

Example LOF outlier, dentistry

3. Clustering based a.d. techniques

• 3 possibilities;1. Normal data instances belong to a cluster in

the data, while anomalies do not belong to any cluster– Use clustering methods that do not force all

instances to belong to a cluster• DBSCAN, ROCK, SSN

2. Distance to the cluster center = outlier score3. Clusters with too few points are outlying

clusters59

K-means with 6 clusters, centers of the dentistry data set

• Attributes: percent of patient that received claim from cost category

• Clusters correspond to specialism1. Dentist2. Orthodontist3. Orthodontist

(charged by dentist)

4. Dentist5. Dentist6. Dental hygenist

Combining Subgroup Discovery and Outlier Detection

• Describe regions with outliers using SD• Identify suspicious medical practitioners• 2 or 3 step approach to describe outliers:

1. Calculate outlier score2. Use subgroup discovery to describe regions with

outliers.3. (optional) identify the involved medical

practitioners

Example output:

• Look at patients with ‘P30>1050 euro’ for practitioner number 221

• Left: all data, right: practitioner 221

Descriptions of outliers: LOCI outlier score

• 1. Calculate outlier score – LOCI is a density based

outlier score• 2. Describe outlying

regions• Result top subgroup:

– Orthodontics (dentist) 0.044 ^ Orthodontics 0.78

– Group of 9 dentists with an average score of 3.9

Conclusions

• Health insurance: Interesting application domain– Very relevant

• Outlier Detection and Subgroup discovery are useful

data mining in health insurance

Documents

fraud detection departmentby

fraud seta

nonfraud setcredit cards

medical practionerpatient

patient level

medical practitionercharged

anomaly detection slides

hospital level