![Page 1: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/1.jpg)
Review of Fraud Classification Using Principal Components Analysis of RIDITS
By Louise A. Francis
Francis Analytics and Actuarial Data Mining, Inc.
![Page 2: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/2.jpg)
Objectives
Address question: Why use new method, PRIDIT?
Introduce other methods used in similar circumstances
Explain how PRIDIT adds to methods available
Explain limitations of PRIDIT/RIDIT
![Page 3: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/3.jpg)
A Key Problem in Fraud Modeling
Most data mining methods need a target (dependent) variable
Y = a + b1x1 + b2x2 + … + bnxn
Fraud (Yes/No or Fraud Score) = f(predictor variables)
Need sample of data where claims have been determined to be fraudulent or legitimate
![Page 4: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/4.jpg)
Dependent variable hard to get
In a large sample of automobile insurance claims perhaps 1/3 may have an element of abuse or fraud
Scarce resources are not expended on such large volumes of claims to determine their legitimacy
Only a small percentage are referred to SIU investigators or other investigations
There are time lags in determining the outcome of investigations
![Page 5: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/5.jpg)
Unsupervised learning
Another approach that does not require a dependent variable
Two key kinds:
Cluster Analysis
Principal Components/Factor Analysis (PRIDIT uses this approach; it is applied to ordered categorical variables)
![Page 6: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/6.jpg)
Cluster Analysis
Records are grouped in categories that have similar values on the variables
Examples:
Marketing: People with similar values on demographic variables (e.g., age, gender, income) may be grouped together for marketing
Text analysis: Use words that tend to occur together to classify documents
Note: no dependent variable used in analysis
![Page 7: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/7.jpg)
Clustering
Common methods: k-means, hierarchical
No dependent variable: records are grouped into classes with similar values on the variables
Start with a measure of similarity or dissimilarity
Maximize dissimilarity between members of different clusters
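The grouping idea above can be illustrated with a minimal k-means sketch. It assumes numeric variables and squared Euclidean distance; the function name `kmeans` and the two synthetic groups of records are hypothetical and are not taken from the presentation's data.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means (Lloyd's algorithm) on a numeric array X of shape (records, variables)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct records chosen at random as initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every record to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its records (keep it if the cluster is empty).
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers

# Hypothetical data: two well-separated groups of 50 records each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(10.0, 1.0, size=(50, 2))])
labels, centers = kmeans(X, k=2)
print(np.bincount(labels))
```

No dependent variable appears anywhere in the sketch: the only inputs are the records and the number of clusters.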
![Page 8: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/8.jpg)
Dissimilarity (Distance) Measure: Continuous Variables
Euclidean Distance
$$d_{ij} = \left( \sum_{k=1}^{m} (x_{ik} - x_{jk})^2 \right)^{1/2}$$
Manhattan Distance
$$d_{ij} = \sum_{k=1}^{m} |x_{ik} - x_{jk}|$$
where i, j = records and k = variable
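As a small worked illustration of the two distance measures, the sketch below computes both for a pair of records. The function names and the example records are hypothetical.

```python
import numpy as np

def euclidean_distance(x_i, x_j):
    """Square root of the summed squared differences across the m variables."""
    x_i, x_j = np.asarray(x_i, dtype=float), np.asarray(x_j, dtype=float)
    return float(np.sqrt(np.sum((x_i - x_j) ** 2)))

def manhattan_distance(x_i, x_j):
    """Sum of the absolute differences across the m variables."""
    x_i, x_j = np.asarray(x_i, dtype=float), np.asarray(x_j, dtype=float)
    return float(np.sum(np.abs(x_i - x_j)))

# Two hypothetical records measured on m = 3 variables.
record_i = [1.0, 2.0, 3.0]
record_j = [4.0, 6.0, 3.0]
print(euclidean_distance(record_i, record_j))  # 5.0
print(manhattan_distance(record_i, record_j))  # 7.0
```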
![Page 9: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/9.jpg)
Binary Variables
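| | Row Variable = 1 | Row Variable = 0 | Total |
|---|---|---|---|
| Column Variable = 1 | a | b | a+b |
| Column Variable = 0 | c | d | c+d |
| Total | a+c | b+d | |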
![Page 10: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/10.jpg)
Binary Variables
Simple Matching
$$d = \frac{b + c}{a + b + c + d}$$
Rogers and Tanimoto
$$d = \frac{2(b + c)}{(a + d) + 2(b + c)}$$
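The two binary dissimilarity measures can be sketched as follows, using the cell counts a, b, c, d of the 2x2 table for a pair of records. The helper names and the example records are hypothetical; both measures are symmetric in b and c, so it does not matter which record plays the row role.

```python
import numpy as np

def binary_counts(u, v):
    """Cell counts of the 2x2 table for two records of 0/1 variables."""
    u, v = np.asarray(u, dtype=bool), np.asarray(v, dtype=bool)
    a = int(np.sum(u & v))     # both 1
    b = int(np.sum(u & ~v))    # 1 in u, 0 in v
    c = int(np.sum(~u & v))    # 0 in u, 1 in v
    d = int(np.sum(~u & ~v))   # both 0
    return a, b, c, d

def simple_matching(u, v):
    """Share of variables on which the two records disagree."""
    a, b, c, d = binary_counts(u, v)
    return (b + c) / (a + b + c + d)

def rogers_tanimoto(u, v):
    """Like simple matching, but disagreements are given double weight."""
    a, b, c, d = binary_counts(u, v)
    return 2 * (b + c) / ((a + d) + 2 * (b + c))

# Two hypothetical claims described by six yes/no indicators.
claim_1 = [1, 1, 0, 0, 1, 0]
claim_2 = [1, 0, 0, 1, 1, 0]
print(binary_counts(claim_1, claim_2))    # (2, 1, 1, 2)
print(simple_matching(claim_1, claim_2))  # 0.333...
print(rogers_tanimoto(claim_1, claim_2))  # 0.5
```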
![Page 11: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/11.jpg)
Example: Fraud Data
Data from 1993 closed claim study conducted by Automobile Insurers Bureau of Massachusetts
Claim files often have variables which may be useful in assessing suspicion of fraud, but a dependent variable is often not available
Variables used for clustering:
Legal representation
Prior claim
SIU investigation
At fault
Police report
Number of providers
![Page 12: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/12.jpg)
Statistics for Clusters
Based on descriptive statistics, Cluster 2 appears to have a higher likelihood of fraudulent claims (more about this later)

| Cluster | Police Report | Medical Audit | At Fault | Legal Rep | SIU Investigation | Number Providers |
|---|---|---|---|---|---|---|
| 1 | 46.7% | 0.1% | 42.2% | 6.1% | 0.0% | 2 |
| 2 | 49.8% | 5.9% | 2.4% | 96.0% | 6.5% | 4 |

Percentage Yes (Number Providers is a count)
![Page 13: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/13.jpg)
Principal Components Analysis
A form of dimension (variable) reduction
Suppose we want to combine all the information related to the “financial” dimension of fraud:
Medical provider bill (indicative of padding claim)
Hospital bill
Number of providers
Economic losses
Claimed wages
Incurred losses
![Page 14: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/14.jpg)
Principal Components
These variables are correlated but not perfectly correlated
We replace many variables with a weighted sum of the variables
![Page 15: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/15.jpg)
Correlation Matrix for Variables

| | Number Providers | Medical Bill | Provider Paid | Economic Losses | Incurred | Hospital Pymt |
|---|---|---|---|---|---|---|
| Number Providers | 1.000 | 0.387 | 0.571 | 0.382 | 0.382 | 0.168 |
| Medical Bill | 0.387 | 1.000 | 0.539 | 0.952 | 0.952 | 0.922 |
| Provider Paid | 0.571 | 0.539 | 1.000 | 0.531 | 0.531 | 0.327 |
| Economic Losses | 0.382 | 0.952 | 0.531 | 1.000 | 1.000 | 0.888 |
| Incurred | 0.382 | 0.952 | 0.531 | 1.000 | 1.000 | 0.888 |
| Hospital Pymt | 0.168 | 0.922 | 0.327 | 0.888 | 0.888 | 1.000 |
![Page 16: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/16.jpg)
Finding Factor or Component
The correlation matrix is used to find the factor that explains the most variance (captures most of the correlation) for the set of variables
That component or factor extracted will be a weighted average of the variables
More than one Component or Factor may result from applying the method
![Page 17: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/17.jpg)
Evaluating Importance of Variables
Use factor loadings
Component Matrix

| Variable | Loading |
|---|---|
| Number Providers | 0.497 |
| Medical Bill | 0.974 |
| Provider Paid | 0.646 |
| Economic Losses | 0.976 |
| Incurred | 0.976 |
| Hospital Pymt | 0.886 |
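Loadings of this kind can be derived from a correlation matrix by an eigendecomposition. The sketch below assumes the first component's loadings are the leading eigenvector scaled by the square root of its eigenvalue, and it uses the rounded correlations shown earlier in the deck, so its output need not match the figures above exactly. The variable names in the code are illustrative.

```python
import numpy as np

names = ["Number Providers", "Medical Bill", "Provider Paid",
         "Economic Losses", "Incurred", "Hospital Pymt"]

# Rounded correlation matrix as printed on the earlier slide.
corr = np.array([
    [1.000, 0.387, 0.571, 0.382, 0.382, 0.168],
    [0.387, 1.000, 0.539, 0.952, 0.952, 0.922],
    [0.571, 0.539, 1.000, 0.531, 0.531, 0.327],
    [0.382, 0.952, 0.531, 1.000, 1.000, 0.888],
    [0.382, 0.952, 0.531, 1.000, 1.000, 0.888],
    [0.168, 0.922, 0.327, 0.888, 0.888, 1.000],
])

# eigh returns eigenvalues in ascending order for a symmetric matrix.
eigvals, eigvecs = np.linalg.eigh(corr)
first_val = eigvals[-1]
first_vec = eigvecs[:, -1]
if first_vec.sum() < 0:          # eigenvector sign is arbitrary; make it positive
    first_vec = -first_vec

# Loading = correlation between each variable and the first component.
loadings = first_vec * np.sqrt(first_val)
explained = first_val / corr.shape[0]   # share of total variance captured

for name, loading in zip(names, loadings):
    print(f"{name:18s} {loading:.3f}")
print(f"Share of variance explained: {explained:.3f}")
```

A variable with a loading near 1 moves almost in step with the component, while a lower loading indicates a weaker contribution.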
![Page 18: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/18.jpg)
Problem: Categorical Variables
It is not clear how to best perform Principal Components/Factor Analysis on categorical variables
The categories may be coded as a series of binary dummy variables
If the categories are ordered categories, you may lose important information
This is the problem that PRIDIT addresses
![Page 19: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/19.jpg)
RIDIT
Variables are ordered so that lowest value is associated with highest probability of fraud
Use Cumulative distribution of claims at each value, i, to create RIDIT statistic for claim t, value i
$$R_{ti} = \sum_{j<i} \hat{p}_{tj} - \sum_{j>i} \hat{p}_{tj}$$
![Page 20: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/20.jpg)
Example: RIDIT for Legal Representation
Legal Representation

| Value | Code | Number | Proportion | Proportion Below | Proportion Above | RIDIT |
|---|---|---|---|---|---|---|
| Yes | 1 | 706 | 0.504 | 0.000 | 0.496 | -0.496 |
| No | 2 | 694 | 0.496 | 0.504 | 0.000 | 0.504 |
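The Legal Representation figures can be reproduced with a short sketch that takes the RIDIT for each value as the proportion of claims below it minus the proportion above it. The function name `ridit_scores` is hypothetical; the counts 706 and 694 come from the example.

```python
import numpy as np

def ridit_scores(counts):
    """RIDIT for each ordered value: proportion below minus proportion above."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()      # proportion of claims at each value
    cum = np.cumsum(p)
    below = cum - p                # proportion at strictly lower values
    above = 1.0 - cum              # proportion at strictly higher values
    return below - above

# Legal Representation: code 1 = Yes (706 claims), code 2 = No (694 claims).
ridits = ridit_scores([706, 694])
print(np.round(ridits, 3))  # [-0.496  0.504]
```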
![Page 21: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/21.jpg)
PRIDIT
Use RIDIT statistics in Principal Components Analysis
Component Matrix (Component 1)

| Variable | Loading |
|---|---|
| SIU | .248 |
| Police Report | .220 |
| At Fault | .709 |
| Legal Rep | .752 |
| Medical Audit | .341 |
| Prior Claim | .406 |

Extraction method: Principal Component Analysis. One component extracted.
![Page 22: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/22.jpg)
Scoring
Assign a score to each claim
The score can be used to sort claims
More effort expended on claims more likely to be fraudulent or abusive
In the case of AIB data, we can use additional information to test how well PRIDIT did, using the PRIDIT score
A suspicion score was assigned to each claim by an expert
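The scoring step might be sketched end to end as follows. This is an illustration under stated assumptions, not the procedure used in the study: it converts each variable to RIDITs, takes the first principal component of their correlation matrix, and scores each claim as the weighted sum of its standardized RIDITs. The data are synthetic, and `latent`, `ridit_transform`, and `pridit_scores` are hypothetical names.

```python
import numpy as np

def ridit_transform(column):
    """Replace each ordered category code with its RIDIT (below minus above)."""
    values, inverse, counts = np.unique(column, return_inverse=True,
                                        return_counts=True)
    p = counts / counts.sum()
    cum = np.cumsum(p)
    ridit_by_value = (cum - p) - (1.0 - cum)
    return ridit_by_value[inverse]

def pridit_scores(data):
    """Score claims on the first principal component of the RIDIT matrix."""
    R = np.column_stack([ridit_transform(data[:, k])
                         for k in range(data.shape[1])])
    corr = np.corrcoef(R, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    w = eigvecs[:, -1]                 # weights of the first component
    if w.sum() < 0:                    # fix the arbitrary sign
        w = -w
    Z = (R - R.mean(axis=0)) / R.std(axis=0)   # standardized RIDITs
    return Z @ w, w

# Synthetic claims (assumption): code 1 = Yes, code 2 = No, and a hidden
# suspicion level makes a "Yes" more likely on every indicator.
rng = np.random.default_rng(0)
n_claims, n_vars = 500, 5
latent = rng.random(n_claims)
p_yes = 0.05 + 0.9 * latent
data = np.where(rng.random((n_claims, n_vars)) < p_yes[:, None], 1, 2)

scores, weights = pridit_scores(data)
ranked = np.argsort(scores)   # lowest score first, i.e. most suspicious first
print(ranked[:10])
```

Because the lowest code is associated with the highest probability of fraud, lower scores flag the claims that would receive more investigative effort.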
![Page 23: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/23.jpg)
PRIDIT vs. Suspicion Score
[Figure: Suspicion Score vs PRIDIT Score. Horizontal axis: Suspicion Score; vertical axis: PRIDIT Score, from -1.50 to 1.00]
![Page 24: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/24.jpg)
Clustering and Suspicion Score
Report: Mean Suspicion Level by TwoStep Cluster Number

| TwoStep Cluster Number | Mean Suspicion Level |
|---|---|
| 1 | .6445 |
| 2 | 3.3737 |
| Total | 1.9643 |
![Page 25: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/25.jpg)
Result
There appears to be a strong relationship between PRIDIT score and suspicion that claim is fraudulent or abusive
The clusters resulting from the cluster procedure also appeared to be effective in separating legitimate from fraudulent or abusive claims
![Page 26: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/26.jpg)
Comparison: PRIDIT and Clustering
PRIDIT gives a score, which may be very useful for claims sorting. Clustering assigns claims to classes. They are either in or out of the assigned class.
Clustering ignores information about the order of values for categorical variables
Clustering can accommodate both categorical and continuous variables
![Page 27: Review of Fraud Classification Using Principal Components Analysis of RIDITS](https://reader036.vdocuments.us/reader036/viewer/2022062321/56812d1f550346895d921ae6/html5/thumbnails/27.jpg)
Comparison
Unordered categorical variables with many values (e.g., injury type):
Clustering has a procedure for measuring dissimilarity for these variables and can use them in clustering
If the values for the variables contain no meaningful order, PRIDIT will not help in creating variables to use in Principal Components Analysis.