cs490d: introduction to data mining prof. chris clifton
DESCRIPTION
CS490D: Introduction to Data Mining Prof. Chris Clifton. April 14, 2004 Fraud and Misuse Detection. What is Fraud Detection?. Identify wrongful actions Is right and wrong universal? If so, why not just prevent wrong actions Identify actions by the wrong people Identify suspect actions - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/1.jpg)
CS490D:Introduction to Data Mining
Prof. Chris Clifton
April 14, 2004
Fraud and Misuse Detection
![Page 2: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/2.jpg)
What is Fraud Detection?
• Identify wrongful actions– Is right and wrong universal?– If so, why not just prevent wrong actions
• Identify actions by the wrong people
• Identify suspect actions– Legal– But probably not right
![Page 3: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/3.jpg)
In Data Mining terms…
• Classification?– Classify into fraudulent and non-fraudulent
behavior– What do we need to do this?
• Outlier Detection– Assume non-fraudulent behavior is normal– Find the exceptions
• Problems?
![Page 4: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/4.jpg)
– +–
Solution: Differential Profiling
• Determine individual behavior– What is normal for the
individual– What separates one
individual from another
• Gives profile of individual behavior
• How do we do this?Profile
ClassificationMining
ProfileProfile
+ +–
![Page 5: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/5.jpg)
Has this been done?Intrusion Detection (Lane&Brodley)
• Profiled computer users based on command sequences– Command– Some (but not all) argument information– Sequence information
![Page 6: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/6.jpg)
ResultsAccuracy Time to Alarm
![Page 7: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/7.jpg)
Scaling Issues
• What happens with millions of users?– Credit card– Cell phone
• What about new users?
• Ideas?
![Page 8: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/8.jpg)
Multi-user profiles
• Cluster users
• Develop profiles for clusters– E.g., differential profiling
• Old customers: Do they match profile for their cluster?– Allows wider range of acceptable behavior
• New customer: Do they match any profile?
![Page 9: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/9.jpg)
Data mining for detection and prevention
![Page 10: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/10.jpg)
Matching known fraud/non-compliance
• Which new cases are similar to known cases?
• How can we define similarity?• How can we rate or score
similarity?
![Page 11: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/11.jpg)
Anomalies and irregularities
• How can we detect anomalous or unusual behavior?
• What do we mean by usual?• Can we rate or score cases on
their degree of anomaly?
![Page 12: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/12.jpg)
Techniques used to identify fraud
Predict and Classify– Regression
algorithms (predict numeric outcome): neural networks, CART, Regression, GLM
– Classification algorithms (predict symbolic outcome): CART, C5.0, logistic regression
Group and Find Associations
– Clustering/Grouping algorithms: K-means, Kohonen, 2Step, Factor analysis
– Association algorithms: apriori, GRI, Capri, Sequence
![Page 13: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/13.jpg)
Techniques for finding fraud:
• Predict the expected value for a claim, compare that with the actual value of the claim.
• Those cases that fall far outside the expected range should be evaluated more closely
![Page 14: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/14.jpg)
Techniques for finding fraud:
•Build a profile of the characteristics of fraudulent behavior.•Pull out the cases that meet the historical characteristics of fraud.
Decision Trees and Rules
![Page 15: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/15.jpg)
Techniques for finding fraud:
• Group behavior using a clustering algorithm
• Find groups of events using the association algorithms
• Identify outliers and investigate
Clustering and Associations
![Page 16: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/16.jpg)
Fraud detection using CRISP-DM
Provides a systematic way to detect fraud and abuse
Ensures auditing and investigative efforts are maximized
Continually assesses and updates models to identify new emerging fraud patterns
Leads to higher recoupments
![Page 17: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/17.jpg)
Data mining in action: Fraud, waste and abusecase studies
![Page 18: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/18.jpg)
Payment Error Prevention
…used this information to focus their auditing effort
The US Health Care Finance Administration needed to isolate the likely causes of payment error by developing a profile of acceptable billing practices and...
![Page 19: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/19.jpg)
Payment error prevention solution
• Clementine™• Using audited discharge records, built
profiles of appropriate decisions such as diagnosis coding and admission
• Matched new cases• Cases not matching are audited
![Page 20: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/20.jpg)
Payment error prevention results
• Detected 50% of past incorrect payments – resulting in significant recovery of funding lost to payment errors
• PRO analysts able to use resultant Clementine models to prevent future error
![Page 21: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/21.jpg)
Billing and payment fraud
Identified suspicious cases to focus investigations
The US Defense Finance and Accounting Service needed to find fraud in millions of Dept of Defense transactions and...
![Page 22: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/22.jpg)
Billing and payment fraud solution
• Clementine• Detection models based on known fraud
patterns• Analyzed all transactions – scored based on
similarity to these known patterns• High scoring transactions flagged for
investigation
![Page 23: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/23.jpg)
Billing and payment fraud results
• Identified over 1,200 payments for further investigation
• Integrated the detection process• Anomaly detection methods (e.g.,
clustering) will serve as ‘sentinel’ systems for previously undetected fraud patterns
![Page 24: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/24.jpg)
Audit selection
Focused audit investigations on cases with the highest likely
adjustments
The Washington State Department of Revenue needed to detect erroneous tax returns and...
![Page 25: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/25.jpg)
Audit selection solution
• Clementine• Using previously audited returns • Model adjustment (recovery) per auditor
hour based on return information• Models will then score future returns
showing highest potential adjustment
![Page 26: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/26.jpg)
Audit selection results
• Maximizes auditors’ time by focusing on cases likely to yield the highest return
• Closes the ‘tax gap’
![Page 27: CS490D: Introduction to Data Mining Prof. Chris Clifton](https://reader035.vdocuments.us/reader035/viewer/2022062517/56813b2c550346895da3f312/html5/thumbnails/27.jpg)
Data mining - key to detecting and preventing fraud, waste and abuse
• Learn from the past– High quality, evidence based
decisions• Predict
– Prevent future instances• React to changing circumstances
– Models kept current, from latest data