02 san francisco guo 11 soa



Data Mining Techniques & Its Applications in Insurance

Society of Actuaries

San Francisco Spring Meeting

June 24 - 26, 2002

Lijia Guo, PhD, ASA, MAAA

University of Central Florida

Session 11L


Slide 2

Learning Objectives

Understanding a Data Mining Process

Gaining insight into the actuarial applications of data mining techniques

Exploring the prospects for applying data mining techniques in your own practice




Slide 3

Agenda

Introduction

Data Mining Methods

Actuarial Applications

Conclusions & Questions


Slide 4

Introduction

Changes in Information Technology

Availability of large quantities of insurance data

Mind your business by mining your data




Slide 5

What is Data Mining?

An information discovery process.

Prediction

-- Finding unknown values/relationships/patterns from a known large database

Description

-- Interpretation of a large database

Making crucial business decisions: turning newfound knowledge into actionable results


Slide 6

Why Use Data Mining?

Product development

Marketing

Analysis of Claims Distribution

Healthcare ALM

Fraud detection

Solvency analysis




Slide 7

Data Mining Methods

Classification

Regression

Clustering

Summarization

Dependency modeling

Deviation Detection


Slide 8

Data Mining Algorithms

Decision Trees (Breiman et al., 1984)

Logistic regression (Hosmer & Lemeshow, 1989)

Neural Networks (Bishop, 1995; Ripley, 1996)

Fuzzy Logic

Genetic Algorithms (Goldberg, 1989)

Bayesian analysis (Cheeseman et al., 1988)

Hybrid algorithms




Slide 9


Data Mining Algorithms -- Decision Trees

What are decision trees

How decision trees work

Choosing variables

Grouping

Creating the leaf nodes of the tree

Strengths and weaknesses
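
The "choosing variables" and "grouping" steps can be illustrated with a minimal sketch. It assumes a CART-style criterion (Gini impurity, as in Breiman et al., 1984); the slides do not say which criterion was used in the case study, and the toy records and the helper names `gini`, `split_gain`, and `best_variable` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gain(rows, labels, variable):
    """Impurity reduction obtained by grouping the rows on one categorical variable."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[variable], []).append(label)
    n = len(labels)
    weighted = sum(len(group) / n * gini(group) for group in groups.values())
    return gini(labels) - weighted


def best_variable(rows, labels, variables):
    """Choose the variable whose grouping reduces impurity the most."""
    return max(variables, key=lambda v: split_gain(rows, labels, v))


# Hypothetical toy records, not taken from the case study.
rows = [
    {"gender": "F", "union": "yes"},
    {"gender": "F", "union": "no"},
    {"gender": "M", "union": "yes"},
    {"gender": "M", "union": "no"},
]
labels = ["survived", "survived", "died", "died"]

chosen = best_variable(rows, labels, ["gender", "union"])
print("split on:", chosen)
```

Applying the same selection recursively inside each resulting group, until a stopping rule is met, yields the leaf nodes of the tree.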


Slide 10

Data Mining Algorithms -- Neural Networks

What are Neural Networks

How Neural Networks work

Processing elements

Training

Predicting

Strengths and weaknesses
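
As a rough illustration of processing elements, training, and predicting, the sketch below fits a single processing element (a weighted sum followed by a sigmoid) by gradient descent. It is a simplified assumption for exposition, not the network used in the presentation; the toy data (logical OR), learning rate, and epoch count are hypothetical.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """One processing element: weighted sum of inputs, then sigmoid activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(samples, targets, epochs=2000, rate=0.5):
    """Fit the element by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in zip(samples, targets):
            error = predict(weights, bias, inputs) - target
            weights = [w - rate * error * x for w, x in zip(weights, inputs)]
            bias -= rate * error
    return weights, bias


# Hypothetical toy data: the logical OR of two binary inputs.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 1, 1, 1]

weights, bias = train(samples, targets)
outputs = [round(predict(weights, bias, s)) for s in samples]
print(outputs)
```

A full network stacks many such elements in layers; the training loop keeps the same shape, with errors propagated backward through the layers.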




Slide 11


Data Mining Algorithms -- Hybrid Algorithms

Problems with standard algorithms

Advanced algorithms

Discovery-driven approaches

Mixture of algorithms


Slide 12

Data Mining: Knowledge Discovery Process

Data Acquisition

Data integration

Data exploration

Model building

Understanding your model

Post-mining analysis




Slide 13

Data Mining Process: Data Acquisition

Data acquisition

Getting your data

Data qualification issues

Data quality issues

Data derivation

Defining a study

Basic risk characteristics


Slide 14

Data Mining Process: Data Acquisition -- Case Study

SOA database for RP-2000 Mortality Tables

10,957,103 exposed life-years

Subset of the database that includes all lives above age 70 (3,769,956 exposed life-years, 217,490 deaths)

Risk groups

Age, gender, participation status, union, pay type, collar type, annuity amount, etc.
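
The figures on this slide support a quick arithmetic check. The sketch below assumes that "exp" on the slide abbreviates exposed life-years, so that deaths divided by exposure gives a crude annual death rate for the above-70 subset; the variable names are illustrative.

```python
total_exposed_life_years = 10_957_103  # whole RP-2000 database
subset_exposed_life_years = 3_769_956  # lives above age 70 ("exp" read as exposure)
subset_deaths = 217_490

# Crude death rate: deaths per exposed life-year in the above-70 subset.
crude_rate = subset_deaths / subset_exposed_life_years

# Share of the full database exposure that falls in the subset.
subset_share = subset_exposed_life_years / total_exposed_life_years

print(f"crude death rate above age 70: {crude_rate:.4f}")
print(f"share of total exposure: {subset_share:.3f}")
```

Under that reading, the subset covers roughly a third of the total exposure, with a crude death rate a little under 6% per exposed life-year.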




Slide 15

Data Mining Process: Data Acquisition -- Case Study

Existing studies on advanced-age mortality

Smooth extension of the patterns

Families of curves - Gompertz law, etc.

All these approaches aim at explaining the age pattern of mortality.

Mortality distribution varies among seniors with different backgrounds
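
For reference, the Gompertz law named on this slide is usually written as a force of mortality that grows exponentially with age, which makes its logarithm linear in age. The symbols B and c are the conventional parameter names and do not come from the slides.

```latex
\[
\mu_x = B\,c^{x}, \qquad B > 0,\ c > 1
\quad\Longrightarrow\quad
\log \mu_x = \log B + x \log c .
\]
```

A curve of this family describes mortality through age alone, which is why it cannot capture differences among seniors of the same age with different backgrounds.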


Slide 16

Data Mining Process: Data Integration

To identify the factors that influence mortality

To study the interaction of the risk factors

To gain perspective on the importance of these factors




Slide 17


Slide 18

Data Mining Process: Data Integration -- Case Study

A main effect exists for all six variables considered

The degrees of the effects of the risk factors are different

The interaction of these factors

The importance of the factors




Slide 19

Data Mining Process: Data Exploration

Decision tree algorithm

Analyze the influences and the importance of the mortality risk factors

Observations are grouped into several segments

Algorithm - SAS/Enterprise Miner Version 4.2 (2001).

Further study the interaction and the importance of the risk factors


Slide 20

Data Mining Process: Data Integration -- Case Study

Variable Importance Measure

Variable              Importance
Participation Status  1.00
Gender                0.75
Annuity Size          0.43
Pay Type              0.21
Union                 0.18
Collar                0.00




Slide 21



Slide 22





Slide 23

Data Mining Process: Model Building

Six risk groups:

Employees

Beneficiaries

Combined

Disabled

Male Retirees

Female Retirees

Logistic regression method


Slide 24

Data Mining Process: Model Building -- Case Study: Female Retiree




Slide 25

Data Mining Process: Model Building -- Case Study: Female Retiree Group

Collar and Pay Type are two important variables

An interaction between Collar and Pay Type does exist

Neither annuity size nor union is picked up by the tree algorithm


Slide 26

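Data Mining Process: Model Building -- Case Study: Female Retiree Group

R-square for the regression is 0.95

Logistic model for \log\frac{p}{1-p} with terms in x, x^2, C, and PT

Coefficient magnitudes: 17.97 (constant), 0.26 (x), 0.00087 (x^2), 0.046 (a term in C and PT) [operator signs and exact term placement are illegible in the source]

\[
C = \begin{cases}
0.0047 & \text{mixed collar} \\
0 & \text{blue collar} \\
0 & \text{white collar}
\end{cases}
\qquad
PT = \begin{cases}
0 & \text{salaried pay type} \\
0.051 & \text{hourly pay type} \\
0.033 & \text{combined pay type}
\end{cases}
\]

where p is the mortality rate and x is the age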
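
A logistic regression models the log-odds log(p / (1 - p)) of the mortality rate p, so the fitted value has to be inverted to recover p itself. The sketch below shows only that inversion; the function names are illustrative and the input value is hypothetical, not a fitted value from the case study.

```python
import math


def log_odds(p):
    """Log-odds (logit) of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def mortality_rate(eta):
    """Invert log(p / (1 - p)) = eta to recover the mortality rate p."""
    return 1.0 / (1.0 + math.exp(-eta))


eta = -2.0  # hypothetical value of the fitted right-hand side
p = mortality_rate(eta)
print(f"log-odds {eta} corresponds to p = {p:.4f}")
```

Because the inverse is monotone, any term that raises the log-odds also raises the implied mortality rate, which is how the age, collar, and pay type terms act on p.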




Slide 27

Data Mining Process: Model Building -- Case Study: Female Retiree Group


Slide 28

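Data Mining Process: Model Building -- Case Study: Male Retiree Group

R-square for the regression is 0.92

Logistic model for \log\frac{p}{1-p} with terms in x, x^2, S, and U

Coefficient magnitudes: 14.57 (constant), 0.20 (x), 0.00055 (x^2) [operator signs and exact term placement are illegible in the source]

\[
S = \begin{cases}
0.0074 & \text{small annuity} \\
0.060 & \text{median annuity} \\
0.044 & \text{large annuity}
\end{cases}
\qquad
U = \begin{cases}
0.040 & \text{combined} \\
0.14 & \text{non-union member} \\
0 & \text{union member}
\end{cases}
\]

where p is the mortality rate and x is the age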




Slide 29

Data Mining Process: SEMMA


Slide 30

Data Mining Process: Model Building -- Case Study: Male Retiree




Slide 31

Data Mining Process: Post-mining Analysis -- Case Study


Slide 32

Data Mining Process: Understanding Your Model -- Case Study

The male retirees' mortality model and the female retirees' mortality model depend on different variables

Mortality of the beneficiaries is determined by gender, annuity size, pay type, and their interactions

The gender factor will play a much-reduced role in determining the beneficiaries' mortality model




Slide 33

Data Mining Process: Post-mining Analysis -- Case Study

Limited results on the mortality distribution for ages above 95

As female demography has changed over the past three decades, variables such as annuity size and union will play a more important role in determining female mortality

Other risk factors such as education, lifestyle, smoking/non-smoking, etc.


Slide 34

Data Mining Process: Summary -- Case Study

Non-Gompertz (linear growth) between ages 70 and 85

Selection of the risk factors may influence the quality of the mortality model

Mortality models vary with the most important risk factor (the participation status, in this study) among all the other variables




Slide 35

Data Mining Process -- Case Study in Claim Analysis

Basic risk characteristics

Top-down identification

Underlying statistical properties

Domain-specific constraints


Slide 36

Data Mining Process -- Case Study in ALM

Decision tree and DNF learning

Generative stochastic modeling

Probabilistic networks

Probabilistic rules

Hidden Markov model




Slide 37

Data Mining Process -- Applications in Healthcare

More productive managed care program

Pricing

Individual health insurance market

Recovery & prevention of fraudulent claims

Prescription drug cost management


Slide 38

Quiz on Data Mining

What is Data Mining?

What can data mining do?

What are data mining techniques?

What are the applications of data mining?

How can you practice data mining?




Slide 39

Summary

Overview of data mining techniques

Its application to actuarial practice

Future developments

Potential contribution to your area
