02 san francisco guo 11 soa

  • 8/6/2019 02 San Francisco Guo 11 SOA

    Data Mining Techniques & Its Applications in Insurance

    Society of Actuaries

    San Francisco Spring Meeting

    June 24 - 26, 2002

    Lijia Guo, PhD, ASA, MAAA

    University of Central Florida

    Session 11L

    SOA San Francisco Spring Meeting, June 24-26, 2002

    Slide 2

    Learning Objectives

    Understanding a Data Mining Process

    Having insight about the actuarial applications of data mining techniques

    Exploring the perspective of applying data mining techniques in your own practice

    Slide 3

    Agenda

    Introduction

    Data Mining Methods

    Actuarial Applications

    Conclusions & Questions

    Slide 4

    Introduction

    Changes in Information Technology

    Availability of large quantities of insurance data

    Mind your business by mining your data

    Slide 5

    What is Data Mining?

    An information discovery process.

    Prediction

    -- Finding unknown values/relationships/patterns from a known large database

    Description

    -- interpretation of a large database

    Making crucial business decisions - turn the newfound knowledge into actionable results

    Slide 6

    Why Use Data Mining?

    Product development

    Marketing

    Analysis of Claims Distribution

    Healthcare ALM

    Fraud detection

    Solvency analysis

    Slide 7

    Data Mining Methods

    Classification

    Regression

    Clustering

    Summarization

    Dependency modeling

    Deviation Detection

    Slide 8

    Data Mining Algorithms

    Decision Trees (Breiman et al., 1984)

    Logistic regression (Hosmer & Lemeshow, 1989)

    Neural Networks (Bishop, 1995; Ripley, 1996)

    Fuzzy Logic

    Genetic Algorithms (Goldberg, 1989)

    Bayesian analysis (Cheeseman et al., 1988)

    Hybrid algorithms

    Slide 9

    Data Mining Algorithms

    -- Decision Trees

    What are decision trees

    How decision trees work

    Choosing variables

    Grouping

    Creating the leaf nodes of the tree

    Strengths and weaknesses
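
    The "choosing variables", "grouping", and "leaf nodes" steps above can be sketched as follows. This is a minimal illustration, assuming categorical variables and Gini impurity as the splitting criterion (the criterion used in the presentation is not stated); the toy records and the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(rows, labels, variable):
    """Weighted Gini impurity after grouping rows by one categorical variable."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[variable], []).append(label)
    n = len(labels)
    return sum(len(group) / n * gini(group) for group in groups.values())

def best_variable(rows, labels, variables):
    """Choose the variable whose grouping leaves the purest candidate leaf nodes."""
    return min(variables, key=lambda v: split_impurity(rows, labels, v))

# Hypothetical toy records, not taken from the case study.
rows = [
    {"gender": "F", "union": "yes"},
    {"gender": "F", "union": "no"},
    {"gender": "M", "union": "yes"},
    {"gender": "M", "union": "no"},
]
labels = ["survived", "survived", "died", "died"]

chosen = best_variable(rows, labels, ["gender", "union"])
print(chosen)
```

    A full tree would repeat this selection recursively inside each group until the groups are pure or too small to split.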

    Slide 10

    Data Mining Algorithms

    -- Neural Networks

    What are Neural Networks

    How Neural Networks work

    Processing elements

    Training

    Predicting

    Strengths and weaknesses
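
    The processing element, training, and predicting steps above can be illustrated with a single neuron. This is a sketch only: it assumes a sigmoid activation and plain gradient descent on the cross-entropy loss, uses hypothetical toy data (logical OR), and is one processing element rather than a full multi-layer network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, inputs):
    """One processing element: weighted sum of inputs passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train(samples, targets, epochs=2000, rate=0.5):
    """Training: adjust weights by gradient descent, one sample at a time."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in zip(samples, targets):
            error = predict(weights, bias, inputs) - target
            weights = [w - rate * error * x for w, x in zip(weights, inputs)]
            bias -= rate * error
    return weights, bias

# Hypothetical toy data: the logical OR of two binary inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]

weights, bias = train(samples, targets)

# Predicting: threshold the neuron's output at 0.5.
predictions = [round(predict(weights, bias, s)) for s in samples]
print(predictions)
```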

    Slide 11

    Data Mining Algorithms

    -- Hybrid Algorithms

    Problems with standard algorithms

    Advanced algorithms

    Discovery-driven approaches

    Mixture of algorithms

    Slide 12

    Data Mining: Knowledge Discovery Process

    Data Acquisition

    Data integration

    Data exploration

    Model building

    Understanding your model

    Post-mining analysis

    Slide 13

    Data Mining Process: Data Acquisition

    Data acquisition

    Getting your data

    Data qualification issues

    Data quality issues

    Data derivation

    Defining a study

    Basic Risk Characteristics

    Slide 14

    Data Mining Process: Data Acquisition -- Case Study

    SOA database for RP-2000 Mortality Tables

    10,957,103 exposed life-years

    Subset of the database that includes all the lives above age 70 (3,769,956 exposed life-years, 217,490 deaths)

    Risk groups

    Age, gender, participation status, union, pay type, collar type, annuity amount, etc.
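
    As a quick check on the scale of the subset above, the exposure and death counts quoted on this slide give a crude death rate per exposed life-year. The variable names below are illustrative; only the three counts come from the slide.

```python
# Counts quoted on the slide.
total_exposed_life_years = 10_957_103
exposed_life_years_over_70 = 3_769_956
deaths_over_70 = 217_490

# Crude death rate for the above-age-70 subset: deaths per exposed life-year.
crude_rate = deaths_over_70 / exposed_life_years_over_70

# Share of the full database's exposure that falls in the subset.
subset_share = exposed_life_years_over_70 / total_exposed_life_years

print(f"crude death rate: {crude_rate:.4f}")
print(f"share of exposure: {subset_share:.3f}")
```

    The result is an aggregate over all ages above 70 and all risk groups, so it says nothing about how mortality varies by age or by risk factor.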

    Slide 15

    Data Mining Process: Data Acquisition

    -- Case Study

    Existing study on advanced-age mortality

    Smooth extension of the patterns

    Families of curves - Gompertz law, etc.

    All these approaches aim at explaining the age pattern of mortality.

    Mortality distribution varies among seniors with different backgrounds
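
    For reference, the Gompertz law named above models the force of mortality as growing exponentially with age. In its standard textbook form (the symbols $B$ and $c$ are the conventional parameter names, not values from this study):

    $$\mu_x = B\,c^{x}, \qquad B > 0,\ c > 1 \quad\Longrightarrow\quad \log \mu_x = \log B + x \log c$$

    so the logarithm of the force of mortality is linear in age $x$.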

    Slide 16

    Data Mining Process: Data Integration

    To identify the factors that influence mortality

    To study the interaction of the risk factors

    To gain perspective on the importance of these factors

    Slide 17

    Slide 18

    Data Mining Process: Data Integration -- Case Study

    Main effect exists for all six variables considered

    Degrees of the effects of the risk factors are different

    The interaction of these factors

    The importance of the factors

    Slide 19

    Data Mining Process: Data exploration

    Decision tree algorithm

    Analyze the influences and the importance of the mortality risk factors

    Observations are grouped into several segments

    Algorithm - SAS/Enterprise Miner Version 4.2 (2001).

    Further study the interaction and the importance of the risk factors

    Slide 20

    Data Mining Process: Data Integration -- Case Study

    Variable Importance Measure

    Variable               Importance
    Participation Status   1.00
    Gender                 0.75
    Annuity size           0.43
    Pay Type               0.21
    Union                  0.18
    Collar                 0.00

    Slide 21

    Data Mining Process: Data exploration

    Slide 22

    Data Mining Process: Data exploration

    Slide 23

    Data Mining Process: Model building

    Six risk groups:

    Employees

    Beneficiaries

    Combined

    Disabled

    Male Retirees

    Female Retirees

    Logistic regression method

    Slide 24

    Data Mining Process: Model Building -- Case Study: Female Retiree

    Slide 25

    Data Mining Process: Model Building

    -- Case Study: Female Retiree Group

    Collar and Pay Type are two important variables

    The interaction between Collar and Pay Type does exist

    Neither annuity size nor union is picked up by the tree algorithm

    Slide 26

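    Data Mining Process: Model Building -- Case Study: Female Retiree Group

    R-square for the regression is 0.95

    Logistic regression of the mortality rate on age, collar (C), and pay type (PT). Only the coefficient magnitudes are legible in this transcript; the signs and the exact arrangement of the C and PT terms are not.

    $$\log\frac{p}{1-p} = f\left(x,\ x^{2},\ C,\ PT\right)$$

    with a constant of magnitude 17.97, coefficients of magnitude 0.26 on $x$ and 0.00087 on $x^{2}$, and a coefficient of magnitude 0.046 on a term in $C$ and $PT$, where

    $$C = \begin{cases} 0.0047 & \text{mixed collar} \\ 0 & \text{blue collar} \\ 0 & \text{white collar} \end{cases} \qquad PT = \begin{cases} 0 & \text{salaried pay type} \\ 0.051 & \text{hourly pay type} \\ 0.033 & \text{combined pay type} \end{cases}$$

    where p is the mortality rate and x is the age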
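
    A logistic regression of this kind models the log-odds of death, so a mortality rate is recovered by inverting the logit. The sketch below shows only that conversion; the log-odds value is hypothetical and is not computed from the fitted coefficients on this slide, and the function names are illustrative.

```python
import math

def log_odds(p):
    """Logit transform: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def mortality_from_log_odds(z):
    """Inverse logit: solve log(p / (1 - p)) = z for the rate p."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical log-odds value, chosen only to illustrate the conversion.
z = -3.0
p = mortality_from_log_odds(z)
print(f"mortality rate: {p:.4f}")
```

    Because the inverse logit is increasing in z, any term that raises the log-odds raises the modeled mortality rate.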

    Slide 27

    Data Mining Process: Model Building -- Case Study: Female Retiree Group

    Slide 28

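    Data Mining Process: Model Building -- Case Study: Male Retiree Group

    R-square for the regression is 0.92

    Logistic regression of the mortality rate on age, annuity size (S), and union status (U). Only the coefficient magnitudes are legible in this transcript; the signs and the exact arrangement of the S and U terms are not.

    $$\log\frac{p}{1-p} = f\left(x,\ x^{2},\ S,\ U\right)$$

    with a constant of magnitude 14.57 and coefficients of magnitude 0.20 on $x$ and 0.00055 on $x^{2}$, where

    $$S = \begin{cases} 0.0074 & \text{small annuity} \\ 0.060 & \text{median annuity} \\ 0.044 & \text{large annuity} \end{cases} \qquad U = \begin{cases} 0.040 & \text{combined} \\ 0.14 & \text{non-union member} \\ 0 & \text{union member} \end{cases}$$

    where p is the mortality rate and x is the age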

    Slide 29

    Data Mining Process: SEMMA

    Slide 30

    Data Mining Process: Model Building -- Case Study: Male Retiree

    Slide 31

    Data Mining Process: Post-mining Analysis -- Case Study

    Slide 32

    Data Mining Process: Understanding your model -- Case Study

    The male retirees mortality model and the female retirees mortality model depend on different variables

    Mortality of the beneficiaries is determined by gender, annuity size, the pay type, and their interactions

    The gender factors will play a much-reduced role in determining the beneficiaries mortality model

    Slide 33

    Data Mining Process: Post-mining Analysis -- Case Study

    Limited results on the mortality distribution for the ages above 95

    As the female demography changed in the past three decades, variables such as annuity size and union will play a more important role in determining the female mortality

    Other risk factors such as education, lifestyle, smoking/non-smoking, etc.

    Slide 34

    Data Mining Process: Summary -- Case Study

    Non-Gompertz (linear growth) between age 70 and 85

    Selection of the risk factors may influence the quality of the mortality model

    Mortality models vary with the most important risk factor (the participation status, in this study) among all the other variables

    Slide 35

    Data Mining Process:

    -- Case Study in Claim Analysis

    Basic risk characteristics

    Top-down identification

    Underlying statistical properties

    Domain-specific constraints

    Slide 36

    Data Mining Process:

    -- Case Study in ALM

    Decision tree and DNF learning

    Generative stochastic modeling

    Probabilistic networks

    Probabilistic rules

    Hidden Markov model

    Slide 37

    Data Mining Process:

    -- Applications in Healthcare

    More productive managed care program

    Pricing

    Individual health insurance market

    Recovery & prevention of fraudulent claims

    Prescription drug cost management

    Slide 38

    Quiz on Data mining

    What is Data Mining?

    What can data mining do?

    What are data mining techniques?

    What are the applications of data mining?

    How can you practice data mining?

    Slide 39

    Summary

    Overview of data mining techniques

    Its application to actuarial practice

    Future developments

    Potential contribution to your area