Dealing with continuous variables and geographical information in non-life insurance ratemaking
Maxime Clijsters
Introduction
Tariff?
Professional use (Y/N)
Postal code
Age of the permit
Power of the vehicle (kW)
Age of the vehicle
Vehicle type (4x4 Y/N)
Policyholder’s age
Variable types: categorical variable, continuous variable, multi-level factor
• GLMs remain a very important statistical regression technique for pricing car insurance products
• GAMs provide interesting insights into the underlying dependency structure, but come at a high computational cost
• GAM as a complementary modelling tool
Introduction
GLM = Generalized Linear Model; GAM = Generalized Additive Model
AGENDA
• Binning continuous variables
– GAM to explore nonlinear effects
– GAM and regression trees for binning
• Modelling geographical information
Binning continuous variables
GLM
• GLM is a satisfactory modelling tool
• Industry-wide standard
• Only categorical variables
GAM
• Continuous variables
• High computational cost
• No parametric functional form
Binning continuous variables – GAM to explore nonlinear effects
• We fit a GAM for a continuous variable $x$, with the observed number of claims of each policyholder modelled as a Poisson-distributed random variable
• The GAM prediction for policyholder $i$ combines the exposure corresponding to that policyholder with the nonparametric GAM estimate $\hat{f}(x_i)$
Binning continuous variables – GAM to explore nonlinear effects
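Figure: (a) Nonparametric prediction; (b) Total prediction
$$\hat{f}(x_i) = \hat{\beta}_2 x_i^2 + \hat{\beta}_3 x_i^3 + \sum_{k=1}^{K} \hat{\beta}_k (x_i - x_k)_+^3$$
with knots $x_1, \dots, x_K$ and $(u)_+ = \max(u, 0)$.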
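The nonparametric part $\hat{f}(x_i)$ is a cubic regression spline written in the truncated power basis. A minimal sketch that evaluates it for given coefficients follows; the function names and the numbers in the usage line are hypothetical, and in practice the coefficients come from the GAM fit rather than being set by hand. Here `beta_knots[k]` plays the role of $\hat{\beta}_k$ for knot $x_k$.

```python
def truncated_cube(u: float) -> float:
    """Return (u)_+^3, the cube of the positive part of u."""
    return max(u, 0.0) ** 3


def f_hat(x: float, beta2: float, beta3: float,
          knots: list, beta_knots: list) -> float:
    """Evaluate beta2*x^2 + beta3*x^3 + sum_k beta_k * (x - x_k)_+^3."""
    if len(knots) != len(beta_knots):
        raise ValueError("need exactly one coefficient per knot")
    value = beta2 * x ** 2 + beta3 * x ** 3
    for x_k, b_k in zip(knots, beta_knots):
        value += b_k * truncated_cube(x - x_k)  # contributes nothing left of the knot
    return value


# Hypothetical coefficients and knots, purely for illustration.
print(f_hat(2.0, beta2=1.0, beta3=1.0, knots=[1.0, 3.0], beta_knots=[1.0, 5.0]))  # → 13.0
```

Each knot term switches on only to the right of its knot, which is what lets the fitted curve change shape locally.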
Binning continuous variables – GAM to explore nonlinear effects
Often not desirable to keep the continuous effect in the tariff
» GAM has a high computational cost (iterative method)
» GAM lacks a parametric functional form
GAMs provide insight into defining risk-homogeneous groupings of variables
Binning continuous variables – GAM for binning
• Results of the GAM as a starting point for binning
– Broader categories where the risk is similar
– More categories when the risk varies a lot
• Defining boundaries by means of regression trees
Binning continuous variables – Regression tree
• Divide the variable into groups based on the GAM estimate
• Find splits that minimize the overall sum of squared errors
• Grow the tree until it has the desired number of classes
Figure: The black coloured nodes correspond to the regression tree used, the blue coloured nodes are the following splits, and the light blue nodes are the subsequent splits
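The tree-growing step can be sketched as a one-dimensional regression tree grown best-first on the GAM estimates: at each step, the split that most reduces the total sum of squared errors is applied, until the desired number of classes exists. This is a minimal plain-Python sketch under those assumptions (hypothetical function names, a simple quadratic-time split search); the tree software and stopping rules used in the presentation are not stated.

```python
def sse(values):
    """Sum of squared errors of values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(leaf):
    """Best cut of a leaf (list of (x, f_hat) pairs sorted by x).

    Returns (gain, index): leaf[:index] goes left, leaf[index:] right.
    Returns None when all x values are equal (no valid cut).
    """
    ys = [y for _, y in leaf]
    parent = sse(ys)
    best = None
    for i in range(1, len(leaf)):
        if leaf[i][0] == leaf[i - 1][0]:
            continue  # never separate policyholders with the same x
        gain = parent - sse(ys[:i]) - sse(ys[i:])
        if best is None or gain > best[0]:
            best = (gain, i)
    return best


def tree_bins(x, f_hat, n_classes):
    """Grow a 1-D regression tree on GAM estimates; return sorted cut points."""
    leaves = [sorted(zip(x, f_hat))]
    cuts = []
    while len(leaves) < n_classes:
        candidates = []
        for j, leaf in enumerate(leaves):
            split = best_split(leaf)
            if split is not None:
                candidates.append((split[0], j, split[1]))
        if not candidates:
            break  # every leaf holds a single distinct x value
        _, j, i = max(candidates)  # split with the largest SSE reduction
        leaf = leaves.pop(j)
        cuts.append((leaf[i - 1][0] + leaf[i][0]) / 2)  # midpoint as class boundary
        leaves[j:j] = [leaf[:i], leaf[i:]]
    return sorted(cuts)


# Hypothetical step-shaped GAM estimate over policyholder age.
ages = list(range(18, 81))
effect = [0.5 if a < 25 else 0.0 if a < 60 else 0.3 for a in ages]
print(tree_bins(ages, effect, 3))  # → [24.5, 59.5]
```

Each resulting interval can then serve as one level of a categorical rating factor in the GLM.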
Binning continuous variables – Binning results
Figure: Visualization of the classes suggested by the regression tree
AGENDA
• Binning continuous variables
• Geographical information
– Modelling: GLM without geographical information, GAM with geographical information
– Visualizing and binning
Geographical information – Introduction
Figure: Map with longitude and latitude axes; marked location Bree: 51°07'08.8"N 5°38'32.5"E
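To use map coordinates as model inputs, degree-minute-second notation such as the Bree location needs converting to decimal degrees. A small sketch with a hypothetical helper name; the coordinate format actually used in the presentation is not stated.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees-minutes-seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative by convention.
    return -value if hemisphere in ("S", "W") else value


lat = dms_to_decimal(51, 7, 8.8, "N")
lon = dms_to_decimal(5, 38, 32.5, "E")
print(f"Bree: latitude {lat:.5f}, longitude {lon:.5f}")  # → Bree: latitude 51.11911, longitude 5.64236
```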
Geographical information – Step 1: GLM without geographical information
• We fit a Poisson GLM, ignoring any geographical information, to model the claim frequency
• The model includes the non-spatial categorical variables and accounts for the exposure corresponding to policyholder i.
• Aggregate the predicted number of claims per district (INS code)
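Once the GLM is fitted, the aggregation is a group-by sum. The sketch below assumes a log link with the exposure entering as an offset, so each policyholder's predicted number of claims is the exposure times exp(linear predictor); the record layout, the district codes and the fitted values are hypothetical, and the GLM fit itself is not shown.

```python
import math
from collections import defaultdict


def predicted_claims(exposure: float, eta: float) -> float:
    """Poisson GLM with log link: expected claims = exposure * exp(linear predictor)."""
    return exposure * math.exp(eta)


def aggregate_by_district(records):
    """Sum predicted and observed claim counts per district.

    records: iterable of (district, exposure, eta, observed_claims), where eta is
    the fitted non-spatial linear predictor of that policyholder.
    Returns {district: (predicted_total, observed_total)}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for district, exposure, eta, observed in records:
        totals[district][0] += predicted_claims(exposure, eta)
        totals[district][1] += observed
    return {d: (pred, obs) for d, (pred, obs) in totals.items()}


# Hypothetical policyholders: (district code, exposure in years, fitted eta, claims).
records = [
    ("D1", 1.0, math.log(0.10), 0),
    ("D1", 0.5, math.log(0.20), 1),
    ("D2", 1.0, math.log(0.10), 0),
]
for district, (pred, obs) in aggregate_by_district(records).items():
    print(district, round(pred, 3), obs)
```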
Geographical information – Step 1: GLM without geographical information
Figure: Predicted number of claims per district and observed number of claims per district
Geographical information – Step 2: GAM with geographical information
• Calculate the residual effect per district, i.e. the ratio of the observed to the predicted number of claims
• Visualize the residual effect by means of quantile binning:
– < 1: number of claims overestimated
– > 1: number of claims underestimated
• Add the longitude and latitude coordinates of the center of each district j.
• We fit a GAM to estimate the geographical effect, using a two-dimensional smooth function of the longitude and latitude of each district center.
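The residual effect and the spatial smoothing can be illustrated without GAM software. The sketch assumes the residual effect is the ratio of observed to predicted claims per district, which matches the < 1 / > 1 reading of the quantile map. Instead of the GAM's two-dimensional spline smooth, it swaps in a simpler technique: a local-constant Poisson kernel smoother that uses the GLM prediction as an offset. The Gaussian kernel, the bandwidth and the district data are hypothetical choices.

```python
import math


def residual_effect(observed: float, predicted: float) -> float:
    """Observed / predicted claims: < 1 overestimated, > 1 underestimated by the GLM."""
    return observed / predicted


def smooth_effect(lon0, lat0, districts, bandwidth):
    """Local-constant Poisson kernel estimate exp(s) of the geographic effect at (lon0, lat0).

    districts: list of (lon, lat, observed_claims, glm_predicted_claims).
    With the GLM prediction P_j as offset, maximizing the kernel-weighted
    Poisson log-likelihood sum_j w_j * (O_j * (log P_j + s) - P_j * exp(s))
    over s gives exp(s) = sum_j w_j O_j / sum_j w_j P_j.
    Distances are measured in raw degrees (a planar simplification), and at
    least one district should lie near (lon0, lat0) so the weights do not vanish.
    """
    num = den = 0.0
    for lon, lat, observed, predicted in districts:
        d2 = (lon - lon0) ** 2 + (lat - lat0) ** 2
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weight
        num += w * observed
        den += w * predicted
    return num / den


# Hypothetical district centers (lon, lat) with observed and GLM-predicted claims.
districts = [
    (4.35, 50.85, 120, 100.0),
    (4.40, 50.90, 95, 100.0),
    (5.64, 51.12, 30, 40.0),
]
for lon, lat, obs, pred in districts:
    print(round(residual_effect(obs, pred), 2), round(smooth_effect(lon, lat, districts, 0.2), 2))
```

Nearby districts borrow strength from each other, so the smoothed effect is less noisy than the raw ratio, which is the role the two-dimensional smooth plays in the GAM.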
Geographical information – Step 2: GAM with geographical information
• The GAM estimate of this smooth function is the geographic effect on top of all other effects included in the GLM prediction
• Create zones similar in terms of risk
– Bin the estimates using classification methods
• Include resulting zones in claim frequency model
Geographical information – Visualizing and binning the geographic effect
• Problematic issue
– Different classification methods can yield dissimilar classes
– Maps are very sensitive to the classification method used
– Visualization of the same data can convey different impressions
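A toy example makes this sensitivity concrete: the same hypothetical geographic effects for ten districts are binned into three classes, once with equal-interval breaks and once with quantile breaks. Both are common choropleth classification schemes, chosen here for illustration; which methods the presentation compared is not stated, and the function names are hypothetical.

```python
from bisect import bisect_right


def equal_interval_breaks(values, k):
    """Inner break points for k classes of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * j for j in range(1, k)]


def quantile_breaks(values, k):
    """Inner break points for k classes holding (roughly) equal numbers of values."""
    s = sorted(values)
    n = len(s)
    return [s[(n * j) // k] for j in range(1, k)]


def classify(values, breaks):
    """Class index of each value: the number of break points <= value."""
    return [bisect_right(breaks, v) for v in values]


# Hypothetical smoothed geographic effects for ten districts.
effects = [0.80, 0.82, 0.85, 0.88, 0.90, 0.95, 1.00, 1.05, 1.30, 1.60]
print(classify(effects, equal_interval_breaks(effects, 3)))  # → [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
print(classify(effects, quantile_breaks(effects, 3)))        # → [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
```

With equal intervals, eight of the ten districts fall into the lowest class; with quantiles, the districts are spread almost evenly, so two maps of the same data would look very different.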
Geographical information – Visualizing and binning the geographic effect
Conclusion
• GLMs remain a very important statistical regression technique for pricing car insurance products.
• GAMs provide interesting insights into the underlying dependency structure, but come at a high computational cost.
• Care is needed when reading and interpreting choropleth maps
– Different classification techniques produce different results.
– Classification strongly affects the visual impressions readers obtain.
Thank you