-
Actuaries and predictive modeling: past, present and future
Katrien Antonio
Faculty of Economics and Business, LRisk Research Center, KU Leuven & UvA
3rd European Actuarial Journal Conference, Lyon
September 6, 2016
-
Goals of this talk
Focus on two case-studies using a blend of analytic techniques.
(1) Using risk factors in P&C pricing: a data-driven strategy with GAMs, regression trees and GLMs.
(2) Unraveling the predictive power of telematics data in car insurance pricing.
A blend of techniques/learning outcomes/buzz words from
(recent) past, present and future?
K. Antonio, KU Leuven & UvA Goals of this talk 2 / 35
-
Data science and predictive modeling
(1) Schutt & O’Neil (2013), Doing Data Science: Straight Talk from the Frontline.
What is eyebrow-raising about big data and data science?
‘The hype is crazy.’
Getting past the hype?
‘There might be some meat in the data science sandwich’;
‘Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics.’
(2) Prof. David Donoho (2015), 50 years of data science.
-
Actuarial pricing models in P&C insurance
• (Past) One-way and two-way analysis, minimum bias (Bailey & Simon, 1960).
• (Present) Risk classification in competitive markets using Generalized Linear Models for frequency and severity.
• (Future) Challenges?
- high dimensional variables (e.g. territory, vehicle groups);
- (structured and unstructured) telematics data;
- keep the model explainable to clients, regulators, ICT, ...;
- be aware of actuarial features!!
-
Actuarial pricing models in P&C insurance: a blend of?
de Jong & Heller; Ohlsson & Johansson; Denuit et al.
Hastie, Tibshirani & Friedman; James et al.; Kuhn & Johnson
-
Using risk factors in P&C pricing
A data-driven strategy with GAMs, regression trees and GLMs.
Katrien Antonio, KU Leuven & UvA
Maxime Clijsters, AG Insurance
Roel Henckaerts, KU Leuven
Roel Verbelen, KU Leuven
-
GAM claim frequency model as starting point
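• Our solution starts with an exhaustive search using GAMs.
• Best GAM according to AIC/BIC:
$$
\log(E(\text{nclaims})) = \log(\text{exposure}) + \beta_0 + \beta_1\,\text{coverage}_{PO} + \beta_2\,\text{coverage}_{FO} + \beta_3\,\text{fuel}_{diesel} + f_1(\text{ageph}) + f_2(\text{bm}) + f_3(\text{power}) + f_4(\text{ageph}, \text{power}) + f_5(\text{long}, \text{lat}),
$$
which combines an offset with categorical (coverage, fuel), continuous (f1, f2, f3), interaction (f4) and spatial (f5) risk factors.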
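As a rough illustration of how a log-link predictor with an exposure offset turns into an expected claim count, here is a minimal Python sketch. Every coefficient and every smooth function below is a made-up placeholder (the talk reports no fitted values), and the level name "reference" is hypothetical; only the structure of the predictor follows the formula above.

```python
import math

# All numbers below are made-up placeholders, not estimates from the talk.
BETA0 = -2.0
BETA_COVERAGE = {"reference": 0.0, "PO": -0.10, "FO": -0.20}
BETA_FUEL = {"reference": 0.0, "diesel": 0.15}


def f1(ageph):
    """Placeholder smooth age effect: higher for young policyholders."""
    return 0.5 * math.exp(-(ageph - 18) / 15.0) - 0.1


def f2(bm):
    """Placeholder smooth bonus-malus effect, increasing in the level."""
    return 0.04 * bm


def f3(power):
    """Placeholder smooth power effect."""
    return 0.002 * (power - 60)


def f4(ageph, power):
    """Placeholder interaction surface (set to zero in this sketch)."""
    return 0.0


def f5(lon, lat):
    """Placeholder spatial surface (set to zero in this sketch)."""
    return 0.0


def linear_predictor(coverage, fuel, ageph, bm, power, lon, lat):
    """Sum of all terms on the log scale, excluding the offset."""
    return (BETA0 + BETA_COVERAGE[coverage] + BETA_FUEL[fuel]
            + f1(ageph) + f2(bm) + f3(power)
            + f4(ageph, power) + f5(lon, lat))


def expected_claims(exposure, eta):
    """log(E[nclaims]) = log(exposure) + eta, so E[nclaims] = exposure * exp(eta)."""
    return math.exp(math.log(exposure) + eta)


eta = linear_predictor("PO", "diesel", ageph=30, bm=5, power=70, lon=4.4, lat=50.8)
full_year = expected_claims(1.0, eta)
half_year = expected_claims(0.5, eta)
print(full_year, half_year)
```

Because the offset enters with a fixed coefficient of one, a policy with half the exposure gets exactly half the expected claim count, all else equal.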
-
GAM claim frequency model as starting point
[Figure: fitted GAM effects. Panels: single effects of ageph, bm and power; interaction effect of ageph and power; spatial effect.]
-
From GAMs to GLMs
We choose the number of geo-classes by optimizing BIC for the GAM with a binned spatial effect.
[Figure: spatial effect binned into seven classes, one panel per binning method. Class boundaries from the legends:]
Equal intervals:     −0.437, −0.328, −0.219, −0.109, 0.000104, 0.11, 0.219, 0.328
Quantile binning:    −0.437, −0.201, −0.136, −0.0846, −0.0258, 0.0207, 0.119, 0.328
Complete linkage:    −0.437, −0.382, −0.278, −0.121, −0.0278, 0.0475, 0.169, 0.328
Single linkage:      −0.437, −0.415, −0.382, −0.359, −0.328, −0.318, 0.325, 0.328
K-means clustering:  −0.437, −0.318, −0.218, −0.134, −0.0448, 0.0553, 0.169, 0.328
Fisher-Jenks:        −0.437, −0.255, −0.146, −0.0618, 0.015, 0.103, 0.214, 0.328
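The two simplest binning rules named on this slide, equal intervals and quantile binning, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the `effects` vector is made up, and the quantile rule uses plain linear interpolation between order statistics, which may differ from the convention used for the slide.

```python
import math


def equal_interval_breaks(lo, hi, k):
    """k classes of equal width between lo and hi (k + 1 break points)."""
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


def quantile_breaks(values, k):
    """Break points at the i/k quantiles, interpolating between order statistics."""
    s = sorted(values)
    n = len(s)
    breaks = []
    for i in range(k + 1):
        pos = i * (n - 1) / k
        lower = int(math.floor(pos))
        upper = min(lower + 1, n - 1)
        breaks.append(s[lower] + (pos - lower) * (s[upper] - s[lower]))
    return breaks


def assign_bin(x, breaks):
    """Index of the class [b_i, b_{i+1}) containing x; the last class is closed."""
    k = len(breaks) - 1
    for i in range(k):
        if x < breaks[i + 1]:
            return i
    return k - 1


# Equal intervals from the (rounded) range shown in the legends, seven classes.
eq = equal_interval_breaks(-0.437, 0.328, 7)

# Quantile binning of a made-up vector of spatial effects, four classes.
effects = [-0.40, -0.22, -0.15, -0.10, -0.05, 0.00, 0.02, 0.10, 0.30]
qt = quantile_breaks(effects, 4)

print([round(b, 3) for b in eq])
print(qt, assign_bin(0.01, qt))
```

Starting from the rounded endpoints −0.437 and 0.328, the equal-interval breaks agree with the legend boundaries to within rounding, which is a useful sanity check on the reading of the legend.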
-
From GAMs to GLMs using evolutionary trees
• We fit evolutionary trees to the single and interaction effects:
$$\hat f_1(\text{ageph}), \quad \hat f_2(\text{bm}), \quad \hat f_3(\text{power}), \quad \hat f_4(\text{ageph}, \text{power}).$$
• We bin these effects by picking the best tree according to the evaluation criterion
$$N \cdot \log(\text{wMSE}) + 4 \cdot \alpha \cdot (M + 1) \cdot \log(N).$$
• The evaluation criterion balances goodness-of-fit (wMSE) and complexity of the tree (M), while accounting for portfolio composition as weights.
• We tune α and then use the optimal tree according to this evaluation criterion.
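To make the trade-off in the evaluation criterion concrete, the sketch below scores two candidate binnings of a made-up fitted effect. It rests on assumptions not stated on the slide: N is taken as the number of effect values being binned, M as the number of bins (terminal nodes), and wMSE as the weighted mean squared error of replacing each value by its weighted bin mean. The evolutionary search itself is not reproduced; the candidates are fixed by hand.

```python
import math


def weighted_mse(effect, weights, bins):
    """Weighted MSE when each effect value is replaced by its weighted bin mean."""
    sums, wsums = {}, {}
    for e, w, b in zip(effect, weights, bins):
        sums[b] = sums.get(b, 0.0) + w * e
        wsums[b] = wsums.get(b, 0.0) + w
    means = {b: sums[b] / wsums[b] for b in sums}
    sse = sum(w * (e - means[b]) ** 2 for e, w, b in zip(effect, weights, bins))
    return sse / sum(weights)


def criterion(effect, weights, bins, alpha):
    """N * log(wMSE) + 4 * alpha * (M + 1) * log(N); lower is better."""
    n = len(effect)           # assumption: N = number of values being binned
    m = len(set(bins))        # assumption: M = number of bins (terminal nodes)
    return n * math.log(weighted_mse(effect, weights, bins)) + 4 * alpha * (m + 1) * math.log(n)


# Made-up fitted effect on a grid, with portfolio weights per grid point.
effect = [0.6, 0.5, 0.1, 0.0, -0.1, -0.2]
weights = [1, 2, 5, 5, 4, 2]

candidates = {
    "two bins": [0, 0, 1, 1, 1, 1],
    "three bins": [0, 0, 1, 1, 2, 2],
}


def best_candidate(alpha):
    """Name of the candidate binning with the lowest criterion value."""
    return min(candidates, key=lambda name: criterion(effect, weights, candidates[name], alpha))


print(best_candidate(0.5), best_candidate(2.0))
```

With a small α the finer binning wins because its fit term dominates; a larger α raises the complexity penalty and the coarser binning is preferred, which is why α needs tuning.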
-
From GAMs to GLMs using evolutionary trees
[Figure: single effects of ageph, bm and power; interaction effect of ageph and power.]
-
From GAMs to GLMs using evolutionary trees
[Figure: single effects of ageph, bm and power plotted as points; residual over ageph and power.]
-
From GAMs to GLMs using evolutionary trees
• Hence, we obtain a fully data-driven binning procedure.
• We use a blend of techniques:
- trees, genetic algorithms (machine learning);
- GAMs (flexible statistical modeling);
- GLMs (the actuarial comfort zone).
-
Unraveling the predictive power of telematics data in car insurance pricing.
Roel Verbelen, KU Leuven
Katrien Antonio, KU Leuven & UvA
Gerda Claeskens, KU Leuven
-
Telematics insurance: the future?
• The Economist, February 23, 2013, “How’s my driving?”
• “Underwriters have traditionally used crude demographic data such as age, location and sex to separate the testosterone-fuelled boy racers from their often tamer female counterparts. [...] By monitoring their customers’ motoring habits, underwriters can increasingly distinguish between drivers who are safe on the road from those who merely seem safe on paper. Many think that telematics insurance will become the industry norm.”
-
New rating variables due to telematics technology
Telematics data collected in each trip: driving habits and driving style
• the distance driven;
• the time of day;
• how long you have been driving;
• the location;
• the speed/speeding;
• harsh or smooth braking;
• aggressive acceleration or deceleration;
• your cornering and parking skills.
Possibly combined with:
• road maps;
• weather information;
• traffic information.
-
Unique telematics data set from a Belgian insurer
• Telematics data collected between 2010 and 2014.
• Belgian MTPL product with a telematics black box, targeted at young drivers.
• Daily CSV files with trip info, aggregated on a daily basis:
  - number of trips;
  - meters traveled (in total), and also
    • divided by time slot: 6:00-9:30, 9:30-16:00, 16:00-19:00, 19:00-22:00, 22:00-6:00;
    • divided by road type: motorways, urban area, abroad, any other type.
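A minimal sketch of how such daily aggregates could be rolled up into a total distance and the share of distance per time slot, the kind of compositional summary the talk later uses as predictors. The CSV layout, the column names and the sample values are hypothetical; the actual file format of the data provider is not given in the slides.

```python
import csv
import io

# Hypothetical daily file for one policyholder: one row per day,
# meters traveled split over the five time slots listed above.
SAMPLE_CSV = """date,n_trips,m_0600_0930,m_0930_1600,m_1600_1900,m_1900_2200,m_2200_0600
2013-05-01,3,2000,5000,3000,0,0
2013-05-02,4,1000,0,4000,3000,2000
"""

SLOT_COLUMNS = [
    "m_0600_0930",
    "m_0930_1600",
    "m_1600_1900",
    "m_1900_2200",
    "m_2200_0600",
]


def distance_by_slot(csv_text):
    """Sum the meters per time slot over all days in the file."""
    totals = {col: 0.0 for col in SLOT_COLUMNS}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in SLOT_COLUMNS:
            totals[col] += float(row[col])
    return totals


def composition(totals):
    """Turn slot totals into shares that sum to one."""
    total = sum(totals.values())
    if total == 0:
        raise ValueError("no distance driven, shares are undefined")
    return {col: meters / total for col, meters in totals.items()}


totals = distance_by_slot(SAMPLE_CSV)
total_m = sum(totals.values())
shares = composition(totals)
print(total_m, shares)
```

Separating the total distance (a candidate exposure measure) from the shares (how that distance is spread over the day) keeps the two kinds of information apart, since the shares carry no information about the total.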
-
Unique telematics data set from a Belgian insurer
[Diagram: data flow between the insured, the insurer and the data provider. Labels: policy information; raw telematics information; aggregated telematics information.]
-
Unique telematics data set from a Belgian insurer
[Figure: distance (in km) plotted against date, 2010 to 2015; vertical axis from 0 to 400k.]
-
Description of the data
The resulting data set has 33 259 observations:
• 10 406 unique policyholders;
• 17 681 years of insured periods;
• 0.0838 claims per insured year;
• 1481 MTPL claims at fault;
• 297 million kilometers driven;
• 0.0499 claims per 10 000 km.
What is the best measure of exposure to risk?
[Figure: densities of policy period (days) and distance (1000 km).]
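The two claim frequencies on this slide follow directly from the reported totals, once with time and once with distance as the exposure measure. A short check in Python, using only the slide's figures (the kilometer total is rounded to 297 million on the slide, so the distance-based rate is reproduced only to the displayed precision):

```python
claims = 1481             # MTPL claims at fault
insured_years = 17681     # years of insured periods
km_driven = 297e6         # kilometers driven, rounded on the slide

# Time as exposure: claims per insured year.
per_year = claims / insured_years

# Distance as exposure: claims per 10 000 km.
per_10000_km = claims / (km_driven / 10_000)

print(round(per_year, 4), round(per_10000_km, 4))  # 0.0838 0.0499
```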
-
Policy information
[Figure: distributions of the policy variables. Densities: age, experience, age vehicle, kwatt. Proportions: bonus-malus, gender (male, female), fuel (diesel, petrol), material damage cover (yes, no). Map: proportion per km2.]
-
Telematics information
-
Predictor sets
[Diagram: four predictor sets (classic, time hybrid, meter hybrid, telematics), arranged by the information used (policy information, telematics information) and by the rating basis (time based rating, meter based rating).]
-
Generalized additive models
We use GAMs (Wood, 2006):
$$N_{it} \sim \text{POI}(\mu_{it} = \exp(\eta_{it}))$$
$$\eta_{it} = \text{offset} + \eta^{cat}_{it} + \eta^{cont}_{it} + \eta^{spatial}_{it} + \eta^{re}_{it} + \eta^{comp}_{it}$$
$$\eta^{cat}_{it} + \eta^{cont}_{it} + \eta^{spatial}_{it} = Z_{it}\beta + \sum_{j=1}^{J} f_j(x_{jit}) + f_{spatial}(\text{lat}_{it}, \text{long}_{it})$$
We combine:
categorical + continuous + spatial + compositional (new!!)
risk factors.
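For readers who want to see the likelihood behind this Poisson model, here is a minimal Python sketch of the log-likelihood with a log link. The component values passed to `linear_predictor` are made-up numbers standing in for the fitted terms; estimating the smooth functions themselves (as a GAM fit would) is outside the scope of this sketch.

```python
import math


def linear_predictor(offset, eta_cat, eta_cont, eta_spatial, eta_re, eta_comp):
    """eta_it as the sum of the offset and the additive components."""
    return offset + eta_cat + eta_cont + eta_spatial + eta_re + eta_comp


def poisson_loglik(counts, etas):
    """Sum of log P(N_it = n_it) with mu_it = exp(eta_it)."""
    loglik = 0.0
    for n, eta in zip(counts, etas):
        mu = math.exp(eta)
        # log pmf of a Poisson count: n * log(mu) - mu - log(n!)
        loglik += n * eta - mu - math.lgamma(n + 1)
    return loglik


# Two made-up observations: half a year and a full year of exposure.
etas = [
    linear_predictor(math.log(0.5), -2.0, 0.1, 0.0, 0.0, 0.05),
    linear_predictor(math.log(1.0), -2.0, -0.2, 0.1, 0.0, -0.05),
]
counts = [0, 1]
print(poisson_loglik(counts, etas))
```

The offset enters the predictor with coefficient one, so swapping the logarithm of the policy duration for the logarithm of the distance driven changes the exposure measure without changing the rest of the code.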
-
Compositional data
• Satellite talk!
• Roel Verbelen’s talk, today, in Parallel Session 1 (Analytics), from 11:00 to 11:30!
• More on our telematics paper, with focus on the methodological contribution with respect to compositional data as predictors.
[Photo: Roel Verbelen, KU Leuven]
-
Model selection and assessment
• Exhaustive search with AIC as a global goodness-of-fit measure:
$$\text{AIC} = -2 \cdot \log L + 2 \cdot \text{EDF},$$
where EDF is the effective degrees of freedom.
• Predictive performance is assessed using proper scoring rules for count data (Czado et al., 2009) with 10-fold cross validation:
$$S = \frac{1}{\sum_{i=1}^{I} T_i} \sum_{i=1}^{I} \sum_{t=1}^{T_i} s\left(\hat P^{-\kappa_{it}}_{it}, n_{it}\right),$$
where $\hat P^{-\kappa_{it}}_{it}$ is the predictive count distribution for observation $n_{it}$, estimated with the $\kappa_{it}$-th part of the data removed.
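A small Python sketch of AIC and of three scoring rules for a Poisson predictive distribution may help in reading these formulas. It assumes the negatively oriented definitions commonly given for count data (logarithmic score −log p_n, quadratic score −2 p_n + Σ p_k², spherical score −p_n / sqrt(Σ p_k²)), so lower values are better; whether the talk uses exactly these conventions is an assumption. The predicted means stand in for out-of-fold predictions and are made up.

```python
import math


def aic(loglik, edf):
    """AIC = -2 * logL + 2 * EDF."""
    return -2.0 * loglik + 2.0 * edf


def poisson_pmf(k, mu):
    """P(N = k) for a Poisson distribution with mean mu > 0."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def scores(mu, n_obs, k_max=200):
    """Logarithmic, quadratic and spherical score of a Poisson(mu) forecast."""
    p_obs = poisson_pmf(n_obs, mu)
    # Sum of squared probabilities, truncated at k_max (negligible tail for small mu).
    norm_sq = sum(poisson_pmf(k, mu) ** 2 for k in range(k_max + 1))
    log_s = -math.log(p_obs)
    quad_s = -2.0 * p_obs + norm_sq
    sph_s = -p_obs / math.sqrt(norm_sq)
    return log_s, quad_s, sph_s


def mean_score(pred_means, counts, rule):
    """Average score over all observations; pred_means are out-of-fold predictions."""
    idx = {"logs": 0, "qs": 1, "sphs": 2}[rule]
    values = [scores(mu, n)[idx] for mu, n in zip(pred_means, counts)]
    return sum(values) / len(values)


# Made-up out-of-fold predicted means and observed claim counts.
pred_means = [0.05, 0.08, 0.12, 0.30]
counts = [0, 0, 0, 1]
for rule in ("logs", "qs", "sphs"):
    print(rule, mean_score(pred_means, counts, rule))
```

In a 10-fold setting, each `pred_means` entry would come from the model fitted on the nine folds that do not contain that observation, which is what the superscript −κ in the formula expresses.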
-
Results: discussion
• Telematics information improves predictive power.
- Gender no longer plays a role in models incorporating telematics information (cf. the Gender Directive).
- Spatial heterogeneity decreases.
- The time hybrid model, which incorporates telematics through additional risk factors, is optimal.
- Experience is preferred over age of the driver.
- Compositional driving habits have a significant impact on riskiness.
- The classic approach performs worse.
• Similar results using negative binomial regression and using exposure as offset.
-
Results: model assessment
Predictor set   EDF     AIC            logS           QS               SphS
                        value   rank   value   rank   value     rank   value     rank
Classic         32.15   11 896  4      0.1790  4      −0.91858  4      −0.95822  4
Time hybrid     39.66   11 727  1      0.1764  1      −0.91910  1      −0.95837  1
Meter hybrid    41.47   11 736  2      0.1766  2      −0.91908  2      −0.95836  2
Telematics      18.05   11 890  3      0.1787  3      −0.91860  3      −0.95822  3
• Significant impact of the use of telematics data;
• Time hybrid is the best model according to AIC and all proper scoring rules;
• Using only telematics predictors is even better than the use of traditional rating variables.
-
Time hybrid - Policy information
[Figure: policy predictors in the time hybrid model: time, age, experience, sex, material, postal code, bonus-malus, age vehicle, kwatt, fuel.]
-
Time hybrid - Telematics information
[Figure: telematics predictors in the time hybrid model: distance, yearly distance, average distance, road type 1111, road type 0111, time slot, week/weekend.]
-
Outlook
I encourage the blending idea ...
- of techniques (from machine learning, statistical modeling, actuarial science);
- of disciplines (from computer science, statistics, actuarial science, but also law);
- of people from practice and academia;
... to tackle the challenges imposed by structured and unstructured data in order to create insurance analytics, products and risk management of the future.