katrien antonio - chaire damichaire-dami.fr/files/2016/09/antonio-katrien.pdf · 2016-09-19 ·...

Actuaries and predictive modeling: past, present and future Katrien Antonio Faculty of Economics and Business LRisk Research Center KU Leuven & UvA [email protected] 3rd European Actuarial Journal Conference, Lyon September 6, 2016

  • Actuaries and predictive modeling: past, present and future

    Katrien Antonio

    Faculty of Economics and Business, LRisk Research Center, KU Leuven & UvA, [email protected]

    3rd European Actuarial Journal Conference, Lyon

    September 6, 2016

  • Goals of this talk

    Focus on two case-studies using a blend of analytic techniques.

    (1) Using risk factors in P&C pricing: a data-driven strategy with GAMs, regression trees and GLMs.

    (2) Unraveling the predictive power of telematics data in car insurance pricing.

    A blend of techniques/learning outcomes/buzz words from (recent) past, present and future?

    K. Antonio, KU Leuven & UvA Goals of this talk 2 / 35

  • Data science and predictive modeling

    (1) Schutt & O’Neil (2013), Doing data science: Straight talk from the frontline.

    What is eyebrow-raising about big data and data science?

    ‘The hype is crazy.’

    Getting past the hype?

    ‘There might be some meat in the data science sandwich’;

    ‘Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics.’

    (2) Prof. David Donoho (2015), 50 years of data science.

  • Actuarial pricing models in P&C insurance

    ▸ (Past)

    One-way and two-way analysis, minimum bias (Bailey & Simon, 1960).

    ▸ (Present)

    Risk classification in competitive markets using Generalized Linear Models for frequency and severity.

    ▸ (Future) Challenges?

    - high-dimensional variables (e.g. territory, vehicle groups);

    - (structured and unstructured) telematics data;

    - keep the model explainable to clients, regulators, ICT, ...;

    - be aware of actuarial features!!

  • Actuarial pricing models in P&C insurance: a blend of?

    de Jong & Heller; Ohlsson & Johansson; Denuit et al.

    Hastie, Tibshirani & Friedman; James et al.; Kuhn & Johnson.

  • Using risk factors in P&C pricing

    A data-driven strategy with GAMs, regression trees and GLMs.

    Katrien Antonio, KU Leuven & UvA

    Maxime Clijsters, AG Insurance

    Roel Henckaerts, KU Leuven

    Roel Verbelen, KU Leuven

  • GAM claim frequency model as starting point

    ▸ Our solution starts with an exhaustive search using GAMs.

    ▸ Best GAM according to AIC/BIC:

    \[
    \log(E(\text{nclaims})) = \log(\text{exposure}) + \beta_0 + \beta_1\,\text{coverage}_{PO} + \beta_2\,\text{coverage}_{FO} + \beta_3\,\text{fuel}_{diesel} + f_1(\text{ageph}) + f_2(\text{bm}) + f_3(\text{power}) + f_4(\text{ageph}, \text{power}) + f_5(\text{long}, \text{lat}),
    \]

    which combines an offset, log(exposure), with categorical (coverage, fuel), continuous (f1, f2, f3), interaction (f4) and spatial (f5) risk factors.
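
    A minimal sketch of how such a log-link predictor with an exposure offset turns into an expected claim count. The coefficients and the three smooth functions below are invented stand-ins, not the fitted values from the talk, and the interaction and spatial terms are left out for brevity.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not the fitted values).
BETA = {"intercept": -2.0, "coverage_PO": -0.10, "coverage_FO": -0.15, "fuel_diesel": 0.20}

def f_ageph(ageph):
    """Toy stand-in for the smooth policyholder-age effect f1."""
    return 0.5 * math.exp(-(ageph - 18) / 15.0) - 0.1

def f_bm(bm):
    """Toy stand-in for the smooth bonus-malus effect f2."""
    return 0.04 * bm

def f_power(power):
    """Toy stand-in for the smooth power effect f3."""
    return 0.002 * (power - 60)

def expected_claims(exposure, coverage, fuel, ageph, bm, power):
    """Expected claim count under a log link with log(exposure) as offset.

    The interaction f4 and the spatial effect f5 are omitted in this sketch.
    """
    eta = BETA["intercept"]
    # Any coverage level other than PO or FO is treated as the reference level.
    eta += BETA.get("coverage_" + coverage, 0.0)
    eta += BETA["fuel_diesel"] if fuel == "diesel" else 0.0
    eta += f_ageph(ageph) + f_bm(bm) + f_power(power)
    return math.exp(math.log(exposure) + eta)

full_year = expected_claims(1.0, "PO", "diesel", ageph=30, bm=5, power=70)
half_year = expected_claims(0.5, "PO", "diesel", ageph=30, bm=5, power=70)
print(f"full year: {full_year:.4f}, half year: {half_year:.4f}")
```

    Because the offset enters with a fixed coefficient of one, halving the exposure halves the expected number of claims while every risk factor effect stays unchanged.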

  • GAM claim frequency model as starting point

    [Figure: fitted GAM effects. Panels: single effects of ageph, bm and power; interaction effect of ageph and power; spatial effect.]

  • From GAMs to GLMs

    We choose the number of geo-classes by optimizing BIC for the GAM with binned spatial effect.

    Spatial effect classes per binning method (legends of six maps):

    - Equal intervals: [−0.437, −0.328), [−0.328, −0.219), [−0.219, −0.109), [−0.109, 0.000104), [0.000104, 0.11), [0.11, 0.219), [0.219, 0.328]

    - Quantile binning: [−0.437, −0.201), [−0.201, −0.136), [−0.136, −0.0846), [−0.0846, −0.0258), [−0.0258, 0.0207), [0.0207, 0.119), [0.119, 0.328]

    - Complete linkage: [−0.437, −0.382), [−0.382, −0.278), [−0.278, −0.121), [−0.121, −0.0278), [−0.0278, 0.0475), [0.0475, 0.169), [0.169, 0.328]

    - Single linkage: [−0.437, −0.415), [−0.415, −0.382), [−0.382, −0.359), [−0.359, −0.328), [−0.328, −0.318), [−0.318, 0.325), [0.325, 0.328]

    - K-means clustering: [−0.437, −0.318), [−0.318, −0.218), [−0.218, −0.134), [−0.134, −0.0448), [−0.0448, 0.0553), [0.0553, 0.169), [0.169, 0.328]

    - Fisher-Jenks: [−0.437, −0.255), [−0.255, −0.146), [−0.146, −0.0618), [−0.0618, 0.015), [0.015, 0.103), [0.103, 0.214), [0.214, 0.328]
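
    The two simplest binning rules on this slide, equal intervals and quantile binning, can be sketched as follows. This is an illustration only: the function names are hypothetical, the quantile rule uses one of several possible conventions, and the clustering-based methods (linkage, k-means, Fisher-Jenks) are not covered.

```python
from bisect import bisect_right

def equal_interval_breaks(lo, hi, k):
    """Return k + 1 break points that split [lo, hi] into k equally wide classes."""
    width = (hi - lo) / k
    return [lo + i * width for i in range(k)] + [hi]

def quantile_breaks(values, k):
    """Return k + 1 break points so that each class holds roughly the same number of values."""
    s = sorted(values)
    n = len(s)
    inner = [s[(i * n) // k] for i in range(1, k)]
    return [s[0]] + inner + [s[-1]]

def assign_class(x, breaks):
    """Index of the class [b_j, b_{j+1}) that contains x; the last class is closed on the right."""
    return bisect_right(breaks[1:-1], x)

# Range of the fitted spatial effect as reported in the map legends, seven classes.
breaks = equal_interval_breaks(-0.437, 0.328, 7)
print([round(b, 3) for b in breaks])
print(assign_class(0.05, breaks))
```

    With the rounded end points from the legends, the equal-interval breaks agree with the slide up to rounding. Quantile binning needs the municipality-level spatial effects themselves, which are not part of this transcript.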

  • From GAMs to GLMs using evolutionary trees

    ▸ We fit evolutionary trees to the single and interaction effects:

    f̂1(ageph), f̂2(bm), f̂3(power), f̂4(ageph, power).

    ▸ We bin these effects by picking the best tree according to an evaluation criterion:

    N · log(wMSE) + 4 · α · (M + 1) · log(N).

    ▸ The evaluation criterion balances goodness-of-fit (wMSE) and complexity of the tree (M), while accounting for portfolio composition as weights.

    ▸ We tune α and then use the optimal tree according to this evaluation criterion.
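
    To make the trade-off in the evaluation criterion concrete, the sketch below scores two candidate binnings of an invented effect. It assumes that M counts the bins of the piecewise-constant fit and that N is the number of effect values; both readings are assumptions, as are all numbers. No evolutionary search is performed here, only the scoring of given candidates.

```python
import math

def weighted_mse(target, fitted, weights):
    """Weighted mean squared error between an effect and its piecewise-constant fit."""
    total = sum(weights)
    return sum(w * (y - f) ** 2 for y, f, w in zip(target, fitted, weights)) / total

def bin_fit(effect, weights, cut_points):
    """Piecewise-constant fit: the weighted mean of the effect within each index bin."""
    fitted = []
    edges = [0] + list(cut_points) + [len(effect)]
    for start, stop in zip(edges[:-1], edges[1:]):
        w = weights[start:stop]
        mean = sum(wi * yi for wi, yi in zip(w, effect[start:stop])) / sum(w)
        fitted.extend([mean] * (stop - start))
    return fitted

def evaluation_criterion(wmse, n, m, alpha):
    """N * log(wMSE) + 4 * alpha * (M + 1) * log(N), as written on the slide."""
    return n * math.log(wmse) + 4 * alpha * (m + 1) * math.log(n)

# Invented toy effect at six ordered values, with portfolio weights.
effect = [0.60, 0.50, 0.10, 0.05, -0.10, -0.15]
weights = [10, 20, 40, 40, 30, 10]

coarse = bin_fit(effect, weights, [2])      # two bins
fine = bin_fit(effect, weights, [2, 4])     # three bins

for alpha in (0.1, 1.0):
    for name, fit, m in (("coarse", coarse, 2), ("fine", fine, 3)):
        wmse = weighted_mse(effect, fit, weights)
        score = evaluation_criterion(wmse, len(effect), m, alpha)
        print(f"alpha={alpha} {name}: wMSE={wmse:.5f} criterion={score:.3f}")
```

    A larger α penalizes each additional bin more heavily, so the preferred tree becomes coarser as α grows.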

  • From GAMs to GLMs using evolutionary trees

    [Figure: fitted GAM effects. Panels: single effects of ageph, bm and power; interaction effect of ageph and power.]

  • From GAMs to GLMs using evolutionary trees

    [Figure: single effects of ageph, bm and power, drawn as points; ageph-by-power panel with a residual scale from −0.75 to 0.25.]

  • From GAMs to GLMs using evolutionary trees

    ▸ Hence, we obtain a fully data-driven binning procedure.

    ▸ We use a blend of techniques:

    - trees, genetic algorithms (machine learning);

    - GAMs (flexible statistical modeling);

    - GLMs (the actuarial comfort zone).

  • Unraveling the predictive power of telematics data in car insurance pricing.

    Roel Verbelen, KU Leuven

    Katrien Antonio, KU Leuven & UvA

    Gerda Claeskens, KU Leuven

  • Telematics insurance: the future?

    ▸ The Economist, February 23, 2013, How’s my driving?

    ▸ “Underwriters have traditionally used crude demographic data such as age, location and sex to separate the testosterone-fuelled boy racers from their often tamer female counterparts. [...] By monitoring their customers’ motoring habits, underwriters can increasingly distinguish between drivers who are safe on the road from those who merely seem safe on paper. Many think that telematics insurance will become the industry norm.”

  • New rating variables due to telematics technology

    Telematics data collected in each trip: driving habits and driving style

    • the distance driven;

    • the time of day;

    • how long you have been driving;

    • the location;

    • the speed/speeding;

    • harsh or smooth braking;

    • aggressive acceleration or deceleration;

    • your cornering and parking skills.

    Possibly combined with:

    • road maps;

    • weather information;

    • traffic information.

  • Unique telematics data set from a Belgian insurer

    ▸ Telematics data collected between 2010 and 2014.

    ▸ Belgian MTPL product with telematics black box targeted to young drivers.

    ▸ Daily CSV files with trip info, aggregated on a daily basis:

    - number of trips;

    - meters traveled (in total) and

    • divided by time slot: 6:00-9:30, 9:30-16:00, 16:00-19:00, 19:00-22:00, 22:00-6:00;

    • divided by road type: motorways, urban area, abroad, any other type.
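
    Meters split by time slot can be turned into shares of the total distance, which is one way to describe driving habits independently of how much someone drives. The sketch below is only an illustration of that normalization step: the record layout and all values are invented, and it does not reproduce how the authors actually treat such shares as predictors.

```python
# Time slots as listed on the slide; the record layout below is hypothetical.
TIME_SLOTS = ["6:00-9:30", "9:30-16:00", "16:00-19:00", "19:00-22:00", "22:00-6:00"]

def aggregate(daily_records):
    """Sum the meters per time slot over several daily records."""
    totals = {slot: 0 for slot in TIME_SLOTS}
    for record in daily_records:
        for slot in TIME_SLOTS:
            totals[slot] += record.get(slot, 0)
    return totals

def to_composition(meters_by_slot):
    """Turn absolute meters per category into proportions that sum to 1."""
    total = sum(meters_by_slot.values())
    if total == 0:
        raise ValueError("no distance driven, so the composition is undefined")
    return {slot: m / total for slot, m in meters_by_slot.items()}

# Two invented daily records (meters driven per time slot).
days = [
    {"6:00-9:30": 12000, "16:00-19:00": 11000},
    {"9:30-16:00": 30000, "22:00-6:00": 7000},
]
totals = aggregate(days)
shares = to_composition(totals)
print(sum(totals.values()), shares)
```

    The same two functions apply unchanged to the road-type split by swapping the category names.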

  • Unique telematics data set from a Belgian insurer

    [Diagram: the insured, the insurer and the data provider, linked by flows of policy information, raw telematics information and aggregated telematics information.]

  • Unique telematics data set from a Belgian insurer

    [Figure: scatter plot of distance (in km, axis from 0 to 400k) against date (2010 to 2015).]

  • Description of the data

    The resulting data set has 33 259 observations:

    ▸ 10 406 unique policyholders;

    ▸ 17 681 years of insured periods;

    ▸ 0.0838 claims per insured year;

    ▸ 1481 MTPL claims at fault;

    ▸ 297 million kilometers driven;

    ▸ 0.0499 claims per 10 000 km.

    What is the best measure of exposure to risk?
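
    The two claim frequencies quoted above follow from the same claim count divided by two different exposure measures. A quick check using only the numbers on this slide (the kilometer total is rounded to the nearest million there, so the second rate is approximate):

```python
claims = 1481               # MTPL claims at fault
insured_years = 17_681      # years of insured periods
kilometers = 297_000_000    # kilometers driven, rounded as on the slide

# Time-based exposure: claims per insured year.
claims_per_year = claims / insured_years

# Distance-based exposure: claims per 10 000 km driven.
claims_per_10k_km = claims / (kilometers / 10_000)

print(f"{claims_per_year:.4f} claims per insured year")
print(f"{claims_per_10k_km:.4f} claims per 10 000 km")
```

    Both rates round to the values on the slide; which denominator to use as the offset in the frequency model is exactly the exposure question raised here.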

    [Figure: densities of the policy period (in days) and of the distance driven (in 1000 km).]

  • Policy information

    [Figure: distributions of the policy information. Densities of age, experience, age of vehicle and kwatt; proportions by bonus-malus level, gender (male/female), fuel (diesel/petrol) and material damage cover (yes/no); map of the proportion per km².]

  • Telematics information

  • Predictor sets

    [Diagram: four predictor sets (Classic, Time hybrid, Meter hybrid, Telematics), arranged by information used (policy information, telematics information) and by rating basis (time-based rating, meter-based rating).]

  • Generalized additive models

    We use GAMs (Wood, 2006):

    \[
    N_{it} \sim \text{POI}\left(\mu_{it} = \exp(\eta_{it})\right)
    \]

    \[
    \eta_{it} = \text{offset} + \eta^{cat}_{it} + \eta^{cont}_{it} + \eta^{spatial}_{it} + \eta^{re}_{it} + \eta^{comp}_{it}
    \]

    \[
    \eta^{cat}_{it} + \eta^{cont}_{it} + \eta^{spatial}_{it} = \mathbf{Z}_{it}\beta + \sum_{j=1}^{J} f_j(x_{jit}) + f_{spatial}(\text{lat}_{it}, \text{long}_{it}),
    \]

    We combine categorical + continuous + spatial + compositional (new!!) risk factors.

  • Compositional data

    ▸ Satellite talk!

    ▸ Roel Verbelen’s talk, today, in Parallel Session 1 (Analytics), from 11:00 to 11:30!

    ▸ More on our telematics paper, with a focus on the methodological contribution with respect to compositional data as predictors.

    Roel Verbelen, KU Leuven

  • Model selection and assessment

    ▸ Exhaustive search with AIC as a global goodness-of-fit measure.

    \[
    \text{AIC} = -2 \cdot \log L + 2 \cdot \text{EDF}
    \]

    where EDF is the effective degrees of freedom.

    ▸ Predictive performance is assessed using proper scoring rules for count data (Czado et al., 2009) with 10-fold cross validation

    \[
    S = \frac{1}{\sum_{i=1}^{I} T_i} \sum_{i=1}^{I} \sum_{t=1}^{T_i} s\left(\hat{P}^{-\kappa_{it}}_{it}, n_{it}\right),
    \]

    where \hat{P}^{-\kappa_{it}}_{it} is the predictive count distribution for observation n_{it}, estimated with the \kappa_{it}-th part of the data removed.
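
    A minimal sketch of AIC and of three scoring rules for a Poisson predictive distribution, using the standard negatively oriented logarithmic, quadratic and spherical scores. That these are the rules behind logS, QS and SphS in the talk is an assumption based on the cited reference. The predicted means and counts below are invented, and no cross-validation folds are built: each predicted mean is simply taken as given, whereas in the talk it would come from a model fitted without the fold holding that observation.

```python
import math

def aic(log_lik, edf):
    """AIC = -2 * logL + 2 * EDF."""
    return -2.0 * log_lik + 2.0 * edf

def poisson_pmf(k, mu):
    """Probability of observing count k under a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def scores(mu, n, kmax=50):
    """Logarithmic, quadratic and spherical score of a Poisson(mu) forecast for count n.

    All three are negatively oriented (smaller is better). The sum of squared
    probabilities is truncated at kmax, which is ample for small claim frequencies.
    """
    p_n = poisson_pmf(n, mu)
    norm_sq = sum(poisson_pmf(k, mu) ** 2 for k in range(kmax + 1))
    return {
        "logS": -math.log(p_n),
        "QS": -2.0 * p_n + norm_sq,
        "SphS": -p_n / math.sqrt(norm_sq),
    }

def mean_scores(predicted_means, counts):
    """Average each score over all held-out observations."""
    totals = {"logS": 0.0, "QS": 0.0, "SphS": 0.0}
    for mu, n in zip(predicted_means, counts):
        for name, value in scores(mu, n).items():
            totals[name] += value
    return {name: total / len(counts) for name, total in totals.items()}

# Invented out-of-sample predictions and observed claim counts.
predicted = [0.05, 0.08, 0.12, 0.20]
observed = [0, 0, 1, 0]
print(mean_scores(predicted, observed))
print(aic(log_lik=-5.0, edf=2.0))
```

    Averaging over every policyholder-period, as in the double sum above, gives one number per scoring rule and model, which is what allows the predictor sets to be ranked.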

  • Results: discussion

    ▸ Telematics information improves predictive power.

    - Gender plays no role anymore in models incorporating telematics information (cf. Gender Directive).

    - Spatial heterogeneity decreases.

    - Time hybrid model incorporating telematics through additional risk factors is optimal.

    - Experience is preferred above age of the driver.

    - Compositional driving habits have significant impact on riskiness.

    - Classic approach performs worse.

    ▸ Similar results using negative binomial regression and using exposure as offset.

  • Results: model assessment

    Predictor set    EDF      AIC              logS             QS                 SphS
                              value    rank    value    rank    value      rank    value      rank
    Classic          32.15    11 896   4       0.1790   4       −0.91858   4       −0.95822   4
    Time hybrid      39.66    11 727   1       0.1764   1       −0.91910   1       −0.95837   1
    Meter hybrid     41.47    11 736   2       0.1766   2       −0.91908   2       −0.95836   2
    Telematics       18.05    11 890   3       0.1787   3       −0.91860   3       −0.95822   3

    ▸ Significant impact of the use of telematics data;

    ▸ Time hybrid is the best model according to AIC and all proper scoring rules;

    ▸ Using only telematics predictors is even better than the use of traditional rating variables.

  • Time hybrid - Policy information

    Predictors (policy): Time, Age, Experience, Sex, Material, Postal code, Bonus-malus, Age vehicle, Kwatt, Fuel.

  • Time hybrid - Telematics information

    Predictors (telematics): Distance, Yearly distance, Average distance, Road type 1111, Road type 0111, Time slot, Week/weekend.

  • Outlook

    I encourage the blending idea ...

    - of techniques (from machine learning, statistical modeling, actuarial science);

    - of disciplines (from computer science, statistics, actuarial science, but also law);

    - of people from practice and academia;

    ... to tackle the challenges imposed by structured and unstructured data in order to create insurance analytics, products and risk management of the future.
