
Backtesting Stochastic Mortality Models: An Ex-Post Evaluation of Multi-Period-Ahead Density Forecasts

Kevin Dowd (CRIS, NUBS) Andrew J. G. Cairns (Heriot-Watt)

David Blake (Pensions Institute, Cass Business School) Guy D. Coughlan (JPMorgan)

David Epstein (JPMorgan) Marwa Khalaf-Allah (JPMorgan)

4th International Longevity Risk and Capital Market Solutions Conference

Amsterdam September 2008

2

Purposes of Paper

• To set out a framework to backtest the forecast performance of mortality models

– Backtesting = evaluation of forecasts against subsequently realised outcomes

• To apply this backtesting framework to a set of mortality models

– How well do they actually perform?

3

Background

– This study is the fourth in a series involving a collaboration between Blake, Cairns and Dowd and the LifeMetrics team at JPMorgan

– Involves actuaries, economists and investment bankers

– Of course, it is very easy (and fun!) to attack the forecasting ‘abilities’ of actuaries (remember Equitable?) and investment bankers (remember subprime? etc), but we should remember…

4

It’s not just actuaries and investment bankers who can’t forecast

5

Background

– Cairns et al. (2007) examines the empirical fits of 8 different mortality models applied to E&W and US male mortality data

– Compares model performance

• Uses a range of qualitative criteria (e.g., biological reasonableness, etc)

• Uses a range of quantitative criteria (e.g., Bayes information criterion)

6

Models considered

– Model M1 = Lee-Carter, no cohort effect

– Model M2 = Renshaw-Haberman’s 2006 cohort effect generalisation of M1

– Model M3 = Currie’s age-period-cohort model

– Model M4 = P-splines model, Currie 2004

– Model M5 = CBD two-factor model, Cairns et al (2006), no cohort effect

– Models M6, M7 and M8: alternative cohort-effect generalisations of CBD
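
For orientation, the base specifications can be written as below. The slides do not display these formulas; the notation follows the usual presentation in Cairns et al. (2007), where m(t,x) is the death rate and q(t,x) the mortality rate at age x in year t, and should be read as a reminder sketch rather than a quotation from the talk.

```latex
% M1 (Lee-Carter): age effects \beta_x and a single period effect \kappa_t
\log m(t,x) = \beta_x^{(1)} + \beta_x^{(2)} \kappa_t^{(2)}

% M2 (Renshaw-Haberman): M1 plus a cohort effect \gamma indexed by year of birth t-x
\log m(t,x) = \beta_x^{(1)} + \beta_x^{(2)} \kappa_t^{(2)} + \beta_x^{(3)} \gamma_{t-x}^{(3)}

% M5 (CBD): two period effects; \bar{x} is the mean age in the sample
\operatorname{logit} q(t,x) = \kappa_t^{(1)} + \kappa_t^{(2)} (x - \bar{x})

% M6: M5 plus a cohort effect
\operatorname{logit} q(t,x) = \kappa_t^{(1)} + \kappa_t^{(2)} (x - \bar{x}) + \gamma_{t-x}^{(3)}
```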

7

Second study, Cairns et al (2008)

– Examines ex ante plausibility of models’ density forecasts

– M4 (P-splines) not considered

– Amongst other conclusions, finds that M8 (which did very well in first study) gives very implausible forecasts for US data

– Hence, decided to drop M8 as well

– Thus, a model might fit past data well but still give unreliable forecasts

• Not enough just to look at past fits

8

Third study, Dowd et al (2008a)

– Examines the goodness of fit of models M1, M2B, M3B, M5, M6 and M7 more systematically

• M2B is a special case of M2, which uses an ARIMA(1,1,0) for the cohort effect

• M3B is a special case of M3, which uses the same ARIMA(1,1,0) for the cohort effect

– Basic idea is to unravel the models’ testable implications and test them systematically

– Finds some problems with all models, and M2B is unstable

9

Motivation for present study

– A model might

• Give a good fit to past data and

• Generate density forecasts that appear plausible ex ante

– And still produce poor forecasts

– Hence, it is essential to test performance of models against subsequently realised outcomes

• This is what backtesting is about

– In the end, it is the forecast performance that really matters

– Would you want to drive a car that hadn’t been field-tested?

10

Backtesting framework

– Choose metric of interest

• Could choose mortality rates, survival rates, life expectancy, annuity prices etc.

– Select historical lookback window used to estimate model parameters

– Select forecast horizon or lookforward window for forecasts

– Implement tests of how well forecasts subsequently performed
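
The four steps above can be sketched as a single function. This is a minimal illustration, not the authors' implementation: the forecasting model is a stand-in (a random walk with drift on the log mortality rate), the 90% interval level is an assumption, and the data series is synthetic.

```python
import math

def fit_drift(log_rates):
    """Estimate drift and volatility of a random walk from the lookback window."""
    diffs = [b - a for a, b in zip(log_rates, log_rates[1:])]
    mu = sum(diffs) / len(diffs)
    var = sum((d - mu) ** 2 for d in diffs) / (len(diffs) - 1)
    return mu, math.sqrt(var)

def backtest(years, rates, step_off, lookback, horizon, z=1.645):
    """Fit on the `lookback` years ending at `step_off`, forecast `horizon`
    years ahead, and compare the (assumed) 90% interval with the outcome."""
    i = years.index(step_off)
    window = [math.log(q) for q in rates[i - lookback + 1 : i + 1]]
    mu, sigma = fit_drift(window)
    centre = window[-1] + horizon * mu
    half = z * sigma * math.sqrt(horizon)
    lower, upper = math.exp(centre - half), math.exp(centre + half)
    realised = rates[i + horizon]
    return {"lower": lower, "upper": upper, "realised": realised,
            "inside": lower <= realised <= upper}

# Synthetic (made-up) series: rates fall 2% a year with a small alternating wobble
years = list(range(1971, 2007))
rates = [0.03 * 0.98 ** (y - 1971) * (1 + 0.01 * (-1) ** y) for y in years]

# Metric = mortality rate, lookback = 10 years, horizon = 15 years
result = backtest(years, rates, step_off=1990, lookback=10, horizon=15)
print(result)
```

Swapping in another metric or model only changes what is fitted and forecast; the window, horizon and comparison steps stay the same.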

11


– We choose to focus mainly on the mortality rate as the metric

– We choose a fixed 10-year lookback window

• This seems to be emerging as the standard amongst practitioners

– We examine a range of backtests:

• Over contracting horizons

• Over expanding horizons

• Over rolling fixed-length horizons

• Future mortality density tests
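
The three horizon designs differ only in which (stepping-off year, target year) pairs are evaluated. The sketch below enumerates them under an assumed reading of the designs: contracting = fixed target year, expanding = fixed stepping-off year, rolling = fixed horizon length. The function names and the 1980/2006 end points are illustrative assumptions taken from the chart axes, not definitions stated on the slides.

```python
def contracting(first_step_off, end_year):
    """Fixed target year; the stepping-off year advances, so the horizon shrinks."""
    return [(s, end_year) for s in range(first_step_off, end_year)]

def expanding(step_off, end_year):
    """Fixed stepping-off year; the target year advances, so the horizon grows."""
    return [(step_off, t) for t in range(step_off + 1, end_year + 1)]

def rolling(first_step_off, end_year, horizon):
    """Fixed horizon; stepping-off and target years advance together."""
    return [(s, s + horizon) for s in range(first_step_off, end_year - horizon + 1)]

pairs_contracting = contracting(1980, 2006)   # (1980, 2006), ..., (2005, 2006)
pairs_expanding = expanding(1980, 2006)       # (1980, 1981), ..., (1980, 2006)
pairs_rolling = rolling(1980, 2006, 15)       # (1980, 1995), ..., (1991, 2006)
print(len(pairs_rolling))
```

With a 15-year horizon and these end points the rolling design yields 12 target years, 1995 to 2006.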

12


– We consider forecasts both with and without parameter uncertainty

– Parameter certain case: treat estimates of parameters as if known values

– Parameter uncertain case: forecast using a Bayesian approach that allows for uncertainty in parameter estimates

• Allows for uncertainty in parameters governing period and cohort effects

– Results indicate it is very important to allow for parameter uncertainty
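
To see why the distinction matters, the sketch below compares the two cases for a deliberately simple stand-in model: a random walk with drift on the log mortality rate. The parameter-uncertain case draws the drift and variance from their posterior under a standard noninformative prior. This is an assumption made for illustration; the talk's Bayesian treatment of period and cohort effects is richer, and all numbers here are made up.

```python
import math
import random

def terminal_draws(last, mu_hat, s2, n, horizon, sims, uncertain, rng):
    """Simulate the log rate `horizon` years ahead under a random walk with drift."""
    draws = []
    for _ in range(sims):
        if uncertain:
            # PU: sigma^2 ~ (n-1) s^2 / chi-square(n-1), then mu | sigma^2 ~ N(mu_hat, sigma^2/n)
            sigma2 = (n - 1) * s2 / rng.gammavariate((n - 1) / 2, 2.0)
            mu = rng.gauss(mu_hat, math.sqrt(sigma2 / n))
        else:
            # PC: treat the point estimates as if they were the true values
            mu, sigma2 = mu_hat, s2
        draws.append(last + rng.gauss(horizon * mu, math.sqrt(horizon * sigma2)))
    return sorted(draws)

def interval(draws, level=0.90):
    """Central prediction interval from sorted simulated draws."""
    k = int(len(draws) * (1 - level) / 2)
    return draws[k], draws[-k - 1]

rng = random.Random(0)
# Hypothetical estimates from 9 annual changes (a 10-year lookback window)
args = dict(last=math.log(0.02), mu_hat=-0.02, s2=0.02 ** 2, n=9, horizon=15, sims=5000)
pc_lo, pc_hi = interval(terminal_draws(uncertain=False, rng=rng, **args))
pu_lo, pu_hi = interval(terminal_draws(uncertain=True, rng=rng, **args))
pc_width, pu_width = pc_hi - pc_lo, pu_hi - pu_lo
print(pc_width, pu_width)
```

With a short lookback window the drift is poorly estimated, and that uncertainty compounds with the horizon, so the parameter-uncertain interval comes out noticeably wider.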

13

Contracting horizon BT: age 65

[Figure: six panels of mortality-rate forecasts; x-axis: Stepping off year (1980-2005); y-axis: Mortality rate (0.01-0.04); panel titles recovered: Males aged 65: Model M1, Males aged 65: Model M2B.]

14


[Figure: six panels of mortality-rate forecasts; x-axis: Stepping off year (1980-2005); y-axis: Mortality rate (0.02-0.08); panel titles recovered: Males aged 75: Model M1, Males aged 75: Model M2B.]

15


[Figure: six panels of mortality-rate forecasts; x-axis: Stepping off year (1980-2005); y-axis: Mortality rate (0.05-0.2); no panel titles recovered.]

16

Conclusions so far

• Big difference between parameter-certain (PC) and parameter-uncertain (PU) forecasts

• PU prediction intervals usually considerably wider than PC ones

• M2B sometimes unstable

• Now consider expanding horizon predictions …

17

Prediction-Intervals from 1980: age 65

[Figure: six panels of prediction intervals; x-axis: Year (1980-2005); y-axis: Mortality rate (0.01-0.06); no panel titles recovered. Exceedance counts by panel, in extraction order:]

Panel 1: PC: [xL, xM, xU, n] = [7, 25, 1, 27]; PU: [xL, xM, xU, n] = [0, 25, 1, 27]

Panel 2: PC: [xL, xM, xU, n] = [16, 27, 0, 27]; PU: [xL, xM, xU, n] = [8, 27, 0, 27]

Panel 3: PC: [xL, xM, xU, n] = [12, 26, 1, 27]; PU: [xL, xM, xU, n] = [0, 26, 1, 27]

Panel 4: PC: [xL, xM, xU, n] = [18, 27, 0, 27]; PU: [xL, xM, xU, n] = [1, 27, 0, 27]

Panel 5: PC: [xL, xM, xU, n] = [14, 25, 1, 27]; PU: [xL, xM, xU, n] = [0, 25, 1, 27]

Panel 6: PC: [xL, xM, xU, n] = [7, 19, 1, 27]; PU: [xL, xM, xU, n] = [0, 19, 1, 27]

18


[Figure: six panels of prediction intervals; x-axis: Year (1980-2005); y-axis: Mortality rate (0.04-0.1); no panel titles recovered. Exceedance counts by panel, in extraction order:]

Panel 1: PC: [xL, xM, xU, n] = [12, 27, 0, 27]; PU: [xL, xM, xU, n] = [1, 27, 0, 27]

Panel 2: PC: [xL, xM, xU, n] = [13, 27, 0, 27]; PU: [xL, xM, xU, n] = [1, 27, 0, 27]

Panel 3: PC: [xL, xM, xU, n] = [8, 27, 0, 27]; PU: [xL, xM, xU, n] = [1, 27, 0, 27]

Panel 4: PC: [xL, xM, xU, n] = [7, 25, 1, 27]; PU: [xL, xM, xU, n] = [0, 25, 1, 27]

Panel 5: PC: [xL, xM, xU, n] = [8, 27, 0, 27]; PU: [xL, xM, xU, n] = [1, 27, 0, 27]

Panel 6: PC: [xL, xM, xU, n] = [9, 27, 0, 27]; PU: [xL, xM, xU, n] = [1, 27, 0, 27]

19


[Figure: six panels of prediction intervals; x-axis: Year (1980-2005); y-axis: Mortality rate (0.05-0.25); no panel titles recovered. Exceedance counts by panel, in extraction order:]

Panel 1: PC: [xL, xM, xU, n] = [4, 22, 0, 27]; PU: [xL, xM, xU, n] = [1, 22, 0, 27]

Panel 2: PC: [xL, xM, xU, n] = [0, 5, 1, 27]; PU: [xL, xM, xU, n] = [0, 7, 1, 27]

Panel 3: PC: [xL, xM, xU, n] = [2, 21, 0, 27]; PU: [xL, xM, xU, n] = [1, 21, 0, 27]

Panel 4: PC: [xL, xM, xU, n] = [2, 24, 0, 27]; PU: [xL, xM, xU, n] = [1, 24, 0, 27]

Panel 5: PC: [xL, xM, xU, n] = [1, 18, 0, 27]; PU: [xL, xM, xU, n] = [1, 18, 0, 27]

Panel 6: PC: [xL, xM, xU, n] = [5, 26, 0, 27]; PU: [xL, xM, xU, n] = [1, 26, 0, 27]

20

Expanding PI conclusions

• PC models have far too many lower exceedances

• PU models have exceedances that are much closer to expectations

– Especially for M1, M7 and M3B

– Suggests that PU forecasts are more plausible than PC ones

• Negligible differences between PC and PU median predictions

• Very few upper exceedances

21

Expanding PI conclusions

• Too few upper exceedances, and too many median and lower exceedances

• This indicates some upward bias, which is especially pronounced for PC forecasts

• Evidence of upward bias less clearcut for PU forecasts
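
The [xL, xM, xU, n] counts quoted on the charts can be computed as in the sketch below. The definitions are an assumed reading of the chart labels (xL = realised rates below the lower bound, xM = below the median prediction, xU = above the upper bound, n = number of observations), and the 90% interval level is likewise an assumption; the input values are made up.

```python
def exceedances(realised, lower, median, upper):
    """Return [xL, xM, xU, n] for a run of forecasts and realised outcomes."""
    n = len(realised)
    x_l = sum(r < lo for r, lo in zip(realised, lower))
    x_m = sum(r < m for r, m in zip(realised, median))
    x_u = sum(r > u for r, u in zip(realised, upper))
    return [x_l, x_m, x_u, n]

def expected_counts(n, level=0.90):
    """Rough expected [xL, xM, xU] if the forecast density were correct."""
    tail = (1 - level) / 2
    return [n * tail, n * 0.5, n * tail]

# Hypothetical realised rates and forecast bounds
realised = [0.0200, 0.0190, 0.0180, 0.0170]
lower = [0.0195, 0.0185, 0.0175, 0.0175]
median = [0.0205, 0.0195, 0.0185, 0.0180]
upper = [0.0215, 0.0205, 0.0195, 0.0190]

counts = exceedances(realised, lower, median, upper)
benchmark = expected_counts(27)
print(counts, benchmark)
```

Under these assumptions, n = 27 gives a benchmark of roughly 1.35 lower, 13.5 median and 1.35 upper exceedances. Forecasts made from the same stepping-off year are not independent, so the benchmark is a guide rather than a formal test.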

22

Rolling Fixed Horizon Forecasts

• From now on, work with PU forecasts only

• Assume illustrative horizon = 15 years

• Now examine performance of each model in turn …

23

Model M1

[Figure: x-axis: Year (1995-2006); y-axis: Mortality rate (log scale, 10^-2 to 10^-1); curves labelled Age 65, Age 75, Age 85.]

Age 65: [xL, xM, xU, n] = [1, 12, 0, 12]

Age 75: [xL, xM, xU, n] = [0, 11, 0, 12]

Age 85: [xL, xM, xU, n] = [1, 10, 0, 12]

24

Model M2B

[Figure: x-axis: Year (1995-2006); y-axis: Mortality rate (log scale, 10^-2 to 10^-1); curves labelled Age 65, Age 75, Age 85.]

Age 65: [xL, xM, xU, n] = [8, 12, 0, 12]

Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]

Age 85: [xL, xM, xU, n] = [1, 5, 0, 12]

25

Model M3B

[Figure: x-axis: Year (1995-2006); y-axis: Mortality rate (log scale, 10^-2 to 10^-1); curves labelled Age 65, Age 75, Age 85.]

Age 65: [xL, xM, xU, n] = [2, 12, 0, 12]

Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]

Age 85: [xL, xM, xU, n] = [0, 8, 0, 12]

26

Model M5

[Figure: x-axis: Year (1995-2006); y-axis: Mortality rate (log scale, 10^-2 to 10^-1); curves labelled Age 65, Age 75, Age 85.]

Age 65: [xL, xM, xU, n] = [9, 12, 0, 12]

Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]

Age 85: [xL, xM, xU, n] = [0, 8, 0, 12]

27

Model M6

[Figure: x-axis: Year (1995-2006); y-axis: Mortality rate (log scale, 10^-2 to 10^-1); curves labelled Age 65, Age 75, Age 85.]

Age 65: [xL, xM, xU, n] = [10, 12, 0, 12]

Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]

Age 85: [xL, xM, xU, n] = [0, 4, 0, 12]

28

Model M7

[Figure: x-axis: Year (1995-2006); y-axis: Mortality rate (log scale, 10^-2 to 10^-1); curves labelled Age 65, Age 75, Age 85.]

Age 65: [xL, xM, xU, n] = [4, 12, 0, 12]

Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]

Age 85: [xL, xM, xU, n] = [0, 8, 0, 12]

29

Tentative conclusions so far

• Rolling PI charts broadly consistent with earlier results

• Some evidence of upward bias but not consistent across models or always especially compelling

• M2B again shows instability

30

Mortality density tests

• Choose age (e.g., 65) and horizon (e.g., 15 years ahead)

• Use model to project pdf (or cdf) of mortality rate 15 years ahead

• Plot realised q on to pdf/cdf

• Obtain associated p-value (or PIT value)

• Reject if p is too far out in either tail
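
The steps above can be sketched with a simulated forecast distribution. This is a minimal illustration: the empirical CDF stands in for the model's projected cdf, the two-sided rejection rule is one assumed reading of "too far out in either tail", and the simulated draws are hypothetical.

```python
import bisect
import math
import random

def pit_value(simulated, realised):
    """Forecast CDF (estimated from simulated draws) evaluated at the realised rate."""
    draws = sorted(simulated)
    return bisect.bisect_right(draws, realised) / len(draws)

def reject(p, significance=0.05):
    """Assumed two-sided rule: reject if p falls in either tail."""
    return p < significance / 2 or p > 1 - significance / 2

# Hypothetical 15-year-ahead forecast draws for q at age 65 (lognormal, made up)
rng = random.Random(1)
forecast_draws = [math.exp(rng.gauss(math.log(0.018), 0.15)) for _ in range(10000)]

p = pit_value(forecast_draws, realised=0.0149)
print(p, reject(p))
```

If the forecast densities were correct, PIT values collected over many such tests would be roughly uniform on [0, 1]; values bunching near 0 would point to forecasts that are too high.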

31

Example: P-Values of Realised Mortality: Males 65, 1980 Start, Horizon = 26 Years Ahead

[Figure: six panels of forecast CDFs; x-axis: Mortality rate (0 to 0.04); y-axis: CDF under null (0 to 1). Annotation recovered for one panel only:]

Realised q = 0.0149 : p-value = 0.159

32

Many ways to do this

• For h=25 years ahead, 1 way

– 1980-2005 only

• For h=24 years ahead, 2 ways

– 1980-2004, 1981-2005

• For h=23 years ahead, 3 ways

• ….

• For h=1 year ahead, 25 ways

– 1980-1981, 1981-1982, …, 2004-2005
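
The counting can be checked by enumerating every (start year, end year) pair that fits inside 1980-2005 for each horizon. The function name is illustrative.

```python
def windows(first_year, last_year, horizon):
    """All (start, end) pairs with end - start == horizon inside the sample."""
    return [(s, s + horizon) for s in range(first_year, last_year - horizon + 1)]

ways = {h: len(windows(1980, 2005, h)) for h in range(1, 26)}
total_cases = sum(ways.values())

print(ways[25], ways[24], ways[23], ways[1])
print(total_cases)
```

Each one-year reduction in the horizon adds one more admissible start year, so the counts run 1, 2, 3, … up to 25 for h=1, giving 325 cases in total.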

33

Lots of cases to consider

• There are 25+24+23+…+1=325 separate cases to consider, each equally ‘legitimate’

• Need some way to make use of all possibilities but consolidate results

• We do so by computing p-values for each case and then working with mean p-values from each test

• These are reported below for each age, for h=5, 10 and 15 years ahead:
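
The consolidation step amounts to grouping the per-case p-values by horizon and averaging. A minimal sketch, with made-up p-values and an assumed dictionary layout keyed by (starting year, horizon):

```python
from statistics import mean

def mean_p_by_horizon(p_values):
    """Map horizon -> mean p-value, given {(start_year, horizon): p-value}."""
    grouped = {}
    for (start_year, horizon), p in p_values.items():
        grouped.setdefault(horizon, []).append(p)
    return {h: mean(ps) for h, ps in sorted(grouped.items())}

# Hypothetical p-values for one model and one age
p_values = {
    (1980, 5): 0.30, (1981, 5): 0.20,
    (1980, 10): 0.10, (1981, 10): 0.14,
    (1980, 15): 0.08,
}
summary = mean_p_by_horizon(p_values)
print(summary)
```

Averaging uses every admissible starting year, but the averaged p-values come from overlapping forecast periods, so they are best read as a summary measure rather than as a single formal test statistic.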

34

Age 65

[Figure: six panels of p-values; x-axis: Starting year (1985-2005); y-axis: P-value (0 to 1); panel titles recovered: Males aged 65: Model M1, Males aged 65: Model M2B. Average p-values recovered for three panels, in extraction order:]

Panel 1 (Model M1): Average = 0.290 for forecasts 5 years ahead; 0.188 for forecasts 10 years ahead; 0.143 for forecasts 15 years ahead

Panel 3: Average = 0.259 for forecasts 5 years ahead; 0.164 for forecasts 10 years ahead; 0.109 for forecasts 15 years ahead

Panel 5: Average = 0.193 for forecasts 5 years ahead; 0.082 for forecasts 10 years ahead; 0.039 for forecasts 15 years ahead

35

Age 75

[Figure: six panels of p-values; x-axis: Starting year (1985-2005); y-axis: P-value (0 to 1); no panel titles or averages recovered.]

36

Age 85

[Figure: six panels of p-values; x-axis: Starting year (1985-2005); y-axis: P-value (0 to 1); no panel titles or averages recovered.]

37

Conclusions from these tests

• All models perform well

• No rejections at 1% SL

• Only 3 at 5% SL

38

Overall conclusions

• Study outlines a framework for backtesting forecasts of mortality models

• As regards individual models and this dataset:

– M1, M3B, M5 and M7 perform well most of the time and there is little between them

– M2B unstable

– Of the Lee-Carter family of models, hard to choose between M1 and M3B

– Of the CBD family, M7 seems to perform best; little to choose between M5 and M7

39

Two other points stand out

• In many but not all cases, and depending also on the model, there is evidence of an upward bias in forecasts

– This is very pronounced for PC forecasts

– This bias is less pronounced for PU forecasts

• Except maybe for M2B, PU forecasts are more plausible than the PC forecasts

• Very important to take account of parameter uncertainty, more or less regardless of the model one uses

40

References

• Cairns et al. (2007) “A quantitative comparison of stochastic mortality models using data from England & Wales and the United States.” Pensions Institute Discussion Paper PI-0701, March.

• Cairns et al. (2008) “The plausibility of mortality density forecasts: An analysis of six stochastic mortality models.” Pensions Institute Discussion Paper PI-0801, April.

• Dowd et al. (2008a) “Evaluating the goodness of fit of stochastic mortality models.” Pensions Institute Discussion Paper PI-0802, September.

• Dowd et al. (2008b) “Backtesting stochastic mortality models: An ex-post evaluation of multi-year-ahead density forecasts.” Pensions Institute Discussion Paper PI-0803, September.

• These papers are also available at www.lifemetrics.com
