Backtesting Stochastic Mortality Models: An Ex-Post Evaluation of Multi-Period-Ahead Density Forecasts
Kevin Dowd (CRIS, NUBS)
Andrew J. G. Cairns (Heriot-Watt)
David Blake (Pensions Institute, Cass Business School)
Guy D. Coughlan (JPMorgan)
David Epstein (JPMorgan)
Marwa Khalaf-Allah (JPMorgan)
4th International Longevity Risk and Capital Market Solutions Conference
Amsterdam September 2008
Purposes of Paper
• To set out a framework to backtest the forecast performance of mortality models
– Backtesting = evaluation of forecasts against subsequently realised outcomes
• To apply this backtesting framework to a set of mortality models
– How well do they actually perform?
Background
– This study is the fourth in a series involving a collaboration between Blake, Cairns and Dowd and the LifeMetrics team at JPMorgan
– Involves actuaries, economists and investment bankers
– Of course, it is very easy (and fun!) to attack the forecasting ‘abilities’ of actuaries (remember Equitable?) and investment bankers (remember subprime? etc), but we should remember…
Background
– Cairns et al. (2007) examines the empirical fits of 8 different mortality models applied to E&W and US male mortality data
– Compares model performance
• Uses a range of qualitative criteria (e.g., biological reasonableness, etc.)
• Uses a range of quantitative criteria (e.g., Bayes information criterion)
Models considered
– Model M1 = Lee-Carter, no cohort effect
– Model M2 = Renshaw-Haberman’s 2006 cohort effect generalisation of M1
– Model M3 = Currie’s age-period-cohort model
– Model M4 = P-splines model, Currie 2004
– Model M5 = CBD two-factor model, Cairns et al (2006), no cohort effect
– Models M6, M7 and M8: alternative cohort-effect generalisations of CBD
Second study, Cairns et al (2008)
– Examines ex ante plausibility of models’ density forecasts
– M4 (P-splines) not considered
– Amongst other conclusions, finds that M8 (which did very well in first study) gives very implausible forecasts for US data
– Hence, decided to drop M8 as well
– Thus, a model might fit past data well but still give unreliable forecasts
• Not enough just to look at past fits
Third study, Dowd et al (2008a)
– Examines the goodness of fit of models M1, M2B, M3B, M5, M6 and M7 more systematically
• M2B is a special case of M2, which uses an ARIMA(1,1,0) process for the cohort effect
• M3B is a special case of M3, which uses the same ARIMA(1,1,0) process for the cohort effect
– Basic idea is to unravel the models’ testable implications and test them systematically
– Finds some problems with all models, and M2B is unstable
Motivation for present study
– A model might
• Give a good fit to past data and
• Generate density forecasts that appear plausible ex ante
– And still produce poor forecasts
– Hence, it is essential to test performance of models against subsequently realised outcomes
• This is what backtesting is about
– In the end, it is the forecast performance that really matters
– Would you want to drive a car that hadn’t been field-tested?
Backtesting framework
– Choose metric of interest
• Could choose mortality rates, survival rates, life expectancy, annuity prices etc.
– Select historical lookback window used to estimate model params
– Select forecast horizon or lookforward window for forecasts
– Implement tests of how well forecasts subsequently performed
Backtesting framework
– We focus mainly on the mortality rate as the metric
– We choose a fixed 10-year lookback window
• This seems to be emerging as the standard amongst practitioners
– We examine a range of backtests:
• Over contracting horizons
• Over expanding horizons
• Over rolling fixed-length horizons
• Future mortality density tests
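The three horizon schemes listed above can be sketched as sets of (stepping-off year, target year) pairs. This is a minimal illustration rather than the study's own code: the function names are hypothetical, and the 1980 first stepping-off year, 2006 final year and 15-year horizon used in the usage lines are assumptions chosen for illustration.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (stepping-off year, target year)


def contracting_horizon(first_start: int, target: int) -> List[Window]:
    """One fixed target year, forecast from successively later stepping-off years."""
    return [(s, target) for s in range(first_start, target)]


def expanding_horizon(start: int, last_year: int) -> List[Window]:
    """One fixed stepping-off year, forecasting successively later target years."""
    return [(start, t) for t in range(start + 1, last_year + 1)]


def rolling_horizon(first_start: int, last_year: int, h: int) -> List[Window]:
    """A fixed horizon of h years from every stepping-off year that allows it."""
    return [(s, s + h) for s in range(first_start, last_year - h + 1)]


# Assumed years, for illustration only.
contracting = contracting_horizon(1980, 2006)
expanding = expanding_horizon(1980, 2006)
rolling = rolling_horizon(1980, 2006, 15)
print(len(rolling), rolling[0], rolling[-1])
```

Each pair identifies one forecast that can later be compared with the realised rate for its target year.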
Backtesting framework
– We consider forecasts both with and without parameter uncertainty
– Parameter certain case: treat estimates of parameters as if known values
– Parameter uncertain case: forecast using a Bayesian approach that allows for uncertainty in parameter estimates
• Allows for uncertainty in parameters governing period and cohort effects
– Results indicate it is very important to allow for parameter uncertainty
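A toy illustration of why the two cases can differ, under loudly stated assumptions: log mortality follows a random walk with drift, the lookback data are simulated, and parameter uncertainty is represented only by drawing the drift from its approximate sampling distribution. This is a simplified stand-in, not the Bayesian approach used in the study, and every name and number below is hypothetical.

```python
import math
import random
import statistics


def simulate_log_rate(last, mu_hat, sigma_hat, n_obs, horizon, n_sims, rng, param_uncertain):
    """Simulate the log mortality rate `horizon` years ahead (random walk with drift)."""
    sims = []
    for _ in range(n_sims):
        if param_uncertain:
            # PU: each path draws its own drift from N(mu_hat, sigma_hat^2 / n_obs).
            mu = rng.gauss(mu_hat, sigma_hat / math.sqrt(n_obs))
        else:
            # PC: the point estimate is treated as the known drift.
            mu = mu_hat
        sims.append(last + sum(mu + rng.gauss(0.0, sigma_hat) for _ in range(horizon)))
    return sims


def interval_width(sims, lower=0.05, upper=0.95):
    """Width of the central interval between two empirical quantiles."""
    ordered = sorted(sims)
    lo = ordered[int(lower * (len(ordered) - 1))]
    hi = ordered[int(upper * (len(ordered) - 1))]
    return hi - lo


rng = random.Random(0)
# Assumed data: a 10-year lookback gives 9 annual changes in log mortality.
changes = [rng.gauss(-0.02, 0.03) for _ in range(9)]
mu_hat = statistics.mean(changes)
sigma_hat = statistics.stdev(changes)

pc = simulate_log_rate(math.log(0.02), mu_hat, sigma_hat, len(changes), 15, 5000, rng, False)
pu = simulate_log_rate(math.log(0.02), mu_hat, sigma_hat, len(changes), 15, 5000, rng, True)
print(interval_width(pc) < interval_width(pu))
```

With a short lookback the drift is poorly pinned down, so in this sketch the PU interval at a 15-year horizon is noticeably wider than the PC one.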
Contracting horizon BT: age 65
[Figure: six panels, "Males aged 65: Model M1", "Model M2B", "Model M3B", "Model M5", "Model M6" and "Model M7"; x-axis: stepping-off year, 1980 to 2005; y-axis: mortality rate, 0.01 to 0.04.]
Contracting horizon BT: age 75
[Figure: six panels, "Males aged 75: Model M1", "Model M2B", "Model M3B", "Model M5", "Model M6" and "Model M7"; x-axis: stepping-off year, 1980 to 2005; y-axis: mortality rate, 0.02 to 0.08.]
Contracting horizon BT: age 85
[Figure: six panels, "Males aged 85: Model M1", "Model M2B", "Model M3B", "Model M5", "Model M6" and "Model M7"; x-axis: stepping-off year, 1980 to 2005; y-axis: mortality rate, 0.05 to 0.25.]
Conclusions so far
• Big difference between parameter-certain (PC) and parameter-uncertain (PU) forecasts
• PU prediction intervals usually considerably wider than PC ones
• M2B sometimes unstable
• Now consider expanding horizon predictions …
Prediction-Intervals from 1980: age 65
[Figure: prediction intervals from 1980 for males aged 65, one panel per model; x-axis: year, 1980 to 2005; y-axis: mortality rate. Panel annotations, [xL, xM, xU, n]:]
– Model M1: PC = [7, 25, 1, 27]; PU = [0, 25, 1, 27]
– Model M2B: PC = [16, 27, 0, 27]; PU = [8, 27, 0, 27]
– Model M3B: PC = [12, 26, 1, 27]; PU = [0, 26, 1, 27]
– Model M5: PC = [18, 27, 0, 27]; PU = [1, 27, 0, 27]
– Model M6: PC = [14, 25, 1, 27]; PU = [0, 25, 1, 27]
– Model M7: PC = [7, 19, 1, 27]; PU = [0, 19, 1, 27]
Prediction-Intervals from 1980: age 75
[Figure: prediction intervals from 1980 for males aged 75, one panel per model; x-axis: year, 1980 to 2005; y-axis: mortality rate. Panel annotations, [xL, xM, xU, n]:]
– Model M1: PC = [12, 27, 0, 27]; PU = [1, 27, 0, 27]
– Model M2B: PC = [13, 27, 0, 27]; PU = [1, 27, 0, 27]
– Model M3B: PC = [8, 27, 0, 27]; PU = [1, 27, 0, 27]
– Model M5: PC = [7, 25, 1, 27]; PU = [0, 25, 1, 27]
– Model M6: PC = [8, 27, 0, 27]; PU = [1, 27, 0, 27]
– Model M7: PC = [9, 27, 0, 27]; PU = [1, 27, 0, 27]
Prediction-Intervals from 1980: age 85
[Figure: prediction intervals from 1980 for males aged 85, one panel per model; x-axis: year, 1980 to 2005; y-axis: mortality rate. Panel annotations, [xL, xM, xU, n]:]
– Model M1: PC = [4, 22, 0, 27]; PU = [1, 22, 0, 27]
– Model M2B: PC = [0, 5, 1, 27]; PU = [0, 7, 1, 27]
– Model M3B: PC = [2, 21, 0, 27]; PU = [1, 21, 0, 27]
– Model M5: PC = [2, 24, 0, 27]; PU = [1, 24, 0, 27]
– Model M6: PC = [1, 18, 0, 27]; PU = [1, 18, 0, 27]
– Model M7: PC = [5, 26, 0, 27]; PU = [1, 26, 0, 27]
Expanding PI conclusions
• PC models have far too many lower exceedances
• PU models have exceedances that are much closer to expectations
– Especially for M1, M7 and M3B
– Suggests that PU forecasts are more plausible than PC ones
• Negligible differences between PC and PU median predictions
• Very few upper exceedances
Expanding PI conclusions
• Too few upper exceedances, and too many median and lower exceedances
– Suggests some upward bias in the forecasts
• This upward bias is especially pronounced for PC forecasts
• Evidence of upward bias is less clear-cut for PU forecasts
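The exceedance counts discussed above can be sketched as follows. The slides do not define the [xL, xM, xU, n] notation, so this sketch assumes xL counts realised rates below the lower bound, xM counts realised rates below the forecast median, xU counts realised rates above the upper bound, and n is the number of forecasts. The 90% interval coverage used for the expected counts is also an assumption, and the function names and sample numbers are hypothetical.

```python
from typing import Sequence, Tuple


def exceedance_counts(
    realised: Sequence[float],
    lower: Sequence[float],
    median: Sequence[float],
    upper: Sequence[float],
) -> Tuple[int, int, int, int]:
    """Return (xL, xM, xU, n) for realised rates against forecast bounds."""
    x_l = sum(r < lo for r, lo in zip(realised, lower))   # below the lower bound
    x_m = sum(r < m for r, m in zip(realised, median))    # below the median
    x_u = sum(r > up for r, up in zip(realised, upper))   # above the upper bound
    return x_l, x_m, x_u, len(realised)


def expected_counts(n: int, coverage: float = 0.90) -> Tuple[float, float, float]:
    """Expected (xL, xM, xU) if the forecast density were well calibrated."""
    tail = (1.0 - coverage) / 2.0
    return n * tail, n * 0.5, n * tail


# Made-up numbers in which forecasts sit above the realised rates.
realised = [0.0200, 0.0180, 0.0150, 0.0120]
lower = [0.0180, 0.0170, 0.0160, 0.0150]
median = [0.0210, 0.0200, 0.0190, 0.0180]
upper = [0.0240, 0.0230, 0.0220, 0.0210]
counts = exceedance_counts(realised, lower, median, upper)
print(counts)  # (2, 4, 0, 4)
```

Under these assumptions, many lower and median exceedances together with few upper exceedances mean the realised rates fall mostly below the forecasts, which is the pattern read above as upward bias.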
Rolling Fixed Horizon Forecasts
• From now on, work with PU forecasts only
• Assume illustrative horizon = 15 years
• Now examine performance of each model in turn …
Model M1
[Figure: rolling fixed-horizon forecasts for ages 65, 75 and 85; x-axis: year, 1995 to 2006; y-axis: mortality rate, log scale, 10^-2 to 10^-1.]
Age 65: [xL, xM, xU, n] = [1, 12, 0, 12]
Age 75: [xL, xM, xU, n] = [0, 11, 0, 12]
Age 85: [xL, xM, xU, n] = [1, 10, 0, 12]
Model M2B
[Figure: rolling fixed-horizon forecasts for ages 65, 75 and 85; x-axis: year, 1995 to 2006; y-axis: mortality rate, log scale, 10^-2 to 10^-1.]
Age 65: [xL, xM, xU, n] = [8, 12, 0, 12]
Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]
Age 85: [xL, xM, xU, n] = [1, 5, 0, 12]
Model M3B
[Figure: rolling fixed-horizon forecasts for ages 65, 75 and 85; x-axis: year, 1995 to 2006; y-axis: mortality rate, log scale, 10^-2 to 10^-1.]
Age 65: [xL, xM, xU, n] = [2, 12, 0, 12]
Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]
Age 85: [xL, xM, xU, n] = [0, 8, 0, 12]
Model M5
[Figure: rolling fixed-horizon forecasts for ages 65, 75 and 85; x-axis: year, 1995 to 2006; y-axis: mortality rate, log scale, 10^-2 to 10^-1.]
Age 65: [xL, xM, xU, n] = [9, 12, 0, 12]
Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]
Age 85: [xL, xM, xU, n] = [0, 8, 0, 12]
Model M6
[Figure: rolling fixed-horizon forecasts for ages 65, 75 and 85; x-axis: year, 1995 to 2006; y-axis: mortality rate, log scale, 10^-2 to 10^-1.]
Age 65: [xL, xM, xU, n] = [10, 12, 0, 12]
Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]
Age 85: [xL, xM, xU, n] = [0, 4, 0, 12]
Model M7
[Figure: rolling fixed-horizon forecasts for ages 65, 75 and 85; x-axis: year, 1995 to 2006; y-axis: mortality rate, log scale, 10^-2 to 10^-1.]
Age 65: [xL, xM, xU, n] = [4, 12, 0, 12]
Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]
Age 85: [xL, xM, xU, n] = [0, 8, 0, 12]
Tentative conclusions so far
• Rolling PI charts broadly consistent with earlier results
• Some evidence of upward bias but not consistent across models or always especially compelling
• M2B again shows instability
Mortality density tests
• Choose age (e.g., 65) and horizon (e.g., 15 years ahead)
• Use model to project pdf (or cdf) of mortality rate 15 years ahead
• Plot realised q on to pdf/cdf
• Obtain associated p-value (or PIT value)
• Reject if p is too far out in either tail
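The steps above can be sketched with an empirical forecast distribution. This is an illustration under assumptions, not the study's implementation: the forecast density is represented by a list of simulated rates, the p-value is taken as the empirical CDF at the realised rate, and "too far out in either tail" is read as an equal split of the significance level between the two tails. The function names and sample values are hypothetical.

```python
from typing import Sequence


def pit_value(simulated_rates: Sequence[float], realised_q: float) -> float:
    """Empirical CDF of the forecast distribution evaluated at the realised rate."""
    below = sum(q <= realised_q for q in simulated_rates)
    return below / len(simulated_rates)


def reject(p: float, significance: float = 0.05) -> bool:
    """Reject when p falls in either tail (tail split is an assumption)."""
    return p < significance / 2 or p > 1 - significance / 2


# Made-up forecast sample: 100 equally likely rates from 0.001 to 0.100.
simulated = [i / 1000 for i in range(1, 101)]

p_mid = pit_value(simulated, 0.0149)
p_tail = pit_value(simulated, 0.0015)
print(p_mid, reject(p_mid))    # 0.14 False
print(p_tail, reject(p_tail))  # 0.01 True
```

In practice the simulated rates would come from the fitted mortality model's projected distribution for the chosen age and horizon.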
Example: P-Values of Realised Mortality: Males 65, 1980 Start, Horizon = 26 Years Ahead
[Figure: forecast CDF of the mortality rate under the null, one panel per model, with the realised rate q = 0.0149 marked; x-axis: mortality rate, 0 to 0.04; y-axis: CDF under null, 0 to 1. P-values of the realised rate by panel:]
– Males aged 65, Model M1: p-value = 0.159
– Males aged 65, Model M2B: p-value = 0.021
– Males aged 65, Model M3B: p-value = 0.074
– Males aged 65, Model M5: p-value = 0.049
– Males aged 65, Model M6: p-value = 0.052
– Males aged 65, Model M7: p-value = 0.165
Many ways to do this
• For h=25 years ahead, 1 way
– 1980-2005 only
• For h=24 years ahead, 2 ways
– 1980-2004, 1981-2005
• For h=23 years ahead, 3 ways
• …
• For h=1 year ahead, 25 ways
– 1980-1981, 1981-1982, …, 2004-2005
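The enumeration above can be checked with a short sketch that lists every (start year, end year) window between 1980 and 2005 for each horizon h. The function name is hypothetical; the year range is taken from the windows listed on this slide.

```python
from typing import Dict, List, Tuple


def forecast_windows(first_year: int, last_year: int) -> Dict[int, List[Tuple[int, int]]]:
    """Map each horizon h to all (start, end) windows with end - start == h."""
    windows = {}
    for h in range(1, last_year - first_year + 1):
        windows[h] = [(s, s + h) for s in range(first_year, last_year - h + 1)]
    return windows


windows = forecast_windows(1980, 2005)
print(len(windows[25]), len(windows[24]), len(windows[1]))  # 1 2 25
print(sum(len(w) for w in windows.values()))                # 325
```

A horizon of h years therefore has 26 - h possible windows in this range, from 1 window at h=25 up to 25 windows at h=1.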
Lots of cases to consider
• There are 25+24+23+…+1=325 separate cases to consider, each equally ‘legitimate’
• Need some way to make use of all possibilities but consolidate results
• We do so by computing p-values for each case and then working with mean p-values from each test
• These are reported below for each age, for h=5, 10 and 15 years ahead:
Age 65
[Figure: p-values by starting year for males aged 65, one panel per model; x-axis: starting year, 1985 to 2005; y-axis: p-value, 0 to 1. Average p-values for forecasts 5, 10 and 15 years ahead:]
– Model M1: 0.290, 0.188, 0.143
– Model M2B: 0.178, 0.086, 0.041
– Model M3B: 0.259, 0.164, 0.109
– Model M5: 0.107, 0.063, 0.042
– Model M6: 0.193, 0.082, 0.039
– Model M7: 0.270, 0.178, 0.132
Age 75
[Figure: p-values by starting year for males aged 75, one panel per model; x-axis: starting year, 1985 to 2005; y-axis: p-value, 0 to 1. Average p-values for forecasts 5, 10 and 15 years ahead:]
– Model M1: 0.297, 0.314, 0.267
– Model M2B: 0.330, 0.326, 0.321
– Model M3B: 0.314, 0.282, 0.228
– Model M5: 0.308, 0.291, 0.228
– Model M6: 0.310, 0.284, 0.226
– Model M7: 0.312, 0.258, 0.228
Age 85
[Figure: p-values by starting year for males aged 85, one panel per model; x-axis: starting year, 1985 to 2005; y-axis: p-value, 0 to 1. Average p-values for forecasts 5, 10 and 15 years ahead:]
– Model M1: 0.240, 0.326, 0.282
– Model M2B: 0.335, 0.368, 0.331
– Model M3B: 0.318, 0.386, 0.367
– Model M5: 0.327, 0.377, 0.380
– Model M6: 0.327, 0.378, 0.386
– Model M7: 0.330, 0.370, 0.371
Conclusions from these tests
• All models perform well
• No rejections at the 1% significance level (SL)
• Only 3 rejections at the 5% SL
Overall conclusions
• Study outlines a framework for backtesting forecasts of mortality models
• As regards individual models and this dataset:
– M1, M3B, M5 and M7 perform well most of the time and there is little between them
– M2B unstable
– Of the Lee-Carter family of models, hard to choose between M1 and M3B
– Of the CBD family, M7 seems to perform best; little to choose between M5 and M7
Two other points stand out
• In many but not all cases, and depending also on the model, there is evidence of an upward bias in forecasts
– This is very pronounced for PC forecasts
– This bias is less pronounced for PU forecasts
• Except maybe for M2B, PU forecasts are more plausible than the PC forecasts
• Very important to take account of param uncertainty more or less regardless of the model one uses
References
• Cairns et al. (2007) “A quantitative comparison of stochastic mortality models using data from England & Wales and the United States.” Pensions Institute Discussion Paper PI-0701, March.
• Cairns et al. (2008) “The plausibility of mortality density forecasts: An analysis of six stochastic mortality models.” Pensions Institute Discussion Paper PI-0801, April.
• Dowd et al. (2008a) “Evaluating the goodness of fit of stochastic mortality models.” Pensions Institute Discussion Paper PI-0802, September.
• Dowd et al. (2008b) “Backtesting stochastic mortality models: An ex-post evaluation of multi-year-ahead density forecasts.” Pensions Institute Discussion Paper PI-0803, September.
• These papers are also available at www.lifemetrics.com