
Model Risk – sources and some examples

Tony Bellotti

Department of Mathematics

Imperial College London

Model development

A highly simplified model development framework:

[Diagram: Model development → Model use]

In this framework, once the model is developed, we then think of it as correct.

However, the model is only an approximation to reality.

Thinking about model risk

Do you factor in the uncertainty of your model when you use it?

[Diagram: the Model development → Model use framework, extended with Model risk, Measure and Assessment steps]

• Firstly, we need to understand the sources of model risk and how to measure those risks.

• Secondly, the consequences of using the model need to be assessed in light of the model risks, prior to use.

Does model risk matter?

But… does model risk really matter?

Does it make a substantial difference in the real world?

“The reliance on models to handle risk carries its own risk” *

In securities markets, where complex pricing models are used, there is such a thing as model arbitrage, where a trader will take advantage of known errors in model structure or implementation to make money.

So there is a genuine effect. *

If this happens in retail credit, perhaps it could lead to adverse selection (eg pricing a loan below the true risk level of the borrower).

* Emanuel Derman (1996), Model Risk, Goldman Sachs Quantitative Strategies Research Notes

What about model risk in retail credit?

But retail credit employs relatively simple models, so perhaps there is no problem?

• But model complexity is not the only source of model risk (although it is an important one for pricing models).

• In the following slides I will consider several possible sources of model risk.

• Note: This is not an exhaustive list and also there is some overlap between the various categories.

• Later, I give some examples from retail finance to illustrate when there could be model risk issues.

Sources of model risk

Statistical:
» Model misspecification
» Model efficiency/inefficiency
» Data problems and selection bias
» Robustness over time
» Inappropriate use

Other/management:
» Model development resources (analysts/time)
» Publication, implementation and software error

We consider only the statistical sources of model risk.

Model misspecification (1)

• Model structure
» Do we have the correct general model structure to model the data?
» In the past, it was common to use OLS. Now it is standard to use logistic regression. Perhaps now we can ask if logit is the correct link function?
» Is the basic linear scorecard correct? Is a nonlinear structure more appropriate?

• Model assumptions: what are they and are we breaking them?
» Distributions on error terms (eg normality for OLS).
» Independence for observations in standard logistic regression. Is this really true in retail credit?

Model misspecification (2)

• Inclusion of variables.
» Too few variables may lead to biassed estimates.
» Too many will lead to less efficient estimates and, hence, less robust models.

• Variable transformations (to log or not to log?).
» With some variables like income, it is “standard” to take log.
» What about others? Age, eg?
» Some modellers use all weights-of-evidence – is this appropriate?

• Multicollinearity.
» Where predictor variables are themselves highly correlated, this can lead to inefficient or wrong estimates (in particular, it can lead to the wrong sign).
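As a rough illustration of the multicollinearity point, the sketch below computes a variance inflation factor (VIF) for the two-predictor case, where VIF = 1 / (1 - r²) and r is the correlation between the predictors. VIF is a linear-regression diagnostic, so for a logistic scorecard it is only an approximate guide. The data and the function names `pearson_r` and `vif_two_predictors` are hypothetical, made up for this example.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical data: x2 is almost a linear function of x1, x3 is not.
x1 = [float(i) for i in range(20)]
x2 = [2.0 * v + (0.5 if i % 2 else -0.5) for i, v in enumerate(x1)]
x3 = [(-1.0) ** i for i in range(20)]

vif_collinear = vif_two_predictors(x1, x2)    # large: estimates are unstable
vif_unrelated = vif_two_predictors(x1, x3)    # close to 1: little inflation
print(vif_collinear, vif_unrelated)
```

A large VIF means the variance of the coefficient estimates is inflated, which is one way the "inefficient or wrong estimates" above can arise.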

Model efficiency/inefficiency

Every model is inaccurate and every estimate is just that: an estimate.

Fortunately, most statistical models provide a measure of the accuracy of estimates (ie the standard errors).

» This is not true of all models (eg standard linear discriminant analysis and machine learning algorithms) – although it’s always possible to bootstrap.

» Remember though that the accuracy of the standard errors themselves can be suspect and is dependent on following model assumptions (or relying on model robustness).
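A minimal sketch of the bootstrap idea mentioned above: resample the data with replacement, recompute the estimate each time, and take the spread of the recomputed estimates as a standard error. For simplicity the statistic here is a plain default rate rather than a full model, so that the bootstrap result can be compared with the analytic binomial standard error. The sample and the function names are hypothetical.

```python
import math
import random

def bootstrap_se(sample, statistic, n_boot=1000, seed=0):
    """Bootstrap standard error of `statistic` evaluated on `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)   # draw n items with replacement
        estimates.append(statistic(resample))
    mean_est = sum(estimates) / n_boot
    var = sum((e - mean_est) ** 2 for e in estimates) / (n_boot - 1)
    return math.sqrt(var)

def default_rate(outcomes):
    """Proportion of defaults, where 1 = default and 0 = no default."""
    return sum(outcomes) / len(outcomes)

# Hypothetical sample of 1,000 accounts with a 10% default rate.
outcomes = [1] * 100 + [0] * 900
se_boot = bootstrap_se(outcomes, default_rate)
se_analytic = math.sqrt(0.1 * 0.9 / 1000)     # binomial standard error
print(se_boot, se_analytic)
```

The same loop applies to a model without built-in standard errors: replace `default_rate` with a function that refits the model on the resample and returns the quantity of interest.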

Data problems and selection bias

• Is the data appropriate for the modelling task?
» Reliability in data collection; eg how reliable is a self-assessment of income?
» Or, eg, based on an existing portfolio of predominantly older customers, build a model for a card targeting young customers.
» A data set of accepted loan applications, to build a scorecard across all new applications.

• Of course, the last example is the problem of selection bias.
» It is a fairly well understood model risk issue in retail credit.
» Several reject inference techniques to handle it: eg parcelling and augmentation.

Robustness over time (1)

• There are some problem domains where risk factors and distributions on variables are stable over time.

• In such domains, models remain stable.
» For example, mortality scoring models based on physiology of hospital in-patients (eg Apache III) are stable since human physiology does not change much over time.

• However, consumer credit does not remain stable over time.
» Credit risk changes over the business cycle.
» Credit usage behaviour changes over time.
» Banks’ risk appetite changes over time.
» Innovations in technology and product development change risk.

• All of these time-varying factors affect the applicability of credit risk models over time.

Robustness over time (2)

• Changes in the effect size of risk factors will have an obvious effect on the applicability of a model.

• Population drift: Changes in the distribution of predictor or outcome variables can also affect the robustness of the model.

• Slow versus sudden change (eg economic crisis) can have different effects on the applicability of a model.

• Possible approaches to dealing with this problem:
» Rebuild models regularly and Champion/challenger environment.
» Dynamic models (ie including time-varying factors in the risk model).
» Adaptive models.

Model robustness, in general

• The problem of model robustness over time generalizes to different domains: eg geographic or product type.

• For example, if we have a credit card product operating in UK, does the same scorecard model apply to Ireland?
» How different will it be?

Inappropriate use

“In terms of risk control, you’re worse off thinking you have a model and relying on it than in simply realizing there isn’t one.” *

A model may be built correctly.

However, it may be used for the wrong task.

For example, using a default model as the basis of a strategy on customer retention. Better to build a new model focussed on retention.

* Emanuel Derman (1996), Model Risk, Goldman Sachs Quantitative Strategies Research Notes

Consequences of model risk (1)

What are the consequences of model risk?

Need to measure the effect of model risk on model use:

(1) Explanatory model
• If it is important that the model is used as an explanatory model, then bias and inefficiency in model estimation will be important.
• Eg for discussion with management and regulators.

(2) Forecasting
• Individual / account level;
• Aggregate / loss forecasting;
• Does the flat maximum effect provide some robustness against model bias and inefficiency?

Consequences of model risk (2)

(3) Stress testing
• Predictions of outcome for extreme values.
• Typically, value-at-risk, expected shortfall, or scenarios.
• Effects of model risk on stress testing are likely to be different to the effect on standard forecasts.
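For reference, value-at-risk and expected shortfall can be read off a set of simulated losses as in the sketch below. It assumes one common empirical convention (VaR as the smallest loss with at least the given proportion of losses at or below it, expected shortfall as the mean of losses at or above VaR); other conventions exist. The function names and the loss values are hypothetical.

```python
import math

def value_at_risk(losses, level):
    """Empirical VaR: smallest loss with at least `level` of losses at or below it."""
    ordered = sorted(losses)
    k = math.ceil(level * len(ordered)) - 1
    return ordered[max(k, 0)]

def expected_shortfall(losses, level):
    """Mean of the losses that are at least as large as the VaR at `level`."""
    var = value_at_risk(losses, level)
    tail = [x for x in losses if x >= var]
    return sum(tail) / len(tail)

# Hypothetical simulated losses: 1, 2, ..., 100.
losses = list(range(1, 101))
print(value_at_risk(losses, 0.95))       # 95
print(expected_shortfall(losses, 0.95))  # 97.5
```

Both measures depend only on the tail of the loss distribution, which is why model risk can affect them differently from a forecast of the mean.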

I now give some quick examples of model risk, looking at usage, measurement issues and consequences….

Example 1: Misspecification / Misapplication

Performance of models for extreme cases *

Models work well at estimating expected values for “typical” cases from the population.

However, how do they fare when predicting default rates (DR) for extreme cases?

• In this experiment, a logistic regression model is built for credit card data.

• DR is then predicted for an independent test set of extreme cases (with respect to variables such as age and job) and compared with observed DR.

* Work conducted by Alice Wang as part of her third year undergraduate project.

Example 1: Results

• We see that these models tend to under- or overestimate DR for extreme cases.

• Interestingly, the parsimonious model gives better forecast results.

• Note: all extreme criteria represent 2% of the test data (N=600).

Variable                          DR          Full model   Parsimonious
Age > 67                          Observed    0.0664       0.0664
                                  Predicted   0.0722       0.0698
                                  Error       +8.8%        +5.2%
Years in current job > 24         Observed    0.110        0.110
                                  Predicted   0.0717       0.0724
                                  Error       -34.6%       -33.9%
Income (log) > 7.84               Observed    0.1034       0.1034
                                  Predicted   0.1180       0.1148
                                  Error       +14.1%       +11.0%
Years in current residence > 41   Observed    0.1146       0.1146
                                  Predicted   0.1121       0.1147
                                  Error       -2.5%        +0.1%
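The Error rows appear to be the relative difference between predicted and observed DR, (predicted - observed) / observed. That reading is an assumption, but recomputing from the rounded rates in the table reproduces the reported errors to within about 0.3 percentage points, as this sketch shows. The function name `relative_error_pct` is made up for the example.

```python
def relative_error_pct(predicted, observed):
    """Relative error of a predicted default rate, in percent."""
    return 100.0 * (predicted - observed) / observed

# (criterion, observed DR, full-model DR, parsimonious-model DR) from the table.
rows = [
    ("Age > 67", 0.0664, 0.0722, 0.0698),
    ("Years in current job > 24", 0.110, 0.0717, 0.0724),
    ("Income (log) > 7.84", 0.1034, 0.1180, 0.1148),
    ("Years in current residence > 41", 0.1146, 0.1121, 0.1147),
]

for name, observed, full, parsimonious in rows:
    print(f"{name}: full {relative_error_pct(full, observed):+.1f}%, "
          f"parsimonious {relative_error_pct(parsimonious, observed):+.1f}%")
```

Small differences from the table are expected, since the table's errors were presumably computed from unrounded rates.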

Example 2: Selection bias

Simulation study

• The problem of selection bias in application models is well known and several reject inference methods have been proposed.

• Unfortunately, in a real world context it is not usually possible to accurately evaluate the extent of the bias, or the effectiveness of a reject inference method, since outcomes for rejects are unknown.

• However, simulation studies can be used to show the effect. These are valuable to demonstrate the extent of the problem.

Here is the result of a simulation study using an augmentation method.
• In a nutshell, augmentation is a method that weights observations from the accepts, usually according to how typical they are of being accepts, based on an Accept-Reject model.
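One simple form of augmentation can be sketched as follows. It assumes applications have already been grouped into bands by an Accept-Reject model score, and gives each accepted case in a band the weight (accepts + rejects) / accepts for that band, so that accepts in bands with many rejects count for more. This is an illustrative variant, not necessarily the one used in the study; the band labels and the function name are hypothetical.

```python
from collections import Counter

def augmentation_weights(bands, accepted):
    """Map the index of each accepted application to its augmentation weight.

    bands:    band label per application (from an Accept-Reject model score)
    accepted: True/False per application
    """
    total_per_band = Counter(bands)
    accepts_per_band = Counter(b for b, a in zip(bands, accepted) if a)
    return {
        i: total_per_band[b] / accepts_per_band[b]
        for i, (b, a) in enumerate(zip(bands, accepted))
        if a
    }

# Hypothetical data: every "high" band application is accepted,
# but only one in four "low" band applications is.
bands = ["high"] * 4 + ["low"] * 4
accepted = [True, True, True, True, True, False, False, False]

weights = augmentation_weights(bands, accepted)
print(weights)
```

The weights sum to the total number of applications, so the weighted accepts stand in for the whole applicant population. A band with no accepts at all cannot be represented this way, which is one limit of the method.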

Example 2: Results

1. Suppose we simulate 25,000 applications with two variables, income (x1) and number of previous delinquencies (x2), and an outcome: good/bad.

2. Reject 40% of applications using a scorecard.

3. Build an unbiassed model S1 on all applications:
» Score = -2.05 + 1.47 x1 - 0.64 x2
» (Remember, in the real world we could not build S1, since we do not have outcomes for rejects.)

4. Now build a biassed model S2 based on just the 60% accepted cases:
» Score = -2.08 + 1.43 x1 - 0.32 x2

Notice the difference in the coefficient estimate on x2.

Why does this happen?

Example 2 continued

• This graph shows that the distribution of x2 is not the same for the accepted population as for all applications.

• Those with high numbers of delinquencies are under-represented.
• This affects the model estimation.

[Figure: distribution of x2 (values 0 to 5) for all applications and for accepted applications, shown as percentages on an axis from 0% to 80%]

Example 2 continued

A model using augmentation S3 uses only the sample of accepts like S2, but weights observations with high delinquency more heavily in the accepted sample.

Hence model estimation is closer to the unbiassed model:
• Score = -2.05 + 1.59 x1 - 0.46 x2

The new model also gives better results than S2 on an independent test set:

Model               AUC
S1 (unbiassed)      0.844
S2 (biassed)        0.832
S3 (augmentation)   0.840

One lesson here is that simulation studies are of value in giving insight into aspects of model risk that are not immediately measurable in the real-world setting.

Example 3: Model estimation error

Incorporating model estimation error in loss forecasts

Take the log-odds score from a scorecard and use it as the single predictor in a univariate logistic regression model.
• Of course, the coefficient estimate on the score is 1 (approximately).
• However, there is a standard error, which allows us to construct a confidence interval (CI) for the coefficient.

What consequences does this have in a real example?

Experiments with 50,000 credit cards where default rate = 0.2.
• This has a small effect on estimates of PD: if the PD estimate at the point estimate of the coefficient is 0.2, then the 99% CI gives (0.193, 0.207).
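The way a CI on the coefficient turns into a CI on PD can be sketched as below. The sketch assumes PD = 1 / (1 + exp(-beta × score)) with no intercept, a score equal to the log-odds of default, and a point estimate beta = 1. The standard error of 0.012 is hypothetical: it is not taken from the study, and was chosen only because it gives an interval of the same size as the one quoted above.

```python
import math
from statistics import NormalDist

def pd_from_score(score, beta):
    """PD from a log-odds score under a univariate logistic model, no intercept."""
    return 1.0 / (1.0 + math.exp(-beta * score))

def pd_interval(score, beta_hat, se, level=0.99):
    """Map a normal-theory CI for the coefficient onto the PD scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    beta_lo, beta_hi = beta_hat - z * se, beta_hat + z * se
    pds = sorted(pd_from_score(score, b) for b in (beta_lo, beta_hi))
    return pds[0], pds[1]

score = math.log(0.2 / 0.8)        # log-odds of default for PD = 0.2
lo, hi = pd_interval(score, beta_hat=1.0, se=0.012)   # se is hypothetical
print(round(lo, 3), round(hi, 3))  # 0.193 0.207
```

Because the logistic function is monotonic in the coefficient for a fixed score, mapping the two endpoints of the coefficient CI is enough to get the PD interval.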

Example 3 continued

Effect on expected loss, EL = PD × LGD × EAD:

However, if we look at value-at-risk (VaR) of EL, then the small variation in the model has a bigger impact.

Using Monte Carlo simulation of EL, either (A) with a fixed coefficient, or (B) with generated values of the coefficient:

• At the 99% level, VaR for simulation study (B) is 4% higher than for study (A).
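The comparison between studies (A) and (B) can be imitated with a toy simulation. Everything numerical here is an assumption for illustration: a homogeneous portfolio of 200 accounts with PD = 0.2, constant LGD and EAD, independent defaults, and in (B) a coefficient drawn from a normal distribution with an exaggerated standard error of 0.1. It is not the simulation design of the cited paper and will not reproduce the 4% figure.

```python
import math
import random

def simulate_losses(n_sims, n_accounts, score, beta_se, lgd, ead, seed):
    """Simulate portfolio losses; beta_se = 0 keeps the coefficient fixed at 1."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sims):
        # (B): draw a coefficient per scenario; (A): keep it at its point estimate.
        beta = rng.gauss(1.0, beta_se) if beta_se > 0 else 1.0
        pd = 1.0 / (1.0 + math.exp(-beta * score))
        defaults = sum(1 for _ in range(n_accounts) if rng.random() < pd)
        losses.append(defaults * lgd * ead)
    return losses

def var_at(losses, level):
    """Empirical value-at-risk at the given level."""
    ordered = sorted(losses)
    return ordered[math.ceil(level * len(ordered)) - 1]

score = math.log(0.2 / 0.8)   # log-odds of default for PD = 0.2
common = dict(n_sims=2000, n_accounts=200, score=score, lgd=0.8, ead=1000.0)

losses_a = simulate_losses(beta_se=0.0, seed=1, **common)   # (A) fixed coefficient
losses_b = simulate_losses(beta_se=0.1, seed=2, **common)   # (B) generated values
var_a, var_b = var_at(losses_a, 0.99), var_at(losses_b, 0.99)
print(var_a, var_b)
```

The coefficient draw in (B) shifts every account's PD in the same direction within a scenario, so it does not diversify away across the portfolio. That is why it widens the tail much more than it moves the mean.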

Based on Bellotti (2011), A simulation study of Basel II expected loss distributions for a portfolio of credit cards. Journal of Financial Services Marketing

Example 4: Misspecification

Using Logit versus Poisson link function

In the context of large defaultable bond portfolios, Lucas and Verhoef* experiment with Logit and Poisson link functions.

• Note: there is a good rationale for using a Poisson link function since default time can be modelled as a Poisson process.
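To see how the two link functions can differ, the sketch below compares the logit form with one form that follows from a Poisson process: if defaults arrive with intensity exp(eta) over the horizon, the probability of at least one default is 1 - exp(-exp(eta)). Using this particular form is an assumption; the exact specification in Lucas and Verhoef may differ. The function names are made up for the example.

```python
import math

def pd_logit(eta):
    """Default probability under a logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def pd_poisson(eta):
    """Probability of at least one default when the intensity is exp(eta)."""
    return 1.0 - math.exp(-math.exp(eta))

# Compare the two links across a range of linear predictor values.
for eta in (-5.0, -2.0, 0.0):
    print(eta, round(pd_logit(eta), 4), round(pd_poisson(eta), 4))
```

The two forms nearly coincide when the default probability is small and separate as it grows, with the Poisson form always at least as large. This is consistent with the links agreeing on typical outcomes while disagreeing in the tail.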

How do the models perform in estimating expected loss?

* Lucas A and Verhoef B (2012), Aggregating Credit and Market Risk: the Impact of Model Specification, working paper, Tinbergen Institute, VU University Amsterdam

Example 4 continued

For two segments, they report these results:

                  Low quality          Medium quality
                  Logit    Poisson     Logit    Poisson
Expected Loss     -2.6     -2.6        -5.3     -5.3
VaR (99.9%)       12.2     13.4        9.5      9.9

• Hardly any model misspecification problem for Expected Loss estimates…

• But, importantly, for VaR, Logit underestimates (relative to Poisson).

“model specification matters … This is surprising, as the shape of the link function is deemed to be less important for computing capital requirements.” *

Example 5: Robustness over time

Use of time-varying risk factors for loss forecasting

One approach to dealing with changing risk levels over time is to include macroeconomic time series.

Survival models are a good way to do this since macroeconomic and behavioural data can be included as time-varying covariates (TVCs).

• Model time to default as a failure event.
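The idea of a time-varying covariate can be illustrated with a simplified discrete-time hazard model, used here in place of a full survival model such as the one in the cited work. Each month has a default hazard that depends on that month's macroeconomic value, and the cumulative PD is one minus the product of monthly survival probabilities. The coefficients and the unemployment paths are hypothetical.

```python
import math

def monthly_hazard(base, coef, macro_value):
    """Monthly default hazard given that month's macroeconomic value."""
    return 1.0 / (1.0 + math.exp(-(base + coef * macro_value)))

def cumulative_pd(base, coef, macro_path):
    """Probability of default at some point along the macroeconomic path."""
    survival = 1.0
    for value in macro_path:
        survival *= 1.0 - monthly_hazard(base, coef, value)
    return 1.0 - survival

# Hypothetical 12-month unemployment rate paths (percent).
stable = [5.0] * 12
rising = [5.0 + 0.25 * t for t in range(12)]

pd_stable = cumulative_pd(-6.0, 0.2, stable)
pd_rising = cumulative_pd(-6.0, 0.2, rising)
print(pd_stable, pd_rising)
```

Feeding a stressed macroeconomic path through the same model is what allows this kind of model to be used for stress testing as well as forecasting.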

Experiment on a portfolio of UK credit card data: *
» Training data: 400,000 credit cards over the period 1999 to 2004.
» Forecast for 150,000 credit cards from 2005 to mid-2006.

* Bellotti and Crook (2009), Forecasting and stress testing credit card default using dynamic models, working paper, Credit Research Centre, Edinburgh

Example 5: Results

Interest rate and unemployment rate are both statistically significant when included in the model.

We compare default rate (DR) forecasts between models with application variables (AV) only (eg age, income, employment status, housing status, at application), behavioural variables (BV) and macroeconomic variables (MV).

MAD = mean absolute difference between estimated and observed DR.

Model       MAD
AV          0.087
AV+BV       0.058
AV+BV+MV    0.049

This shows an improvement in aggregate forecasts when macroeconomic data is included in the model.
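The MAD measure defined above is straightforward to compute from a series of estimated and observed default rates, as in this short sketch. The two series are hypothetical and are not data from the study.

```python
def mean_absolute_difference(estimated, observed):
    """Mean absolute difference between estimated and observed default rates."""
    if len(estimated) != len(observed):
        raise ValueError("series must have the same length")
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / len(estimated)

# Hypothetical default rates over three forecast periods.
estimated_dr = [0.020, 0.030, 0.040]
observed_dr = [0.025, 0.030, 0.030]

mad = mean_absolute_difference(estimated_dr, observed_dr)
print(mad)
```

A lower MAD means the aggregate forecast tracks the observed default rate more closely over the forecast period.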

Conclusion

• There is a genuine problem of model risk. We have seen some suggestive examples.

• We need to understand the sources of model risk.

• We need to know the consequences of model risk and how to measure it.

• We need to find ways to manage model risk: develop methods to reduce or control it, and incorporate model risk in our decision making.
