validation of internal rating and scoring models ... of internal rating and scoring ... [basel ii,...
Post on 20-Mar-2018
230 Views
Preview:
TRANSCRIPT
©2005 EYGM Limited. All Rights Reserved.
Validation of InternalRating and ScoringModels
Dr. Leif BoegeleinGlobal Financial Services Risk ManagementLeif.Boegelein@ch.ey.com
07.09.2005
2Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Agenda
1. Motivation & Objectives
2. Measuring Discriminatory Power
3. Measuring the Quality of PD Calibration
4. Model Validation and “Rating Philosophies”
5. Relevance for Retail Credit Scoring
6. Summary
3Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Regulatory Requirements on “Validation of internal estimates” (BII §§500-505)
Motivation: Regulatory Compliance F-IRB / A-IRB
Discriminatory Power
Calibration
Action plan
Continuous Improvement
Independent Staff
Benchmarking
…
“…a bank must demonstrate to its supervisor that the internal validation process enables it to assess the performance of internal rating and risk estimation systems consistently and meaningfully.” [Basel II, §500]
“Banks must regularly compare realised default rates with estimated PDs for each grade and be able to demonstrate that the realized default rates are within the expected range for that grade.” [Basel II, §501]
“Banks must also use other quantitative validation tools and comparisons with relevant external data sources. The analysis must be based on data that are appropriate to the portfolio, are updated regularly, and cover a relevant observation period.“[Basel II, §502]
“Banks must have a robust system in place to validate the accuracy and consistency of rating systems, processes, and the estimation of all relevant risk components…” [Basel II, §500]
UK Waiver ApplicationCP189, CP05/03
4Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Regulatory developments: Europe
2005 2006 2007 2008
Bas
el II
Fin
al A
ccor
dJu
ne 2
004
CR
D R
elea
sed
July
200
4
Trading Book Review Consultative Paper
April 2005
CEB
SG
uida
nce
on P
illar
2M
ay 2
004
FSA CP 05/03
Final CRD4th quarter 2005
FSA CP
CRD Legislated
FSA PrudentialSource Book
Retail IRB FIRB BI StandardisedCredit Risk Operational Risk
BI StandardisedCredit Risk Operational RiskStandardised
Retail IRB FIRBCredit Risk
BI StandardisedOperational Risk
Basel I Advanced IRBCredit Risk
AMAOperational Risk
Credit Risk
Operational RiskAMA
Advanced IRB
11
22
33FSA Application for AccreditationFSA Application for Accreditation
QIS 4/5QIS 4/5 RecalibrationOf Accord
RecalibrationOf Accord
Parallel RunningParallel Running
5Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Model Validation in the context of the UK Waiver application. CP05/03
5
CHALLENGES• Roll-out plan – THE KEY
DECISION• Single point of contact – the
right person at the right level• Effective governance
framework
CHALLENGES• Basel II, CRD, CEBS, FSA • Comprehensive assessment• Sign-off
CHALLENGES• Section subject to most
scrutiny• Validation of Credit Risk
Models• Evidencing governance & use
test
CHALLENGES• Quantitative impact study
CHALLENGES• Gathering of comprehensive
information per model• Evidencing compliance with
requirements
CHALLENGES• Senior management held
accountable• How to involve them to the
level required?
Section A: Overview of Structure & Governance Framework
Section B: Self Assessment Section C: Summary of Firm’s Approach in Key Areas
Section D: Capital Impact Section E: Detail on Rating System
Section F: CEO or Equivalent Sign-off
6Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Business Value
Risk Measurement
• Internal Rating System• Scoring Models• LGD, EAD Models
Transaction Decision-Making• Underwriting and
Approval Process• Credit
Authorities• Risk-Based
Pricing Tools
Monitoring• Credit Reporting• Credit Monitoring and Administration• Model performance
Portfolio Management• Credit Portfolio
Management• Capital
Allocation and Performance Management
Reporting• Internal, Management Reporting• External Stakeholder Reporting§ Credit Risk Management processes rely
heavily on the quality of credit risk models.
§ Model error influences all phases of the credit lifecycle from transaction origination to portfolio management.
§ Systematically inaccurate estimates of risk will lead to inefficient pricing and portfolio management decisions.
§ Impact:
1. Deterioration of portfolio quality due to adverse selection effects.
2. Erosion of capital base due to unexpected losses, inadequate pricing and provisioning
Motivation: Competitive Risk Management
7Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Data QualityCompleteness: Coverage of all transactions and portfolios; missing data protocolsAppropriateness: Historical data is representative of current portfolio
Accuracy: Input data validation; frequent information refresh – up to date dataConsistency: Common identification and data definitions across organization
Data Security & ControlsAudit Trails for inputs and outputsAccess & change controls
Business Continuity Procedures
Methodology & QuantificationDevelopmental Evidence: Model development and refinement stagesBenchmarking: Comparison of results with other modelsBacktesting: Comparison of realized experience with that predicted by modelRefinement: Review of key risk drivers
ReportingInclusion in reporting target audience of key decision-makers and reviewersContent matched to audience in depth, granularity, length, sufficiencyQuality and accuracy appropriate for designated use Frequency appropriate for audience and designated use
GovernanceConsistency of application by end usersUse TestTransparency of model development, approval and validation processesIndependence of model operation from transaction/position originatorBoard responsibilities and committee structure and senior management oversightOwnership of models, ownership of validation processApproval processes for new modelsDocumentation standards for
Policies & proceduresModels and methodologiesValidation process
The Model Validation Process - Components
8Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Agenda
1. Motivation & Objectives
2. Measuring Discriminatory Power
3. Measuring the Quality of PD Calibration
4. Model Validation and “Rating Philosophies”
5. Relevance for Retail Credit Scoring
6. Summary
9Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
0 20 40 60 80 100
010
2030
Score
0 20 40 60 80 100
02
46
8
Score
0 20 40 60 80 100
010
2030
40
Score
Quantitative Validation: Discriminatory Power
Frequency of Model Scorefor complete (validation) sample
Conditional Frequency of Model Score for Non-Defaulters
Conditional Frequency of Model Scorefor Defaulters
§ Frequency plots for validation sample of retail scorecard validation sample (N=1000) indicate that the system is moderately succeeding in separating defaulters from Non-Defaulters
§ Conditional Frequencies give us the likelihood of observing a certain Score given information about Default / Non – Default. This is not the sought after PD, i.e. the likelihood of observing Default for a given Score !
§ Beyond eye-inspection, we need a metric for the degree of discrimination between Defaulters and Non-Defaulters
10Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
F(Score)
G(S
core
)
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Quantitative Validation: Discriminatory Power
Model Score Distributionfor Defaulters
Model Score Distribution for Complete Sample
F(Score)
G(Score)
§ QQ plot of the distributions is the Cumulative Accuracy Profile (CAP).
§ The Accuracy Ratio (AR) is defined as the area between the model CAP and the random model, divided by the area between the perfect model and the random model.
§ AR is typically reported between 0 (random model) and 1 (perfectmodel). Possible values in [-1;1] with negatives indicating better discrimination than random but in reverse order.
Prior PD
“Random” model
“perfect”model
Model CAP
Score
p
0 20 40 60 80 100
0.0
0.2
0.4
0.6
0.8
1.0
Score
p
0 20 40 60 80 100
0.0
0.2
0.4
0.6
0.8
1.0
CAP: ( F(Score);G(F-1(p)) )
11Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Quantitative Validation: Discriminatory Power (Example)
Score
[%] D
efau
lts
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
NN 19-19-1 (AR=0.63)Strong Logit model (AR=0.58)Weak Logit model (AR=0.26)
CAP curves for 3 models developed on a retail dataset. Validation sample 1000 cases. 2 out of the 3 models seem to have discriminatory power
§ Validation of discriminatory power must be based on a representative sample that has not been used for model development !
§ CAP and Accuracy Ratio measure how well the model is able to separate defaulters from non-defaulters
§ The exemplary models produce values for AR between 0.26 and 0.63
12Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Quantitative Validation: Discriminatory Power (Example)§ Default events are random. Even if we could
predict all default probabilities without error, the measures could still indicate low discriminatory power for periods with “odd”defaults
§ AR is dependent on the sample and can therefore not provide an objective comparison between models of different samples!
§ For small samples, low default portfolios and models developed for pools of homogeneous obligors this issue becomes critical
Fraction of Toatal Obligors
Frac
tion
of D
efau
lters
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Bootstrapping: Logit CAP and 95% confidence bounds
0.0 0.2 0.4 0.6 0.8 1.00
12
34
5AR.history
Asymptotic distributionBootstrapping
Bootstrapping: Distribution of AR
§ By simulating defaults for the portfolio and calculating AR in each scenario we gain an impression on the variability of AR itself.
13Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Quantitative Validation: Discriminatory Power (Example)
0.0 0.2 0.4 0.6 0.8 1.0
01
23
45
AR.history
Asymptotic distributionBootstrapping
Simulation: Distribution of AR, Retail Scoring ModelSimulation: Distribution of AR, Random Model
0.0 0.2 0.4 0.6 0.8 1.0
02
46
8
AR
§ For small (validation and development) sample sizes and low-default portfolios the influence of statistical noise on Accuracy ratios becomes more pronounced.
§ AR analysis should include a test versus the random model with AR=0. In the example, the distribution of model AR is comfortably far to the right of the distribution of the Random model AR. Based on the simulation results (Validation sample size 200) we can estimate the confidence level of wrongly rejecting the hypothesis that the model is a random model (<<10exp-3 in the example).
14Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Quantitative Validation: Discriminatory Power
§ The Accuracy Ratio is a random variable itself. It is dependent on the portfolio structure, the number of defaulters, the number of rating grades etc. The metric is influenced by what it is measuring and should not be interpreted without knowledge of the underlying portfolio
§ Accuracy Ratios are not comparable across models developed from different samples
15Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Agenda
1. Motivation & Objectives
2. Measuring Discriminatory Power
3. Measuring the Quality of PD Calibration
4. Model Validation and “Rating Philosophies”
5. Relevance for Retail Credit Scoring
6. Summary
16Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Quantitative Validation: Calibration
§ The Calibration of Rating system influences the complete credit lifecycle, from pricing to portfolio management decisions.
§ In practice, validation of PD accuracy is often considered a difficult exercise due to data constraints and lack of a proper definition of rating philosophy.
§ Validation of PD accuracy covers all rating classes / retail pools. Different from the traditional focus on accuracy at the cut-off point in retail scoring.
§ Simple statistical methods can be utilized to illustrate the “uncertainty” associated with PD estimates.
17Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Ra tin g c la s s e s
Defau
lt pro
babil
ity (P
D)
F o rec a s tAc tu a l
Quantitative Validation: Calibration
17
PD Accuracy: Observed Default Rates versus Predicted
§ The ultimate goal of rating models is to predict the probability of default events
§ PDs are either produced by the model or obtained via mapping of internal grades to external default experience
§ The accuracy of these estimates needs to be validated
§ Realized Default rates will deviate from estimated ones. Validation procedures need to examine whether the deviation is substantial and should lead to a review of the model or can be attributed to statistical noise. Focus :
1. Significance of Deviations
2. Monotony of PDs with regards to “risk”
Problem with model or statistical noise ?
§ Binomial test / Normal test. Determine probability of observing the realized default rate under the hypothesis that the estimated value is correct for each rating class
§ Produce overall measure of calibration accuracy for the rating system
18Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
§ Simple model to analyze calibration quality assuming independent default events:
Assume that our estimate of the PD for the reference period in a particular rating class k is . We count obligors in this particular rating class. If default events happen independently, the number of total defaults for this rating class during the reference period is binomially distributed with parameters:
§ Under this assumption, we can calculate the probability of observing a given range of the default rate. Alternatively we can fix a certain level of confidence and calculate the upper and lower bounds on the observed default rate under the hypothesis that equals the true PD pk.
§ Limitations:
• Default rates are typically influenced by common factors. Unconditional therefore appear to be “correlated”. The closely related normal test incorporates correlation between default rates / default events.
• The chosen rating philosophy determines the volatility of PD estimates and should be considered when interpreting the results as we will discuss later on.
Quantitative Validation: Binomial Test
kp̂
kb̂kn
kp̂
19Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Quantitative Validation: Calibration
0.00%
5.00%
10.00%
15.00%
20.00%
25.00%
30.00%
0 1 2 3 4 5 6 7 8
+ ++
++
+
+
• For 5 of the 7 rating classes we reject the hypothesis that the PD equals the long term average based on a two sided 95% interval and the assumption of independent defaults.
20Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
§ The results from the Binomial test look at each rating class in isolation. The Hosmer Lemeshowstatistic is an example of how the Goodness-of-Fit information can be condensed into one figure.
§ The Hosmer-Lemeshow statistic is distributed Chi square with K degrees of freedom (if we assume that the pk are estimated out of sample).
§ Lower p-values for HL document decreasing Goodness-of-Fit, i.e. the hypothesis that the observed default rate equals the assumed value become increasingly unlikely.
§ HL p-values allow us to compare rating models with different number of classes.
§ For the example, the HL statistic is 44.7, which yields a p-value < 10exp-6 for 7 degrees of freedom. We would therefore reject H0 that our estimated PDs are in line with realized defaults across rating classes.
§ The test is based on the normal approximation and independence assumption of observed pk
Quantitative Validation: Calibration
21Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Agenda
1. Motivation & Objectives
2. Measuring Discriminatory Power
3. Measuring the Quality of PD Calibration
4. Model Validation and “Rating Philosophies”
5. Relevance for Retail Credit Scoring
6. Summary
22Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Rating Philosophy§ The volatility of observed PDs within rating grades and
the migration frequency of obligors across rating grades are determined by the adopted “rating philosophy”.
§ When interpreting the previously discussed PD Accuracy measures, one has to take into account the philosophy underlying the estimations:
Through-the-cycle: Obligor ratings are not influenced by short term risk characteristics. Ratings do not change frequently, default rates vary.
Hybrid Approach: Mixture of both “pure” philosophies. The rating system incorporates current condition elements of obligor idiosyncratic and systematic risks. Data limitations force a “medium term” PD forecast.
Default Rate Volatility
Rating Migration Frequency
Point in Time Low High
Through the Cycle High Low
t [a]
PD [%
]
0 2 4 6 8
0.0
0.5
1.0
1.5
2.0
2.5
Point-in-time PDTypical Hybrid approachThrough the cycle PD
Point-in-time: Rating and PD estimate is based on current condition of the obligors’ risk characteristics, typically including the economic environment. This should result in high rating migration frequency. Default rates within rating classes should remain stable.
23Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Agenda
1. Motivation & Objectives
2. Measuring Discriminatory Power
3. Measuring the Quality of PD Calibration
4. Model Validation and “Rating Philosophies”
5. Relevance for Retail Credit Scoring
6. Summary
24Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Quantitative ValidationRetail Banking BookGoods: 40.000Bads: 5.000
New Retail Product 1 Goods: 1.000Bads: 20
New Retail Product 2Goods: 5.000Bads: 350
Old Retail Product 1 Goods: 20.000Bads: 3.000
Sub-Portfolio 1 Goods: 18.000Bads: 2.000
Sub-Portfolio 2 Goods: 1.350Bads: 500
Sub-Portfolio 3 Goods: 650Bads: 500
Old Retail Product 2 Goods: 14.000Bads: 1.630
Sub-Portfolio 4 Goods: 6.000Bads: 250
Sub-Portfolio 5 Goods: 8.000Bads: 1.130
• Example of medium sized retail bank model infrastructure• As we have seen, uncertainty of measurements and constraints in application result mostly from data restrictions (low default,
small sample, model assumptions etc.). • While on the retail banking book level, low-default and small may become important when portfolio splits are conducted to
increase specificity of the model.
25Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Agenda
1. Motivation & Objectives
2. Measuring Discriminatory Power
3. Measuring the Quality of PD Calibration
4. Model Validation and “Rating Philosophies”
5. Relevance for Retail Credit Scoring
6. Summary
26Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
Model Validation – Summary
§ Validation is a rigorous process carried out by a bank to ensure that parameter estimates result in an accurate characterization of risk to support capital adequacy assessment
§ Validation is the responsibility of banks, not supervisors
§ It is not a purely statistical exercise. The appropriateness of metrics will depend on the particular institution’s portfolios and data availability. However, quantitative measures are considered to play an integral part in validation process by institutions and regulators.
§ In practice, data is often insufficient and validation will focus on developmental evidence in model design, data quality and benchmarking.
§ In practice, model validation is often not implemented as an “actionable” and independent process. It often lacks a formal policy with statements of definitions of responsibilities, tests to be performed, metrics and benchmarks to use, thresholds for acceptable quality and actions to be taken if these are breached.
27Leif Boegelein, Credit Scoring & Credit Control IX. September 7th, 2005
References
27
Basel Committee on Banking Supervision, International Convergence of Capital Measurement and Capital Standards – A Revised Framework, June 2004
Basel Committee on Banking Supervision. Studies on the Validation of Internal Rating Systems. Basel, Feb. 2005. Working paper No.14
B. Engelmann, E. Hayden, and D. Tasche. Measuring the discriminatory power of rating systems. Nov 2002
FSA, Report and first consultation on the implementation of the new Basel and EU Capital Adequacy Standards, CP189, July 2003.
FSA, Strengthening capital standards, CP05/03, Jan. 2005
R. Rauhmeier, Validierung und Performancemessung von bankinternen Ratingsystemen, Dissertation, July 2003
L.C. Thomas, D.B. Edelman, and J.N. Crook. Credit Scoring and Its Applications. Monographs on Mathematical Modeling And Computation, Siam, 2002.
top related