Extreme Value Theory:
A useful framework for modeling
extreme OR events
Dr. Marcelo Cruz
Risk Methodology Development and Quantitative Analysis
Operational Risk Measurement
Agenda
• Database Modeling
• Measuring OR: Severity, Frequency
• Using Extreme Value Theory
• Causal Modeling: Using Multifactor Modeling
• Plans for OR Mitigation
Operational Risk Database Modelling
PROCESS
Legal suits
Interest expenses
Booking errors (P&L adjustments)
Failures in the process
Consequence = -$$$!
Human Errors
Systems Problems
Poor Controls
Process Failures
ABSTRACT PROBLEMS
OBJECTIVE PROBLEMS
Doubtful Legislation
Data Model
Market risk adjustments, error financing costs, write-offs, execution errors
Operations loss data
Measure, Control
Risk Optimization
Data Quality
Control Gaps
Organization
Volumes Sensitivity
Automation Levels
Business Continuity
CEFs
IT Environment
Process & Systems Flux
KCIs
Nostro breaks
Depot breaks
Intersystem breaks
Intercompany breaks
Interdesk breaks
Control account breaks
Unmatched confirmations
Fails
Operational Risk
P&L
Earnings
Volatility
Market Risk
Credit Risk
Operational Risk
(Revenue)
(Costs)
For the first time, banks are considering impacts on the P&L from the cost side!
Measuring Operational Risk
Building the Operational VaR
1) Estimating Severity
2) Estimating Frequency
3) Aggregating Severity and Frequency
Monte Carlo Simulation
Validation and Backtesting
Choosing the distribution
Estimating Parameters
Testing the Parameters
PDFs and CDFs
Quantiles
Measuring Operational Risk
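[Figure: loss sizes (in $) plotted over time; data labels 10, 24, 36, 22, 7, 120, 15, 52, 20, 21, 18, 80, 25]
Location = Average = 34.6
Scale = St Deviation = 32.2
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
f(x) = 1.08% (PDF - probability density function); F(x) = 30.3% (CDF - cumulative distribution function)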
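As a rough check of the figures on this slide, the sketch below fits a normal distribution to the 13 loss sizes using their sample mean and sample standard deviation. The evaluation point x = 18 is an assumption: the slide does not say where the PDF and CDF were evaluated, but 18 is one of the losses and gives values in line with the ones quoted.

```python
from statistics import NormalDist, mean, stdev

# Loss sizes (in $) read off the example chart.
losses = [10, 24, 36, 22, 7, 120, 15, 52, 20, 21, 18, 80, 25]

location = mean(losses)   # sample average
scale = stdev(losses)     # sample standard deviation (n - 1 denominator)
fitted = NormalDist(mu=location, sigma=scale)

x = 18  # assumed evaluation point, not stated on the slide
pdf_at_x = fitted.pdf(x)
cdf_at_x = fitted.cdf(x)

print(f"location={location:.1f} scale={scale:.1f}")
print(f"f({x})={pdf_at_x:.2%} F({x})={cdf_at_x:.1%}")
```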
Measuring Operational Risk
What number will correspond to 95% of the CDF? (How do I protect myself 95% of the time?)
Quantile Function = (CDF)^-1, the inverse of the CDF (solves the CDF for x)
In Excel:
Normal Quantile function = NORMINV function
Lognormal Quantile function = LOGINV function
In our example:
=NORMINV(95%,34.6,32.2) = 87.6
=LOGINV(95%,3.2,.78) = 92.7 (Heavier tail!)
(Not heavy enough, as our "VaR" would have 1 violation!)
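The same quantiles can be sketched outside Excel. The block below assumes the lognormal parameters are the sample mean and standard deviation of the log losses (about 3.24 and 0.79 before rounding). With those unrounded values the lognormal quantile comes out close to the 92.7 quoted, whereas plugging in the rounded 3.2 and 0.78 would give roughly 88.5.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev

losses = [10, 24, 36, 22, 7, 120, 15, 52, 20, 21, 18, 80, 25]
level = 0.95

# Normal quantile: the counterpart of Excel's NORMINV.
normal_q = NormalDist(mean(losses), stdev(losses)).inv_cdf(level)

# Lognormal quantile: fit a normal to the log losses, then exponentiate
# its quantile (the counterpart of Excel's LOGINV).
logs = [log(x) for x in losses]
log_mu, log_sigma = mean(logs), stdev(logs)
lognormal_q = exp(NormalDist(log_mu, log_sigma).inv_cdf(level))

# Count how many observed losses exceed each "VaR".
normal_violations = sum(x > normal_q for x in losses)
lognormal_violations = sum(x > lognormal_q for x in losses)

print(round(normal_q, 1), normal_violations)
print(round(lognormal_q, 1), lognormal_violations)
```

In both cases the single violation is the largest loss, 120.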
Measuring Operational Risk
EXTREME VALUE THEORY
[Figure: loss sizes (in $) over time, with a threshold marked]
A model chosen for its overall fit to the whole database may not provide a particularly good fit to the large losses. We need to fit a distribution specifically for the extremes.
Measuring Operational Risk
Broadly two 'types' of extremes:
• Peaks over Threshold (P.O.T.): fits the Generalised Pareto Distribution (G.P.D.)
• Distribution of maxima over a certain period: fits the Generalised Extreme Value distribution (GEV)
[Figure: two plots of loss sizes (in $) against time, one of them with a threshold marked]
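The two ways of picking out extremes can be sketched on the 13 example losses. The threshold of 50, the block size of 4 observations per period, and the ordering of the losses in time are all assumptions made only for illustration; none of them is given on the slides.

```python
# Example loss sizes (in $); the time ordering is assumed.
losses = [10, 24, 36, 22, 7, 120, 15, 52, 20, 21, 18, 80, 25]

threshold = 50   # hypothetical threshold
block_size = 4   # hypothetical period length (observations per block)

# Peaks over threshold: keep the excess of each loss above the threshold.
# These excesses are what a Generalised Pareto Distribution would be fitted to.
excesses = [x - threshold for x in losses if x > threshold]

# Block maxima: split the series into consecutive periods and keep each maximum.
# These maxima are what a GEV distribution would be fitted to.
blocks = [losses[i:i + block_size] for i in range(0, len(losses), block_size)]
block_maxima = [max(block) for block in blocks]

print(excesses)
print(block_maxima)
```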
Measuring Operational Risk
Extreme Value Theory
[Figure: loss sizes (in $) over time, with a threshold marked]
Hill Shape (losses ordered from largest to smallest):
$$\hat{\xi} = \frac{1}{k-1}\sum_{j=1}^{k-1}\ln x_j - \ln x_k$$
Graphical Tests: QQ and ME-Plots
Choose distribution
Measuring Operational Risk
Back to the example, comparing the results:
=NORMINV(95%,34.6,32.2) = 87.6
=LOGINV(95%,3.2,.78) = 92.7
1 violation (largest event = 120)
Using GEV (95%, 3-parameter) = 143.5
No violations!
Extreme Value Theory
Rank  1992     1993       1994       1995     1996
1     907,077  1,100,000  6,600,000  600,000  1,820,000
2     845,000  650,000    3,950,000  394,672  750,000
3     734,900  556,000    1,300,000  260,000  426,000
4     550,000  214,635    410,061    248,342  423,320
5     406,001  200,000    350,000    239,103  332,000
6     360,000  160,000    200,000    165,000  294,835
7     360,000  157,083    176,000    120,000  230,000
8     350,000  120,000    129,754    116,000  229,369
9     220,357  78,375     109,543    86,878   210,537
10    182,435  52,049     107,031    83,614   128,412
11    68,000   51,908     107,000    75,177   122,650
12    50,000   47,500     64,600     52,700   89,540
Example: Frauds in a British Retail Bank
Extreme Value Theory
Hill method for the estimation of the shape parameter:
$$\hat{H}_{k,n} = \frac{1}{k-1}\sum_{j=1}^{k-1}\left(\ln X_{j,n} - \ln X_{k,n}\right)$$
Rank  1995 Loss   LogLosses    Hill
1     600,000.34  13.3046855
2     394,672.11  12.8858106   0.418875
3     260,000.00  12.46843691  0.626811
4     248,341.96  12.42256195  0.463749
5     239,102.93  12.38464941  0.385724
6     165,000.00  12.01370075  0.679528
7     120,000.00  11.69524702  0.884727
8     116,000.00  11.66134547  0.792239
9     86,878.46   11.37226541  0.982289
10    83,613.70   11.33396266  0.911449
11    75,177.00   11.22760061  0.926666
12    52,700.00   10.87237073  1.197653
[Figure: Hill Plot, Hill estimate (0 to 1.4) against rank 1 to 12]
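The Hill column can be recomputed from the 1995 losses. The sketch below uses the form that matches the tabulated values: for each rank k, the average of the log losses at ranks 1 to k-1 minus the log loss at rank k. The function name is a hypothetical helper, not something from the slides.

```python
from math import log

# 1995 fraud losses, sorted in descending order (from the table).
losses_1995 = [600000.34, 394672.11, 260000.00, 248341.96, 239102.93, 165000.00,
               120000.00, 116000.00, 86878.46, 83613.70, 75177.00, 52700.00]

def hill_estimates(sorted_losses):
    """Return {k: H_k} for k = 2..n, with losses sorted largest first."""
    logs = [log(x) for x in sorted_losses]
    estimates = {}
    running_sum = 0.0
    for k in range(2, len(logs) + 1):
        running_sum += logs[k - 2]  # adds ln X at rank k-1, so the sum covers ranks 1..k-1
        estimates[k] = running_sum / (k - 1) - logs[k - 1]
    return estimates

hill = hill_estimates(losses_1995)
for k, value in hill.items():
    print(k, round(value, 6))
```

Plotting these values against k gives the Hill plot; a stable stretch of the curve is usually read as a candidate for the shape parameter.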
Extreme Value Theory
QQ-Plots:
Plotting:
$$\left\{\left(X_{k,n},\, F^{-1}(p_{k,n})\right) : k = 1, \ldots, n\right\}$$
where
$$p_{k,n} = \frac{n - k + 0.5}{n}$$
Approximate linearity suggests good fit
Uses:
1) Compare distributions
2) Identify outliers
3) Aid in finding estimates for the parameters
[Figure: QQ-Plot 1995]
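A minimal sketch of the QQ-plot construction for the 1995 losses follows. The plotting positions use (n - k + 0.5)/n with k = 1 for the largest loss. The reference distribution F is not named on the slide, so a lognormal fitted by log-moments is assumed here purely as an example.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev

# 1995 fraud losses, sorted in descending order.
losses_1995 = [600000.34, 394672.11, 260000.00, 248341.96, 239102.93, 165000.00,
               120000.00, 116000.00, 86878.46, 83613.70, 75177.00, 52700.00]
n = len(losses_1995)

# Plotting positions p_{k,n} = (n - k + 0.5) / n for k = 1 (largest) .. n (smallest).
positions = [(n - k + 0.5) / n for k in range(1, n + 1)]

# Assumed reference distribution: lognormal fitted to the log losses.
logs = [log(x) for x in losses_1995]
reference = NormalDist(mean(logs), stdev(logs))

# Each pair is (observed loss, model quantile at the same plotting position).
qq_pairs = [(x, exp(reference.inv_cdf(p))) for x, p in zip(losses_1995, positions)]

for observed, theoretical in qq_pairs:
    print(f"{observed:>12,.2f} {theoretical:>12,.2f}")
```

If the pairs fall close to a straight line when plotted, the assumed distribution is a reasonable candidate.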
Extreme Value Theory
Methods:
1) Maximum Likelihood (ML)
2) Probability Weighted Moments (PWM)
3) Moments
PWM works very well for small samples (the OR case!) and it is simpler. ML sometimes does not converge and its bias is larger.
Parameter Estimation
Extreme Value Theory
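PWM Method for the GEV (based on order statistics):
$$\hat{w}_r = \frac{1}{n}\sum_{j=1}^{n} X_{j,n}\, U_{j,n}^{\,r}, \qquad r = 0, 1, 2$$
Plotting Position:
$$U_{j,n} = p_{j,n} = \frac{n - j + 0.5}{n}$$
$$c = \frac{2w_1 - w_0}{3w_2 - w_0} - \frac{\log 2}{\log 3}$$
$$\text{Shape} = 7.8590\,c + 2.9554\,c^2$$
$$\text{Scale} = \frac{(2w_1 - w_0)\,\text{Shape}}{\Gamma(1 + \text{Shape})\,(1 - 2^{-\text{Shape}})}$$
$$\text{Location} = w_0 - \frac{\text{Scale}\,\{1 - \Gamma(1 + \text{Shape})\}}{\text{Shape}}$$
Auxiliaries:
$$\Gamma(t) = \int_0^{\infty} e^{-u} u^{t-1}\, du, \qquad t > 0$$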
Extreme Value Theory
Rank  1994          Plot Position  w1           PP^2      w2
1     6,600,000.00  0.958333333    6325000      0.918403  6061458.333
2     3,950,000.00  0.875          3456250      0.765625  3024218.75
3     1,300,000.00  0.791666667    1029166.667  0.626736  814756.9444
4     410,060.72    0.708333333    290459.6767  0.501736  205742.271
5     350,000.00    0.625          218750       0.390625  136718.75
6     200,000.00    0.541666667    108333.3333  0.293403  58680.55556
7     176,000.00    0.458333333    80666.66667  0.210069  36972.22222
8     129,754.00    0.375          48657.75     0.140625  18246.65625
9     109,543.00    0.291666667    31950.04167  0.085069  9318.762153
10    107,031.20    0.208333333    22298.16667  0.043403  4645.451389
11    107,000.00    0.125          13375        0.015625  1671.875
12    64,600.00     0.041666667    2691.666667  0.001736  112.1527778

c = -0.07731282      Hill = 1.56577
w0 = 1,125,332.41    Shape = -0.5899362    Gamma = 1.06
w1 = 968,966.58      Scale = 612,300.60
w2 = 864,378.56      Location = 1,101,869.17
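The weighted moments in the 1994 table can be recomputed as a check. The sketch below takes w_r as the average of loss times plotting position to the power r, then forms c and the PWM shape value 7.8590c + 2.9554c². It stops there on purpose: the scale and location in the summary rely on the Hill shape and the Gamma value shown, which this sketch does not try to reproduce.

```python
from math import log

# 1994 fraud losses, sorted in descending order (from the table).
losses_1994 = [6600000.00, 3950000.00, 1300000.00, 410060.72, 350000.00, 200000.00,
               176000.00, 129754.00, 109543.00, 107031.20, 107000.00, 64600.00]
n = len(losses_1994)

# Plotting positions (n - j + 0.5) / n, with j = 1 for the largest loss.
positions = [(n - j + 0.5) / n for j in range(1, n + 1)]

def pwm(r):
    """Sample probability weighted moment of order r."""
    return sum(x * p ** r for x, p in zip(losses_1994, positions)) / n

w0, w1, w2 = pwm(0), pwm(1), pwm(2)
c = (2 * w1 - w0) / (3 * w2 - w0) - log(2) / log(3)
shape = 7.8590 * c + 2.9554 * c ** 2

print(round(w0, 2), round(w1, 2), round(w2, 2))
print(round(c, 8), round(shape, 7))
```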
Extreme Value Theory
Parameter Estimation (PWM and Hill)
Parameter 1992 1993 1994 1995 1996
Shape Parameter 0.959265 0.994119 1.56577 0.679518 1.07057
Location Parameter 410,279.77 432,211.40 1,101,869.17 215,551.84 445,660.38
Scale Parameter 147,105.40 298,067.91 612,300.60 25,379.83 361,651.03
The shape parameter was estimated by the Hill method and the scale and location by the PWM.
Testing the Model - Checking the Parameters
Based on simulation, techniques like bootstrapping and the jackknife help find confidence intervals and bias in the parameters.
Bootstrapping: Let $\hat{\theta}$ be the estimate of a parameter vector $\theta$ based on a sample of operational loss events x = (x1, …, xn). An approximation to the statistical properties of $\hat{\theta}$ can be obtained by studying a sample of B bootstrap estimators $\hat{\theta}_m(b)$ (b = 1, …, B), each obtained from a sample of m observations, sampling with replacement from the observed sample x. The bootstrap sample size, m, may be larger or smaller than n. The desired sampling characteristic is obtained from properties of the sample {$\hat{\theta}_m(1)$, …, $\hat{\theta}_m(B)$}.
Jackknife Test for Model GEV
Shape Std Err = 0.4208, Scale Std Err = 116,122.0647, Location Std Err = 126,997.6469
[Figure: parameter value (Shape, Scale, Location) against loss number removed (descending), 1 to 23]
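Both resampling ideas can be sketched generically. The helper names below are hypothetical, and the estimator used in the demonstration is the plain sample mean of the 13 example losses rather than the GEV parameters, simply because its standard error is easy to verify. Any function of a sample (for instance a Hill or PWM estimator) could be passed in instead.

```python
import random
from math import sqrt
from statistics import mean, stdev

def jackknife_se(sample, estimator):
    """Standard error from leave-one-out re-estimates."""
    n = len(sample)
    leave_one_out = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    centre = mean(leave_one_out)
    return sqrt((n - 1) / n * sum((v - centre) ** 2 for v in leave_one_out))

def bootstrap_se(sample, estimator, replications=2000, m=None, seed=0):
    """Standard error from B resamples of size m drawn with replacement."""
    rng = random.Random(seed)
    m = len(sample) if m is None else m
    estimates = [estimator(rng.choices(sample, k=m)) for _ in range(replications)]
    return stdev(estimates)

losses = [10, 24, 36, 22, 7, 120, 15, 52, 20, 21, 18, 80, 25]
jk_se = jackknife_se(losses, mean)
bs_se = bootstrap_se(losses, mean)

print(round(jk_se, 2), round(bs_se, 2))
```

For the sample mean, the jackknife standard error coincides with the textbook value s/√n, which makes it a convenient sanity check before applying the same helpers to tail parameters.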
Frequency Distributions
Number of Frauds = 102
Month:             January  February  March  April  May  June  July
Number of frauds:  95       82        114    74     79   160   110
August: 115 (91%), 118 (95%), 126 (99%)
Poisson Distribution:
$$f(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad F(x) = \sum_{k=0}^{x} \frac{e^{-\lambda}\lambda^{k}}{k!}$$
[Figures: Poisson PDF and Poisson CDF]
Other popular distributions to estimate frequency are the geometric, negative binomial, binomial, Weibull, etc.
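A small sketch of the Poisson frequency fit: the mean of the seven monthly fraud counts is 102, and quantiles of a Poisson distribution with that mean give frequency figures for the next month. The quantile convention (smallest count whose cumulative probability reaches the level) is an assumption, so the results should land near the 115, 118 and 126 quoted without necessarily matching them exactly.

```python
from math import exp

monthly_frauds = [95, 82, 114, 74, 79, 160, 110]   # January to July
lam = sum(monthly_frauds) / len(monthly_frauds)     # Poisson mean

def poisson_pmf(k, lam):
    """P(N = k), built up multiplicatively to avoid huge factorials."""
    p = exp(-lam)
    for i in range(1, k + 1):
        p *= lam / i
    return p

def poisson_quantile(level, lam):
    """Smallest k with P(N <= k) >= level."""
    k, cumulative = 0, poisson_pmf(0, lam)
    while cumulative < level:
        k += 1
        cumulative += poisson_pmf(k, lam)
    return k

quantiles = {level: poisson_quantile(level, lam) for level in (0.91, 0.95, 0.99)}
print(lam, quantiles)
```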
Measuring Operational Risk
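Aggregated Loss Distribution
$$G(x) = \sum_{n=0}^{\infty} p_n\, F_X^{*n}(x)$$
where p_n is the probability of n losses (frequency) and F_X^{*n} is the n-fold convolution of the severity distribution.
No analytical solution! Needs to be solved by simulation.
[Figure: the frequency distribution (probability against number of losses) and the severity distribution (probability against loss sizes) combine into the aggregated loss distribution (probability against aggregated losses)]
Alternatives:
1) Fast Fourier Transform
2) Panjer Algorithm
3) Recursion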
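The simulation route can be sketched as follows: draw a number of losses from the frequency distribution, draw that many loss sizes from the severity distribution, sum them, and repeat. All parameters here are illustrative assumptions that mix the earlier examples (a Poisson mean of 102 and a lognormal severity with log-mean 3.2 and log-standard deviation 0.78); the slides do not specify an aggregate model. The function name is hypothetical.

```python
import random
from math import exp

def simulate_aggregate_losses(lam, log_mu, log_sigma, n_sims=5000, seed=1):
    """Monte Carlo sample of total losses per period, returned sorted."""
    rng = random.Random(seed)
    limit = exp(-lam)
    totals = []
    for _ in range(n_sims):
        # Frequency: Poisson draw via Knuth's multiplication method.
        count, product = 0, rng.random()
        while product > limit:
            count += 1
            product *= rng.random()
        # Severity: one lognormal draw per loss, summed over the period.
        totals.append(sum(rng.lognormvariate(log_mu, log_sigma) for _ in range(count)))
    return sorted(totals)

# Illustrative parameters only (assumptions, not fitted values from the deck).
totals = simulate_aggregate_losses(lam=102, log_mu=3.2, log_sigma=0.78)
mean_total = sum(totals) / len(totals)
op_var_95 = totals[int(0.95 * len(totals))]   # empirical 95% quantile

print(round(mean_total, 1), round(op_var_95, 1))
```

The empirical quantile of the simulated totals plays the role of the Operational VaR at the chosen confidence level.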
Model Backtesting and Validation
$$MRC_{mt} = \max\left[VaR_{mt}(10,1);\; S_{mt} \times \frac{1}{60}\sum_{i=0}^{59} VaR_{mt-i}(10,1)\right] + \text{CreditCharge}_{mt}$$
Multiplier based on Backtests (Between 3 and 4)
Currently for Market / Credit Risks
Model Backtesting and Validation
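$$\Pr(x) = \binom{n}{x}\, p^{x} (1-p)^{n-x}$$
$$LR = 2\left[\ln\left(\hat{p}^{\,x}(1-\hat{p})^{n-x}\right) - \ln\left(p^{x}(1-p)^{n-x}\right)\right], \qquad \hat{p} = x/n$$
where x is the number of exceptions in n observations and p is the expected exception probability.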
Kupiec Test
Exceptions can be modelled as independent draws from a binomial distribution
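A minimal sketch of the Kupiec likelihood ratio follows. It compares the binomial log-likelihood at the observed exception rate x/n with the log-likelihood at the expected rate p, and uses the standard result that the statistic is asymptotically chi-square with one degree of freedom. The backtest figures (5 exceptions in 250 days against a 1% expected rate) are hypothetical.

```python
from math import erfc, log, sqrt

def kupiec_lr(n, x, p_expected):
    """Kupiec likelihood ratio for x exceptions in n observations."""
    def log_likelihood(p):
        # Terms with a zero count contribute nothing (avoids log(0)).
        ll = 0.0
        if x > 0:
            ll += x * log(p)
        if n - x > 0:
            ll += (n - x) * log(1 - p)
        return ll
    return 2 * (log_likelihood(x / n) - log_likelihood(p_expected))

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(max(stat, 0.0) / 2))

# Hypothetical backtest: 5 exceptions in 250 days against a 1% expected rate.
lr = kupiec_lr(n=250, x=5, p_expected=0.01)
p_value = chi2_1df_pvalue(lr)

print(round(lr, 3), round(p_value, 3))
```

A statistic above about 3.84 would reject the model at the 5% level; in this hypothetical case it does not.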
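Interval Forecast Method
$$I_{mt+1} = \begin{cases} 1 & \text{if the loss at } t+1 \text{ exceeds } VaR_{mt} \\ 0 & \text{if the loss at } t+1 \text{ does not exceed } VaR_{mt} \end{cases}$$
Series must exhibit the property of correct conditional coverage (unconditional) and serial independence
Regulatory Loss Functions
$$C_m = \sum_{i=1}^{n} C_{mt+i}$$
$$C_{mt+1} = \begin{cases} f(\text{loss}_{t+1}, VaR_{mt}) & \text{if the loss at } t+1 \text{ exceeds } VaR_{mt} \\ g(\text{loss}_{t+1}, VaR_{mt}) & \text{if the loss at } t+1 \text{ does not exceed } VaR_{mt} \end{cases}$$
Define benchmarks (some subjectivity)
Under very general conditions, accurate VaR estimates will generate the lowest possible numerical score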
Understanding the Causes - Multifactor Modeling
Try to link causes to loss events
For Example: We are trying to explain the frequency and severity of frauds by using 3 different factors.
Month     Number of Op Errors  Losses ($$)  System Downtime  N. of Employees  No. of Transactions
January   95                   1,200,000    20               16               1,003
February  82                   920,000      17               16               910
March     114                  1,770,987    30               14               1,123
April     74                   652,000      15               17               903
May       79                   710,345      16               17               910
June      160                  2,100,478    41               13               1,250
July      110                  1,650,000    33               14               1,196
Losses = 4,597,086.21 - 7,300.01 × System Downtime - 286,228.59 × N. of Employees + 1,193 × No. of Transactions
N. of Op Errors = 88.88 + 6.92 × System Downtime + 5.32 × N. of Employees - 0.22 × No. of Transactions
R2 = 95%, F-test = 20.69, p-value = (0.01)
R2 = 97%, F-test = 42.57, p-value = (0.00)
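The kind of fit behind these equations can be sketched with ordinary least squares on the seven months in the table. This is a plain-Python illustration via the normal equations, assuming a linear model with an intercept; whether it reproduces the quoted coefficients exactly depends on the data and software originally used, so no output is claimed here.

```python
# Monthly data from the table: (system downtime, employees, transactions, losses).
data = [
    (20, 16, 1003, 1200000), (17, 16, 910, 920000), (30, 14, 1123, 1770987),
    (15, 17, 903, 652000), (16, 17, 910, 710345), (41, 13, 1250, 2100478),
    (33, 14, 1196, 1650000),
]
X = [[1.0, d, e, t] for d, e, t, _ in data]   # leading 1.0 is the intercept column
y = [float(loss) for *_, loss in data]

def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    beta = [0.0] * n
    for i in reversed(range(n)):
        beta[i] = (m[i][n] - sum(m[i][j] * beta[j] for j in range(i + 1, n))) / m[i][i]
    return beta

# Normal equations: (X'X) beta = X'y
k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
beta = solve(xtx, xty)

fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
y_bar = sum(y) / len(y)
r_squared = 1 - sum(r * r for r in residuals) / sum((yi - y_bar) ** 2 for yi in y)

print([round(b, 2) for b in beta], round(r_squared, 4))
```

With only seven observations and three strongly related factors, the coefficients are sensitive to small changes in the data, which is worth keeping in mind when reading them causally.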
Understanding the Causes - Multifactor Modeling
Benefits of the Model
1) Scenario Analysis / Stress Tests
Ex: Using confidence intervals (95%) of the parameters to estimate the number of frauds and the losses ($$) for the next month.
2) Cost / Benefit Analysis
Ex: If we hire 1 employee costing 100,000/year the reduction in losses is estimated to be 286,228.
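The cost/benefit reasoning can be written out with the fitted loss equation. The sketch plugs January's factor values into the equation, adds one employee, and nets the estimated reduction against the 100,000 hiring cost. The cost and the reduction are compared directly, as in the example, without adjusting for the periods they refer to. The function name is hypothetical.

```python
def predicted_losses(system_downtime, employees, transactions):
    """Losses ($) from the fitted equation quoted above."""
    return (4_597_086.21 - 7_300.01 * system_downtime
            - 286_228.59 * employees + 1_193 * transactions)

# January's factor values from the table.
base = predicted_losses(system_downtime=20, employees=16, transactions=1003)
with_extra_hire = predicted_losses(system_downtime=20, employees=17, transactions=1003)

loss_reduction = base - with_extra_hire   # equals the employee coefficient
hiring_cost = 100_000                     # cost figure from the example
net_benefit = loss_reduction - hiring_cost

print(round(base, 2), round(loss_reduction, 2), round(net_benefit, 2))
```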
Developing an OR Hedging Program
• Specific coverage
• Immediate protection against catastrophes
OPERATIONAL RISK (MEASURED)
Capital Allocation
Internal Risk Transfer
Insurance Securitization
• General coverage rather than specific risks
• It would not pay immediately after catastrophe (although some new products claim to do so)
MITIGATION (Non financial)
Developing an OR Hedging Program
AGENT | FINANCIAL INSTITUTION | RISK TRANSFER COMPANY or SPV | CAPITAL MARKET
INSTRUMENT | Insurance policy offered by RTC | Takes the risk and issues bonds linked to operational event at the financial institution | Buys the bond
FINANCIAL RESULTS | Pays a premium | Receives a commission | Receives high yield
RISKS | None up to the limit insured | None | If the operational event described in the bond happens in the financial institution, loss of some or all the principal or interest
[Figure: OpVaR against the CDF, with regions labelled Retain, Insurance, and ORL Bond (OR insurance), and an optimal point marked]
Developing an OR Hedging Program
• It is possible to use robust methods to measure OR
• OR-related events do not follow Gaussian patterns
• More than just finding an Operational VaR, it is necessary to relate the losses to some tangible factors making OR management feasible
• Detailed measurement means that product pricing may incorporate OR
• Data collection is very important anyway!
Conclusion
My e-mail is [email protected]