claim reserving: performance testing and the control … reserving: performance testing and the...

33
Claim Reserving: Performance Testing and the Control Cycle by Yi Jing, Joseph Lebens, and Stephen Lowe ABSTRACT Fundamentally, estimates of claim liabilities are forecasts subject to estimation errors. The actuary responsible for making the forecast must select and apply one or more ac- tuarial projection methods, interpret the results, and apply judgment. Performance testing of an actuarial projection method can provide empirical evidence as to the inherent level of estimation error associated with its forecasts. Per- formance testing of alternative methods provides formal assurance that the actuary is using the best methods for the given circumstance, and also provides insight into the appropriate weight to give to the indications produced by each method. Performance testing is an integral part of the actuarial control cycle associated with the loss reserving process. It provides the necessary feedback loop to the ac- tuary, assuring that he or she is not overconfident about his or her forecasts. This paper describes how to construct sound performance tests, consistent with statistical cross- validation, within the reserving control cycle. It illustrates the application of the techniques via a case study, including some interesting empirical results. KEYWORDS Chain-ladder, claim reserving, control cycle, cross-validation, estimation error, forecast error, loss reserves, overconfidence, parameter risk, process risk, predictive skill, reserve accuracy, reserving process VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 161

Upload: lenhan

Post on 05-May-2018

218 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: PerformanceTesting and the Control Cycle

by Yi Jing, Joseph Lebens, and Stephen Lowe

ABSTRACT

Fundamentally, estimates of claim liabilities are forecastssubject to estimation errors. The actuary responsible formaking the forecast must select and apply one or more ac-tuarial projection methods, interpret the results, and applyjudgment. Performance testing of an actuarial projectionmethod can provide empirical evidence as to the inherentlevel of estimation error associated with its forecasts. Per-formance testing of alternative methods provides formalassurance that the actuary is using the best methods forthe given circumstance, and also provides insight into theappropriate weight to give to the indications produced byeach method. Performance testing is an integral part of theactuarial control cycle associated with the loss reservingprocess. It provides the necessary feedback loop to the ac-tuary, assuring that he or she is not overconfident abouthis or her forecasts. This paper describes how to constructsound performance tests, consistent with statistical cross-validation, within the reserving control cycle. It illustratesthe application of the techniques via a case study, includingsome interesting empirical results.

KEYWORDS

Chain-ladder, claim reserving, control cycle, cross-validation, estimationerror, forecast error, loss reserves, overconfidence, parameter risk, process

risk, predictive skill, reserve accuracy, reserving process

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 161

Page 2: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

1. IntroductionFundamentally, estimates of claim liabilities

are predictions or forecasts. Presented with abody of historical data, and with knowledge ofthe operation of the business and the exposuresunderwritten, the actuary is called upon to de-velop a forecast of the future cash flows asso-ciated with the settlement of all unpaid claimobligations. Since claim development involvesone or more underlying stochastic processes, theactuary must separate noise from signal in theavailable data. This necessitates the selection ofan underlying model, application of the modelto the available data, selection of model param-eters, and interpretation of the results. The re-sulting forecast will be subject to estimation er-rors resulting from both any misspecification ofthe model and misestimation of its parameters,as well as random process variation.Typically the actuary will employ several ac-

tuarial projection methods and select an actuar-ial central estimate of the claim liabilities afterreviewing the results of each method, giving ap-propriate weight to the indication produced byeach method and supplementing the mechani-cally generated indications with judgment. A nat-ural question to ask of the actuary is, “How doyou know that the methods chosen are the mostappropriate ones to employ?” A corollary ques-tion is, “How do you decide the relative weight tobe given to the results from each method?” Whilesome might reply that the selection of methodsand the assignment of weights are actuarial judg-ments made with the benefit of experience, in thecurrent regulatory and accounting environmentthere is a need to respond with greater clarityto such questions. It therefore seems worthwhileto develop a formal methodology to assess theaccuracy of the methods used to make actuarialestimates.Ongoing performance testing of methods can

help (a) provide empirical evidence as to the in-herent level of estimation error associated with

any particular method; (b) provide insight intothe strengths and weaknesses of various methodsin particular circumstances; (c) provide assur-ance that the actuary is using the best methods forthe given circumstance; (d) provide useful infor-mation regarding the appropriate weights to begiven to each method; and (e) ultimately lead tomeaningful improvements in actuarial estimates,in the sense of lower estimation errors, as thestronger methods win out over the weaker meth-ods, based on objective evidence as to their per-formance.Performance testing can also assist in manag-

ing the potential overconfidence of the actuary,a topic discussed by Conger and Lowe (2003).Behavioral scientists have demonstrated that thevast majority of managers are overconfident intheir ability to make accurate forecasts–of ev-erything from estimates of next year’s sales rev-enue, to the cost of implementing a new sys-tem, to the delivery date of a new product underdevelopment–because they lack meta-knowledgeabout the limitations of their forecasting abili-ties.1 Overconfidence is a problem becauseblown estimates, cost overruns, and missed dead-lines hurt business performance. Actuaries arenot immune to the overconfidence phenomenon;it manifests itself when actual claims are out-side the range more frequently than expected.The formal feedback loop provided by the con-trol cycle helps the actuary develop this meta-knowledge, becoming “well-calibrated” as to thetrue limitations of his or her actuarial projections.In addition to improving the quality of the ac-

tuarial estimates of claim liabilities, performancetesting results are also useful in other contextssuch as setting ranges around central estimates

1Meta-knowledge is an understanding of the limitations of one’sknowledge and capabilities. It includes an appreciation of the limi-tations on one’s forecasting ability. A forecaster can develop meta-knowledge by receiving continuous feedback as to the accuracyof his or her forecasts. (Such a forecaster is said to be “well-cali-brated.”) For example, meteorologists are generally well-calibratedbecause of the natural feedback they receive about their dailyweather forecasts.

162 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 3: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

and measuring required economic capital for re-serve risk.Since estimates of claim liabilities are fore-

casts, performance testing of actuarial projectionmethods should involve proper measurement ofprediction errors via a standard statistical tech-nique, referred to as cross-validation. This paperprovides an overview of cross-validation, defineshow it would be employed in the context of prop-erty and casualty claim liability estimates as akey element of the control cycle, and providessome illustrative empirical results.Including this introduction, this paper is or-

ganized into four sections. Section 2 of the pa-per lays out a definitional framework for claimliabilities and the actuarial estimation process.Section 3 discusses performance testing in gen-eral, including performance criteria and cross-validation techniques; and introduces the actu-arial control cycle, outlining the role that per-formance testing should play in it. Section 4 il-lustrates all of the concepts via a case study,using real data from a U.S. insurer, with ad-ditional calculation details provided in an Ap-pendix.The case study generates some interesting con-

clusions, highlighted here to pique the reader’scuriosity.

² At some maturities, a method that projects theultimate claim liabilities from the outstandingcase reserves outperforms the reported chain-ladder method.

² Ultimate claim counts can be predicted withmuch greater accuracy than ultimate dollars.This implies that projection methods that makeuse of claim counts can be more predictivethan those that do not. While the greater stabil-ity of claim count projections may be partiallyoffset by greater instability of average claimvalues, the information value of the claimcounts should lead to a net improvement inoverall accuracy over projections that ignoreclaim count information.

² Methods that formally adjust for changing set-tlement rates or case reserve adequacy havegreater predictive accuracy than the analogousmethods applied without adjustments. Whenchanges in claim settlement rates and/orchanges in case reserve adequacy occur, theaccuracy of projection methods that do not ad-just for them is materially degraded.

² In assessing which projection methods to em-ploy, and how to combine the results fromseveral methods, the degree of correlation be-tween the estimation errors is an importantconsideration. To illustrate, if the results of twomethods are perfectly correlated but the esti-mation errors from one are twice the estima-tion errors of the other, then the appropriatecourse of action is to drop the method withthe higher estimation errors entirely. A differ-ent course of action would be implied if theerrors were uncorrelated.

Not all of the conclusions from the case studycan be generalized beyond its immediate circum-stances. However, the case study does demon-strate the kinds of insights that can be obtainedthrough the performance testing process, and theutility of those insights in the control cycle. Per-formance tests that we have performed on otherlines of business and other actuarial projectionmethods also confirm the value of the exercise,suggesting that this is a fertile field of empir-ical investigation. The authors would encourageothers to pick up where the paper leavesoff and add to the body of performance testingresearch.

2. Defining the claim liabilityestimation process

Before describing performance testing, it isnecessary to establish a definitional frameworkfor the process by which claim liabilities are esti-mated.

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 163

Page 4: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

2.1. Claim liabilities

Claim liabilities are the loss and loss adjust-ment expense obligations arising from coverageprovided under insurance or reinsurance con-tracts. The term may refer either to actual claimliabilities (that is, an after-the-fact realization ofthe claim payments) or to estimated claim lia-bilities produced by an actuarial projectionmethod.Claim liabilities may be estimated either on a

nominal basis or on a present value basis. Whileestimates are more commonly made on a nomi-nal basis, estimates on a present value basis aremore representational of the economic cost of theclaims. Present value estimates may be less ac-curate due to misestimation of the timing of pay-ments; they also may be more accurate becausethe predicted payments further into the future–which are subject to the greatest uncertainty–aregiven successively less weight.Let Ci,d represent the actual incremental paid

claims on the ith accident year2 during the dthdevelopment period. Claim development is therealization of the actual claim liabilities, Ci,d, overtime, reflecting the confluence of a number ofunderlying stochastic processes: claim genera-tion, claim reporting, claim adjusting, and claimsettlement. (On a net basis, additional processeswould include the identification, pursuit, and re-covery of subrogation, salvage, third-party de-ductibles, and reinsurance.)Since they are the result of stochastic pro-

cesses, all Ci,d are random variables.We typically have a triangular array of actual

values of Ci,d, up to the current calendar point intime t = i+d¡ 1, as shown below.3

2The use of the term accident year is for ease of exposition; theconcepts apply equally in an underwriting year, report year, policyyear, or other cohort-based configuration of data.3While the presentation assumes the traditional situation where atriangular array of historical claims is available, the concepts wouldapply equally in other contexts, for example, where claim liabilitiesare based on exposure-based methods.

2666666664

C1,1 C1,2 C1,3 C1,4 C1,5

C2,1 C2,2 C2,3 C2,4

C3,1 C3,2 C3,3

C4,1 C4,2

C5,1

3777777775These actual Ci,d are a sample partial realizationfrom the underlying stochastic processes.Let C(t)i,d represent

4 the estimate (more accu-rately, a forecast) of the expected incrementalpaid claims on accident year i during develop-ment period d, based on information at time t.The actual aggregate discounted claim liabili-

ties at time t are given by

L(t) =X(vi+d)(C

(t)i,d) for all i+d¡ 1> t,

(2.1)

and the estimated aggregate discounted claim li-abilities at time t are given by:

L(t) =X(vi+d)(C

(t)i,d) for all i+d¡ 1> t,

(2.2)

where vi+d is the discount factor applicable topayments made i+ d¡ t periods after t.Finally, actual and estimated aggregate claim

liabilities are related by

L(t) = L(t)¡ e, (2.3)

where e is the estimation error, the difference be-tween the estimated and the actual realization ofthe claim liabilities. Because all of the Ci,d arerandom variables, e is also a random variable.L(t) may be broken down into component esti-

mates; for example, the estimated liabilities asso-ciated with a particular accident year, or the lia-bilities that are expected to be paid in a particularfuture calendar year. Because they are estimatesof means, the total of the component estimates isequal to the overall estimate, L(t).

4The use of parentheses around the parameter t is to distinguish itfrom an exponent.

164 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 5: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

In contrast to claim liabilities, a claim reserveis a balance sheet provision for the estimatedclaim liabilities at a particular statement date.The reserve is a fixed, observable, known quan-tity, not an estimate. Those who speak of “re-serve estimates” are guilty of sloppy language,which hopefully does not translate into sloppythinking. The reserve is a financial representa-tion of the underlying obligations, reflecting thevaluation standards associated with the particu-lar financial statement of which it is an element.These standards are established by the applica-ble regulatory and accounting bodies within thejurisdiction. Such valuation requirements mightnot equate the reserve, R(t), with the associatedestimated liabilities, L(t). For example, regulatoryrequirements might stipulate that the reserve beset conservatively at a percentile above the mean.Alternatively, the reserve might include a mar-gin for the cost of the economic capital requiredto support the risk of the associated claim lia-bilities. It is sufficient for our purposes to sim-ply note that R(t) = f(L(t)), (or more preciselythe distribution around L(t)) where f is deter-mined by the particular regulatory/accountingregime.

2.2. Actuarial projection methods

An Actuarial Projection Method is a systematicprocess for estimating claim liabilities. It encom-passes three fundamental elements:

² An algorithm, which is used to project futureemergence and development. Any algorithmis predicated on an underlying model, whichis a mathematical representation that embod-ies explicit and/or inferred properties of theclaim incidence, reporting, and settlement pro-cesses; the algorithm applies the model to agiven dataset to produce an estimate of theclaim liabilities.

² A predictor dataset, which is the input dataused by the method. Specification of the data-

set includes the types of data elements, the val-uation points of those data elements that varyover time, and the level of detail at which theprojection is performed (for example, sepa-rate analyses for indemnity and expense, or bytype of claim, class of policyholder, or juris-diction). The dataset may include data externalto the entity in addition to internal data; it mayalso include qualitative information as well asquantitative.

² Intervention points, which are points in the pro-cess at which judgments are made by the ac-tuary. These include the selection of assump-tions (model parameters such as chain-ladderlink ratios), overrides to outlier data points,and other adjustments from what would oth-erwise be produced by mechanically applyingthe algorithm to the data.

A particular method m(a,d,p) (i.e., consistingof an algorithm, a dataset, and interventionpoints) is selected by the actuary from the setof all possible methods M(A,D,P) to produce anestimate of the claim liabilities, L(t)m , at a valua-tion point t, for example at the end of a givencalendar year.A projection method might use data with a val-

uation date that is earlier than the valuation dateof the estimated liabilities; in such a case the ac-tuarial projection method would include the “rollforward” procedure that adjusts for the effects ofthe timing difference.Most traditional projection methods assume

stationarity of the underlying claim processes,such that the historical development in the datasetis assumed to be representative of the conditionsthat will drive the future development. Often thisis not the case, and the data must first be adjustedto reflect the current and anticipated future con-ditions. Such adjustments, when made, are partof the actuarial projection method.5

Wacek (2007) distinguishes between clinicalpredictions, which are conclusions reached by an

5Some of the more advanced stochastic projection methods allowfor the claim development processes to be dynamic, varying alongvarious dimensions. Some methods also allow for process variationdue to exogenous factors, such as monetary inflation.

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 165

Page 6: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

expert when presented with a set of facts abouta problem of a type with which he or she hasexperience; and statistical predictions, which areconclusions indicated by a quantitative or statis-tical formula or model using a set of quantifi-able facts about a problem. He points out thatan expert making a clinical prediction may usea statistical model, but if the model results areaugmented by consideration of other informa-tion and the judgment of the expert, then theprediction would be classified as clinical. Thusthe estimation of claim liabilities is generally aclinical rather than a statistical prediction. Theinclusion of intervention points in the definitionof an actuarial projection method is in recogni-tion of the important role that endogenous in-formation and judgment play. All statistical pre-diction models/methods have inherent limita-tions; many of those limitations are routinely ad-dressed by the actuary through the use of inter-vention points and the insertion of judgment intothe estimate.Performance testing can be done on either sta-

tistical predictions (those involving a mechanicalapplication of the algorithm without any judg-mental interventions) or on clinical predictions inwhich we are testing the actuary’s performance,not just “the machine’s” performance.

2.3. From actuarial projections to theactuarial central estimate

Ultimately, the actuary is responsible for de-veloping an actuarial central estimate of unpaidclaim liabilities, representing an expected valueover the range of reasonably possible outcomes.In developing an actuarial central estimate, theactuary will typically use multiple actuarial pro-jection methods that rely on different data6 and

6We note, however, that many methods use overlapping data sets.For example, both the incurred chain-ladder method and the paidchain-ladder method make use of cumulative paid loss data. Be-cause of this overlap in data, one would expect that the indicationsresulting from the paid and incurred chain-ladder methods wouldexhibit some level of correlation.

require different assumptions. For each methodemployed, the actuary can produce alternativeactuarial estimates by varying the selected pa-rameters and other judgments made at the in-tervention points in the method. Ultimately, theactuary will settle on parameters and assump-tions that produce the central estimate for eachmethod.One can view the central estimate from each

method as a sample mean, developed from a sam-ple of the data. Each sample mean will havea sample variance, depending on the predictivequality of the sample data employed. The actu-ary will reconcile/combine the sample means byassigning a relative weight to each one to pro-duce the actuarial central estimate.

3. Performance testing and thecontrol cycle

Within the definitional framework articulatedin the previous section, we can now turn to theproblem at hand.The first issue facing the actuary is to choose

a finite set of actuarial projection methodsfm1,m2, : : :mng from the set of all possible meth-odsM that is “best” for a particular class of busi-ness and circumstance.7 The second issue is tocombine the results of the individual methods to-gether into a single estimate that is “best,” givingappropriate weight to each method based on itspredictive value in the specific circumstances. Aswill be seen in subsequent sections, both issuescan be addressed more formally via performancetesting.

3.1. Performance testing and selectionof methods

In establishing a performance testing method-ology, one needs to start by establishing appro-

7The selected set of methods may vary by maturity. For example,the actuary might use projection methods for complete accidentyears, but use an expected loss ratio method for the current, in-complete accident year.

166 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 7: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

priate performance criteria. In other words, bywhat set of criteria do we say that one method isbetter than another? Preference for one methodover another would then be based on perform-ance test results against those criteria in the spe-cific circumstances.Obviously, the choices of methods at a given

point will also be constrained by the availabledata (both quantity and type) and the availableresources to produce and analyze them. How-ever, over time, performance testing can assist inevaluating the benefits against the costs of gen-erating currently unavailable data elements or ofadding resources to support more labor-intensiveprojection methods. Performance testing mighteven reduce actuarial resource costs by redeploy-ing resources away from actuarial methods thatdo not meet cost-benefit criteria.Statisticians and forecasters traditionally argue

that the best method is the one that produces es-timates that are unbiased and minimize the ex-pected squared error. Such methods are describedas “BLUE”–Best Least-Square, Unbiased Esti-mators.For our purposes, we extend the criteria some-

what beyond BLUE. In evaluating the perfor-mance of actuarial projection methods it is ap-propriate to consider the criteria listed in Table 1.Our criteria extend the acronym from BLUE

to “BLURS-ICE”–Best Least-Square, Unbias-ed, Responsive, Stable, Independent, and Com-prehensive Estimate.The minimum squared error criterion is obvi-

ously the strongest; it is the primary yardstickagainst which methods are compared. The re-quirement for unbiased estimates is a weaker cri-terion, in that we would accept methods withbias if the bias were measurable or immaterial.In other words, we would be willing to trade alower expected squared error for the introduc-tion of a manageable level of bias. Stability andresponsiveness are desirable characteristics, butare also weaker criteria.

Table 1. Performance criteria for selecting actuarialprojection methods

Least-SquareError

The method should produce estimates thathave minimum expected error, in the leastsquares sense.

Unbiased Ideally, the method should produce estimatesthat are unbiased, with an expected error ofzero.

Responsive The method should respond quickly tochanges in underlying trends or operationalchanges. Given several methods with similarexpected errors, we would favor the methodthat responds to changes in trends oroperations most quickly.

Stable Methods that work well in somecircumstances, but “blow up” in othercircumstances are less desirable.

Independent The method should not generate predictionerrors that are highly correlated with those ofother selected methods. (This criterion isdiscussed more fully in subsequent sections.)

Comprehensive The method should use as much data aspossible, in an integrated way, so that themaximum available information value isextracted.

Comprehensiveness and independence look atthe problem through a slightly different lens thanthe other criteria. These two criteria relate to theissue of combining multiple estimates together toobtain an actuarial central estimate. The compre-hensive criterion stems from our desire to maxi-mize the information value of all of the availabledata; rather than applying many separate meth-ods to disparate data elements it is preferable touse fewer, more comprehensive methods. Sim-ilarly, the independence criterion stems from adesire to not waste resources on methods withhighly correlated estimation errors, producing es-sentially the same answer as another method.If the errors from two projection methods arehighly correlated it is preferable to eliminate theone with the higher expected squared error.

3.2. Performance testing usingcross-validation

Rather than invent a performance testing meth-odology from scratch, we need only look to theapproaches used in other sciences that are also

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 167

Page 8: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

called upon to make forecasts. Statistical tech-niques to measure the accuracy of forecast mod-els are abundant in the literature of other pro-fessions. For example, as a result of the ongo-ing interest in hurricanes, the authors have hadoccasion to review the literature relating to theaccuracy of hurricane forecasts made by clima-tologists. In climatology, the “skill” of statisti-cal forecasting methods is measured via cross-validation. The method of cross-validation is out-lined in Michaelson (1987). The validity of themethod for estimating forecast skill is discussedin Barnston and van den Dool (1993). The poten-tial pitfalls and misapplication of the method arediscussed in Elsner and Schmertmann (1994).Cross-validation, properly applied, permits a

fair comparison of the skill of competing fore-cast models. For example, forecasts of the ex-pected number of major hurricanes spawned inthe Atlantic basin are made annually by Gray(2008) and others, based on statistical modelsthat relate the level of Atlantic hurricane activ-ity to other climate variables such as the stateof the El Nino Southern Oscillation (ENSO) inthe Pacific. The skill of each of these modelscan easily be measured and compared via cross-validation. Similar cross-validation tests are per-formed to measure the skill of storm path fore-cast models employed by the National HurricaneCenter. This helps users of the forecasts to appre-ciate their limitations, and allows each modeler toplace formal ranges around the central estimateof their model’s forecast. It also facilitates theselection of the “best” forecast model and pro-vides direction to the course of further researchand enhancements.Cross-validation is a straightforward, nonpara-

metric approach to estimating the errors associ-ated with any forecast produced by a model. Thiswould include forecasts of future claim paymentsin estimating claim liabilities. Cross-validation isrelatively simple to implement, given sufficientdata. It is quite general in its applicability to any

suitable defined actuarial projection method, andhas intuitive appeal because it closely mimics theactual claim liability estimation process.

3.3. Cross-validation in general

The basic approach in the context of a gen-eral statistical forecasting model is to developa model using all observations except one, andthen to use the model to produce a forecast of theomitted observation. This process is repeated it-eratively, generating a forecast from a new modeldeveloped by excluding a different observationeach time. The cross-validation exercise gener-ates an independent forecast for each observa-tion, based on a model generated by the rest ofthe data. The skill of the model can then be mea-sured by comparison of the forecasts to the actualoutcomes.For example, let Y represent a random vari-

able that we believe is dependent on some set ofindependent variables X = fa,b,c, : : :g. In otherwords we believe that Y = f(X), although wedon’t know the precise form of f. Suppose thatwe have n historical observations of (X,Y), andwould like to develop a linear regression modelthat predicts future values of Y given X. Ratherthan using all n values of (X,Y) to develop the re-gression equation, the cross-validation approachwould develop n regression equations, each onebased on n¡ 1 of the observations. Each equationwould then be used to predict the Y value of theexcluded observation, Y¤, as displayed schemat-ically in Figure 1.Comparison of the actual values of Y to the

predicted values Y¤ is a fair way to measure thetrue ability of a linear regression model to predictY from X. We are testing the predictive ability ofa linear model, not the parameters of the modelitself. By comparison, the standard error of theregression is not a fair way to measure the equa-tion’s predictive ability. The standard error is ameasure of the degree to which the equation ex-plains the variation in Y; inevitably the standard

168 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 9: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

Figure 1. Cross-validation in a regression analysis context

error of a regression overstates predictive abilitybecause, in finding the best-fit equation to the databy minimizing the errors, the result inadvertently“explains” some of the noise in addition to thesignal in the sample data.Once the cross-validation tests have been per-

formed, then all of the data can be used to de-velop the best regression model for use in pre-dicting Yn+1.Elsner and Schmertmann (1994) point out that

the form of the equation, as well as the parame-ters, must be incorporated into the cross-valida-tion test for it to be fair. Going back to our re-gression example, let us assume that in devel-oping our regression equation some of the vari-ables in the set X might not be significant, ormight create problems of autocorrelation, suchthat the final regression equation might makeuse of only a subset of the variables in X. Itwould not be fair to use all of the data to de-cide which variables to retain and which to elim-inate, and then perform cross-validation on thedata with the selected equation. Instead one muststart with all of the variables and go through theprocess of elimination in developing each of then equations for the cross-validation test to befair. Cross-validation properly performed, cap-tures model risk as well as parameter and processrisk.Also, in designing the test it is important to

consider whether using data on “both sides” ofeach (X,Y) value will result in a fair test. Sup-

pose that the data is a time series–for exam-ple, the numbers of Atlantic hurricanes occur-ring in each calendar year over a twenty-yearperiod. In the general case, we would use twentyobservation sets of nineteen data points to de-velop twenty forecasts that could be comparedto the actual values. This would include usingobservations from years after a particular yearto forecast the results for that year. Of course,in reality the subsequent observations would nothave been available at the time we were mak-ing the forecast. This is only a serious problemif the data exhibits serial correlation, typicallydue to a trend, cyclicality, or other nonstation-ary situation. Where serial correlation is presentin time-series data, using adjacent data on bothsides provides too much information about theomitted value to the forecast model; in such situ-ations sufficient adjacent data points on one sideshould be excluded so that the test is fair. Whereserial correlation is not present, the observationscan be treated as a sample set without order sig-nificance. This issue is particularly relevant in thecontext of performance testing of claim liabilityestimates, as there is very little data in insurancethat does not exhibit serial correlation.The cross-validation approach goes by a va-

riety of other names, depending on the context.Those involved in developing predictive statisti-cal models might refer to the approach as testingthe model using out-of-sample data. Meteorolo-gists refer to cross-validation as “hindcast” test-

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 169

Page 10: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

ing. In the context of claim liability estimation,some actuaries refer to it as retrospective testing.

3.4. Application of cross-validation toestimates of claim liabilities

The cross-validation approach applies quitenaturally to the estimation of claim liabilities. Itcan be applied to an overall actuarial method,used to develop L(t)m ; or to a component of themethod–for example, one that estimates theclaim liabilities for an accident year at a partic-ular maturity, or one that forecasts the expectedincremental claim payments for an accident yearin a particular development year.One issue with cross-validation in a claim lia-

bility estimation context is whether any data aftera point in time can be used to validate estimatesof claims before that point in time. As was dis-cussed in the previous section, cross-validationtests require careful design to assure that theyfairly assess predictive skill.In the context of testing an actuarial projection

method, the selected actuarial projection methodis typically applied to a historical dataset thatis limited to information that would have beenavailable at the time, and the resulting estimateof the claim liabilities is compared to the ac-tual outcome with the benefit of hindsight re-flecting the actual run-off experience. This as-sures that the test is a realistic and fair test ofthe performance of the method. The dataset isthen brought forward to the next valuation point,and the estimation process is repeated. Sched-ule P of the U.S. Annual Statement includes thistype of cross-validation test of the historical heldreserves.We illustrate the application of the standard

performance testing technique in the case studyin Section 4, with additional calculation detailsin the Appendix.Finally, we note that in most actuarial projec-

tion applications one does not have the luxury ofknowing the actual value of the claim liabilities,

L(t), because the subsequent run-off is not com-plete. Instead, one typically only knows the ac-tual run-off through the current valuation point,and must combine it with the current actuarialcentral estimate of any remaining unpaid claimliabilities to obtain a proxy for L(t). This is a prac-tical concession that one must make unless oneis prepared to restrict the analysis to very old ex-perience. The validity of the results is a functionof the proportion of each “actual” value that ispaid versus still estimated unpaid, statistics thatshould be tracked throughout the analysis. A fi-nal step in each performance test exercise will beto be sure that the results and conclusions are notunduly influenced by the current estimates of theremaining liabilities.

3.5. Performance testing as an elementof the reserving control cycle

The importance of an actuarial control cyclehas been presented by many others. The conceptwas first presented by Goford (1985) in an ad-dress to U.K. actuarial students in the contextof life insurance. Since then, the actuarial con-trol cycle has been adopted as an educationalparadigm by many actuarial organizations, mostnotably the Institute of Actuaries in Australia.Most recently, the Casualty Actuarial Society hasannounced a major revision to its basic educationand examination structure, one element of whichis the addition of the actuarial control cycle to thenew Internet-based exam modules [see for exam-ple, Hadidi (2008)]. For a general introduction tothe Control Cycle see Bellis (2003).In the context of loss reserving a control cycle

involves identifying, testing, and validating allelements of the process by which claim liabili-ties are estimated. Figure 2 depicts what the con-trol cycle might look like in this context. Mostimportantly the control cycle should involve anongoing assessment of the estimation skill of theactuarial methods currently being employed, andexploration of opportunities to enhance overall

170 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 11: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

Figure 2. The actuarial control cycle for the claim reserving process

estimation skill by implementing better actuarialprojection methods, based on all of the BLURS-ICE criteria. As with other control cycles, thereserving control cycle is critical to the manage-ment of reserve risk.Those with an engineering background will

recognize that the actuarial control cycle is a spe-cific application of a more general approach toquality assurance and continuous process im-provement. Periodically, in a planned manner,one steps back from the production process toassess its effectiveness and to identify ways toimprove it. The control cycle also codifies that,in addition to a production role, the actuary hasa developmental role to play. This latter role canoften get lost in the crush of production activi-ties.In the reserving control cycle, performance

testing is the primary assessment tool. As partof the control cycle, the actuary would identifythose areas where the greatest exposure to esti-mation errors is present, and seek process im-provements to mitigate the exposure to those er-rors. This can take the form of improvements inthe quality, timeliness, and range of data relatingto the nature of the underlying exposures andemerging claims, or the introduction of new ac-

tuarial projection methods that make use of newdata or better leverage existing data. Occasion-ally it will take the form of changes to people orsystems involved in the claim reserving process.Implementing performance testing within the

claim reserving control cycle can be labor inten-sive, especially initially when the process is firstbeing established. It is impractical to expect thatextensive performance tests would be performedon every possible actuarial method for each classof business, with the detailed analysis updatedduring every reserving cycle. Fortunately, this isnot necessary. Insights gained from the analysisof one class of business often extend to othersimilar classes. And, since the additional insightgained by updating the data with one new diag-onal is incremental, the annual updates for eachclass do not entail an extensive analysis.Performance testing can be substantially facili-

tated by capturing the central estimate from eachactuarial projection method, along with the finalselected actuarial central estimate, for each classof business in a relational database on an ongoingbasis. Reconstructing this information after thefact can be a substantial archaeological exercise,particularly when the methods involve interven-tion points with judgmental selections. Recon-

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 171

Page 12: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

struction exercises are often necessary to “seed”the database initially; the database is then up-dated with new projection results on an ongoingbasis. Once this database is established, ongoingperformance test results for existing methods canbe produced somewhat mechanically. Obviouslythis will not be the case for new actuarial projec-tion methods, where some “back-casting” of themethod is necessary to get an initial view of itsperformance.The key to successful implementation of per-

formance testing within the control cycle is acarefully-thought-out test plan that focuses on afew key issues and experiments with a few newmethods or new data sources.

4. Illustrative case study

In this section, we move the discussion fromconcept to practice, by means of a real-worldcase study based on the actual reserving expe-rience of a major U.S. insurer. While we drawsome actual conclusions from the case study, itsprimary purpose is to illustrate the applicationof performance testing, cross-validation, and thecontrol cycle. While our work in this area (in-cluding performance testing on other datasets)generally confirms the findings in the case study,we can’t say that all of the conclusions drawnfrom the case study can be generalized to othercompanies or circumstances.

4.1. The company and its data

The case study is based on the experience ofa company that no longer exists in its histori-cal form, having been transformed by a series ofsubsequent mergers and acquisitions. Since thecompany was a long-standing client of the au-thors’ firm, a wealth of historical reserving ex-perience is available. We are appreciative of thesuccessor company’s support in allowing us topublish results based on the historical dataset.For each reserve segment, accident year loss

development data is available for twenty-seven

accident years, from 1972 to 1998, with annualvaluations at June 30. Sheet 1 of the Appendixillustrates the complete triangular dataset, dis-playing reported claim counts. This data facili-tates claim liability estimation at each June 30from 1979 to 1998, a twenty-year span. Dur-ing this same period, one of the authors pro-vided independent estimates of claim liabilitiesto the management of the company, so actualhistorical projections using various methods arealso available. Claim liability estimates at eachof these twenty valuation points can be com-pared with “hindsight” estimates based on a spe-cial study performed using data through Decem-ber 31, 2000, that combined actual run-off claimpayments with estimates of the remaining unpaidclaim liabilities at that time. Thus, for the mostrecent of the twenty estimates to be tested (i.e.,the estimate at June 1998), the hindsight estimatereflects thirty months of actual run-off payments.For the earlier estimates, the hindsight estimatereflects an increasing proportion of actual run-offpayments. Assuming that the December 2000 re-maining claim liability estimates are a reasonableproxy for the actual remaining unpaid claim lia-bilities, the claim liabilities at June 1998 are 77%paid; the claim liabilities at June 1997 are 84%paid; the claim liabilities at June 1996 are 92%paid; and the claim liabilities at earlier valuationsare between 96% and 100% paid.For each reserve segment, loss development

data consists of the following triangles:

² Reported claim counts (shown on Appendixsheet 1)

² Closed claim counts (combination of closedwith payment and closed without payment)

² Paid loss and allocated loss adjustment ex-penses

² Incurred loss and allocated adjustment ex-penses (paid plus case reserves)

Fortunately, during this extended period thecompany’s claim systems did not change in ma-terial ways, so the historical data reflect relatively

172 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 13: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

consistent claim processing, from an IT perspec-tive.In addition to the claim experience, direct

earned premiums by calendar year are also avail-able, allowing us to use methods based on lossratios.The company wrote what may be described

as “main street business” through independentagents. The data is net of excess of loss reinsur-ance, at retentions that were relatively low, typi-cally around $100,000.The company’s historical experience provides

a rich backdrop for illustrating performance test-ing, for several reasons.

² First, the experience period includes the sev-enties and early eighties, a period where sig-nificant monetary and social inflation causedtrends and development patterns to changefrom their historical levels. Conversely, in thenineties this inflation abated and trends anddevelopment patterns stabilized.

² Second, the company made some significantoperational changes–particularly to its claimhandling procedures–that affected claim clo-sure rates, case reserve adequacy, and thetrends in average claim costs across accidentyears.

² Third, the company’s underwriting posturechanged several times during the period. In theearly experience years they focused on revenuegrowth, leading to relatively disastrous under-writing results (in this they were not alone).Eventually it became necessary to re-under-write the existing business, become more se-lective about new business, and seek price ad-equacy.

These factors create some distinct “turningpoints” in the historical data, creating significantchallenges to the claim liability estimation pro-cess. While the challenges in the future mighttake a different form than those observed in thepast, the uncertainties created will potentially beof a similar magnitude. Thus we believe perfor-

mance testing over this period is a useful exer-cise.The case study focuses on our ability to esti-

mate claim liabilities for Commercial Automo-bile Bodily Injury Liability (CABI). Estimationfor this reserving segment is neither very difficultnor very easy, so it is a good place to start.For the purposes of the case study we can pre-

tend that it is 2001, and the company’s manage-ment has asked the corporate actuary to beginestablishing a control cycle (making use of the“final” estimates of claim liabilities as of Decem-ber 2000). The starting point is a performancetesting project focusing on a representative lineof business, CABI. The tests will include thetwo methods currently employed in the reservingprocess, the paid and incurred chain ladder meth-ods; an alternative method, not currently usedby the company, that projects case reserves only;and an additional incurred development methodthat seeks to adjust for changes in case reserveadequacy. While methods to adjust for case re-serve adequacy have been in the literature foryears, they have not been implemented at thecompany due to misgivings about their effec-tiveness and stability. The project is purposelylimited in scope to fit within available resourceconstraints. As part of the project, a plan for fur-ther performance testing is also to be developedbased on the initial results from the project.

4.2. Paid chain-ladder developmentmethod

The project starts with performance testing ofthe paid chain-ladder development (PCLD). First,we need to formally define the PCLD method,using our m(a,d,p) construct from the previoussection. This is done in summary form in Table 2.As can be seen in Table 2, this is a highly me-

chanical method, with little or no intervention,and no exogenous data employed. In essence,this is a test of “the machine,” without the nor-mal dose of actuarial judgment supplied by “the

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 173

Page 14: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

Table 2. Paid chain-ladder development method

The algorithm The basic chain ladder approach in which the cumulative paid claim and allocated loss adjustment expenses at eachhistorical maturity is divided by the cumulative paid value at the preceding maturity to obtain a triangle of historicalreport-to-report development factors. The simple average of the latest four observations8 is calculated at eachmaturity. A tail factor to provide for development beyond 150 months (the last available maturity in the triangle) isbased on ratios of the current value of reported claim and allocated expenses to the paid claims and allocatedexpenses at 150 months for the latest four fully mature years, if available.

A “benchmark” set of factors is used to obtain report-to-report factors to the extent that the observed developmentdata does not extend to 150 months.

The selected factors are chained together to produce report-to-ultimate development factors. The development factorfrom 6 months to 18 months is divided by two to account for the accident half-year. Cumulative paid losses as of thecurrent valuation are subtracted from the projected ultimates to obtain estimates of unpaid claim liabilities.

The data Cumulative paid claim and allocated loss adjustment expenses, in traditional triangular format, with valuations from 6months to 150 months. Reported claim and allocated expenses as of the current valuation for accident years beyond150 months. No other data is employed.

Intervention points Other than selecting the benchmark factors, no interventions are made.

8The choice to use the simple average of latest four factors is meant to be illustrative only. In actual practice one could also test the use ofweighted averages, including fewer or more observations, excluding high and low values, etc.

man.” This is a scope limitation to the initial per-formance testing project; performance tests ofthe actuarial judgments are deferred to subse-quent control cycles, after some experience withthe performance of the basic methods has beengained.The PCLD method was applied to the avail-

able CABI data at each midyear valuation pointfrom June 1979 to June 1998 to obtain estimatesof the unpaid claims, by accident year, at eachvaluation. Application of the PCLD method atthe June 1984 valuation is illustrated on sheet 2of the Appendix.Many actuaries find it convenient to vary actu-

arial projection methods by accident year, usingone method for mature years, a different methodfor immature years, and perhaps yet anothermethod for the current year. It will therefore beimportant to look at performance test results bymaturity, to assist the actuary in his or her choicesof method by maturity. Rather than focusing ini-tially on the overall performance of the PCLDmethod, our analysis therefore starts with a sin-gle component of the claim liability, the perfor-mance results for the accident year at 42 monthsmaturity. (The choice of 42 months is arbitrary.)

Figure 3 displays the performance of thePCLD method as an estimator of the unpaidclaims for the accident year at 42 months ma-turity (i.e., the 1976 accident year at the June1979 valuation, the 1977 accident year at theJune 1980 valuation, etc.) The results in Figure 3are designed to answer the question “How accu-rate is the paid chain-ladder development methodas an estimator of unpaid claim liabilities whenapplied to the actual cumulative paid claims at42 months maturity?”The uppermost of the three bar graphs in Fig-

ure 3 compares the estimated unpaid claim li-abilities to the actual unpaid claim liabilities ateach valuation point. (Note that, by “actual,” wemean the unpaid claims with the benefit of hind-sight through December 2000, consisting of ac-tual payments through December 2000 plus es-timated unpaid claims at that point.) One cansee that there is some degree of correspondencebetween the PCLD estimate and the actual lia-bilities, although at some valuations (1986 and1992—1994) the correspondence is not terriblystrong.The middle of the three bar graphs converts

the predicted dollars of unpaid claim liabilitiesinto unpaid claim ratios, by dividing them by

174 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 15: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

Figure 3. CABI performance test results—Paid chain-ladder method

the earned premium for the calendar year. Thisessentially normalizes for variations in the vol-ume of business. The dotted horizontal line isthe average unpaid claim ratio, which is approx-imately 16%. (Note that the premium being usedis the combined liability premium for both bodilyinjury and property damage; these are only the

bodily injury portion of the loss ratio.) The over-all picture isn’t materially different than the up-per bar graph, partially because the company hadthe misfortune to maximize its volume of busi-ness in the late eighties, when loss ratios werehighest. The relative heights of the pairs of barsare largely unchanged.

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 175

Page 16: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

Finally, the lowermost bar graph of Figure 3simply adjusts the unpaid claim ratios by sub-tracting the 17% average unpaid claim ratio. Theresult is to express the projections in terms oftheir ability to detect deviations from the aver-age. This last presentation is relevant because theskill of a forecasting method is typically definedas

Skillm = 1¡msem=msa (4.1)

where msem is the mean squared error, the aver-age squared difference between the actual unpaidclaim ratio and the predicted unpaid claim ratiofrom the method; and msa is the mean squaredanomaly, the average squared difference betweenthe actual unpaid claim ratio for that observationand the overall average actual unpaid claim ratioacross the entire test period. The msem is spe-cific to the actuarial method, while the msa is anintrinsic property of the experience. The methodwith the minimum mean squared error will there-fore have the maximum skill.Does the form of the measure of skill look

familiar? In a regression context it would be ex-pressed as

R2 = 1¡ sse=sst, (4.2)

where sse is the squared difference between theactual values and those predicted by the regres-sion equation, and sst is the squared differencebetween the actual values and their mean. Just asR2 is a statistical measure of amount of variationcaptured by the regression equation, Skillm is ameasure of the amount of variation captured bythe particular actuarial method.Those familiar with credibility will also recog-

nize that there is some similarity of the definitionof skill to that of credibility.From the definition of skill, one can recog-

nize that skill is equal to zero when the msem isequal to the msa, and negative when the msemis greater. In these cases one would be betteroff multiplying the earned premium by the long-term average unpaid claim ratio to estimate the

unpaid claim liabilities, rather than using themethod being tested.9 Conversely, when themsemis zero, the method perfectly predicts the actualresult and skill is 100%.Returning to the lowermost bar graph on Fig-

ure 3, one can observe the actual unpaid claimratio anomalies directly. They reflect the patternof the underwriting cycle during this era. Theactual unpaid claim ratio anomaly is contrastedwith the predicted anomaly. The errors are thedifferences between the pairs of bars, reflectingour ability to predict the direction and magni-tude of the anomaly. While it is not a funda-mental transformation, removing the mean im-proves the visual display, allowing one to seethe errors more clearly. This lower chart is thetypical means of displaying performance test re-sults.The skill of the paid chain-ladder method in

estimating the unpaid claims at 42 months foran accident year is only about 21%. This re-sult is heavily driven by the poor performanceof the method at the June 1986, 1992, 1993, and1994 valuations. At each of these valuations, thePCLD projections look reasonable based on theinformation available at the time; however, sub-sequent paid claim development is markedly dif-ferent than the history up to that point.The data underlying Figure 3, along with the

details of the calculated skill of 21%, are pre-sented on sheet 6 of the Appendix.Skill can be measured at other maturities, us-

ing the same approach as is depicted in Figure 3.It turns out that the PCLD method does not ex-hibit much skill in this case. Skill is actually at itshighest, at 21%, at 42 months maturity. The skillof the PCLD method in estimating an accidentyear’s unpaid claims at both earlier and later ma-turities is actually negative. For the early maturi-ties, the size of the development factors becomes

9In essence, the skill is measured in relation to a naïve paid Born-huetter-Ferguson method, in which the expected loss ratio and pay-ment pattern are constant long-term averages.

176 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 17: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

Figure 4. CABI performance test results—Paid chain-ladder method

so large that the estimates become very volatile.For the very mature years the remaining unpaidclaim liabilities are small and highly dependenton the numbers and nature of the claims that re-main to be settled, so the volume of cumulativepaid claims is just not a very good indicator ofthe remaining unpaid claims. Many readers willrecognize these as being typical shortcomings ofthe PCLD method.It is easy to extend the skill measure to the

overall estimate of unpaid claims across all acci-dent years, by merely taking the weighted aver-age of the individual accident year unpaid claimratios. Rather than using all accident years, wehave elected to use only the latest ten accidentyears, so that the metric is relatively consistentover time. The results are shown on Figure 4.Our performance test results indicate that the

overall skill of the paid development method inestimating unpaid claim liabilities for all ten ac-cident years combined is only about 13%. Thelow skill is driven by the large prediction errorsat the 1981, 1991, and 1992 valuations. At eachof these valuations the PCLD projections lookvery reasonable; however, the subsequent devel-opment looks markedly different than the historyup to that point.The PCLD method also exhibits some short-

comings against our other BLURS-ICE criteria.Over the twenty-year performance test period,

the method exhibited a positive bias (predicteddollars of claim liabilities above actual liabilities)of about 4%. Given the volatility of the predic-tion errors, this is likely to simply reflect sam-pling error. The method is also unresponsive tochanging conditions; not surprisingly, it tends toreact in a lagged manner to changes in the pay-ment pattern.One reason for the low skill of the PCLD

method in this particular case is the nonstation-arity of the claim settlement process over the his-torical period, as evidenced by the changing per-centages of claims open in Figure 5. This figuresuggests that an adjusted paid method, reflectingthe changing claim settlement rates, should offerimproved skill over the simple method we haveemployed here.Overall the PCLD performance tests suggest

that this method is of little value, with very lowskill, due to the changing patterns of claim set-tlements. The skill of the PCLD on this prod-uct line would be higher in more normal cir-cumstances, when settlement patterns are morestable. Performance tests of methods that adjustfor changes in settlement patterns could be un-dertaken. At a minimum, simple diagnostics thatdetect changing settlement patterns should be in-troduced into the reserving process so that theactuary can know when the PCLD is not likelyto produce a reasonable estimate.

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 177

Page 18: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

Figure 5. CABI—Percentage of estimated ultimate claims open at 42 months

Table 3. Incurred chain-ladder development method

The algorithm The basic chain ladder approach in which the cumulative incurred (paid plus adjusters’ individual claim estimates ofunpaid) claim and allocated loss adjustment expenses at each historical maturity is divided by the cumulative incurredvalue at the preceding maturity to obtain a triangle of historical report-to-report development factors. The simpleaverage of the latest four observations is calculated at each maturity. A tail factor to provide for development beyond150 months (the last available maturity in the triangle) is based on ratios of the current value of reported claim andallocated expenses to the reported value at 150 months for the latest four mature years.

A “benchmark” set of factors is used to obtain report-to-report factors to the extent that the observed developmentdata does not extend to 150 months.

The selected factors are chained together to produce report-to-ultimate development factors. The development factorfrom 6 months to 18 months is divided by two to account for the accident half-year. Cumulative incurred claims aresubtracted from the projected ultimates to obtain indicated IBNR liabilities, to which are added the current casereserves.

The data Cumulative incurred claim and allocated loss adjustment expenses, in traditional triangular format, with valuations from6 months to 150 months. Reported claim and allocated expenses as of the current valuation for accident yearsbeyond 150 months. Case reserves as of the current valuation. No other data is employed.

Intervention points If any of the calculated simple average report-to-report factors are less than one, the value is set equal to one.Benchmark factors are selected to complete the development when the triangle does not extend to 150 months.

4.3. Incurred chain-ladder developmentmethod

We turn next to performance testing of thechain-ladder method applied to incurred (paidplus case reserves) claim development data(ICLD). Once again we start by formally defin-ing the method, as summarized in Table 3.As was the case with the paid method, our ap-

proach is insular and largely mechanical, withoutbenefit of judgment or confirming tests based onexogenous data.We applied the ICLD method to the CABI in-

curred development data at each midyear valua-tion point from June 1979 to June 1998 to ob-

tain estimates of unpaid claims at each valua-tion point. (More specifically, we used the ICLDmethod to estimate the Incurred But Not Re-ported [IBNR] liabilities, and then added the casereserves to get total estimated unpaid liabilities.)Application of the ICLD method at the June

1984 valuation is illustrated on sheet 3 of theAppendix.The ICLD performance test results for the ac-

cident year at 42 months are shown in graphicalform in Figure 6. This graphic is analogous to thelower graphic in Figure 3, comparing predictedunpaid claim ratios to actual unpaid claim ratioswith the benefit of hindsight.

178 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 19: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

Figure 6. CABI performance test results—Incurred chain-ladder method

Figure 7. CABI performance test results—Incurred chain-ladder method

As can be seen in Figure 6, the ICLD methodperforms poorly in the 1981—1984 period, under-estimating the unpaid loss ratio. It then performspoorly in the 1986—1987 period, overestimatingthe unpaid loss ratio. In 1992—1994 it again over-estimates the unpaid loss ratio.The ICLD method exhibits better skill than

the PCLD method, but its absolute skill is stillrelatively low. The observed skill of the ICLDmethod at 42 months maturity is 52%, versus23% for the PCLD method. The skill of theICLD method is negative for the most recent (18months) maturity and for the older maturities.Overall the skill of the ICLD method for the

last ten accident years combined is 31%, ver-sus 13% for the paid chain-ladder developmentmethod. Figure 7 displays the overall perfor-mance test results for the ICLD method.

Over the twenty-year performance test periodthe ICLD method exhibited a negative bias (pre-dicted dollars of claim liabilities below actual li-abilities) of 0.8%. However, this is not likely tobe a statistically significant result.One reason for the poor skill of the ICLD

method is the changing levels of case reserveadequacy over the historical period, due to thechanges in claim handling procedures mentionedearlier. Figure 8 displays the average case re-serve on open claims for the accident year at 42months, adjusted to a constant-dollar basis usingthe U.S. Consumer Price Index (CPI). As can beseen, the average case reserve for the accidentyear at 42 months increased substantially duringthe period from 1984 to 1988. While somewhatvolatile, the average case reserve also appears tohave declined from its 1988 peak level during

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 179

Page 20: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

Figure 8. CABI inflation-adjusted average case reserves

Figure 9. CABI comparison of paid and incurred chain-ladder errors

the early 1990s. This suggests that an adjustedincurred method, reflecting changes in case re-serve adequacy, would offer improved skill overthe simple incurred chain ladder method. We testsuch a method later in this section.One can compare the PCLD and ICLD meth-

ods directly by looking at their prediction errors,as is shown in Figure 9. Inspection of the errorsshows that the root cause of the performance dif-ference between the PCLD and the ICLD is theperformance of the former in the years 1992 and1994. These two years contribute significantly tothe mean squared error of the paid chain-laddermethod. Prior to this period, the company en-gaged in a significant effort to reduce the in-ventory of open claims. This manifested itselfas higher-than-average paid development, whichtranslated into several diagonals of above-aver-

age paid development factors. As these factorsworked their way through the average, the paidchain-ladder forecasts significantly overshot themark.Looking at the errors more broadly, one can

see that they are similar during some periods andonly occasionally divergent.Before continuing with performance tests of

other methods, we diverge to look at the issue ofcombining estimates from multiple methods.

4.4. Combining estimates from multiplemethods

We now have predictions from two actuarialprojection methods, PCLD and ICLD, with mea-sures of skill for each. The next question to beaddressed is how to combine the results of thetwo methods together to create a single actuar-

180 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 21: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

Figure 10. Illustration of combined variance of two methods

ial central estimate. We want to combine the twoestimates according to

L(t)C = w£ L(t)I +(1¡w)£ L(t)P : (4.3)

We want to choose w so that the variance of theerror distribution around L(t)C is minimized. Thevariance of the error distribution around L(t)C isgiven by

VC = w2£VI +2£w£ (1¡w)

£Cov(L(t)I , L(t)P )+ (1¡w)2£VP= w2£VI +2£w£ (1¡w)£ ½I,P£¾I £¾P +(1¡w)2£VP (4.4)

where VI ,VP ,VC are the variances of the estima-tion errors for L(t)I , L

(t)P , L

(t)C , respectively, ¾I ,¾P

are the standard deviations of the estimation er-rors for L(t)I , L

(t)P , respectively, and ½I,P is the cor-

relation coefficient between L(t)I , L(t)P .

To minimize the error distribution around L(t)Cwe must merely take the derivative of VC withrespect to w, set the derivative equal to zero, andsolve for w. Doing so we get

w =VP ¡ ½I,P £¾I £¾P

VI ¡ 2£ ½I,P £¾I £¾P +VP: (4.5)

One can see from Equation 4.5 that if the vari-ances of the error terms are equal, then w is equalto 50%. If the variances of the error terms are not

equal, greater weight will be given to the methodwith the lower error variance, depending on thecorrelation.Figure 10 provides additional insight to the re-

lationship between the two variances and the cor-relation. In this example, VP = :65 and VI = :15.The chart shows how VC varies with w for agiven level of correlation (“rho” in the chart),with the minimum combined error variance be-ing the lowest point on each curve. If there is per-fect positive correlation between the estimationerrors (½I,P = 1:0), then the combined error vari-ance is minimized by simply giving 100% weightto the method with the lower error variance. Ascorrelation decreases, one can see that the mini-mum combined error variance entails givingsome small weight to the method with the highererror variance. At the extreme, if the estimationerrors for the two methods are negatively corre-lated, then they can be perfectly offset to producezero combined error variance by using weightsfrom Equation 4.5–in this case approximately a67.6% weighting to the lower variance estimate.In the specific case of our CABI results for the

accident year at 42 months, we have the follow-ing.

² The mse of the PCLD method is 0.1249%.² The mse of the ICLD method is 0.0753%.² The observed correlation between the errors ofthe two methods is 60.3%.

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 181

Page 22: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

Table 4. Case reserve development method

The algorithm The starting point is a triangle of historical case reserves (i.e., adjusters’ individual claim estimates of unpaid claims).From that triangle a triangle of historical case reserve development factors is constructed, for all historical valuesexcluding the latest diagonal. For accident years beyond 150 months maturity, the current values of incurred lossesare selected as ultimate. The case reserve development factors are calculated for all earlier valuations of thoseaccident years by subtracting the paid losses at each valuation from the selected ultimate, and dividing by the casereserves at the same valuation. Working successively forward by accident year, the case development factor to applyto the next maturity is calculated by taking the weighted average of the latest ten observations. The projected ultimatefor that year is then used to calculate case reserve development factors at all prior maturities.

In calculating the case reserve development factors at 6 months maturity, the ultimate losses are divided by two toaccount for the accident half-year.

A “benchmark” set of case reserve development factors, inferred from the benchmark paid and incurred chain ladderfactors, is used to the extent that the observed development data does not extend to 150 months, and also to theextent that the case development factors exhibit high volatility.

The data Case reserves and cumulative paid claim and allocated loss adjustment expenses, in traditional triangular format, withvaluations from 6 months to 150 months. Reported claim and allocated expenses as of the current valuation foraccident years beyond 150 months. No other data is employed.

Intervention points Benchmark factors are selected to complete the development when the triangle does not extend to 150 months.

Assuming that these were the only two meth-ods available and we believed that the historicalperformance test results were reasonable indica-tors of the current situation, then the indicatedminimum-variance weighting would be 79.5%to the estimate from the ICLD method with thebalance given to the estimate from the PCLDmethod. The mse of the weighted average is0.0719%,an improvementover either of the stand-alone methods.The weighting approach derived in this sec-

tion could obviously be applied by individualaccident year with varying weights, or to theoverall estimates from each of the methods usingweights based on the variances and correlationsbetween the overall errors of the methods. It iseasily extendable to more than two methods, byexpansion of the combined variance equation to:

Vc =nXi=1

w2i £Vi+2£nXi=1

nXj=1,j>i

wi

£wj £ ½i,j £¾i£¾j (4.6)

Again, to minimize the combined variance onemust take the derivative of VC with respect to wi,subject to the constraint that

Pni=1wi = 1.

As the number of actuarial projection meth-ods gets larger, the required number of estimates

certainly grows, including the estimates of thevariance of each method and the correlation co-efficient between each pair of methods.

4.5. Case reserve development method

The next actuarial projection method we testedwas the case reserve development method (CRD),in which a development factor is applied to theunpaid case-basis claim reserves for each acci-dent year to obtain the estimated unpaid claim li-ability. The CRD method is a recursive approach,in that the projections from earlier accident yearsare used to calculate the factor to be applied tothe next accident year, successively. The CRDmethod is formally defined in Table 4.Application of the CRD method at the June

1988 valuation is illustrated on sheet 4 of theAppendix.Performance test results for the CRD method

for the accident year at 42 months maturity areshown in Figure 11.The CRD method exhibits better skill than ei-

ther the paid or incurred development method atolder maturities. The observed skill of the CRDmethod at 42 months is 58%, versus 52% and23% for the ICLD and PCLD methods, respec-tively. The observed skill of the CRD method at

182 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 23: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

Figure 11. CABI performance test results—Case reserve development method

Table 5. Reported claim count chain-ladder development method

The algorithm The basic chain ladder approach in which the cumulative reported claim counts at each historical maturity is divided bythe cumulative reported value at the preceding maturity to obtain a triangle of historical report-to-report developmentfactors. The simple average of the latest four observations is calculated at each maturity. It is assumed that there isno development beyond 150 months maturity.

The selected factors are chained together to produce report-to-ultimate development factors. The development factorfrom 6 months to 18 months is divided by two to account for the accident half-year.

A “benchmark” set of factors is used to obtain report-to-report factors to the extent that the observed developmentdata does not extend to 150 months.

The data Cumulative reported claim counts, in traditional triangular format, with valuations from 6 months to 150 months. Noother data is employed.

Intervention points Other than selecting the benchmark factors, no interventions are made.

54 months is even higher, at 67%. Similar re-sults have been obtained by the authors in per-formance test results for other liability lines. Itappears that for mature years, in liability linesthe case reserves alone are a better predictor thanthe incurred (i.e., paid plus case reserves) claims.This makes some intuitive sense, as the case re-serves relate directly to future claim payments onopen claims, whereas the cumulative paid claimsrelate to claims that are closed which are unre-lated to future claim payments. The skill differ-ence is particularly notable when the remainingcase reserves are variable, significant in some ac-cident years and less significant in others.The overall skill of the CRD method across

the latest ten accident years is 22%, substantiallylower than the 32% skill of the ICLD method,primarily because the CRD method has very littleskill for the early maturities.

4.6. Reported claim count chain-ladderdevelopment

Since projecting claim counts does not (by it-self) yield an estimate of unpaid claim liabili-ties, it does not technically qualify as an actu-arial projection method. Nonetheless, we testedthe performance of the chain-ladder applied toreported claim counts (RCCLD). Once again weuse a highly mechanical method, as outlined inTable 5 and illustrated at the June 1998 valuationon sheet 1 of the Appendix.Application of the RCCLD method is illus-

trated for the June 1984 valuation on sheet 5 ofthe Appendix.Figure 12 displays the performance test results

for the RCCLD method for the accident yearat 42 months. The RCCLD method exhibits re-markably better skill than any of the other meth-

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 183

Page 24: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

Figure 12. CABI performance test results—Reported count chain-ladder method

ods tested, primarily due to the stability in theclaim reporting process over the study period.Measured skill at 42 months is 99%. While casereserving policy and settlement practices changeddramatically, the reporting of claims into theclaim system did not.The skill of theRCCLDmethod ishigh at allma-

turities, ranging from a low of 95% at six monthsto a high of 99% at 42 months and beyond.The clear implication is that the claim count

data has very high information value. Methodsthat make use of the count data should thereforehave enhanced predictive skill over methods thatuse only the dollar data.

4.7. Incurred chain-ladder with casereserve adequacy adjustment

As was noted earlier, the historical experienceexhibits significant changes in case reserveadequacy at various points in time, reflectingchanges in case reserving policies at the com-pany. We therefore have tested the performanceof a method that measures and adjusts for casereserve adequacy before applying the incurredchain-ladder method. We will refer to the methodas case-adequacy adjusted incurred chain-ladderdevelopment (CAAICLD). The method is sum-marized in Table 6, and illustrated at the June1984 valuation on sheet 5 of the Appendix.

For comparison purposes, we start by analyz-

ing the performance results for the accident year

valued at 42 months maturity. Results are dis-

played in Figure 13.

The skill of the CAAICLD method at 42

months maturity is 52%, exactly the same as the

unadjusted ICLD method. However, a compar-

ison of Figure 13 with Figure 6 shows that the

estimation errors occur at different points in time.

The correlation of the estimation errors is only

37%. At this maturity, the CAAICLD method re-

duces the estimation errors caused by the changes

in case reserve adequacy, but introduces new er-

rors due to the imperfections of the case ade-

quacy adjustment process.

However, the overall skill of the CAAICLD

across the latest ten accident years is substan-

tially higher than the unadjusted ICLD method,

52% versus 31%. While skill is not enhanced at

42 months, it is substantially enhanced at 18 and

30 months. The reasons for the overall skill en-

hancement can be seen in Figure 14. The overall

magnitude of the estimation errors is reduced by

the case adequacy adjustment. Whereas the unad-

justed method produced estimation errors greater

than six loss ratio points three times, the adjusted

method never produces errors of that magnitude.

The adjusted method is far from perfect, as it

184 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 25: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

Figure 13. CABI performance test results—Case adequacy incurred chain-ladder

Table 6. Case-adequacy adjusted incurred chain-ladder development method

The algorithm Average case reserves are calculated by accident year and maturity for the latest six diagonals of loss developmentdata. A weighted average across all maturities except six months is computed for each diagonal to obtain an overallaverage case reserve at each valuation point. To prevent the overall averages from being influenced by a changing mixof claims by maturity, the weighted averages are all calculated using the current mix of claims as weights.

The annual increases in the weighted average case reserve are calculated and compared to an economic inflationmeasure. In this case we used the U.S. Consumer Price Index (CPI). For each diagonal of case reserves, the actualcase reserves are adjusted by a factor to reflect the extent to which the weighted average case reserve is increasingfaster or slower than the inflation rate. The factors are indexed to the current year, so that only the interior values arechanged; the latest diagonal of values is unchanged.

A similar adjustment is made to the case reserves at six months maturity; the current average case reserve is trendedup the column at the inflation rate.

The basic chain ladder approach is then applied to the adjusted incurred development data. The simple average of thelatest five observations is calculated at each maturity. A tail factor to provide for development beyond 150 months(the last available maturity in the triangle) is based on ratios of the current value of reported claim and allocatedexpenses to the reported value at 150 months for the latest four mature years.

The selected factors are chained together to produce report-to-ultimate development factors. The development factorfrom six months to 18 months is divided by two to account for the accident half-year.

A “benchmark” set of factors is used to obtain report-to-report factors to the extent that the observed developmentdata does not extend to 150 months.

The data Cumulative paid, case reserves and reported claims and allocated loss adjustment expenses, in traditional triangularformat, with valuations from six months to 150 months. Open claim counts in the same format. Reported claim andallocated expenses as of the current valuation for accident years beyond 150 months. No other data is employed.

Intervention points If any of the calculated simple average report-to-report factors are less than one, the value is set equal to one.Benchmark factors are selected to complete the development when the triangle does not extend to 150 months.

produces errors in excess of four loss ratio

points twice in the test period. The results

might be further improved by the use of an in-

flation index that was more relevant to Commer-

cial Auto Bodily Injury claim costs, or by other

alternative adjustment techniques than those we

employed.

4.8. Summary conclusions from thecase study

The case study presented in this section onlybegins to scratch the surface of performance test-ing. Test results for many additional methods,and variations on each one, could have beenpresented–making for a very long paper indeed.

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 185

Page 26: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

Figure 14. CABI comparison of adjusted and unadjusted ICLD errors

In addition, it is hard to present the test results ina paper format, as much of the real insight comesfrom poring over the details of the calculations.Our goal was merely to illustrate the methodol-ogy and demonstrate the potential insights thatcan be gained from it, making the case for incor-porating it into an ongoing control cycle for thereserving process. The key takeaways we wouldhighlight from the case study are the following:

² The accuracy, or skill, of actuarial methods canbe formally tested using the cross-validationapproach employed in other predictive sci-ences. Formal performance testing can help theactuary to understand the strengths and weak-nesses of each method, and more realisticallyassess the potential estimation errors arisingfrom them.

² Performance testing can guide the actuary inthe selection of methods, both overall and bymaturity. It also provides a methodology forcombining the estimates produced by differentmethods. Correlation between the methods isrelevant to the selection of methods, and theweighting scheme used to combine them.

² The accuracy of the basic chain-ladder meth-ods is seriously degraded when there arechanges in underlying claim policies and pro-cesses. The level of skill indicated by the casestudy is lower than one would expect absentthese changes. These changes are often de-

tectable by analysis of claim settlement ratesor average case reserves, which require claimcount data. Adjusting for changes in under-lying claim policies and processes enhancesskill.

² Claim count development is often more pre-dictable than claim dollar development, espe-cially when there are changes in claim proce-dures and processes. In these situations, meth-ods that make use of the claim count data arelikely to improve the accuracy of the estimates.

² Experimentation with less traditional methodsmay lead to enhancements in skill. Our exper-imentation with the case reserve developmentmethod (including tests not covered in this pa-per) suggests that it does outperform incurredchain-ladder in many liability lines.

Areas for further exploration would obviouslyinclude testing of methods that adjust for changesin claim settlement rates as well as changes incase reserve adequacy. In addition, tests of morecomplex methods, including regression-basedand stochastic methods, would be of interest.Finally, it would be useful to test whether over-

all skill is higher when the Bodily Injury Liabilityclaims are combined with the Property DamageLiability claims into a single projection, versusseparate projections on each. While homogeneityof the claims is a desirable property–arguing forgreater granularity–one of the authors believes

186 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 27: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

that greater consolidation often leads to greatercredibility (i.e., reduced noise in the data) andimproved skill. Further performance testing willhelp to find the optimal trade-off between im-proved homogeneity and improved credibility.

ReferencesBarnston, A., and H. van den Dool, “A Degeneracy in Cross-Validated Skill in Regression-Based Forecasts,” Journalof Climate 6, 1993, pp. 963—977.

Bellis, C., J. Shepherd, and R. Lyon, Understanding Actu-arial Management: The Actuarial Control Cycle, Sydney:Southwood Press, 2003.

Conger, R., and S. Lowe, “Managing Overconfidence,”Emphasis 3, 2003, pp. 6—9, www.towersperrin.com/emphasis.

Elsner, J., and C. Schmertmann, “Assessing Forecast Skillthrough Cross Validation,” Weather and Forecasting 9,1994, pp. 619—624.

Gray, W., and P. Klotzbach, “Extended Range Forecast ofAtlantic Seasonal Hurricane Activity and U.S. LandfallStrike Probability for 2008,” 2008, http://typhoon.atmos.colostate.edu/.

Goford, J., “The Control Cycle: Financial Control of a LifeAssurance Company,” Journal of the Institute of ActuariesStudents’ Society 28, 1985, pp. 99—114.

Hadidi, N., “CAS Board Approves Changes to the BasicEducation and Examination Structure,” The Actuarial Re-view 35:6, 2008, pp. 14—15.

Michaelson, J., “Cross-Validation in Statistical ClimateForecast Models,” Journal of Climate and Applied Me-teorology 26, 1987, pp. 1589—1600.

Wacek, M., “A Test of Clinical Judgment vs. Statistical Pre-diction in Loss Reserving for Commercial Auto Liabil-ity,” Casualty Actuarial Society Forum Winter 2007, pp.371—404.

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 187

Page 28: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

Appendix

188 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 29: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 189

Page 30: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

190 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 31: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 191

Page 32: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Variance Advancing the Science of Risk

192 CASUALTY ACTUARIAL SOCIETY VOLUME 3/ISSUE 2

Page 33: Claim Reserving: Performance Testing and the Control … Reserving: Performance Testing and the Control Cycle and measuring required economic capital for re-serve risk. Since estimates

Claim Reserving: Performance Testing and the Control Cycle

VOLUME 3/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 193