
    Is It Worth the While?

    The Relevance of Qualitative Information

    in Credit Rating

    Bina Lehmann*

    working paper, April 17, 2003

    JEL classification code: G21, C25, C10

    Summary

This empirical study deals with the question of whether soft facts (qualitative information, i.e. subjective judgments of credit analysts) considerably improve the forecast quality of bank-internal credit ratings that are solely based on hard facts (financial ratios, checking account data).

An extensive sample (20,000 observations) of German SME credit data has been made available by a commercial bank to compare two models: one including, the other excluding qualitative information. Logistic regression is used to forecast default probabilities. The econometric quality as well as the classification and separation performance of the two models are assessed statistically and graphically, using various measures. It is shown that most of the widely used measures cannot be used for inter-sample comparison and therefore offer little informational content. Further, performance measures that are based on classification tables (i.e. a single cut-off value) should only be used if the costs and benefits of (mis)classification are known. ROC (Receiver Operating Characteristic curve)-based measures and the inspection of ROC have been found to be the most useful criteria for model comparison. ROC inspection allows models to be compared in a more qualitative way, adding information to the common inspection of numerical criteria.

The problem of observing very few defaults is solved by stratifying the estimation sample and using a re-sampling procedure. This results in empirical distributions of the performance measures. While, in similar studies, one (mean) value per measure and model is given, the existence of empirical distributions allows the difference between the two models to be quantified and a decision to be made as to whether the models differ significantly.

The model including qualitative information significantly dominates the hard-facts model in virtually all respects. The inspection of ROC indicates that, contrary to credit analysts' intuition, the hard-facts variable based on financial ratios performs better in the high-risk region than the soft-facts variable based on credit analysts' judgments.

It is not possible to infer from the results of this study whether the increase in performance of a credit rating model justifies the additional costs of obtaining qualitative information for a particular bank. Yet, this study shows that subjective judgments are indeed capable of yielding valuable information and of improving credit rating systems that are based solely on quantitative information by considerable amounts. It uses several methods to quantify this increase in forecast quality and is based, as one of the first studies so far, on an extensive set of real-world credit data.

* Bina Lehmann, University of Konstanz, Center of Finance and Econometrics, Box D147, 78457 Konstanz, Germany, [email protected]


    1. Introduction

Banks have been using internal systems to assess the creditworthiness of borrowers in one way or another1, and not only since internal credit rating systems have received closer attention in the discussion of the new capital requirements (Basel II)2.

Extensive research has been conducted to compare different rating or classification methods. This research was usually based on publicly available information such as insolvencies and financial ratios from annual reports. Very little research is available on the role of soft facts (i.e. qualitative information: credit analysts' subjective evaluations of management quality, market position etc.) in internal credit rating systems and on whether this information considerably improves the forecast quality of credit rating systems which are solely based on hard facts such as financial ratios. The reason is that the subjective judgments of credit analysts are private information which is produced and used inside banks. Hardly any data have been available for research.

This study uses an extensive sample of bank-internal credit data. Two models, one based on hard facts (financial ratios and checking account), the other additionally including soft facts (management quality, bank-customer relationship etc.), will be compared. Different aspects of the models' performance are evaluated statistically and graphically, on the estimation sample and a hold-out sample. The problem of observing


time and the management of a bank's credit portfolio risk. Basel II will allow banks to use credit ratings in the calculation of their regulatory capital requirements as well.

Bank-internal credit risk evaluation systems go by a number of names, such as expert systems, credit scoring or credit risk rating. These methods differ in the degree of subjectivity contained in the decision-making process, i.e. the degree to which the system can be adapted to the individual case and, thus, the degree to which the decision is influenced by the credit analyst's or relationship manager's personal opinion. Subjective elements in a credit rating system are not necessarily to be seen as negative but rather as desirable. They allow the credit analyst to include information in the analysis that would otherwise be left unused, such as extensive professional experience or additional relevant but non-quantifiable information beyond that contained in the documents. The disadvantages of credit rating systems that rely heavily on expert judgments and subjective information are the difficulties in objectively comparing and re-examining past credit assessments. Besides, it is often very costly to gather qualitative individual information.

While the selection and weighting of information used in credit rating systems with a high degree of subjectivity is determined largely by the personal experience and discretion of the credit rater himself, in strictly standardized credit rating systems it is determined entirely by statistical procedures. Past credit experience is considered as well, by examining historical credit data. Information is statistically selected and weighted such that credit defaults will have the greatest chance of being identified in advance. This selection of variables and the accompanying weighting scheme will then be applied to all the individuals in a credit segment (SME, large corporates, retail credits etc.) regardless of individual characteristics. Contrary to credit rating systems with subjective components, individual adjustments of variable selection and weighting are not allowed.

If standardized credit rating systems with fixed sets of variables and weighting schemes include soft facts along with objective and quantifiable hard facts, those rigidities are typically softened. A certain degree of flexibility is introduced to the system by allowing the credit analyst to assess the soft rating criteria subjectively4. Whether or not an institution decides to use qualitative information in its credit rating system depends a) on the amount by which the forecast quality improves compared to an alternative, purely quantitative rating system, and b) on the additional costs of obtaining the information. Since costs are specific to an institution, this study will only be able to comment on whether or not forecast quality can be improved by considering qualitative information and, if so, whether this improvement is significant.

In the following, it is assumed that a bank considers implementing one of several alternative standardized3 credit rating systems that differ in the set of variables used. The objective is to predict as accurately as possible whether a credit event (delayed payments or any other breach of contract) occurs for a certain credit relationship within a certain period of time in the future. The occurrence of such a credit event will generally be called default. A credit rating system is considered to be of good quality if, looking back, debtors with a lower credit rating default more often than debtors with a higher credit rating. This separation or classification of debtors can be assessed objectively for all credit rating systems, regardless of their degree of subjectivity.

3 Standardized in the sense that the credit rating is created electronically based on the soft and hard facts that are available, with no individual choice of variables or weighting schemes.

[Figure: schematic data matrix of companies/observations (rows) by variables V.1-V.13 and the default indicator (columns), grouped into Block I, Block II and Block III (hard facts and soft facts). Statistical classification procedures map each block to a partial rating (RI, RII, RIII), which are combined into the final score / rating.]

Figure 1: Structure of a standardized credit rating system (schematic).

In this study a standardized credit rating system such as the one in Figure (1) is assumed. It includes a number of data in different, non-overlapping blocks of quantitative (hard facts) and/or qualitative (soft facts) information. The hard facts include information from annual reports and transactions on checking accounts; the soft facts comprise qualitative information about subjective judgments of credit analysts or the use of discretionary powers in financial accounting.

The block annual reports / financial accounting contains financial ratios typically used in the assessment of credit risk, concerning a company's ability to repay its debts. Measures are based on liquidity, capital structure, turnover, cash flow and profitability5. Financial accounting ratios are considered to be a reliable, easily obtainable and seemingly objective6 basis for the assessment of credit risk. There is one major drawback to financial ratios: they merely offer a backward-looking perspective on the company. Moreover, at the moment a rating is created or updated, financial accounting data are at least 6-12 months old. One of the strategies to fill this information gap is to request forward-looking financial information, i.e. forecasts of cash flows based on expected business scenarios7. Yet, these data are much less reliable and objective since they are mostly predictions.

4 Antonov [2000]
5 The analysis of financial reports follows a long tradition. Research evolved around the question which variables to select (Beaver [1966], Altman [1968] and [1977]).
6 Some authors doubt this assumption: Küting [1997].

Another strategy to obtain more recent information about a company's financial health is a form of behavioral scoring, i.e. to monitor a company's checking account. If a company holds an account at only one bank (or passes the majority of its transactions through this bank), the latter receives valuable information and indicators acting as an early warning system8 for weakened liquidity. Variables of interest include, among others, minimum / maximum balances and their variance, the number and size of transactions, violations of lines of credit and whether the account holder acts as a creditor or debtor.

Information blocks containing the soft facts about a company frequently deal with accounting policies9, the information in the appendices of annual accounts10 or subjective judgments of credit analysts11. The latter will be the object of this study as well. To this end, credit analysts subjectively assess a number of pre-determined criteria on a rating scale (from good to bad); these judgments are then weighted and aggregated. Banks use questionnaires12 which are kept for documentation of the credit rating process. These questionnaires or rating sheets request information on management quality, the financial conditions of a company (apart from what is known from available accounting data), the market position and the quality of the bank-debtor relationship.

Management quality is typically inferred from the education, professional and industry experience of the top and middle management, the quality of management information systems (controlling, accounting) which allow for timely information of the management about financial and operational risks, and the existence of a plausible long-term business strategy for the company13. Social skills and leadership qualities allow for a working atmosphere that prevents large fluctuations among employees. Since small and medium enterprises (SME) are especially prone to succession problems, plans for the succession of the current management and continuity plans are definitely an issue to ensure stability and the continuity of the business if one of the managers leaves the company, which is often connected to a large drain of know-how.

The assessment of financial conditions does not merely repeat the automated analysis of annual accounts but aims to pave another way to close the gap between the rating moment and the most recent available annual account. Companies are asked to provide recent preliminary (not audited) accounting data. Efficient and permanent management of liquidity and risks are evaluated. Plans for the current and upcoming financial years are used to forecast the development of cash flows, profitability and growth.

7 Grott et al. [2000]
8 Eisfeld [1935], Thanner [1991], Reuter [1994], Fritz / Hosemann [2000]
9 Eigermann [2001]
10 Werner [1990], [1990a]
11 Grunert et al. [2002]
12 Norden [2001] shows an example of a questionnaire
13 Merz [1999]

A company's market position is determined by the prospects of the industry (potential of the market, profitability and competition) as well as the positioning of the company itself within the relevant local and wider industry setting. Credit analysts evaluate the current impact of the company on the market as well as its future impact, depending on the quality of its brands, production program, sales and marketing systems. Strong dependence on one or a few large suppliers and customers raises the company's sensitivity to external influences and weakens its ability to beat the market in downturns.

Banks put special emphasis on the assessment of the bank-customer relationship. Here the trustworthiness, the reliability of statements and agreements and the company's willingness to provide the bank with timely and correct information are important. The length of the customer relationship and the degree of mutual trust that has been established will be judged by the credit analyst. None of these criteria depend on the current financial situation of the company but on a continuous and reliable flow of information.

A greater amount of information usually improves the forecast quality of a bank's credit rating system and, thus, its ability to plan and price accurately. Frequently, missing information or a customer's lacking willingness to inform the bank will have a greater impact on the rating than just through the unfavorable assessment of the bank-customer relationship. A customer is indirectly punished for holding back information because all the criteria that cannot be assessed by the credit analyst due to missing information will be rated at the lower end of the available rating scale as well, such that customers have an additional incentive to provide all the relevant information to the bank completely, reliably and in a timely manner.

The credit rating system in Figure (1) does not include collateral, guarantees or other risk mitigating structures. In this study, we only look at issuer ratings. The final conditions offered to the customer depend on the rating of the issue.

When constructing a credit rating system, historical credit data are used to select variables and weighting schemes. Looking at the credit rating system in Figure (1), all the criteria belonging to the same information block are combined in a partial rating (Block I: RI, Block II: RII etc.) such that an eventual default, as observed in the past, would be predicted as accurately as possible. The partial ratings (one per information block) are finally summarized, resulting in a final score from which the credit rating is derived14.

14 While the score is usually a continuous variable, the credit rating is discrete with a fixed number of categories. To this end, the score is divided up into a requested number of adjacent sections. A rating is assigned to every section. Higher scores and higher ratings also mean higher probabilities of default.

The statistical classification procedures (in braces in Figure (1)) may differ from one rating system to another. All of them aim to separate or classify as accurately as possible the good (non-default) from the bad (default) creditors. Parallel to the question which variables to use in credit scoring systems, a great number of empirical studies turned to the question which classification method to use. Basically, three methods have been discussed: Linear Discriminant Analysis (LDA), Neural Networks (NN) and Logistic Regression (LR)15. The latter will be used in this study. The method has shown high performance, is easy to apply and is based on the maximum likelihood principle, which produces estimators with many econometrically desirable characteristics16.

The following empirical study shows, for an extensive sample of small and medium enterprises (SME), that the introduction of qualitative information (soft facts) to a credit rating system considerably improves its forecast and classification ability. The starting point is a model which is based entirely on quantitative information (annual account, checking account). The quality of every model will be assessed by comparing a number of quality measures.

    3. Data

The data used in this study stem from the SME (Small and Medium Enterprises) portfolio of a German commercial bank. 20,000 observations (companies) have been selected into the overall sample, including a pre-set fraction of defaults of 2%. This value has been chosen arbitrarily, but it reflects fairly well the typical structure (and associated estimation problems) of credit portfolios that contain less than 5% default observations. The sample consists of SME from all industries17; about two thirds of the companies reported annual turnovers of up to 5 million EUR. Since the sample was artificially selected, it is not representative of either the bank's portfolio or the entirety of German SME in terms of size, industry and number of defaults.

    The analysis assumes the following sequence of events:

[Figure: timeline over Years 1-4, showing the blocks of information FIN, CA and AN (e.g. management quality, future prospects) entering the RATING, followed by the observation (forecast) period.]

Figure 2: Sequence of events in the assumed credit rating process.

15 Just to mention a few: Altman et al. [1994]: LDA vs. NN; Anders / Szeszny [1998]: NN; Wiginton [1980]: LR vs. LDA; Thomas [2000]: various; Baetge / Heitmann [2000]: fuzzy-rule based models. Fritz / Hosemann [2000] conduct an extensive study of various classification methods.
16 Textbooks on econometrics, e.g. Verbeek [2000]
17 But no start-ups or very young companies, because their accounting data are not very reliable. Banks use different models for very young companies that do not make extensive use of accounting data.


Every rating is assumed to be updated on December 31, 1999. Defaults are recorded during the following 12 months, i.e. the observation period lasts from January 1 until December 31, 2000.

A default (DEF = 1) occurred if a first-time loan loss provision (LLP) has been recorded during the observation period. First-time means that no LLP has been recorded during the last three years18. The LLP, a bank-internal event, does not mean that the debtor has defaulted completely on the loan, i.e. stopped payments altogether. Yet, the fulfilment of the contractual agreements seems unlikely. The value of the loan is reduced; the bank is forced (by German law) to deduct the appropriate amount. It is one of the default criteria mentioned in the New Basel Capital Accord (BCBS [2001]).

The partial rating FIN (financial report) is based on the automated analysis of annual accounts, i.e. all quantitative data. A number of financial ratios have been combined here, taking into account the broader industry of the SME (e.g. services, goods production etc.). Figure (2) shows that the variable FIN contains information which is at least 6 months old, leaving an information gap between the last available annual report and the update of the rating.

To fill this gap and to get an idea about the future prospects of the SME, two other partial ratings are considered: CA (checking account) contains information from the behavioural scoring of the checking account, as explained in section 2. This partial rating can be generated at any point in time, close to the moment of the rating update, and contributes to the bridging of the information gap. The analysis of the checking account is automated and contains only hard facts.

Finally, the partial rating AN (analyst) is based on subjective judgments of credit analysts. The questions asked cover the immediate past (e.g. development of financial conditions since the last audited annual report), but also the status quo (e.g. management quality) and an outlook into the future (e.g. strategic plans).

The variable DEF is binary, i.e. has 2 outcomes: DEF = 1 (default) and DEF = 0 (non-default). The variables FIN, CA and AN are discrete with 10 categories (1 = lowest probability of default, ..., 10 = highest probability of default). Table (1) contains descriptive statistics of the explanatory variables.

18 The analysis was also carried out with (not first-time) LLP in 2000 as the definition of default. A number of easily classifiable observations entered, and the performance of the rating system rose by a considerable amount. Yet, this is rather trivial. Usually, defaulted loans enter a separate monitoring process. The bank is most of all interested in the surprises in its non-default loan portfolio. The true capabilities of a credit rating system show in the prediction of first-time LLP, not the extrapolation of past LLP. The definition of the default criterion has great impact on the results. Therefore, studies with a different default criterion cannot be compared easily.


               Mean (std.dev.)
               overall          DEF = 0          DEF = 1
FIN (quant.)   6.29 (2.191)     6.25 (1.868)     7.95 (1.868)***
CA  (quant.)   5.32 (2.149)     5.28 (2.133)     7.41 (1.889)***
AN  (qual.)    6.00 (1.953)     5.97 (1.948)     7.43 (1.607)***

Table 1: Means (standard deviation) of the explanatory variables. ***: means of the default (DEF=1) and non-default (DEF=0) observations differ significantly at the 1% level (two-sample t-test and Wilcoxon19).

The means of the explanatory variables are given for the overall sample (standard deviation in brackets) and for the default (DEF=1) and non-default (DEF=0) populations separately. The two-sample tests for equivalence of the means indicate that the means of the default population differ significantly from the means of the non-default population for all three variables, i.e. there is a clear separation of the default and non-default distributions. Every one of the explanatory variables alone is able to classify the observations to some degree. Higher partial ratings go along with higher probabilities of default.

        DEF            FIN            CA             AN
DEF     1              0.09275 ***    0.11397 ***    0.09306 ***
FIN     0.10705 ***    1              0.30470 ***    0.32041 ***
CA      0.13135 ***    0.39924 ***    1              0.20284 ***
AN      0.10676 ***    0.41135 ***    0.26371 ***    1

Table 2: Dependence measures. One triangle of the matrix (shaded in the original): Kendall's Tau; the other (non-shaded): Spearman (rank) correlation. ***: all values differ significantly from zero (zero: no correlation).

Table (2) contains the values of two measures of dependence among the explanatory variables and between the explanatory variables and the default variable: Kendall's Tau and Spearman's rank correlation coefficient. Of the three input variables, CA shows the highest correlation with DEF; FIN and AN show similar, but lower values. Thus, of the three variables FIN, CA and AN, CA replicates the default variable DEF the most. Still, the correlations between the default variable and the respective input variables are rather low, due to the definition of the default criterion, which emphasizes surprises in the non-default portfolio and excludes the rather trivial observations that have already been in default20.

19 The H0 of normality is rejected for these variables.
20 Including previously defaulted observations, the correlations range between 0.31 and 0.42.


Now, after having looked at the univariate properties of the input variables FIN, CA and AN, in the following we will look at models combining the information contained in the hard-facts (FIN, CA) and soft-facts (AN) variables.

    4. Empirical Results: Combining qualitative and quantitative information

Two models shall be compared, using logistic regression. The first model uses only quantitative (hard facts) variables to explain the binary dependent variable DEF; the second model introduces an additional qualitative (soft facts) variable.

Model    Dependent variable    Independent variables
FC       DEF                   FIN, CA
FCA      DEF                   FIN, CA, AN

Table 3: Models FC and FCA. FC includes only quantitative variables in the analysis; FCA uses additional qualitative information.

The comparison of the two models makes it possible to decide whether qualitative information can improve the performance and quality of credit risk rating models that are solely based on quantitative information such as financial accounting data and the analysis of checking accounts.

The specific structure of credit data poses econometric problems. Usually, the samples contain comparably few (less than 5%) default observations.


To circumvent selection bias problems due to the selection of the estimation sample from a rather large overall sample, 100 random samples are drawn (Figure [3]). To make sure that the model does not only perform well on the estimation sample, the rest of the overall sample is used as a hold-out sample. Performance is tested on both the estimation and the hold-out sample. The following repeated sampling procedure is carried out 100 times23:

[Figure: flow diagram. From the initial sample, an estimation sample (ES) is selected; a logistic regression is run for model A and model B; Y* is calculated for the hold-out sample (HS); performance tests are run on (ES) and (HS) and the test results are collected. The sequence is repeated 100 times.]

Figure 3: Repeated sampling: the sequence of sample selection (estimation and hold-out sample), regression (estimation sample), and diagnostic tests (estimation and hold-out sample) is run 100 times to prevent sample selection bias effects.

From the (large) initial sample, 100 smaller estimation samples (n = 1,000, n1/n = 1/3) are drawn randomly. Both models FC and FCA are estimated on the same set of estimation samples. The rest of the initial sample is each time used as a hold-out sample. Using the estimated coefficients for models FC or FCA, predictions are made for the observations in the hold-out sample. Then, diagnostic tests (pseudo-R2) are run on the estimation sample, and performance tests (classification measures) are run on both the estimation sample and the hold-out sample to evaluate the out-of-sample performance of the models. The resulting empirical distributions of the various performance measures are finally used to compare the models.

23 The number of runs is sufficient, as all the hypothesis tests in the following sections can be carried out satisfactorily.
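A compact sketch of this loop is given below (hypothetical helper names, not the paper's implementation; `overall` stands for the bank's 20,000-observation sample, which is not publicly available, and the CoC/AUC of section 4.3 serves as the example performance measure).

```python
# Sketch of the repeated-sampling procedure of Figure 3 (simplified illustration).
# `overall` is assumed to be a DataFrame like the bank's full sample, with columns
# DEF, FIN, CA and AN; it is not reproduced here.
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

MODELS = {"FC": ["FIN", "CA"], "FCA": ["FIN", "CA", "AN"]}

def one_run(overall, seed, n_def=333, n_nondef=667):
    """Stratified estimation sample (n = 1,000, one third defaults), fit, hold-out CoC."""
    est = pd.concat([
        overall[overall["DEF"] == 1].sample(n_def, random_state=seed),
        overall[overall["DEF"] == 0].sample(n_nondef, random_state=seed),
    ])
    hold = overall.drop(est.index)                        # rest of the initial sample
    out = {}
    for name, regs in MODELS.items():
        fit = sm.Logit(est["DEF"], sm.add_constant(est[regs])).fit(disp=0)
        p_hold = fit.predict(sm.add_constant(hold[regs]))  # PD forecasts on hold-out
        out[name] = roc_auc_score(hold["DEF"], p_hold)     # CoC / area under the ROC
    return out

# results = pd.DataFrame([one_run(overall, s) for s in range(100)])   # 100 repetitions
```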

The following diagnostic (econometric quality) and performance (classification) measures will be used to compare the models FC (all quantitative information) and FCA (quantitative and qualitative information). Some of the routinely used measures of classification performance will be shown to be very restrictive in meaning and therefore less suitable for model evaluation.

Diagnostic: Econometric quality
  R2_MF    McFadden's R2            contrasts the log-likelihoods of the model in question and a trivial model
  R2_MZ    McKelvey-Zavoina's R2    OLS-R2 of the latent linear model in the logistic regression (based on a decomposition of the variance)

Performance: Classification, based on a single threshold
  PC       Percentage Correct       percentage of correct predictions
  Eα, Eβ   Alpha and Beta Error     misclassification measures, read from contingency tables
  Q        Yule's Q                 measure of association (Goodman/Kruskal's Gamma in 2x2 tables)

Performance: Separation, independent of a single threshold
  CoC      Coefficient of Concordance                 Wilcoxon statistic and also AUC (area under the ROC); the probability of the correct ranking of observations according to their default status
  ROC      Receiver Operating Characteristic Curve    graphical summary of a model's classification performance (compare CoC (AUC))

Table 5: Overview of diagnostic and performance tests used for model selection.

There is no single measure to evaluate a classification model comprehensively. A variety of measures is used to evaluate different aspects of a model. Frequently, no single model performs best in all the criteria. The final selection of a model depends on the intended use and preferences. While diagnostic measures like the (pseudo-)R2 are specifically used to evaluate the logistic regression, the performance measures are not specific to the logistic regression but can be used with every classification procedure. To evaluate the classification performance of the models, three different types of measures will be looked at: the well-known and routinely used measures that are based on a single contingency table and, thus, on a single threshold; ROC-based measures that do not imply a single threshold but cover the whole range of thresholds; and one measure based on the difference between predicted default probabilities and observed outcomes.


    4.1 Econometric Quality: Pseudo-R2

McFadden's R2 (R2_MF)

McFadden's R2 is also called the likelihood ratio index, which states pretty clearly what it is about: a comparison of the restricted log-likelihood ℓ0 (of a model containing only a constant) to the unrestricted log-likelihood ℓ1 of the model specification in question24. It is a pseudo-R2 measure and should not be interpreted as or confused with the R2 known from Ordinary Least Squares (OLS) regression. It is called R2 because it always lies between 0 and 1. If ℓ0 and ℓ1 differ considerably in size, then the additional variables that were used in the unrestricted model but not in the restricted model helped to explain the observed data better and, thus, increased the log-likelihood. Therefore, R2_MF measures the degree to which the additional variables helped to raise the log-likelihood, compared to a trivial model. The greater the R2_MF, the better the model explains the data that have been used in the estimation. However, it neither tells anything about the classification power of the model nor about how well it works on the hold-out sample. It is solely a measure of the econometric quality of the estimation.
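For reference, the textbook definition behind this description (with ℓ0 and ℓ1 as above) is:

```latex
% McFadden's pseudo-R^2: likelihood ratio index
R^2_{MF} \;=\; 1 - \frac{\ell_1}{\ell_0}
```

Since both log-likelihoods are negative and ℓ1 is at least as large as ℓ0, the ratio lies between 0 and 1, which is why R2_MF does as well.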

McKelvey-Zavoina's R2 (R2_MZ)

R2_MZ is used to evaluate the logistic regression25. It is based on a variance decomposition of the estimated logits (y*)26. It dominates other pseudo-R2 in its power to differentiate between models, i.e. a better model is more clearly separated from a worse one27. Theoretically it lies between 0 and 1. Generally, the farther R2_MZ deviates from zero, the better the observed variation is explained by the regression. The interpretation of R2_MZ comes, among the various pseudo-R2 for binary logit models, closest to that of the OLS-R2, as it basically resembles the OLS-R2 of the underlying latent linear model.
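The variance-decomposition form usually given for the logit model (with ŷ* the fitted logits and π²/3 the fixed variance of the logistic error term) is:

```latex
% McKelvey-Zavoina R^2 for the binary logit model
R^2_{MZ} \;=\; \frac{\widehat{\mathrm{Var}}(\hat{y}^{*})}{\widehat{\mathrm{Var}}(\hat{y}^{*}) + \pi^{2}/3}
```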

24 McKelvey, R. / Zavoina, W. [1975], Verbeek [2000], pp. 182, and Windmeijer [1992], pp. 95.
25 Windmeijer [1992] or Hosmer / Lemeshow [2000]
26 y* is the logit transformation of the probability p. The logit transformation y* of a probability p of an event is the logarithm of the ratio between the probability that the event occurs and the probability that the event does not occur: y* = log(p/(1-p)).
27 Compare the simulation studies by Veall / Zimmermann [1996]


              F          C          A          FC         FCA        FCA ≻ FC
R2_MF  ES     0.111      0.166      0.104      0.198      0.229+++   100 ###
              (0.0125)   (0.0139)   (0.0113)   (0.0145)   (0.0154)
R2_MZ  ES     0.192      0.274      0.178      0.327      0.381+++   100 ###
              (0.0208)   (0.0206)   (0.0187)   (0.0216)   (0.0229)

Table 6[28]: Pseudo-R2 -- [F], [C], [A]: test results for the individual input variables. [FC], [FCA]: test results for the quantitative model FC and the quantitative/qualitative model FCA. Means (standard deviation) of the diagnostic tests over the 100 outcomes per measure. ES: estimation sample; TS: test (hold-out) sample. (+++): the means of models FC and FCA differ significantly at the 1% level (paired two-sample t-test29). [FCA ≻ FC]: number of cases (out of the 100 estimations) in which model FCA produced a more favorable value for the respective measure than model FC. (###): the proportion of cases with FCA ≻ FC is significantly larger than 50% (the share expected by chance), at the 1% level.

Table [6] shows that model FCA (including qualitative information) performs significantly better than model FC (only quantitative information). Obviously, the more complex model FCA is the better specification. Of the individual input variables, CA explains the available data best.
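The two significance markers used in Tables 6, 8 and 9 could be reproduced along the following lines (a sketch with placeholder numbers; since the text does not spell out which exact test underlies the ### column, the binomial test here is only one plausible choice).

```python
# Sketch: the two comparisons reported per measure (placeholder data, hypothetical names).
# m_fc / m_fca hold the 100 paired hold-out outcomes of one measure for models FC and FCA.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m_fc  = rng.normal(0.792, 0.023, 100)   # placeholders on the scale of Table 9's CoC
m_fca = rng.normal(0.812, 0.021, 100)

# (+++): paired two-sample t-test on the means (compare footnote 29)
t_stat, p_means = stats.ttest_rel(m_fca, m_fc)

# (###): is the share of runs with FCA ahead of FC significantly above 50%?
wins = int(np.sum(m_fca > m_fc))
p_share = stats.binomtest(wins, n=len(m_fc), p=0.5, alternative="greater").pvalue
print(wins, p_means, p_share)
```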

4.2 Classification based on a single threshold: PC, alpha- and beta-error, Yule's Q

Measures of the single-threshold type look at the binary variables DEF (observation) and DEF~ (prediction). A discrimination procedure's (preferably) continuous output has to be transformed into a binary variable; information is lost.

The logistic regression returns for each observation an estimate of its probability of default. This variable p̃1 is continuous between 0 and 1. Every observation with p̃1 > t is assumed to default during the observation period. In the logistic regression, this threshold is set to t = 0.5. Thus, the continuous variable p̃1 is transformed into the binary variable DEF~. Information is lost, because the size of the estimated probability of default is not considered.

Contingency tables (Table [7]) are used to compare predicted defaults (DEF~) and observed defaults (DEF) at a particular threshold t. The cells contain the numbers of cases in which forecast and observation coincide (DEF and DEF~ are equal) and the numbers of cases in which they do not (DEF and DEF~ are not equal). The fraction of correctly predicted defaults (n11 / n*1) is also called the sensitivity (SENS), the fraction of correctly predicted non-defaults (n00 / n*0) the specificity (SPEC) of a rating system. If the threshold t is moved, the entries in the contingency table will change, and with them the values of all measures that are based on it.

28 Tests on whether the empirical distributions of the performance measures differ significantly between the estimation and the hold-out sample suggest that the measures 1-PC and BS depend on the structure of the sample. Since the estimation sample and the hold-out sample contain different fractions of defaults, this question cannot be answered for them. For both R2, the α-error and the CoC, the performance does not differ significantly between the estimation sample and the hold-out sample. In the following, only the results for the hold-out samples will be considered.
29 The H0 of normality cannot be rejected for any of the measures. The paired two-sample t-test is used because the measures are calculated using the same 100 hold-out samples for models FC and FCA. For every measure, there exist 100 paired observations (one for model FC, the other for model FCA).

                            DEF (observation)
                            0          1
DEF~ (forecast)     0       n00        n01        n0*
                    1       n10        n11        n1*
                            n*0        n*1        n

Table 7: Contingency table (schematic) to compare predicted and observed defaults.

Consider, for example, moving the threshold to t = 0. This means that all observations with a predicted probability of default greater than or equal to zero will be classified as defaults. In Table [7], n00 and n01 will be empty. Conversely, if t is set to 1, i.e. only observations with probabilities of default greater than or equal to one will be considered defaults (this number of observations is usually zero), all observations will be considered non-defaults, and n10 and n11 will be empty. Both scenarios describe the same model; only the threshold has been moved, possibly arbitrarily. As we will see later, it is hardly of any use to compare models based on a single threshold as long as the choice of this threshold has not been influenced by costs or other reasonable considerations.

α-error and β-error

The α-error Eα (a default observation was forecast to be a non-default by the system) and the β-error Eβ (a non-default observation was forecast to be a default by the system) can be read directly from a contingency table:

α-error: Eα = n01 / n*1 = 1 - SENS
β-error: Eβ = n10 / n*0 = 1 - SPEC     [1]

They describe how good (how poor) the rating system is at identifying defaults and non-defaults correctly. Typically, if the percentage of one type of observation in the sample is low (e.g. default observations), the corresponding error will be high. Small Eα and Eβ are desirable. They change inversely, however, if the threshold t is moved. Therefore, within one model, Eα can only be lowered at the expense of a rising Eβ, and vice versa. Which level of t to choose depends on the costs that Eα and Eβ impose upon the user of the classification model.

In credit rating applications, Eβ typically creates opportunity costs (predicted defaults will not enter the credit portfolio, even though they might be non-defaults and, thus, profitable business) while Eα creates immediate losses if a company that was classified as a non-default unexpectedly defaults on its obligations.


As has been mentioned before, it is of little use to state the size of Eα and Eβ for an arbitrary threshold t. For, if one of the errors seems too high, it can easily be reduced by increasing the size of the other. What would be much more interesting to know is whether one model shows consistently lower Eα/Eβ combinations across the whole range of thresholds, compared to an alternative model. We will come back to this discussion in the ROC section.
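As a generic illustration (not tied to the bank's data) of how Eα and Eβ follow from the forecast probabilities once a threshold is fixed:

```python
# Generic sketch: contingency-table cells, sensitivity/specificity and the
# alpha-/beta-errors at a single threshold t, from forecast PDs and observed defaults.
import numpy as np

def single_threshold_errors(p1, DEF, t=0.5):
    pred = (np.asarray(p1) > t).astype(int)       # DEF~: 1 = predicted default
    obs  = np.asarray(DEF)
    n11 = np.sum((pred == 1) & (obs == 1))        # correctly predicted defaults
    n00 = np.sum((pred == 0) & (obs == 0))        # correctly predicted non-defaults
    n01 = np.sum((pred == 0) & (obs == 1))        # missed defaults
    n10 = np.sum((pred == 1) & (obs == 0))        # false alarms
    sens = n11 / (n11 + n01)                      # SENS
    spec = n00 / (n00 + n10)                      # SPEC
    return {"E_alpha": 1 - sens, "E_beta": 1 - spec, "SENS": sens, "SPEC": spec}
```

Moving t changes all four cells at once, which is exactly why, within one model, the two errors can only be traded off against each other.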

Yule's Q

Yule's Q is a measure of association. It lies between 1 (complete association, i.e. forecast and observation coincide, DEF = DEF~) and -1 (complete disassociation, i.e. forecast and observation do not coincide, DEF ≠ DEF~). A Q close to one points to a high fraction of concordant pairs, or correctly forecast events30.

Q can be formulated in terms of Eα and Eβ (with SENS = 1 - Eα and SPEC = 1 - Eβ)31:

Q = (1 - Eα - Eβ) / (1 - Eα - 2 Eβ (1 - Eα) + Eβ)
  = (SENS - Eβ) / (SENS - 2 Eβ SENS + Eβ)     [2]

It depends neither on the sample size nor on the fraction of default observations in the sample and is therefore a lot more useful for inter-sample model comparison (i.e. samples of different structure) than the percentage of observations that has been correctly classified by the model (PC).

    Percentage Correct (PC)

The fraction of correctly classified observations

PC = (n00 + n11) / n     [3]

is a widely used measure but should not be used to compare models across samples or studies, because it depends on the structure of the sample, i.e. the percentage p1 of default observations in the sample32:

PC = (1 - p1)(1 - Eβ) + p1 (1 - Eα)
   = (1 - p1) SPEC + p1 SENS     [4]

Equation [4] implies that p1 does not affect the result if SPEC = SENS, i.e. Eα = Eβ. Yet, its impact becomes greater if Eα differs strongly from Eβ, which is the case in credit rating applications. Intuitively, in a sample with very few default observations (difficult to detect, high Eα), PC might have a certain value a. If additional non-default observations are added to the sample (easy to detect, low Eβ), PC rises (cet. par.) to a+, even though Eα and Eβ have stayed the same for the overall sample. Conversely, if difficult-to-detect default observations are added proportionately (such that Eα stays the same), PC will drop (cet. par.) to a-.

Usually Eα is higher than Eβ, i.e. it is more difficult to predict defaults than non-defaults. PC will be lower for samples with a higher fraction of defaults. Therefore, besides being based on a single threshold, it cannot be used to compare the quality of different models or rating systems when different samples are used in the analyses. It is listed here because all models are based on the same set of samples, which allows us to find out whether there is a significant improvement from one model to another. The absolute value of PC cannot be interpreted or compared to values found in other studies which are based on other samples.

30 Yule's Q corresponds to Goodman & Kruskal's Gamma for 2x2 contingency tables.
31 Swets [1996], ch. 3
32 This fact has already been noted by Gilbert [1885]. In addition, PC is not a useful criterion by itself because, in our case (overall sample with a default rate of 2%), it could simply be moved up to 98% just by declaring all observations to be non-defaults. Eα, however, would be 100% (i.e. none of the default observations would have been identified correctly).
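A short numerical illustration of equation [4], with error levels roughly of the size reported later in Table 8 (Eα = 0.45, Eβ = 0.15; the two default fractions are chosen for illustration only):

```latex
% Same error levels, different sample composition
p_1 = 0.02:\quad PC = 0.98 \cdot 0.85 + 0.02 \cdot 0.55 = 0.844
\qquad
p_1 = \tfrac{1}{3}:\quad PC = \tfrac{2}{3} \cdot 0.85 + \tfrac{1}{3} \cdot 0.55 = 0.75
```

The classifier is identical in both cases; PC changes only because the sample contains more defaults.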

             F          C          A          FC         FCA        FCA ≻ FC
Eα    TS     0.586      0.455      0.743      0.472      0.448+++   73 ###
             (0.0653)   (0.0663)   (0.0508)   (0.0585)   (0.0597)
Eβ    TS     0.164      0.185      0.113      0.152      0.152      49
             (0.0287)   (0.0170)   (0.0004)   (0.0110)   (0.0088)
Q     TS     0.565      0.680      0.449      0.721      0.743+++   68 ###
             (0.0738)   (0.0602)   (0.1094)   (0.0569)   (0.0506)
1-PC  TS     0.166      0.186      0.115      0.153      0.153      49
             (0.0284)   (0.0168)   (0.0004)   (0.0109)   (0.0086)

Table 8: Single-threshold classification measures -- [F], [C], [A]: test results for the individual input variables. [FC], [FCA]: test results for the quantitative model FC and the quantitative/qualitative model FCA. Means (standard deviation) of the 100 outcomes per measure for the hold-out sample (TS). Refer to Table [6] for further reference.

Table [8] contains the results for the single-threshold classification measures. PC can only be compared across models if a) the sample structure is the same (i.e. the same percentage of default observations), which can be confirmed due to the repeated sampling procedure in Figure [3], and b) the threshold t is set according to the same mechanism (e.g. cost structure: t would be set to minimize costs, and the model that induces the lowest overall costs would be chosen), which is also the case here because t is set endogenously in the maximum likelihood procedure of the logistic regression. For Eα, Eβ and Q only b) applies.

Eβ and PC do not differ significantly between the models, but the standard deviation is lower for the FCA model, which clearly dominates model FC in terms of Eα and Q. Of the input variables, again CA performs best in the majority of the criteria. These model evaluations are only valid at the particular threshold that has been chosen here.

From the single-threshold classification measures no information is available on model performance at alternative thresholds. This problem can be solved if model comparison is carried out through the inspection of the models' ROC and of ROC-based measures.


    4.3 Performance: Separation, multiple thresholds: Receiver Operating Characteristic

While the measures that have been looked at so far assess both models at a single threshold (set by the logistic regression at p̃1 = 0.5, i.e. all enterprises with a forecast probability of default greater than 0.5 will be considered defaults while all enterprises with a lower forecast probability of default will be assumed to be non-defaults), it seems obvious that this is a very limited view. The optimal threshold (in terms of a trade-off between Eα and Eβ) depends on the costs of erroneously classifying a default as a non-default (Eα, costs of the default) and the costs of erroneously classifying a non-default as a default (Eβ, opportunity costs of foregone business).

Therefore, models should be judged by the combinations of Eα and Eβ that they impose over the whole range of possible thresholds. In the logistic regression context, thresholds t could range from p̃1 = 0 (all enterprises with a forecast probability of default greater than zero, i.e. all enterprises, will be considered defaults: no business, the lender is very cautious and closes its gates) to p̃1 = 1 (all enterprises with a forecast probability of default greater than one, i.e. none of the enterprises, will be considered a default: the lender accepts all debtors and has to bear the costs of a high number of defaults in his portfolio).

The ROC is a graphical representation of the combinations of Eα and Eβ over the entire range of thresholds; it is created by varying the threshold and calculating Eα and Eβ at every threshold. For each threshold t, the sensitivity (1 - Eα) is plotted against one minus the specificity (Eβ)33.

The further away the ROC lies from the positive diagonal, the better is the model's classification performance. The positive diagonal represents a model that orders default and non-default observations strictly by chance, with no classification power at all. A ROC above the diagonal implies high (low) values of the decision or threshold criterion for (non-)default observations, i.e. if the observations are ordered, non-defaults come first. A ROC below the diagonal implies that the observations are ordered the other way around. In this study, the ROC lie above the positive diagonal.

A model performs better than an alternative model over the entire range of thresholds if its ROC lies completely above the alternative model's ROC. If the two ROC cross, however, the decision to choose either one depends on the performance of the models in the critical region of relevant thresholds (depending on the costs of α- and β-errors, as mentioned above).

33 If the sensitivity is plotted against the specificity (instead of 1 - specificity), the ROC runs below the diagonal instead of above it.
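The curve itself can be traced out directly from the forecast default probabilities; the following generic sketch (placeholder data) uses scikit-learn's roc_curve merely as one convenient implementation.

```python
# Generic sketch: ROC points (1 - SPEC, SENS) over all thresholds, plus the AUC (CoC).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(2)
DEF = rng.binomial(1, 0.02, 5000)                                    # placeholder outcomes
p1  = np.clip(0.03 + 0.06 * DEF + rng.normal(0, 0.02, 5000), 0, 1)   # placeholder forecasts

fpr, tpr, thresholds = roc_curve(DEF, p1)   # fpr = E_beta = 1 - SPEC, tpr = SENS = 1 - E_alpha
auc = roc_auc_score(DEF, p1)                # area under the ROC, i.e. the CoC of section 4.3
# plotting tpr against fpr reproduces the layout of Figures 4 and 5
```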


[Figure: schematic ROC plotted as Sensitivity (POS/n1) against 1 - Specificity, with the regions marked: correctly identified defaults, non-identified defaults (alpha-error), correctly identified non-defaults, non-identified non-defaults (beta-error); the threshold (score / forecast probability of default) runs from large to small along the curve.]

Figure 4: Schematic Receiver Operating Characteristic (ROC) in 1-Specificity vs. Sensitivity space. The curve is the graphical representation of α- and β-errors at varying thresholds. A model's classification performance for thresholds set at low values of the forecast probability of default appears in the right part of the plot, resulting in high β- and low α-errors. The reverse is true for thresholds set at higher values of the forecast probability of default. Thresholds do not decline linearly when moving from left to right.

The slope of the ROC is a graphical representation of the trade-off between Eα and Eβ (Figure [4]). Starting from any point on the curve, a marginal reduction (increase) in Eα results in a marginal increase (reduction) in Eβ. The following cost-benefit analysis34 allows the cost-optimal threshold t_c to be chosen:

slope_opt = (n0 / n1) · (B(SPEC) + C(Eβ)) / (B(SENS) + C(Eα))     [5]

with n0 (n1) the number of non-defaults (defaults) in the sample and C(·) (B(·)) the costs (benefits) associated with incorrectly (correctly) classified observations.

The proportion of the costs connected to Eα and Eβ leads to a cost-optimal marginal exchange rate of Eα vs. Eβ. The slope in equation [5] marks the point on the curve associated with the optimal threshold t_c. If costs cannot be determined perfectly, the procedure allows at least a range of reasonable thresholds to be determined. The model that is dominant over this


range of thresholds should then be chosen, even though it may not be dominant over the entire range of thresholds.
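A sketch of how equation [5] could be put to work on an empirical ROC: compute the target slope from (assumed) costs and benefits, then pick the operating point whose local slope is closest to it. All cost and benefit figures below are hypothetical.

```python
# Sketch: locating the cost-optimal operating point on an empirical ROC via eq. [5].
# fpr, tpr, thresholds as returned by roc_curve; n0/n1 = non-defaults/defaults in the sample.
import numpy as np

def cost_optimal_threshold(fpr, tpr, thresholds, n0, n1,
                           B_spec=1.0, C_beta=5.0, B_sens=1.0, C_alpha=50.0):
    target = (n0 / n1) * (B_spec + C_beta) / (B_sens + C_alpha)   # slope_opt, eq. [5]
    # local slope of the ROC between successive operating points
    slopes = np.diff(tpr) / np.maximum(np.diff(fpr), 1e-12)
    i = int(np.argmin(np.abs(slopes - target)))
    return thresholds[i + 1], target
```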

In Figure [5] the mean ROC for all models are shown. Inspection of the ROC for models FC and FCA in Figure [5] finds model FCA dominating model FC over the entire range of thresholds except for low Eβ and, thus, high thresholds t, i.e. the performance is very similar for sensitivities up to 0.5 and Eα > 0.5, which is just the region looked at by the single-threshold measures in Table [8]. The individual-variable ROC support the earlier findings: CA shows the highest performance of the three input variables; FIN and AN show similar performance, but their ROC cross. This qualitative fact means that AN shows better classification performance if the threshold t is low and FIN shows better classification performance if the threshold is set at a higher value, i.e. credit analysts' opinions (AN) separate better at low scores (high quality, low PD) while the score based on financial information (FIN) separates better at high scores (low quality, high PD)35. This seems to contradict credit analysts' intuition; it might point to the fact, however, that if profitability or liquidity crises occur, tools based on hard facts show better performance. This qualitative piece of information cannot be collected from any single-number measure but only from the inspection of ROC. Therefore, the interpretation of single-number measures such as the CoC should not go without the inspection of ROC, especially if the CoC are very similar, because the performance of alternative models could either be similar across the whole range of thresholds or it could vary over different intervals of thresholds and even out.

34 Swets [1996], ch. 11
35 Compare Figure [4]: at low grades/thresholds (upper right corner) Eα is low and Eβ is high; at high grades (lower left corner) Eα is high and Eβ is low.


[Figure: ROC curves for FCA and FC (means, test sample) and for FIN, CA, AN (all observations), plotted as Sensitivity against 1 - Specificity, together with the diagonal (null model).]

Figure 5: ROC (all observations) for the input variables FIN, CA, AN and mean ROC (of the 100 hold-out samples) for models [FC] and [FCA]. The diagonal marks the location of the ROC for a non-performing model.

Often it is desired and useful to obtain a first impression about the location of the ROC. One measure describing the location of the ROC, i.e. the performance of the model at various thresholds, is the size of the area under the ROC relative to the SENS / (1 - SPEC) space (area under the curve = AUC, also: CoC).

    Performance: Separation, multiple thresholds: Coefficient of Concordance (CoC)

The coefficient of concordance36

CoC = 1 - \frac{1}{n_1 n_0} \left( \sum_{i=\mathrm{score}_{\min}}^{\mathrm{score}_{\max}} n_{1i}\, n'_{0i} + 0.5 \sum_{i=\mathrm{score}_{\min}}^{\mathrm{score}_{\max}} n_{1i}\, n_{0i} \right)    [6]

(with n_{1i} (n_{0i}) the number of default (non-default) observations with score i and n'_{0i} the number of non-default observations with a score greater than i) measures the degree of separation between the default and non-default observations.

It corresponds to the Wilcoxon statistic and the area under the ROC37 (AUC) and can be interpreted as the probability that a randomly chosen default observation is (correctly) ranked higher than a randomly chosen non-default observation. The larger the CoC, the better.

36 Fritz / Hosemann [2000]
37 Hanley / McNeil [1982]
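As a sketch (not the original code of the study), the CoC can be computed directly from the pairwise interpretation given above, assuming, as elsewhere in the text, that a higher score corresponds to a higher PD:

import numpy as np

def coc(scores, is_default):
    # Share of (default, non-default) pairs in which the default carries the
    # higher score; ties are counted with weight 0.5 (Wilcoxon statistic / AUC).
    scores = np.asarray(scores, dtype=float)
    is_default = np.asarray(is_default, dtype=bool)
    s1 = scores[is_default]      # scores of defaults
    s0 = scores[~is_default]     # scores of non-defaults
    higher = (s1[:, None] > s0[None, :]).sum()
    ties = (s1[:, None] == s0[None, :]).sum()
    return (higher + 0.5 * ties) / (len(s1) * len(s0))

For large samples the explicit pairwise comparison can be replaced by a rank-based computation, but the simple version suffices to illustrate the measure.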


             F          C          A          FC         FCA        FCA vs. FC
CoC TS       0.718      0.771      0.719      0.792      0.812+++   # # #
             (0.0258)   (0.0254)   (0.0234)   (0.0232)   (0.0211)

Table 9: CoC - means (standard deviations) of the performance test. TS: test (hold-out) sample. Refer to table [6] for further reference.

Table [9] sums up the discussion of ROC-based measures. Model FCA reached a hold-out sample CoC of 0.81, which is significantly greater than that of model FC. Over the whole range of thresholds, model FCA would therefore be preferred over model FC. Of the input-variable models, CA performs best (a fact already established from the inspection of the ROC). Even though FIN and AN show similar classification performance in terms of their CoC, they behave quite differently across the range of thresholds. This piece of qualitative information cannot be captured by a single-number measure but requires the inspection of the ROC.
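Whether the CoC difference between two models is significant can, for example, be checked with a paired test over the CoC values of the 100 hold-out samples. The sketch below uses simulated stand-in values only; the study's actual testing procedure is the one referred to in Table [6].

import numpy as np
from scipy.stats import wilcoxon

# Illustrative stand-ins for the 100 hold-out-sample CoC values of FC and FCA.
rng = np.random.default_rng(0)
coc_fc = rng.normal(0.79, 0.023, 100)
coc_fca = coc_fc + rng.normal(0.02, 0.01, 100)

stat, p_value = wilcoxon(coc_fca, coc_fc)   # paired signed-rank test
print(p_value)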

CoC are suitable for comparison across samples with different structures. Still, the definition of the default criterion would have to be the same. If the default criterion has been chosen such that default observations are more difficult to detect (e.g. defaults occur farther in the future, the sample excludes past defaults, default events are only those very late in the progression towards insolvency, etc.), model performance will be lower than for default criteria which imply easier detection of defaults.

Summing up, the model including qualitative information (FCA) dominates the quantitative model FC in all the different measures used in this study, either by producing significantly better results (comparison of means) or by reducing the variability in the results. Thus, even though the qualitative input variable AN does not dominate the quantitative input variables FIN and CA by itself, it significantly improves a model that is based on quantitative information only.

    5. Conclusions

In this study, the relevance of qualitative information, i.e. subjective judgments of credit analysts, in credit scoring models has been analysed for an extensive data set of German SMEs (small and medium-sized enterprises). Two models, one relying solely on quantitative information (from the analysis of financial ratios and transactions on checking accounts), the other additionally including qualitative information (credit analysts' judgments on the intermediate and future financial situation, market position, management quality and the relationship to the bank), have been compared using a set of different criteria. Some of the commonly used classification measures have been found to be of limited comparability. Generally, classification measures should

a) not depend on the structure of the sample (i.e. the fraction of defaults and non-defaults), because if they do, they are not suitable for model comparison across samples of different


structure (e.g. estimation sample vs. hold-out sample, or different studies). Such measures (Percentage Correct) can only be used to compare different models based on the same sample (e.g. for the detection of significant differences); their absolute values cannot be interpreted.

b) not rely on a single threshold unless the threshold has been chosen for a specific reason (e.g. cost considerations). Models should rather be compared according to how they perform over the whole range of thresholds or, at least, over relevant intervals of the threshold. Therefore, single-threshold measures such as Percentage Correct, Yule's Q, the α-error and the β-error are of limited informational capacity.

As a solution, ROC-based procedures have been proposed because they are independent of the sample structure and of (arbitrary) single thresholds. The optimal threshold is found by introducing cost considerations into the trade-off between the α-error and the β-error.
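A small simulated example (hypothetical numbers, not data from the study) illustrates point a) and the advantage of ROC-based measures: the same score is evaluated on two samples that differ only in their default fraction; Percentage Correct at a fixed threshold shifts with the sample structure, while the CoC stays roughly the same.

import numpy as np

rng = np.random.default_rng(1)

def pct_correct(scores, is_default, t=0.5):
    return np.mean((scores >= t) == is_default)

def coc(scores, is_default):
    s1, s0 = scores[is_default], scores[~is_default]
    higher = (s1[:, None] > s0[None, :]).sum()
    ties = (s1[:, None] == s0[None, :]).sum()
    return (higher + 0.5 * ties) / (len(s1) * len(s0))

for n_def, n_non in [(50, 950), (300, 700)]:               # 5% vs. 30% defaults
    scores = np.concatenate([rng.normal(0.35, 0.10, n_non),   # non-defaults
                             rng.normal(0.60, 0.15, n_def)])  # defaults
    labels = np.concatenate([np.zeros(n_non, bool), np.ones(n_def, bool)])
    print(n_def / (n_def + n_non),
          round(pct_correct(scores, labels), 3),
          round(coc(scores, labels), 3))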

From an extensive sample of credit files, 100 estimation samples of size n = 1 000 have been drawn. Because credit data typically contain


    6. References

Altman, E.I. [1968]: Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy, in: Journal of Finance 23 (4), pp. 589-609.

Altman, E.I. / Haldeman, R. / Narayanan, P. [1977]: Zeta Analysis: A New Model to Identify Bankruptcy Risk of Corporations, in: Journal of Banking and Finance, pp. 29-54.

Altman, E.I. / Marco, G. / Varetto, F. [1994]: Corporate Distress Diagnosis. Comparisons Using Linear Discriminant Analysis and Neural Networks (The Italian Experience), in: Journal of Banking and Finance, pp. 505-529.

Anders, U. / Szczesny, A. [1998]: Prognose von Insolvenzwahrscheinlichkeiten mit Hilfe logistischer neuronaler Netze, in: zfbf 50/10, pp. 892-915.

Antonov, I. [2000]: Crafting a Market Landscape, in: The Journal of Lending and Credit Risk Management 2/2002, pp. 34-40.

Baetge, J. / Heitmann, C. [2000]: Creating a Fuzzy Rule-Based Indicator for the Review of Credit Standing, in: zfbf, sbr - Schmalenbach Business Review 52, pp. 318-343.

BCBS (Basel Committee on Banking Supervision) [1999]: A New Capital Adequacy Framework, consultative paper.

BCBS (Basel Committee on Banking Supervision) [2000]: Range of Practice in Banks' Internal Ratings Systems, discussion paper, January 2000.

BCBS (Basel Committee on Banking Supervision) [2001]: The New Basel Capital Accord, second consultative document, January 2001.

Beaver, W.H. [1966]: Financial Ratios as Predictors of Failure, in: Empirical Research in Accounting, Selected Studies (Institute of Professional Accounting), pp. 71-111.

Carey, M. [2001]: Some Evidence on the Consistency of Banks' Internal Credit Ratings, working paper, April 21, 2001, Federal Reserve Board.

Crouhy, M. / Galai, D. / Mark, R. [2001]: Prototype Risk Rating System, in: Journal of Banking and Finance 25, pp. 47-95.

Eigermann, J. [2001]: Quantitatives Credit-Rating unter Einbeziehung qualitativer Merkmale, Verlag Wissenschaft & Praxis, Kaiserslautern.

Eisfeld, C. [1935]: Kontenanalyse im Dienste der Kreditbeobachtung, in: Sparkasse 55/17, pp. 333-345.

English, W.B. / Nelson, W.R. [1998]: Bank Risk Rating of Business Loans, working paper (November 1998), Federal Reserve Board.

Franck, S. / Hoheneck, F. [1999]: Scoring im gewerblichen Firmenkundengeschäft, in: Schmoll, A. (ed.): Kreditrisiken erfolgreich managen, Gabler, pp. 149-162.

Fritz, S. / Hosemann, D. [2000]: Restructuring the Credit Process: Behaviour Scoring for German Corporates, in: International Journal of Intelligent Systems in Accounting, Finance & Management 9, pp. 9-21.

Gilbert, G.K. [1885]: Finley's Tornado Predictions, in: American Meteorological Journal 1, pp. 167-172.

Grott, R. / Kruschwitz, L. / Löffler, A. [2000]: Zukunftsbezogene Kreditwürdigkeitsprüfung, in: Kreditwesen 9/2000, pp. 474-478.


Grunert, J. / Norden, L. / Weber, M. [2002]: The Role of Non-Financial Factors in Internal Credit Ratings, Discussion Paper 3415, Centre for Economic Policy Research, London.

Hanley, J.A. / McNeil, B.J. [1982]: The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve, in: Radiology 143, pp. 29-36.

King, G. / Zeng, L. [2000]: Logistic Regression in Rare Events Data, working paper, Harvard University, January 14, 2000.

Küting, K. [1997]: Der Wahrheitsgehalt deutscher Bilanzen, in: DStR 3/97, pp. 84-91.

Maddala, G.S. [1983]: Limited Dependent and Qualitative Variables in Econometrics, Cambridge University Press.

McKelvey, R. / Zavoina, W. [1975]: A Statistical Model for the Analysis of Ordinal Level Dependent Variables, in: Journal of Mathematical Sociology 4, pp. 103-120.

Merz, A. [1999]: Moody's Management Quality Ratings Methodology for Rating Custodian Banks, Moody's Investors Service Global Credit Research, November 1999.

Norden, L. [2001]: Gewährung und Gestaltung einer Fremdfinanzierung - Entscheidungen in der Kreditpraxis, in: Eisenführ, F. / Langer, T. / Weber, M. [2001]: Fallstudien zum rationalen Entscheiden, Springer, Berlin.

Reuter, A. [1994]: Unternehmens-, Konto- und Bilanzanalyse, in: Betriebswirtschaftliche Blätter 7, pp. 347-354.

Swets, J.A. [1996]: Signal Detection Theory and ROC Analysis in Psychology and Diagnostics, LEA (Lawrence Erlbaum Associates), Mahwah / New Jersey.

Thanner, W. [1991]: Die Analyse der Kontokorrentverbindung als Instrument zur Risikofrüherkennung im Firmenkundengeschäft der Banken, Schriftenreihe der Stiftung Kreditwirtschaft an der Universität Hohenheim, ed.: Prof. J.H. von Stein, Bd. 9.

Thomas, L.C. [2000]: A Survey of Credit and Behavioral Scoring: Forecasting Financial Risk of Lending to Consumers, in: International Journal of Forecasting 16, pp. 149-172.

Treacy, W.F. / Carey, M. [2000]: Credit Risk Rating Systems at Large US Banks, in: Journal of Banking and Finance 24, pp. 167-201.

Tutz, G. [2000]: Die Analyse kategorialer Daten, Lehr- und Handbücher der Statistik, Oldenbourg, München.

Veall, M.R. / Zimmermann, K.F. [1996]: Pseudo-R2 Measures for some common limited dependent variable models, in: Journal of Economic Surveys 10/3, pp. 241-259.

Verbeek, M. [2000]: A Guide to Modern Econometrics, John Wiley & Sons, Chichester.

Werner, U. [1990]: Die Berücksichtigung nichtnumerischer Daten im Rahmen der Bilanzanalyse, in: Die Wirtschaftsprüfung 13/1990, pp. 369-376.

Werner, U. [1990a]: Die Messung des Unternehmenserfolgs auf Basis einer kommunikationstheoretisch begründeten Jahresabschlussanalyse, 1990, in print.

Wiginton, J.C. [1980]: A Note on the Comparison of Logit and Discriminant Models of Consumer Credit Behavior, in: Journal of Financial and Quantitative Analysis 15/3, pp. 757-769.