predicting business failure using an rsf-based case-based reasoning ensemble forecasting method

Journal of ForecastingJ. Forecast. (2011)Published online in Wiley Online Library(wileyonlinelibrary.com) DOI: 10.1002/for.1265

Predicting Business Failure Using anRSF-Based Case-Based ReasoningEnsemble Forecasting Method

HUI LI* AND JIE SUNSchool of Economics and Management, Zhejiang NormalUniversity, China

ABSTRACTCase-based reasoning (CBR) is considered a vital methodology in the currentbusiness forecasting area because of its simplicity, competitive performancewith modern methods, and ease of pattern maintenance. Business failure predic-tion (BFP) is an effective tool that helps business people and entrepreneurs makemore precise decisions in the current crisis. Using CBR as a basis for BFP canimprove the tool’s utility because CBR has the potential advantage in makingpredictions as well as suggestions compared with other methods. Recent studiesindicate that an ensemble of various techniques has the possibility of improvingthe performance of predictive model. This research focuses on an early investi-gation on predicting business failure using a CBR ensemble (CBRE) forecastingmethod constructed from the use of random similarity functions (RSF), dubbedRSF-based CBRE. Four issues are discussed: (i) the reasons for the use of RSFas the basis in the CBRE forecasting method for BFP; (ii) the means to constructthe RSF-based CBRE forecasting method for BFP; (iii) the empirical test onsensitivity of the RSF-based CBRE to the number of member CBR predictors;and (iv) performance assessment of the ensemble forecasting method. Results ofthe RSF-based CBRE forecasting method were statistically validated by com-paring them with those of multivariate discriminant analysis, logistic regression,single CBR, and a linear support vector machine. The results from Chinese hotelBFP indicate that the RSF-based CBRE forecasting method could significantlyimprove CBR’s upper limit of predictive capability. Copyright © 2011 JohnWiley & Sons, Ltd.

KEY WORDS business failure prediction (BFP); case-based reasoning ensemble;random similarity functions (RSF); nearest neighbor ensemble;hotel failure

INTRODUCTION

Among all techniques for business failure prediction (BFP), such as rough set (McKee,2003), decision tree (McKee and Greenstein, 2000; Gepp et al., 2010), discriminant analysis(Beaver, 1966; Altman, 1968), neural networks (Adya and Collopy, 1998; Setiono et al., 2011;

* Correspondence to: HUI LI, School of Economics and Management, Zhejiang Normal University,PO Box 62, 688YingBinDaDao, Jinhua, Zhejiang 321004, China. E-mail: [email protected]

Copyright © 2011 John Wiley & Sons, Ltd.

H. LI and J. SUN

Borrajo et al., 2011; Wu, 2011; Marcano-Cedeno et al., 2011), minimal optimization technique(Hu and Ansell, 2009), discrete-time duration model (Nam et al., 2008), semi-parametric method(Hwang et al., 2007), and support vector machine (SVM) (Hardle et al., 2009), case-based reasoning(CBR) with k nearest neighbor (kNN) algorithm as its core is receiving increasing attention becauseof the following advantages:

� CBR holds the potential characteristic of making predictions as well as providing suggestions.� CBR with kNN as its core is a nonparametric model with conceptual simplicity.� It can be used even with a small dataset, and users can easily incrementally add new samples or

detrimentally delete old samples.� Its error rate is bounded by twice the Bayes error rate if the size of the training dataset approaches

infinity.� The upper limit of its predictive performance could match or surpass those of sophisticated classi-

fiers or many modern methods in real-world applications, including decision tree, neural network,and support vector machine (Ho and Baird, 1997; Bay, 1998; Okun and Priisalu, 2005).

Ensemble forecasting refers to making predictions on some specific business tasks with a combinationof individual forecasting methods and models from a given dataset. This type of forecasting has thepossibility of achieving improved prediction in terms of stability and accuracy by aggregating resultsof diverse, unstable, and good predictors. The target of improving CBR’s predictive performanceis a continuous focus in the area of CBR with kNN as it core. Aside from the traditional means,ensemble forecasting provides an alternative means to improve CBR’s predictive accuracy in BFP bychanging distance metric or manipulating samples in the training dataset (Dasarathy, 1991). Thisresearch aims to propose a CBR ensemble (CBRE) forecasting method for BFP with the hopeof improving CBR’s upper limit of predictive performance without negatively affecting its advan-tages. This forecasting method combines multiple CBR member predictors with kNN as its core byusing randomly generated similarity functions. Multivariate discriminant analysis (MDA), logisticregression (Logit), single CBR, and a linear support vector machine were used as benchmarks.

This paper is organized as follows. The next section discusses the necessity to base such CBREforecasting method on random similarity functions (RSF). The third section presents the process ofusing RSF in generating member CBR predictors and specifies the RSF-based CBRE forecastingmethod. The fourth section designs an empirical research method, while the fifth section discussesthe results. Finally, the sixth section presents the conclusion.

LITERATURE REVIEW

Previous research studies seldom focused on the issue of CBR ensemble. Several studies investigatedkNN ensembles. They are reviewed as follows. Classical ensemble algorithms that involve a signifi-cant degree of resampling or replication of samples, namely bagging, boosting, and error-correctingoutput coding, might not significantly improve predictive performance of kNN (Breiman, 1996; Baoet al., 2004; Quinlan, 1996) because all these ensemble algorithms are expected to produce diversesample sets, whereas kNN may be a stable classifier for samples (Breiman, 1996). Thus, some otherapproaches should be considered in CBR ensemble with kNN as its core. Ho (1998) integrated arandom subspace method with the algorithm of kNN to generate an ensemble forecasting method.Empirical results show that the ensemble produced superior performance compared to the singlekNN algorithm. Labels of multiple k neighbors were combined instead of outputs of multiple kNNmember predictors. This treatment can be regarded as an ensemble of nearest neighbors from multiple

Copyright © 2011 John Wiley & Sons, Ltd. J. Forecast. (2011)DOI: 10.1002/for

Predicting business failure using an RSF-based CBRE forecasting method

kNN algorithms instead of an ensemble of kNN predictors. Alternatively, Bay (1998) investigated theuse of voting on outputs of member kNN predictors, each of which has access to a random featuresubset. The results showed that the ensemble forecasting method significantly outperformed varioussingle kNN predictors. These two studies are similar in generation of diverse feature subsets for kNNmember predictors, and are different in producing final predictions by ensemble.

Besides the use of randomly generated feature subsets, some other studies investigated the per-formance of kNN ensembles with the generation of diverse feature subsets consisting of selectedoptimal features. Cunningham and Zenobi (2001) presented the ensemble of diverse CBR predictorswith kNN as its core from the perspective of the case representation issue. A hill-climbing strategywas used to select optimal features in the composition of different case representations. They showedfour examples where CBR ensemble based on case representation issue produced better predictiveperformance. A similar approach was conducted on the multiple-view classification area. Okun andPriisalu (2005) used multiple views in an ensemble of kNN algorithms, with results revealing thatthe ensemble provided more promising performance than single-view prediction. The approach ofgenerating multiple views for an ensemble is similar to the random subspace method. The differenceis that the former method uses cross-validation to select features.

Other than the use of sample-oriented and feature-oriented approaches, the use of different sim-ilarity functions could also produce diverse CBR member predictors with kNNs as the cores forensemble forecasting. Bao et al. (2004) presented an approach to combine multiple kNN predictors,each of which has access to one of six similarity functions. The six similarity functions were respec-tively transferred from the six distance functions, namely Euclidean distance function, heterogeneousEuclidean overlap metric, heterogeneous value difference metric, interpolated value difference metric,windowed valued difference metric, and discredited value difference metric. The combination processtook place in the ensemble of labels of multiple nearest neighbors instead of the ensemble of predic-tions of the six kNN predictors. The results demonstrated that the ensemble produced dominatingpredictive performance. Zhou and Yu (2005) integrated the random use of six distance functions withclassical bagging. Empirical results show that this hybrid approach effectively improved accuracy ofthe classical kNN algorithm. Li and Sun (2009a, 2009b), combined multiple-member CBR predic-tors generated by employing four different similarity functions with majority voting and SVM. Theresults indicated that the CBR ensemble forecasting method produced a better performance in termsof an integrated view of mean accuracy, median accuracy, maximum accuracy, minimum accuracy,and standard deviation. However, predictive accuracy of the CBR ensemble was not always betterthan that of the best single CBR predictor from the sole view of mean accuracy with the varying of k.

Thus significance of the research of RSF-based CBRE foresting method is as follows.

� From the above analysis on the use of CBRE and kNN ensemble forecasting methods, four typesof approach that may be potentially useful in construction of CBRE for BFP are discovered,namely: resampling approach, randomly generating feature subset approach, selecting feature sub-set approach, and selecting similarity function approach. However, investigation on whether or notthere are some other alternative approaches is useful. The injection of randomness in predictivemethods and models could help improve diversity among member predictors (Dietterich, 2000).Since the so-called randomness has already been integrated with selection of samples and featuresto respectively be a resampling approach and a randomly generating feature subset approach inconstruction of CBRE with kNN as the core, investigation on integrating randomness with similar-ity function of CBR is valuable. Thus the investigation on RSF-based CBRE forecasting method isuseful from the perspective of the CBR area.


H. LI and J. SUN

� Most of the previous research studies of CBR ensemble are conducted outside the area of businessforecasting, except for two by Li and Sun (2009a,b). CBR with kNN as its core is known as one ofthe chief methods in business forecasting, and this method has already been used to forecast busi-ness failure (Jo and Han, 1996; Jo et al., 1997; Bryant, 1997; Park and Han, 2002; Ahn and Kim,2009; Li and Ho, 2009; Lin et al., 2009; Li and Sun, 2010). As a result, it is important and usefulto analyze how CBR’s upper limit of predictive performance can be improved. Thus it is necessaryto investigate whether the combination of randomness and CBR with kNN as its core can helpinvolved people forecast business failure more effectively. Thus the investigation on RSF-basedCBRE is useful from the perspective of the business forecasting area.

RSF-BASED CBRE FORECASTING METHOD FOR BFP

Mechanism of using RSF to generate member CBR predictorsCBR with kNN as its core belongs to lazy learning, which means that no separate training phaseexists. All computations take place in the forecasting phase. When a target sample, i.e. a target case,is given for forecasting, CBR with kNN as its core will identify several neighboring historical casesfrom case base, i.e. training dataset, and then use majority voting to produce the forecast. The keyissue of retrieving neighboring cases is based on the calculation of similarity between a pair of cases.The classical CBR algorithm with kNN as its core is founded on the similarity function transferredfrom the Euclidean distance. This mechanism of similarity measure can be illustrated as follows(Li and Sun, 2010):

Similarity.c1, c2/D 1

1CEuclidean.c1 ,c2/

where

Euclidean.c1, c2/Dpjx1,1 � x2,1j2C jx1,2 � x2,2j2C : : :C jx1,n � x2,nj2

(1)

where c1 and c2 represent two cases described by n-dimensional continuous feature vectors, and xexpresses a feature value. The Euclidean distance is a special case of the Minkowski distance withthe order set as 2. Similarity functions transferred from the Minkowski distance can be presented asfollows:

Similarity.c1, c2/D 1

1CMinkowski.c1 ,c2/

where

Minkowski.c1, c2/D pi

pjx1,1 � x2,1jpi C jx1,2 � x2,2jpi C : : :C jx1,n � x2,njpi

(2)

where p 2 .0,1/ is a turning parameter. Various similarity functions can be obtained by settingdifferent values of pi. The smaller the value of pi is, the more robust to data variations the distancemetric becomes. In contrast, the bigger the value of pi is, the more sensitive to data variations thedistance metric becomes.

It is a feasible method to inject randomness into the algorithm of CBR with kNN as its core inorder to produce diverse member CBR predictors. The process for the use of randomness to producediverse member CBR predictors is as follows. A random generator is combined with the parameterof pi; as a result of this treatment, the so-called random similarity functions, each of which is usedto generate a member CBR predictor, can be obtained. All the similarity functions comprise the can-didate function set, from which a group of similarity functions will be randomly used to generatemember CBR predictors.



Framework of the RSF-based CBRE forecasting method for BFPThe frame of the RSF-based CBRE, illustrated in Figure 1, consists of four steps, namely ini-

tialization, RSF, generation of member CBR predictors, and ensemble. The detailed mechanismof RSF-based CBRE is as follows. First, the CBRE randomly inputs several similarity functions.Secondly, the CBRE uses each similarity function to generate a member CBR predictor, which willproduce k nearest neighbors independently from the same case base. Third, member CBR predictorsindependently combine outputs of their k nearest neighbors and determine the class of target casebased on voting. Finally, the CBRE integrates outputs of each member CBR predictors and producesthe final forecasting on business state of target sample company.

This step consists of case representation and data preprocessing, both of which aim to make mem-ber CBR predictors produce good performances. Good predictors should be used in an ensemble(Kuncheva, 2004). The initialization step ensures that good member CBR predictors are involved. Ifall member CBR predictors are poor in predictive performance, the ensemble of poor CBR membersmight not yield results that can raise the upper limit of predictive performance. For example, if allmember CBR predictors are more likely to be incorrect on the average, then the ensemble of suchmembers could possibly yield a worse predictive performance. Case representation is a key issue

Figure 1. RSF-based CBRE forecasting method for BFP


H. LI and J. SUN

in the CBR method. It refers to the process of representing cases using significant features. Somestatistical approaches, such as stepwise MDA, stepwise Logit, t-test and ANOVA, and expertise-based approaches, are commonly used to find the so-called significant features. CBR with kNN as itscore is sensitive to features. Thus irrelevant features could decrease the predictive performance of asingle CBR predictor. Significant features which can make a member CBR predictor produce goodperformance should be employed.

CBR with kNN as its core has some weaknesses, which should be considered in the initializationstep. (i) if one of the features has a relatively large range, then this feature will overpower the otherfeatures. This means that the feature with a large range will be automatically weighted fairly morethan the other features in the calculation of similarity. Features with large ranges are not the same asfeatures with high significant levels. This weakness could make CBR produce poor predictive accu-racies. Therefore a data-preprocessing process is needed to normalize all features by dividing thedistance for each feature by the range of that feature. This mechanism is illustrated as follows:

jx1,j � x2,j j Djx1,j � x2,j j

range.xj /D

jx1,j � x2,j j

max.xj /�min.xj /(3)

where xi ,j refers to the value of the i th case on the j th feature, and max(xj ) and min(xj ) express themaximum and minimum values of the j th feature, respectively. (ii) CBR based on Minkowski dis-tance is not good at handling categorical features, which is another weakness of CBR. The treatmentof making CBR with kNN as its core effective in dealing with categorical features is not necessaryfor this research since we employ continuous features. However, if one attempts to apply the fore-casting method to solve other problems with significant categorical features, such revisions should beincluded in the initialization step.

Random similarity functions (RSF) stepThe RSF step is to ensure that good and diverse member CBR predictors are involved. Diversityamong member CBR predictors is important for an ensemble. If all member CBR predictions providethe same results, it is not possible to correct a mistake by combining these predictors. In comparison,if each member CBR predictor makes different errors in the same cases, then it is possible for anensemble of these members to reduce the total error rate of a forecasting system. The two most pop-ular methods ensuring diversity among member predictors are the use of different training datasets,such as bagging, boosting, and the use of different parameters. This research is conducted to use thethird method by employing different pi values of the Minkowski distance. As a result, different simi-larity functions are used. Randomness is injected into the generation of different parameters in orderto ensure that diversity indeed exists among member CBR predictors. The hybrid use of differentparameters and randomness fulfills our aim.

The generation of RSF relies on a stochastic process which randomly selects some of the givencandidate similarity functions in constructing each member CBR predictor. In the case of CBR withkNN as its core, only the selected similarity functions have nonzero contributions to the output ofCBRE when a target case company is compared with historical cases in the same case base. Eachtime a random similarity function is selected, a new set of k nearest neighbors is retrieved. The knearest neighbors in each selection are then integrated by voting on class membership of the targetcase. This step can be illustrated as follows. Formally, given a set of P similarity functions:

f.sim1, sim2, : : : , simP /jsimp is a similarity functiong (4)



All these similarity functions are derived from formula (2) by varying the value of pi continuously.Randomly selected similarity functions are thus considered as follows:

f.sim1, sim2, : : : , simT /jsimt is a similarity function randomly selected from simp and T << P g(5)

The RSF step is illustrated in Figure 2.

Step of generating member CBR predictorsFor each target case (i.e. a sample company), multiple k nearest neighbors can be retrieved from thesame case base according to each of these RSF. Each member CBR predictor is constructed on thebase of one of these randomly selected similarity functions. Class labels of k nearest neighbors in eachmember CBR predictor are used to assign the class of a target case, i.e. business state. The randomsubset of similarity functions is selected by choosing from the original similarity functions. Selectionwith replacement, which means that a similarity function can be randomly chosen more than once, isused. Some member CBR predictors may be the same. If a similarity function is selected more thanonce—twice, for instance—then the weight of the CBR predictor based on the similarity function isconsidered to be twice that of member CBR predictors based on similarity functions selected onlyonce. However, selection of a similarity function more than once does not usually happen, since pi isa continuous variable.

The stochastic procedure of randomness is to introduce independence among all member CBRpredictors. Thus the discriminative power of the whole CBR system could be strengthened by com-bining the decisions of member CBR predictors. By randomly selecting member CBR predictors

Figure 2. RSF method


H. LI and J. SUN

based on diverse similarity functions, a certain independence among all member CBR predictors isintroduced. The upper limit of decision power could be improved by the ensemble of individual deci-sions of member predictors (Ho, 1998). This mechanism of making member CBR predictions is easilyinterpretable without adding much complexity into CBR.

EnsembleThe dynamics of voting among a set of member CBR predictors holds the possibility of making theCBRE produce a more dominant performance by combining some good (not the best) and diversemember CBR predictors. The CBRE as a whole hopefully achieves high accuracy if its member CBRpredictors make different errors. Randomly selecting different similarity functions is an attempt tomake member CBR predictors produce different and uncorrelated errors. The scheme of making aprediction with the CBRE is illustrated in Figure 3.

The voting mechanism is also easily interpretable, since this treatment is one of the most fre-quently used methods in a real society. As a result, the ensemble step does not add much complexityinto CBR. Thus the RSF-based CBRE method is constructed without making the CBR method toocomplex. The advantages of classical CBR are, as a result, inherited.

Parameters of RSF-based CBREExcept for different similarity functions, each of the member CBR predictors has access to the samefeature set and samples (i.e. case base). The RSF-based CBR forecasting method has four parame-ters: the number of nearest neighbors (i.e. k/, the total number of member CBR predictors (i.e. T /,the maximum value of pi, and the total number of candidate similarity functions (i.e. P /. Since piis a continuous variable, P ! 1 no matter what value is set for pi. All the first three parameterscan be empirically investigated for the appropriate ranges, or they can also be determined by exper-tise, which is derived from many empirical studies. They can also be set as common values. In thisresearch, k is set as a common value of 5; the maximum value of pi is set as 3. T is respectively setas 3, 5, 7 to investigate sensitivity of the RSF-based CBR to the number of member predictors.

Figure 3. Forecasting with the RSF-based CBRE



M2-E-R model of RSF-based CBR ensembleThe R4 model of CBR (Aamodt and Plaza, 1994) greatly helped CBR to be widely applied in var-ious domains. However, the R4 model illustrates a single CBR. This work helps to encourage CBRresearch into the new stage of CBR ensemble. Thus the R3 model can be revised as an ensembleversion, i.e. the M2-E-R model consisting of multi-retrievals, multi-reuses, ensemble, and retrain,which is illustrated in Figure 4.

‘Multiple case retrievals’ refers to the process of retrieving a set of k nearest neighbors indepen-dently by member CBR predictors. All retrievals are based on randomly selected similar functionsin the RSF step. ‘Multiple case reuses’ refer to the process of making a prediction in each memberCBR predictor with its own k nearest neighbors. ‘Ensemble’ refers to the process of making the finalforecasting by integrating outputs of the T member CBR predictors. ‘Retain’ refers to the process ofadding new cases to the case base if necessary, which is the same as that of the traditional R4 model.The R of revision can be included in the new life cycle model if the CBR ensemble aims to solveproblems from other domains instead of forecasting.

Figure 4. M2-E-R model

AlgorithmThe pseudo-code of RSF-based CBRE can be presented as follows:

RSF-based CBRE (TC, CB, Y , k, T , PI)Input: TC: target Sample Company (business state to be predicted)

CB: case baseY : label of business state {y1 D�1, y2= 1}k: number of nearest neighborsT : number of member CBR predictorsPI: maximum value of pi


H. LI and J. SUN

Process:For t 2 {1 . . . T } do

// initialize the case base by case representation and data preprocessingCBt Init iali´e.CB/;// generate a similarity simp by randomly generate a value in (0, PI]simp RandomSet ((0, PI]);// retrieve k nearest neighbors of target sample company from case base with random similarityfunctionZ NearestNeighbors (TC, CBt , k, simp/;// predict label of target sample company by simply voting among nearest neighbors in CBRmembert� sign(˙.Y (x2Z//);

End for// predict label of target sample company by voting among all member CBR predictorsTC.label sign(˙.t� 2 Y //;Output: TC.label2{-1, +1}

EMPIRICAL RESEARCH

This empirical research has two objectives: (i) investigation on sensitivity of the RSF-based CBREforecasting method to the number of member CBR predictors; and (ii) investigation on whether theRSF-based CBRE forecasting method can help people make more precise decisions.

Data and variablesThe so-called business failure in this research refers to economic failure of a company, which meansthat business expenses exceed revenues. Chinese listed companies that had negative income in twoconsecutive years were regarded as failed samples. Companies that never had negative income in twoconsecutive years were regarded as non-failed samples. Accounting information from their publishedstatements was collected and calculated to represent sample cases. In order to make a prediction onwhether a company will fail in the next two or more years with the annual returns for last year, wecollected the dataset by the following means. Assume that a company fails in the t . Then the annualreturns of the year t � 3 of the company are used to express the sample. Financial ratios of t � 3,namely, in advance of the 3-year period, are predictive. By modeling on the dataset of year t � 3,ex ante predictions can be made; 154 failed and 159 and non-failed samples from the hotel industrywere obtained for Chinese BFP. The five ratios suggested by Altman and Hotchkiss (2005) were usedto represent each sample. The five ratios include: working capital to total assets, retained earnings tototal assets, earnings before interest and tax to total assets, total debts to market capitalization, andsales to total assets. This ratio set is one of the most famous and widely used in the area of BFP.

Parameter settingPredictive performances of single CBR predictors with values of pi in the range of (0,30] in stepsof 0.25 are to be analyzed with hotel BFP data to fulfill the first aim. MDA, Logit, single CBR, andSVM are to be employed as benchmarks in order to implement the second aim. Single CBR has oneparameter, namely the number of nearest neighbors. It is set as five, which is the same as the settingof RSF-based CBRE. For SVM, a linear SVM is employed since this type of SVM is easily imple-mented, not very sensitive to the parameter of C , and can produce a good predictive performance.



C was set as 64. Leave-one-out cross-validation was employed to produce predictive accuracies ofCBR predictors with different values of pi in order to investigate the sensitivity of CBR to pi. Predic-tive performances of CBR predictors are illustrated in Figure 5. The figure shows that the predictive

0 5 10 15 20 25 300.875

0.88

0.885

0.89

0.895

0.9

0.905

0.91

0.915

0.92

4*pi

Acc

urac

y

Figure 5. Accuracy of CBR predictors with k D 5

Figure 6. Datasets for modeling and model performance assessment


H. LI and J. SUN

accuracies of CBR predictors are generally in the trend of decrease with pi becoming larger. Thus itis reasonable to set the maximum value of pi as a small number, e.g. 3.

Hypotheses and test for performance evaluationNull hypotheses are as follows:

H1: There is no significant difference in predictive performances of RSF-based CBRE and MDA.H2: There is no significant difference in predictive performances of RSF-based CBRE and Logit.H3: There is no significant difference in predictive performances of RSF-based CBRE and CBR.H4: There is no significant difference in predictive performances of RSF-based CBRE and SVM.

Paired-samples t-test was employed to test the hypotheses. The whole dataset was split 100 times formodel training and evaluating, as is shown in Figure 6.

RESULTS AND DISCUSSION

Statistical results, namely mean accuracy, median accuracy, standard deviation (SD), minimum accu-racy, maximum accuracy, and range of the 100 predictions on the 30% datasets with T respectivelyset as 3, 5, and 7 are listed in Tables I–III. They are illustrated more clearly in Figures 7–9. Theindices of the best times in Tables I–III refer to the times a forecasting method achieved the best ratioof statistics among the five methods. Best ratios are indicated in bold type.

Tables I–III show that the RSF-based CBRE produced the best mean accuracies, namely 88.47%,88.82%, and 88.46%, when T is respectively set as 3, 5, and 7. The hit ratio of RSF-based CBRE isbetter than those of MDA, Logit, CBR, and the linear SVM in absolute value, respectively, by 5.08,

Table I. Statistics of predictive performance of comparative methods when T is set as 3

Method Mean Median SD Minimum Maximum Range Besttimes

Comparative methods MDA 83.39 83.87 3.83 69.89 91.40 21.51 0Logit 83.87 83.87 3.24 75.27 90.32 15.05 0CBR 88.04 88.17 3.11 80.65 95.70 15.05 2

Linear SVM 85.80 86.02 2.82 78.49 93.55 15.05 1

CBR ensemble RSF-based CBRE 88.47 89.25 3.27 79.57 93.55 13.98 3

Table II. Statistics of predictive performance of comparative methods when T is set as 5



Linear SVM 85.78 86.02 3.13 78.49 92.47 13.98 2




Table III. Statistics of predictive performance of comparative methods when T is set as 7



Linear SVM 85.84 86.02 2.82 78.49 91.40 12.90 2


MDA

Logit

CBR

SVM

CBRE

Mean accuracy

MDA

Logit

CBR

SVM

CBRE

Minimum accuracy

MDA

Logit

CBR

SVM

CBRE

Median accuracy

MDA

Logit

CBR

SVM

CBRE

S.D.

MDA

Logit

CBR

SVM

CBRE

Maximum accuracy

80.00 82.00 84.00 86.00 88.00 90.00 60.00 65.00 70.00 75.00 80.00 85.00

80.00 82.00 84.00 86.00 88.00 90.00 0.00 1.00 2.00 3.00 4.00 5.00

86.00 88.00 90.00 92.00 94.00 96.00 98.00 0.00 5.00 10.00 15.00 20.00 25.00

MDA

Logit

CBR

SVM

CBRE

Range

Figure 7. Predictive performance of comparative methods when T is set as 3

4.6, 0.43, and 2.67 when T = 3, by 5.64, 4.8, 0.53, and 3.04 when T = 5, and by 5.05, 4.16, 0.6, and2.62 when T = 7. This means that forecasting error rates of MDA, Logit, CBR, and the linear SVMare worse than that of RSF-based CBRE, respectively, by 43.80%, 39.90%, 3.73%, and 23.16% whenT = 3, by 50.45%, 42.93%, 4.74%, and 27.19% when T = 5, and by 43.76%, 36.05%, 5.20%, and22.70% when T = 7. It is also evident that RSF-based CBRE obtained the best performances in termsof the six statistics. This result indicates that, on the whole, RSF-based CBRE could produce betterpredictive performance than the four comparative methods.


H. LI and J. SUN

MDA

Logit

CBR

SVM

CBRE

Mean accuracy

MDA

Logit

CBR

SVM

CBRE

Minimum accuracy

MDA

Logit

CBR

SVM

CBRE

Median accuracy

MDA

Logit

CBR

SVM

CBRE

S.D.

MDA

Logit

CBR

SVM

CBRE

Maximum accuracy

80.00 82.00 84.00 86.00 88.00 90.00 68.00 70.00 72.00 74.00 76.00 78.00 80.00 82.00

80.00 82.00 84.00 86.00 88.00 90.00 0.00 1.00 2.00 3.00 4.00 5.00

88.00 90.00 92.00 94.00 96.00 98.00 0.00 5.00 10.00 15.00 20.00

MDA

Logit

CBR

SVM

CBRE

Range


Results of the significance test to verify the four hypotheses are presented in Table IV. This tableshows that all four hypotheses, from H1 to H4, in the null forms are rejected no matter what valueis set for T . This result means that RSF-based CBRE produced significantly different performancethan MDA, Logit, CBR, and the linear SVM. By combining predictive accuracy and the significancetest, it can be concluded that the RSF-based CBRE produced significantly better performance than thefour compared methods statistically. The significant levels are all 1%. This provides some evidencethat RSF-based CBRE is very useful in Chinese hotel BFP.

In summary, we achieved the objective of improving CBR’s upper limit of predictive performancein BFP by injecting randomness into similarity functions. The improved forecasting performance isensured by the use of the initialization step to produce good member CBR predictors, the RSF step toproduce diverse member CBR predictors, empirical investigation on sensitivity of member CBR pre-dictors to ensure that good member predictors are generated, and voting for the ensemble of memberCBR predictors. The RSF-based CBRE does not involve much complexity to improve the upper limitof predictive performance of CBR. The characteristics and advantages of the CBR system are kept.



MDA

Logit

CBR

SVM

CBRE

Mean accuracy

MDA

Logit

CBR

SVM

CBRE

Minimum accuracy

MDA

Logit

CBR

SVM

CBRE

Median accuracy

MDA

Logit

CBR

SVM

CBRE

S.D.

MDA

Logit

CBR

SVM

CBRE

Maximum accuracy

80.00 82.00 84.00 86.00 88.00 90.00 68.00 70.00 72.00 74.00 76.00 78.00 80.00 82.00

80.00 82.00 84.00 86.00 88.00 90.00 0.00 1.00 2.00 3.00 4.00 5.00

86.00 88.00 90.00 92.00 94.00 96.00 98.00 0.00 5.00 10.00 15.00 20.00

MDA

Logit

CBR

SVM

CBRE

Range


CONCLUSION

This research concludes that the RSF-based CBRE forecasting method has the capability of produc-ing a more dominant performance than MDA, Logit, CBR, and a linear SVM in Chinese hotel BFP.In terms of the significance test, RSF-based CBRE has the capability of producing significantly betterperformance than comparative methods. Datasets were collected from Chinese listed companies witha hotel business. The effectiveness of using the integration of randomness with similarity calculationsto generate diverse and good member CBR predictors for CBR ensemble was demonstrated. Thecomponents and procedures of the RSF-based CBRE ensure that good member CBR predictors areused. As a result, good and diverse member CBR predictors are used to compose the ensemble fore-casting method by voting. The structure of RSF-based CBRE is easily interpretable, which enablesit to inherit the advantages of single CBR and possibly improve the upper limit of predictive perfor-mance as well. Further studies are suggested to apply the RSF-based CBRE forecasting method insome other areas.


H. LI and J. SUN

Table IV. Results of significance test by two-tailed paired-samples t-test

T Method Mean t-statistic and p-value Significance level Hypothesis test

3 RSF-based CBRE 88.47 10.483 (0.000)*** 1% Reject H1MDA 83.39RSF-based CBRE 88.47 11.160 (0.000)*** 1% Reject H2Logit 83.87RSF-based CBRE 88.47 2.887 (0.005)*** 1% Reject H3CBR 88.04RSF-based CBRE 88.47 7.807 (0.000)*** 1% Reject H4Linear SVM 85.80



ACKNOWLEDGEMENTS

This research is partially supported by the National Natural Science Foundation of China (No.71171179) and the Zhejiang Provincial Natural Science Foundation of China (No. Y7100008). Theauthors gratefully thank the editor-in-chief and the associate editor of the Journal of Forecasting fortheir work.

REFERENCES

Aamodt A, Plaza E. 1994. Case-based reasoning: foundational issues, methodological variations, and systemapproach. AI Communications 7(1): 39–59.

Adya M, Collopy F. 1998. How effective are neural networks at forecasting and prediction? A review andevaluation. Journal of Forecasting 17(5–6): 481–495.

Ahn H, Kim K-J. 2009. Bankruptcy prediction modeling with hybrid case-based reasoning and genetic algorithmsapproach. Applied Soft Computing 9(2): 599–607.

Altman EI. 1968. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal ofFinance 23(4): 589–609.

Altman EI, Hotchkiss E. 2005. Corporate Financial Distress and Bankruptcy: Predict and Avoid Bankruptcy,Analyze and Invest in Distressed Debt (3rd) edn. Wiley: Hoboken, NJ.



Bao Y, Ishii N, Du X. 2004. Combining multiple k-nearest neighbor classifiers using different distance functions.In Proceedings of the Fifth International Conference on Intelligent Data Engineering and Automated Learning,Vol. 3177, Yang Z, Everson R, Yin H (eds), LNCS. Springer: Berlin; 634–641.

Bay SD. 1998. Combining nearest neighbor classifiers through multiple feature subsets. In Proceedings of theFifteenth International Conference on Machine Learning, Shavlik J Morgan Kaufmann: (ed.), Madison, WI;37–45.

Beaver W. 1966. Financial ratios as predictors of failure. Journal of Accounting Research 4: 71–111.Borrajo ML, Bruno B, Corchado E, Bajo J, Corchado JM. 2011. Hybrid neural intelligent system to predict business

failure in small-to-medium-size enterprises. International Journal of Neural Systems 21(4): 277–296.Breiman L. 1996. Bagging predictors. Machine Learning 24: 123–140.Bryant SM. 1997. A case-based reasoning approach to bankruptcy prediction modeling. Intelligent Systems in

Accounting, Finance and Management 6: 195–214.Cunningham P, Zenobi G. 2001. Case representation issues for case-based reasoning from ensemble research.

In Case-Based Reasoning Research and Development: Proceedings of the Fourth International Conference onCase-Based Reasoning; 146–157.

Dasarathy BV. 1991. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE ComputerSociety Press: Los Alamitos, CA.

Dietterich TG. 2000. An experimental comparison of three methods for constructing ensembles of decision trees:bagging, boosting, and randomization. Machine Learning 40(2): 139–157.

Gepp A, Kumar K, Bhattacharya S. 2010. Business failure prediction using decision trees. Journal of Forecasting29: 536–555.

Hardle W, Lee Y-J, Schäfer D, Yeh Y-R. 2009. Variable selection and oversampling in the use of smooth supportvector machines for predicting the default risk of companies. Journal of Forecasting 28(6): 512–534.

Ho TK. 1998. Nearest neighbours in random subspaces. In Proceedings of 2nd International Workshop on Statisti-cal Techniques in Pattern Recognition, Amin A, Dori D, Puil P, Freeman H (eds). Springer: Sydney, Australia;640–648.

Ho TK, Baird HS. 1997. Large-scale simulation studies in image pattern recognition. IEEE Transactions on PatternAnalysis and Machine Intelligence 19(10): 1067–1079.

Hu Y, Ansell J. 2009. Retail default prediction by using sequential minimal optimization technique. Journal ofForecasting 28(8): 651–666.

Hwang R, Cheng K, Lee J. 2007. A semiparametric method for predicting bankruptcy. Journal of Forecasting 26(5):317–342.

Jo H, Han I. 1996. Integration of case-based forecasting, neural network, and discriminant analysis for bankruptcyprediction. Expert Systems with Applications 11(4): 415–422.

Jo H, Han I, Lee H. 1997. Bankruptcy prediction using case-based reasoning, neural network and discriminantanalysis for bankruptcy prediction. Expert Systems with Applications 13: 97–108.

Kuncheva LI. 2004. Combining Pattern Classifiers: Methods and Algorithms. Wiley: Chichester.Li H, Sun J. 2009a. Majority voting combination of multiple case-based reasoning of financial distress prediction.

Expert Systems with Applications 36(3): 4363–4373.Li H, Sun J. 2009b. Predicting business failure using multiple case-based reasoning combined with support vector

machine. Expert Systems with Applications 36(6): 10085–10096.Li H, Sun J. 2010. Forecasting business failure in China using case-based reasoning with hybrid case respresenta-

tion. Journal of Forecasting 29(5): 486–501.Li S-T, Ho H-F. 2009. Predicting financial activity with evolutionary fuzzy case-based reasoning. Expert Systems

with Applications 36(1): 411–422.Lin R-H, Wang Y-T, Wu C-H, Chuang C-L. 2009. Developing a business failure prediction model via RST, GRA

and CBR. Expert Systems with Applications 36(2): 1593–1600.Marcano-Cedeno A, Marin-De-La-Barcena A, Jimenez-Trillo J, Pinuela JA, Andina D. 2011. Artificial metaplas-

ticity neural network applied to credit scoring. International Journal of Neural Systems 21(4): 311–317.McKee TE. 2003. Rough sets bankruptcy prediction models versus auditor signaling rates. Journal of Forecasting

22(8): 569–586.McKee TE, Greenstein M. 2000. Predicting bankruptcy using recursive partitioning and a realistically proportioned

data set. Journal of Forecasting 19(3): 219–230.Nam CW, Kim TS, Park NJ, Lee HK. 2008. Bankruptcy prediction using a discrete-time duration model

incorporating temporal and macroeconomic dependencies. Journal of Forecasting 27(6): 493–506.


H. LI and J. SUN

Okun O, Priisalu H. 2005. Multiple views in ensembles of nearest neighbor classifiers. In Proceedings of theWorkshop on Learning with Multiple Views (22nd ICML): Bonn, Germany.

Park C, Han I. 2002. A case-based reasoning with the feature weights derived by analytic hierarchy process forbankruptcy prediction. Expert Systems with Applications 23(3): 255–264.

Quinlan JR. 1996. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on ArtificialIntelligence; 725–730.

Setiono R, Baesens B, Mues C. 2011. Rule extraction from minimal neural networks for credit card screening.International Journal of Neural Systems 21(4): 265–276.

Wu W-W. 2011. Improving classification accuracy and causal knowledge for better credit decisions. InternationalJournal of Neural Systems 21(4): 297–309.

Zhou Z-H, Yu Y. 2005. Adapting bagging to nearest neighbor classifiers. Journal of Computer Science andTechnology 20: 48–54.

Authors’ biographies:Hui Li received the award of outstanding young talent of Zhejiang Province in 2010 and is an associate professorat Zhejiang Normal University, China. He received his BS, MS, and PhD degrees from Harbin Institute of Tech-nology. He is a young researcher of the World Federation on Soft Computing and a member of the Associationfor Information Systems. He has co-authored over 50 papers in premium, leading and reputable journals and con-ferences, including African Journal of Business Management, Applied Soft Computing, Computers and OperationsResearch, Expert Systems with Applications, European Journal of Operational Research, Information and Manage-ment, Information Sciences, Journal of Forecasting, Knowledge-Based Systems, and Tourism Management, amongothers. He serves as a principal investigator of several national funded research projects, including: the NationalNatural Science Foundation of China and the Zhejiang Provincial Natural Science Foundation of China. His pri-mary interests include business forecasting, business computing, case-based reasoning, and business intelligence.He received the Science Research Award of Zhejiang Provincial Universities in 2011, and was listed in MarquisWho’s Who in the World 2011.

Jie Sun who received all her BS, MS, and PhD degrees from Harbin Institute of Technology, is the chair of theDepartment of Accounting in Zhejiang Normal University, China. She has co-authored around 40 papers in somesignificant journals and conferences. She has also led two projects by the National Natural Science Foundation ofChina and the Zhejiang Provincial Natural Science Foundation of China. Her primary interests include financial riskmanagement, accounting information systems, and financial forecasting, among others. She received the ScienceResearch Award of Zhejiang Provincial Universities, the Social Science Research Award of Jinhua City in 2010,and was listed in Marquis Who’s Who in the World 2011.

Authors’ addresses:Hui Li and Jie Sun, School of Economics and Management, Zhejiang Normal University, PO Box 62, 688YingBinDaDao, Jinhua, Zhejiang 321004, China.


predicting business failure using an rsf-based case-based reasoning ensemble forecasting method

Documents