From: Proceedings of the Third International Conference on Multistrategy Learning. Copyright © 1996, AAAI (www.aaai.org). All rights reserved.


Forecasting of Options in the Car Industry

Using a Multistrategy Approach

Stefan Ohl

Daimler-Benz AG, Research and Technology, F3S/E, P.O. Box 2360

D-89013 Ulm, Germany
ohl@dbag.ulm.DaimlerBenz.com

Abstract

In recent years, car options like air-conditioning, automatic gears, car stereo, power windows, sunroof etc. have become more and more important for car manufacturers. Especially at those car manufacturers which offer cars individually according to customer requirements, the options affect 30% - 40% of the parts by now. Therefore, very detailed planning and forecasting of options becomes more and more important. Because customer behavior concerning the options varies from model to model and from country to country, it seems necessary and reasonable to forecast each option for each model and each country separately. The resulting huge number of data sets requires an automatic forecasting tool that adapts itself to the actual data sets and that requires almost no user interaction. Because the quality of the forecasting results varies a lot depending on the characteristics of a time series, and because "N heads are better than one," the basic idea is to select in a first step the most appropriate forecasting procedures. This selection is done by a decision tree which is generated by using a symbolic machine learning algorithm. The selected forecasting methods produce different results that are combined in a second step to get a common forecast. This approach integrates univariate time series methods used in the first step for running the prediction, symbolic machine learning algorithms for generating the decision trees, and multivariate statistical methods and neural networks used in the combination step.

Introduction

Relevance of Options and Extras

In recent years, options of a car like air-conditioning, automatic gears, car stereo, power windows, sunroof etc. have become more and more important for car manufacturers (Dichtl et al., 1983). On the one hand, big business around the options and extras arises, but on the other hand, the huge number of extras offered optionally causes high costs as well. Mostly, these enormous costs result from the diversity of different parts and production processes that are influenced by the different options (Brändli et al., 1994; Brändli and Dumschat, 1995). In particular, at car manufacturers that offer cars individually according to customer requirements, the options affect 30% - 40% of the parts by now. Therefore, very detailed planning and forecasting of options becomes more and more important.

Situation at Mercedes-Benz

In the case investigated in this paper, the car manufacturer Mercedes-Benz offers about 400 different options for more than 100 different models structured into 4 different classes (C-, E-, S-, and SL-class) sold in more than 250 countries. On average, each customer orders about 10 - 20 options per car. Based on almost 600,000 produced cars per year, on average, only 1.4 cars are identical (Hirzel, 1995). That means there exist only a few identical cars, and in most cases these cars belong to bulk buyers like rental car companies.

Recently, even an increase in the variety of models and options is expected, since new classes never offered before by Mercedes-Benz will be presented (Stegemann, 1995). Moreover, already existing classes will be extended by additional models, and completely new options like automatic windshield wipers or different kinds of navigation systems will be sold.

Analysis of Customer Behavior: Cumulated vs. Single Data

Ad Hoc Analysis

Since some options are not available for each class and because identical options differ quite often from class to class (e.g. the air conditioning of the C-class is different from the air conditioning of the E-class), it is necessary to differentiate extras for each single class. With regard to monthly


Fig. 1: Time series of single options (absolute monthly numbers of option X, cumulated over all models of the C-class and all countries).

cumulated data for about two years (e.g. all automatic gears sold per month in the C-class), cumulated time series of single options that represent absolute selling figures do not contain any systematic pattern, as shown in figure 1. Therefore, it seems to be very difficult to forecast such kinds of time series as monthly data to a prediction horizon of 12 or even more months.

Utilizing Descriptive Statistics

However, looking at the data in a more detailed way using descriptive statistics, some systematic patterns may be discovered. Analyzing the distribution of single options over the different models and different countries where these options are sold reveals a lot of differences concerning the installation rate of an option relative to the total number of cars sold. For the options air conditioning (AC), automatic gears (AG), power windows (PW), heated seats (HS), sunroof (SR), and traction system (TS), this analysis for the Mercedes C-class (Germany) and for the model C 220 (world wide), respectively, is presented in figure 2.

As shown in the left part of figure 2, for example, if more models C 280 and fewer models C 180 are produced in comparison to the previous month, the cumulative absolute number of air conditionings will increase, whereas the installation rate of air conditioning for each model will not fluctuate unsystematically a lot. Similarly, if in the current month more cars of the type C 220 are produced for Switzerland (CH) and fewer for Sweden (S) in comparison to the previous month, the cumulative absolute number of traction systems will increase and the cumulative installation rate of heated seats will decrease, whereas the single installation rate per model and country for these options will not vary randomly a lot (see right part of figure 2).

Based on this analysis, we detected that customer behavior concerning options varies a lot from model to model and from country to country (Ohl, 1995). Therefore, it seems reasonable to forecast and plan each single option separately as an installation rate for each model and each country where the option is available.

The less systematic kind of time series mentioned in the previous subsection on ad hoc analysis and presented in figure 1 is mainly due to the monthly shifting of cars from one country to another or from one model to another as described before. Whereas the total number of cars produced per month is more or less stable, the number of one single model produced for one single country may vary much more (Hirzel and Ohl, 1995).

Fig. 2: Installation rates of different options at different models (Mercedes C-class, Germany) and different countries (Mercedes C 220, world wide).


But this variation from one month to another concerning one country or one model is in most cases planned and therefore well known, because there are, for instance, seasonal effects varying from one market to another or from one model to another. During summer time, for instance, the number of convertibles sold is much larger than during winter time, and in markets like North America the selling figures increase when the first cars of a new model year are presented and sold in fall.

In contrast to the cumulative time series regarded in the previous subsection, analyzing single time series that represent not the absolute number but the installation rate of an option per model and country shows much more regular and systematic behavior, as illustrated in figure 3.

Obviously, forecasting the single options as installation rates per model and country and multiplying these installation rates with the more or less well known planned absolute numbers of cars to be sold per model and country will yield better results than forecasting the absolute selling figures of options. This idea is illustrated in figure 3, where the time series of figure 1 is shown as the result of forecasting the single installation rates per model and country multiplied by the planned absolute number of cars as described before.
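The arithmetic behind this idea can be sketched in a few lines; the function name and the figures below are purely illustrative, not data from the paper:

```python
def absolute_option_forecast(rate_forecasts, planned_cars):
    """rate_forecasts: {(model, country): forecast installation rate in [0, 1]}
    planned_cars:   {(model, country): planned number of cars}
    Returns the cumulated absolute number of the option across all
    model/country combinations."""
    return sum(rate_forecasts[key] * planned_cars[key] for key in rate_forecasts)

# Hypothetical example: installation rates of one option for two
# model/country combinations, together with planned production volumes.
rates = {("C 220", "Germany"): 0.25, ("C 220", "Italy"): 0.5}
cars = {("C 220", "Germany"): 4000, ("C 220", "Italy"): 2000}
print(absolute_option_forecast(rates, cars))  # 0.25*4000 + 0.5*2000 = 2000.0
```

Summing these products over all models and countries reproduces the cumulated absolute time series of figure 1 from the smoother installation-rate series.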

A Multistrategy Forecasting Approach

The resulting huge number of data sets requires an automatic forecasting and planning tool that adapts itself to the current data sets and that requires almost no user interaction. Another requirement concerning such a tool is performance, i.e. algorithms should not be too complex, to avoid long computation times for doing the forecasts. The reason is that the forecasted selling figures of options are needed in a fixed company-wide planning timetable for further steps of the total planning process, such as planning the capacities of the different plants or planning the disposition of materials and parts for the next months (Ohl, 1996). This is due to the fact that the time needed for the replacement of some parts and materials is longer than the delivery time customers would accept. Therefore, in the automotive industry it is frequently necessary to do the disposition of parts and materials that will be delivered by internal and external suppliers before customers place their real orders.

Selection of Existing Methods

Classical Statistical Approaches. In general, four different ways of quantitative forecasting are well established (Makridakis et al., 1984):
• Purely judgmental or intuitive approaches.
• Causal or explanatory methods such as regression or econometric models (multivariate approaches).
• Time series methods (extrapolative univariate approaches).
• Combinations of the above mentioned techniques.

Purely judgmental or intuitive approaches are not appropriate because the user has to forecast each time series manually. The multivariate approach does not work for the application presented in this paper either, since it is almost impossible to identify impact factors for different options in different countries. For example, what are the impact factors influencing the sunroof of the model C 220 in Italy? Therefore, only extrapolative univariate time series approaches are taken into consideration in the following.

Fig. 3: Installation rates and absolute selling figures of options (installation rates of option X per model and country, multiplied by the number of cars, yield the cumulated absolute number of option X over all models and all countries).


Reviewing extrapolative univariate time series analysis yields quite a lot of different methods. The most important and most popular are listed below (Jarrett, 1991):
• Naive forecast
• Moving averages
• Exponential smoothing
• Time series decomposition
• Adaptive filtering
• The Box-Jenkins methodology (ARIMA modeling)

From the scientific point of view, the most interesting of the methods listed above is ARIMA modeling (Box and Jenkins, 1976). However, there are two very crucial disadvantages: The most important point in ARIMA modeling is model identification. As shown in different forecasting competitions (Makridakis et al., 1984; Makridakis and Hibon, 1979; Schwarze and Weckerle, 1982), even experts differ in choosing the most appropriate ARIMA model. And beyond that, there exist forecasting methods producing on average better forecasts than ARIMA modeling does. Besides, as described before, an automatic forecasting system is needed, so that it is not possible to identify the appropriate model manually for each time series. Therefore, ARIMA modeling does not seem to be reasonable for the prediction of all time series in our application.

Nevertheless, there definitely exist special types of time series where ARIMA modeling will be the appropriate forecasting method, i.e. time series with a high autocorrelation. Besides, there exist approaches for automatically identifying the most appropriate model, such as Akaike's information criterion (Akaike, 1976; Akaike, 1979). Therefore, the Box-Jenkins methodology will not be the appropriate approach for all time series, but possibly for a selected number of time series.
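As a rough sketch of such an automatic criterion, the least-squares form of Akaike's criterion trades off goodness of fit against the number of parameters; the candidate models and their fit results below are hypothetical, not taken from the paper:

```python
import math

def aic(n, rss, k):
    """Akaike's information criterion (least-squares form) for a model with
    k parameters, fitted to n observations with residual sum of squares rss."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical ARIMA candidates: (n, residual sum of squares, no. of parameters).
# The order with the smallest AIC is selected automatically.
candidates = {"AR(1)": (24, 12.0, 1), "AR(2)": (24, 11.5, 2), "AR(3)": (24, 11.4, 3)}
best = min(candidates, key=lambda m: aic(*candidates[m]))
print(best)  # AR(1): the small reduction in rss does not justify extra parameters
```

The penalty term 2k is what allows such a criterion to pick a model without any user interaction.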

In the case of adaptive filtering (Kalman, 1960; Kalman and Bucy, 1961), the situation is very similar. It does not seem reasonable to use this approach for all time series, but in many cases adaptive filtering will produce good prediction results without any user interaction. Therefore, adaptive filtering will not be used as the forecasting technique for all time series, but definitely for the prediction of special types of time series. This result, corresponding to the situation of ARIMA modeling, is not surprising at all, because the adaptive filtering technique is very similar to ARIMA modeling.

On the one hand, forecasting time series using naive forecasts or moving averages does not, in a lot of cases, produce good results in comparison to other forecasting approaches. But on the other hand, there exist a lot of time series at Mercedes-Benz where using naive forecasts or moving averages yields excellent results.
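These two simple baselines can be stated in a few lines; the series used here is made up for illustration:

```python
def naive_forecast(series, horizon):
    # Naive forecast: repeat the last observed value over the whole horizon.
    return [series[-1]] * horizon

def moving_average_forecast(series, horizon, window=3):
    # Average of the last `window` observations, repeated over the horizon.
    avg = sum(series[-window:]) / window
    return [avg] * horizon

history = [10, 12, 11, 13, 12]          # illustrative monthly installation counts
print(naive_forecast(history, 3))        # [12, 12, 12]
print(moving_average_forecast(history, 3))  # (11 + 13 + 12) / 3 = [12.0, 12.0, 12.0]
```

For a stable series without trend or seasonality, such baselines are hard to beat and cost almost nothing to compute.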

Discussing exponential smoothing (Brown and Meyer, 1961) means discussing a lot of different forecasting techniques (Gardner, 1985), because there exist constant exponential smoothing approaches without any trend or seasonal parameters, as well as different exponential smoothing techniques considering linear and nonlinear trends as well as seasonal effects of the time series. Consequently, the different kinds of exponential smoothing could, depending on the structure of the investigated time series, be appropriate approaches for our forecasting system. Generally, the most important and popular types of exponential smoothing approaches are:
• Single exponential smoothing (Brown, 1963)
• Linear exponential smoothing: Brown's one-parameter method (Brown, 1963)
• Linear exponential smoothing: Holt's two-parameter approach (Holt et al., 1960)
• Winters' three-parameter method (Winters, 1960)
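A minimal sketch of the first and third variants (single smoothing and Holt's two-parameter approach) might look as follows; the smoothing constants are arbitrary illustrative choices, not settings from the paper:

```python
def single_exponential_smoothing(series, alpha=0.3):
    """Single exponential smoothing: returns the one-step-ahead forecast
    after smoothing the whole series (no trend, no seasonality)."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def holt_forecast(series, horizon, alpha=0.5, beta=0.5):
    """Holt's two-parameter linear exponential smoothing: separate smoothing
    of level and trend, then linear extrapolation over the horizon."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]

# A perfectly linear series is extrapolated exactly by Holt's method:
print(holt_forecast([1, 2, 3, 4, 5], 3))  # [6.0, 7.0, 8.0]
```

Winters' method adds a third smoothed component for the seasonal factors on top of this level/trend scheme.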

Symbolic Machine Learning and Neural Networks. Recently, attention concerning the task of time series prediction has also been focused on the application of neural networks (Weigend and Gershenfeld, 1994; Graf and Nakhaeizadeh, 1993). Although the development of neural networks was at an early stage stimulated by modeling the learning process in the human brain, the further development of this technology shows a very strong similarity with statistical approaches (Ripley, 1992).

There are some studies which compare neural networks with statistical procedures like nonlinear regression from a theoretical point of view (Arminger, 1994; Ripley, 1992). However, it should be mentioned that the ability of adaptive learning which characterizes most neural networks is not implemented in statistical procedures like regression analysis or ARIMA modeling.

Generally, there are two alternatives for using neural networks for time series prediction:
• On the one hand, it is possible to use neural networks as multivariate forecasting techniques very similar to nonlinear regression. In this case the problem for our application is the same as with multivariate statistical prediction approaches: it is almost impossible to identify the impact factors for the huge number of time series concerning the different options in different countries.
• On the other hand, it is possible to use neural networks as univariate forecasting techniques very similar to statistical extrapolative univariate forecasting approaches. Here the problem is to identify the most appropriate model (Steurer, 1994).

Learning algorithms based on neural nets are, like the majority of statistical approaches, polythetic procedures which consider all attributes simultaneously for a decision. For this reason, they need at least as many training examples as the number of initial attributes. This means they are not appropriate at all for automatic attribute selection in cases in which the number of attributes is too high in comparison to the number of training examples. This is often the case in dealing with time series data.

The main problem in using neural networks for prediction consists in finding the optimal network architecture. To realize this task, one has to divide the


available time series data into a training set, a test set, and a validation set. Regarding the problem of the limited number of observations in time series data discussed above, dividing the whole series into training, test, and validation sets leads in many circumstances to an even smaller training data set.

Besides neural networks, some symbolic machine learning algorithms based on the ID3 concept (Breiman et al., 1984) can be used to predict the development of time series as well (Merkel and Nakhaeizadeh, 1992).

There are a lot of machine learning algorithms which can handle the classification task. Almost all of these algorithms are, however, not appropriate for handling the prediction task directly. The reason is that, in contrast to classification, in prediction the class values are continuous rather than discrete. Exceptions are the ID3-type algorithms CART (Breiman et al., 1984) and NewID (Boswell, 1990), which can handle continuous-valued classes as well. Of course, by discretization of continuous values and building intervals, it is possible to transform every prediction task into a classification one, but this procedure is normally connected with a loss of information.

Algorithms like CART and NewID can handle continuous-valued classes directly and without discretization. They generate a predictor in the form of a regression tree that can be transformed into production rules. Furthermore, these learning algorithms apply a single attribute at each level of the tree (monothetic), and this is in contrast to most statistical and neural learning algorithms, which consider all attributes simultaneously to make a decision (polythetic).

The main advantage of symbolic machine learning approaches like regression trees is that it is very easy to involve other available information in the prediction process, for example by including the background knowledge of experts. Concerning rule generation from the data, it should be mentioned that this property of decision and regression trees is one of the most important advantages of these approaches. Other classification and prediction methods like statistical and neural procedures do not have this property, which allows considering other sources of information in a very flexible manner. In the case of statistical or neural classification and prediction algorithms, it is very difficult, and in some circumstances impossible, to consider such additional information.

However, like other approaches, prediction algorithms based on symbolic machine learning also have some shortcomings. Generally, they cannot predict values beyond the range of the training data. Regarding especially the fact that a lot of time series have an increasing (or decreasing) trend component, it can be seen that by using just the raw class values, one can never achieve a predicted value which is outside the range of the class values used for training. But this disadvantage can be avoided by taking differences of the class values, as is the case with the Box-Jenkins approach.
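The differencing workaround can be sketched as follows; the series and the assumption that the tree predicts the mean difference are purely illustrative:

```python
def difference(series):
    # First differences: the learner is then trained on changes, not levels,
    # so the predicted changes can push the forecast past the training range.
    return [b - a for a, b in zip(series, series[1:])]

def undifference(last_value, predicted_diffs):
    # Rebuild level forecasts by cumulating the predicted differences.
    out, level = [], last_value
    for d in predicted_diffs:
        level += d
        out.append(level)
    return out

series = [100, 103, 106, 109]        # trending series; raw max is 109
diffs = difference(series)           # [3, 3, 3]
# Suppose the regression tree predicts the mean difference for the next two steps:
pred = [sum(diffs) / len(diffs)] * 2
print(undifference(series[-1], pred))  # [112.0, 115.0] -- beyond the training range
```

A tree trained on the raw values could never output more than 109; trained on differences, it extrapolates the trend.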

To summarize, neural networks and symbolic machine learning algorithms are very interesting from the theoretical point of view, but they do not seem to be appropriate for the prediction of the huge number of time series in our application. Therefore, the forecast will be done by extrapolative univariate forecasting approaches. The only method of this class not taken into consideration any more is time series decomposition, because of the user interaction it requires. Therefore, time series decomposition seems definitely not to be an appropriate forecasting method for performing automatic forecasts as needed in our application.

The Idea of Combination

After the decision to use different extrapolative univariate time series techniques, except for time series decomposition, for the prediction of options at Mercedes-Benz, the next step will be the detailed selection of an adequate extrapolative univariate prediction model. This selection of the most appropriate forecasting model is a very hard task. In the case of the application presented in this paper, where totally different types of time series have to be forecasted, it is almost impossible to answer the question of what, generally, the most appropriate forecasting model is.

There have been many comparative studies in forecasting. Unfortunately, they do not always agree on which are the best forecasters or on the reasons why one method does well and another does badly. One of the largest studies is that conducted by Makridakis and Hibon (1979), who concluded that simple methods often outperformed sophisticated methods like ARIMA modeling. The largest competition realized so far is that conducted by Makridakis (Makridakis et al., 1984), which came to very similar conclusions. While the study of Makridakis and Hibon (1979) concerned 111 time series, the competition of Makridakis (Makridakis et al., 1984) concerned 1001 time series and represents a major research undertaking.

Previous studies by Newbold and Granger (1974) and Reid (1975) concluded that ARIMA modeling will usually be superior if it is appropriate. Generally, ARIMA modeling is regarded as giving extremely accurate forecasts provided the user is expert enough and sufficient computational resources are available, although simple methods are often more than adequate if the stochastic structure of the time series is sufficiently simple (Montgomery and Johnson, 1976; Jarrett, 1991).

In the meantime, many researchers have applied the relatively new techniques of neural networks and machine learning in the context of prediction. There are several recent studies comparing the performance of neural networks, machine learning, and classical statistical algorithms using time series data: for example, Rehkugler and Poddig (1990), Schumann and Lohrbach (1992), as well as Graf and Nakhaeizadeh (1993).


All these studies concentrate on finding which method is best, or try to explain why one method is better than another. However, as Bates and Granger (1969) suggest, the point is not to pick the best method, but rather to see if it is possible to combine the available methods so as to get an improved predictor:

OUR INTEREST is in cases in which two (or more) forecasts have been made of the same event. Typically, the reaction of most statisticians and businessmen when this occurs is to attempt to discover which is the better (or best) forecast; the better forecast is then accepted and used, the other being discarded. Whilst this may have some merit where analysis is the principal objective of the exercise, this is not a wise procedure if the objective is to make as good a forecast as possible, since the discarded forecast nearly always contains some useful independent information.

The general idea of combining different methods is even much older: "In combining the results of these two methods, one can obtain a result whose probability law of error will be more rapidly decreasing" (Laplace, 1818, in Clemen, 1989).

The basic principle of combining forecasts is explained in detail in figure 4. Based on the given database shown in the upper middle of figure 4, forecasting method A discovers the linear trend correctly, but does not detect the seasonality of the given data. In contrast to method A, forecasting procedure B discovers the seasonality but does not detect the linear trend. In conclusion, both methods contain useful information about the given time series, but neither approach A nor method B contains all relevant information. Thus, selecting one single method for prediction (e.g. based on the smallest error) and discarding the other one means losing useful and important information. This is avoided by choosing the multistrategy approach of combining both methods to get one common forecast, i.e. an optimal exploitation of the information.

Fig. 4: Combination of forecasts.
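One common way to realize such a combination, in the spirit of Bates and Granger (1969), is to weight each forecast inversely to its past squared error; this sketch is an illustration of that idea, not the exact scheme used in the system described here:

```python
def inverse_error_weights(past_errors):
    """past_errors: one list of historical forecast errors per method.
    Returns combination weights proportional to 1 / (sum of squared errors)."""
    inv = [1.0 / sum(e * e for e in errs) for errs in past_errors]
    total = sum(inv)
    return [w / total for w in inv]

def combine(forecasts, weights):
    # Weighted average of the single forecasts for one period.
    return sum(f * w for f, w in zip(forecasts, weights))

errors_a = [1.0, -1.0, 1.0]   # method A: small past errors
errors_b = [3.0, -3.0, 3.0]   # method B: larger past errors
w = inverse_error_weights([errors_a, errors_b])
# Method A dominates (weights about [0.9, 0.1]), but B still contributes:
print(combine([10.0, 13.0], w))  # about 10.3
```

Neither forecast is discarded; the less accurate method still contributes its independent information, only with a smaller weight.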

This multistrategy approach of combining different forecasting methods is adopted in the considered application at Mercedes-Benz. Instead of selecting one single forecasting procedure, different forecasting techniques will be used in the forecasting system. Each of these different forecasting techniques will simultaneously produce its own forecast for the next 12 (or more) months, and these forecasts will subsequently be combined in the combination step into one common forecast, following Wolpert's general scheme of "stacked generalization" (Wolpert, 1992).

Following this approach of combined forecasts, there are three very important questions to answer:
• How many single univariate forecasting methods will be combined?
• Which single univariate forecasting methods will be combined?
• What is the appropriate algorithm for the combination?

The first question to answer is how many different forecasting approaches should be combined. Theoretically, it can be shown that the larger the number of single forecasting techniques used as input for the combination, the better the produced results are (Thiele, 1993). In practice, on the one hand it can be shown that accuracy will increase if more single forecasts are used as input (Makridakis and Winkler, 1983), but on the other hand, it can be shown as well that the difference in accuracy is not remarkable at all whether six, seven, or eight different single prediction techniques are used as input factors for the combination step (Hüttner, 1994). Besides, by using too many single forecasting methods as input for the combination step, there is the danger that the same or almost the same information is taken several times as input for the combination. That means giving too strong a weight to this information and not enough weight to the other independent information. Therefore, it seems reasonable not to use more than six single forecasting procedures as input for the combination step.

Classification of Time Series

The second question is which forecasting methods should be selected for combination in what situation. As shown in different studies and forecasting competitions (Makridakis et al., 1984; Makridakis and Hibon, 1979; Schwarze and Weckerle, 1982), the different forecasting approaches yield different results depending on the characteristics of the time series. I.e., if there are autoregressive characteristics, ARIMA modeling will perform best, while for other types of time series other approaches will produce the best forecasting results.

Therefore the idea is to select the six most appropriate single time series approaches depending on the characteristics of the different time series. The goal is to select for every time series the six most appropriate models


From: Proceedings of the Third International Conference on Multistrategy Learning. Copyright © 1996, AAAI (www.aaai.org). All rights reserved.


time series no. | stationarity | trend | seasonality | autocorrelation | abs. level of figures | unsyst. fluctuation | best | 2nd best | 3rd best | 4th best | 5th best | 6th best
 1 | yes | yes | no  | no  | high   | no  | E | L | W | A | G | H
 2 | no  | yes | no  | no  | low    | no  | R | D | L | G | B | I
 3 | no  | yes | no  | no  | middle | yes | W | C | L | B | D | G
 4 | no  | no  | yes | yes | middle | no  | G | W | B | K | M | D
 5 | yes | no  | yes | yes | high   | no  | F | B | V | E | B | D
 6 | yes | yes | no  | no  | low    | yes | E | D | F | G | O | K
 7 | yes | no  | yes | yes | low    | yes | C | A | U | S | K | L
 8 | no  | no  | no  | yes | high   | no  | L | K | S | B | F | J
 9 | yes | yes | no  | no  | middle | yes | H | I | D | L | R | C
10 | no  | no  | yes | no  | high   | yes | A | D | C | G | I | F

Fig. 5: Summary of ex post forecasting results.

and to combine these different forecasts subsequently in the combination step.

For the selection of the most appropriate models, a randomly selected sample of time series is analyzed and described in detail by using different attributes such as stationarity, trend, seasonality, unsystematic random fluctuation, autocorrelation, absolute level of figures, etc. The values of these attributes are determined using objective tests such as the Dickey-Fuller test (Dickey and Fuller, 1979) for testing the stationarity and the Durbin-Watson test (Durbin and Watson, 1950, 1951, 1971) for testing autocorrelation.
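The Durbin-Watson statistic mentioned above has a simple closed form. The following sketch (not the authors' code) computes it from a series of residuals; values near 2 indicate no first-order autocorrelation, values near 0 or 4 indicate positive or negative autocorrelation:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den
```

A strictly alternating residual sequence such as `[1, -1, 1, -1]` yields a value of 3.0, signalling negative autocorrelation, and would set the autocorrelation attribute accordingly.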

After this description of all time series in the sample, more than 20 different single univariate prediction procedures as presented before make an ex post forecast for each of those time series. The different results of these ex post forecasts are compared in detail concerning the prediction quality, in order to analyze which prediction procedure performs best, second best, etc. for what type of time series.

As an outcome a table is generated, where the forecasting results are summarized as shown in figure 5. In this table the different time series of the selected sample and their characteristic attributes are shown. The single univariate forecasting techniques performing best, second best, third best, etc. for each time series are summarized as well.

In the best case, it would be sufficient to utilize the table shown in figure 5 to build an automatic decision system to decide for which type of time series which forecasting technique performs best, second best, third best, etc. The six best forecasting methods would be selected and used as input for the combination step.

But reality looks different. For very similar time series with almost the same characteristics, the different single prediction techniques perform differently. Therefore, the results of figure 5 are not suitable for directly generating an automatic decision system that determines for what type of time series which forecasting techniques perform best.

Under the assumption that the decision which forecasting techniques perform best, second best, third best, etc. is a classification task, an appropriate tool to solve such kinds of classification problems is a machine learning algorithm. A classification tool accepts a pattern of data as input, and the output are decision rules or decision trees, which can be used for classifying new and unknown objects (see for more detail Breiman et al., 1984).

Therefore, a symbolic machine learning algorithm is integrated as a generator for an automatic decision system. The ID3-type (Quinlan, 1986, 1987) machine learning algorithm NewId (Boswell, 1990), for example, takes as input a set of examples and generates a decision tree for classification. The product of this machine learning algorithm is a decision tree that can assign an unknown object to one of a specified number of disjoint classes. In our application, such a new unknown object would be a new time series described by its characteristic attributes, and the disjoint classes are the different single forecasting approaches, which are used as input in the combination step.

Using NewId, classifications are learned from a set of the attributes which characterize the different types of time series. For every input of the combination step an extra decision tree is generated. That means that there exist six decision trees: the first decision tree, for the first input of the combination step, is generated by using all the attributes mentioned in figure 5 and, as classification goal, the row of best forecasting approaches. The second decision tree is generated by using the same attributes again, but the classification goal is not the row of best but of second best forecasting techniques. The third decision tree is generated by using the row of third best forecasting methods, etc.
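The ID3-style induction that NewId performs can be sketched in a few lines. This is a generic, minimal ID3 (information-gain splitting on categorical attributes), not NewId itself; function names and the example attributes are illustrative:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attrs):
    """Induce a decision tree: rows are dicts attribute -> value,
    labels the class per row (here: the best forecasting method)."""
    if len(set(labels)) == 1:          # pure leaf
        return labels[0]
    if not attrs:                      # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]

    def info_gain(a):
        remainder = 0.0
        for v in set(r[a] for r in rows):
            subset = [lab for r, lab in zip(rows, labels) if r[a] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attrs, key=info_gain)
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v],
                          [lab for r, lab in zip(rows, labels)
                           if r[best] == v],
                          rest)
                   for v in set(r[best] for r in rows)}}

def classify(tree, row):
    """Walk the nested-dict tree down to a class label."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree
```

Running `id3` six times, once per classification goal (best, second best, ...), yields the six decision trees described in the text.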


Fig. 6: Decision tree generated by NewId (nodes test attributes such as absolute level of figures, unsystematic fluctuation, and autocorrelation).

Figure 6 shows an example of a decision tree generated by using NewId. Since such a decision tree is generated for each input of the combination step, in sum six decision trees are generated. These six generated decision trees were pruned (Quinlan, 1987) and afterwards used as an automatic decision system to decide which single forecasting approaches should be used to forecast the different types of time series.

Combining Most Promising Approaches

As mentioned above, the third question to answer is how the combination should be done. Generally, there are two paradigms for combining different forecasting approaches (Henery, 1995):
* On the one hand, it is possible to take the output of a prediction technique and use it as input for another forecast approach. This combination of different models means that only one algorithm is active at the same time (sequential multistrategy forecast).
* On the other hand, the combination of different algorithms at the same time is possible as well. In this approach several algorithms are active at the same time (parallel multistrategy forecast).

In a way, in our application we combine these two alternatives: the first step is a kind of parallel multistrategy forecasting, because several forecasting algorithms are active at the same time when the single prediction methods generate their own forecasts. The second step is a kind of sequential multistrategy forecasting, because the output of some prediction methods is used as input for another forecasting procedure.
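The two-step scheme just described can be expressed compactly. The sketch below is a hypothetical skeleton (the function and parameter names are not from the paper): the single methods run in parallel on the series, and their outputs then feed sequentially into a combiner, in the spirit of stacked generalization:

```python
def multistrategy_forecast(series, single_methods, combiner, horizon=12):
    # Step 1 (parallel): every selected single method produces its own
    # forecast for the whole horizon.
    level0 = {name: method(series, horizon)
              for name, method in single_methods.items()}
    # Step 2 (sequential): the single forecasts become the input of the
    # combination procedure, which emits the one common forecast.
    return combiner(level0)
```

For example, combining a naive forecast and a 3-month moving average with a simple mean combiner over a horizon of 2 months maps the series `[1, 2, 3]` to the common forecast `[2.5, 2.5]`.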

So far, we consider two different paradigms for the combination algorithm. The first way is to use statistical approaches (Thiele, 1993), the other way is to use neural network approaches (Donaldson and Kamstra, 1996).

From the scientific point of view, the most interesting alternative for the combination would be to use neural networks, i.e. using as learning algorithm the backpropagation algorithm (Werbos, 1974; Rumelhart et al., 1986; Rumelhart and McClelland, 1986). The outputs of the different single forecasting methods selected above are used in this approach as input values for the neural network, where the combination is performed.

But taking into consideration the huge number of time series to forecast, it does not seem reasonable to use neural networks as the general combination tool. Nevertheless, in some special situations it might be appropriate to use a neural network as forecasting tool in the combination step, as shown in figure 7.

Fig. 7: Neural network as combination tool (inputs: forecasting methods A through F).


There is one big problem in realizing the combination step by using neural networks. As already mentioned above, neural networks normally need a lot of data for training purposes. Because the time series in our application contain in many cases only 30 or 40 values, neural networks in these situations do not seem to be the appropriate way for the combination step. But this problem can be solved by using techniques such as the cross-validation method (Lachenbruch and Mickey, 1968; Stone, 1974) and the bootstrap method (Efron, 1983).
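Leave-one-out cross-validation, the variant suited to such short series, can be sketched generically. This is an illustrative helper (names and the `fit`/`predict` interface are assumptions, and any small model can be plugged in), not the combination network itself:

```python
def loo_cv_error(xs, ys, fit, predict):
    """Leave-one-out cross-validation: with only 30-40 observations,
    every data point serves exactly once as the held-out test case,
    so no observation is wasted on a fixed validation split."""
    errors = []
    for i in range(len(xs)):
        # Hold out observation i, train on all the others.
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append((predict(model, xs[i]) - ys[i]) ** 2)
    return sum(errors) / len(errors)
```

The returned mean squared error estimates out-of-sample accuracy and can guide model selection for the combination step despite the small sample.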

The alternative way for running the combination is to use statistical approaches (Thiele, 1993). Generally, there are four classes of statistical combination approaches:
* Unweighted combination: the common forecast is the mean of the different single forecasts.
* Regression methods: the impact of each single forecast on the combination is computed based on a regression analysis (Granger and Ramanathan, 1984).
* Variance-covariance approaches: the impact of each single forecast on the combination is computed based on the error matrix (variance-covariance matrix) of the different methods used in the single prediction step (Bates and Granger, 1969; Clemen, 1989).
* Probability-based combination approaches: the impact of each single forecast on the combination is computed based on probabilities of how well every single forecasting method performs (see for more detail Clemen, 1989).
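For the two-forecast case, the variance-covariance approach of Bates and Granger (1969) has a well-known closed form: the weight on forecast 1 that minimizes the combined error variance is w = (s22 - s12) / (s11 + s22 - 2*s12), estimated from past forecast errors. The sketch below illustrates that formula (the function name is ours, not the paper's):

```python
def bates_granger_weight(e1, e2):
    """Weight on forecast 1 in the two-forecast variance-covariance
    combination, computed from past error series e1 and e2.
    The combined forecast is then w*f1 + (1 - w)*f2."""
    n = len(e1)
    s11 = sum(a * a for a in e1) / n            # error variance of method 1
    s22 = sum(b * b for b in e2) / n            # error variance of method 2
    s12 = sum(a * b for a, b in zip(e1, e2)) / n  # error covariance
    return (s22 - s12) / (s11 + s22 - 2 * s12)
```

With uncorrelated errors of variance 1 and 4, the weight on the more accurate method comes out as 0.8, i.e. inversely proportional to the error variances.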

Similar to the single forecasting approaches, the best alternative for doing the combination depends on the characteristics of the time series. Therefore the same way is chosen as for the selection of the best single forecasting methods. Again the machine learning algorithm NewId is used for generating a decision tree. This decision tree is used for an automatic selection of which way of combination seems most appropriate in which situation.

In many cases, already the simple mean of the different single forecasting procedures yields much better results than the best single forecast technique does. This confirms Kang (1986) ("A simple average is shown to be the best technique to use in practice.") and Clemen (1989) ("In many studies, the simple average of the individual forecasts has performed best or almost best."). But there are other situations as well, where it is appropriate to combine the forecasts of the single forecasting approaches by using more sophisticated techniques.

Theoretically, the regression and the variance-covariance approach yield identical results (Thiele, 1993), but caused by estimation errors in estimating the different parameters, in practice the regression method and the variance-covariance method produce different forecasts.

Sometimes, the best forecasting results are produced by using probability-based approaches for running the combination step. The most interesting task in using this alternative is the estimation of the a priori probabilities, which are needed for running the combination of the different single forecasts (Clemen, 1989).


Evaluation

For evaluation purposes, there are two different parts of evaluation: The first part of the evaluation is done by taking the DGOR (Deutsche Gesellschaft für Operations Research, the German Society of Operations Research) forecasting competition of 1982 (Schwarze and Weckerle, 1982) as a general benchmark. The second and, concerning the real application, much more interesting part is to test the new multistrategy approach in comparison to the actual forecasting procedure, which is basically a weighted moving average method.

DGOR Forecasting Competition

The first part of the evaluation is done by taking the DGOR forecasting competition of 1982 (Schwarze and Weckerle, 1982) as a general benchmark.

This competition was organized in 1982 by the forecasting group of the German Society of Operations Research, and the task is to forecast 15 different time series of monthly data. All time series are real data concerning selling figures, turnover figures, etc. The longest time series contains 204 values, the shortest one contains 48 values.

Generally, there are two parts of the competition, where each time the task is to forecast the above mentioned 15 time series:
* In the first part of the competition the task is to forecast once a 12 period forecasting horizon.
* In the second part of the competition the task is to forecast 12 times a 1 period forecasting horizon.

As an evaluation criterion for the performance of the different forecasts Theil's U (Theil, 1971) is used.

For forecasting once 12 periods forecasting horizon.Theil’s U is calculated as shown in (1) and for forecasting12 times l period forecasting horizon. U is calculated asshown in (2):

Ils÷~"

,~,,,+l (I) U, =

I E(x,-x,,,)I-Io ̄ I

¯ x, : real value of month t.¯ ~, : forecasted value for month t.¯ to :

¯ T:

|11, T =t .tn÷l (2)

actual month, where the forecast is done.total forecasting horizon(in our application: 12 months).

As shown in (I) and (2) U is calculated as square root the quotient of the squared forecasting error divided by thesquared forecasting error of the naive forecast.

Therefore, the most important advantage of using Theil's U is the fact that U automatically gives an answer to the question whether it is worth using a more or less complex forecasting approach at all, or whether it is sufficient to perform a prediction using the naive forecast:
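Theil's U as defined above reduces to a few lines of code. In this sketch (function name ours) the caller supplies the naive benchmark explicitly: the last known value repeated over the horizon for the one-shot case, or the previous month's actual value for the rolling one-step case:

```python
from math import sqrt

def theils_u(actual, forecast, naive):
    """Theil's U: square root of the ratio of the squared forecast
    error to the squared error of the naive benchmark forecast.
    U < 1 means the forecast beats the naive benchmark."""
    num = sum((x - f) ** 2 for x, f in zip(actual, forecast))
    den = sum((x - nv) ** 2 for x, nv in zip(actual, naive))
    return sqrt(num / den)
```

A perfect forecast gives U = 0, and a forecast identical to the naive benchmark gives U = 1, matching the interpretation listed in the text.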


* U = 0 : perfect forecast; it is worthwhile to forecast.
* 0 < U < 1 : forecast better than naive forecast; it is worthwhile to do the forecast.
* U = 1 : forecasting error equal to naive forecast; it is not worthwhile to do the forecast.
* U > 1 : forecast worse than naive forecast; it is not worthwhile to do the forecast.

Using Theil's U as an evaluation criterion for the performance of our forecasting system and taking the mentioned DGOR forecasting competition as a general benchmark, the approach presented yields very good results:
* In the first part of the competition (1 time 12 periods forecasting horizon) our method performs U_1 = 0.69, which is the second best result (best approach: U_1,best = 0.67).
* In the second part of the competition (12 times 1 period forecasting horizon) our method performs U_2 = 0.58, which is also the second best result (best approach: U_2,best = 0.55).

Note that the approaches reaching better prediction results (i.e. a lower U value) need manual fine tuning and produced better results only after being tuned on each single time series, whereas the combined approach presented in this paper works completely automatically, without any manual interaction, using standard parameter values.

Mercedes-Benz Data

The second and, concerning the real application, much more interesting part of the evaluation is to test the new multistrategy approach in comparison to the actual forecasting procedure, which is basically a weighted moving average method.

Based on 200 randomly selected time series that are different from those time series used for performing the classification task in the previous section, the new approach is evaluated by making ex post forecasts of the last 12 months. For this purpose, similar to the evaluation using the DGOR data, there are again two steps of evaluation, where each time the task is to forecast the above mentioned 200 time series of Mercedes-Benz data:
* In the first step the task is to forecast once a 12 period forecasting horizon.
* In the second step the task is to forecast 12 times a 1 period forecasting horizon.

In earlier tests (Friedrich et al., 1995) it was already shown that the actual weighted moving average forecasting method produces better results than the naive forecast does (U < 1). Therefore, it does not seem reasonable at all to compare the new combined forecasting approach with the naive forecast, as is done by using Theil's U in the previous subsection.

Instead of using Theil's genuine U as an evaluation criterion, a modified version of Theil's U as shown in (3) and (4) is used. In comparison to Theil's genuine U, the modified U' is calculated as the square root of the quotient of the squared forecasting error divided not by the squared forecasting error of the naive forecast but by the squared forecasting error of the actual weighted moving average forecasting method. For forecasting once a 12 period horizon, the modified U' is calculated as shown in (3), and for forecasting 12 times a 1 period horizon, the modified U' is calculated as shown in (4):

U'_1 = \sqrt{ \sum_{t=t_0+1}^{t_0+T} (x_t - \hat{x}_t)^2 \Big/ \sum_{t=t_0+1}^{t_0+T} (x_t - \tilde{x}_t)^2 }   (3)

U'_2 = \sqrt{ \sum_{t=t_0+1}^{t_0+T} (x_t - \hat{x}_t)^2 \Big/ \sum_{t=t_0+1}^{t_0+T} (x_t - \tilde{x}_t)^2 }   (4)

* x_t : real value of month t.
* \hat{x}_t : forecasted value for month t using the new combined forecasting approach.
* \tilde{x}_t : forecasted value for month t using the actual weighted moving average forecasting approach.
* t_0 : actual month, where the forecast is done.
* T : total forecasting horizon (in our application: 12 months).

In (3) both forecasts are made once for the whole 12 period horizon, while in (4) both forecasts are rolling 1 period ahead forecasts.

Similar to Theil's genuine U, Theil's modified U' automatically gives an answer to the question whether it is worth using the new approach at all, or whether it is sufficient to perform a prediction using the actual weighted moving average forecasting approach:
* U' = 0 : perfect forecast; it is worthwhile to use the new approach.
* 0 < U' < 1 : combined forecast better than actual forecast; it is worthwhile to use the new method.
* U' = 1 : forecasting error of combined forecast equal to that of the actual method; it is not worthwhile to use the new approach.
* U' > 1 : combined forecast worse than actual method; it is not worthwhile to use the new approach.

Using the modified U' as an evaluation criterion for the performance of our forecasting system and taking the actual forecasting method as a benchmark, the combined approach presented yields very good results:
* In the first step (1 time 12 periods forecasting horizon) our new approach performs U'_1 = 0.67.
* In the second step (12 times 1 period forecasting horizon) our new method performs U'_2 = 0.68.

Obviously, the combined forecasting approach yields much better results than the actual method does. But due to the fact that neither Theil's genuine U nor the modified U' is a linear measure, it is very difficult to say exactly (i.e. in percent) how much better the new combined forecasting approach is in comparison to the actual method.

In summary, in comparison to the actual forecasting technique, our new combined approach yields much better results. Due to the fact that the results are confidential, they cannot be reported in more detail.


References

Akaike, H. 1976. Canonical correlation analysis of time series and the use of an information criterion. In: R.K. Mehra and D.G. Lainiotis (eds.): System Identification. Academic Press, New York.

Akaike, H. 1979. A Bayesian extension of the minimum AIC procedure of autoregressive model fitting: Biometrika 66: 237-242.

Arminger, G. 1994. Ökonometrische Schätzmethoden für neuronale Netze. University of Wuppertal.

Bates, J.M., and Granger, C.W.J. 1969. The combination of forecasts: Operational Research Quarterly 20: 451-468.

Box, G.E.P., and Jenkins, G.M. 1976. Time series analysis, forecasting and control. Holden-Day, San Francisco.

Boswell, R. 1990. Manual for NewId. The Turing Institute.

Brandli, N., and Dumschat, R. 1995. Regelbasierte Stücklisten im Anwendungsdatenmodell ISO 10303-214: Produktdaten Journal 1: 10-12.

Brandli, N., Kafer, W., and Malle, B. 1994. Enhancing Product Documentation with new Information Technology Systems Based on STEP: Proceedings of the 27th ISATA: Mechatronics and Supercomputing Applications in the Transportation Industries, 429-436.

Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. 1984. Classification and regression trees. Wadsworth and Brooks, Monterey.

Brown, R.G. 1963. Smoothing, forecasting and prediction of discrete time series. Prentice Hall, Englewood Cliffs.

Brown, R.G., and Meyer, R.F. 1961. The fundamental theorem of exponential smoothing: Operations Research 9: 673-685.

Clemen, R.T. 1989. Combining forecasts: a review and annotated bibliography: International Journal of Forecasting 5: 559-583.

Dichtl, E., Raffée, H., Beeskow, W., and Kögelmayr, H.G. 1983. Faktisches Bestellverhalten als Grundlage einer optimalen Ausstattungspolitik bei Pkw-Modellen: ZFBF 35 (2): 173-196.

Dickey, D.A., and Fuller, W.A. 1979. Distribution of the estimators for autoregressive time series with a unit root: Journal of the American Statistical Association 74: 427-431.

Donaldson, R.G., and Kamstra, M. 1996. Forecast combining with neural networks: Journal of Forecasting 15: 49-61.

Durbin, J., and Watson, G.S. 1950. Testing for serial correlation in least squares regression (I): Biometrika 37: 409-428.

Durbin, J., and Watson, G.S. 1951. Testing for serial correlation in least squares regression (II): Biometrika 38: 159-178.

Durbin, J., and Watson, G.S. 1971. Testing for serial correlation in least squares regression (III): Biometrika 58: 1-19.

Efron, B. 1983. Estimating the error rate of a prediction rule: improvements on cross-validation: Journal of the American Statistical Association 78: 316-331.

Friedrich, L., Hirzel, J., Kirchner, S., and Ohl, S. 1996. Ein neues Prognoseverfahren in der Automobilindustrie. Forthcoming.

Friedrich, L., and Kirchner, S. 1995. Das Prognose-Modell SAIPRO, Forschungsberichte 1-5: Technical Reports, Daimler-Benz AG, Berlin, Ulm.

Friedrich, L., Fooken, J., Hoffmann, S., Ludwig, R., Müller, E., Nehls, S., and Schütz, F. 1995. Forschungsbericht: Analyse und Prognose von Kundenauftragsdaten, Dept. of Mathematics and Physics, Technische Fachhochschule Berlin.

Gardner, E.S. Jr. 1985. Exponential smoothing: the state of the art: Journal of Forecasting 4: 1-28.

Graf, J., and Nakhaeizadeh, G. 1993. Application of neural networks and symbolic machine learning algorithms to predicting stock prices. In: V.L. Plantamura, B. Soucek, and G. Visaggio (eds.): Logistic and learning for quality software management and manufacturing. John Wiley & Sons, New York.

Granger, C.W.J., and Ramanathan, R. 1984. Improved methods of combining forecasts: Journal of Forecasting 3: 197-204.

Henery, R.J. 1995. Combining forecasting procedures. In: Y. Kodratoff, G. Nakhaeizadeh, and C. Taylor (eds.): Workshop Notes of the MLnet Familiarization Workshop Statistics, Machine Learning and Knowledge Discovery in Databases.

Hirzel, J. 1995. Programmplanung für variantenreiche Automobile im Wandel vom Verkäufer- zum Käufermarkt. In: G. Bol, T. Christ, and B. Suchanek (eds.): Das Potential der Lean-Technik in der Kraftfahrzeugwirtschaft: Aktuelle Entwicklungen der Material-Logistik in Theorie und Praxis.

Hirzel, J., and Ohl, S. 1995. Forecasting of Extras and Options in the Car Industry. In: P. Kleinschmidt (ed.): Operations Research Proceedings 1995, 409-410. Springer, Heidelberg.

Holt, C., Modigliani, F., Muth, J.F., and Simon, H.A. 1960. Planning production, inventories and work force. Prentice Hall, Englewood Cliffs.

Hüttner, M. 1994. Vergleich und Auswahl von Prognoseverfahren für betriebswirtschaftliche Zwecke. In: P. Mertens (ed.): Prognoserechnung. 5., neu bearbeitete und erweiterte Auflage, Physika, Heidelberg.

Jarrett, J. 1991. Business forecasting methods. Second Edition. Basil Blackwell, Oxford.


Conclusion

Summary

In conclusion, the task was to forecast cumulative time series of single options in the car industry that, at first glance, do not contain any systematic pattern like trend or seasonality at all. By utilizing descriptive statistics, we discovered that the installation rate of single options for different models in different countries is more or less systematic and structured. Therefore, the forecast is done on installation rates, which are multiplied by the planned number of cars produced per month of each model for each country. Due to the huge number of time series to predict, automatic forecasting approaches without any manual user interaction are needed.

Because it is very difficult to identify external impact factors, no causal approaches like regression analysis were taken into consideration. Machine learning approaches and neural nets for running the prediction do not seem to be appropriate either. Therefore, univariate extrapolative prediction methods were chosen for running the forecast. But instead of selecting one single forecasting technique such as ARIMA modeling, more than 20 different univariate forecasting approaches were tested on a randomly selected sample of time series. Depending on the characteristics of the time series, an automatic decision model was generated by using the machine learning algorithm NewId.

Using this decision model, for every time series the six most appropriate forecasting approaches are selected, and subsequently these six single forecasts are combined to get one common forecast. Generally, for performing this combination neural networks seem reasonable as well as statistical approaches. Which alternative is used for the combination step is again decided by a decision tree, which was generated again by using the NewId algorithm.

Selecting the most appropriate forecasting approaches depending on the characteristics of the time series, and selecting again the most appropriate combination method for the combination, seems to guarantee an automatic adaptation of the forecasting system to the time series.

In summary, in this paper a two step multistrategy forecasting approach is presented that integrates machine learning techniques like decision trees for generating the automatic decision systems as well as several well known statistical univariate time series approaches such as ARIMA modeling, adaptive filtering, and exponential smoothing. Besides, multivariate statistical approaches such as linear regression are used in this approach, as well as neural networks for performing the combination step. An overview of the total system is given in figure 8.

Future Work

Currently, improvement of the single forecasting methods by automatic adaptation of parameters and improvement of the decision trees by increasing the sample of time series is under consideration. Besides, testing alternative combination approaches such as using other types of neural networks and other types of machine learning algorithms like Quinlan's M5 (Quinlan, 1992) for running the combination step is under consideration as well.

In addition, recently a completely new single forecasting approach called SAIPRO was developed (Friedrich and Kirchner, 1995; Friedrich et al., 1996), which will be integrated into the combined forecasting system next.

Fig. 8: Overview of the multistrategy forecasting system (single forecasting methods such as the naive forecast, moving averages, ARIMA models, adaptive filtering, and the exponential smoothing variants of Brown, Holt, and Winters; decision trees over the time series attributes; and the combination methods, including variance-covariance, probability-based, and neural network approaches).


Kalman, R.E. 1960. A new approach to linear filtering and prediction problems: Journal of Basic Engineering 82: 35-44.

Kalman, R.E., and Bucy, R.S. 1961. New results in linear filtering and prediction theory: Journal of Basic Engineering 83: 95-107.

Kang, H. 1986. Unstable weights in the combination of forecasts: Management Science 32: 683-695.

Lachenbruch, P., and Mickey, R. 1968. Estimation of error rates in discriminant analysis: Technometrics 10: 1-11.

Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E., and Winkler, R. 1984. The forecasting accuracy of major time series methods. John Wiley & Sons, New York.

Makridakis, S., and Hibon, M. 1979. Accuracy of forecasting: an empirical investigation (with discussion): Journal of the Royal Statistical Society A 142: 97-145.

Makridakis, S., and Winkler, R.L. 1983. Averages of forecasts: some empirical results: Management Science 29: 987-995.

Merkel, A., and Nakhaeizadeh, G. 1992. Application of artificial intelligence methods to prediction of financial time series: Operations Research 91: 557-559.

Montgomery, D.C., and Johnson, L.A. 1976. Forecasting and time series analysis. McGraw-Hill, New York.

Newbold, P., and Granger, C.W.J. 1974. Experience with forecasting univariate time series and the combination of forecasts: Journal of the Royal Statistical Society A 137: 131-165.

Ohl, S. 1995. Analysis of the behavior of new car buyers. In: H.H. Bock and W. Polasek (eds.): Data analysis and information systems, statistical and conceptual approaches, studies in classification, data analysis, and knowledge organization, 197-207. Springer, Heidelberg.

Ohl, S. 1996. Forecasting virtual customer orders vs.forecasting combinations of options. Forthcoming.

Quinlan, R. 1986. Induction of decision trees: Machine Learning 1: 81-106.

Quinlan, R. 1987. Simplifying decision trees: International Journal of Man-Machine Studies 27: 221-234.

Quinlan, R. 1992. Learning with continuous classes. In: Proceedings of the 5th Australian Joint Conference on Artificial Intelligence. World Scientific, Singapore.

Rehkugler, H., and Poddig, T. 1990. Statistische Methoden vs. künstliche Neuronale Netzwerke zur Aktienkursprognose: eine vergleichende Studie. Discussion paper, University of Bamberg.

Reid, D.J. 1975. A review of short-term projection techniques. In: Gordon, H. (ed.): Practical aspects of forecasting. Operational Research Society, London.

Ripley, B.D. 1992. Statistical aspects of neural networks. University of Oxford.

Rumelhart, D.E., Hinton, G.E., and Williams, R.J. 1986. Learning representations by back-propagating errors: Nature 323: 533-536.

Rumelhart, D.E., and McClelland, J.L. 1986. Parallel distributed processing. Vol. I & II. MIT Press, Cambridge.

Schwarze, J., and Weckerle, J. 1982. Prognoseverfahren im Vergleich. Technical report, University of Braunschweig.

Schumann, M., and Lohrbach, T. 1992. Artificial neural networks and ARIMA models within the field of stock market prediction: a comparison. In: Preliminary Proceedings, BANKAI Workshop on Adaptive Intelligent Systems, Belgium, S.W.I.F.T.

Stegemann, B. 1995. Dampf in allen Klassen: Auto Motor Sport 3: 8-13.

Steurer, E. 1994. Prognose von 15 Zeitreihen der DGOR mit Neuronalen Netzen und Vergleich mit den Methoden des Verfahrensvergleichs der DGOR aus dem Jahre 1982: Technical Report, Daimler-Benz AG, Ulm.

Stone, M. 1974. Cross-validatory choice and assessment of statistical predictions: Journal of the Royal Statistical Society 36: 111-133.

Theil, H. 1971. Applied Economic Forecasting. North-Holland, Amsterdam.

Thiele, J. 1993. Kombination von Prognosen. Physika, Heidelberg.

Weigend, A.S., and Gershenfeld, N.A. 1994. Time series prediction: Forecasting the future and understanding the past. Addison-Wesley, Reading.

Werbos, P. 1974. Beyond regression. Ph.D. Thesis, Harvard University.

Winters, P.R. 1960. Forecasting sales by exponentially weighted moving averages: Management Science 6: 324-342.

Wolpert, D.H. 1992. Stacked generalization: Neural Networks 5: 241-259.
