
International Journal of Forecasting 23 (2007) 391–404
www.elsevier.com/locate/ijforecast

The process of using a forecasting support system

Paul Goodwin a,*, Robert Fildes b, Michael Lawrence c, Konstantinos Nikolopoulos d

a The Management School, University of Bath, Bath, BA2 7AY, United Kingdom
b Department of Management Science, Lancaster University Management School, LA1 4YX, United Kingdom
c School of Information Systems, Technology and Management, University of New South Wales, Sydney 2052, Australia
d Manchester Business School, The University of Manchester, Booth Street West, Manchester M15 6PB, United Kingdom

* Corresponding author. Tel.: +44 1225 383594; fax: +44 1225 826473. E-mail address: [email protected] (P. Goodwin).

Abstract

The actions of individual users of an experimental demand forecasting support system were traced and analyzed. Users adopted a wide variety of strategies when choosing a statistical forecasting method and deciding whether to apply a judgmental adjustment to its forecast. This was the case, despite the users reporting similar levels of familiarity with statistical methods. However, the analysis also revealed that users were very consistent in the strategies that they applied across twenty different series. In general, the study found that users did not emulate mechanical forecasting systems, in that they often did not choose the forecasting method that provided the best fit to past data. They also tended to examine only a small number of methods before making a selection, though they were likely to examine more methods when they perceived the series to be difficult to forecast. Individuals who were relatively unsuccessful in identifying a well fitting statistical method tended to compensate for this by making large judgmental adjustments to the statistical forecasts. However, this generally led to forecasts that were less accurate than those produced by people who selected well fitting methods in the first place. These results should be of particular interest to designers of forecasting support systems, who will typically have some stylised representation of the way that users employ their systems to generate forecasts.
© 2007 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

Keywords: Judgmental forecasting; Forecasting support system; Forecaster behaviour; Forecasting tasks

1. Introduction

Almost all research into judgmental forecasting has focussed on groups of forecasters, as opposed to individuals. For example, typical research questions are: "Does the use of method A lead, on average, to more accurate judgmental forecasts than method B?" (e.g. Goodwin, 2000; Sanders, 1997; Webby, O'Connor, & Edmundson, 2005), or "Will judgmental forecasters be on average more accurate if condition X applies in the environment rather than condition Y?" (e.g. Lawrence & Makridakis, 1989; O'Connor, Remus, & Griggs, 1993). In addition, some studies have pooled data on individuals in order to develop models of the processes that the average forecaster adopts when making forecasts (e.g. Bolger & Harvey, 1993; Goodwin, 2005; Lawrence & O'Connor, 1992). Pooling has been used in particular in the examination of financial analysts' forecasts (see, for example, Easterwood & Nutt, 1999). While many of these studies have produced results which are of great potential value in understanding the behaviour of forecasters and improving accuracy (Armstrong, 2001), they have usually also found that there are considerable differences between individual forecasters, both in the forecasting strategies they employ and in the resulting accuracies they obtain. Thus, although a particular method may, on average, improve judgmental forecasting accuracy in a particular context, there is no guarantee that it will work for a given individual. Indeed, Stewart (see Ayton, Ferrell, & Stewart, 1999) has pointed out that, while much research has assumed that judgmental processes are universal and are independent of the individual, research should be conducted which makes the opposite assumption and fully recognises the importance of the individual.

In contrast to the research focussing on 'typical' or 'average' forecasters, researchers into the design of decision support systems (DSSs) have emphasised the differences between individuals. For example, Sauter (1997, p. 30), citing the work of Mintzberg (1990), argues that "first and foremost, different decision-makers operate and decide in very different ways… as a result decision support systems must also be designed to allow users to do things their way… (DSSs) must include substantial flexibility in their operations". However, Sauter argues that not only should DSSs allow decision-makers to act in their own way, they should also provide appropriate support and guidance to individuals in the selection of models and data in the choice process. For example, novice decision-makers, in particular, might benefit from warning messages and suggestion boxes.

This suggests that software designers who are responsible for producing commercial systems that support the forecasting task (forecasting support systems, FSSs) should also consider the ways in which individuals use their systems to produce forecasts. A mismatch between the software designer's model of how a system will be used and the actual use is likely to impair the system's functionality. However, little is known about the diversity of strategies that users adopt, the effect of these strategies on forecast accuracy, or the extent to which individuals are consistent in their employment of particular strategies.

This paper describes a study that recorded all of the keyboard and mouse actions of the users of a forecasting support system, in order to address the following research questions:

1) How do individuals carry out the task of selecting a statistical forecasting method, and what influences their propensity to apply judgmental adjustments to the resulting forecasts?

2) How much variation is there between individuals' strategies, and is this variation sufficient to justify designing systems that vary in the support that they provide to individuals?

3) Are particular strategies associated with inaccurate forecasting?

4) How consistent over time are individuals in applying a given strategy?

5) Can a broad classification of strategies be identified?

The paper is structured as follows. First, other studies which have considered the implications of individual differences for decision support are reviewed. Then the forecasting task and the computerised system that was designed to support it are described. An analysis of how the subjects used the system is then used to answer the research questions. Finally, the implications of this analysis for FSS design are discussed.

2. Individual differences and decision support

The decision support literature of the 1970s and early 1980s recognised that individuals differ in their approach to decision making, and hence are likely to have different needs when using a decision support system (e.g. Bariff & Lusk, 1977; Benbasat & Taylor, 1978; Dickson, Senn, & Chervany, 1977; Driver & Mock, 1975). Zmud (1979) identified three main attributes of decision-makers that impacted on their use of decision support systems: (i) cognitive style (i.e., an individual's characteristic and consistent approach to organising and processing information in solving problems and making decisions; see Alavi & Joachimsthaler, 1992; Tennant, 1988), (ii) personality (e.g., need for achievement, dogmatism and risk-taking propensity), and (iii) demographic/situational variables (e.g., age, sex, education, training, and experience). A key belief inherent in these papers was that the acceptability and subsequent use of a system could be predicted from these attributes. This implied that a system could be tailored to provide appropriate facilities that complemented, reinforced, or attempted to correct the user's decision making process, and also improved the likelihood that the system would be accepted in the first place. However, Huber (1983) questioned the value of using cognitive style as a basis for system design. He argued that the literature on cognitive style lacked both an adequate underpinning theory and reliable and valid instruments for measuring cognitive style. Huber therefore concluded that further research into cognitive style was unlikely to lead to improvements in the design of systems. A meta-analysis by Alavi and Joachimsthaler (1992) also suggested that cognitive style and personality were far less important in DSS use than user-situational variables like training, experience, and involvement in the design and implementation of the system. These researchers found that few studies had investigated the effect of demographic factors on the use of DSSs.

However, there is another approach to tailoring the response of a support system to an individual. This does not involve using precursor variables to anticipate future use. Instead, it involves directly monitoring the actions of individuals using the system in order to identify patterns that might signal either functional or dysfunctional use, and the need for particular types of support. Such an approach is likely to be particularly relevant for forecasting support systems because the use of such systems usually involves carrying out similar tasks many times (thus allowing data to be generated about how the task is being approached). In addition, in forecasting, there exists a set of research-based principles that provide guidance on how the task should be carried out (Armstrong, 2001). For example, research indicates that forecast accuracy tends to be reduced when forecasters judgmentally adjust statistical forecasts if they are not in possession of important new information about exceptional events (Sanders & Ritzman, 2001). Whether such a pattern of actions is associated with particular cognitive styles or with personality or demographic variables is not known. Nevertheless, support mechanisms designed to reduce the propensity of forecasters to make such adjustments could still be evoked if the pattern were detected. For example, they could involve the provision of guidance (Silver, 1991), or the requirement to record a reason for the adjustment (Goodwin, 2000). The rest of the paper investigates the extent to which such patterns are detectable and specific to individual system users.

3. The experiment

3.1. The task and system

The experimental task was designed to replicate the tasks found in many supply-chain based companies where computer systems are used to obtain product demand forecasts from time series data. In performing the task, forecasters have an opportunity either to choose a statistical forecasting method and its associated parameter values, or to allow the system to identify the 'optimum' statistical method automatically. Note that in many companies, these forecasts are subsequently modified at review meetings of managers, ostensibly to take into account market intelligence (Goodwin, Lee, Fildes, Nikolopoulos, & Lawrence, 2006). We do not consider this aspect of forecasting here.

The PC based support system was identical to the 'high participation' system employed by Lawrence, Goodwin, and Fildes (2002). In this system, subjects were able to choose their own forecasting methods and associated parameter values, in contrast to a 'low participation' system where they would simply have been presented by the computer with an automatically determined forecast. Subjects running the program first supplied answers to a questionnaire (see Appendix A). The subsequent screen displayed a brief explanation of the experimental task, before subjects engaged in a practice run of the program to familiarise themselves with its facilities. They then used the FSS to estimate one-step-ahead forecasts for each of 20 monthly sales time series. Because the focus of this study is on individual differences, all of the forecasters faced the same forecasting task under the same conditions, although the time series were presented in a random sequence in order to remove order effects. The instruction sheet told the subjects that they were to act as product managers responsible for developing a monthly forecast for 20 key products. At the conclusion of the experiment, another questionnaire was completed by the subjects.

3.2. About the subjects

The subjects were 32 management students at Lancaster University, all of whom were taking a course in forecasting. As an incentive, a monetary reward of £5 was given to the 50% of subjects who produced the most accurate forecasts. The subjects indicated that they had some familiarity with statistical forecasting methods of the type available in the FSS and with methods of technique selection. Their average rating across the four questions designed to assess their familiarity with these topics on a 1 to 3 scale (1 = not at all familiar, 2 = some familiarity, 3 = very familiar) was 2.2 (there was little individual variation here — only 6 subjects rated their familiarity outside the range of 2 to 2.5 on the 1 to 3 scale). This may exceed the levels of familiarity of forecasting personnel in many companies (Fildes & Hastings, 1994; Watson, 1996). Only a third of the subjects made use of the five help buttons available on the FSS which gave a brief explanation of each forecasting method. Those who used the help facility did so just over 3 times on average while operating the FSS (including the practice run).

Fig. 1. A typical screen shot from the FSS.

3.3. Generation of the time series

The 20 time series were artificially generated to simulate the non-seasonal demand patterns experienced by supply-chain based companies. They included:

i) four series without a systematic trend,
ii) two series with an upward and two with a downward local linear trend,
iii) two series with damped trends,
iv) two series with a reversal of the trend direction,
v) two random walks,
vi) two white noise series with step changes in the underlying mean, and
vii) four irregular series, which were generated by making changes to the underlying level or trend of the series at random time periods.

The noise associated with a given series was independently sampled from a normal distribution with a mean of zero and a standard deviation of either 15 or 45, yielding eleven 'low noise' series and nine 'high noise' series, respectively. Graphs of two of these series can be seen in Figs. 1 and 2.
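The paper describes these generating mechanisms only verbally. A minimal Python sketch of how series of these kinds could be simulated is given below; the function names, levels, slopes and change points are illustrative assumptions rather than the authors' actual generator, and only the noise specification (mean zero, standard deviation 15 or 45) and the 20-month length are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
N_OBS = 20  # months of history shown to subjects


def flat_series(level=500.0, noise_sd=15):
    """Series without a systematic trend: constant level plus Gaussian noise."""
    return level + rng.normal(0.0, noise_sd, N_OBS)


def trended_series(level=500.0, slope=10.0, noise_sd=15):
    """Local linear trend (positive slope = upward, negative = downward)."""
    t = np.arange(N_OBS)
    return level + slope * t + rng.normal(0.0, noise_sd, N_OBS)


def step_change_series(level=500.0, shift=80.0, change_at=10, noise_sd=45):
    """White noise with a step change in the underlying mean."""
    signal = np.where(np.arange(N_OBS) < change_at, level, level + shift)
    return signal + rng.normal(0.0, noise_sd, N_OBS)


def random_walk(start=500.0, noise_sd=15):
    """Random walk: cumulative sum of independent noise terms."""
    return start + np.cumsum(rng.normal(0.0, noise_sd, N_OBS))


# Illustrative draws of four of the pattern types listed above.
series = [flat_series(), trended_series(slope=-8.0), step_change_series(), random_walk()]
```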

3.4. The forecasting support system

Fig. 1 shows a typical screen display seen by subjects. The FSS presented a graph of the sales of the product over the previous 20 months, and the user had several decisions to make in order to obtain a provisional forecast for month 21. These decisions related to the statistical forecasting method to be used and the parameter value(s) to employ with a given method. Such choices are typically available and exercised by company forecasters using forecasting systems (Goodwin et al., 2006).

a) Choice of forecasting method. Ten methods were available in the system: exponential smoothing, Holt's method, damped Holt's method, Naïve 1, and an average of any two of these methods (Makridakis, Wheelwright, & Hyndman, 1998). A brief explanation of each method, and an indication of the circumstances where its application was most appropriate, could be obtained by clicking the mouse. Subjects had the option of asking the system to automatically identify the method which gave the closest fit (i.e. the minimum mean squared error) to the past data in the presented series and to produce a forecast using this method. Alternatively, they could choose their own method from those available.

b) How the parameters of each method were obtained. Subjects could choose to use the default parameter values preset by the FSS, or they could ask the system to identify the parameter values that had yielded the most accurate one-month-ahead sales forecasts over the previous 20 months. A further choice was available here: the system could be asked to identify either the parameter values that minimised the mean absolute error (MAE) of these previous forecasts, or the values that minimised the mean squared error (MSE). The subjects were informed that the MSE penalises large forecast errors more severely.
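The FSS's own optimisation routine is not published. The sketch below shows one plausible way of implementing this choice for simple exponential smoothing: a grid search over the smoothing constant that minimises either the MSE or the MAE of the one-step-ahead forecasts over the 20 months of history. The grid, the initialisation and the sales figures are illustrative assumptions.

```python
import numpy as np


def ses_one_step_forecasts(y, alpha):
    """One-step-ahead forecasts from simple exponential smoothing.
    forecasts[t] is the forecast of y[t] made with data up to t-1;
    the final element is the forecast for the next (unseen) month."""
    forecasts = np.empty(len(y) + 1)
    forecasts[0] = y[0]  # initialise with the first observation
    for t in range(len(y)):
        forecasts[t + 1] = alpha * y[t] + (1 - alpha) * forecasts[t]
    return forecasts


def best_alpha(y, criterion="MSE", grid=np.linspace(0.05, 0.95, 19)):
    """Grid-search the smoothing constant that minimises the chosen
    in-sample one-step-ahead error measure (MSE or MAE)."""
    def loss(alpha):
        errors = y - ses_one_step_forecasts(y, alpha)[:-1]
        return np.mean(errors ** 2) if criterion == "MSE" else np.mean(np.abs(errors))
    return min(grid, key=loss)


# Illustrative 20 months of sales, not data from the experiment.
sales = np.array([512, 498, 530, 541, 525, 560, 548, 575, 590, 571,
                  605, 598, 620, 615, 640, 652, 637, 660, 671, 655], dtype=float)
alpha = best_alpha(sales, criterion="MAE")
forecast_month_21 = ses_one_step_forecasts(sales, alpha)[-1]
```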

When the subject had made these choices, the forecast for month 21 using the selected method was superimposed on the sales graph, together with the method's forecasts for the previous 20 months. At this point the subject could decide whether: i) to accept this forecast and move on to the next product; ii) to make a judgmental adjustment to the forecast (using the mouse to indicate on the graph where the adjusted forecast should be); or iii) to try out an alternative statistical method. In the last case, a record of the month 21 forecasts of all methods previously examined for the current product was available at the click of a command button.

Fig. 2. FSS screen for eliciting the user's confidence in a forecast.

Table 1
The frequency of selection of the different methods available on the system

Method                                     % of time selected
'Optimal' method recommended by system      9.2
Simple exponential smoothing               14.4
Holt's method                              22.3
Damped Holt's method                       11.9
Naïve 1                                    10.3
Average of two of the methods              31.9
TOTAL                                     100.0

When the subjects had finally determined the forecast for a given product, the FSS displayed a prediction interval around it (see Fig. 2). The upper and lower limits of the interval were set to be the forecast plus and minus the noise standard deviation. Subjects were then asked to indicate, on a scale from 0 ('no confidence at all') to 10 ('complete confidence'), how confident they were that the interval would contain the actual sales value for month 21.
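The interval construction itself is simple arithmetic; a minimal sketch, assuming the noise standard deviation of the generating process (15 or 45) is known to the system:

```python
def prediction_interval(forecast, noise_sd):
    """Interval displayed by the FSS: the point forecast plus and minus
    one noise standard deviation."""
    return forecast - noise_sd, forecast + noise_sd


lower, upper = prediction_interval(660.0, 45)  # e.g. a 'high noise' series
```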

3.5. The questionnaires

As indicated above, subjects completed a pre-experiment questionnaire that was designed to measure their familiarity with forecasting techniques and concepts. Later on, they also completed a post-experiment questionnaire that was intended to elicit their perceptions of the ease of use, usefulness and trustworthiness of the FSS tool that they had just been using (see Appendix A). The design of the questionnaire enabled scores to be given for the constructs 'ease of use', 'usefulness' and 'satisfaction' using the transformations shown in Appendix B. Both questionnaires were presented on the computer screen, and subjects used the mouse to indicate their responses.

3.6. The tracing method

The actions of the individuals using the program were traced by recording every key stroke and mouse operation. The advantages of using computerised process tracing in order to develop an understanding of how decision-makers use decision support systems have been discussed by Cook and Swain (1993). These include the unobtrusive nature of the tracing tools. To date, the method has never been used to study the processes used by forecasters.

4. How individuals made their forecasts

The process by which individuals used the FSS to arrive at each sales forecast can be characterised by four features:

i) how well the forecasts of the chosen statistical forecasting method fitted the past sales data;
ii) the number of times the individual fitted a statistical method to the past sales series, before choosing the method that they thought was appropriate for their forecasts;
iii) whether they decided to apply a judgmental adjustment to the forecast of their chosen method;
iv) the size of any judgmental adjustment that was applied.

4.1. The fit of the chosen method's forecasts to past sales data

Table 1 shows the frequency with which the different methods available in the system were chosen by subjects for making their forecasts. Note that the simple averaging involves 6 possible combinations of the other methods, while the 'Optimal' method varied, depending on the nature of the series.

It is particularly interesting to note that an average of the forecasts of two methods was the most popular choice.

The subjects could assess the fit of the forecasts of their chosen method to past observations by looking at the graph (see Fig. 2). Most purely mechanical forecasting systems choose statistical methods, and their parameter values, on the basis of this fit to past data. However, in general, there was little evidence that subjects were either able or willing to emulate this. Recall that an "optimise" button was available, to indicate the best fitting method for the past 20 observations, but this was used to obtain only 14.1% of the forecasts examined, and only 9.2% of the forecasting methods finally chosen (just over 14% of these 'optimised' forecasts were subsequently judgmentally adjusted). There is evidence in the literature that the fit of forecasting methods to past data is often only weakly correlated with the ex ante accuracy of these methods (for example, Fildes & Makridakis, 1988, indicated a correlation of only 0.25 for short forecasting horizons). In practice, this weak correlation might result from noise and the effect of external events (like product promotion campaigns) that were not taken into account by the forecasting methods. In this experiment there were no external events to impact on demand. Choosing a method with a close fit to the past data would therefore have been an appropriate strategy for most of the series.

Several of the statistical forecasts chosen were based on default parameter values (e.g., a smoothing constant of 0.1 for simple exponential smoothing), even though the graphical display showed, in many cases, that these forecasts provided a very poor fit to the past time series. For a given series, the MSE of the chosen statistical forecasting method can be compared with that of the method offering the lowest MSE for that series. We will define the ratio of the two MSEs as the overall fit ratio (OVR), i.e.,

overall fit ratio (OVR) = MSE of chosen method / lowest MSE of methods available on the system.

Clearly, where a subject chooses the best fitting method that is available on the FSS for a given series, the fit ratio will be 1.0. The mean fit ratio of the methods selected by the subjects, averaged over all twenty series, was 1.44.

Nevertheless, as Table 2 shows, this overall mean obscures the fact that there were considerable differences between the mean fit ratios obtained by individual subjects.

Table 2
Mean fit ratio

Mean fit ratio       Number of subjects
1.0 to under 1.1      4
1.1 to under 1.2      6
1.2 to under 1.4     11
1.4 to under 2.0      8
2.0 to under 4.0      3
Total                32

When subjects tried several forecasting methods for a given product, they did not always select the best fitting method of those they had examined. This can be seen by considering the ratio of the fit of the chosen method to the best fitting method they saw, i.e.,

fit ratio (over methods seen) = MSE of chosen method / lowest MSE of methods examined.

The mean ratio here was 1.17. Table 3 shows the variation in this ratio across the subjects.

Table 3
Mean fit ratio (over methods seen)

Mean fit ratio (over methods seen)   Number of subjects
1.0                                   4
1.01 to under 1.10                   10
1.10 to under 1.20                    8
1.20 to under 1.30                    3
1.30 to under 1.70                    4
Total                                29 (a)

(a) 3 subjects never considered more than one method.
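Both fit ratios are straightforward functions of the in-sample MSEs; a minimal sketch with illustrative numbers (the MSE values below are hypothetical, not taken from the experiment):

```python
def overall_fit_ratio(mse_chosen, mse_of_all_available_methods):
    """OVR: MSE of the chosen method relative to the best fit available on the FSS."""
    return mse_chosen / min(mse_of_all_available_methods)


def fit_ratio_over_methods_seen(mse_chosen, mse_of_methods_examined):
    """Fit ratio relative to the best fitting method the subject actually examined."""
    return mse_chosen / min(mse_of_methods_examined)


# A subject who examined three of the five available methods and chose the second.
mses_available = [150.0, 175.0, 182.0, 210.0, 250.0]  # all methods on the FSS
mses_examined = [210.0, 182.0, 250.0]                 # methods this subject tried
mse_chosen = 182.0

ovr = overall_fit_ratio(mse_chosen, mses_available)                  # 182/150 ≈ 1.21
ratio_seen = fit_ratio_over_methods_seen(mse_chosen, mses_examined)  # 182/182 = 1.0
```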

4.2. Number of methods examined

On average, the number of statistical methods that subjects examined before selecting a method was 2.6, though as Table 4 shows, there were again considerable individual differences.1 Perhaps because of fatigue or increasing familiarity with the task, subjects tended to try more methods for the ten earliest series that they saw (an average of 3.2 methods per series compared to 2.0 for the second ten seen, t = 4.43, p < 0.001). However, because the product series were presented to subjects in random order, this did not have any effect on the forecasts of particular series (the correlation between the mean position in which the series was displayed (1 = first series displayed, 20 = last) and the number of methods tried was only −0.055).

1 This figure includes times when subjects revisited a method that they had already tried on a given series.

Table 4
Mean number of methods tried per series

Mean no. of methods tried per series   Number of subjects
1.0                                     3
1.1 to under 2.0                       10
2.0 to under 3.0                        6
3.0 to under 5.0                       10
5.0 or more                             3
Total                                  32

The number of statistical forecasting methods tried by subjects varied significantly across the 20 series. The hypothesis that the number of methods tried was uniformly distributed across the series was rejected at p < 0.001 (χ²(19) = 55.6). Why did subjects try out more methods on some series than others? One possibility is the difficulty of modelling the patterns of particular series — the more difficult the pattern, the more methods people will try. The MSE of the 'optimal' available method was used as a proxy for the difficulty of modelling the pattern, and this was significantly correlated with the number of methods tried (r = 0.46, p < 0.05). However, the levels of sales in the different series varied considerably, so that they appeared on graphs with different vertical scales. As a result, a forecast error of 10 units in one series might appear to subjects to represent a larger gap than an error of 100 units in another series. To take this into account, the MSE of the 'optimum' available method was divided by the length of the graph's vertical axis (maximum sales − minimum) for each series, so that the resulting measure reflected the lack of fit of the statistical forecast as it might be perceived by the subjects. As hypothesised, this led to a higher correlation with the number of methods tried (r = 0.56, p < 0.01). Thus, although subjects in general did not tend to choose the best fitting methods, the perceived lack of fit of the methods they tried may have spurred them on to try further methods.2

2 Other variables which might have explained the variation between series in the number of methods tried were also investigated. These included whether the optimum forecast had the same slope as the last segment slope of the series (the assumption being that subjects might be searching for a method yielding a forecast that extrapolated the last segment), and the complexity of the series measured on a scale using a score of 1 for a 'flat' underlying pattern, 2 for a trend and 3 for an erratic underlying pattern. All of these variables yielded correlations that were close to 0.
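A sketch of the 'perceived lack of fit' scaling described in the paragraph above, using illustrative per-series values rather than the experimental data:

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative per-series values: in-sample MSE of the best-fitting ('optimal')
# method, the graph's vertical range, and the mean number of methods tried.
mse_optimal = np.array([180.0, 95.0, 410.0, 260.0, 150.0])
axis_range = np.array([300.0, 250.0, 420.0, 380.0, 280.0])  # max sales - min sales
methods_tried = np.array([2.1, 1.8, 3.6, 2.9, 2.2])

perceived_lack_of_fit = mse_optimal / axis_range  # scale the MSE by the plotted range
r, p = pearsonr(perceived_lack_of_fit, methods_tried)
```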

4.3. The consequences of trying more methods

In general, series which had more statistical methods applied to them were forecasted less accurately (the correlation between the number of methods tried and the mean absolute percentage error (MAPE) of the chosen statistical forecast was 0.46, p < 0.05, while the correlation with the MAPE of the final – possibly judgmentally adjusted – forecast was 0.47, p < 0.05). This is perhaps not surprising, as the above analysis suggests that more methods were tried on the more-difficult-to-forecast series. (Note that, in common with many forecasting packages, we used the MSE to measure the fit of the forecasts to past data. However, when measuring the ex ante accuracy of forecasting methods over a number of series, a relative measure like the MAPE is appropriate, given the different scales that apply to the time series.)

Did people who tried more forecasting methods than their fellow subjects obtain more accurate final forecasts? In general, the answer was 'no': the correlation between the total number of methods subjects tried and the MAPE of their chosen statistical forecasts was −0.005, while the correlation between the number of methods and the final (possibly judgmentally adjusted) forecasts was 0.192. Did trying more methods lead to a mean overall fit ratio closer to 1.0? There was no evidence for this (r = 0.12). Nor was there any evidence that subjects who tried more methods adjusted fewer forecasts (r = −0.164), or that they were more confident in their final forecasts (r = 0.159). However, it is important to note that these correlations also mask some interesting differences between subjects which will be explored later.

4.4. Adjustment behaviour

Why do subjects decide to apply a judgmental adjustment to particular statistical forecasts, while leaving other forecasts unadjusted? This can be investigated both in terms of the characteristics of the different series, and in relation to the behaviour of the different subjects.

There was no significant difference in the number of adjustments applied to the 20 different series (χ²(19) = 9.89, not significant), suggesting that the propensity to adjust could not be explained by the characteristics of particular series. When series were categorised as being either i) 'consistently' flat, ii) 'consistently' trended, or iii) subject to change (e.g. involving a step change or trend reversal), there was weak evidence that the size of the adjustment, as measured by the mean absolute percentage adjustment (MAPA), was greater for the third category of series. However, this difference was not quite significant (Kruskal–Wallis test: p = 0.064).

Many judgmental adjustments made by subjects were relatively small: half of the absolute percentage adjustments were below 2.26%. However, subjects who chose statistical forecasts that provided relatively poor fits to the past observations did tend to adjust their forecasts more often and make bigger adjustments. The correlation between the mean 'overall' fit ratio of the forecasting methods chosen by subjects and the number of adjustments they applied was 0.467 (p < 0.01), while the correlation between subjects' mean 'overall' fit ratios and their mean absolute percentage adjustments (MAPA) was 0.638 (p < 0.001).3 All of this suggests that subjects who could only obtain poorly fitting statistical methods recognised the inadequacy of their statistical forecasts and tried to compensate by applying judgmental adjustments to them. There is some evidence from the literature that judgmental forecasters can recognise forecasts that are in need of adjustment, even when they only have access to time series information (Willemain, 1989).

3 One outlying observation may have over-influenced this correlation coefficient, but after removing it the correlation is still significant (r = 0.427, p < 0.05).

Other factors that might have explained the subjects' propensities to adjust forecasts did not yield significant associations. For example, before starting the experiment, subjects were asked to indicate (on a five-point scale) their strength of agreement with the statement that "statistical forecasts are less important than human judgment". The correlation between their strength of agreement with this statement and the number of judgmental adjustments they made was only 0.026.

4.5. Consequences of adjustments and lack of fit

To investigate the effect of the size of the adjustments and the overall fit ratio on forecast accuracy, the following multiple regression model was fitted to the data:

MAPE = 5.21 + 1.29 OVR + 0.714 MAPA − 0.307 (OVR × MAPA),   R-squared = 28.1%,
      (0.000)  (0.066)     (0.009)      (0.036)

where OVR = the overall fit ratio of their selected statistical method compared to the best method, and MAPA is the mean absolute percentage adjustment. The p-values for the regression coefficients are shown in brackets.

Some predictions of the model are shown below for four illustrative combinations of overall fit ratios and MAPAs that fell within the range of those observed.

Overall fit ratio   MAPA   Predicted MAPE
1.0                 0%     6.5%
1.0                 5%     8.5%
2.0                 0%     7.8%
2.0                 5%     8.3%
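The predicted values in the table follow directly from the fitted equation; a quick check:

```python
def predicted_mape(ovr, mapa):
    """Evaluate the fitted regression: MAPE as a function of the overall fit
    ratio (OVR) and the mean absolute percentage adjustment (MAPA, in %)."""
    return 5.21 + 1.29 * ovr + 0.714 * mapa - 0.307 * ovr * mapa


for ovr, mapa in [(1.0, 0.0), (1.0, 5.0), (2.0, 0.0), (2.0, 5.0)]:
    print(f"OVR={ovr}, MAPA={mapa}% -> predicted MAPE = {predicted_mape(ovr, mapa):.1f}%")
```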

These predictions indicate that judgmental adjustments tended to reduce accuracy. This is perhaps not surprising when the adjustments were applied to well-fitting methods. However, the predictions show that making large adjustments to poorly fitting methods also tended to reduce accuracy, though this reduction was not as damaging. Thus, it appears that subjects who attempted to compensate for poor-fitting forecasting methods by making relatively large judgmental adjustments to their forecasts generally only succeeded in making their forecasts even less accurate. (Note that the model predicts that large judgmental adjustments (say 5%) will improve accuracy when the fit of the method is extremely poor (e.g., with an OVR of 3 or over), but fits as poor as this were rarely observed in the experiment.)

4.6. Other points

Interestingly, subjects who spent a larger percentage of their total time on the practice run tended to achieve more accurate forecasts (the total time excludes the time spent in answering the questionnaires). The correlation between the percentage of time spent on the practice run and the MAPE was −0.435 (p < 0.02). Similarly, the correlation between the actual time spent on the practice run and the MAPE was −0.365 (p < 0.05). This may reflect the commitment of the individual subjects, or it may indicate that time spent exploring and practising using an FSS is beneficial.

It is also worth noting that the mean confidence levels of subjects across the twenty series (as elicited on a scale of 0 to 10, see Fig. 2) were not significantly correlated with any of the behavioural or performance measures that were examined, including the number of methods tried, the fit of the statistical forecasts, the number of judgmental adjustments made, the mean absolute percentage adjustments, the MAPE of the forecasts, or the total time that was spent on the task.

Table 5
Consistency

The first ten forecasts vs the last ten   Correlation
Mean no. of methods tried per series      0.683***
'Overall' fit ratio                       0.457**
No. of judgmental adjustments             0.784***
Mean absolute % adjustment                0.419*
Canonical correlation = 0.817 (p < 0.0005); Redundancy = 46.5%

* significant at the 5% level; ** significant at the 1% level; *** significant at the 0.01% level.

Table 6
Predicting an individual's strategy from early forecasting behaviour

The first five forecasts vs the last ten   Correlation
Mean no. of methods tried per series       0.606***
'Overall' fit ratio                        0.421*
No. of judgmental adjustments              0.628***
Mean absolute % adjustment                 0.263 (ns)
Canonical correlation = 0.758 (p < 0.0005); Redundancy = 37.2%

The first forecast vs the last ten         Correlation
No. of methods tried per series            0.401*
'Overall' fit ratio                        0.384*
No. of judgmental adjustments #            0.531*
Mean absolute % adjustment                 0.395*
(# can only be 0 or 1 for the first forecast)
Canonical correlation = 0.689 (p = 0.011); Redundancy = 27.4%

* significant at the 5% level; ** significant at the 1% level; *** significant at the 0.01% level.

Finally, as indicated earlier, subjects' perceptions of the 'ease of use' of the FSS, the 'usefulness of the FSS' and their 'assessment of their own performance' were elicited in the post-experiment questionnaire, and scores were constructed for each of these three dimensions. However, the correlations between these scores and the MAPEs achieved by subjects were also very small and not significant. Performance was therefore not associated with the extent to which the FSS was regarded as "easy to use" or "useful". It is particularly noteworthy that subjects' perceptions of their performance were not related to the actual forecasting accuracy that they achieved (r = 0.051). Thus, it seems that subjects could recognise when their forecasting method had a poor fit to past data (which is not surprising given the graph that was presented to them), but they were less successful in anticipating the accuracy of the forecasts that they provided.

5. Were the forecasters consistent?

How consistent were subjects in applying particular strategies? As indicated earlier, consistency would be necessary for an FSS to recognise particular individual traits so that appropriate guidance could be provided. Table 5 shows the correlations between the characteristics of individuals' strategies for the first ten and the last ten series that they forecast. These correlations indicate high levels of consistency, particularly for the mean number of forecasting methods that were examined for each series and for the frequency with which judgmental adjustments were made. The high value of the canonical correlation coefficient, which reflects the correlation across all four characteristics in Table 5, is also indicative of consistency.
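A sketch of how such a consistency analysis could be reproduced, assuming one row per subject and one column per strategy characteristic; the feature matrices below are random placeholders rather than the experimental data:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_subjects = 32
# Columns: mean no. of methods tried, overall fit ratio,
# no. of adjustments, mean absolute % adjustment.
first_ten = rng.normal(size=(n_subjects, 4))                    # placeholder features
last_ten = 0.7 * first_ten + 0.3 * rng.normal(size=(n_subjects, 4))

# Per-characteristic consistency, as in Table 5.
for j in range(4):
    r, p = pearsonr(first_ten[:, j], last_ten[:, j])
    print(f"characteristic {j}: r = {r:.3f}, p = {p:.4f}")

# First canonical correlation across all four characteristics jointly.
cca = CCA(n_components=1).fit(first_ten, last_ten)
u, v = cca.transform(first_ten, last_ten)
canonical_r = np.corrcoef(u[:, 0], v[:, 0])[0, 1]
print(f"canonical correlation = {canonical_r:.3f}")
```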

Another interesting question relates to how many forecasts an FSS would need to record before the characteristics of an individual strategy could be discerned. Table 6 suggests that reasonable predictions of an individual's strategy can be based on as few as five forecasts, and that even a prediction based on only one forecast has some value.

Another aspect of consistency relates to the extent to which subjects applied the same forecasting methods across the different series. Such consistency would be expected to lead to poor forecasting performance given the varied characteristics of these series. The number of different methods used by subjects varied from 3 to 5, with a mean of 4.5 (here the average of any two methods is treated as one method, so the maximum number of available methods is 5). This suggests that they were prepared to adapt their choice of method when presented with a new series.

6. Characterising forecaster behaviour: analysis of subjects by sub-groups

When designing an FSS, the system designer will typically have some stylised representation of the (potential) client and the tasks they expect to undertake. It is therefore valuable to classify the types of strategies that subjects used, together with the effectiveness of the different approaches. In particular, this allows the effects of interactions of features of these strategies to be studied. For example, there may be no relationship in general between the number of statistical methods tried and the accuracy of the resulting forecasts. However, for some subjects, trying a large number of methods may be an essential part of an effective forecasting strategy, in that it is used to explore and gain insights into the forecasting problem before making a commitment to a forecast. In other cases trying a large number of methods may be symptomatic of 'thrashing around' in desperation in an attempt to find an acceptable model. Cluster analysis, using Ward's method followed by k-means clustering, was used to group the subjects according to five variables: the number of methods they tried on the series, the mean overall fit ratio of the method they finally chose, the number of judgmental adjustments they made, their mean absolute percentage adjustment (averaged over occasions when they adjusted) and the total time they spent on the task, including the practice run. The variables were standardized, and, as recommended by Sharma (1996), for example, several other clustering methods were also used and the results compared. One subject would not fit easily into any of the clusters, and so was removed from the analysis; this demonstrates the difficulty of trying to find a categorization of user-types that includes all possible users. From this cluster analysis, three groups were identified, and for ease of reference, names were assigned to them. These groups are described below and the data relating to them is summarised in Table 7.

Table 7
Characteristics of the three groups of forecasters

Attribute                                 Exemplars   Sceptics   Searchers
Median total time on task (min)           32.7        17.3       35.0
Mean % of time spent on the trial run     33.1        21.0       22.0
Mean no. of methods tried per series      2.8         1.8        3.7
Mean 'overall' fit ratio                  1.2         1.4        1.9
Mean no. of adjustments over 20 series    1.4         9.3        6.0
Mean absolute % adjustment (a)            0.9         5.5        7.9
MAPE                                      6.7         7.5        8.5

Note that all differences were significant at p < 0.05 when tested using a Kruskal–Wallis ANOVA test, but that the test is 'opportunistic' when the variable is used in the cluster analysis.
(a) Averaged only over forecasts where an adjustment was made.
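A sketch of the clustering step as described (standardise the five variables, cut a Ward hierarchy into three groups, then refine with k-means started from the Ward group centroids); the subject-by-variable matrix here is a random placeholder rather than the experimental data:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, ward
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Columns: methods tried, overall fit ratio, no. of adjustments,
# mean absolute % adjustment, total time on task (placeholder values).
X = rng.normal(size=(31, 5))              # 31 subjects after removing the outlier
Z = StandardScaler().fit_transform(X)     # standardise the five variables

# Step 1: Ward's hierarchical clustering, cut into three groups.
ward_labels = fcluster(ward(Z), t=3, criterion="maxclust")

# Step 2: k-means, initialised from the Ward cluster centroids.
centroids = np.vstack([Z[ward_labels == g].mean(axis=0) for g in (1, 2, 3)])
groups = KMeans(n_clusters=3, init=centroids, n_init=1).fit(Z).labels_
```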

6.1. Group 1: “The Exemplars”

Fifty-five percent of subjects (excluding the outlier) were assigned to this group, which achieved the most accurate forecasts. Table 7 shows that they had the 'best' mean overall fit ratio, made the fewest adjustments and, when they adjusted, they made the smallest absolute percentage adjustments. Interestingly, this group also spent the largest percentage of the total time using the FSS on the practice run, where they were able to learn about the support system's facilities.

6.2. Group 2: “The Sceptics”

Twenty-nine percent of subjects were assigned to this group, which had the least engagement with the statistical facilities of the forecasting support system, suggesting a degree of scepticism. Table 7 shows that their approach was characterised by spending the shortest time on the practice run and exploring the fewest forecasting methods for each series, but making the most judgmental adjustments, which were also, on average, relatively large. The result of all of this was that their 'final' forecasts were less accurate than those of Group 1.

6.3. Group 3: “The Searchers”

This relatively small group (16% of subjects) produced the least accurate forecasts. Despite exploring the largest number of forecasting methods per series in a search for an appropriate method, they achieved the 'poorest' overall mean fit ratio and made the largest judgmental adjustments to their statistical forecasts. There was evidence that these forecasters tended to 'cycle' between forecasting methods — often returning several times to methods that they had already investigated. This suggests that there may be merits in providing a memory support facility in FSSs to enable users to access a record of results that they have already obtained (Singh, 1998). Oddly, this group had a significantly higher level of disagreement (p = 0.002) than the other groups with the statement: "Statistical forecasts are less important than human judgment".

Surprisingly, there were no significant differences between the three groups in their satisfaction with the FSS or in their ratings of its ease of use and usefulness.


7. Conclusions

The extent to which inferences can be drawn from this study may be constrained by the fact that this was a laboratory study involving management students. Although researchers like Remus (1986) have indicated that student subjects can act as good surrogates for managers in experiments, the laboratory environment meant that the acceptability and use of the system under working conditions was not assessed. It is planned to address this issue in future research.

Nevertheless, the study has shown that there can be considerable variation in the approaches people adopt when using a forecasting support system (FSS). This can occur even where these people indicate that they have similar levels of familiarity with the methods available in the system.

The study also found that most people did not tend to emulate mechanical forecasting systems by choosing the best fitting forecasting method. Indeed, they seldom made use of the opportunity to identify the best fitting method, even though this was available at the simple touch of a button. There are several possible reasons for this. All of the subjects were taking a course in forecasting, and they may have been aware that an optimal fit of a method to past data does not guarantee optimal forecasts. Alternatively, they may have striven to find a method that could forecast the noise in the series, in addition to the underlying signal (Harvey, 1995). It is also possible that they understood the rationale of their own judgment better than the rationale underlying the 'advice' offered by the automatic forecast, and hence trusted their own judgment more (Yaniv & Kleinberger, 2000). Perhaps they simply needed to have a sense of ownership of the forecasts (Önkal & Gönül, 2005). Whatever the cause, discriminating between these explanations is beyond the scope of this paper.

People also tended to examine only a small number of methods before making a selection; however, they were likely to examine more methods when they perceived the series to be difficult to model. Those who were relatively unsuccessful in identifying a well fitting statistical method tended to compensate for this by making large judgmental adjustments to the statistical forecasts. However, this combination of a poor-fitting forecasting method and judgmental adjustments tended to lead to forecasts that were less accurate than those produced by people who selected well fitting methods in the first place.

A key result of the study is that the users were consistent in their approach throughout the twenty forecasts they made (subject to a general tendency to try fewer methods as time went on). This suggests that adaptive FSSs could be designed to recognise particular strategies at an early stage, enabling the interface to adapt to the particular needs, strengths and weaknesses of these users. For example, the system could highlight information that was not being taken into account sufficiently (such as the lack of fit of the chosen forecasting method or its inability to deal with a trend in the series), and also guide the user towards the selection of more appropriate methods. Interestingly, people who devoted a larger percentage of their time to familiarising themselves with the FSS on a trial set of data tended to achieve more accurate forecasts.

The study has also provided some evidence that the behaviour of sub-groups of forecasters can be identified. This behaviour ranged from people who were able to produce accurate forecasts after examining very few methods, to those who examined many methods and yet obtained relatively inaccurate forecasts. Analysis by sub-group allowed interaction between elements of behaviour to be taken into account. It also permitted us to identify good forecasting strategies. For example, improved accuracy can be obtained by:

• spending time familiarising oneself with the system, and
• examining only a few forecasting methods, and
• choosing a method which provided a good fit to past data, and then
• avoiding making substantial judgmental adjustments.

Other combinations of behavioural features led to inferior accuracy. However, there are clearly limitations to the possibilities of FSSs tailoring their support to sub-groups of forecasters, rather than individuals. Identifying a full range of sub-groups would require studying a larger sample of users than that used in this study, and such a sample would need to be taken from a population that embraced a wider range of possible user-types. For example, the users in this study had similar levels of familiarity with forecasting processes and an identical level of experience in using the system. Moreover, clustering individuals into sub-groups can be sensitive to the methodological choices made during the application of cluster analysis (see e.g., Ketchen & Shook, 1996), and as we found here, there may be some individuals who do not easily conform to any group.

Nevertheless, issues like these are still worth exploring further. While modern commercial forecasting systems provide access to powerful statistical methods and the ability to link easily to large databases, their design has largely neglected the role that the judgment of users plays in the forecasting process. It is therefore to be hoped that the results of studies like this will be of value both to developers of commercial forecasting systems and to fellow researchers in this area.

Acknowledgements

This research was supported by the Engineering and Physical Sciences Research Council (EPSRC) grants GR/60198/01 and GR/60181/01.

Appendices

Appendix A (Questionnaires) and Appendix B (Transformations) are available at http://www.forecasters.org/ijf/data.htm.

References

Alavi, M., & Joachimsthaler, E. A. (1992, March). Revisiting DSS implementation research: A meta-analysis of the literature and suggestions for researchers. MIS Quarterly, 16, 95−116.
Armstrong, J. S. (2001). Principles of forecasting. Boston: Kluwer Academic Publishers.
Ayton, P., Ferrell, W. R., & Stewart, T. R. (1999). Commentaries on "The Delphi technique as a forecasting tool: Issues and analysis" by Rowe and Wright. International Journal of Forecasting, 15, 377−381.
Bariff, M. L., & Lusk, E. G. (1977). Cognitive and personality tests for the design of management information systems. Management Science, 23, 820−829.
Benbasat, I., & Taylor, R. N. (1978). The impact of cognitive styles on information system design. MIS Quarterly, 2, 43−54.
Bolger, F., & Harvey, N. (1993). Context-sensitive heuristics in statistical reasoning. Quarterly Journal of Experimental Psychology, 46A, 779−811.
Cook, G. J., & Swain, M. R. (1993). A computerized approach to decision-process tracing for decision support system design. Decision Sciences, 24, 931−952.
Dickson, G. W., Senn, J. A., & Chervany, N. L. (1977). Research in management information systems: The Minnesota experiments. Management Science, 9, 913−923.
Driver, M. J., & Mock, T. J. (1975). Human information processing, decision style theory, and accounting information systems. Accounting Review, 50, 490−508.
Easterwood, J. C., & Nutt, S. R. (1999). Inefficiency in analysts' earnings forecasts: Systematic misreaction or systematic optimism? Journal of Finance, 54, 1777−1797.
Fildes, R., & Hastings, R. (1994). The organisation and improvement of market forecasting. Journal of the Operational Research Society, 45, 1−16.
Fildes, R., & Makridakis, S. (1988). Forecasting and loss functions. International Journal of Forecasting, 4, 545−550.
Goodwin, P. (2000). Improving the voluntary integration of statistical forecasts and judgment. International Journal of Forecasting, 16, 85−99.
Goodwin, P. (2005). Providing support for decisions based on time series information under conditions of asymmetric loss. European Journal of Operational Research, 163, 388−402.
Goodwin, P., Lee, W. Y., Fildes, R., Nikolopoulos, K., & Lawrence, M. (2006). Understanding the use of forecasting software: An interpretive study in a supply-chain company. University of Bath Management School Working Paper.
Harvey, N. (1995). Why are judgments less consistent in less predictable task situations? Organizational Behavior and Human Decision Processes, 63, 247−263.
Huber, G. P. (1983). Cognitive style as a basis for MIS and DSS: Much ado about nothing? Management Science, 29, 567−579.
Ketchen, D. J., & Shook, C. L. (1996). The application of cluster analysis in strategic management research: An analysis and critique. Strategic Management Journal, 17, 441−458.
Lawrence, M., Goodwin, P., & Fildes, R. (2002). Influence of user participation on DSS use and decision accuracy. Omega, International Journal of Management Science, 30, 381−392.
Lawrence, M. J., & Makridakis, S. (1989). Factors affecting judgmental forecasts and confidence intervals. Organizational Behavior and Human Decision Processes, 42, 172−187.
Lawrence, M. J., & O'Connor, M. J. (1992). Exploring judgemental forecasting. International Journal of Forecasting, 8, 15−26.
Makridakis, S., Wheelwright, S. C., & Hyndman, R. J. (1998). Forecasting methods and applications, 3rd Ed. Chichester: Wiley.
Mintzberg, H. (1990, March/April). The manager's job: Folklore and fact. Harvard Business Review, 163−176.
O'Connor, M., Remus, W., & Griggs, K. (1993). Judgemental forecasting in times of change. International Journal of Forecasting, 9, 163−172.
Önkal, D., & Gönül, M. S. (2005). Judgmental adjustment: A challenge to forecast providers and users. Foresight: The International Journal of Applied Forecasting, 1, 13−17.
Remus, W. (1986). Graduate students as surrogates for managers in experiments on business decision making. Journal of Business Research, 14, 19−25.
Sanders, N. R. (1997). The impact of task properties feedback on time series judgemental forecasting tasks. Omega, 25, 135−144.
Sanders, N. R., & Ritzman, L. P. (2001). Judgmental adjustment of statistical forecasts. In J. S. Armstrong (Ed.), Principles of forecasting (pp. 405−416). Boston: Kluwer Academic Publishers.
Sauter, V. L. (1997). Decision support systems: an applied managerial approach. New York: Wiley.
Sharma, S. (1996). Applied multivariate techniques. New York: Wiley.
Silver, M. S. (1991). Decisional guidance for computer-based support. MIS Quarterly, 15, 105−133.
Singh, D. T. (1998). Incorporating cognitive aids into decision support systems: The case of the strategy execution process. Decision Support Systems, 24, 145−163.
Tennant, M. (1988). Psychology and adult learning. London: Routledge.
Watson, M. (1996). Forecasting in the Scottish electronics industry. International Journal of Forecasting, 12, 361−371.
Webby, R., O'Connor, M., & Edmundson, B. (2005). Forecasting support systems for the incorporation of event information: An empirical investigation. International Journal of Forecasting, 21, 411−423.
Willemain, T. R. (1989). Graphical adjustment of statistical forecasts. International Journal of Forecasting, 5, 179−185.
Yaniv, I., & Kleinberger, E. (2000). Advice taking in decision making: Egocentric discounting and reputation formation. Organizational Behavior and Human Decision Processes, 83, 260−281.
Zmud, R. W. (1979). Individual differences and MIS success: A review of the empirical literature. Management Science, 25, 966−979.