Judgmental Forecasts of Time Series Affected by Special Events: Does Providing a Statistical Forecast Improve Accuracy?

PAUL GOODWIN 1 * and ROBERT FILDES 2

1 Department of Mathematical Sciences, University of the West of England, UK
2 The Management School, Lancaster University, UK

ABSTRACT

Time series found in areas such as marketing and sales often have regular established patterns which are occasionally affected by exogenous influences, such as sales promotions. While statistical forecasting methods are adept at extrapolating regular patterns in series, judgmental forecasters have a potential advantage in that they can take into account the effect of these external influences, which may occur too infrequently for reliable statistical estimation. This suggests that a combination of statistical method and judgment is appropriate. An experiment was conducted to examine how judgmental forecasters make use of statistical time series forecasts when series are subject to sporadic special events. This was investigated under different conditions which were created by varying the complexity of the time series signal, the level of noise in the series, the salience of the cue, the predictive power of the cue information and the availability and presentation of the statistical forecast. Although the availability of a statistical forecast improved judgment under some conditions, the use the judgmental forecasters made of these forecasts was far from optimal. They changed the statistical forecasts when they were highly reliable and ignored them when they would have formed an ideal base-line for adjustment. Copyright © 1999 John Wiley & Sons, Ltd.

KEY WORDS judgmental forecasting; promotions; task information feedback

Statistical time series forecasting methods are designed to measure and extrapolate regular patterns in time series. However, in practice, many series have temporary discontinuities caused by sporadic events, such as promotion campaigns for a product. These events may be so infrequent that the ability of statistical methods to measure their effects is restricted by lack of data and they may therefore be treated as noise. This limitation of statistical methods is one reason for the widespread use of human judgment in business forecasting (Dalrymple, 1987; Kleinmuntz, 1990; Sanders and Manrodt, 1994).

CCC 0894-3257/99/010037-17$17.50 Copyright © 1999 John Wiley & Sons, Ltd. Accepted 20 August 1998

Journal of Behavioral Decision Making, J. Behav. Dec. Making, 12: 37-53 (1999)

* Correspondence to: Paul Goodwin, Department of Mathematical Sciences, University of the West of England, Coldharbour Lane, Frenchay, Bristol BS16 1QY, UK. E-mail: [email protected]

Despite the popularity of judgmental forecasting, research on the accuracy of judgment has suggested that it is subject to cognitive biases and inconsistency (Goodwin and Wright, 1994). Nevertheless, some research (Mathews and Diamantopoulos, 1990; Webby and O'Connor, 1996) has found that judgmental forecasters are able to make effective adjustments to statistical time series forecasts to take into account contextual information (i.e. any information in addition to that contained in the time series). It therefore seems reasonable to hypothesize that the appropriate approach to the problem of forecasting discontinuous time series patterns is a combination of statistical methods and judgment, with the statistical forecast handling the regular time series pattern and the judge making adjustments to this in the light of sporadic events.

Recently, Lim and O'Connor (1996) called for research into how judgmental forecasters perform when non-time series information is available sporadically. They argued that this situation is typical of many practical forecasting tasks. Sporadic contextual information puts special demands on the judgmental forecaster and may therefore have a particular influence on the way that judgment is used:

(1) It requires the judgmental forecaster to be adaptable. For example, it may involve relying only on time series information in some periods, while combining time series and contextual information in others.

(2) The effect of the sporadic events is to increase the complexity of the time series pattern. Unless the judgmental forecaster is able to discount observations for periods when the special event occurred, these observations are likely to be to the detriment of accurate judgmental extrapolations for `normal' periods.

(3) The tendency to rely on a pattern-matching strategy (Hoch and Schkade, 1996) may also be increased in the case of a sporadic cue. Pattern matching will involve making forecasts for special periods by matching the special period to a single past observation that most closely resembles the conditions applying in the special period. This strategy is likely to be adopted when cues are sporadic because the opportunity to examine a wide range of instances where similar cue values applied may be diminished.

(4) Where a statistical time series forecast is available it intersperses periods when these forecasts are likely to be reliable with periods when they are representing only part of the series generation process. This means that there is a need to vary the forecasting strategy. Non-contextual judgmental adjustment (Webby and O'Connor, 1996) may be applied to the statistical forecasts when no special events are affecting the time series because the judge feels that the statistical time series model is inadequate (e.g. Willemain, 1989). Contextual adjustment is likely to be made for periods when the special event occurs. The success of any strategy will be dependent on the forecaster's ability to distinguish between these two types of adjustment and apply them appropriately.

(5) There are clearly dangers of `spill-over effects' between the types of period. For example, the manifestly inadequate performance of the statistical method when the sporadic cue applies might reduce the judgmental forecaster's belief in the method in other periods (Taylor and Thomas, 1982).

This paper reports on an experiment that was designed to address the following questions:

(1) Under what conditions, if any, is the accuracy of judgmental forecasts improved by the provision of statistical time series forecasts when a series is subject to the effect of a sporadic cue?

(2) Are judges able to make efficient use of the available statistical forecast? For example, do judgmental forecasters leave reliable statistical time series forecasts unchanged when the cue is having no effect on the series and use the statistical forecasts as a base from which to make adjustments when the cue does have an effect?


(3) Does the provision of short, non-technical, explanations of statistical forecasts improve their utilization by the judgmental forecaster?

Because the performance of judgmental forecasters is likely to be highly contingent on the nature of the task (Payne et al., 1993) these questions were examined under a range of conditions. These conditions were created by varying: (i) the complexity of the time series signal, (ii) the level of noise around this signal, (iii) the salience of the cue, and (iv) the predictive value of the cue information.

PREVIOUS RESEARCH

Surprisingly, there is little evidence on how judgmental time series forecasters perform when they have access to contextual information which is sporadic, despite this being a situation where the use of judgment is likely to be common (Lim and O'Connor, 1996). There are, however, several relevant studies which have examined how judgmental forecasters perform when they have access to either (i) time series information only or (ii) time series and continuously available contextual information. Some of these studies have also examined how judgmental forecasters combine available information with statistical time series forecasts.

Time series information

When only time series information is available to a judgmental forecaster it appears that forecast accuracy will depend upon the level of noise in the series and the complexity of the underlying signal (Goodwin and Wright, 1993). Noise poses problems for judgmental forecasters in that it masks the true signal and leads to the identification of false signals. The result is that they read too much into the changing values of series (O'Connor et al., 1993) and hence overreact to each new observation as it appears. For example, Sanders (1992) found that the performance of judgment relative to statistical forecasting methods deteriorated with increases in noise level. It appears that forecasters add noise to their forecasts to try to make them representative of the variation in the observed time series (Harvey, 1995). Harvey suggests that they may use the range of observations as a crude estimate of variability. They then anchor on the upper and lower end-points of this range and make insufficient adjustments to the centre. In a series where the signal is disturbed by sporadic cues, this range is likely to be increased so that more noise is likely to be added to the resulting forecasts, even in forecasts for periods that are unaffected by the cue.

When it comes to identifying and extrapolating relatively complex time series signals, some authors have suggested that judgmental forecasters attempt to reduce the cognitive effort involved by using simplified heuristics, such as anchoring and adjustment (Andreassen and Kraus, 1990; Lawrence and O'Connor, 1992; Bolger and Harvey, 1993; but see also Lawrence and O'Connor, 1995). The use of these heuristics may account for the widely reported tendency of judgmental forecasters to underestimate upward trends (Lawrence and Makridakis, 1989). It could also explain the relatively poor performance of judgmental forecasters as the signal complexity increases. For example, Sanders (1992) found that judgmental forecast accuracy deteriorated as the signal complexity increased from `flat' to trend-plus-seasonal.

Contextual information

Most research suggests that judges in possession of continuously available contextual information, which has predictive validity, can outperform statistical time series methods, either by adjusting these forecasts or by making forecasts independently (Edmundson et al., 1988; Mathews and Diamantopoulos, 1990; Wolfe and Flores, 1990; Sanders and Ritzman, 1992). In particular, contextual information can lead to the anticipation of discontinuities in time series which will not be predicted by a statistical time series model. However, it appears that the ability of forecasters to make effective use of contextual information is dependent on (i) the extent of the information, (ii) its predictive power and (iii) its regularity and frequency.

Given the limited information processing capacity of the human mind (Hogarth, 1987), increasing amounts of information may not bring commensurate improvements in judgmental performance. For example, an unpublished study by Handzic (cited in O'Connor and Lawrence, 1998) suggests that the threshold of information overload is low and that people are only able to make effective use of information on one cue. Moreover, the general decision-making literature describes a number of biases which reduce people's ability to assess the predictive power of cues and to estimate the nature of the relationship between cue and dependent variable (Hogarth and Makridakis, 1981). For example, prior beliefs that relationships exist can lead to illusory correlation, despite the presence of disconfirming evidence (Chapman and Chapman, 1969). Also, people are both inconsistent and biased in their use of cue data. Evidence for inconsistency in the use of cues can be found in the extensive literature on bootstrapping (e.g. Meehl, 1957; Dawes, 1975) while examples of biases resulting from the use of oversimplified heuristics can be found in studies by Fildes (1991) and Harvey et al. (1994). Similarly, Lim and O'Connor (1996) found that forecasters underweighted the effect of the cues, particularly where they were highly predictive, although there was evidence that they could differentiate between cues with low and high predictive power.

While some information will apply to all periods that are being forecast, other information will relate to special events (Meehl, 1957, referred to such events as `broken leg cues'). If such events are relevant to the forecast variable then they are likely to be associated with discontinuities in the time series pattern. Where the sporadic event is not unique it is possible that forecasters use a pattern-matching strategy; that is, they try to find a past experience that closely matches the conditions associated with the forecast and then assume that the future experience will be similar to the past one. Note that in pattern matching, which can be seen as a form of the representativeness heuristic, a single past case, which will contain an element of noise, is used as the basis for the prediction while general tendencies, such as long-run time series patterns, are ignored. Hoch and Schkade (1996) found evidence of pattern matching in a judgmental forecasting task, but the cue information in their experiment was continuous, rather than sporadic, and the task did not involve time series data, so it may not generalize to time series forecasting. They suggest that pattern matching is a fairly good strategy in highly predictable environments, but is deficient when the environment contains high levels of noise.

Provision of a statistical forecast

Several studies have investigated how judges make use of the predictions of available statistical methods. Hoch and Schkade (1996) found that the provision of sub-optimal statistical forecasts (an explanatory variable was deliberately omitted from their model) improved judgmental forecasts. Pattern matching enabled forecasters to take into account the variable missing from the model, while the availability of the model acted as a damper on the overuse of pattern matching. In a time series extrapolation task, Lim and O'Connor (1995) also found that the reliability of judgmental forecasts was improved by the provision of statistical forecasts, even if the latter were unreliable. However, while subjects could apparently discern, and hence take into account, the reliability of the statistical forecasts, they tended to over-rely on initial judgments they had made prior to receiving the statistical forecasts. This tendency persisted even when subjects were informed that their judgments were far less accurate than the statistical forecasts. As Lim and O'Connor point out, the propensity of judges to benefit from a reliable model, while failing to outperform it, has been found in other areas of research into judgment (e.g. Peterson and Pitz, 1986; Arkes et al., 1986).

While the provision of a statistical forecast increases the amount of information that the judge has to handle, these forecasts can be regarded as a form of task information feedback, particularly where the model structure and parameters (rather than simply the forecast) are made available to the judge. Balzer et al. (1992, 1994) found that providing statistical information about the task system led to greater improvements in judgmental performance than other forms of feedback such as simple outcome feedback, where the judge is simply told the outcome of each event after making the judgment (see also Hammond et al., 1973; Kessler and Ashton, 1981; Benson and Önkal, 1992). It seems that the noise element of outcome feedback may confuse judges. Most work on feedback has been carried out outside the domain of judgmental time series forecasting. However, recently, studies by Remus et al. (1996) and Sanders (1997) found that the superiority of task information feedback also applied to a judgmental time series forecasting task, though in both studies the task information supplied was perfectly accurate. Clearly, in the case of a sporadic cue, where the cue information is so scarce that a statistical model will not be able to take the cue into account, task information feedback can only be available for part of the task environment.

EXPECTED ANSWERS TO RESEARCH QUESTIONS

Conditions where provision of statistical forecasts should improve judgment

Before anticipating answers to the research questions we need to make assumptions about the nature of the cue effects and the statistical forecasts. First, it is assumed here that the effect of a sporadic cue on a time series applies only to particular periods so that periods can be categorized as being either `normal' or, when a cue effect is observed, `special'. Second, in marketing, where special periods often result from the effect of sales promotion campaigns, a number of decision support systems have been developed which allow statistical time series forecasts to estimate the baseline of sales for promotion periods, the baseline being the estimated level of sales if the promotion does not run (e.g. Abraham and Lodish, 1987). This has the benefit of clearly separating the underlying time series from the promotion effects. Moreover, some commercial forecasting packages like Forecast Pro now allow observations for special periods to be separated out so that they do not contaminate forecasts for normal periods. In the following discussion the statistical forecasts will therefore be assumed to have this ability to be unaffected by observations for special periods. Given these assumptions, the preceding literature review can be used to form hypotheses about the relative performance of judgmental forecasters in these types of period when they are either supported or not supported by access to a statistical forecast.

In normal periods, the provision of a reliable statistical forecast, together with records of past forecasts, should help to draw attention to the underlying signal and make patterns in that signal more salient. The statistical forecasts will also treat past variations resulting from the sporadic cue as noise. They may therefore help the forecaster to discount observations for periods affected by the cue when a forecast is being made for `normal' periods.

In special periods, the provision of a time series forecast might be expected to improve the judgmental forecasts because it will reduce the cognitive load on the judge by taking care of the time series pattern. Without the benefit of a statistical forecast the judge will have to mentally extrapolate the time series, then estimate the cue effect and add this to the extrapolation. With access to a statistical time series forecast of the `baseline value' the judge has only to estimate the effect of the cue and make an appropriate adjustment to the statistical forecast. If a cue has low predictive power then this may be easier to discern, given the `baseline' provided by the statistical forecasts, thereby reducing the danger of illusory correlation. Moreover, if Hoch and Schkade's (1996) findings apply in time series forecasting, the provision of a statistical forecast will act as a damper on excessive pattern matching. In periods where a non-salient cue has an effect it is expected that the effect of the cue will become more available to the forecaster, because the `base-line' of the forecasts will highlight the deviations resulting from this cue.

While overall improvements in judgment are expected, it seems likely that the extent of the benefits of supplying a statistical forecast to the judge will be contingent on the nature of the time series. The benefits should be greatest where series are subject to high noise and/or where the signal is relatively complex, since in these cases the unaided judge is likely to have difficulties in discerning and extrapolating the signal.

Are judgmental forecasters likely to make efficient use of statistical forecasts?

While provision of statistical forecasts is expected to improve the accuracy of judgment, the literature suggests that judgmental forecasters are unlikely to make the most efficient use of statistical forecasts. Because judgmental forecasters generally seek to forecast the noise in series, they may perceive the failure of statistical forecasts to capture this noise as a sign of the forecast's unreliability and discount the value of the information. This tendency is likely to be exacerbated where a sporadic cue applies because of the manifest failure of the statistical forecast to anticipate the effect of the cue.

Are explanations of the statistical forecasts likely to improve the way they are used?

If judgmental forecasters do pay insufficient attention to statistical forecasts, then providing regular explanations of their rationale may be effective. Since these explanations can be seen as a form of task properties feedback they should increase the forecaster's awareness of the main characteristics of the time series signal.

RESEARCH DESIGN

A laboratory experiment was used to simulate a sales forecasting problem where the underlying pattern of sales of a product is stable, but where additional sales are generated by promotion campaigns in particular periods. The experiment was conducted as a 2 (signal complexity) × 2 (noise level) × 2 (promotion effectiveness) × 3 (feedback type) factorial design with two replications for each treatment. The forecasters, who were 48 final-year students on a Business Decision Analysis degree course at the University of the West of England, were each randomly assigned to one treatment. All the subjects had successfully completed a course in statistical forecasting and most had just completed a placement year in industry. As an incentive, four prizes of £20 were offered to the most accurate forecasters (after adjusting for the level of noise in the series) and casual observation suggested that this motivated the subjects. While there are limitations in the extent to which the laboratory can replicate forecasting conditions in the field (Goodwin and Wright, 1993), it does permit analysis to be carried out under controlled conditions.

Signal complexity and noise level

Four underlying time series of 71 quarterly observations were generated for this study. These were designed to reflect two levels of signal complexity and two levels of noise:

(a) The `simple' signal had a constant mean of 300 units (we will refer to this as the `flat' series)


(b) The `complex' signal had an upward linear trend of 1.5 units per period starting with a level of 210 at period 0. Superimposed on this was a multiplicative seasonal pattern with seasonal indices of 0.7, 1.1, 1.3 and 0.9 for quarters 1 to 4 respectively (we will refer to this as the trend-seasonal series).

Each signal had one of two noise levels added to it. In the low-noise condition the noise was independently normally distributed with a mean of 0 and standard deviation of 18.8, while in the high-noise condition the standard deviation was 56.4. On the `flat' series these give expected absolute noise levels of 5% and 15%, respectively.
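The two signals and noise conditions described above can be sketched as follows. This is a minimal reconstruction from the parameters reported in the text; the function names and the random-number interface are our own, not the authors' original implementation.

```python
import numpy as np

def make_signal(n_periods: int, complex_signal: bool) -> np.ndarray:
    """Noise-free quarterly signal: either the 'flat' series (constant
    mean of 300 units) or the trend-seasonal series (linear trend of
    1.5 units per period from a level of 210 at period 0, times
    multiplicative seasonal indices 0.7, 1.1, 1.3, 0.9 for quarters 1-4)."""
    t = np.arange(1, n_periods + 1)
    if not complex_signal:
        return np.full(n_periods, 300.0)
    level = 210.0 + 1.5 * t
    seasonal = np.array([0.7, 1.1, 1.3, 0.9])[(t - 1) % 4]
    return level * seasonal

def add_noise(signal: np.ndarray, high_noise: bool,
              rng: np.random.Generator) -> np.ndarray:
    """Add i.i.d. N(0, sd) noise with sd = 18.8 (low) or 56.4 (high).
    Since E|N(0, sd)| = sd * sqrt(2/pi) ~= 0.798 * sd, on the flat
    series (mean 300) these standard deviations give the expected
    absolute noise levels of about 5% and 15% quoted in the text."""
    sd = 56.4 if high_noise else 18.8
    return signal + rng.normal(0.0, sd, size=len(signal))
```

For example, the first quarter of the trend-seasonal signal would be (210 + 1.5) × 0.7 = 148.05 units.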

Predictive power of cue

Sales promotion campaigns were used to simulate the effects of a sporadic cue on the time series. These occurred in 21 of the quarters (12 in quarters requiring forecasts) and for a given series the effects of promotion expenditure on sales were either weak or strong. This allowed the forecasters' performance to be assessed under conditions where the cue had either low or high predictive power. Promotion effects were deterministic: for series where they were weak, sales equal to 0.05 × the promotion expenditure were added to the underlying time series observation; strong promotions led to additional sales equal to 0.7 × the promotion expenditure. To avoid confounding effects, the promotion campaigns took place in the same quarters for all series.

Salience of cue

In the period after a promotion campaign the underlying time series observation was reduced by 50% of the previous period's promotion effect. In practice this might be observed where some consumers bring their purchases forward by one period, because of the promotion campaign, but reduce their purchases in the following period to compensate (Abraham and Lodish, 1987). Although the forecasters were told before the experiment to expect a negative effect on sales in these periods, there were no reminders of this during the experiment, so that the cue effect was likely to have a low level of availability. The four underlying time series patterns, with one of two types of promotion effect added, yielded eight time series in all.
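The deterministic promotion and post-promotion effects can be sketched as below. This is a hypothetical reconstruction from the reported parameters (weak/strong coefficients of 0.05 and 0.7, and a reduction of 50% of the promotion effect in the following quarter); the function name and interface are ours.

```python
def apply_promotions(signal, promo_spend, strong):
    """Add deterministic promotion effects to an underlying series.

    In a promotion quarter, extra sales of coeff * expenditure are
    added, with coeff = 0.05 (weak) or 0.7 (strong). In the following
    quarter the series is reduced by 50% of that promotion effect,
    reflecting consumers who brought their purchases forward.
    """
    coeff = 0.7 if strong else 0.05
    sales = list(signal)
    for t, spend in enumerate(promo_spend):
        effect = coeff * spend
        sales[t] += effect
        if t + 1 < len(sales):
            sales[t + 1] -= 0.5 * effect
    return sales
```

For instance, a strong promotion with expenditure 100 in quarter 2 of a flat series at 300 would raise that quarter's sales to 370 and depress the next quarter's to 265.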

Type of feedback

For each series a personal computer was used to display a graph of the first 31 quarterly observations. These were labelled as `Sales (units)'. A bar chart, plotted against the same axes, showed any promotion expenditures which had taken place in these periods, while, on the right of the screen, a single bar was used to display the promotion expenditure (if any) planned for the next quarter. Subjects were asked to produce one-quarter-ahead forecasts for the next 40 periods. During the experiment, they received one of three types of feedback: (i) simple outcome feedback, (ii) a statistical time series forecast or (iii) a statistical forecast together with a regular explanation of its rationale. After each forecast, the outcome feedback group simply received a message telling them the level of sales which had occurred, and they also saw the sales graph updated.

Subjects in the second feedback group did not receive this message (though they did see the sales graph updated). Instead they received a forecast of the next quarter's sales generated by either simple exponential smoothing (for the `flat' series) or the Holt-Winters method (for the `trend-seasonal' series). Note that the statistical forecasts were designed only to take into account the underlying time series and therefore were not affected at all by movements in the series caused by promotion or post-promotion effects. This enabled the experiment to reflect the features of the marketing and forecasting packages discussed earlier. It also enabled a cleaner separation to be made between periods when the forecast was reliable and periods when it had a known deficiency, thereby increasing the ability of the experiment to address the research questions unambiguously.
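A minimal sketch of how such a promotion-blind statistical forecast might work for the flat series: simple exponential smoothing that skips its level update in promotion and post-promotion periods, so those observations cannot contaminate the baseline. The smoothing constant and the skip-update mechanism are our assumptions; the paper does not report these implementation details (and the Holt-Winters method, used for the trend-seasonal series, would additionally update trend and seasonal components).

```python
def ses_baseline(sales, special, alpha=0.3):
    """One-step-ahead simple exponential smoothing forecasts.

    sales[t] is the observation in period t; special[t] is True for
    promotion or post-promotion periods. The smoothed level is only
    updated in normal periods, so the forecasts are unaffected by
    promotion effects. Returns a list whose element t-1 is the
    forecast of sales[t] made at the end of period t-1.
    """
    level = sales[0]
    forecasts = []
    for t in range(1, len(sales)):
        forecasts.append(level)
        if not special[t]:
            level = alpha * sales[t] + (1 - alpha) * level
    return forecasts
```

With alpha = 0.5 and a promotion spike in the third period, the forecast simply carries the pre-promotion level forward through the special period.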

Each statistical forecast was displayed prominently to encourage the forecaster to take it into account and it was also plotted on the sales graph, together with all past forecasts, so that its reliability could be assessed. Every seven quarters the third group of subjects received, in addition to the statistical forecasts, a brief explanation of these forecasts, supported by a small graph. Exhibit 1 shows a typical screen display for this group. For flat series the message was: `The statistical model estimates that the series is flat. Its latest estimate of average sales is x. It assumes any variations from this are just random.' The message for the trend-seasonal series was: `The statistical model estimates that underlying sales are increasing by x units per quarter. On top of this there is a seasonal pattern. The model assumes any variations from this pattern are just random.'

Exhibit 1. Screen display

Instructions to subjects

Before the experiment subjects were given brief written instructions. These included several statements by the `sales manager' indicating: (i) whether sales followed a seasonal pattern, and if they did, the quarters when sales tended to be higher or lower; (ii) that there was no guarantee that a promotion campaign would be effective; (iii) that if a campaign was effective it would only increase sales during the quarter in which it took place and would tend to reduce sales in the following quarter because people had stocked up on the product during the campaign; (iv) that the statistical forecast (if provided) could not take into account the effect of promotion campaigns. No other information was given about the product, the company or the market in which it operated.

ANALYSIS AND RESULTS

Because the judgmental forecasters were not expected to forecast the noise in the series, the median absolute percentage error (MdAPE) in forecasting the next value of the signal was used to measure the accuracy of the forecasts (where the signal = the underlying time series signal + any promotion effects). The MdAPE has been recommended as an error measure for comparing forecasts across time series by Armstrong and Collopy (1992). In particular, it reduces the distortion that can occur in the more commonly used mean absolute percentage error (MAPE) when occasional very large percentage errors occur because of small actual values. The time series used in this experiment can be divided into three types of period: (i) `normal' periods, (ii) promotion periods and (iii) post-promotion periods. For each of these, an analysis of variance (ANOVA) was used to investigate the effect on the MdAPE of the different treatments. ANOVA is known to be robust to departures from the assumption of equal variances among the treatment conditions when there is an equal number of observations per treatment (Lindman, 1992, p. 22). However, this is not the case when planned comparisons are used to test for significant differences between individual treatments. Because there was evidence of unequal variances here, planned comparisons were carried out using the non-parametric Hodges-Lehmann test for aligned observations (Marascuilo and McSweeney, 1977, p. 396). There is, of course, an increasing danger that, as the number of comparisons increases, significant results will be obtained by chance. To counter this, the Dunn-Bonferroni procedure was used to control the overall significance level.
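The robustness of the MdAPE to occasional large percentage errors can be illustrated with a short sketch (our own illustration, not taken from the paper):

```python
import statistics

def apes(signal, forecasts):
    """Absolute percentage errors of forecasts against the noise-free
    signal, in percent."""
    return [abs(f - s) / abs(s) * 100.0 for s, f in zip(signal, forecasts)]

def mdape(signal, forecasts):
    """Median absolute percentage error (Armstrong and Collopy, 1992)."""
    return statistics.median(apes(signal, forecasts))

def mape(signal, forecasts):
    """Mean absolute percentage error, for comparison."""
    return statistics.mean(apes(signal, forecasts))

# One small actual value produces a huge percentage error that
# dominates the MAPE but leaves the MdAPE almost untouched.
signal = [100.0, 100.0, 100.0, 2.0]
forecasts = [110.0, 95.0, 104.0, 4.0]
# APEs are 10%, 5%, 4%, 100%: MdAPE = 7.5, but MAPE = 29.75
```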

When did the provision of a statistical forecast improve judgmental forecasts?

Normal periods
Exhibit 2 shows the mean MdAPEs for the treatment conditions in normal periods, together with the MdAPEs of the statistical forecasts that were provided. ANOVA indicated a significant noise × feedback-type interaction (F(2,24) = 4.34, p = 0.025) and also a significant series × feedback-type interaction (F(2,24) = 3.96, p = 0.033). This supports the idea that the extent of improvements resulting from provision of a statistical forecast is contingent on the nature of the time series. Multiple comparisons revealed that the mean MdAPE for high-noise series was significantly lower (p = 0.009) when a statistical forecast was provided. In contrast, for low-noise series, providing a statistical forecast did not lead to a significant reduction in the mean MdAPE. Similarly, providing a statistical forecast for the more complex trend-seasonal series significantly reduced the mean MdAPE (p = 0.003), but this was not the case for the 'flat' series. Thus it seems that the expectation stated earlier that the benefits of providing a statistical forecast would apply to all series types was not fulfilled. However, as expected, there were benefits if the series had high noise or a complex signal.

There was also a significant main effect on the mean MdAPE for promotion effectiveness (F(1,24) = 5.15, p = 0.033). Series with a strong promotion effect were forecast less accurately than those subject to a weak effect. Of course, promotions had no effect in normal periods, but they did increase the amount of variation in the time series graph, making the series apparently more complex, particularly for strong-promotion series. Subjects were therefore apparently unable to ignore these variations caused by strong promotions and concentrate only on past sales for normal periods. Moreover, in the absence of a significant promotion × feedback interaction (F(2,24) = 0.36, p = 0.701), it appears that their ability to discount periods affected by strong promotions was not improved by the provision of statistical forecasts.

Copyright © 1999 John Wiley & Sons, Ltd. Journal of Behavioral Decision Making, Vol. 12, 37–53 (1999)

Did providing explanations of statistical forecasts lead to improved judgmental forecasts in normal periods? Multiple comparisons indicated that significant improvements only occurred for 'flat' series (p = 0.008). Thus, although judgmental forecasts for 'flat' series did not appear to benefit from the provision of a statistical forecast on its own, providing this forecast with an explanation was effective. Curiously, for seasonal series, providing a forecast on its own produced significant increases in accuracy, but a statistical forecast with explanations produced no significant improvement (p = 0.124).

Promotion periods
The mean MdAPEs for promotion periods are shown in Exhibit 3, together with the MdAPE of the statistical forecast provided. An ANOVA revealed no significant interaction or main effects involving feedback (e.g. for the series × feedback interaction, F(2,24) = 0.85, p = 0.442). Thus it appears that, contrary to the expectations that were stated earlier, there were no benefits to be gained by providing a statistical forecast to the judge when sales forecasts were being made for promotion periods.

Post-promotion periods
Exhibit 4 shows the mean MdAPEs for post-promotion periods, together with the MdAPE of the statistical forecast provided. The ANOVA revealed a significant series × noise × feedback interaction (F(2,24) = 7.64, p = 0.003). Further investigation revealed a reduction in MdAPE associated with the provision of a statistical forecast for the high-noise trend-seasonal series, as hypothesized (p = 0.04). Providing a statistical forecast with explanations also reduced the MdAPE for the high-noise trend-seasonal series (p = 0.023). Although these results were consistent with expectations, they should be interpreted with care, since both p-values were higher than those allowed for in the Dunn–Bonferroni procedure (also, the small number of observations in each cell means that the three-way interaction should be treated with caution).

Exhibit 2. Mean MdAPEs for the experimental treatments for NORMAL periods

                                            Feedback type
                                   ----------------------------------------------------
Series           Promotion  Noise   Outcome   Statistical  Statistical forecast   MdAPE of statistical
                 effect             feedback  forecast     with explanation       forecast provided
------------------------------------------------------------------------------------------------------
Flat             Weak       Low       2.75      2.92        1.33                   1.17
Flat             Strong     Low       5.58      9.58        4.41                   1.33
Flat             Weak       High     12.50      9.83        4.08                   4.83
Flat             Strong     High     16.66     12.50        6.75                   3.33
Trend-seasonal   Weak       Low       6.11      3.19       12.67                   1.75
Trend-seasonal   Strong     Low      10.39      7.58       11.48                   1.62
Trend-seasonal   Weak       High     15.32      9.30        9.95                   3.82
Trend-seasonal   Strong     High     22.44      7.45       12.44                   4.98
------------------------------------------------------------------------------------------------------
Anticipated rank of MdAPEs for all series (1 = largest):
                                         1         2           3


WAS EFFICIENT USE MADE OF THE STATISTICAL FORECASTS?

Normal periods
In normal periods, Exhibit 2 shows that the statistical time series forecasts provided accurate forecasts of the signal. Simply accepting the statistical forecast would therefore have been a good strategy. However, in normal periods, 47 out of the 48 subjects produced signal forecasts that were less accurate than the statistical forecasts. Thus, while the provision of statistical forecasts improved the judgmental forecasts for series with high noise and the more complex signal, as in other studies (e.g. Lim and O'Connor, 1995), these judgments failed to equal or outperform the statistical forecasts.

The most likely explanation for this is that the subjects were seeing a systematic pattern in the noise in the series and trying to forecast this. There appeared to be no systematic tendency to adjust the forecasts because they were perceived to be persistently too high or wrongly 'tracking' the actual sales. Instead, changes in the forecasts appeared to be random. For subjects who received a statistical forecast for 'flat' series, we calculated the mean absolute percentage change in the judgmental forecasts between contiguous normal periods. At 9.4%, this mean change was more than seven times greater than the mean change for the statistical forecasts of 1.3% (the ratio was consistent across the two levels of noise). These results are similar to those of O'Connor et al. (1993).

Exhibit 3. Mean MdAPEs for the experimental treatments for PROMOTION periods

                                            Feedback type
                                   ----------------------------------------------------
Series           Promotion  Noise   Outcome   Statistical  Statistical forecast   MdAPE of statistical
                 effect             feedback  forecast     with explanation       forecast provided
------------------------------------------------------------------------------------------------------
Flat             Weak       Low       3.21      2.58        2.74                   2.06
Flat             Strong     Low       7.21      6.78        7.67                  17.80
Flat             Weak       High      9.04     12.49       11.77                   3.34
Flat             Strong     High     18.07     17.26       13.05                  21.39
Trend-seasonal   Weak       Low       5.82      6.70       15.48                   2.63
Trend-seasonal   Strong     Low      12.71     17.19       17.47                  20.85
Trend-seasonal   Weak       High     31.32     12.41       10.59                   9.50
Trend-seasonal   Strong     High     26.19     17.26       21.30                  17.52
------------------------------------------------------------------------------------------------------
Anticipated rank of MdAPEs for all series (1 = largest):
                                         1         2           3

Exhibit 4. Mean MdAPEs for experimental treatments in POST-PROMOTION periods

                                            Feedback type
                                   ----------------------------------------------------
Series           Promotion  Noise   Outcome   Statistical  Statistical forecast   MdAPE of statistical
                 effect             feedback  forecast     with explanation       forecast provided
------------------------------------------------------------------------------------------------------
Flat             Weak       Low       4.61      2.82        1.44                   0.51
Flat             Strong     Low      12.21      8.76       11.78                  14.71
Flat             Weak       High      6.37     10.43        4.43                   4.39
Flat             Strong     High     15.25     15.66       12.74                   9.85
Trend-seasonal   Weak       Low       3.07      4.22        7.49                   1.37
Trend-seasonal   Strong     Low      11.35     13.12       19.11                  11.68
Trend-seasonal   Weak       High     15.58      9.84        6.12                   5.47
Trend-seasonal   Strong     High     36.69     18.33       10.36                  12.02
------------------------------------------------------------------------------------------------------
Anticipated rank of MdAPEs for all series (1 = largest):
                                         1         2           3

Promotion periods
Exhibit 3 shows the conditions where judgmental forecasters were able to improve on the accuracy of the statistical forecasts in promotion periods. Not surprisingly, they tended to be most successful when the promotion effects were strong, the exception being the high-noise trend-seasonal series. Nevertheless, the above results show that, in promotion periods, providing a statistical forecast had no significant effect on the accuracy of judgmental forecasts. So how were the forecasters using the information provided on statistical forecasts and promotion expenditure?

Were subjects anchoring onto and under-adjusting from the statistical forecasts?
The advantages provided by the statistical forecasts in highlighting the time series signal might be negated by their acting as an anchor. To investigate this, the mean absolute difference (MAD) between the judgmental and statistical forecasts for promotion periods was calculated for all forecasters. If the statistical forecasts were acting as an anchor, it would be expected that the differences between the judgmental and statistical forecasts would be greater for subjects who did not receive a statistical forecast. Although the MAD for subjects who received a forecast was less than for those who did not, an ANOVA of the MAD found no significant main or interaction effects involving feedback (e.g. main effect: F(2,24) = 2.10, p = 0.144), so the evidence for anchoring was only weak.
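The MAD diagnostic described above is straightforward to compute. The sketch below uses invented forecasts for two hypothetical subjects; only the calculation itself reflects the text.

```python
import numpy as np

def mean_absolute_difference(judgmental, statistical):
    """Mean absolute difference between a subject's judgmental
    forecasts and the statistical forecasts for the same periods.
    Smaller values are consistent with the judge staying close to
    (anchoring on) the statistical forecast."""
    j = np.asarray(judgmental, float)
    s = np.asarray(statistical, float)
    return float(np.mean(np.abs(j - s)))

# Hypothetical promotion-period forecasts (values are made up):
statistical              = [510, 495, 520]
subject_with_forecast    = [515, 500, 512]   # stays close to the model
subject_outcome_feedback = [560, 450, 580]   # drifts further away

mad_seen   = mean_absolute_difference(subject_with_forecast, statistical)
mad_unseen = mean_absolute_difference(subject_outcome_feedback, statistical)
```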

Were they ignoring the statistical forecasts?
An alternative possibility is that, far from anchoring on the statistical forecasts, subjects were either ignoring them or giving them too little weight. To try to assess the relative weights placed by judgmental forecasters on the statistical forecasts and the promotion expenditure, a series of policy-capturing models was developed using multiple linear regression. These had the form:

J_t = f(S_t, P_t)                                                    (1)

where J_t, S_t and P_t are, respectively, the judgmental forecast, statistical forecast and promotion expenditure for period t. Weights in the models were standardized. For each treatment, a model was fitted to all the promotion period forecasts of the two subjects who were assigned to that treatment. These models were also fitted to the forecasts of subjects who did not see the statistical forecasts (the outcome feedback group). Clearly, if subjects provided with statistical forecasts were using them, then the weights they implicitly attached to the statistical forecasts should be higher than the weights appearing in the models of the outcome feedback group. However, for both 'flat' and trend-seasonal series, ANOVAs indicated that there was no significant difference in the weights ('flat' series: F(2,11) = 1.60, p = 0.27; trend-seasonal: F(2,11) = 2.71, p = 0.13). All this provides evidence that in promotion periods subjects provided with statistical forecasts were not using them.
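A policy-capturing model of the form in equation (1) can be estimated as a linear regression on standardized variables, so that the fitted slopes are directly comparable beta weights. The sketch below uses simulated data for a hypothetical judge; the function and the data are illustrative, not the authors' code.

```python
import numpy as np

def standardized_weights(J, S, P):
    """Regress the judgmental forecasts J_t on the statistical
    forecasts S_t and promotion expenditure P_t after z-scoring
    every variable, so the slopes are standardized (beta) weights
    that can be compared across cues."""
    z = lambda x: (np.asarray(x, float) - np.mean(x)) / np.std(x)
    X = np.column_stack([z(S), z(P)])
    beta, *_ = np.linalg.lstsq(X, z(J), rcond=None)
    return beta  # [weight on statistical forecast, weight on promotion spend]

# Simulated data for a hypothetical judge who tracks the statistical
# forecast closely and reacts mildly to promotion expenditure:
rng = np.random.default_rng(1)
S = rng.normal(500, 50, 24)                           # statistical forecasts
P = rng.normal(100, 20, 24)                           # promotion expenditure
J = S + 0.5 * (P - P.mean()) + rng.normal(0, 5, 24)   # judgmental forecasts

w_stat, w_promo = standardized_weights(J, S, P)
```

For this judge the weight on the statistical forecast should be close to 1 and the weight on promotion expenditure much smaller; a judge who ignored the statistical forecast would show a weight on S near zero.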

Were subjects using pattern matching?
If the judgmental forecasters were ignoring the statistical forecasts, an alternative possibility is that they were using a pattern-matching strategy. This might involve searching for the past promotion expenditure which is most similar to the one for the forecast period and using the actual sales for this past period as a basis for the forecast. To investigate this, for each promotion period which required a forecast, an earlier period was identified which had the most similar promotion expenditure (where two earlier periods tied, the more recent was selected). The correlation between the judgmental forecasts for a given period and sales in the 'matched' promotion period was then calculated. Evidence for the use of a pattern-matching strategy was found in forecasts for the low-noise series where promotion effectiveness was strong. For the 'flat' series, correlations were significant (p < 0.05) for five of the six sets of forecasts (the mean r was 0.54), while this was the case for three of the six sets of forecasts for the seasonal series (mean r = 0.56).
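The matching rule described above (closest past promotion expenditure, ties broken in favour of the more recent period) can be sketched as follows; the expenditure and sales figures are invented for illustration.

```python
import numpy as np

def matched_period_sales(past_promo, past_sales, current_promo):
    """Pattern-matching heuristic described in the text: find the
    earlier period whose promotion expenditure is most similar to
    the current one (ties broken in favour of the more recent
    period) and return its actual sales as the forecast basis."""
    diffs = np.abs(np.asarray(past_promo, float) - current_promo)
    best = np.flatnonzero(diffs == diffs.min())[-1]  # latest among ties
    return past_sales[best]

# Invented history of promotion spend and sales:
past_promo = [0, 80, 0, 120, 0, 80]
past_sales = [500, 610, 480, 700, 470, 590]

print(matched_period_sales(past_promo, past_sales, 85))   # → 590
print(matched_period_sales(past_promo, past_sales, 120))  # → 700
```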

Post-promotion periods
Exhibit 4 shows that in post-promotion periods subjects were generally less successful in outperforming the statistical model, despite being in possession of the extra-model information about post-promotion effects. For flat series, policy-capturing models (of the form given in equation (1), with P_{t-1} replacing P_t) were fitted to the forecasts of the pair of subjects in each treatment, including those of subjects who had not seen the statistical forecasts. As in promotion periods, the obvious expectation was that, for subjects who had seen the forecasts, the weights estimated for the statistical forecasts would be greater than for those who had not. However, ANOVAs indicated that there was no significant difference in these weights for either 'flat' (F(2,11) = 0.76, p = 0.50) or trend-seasonal series (F(2,11) = 2.19, p = 0.18). It therefore appears that subjects made little or no use of the statistical forecasts in post-promotion periods.

DISCUSSION

The main finding of this study is that, while judgmental forecasters benefited from the availability of statistical forecasts under certain conditions, they almost always made insufficient use of these forecasts. When a good strategy would have been simply to have accepted the statistical forecast, they insisted on changing it, even when the statistical forecast was highly reliable. When it would have been advisable to use the statistical forecast as a baseline, and to apply judgment to estimate the amount of adjustment required, they appeared to ignore the forecast provided and to apply judgment to the entire forecasting task. In Lim and O'Connor's (1995) study, subjects had already made an initial forecast before they were presented with the statistical forecast. Our study suggests that this underweighting prevails even when the statistical forecast is presented before the judgmental forecast has been formed.

There may be a number of reasons for the tendency to change the reliable forecasts made for normal periods. Judgmental forecasters clearly have difficulties in recognizing random variation and appreciating that it is inherently incapable of being reliably predicted. Of course, where a sporadic cue applies, the statistical time series forecast can only be reliable in normal periods. Its inability to handle the effect of the cue when this applied may have served to decrease confidence in the statistical forecasts for all periods. Indeed, the fact that some of the variation in the time series was not random and, in the case of post-promotion periods, was caused by a cue with low availability, may have further served to increase the confusion between noise and signal. This is supported by the finding that judgmental forecasting errors for normal periods were larger when the promotion effect was stronger, despite the fact that in normal periods promotion had, by definition, no effect. It is also possible that the subjects may have thought that simply accepting the statistical forecast was too easy and that something more was required of them, particularly since they were being asked to bring their judgment to the task.

The finding that providing explanations of the statistical forecasts significantly improved normal-period judgments for flat series is consistent with the results of the studies by Remus et al. (1996) and Sanders (1997) on task information feedback. However, it is not clear why providing explanations was ineffective for trend-seasonal series. Of necessity, the explanations for these series were more complex, so the subjects may have had difficulty in absorbing this information and relating it to the task.

The apparent neglect of the statistical forecasts in promotion and post-promotion periods also needs explaining. While the presentation of the statistical forecasts was designed to make them salient (they appeared on the sales graph and also in large characters below the graph), it may be that they were discounted because of the amount of competing information that the forecaster had to handle (the sales graph, the next period's promotion expenditure and the past expenditures). The instructions telling subjects that the statistical forecasts could not take into account the effects of promotion expenditures may have exacerbated this. Indeed, Andreassen (1990) has argued that the relative salience of a source of information depends not only on the manner in which it is displayed, but also on the judge's belief that employing the information may be useful and his or her understanding of how to employ the source. For example, Bar-Hillel (1980) has argued that 'People integrate two items of information only if both seem to them equally relevant. Otherwise, high relevance information renders low relevance information irrelevant.' Alternatively, the subjects may not have been able to discern that the promotion effects were added to the time series pattern and may have assumed that the regular pattern was suspended during promotion periods. A further possibility is that they may simply not have thought of using the strategy of adjusting the statistical forecasts. For example, Payne et al. (1993, p. 201) show how decision makers can lack knowledge of appropriate strategies because they do not know which properties of the task to consider.

Of course, many would argue that, at least where a stable time series pattern prevails, the judgmental forecaster should be replaced by a statistical model, since it appears that judgment only reduces forecast accuracy. In practice the situation is not so simple. The evidence is that, even if judgmental forecasts perform worse than a statistical model, it is often the former that will be used in any substantive decisions. For example, Kleinmuntz (1990) argues: '. . . the acceptability to users of a decision aid does not rest on whether it substantially outperforms unaided judgment. Rather it depends on their belief that they have real expertise in a domain, thus inspiring confidence in the possibility of beating the odds.' Kleinmuntz refers to this phenomenon as 'deluded self-confidence in human judgment'.

The challenge, therefore, is to develop forecasting support systems (FSS) that encourage forecasters to recognize those elements of the task which are best delegated to a statistical model and to focus their attention on the elements where their judgment is most valuable. These systems will also need to facilitate appropriate interaction between the model and judgment. The task will not be easy, given the evidence of Kleinmuntz. Mechanical combination of judgmental and statistical forecasts has been recommended (Lim and O'Connor, 1995), as has statistical correction of judgmental forecasts (Goodwin, 1996, 1997), but these methods are more likely to be acceptable to forecast users who believe they have no domain expertise. A number of approaches seem worth investigating. For example, perhaps judgmental changes to statistical forecasts should be made more onerous and less cursory by requiring documentation of the rationale for the change. Indeed, encouraging judgmental forecasters to explore why they disagree with the model should yield insights and understanding. However, if this is too costly, relative to the benefits of accurate forecasting, it may militate against making changes when they are required or, more likely, lead to the abandonment of the FSS. Alternatively, the evidence of Payne et al. (1993) suggests that FSS should be designed to provide hints and 'attention manipulators' designed to guide forecasters to the appropriate features of the task and to encourage the adoption of suitable strategies.

Clearly, care must be exercised before drawing inferences about forecasting in business and industry from this experiment. The context of the experiment (sales forecasting and promotion campaigns) may have created particular expectations on the part of the forecasters about the behaviour of the series. Moreover, the subjects were not regular forecasters with particular product expertise.


However, the task they faced was in several ways easier than many practical sales forecasting tasks. The underlying time series pattern was stable, only one cue occasionally disturbed this pattern (and then in a stable and deterministic way: there was no additional noise in the cue–sales relationship) and information on the cue was quantitative and reliable. In practice, many sporadic cues, some of them based on soft information such as rumours, may need to be considered when forming the forecasts. Nevertheless, as we argued earlier, the laboratory offers many advantages in permitting the study of judgment under controlled conditions, and further research could usefully extend this study to one involving multiple sporadic cues with differing levels of reliability. This could form the basis for the testing of forecasting support systems in the field.

CONCLUSIONS

When regular time series are disturbed by sporadic events, there appears to be an ideal opportunity for combining human judgment with statistical time series forecasts. However, this study has shown that under these conditions human judges are inefficient in the way they use statistical forecasts. They unnecessarily modify statistical forecasts when they are reliable and apparently ignore them when they would have formed a good basis for adjustment. Providing simple and regular explanations of the rationale of the statistical forecasts did lead to improvements, but only for one type of series and in certain types of period. These findings are important because time series subject to the effect of a sporadic cue are widely encountered in practice (Lim and O'Connor, 1996). The challenge, therefore, is to design methods and support systems which will help forecasters to make the most appropriate and effective use of their judgments, while encouraging them to delegate to the statistical model those aspects of the task where the exercise of judgment would be harmful.

REFERENCES

Abraham, M. M. and Lodish, L. M. 'PROMOTER: an automated promotion evaluation system', Marketing Science, 6 (1987), 101–123.

Andreassen, P. B. 'Judgmental extrapolation and market overreaction: on the use and disuse of news', Journal of Behavioral Decision Making, 3 (1990), 153–174.

Andreassen, P. B. and Kraus, S. J. 'Judgmental extrapolation and the salience of change', Journal of Forecasting, 9 (1990), 347–372.

Arkes, H. R., Dawes, R. M. and Christensen, C. 'Factors influencing the use of a decision rule in a probabilistic task', Organizational Behavior and Human Decision Processes, 37 (1986), 93–110.

Armstrong, J. S. and Collopy, F. 'Error measures for generalizing about forecasting methods: empirical comparisons', International Journal of Forecasting, 8 (1992), 69–80.

Balzer, W. K., Sulsky, L. M., Hammer, L. B. and Sumner, K. E. 'Task information, cognitive information, or functional validity information: which components of cognitive feedback affect performance?', Organizational Behavior and Human Decision Processes, 53 (1992), 35–54.

Balzer, W. K., Hammer, L. B., Sumner, K. E., Birchenough, T. R., Martens, S. P. and Raymark, P. H. 'Effects of cognitive feedback components, display format, and elaboration on performance', Organizational Behavior and Human Decision Processes, 58 (1994), 369–385.

Bar-Hillel, M. 'The base rate fallacy in probability judgments', Acta Psychologica, 44 (1980), 211–233.

Benson, P. G. and Önkal, D. 'The effects of feedback on the performance of probability forecasters', International Journal of Forecasting, 8 (1992), 559–573.

Bolger, F. and Harvey, N. 'Context-sensitive heuristics in statistical reasoning', Quarterly Journal of Experimental Psychology, 46A (1993), 779–811.

Chapman, L. J. and Chapman, J. P. 'Illusory correlation as an obstacle to the use of valid psychodiagnostic signs', Journal of Abnormal Psychology, 74 (1969), 271–280.


Dalrymple, D. J. 'Sales forecasting practices, results from a United States survey', International Journal of Forecasting, 3 (1987), 379–391.

Dawes, R. M. 'Graduate admission variables and future success', Science, 187 (1975), 721–743.

Dawes, R. M., Faust, D. and Meehl, P. E. 'Clinical versus actuarial judgment', Science, 243 (1989), 1668–1673.

Edmundson, R., Lawrence, M. and O'Connor, M. 'The use of non-time series information in sales forecasting: a case study', Journal of Forecasting, 7 (1988), 201–211.

Fildes, R. 'Efficient use of information in the formation of subjective industry forecasts', Journal of Forecasting, 10 (1991), 597–617.

Goodwin, P. 'Statistical correction of judgmental point forecasts and decisions', Omega: International Journal of Management Science, 24 (1996), 551–559.

Goodwin, P. 'Adjusting judgemental extrapolations using Theil's method and discounted weighted regression', Journal of Forecasting, 16 (1997), 37–46.

Goodwin, P. and Wright, G. 'Improving judgmental time series forecasting: a review of the guidance provided by research', International Journal of Forecasting, 9 (1993), 147–161.

Goodwin, P. and Wright, G. 'Heuristics, biases and improvement strategies in judgmental time series forecasting', Omega: International Journal of Management Science, 22 (1994), 553–568.

Hammond, K. R., Summers, D. A. and Deane, D. H. 'Negative effects of outcome feedback in multiple cue probability learning', Organizational Behavior and Human Performance, February (1973), 30–34.

Harvey, N. 'Why are judgments less consistent in less predictable task situations?', Organizational Behavior and Human Decision Processes, 63 (1995), 247–263.

Harvey, N., Bolger, F. and McClelland, A. 'On the nature of expectations', British Journal of Psychology, 85 (1994), 203–229.

Hoch, S. J. and Schkade, D. A. 'A psychological approach to decision support systems', Management Science, 42 (1996), 51–64.

Hogarth, R. M. Judgement and Choice, 2nd edition, Chichester: John Wiley, 1987.

Hogarth, R. M. and Makridakis, S. 'Forecasting and planning: an evaluation', Management Science, 27 (1981), 115–138.

Kessler, L. and Ashton, R. H. 'Feedback and prediction achievement in financial analysis', Journal of Accounting Research, 19 (1981), 146–162.

Kleinmuntz, B. 'Why we still use our heads instead of formulas: towards an integrative approach', Psychological Bulletin, 107 (1990), 296–310.

Lawrence, M. J. and Makridakis, S. 'Factors affecting judgmental forecasts and confidence intervals', Organizational Behavior and Human Decision Processes, 42 (1989), 172–187.

Lawrence, M. J. and O'Connor, M. J. 'Exploring judgemental forecasting', International Journal of Forecasting, 8 (1992), 15–26.

Lawrence, M. J. and O'Connor, M. J. 'The anchoring and adjustment heuristic in time series forecasting', Journal of Forecasting, 14 (1995), 443–451.

Lim, J. S. and O'Connor, M. 'Judgemental adjustment of initial forecasts: its effectiveness and biases', Journal of Behavioral Decision Making, 8 (1995), 149–168.

Lim, J. S. and O'Connor, M. 'Judgmental forecasting with time series and causal information', International Journal of Forecasting, 12 (1996), 139–153.

Lindman, H. R. Analysis of Variance in Experimental Design, New York: Springer-Verlag, 1992.

Marascuilo, L. A. and McSweeney, M. Nonparametric and Distribution-Free Methods for the Social Sciences, Monterey, CA: Brooks/Cole Publishing Company, 1977.

Mathews, B. P. and Diamantopoulos, A. 'Judgemental revision of sales forecasts: effectiveness of forecast selection', Journal of Forecasting, 9 (1990), 407–415.

Meehl, P. E. 'When shall we use our heads instead of the formula?', Journal of Counselling Psychology, 4 (1957), 268–273.

O'Connor, M. and Lawrence, M. 'Judgmental forecasting and the use of available information', in Wright, G. and Goodwin, P. (eds), Forecasting with Judgment (pp. 65–90), Chichester: John Wiley, 1998.

O'Connor, M., Remus, W. and Griggs, K. 'Judgemental forecasting in times of change', International Journal of Forecasting, 9 (1993), 163–172.

Payne, J. W., Bettman, J. R. and Johnson, E. J. The Adaptive Decision Maker, Cambridge: Cambridge University Press, 1993.

Peterson, D. K. and Pitz, G. F. 'Effect of input from a mechanical model on clinical judgment', Journal of Applied Psychology, 71 (1986), 163–167.


Remus, W., O'Connor, M. and Griggs, K. 'Does feedback improve the accuracy of recurrent judgmental forecasts?', Organizational Behavior and Human Decision Processes, 66 (1996), 22–30.

Sanders, N. R. 'Accuracy of judgmental forecasts: a comparison', Omega: International Journal of Management Science, 20 (1992), 353–364.

Sanders, N. R. 'The impact of task properties feedback on time series judgmental forecasting tasks', Omega: International Journal of Management Science, 25 (1997), 135–144.

Sanders, N. R. and Manrodt, K. B. 'Forecasting practices in US corporations: survey results', Interfaces, 24 (1994), 92–100.

Sanders, N. R. and Ritzman, L. P. 'The need for contextual and technical knowledge in judgmental forecasting', Journal of Behavioral Decision Making, 5 (1992), 39–52.

Taylor, P. F. and Thomas, M. E. 'Short term forecasting: horses for courses', Journal of the Operational Research Society, 33 (1982), 685–694.

Webby, R. and O'Connor, M. 'Judgmental and statistical time series forecasting: a review of the literature', International Journal of Forecasting, 12 (1996), 91–118.

Willemain, T. R. 'Graphical adjustment of statistical forecasts', International Journal of Forecasting, 5 (1989), 179–185.

Wolfe, C. and Flores, B. 'Judgmental adjustment of earnings forecasts', Journal of Forecasting, 9 (1990), 389–405.

Authors' biographies:
Paul Goodwin is Principal Lecturer in Operational Research at the University of the West of England. His research interests focus on the role of judgment in forecasting and decision making. He is the co-author of Decision Analysis for Management Judgment (2nd edition), published by Wiley, and co-editor of Forecasting with Judgment, also published by Wiley. He has published articles in a number of academic journals, including the International Journal of Forecasting, the Journal of Forecasting and Omega: The International Journal of Management Science.

Robert Fildes is Professor of Management Science in the School of Management, Lancaster University. He has a mathematics degree from Oxford and a PhD in Statistics from the University of California. He has published four books in forecasting and planning, including The Forecasting Accuracy of Major Time Series Methods (with Spyros Makridakis) and, most recently, the World Index of Economic Forecasts. He was co-founder in 1981 of the Journal of Forecasting and in 1985 of the International Journal of Forecasting, and was Editor-in-Chief of the IJF for the period 1988–1997. He has published numerous articles in academic journals, including Economica, Management Science, and the Journal of the Operational Research Society.

Authors' addresses:
Paul Goodwin, Department of Mathematical Sciences, University of the West of England, Coldharbour Lane, Frenchay, Bristol, BS16 1QY, UK.

Robert Fildes, The Management School, Lancaster University, Lancaster, LA1 4YX, UK.
