
International Journal of Forecasting 12 (1996) 119-137

Graphs versus tables" Effects of data presentation format on judgemental forecasting

Nigel Harvey*, Fergus Bolger
Department of Psychology, University College London, Gower Street, London WC1E 6BT, UK

Abstract

We report two experiments designed to study the effect of data presentation format on the accuracy of judgemental forecasts. In the first one, people studied 44 different 20-point time series and forecast the 21st and 22nd points of each one. Half the series were presented graphically and half were in tabular form. Root mean square error (RMSE) in forecasts was decomposed into constant error (to measure bias) and variable error (to measure inconsistency). For untrended data, RMSE was somewhat higher with graphical presentation: inconsistency and an overforecasting bias were both greater with this format. For trended data, RMSE was higher with tabular presentation. This was because underestimation of trends with this format was so much greater than with graphical presentation that it overwhelmed the smaller but opposing effects that were observed with untrended series. In the second experiment, series were more variable but very similar results were obtained.

Keywords: Judgement; Time series; Graphs; Forecasting

1. Introduction

The relative merits of graphical and tabular data displays have been the focus of much debate in the managerial and military decision-making literature. Remus (1987, p. 1200) points out that vendors of computer graphic equipment "have suggested that graphical displays help managers to understand and use data better than older ways (for example, tables of data). While this argument would appear to have some face validity, the empirical literature is in disarray." The extent of this disarray can be gauged from

* Corresponding author. Tel.: 0171 387 7050; fax: 0171 436 4276.

DeSanctis's (1984, p. 475) summary of her review of the area: "A total of 12 studies have found tables to be better than graphs. No meaningful difference between the two presentation modes was found in 10 studies. Only 7 have found graphs to outperform tables."

DeSanctis (1984) argued that researchers need to identify the conditions under which graphs outperform tables as a presentation medium. Since her review, a number of researchers have attempted to do this. They have come to the conclusion that graphs should be used when people have to use their judgement to analyse trends and make forecasts (for example, Coll et al., 1991; Dickson et al., 1986; Tullis, 1988). Some have gone further and argued that there is no point in going to the trouble of creating graphs for any other application (Dickson et al., 1986).

In what follows, we first review previous work that has compared forecasts from graphical and from tabular data. On balance, it favours graphical presentation but the studies suffer from various limitations. On the basis of this review, we specify design criteria that experiments should meet if they are to produce findings that are unambiguously interpretable and generalizable. Our own experiments were designed with these criteria in mind. We report a slight advantage for tabular presentation when series are untrended but a clear superiority for graphical presentation otherwise. Finally, we consider reasons for these effects and discuss the limitations and implications of our work.

2. Background

Evidence relevant to the view that judgemental forecasts benefit from graphical presentation comes from five published studies (Angus-Leppan and Fatseas, 1986; Dickson et al., 1986; Lawrence, 1983; Lawrence et al., 1985; Wagenaar and Sagaria, 1975). Four of them provide some support for it. However, the strength of this support is open to question.

Angus-Leppan and Fatseas (1986) presented students with a single 48-point time series as a column of numbers. First they used the table to make extrapolative estimates for the next 12 values; next, they drew a graph of the original 48 values; finally, they used this graph to re-estimate the same 12 values as before. Forecasts were compared with actual values: mean absolute percentage error (MAPE) was significantly less for graphical than for tabular extrapolations over the first six and over all 12 periods but not over the second six periods. From these results, it appears that graphical presentation is better for short-term forecasting. However, people had spent much more time processing the data before making their graphical extrapolations. In a counterbalanced design, an additional group of subjects would have initially extrapolated from a graph, then tabulated the data and finally extrapolated from the resulting table. Replication of the original results in such a group would have allowed more confidence to be placed in their interpretation. Even then, the problem of generalizing from a single series with unknown structure would remain. We have no assurance that contrary results would not be obtained with most other series.

In their second experiment, Dickson et al. (1986) presented one group of subjects with tables of three 30-point series and another group with graphs of the same series. Both groups had to make forecasts for the next three points in each series. For eight of the nine forecasts, absolute error was significantly less with graphical presentation. The between-subjects design allows clear interpretation of these results but the problem of generalization remains. Dickson et al. (1986) used only three series and all of them were trended. Forecasters often have to extrapolate from series that are not trended but still have (for example, autoregressive) structure. Would the advantage of graphical over tabular presentation still appear with such series?

Lawrence (1983) also used a between-subjects design. Four subjects were given a table of monthly airline passenger totals for the period 1953-1959 and five others were given a graph of corresponding data for the period 1956-1959. Both groups first made forecasts for the 12 months of 1960. Average MAPE scores over these 12 forecasts did not differ between groups. Next, subjects were given actual data for the first six months of 1960 and had to re-forecast the last six months. When MAPE scores were averaged over these six re-forecasts and the original forecasts for the first six months, values of 5.3% and 5.9% were obtained for the graphical and tabular presentation groups, respectively. Again, this difference suggests that graphical presentation is better for short-term forecasting. However, its statistical significance was not reported. Our own analysis of the data provided in the paper shows it to be absent. This is not too surprising given the small group sizes.

Lawrence et al. (1985) used 111 annual, quarterly and monthly economic time series as their experimental materials. In their graphical presentation condition, each of 111 students forecast a different one of these series. In their tabular presentation condition, annual series were excluded: each of a separate group of 91 students forecast a different one of the remaining quarterly and monthly series. However, the tables did not contain the raw values that were plotted in the graphs. Instead they contained percentages of the annual totals. Furthermore, the totals were presented in a separate column of the table and subjects were instructed to graph them. MAPE scores were (insignificantly) lower when data were presented graphically and this difference was numerically larger at short time horizons. Yet again, there is some suggestion that short-term forecasts benefit from graphical presentation. However, it is just as plausible to attribute the observed difference in MAPE scores to easier short-term forecasting with raw monthly values than with monthly percentages of annual values. Alternatively it may have arisen because graphing annual values interfered with students' attempts to forecast from tables of percentages of those values.

Wagenaar and Sagaria (1975) used an exponential growth algorithm to generate a time series. They presented people with the first five points of this series and required them to forecast the sixth one. Thirty subjects were given the series as a column of numbers and another group of the same size was presented with the data in graphical form. Forecasts in both groups were underestimates of the point produced by the exponential algorithm and the degree of underestimation was greater with graphical presentation. Superficially, it appears that forecasting benefits from a tabular data format. However, Jones (1976, 1979, 1984) has pointed out that the five points generated by the exponential algorithm could be adequately fitted by a quadratic function. If such a function is used to produce the benchmark sixth point against which subjects' forecasts are assessed, underestimation disappears. Given this, we have no reason to suppose that the advantage of tabular presentation is preserved.

Taken together, the studies outlined above can be regarded as providing some tentative support for the notion that judgemental forecasting benefits from graphical presentation. However, it also seems fair to say that more evidence on this issue would not be amiss. There is a real need for findings that are both unambiguously interpretable and generalizable. This requires that a number of features be included in the design and analysis of experiments performed to study this topic.

First, graphical and tabular presentation should be compared using either a between-subjects or balanced within-subject design. In the latter case, half the subjects receive graphical presentation first and half receive tabular presentation first. Second, experimental conditions should differ only in terms of presentation format: identical series should be presented graphically and in a table. Third, untrended as well as trended series should be studied. Certain judgemental biases only affect trended series. An effect of presentation format on forecasting performance with trended series may arise solely because this variable influences the strength of these biases. Thus we cannot generalize our findings with trended series to untrended ones with impunity. Fourth, a sufficient number of trended and untrended series need to be studied to ensure that findings do not arise from idiosyncratic features of the presented data (for example, an unrepresentative distribution of noise over the last few observations). Including a variety of degrees of serial dependence would also increase the generalizability of findings. Fifth, it is important to collect enough data to ensure that theoretically and practically important differences in forecasting accuracy reach statistical significance. Presenting subjects with more than one series each may achieve this while obviating the need to increase group size.

3. Causes of format effects

There is one additional feature that would improve generalizability of findings. Various reviewers (for example, DeSanctis, 1984; Ganzach, 1993) have pointed out that research on presentation format effects has been empirically rather than theoretically driven. This emphasis is not surprising given the practical and commercial importance of the issue. However, as Ganzach (1993) stresses, an understanding of the cognitive processes underlying the effects of presentation format would be useful in practice. If we could understand how the effects are produced, we could be more confident in deciding whether they should be taken into account in forecasting situations in which they have not been studied explicitly. This is because we would have some theoretical basis from which to generalize our knowledge. Experiments should be designed and analysed to cast some light on the cognitive processes responsible for any effect of presentation format. In particular, it is important to select error measures appropriate for this purpose.

Judgemental forecasting is suboptimal because it is prone to biases and inconsistency. Biases include underestimation or damping of trends (Bolger and Harvey, 1993; Eggleton, 1982; Lawrence and Makridakis, 1989; Sanders, 1992) and a tendency to overforecast (Eggleton, 1982; Lawrence and Makridakis, 1989). They are measured by constant error (CE).¹ Studying them may illuminate the cognitive processes subserving forecasting. For example, trend-damping can be interpreted in terms of the use of an anchor-and-adjust heuristic (Harvey et al., 1994). Typically, adjustments away from a judgement anchor are insufficient (Tversky and Kahneman, 1974): it appears that people forecasting trended series use the last data point as an anchor and make insufficient adjustment away from it to take account of trend. If presentation format affects the use of the anchor-and-adjust heuristic (for example, by influencing assessment of the anchor), then manipulating this variable will influence trend-damping and hence CE.

¹ Given that D is the target (actual data point or optimal forecast) minus the judgemental forecast, CE = Σ D/n; VE = √(Σ (D − CE)²/n); and RMSE = √(Σ D²/n) or, equivalently, √(CE² + VE²).

Forecasts are also inconsistent: the same series will receive different forecasts from different people and different forecasts from the same person on different occasions. Inconsistency is measured by variable error (VE).² The study of inconsistency may also throw light on the cognitive processes underlying forecasting performance. For example, the finding that inconsistency increases with noise in the series (Harvey, 1995) suggests that it is at least partly caused by use of the representativeness heuristic (Tversky and Kahneman, 1974). Forecasters have some (possibly implicit) tendency to represent the noise as well as the pattern in the series. If presentation format affects this tendency, then manipulating the format may be expected to change forecast inconsistency and hence VE.

These considerations provide a rationale for measuring both CE and VE in studies of the effects of presentation format. However, practitioners are likely to be more interested in an overall estimate of error that takes them both into account. This is easily done by taking the square root of their sum of squares to obtain root mean square error (RMSE).³ Although this measure may not be the best choice for comparing the accuracy of different forecasting techniques over different types of series (Armstrong and Collopy, 1992; Fildes, 1992), it is still useful for comparing the effects of different presentation formats on forecasting the same series.

² See footnote 1. ³ See footnote 1.
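For readers who want to compute these scores, the following is a minimal sketch in Python (ours, not from the paper) of the decomposition defined in footnote 1; the final comment notes the identity that links the three measures.

```python
import math

def error_scores(targets, forecasts):
    """CE, VE and RMSE as in footnote 1, where D is the target
    (actual data point or optimal forecast) minus the judgemental forecast."""
    d = [t - f for t, f in zip(targets, forecasts)]
    n = len(d)
    ce = sum(d) / n                                    # constant error: bias
    ve = math.sqrt(sum((x - ce) ** 2 for x in d) / n)  # variable error: inconsistency
    rmse = math.sqrt(sum(x ** 2 for x in d) / n)       # equals sqrt(ce**2 + ve**2)
    return ce, ve, rmse
```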

4. First experiment

In an attempt to secure sound evidence on the effects of presentation format, we designed an experiment to meet the criteria listed above. We used a balanced within-subject design: graphical series preceded tabular series for half the subjects but followed them for the other half. The task of forecasting the two types of series was made as comparable as possible in all other respects. Untrended as well as positively and negatively trended series were forecast. To ensure that findings could not easily be attributed to idiosyncratic noise patterns in individual series, subjects forecast 42 different series (seven graphical and seven tabular for each of these three trend types). To increase the generalizability of any findings, serial dependence varied over series and was used as a covariate in the analysis. The 4368 data points (42 series × 52 subjects × 2 forecasts per series) should provide sufficient power to ensure that any potentially important effects of presentation format are statistically significant. Finally, to facilitate identification of the cognitive processes responsible for these effects, we analysed CE and VE in addition to RMSE.

4.1. Subjects and stimulus materials

The 52 subjects were all prospective psychology students and mostly aged between 17 and 20 years. Approximately 75% of them were female.

To generate 42 22-point stimulus series with three different trends and varying degrees of serial dependence, the following procedure was adopted. First, 21 44-point series were produced with the following algorithm:

X_t = μ + α₁(X_{t-1} − μ) + α₂(X_{t-2} − μ) + e_t

Each point (X_t) was calculated as the mean of the series (μ) plus a proportion (α₁) of the deviation of the previous point (X_{t-1}) from the mean of the series plus a different proportion (α₂) of the deviation from the mean of the point (X_{t-2}) immediately before the previous point, plus some Gaussian noise (e_t). The mean was set to zero and the noise was normally distributed with a mean of zero and a standard deviation of two.

To ensure that serial dependence and the frequency composition of the different series varied widely, seven different combinations of α₁ and α₂ were each used to produce three series. These seven combinations were: -0.5, -0.3; -0.5, +0.3; -0.5, 0; 0, 0; +0.5, 0; +0.5, -0.3; +0.5, +0.3. Of course, owing to sampling error, the partial autocorrelation function for the series generated with a particular α₁, α₂ combination varied considerably around what would be expected in an infinitely long series generated with that combination. (The actual partial autocorrelations in the stimulus series were used to generate optimal forecasts and as covariates in the analyses reported below.)

The three different series generated with each α₁, α₂ combination were given different trends. This was accomplished by adding a constant positive value (upward trend), a constant negative value (downward trend) or zero (no trend) at each time period.

Finally, to provide the 42 22-point stimulus series, each of the 21 44-point series produced by the procedure just outlined was divided into two. Half the subjects saw the first half of it presented graphically and the second half presented in a table, whereas the remaining subjects saw the first half presented in a table and the second half presented graphically. Because the last two points in each 22-point series had to be forecast, they were withheld from subjects until they had made their responses. After the forecasts had been made, the two points were presented as feedback.
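As a sketch of how such series could be generated (ours, in Python): the warm-up length, the starting values and the size of the trend increment are our assumptions; the paper reports only the algorithm, the noise distribution and the α combinations.

```python
import random

def make_series(a1, a2, trend, n=44, mu=0.0, noise_sd=2.0, warmup=50):
    """X_t = mu + a1*(X_{t-1} - mu) + a2*(X_{t-2} - mu) + e_t,
    with a constant trend increment added at each time period."""
    x = [mu, mu]                              # starting values (assumed)
    for _ in range(warmup + n):
        e = random.gauss(0.0, noise_sd)       # Gaussian noise, sd = 2
        x.append(mu + a1 * (x[-1] - mu) + a2 * (x[-2] - mu) + e)
    x = x[-n:]                                # keep the last 44 points
    return [v + trend * t for t, v in enumerate(x)]

series = make_series(a1=0.5, a2=-0.3, trend=0.5)    # trend size is illustrative
first_half, second_half = series[:22], series[22:]  # two 22-point stimulus series
```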

The graphs were produced by plotting the series against time. They were labelled sales on the ordinate and days on the abscissa. For tabular presentation, the series were displayed as two columns of numbers labelled days and sales, respectively. The scale was identical for both types of presentation and such that sales were always substantially above zero. Also the mean level of sales to be forecast was around 35 (range 22-46) in all series.

4.2. Design and procedure

Half the subjects saw 21 series presented as graphs then 21 series presented as columns of numbers. The other half of the subjects received the tabular presentations first and the graphical presentations second. The order in which the series were presented within each of these two blocks was randomized separately for each subject.

Within each experimental session, the subjects were first briefed orally as a group. They were told that they would be taking part in a computer-controlled experiment concerning forecasting. Specifically, they were asked to regard themselves as factory managers who were required to predict sales for the next two days on the basis of sales figures for the previous 20 days. They were told that feedback about the true level of sales would be given after the two forecasts for each product had been made. They were informed that there were 42 products in all and that the data for half of these products would be presented in graphical form and the other half in tabular form. A couple of example trials were then demonstrated on large video monitors.

Next the subjects were shown into individual cubicles containing a microcomputer with colour monitor and two joystick controls, one with red buttons and the other with yellow buttons. Once the subjects had read the further written instructions provided as an aide memoire, they could start the first trial by pressing any red button on the left-hand joystick. When such a button was pressed, the first series was presented on the screen.

If the series was presented as a graph the subjects could make a small dot move up and down the screen in the position of the 21st day by moving the right-hand joystick backwards and forwards. The forecast could be made by positioning this dot and then pressing one of the yellow buttons on the joystick console. A dot then appeared on the screen at the position of the 22nd day and could be positioned as before to make the forecast for that day. The true levels of sales were then indicated by extending the graph by two time periods: the subject could see his or her forecast in relation to these actual values. Pressing a red button (on the left-hand joystick) allowed the subject to move on to the sales figures for the next product.

The procedure was identical when the series was presented as a table of numbers except that moving the joystick made the value indicated for the 21st (or 22nd) day change in whole units between 0 and 100. Feedback was given by presenting the true values next to the forecasts before the subject moved on to the next series by pressing a red joystick button.

The experiment was self-paced: on average, it took approximately 40 minutes. After forecasts had been made for all 42 products, the subjects were presented with a message on the screen telling them that the experiment had finished and thanking them for their participation. The subjects' forecasts, and the orders in which the series had been viewed, were then saved to disc.

4.3. Results

Subjects' performance was assessed relative to optimal forecasts rather than relative to the actual values of the series which they had received as feedback. Optimal forecasts were calculated for each of the 42 series by dropping the noise component from the generating algorithm described above. The optimal forecasts were computed for untrended series and trended series with the trend removed. In the latter case the trend was added back again before error scores were calculated. The optimal values of α₁ and α₂ were calculated from the lag-1 and lag-2 partial autocorrelations for each of the actual 20-point series observed by the subjects prior to their forecasts (de-trended as appropriate). We used optimal forecasts as the standard rather than the true values because this is the fairest test of the subjects' forecasting ability. We would neither wish nor expect subjects to forecast the noise component in the series.
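A sketch of this computation, under the assumption that each series is treated as an AR(2) process: the lag-2 partial autocorrelation then equals α₂ and the lag-1 partial autocorrelation equals α₁/(1 − α₂), so both coefficients can be recovered and the noise-free forecasts rolled forward. The function name and interface are illustrative.

```python
def optimal_two_step(x, pacf1, pacf2, mu=0.0):
    """Two-step-ahead noise-free AR(2) forecast from a detrended series x.
    pacf1, pacf2 are the observed lag-1 and lag-2 partial autocorrelations."""
    a2 = pacf2
    a1 = pacf1 * (1 - a2)     # Yule-Walker for AR(2): pacf1 = a1 / (1 - a2)
    f21 = mu + a1 * (x[-1] - mu) + a2 * (x[-2] - mu)   # day 21
    f22 = mu + a1 * (f21 - mu) + a2 * (x[-1] - mu)     # day 22
    return f21, f22           # any removed trend is added back before scoring
```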

By subtracting judgemental forecasts from optimal forecasts, 4368 forecast errors were obtained. On the basis of box-plots of these errors, ten data points lying outside the outer fence were replaced by the subjects' mean error for the stimulus series. Examination of the corresponding forecasts suggested that these points were mistakes by subjects (for example, caused by inadvertently leaning on or otherwise pressing one of the response buttons).
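The paper does not spell out the fence definition; the sketch below assumes Tukey's box-plot convention, in which the outer fences lie 3 × IQR beyond the quartiles.

```python
import statistics

def outside_outer_fence(errors):
    """Flag errors beyond Tukey's outer fences (3 x IQR past the quartiles)."""
    q1, _, q3 = statistics.quantiles(errors, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 3 * iqr, q3 + 3 * iqr
    return [e < lo or e > hi for e in errors]
```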

The three error scores (CE, VE, RMSE) were calculated separately for each subject's responses to each series by aggregating the forecast error information for the 21st and 22nd days using the formulae given in footnote 1 with n equal to two. Thus we obtained 2184 values (52 subjects × 42 forecasts) for each error type. The means and standard deviations of these error scores are shown in Table 1 for each presentation format (graphical; tabular) and trend type (up; down; none).


Table 1
First experiment: Mean error scores of forecasts from series with each type of trend presented in each format (standard deviations are shown in parentheses)

              No trend        Upward trend    Downward trend   Overall
              (n = 728)       (n = 728)       (n = 728)        (n = 2184)
Graphical
  CE          -0.89 (1.55)     0.83 (2.92)    -1.27 (2.08)     -0.44
  VE           0.85 (0.75)     1.03 (0.98)     0.92 (0.88)      0.93
  RMSE         1.75 (1.18)     2.59 (2.12)     2.32 (1.46)      2.22
Tabular
  CE          -0.41 (1.85)     1.62 (3.07)    -1.74 (2.64)     -0.18
  VE           0.66 (0.60)     0.95 (0.92)     0.91 (0.89)      0.84
  RMSE         1.60 (1.36)     2.99 (2.21)     2.70 (2.08)      2.34


Inspection reveals clear patterns in these data. When results are aggregated over trend type, CE and VE are higher with graphical presentation but RMSE is higher with tabular presentation. All four error measures are larger for trended than for untrended series. Trend-damping is revealed by the positive CE values for upward trends and the negative CE values for downward trends. A tendency to overforecast is shown by the negative CE with untrended series and, arguably, by the negative CE values with downward trends that are larger in absolute terms than the positive CE values with upward trends.

To analyse these data we performed a separate analysis of variance and covariance on each error score. Initial analyses showed no main effects of order of presentation (graphical then tabular versus tabular then graphical) on error scores and so this between-subjects variable was dropped from subsequent analyses. Thus the analyses that we report here include presentation format (graphical; tabular) and trend type (up; down; none) as within-subject independent variables. To partial out the effects of serial dependence in the series, the three covariates were the lag-1 and lag-2 partial autocorrelation coefficients and the interaction between these factors. Table 2 shows the results of these analyses together with adjusted means, significance tests for the influence of covariates, and correlation coefficients between covariates and dependent variables.

The analyses confirm initial impressions. Presentation format has a highly significant effect on all error measures. However, as we pointed out, its main effect on CE and VE is in the reverse direction to its main effect on RMSE. Superficially, this seems paradoxical because RMSE is made up of CE and VE terms: how can the aggregate measure show an effect in the opposite direction to those shown by its constituent parts?

In fact, the paradox can be resolved by noticing that the analyses produce significant interactions and then performing tests of the simple effect of presentation format separately for each level of the trend variable. First, consider the effect of presentation format on untrended series only. For these series, RMSE was greater with graphical than with tabular presentation (F(1,364) = 3.28; p = 0.07): this is reasonable because both the overforecasting bias measured by CE (F(1,364) = 14.82; p < 0.0001) and the inconsistency measured by VE (F(1,364) = 15.00; p < 0.0001) were greater with graphical presentation. Now consider the effect of presentation format on trended series. RMSE was greater with tabular than with graphical presentation for both upward (F(1,364) = 6.15; p = 0.014) and downward (F(1,364) = 8.01; p = 0.005) trended series. Again this is reasonable because the much greater underestimation of both upward (F(1,364) = 15.68; p < 0.0001) and downward (F(1,364) = 7.18; p = 0.008) trends with tabular presentation overwhelms the opposing but smaller effects of overforecasting and inconsistency that we observed with untrended series.

We can now see why aggregating data over both untrended and trended series produces the paradoxical main effects shown in Table 2. Aggregate RMSE is greater with tabular presentation because it contains more trended than untrended series: the effects obtained with trends therefore overwhelmed the effects obtained without them. Aggregate CE is greater with graphical presentation because the equal number of upward and downward series ensures that their opposite-signed trend underestimation effects cancel each other out. Consequently, the presentation format effect on aggregate CE reflects that obtained with the untrended series. (VE was greater with graphical presentation for all trend types. It did not contribute to the paradox observed with the main effects.)


Table 2
First experiment: Analyses of variance (ANOVA); significance tests for influence of covariates; correlation coefficients between covariates and dependent variables; and adjusted means

(a) Constant error (CE)

ANOVA                                 df         F         p
  Presentation mode                   (1,51)     7.28      0.007
  Trend                               (2,102)    243.97    <0.001
  Presentation mode × trend           (2,1971)   14.53     <0.001

Covariates                            t          p         r
  Lag-1 partial autocorrelation (ρ₁)  2.42       0.016     0.012
  Lag-2 partial autocorrelation (ρ₂)  0.02       NS        0.019
  ρ₁ × ρ₂                             -1.33      NS        -0.131

Adjusted means    No trend   Trend up   Trend down   Overall
  Graphical       -0.88      0.82       -1.27        -0.44
  Tabular         -0.40      1.61       -1.74        -0.18

(b) Variable error (VE)

ANOVA                                 df         F         p
  Presentation mode                   (1,51)     8.34      0.004
  Trend                               (2,102)    15.29     <0.001
  Presentation mode × trend           (2,1971)   2.53      NS

Covariates                            t          p         r
  Lag-1 partial autocorrelation (ρ₁)  -2.48      0.013     -0.054
  Lag-2 partial autocorrelation (ρ₂)  -3.63      <0.001    -0.059
  ρ₁ × ρ₂                             0.50       NS        0.009

Adjusted means    No trend   Trend up   Trend down   Overall
  Graphical       0.86       1.03       0.91         0.93
  Tabular         0.67       0.94       0.90         0.84

(c) Root mean square error (RMSE)

ANOVA                                 df         F         p
  Presentation mode                   (1,51)     7.74      <0.005
  Trend                               (2,102)    70.01     <0.001
  Presentation mode × trend           (2,1971)   6.15      <0.002

Covariates                            t          p         r
  Lag-1 partial autocorrelation (ρ₁)  2.09       0.037     0.044
  Lag-2 partial autocorrelation (ρ₂)  -6.99      <0.001    -0.144
  ρ₁ × ρ₂                             -2.78      0.006     -0.054

Adjusted means    No trend   Trend up   Trend down   Overall
  Graphical       1.81       2.56       2.30         2.22
  Tabular         1.65       2.95       2.68         2.43


Table 2 shows that lag-1 partial autocorrelation was a significant covariate for all measures, lag-2 partial autocorrelation was a significant covariate for all measures except CE, and the interaction term (required for coding the frequency composition of the series) was significant for RMSE only. However, both the weakness of the correlations between covariates and the dependent variables and the relatively small differences between raw means (Table 1) and adjusted means (Table 2) imply that the influence of serial dependence on the effects that we have described was not large.

The results of this experiment are easily summarized. When forecasting untrended series, there is a slight advantage to using a tabular presentation. This is because inconsistency and the overforecasting bias are somewhat less with this format. In contrast, forecasting trended series is clearly better with a graphical presentation. This is because there is considerably less trend-damping with this format than with a tabular presentation. Finally, we can have some confidence that these broad conclusions hold for series that vary widely in terms of the serial dependences that they contain.

5. Second experiment

A second experiment was designed to explore the generality of the findings just described. Specifically, we wanted to find out whether similar effects of presentation format would appear when people forecast from series with higher variability.

Previous work provides no clear guide as to what to expect. Increasing the variability of trended and untrended noise series increases the absolute error in forecasts (Dickson et al., 1986; O'Connor et al., 1993; Sanders, 1992) but this could be attributed to an increase in CE, VE or both these components.

In the first experiment, RMSE for trended series was lower with graphical presentation because the trend-damping (CE) effects that favoured graphical over tabular presentation outweighed the smaller inconsistency (VE) effects that favoured tabular over graphical presentation. Hence, an increase in series variability that causes a proportional rise in VE without affecting CE could result in the overall (RMSE) advantage of graphical presentation being lost. On the other hand, an increase in series variability that causes a proportional increase in trend-damping without any corresponding increase in VE would ensure that the overall advantage of graphical presentation is maintained. Of course, increasing series variability might increase both trend-damping and inconsistency: preservation of the presentation format effects obtained in the first experiment would then depend on the relative size of these two effects.

There is some evidence that increasing series variability increases VE (Harvey, 1995) but the evidence on CE is more ambiguous. Eggleton (1982) and Sanders (1992) report greater trend-damping with more variable series but Lawrence and Makridakis (1989) do not.

5.1. Subjects and stimulus materials

Another 52 subjects were drawn from the same population as before. None of them had taken part in the first experiment.

The series were identical to those used in the first experiment except for the fact that individual values were twice as far from the series means (untrended series) or trend lines (trended series) as they were before. In other words, standard deviations of the series around their means or trend lines were double what they were in the first experiment.⁴

⁴ When series lack serial dependence, this is equivalent to doubling the standard deviation of the noise term, e_t.


5.2. Design and procedure

The design and procedure were exactly as described for the first experiment.

5.3. Results

The same procedures as before were used to remove outliers. In this instance, 14 of the 4368 values were replaced by the subjects' means. The same three error scores (CE, VE and RMSE) were then calculated for each subject's responses to each series. The means and standard deviations of these scores are shown in Table 3 for each presentation format (graphical; tabular) and trend type (up; down; none).

For both presentation formats and all three trend types, RMSE scores are close to double what they were in the first experiment. These increases in overall error scores can be attributed to increases in both VE and CE. Doubling the standard deviation of the series doubled VE and more than doubled the size of the trend-damping bias (shown by larger positive CE values with upward trends and larger negative CE values with downward trends). Only the overforecasting bias (shown by the negative CE for untrended series) is much the same as before.

These effects of increasing series variability are approximately the same for the graphical and tabular presentations. As a consequence, results of tests of the simple effect of presentation format at each level of the trend variable are very much the same as they were in the first experiment. Damping of both upward (F(1,364) = 31.58; p < 0.0001) and downward (F(1,364) = 18.66; p < 0.0001) trends is greater with the tabular presentation, whereas untrended series yield greater overforecasting (F(1,364) = 11.63; p = 0.0007) (though not, in this case, inconsistency) with the graphical presentation. The main effects aggregating over trend type show CE to be greater with the graphical presentation; VE to be unaffected by presentation format; and RMSE to be greater with the tabular presentation. As before, the opposite effects for aggregate CE and aggregate RMSE can be attributed to the fact that trend-damping is largely cancelled out in the former score but overwhelms other effects in the latter one.

Data were statistically analysed in the same way as before. Table 4 shows results of the analyses of variance together with adjusted means, significance tests for the influence of covariates and correlations between covariates and dependent variables. There were significant effects of presentation format, trend and the interaction between these variables for all measures apart from VE (where only the effect of trend was significant). The lag-1 partial autocorrelation was a significant covariate for all measures except VE and the lag-2 partial autocorrelation was a significant covariate for all of them except CE. However, the weakness of the correlations between covariates and dependent variables and the relatively small differences between raw and adjusted means again imply that the influence of serial dependence was not large.


Table 3
Second experiment: Mean error scores of forecasts from series with each type of trend presented in each format (standard deviations are shown in parentheses)

              No trend        Upward trend    Downward trend   Overall
              (n = 728)       (n = 728)       (n = 728)        (n = 2184)
Graphical
  CE          -1.09 (2.64)     2.52 (4.51)    -2.60 (3.97)     -0.39
  VE           1.75 (1.36)     1.92 (1.63)     1.84 (1.60)      1.84
  RMSE         3.12 (1.82)     4.74 (3.24)     4.46 (2.92)      4.11
Tabular
  CE          -0.35 (3.05)     4.44 (4.95)    -4.02 (4.41)     -0.02
  VE           1.57 (1.42)     1.93 (1.65)     1.92 (1.65)      1.81
  RMSE         3.06 (2.12)     5.91 (3.97)     5.32 (3.71)      4.76


Table 4
Second experiment: Analyses of variance (ANOVA); significance tests for influence of covariates; correlation coefficients between covariates and dependent variables; and adjusted means

(a) Constant error (CE)

ANOVA                                 df         F         p
  Presentation mode                   (1,51)     6.01      0.014
  Trend                               (2,102)    518.01    <0.001
  Presentation mode × trend           (2,1971)   33.80     <0.001

Covariates                            t          p         r
  Lag-1 partial autocorrelation (ρ₁)  2.48       0.013     -0.003
  Lag-2 partial autocorrelation (ρ₂)  -1.34      NS        0.004
  ρ₁ × ρ₂                             -0.47      NS        -0.148

Adjusted means    No trend   Trend up   Trend down   Overall
  Graphical       -1.06      2.52       -2.63        -0.39
  Tabular         -0.31      4.44       -4.06        0.02

(b) Variable error (VE)

ANOVA                                 df         F         p
  Presentation mode                   (1,51)     0.22      NS
  Trend                               (2,102)    5.18      0.066
  Presentation mode × trend           (2,1971)   1.51      NS

Covariates                            t          p         r
  Lag-1 partial autocorrelation (ρ₁)  -1.81      NS        -0.035
  Lag-2 partial autocorrelation (ρ₂)  -3.58      <0.001    -0.070
  ρ₁ × ρ₂                             -0.61      NS        -0.005

Adjusted means    No trend   Trend up   Trend down   Overall
  Graphical       1.77       1.90       1.84         1.84
  Tabular         1.59       1.91       1.92         1.81

(c) Root mean square error (RMSE)

ANOVA                                 df         F         p
  Presentation mode                   (1,51)     28.02     <0.001
  Trend                               (2,102)    108.58    <0.001
  Presentation mode × trend           (2,1971)   8.89      <0.01

Covariates                            t          p         r
  Lag-1 partial autocorrelation (ρ₁)  4.65       <0.001    0.091
  Lag-2 partial autocorrelation (ρ₂)  -4.62      <0.001    -0.103
  ρ₁ × ρ₂                             -1.42      NS        -0.035

Adjusted means    No trend   Trend up   Trend down   Overall
  Graphical       3.19       4.73       4.40         4.11
  Tabular         3.13       5.90       5.26         4.76

In summary, the results of this experiment are very similar to those of the previous one. When forecasting untrended series, there was a slight advantage of a tabular presentation: this was because the overforecasting bias was rather less with this format. In contrast, forecasting trended series was much better with a graphical presentation: this was because there was much less trend-damping with this format. As before, we can have some confidence that these conclusions hold for series that vary widely in terms of the serial dependences that they contain. In addition, we now have some reason to believe that they are unaffected by series variability.

6. Discussion

The experiments have produced findings that appear to be clear, statistically significant and generalizable to a variety of series types. In particular, the highly significant trend type by presentation format interactions for directional (CE) and total (RMSE) error scores indicate that it is better to forecast untrended series from tables and trended series from graphs. This finding arose largely because trended and untrended series are subject to different cognitive biases and the strength of these biases depends on presentation format.

In what follows we discuss the different factors that contribute to total error. We then outline some of the limitations and implications of our work.

6.1. Overforecasting bias

An overforecasting bias is a general tendency to produce forecasts that are higher than they should be. It is important to recognize that overforecasting can arise from factors other than this bias and that findings that people underforecast in certain circumstances do not indicate that this bias is absent. More specifically, damping a downward trend will produce some overforecasting that is not attributable to, but that will add to, the effects of an overforecasting bias. Conversely, damping an upward trend will tend to produce underforecasting and this will subtract from the effects of an overforecasting bias.

Given these effects of trend damping, an overforecasting bias is most easily identified in untrended series. The negative CE values with both presentation formats indicate the presence of an overforecasting bias in both the experiments reported here. Eggleton (1982) and Lawrence and Makridakis (1989) also report overforecasting in stationary series. Sanders (1992) found a slight effect in the opposite direction but he only examined two stationary series. In one of them, four of the last five data points were below the series mean and, in the other, three of them were below it. Given the importance of recent data points in judgemental forecasting (for example, Lawrence and O'Connor, 1992; Bolger and Harvey, 1993), it is likely that the underforecasting that Sanders obtained was caused by the unrepresentative noise pattern imposed on these points in the series that he used.

If we are willing to assume that the overforecasting and trend-damping biases are independent and that the size of the latter does not depend on trend direction, we should also be able to estimate the size of the overforecasting bias by halving the difference between the size of the overforecasting effect with downward trends and the size of the underforecasting effect with upward trends. In our experiments this procedure produces an estimate of the bias that is somewhat smaller than that obtained from untrended series. However, when performed on the results reported by Lawrence and Makridakis (1989), it produces an estimate very similar in magnitude to that obtained from their untrended series.
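As a worked illustration of this halving procedure, using the graphical-presentation CE values from Table 1:

```python
# Halving procedure applied to the graphical CE values in Table 1.
# Assumes the two biases are independent and damping is direction-symmetric.
ce_up, ce_down = 0.83, -1.27          # CE for upward and downward trends
bias = (abs(ce_down) - ce_up) / 2     # -> 0.22
# Direct estimate from the untrended series: |CE| = 0.89, i.e. somewhat larger,
# in line with the comparison reported in the text.
```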

Why is there this slight tendency to overforecast? Is it because people are somewhat optimistic and experimenters have asked them to forecast variables associated with benefits rather than costs? Given the established effects of axis labelling in tasks such as this (Sniezek, 1986), optimism effects are not implausible. However, while an overforecasting bias was evident with (presumably desirable) 'sales' labels in Lawrence and Makridakis's (1989) and the present experiments, it was also evident with (presumably undesirable) 'costs' labels in Eggleton's (1982) experiments.

Another more 'ecological' explanation of the bias is that people more frequently experience data series that are increasing than data series that are decreasing. As a result they develop expectations about how series typically change and these expectations influence their forecasts. This influence may be greater when the possible direction of change is ambiguous (for example, in the noisy untrended series used here) than when it is not (for example, in the noisy but clearly trended series used here). It is not easy to exclude this possibility. However, it would have to be elaborated on to explain the greater overforecasting bias obtained with graphical than with tabular series. Do we experience upwardly trended data even more frequently in graphs than in tables? Answering this would require a survey of the sort that Anderson (1990) has carried out to support his ecological accounts of cognitive processes.

One final possibility is that the bias arises from anchor points that are used in the judgement process. Previous work has established the most recent data point as an important anchor that is used in judgemental forecasting (for example, Harvey et al., 1994; Lawrence and O'Connor, 1992). However, other anchor points may also influence the forecast. For example, Lawrence and Makridakis (1989) found that forecasts were less biased when series were presented nearer to the top of the graph. Clearly the top of the graph or computer screen is a visual anchor point that could be used when presentation is graphical, but that would be unavailable when it is tabular. That anchor points such as this have some role in producing the overforecasting bias is consistent with the greater magnitude of the bias for graphical than for tabular presentation.

6.2. Trend-damping

Trend-damping is a large and well-established bias that affects judgemental forecasting (for example, Bolger and Harvey, 1993; Eggleton, 1982; Lawrence and Makridakis, 1989; Sanders, 1992). It appears to arise from use of anchor-and-adjust heuristics. People make forecasts for trended series by anchoring on the last data point and then making an adjustment away from it to take the trend into account (Bolger and Harvey, 1993). However, when this heuristic is used, adjustments are typically too small (Tversky and Kahneman, 1974). Consistent with this, people underestimate the size of the adjustment needed to take full account of the trend in the series. This produces the damping effect.
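A toy rendering of this account (our sketch, not a model fitted in the paper): the forecast anchors on the last observation and adjusts by only a fraction k < 1 of the true per-period trend, which produces damping.

```python
def anchored_forecast(last_point, trend_per_period, steps=1, k=0.8):
    """Anchor-and-adjust: under-adjustment (k < 1) damps the trend.
    The value of k is illustrative, not estimated from the data."""
    return last_point + k * trend_per_period * steps
```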

This account begs the question of why adjustments are too small. Lawrence and Makridakis (1989, p. 182) argue that people's forecasts are influenced by their knowledge that trends in the world do not continue for ever: damping "indicates a practical forecast since a growth for 7 years might well precede some lean years". In other words, trends are usually parts of long-term cycles and this ecological fact is represented in the knowledge that people bring to their forecasting tasks.

Trend-damping was greater with the more variable series used in the second experiment: this finding replicates those of Eggleton (1982) and Sanders (1992). It can be interpreted in terms of Lawrence and Makridakis's (1989) ecological argument. Higher series variability increases ambiguity about the gradient of the trend-line. As a consequence, there is greater scope for top-down imposition of beliefs about future data that are based on the assumption that the series is behaving in a way that is representative of the ecology.

Trend-damping was much greater when data were presented in tabular format. This finding can also be interpreted in terms of Lawrence and Makridakis's (1989) ecological argument if we assume that gradients of trends are more ambiguous in tabular data than in graphical data. Taking trended data out of a graph and putting them in a table may be cognitively equivalent to increasing their variability. It allows people more scope to impose their ecological knowledge of how series typically behave on their judgements of where the next data points will be.

Why should gradients of trends be more ambiguous in tabular data? Our visual system has evolved to extract information about contours (edges, creases) from noisy signals (Marr, 1976; Marr and Hildreth, 1980). This information includes details of position, length and orientation. Visual perception of trends in graphs should be able to exploit these highly sophisticated perceptual mechanisms: it would be surprising if it were not carried out relatively well. In contrast, processing of numbers in tables cannot make use of specific processing mechanisms that have evolved for some other purpose. It has to rely on more general cognitive processes that have developed to allow us to cope with information for which no specific mechanism has evolved (Plotkin, 1994). These processes trade off accuracy for generality; if they are responsible for extracting trends from numbers, it is not surprising that the resulting information is more ambiguous than the corresponding information obtained from graphs by highly evolved special mechanisms.

6.3. Inconsistency

Before discussing inconsistency, we must deal with a methodological point. We have taken CE and VE as independent measures of bias and inconsistency. For trended series, we have assumed that CE, derived from forecasts for the 21st and 22nd day, measures trend-damping and overforecasting biases, and that VE, derived from the same two forecasts, measures inconsistency. However, it is possible that VE is also influenced by trend-damping.

Consider, for example, the forecasts for an upward trended series. Trend-damping would result in both of them being below optimal values and the average amount of this underestimation would equal CE. However, trend-damping would also lead to the forecast for the 22nd day being even further below its optimum than the forecast for the 21st day. The difference in degree of underestimation would contribute to VE. Hence VE contains a component related to trend-damping in addition to one related to the element of inconsistency or randomness in forecasts. To interpret VE as a measure of inconsistency, we need to be sure that the component related to trend-damping is comparatively small.
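A toy computation makes the point. Suppose damping leaves the day-21 forecast d below its optimum and the day-22 forecast 2d below (the shortfall grows with horizon); footnote 1's formulae with n = 2 then give:

```python
import math

d = 1.0                       # illustrative one-step damping shortfall
errors = [d, 2 * d]           # optimal minus forecast, days 21 and 22
ce = sum(errors) / 2                                    # = 1.5 * d
ve = math.sqrt(sum((e - ce) ** 2 for e in errors) / 2)  # = 0.5 * d
# Damping alone thus contributes 0.5*d to VE even with no randomness at all.
```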

As trends are increasingly under- (or over-) estimated, the difference in under- (or over-) estimation between the two forecasts increases. Hence, if VE primarily measures trend-damping rather than inconsistency, the correlation between VE and |CE| should be large. In fact, this correlation turns out to be 0.18 and 0.16 in the first and second experiments, respectively. In other words, the two measures share only 3% of their variance.

This provides some support for the view that VE is primarily a measure of inconsistency.

It is still important to be alert to the possibility that the small but significant differences in VE that we obtained were caused by the trend-damping component of the measure. For example, VE in both experiments was significantly greater in trended series that were subject to damping than in untrended series that were not subject to this bias. It is reasonable to attribute this difference to the fact that the trend-damping component was included in VE only in the former case. This interpretation is supported by the observation that the difference in VE between trended and untrended series is greater for tabular than for graphical presentations: as we have seen, trend-damping measured using CE was also greater with that type of presentation.

An alternative interpretation for the greater VE in trended series is that people perceive any given level of variability to be greater in trended than in untrended series and (erroneously) reproduce this perceived variability as inconsistency in their forecasts (cf. Harvey, 1995). If we assume that confidence intervals that people set around their forecasts reflect their perception of variability in the data, then this account suggests that intervals should be wider around forecasts from trended series. However, the evidence on this issue is equivocal: Lawrence and Makridakis (1989) and O'Connor and Lawrence (1992) found that they were wider, whereas Eggleton (1982) did not. Furthermore, there is some doubt as to whether confidence intervals do reflect perceived variability: clear increases in noise have led to wider intervals in some studies (Eggleton, 1982; Lawrence and Makridakis, 1989) but not in others (O'Connor and Lawrence, 1992).

In the first experiment, there was a main effect of presentation format on VE. Tests of simple effects showed that this was attributable solely to the results from untrended series (p < 0.0001). In the second experiment, the main effect was not significant. However, tests of simple effects showed that the effect of presentation format on VE was marginally significant for untrended series (p = 0.07) but did not approach significance for trended series. Clearly an effect restricted to untrended series cannot be caused by the trend-damping component of VE. It must reflect inconsistency. However, its disappearance with trended series could be attributed to the trend-damping component. The analysis of CE scores showed that trend-damping is greater with a tabular presentation. In contrast, the inconsistency effect for untrended series is greater with a graphical presentation. Hence, if both these effects contribute to VE in trended series, they could cancel one another out. In other words, inconsistency was greater with a graphical presentation for all series but this effect was masked by another one when trends were present.

Why should forecasts have been more inconsistent when data were presented graphically? Perhaps a given level of variability is perceived to be greater when this format is used. If people reproduce the variability that they perceive in the data when making their forecasts, the observed effect would be produced. Alternatively, people may be less careful when they are selecting positions on graphs as forecasts than when they are selecting numbers in tables as forecasts. They may feel that the digital representation demands more precision from them than the analogue scale.

7. Limitations

Although we have endeavoured to ensure that our results have some generality, they are potentially subject to a number of limitations. In this section we briefly consider four of them: subject population; computer-based displays; series label; and series type.

7.1. Subject population

Like most of those who have studied judgemental forecasting experimentally, we used students as subjects. Does this prevent generalization of the results to those who make judgemental forecasts as part of their jobs?

Forecasting requires extraction of information from a signal and then a decision about how to use that information. Both these processes could be influenced by learning. However, we have argued that the initial information extraction process makes use of highly evolved perceptual mechanisms that are not subject to learning when presentation is graphical. In contrast, it must resort to more general knowledge-based cognitive processes that are tuned by experience when presentation is tabular. This analysis suggests that practice with both formats should improve forecasting from tables rather than forecasting from graphs. Is there any evidence of this?

Lawrence et al. (1985) compared the performance of their student subjects with that of the paper's three authors who, presumably, had rather more experience at processing both tabular and graphical data. The student subjects narrowly outperformed the researchers with graphical presentation but were "uniformly and frequently significantly less accurate" with tabular presentation. The first of these findings may have been related to the age difference between the groups and does not appear to have reached statistical significance. The second finding is fully consistent with our analysis. However, it is important to note that the improvement in forecasting from tables produced by experience was not sufficient to remove the advantage of graphical presentation: errors in the researchers' short-term forecasts were also less with this format.

Additional evidence that experience does not extinguish the advantage of graphical presentation for forecasting trended data comes from the work of Coll et al. (1991) and Dickson et al. (1986). According to Coll et al. (1991), engineers tend to have more experience with graphical data presentation whereas business people tend to have more experience with tabular presentation. They demonstrated that a number of decision-making tasks that Dickson et al. (1986) had shown to be performed better with tabular presentation are performed better with graphical presentation when engineers rather than business people are used as subjects. However, Dickson et al. (1986) showed that even business people forecast trended data better with graphical presentation.

These studies show that even experience biased against graphical presentation fails to extinguish its advantage for forecasting trended series. This gives us some reason to be confident that our findings would broadly generalize to other more experienced subject groups. This is not to say that research on the effects of experience on judgemental forecasting skill is unnecessary. In particular, research into the effects of exposure to different types of series on the size and nature of judgemental biases would help to clarify the role of ecological knowledge in forecasters' decisions about how to use the information that they have extracted from data.

7.2. Computer-based displays

We used computer-based displays for both presentation formats. Would our results generalize to forecasters using pencil and paper? No guidance for answering this question is to be obtained from the literature on the effects of computer-based versus hard-copy display: it appears to be in as much disarray as that on tabular versus graphical presentation (cf. Lucas, 1981; Lucas and Nielsen, 1980). However, the issue of the effect of display type on accuracy of judgemental forecasts has practical importance and may help to explain some apparently anomalous results.

For example, Lawrence et al. (1985) found judgemental forecasts to be of comparable quality to those made using statistical methods. O'Connor et al. (1993) argue that their failure to replicate this result arose because their forecasters used computer-based displays whereas Lawrence et al.'s subjects used pencil and paper. They imply that people may be less careful when using computer-based displays. If this is true for both types of presentation format, the overall advantage of graphical over tabular presentation that we obtained with computer-based displays would be preserved with pencil-and-paper forecasts. However, computer-based displays may just encourage lack of care with graphical presentation. If so, the higher inconsistency that we obtained with this format (Subsection 6.3) would be reduced (or even disappear) with pencil-and-paper forecasts.

7.3. Series label

Our series were labelled as 'sales'. This label has been employed in previous work both by us and by others (for example, Bolger and Harvey, 1993; Lawrence and Makridakis, 1989). Its continued use provides some comparability across studies and adds some ecological validity to the subjects' task. However, in their recent review, Goodwin and Wright (1993) have pointed out that Sniezek's (1986) work on labelling effects in cue probability learning tasks should lead us to query whether series labels affect subjects' forecasts. As we mentioned above, the fact that sales are desirable may have triggered an optimism bias that led to the overforecasting that we observed.

Would our results have been different if we had not used the 'sales' label? What would have happened if we had followed Sanders' (1992) practice of attaching no meaningful label to the series? Clearly, the overforecasting bias might have disappeared: this would have eliminated the advantage of tabular presentation for untrended series. However, there might have been other effects as well. Both Koele (1980) and Sniezek (1986) found that performance was worse in the absence of meaningful labels. This was because responses were more inconsistent without them. On the basis of this work, we would expect that removing the 'sales' label would make forecasting worse by increasing VE scores. However, trend-damping is unrelated to inconsistency. We would not expect it to be affected by label removal: the advantage of graphical presentation for trended data should be maintained.

7.4. Series type

Graphical presentation led to better judgemental forecasts for trended series but not for untrended ones. To a large extent, this was because the underestimation of trends was much greater with a tabular presentation. However, like other investigators, we used linear trends. As we have pointed out, these may not be typical of naturally occurring data. Real trends that people forecast may show some damping, as Lawrence and Makridakis (1989) suggest. People forecasting these series may not show the same underestimation effects that we have found. Without these effects, the difference between graphical and tabular presentation would largely disappear. It would be rash to generalize our conclusions to series that contain non-linear (particularly decelerating) trends. Additional experimental work is needed using this type of series as stimulus materials.

8. Implications

People using their judgement to make forecasts from linearly trended series perform better when data are presented to them graphically. This is because their tendency to underestimate trends is less than it is when they forecast from tabular data. This advantage of graphical presentation was clearly evident in our trended series despite variation in the serial dependence and variability that they contained. People using their judgement to make forecasts from untrended series perform rather better with a tabular presentation. This is because their tendency to overforecast and the inconsistency in their forecasts are greater when series are presented graphically. The statistical significance of the second of these effects depended on the variability of the data series. In this final section, we consider the practical and theoretical implications of these findings.

8.1. Practical issues

Our results reinforce and elaborate conclusions drawn from previous work. The evidence that we have obtained appears to place the argument for using graphical presentation on a broader and firmer footing. However, it also shows that the advantage of this format is contingent upon the data containing trends. Future research may even show that it is contingent on the nature of the trends that the data contain. Meanwhile, there appears to be reason enough to maintain or adopt a graphical presentation in practical forecasting situations. Most series requiring forecasts seem to contain trends of some sort: the benefits to be gained from using a graphical presentation for these series should more than outweigh the comparatively low costs that will be incurred as a consequence of using it to forecast relatively few untrended series.

In their review, Bunn and Wright (1991) point out that all serious forecasting involves some judgement but that the role that judgement has in the forecasting process varies widely. Our findings are most clearly applicable to individuals making forecasts by visual inspection of data. This practice is prevalent in small businesses, but in larger concerns three other approaches are common. First, separate judgements from a number of individuals can be combined by mechanical averaging or by forcing social consensus. Although this reduces inconsistency, it preserves biases (overforecasting; trend-damping): we would therefore expect the format effects reported here to remain salient, as the sketch below illustrates. A second approach is to take a (weighted) average of the judgemental forecast and a forecast produced by formal statistical means: our findings would still be relevant here, but the size of the format effects would depend on the weight given to judgement. Finally, judgement may be used to adjust a statistical forecast. Although we would expect judgemental adjustment and judgemental production of forecasts to be susceptible to similar biases, we would like empirical confirmation of this before claiming that the format effects reported here should occur in this situation.
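
A sketch with invented numbers, not a re-analysis of our data, illustrates the first two points: mechanically averaging a panel of judges who share a trend-damping bias cancels much of their inconsistency but none of the bias, and a weighted combination with a statistical forecast dilutes the bias only in proportion to the weight given to judgement.

```python
import numpy as np

rng = np.random.default_rng(1)

true_value = 110.0   # hypothetical outcome being forecast
statistical = 109.0  # hypothetical statistical forecast (no shared bias)
bias, sd, n_judges = -6.0, 3.0, 5  # shared trend-damping bias; judge inconsistency

# Many replications of a panel of judges who share the bias but err independently.
panel = true_value + bias + rng.normal(0.0, sd, (10_000, n_judges))
avg_error = panel.mean(axis=1) - true_value

# Averaging shrinks inconsistency (sd falls by sqrt(n)) but preserves the bias.
print(f"single judge:  bias {bias:.1f}, sd {sd:.1f} (by construction)")
print(f"panel average: bias {avg_error.mean():.2f}, sd {avg_error.std():.2f}")

# Weighted combination with the statistical forecast dilutes the judgemental
# bias in proportion to the weight w given to judgement.
for w in (1.0, 0.5, 0.2):
    combined = w * (true_value + bias) + (1 - w) * statistical
    print(f"w = {w:.1f}: expected combined error = {combined - true_value:.2f}")
```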

8.2. Theoretical issues

Evans et al. (1994) point out that rational decision-making usually involves forecasting: one action is preferred to another because the chooser believes that he or she would rather live in the slightly different world that will exist when this action is taken. However, one of the most important axioms of normative decision theories is that of invariance (Tversky and Kahneman, 1986): different but logically identical representations of the same choice problem, or different methods of eliciting a choice, should yield the same preferences. Our work suggests that decisions based on forecasts from graphical and from tabular representations of the same data would be different. This would violate the invariance axiom and add to the mounting evidence that normative and descriptive accounts of decision-making are irreconcilable.

Acknowledgements

Work reported in this paper was funded by ESRC grant R000232646 and presented at the 7th International Conference on the Foundations and Applications of Utility, Risk and Decision Theory, Sandvika, Norway, 1994.

References

Anderson, J.R., 1990, The Adaptive Character of Thought (Erlbaum, Hillsdale, NJ).

Angus-Leppan, P. and V. Fatseas, 1986, The forecasting accuracy of trainee accountants using judgmental and statistical techniques, Accounting and Business Research, 16, 179-188.

Armstrong, J.S. and F. Collopy, 1992, Error measures for generalizing about forecasting methods: Empirical comparisons, International Journal of Forecasting, 8, 69-80.

Bolger, F. and N. Harvey, 1993, Context-sensitive heuristics in statistical reasoning, The Quarterly Journal of Experimental Psychology, 46A, 779-811.

Bunn, D. and G. Wright, 1991, Interaction of judgmental and statistical forecasting methods: Issues and analysis, Management Science, 37, 501-518.

Coll, R., A. Thyagarajan and S. Chopra, 1991, An experimental study comparing the effectiveness of computer graphics data versus computer tabular data, IEEE Transactions on Systems, Man and Cybernetics, 21, 897-900.

DeSanctis, G., 1984, Computer graphics as decision aids: Directions for research, Decision Sciences, 15, 463-487.

Dickson, G.W., G. DeSanctis and D.J. McBride, 1986, Understanding the effectiveness of computer graphics for decision support: A cumulative experimental approach, Communications of the ACM, 29, 40-47.

Eggleton, I.R.C., 1982, Intuitive time-series extrapolation, Journal of Accounting Research, 20, 68-102.

Evans, J. St. B.T., D.E. Over and K.I. Manktelow, 1994, Reasoning, decision making and rationality, in: P.N. Johnson-Laird and E. Shafir (eds.), Reasoning and Decision Making (Blackwell, Cambridge, MA, and Oxford), 165-187.

Fildes, R., 1992, The evaluation of extrapolative forecasting methods, International Journal of Forecasting, 8, 81-98.

Ganzach, Y., 1993, Predictor representation and prediction strategies, Organizational Behavior and Human Decision Processes, 56, 190-212.

Goodwin, P. and G. Wright, 1993, Improving judgmental time series forecasting: A review of the guidance provided by research, International Journal of Forecasting, 9, 147-161.

Harvey, N., 1995, Why are judgements less consistent in less predictable task situations? Organizational Behavior and Human Decision Processes, 63, 247-263.

Harvey, N., F. Bolger and A.G.R. McClelland, 1994, On the nature of expectations, British Journal of Psychology, 85, 203-229.

Jones, G.V., 1976, Polynomial perception of exponential growth, Perception and Psychophysics, 21, 197-200.

Jones, G.V., 1979, A generalized polynomial model for perception of exponential series, Perception and Psychophysics, 25, 232-234.

Jones, G.V., 1984, Perception of inflation: Polynomial not exponential, Perception and Psychophysics, 36, 485-487.

Koele, P., 1980, The influence of labeled stimuli on nonlinear multiple-cue probability learning, Organizational Behavior and Human Performance, 26, 22-31.

Lawrence, M.J., 1983, An exploration of some practical issues in the use of quantitative forecasting models, Journal of Forecasting, 2, 169-179.

Lawrence, M.J. and S. Makridakis, 1989, Factors affecting judgmental forecasts and confidence intervals, Organizational Behavior and Human Decision Processes, 42, 172-187.

Lawrence, M.J. and M. O'Connor, 1992, Exploring judgmental forecasting, International Journal of Forecasting, 8, 15-26.

Lawrence, M.J., R.H. Edmundson and M. O'Connor, 1985, An examination of the accuracy of judgmental extrapolation of time series, International Journal of Forecasting, 1, 25-36.

Lucas, H.C., Jr., 1981, An experimental investigation of the use of computer-based graphics in decision making, Management Science, 27, 757-768.

Lucas, H.C., Jr. and N.R. Nielsen, 1980, The impact of mode of information presentation on learning and performance, Management Science, 26, 982-993.

Marr, D.C., 1976, Early processing of visual information, Philosophical Transactions of the Royal Society of London (B), 275, 483-524.

Marr, D.C. and E. Hildreth, 1980, A theory of edge detection, Proceedings of the Royal Society of London (B), 207, 187-217.

O'Connor, M. and M. Lawrence, 1992, Time series characteristics and the widths of judgemental confidence intervals, International Journal of Forecasting, 7, 413-420.

O'Connor, M., W. Remus and K. Griggs, 1993, Judgmental forecasting in times of change, International Journal of Forecasting, 9, 163-172.

Plotkin, H., 1994, The Nature of Knowledge: Concerning Adaptations, Instinct and the Evolution of Intelligence (Allen Lane, London).

Remus, W., 1987, A study of graphical and tabular displays and their interaction with environmental complexity, Management Science, 33, 1200-1204.

Sanders, N.R., 1992, Accuracy of judgmental forecasts: A comparison, Omega: International Journal of Management Science, 20, 353-364.

Sniezek, J.A., 1986, The role of labels in cue probability learning tasks, Organizational Behavior and Human Decision Processes, 38, 141-161.

Tullis, T.S., 1988, Screen design, in: M. Helander (ed.), Human-computer Interaction (North-Holland, Amsterdam), 367-411.

Tversky, A. and D. Kahneman, 1974, Judgment under uncertainty: Heuristics and biases, Science, 185, 1127-1131.

Tversky, A. and D. Kahneman, 1986, Rational choice and the framing of decisions, Journal of Business, 59, S251-S278.

Wagenaar, W.A. and S.D. Sagaria, 1975, Misperception of exponential growth, Perception and Psychophysics, 18, 416-422.

Biographies: Nigel HARVEY is a Reader in Experimental Psychology at University College London. His research interests are in the areas of judgement and decision-making. He works on judgemental forecasting and control of dynamical system behaviour.

Fergus BOLGER is a post-doctoral Research Fellow at University College London working on judgemental forecasting and subjective probability. He has published a number of articles on judgement and decision-making and recently co-edited the book Expertise and Decision Support.