
International Journal of Forecasting 25 (2009) 452–455 www.elsevier.com/locate/ijforecast

Discussion

Mining the past to determine the future: Comments

Simon Price∗,1

City University, United Kingdom
Bank of England, United Kingdom


I was invited to comment on Professor Hand’s paper from the perspective of a central bank forecaster, which I am delighted and honoured to do. In that role, I am an applied macroeconomist whose aim is not only to forecast, but also to understand macro data. This means that I often think about inference in the context of structural relationships. My comments are not particularly technical, but I try to respond to the wider issues raised in the paper and relate them to the type of problem that I face. As will shortly be clear, the data sets I use are sometimes large, but are nevertheless much smaller than those considered by data miners.

As background, one of my favourite quotes is from Svensson (2005), describing what we central bankers do: ‘Large amounts of data about the state of the economy and the rest of the world . . . are collected, processed, and analyzed before each major decision’. This happens in many ways. One is that experts sit at their desks looking at the data in various ways.

DOI of original article: 10.1016/j.ijforecast.2008.09.004.
∗ Corresponding address: Bank of England, Threadneedle Street, London EC2R 8AH, United Kingdom. E-mail address: [email protected].

1 The remarks in this comment reflect the personal views of the author and should not be thought to represent those of the Bank of England or members of the Monetary Policy Committee.

0169-2070/$ - see front matter © 2009 Published by Elsevier B.V. on behalf of International Institute of Forecasters. doi:10.1016/j.ijforecast.2008.11.001

More formally, people like myself try to forecast aggregates such as output growth and inflation. At longer horizons we can do this with simple models. At shorter horizons in particular, we often find it useful to use ‘data rich’ data-sets containing perhaps one or two hundred variables over a decade or so at a monthly frequency — giving us, perhaps, 20,000 data points. This way of proceeding was largely begun by James Stock and Mark Watson (e.g., Stock & Watson, 1999), who pioneered the use of factor models which summarise such large bodies of information in an essentially atheoretical way. Since then the literature has moved on; my colleagues Jana Eklund and George Kapetanios recently reviewed it (Eklund & Kapetanios, 2008). In some respects, the issues are close to those encountered by the data miners. There are either a large number of potential models or an infeasibly large general model, and we need efficient algorithms to extract the forecasting model. One difference is that data-miners work with data sets several orders of magnitude larger than ours. Another is that, while the data-sets described by Professor Hand typically include individual objects, their characteristics, and time (indexed, say, i, k and t), we, on the other hand, usually work with two-dimensional panels (indexed i and t). Another, not very important, difference is that often we only want to forecast a small number of series — although the techniques can be applied to forecast many or all of the series in the data set (Carriero, Kapetanios, & Marcellino, 2007).
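The factor-model approach described above can be sketched in a few lines. The following is a stylised illustration, not any particular published specification: principal components are extracted from a simulated standardised panel, and the target is regressed on lagged factors in the Stock–Watson ‘diffusion index’ spirit. All dimensions and coefficients are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated 'data rich' panel: T months of N indicators driven by r
# persistent common factors (all numbers invented for illustration).
T, N, r = 120, 100, 3
F = np.zeros((T, r))
for t in range(1, T):
    F[t] = 0.8 * F[t - 1] + rng.standard_normal(r)   # AR(1) factors
L = rng.standard_normal((N, r))                      # factor loadings
X = F @ L.T + 0.5 * rng.standard_normal((T, N))      # observed panel
y = F[:, 0] + 0.1 * rng.standard_normal(T)           # target series

# 1. Standardise the panel and extract factors by principal components.
Z = (X - X.mean(0)) / X.std(0)
U, s, _ = np.linalg.svd(Z, full_matrices=False)
Fhat = U[:, :r] * s[:r]                              # estimated factors

# 2. 'Diffusion index' step: regress y_{t+1} on current factors by OLS,
#    then read off the one-step-ahead forecast from the last observation.
A = np.column_stack([np.ones(T - 1), Fhat[:-1]])
beta, *_ = np.linalg.lstsq(A, y[1:], rcond=None)
forecast = np.array([1.0, *Fhat[-1]]) @ beta
```

Note that the estimated factors are only identified up to rotation, but this does not matter for forecasting, since the regression step absorbs any rotation.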

Professor Hand mentions revisions of economic data in passing. This is a major issue for us as macroeconomic forecasters, where there are frequently substantial revisions published in successive releases, which often continue for years. In fact, data revisions are at the core of my own paper presented at ISF 2008 (Eklund, Kapetanios, & Price, 2008), although it is in no way a data rich exercise. He suggests that the occurrence of data revisions implies that we should weight older data more. This is certainly one response, and one that has been explored formally by Harrison, Kapetanios, and Yates (2004). However, I suspect that in most data mining related applications it is not an issue.
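One simple way to ‘weight older data more’ when the most recent releases are still subject to revision is weighted least squares, down-weighting the unrevised observations. The sketch below is a crude stylised illustration of that general idea, not the Harrison, Kapetanios and Yates estimator; all numbers, including the weight and the size of the revision noise, are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression y_t = 2 x_t + e_t. The regressor for the most recent
# 24 observations is a preliminary release measured with extra noise,
# mimicking data still subject to revision.
T, m = 200, 24
x_true = rng.standard_normal(T)
y = 2.0 * x_true + 0.5 * rng.standard_normal(T)
x = x_true.copy()
x[-m:] += 0.8 * rng.standard_normal(m)   # noisy preliminary releases

# Down-weight the unrevised recent observations, so fully revised
# (older) data carry relatively more weight in estimation.
w = np.ones(T)
w[-m:] = 0.2

b_weighted = np.sum(w * x * y) / np.sum(w * x * x)   # weighted slope
b_ols = np.sum(x * y) / np.sum(x * x)                # ordinary OLS slope
```

The weighted estimator dilutes the attenuation bias that the mismeasured recent observations would otherwise induce.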

In this context, I have to mention the econometrician’s view of something we used to call ‘data mining’, invariably in a pejorative sense, as Professor Hand makes clear. It was used to describe the results of large numbers of specification searches. This could be either well intentioned (trying to find the ‘true’ model, and in the process ignoring classical inference) or malevolent (dishonestly trying to find results that fitted prior views); but it was never good. It is worth considering why this attitude continues to prevail. It was because, as economists, we often like to think that our aim is inference — to test hypotheses. I first encountered the phrase in its current context in an Edinburgh bar, when I was discussing data analysis with a pair of computer scientists who were engaged in the exploration of large volumes of data; in their case, geological data for the oil industry. Of course, what they were engaged in was description, not inference. It is this issue that is relevant — in fact, central — to this paper. I believe that at the heart of the argument is the question that in econometrics has been labelled ‘theory versus measurement’.2 Thus, in the current paper, I was struck by the distinction made between iconic and empirical models. What does ‘iconic’ mean? ‘Iconic models are mathematical representations (‘images’) of (necessarily simplifying) theories describing the phenomenon in question . . .. Conversely, empirical models are based purely on finding convenient or useful summaries of a data set’.

2 Koopmans (1947) talked about ‘measurement without theory’ in a critical review of Burns and Mitchell’s classic Measuring Business Cycles.

So that fits the economist’s perspective: the primacy of theory. However, although for us it is the only game in town, one should be aware that there are differences between disciplines in the degree to which one can put faith in theory. If I may be permitted to illustrate with an anecdote, Roy Batchelor (this year elected a Fellow of the International Institute of Forecasters) and I were once approached by a company offering a product to the oil industry. It exploited information on oil flows in pipelines. The aim was real-time prediction and prevention of turbulent episodes — eddies — that disrupted flow. For the engineers, the physics was known, so all they needed was enough data and computing power. They thought, why not apply techniques like this to the economy? How complicated can it be compared to solving Navier–Stokes equations? But of course Roy and I never made any money out of that. We could not hope to get Robert Hall’s theoretical Euler equation explaining consumption dynamics to work well enough.

Economists love theoretical models, but economic forecasters are normally indifferent at best.3 Why is that? It is because the economists are interested in performing inference on parameterised models so that they can understand the mechanisms at play. They like to tell stories. On the other hand, forecasters just want to forecast. In our experience, it is not generally true that ‘iconic [i.e., theoretical] models . . . yield superior predictions to empirical models’. This is precisely because we cannot be sure that ‘the models are “right”, in that they do represent important aspects of (or, perhaps, “good approximations to”) the way the system being modelled really behaves’. Not for people, anyway: I do think, however, as I just suggested with the oil example, that there is more hope with physical systems.

I now wish to introduce the idea of factors. In economics we often think in terms of ‘shocks’. There are potentially millions of variables in the economy; even restricting ourselves to quarterly aggregates there are hundreds that we may want to look at. So the true data generation process (DGP) may be enormously complicated. However, in our theoretical models there are only a few shocks — e.g., to productivity, energy prices, and so on. This might seem unlikely at first blush, but all of the evidence from both simple and complicated factor models suggests that there are indeed only a handful of shocks — say, five — that explain almost all (by which I mean exceeding 90%) of the variation in the kind of large data sets I deal with. And those factors can then be used to forecast without trying to understand what they are. One reason that this works is that parsimony is vital in forecasting. Even if we knew the structure of the true DGP, we would be unlikely to want to estimate it, because the number of parameters would make estimation uncertainty dominant. So having a small number of factors is helpful. This also helps to explain why empirical models using variables do not have to have obvious interpretations, or be stable, as they are all driven by those factors. There is evidence, for example from De Mol, Giannone, and Reichlin (2007), that models that are parsimonious in variables do as well as models parsimonious in factors. However, I obviously agree that there is a danger of over-fitting. It is simply that I do not think that ‘theory’ is necessarily useful in this task. I see the ‘cliff-edge’ effect that Professor Hand refers to as out-of-sample forecast failure, which is a common consequence of over-fitting.

3 I should distinguish between the straightforward task of producing the best forecast, conditional or unconditional, and the publication of a macroeconomic forecast by a policy-making institution, where the issues are quite different; see Bank of England (1999, 2000).
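The stylised fact that a handful of shocks explain over 90% of the variation in a large panel can be illustrated with a cumulative variance-share calculation. The simulation below is an invented example (dimensions and noise level are assumptions), not actual macroeconomic data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Panel of 150 quarterly series driven by 5 common shocks plus
# idiosyncratic noise (all numbers invented for illustration).
T, N, r = 100, 150, 5
shocks = rng.standard_normal((T, r))
loadings = rng.standard_normal((N, r))
X = shocks @ loadings.T + 0.3 * rng.standard_normal((T, N))

# Standardise, then compute the share of total variance captured by
# the leading principal components via the singular values.
Z = (X - X.mean(0)) / X.std(0)
s = np.linalg.svd(Z, compute_uv=False)
share = np.cumsum(s**2) / np.sum(s**2)   # cumulative variance shares
print(f"first {r} factors explain {share[r - 1]:.0%} of the variance")
```

In a simulation like this the first five components account for well over 90% of the variation, mirroring the empirical regularity described above.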

Of course, one could argue that factor and some other models are iconic. There is not a tight specification, but theory does suggest relationships and regularities (e.g., only a few shocks drive nearly everything; demand curves slope down).

As I mentioned, the data I work with are two-dimensional, while data-miners typically work with individuals, their characteristics, and time. Also, my data sets are small. So the question I ask those better qualified than me is whether factor models adapted to that structure and capable of dealing with huge data sets might be helpful.

One example that Professor Hand uses, ‘creditworthiness’, seems to fit this approach. A latent variable is another name for a factor. He suggests that we distinguish in the modelling process between ‘primary characteristics’ (e.g., age) of the customer and ‘behavioural characteristics’ (e.g., arrears history). In economic terms, these are exogenous and endogenous, and the economist in me approves. However, it is not clear why we would not feel happy conditioning on past behaviour if our aim is to forecast future behaviour (rather than to understand the process underlying that behaviour). It might be the case that ‘theory’ can help us discard variables. Why would the purchase of iPods affect default probabilities? But is not that rather the point of data-mining — uncovering regularities we would never have expected to exist? ‘Empirical models have the strength that they might include a powerful predictor which an “expert” would not have recognised as relevant — they can include things we would never have thought of, to increase the predictive power’.

Finally, a brief remark on the discussion regarding loss functions and performance criteria. I think that this is another issue that is taken for granted among forecasters, although perhaps rarely acted upon.

In brief conclusion, the problems faced by data-miners and the data-rich macroeconomic forecasters may be similar, although the structures within which we operate differ. There may be lessons to be learned from the two areas of the literature. I will pass that over to the audience. I agree that inference is crucial. Where I may disagree is on the usefulness of ‘theory’, odd though it may seem for an economist to say this.

References

Bank of England, (1999). Economic models at the Bank of England. London: Bank of England.

Bank of England, (2000). Economic models at the Bank of England: September 2000 update. London: Bank of England.

Carriero, A., Kapetanios, G., & Marcellino, M. (2007). Forecasting large datasets with reduced rank multivariate models. Queen Mary working paper No. 617.

De Mol, C., Giannone, D., & Reichlin, L. (2007). Forecasting using a large number of predictors: Is Bayesian regression a valid alternative to principal components? ECB working paper No. 700.

Eklund, J., & Kapetanios, G. (2008). A review of forecasting techniques for large datasets. National Institute Economic Review, 203(1), 109–115.

Eklund, J., Kapetanios, G., & Price, S. (2008). Forecasting with a model of data revisions. Bank of England, unpublished.


Harrison, R., Kapetanios, G., & Yates, T. (2004). Forecasting with measurement errors in dynamic models. Bank of England working paper No. 237.

Koopmans, T. C. (1947). Measurement without theory. Review of Economic Statistics, 29, 161–172.

Stock, J., & Watson, M. (1999). Forecasting inflation. Journal of Monetary Economics, 44, 293–335.

Svensson, L. (2005). Monetary policy with judgment: Forecast targeting. International Journal of Central Banking, 1, 1–54.