
European Journal of Operational Research 179 (2007) 267–274

www.elsevier.com/locate/ejor

Short Communication

Neural networks and seasonality: Some technical considerations

Bruce Curry

Cardiff Business School, Cardiff University, Aberconway Building, Colum Drive, CARDIFF CF10 3EU, United Kingdom

Received 25 October 2005; accepted 6 March 2006; available online 5 May 2006

Abstract

Debate continues regarding the capacity of feedforward neural networks (NNs) to deal with seasonality without pre-processing. The purpose of this paper is to provide, with examples, some theoretical perspective for the debate. In the first instance it considers possible specification errors arising through use of autoregressive forms. Secondly, it examines seasonal variation in the context of the so-called ‘universal approximation’ capabilities of NNs, finding that a short (bounded) sinusoidal series is easy for the network but that a series with many turning points becomes progressively more difficult. This follows from results contained in one of the seminal papers on NN approximation. It is confirmed in examples which also show that, to model seasonality with NNs, very large numbers of hidden nodes may be required.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Neural networks; Seasonality; Sinusoid; Autoregressive models; Approximation

1. Introduction

In a recent article in this journal Zhang and Qi (2005) argue that feedforward neural networks (NNs) have a limited capacity to deal with seasonality in time series. Such a conclusion raises doubts about the well-established ‘universal approximation’ property, which provides the basis for use of NNs to implement flexible nonlinear regression. As well as the references cited by Zhang and Qi, hereinafter ZQ, the choice between direct application of NNs as opposed to prior seasonal adjustment has also been considered by Crone (2005) and Dhawan and O’Connor (2005).

The aim of this paper is to provide some theoretical perspective for the debate, given that reliance solely on empirical testing, no matter how rigorous, will not be adequate. Two specific aspects of the theory, relating to feedforward logistic networks, are considered. First of all, it is shown that with seasonality there is a serious risk of specification error. If the underlying process consists of a time trend and a multiplicative seasonal factor, then autoregressive formulations for the network, such as those implemented by ZQ, are not appropriate. This could explain why prior seasonal adjustment appears to be effective. Secondly, there is a need to reconsider the property of universal approximation in the context of seasonal variation. It is argued here both in theoretical terms and through examples that seasonality lies on the very edge of the approximation properties of NNs. As part of this argument, it is suggested that extremely large networks, containing surprisingly high numbers of hidden nodes, may be required. The theoretical aspect of the argument draws on detailed results contained in the famous paper on NN approximation by Barron (1993). These and other results are sometimes given little attention in the literature, but are worthy of detailed consideration.

The following section provides some technical background. It deals with a basic result whereby a time series model expressed in terms of seasonal dummy variables can equivalently be expressed in trigonometric form. Although dummy variables permit easy exposition, they have the disadvantage of introducing discontinuities. Section 3 extends this treatment to multiplicative time series models, as employed for example by ZQ. Trigonometric concepts also arise in the study of NN approximations, which are discussed in Section 4 below.

2. Preliminaries

Ghysels and Osborn (2001) show that a time series model in which seasonality is described in terms of dummy variables can also be expressed in trigonometric form. For example, for monthly data, a deterministic linear representation would be

$$y_t = \sum_{i=1}^{12} c_i d_{it}, \qquad (1)$$

where d_it denote seasonal (monthly) dummies. Eq. (1) can also be expressed as

$$y_t = \mu + \sum_{k=1}^{6}\left[a_k \cos(\pi k t/6) + b_k \sin(\pi k t/6)\right], \qquad (2)$$

i.e. as a linear combination of sine and cosine terms of different periodicities, with coefficients a_k and b_k and overall mean μ. In fact, there is no need for the sixth sine coefficient, for which the corresponding term is automatically equal to zero. The term is usually included for notational convenience. Eq. (1) has 12 dummy variables, but no constant. In (2), there are effectively 11 sinusoidal variables together with the constant term μ.

Although it has similarities with approximations which use Fourier series, this is an exact representation, not an approximation. The dummy variable form and the trigonometric form are in general isomorphic, in the sense that they coincide at discrete points in time. The reason for the isomorphism is that both sine and cosine functions fluctuate around zero with extrema at unity and minus unity. However, it is the trigonometric form which permits the most convenient and most general further analysis, using continuous functions. Traditionally, dummy variables have been used. They are easy to comprehend and permit the use of linear regressions: mathematically, however, step functions are less tractable.

The dummy variable and trigonometric forms (1)and (2) are linked by the matrix expression

C ¼ RB: ð3ÞSee Ghysels and Osborn (2001): attention should

be drawn to the apparent typographical errors inpages 7 and 8.

Here C is a vector comprised of the dummy variable representation in Eq. (1) and B = (μ, a_1, b_1, . . . , a_6)^T is formed from the coefficients in Eq. (2). The idempotent matrix R consists of constants for which R^{-1} expresses the relationships

$$a_k = \frac{1}{6}\sum_{s=1}^{12} c_s \cos(\pi k s/6) \quad \text{for } k = 1, 2, \ldots, 5,$$

$$a_6 = \frac{1}{12}\sum_{s=1}^{12} c_s \cos(\pi s),$$

$$b_k = \frac{1}{6}\sum_{s=1}^{12} c_s \sin(\pi k s/6) \quad \text{for } k = 1, 2, \ldots, 5. \qquad (4)$$

The c coefficients are constrained to average the overall mean μ.

The implications of this section are that we can use a sinusoidal representation to manipulate standard dummy variable representations of seasonality. Eqs. (3) and (4) indicate the route from each of the two representations to the other. In the next section we proceed to show that one can similarly express in sinusoidal form a multiplicative model combining seasonal dummies and a linear trend. This provides insights regarding potential specification error, and also permits the use in Section 5 of theoretical results on the approximation capability of NNs.
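As a concrete check of the mapping between the two representations, the following sketch (in Python, assuming NumPy is available; the monthly coefficients are arbitrary illustrative values, not taken from any cited data) recovers μ, a_k and b_k from a set of dummy coefficients via Eq. (4) and confirms that the trigonometric form (2) reproduces the dummy values exactly at t = 1, …, 12.

```python
import numpy as np

# Illustrative monthly dummy-variable coefficients c_1,...,c_12 (arbitrary values).
c = np.array([0.8, 0.9, 1.1, 1.2, 1.3, 1.4, 1.3, 1.1, 1.0, 0.9, 0.8, 0.7])
s = np.arange(1, 13)

# Inverse map, Eq. (4): recover the mean and the sinusoidal coefficients.
mu = c.mean()
a = np.array([(c * np.cos(np.pi * k * s / 6)).sum() / 6 for k in range(1, 6)])
a6 = (c * np.cos(np.pi * s)).sum() / 12
b = np.array([(c * np.sin(np.pi * k * s / 6)).sum() / 6 for k in range(1, 6)])

# Forward map, Eq. (2): rebuild the monthly values from the trigonometric form.
def trig_form(t):
    y = mu + a6 * np.cos(np.pi * t)
    for k in range(1, 6):
        y += a[k - 1] * np.cos(np.pi * k * t / 6) + b[k - 1] * np.sin(np.pi * k * t / 6)
    return y

rebuilt = np.array([trig_form(t) for t in s])
print(np.max(np.abs(rebuilt - c)))   # ~1e-15: the two forms coincide at t = 1,...,12
```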

3. Specification issues

ZQ generate data using a multiplicative monthly seasonal process which, ignoring noise, is expressed as

$$Y_t = T_t\, SI_t, \qquad (5)$$

where T_t = 100 + 0.6t is a linear trend and SI_t is a seasonal index, expressed as values for the dummy variable coefficients in Eq. (1) above. The basic form of this model, simplifying the time trend by expressing it in terms of t alone, is then just Eq. (1) with the insertion of a multiplicative trend. ZQ argue that numerous real data series fit such a pattern. Using Eq. (2) to decompose SI_t, it is then straightforward to express the model as a collection of terms of the form y_t = t sin(πkt) and y_t = t cos(πkt) for various values of k. The question now is simply whether a neural network can model such typical terms. This is considered below in Section 5. Here, however, we show potential specification errors which would arise if one attempts to fit an autoregressive model when the true function is of the type described in (5).
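For readers who want to reproduce a series of this general type, the sketch below generates a multiplicative trend–seasonal series following Eq. (5); the seasonal indices shown are made-up illustrative values rather than those actually used by ZQ.

```python
import numpy as np

# Made-up seasonal indices SI for the 12 months (illustrative, not ZQ's values).
SI = np.array([0.80, 0.85, 0.95, 1.05, 1.15, 1.25, 1.30, 1.20, 1.05, 0.95, 0.85, 0.60])

t = np.arange(1, 61)                   # five years of monthly observations
T = 100 + 0.6 * t                      # linear trend T_t, as specified by ZQ
y = T * SI[(t - 1) % 12]               # Eq. (5), ignoring noise: Y_t = T_t * SI_t

# Replacing SI_t by its trigonometric form (2) shows that Y_t is a weighted
# sum of a pure trend plus terms of the form t*cos(pi*k*t/6) and t*sin(pi*k*t/6).
```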

Consider first a time series model with a single sinusoidal term y_t = t sin(t). Here it may be demonstrated that there is an equivalent autoregressive model which is linear but which has time-varying parameters. Specifically, we may write

$$y_t = A_t + B_t y_{t-1}, \qquad (6)$$

where

$$A_t = \frac{t \sin(1)\cos(t)}{\cos(1)} \qquad \text{and} \qquad B_t = \frac{t}{(t-1)\cos(1)}. \qquad (7)$$

Proof. This can be shown by using the standard trigonometric identity

$$\sin(t-1) = \cos(1)\sin(t) - \sin(1)\cos(t).$$

Thus, given y_t = t sin(t) and y_{t-1} = (t-1) sin(t-1), we have

$$y_{t-1} = (t-1)\left[\cos(1)\sin(t) - \sin(1)\cos(t)\right],$$

$$\frac{t}{t-1}\, y_{t-1} = \cos(1)\, t\sin(t) - t\sin(1)\cos(t),$$

$$\frac{t}{t-1}\, y_{t-1} + t\sin(1)\cos(t) = \cos(1)\, t\sin(t),$$

$$\frac{t}{(t-1)\cos(1)}\, y_{t-1} + \frac{t\sin(1)\cos(t)}{\cos(1)} = t\sin(t) = y_t.$$

Similarly, y_t can be expressed in terms of y_{t-p}. One can also do exactly the same with t cos(t), and also with sine and cosine terms having other frequencies, with the general form t sin(πkt) and t cos(πkt). □
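A quick numerical confirmation of Eqs. (6) and (7) is possible; the sketch below (a minimal check, not part of the original analysis) verifies that the time-varying linear recursion reproduces y_t = t sin(t) to machine precision.

```python
import numpy as np

# Check Eqs. (6)-(7): y_t = A_t + B_t * y_{t-1} holds exactly for y_t = t*sin(t).
t = np.arange(2.0, 61.0)                       # t = 2,...,60 (y_{t-1} must exist)
y = t * np.sin(t)
y_lag = (t - 1) * np.sin(t - 1)

A = t * np.sin(1.0) * np.cos(t) / np.cos(1.0)  # A_t from Eq. (7)
B = t / ((t - 1) * np.cos(1.0))                # B_t from Eq. (7)

print(np.max(np.abs(y - (A + B * y_lag))))     # ~1e-13: the identity holds, but A_t and B_t vary with t
```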

The seasonal term SI_t in Eq. (5) can be expressed using the sinusoidal representation in (2). Hence the multiplicative ZQ model in (5) contains a weighted sum of terms of the general form t sin(πkt) and t cos(πkt).

The immediate implications of these results are as follows:

(a) An autoregressive model derived from the terms in Eq. (2) is linear, and hence would not require a neural network. This can be seen in Eqs. (6) and (7).

(b) The model in (6) has time-dependent weights. Even if a neural network is employed, a network with time-varying weights is very far away from models which are generally in use. It is by no means clear that the NN, with weights assumed to be fixed constants, can implement such a model, and it seems that using a nonlinear autoregressive form involves applying a network to the wrong task. Instead, the implication is that we should fit NNs which use various functions of time as inputs. I have found only a single reference (Steil, 2002) which deals with a time-varying recurrent network (recurrent NNs are outside the scope of this paper). Overall, this kind of model is simply uncharted territory.

(c) If the true underlying model consists of components of (4), then multiple autoregressive terms are not required. Further discussion of this point appears below.

4. Neural network approximations

4.1. General considerations

A general theme of this paper is that close examination of the various well-known approximation theorems can yield useful insights. The term ‘universal approximator’ was used by Hornik et al. (1989) to describe the capacity of feedforward networks to approximate any ‘reasonable’ function. This property does not depend in general on the conventional sigmoid/logistic activation function. It is usually expressed in terms of limiting cases, as the number of hidden nodes becomes infinite (but see Tamura and Tateishi, 1997). Although a number of theorems deal only with a single hidden layer, various authors (e.g. Kurkova, 1992) have incorporated multiple layers. For a review of the literature on approximation properties of NNs see Scarselli and Tsoi (1998). See also Morgan et al. (1999) for empirical tests on a range of functional forms, including for example the ‘Mexican Hat’ function which is well known as a severe computational test.

Hence ‘universal approximation’ is well founded in the literature, and provides the primary motivation for the use of NNs. If the network can ‘mimic’ any reasonable function, it can detect and handle nonlinearities without requiring a priori selection of particular functional forms. The results of Zhang and Qi (2005), if upheld, would represent a serious challenge.

Although ‘universal approximation’ is well founded, there is a need for caution. The word ‘universal’ is rather too strong and brings with it some risk of exaggerating the properties of the network. This potential exaggeration is in marked contrast to the precision with which authors such as Barron (1993) and Hornik et al. (1989) express their results. Unfortunately, there is a tendency for researchers using NNs simply to note the theoretical approximation results in passing, without further evaluation. It is argued here that such an evaluation repays the investment.

NN approximations in fact depend on smoothness conditions applied to the underlying function, and a ‘reasonable’ function is one which possesses some specific smoothness properties. Smoothness can be specified in a number of ways, which are not unrelated. In the first instance, for Gallant and White (1988), the condition is that the underlying function f(t) should be square integrable, which means that $\int f(t)^2\, dt$ is finite. In what is perhaps the most famous and most cited paper, Hornik et al. (1989) deal with ‘any real valued function over a compact set’. A compact set is both closed and bounded, and since a continuous bounded function is locally square integrable (Titchmarsh, 1983), the conditions of Gallant and White and of Hornik et al. are effectively equivalent.

Barron (1993) uses a Fourier representation to describe smoothness properties for networks with a single hidden layer. Smoothness conditions are stipulated in terms of the Fourier magnitude distribution. Specifically, the Fourier representation (Fourier integral) of a function f(t), if it exists, is

$$g(\omega) = \int f(t)\exp(i\omega t)\, dt.$$

Since g(ω) may be complex-valued, one uses the absolute value, termed the Fourier magnitude distribution. Here, because we focus on univariate models, the variable t is a scalar; Barron’s paper deals with the multivariate case. The Fourier integral can be regarded as the limiting case of approximation by trigonometric polynomials. For smooth functions the transform ‘tails off’ as ω is increased.

Specifically, Barron deals first with the class of functions for which the first derivative possesses a Fourier representation and for which the integral

$$C_f = \int_{\mathbb{R}^d} |\omega|\, |g(\omega)|\, d\omega \qquad (8)$$

is finite.

If (8) holds, it implies the integrability of the Fourier transform of the gradient of the function. ‘Well-behaved’ functions with such properties can be approximated by feedforward NNs. The actual bound for the approximation error can be expressed in terms of C_f and the number of hidden nodes; C_f may in turn depend on the dimensionality of the input space.
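To make Eq. (8) concrete in the univariate case, the following rough sketch approximates C_f on a truncated grid, using a discrete Fourier transform as a stand-in for the Fourier integral; the two test functions (a Gaussian and an oscillatory modulated Gaussian) are arbitrary choices intended only to show that heavily oscillatory functions carry a much larger Fourier magnitude integral, and hence a weaker Barron bound.

```python
import numpy as np

# Rough univariate illustration of Eq. (8): approximate the integral of |w||g(w)| dw
# on a truncated grid, using the FFT as a stand-in for the continuous Fourier integral.
def barron_constant(f, T=50.0, n=2**14):
    t = np.linspace(-T, T, n, endpoint=False)
    dt = t[1] - t[0]
    g = np.fft.fftshift(np.fft.fft(f(t))) * dt                # approximate transform values
    w = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(n, d=dt))  # angular frequencies
    return np.sum(np.abs(w) * np.abs(g)) * (w[1] - w[0])

smooth = lambda t: np.exp(-t**2)                   # rapidly decaying spectrum: small C_f
rough = lambda t: np.exp(-t**2) * np.sin(20 * t)   # oscillatory: much larger C_f
print(barron_constant(smooth), barron_constant(rough))
```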

Barron extends his basic result for ‘well-behaved’ functions to additional classes of functions: importantly, these are defined on bounded domains. Linear functions (see Barron’s example 14) and Boolean functions (example 18) are included. (For a more detailed analysis of NN approximations and linearity, including statistical issues, see Curry and Morgan, 2004.) Notably, his use of a Fourier approach serves to underline the potential for approximating trigonometric functions and seasonality. Trigonometric functions also feature in the work of Mhaskar and Micchelli (1992), and the networks analysed by Gallant and White (1988) include the Fourier series approximation as a special case.

There are numerous relationships between the different types of smoothness condition. For the condition in (8) it is sufficient (although not necessary) that all partial derivatives of a suitable order s should be square integrable. Specifically, for a d-dimensional input space, s is the smallest integer greater than 1 + (d/2); for our univariate treatment s is therefore 2.

The standard conditions (see e.g. Bracewell, 1978) for the existence of a Fourier representation, required for Barron’s theorem, are usually expressed as a requirement that a function should possess a finite number of turning points or discontinuities. These are the ‘Dirichlet conditions’. That the conditions should apply to discontinuities as well as turning points is noteworthy: the consequence is that the dummy variable representation of seasonality in Eq. (1) can be included.

5. Application to seasonality

In this section we examine the theory relating to approximation of trigonometric functions, using the fact that the ZQ multiplicative seasonal model can be expressed as linear combinations of terms of the general form t sin(t) and t cos(t). These latter functions are not square integrable, but they may be locally square integrable. Thus, for example, given finite bounds L and U, we have

$$\int_L^U (t\sin t)^2\, dt = \frac{U^3 - L^3}{6} - \frac{U^2\cos(U)\sin(U)}{2} - \frac{U\cos^2(U)}{2} + \frac{\cos(U)\sin(U)}{4} + \frac{U-L}{4} + \frac{L^2\cos(L)\sin(L)}{2} + \frac{L\cos^2(L)}{2} - \frac{\cos(L)\sin(L)}{4}.$$

This is finite, but as the bounds are extended indefinitely the integral diverges, so the function is not square integrable over the whole real line.
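The closed-form expression above can be checked numerically; the sketch below (a simple trapezoidal check, assuming nothing beyond NumPy) confirms the local integral for finite bounds and shows it growing roughly like U³/6, consistent with the lack of global square integrability.

```python
import numpy as np

def closed_form(L, U):
    # The local integral of (t*sin t)^2 over [L, U], as given above.
    return ((U**3 - L**3) / 6
            - U**2 * np.cos(U) * np.sin(U) / 2 - U * np.cos(U)**2 / 2
            + np.cos(U) * np.sin(U) / 4 + (U - L) / 4
            + L**2 * np.cos(L) * np.sin(L) / 2 + L * np.cos(L)**2 / 2
            - np.cos(L) * np.sin(L) / 4)

def trapezoid(L, U, n=200001):
    # Simple trapezoidal rule for the same integral, as a numerical cross-check.
    t = np.linspace(L, U, n)
    f = (t * np.sin(t))**2
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))

for L, U in [(0.0, 10.0), (0.0, 100.0), (0.0, 1000.0)]:
    print(U, closed_form(L, U), trapezoid(L, U))   # finite, but growing roughly like U^3/6
```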

Referring to the results of Barron (1993), one can also see, on an intuitive level, that seasonality may cause problems for network approximations. A useful point of reference is the Dirichlet conditions. Put very simply, one expects the NN to be able to approximate a ‘short’ trigonometric series, as typically used for seasonal modelling. However, as the series increases in length and acquires a larger number of turning points or discontinuities, then the approximation will become more strained. This is illustrated in the next section.

Expressing smoothness either in terms of square integrability (Hornik et al.) or by using Barron’s approach through Fourier representation leads us to the same viewpoint regarding seasonality. The situation is little different from more conventional treatments of multiplicative seasonality, whereby the multiplicative process itself is ultimately explosive and so practical bounds need to be imposed. Damping of seasonal factors may be of assistance in this respect.

Fig. 1. Network approximation for t sin(t): 25 hidden nodes.

6. Empirical tests

The key point of the previous section is the number of turning points, which in a seasonal series will usually be substantial. Equally, the greater the number of turns, the greater the number of hidden nodes which will be required. Hidden nodes are the vehicle through which NNs provide their approximation capacity. In non-technical terms, the network constructs its approximation by taking numerous segments of the activation function, which possesses a linear zone in its middle regions and a wide variety of curves elsewhere. As a function becomes less smooth it requires a larger number of hidden nodes for approximation purposes.

To illustrate the arguments, we first consider approximation of a ‘short’ noiseless series of the form t sin(t). This is basically very easy for the network. A series was generated for 96 values of t, starting from unity and incremented by 0.25. The actual number of values was chosen to present continuous curves, but there are only seven turning points. Fig. 1 shows a neural network approximation, obtained using the MATLAB NN Toolbox. No separate training or validation data is used, with the intention being to focus completely on the capacity of the network to deal with noise-free underlying functions. The Levenberg–Marquardt algorithm, with the default MATLAB learning parameters, was employed. The figure relates to a network with 25 hidden nodes. There is very little discernible difference between the network outputs and the original data series.
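The experiment reported in Fig. 1 used the MATLAB NN Toolbox with Levenberg–Marquardt training. The sketch below is an analogous, not identical, setup in Python using scikit-learn’s MLPRegressor with a single logistic hidden layer and an L-BFGS optimiser; since the optimiser and initialisation differ from the original, the fit obtained may differ in detail, and input scaling or several random restarts may be needed.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Noiseless target used for Fig. 1: t*sin(t) on 96 points starting at 1, step 0.25.
t = 1 + 0.25 * np.arange(96)
y = t * np.sin(t)
X = t.reshape(-1, 1)

# Single hidden layer of 25 logistic nodes, fitted to all points (no noise, so the
# aim is pure function approximation rather than out-of-sample generalisation).
net = MLPRegressor(hidden_layer_sizes=(25,), activation="logistic",
                   solver="lbfgs", max_iter=20000, tol=1e-10, random_state=0)
net.fit(X, y)

# A small maximum residual indicates the network has shaped itself to the curve;
# in practice rescaling t and y, or trying several seeds, may be needed.
print(np.max(np.abs(net.predict(X) - y)))
```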

Figs. 2 and 3 show not the outputs but the training error arising for two different networks, fitted to monthly data for the function t sin(t) over 5 years, a total of 60 data points. In this case we have a substantial number of turning points. With 48 hidden nodes the network can shape itself to the function over roughly half of the data series, with the errors then becoming explosive and, significantly, also following the seasonal pattern.

With 120 hidden nodes the result is much better, but in this case the network still struggles after a certain point. It is interesting to note that the errors enter towards the end of the series rather than at the beginning. What is also intriguing is the large number of hidden nodes required, and that there is no apparent damage to the properties of the network. With time as the single input, and hence three parameters for each hidden node (including bias terms), we soon encounter ‘negative’ degrees of freedom. This is in fact consistent with results in Curry (2006), in which parameter redundancy is seen as a natural aspect of feedforward networks: the topic is also discussed in Curry and Morgan (2006).

Fig. 2. Training error for t sin(t), t = 1, . . . , 60, 48 hidden nodes.

In both of the latter papers it is suggested that there are ‘naturally occurring’ functional dependencies between network weights, one consequence of which is that it becomes impossible to derive unique optimal weight values. However, such weight redundancy does not in itself cause damage to the property of universal approximation: the network simply needs a suitable number of hidden nodes and weights to produce its approximation, even if the weight values themselves are not statistically significant. In the same way one can use in multiple regression a highly collinear set of explanatory variables which combine to produce adequate forecasts. It seems that something similar is happening in the examples presented here. The numerous turning points in the data require additional hidden nodes: even though individual weight values cannot be extracted, the combined weights still add value.

In considering these examples it is useful to recall the results in Section 3 above suggesting that autoregressive formulations are inappropriate when the generating process is some function of time alone. Nevertheless, it is theoretically possible, albeit not necessarily likely, that an autoregressive model has practical advantages. In particular, it may avoid the difficulties caused by a large number of turning points. Additionally, there is the possibility that the problem of time-varying parameters may be less severe if autoregressive terms combine to provide a balance.

Fig. 3. Training error for t sin(t), t = 1, . . . , 60, 120 hidden nodes.

Consider for example Eq. (6) above in conjunction with a second-order autoregressive model with constant weights. We have

$$y_t = A_t + B_t y_{t-1} \qquad \text{and} \qquad y_t = \pi_0 + \pi_1 y_{t-1} + \pi_2 y_{t-2},$$

with constants π_0, π_1 and π_2.

Combining the two requires that B_t ≈ π_1 and A_t ≈ π_0 + π_2 y_{t-2}. An approximate balance is theoretically possible, but the topic warrants further investigation.
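One simple way to gauge whether such a balance is plausible is to fit a constant-coefficient AR(2) to y_t = t sin(t) by least squares and inspect the residuals; the sketch below is an illustrative check of this kind, not an experiment reported in the paper.

```python
import numpy as np

# How well can a constant-coefficient AR(2) mimic y_t = t*sin(t) over 60 months?
t = np.arange(1, 61)
y = t * np.sin(t)

# Least-squares fit of y_t = p0 + p1*y_{t-1} + p2*y_{t-2}.
Y = y[2:]
X = np.column_stack([np.ones_like(Y), y[1:-1], y[:-2]])
p, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ p

print(p, np.sqrt(np.mean(resid**2)))   # nonzero residuals: the balance is only approximate
```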

The examples in this section serve to emphasise the problems involved in constructing NN approximations of sinusoidal functions. No noise is included in the data and hence all of the data points can be used for training. The reason for so doing is that if we uncover difficulties for a straightforward case of function approximation, then, a fortiori, the network will encounter problems in extracting a suitable function from noisy data. Without noise, overfitting is a less serious problem. In practical applications with models of this kind noise becomes important, not only because of potential overfitting but also because we are dealing with processes exhibiting variance non-stationarity. The statistical consequences of the latter for NN models deserve fuller consideration.

7. Summary and conclusions

The question of whether feedforward NNs can deal with seasonality without prior adjustment remains open and interesting. This paper argues the need for some theoretical foundation to be established, with a view to supporting empirical studies. It is suggested that seasonality, and trigonometric functions in general, can pose problems. The longer our time series becomes, the more we move to the limits of the ‘universal approximation’ property which is the main raison d’être of standard feedforward networks.

Specifically, it is shown here that multiplicative models combining a time trend and a set of seasonal dummies can be regarded as linear combinations of sinusoidal functions with typical terms t sin(t) and t cos(t). Such functions can in turn be examined in the light of well-established approximation theorems, particularly those of Barron (1993). These theorems point to difficulties in NN approximations and seem to be worthy of more detailed attention than is generally the case. The term ‘universal approximation’ clearly needs to be used with caution.


Specifically, Barron’s work focuses on the smoothness of the function to be approximated, and seasonal or sinusoidal effects can be a source of difficulty. In the course of the experiments it was found that as the number of turning points increases there is more difficulty for the network in shaping itself to the data. The training error can be seen to tally with the underlying sinusoidal factor. Such findings confirm the theoretical arguments put forward above. The problems arise even in the absence of noise and can be seen to arise directly from the nature of the underlying process itself, even if it follows a stable pattern.

In this sense, Zhang and Qi (2005) may be justified in calling for initial seasonal adjustment. On the other hand, ZQ’s work may be seen to involve a danger of specification error; to apply the neural network with lagged data values as inputs is to set it the wrong approximation task. Similar problems would apply with ARIMA time series models.

Theoretically, for an underlying multiplicative time series model such as that of ZQ, an autoregressive formulation for the network is inappropriate, unless one contemplates time-varying weights. In this case it is not surprising to find that the neural network needs to be assisted by prior seasonal adjustment. On the other hand, it may well be the case that the autoregressive form avoids the need to fit the network to multiple turning points, and could therefore provide a reasonable approximation.

A notable aspect of the examples quoted above is the need for enormous numbers of hidden nodes. There appears to be a breach of statistical principles in the form of negative degrees of freedom. More nodes are required to allow the network to shape itself to a complex data series. Further analysis is required of such a strange phenomenon, although it is consistent with the results contained in Curry (2006) on parameter redundancy in standard feedforward networks.

Overall, the results discussed here suggest a need for caution in using NNs to model nonlinearities. Universal approximation is certainly powerful, but there is a danger that excessive reliance may be placed upon it.

References

Barron, A.R., 1993. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory 39, 930–945.

Bracewell, R.N., 1978. The Fourier Transform and its Applications. McGraw-Hill Kogakusha Ltd., Tokyo.

Crone, S.V., 2005. A new perspective on forecasting seasonal time series with artificial neural networks. In: 25th International Symposium on Forecasting, San Antonio, Texas.

Curry, B., 2006. Parameter redundancy in neural networks: An application of Chebyshev polynomials. Computational Management Science, in press.

Curry, B., Morgan, P.H., 2006. Model selection in neural networks: Some difficulties. European Journal of Operational Research 170 (2), 567–577.

Curry, B., Morgan, P.H., 2004. Neural networks, linear functions and neglected non-linearity. Computational Management Science 1 (1), 17–30.

Dhawan, R., O’Connor, M., 2005. Time series forecasting competition: A multi-network approach. In: 25th International Symposium on Forecasting, San Antonio, Texas.

Gallant, A.R., White, H., 1988. There exists a neural network that does not make avoidable mistakes. In: IEEE Second International Conference on Neural Networks, San Diego, vol. I, pp. 593–608.

Ghysels, E., Osborn, D.R., 2001. The Econometric Analysis of Seasonal Time Series. Cambridge University Press, Cambridge, UK.

Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366.

Kurkova, V., 1992. Kolmogorov’s theorem and multilayer neural networks. Neural Networks 5, 501–506.

Mhaskar, H., Micchelli, C., 1992. Approximation by superposition of sigmoidal and radial basis functions. Advances in Applied Mathematics 13, 350–373.

Morgan, P., Curry, B., Beynon, M., 1999. Comparing neural network approximations for different functional forms. Expert Systems 16 (2).

Scarselli, F., Tsoi, A.C., 1998. Universal approximation using feedforward neural networks: A survey of some existing methods, and some new results. Neural Networks 11 (1), 15–37.

Steil, J.J., 2002. Local stability of recurrent networks with time-varying weights and inputs. Neurocomputing 48 (1–4), 39–51.

Tamura, S., Tateishi, M., 1997. Capabilities of a four-layered feedforward neural network: Four layers versus three. IEEE Transactions on Neural Networks 8 (2).

Titchmarsh, E.C., 1983. The Theory of Functions. Oxford University Press, Oxford.

Zhang, G.P., Qi, M., 2005. Neural network forecasting for seasonal and trend time series. European Journal of Operational Research 160, 501–514.