Wavelets in Forecasting
Mak Kaboudan School of Business, University of Redlands
1200 East Colton Avenue, Redlands, CA 92373
Tel: (909) 748-6349
Email: [email protected]
July 5, 2004
Wavelet analysis is not a forecasting technique but may help improve our forecasting abilities.
Multiresolution analysis decomposes an observed series to produce different levels of detail. For
sufficiently lengthy time series, a level of decomposition is selected that transforms the observed
series into a smoothed representation and low-scale details, with just enough observations to model
and then forecast each. An inverse transformation process converts the models' fitted values back for
comparison with the originally observed sequence. In this paper, monthly sunspot numbers are
first transformed using the Haar wavelet, providing input data for training with artificial neural
networks and genetic programming. Inverse transformation then computes fitted and forecasted
values of the monthly series. Each technique was used to produce three-year-ahead forecasts for a
relatively large number of years into the future. These results should therefore invite more
research to improve on the proposed method.
Keywords: Non-stationary time series; Haar wavelet; artificial neural networks; genetic
programming.
1. INTRODUCTION
The purpose of this paper is to investigate the potential role wavelet analysis can play in forecasting.
Most prior studies using wavelets focus on estimation. Lee (1998) provides an interesting review. Pan and
Wang (1998) introduced a new wavelet-based estimator that combined a state-space model with the
wavelet transform to explore stock market inefficiency. Jensen (1999) uses wavelets to obtain a
consistent OLS estimator of long-memory parameters. Nason and Theofanis (2000) use the method to model
nonstationary time series. Ramsey et al. (1995) use wavelet analysis to detect self-similarity or non-
randomness in the U.S. stock market. The role of wavelet analysis in filtering was investigated in Gençay
et al. (2002). Aussem and Murtagh (1997) apply neural networks to ‘à-trous’ wavelet-transformed annual
sunspot numbers to obtain one-step-ahead forecasts for 59 sunspot values ranging from 1921 to 1979.
Thomason (1997) obtains one-step-ahead forecasts of the wavelet-filtered S&P 500 index using neural
networks as well. Pan and Wang (1998) introduced a new estimator that combines a state-space model
with wavelet transforms to forecast the S&P 500 as a function of the S&P dividend yield. Renaud et al.
(2002) experimented with noisy AR(4) data to provide one-step-ahead forecasts. It is therefore important
to investigate whether wavelet analysis can be used in forecasting, and if it can, whether more than
one-step-ahead forecasts are possible.
Wavelet analysis, in and by itself, is not a forecasting technique. In its simplest form, it transforms
a suspected signal into different levels of resolution. Wavelets localize a process in time and frequency.
This is done utilizing a wavelet transform function (the mother wavelet). Its role is to capture low and
high frequency features in a time series at successive decomposition levels. When the wavelet transform
is high in frequency, it captures short, detailed incidents; when it is low in frequency, it captures
incidents that persist over long stretches of time. This makes wavelets ideal for analyzing
nonstationary time series. The strength of wavelet analysis is in
inverse transformation. The inverse transform perfectly reconstructs the original suspected signal. At each
level of decomposition there exists a finer level of resolution. Each stage of decomposition produces two
series containing half the number of points in the prior level. One of the two captures the high while the
other captures the low level frequency. Increasing the level of decomposition ultimately provides
predictable series if the original data contains any signal. What is interesting here is that if one of the
high-level frequency decompositions is predictable, the associated low-level frequency is also predictable.
(This characteristic was not known prior to obtaining the results reported later in this paper.) If this is true,
then an experiment to fit estimation models to higher frequencies at consecutive decomposition levels
until a predictable one is found may be warranted. One then estimates a model for the associated low-
level scale. Fitted values from solving models at all the higher and lower frequency levels can then be
used to obtain fitted values of the original signal. Using this system and by selecting the appropriate lag
structure to estimate the different models, multi-period-ahead forecasts become possible.
A proposed forecasting method is validated by its successful application. While the
objective here is not to prove that this method provides the best forecast, it is imperative to show that it
can be applied to real world phenomena. Monthly sunspot numbers were selected to test the method. The
number is an index representing daily appearances of sunspots. These are huge dark areas sometimes
exceeding the Earth’s size that appear on the Sun’s visible surface, the photosphere, then disappear in a
few hours, days, or even months. They occur in pairs at confined latitudes north and south of the Sun’s
equator with opposite magnetic polarity. A sunspot number is a daily measure r = A (10 G + I), where A
is an adjustment factor that accounts for differences between observatories and observers, G is the number
of groups of sunspots, and I is the actual count of visible spots. Wolf and Wolfer introduced this index in
1848. It was designed to estimate the level of solar activity because they found that neither the number
of groups nor the total count of individual spots alone provides an accurate representation. The series
used in this
study is Rt = monthly averages of r in time period t. Most studies analyze and forecast annual data. (See
for example Gabr and Rao 1981, Tong 1990, and Lin and Pourahmadi 1998.) The relatively fewer
attempts to forecast monthly averages include those of Mundt et al. (1991) and Hathaway et al. (1999).
Monthly sunspot numbers were selected for four reasons. First, wavelet analysis demands a
relatively large number of observations to transform. Using annual data would leave too few observations
to fit. Second, using real world data seems to be more convincing than artificially simulated data. Third,
timely and accurate forecasts of the numbers are especially important in decision making for satellite
orbits and space missions. They have significant economic implications for technologies such as high-
frequency radio communications and radars. Their accurate prediction is also essential for weather
forecasting. Fourth, sunspot numbers have been one of the most widely investigated series in the field of
statistics and forecasting.
There are three problems that may have hindered the use of wavelets in forecasting. (These
problems are discussed more thoroughly in Lee (1998) and Gençay et al. (2002, p. 143).) First, wavelet
analysis is assumed to apply only to time series of dyadic length (T = 2^J, where T is the length of the
time series and J is a positive integer). In the method proposed here, it is possible to relax the
requirement for a
dyadic-length vector of observations. The second problem pertains to selecting the wavelet basis function.
Since real world data is collected periodically, it is easy to assume that they are piecewise constant
functions (Gençay et al. 2002), and the Haar wavelet would be most appropriate especially for
nonstationary signals (Swee and Elangovan 1999). Third, the application of wavelet transformation to
discrete finite-length time series is affected by the boundary. Transformation is based on filtering - and
sometimes solutions must be developed to meet boundary conditions needed to compute the transformed
values. This problem is resolved while solving the second one: the Haar wavelet is exactly
reversible, thus eliminating the boundary effects that are a problem with other wavelet transforms.
Investigating the application of other filters such as the Daubechies wavelet (Daubechies 1992) is left for
future research. Given that the objective here is to demonstrate how wavelets can be used in forecasting,
the simplicity of the Haar wavelet becomes an appealing characteristic even though the Daubechies
wavelet may improve on the frequency-domain characteristics of the Haar wavelet. The Haar wavelet is
reviewed in the next Section. To forecast the low and high frequency transformations, two techniques are
used: artificial neural networks (ANN) and genetic programming (GP). They are briefly described in
Section 3. Their application is in Section 4. The forecast and its evaluation are in Section 5. Section 6
contains discussion and some suggestions for future research.
2. THE HAAR WAVELET
Like other types of wavelets, the Haar transform iteratively decomposes a sequence of observations
into father and mother wavelets. The inverse transform restores the series to its original sequence. At each
level of decomposition the two transforms are half the length they were in the prior one. In the Haar
transform, a series Xt is first decomposed to obtain averages (denoted by a1) and differences (denoted by
d1) of consecutive pairs of observations (as opposed to consecutive points) in that series. Averages
preserve its main signal while differences capture the series’ fluctuations. If Xt contains T0 elements,
the resulting series a1 and d1 of averages and differences each have length T1 = T0/2. The input for the
next level of decomposition is a1, where for this second iteration T2 = T1/2. Recursive iterations continue
until a single average and a single difference are calculated. (This explains the restriction T = 2^J.) The
computations of aj,t and dj,t are as follows:

The averages: aj,t = (aj-1,2t + aj-1,2t-1) / 2,

The differences: dj,t = (aj-1,2t − aj-1,2t-1) / 2,

where for each level of decomposition t = 1, …, Tj-1/2, j = 1, …, J, and a0,t = Xt. For the Haar transform to
preserve the energy of a signal, the wavelets are normalized by a factor of √2 (Walker 1999, pp. 3-7), giving

The father wavelet: aj,t = (aj-1,2t + aj-1,2t-1) / √2, (1)

The mother wavelet: dj,t = (aj-1,2t − aj-1,2t-1) / √2. (2)
Energy is an important characteristic in wavelet analysis. It helps here in evaluating estimation outcomes.
The inverse wavelet transform is calculated simply by reversing the decomposition process. Starting
from the highest level of aj,t and dj,t reached, and working in reverse order, each pair of lower-level
averages is recovered from one average and one difference. The inverse transforms are obtained by solving
(1) and (2) for aj-1,2t-1 and aj-1,2t:

aj-1,2t-1 = (aj,t − dj,t) / √2 (3)

and

aj-1,2t = (aj,t + dj,t) / √2 (4)
are computed until the original series is restored. Jensen and la Cour-Harbo (2000) offer simple and
detailed explanation of the Haar transform and its inverse.
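To make the transform concrete, the decomposition and its inverse can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper; the function names are hypothetical, and it assumes one consistent sign convention in which the mother wavelet takes the second minus the first element of each pair:

```python
import math

def haar_step(a):
    """One Haar level: normalized pairwise averages (father wavelet) and
    differences (mother wavelet); dividing by sqrt(2) preserves energy."""
    s = math.sqrt(2.0)
    approx = [(a[2 * t] + a[2 * t + 1]) / s for t in range(len(a) // 2)]
    detail = [(a[2 * t + 1] - a[2 * t]) / s for t in range(len(a) // 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one level, recovering each pair of lower-level averages."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a - d) / s)  # first point of the pair
        out.append((a + d) / s)  # second point of the pair
    return out

def haar_decompose(x, levels):
    """Decompose x into `levels` detail series plus one smooth series."""
    a, details = list(x), []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    return a, details

def haar_reconstruct(a, details):
    """Reverse the decomposition until the original series is restored."""
    for d in reversed(details):
        a = haar_inverse_step(a, d)
    return a
```

Decomposing an 8-point series with three levels and then reconstructing returns the original values exactly, and the summed squares of the transformed series equal the energy of the input.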
Wavelet transforms can be used to fit and forecast aK,t and dj,t for levels of decomposition K < J
and inverse transforms are then used to assemble fitted and forecasted values of a0,t = Xt. Decomposing a
series to a level K < J permits relaxing the restriction T = 2^J. To demonstrate, consider a series with t = 1,
…, 320 observations (where T ≠ 2^J) and assume K = 3 limited decomposition levels are performed. After
the first level of decomposition, 160 observations are left, the second leaves 80, and the third leaves 40
observations. These 40 observations can now be used to fit a model. The restriction T = 2^J is thus
relaxed to selecting a series of length T such that T/2^K is an integer. Clearly, limited decomposition can
be applied to other than the Haar transform.
Determining the level of K depends on the complexity level of the series one is attempting to
model and forecast. Larger K will be needed for higher levels of complexity. This suggests modeling
successive levels of paired differencing until a reasonable model (i.e., one with good predictive ability) is
reached. Thus, one would expect a poor fit of d1. Data from the second level of differencing (d2) of pairs
of a1 should be more predictable than the first if the original series contains any signal and fitness will
continue to improve at higher levels of differencing. Once an acceptable model (with high fitness) is
obtained, there is no gain from filtering the data any further and that level is selected as K. This means
that the restriction set on the length of the series to analyze should now be expanded to the following:
Select T such that T/2^K is an integer and the number of observations at level K is sufficiently large to
estimate a model. (The minimum sample needed to fit a model depends on the technique selected.)
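The length restriction just described can be stated as a small check. This is a hypothetical helper; the minimum-sample threshold of 30 observations is an illustrative assumption, not a figure from the paper:

```python
def valid_length(T, K, min_obs=30):
    """True if a series of length T supports K Haar decomposition levels
    and still leaves at least min_obs observations to model at level K."""
    return T % (2 ** K) == 0 and T // (2 ** K) >= min_obs
```

For the example above, valid_length(320, 3) holds because 320/2^3 = 40; the paper's 992-observation sample likewise passes at K = 4, since 992/2^4 = 62.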
3. MODELING TECHNIQUES
This Section contains a brief review of the two techniques applied in forecasting and an outline of
the steps to follow in obtaining a forecast. In addition to possessing good predictive abilities, ANN and
GP are applicable in nonlinear modeling.
Artificial Neural Networks
ANN is an information-processing paradigm based on the way the densely interconnected parallel
structure of the human brain processes information. It comprises a collection of mathematical models that
emulate the nervous system and draw on analogies of adaptive learning. ANN can be used to detect
structure in time-series. Input data is presented to the network that learns to predict future outcomes.
Principe et al. (2000) among many others provide a complete description on how ANN can be used in
forecasting. When training a model to forecast sunspot numbers, both multilayer perceptron (MP) and
generalized feedforward (GF) networks were used. An MP is a layered feedforward network, typically
trained with static backpropagation. It is easy to use and produces good approximations. A GF
network is a generalization of MP such that its connections can jump over one or more layers. GF often
solves the problem much more efficiently.
Genetic Programming
GP is a computerized optimization technique applicable in solving diverse problems in different
disciplines. The idea gained attention after work by Koza (1992). GP is utilized here to evolve model
specifications that can be used in forecasting. A description of how GP is used in forecasting and its
statistical properties are in Kaboudan (2001). The GP software used in this study is TSGP (Kaboudan
2003) written for a Windows environment in C++. It uses two types of input: data input files and a
configuration file. Data files consist of separate ones with values of the dependent and each of the
independent variables. The configuration file contains parameters that provide the computer with
execution information. Parameters provided in this file include: name of the dependent variable, number
of observations to use in fitting the model, number of observations to forecast, number of equation
specifications to evolve, and other GP-specific parameters. The computer code produces two output files.
One has a final model specification and the other contains actual and fitted values as well as performance
statistics such as R2, historic MSE, and ex post MSE for each best-fit evolved model.
To find a model that would replicate history and forecast well, executing the program only once is
not sufficient. This is because the process is random and while searching for a minimum SSE, the
algorithm easily gets trapped in a local minimum rather than a more desirable global one. Thus, the
resulting best-fit equation in a single execution may not be very useful and executing the program a fairly
large number of times is necessary. For most observed phenomena with a suspected signal, 100
independent executions produce a few reasonable specifications.
While a best-fit model may replicate historical values well, it is not uncommon that it does not
forecast well. This is especially true when most equations specified by GP are nonlinear and therefore
sensitive to minute changes in values of the input variables. It is therefore necessary to evaluate ex post
forecasts before accepting and using an evolved model. Because solutions of nonlinear equations are
sensitive to initial conditions, GP equations are evolved using the autoregressive specification Xt = f(Xt-n,
Xt-n-1, …, Xt-n-c), where n is a constant integer defining the number of periods to forecast ahead, and c is a
constant integer defining a lag structure. By not using the first few lags, it is possible to forecast for as
many periods ahead using actual rather than forecasted values of the lagged dependent variables. The
ability to produce models with such a long lag structure is one of the advantages of using GP. Further,
when using long lag structures, there is no loss in degrees of freedom because coefficients are not
computed; they are randomly assigned numbers.
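The autoregressive specification Xt = f(Xt-n, Xt-n-1, …, Xt-n-c) amounts to building training rows whose inputs skip the first n − 1 lags, so every input is an observed rather than a forecasted value even when forecasting n steps ahead. A minimal sketch (the helper name is hypothetical):

```python
def lagged_rows(x, n, c):
    """Build (inputs, target) pairs for X_t = f(X_{t-n}, ..., X_{t-n-c}).
    n is the forecast horizon in steps; c sets the depth of the lag
    structure beyond the first usable lag."""
    rows = []
    for t in range(n + c, len(x)):
        inputs = [x[t - n - k] for k in range(c + 1)]
        rows.append((inputs, x[t]))
    return rows
```

For x = 0, 1, …, 9 with n = 3 and c = 2, the first row pairs inputs [x2, x1, x0] with target x5.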
Steps to follow
The following is a summary of the sequence of tasks that help obtain a forecast using the proposed
method:
i. Use the Haar transform to decompose the signal Xt into multiresolution layers. Iterative
applications of equations (1) and (2) accomplish this.
ii. Use a modeling technique (ANN and GP are used here) to fit or train a sample < t at each
level of iteration. At a3,t with 40 observations, for example, it is sufficient to train on the first 30
observations only and forecast the remaining 10. This provides out-of-sample outcomes to
evaluate. The number of data sets to train depends on the level of decomposition performed. If
that level is 3, then four data sets are trained to obtain four forecasts: of a3,t, d3,t, d2,t, and d1,t.
Cumulative energy for these three levels of decomposition of the original series is:

Γ = γ1 + γ2 + γ3 + γ4 = 1. (5)

Energy at each of the three levels (where Σ is over the applicable t) is defined as:

γ1 = Σ d1,t² / E, (6)
γ2 = Σ d2,t² / E, (7)
γ3 = Σ d3,t² / E, (8)
γ4 = Σ a3,t² / E, where (9)
E = Σ Xt². (10)

The cumulative energy of the fitted data sets should be expected to yield:

G = g1 + g2 + g3 + g4 ≈ 1 (11)

if the proposed method is valid. (In (11), g is the sample estimate of γ.)
iii. Use the Haar inverse transform to construct the fitted values of the original series and obtain a
forecast. Iterative applications of equations (3) and (4) accomplish this.
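The energy bookkeeping in equations (5)-(11) can be sketched as follows. Because the normalized Haar transform preserves energy, the computed shares must sum to one; the helper names are illustrative, not from the paper:

```python
import math

def haar_level(a):
    """One normalized Haar level: averages and differences over pairs."""
    s = math.sqrt(2.0)
    return ([(a[2 * t] + a[2 * t + 1]) / s for t in range(len(a) // 2)],
            [(a[2 * t + 1] - a[2 * t]) / s for t in range(len(a) // 2)])

def energy_shares(x, K):
    """Return [gamma_1, ..., gamma_K, gamma_{K+1}]: the energy share of
    each detail series and, last, of the smooth series, relative to the
    total energy E = sum of squared X_t, as in equations (6)-(10)."""
    E = sum(v * v for v in x)
    a, shares = list(x), []
    for _ in range(K):
        a, d = haar_level(a)
        shares.append(sum(v * v for v in d) / E)
    shares.append(sum(v * v for v in a) / E)
    return shares
```

Checking that the corresponding shares computed from fitted series sum to roughly one is exactly the validity check in equation (11).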
4. APPLICATION
The Data
The method proposed is applied to monthly unsmoothed sunspot numbers obtained from the Solar
Influences Data Analysis Center division of the Royal Observatory of Belgium (SIDC 2003). The sample
selected starts January 1920 and ends August 2002 with a total of 992 observations. Identical data sets
were used to train ANN and GP. Both were designed to produce 36-month-ahead forecasts. The first 36
months were used as lags to avoid using model solutions to forecast future observations. This may seem
like a lot for standard modeling techniques. However, with wavelet decomposition, this number quickly
became manageable. By the fourth level of decomposition, the number of lags needed to forecast 36
periods ahead was down to only two. Table 1 shows how the data was used. Each column identifies the
level of differencing. Reasonable fits were found at d4. (K = 4 was possible because a large number of
observations was available.) The following hypothetical model specifications (used in training) may
clarify the information in Table 1:
a4,t = f (a4,t-3, a4,t-4, …, a4,t-8) (12)
d4,t = f (d4,t-3, d4,t-4, …, d4,t-8) (13)
d3,t = f (d3,t -5, d3,t -6, …, d3,t -10) (14)
d2,t = f (d2,t -9, d2,t -10, …, d2,t -14) (15)
d1,t = f (d1,t -17, d1,t -18, …, d1,t -22) (16)
Equations (12) and (13) provide two-step-ahead forecasts (or 32 months after applying the inverse
transform) without using their own forecast. Equation (14) provides four-step-ahead (or 32 months) as
well, and so on. All equations are specified similarly. Six lagged dependent variables were assumed
sufficient to explain variations in each dependent variable. Under Lags in Table 1, NOBS = number
of observations used up in establishing the needed lags. Lost df = lost degrees of freedom due to lags.
Under Training and Forecast, NOBS = number used to train and to forecast, respectively.
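The step-ahead counts in Table 1 all translate to the same horizon in the original series, since one forecast step at decomposition level j spans 2^j monthly observations. A quick sketch of that arithmetic:

```python
def months_ahead(steps, level):
    """One forecast step at decomposition level `level` covers 2**level
    original (monthly) observations."""
    return steps * 2 ** level

# Steps-ahead per level from Table 1: d1 at level 1 uses 16 steps,
# d2 at level 2 uses 8, d3 at level 3 uses 4, d4 and a4 at level 4 use 2.
horizons = [months_ahead(s, j) for j, s in [(1, 16), (2, 8), (3, 4), (4, 2)]]
```

Every entry of horizons equals 32, matching the 32-month horizons noted for equations (12)-(16).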
ANN Results
Five networks specified using equations (12-16) were identified and used to train and forecast the
monthly sunspot numbers. Table 2 contains the details. Figures 1a-1f contain six plots comparing actual
with fitted values of monthly sunspot numbers and of each of the five decomposed variables. The Figures
clearly suggest that ANN was successful in capturing the dynamics of a4,t and d4,t. ANN’s performance
deteriorated at lower levels of resolution d3,t, d2,t, and d1,t. This is expected since d1,t may be just noise.
Figure 1a portrays overall performance of the system when fitting MSSNt (where MSSN = monthly
sunspot numbers).
Statistics on training results are summarized in Table 3. It contains three sets of statistics. The first
one helps analyze the residuals. The statistics provide hints about normality of the residuals and test for
autocorrelation. Residuals’ means are all statistically equal to zero. Residuals of d1 and a4 are right
skewed and all but d3 are leptokurtic. The first order autocorrelation null cannot be rejected for d1, and the
second order autocorrelation null cannot be rejected for a4. LM is the general Lagrange multiplier test
suggested by Godfrey (1978) and Breusch (1978). Its test statistics suggest the absence of serial
correlation of order 2 in either autoregressive or moving average form for all residuals. To complete this
test, residuals were regressed on their lagged values. The test statistic, T·R2, has an asymptotic χ2(q)
distribution. The last statistic in this set is the Ljung-Box (1978) Q test. It was utilized to test against
higher order serial correlation. For all residuals, the null hypotheses of no higher order serial correlation
could not be rejected.
The second set of statistics in Table 3 provides hints about the amount of energy captured in the
simulated ANN data relative to the original set. Importantly, the energy statistic for the simulated a4
suggests that most energy in the series is preserved and successful prediction of this variable may help
forecast the series well. Simulations of the ANN estimations successfully captured 96% of that signal. It
was also successful in capturing a respectable portion of the energy in d4. Overall, the simulation captured
93.8% of total energy contained in the observed series. The third set reports an R2 = 0.90 and root mean
square error (RMSE) = 17.03.
GP Results
Using the same equations (12-16), GP model specifications were evolved. (Because they are of secondary
interest here, they are relegated to the Appendix.) Table 4 contains configuration details. The information in
the Table helps in reproducing some of the results using TSGP. TSGP prompts the user with questions to
answer. The answers used are listed in the Table in the order the questions are asked during execution.
Figures 2a-2f contain six plots comparing actual with fitted values of monthly sunspot numbers and of
each of the five variables. They suggest that GP was successful in capturing the dynamics of a4,t and d4,t.
For d3,t, d2,t, and d1,t, GP’s results were slightly better than ANN’s. Actual and fitted MSSNt are in Figure
2a.
Table 5 contains a summary of the computed statistics. All residuals’ means are statistically equal
to zero and are not skewed. Only d1 and d2 are leptokurtic. The null hypotheses testing for presence of
serial correlation are rejected for all data. The energy statistic for the GP simulated a4 data set suggests
that most energy in the series is preserved and that MSSN’s values were successfully fitted. Overall, the
simulation captured 96.3% of total energy contained in the observed series. Finally, the R2 = 0.90 and the
root mean square error (RMSE) = 17.52.
5. THE FORECASTS
Figures 3 and 4 contain comparisons of actual and ex post forecasts in addition to ex ante forecasts
using the respective techniques. Forecast statistics for both are in Table 6. The Theil’s U-statistics are
consistent with models’ performances.
Although neither forecast is perfect, both actually exceeded expectations set before conducting
this experiment, given that all points are forecasted 36 months ahead and that training and fitting
covered only the period 1923-1974. The resulting models seem to predict well for a relatively long period
into the future. For ANN, the forecast seems almost perfect from 1981 through 1997. For GP, it seems
almost perfect from 1978 through 2000.
6. DISCUSSION AND FUTURE RESEARCH
In this paper, a method was proposed to use wavelets in forecasting. The method applies only to
high frequency data (hourly, daily, weekly, monthly, and possibly quarterly provided a very large number
of observations exists). Input data in this study were monthly sunspot numbers. They were decomposed to
a level less than the maximum possible to leave a sufficient number of observations to fit a model to.
Models were fit to the decomposed series. Fitted values were transformed back then compared to the
original series.
The results suggest that the method helps forecast a large number of steps ahead (36 months for
sunspot numbers) and for a large number of years into the future (more than 20). Encouraging results
were obtained in the experiment undertaken. This is probably because sunspot numbers have sufficient
signal to model. Applications to other series with higher data generating process complexity will probably
yield less encouraging results. Only future experiments can confirm that.
Future research therefore may involve further experimentation with data characterized with
different levels of complexity. Applying the same method using different types of wavelets should be of
interest as well. Research may also be extended to perhaps employ this method in approximating the
signal-to-noise ratio of an observed series.
APPENDIX
GP EVOLVED EQUATIONS
Equations evolved by TSGP are included here. To compute fitted or forecasted values using these
equations, the following protections apply:
(1) If in (x÷y), y = 0, then (x/y) = 1.
(2) If in y1/2, y < 0, then y1/2 = - | y|1/2.
They are designed to prevent computational problems. Here are the fittest evolved models:
d1,t = [{d1,t-18 / ((sin d1,t-18) – ((sin d1,t-18) / 7) + d1,t-18 + 2 d1,t-19)} + {d1,t-18 / (d1,t-20^(1/2) – 9 +
d1,t-19)} + {(d1,t-18 + d1,t-20) / (d1,t-18 + d1,t-19)} + {(2 d1,t-18 + d1,t-20) / (d1,t-18 + d1,t-19 +
d1,t-20)} + (d1,t-18 / d1,t-19) + d1,t-20 sin(d1,t-19 – 4)]^(1/2). (17)

d2,t = (d2,t-9 / d2,t-14) – (d2,t-13 – d2,t-12 – 2 d2,t-11 – 2 d2,t-14)^(1/4) + d2,t-12^(1/2) – 2 (d2,t-13 / d2,t-14)^(1/4) –
(d2,t-13 – d2,t-12 – d2,t-11 – d2,t-13^(1/32) – d2,t-13^(1/2)) / cos(d2,t-11 · d2,t-14). (18)

d3,t = [{cos(d3,t-7 + d3,t-5 + 32) + sin(d3,t-7 · d3,t-6) + cos(d3,t-7 + sin d3,t-6 + sin(15 d3,t-7)) +
d3,t-7^(1/8)} · sin(d3,t-5 – (d3,t-5 + 46)^(1/2))] + (19 / d3,t-5) + [{sin d3,t-7 – (d3,t-7 + 8)^(1/2) – (d3,t-5 +
12)^(1/2)} · (cos d3,t-9 + sin(d3,t-6)^2)]. (19)

d4,t = d4,t-6 – 34 – {(2 d4,t-4 + d4,t-8) / cos(d4,t-5 · 72^(1/2) – 3 d4,t-5)^(1/2)} – d4,t-5 / {d4,t-8 · cos(–24 /
d4,t-6)}^(1/2) – (d4,t-4 + d4,t-5)^(1/2) + (–15 / d4,t-6) + (d4,t-4 + 2 d4,t-5)^(1/2) + (–23 / d4,t-5) + d4,t-8^(1/2). (20)

a4,t = a4,t-8 + a4,t-5 + (a4,t-7 · a4,t-8) + [{((a4,t-7 {a4,t-8 (a4,t-8 – 117)}^(1/2))^(1/2) · {a4,t-6 (a4,t-8 – 204)^(1/2)}^(1/2))
+ a4,t-6 {(a4,t-7 (a4,t-8 – 128))^(1/2) + a4,t-8 – 88}^(1/2) + a4,t-5} · cos(a4,t-6^(1/2) {a4,t-8 + (a4,t-7 · a4,t-8)
+ a4,t-8}^(1/2) + a4,t-8)]^(1/2). (21)
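The two protection rules listed at the start of this Appendix translate directly into code. A minimal sketch with hypothetical helper names:

```python
import math

def pdiv(x, y):
    """Protected division, rule (1): if the denominator is zero, return 1."""
    return 1.0 if y == 0 else x / y

def psqrt(y):
    """Protected square root, rule (2): for y < 0, return -sqrt(|y|)."""
    return -math.sqrt(-y) if y < 0 else math.sqrt(y)
```

Evaluating any of equations (17)-(21) with these operators in place of / and the 1/2 power avoids division-by-zero and complex-number failures.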
REFERENCES
Aussem, A., and Murtagh, F. (1997). Combining neural network forecasts on wavelet-transformed time
series. Connection Science, 9, 113-121.
Breusch, T. S. (1978). Testing for autocorrelation in dynamic linear models. Australian Economic Papers,
17, 334-355.
Daubechies, I. (1992), Ten Lectures on Wavelets, Vol. 61 of CBMS-NSF Regional Conference Series in
Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia.
Gabr, M. and Rao, S. (1981). The estimation and prediction of subset bilinear time series with
applications. Journal of Time Series Analysis, 2, 155-171.
Gençay, R., Selçuk, F., and Whitcher, B. (2002). An introduction to wavelets and other filtering methods
in finance and economics, San Diego, CA: Academic Press.
Godfrey, L. G. (1978). Testing against general autoregressive and moving average error models when the
regressors include lagged dependent variables. Econometrica, 46, 1293-1302.
Hathaway, D., Wilson, R., and Reichmann, E. (1999). A synthesis of solar cycle prediction techniques.
Journal of Geophysical Research, 104, 22375-22388.
Kaboudan, M. (2001). Statistical properties of fitted residuals from genetically evolved models. Journal
of Economic Dynamics and Control, 25, 1719-1749.
Kaboudan, M. (2003). TSGP: A time series genetic programming software.
http://newton.uor.edu/facultyfolder/mahmoud_kaboudan/tsgp.
Koza, J. (1992). Genetic Programming, Cambridge, MA: The MIT Press.
Jensen, A. and la Cour-Harbo, A. (2000). Ripples in Mathematics: The Wavelet Transform, Berlin,
Springer.
Jensen, M. (1999). Using wavelets to obtain a consistent ordinary least squares estimator of the long-
memory parameter. Journal of Forecasting, 18, 17-32.
Lee, G. (1998). Wavelets and wavelet estimation: A Review. Journal of Economic Theory and
Econometrics, 4, 123-157.
Lin, T., and Pourahmadi, M. (1998). Nonparametric and non-linear models and data mining in time series:
A case-study on the Canadian lynx data. Applied Statistics, 47, Part 2, 187-201.
Ljung, G. and Box, G. (1978). On a measure of lack of fit in time series models. Biometrika, 65, 297-303.
Mundt, M., Maguire II, B., and Chase, R. (1991). Chaos in the sunspot cycle: Analysis and prediction.
Journal of Geophysical Research, 96, 1705-1716.
Nason, G., Theofanis, S. (2000). Wavelet packet function modelling of nonstationary time series.
http://www.stats.bris.ac.uk/~guy/Research/papers/WPTransNonSta.pdf.
Pan, Z., and Wang, X. (1998). A stochastic nonlinear regression estimator using wavelets. Computational
Economics, 11, 89-102.
Ramsey, J., Usikov, D., and Zaslavsky, G. (1995). An analysis of U.S. stock price behavior using
wavelets. Fractals, 3, 377-389.
Renaud, O., Starck, J., and Murtagh, F. (2002). Wavelet-based forecasting of short and long memory time
series. Working Paper No. 2002.04, Department of Econometrics, University of Geneva,
http://www.unige.ch/ses/metri/.
Principe, J., Euliano, N., and Lefebvre, C. (2000). Neural and Adaptive Systems: Fundamentals Through
Simulations, New York: John Wiley & Sons, Inc.
SIDC (Solar Influences Data Analysis Center) division of the Royal Observatory of Belgium, (2003).
http://sidc.oma.be/DATA/monthssn.dat.
Swee, E. and Elangovan. S. (1999). Applications of symmlets for denoising and load forecasting. In
Proceedings of the IEEE Signal Processing Workshop on Higher-Order Statistics, 165-169.
Thomason, M. (1997). Financial forecasting with wavelet filters and neural networks. Journal of
Computational Intelligence in Finance, 2, 27-32.
Tong, H. (1990). Nonlinear Time Series Analysis: A Dynamical System Approach, Oxford: Oxford
University Press.
Walker, J. (1999). A Primer on Wavelets and Their Scientific Applications, Boca Raton, Chapman &
Hall/CRC.
Table 1. Detailed information on data used in training and forecasting. ANN and GP were given identical data.
                d1        d2        d3        d4        a4
Lags:
  Start         1920:01   1920:04   1920:08   1921:04   1921:04
  End           1923:09   1924:08   1926:08   1930:08   1930:08
  NOBS          22        14        10        8         8
  Lost df       6         6         6         6         6
  Steps-ahead   16        8         4         2         2
Training:
  Start         1923:10   1924:12   1927:04   1931:12   1931:12
  End           1973:06   1973:08   1973:12   1974:03   1974:03
  NOBS          299       147       71        32        32
Forecast:
  Start         1973:08   1973:12   1974:08   1974:08   1974:08
  End           2005:05   2005:05   2005:05   2005:05   2005:05
  NOBS          192       95        47        24        24
Table 2. Neural networks architectures used.

                            d1        d2       d3       d4       a4
  Network type              MP        MP       GF       MP       MP
  Input PEs                 6         6        6        6        6
  Hidden layers             2         2        1        1        2
  Transfer function*        Sigmoid   Tan      Tan      Tan      Tan
  Learning momentum         0.7       0.7      0.7      0.7      0.7
  Maximum training epochs   5000      2000     2000     1000     2000

* Sigmoid = SigmoidAxon, and Tan = TanAxon.
Table 3. Neural network training results.

                 d1       d2       d3       d4       a4       MSSN
Residuals:
  Mean           0.000    -0.340   -0.215   0.402    3.420    1.867
  p-value        1.000    0.783    0.935    0.738    0.556    0.013
  Skewness       0.395    0.142    -0.195   -0.745   0.949    0.380
  p-value        0.006    0.487    0.510    0.101    0.037    0.000
  Kurtosis       3.271    1.000    0.566    2.328    2.382    1.005
  p-value        0.000    0.016    0.355    0.016    0.014    0.000
  ρ1             -0.124   -0.066   -0.209   -0.216   -0.126   0.336
  p-value        0.034    0.430    0.088    0.259    0.472    0.000
  ρ2             -0.032   -0.098   0.141    -0.223   0.353    0.045
  p-value        0.581    0.246    0.248    0.245    0.041    0.317
  LM χ2          4.567    1.893    5.397    2.285    5.043    63.053
  Significance   1        1        1        1        1        1
  Q              21.377   19.957   14.020   2.781    5.829    38.152
  Significance   0.975    0.986    0.666    0.904    0.560    0.372
Energy:
  Original       0.011    0.008    0.012    0.017    0.953    1.002
  Simulated      0.000    0.001    0.003    0.016    0.919    0.940
Other stats:
  RMSE           12.90    14.90    21.19    6.63     32.19    17.03
  R2             0.00     0.18     0.33     0.98     0.97     0.90
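The ρ1, ρ2, and Q rows report residual autocorrelations and a portmanteau statistic. A minimal sketch of those two computations, assuming the standard sample-autocorrelation and Ljung-Box formulas (the paper may use a different portmanteau variant):

```python
def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return num / denom

def ljung_box_q(residuals, max_lag):
    """Ljung-Box portmanteau statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(acf(residuals, k) ** 2 / (n - k)
                             for k in range(1, max_lag + 1))

print(acf([1.0, 2.0, 3.0, 4.0], 1))  # 0.25
```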
Table 4. TSGP prompts and the answers used to search for the fittest equations.
Question Answer
Please enter the dependent variable file name (* for *.txt): a4, d4, …, d1.
Please enter number of data points in Historical (Training) set: Tj
Please enter total number of data points to Forecast: (Discretionary)
Please enter number of data points for ex post Forecast: (Discretionary)
Please enter population size: 1000
Please enter number of generations: 200
Please enter '1' for trig function and '0' for no trig: 1
Please enter '1' for exp function and '0' for no exp: 0
Please enter the number of explanatory variables: 6
Please enter number of searches desired: 100
Table 5. GP fitting results.

                 d1       d2       d3       d4       a4       MSSN
Residuals:
  Mean           0.470    -0.450   2.121    -0.707   7.791    2.342
  p-value        0.487    0.689    0.292    0.772    0.344    0.002
  Skewness       -0.146   0.358    -0.385   -0.271   0.257    0.155
  p-value        0.305    0.079    0.195    0.551    0.572    0.154
  Kurtosis       1.718    0.913    0.263    -0.125   -0.166   0.147
  p-value        0.000    0.027    0.667    0.898    0.864    0.499
  ρ1             -0.070   -0.013   -0.020   -0.020   0.281    0.436
  p-value        0.227    0.872    0.865    0.915    0.180    0.000
  ρ2             -0.048   0.007    0.176    0.097    -0.060   0.163
  p-value        0.411    0.934    0.148    0.614    0.779    0.000
  LM χ2          2.019    0.034    2.190    0.293    1.948    146.500
  Significance   1        1        1        1        1        1
  Q              23.961   44.716   10.711   3.377    2.460    57.449
  Significance   0.938    0.151    0.871    0.848    0.930    0.013
Energy:
  Original       0.011    0.008    0.012    0.017    0.953    1.002
  Simulated      0.002    0.002    0.006    0.016    0.939    0.966
Other stats:
  RMSE           11.68    13.97    17.34    13.51    45.80    17.52
  R2             0.18     0.28     0.60     0.92     0.95     0.90
Table 6. ANN and GP forecast results.

              d1      d2      d3      d4      a4       MSSN
ANN:
  RMSE        14.42   19.30   26.04   52.91   115.63   36.36
  Theil's U   0.94    0.79    0.61    0.51    0.16     0.19
GP:
  RMSE        14.94   19.48   32.14   39.56   90.55    31.27
  Theil's U   0.81    0.72    0.71    0.44    0.13     0.17
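Table 6's accuracy metrics can be reproduced from an actual and a forecast series as follows. This is a hedged sketch: RMSE is standard, while Theil's U is assumed here to be the bounded U1 inequality coefficient (RMSE divided by the sum of the two series' root-mean-square levels); the table does not restate the exact formula used.

```python
import math

def rmse(actual, forecast):
    """Root mean squared forecast error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

def theil_u(actual, forecast):
    """Theil's U1 inequality coefficient: 0 = perfect, 1 = worst."""
    n = len(actual)
    ra = math.sqrt(sum(a * a for a in actual) / n)
    rf = math.sqrt(sum(f * f for f in forecast) / n)
    return rmse(actual, forecast) / (ra + rf)

print(theil_u([10.0, 20.0, 30.0], [10.0, 20.0, 30.0]))  # 0.0
```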
Figure 1a. Monthly sunspot number: actual and ANN fitted values (y-axis: MSSN; x-axis: date, 1931:12–1973:08; series: Actual, Fitted).
Figure 1b. Actual and fitted values of a4 (y-axis: a4; x-axis: observations; series: Actual, Fitted).
Figure 1c. Actual and fitted values of d4 (y-axis: d4; x-axis: observations; series: Actual, Fitted).
Figure 1d. Actual and fitted values of d3 (y-axis: d3; x-axis: observations; series: Actual, Fitted).
Figure 1e. Actual and fitted values of d2 (y-axis: d2; x-axis: observations; series: Actual, Fitted).
Figure 1f. Actual and fitted values of d1 (y-axis: d1; x-axis: observations; series: Actual, Fitted).
Figure 2a. Monthly sunspot number: actual and GP fitted values (y-axis: MSSN; x-axis: date, 1931:12–1973:08; series: Actual, Fitted).
Figure 2b. Actual and fitted values of a4 (y-axis: a4; x-axis: observations; series: Actual, Fitted).
Figure 2c. Actual and fitted values of d4 (y-axis: d4; x-axis: observations; series: Actual, Fitted).
Figure 2d. Actual and fitted values of d3 (y-axis: d3; x-axis: observations; series: Actual, Fitted).
Figure 2e. Actual and fitted values of d2 (y-axis: d2; x-axis: observations; series: Actual, Fitted).
Figure 2f. Actual and fitted values of d1 (y-axis: d1; x-axis: observations; series: Actual, Fitted).
Figure 3. Actual and ANN forecasted MSSN (y-axis: MSSN; x-axis: date, 1974:08–2004:09; series: Actual, Forecast).
Figure 4. Actual and GP forecasted MSSN (y-axis: MSSN; x-axis: date, 1974:08–2004:09; series: Actual, Forecast).