Time Series Analysis
MODELING & FORECASTING
STOCK PRICES
November 26, 2013
IOE 565: Team #6
Abdullah Alshelahi
Caner Arslan
Wouter Hielckert
Colin Jones
Steve Kim
TABLE OF CONTENTS
I. Introduction
II. How We Collected Our Data
III. System Analysis: Procedures & Results
Part A: Forecasting Google Stock Prices
Part B: Forecasting Apple Stock Prices
Part C: Forecasting Yahoo Stock Prices
IV. Another Approach: Forecasting Stock Prices Using Brownian Motion
V. Discussion & Significance of the Results
VI. Conclusion
VII. References
I. INTRODUCTION
Predicting future stock prices has been a compelling topic for quite some time, as having
an accurate vision of the stock market’s future performance can help traders invest more suitably
to maximize financial profit. Perhaps the most common method used to perform stock price
forecasting is time series analysis. As we learned in class, time series analysis is a type of
statistical study on a series of sequential data points over a period of time, where the data points
are usually measured at uniform time intervals. Time series forecasting, then, takes the analysis
from the time series data and attempts to predict what the data will be in the near future, based on
what it has been in the past. This concept is especially important in the field of quantitative
finance because traders want to make wise moves at the right times to maximize their own
welfare. However, there are many factors that influence the fluctuation of the stock market, so
creating an accurate forecast based on time series analysis alone is challenging.
For this project, our team chose to model and forecast the stock prices of three
companies: Google, Apple, and Yahoo. We selected these specific companies because they all
operate in a similar “branch” of business – information technology. As most people know,
Google Inc. is a corporation that specializes in a variety of Internet-related products and services.
Similarly, Yahoo! Inc. is an Internet corporation that is globally known for its impressive range
of services. Finally, Apple Inc. is a corporation that designs, manufactures, and sells computer
software and personal computers. These three companies are very popular in the United States
today, and there is a large amount of information about them available online.
Our team’s main goal for this project was to find good models for the stock prices of the
three companies described above to predict the future stock price. However, we also wanted to
implement a couple of “new” approaches to modeling that were not explicitly discussed in class
throughout the semester. As you will see, our results indicate that there is not one method that
gives the best result for each and every time series.
II. HOW WE COLLECTED OUR DATA
One of the first things our team needed to do was figure out how we were going to collect
the stock price data for Google, Apple, and Yahoo. Also, we had to agree on a fixed time
horizon we would consider when performing our analysis. After some discussion, we agreed to
analyze a two-year time horizon of daily stock prices from October 24, 2011 to October 24, 2013
for each of the three companies.
Once we finished that step, we knew we had to split the data for each company into parts.
Based on the discussions we had in class during the early stages of this project, we learned that
data partitioning is a necessary step in many predictive exercises – the basic idea is to separate
the entire dataset into a training set and a testing (or validation) set. Why do we need to split the
data into two parts? As we learned in class, we fit the model to the “seen” (training) data and then
check how well it predicts the “unseen” (testing) data. If the model meets our expectations on data
it was not fitted to, then we have some level of confidence in its predictive power. Therefore, to
partition the stock price data for each of the three companies, our group (arbitrarily) decided to
choose the first half of the samples for the training set and the remaining, more recent 50 percent
of the samples for the testing set.
Next, our team extracted daily stock prices for Google, Apple, and Yahoo from the
Yahoo finance website: http://finance.yahoo.com/. For each company, the entire dataset consists
of 504 observations (prices). As discussed in the introduction, our team’s main goal was to find
the best model possible for predicting each company’s stock price. So, based on the “training-
testing” method described above, we used the first 252 observations from each company’s whole
dataset to form the training set for each company. Similarly, we used the final 252 observations
from the entire dataset of each company to construct the testing set for each company.
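The chronological split described above can be sketched as follows. This is only a minimal illustration: the function name `split_train_test` and the placeholder price series are assumptions, not the code our team actually used.

```python
def split_train_test(prices, n_train):
    """Split a chronologically ordered price list into training and testing parts."""
    if not 0 < n_train < len(prices):
        raise ValueError("n_train must be between 1 and len(prices) - 1")
    return prices[:n_train], prices[n_train:]


# Hypothetical placeholder series standing in for 504 daily closing prices.
closing_prices = [100.0 + 0.1 * day for day in range(504)]
train, test = split_train_test(closing_prices, n_train=252)
print(len(train), len(test))  # 252 252
```

Keeping the split chronological (rather than shuffling) matters for time series, because the testing part must lie strictly after the training part.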
For Google, Apple, and Yahoo, the entire dataset for each company contains the open,
high, low, close, and adjusted close stock prices on every trading day from October 24, 2011 to
October 24, 2013. The datasets also contain trading volume values on every trading day. To
achieve consistency, we used the close prices as a general measure of the stock price. By
definition, the closing price of a stock is the final price at which that stock is traded on a given
trading day. It represents the most up-to-date valuation of the stock until trading begins again on
the next trading day. In other words, the stock prices we used in our analysis represent the
closing prices.
Since trading days are never on weekends, the first 252 observations happen to run from
October 24, 2011 to October 22, 2012 and the final 252 prices run from October 23, 2012 to
October 24, 2013 for each company. As the next section shows, quite a bit of analysis was done
on each company’s training set. In addition, the models obtained for each company are quite
different, reflecting the fact that our group chose to evenly divide the work among ourselves.
That is, we decided to split the work for this project among the group by each company. Our
reasoning for this is that we wanted “a different mind” working on each company so all the
results would not be repetitive. We were hoping to obtain a fresh perspective on the appropriate
model for each company, and we were also trying to be efficient when conducting our analysis.
III. SYSTEM ANALYSIS: PROCEDURES & RESULTS
Part A: Forecasting Google Stock Prices
As our team just established, we decided to split the stock price data for Google into two
parts. The first part – the first 252 prices – is used for training our model and the second part –
the last 252 prices – is used for testing our model. The whole dataset (504 observations) for
Google is shown here:
[Figure: Daily Google Stock Prices (10/24/11 - 10/24/13); x-axis: Time (Days), y-axis: Stock Price ($)]
Using the ARMA(2n, 2n - 1) modeling strategy, we determined ARMA(4, 3) to be the
adequate model for the training dataset. The F-tests were computed as follows:
Model     RSS      Model     RSS      s    N - γ    F         F Crit. (5%, s, N - γ)
(2,1)     22563    (4,3)     21510    4    245      2.9984    2.4085
(4,3)     21510    (6,5)     21028    4    241      1.3810    2.4091
(6,5)     21028    (8,7)     20824    4    237      0.5804    2.4097
(4,3)     21510    (5,4)     21249    2    243      1.4924    3.0330
(3,2)     22561    (4,3)     21510    2    245      5.9855    3.0327
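The F values in the table above can be reproduced from the residual sums of squares. The sketch below assumes the usual nested-model F statistic, F = ((RSS_small - RSS_large) / s) / (RSS_large / (N - γ)); the function name is hypothetical, and the critical values are simply copied from the table rather than recomputed.

```python
def f_statistic(rss_small, rss_large, s, dof):
    """F statistic for comparing a smaller model against a larger one.

    rss_small: residual sum of squares of the smaller model
    rss_large: residual sum of squares of the larger model
    s:         number of extra parameters in the larger model
    dof:       N - gamma, the residual degrees of freedom of the larger model
    """
    return ((rss_small - rss_large) / s) / (rss_large / dof)


f_21_vs_43 = f_statistic(22563, 21510, s=4, dof=245)
f_43_vs_65 = f_statistic(21510, 21028, s=4, dof=241)
print(round(f_21_vs_43, 4), round(f_43_vs_65, 4))
# The larger model is kept only when F exceeds the tabulated critical value.
print(f_21_vs_43 > 2.4085, f_43_vs_65 > 2.4091)
```

Under this rule, ARMA(4, 3) is a significant improvement over ARMA(2, 1), while ARMA(6, 5) is not a significant improvement over ARMA(4, 3).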
These are the lambda values and the actual model we obtained:
$\lambda_1 = -0.9233$, $\lambda_2 = 1.0110 + 0.0472i$, $\lambda_3 = 1.0110 - 0.0472i$, $\lambda_4 = 0.9498$
$X_t - 2.048X_{t-1} + 0.2009X_{t-2} + 1.746X_{t-3} - 0.8983X_{t-4} = a_t - 1.039a_{t-1} - 0.9175a_{t-2} + 0.9596a_{t-3}$
The Portmanteau test resulted in a value of Q = 8.33 whereas the critical value is 30.14.
Since Q is less than the critical value, our team concluded that the $a_t$’s are uncorrelated.
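As a rough illustration of these residual checks, the sketch below computes residual autocorrelations, a Portmanteau statistic, and the 2/sqrt(N) band. It assumes the Box-Pierce form Q = N * sum of squared autocorrelations, which may differ from the exact form used in our analysis, and it runs on simulated white noise rather than on our actual residuals.

```python
import math
import random


def autocorrelations(residuals, max_lag):
    """Sample autocorrelations of the residuals at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    c0 = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def box_pierce_q(residuals, max_lag):
    """Portmanteau statistic in the Box-Pierce form (an assumption here)."""
    rho = autocorrelations(residuals, max_lag)
    return len(residuals) * sum(r * r for r in rho)


def bartlett_band(n):
    """Approximate 95% band for the autocorrelations of white noise."""
    return 2.0 / math.sqrt(n)


random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(252)]
q = box_pierce_q(white_noise, max_lag=20)
print(round(bartlett_band(252), 3))  # 0.126
print(q)
```

The residuals are judged uncorrelated when Q stays below the chi-square critical value and the individual autocorrelations stay inside the band.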
[Figure: Bartlett Band Test; x-axis: Lag (k), y-axis: P(k)]
Moreover, the Bartlett Band test confirms this result – notice in the above graph that the
autocorrelations of the $a_t$’s are all less than 2/sqrt(252) ≈ 0.126 in absolute value. Therefore,
ARMA(4, 3) is the adequate model. The mean squared error (MSE) is 86.73 for the training part and
374.68 for the testing part. This may imply that the model overfits the data. Also, as seen above,
there are lambda values greater than one in absolute value, so this model is unstable and
non-stationary. If a single $a_t$ is injected into the system, the system response can exceed any
bound (that is, explode, given sufficient time). As a result, this non-stationarity needs to be
eliminated. The complex roots are very close to one – therefore, the $(1 - 2B + B^2) = (1 - B)^2$
operator was applied and a parsimonious model was calculated. We arrived at this ARIMA(2, 2, 3) model:
$(1 - 2B + B^2)(1 + 0.3485B - 0.5651B^2)X_t = (1 - 0.5671B - 0.9258B^2 + 0.575B^3)a_t$
After eliminating non-stationarity, the new model gave us a better MSE for the testing
portion of the data; the MSE is 96.67 for the training part and 150.44 for the testing part. This is
the graph of the prediction for the testing part using ARIMA(2, 2, 3):
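The (1 - 2B + B^2) operator in the ARIMA(2, 2, 3) model amounts to differencing the series twice. A minimal sketch of that operation is given below; the helper name `difference` is hypothetical and the quadratic input is only a toy example.

```python
def difference(series, order=1):
    """Apply the (1 - B) operator `order` times to a list of values."""
    result = list(series)
    for _ in range(order):
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result


quadratic = [t * t for t in range(6)]
second_diff = difference(quadratic, order=2)
print(second_diff)  # [2, 2, 2, 2]
```

Differencing twice turns a quadratic trend into a constant, which is why the operator removes slowly drifting, non-stationary behavior before the remaining ARMA(2, 3) part is fitted.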
[Figure: Forecast of Google Stock Prices (Testing Part); series: Prediction, Observed; x-axis: Time (Days), y-axis: Stock Price ($)]
On the other hand, when using the ARMA(n, n - 1) modeling strategy, we selected AR(1)
as the adequate model:
Model     RSS      Model     RSS      s    N - γ    F         F Crit. (5%, s, N - γ)
(1,0)     22932    (2,1)     22563    2    249      2.0361    3.0321
(1,0)     22932    (1,1)     22900    1    250      0.3493    3.8789
$X_t - 0.9825X_{t-1} = a_t$
Furthermore, we found AR(1) to be adequate after applying the Portmanteau and the Bartlett
Band tests. The MSE is 91.36 for the training part and 162.61 for the testing part. Notice that
the MSE value for the testing part is higher than the MSE value we obtained for the
ARIMA(2, 2, 3) model.
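For an AR(1) model, the one-step-ahead prediction is simply the previous value multiplied by the AR coefficient, and the MSE is the average squared prediction error. The sketch below assumes a mean-centered series; the function name and the sample values are hypothetical and are not our Google data.

```python
def ar1_one_step_mse(series, phi):
    """MSE of one-step-ahead AR(1) predictions x_hat[t] = phi * x[t-1]."""
    errors = [series[t] - phi * series[t - 1] for t in range(1, len(series))]
    return sum(e * e for e in errors) / len(errors)


# Hypothetical mean-centered values, used only to exercise the function.
centered = [4.0, 3.5, 3.9, 3.0, 2.2, 2.6]
print(ar1_one_step_mse(centered, phi=0.9825))
```

Evaluating the same function on the training part and on the testing part gives the two MSE values that are compared throughout this section.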
Another Approach
[Figure: Daily Google Stock Prices (10/24/11 - 10/24/13) with Google Trend line; x-axis: Time (Days), y-axis: Stock Price ($)]
The models discussed so far are based on the assumption that the mean and covariance are
independent of the time origin; this assumption implies that the mean is constant and the
autocovariance depends only on the lag. If we look at the graph on the previous page that shows
Google’s stock prices, it is evident there is a trend – the behavior of Google’s stock price
depends on the time origin. At this point, our team first tried to remove this non-stationary trend
and model the remaining data.
Our group decomposed the series into two parts. The first part represents a non-stationary
trend by a deterministic function that depends on the time origin. The second part
represents stochastic behavior that can be modeled using ARMA. To model the deterministic
part, we applied linear regression formulae (the least squares estimation (LSE) method) to the
centralized series of the training part and estimated the parameters $b_0$ and $b_1$. Here, the
residuals ($\varepsilon_t$) are the deviations from the trend line.
$Y_t = b_0 + b_1 t + \varepsilon_t$
$b_0 = -47.98$
$b_1 = 0.379$
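The least squares estimates of the trend parameters have a simple closed form. The sketch below assumes the time index runs t = 1, 2, ..., n (the report does not state the origin explicitly), and the function names are hypothetical. It is checked on an exact line built from the estimates above rather than on the real price series.

```python
def fit_linear_trend(values):
    """Least-squares fit of values[t-1] = b0 + b1 * t for t = 1..n."""
    n = len(values)
    times = range(1, n + 1)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1


def detrend(values, b0, b1):
    """Residuals after subtracting the fitted line b0 + b1 * t."""
    return [y - (b0 + b1 * t) for t, y in enumerate(values, start=1)]


line = [-47.98 + 0.379 * t for t in range(1, 253)]
b0, b1 = fit_linear_trend(line)
print(round(b0, 2), round(b1, 3))  # -47.98 0.379
```

The output of `detrend` is the residual series that is then modeled with ARMA.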
[Figure: Residuals after Removing Trend ($\varepsilon_t = Y_t - (-47.98 + 0.379t)$); x-axis: Time (Days), y-axis: Residuals]
After removing the deterministic trend, the residuals now have a constant zero mean. We
can model this stochastic part using ARMA models. Using the ARMA(2n, 2n - 1) modeling
strategy, we found AR(1) to be the adequate model.
$X_t = 0.9727X_{t-1} + a_t$
Model     RSS      Model     RSS      s    N - γ    F         F Crit. (5%, s, N - γ)
(2,1)     22470    (4,3)     21972    4    245      1.3882    2.4085
(1,0)     22777    (2,1)     22470    2    249      1.7010    3.0321
(1,0)     22777    (1,1)     22734    1    251      0.4748    3.8788
The Portmanteau test resulted in a value of Q = 9.184 whereas the critical value was
30.14. Since the Q value is lower than the threshold value, it can be concluded that the $a_t$’s are
uncorrelated. Moreover, the Bartlett Band test confirms this result. Therefore, AR(1) is
adequate for the residuals.
[Figure: Bartlett Band Test; x-axis: Lag (k), y-axis: P(k)]
Finally, combining the deterministic and stochastic parts, the complete model for the Google
stock prices can be written as:
$Y_t = b_0 + b_1 t + X_t$
$Y_t = -47.98 + 0.379t + 0.9727X_{t-1} + a_t$
where $X_t$ is the stationary part that follows AR(1).
The mean squared error for the training set is 90.74, while the mean squared error for the
testing set is 151.8. As seen in the figure below, the trend line does not fit the series of the
testing part well and, consequently, prediction does not perform well enough for the testing part.
[Figure: Forecast of Google Stock Prices; series: Prediction, Observed, Trend; x-axis: Time (Days), y-axis: Stock Price ($)]
If the Google stock price data series is examined carefully, it can be seen that the stock
prices have a tendency to increase once they start increasing as seen in the rounded rectangles
above. From this realization, we tried removing the trend depicted by the green line in the figure
above and modeled the remaining residuals again.
$Y_t = b_0 + b_1 t + \varepsilon_t$
$b_0 = -56.71$
$b_1 = 3.01$
After removing the deterministic trend, the stochastic part is modeled using the ARMA(2n, 2n -
1) modeling strategy. We selected ARMA(2, 1) as the adequate model, and the Portmanteau and
Bartlett Band tests confirmed its adequacy. Here is the model for the residuals:
$X_t = 0.255X_{t-1} + 0.754X_{t-2} + a_t + 0.8078a_{t-1}$
Also, the complete model is:
$Y_t = -56.71 + 3.01t + 0.255X_{t-1} + 0.754X_{t-2} + a_t + 0.8078a_{t-1}$
The complete model has an MSE of 93.4 for the training set and an MSE of 153.23 for the test set.
Finally, our team considered a trend that fits the whole data. The deterministic trend
shown below was calculated by applying regression analysis formulae to the whole dataset (that
is, the dataset containing both the training and testing parts).
$Y_t = b_0 + b_1 t + \varepsilon_t$
$b_0 = -75.9$
$b_1 = 0.6$
[Figure: Forecast of Google Stock Prices; series: Prediction, Observed, Trend; x-axis: Time (Days), y-axis: Stock Price ($)]
Then, AR(1) was selected as the best model for the remaining stochastic part after removing the
trend. Again, we confirmed the adequacy of this model by conducting both the Portmanteau and
Bartlett Band tests. The complete model is:
$Y_t = -75.9 + 0.6t + 0.3372X_{t-1} + a_t$
This model has a mean squared error of 90.52 for the training data and a mean
squared error of 140.89 for the testing data.
Finally, our team tried modeling after removing an exponential trend. The exponential trend
was calculated using Minitab for the whole dataset. The following equation is the exponential
trend for the Google stock price series:
$Y_t = 553.331 \times (1.00101^t) + \varepsilon_t$
The residuals that remained after removing the trend above from the actual data were centralized
and fitted to a model using the ARMA(2n, 2n - 1) modeling strategy. AR(1) was selected as the
adequate model, and the Portmanteau and Bartlett Band tests confirmed its adequacy once more.
This is the model for the residuals:
$X_t = 0.9757X_{t-1} + a_t$
Also, this is the complete model:
$Y_t = 553.331 \times (1.00101^t) + 0.9757X_{t-1} + a_t$
The mean squared error for the training set is 90.93, while the mean squared error for the test set
is 137.58.
Based on these results, it can be concluded that the model with exponential trend fits the data
best since it gives the lowest MSE value for the testing portion.
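One common way to fit an exponential trend of the form a * b^t is ordinary least squares on the logarithm of the series. The sketch below takes that route as an assumption; it is not necessarily the procedure Minitab applies internally, and the function name is hypothetical. It is checked on an exact exponential curve built from the coefficients above, not on the real prices.

```python
import math


def fit_exponential_trend(values):
    """Fit values[t-1] ~ a * b**t (t = 1..n) by least squares on log(values)."""
    n = len(values)
    times = range(1, n + 1)
    logs = [math.log(v) for v in values]
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    intercept = l_mean - slope * t_mean
    return math.exp(intercept), math.exp(slope)


curve = [553.331 * 1.00101 ** t for t in range(1, 505)]
a, b = fit_exponential_trend(curve)
print(round(a, 3), round(b, 5))  # 553.331 1.00101
```

A growth factor of 1.00101 per trading day corresponds to roughly 0.1 percent growth per day in the fitted trend.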
[Figure: Forecast of Google Stock Prices (Testing Part); series: Trend, Prediction, Observed; x-axis: Time (Days), y-axis: Stock Price ($)]
Model                                                                          MSE for Training Part    MSE for Testing Part
ARMA(4,3)                                                                      86.73                    374.68
ARIMA(2,2,3)                                                                   96.67                    150.44
AR(1)                                                                          91.36                    162.61
Deterministic Trend and AR(1)
($Y_t = -47.98 + 0.379t + 0.9727X_{t-1} + a_t$)                                90.74                    151.8
Deterministic Trend and ARMA(2,1)
($Y_t = -56.71 + 3.01t + 0.255X_{t-1} + 0.754X_{t-2} + a_t + 0.8078a_{t-1}$)   93.4                     153.23
Deterministic Trend and AR(1)
($Y_t = -75.9 + 0.6t + 0.3372X_{t-1} + a_t$)                                   90.52                    140.89
Exponential Trend and AR(1)
($Y_t = 553.331 \times (1.00101^t) + 0.9757X_{t-1} + a_t$)                     90.93                    137.58
Part B: Forecasting Apple Stock Prices
Daily (closing) Apple stock price data from October 24, 2011 to October 24, 2013 was
split into two data sets of 252 data points each. Our team labeled the first half of the data the
training set and the second half of the data the test set (recall this is what we did for Google as
well). The training set was used to fit an ARMA model according to the ARMA(2n, 2n - 1)
modeling strategy, starting from the AR(1) model. The centralized dataset containing all 504
Apple stock prices as well as the F-test results are shown below:
Model     RSS      Model     RSS      s    N - γ    F       F Crit. (5%, s, N - γ)
(1,0)     21701    (2,1)     21605    2    249      0.55    3.03
(2,1)     21605    (4,3)     20544    4    245      3.16    2.41
(4,3)     20544    (6,5)     19641    4    241      2.77    2.41
(6,5)     19641    (8,7)     18897    4    237      2.33    2.41
(6,5)     19641    (7,6)     18825    2    239      5.18    3.03
Comparing each consecutive model using the F-test, the ARMA(2, 1) model is not a significant
improvement over the AR(1) model, and thus the AR(1) model was adopted at first. This model is:
$X_t - 0.9933X_{t-1} = a_t$
The Portmanteau test resulted in a Q value of 34.34, while the threshold in this case is 30.14.
Since Q is greater than this threshold, it can be concluded that the $a_t$’s are correlated.
Furthermore, the Bartlett band test up to a time lag of 20 shows an autocorrelation that is too
high at a time lag of 10. Based on this observation, the AR(1) model does not fit the data well;
hence, the ARMA(2n, 2n - 1) modeling strategy was continued. This finally yielded an ARMA(7, 6)
model. This model is:
$X_t - 1.222X_{t-1} - 0.08271X_{t-2} - 0.0295X_{t-3} + 0.01842X_{t-4} - 0.03697X_{t-5} + 0.9984X_{t-6} - 0.6448X_{t-7}$
$= a_{t-2} - 0.4489a_{t-3} - 0.4088a_{t-4} - 0.2871a_{t-5} + 0.891a_{t-6}$
The Portmanteau test resulted in a Q value of 12.81, while the threshold in this case is 14.07.
Since Q is less than this threshold, it can be concluded that the $a_t$’s are uncorrelated. The
Bartlett band test up to time lag 20 confirms this conclusion, as all time lags have an
autocorrelation less than 2/sqrt(252) ≈ 0.126 in absolute value. The ARMA(7, 6) model fits the data well.
Next, we calculated the values of lambda. There are seven total roots:
λ                      |λ|       Period
-0.8210 ± 0.4919i      0.9571    2.48
0.0449 ± 0.9414i       0.9425    4.12
0.9767                 0.9767
0.8990 ± 0.0565i       0.9007    13.86
One of these values is close to one, so a stochastic constant trend exists. The three pairs of
complex roots are each close to one in absolute value, which represents seasonality with periods
of 2.48, 4.12, and 13.86 respectively.
The obtained model was evaluated by calculating one-step ahead predictions for all the
data in the test set using the model we built based on the training set. The predictions and actual
stock prices of the second year are pictured below.
The mean squared error for the training set was 75.00, while the mean squared error for the test
set was much higher: 120.23. This might imply that the model overfits the data.
Our last effort was to remove the trend and seasonality from the data. Using the roots we
obtained earlier, we formed the following time series model:
$w_t = (1 - B)(1 + 1.6421B + B^2)(1 - 0.0898B + B^2)(1 - 1.7979B + B^2)X_t$
$= a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \theta_3 a_{t-3} - \theta_4 a_{t-4} - \theta_5 a_{t-5} - \theta_6 a_{t-6}$
Hence, $w_t$ is an MA(6) model, and by fitting this model using MATLAB, the following
parameters were placed in the above equation:
$w_t = a_t - 0.1757a_{t-1} + 0.03724a_{t-2} - 0.2669a_{t-3} + 0.04324a_{t-4} - 0.1618a_{t-5} + 0.9611a_{t-6}$
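To apply the trend and seasonality operator to a series, the four factors can first be multiplied out into a single polynomial in B and then applied as a filter. The sketch below shows one way to do this; the helper names are hypothetical, and our team performed the actual fitting in MATLAB.

```python
def poly_multiply(p, q):
    """Multiply two polynomials in B given as coefficient lists [c0, c1, ...]."""
    product = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b
    return product


def apply_operator(coeffs, series):
    """Compute w[t] = sum_k coeffs[k] * series[t - k] wherever all lags exist."""
    order = len(coeffs) - 1
    return [
        sum(c * series[t - k] for k, c in enumerate(coeffs))
        for t in range(order, len(series))
    ]


factors = [
    [1.0, -1.0],          # (1 - B)
    [1.0, 1.6421, 1.0],   # (1 + 1.6421B + B^2)
    [1.0, -0.0898, 1.0],  # (1 - 0.0898B + B^2)
    [1.0, -1.7979, 1.0],  # (1 - 1.7979B + B^2)
]
operator = [1.0]
for factor in factors:
    operator = poly_multiply(operator, factor)
print(len(operator) - 1)  # 7
```

The expanded operator has degree 7, so the first seven observations are consumed before the first value of the filtered series is available.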
This model was again evaluated using one-step ahead predictions for the test set. The
predictions, after converting them back to Xt , and actual stock prices during the second year
(that is, the testing portion of the data) are pictured below.
The mean squared error (MSE) for the training set was 91.39, which is higher than the MSE of
75.00 for the ARMA(7, 6) model obtained earlier. However, the MSE for the test set was 109.18,
which is much closer to the training MSE and is in fact less than the MSE of 120.23 for the
previous model. Consequently, it seems that the ARMA(7, 6) model overfits the data,
especially compared to the MA(6) model, whose test MSE stays closer to its training MSE. Based on
these results, it can be concluded that the best model is the MA(6) model applied to the data
where trend and seasonality are removed.
Part C: Forecasting Yahoo Stock Prices
Finally, as our group did for Google and Apple in the previous two parts of this section,
we collected 504 daily Yahoo closing stock prices (from October 24, 2011 through October 24,
2013), all of which served to construct our entire dataset. Noting again that there are 252 trading
days in a year, we split the dataset into two series of equal length for purposes of our analysis.
Exhibit 1 below shows the directionality of the Yahoo stock prices.
Exhibit 1: Yahoo Stock Prices
[Figure: Daily Yahoo Stock Prices (10/24/11 - 10/24/13), with the first 252 trading days marked; x-axis: Trading Day Since 10/24/2011, y-axis: Stock Price ($)]
The first 252 data points correspond to the training portion, wherein the data is used to fit a
forecast model. The latter 252 data points correspond to the testing portion, wherein the model is
used to forecast actual stock prices.
Model Selection Method
Our team used the F-test approach for ARMA(2n, 2n -1) forecast model selection with
MATLAB. The table shown below shows the data output:
Table 1: Selection Details
ARMA Model   RSS      ARMA Model   RSS      s    N-γ    F        F Crit. (0.05,s,N-γ)
(2,1)        11.76    (4,3)        10.91    4    245    4.79     2.41
(4,3)        10.91    (6,5)        10.08    4    241    4.99     2.41
(6,5)        10.08    (8,7)        9.03     4    237    6.89     2.41
(8,7)        9.03     (10,9)       9.58     4    233    -3.35    2.41
(8,7)        9.03     (9,8)        8.49     2    235    7.41     3.03
As Table 1 shows, we decided to make one further comparison between the ARMA(8, 7) and
ARMA(9, 8) models and ultimately chose the latter. To check the validity of the model, our
team conducted the Bartlett Band test at α = 0.05. However, correlation at lag 15 lies outside of
the band, so we concluded that the model is inadequate.
Therefore, we conducted the same test with ARMA(8, 7), but the correlation at lag 12
also lies outside of the band. Instead of trying the ARMA(6, 5) model, we considered the
ARMA(7, 6) model, given that the ARMA(7, 6) model (with respect to the ARMA(6, 5) model)
has a non-trivial F value of 6.24, which is greater than the threshold of 3.03. The Bartlett Band
test is illustrated in Exhibit 2 on the next page:
Exhibit 2: Bartlett Band for ARMA(7, 6)
Clearly, all correlations lie inside the bands, suggesting the model is adequate. The model
passed the Portmanteau test at a maximum lag of K = 20 with $Q = 10.255 < 30.144 = \chi^2_{0.95}(19)$.
The equation for the ARMA(7, 6) model is shown below:
$(1 - 0.7766B - 0.1345B^2 - 0.1987B^3 + 0.3479B^4 + 0.07447B^5 + 0.5085B^6 - 0.5369B^7)X_t$
$= (1 + 0.1624B - 0.04794B^2 - 0.2524B^3 + 0.1669B^4 + 0.1504B^5 + 0.8714B^6)a_t$
This equation has the following characteristic roots:
Table 2: Characteristic Roots
Root   Real      Imaginary   r       Φ1       ω       Period
λ1     -0.781    0.497       0.925   -1.561   2.575   2.440
λ2     -0.781    -0.497      0.925   -1.561   2.575   2.440
λ3     -0.099    0.890       0.896   -0.198   1.682   3.736
λ4     -0.099    -0.890      0.896   -0.198   1.682   3.736
λ5     0.856     0.465       0.974   1.711    0.498   12.624
λ6     0.856     -0.465      0.974   1.711    0.498   12.624
λ7     0.825     0.000       0.825
[Figure (Exhibit 2): Bartlett Band; x-axis: Lag, y-axis: Correlations]
The periods listed in Table 2 show the periodic nature of the model, with the largest
period being 12.624. Smaller periods of 3.736 and 2.440 are also present in the model.
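The r, ω, and Period columns of Table 2 can be related to each complex root as follows: r is the modulus, ω is the angle of the root in radians, and the period is 2π/ω. The sketch below assumes this relationship (it reproduces the first row of Table 2 to the printed precision); the function name is hypothetical.

```python
import cmath
import math


def root_summary(root):
    """Return (modulus, angular frequency, period) of a characteristic root."""
    modulus = abs(root)
    omega = abs(cmath.phase(root))
    period = 2 * math.pi / omega if omega > 0 else math.inf
    return modulus, omega, period


r, omega, period = root_summary(complex(-0.781, 0.497))
print(round(omega, 3), round(period, 2))  # 2.575 2.44
```

A positive real root such as λ7 has ω = 0 and therefore no finite period, which is why its Period entry is blank in the table.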
Results
The model replicated the actual observations during the first 252 days quite accurately with an
MSE of 0.038 (for training data) as shown in Exhibit 3 below:
Exhibit 3: Training Graphs
On the other hand, the 252-day, one-step-ahead forecast was less accurate with a mean squared
error of 2.209 as shown on the next page in Exhibit 4:
[Figure: Yahoo Training Data: Prediction vs. Actual (MSE = 0.038); x-axis: Trading Day Since 10/24/2011, y-axis: Stock Price ($)]
Exhibit 4: Test Graphs
Note that the prediction diverges downward as time goes on. This is due to the fact that the
model was derived from constant trend stock prices of the first 252 days.
[Figure: Yahoo Testing Data: Prediction vs. Actual (MSE = 2.209); x-axis: Trading Day Since 10/24/2011, y-axis: Yahoo Stock Price (US$)]
IV. ANOTHER APPROACH: FORECASTING STOCK PRICES USING BROWNIAN MOTION
Researchers have proposed several mathematical models to forecast future stock prices.
For our analysis, our team examined an additional tool besides ARMA modeling to forecast
Yahoo, Apple, and Google stock prices – Brownian motion. For a little background, Brownian
motion, also known as the Wiener process, was first introduced by Robert Brown to describe the
motion exhibited by particles immersed in a gas or liquid. This process also describes the stock
price movements although Benoit Mandelbrot, a mathematician, rejected its applicability.
Brownian motion can be formulated in terms of a random walk: $B_t = W_t$, where
$W_t$ is a random walk process (the drift enters through the stock price equation below). In
addition, $W_t$ can be written as $W_t = Z\sqrt{t}$, where $Z$ is normally
distributed with zero mean and a standard deviation of one. As a result, $E[dB] = 0$ and $E[dB^2] = dt$.
Brownian motion has the following properties:
1) Continuity: B(t) is a continuous-time process.
2) Markov property: B(t) only depends on the previous value.
3) Martingale property: $E(B_{n+1} \mid B_n, \ldots, B_1) = B_n$.
The stochastic behavior of a stock price $S_t$ follows the geometric Brownian motion process and
can be written as $dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$. The solution to this stochastic
differential equation can be found by applying the famous Itô formula. It follows that
$S_t = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right)$.
The stock volatility measures the stability of the stock price and it can be computed by the
following equations:
$\sigma = \sqrt{\frac{1}{n-1}\sum_{t=1}^{n}(u_t - \bar{u})^2}$
$u_t = \ln\left(\frac{S_t}{S_{t-1}}\right)$
The expected annual rate of return or drift is denoted by $\mu$ and is obtained from the mean log
return $\bar{u}$ and the variance term $\sigma^2/2$. According to the previous results and
assumptions, the expected value $E(S_t)$ of the stock price at future time $t$ is given by
$E(S_t) = S_0 \exp(yt)$, where $y$ combines $\mu$ and $\sigma^2/2$.
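The volatility estimate and the geometric Brownian motion solution can be sketched in a few lines. The code below is an illustration under assumptions: it uses the standard solution S_t = S_0 exp((μ - σ²/2)t + σW_t) on a daily grid, the parameter values are hypothetical, and the function names are ours for this example rather than those of our MATLAB implementation.

```python
import math
import random


def log_returns(prices):
    """u_t = ln(S_t / S_{t-1}) for consecutive prices."""
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]


def sample_volatility(returns):
    """Sample standard deviation of the log returns (divisor n - 1)."""
    n = len(returns)
    mean = sum(returns) / n
    return math.sqrt(sum((u - mean) ** 2 for u in returns) / (n - 1))


def simulate_gbm_path(s0, mu, sigma, n_steps, rng, dt=1.0):
    """One path of S_t = S_0 exp((mu - sigma^2/2) t + sigma W_t)."""
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        growth = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(growth))
    return path


rng = random.Random(42)
path = simulate_gbm_path(s0=100.0, mu=0.001, sigma=0.02, n_steps=252, rng=rng)
print(len(path), round(sample_volatility(log_returns(path)), 4))
```

Setting sigma to zero removes the random part, so the path reduces to the deterministic growth curve S_0 exp(μt).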
The Computational Results
In this section, our team discusses how we forecasted the future stock prices of the three
companies using three approaches applied to the same model, $E(S_t) = S_0 \exp(yt)$. We used
the historical data to obtain $y$ (computed from $\mu$ and $\sigma^2/2$) by either measuring the
daily return or the total return, depending on the forecasting approach. In the first approach
(denoted by “Forecast 1” in the graphs that complement this section), we forecasted future stock
prices by using n-step ahead prediction and found that the prediction was satisfactory up to
about 20 days ahead. After that point, our model did not capture the actual stock price
movements. For our second approach (denoted by “Forecast 2” in the associated graphs), we
proposed a different way to update our forecast so that we could capture the change in the
volatility of the stock market. The volatility in general is not constant and changes due to many
factors. We updated $y$ (the stock volatility term plus the rate of return) for every period in
the expected value equation of the stock price. The updating process started by observing the
oscillation in the stock price movement in the testing data and measuring its $y_t$ value. Then,
we predicted the one step-ahead stock price and periodically updated it with the real value. The
forecast from this approach was better, and its time series plot is somewhat similar to the
actual data.
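The difference between the two forecasting styles can be illustrated with a simplified sketch. It assumes a constant growth rate y, whereas our second approach also updated y every period; the prices, the value of y, and the function names are hypothetical.

```python
import math


def n_step_forecast(s0, y, n_steps):
    """Forecast 1 style: E(S_t) = s0 * exp(y * t) from a fixed starting price."""
    return [s0 * math.exp(y * t) for t in range(1, n_steps + 1)]


def one_step_forecasts(actual, y):
    """Forecast 2 style: predict each day from the previous observed price."""
    return [actual[t - 1] * math.exp(y) for t in range(1, len(actual))]


def mse(predicted, observed):
    """Mean squared error between two equally long sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical prices and growth rate, for illustration only.
observed = [100.0, 101.0, 99.5, 102.0, 103.5]
y = 0.002
print(mse(n_step_forecast(observed[0], y, 4), observed[1:]))
print(mse(one_step_forecasts(observed, y), observed[1:]))
```

Because the one-step version restarts from the latest observed price each day, its errors do not accumulate over the horizon the way the n-step errors can.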
Finally, our group used MATLAB to implement the Brownian motion forecasting
algorithm, and we then computed the future stock prices of the three companies. The Brownian
motion simulation produced more than 1,000 random walk paths of the stock price movements; one
of them was selected at random and compared with the testing data. We observed that our
Brownian motion model is an accurate predictor if the prediction horizon is less than a month.
Brownian motion outcomes deviate slightly from the testing data when the prediction horizon is
more than a month, and that deviation could come from the volatility of the stock price or the
market rate of return. The simulation generates different paths, and the selected path is not
guaranteed to be the best one. The future path of the stock price should be selected with care
and updated with any information available about the stock price behavior, similar to what was
done in the first two approaches.
[Figures: Brownian motion forecasts for the three companies over the testing part (recovered panel titles: Apple, Yahoo); series: Real, Forecast 1, Forecast 2, Forecast 3]
As a reminder, in the above graphs, “Forecast 1” represents n-step ahead prediction,
“Forecast 2” represents one-step ahead prediction, and “Forecast 3” shows the randomly selected
Brownian motion path. From these graphs, we can see that Forecast 3 has the least mean
squared error because its deviation from the real line, which is shown in blue, is the smallest. In
fact, you can barely see the blue line because the Brownian motion outcomes mimic the real data
so well!
V. DISCUSSION & SIGNIFICANCE OF THE RESULTS
In our analysis above, we assumed that the daily volatility and expected return of the first
year are equal to the daily volatility and expected return of the second year so that we would
have the same oscillation in the stock price movement. When we used the average sigma and mu, we
found out we could not carry out the calculations because in the expectation, we are ignoring the
random walk part in the sigma values. In other words, we are assuming sigma and mu are
constant over all days. Assuming the sigma and mu for each day in the training data is equal to
the sigma and mu in the testing data is better because we can capture all the movement in the
stock prices. Referring back to the graphs in the Brownian motion section of our report, a
constant mu and sigma will resemble the red line because if you take the expectation of mu and
sigma and just change the time, the outcome will always be an increasing or decreasing function.
Which Model Is Best for Each Company?
We found the MSE of each model for each company on the testing data. For Google, the lowest
MSE among the ARMA-based models was 137.58, whereas the MSE for Brownian motion was 285.334.
Therefore, for Google, we should use the AR(1) model with exponential trend. For Apple, the
MA(6) model produced the lowest MSE value of 109.18, whereas the MSE for Brownian motion
was 154.775. Therefore, for Apple, we should use the MA(6) model. Lastly, for Yahoo,
the MSE of the ARMA(7, 6) model was 2.209, whereas the MSE for Brownian motion was 0.327. In this
case, we conclude that Brownian motion is the best option for Yahoo.
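The selection rule used here is simply to pick, for each company, the model with the lowest testing MSE. A small sketch with the values reported in this section is shown below; the dictionary layout and variable names are only illustrative.

```python
# Testing-set MSE values as reported in this section.
test_mse = {
    "Google": {"Exponential trend + AR(1)": 137.58, "Brownian motion": 285.334},
    "Apple": {"MA(6)": 109.18, "Brownian motion": 154.7751064},
    "Yahoo": {"ARMA(7,6)": 2.209, "Brownian motion": 0.327270663},
}

# For each company, keep the model name whose testing MSE is smallest.
best_model = {
    company: min(models, key=models.get) for company, models in test_mse.items()
}
print(best_model)
```

Note that MSE values are only comparable within one company, since the three stocks trade at very different price levels.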
Significance of Our Results
We can use the models we developed for this project to predict future stock prices, and
we found that Yahoo has the lowest MSE, meaning that it is easier to predict Yahoo stock prices
for the existent conditions. This shows that some stocks are easier to predict than others – you
can make money more easily when a company’s stock price is more predictable. In other words,
if a company experiences the same (stable) conditions for the relevant time horizon (two years in
our case), we are confident we can predict the stock prices in the third year assuming the same
stable conditions hold.
An important conclusion we reached as a result of this project is that although there are
many modeling techniques available for stock price datasets, there is not one method that gives
the best results for each and every time series. Each time series has its own unique behavior that
presents many modeling challenges; these challenges need to be thoroughly examined before
selecting the best modeling technique that can be used to predict future patterns.
VI. CONCLUSION
Based on our results, there was not much volatility for Yahoo, whereas with Google and
Apple, there was much more oscillation. Based on the Brownian motion analysis, it was easier
to capture the stock fluctuations for Yahoo than for Google. The variance for Apple is very high
– one reason for this variability might be that Apple did something very different as a company
in 2011 compared to its operations in 2012. Overall, our results for the three stocks are very
different. We have very different MSEs for the three individual companies, and we noted that
Yahoo’s stock price is much easier to predict. In conclusion, our group enjoyed
working on this project together and having the opportunity to implement new procedures such
as deterministic trend, exponential trend, and Brownian motion to help us achieve our objective
of finding the best models for each company’s stock price.
VII. REFERENCES
[1] Baxter, Martin, and Andrew Rennie. Financial Calculus: An Introduction to
Derivative Pricing. Cambridge: Cambridge UP, 1996. Print.
[2] Beichelt, Frank. Stochastic Processes in Science, Engineering, and Finance.
Boca Raton: Chapman & Hall/CRC, 2006. Print.
[3] Fama, E. “Random Walks in Stock Market Prices.” Financial Analysts
Journal, Vol. 51 (1): 1965. 1-6.
[4] Ladde, G.S. and L. Wu. “Development of Modified Geometric Brownian
Motion Models by using Stock Price Data and Basic Statistics”, Vol. 71 (12):
15 Dec. 2009.
[5] Mun, Johnathan. Applied Risk Analysis: Moving beyond Uncertainty in
Business. Hoboken, NJ: Wiley, 2004. Print.
[6] Pandit, Sudhakar M., and Shien-Ming Wu. Time Series and System Analysis
with Applications. Malabar, FL: Krieger Pub., 2001. Print.
[7] Ross, Sheldon M. An Elementary Introduction to Mathematical Finance.
New York: Cambridge UP, 2011. 38-39. Print.
[8] Ross, Stephen A., Randolph Westerfield, and Bradford D. Jordan.
Fundamentals of Corporate Finance. 10th ed. New York, NY:
McGraw-Hill/Irwin, 2013. 401-02. Print.