MODELING & FORECASTING STOCK PRICES November 26, 2013 IOE 565: Team #6 Abdullah Alshelahi Caner Arslan Wouter Hielckert Colin Jones Steve Kim

TABLE OF CONTENTS

I. Introduction

II. How We Collected Our Data

III. System Analysis: Procedures & Results

Part A: Forecasting Google Stock Prices

Part B: Forecasting Apple Stock Prices

Part C: Forecasting Yahoo Stock Prices

IV. Another Approach: Forecasting Stock Prices Using Brownian Motion

V. Discussion & Significance of the Results

VI. Conclusion

VII. References

I. INTRODUCTION

Predicting future stock prices has been a compelling topic for quite some time, as having

an accurate vision of the stock market’s future performance can help traders invest more suitably

to maximize financial profit. Perhaps the most common method used to perform stock price

forecasting is time series analysis. As we learned in class, time series analysis is a type of

statistical study on a series of sequential data points over a period of time, where the data points

are usually measured at uniform time intervals. Time series forecasting, then, takes the analysis

from the time series data and attempts to predict what the data will be in the near future, based on

what it has been in the past. This concept is especially important in the field of quantitative

finance because traders want to make wise moves at the right times to maximize their own

welfare. However, there are many factors that influence the fluctuation of the stock market, so

creating an accurate forecast based on time series analysis alone is challenging.

For this project, our team chose to model and forecast the stock prices of three

companies: Google, Apple, and Yahoo. We selected these specific companies because they all

operate in a similar “branch” of business – information technology. As most people know,

Google Inc. is a corporation that specializes in a variety of Internet-related products and services.

Similarly, Yahoo! Inc. is an Internet corporation that is globally known for its impressive range

of services. Finally, Apple Inc. is a corporation that designs, manufactures, and sells computer

software and personal computers. These three companies are very popular in the United States

today, and there is a large amount of information about them available online.

Our team’s main goal for this project was to find good models for the stock prices of the

three companies described above to predict the future stock price. However, we also wanted to

implement a couple of “new” approaches to modeling that were not explicitly discussed in class

throughout the semester. As you will see, our results indicate that there is not one method that

gives the best result for each and every time series.

II. HOW WE COLLECTED OUR DATA

One of the first things our team needed to do was figure out how we were going to collect

the stock price data for Google, Apple, and Yahoo. Also, we had to agree on a fixed time

horizon we would consider when performing our analysis. After some discussion, we agreed to

analyze a two-year time horizon of daily stock prices from October 24, 2011 to October 24, 2013

for each of the three companies.

Once we finished that step, we knew we had to split the data for each company into parts.

Based on the discussions we had in class during the early stages of this project, we learned that

data partitioning is a necessary step in many predictive exercises – the basic idea is to separate

the entire dataset into a training set and a testing (or validation) set. Why do we need to split the data into two parts? As we learned in class, a model is fitted to the “seen” (training) data and then evaluated on the “unseen” (testing) data. If the model also predicts the unseen data well, then we have some level of confidence about its predictive power. Therefore, to partition the stock price data for each of the three companies, our group decided to use the first half of the samples, in chronological order, for the training set and the remaining, more recent 50 percent of the samples for the testing set.

Next, our team extracted daily stock prices for Google, Apple, and Yahoo from the

Yahoo finance website: http://finance.yahoo.com/. For each company, the entire dataset consists

of 504 observations (prices). As discussed in the introduction, our team’s main goal was to find

the best model possible for predicting each company’s stock price. So, based on the “training-

testing” method described above, we used the first 252 observations from each company’s whole

dataset to form the training set for each company. Similarly, we used the final 252 observations

from the entire dataset of each company to construct the testing set for each company.
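The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `chronological_split` and the synthetic `closing_prices` list are assumptions made here, not part of the original analysis, which used the actual Yahoo Finance closing prices.

```python
def chronological_split(prices, n_train):
    """Split a time-ordered list into a training part and a testing part."""
    if not 0 < n_train < len(prices):
        raise ValueError("n_train must leave both parts non-empty")
    # The earliest n_train observations train the model; the rest test it.
    return prices[:n_train], prices[n_train:]


# Hypothetical stand-in for 504 daily closing prices (not real data).
closing_prices = [100.0 + 0.1 * day for day in range(504)]
training_set, testing_set = chronological_split(closing_prices, 252)
print(len(training_set), len(testing_set))
```

Keeping the split chronological, rather than shuffling, matters for time series: the testing part must lie strictly after the training part so that the forecast never sees future prices.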

For Google, Apple, and Yahoo, the entire dataset for each company contains the open,

high, low, close, and adjusted close stock prices on every trading day from October 24, 2011 to

October 24, 2013. The datasets also contain trading volume values on every trading day. To

achieve consistency, we used the close prices as a general measure of the stock price. By

definition, the closing price of a stock is the final price at which that stock is traded on a given

trading day. It represents the most up-to-date valuation of the stock until trading begins again on

the next trading day. In other words, the stock prices we used in our analysis represent the

closing prices.

Since trading does not take place on weekends or market holidays, the first 252 observations happen to run from

October 24, 2011 to October 22, 2012 and the final 252 prices run from October 23, 2012 to

October 24, 2013 for each company. As the next section shows, quite a bit of analysis was done

on each company’s training set. In addition, the models obtained for each company are quite

different, reflecting the fact that our group chose to evenly divide the work among ourselves.

That is, we decided to split the work for this project among the group by each company. Our

reasoning for this is that we wanted “a different mind” working on each company so all the

results would not be repetitive. We were hoping to obtain a fresh perspective on the appropriate

model for each company, and we were also trying to be efficient when conducting our analysis.

III. SYSTEM ANALYSIS: PROCEDURES & RESULTS

Part A: Forecasting Google Stock Prices

As our team just established, we decided to split the stock price data for Google into two

parts. The first part – the first 252 prices – is used for training our model and the second part –

the last 252 prices – is used for testing our model. The whole dataset (504 observations) for

Google is shown here:

[Figure: Daily Google Stock Prices (10/24/11 - 10/24/13). Horizontal axis: Time (Days); vertical axis: Stock Price ($). Series: Google.]

Using the ARMA(2n, 2n - 1) modeling strategy, we determined ARMA(4, 3) to be the

adequate model for the training dataset. The F-tests were computed as follows:

Smaller model   RSS     Larger model   RSS     s   N - γ   F             FINV(5%, s, N - γ)
ARMA(2,1)       22563   ARMA(4,3)      21510   4   245     2.998430962   2.40848837
ARMA(4,3)       21510   ARMA(6,5)      21028   4   241     1.381039566   2.409100382
ARMA(6,5)       21028   ARMA(8,7)      20824   4   237     0.580436035   2.409733235
ARMA(4,3)       21510   ARMA(5,4)      21249   2   243     1.492376112   3.032969422
ARMA(3,2)       22561   ARMA(4,3)      21510   2   245     5.985471874   3.032662958
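The F values in the table are consistent with the usual nested-model F-test, F = ((A1 - A0) / s) / (A0 / (N - γ)), where A1 is the RSS of the smaller model and A0 is the RSS of the larger one. The following is a minimal sketch of that calculation; the function name `f_statistic` is an assumption made here, and the critical value is copied from the table rather than computed.

```python
def f_statistic(rss_smaller, rss_larger, s, dof):
    """Nested-model F statistic.

    rss_smaller: RSS of the model with fewer parameters (A1)
    rss_larger:  RSS of the model with more parameters (A0)
    s:           number of additional parameters in the larger model
    dof:         N - gamma, the residual degrees of freedom of the larger model
    """
    return ((rss_smaller - rss_larger) / s) / (rss_larger / dof)


# First row of the table: ARMA(2,1) against ARMA(4,3).
f_value = f_statistic(22563, 21510, s=4, dof=245)
critical_value = 2.40848837  # FINV(5%, 4, 245), copied from the table
print(f_value, f_value > critical_value)
```

When F exceeds the critical value, the larger model reduces the residual sum of squares significantly and is preferred; otherwise the smaller model is kept.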

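These are the lambda values (characteristic roots) and the actual model we obtained:

$$\lambda_1 = -0.9233, \quad \lambda_2 = 1.0110 + 0.0472i, \quad \lambda_3 = 1.0110 - 0.0472i, \quad \lambda_4 = 0.9498$$

$$X_t - 2.048X_{t-1} + 0.2009X_{t-2} + 1.746X_{t-3} - 0.8983X_{t-4} = a_t - 1.039a_{t-1} - 0.9175a_{t-2} + 0.9596a_{t-3}$$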

The Portmanteau test resulted in a value of Q = 8.33 whereas the critical value is 30.14.

Since Q is less than the critical value, our team concluded that the at ’s are uncorrelated.
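The two residual checks used throughout this report can be sketched as follows. The report does not state which form of the Portmanteau statistic was used, so this sketch assumes the Box-Pierce form, Q = N times the sum of the squared residual autocorrelations up to the maximum lag; the function names and the simulated residuals are likewise assumptions made for illustration.

```python
import math
import random


def autocorrelations(residuals, max_lag):
    """Sample autocorrelations of the residuals for lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    c0 = sum(c * c for c in centered)  # lag-0 sum of squares (must be non-zero)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def box_pierce_q(residuals, max_lag):
    """Portmanteau statistic in the Box-Pierce form (an assumption here)."""
    n = len(residuals)
    return n * sum(r * r for r in autocorrelations(residuals, max_lag))


def bartlett_band(n):
    """Approximate 95% band for the autocorrelations of white noise."""
    return 2.0 / math.sqrt(n)


# Hypothetical white-noise residuals standing in for the fitted a_t's.
rng = random.Random(0)
residuals = [rng.gauss(0.0, 1.0) for _ in range(252)]
q = box_pierce_q(residuals, max_lag=20)
band = bartlett_band(len(residuals))
outside = [k for k, r in enumerate(autocorrelations(residuals, 20), start=1)
           if abs(r) > band]
print(q, band, outside)
```

A model passes when Q is below the chi-square critical value (30.14 in this report) and no autocorrelation falls outside the band.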

[Figure: Bartlett Band Test. Horizontal axis: Lag (k), 1 to 20; vertical axis: P(k), from -0.15 to 0.15.]

Moreover, the Bartlett Band test confirms this result: notice in the above graph that the sample autocorrelations of the $a_t$'s are all smaller in absolute value than 2/sqrt(252) ≈ 0.126. Therefore, ARMA(4, 3) is the adequate model. The mean squared error (MSE) is 86.73 for the training part and 374.68 for the testing part. This may imply that the model overfits the data. Also, as seen above, the complex pair of lambda values has an absolute value greater than one (about 1.012), so this model is unstable and non-stationary. If a single $a_t$ is injected into the system, the system response can exceed any bound (that is, explode, given sufficient time). As a result, this non-stationarity needs to be eliminated. The complex roots are very close to one; therefore, the operator $(1 - 2B + B^2) = (1 - B)^2$, which differences the series twice, was applied and a parsimonious model was calculated. We arrived at this ARIMA(2, 2, 3) model:

$$(1 - 2B + B^2)(1 + 0.3485B - 0.5651B^2)X_t = (1 - 0.5671B - 0.9258B^2 + 0.575B^3)a_t$$
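To use the ARIMA(2, 2, 3) model for prediction, the two autoregressive factors on the left-hand side have to be multiplied out into a single polynomial in B. The sketch below does this with a small helper, `multiply_polynomials`, which is a name introduced here for illustration; coefficients are listed from the constant term upward.

```python
def multiply_polynomials(p, q):
    """Multiply two polynomials in B given as coefficient lists [c0, c1, ...]."""
    product = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b
    return product


difference_operator = [1.0, -2.0, 1.0]        # 1 - 2B + B^2
stationary_ar_part = [1.0, 0.3485, -0.5651]   # 1 + 0.3485B - 0.5651B^2
combined = multiply_polynomials(difference_operator, stationary_ar_part)
print([round(c, 4) for c in combined])
```

The resulting coefficients multiply X_t, X_{t-1}, ..., X_{t-4} in the expanded difference equation, so moving all lagged terms to the right-hand side gives the one-step-ahead prediction formula.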

After eliminating non-stationarity, the new model gave us a better MSE for the testing

portion of the data; the MSE is 96.67 for the training part and 150.44 for the testing part. This is

the graph of the prediction for the testing part using ARIMA(2, 2, 3):

[Figure: Forecast of Google Stock Prices (Testing Part). Horizontal axis: Time (Days); vertical axis: Stock Price ($). Series: Prediction, Observed.]

On the other hand, when using the ARMA(n, n - 1) modeling strategy, we selected AR(1)

as the adequate model:

Smaller model   RSS     Larger model   RSS     s   N - γ   F             FINV(5%, s, N - γ)
AR(1)           22932   ARMA(2,1)      22563   2   249     2.036098923   3.032064916
AR(1)           22932   ARMA(1,1)      22900   1   250     0.349344978   3.878923701

$$X_t - 0.9825X_{t-1} = a_t$$

Furthermore, we found AR(1) to be adequate after applying the Portmanteau and the Bartlett

Band tests. The MSE is 91.36 for the training part and 162.61 for the testing part. Notice that

the MSE value for the testing part is higher than the MSE value we obtained for the

ARIMA(2, 2, 3) model.
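The MSE figures quoted for the AR(1) model come from one-step-ahead predictions. A minimal sketch of such a calculation is shown below; it assumes that each prediction uses the actual previous observation and that the series is centered by subtracting a fixed mean, neither of which is spelled out in the report. The function name and the sample prices are hypothetical.

```python
def ar1_one_step_mse(series, phi, mean):
    """MSE of one-step-ahead AR(1) predictions on a series centered at `mean`."""
    squared_errors = []
    for t in range(1, len(series)):
        # Predict today's value from yesterday's observed (centered) value.
        predicted = mean + phi * (series[t - 1] - mean)
        squared_errors.append((series[t] - predicted) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Hypothetical prices, not taken from the report's dataset.
prices = [600.0, 604.5, 601.2, 607.8, 610.1, 606.3]
sample_mean = sum(prices) / len(prices)
print(ar1_one_step_mse(prices, phi=0.9825, mean=sample_mean))
```

Applying the same function to the training part and to the testing part (with the parameter and mean fixed from the training part) gives the two MSE values that are compared across models.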

Another Approach

[Figure: Daily Google Stock Prices (10/24/11 - 10/24/13). Horizontal axis: Time (Days); vertical axis: Stock Price ($). Series: Google, Trend.]

The models discussed so far are based on the assumption that the mean and covariance are

independent of the time origin; this assumption implies that the mean is constant and the

autocovariance depends only on the lag. If we look at the graph on the previous page that shows

Google’s stock prices, it is evident there is a trend – the behavior of Google’s stock price

depends on the time origin. At this point, our team first tried to remove this non-stationary trend

and model the remaining data.

Our group decomposed the series into two parts. The first part represents a non-

stationary trend by a deterministic function that depends on the time origin. The second part

represents stochastic behavior that can be modeled using ARMA. To model the deterministic

part, we applied linear regression formulae (the least squares estimation (LSE) method) to the centralized series of the training part and estimated the parameters $\beta_0$ and $\beta_1$. Here, the residuals ($\varepsilon_t$) are the deviations from the trend line.

$$Y_t = \beta_0 + \beta_1 t + \varepsilon_t$$

$$\beta_0 = -47.98, \quad \beta_1 = 0.379$$
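The least squares estimates of the trend parameters can be reproduced with the closed-form formulas for simple linear regression. The sketch below assumes the time index runs t = 1, 2, ..., n, which the report does not state explicitly; the function names are introduced here for illustration, and the example series is synthetic.

```python
def fit_linear_trend(values):
    """Least-squares fit of values[t-1] = b0 + b1 * t for t = 1..n."""
    n = len(values)
    times = range(1, n + 1)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1


def detrend(values, b0, b1):
    """Residuals after subtracting the fitted trend line."""
    return [y - (b0 + b1 * t) for t, y in enumerate(values, start=1)]


# Synthetic series lying exactly on the reported trend line.
series = [-47.98 + 0.379 * t for t in range(1, 253)]
b0, b1 = fit_linear_trend(series)
residuals = detrend(series, b0, b1)
print(round(b0, 2), round(b1, 3))
```

The residuals returned by `detrend` are the series that is subsequently modeled with ARMA.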

[Figure: Residuals after Removing Trend ($\varepsilon_t = Y_t - (-47.98 + 0.379t)$). Horizontal axis: Time (Days); vertical axis: Residuals.]

After removing the deterministic trend, the residuals now have a constant zero mean. We

can model this stochastic part using ARMA models. Using the ARMA(2n, 2n - 1) modeling

strategy, we found AR(1) to be the adequate model.

$$X_t = 0.9727X_{t-1} + a_t$$

Smaller model   RSS     Larger model   RSS     s   N - γ   F            FINV(5%, s, N - γ)
ARMA(2,1)       22470   ARMA(4,3)      21972   4   245     1.38824413   2.40848837
AR(1)           22777   ARMA(2,1)      22470   2   249     1.70100134   3.032064916
AR(1)           22777   ARMA(1,1)      22734   1   251     0.47475147   3.878773587

The Portmanteau test resulted in a value of Q = 9.184 whereas the critical value was

30.14. Since the Q value is lower than the threshold value, it can be concluded that the at ’s are

uncorrelated. Moreover, the Bartlett Band test confirms this result. Therefore, AR(1) is

adequate for the residuals.

[Figure: Bartlett Band Test. Horizontal axis: Lag (k), 1 to 20; vertical axis: P(k), from -0.15 to 0.15.]

Finally, combining the deterministic and stochastic parts, the complete model for the Google stock prices is:

$$Y_t = \beta_0 + \beta_1 t + X_t$$

$$Y_t = -47.98 + 0.379t + 0.9727X_{t-1} + a_t$$

where $X_t$ is the stationary part that follows AR(1).

The mean squared error for the training set is 90.74, while the mean squared error for the

testing set is 151.8. As seen in the figure below, the trend line does not fit the series of the

testing part well and, consequently, prediction does not perform well enough for the testing part.

[Figure: Forecast of Google Stock Prices. Horizontal axis: Time (Days); vertical axis: Stock Price ($). Series: Prediction, Observed, Trend.]

If the Google stock price data series is examined carefully, it can be seen that the stock

prices have a tendency to increase once they start increasing as seen in the rounded rectangles

above. From this realization, we tried removing the trend depicted by the green line in the figure

above and modeled the remaining residuals again.

$$Y_t = \beta_0 + \beta_1 t + \varepsilon_t$$

$$\beta_0 = -56.71, \quad \beta_1 = 3.01$$

After removing the deterministic trend, the stochastic part is modeled using the ARMA(2n, 2n - 1) modeling strategy. We selected ARMA(2, 1) as the adequate model, and the Portmanteau and Bartlett Band tests confirmed its adequacy. Here is the model for the residuals:

$$X_t = 0.255X_{t-1} + 0.754X_{t-2} + a_t + 0.8078a_{t-1}$$

Also, the complete model is:

$$Y_t = -56.71 + 3.01t + 0.255X_{t-1} + 0.754X_{t-2} + a_t + 0.8078a_{t-1}$$

The complete model has an MSE of 93.4 for the training set and an MSE of 153.23 for the test set.

Finally, our team considered a trend that fits the whole data. The deterministic trend

shown below was calculated by applying regression analysis formulae to the whole dataset (that

is, the dataset containing both the training and testing parts).

$$Y_t = \beta_0 + \beta_1 t + \varepsilon_t$$

$$\beta_0 = -75.9, \quad \beta_1 = 0.6$$

[Figure: Forecast of Google Stock Prices. Horizontal axis: Time (Days); vertical axis: Stock Price ($). Series: Prediction, Observed, Trend.]

Then, AR(1) was selected as the best model for the remaining stochastic part after removing the

trend. Again, we confirmed the adequacy of this model by conducting both the Portmanteau and

Bartlett Band tests. The complete model is:

$$Y_t = -75.9 + 0.6t + 0.3372X_{t-1} + a_t$$

This model has a mean squared error of 90.52 for the training data and a mean squared error of 140.89 for the testing data.

Finally, our team tried modeling the series after removing an exponential trend. The exponential trend was calculated using Minitab for the whole dataset. The following equation is the exponential trend for the Google stock price series:

$$Y_t = 553.331 \, (1.00101)^t + \varepsilon_t$$

The residuals that remained after removing the trend above from the actual data were centralized and fitted to a model using the ARMA(2n, 2n - 1) modeling strategy. AR(1) was selected as the adequate model, and the Portmanteau and Bartlett Band tests confirmed its adequacy once more. This is the model for the residuals:

$$X_t = 0.9757X_{t-1} + a_t$$

Also, this is the complete model:

$$Y_t = 553.331 \, (1.00101)^t + 0.9757X_{t-1} + a_t$$

The mean squared error for the training set is 90.93, while the mean squared error for the test set

is 137.58.
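For reference, the exponential trend reported by Minitab can be evaluated and subtracted from a price series as sketched below. The parameter values are the ones quoted above; the function names and the assumption that the time index starts at t = 1 are introduced here for illustration only.

```python
def exponential_trend(t, level=553.331, growth=1.00101):
    """Trend value level * growth**t, using the coefficients quoted above."""
    return level * growth ** t


def remove_exponential_trend(values, first_t=1):
    """Residuals after subtracting the exponential trend from the series."""
    return [y - exponential_trend(t) for t, y in enumerate(values, start=first_t)]


# A growth factor of 1.00101 corresponds to roughly 0.1% growth per trading day.
print(exponential_trend(1), exponential_trend(504))
```

The residuals from `remove_exponential_trend` are the series that was centered and fitted with the AR(1) model.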

Based on these results, it can be concluded that the model with exponential trend fits the data

best since it gives the lowest MSE value for the testing portion.

[Figure: Forecast of Google Stock Prices (Testing Part). Horizontal axis: Time (Days); vertical axis: Stock Price ($). Series: Trend, Prediction, Observed.]

Model                                                                                MSE for Training Part   MSE for Testing Part
ARMA(4,3)                                                                            86.73                   374.68
ARIMA(2,2,3)                                                                         96.67                   150.44
AR(1)                                                                                91.36                   162.61
Deterministic Trend and AR(1)                                                        90.74                   151.8
  ($Y_t = -47.98 + 0.379t + 0.9727X_{t-1} + a_t$)
Deterministic Trend and ARMA(2,1)                                                    93.4                    153.23
  ($Y_t = -56.71 + 3.01t + 0.255X_{t-1} + 0.754X_{t-2} + a_t + 0.8078a_{t-1}$)
Deterministic Trend and AR(1)                                                        90.52                   140.89
  ($Y_t = -75.9 + 0.6t + 0.3372X_{t-1} + a_t$)
Exponential Trend and AR(1)                                                          90.93                   137.58
  ($Y_t = 553.331 \, (1.00101)^t + 0.9757X_{t-1} + a_t$)

Part B: Forecasting Apple Stock Prices

Daily (closing) Apple stock price data from October 24, 2011 to October 24, 2013 was

split into two data sets of 252 data points each. Our team labeled the first half of the data the

training set and the second half of the data the test set (recall this is what we did for Google as

well). The training set was used to fit an ARMA model according to the ARMA(2n, 2n - 1)

modeling strategy, starting from the AR(1) model. The centralized dataset containing all 504

Apple stock prices as well as the F-test results are shown below:

Smaller model   RSS     Larger model   RSS     s   N - γ   F      FINV(5%, s, N - γ)
AR(1)           21701   ARMA(2,1)      21605   2   249     0.55   3.03
ARMA(2,1)       21605   ARMA(4,3)      20544   4   245     3.16   2.41
ARMA(4,3)       20544   ARMA(6,5)      19641   4   241     2.77   2.41
ARMA(6,5)       19641   ARMA(8,7)      18897   4   237     2.33   2.41
ARMA(6,5)       19641   ARMA(7,6)      18825   2   239     5.18   3.03

Comparing each consecutive model using the F-test, the ARMA(2, 1) model does not significantly improve on the AR(1) model, and thus the AR(1) model was initially adopted. This model is:

$$X_t - 0.9933X_{t-1} = a_t$$

The Portmanteau test resulted in a Q value of 34.34, while the threshold in this case is 30.14. Since Q is greater than this threshold, it can be concluded that the $a_t$'s are correlated. Furthermore, the Bartlett band plot for lags up to 20 shows an autocorrelation at lag 10 that is too high. Based on these observations, the AR(1) model does not fit the data well; hence, the ARMA(2n, 2n - 1) modeling strategy was continued. This finally yielded an ARMA(7, 6) model. This model is:

$$X_t - 1.222X_{t-1} - 0.08271X_{t-2} - 0.0295X_{t-3} + 0.01842X_{t-4} - 0.03697X_{t-5} + 0.9984X_{t-6} - 0.6448X_{t-7} = a_{t-2} - 0.4489a_{t-3} - 0.4088a_{t-4} - 0.2871a_{t-5} + 0.891a_{t-6}$$

The Portmanteau test resulted in a Q value of 12.81, while the threshold in this case is 14.07. Since Q is less than this threshold, it can be concluded that the $a_t$'s are uncorrelated. The Bartlett band plot for lags up to 20 confirms this conclusion, as the autocorrelations at all lags are smaller in absolute value than 2/sqrt(252) ≈ 0.126. The ARMA(7, 6) model fits the data well.

Next, we calculated the values of lambda. There are seven total roots:

Lambda               |Lambda|   Period
-0.8210 ± 0.4919i    0.9571     2.48
0.0449 ± 0.9414i     0.9425     4.12
0.9767               0.9767
0.8990 ± 0.0565i     0.9007     13.86

One of these values is close to one, so a stochastic constant trend exists. The three pairs of

complex roots are each close to one in absolute value, which represents seasonality with periods

of 2.48, 4.12, and 13.86 respectively.

The obtained model was evaluated by calculating one-step ahead predictions for all the

data in the test set using the model we built based on the training set. The predictions and actual

stock prices of the second year are pictured below.

The mean squared error for the training set was 75.00, while the mean squared error for the test

set was much higher: 120.23. This might imply that the model overfits the data.

Our last effort was to remove the trend and seasonality from the data. Using the roots we

obtained earlier, we formed the following time series model:

$$w_t = (1 - B)(1 + 1.6421B + B^2)(1 - 0.0898B + B^2)(1 - 1.7979B + B^2)X_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \theta_3 a_{t-3} - \theta_4 a_{t-4} - \theta_5 a_{t-5} - \theta_6 a_{t-6}$$

Hence, $w_t$ follows an MA(6) model, and by fitting this model using MATLAB, the following parameters were obtained for the above equation:

$$w_t = a_t - 0.1757a_{t-1} + 0.03724a_{t-2} - 0.2669a_{t-3} + 0.04324a_{t-4} - 0.1618a_{t-5} + 0.9611a_{t-6}$$

This model was again evaluated using one-step ahead predictions for the test set. The

predictions, after converting them back to Xt , and actual stock prices during the second year

(that is, the testing portion of the data) are pictured below.

The mean squared error (MSE) for the training set was 91.39, which is higher than the MSE of 75.00 for the ARMA(7, 6) model obtained earlier. However, the MSE for the test set, 109.18, was only slightly higher than for the training set and is in fact less than the previous model's test MSE of 120.23. Consequently, it seems that the ARMA(7, 6) model overfits the data, especially compared to the MA(6) model. The MA(6) model does not seem to overfit the data, since the MSE of the test set is only slightly higher than the MSE of the training set. Based on these results, it can be concluded that the best model is the MA(6) model applied to the data where trend and seasonality are removed.

Part C: Forecasting Yahoo Stock Prices

Finally, as our group did for Google and Apple in the previous two parts of this section,

we collected 504 daily Yahoo closing stock prices (from October 24, 2011 through October 24,

2013), all of which served to construct our entire dataset. Noting again that there are 252 trading

days in a year, we split the dataset into two series of equal length for purposes of our analysis.

Exhibit 1 below shows the directionality of the Yahoo stock prices.

Exhibit 1: Yahoo Stock Prices

[Figure: Daily Yahoo Stock Prices (10/24/11 - 10/24/13). Horizontal axis: Trading Day Since 10/24/2011; vertical axis: Stock Price ($). Legend: First 252 Trading Days.]

The first 252 data points correspond to the training portion, wherein the data is used to fit a

forecast model. The latter 252 data points correspond to the testing portion, wherein the model is

used to forecast actual stock prices.

Model Selection Method

Our team used the F-test approach for ARMA(2n, 2n -1) forecast model selection with

MATLAB. The table shown below shows the data output:

Table 1: Selection Details

ARMA Model   RSS     ARMA Model   RSS     s   N - γ   F       F Crit. (0.05, s, N - γ)
(2,1)        11.76   (4,3)        10.91   4   245     4.79    2.41
(4,3)        10.91   (6,5)        10.08   4   241     4.99    2.41
(6,5)        10.08   (8,7)        9.03    4   237     6.89    2.41
(8,7)        9.03    (10,9)       9.58    4   233     -3.35   2.41
(8,7)        9.03    (9,8)        8.49    2   235     7.41    3.03

As Table 1 shows, we decided to make one further comparison between the ARMA(8, 7) and

ARMA(9, 8) models and ultimately chose the latter. To check the validity of the model, our

team conducted the Bartlett Band test at α = 0.05. However, correlation at lag 15 lies outside of

the band, so we concluded that the model is inadequate.

Therefore, we conducted the same test with ARMA(8, 7), but the correlation at lag 12

also lies outside of the band. Instead of trying the ARMA(6, 5) model, we considered the

ARMA(7, 6) model, given that the ARMA(7, 6) model (with respect to the ARMA(6, 5) model)

has a significant F value of 6.24, which is greater than the threshold of 3.03. The Bartlett Band test is illustrated in Exhibit 2:

Exhibit 2: Bartlett Band for ARMA(7, 6)

Clearly, all correlations lie inside the bands, suggesting the model is adequate. The model

passed the Portmanteau test at a maximum lag of K = 20 with $Q = 10.255 < 30.144 = \chi^2(0.95, 19)$. The equation for the ARMA(7, 6) model is shown below:

$$(1 - 0.7766B - 0.1345B^2 - 0.1987B^3 + 0.3479B^4 + 0.07447B^5 + 0.5085B^6 - 0.5369B^7)X_t = (1 + 0.1624B - 0.04794B^2 - 0.2524B^3 + 0.1669B^4 + 0.1504B^5 + 0.8714B^6)a_t$$

This equation has the following characteristic roots:

Table 2: Characteristic Roots

Root   Real     Imaginary   r       Φ1       ω       Period
λ1     -0.781   0.497       0.925   -1.561   2.575   2.440
λ2     -0.781   -0.497      0.925   -1.561   2.575   2.440
λ3     -0.099   0.890       0.896   -0.198   1.682   3.736
λ4     -0.099   -0.890      0.896   -0.198   1.682   3.736
λ5     0.856    0.465       0.974   1.711    0.498   12.624
λ6     0.856    -0.465      0.974   1.711    0.498   12.624
λ7     0.825    0.000       0.825
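The r, ω, and Period columns of Table 2 follow from writing each complex root in polar form: r is the modulus, ω is the angle in radians, and the period is 2π/ω. The short sketch below reproduces the first row; the function name `describe_root` is an assumption made here, and small differences from the table are expected because the table's real and imaginary parts are rounded.

```python
import cmath
import math


def describe_root(root):
    """Return (modulus, angle in radians, period) for a characteristic root."""
    r = abs(root)
    omega = abs(cmath.phase(root))          # angle of the root, in [0, pi]
    period = 2 * math.pi / omega if omega > 0 else None  # real positive root: no cycle
    return r, omega, period


r, omega, period = describe_root(complex(-0.781, 0.497))
print(round(r, 3), round(omega, 3), round(period, 3))
```

A modulus close to one means the corresponding cycle decays slowly, which is why the 12.624-day period associated with the roots of modulus 0.974 is the most persistent component.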

[Figure for Exhibit 2: Bartlett Band. Horizontal axis: Lag, 0 to 20; vertical axis: Correlations, from -0.15 to 0.15.]

The periods in Table 2 show the periodic nature of the model, with the largest period being 12.624. Smaller periods of 3.736 and 2.440 are also present in the model.

Results

The model replicated the actual observations during the first 252 days quite accurately with an

MSE of 0.038 (for training data) as shown in Exhibit 3 below:

Exhibit 3: Training Graphs

[Figure for Exhibit 3: Yahoo Training Data: Prediction vs. Actual. Horizontal axis: Trading Day Since 10/24/2011; vertical axis: Stock Price ($). Series: Prediction, Actual. MSE = 0.038.]

On the other hand, the 252-day, one-step-ahead forecast was less accurate, with a mean squared error of 2.209, as shown in Exhibit 4:

Exhibit 4: Test Graphs

Note that the prediction diverges downward as time goes on. This is due to the fact that the

model was derived from constant trend stock prices of the first 252 days.

[Figure for Exhibit 4: Yahoo Testing Data: Prediction vs. Actual. Horizontal axis: Trading Day Since 10/24/2011 (253 to 503); vertical axis: Yahoo Stock Price (US$). Series: Prediction, Actual. MSE = 2.209.]

IV. ANOTHER APPROACH: FORECASTING STOCK PRICES USING BROWNIAN MOTION

Researchers have proposed several mathematical models to forecast future stock prices.

For our analysis, our team examined an additional tool besides ARMA modeling to forecast

Yahoo, Apple, and Google stock prices: Brownian motion. For a little background, Brownian motion is named after Robert Brown, who first described the irregular motion exhibited by particles immersed in a gas or liquid; its mathematical formulation is also known as the Wiener process. This process is also used to describe stock price movements, although Benoit Mandelbrot, a mathematician, rejected its applicability.

Brownian motion can be formulated as a random walk with a drift: $B_t = \mu t + \sigma W_t$, where $W_t$ is a random walk process. In addition, $W_t$ can be written as $W_t = Z\sqrt{t}$, where $Z$ is normally distributed with zero mean and a standard deviation of one. As a result, for the standard (driftless, unit-variance) process, $E[dB] = 0$ and $E[dB^2] = dt$. Brownian motion has the following properties:

1) Continuity: B(t) is a continuous-time process.

2) Markov property: the future value of B(t) depends only on its current value, not on its earlier history.

3) Martingale property: $E(B_{n+1} \mid B_1, \ldots, B_n) = B_n$.

The stochastic behavior of a stock price $S_t$ follows the geometric Brownian motion process and can be written as $dS_t = \mu S_t \, dt + \sigma S_t \, dB_t$. The solution to this stochastic differential equation can be found by applying the famous Itô formula. It follows that

$$S_t = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right)$$
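The closed-form solution above lends itself to simulation: over a step of length dt, the log price changes by (μ - σ²/2)dt plus σ times a normal draw with variance dt. The following sketch simulates one such path with Python's standard library. It is not the MATLAB implementation used for this report; the function name, the parameter values, and the seed are all assumptions made for illustration.

```python
import math
import random


def simulate_gbm_path(s0, mu, sigma, n_steps, dt=1.0, rng=None):
    """Simulate S_t = S_0 exp((mu - sigma^2/2) t + sigma W_t) on a grid of step dt."""
    rng = rng or random.Random()
    path = [s0]
    log_price = math.log(s0)
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # standard normal draw for the increment of W_t
        log_price += (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(math.exp(log_price))
    return path


# Hypothetical daily drift and volatility; not estimated from the report's data.
path = simulate_gbm_path(s0=100.0, mu=0.0005, sigma=0.02, n_steps=252,
                         rng=random.Random(565))
print(len(path), path[0])
```

Because the update is applied to the logarithm of the price, every simulated price stays positive, which is one reason geometric Brownian motion is preferred over plain Brownian motion for stock prices.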

The stock volatility measures the stability of the stock price, and it can be computed by the following equations:

$$\sigma = \sqrt{\frac{1}{n-1}\sum_{t=1}^{n}(u_t - \bar{u})^2}$$

$$u_t = \ln\left(\frac{S_t}{S_{t-1}}\right)$$
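The two volatility equations translate directly into code: compute the log returns of consecutive prices, then take their sample standard deviation with an n - 1 divisor. The sketch below uses only the standard library; the function names and the example prices are hypothetical.

```python
import math


def log_returns(prices):
    """u_t = ln(S_t / S_{t-1}) for consecutive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def volatility(prices):
    """Sample standard deviation of the log returns (n - 1 in the denominator)."""
    u = log_returns(prices)
    n = len(u)
    u_bar = sum(u) / n
    return math.sqrt(sum((x - u_bar) ** 2 for x in u) / (n - 1))


# Hypothetical closing prices, not taken from the report's dataset.
prices = [15.20, 15.35, 15.10, 15.42, 15.60, 15.48]
print(volatility(prices))
```

With daily prices this gives a daily volatility; an annualized figure would require scaling by the square root of the number of trading days, a step the report does not describe.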

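The expected annual rate of return, or drift, is denoted by $\mu$ and is obtained from the mean log return $\bar{u}$ together with the variance term $\sigma^2/2$. According to the previous results and assumptions, the expected value $E(S_t)$ of the stock price at future time $t$ is given by $E(S_t) = S_0 \exp(yt)$, where the rate $y$ is determined by $\mu$ and $\sigma^2/2$.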

The Computational Results

In this section, our team discusses how we forecasted the future stock prices of the three companies using three approaches applied to the same model, $E(S_t) = S_0 \exp(yt)$. We used the historical data to obtain $y$ by measuring either the daily return or the total return, depending on the forecasting approach. In the first approach (denoted by “Forecast 1” in the graphs that complement this section), we forecasted future stock prices by using n-step-ahead prediction and found that the prediction was satisfactory up to about 20 days ahead. After that point, our model did not capture the actual stock price movements. For our second approach (denoted by “Forecast 2” in the associated graphs), we proposed a different way to update our forecast so that we could capture the change in the volatility of the stock market. The volatility in general is not constant and changes due to many factors. We updated the stock volatility and the rate of return, and therefore $y$, for every period in the expected value equation of the stock price. The updating process started by observing the oscillation in the stock price movement in the testing data and measuring its $y_t$ value. Then, we predicted the one-step-ahead stock price and periodically updated it with the real value. The forecasts from this approach were better, and their time series plot resembles the actual data more closely.
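The difference between the first two approaches can be illustrated with the expected value model $E(S_t) = S_0 \exp(yt)$, treating the rate y as already estimated. In this sketch, the n-step-ahead forecast always starts from the same origin, whereas the one-step-ahead forecast restarts from the most recent observed price and may use a different rate for each period. The function names and the synthetic series are assumptions made here; how the report measured each period's rate is not reproduced.

```python
import math


def n_step_forecast(s0, y, horizon):
    """Forecast 1 style: S_0 * exp(y * n) for n = 1..horizon, from a fixed origin."""
    return [s0 * math.exp(y * n) for n in range(1, horizon + 1)]


def one_step_forecasts(observed, rates):
    """Forecast 2 style: predict each day from the previous observed price.

    rates[i] is the (possibly updated) rate applied to the step from
    observed[i] to observed[i + 1].
    """
    return [prev * math.exp(rate) for prev, rate in zip(observed[:-1], rates)]


# Synthetic prices growing at exactly 1% per day (continuous compounding).
observed = [100.0 * math.exp(0.01 * k) for k in range(4)]
print(n_step_forecast(100.0, 0.01, 3))
print(one_step_forecasts(observed, [0.01, 0.01, 0.01]))
```

On real data the n-step forecast is a smooth exponential curve, so its error grows with the horizon, while the one-step forecast is re-anchored to the observed price every day and therefore tracks the series much more closely.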

Finally, our group used MATLAB to implement the Brownian motion forecasting algorithm, and we then computed the future stock prices of the three companies. The simulation produced more than 1,000 random walk paths of the stock price movements; one of them was selected at random and compared with the testing data. We observed that our Brownian motion model is an accurate predictor if the prediction horizon is less than a month. The Brownian motion outcomes deviate slightly from the testing data when the prediction horizon is more than a month, and that deviation could come from the volatility of the stock price or the market rate of return. The simulation generates different paths, and the selected path is not guaranteed to be the best one. The future path of the stock price should be selected carefully and updated with any information available about the stock price behavior, similar to what was done in the first two approaches.

[Figures: Forecasts for Google, Apple, and Yahoo over the testing part (one panel per company). Horizontal axis: trading days 1 to 249; vertical axis: stock price. Series in each panel: Real, Forecast 1, Forecast 2, Forecast 3.]

As a reminder, in the above graphs, “Forecast 1” represents the n-step-ahead prediction,
“Forecast 2” represents the one-step-ahead prediction, and “Forecast 3” shows the randomly
selected Brownian motion path. Visually, Forecast 3 deviates least from the real series, which
is shown in blue. In fact, the blue line is barely visible because the Brownian motion outcomes
mimic the real data so closely.

V. DISCUSSION & SIGNIFICANCE OF THE RESULTS

In our analysis above, we assumed that the daily volatility (sigma) and expected return (mu) of
the first year equal those of the second year, so that the forecast would show the same
oscillation in the stock price movement. When we instead used the average sigma and mu, we found
that we could not carry out the calculations, because taking the expectation ignores the random
walk part of the sigma values. In other words, it assumes that sigma and mu are constant over
all days. Assuming that the sigma and mu for each day in the training data equal those in the
testing data works better because it captures all of the movement in the stock prices.
Referring back to the graphs in the Brownian motion section of our report, a forecast with
constant mu and sigma resembles the red line: if you take the expectation with fixed mu and
sigma and change only the time, the outcome is always a steadily increasing or decreasing
function.
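The monotonic behavior described above can be made explicit under one common assumption, namely that the Brownian motion model takes the standard geometric form with constant daily drift $\mu$ and volatility $\sigma$. The symbols $S_0$, $S_t$, and $W_t$ (initial price, price at time $t$, and standard Brownian motion) are notation introduced here for illustration, not taken from the report:

$$
S_t = S_0 \exp\!\left(\left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma W_t\right),
\qquad
\mathbb{E}[S_t] = S_0\, e^{\mu t}.
$$

Because $\mathbb{E}[S_t]$ depends on $t$ only through $e^{\mu t}$, it rises steadily when $\mu > 0$ and falls steadily when $\mu < 0$, so under this assumption the expected path cannot reproduce day-to-day oscillation.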

Which Model Is Best for Each Company?

We computed the MSE of each model for each company. For Google, the lowest MSE among the time
series models was 137.58, whereas the MSE for Brownian motion was 285.334. Therefore, for
Google, we should use the AR(1) model with exponential trend. For Apple, the MA(6) model
produced the lowest MSE, 109.18, whereas the MSE for Brownian motion was 154.7751064.
Therefore, for Apple, we should use the MA(6) model. Lastly, for Yahoo, the lowest MSE among
the time series models was 2.209, whereas the MSE for Brownian motion was 0.327270663. In this
case, we conclude that Brownian motion is the best option for Yahoo.
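The selection rule used above (pick the model with the smallest MSE for each company) can be sketched in a few lines of Python. The MSE values are the ones quoted in this section; the dictionary layout, the helper name `best_model`, and the label used for Yahoo's time series model (which this section does not name) are assumptions made for illustration.

```python
# MSE values as quoted in this section of the report.
reported_mse = {
    "Google": {"AR(1) with exponential trend": 137.58, "Brownian motion": 285.334},
    "Apple": {"MA(6)": 109.18, "Brownian motion": 154.7751064},
    # The section does not name Yahoo's best time series model; the label is a placeholder.
    "Yahoo": {"best time series model": 2.209, "Brownian motion": 0.327270663},
}


def best_model(mse_by_model):
    """Return the name of the model with the smallest MSE."""
    return min(mse_by_model, key=mse_by_model.get)


best = {company: best_model(models) for company, models in reported_mse.items()}

for company, model in best.items():
    print(f"{company}: {model} (MSE = {reported_mse[company][model]})")
```

Note that this comparison is made within each company only; the sketch does not compare MSE values across companies.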

Significance of Our Results

We can use the models we developed for this project to predict future stock prices. Yahoo had
the lowest MSE, which suggests that Yahoo’s stock price is the easiest of the three to predict
under the existing conditions. This shows that some stocks are easier to predict than others,
and a more predictable stock price makes it easier to invest profitably. In other words, if a
company experiences the same stable conditions over the relevant time horizon (two years in our
case), we are confident that we can predict its stock prices in the third year, provided the
same stable conditions continue to hold.

An important conclusion we reached as a result of this project is that although there are

many modeling techniques available for stock price datasets, there is not one method that gives

the best results for each and every time series. Each time series has its own unique behavior that

presents many modeling challenges; these challenges need to be thoroughly examined before

selecting the best modeling technique that can be used to predict future patterns.

VI. CONCLUSION

Based on our results, Yahoo showed little volatility, whereas Google and Apple showed much more
oscillation. Based on the Brownian motion analysis, it was easier to capture the stock
fluctuations for Yahoo than for Google. The variance for Apple is very high; one possible reason
is that Apple’s operations in 2011 differed substantially from its operations in 2012. Overall,
our results for the three stocks are very different: the MSEs vary widely across the three
companies, and Yahoo’s stock price proved much easier to predict. In conclusion, our group
enjoyed working on this project together and having the opportunity to implement new procedures,
such as deterministic trend, exponential trend, and Brownian motion, to achieve our objective of
finding the best model for each company’s stock price.
