STAT 626 Final Report Group 12




S&P Stock Price Prediction Group 12

Authors
Joseph Busillo (Distance - Model building and R Programming)
Ryan Bellinger (Distance) - Dropped the course
Peter Lindmark (Distance) - Dropped the course
Ravi Mantripragada (Distance - Report and R Programming)
Pranjal Haridas (Local - Logistics, R Programming and Presentation)

Table of Contents:

1. Introduction
2. Data Description
2.1 S&P 500 Index
2.2 Heating Degree Day (HDD) & Cooling Degree Day (CDD)
2.3 Treasury Rates
3. Data Analysis
3.1 Data Transformations
4. Univariate Model Fitting
4.1 ARIMA Model for HDD - Seasonal Component
4.1.1 ARIMA(1,1,1)x(0,1,2)12 & ARIMA(1,1,1)x(1,1,1)12
4.1.2 Final Model Selection for HDD
4.1.3 HDD 12 month forecast
4.2 ARIMA Model for CDD - Seasonal Component
4.2.1 ARIMA(1,1,1)x(0,1,1)12 and ARIMA(0,1,12)x(0,1,1)12
4.2.2 Final Model selection for CDD
4.2.3 CDD 12 month forecast
4.3 ARIMA model for treasury prices
4.3.1 Model 1 & 2 - AR(1) & AR(2)
4.3.2 Model 3 - MA(1)
4.3.3 Model 4 - ARMA(1,1)
4.3.4 Final model selection for treasury prices
4.3.5 Parameter estimation using RSS (Invertibility)
4.3.6 Treasury price 12 month forecast
4.4 S&P 500 Index
5. CCF plots from Univariate analysis
6. Multivariate Analysis
6.1 Model Fit
6.2 Projection
6.3 Model Enhancement
6.4 GARCH models on orthogonal residuals
6.5 S&P 500 returns ARCH
6.6 Final Projection
7. Conclusions
8. Bibliography


Objective: The goal of the analysis is to determine the effect that weather and changes in interest rates may have on the performance of the S&P 500, and to carry out a more robust correlation analysis of the residuals for the purpose of asset allocation.

1. Introduction
Much literature has been devoted to the study of the S&P 500 index, and there is a large body of financial theory on how efficient markets should work. In its weak form, the efficient market hypothesis states that the value of the index should follow a random walk, which implies that the index value cannot be predicted from its prior values. However, that does not rule out information outside the series that is predictive of future asset returns. The performance of other asset classes, geography, and political and weather events can reasonably be expected to affect the performance of particular stocks or even the market as a whole.

The goal of this paper is first to determine, using valid time series techniques, whether the S&P 500 index has enough autocorrelation to suggest a future value of the index given its past values. Further, we seek to determine whether the level of interest rates and the weather affect the performance of the index. In other words, even if there is no information in the current S&P 500 price that would enable a prediction of future prices, is there information in weather and bond markets that might help determine the path of the S&P 500 index?

Given the efficient market hypothesis, modern portfolio theory also suggests that an optimal allocation can be determined from the return vector and covariance matrix; that is, a return target can be realized with the least amount of volatility. Given the prevalence of this form of asset allocation, we also want to ensure that the covariance estimates are valid. To do this, we remove any temporal correlation so that only white noise is left in the residuals, and then determine the correlation of the pairwise residual series. We used univariate analysis to identify the candidate series for the multivariate analysis.

2. Data Description
The S&P 500 index is driven by various economic indicators. For simplicity, we selected a few potential indicators for our study: heating degree days (HDD), cooling degree days (CDD) and treasury rates.

2.1 S&P 500 Index
The Standard & Poor's 500 (S&P 500) is a stock market index of 500 large companies whose common stock is listed on the NYSE or NASDAQ. It is one of the most followed indices and is considered one of the best representations of the US stock market and the US economy. Figure 1 shows the time series plot of the S&P 500 index.


Figure 1: Time series plot of S&P 500 index and Treasury rates from 1970Q1 to 2015Q2

2.2 Heating Degree Day (HDD) & Cooling Degree Day (CDD)
A heating degree day (HDD) is a measurement designed to show the demand for energy needed to heat a closed area (e.g. a home or building). Similarly, a cooling degree day (CDD) shows the amount of energy used to cool a closed area. HDD is defined relative to a base temperature, the outside temperature above which an area needs no heating. The most commonly used base temperature is 65 °F, and it is used in calculating both HDD and CDD. We believe that climatic changes have a large impact on energy demand, which in turn affects the price of common commodities. Any fluctuation in basic commodities may have some impact on the performance of the S&P 500 index, so we wanted to study the relationship of HDD with the S&P 500 index.
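The degree-day definition can be illustrated with a minimal sketch. It assumes the common convention that a day's HDD or CDD is the gap between the 65 °F base and that day's mean outside temperature; the function names and the sample temperatures are hypothetical and are not taken from the report's data.

```python
BASE_TEMP_F = 65.0  # base temperature cited in the text


def degree_days(mean_temp_f, base=BASE_TEMP_F):
    """Return (HDD, CDD) for one day's mean outside temperature in deg F."""
    hdd = max(0.0, base - mean_temp_f)  # heating is needed only below the base
    cdd = max(0.0, mean_temp_f - base)  # cooling is needed only above the base
    return hdd, cdd


def period_totals(daily_means):
    """Sum daily HDD and CDD over a list of daily mean temperatures."""
    pairs = [degree_days(t) for t in daily_means]
    return sum(h for h, _ in pairs), sum(c for _, c in pairs)


# Hypothetical daily mean temperatures, for illustration only.
print(degree_days(50.0))                  # (15.0, 0.0)
print(period_totals([50.0, 65.0, 80.0]))  # (15.0, 15.0)
```

A day exactly at the base temperature contributes to neither total, which is why a given day can add to HDD or to CDD but never to both.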

Figure 2: Time series plot of HDD, CDD and Treasury rates from 1970Q1 to 2015Q2


2.3 Treasury Rates
Treasury rates are the rates the US government pays to borrow money for different lengths of time. We used 10-year treasury yields. Treasury yields are widely read as a signal of investors' sentiment about the US economy: the higher the yields on the 10-, 20- and 30-year treasuries, the better the US economic outlook. Historically, whenever the 10-year treasury rate has touched its bottom trend line, it has marked an extremely good entry point into stocks. The opposite has also held: whenever the 10-year rate peaks and heads down, it is not a good time to hold stocks. We would like to explore this trend and study its relationship with the S&P 500 index.

The first phase of the analysis examines the time series attributes of the chosen variables: HDD, CDD, the S&P 500 index and 10-year constant maturity prices. We then examine the cross-correlation function to look for evidence of a causal relationship. The second phase of the project focuses on developing a multivariate model to establish any causal or predictive relationships.

3. Data Analysis
The following section describes the steps followed in analyzing the data using univariate analysis. The first step in analysing time series data is to detrend the series so that it is stationary with constant variance. We used data transformation techniques to satisfy these conditions.

3.1 Data Transformations
For the S&P 500 index, we used the first difference of the log-transformed data. Figure 3 below shows values that look reasonable apart from a few spikes at different time periods. There does not appear to be any trend in the data. An argument could be made that the variance is not constant, but overall the transformation seems to have worked well. The major spikes are due to the recent mortgage crisis, the Russian crisis and the Black Mondays. A first-order difference is applied to the HDD and CDD series to make each of them stationary, as shown in Figure 4.

Figure 3: First order difference of log transformed S&P 500 index and first order difference of treasury prices respectively.
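The log-difference transformation used for the S&P 500 can be sketched as follows. This is a minimal illustration rather than the report's R code, and the index levels are made up.

```python
import math


def log_returns(prices):
    """First difference of the log series: log(p_t) - log(p_{t-1})."""
    logs = [math.log(p) for p in prices]
    return [b - a for a, b in zip(logs, logs[1:])]


# Hypothetical index levels, for illustration only.
levels = [100.0, 110.0, 99.0]
returns = log_returns(levels)
print([round(r, 4) for r in returns])
```

The differenced series is one observation shorter than the input, and each value approximates the period-over-period growth rate when changes are small.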


Figure 4: First order difference of HDD and CDD

Similarly, a first-order difference is applied to treasury prices to make the series stationary, as shown in Figure 3. These stationary series are examined using ACF and PACF plots, which guided the selection of the time series models. Note that we transformed the rate data to prices assuming a 5% coupon and the pricing formula

$$P = C \cdot FV \cdot \frac{1 - (1 + r)^{-n}}{r} + \frac{FV}{(1 + r)^{n}}$$

where P is the price, C is the coupon rate (assumed here to be 5%), FV is the face value of the bond ($1,000 in this example), r is the treasury rate (the data actually pulled for this analysis) and n is the stated maturity from today. Because we used constant maturity rates, n is always 10 in this analysis.
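The rate-to-price conversion can be sketched directly from the pricing formula. The sketch assumes annual coupon payments and annual compounding, as the formula implies; the function name and the zero-rate guard are illustrative additions.

```python
def bond_price(rate, coupon=0.05, face=1000.0, years=10):
    """Price of an annual-pay bond: coupon annuity plus discounted face value."""
    if rate == 0:
        # At a zero yield nothing is discounted, so the annuity factor is `years`.
        return coupon * face * years + face
    discount = (1.0 + rate) ** -years
    annuity = (1.0 - discount) / rate
    return coupon * face * annuity + face * discount


# Hypothetical 10-year yields, for illustration only.
for r in (0.03, 0.05, 0.08):
    print(r, round(bond_price(r), 2))
```

When the yield equals the 5% coupon the bond prices at its face value, and the price falls as the yield rises, which is why the price series moves opposite to the rate series.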

4. Univariate Model Fitting

4.1 ARIMA Model for HDD - Seasonal Component
Below are the ACF and PACF plots for the HDD data series. The first difference removed the trend, but the ACF plot reveals seasonal patterns in the data. A seasonal difference at lag 12 is therefore applied to the first-differenced data to eliminate the seasonality. Figure 5 shows the ACF and PACF plots of the transformed data.

Figure 5: ACF plots of HDD series after the first order difference and seasonal difference
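The two differencing steps applied to HDD (a first difference followed by a lag-12 seasonal difference) can be sketched on a synthetic series. The series below is hypothetical, built from a linear trend plus an exact 12-month cycle so that the effect of each step is easy to see.

```python
import math


def difference(series, lag=1):
    """Return x_t - x_{t-lag}; the result is `lag` observations shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly series: linear trend plus an exact 12-month cycle.
series = [0.5 * t + 10.0 * math.cos(2 * math.pi * t / 12) for t in range(60)]

detrended = difference(series, lag=1)       # removes the linear trend
stationary = difference(detrended, lag=12)  # removes the 12-month cycle

print(len(series), len(detrended), len(stationary))  # 60 59 47
```

On real data the result is not identically zero, and what remains after both differences is the part examined in the ACF and PACF plots.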

For the seasonal interpretation, at seasonal lags 12k the ACF appears to cut off after lag 24 and the PACF seems to tail off, suggesting an SMA(2) model. However, it can be argued that both the ACF and PACF are tailing off at the seasonal lags, which would suggest a SARMA(1,1) model.


For the intraseasonal interpretation, both the ACF and PACF appear to tail off, which suggests an ARMA(1,1) model. Based on the ACF and PACF plots, the ARIMA(1,1,1)x(0,1,2)12 and ARIMA(1,1,1)x(1,1,1)12 models were selected to fit the data.

4.1.1 ARIMA(1,1,1)x(0,1,2)12 & ARIMA(1,1,1)x(1,1,1)12

Both models show no residual autocorrelation, and the Q-statistic p-values are all non-significant, which supports the validity of both models. Since both pass the diagnostics, we used AIC to select between them.

4.1.2 Final Model Selection for HDD
Model 1 (ARIMA(1,1,1)x(1,1,1)12) AIC: 9.06
Model 2 (ARIMA(1,1,1)x(0,1,2)12) AIC: 9.06
The models are very similar, so we choose on parsimony. Model 2 has the easier interpretation, so we used Model 2 for the growth rate.
Model equation:
$$X_t = 0.2274_{(0.0426)} X_{t-1} - 1_{(0.0526)} W_{t-1} + 0.0313_{(0.0467)} X_{t-12} - 0.9545_{(0.0271)} W_{t-12} + W_t$$

4.1.3 HDD 12 month forecast
A 12-month HDD forecast is plotted using the selected model.

Figure 6: 12-month HDD forecast using the ARIMA(1,1,1)x(0,1,2)12 model (first plot), and ACF plots of the first-differenced, seasonally differenced CDD series (next two plots)

4.2 ARIMA Model for CDD - Seasonal Component
Based on the CDD plots, the ACF tails off and the PACF cuts off after lag 12, which suggests a seasonal period of 12 (s = 12). For the seasonal interpretation, at seasonal lags 12k the ACF appears to cut off after lag 12 and the PACF seems to tail off, suggesting an SMA(1) model. For the intraseasonal interpretation, both the ACF and PACF appear to tail off, which suggests an ARMA(1,1) model. Arguably, the ACF cuts off somewhere around lag 12, which would make it an MA(12) model.

4.2.1 ARIMA(1,1,1)x(0,1,1)12 and ARIMA(0,1,12)x(0,1,1)12
The diagnostic results show no autocorrelation in the residuals for either model, so we used AIC to select the best model.

4.2.2 Final Model selection for CDD
Model 1 (ARIMA(1,1,1)x(0,1,1)12) AIC: 6.83
Model 2 (ARIMA(0,1,12)x(0,1,1)12) AIC: 6.82
The two models are very similar based on AIC, so we choose on parsimony. Model 1 has the easier interpretation, so we used Model 1 for the growth rate.


Model equation:
$$X_t = 0.3266_{(0.0426)} X_{t-1} - 0.9906_{(0.0079)} W_{t-1} - 0.9105_{(0.0211)} W_{t-12} + W_t$$

4.2.3 CDD 12 month forecast
A 12-month CDD forecast is plotted using the selected model.

Figure 7: 12 month forecast of CDD using ARIMA(1,1,1)x(0,1,1)12 & ACF plot of the first order difference of treasury price

4.3 ARIMA model for treasury prices
ACF and PACF plots of the first-differenced treasury prices are shown below. There is no seasonality in the plots. Both the ACF and the PACF seem to tail off, but one could argue that either one cuts off while the other tails off, so we fit three models: AR(1), MA(1) and ARMA(1,1). We also fit an AR(2), since the PACF could be argued to cut off after lag 2 while the ACF tails off. In total, the AR(1), AR(2), MA(1) and ARMA(1,1) models were selected to fit the data.

4.3.1 Model 1 & 2 - AR(1) & AR(2)
These models are not a good fit: significant autocorrelation is left in the residuals, the residuals show non-constant variance and they are not normally distributed. The Q-statistics are all significant, which supports the conclusion that the models are not a good fit.

4.3.2 Model 3 - MA(1)
The model fit is reasonable, as we do not see any autocorrelation in the ACF plot of the residuals. Cumulative autocorrelation is not significant at most lags, and there is some non-constant variance in the residuals.

4.3.3 Model 4 - ARMA(1,1)
The model fit is quite good, with clearly non-significant Q-statistics. However, there is evidence of fat tails and non-constant variance in the residual plots, as shown in figure 13.

4.3.4 Final model selection for treasury prices
Below are the AIC scores for all four models.
MODEL 1 - AR(1) AIC: 6.72
MODEL 2 - AR(2) AIC: 6.68
MODEL 3 - MA(1) AIC: 6.68
MODEL 4 - ARMA(1,1) AIC: 6.67


The best models are models 3 and 4, which have very similar diagnostics and AIC, so we choose the simpler one, model 3.
Model equation:
$$X_t = W_t + 0.4047_{(0.044)} W_{t-1}$$
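The selection rule used throughout this section (prefer the simpler model when AIC values are nearly equal) can be written down as a small sketch. The AIC values are the ones reported for the treasury models; the 0.02 tolerance for "very similar" is an assumption made for illustration, since the report does not state a threshold, and the function name is hypothetical.

```python
# (name, number of ARMA coefficients, AIC) as reported in the text.
candidates = [
    ("AR(1)", 1, 6.72),
    ("AR(2)", 2, 6.68),
    ("MA(1)", 1, 6.68),
    ("ARMA(1,1)", 2, 6.67),
]


def select_parsimonious(models, tolerance=0.02):
    """Among models within `tolerance` of the best AIC, pick the fewest parameters."""
    best_aic = min(aic for _, _, aic in models)
    close = [m for m in models if m[2] - best_aic <= tolerance]
    # Ties on parameter count are broken by the lower AIC.
    return min(close, key=lambda m: (m[1], m[2]))


chosen = select_parsimonious(candidates)
print(chosen[0])  # MA(1)
```

Under this assumed tolerance the rule reproduces the choice made above: AR(1) is excluded on AIC, and MA(1) wins among the remaining models because it has a single coefficient.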

4.3.5 Parameter estimation using RSS (Invertibility)
Since we chose the MA(1) model for this series, we also wanted to see how close the residual sum of squares method would get to the model fit above. To do this we first recognize that an invertible MA(1) process can be written as:
$$X_t = W_t + \theta X_{t-1} - \theta^2 X_{t-2} + \theta^3 X_{t-3} - \dots$$
which also means that any residual can be written as:
$$W_t = X_t - \theta X_{t-1} + \theta^2 X_{t-2} - \theta^3 X_{t-3} + \dots$$

We can then condition on W1 = 0. After creating a grid of values of θ, we find where the residual sum of squares is minimized. The chart in Figure 8 shows that the sum of squares is minimized at about θ = .35. Our fit above was about .4, so both methods give similar results.
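The grid search can be sketched on simulated data. This is an illustration under stated assumptions, not the report's computation: the series is simulated from an MA(1) with θ = 0.4 rather than taken from the treasury data, the recursion starts from a zero innovation before the first observation, and the grid spacing, seed and function names are arbitrary choices.

```python
import random


def simulate_ma1(theta, n, seed=0):
    """Simulate X_t = W_t + theta * W_{t-1} with standard normal W_t."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [w[t] + theta * w[t - 1] for t in range(1, n + 1)]


def conditional_rss(x, theta):
    """Residual sum of squares from the recursion W_t = X_t - theta * W_{t-1}."""
    w_prev, rss = 0.0, 0.0  # condition on a zero starting innovation
    for x_t in x:
        w_t = x_t - theta * w_prev
        rss += w_t * w_t
        w_prev = w_t
    return rss


def grid_search_theta(x, step=0.01):
    """Return the theta on a regular grid inside (-1, 1) that minimizes the RSS."""
    steps = int(round(0.99 / step))
    grid = [k * step for k in range(-steps, steps + 1)]
    return min(grid, key=lambda th: conditional_rss(x, th))


x = simulate_ma1(theta=0.4, n=2000)
theta_hat = grid_search_theta(x)
print(theta_hat)
```

The recursion is the finite-sample version of the infinite sum for W_t: unrolling it reproduces the alternating powers of θ applied to past observations. Restricting the grid to the interior of (-1, 1) keeps the search within the invertible region.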

Figure 8: Estimation of theta for invertible MA(1) process & Forecast of Treasury Price using MA(1)

4.3.6 Treasury price 12 month forecast
A 12-month forecast of treasury prices is plotted in Figure 8 using the model selected above.

4.4 S&P 500 Index

Below are the ACF and PACF plots of the first-order difference of the log of the S&P 500 index. It is evident from the plots that there is no significant autocorrelation or partial autocorrelation. Therefore, we can use the sample mean and sample standard deviation for the growth rate.

Figure 9: ACF plot of first order difference and log of S&P 500

5. CCF plots from Univariate analysis
From the CCF plots, it appears that there might be a predictive relationship between treasury prices and the S&P 500, but the weather data show little predictive power.


Figure 10: CCF plots

The cross-correlation function of the residuals was not significant for HDD or CDD (Y in the plot refers to the S&P 500), but there might be a significant cross-correlation between Treasury prices and the S&P 500.
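A sample cross-correlation function of the kind shown in the CCF plots can be sketched as follows. The lag convention used here, correlating x at time t + h with y at time t, is an assumption and may differ from the convention of the software that produced the figures; the data are synthetic, built so that y leads x by two periods.

```python
import math
import random


def cross_correlation(x, y, lag):
    """Sample correlation between x[t + lag] and y[t]; lag may be negative."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    pairs = [(x[t + lag], y[t]) for t in range(n) if 0 <= t + lag < n]
    cov = sum((a - mx) * (b - my) for a, b in pairs) / n
    return cov / (sx * sy)


# Synthetic example: x_t = y_{t-2}, so y leads x by two periods.
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(200)]
x = [0.0, 0.0] + y[:-2]

ccf = {h: cross_correlation(x, y, h) for h in range(-5, 6)}
peak_lag = max(ccf, key=lambda h: abs(ccf[h]))
print(peak_lag)  # 2
```

A single dominant spike at a non-zero lag is the pattern that would suggest one series helps predict the other; cross-correlations that stay within the noise band at all lags, as reported for HDD and CDD, suggest no such relationship.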

6. Multivariate Analysis
Given the evident relationship between the S&P 500 and Treasury prices, these two variables are good candidates for a VARMAX model. Using the BIC criterion, SC(n), we choose the model with lag 2.

Figure 11: ACFs (diagonals) and CCFs (off­diagonals) for the residuals of the VAR(2) model

After running the model with a lag of 2, we see that all autocorrelation and cross-correlation between the two variables is gone. We next check that there is no significant cumulative autocorrelation in the residuals up to lag 16; the Portmanteau test confirms this with a p-value of 0.53. Another assumption of the model is that the residuals are multivariate normally distributed. The residuals were standardized, squared and summed using the formula
$$(y_i - u)^{T} S^{-1} (y_i - u)$$
where u is the mean vector and S is the covariance matrix of the residuals. The chi-squared QQ plot shows that we might be in violation of this assumption: the squared residuals are clearly not chi-squared distributed. This indicates that there is room for improvement in the model. The residuals also still show non-constant variance.
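The quantities behind a chi-squared QQ plot for two series can be sketched as below. The residuals are simulated and hypothetical; the sketch relies on the fact that the chi-squared distribution with 2 degrees of freedom has the closed-form quantile function Q(p) = -2 ln(1 - p), which avoids any external library. The plotting positions (i - 0.5)/n are one common choice, assumed here.

```python
import math
import random


def squared_mahalanobis(points):
    """Squared distance (y_i - u)^T S^{-1} (y_i - u) for each bivariate point."""
    n = len(points)
    u0 = sum(p[0] for p in points) / n
    u1 = sum(p[1] for p in points) / n
    centered = [(p[0] - u0, p[1] - u1) for p in points]
    # Sample covariance matrix S with the usual n - 1 denominator.
    s00 = sum(a * a for a, _ in centered) / (n - 1)
    s11 = sum(b * b for _, b in centered) / (n - 1)
    s01 = sum(a * b for a, b in centered) / (n - 1)
    det = s00 * s11 - s01 * s01
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    return [(s11 * a * a - 2.0 * s01 * a * b + s00 * b * b) / det
            for a, b in centered]


def chi2_quantiles_2df(n):
    """Chi-squared (2 df) quantiles at plotting positions (i - 0.5) / n."""
    return [-2.0 * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]


# Hypothetical bivariate residuals, for illustration only.
rng = random.Random(0)
resid = [(rng.gauss(0, 1), rng.gauss(0, 2)) for _ in range(300)]

d2 = sorted(squared_mahalanobis(resid))
qq_pairs = list(zip(chi2_quantiles_2df(len(d2)), d2))
```

If the residuals are bivariate normal, the points in `qq_pairs` fall close to the 45-degree line; heavy tails or changing variance show up as the largest distances rising well above it.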

Figure 11: Multivariate QQ plot and residual plots of S&P 500 and Treasury Prices


6.1 Model Fit
Multivariate results from R are shown below.

$$SP_t = 0.004838 + 3.02 \times 10^{-6}\, t + 5.016 \times 10^{-2}\, SP_{t-1} + 2.28 \times 10^{-4}\, T_{t-1} - 2.831 \times 10^{-4}\, SP_{t-2} - 8.36 \times 10^{-5}\, T_{t-2} + \text{ERROR}$$
$$T_t = -0.4959 + 0.00639\, t - 33.94\, SP_{t-1} + 0.33\, T_{t-1} - 57.72\, SP_{t-2} - 0.1683\, T_{t-2} + \text{ERROR}$$
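The fitted VAR(2) equations can be turned into a mean forecast by iterating them forward with the error terms set to zero. The sketch below uses the coefficients as printed; the starting lag values and the time index are hypothetical placeholders, since the report does not list the last observations, and the function names are illustrative.

```python
# Coefficients copied from the fitted equations in the text.
# Order: intercept, trend, SP lag 1, T lag 1, SP lag 2, T lag 2.
SP_COEF = (0.004838, 3.02e-06, 5.016e-02, 2.28e-04, -2.831e-04, -8.36e-05)
T_COEF = (-0.4959, 0.00639, -33.94, 0.33, -57.72, -0.1683)


def var2_step(t, sp_lags, t_lags):
    """One-step mean forecast; sp_lags and t_lags are (lag 1, lag 2) pairs."""
    regressors = (1.0, t, sp_lags[0], t_lags[0], sp_lags[1], t_lags[1])
    sp_next = sum(c * r for c, r in zip(SP_COEF, regressors))
    t_next = sum(c * r for c, r in zip(T_COEF, regressors))
    return sp_next, t_next


def var2_path(t_start, sp_lags, t_lags, horizon=12):
    """Iterate the one-step forecast, feeding each forecast back in as a lag."""
    path = []
    for h in range(horizon):
        sp_next, t_next = var2_step(t_start + h, sp_lags, t_lags)
        path.append((sp_next, t_next))
        sp_lags, t_lags = (sp_next, sp_lags[0]), (t_next, t_lags[0])
    return path


# Hypothetical starting values, for illustration only.
path = var2_path(t_start=1, sp_lags=(0.01, 0.0), t_lags=(1.0, -0.5))
```

Each forecast becomes the lag-1 value for the next step and the previous lag-1 value becomes lag 2, which is how a multi-step projection is built from a model that only looks two periods back.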

6.2 Projection
From the projection charts, we see that this model predicts a continued rise in equity prices, with Treasury prices flattening out and rising slightly, indicating an overweight to equity.

Figure 12: Forecast values using VAR(2) model


6.3 Model Enhancement
Knowing that the residuals are not multivariate normally distributed because of the evident non-constant variance, we enhanced the model by fitting a GARCH model to the residual series. The issue is that the residual vectors from the VARMAX model have some correlation structure, so to use univariate GARCH methods we first have to transform the correlated residuals into uncorrelated vectors. We did this with a Cholesky decomposition of the covariance matrix of the residuals, applying the inverse of the lower triangular matrix from the decomposition to the correlated VARMAX residuals. This leaves two uncorrelated vectors of residuals, and it was to these vectors that the GARCH models were fit.
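The orthogonalization step can be sketched for the two-series case, where the Cholesky factor has a simple closed form. The residuals are simulated and hypothetical, and the function names are illustrative; the sketch shows both directions of the transformation, since the inverse factor decorrelates the residuals and the factor itself restores the correlation.

```python
import math
import random


def covariance_2x2(resid):
    """Sample covariance (s00, s01, s11) of a list of bivariate residuals."""
    n = len(resid)
    m0 = sum(r[0] for r in resid) / n
    m1 = sum(r[1] for r in resid) / n
    s00 = sum((r[0] - m0) ** 2 for r in resid) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in resid) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in resid) / (n - 1)
    return s00, s01, s11


def cholesky_2x2(s00, s01, s11):
    """Lower-triangular L = [[l00, 0], [l10, l11]] with L L^T = S."""
    l00 = math.sqrt(s00)
    l10 = s01 / l00
    l11 = math.sqrt(s11 - l10 * l10)
    return l00, l10, l11


def whiten(resid, chol):
    """Apply L^{-1} to each residual vector by forward substitution."""
    l00, l10, l11 = chol
    out = []
    for e0, e1 in resid:
        z0 = e0 / l00
        z1 = (e1 - l10 * z0) / l11
        out.append((z0, z1))
    return out


def recolor(z, chol):
    """Apply L to uncorrelated vectors to restore the original correlation."""
    l00, l10, l11 = chol
    return [(l00 * z0, l10 * z0 + l11 * z1) for z0, z1 in z]


# Hypothetical correlated residuals: the second series shares a common shock.
rng = random.Random(0)
shocks = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
resid = [(a, 0.6 * a + 0.8 * b) for a, b in shocks]

chol = cholesky_2x2(*covariance_2x2(resid))
z = whiten(resid, chol)
z_cov = covariance_2x2(z)       # approximately (1, 0, 1)
restored = recolor(z, chol)     # matches resid up to rounding
```

Because the factor is computed from the sample covariance of the same residuals, the transformed series have unit sample variance and zero sample covariance, which is what allows a separate univariate GARCH model to be fit to each.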

6.4 GARCH models on orthogonal residuals
The treasury plots show squared orthogonal residuals that are quite messy, so for parsimony we tried a GARCH(1,1) model.

Figure 13: ACF plots of squared residuals of treasury and S&P 500 index

The model fit statistics are all very good for this model: there is no remaining autocorrelation in the squared orthogonal residuals, and all coefficients are significant at the .1 level, most at the .05 level.

$$s_t^2 = 0.05064_{(0.02737)} + 0.06553_{(0.02509)}\, Y_{t-1}^2 + 0.8854_{(0.03986)}\, s_{t-1}^2$$
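The GARCH(1,1) variance equation can be sketched as a recursion, using the treasury estimates reported above. Starting the filter at the unconditional variance and the example inputs are assumptions made for illustration; y stands for the orthogonal residual series, and the function names are hypothetical.

```python
# Treasury-series estimates as reported in the text.
OMEGA, ALPHA, BETA = 0.05064, 0.06553, 0.8854


def garch_filter(y, omega=OMEGA, alpha=ALPHA, beta=BETA):
    """Conditional variances s_t^2 = omega + alpha * y_{t-1}^2 + beta * s_{t-1}^2."""
    s2 = [omega / (1.0 - alpha - beta)]  # start at the unconditional variance
    for y_prev in y[:-1]:
        s2.append(omega + alpha * y_prev ** 2 + beta * s2[-1])
    return s2


def garch_forecast(last_y, last_s2, horizon, omega=OMEGA, alpha=ALPHA, beta=BETA):
    """Variance forecasts; beyond one step, y^2 is replaced by its expected value."""
    path = [omega + alpha * last_y ** 2 + beta * last_s2]
    for _ in range(horizon - 1):
        path.append(omega + (alpha + beta) * path[-1])
    return path


long_run = OMEGA / (1.0 - ALPHA - BETA)
# Hypothetical last residual and variance, for illustration only.
forecast = garch_forecast(last_y=2.0, last_s2=1.5, horizon=1000)
print(round(long_run, 3))
```

With these estimates the persistence, the sum of the two slope coefficients, is about 0.95, so a variance shock decays slowly and the forecasts drift back toward the long-run level over many periods.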


6.5 S&P 500 returns ARCH
We see significant autocorrelation in the squared orthogonal residuals. Again for parsimony, we first try a GARCH(1,1) model, since both the ACF and PACF appear to tail off.

$$s_t^2 = 0.03889_{(0.01934)} + 0.1238_{(0.02963)}\, Y_{t-1}^2 + 0.8453_{(0.0290)}\, s_{t-1}^2$$

6.6 Final Projection
Once the GARCH models were fit to the orthogonal residuals, we used the projected variance vectors to create projections of those residuals. We then apply the Cholesky matrix to the projected uncorrelated residuals to get back a set of correlated residuals on the same scale as our data. Those correlated residuals can now be combined with the projected mean vectors from the VARMAX model. The projection looks slightly different once non-constant variance is accounted for, but the model still favors equities, and volatility appears to have evened out on both series. Adding the GARCH model may not make a significant difference, but it has helped in minimizing the errors. Figure 14 shows the final forecast values using the VAR and GARCH models.

$$SP_t = 0.004838 + 3.02 \times 10^{-6}\, t + 5.016 \times 10^{-2}\, SP_{t-1} + 2.28 \times 10^{-4}\, T_{t-1} - 2.831 \times 10^{-4}\, SP_{t-2} - 8.36 \times 10^{-5}\, T_{t-2} + s_t^2$$
$$T_t = -0.4959 + 0.00639\, t - 33.94\, SP_{t-1} + 0.33\, T_{t-1} - 57.72\, SP_{t-2} - 0.1683\, T_{t-2} + s_t^2$$

Figure 14: Forecast values using VAR(2) and GARCH(1,1) model

7. Conclusions
The univariate analysis of CDD, HDD, Treasury rates and the S&P 500, together with the CCF plots against the S&P 500, identified a significant relationship only between Treasury rates and the S&P 500. In the multivariate analysis, we explored the relationship between the S&P 500 and Treasury rates using a VARMAX model. On reviewing the residuals from the VARMAX model, the ACF plot of the squared orthogonal residuals encouraged us to explore GARCH models for the residuals. The expected mean of the S&P 500 is derived from the VARMAX model and the error variance is estimated using GARCH models. We found a slight improvement from adding GARCH models to the error terms of the VARMAX model. The results show a positive outlook for the S&P 500 in the near future, whereas treasury prices have a lower growth rate than the S&P 500. An optimistic investor, taking into consideration economic conditions and the expectation that the Federal Reserve will raise its overnight lending rate, may favour investing in the S&P 500 index, whereas a conservative investor can invest in fixed income assets.


8. Bibliography
“Wikipedia.” Wikipedia. Wikimedia Foundation, n.d. Web. 20 Jul. 2015. <https://en.wikipedia.org/wiki/heating_degree_day>

“Wikipedia.” Wikipedia. Wikimedia Foundation, n.d. Web. 20 Jul. 2015. <https://en.wikipedia.org/wiki/s&p_500>

“10 Yr Treasury Rates Vs. S&P 500.” TradingView. Web. 20 Jul. 2015. <https://www.tradingview.com/chart/dgs10/p4kytyby-10-yr-treasury-rates-vs-s-p-500/>

“S&P 500 Index (YAHOO) - Data And Charts from Quandl.” S&P 500 Index (YAHOO) - Data and Charts from Quandl. Web. 20 Jul. 2015. <https://www.quandl.com/data/yahoo/index_gspc-s-p-500-index>

“Treasury Yield Definition | Investopedia.” Investopedia. N.p., Jul. 2010. Web. 20 Jul. 2015. <http://www.investopedia.com/terms/t/treasury-yield.asp>
