FE 582 - Project Presentation

STATISTICAL ARBITRAGE PAIRS FOR THE UNIVERSE OF SECTORAL ETFS USING CO-INTEGRATION. Manoj Shenoy, Zenghui Liu, Yangxi Leng

Upload: zenghui-liu

Post on 09-Feb-2017


TRANSCRIPT

Page 1: FE 582 - Project Presentation

STATISTICAL ARBITRAGE PAIRS FOR THE UNIVERSE OF SECTORAL ETFS USING CO-INTEGRATION

Manoj Shenoy, Zenghui Liu, Yangxi Leng

Page 2: FE 582 - Project Presentation

Central Theme

Pair trading is a statistical strategy that takes advantage of mispricings between assets.

It deserves attention and study because it is widely used among hedge funds, owing to its low market and sector-specific risk.

Why co-integration: Using the R-squared statistic to check a regression can give misleading results, because time series with trends tend to produce what has come to be known as "spurious regression". Hence the need for co-integration.
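The spurious-regression point can be illustrated with a small simulation. This is a minimal sketch in Python using only the standard library (the project itself used R); the helper names, sample sizes and seed are assumptions made for illustration. It regresses pairs of independent series on each other and compares the average R-squared for random walks against white noise.

```python
import random

def r_squared(x, y):
    """R-squared of the least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def random_walk(rng, n):
    """Cumulative sum of independent standard normal steps (non-stationary)."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def white_noise(rng, n):
    """Independent standard normal draws (stationary)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def mean_r2(make_series, trials=200, n=500, seed=0):
    """Average R-squared over many pairs of independently generated series."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += r_squared(make_series(rng, n), make_series(rng, n))
    return total / trials

walk_r2 = mean_r2(random_walk)
noise_r2 = mean_r2(white_noise)
print(f"mean R^2, independent random walks: {walk_r2:.3f}")
print(f"mean R^2, independent white noise:  {noise_r2:.3f}")
```

The two series in each pair are unrelated by construction, so any sizeable R-squared for the random walks comes from the shared non-stationarity rather than from a real relationship.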

Page 3: FE 582 - Project Presentation

Project Focus and Objective

To give a brief idea of a pair trading strategy for the universe of sectoral ETFs.

The main aim is to thoroughly assess the sectoral ETFs, bucket them into sectors using an existing industry-wide classification, and outline trading strategies for the ETF pairs in each sector, based on whether or not co-integration exists between them.

To use machine learning algorithms to train on the data and predict the ETF spread, using the co-integrated Natural Resources pair CRBQ-GRES as an example.

Page 4: FE 582 - Project Presentation

Methodology & Technology

R for writing the code for the statistical model.

fUnitRoots and tseries packages for determining the co-integration property between ETFs.

quantmod and PerformanceAnalytics packages for portfolio statistics.

R to generate the visualizations, using ggplot and quantmod.

The machine learning tool Weka for ETF spread prediction. A model called Multi-Layer Perceptron is used for training on the data and predicting the spread.

Page 5: FE 582 - Project Presentation

Current Work

Development of a statistical model for pair trading using co-integration, back-tested over a period of 5 years.

Back-tested the entire universe of sectoral ETFs to arrive at the optimal portfolio of ETF pairs in the same sector.

One co-integrated pair from the Natural Resources industry, CRBQ-GRES, chosen to show visualizations of spreads, equity curves, scatterplots, etc.

Determination of the optimal buy and sell threshold levels based on P&L optimization.

The machine learning tool Weka used for training on part of the data and predicting the future spread with supervised learning methods.

Page 6: FE 582 - Project Presentation

Defining the Co-integration Model

Two ETFs A and B are co-integrated, with the time series corresponding to each being non-stationary.

We have two equations equating the scaled difference of log prices to the return of each ETF in the current time period.

Here ϒ is the co-integration coefficient, and each equation carries an error correction term. The scaled difference of log prices is termed the spread in our model.
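The two equations on this slide did not survive in the transcript. One plausible reconstruction, assuming the deck follows the error-correction form in Vidyamurthy's Pairs Trading (listed in the bibliography), is sketched here. The symbols $p^A_t$, $p^B_t$, $\alpha_A$, $\alpha_B$, $\varepsilon_{A,t}$ and $\varepsilon_{B,t}$ are assumptions, and $\gamma$ stands for the co-integration coefficient written ϒ on the slide.

```latex
\begin{align}
\log p^A_t - \log p^A_{t-1} &= \alpha_A \bigl(\log p^A_{t-1} - \gamma \log p^B_{t-1}\bigr) + \varepsilon_{A,t} \\
\log p^B_t - \log p^B_{t-1} &= \alpha_B \bigl(\log p^A_{t-1} - \gamma \log p^B_{t-1}\bigr) + \varepsilon_{B,t} \\
\text{spread}_t &= \log p^A_t - \gamma \log p^B_t
\end{align}
```

In this form the left-hand sides are the one-period log returns and the bracketed term is the previous period's spread. Which pair of symbols the slide labelled "error correction terms" cannot be recovered from the transcript.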

Page 7: FE 582 - Project Presentation

Defining the Co-integration Model

Consider a portfolio that is long one share of ETF A and short ϒ shares of ETF B. The return of the portfolio for a given time period is given as:
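The equation that followed is missing from the transcript. A hedged reconstruction, assuming log prices $\log p^A_t$ and $\log p^B_t$ (notation not taken from the slide), $\gamma$ for the coefficient written ϒ, and a holding period from $t$ to $t+i$:

```latex
\begin{align}
r_{t,\,t+i} &= \bigl[\log p^A_{t+i} - \log p^A_t\bigr] - \gamma \bigl[\log p^B_{t+i} - \log p^B_t\bigr] \\
            &= \bigl[\log p^A_{t+i} - \gamma \log p^B_{t+i}\bigr] - \bigl[\log p^A_t - \gamma \log p^B_t\bigr] \\
            &= \text{spread}_{t+i} - \text{spread}_t
\end{align}
```

Under these assumptions the portfolio return over the period is simply the change in the spread, which is why the strategy is stated in terms of spread levels.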

Consider the trading strategy where trades are put on and unwound on a deviation of Δ in either direction from the spread mean. Buy the portfolio (long ETF A and short ETF B) when the current spread is Δ below the mean. Similarly, sell the portfolio (short ETF A and long ETF B) when the current spread is Δ above the mean.
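The threshold rule above can be sketched in a few lines. This is an illustrative Python version, not the project's R code; the function name `pair_positions`, the toy spread values, and the reading that the position flips at each threshold and is held in between are assumptions.

```python
def pair_positions(spread, mean, delta):
    """Position in the portfolio (long A, short B) after each observation.

    +1 = long the portfolio, -1 = short the portfolio, 0 = no position yet.
    The position is switched when the spread is delta below or above the
    mean and is otherwise carried forward unchanged.
    """
    position = 0
    positions = []
    for s in spread:
        if s <= mean - delta:
            position = 1      # spread is cheap: buy the portfolio
        elif s >= mean + delta:
            position = -1     # spread is rich: sell the portfolio
        positions.append(position)
    return positions

example_spread = [0.00, -0.02, -0.01, 0.01, 0.03, 0.00, -0.03]
example_positions = pair_positions(example_spread, mean=0.0, delta=0.02)
print(example_positions)
```

A variant that goes flat when the spread returns to its mean would need one extra branch; the slide does not say which convention the authors used.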

Page 8: FE 582 - Project Presentation

Road Map for Strategy Design & Implementation

Data is downloaded directly from Yahoo using R code.

Use ETF pairs from the same sector and test for co-integration using the Augmented Dickey-Fuller (ADF) test. This involves determining the co-integration coefficient and examining the spread time series to ensure that it is stationary and mean-reverting.

This is achieved by regressing the log price series of one ETF on the other to get the regression coefficient, which is also known as the hedge ratio.
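The two steps described here (regress one log price series on the other, then check the resulting spread for stationarity) can be sketched as follows. This is a simplified Python illustration on synthetic data, not the project's R code: the Dickey-Fuller statistic below includes a constant but no lagged differences, whereas an ADF test as run in R typically adds lags, and the data-generating parameters are invented for the example.

```python
import math
import random

def ols(x, y):
    """Intercept and slope of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (w - my) for v, w in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def df_t_stat(series):
    """Dickey-Fuller t-statistic (constant, no lagged differences).

    Regresses the one-step change on the lagged level; a strongly negative
    value is evidence against a unit root, i.e. in favour of stationarity.
    """
    lagged = series[:-1]
    diff = [b - a for a, b in zip(series[:-1], series[1:])]
    a, rho = ols(lagged, diff)
    n = len(lagged)
    resid = [d - a - rho * l for l, d in zip(lagged, diff)]
    s2 = sum(e * e for e in resid) / (n - 2)
    ml = sum(lagged) / n
    sxx = sum((l - ml) ** 2 for l in lagged)
    return rho / math.sqrt(s2 / sxx)

# Synthetic log prices (assumed data): B is a random walk and
# A = 0.1 + 0.8 * B + stationary AR(1) noise, so the true hedge ratio is 0.8.
rng = random.Random(1)
log_a, log_b = [], []
level, noise = 3.0, 0.0
for _ in range(1000):
    level += rng.gauss(0.0, 0.01)
    noise = 0.5 * noise + rng.gauss(0.0, 0.005)
    log_b.append(level)
    log_a.append(0.1 + 0.8 * level + noise)

intercept, hedge_ratio = ols(log_b, log_a)
spread = [a - hedge_ratio * b for a, b in zip(log_a, log_b)]
spread_stat = df_t_stat(spread)
print(f"estimated hedge ratio: {hedge_ratio:.3f}")
print(f"DF t-statistic, spread: {spread_stat:.2f}")
print(f"DF t-statistic, log price of B: {df_t_stat(log_b):.2f}")
```

Because the hedge ratio is itself estimated, a residual-based test of this kind has its own, more negative critical values than the ordinary Dickey-Fuller table, so the statistic should be compared against the appropriate table rather than read in isolation.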

Page 9: FE 582 - Project Presentation

Road Map for Strategy Design & Implementation

If the p-value obtained from the ADF test is less than or equal to 0.01, we conclude that the series is stationary.

The entire universe of ETF pairs is run through the code to determine the co-integrated pairs.

The training data is then used to determine the value of delta that optimizes the profit function.

Delta is the threshold value at which the pair is bought or sold so as to maximize the profit.

Visualizations are generated for the co-integrated ETF pair from Natural Resources: CRBQ-GRES.

ETF spread prediction for the pair is implemented through supervised machine learning, using the classifier algorithm "Multi-Layer Perceptron" in the tool Weka.
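The delta-selection step can be sketched as a grid search over candidate thresholds. This is a toy Python illustration under stated assumptions: profit is measured in spread units, the position flips at the mean minus or plus delta, the spread is a synthetic sine wave, and the names `cumulative_profit` and `best_delta` are hypothetical rather than taken from the project.

```python
import math

def cumulative_profit(spread, mean, delta):
    """Profit, in spread units, from flipping the position at mean -/+ delta.

    The position is decided from the previous observation and earns the
    next change in the spread, so no future information is used.
    """
    position, profit = 0, 0.0
    for prev, curr in zip(spread[:-1], spread[1:]):
        if prev <= mean - delta:
            position = 1
        elif prev >= mean + delta:
            position = -1
        profit += position * (curr - prev)
    return profit

def best_delta(spread, candidates):
    """Candidate threshold with the highest cumulative profit on `spread`."""
    mean = sum(spread) / len(spread)
    return max(candidates, key=lambda d: cumulative_profit(spread, mean, d))

# Toy training spread (assumed): ten cycles of a sine wave with amplitude 0.02.
toy_spread = [0.02 * math.sin(2 * math.pi * t / 40) for t in range(400)]
candidates = [0.005, 0.010, 0.015, 0.030]
chosen = best_delta(toy_spread, candidates)
print("chosen delta:", chosen)
```

On this toy series the widest threshold that is still reached wins, while a threshold beyond the amplitude never trades and earns nothing. On real spreads the profit curve is noisier, which is why the deck fits delta on training data.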

Page 10: FE 582 - Project Presentation

Visualizations for the pair from Natural Resources sector CRBQ - GRES

Page 11: FE 582 - Project Presentation

Visualizations : CRBQ-GRES Pair

Page 12: FE 582 - Project Presentation

Visualizations : CRBQ-GRES Pair

Page 13: FE 582 - Project Presentation

Visualizations : CRBQ-GRES Pair

Page 14: FE 582 - Project Presentation

Training Data to get optimal level of Delta for Max Profit

[Figure: Cum.Profit; axis ticks run from 90 to 230 in steps of 20]

Page 15: FE 582 - Project Presentation

Performance Analysis – Co-integration portfolio

X1 9% X23 11% X45 6% X67 7% X89 4% X111 10% X133 3% X155 8%

X2 13% X24 6% X46 0% X68 7% X90 7% X112 8% X134 5% X156 4%

X3 8% X25 5% X47 3% X69 6% X91 4% X113 5% X135 3% X157 7%

X4 3% X26 6% X48 5% X70 8% X92 7% X114 3% X136 5% X158 9%

X5 6% X27 4% X49 4% X71 6% X93 6% X115 6% X137 1% X159 6%

X6 8% X28 1% X50 15% X72 3% X94 6% X116 9% X138 3% X160 7%

X7 5% X29 4% X51 10% X73 8% X95 7% X117 9% X139 8% X161 3%

X8 3% X30 6% X52 6% X74 9% X96 9% X118 3% X140 4% X162 4%

X9 3% X31 15% X53 8% X75 7% X97 9% X119 8% X141 3% X163 10%

X10 9% X32 5% X54 8% X76 7% X98 10% X120 4% X142 3% X164 7%

X11 2% X33 6% X55 9% X77 6% X99 10% X121 4% X143 3% X165 3%

X12 6% X34 6% X56 7% X78 5% X100 9% X122 4% X144 6% X166 3%

X13 8% X35 6% X57 5% X79 5% X101 6% X123 6% X145 7% X167 2%

X14 6% X36 11% X58 10% X80 8% X102 9% X124 6% X146 8% X168 6%

X15 10% X37 7% X59 34% X81 8% X103 7% X125 3% X147 4% X169 4%

X16 5% X38 2% X60 6% X82 9% X104 8% X126 6% X148 8% X170 3%

X17 5% X39 2% X61 5% X83 8% X105 12% X127 3% X149 5% X171 7%

X18 8% X40 5% X62 5% X84 9% X106 10% X128 5% X150 4% X172 5%

X19 4% X41 2% X63 8% X85 5% X107 4% X129 4% X151 10% X173 5%

X20 4% X42 6% X64 8% X86 3% X108 4% X130 2% X152 9% X174 4%

X21 9% X43 7% X65 6% X87 4% X109 8% X131 3% X153 8% X175 6%

X22 3% X44 7% X66 9% X88 7% X110 4% X132 6% X154 3% X176 7%

Average 2%

CO-INTEGRATED PAIRS PORTFOLIO

Worst Drawdowns for Co-integrated Pairs Portfolio

Page 16: FE 582 - Project Presentation

Performance Analysis – Co-integration portfolio

Particulars        Portfolio
Observations       1206
NAs                0
Minimum            99.9348
Quartile 1         104.78
Median             108.41
Arithmetic Mean    112.46
Geometric Mean     111.99
Quartile 3         118.09
Maximum            141.89
SE Mean            0.3043
LCL Mean           111.86
UCL Mean           113.05
Variance           111.67
Stdev              10.56
Skewness           1.0552
Kurtosis           0.1696
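Several rows of this table are linked by simple formulas, which makes for a quick consistency check. The sketch below recomputes the standard deviation, the standard error of the mean and an approximate 95% confidence interval from the reported observations, mean and variance. The 1.96 multiplier is a normal-approximation assumption; the R package that produced the table may use a t quantile, which is nearly identical at this sample size.

```python
import math

# Values as reported in the table above
n_obs = 1206
mean = 112.46
variance = 111.67

stdev = math.sqrt(variance)
se_mean = stdev / math.sqrt(n_obs)

z = 1.96  # assumed 95% normal quantile
lcl = mean - z * se_mean
ucl = mean + z * se_mean

print(f"Stdev    {stdev:.2f}")
print(f"SE Mean  {se_mean:.4f}")
print(f"LCL Mean {lcl:.2f}")
print(f"UCL Mean {ucl:.2f}")
```

Small differences in the last digit against the table are expected, since the inputs here are the rounded figures from the slide.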

Page 17: FE 582 - Project Presentation

ETF SPREAD PREDICTION THROUGH SUPERVISED MACHINE LEARNING

A Machine Learning tool called Weka has been used for the purposes of Spread prediction. The objective of spread prediction through this Machine Learning tool is to show the application of supervised Machine learning.

The same Natural Resources pair, CRBQ-GRES, has been used as an example. Two sample data sets are used: one uses 5 lagged (embedding-dimension) variables and the other uses 10 lagged variables.
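Building the lagged-variable data sets amounts to sliding a window over the spread series. A minimal Python sketch follows; the function name `lagged_dataset` and the sample spread values are illustrative assumptions, and the actual data preparation for Weka may have differed.

```python
def lagged_dataset(series, lags):
    """Rows of (previous `lags` values, next value) for supervised learning."""
    features, targets = [], []
    for t in range(lags, len(series)):
        features.append(series[t - lags:t])  # the `lags` values before time t
        targets.append(series[t])            # the value to be predicted
    return features, targets

sample_spread = [0.010, 0.012, 0.008, 0.005, 0.007, 0.011, 0.009]
X5, y5 = lagged_dataset(sample_spread, lags=5)
for row, target in zip(X5, y5):
    print(row, "->", target)
```

With 5 lags, a series of length n yields n - 5 training rows; with 10 lags it yields n - 10, so the longer embedding trades a few observations for more history per row.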

Page 18: FE 582 - Project Presentation

ETF SPREAD PREDICTION THROUGH SUPERVISED MACHINE LEARNING

Without repetitive data, the algorithm cannot be trained effectively enough to minimize the error between the actual and the predicted variable.

A classifier algorithm called Multi-Layer Perceptron is used for training on the data and developing the trained model.

Page 19: FE 582 - Project Presentation

Weka Analysis for the CRBQ – GRES Pair with lag - 5

[Figure: SPREAD and Predicted series; axis ticks run from -0.03 to 0.05]

Page 20: FE 582 - Project Presentation

Weka Analysis for the CRBQ – GRES Pair with lag - 5

=== Evaluation on test split ===
=== Summary ===

Lagged Variables               5
Correlation coefficient        0.9347
Mean absolute error            0.0028
Root mean squared error        0.0041
Relative absolute error        22.64%
Root relative squared error    27.12%
Total Number of Instances      412

Summary Statistics for 5 Lagged Variables
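For readers unfamiliar with these summary measures, the sketch below computes them from a list of actual and predicted values. It is an illustration in Python, not Weka's implementation. In particular, the relative errors here use the mean of the evaluated actual values as the naive baseline, which is an assumption: Weka may derive its baseline from the training data instead. The sample numbers are invented.

```python
import math

def regression_metrics(actual, predicted):
    """Correlation, MAE, RMSE and relative errors (in percent)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of a naive predictor that always outputs the mean of `actual`
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": cov / math.sqrt(base_sq * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_pct": 100.0 * abs_err / base_abs,
        "rrse_pct": 100.0 * math.sqrt(sq_err / base_sq),
    }

actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A relative error below 100% means the model beats the naive mean predictor, which is the sense in which the relative absolute and root relative squared errors in the summary should be read.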

Page 21: FE 582 - Project Presentation

Weka Analysis for the CRBQ – GRES Pair with lag - 10

[Figure: SPREAD and Predicted series; axis ticks run from -0.03 to 0.05]

Page 22: FE 582 - Project Presentation

Weka Analysis for the CRBQ – GRES Pair with lag - 10

=== Evaluation on test split ===
=== Summary ===

Lagged Variables               10
Correlation coefficient        0.9328
Mean absolute error            0.0041
Root mean squared error        0.0053
Relative absolute error        32.89%
Root relative squared error    35.32%
Total Number of Instances      412

Summary Statistics for 10 Lagged Variables

Page 23: FE 582 - Project Presentation

BIBLIOGRAPHY AND REFERENCES

Vidyamurthy, Ganapathy. Pairs Trading: Quantitative Methods and Analysis. (New York: John Wiley & Sons, Inc., 2004).

Elton, Edwin J. and Martin J. Gruber. Modern Portfolio Theory and Investment Analysis, 4th Edition. (New York: John Wiley & Sons, Inc., 1991).

Shumway, Robert H. and David S. Stoffer. Time Series Analysis and Its Applications: With R Examples, 3rd Edition. (New York: Springer, 2010).