MS CS - Algorithmic Strategy for Computational Investments


Upload: kaniska-mandal

Post on 23-Jan-2017


Category:

Technology



TRANSCRIPT

Page 1: MS CS - Algorithmic Strategy for Computational Investments

Formulating a Strategy to Maximize Risk-Adjusted Return through Statistical Analysis and Optimization

Goal

Using Python, explore a sample dataset (trading data), calculate statistical measures, optimize the weights (allocations), formulate the most profitable strategy, and finally apply machine learning for automatic algorithmic trading.

In essence, the idea is to show how one can compute correlation and variability on historical data, optimize, and make decisions (e.g. algorithmic trading) on live, current data.

Explore Dataset

Directly download text or CSV files:

    url = 'http://www.stoxx.com/download/historical_values/'

Read all data into a CSV file:

    data = web.DataReader("IBM", data_source='yahoo', start='1/1/2006')

Note: it is always a good idea to test every measurement against a benchmark or reference. Here SPY (an S&P 500 index fund) represents the market.

    import os

    import numpy as np               # used by the functions further below
    import pandas as pd
    import matplotlib.pyplot as plt
    import scipy.optimize as sco     # used by find_optimal_allocations


    def symbol_to_path(symbol, base_dir=os.path.join(".", "data")):
        """Return CSV file path given ticker symbol."""
        return os.path.join(base_dir, "{}.csv".format(str(symbol)))


    def get_data(symbols, dates, addSPY=True):
        """Read stock data (adjusted close) for given symbols from CSV files."""
        df = pd.DataFrame(index=dates)
        if addSPY and 'SPY' not in symbols:  # add SPY for reference, if absent
            symbols = ['SPY'] + symbols

        for symbol in symbols:
            df_temp = pd.read_csv(symbol_to_path(symbol), index_col='Date',
                                  parse_dates=True, usecols=['Date', 'Adj Close'],
                                  na_values=['nan'])
            df_temp = df_temp.rename(columns={'Adj Close': symbol})
            df = df.join(df_temp)
            if symbol == 'SPY':  # drop dates SPY did not trade
                df = df.dropna(subset=["SPY"])

        return df


    def plot_data(df, title="Stock prices", xlabel="Date", ylabel="Price"):
        """Plot stock prices with a custom title and meaningful axis labels."""
        ax = df.plot(title=title, fontsize=12)
        ax.set_xlabel(xlabel)
        ax.set_ylabel(ylabel)
        plt.show()

Statistical Analysis

Here is an example of how we can normalize entity values (in this case stock prices) so that statistical measures and data queries can be applied and calculated uniformly.

Normalization

    def get_portfolio_value(prices, allocs, start_val=1):
        """Compute daily portfolio value given stock prices, allocations and starting value.

        Parameters
        ----------
        prices: daily prices for each stock in portfolio
        allocs: initial allocations, as fractions that sum to 1
        start_val: total starting value invested in portfolio (default: 1)

        Returns
        -------
        port_val: daily portfolio value
        """
        normed_vals = prices / prices.iloc[0]   # every stock starts at 1.0
        allocated_vals = normed_vals * allocs   # weight each stock by its allocation
        pos_val = allocated_vals * start_val    # value of each position per day
        port_val = pos_val.sum(axis=1)          # total portfolio value per day

        return port_val
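As a small worked example of the normalization step, the sketch below applies the same arithmetic to made-up prices. The tickers AAA and BBB, the prices, the 60/40 allocation and the starting value of 1000 are all assumptions chosen only to keep the numbers easy to check by hand.

```python
import pandas as pd

# Hypothetical prices for two made-up tickers over three days.
prices = pd.DataFrame({"AAA": [10.0, 11.0, 12.0],
                       "BBB": [50.0, 50.0, 45.0]})
allocs = [0.6, 0.4]   # assumed allocations, summing to 1
start_val = 1000      # assumed starting capital

# Dividing by the first row makes every column start at 1.0,
# so stocks with very different price levels become comparable.
normed = prices / prices.iloc[0]

# Weight by allocation, scale by capital, and add up the positions per day.
port_val = (normed * allocs * start_val).sum(axis=1)

print(normed)
print(port_val)
```

With these inputs the portfolio is worth 1000 on the first day, about 1060 on the second (AAA up 10%, BBB flat) and about 1080 on the third (AAA up 20%, BBB down 10%).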

Common Measures

The key concept is to determine the volatility of the time-series data (usually the standard deviation) and to find a risk-adjusted return measure (i.e. the Sharpe ratio), which divides the average return by the volatility.
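Written out, the quantity that the get_portfolio_stats function below computes can be sketched as follows. The symbols are notation introduced here for illustration, not taken from the original: V_t is the portfolio value on day t, r_t the daily return, r_f the daily risk-free rate (daily_rf) and N the number of samples per year (samples_per_year, 252 by default).

```latex
r_t = \frac{V_t}{V_{t-1}} - 1,
\qquad
S = \sqrt{N}\;\frac{\operatorname{mean}(r_t - r_f)}{\operatorname{std}(r_t)}
```

The factor \sqrt{N} annualizes a ratio that is otherwise measured on daily data.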

    def get_portfolio_stats(port_val, daily_rf=0, samples_per_year=252):
        """Calculate statistics on given portfolio values.

        Parameters
        ----------
        port_val: daily portfolio value
        daily_rf: daily risk-free rate of return (default: 0%)
        samples_per_year: frequency of sampling (default: 252 trading days)

        Returns
        -------
        cum_ret: cumulative return
        avg_daily_ret: average of daily returns
        std_daily_ret: standard deviation of daily returns
        sharpe_ratio: annualized Sharpe ratio
        """
        k = np.sqrt(samples_per_year)
        daily_returns = (port_val / port_val.shift(1)) - 1  # first entry is NaN
        cum_ret = (port_val.iloc[-1] / port_val.iloc[0]) - 1
        avg_daily_ret = daily_returns.mean()  # mean() and std() skip the NaN entry
        std_daily_ret = daily_returns.std()
        # Use the same mean and standard deviation for the Sharpe ratio
        sharpe_ratio = k * (avg_daily_ret - daily_rf) / std_daily_ret

        return cum_ret, avg_daily_ret, std_daily_ret, sharpe_ratio

Optimization

We can optimize the weights for a given set of entity values to maximize the risk-adjusted return.

    def find_optimal_allocations(prices):
        """Find optimal allocations for a stock portfolio, optimizing for Sharpe ratio.

        Parameters
        ----------
        prices: daily prices for each stock in portfolio

        Returns
        -------
        allocs: optimal allocations, as fractions that sum to 1.0
        """
        def objectiveFunction(allocs):
            allocs = np.array(allocs)
            new_allocated_vals = initial_working_vals

            # disclaimer : credit for the following sharpe ratio formula goes to author of Python for Finance book
            # reference : https://github.com/yhilpisch/py4fi/blob/124cb260fa0a6d70f0945eca9eb79f6680c3f7d2/ipython/11_Statistics_a.ipynb

            portfolio_ret = np.sum(new_allocated_vals.mean() * allocs) * 252
            portfolio_vol = np.sqrt(np.dot(allocs.T, np.dot(new_allocated_vals.cov() * 252, allocs)))
            sharpe_ratio = portfolio_ret / portfolio_vol

            # minimizing the negative Sharpe ratio maximizes the Sharpe ratio
            return -sharpe_ratio

        num_allocs = prices.shape[1]  # one allocation per symbol (was hard-coded to 4)
        allocation_distribution = np.random.random(num_allocs)  # random initial guess
        normed_vals = prices / prices.iloc[0]
        allocated_vals = normed_vals * allocation_distribution

        # daily log returns (the constant allocation factors cancel in the ratio)
        initial_working_vals = np.log(allocated_vals / allocated_vals.shift(1))

        cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})  # allocations sum to 1
        bnds = tuple((0, 1) for x in range(num_allocs))          # each allocation in [0, 1]

        optimum_vals = sco.minimize(objectiveFunction, allocation_distribution,
                                    method='SLSQP', bounds=bnds, constraints=cons)

        return optimum_vals['x']


    def optimize_portfolio(start_date, end_date, symbols):
        """Simulate and optimize portfolio allocations."""
        # Read in adjusted closing prices for given symbols, date range
        dates = pd.date_range(start_date, end_date)
        prices_all = get_data(symbols, dates)  # automatically adds SPY
        prices = prices_all[symbols]  # only portfolio symbols
        prices_SPY = prices_all['SPY']  # only SPY, for comparison later

        # Get optimal allocations
        allocs = find_optimal_allocations(prices)
        allocs = allocs / np.sum(allocs)  # normalize allocations, if they don't sum to 1.0

        # Get daily portfolio value (already normalized since we use default start_val=1.0)
        port_val = get_portfolio_value(prices, allocs)

        # Get portfolio statistics (note: std_daily_ret = volatility)
        cum_ret, avg_daily_ret, std_daily_ret, sharpe_ratio = get_portfolio_stats(port_val)

        # Print statistics
        print("Start Date:", start_date)
        print("End Date:", end_date)
        print("Symbols:", symbols)
        print("Optimal allocations:", allocs)
        print("Sharpe Ratio:", sharpe_ratio)
        print("Volatility (stdev of daily returns):", std_daily_ret)
        print("Average Daily Return:", avg_daily_ret)
        print("Cumulative Return:", cum_ret)

        # Compare daily portfolio value with normalized SPY
        normed_SPY = prices_SPY / prices_SPY.iloc[0]
        df_temp = pd.concat([port_val, normed_SPY], keys=['Portfolio', 'SPY'], axis=1)
        plot_data(df_temp, title="Daily Portfolio Value and SPY")


    def test_run():
        """Driver function."""
        # Define input parameters
        start_date = '2010-01-01'
        end_date = '2010-12-31'
        symbols = ['GOOG', 'AAPL', 'GLD', 'HNZ']  # list of symbols

        # Optimize portfolio
        optimize_portfolio(start_date, end_date, symbols)

Deep Dive into Histograms, Kurtosis, Scatter Plots and Correlations

Say the daily return 0.9 occurs 3 times; that count is what the histogram plots as the 'frequency of value'.


(Pic-1)

Standard deviation: how much a value deviates from the mean.
Kurtosis: describes the tails of the distribution.

(Pic-2)

Note: positive kurtosis (fat tails) tells us that there are more occurrences out in the tails than there would be in a normal (Gaussian) distribution.

Note: wrong mathematical assumptions can lead to market catastrophe!
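To make the idea of fat tails concrete, the sketch below compares two synthetic samples of "daily returns": one Gaussian and one drawn from a Student's t distribution, which has heavier tails. Both samples are generated data, not market data, and the helper name tail_fraction is hypothetical. pandas reports excess kurtosis, so a Gaussian sample should come out near 0 and a fat-tailed sample clearly positive.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable

# Synthetic "daily returns": a Gaussian sample and a fat-tailed sample.
normal_returns = pd.Series(rng.normal(loc=0.0, scale=0.01, size=100_000))
fat_tail_returns = pd.Series(rng.standard_t(df=5, size=100_000) * 0.01)

# Series.kurtosis() returns excess kurtosis (0 for a normal distribution).
normal_kurt = normal_returns.kurtosis()
fat_kurt = fat_tail_returns.kurtosis()


def tail_fraction(returns, n_std=3.0):
    """Fraction of observations more than n_std standard deviations from the mean."""
    z = (returns - returns.mean()) / returns.std()
    return float((z.abs() > n_std).mean())


print("excess kurtosis, normal  :", round(normal_kurt, 3))
print("excess kurtosis, fat tail:", round(fat_kurt, 3))
print("share beyond 3 std, normal  :", tail_fraction(normal_returns))
print("share beyond 3 std, fat tail:", tail_fraction(fat_tail_returns))
```

The fat-tailed sample puts noticeably more observations beyond three standard deviations, which is exactly the kind of event a Gaussian assumption underestimates.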

Financial experts assumed that bond returns follow a normal (Gaussian) distribution with skinny tails and that the bonds were independent of each other, and hence ignored the variation in the tails; in effect they predicted that extreme negative returns would not occur.

On the contrary, the bonds were not independent but strongly correlated. When large numbers of homeowners defaulted on their loans, the result was the Great Recession of 2008, proving the market predictions wrong.

Now let's examine how the fund differs from the market and measure the correlation.
Smaller circle: shows that XYZ has a positive value when SPY is positive.
Bigger circle: shows that XYZ is negative when SPY is positive.


(Pic-3)

So each point in the scatter plot captures the direction in which XYZ is moving with respect to SPY.

If the slope (beta) is, say, 2, then on average XYZ moves 2% for every 1% move in the market, i.e. it offers double the market's return (in either direction).

Overall it is a linear fit with a positive relationship.
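The sketch below illustrates beta as the slope of a linear fit, and how it differs from how tightly the points hug the line. It uses synthetic returns only: market stands in for SPY, and tight and loose are two hypothetical stocks built with the same true beta of 2 but different amounts of noise. The helper name beta_and_corr is an assumption for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Stand-in for SPY daily returns (synthetic).
market = rng.normal(0.0, 0.01, size=5_000)

# Two hypothetical stocks with the same true beta (2.0) but different noise levels.
tight = 2.0 * market + rng.normal(0.0, 0.002, size=market.size)
loose = 2.0 * market + rng.normal(0.0, 0.05, size=market.size)


def beta_and_corr(stock, market):
    """Return (slope of the linear fit, Pearson correlation) of stock vs. market."""
    beta, alpha = np.polyfit(market, stock, 1)  # degree-1 fit: slope, intercept
    corr = np.corrcoef(market, stock)[0, 1]
    return beta, corr


beta_tight, corr_tight = beta_and_corr(tight, market)
beta_loose, corr_loose = beta_and_corr(loose, market)

print("tight: beta=%.2f corr=%.2f" % (beta_tight, corr_tight))
print("loose: beta=%.2f corr=%.2f" % (beta_loose, corr_loose))
```

Both fits recover a slope close to 2, yet the noisy stock has a much lower correlation: the slope says how far the stock moves per unit of market move, while the correlation says how reliably it does so.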

Note that slope is not the same as correlation. Correlation states how tightly the points fit the line.

Now let's see how we can make decisions based on the correlations.

The time series shows how the price of gold (GLD) goes up when SPY drops, whereas XOM consistently follows the pattern of SPY.

(Pic-4)


We see that the beta of XOM is higher than the beta of GLD, so XOM reacts strongly to the market. This is how we discover patterns in data.

    daily_returns.hist(bins=20)

    # Scatter plot of XOM vs. SPY daily returns, with the fitted line
    daily_returns.plot(kind='scatter', x='SPY', y='XOM')
    beta_XOM, alpha_XOM = np.polyfit(daily_returns['SPY'], daily_returns['XOM'], 1)
    plt.plot(daily_returns['SPY'], beta_XOM * daily_returns['SPY'] + alpha_XOM, '-', color='r')
    plt.show()

    # Scatter plot of GLD vs. SPY daily returns, with the fitted line
    daily_returns.plot(kind='scatter', x='SPY', y='GLD')
    beta_GLD, alpha_GLD = np.polyfit(daily_returns['SPY'], daily_returns['GLD'], 1)
    plt.plot(daily_returns['SPY'], beta_GLD * daily_returns['SPY'] + alpha_GLD, '-', color='r')
    plt.show()

    # Pairwise Pearson correlation of the daily returns
    print(daily_returns.corr(method='pearson'))


The GLD dots do not fit the line very closely, and as a result GLD has a low correlation value.


SPY-GLD has a weak correlation; SPY-XOM has a strong correlation.

Formulate best Strategy

We can use the concept of correlation to formulate the best strategy for our process (in our case, the trading strategy).

Overview

Develop trading strategies using technical analysis, then test them using a market simulator.

We will use IBM and trade it from December 31, 2007 until December 31, 2009.

Note that the orders are generated completely dynamically, based on the technical indicators.

There are many complex hypothesis functions (i.e. technical indicators, in trading terms); we have picked the simplest one to start with.

Part 1: Bollinger Band Strategy

References:
http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:bollinger_bands


Bollinger Bands include:
20-day simple moving average (SMA) line.
Upper Band = SMA + 2 * 20-day standard deviation.
Lower Band = SMA - 2 * 20-day standard deviation.

The strategy chart marks:
Long entries as a vertical green line at the time of entry.
Long exits as a vertical black line at the time of exit.
Short entries as a vertical red line at the time of entry.
Short exits as a vertical black line at the time of exit.
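The band definitions above can be sketched with pandas rolling windows. This is a minimal illustration under stated assumptions, not the code used for the backtest: the function name bollinger_bands is hypothetical, and the prices are a synthetic random walk rather than IBM data.

```python
import numpy as np
import pandas as pd


def bollinger_bands(prices, window=20, num_std=2.0):
    """Return (sma, upper, lower) Bollinger Bands for a price Series."""
    sma = prices.rolling(window=window).mean()   # simple moving average
    rstd = prices.rolling(window=window).std()   # rolling standard deviation
    upper = sma + num_std * rstd
    lower = sma - num_std * rstd
    return sma, upper, lower


# Synthetic random-walk prices on business days, for illustration only.
rng = np.random.default_rng(seed=2)
dates = pd.date_range("2008-01-01", periods=60, freq="B")
prices = pd.Series(100 + rng.normal(0, 1, size=len(dates)).cumsum(), index=dates)

sma, upper, lower = bollinger_bands(prices)
bands = pd.DataFrame({"price": prices, "sma": sma, "upper": upper, "lower": lower})
print(bands.tail())
```

The first 19 rows of each band are NaN because a 20-day window needs 20 observations before it can produce a value, so no signal can be generated in that warm-up period.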

Bollinger Band strategy chart:

(chart)

Bollinger Band strategy backtest chart:

(chart)

Summary of Bollinger Band backtest performance metrics:

Data Range: 2007-12-31 to 2009-12-31
Sharpe Ratio of Fund: 0.97745439803
Sharpe Ratio of SPY: -0.149576168451
Cumulative Return of Fund: 0.3614
Cumulative Return of SPY: -0.201395139514
Standard Deviation of Fund: 0.0108802922269
Standard Deviation of SPY: 0.0219136847778
Average Daily Return of Fund: 0.000669942567631
Average Daily Return of SPY: -0.000206479400499
Final Portfolio Value: 13614.0

Part 2: Custom Strategy

In a nutshell, my strategy is to determine the most profitable moment for a LONG and a SHORT entry.
We want to take advantage of the fact that IBM and SPY have a strong correlation.

          SPY       IBM
    SPY  1.000000  0.770765
    IBM  0.770765  1.000000

Close observation of the relationship between the trends of the IBM Bollinger Bands and the SPY Bollinger Bands shows that sometimes the Bollinger Band of the index continues downward (signaling a down market). In that case, even if there is an opportunity to buy IBM at a lower rate, we can still wait until the IBM lower band is below the SPY lower band.

So if we can exploit the lower ranges of the SPY Bollinger Band, there is an opportunity to buy at a much lower rate (a more effective LONG entry).

Similarly, if we wait for a SHORT entry until the IBM upper band crosses the rolling mean of SPY, then we can sell IBM stock at a higher rate.

My strategy has been formulated so that a SELL happens directly after a BUY, and we hold the positions as per the original requirement, i.e. we prefer consecutive BUY / SELL orders and always perform under leverage.

There is some room for improvement by identifying profitable EXIT points.


We should also determine automatically whether the SPY Bollinger Bands should be considered in this strategy at all, by calculating the covariance or by measuring the difference of the rolling means.
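One way such an automatic check might look is sketched below. It swaps in the Pearson correlation of daily returns (a normalized covariance) as the decision measure, and the 0.7 threshold, the function name should_use_index_bands and the synthetic price series are all assumptions made for illustration; none of them come from the original strategy.

```python
import numpy as np
import pandas as pd


def should_use_index_bands(stock_prices, index_prices, min_corr=0.7):
    """Return (use_bands, corr): use the index bands only if returns are well correlated."""
    prices = pd.DataFrame({"stock": stock_prices, "bench": index_prices})
    returns = (prices / prices.shift(1) - 1).dropna()   # daily returns
    corr = returns["stock"].corr(returns["bench"])      # Pearson correlation
    return bool(corr >= min_corr), float(corr)


# Synthetic data: one stock that tracks the index and one that is independent of it.
rng = np.random.default_rng(seed=3)
index_ret = rng.normal(0.0, 0.01, size=500)
tracking_ret = index_ret + rng.normal(0.0, 0.004, size=500)
independent_ret = rng.normal(0.0, 0.01, size=500)

index_prices = pd.Series(100 * np.cumprod(1 + index_ret))
tracking_prices = pd.Series(100 * np.cumprod(1 + tracking_ret))
independent_prices = pd.Series(100 * np.cumprod(1 + independent_ret))

use_tracking, corr_tracking = should_use_index_bands(tracking_prices, index_prices)
use_independent, corr_independent = should_use_index_bands(independent_prices, index_prices)

print("tracking stock   :", use_tracking, round(corr_tracking, 3))
print("independent stock:", use_independent, round(corr_independent, 3))
```

With a gate like this, the index bands would be consulted for a symbol such as IBM (correlation 0.77 with SPY in the table above) but skipped for a symbol whose returns barely move with the index.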

Excerpts of main logic:

    def generateOrders(df, rm, symbol, upper_band, lower_band,
                       rm_SPY, spy_upper_band, spy_lower_band):
        """Generate BUY/SELL orders from the IBM and SPY Bollinger Bands."""
        orders = pd.DataFrame(index=np.arange(df.size),
                              columns=['Date', 'Symbol', 'Order', 'Shares'])

        # Plot raw IBM values, rolling mean and Bollinger Bands
        ax = df[symbol].plot(title="Custom Strategy - Hybrid Bollinger Bands", label=symbol)
        rm.plot(label='IBM Rolling mean', ax=ax, color='black')
        upper_band.plot(label='IBM Upper band', ax=ax, color='cyan')
        lower_band.plot(label='IBM Lower band', ax=ax, color='cyan')
        spy_lower_band.plot(label='SPY Lower band', ax=ax, color='green')
        spy_upper_band.plot(label='SPY Upper band', ax=ax, color='green')

        long_entries = pd.DataFrame(index=df.index, columns=[symbol])
        short_entries = pd.DataFrame(index=df.index, columns=[symbol])
        long_exits = pd.DataFrame(index=df.index, columns=[symbol])
        short_exits = pd.DataFrame(index=df.index, columns=[symbol])

        last_position = 'NA'
        j = -1  # position of the last order written
        for i, date_index in enumerate(df.index):
            index_date = date_index.date()
            price = df[symbol].iloc[i]
            prev_price = df[symbol].iloc[i - 1]

            # LONG entry: price crosses back above the IBM lower band
            if (prev_price < lower_band.iloc[i - 1] and price > lower_band.iloc[i]
                    and last_position not in ('LONG_ENTRY', 'SHORT_EXIT')):
                # since IBM has a high correlation with SPY, we can use
                # the SPY lower band as an extra filter
                if lower_band.iloc[i - 1] <= spy_lower_band.iloc[i - 1]:
                    long_entries.loc[date_index, symbol] = price
                    last_position = 'LONG_ENTRY'
                    j = j + 1
                    orders.iloc[j] = [index_date, symbol, 'BUY', 100]
                    plt.vlines(x=index_date, colors='g', ymin=0, ymax=140)

            # SHORT entry: price is below the IBM upper band
            elif (prev_price > spy_lower_band.iloc[i - 1] and price < upper_band.iloc[i]
                    and last_position not in ('SHORT_ENTRY', 'LONG_EXIT')):
                if upper_band.iloc[i - 1] >= rm_SPY.iloc[i - 1]:
                    short_entries.loc[date_index, symbol] = price
                    last_position = 'SHORT_ENTRY'
                    j = j + 1
                    orders.iloc[j] = [index_date, symbol, 'SELL', 100]
                    plt.vlines(x=index_date, colors='r', ymin=0, ymax=140)

            # LONG exit: price reaches the rolling mean from below
            elif price >= rm.iloc[i] and last_position == 'LONG_ENTRY':
                long_exits.loc[date_index, symbol] = price
                last_position = 'LONG_EXIT'
                j = j + 1
                orders.iloc[j] = [index_date, symbol, 'SELL', 100]
                plt.vlines(x=index_date, colors='k', ymin=0, ymax=140)

            # SHORT exit: price reaches the rolling mean from above
            elif price <= rm.iloc[i] and last_position == 'SHORT_ENTRY':
                short_exits.loc[date_index, symbol] = price
                last_position = 'SHORT_EXIT'
                j = j + 1
                orders.iloc[j] = [index_date, symbol, 'BUY', 100]
                plt.vlines(x=index_date, colors='k', ymin=0, ymax=140)

        long_entries = long_entries.dropna(subset=[symbol])

        # Keep only the rows that were filled in, then save the orders file
        orders = orders.dropna(subset=["Symbol"]).sort_index()
        orders.to_csv('orders.txt')

        # Add axis labels and legend
        ax.set_xlabel("Date")
        ax.set_ylabel("Price")
        ax.legend(loc='lower center')
        #plt.show()

        return orders


    def test_run():
        # get_adj_closing, get_rolling_mean, get_rolling_std, get_bollinger_bands
        # and simulateMarket are not shown in this excerpt.

        # Read data
        dates = pd.date_range('2007-12-31', '2009-12-31')
        #dates = pd.date_range('2005-12-31','2010-12-31')
        SYMBOL_NAME = 'IBM'
        symbols = [SYMBOL_NAME]
        df = get_adj_closing(symbols, dates)

        # SPY data analysis
        # 1. Compute rolling mean
        rm_SPY = get_rolling_mean(df['SPY'], window=20)
        # 2. Compute rolling standard deviation
        rstd_SPY = get_rolling_std(df['SPY'], window=20)
        # 3. Compute upper and lower bands
        spy_upper_band, spy_lower_band = get_bollinger_bands(rm_SPY, rstd_SPY)

        # IBM data analysis
        # 1. Compute rolling mean
        rm_IBM = get_rolling_mean(df[SYMBOL_NAME], window=20)
        # 2. Compute rolling standard deviation
        rstd_IBM = get_rolling_std(df[SYMBOL_NAME], window=20)
        # 3. Compute upper and lower bands
        upper_band_IBM, lower_band_IBM = get_bollinger_bands(rm_IBM, rstd_IBM)
        # 4. Generate orders based on short and long exits
        orders = generateOrders(df, rm_IBM, SYMBOL_NAME, upper_band_IBM, lower_band_IBM,
                                rm_SPY, spy_upper_band, spy_lower_band)
        simulateMarket('orders.txt', '2007-12-31', '2009-12-31')

The following chart illustrates my strategy using the SPY and IBM Bollinger Bands:

(chart)

Backtest chart of my strategy:

(chart)

Comparing the two backtest charts, it is clearly evident that my strategy is more profitable than the direct Bollinger Band strategy.

Also, as per the performance metrics, my strategy offers about 51% profit (cumulative return 0.5093), whereas the normal Bollinger Band strategy offers about 36% (cumulative return 0.3614).

The key feature of my strategy is to follow the downward trend of SPY for long entries and the upward trend of SPY for short entries.

This strategy also works well for some other symbols, such as AAPL.

Summary of backtest performance metrics:

Data Range: 2007-12-31 to 2009-12-31
Sharpe Ratio of Fund: 1.56986970104
Sharpe Ratio of SPY: -0.149576168451
Cumulative Return of Fund: 0.5093
Cumulative Return of SPY: -0.201395139514
Standard Deviation of Fund: 0.00861653636147
Standard Deviation of SPY: 0.0219136847778
Average Daily Return of Fund: 0.000852117365506
Average Daily Return of SPY: -0.000206479400499
Final Portfolio Value: 15093.0

Conclusion

There are many other, more complex technical indicators and many other strategies, which can be analyzed in parallel to find the best one at the expense of some massive historical calculation.

Computational investing offers a great opportunity to apply intelligent algorithms, formulate the best strategy based on various statistical measures, and perform algorithmic trading to maximize risk-adjusted returns.