Automated Stock Trading and Portfolio
Optimization Using XCS Trader and Technical
Analysis
Anil Chauhan
Master of Science
Artificial Intelligence
School of Informatics
University of Edinburgh
2008
Abstract
The financial market is a highly dynamic system in which finding the underlying price pattern
is highly complex. We extend previous work on automatic stock trading using the extended
classifier system (XCS) by implementing the Q (1) and Q (λ) Reinforcement Learning algorithms.
We developed 14 XCS agents using different technical indicators such as moving averages, RSI,
CMF, SAR and ADX. We show that by modeling financial prediction as a single step
reinforcement learning problem and using the concept of delayed reward to check the
correctness of the action taken, all the benchmark strategies, such as buy and hold and
'keeping money in bank', can be beaten. We also show that stock price movement is correlated
with the price movement on other days, and we reformulate financial forecasting as a multi
step process. We introduce the concept of a passive set and find that the multi step problem
formulation gives the best results. Q learning gave 18% better performance than single step
reward only RL. Finally, we build a portfolio management and optimization system which learns
online and does monthly or quarterly rebalancing using the best trader to trade. The results
show that reacting to market dynamics doesn’t necessarily give the best result: such a system
gives average performance between the best trader and the worst trader. We also employ
different trading strategies like “using more than 1 best agent” and “mean reversal strategy”
to do portfolio optimization.
Acknowledgements
I would like to thank my supervisor Sonia Schulenburg for introducing me to the world of
Finance and Classifier Systems and for giving constant feedback on my project. I would also
like to thank Abu ul Hassan for sharing the previous version of the XCS Java code with me.
Many thanks to my friend Santosh for reviewing my initial draft of the thesis and sharing
ideas on the same.
Declaration
I declare that this thesis was composed by me, that the work contained herein is my own except
where explicitly stated otherwise in the text, and that this work has not been submitted for
any other degree or professional qualification except as specified.
(Anil Chauhan)
Table of Contents
1 Introduction
1.1 Introduction and Purpose
1.2 Motivation
1.3 Objective
1.4 Outline
2 Background & Related Work
2.1 Background
2.1.1 Market Efficiency
2.1.1.1 Version of Efficient Market Hypothesis (EMH)
2.1.2 Technical Analysis
2.1.3 The Portfolio:
2.1.3.1 Why do we need Portfolio?
2.1.3.2 Portfolio Management:
2.2 Related Work
2.2.1 Machine Learning in Finance and Portfolio Management
2.3 XCS Introduction from Stock Trading Perspective
2.3.1 XCS Input and Output
2.3.2 XCS Frame Work [15]
2.3.3 XCS Learning Cycle
2.3.3.1 Updating XCS Parameters
2.3.3.2 Genetic Algorithm role and rule evolution [15]
2.3.4 Deviation from other LCS based Systems
2.3.5 Mind of XCS System
3 Implementation
3.1 Technical Analysis Usage in XCS
3.1.1 Description of individual technical Indicators
3.1.2 Combining different technical indicators and working mechanism of different Agents
3.1.2.1 Composition of 14 Agents:
3.1.3 Advantage and “Scope of Improvement” of current approach
3.2 Improving the learning of eXtended Classifier System
3.2.1 Classifiers in multi step Reinforcement learning problems
3.2.2 Implementing Q learning in Classifier
3.2.3 Eligibility trace and Watkins's Q(λ)
4 Experimentation
4.1 FTSE data and Stability of the XCS System
4.1.1 FTSE Data
4.1.2 Stability of the XCS System
4.2 Comparative Study of the 3 different Algorithm
4.2.1 Setting the parameters for the experiments
4.2.1.1 Setting Initial Exploration Rate
4.2.1.2 Setting discount rate (gamma)
4.2.1.3 Setting Trace Decay Parameter
4.3 Experimental Results for 3 learning Algorithm
4.3.1 Observations:
4.4 Fault in the previous Reward giving strategy
4.4.1 Experiments with improved delayed reward Strategy
4.4.1.1 Setting Initial Exploration Rate
4.4.1.2 Setting Discount Rate (gamma)
4.4.1.3 Setting Trace Decay (λ)
4.4.2 Results with new delayed reward strategy
4.4.2.1 Observations:
4.4.3 Experimental Results for all 3 learning algorithm with new delayed reward strategy
4.4.3.1 Observations
4.4.4 Fault in Q (1) learning
4.4.5 Experimentation with passive Set
4.4.5.1 Finding optimum parameters
4.4.5.2 Observations for Passive set
5 Implementation & Experimentation-Portfolio Optimization
5.1 Implementation
5.2 Portfolio Performance:
5.3 Results
Portfolio Management Results
5.3.1.1 Observations: Portfolio Management results
5.3.1.2 Analysis and Comments on Portfolio Management System
5.3.2 Change of Portfolio construction Strategy
5.3.2.1 Results of Portfolio Management using more than 1 Agent
5.4 Experimentation with Portfolio Management taking few best companies in the Portfolio
5.4.1 Steps Followed
5.4.2 Results
5.4.3 Observation:
5.4.4 Experimentation with Mean Reversal Strategy
5.4.4.1 Results
5.4.4.2 Observations
6 Conclusion & Future Work
6.1 Conclusion
6.2 Future Work
Bibliography
Appendix
List of Figures
Figure 1: XCS Frame Work [15]
Figure 2: Voting Strategy [19]
Figure 3: Q-learning: An off-policy TD control algorithm [14]
Figure 4: Backward view of eligibility trace [14]
Figure 5: Tabular version of Watkins's Q (λ) algorithm [14]
Figure 6: Mean Performance of 5 companies for different number of runs
Figure 7: DSGI Price Chart
Figure 8: DSGI, Wealth Chart of agents with old reward strategy
Figure 9: DSGI, Agent’s performance with new delayed reward giving strategy
Figure 10: LAND, Price Chart
Figure 11: LAND, Meta Agents wealth chart with old reward strategy
Figure 12: LAND, Meta Agents wealth chart with new delayed reward strategy
Figure 13: Portfolio Management Results
Figure 14: Portfolio management using either best 5 or best 10 or best 20 companies
Figure 15: Portfolio Management System using trend reversal strategy
Figure 16: Comparison mean reversal strategy with normal strategy
List of Tables
Table 1: Composition of different Agents
Table 2: Details of 10 FTSE 100 Companies
Table 3: Experimental results for 100 run on 5 FTSE100 companies
Table 4: Setting Exploration rate
Table 5: Combined Results for Different exploration rate
Table 6: Setting discount rate
Table 7: Setting trace decay parameter
Table 8: Experimental Results for 3 learning methodology
Table 9: Setting Exploration Rate
Table 10: Setting discount rate
Table 11: Setting trace decay
Table 12: Comparative results for 90 FTSE 100 companies with delayed reward strategy for single step reward only RL
Table 13: Results of 90 FTSE100 Companies for different RL with new delayed reward strategy
Table 14: Comparative Results for FTSE 100 companies using Passive Set
Table 15: Comparison with Only active set and with additional passive set approach
Table 16: Combined Results single step RL without Passive set and multi step Q Learning with passive set
Table 17: Portfolio Optimization using single best Agent
Table 18: Monthly Portfolio management Using Best 3 agents
Table 19: Monthly Portfolio Rebalancing using different Number of Best Agents
Table 20: Portfolio Management System with quarterly re-balancing taking best 5 companies
Chapter 1
1 Introduction
1.1 Introduction and Purpose
Advances in modern machine learning such as evolutionary computation have
enabled us not only to analyze data more efficiently, but also to understand
underlying patterns present in the financial market. Effective exploitation of these new
computational methods will help organizations make better informed decisions, which
will further improve their competitive edge [5]. Many different approaches, such as Neural
Networks (NN) and Genetic Algorithms (GA), have been widely applied to predict the
financial market. However, for some of these systems the input data might not be as
rich in information content as technical indicators such as various types of moving
averages, break out rules, maximum and minimum prices in the preceding days or
fundamental indicators such as dividends, interest rates and money supply. More
recently, learning classifier systems (rule based models) have shown promise in
academic work by overcoming some of the most common drawbacks that
neural models present to practitioners, such as the lack of explanatory power, high
variance in results and the need to continually retrain the nets when performance starts
to decrease. In addition, very few models have addressed the integration of more than
one learning paradigm within a single platform.
In this project, we will first try to improve the learning of the classifier system
by incorporating different Reinforcement Learning algorithms. In all previous
versions of the eXtended Classifier System (XCS), financial forecasting was
considered solely as a single step Reinforcement Learning problem. However, we believe
price movement is not completely erratic and that prices do follow patterns of ups and
downs. The price on any day affects the prices of the coming days and is in some way
correlated with them. For this reason we will try to model financial forecasting as a multi
step process and implement the Q (1) and Q (λ) Reinforcement Learning algorithms. Secondly,
we will investigate methods of portfolio construction and portfolio optimization. This will
be achieved by using a system which evolves technical trading agents, each learning to
trade stocks by modeling groups of traders using a variety of sets of technical
indicators. For portfolio construction, the main task will be to build a portfolio
management system attached to XCS which picks the best agents that can give the
maximum benefit in the long run. For simplicity, the equity market will be the prime
focus of this investigation, although (if time permits) tests could be extended to
foreign exchange to address the scalability of the approach.
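To make the single step versus multi step distinction concrete, the following is a minimal sketch of a one-step tabular Q-learning update. It is an illustration only and not the XCS implementation used in this thesis: the action names, the bit-string states and the values of `alpha` and `gamma` are assumptions chosen for the example. Setting `gamma` to 0 reduces the update to a single step, reward only formulation.

```python
from collections import defaultdict

# Hypothetical action set, chosen only for this illustration.
ACTIONS = ("buy", "sell", "hold")


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q value.

    alpha (learning rate) and gamma (discount rate) are illustrative
    defaults, not the values tuned in the experiments.
    """
    # Off-policy target: reward plus the discounted best value of the next state.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Usage: states are written as bit strings, as an XCS input would be.
q = defaultdict(float)
print(q_update(q, "1010", "buy", reward=1.0, next_state="0110"))
```

With `gamma` greater than 0, the value of an action taken today depends on the best value reachable from the following state, which is what allows information to flow across consecutive trading days.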
1.2 Motivation
The financial market is a highly dynamic system which depends on multiple
factors such as bank interest rates, company base strength and news that changes
every day. The motivation behind this project is to develop learning systems that can learn
in an online fashion and cope with a rapidly changing financial market environment.
The main idea is to develop a set of robust agents which use Technical Analysis
information and different Reinforcement Learning algorithms to do automatic stock
trading using the eXtended Classifier System (XCS). They should be robust in the sense
that they should be able to trade profitably (beat different benchmark strategies) on all
FTSE 100 stocks. Finally, we wish to build a portfolio optimization system which will
interact with this modified eXtended Classifier System (XCS), harness the
strength of the best agents and use different strategies like “taking best companies in the
portfolio” and “mean reversal strategies” to do monthly or quarterly portfolio
rebalancing.
1.3 Objective
The project has been subdivided into the following sub tasks:
1) Optimize the problem formulation for XCS: The agents present in the current
system convert the information given by the technical indicators into input
(a binary string) for the XCS. The XCS system then tries to learn the
optimum decision it should take when faced with a particular combination of
binary bits. The better we define the problem (i.e. combine the technical
indicator information) for the XCS, the better it will learn the underlying price
pattern and the more robust and reliable the decisions (buy, sell or hold) we
expect it to give. The aim is to develop such robust agents.
2) Formulate automatic stock trading and financial forecasting as a multi step
problem instead of a single step problem by implementing the Q learning and Q (λ)
learning algorithms.
3) Experiment with the reward mechanism of the XCS reinforcement learning portion to
see if delayed reward feedback is better than the currently employed immediate
reward feedback.
4) Explore the possibility of either giving a negative reward to the agents for taking
any incorrect action or creating and rewarding a passive set for the incorrect
decision taken by an agent.
5) Build a Portfolio Management System attached to the current XCS System.
The portfolio management system will be responsible for:
a) Portfolio construction using different trading strategies like "utilizing
combinations of best agents instead of a single agent" and "mean reversal
strategies".
b) Optimizing the portfolio by pro-active monthly, quarterly or yearly
rebalancing.
6) Compare the performance of the trading agents against benchmark agents like
buy and hold, bank etc.
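The first sub task, turning technical indicator information into a binary string for the XCS, can be sketched as follows. This is a hypothetical two bit encoding built from simple moving averages; the function names, the window lengths and the meaning of each bit are assumptions for illustration and do not reproduce the encoding of any of the 14 agents.

```python
def moving_average(prices, window):
    """Simple moving average of the last `window` prices."""
    if len(prices) < window:
        raise ValueError("not enough prices for the requested window")
    return sum(prices[-window:]) / window


def encode_state(prices, short_window=3, long_window=5):
    """Build a hypothetical 2-bit XCS input from a list of closing prices.

    bit 0: 1 if the latest price is above its long moving average
    bit 1: 1 if the short moving average is above the long moving average
    """
    long_ma = moving_average(prices, long_window)
    short_ma = moving_average(prices, short_window)
    bits = [prices[-1] > long_ma, short_ma > long_ma]
    return "".join("1" if b else "0" for b in bits)


print(encode_state([10, 11, 12, 13, 14]))  # rising prices -> "11"
print(encode_state([14, 13, 12, 11, 10]))  # falling prices -> "00"
```

Each additional indicator condition would add one bit to the string, so the choice of conditions determines how much of the price pattern the classifier system is able to see.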
1.4 Outline
The rest of this thesis is organized as follows:
Chapter 2 gives the background information on the subject and briefs the
reader on the current state of the art in the financial forecasting domain.
Chapter 3 describes how we implemented the different agents and learning
algorithms proposed in this thesis. It also describes the errors we found
during implementation and how we changed our approach to tackle different
problems.
Chapter 4 presents the experimental results obtained after implementing the different
learning algorithms, together with our critical analysis of the results.
Chapter 5 presents the implementation and experimental results for the portfolio
construction and optimization system, together with a critical analysis of the same.
Chapter 2
2 Background & Related Work
2.1 Background
2.1.1 Market Efficiency
Maurice Kendall in his random walk experiment (1953) [16] found that
stock prices are completely random and have no relation to past performance. The
unpredictable price movement seems to confirm the irrationality of the market.
However, on deeper analysis it became apparent that random price movement
indicates a well functioning or efficient market and not an irrational one [17]. In its
most basic form, the Efficient Market Hypothesis says that markets are information
efficient, i.e. all the available information that could be used for profit making
is quickly absorbed into stock prices, and prices may increase or decrease only
in response to new, previously unavailable and unpredictable information.
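One informal way to look at the random walk claim is to measure how strongly each return is related to the previous one. The sketch below computes the lag-1 autocorrelation of a return series; a value close to zero is what a random walk would suggest. This is an illustrative diagnostic under our own assumptions, not a formal statistical test and not a procedure used in [16] or [17].

```python
def simple_returns(prices):
    """Period-to-period simple returns of a price series."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a series (standard biased estimator)."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return num / denom


# Usage on a toy price series (not real market data).
prices = [100, 110, 99, 108.9, 98.01]
print(lag1_autocorrelation(simple_returns(prices)))
```

A strongly positive value would indicate trending returns and a strongly negative value would indicate reversals, both of which would be at odds with the weak form of the hypothesis described next.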
2.1.1.1 Version of Efficient Market Hypothesis (EMH)
There are 3 forms of EMH, which differ in what “all available information” is
composed of.
1) The weak form hypothesis states that stock prices already reflect all market
trading information, such as past prices and volume movements. It means that if any
form of past price or volume data could be used to generate a reliable
trading signal, then all investors would have used it by now, making the
information fruitless [17]. It suggests that any form of technical analysis is
useless.
2) The semi strong form hypothesis states that any publicly available information,
such as the prospects of the firm, including fundamental data on the firm’s product line,
quality of management, balance sheet composition, patents held, earnings
forecasts and accounting practices, must also already be reflected in stock
prices [17]. This hypothesis makes fundamental analysis useless as well.
3) The strong form hypothesis states that stock prices also reflect information
available only to company insiders. Such information also generally
spreads very quickly, leaving very little room for making profits.
Summarizing, we can say that if there is any pattern or information that is
exploitable, then the mass of astute investors would attempt to profit from such
predictability, which would ultimately move the stock price and cause the trading strategy
to self destruct.
2.1.2 Technical Analysis
Technical analysis is mainly the search for recurrent and predictable patterns in
stock prices using past price or volume data. Technical analysis, like weather
forecasting, doesn’t result in absolute predictions about the future but helps investors
anticipate what is most likely to happen to prices over time. Dow Theory lies at the
root of technical analysis. Two important points from Dow Theory are:
1) Prices discount everything. The current price of a stock fully reflects all the
information. Technical analysis utilizes the information captured by the price
to interpret what the market is saying, with the purpose of forming a view on
the future [18].
2) Price movements are not totally random. Most technicians believe that there
are periods of trending prices interspersed between random fluctuations.
The technician's aim is to identify the trend and then make use of it to trade or
invest. More detail about technical analysis and how we have used it in our
XCS System is presented in section 3.1.
2.1.3 The Portfolio:
A portfolio is a combination of different investment assets mixed and matched
for the purpose of achieving an investor's goal(s). A portfolio can be viewed as a pie-
chart where each portion represents an allocation of the investment [6].
2.1.3.1 Why do we need Portfolio?
The aim of portfolios is diversification. Different securities perform differently
at any given point in time, so the idea is that with a mix of assets, the entire portfolio
would not suffer the impact of a decline of any one security. It’s like following the
simple practice of not putting all your eggs in one basket. Spreading investment across
various types of assets and markets reduces the risk of catastrophic financial losses.
2.1.3.2 Portfolio Management:
Portfolio management is defined as the art and science of making decisions
about investment mix and policy, matching investments to objectives, asset allocation
for individuals and institutions, and balancing risk against performance [6]. It is an
attempt to maximize return at a given appetite for risk. In the case of mutual and
exchange traded funds (ETFs), there are two forms of portfolio management: passive
and active. Passive management simply tracks a market index, an approach commonly referred
to as indexing or index investing. Active management usually involves a single manager,
co-managers, or a team of managers who attempt to beat the market return by actively
managing a fund's portfolio through investment decisions based on research and
decisions on individual holdings. Closed end funds are generally actively managed.
2.2 Related Work
Fama’s Efficient Market Hypothesis [20] (Section 2.1.1) and the Martingale
Model [21], [22] rule out the use of any strategy, publicly available information, private
information, return/dividend information or technical analysis to obtain excess market
returns. There are proponents on both sides: some believe we can somehow predict the price
and others believe that prices are completely random. For example, Burton J. Malkiel
in his Random Walk experiment [23] [24] showed that prices are completely random,
whereas work published by MIT's Prof. Andrew Lo and Craig A. MacKinlay [25] points
out that there is a long term trend in prices. Lo and MacKinlay investigated weekly
US stock returns from 1962 to 1985 and found that the random walk hypothesis could be easily
rejected. They also described several techniques for detecting predictabilities and
evaluating their statistical and economic significance. Work done by Pin Chen and Mu
Yen Chen [7] using an XCS based decision support system with technical indicators
showed promise in predicting stock price fluctuations efficiently and generated good
returns. Schulenburg [10] in her PhD research developed an LCS model of artificial
traders and tested it in the stock market using several groups of technical indicators.
Stone [26] in his PhD work applied ZCS to the foreign exchange market. Competitive
returns were generated in their work in most cases, which suggests that LCS models
can be successfully applied when modeling financial markets. More recently, Chen and
Lin [8] used XCS for predicting future market price movements. Their model used
moving averages of price and volume to construct the environmental message.
Gershoff and Schulenburg [9] explored the collective behavior of XCS agents to
achieve accuracy in prediction.
2.2.1 Machine Learning in Finance and Portfolio Management
Moody and Saffell [1] presented methods for optimizing portfolios by using an
adaptive algorithm, Recurrent Reinforcement Learning (RRL), for discovering
investment policies. They demonstrated how direct reinforcement can be used to
optimize risk adjusted investment returns (including the differential Sharpe ratio)
while accounting for the effects of transaction costs. The RRL algorithm learns
profitable trading strategies in two ways:
● Maximize risk adjusted return as measured by the Sharpe ratio. They used a
modified form of the Sharpe ratio, called the differential Sharpe ratio, for
online optimization of the trading system.
● Avoid downside risk by maximizing the Downside Deviation Ratio (DDR). The
downside deviation (DD) is defined as the square root of the average of the
squared negative returns. Using DD as the measure of risk, they used the DDR
as the utility function.
The RRL trader performed far better than the Q trader; it enables a simpler problem
representation, avoids Bellman’s curse of dimensionality and offers compelling
advantages in efficiency.
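The two risk measures mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the Sharpe ratio is computed in its plain batch form with a zero risk-free rate (it is not the differential Sharpe ratio of [1]), and the downside deviation ratio is taken to be the mean return divided by the downside deviation. The sample returns are made up for the example.

```python
import math


def sharpe_ratio(returns):
    """Mean return divided by the population standard deviation of returns.

    Assumes a zero risk-free rate; this is the plain batch form, not the
    differential Sharpe ratio used for online learning.
    """
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / n
    if variance == 0:
        raise ValueError("returns have zero variance")
    return mean / math.sqrt(variance)


def downside_deviation(returns):
    """Square root of the average squared negative return."""
    return math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))


def downside_deviation_ratio(returns):
    """Mean return divided by the downside deviation (assumed definition)."""
    dd = downside_deviation(returns)
    if dd == 0:
        raise ValueError("no negative returns; ratio is undefined")
    return (sum(returns) / len(returns)) / dd


# Usage with made-up returns.
sample = [0.02, -0.01, 0.03, -0.02]
print(sharpe_ratio(sample), downside_deviation_ratio(sample))
```

Because the downside deviation ignores positive returns, the DDR penalizes only losses, whereas the Sharpe ratio penalizes volatility in both directions.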
Gao and Chan [2] presented a trading and portfolio management system
called QSR which uses the Q learning and Sharpe ratio algorithms. They used absolute
profit and relative risk adjusted profit, respectively, as the performance functions to
train the system. The experiments, conducted on a trading example based on foreign
exchange rates, showed promising results.
Neuneier [3] formalized asset allocation as a Markovian Decision Problem and
optimized it using dynamic programming and Q learning algorithms. Neural networks
were used for value function approximation. Experimental results on the German stock
market showed this strategy to be better than a heuristic benchmark policy.
Schulenburg and Ross [29][30] developed an LCS model in which traders used
technical indicators to predict the price of IBM stock. The system was able to beat all
the benchmark agents. Kyong and Sungky [4] used genetic algorithms to propose a
portfolio optimization scheme for index fund management. Index funds are designed
to copy the benchmark index with a relatively small number of stocks. The paper
reported that an index fund could improve its performance greatly with the proposed GA
portfolio scheme. Their proposed scheme is based on three fundamental variables:
portfolio beta, trading amount and market capitalization. They demonstrated the
results for an index fund designed to track the Korean Stock Price Index.
Dempster and Jones [11] aimed to develop an adaptive trading system that
trades profitably by emulating the behavior of technical traders who adapt to the
market by changing their trading strategies. Their trading system uses genetic
programming to find the best combination of technical indicators to trade. The genetic
algorithm can choose the combination of technical indicators from an initial set of 6
technical indicators, namely AMA, CCI, MACD, MA Crossover, Price Channel, RSI
and Stochastic. They used a modified form of the Sterling ratio to gauge the performance
of trading strategies.
S = Return / (1 + modified drawdown)
Alongside finding trading strategies via genetic programming, they also tried to
optimize the resulting portfolio by re-optimizing it quarterly. Their experimental results
showed that such a system, which uses a combination of technical indicators, can make a
profit. The best strategy employed was able to give a return of 7% pa. However, on
average they weren’t able to beat the buy and hold strategy. They also showed that trading
in an adaptive manner, wherein the trading strategy is optimized quarterly, is
ultimately loss making, which highlights the penalty for over-reaction to short term
market behavior [11].
Schulenburg and Wong [12] experimented with portfolio allocation using an XCS
system by combining input data from technical analysis, general market conditions
and options market conditions. Their best performing agents performed substantially
better than benchmark agents such as buy and hold, trend following, bank agent and
random agent. However, an XCS agent’s performance varies depending on the initial random
seed chosen, and a single best performing agent can’t tell much about the performance
of the overall system in general.
Dempster, Payne, Romahi and Thompson [13] used technical indicators for
intraday FX trading using reinforcement learning and genetic programming
techniques. The set of technical indicators they used were price channel breakout,
adaptive moving average, relative strength index, stochastic, moving average
convergence divergence, moving average crossover, momentum oscillator and
commodity channel index. The performance of the system was judged on the basis of
the Sharpe ratio and the Sterling ratio. Their experiments were able to generate significant
in-sample and out-of-sample profits. However, none of the methods produced significant
profits at realistic transaction costs.
2.3 XCS Introduction from Stock Trading Perspective
XCS stands for eXtended Classifier System. It is an accuracy based classifier
system which differs from other classifier systems in that classifier fitness is
derived from the estimated accuracy of reward predictions instead of from the reward
predictions themselves. It is an online learning machine, which improves its behavior
with time through interaction with the environment. XCS learns through reinforcement,
and the aim is not only to get more immediate reward but to maximize the value (the sum of
all the rewards in the long run). The system is given the least amount of prior information, so
that most of the machine's knowledge results from adaptation to the environment. We
don’t tell it how to do things but let it learn through the inputs it is fed, the actions it takes and the
reinforcement it gets. If it does well, we give it a positive reward; otherwise we penalize it in some
form.
2.3.1 XCS Input and Output
XCS Input Unit: The input to XCS is a binary vector, e.g. 10010110, where each bit
can be thought of as indicating whether the continuous valued output of some
sensor has crossed a threshold. In our XCS, every day the system gets the previous day's data about the individual
stock price (open, high, low and close) and volume. The meta agents present in the
system apply technical analysis to this information to get buy (1) and sell (0) signals. A
very simple example is the moving average of price and volume. Let’s say a particular
agent calculates the moving average of price and volume for the past 10 days. If the
closing price of the stock is greater than the 10 day moving average of price, it suggests the price
may go up and that’s a buy (1) sign. Similarly, if the volume of the stock is greater
than the 10 day moving average of volume, it also gives a buy (1) sign. So the input
string that will be fed into the XCS will be 11. This is a very simple example. In
practice, the system uses more advanced technical analysis to form the input binary string,
which may range from 6 to 9 bits in length. Each bit can be either 0 (sell) or 1 (buy).
More detail about how the problem is defined to the XCS system using
technical analysis can be found in section 3.1.
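The two-bit example above can be sketched as follows. This is a minimal illustration, not the thesis code: the class and method names are hypothetical, and it assumes that the moving average window ends at (and includes) the day being evaluated.

```java
// Sketch only: names are illustrative; the window is assumed to include the current day.
final class MovingAverageInput {
    /** Simple average of the last `window` values ending at `index` (inclusive). */
    static double sma(double[] series, int index, int window) {
        double sum = 0.0;
        for (int i = index - window + 1; i <= index; i++) {
            sum += series[i];
        }
        return sum / window;
    }

    /** Two-bit input string: price bit followed by volume bit; 1 = buy, 0 = sell. */
    static String inputString(double[] close, double[] volume, int index, int window) {
        StringBuilder binary = new StringBuilder();
        binary.append(close[index] > sma(close, index, window) ? '1' : '0');
        binary.append(volume[index] > sma(volume, index, window) ? '1' : '0');
        return binary.toString();
    }

    public static void main(String[] args) {
        double[] close = new double[10];
        double[] volume = new double[10];
        for (int i = 0; i < 10; i++) {
            close[i] = 100 + i;       // steadily rising price
            volume[i] = 1000 + 5 * i; // steadily rising volume
        }
        System.out.println(inputString(close, volume, 9, 10));
    }
}
```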
XCS Output: The XCS output is a discrete action or decision. In our case it is
either 0 (sell) or 1 (buy). The final aim of the learning cycle for XCS is to learn what action
it should take for a particular combination of input bits. Please note that XCS uses
reinforcement learning rather than supervised learning: at no point do we tell it
what is right and what is wrong. It has to find this out through trial and
error and the reward mechanism. Depending on the input string, it is sometimes easy to
predict what the correct action is. For example, if the input string is 101111, i.e. out of 6
bits, 5 bits suggest buying the stock, then the correct action should be to buy the
stock. At other times, however, the correct action might not be so evident. For example,
if the input string is 111000, buy and sell signals are present in equal proportion and
we expect the system to learn what weight would be appropriate to give to each individual signal.
Even in a real life scenario, on a given day an actual trader might face a situation
in which there is no clear buy or sell sign from the combination of technical indicators.
He or she then has to judge from experience which technical indicator
should be given more weight and decide accordingly.
2.3.2 XCS Frame Work [15]
XCS contains population of Classifiers. Each classifier in the population is
characterized by 5 main components -:
1) Condition part C, which specifies on what problem instances the classifier is
applicable.
2) Action part A, specifies what action classifier takes when condition C is
fulfilled.
3) Reward prediction P, estimates what payoff or reward classifier can expect on
executing the action.
4) Reward prediction error ε, estimates the mean absolute difference of P with
respect to the actual reward R.
5) Fitness F estimates the scaled, relative accuracy of classifier with respect to
other overlapping classifiers in the action set it is present.
In Short a classifier is a set of
<Condition> : <Action> => <Prediction>
Prediction is similar to payoff of Reinforcement Learning.
Eg 01#1## : 1 => 693.2
{0 sell : 1 Buy : # : don’t care}
This classifier says: if the first bit is 0, the second is 1 and the fourth is 1 (the
other bits don't matter), then after taking action 1 (Buy) the expected payoff is 693.2. The payoffs are
updated as the system learns. The condition of the classifier given above has three don't care
positions and therefore matches the following 8 input strings. Its prediction summarizes the payoffs received over all of them.
010100
010110 There can be 8 such cases
010101
……….
It differs from other systems such as Neural Networks in the sense that in
Neural Networks the payoff information for any input is distributed over the whole
network. Each classifier acts only on a subset of problems. It checks whether the given
input is one on which it can act. If its condition matches, it acts on it and predicts a
certain payoff.
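The matching rule described above (each non-# position of the condition must equal the corresponding input bit) can be sketched as follows. The class and method names are hypothetical; the brute-force counter is included only to confirm that a condition with three don't care symbols covers 8 inputs.

```java
// Sketch only: names are illustrative, not taken from the XCS implementation.
final class ClassifierMatch {
    /** True if every non-# position of the condition equals the input bit. */
    static boolean matches(String condition, String input) {
        if (condition.length() != input.length()) {
            return false;
        }
        for (int i = 0; i < condition.length(); i++) {
            char c = condition.charAt(i);
            if (c != '#' && c != input.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    /** Counts, by enumeration, how many binary strings the condition matches. */
    static int countMatches(String condition) {
        int length = condition.length();
        int count = 0;
        for (int value = 0; value < (1 << length); value++) {
            StringBuilder bits = new StringBuilder(Integer.toBinaryString(value));
            while (bits.length() < length) {
                bits.insert(0, '0'); // left-pad to the condition length
            }
            if (matches(condition, bits.toString())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(matches("01#1##", "010110"));
        System.out.println(countMatches("01#1##"));
    }
}
```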
Figure 1: XCS Framework [15]
[P]: classifier population; [M]: match set; p: predicted value;
ε: error in prediction; F: fitness of prediction, F ∝ 1/ε
2.3.3 XCS Learning Cycle
At the start, the classifier population [P] is generally empty. The agents use
the stock price and volume data to form the input string (for details please see section
3.1). The input is fed into the classifier population [P], which detects whether there is any match.
The 4 classifiers marked with -- in Figure 1 match the input 0011. They are put in a
match set [M]. If no classifier matches the given input, XCS creates a classifier by the
covering mechanism (a rule is created that matches the input, has a random action and is
assigned a low prediction). A new rule has a certain number of don’t care signs (#) in
random positions. The # signs give the classifier an initial generality, due to which it can be
tested on many input problem instances.
Covering is necessary only initially; the vast majority of new rules are derived
from existing rules.
For example, suppose the input string is 11000101 and there is no classifier which
matches this input. Then the rule created could be 1##0010# : 01 => 10, i.e. condition
1##0010#, a random action 01 and a low initial prediction of 10.
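A minimal sketch of covering is given below, assuming that each input bit is independently replaced by # with a fixed probability. That probability (0.33 here), the initial prediction value and all names are illustrative assumptions; the text above does not specify them.

```java
import java.util.Random;

// Sketch only: names and the don't-care probability are illustrative assumptions.
final class Covering {
    /** Copies the input, replacing each bit by '#' with the given probability. */
    static String coverCondition(String input, double dontCareProbability, Random rng) {
        StringBuilder condition = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            boolean generalize = rng.nextDouble() < dontCareProbability;
            condition.append(generalize ? '#' : input.charAt(i));
        }
        return condition.toString();
    }

    /** True if every non-# position of the condition equals the input bit. */
    static boolean matches(String condition, String input) {
        for (int i = 0; i < condition.length(); i++) {
            char c = condition.charAt(i);
            if (c != '#' && c != input.charAt(i)) {
                return false;
            }
        }
        return condition.length() == input.length();
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        String input = "11000101";
        String condition = coverCondition(input, 0.33, rng);
        int action = rng.nextInt(2); // random action: 0 = sell, 1 = buy
        double prediction = 10.0;    // low initial prediction (assumed value)
        System.out.println(condition + " : " + action + " => " + prediction);
    }
}
```

Because the condition is built from the input itself, the new rule is guaranteed to match the input that triggered covering.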
Continuing the process, after the match set has been created, XCS estimates the payoff for each
possible action by forming a prediction array P(A). In Figure 1, two classifiers in the match
set advocate action 01 and two advocate action 11. We take the fitness-weighted average
of the predictions for each action:
\[ P(A) = \frac{\sum_j p_j F_j}{\sum_j F_j} \]
where the sums run over the classifiers in [M] that advocate action A. For example,
\[ P(\text{action}=01) = \frac{43 \times 99 + 27 \times 3}{99 + 3} \approx 42.53 \]
Similarly, P(action=11) = 16.6.
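The fitness-weighted average can be checked with a short sketch. The class and method names are hypothetical; the numbers are the predictions (43, 27) and fitness values (99, 3) of the two classifiers advocating action 01 in the example.

```java
// Sketch only: names are illustrative.
final class PredictionArray {
    /** Fitness-weighted average of the predictions of classifiers advocating one action. */
    static double fitnessWeightedPrediction(double[] predictions, double[] fitnesses) {
        double weightedSum = 0.0;
        double fitnessSum = 0.0;
        for (int j = 0; j < predictions.length; j++) {
            weightedSum += predictions[j] * fitnesses[j];
            fitnessSum += fitnesses[j];
        }
        return weightedSum / fitnessSum;
    }

    public static void main(String[] args) {
        double p = fitnessWeightedPrediction(new double[] {43, 27}, new double[] {99, 3});
        System.out.printf("P(action=01) = %.2f%n", p);
    }
}
```

The fit classifier (fitness 99) dominates the estimate, so the low-fitness classifier's prediction of 27 barely moves the result away from 43.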
Hence, P(A) is the fitness-weighted average of all reward prediction estimates of the
classifiers in [M] that advocate action A. The system follows an ε-greedy policy,
i.e. it takes the best action most of the time, but with a small probability
ε (the exploration probability) it chooses an action at random
from those in the prediction array, which may be suboptimal. All classifiers in the match set [M] that specify the
chosen action A form the action set [A]. In Figure 1, we have chosen the action with the
maximum prediction, i.e. 01, and the 2 classifiers having this action are put in the action set.
The system executes the prescribed action. The next day, the correctness of the action
taken is judged by the stock price movement. For every correct action a reward of
1000 is given. For a wrong action a reward of 0 is given. This differs from the usual
reinforcement learning methodology in the sense that no negative reward is given for an
incorrect action. For example, let’s suppose the system predicts a rise in the price of the
stock and buys the share. If the price goes up the next day, then a reward of 1000 is
given. This reward is used to update the parameters of the classifiers in the action set [A].
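Action selection and the reward scheme described above can be sketched as follows. The names are hypothetical, and two details are assumptions not stated in the text: a sell counts as correct whenever the next close is not higher than today's, and the exploratory choice is drawn uniformly over all actions.

```java
import java.util.Random;

// Sketch only: names and the handling of unchanged prices are assumptions.
final class EpsilonGreedy {
    /** With probability epsilon picks a random action, otherwise the highest-valued one. */
    static int selectAction(double[] predictionArray, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(predictionArray.length); // explore
        }
        int best = 0;
        for (int a = 1; a < predictionArray.length; a++) {
            if (predictionArray[a] > predictionArray[best]) {
                best = a;
            }
        }
        return best; // exploit
    }

    /** Reward of 1000 for a correct action (1 = buy, 0 = sell), 0 otherwise; never negative. */
    static double reward(int action, double closeToday, double closeNextDay) {
        boolean priceRose = closeNextDay > closeToday;
        boolean correct = (action == 1) == priceRose;
        return correct ? 1000.0 : 0.0;
    }

    public static void main(String[] args) {
        double[] predictionArray = {16.6, 42.53}; // index 0 = sell, index 1 = buy
        int action = selectAction(predictionArray, 0.1, new Random(1));
        System.out.println("action = " + action + ", reward = " + reward(action, 10.0, 11.0));
    }
}
```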
2.3.3.1 Updating XCS Parameters
Initially, on creation of a classifier, it is given a very low prediction value. After
getting the reward R for the executed action, the parameters of each classifier j in the
action set are updated as follows -:
Prediction:
\[ P_j \leftarrow P_j + \alpha (R - P_j) \]
where α is the learning rate (~ 0.2). So if R > P_j, then P_j is increased, i.e. the prediction goes up. As can be seen, if
this particular classifier is updated many times, P_j will tend toward R, i.e. the predicted
value will tend towards the actual return from the process.
Similarly, the other parameters are updated as
Error:
\[ E_j \leftarrow E_j + \alpha (|R - P_j| - E_j) \]
Accuracy:
\[ K_j = \begin{cases} E_j^{-m} & \text{if } E_j > E_0 \\ E_0^{-n} & \text{otherwise} \end{cases} \]
Relative accuracy:
\[ K'_j = \frac{K_j}{\sum_{k \in [A]} K_k} \]
Relative accuracy expresses the accuracy of a classifier relative to the other classifiers in the
action set.
Fitness:
\[ F_j \leftarrow F_j + \alpha (K'_j - F_j) \]
The fitness of a classifier is an estimate of its accuracy with respect to the accuracies of the
other classifiers in the action sets in which it occurs.
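The update rules above can be sketched for a whole action set as follows. The names are hypothetical, the updates are applied in the order in which they are listed (prediction first, then error), and the values chosen for E_0, m and n in the example are illustrative assumptions.

```java
// Sketch only: names and the parameter values used in main are assumptions.
final class ClassifierUpdate {
    double prediction;
    double error;
    double fitness;

    ClassifierUpdate(double prediction, double error, double fitness) {
        this.prediction = prediction;
        this.error = error;
        this.fitness = fitness;
    }

    /** Updates prediction, error and fitness of every classifier in the action set. */
    static void update(ClassifierUpdate[] actionSet, double reward,
                       double alpha, double e0, double m, double n) {
        double[] accuracy = new double[actionSet.length];
        double accuracySum = 0.0;
        for (int j = 0; j < actionSet.length; j++) {
            ClassifierUpdate cl = actionSet[j];
            cl.prediction += alpha * (reward - cl.prediction);
            cl.error += alpha * (Math.abs(reward - cl.prediction) - cl.error);
            // Accuracy: E_j^-m above the threshold E_0, otherwise E_0^-n.
            accuracy[j] = cl.error > e0 ? Math.pow(cl.error, -m) : Math.pow(e0, -n);
            accuracySum += accuracy[j];
        }
        for (int j = 0; j < actionSet.length; j++) {
            double relativeAccuracy = accuracy[j] / accuracySum;
            actionSet[j].fitness += alpha * (relativeAccuracy - actionSet[j].fitness);
        }
    }

    public static void main(String[] args) {
        ClassifierUpdate[] actionSet = {
            new ClassifierUpdate(10.0, 0.0, 0.01),
            new ClassifierUpdate(600.0, 50.0, 0.01)
        };
        update(actionSet, 1000.0, 0.2, 10.0, 5.0, 5.0);
        for (ClassifierUpdate cl : actionSet) {
            System.out.println(cl.prediction + " " + cl.error + " " + cl.fitness);
        }
    }
}
```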
2.3.3.2 Genetic Algorithm role and rule evolution[15]
XCS applies a genetic algorithm (GA) for rule evolution. If the average time since the
GA was last applied exceeds a certain threshold, then genetic reproduction is invoked in the
current action set [A]. The GA selects 2 parent classifiers based on their relative
fitness in the action set [A]. Two offspring are generated by reproducing the parents and
applying crossover and mutation. Parents and offspring both compete in the same
population [P]. Niche mutation is applied, which means that the
mutated classifiers still match the current problem instance, i.e. the input binary string
on which their parents were able to act. If the offspring's condition is subsumed by some other
classifier, then it is not inserted into the population and only the numerosity of the
subsuming classifier is increased by 1. The classifier population size is fixed and deletion is
done if the population is too large. Excess classifiers are deleted from [P] with probability
proportional to the size estimate of the action sets in which the classifiers occur. If classifiers are
experienced but have low fitness, their probability of deletion is higher [15]. For more
information on this, readers are encouraged to read chapter 4 of Martin Butz's book [15].
Classifiers which are more general will more often be part of an action set,
undergo more reproduction events and thus propagate faster. The GA
process is therefore expected to evolve an accurate, maximally general solution as the final
outcome.
For example, the parent classifiers on the left below undergo crossover at the marked
point (|) to give the offspring on the right.
Parents                 Offspring
1 0 # # | 1 1 : 1       1 0 # # 1 # : 1   (1)
# 0 0 0 | 1 # : 2       # 0 0 0 1 1 : 2   (2)
Please note that the results of the crossover are:
a classifier (1) which is more general than both parents,
a classifier (2) which is more specific than both parents.
A more specific classifier is usually no less accurate than a more general one, although this is not always the case. The
process tends on balance to search along the generality-specificity dimension, using pieces of
existing higher accuracy classifiers. The population is therefore expected to tend towards
classifiers with greater accuracy [15].
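The crossover example above can be reproduced with a one-point crossover on the condition strings. This is only a sketch with hypothetical names; it counts # symbols as a simple proxy for generality.

```java
// Sketch only: names are illustrative.
final class OnePointCrossover {
    /** Swaps the tails of two conditions after the given crossover point. */
    static String[] crossover(String parentA, String parentB, int point) {
        String childA = parentA.substring(0, point) + parentB.substring(point);
        String childB = parentB.substring(0, point) + parentA.substring(point);
        return new String[] {childA, childB};
    }

    /** Number of don't care symbols, used here as a proxy for generality. */
    static int generality(String condition) {
        int count = 0;
        for (char c : condition.toCharArray()) {
            if (c == '#') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] children = crossover("10##11", "#0001#", 4);
        System.out.println(children[0] + " (#: " + generality(children[0]) + ")");
        System.out.println(children[1] + " (#: " + generality(children[1]) + ")");
    }
}
```

Each parent has two # symbols; the first offspring has three and the second has one, which is why one child is more general and the other more specific than both parents.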
2.3.4 Deviation from other LCS based Systems
• XCS reproduces classifiers selecting from the current action set instead of from
the whole population.
• Relative accuracy based fitness measures the performance of a classifier.
• Reproduction favors those classifiers whose conditions match more often and which
therefore occur more often in the action set.
• Deletion occurs from the whole of the population.
2.3.5 Mind of XCS System
Figure 2: Voting Strategy [19]
Our modeled XCS system in its basic form consists of 7 agents which use
different sets of technical analysis information to create the input binary string. Each
agent has 25 copies which simultaneously do the trading and prediction. One voting
agent combines the predictions of these 25 agents and presents the result to the meta-agent. The
system learns in an online fashion. There are two separate phases, a learning phase and a
trading phase. During the learning phase all the agents simply explore and update
the parameters of their classifiers, and no actual money is invested. During the trading
phase, out of the 25 agents, the system randomly picks some agents which explore (take a
random, possibly sub-optimal action) while the other agents exploit (take the best possible action, which
is supposed to give the maximum reward). While combining the decisions, the voting agent
considers the current wealth of the 25 agents and discards the decisions of those
agents which are loss making. Also, the action of any agent which is exploring (taking a random action)
is not taken into account. The meta agent finally takes the decision either to
buy or to sell using the composite predictive power of the 25 XCS agents. For the portfolio
management system 14 different types of XCS agents were used. Due to the continuous
process of exploration and exploitation even during the trading phase, the learning of the
system never stops. Due to this continuous learning, if the dynamics of the market
change, we expect the system to capture those variations as well.
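The filtering performed by the voting agent can be sketched as below. The text does not state how the remaining votes are combined, so a simple majority vote is assumed here; treating an agent as loss making when its wealth is below its initial wealth, and resolving ties as sell, are likewise assumptions. All names are hypothetical.

```java
// Sketch only: majority voting, the loss-making test and the tie rule are assumptions.
final class VotingAgent {
    /**
     * Combines the actions (1 = buy, 0 = sell) of the agent copies, ignoring copies that
     * are exploring or whose wealth has fallen below the initial wealth.
     */
    static int vote(int[] actions, double[] wealth, boolean[] exploring, double initialWealth) {
        int buyVotes = 0;
        int sellVotes = 0;
        for (int i = 0; i < actions.length; i++) {
            if (exploring[i] || wealth[i] < initialWealth) {
                continue; // discard exploring and loss-making agents
            }
            if (actions[i] == 1) {
                buyVotes++;
            } else {
                sellVotes++;
            }
        }
        return buyVotes > sellVotes ? 1 : 0; // ties resolve to sell (assumption)
    }

    public static void main(String[] args) {
        int[] actions = {1, 1, 0, 0, 0};
        double[] wealth = {110, 120, 90, 80, 105};
        boolean[] exploring = {false, false, false, false, true};
        System.out.println(vote(actions, wealth, exploring, 100.0));
    }
}
```

In the example, three of the five copies say sell, but two of them are loss making and one is exploring, so only the two profitable buy votes are counted.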
Chapter 3
3 Implementation
3.1 Technical Analysis Usage in XCS
Technical analysis overall is more of an art than a science. There is no single
kind of technical indicator which works for all the stocks in the market. In our XCS
system, technical analysis information is used to make the input binary string. For our
purpose we used and coded 14 individual technical indicators. There are open source
libraries for technical indicators. However, we coded our own set of technical
indicators so that heuristics could be applied to individual technical
indicators to generate more robust buy or sell signals. For example, one such technical
indicator, the Relative Strength Index (RSI), ranges from 0 to 100 and indicates overbought
and oversold conditions for RSI greater than 70 and RSI less than 30 respectively. We
used a heuristic to generate a buy or sell signal in the range between 30 and 70. More details about
this can be found in Section 3.1.1.
This section is further divided into the following parts -:
3.1.1 Description of individual technical indicators and how they are used to
generate buy or sell signals.
3.1.2 Description of agents which combine different technical indicator
information.
3.1.3 Advantages and “scope of improvement” of the current approach.
3.1.1 Description of individual technical Indicators
A technical indicator is defined as a series of data points that are derived by
applying a formula to the price data of a security, which can be any combination of the open,
high, low or close over a period of time [18]. These data points can be used to generate
buy or sell signals, as we shall shortly see. Technical indicators can provide a unique
viewpoint on the strength and direction of the underlying price action.
The different technical indicators employed in our XCS system are -:
Moving average:
It is a lagging indicator which simply calculates the average price of a security over a
specified number of periods. A moving average filters out random noise and offers a
smooth perspective of the price action. Moving averages work well when a stock develops a strong
trend.
Usage of Moving average:
In our XCS system the moving average is used in 2 ways to generate buy and sell signals.
The location of the current price relative to the moving average: the 10 and 20 day moving
averages are used for this purpose.
if (MA10[index] - close >= 0) { binary += "0"; } else { binary += "1"; }
if (MA20[index] - close >= 0) { binary += "0"; } else { binary += "1"; }
The location of the shorter moving average relative to the longer moving average.
if (MA20[index] > MA10[index]) { binary += "0"; } else { binary += "1"; }
Please note that 1 is a buy signal and 0 is a sell signal. binary is the binary
string, built up by appending bits, which is fed as input to the XCS system.
We have deliberately used shorter moving averages (10 and 20 days) to reduce the lag in
the signal and to concentrate on short term trends rather than the long term trend.
Parabolic SAR:
SAR stands for stop and reverse. It was developed by J. Welles Wilder Jr to find
trends in market price. It draws a dotted line either above or below the security price.
A dotted line below the price establishes the trailing stop for a long position (and generates
buy signs), and a line above the price establishes the trailing stop for a short position (our system
doesn’t short and only generates a sell sign).
Usage in XCS: A SAR value greater than the current day's high gives a sell sign.
if (sar[index] > high) { binary += "0"; } else { binary += "1"; }
* Details of the SAR calculation are given in the appendix.
Average Directional Index (ADX):
It evaluates the strength of the current trend, be it up or down. ADX is derived from
the positive and negative directional indicators (+DI and -DI).
Usage in XCS:
The positive and negative directional indicators (+DI, -DI) are used to generate buy and sell
signs.
if (posDI[index] > negDI[index]) { binary += "1"; } else { binary += "0"; }
Commodity Channel Index (CCI):
CCI is a typical-price based momentum indicator which was developed by Donald
Lambert to identify cyclical turns in commodities.
Usage in XCS:
CCI is a band oscillator. Movement above +100 indicates an overbought stock and a sell
signal is given. Similarly, movement below -100 indicates an oversold stock and a buy signal is
given. Movement between -100 and +100 doesn’t give a clear buy or sell sign. In
that case we have used the heuristic that if the current CCI is greater than or equal to the moving
average of CCI over the past 5 days, then a buy signal is given.
if (CCI[index] >= 100) { // overbought
binary += "0";
} else if (CCI[index] <= -100) { // oversold
binary += "1";
} else if (CCI[index] >= CCIMA[index]) { // positive divergence
binary += "1";
} else {
binary += "0";
}
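The same three-way pattern (overbought band, oversold band, moving-average heuristic in between) recurs for several band oscillators in this section, so it can be captured in one helper. The sketch below is not part of the original code; the class and method names are hypothetical.

```java
// Sketch only: a hypothetical helper capturing the band-oscillator pattern.
final class BandOscillatorSignal {
    /**
     * Returns '0' (sell) at or above the upper band, '1' (buy) at or below the lower band,
     * and otherwise '1' if the value is at or above its own moving average, else '0'.
     */
    static char signal(double value, double movingAverage, double upper, double lower) {
        if (value >= upper) {
            return '0'; // overbought
        }
        if (value <= lower) {
            return '1'; // oversold
        }
        return value >= movingAverage ? '1' : '0'; // heuristic inside the band
    }

    public static void main(String[] args) {
        StringBuilder binary = new StringBuilder();
        binary.append(signal(120.0, 50.0, 100.0, -100.0)); // CCI above +100
        binary.append(signal(30.0, 10.0, 100.0, -100.0));  // CCI inside the band, rising
        System.out.println(binary);
    }
}
```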
Chaikin Money Flow(CMF):
CMF is an oscillator based on accumulation distribution line.
Usage in XCS:
CMF is bullish when it is positive and bearish when it is negative.
if(CMF[index]< 0){binary+="0";}else{binary+="1";}
MACD: Moving average convergence divergence.
It is a centered oscillator that is unique in having both leading and lagging component
in it. It is the difference between the 12 day EMA and 26 day EMA of a security.
Usage in XCS:
A positive MACD gives a buy sign and a negative MACD gives a sell sign.
if (MACD[index] > 0) { binary += "1"; } else { binary += "0"; }
A nine day exponential moving average (EMA) of MACD acts as a trigger line to give
buy and sell signs.
if (MACD[index] > MACDSignal[index]) { binary += "1"; } else { binary += "0"; }
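For illustration, the MACD line and its trigger line could be computed as sketched below. The names are hypothetical, the smoothing factor 2/(period + 1) is the common EMA convention, and seeding each EMA with the first value of the series is an assumption made here for brevity.

```java
// Sketch only: names and the EMA seeding are assumptions.
final class MacdSketch {
    /** Exponential moving average with smoothing factor 2 / (period + 1). */
    static double[] ema(double[] series, int period) {
        double k = 2.0 / (period + 1);
        double[] out = new double[series.length];
        out[0] = series[0]; // seed with the first value (assumption)
        for (int i = 1; i < series.length; i++) {
            out[i] = k * series[i] + (1 - k) * out[i - 1];
        }
        return out;
    }

    /** Two bits: MACD above zero, then MACD above its 9 day EMA (1 = buy, 0 = sell). */
    static String macdBits(double[] close, int index) {
        double[] fast = ema(close, 12);
        double[] slow = ema(close, 26);
        double[] macd = new double[close.length];
        for (int i = 0; i < close.length; i++) {
            macd[i] = fast[i] - slow[i];
        }
        double[] signal = ema(macd, 9);
        StringBuilder binary = new StringBuilder();
        binary.append(macd[index] > 0 ? '1' : '0');
        binary.append(macd[index] > signal[index] ? '1' : '0');
        return binary.toString();
    }

    public static void main(String[] args) {
        double[] close = new double[60];
        for (int i = 0; i < close.length; i++) {
            close[i] = 100 + i; // steadily rising price
        }
        System.out.println(macdBits(close, 59));
    }
}
```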
Money Flow Index:
MFI is a Momentum indicator similar to RSI. It’s a good measure of money flowing
into and out of the security.
Usage in XCS
MFI above 80 indicates an overbought stock and gives a sell sign, and MFI below 20
indicates an oversold stock and gives a buy sign. Between 20 and 80 there is no clear
buy or sell sign, so we have used positive divergence to create the buy or sell
sign. If the current MFI is greater than the 5 day average of MFI, then a buy sign is given.
if (MFI[index] >= 80) {
binary += "0";
} else if (MFI[index] <= 20) {
binary += "1";
} else if (MFI[index] > MFIMA[index]) {
binary += "1";
} else {
binary += "0";
}
On balance Volume (OBV):
It is a cumulative, volume based indicator.
Usage in XCS
A rising OBV line is bullish: it indicates that smart money is flowing into the
stock and signals a price uptrend. We use it as follows: if the current OBV is greater than or equal to
its moving average over the past 5 days, then a buy sign is given.
if (OBV[index] >= OBVMA[index]) { binary += "1"; } else { binary += "0"; }
Percentage Price Oscillator (PPO):
This oscillator is formed by taking the difference between the shorter and the longer
moving average of price, expressed as a percentage.
Usage in XCS: Used in 2 ways -:
if (PPO[index] > 0) { binary += "1"; } else { binary += "0"; }
if (PPO[index] > PPOSignal[index]) { binary += "1"; } else { binary += "0"; }
PPOSignal is found by taking the 5 day moving average of PPO.
Percentage Volume Oscillator (PVO):
Similar to PPO except that instead of price, Volume is used for calculation.
Usage in XCS
if (PVO[index] > 0) { binary += "1"; } else { binary += "0"; }
if (PVO[index] > PVOSignal[index]) { binary += "1"; } else { binary += "0"; }
Relative Strength Index (RSI) :
RSI is a momentum oscillator which compares the magnitude of a stock’s recent
gains to the magnitude of its recent losses and turns that information into a number
that range from 0 to 100 [18].
Usage in XCS:
RSI above 70 and below 30 indicates overbought and oversold condition and gives
sell and buy signal respectively. In between 30 and 70 we have used heuristic that if
current day RSI is greater than past 5 day average of RSI then a buy sign is given.
if (RSI[index] >= 70) { // overbought
binary += "0";
} else if (RSI[index] <= 30) { // oversold
binary += "1";
} else if (RSI[index] >= RSIMA[index]) {
binary += "1";
} else {
binary += "0";
}
Stochastic Oscillator:
It is a momentum indicator.
Usage in XCS
Readings below 20 are considered oversold and readings above 80 are considered overbought.
We have used fast percent D in XCS. Between 20 and 80, a heuristic similar to that for RSI is
used.
if (fastPercentD[index] >= 80) {
binary += "0";
} else if (fastPercentD[index] <= 20) {
binary += "1";
} else if (fastPercentD[index] >= fastPercentDMA[index]) {
binary += "1";
} else {
binary += "0";
}
The crossover of fast percent K with respect to fast percent D is also used to generate buy
and sell signs.
if (fastPercentK[index] > fastPercentD[index]) { binary += "1"; } else { binary += "0"; }
* For details about the Stochastic Oscillator calculation, please see the appendix.
StochRSI :
It is a momentum oscillator wherein Stochastic oscillator is combined with RSI.
Usage in XCS:
if(stochRSI[index] >= 80){binary+="0";
}else if(stochRSI[index] <= 20){binary+="1";
}else if(stochRSI[index] >= stochRSIMA[index]){binary+="1";
}else{binary+="0";}
ROC :
Rate of change is centered oscillator. It gives percentage price change over the last 20
days. Buy signal generated if ROC is greater than zero.
Usage in XCS:
if(ROC[index]< 0){binary+="0";}else{binary+="1";}
Williams % R:
It’s a momentum indicator that works much like the Stochastic Oscillator.
Usage in XCS:
if (willPercentR[index] >= -20) { // overbought
binary += "0";
} else {
binary += "1";
}
if (-100 <= willPercentR[index] && willPercentR[index] <= -80) { // oversold
binary += "1";
} else {
binary += "0";
}
3.1.2 Combining different technical indicators and working
mechanism of different Agents
A few points which we considered while combining technical indicators are -:
1) Individual indicators in the combination should provide different perspectives
on the underlying price or volume movement. Indicators should
complement each other instead of moving in unison and generating the same
signal [18]. For example, Chaikin Money Flow (CMF) and Money Flow Index
(MFI) both measure money flow from price and volume, provide nearly the same
information and generate nearly the same signal, and therefore shouldn’t be used
with each other.
2) It is generally of little use to combine more than 5 indicators.
Keeping these things in mind, and through a lot of trial and error experiments on a variety of
stocks, we developed 14 sets of agents which use different combinations of technical
indicators.
For example, one such combination, used for Agent 1, includes -:
CMF - A non trend following volume indicator to identify buying and selling pressure.
RSI - A momentum indicator used to identify potential overbought and oversold
levels.
Moving Average - A trend following indicator to identify the underlying trend in the
stock.
These indicators have very little in common and complement each other very well. [18]
3.1.2.1 Composition of 14 Agents:
Name of Agent    Composition of Agent
Agent 1          Moving Average (10,20), MACD, RSI, CMF
Agent 2          Moving Average (10,20), PPO, PVO
Agent 3          Moving Average (10,20), Stochastic Oscillator, MACD, CMF
Agent 4          SAR, Moving Average (10,20), CMF, Williams %R
Agent 5          SAR, ADX, Moving Average, OBV
Agent 6          Moving Average (10,20), Williams %R, StochRSI, CMF
Agent 7          MACD, ROC, RSI, PVO
Agent 8          Moving Average (10,20), CCI, RSI, MACD
Agent 9          Moving Average (10,20), CMF, CCI
Agent 10         Stochastic Oscillator, ADX, Moving Average (10,20), CMF
Agent 11         Moving Average (10,20), PPO, CMF
Agent 12         Moving Average (10,20), Stochastic Oscillator, ROC, MFI
Agent 13         Stochastic Oscillator, MACD, CMF
Agent 14         Stochastic Oscillator, Williams %R, MACD
Table 1. Composition of different Agents
For initial set of experiments only best 7 agents were used. For portfolio management
System all 14 sets of Agents were used.
3.1.3 Advantage and “Scope of Improvement” of current
approach
Advantages:
1) All the technical indicators were used in their most basic form. Apart from the
widely used overbought and oversold levels, no tuned
threshold was set for any indicator to generate a buy or sell signal. This serves
the purpose of giving minimal prior knowledge to the XCS system, i.e. providing it with the least
amount of information so that it mostly learns from its actions and its adaptation
to the environment.
2) Each individual bit was independent of the other bits in the binary string, so no
input information was duplicated in any form.
3) Oscillators generally give a sell or buy signal only when they are in the overbought
or oversold range respectively. By using a heuristic we were able to take
advantage of the upward movement of an oscillator and also made sure that each
oscillator always gives either a buy or a sell signal.
Scope of Improvement in combining technical Indicators
1) There is no single correct or optimum way of combining different technical
indicators. A combination which works for one stock might not work for
another. We have combined technical indicators to the best of our knowledge.
Defining the problem in a better way for XCS by combining different technical
indicators is an open ended question and can be explored further. One thing that is
very clear from our analysis is that the better we combine technical indicators,
the better we can expect the XCS agents to learn and the better the
returns could be.
2) Parameter optimization: With individual technical indicators, there are many
parameters which can be optimized. In most cases we have used widely
used parameter values. This can be explored further. For example
a) For calculating the positive and negative Directional Index we have used
a 14 day average, which is widely used. Such parameters can
be optimized to give the best result for the maximum number of stocks.
b) The moving averages of price and volume are taken only over 10 and 20 days.
By doing this we have concentrated on short term price and volume
movement. By taking longer moving averages, longer and more robust
trends can be identified.
3.2 Improving the learning of eXtended Classifier System
In the previous version of the classifier system, financial forecasting was considered
solely as a single step problem in which each day's input to the system has no
relation to the next or previous day's inputs. Successive problem instances were
treated as independent of each other.
This was based on the assumption that the input technical indicator information for one
day is completely random and has no relation to any other day. In such a scenario the
classifier parameters in the current action set [A] are updated with respect to the
immediate reward feedback only. We take a new approach, where the
basic idea is that the state of the stock market on any particular day is not completely
independent of other days but is affected by how the market has behaved over the
past few days. There is a positive or negative correlation between the market states of
different days. Each classifier represents a particular subset of technical indicator
information it can act on and the action it will take. Keeping this in
mind, we modeled learning to trade via classifier systems as a multi step
reinforcement learning problem where all classifiers in the action set [A] are
updated with respect to the immediate reward R plus the estimated discounted future
reward (the value function for the next state).
Q-learning and Q(λ) are used as the multi-step reinforcement learning algorithms. Each
classifier represents a condition it can act upon and the action it will take. So each
classifier's prediction value can be thought of as a state-action value, i.e. Q(s,a). Keeping
this in mind, implementing Q-learning in XCS is straightforward. However, one thing has
to be kept in mind: due to the don’t care symbol (#) in the classifier condition there
can be more than one classifier which matches a particular input, and all of them
form part of the action set. So instead of updating just one classifier in each iteration, we
have to update all the classifiers present in the action set.
3.2.1 Classifiers in multi step Reinforcement learning problems
A multi step reinforcement learning problem poses the additional complication of
propagating reward backward in an appropriate manner. Initial complications might arise
due to inappropriate reward propagation from inaccurate, young or over generalized
classifiers [15]. Q values for a reinforcement learning problem give the value of a state-action
pair. A single classifier in itself is a combination of the condition on which it can act and the
action it will take. So the prediction value of a classifier is just like the Q value for a
particular state-action pair in a reinforcement learning problem.
The governing equation of one-step Q-learning is
\[ Q(s,a) \leftarrow Q(s,a) + \beta \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right] \]
This is equivalent to the XCS update function for the reward prediction P_j of each classifier j in the action set,
\[ P_j \leftarrow P_j + \beta \left[ r + \gamma \max_{a'} P(a') - P_j \right] \]
where P(a') is the prediction array of the next state,
β is the learning rate parameter and
γ is the discount rate.
Reward prediction values thus coincide with the Q values Q(s, a), and the prediction
array coincides with the Q value entries. Without generalization, XCS is just a tabular Q
learner where each table entry represents a distinct rule. XCS generalizes over states
that yield identical Q values with respect to a specified action [15].
A very important point that should be noted for multi step RL as mentioned by
Butz [15] is
“In initial phase of learning the back propagated reward signal is expected to
fluctuate significantly. As in RL, XCS is expected to progressively learn starting from
those state action combinations that yield actual reinforcement. Once such classifiers
are stably represented, back propagation becomes more reliable and the next reward
level can be learned accurately and so on. Thus as in Q learning reward will be spread
backward starting from the cases that yield the actual reward. Thus XCS learn a
generalized representation of the underlying Q function in the problem.”[15]
Q learning is a temporal difference learning method that doesn’t require
any explicit model of the environment and learns the state action value function just by
generating episodes. It is an off-policy learning method, i.e. the policy we
follow (the behavior policy) is different from the policy we evaluate (or optimize). In our
case we follow an ε-greedy policy and learn the Q values for the optimum policy. This
enables early convergence towards the optimum Q values.
Back up diagram for Q learning is
Algorithm for Q learning
Figure3: Q-learning: An off-policy TD control algorithm. [14]
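The off-policy character of Q learning described above can be illustrated with a minimal tabular sketch. The corridor environment, the function name `q_learning` and all parameter values below are assumptions made for illustration only; they are not the trading environment or the XCS code used in this work.

```python
import random


def q_learning(n_states=5, episodes=500, beta=0.2, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular one-step Q-learning on a toy corridor (hypothetical environment).

    States are 0..n_states-1; the rightmost state is terminal and entering it
    pays a reward of 1000. Action 0 steps left, action 1 steps right.
    """
    rng = random.Random(seed)
    actions = (0, 1)
    q = [[0.0, 0.0] for _ in range(n_states)]
    terminal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != terminal:
            # Behaviour policy: epsilon-greedy, with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(q[s])
                a = rng.choice([x for x in actions if q[s][x] == best])
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1000.0 if s_next == terminal else 0.0
            # Target policy: greedy over the next state's actions (off-policy).
            target = r + gamma * max(q[s_next])
            q[s][a] += beta * (target - q[s][a])
            s = s_next
    return q
```

In this sketch the agent behaves ε-greedily, but each update bootstraps from the greedy value of the next state, which is what makes the method off-policy.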
3.2.2 Implementing Q learning in Classifier.
The prediction of each classifier in the last day’s action set is updated as
    P = reward + cons.gamma * maxPrediction
    prediction += cons.beta * (P - prediction)
Here maxPrediction is the maximum prediction of a classifier in the current day’s action set.
Similarly, the error property of each classifier in the last action set is updated taking
maxPrediction into consideration.
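The update above can be sketched as a small self-contained routine. The `Classifier` class and the function `update_action_set` are hypothetical names, not taken from the XCS Java code. The error update shown (a Widrow-Hoff step towards |P - prediction|, applied before the prediction moves) is an assumption about how the error property is updated, and other XCS details such as experience counters are left out.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    prediction: float
    error: float = 0.0


def update_action_set(last_action_set, current_action_set, reward, beta=0.2, gamma=0.95):
    """Update every classifier in yesterday's action set, not just one."""
    max_prediction = max(c.prediction for c in current_action_set)
    target = reward + gamma * max_prediction  # P in the text above
    for c in last_action_set:
        # Assumed error update: move the error towards |P - prediction|.
        c.error += beta * (abs(target - c.prediction) - c.error)
        c.prediction += beta * (target - c.prediction)
    return target
```

With β = 0.2 and γ = 0.95, a classifier with prediction 100 whose successor action set has a maximum prediction of 500 moves to 175 even when the reward is 0.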
3.2.3 Eligibility trace and Watkins's Q(λ)
The Q(λ) algorithm combines temporal difference Q learning with
eligibility traces to obtain a more general method that may learn more efficiently. In Q
learning, the temporal difference error (TD error) is back propagated only to the last state
visited, whereas in Q(λ) each state is associated with an additional parameter called the
eligibility trace, which indicates the degree to which the state is eligible for undergoing
learning changes should a reinforcement event occur. At each moment we look at the
current TD error and assign it backward to each prior state according to that state’s
eligibility trace at that time. We may think of ourselves as riding along a stream of states,
computing TD errors and shouting them back to the previously visited states.
Figure4: Backward view of eligibility trace [14]
* For details about eligibility traces and the Q(λ) algorithm, readers are encouraged to look
at chapter 7 of the Sutton and Barto book [14].
Watkins’s Q(λ) is one form of Q(λ) which can be applied online. It offers the
advantage of faster learning, so we applied it to the classifier system to make
learning more efficient.
Watkins’s Q(λ) in pseudo code format.
Figure5: Tabular version of Watkins's Q (λ) algorithm. [14]
For implementing this learning methodology, each classifier in the population
is associated with an extra parameter named eligibility, which shows how
eligible it is for the back propagation of the delta error. Watkins’s Q(λ) is strictly followed as
shown in Figure 5.
The delta error for any day represents the error in the prediction. It is calculated as
    deltaError = reward + cons.gamma * currPredictionValueMax - lastPredictionValue;
currPredictionValueMax is the maximum prediction value of a classifier in the current day’s
action set.
lastPredictionValue is the prediction value of the action that was taken yesterday.
The eligibility of every classifier present in the last action set is increased by 1.
The delta error is back propagated into the whole classifier population, taking the trace
decay parameter into consideration. Further, if an exploitation step was taken, the
eligibility of every classifier is reduced by a factor of γλ, as in Figure 5. However, if an exploration
step was taken, the eligibility of every classifier is set to zero.
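The steps above can be sketched for a classifier population as follows. This is a minimal illustration, assuming a decay factor of γλ for exploitation steps as in the tabular algorithm of [14]. The names `Classifier` and `q_lambda_step` and the default parameter values are hypothetical, and the error and fitness updates of XCS are left out.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    prediction: float
    eligibility: float = 0.0


def q_lambda_step(population, last_action_set, reward, curr_prediction_max,
                  last_prediction_value, explored, beta=0.2, gamma=0.95, lam=0.2):
    """One Watkins-style Q(lambda) update over the whole population.

    `last_action_set` must contain objects that are also in `population`.
    """
    delta_error = reward + gamma * curr_prediction_max - last_prediction_value
    for c in last_action_set:
        c.eligibility += 1.0  # accumulate trace for classifiers that just acted
    for c in population:
        # Every classifier learns in proportion to its eligibility.
        c.prediction += beta * delta_error * c.eligibility
        # Traces decay after exploitation and are cut after exploration.
        c.eligibility = 0.0 if explored else c.eligibility * gamma * lam
    return delta_error
```

Classifiers with zero eligibility are left unchanged, so only recently active rules share in the delta error.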
Chapter 4
4 Experimentation
4.1 FTSE data and Stability of the XCS System.
4.1.1 FTSE Data
We used real data of 90 FTSE 100 companies.
Company Code  Company Name  From  To  (Training + Trading Period)
AAL.L  ANGLO AMERICAN  5/17/2000  4/29/2008
SBRY.L  SAINSBURY  5/16/2000  4/29/2008
AV.L  AVIVA  5/16/2000  4/29/2008
BT-A.L  BT GROUP  11/12/2001  4/29/2008
CBRY.L  CADBURY-SCH  5/16/2000  4/29/2008
CCL.L  CARNIVAL  4/23/2003  4/29/2008
CNE.L  CAIRN ENERGY  2/21/2003  4/29/2008
FGP.L  FIRST GROUP  5/16/2000  4/29/2008
FP.L  FRIENDS PROV  7/10/2001  4/29/2008
LSE.L  LON.STK.EXCH  7/24/2001  4/29/2008
Table2: Details of 10 FTSE 100 Companies
* For details of all the 90 FTSE 100 companies used in the experiments please see the
appendix. The FTSE 100 companies data was cleaned and preprocessed at Level E
Limited and was provided by my supervisor Dr. Sonia Schulenburg.
4.1.2 Stability of the XCS System.
The initial classifier population depends on the random seed, since
each individual bit of a classifier’s condition can be 0, 1 or # (the don’t care symbol) to represent the
problem space. This introduces variability into the system, and the performance of
the XCS system varies between runs. A single run of XCS on a particular stock might
not give a clear picture of how well the system learns. To check the stability of the
system we followed these steps:
1) Run the experiment 100 times to find the mean, max, min and standard deviation of the performance.
2) In each run, find the yearly performance of each individual agent on the stock. The yearly performance of an XCS agent on a stock is measured as follows:
a. After each year, calculate the yearly percentage return as
$$\text{Yearly percentage return} = \frac{\text{final wealth of agent} - \text{initial wealth of agent}}{\text{initial wealth of agent}} \times 100$$
b. To find the net performance on the stock, we can take either the arithmetic mean or the geometric mean of these yearly percentage returns. We took the geometric mean of the yearly percentage returns as the final performance measure for the stock.
3) Take the average of the yearly performance of all 7 agents to get the performance of an average XCS agent on that particular stock.
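The performance measure in steps 2 and 3 can be sketched as below. The text does not spell out how the geometric mean is taken over percentage returns, so this sketch assumes the usual convention of compounding the growth factors (1 + r/100). All function names are hypothetical.

```python
import math


def yearly_percentage_return(initial_wealth, final_wealth):
    """Percentage return over one year."""
    return (final_wealth - initial_wealth) / initial_wealth * 100.0


def geometric_mean_return(yearly_returns_pct):
    """Geometric mean of yearly returns, assuming growth factors are compounded."""
    growth = math.prod(1.0 + r / 100.0 for r in yearly_returns_pct)
    return (growth ** (1.0 / len(yearly_returns_pct)) - 1.0) * 100.0


def average_agent_performance(per_agent_yearly_returns):
    """Arithmetic mean over agents of each agent's geometric mean return."""
    scores = [geometric_mean_return(r) for r in per_agent_yearly_returns]
    return sum(scores) / len(scores)
```

Under this convention a gain of 100% followed by a loss of 50% gives a geometric mean of 0%, whereas the arithmetic mean would report 25%.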
For our experiment we chose 5 companies which have different volatility
characteristics. By volatility, we mean they have different amounts of fluctuation in
their price pattern.
Experiment results
Company  OM.L  LMI.L  LLOY.L  SVT.L  XTA.L
Mean  11.77187  28.11772  6.221785  14.39186  29.08665
Max  15.10893  33.09195  7.658274  15.91277  33.41367
Min  8.665734  24.5644  4.536309  12.72837  23.99098
Standard deviation  1.413307  1.695753  0.621009  0.630542  1.837757
Table3: Experimental results for 100 runs on 5 FTSE100 companies.
We also ran experiments to find the mean for different numbers of runs: 2, 3, 5, 7, 9,
12, 16, 20 and 25 runs.
Figure6: Mean Performance of 5 companies for different number of runs (x-axis: number of runs, from 1 to 25; y-axis: performance %, from 0 to 35; series: OML.L, LMI.L, LLOY.L, SVT.L, XTA.L)
Observations
1) The system gives stable performance, as can be seen from the low standard deviation for the different companies in Table 3.
2) The performance stabilizes and approaches the mean performance after 16 runs.
For all our further experiments we ran the system 20 times to obtain the final
performance. All the results presented henceforth are the mean of 20 runs. A very
important point to mention here is that XCS is exposed to the data only once and learning
is completely online, so at no point does the system use data it would not have had at the time of trading.
4.2 Comparative Study of the 3 different Algorithms
The base XCS Java code was written by Martin V. Butz and further modified by
Matthew Gershoff and Abu ul Hassan during their MSc theses.
4.2.1 Setting the parameters for the experiments
The first step in the comparative study is to find and set the optimum parameters
for each algorithm. From a performance point of view, classifier systems are considered
very robust with respect to parameter settings. However, we felt it necessary to optimize
at least the following 3 parameters:
1) Initial exploration rate of all 3 learning methods.
2) Discount rate (gamma) for Q learning.
3) Trace decay parameter for the Q(λ) algorithm.
4.2.1.1 Setting Initial Exploration Rate
To find the optimum parameters, experiments were run on a small subset of
stocks (5 companies). We experimented with 3 different exploration rates: 0.5, 0.2 and
0.02. For each individual stock 7 different XCS agents trade, and their averaged
performance is used for comparison. For example, for company RR.L:
Only Reward Learning (RR.L)
Agent  Exploration Rate = 0.5  Exploration Rate = 0.2  Exploration Rate = 0.02
MetaAgent1  17.95  15.66  16.99
MetaAgent2  11.75  12.79  12.51
MetaAgent3  14.73  17.96  16.40
MetaAgent4  11.30  13.67  15.38
MetaAgent5  12.47  14.16  15.44
MetaAgent6  15.88  18.30  16.98
MetaAgent7  12.76  11.77  11.35
Agents Average  13.84  14.90  15.01
buyandHoldAgent  27.63  27.63  27.63
RadomAgent  13.74  16.62  9.69
trendFollowingAgent  3.68  3.68  3.68
BankAgent  2.53  2.53  2.53
Table4: Setting Exploration rate
The Agents Average row (shown in blue in the original) is used for comparison purposes.
Combined results for different exploration rates
Single Step RL
Companies  Exploration Rate = 0.5  Exploration Rate = 0.2  Exploration Rate = 0.02
RR.L  13.84  14.90  15.01
CNE.L  11.98  12.87  12.40
BLND.L  13.74  13.88  13.88
SABRY  2.80  2.92  2.19
AAL.L  5.71  5.85  5.49
Average  9.61  10.09  9.79
Multi Step RL - Q Learning
Companies  Exploration Rate = 0.5  Exploration Rate = 0.2  Exploration Rate = 0.02
RR.L  18.08  19.23  19.10
CNE.L  11.67  12.32  11.81
BLND.L  14.04  13.61  13.43
SABRY  4.11  3.63  3.10
AAL.L  9.70  9.21  7.76
Average  11.52  11.60  11.04
Q(λ) Learning
Companies  Exploration Rate = 0.5  Exploration Rate = 0.2  Exploration Rate = 0.02
RR.L  13.33  17.18  17.42
CNE.L  12.37  12.32  14.42
BLND.L  11.22  11.27  11.57
SABRY  -0.02  1.07  0.60
AAL.L  4.58  4.86  3.78
Average  8.30  9.34  9.56
Table5: Combined Results for Different exploration rate
For single step RL, an exploration rate of 0.2 showed better performance than 0.5 and
0.02. For Q learning too, an exploration rate of 0.2 was found best, although the performance
didn’t change much between exploration rates of 0.2 and 0.02. For Q(λ), an exploration
rate of 0.2 was found better than 0.5, and an exploration rate of 0.02 showed
nearly the same performance. In the end we decided to keep the exploration rate at 0.02.
4.2.1.2 Setting discount rate (gamma)
We experimented with 4 different discount rates (0.95, 0.70, 0.50 and 0.2) for
different stocks, keeping the optimized exploration rate of 0.2, and found that a high
discount rate of 0.95 gave the best performance (which is also the default value of gamma in
Butz’s XCS code).
The result for one stock is shown below:
RR.L  Gamma(0.95)  Gamma(0.70)  Gamma(0.50)  Gamma(0.20)
MetaAgent1  13.29  8.45  12.33  9.85
MetaAgent2  14.62  8.00  10.43  11.47
MetaAgent3  15.66  9.03  8.67  10.79
MetaAgent4  16.99  7.89  9.45  9.34
MetaAgent5  13.67  7.27  10.78  7.33
MetaAgent6  15.28  11.28  11.00  12.12
MetaAgent7  16.80  4.77  7.07  6.99
buyandHoldAgent  19.61  19.61  19.61  19.61
RadomAgent  7.13  6.89  13.61  18.73
trendFollowingAgent  1.51  1.51  1.51  1.51
BankAgent  2.53  2.53  2.53  2.53
Table6: Setting discount rate
4.2.1.3 Setting Trace Decay Parameter
We experimented with 4 different trace decay parameters (0.1, 0.2, 0.5 and 0.8) for the
Q(λ) algorithm.
The combined result for 9 companies is shown below:
Companies  Trace Decay(0.1)  Trace Decay(0.2)  Trace Decay(0.5)  Trace Decay(0.8)
MKS.L  7.55  6.59  7.03  8.59
NXT.L  -2.90  -2.41  -1.30  -3.25
OML.L  0.54  1.52  1.71  2.56
PRU.L  -5.37  -4.86  -3.63  -6.67
PSN.L  5.42  5.79  5.00  4.66
PSON.L  -3.37  -2.61  -2.82  -1.56
PUB  9.64  9.48  7.78  9.22
RBL  4.36  4.91  4.32  3.26
RBS  -0.29  -0.23  -0.53  -2.54
Average  1.73  2.02  1.95  1.58
Table7: Setting trace decay parameter.
The performance on these stocks is fairly robust to the trace decay parameter. We used
a trace decay parameter of 0.2 for our further experiments.
4.3 Experimental Results for 3 learning Algorithms
We conducted experiments on 86 FTSE companies and compared their average
performance against the buy and hold strategy.
Company  Reward Only RL  Q RL  Q(λ) RL  B&H
RR.L  15.69  23.07  17.61  27.62
CNE.L  14.52  12.75  13.08  35.80
PUB.L  7.11  10.69  9.66  10.11
MKS.L  4.42  4.97  7.48  16.29
XTA.L  30.67  37.22  36.01  50.88
LMI.L  20.63  21.93  19.62  37.83
PSN.L  9.06  11.49  9.24  9.49
FP.L  2.30  2.45  2.56  4.51
HBOS.L  0.72  -0.42  -0.97  -3.94
LSE.L  27.94  32.48  30.30  37.02
AMEC.L  20.74  22.26  16.04  26.07
BDEV.L  10.39  12.59  10.81  10.02
TLW.L  23.22  25.08  22.79  49.56
CPG.L  -2.83  -0.81  -0.01  3.01
BB.L  4.44  4.31  4.27  1.61
TW.L  8.41  11.52  10.93  9.49
KGF.L  -4.15  -0.13  0.25  -13.83
CCL.L  -4.51  -4.93  1.08  -11.51
NXT.L  0.47  -4.12  -0.19  -4.57
LAND.L  1.19  0.52  -0.26  8.52
STAN.L  1.32  3.40  4.72  17.37
CPW.L  16.20  17.75  11.83  28.54
UUL.L  1.20  -0.15  2.04  2.32
SN.L  -2.88  -1.00  1.12  7.96
ICI.L  15.26  12.98  9.12  28.87
RSA.L  0.63  2.38  0.05  -0.35
KEL.L  11.20  15.45  9.47  19.00
FGP.L  7.76  9.48  7.31  12.67
ABF.L  5.92  4.29  4.23  9.44
BA.L  9.40  8.36  6.16  19.19
AL.L  -3.44  -4.37  -2.14  -4.55
REX.L  -3.26  -5.35  -1.35  -0.40
GSK.L  -2.24  -3.60  -2.96  0.59
CPI.L  7.94  8.48  7.50  12.76
IPR.L  15.08  19.63  10.45  20.12
BP.L  -2.65  -1.48  -1.14  -0.69
REL.L  -2.78  -1.42  -0.06  -0.13
WPP.L  2.64  -2.55  -0.87  1.50
SGE.L  -3.25  -4.51  -1.79  4.70
MRW.L  5.84  3.12  3.01  7.77
BAY.L  11.74  9.26  3.88  17.55
PRUL.L  -6.51  0.22  -3.37  2.91
RTOL.L  -2.17  -5.31  -1.76  -16.73
BSY.L  -7.68  -8.36  -2.94  -6.16
RTR.L  25.69  19.30  12.07  33.17
TATE.L  8.85  7.98  7.04  10.16
LGEN.L  -1.22  -0.15  -0.35  0.92
SCTN.L  -0.35  3.24  2.18  0.92
DGE.L  -0.79  0.31  0.75  2.91
CBR.L  5.94  7.10  -0.28  4.50
SAB.L  16.06  17.18  11.81  16.25
RBS.L  -1.18  0.85  -0.68  -6.20
WOS.L  6.56  6.74  3.26  4.13
SHP.L  11.90  9.01  7.45  14.31
BLT.L  12.73  18.99  14.64  28.53
SSE.L  12.24  12.60  11.41  14.91
SMIN.L  5.29  1.23  1.59  3.93
VOD.L  3.24  3.57  4.58  2.66
TT.L  7.19  4.92  5.69  14.59
ANTO.L  35.39  31.57  28.53  39.83
PSON.L  -6.09  -8.71  -3.30  -1.44
LLOY.L  -2.49  -3.46  -1.59  -4.04
BARC.L  -2.21  -0.12  -1.32  0.32
IIL.L  4.13  3.40  3.70  4.78
HSBA.L  1.38  1.82  -0.09  0.43
BLND.L  15.24  15.51  12.23  15.68
ITV.L  -0.46  2.95  0.33  5.76
CW.L  9.56  4.04  2.37  3.68
RB.L  -0.19  4.25  4.89  16.30
BATS.L  13.11  12.64  10.72  21.30
JMAT.L  8.28  8.62  10.59  12.52
IMT.L  3.92  6.02  3.64  13.49
LII.L  9.57  9.51  8.36  15.50
SDRC.L  5.12  5.60  2.72  9.42
SDR.L  6.19  4.38  2.38  11.72
DSGI.L  -4.72  -3.89  -4.03  -8.26
AAL.L  7.36  9.79  4.44  18.46
AZN.L  -6.39  -4.72  -3.07  -6.77
ULVR.L  3.34  3.38  1.61  5.42
FTSE  5.43  7.74  6.60  10.33
BTA.L  -0.06  -1.26  3.75  4.99
DMGT.L  -4.54  -6.75  -5.08  -6.00
SBRY.L  6.20  7.38  4.67  4.50
HMSO.L  16.44  15.76  12.23  17.42
TSCO.L  10.67  8.79  7.64  10.32
SVT.L  -3.28  -2.89  -1.79  5.25
Average  5.92  6.32  5.41  9.95
  1.00  1.07  0.86
Table8: Experimental Results for 3 learning methodology
4.3.1 Observations:
1) Q learning gave 6% better performance than single step Reinforcement
learning. However, Q(λ) performance wasn’t better than single step RL.
2) None of the 3 learning strategies was able to beat the buy and hold strategy.
3) There were many stocks for which the average performance of the agents was
negative. This indicates that the XCS agents for these stocks aren’t able to properly learn
the underlying mapping from binary input to the correct action.
4.4 Fault in the previous Reward giving strategy
These observations made it clear that the classifiers weren’t able to learn
properly, i.e. their parameters weren’t getting updated correctly. We identified that a
possible reason for this could be the way reward was judged in the system. The
correctness of the classifier’s action was decided on the next day’s price movement of
the stock. For example, if today XCS predicts that prices are going to go up and it buys
the stock, then the reward is given to the classifiers in the action set [A] by checking
what happens to the price the very next day. If the price goes up, which implies it was
right to buy the stock, a reward of 1000 is given; otherwise a reward of 0 is given.
However, we believe the daily prices of a stock contain a lot of noise and fluctuation,
and the correctness of an action shouldn’t be judged solely on the next
day’s price movement. For this reason we changed the reward giving strategy, and in the
modified system the reward is delayed by 5 days. We take the simple moving average
of the next 5 days’ closing prices, and on the 6th day we judge the correctness of the
action taken. So if we decided to buy the stock on the first day and the
average of the next 5 days’ stock prices was greater than the first day’s stock price, then
prices were actually in an upward trend, our decision to buy the stock was
correct, and we give a reward of 1000. In this way we smooth out the noise in
the stock prices and expect the system to learn properly.
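The delayed reward rule can be sketched as follows. The function name `delayed_reward` and the string action labels are hypothetical. The text only describes the buy case, so treating a sell as correct when the 5-day average is not above the decision-day price is an assumption of this sketch.

```python
def delayed_reward(closes, day, action, window=5, reward_value=1000.0):
    """Reward for the action taken on `day`, judged `window` days later.

    closes: list of daily closing prices; action: "buy" or "sell".
    Returns None while fewer than `window` future prices are available.
    """
    future = closes[day + 1: day + 1 + window]
    if len(future) < window:
        return None  # the reward cannot be judged yet
    moving_average = sum(future) / window
    went_up = moving_average > closes[day]
    # Assumption: a sell counts as correct whenever the average did not rise.
    correct = went_up if action == "buy" else not went_up
    return reward_value if correct else 0.0
```

For closing prices 100, 101, 102, 103, 104, 105, a buy on day 0 is judged against the average 103 and earns the reward of 1000.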
4.4.1 Experiments with improved delayed reward Strategy
We experimented again on all 90 FTSE 100 stocks, starting with optimization of the
system parameters.
4.4.1.1 Setting Initial Exploration Rate
We experimented with 3 different exploration rates: 0.5, 0.2 and 0.02.
Reward Only Learning
Companies  Exploration Rate = 0.5  Exploration Rate = 0.2  Exploration Rate = 0.02
RR.L  23.88  21.17  22.29
CNE.L  25.27  24.82  21.56
BLND.L  24.29  24.05  23.40
SABRY  14.59  14.44  13.70
AAL.L  17.79  15.28  15.46
MKS.L  15.56  15.61  14.83
SDR.L  19.57  17.39  18.77
Average  20.14  18.96  18.57
Q Learning
Companies  Exploration Rate = 0.5  Exploration Rate = 0.2  Exploration Rate = 0.02
RR.L  20.53  19.02  18.32
CNE.L  17.23  16.94  16.41
BLND.L  15.55  15.46  14.49
SABRY  10.44  10.44  10.25
AAL.L  10.32  8.37  8.67
MKS.L  12.96  12.47  12.28
SDR.L  10.26  9.73  9.56
Average  13.90  13.21  12.85
Q(λ) Reinforcement Learning
Companies  Exploration Rate = 0.5  Exploration Rate = 0.2  Exploration Rate = 0.02
RR.L  10.91  10.34  11.21
CNE.L  14.28  13.57  13.23
BLND.L  15.34  14.30  13.66
SABRY  6.14  7.05  5.87
AAL.L  5.98  7.65  7.81
MKS.L  10.81  12.07  10.19
SDR.L  6.88  6.83  6.08
Average  10.05  10.26  9.72
Table9: Setting Exploration Rate
For single step RL and Q learning, an exploration rate of 0.5 gave the best results. For
Q(λ), an exploration rate of 0.2 gave the best results. A few things can be noted from
the above results:
1) Changing the exploration rate doesn’t affect the performance much.
2) For Q(λ), a smaller exploration rate gave the best performance. This makes sense, as the
eligibility trace is cut off whenever the system explores, losing much of the
advantage of the Q(λ) algorithm.
4.4.1.2 Setting Discount Rate (gamma)
We experimented with 3 different discount rates (0.95, 0.50 and 0.25) for
different stocks, keeping the optimized exploration rate of 0.5, and found that a low
discount rate of 0.25 gives the best performance.
Company  Gamma(0.95)  Gamma(0.50)  Gamma(0.25)
RR.L  20.29  23.00  23.54
MKS.L  12.07  14.80  15.01
NXT.L  -0.56  3.94  4.00
OML.L  7.87  13.05  14.32
HSBA.L  4.42  9.22  10.15
ICI.L  11.11  19.43  17.94
MRW.L  7.69  15.40  15.28
AVERAGE  8.99  14.12  14.32
Table10: Setting discount rate
4.4.1.3 Setting Trace Decay (λ)
We experimented with 4 different trace decay parameters and found that a high
trace decay (0.8) gave the best results.
Companies  Trace Decay(0.1)  Trace Decay(0.2)  Trace Decay(0.5)  Trace Decay(0.8)
RR.L  11.39  12.57  11.05  10.95
MKS.L  10.92  9.43  11.40  13.49
NXT.L  -1.60  -0.33  -1.93  -0.38
OML.L  7.29  7.90  8.48  7.84
HSBA.L  3.03  3.32  4.16  4.90
ICI.L  11.48  10.57  13.18  13.54
MRW.L  3.55  4.64  5.58  7.52
Average  6.58  6.87  7.42  8.27
Table11: Setting trace decay.
We kept these optimized parameters for further experimentation.
What is very clear from the above results is that Q learning gives its
best performance at a low discount rate, which means it is tending towards single step
reward only RL. Also, Q(λ) with a high trace decay rate is tending towards Q(1)
learning. Under such circumstances we don’t expect Q(1) or Q(λ) to give better
results than single step reward only RL for all companies.
4.4.2 Results with new delayed reward strategy
Results for 90 of the FTSE100 companies: We first experimented with single step
reward only RL.
Columns 2 to 4: Agents Average Wealth. Columns 5 to 7: Standard Deviation (Yearly Performance).
Company Name  Old Reward Strategy  New Reward Strategy  Buy and Hold  Old Reward Strategy  New Reward Strategy  Buy and Hold
AAL.L  4.68  16.68  15.59  20.72  24.31  27.75
ABF.L  4.96  13.20  8.86  11.40  4.64  12.43
AL.L  -3.14  12.21  -8.70  10.32  6.19  27.20
AMEC.L  16.58  28.81  25.38  24.20  19.40  14.27
ANTO.L  31.80  36.23  38.24  21.18  18.40  22.76
AV.L  -1.42  11.79  -3.89  20.23  12.53  32.71
AZN.L  -7.40  4.64  -10.41  9.61  4.37  28.56
BA.L  11.81  26.13  6.14  29.64  23.91  49.12
BARC.L  -5.51  11.03  -2.84  13.26  8.55  26.22
BATS.L  12.20  16.41  19.36  19.45  10.86  23.95
BAY.L  5.23  14.89  -0.16  27.96  20.89  71.39
BB.L  5.10  10.03  -2.27  20.72  15.00  29.69
BDEV.L  9.44  17.91  -3.12  25.40  23.27  49.03
BG.L  28.01  28.05  23.99  24.81  19.95  33.90
BLND.L  14.50  23.23  10.08  19.22  13.81  36.32
BLT.L  9.44  17.80  25.42  27.20  21.73  29.08
BP.L  -3.56  4.62  -3.17  14.86  8.35  23.55
BSY.L  -8.18  8.81  -6.82  12.02  6.90  12.06
BT-A.L  -1.78  8.87  3.29  9.83  10.40  19.75
CBRY.L  -4.74  5.88  2.73  12.76  3.55  19.23
CCL.L  -10.04  4.57  -11.51  11.17  5.66  11.31
CNE.L  7.62  25.24  33.23  17.80  14.51  32.01
CPG.L  0.43  9.67  0.03  7.52  8.00  26.28
CPI.L  8.76  19.50  9.16  12.10  6.99  27.68
CPW.L  5.62  34.32  22.81  47.97  35.63  44.18
CW.L  2.50  22.89  -10.42  21.97  36.70  46.82
DGE.L  2.90  8.78  1.85  7.05  5.17  15.41
DMGT.L  -6.76  6.61  -11.00  11.49  8.08  33.12
DSGI.L  -8.30  9.86  -17.08  18.34  12.25  39.54
FGP.L  9.66  20.19  9.59  17.22  15.02  28.19
FP.L  2.81  18.45  1.27  16.05  11.75  26.16
FTSE100  4.25  6.84  10.18  6.92  3.80  7.07
GSK.L  -5.28  6.74  -0.27  14.17  6.45  15.17
HMSO.L  15.97  19.97  15.19  13.73  10.40  23.24
HSBA.L  0.51  8.96  -0.60  7.43  6.90  15.94
ICI.L  12.27  18.00  12.16  21.06  14.74  57.84
III.L  2.73  14.09  -1.82  22.74  12.59  38.93
IMT.L  2.52  6.84  12.47  17.98  12.03  16.28
IPR.L  10.90  21.90  10.12  19.07  19.07  43.66
ITV.L  -4.13  10.70  -5.22  24.11  17.47  62.86
JMAT.L  4.41  15.61  10.31  22.14  13.76  23.38
KEL.L  11.69  15.88  18.43  14.53  11.79  12.83
KGF.L  -6.08  9.22  -19.68  11.33  12.11  37.07
LAND.L  -4.73  15.70  7.16  19.74  11.09  19.82
LGEN.L  1.64  17.50  -4.78  11.94  13.42  32.62
LII.L  6.53  15.91  13.67  18.68  13.28  21.20
LLOY.L  -7.59  7.83  -8.30  10.50  6.89  27.98
LMI.L  12.88  31.68  29.38  45.84  34.16  54.78
LSE.L  22.14  29.59  25.64  42.77  47.11  70.99
MKS.L  2.40  14.90  9.73  29.98  11.46  39.30
MRW.L  1.41  15.74  4.59  19.35  12.45  29.71
NXT.L  -2.17  3.71  -8.73  19.44  8.01  31.42
OML.L  -1.01  13.82  0.81  19.79  14.52  32.80
PRU.L  -4.68  8.78  -2.13  15.30  9.25  30.63
PSN.L  4.85  20.63  2.13  11.64  21.37  41.44
PSON.L  -6.52  10.01  -4.58  17.33  8.87  25.03
PUB.L  8.74  20.17  1.92  16.56  12.96  41.50
RB.L  3.13  11.14  15.61  10.67  6.12  13.41
RBS.L  -3.99  3.39  -9.41  5.82  4.67  23.97
REL.L  -2.95  8.90  -1.20  7.67  7.49  14.93
REX.L  -0.99  10.38  -1.75  11.72  9.09  17.66
RIO.L  12.16  15.64  25.12  37.24  26.82  44.72
RR.L  7.69  22.83  19.61  44.90  42.83  58.63
RSA.L  -4.72  16.15  -13.72  24.22  17.14  49.57
RTO.L  -6.11  6.90  -19.19  9.70  7.68  20.90
RTR.L  15.57  12.98  1.10  40.24  29.92  101.20
SAB.L  9.34  15.58  13.29  24.31  14.46  28.46
SBRY.L  0.69  14.39  -3.39  33.05  21.70  43.32
SCTN.L  6.90  12.91  5.51  13.49  6.20  27.03
SDR.L  4.80  18.32  7.33  18.49  11.86  30.72
SDRC.L  6.17  15.40  5.81  20.09  10.95  27.29
SGE.L  -2.11  10.57  -1.19  19.47  12.04  37.26
SHP.L  9.45  20.46  11.25  18.26  13.10  28.61
SMIN.L  5.55  14.68  2.68  7.74  6.75  17.73
SSE.L  12.67  14.78  13.86  16.16  15.59  16.96
STAN.L  3.66  14.21  17.23  8.09  6.16  6.36
SVT.L  -3.64  17.20  4.06  14.53  27.92  17.27
TATE.L  7.49  18.64  6.45  28.39  16.68  34.97
TLW.L  19.43  26.11  45.68  27.10  25.90  34.52
TSCO.L  10.52  15.08  7.58  17.47  12.22  25.93
TT.L  4.37  20.04  12.99  21.95  15.36  20.51
TW.L  8.76  21.76  6.98  17.67  14.49  24.51
ULVR.L  0.21  7.59  4.94  10.84  5.52  10.80
UU.L  1.03  9.57  1.68  7.65  7.06  12.53
VOD.L  3.79  17.14  2.23  15.16  15.48  10.12
WOS.L  4.72  13.83  -4.46  17.82  13.72  43.48
WPP.L  -0.94  10.75  -5.18  10.78  11.37  36.87
XTA.L  35.08  30.84  50.60  8.58  10.26  10.55
Average  4.18  15.17  5.71  18.36  14.10  30.12
Table12: Comparative results for 90 FTSE 100 companies with delayed reward
strategy for single step reward only RL.
4.4.2.1 Observations:
1) The new delayed reward strategy gave 263% better performance than the old
reward strategy.
2) The new delayed reward strategy also beat the buy and hold strategy
by 166%.
3) With the new reward strategy, 88 of 90 companies showed better performance than
with the old reward strategy.
4) With the new reward strategy, 79 of 90 companies (i.e. 88% of companies) showed
better performance than the benchmark buy and hold strategy.
5) The standard deviation (risk) for the new reward strategy is 23% less than
the standard deviation for the old reward strategy.
6) The standard deviation (risk) for the new reward strategy is 53% less than the standard
deviation of the buy and hold strategy.
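The percentages in observations 1, 2, 5 and 6 can be reproduced from the Average row of Table 12. The helper name `pct_change` is hypothetical, and the sketch assumes each figure is a relative change with respect to the baseline strategy.

```python
def pct_change(new, base):
    """Relative change of `new` with respect to `base`, in percent."""
    return (new - base) / abs(base) * 100.0


# Average row of Table 12 (agents' average wealth, then standard deviation).
old_avg, new_avg, bh_avg = 4.18, 15.17, 5.71
old_sd, new_sd, bh_sd = 18.36, 14.10, 30.12

improvement_over_old = round(pct_change(new_avg, old_avg))  # observation 1
improvement_over_bh = round(pct_change(new_avg, bh_avg))    # observation 2
risk_vs_old = round(pct_change(new_sd, old_sd))             # observation 5
risk_vs_bh = round(pct_change(new_sd, bh_sd))               # observation 6
```

Negative values for the last two quantities correspond to a reduction in standard deviation.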
Let’s take a few examples to illustrate this.
Company : DSGI
Figure7: DSGI Price Chart
The price of DSGI fluctuated a lot from January 2002 to July 2008. With the previous
approach of immediate reward feedback the very next day, agents gave an average
performance of -8.29% annually.
Figure8: DSGI, Wealth Chart of agents with old reward strategy
Please note a few important things about the agents’ performance with the old reward
strategy:
1) The wealth of the agents follows nearly the same pattern as the stock price.
This has been characteristic of almost all the FTSE 100 stocks. When
the price of the stock rises, the agents’ wealth also rises, and when its price falls,
the agents’ wealth also falls. This confirms our initial remark that classifiers are not
able to properly learn the prediction value corresponding to a particular technical
indicator binary input.
2) None of the agents is able to predict the fall in the price of the stock during January
2007 to July 2008.
Figure9: DSGI, Agent’s performance with new delayed reward giving strategy.
Points to note for the new reward strategy:
1) With the new approach the average performance of the Meta Agents rose from -8.29%
to 9.85% annually.
2) All the agents were able to predict the fall in stock price from January 2007 to
July 2008, and all of them performed better than the buy and hold strategy. When
agents predict a fall in the stock price they simply sell the stock and keep the
money in the bank. Straight lines in the above graph correspond to such behavior.
Note the similar behavior of another stock, LAND.L
Figure10: LAND, Price Chart
Figure11: LAND, Meta Agents wealth chart with old reward strategy.
Figure12: LAND, Meta Agents wealth chart with new delayed reward strategy.
The average performance of Meta Agents rose from -4.73% to 15.69% annually.
4.4.3 Experimental Results for all 3 learning algorithms with new delayed reward strategy
We also experimented with all 3 learning methodologies (single step RL, multi step Q
and Q(λ)) with the optimum parameters found above.
Columns 2 to 5: Average Agents Wealth. Columns 6 to 9: Standard Deviation (Yearly Performance).
Company Name  Only Reward RL  Q - RL  Q(λ) RL  B&H  Only Reward RL  Q - RL  Q(λ) RL  B&H
AAL.L  17.09  15.27  13.84  15.59  24.27  25.11  15.95  27.75
ABF.L  14.18  13.42  11.42  8.86  5.33  4.93  4.89  12.43
AL.L  12.35  11.22  11.21  -8.70  6.56  6.29  10.61  27.20
AMEC.L  28.08  27.63  19.01  25.38  19.06  20.34  15.65  14.27
ANTO.L  38.09  38.15  25.14  38.24  18.17  18.01  25.75  22.76
AV.L  11.94  11.56  6.04  -3.89  12.65  13.47  12.48  32.71
AZN.L  4.86  4.07  5.69  -10.41  4.65  3.42  5.42  28.56
BARC.L  10.68  10.71  5.77  -2.84  8.29  8.05  11.79  26.22
BATS.L  17.35  16.32  16.68  19.36  11.82  11.72  12.04  23.95
BAY.L  16.15  15.51  11.98  -0.16  21.51  20.53  20.29  71.39
BB.L  10.79  10.26  9.40  -2.27  16.65  16.44  21.57  29.69
BDEV.L  17.87  18.52  13.90  -3.12  22.94  22.90  17.69  49.03
BG.L  28.79  28.19  26.73  23.99  20.04  19.09  14.61  33.90
BLND.L  24.04  22.53  17.81  10.08  12.78  13.34  12.33  36.32
BLT.L  19.07  18.45  18.29  25.42  22.72  21.67  23.45  29.08
BP.L  5.06  4.85  4.05  -3.17  7.95  8.31  7.15  23.55
BSY.L  9.75  8.25  8.82  -6.82  7.18  7.86  7.96  12.06
BT-A.L  8.35  8.63  9.76  3.29  9.54  9.30  9.79  19.75
CBRY.L  6.66  5.18  8.06  2.73  3.64  3.13  7.48  19.23
CCL.L  5.02  4.97  0.49  -11.51  5.88  6.50  3.63  11.31
CNE.L  25.80  23.86  17.11  33.23  15.18  15.42  16.55  32.01
CPG.L  10.14  8.89  10.44  0.03  8.47  7.95  5.30  26.28
CPI.L  20.33  19.03  15.40  9.16  6.17  7.41  9.56  27.68
CPW.L  34.83  31.92  26.27  22.81  33.87  34.80  26.22  44.18
CW.L  24.50  24.84  20.59  -10.42  39.80  42.66  39.31  46.82
DGE.L  9.47  8.72  8.34  1.85  5.94  5.25  4.95  15.41
DMGT.L  7.12  6.44  1.99  -11.00  7.81  7.48  9.89  33.12
DSGI.L  11.18  9.92  8.92  -17.08  11.40  11.37  13.25  39.54
FGP.L  20.05  19.45  18.22  9.59  14.51  14.83  14.85  28.19
FP.L  19.62  18.91  14.63  1.27  11.70  11.62  16.32  26.16
FTSE100  7.43  6.29  4.25  10.18  4.24  4.48  5.10  7.07
GSK.L  6.77  5.92  6.30  -0.27  6.77  6.29  9.32  15.17
HBOS.L  9.27  7.61  6.12  -8.61  14.48  14.01  7.45  30.35
HMSO.L  21.38  20.96  18.84  15.19  9.54  9.83  8.06  23.24
HSBA.L  9.20  9.13  7.94  -0.60  7.29  7.28  10.32  15.94
ICI.L  17.77  17.06  17.21  12.16  14.66  14.87  12.99  57.84
III.L  14.10  13.78  13.45  -1.82  14.39  14.92  13.34  38.93
IMT.L  8.73  7.17  9.17  12.47  10.44  12.17  11.87  16.28
IPR.L  22.81  22.62  22.27  10.12  20.25  20.51  11.84  43.66
ITV.L  10.96  11.10  7.31  -5.22  16.98  17.71  12.67  62.86
JMAT.L  16.39  16.02  12.30  10.31  14.05  14.93  10.83  23.38
KEL.L  16.69  14.92  13.60  18.43  11.62  11.75  8.80  12.83
KGF.L  9.97  8.41  7.59  -19.68  12.43  10.96  9.58  37.07
LAND.L  15.84  13.33  9.88  7.16  11.30  11.27  10.93  19.82
LGEN.L  19.96  17.45  19.52  -4.78  14.02  14.36  16.31  32.62
LII.L  16.20  15.74  10.68  13.67  13.36  13.17  10.41  21.20
LLOY.L  8.46  7.45  4.27  -8.30  7.40  6.22  6.12  27.98
LMI.L  31.98  33.05  21.30  29.38  34.56  36.34  24.94  54.78
LSE.L  30.04  30.81  26.37  25.64  47.90  48.63  45.66  70.99
MRW.L  16.55  15.47  14.29  4.59  13.16  13.09  12.59  29.71
NXT.L  4.26  3.07  2.55  -8.73  7.87  8.02  7.09  31.42
OML.L  13.88  13.06  11.95  0.81  15.61  18.25  12.55  32.80
PRU.L  8.92  8.93  11.71  -2.13  9.53  9.03  13.02  30.63
PSN.L  21.37  19.40  14.07  2.13  21.93  20.89  17.61  41.44
PSON.L  9.75  9.70  7.20  -4.58  9.69  9.70  7.82  25.03
PUB.L  21.51  19.67  14.74  1.92  13.09  12.10  13.10  41.50
RB.L  12.24  10.50  11.17  15.61  5.94  6.33  6.86  13.41
RBS.L  3.90  3.48  0.33  -9.41  4.39  5.59  8.65  23.97
REL.L  9.23  8.98  6.86  -1.20  7.37  7.69  6.91  14.93
REX.L  10.86  9.72  9.61  -1.75  9.11  8.84  7.87  17.66
RIO.L  15.47  14.86  7.54  25.12  26.36  25.93  16.25  44.72
RR.L  23.22  22.12  12.98  19.61  43.26  43.18  45.13  58.63
RSA.L  17.30  17.04  6.66  -13.72  17.11  19.17  20.43  49.57
RTO.L  8.02  6.72  1.56  -19.19  8.29  7.79  6.05  20.90
RTR.L  14.07  13.00  11.78  1.10  31.27  31.54  29.89  11.20
SAB.L  16.28  15.23  14.48  13.29  14.10  15.19  13.98  28.46
SBRY.L  14.73  13.36  14.30  -3.39  22.73  22.32  15.45  43.32
SCTN.L  13.23  11.57  8.56  5.51  6.21  6.09  9.53  27.03
SDR.L  20.07  18.38  15.78  7.33  11.23  11.65  11.45  30.72
SDRC.L  15.47  14.28  13.35  5.81  11.24  10.58  10.71  27.29
SGE.L  11.61  10.29  10.22  -1.19  12.56  11.83  16.87  37.26
SHP.L  20.30  19.02  9.09  11.25  13.39  14.15  10.85  28.61
SMIN.L  14.90  13.93  7.26  2.68  6.97  6.86  7.74  17.73
SN.L  12.10  11.40  9.41  7.00  6.36  7.57  9.43  16.24
SSE.L  15.65  14.11  16.03  13.86  16.17  16.57  13.70  16.96
STAN.L  16.68  14.80  15.29  17.23  7.48  6.22  9.75  6.36
SVT.L  17.69  17.30  15.75  4.06  27.83  27.18  27.82  17.27
TATE.L  19.97  19.44  16.90  6.45  17.57  18.75  18.35  34.97
TLW.L  27.16  24.93  22.86  45.68  26.25  25.73  18.88  34.52
TSCO.L  15.00  13.39  12.43  7.58  13.39  11.87  14.27  25.93
TT.L  22.23  20.46  19.01  12.99  15.63  16.02  11.39  20.51
TW.L  21.61  21.18  18.60  6.98  13.61  12.32  12.46  24.51
ULVR.L  7.97  7.63  6.76  4.94  5.68  5.83  7.87  10.80
UU.L  9.98  9.29  8.05  1.68  6.61  6.73  7.32  12.53
VOD.L  17.23  16.19  12.98  2.23  16.12  16.15  14.73  10.12
WOS.L  14.56  13.54  12.66  -4.46  13.52  12.94  14.39  43.48
WPP.L  11.25  10.11  9.40  -5.18  11.09  11.89  10.96  36.87
XTA.L  33.38  34.12  24.41  50.60  9.19  7.63  13.02  10.55
Average  15.83  14.93  12.44  5.71  14.32  14.38  13.74  29.90
Table13: Results of 90 FTSE100 Companies for different RL with new delayed reward strategy.
4.4.3.1 Observations
1) All 3 learning methodologies were able to beat the buy and hold strategy. Single
step RL gave 177% better performance, 1-step Q learning gave 161% better
performance and Q(λ) gave 117% better performance than the buy and hold
strategy.
2) The performance of both Q learning and Q(λ) was lower than that of single step reward only
RL.
3) The standard deviation (risk) for all three learning algorithms was less than that of the buy and
hold strategy. Single step RL showed 52% less standard deviation, 1-step Q
learning showed 52% less standard deviation and Q(λ) showed 54% less
standard deviation.
4.4.4 Fault in Q (1) learning
Given our initial assumption that market behavior on different days is correlated,
we expected multi step Q learning to perform better than single step reward
only RL. However, the results showed the opposite. The reason was that, for any
wrong action, XCS simply gave a reward of zero and did not penalize the
classifier with a negative reward. For example, suppose that on a given day XCS can
take one of two actions, sell or buy. It takes the action sell, which turns out to be
wrong. We expect the prediction of classifiers advocating the incorrect action, sell,
to go down. However, in the old XCS version for Q (1) RL, a wrong action just
receives zero reward, and the prediction can still go up because of the Q learning
update equation:
\[ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \]
Here α is the learning rate and γ is the discount rate. Even if r is zero, Q(s_t, a_t)
will increase whenever γ max_a Q(s_{t+1}, a) exceeds the current value of Q(s_t, a_t).
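The effect described above can be reproduced with a one-line Q learning update. This is a minimal sketch of the textbook tabular update, not the XCS code itself: the learning rate of 0.2 and the prediction values 300 and 800 are hypothetical, while the discount rate of 0.95 is the one quoted in this chapter.

```python
def q_update(q_sa, reward, max_q_next, alpha=0.2, gamma=0.95):
    """One tabular Q learning step: move Q(s, a) toward r + gamma * max Q(s', a')."""
    target = reward + gamma * max_q_next
    return q_sa + alpha * (target - q_sa)


# Hypothetical numbers: a classifier with prediction 300 takes a wrong action
# (reward 0), but the best prediction available in the next state is 800.
before = 300.0
after = q_update(before, reward=0.0, max_q_next=800.0)
print(before, "->", after)  # the prediction rises although the action was wrong
```

The prediction rises because the discounted next-state value (0.95 × 800 = 760) is above the current prediction, even though the immediate reward is zero.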
We found this approach wrong. To tackle this, either of the following two
things could be done:
1) If a classifier takes a wrong action, give it a negative reward. The amount of
negative reward that should be given is another optimization problem and needs
careful consideration.
OR
2) After the creation of the match set [M] we find which action is the winner, say 01
(Buy), and make two sets:
a) An action set containing all the classifiers that have won (already being
done).
b) A “Passive set” containing all the classifiers that have lost, i.e. whose
action was not taken.
The union of the action set and the passive set is the match set. If a wrong decision is
taken, then instead of penalizing the classifiers in the action set (as suggested in point 1),
we give a reward of 1000 to the passive set. This approach makes sense: had action 00
(sell) been taken instead of action 01 (buy), it would have been correct and the
passive set classifiers would have been rewarded. Working this way, we avoid the
need for negative rewards.
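The routing of the reward between the two sets can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis implementation: the `Classifier` class, the learning rate `beta` and the single-step update rule are hypothetical, and the discounted next-state term of Q learning is left out so that only the action set / passive set logic is shown.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    action: str        # "buy" or "sell"
    prediction: float  # current payoff prediction


def split_match_set(match_set, chosen_action):
    """Partition the match set [M] into the action set and the passive set."""
    action_set = [c for c in match_set if c.action == chosen_action]
    passive_set = [c for c in match_set if c.action != chosen_action]
    return action_set, passive_set


def reward_sets(match_set, chosen_action, was_correct, beta=0.2, payoff=1000.0):
    """Reward the action set after a correct decision, the passive set otherwise."""
    action_set, passive_set = split_match_set(match_set, chosen_action)
    rewarded = action_set if was_correct else passive_set
    for c in rewarded:
        # Simplified update: move the prediction toward the payoff.
        c.prediction += beta * (payoff - c.prediction)
    return action_set, passive_set


match_set = [Classifier("buy", 500.0), Classifier("sell", 200.0)]
reward_sets(match_set, chosen_action="buy", was_correct=False)
print(match_set)  # only the "sell" classifier (passive set) was rewarded
```

With two actions, the passive set is exactly the set of classifiers advocating the action that would have been correct, which is why no negative reward is needed.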
4.4.5 Experimentation with Passive Set
4.4.5.1 Finding optimum parameters
We again started by finding the optimum parameters, as was done before.
For single step reward only RL and Q learning, an exploration rate of 0.5 gave the best
results on a small set of companies, and for Q (λ) an exploration rate of 0.2 gave the best
results.
A trace decay of 0.2 gave the best results for Q (λ). For details of the experimental results
please see the appendix. The discount rate was taken as 0.95.
Comparative Results for FTSE 100 companies using Passive Set.
Columns: Company Name; Average Agents Wealth (Only Reward RL, Q-RL, Q(λ) RL, B&H); Standard Deviation of Yearly Returns (Only Reward RL, Q-RL, Q(λ) RL, B&H)
AAL.L 15.27 22.79 7.44 15.59 28.06 16.65 10.48 27.75
ABF.L 5.62 19.46 2.79 8.86 16.62 4.99 8.01 12.43
AL.L -9.25 -1.28 -2.19 -8.70 27.40 19.47 7.28 27.20
AMEC.L 22.28 26.25 9.55 25.38 13.63 20.78 8.01 14.27
ANTO.L 37.24 46.00 17.03 38.24 24.00 21.38 12.46 22.76
AV.L 7.56 19.94 2.14 -3.89 22.88 11.92 1.42 32.71
AZN.L -6.72 6.07 1.70 -10.41 24.28 13.56 1.76 28.56
BA.L 25.88 27.05 1.52 6.14 31.07 26.50 7.05 49.12
BARC.L -6.02 9.07 -0.66 -2.84 25.09 13.64 4.70 26.22
BATS.L 17.53 16.59 11.80 19.36 26.22 17.62 16.34 23.95
BAY.L 12.15 21.95 1.65 -0.16 50.48 40.30 14.00 71.39
BB.L -0.33 10.91 1.34 -2.27 30.38 24.97 10.54 29.69
BDEV.L -9.93 18.86 4.67 -3.12 45.29 34.08 9.72 49.03
BG.L 25.62 26.44 19.02 23.99 32.03 17.42 21.87 33.90
BLND.L 9.15 22.72 9.15 10.08 36.99 19.07 14.71 36.32
BP.L -2.81 11.68 1.07 -3.17 23.11 10.75 2.30 23.55
BSY.L -4.89 0.80 1.09 -6.82 9.97 10.40 2.07 12.06
BT-A.L 13.05 7.24 5.05 12.73 8.85 2.13 9.62 9.41
CBRY.L 6.59 13.57 0.97 2.73 13.84 8.78 4.43 19.23
CCL.L 2.87 4.16 -1.08 -11.51 23.65 17.28 3.32 11.31
CNE.L 31.49 55.29 14.15 33.23 28.61 31.39 17.00 32.01
CPG.L -1.63 10.79 1.49 0.03 25.83 14.22 2.50 26.28
CPI.L 20.43 21.26 8.25 9.16 12.73 23.39 7.96 27.68
CPW.L 20.33 39.82 4.12 22.81 45.57 31.63 6.64 44.18
CW.L -4.13 23.41 1.60 -10.42 42.92 21.98 2.34 46.82
DGE.L 8.59 7.62 1.18 1.85 7.99 8.41 2.79 15.41
DMGT.L -3.76 10.97 -0.99 -11.00 28.93 13.76 3.92 33.12
DSGI.L -16.43 4.18 1.35 -17.08 38.95 36.13 2.06 39.54
FGP.L 9.86 23.75 4.38 9.59 27.88 20.84 14.72 28.19
FP.L -2.97 16.76 2.87 1.27 24.29 21.76 5.04 26.16
FTSE100 8.89 13.62 6.54 10.18 5.38 1.52 4.00 7.07
GSK.L -0.38 7.13 1.99 -0.27 15.12 9.53 2.57 15.17
HBOS.L -8.12 -2.53 0.44 -8.61 30.24 28.51 5.62 30.35
HMSO.L 15.46 31.64 6.98 15.19 23.20 10.40 10.50 23.24
HSBA.L 1.89 11.04 0.57 -0.60 14.06 9.11 2.74 15.94
ICI.L 14.28 29.75 5.84 12.16 56.23 43.43 7.65 57.84
III.L -0.58 8.99 0.82 -1.82 37.76 22.56 4.44 38.93
IMT.L 10.93 16.87 6.29 12.47 18.33 9.29 15.59 16.28
IPR.L 10.38 27.14 3.56 10.12 43.30 26.16 4.79 43.66
ITV.L -8.10 13.53 1.31 -5.22 64.57 37.60 5.51 62.86
JMAT.L 12.21 18.25 3.67 10.31 19.77 19.57 8.07 23.38
KEL.L 17.41 22.78 9.60 18.43 14.20 9.99 9.48 12.83
KGF.L -21.21 7.72 -1.44 -19.68 37.13 18.94 3.84 37.07
LAND.L 6.87 13.96 9.42 7.16 19.24 17.00 12.65 19.82
LGEN.L 1.24 16.24 1.26 -4.78 26.20 18.53 2.87 32.62
LII.L 11.44 15.95 8.05 13.67 21.69 19.81 8.67 21.20
LLOY.L -5.85 5.00 1.70 -8.30 25.23 17.84 1.88 27.98
LMI.L 30.80 42.58 16.92 29.38 52.19 40.04 38.49 54.78
LSE.L 26.59 41.10 11.51 25.64 70.70 54.46 40.46 70.99
MKS.L 4.20 26.07 4.16 9.73 39.27 21.21 7.74 39.30
MRW.L 3.49 23.24 -0.93 4.59 30.40 16.00 11.61 29.71
NXT.L 8.35 20.11 -3.70 6.58 20.85 16.82 2.81 22.65
OML.L -0.91 10.95 2.35 0.81 34.08 27.85 7.10 32.80
PRU.L 2.24 8.83 0.73 -2.13 25.58 13.30 2.73 30.63
PSN.L 3.13 20.42 6.02 2.13 41.57 34.67 27.42 41.44
PSON.L -0.45 10.56 2.07 -4.58 17.94 12.54 1.62 25.03
RB.L 13.67 15.69 8.84 15.61 15.22 9.35 14.29 13.41
RBS.L -8.33 9.05 -0.11 -9.41 23.42 14.36 3.06 23.97
REL.L -0.90 7.80 0.95 -1.20 14.22 7.01 2.09 14.93
REX.L 2.38 10.40 -2.26 -1.75 13.56 8.84 4.91 17.66
RIO.L 25.74 23.63 15.40 25.12 44.08 24.78 29.35 44.72
RR.L 15.84 30.62 18.57 19.61 48.63 43.65 37.32 58.63
RSA.L -5.13 26.67 2.26 -13.72 41.76 30.89 0.77 49.57
RTO.L -16.96 3.09 2.34 -19.19 19.17 11.87 0.44 20.90
RTR.L 12.96 35.36 2.46 1.10 98.71 68.67 0.31 101.20
SAB.L 14.30 19.88 6.90 13.29 27.43 20.66 12.39 28.46
SCTN.L 3.62 12.47 3.07 5.51 29.14 12.50 3.38 27.03
SDR.L 7.55 28.15 1.94 7.33 30.07 19.67 6.94 30.72
SDRC.L 6.91 20.72 2.90 5.81 25.37 11.80 8.07 27.29
SGE.L 2.77 13.86 0.32 -1.19 33.72 26.07 6.82 37.26
SHP.L 9.51 24.09 2.55 11.25 30.36 13.21 6.46 28.61
SMIN.L 2.23 9.51 0.10 2.68 18.30 14.00 3.90 17.73
SN.L 6.35 12.45 1.09 7.00 16.68 12.96 5.56 16.24
SSE.L 13.55 14.66 8.67 13.86 17.29 13.32 13.99 16.96
STAN.L 13.69 21.17 7.89 17.23 9.18 6.30 3.90 6.36
SVT.L 5.96 14.64 6.79 4.06 17.19 11.14 13.80 17.27
TATE.L 6.36 12.12 0.48 6.45 35.11 23.20 12.87 34.97
TLW.L 45.04 50.41 20.11 45.68 35.34 26.77 20.44 34.52
TSCO.L 10.47 16.57 5.91 7.58 22.54 17.16 6.86 25.93
TT.L 13.27 31.91 4.30 12.99 20.52 22.54 10.21 20.51
TW.L 10.88 21.77 3.74 6.98 29.94 14.30 10.64 24.51
ULVR.L 4.93 14.03 -1.92 4.94 10.76 7.99 6.05 10.80
UU.L 1.35 12.56 0.95 1.68 12.70 11.39 3.39 12.53
VOD.L 0.45 11.38 2.41 2.23 11.27 12.60 4.05 10.12
WOS.L -3.54 14.68 2.00 -4.46 42.99 26.64 23.17 43.48
WPP.L -3.34 15.48 1.16 -5.18 34.88 21.38 2.82 36.87
XTA.L 48.56 48.03 34.26 50.60 13.55 18.67 16.51 10.55
Average 7.22 18.70 4.73 5.98 28.41 19.84 9.28 29.90
Table 14: Comparative Results for FTSE 100 companies using Passive Set.
Columns: Learning Type; Average Agents Wealth (Only Reward RL, Q RL, Q(λ) RL, B&H); Standard Deviation (Only Reward RL, Q RL, Q(λ) RL, B&H)
Active Set 15.83 14.93 12.44 5.71 14.32 14.38 13.74 29.90
Passive Set 7.22 18.70 4.73 5.71 28.41 19.84 9.28 29.90
Optimization 15.83 18.70 12.44 5.71 14.32 19.84 13.74 29.90
Table 15: Comparison with only active set and with additional passive set approach
4.4.5.2 Observations for Passive Set
1) As expected, the performance of Q learning increases from 14.93% to 18.70%.
However, performance decreases for reward only RL and Q (λ)
learning. The reason for this is not very clear.
2) Multi step Q learning is better than single step RL for 62 of the 90 FTSE 100
companies.
3) The standard deviation (risk) for only reward RL with the active set is 28% less than
the standard deviation for Q RL with the passive set.
4) Q learning with the passive set approach was found to be the best on the performance
criterion. It gave 18% better performance than only reward RL. So, for further portfolio
management, only reward RL was used without the passive set and Q learning was
used with the passive set approach.
Combined results: single step RL without passive set and multi step Q with
passive set.
Columns: Company Name; Average Agents Wealth (Only Reward RL, Q RL, B&H); Standard Deviation (Only Reward RL, Q RL, B&H)
AAL.L 17.09 22.79 15.59 24.27 16.65 27.75
ABF.L 14.18 19.46 8.86 5.33 4.99 12.43
AL.L 12.35 -1.28 -8.70 6.56 19.47 27.20
AMEC.L 28.08 26.25 25.38 19.06 20.78 14.27
ANTO.L 38.09 46.00 38.24 18.17 21.38 22.76
AV.L 11.94 19.94 -3.89 12.65 11.92 32.71
AZN.L 4.86 6.07 -10.41 4.65 13.56 28.56
BA.L 26.57 27.05 6.14 25.11 26.50 49.12
BARC.L 10.68 9.07 -2.84 8.29 13.64 26.22
BATS.L 17.35 16.59 19.36 11.82 17.62 23.95
BAY.L 16.15 21.95 -0.16 21.51 40.30 71.39
BB.L 10.79 10.91 -2.27 16.65 24.97 29.69
BDEV.L 17.87 18.86 -3.12 22.94 34.08 49.03
BG.L 28.79 26.44 23.99 20.04 17.42 33.90
BLT.L 19.07 33.19 25.42 22.72 24.16 29.08
BP.L 5.06 11.68 -3.17 7.95 10.75 23.55
BSY.L 9.75 0.80 -6.82 7.18 10.40 12.06
BT-A.L 8.35 7.24 12.73 9.54 2.13 9.41
CBRY.L 6.66 13.57 2.73 3.64 8.78 19.23
CCL.L 5.02 4.16 -11.51 5.88 17.28 11.31
CNE.L 25.80 55.29 33.23 15.18 31.39 32.01
CPG.L 10.14 10.79 0.03 8.47 14.22 26.28
CPI.L 20.33 21.26 9.16 6.17 23.39 27.68
CPW.L 34.83 39.82 22.81 33.87 31.63 44.18
CW.L 24.50 23.41 -10.42 39.80 21.98 46.82
DGE.L 9.47 7.62 1.85 5.94 8.41 15.41
DMGT.L 7.12 10.97 -11.00 7.81 13.76 33.12
DSGI.L 11.18 4.18 -17.08 11.40 36.13 39.54
FGP.L 20.05 23.75 9.59 14.51 20.84 28.19
FP.L 19.62 16.76 1.27 11.70 21.76 26.16
FTSE100 7.43 13.62 10.18 4.24 1.52 7.07
GSK.L 6.77 7.13 -0.27 6.77 9.53 15.17
HBOS.L 9.27 -2.53 -8.61 14.48 28.51 30.35
HMSO.L 21.38 31.64 15.19 9.54 10.40 23.24
HSBA.L 9.20 11.04 -0.60 7.29 9.11 15.94
ICI.L 17.77 29.75 12.16 14.66 43.43 57.84
III.L 14.10 8.99 -1.82 14.39 22.56 38.93
IMT.L 8.73 16.87 12.47 10.44 9.29 16.28
IPR.L 22.81 27.14 10.12 20.25 26.16 43.66
ITV.L 10.96 13.53 -5.22 16.98 37.60 62.86
JMAT.L 16.39 18.25 10.31 14.05 19.57 23.38
KEL.L 16.69 22.78 18.43 11.62 9.99 12.83
KGF.L 9.97 7.72 -19.68 12.43 18.94 37.07
LAND.L 15.84 13.96 7.16 11.30 17.00 19.82
LGEN.L 19.96 16.24 -4.78 14.02 18.53 32.62
LII.L 16.20 15.95 13.67 13.36 19.81 21.20
LLOY.L 8.46 5.00 -8.30 7.40 17.84 27.98
LMI.L 31.98 42.58 29.38 34.56 40.04 54.78
LSE.L 30.04 41.10 25.64 47.90 54.46 70.99
MKS.L 15.71 26.07 9.73 12.68 21.21 39.30
MRW.L 16.55 23.24 4.59 13.16 16.00 29.71
NXT.L 4.26 20.11 6.58 7.87 16.82 22.65
OML.L 13.88 10.95 0.81 15.61 27.85 32.80
PRU.L 8.92 8.83 -2.13 9.53 13.30 30.63
PSN.L 21.37 20.42 2.13 21.93 34.67 41.44
PUB.L 21.51 31.28 1.92 13.09 12.90 41.50
RB.L 12.24 15.69 15.61 5.94 9.35 13.41
RBS.L 3.90 9.05 -9.41 4.39 14.36 23.97
REL.L 9.23 7.80 -1.20 7.37 7.01 14.93
REX.L 10.86 10.40 -1.75 9.11 8.84 17.66
RIO.L 15.47 23.63 25.12 26.36 24.78 44.72
RR.L 23.22 30.62 19.61 43.26 43.65 58.63
RSA.L 17.30 26.67 -13.72 17.11 30.89 49.57
RTO.L 8.02 3.09 -19.19 8.29 11.87 20.90
RTR.L 14.07 35.36 1.10 31.27 68.67 101.20
SAB.L 16.28 19.88 13.29 14.10 20.66 28.46
SBRY.L 14.73 10.77 -3.39 22.73 32.55 43.32
SCTN.L 13.23 12.47 5.51 6.21 12.50 27.03
SDR.L 20.07 28.15 7.33 11.23 19.67 30.72
SDRC.L 15.47 20.72 5.81 11.24 11.80 27.29
SGE.L 11.61 13.86 -1.19 12.56 26.07 37.26
SHP.L 20.30 24.09 11.25 13.39 13.21 28.61
SMIN.L 14.90 9.51 2.68 6.97 14.00 17.73
SN.L 12.10 12.45 7.00 6.36 12.96 16.24
SSE.L 15.65 14.66 13.86 16.17 13.32 16.96
STAN.L 16.68 21.17 17.23 7.48 6.30 6.36
SVT.L 17.69 14.64 4.06 27.83 11.14 17.27
TATE.L 19.97 12.12 6.45 17.57 23.20 34.97
TLW.L 27.16 50.41 45.68 26.25 26.77 34.52
TSCO.L 15.00 16.57 7.58 13.39 17.16 25.93
TT.L 22.23 31.91 12.99 15.63 22.54 20.51
TW.L 21.61 21.77 6.98 13.61 14.30 24.51
ULVR.L 7.97 14.03 4.94 5.68 7.99 10.80
UU.L 9.98 12.56 1.68 6.61 11.39 12.53
VOD.L 17.23 11.38 2.23 16.12 12.60 10.12
WOS.L 14.56 14.68 -4.46 13.52 26.64 43.48
WPP.L 11.25 15.48 -5.18 11.09 21.38 36.87
XTA.L 33.38 48.03 50.60 9.19 18.67 10.55
Average 15.83 18.70 5.71 14.32 19.84 29.90
Table 16: Combined results: single step RL without passive set and multi step Q
learning with passive set
Chapter 5
5 Implementation & Experimentation - Portfolio Optimization
Most of the discussion up to this point concentrated on the learning of the XCS
system, and we therefore concentrated on the cumulative average performance of all the
agents. We have not yet looked at or used the performance capabilities of the single best
agents. The agents vary in the type of technical indicators they use to formulate the
problem and find the underlying price pattern. Because of this, they give good
performance for certain stocks and bad performance for others. Also, the performance of
agents may vary according to whether the stock is in a trending period or in a trading
period. In a trending period we expect combinations of lagging indicators like moving
averages to perform better. In trading periods such indicators will not perform as well,
and combinations of leading technical indicators like oscillators, Stochastic or CMF are
expected to perform much better. However, please note that while designing the agents we
tried to make them robust, so that they give good performance on a variety of
stocks.
The aim of portfolio management in XCS is to harness the profit making
capabilities of individual agents. We aim to build a subsystem which interacts with the
existing eXtended Classifier System and does monthly or quarterly portfolio rebalancing
and optimization. In this system all the agents trade. According to some
performance measure (either maximum return or maximum Sterling ratio for the last
period) we see which agent did best in the last period (month or quarter) and
use that agent to trade for the next period (month or quarter). In
further sections we explore different strategies like 'taking best 3 or best 5
agents to do the trading', 'taking top best companies in the portfolio' and 'mean reversal
strategy'.
5.1 Implementation
For portfolio management we used all 14 agents described in the technical
analysis section. We needed criteria on the basis of which we could judge the
performance of the individual agents. The performance measures used were:
1) Percentage return from last period: we simply find which agent
gave the maximum percentage return in the last period (month or quarter)
and use that agent to trade for the next period.
2) Sterling ratio: a risk adjusted return measurement, calculated as
\[ \text{Sterling ratio} = \frac{\text{Average return for the period}}{\text{Maximum drawdown for that period}} \]
Unlike the Sharpe ratio, the Sterling ratio penalizes volatility only in the downward
direction.
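The Sterling ratio defined above can be computed from a series of wealth values as in the following sketch. The definition of drawdown used here (largest peak-to-trough fall as a fraction of the peak) and the handling of a period with no drawdown are assumptions for illustration; the text does not specify them.

```python
def max_drawdown(wealth):
    """Largest peak-to-trough fall in the series, as a fraction of the peak."""
    peak = wealth[0]
    worst = 0.0
    for w in wealth:
        peak = max(peak, w)
        worst = max(worst, (peak - w) / peak)
    return worst


def sterling_ratio(wealth):
    """Average per-step return divided by the maximum drawdown of the period."""
    returns = [(b - a) / a for a, b in zip(wealth, wealth[1:])]
    avg_return = sum(returns) / len(returns)
    drawdown = max_drawdown(wealth)
    # Assumption: a period without any drawdown is treated as infinitely good.
    return float("inf") if drawdown == 0 else avg_return / drawdown


# Invented wealth path for one period: up 10%, down 10%, then up again.
path = [100.0, 110.0, 99.0, 121.0]
print(max_drawdown(path), sterling_ratio(path))
```

Only the fall from 110 to 99 enters the denominator, which is what makes the measure insensitive to upward volatility.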
For portfolio performance measurement too, the average return from 20 runs, as
described before, is used to gauge the performance of different stocks.
Experiments were conducted on 56 FTSE 100 companies. (Please note that we wish to do
monthly or quarterly rebalancing of the portfolio, for which it was essential to have the
same starting date of trading for all stocks. Out of 90 FTSE 100 companies, data from
the year 2000 onwards was available for only 56. So, we used only these 56
companies in our portfolio construction and rebalancing.) Experiments were
conducted with both single step reward only RL and the Q learning methodology, using the
optimum parameters found before.
5.2 Portfolio Performance
To measure the overall risk adjusted performance of the portfolio, the Sharpe ratio
is used. The Sharpe ratio was developed by Nobel laureate William F. Sharpe and is
calculated by subtracting the risk-free rate, such as that of the 10-year U.S. Treasury
bond (we took a standard 5% for this), from the rate of return of the portfolio and dividing
the result by the standard deviation of the portfolio returns.
\[ \text{Sharpe ratio} = \frac{R_p - R_f}{\sigma_p} \]
where
\(R_p\) = expected portfolio return (annual),
\(R_f\) = risk-free rate (annual),
\(\sigma_p\) = portfolio standard deviation.
The Sharpe ratio describes how much excess return we receive for the extra
volatility (risk) endured; a higher ratio is better.
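As a quick check of the annual formula, the sketch below applies it to the three return and volatility pairs reported in Table 19, with the 5% risk-free rate stated above. The function name is illustrative.

```python
def sharpe_ratio(portfolio_return, risk_free_rate, volatility):
    """Excess return over the risk-free rate per unit of volatility."""
    return (portfolio_return - risk_free_rate) / volatility


# (RET%, VOL) pairs from Table 19: 1, 3 and 5 best agents.
for ret, vol in [(23.32, 12.38), (22.90, 12.52), (22.55, 12.26)]:
    print(round(sharpe_ratio(ret, 5.0, vol), 2))
```

The rounded values match the SR column of Table 19 (1.48, 1.43 and 1.43).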
When doing monthly portfolio rebalancing, a modified form of the Sharpe ratio formula is
used, in which r is the monthly portfolio return, f is the monthly risk-free rate (taken as
0.42%) and σr is the monthly standard deviation of returns.
5.3 Results
Result for one company for which monthly portfolio rebalancing and optimization is
done:
Columns: Transaction Date; Total Wealth; Best Agent Trading
5/2/2002 99889.28 Agent1
6/6/2002 100109.27 Agent4
7/8/2002 100329.74 Agent1
8/7/2002 98222.54 Agent1
9/9/2002 99350.09 Agent6
10/9/2002 106272.98 Agent7
11/8/2002 129037.14 Agent12
12/10/2002 126184.09 Agent14
1/14/2003 126461.98 Agent4
2/13/2003 131545.70 Agent4
3/17/2003 129894.77 Agent1
4/16/2003 130836.95 Agent14
5/21/2003 115670.29 Agent2
6/23/2003 115202.85 Agent14
Table 17: Portfolio Optimization using single best Agent
Please note how different agents were used to trade in a particular month
depending on their performance in the previous month. For the first month, May 2002,
Agent1 was chosen as the default for trading. For the second month, June 2002, we
compared the performance of all 14 agents over the first month, May, and found that
Agent4 gave the best results. We therefore used this agent to trade in the second month,
June 2002, and so on.
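The selection rule just described amounts to a one-period lag between measuring and trading. A minimal sketch follows, assuming each period's performance is available as a dictionary from agent name to score (return or Sterling ratio); the function names and the scores are hypothetical.

```python
def pick_best_agent(last_period_score, default="Agent1"):
    """Return the agent with the highest score last period, or the default."""
    if not last_period_score:
        return default
    return max(last_period_score, key=last_period_score.get)


def run_rebalancing(period_scores):
    """Return the agent used in each period, chosen from the previous period."""
    chosen = []
    previous = None
    for scores in period_scores:
        chosen.append(pick_best_agent(previous))
        previous = scores  # becomes known only after the period has ended
    return chosen


# Invented scores for three months and two of the agents.
months = [
    {"Agent1": 0.5, "Agent4": 2.0},
    {"Agent1": 1.0, "Agent4": -1.0},
    {"Agent1": 0.0, "Agent4": 0.3},
]
print(run_rebalancing(months))
```

Because the choice always uses the previous period's scores, the agent trading in a month is not necessarily the one that performs best in that month.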
5.3.1.1 Observations: Portfolio Management results
1) In the initial system, in which an average agent trades with no portfolio
rebalancing or optimization at any time, the annual return is 21.82%. In the new
system, in which the best agent is selected according to maximum return or
maximum Sterling ratio for the last period and monthly or quarterly portfolio
rebalancing is done, performance did not show much
improvement.
2) The Sharpe ratio also did not show much improvement in comparison to the
initial system.
3) For the portfolio which uses the Q learning methodology, similar observations were
noted.
5.3.1.2 Analysis and Comments on Portfolio Management System
While creating the different agents we combined technical indicators in
such a way that the agents should show good performance on the majority of stocks. We
used a combination of both leading (oscillators) and lagging (moving average and moving
volume) indicators while designing the agents. Because of this, the agents created were
fairly robust and do not specialise in capturing only the movements suited to leading
indicators or only those suited to lagging indicators. Also, the different agents did not
differ remarkably in performance. Because of this, the portfolio using the best agents
showed an average performance.
It should also be noted that the best agent for the last period might not be the
best agent for the next period. We expect that if agents are designed solely to capture
either trending markets (only combinations of lagging indicators used) or trading markets
(only leading indicators used), then they have the capability to show a much higher return
on the portfolio.
5.3.2 Change of Portfolio Construction Strategy
We also noted that the best agent for the last period might not be the best agent
for the next period, so using solely that agent for trading is not the best idea. A possible
remedy is to find the best 3 or best 5 agents of the last traded period and
let them trade in the next period. It is like not keeping all your eggs in one
basket. Also, by using more than one best agent to do the trading, we make the
system more robust. We expected such a system to perform much better, and
experimented by selecting the best 3 or best 5 agents of the last trading period and using
them in the next period.
The portfolio using Q learning showed better results than the portfolio using single
step RL. However, the trend in the results was similar for both, so from here on we restrict
ourselves to experimenting only with 1-step reward only Reinforcement Learning. An
example showing monthly portfolio management using the best 3 agents:
Columns: Sr. No; Transaction Date; Total Wealth; Best Agents Name
1 5/2/2002 356275.2 Agent1-Agent2-Agent3
2 6/6/2002 348550.4 Agent14-Agent13-Agent3
3 7/8/2002 347118.5 Agent7-Agent14-Agent8
4 8/7/2002 345733 Agent12-Agent13-Agent9
5 9/9/2002 360158.9 Agent3-Agent1-Agent8
6 10/9/2002 354231.6 Agent10-Agent11-Agent6
7 11/8/2002 348939.4 Agent14-Agent7-Agent1
8 12/10/2002 348692.1 Agent14-Agent6-Agent4
9 1/14/2003 349808.9 Agent12-Agent3-Agent10
10 2/13/2003 354686.5 Agent10-Agent4-Agent9
11 3/17/2003 366920.4 Agent3-Agent2-Agent11
12 4/16/2003 367894 Agent2-Agent8-Agent3
Table 18: Monthly Portfolio management Using Best 3 agents
For the first month (May 2002) the default agents Agent1, Agent2 and Agent3 were used to
trade on all the portfolio stocks. Each agent was given 10k to trade. After the first month
we find out which 3 agents gave the best performance in terms of maximum
return or maximum Sterling ratio. These 3 agents (Agent14, Agent13 and Agent3) were
then used for trading in the second month (June). The accumulated portfolio wealth
after the first month (May) was distributed equally among these 3 agents for the second
month (June).
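The equal split among the best k agents can be sketched as below. The function name, the scores and the wealth figure are hypothetical; only the rule itself (rank by last period's score, keep the top k, divide the accumulated wealth equally) comes from the text.

```python
def allocate_to_top_agents(last_period_score, total_wealth, k=3):
    """Split the accumulated wealth equally among the k best agents."""
    ranked = sorted(last_period_score, key=last_period_score.get, reverse=True)
    top = ranked[:k]
    share = total_wealth / len(top)
    return {agent: share for agent in top}


# Invented scores for four agents after the first month.
scores = {"Agent1": 1.0, "Agent3": 2.5, "Agent13": 3.0, "Agent14": 4.2}
print(allocate_to_top_agents(scores, total_wealth=300000.0))
```

With these invented scores the three agents selected are Agent14, Agent13 and Agent3, each receiving one third of the wealth.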
5.3.2.1 Results of Portfolio Management using more than 1 Agent
Monthly portfolio rebalancing, criterion: maximum return
Columns: 1 Best Agent (RET%, VOL, SR); 3 Best Agents (RET%, VOL, SR); 5 Best Agents (RET%, VOL, SR)
Portfolio 23.32 12.38 1.48 22.90 12.52 1.43 22.55 12.26 1.43
Table 19: Monthly Portfolio Rebalancing using different Number of Best Agents
A portfolio using more than one best agent gives nearly the same performance as a
portfolio with a single best agent. As explained earlier, the agents were
designed to perform on the majority of stocks and, although they use different sets of
technical indicators, they have similar performance. Because of this, we get nearly the
same percentage return whatever number of agents we take. However,
taking more agents in our portfolio makes it more robust, which can be seen
in the drop in volatility of the portfolio by 3% when the best 3 agents were taken.
5.4 Experimentation with Portfolio Management taking few best companies in the Portfolio
Till now we have experimented with portfolio management using all the
companies available to us. However, in a real scenario a person cannot afford to hold
stocks of all FTSE 100 companies, and may just want to hold the stocks of the best 10
companies. We built a Portfolio Management System (PMS)
which can learn online and holds only a few companies, say 5, 10 or 20. The
procedure followed is similar to what we did before when creating a portfolio with all
companies. Let us illustrate the steps followed for building a portfolio management
system that holds the best 5 companies and is rebalanced quarterly.
Best 5 Companies
Columns: Sr. Nbr; Portfolio Wealth; Companies in Portfolio; Performance (%)
1 49706.78 SCTN-ABF-BATS-AL-RB -0.59
2 49964.53 UU-VOD-AV-BG-BATS 0.52
3 51077.29 CPI-REX-DGE-SMIN-AAL 2.23
4 60423.27 SN-IPR-AAL-BAY-RSA 18.30
5 61973.84 RSA-IPR-LGEN-TSCO-RTR 2.57
6 63422.65 BA-BLT-PRU-OML-RTO 2.34
Table 20: Portfolio Management System with quarterly re-balancing taking best 5
companies
5.4.1 Steps Followed
1) For the first quarter, randomly pick any 5 companies from the FTSE 100 stocks. These companies are shown in blue. During this time the XCS system keeps track of the performance of all the stocks.
2) At the end of the first quarter, the PMS checks which 5 of all the FTSE 100 companies gave the best performance. It then rebalances and optimizes the portfolio with these 5 companies. The criterion used for checking the performance of companies was either maximum return or maximum Sterling ratio from the last quarter.
3) The process is repeated after each quarter.
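The three steps above can be sketched as follows. This is an illustrative outline rather than the PMS code: the function names, the tickers and the scores are hypothetical, and each quarter's performance is assumed to be available as a dictionary from ticker to score.

```python
import random


def initial_portfolio(universe, n, seed=None):
    """Step 1: pick n companies at random for the first quarter."""
    return random.Random(seed).sample(sorted(universe), n)


def rebalance(last_quarter_score, n):
    """Step 2: hold the n best performers of the quarter just ended."""
    return sorted(last_quarter_score, key=last_quarter_score.get, reverse=True)[:n]


def run_pms(quarterly_scores, n, seed=None):
    """Step 3: repeat every quarter; returns the holdings for each quarter."""
    holdings = [initial_portfolio(quarterly_scores[0].keys(), n, seed)]
    for scores in quarterly_scores[:-1]:
        holdings.append(rebalance(scores, n))
    return holdings


# Invented scores for two quarters and three tickers.
quarters = [{"AAA": 1.0, "BBB": 5.0, "CCC": 3.0}, {"AAA": 9.0, "BBB": 0.0, "CCC": 2.0}]
print(run_pms(quarters, n=2, seed=1))
```

The holdings for a quarter depend only on scores observed up to the end of the previous quarter, so the system can run online.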
5.4.2 Results
We experimented with the Portfolio Management System using either best 5 or best 10
or best 20 companies.
[Figure: bar chart of Volatility, Sharpe Ratio and Performance for portfolios of 5, 10 and 20 companies; series: Monthly Max Sterling and Quarterly Max Sterling; vertical axis from 0 to 35]
Figure 14: Portfolio management using either best 5 or best 10 or best 20 companies.
Here the performance criteria used to judge the performance of companies is
maximum sterling ratio for the last period.
5.4.3 Observations
1) By taking more companies in the portfolio, risk (volatility) decreases and risk adjusted return, i.e. the Sharpe ratio, increases. This shows that as more companies are taken into the portfolio, it becomes more robust and less risky.
2) Quarterly portfolio optimization showed better performance than monthly portfolio optimization. This shows that reacting to short term market dynamics is not beneficial.
3) It is easy to see that by taking more companies in the portfolio we take less risk. However, there is no particular trend in the performance of the portfolio as more companies are added.
5.4.4 Experimentation with Mean Reversal Strategy
According to Investopedia
“The mean reversion strategy is based on the mathematical premise that all prices
will eventually move back towards the mean or average return. Thus, if a stock is
underperforming, its price will move towards its average value when the market
rebounds.”
The main idea of this strategy is to include underperforming stocks in the
portfolio in the hope that a trend reversal will take place in their price movement and
they will become profitable.
Here, we followed steps similar to those described in section 5.4.1, except that
instead of including the best performing 5, 10 or 20 companies, we include the worst
performing 5, 10 or 20 companies of the last quarter.
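In code, the only difference between the two strategies is the direction of the ranking. A small sketch, with a hypothetical function name and invented Sterling ratios:

```python
def select_companies(last_period_sterling, n, mean_reversal=False):
    """Return n tickers ranked by last period's Sterling ratio.

    mean_reversal=False keeps the best n (normal strategy);
    mean_reversal=True keeps the worst n (mean reversal strategy).
    """
    ranked = sorted(last_period_sterling, key=last_period_sterling.get,
                    reverse=not mean_reversal)
    return ranked[:n]


scores = {"AAA": 1.8, "BBB": -0.4, "CCC": 0.6, "DDD": -1.2}  # invented values
print(select_companies(scores, 2))                      # normal strategy
print(select_companies(scores, 2, mean_reversal=True))  # mean reversal strategy
```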
5.4.4.1 Results
[Figure: bar chart of Volatility, Sharpe Ratio and Performance for portfolios of 5, 10 and 20 companies; series: Monthly Least Sterling and Quarterly Least Sterling; vertical axis from 0 to 30]
Figure 15: Portfolio Management System using trend reversal strategy.
Here also, the performance criteria used to judge companies is sterling ratio for
last period.
5.4.4.2 Observations
1) As more companies were taken in the portfolio, it became more robust. Risk (volatility) showed a steady decrease and the Sharpe ratio (risk adjusted return) a steady increase for both monthly and quarterly rebalancing.
2) A riskier portfolio has the potential to give a higher percentage return. By taking more companies in our portfolio the risk is reduced, and so is the percentage return. Portfolio management is all about how much extra risk we are ready to endure for an extra amount of return.
[Figure: bar chart of Volatility, Sharpe Ratio and Performance for portfolios of 5, 10 and 20 companies; series: Quarterly RB Normal Strategy and Quarterly RB Trend reversal; vertical axis from 0 to 35]
Figure 16: Comparison of mean reversal strategy with normal strategy.
3) The normal strategy of taking the best companies showed better performance than the mean reversal strategy. The trading period used for the experiments is 6 years, and this shows that in the long run the trend reversal strategy does not work. I believe the market follows trends of ups and downs, and it is important to identify these trends. When the market is in a steady uptrend or steady downtrend, the normal strategy of taking the best companies will work best. However, in times of ups and downs, trend reversal might perform best. In conclusion, in the long run the normal strategy of taking the best companies, rather than the mean reversal strategy, is our best bet for maximum profit.
* For details of the results please see the appendix.
Chapter 6
6 Conclusion & Future Work
6.1 Conclusion
We started the project with the aim of improving the learning of the XCS
system for financial trading. We began by creating and experimenting with a
large set of technical indicators, and used heuristics to optimize the problem formulation
that provides input to the XCS system. In the process, we
developed a set of 14 agents using different sets of technical indicators, whose
performance was far better than that of the agents previously used in the
XCS system. We expect that utilizing more expert knowledge in
technical analysis could bring further improvement in this area. Next, we formulated
financial forecasting as a multi step Reinforcement Learning problem by
implementing Q learning and Q (λ) learning. During the implementation of this new
approach we found it necessary that, if agents take a wrong decision, we either
penalize all agents in the action set (by giving a negative reward) or reward the
complementary passive set (the union of the action set and the passive set is the initial
match set). We also found that the classifiers did not learn properly with the old strategy,
which judged the correctness of the action taken solely on the basis of the next day's price
movement. To address this we used the concept of delayed reward feedback, in which the
reward was delayed by 5 days and the reward judgment was based on the moving
average of the next 5 days' closing prices. In the initial version of the system, the
cumulative performance of the agents for 90 FTSE 100 companies was not able to beat the
buy and hold strategy. After implementing the above mentioned algorithm and
strategy, the cumulative performance over all 90 FTSE 100 companies
beat the buy and hold strategy by nearly 160%. We were able to confirm that
learning is proper in our new system. We were also able to show that the market does
follow short trends and that it is better to formulate financial forecasting as a multi step
problem rather than a single step problem. Q learning gave 18% better performance
than single step reward only RL.
Finally, we built a portfolio management system attached to the modified XCS
system, which did monthly, quarterly or yearly portfolio rebalancing using the
predictive capability of the best agent at a particular time. As the performance measure
for finding the best agent, we used the maximum return or the Sterling ratio for the last
period. We found that the performance of such a system is equal to the average
performance of all 14 agents. This confirms the viewpoint that reacting to market behavior
does not necessarily generate better results. We also employed the strategy of “using a
combination of best agents” to optimize the portfolio. However, the results of this strategy
did not show much improvement over the existing portfolio management system. Finally, we
built a portfolio holding only the best few companies (5, 10 or 20). We found that such a
portfolio gives far better performance than the portfolio containing all the
companies. We also employed the mean reversal strategy on this portfolio. However, we
observed that the normal strategy of taking the best companies showed better performance
than the mean reversal strategy. There are many things which can be improved in the
current system. They are mentioned in the future work.
6.2 Future WorkFollowing are the key areas which can be explored further to improve the modified
eXtended Classifier System -:
1) We believe after the amendments made to the learning of the XCS System, It is
learning properly and efficiently. However, currently XCS System is able to
use only the rise in the stock prices and when it predicts drop in prices, it
simply sell the company's shares and put the money in bank. The performance
can further be improved by implementing Shorting. Short selling or
"Shorting" is the practice of selling securities the seller does not own, in the
hope of repurchasing them later at a lower price. This is done in an attempt to
profit from an expected fall or decline in price of a security,
2) We believe the better we define the input binary string, the better XCS system
can perform. Work done on the combination of technical indicator can serve as
starting point for future work. The main questions that need to be tackled are -:
a) What other combinations of technical indicators can be used?
b) How many of the technical indicators should be combined for giving
input to XCS system.
3) The portfolio management system can be made more robust by including different
optimization strategies such as low beta or high beta portfolios.
4) It is not clear why the passive set approach didn't work with single step
reinforcement learning and Q(λ) learning. The reason for this could be
investigated further.
5) Currently XCS is used to do the stock trading, and the portfolio management system
attached to XCS simply uses some heuristics to optimize the portfolio.
It would be very interesting to explore how XCS itself could be used for
optimizing the portfolio.
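The shorting idea in item 1 can be illustrated with a small arithmetic sketch. The function names below are hypothetical and are not part of the XCS trader; borrowing costs, margin requirements and transaction fees are ignored, which is a simplifying assumption.

```python
def long_return(entry_price, exit_price):
    """Fractional return of buying at entry_price and selling at exit_price."""
    return (exit_price - entry_price) / entry_price


def short_return(entry_price, exit_price):
    """Fractional return of selling short at entry_price and buying back
    at exit_price (assumption: no borrowing cost, margin or fees)."""
    return (entry_price - exit_price) / entry_price


# A predicted fall from 100 to 90 loses money for a long position
# but is profitable for a short position.
fall_long = long_return(100.0, 90.0)
fall_short = short_return(100.0, 90.0)
```

Under these assumptions a short position simply mirrors the long return, so an agent that currently moves to cash on a predicted fall could instead earn the size of the fall.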
7 Bibliography
[1] John Moody and Matthew Saffell, “Learning to Trade via Direct Reinforcement”,
IEEE Transactions on Neural Networks, Vol. 12, No. 4, July 2001.
[2] Xiu Gao, Laiwan Chan, “An Algorithm for Trading and Portfolio Management
Using Q-Learning and Sharpe Ratio Maximization”, Proceedings of the International
Conference on Neural Information Processing, 2000.
[3] Neuneier, R. (1996), “Optimal asset allocation using adaptive dynamic
programming”, in D. Touretzky, M. Mozer & M. Hasselmo, eds, “Advances in Neural
Information Processing Systems 8”, MIT Press.
[4] Kyong Joo Oh, Tae Yoon Kim, Sungky Min, “Using genetic algorithm to
support portfolio optimization for index fund management”, Expert Systems with
Applications, Volume 28, Issue 2, February 2005, Pages 371-379.
[5] Edward P.K. Tsang and Serafin Martinez Jaramillo, “Computational Finance”,
IEEE Computational Intelligence Society Newsletter (August 2004).
[6] http://www.investopedia.com/
[7] An-Pin Chen, Mu-Yen Chen, “Integration extended classifier system and knowledge
extraction model for financial investment prediction: An empirical study”, Institute of
Information Management, National Chiao Tung University, 1001 Ta Hsueh Road,
Hsinchu 30050, Taiwan, ROC.
[8] Mei-Chih Chen, Chang-Li Lin, An-Pin Chen, “Constructing a dynamic
stock portfolio decision-making assistance model: using the Taiwan 50 Index
constituents as an example”, Soft Comput, DOI 10.1007/s00500-007-0158-y.
[9] Gershoff, M. and S. Schulenburg (2007). "Collective behavior based hierarchical
XCS." Proceedings of the 2007 GECCO conference companion on Genetic and
evolutionary computation: 2695–2700.
[10] Schulenburg, S. and P. Ross (2001). "Explorations in LCS Models of Stock
Trading." Advances in Learning Classifier Systems: 151–180.
[11] M.A.H. Dempster and C.M. Jones, “A real time adaptive trading system using
genetic programming”, Quant. Finance, 2001.
[12] Sor Ying Wong, Sonia Schulenburg, “Portfolio Allocation using XCS Experts in
Technical Analysis, Market Conditions and Options Market”, Proceedings of the 2007
GECCO conference companion on Genetic and evolutionary computation.
[13] M.A.H. Dempster, Tom W. Payne, Yazann Romahi and G.W.P. Thompson,
“Computational Learning Techniques for Intraday FX Trading Using Popular Technical
Indicators”, IEEE Transactions on Neural Networks, Vol. 12, No. 4, July 2001.
[14] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction
(Adaptive Computation and Machine Learning). The MIT Press, March 1998.
[15] Martin V. Butz, Rule-Based Evolutionary Online Learning Systems: A Principled
Approach to LCS Analysis and Design, Chapter 4.
[16] Maurice Kendall, “The Analysis of Economic Time Series Part I: Prices”, Journal
of the Royal Statistical Society 96 (1953).
[17] Zvi Bodie, Alex Kane, and Alan J. Marcus, Investments, McGraw-Hill
Companies, Chapter 12: Market Efficiency.
[18] www.stockchart.com
[19] Abu ul Hassan Tajwer, MSc Dissertation, “Evolving Trading Rules using Multi-Agents
XCS Environment”.
[20] Fama, E. F. (1970). "Efficient Capital Markets: A Review of Theory and Empirical Work." The Journal of Finance 25(2): 383-417.
[21] Samuelson, P.A., “Proof that properly anticipated prices fluctuate randomly”, Industrial Management Review, 6 (1965): 41-49.
[22] Mandelbrot, B., “Forecasts of future prices, unbiased markets and martingale models”, Journal of Business, 39 (1966): 242-255.
[23] Malkiel, B. G. (1996). A Random Walk Down Wall Street: Including a Lifecycle Guide to Personal Investing, WW Norton & Company.
[24] Burton, M. (1999). A Random Walk Down Wall Street, New York, itd.: Norton.
[25] Lo, Andrew W., and A. C. Mackinlay. A Non-Random Walk Down Wall Street. 5th ed. Princeton: Princeton University P, 2002. 4-47.
[26] Stone, C. and L. Bull (2004). "Foreign Exchange Trading using a Learning Classifier System." University of the West of England Bristol, Bristol, United Kingdom.
[27] M. B. Porecha, P. K. Panigrahi, J. C. Parikh, C. M. Kishtawal, and Sujit Basu
(2005) "Forecasting non-stationary financial time series through genetic algorithm".
[28] Leigh W., Paz M., and Purvis R. (2002): "An analysis of a hybrid neural network
and pattern recognition technique for predicting short-term increases in the NYSE
composite index", Omega, Vol. 30, Number 2, pp 69-76(8).
[29] Sonia Schulenburg and Peter Ross. An Adaptive Agent Based Economic
Model. In Pier Luca Lanzi, Wolfgang Stolzmann, and Stewart W. Wilson, editors,
Learning Classifier Systems: From Foundations to Applications, volume 1813 of
Lecture Notes in Artificial Intelligence, pages 265–284. Springer-Verlag, Berlin,
2000.
[30] Sonia Schulenburg and Peter Ross. Strength and Money: An LCS Approach to
Increasing Returns. In Pier Luca Lanzi, Wolfgang Stolzmann, and Stewart W. Wilson,
editors, Advances in Learning Classifier Systems, volume 1996 of Lecture Notes in
Artificial Intelligence, pages 114–137. Springer-Verlag, Berlin, 2001.
8 Appendix
8.1 Technical indicators calculation
1) SAR:
Tomorrow's SAR value is built using data available today. The general formula used for this is:
SARn+1 = SARn + α (EP - SARn)
where SARn and SARn+1 represent today's and tomorrow's SAR values, respectively.
The extreme point, EP, is a record kept during each trend that represents the highest value reached by the price during the current uptrend (or the lowest value during a downtrend). The α value represents the acceleration factor. Usually, this is set to a value of 0.02 initially. This factor is increased by 0.02 each time a new EP is recorded. To keep it from getting too large, a maximum value for the acceleration factor is normally set at 0.20, so that it never goes beyond that. The SAR is recursively calculated in this manner for each new period. There are, however, two special cases that will modify the SAR value:
• If tomorrow's SAR value lies within (or beyond) today's or yesterday's price range, the SAR must be set to the closest price bound. For example, if in an uptrend the new SAR value is calculated and it turns out to be greater than today's or yesterday's lowest price, the SAR must be set equal to that lower boundary.
• If tomorrow's SAR value lies within (or beyond) tomorrow's price range, a new trend direction is then signaled, and the SAR must "switch sides."
Upon a trend switch, several things happen. The first SAR value for the new trend is set to the last EP recorded on the previous trend. The EP is then reset accordingly to this period's maximum. The acceleration factor is reset to its initial value of 0.02 [18].
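The update rules described above can be sketched as a few small helpers. This is a minimal illustration, not the code used in the project: it assumes the commonly published recurrence SAR(n+1) = SAR(n) + α (EP - SAR(n)), handles only the uptrend version of the first special case, and all function names are hypothetical.

```python
def next_sar(sar, ep, af):
    """One step of the recurrence: move SAR towards the extreme point EP
    by the acceleration factor af."""
    return sar + af * (ep - sar)


def next_af(af, new_extreme, step=0.02, cap=0.20):
    """Increase the acceleration factor by `step` when a new EP was recorded,
    never exceeding `cap`; otherwise leave it unchanged."""
    return min(af + step, cap) if new_extreme else af


def clamp_uptrend_sar(sar, low_today, low_yesterday):
    """First special case, uptrend only: the SAR may not lie above
    today's or yesterday's low."""
    return min(sar, low_today, low_yesterday)


# Example: SAR at 100, extreme point at 110, initial acceleration factor 0.02.
candidate = next_sar(100.0, 110.0, 0.02)
tomorrow_sar = clamp_uptrend_sar(candidate, low_today=100.1, low_yesterday=100.5)
```

A full implementation would also track the trend direction and perform the "switch sides" reset described in the second special case.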
2) Accumulation Distribution Line: A popular volume indicator. The basic idea is that volume precedes price. First we find the “close location value” (CLV), which shows the position of the close relative to the range of the period.
CLV = ((C - L) - (H - C)) / (H - L)
Accumulation Distribution Line = Σ (CLV * Volume for a period)
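A minimal sketch of the CLV and the running Accumulation Distribution Line follows. The function names are hypothetical, and returning a CLV of 0 for a bar with no range (H = L) is an assumption made here to avoid division by zero.

```python
def close_location_value(high, low, close):
    """CLV = ((C - L) - (H - C)) / (H - L); lies between -1 and +1."""
    if high == low:
        return 0.0  # assumption: a bar with no range contributes nothing
    return ((close - low) - (high - close)) / (high - low)


def accumulation_distribution_line(highs, lows, closes, volumes):
    """Running sum of CLV * volume, one value per period."""
    line, total = [], 0.0
    for h, l, c, v in zip(highs, lows, closes, volumes):
        total += close_location_value(h, l, c) * v
        line.append(total)
    return line


# A close at the high adds the full volume; a close at the low subtracts it.
adl = accumulation_distribution_line([10, 10], [8, 8], [10, 8], [100, 50])
```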
3) Commodity Channel Index (CCI): 4 steps are involved:
a) Calculate the typical price (TP) = (H + L + C)/3
b) Calculate the 20-day simple moving average of the typical price (SMATP)
c) Calculate the mean deviation
d) CCI = (Typical Price - SMATP)/(0.015 * Mean Deviation)
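The four steps above can be sketched as follows. The function name is hypothetical; taking the mean deviation as the mean absolute deviation of the typical prices in the window from their moving average, and returning 0 when that deviation is zero, are assumptions of this sketch.

```python
def commodity_channel_index(highs, lows, closes, period=20):
    """CCI per period; None until `period` values are available."""
    # a) typical price
    tps = [(h + l + c) / 3 for h, l, c in zip(highs, lows, closes)]
    out = [None] * len(tps)
    for i in range(period - 1, len(tps)):
        window = tps[i - period + 1:i + 1]
        # b) simple moving average of the typical price
        sma = sum(window) / period
        # c) mean (absolute) deviation of the window from its average
        mean_dev = sum(abs(tp - sma) for tp in window) / period
        # d) CCI, guarding against a flat window (assumption: report 0)
        out[i] = 0.0 if mean_dev == 0 else (tps[i] - sma) / (0.015 * mean_dev)
    return out


# With a 3-period window and typical prices 1, 2, 3 the last CCI is about 100.
cci = commodity_channel_index([1, 2, 3], [1, 2, 3], [1, 2, 3], period=3)
```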
4) Chaikin Money Flow (CMF)
CMF = cumulative total of Accumulation Distribution values for 21 periods
divided by cumulative total of volume for 21 periods.
5) Money Flow Index
Typical price (TP) = (H + L + C)/3
Money Flow = (Typical Price * Volume)
For the past 14 days find
Money Ratio = Cumulative Positive Money Flow /
Cumulative Negative Money Flow
Money Flow Index = 100 - 100 / (1 + Money Ratio)
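A minimal sketch of the Money Flow Index calculation is given below. The text does not define positive and negative money flow; the sketch assumes the usual convention that a period's money flow is positive when its typical price is above the previous period's and negative when it is below. The function name is hypothetical, and reporting 100 when there is no negative flow is also an assumption.

```python
def money_flow_index(highs, lows, closes, volumes, period=14):
    """MFI per period; None until `period` price changes are available."""
    tps = [(h + l + c) / 3 for h, l, c in zip(highs, lows, closes)]
    flows = [tp * v for tp, v in zip(tps, volumes)]
    out = [None] * len(tps)
    for i in range(period, len(tps)):
        positive = negative = 0.0
        for j in range(i - period + 1, i + 1):
            if tps[j] > tps[j - 1]:
                positive += flows[j]
            elif tps[j] < tps[j - 1]:
                negative += flows[j]
        if negative == 0:
            out[i] = 100.0  # assumption: no negative flow maps to the upper bound
        else:
            out[i] = 100 - 100 / (1 + positive / negative)
    return out


# Typical prices 10 -> 11 -> 10 with volume 100: money ratio = 1100 / 1000.
mfi = money_flow_index([10, 11, 10], [10, 11, 10], [10, 11, 10],
                       [100, 100, 100], period=2)
```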
6) On Balance Volume (OBV)
If today's closing price is greater than yesterday's closing price
OBV = Yesterday's OBV + Today's Volume
Else if today's closing price is less than yesterday's closing price
OBV = Yesterday's OBV - Today's Volume
Else
OBV = Yesterday's OBV
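The rule above translates directly into a running total. In this sketch the function name is hypothetical and the starting OBV of 0 is an assumption, since only changes in OBV are meaningful.

```python
def on_balance_volume(closes, volumes, start=0.0):
    """OBV series: add volume on an up close, subtract it on a down close."""
    if not closes:
        return []
    obv = [start]
    for i in range(1, len(closes)):
        if closes[i] > closes[i - 1]:
            obv.append(obv[-1] + volumes[i])
        elif closes[i] < closes[i - 1]:
            obv.append(obv[-1] - volumes[i])
        else:
            obv.append(obv[-1])
    return obv


# Up day (+200), down day (-150), unchanged day (no change).
obv = on_balance_volume([10, 11, 10, 10], [100, 200, 150, 300])
```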
7) Percentage Volume Oscillator (PVO)
PVO = (Shorter EMA of Volume - Longer EMA of Volume) / Longer EMA of
Volume * 100
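A small sketch of the PVO follows. The text does not state the EMA lengths or how the EMA is seeded; the defaults of 12 and 26 periods, the smoothing constant 2/(n+1) and seeding with the first volume are assumptions, and the function names are hypothetical.

```python
def ema(values, period):
    """Exponential moving average with smoothing 2 / (period + 1),
    seeded with the first value (an assumption of this sketch)."""
    k = 2 / (period + 1)
    out = [float(values[0])]
    for v in values[1:]:
        out.append(v * k + out[-1] * (1 - k))
    return out


def percentage_volume_oscillator(volumes, short=12, long=26):
    """PVO = (short EMA - long EMA) / long EMA * 100, per period."""
    short_ema, long_ema = ema(volumes, short), ema(volumes, long)
    return [0.0 if l == 0 else (s - l) / l * 100
            for s, l in zip(short_ema, long_ema)]


# Volume jumping from 100 to 200 lifts the short EMA above the long EMA.
pvo = percentage_volume_oscillator([100, 200], short=1, long=3)
```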
8) Relative Strength Index (RSI)
RSI = 100 - 100/(1 + RS)
RS = Average Gain / Average Loss
Average Gain = [(Previous Average Gain) * 13 + Current Gain]/14
First Average Gain = Total of Gains during past 14 periods / 14
Average Loss = [(Previous Average Loss) * 13 + Current Loss]/14
First Average Loss = Total of Losses during past 14 periods / 14
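The smoothing scheme above can be sketched as follows, generalising 14 and 13 to `period` and `period - 1`. The function name is hypothetical, and reporting an RSI of 100 when the average loss is zero is an assumption of this sketch.

```python
def relative_strength_index(closes, period=14):
    """RSI per close; None until `period` price changes are available."""
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    out = [None] * len(closes)
    if len(changes) < period:
        return out
    # First averages: simple means over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for i in range(period, len(closes)):
        if i > period:
            # Later averages: weight the previous average by (period - 1).
            avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        if avg_loss == 0:
            out[i] = 100.0  # assumption: no losses maps to the upper bound
        else:
            out[i] = 100 - 100 / (1 + avg_gain / avg_loss)
    return out


# Changes +1, -1, +2 with a 2-period window give RSI 50, then about 83.3.
rsi = relative_strength_index([10, 11, 10, 12], period=2)
```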
9) Stochastic Oscillator
10) Williams %R
%R = [(highest high over 14 periods - close)/(highest high over 14 periods - lowest
low over 14 periods)] * -100
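The %R formula can be sketched as a rolling-window calculation. The function name is hypothetical, and returning 0 when the window has no range is an assumption made to avoid division by zero.

```python
def williams_percent_r(highs, lows, closes, period=14):
    """Williams %R per period, between -100 and 0; None until the
    window of `period` bars is full."""
    out = [None] * len(closes)
    for i in range(period - 1, len(closes)):
        highest_high = max(highs[i - period + 1:i + 1])
        lowest_low = min(lows[i - period + 1:i + 1])
        if highest_high == lowest_low:
            out[i] = 0.0  # assumption: flat window
        else:
            out[i] = (highest_high - closes[i]) / (highest_high - lowest_low) * -100
    return out


# Window high 12, window low 8, close 11: %R = (12 - 11) / (12 - 8) * -100.
wr = williams_percent_r([10, 12], [8, 9], [9, 11], period=2)
```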
8.2 FTSE 100 Companies Details
                                           Training + Trading Period
S.Nbr  Company Code  Company Name          From         To
1      AAL.L         ANGLO AMERICAN        5/17/2000    4/29/2008
2      ABF.L         A.B. FOOD             5/16/2000    4/29/2008
3      AL.L          ALLIANCE TRUST        5/16/2000    4/29/2008
4      AMEC.L        AMEC ORD              6/19/2001    4/29/2008
5      ANTO.L        ANTOFAGSTA            1/5/2000     4/29/2008
6      AV.L          AVIVA                 5/16/2000    4/29/2008
7      AZN.L         ASTRASENECA           5/17/2000    4/29/2008
8      BA.L          BAR Systems           5/16/2000    4/29/2008
9      BARC.L        BARCLAYS              5/16/2000    4/29/2008
10     BATS.L        BR. AMER.TOB          5/16/2000    4/29/2008
11     BAY.L         BR. AIRWAYS           5/16/2000    4/29/2008
12     BB.L          BB                    12/6/2000    4/29/2008
13     BDEV.L        BARRATT DEVEL         6/19/2001    4/29/2008
14     BG.L          BG GRP                5/16/2000    4/29/2008
15     BLND.L        BR. LAND              5/16/2000    4/29/2008
16     BLT.L         BHP BILLITON          5/16/2000    4/29/2008
17     BP.L          BP                    5/16/2000    4/29/2008
18     BSY.L         BSKYB                 5/16/2000    4/29/2008
19     BT-A.L        BT GROUP              11/12/2001   4/29/2008
20     CBRY.L        CADBURY-SCH           5/16/2000    4/29/2008
21     CCL.L         CARNIVAL              4/23/2003    4/29/2008
22     CNE.L         CAIRN ENERGY          2/21/2003    4/29/2008
23     CPG.L         COMPASS GROUP         2/6/2001     4/29/2008
24     CPI.L         CAPITA GROUP          5/16/2000    4/29/2008
25     CPW.L         CARPHONE WARE         7/17/2000    4/29/2008
26     CW.L          CABLE AND WIRE        1/5/2000     4/29/2008
27     DGE.L         DIAGEO                5/16/2000    4/29/2008
28     DMGT.L        DAILY MAIL'A'         5/16/2000    4/29/2008
29     DSGI.L        DSG INTL              1/5/2000     4/29/2008
30     FGP.L         FIRST GROUP           5/16/2000    4/29/2008
31     FP.L          FRIENDS PROV          7/10/2001    4/29/2008
32     FTSE100       FTSE INDEX            3/11/2003    4/29/2008
33     GSK.L         GLAXOSMITHKLINE       12/28/2000   4/29/2008
34     HBOS.L        HBOS                  9/11/2001    4/29/2008
35     HMSO.L        HAMMERSON             1/5/2000     4/29/2008
36     HSBA.L        HSBC HLDGS UK         5/16/2000    4/29/2008
37     ICI.L         ICI                   5/16/2000    4/29/2008
38     III.L         3 I Grp               5/16/2000    4/29/2008
39     IMT.L         IMP. TOBACCO GRP      5/16/2000    4/29/2008
40     IPR.L         INTL POWER            5/16/2000    4/29/2008
41     ITV.L         ITV                   1/5/2000     4/29/2008
42     JMAT.L        JOHNSON,MATTH         5/16/2000    4/29/2008
43     KEL.L         KELDA GROUP           5/16/2000    4/29/2008
44     KGF.L         KINGFISHER            7/8/2003     4/29/2008
45     LAND.L        LAND SECS             9/9/2002     4/29/2008
46     LGEN.L        LEGAL AND GEN         5/16/2000    4/29/2008
47     LII.L         LIBERTY INTL          1/5/2000     4/29/2008
48     LLOY.L        LLOYDS TSB GRP        5/16/2000    4/29/2008
49     LMI.L         LONMIN                2/26/2002    4/29/2008
50     LSE.L         LON.STK.EXCH          7/24/2001    4/29/2008
51     MKS.L         MARKS AND SP.         3/20/2002    4/29/2008
52     MRW.L         MORRISON(WM)          5/16/2000    4/29/2008
53     NXT.L         NEXT                  11/26/2002   4/29/2008
54     OML.L         OLD MUTUAL            5/16/2000    4/29/2008
55     PRU.L         PRUDENTIAL            5/16/2000    4/29/2008
56     PSN.L         PERSIMMON             9/25/2001    4/29/2008
57     PSON.L        PERSON                5/16/2000    4/29/2008
58     PUB.L         PUNCH TVNS            5/24/2002    4/29/2008
59     RB.L          RECKITT BEN GP.       5/16/2000    4/29/2008
60     RBS.L         RECKITT BEN. GP       5/16/2000    4/29/2008
61     REL.L         REED ELSEVIER         5/16/2000    4/29/2008
62     REX.L         REXAM                 5/16/2000    4/29/2008
63     RIO.L         RIO TINTO             1/5/2000     4/29/2008
64     RR.L          ROLLS-ROYCE           6/24/2003    4/29/2008
65     RSA.L         ROYAL AND SUNN ALL.   5/16/2000    4/29/2008
66     RTO.L         RENTOKIL INITL        5/16/2000    4/29/2008
67     RTR.L         REUTERS GRP           5/16/2000    4/29/2008
68     SAB.L         SABMILLER             5/16/2000    4/29/2008
69     SBRY.L        SAINSBURY             5/16/2000    4/29/2008
70     SCTN.L        SCOTTISH&NEWCASTLE    5/16/2000    4/29/2008
71     SDR.L         SCHRODERS             1/5/2000     4/29/2008
72     SDRC.L        SCHRODERS NV          1/5/2000     4/29/2008
73     SGE.L         SAGE GRP              5/16/2000    4/29/2008
74     SHP.L         SHIRE                 5/16/2000    4/29/2008
75     SMIN.L        SMITHS GROUP          5/16/2000    4/29/2008
76     SN.L          SMITH&NEPHEW          5/16/2000    4/29/2008
77     SSE.L         SCOT. AND STH.ENRGY   5/16/2000    4/29/2008
78     STAN.L        STAND.CHART           1/22/2001    4/29/2008
79     SVT.L         SEVERN TRENT          5/16/2000    4/29/2008
80     TATE.L        TATE & LYLE           5/16/2000    4/29/2008
81     TLW.L         TULLOW OIL            12/19/2000   4/29/2008
82     TSCO.L        TESCO                 5/16/2000    4/29/2008
83     TT.L          TUI TRAVEL            1/5/2000     4/29/2008
84     TW.L          TAYLOR WIMPEY         12/19/2000   4/29/2008
85     ULVR.L        UNIVELVER             5/16/2000    4/29/2008
86     UU.L          UTD UTILITIES         5/16/2000    4/29/2008
87     VOD.L         VODAFONE GRP.         5/17/2000    4/29/2008
88     WOS.L         WOLSELEY              5/16/2000    4/29/2008
89     WPP.L         WPP GRP               5/16/2000    4/29/2008
90     XTA.L         XSTRATA               3/21/2002    4/29/2008
Table 21: 90 FTSE 100 Companies
8.3 Setting parameters for Passive Set
Setting Exploration rate

Reward Only Learning
          0.5         0.2         B&H
AAL.L     15.54611    15.60947    15.59329
HSBA.L    2.082694    1.903172    -0.60372
ICI.L     14.27386    13.486      12.1563
MKS.L     4.446728    4.082051    9.730154
MRW.L     3.307717    4.16874     4.593169
NXT.L     8.187518    7.798162    6.575918
OML.L     -1.06169    -1.78683    0.805047
RR.L      15.53187    15.85369    19.60784
Average   7.789352    7.639307    8.557249

Q Learning
          0.5         0.2         B&H
AAL.L     21.45264    20.78705    15.59329
HSBA.L    11.34646    11.56546    -0.60372
ICI.L     29.76451    28.95684    12.1563
MKS.L     23.82818    25.23381    9.730154
MRW.L     23.47143    22.91625    4.593169
NXT.L     20.38398    18.9197     6.575918
OML.L     10.6473     9.8369      0.805047
RR.L      32.08418    30.20801    19.60784
Average   21.62233    21.053      8.557249

Q(λ) Learning
          0.5         0.2         0.02        B&H
AAL.L     6.746205    9.001104    3.096389    15.59329
HSBA.L    -0.09619    0.411505    1.010946    -0.60372
ICI.L     7.725501    6.210549    6.078729    12.1563
MKS.L     8.96266     8.009718    4.437538    9.730154
MRW.L     -2.20793    -2.18329    -0.32745    4.593169
NXT.L     -4.61885    -4.63048    -2.55267    6.575918
OML.L     3.289822    2.829204    2.37        0.805047
RR.L      9.335978    12.02342    18.13777    19.60784
Average   3.642148    3.958965    4.031406    8.557249

Setting trace Decay

Q(λ) Learning
          0.8         0.5         0.2         B&H
AAL.L     9.001104    6.379423    7.036763    15.59329
HSBA.L    0.411505    0.703608    0.359937    -0.60372
ICI.L     6.210549    7.17197     6.469978    12.1563
MKS.L     8.009718    4.027111    3.425882    9.730154
MRW.L     -2.18329    -0.66926    -1.49622    4.593169
NXT.L     -4.63048    -4.18156    -2.6183     6.575918
OML.L     2.829204    1.474072    1.948605    0.805047
RR.L      12.02342    14.21432    20.37181    19.60784
Average   3.958965    3.63996     4.437307    8.557249
Table 22 Setting parameters for passive set.
8.4 Portfolio Management Results
8.4.1 Reward Only Learning
                      Before: No portfolio                Rebalancing Criteria: Max Return                    Criteria: Max Sterling Ratio
             Average Agent    Buy and Hold    Monthly Portfolio RB  Quarterly Portfolio RB    Monthly Portfolio RB  Quarterly Portfolio RB
Company       RET%     VOL    RET%     VOL    RET%     VOL      SR    RET%     VOL      SR    RET%     VOL      SR    RET%     VOL      SR
AAL.L        31.62   24.27   24.59   27.75   33.99   20.69    0.80   27.75   21.71    0.66   41.53   20.64    0.92   21.67   16.74    0.66
ABF.L        19.47    5.33   11.32   12.43   23.64   10.30    1.06   22.56    8.69    1.21   16.41   10.34    0.72   23.58    8.41    1.30
AL.L         14.80    6.56   -6.69   27.20   23.76   11.78    0.94   18.01   10.14    0.83   14.09    9.68    0.63   16.23    9.25    0.80
AV.L         15.54   12.65   -2.62   32.71   20.76   13.49    0.74   20.69   12.23    0.82   13.70   11.37    0.53   15.91   12.39    0.61
AZN.L         4.88    4.65   -6.46   28.56    2.71    6.64   -0.33    3.93    4.52   -0.27    4.06    5.84   -0.19    4.50    4.46   -0.16
BA.L         47.48   25.11    6.01   49.12   53.59   18.61    1.18   48.45   17.20    1.20   42.16   16.85    1.10   51.60   19.88    1.11
BARC.L       15.91    8.29   -2.49   26.22   19.66    9.18    0.98   14.31    7.95    0.78   17.26    9.13    0.85   17.36    7.74    1.02
BATS.L       26.87   11.82   31.71   23.95   27.43   11.33    1.11   23.69    9.93    1.12   28.14   10.94    1.17   23.60   10.44    1.07
BAY.L        25.69   21.51   -0.54   71.39   25.19   16.74    0.74   27.58   17.13    0.80   24.68   16.61    0.74   21.02   19.39    0.57
BG.L         63.01   20.04   53.13   33.90   51.98   14.06    1.48   59.37   15.17    1.54   45.54   13.43    1.41   66.32   14.12    1.75
BLND.L       45.33   12.78   10.23   36.32   48.85   13.30    1.49   36.38   14.81    1.12   46.76   12.46    1.54   41.57   13.36    1.36
BLT.L        38.42   22.72   57.65   29.08   39.30   19.81    0.91   45.91   17.32    1.15   32.21   17.80    0.86   52.32   15.44    1.39
BP.L          8.63    7.95   -0.30   23.55    6.13    7.91    0.09    7.23    7.01    0.22    6.42    7.66    0.12    6.03    6.74    0.09
BSY.L        11.76    7.18   -5.78   12.06    9.40    8.06    0.38   12.45    8.43    0.61   12.22    8.38    0.59    8.13    7.87    0.29
CBRY.L        7.83    3.64    3.43   19.23   13.53    7.61    0.74   11.45    4.56    0.94    9.42    7.67    0.40    9.97    4.82    0.69
CPI.L        33.37    6.17   10.77   27.68   37.14    9.43    1.69   36.38   10.17    1.58   41.02    9.67    1.78   43.97   12.03    1.56
DGE.L        11.21    5.94    1.98   15.41    9.89    6.30    0.52   10.20    7.70    0.48   10.04    6.79    0.50    7.70    7.59    0.25
DMGT.L        7.09    7.81   -7.53   33.12    5.27    7.45    0.15    4.26    6.83    0.05    5.06    5.66    0.14    5.77    7.35    0.26
FGP.L        33.32   14.51   14.30   28.19   34.16   10.45    1.44   40.04   11.74    1.49   29.87   10.08    1.33   28.02   11.37    1.15
HSBA.L       12.16    7.29    1.62   15.94   11.06    5.39    0.74   10.56    4.64    0.80   10.04    4.65    0.71    8.37    4.42    0.51
ICI.L        34.84   14.66   16.51   57.84   45.20   16.80    1.19   29.93   13.44    1.11   39.28   16.28    1.11   26.68   11.86    1.13
III.L        22.22   14.39   -0.04   38.93   22.90   11.17    0.96   25.52   12.10    1.00   18.48    9.76    0.87   16.56   11.30    0.69
IMT.L        10.92   10.44   19.48   16.28   11.87   11.67    0.43    9.64    9.78    0.35    9.72   11.35    0.32   11.12    8.99    0.48
IPR.L        41.74   20.25   17.70   43.66   52.14   17.22    1.23   36.54   18.47    0.93   48.21   17.50    1.16   48.25   19.79    1.06
JMAT.L       26.40   14.05   14.62   23.38   31.19    9.94    1.40   28.17   11.00    1.19   26.45    9.81    1.23   27.44   10.88    1.18
KEL.L        24.83   11.62   29.32   12.83   23.04    9.04    1.19   25.45   10.97    1.16   28.49    8.62    1.51   26.14   11.45    1.14
LGEN.L       29.50   14.02   -3.67   32.62   29.96   12.51    1.10   36.92   11.22    1.46   27.44   12.48    1.02   39.43   13.35    1.31
LLOY.L       11.70    7.40   -6.31   27.98   11.65    9.67    0.48   10.30    6.46    0.56   12.22    9.39    0.53   12.02    8.01    0.60
MRW.L        27.52   13.16    6.00   29.71   28.82   14.04    0.96   23.43   11.60    0.96   32.37   13.93    1.06   22.86   14.06    0.79
OML.L        21.59   15.61    3.43   32.80   26.66   14.06    0.90   27.73   14.51    0.92   19.12   13.25    0.69   27.57   13.74    0.96
PRU.L        11.98    9.53   -0.66   30.63   11.60    9.42    0.49   23.02    9.47    1.14   12.05    8.61    0.56   14.64    9.67    0.68
PSON.L       12.23    9.69   -4.51   25.03   17.98    8.87    0.92   11.78    8.90    0.54   11.32    9.17    0.48   11.62    7.69    0.60
RB.L         15.24    5.94   25.89   13.41   24.77    9.32    1.21   16.25    8.13    0.90   18.87    8.76    0.98   13.81    6.97    0.84
RBS.L         5.19    4.39   -6.88   23.97    6.07    8.42    0.08    6.68    6.99    0.16    4.60    8.18   -0.06    6.93    7.20    0.19
REL.L        11.53    7.37   -0.84   14.93    9.17    7.38    0.39   10.59    7.16    0.54    9.46    6.77    0.45   11.10    7.53    0.56
REX.L        13.79    9.11   -1.05   17.66   15.98    9.49    0.75   21.38    9.93    1.01    7.99    6.26    0.32   17.51   10.15    0.80
RSA.L        31.08   17.11   -9.18   49.57   33.77   17.89    0.89   36.16   20.82    0.84   24.40   17.30    0.71   37.53   20.05    0.88
RTO.L         8.90    8.29  -10.96   20.90    9.22    6.04    0.47   11.50    6.23    0.71    9.81    5.46    0.58    9.20    7.04    0.42
RTR.L        21.78   31.27    2.05  101.20   25.98   17.75    0.72   28.68   15.04    0.92   17.32   17.58    0.50   23.09   13.80    0.81
SAB.L        23.40   14.10   22.77   28.46   19.66   13.88    0.69   24.42   11.64    1.00   23.67   13.76    0.82   23.92   11.70    0.97
SBRY.L       20.61   22.73   -1.18   43.32   22.08   11.19    0.92   19.74   13.36    0.72   20.76   10.12    0.95   22.08   13.02    0.82
SCTN.L       16.53    6.21    7.08   27.03   19.23   11.71    0.77   13.07    9.39    0.60   17.05    9.20    0.83   14.80    9.75    0.68
SGE.L        16.27   12.56   -0.48   37.26   13.76    9.55    0.62   14.08    7.12    0.85   15.55    8.38    0.81   11.07    7.72    0.55
SHP.L        35.28   13.39   12.86   28.61   29.60   16.31    0.86   38.03   14.15    1.21   27.27   15.48    0.84   37.19   14.43    1.17
SMIN.L       21.83    6.97    3.03   17.73   25.90   10.25    1.16   25.31   12.59    0.96   18.57    9.38    0.90   22.61   12.04    0.90
SN.L         15.67    6.36    9.19   16.24   18.05   10.05    0.82   16.48   12.09    0.64   19.22    8.74    1.00   17.38   11.96    0.69
SSE.L        22.75   16.17   17.42   16.96   24.11   11.50    0.98   25.80   15.28    0.83   22.96   10.94    0.98   21.94   13.66    0.78
SVT.L        28.04   27.83    5.90   17.27   31.34   24.13    0.65   31.10   22.45    0.69   24.81   24.18    0.54   27.51   22.63    0.63
TATE.L       35.11   17.57    8.15   34.97   34.45   12.06    1.27   42.23   16.52    1.13   37.12   12.48    1.30   41.33   15.78    1.16
TSCO.L       20.94   13.39   11.61   25.93   21.59   11.14    0.91   17.68   13.45    0.64   21.61   10.52    0.95   17.38   13.58    0.62
ULVR.L       10.83    5.68    6.01   10.80    7.57    9.13    0.21   10.78    7.13    0.56    7.36    8.26    0.20   10.55    6.44    0.59
UU.L         13.73    6.61    2.97   12.53   18.43    9.21    0.91   20.44   10.28    0.94   16.47    9.06    0.81   18.03    9.74    0.86
VOD.L        26.38   16.12    3.50   10.12   26.56    9.88    1.23   27.99   12.09    1.09   26.92    9.84    1.24   30.87   12.33    1.16
WOS.L        19.70   13.52   -4.25   43.48   16.77   10.62    0.72   18.33   10.57    0.81   19.28   10.21    0.87   22.75   10.72    1.00
WPP.L        15.95   11.09   -4.01   36.87   18.02   10.71    0.78   17.54   12.02    0.69   14.33    9.97    0.63   19.52   11.62    0.80
Portfolio    21.82   11.77    7.01   13.88   23.32   12.38    1.48   22.98   11.98    1.50   21.11   11.59    1.39   22.47   13.42    1.30
Table 23: Reward Only Learning Portfolio Management Result
8.4.2 Q Learning Portfolio Management
                      Before: No portfolio                Rebalancing Criteria: Max Return                    Criteria: Max Sterling Ratio
             Average Agent    Buy and Hold    Monthly Portfolio RB  Quarterly Portfolio RB    Monthly Portfolio RB  Quarterly Portfolio RB
Company       RET%     VOL    RET%     VOL    RET%     VOL      SR    RET%     VOL      SR    RET%     VOL      SR    RET%     VOL      SR
AAL.L        41.78   16.65   24.59   27.75   55.29   18.98    1.18   61.43   15.59    1.53   55.44   18.98    1.18   51.51   15.37    1.39
ABF.L        32.58    4.99   11.32   12.43   28.07   13.57    0.97   28.56   10.72    1.24   29.31   13.23    1.02   28.10   10.78    1.21
AL.L         -1.15   19.47   -6.69   27.20   -1.49   18.53   -0.27   -1.07   20.95   -0.18   -1.74   18.62   -0.28   -1.38   21.02   -0.20
AV.L         33.25   11.92   -2.62   32.71   37.18   18.77    0.91   42.16   10.43    1.73   27.96   15.45    0.86   39.04    9.59    1.77
AZN.L         8.98   13.56   -6.46   28.56    9.87   12.49    0.31    8.39   12.00    0.24   10.33   12.55    0.33    9.10   11.66    0.28
BA.L         54.00   26.50    6.01   49.12   55.59   17.07    1.30   51.61   17.73    1.22   53.03   16.88    1.27   51.85   17.72    1.23
BARC.L       10.84   13.64   -2.49   26.22    9.69   12.30    0.30   12.03   11.44    0.45    7.76   12.81    0.19   12.39   11.28    0.48
BATS.L       26.72   17.62   31.71   23.95   30.34   14.78    0.96   26.85   13.73    0.94   27.71   14.80    0.89   28.06   13.66    0.98
BAY.L        36.76   40.30   -0.54   71.39   38.12   24.24    0.77   41.94   32.35    0.68   36.82   23.42    0.77   46.89   32.68    0.72
BG.L         59.91   17.42   53.13   33.90   59.02   14.07    1.60   58.05   12.35    1.83   53.50   14.09    1.50   60.17   12.73    1.82
BLND.L       36.22   19.07   10.23   36.32   35.64   15.73    1.03   41.38   16.53    1.12   34.33   15.78    1.00   41.70   16.36    1.14
BLT.L        87.19   24.16   57.65   29.08   77.74   21.18    1.30   95.86   17.88    1.74   70.87   21.40    1.23   93.35   17.68    1.73
BP.L         21.00   10.75   -0.30   23.55   16.85   11.99    0.66   17.44   10.74    0.76   18.25   11.81    0.73   16.94   10.85    0.73
BSY.L         1.24   10.40   -5.78   12.06    2.88   13.46   -0.10    3.54   14.18   -0.05    5.30   12.63    0.04    2.65   14.29   -0.10
CBRY.L       19.32    8.78    3.43   19.23   20.48   11.37    0.84   24.07    7.36    1.51   20.32   11.22    0.85   20.48    8.06    1.18
CPI.L        35.66   23.39   10.77   27.68   41.57   12.54    1.41   29.61   16.32    0.88   40.29   13.24    1.31   30.22   16.32    0.89
DGE.L         9.22    8.41    1.98   15.41    9.57    9.62    0.35    9.04    9.01    0.33   10.15    9.49    0.39    9.95    8.87    0.40
DMGT.L       11.74   13.76   -7.53   33.12   11.22   21.78    0.37   11.55   21.28    0.41   11.68   21.75    0.39   12.07   21.21    0.43
FGP.L        41.53   20.84   14.30   28.19   44.04   15.30    1.23   46.78   17.68    1.15   51.40   15.14    1.37   44.41   17.99    1.09
HSBA.L       18.88    9.11    1.62   15.94   16.96   10.47    0.74   18.59    8.41    1.02   16.87   10.06    0.76   18.29    8.35    1.01
ICI.L        59.90   43.43   16.51   57.84   62.30   25.36    1.02   59.55   21.33    1.19   54.13   25.39    0.94   56.69   21.68    1.14
III.L        10.87   22.56   -0.04   38.93    5.13   15.85    0.06   11.35   18.89    0.31    2.72   15.76   -0.07   11.36   18.90    0.31
IMT.L        32.71    9.29   19.48   16.28   35.55   11.69    1.34   27.67    9.45    1.35   33.14   11.70    1.27   26.30    9.35    1.31
IPR.L        62.19   26.16   17.70   43.66   58.16   22.40    1.06   66.12   23.06    1.13   50.05   21.78    0.99   63.73   23.03    1.11
JMAT.L       32.19   19.57   14.62   23.38   28.88   13.23    1.01   29.77   14.56    0.97   29.47   13.32    1.02   28.91   14.58    0.95
KEL.L        40.13    9.99   29.32   12.83   36.80   11.85    1.38   38.81   10.34    1.74   35.50   11.86    1.34   39.25   10.33    1.75
LGEN.L       25.98   18.53   -3.67   32.62   22.93   13.86    0.80   22.28   14.25    0.77   22.58   13.87    0.78   23.02   14.02    0.80
LLOY.L        5.22   17.84   -6.31   27.98    4.79   15.33    0.03    5.15   16.11    0.06    3.46   14.89   -0.05    4.85   16.07    0.04
MRW.L        47.09   16.00    6.00   29.71   43.66   17.68    1.07   44.68   19.19    1.04   45.89   17.86    1.10   41.39   19.06    0.99
OML.L        16.76   27.85    3.43   32.80   15.89   20.13    0.43   14.86   21.72    0.39   12.45   20.64    0.33   15.70   21.17    0.42
PRU.L        14.69   13.30   -0.66   30.63   19.37   15.33    0.63   18.58   14.02    0.66   18.37   15.41    0.59   21.00   13.55    0.76
PSON.L       11.97   12.54   -4.51   25.03   11.51   14.46    0.35   10.77   11.59    0.38   13.45   14.19    0.44   12.19   12.50    0.43
RB.L         24.70    9.35   25.89   13.41   23.19   10.80    1.00   27.90   10.73    1.21   25.43   10.75    1.09   25.89   11.01    1.11
RBS.L        13.52   14.36   -6.88   23.97   15.19   14.09    0.51   13.37   12.62    0.49   15.12   14.13    0.51   14.58   12.60    0.54
REL.L         9.92    7.01   -0.84   14.93    8.11   10.00    0.24   11.83    7.17    0.65    9.38    9.81    0.33   11.49    6.87    0.65
REX.L        14.28    8.84   -1.05   17.66   15.48   12.15    0.59   12.09   10.56    0.49   13.44   11.97    0.50   11.67   10.67    0.46
RSA.L        62.42   30.89   -9.18   49.57   76.86   21.51    1.28   64.14   25.04    1.04   67.17   20.92    1.21   55.83   24.56    0.97
RTO.L         3.76   11.87  -10.96   20.90    2.62   15.10   -0.09    0.62   13.66   -0.25    4.12   15.17    0.00   -0.43   13.12   -0.35
RTR.L        94.09   68.67    2.05  101.20   72.00   26.92    1.03   82.92   26.44    1.14   73.52   27.02    1.03   90.07   33.27    0.98
SAB.L        38.19   20.66   22.77   28.46   30.83   18.22    0.82   35.15   16.71    0.99   34.28   17.56    0.92   37.41   16.95    1.03
SBRY.L       17.16   32.55   -1.18   43.32   14.56   17.51    0.42   13.08   20.97    0.35   18.99   17.47    0.56   12.53   21.06    0.33
SCTN.L       18.49   12.50    7.08   27.03   16.27   13.83    0.56   20.30   10.91    0.88   17.40   13.72    0.61   18.89   11.08    0.81
SGE.L        19.23   26.07   -0.48   37.26   21.52   14.60    0.72   18.50   14.17    0.64   21.44   14.38    0.72   23.72   15.06    0.78
SHP.L        35.76   13.21   12.86   28.61   37.70   18.55    0.93   37.79   17.10    1.02   41.67   18.24    1.01   38.33   17.03    1.03
SMIN.L       11.64   14.00    3.03   17.73   10.96   14.15    0.34   11.48   12.80    0.39   10.04   13.84    0.30    9.98   11.92    0.33
SN.L         19.35   12.96    9.19   16.24   16.65   14.63    0.55   20.88   15.61    0.67   15.51   15.12    0.50   20.27   15.61    0.65
SSE.L        21.09   13.32   17.42   16.96   24.42   12.17    0.94   21.26   14.30    0.73   24.66   12.20    0.95   21.14   14.20    0.73
SVT.L        23.38   11.14    5.90   17.27   20.15   11.74    0.81   20.22    9.22    1.03   21.10   11.60    0.86   24.02    9.11    1.23
TATE.L       18.94   23.20    8.15   34.97   18.48   22.27    0.47   15.40   27.34    0.37   15.60   22.54    0.40   14.75   27.45    0.35
TSCO.L       24.71   17.16   11.61   25.93   25.00   11.97    0.98   27.24   13.16    0.99   23.09   12.79    0.86   24.45   12.24    0.96
ULVR.L       20.42    7.99    6.01   10.80   23.38   11.48    0.95   20.50   10.48    0.93   21.42   11.74    0.86   19.32   10.63    0.86
UU.L         15.87   11.39    2.97   12.53   16.58   10.00    0.75   16.71   10.70    0.73   16.03   10.14    0.71   15.95   10.69    0.69
VOD.L        14.80   12.60    3.50   10.12   15.18   16.50    0.46   13.73   19.68    0.38   15.22   16.67    0.46   15.68   19.58    0.43
WOS.L        19.45   26.64   -4.25   43.48   19.43   17.26    0.58   15.44   19.96    0.43   18.41   17.00    0.55   19.22   17.81    0.57
WPP.L        23.56   21.38   -4.01   36.87   28.31   13.34    0.99   27.79   19.39    0.72   23.72   13.45    0.84   23.07   15.38    0.74
Portfolio    27.46   19.85    7.01   13.88   27.21   19.19    1.16   27.74   20.52    1.11   26.32   17.98    1.19   27.42   20.10    1.12
Table 24: Q (1) Learning Portfolio Management Result
8.4.3 Different Number of Best Agents
Monthly Portfolio RB, Criteria Max Return
                      1 Best Agent           3 Best Agents           5 Best Agents
Company       RET%     VOL      SR    RET%     VOL      SR    RET%     VOL      SR
AAL.L        33.99   20.69    0.80   37.79   18.16    0.96   35.37   17.36    0.95
ABF.L        23.64   10.30    1.06   22.77    9.73    1.08   21.94    9.80    1.03
AL.L         23.76   11.78    0.94   14.29   10.88    0.58   14.08   10.79    0.58
AV.L         20.76   13.49    0.74   16.22   13.35    0.58   15.16   12.28    0.57
AZN.L         2.71    6.64   -0.33    3.91    6.58   -0.18    5.15    6.35   -0.02
BA.L         53.59   18.61    1.18   55.66   16.93    1.31   60.27   16.42    1.41
BARC.L       19.66    9.18    0.98   15.91    8.73    0.81   14.83    8.82    0.73
BATS.L       27.43   11.33    1.11   29.48   11.14    1.20   28.49   10.75    1.21
BAY.L        25.19   16.74    0.74   25.16   16.75    0.74   25.41   16.37    0.76
BG.L         51.98   14.06    1.48   53.16   12.78    1.64   50.93   12.70    1.60
BLND.L       48.85   13.30    1.49   45.48   12.49    1.51   46.44   12.27    1.55
BLT.L        39.30   19.81    0.91   35.01   18.73    0.88   35.25   17.99    0.91
BP.L          6.13    7.91    0.09    6.52    7.79    0.13    6.09    7.74    0.09
BSY.L         9.40    8.06    0.38   11.80    7.62    0.60   10.75    7.73    0.51
CBRY.L       13.53    7.61    0.74   10.02    7.25    0.47    9.50    7.26    0.42
CPI.L        37.14    9.43    1.69   31.73    9.45    1.49   34.42    9.40    1.60
DGE.L         9.89    6.30    0.52    9.16    5.89    0.47   10.03    5.78    0.58
DMGT.L        5.27    7.45    0.15    5.91    6.25    0.27    8.65    8.31    0.51
FGP.L        34.16   10.45    1.44   34.59   10.02    1.51   36.40   10.19    1.55
HSBA.L       11.06    5.39    0.74   11.73    4.85    0.90   11.58    4.57    0.94
ICI.L        45.20   16.80    1.19   41.71   15.21    1.23   34.44   14.48    1.12
III.L        22.90   11.17    0.96   19.55   10.81    0.84   20.90   10.61    0.91
IMT.L        11.87   11.67    0.43   12.02   10.33    0.48   12.67    9.72    0.54
IPR.L        52.14   17.22    1.23   49.73   16.01    1.28   46.87   15.92    1.23
JMAT.L       31.19    9.94    1.40   26.97    9.93    1.24   27.69    9.78    1.28
KEL.L        23.04    9.04    1.19   26.41    8.80    1.39   27.32    8.71    1.44
LGEN.L       29.96   12.51    1.10   36.55   11.50    1.39   36.06   11.10    1.42
LLOY.L       11.65    9.67    0.48   13.20   10.14    0.56   13.66    9.81    0.60
MRW.L        28.82   14.04    0.96   28.21   12.98    1.01   27.88   12.97    1.00
OML.L        26.66   14.06    0.90   26.91   12.47    1.00   23.33   11.67    0.94
PRU.L        11.60    9.42    0.49   11.35    6.56    0.65   11.55    6.60    0.66
PSON.L       17.98    8.87    0.92   14.43    7.27    0.84   13.25    6.93    0.78
RB.L         24.77    9.32    1.21   19.32    7.90    1.10   17.83    7.70    1.03
RBS.L         6.07    8.42    0.08    5.51    7.50    0.03    5.39    7.41    0.02
REL.L         9.17    7.38    0.39    9.80    6.88    0.47   10.25    6.58    0.54
REX.L        15.98    9.49    0.75   15.41    7.80    0.86   14.47    7.14    0.86
RSA.L        33.77   17.89    0.89   35.75   16.90    0.98   29.37   16.90    0.84
RTO.L         9.22    6.04    0.47    8.96    5.26    0.49    9.00    5.28    0.50
RTR.L        25.98   17.75    0.72   22.97   17.07    0.67   23.45   17.20    0.68
SAB.L        19.66   13.88    0.69   21.29   13.70    0.75   21.72   13.22    0.79
SBRY.L       22.08   11.19    0.92   23.42   10.92    1.00   22.95   10.86    0.98
SCTN.L       19.23   11.71    0.77   18.68    9.26    0.92   16.83    8.57    0.88
SGE.L        13.76    9.55    0.62   14.79    8.02    0.79   14.80    7.93    0.80
SHP.L        29.60   16.31    0.86   39.23   15.51    1.12   38.98   15.37    1.12
SMIN.L       25.90   10.25    1.16   23.58   10.12    1.07   24.15    9.76    1.14
SN.L         18.05   10.05    0.82   14.52    8.31    0.75   14.97    8.27    0.78
SSE.L        24.11   11.50    0.98   23.63   11.24    0.98   23.31   11.07    0.98
SVT.L        31.34   24.13    0.65   30.39   23.46    0.65   27.76   21.21    0.66
TATE.L       34.45   12.06    1.27   37.41   12.28    1.33   37.11   12.17    1.33
TSCO.L       21.59   11.14    0.91   21.56   10.18    0.98   21.75    9.43    1.06
ULVR.L        7.57    9.13    0.21   10.09    8.74    0.41   10.64    8.04    0.48
UU.L         18.43    9.21    0.91   15.77    8.37    0.83   15.26    8.18    0.81
VOD.L        26.56    9.88    1.23   27.68    9.13    1.37   27.79    8.45    1.48
WOS.L        16.77   10.62    0.72   18.83   10.29    0.84   19.13   10.18    0.87
WPP.L        18.02   10.71    0.78   17.32    9.95    0.79   16.93    9.80    0.78
Portfolio    23.32   12.38    1.48   22.90   12.52    1.43   22.55   12.26    1.43
Table 25: Portfolio Management using different Number of best agents.
8.4.4 Portfolio Construction using 10 best companies-with Quarterly optimization
10 companies
S.Nbr  Portfolio Wealth  Companies in Portfolio                            Performance
1      99716.85          SCTN-ABF-BATS-AL-RB-BA-BLND-BARC-TATE-ULVR        -0.28
2      100199.59         UU-VOD-AV-BG-BATS-PRU-FGP-TATE-IMT-CBRY           0.48
3      103442.81         CPI-REX-DGE-SMIN-AAL-SGE-PSON-WPP-IPR-RTO         3.24
4      116273.44         SN-IPR-AAL-BAY-RSA-SHP-BG-ULVR-TATE-JMAT          12.40
5      120470.56         RSA-IPR-LGEN-TSCO-RTR-WPP-SMIN-BAY-OML-BLND       3.61
6      123253.30         BA-BLT-PRU-OML-RTO-JMAT-TSCO-PSON-BARC-MRW        2.31
7      130099.50         BAY-SAB-RTR-BG-REX-SHP-LGEN-AAL-MRW-RTO           5.55
8      133260.26         RTR-WOS-IPR-BG-SAB-UU-BP-CPI-IMT-BA               2.43
9      146987.21         AV-BA-SAB-BLND-SSE-WOS-TATE-BLT-FGP-AL            10.30
10     154305.89         TATE-MRW-SAB-AV-SSE-BLND-TSCO-CPI-VOD-SVT         4.98
11     154975.36         BLT-BAY-LLOY-WOS-BLND-BATS-III-PRU-LGEN-SMIN      0.43
12     161914.01         IPR-PRU-SAB-SMIN-SCTN-RB-BG-DGE-CBRY-SVT          4.48
13     170102.33         IPR-BA-III-BG-JMAT-SHP-BLT-ABF-BP-IMT             5.06
14     183743.21         AAL-TATE-III-RSA-BAY-BATS-BARC-SAB-BG-IMT         8.02
15     182089.95         SHP-OML-BA-BG-BLND-CPI-AV-FGP-AAL-RSA             -0.90
16     187322.71         FGP-AL-SVT-BLT-BSY-LGEN-JMAT-VOD-WPP-RB           2.87
17     207039.30         TATE-BAY-RSA-ABF-TSCO-REL-BSY-SBRY-SSE-BLND       10.53
18     211874.25         SVT-BAY-SSE-SHP-FGP-BLND-LGEN-TATE-VOD-RSA        2.34
19     213992.24         SBRY-SN-MRW-RSA-CPI-BA-OML-WPP-BATS-PSON          1.00
20     220112.87         BLT-AAL-SMIN-IPR-VOD-BG-SHP-CBRY-ABF-RB           2.86
21     231661.70         BA-REX-VOD-CPI-BLT-BSY-SHP-DGE-IPR-RB             5.25
22     239646.38         BG-VOD-RSA-BLT-REL-SCTN-MRW-CPI-BA-IMT            3.4
Figure 26: Portfolio Construction with 10 best companies with quarterly optimization
8.4.5 Combined Results for Portfolio Management System taking less number of companies.
                          5 Companies                     10 Companies                    20 Companies
                          Volatility  Sharpe Ratio  PER   Volatility  Sharpe Ratio  PER   Volatility  Sharpe Ratio  PER
Monthly Max Sterling      8.77        1.03        20.03   7.71        1.38        22.56   5.84        1.92        24.10
Quarterly Max Sterling    9.71        1.66        31.20   6.97        1.76        23.27   5.84        2.47        28.71
Monthly Least Sterling    6.11        1.85        24.54   4.35        2.35        22.27   3.18        2.72        19.22
Quarterly Least Sterling  7.17        1.91        25.92   4.58        2.37        20.82   3.83        2.34        17.38
Monthly Max Reward        9.50        1.31        27.35   8.46        1.55        28.13   6.03        2.08        27.06
Quarterly Max Reward      8.02        1.81        27.70   7.75        1.74        25.17   5.97        2.32        27.12
Monthly Least Reward      5.69        1.47        18.73   4.51        2.09        20.75   3.38        2.29        17.50
Quarterly Least Reward    5.98        2.25        25.66   4.62        2.56        22.67   3.77        2.68        19.43
Table 27: Combined Results for portfolio management taking less number of companies.