
JOURNAL OF CLASS FILES, VOL. , NO. , NOVEMBER 2016 1

Classification and Clustering of Stocks, using Genetic Algorithms and Fundamental Analysis

David Moura, MSc Student, IST

Abstract—Over the last two decades the ease of access to information has grown exponentially, making it easier to analyze and use this data in every field of science, including computational finance. This work presents the architecture, built from scratch, of a trading system that classifies stocks using two techniques, in order to conclude which one is superior: one based on user input parameters, and an unsupervised one that uses a genetic algorithm to optimize cluster positions, with a constant number of clusters. A genetic algorithm is also applied to optimize fundamental indicators that give buy and sell signals in each of the groups obtained, in order to conclude whether stocks in the same group behave in a similar fashion. Results show that the group with the best results obtained with user input parameters is superior to the group with the best returns obtained with the clustering algorithm. However, the clustering algorithm classified stocks better, showing increased performance over the user input method when used with the genetic algorithm on fundamental indicators. The proposed system was implemented from scratch and contains optimization modules, processing modules and a trading simulator.

Index Terms—Genetic Algorithms; Fundamental Analysis; Fundamental Indicators; Classification; Clustering; Stock Market

1 INTRODUCTION

This section is an introduction to the subject of computational techniques applied to companies' financial data as a way to find trading rules and patterns in the stock market.

1.1 Motivation and Context

Several techniques are used to predict stock market quotes, the most popular being Artificial Intelligence (AI) methods that optimize the parameters of financial indicators. There are two types of financial indicators: fundamental and technical indicators. Other methods evaluate each stock of a certain type, where the type is defined by the stock's sector or by any other rule.

1.2 Problem Statement

The problem addressed here is to construct a system that evaluates the stock market and accurately groups stocks that show similar behaviors.

1.3 Proposed Solution

The implemented solution is a system that classifies companies into groups based on their financial statements, in two ways: a supervised way, with parameters defined by the user, and an unsupervised way, applying a genetic algorithm to optimize clustering. After both classifications, the application uses a genetic algorithm once more to optimize indicators to buy and sell stocks, comparing all the results to conclude which method classifies better. The proposed system was implemented in C++ with about 9000 lines of code, and it was made to be used by third parties that wish to improve its features.

1.4 Document Structure

Section 2 presents the theoretical concepts required to develop this project and an overview of the results obtained by related works. Section 3 documents the proposed solution. Section 4 presents a validation of the system. Section 5 summarizes this work, covering its achievements and limitations and proposing future improvements.

2 STATE OF THE ART

2.1 Financial Indicators

Financial indicators analyze statistics and present that information in the form of ratios. These indicators will give better results if used alongside the right strategy [1].

There are several types of financial indicators, divided into Fundamental Indicators (FI) and Technical Indicators (TI), which come from Fundamental Analysis (FA) and Technical Analysis (TA), respectively.

2.1.1 Fundamental Indicators

FA is used to assess the intrinsic value of a certain market, industry or company; FA applied to companies is the most common. FA uses financial statements to determine whether a company is undervalued or overvalued. FIs are constructed using FA, and this type of indicator does not take market trends into account, only the intrinsic value of the company or, in other words, its real raw value [2].

FA has been made famous by value investors like Benjamin Graham and Warren Buffett, through their value investment strategies [3], [4].

Using the information given in financial statements, one can calculate fundamental ratios that are used to compare companies and to decide which ones are the best to invest in.

2.1.2 Technical Indicators

TA and FA take different approaches to investment, since TA uses the movement of stock prices [5] and the volume of transactions [6] as the main information to predict stock markets. TIs look for patterns in past data and use those patterns to forecast market tendencies [7]. TA generates trading rules by analyzing previous patterns of technical indicators [2].

2.2 Computational Intelligence (CI) Algorithms

This work uses Genetic Algorithms (GA). GAs are the most used kind of Evolutionary Algorithm (EA), a CI technique.

2.2.1 Genetic Algorithms

GAs are stochastic algorithms based on the evolutionary process of species. A simple GA pseudocode is given in algorithm 1.

t ← 0;
P(t) ← random;
Evaluation P(t);
while not Endcondition do
    Pp(t) ← Selection of parents from P(t);
    Pc(t) ← Crossover from Pp(t);
    Pm(t) ← Mutation of Pc(t);
    Evaluation Pm(t);
    P(t+1) ← New Generation Creation from (P(t), Pm(t));
    t ← t + 1;
end

Algorithm 1: Simple GA

The basic operators of a GA are selection, crossover and mutation. There are also other operators or techniques that can be applied to the GA in order to improve its results, such as elitism (propagating a percentage of the best individuals in a population into the next generation) or random immigrants (substituting the worst individuals in the population by random individuals).
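The loop of algorithm 1, together with elitism and random immigrants, can be sketched in Python as follows. This is only an illustrative sketch, not the C++ implementation described in this work: the function name `run_ga`, its parameters and the simplified parent selection (parents drawn uniformly from the better half of the population) are assumptions made for the example.

```python
import random


def run_ga(fitness, random_individual, crossover, mutate, pop_size=20,
           generations=30, elitism=0.4, immigrants=0.2, rng=None):
    """Generic GA loop in the spirit of algorithm 1 (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    # P(0) <- random
    population = [random_individual(rng) for _ in range(pop_size)]
    n_elite = int(elitism * pop_size)
    n_immigrants = int(immigrants * pop_size)
    n_children = pop_size - n_elite - n_immigrants
    for _ in range(generations):
        # Evaluation: rank individuals from best to worst fitness.
        ranked = sorted(population, key=fitness, reverse=True)
        children = []
        while len(children) < n_children:
            # Selection (assumption: uniform draw from the better half).
            parent_a, parent_b = rng.sample(ranked[:pop_size // 2], 2)
            # Crossover followed by mutation.
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        # New generation: elites, offspring and random immigrants.
        newcomers = [random_individual(rng) for _ in range(n_immigrants)]
        population = ranked[:n_elite] + children + newcomers
    return max(population, key=fitness)
```

Because the elite individuals are copied unchanged into the next generation, the best fitness in the population never decreases from one generation to the next in this sketch.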

2.3 Clustering

Clustering is an unsupervised technique, since it does not use preclassified data. Instead, the algorithm discovers similarities (in the requested attributes) between objects of the set, grouping similar objects in the same cluster. In [8] a GA is used to optimize clustering, using the Calinski-Harabasz index as fitness function, obtaining better results than classical clustering methods such as K-means and Fuzzy C-Means (FCM) (for more information see [9] and [10]).


2.4 Related Work Results

In this section, the results and data of some previous works are analyzed and compared in table 2.4.

Work | Date | Approach | Financial Data | Application | Benchmark | Validation Period | Returns
[11] | 2015 | MOGA | FI & TI | Portfolio Composition | S&P500 | 2013 - 2014 | 28.3%
[5] | 2011 | GA | TI | Portfolio Composition | DJI | 2003 - 2009 | Better than B&H and Random Walk
[12] & [13] | 2008 | R-SPEA2 | Raw Data | Portfolio Composition | FTSE100 | May 2004 - December 2005 | Better than the SPEA2 approach
[14] | 2004 | EC | Markowitz Model | Fund Management Composition | S&P Database | N/A | Better than 2PLS, SA and TS
[15] | 2015 | GA | TI | Crude Oil Future Market | Crude Oil Market | 1987 - 2013 | Better than B&H
[16] | 2010 | MEPA & CPDA & RST & GA | TI | Portfolio Composition | TAIEX | 2000 - 2005 | Better than B&H, GA or RST alone
[6] | 2009 | ANN & FL & GA | TI | Portfolio Composition | S&P500 | 1997 - 2002 | Better than B&H, NN, NF or GA alone
[17] | 2007 | RST & Fuzzy Rules | TI | Forex Exchange | USD vs other main currencies | 2004 - 2006 | 33.95%
[18] | 2009 | GP | TI | Portfolio Composition | S&P500 | 1990 - 2002 | Better than B&H
[19] | 2014 | GA | TI | Portfolio Composition | TAIEX | 118 days (end date 14 June 2014) | Better than B&H
[20] | 1990 | ANN | Raw Data | Portfolio Composition | TOPIX | January 1987 - September 1989 | Better than B&H
[21] | 2011 | ANFIS | TI | Portfolio Composition | IBM & Walmart & Citigroup & Wyeth & GM | 24th August 1994 - 30th August 2006 | 240.32%, better than S&P500 and DJI

3 ARCHITECTURE

This section describes the system architecture. First an overview of the proposed solution is given, followed by a module-by-module description of the most important modules and, lastly, an explanation of other implementation details. The architecture of the proposed solution was made from scratch in C++. The implementation has 22 C++ classes and about 9000 lines of code.

3.1 Module View of the Global Architecture

The overview of the module architecture of the proposed solution is presented in this section (see figure 1).

There are nine main modules in the system, each one with a specific functionality.

Fig. 1. Module View of the Architecture

See figure 2 to better understand the data flow of the system.

3.2 Modules

This section explains in more detail the most important modules of this work. The modules are connected to each other as shown in figure 2.

3.2.1 Fundamental Analysis (FA)

This is a data analysis module that computes growth analysis and FIs from the financial statements of the companies (this data was obtained with the algorithm from [22]). The indicators used in this work always have a maximization objective. They are the following:

Fig. 2. Modules Data Flow

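$$\text{Debt indicator} = 1 - \frac{\text{Total Debt}}{\text{Total Assets}} \tag{1}$$

$$\text{Payout Ratio} = \frac{DPS}{EPS} \tag{2}$$

$$\text{Return On Equity} = \frac{\text{Net Income (NI)}}{\text{Total Equity}} \tag{3}$$

$$\text{Profit Margin} = \frac{NI}{\text{Revenue}} \tag{4}$$

$$\text{Revenue Growth} = \frac{Rev_{Actual} - Rev_{LastYear}}{Rev_{LastYear}} \tag{5}$$

$$\text{Net Income Growth} = \frac{NI_{Actual} - NI_{LastYear}}{NI_{LastYear}} \tag{6}$$

$$\Delta RG = \frac{RG_{Actual} - RG_{LastYear}}{RG_{LastYear}} \tag{7}$$

$$\Delta NIG = \frac{NIG_{Actual} - NIG_{LastYear}}{NIG_{LastYear}} \tag{8}$$

$$\Delta CFOA = \frac{CFOA_{Actual} - CFOA_{LastYear}}{CFOA_{LastYear}} \tag{9}$$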
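Equations 5 to 9 all share the same year-over-year form, so they can be computed with a single helper. The following Python sketch illustrates equations 1 to 9; the function names are hypothetical and the handling of a zero denominator (raising an error) is an assumption, since this work does not state how that case is treated.

```python
def relative_change(current, last_year):
    """Year-over-year relative change, the common form of equations 5 to 9."""
    if last_year == 0:
        # Assumption: a zero base value is treated as an error.
        raise ValueError("last_year must be non-zero")
    return (current - last_year) / last_year


def debt_indicator(total_debt, total_assets):
    """Equation 1: higher is better, hence the '1 -' term."""
    return 1 - total_debt / total_assets


def payout_ratio(dps, eps):
    """Equation 2: dividends per share over earnings per share."""
    return dps / eps


def return_on_equity(net_income, total_equity):
    """Equation 3."""
    return net_income / total_equity


def profit_margin(net_income, revenue):
    """Equation 4."""
    return net_income / revenue
```

For example, a company whose revenue went from 100 to 110 has a revenue growth of `relative_change(110, 100)`, that is 0.1, and applying the same helper to two consecutive revenue growth values gives the ∆RG of equation 7.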

3.2.2 Classifier

This module classifies stocks each quarter, based on the approach used by Peter Lynch in [23] (see figure 3 for a better understanding of the structure).

[Figure 3: quadrant diagram with a Growth axis (Very Bad, Bad, Normal, Good, Very Good) and a Size axis (Small, Medium, Big), showing the regions Fast Grower, Stalwart, Slow Grower, Potential and Turn Around.]

Fig. 3. Types' Quadrants

The classification takes into account only the size and the growth of the company (see figure 3), and not the financial health, which was implemented for possible human analysis.

An example of how the size classification is computed is given below (health is similar). For the growth classification the method is the same, but with nine thresholds instead of the two used for size:

• Classification = 1 → Value < TH_Low
• Classification = 2 → TH_Low ≤ Value < TH_High
• Classification = 3 → TH_High ≤ Value

Afterwards, the label is assigned accordingly:

• Small → Total Classification ∈ ]0, 1.5[
• Medium → Total Classification ∈ [1.5, 2.5[
• Big → Total Classification ∈ [2.5, 3]

In this work the only Size indicator used is the average of the Total Assets over the last year, and the thresholds are (see footnote 1): TH_Low = 10B and TH_High = 25B.

The only Growth indicator used is the yearly revenue growth, and the thresholds are: TH_1 = −0.2, TH_2 = −0.1, TH_3 = −0.05, TH_4 = −0.02, TH_5 = 0, TH_6 = 0.02, TH_7 = 0.05, TH_8 = 0.1, TH_9 = 0.2.

The only financial Health indicator used is the last year's average of the Total Debt / Total Assets indicator, and the thresholds are: TH_Low = 0.3 and TH_High = 0.7.

Fig. 4. Classifier Structure: (a) Health, (b) Size, (c) Growth

As said before, the type of a stock is obtained from the evaluation of the size and growth of a company, based on the approach presented in [23]. After a stock has its growth and size evaluated, its type is determined by the combination of the two.

The 5 classifications given in this work are (see figure 3):

• Slow Grower - Companies that are considered Medium in terms of size and had a Normal or Good growth classification

• Stalwart - Companies that are considered Big in terms of size and had a Normal, Good or Very Good growth classification

• Fast Grower - Companies that are considered Small or Medium in terms of size and had a Very Good growth classification

• Potential - Companies that are considered Small in terms of size and had a Normal or Good growth classification

• Turn Around - Companies that had a Bad or Very Bad growth classification, independently of their size.

1. Here B denotes billion dollars (for example, 5B and 10B represent 5 and 10 billion dollars, respectively).
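The size thresholds and the five type rules listed above can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, sizes are assumed to be expressed in billions of dollars, and the mapping from the nine growth thresholds to the five growth labels is not reproduced because it is not spelled out in the text.

```python
SIZE_THRESHOLDS = (10.0, 25.0)  # TH_Low and TH_High, assumed to be in billions


def threshold_class(value, th_low, th_high):
    """Return 1, 2 or 3 following the two-threshold rule used for size."""
    if value < th_low:
        return 1
    if value < th_high:
        return 2
    return 3


def size_label(total_classification):
    """Map a total classification in ]0, 3] to Small, Medium or Big."""
    if total_classification < 1.5:
        return "Small"
    if total_classification < 2.5:
        return "Medium"
    return "Big"


def stock_type(size, growth):
    """Combine a size label and a growth label into one of the five types."""
    if growth in ("Bad", "Very Bad"):
        return "Turn Around"
    if growth == "Very Good":
        return "Stalwart" if size == "Big" else "Fast Grower"
    # Remaining growth labels are Normal or Good.
    return {"Small": "Potential", "Medium": "Slow Grower", "Big": "Stalwart"}[size]
```

With a single size indicator, the total classification equals the class of that indicator, so a company with 12 billion dollars of average total assets gets class 2 and the label Medium.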

3.2.3 Clustering

This module creates five clusters each quarter, and associates each stock with the nearest cluster in the Growth × Size space. It uses the GA module to optimize the clusters' positions, given at least one year of training.

Values are scaled to give more appropriate weights to both growth and size. These scaled units are given in equation 10.

$$\text{Size} = \frac{\text{Total Assets}\,[\text{Million \$}]}{1000} \tag{10a}$$

$$\text{Growth} = \Delta\text{Revenue} \times 100 \tag{10b}$$

The growth measure is the annual revenue growth, and the size measure is the average of the Total Assets over the last year.

There is a fixed number of 5 clusters (the same as the number of types in the Classifier module), labeled from A to E (the GA recomputes the best locations for the clusters each quarter). To maintain consistency, the first cluster (cluster A) is always the one nearest to the origin of the plane (coordinates (0, 0)), B the second nearest, and so on.
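The scaling of equation 10, the labeling of clusters by distance to the origin and the nearest-cluster assignment can be illustrated with the short Python sketch below. The function names are hypothetical, and the use of the Euclidean distance is an assumption, since the distance measure is not named in the text.

```python
import math


def scaled_point(total_assets_million, revenue_growth):
    """Equation 10: size in thousands of millions, growth in percent."""
    return (total_assets_million / 1000.0, revenue_growth * 100.0)


def label_centroids(centroids):
    """Name centroids A, B, C, ... by increasing distance to the origin."""
    ordered = sorted(centroids, key=lambda c: math.hypot(c[0], c[1]))
    return dict(zip("ABCDE", ordered))


def nearest_cluster(point, labelled):
    """Return the label of the centroid closest to the point (Euclidean)."""
    return min(labelled, key=lambda name: math.dist(point, labelled[name]))
```

For instance, a stock with 8000 million dollars of total assets and 50% revenue growth is mapped to the point (8, 50) and assigned to whichever labeled centroid lies closest to it.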

3.2.4 GA

This module has two functionalities. One is to optimize the weights given to fundamental indicators, and the other is to optimize the location of the clusters in the plane. Each works as follows:

3.2.4.1 Fundamental Analysis GA (FA GA): Since the portfolios use FI, and FI require a longer-term analysis than TI, each time unit is a quarter. A generation is an iteration of the GA over the last 4 quarters. See figure 5 for the structure of a chromosome.

Fig. 5. FI Chromosome Representation

At the beginning of the algorithm the population is generated randomly. Each FI weight is initialized as in equation 11a, the buy signal value is initialized as in equation 11b, and the sell signal value is initialized as in equation 11c.

$$r \in [0, 1] \tag{11a}$$

$$b = \sum_{i=1}^{N} r_i a_i \tag{11b}$$

$$s = \frac{b}{2} \tag{11c}$$

In equations 11, N is the number of indicators used, r and a are different random numbers, b is the buy signal value and s is the sell signal value. The s value is calculated as in equation 11c to guarantee that it is smaller than the b value and differs from it by a substantial percentage.

Buy and sell signals are given as in equation 12, where v_i is the value of indicator i, and w_i is the weight of indicator i.

$$\text{Signal} = \begin{cases} \text{BUY} \rightarrow \sum_{i=1}^{N} v_i \times w_i > b \\ \text{SELL} \rightarrow \sum_{i=1}^{N} v_i \times w_i < s \end{cases} \tag{12}$$

The fitness function is the ROI of each individual when simulating investment.

$$\text{ROI} = \frac{\text{Return} - \text{Initial Investment}}{\text{Initial Investment}} \tag{13}$$
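Equations 11 to 13 can be illustrated with the following Python sketch. It is not the implementation of this work: the function names are hypothetical, the random numbers a_i are assumed to be drawn uniformly from [0, 1], the r_i of equation 11b are assumed to be the initial weights, and the "HOLD" outcome for scores between s and b is an assumed name for the case where neither condition of equation 12 holds.

```python
import random


def init_chromosome(n_indicators: int, rng: random.Random):
    """Random weights plus buy and sell values, following equations 11."""
    weights = [rng.random() for _ in range(n_indicators)]  # r_i (eq. 11a)
    a = [rng.random() for _ in range(n_indicators)]        # a_i, assumed uniform
    buy = sum(r * x for r, x in zip(weights, a))           # b (eq. 11b)
    sell = buy / 2                                         # s (eq. 11c)
    return weights, buy, sell


def signal(values, weights, buy, sell):
    """Equation 12: compare the weighted indicator sum with b and s."""
    score = sum(v * w for v, w in zip(values, weights))
    if score > buy:
        return "BUY"
    if score < sell:
        return "SELL"
    return "HOLD"  # assumption: no action between s and b


def roi(final_value, initial_investment):
    """Equation 13: return on investment as a fraction."""
    return (final_value - initial_investment) / initial_investment
```

As a usage example, `signal([1, 2], [0.5, 0.5], buy=1.0, sell=0.5)` has a weighted sum of 1.5, which exceeds b, so it yields a buy signal.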

3.2.4.2 Clustering GA: This is where the training and validation of the algorithm that optimizes the clusters' positions occur, inspired by the ACGA algorithm from [8]. The GA module runs the GA to find the best locations for the cluster centroids; the output is the centroid locations in the Size × Growth plane, using the Calinski-Harabasz index as fitness function. See figure 6 for the structure of a chromosome.

The chromosomes are constructed with cluster points. Activation values are auxiliary structures that determine whether a certain cluster point is going to be used or not. The number of cluster points and activation values is the number of possible cluster positions given by the user. Cluster points and activation values are such that:

• Cluster Point is a tuple (size, growth), where size ∈ R+ and growth ∈ R

• Activation signal is a number n ∈ [0, 1]

Fig. 6. Clustering Chromosome Representation

The minimum training period of the algorithm is 1 year. Chromosomes are randomly initialized by assigning random values to Size and Growth, in order to create the N different possible points. This is done as in equation 14.

$$Size_{random} = \frac{max_{size}}{2} \times r_1 \tag{14a}$$

$$Growth_{random} = \frac{max_{growth}}{2} \times r_2 \tag{14b}$$

In equation 14, r_1 and r_2 are distinct random numbers, and max_size and max_growth are the maximum size and growth measured in the first year.

The activation values are obtained as in equation 15.

$$Activation_i = \frac{\#\,Stocks \in C_i}{\#\,Stocks} \tag{15}$$

In equation 15, i is the index of the solution, and C_i is the cluster of index i.
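The initialization of equation 14 and the activation values of equation 15 can be sketched in Python as below. The function names are hypothetical, r_1 and r_2 are assumed to be uniform in [0, 1], and stocks are assumed to be counted in the cluster of their nearest candidate point (Euclidean distance).

```python
import math
import random


def random_cluster_points(n_points: int, max_size: float, max_growth: float,
                          rng: random.Random):
    """Equation 14: random (size, growth) candidates for one chromosome."""
    return [(max_size / 2 * rng.random(), max_growth / 2 * rng.random())
            for _ in range(n_points)]


def activation_values(points, stocks):
    """Equation 15: fraction of stocks whose nearest candidate is point i."""
    counts = [0] * len(points)
    for stock in stocks:
        nearest = min(range(len(points)),
                      key=lambda i: math.dist(stock, points[i]))
        counts[nearest] += 1
    return [count / len(stocks) for count in counts]


def active_points(points, activations, k):
    """Keep the k candidate points with the largest activation values."""
    order = sorted(range(len(points)), key=lambda i: activations[i],
                   reverse=True)
    return [points[i] for i in order[:k]]
```

In this sketch the activation values always sum to 1, because every stock is counted in exactly one cluster.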

Within the number of solutions decided by the user, the clusters with the biggest activation values are chosen as the solutions of that chromosome. The fitness function used is the Calinski-Harabasz metric (sometimes called the variance ratio criterion), as described in equations 16. Although this metric is usually used to optimize the number of clusters created, in this work it is used only as the fitness function of the GA that optimizes the clusters' positions.

$$CH = \frac{SSB}{SSW} \times \frac{N - K}{K - 1} \tag{16a}$$

$$SSB = \sum_{j=1}^{K} n_j \lVert C_j - C \rVert \tag{16b}$$

$$SSW = \sum_{j=1}^{K} \sum_{i \in I_j} \lVert X_{ij} - C_j \rVert \tag{16c}$$

$$C = \left( \frac{\sum Stocks_{size}}{N}, \frac{\sum Stocks_{growth}}{N} \right) \tag{16d}$$

In equations 16, SSB is the between-cluster variance, SSW is the within-cluster variance, N is the total number of stocks, K is the number of clusters (the number of solution clusters), n_j is the number of data points belonging to the cluster of index j, C is the centroid of the dataset, C_j is the centroid of the cluster of index j, I_j is the set of data points belonging to cluster j and X_ij is the data point of index i belonging to the cluster of index j.
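A minimal Python sketch of equations 16 is given below. The function name and the `power` parameter are hypothetical: `power=1` follows the norms exactly as they are written in equations 16b and 16c, while `power=2` gives the squared Euclidean distances used in the standard definition of the Calinski-Harabasz index. The sketch assumes K > 1 and a non-zero SSW.

```python
import math


def calinski_harabasz(points, assignments, centroids, power=1):
    """Equations 16 for 2-D points; assignments[i] is the cluster of point i."""
    n, k = len(points), len(centroids)
    # Centroid of the whole dataset (eq. 16d).
    center = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    # Number of points per cluster, n_j.
    counts = [assignments.count(j) for j in range(k)]
    # Between-cluster term (eq. 16b).
    ssb = sum(counts[j] * math.dist(centroids[j], center) ** power
              for j in range(k))
    # Within-cluster term (eq. 16c).
    ssw = sum(math.dist(p, centroids[a]) ** power
              for p, a in zip(points, assignments))
    # Final index (eq. 16a).
    return (ssb / ssw) * (n - k) / (k - 1)
```

Taking four points (0, 0), (0, 2), (10, 0) and (10, 2) split into two clusters with centroids (0, 1) and (10, 1), the dataset centroid is (5, 1), SSB is 2 × 5 + 2 × 5 = 20 and SSW is 4, so the index with `power=1` is (20 / 4) × (4 − 2) / (2 − 1) = 10.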

3.2.4.3 Operators: Two types of selection are implemented: roulette wheel selection and ranking selection. Both use the fitness values shifted by the fitness of the worst individual, to avoid negative fitnesses.

The crossover is a single point crossover.

The mutation uses a value ψ to determine the maximum value of the perturbation. This ψ is a percentage δ of the value that will be mutated. After finding ψ, the mutation value α is computed as a random number in [0, ψ].

The mutation can be written mathematically as:

$$\psi = \delta \times v \tag{17}$$

$$\alpha = \text{random} \in [0, \psi] \tag{18}$$

$$v \leftarrow v \pm \alpha \tag{19}$$

In equations 17 to 19, v is the value receiving the mutation, α is the perturbation applied to the value, ψ is the maximum value of the perturbation and δ is the percentage of perturbation chosen.

In this work, δ = 0.5 is used for indicator weights and δ = 0.2 for cluster positions.
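The three operators described above can be sketched in Python as follows. The function names are hypothetical, the choice of the sign in equation 19 with equal probability is an assumption, and the uniform fallback used when all shifted fitnesses are zero is also an assumption, since that case is not discussed in the text.

```python
import random


def roulette_select(population, fitnesses, rng: random.Random):
    """Roulette wheel selection on fitnesses shifted by the worst fitness."""
    worst = min(fitnesses)
    weights = [f - worst for f in fitnesses]
    if sum(weights) == 0:
        # Assumption: all individuals are equally fit, so pick uniformly.
        return rng.choice(population)
    return rng.choices(population, weights=weights, k=1)[0]


def single_point_crossover(a, b, rng: random.Random):
    """Child takes the head of a and the tail of b (len(a) must be >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate_value(v, delta, rng: random.Random):
    """Equations 17 to 19: perturb v by at most delta * |v|."""
    psi = delta * v                # eq. 17
    alpha = rng.uniform(0, psi)    # eq. 18
    # eq. 19, with the sign chosen at random (assumption).
    return v + alpha if rng.random() < 0.5 else v - alpha
```

One consequence of the shift is that the worst individual receives a weight of zero and is never selected by the roulette wheel whenever at least one other individual has a higher fitness.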

3.3 Implementation Details

3.3.1 Configuration

The population size is 100, the number of generations is 50 for the FA GA and 200 for the clustering optimization, the mutation rate is 5%, the elitism rate is 40%, the random immigration rate is 20%, and the chromosome size is 11 for the FA GA (equal to the number of indicators) and 75 for the clustering optimization (the number of possible cluster points). These values were chosen taking into account related works and the execution time of the algorithm, although they did not take into account either the size of the problem or the features optimized in this work. Transaction costs of 0,3% of the stock value are used in this work.
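For reference, the configuration values stated above can be collected in a small Python structure. The class and constant names are hypothetical and only restate the numbers given in this section; they do not reflect how the C++ system stores its configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GAConfig:
    """GA parameters as stated in section 3.3.1 (names are illustrative)."""
    population_size: int = 100
    generations: int = 50
    mutation_rate: float = 0.05
    elitism_rate: float = 0.40
    immigration_rate: float = 0.20
    chromosome_size: int = 11


# FA GA: 50 generations, one gene per fundamental indicator.
FA_GA = GAConfig()
# Clustering GA: 200 generations, 75 possible cluster points.
CLUSTERING_GA = GAConfig(generations=200, chromosome_size=75)
# Transaction cost of 0,3% of the stock value per buy or sell action.
TRANSACTION_COST = 0.003
```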

4 RESULTS

Every solution is compared with the S&P500 index and the Buy and Hold (B&H) of the dataset in the specific time period. In the GAs used, the roulette wheel selection got better results, and these are the ones presented. These results are not compared with the existing related works because I was unable to find works that traded in a similar time period, and because this work uses quarterly trading only, while most works use daily and weekly trading.

In order to test the performance of the proposed solution, the metrics used in this work were the following:

• ROI - The Return on Investment is used to measure the amount of return, given in percentage, that a certain investment had.

• Drawdown - This metric evaluates the biggest peak-to-trough decline during a specific period of investment.

• Sharpe Ratio - Used to measure the risk associated with the return of a portfolio.

• Success rate of trades

• Average time in the market

• Average return per trade

• Rate of positive quarters

• Average return per quarter
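Two of the metrics above, the drawdown and the Sharpe ratio, can be sketched in Python as follows. The exact formulas used by the trading simulator are not given in this work, so this sketch assumes the common definitions: the drawdown is the largest relative fall from a running peak, and the Sharpe ratio is the mean excess return divided by the sample standard deviation of the returns, with a risk-free rate that defaults to zero. The function names are hypothetical.

```python
import statistics


def max_drawdown(values):
    """Largest peak-to-trough decline of a portfolio value series."""
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return over its sample standard deviation (assumed form)."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def success_rate(trade_returns):
    """Fraction of trades that closed with a positive return."""
    return sum(1 for r in trade_returns if r > 0) / len(trade_returns)
```

For a portfolio whose value goes 100, 120, 90, 130, the peak before the trough is 120 and the trough is 90, so the drawdown is 30 / 120 = 25%.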


4.1 Case Studies

All the results were computed without dividends (stocks' closing quotes were used), and with transaction costs of 0,3% in every buy or sell action.

There is no comparison with related studies because, when this work was proposed, I was unable to find either works that used the same validation period or works that traded only quarterly. This work therefore uses the B&H and the S&P500 index (especially the latter) as the comparison for the obtained results.

The execution time of the clusters' positions optimization algorithm is about 5 hours, and it takes 7 hours to create all of the portfolios once, with an i7 microprocessor.

4.1.1 Case Study I - User Input Classification

This case study presents the results of classifying stocks using user input data.

The strategy is to build a portfolio containing only one type of stock. The portfolio buys a stock in the quarter in which it is classified as belonging to that type, and sells it in the quarter in which it stops belonging to that type. Since this classification is fixed, the values presented are the result of a single run.
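The buy and sell rule of this strategy can be sketched in Python as below. The function names are hypothetical, positions still open at the end of the series are reported with no exit quarter, and the way the 0,3% transaction cost enters the return (as a proportional cost on both the buy and the sell side) is an assumption about how the simulator applies it.

```python
def type_trades(labels, target):
    """Return (entry_quarter, exit_quarter) pairs for one stock.

    labels[q] is the stock's type in quarter q. A position opens when the
    stock is classified as `target` and closes when it stops being so.
    An exit of None marks a position still open at the end of the series.
    """
    trades = []
    entry = None
    for quarter, label in enumerate(labels):
        if label == target and entry is None:
            entry = quarter
        elif label != target and entry is not None:
            trades.append((entry, quarter))
            entry = None
    if entry is not None:
        trades.append((entry, None))
    return trades


def trade_return(buy_price, sell_price, cost=0.003):
    """Return of one round trip with proportional costs on both sides."""
    paid = buy_price * (1 + cost)
    received = sell_price * (1 - cost)
    return received / paid - 1
```

For a stock labeled SG, SG, FG, SG over four quarters, the Slow Growers portfolio buys in the first quarter, sells in the third and buys again in the fourth, so `type_trades` returns one closed and one open position.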

Figure 7 shows the graph of the accumulated returns, and figure 8 the results of the several metrics used.

Every type got worse results than the S&P500 index and the B&H of the dataset. The results of the Turnarounds type (the type with the worst results) were expected, since companies with worse revenue growth are expected to have more trouble growing.

4.1.2 Case Study II - Clustering

This case study presents the results of clustering stocks by their size (total assets) and revenue growth, in the Size × Growth plane, with the algorithm detailed in section 3. The algorithm was run 10 times, and the solution that achieved the median value of the averages of all quarters' fitnesses was used, as shown in table 1.

Just as in Case Study I, each portfolio buys a stock in the quarter in which it is classified as belonging to a certain cluster, and sells it in the quarter in which it stops belonging to that cluster.

[Figure 7: line chart "Type's Portfolios", Returns versus Quarters, with the following data table.]

Portfolio | 2012 Q1 | 2012 Q2 | 2012 Q3 | 2012 Q4 | 2013 Q1 | 2013 Q2 | 2013 Q3 | 2013 Q4 | 2014 Q1 | 2014 Q2 | 2014 Q3 | 2014 Q4 | 2015 Q1 | 2015 Q2 | 2015 Q3 | 2015 Q4
SG | 8,49% | 6,43% | 12,23% | 15,80% | 24,49% | 26,26% | 32,88% | 39,70% | 48,27% | 49,73% | 45,48% | 55,84% | 59,99% | 55,89% | 49,29% | 48,82%
SW | 8,95% | 5,07% | 9,84% | 12,69% | 22,73% | 26,40% | 33,43% | 43,68% | 49,05% | 54,11% | 50,71% | 61,50% | 63,33% | 59,31% | 48,30% | 49,44%
FG | 14,95% | 4,99% | 14,49% | 18,63% | 22,81% | 25,09% | 30,13% | 41,31% | 46,98% | 51,52% | 44,77% | 49,81% | 60,43% | 58,93% | 48,10% | 48,00%
POT | 10,34% | 6,24% | 11,46% | 16,84% | 23,80% | 22,96% | 29,63% | 39,81% | 45,26% | 46,31% | 43,61% | 54,85% | 59,67% | 57,23% | 50,65% | 52,15%
TA | 3,79% | -0,40% | 4,77% | 6,63% | 18,84% | 23,85% | 34,23% | 41,89% | 46,56% | 48,73% | 44,90% | 50,16% | 50,22% | 45,78% | 34,12% | 36,13%
B&H | 10,25% | 5,35% | 11,73% | 15,50% | 26,05% | 29,50% | 37,89% | 48,41% | 55,24% | 59,29% | 54,97% | 65,68% | 68,94% | 64,51% | 52,44% | 54,25%
S&P500 | 11,99% | 8,31% | 14,54% | 13,39% | 24,76% | 27,70% | 33,69% | 46,96% | 48,87% | 55,85% | 56,80% | 63,68% | 64,40% | 64,03% | 52,64% | 62,49%

Fig. 7. User input classification portfolios accumulated returns. The key is the following: SG - Slow Growers; SW - Stalwarts; FG - Fast Growers; POT - Potentials; TA - Turnarounds

Portfolio | Number of Trades | Rate of Trade Success | Biggest Trade Return | Biggest Trade Loss | Average Time on Market (in Quarters) | Average Return Per Trade | Rate of Positive Quarters | Average Return Per Quarter | 4 Year Return | Sharpe Ratio | Drawdown
SG | 132 | 65,91% | 213,04% | -65,39% | 4,95 | 24,46% | 68,75% | 2,39% | 44,09% | 0,83 | 10,07%
SW | 243 | 78,19% | 253,67% | -74,54% | 8,51 | 36,03% | 75,00% | 2,84% | 54,33% | 0,87 | 13,77%
FG | 72 | 73,61% | 151,01% | -34,50% | 3,93 | 18,67% | 75,00% | 3,87% | 79,32% | 2,15 | 9,96%
POT | 85 | 72,94% | 226,94% | -41,83% | 5,94 | 26,76% | 68,75% | 2,90% | 55,73% | 1,41 | 6,81%
TA | 203 | 59,11% | 221,01% | -80,57% | 4,09 | 10,83% | 62,50% | 1,78% | 30,13% | 0,36 | 18,13%

Fig. 8. User input portfolios metrics table. The key is the following: SG - Slow Growers; SW - Stalwarts; FG - Fast Growers; POT - Potentials; TA - Turnarounds

Figure 9 shows the accumulated returns of the portfolios, and figure 10 the metrics evaluating each portfolio.

TABLE 1. Average fitness per quarter of the clustering algorithm

Metric | Best | Worst | Median
Calinski-Harabasz | 806,4 | 665,37 | 767,8

[Figure 9: line chart "Cluster's Portfolios", Returns versus Quarters, with the following data table.]

Portfolio | 2012 Q1 | 2012 Q2 | 2012 Q3 | 2012 Q4 | 2013 Q1 | 2013 Q2 | 2013 Q3 | 2013 Q4 | 2014 Q1 | 2014 Q2 | 2014 Q3 | 2014 Q4 | 2015 Q1 | 2015 Q2 | 2015 Q3 | 2015 Q4
A | 9,13% | 3,96% | 10,33% | 14,36% | 24,87% | 27,79% | 36,86% | 46,58% | 53,08% | 56,58% | 52,68% | 63,90% | 66,93% | 60,47% | 51,00% | 51,87%
B | 11,64% | 7,31% | 12,49% | 17,81% | 24,72% | 23,45% | 28,92% | 38,72% | 47,30% | 54,08% | 47,79% | 53,25% | 61,41% | 60,19% | 55,58% | 56,06%
C | 10,06% | 4,45% | 7,59% | 8,75% | 19,24% | 26,03% | 30,88% | 42,32% | 48,97% | 54,63% | 51,91% | 59,03% | 57,79% | 59,05% | 43,94% | 45,02%
D | 9,46% | 15,49% | 22,17% | 19,05% | 25,65% | 30,77% | 32,35% | 43,80% | 44,59% | 40,37% | 35,78% | 43,74% | 46,18% | 47,82% | 32,55% | 34,37%
E | 28,17% | 21,00% | 36,22% | 49,04% | 59,20% | 70,12% | 78,24% | 94,57% | 91,90% | 100,64% | 97,29% | 104,26% | 96,95% | 101,68% | 77,85% | 77,34%
B&H | 10,14% | 5,25% | 11,77% | 15,51% | 26,10% | 29,49% | 37,84% | 48,32% | 55,03% | 59,04% | 54,74% | 65,60% | 68,90% | 64,51% | 52,77% | 54,54%
S&P500 | 11,99% | 8,31% | 14,54% | 13,39% | 24,76% | 27,70% | 33,69% | 46,96% | 48,87% | 55,85% | 56,80% | 63,68% | 64,40% | 64,03% | 52,64% | 62,49%

Fig. 9. Clustering classification portfolios. The key is the following: A - Cluster A; B - Cluster B; C - Cluster C; D - Cluster D; E - Cluster E.

Figure 11 shows how companies are divided into clusters. It is remarkable how sparse the data is along the Size axis. Cluster A is the biggest cluster, containing most of the companies, while cluster E has only 4 companies in the fourth quarter of 2012.

One can conclude that the biggest companies of the dataset got huge returns in bull markets and huge losses in bear markets, with small Sharpe ratios and huge drawdowns. Big companies outside this small set of huge companies (the ones from cluster D) have the worst returns of all. It can also be concluded that revenue growth can be used as an indicator to pick stocks with relatively good Sharpe ratios.

With the scale and number of clusters used, the grouping was better in size than in growth. Due to the amount and sparsity of the data, more cluster points would be necessary to get a better clustering along the growth axis (especially at high values of size).

Portfolio | Number of Trades | Rate of Trade Success | Biggest Gain | Biggest Loss | Average Time on Market (in Quarters) | Average Return Per Trade | Rate of Positive Quarters | Average Return Per Quarter | 4 Year Return | Sharpe Ratio | Drawdown
Cluster A | 104 | 84,62% | 202,85% | -48,88% | 6,64 | 46,58% | 75,00% | 2,76% | 51,87% | 0,76 | 15,93%
Cluster B | 107 | 74,77% | 128,38% | -50,11% | 3,55 | 14,75% | 68,75% | 2,91% | 56,06% | 1,50 | 6,29%
Cluster C | 65 | 73,85% | 167,25% | -57,45% | 8,80 | 29,69% | 75,00% | 2,48% | 45,02% | 0,60 | 15,11%
Cluster D | 31 | 80,65% | 107,34% | -84,16% | 6,23 | 23,36% | 75,00% | 1,99% | 34,37% | 0,51 | 15,27%
Cluster E | 9 | 88,89% | 150,20% | -36,59% | 7,33 | 66,46% | 62,50% | 4,00% | 77,34% | 0,66 | 23,83%

Fig. 10. Clustering portfolios metrics table. The key is the following: A - Cluster A; B - Cluster B; C - Cluster C; D - Cluster D; E - Cluster E.

[Figure 11: scatter plot "Clusters 2012 Q4", Growth [%] (axis from -80 to 100) versus Size [1.000.000.000$] (axis from 0 to 1000), showing the points and centers of clusters A to E.]

Fig. 11. Clusters' representation in the whole plane in the fourth quarter of 2012. Cluster centers have the key "Cluster X", where X is the name of the cluster (A, B, C, D or E), and the points belonging to a certain cluster have only the letter of that cluster.

4.1.3 Case Study III - Using GAs to optimize FI

This case study examines the results of applying GAs to give buy and sell signals on each of the groups of case studies 1 and 2, and on the whole dataset.

The algorithm ran 10 times for each model. In the models that used the clustering algorithm, the clustering algorithm was run twice, so for each portfolio that uses clustering, 5 runs used the first run of the clustering algorithm and the other 5 used the second.

One portfolio using user input classifications (the Slow Growers type) and two portfolios using clustering (clusters D and E) showed improvements. The results show how the GA improved a cluster type portfolio (cluster E) and a classification type portfolio (Slow Growers) relative to the original portfolios of case studies 1 and 2.

[Figure 12: line chart "Type's Portfolios", Returns versus Quarters, with the following data table.]

Portfolio | 2012 Q1 | 2012 Q2 | 2012 Q3 | 2012 Q4 | 2013 Q1 | 2013 Q2 | 2013 Q3 | 2013 Q4 | 2014 Q1 | 2014 Q2 | 2014 Q3 | 2014 Q4 | 2015 Q1 | 2015 Q2 | 2015 Q3 | 2015 Q4
E-GA | 29,50% | 30,50% | 41,65% | 49,71% | 61,30% | 72,99% | 74,92% | 89,28% | 94,96% | 105,01% | 93,62% | 105,34% | 118,52% | 125,49% | 103,61% | 104,68%
E | 28,17% | 21,00% | 36,22% | 49,04% | 59,20% | 70,12% | 78,24% | 94,57% | 91,90% | 100,64% | 97,29% | 104,26% | 96,95% | 101,68% | 77,85% | 77,34%
GA | 7,67% | 3,89% | 7,64% | 10,12% | 19,05% | 20,35% | 27,93% | 36,93% | 42,23% | 46,18% | 42,40% | 51,94% | 53,70% | 50,07% | 36,15% | 41,30%

Fig. 12. GA optimization portfolios. The key "SG" represents the portfolio of type Slow Growers and the key "E" represents the portfolio of cluster E, as constructed in case studies 1 and 2. The suffix "-GA" represents the portfolios using GA to optimize FI weights. The key "GA" represents the GA algorithm running over the whole dataset.

One can notice in figure 12 that the portfolio E-GA ended with a return 27,34 percentage points higher than that of cluster E (104,68% against 77,34%), most of which was gained in the year 2015. The GA limited the losses of cluster E in 2015 and, consequently, increased the Sharpe ratio.
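The figure of 27,34 can be checked against the cumulative returns plotted in figure 12. It is a difference in percentage points between the final returns of the two portfolios, not a relative change. A minimal Python sketch, using only the values reported for 2014 Q4 and 2015 Q4 (the variable and function names are illustrative):

```python
# Cumulative returns (in %) read from the data of figure 12.
# Only the two quarters needed for the check are included.
e_ga = {"2014 Q4": 105.34, "2015 Q4": 104.68}  # cluster E with GA
e = {"2014 Q4": 104.26, "2015 Q4": 77.34}      # cluster E without GA


def gap(quarter):
    """Difference E-GA minus E, in percentage points."""
    return round(e_ga[quarter] - e[quarter], 2)


total_gap = gap("2015 Q4")          # gap at the end of the simulation
gap_before_2015 = gap("2014 Q4")    # gap accumulated up to the end of 2014
gap_from_2015 = round(total_gap - gap_before_2015, 2)

print(total_gap, gap_before_2015, gap_from_2015)
```

This prints 27.34, 1.08 and 26.26: under these numbers, 26,26 of the 27,34 points arise during 2015, which is consistent with the statement that most of the improvement occurred in that year.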

Figure 13 shows the metrics of these portfolios. The GA substantially reduced the number of trades when compared with portfolio SG (a 78,03% reduction, from 132 trades in portfolio SG to 29 trades in portfolio SG-GA).
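The reduction quoted above follows directly from the two trade counts. As a small check in Python (the function name is illustrative and not part of the system described here):

```python
def relative_reduction(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Trade counts quoted in the text: portfolio SG and portfolio SG-GA.
reduction = relative_reduction(132, 29)
print(f"{reduction:.2f}%")
```

The 103 trades removed out of 132 correspond to the 78,03% stated in the text.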

Metric                                 GA         SG-GA      E-GA
Number of Trades                       368        29         8
Rate of Trade Success                  72,83%     68,97%     100,00%
Biggest Trade Return                   276,67%    264,07%    117,26%
Biggest Trade Loss                     -69,44%    -35,46%    -
Average Time on Market (in Quarters)   5,49       3,79       3,50
Average Return Per Trade               24,02%     29,16%     45,94%
Rate of Positive Quarters              75,00%     75,00%     87,50%
Average Return Per Quarter             2,29%      2,49%      4,87%
4 Year Return                          41,30%     46,03%     104,68%
Sharpe Ratio                           0,67       0,84       0,98
Drawdown                               17,55%     16,45%     21,89%

Fig. 13. Clustering portfolios metrics table. The key is the following: A - Cluster A; B - Cluster B; C - Cluster C; D - Cluster D; E - Cluster E.

One can conclude from this case study that Fundamental Indicators optimization has better results when applied to small sets of companies (as was the case for clusters E and D).

Also, the GA used in this work greatly reduced the number of trades in all portfolios, reducing potential losses but also potential gains. In the case of the E-GA portfolio, the major difference from the cluster E portfolio presented in case study 2 was the reduction of losses in 2015.
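Two of the metrics in figure 13, the Rate of Positive Quarters and the Drawdown, can be related to the cumulative returns of figure 12. The sketch below is only an illustration in Python. It assumes that each portfolio starts at 0% before the first quarter and that drawdown is measured as the largest peak-to-trough fall of the cumulative return curve, in percentage points. Both assumptions, and the function names, are not stated in this work.

```python
# Cumulative returns (in %) per quarter, 2012 Q1 to 2015 Q4, from figure 12.
E_GA = [29.50, 30.50, 41.65, 49.71, 61.30, 72.99, 74.92, 89.28,
        94.96, 105.01, 93.62, 105.34, 118.52, 125.49, 103.61, 104.68]
GA = [7.67, 3.89, 7.64, 10.12, 19.05, 20.35, 27.93, 36.93,
      42.23, 46.18, 42.40, 51.94, 53.70, 50.07, 36.15, 41.30]


def rate_of_positive_quarters(cumulative):
    """Share of quarters (in %) in which the cumulative return increased.

    Assumption: the portfolio starts at 0% before the first quarter.
    """
    previous = [0.0] + cumulative[:-1]
    positive = sum(1 for prev, cur in zip(previous, cumulative) if cur > prev)
    return 100 * positive / len(cumulative)


def max_drawdown_points(cumulative):
    """Largest peak-to-trough fall of the cumulative return, in percentage points.

    Assumption: drawdown is measured on the cumulative return curve itself.
    """
    peak = 0.0
    worst = 0.0
    for value in cumulative:
        peak = max(peak, value)          # highest cumulative return so far
        worst = max(worst, peak - value)  # deepest fall from that peak
    return round(worst, 2)


for name, series in (("GA", GA), ("E-GA", E_GA)):
    print(name, rate_of_positive_quarters(series), max_drawdown_points(series))
```

Under these assumptions the sketch gives 75.0 and 17.55 for GA, matching figure 13, and 87.5 and 21.88 for E-GA, against a reported drawdown of 21,89%. The small difference is presumably due to rounding of the plotted values. For E-GA, the largest fall occurs between 2015 Q2 and 2015 Q3, in line with the 2015 losses discussed above.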

5 CONCLUSION

This work describes the implementation of a system that classifies stocks in the plane Size × Growth with two methods, both using the same metrics of Fundamental Analysis, and that uses Fundamental Indicators for trading simulation.

The classification methods used were successful, both having achieved high returns in the expected groups. The genetic algorithm for weight optimization of fundamental indicators improved two cluster-based portfolios and one classification-based portfolio. It was also successful when applied to the cluster with the highest returns, improving its return by 27,34 percentage points. Better results in this optimization would require a larger number of clusters, so that the algorithm could optimize Fundamental Indicators weights in smaller sets of stocks. These results support the conclusion that automatic classification using genetic algorithms is a better way of classifying stocks than using human input, since it is not prone to human error and does a more careful analysis of the data, grouping companies with more similar behaviors.

5.1 Achievements

The main achievements in this work were the following:

• The implementation from scratch of an architecture of a system containing data processing, genetic algorithms and a trading simulator.

• A classification method using Fundamental Analysis and a Genetic Algorithm to optimize clusters’ positions, with a fixed number of clusters.

• The comparison between a user input classification method and an automatic classification method.

• A long term investment strategy based on the implemented classification methods.

• The exclusive use of Fundamental Analysis and Fundamental Indicators with Genetic Algorithms.

5.2 Future Work

• Try different numbers of clusters, and optimize the number of clusters created to maximize the Calinski-Harabasz index.

• Use Technical Analysis to give the buy and sell signals of a portfolio.

• Tackle the Markowitz portfolio composition problem.
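For the first item of the list above, the Calinski-Harabasz index is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom, so a larger value indicates more compact and better separated clusters. A minimal pure-Python sketch follows. The points are hypothetical, and the labelings are given by hand rather than produced by the genetic algorithm of this work.

```python
def centroid(points):
    """Mean of a list of equally sized coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index for a labeling with at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = 0.0  # dispersion of cluster centers around the overall center
    within = 0.0   # dispersion of points around their own cluster center
    for members in clusters.values():
        center = centroid(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(p, center) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical (size, growth) points forming three visible groups.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
          (9.0, 1.0), (9.2, 1.1), (8.9, 0.8)]
three_clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]
two_clusters = [0, 0, 0, 1, 1, 1, 1, 1, 1]  # merges two of the groups

score_k3 = calinski_harabasz(points, three_clusters)
score_k2 = calinski_harabasz(points, two_clusters)
print(score_k3 > score_k2)
```

On this toy data the labeling with three clusters scores higher than the one that merges two groups. In a search over the number of clusters, this score would be computed for each candidate number and the highest-scoring one retained.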

REFERENCES

[1] G. S. Atsalakis and K. P. Valavanis, “Surveying stockmarket forecasting techniques part ii: Soft computingmethods,” Expert Systems with Applications, vol. 36, no.3, Part 2, pp. 5932 – 5941, 2009.

[2] S. B. Achelis, Technical Analysis from A to Z. McGrawHill New York, 2001.

[3] B. Greenwald, J. Kahn, P. Sonkin, andM. van Biema, Value Investing: From Grahamto Buffett and Beyond, ser. Wiley finance.John Wiley & Sons, 2004. [Online]. Available:https://books.google.pt/books?id=gvCzlskpZxoC

[4] M. Buffett and D. Clark, Warren Buffett and theInterpretation of Financial Statements: The Searchfor the Company with a Durable CompetitiveAdvantage. Scribner, 2008. [Online]. Available:https://books.google.pt/books?id=7iqO6rGdrAYC

[5] A. Gorgulho, R. Neves, and N. Horta, “Applying a{GA} kernel on optimizing technical analysis rulesfor stock picking and portfolio composition,” ExpertSystems with Applications, vol. 38, no. 11, pp. 14 072 –14 085, 2011.

[6] T. Chavarnakul and D. Enke, “A hybrid stock tradingsystem for intelligent technical analysis-based equiv-olume charting,” Neurocomputing, vol. 72, no. 1618,pp. 3517 – 3528, 2009, financial EngineeringCompu-tational and Ambient Intelligence (IWANN 2007).

[7] R. D. Edwards, J. Magee, and W. C. Bassetti, Technicalanalysis of stock trends. CRC Press, 2007.

[8] C. Raposo, C. H. Antunes, and J. P. Barreto, AutomaticClustering Using a Genetic Algorithm with New SolutionEncoding and Operators. Cham: Springer InternationalPublishing, 2014, pp. 92–103.

[9] J. C. Dunn, “A fuzzy relative of the isodata processand its use in detecting compact well-separated clus-ters,” 1973.

[10] J. C. Bezdek, Pattern recognition with fuzzy objectivefunction algorithms. Springer Science & BusinessMedia, 2013.

[11] A. Silva, R. Neves, and N. Horta, “A hybrid approachto portfolio composition based on fundamental andtechnical indicators,” Expert Systems with Applications,vol. 42, no. 4, pp. 2036 – 2048, 2015.

[12] G. Hassan, “Multiobjective robustness for portfoliooptimization in volatile environments,” in In Proc.GECCO 08. ACM, 2008, pp. 1507–1514.

[13] G. Hassan and C. D. Clack, “Robustness of multipleobjective gp stock-picking in unstable financial mar-kets: Real-world applications track,” in Proceedings ofthe 11th Annual Conference on Genetic and EvolutionaryComputation, ser. GECCO ’09. New York, NY,USA: ACM, 2009, pp. 1513–1520. [Online]. Available:http://doi.acm.org/10.1145/1569901.1570104

[14] M. Ehrgott, K. Klamroth, and C. Schwehm, “An{MCDM} approach to portfolio optimization,” Eu-ropean Journal of Operational Research, vol. 155, no. 3,pp. 752 – 770, 2004, traffic and Transportation SystemsAnalysis.

[15] L. Wang, H. An, X. Liu, and X. Huang, “Selectingdynamic moving average trading rules in the crudeoil futures market using a genetic approach,” AppliedEnergy, vol. 162, pp. 1608 – 1618, 2016.

[16] C.-H. Cheng, T.-L. Chen, and L.-Y. Wei, “A hybridmodel based on rough sets theory and genetic al-gorithms for stock price forecasting,” Information Sci-ences, vol. 180, no. 9, pp. 1610 – 1629, 2010.

[17] S. Yao, M. Pasquier, and C. Quek, “A foreign exchangeportfolio management mechanism based on fuzzyneural networks.” in IEEE Congress on EvolutionaryComputation. IEEE, 2007, pp. 2576–2583.

[18] D. Lohpetch and D. Corne, “Discovering effectivetechnical trading rules with genetic programming:Towards robustly outperforming buy-and-hold,” in

JOURNAL OF CLASS FILES, VOL. , NO. , NOVEMBER 2016 12

Nature & Biologically Inspired Computing, 2009. NaBIC2009. World Congress on. IEEE, 2009, pp. 439–444.

[19] M. Radeerom, “Automatic trading system based ongenetic algorithm and technical analysis for stockindex,” International Journal of Information Processingand Management, vol. 5, no. 4, p. 124, 2014.

[20] T. Kimoto, K. Asakawa, M. Yoda, and M. Takeoka,“Stock market prediction system with modular neuralnetworks,” in Neural Networks, 1990., 1990 IJCNN In-ternational Joint Conference on, June 1990, pp. 1–6 vol.1.

[21] Z. Tan, C. Quek, and P. Y. Cheng, “Stock trading withcycles: A financial application of {ANFIS} and rein-forcement learning,” Expert Systems with Applications,vol. 38, no. 5, pp. 4741 – 4755, 2011.

[22] A. Fernandes, An Evolutionary Computing Approach toFinancial Portfolio Management Based on Growth Stocks& Sector/Industry Distribution. Instituto Superior Tc-nico, May 2016.

[23] P. Lynch and J. Rothchild, One Up On Wall Street:How To Use What You Already Know To Make MoneyIn. Simon & Schuster, 2012. [Online]. Available:https://books.google.pt/books?id=TYOdIrFJ2SkC