Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang


Post on 26-Dec-2015


Page 1: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Repository Method to suit different investment strategies

Alma Lilia Garcia & Edward Tsang

Page 2: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Motivation

• Many machine learning techniques have been applied to financial problems.

• Genetic Programming (GP) has been used to predict financial opportunities.

• However, when the number of profitable opportunities is extremely small, it is very difficult to detect those cases.


Outline: Motivation, Repository Method, Receiver Operating Characteristic (ROC), Experimental design, Experimental results, Conclusions

Page 3: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Confusion Matrix

Reality   Prediction
+         +
+         –
–         –
–         +
+         +
–         –
–         +
–         –
+         +
–         –

                Prediction
                –     +     Total
Reality   –     4     2     6
          +     1     3     4
Total           5     5     10
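As a minimal sketch of how the matrix follows from the listed cases, the snippet below tallies the ten (reality, prediction) pairs from this slide. The function name `confusion_matrix` and the ASCII labels "+" and "-" are illustrative choices, not part of the original slides.

```python
from collections import Counter

# The ten (reality, prediction) pairs listed on the slide.
cases = [
    ("+", "+"), ("+", "-"), ("-", "-"), ("-", "+"), ("+", "+"),
    ("-", "-"), ("-", "+"), ("-", "-"), ("+", "+"), ("-", "-"),
]

def confusion_matrix(pairs):
    """Count TN, FP, FN and TP from (reality, prediction) pairs."""
    counts = Counter(pairs)
    return {
        "TN": counts[("-", "-")],  # negative case, predicted negative
        "FP": counts[("-", "+")],  # negative case, predicted positive
        "FN": counts[("+", "-")],  # positive case, predicted negative
        "TP": counts[("+", "+")],  # positive case, predicted positive
    }

matrix = confusion_matrix(cases)
print(matrix)
```

The counts reproduce the cells of the matrix: 4 and 2 in the – row, 1 and 3 in the + row.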

Page 4: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

The problem with few opportunities

In each matrix, rows are reality and columns are predictions; the percentages in the margins are shares of all cases.

Ideal prediction: Accuracy = Precision = Recall = 100%

                Predictions
                –        +
Reality   –     9,900    0        99%
          +     0        100      1%
                99%      1%

Easy score on accuracy: Accuracy = 99%, Precision = ?, Recall = 0%

                Predictions
                –        +
Reality   –     9,900    0        99%
          +     100      0        1%
                100%     0%

Random move from – to +: Accuracy = 98.02%, Precision = Recall = 1%

                Predictions
                –        +
Reality   –     9,801    99       99%
          +     99       1        1%
                99%      1%

Moves from – to +: Accuracy = 98.2%, Precision = Recall = 10% (Accuracy dropped from 99%)

                Predictions
                –        +
Reality   –     9,810    90       99%
          +     90       10       1%
                99%      1%
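The figures quoted on this slide can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the function name `metrics` and the scenario labels are assumptions, while the cell counts are the ones shown on the slide. Precision is returned as `None` when nothing is predicted positive, which corresponds to the "Precision = ?" entry.

```python
def metrics(tn, fp, fn, tp):
    """Return (accuracy, precision, recall) for one confusion matrix."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Precision is undefined when no case is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return accuracy, precision, recall

# Cell counts (TN, FP, FN, TP) taken from the four matrices on the slide.
scenarios = {
    "ideal prediction": (9900, 0, 0, 100),
    "easy score on accuracy": (9900, 0, 100, 0),
    "random move from - to +": (9801, 99, 99, 1),
    "moves from - to +": (9810, 90, 90, 10),
}

for name, cells in scenarios.items():
    print(name, metrics(*cells))
```

The last two scenarios show the point of the slide: accuracy barely moves (98.02% against 98.2%) while precision and recall differ by a factor of ten.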

Page 5: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Repository Method

[Figure: GP populations over generations 1, 2, …, 100, with example rules R1 = Var1 > 0.6 and Var2 > Var3, R2 = Var2 > 0.6, …, Rn = …]

GP systems spend a lot of computational resources evolving complete populations over several generations.

However, the standard procedure is to choose just the best individual of the evolution as the solution to the problem.

The objective of the Repository Method is to mine the knowledge acquired by the evolutionary process in order to compile several rules that detect the rare cases in diverse ways.

Since the number of positive examples is very small, it is important to gather all available information about them.

Page 6: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Repository Method

In order to mine the knowledge acquired by the evolutionary process, the Repository Method performs the following steps:

Evolve a GP to create a population of decision trees (R1, R2, …, Rn).

1- Rule extraction: the rule Rk is selected by precision.

2- Rule simplification: Rk is simplified to R’k.

3- New rule detection: R’k is compared to the rules in the repository (Rα, …, Rµ) by similarity (genotype).

4- Add rule to the repository: if R’k is a novel rule, R’k is added to the rule repository.
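The four steps can be sketched roughly as follows. This is not the authors' implementation: the rule representation (a conjunction of hypothetical `feature > threshold` conditions), the simplification used (keeping only the strongest threshold per feature) and the novelty test (exact match of the simplified genotype) are all assumptions made for illustration, and comparisons between two variables such as Var2 > Var3 are not covered.

```python
def simplify(rule):
    """Step 2: for 'x > a and x > b' keep only the larger threshold."""
    strongest = {}
    for feature, threshold in rule:
        strongest[feature] = max(threshold, strongest.get(feature, threshold))
    return frozenset(strongest.items())

def fires(rule, record):
    """A rule fires when every one of its conditions holds for the record."""
    return all(record[feature] > threshold for feature, threshold in rule)

def precision(rule, records, labels):
    """Share of truly positive cases among the records where the rule fires."""
    hits = [label for record, label in zip(records, labels) if fires(rule, record)]
    return sum(hits) / len(hits) if hits else 0.0

def update_repository(repository, rules, records, labels, precision_threshold):
    for rule in rules:
        # Step 1: rule extraction, selecting rules by precision.
        if precision(rule, records, labels) < precision_threshold:
            continue
        simplified = simplify(rule)           # Step 2: rule simplification
        if simplified not in repository:      # Step 3: new rule detection
            repository.add(simplified)        # Step 4: add rule to the repository
    return repository

# Toy data, invented for this example only.
records = [
    {"Var1": 0.7, "Var2": 0.9},
    {"Var1": 0.2, "Var2": 0.8},
    {"Var1": 0.9, "Var2": 0.1},
    {"Var1": 0.8, "Var2": 0.7},
]
labels = [1, 0, 0, 1]
rules = [
    [("Var1", 0.6), ("Var2", 0.5)],
    [("Var1", 0.3), ("Var1", 0.6), ("Var2", 0.5)],  # same rule once simplified
    [("Var2", 0.6)],
]

repository = update_repository(set(), rules, records, labels, 0.9)
print(repository)
```

With a precision threshold of 0.9 only one rule survives: the second rule collapses into the first after simplification, and the third has a precision of 2/3 on the toy data.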

Page 7: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Repository Method

An interesting question arises: which is the best precision threshold (PT) for selecting rules? We propose trying different precision thresholds in order to generate different classifications.

[Figure: the rules R1, R2, …, Rn are filtered with PT = 1, PT = .90, …, PT = .05, giving one repository per threshold (for example R1, R2, …, Rs and R1, R2, …, Rt). Every repository produces a different classification.]
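A rough sketch of the threshold sweep, under stated assumptions: each rule is summarised by its precision and by the set of record indices on which it fires, and a record is predicted + when any rule in the repository fires. The slides do not spell out how a repository classifies, so that combination rule, the function name `roc_points` and the toy data are all hypothetical.

```python
def roc_points(rules, labels, thresholds):
    """Build one repository per precision threshold and return
    (threshold, false positive rate, true positive rate) for each."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for pt in thresholds:
        # Repository for this threshold: rules whose precision reaches pt.
        repository = [fired for rule_precision, fired in rules if rule_precision >= pt]
        # Assumption: a record is predicted + if any rule in the repository fires.
        flagged = set().union(*repository)
        tp = sum(labels[i] for i in flagged)
        fp = len(flagged) - tp
        points.append((pt, fp / negatives, tp / positives))
    return points

# Toy data, invented for this example: 3 positive and 7 negative records.
labels = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
rules = [
    (1.00, {0}),           # fires on one positive record
    (0.50, {1, 2}),        # one positive, one negative
    (0.25, {3, 4, 5, 6}),  # one positive, three negatives
]

points = roc_points(rules, labels, [1.0, 0.5, 0.2])
for pt, fpr, tpr in points:
    print(f"PT={pt:.2f}  FPR={fpr:.3f}  TPR={tpr:.3f}")
```

Lowering the threshold admits more rules, so both the true positive rate and the false positive rate grow and each repository yields a different classification.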

Page 8: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

ROC space


Receiver Operating Characteristic (ROC) analysis has been used extensively in Machine Learning to measure the performance of classifiers.

A single classification produces a single point in the ROC space. However, some classifiers are able to produce a range of classifications; in those cases a curve is produced, which runs from the liberal to the conservative area.

Page 9: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

ROC Curve


The main advantages of using ROC are:

1) It is able to deal with imbalanced classifications

2) It is able to deal with classifiers that produce a range of classifications

3) It lets the user calculate the best trade-off between misclassifications and false alarms

The Area Under the ROC Curve (AUC) has been widely used to measure and compare the performance of different classifiers.

Slope = μ (1 − ρ) / (β ρ), where ρ = the % of + cases
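The sketch below illustrates both ideas under stated assumptions: the AUC is approximated with the trapezoidal rule over a handful of ROC points, and the slope is used to pick an operating point. The slide does not define μ and β; here they are assumed to be the cost of a false alarm and the cost of a missed opportunity, following the usual iso-performance line construction. The function names and the sample points are hypothetical.

```python
def auc(points):
    """Trapezoidal area under ROC points given as (fpr, tpr) pairs.
    The end points (0, 0) and (1, 1) are added to close the curve."""
    curve = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def iso_slope(mu, beta, rho):
    """Slope = mu * (1 - rho) / (beta * rho), with rho the share of + cases.
    Assumption: mu is the cost of a false alarm, beta the cost of a miss."""
    return mu * (1 - rho) / (beta * rho)

def best_point(points, slope):
    """ROC point touched first by a line of the given slope moving in
    from the top-left corner, i.e. the point maximising tpr - slope * fpr."""
    return max(points, key=lambda p: p[1] - slope * p[0])

# Sample (fpr, tpr) points, invented for this example.
points = [(0.0, 0.4), (0.2, 0.7), (0.5, 0.9)]

print(auc(points))
print(best_point(points, iso_slope(mu=1, beta=1, rho=0.5)))
print(best_point(points, iso_slope(mu=1, beta=1, rho=0.01)))
```

With balanced classes and equal costs the middle point is preferred; when only 1% of the cases are positive the slope becomes steep and the most conservative point wins unless missed opportunities are weighted more heavily.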

Page 10: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Experimental design


The aims of this work are:

1) to show that RM is able to produce a range of solutions capable of suiting the investor's requirements

2) to analyze the influence of the evolutionary process on the performance of RM.

For that purpose, RM was tested with the following experiments:

Experiment 1: RM on random trees
RM collects rules from P0, a random population of decision trees. The performance of RM is expected to be low, because the decision trees are random.

Experiment 2: RM on partially evolved trees
RM gathers rules from P10, a population that has been evolved for 10 generations.

Experiment 3: RM on trees from different generations
RM collects and accumulates rules from P10, P20, …, P100; that is, after every ten generations, RM collects and accumulates the rules generated so far.


Page 11: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Experimental results

The ROC curves plotted by Experiments 1, 2 and 3


A standard GP gives Recall = 14%, Precision = 5% and Accuracy = 89%. This result is plotted at (0.09, 0.14).
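As a rough consistency check, the reported point can be related to the other two figures. Assuming the ROC point is (false positive rate, recall) = (0.09, 0.14), the precision fixes the share of positive cases, and from that share the accuracy follows. The function names below are hypothetical, and the inputs are the rounded values from the slide, so the output is only approximate.

```python
def implied_positive_rate(tpr, fpr, precision):
    """Solve precision = tpr*rho / (tpr*rho + fpr*(1 - rho)) for rho,
    the share of positive cases in the data."""
    return precision * fpr / (tpr * (1 - precision) + precision * fpr)

def implied_accuracy(tpr, fpr, rho):
    """Accuracy = share of true positives plus share of true negatives."""
    return tpr * rho + (1 - fpr) * (1 - rho)

rho = implied_positive_rate(tpr=0.14, fpr=0.09, precision=0.05)
accuracy = implied_accuracy(tpr=0.14, fpr=0.09, rho=rho)
print(f"implied share of + cases: {rho:.3f}")
print(f"implied accuracy: {accuracy:.3f}")
```

The implied share of positive cases is a little over 3% and the implied accuracy is about 88.5%, which is close to the reported 89% given that all three inputs are rounded.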

Page 12: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Conclusions

It has been shown that RM offers a range of solutions to suit the risk guidelines of investors. Thus the user can choose the best balance between misclassifications and false alarms according to his/her requirements. This makes RM a valuable tool for investors in balancing between not making mistakes and not missing opportunities. The benefit of RM being able to extract predictive rules even from the earliest stages of the evolutionary process is twofold: (a) RM can potentially shorten the time spent in evolutionary computation; and (b) effort in the early part of the search is not wasted. However, to create a wider range of solutions, it is advisable to evolve the solutions at least past the exploration phase, especially when the solution of the problem is complex.

Page 13: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Questions?


Page 14: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Data set description

The Barclays stock data set is composed of prices from March 1998 to January 2005. The attributes of each record are indicators derived from financial technical analysis. Technical analysis has been used in financial markets to analyze the behaviour of stock prices; it is mainly based on historical prices and volume trends. The indicators were calculated on the basis of the daily closing price.


Page 15: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Experimental results

The result of the standard GP is: recall = 14%, precision = 5% and accuracy = 89%. This result is plotted at (0.09, 0.14) in the ROC graph, which describes a conservative prediction. Figure 4 displays the ROC curves plotted by RM in the following experiments:

• Experiment 1: Using P0, the AUC = 0.69. As can be observed from Figure 4, the majority of the points are clustered in the conservative part of the ROC curve because these did not classify any positive case. However, RM was able to generate an interesting choice for the investor: when PT = 20%, recall = 38%, precision = 9% and accuracy = 87% (see Table V).

• Experiment 2: Using P10, the performance of RM increased considerably; the AUC increased from 0.69 to 0.74. In this experiment RM offers two valuable choices, at PT = 30% and PT = 20%. The latter option provides recall = 63% and accuracy = 81%. However, one of the choices is on the conservative side and the other on the liberal side of the ROC curve, as Table V shows.

Page 16: Repository Method to suit different investment strategies Alma Lilia Garcia & Edward Tsang

Confusion Matrix

                Predictions
                –           +           Total
Reality   –     400 (TN)    50 (FP)     450
          +     200 (FN)    350 (TP)    550
Total           600         400         1,000

True Positive Rate (recall) = TP/(TP+FN) = 350/(350+200) = 63.6%
False Positive Rate = FP/(FP+TN) = 50/(50+400) = 11.1%
Precision = TP/(TP+FP) = 350/(350+50) = 87.5%