Automated Trading Strategies with R

3rd April 2014

Richard Pugh, Commercial Director

[email protected]

Agenda

• Overview of Mango

• Data Analytics

• Introduction to Backtesting

• The Backtesting Project

• Leveraging Oracle R Enterprise

• Summary

Overview of Mango Solutions

Mango in a nutshell …

• Providers of analytic products and services

• Specialise in analytic application development

• Unique mix of business-focused statisticians and mainstream software developers

• Private company founded in 2002

• Offices in UK & China

• Global Team of 65 and expanding

• ISO 9001 Accredited

• Partner with Oracle on R project

Analytic Application Development

Data Analytics

• Companies are awash with structured and unstructured data

• The insight locked in this data can help us to make better decisions and gain a competitive advantage

• Data Analytics can help to extract the key information from our data

Data Analytic Examples

• Who is a good driver?

• How do we win more games?

• What bonus should I pay?

• Will someone like this?

• When might this break?

• What are they likely to want?

Challenges of Integrating Analytics

• Clear questions are needed

• Data may not be analytic-ready

• Sophisticated analytics require niche technology that can be difficult to integrate

• The “language” of analytics can be difficult to penetrate and requires specialists

• Integrating the “right” analytics is key …

Introduction to Backtesting

• Algorithmic trading makes up a large % of market trades

• Backtesting is the process of testing a trading strategy using historical data

• Allows the development of an automated trading strategy

Backtesting Example

Buy every stock beginning with ‘A’ and sell all stocks beginning with ‘Z’

How do we know if this works??

Backtest!!
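As a rough illustration of what backtesting the ‘A’/‘Z’ rule could look like, the base R sketch below runs it over simulated weekly returns. The tickers, the simulated returns and the equal weighting within each basket are all assumptions made for this example; none of it comes from the backTest package described later.

```r
# Toy backtest of the 'A' long / 'Z' short rule on simulated data.
# All tickers and returns below are made up for illustration.
set.seed(42)

tickers <- c("AAA", "ABC", "MNO", "ZED", "ZZZ")   # hypothetical universe
nWeeks  <- 52

# Simulated weekly returns: one row per week, one column per stock
rets <- matrix(rnorm(nWeeks * length(tickers), mean = 0.001, sd = 0.02),
               nrow = nWeeks, dimnames = list(NULL, tickers))

# Weights: +1 spread equally over the long basket, -1 over the short basket
isLong  <- startsWith(tickers, "A")
isShort <- startsWith(tickers, "Z")
weights <- isLong / sum(isLong) - isShort / sum(isShort)
names(weights) <- tickers

# Weekly strategy return = weighted sum of the stock returns
stratRets <- as.vector(rets %*% weights)

# Growth of one unit of capital over the test period
cumGrowth <- cumprod(1 + stratRets)

cat("Final value of 1 unit:", round(tail(cumGrowth, 1), 4), "\n")
```

With random returns the rule should show no systematic edge, which is the point of the exercise: the backtest is what tells us whether a strategy is worth pursuing.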

[Figure: Backtesting Example, with Long and Short positions labelled]

Key Factors in Backtesting

• Easy selection and execution of strategies

• Performance of backtest

• Optimisation across sectors, styles, etc

• Comparison with hurdle (e.g. interest rates)

• Transaction costs
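Two of these factors, transaction costs and comparison with a hurdle, lend themselves to a small worked example. The sketch below is only an illustration in base R: the returns, the turnover figures, the 20 basis point cost rate and the 2% annual hurdle are all invented, and the proportional cost model is an assumption rather than the one used in the project.

```r
# Hypothetical inputs: weekly gross strategy returns and weekly turnover
grossRets <- c(0.012, -0.004, 0.008, 0.003, -0.006, 0.010)
turnover  <- c(1.00, 0.10, 0.25, 0.05, 0.30, 0.15)  # fraction of portfolio traded

costRate     <- 0.002   # assumed cost per unit of turnover (20 basis points)
hurdleRate   <- 0.02    # assumed annual hurdle, e.g. an interest rate
weeksPerYear <- 52

# Net return = gross return minus the cost of that week's trades
netRets <- grossRets - costRate * turnover

# Convert the annual hurdle to the equivalent weekly rate
weeklyHurdle <- (1 + hurdleRate)^(1 / weeksPerYear) - 1

# Excess return over the hurdle and a simple annualised Sharpe-style ratio
excess <- netRets - weeklyHurdle
sharpe <- mean(excess) / sd(excess) * sqrt(weeksPerYear)

# Compounded return over the period, before and after costs, next to the hurdle
summaryTable <- data.frame(
  gross  = prod(1 + grossRets) - 1,
  net    = prod(1 + netRets) - 1,
  hurdle = (1 + weeklyHurdle)^length(netRets) - 1
)
print(round(summaryTable, 5))
```

A strategy that looks attractive on gross returns can fall below the hurdle once costs are deducted, which is why both appear in the list above.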

The Backtesting Project

• Mango engaged by a major hedge fund to create a backtest solution

• Competitive advantage over off-the-shelf solutions

• Particular complexity around transaction cost (futility switching) and optimisation

• Framework with possibility for extensions

The Backtesting Solution

[Architecture diagram, built up over several slides. Components: Universe Feed; Raw Data & Alpha Storage; Analytic Engine; Analytic Code Management UI; Backtest Application; Graphical User Interface; Scripting Interface; Bespoke C]

The Backtesting Project Outcome

• Deemed a success

• Used to drive an industry-beating fund

• Scripting interface more popular than the graphical user interface

• Code management interface allowed for the addition of new routines without impact to the rest of the application

The Backtesting Project Constraints & Challenges

• A performance bottleneck restricted the backtest to weekly data

• Creation of the C layer for data access was unexpected

• As the number of “power users” increased, more sophisticated code management would have helped

Leveraging Oracle R Enterprise

• The project was operated on a shared-cost basis, with Mango retaining the IP

• Mango is now looking to further develop the application and release it as a product

• ORE identified as the perfect way to replace non-performant parts of the application

• Oracle products familiar to Mango

Steps to Integrating with ORE

• Use Oracle for Object Management

• Replace functions with ORE equivalent

• Use embedded scripts for execution

• Expose interface as SQL

• Build User Interface

ORE Functions

> apropos("^ore")
 [1] "OREShowDoc" "ore.attach" "ore.connect" "ore.const"
 [5] "ore.corr" "ore.create" "ore.crosstab" "ore.datastore"
 [9] "ore.datastoreSummary" "ore.delete" "ore.detach" "ore.disconnect"
[13] "ore.doEval" "ore.drop" "ore.esm" "ore.exec"
[17] "ore.exists" "ore.frame" "ore.freq" "ore.get"
[21] "ore.getXlevels" "ore.getXnlevels" "ore.glm" "ore.glm.control"
[25] "ore.groupApply" "ore.hash" "ore.hiveOptions" "ore.hour"
[29] "ore.indexApply" "ore.is.connected" "ore.lazyLoad" "ore.lm"
[33] "ore.load" "ore.ls" "ore.make.names" "ore.mday"
[37] "ore.minute" "ore.month" "ore.neural" "ore.odmAI"
[41] "ore.odmAssocRules" "ore.odmDT" "ore.odmGLM" "ore.odmKMeans"
[45] "ore.odmNB" "ore.odmNMF" "ore.odmOC" "ore.odmSVM"
[49] "ore.predict" "ore.pull" "ore.pull" "ore.pull"
[53] "ore.push" "ore.push" "ore.rank" "ore.recode"
[57] "ore.rm" "ore.rollmax" "ore.rollmean" "ore.rollmin"
[61] "ore.rollsd" "ore.rollsum" "ore.rollvar" "ore.rowApply"
[65] "ore.save" "ore.scriptCreate" "ore.scriptDrop" "ore.second"
[69] "ore.showHiveOptions" "ore.sort" "ore.stepwise" "ore.summary"
[73] "ore.sync" "ore.tableApply" "ore.toXML" "ore.univariate"
[77] "ore.year" "oreOut"

Step #1: Oracle for Object Management

• Ported the application to use Oracle for object (data) management

• Suite of ore.* functions to allow easy storage / retrieval of R objects

• Immediate benefit in performance for data i/o

• Code base simplification (no need for bespoke C layer)

> writeRdaObject
function(object, fileName, category = "RawData", dataMethod = .backTest$dataMethod, ...) {
  # returnObject is the datastore name; the code that builds it is not shown on the slide
  ore.save(object, name = returnObject, overwrite = TRUE)
}

> loadRdaObject
function(fileName, category = "RawData", dataMethod = .backTest$dataMethod, ...) {
  # ore.load() returns the names of the objects it restores; get() fetches the object itself
  get(ore.load(returnObject))
}

> grep("^RAW", ore.datastore()[[1]], value = TRUE)
[1] "RAWDATA_BRD_NO"    "RAWDATA_BRD_SECT" "RAWDATA_DIVYIELD" "RAWDATA_FISCALYR1"
[5] "RAWDATA_FISCALYR2" "RAWDATA_FY13M"    "RAWDATA_FY23M"    "RAWDATA_HIGHY1"
[9] "RAWDATA_HIGHY2"    "RAWDATA_IH6Y1"    "RAWDATA_IH6Y2"    "RAWDATA_IH7Y1"

> ore.datastoreSummary("RAWDATA_PRICE")
  object.name  class     size  length row.count col.count
1       getIt matrix 34776172 4336119      3917      1107

Step #2: Replace Functions with ore*

• The ORE library contains many optimised versions of existing R functions

• There are also new functions not available in Base R

• Using these ORE functions improves performance and simplifies the code base

MMIN <- function(data, Lag, ...) {
  # Rolling minimum over the last Lag rows of each column
  rMin <- apply(data, 2, ore.rollmin, K = Lag, align = "right")
  rMin  # return visibly so the result prints at the console
}

> myMat
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    7    1    1    1
[2,]    2    4    6    2    3
[3,]    4    0    4    2    3
[4,]    2    2    5    4    4
[5,]    4    1    2    2    1
[6,]    5    4    4    4    1
[7,]    2    3    0    0    4
[8,]    4    4    4    4    4

> MMIN(myMat, 3)
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]   NA   NA   NA   NA   NA
[3,]    2    0    1    1    1
[4,]    2    0    4    2    3
[5,]    2    0    2    2    1
[6,]    2    1    2    2    1
[7,]    2    1    0    0    1
[8,]    2    3    0    0    1
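For readers without an ORE installation, the right-aligned rolling minimum that MMIN computes can be reproduced in base R. The sketch below is not the ORE implementation; `rollMinRight` and `baseMMIN` are hypothetical helper names introduced here, and the approach via `embed()` is just one convenient way to form the windows.

```r
# Right-aligned rolling minimum of one vector, using base R only.
# embed() lays out each window of K consecutive values as a row.
rollMinRight <- function(x, K) {
  if (K > length(x)) return(rep(NA_real_, length(x)))
  windows <- embed(x, K)                 # (length(x) - K + 1) rows, K columns
  c(rep(NA, K - 1), apply(windows, 1, min))
}

# Apply column by column, mirroring the structure of MMIN
baseMMIN <- function(data, Lag) apply(data, 2, rollMinRight, K = Lag)

# The matrix from the slide, entered column by column
myMat <- matrix(c(4, 2, 4, 2, 4, 5, 2, 4,
                  7, 4, 0, 2, 1, 4, 3, 4,
                  1, 6, 4, 5, 2, 4, 0, 4,
                  1, 2, 2, 4, 2, 4, 0, 4,
                  1, 3, 3, 4, 1, 1, 4, 4), nrow = 8)

result <- baseMMIN(myMat, 3)
print(result)
```

The first `Lag - 1` rows are NA because a full window is not yet available, which matches the output shown for MMIN.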

Step #3: Use Embedded Scripts

• R Scripts Stored and Managed in the Database

• Execution controlled by Oracle Database and performed on database server

• Set of ore.* functions for managing and executing scripts

try(ore.scriptDrop("doBacktest"))
ore.scriptCreate("doBacktest", function(alphaName, alphaDesc, alphaCat, alphaFormula,
    optimMethod, optimFactors, numBaskets, lowerThreshold, upperThreshold, portName,
    portDesc = alphaDesc) {
  require(backTest) # Load the backTest package

  # Now do the backtest
  myAlpha <- runAlpha(alphaName, alphaDesc, alphaCat, alphaFormula)
  myClass <- switch(optimMethod,
    "Simple" = Classify(simpleOpt(myAlpha, .myData$Factors[optimFactors]), numBaskets),
    Classify(myAlpha, numBaskets))
  myPort <- createPort(myAlpha, myClass, lowerThreshold, upperThreshold, Splits = numBaskets)
  fullReport(myPort, portName, portDesc, alphaName, optimFactors, optimMethod,
    theDir = "/home/oracle/Results", fileName = "backTestReport.pdf")
})

alphaForm <- c(
  "aUpDwnY1 = (IH7Y1-IH8Y1)/pmax(IH7Y1+IH8Y1,IH6Y1)*100",
  "aUpDwnY2 = (IH7Y2-IH8Y2)/pmax(IH7Y2+IH8Y2,IH6Y2)*100",
  "aUpDwnSc = UPR((UPR(aUpDwnY1)+UPR(aUpDwnY2)))",
  "aFyrevs = UPR((UPR(FY13M)+UPR(FY23M)))",
  "UPR(aUpDwnSc+aFyrevs)")

res <- ore.doEval(FUN.NAME = "doBacktest", ore.connect = TRUE,
  alphaName = "aRevSc", alphaDesc = "Simple Revision Alpha", alphaCat = "Revisions",
  alphaFormula = alphaForm, optimMethod = "Simple", optimFactors = c("Style", "Sector"),
  numBaskets = 5, lowerThreshold = .5, upperThreshold = 2, portName = "OptRevScore")

   user  system elapsed
  0.134   0.037 240.697

> 240.697/60
[1] 4.01161
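The alpha is passed as a vector of "name = expression" strings, where later entries reuse names defined by earlier ones and the last entry is the final score. The slides do not show how the backTest package evaluates these, so the following is only a guess at the general idea in base R. `evalAlpha` is a hypothetical helper, the `UPR` defined here is a stand-in that assumes it means a percentile rank, and the data are invented.

```r
# Percentile rank scaled to (0, 100]; a stand-in for UPR, whose real
# definition is not given in the slides.
UPR <- function(x) rank(x, na.last = "keep") / sum(!is.na(x)) * 100

# Evaluate "name = expression" strings in order inside one environment,
# so later formulas can use names defined by earlier ones.
evalAlpha <- function(formulas, data) {
  # Columns of `data` become variables; helper functions such as UPR are
  # looked up in the environment where evalAlpha itself was defined.
  env <- list2env(as.list(data), parent = parent.env(environment()))
  result <- NULL
  for (f in formulas) {
    expr <- parse(text = f)[[1]]
    if (is.call(expr) && identical(expr[[1]], as.name("="))) {
      # Assignment form: store the value under the given name
      result <- eval(expr[[3]], env)
      assign(as.character(expr[[2]]), result, envir = env)
    } else {
      # Bare expression: this is the final score
      result <- eval(expr, env)
    }
  }
  result
}

# Made-up data for four stocks; column names mirror two of the slide's inputs
stocks <- data.frame(FY13M = c(0.5, -1.2, 2.0, 0.1),
                     FY23M = c(0.3, -0.8, 1.5, 0.4))

forms <- c("aFyrevs = UPR(UPR(FY13M) + UPR(FY23M))",
           "UPR(aFyrevs)")
score <- evalAlpha(forms, stocks)
print(score)
```

Keeping the formulas as plain strings means an alpha definition can be passed through `ore.doEval` as an ordinary character argument, which is how `alphaForm` is used above.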

An Aside … the Backtest Report

Automatically generated and emailed to fund manager

Another Aside … getting interactive!

• Results are stored as ore objects in the database

• I can access the object for more in-depth analysis

> x <- loadPort("OptRevScore", "aRevSc")
Loading object STRATEGIES_REVISIONS_AREVSC_OPTREVSCORE_PORT

> names(x)
[1] "baskets"  "bRets"    "alpha"    "relRets"  "hMat"     "classed"  "tCosts"
[8] "turnOver" "costData"

> ls("package:backTest", pattern = "*lot")
 [1] "alphaPlot"   "dayPlot"     "monthPlot"     "pairsPlot"     "plotPort"
 [6] "qRetsPlot"   "qSharpePlot" "qTranCostPlot" "qTurnOverPlot" "qVolsPlot"
[11] "textPlot"    "turnOverPlot"

> plotPort(x, removeTcosts = TRUE, title = "Simple Optimised Revision Strategy")

> textPlot(x)

> monthPlot(x, "2012-01")

> pairsPlot(x, start = "2010-01-1")

Step #4: Expose via SQL Interface

• R Scripts Stored and Managed in the Database

• Execution controlled by Oracle Database and performed on database server

• Set of ore.* functions for managing and executing scripts

• Outputs can be stored as XML or PNG (blobs)

Updated Application

[Architecture diagram, built up over several slides. Final components: Universe Feed; Raw Data & Alpha Storage; Analytic Engine; Analytic Code Interface; Backtest Application; Embedded Scripts; SQL Interface; Graphical User Interface. The Bespoke C box is dropped along the way]

Benefits of ORE

• Significant immediate benefits in performance and code management

• Database script management makes deployment very simple

• Script and SQL interfaces allow for close integration into business processes in a controlled manner

Summary

• Oracle R Enterprise provides a sophisticated platform for integrating R into business processes

• Adds scalability and performance improvements to the flexible R environment

• Integrating a legacy application with ORE proved to be easy to achieve

• We have this running on demo servers if you want to see it …

Discussion