TRANSCRIPT
TRADING USING DEEP LEARNING
MAN VS MACHINE
84%: Orders by Algorithms
16%: Orders by Humans
Artificial Neural Networks
Neural networks are a family of models inspired by biological brain structure. They are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown.
Recent breakthroughs in artificial neural networks led to a modern renaissance in AI.
TRADING USING DEEP LEARNING
DEEP LEARNING SUPERIORITY
96.92%: Deep Learning
94.9%: Human
DEEP META LEARNING
ref: http://www.image-net.org/challenges/LSVRC/
GRADIENT DESCENT
E = Error of the network
W = Weight matrix representing the filters
w_t = w_{t-1} - \gamma \, \partial E / \partial w
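A minimal sketch of this update rule on a toy least-squares problem; the data, learning rate, and iteration count are invented for illustration:

```python
import numpy as np

# Gradient-descent sketch for the update w_t = w_{t-1} - gamma * dE/dw,
# illustrated on a least-squares error E = ||X w - y||^2 (toy data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)            # initial weights
gamma = 0.001              # learning rate (gamma in the slide's formula)
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y)   # dE/dw for the squared error
    w = w - gamma * grad           # the update rule from the slide

print(np.round(w, 3))      # converges to true_w
```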
GRADIENT BASED MODELS

Legend:
l(\hat{y}, y) - Loss Function
x_0 - Feature Vector
x_i - Output of layer i
w_i - Weights of layer i
y - Ground Truth
\hat{y} - Model Output
E - Loss Surface
f - Activation Function

Forward propagation:
x_1 = f_0(x_0, w_0),  x_2 = f_1(x_1, w_1),  ...,  \hat{y} = f_n(x_n, w_n)

Loss calculation:
E = l(\hat{y}, y)

Back propagation (chain rule, from the last layer backwards):
\partial E / \partial \hat{y} = \partial l(\hat{y}, y) / \partial \hat{y}
\partial E / \partial w_n = (\partial E / \partial \hat{y}) \cdot \partial f_n(x_n, w_n) / \partial w_n
\partial E / \partial x_n = (\partial E / \partial \hat{y}) \cdot \partial f_n(x_n, w_n) / \partial x_n
\partial E / \partial w_{n-1} = (\partial E / \partial x_n) \cdot \partial f_{n-1}(x_{n-1}, w_{n-1}) / \partial w_{n-1}
\partial E / \partial x_{n-1} = (\partial E / \partial x_n) \cdot \partial f_{n-1}(x_{n-1}, w_{n-1}) / \partial x_{n-1}
...

1: Forward Propagation 2: Loss Calculation 3: Optimization
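The chain of derivatives above can be checked numerically on a tiny two-layer network; the shapes, seed, and data here are toy assumptions, not from the talk:

```python
import numpy as np

# Hand-rolled forward/backward pass for a two-layer net, mirroring the
# chain rule on the slide: dE/dw_i = (dE/dx_{i+1}) * df_i/dw_i.
rng = np.random.default_rng(1)
x0 = rng.normal(size=4)            # feature vector
w0 = rng.normal(size=(3, 4))       # layer-0 weights
w1 = rng.normal(size=3)            # layer-1 weights
y = 1.0                            # ground truth

def forward(w0, w1, x0):
    x1 = np.tanh(w0 @ x0)          # x1 = f0(x0, w0)
    y_hat = w1 @ x1                # y_hat = f1(x1, w1)
    return x1, y_hat

x1, y_hat = forward(w0, w1, x0)
E = (y_hat - y) ** 2               # E = l(y_hat, y)

# Back propagation, layer by layer:
dE_dyhat = 2 * (y_hat - y)
dE_dw1 = dE_dyhat * x1                         # dE/dw1 = dE/dy_hat * df1/dw1
dE_dx1 = dE_dyhat * w1                         # dE/dx1 = dE/dy_hat * df1/dx1
dE_dw0 = np.outer(dE_dx1 * (1 - x1**2), x0)    # tanh'(z) = 1 - tanh(z)^2

# Sanity check against a numerical gradient for one weight.
eps = 1e-6
w0p = w0.copy(); w0p[0, 0] += eps
_, y_hat_p = forward(w0p, w1, x0)
num = ((y_hat_p - y) ** 2 - E) / eps
print(abs(num - dE_dw0[0, 0]) < 1e-3)  # True: analytic matches numerical
```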
Supervised learning in a nutshell: learning from labeled examples (e.g., an image labeled "CAT").
FINANCIAL PREDICTION PITFALLS
Much Data: Possibly relevant data from many markets is incredibly large.
No Theory: Complex non-linear interactions in the data are not well specified by financial theory.
Noisy Data: Noise in financial data is very common, and distinguishing noise from behavior is sometimes hard.
Importance: Data importance is questionable, and determining which data is meaningful is hard.
Overfitting: Models overfit easily; most have poor predictive capability on financial data.
Behavior: The behavior of financial markets changes all the time and can be truly unpredictable.
TRADING USING DEEP LEARNING
WHY DEEP LEARNING?
The same six pitfalls (much data, no theory, noisy data, questionable importance, overfitting, changing behavior) are also the reasons to try deep learning, which is built for large, noisy, poorly specified problems.
TRADING USING DEEP LEARNING
[Figure: two intraday price charts over the same session, price axis from 160.5 to 164, plotted against time-of-day ticks.]
BACK TO FINANCE
?
TRADING USING DEEP LEARNING
Strategy Universe → Strategy Configurations → Configuration
DEEP REINFORCEMENT LEARNING
Trading Decision Utility:
1 = buy
0 = hold
-1 = sell
Reward: P&L / Drawdown
DEEP LEARNING IN FINANCE
Technical Analysis
Technical analysis may or may not work; one thing is for sure: it is very hard to generalize.
talib.SMA(… talib.MOM(…
TA-Lib: Technical Analysis Library
Successful Technical Trading Agents Using Genetic Programming: Farnsworth, 2004
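The talib.SMA and talib.MOM calls above compute a simple moving average and a momentum indicator; below is a NumPy-only sketch of what they return, with toy prices, so the example runs without TA-Lib installed:

```python
import numpy as np

# What talib.SMA and talib.MOM compute, sketched in plain NumPy.
# `prices` is hypothetical toy data, not a real series.
prices = np.array([10., 11., 12., 11., 13., 14., 13., 15.])

def sma(x, period):
    # Simple moving average: mean of the last `period` values at each step.
    return np.convolve(x, np.ones(period) / period, mode="valid")

def mom(x, period):
    # Momentum: price now minus price `period` steps ago.
    return x[period:] - x[:-period]

print(sma(prices, 3))   # first value: mean(10, 11, 12) = 11.0
print(mom(prices, 2))   # first value: 12 - 10 = 2.0
```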
DEEP LEARNING IN FINANCE
Surprisingly, genetic programming can be very successful when it comes to financial strategies.
gp = SymbolicRegressor(...)
gp.fit(X_train, y_train)
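As a toy illustration of the idea behind SymbolicRegressor (this is not gplearn itself): search over random expression trees for one that fits the data. All names, constants, and the target function below are invented for the example; real genetic programming adds populations, crossover, and mutation:

```python
import operator, random

# Drastically simplified symbolic regression: random search over small
# expression trees built from +, -, *, the variable x, and constants.
OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]

def random_tree(depth, rng):
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.7 else ("const", rng.randint(1, 3))
    op = rng.choice(OPS)
    return ("op", op, random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def evaluate(tree, x):
    if tree[0] == "x":
        return x
    if tree[0] == "const":
        return tree[1]
    _, (fn, _), left, right = tree
    return fn(evaluate(left, x), evaluate(right, x))

def fit(xs, ys, trials=3000, seed=0):
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(trials):
        tree = random_tree(3, rng)
        err = sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))
        if err < best_err:
            best, best_err = tree, err
    return best, best_err

# Toy target: y = x**2 + 1 (not financial data).
xs = [0, 1, 2, 3, 4]
ys = [x * x + 1 for x in xs]
tree, err = fit(xs, ys)
print(err)  # squared error of the best tree found (often 0 here)
```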
DEEP LEARNING IN FINANCE
APPLYING DEEP LEARNING TO ENHANCE MOMENTUM TRADING STRATEGIES IN STOCKS: L. Takeuchi, 2013

FEATURE ENGINEERING | MODEL | RESULTS
Feature Vector:
X_i: [ \sum_{t-12}^{t+i} (bid_t - ask_{t-12}) / ask_{t-12} - \mu ] / \sigma  |  i \in (1, 11),  \cup  [1 if t in January else 0]
DEEP LEARNING IN FINANCE
Ground Truth:
Y: (bid_{t+1} - ask_{t-12}) / ask_{t-12}  >  \sum_{t-12}^{t+i} (bid_t - ask_{t-12}) / ask_{t-12}
[Network diagram; layer sizes 33, 40, 4, 50, 2]

12 monthly returns
For every month: daily cumulative returns
Z-score against the cumulative returns of the other stocks
Flag if January
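A sketch of this feature construction: past monthly returns, z-scored cross-sectionally against the other stocks, plus the January flag. The shapes, seed, and returns below are invented; only the z-score and flag logic come from the slide:

```python
import numpy as np

# Takeuchi-style features: rows = stocks, columns = the past 12 months.
rng = np.random.default_rng(2)
returns = rng.normal(0.01, 0.05, size=(5, 12))   # hypothetical monthly returns

# Z-score each month's return against the cross-section of stocks.
mu = returns.mean(axis=0, keepdims=True)
sigma = returns.std(axis=0, keepdims=True)
z = (returns - mu) / sigma

# Append the January indicator (assume the prediction month is January).
is_january = 1.0
features = np.hstack([z, np.full((z.shape[0], 1), is_january)])
print(features.shape)  # (5, 13): 12 z-scored returns + 1 flag per stock
```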
Layers Structure | Hyper Parameters
Stacked RBMs, K-Fold cross validation | Layer sizes found by grid search; other hyper parameters not written

Confusion Matrix:
                Predicted True   Predicted False
Actual True     22.38%           19.19%
Actual False    30.97%           27.45%
Precision: 61.224%
Recall: 53.659%
Accuracy: 53.061%
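The reported precision, recall, and accuracy follow from the four cells of a confusion matrix. The cell-to-role assignment below is a best guess chosen to be roughly consistent with the reported metrics; the extracted slide does not make the layout unambiguous:

```python
# Computing metrics from a 2x2 confusion matrix expressed in percentages.
# Assignment of the slide's four cell values to tp/fn/fp/tn is an assumption.
tp, fn = 30.97, 27.45   # actual positive: predicted pos / predicted neg
fp, tn = 19.19, 22.38   # actual negative: predicted pos / predicted neg

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + fn + fp + tn)
print(round(precision, 3), round(recall, 3), round(accuracy, 3))
```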
DEEP MODELING COMPLEX COUPLINGS WITHIN FINANCIAL MARKETS

FEATURE ENGINEERING | MODEL | RESULTS
Feature Vector: X_i: S_i \cup F_i (stock features united with forex features)
DEEP LEARNING IN FINANCE
Used a Deep Belief Network to find hidden couplings between markets
Used past prices of stocks and forex as features
Unsupervised learning model

Layers Structure | Hyper Parameters
DBN of stacked RBMs | Optimizer: SGD; Loss: Negative Log Likelihood
Note: no cross validation in the paper
[Diagram: STOCK and FOREX price inputs feeding a stack of RBMs]
DEEP LEARNING FOR MULTIVARIATE FINANCIAL TIME SERIES

FEATURE ENGINEERING | MODEL | RESULTS
Feature Vector:
X_i: [ log(bid_t / ask_{t-1}) - \mu ] / \sigma  |  i \in (-33, -2),  \cup  [1 if t in January else 0]
DEEP LEARNING IN FINANCE
Ground Truth:
Y: 1 if price is over the median at t + 1

Matrix of log returns over all the stocks
Z-score against the other stocks' log returns
Flag if January

Layers Structure | Hyper Parameters
"Deep" Belief Net of stacked RBMs connected to a fully connected MLP (Dense, 1 output)
Optimizer: ADAGrad; Cross validation: Training 70%, Validation 15%, Test 15%
Loss: Negative Log Likelihood
Activation: tanh; Regularization: L1
IMPLEMENTING DEEP NEURAL NETWORKS FOR FINANCIAL MARKET PREDICTION

FEATURE ENGINEERING | MODEL | RESULTS
Feature Vector:
X = \bigcup_{i-100}^{i} (p_t - p_{t-1}) / p_{t-1}  \cup  \bigcup_{f=5}^{100} MA(X, f)  \cup  \bigcup_{j=1}^{a} \rho(X, X_j)
DEEP LEARNING IN FINANCE
Ground Truth:
Y \in {1, 0, -1} for buy, hold, sell

List of 100 lagged prices
All moving averages from 5 to 100
Pearson correlation between the returns (all 100) of the stock and all the other stocks (45)

Layers Structure | Hyper Parameters
Simple fully connected network; layer sizes 1000, 100, 135
Optimizer: SGD; Cross validation: Training 80%, Test 20%
Loss: Categorical Cross Entropy
Activation: ReLU, softmax
Training algorithm: walk forward
Accuracy: 73%
F1 Score: 0.49895
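The three feature families above (lagged returns, moving averages with windows 5 to 100, and cross-asset correlations) can be assembled as follows; the instrument count, series length, and random-walk prices are toy assumptions:

```python
import numpy as np

# Assembling the feature vector: 100 lagged returns, MA(X, f) for f = 5..100,
# and Pearson correlations with the other instruments. Data are simulated.
rng = np.random.default_rng(3)
prices = 100 + np.cumsum(rng.normal(0, 1, size=(300, 4)), axis=0)  # 4 instruments
rets = (prices[1:] - prices[:-1]) / prices[:-1]

target = rets[:, 0]
lags = target[-100:]                                   # last 100 lagged returns
mas = [target[-f:].mean() for f in range(5, 101)]      # MA(X, f), f = 5..100
corrs = [np.corrcoef(target, rets[:, j])[0, 1] for j in range(1, 4)]

features = np.concatenate([lags, mas, corrs])
print(features.shape)  # (199,): 100 lags + 96 MAs + 3 correlations
```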
DEEP LEARNING IN FINANCE: Heaton et al., 2016

FEATURE ENGINEERING | MODEL | RESULTS
Feature Vector: p_{ij}  |  i \in (0, 365), j \in (1, 500)
Ground Truth: p^{S&P 500}_{ij}  |  j \in (1, 500)

Trained an autoencoder
Used it to find the stocks closest to the market encoding
Used those, with a deep architecture, to track the S&P 500

Layers Structure | Hyper Parameters
AutoEncoder | DFP Policy; sparsity indicator: 0.1
[Network diagram: autoencoder over the 500 stocks]
Stationarity
Definition
Let x_t be a stochastic process and let F_x(x_{t_1+\tau}, …, x_{t_k+\tau}) represent the cumulative distribution function of the joint distribution of x_t at times t_1+\tau, …, t_k+\tau.
x_t is said to be strictly stationary if, for all k, for all \tau and for all t_1, …, t_k:
F_x(x_{t_1+\tau}, …, x_{t_k+\tau}) = F_x(x_{t_1}, …, x_{t_k})
In other words: shifting the time origin by an amount \tau has no effect on the joint distribution, which depends only on the intervals between t_1, …, t_k.
Stationarizing A Time Series
VALUE FORECASTING

Algorithm Structure:
1: Logarithm & Differentiation: y_i = y_{orig_i} is transformed to y_i = [ln(y_{orig_i})]', turning the nonstationary stochastic series into a stationary one.
2: Segmentation: the stationary series is cut into segments that serve as features.
3: Learning Model: regression over the segments yields a stationary stochastic prediction \hat{y}_i.
4: Integration & Exponentiation: y_{pred_i} = e^{\hat{y}_i} (after integrating) turns it back into the nonstationary final prediction.
5: Concatenation: the predictions are joined back into a series.
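The logarithm-and-differentiation transform and its inverse (integration and exponentiation) can be verified with a round trip; the prices are toy data, and a perfect "prediction" stands in for the model's output:

```python
import numpy as np

# Stationarize with log + diff, then invert with cumsum + exp.
rng = np.random.default_rng(5)
y_orig = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=50)))  # nonstationary

y = np.diff(np.log(y_orig))   # y_i = [ln(y_orig_i)]' : stationary log returns

y_hat = y                     # stand-in for the learned model's prediction

# Integration & exponentiation: recover price levels from log returns.
y_pred = y_orig[0] * np.exp(np.cumsum(y_hat))
print(np.allclose(y_pred, y_orig[1:]))  # True: the round trip is exact
```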
DEEP REINFORCEMENT LEARNING
4D Financial Graph Data
[Diagram: data tensor with Assets × Time × Features axes]
DEEP LEARNING IN PRODUCTION
ALGOTRADING PITFALLS
Slippage: the difference between where the computer signaled the entry and exit for a trade and where actual clients, with actual money, entered and exited.
Commission: a service charge assessed by a broker or investment advisor in return for providing investment advice and/or handling the purchase or sale of a security.
DEEP LEARNING IN FINANCE
Theoretical Motivations
Smoothness Assumption
Points that are close to each other are more likely to share a label.
- Allows us to easily interpolate between examples.
- The root of the curse of dimensionality.
Depth Assumption
Depth is a double-edged sword.
- Depth can add exponentially more predictive power to the network (compared to width), but only if done right.
Distributed Representations
Localist models are very inefficient whenever the data has componential structure.
- Distributed representations are useful for efficiently combining features to learn the underlying mechanism from which the labels are derived.
Kitchen Sink Approach
Possibly relevant features from many markets can be incredibly large in number.
DEEP LEARNING IN FINANCE
UNDERFITTING
Finance models often have poor results: solve this with smart risk management.
"It works if you remove this situation": save yourself some money; it just doesn't work.
Feature engineering vs. raw data: a good question.
DEEP LEARNING IN FINANCE
              Image       Finance
Data:         Unlimited   Limited
Overtraining: Hard        Easy
“You are basically risking millions on something no one in the world fully understands?”
"Why Should I Trust You?": Explaining the Predictions of Any Classifier – Ribeiro, et al 2015
DEEP LEARNING IN FINANCE
MARKOV PROPERTY
P(X_n = x_n | X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, …, X_0 = x_0) = P(X_n = x_n | X_{n-1} = x_{n-1})
DEEP LEARNING IN FINANCE
RNN PITFALLS
Example (long-range dependency):
Jane walked into the room. John walked in too. It was late at night. Jane said hi to _____.
On the difficulty of training recurrent neural networks: Pascanu, Mikolov & Bengio, 2013
Tricks:
- Initialize the weight matrices to the identity matrix
- Set the activation function to ReLU
- Norm-clip the exploding gradient:
\hat{g} \leftarrow \partial \varepsilon / \partial w
if ||\hat{g}|| \ge threshold then \hat{g} \leftarrow (threshold / ||\hat{g}||) \hat{g}
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units: Le, Jaitly & Hinton, 2015
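The norm-clipping rule above, written out; the threshold and gradient values are illustrative:

```python
import numpy as np

# Gradient norm clipping: if the gradient's norm exceeds the threshold,
# rescale it to have exactly that norm, preserving its direction.
def clip_gradient(g, threshold):
    norm = np.linalg.norm(g)
    if norm >= threshold:
        g = (threshold / norm) * g
    return g

g = np.array([3.0, 4.0])                         # norm 5.0
print(clip_gradient(g, 1.0))                     # rescaled to norm 1: [0.6, 0.8]
print(clip_gradient(np.array([0.1, 0.2]), 1.0))  # unchanged: norm below threshold
```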
DEEP LEARNING IN FINANCE
DEBUGGING DEEP NETS
Understanding the difficulty of training deep feedforward neural networks: Glorot & Bengio, 2010
Mean and standard deviation of the activation (output of the sigmoid) during learning, for 4 hidden layers.
The top hidden layer quickly saturates at 0 (slowing down all learning), but then slowly desaturates around epoch 100.
Other Uses of Deep Learning
Extracting the price impact of news: stress signal on an individual asset from text (Word2Vec) and fundamental data
Non-linear portfolio replication using fewer instruments
Reinforcement learning for continuous algo optimization
Space embedding for correlation between assets
DEEP LEARNING IN FINANCE
[System diagram: market data for GOOG, AAPL, MMM, … flows into the deep learning system, which sends orders back to the market. Trading / training runs on site; research runs in house, backed by data storage, a GPU cluster, and testing.]
Underfitting
Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Intuitively, underfitting occurs when the model or the algorithm does not fit the data well enough.
Underfitting is caused either by fitting too simple a model or by not training the model enough.
[Cartoon: "Calculate 1+1." "What is 1? I've only seen equations with 4."]
TRADING USING DEEP LEARNING
USING THE GPU
Model + Training Algorithm → Represented as tensor multiplications → Computed on NVIDIA GPUs
TRADING USING DEEP LEARNING
THE IMPORTANCE OF POWER
TRADING USING DEEP LEARNING
Performance: computational speed allows us to further train our model, yielding better accuracy in the finite amount of training time we have.
Memory Capacity: high memory capacity is crucial to our systems, allowing us to construct wider and deeper models and to integrate more data into them.
Interface Speed: the speed of the card interface defines how many model "shifts" we can perform in one trading cycle.
CPU vs GPU
GPU: compiled with nvcc; GDDR5 at 7.0 Gbps; matrix multiplication complexity O(n) [1]; epoch takes ~500 sec (avg)
CPU: compiled with gcc; DDR4 at 2400 MHz; matrix multiplication complexity O(n^3); epoch takes ~7500 sec (avg)
[1] - Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication -
Fatahalian, et al, 2004
POWER = TRAINING = ACCURACY = RETURNS
Deep Learning
Algorithmic Trading
Paper Review
GPU Based Trading
Recap
Questions?