TRANSCRIPT
TRADING USING DEEP LEARNING
MAN VS MACHINE
84%: Orders by Algorithms
16%: Orders by Humans
Artificial Neural Networks
Neural networks are a family of models inspired by biological brain structure. They are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown.
Recent breakthroughs in artificial neural networks led to a modern renaissance in AI.
TRADING USING DEEP LEARNING
DEEP LEARNING SUPERIORITY
96.92%: Deep Learning
94.9%: Human
DEEP META LEARNING
ref: http://www.image-net.org/challenges/LSVRC/
GRADIENT DESCENT
E = Error of the network
W = Weight matrix representing the filters
w_t = w_{t-1} - \gamma \, \partial E / \partial w
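A minimal sketch of this update rule on a toy least-squares problem; the data, learning rate, and iteration count are invented for illustration:

```python
import numpy as np

# Gradient-descent sketch for the update w_t = w_{t-1} - gamma * dE/dw,
# illustrated on a least-squares error E = ||X w - y||^2 (toy data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)            # initial weights
gamma = 0.001              # learning rate (gamma in the slide's formula)
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y)   # dE/dw for the squared error
    w = w - gamma * grad           # the update rule from the slide

print(np.round(w, 3))      # converges to true_w
```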
GRADIENT BASED MODELS

Legend:
l(\hat{y}, y) - Loss Function
x_0 - Feature Vector
x_i - Output of layer i
w_i - Weights of layer i
y - Ground Truth
\hat{y} - Model Output
E - Loss Surface
f - Activation Function

Forward propagation:
x_1 = f_0(x_0, w_0),  x_2 = f_1(x_1, w_1),  ...,  \hat{y} = f_n(x_n, w_n)

Loss calculation:
E = l(\hat{y}, y)

Back propagation (chain rule, from the last layer backwards):
\partial E / \partial \hat{y} = \partial l(\hat{y}, y) / \partial \hat{y}
\partial E / \partial w_n = (\partial E / \partial \hat{y}) \cdot \partial f_n(x_n, w_n) / \partial w_n
\partial E / \partial x_n = (\partial E / \partial \hat{y}) \cdot \partial f_n(x_n, w_n) / \partial x_n
\partial E / \partial w_{n-1} = (\partial E / \partial x_n) \cdot \partial f_{n-1}(x_{n-1}, w_{n-1}) / \partial w_{n-1}
\partial E / \partial x_{n-1} = (\partial E / \partial x_n) \cdot \partial f_{n-1}(x_{n-1}, w_{n-1}) / \partial x_{n-1}
...

1: Forward Propagation 2: Loss Calculation 3: Optimization
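The chain of derivatives above can be checked numerically on a tiny two-layer network; the shapes, seed, and data here are toy assumptions, not from the talk:

```python
import numpy as np

# Hand-rolled forward/backward pass for a two-layer net, mirroring the
# chain rule on the slide: dE/dw_i = (dE/dx_{i+1}) * df_i/dw_i.
rng = np.random.default_rng(1)
x0 = rng.normal(size=4)            # feature vector
w0 = rng.normal(size=(3, 4))       # layer-0 weights
w1 = rng.normal(size=3)            # layer-1 weights
y = 1.0                            # ground truth

def forward(w0, w1, x0):
    x1 = np.tanh(w0 @ x0)          # x1 = f0(x0, w0)
    y_hat = w1 @ x1                # y_hat = f1(x1, w1)
    return x1, y_hat

x1, y_hat = forward(w0, w1, x0)
E = (y_hat - y) ** 2               # E = l(y_hat, y)

# Back propagation, layer by layer:
dE_dyhat = 2 * (y_hat - y)
dE_dw1 = dE_dyhat * x1                         # dE/dw1 = dE/dy_hat * df1/dw1
dE_dx1 = dE_dyhat * w1                         # dE/dx1 = dE/dy_hat * df1/dx1
dE_dw0 = np.outer(dE_dx1 * (1 - x1**2), x0)    # tanh'(z) = 1 - tanh(z)^2

# Sanity check against a numerical gradient for one weight.
eps = 1e-6
w0p = w0.copy(); w0p[0, 0] += eps
_, y_hat_p = forward(w0p, w1, x0)
num = ((y_hat_p - y) ** 2 - E) / eps
print(abs(num - dE_dw0[0, 0]) < 1e-3)  # True: analytic matches numerical
```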
Supervised learning in a nutshell: learning from labeled examples (e.g., an image labeled "CAT").
FINANCIAL PREDICTION PITFALLS
Much Data: Possibly relevant data from many markets is incredibly large.
No Theory: Complex non-linear interactions in the data are not well specified by financial theory.
Noisy Data: Noise in financial data is very common, and distinguishing noise from behavior is sometimes hard.
Importance: Data importance is questionable, and determining which data is meaningful is hard.
Overfitting: Models overfit easily; most have poor predictive capability on financial data.
Behavior: The behavior of financial markets changes all the time and can be truly unpredictable.
TRADING USING DEEP LEARNING
WHY DEEP LEARNING?
The same six pitfalls (much data, no theory, noisy data, questionable importance, overfitting, changing behavior) are also the reasons to try deep learning, which is built for large, noisy, poorly specified problems.
TRADING USING DEEP LEARNING
[Figure: two intraday price charts over the same session, price axis from 160.5 to 164, plotted against time-of-day ticks.]
BACK TO FINANCE
?
TRADING USING DEEP LEARNING
Strategy Universe → Strategy Configurations → Configuration
DEEP REINFORCEMENT LEARNING
Trading Decision Utility:
1 = buy
0 = hold
-1 = sell
Reward: P&L / Drawdown
DEEP LEARNING IN FINANCE
Technical Analysis
Technical analysis may or may not work; one thing is for sure: it is very hard to generalize.
talib.SMA(… talib.MOM(…
TA-Lib: Technical Analysis Library
Successful Technical Trading Agents Using Genetic Programming: Farnsworth, 2004
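The talib.SMA and talib.MOM calls above compute a simple moving average and a momentum indicator; below is a NumPy-only sketch of what they return, with toy prices, so the example runs without TA-Lib installed:

```python
import numpy as np

# What talib.SMA and talib.MOM compute, sketched in plain NumPy.
# `prices` is hypothetical toy data, not a real series.
prices = np.array([10., 11., 12., 11., 13., 14., 13., 15.])

def sma(x, period):
    # Simple moving average: mean of the last `period` values at each step.
    return np.convolve(x, np.ones(period) / period, mode="valid")

def mom(x, period):
    # Momentum: price now minus price `period` steps ago.
    return x[period:] - x[:-period]

print(sma(prices, 3))   # first value: mean(10, 11, 12) = 11.0
print(mom(prices, 2))   # first value: 12 - 10 = 2.0
```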
DEEP LEARNING IN FINANCE
Surprisingly, genetic programming can be very successful when it comes to financial strategies.
gp = SymbolicRegressor(...)
gp.fit(X_train, y_train)
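As a toy illustration of the idea behind SymbolicRegressor (this is not gplearn itself): search over random expression trees for one that fits the data. All names, constants, and the target function below are invented for the example; real genetic programming adds populations, crossover, and mutation:

```python
import operator, random

# Drastically simplified symbolic regression: random search over small
# expression trees built from +, -, *, the variable x, and constants.
OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]

def random_tree(depth, rng):
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.7 else ("const", rng.randint(1, 3))
    op = rng.choice(OPS)
    return ("op", op, random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def evaluate(tree, x):
    if tree[0] == "x":
        return x
    if tree[0] == "const":
        return tree[1]
    _, (fn, _), left, right = tree
    return fn(evaluate(left, x), evaluate(right, x))

def fit(xs, ys, trials=3000, seed=0):
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(trials):
        tree = random_tree(3, rng)
        err = sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))
        if err < best_err:
            best, best_err = tree, err
    return best, best_err

# Toy target: y = x**2 + 1 (not financial data).
xs = [0, 1, 2, 3, 4]
ys = [x * x + 1 for x in xs]
tree, err = fit(xs, ys)
print(err)  # squared error of the best tree found (often 0 here)
```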
DEEP LEARNING IN FINANCE
APPLYING DEEP LEARNING TO ENHANCE MOMENTUM TRADING STRATEGIES IN STOCKS: L. Takeuchi, 2013

FEATURE ENGINEERING | MODEL | RESULTS
Feature Vector:
X_i: [ \sum_{t-12}^{t+i} (bid_t - ask_{t-12}) / ask_{t-12} - \mu ] / \sigma  |  i \in (1, 11),  \cup  [1 if t in January else 0]
DEEP LEARNING IN FINANCE
Ground Truth:
Y: (bid_{t+1} - ask_{t-12}) / ask_{t-12}  >  \sum_{t-12}^{t+i} (bid_t - ask_{t-12}) / ask_{t-12}
[Network diagram; layer sizes 33, 40, 4, 50, 2]

12 monthly returns
For every month: daily cumulative returns
Z-score against the cumulative returns of the other stocks
Flag if January
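A sketch of this feature construction: past monthly returns, z-scored cross-sectionally against the other stocks, plus the January flag. The shapes, seed, and returns below are invented; only the z-score and flag logic come from the slide:

```python
import numpy as np

# Takeuchi-style features: rows = stocks, columns = the past 12 months.
rng = np.random.default_rng(2)
returns = rng.normal(0.01, 0.05, size=(5, 12))   # hypothetical monthly returns

# Z-score each month's return against the cross-section of stocks.
mu = returns.mean(axis=0, keepdims=True)
sigma = returns.std(axis=0, keepdims=True)
z = (returns - mu) / sigma

# Append the January indicator (assume the prediction month is January).
is_january = 1.0
features = np.hstack([z, np.full((z.shape[0], 1), is_january)])
print(features.shape)  # (5, 13): 12 z-scored returns + 1 flag per stock
```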
Layers Structure | Hyper Parameters
Stacked RBMs, K-Fold cross validation | Layer sizes found by grid search; other hyper parameters not written

Confusion Matrix:
                Predicted True   Predicted False
Actual True     22.38%           19.19%
Actual False    30.97%           27.45%
Precision: 61.224%
Recall: 53.659%
Accuracy: 53.061%
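The reported precision, recall, and accuracy follow from the four cells of a confusion matrix. The cell-to-role assignment below is a best guess chosen to be roughly consistent with the reported metrics; the extracted slide does not make the layout unambiguous:

```python
# Computing metrics from a 2x2 confusion matrix expressed in percentages.
# Assignment of the slide's four cell values to tp/fn/fp/tn is an assumption.
tp, fn = 30.97, 27.45   # actual positive: predicted pos / predicted neg
fp, tn = 19.19, 22.38   # actual negative: predicted pos / predicted neg

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + fn + fp + tn)
print(round(precision, 3), round(recall, 3), round(accuracy, 3))
```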
DEEP MODELING COMPLEX COUPLINGS WITHIN FINANCIAL MARKETS

FEATURE ENGINEERING | MODEL | RESULTS
Feature Vector: X_i: S_i \cup F_i (stock features united with forex features)
DEEP LEARNING IN FINANCE
Used a Deep Belief Network to find hidden couplings between markets
Used past prices of stocks and forex as features
Unsupervised learning model

Layers Structure | Hyper Parameters
DBN of stacked RBMs | Optimizer: SGD; Loss: Negative Log Likelihood
Note: no cross validation in the paper
[Diagram: STOCK and FOREX price inputs feeding a stack of RBMs]
DEEP LEARNING FOR MULTIVARIATE FINANCIAL TIME SERIES

FEATURE ENGINEERING | MODEL | RESULTS
Feature Vector:
X_i: [ log(bid_t / ask_{t-1}) - \mu ] / \sigma  |  i \in (-33, -2),  \cup  [1 if t in January else 0]
DEEP LEARNING IN FINANCE
Ground Truth:
Y: 1 if price is over the median at t + 1

Matrix of log returns over all the stocks
Z-score against the other stocks' log returns
Flag if January

Layers Structure | Hyper Parameters
"Deep" Belief Net of stacked RBMs connected to a fully connected MLP (Dense, 1 output)
Optimizer: ADAGrad; Cross validation: Training 70%, Validation 15%, Test 15%
Loss: Negative Log Likelihood
Activation: tanh; Regularization: L1
IMPLEMENTING DEEP NEURAL NETWORKS FOR FINANCIAL MARKET PREDICTION

FEATURE ENGINEERING | MODEL | RESULTS
Feature Vector:
X = \bigcup_{i-100}^{i} (p_t - p_{t-1}) / p_{t-1}  \cup  \bigcup_{f=5}^{100} MA(X, f)  \cup  \bigcup_{j=1}^{a} \rho(X, X_j)
DEEP LEARNING IN FINANCE
Ground Truth:
Y \in {1, 0, -1} for buy, hold, sell

List of 100 lagged prices
All moving averages from 5 to 100
Pearson correlation between the returns (all 100) of the stock and all the other stocks (45)

Layers Structure | Hyper Parameters
Simple fully connected network; layer sizes 1000, 100, 135
Optimizer: SGD; Cross validation: Training 80%, Test 20%
Loss: Categorical Cross Entropy
Activation: ReLU, softmax
Training algorithm: walk forward
Accuracy: 73%
F1 Score: 0.49895
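The three feature families above (lagged returns, moving averages with windows 5 to 100, and cross-asset correlations) can be assembled as follows; the instrument count, series length, and random-walk prices are toy assumptions:

```python
import numpy as np

# Assembling the feature vector: 100 lagged returns, MA(X, f) for f = 5..100,
# and Pearson correlations with the other instruments. Data are simulated.
rng = np.random.default_rng(3)
prices = 100 + np.cumsum(rng.normal(0, 1, size=(300, 4)), axis=0)  # 4 instruments
rets = (prices[1:] - prices[:-1]) / prices[:-1]

target = rets[:, 0]
lags = target[-100:]                                   # last 100 lagged returns
mas = [target[-f:].mean() for f in range(5, 101)]      # MA(X, f), f = 5..100
corrs = [np.corrcoef(target, rets[:, j])[0, 1] for j in range(1, 4)]

features = np.concatenate([lags, mas, corrs])
print(features.shape)  # (199,): 100 lags + 96 MAs + 3 correlations
```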
DEEP LEARNING IN FINANCE: Heaton et al., 2016

FEATURE ENGINEERING | MODEL | RESULTS
Feature Vector: p_{ij}  |  i \in (0, 365), j \in (1, 500)
Ground Truth: p^{S&P 500}_{ij}  |  j \in (1, 500)

Trained an autoencoder
Used it to find the stocks closest to the market encoding
Used those, with a deep architecture, to track the S&P 500

Layers Structure | Hyper Parameters
AutoEncoder | DFP Policy; sparsity indicator: 0.1
[Network diagram: autoencoder over the 500 stocks]
Stationarity
Definition
Let x_t be a stochastic process and let F_x(x_{t_1+\tau}, …, x_{t_k+\tau}) represent the cumulative distribution function of the joint distribution of x_t at times t_1+\tau, …, t_k+\tau.
x_t is said to be strictly stationary if, for all k, for all \tau and for all t_1, …, t_k:
F_x(x_{t_1+\tau}, …, x_{t_k+\tau}) = F_x(x_{t_1}, …, x_{t_k})
In other words: shifting the time origin by an amount \tau has no effect on the joint distribution, which depends only on the intervals between t_1, …, t_k.
Stationarizing A Time Series
VALUE FORECASTING

Algorithm Structure:
1: Logarithm & Differentiation: y_i = y_{orig_i} is transformed to y_i = [ln(y_{orig_i})]', turning the nonstationary stochastic series into a stationary one.
2: Segmentation: the stationary series is cut into segments that serve as features.
3: Learning Model: regression over the segments yields a stationary stochastic prediction \hat{y}_i.
4: Integration & Exponentiation: y_{pred_i} = e^{\hat{y}_i} (after integrating) turns it back into the nonstationary final prediction.
5: Concatenation: the predictions are joined back into a series.
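The logarithm-and-differentiation transform and its inverse (integration and exponentiation) can be verified with a round trip; the prices are toy data, and a perfect "prediction" stands in for the model's output:

```python
import numpy as np

# Stationarize with log + diff, then invert with cumsum + exp.
rng = np.random.default_rng(5)
y_orig = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=50)))  # nonstationary

y = np.diff(np.log(y_orig))   # y_i = [ln(y_orig_i)]' : stationary log returns

y_hat = y                     # stand-in for the learned model's prediction

# Integration & exponentiation: recover price levels from log returns.
y_pred = y_orig[0] * np.exp(np.cumsum(y_hat))
print(np.allclose(y_pred, y_orig[1:]))  # True: the round trip is exact
```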
DEEP REINFORCEMENT LEARNING
4D Financial Graph Data
[Diagram: data tensor with Assets × Time × Features axes]
DEEP LEARNING IN PRODUCTION
ALGOTRADING PITFALLS
Slippage: the difference between where the computer signaled the entry and exit for a trade and where actual clients, with actual money, entered and exited.
Commission: a service charge assessed by a broker or investment advisor in return for providing investment advice and/or handling the purchase or sale of a security.
DEEP LEARNING IN FINANCE
Theoretical Motivations
Smoothness Assumption
Points that are close to each other are more likely to share a label.
- Allows us to easily interpolate between examples.
- The root of the curse of dimensionality.
Depth Assumption
Depth is a double-edged sword.
- Depth can add exponentially more predictive power to the network (compared to width), but only if done right.
Distributed Representations
Localist models are very inefficient whenever the data has componential structure.
- Distributed representations are useful for efficiently combining features to learn the underlying mechanism from which the labels are derived.
Kitchen Sink Approach
Possibly relevant features from many markets can be incredibly large in number.
DEEP LEARNING IN FINANCE
UNDERFITTING
Finance models often have poor results: solve this with smart risk management.
"It works if you remove this situation": save yourself some money; it just doesn't work.
Feature engineering vs. raw data: a good question.
DEEP LEARNING IN FINANCE
              Image       Finance
Data:         Unlimited   Limited
Overtraining: Hard        Easy
“You are basically risking millions on something no one in the world fully understands?”
"Why Should I Trust You?": Explaining the Predictions of Any Classifier – Ribeiro, et al 2015
DEEP LEARNING IN FINANCE
MARKOV PROPERTY
P(X_n = x_n | X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, …, X_0 = x_0) = P(X_n = x_n | X_{n-1} = x_{n-1})
DEEP LEARNING IN FINANCE
RNN PITFALLS
Example (long-range dependency):
Jane walked into the room. John walked in too. It was late at night. Jane said hi to _____.
On the difficulty of training recurrent neural networks: Pascanu, Mikolov & Bengio, 2013
Tricks:
- Initialize the weight matrices to the identity matrix
- Set the activation function to ReLU
- Norm-clip the exploding gradient:
\hat{g} \leftarrow \partial \varepsilon / \partial w
if ||\hat{g}|| \ge threshold then \hat{g} \leftarrow (threshold / ||\hat{g}||) \hat{g}
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units: Le, Jaitly & Hinton, 2015
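The norm-clipping rule above, written out; the threshold and gradient values are illustrative:

```python
import numpy as np

# Gradient norm clipping: if the gradient's norm exceeds the threshold,
# rescale it to have exactly that norm, preserving its direction.
def clip_gradient(g, threshold):
    norm = np.linalg.norm(g)
    if norm >= threshold:
        g = (threshold / norm) * g
    return g

g = np.array([3.0, 4.0])                         # norm 5.0
print(clip_gradient(g, 1.0))                     # rescaled to norm 1: [0.6, 0.8]
print(clip_gradient(np.array([0.1, 0.2]), 1.0))  # unchanged: norm below threshold
```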
DEEP LEARNING IN FINANCE
DEBUGGING DEEP NETS
Understanding the difficulty of training deep feedforward neural networks: Glorot & Bengio, 2010
Mean and standard deviation of the activation (output of the sigmoid) during learning, for 4 hidden layers.
The top hidden layer quickly saturates at 0 (slowing down all learning), but then slowly desaturates around epoch 100.
Other Uses of Deep Learning
Extracting the price impact of news: stress signal on an individual asset from text (Word2Vec) and fundamental data
Non-linear portfolio replication using fewer instruments
Reinforcement learning for continuous algo optimization
Space embedding for correlation between assets
DEEP LEARNING IN FINANCE
[System diagram: market data for GOOG, AAPL, MMM, … flows into the deep learning system, which sends orders back to the market. Trading / training runs on site; research runs in house, backed by data storage, a GPU cluster, and testing.]
Underfitting
Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Intuitively, underfitting occurs when the model or the algorithm does not fit the data well enough.
Underfitting is caused either by fitting too simple a model or by not training the model enough.
[Cartoon: "Calculate 1+1." "What is 1? I've only seen equations with 4."]
TRADING USING DEEP LEARNING
USING THE GPU
Model + Training Algorithm → Represented as tensor multiplications → Computed on NVIDIA GPUs
TRADING USING DEEP LEARNING
THE IMPORTANCE OF POWER
TRADING USING DEEP LEARNING
Performance: computational speed allows us to further train our model, yielding better accuracy in the finite amount of training time we have.
Memory Capacity: high memory capacity is crucial to our systems, allowing us to construct wider and deeper models and to integrate more data into them.
Interface Speed: the speed of the card interface defines how many model "shifts" we can perform in one trading cycle.
CPU vs GPU
GPU: compiled with nvcc; GDDR5 at 7.0 Gbps; matrix multiplication complexity O(n) [1]; epoch takes ~500 sec (avg)
CPU: compiled with gcc; DDR4 at 2400 MHz; matrix multiplication complexity O(n^3); epoch takes ~7500 sec (avg)
[1] - Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication -
Fatahalian, et al, 2004
POWER = TRAINING = ACCURACY = RETURNS
Deep Learning
Algorithmic Trading
Paper Review
GPU Based Trading
Recap
Questions?