uncertainty-aware lookahead factor models for improved ...no... · data background •us stocks...
TRANSCRIPT
Uncertainty-Aware Lookahead Factor Models for Improved
Quantitative Investing
Lakshay Chauhan, John Alberg, Zachary Lipton
Overview
Quantitative Investing
• Portfolios are constructed by ranking stocks using a factor
• factors based on fundamentals such as Revenue, Income, Debt
• Standard quantitative investing uses current fundamentals
• Investment success what a company does in the future
Can we use forecast future fundamentals then?
Improve quantitative investing by forecasting fundamentals and measuring uncertainty
Our Contribution
• Show value of forecasting fundamentals
• Forecast future fundamentals using neural
networks and measure uncertainty
• Use uncertainty estimate to reduce risk as
measured by Sharpe Ratio
• Portfolio return and risk are significantly
improved
Overview
Improve quantitative investing by forecasting fundamentals and measuring uncertainty
Motivation
Limitation
Factor models rely on current period fundamentals, but returns are driven by future fundamentals
Solution
Build factor models using forecast future fundamentals
Quantitative Investing
Ranking Factor - Dividend YieldSouthern
Hewlett Packard
Pfizer
Verizon
US Bancorp
Duke Energy
Leggett & Platt
Omnicom Group
FactorsDividend Yield
Earnings Yield
Book-to-Market
MomentumPick top N ranked
by a factor
Portfolio
Value Factors
𝐹𝑢𝑛𝑑𝑎𝑚𝑒𝑛𝑡𝑎𝑙 𝐼𝑡𝑒𝑚 (𝑛𝑒𝑡 𝑖𝑛𝑐𝑜𝑚𝑒, 𝐸𝐵𝐼𝑇)𝑆𝑡𝑜𝑐𝑘 𝑃𝑟𝑖𝑐𝑒
Value factors outperform market averages (SP500)
Motivation
Clairvoyant Factor Model
• Imagine we had access to future fundamentals
• Simulate performance with future fundamentals (2000-2019)
• Clairvoyant fundamentals offer substantial advantage• This motivates us to forecast future fundamentals
Problem Set up
• Use EBIT as the fundamental to create value-factor• Forecast EBIT 12 months into the future
Data Background
• US stocks from 1970-2019 traded on NYSE, NASDAQ and AMEX (~12,000), Market Cap > $100M• Time series of 5 years with step size of 12 months
Jan, 2000 Jan, 2001 Jan, 2002 Jan, 2003 Jan, 2004 Jan, 2005
Feb, 2000 Feb, 2001 Feb, 2002 Feb, 2003 Feb, 2004 Feb, 2005
Mar, 2000 Mar, 2001 Mar, 2002 Mar, 2003 Mar, 2004 Mar, 2005
Input Series Target
Fundamental Momentum Auxiliary
Revenue 1-month relative momentum Short Interest
Cost of Goods Sold 3-month relative momentum Industry Group
Earnings Before Interest and Taxes (EBIT) 6-month relative momentum Company size category
Current Debt 9-month relative momentum
Long Term Debt
• Feature Examples
IBM
IBM
IBM
Forecasting Model
• In-sample validation set is used for genetic algorithm based hyper-parameter tuning
• Multi-task learning to predict all fundamental features instead of just EBIT• Increases training signal
• Improves generalization
• Use Max Norm and Dropout for regularization
Input Layer � �¹� Hidden Layer � �¹² Hidden Layer � �¹� Output Layer � ��
Multi-task LearningPredict all fundamentals
Random 70-30 train-validation split Out of Sample Test Set
Training Set
1970 2000 2019
Uncertainty Quantification• Financial data is heteroskedastic i.e. noise is data dependent
• Some companies will have more uncertainty in their earnings than others due to size, industry, etc.
• Jointly model mean and variance by splitting final layer• First half predicts means of targets• Second half predicts variance of the output values or aleatoric uncertainty
Input Layer � �¹� Hidden Layer � �¹² Hidden Layer � �¹� Output Layer � �¹�
VarianceMean
minimize uncertainty (narrow bounds)
prediction accuracy
Epistemic Uncertainty = Variance in outputs across Monte Carlo draws of dropout mask
Total Uncertainty = Aleatoric Uncertainty + Epistemic Uncertainty
penalize over-confident model
Constructing Factor Models
Definitions
EV - Enterprise Value
QFM - Quantitative Factor Model
LFM - Lookahead Factor Model
LFM UQ – Uncertainty Quantified Model
QFM
LFM Auto Reg
LFM Linear
LFM MLP
LFM LSTM
LFM UQ-MLP
LFM UQ-LSTM
Factor Models
Companies with higher variance are riskier• Higher variance = less certain about forecasts• Therefore, scale factor in inverse proportion to
variance
Portfolio Simulation
Simulated returns of a quantitative strategy vs. the real returns generated from live trading of the same strategy
• Industry grade, high fidelity investment portfolio simulator
• Portfolios formed of top 50 stocks ranked by factor
• Rebalance portfolio monthly
• Simulate 50 years of performance, many economic cycles
• Point-in-time data, no survivorship or look-ahead bias
• Include transactions cost, price slippage to reflect realistic trading
• Measure performance by Compound Annualized Return (CAR) and Sharpe Ratio
330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384
Title Suppressed Due to Excessive Size
outperform standard factor models. Motivated by this study,we first establish a correspondence between the accuracyof DNN forecast and portfolio returns. While training theLSTM model, we generate forecasts for the out-of-sampleperiod after every epoch. As the training progresses themodel gets better, measured by decreasing out-of-sampleMSE. We then use the forecasts to simulate portfolio perfor-mance over the same time period. Figure 4 shows the rela-tionship between increasing model accuracy and improvingportfolio returns. This experiment validates our hypothesisthat returns are strongly dependent on the accuracy of theforecasting model.
Figure 4. Correspondence between DNN model accuracy and port-folio returns. Bottom-rightmost point is evaluated after the firstepoch. As the training progresses, points in the graph move to-wards the upper left corner. Portfolio returns increase as the out-of-sample MSE decreases.
Figure 5. MSE over out-of-sample time period for MLP (red) andQFM or Naive predictor (blue)
As a first step in evaluating the forecast produced by theneural networks, we compare the MSE of the predictedfundamentals on out-of-sample data with a naive predictorwhere the predicted fundamentals at time t are assumed tobe the same as the fundamentals at t - 12. In nearly all
the months, however turbulent the market, neural networksoutperform the naive predictor (Figure 5).
Table 2. Out-of-sample performance for the 2000-2019 time period.All factor models use EBIT/EV. QFM uses current EBIT while ourproposed LFMs use predicted future EBIT.
Strategy MSE CAR Sharpe Ratio
S&P 500 n/a 6.05% 0.32QFM 0.65 14.0% 0.52LFM Auto Reg 0.58 14.2% 0.56LFM Linear 0.52 15.5% 0.64LFM MLP 0.48 16.1% 0.68LFM LSTM 0.48 16.2% 0.68LFM UQ-LSTM 0.48 17.7% 0.84LFM UQ-MLP 0.47 17.3% 0.83
Table 2 demonstrates a clear advantage of using look-aheadfactor models or LFMs over standard QFM. MLP and LSTMLFMs achieve higher model accuracy than linear or auto-regression models and thus yield better portfolio perfor-mance. Figure 6 shows the cumulative return of all portfo-lios across the out-of-sample period.
Investors not only care about return of a portfolio but alsothe risk undertaken as measured by volatility. Risk adjustedreturn or Sharpe ratio is meaningfully higher for LFM UQmodels which reduce the risk by scaling the EBIT forecastin inverse proportion to the total variance.
Table 3. Pairwise t-statistic for Sharpe ratio. The models are or-ganized in increasing order of Sharpe ratio values. t-statistic forLFM UQ models are marked in bold if they are significant with asignificance level of 0.05.
Auto-Reg Linear MLP LSTM UQ-LSTM UQ-MLP
QFM 0.76 2.52 2.93 2.96 5.57 6.01
Auto Reg 1.89 2.31 2.36 5.10 5.57
Linear 0.36 0.46 3.12 3.66
MLP 0.10 2.82 3.39
LSTM 2.66 3.22
We provide pairwise t-statistic values for Sharpe ratio intable 3 where improvement in Sharpe ratio for LFM UQmodels is statistically significant. As discussed in section5, we run 300 simulations with varying initial start state foreach model. Additionally, we randomly restrict the universeof stocks to 70% of the total universe making the signifi-cance test more robust to different portfolio requirements.We aggregate the monthly returns of these 300 simulationsby taking the mean and perform bootstrap resampling on themonthly returns to generate the t-statistic values for Sharperatio shown in table 3. The last two columns correspond-ing to LFM UQ models provide strong evidence that the
Results
330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384
Title Suppressed Due to Excessive Size
outperform standard factor models. Motivated by this study,we first establish a correspondence between the accuracyof DNN forecast and portfolio returns. While training theLSTM model, we generate forecasts for the out-of-sampleperiod after every epoch. As the training progresses themodel gets better, measured by decreasing out-of-sampleMSE. We then use the forecasts to simulate portfolio perfor-mance over the same time period. Figure 4 shows the rela-tionship between increasing model accuracy and improvingportfolio returns. This experiment validates our hypothesisthat returns are strongly dependent on the accuracy of theforecasting model.
Figure 4. Correspondence between DNN model accuracy and port-folio returns. Bottom-rightmost point is evaluated after the firstepoch. As the training progresses, points in the graph move to-wards the upper left corner. Portfolio returns increase as the out-of-sample MSE decreases.
Figure 5. MSE over out-of-sample time period for MLP (red) andQFM or Naive predictor (blue)
As a first step in evaluating the forecast produced by theneural networks, we compare the MSE of the predictedfundamentals on out-of-sample data with a naive predictorwhere the predicted fundamentals at time t are assumed tobe the same as the fundamentals at t - 12. In nearly all
the months, however turbulent the market, neural networksoutperform the naive predictor (Figure 5).
Table 2. Out-of-sample performance for the 2000-2019 time period.All factor models use EBIT/EV. QFM uses current EBIT while ourproposed LFMs use predicted future EBIT.
Strategy MSE CAR Sharpe Ratio
S&P 500 n/a 6.05% 0.32QFM 0.65 14.0% 0.52LFM Auto Reg 0.58 14.2% 0.56LFM Linear 0.52 15.5% 0.64LFM MLP 0.48 16.1% 0.68LFM LSTM 0.48 16.2% 0.68LFM UQ-LSTM 0.48 17.7% 0.84LFM UQ-MLP 0.47 17.3% 0.83
Table 2 demonstrates a clear advantage of using look-aheadfactor models or LFMs over standard QFM. MLP and LSTMLFMs achieve higher model accuracy than linear or auto-regression models and thus yield better portfolio perfor-mance. Figure 6 shows the cumulative return of all portfo-lios across the out-of-sample period.
Investors not only care about return of a portfolio but alsothe risk undertaken as measured by volatility. Risk adjustedreturn or Sharpe ratio is meaningfully higher for LFM UQmodels which reduce the risk by scaling the EBIT forecastin inverse proportion to the total variance.
Table 3. Pairwise t-statistic for Sharpe ratio. The models are or-ganized in increasing order of Sharpe ratio values. t-statistic forLFM UQ models are marked in bold if they are significant with asignificance level of 0.05.
Auto-Reg Linear MLP LSTM UQ-LSTM UQ-MLP
QFM 0.76 2.52 2.93 2.96 5.57 6.01
Auto Reg 1.89 2.31 2.36 5.10 5.57
Linear 0.36 0.46 3.12 3.66
MLP 0.10 2.82 3.39
LSTM 2.66 3.22
We provide pairwise t-statistic values for Sharpe ratio intable 3 where improvement in Sharpe ratio for LFM UQmodels is statistically significant. As discussed in section5, we run 300 simulations with varying initial start state foreach model. Additionally, we randomly restrict the universeof stocks to 70% of the total universe making the signifi-cance test more robust to different portfolio requirements.We aggregate the monthly returns of these 300 simulationsby taking the mean and perform bootstrap resampling on themonthly returns to generate the t-statistic values for Sharperatio shown in table 3. The last two columns correspond-ing to LFM UQ models provide strong evidence that the
Pairwise t-statistic for Sharpe ratio with ⍺=0.05Out-of-Sample Performance 2000-2019
Cumulative return of different strategies from 2000 to 2019
Results
Conclusion
• Forecasting fundamentals is valuable in quantitative investing
• Use DNN to forecast future fundamentals and estimate uncertainty
• Improve return and Sharpe ratio