
Uncertainty-Aware Lookahead Factor Models for Improved Quantitative Investing

Lakshay Chauhan, John Alberg, Zachary Lipton

Overview


• Portfolios are constructed by ranking stocks using a factor

• Factors are based on fundamentals such as revenue, income, and debt

• Standard quantitative investing uses current fundamentals

• Investment success depends on what a company does in the future

Can we use forecast future fundamentals instead?

Improve quantitative investing by forecasting fundamentals and measuring uncertainty

Our Contribution

• Show value of forecasting fundamentals

• Forecast future fundamentals using neural networks and measure uncertainty

• Use the uncertainty estimate to reduce risk, improving risk-adjusted return as measured by Sharpe ratio

• Portfolio return and risk are significantly improved


Motivation

Limitation

Factor models rely on current period fundamentals, but returns are driven by future fundamentals

Solution

Build factor models using forecast future fundamentals


Factors

• Dividend Yield

• Earnings Yield

• Book-to-Market

• Momentum

Pick top N ranked by a factor

Portfolio (Ranking Factor - Dividend Yield)

• Southern

• Hewlett Packard

• Pfizer

• Verizon

• US Bancorp

• Duke Energy

• Leggett & Platt

• Omnicom Group

Value Factors

$$\text{Value Factor} = \frac{\text{Fundamental Item (net income, EBIT)}}{\text{Stock Price}}$$

Value factors outperform market averages (SP500)
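As a rough illustration of the ranking step, the sketch below computes a value factor as a fundamental item divided by price and keeps the top N stocks. The function name `top_n_by_factor`, the tickers, and all numbers are made up for illustration; they are not data from the poster.

```python
import numpy as np

def top_n_by_factor(tickers, fundamental, price, n):
    """Rank stocks by fundamental / price and return the n highest-ranked tickers."""
    fundamental = np.asarray(fundamental, dtype=float)
    price = np.asarray(price, dtype=float)
    factor = fundamental / price                 # value factor: higher means cheaper
    order = np.argsort(-factor, kind="stable")   # indices sorted by descending factor
    return [tickers[i] for i in order[:n]]

# Hypothetical per-share EBIT and prices (illustrative only).
tickers = ["AAA", "BBB", "CCC", "DDD"]
ebit_per_share = [5.0, 2.0, 8.0, 1.0]
price = [50.0, 40.0, 40.0, 100.0]

portfolio = top_n_by_factor(tickers, ebit_per_share, price, n=2)
print(portfolio)
```

Here CCC (factor 0.20) and AAA (factor 0.10) rank highest, so they form the two-stock portfolio.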

Motivation

Clairvoyant Factor Model

• Imagine we had access to future fundamentals

• Simulate performance with future fundamentals (2000-2019)

• Clairvoyant fundamentals offer substantial advantage

• This motivates us to forecast future fundamentals

Problem Setup

• Use EBIT as the fundamental to create the value factor

• Forecast EBIT 12 months into the future

Data Background

• US stocks traded on NYSE, NASDAQ, and AMEX from 1970 to 2019 (~12,000 stocks), market cap > $100M

• Time series of 5 years with step size of 12 months

Example input windows (IBM)

| Input Series | Target |
| Jan 2000, Jan 2001, Jan 2002, Jan 2003, Jan 2004 | Jan 2005 |
| Feb 2000, Feb 2001, Feb 2002, Feb 2003, Feb 2004 | Feb 2005 |
| Mar 2000, Mar 2001, Mar 2002, Mar 2003, Mar 2004 | Mar 2005 |

• Feature Examples

| Fundamental | Momentum | Auxiliary |
| Revenue | 1-month relative momentum | Short Interest |
| Cost of Goods Sold | 3-month relative momentum | Industry Group |
| Earnings Before Interest and Taxes (EBIT) | 6-month relative momentum | Company size category |
| Current Debt | 9-month relative momentum | |
| Long Term Debt | | |
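The windowing described under Data Background (five annual steps as input, the value 12 months after the last step as target) could be built roughly as follows. This is a sketch under the assumption that each company has a monthly array of features; `make_windows` and its parameter names are hypothetical.

```python
import numpy as np

def make_windows(series, n_steps=5, step=12, horizon=12):
    """Slice a monthly series of shape (n_months, n_features) into training samples.

    Each input takes n_steps observations spaced `step` months apart
    (e.g. Jan 2000, Jan 2001, ..., Jan 2004); the target is the observation
    `horizon` months after the last input (e.g. Jan 2005).
    """
    series = np.asarray(series, dtype=float)
    span = (n_steps - 1) * step                     # months covered by one input window
    X, y = [], []
    for start in range(len(series) - span - horizon):
        idx = start + step * np.arange(n_steps)     # month indices of the inputs
        X.append(series[idx])
        y.append(series[idx[-1] + horizon])         # 12 months after the last input
    return np.array(X), np.array(y)

# Six years of monthly data with one feature whose value is the month index.
monthly = np.arange(72, dtype=float).reshape(72, 1)
X, y = make_windows(monthly)
print(X.shape, y.shape)   # (12, 5, 1) (12, 1)
print(X[0, :, 0], y[0, 0])
```

With six years of monthly data, each calendar month of the first year starts one window, which matches the Jan/Feb/Mar rows in the example.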

Forecasting Model

• In-sample validation set is used for genetic algorithm based hyper-parameter tuning

• Multi-task learning to predict all fundamental features instead of just EBIT

  • Increases training signal

  • Improves generalization

• Use Max Norm and Dropout for regularization
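For readers unfamiliar with the two regularizers, here is a minimal NumPy sketch of a max-norm constraint and inverted dropout. The cap `c`, the dropout `rate`, and the choice to constrain columns of the weight matrix are assumptions for illustration; the poster does not state the values or layout used.

```python
import numpy as np

def max_norm(W, c=3.0):
    """Rescale each column of W so that its L2 norm is at most c."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return W * scale

def dropout(h, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = rng.random(h.shape) >= rate
    return h * keep / (1.0 - rate)

W = np.array([[3.0, 0.3],
              [4.0, 0.4]])
W_clipped = max_norm(W, c=1.0)      # first column (norm 5) is shrunk to norm 1
print(W_clipped)

rng = np.random.default_rng(0)
h = np.ones((2, 4))
print(dropout(h, rate=0.5, rng=rng))  # surviving units are scaled to 2.0
```

In training, the max-norm step would typically be applied to the weights after each gradient update, while dropout is applied to activations in the forward pass.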

[Figure: network diagram. Input Layer → Hidden Layer → Hidden Layer → Output Layer. Multi-task learning: predict all fundamentals.]

[Figure: data timeline. 1970-2000: training set with a random 70-30 train-validation split. 2000-2019: out-of-sample test set.]

Uncertainty Quantification

• Financial data is heteroskedastic, i.e. noise is data dependent

• Some companies will have more uncertainty in their earnings than others due to size, industry, etc.

• Jointly model mean and variance by splitting final layer

  • First half predicts means of targets

  • Second half predicts variance of the output values, or aleatoric uncertainty

[Figure: network diagram. Input Layer → Hidden Layer → Hidden Layer → Output Layer, with the output layer split into Mean and Variance.]

[Loss function annotations: prediction accuracy; minimize uncertainty (narrow bounds); penalize over-confident model.]

Epistemic Uncertainty = Variance in outputs across Monte Carlo draws of dropout mask

Total Uncertainty = Aleatoric Uncertainty + Epistemic Uncertainty
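The loss equation itself is not recoverable from the poster, so the sketch below uses a Gaussian negative log-likelihood, a common choice for jointly learning a mean and a variance; treat it as an assumption rather than the authors' exact loss. In this form the squared error divided by the variance rewards accuracy and penalizes over-confidence, while the log-variance term discourages wide bounds. The second function combines aleatoric and epistemic uncertainty as defined above; all names are hypothetical.

```python
import numpy as np

def gaussian_nll(y, mean, log_var):
    """Gaussian negative log-likelihood per sample (constant term dropped).

    The network is assumed to output log-variance so the variance stays positive.
    """
    return 0.5 * (np.exp(-log_var) * (y - mean) ** 2 + log_var)

def total_uncertainty(means, variances):
    """Combine uncertainties from repeated forward passes with random dropout masks.

    means, variances: arrays of shape (n_draws, n_samples).
    """
    aleatoric = variances.mean(axis=0)   # average predicted noise variance
    epistemic = means.var(axis=0)        # spread of predicted means across draws
    return aleatoric + epistemic

# Two Monte Carlo draws for two companies (made-up numbers).
means = np.array([[1.0, 2.0],
                  [3.0, 2.0]])
variances = np.array([[0.5, 1.0],
                      [1.5, 1.0]])
print(total_uncertainty(means, variances))   # [2. 1.]
print(gaussian_nll(2.0, 0.0, 0.0))           # 2.0
```

The first company has the same average aleatoric variance as the second but its mean prediction moves between draws, so its total uncertainty is higher.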

Constructing Factor Models

Definitions

EV - Enterprise Value

QFM - Quantitative Factor Model

LFM - Lookahead Factor Model

LFM UQ - Uncertainty Quantified Model

Factor Models

• QFM

• LFM Auto Reg

• LFM Linear

• LFM MLP

• LFM LSTM

• LFM UQ-MLP

• LFM UQ-LSTM

• Companies with higher variance are riskier

• Higher variance = less certain about forecasts

• Therefore, scale factor in inverse proportion to variance
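One way to read the scaling rule is sketched below: the lookahead factor is predicted EBIT over EV, and the uncertainty-quantified variant divides the forecast by its total variance first. The exact functional form, the `eps` guard, and the function names are assumptions; the poster only states that the factor is scaled in inverse proportion to variance.

```python
import numpy as np

def lookahead_factor(pred_ebit, ev):
    """LFM: forecast EBIT divided by enterprise value."""
    return np.asarray(pred_ebit, dtype=float) / np.asarray(ev, dtype=float)

def uq_lookahead_factor(pred_ebit, total_var, ev, eps=1e-8):
    """LFM UQ (assumed form): forecast EBIT scaled by 1 / variance, then divided by EV."""
    pred_ebit = np.asarray(pred_ebit, dtype=float)
    total_var = np.asarray(total_var, dtype=float)
    ev = np.asarray(ev, dtype=float)
    return pred_ebit / ((total_var + eps) * ev)   # eps guards against zero variance

# Two companies with identical forecasts but different uncertainty (made-up numbers).
pred_ebit = [10.0, 10.0]
ev = [100.0, 100.0]
total_var = [1.0, 4.0]

lfm = lookahead_factor(pred_ebit, ev)
lfm_uq = uq_lookahead_factor(pred_ebit, total_var, ev)
print(lfm)      # both companies rank equally
print(lfm_uq)   # the less certain company now ranks lower
```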

Portfolio Simulation

Simulated returns of a quantitative strategy vs. the real returns generated from live trading of the same strategy

• Industry grade, high fidelity investment portfolio simulator

• Portfolios formed of top 50 stocks ranked by factor

• Rebalance portfolio monthly

• Simulate 50 years of performance, many economic cycles

• Point-in-time data, no survivorship or look-ahead bias

• Include transaction costs and price slippage to reflect realistic trading

• Measure performance by Compound Annualized Return (CAR) and Sharpe Ratio
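Both performance measures can be computed from a series of monthly portfolio returns. The sketch below assumes monthly compounding for CAR and the usual square-root-of-12 annualization for the Sharpe ratio; the simulator's exact conventions are not stated on the poster, and the function names are hypothetical.

```python
import numpy as np

def compound_annual_return(monthly_returns):
    """Geometric average return per year from simple monthly returns."""
    r = np.asarray(monthly_returns, dtype=float)
    growth = np.prod(1.0 + r)                 # total growth of one unit invested
    return growth ** (12.0 / len(r)) - 1.0

def sharpe_ratio(monthly_returns, monthly_risk_free=0.0):
    """Annualized mean excess return divided by annualized volatility."""
    excess = np.asarray(monthly_returns, dtype=float) - monthly_risk_free
    return np.sqrt(12.0) * excess.mean() / excess.std(ddof=1)

steady = [0.01] * 12                 # 1% every month for a year
choppy = [0.02, 0.0, 0.02, 0.0]      # same mean, but volatile

car = compound_annual_return(steady)
sr = sharpe_ratio(choppy)
print(round(car, 4))   # 0.1268
print(round(sr, 4))    # 3.0
```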


... outperform standard factor models. Motivated by this study, we first establish a correspondence between the accuracy of DNN forecast and portfolio returns. While training the LSTM model, we generate forecasts for the out-of-sample period after every epoch. As the training progresses the model gets better, measured by decreasing out-of-sample MSE. We then use the forecasts to simulate portfolio performance over the same time period. Figure 4 shows the relationship between increasing model accuracy and improving portfolio returns. This experiment validates our hypothesis that returns are strongly dependent on the accuracy of the forecasting model.

Figure 4. Correspondence between DNN model accuracy and portfolio returns. Bottom-rightmost point is evaluated after the first epoch. As the training progresses, points in the graph move towards the upper left corner. Portfolio returns increase as the out-of-sample MSE decreases.

Figure 5. MSE over out-of-sample time period for MLP (red) and QFM or Naive predictor (blue)

As a first step in evaluating the forecast produced by the neural networks, we compare the MSE of the predicted fundamentals on out-of-sample data with a naive predictor where the predicted fundamentals at time t are assumed to be the same as the fundamentals at t - 12. In nearly all the months, however turbulent the market, neural networks outperform the naive predictor (Figure 5).
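The naive baseline in this comparison is simple to state in code: the forecast for month t is the observed value at month t - 12. The sketch below uses a synthetic series and hypothetical function names; it only illustrates how the baseline MSE is formed.

```python
import numpy as np

def mse(pred, actual):
    """Mean squared error between two equally shaped arrays."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean((pred - actual) ** 2))

def naive_forecast(series, lag=12):
    """Forecast for month t is the value at month t - lag.

    The result lines up with the actual values series[lag:].
    """
    series = np.asarray(series, dtype=float)
    return series[:-lag]

# Synthetic fundamental that grows by 1 every month for two years.
series = np.arange(24, dtype=float)
actual = series[12:]
naive_mse = mse(naive_forecast(series), actual)
print(naive_mse)   # 144.0, since every naive forecast is off by 12
```

A model's forecasts for the same months would be scored with the same `mse` call and compared against `naive_mse`.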

Table 2. Out-of-sample performance for the 2000-2019 time period. All factor models use EBIT/EV. QFM uses current EBIT while our proposed LFMs use predicted future EBIT.

| Strategy | MSE | CAR | Sharpe Ratio |
| S&P 500 | n/a | 6.05% | 0.32 |
| QFM | 0.65 | 14.0% | 0.52 |
| LFM Auto Reg | 0.58 | 14.2% | 0.56 |
| LFM Linear | 0.52 | 15.5% | 0.64 |
| LFM MLP | 0.48 | 16.1% | 0.68 |
| LFM LSTM | 0.48 | 16.2% | 0.68 |
| LFM UQ-LSTM | 0.48 | 17.7% | 0.84 |
| LFM UQ-MLP | 0.47 | 17.3% | 0.83 |

Table 2 demonstrates a clear advantage of using look-ahead factor models or LFMs over standard QFM. MLP and LSTM LFMs achieve higher model accuracy than linear or auto-regression models and thus yield better portfolio performance. Figure 6 shows the cumulative return of all portfolios across the out-of-sample period.

Investors not only care about return of a portfolio but also the risk undertaken as measured by volatility. Risk adjusted return or Sharpe ratio is meaningfully higher for LFM UQ models which reduce the risk by scaling the EBIT forecast in inverse proportion to the total variance.

Table 3. Pairwise t-statistic for Sharpe ratio. The models are organized in increasing order of Sharpe ratio values. t-statistic for LFM UQ models are marked in bold if they are significant with a significance level of 0.05.

| | Auto Reg | Linear | MLP | LSTM | UQ-LSTM | UQ-MLP |
| QFM | 0.76 | 2.52 | 2.93 | 2.96 | 5.57 | 6.01 |
| Auto Reg | | 1.89 | 2.31 | 2.36 | 5.10 | 5.57 |
| Linear | | | 0.36 | 0.46 | 3.12 | 3.66 |
| MLP | | | | 0.10 | 2.82 | 3.39 |
| LSTM | | | | | 2.66 | 3.22 |

We provide pairwise t-statistic values for Sharpe ratio in Table 3 where improvement in Sharpe ratio for LFM UQ models is statistically significant. As discussed in Section 5, we run 300 simulations with varying initial start state for each model. Additionally, we randomly restrict the universe of stocks to 70% of the total universe making the significance test more robust to different portfolio requirements. We aggregate the monthly returns of these 300 simulations by taking the mean and perform bootstrap resampling on the monthly returns to generate the t-statistic values for Sharpe ratio shown in Table 3. The last two columns corresponding to LFM UQ models provide strong evidence that the ...

Results


Pairwise t-statistic for Sharpe ratio with α=0.05

Out-of-Sample Performance 2000-2019

Cumulative return of different strategies from 2000 to 2019
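The pairwise significance test on Sharpe ratios relies on bootstrap resampling of monthly returns. The exact procedure is not given on the poster, so the following is only one plausible sketch: months are resampled with replacement, jointly for both strategies, and the t-statistic is the mean bootstrap difference in Sharpe ratio divided by its standard deviation. The function names, `n_boot`, and the synthetic returns are all assumptions.

```python
import numpy as np

def sharpe(r):
    """Annualized Sharpe ratio of simple monthly returns (risk-free rate taken as 0)."""
    return np.sqrt(12.0) * r.mean() / r.std(ddof=1)

def bootstrap_sharpe_t(returns_a, returns_b, n_boot=2000, seed=0):
    """t-statistic for Sharpe(b) - Sharpe(a) from a paired bootstrap over months."""
    a = np.asarray(returns_a, dtype=float)
    b = np.asarray(returns_b, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(a)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample months with replacement
        diffs[i] = sharpe(b[idx]) - sharpe(a[idx])
    return diffs.mean() / diffs.std(ddof=1)

# Synthetic 20 years of monthly returns; strategy b earns 1% more each month.
data_rng = np.random.default_rng(1)
returns_a = data_rng.normal(0.005, 0.05, size=240)
returns_b = returns_a + 0.01

t_stat = bootstrap_sharpe_t(returns_a, returns_b)
print(t_stat > 0)   # True: b has the higher Sharpe ratio in every resample
```

A t-statistic above roughly 1.96 would correspond to significance at the 0.05 level under a normal approximation.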

Conclusion

• Forecasting fundamentals is valuable in quantitative investing

• Use DNN to forecast future fundamentals and estimate uncertainty

• Improve return and Sharpe ratio

Thank You

[email protected]@euclidean.com

[email protected]
