comparison of high-frequency covolatility forecasts for high … · 2018-08-05 · comparison of...
TRANSCRIPT
Comparison of high-frequency covolatilityforecasts for high-dimensional portfolio
allocation
with N. Hautsch
My visit at the University of Pennsylvania was sponsored by the Austrian Ministry forDigital and Economic Affairs and by the Slovak Ministry of education, science, research
and sport.
Large-scale portfolio allocation based on H-F data
Goal:construction of global minimum variance portfolios for S&P 500.Crucial input:forecasts of high-dimensional covariance matrices.Hautsch, Kyj and Malec (JAE, 2015):→ H-F forecasts beat L-F in terms of portfolio volatility,→ H-F methods lead to higher portfolio turnover,→ H-F methods are worth up to 199 bp for risk-averse investor,→ H-F methods perform better for moderate-size portfolios
(e.g., 30-100 most traded stocks).Next move:would the efficient H-F approach further increase investor’s utility?How to solve practical issues with H-F data . . . as the portfolio grows:→ different sample sizes,→ asynchronous observations. . . loss of information,→ ill-conditioned/in-definite covariance matrix estimators,→ lack of computational memory,→ computation takes too long for short-term portfolio management.
2 Motivation
Content
FrameworkGlobal minimum variance portfolioPortfolio characteristicsEconomic criterion
Local method of moments estimator (LMME)Spectral approach and efficiencyClustering and regularizationQuantifying information loss
Other high-frequency estimatorsRegularization and blocking estimator (RnB)Cholesky factorization estimator (CHOLCOV)
Empirical comparison resultsLOBSTER dataPortfolio charateristicsEconomic criterion
3
Global minimum variance portfolioFinding optimal weights at time t for portfolio of m stocks at time t + 1
ω∗t+1 = arg minω∈Rm
ω>Σt+1ω, s.t. ω>ι = 1,
we obtain the solution
ω∗t+1 =Ξt+1ι
ι>Ξt+1ι, with Ξ = Σ−1.
Criterion for cov-mat forecasts, Patton and Sheppard (2008):pluggin Σt+1|t 6= Σt+1, we get ωt+1|t and conditional portfolio variance
ω>t+1|t Σt+1ωt+1|t > ω∗Tt+1Σt+1ω∗t+1.
→ Focus on day-ahead prediction of cov-mat,. . . we do not predict conditional mean,
→ evaluation depends on choice of ex-post measure for Σt+1,. . . sample portfolio variance is biased criterion, see Voev (2009),
→ comparison based on. . . conditional portfolio variance, turnover and shortselling,. . . performance fees a risk-averse investor would be willing to pay.
4 Framework
Portfolio characteristics
→ Conditional portfolio variance
ω>t+1|tMRKt+1ωt+1|t ,
→ portfolio turnover according to de Pooter, Martens and Dijk (2008),
m∑l=1
∣∣∣∣∣ωlt+1|t − ωl
t|t−11 + r l
t
1 + prt
∣∣∣∣∣ ,→ amount of short-selling
m∑l=1
ωlt+1|tI
¶ωl
t+1|t < 0©.
5 Framework
Example: Assume investor invests 10$ into 2 stocks:Why is the turnover from day t to day t + 1 computed as
m∑l=1
∣∣∣∣∣ωlt+1|t − ωl
t|t−11 + r l
t
1 + prt
∣∣∣∣∣?→ on day t at 9:30 am, investor has/knows
- 10$ capital,- optimal weights ωt|t−1 = (0.6,0.4) for day t obtained on day
t − 1 (after 4 pm),- opening prices of the 2 stocks (1.5$,0.5$).
→ so investor has 4 shares of stock a and 8 shares of stock b.→ on day t after 4 pm, investor knows/has
- closing prices of the 2 stocks are (2$,0.1$).- rt = ( 1
3 ,−45 ) and prt = − 3
25 .- 8.8$ of capital in stocks.
→ on t + 1 just before rebalancing according to ωt+1|t (9:30 amsharp)- prices of the 2 stocks are still the same (2$,0.1$).- but the weights of stocks a,b are not (0.6,0.4) but
( 4·28.8 ,
8·0.18.8 ) = (0.6,0.4)� (
1+ 13
1− 325,
1− 45
1− 325
).6 Framework
Economic utility criterion
We assess economic significance of a lower (conditional) portfoliovariance by the approach of Fleming, Kirby and Ostdiek (JoF, 2001).→ If investor has quadratic utility
U(prt+1) = 1 + prt+1 −γ
2(1 + γ)
(1 + prt+1
)2,
→ ∆γ1 is defined by the equality
T−1∑t=1
E[U(prIt+1)|Ft
]=
T−1∑t=1
E[U(prIIt+1 −∆γ)|Ft
].
→ Interpret ∆γ as fee the investor is willing to pay in order to switchfrom allocation implied by ΣI
t+1|t to ΣIIt+1|t .
→ assuming E [rt+1|Ft ] = const., ∆γ > 0 iff pv2,I > pv2,II .
1γ = 1, 10 as in Fleming et al. (2003).7 Framework
Spectral approach and efficiencyAssume a continuous latent d-dimensional true log-price process:
y∗t = y∗0 +
∫ t
0Σ1/2(s)dbs, t ∈ [0,1] .
We are estimating integrated covolatility
ICov =
∫ 1
0Σ(t)dt ,
Observations are non-synchronous and poluted by iid noise:
y (l)ti = y∗(l)
ti + ε(l)i , ε
(l)i ∼ N(0, η2
l ), i = 1, . . . ,nl , l = 1, . . . ,m.
Efficient LMM estimator of Bibinger, Hautsch, Malec, Reiss (2014):→ partition the trading day into K intervals of length h = 1/K ,→ assume Σk (t) = Σk and define Sjk ∼ Nm(0,Σk + bias),→ compute S(l)
jk =√
2/h∑nl
i=1 ∆y (l)i sin
Äjπh−1(t (l)
i − kh)ä.
LMME =B∑
k=1
hJ∑
j=1
Wjkvec(S⊗2jk − ˆbias).
8 Local method of moments estimator (LMME)
Clustering and regularization→ Computational memory and speed issues2
. . . Wjk = I−1k Ijk ∈ Rm2×m2
require J × K × 123 GB for S&P 500,. . . Wjk can not be decomposed and estimated piecewise,. . . if m > 50 LMME takes days to obtain (if feasible).
→ Application to portfolio allocation related issues. . . Σ has to be positive-definite and well-conditioned,. . . adjust LMME to be feasible and well-conditioned.
→ Divide and conquer by Hsieh et al. (2012) and Tan et al. (2015):. . . Hierarchical correlation clustering & regularization
→ GLASSO estimator for the Σ−1 denoted Ξ:
Ξλ = argmin︸ ︷︷ ︸Ξ>0
ÄTr(ΞR)− log det(Ξ) + ‖L ◦ (Ξ− diag(Ξ))‖1
ä,
→ SEC estimator by Cui, Leng and Yu (CSDA, 2016):
Rλ = argmin︸ ︷︷ ︸R−εE>0
12
∥∥∥R − R∥∥∥2
F+ ‖L ◦ R‖1 ,
Lij = λ > 0 should assets i and j be in the same cluster or λ→∞.2LMME impl. in Matlab is feasible for m ≤ 20 on Mac, m ≤ 150 on VSC 256 GB.
9 Local method of moments estimator (LMME)
GLASSO
10 Local method of moments estimator (LMME)
SEC
11 Local method of moments estimator (LMME)
GLASSO - average absolute correlation
12 Local method of moments estimator (LMME)
SEC - average absolute correlation
13 Local method of moments estimator (LMME)
GLASSO - conditonal numbers
14 Local method of moments estimator (LMME)
SEC - conditonal numbers
15 Local method of moments estimator (LMME)
Divide and conquer step-by-step
On every intra-day interval 1, . . .K :→ Estimate PILOT spectral estimator, Bibinger and Reiss (2014),→ use the scaled estimator as adjacency matrix 1− |R| for HCC,
. . . Step 0, each asset is a cluster, . . . Steps 1, . . . ,m − 1 mergetwo nearest (linkage) clusters ,. . . search dendrogram top-down and find “height” λ, such thateach cluster c = 1, . . . ,C has ≤ 50 assets,
→ GLASSO/SEC on each cluster of assets with parameter λ.→ with Ξk,c
λ or Rk,cλ , compute local-cluster-LMME (now feasible),
→ put all clusters back together & replace 0’s by respective PILOTentries,
LMME-GLASSO =B∑
k=1
h · (Ξkλ)−1
LMME-SEC =B∑
k=1
h · (Σkλ)−1
16 Local method of moments estimator (LMME)
Linkage
linkage description noteaverage 1
‖C1‖0‖C2‖0
∑c1∈C1,c2∈C2
corr(c1, c2) Tan et al. (2015)Tumminello et al. (2010)complete max {corr(c1, c2); c1 ∈ C1, c2 ∈ C2}
single min {corr(c1, c2); c1 ∈ C1, c2 ∈ C2} leads to trailing clusters
. . .
cluster m
I
I
asset m
cluster m − 1
I
assetm − 1
cluster m − 2
assetm − 2
clusterm − 3
. . .
cluster m
cluster m − 2
clusterm − 6
clusterm − 5
cluster m − 1
clusterm − 4
clusterm − 3
17 Local method of moments estimator (LMME)
Quantifying information loss
Clustering and regularization lead to loss of information:→ information distance of GLASSO and SEC from efficient LMME,→ Tumminello et al. (Phys. Rev. 2007) use Kulback-Leibler distance:
K (R1,R2) =12
ÅlogÅ
det R2
det R1
ã+ trÄR−1
2 R1
ä−mã.
and showed that
EîKÄR,R
äó=
12
(m log
(n2
)−
n∑t=n−m+1
ψ(t/2)
).
Data cluster glasso sec identitySP 100 21.23 12.21 30.75 65.12SP 500 3.13 2.76 6.07 6.04
18 Local method of moments estimator (LMME)
Regularization and blocking estimator (RnB)
→ start: multivariate realized kernel estimator proposed byBarndorf-Nielsen et al. (2011),
MRK =H∑
h=−H
kÅ
hH + 1
ãΓ, with bandwidth H,
. . . refresh-time sampling for synchronization of prices,
. . . sample if all assets have been (re-)traded at least once,
. . . large loss of data - frequency is driven by least liquid asset,
. . . it gets worse as portfolio grows (heterogeneity).→ Hautsch et al. (2012): blocking (clustering) based on liquidity,
. . . set fixed number of liquidity groups (e.g., 4), apply refresh-timeon each group,. . . estimate G(G + 1)/2 blocks by MRK,. . . put blocks back together in hierarchical order.
→ Need regularization (beside smoothing). . . Laloux et al. (1999) propose eigenvalue cleaning.. . . idea: eigenvalues which are close to 0 are noisy and can beinflated without loosing information.
19 Other high-frequency estimators
RnB - average absolute correlation
20 Other high-frequency estimators
RnB - conditonal numbers
21 Other high-frequency estimators
Cholesky factorization estimator (CHOLCOV)→ Boudt et al. (2017): reorder assets according to their liquidity,
. . . estimate Σ hierachically element-by-element,
. . . hierarchical Cholesky factorization Σk = BVB>,
B =
1 0 . . . 0
b21 1 . . . 0...
.... . .
...bm1 bm2 . . . 1
and V =
v11 0 . . . 00 v22 . . . 0...
.... . .
...0 0 . . . vmm
.. . . factorization of the returns
fτi = B−1xτi ∼ N(0, (τi − τi−1)V ),
. . . where τ1, . . . , τn are synchronized observation times,
. . . convenient hierarchical regression representation
xl,τ = fl,τbl,1 + . . . ,+fl−1,τbl,l−1 + fl,τ
. . . regression coefficients bij = E [ri , fj ]/E [f 2j ], j < i and vii = E [f 2
i ],→ We: uni/bi- local spectral estimator of Bibinger and Reiss (2014).
22 Other high-frequency estimators
CHOLCOV - average absolute correlation
23 Other high-frequency estimators
CHOLCOV - conditonal numbers
24 Other high-frequency estimators
LOBSTER data from NASDAQ data feed
→ mid-quote data on 100 and 426 most liquid assets of S&P 500,. . . sample period 1/1/2012− 31/12/2015, i.e. 994 trading days,. . . raw quotes cleaned as in Barndorf-Nielsen et al. (2009),. . . quotes revised for 0 returns, reduction of comp. burden.. . . e.g., for AAPL on 1/Jan/2012: 104 711→ 39 843, i.e. 62%,. . . the least liquid asset on this day was XRX with only 436observations after revisions,
→ compute PILOT, GLASSO, SEC, RnB and CHOLCOV for allt = 1 . . . ,993, . . . estimators are smoothed across 1,2,5,20 days,. . . day t estimates are forecasts for t + 1
→ performance depends on choice of nuisance parameters,. . . for spectral approach, B, J according to Bibinger et al. (2018),. . . for RnB, G,H and others according to Hautsch et al. (2015),. . . for clustering and regularization: number of clusters, size,. . . arbitrary choice: smoothing after regularization, MRK asex-post approximation.
25 Empirical comparison results
Portfolio charateristics
67
89
10 PILOTSECGLASSO
CHOLCOVRnB
1 5 20
A
67
89
10
1 5 20
B
050
100
150
200
250
300
1 5 20
C
050
100
150
200
250
300
1 5 20
D
PV: A) SP 100 - PV, B) SP 500 PT C) SP 100, D) SP 50026 Empirical comparison results
Economic criterion
8010
012
014
0
PILOTGLASSO
RnB
1 5 20
A 160
180
200
220
240
1 5 20
B
2040
6080
1 5 20
C
4060
8010
012
0
1 5 20
D
CHOLCOV: A) SP 100 B) SP 500 RV5min: C) SP 100 D) SP 50027 Empirical comparison results
Economic criterion
12
34
PILOTGLASSO
RnB
1 5 20
A
12
34
1 5 20
B
−2
02
46
1 5 20
C
−2
02
46
1 5 20
D
CHOLCOV: A) SP 100 B) SP 500 PILOT: C) SP 100 D) SP 50028 Empirical comparison results
Thank you!