Scalable inference for a full multivariate stochastic volatility model
SYstemic Risk TOmography:
Signals, Measurements, Transmission Channels, and Policy Interventions
P. Dellaportas (UCL, London), A. Plataniotis (AUEB, Athens) and M. Titsias (AUEB, Athens)
Final SYRTO Conference, Université Paris 1 Panthéon-Sorbonne, February 19, 2016
I Important indicators of systemic risk are instantaneous volatilities and correlations.
I N-dimensional asset returns: $r_t = \mu_t + \epsilon_t$, $\epsilon_t \sim N(0, \Sigma_t)$, $t = 1, \dots, T$.
I The focus is shifted to modelling and predicting the covariance matrices $\Sigma_t$, so we assume that $r_t \equiv \epsilon_t$.
I For realistic financial applications (portfolio allocation, systemic risk) think of N in the hundreds and T = 2000.
I Problem 1: The number of parameters in $\Sigma_t$ is $N(N+1)/2$, which grows quadratically in N. The total number of parameters that need to be estimated is $TN(N+1)/2$.
I Problem 2: The $N(N+1)/2$ parameters of each $\Sigma_t$ are restricted, since $\Sigma_t$ should be positive definite.
I Problem 3: There are many missing values (about 3% in the data we looked at)
and series with short lengths.
1-d Stochastic volatility model
I 1-dimensional returns
$$r_t \sim N(\mu_t, \sigma_t^2),$$
with unobservable variances
$$\log \sigma_{t+1}^2 = \mu + \phi \log \sigma_t^2 + \eta_t, \quad \eta_t \sim N(0, \tau^2).$$
I MCMC algorithms since 1994; sequential importance sampling, adaptive MCMC,
Laplace approximations, etc.
I Compare the parameter-driven stochastic volatility models with the
observation-driven GARCH-type models
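The 1-d stochastic volatility recursion above can be simulated in a few lines. The following is a minimal sketch, not the authors' code: the function name `simulate_sv`, the parameter values, the zero return mean and the choice to start the log-variance at its stationary mean are all assumptions made for illustration.

```python
import math
import random


def simulate_sv(T, mu, phi, tau, seed=0):
    """Simulate T returns from the 1-d stochastic volatility model.

    State:   log sigma^2_{t+1} = mu + phi * log sigma^2_t + eta_t, eta_t ~ N(0, tau^2)
    Returns: r_t ~ N(0, sigma^2_t)  (the mean mu_t is set to zero here, an assumption)
    """
    rng = random.Random(seed)
    # Start at the stationary mean mu / (1 - phi); this requires |phi| < 1.
    log_var = mu / (1.0 - phi)
    returns, log_vars = [], []
    for _ in range(T):
        sigma = math.exp(0.5 * log_var)
        returns.append(rng.gauss(0.0, sigma))
        log_vars.append(log_var)
        # Advance the latent log-variance by one AR(1) step.
        log_var = mu + phi * log_var + rng.gauss(0.0, tau)
    return returns, log_vars


# Hypothetical parameter values, chosen only to give persistent volatility.
returns, log_vars = simulate_sv(T=2000, mu=-0.2, phi=0.98, tau=0.15)
```

Only `returns` would be observed in practice; `log_vars` is the latent path that the inference has to recover.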
Volatility matrices - State of the art
I Two recent review articles on multivariate stochastic volatility (Asai, McAleer, Yu,
2006; Chib, Omori, Asai, 2009); current state of the art is parsimonious
modelling of $\Sigma_t$ and factor models with few independent factors, each one of
them being modelled as a univariate stochastic volatility process.
I A review article on multivariate GARCH models (Bauwens, Laurent, Rombouts;
2006); state of the art is parsimonious modelling of $\Sigma_t$ and two-step estimation
procedures.
I Other approaches include Wishart processes (Philipov and Glickman; 2006) and
dynamic matrix-variate graphical models via inverted Wishart processes
(Carvalho and West; 2007).
Dynamic eigenvalue and eigenvector modelling
I We decompose $\Sigma_t = U_t \Lambda_t U_t^T$ and model $U_t$ and $\Lambda_t$ with an AR(1) process. Direct modelling of $U_t$ is hard.
I Since $U_t$ is a rotation matrix, it can be parameterised w.r.t. $N(N-1)/2$ Givens angles, each one belonging to a matrix $G_{jt}$:
$$U_t = \prod_{j=1}^{N(N-1)/2} G_{jt}$$
2-Dim
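$$\Sigma_t =
\begin{pmatrix} \cos(\omega_t) & \sin(\omega_t) \\ -\sin(\omega_t) & \cos(\omega_t) \end{pmatrix}
\begin{pmatrix} \lambda_{1t} & 0 \\ 0 & \lambda_{2t} \end{pmatrix}
\begin{pmatrix} \cos(\omega_t) & \sin(\omega_t) \\ -\sin(\omega_t) & \cos(\omega_t) \end{pmatrix}^T$$
I Uniqueness: $\lambda_{1t} > \lambda_{2t}$, $-\pi/2 < \omega_t < \pi/2$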
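As a small illustration of the 2-dimensional case, the sketch below multiplies out the rotation, eigenvalue and transposed rotation matrices by hand. The function name `cov_2d` and the numerical values are hypothetical. It also shows why an ordering constraint on the eigenvalues is needed: swapping the two eigenvalues while rotating by a further $\pi/2$ yields the same covariance matrix.

```python
import math


def cov_2d(omega, lam1, lam2):
    """Sigma = G diag(lam1, lam2) G^T with G = [[c, s], [-s, c]], multiplied out."""
    c, s = math.cos(omega), math.sin(omega)
    s11 = lam1 * c * c + lam2 * s * s
    s22 = lam1 * s * s + lam2 * c * c
    s12 = (lam2 - lam1) * c * s
    return [[s11, s12], [s12, s22]]


# Hypothetical values: eigenvalues 2.0 > 0.5 and an angle inside (-pi/2, pi/2).
sigma = cov_2d(0.4, 2.0, 0.5)
correlation = sigma[0][1] / math.sqrt(sigma[0][0] * sigma[1][1])

# Without the ordering constraint the parameterisation is not unique:
# (omega + pi/2, lam2, lam1) gives the same matrix as (omega, lam1, lam2).
same = cov_2d(0.4 + math.pi / 2, 0.5, 2.0)
```

The trace and determinant of the result equal the sum and product of the two eigenvalues, which gives a quick sanity check.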
3-Dim
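(Ignoring t): $\Sigma = U \Lambda U^T = G_{12} G_{13} G_{23} \Lambda G_{23}^T G_{13}^T G_{12}^T$
$$U =
\begin{pmatrix} \cos(\omega_{12}) & \sin(\omega_{12}) & 0 \\ -\sin(\omega_{12}) & \cos(\omega_{12}) & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos(\omega_{13}) & 0 & \sin(\omega_{13}) \\ 0 & 1 & 0 \\ -\sin(\omega_{13}) & 0 & \cos(\omega_{13}) \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\omega_{23}) & \sin(\omega_{23}) \\ 0 & -\sin(\omega_{23}) & \cos(\omega_{23}) \end{pmatrix}$$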
$$U = \prod_{j=1,\,k>j}^{N(N-1)/2} G_{jk} = \prod_{j=1,\,k>j}^{N(N-1)/2}
\begin{pmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & \cos(\omega_{jk}) & \cdots & \sin(\omega_{jk}) & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -\sin(\omega_{jk}) & \cdots & \cos(\omega_{jk}) & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix}$$
I Note the sparsity of the N-dimensional rotation matrix: it contains 4 elements
with cosines and sines of the angle (in rows and columns j and k), ones on the
rest of the diagonal, and zeroes everywhere else.
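The construction of the rotation matrix from Givens angles can be sketched directly. This is an illustrative, dense $O(N^3)$ version meant only for checking small cases; the helper names `givens`, `matmul` and `rotation_from_angles`, the 0-based indices and the ordering of the angles as (1,2), (1,3), ..., (N-1,N) are assumptions.

```python
import math


def givens(n, j, k, omega):
    """n x n Givens matrix G_jk (0-based j < k): identity except four entries."""
    g = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    cs, sn = math.cos(omega), math.sin(omega)
    g[j][j], g[j][k] = cs, sn
    g[k][j], g[k][k] = -sn, cs
    return g


def matmul(a, b):
    """Plain dense product of two square matrices, used here only for checking."""
    n = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]


def rotation_from_angles(n, angles):
    """U = G_12 G_13 ... G_{n-1,n}; `angles` lists omega_jk in that order."""
    u = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    pairs = [(j, k) for j in range(n) for k in range(j + 1, n)]
    for (j, k), omega in zip(pairs, angles):
        u = matmul(u, givens(n, j, k, omega))
    return u


# Hypothetical angles for N = 4, which needs N(N-1)/2 = 6 of them.
U = rotation_from_angles(4, [0.3, -0.7, 1.1, 0.2, -0.4, 0.9])
```

Because each factor is orthogonal, the product `U` is orthogonal for any choice of angles, so no constraint has to be enforced during sampling.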
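I $r_t = (r_{1t}, \dots, r_{Nt})^T$, $r_t \sim MVN(0, U_t \Lambda_t U_t^T)$.
I Transformations: $h_{it} = \log \Lambda_{it}$, $i = 1, \dots, N$, and $\delta_{jt} = \log\left(\frac{\pi/2 + \omega_{jt}}{\pi/2 - \omega_{jt}}\right)$, $j = 1, \dots, N(N-1)/2$, for $t = 1, \dots, T$
$$h_{i,t+1} = \mu_i^h + \phi_i^h (h_{it} - \mu_i^h) + \sigma_i^h \eta_{it}^h, \quad i = 1, \dots, N$$
$$\delta_{j,t+1} = \mu_j^\delta + \phi_j^\delta (\delta_{jt} - \mu_j^\delta) + \sigma_j^\delta \eta_{jt}^\delta, \quad j = 1, \dots, \frac{N(N-1)}{2}$$
where $\eta_{it}^h, \eta_{jt}^\delta \sim N(0, 1)$ independently, and we denote
$$\theta_h = (\phi_1^h, \dots, \phi_N^h, \mu_1^h, \dots, \mu_N^h, \sigma_1^h, \dots, \sigma_N^h)$$
$$\theta_\delta = (\phi_1^\delta, \dots, \phi_{N(N-1)/2}^\delta, \mu_1^\delta, \dots, \mu_{N(N-1)/2}^\delta, \sigma_1^\delta, \dots, \sigma_{N(N-1)/2}^\delta)$$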
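The angle transformation maps $(-\pi/2, \pi/2)$ onto the real line, so the AR(1) process can move freely while the angle stays in its identifiable range. Solving the transformation for the angle gives $\omega = (\pi/2)\tanh(\delta/2)$. A minimal sketch, with hypothetical function names and parameter values:

```python
import math
import random

HALF_PI = math.pi / 2.0


def angle_to_delta(omega):
    """delta = log((pi/2 + omega) / (pi/2 - omega)), for omega in (-pi/2, pi/2)."""
    return math.log((HALF_PI + omega) / (HALF_PI - omega))


def delta_to_angle(delta):
    """Inverse map: omega = (pi/2) * tanh(delta / 2)."""
    return HALF_PI * math.tanh(delta / 2.0)


def ar1_step(x, mu, phi, sigma, rng):
    """One step of x_{t+1} = mu + phi * (x_t - mu) + sigma * eta, eta ~ N(0, 1)."""
    return mu + phi * (x - mu) + sigma * rng.gauss(0.0, 1.0)


# Evolve one transformed angle for a few steps (hypothetical parameter values).
rng = random.Random(0)
delta = angle_to_delta(0.3)
angles = []
for _ in range(5):
    delta = ar1_step(delta, mu=0.0, phi=0.95, sigma=0.1, rng=rng)
    angles.append(delta_to_angle(delta))
```

The same `ar1_step` applies unchanged to the log-eigenvalues $h_{it}$, which are already unconstrained.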
Priors
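$$h_{i,t+1} = \mu_i^h + \phi_i^h (h_{it} - \mu_i^h) + \sigma_i^h \eta_{it}^h, \quad i = 1, \dots, N$$
$$\delta_{j,t+1} = \mu_j^\delta + \phi_j^\delta (\delta_{jt} - \mu_j^\delta) + \sigma_j^\delta \eta_{jt}^\delta, \quad j = 1, \dots, \frac{N(N-1)}{2}$$
$$\mu_i^h \sim N(\mu_1, \sigma_1^2), \quad i = 1, \dots, N$$
$$\phi_i^h \sim N(\mu_2, \sigma_2^2), \quad i = 1, \dots, N$$
$$\mu_j^\delta \sim N(\mu_3, \sigma_3^2), \quad j = 1, \dots, \frac{N(N-1)}{2}$$
$$\phi_j^\delta \sim N(\mu_3, \sigma_3^2), \quad j = 1, \dots, \frac{N(N-1)}{2}$$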
The exchangeability assumption, imposed via a hierarchical model, allows borrowing strength across series.
Partial exchangeability conditional on markets, sectors, etc. is probably more realistic.
A general model formulation
A more general structure is a K-factor model constructed with an $N \times K$ matrix of factor
loadings B:
I $r_t = B f_t + e_t$, $e_t \sim N(0, \sigma^2 I)$
I The factor loadings matrix B has fixed/known structure while its non-zero
elements follow a Gaussian prior distribution
I $f_t \sim N(0, \Sigma_t)$
I $\Sigma_t$ follows the multivariate stochastic volatility model with the Givens matrix
construction
I We need to constrain B so that the model is identifiable
I This model is not needed only when N is large: it can also handle missing
values, which is very important in real applications.
Computation
I With the Givens-angle model formulation we now deal with a non-linear
likelihood plus a Gaussian process prior
I MCMC for these problems: use an auxiliary Langevin MCMC based on an idea
by Titsias in the discussion of the JRSSB discussion paper by Girolami and
Calderhead (2011).
I Computational complexity: evaluating a Normal density of dimension d costs $O(d^3)$ in general;
we achieve $O(d^2)$ even for the derivatives of the likelihood w.r.t. the Givens angles, so
our MCMC algorithm has complexity $O(d^2)$.
I Missing data are treated without any problem
The Sampling algorithm
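Model: $r_t = B f_t + e_t$, $e_t \sim N(0, \sigma^2 I)$, $f_t \sim N(0, \Sigma_t)$. Denote by X all latent paths.
$$p(B, \sigma^2, (f_t)_{t=1}^T \mid \text{rest}) \propto \left( \prod_{t=1}^T N(r_t \mid B f_t, \sigma^2 I)\, N(f_t \mid 0, \Sigma_t(x_t)) \right) p(B, \sigma^2),$$
$$p(X \mid \text{rest}) \propto \left( \prod_{t=1}^T N(f_t \mid 0, \Sigma_t(x_t)) \right) p(X \mid \theta_h, \theta_\delta),$$
$$p(\theta_h, \theta_\delta \mid \text{rest}) \propto p(X \mid \theta_h, \theta_\delta)\, p(\theta_h, \theta_\delta).$$
We do not need to generate the missing data in $r_t$.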
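One way to see why missing returns need not be imputed: with noise covariance $\sigma^2 I$, the entries of $r_t$ are conditionally independent given $f_t$, so a missing entry simply drops out of the product. The sketch below evaluates the observed-data term for one time point; the function name `loglik_observed` and the use of `None` to mark a missing value are assumptions for illustration.

```python
import math


def loglik_observed(r, B, f, sigma2):
    """log N(r_obs | (B f)_obs, sigma2 * I), skipping entries of r that are None."""
    total = 0.0
    for i, r_i in enumerate(r):
        if r_i is None:
            # Missing return: it contributes no factor to the likelihood.
            continue
        mean_i = sum(b * f_k for b, f_k in zip(B[i], f))
        total += (-0.5 * math.log(2.0 * math.pi * sigma2)
                  - 0.5 * (r_i - mean_i) ** 2 / sigma2)
    return total


# Hypothetical example: N = 3 assets, K = 2 factors, second return missing.
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
f = [0.5, -1.0]
value = loglik_observed([0.4, None, -0.3], B, f, sigma2=0.25)
```

The cost is proportional to the number of observed entries times K, so shorter series also make the evaluation cheaper.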
Sampling the Gaussian latent process
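I Denote $F = (f_1, \dots, f_T)$
I Prior $p(X) = N(X \mid M, Q^{-1})$
I Current state of X is $X_n$. Use slice Gibbs:
I Introduce auxiliary variables U that live in the same space as X:
$$p(U \mid X_n) = N\left(U \mid X_n + \tfrac{\delta}{2} \nabla \log p(F \mid X_n), \tfrac{\delta}{2} I\right)$$
I U injects Gaussian noise into $X_n$ and shifts it by $(\delta/2) \nabla \log p(F \mid X_n)$
I We cannot sample from $p(X \mid U)$, so we use a Metropolis step: propose Y from the
proposal q:
$$q(Y \mid U) = \frac{1}{Z(U)} N\left(Y \mid U, \tfrac{\delta}{2} I\right) p(Y)
= N\left(Y \mid \left(I + \tfrac{\delta}{2} Q\right)^{-1}\left(U + \tfrac{\delta}{2} Q M\right), \tfrac{\delta}{2}\left(I + \tfrac{\delta}{2} Q\right)^{-1}\right),$$
where $Z(U) = \int N(Y \mid U, \tfrac{\delta}{2} I)\, p(Y)\, dY$.
I Accept Y with Metropolis-Hastings probability $\min(1, r)$:
$$r = \frac{p(F \mid Y)\, p(U \mid Y)\, p(Y)}{p(F \mid X_n)\, p(U \mid X_n)\, p(X_n)} \cdot \frac{q(X_n \mid U)}{q(Y \mid U)}$$
$$= \frac{p(F \mid Y)\, p(U \mid Y)\, p(Y)}{p(F \mid X_n)\, p(U \mid X_n)\, p(X_n)} \cdot \frac{\frac{1}{Z(U)} N(X_n \mid U, \tfrac{\delta}{2} I)\, p(X_n)}{\frac{1}{Z(U)} N(Y \mid U, \tfrac{\delta}{2} I)\, p(Y)}$$
$$= \frac{p(F \mid Y)\, N(U \mid Y + \tfrac{\delta}{2} G_y, \tfrac{\delta}{2} I)}{p(F \mid X_n)\, N(U \mid X_n + \tfrac{\delta}{2} G_t, \tfrac{\delta}{2} I)} \cdot \frac{N(X_n \mid U, \tfrac{\delta}{2} I)}{N(Y \mid U, \tfrac{\delta}{2} I)}$$
$$= \frac{p(F \mid Y)}{p(F \mid X_n)} \exp\left\{ -(U - X_n)^T G_t + (U - Y)^T G_y - \frac{\delta}{4}\left(\|G_y\|^2 - \|G_t\|^2\right) \right\}$$
where $G_t = \nabla \log p(F \mid X_n)$, $G_y = \nabla \log p(F \mid Y)$ and $\|Z\|$ denotes the Euclidean
norm of a vector Z.
I The Gaussian prior terms $p(X_n)$ and $p(Y)$ have been cancelled out from the
acceptance probability, so their computationally expensive evaluation is not
required: the resulting $q(Y \mid U)$ is invariant under the Gaussian prior.
I Tune $\delta$ to achieve an acceptance rate of around 50-60%.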
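The update described above can be sketched on a toy problem. Everything specific here is an assumption made to keep the example self-contained: the prior precision Q is taken to be diagonal (so the matrix inverse becomes elementwise), the likelihood is a simple scalar stochastic-volatility term $f_i \sim N(0, e^{x_i})$, and the names `log_lik`, `grad_log_lik` and `aux_langevin_step` are hypothetical. It is not the authors' implementation.

```python
import math
import random


def log_lik(x, f):
    """Toy likelihood (an assumption): f_i ~ N(0, exp(x_i)) independently."""
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * xi
               - 0.5 * fi * fi * math.exp(-xi) for xi, fi in zip(x, f))


def grad_log_lik(x, f):
    """Gradient of log_lik with respect to x."""
    return [-0.5 + 0.5 * fi * fi * math.exp(-xi) for xi, fi in zip(x, f)]


def aux_langevin_step(x, f, m, q, delta, rng):
    """One auxiliary Langevin update; prior is N(m, diag(1/q)) (diagonal Q assumed)."""
    half = 0.5 * delta
    g_x = grad_log_lik(x, f)
    # Auxiliary draw U ~ N(x + (delta/2) * grad, (delta/2) * I).
    u = [xi + half * gi + math.sqrt(half) * rng.gauss(0.0, 1.0)
         for xi, gi in zip(x, g_x)]
    # Proposal Y ~ N((I + half*Q)^{-1} (U + half*Q*M), half * (I + half*Q)^{-1}).
    y = []
    for ui, mi, qi in zip(u, m, q):
        shrink = 1.0 / (1.0 + half * qi)
        mean = shrink * (ui + half * qi * mi)
        y.append(mean + math.sqrt(half * shrink) * rng.gauss(0.0, 1.0))
    g_y = grad_log_lik(y, f)
    # Log acceptance ratio; the Gaussian prior terms have cancelled.
    log_r = log_lik(y, f) - log_lik(x, f)
    log_r += sum(-(ui - xi) * gx + (ui - yi) * gy
                 for ui, xi, yi, gx, gy in zip(u, x, y, g_x, g_y))
    log_r -= 0.25 * delta * (sum(g * g for g in g_y) - sum(g * g for g in g_x))
    if math.log(1.0 - rng.random()) < log_r:
        return y, True
    return x, False


# Short run on simulated data (hypothetical sizes and step size).
rng = random.Random(1)
f = [rng.gauss(0.0, 1.0) for _ in range(50)]
x, accepted = [0.0] * 50, 0
for _ in range(500):
    x, ok = aux_langevin_step(x, f, m=[0.0] * 50, q=[1.0] * 50, delta=0.2, rng=rng)
    accepted += ok
acceptance_rate = accepted / 500
```

In a run like this, `delta` would be adjusted until `acceptance_rate` falls in the target range. With a non-diagonal Q, such as the precision of an AR(1) path, the elementwise shrinkage would be replaced by a linear solve.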
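O(K^2) computation for the K-factor MSV model
I $f_t \sim N(0, \Sigma_t)$, $\Sigma_t = U_t \Lambda_t U_t^T$, $U_t = \prod_{j=1}^{K(K-1)/2} G_{jt}$
$$\log \mathrm{MSV}(f_t) = -\frac{K}{2} \log(2\pi) - \frac{1}{2} \sum_{i=1}^{K} h_{it} - \frac{1}{2} v_t^T v_t, \tag{1}$$
where $v_t = \Lambda_t^{-1/2} U_t^T f_t$ and where we used that $\log|\Sigma_t| = \log|\Lambda_t| = \sum_{i=1}^{K} h_{it}$.
I Given $v_t$, the above expression takes $O(K)$ time to compute.
I $G_{ij}(\omega_{ji,t})^T f_t$ takes $O(1)$ time to compute, since all of its elements are equal to the
corresponding ones from the vector $f_t$ apart from the i-th and j-th elements, which
become $f_t[i] \cos(\omega_{ji,t}) - f_t[j] \sin(\omega_{ji,t})$ and $f_t[i] \sin(\omega_{ji,t}) + f_t[j] \cos(\omega_{ji,t})$,
respectively. Applying all $K(K-1)/2$ rotations in turn therefore gives $v_t$ in $O(K^2)$ time.
I Similarly, $\nabla_{h_t} \log \mathrm{MSV}$ and $\nabla_{\omega_{ij,t}} \log \mathrm{MSV}$ are calculated in $O(K^2)$ time.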
O(N^2) computation for the MSV model
Initialize v_t = f_t.
for i = 1 to N - 1 do
    for j = i + 1 to N do
        c = cos(\omega_{ji,t}), s = sin(\omega_{ji,t})
        t1 = v_t[i], t2 = v_t[j]
        v_t[i] <- c * t1 - s * t2
        v_t[j] <- s * t1 + c * t2
    end for
end for
v_t = v_t \circ diag(\Lambda_t^{-1/2}) (elementwise product)
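The loop above translates almost line for line into code. The sketch below evaluates the log density without ever forming $U_t$ or $\Sigma_t$; the function name `log_msv`, the 0-based indexing and the convention that `angles` is ordered as (1,2), (1,3), ..., (K-1,K) are assumptions.

```python
import math


def log_msv(f, h, angles):
    """log N(f | 0, U diag(exp(h)) U^T) in O(K^2), without forming U or Sigma.

    `angles` lists the Givens angles for pairs (1,2), (1,3), ..., (K-1,K).
    """
    k = len(f)
    v = list(f)
    idx = 0
    for i in range(k - 1):
        for j in range(i + 1, k):
            c, s = math.cos(angles[idx]), math.sin(angles[idx])
            t1, t2 = v[i], v[j]
            # Apply the transposed Givens rotation to the two affected entries.
            v[i] = c * t1 - s * t2
            v[j] = s * t1 + c * t2
            idx += 1
    # Scaling by Lambda^{-1/2} and squaring gives v_i^2 * exp(-h_i).
    quad = sum(vi * vi * math.exp(-hi) for vi, hi in zip(v, h))
    return -0.5 * k * math.log(2.0 * math.pi) - 0.5 * sum(h) - 0.5 * quad


# Hypothetical 3-factor example: three log-eigenvalues and three angles.
value = log_msv([0.2, -0.1, 0.4], h=[0.5, -0.3, -1.0], angles=[0.3, -0.6, 1.0])
```

For K = 2 the result can be compared with the density computed from the explicit 2 x 2 covariance matrix, which is a convenient correctness check.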
The Sampling algorithm revisited
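Model: $r_t = B f_t + e_t$, $e_t \sim N(0, \sigma^2 I)$, $f_t \sim N(0, \Sigma_t)$
Denote by X all latent paths
$$p(B, \sigma^2, (f_t)_{t=1}^T \mid \text{rest}) \propto \left( \prod_{t=1}^T N(r_t \mid B f_t, \sigma^2 I)\, N(f_t \mid 0, \Sigma_t(x_t)) \right) p(B, \sigma^2),$$
$$p(X \mid \text{rest}) \propto \left( \prod_{t=1}^T N(f_t \mid 0, \Sigma_t(x_t)) \right) p(X \mid \theta_h, \theta_\delta),$$
$$p(\theta_h, \theta_\delta \mid \text{rest}) \propto p(X \mid \theta_h, \theta_\delta)\, p(\theta_h, \theta_\delta).$$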
Sampling the latent factors in O(TNK) time
I $p(f_t \mid \text{rest}) \propto N(r_t \mid B f_t, \sigma^2 I)\, N(f_t \mid 0, \Sigma_t) = N(f_t \mid \sigma^{-2} M_t^{-1} B^T r_t, M_t^{-1})$, where
$M_t = \sigma^{-2} B^T B + \Sigma_t^{-1}$. To simulate from this Gaussian we need first to compute
the stochastic volatility matrix $\Sigma_t$ and subsequently the Cholesky decomposition
of $M_t$. Both operations have a cost $O(K^3)$.
I We replace the exact Gibbs step with a much faster Metropolis-within-Gibbs step
that scales as $O(T(NK + K^2))$.
I To achieve this we apply the same auxiliary Langevin scheme as before
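For reference, the Gaussian form of the exact conditional follows from completing the square in $f_t$. The derivation below is a standard conjugate-Gaussian calculation; the symbol $m_t$ for the conditional mean is introduced here only for convenience.

```latex
\begin{aligned}
\log p(f_t \mid \text{rest})
 &= -\frac{1}{2\sigma^2}\,\lVert r_t - B f_t \rVert^2
    - \frac{1}{2} f_t^T \Sigma_t^{-1} f_t + \text{const} \\
 &= -\frac{1}{2} f_t^T \left(\sigma^{-2} B^T B + \Sigma_t^{-1}\right) f_t
    + \sigma^{-2} f_t^T B^T r_t + \text{const} \\
 &= -\frac{1}{2} (f_t - m_t)^T M_t (f_t - m_t) + \text{const},
\end{aligned}
\qquad
M_t = \sigma^{-2} B^T B + \Sigma_t^{-1}, \quad
m_t = \sigma^{-2} M_t^{-1} B^T r_t .
```

The precision $M_t$ is a dense $K \times K$ matrix that changes with t, which is why the exact step needs a fresh factorisation at every time point.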
The Data
I 571 stocks from the STOXX Europe 600 index
I Daily data from 08/01/2010 to 5/1/2014 (T = 2017)
I 36340 missing values, or 36340/(571 × 2017) = 3.2%
I Factor model with 30 factors: the dimension of the latent path is
2017 × 30 × 31/2 = 937,905
I Choice of number of factors: based on predictive performance w.r.t. quadratic
covariation. We tried 20, 30 and 40 factors.
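The two figures quoted above can be reproduced with a short calculation: each time point carries K log-eigenvalues and K(K-1)/2 Givens angles, i.e. K(K+1)/2 latent values. The function names below are hypothetical.

```python
def latent_path_dimension(T, K):
    """K log-eigenvalues plus K(K-1)/2 Givens angles per time step, over T steps."""
    per_step = K + K * (K - 1) // 2  # equals K(K+1)/2
    return T * per_step


def missing_fraction(n_missing, N, T):
    """Share of missing entries in an N x T panel of returns."""
    return n_missing / (N * T)


dimension = latent_path_dimension(T=2017, K=30)
missing_pct = 100.0 * missing_fraction(36340, N=571, T=2017)
```

Here `dimension` evaluates to 937905 and `missing_pct` rounds to 3.2, matching the slide.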
[Figure: Next-day minimum variance portfolio weights for the 571 stocks]
[Figure: Pairwise correlations across time]
[Figure: Log-variances across time]
I January 2009: Banking shares in the UK plummet as the Royal Bank of Scotland
posts the biggest loss in British history. The Bank of England reduces the base
rate of interest to a new historic low of 1%. The U.S. economy lost 598,000 jobs
during January 2009, with unemployment rising to 7.6 percent. Bankruptcies in
the United Kingdom rose during 2008 by 50 percent to an all-time high.
California’s Alliance Bank and Georgia’s FirstBank are closed, raising the
number of 2009 U.S. bank failures to eight.
I July 2012: The chairman and the chief executive of British bank Barclays
resign following a scandal in which the bank tried to manipulate the Libor and
Euribor interest rate systems. The central banks of the European Union, Great
Britain, and the People’s Republic of China, in what appears to be a co-ordinated
action, each loosen their respective monetary systems.
Discussion
I Incorporation of Leverage effects, Jumps
I Small N: Nested Laplace approximations (PhD thesis by Plataniotis,
AUEB), importance sampling based on copulas (in progress)
I Bayesian model determination for number of factors
I Relations with other PCN proposals
This project has received funding from the European Union’s
Seventh Framework Programme for research, technological
development and demonstration under grant agreement n° 320270
www.syrtoproject.eu
This document reflects only the author’s views.
The European Union is not liable for any use that may be made of the information contained therein.