capital distribution of equity market and its statistical
TRANSCRIPT
Capital distribution of equity marketand its statistical modeling
Ting-Kam Leonard Wong
University of Toronto
Ongoing project with Heng Kan and Colin Decker (U of T)
1 /21
Equity marketAs of 2018, the size of the world stock market (total marketcapitalization) was about US$69 trillion.
US market: about US$30 trillion in 2018.
Source: https://data.worldbank.org/indicator/cm.mkt.lcap.cd
2 /21
Investing in the equity marketÉ Passive: Track a pre-defined benchmark (lower fee)É Active: Aim to outperform by adopting various strategies
Benchmark: Usually a cap-weighted market index such as S&P500
0 200 400 600 800 1000
0.00
00.
005
0.01
00.
015
0.02
00.
025
Relative cap−weights of largest 1000 stocks (Dec 2016)
rank
Data: CRSP (Special thanks to Johannes Ruf and Desmond Xie)
3 /21
Stability of capital distributionCap-weight of stock i: µi(t) =MCi(t)/
∑
j MCj(t).Ranked weights: µ(1)(t)≥ µ(2)(t)≥ · · · ≥ µ(n)(t).
0 1 2 3 4 5 6 7
−10
−8
−6
−4
−2
Capital distribution: 1962−2016
log rank
log
wei
ght
4 /21
Market diversity (Fernholz 1997)
Diversity: a measure of the degree of concentration. Examples:
Φ(µ) = (∑
j
µλj )1/λ (0< λ < 1)
Φ(µ) = −∑
j
µj logµj (Shannon entropy)
1970 1980 1990 2000 2010
550
600
650
700
1970 1980 1990 2000 2010
5.6
5.8
6.0
6.2
5 /21
Relevence of market diversity
Changes in market diversity explains a statistically andeconomically significant amount of variation in the relative returnsof actively managed institutional large cap strategies. Data fromFernholz (2002) and Agapova, Greene and Ferguson (2011):
6 /21
Theoretical explanation using SPT (Fernholz 2002)
Consider the diversity-weighted portfolio
πi(t) =µi(t)λ
∑nj=1µj(t)λ
which corresponds to the gradient of Φ(µ) = (∑
jµλj )
1/λ which isconcave:
∇Φ(p) · (q− p)≥ Φ(q)−Φ(p).
If ϕ = logΦ, then
log�
∇ϕ(p) ·qp
�
︸ ︷︷ ︸
relative log return
≥ ϕ(q)−ϕ(p)︸ ︷︷ ︸
change in diversity
.
This is an example of Fernholz’s functionally generated portfolio.É Pal and W. (2015+): Optimal transport & information geometry.
7 /21
Theoretical explanation using SPTIn a simplified Itô process model for the equity market, we have
logZπ(t)Zµ(t)
= ϕ(µ(t))−ϕ(µ(0)) +Θ(t),
where Θ(t) is an increasing process reflecting market volatility.
t
log(Zπ/Zµ)
Θ(t)
From this formula, the relative performance of the portfoliocorrelates with change in diversity.
8 /21
Modeling the capital distribution curveMathematical approachÉ Ranked-based models: an SDE system of interacting particles,
each representing the market cap of a firm. A simple version:
d (Firmi(t)) = γri(t)dt+σri(t)dWi(t),
where ri(t) is the rank of firm i at time t.
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.5
1.0
1.5
2.0
Source (RHS): Fernholz, Ichiba and Karatzas (2013)
On the other hand Audrino Fernholz and Ferretti (2007) proposed andtested a model for forecasting market diversity.
9 /21
Data
We are interested in dynamic modeling and forecasting, but as afirst step we consider exploratory data analysis.É Data source: CRSP (The Center for Research in Security
Prices)É Prepared and used by Johannes Ruf and Desmond Xie (2019)É Daily data from 1962 to 2016.É Market cap and daily returns for all US stocks.
We focus on the largest 1000 stocks.É Ranked-basedÉ Evolving universe, missing data.É For simplicity we use monthly data in this study (reasonable
time scale for large cap portfolio managers).É Renormalize to get (relative) (ranked) market weights{µ≤(t)} ⊂∆1000. May be regarded as a functional time series.
10 /21
Principal component analysis
Performs dimension reduction by projecting the data linearly to alow dimensional subspace. A classic application in finance is yieldcurve modeling; the eigenvectors have useful interpretations.
Left: Wikipedia. Right: https://quant.stackexchange.com
11 /21
Reminder of PCA
Suppose we observe data x1, . . . ,xN ∈ RD. PCA aims to find alow-dimensional y=Wx ∈ Rd, d< D, that approximates x:
minU∈L(Rd,RD),W∈L(RD,Rd)
N∑
i=1
‖xi −UWxi‖22.
Here W and U are linear maps (i.e., matrices).
TheoremLet A=
∑Ni=1 xix
>i , and let u1, . . . ,ud be the eigenvectors
corresponding to the largest d eigenvectors of A. Then the PCAproblem is solved with
U = [u1, . . . ,ud], W = U>.
12 /21
Plain vanilla PCA does not work well
Since our data lies in the unit simplex, Euclidean PCA may (anddoes) not work well.
0 200 400 600 800
0.0
0.1
0.2
0.3
0.4
0.5
1
0 200 400 600 800
−0.
5−
0.3
−0.
10.
1
2
0 200 400 600 800−
0.2
0.0
0.2
0.4
3
0 200 400 600 800
−0.
40.
00.
20.
40.
60.
8
4
0 200 400 600 800
−0.
6−
0.4
−0.
20.
00.
2
5
1970 1980 1990 2000 201055
060
065
070
0
The blue series approximates the diversity using 5 eigenvectors. Itdoes not track the short term fluctuations of the diversity.
13 /21
Compositional data analysis, simplicial PCAWe will apply a version of PCA using the Aitchison geometry.
Source: Aitchison (1982)
14 /21
Aitchison geometry
Consider the open unit simplex
∆n := {p= (p1, . . . , pn) : p1 + · · ·+ pn = 1}.
Define the closure operator
C[x] :=�
x1
x1 + · · ·+ xn, . . . ,
xn
x1 + · · ·+ xn
�
, x ∈ (0,∞)n.
TheoremDefine the operations
p⊕ q := C[p1q1, . . . , pnqn] (perturbation)
λ⊗ p := C[pλ1 , . . . , pλn] (powering)
Then (∆n,⊕,⊗) is an (n− 1)-dimensional real vector space.
15 /21
Aitchison geometryTheorem (Simplex as a Hilbert space)s Define
⟨p, q⟩A :=n∑
i=1
logpi
g(p)log
qi
g(q),
where g(x) = (x1 · · ·xn)1/n is the geometric mean. Then ⟨·, ·⟩A is aninner product on (∆n,⊕,⊗).
16 /21
Isometric-log-ratio transformConsider the orthonormal basis
ei = C
�
exp
�√
√ 1i(i+ 1)
, . . . ,
√
√ 1i(i+ 1)
,−√
√ ii(i+ 1)
, 0, . . . , 0
��
,
where i= 1, . . . , n− 1.
Definition (Egozcue et al (2003))We define ilr :∆n→ Rn−1 by
x= ilr(p) := (⟨p,e1⟩A, . . . , ⟨p,en−1⟩A), xi =
√
√ ii+ 1
log(p1 · · ·pi)1/i
pi+1.
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●● ●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Original data on simplex
●
●●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
0 1 2 3 4 5
12
34
5
Data after ilr−transform
17 /21
Simplicial PCAAitchison (1982) uses instead the centered-log-ratio (clr)transform. Here we use the ilr-transform. Procedure:
original data→ ilr→ Euclidean PCA→ inverse-ilr
Eigenvectors (in ilr-space) using the latest 20 years of data:
0 200 400 600 800
−0.
035
−0.
025
−0.
015
−0.
005
1
0 200 400 600 800−
0.04
0.00
0.04
2
0 200 400 600 800
−0.
15−
0.05
0.05
3
0 200 400 600 800
−0.
100.
000.
10
4
0 200 400 600 800
−0.
15−
0.05
0.05
5
18 /21
Interpretations of the eigenvectors
0 1 2 3 4 5 6 7
−9
−8
−7
−6
−5
−4
−3
Perturbation wrt 1st eigenvector
log rank
log
wei
ght
0 1 2 3 4 5 6 7
−9
−8
−7
−6
−5
−4
−3
Perturbation wrt 2nd eigenvector
log rank
log
wei
ght
0 1 2 3 4 5 6 7
−9
−8
−7
−6
−5
−4
−3
Perturbation wrt 3rd eigenvector
log rank
log
wei
ght
0 1 2 3 4 5 6 7
−9
−8
−7
−6
−5
−4
−3
Perturbation wrt 4th eigenvector
log rank
log
wei
ght
The shapes of the eigenvectors (except the first one and perhapsthe 2nd) fluctuate over time. This may indicate some structuralchanges in the market despite stability of the capital distribution.
19 /21
Approximation of market diversity
Again we use only the first 5 eigenvectors. The simplex PCAperforms much better than the Euclidean PCA in capturing themarket diversity.
1965 1970 1975 1980
600
620
640
660
680
700
1970 1980 1990 2000 201055
060
065
070
0
Intuitively, the tracjectory of µ≥(·) is close to a 5-dimensionalsubmanifold of ∆n. Right: Result using the entire dataset.
20 /21
Future directions
É Further study of the geometry in relation to the properties ofthe dataÉ Financial interpretationsÉ Dynamic statistical modelsÉ ForecastingÉ Portfolio optimizationÉ Modeling of volatility using (information) geometric ideaÉ Combine with mathematical approaches in SPT
21 /21