capital distribution of equity market and its statistical

Capital distribution of equity marketand its statistical modeling

Ting-Kam Leonard Wong

University of Toronto

Ongoing project with Heng Kan and Colin Decker (U of T)

1 /21

Equity marketAs of 2018, the size of the world stock market (total marketcapitalization) was about US$69 trillion.

US market: about US$30 trillion in 2018.

Source: https://data.worldbank.org/indicator/cm.mkt.lcap.cd

2 /21

https://data.worldbank.org/indicator/cm.mkt.lcap.cd

Investing in the equity marketÉ Passive: Track a pre-defined benchmark (lower fee)É Active: Aim to outperform by adopting various strategies

Benchmark: Usually a cap-weighted market index such as S&P500

0 200 400 600 800 1000

0.00

00.

005

0.01

00.

015

0.02

00.

025

Relative cap−weights of largest 1000 stocks (Dec 2016)

rank

Data: CRSP (Special thanks to Johannes Ruf and Desmond Xie)

3 /21

Stability of capital distributionCap-weight of stock i: µi(t) =MCi(t)/

∑

j MCj(t).Ranked weights: µ(1)(t)≥ µ(2)(t)≥ · · · ≥ µ(n)(t).

0 1 2 3 4 5 6 7

−10

−8

−6

−4

−2

Capital distribution: 1962−2016

log rank

log

wei

ght

4 /21

Market diversity (Fernholz 1997)

Diversity: a measure of the degree of concentration. Examples:

Φ(µ) = (∑

j

µλj )1/λ (0< λ < 1)

Φ(µ) = −∑

j

µj logµj (Shannon entropy)

1970 1980 1990 2000 2010

550

600

650

700

1970 1980 1990 2000 2010

5.6

5.8

6.0

6.2

5 /21

Relevence of market diversity

Changes in market diversity explains a statistically andeconomically significant amount of variation in the relative returnsof actively managed institutional large cap strategies. Data fromFernholz (2002) and Agapova, Greene and Ferguson (2011):

6 /21

Theoretical explanation using SPT (Fernholz 2002)

Consider the diversity-weighted portfolio

πi(t) =µi(t)λ

∑nj=1µj(t)λ

which corresponds to the gradient of Φ(µ) = (∑

jµλj )

1/λ which isconcave:

∇Φ(p) · (q− p)≥ Φ(q)−Φ(p).

If ϕ = logΦ, then

log�

∇ϕ(p) ·qp

�

︸︷︷︸

relative log return

≥ ϕ(q)−ϕ(p)︸︷︷︸

change in diversity

.

This is an example of Fernholz’s functionally generated portfolio.É Pal and W. (2015+): Optimal transport & information geometry.

7 /21

Theoretical explanation using SPTIn a simplified Itô process model for the equity market, we have

logZπ(t)Zµ(t)

= ϕ(µ(t))−ϕ(µ(0)) +Θ(t),

where Θ(t) is an increasing process reflecting market volatility.

t

log(Zπ/Zµ)

Θ(t)

From this formula, the relative performance of the portfoliocorrelates with change in diversity.

8 /21

Modeling the capital distribution curveMathematical approachÉ Ranked-based models: an SDE system of interacting particles,

each representing the market cap of a firm. A simple version:

d (Firmi(t)) = γri(t)dt+σri(t)dWi(t),

where ri(t) is the rank of firm i at time t.

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.5

1.0

1.5

2.0

Source (RHS): Fernholz, Ichiba and Karatzas (2013)

On the other hand Audrino Fernholz and Ferretti (2007) proposed andtested a model for forecasting market diversity.

9 /21

Data

We are interested in dynamic modeling and forecasting, but as afirst step we consider exploratory data analysis.É Data source: CRSP (The Center for Research in Security

Prices)É Prepared and used by Johannes Ruf and Desmond Xie (2019)É Daily data from 1962 to 2016.É Market cap and daily returns for all US stocks.

We focus on the largest 1000 stocks.É Ranked-basedÉ Evolving universe, missing data.É For simplicity we use monthly data in this study (reasonable

time scale for large cap portfolio managers).É Renormalize to get (relative) (ranked) market weights{µ≤(t)} ⊂∆1000. May be regarded as a functional time series.

10 /21

Principal component analysis

Performs dimension reduction by projecting the data linearly to alow dimensional subspace. A classic application in finance is yieldcurve modeling; the eigenvectors have useful interpretations.

Left: Wikipedia. Right: https://quant.stackexchange.com

11 /21

https://quant.stackexchange.com

Reminder of PCA

Suppose we observe data x1, . . . ,xN ∈ RD. PCA aims to find alow-dimensional y=Wx ∈ Rd, d< D, that approximates x:

minU∈L(Rd,RD),W∈L(RD,Rd)

N∑

i=1

‖xi −UWxi‖22.

Here W and U are linear maps (i.e., matrices).

TheoremLet A=

∑Ni=1 xix

>i , and let u1, . . . ,ud be the eigenvectors

corresponding to the largest d eigenvectors of A. Then the PCAproblem is solved with

U = [u1, . . . ,ud], W = U>.

12 /21

Plain vanilla PCA does not work well

Since our data lies in the unit simplex, Euclidean PCA may (anddoes) not work well.

0 200 400 600 800

0.0

0.1

0.2

0.3

0.4

0.5

1

0 200 400 600 800

−0.

5−

0.3

−0.

10.

1

2

0 200 400 600 800−

0.2

0.0

0.2

0.4

3

0 200 400 600 800

−0.

40.

00.

20.

40.

60.

8

4

0 200 400 600 800

−0.

6−

0.4

−0.

20.

00.

2

5

1970 1980 1990 2000 201055

060

065

070

0

The blue series approximates the diversity using 5 eigenvectors. Itdoes not track the short term fluctuations of the diversity.

13 /21

Compositional data analysis, simplicial PCAWe will apply a version of PCA using the Aitchison geometry.

Source: Aitchison (1982)

14 /21

Aitchison geometry

Consider the open unit simplex

∆n := {p= (p1, . . . , pn) : p1 + · · ·+ pn = 1}.

Define the closure operator

C[x] :=�

x1

x1 + · · ·+ xn, . . . ,

xn

x1 + · · ·+ xn

�

, x ∈ (0,∞)n.

TheoremDefine the operations

p⊕ q := C[p1q1, . . . , pnqn] (perturbation)

λ⊗ p := C[pλ1 , . . . , pλn] (powering)

Then (∆n,⊕,⊗) is an (n− 1)-dimensional real vector space.

15 /21

Aitchison geometryTheorem (Simplex as a Hilbert space)s Define

⟨p, q⟩A :=n∑

i=1

logpi

g(p)log

qi

g(q),

where g(x) = (x1 · · ·xn)1/n is the geometric mean. Then ⟨·, ·⟩A is aninner product on (∆n,⊕,⊗).

16 /21

Isometric-log-ratio transformConsider the orthonormal basis

ei = C

�

exp

�√

√ 1i(i+ 1)

, . . . ,

√

√ 1i(i+ 1)

,−√

√ ii(i+ 1)

, 0, . . . , 0

��

,

where i= 1, . . . , n− 1.

Definition (Egozcue et al (2003))We define ilr :∆n→ Rn−1 by

x= ilr(p) := (⟨p,e1⟩A, . . . , ⟨p,en−1⟩A), xi =

√

√ ii+ 1

log(p1 · · ·pi)1/i

pi+1.

●

●●

●

●

●

●

●●

●

●

●

●●

●

●

●

●

●

●●

●●

●

●

●

●

●● ●

●●

●

●

●

●

●

●

●

●

●●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Original data on simplex

●

●●

●

●

●

●

●

●

●

●

●●●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●

●●

●

●

●

●●

●

●

●●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●●●●

● ●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

0 1 2 3 4 5

12

34

5

Data after ilr−transform

17 /21

Simplicial PCAAitchison (1982) uses instead the centered-log-ratio (clr)transform. Here we use the ilr-transform. Procedure:

original data→ ilr→ Euclidean PCA→ inverse-ilr

Eigenvectors (in ilr-space) using the latest 20 years of data:

0 200 400 600 800

−0.

035

−0.

025

−0.

015

−0.

005

1

0 200 400 600 800−

0.04

0.00

0.04

2

0 200 400 600 800

−0.

15−

0.05

0.05

3

0 200 400 600 800

−0.

100.

000.

10

4

0 200 400 600 800

−0.

15−

0.05

0.05

5

18 /21

Interpretations of the eigenvectors

0 1 2 3 4 5 6 7

−9

−8

−7

−6

−5

−4

−3

Perturbation wrt 1st eigenvector

log rank

log

wei

ght

0 1 2 3 4 5 6 7

−9

−8

−7

−6

−5

−4

−3

Perturbation wrt 2nd eigenvector

log rank

log

wei

ght

0 1 2 3 4 5 6 7

−9

−8

−7

−6

−5

−4

−3

Perturbation wrt 3rd eigenvector

log rank

log

wei

ght

0 1 2 3 4 5 6 7

−9

−8

−7

−6

−5

−4

−3

Perturbation wrt 4th eigenvector

log rank

log

wei

ght

The shapes of the eigenvectors (except the first one and perhapsthe 2nd) fluctuate over time. This may indicate some structuralchanges in the market despite stability of the capital distribution.

19 /21

Approximation of market diversity

Again we use only the first 5 eigenvectors. The simplex PCAperforms much better than the Euclidean PCA in capturing themarket diversity.

1965 1970 1975 1980

600

620

640

660

680

700

1970 1980 1990 2000 201055

060

065

070

0

Intuitively, the tracjectory of µ≥(·) is close to a 5-dimensionalsubmanifold of ∆n. Right: Result using the entire dataset.

20 /21

Future directions

É Further study of the geometry in relation to the properties ofthe dataÉ Financial interpretationsÉ Dynamic statistical modelsÉ ForecastingÉ Portfolio optimizationÉ Modeling of volatility using (information) geometric ideaÉ Combine with mathematical approaches in SPT

21 /21

capital distribution of equity market and its statistical

Documents