TRANSCRIPT
Tutorial: Financial Econometrics/Statistics
2005 SAMSI program on Financial Mathematics, Statistics, and Econometrics
Goal
At the index level
Part I: Modeling
... in which we see what basic properties of stock prices/indices we want to capture
Contents
Returns and their (static) properties
Pricing models
Time series properties of returns
Why returns?
Prices are generally found to be non-stationary
Makes life difficult (or simpler...)
Traditional statistics prefers stationary data
Returns are found to be stationary
Which returns?
Two types of returns can be defined
Discrete compounding: R_t = P_t / P_{t-1} - 1
Continuous compounding: R_t = log(P_t / P_{t-1})
Discrete compounding
If you make 10% on half of your money and 5% on the other half, you have in total 7.5%
Discrete compounding is additive over portfolio formation
Continuous compounding
If you made 3% during the first half year and 2% during the second part of the year, you made (exactly) 5% in total
Continuous compounding is additive over time
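The two additivity properties can be verified numerically; a minimal sketch using the slide's own example numbers:

```python
import math

# Discrete compounding is additive over portfolio formation:
# 10% on half of the money and 5% on the other half gives 7.5% in total.
portfolio_return = 0.5 * 0.10 + 0.5 * 0.05
print(portfolio_return)        # ≈ 0.075

# Continuous compounding is additive over time:
# a 3% log return in the first half year and 2% in the second
# gives exactly 5% over the full year.
p0 = 100.0
p2 = p0 * math.exp(0.03) * math.exp(0.02)
annual_log_return = math.log(p2 / p0)
print(annual_log_return)       # ≈ 0.05
```

Note that the converse statements fail: discrete returns do not add over time, and log returns do not add over portfolio weights.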
Empirical properties of returns
            Mean    St.dev.  Ann. vol.  Skewness  Kurtosis  Min      Max
IBM         -0.0%   2.46%    39.03%     -23.51    1124.61   -138%    12.4%
IBM (corr)   0.0%   1.64%    26.02%     -0.28     15.56     -26.1%   12.4%
S&P          0.0%   0.95%    15.01%     -1.4      39.86     -22.9%   8.7%
Data period: July 1962- December 2004; daily frequency
Stylized facts
Expected returns difficult to assess
What’s the ‘equity premium’?
Index volatility < individual stock volatility
Negative skewness
Crash risk
Large kurtosis
Fat tails (thus EVT analysis?)
Pricing models
Finance considers the final value X of an asset to be 'known' as a random variable
In such a setting, finding the price P of the asset is equivalent to finding its expected return:
E[R] = E[X] / P - 1, i.e., P = E[X] / (1 + E[R])
Pricing models 2
As a result, pricing models model expected returns ...
... in terms of known quantities or a few ‘almost known’ quantities
Capital Asset Pricing Model
One of the best known pricing models
The theorem/model states
E[R_{i,t}] - r_f = β_i (E[R_{m,t}] - r_f)
with β_i = Cov(R_{i,t}, R_{m,t}) / Var(R_{m,t})
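The CAPM beta above is just a ratio of a covariance to a variance, so it can be estimated directly from return data; a minimal sketch on simulated (illustrative, not real) excess returns:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
r_m = rng.normal(0.0005, 0.01, n)                # market excess returns
true_beta = 1.2                                  # illustrative value
r_i = true_beta * r_m + rng.normal(0, 0.02, n)   # asset excess returns

# beta_i = Cov(R_i, R_m) / Var(R_m)
beta_hat = np.cov(r_i, r_m)[0, 1] / np.var(r_m, ddof=1)
print(f"beta_hat = {beta_hat:.3f}")
```

With 10,000 observations the estimate lands close to the true value of 1.2.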
Black-Scholes
Also Black-Scholes is a pricing model
(Exact) contemporaneous relation between asset prices/returns
Call price = BS(stock price, moneyness, volatility)
Time series properties of returns
Traditionally model fitting exercise without much finance
mostly univariate time series and, thus, less scope for the 'traditional' cross-sectional pricing models
lately more finance theory is integrated
Focuses on the dynamics/dependence in returns
Random walk hypothesis
Standard paradigm in the 1960s-1970s
Prices follow a random walk
Returns are i.i.d.
Normality often imposed as well
Compare Black-Scholes assumptions
Box-Jenkins analysis
Linear time series analysis
Box-Jenkins analysis generally identifies a white noise
This has long been taken as support for the random walk hypothesis
Recent developments
Some autocorrelation effects in ‘momentum’
Some (linear) predictability
Largely academic discussion
Higher moments and risk
Risk predictability
There is strong evidence for autocorrelation in squared returns
also holds for other powers
‘volatility clustering’
While direction of change is difficult to predict, (absolute) size of change is
risk is predictable
The ARCH model
First model to capture this effect
No mean effects for simplicity
ARCH in mean
R_t = σ_t ε_t, ε_t ~ i.i.d. N(0, 1)
σ_t^2 = ω + α R_{t-1}^2
ARCH properties
Uncorrelated returns
martingale difference returns
Correlated squared returns
with a limited set of possible patterns
Symmetric distribution if innovations are symmetric
Fat tailed distribution, even if innovations are not
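The first two properties can be checked by simulation; a minimal sketch of an ARCH(1) path with illustrative parameters (ω = 0.1, α = 0.4), assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(1)
omega, alpha, n = 0.1, 0.4, 100_000

r = np.zeros(n)
for t in range(1, n):
    sigma2 = omega + alpha * r[t - 1] ** 2       # conditional variance
    r[t] = np.sqrt(sigma2) * rng.standard_normal()

def acf1(x):
    # lag-1 sample autocorrelation
    x = x - x.mean()
    return (x[1:] * x[:-1]).mean() / (x * x).mean()

print(f"lag-1 acf of returns:         {acf1(r):+.3f}")   # near zero
print(f"lag-1 acf of squared returns: {acf1(r**2):+.3f}")  # clearly positive
```

The returns themselves are (nearly) uncorrelated, while the squared returns show the volatility clustering the model was built to capture.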
The GARCH model
Generalized ARCH
Beware of time indices ...
R_t = σ_t ε_t, ε_t ~ i.i.d. N(0, 1)
σ_t^2 = ω + α R_{t-1}^2 + β σ_{t-1}^2
GARCH model
Parsimonious way to describe various correlation patterns
for squared returns
Higher-order extension trivial
Math-stat analysis not that trivial
See inference section later
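Parsimony can be illustrated by simulation: with just three parameters (values below are illustrative), GARCH(1,1) produces slowly decaying autocorrelation in squared returns. A minimal sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
omega, alpha, beta, n = 0.05, 0.10, 0.85, 200_000

r = np.zeros(n)
sigma2 = np.full(n, omega / (1 - alpha - beta))  # start at unconditional variance
for t in range(1, n):
    sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

def acf(x, lag):
    # sample autocorrelation at the given lag
    x = x - x.mean()
    return (x[lag:] * x[:-lag]).mean() / (x * x).mean()

for lag in (1, 10):
    print(f"acf of squared returns at lag {lag:2d}: {acf(r**2, lag):.3f}")
```

The persistence α + β = 0.95 governs how slowly these autocorrelations decay.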
Stochastic volatility models
Use latent volatility process
R_t = exp(h_t / 2) ε_t
h_t = ω + β h_{t-1} + η_t
(ε_t, η_t)' ~ i.i.d. N(0, Σ)
Stochastic volatility models
Also SV models lead to volatility clustering
Leverage
Negative innovation correlation means that volatility increases and price decreases go together
Negative return/volatility correlation
(One) structural story: default risk
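The leverage effect can be reproduced in a discrete-time SV simulation by giving the return and log-variance innovations a negative correlation; a minimal sketch with illustrative parameters (persistence φ = 0.98, innovation correlation ρ = -0.6), assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
n, phi, omega, tau, rho = 100_000, 0.98, -0.02, 0.15, -0.6

h = np.zeros(n)   # log variance
r = np.zeros(n)   # returns
for t in range(1, n):
    eps = rng.standard_normal()
    # volatility innovation, correlated with the return innovation
    eta = rho * eps + np.sqrt(1 - rho**2) * rng.standard_normal()
    r[t] = np.exp(h[t - 1] / 2) * eps
    h[t] = omega + phi * h[t - 1] + tau * eta

# Leverage: negative returns go together with higher (log) volatility
corr = np.corrcoef(r[1:], h[1:])[0, 1]
print(f"corr(return, log variance) = {corr:.2f}")
```

The sample correlation comes out negative: price decreases and volatility increases go together, as the slide states.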
Continuous time modeling
Mathematical finance uses continuous time, mainly for ‘simplicity’
Compare asymptotic statistics as approximation theory
Empirical finance (at least originally) focused on discrete time models
Consistency
The volatility clustering and other empirical evidence is consistent with appropriate continuous time models
A simple continuous time stochastic volatility model
dS_t = μ S_t dt + σ_t S_t dW_t^(1)
d ln σ_t^2 = α dt + β dW_t^(2)
Approximation theory
There is a large literature that deals with the approximation of continuous time stochastic volatility models with discrete time models
Important applications
Inference
Simulation
Pricing
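The simplest such approximation is the Euler scheme; a minimal sketch discretizing the continuous-time SV model above (all drift/diffusion parameter values are illustrative), assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 1.0, 10_000               # horizon (years) and number of Euler steps
dt = T / n
mu, alpha, beta = 0.05, -0.1, 0.3

s = np.empty(n + 1); s[0] = 100.0             # stock price S_t
lv = np.empty(n + 1); lv[0] = np.log(0.04)    # log variance, ln sigma_t^2
for t in range(n):
    # independent Brownian increments for the two driving motions
    dW1, dW2 = rng.standard_normal(2) * np.sqrt(dt)
    sigma = np.exp(lv[t] / 2)
    s[t + 1] = s[t] + mu * s[t] * dt + sigma * s[t] * dW1   # dS = mu S dt + sigma S dW1
    lv[t + 1] = lv[t] + alpha * dt + beta * dW2             # d ln sigma^2 = alpha dt + beta dW2

print(f"terminal price: {s[-1]:.2f}")
```

The resulting path is a discrete-time SV model; refining dt recovers the continuous-time dynamics, which is what the approximation literature makes precise.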
Other asset classes
So far we only discussed stock(indices)
Stock derivatives can be studied using derivative pricing models
Financial econometrics also deals with many other asset classes
Term structure (including credit risk)
Commodities
Mutual funds
Energy markets
...
Term structure modeling
Model a complete curve at a single point in time
There exist models
in discrete/continuous time
descriptive/pricing
for standard interest rates/derivatives
...
Part 2: Inference
Contents
Parametric inference for ARCH-type models
Rank based inference
Analogy principle
The classical approach to estimation is based on the analogy principle
if you want to estimate an expectation, take an average
if you want to estimate a probability, take a frequency
...
Moment estimation (GMM)
Consider an ARCH-type model: R_t = σ_t(θ) ε_t
We suppose that σ_t(θ) can be calculated on the basis of past observations if θ is known
Moment condition: E[R_t^2 - σ_t^2(θ)] = 0
Moment estimation - 2
The estimator θ̂_n now is taken to solve
(1/n) Σ_{t=1}^n (R_t^2 - σ_t^2(θ̂_n)) = 0
In case of "underidentification": use instruments
In case of "overidentification": minimize distance-to-zero
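For ARCH(1) the moment conditions can be solved in closed form: since R_t^2 follows an AR(1)-type recursion with coefficient α, the lag-1 autocorrelation of squared returns identifies α and the mean of R_t^2 identifies ω. A minimal sketch with illustrative true parameters, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(5)
omega, alpha, n = 0.2, 0.3, 200_000   # illustrative true parameters

# simulate ARCH(1): R_t = sqrt(omega + alpha R_{t-1}^2) eps_t
r = np.zeros(n)
for t in range(1, n):
    r[t] = np.sqrt(omega + alpha * r[t - 1] ** 2) * rng.standard_normal()

r2 = r**2
x = r2 - r2.mean()
alpha_hat = (x[1:] * x[:-1]).mean() / (x * x).mean()   # lag-1 acf of R^2
omega_hat = r2.mean() * (1 - alpha_hat)                # from E[R^2] = omega/(1-alpha)

print(f"alpha_hat = {alpha_hat:.3f}, omega_hat = {omega_hat:.3f}")
```

Both estimates land close to the true values (0.3 and 0.2), though moment estimators of this kind are typically less efficient than likelihood-based ones.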
Likelihood estimation
In case the density of the innovations is known, say it is f, one can write down the density/likelihood of the observed returns:
Π_{t=1}^n (1/σ_t(θ)) f(R_t / σ_t(θ))
Estimator: maximize this
Doing the math ...
Maximizing the log-likelihood boils down to solving
-(1/2) Σ_{t=1}^n ∂ log σ_t^2(θ)/∂θ [1 + ε_t(θ) f'(ε_t(θ)) / f(ε_t(θ))] = 0
with ε_t(θ) = R_t / σ_t(θ)
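In practice one maximizes the log-likelihood numerically rather than solving the score equation by hand; a minimal sketch of Gaussian (quasi-)ML for ARCH(1), with illustrative parameters and scipy.optimize assumed available:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
omega, alpha, n = 0.2, 0.3, 50_000   # illustrative true parameters

# simulate ARCH(1) data
r = np.zeros(n)
for t in range(1, n):
    r[t] = np.sqrt(omega + alpha * r[t - 1] ** 2) * rng.standard_normal()

def neg_loglik(theta):
    # negative Gaussian log-likelihood, up to additive constants
    w, a = theta
    sigma2 = w + a * r[:-1] ** 2        # conditional variances for t = 2..n
    return 0.5 * np.sum(np.log(sigma2) + r[1:] ** 2 / sigma2)

res = minimize(neg_loglik, x0=[0.1, 0.1],
               bounds=[(1e-6, None), (0.0, 0.999)])
print(f"omega_hat = {res.x[0]:.3f}, alpha_hat = {res.x[1]:.3f}")
```

Because the innovations here really are Gaussian, this is full ML; with other innovation densities the same code would be the Gaussian QMLE discussed below.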
Efficiency consideration
Which of the above estimators is “better”?
Analysis using Hájek-Le Cam theory of asymptotic statistics
Approximate complicated statistical experiment with very simple ones
Something which works well in the approximating experiment, will also do well in the original one
Quasi MLE
In order for maximum likelihood to work, one needs the density of the innovations
If this is not known, one can guess a density (e.g., the normal)
This is known as
ML under non-standard conditions (Huber)
Quasi maximum likelihood
Pseudo maximum likelihood
Will it work?
For ARCH-type models, postulating the Gaussian density can be shown to lead to consistent estimates
There is a large theory on when this works or not
We say “for ARCH-type models the Gaussian distribution has the QMLE property”
The QMLE pitfall
One often sees people referring to Gaussian MLE
Then, they remark that we know financial innovations are fat-tailed ...
... and they switch to t-distributions
The t-distribution does not possess the QMLE property (but, see later)
How to deal with SV-models?
The SV models look the same: R_t = σ_t ε_t
But now, σ_t is a latent process and hence not observed
Likelihood estimation still works "in principle", but unobserved variances have to be integrated out
Inference for continuous time models
Continuous time inference can, in theory, be based on
continuous record observations
discretely sampled observations
Essentially all known approaches are based on approximating discrete time models
Rank based inference
... in which we discuss the main ideas of rank based inference
The statistical model
Consider a model where 'somewhere' there exist i.i.d. random errors (ε_t)_{t=1,...,n}
The observations are (Y_t)_{t=1,...,n}
The parameter of interest is some θ ∈ Θ ⊂ R^p
We denote the density of the errors by f
Formal model
We have an outcome space R^{n×k}, with n the number of observations and k the dimension of Y_t
Take standard Borel sigma-fields
Model for sample size n: E^(n) = (R^{n×k}, B(R^{n×k}), {P^(n)_{θ,f} : θ ∈ Θ, f ∈ F})
Asymptotics refer to n → ∞
Example: Linear regression
Linear regression model Y_i = θ' X_i + ε_i
(with observations (Y_i, X_i)_{i=1,...,n})
Innovation density f and cdf F
Example ARCH(1)
Consider the standard ARCH(1) model Y_t = σ_t(θ) ε_t with σ_t^2(θ) = θ_0 + θ_1 Y_{t-1}^2
Innovation density f and cdf F
Maintained hypothesis
For given θ and sample size n, the innovations (ε_t)_{t=1,...,n} can be calculated from the observations (Y_t)_{t=1,...,n}
For cross-sectional models one may even often write ε_i = ε_i(Y_i; θ)
Latent variable (e.g., SV) models ...
Innovation ranks
The ranks R_1, ..., R_n are the ranks of the innovations ε_1, ..., ε_n
We also write R_1(θ), ..., R_n(θ) for the ranks of the innovations ε_1(θ), ..., ε_n(θ) based on a value θ for the parameter of interest
Ranks of observations are generally not very useful
Basic properties
The distribution L(R_1, ..., R_n; θ, f) does not depend on θ nor on f: it is uniform over the permutations of {1, ..., n}
This is (fortunately) not true for L(R_1(θ_0), ..., R_n(θ_0); θ, f), at least 'essentially'
Invariance
Suppose we generate the innovations (ε_i)_{i=1,...,n} as transformation ε_i = F^{-1}(U_i)
with (U_i)_{i=1,...,n} i.i.d. standard uniform
Now, the ranks (R_i)_{i=1,...,n} are even invariant with respect to F
Reconstruction
For large sample size n we have U_i ≈ R_i / (n + 1)
and, thus, ε_i ≈ F^{-1}(R_i / (n + 1))
Rank based statistics
The idea is to apply whatever procedure you have that uses the innovations to the innovations reconstructed from the ranks
This makes the procedure robust to distributional changes
Efficiency loss due to the '≈'?
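The reconstruction ε_i ≈ F^{-1}(R_i / (n + 1)) is easy to carry out; a minimal sketch with Gaussian innovations and the Gaussian as reference density, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 5_000
eps = rng.standard_normal(n)               # the 'true' innovations

ranks = eps.argsort().argsort() + 1        # ranks R_1, ..., R_n in 1..n
eps_rec = norm.ppf(ranks / (n + 1))        # eps_i ≈ F^{-1}(R_i / (n + 1))

corr = np.corrcoef(eps, eps_rec)[0, 1]
print(f"corr(eps, reconstructed eps) = {corr:.3f}")
```

The reconstructed innovations track the true ones almost perfectly, and they depend on the data only through the ranks, which is what delivers the distributional invariance below.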
Rank based autocorrelations
Time-series properties can be studied using rank based autocorrelations
r_f^(n)(l) = (1/n) Σ_{t=l+1}^n F^{-1}(R_t / (n + 1)) F^{-1}(R_{t-l} / (n + 1))
These can be interpreted as 'standard' autocorrelations
rank based
for given reference density f and distribution free
Robustness
An important property of rank based statistics is the distributional invariance
As a result: a rank based estimator θ̂_RB is consistent for any reference density f
All densities satisfy the QMLE property when using rank based inference
Limiting distribution
The limiting distribution of θ̂_RB depends on both the chosen reference density f and the actual underlying density g
The optimal choice for the reference density is the actual density f = g
How 'efficient' is this estimator?
Semiparametrically efficient
Remark
All procedures are distribution free with respect to the innovation density f
They are, clearly, not distribution free with respect to the parameter of interest θ
Signs and ranks
Why ranks?
So far, we have been considering ‘completely’ unrestricted sets of innovation densities
For this class of densities ranks are ‘maximal invariant’
This is crucial for proving semiparametric efficiency
Alternatives
Alternative specifications may impose
zero-median innovations
symmetric innovations
zero-mean innovations
This is generally a bad idea ...
Zero-median innovations
The maximal invariant now becomes the ranks and the signs s_t = sign(ε_t) of the innovations
The ideas remain the same, but for a more precise reconstruction
Split the sample of innovations in a positive and a negative part and treat those separately
But ranks are still ...
Yes, the ranks are still invariant
... and the previous results go through
But the efficiency bound has now changed and rank based procedures are no longer semiparametrically efficient
... but sign-and-rank based procedures are
Symmetric innovations
In the symmetric case, the signed-ranks become maximal invariant
signs of the innovations
ranks of the absolute values
The reconstruction now becomes still more precise (and efficient)
Semiparametric efficiency
General result
Using the maximal invariant to reconstitute the central sequence leads to semiparametrically efficient inference
in the model for which this maximal invariant is derived
In general: use E_{θ,f}[ Δ_f^(n)(θ) | maximal invariant ]
Proof
The proof is non-trivial, but some intuition can be given using tangent spaces