1
Maximum Likelihood Estimation for Allpass Time Series Models
Richard A. Davis
Department of Statistics, Colorado State University
http://www.stat.colostate.edu/~rdavis/lectures/magdeburg02.pdf
Joint work with Jay Breidt (Colorado State University), Alex Trindade (University of Florida), and Beth Andrews (Colorado State University)
2
Introduction
  • properties of financial time series
  • motivating example
  • all-pass models and their properties
Estimation
  • likelihood approximation
  • MLE and LAD
  • asymptotic results
  • order selection
Empirical results
  • simulation
  • NZ/USA exchange rates
Noninvertible MA processes
  • preliminaries
  • a two-step estimation procedure
  • Microsoft trading volume
Summary
3
Financial Time Series
Log returns, $X_t = 100\,(\ln P_t - \ln P_{t-1})$, of financial assets often exhibit:
  • heavy-tailed marginal distributions: $P(|X_1| > x) \sim C x^{-\alpha}$, $0 < \alpha < 4$.
  • lack of serial correlation: $\hat\rho_X(h)$ near 0 for all lags $h > 0$ (MGD sequence)
  • $|X_t|$ and $X_t^2$ have slowly decaying autocorrelations: $\hat\rho_{|X|}(h)$ and $\hat\rho_{X^2}(h)$ converge to 0 slowly as $h \to \infty$
  • process exhibits 'stochastic volatility'
Nonlinear models: $X_t = \sigma_t Z_t$, $\{Z_t\} \sim \mathrm{IID}(0,1)$
  • ARCH and its variants (Engle 1982; Bollerslev, Chou, and Kroner 1992)
  • Stochastic volatility (Clark 1973; Taylor 1986)
4
Motivating example: 500 daily log-returns of NZ/US exchange rate
[Figure: time series X(t) for t = 0 to 500; sample ACF of X, ACF of |X|, and ACF of |X|^2 for lags h = 0 to 50]
5
All-pass model of order 2 (t3 noise)
[Figure: realization X(t) for t = 0 to 1000; ACF of the all-pass series and ACF of its squares for lags 0 to 40, model versus sample]
6
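All-pass Models
Causal AR polynomial: $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$, $\phi(z) \neq 0$ for $|z| \le 1$.
Define MA polynomial:
$$\theta(z) = -z^p\,\phi(z^{-1})/\phi_p = -(z^p - \phi_1 z^{p-1} - \cdots - \phi_p)/\phi_p \neq 0 \text{ for } |z| \ge 1$$
(MA polynomial is non-invertible).
Model for data $\{X_t\}$: $\phi(B)X_t = \theta(B)Z_t$, $\{Z_t\} \sim \mathrm{IID}$ (non-Gaussian), where $B^k X_t = X_{t-k}$.
Examples:
All-pass(1): $X_t - \phi X_{t-1} = Z_t - \phi^{-1} Z_{t-1}$, $|\phi| < 1$.
All-pass(2): $X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2} = Z_t + (\phi_1/\phi_2)\,Z_{t-1} - (1/\phi_2)\,Z_{t-2}$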
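The All-pass(1) example can be simulated directly from its defining recursion. The sketch below is only an illustration: the function names `simulate_allpass1` and `sample_acf`, the burn-in length, and the choice of Laplace noise (used here instead of t3 so that the sample ACF of the squares is stable) are assumptions, not part of the slides.

```python
import random

def simulate_allpass1(phi, n, seed=0, burn=200):
    """Simulate X_t - phi*X_{t-1} = Z_t - Z_{t-1}/phi with iid Laplace noise."""
    rng = random.Random(seed)
    z_prev, x_prev, out = 0.0, 0.0, []
    for t in range(n + burn):
        # The difference of two iid exponentials is Laplace distributed.
        z = rng.expovariate(1.0) - rng.expovariate(1.0)
        x = phi * x_prev + z - z_prev / phi
        if t >= burn:
            out.append(x)
        z_prev, x_prev = z, x
    return out

def sample_acf(x, h):
    """Lag-h sample autocorrelation."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    ch = sum((x[t] - m) * (x[t + h] - m) for t in range(n - h))
    return ch / c0

x = simulate_allpass1(0.5, 20000)
acf_x = sample_acf(x, 1)                     # close to 0: the series is uncorrelated
acf_sq = sample_acf([v * v for v in x], 1)   # clearly positive: the series is dependent
```

With non-Gaussian noise the lag-1 autocorrelation of the series should be near zero while that of the squares is not, which is the behaviour the slides attribute to all-pass models.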
7
Properties:
  • causal, non-invertible ARMA with MA representation
    $$X_t = \frac{B^p\,\phi(B^{-1})}{-\phi_p\,\phi(B)}\,Z_t = \sum_{j=0}^{\infty}\psi_j Z_{t-j}$$
  • uncorrelated (flat spectrum)
    $$f_X(\omega) = \frac{|e^{-ip\omega}|^2\,|\phi(e^{i\omega})|^2}{\phi_p^2\,|\phi(e^{-i\omega})|^2}\,\frac{\sigma^2}{2\pi} = \frac{\sigma^2}{\phi_p^2\,2\pi}$$
  • zero mean
  • data are dependent if noise is non-Gaussian (e.g. Breidt & Davis 1991).
  • squares and absolute values are correlated.
  • $X_t$ is heavy-tailed if noise is heavy-tailed.
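The flat spectrum can be checked numerically through the MA(∞) weights: the autocovariances $\sum_j \psi_j\psi_{j+h}$ should vanish for $h \ge 1$, and the variance should equal $\sigma^2/\phi_p^2$. The sketch below assumes unit noise variance and uses the hypothetical helper names `psi_weights` and `acvf`; the truncation at 200 terms is an arbitrary choice.

```python
def psi_weights(phi, n_terms=200):
    """MA(infinity) weights of the causal all-pass model phi(B) X_t = theta(B) Z_t."""
    p = len(phi)
    # theta(z) = -(z^p - phi_1 z^{p-1} - ... - phi_p) / phi_p, listed by power of z.
    theta = [1.0] + [phi[p - 1 - k] / phi[-1] for k in range(1, p)] + [-1.0 / phi[-1]]
    psi = []
    for j in range(n_terms):
        th = theta[j] if j <= p else 0.0
        # psi_j = theta_j + sum_{k=1}^{min(j,p)} phi_k psi_{j-k}
        psi.append(th + sum(phi[k] * psi[j - 1 - k] for k in range(min(j, p))))
    return psi

def acvf(psi, h):
    """Autocovariance at lag h for unit noise variance (truncated sum)."""
    return sum(psi[j] * psi[j + h] for j in range(len(psi) - h))

psi = psi_weights([0.3, 0.4])      # All-pass(2) with phi_1 = .3, phi_2 = .4
gamma0 = acvf(psi, 0)              # should be close to 1 / phi_2^2 = 6.25
gamma1, gamma2 = acvf(psi, 1), acvf(psi, 2)
```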
8
Estimation for All-Pass Models
Second-order moment techniques do not work
  • least squares
  • Gaussian likelihood
Higher-order cumulant methods
  • Giannakis and Swami (1990)
  • Chi and Kung (1995)
Non-Gaussian likelihood methods
  • likelihood approximation
  • quasi-likelihood
  • least absolute deviations
  • minimum dispersion
9
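Approximating the likelihood
Data: $(X_1, \ldots, X_n)$
Model:
$$X_t = \phi_{01}X_{t-1} + \cdots + \phi_{0p}X_{t-p} - (Z_{t-p} - \phi_{01}Z_{t-p+1} - \cdots - \phi_{0p}Z_t)/\phi_{0r}$$
where $\phi_{0r}$ is the last non-zero coefficient among the $\phi_{0j}$'s.
Noise:
$$z_{t-p} = \phi_{01}z_{t-p+1} + \cdots + \phi_{0p}z_t - (X_t - \phi_{01}X_{t-1} - \cdots - \phi_{0p}X_{t-p}),$$
where $z_t = Z_t/\phi_{0r}$.
More generally define,
$$z_{t-p}(\boldsymbol\phi) = \begin{cases} 0, & \text{if } t = n+p, \ldots, n+1,\\ \phi_1 z_{t-p+1}(\boldsymbol\phi) + \cdots + \phi_p z_t(\boldsymbol\phi) - \phi(B)X_t, & \text{if } t = n, \ldots, p+1. \end{cases}$$
Note: $z_t(\boldsymbol\phi_0)$ is a close approximation to $z_t$ (initialization error)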
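The backward recursion for $z_t(\boldsymbol\phi)$ translates almost line by line into code. The following is a minimal sketch, assuming the data are held in a plain list; the function name `allpass_residuals` is illustrative and not from the slides.

```python
def allpass_residuals(x, phi):
    """Backward recursion for z_1(phi), ..., z_{n-p}(phi).

    x   : observations X_1, ..., X_n
    phi : AR coefficients [phi_1, ..., phi_p]
    """
    n, p = len(x), len(phi)
    X = [0.0] + list(x)            # 1-based indexing: X[t] = X_t
    z = [0.0] * (n + 1)            # z_{n-p+1}, ..., z_n are initialized to 0
    for t in range(n, p, -1):      # t = n, ..., p+1
        ar = X[t] - sum(phi[j] * X[t - 1 - j] for j in range(p))        # phi(B) X_t
        z[t - p] = sum(phi[j] * z[t - p + 1 + j] for j in range(p)) - ar
    return z[1:n - p + 1]

# Tiny usage example with made-up numbers.
z_demo = allpass_residuals([0.2, -1.0, 0.5, 0.3], [0.5])
```

Evaluated at the true parameter, the output should reproduce $Z_t/\phi_{0r}$ up to an initialization error that decays geometrically as the recursion moves away from the end of the sample.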
10
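Assume that $Z_t$ has density function $f_\sigma$ and consider the vector
$$\mathbf{z} = \big(\underbrace{X_{1-p}, \ldots, X_0, z_{1-p}(\boldsymbol\phi), \ldots, z_0(\boldsymbol\phi)},\ \underbrace{z_1(\boldsymbol\phi), \ldots, z_{n-p}(\boldsymbol\phi)},\ \underbrace{z_{n-p+1}(\boldsymbol\phi), \ldots, z_n(\boldsymbol\phi)}\big)'$$
(three independent pieces).
Joint density of $\mathbf{z}$:
$$h(\mathbf{z}) = h_1\big(X_{1-p}, \ldots, X_0, z_{1-p}(\boldsymbol\phi), \ldots, z_0(\boldsymbol\phi)\big)\,\Big(\prod_{t=1}^{n-p} f_\sigma\big(\phi_q z_t(\boldsymbol\phi)\big)\,|\phi_q|\Big)\,h_2\big(z_{n-p+1}(\boldsymbol\phi), \ldots, z_n(\boldsymbol\phi)\big),$$
and hence the joint density of the data can be approximated by
$$h(\mathbf{x}) = \prod_{t=1}^{n-p} f_\sigma\big(\phi_q z_t(\boldsymbol\phi)\big)\,|\phi_q|,$$
where $q = \max\{0 \le j \le p : \phi_j \neq 0\}$.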
11
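Log-likelihood:
$$L(\boldsymbol\phi, \sigma) = -(n-p)\ln(\sigma/|\phi_q|) + \sum_{t=1}^{n-p}\ln f\big(\sigma^{-1}\phi_q z_t(\boldsymbol\phi)\big),$$
where $f_\sigma(z) = \sigma^{-1} f(z/\sigma)$.
Least absolute deviations: choose Laplace density
$$f(z) = \frac{1}{\sqrt 2}\exp(-\sqrt 2\,|z|)$$
and log-likelihood becomes
$$\text{constant} - (n-p)\ln\kappa - \sqrt 2\sum_{t=1}^{n-p}|z_t(\boldsymbol\phi)|/\kappa, \qquad \kappa = \sigma/|\phi_q|.$$
Concentrated Laplacian likelihood
$$l(\boldsymbol\phi) = \text{constant} - (n-p)\ln\sum_{t=1}^{n-p}|z_t(\boldsymbol\phi)|.$$
Maximizing $l(\boldsymbol\phi)$ is equivalent to minimizing the absolute deviations
$$m_n(\boldsymbol\phi) = \sum_{t=1}^{n-p}|z_t(\boldsymbol\phi)|.$$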
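The LAD criterion $m_n(\boldsymbol\phi)$ can be sketched in a few lines for the All-pass(1) case. Everything below is illustrative: the helper names, the Laplace noise, the sample size, and the crude grid search (used in place of a proper optimizer) are assumptions made for the example.

```python
import random

def allpass_residuals(x, phi):
    """z_1(phi), ..., z_{n-p}(phi) by the backward recursion, zero-initialized."""
    n, p = len(x), len(phi)
    X = [0.0] + list(x)                       # X[t] = X_t (1-based)
    z = [0.0] * (n + 1)                       # z_{n-p+1}, ..., z_n are set to 0
    for t in range(n, p, -1):                 # t = n, ..., p+1
        ar = X[t] - sum(phi[j] * X[t - 1 - j] for j in range(p))
        z[t - p] = sum(phi[j] * z[t - p + 1 + j] for j in range(p)) - ar
    return z[1:n - p + 1]

def m_n(x, phi):
    """LAD criterion: sum of absolute residuals."""
    return sum(abs(v) for v in allpass_residuals(x, phi))

# Simulate an all-pass(1) series with phi_0 = 0.5 and Laplace noise.
rng = random.Random(42)
phi0, n, burn = 0.5, 1000, 200
x, x_prev, z_prev = [], 0.0, 0.0
for t in range(n + burn):
    z = rng.expovariate(1.0) - rng.expovariate(1.0)   # Laplace draw
    x_t = phi0 * x_prev + z - z_prev / phi0
    if t >= burn:
        x.append(x_t)
    x_prev, z_prev = x_t, z

# Crude global search over a grid of causal values (phi = 0 excluded).
grid = [k / 100 for k in range(-95, 96) if k != 0]
phi_hat = min(grid, key=lambda ph: m_n(x, [ph]))
```

A grid search is only workable for very low orders; it is used here because it cannot be trapped in a local minimum.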
12
Assumptions
Assume $\{Z_t\}$ iid $f_\sigma(z) = \sigma^{-1}f(\sigma^{-1}z)$ with
  • $\sigma$ a scale parameter
  • mean 0, variance $\sigma^2$
For f known, use maximum likelihood
  • further smoothness assumptions (integrability, symmetry, etc.) on f
  • Fisher information: $\tilde I = \sigma^{-2}\int (f'(z))^2/f(z)\,dz$
For f unknown, use quasi-likelihood
Least absolute deviations
  • assume f has median 0
  • assume f continuous in neighborhood of 0
  • act as if f = Laplace to get criterion function
13
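Results
Let $\gamma(h)$ = ACVF of AR model with AR poly $\phi_0(\cdot)$ and $\Gamma_p = [\gamma(j-k)]_{j,k=1}^{p}$.
Maximum likelihood:
$$\sqrt n\,(\hat{\boldsymbol\phi}_{MLE} - \boldsymbol\phi_0) \xrightarrow{D} N\Big(0,\ \frac{1}{2(\sigma^2\tilde I - 1)}\,\sigma^2\Gamma_p^{-1}\Big)$$
Least absolute deviations:
$$\sqrt n\,(\hat{\boldsymbol\phi}_{LAD} - \boldsymbol\phi_0) \xrightarrow{D} N\Big(0,\ \frac{\mathrm{Var}(|Z_1|)}{2\sigma^4 f_\sigma^2(0)}\,\sigma^2\Gamma_p^{-1}\Big)$$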
14
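Further comments on MLE
Let $\boldsymbol\alpha = (\phi_1, \ldots, \phi_p,\ \sigma/|\phi_1|,\ \beta_1, \ldots, \beta_q)$, where $\beta_1, \ldots, \beta_q$ are the parameters of pdf f.
Set
  • $\hat I = \sigma_0^{-2}\int \dfrac{(f'(z;\boldsymbol\beta_0))^2}{f(z;\boldsymbol\beta_0)}\,dz$
  • $\hat K = \alpha_{0,p+1}^{-2}\Big\{\int z^2\,\dfrac{(f'(z;\boldsymbol\beta_0))^2}{f(z;\boldsymbol\beta_0)}\,dz - 1\Big\}$
  • $L = -\alpha_{0,p+1}^{-1}\int z\,\dfrac{f'(z;\boldsymbol\beta_0)}{f(z;\boldsymbol\beta_0)}\,\dfrac{\partial f(z;\boldsymbol\beta_0)}{\partial\boldsymbol\beta}\,dz$
  • $I_f(\boldsymbol\beta_0) = \int \dfrac{1}{f(z;\boldsymbol\beta_0)}\,\dfrac{\partial f(z;\boldsymbol\beta_0)}{\partial\boldsymbol\beta}\,\dfrac{\partial f^T(z;\boldsymbol\beta_0)}{\partial\boldsymbol\beta}\,dz$ (Fisher Information)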
15
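Under smoothness conditions on f wrt $\beta_1, \ldots, \beta_q$ we have
$$\sqrt n\,(\hat{\boldsymbol\alpha}_{MLE} - \boldsymbol\alpha_0) \xrightarrow{D} N(0, \Sigma^{-1}),$$
where
$$\Sigma^{-1} = \begin{pmatrix} \dfrac{1}{2(\sigma_0^2\hat I - 1)}\,\sigma_0^2\Gamma_p^{-1} & 0 & 0 \\ 0 & (\hat K - L' I_f^{-1} L)^{-1} & -\hat K^{-1} L' (I_f - L\hat K^{-1}L')^{-1} \\ 0 & -(I_f - L\hat K^{-1}L')^{-1} L \hat K^{-1} & (I_f - L\hat K^{-1}L')^{-1} \end{pmatrix}$$
Note: $\hat{\boldsymbol\phi}_{MLE}$ is asymptotically independent of $\hat\alpha_{p+1,MLE}$ and $\hat{\boldsymbol\beta}_{MLE}$.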
16
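Identifiability in LAD case?
  • Minimizer may not be unique.
  • Gaussian case: $\{z_t\}$ iid $N(0, \sigma_0^2\phi_{0p}^{-2}) = N(0, \sigma_1^2\phi_{1p}^{-2})$, so
    $$E|z_1(\boldsymbol\phi_0)| = E\Big|\frac{\sigma_0}{\sigma_0\,\phi_{0p}}Z_1\Big| = E\Big|\frac{\sigma_1}{\sigma_0\,\phi_{1p}}Z_1\Big| = E|z_1(\boldsymbol\phi_1)|$$
  • Consider $\{c_j\}$ with at least two non-zero elements and
    $$\sum_{j=-\infty}^{\infty}|c_j| < \infty \quad\text{and}\quad \sum_{j=-\infty}^{\infty}c_j^2 = 1.$$
    Jian and Pawitan (1998) show
    $$E\Big|\sum_{j=-\infty}^{\infty}c_j Z_j\Big| > E|Z_1|$$
    holds for Laplace, Student's t, contaminated normal, etc.
  • Non-Gaussian case:
    $$E|z_1(\boldsymbol\phi_1)| = E\Big|\frac{\phi_0(B^{-1})\,\phi_1(B)}{\phi_0(B)\,\phi_1(B^{-1})}\,\frac{Z_t}{\phi_{0p}}\Big| > E|z_1(\boldsymbol\phi_0)|$$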
17
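Central Limit Theorem (LAD case)
  • Think of $\mathbf{u} = n^{1/2}(\boldsymbol\phi - \boldsymbol\phi_0)$ as an element of $\mathbb{R}^p$
  • Define
    $$S_n(\mathbf{u}) = \sum_{t=1}^{n-p}\big(|z_t(\boldsymbol\phi_0 + n^{-1/2}\mathbf{u})| - |z_t(\boldsymbol\phi_0)|\big) = m_n(\boldsymbol\phi_0 + n^{-1/2}\mathbf{u}) - m_n(\boldsymbol\phi_0)$$
  • Then $S_n(\mathbf{u}) \to S(\mathbf{u})$ in distribution on $C(\mathbb{R}^p)$, where
    $$S(\mathbf{u}) = \frac{f_\sigma(0)}{|\phi_{0r}|}\,\mathbf{u}'\Gamma_p\mathbf{u} + \mathbf{u}'\mathbf{N}, \qquad \mathbf{N} \sim N\Big(\mathbf{0},\ \frac{2\,\mathrm{Var}(|Z_1|)}{\phi_{0r}^2\,\sigma^2}\,\Gamma_p\Big)$$
  • Hence,
    $$\arg\min_{\mathbf{u}} S_n(\mathbf{u}) = n^{1/2}(\hat{\boldsymbol\phi}_{LAD} - \boldsymbol\phi_0) \xrightarrow{D} \arg\min_{\mathbf{u}} S(\mathbf{u}) = -\frac{|\phi_{0r}|}{2 f_\sigma(0)}\,\Gamma_p^{-1}\mathbf{N} \sim N\Big(\mathbf{0},\ \frac{\mathrm{Var}(|Z_1|)}{2\sigma^4 f_\sigma^2(0)}\,\sigma^2\Gamma_p^{-1}\Big)$$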
18
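Asymptotic Results (LAD case):
Theorem 1. Let $\{Y_t\}$ be the linear process
$$Y_t = \sum_{j=-\infty}^{\infty} c_j z_{t-j},$$
where $c_0 = 0$, $\sum_{j=-\infty}^{\infty}|c_j| < \infty$, $\{z_t\} \sim \mathrm{IID}(0, \sigma^2)$, median$(z_1) = 0$, $g(0) > 0$ (g density of $z_1$). Then
$$S_n = \sum_{t=1}^{n-p}\big(|z_t - n^{-1/2}Y_t| - |z_t|\big) \xrightarrow{D} \mathrm{Var}(Y_1)\,g(0) + N,$$
where $N \sim N\big(0,\ \gamma^*(0) + 2\sum_{h \ge 1}\gamma^*(h)\big)$ and $\gamma^*(h)$ is the covariance function for $Y_t\,\mathrm{sgn}(z_t)$.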
19
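Key idea:
$$S_n = \sum_{t=1}^{n-p}\big(|z_t - n^{-1/2}Y_t| - |z_t|\big)$$
$$= -n^{-1/2}\sum_{t=1}^{n-p} Y_t\,\mathrm{sgn}(z_t) + 2\sum_{t=1}^{n-p}(n^{-1/2}Y_t - z_t)\big\{\mathbf{1}_{\{0 < z_t < n^{-1/2}Y_t\}} - \mathbf{1}_{\{n^{-1/2}Y_t < z_t < 0\}}\big\}$$
$$\xrightarrow{D} N + \mathrm{Var}(Y_1)\,g(0)$$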
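The decomposition rests on a pointwise identity for $|z - y| - |z|$ (with $y$ playing the role of $n^{-1/2}Y_t$), valid for $z \neq 0$. A quick numerical check is sketched below; the function names `lhs` and `rhs` are illustrative.

```python
import random

def sgn(z):
    return (z > 0) - (z < 0)

def lhs(z, y):
    return abs(z - y) - abs(z)

def rhs(z, y):
    # -y sgn(z) + 2 (y - z) { 1[0 < z < y] - 1[y < z < 0] }
    indicators = (0 < z < y) - (y < z < 0)
    return -y * sgn(z) + 2 * (y - z) * indicators

rng = random.Random(0)
pairs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10000)]
max_gap = max(abs(lhs(z, y) - rhs(z, y)) for z, y in pairs)
```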
20
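Theorem 2. On $C(\mathbb{R}^p)$,
$$S_n(\mathbf{u}) = \sum_{t=1}^{n-p}\big(|z_t(\boldsymbol\phi_0 + n^{-1/2}\mathbf{u})| - |z_t(\boldsymbol\phi_0)|\big) \xrightarrow{D} S(\mathbf{u}),$$
where
$$S(\mathbf{u}) = \frac{f_\sigma(0)}{|\phi_{0r}|}\,\mathbf{u}'\Gamma_p\mathbf{u} + \mathbf{u}'\mathbf{N}, \qquad \mathbf{N} \sim N\Big(\mathbf{0},\ \frac{2\,\mathrm{Var}(|Z_1|)}{\phi_{0r}^2\,\sigma^2}\,\Gamma_p\Big),$$
and $\Gamma_p$ is the covariance matrix of a causal AR(p).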
21
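Limit theory for LAD estimate. Note that
$$\hat{\boldsymbol\phi}_{LAD} = \boldsymbol\phi_0 + \hat{\mathbf{u}}_n/\sqrt n$$
so that
$$\hat{\mathbf{u}}_n = \sqrt n\,(\hat{\boldsymbol\phi}_{LAD} - \boldsymbol\phi_0) = \arg\min_{\mathbf{u}} S_n(\mathbf{u}) \xrightarrow{D} \hat{\mathbf{u}} = \arg\min_{\mathbf{u}} S(\mathbf{u}).$$
Minimizing S, we find that the minimizer or limit random variable is
$$\hat{\mathbf{u}}_n = \sqrt n\,(\hat{\boldsymbol\phi}_{LAD} - \boldsymbol\phi_0) \xrightarrow{D} -\frac{|\phi_{0r}|}{2 f_\sigma(0)}\,\Gamma_p^{-1}\mathbf{N} \sim N\Big(\mathbf{0},\ \frac{\mathrm{Var}(|Z_1|)}{2\sigma^4 f_\sigma^2(0)}\,\sigma^2\Gamma_p^{-1}\Big)$$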
22
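Asymptotic Covariance Matrix
  • For LS estimators of AR(p):
    $$\sqrt n\,(\hat{\boldsymbol\phi}_{LS} - \boldsymbol\phi_0) \xrightarrow{D} N(0,\ \sigma^2\Gamma_p^{-1})$$
  • For LAD estimators of AR(p):
    $$\sqrt n\,(\hat{\boldsymbol\phi}_{LAD} - \boldsymbol\phi_0) \xrightarrow{D} N\Big(0,\ \frac{1}{4\sigma^2 f_\sigma^2(0)}\,\sigma^2\Gamma_p^{-1}\Big)$$
  • For LAD estimators of AP(p):
    $$\sqrt n\,(\hat{\boldsymbol\phi}_{LAD} - \boldsymbol\phi_0) \xrightarrow{D} N\Big(0,\ \frac{\mathrm{Var}(|Z_1|)}{2\sigma^4 f_\sigma^2(0)}\,\sigma^2\Gamma_p^{-1}\Big)$$
  • For MLE estimators of AP(p):
    $$\sqrt n\,(\hat{\boldsymbol\phi}_{MLE} - \boldsymbol\phi_0) \xrightarrow{D} N\Big(0,\ \frac{1}{2(\sigma^2\hat I - 1)}\,\sigma^2\Gamma_p^{-1}\Big)$$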
23
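Laplace: (LAD = MLE)
$$\frac{\mathrm{Var}(|Z_1|)}{2\sigma^4 f_\sigma^2(0)} = \frac12 = \frac{1}{2(\sigma^2\hat I - 1)}$$
Student's $t_\nu$, $\nu > 2$:
LAD:
$$\frac{\mathrm{Var}(|Z_1|)}{2\sigma^4 f_\sigma^2(0)} = \frac{\Gamma^2(\nu/2)\,(\nu-2)\,\pi}{2\,\Gamma^2((\nu+1)/2)} - \frac{2(\nu-2)^2}{(\nu-1)^2}$$
MLE:
$$\frac{1}{2(\sigma^2\hat I - 1)} = \frac{(\nu-2)(\nu+3)}{12}$$
Student's $t_3$:
  LAD: .7337
  MLE: 0.5
  ARE: .7337/.5 = 1.4674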
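The two closed-form factors for Student's t noise are easy to evaluate, which gives a check on the $t_3$ figures quoted above. This is a small sketch using only the standard library; the function names `lad_factor` and `mle_factor` are illustrative.

```python
from math import gamma, pi

def lad_factor(nu):
    """Var(|Z_1|) / (2 sigma^4 f_sigma(0)^2) for Student's t with nu > 2 d.f."""
    return ((nu - 2) * pi * gamma(nu / 2) ** 2 / (2 * gamma((nu + 1) / 2) ** 2)
            - 2 * (nu - 2) ** 2 / (nu - 1) ** 2)

def mle_factor(nu):
    """1 / (2 (sigma^2 I - 1)) for Student's t with nu > 2 d.f."""
    return (nu - 2) * (nu + 3) / 12

lad3 = lad_factor(3)        # about .7337
mle3 = mle_factor(3)        # 0.5
are3 = lad3 / mle3          # about 1.4674
```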
24
Order Selection:
Partial ACF. From the previous result, if true model is of order r and fitted model is of order p > r, then
$$n^{1/2}\,\hat\phi_{p,LAD} \xrightarrow{D} N\Big(0,\ \frac{\mathrm{Var}(|Z_1|)}{2\sigma^4 f_\sigma^2(0)}\Big),$$
where $\hat\phi_{p,LAD}$ is the pth element of $\hat{\boldsymbol\phi}_{LAD}$.
Procedure:
1. Fit high order (P-th order), obtain residuals and estimate scalar,
$$\theta^2 = \frac{\mathrm{Var}(|Z_1|)}{2\sigma^4 f_\sigma^2(0)},$$
by empirical moments of residuals and density estimates.
25
2. Fit AP models of order p = 1, 2, . . . , P via LAD and obtain p-th coefficient $\hat\phi_{p,p}$ for each.
3. Choose model order r as the smallest order beyond which the estimated coefficients are statistically insignificant.
Note: Can replace $\hat\phi_{p,p}$ with $\hat\phi_{p,MLE}$ if using MLE. In this case, for p > r,
$$n^{1/2}\,\hat\phi_{p,MLE} \xrightarrow{D} N\Big(0,\ \frac{1}{2(\sigma^2\hat I - 1)}\Big).$$
26
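AIC: 2p or not 2p?
  • An approximately unbiased estimate of the Kullback-Leibler index of fitted to true model:
    $$AIC(p) := -2L_X(\hat{\boldsymbol\phi}, \hat\kappa) + \frac{\mathrm{Var}(|Z_1|)}{E|Z_1|\,\sigma^2 f_\sigma(0)}\,p$$
  • Penalty term for Laplace case:
    $$\frac{\mathrm{Var}(|Z_1|)}{E|Z_1|\,\sigma^2 f_\sigma(0)}\,p = \frac{\sigma^2/2}{(\sigma/\sqrt 2)\,\sigma^2\,(1/(\sqrt 2\,\sigma))}\,p = p$$
  • Estimated penalty term: replace the population quantities by their sample analogues computed from the residuals $\{z_t(\hat{\boldsymbol\phi})\}$ (sample variance of $|z_t(\hat{\boldsymbol\phi})|$, sample average of $|z_t(\hat{\boldsymbol\phi})|$, and a density estimate $\hat f(0)$). The resulting estimate converges in probability:
    $$\widehat{\text{penalty}} \xrightarrow{P} \frac{\mathrm{Var}(|Z_1|)}{E|Z_1|\,\sigma^2 f_\sigma(0)}\,p$$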
27
Sample realization of all-pass of order 2
[Figure, four panels: (a) Data From Allpass Model, X(t) for t = 0 to 500; (b) ACF of Allpass Data; (c) ACF of Squares; (d) ACF of Absolute Values; lags 0 to 40]
28
Estimates: $\hat\phi_1 = .297\ (.0381)$, $\hat\phi_2 = .374\ (.0381)$
Standard errors computed as $\hat\theta\,\mathrm{sqrt}\{(1 - \hat\phi_2^2)/500\}$, where $\hat\theta = .919$
Order selection:
  • cut-off value for PACF is 1.96*.908/sqrt(500) = .0796

    p        1      2      3      4      5      6      7      8      9      10
    phi_p    0.289  0.374  0.009  0.011  0.01   0.047  0.034  -0.05  0.083  0.021
    AIC(p)   2451   2346   2347   2348   2350   2348   2349   2345   2343   2345

  • $AIC(p) := -2L_X(\hat{\boldsymbol\phi}, \hat\kappa) + 1.896\,p$
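The standard error and the PACF cut-off quoted on this slide can be reproduced with a few lines of arithmetic. The sketch below simply plugs in the reported values; the variable names are illustrative.

```python
from math import sqrt

n = 500
theta_hat = 0.919        # scale estimate reported with the fitted order-2 model
phi2_hat = 0.374
se = theta_hat * sqrt((1 - phi2_hat ** 2) / n)     # standard error of the estimates

theta_hat_cut = 0.908    # scale estimate used in the slide's cut-off formula
cutoff = 1.96 * theta_hat_cut / sqrt(n)            # PACF cut-off value
```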
29
Simulation results:
• 1000 replicates of all-pass models
• model order and parameter values:
    order 1: φ1 = .5
    order 2: φ1 = .3, φ2 = .4
• noise distribution is t with 3 d.f.
• sample sizes n=500, 5000
• estimation method is LAD
30
To guard against being trapped in local minima, we adopted the following strategy.
  • 250 starting values were chosen at random. For a model of order p, the k-th starting value was computed recursively as follows:
    1. Draw $\phi_{11}^{(k)}, \phi_{22}^{(k)}, \ldots, \phi_{pp}^{(k)}$ iid uniform(-1,1).
    2. For j = 2, …, p, compute
       $$\begin{pmatrix} \phi_{j1}^{(k)} \\ \vdots \\ \phi_{j,j-1}^{(k)} \end{pmatrix} = \begin{pmatrix} \phi_{j-1,1}^{(k)} \\ \vdots \\ \phi_{j-1,j-1}^{(k)} \end{pmatrix} - \phi_{jj}^{(k)} \begin{pmatrix} \phi_{j-1,j-1}^{(k)} \\ \vdots \\ \phi_{j-1,1}^{(k)} \end{pmatrix}$$
  • Select top 10 based on minimum function evaluation.
  • Run Hooke and Jeeves with each of the 10 starting values and choose best optimized value.
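The recursion in step 2 maps partial autocorrelations in (-1, 1) to the coefficients of a causal AR polynomial, so every random start lies inside the parameter space. A minimal sketch of the starting-value generator follows; the function name `random_causal_start` and the seed are illustrative, and the Hooke and Jeeves search itself is not shown.

```python
import random

def random_causal_start(p, rng):
    """Map iid Uniform(-1,1) partial autocorrelations to AR coefficients."""
    phi = []
    for j in range(1, p + 1):
        pacf = rng.uniform(-1.0, 1.0)              # phi_{jj}
        # phi_{j,i} = phi_{j-1,i} - phi_{jj} * phi_{j-1,j-i}, i = 1, ..., j-1
        phi = [phi[i] - pacf * phi[j - 2 - i] for i in range(j - 1)] + [pacf]
    return phi

rng = random.Random(0)
starts = [random_causal_start(3, rng) for _ in range(250)]
```

Each entry of `starts` could then be scored with the LAD criterion and the best ten passed to a local optimizer.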
31
              Asymptotic           Empirical
N       mean      std dev     mean     std dev    %coverage   rel eff*
500     φ1=.5     .0332       .4979    .0397      94.2        11.8
5000    φ1=.5     .0105       .4998    .0109      95.4        9.3

              Asymptotic           Empirical
N       mean      std dev     mean     std dev    %coverage
500     φ1=.3     .0351       .2990    .0456      92.5
        φ2=.4     .0351       .3965    .0447      92.1
5000    φ1=.3     .0111       .3003    .0118      95.5
        φ2=.4     .0111       .3990    .0117      94.7

*Efficiency relative to maximum absolute residual kurtosis:
$$\Big|\frac{1}{n-p}\sum_{t=1}^{n-p}\Big(\frac{z_t(\boldsymbol\phi)}{v_2^{1/2}}\Big)^4 - 3\Big|, \qquad v_2 = \frac{1}{n-p}\sum_{t=1}^{n-p}\big(z_t(\boldsymbol\phi) - \bar z(\boldsymbol\phi)\big)^2$$
32
MLE Simulations Results using t-distr(3.5)

              Asymptotic           Empirical
N       mean      std dev     mean     std dev    %coverage
500     φ1=.5     .0349       .4983    .0421      91.7
        ν=3.5     .5853       3.449    .4527      92.5
5000    φ1=.5     .0110       .4997    .0088      95.0
        ν=3.5     .1851       3.449    .1341      96.7

              Asymptotic           Empirical
N       mean      std dev     mean     std dev    %coverage
500     φ1=.3     .0369       .2969    .0451      89.6
        φ2=.4     .0369       .3973    .0446      90.6
        ν=3.5     .5853       3.556    .5685      92.4
5000    φ1=.3     .0117       .3002    .0099      94.7
        φ2=.4     .0117       .4001    .0106      93.6
        ν=3.5     .1764       3.510    .1764      94.7
33
Minimum Dispersion Estimator: Minimize the objective fcn
$$S(\boldsymbol\phi) = \sum_{t=1}^{n-p}\Big(\frac{t}{n-p+1} - \frac12\Big)\,z_{(t)}(\boldsymbol\phi),$$
where $\{z_{(t)}(\boldsymbol\phi)\}$ are the ordered $\{z_t(\boldsymbol\phi)\}$.

                  Empirical (min dispersion)    Empirical LAD
N       true      mean      std dev             mean      std dev
500     φ1=.5     .4978     .0315               .4979     .0397
5000    φ1=.5     .4997     .0094               .4998     .0109
500     φ1=.3     .2988     .0374               .2990     .0456
        φ2=.4     .3957     .0360               .3965     .0447
5000    φ1=.3     .3007     .0101               .3003     .0118
        φ2=.4     .3993     .0104               .3990     .0117
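The dispersion objective is a weighted sum of the ordered residuals, with weights that sum to zero. A minimal sketch of the objective alone is given below; the function name `dispersion` is illustrative, and the residuals would come from the all-pass recursion in practice.

```python
def dispersion(z):
    """Sum over t of (t/(m+1) - 1/2) * z_(t), with z_(1) <= ... <= z_(m)."""
    m = len(z)
    zs = sorted(z)
    return sum((t / (m + 1) - 0.5) * zs[t - 1] for t in range(1, m + 1))

d_small = dispersion([3.0, 1.0, 2.0])       # weights -1/4, 0, 1/4 on 1, 2, 3
d_shift = dispersion([13.0, 11.0, 12.0])    # unchanged by a location shift
```

Because the weights sum to zero, the objective ignores the location of the residuals and responds only to their spread.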
34
Application to financial data: 500 daily log-returns of NZ/US exchange rate
[Figure: time series X(t) for t = 0 to 500; sample ACF of X, ACF of |X|, and ACF of |X|^2 for lags h = 0 to 50]
35
[Figure, two panels: ACF of residuals; ACF of (residuals)^2; lags 0 to 25]
All-pass model fitted to NZ-USA exchange rates (using LAD):
Order = 6, φ1=-.367, φ2=-.750, φ3=-.391, φ4=.088, φ5= -.193, φ6=-.096
(AIC had local minima at p=6 and 10)
36
Noninvertible MA models with heavy tailed noise
$X_t = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$,
a. $\{Z_t\} \sim \mathrm{IID}(\alpha)$ with Pareto tails
b. $\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$
   No zeros inside the unit circle ⇒ invertible
   Some zero(s) inside the unit circle ⇒ noninvertible
37
Realizations of an invertible and a noninvertible MA(2) process
Model: $X_t = \theta_*(B)Z_t$, $\{Z_t\} \sim \mathrm{IID}(\alpha = 1)$, where $\theta_i(B) = (1 + \tfrac12 B)(1 + \tfrac13 B)$ and $\theta_{ni}(B) = (1 + 2B)(1 + 3B)$
[Figure: the two realizations for t = 0 to 40, each with its sample ACF for lags 0 to 10]
38
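Application of all-pass to noninvertible MA model fitting
Suppose $\{X_t\}$ follows the noninvertible MA model
$$X_t = \theta_i(B)\,\theta_{ni}(B)\,Z_t, \qquad \{Z_t\} \sim \mathrm{IID}.$$
Step 1: Let $\{U_t\}$ be the residuals obtained by fitting a purely invertible MA model, i.e.,
$$X_t = \hat\theta(B)\,U_t \approx \theta_i(B)\,\tilde\theta_{ni}(B)\,U_t \qquad (\tilde\theta_{ni} \text{ is the invertible version of } \theta_{ni}).$$
So
$$U_t \approx \frac{\theta_{ni}(B)}{\tilde\theta_{ni}(B)}\,Z_t.$$
Step 2: Fit a purely causal AP model to $\{U_t\}$:
$$\tilde\theta_{ni}(B)\,U_t = \theta_{ni}(B)\,Z_t.$$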
39
[Figure: time series plot of X(t) against t]
Volumes of Microsoft (MSFT) stock traded over 755 transaction days (6/3/96 to 5/28/99)
40
Analysis of MSFT:
Step 1: Log(volume) follows MA(4).
$$X_t = (1 + .513B + .277B^2 + .270B^3 + .202B^4)\,U_t \qquad \text{(invertible MA(4))}$$
Step 2: All-pass model of order 4 fitted to $\{U_t\}$ using MLE (t-dist):
$$(1 + .184B + .132B^2 - .833B^3 - .314B^4)\,U_t = (1 + 2.65B - .418B^2 - .586B^3 - 3.18B^4)\,Z_t.$$
Conclude that $\{X_t\}$ follows a noninvertible MA(4) which after refitting has the form:
$$X_t = (1 + 1.34B + 1.374B^2 + 2.54B^3 + 4.96B^4)\,Z_t, \qquad \{Z_t\} \sim \mathrm{IID}\ t(6.3)$$
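One way to read the refitted model is as the purely noninvertible counterpart of the Step 1 polynomial, obtained by reflecting every root to its reciprocal (reverse the coefficients and divide by the last one). The sketch below checks that reading against the reported numbers; it is an interpretation rather than the authors' stated procedure, and the function name `noninvertible_counterpart` is illustrative.

```python
def noninvertible_counterpart(theta):
    """Coefficients of z^q * theta(1/z) / theta_q: every root mapped to its reciprocal."""
    return [c / theta[-1] for c in reversed(theta)]

theta_inv = [1.0, 0.513, 0.277, 0.270, 0.202]        # invertible MA(4) from Step 1
theta_ni = noninvertible_counterpart(theta_inv)      # roughly 1, 1.34, 1.37, 2.54, 4.95
reported = [1.0, 1.34, 1.374, 2.54, 4.96]            # refitted coefficients on the slide
max_diff = max(abs(a - b) for a, b in zip(theta_ni, reported))
```

The small remaining differences are consistent with the slide's remark that the noninvertible model was refitted rather than obtained by reflection alone.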
41
[Figure, four panels: (a) ACF of Squares of Ut; (b) ACF of Absolute Values of Ut; (c) ACF of Squares of Zt; (d) ACF of Absolute Values of Zt; lags 0 to 40]
42
Summary: Microsoft Trading Volume
Two-step fit of noninvertible MA(4):
  • invertible MA(4): residuals not iid
  • causal AP(4): residuals iid
Direct fit of purely noninvertible MA(4):
  $(1 + 1.34B + 1.374B^2 + 2.54B^3 + 4.96B^4)$
For MCHP, invertible MA(4) fits.
43
Summary
All-pass models and their properties
  • linear time series with "nonlinear" behavior
Estimation
  • likelihood approximation
  • MLE and LAD
  • order selection
Empirical results
  • simulation study
  • AP(6) for NZ/USA exchange rates
Noninvertible moving average processes
  • two-step estimation procedure using all-pass
  • noninvertible MA(4) for Microsoft trading volume
44
Further Work
Least absolute deviations
  • further simulations
  • order selection
  • heavy-tailed case
  • other smooth objective functions (e.g., min dispersion)
Maximum likelihood
  • Gaussian mixtures
  • simulation studies
  • applications
Noninvertible moving average modeling
  • initial estimates from two-step all-pass procedure
  • adaptive procedures