Download - for Allpassrdavis/lectures/magdeburg02.pdfJoint work with Jay Breidt, Colorado State University Alex Trindade, University of Florida Beth Andrews, Colorado State University. 2

1

Maximum Likelihood Estimation Maximum Likelihood Estimation for for Allpass Allpass Time Series ModelsTime Series Models

Richard A. Davis

Department of StatisticsColorado State University

http://www.stat.colostate.edu/~rdavis/lectures/magdeburg02.pdf

Joint work with Jay Breidt, Colorado State UniversityAlex Trindade, University of FloridaBeth Andrews, Colorado State University

2

�Introduction• properties of financial time series• motivating example • all-pass models and their properties

�Estimation• likelihood approximation• MLE and LAD• asymptotic results• order selection

�Empirical results• simulation• NZ/USA exchange rates

�Noninvertible MA processes� preliminaries� a two-step estimation procedure� Microsoft trading volume

�Summary

3

Financial Time Series� Log returns, Xt = 100*(ln (Pt) - ln (Pt-1)), of financial assets

often exhibit:

• heavy-tailed marginal distributionsP(|X1| > x) ~ C x−α, 0 < α < 4.

• lack of serial correlationnear 0 for all lags h > 0 (MGD sequence)

• |Xt| and Xt2 have slowly decaying autocorrelations

converge to 0 slowly as h →∞

• process exhibits ‘stochastic volatility’

)(ˆ hXρ

)(ˆ and )(ˆ 2|| hh XX ρρ

� Nonlinear models Xt = σtZt , {Zt} ~ IID(0,1)

• ARCH and its variants (Engle `82; Bollerslev, Chou, andKroner 1992)

• Stochastic volatility (Clark 1973; Taylor 1986)

4

lag h

ACF

0 10 20 30 40 50

0.0

0.2

0.4

0.6

0.8

1.0

lag h

ACF(

|X|^

2)

0 10 20 30 40 50

0.0

0.2

0.4

0.6

0.8

1.0

lag h

ACF(

|X|)

0 10 20 30 40 50

0.0

0.2

0.4

0.6

0.8

1.0

t

X(t)

0 100 200 300 400 500

-0.0

4-0

.02

0.0

0.02

Motivating example500-daily log-returns of NZ/US exchange rate

5

All-pass model of order 2 (t3 noise )

0 1 0 2 0 3 0 4 0

0.0

0.2

0.4

0.6

0.8

1.0

L a g

ACF

A C F : (a llp a s s )2

0 1 0 2 0 3 0 4 0

0.0

0.2

0.4

0.6

0.8

1.0

L a g

ACF

A C F : (a llp a s s )

modelsample

t

X(t)

0 200 400 600 800 1000

-30

-20

-10

010

20

6

All-pass Models

Causal AR polynomial: φ(z)=1−φ1z − − φpzp , φ(z) ≠ 0 for |z|≤1.

Define MA polynomial:

θ(z) = −zpφ(z−1)/φp = −(zp −φ1zp-1 − − φp)/ φp

≠ 0 for |z|≥1 (MA polynomial is non-invertible).

Model for data {Xt} : φ(B)Xt = θ(B) Zt , {Zt} ~ IID (non-Gaussian)

BkXt = Xt-k

Examples:

All-pass(1): Xt − φXt-1 = Zt − φ−1 Zt-1 , | φ | < 1.

All-pass(2): Xt − φ1 Xt-1 − φ2 Xt-2 = Zt + φ1/ φ2 Zt-1 − 1/ φ2 Zt-2

m

m

7

Properties:• causal, non-invertible ARMA with MA representation

• uncorrelated (flat spectrum)

• zero mean• data are dependent if noise is non-Gaussian

(e.g. Breidt & Davis 1991).• squares and absolute values are correlated.

• Xt is heavy-tailed if noise is heavy-tailed.

πφσ=

πσ

φφ

φ=ω

ω−

ωω−

22)(

)()( 2

22

22

22

pi

p

iip

Xe

eef

jt0j

jtp

1

t )()(

−

∞

=

−

∑ψ=φφ−

φ= ZZB

BBXp

8

� Second-order moment techniques do not work

• least squares

• Gaussian likelihood

� Higher-order cumulant methods

• Giannakis and Swami (1990)

• Chi and Kung (1995)

� Non-Gaussian likelihood methods

• likelihood approximation

• quasi-likelihood

• least absolute deviations

• minimum dispersion

Estimation for All-Pass Models

9

Approximating the likelihoodData: (X1, . . ., Xn)

Model:

where φ0r is the last non-zero coefficient among the φ0j’s.

Noise:

where zt =Zt / φ0r.

More generally define,

Note: zt(φφφφ0) is a close approximation to zt (initialization error)

rtpptpt

ptptt

ZZZXXX

00101

0101

/)( φφ−−φ−−

φ++φ=

+−−

−−

�

�

),( 01010101 ptptttpptpt XXXzzz −−+−− φ−−φ−−φ++φ= ��

+=φ−φ++φ++=

=+−

− .1,..., if ,)()()(,1,..., if ,0

)(11 pntXBzz

npntz

ttpptpt φφφφφφφφφφφφ

�

10

Assume that Zt has density function fσ and consider the vector

Joint density of z:

and hence the joint density of the data can be approximated by

where q=max{0 ≤ j ≤ p: φj ≠ 0}.

)')(),...,(),...,(,)(),...,(,,...,( 110101OOOO NOOOO MLOOOOO NOOOOO ML

φφφφφφφφφφφφφφφφφφφφ npnpp zzzzzXX +−−−=z

independent pieces

)),(),...,(( ||))((

))(),...,(,,...,()(

121

01011

φφφφφφφφφφφφ

φφφφφφφφ

npn

pn

tqtq

pp

zzhzf

zzXXhh

+−

−

=σ

−−

φφ•

=

∏

z

φφ= ∏

−

=σ

pn

tqtq zfh

1

||))(()( φφφφx

11

Log-likelihood:

where fσ(z)= σ−1 f(z/σ).

Least absolute deviations: choose Laplace density

and log-likelihood becomes

Concentrated Laplacian likelihood

Maximizing l(φφφφ) is equivalent to minimizing the absolute deviations

))((ln|)|/ln()(),(1

1 φφφφφφφφ ∑−

=

− φσ+φσ−−=σpn

ttqq zfpnL

|)|2exp(2

1)( zzf −=

||/ ,/|)(|2ln)(constant1

q

pn

ttzpn φσ=κκ−κ−− ∑

−

=

φφφφ

|)(|ln)(constant)(1

φφφφφφφφ ∑−

=

−−=pn

ttzpnl

.|)(|)(1

n φφφφφφφφ ∑−

=

=pn

ttzm

12

Assumptions

� Assume {Zt} iid fσ(z)=σ−1f(σ−1z) with

• σ a scale parameter

• mean 0, variance σ2

� For f known, use maximum likelihood

• further smoothness assumptions (integrability,symmetry, etc.) on f

• Fisher information:

� For f unknown, use quasi-likelihood

� Least absolute deviations

• assume f has median 0

• assume f continuous in neighborhood of 0

• act as if f = Laplace to get criterion function

dzzfzfI )(/))('(~ 22∫

−σ=

13

Results� Let γ(h) = ACVF of AR model with AR poly φ0(.) and

� Maximum likelihood:

� Least absolute deviations:

p1kj,k)]-j([ =γ=Γp

))1~(2

1,0() ˆ( 1220MLE

−Γσ−σ

→− p

D

INn φφφφφφφφ

))0(2|)(|Var,0() ˆ( 12

241

0LAD−

σ

Γσσ

→− p

D

fZNn φφφφφφφφ

14

Further comments on MLE

Let α=(φ1, . . . , φp, σ /|φ1|, β1, . . . , βq), where β1, . . . , βq are theparameters of pdf f.

Set

�

�

�

� (Fisher Information)

{ }

dzzfzfzf

dzzfzfzfz

dzzfzfz

dzzfzf

T

f

p

p

0000

0000

0000

0000

00000000

0000

0000

0000

0000

00000000

00000000

ββββββββ

ββββββββ

ββββββββ

ββββββββ

ββββββββ

ββββββββ

ββββββββ

∂∂

∂∂=

∂∂α−=

−α=

σ=

∫

∫

∫

∫

−+

−+

−

);();();(

1)(I

);();();('L

1);(/));('(K

);(/));('(I

11,0

2221,0

220

15

Under smoothness conditions on f wrt β1, . . . , βq we have

where

Note: is asymptotically independent of and

),,0() ˆ( 10MLE

−∑→− Nn

Dαααααααα

−−−−−−

Γ−σ

=∑−−−−−

−−−−−

−

−

11111

11111

120

1

)'ˆ(ˆ)'ˆ()'ˆ('ˆ)'ˆ(

)1ˆ(21

LKLIKLLKLILKLILKLILK

I

ff

ff

p

00

00

ˆMLEφφφφ ˆ MLE1,p+α ˆ

MLEββββ

16

Identifiability in LAD case?• Minimizer may not be unique.

• Gaussian case: {Zt} iid , so

• Consider {cj}with at least two non-zero elements and

Jian and Pawitan (1998) show

holds for Laplace, Student’s t, contaminated normal, etc.

• Non-Gaussian case:

|| || 1ZEZcE jj

j >∑∞

−∞=

|)(||)(| 0100

01

10

1111 φφφφφφφφ zEZEZEzE

pp

=φσσ=

φσσ=

1 and || 2 =∞< ∑∑∞

−∞=

∞

−∞= jj

jj cc

|)(|)()(

)()(|)(| 010

110

11

011 φφφφφφφφ zEZ

BBBBEzE t

p

>φφφφφ= −

−

)N(0,)N(0, -21p

21

-20p

20 φσ=φσ

17

Central Limit Theorem (LAD case)

• Think of u = n1/2(φφφφ−−−−φφφφ0000) as an element of RRp

• Define

• Then Sn(u) → S(u) in distribution on C(RRp), where

• Hence,

( )

∑

∑

−

=

−

=

−+=

+=

pn

ttn

pn

tttn

znm

z-nzS

10

1/2-0

10

1/2-0

|)(|) (

|)( ||) ( |)(

φφφφφφφφ

φφφφφφφφ

u

uu

),|)(|Var2 ,(~ ,''||)0()( 22

0

1

0p

rp

r

ZNfS Γσφ

+Γφ

= σ 0NNuuuu

))0(2|)(|Var,(~

)0(2||

)(minarg)ˆ()(minarg

1244

11

0

02/1

−

σσ

−

Γσσ

Γφ−=

→−=

ppr

D

LADn

fZN

f

SnS

0N

uu φφφφφφφφ

18

Asymptotic Results (LAD case):

Theorem 1. Let {Yt} be the linear process

where c0=0, {zt }~IID(0,σ2), median(z1 )=0,

g(0)>0 (g density of z1). Then

where and γ*(h) is the covariance

function for Yt sgn(zt)

,jtj

jt zcY −

∞

−∞=∑=

, || ∞<∑∞

−∞=jjc

( )NgY

z-Y-nzSpn

ttttn

+→

=∑−

=

)0()(Var

| || |

1

1

1/2-

))(*2)0(*N(0,~1∑≥

γ+γh

hN

19

Key idea:

( )

{ })0()(

11 )( 2

)sgn(

| || |

1

}0{}0{1

2/1

1

2/1

1

1/2-

2/12/1

gYVarN

zYn

zYn

z-Y-nzS

tttt zYnYnzt

pn

tt

pn

ttt

pn

ttttn

+→

−−+

−=

=

<<<<

−

=

−

−

=

−

−

=

−−∑

∑

∑

20

Theorem 2. On C(RRp),

where

and Γp is the covariance matrix of a causal AR(p).

( ), )(

|)( ||) ( |)(1

01/2-

0

u

uu

S

z-nzSpn

tttn

→

φ+φ=∑−

=

),|)(|2 ,(~

,''||)0()(

220

1

0

pr

pr

ZVarN

fS

Γσφ

+Γφ

= σ

0N

Nuuuu

21

Limit theory for LAD estimate. Note that

so that

Minimizing S, we find that the minimizer or limit random variable is

nn /ˆ ˆ0LAD u+φ=φ

).(minargˆ)(minarg) ˆ(ˆ 0LAD

uuuu

SSn nn

=→=φ−φ=

Nu)0(2

||) ˆ(ˆ

10

0LADσ

−Γφ−→φ−φ=

fn pr

n

))0(2|)(|,(~

)0(2|| 12

241

10 −

σσ

−

Γσσ

Γφ− p

pr

fZVarN

f0N

22

Asymptotic Covariance Matrix• For LS estimators of AR(p):

• For LAD estimators of AR(p):

• For LAD estimators of AP(p):

• For MLE estimators of AP(p):

),0() ˆ( 120LS

−Γσ→− p

DNn φφφφφφφφ

))0(4

1,0() ˆ( 12220LAD

−Γσσ

→− p

D

fNn φφφφφφφφ

))0(2|)(|Var,0() ˆ( 12

241

0LAD−

σ

Γσσ

→− p

D

fZNn φφφφφφφφ

))1ˆ(2

1,0() ˆ( 1220MLE

−Γσ−σ

→− p

D

INn φφφφφφφφ

23

Laplace: (LAD=MLE)

Students tν, ν >2:

LAD:

MLE:

Student’s t3: LAD: .7337

MLE: 0.5

ARE: .7337/.5=1.4674

)1ˆ(21

21

)0(2|)(|Var

2241

−σ==

σ σ IfZ

2

2

2

2

241

)1()2(2

)2/)1((2)2)(2/(

)0(2|)(|Var

−ν−ν−

+νΓπ−ννΓ=

σ σfZ

12)3)(2(

)1ˆ(21

2

+ν−ν=−σ I

24

Order Selection:

Partial ACF From the previous result, if true model is of order r and fitted model is of order p > r, then

where is the pth element of .

Procedure:

1. Fit high order (P-th order), obtain residuals and estimate scalar,

by empirical moments of residuals and density estimates.

))0(2

|)(|Var,0(ˆ24,

2/1

σσ→φ

fZNn LADp

LADp,φ LADφφφφ

,)0(2|)(|Var

2412

σσ=θ

fZ

25

2. Fit AP models of order p=1,2, . . . , P via LAD and obtain p-th

coefficient for each.

3. Choose model order r as the smallest order beyond which the

estimated coefficients are statistically insignificant.

Note: Can replace with if using MLE. In this case

for p > r

pp ,φ

pp ,φ MLE,ˆ

pφ

).)1ˆ(2

1,0(ˆ2,

2/1

−σ→φ

INn MLEp

26

AIC: 2p or not 2p?

• An approximately unbiased estimate of the Kullback-Leiber index

of fitted to true model:

• Penalty term for Laplace case:

• Estimated penalty term:

pfZE

ZLpAIC X )0(|||)(|Var)ˆ,ˆ(2:)( 2

1

1

σσ+κ−= φφφφ

pppfZE

Z =σσσ

σ=σ σ )2/1()2/(

2/)0(||

|)(|Var2

2

21

1

pfZE

Zpfzave

z

tzt

t

)0(|||)(|Var

)0(ˆ|}ˆ({||)ˆ((|var

21

1

Pˆ( σσ

→))))φφφφ

))))φφφφ))))φφφφ

27

Sample realization of all-pass of order 2

t

X(t)

0 100 200 300 400 500

-40

-20

020

(a) Data From Allpass Model

Lag

AC

F

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

(b) ACF of Allpass Data

Lag

AC

F

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

(c) ACF of Squares

Lag

AC

F

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

(d) ACF of Absolute Values

28

Estimates:

Standard errors computed aswhere

Order selection:• cut-off value for PACF is 1.96*.908/sqrt(500)=.0796

•

)0381(.374.ˆ ),0381(.297.ˆ21 =φ=φ

}500/)ˆ1{( ˆ 22φ−θ sqrt

.919 ˆ =θ

6 7 8 9 100.047 0.034 -0.05 0.083 0.0212348 2349 2345 2343 2345

1 2 3 4 5phi_p 0.289 0.374 0.009 0.011 0.01AIC(p)2451 2346 2347 2348 2350

pLpAIC X 896.1)ˆ,ˆ(2:)( +κφ−=

29

Simulation results:

• 1000 replicates of all-pass models

• model order parameter value 1 φ1 =.52 φ1=.3, φ2=.4

• noise distribution is t with 3 d.f.

• sample sizes n=500, 5000

• estimation method is LAD

30

To guard against being trapped in local minima, we adopted the following strategy.

• 250 random starting values were chosen at random. For model of order p, k-th starting value was computed recursively as follows:

1. Draw iid uniform (-1,1).2. For j=2, …, p, compute

• Select top 10 based on minimum function evaluation.

• Run Hooke and Jeeves with each of the 10 starting values and choose best optimized value.

)()(22

)(11 ,...,, k

ppkk φφφ

φ

φφ−

φ

φ=

φ

φ

−

−−

−−

−

−)(

1,1

)(1,1

)(

)(1,1

)(1,1

)(1,

)(1

kj

kjj

kjj

kjj

kj

kjj

kj

��

31

Asymptotic EmpiricalN mean std dev mean std dev %coverage rel eff*500 φ1=.5 .0332 .4979 .0397 94.2 11.85000 φ1=.5 .0105 .4998 .0109 95.4 9.3

Asymptotic EmpiricalN mean std dev mean std dev %coverage500 φ1=.3 .0351 .2990 .0456 92.5

φ2=.4 .0351 .3965 .0447 92.15000 φ1=.3 .0111 .3003 .0118 95.5

φ2=.4 .0111 .3990 .0117 94.7

*Efficiency relative to maximum absolute residual kurtosis:2

12

1t

4

2/12

))()((1 , 3)(1 φφφφφφφφφφφφ zzpn

vvz

pn t

pn

t

pnt −

−=−

− ∑∑−

=

−

=

32

Asymptotic EmpiricalN mean std dev mean std dev %coverage 500 φ1=.5 .0349 .4983 .0421 91.7

ν=3.5 .5853 3.449 .4527 92.5 5000 φ1=.5 .0110 .4997 .0088 95.0

ν=3.5 .1851 3.449 .1341 96.7

Asymptotic EmpiricalN mean std dev mean std dev %coverage500 φ1=.3 .0369 .2969 .0451 89.6

φ2=.4 .0369 .3973 .0446 90.6ν=3.5 .5853 3.556 .5685 92.4

5000 φ1=.3 .0117 .3002 .0099 94.7φ2=.4 .0117 .4001 .0106 93.6ν=3.5 .1764 3.510 .1764 94.7

MLE Simulations Results using t-distr(3.5)

33

Empirical Empirical LADN mean std dev mean std dev500 φ1=.5 .4978 .0315 .4979 .03975000 φ1=.5 .4997 .0094 .4998 .0109500 φ1=.3 .2988 .0374 .2990 .0456

φ2=.4 .3957 .0360 .3965 .04475000 φ1=.3 .3007 .0101 .3003 .0118

φ2=.4 .3993 .0104 .3990 .0117

Minimum Dispersion Estimator: Minimize the objective fcn

where {z(t)(φφφφ)} are the ordered {zt(φφφφ)}.

)(21

1( )(

1tφφφφφ)φ)φ)φ) t

pn

zpntS ∑

−

=

−

+−=

34

lag h

ACF

0 10 20 30 40 50

0.0

0.2

0.4

0.6

0.8

1.0

lag h

ACF(

|X|^

2)

0 10 20 30 40 50

0.0

0.2

0.4

0.6

0.8

1.0

lag h

ACF(

|X|)

0 10 20 30 40 50

0.0

0.2

0.4

0.6

0.8

1.0

t

X(t)

0 100 200 300 400 500

-0.0

4-0

.02

0.0

0.02

Application to financial data500-daily log-returns of NZ/US exchange rate

35

Lag

acf

0 5 10 15 20 25

0.0

0.2

0.4

0.6

0.8

1.0

Lag

acf

0 5 10 15 20 25

0.0

0.2

0.4

0.6

0.8

1.0ACF: residuals ACF: (residuals)2

All-pass model fitted to NZ-USA exchange rates (using LAD):

Order = 6, φ1=-.367, φ2=-.750, φ3=-.391, φ4=.088, φ5= -.193, φ6=-.096

(AIC had local minima at p=6 and 10)

36

Noninvertible MA models with heavy tailed noise

Xt = Zt + θ1 Zt-1 + + θq Zt-q ,

a. {Zt} ~ IID(α) with Pareto tails

b. θ(z) = 1 + θ1 z + + θq zq

No zeros inside the unit circle invertibleSome zero(s) inside the unit circle noninvertible

. . .

. . .

⇒

⇒

37

Realizations of an invertible and noninvertible MA(2) processesModel: Xt = θ∗ (B) Zt , {Zt} ~ IID(α = 1), whereθi(B) = (1 +1/2B)(1 + 1/3B) and θni(B) = (1 + 2B)(1 + 3B)

0 10 20 30 40

-40

-20

020

0 10 20 30 40

-300

-100

010

0

Lag

0 2 4 6 8 10

-0.2

0.2

0.6

1.0

ACF

Lag

0 2 4 6 8 10

-0.2

0.2

0.6

1.0

ACF

38

Application of all-pass to noninvertible MA model fitting

Suppose {Xt} follows the noninvertible MA model

Xt= θi(B) θni(B) Zt , {Zt} ~ IID.

Step 1: Let {Ut} be the residuals obtained by fitting a purely invertible MA model, i.e.,

So

Step 2: Fit a purely causal AP model to {Ut}

Z(B)~(B)U

). of version invertible theis ~( ,(B)U~(B)

(B)UˆX

tni

nit

ninitnii

tt

θθ≈

θθθθ≈

θ=

.(B)Z(B)U~tnitni θ=θ

39

t

X(t)

0 200 400 600

2*10

56*

105

106

Volumes of Microsoft (MSFT) stock traded over 755 transaction days (6/3/96 to 5/28/99)

40

Analysis of MSFT:

Step 1: Log(volume) follows MA(4).

Xt =(1+.513B+.277B2+.270B3+.202B4) Ut (invertible MA(4))

Step 2: All-pass model of order 4 fitted to {Ut} using MLE (t-dist):

Conclude that {Xt} follows a noninvertible MA(4) which after refitting has the form:

Xt =(1+1.34B+1.374B2+2.54B3+4.96B4) Zt , {Zt}~IID t(6.3)

.)Z8B1.3.586B18B4.5B6.21(

)U14B3..833B32B1.B184.1(

t432

t432

−−−+=

−−++

41

Lag

ACF

0 10 20 30 40

0.0

0.4

0.8

(a) ACF of Squares of Ut

Lag

ACF

0 10 20 30 40

0.0

0.4

0.8

(b) ACF of Absolute Values of Ut

Lag

ACF

0 10 20 30 40

0.0

0.4

0.8

(c) ACF of Squares of Zt

Lag

ACF

0 10 20 30 40

0.0

0.4

0.8

(d) ACF of Absolute Values of Zt

42

Summary: Microsoft Trading Volume� Two-step fit of noninvertible MA(4):

• invertible MA(4): residuals not iid• causal AP(4); residuals iid

� Direct fit of purely noninvertible MA(4):(1+1.34B+1.374B2+2.54B3+4.96B4)

� For MCHP, invertible MA(4) fits.

43

Summary� All-pass models and their properties

• linear time series with “nonlinear” behavior� Estimation

• likelihood approximation• MLE and LAD• order selection

� Emprirical results• simulation study• AP(6) for NZ/USA exchange rates

�Noninvertible moving average processes• two-step estimation procedure using all-pass• noninvertible MA(4) for Microsoft trading volume

44

Further Work� Least absolute deviations

• further simulations• order selection• heavy-tailed case• other smooth objective functions (e.g., min dispersion)

� Maximum likelihood• Gaussian mixtures• simulation studies• applications

� Noninvertible moving average modeling• initial estimates from two-step all-pass procedure• adaptive procedures

Download - for Allpassrdavis/lectures/magdeburg02.pdfJoint work with Jay Breidt, Colorado State University Alex Trindade, University of Florida Beth Andrews, Colorado State University. 2

Top Related