Estimation of Parameters 2


Maximum Likelihood Estimation of ARMA Models

For i.i.d. data with marginal pdf f(y_t; θ), the joint pdf for a sample y = (y_1, ..., y_T) is

    f(y; θ) = f(y_1, ..., y_T; θ) = ∏_{t=1}^T f(y_t; θ)

The likelihood function is this joint density treated as a function of the parameters θ given the data y:

    L(θ|y) = L(θ|y_1, ..., y_T) = ∏_{t=1}^T f(y_t; θ)

The log-likelihood is

    ln L(θ|y) = ∑_{t=1}^T ln f(y_t; θ)
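
As a concrete illustration of the i.i.d. case (an addition, not part of the original slides), the sketch below evaluates the Gaussian log-likelihood ∑_t ln f(y_t; θ) for a simulated sample; the sample size, mean, and variance are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.normal(loc=1.0, scale=2.0, size=200)   # simulated i.i.d. data (illustrative values)

    def gaussian_loglik(y, mu, sigma2):
        """Sum of log N(mu, sigma2) densities over the sample."""
        return -0.5 * len(y) * np.log(2 * np.pi * sigma2) - 0.5 * np.sum((y - mu) ** 2) / sigma2

    print(gaussian_loglik(y, mu=1.0, sigma2=4.0))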

Conditional MLE of ARMA Models

Problem: For a sample from a covariance-stationary time series {y_t}, the construction of the log-likelihood given above does not work, because the random variables in the sample y = (y_1, ..., y_T) are not i.i.d.

One solution: conditional factorization of the log-likelihood. Intuition: consider the joint density of two adjacent observations, f(y_2, y_1; θ). The joint density can always be factored as the product of the conditional density of y_2 given y_1 and the marginal density of y_1:

    f(y_2, y_1; θ) = f(y_2|y_1; θ) f(y_1; θ)

For three observations, the factorization becomes

    f(y_3, y_2, y_1; θ) = f(y_3|y_2, y_1; θ) f(y_2|y_1; θ) f(y_1; θ)

In general, the conditional/marginal factorization has the form

    f(y_T, ..., y_1; θ) = [∏_{t=p+1}^T f(y_t|I_{t−1}; θ)] · f(y_p, ..., y_1; θ)

    I_t = {y_t, ..., y_1} = information available at time t
    y_p, ..., y_1 = initial values

The exact log-likelihood function may then be expressed as

    ln L(θ|y) = ∑_{t=p+1}^T ln f(y_t|I_{t−1}; θ) + ln f(y_p, ..., y_1; θ)

The conditional log-likelihood is

    ln L_c(θ|y) = ∑_{t=p+1}^T ln f(y_t|I_{t−1}; θ)

Two types of maximum likelihood estimates (MLEs) may be computed. The first type is based on maximizing the conditional log-likelihood function. These estimates are called conditional MLEs and are defined by

    θ̂_cmle = arg max_θ ∑_{t=p+1}^T ln f(y_t|I_{t−1}; θ)

The second type is based on maximizing the exact log-likelihood function. These estimates are called exact MLEs and are defined by

    θ̂_mle = arg max_θ [∑_{t=p+1}^T ln f(y_t|I_{t−1}; θ) + ln f(y_p, ..., y_1; θ)]

Result:

For stationary models, θ̂_cmle and θ̂_mle are consistent and have the same limiting normal distribution. In finite samples, however, θ̂_cmle and θ̂_mle are generally not equal and may differ by a substantial amount if the data are close to being non-stationary or non-invertible.

AR(p): OLS is equivalent to Conditional MLE

Model:

    y_t = c + φ_1 y_{t−1} + ... + φ_p y_{t−p} + ε_t,   ε_t ~ WN(0, σ²)
        = x_t'β + ε_t,   β = (c, φ_1, ..., φ_p)',   x_t = (1, y_{t−1}, y_{t−2}, ..., y_{t−p})'

OLS:

    β̂ = (∑_{t=p+1}^T x_t x_t')^{−1} ∑_{t=p+1}^T x_t y_t,
    σ̂² = (1/(T − (p + 1))) ∑_{t=p+1}^T (y_t − x_t'β̂)²
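
A minimal sketch (not from the slides) of the OLS/conditional-MLE estimator for an AR(p): treat the first p observations as initial values, build the lagged design matrix for t = p+1, ..., T, and solve the least-squares problem. The AR(2) coefficients used to simulate data are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)

    # simulate an AR(2): y_t = 0.5 + 0.6 y_{t-1} - 0.2 y_{t-2} + e_t (illustrative parameters)
    T, p = 500, 2
    y = np.zeros(T)
    for t in range(p, T):
        y[t] = 0.5 + 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.normal(scale=1.0)

    # design matrix x_t = (1, y_{t-1}, ..., y_{t-p})' for t = p+1, ..., T
    X = np.column_stack([np.ones(T - p)] + [y[p - j:T - j] for j in range(1, p + 1)])
    Y = y[p:]

    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]    # (c, phi_1, ..., phi_p)
    resid = Y - X @ beta_hat
    sigma2_hat = resid @ resid / (len(Y) - (p + 1))    # degrees-of-freedom correction

    print(beta_hat, sigma2_hat)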

Properties of the estimator

φ̂ is downward biased in a finite sample, i.e. E[φ̂] < φ. The estimator may be biased, but it is consistent: it converges in probability to φ.

Example: MLE for a stationary AR(1)

    Y_t = c + φ Y_{t−1} + ε_t,   t = 1, ..., T
    ε_t ~ WN(0, σ²),   |φ| < 1,   θ = (c, φ, σ²)'

Conditional on I_{t−1},

    y_t | I_{t−1} ~ N(c + φ y_{t−1}, σ²),   t = 2, ..., T

which depends only on y_{t−1}. The conditional density f(y_t|I_{t−1}; θ) is then

    f(y_t|y_{t−1}; θ) = (2πσ²)^{−1/2} exp(−(1/(2σ²)) (y_t − c − φ y_{t−1})²),   t = 2, ..., T

To determine the marginal density for the initial value y_1, recall that for a stationary AR(1) process

    E[y_1] = μ = c/(1 − φ),   var[y_1] = σ²/(1 − φ²)

It follows that

    y_1 ~ N(c/(1 − φ), σ²/(1 − φ²))

    f(y_1; θ) = (2πσ²/(1 − φ²))^{−1/2} exp(−((1 − φ²)/(2σ²)) (y_1 − c/(1 − φ))²)

The conditional log-likelihood function is

    ∑_{t=2}^T ln f(y_t|y_{t−1}; θ) = −((T − 1)/2) ln(2π) − ((T − 1)/2) ln(σ²)
                                     − (1/(2σ²)) ∑_{t=2}^T (y_t − c − φ y_{t−1})²

Notice that the conditional log-likelihood function has the form of the log-likelihood function for a linear regression model with normal errors,

    y_t = c + φ y_{t−1} + ε_t,   ε_t ~ N(0, σ²),   t = 2, ..., T

It follows that

    ĉ_cmle = ĉ_ols,   φ̂_cmle = φ̂_ols,
    σ̂²_cmle = (1/(T − 1)) ∑_{t=2}^T (y_t − ĉ_cmle − φ̂_cmle y_{t−1})²

The marginal log-likelihood for the initial value y_1 is

    ln f(y_1; θ) = −(1/2) ln(2π) − (1/2) ln(σ²/(1 − φ²)) − ((1 − φ²)/(2σ²)) (y_1 − c/(1 − φ))²

The exact log-likelihood function is then

    ln L(θ; y) = −(T/2) ln(2π) − (1/2) ln(σ²/(1 − φ²)) − ((1 − φ²)/(2σ²)) (y_1 − c/(1 − φ))²
                 − ((T − 1)/2) ln(σ²) − (1/(2σ²)) ∑_{t=2}^T (y_t − c − φ y_{t−1})²
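
To make the comparison between the conditional and exact AR(1) estimators concrete, here is a hedged sketch (not from the slides): it codes both log-likelihoods exactly as written above and maximizes them numerically; the starting values and the choice of optimizer are illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def cond_loglik(params, y):
        """Conditional AR(1) log-likelihood, sum over t = 2, ..., T."""
        c, phi, sigma2 = params
        if sigma2 <= 0:
            return -np.inf
        e = y[1:] - c - phi * y[:-1]
        n = len(e)
        return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(e ** 2) / sigma2

    def exact_loglik(params, y):
        """Exact AR(1) log-likelihood = conditional part + marginal density of y_1."""
        c, phi, sigma2 = params
        if sigma2 <= 0 or abs(phi) >= 1:
            return -np.inf
        var1 = sigma2 / (1 - phi ** 2)
        ll1 = -0.5 * np.log(2 * np.pi * var1) - 0.5 * (y[0] - c / (1 - phi)) ** 2 / var1
        return ll1 + cond_loglik(params, y)

    # y is assumed to be a 1-D numpy array of observations
    def fit(loglik, y, start=(0.0, 0.5, 1.0)):
        res = minimize(lambda p: -loglik(p, y), x0=np.asarray(start), method="Nelder-Mead")
        return res.x  # (c_hat, phi_hat, sigma2_hat)

For data far from the unit root the two sets of estimates should be close; near φ ≈ 1 they can differ noticeably, as the Result slide above states.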

Prediction Error Decomposition

To illustrate this algorithm, consider the simple AR(1) model. Recall

    y_t | I_{t−1} ~ N(c + φ y_{t−1}, σ²),   t = 2, ..., T

from which it follows that

    E[y_t|I_{t−1}] = c + φ y_{t−1},   var[y_t|I_{t−1}] = σ²

The 1-step-ahead prediction errors may then be defined as

    v_t = y_t − E[y_t|I_{t−1}] = y_t − c − φ y_{t−1},   t = 2, ..., T

The variance of the prediction error at time t is

    f_t = var(v_t) = var(ε_t) = σ²,   t = 2, ..., T

For the initial value, the first prediction error and its variance are

    v_1 = y_1 − E[y_1] = y_1 − c/(1 − φ),   f_1 = var(v_1) = σ²/(1 − φ²)

Using the prediction errors and the prediction error variances, the exact log-likelihood function may be re-expressed as

    ln L(θ|y) = −(T/2) ln(2π) − (1/2) ∑_{t=1}^T ln f_t − (1/2) ∑_{t=1}^T v_t²/f_t

which is the prediction error decomposition. A further simplification may be achieved by writing

    var(v_t) = σ² f_t,   with f_t = 1/(1 − φ²) for t = 1 and f_t = 1 for t > 1

Then the log-likelihood becomes

    ln L(θ|y) = −(T/2) ln(2π) − (T/2) ln(σ²) − (1/2) ∑_{t=1}^T ln f_t − (1/(2σ²)) ∑_{t=1}^T v_t²/f_t
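
The following sketch (an illustration, not from the slides) evaluates the AR(1) log-likelihood through the prediction error decomposition; it should agree with exact_loglik above up to floating-point error.

    import numpy as np

    def ped_loglik(params, y):
        """AR(1) log-likelihood via the prediction error decomposition."""
        c, phi, sigma2 = params
        T = len(y)
        v = np.empty(T)
        f = np.empty(T)
        v[0] = y[0] - c / (1 - phi)          # first prediction error
        f[0] = 1.0 / (1 - phi ** 2)          # scaled variance: var(v_1) = sigma2 * f_1
        v[1:] = y[1:] - c - phi * y[:-1]     # one-step-ahead prediction errors
        f[1:] = 1.0                          # var(v_t) = sigma2 for t > 1
        return (-0.5 * T * np.log(2 * np.pi) - 0.5 * T * np.log(sigma2)
                - 0.5 * np.sum(np.log(f)) - 0.5 * np.sum(v ** 2 / f) / sigma2)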

MLE Estimation of MA(1)

Recall

    Y_t = μ + θ ε_{t−1} + ε_t,   |θ| < 1,   ε_t ~ WN(0, σ²)

|θ| < 1 is assumed only for an invertible representation; it says nothing about stationarity (an MA(1) is stationary for any θ).

Estimation of the MA(1)

    Y_t | ε_{t−1} ~ N(μ + θ ε_{t−1}, σ²)

    f(y_t | ε_{t−1}; μ, θ, σ²) = (1/√(2πσ²)) exp(−(1/(2σ²)) (y_t − μ − θ ε_{t−1})²)

The parameters are (μ, θ, σ²).

Problem: without knowing ε_{t−2} we do not observe ε_{t−1}; we need ε_{t−2} to compute ε_{t−1} = y_{t−1} − μ − θ ε_{t−2}. But ε_{t−1} is unobservable. Assume ε_0 = 0: make it non-random and simply fix it at zero. (The same device works with any fixed number.)

Estimation of the MA(1)

    Y_1 | ε_0 = 0 ~ N(μ, σ²)
    y_1 = μ + ε_1  ⟹  ε_1 = y_1 − μ
    y_2 = μ + ε_2 + θ ε_1  ⟹  ε_2 = (y_2 − μ) − θ(y_1 − μ)
    ...
    ε_t = (y_t − μ) − θ(y_{t−1} − μ) + ... + (−1)^{t−1} θ^{t−1} (y_1 − μ)

Conditional likelihood:

    L(μ, θ, σ² | y_1, ..., y_T; ε_0 = 0) = ∏_{t=1}^T (1/√(2πσ²)) exp(−ε_t²/(2σ²))

If |θ| < 1 (strictly), the assumption about ε_0 does not matter asymptotically and the CMLE is consistent. Exact MLE requires the Kalman filter.
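
A short sketch (not part of the slides) of the conditional MA(1) log-likelihood: the ε_t's are built recursively from ε_0 = 0 exactly as above, and the Gaussian log-density is summed. It could be handed to the same numerical optimizer used in the AR(1) example.

    import numpy as np

    def ma1_cond_loglik(params, y):
        """Conditional MA(1) log-likelihood with eps_0 fixed at 0."""
        mu, theta, sigma2 = params
        if sigma2 <= 0:
            return -np.inf
        eps = np.empty(len(y))
        prev = 0.0                            # eps_0 = 0 (the conditioning assumption)
        for t, yt in enumerate(y):
            eps[t] = yt - mu - theta * prev   # eps_t = y_t - mu - theta * eps_{t-1}
            prev = eps[t]
        T = len(y)
        return -0.5 * T * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(eps ** 2) / sigma2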

Why do we need Exact MLEs?

In the estimation of MA(1) models we assumed that ε_0 = 0 while calculating the sequence of ε_t's. A more traditional approach is to estimate the unconditional, or exact, log-likelihood function by assuming that ε_0 is random, allowing it to follow some distribution, and using that to calculate the ε_t's from the data. Such an allowance affects not only the prediction error and the variance of y_1, but also the successive prediction errors and their variances. The practice of obtaining the parameter estimates not by conditioning on the pre-sample values but in this exact manner is called the exact or unconditional ML estimation method. Note that the problem of obtaining the sequence of ε_t's arises only in pure MA or mixed models.

Exact MLEs

What are the advantages of such a procedure? To understand this, an examination of the first prediction error and its variance is instructive. For example, in an MA(1) process the first prediction error is the same, y_1 − μ, in both the conditional and the exact ML estimation procedures. But the assumption ε_0 = 0 means that var(v_1) = var(ε_1) = σ², whereas if we allow ε_0 to be random, var(v_1) = σ²(1 + θ²). This is reflected in f_1 in the exact log-likelihood function, and in the typically small sample sizes such an assumption may matter a lot for the estimates. Moreover, if θ happens to be close to unity, the differences will be even more significant.

Exact MLEs

Estimating the exact log-likelihood function is more difficult and, until fairly recently, was a costly exercise in terms of computing; with today's cheap computing power, cost should not be a deterrent to using the exact ML estimation method. The job becomes easier by noting that, for any ARMA model, the two key equations, namely those for the prediction error and the prediction error variance, are recursive in nature, and the literature shows that they can be computed with two popular methods: (1) the triangular factorization (TF) method, and (2) the Kalman filter recursions.

Estimation

For the Gaussian AR(1) process

    Y_t = c + φ Y_{t−1} + ε_t,   |φ| < 1,   ε_t ~ WN(0, σ²)

the joint distribution of Y_T = (Y_1, Y_2, ..., Y_T)' is

    Y_T ~ N(μ, Ω)

and the observations y = (y_1, y_2, ..., y_T)' are a single realization of Y_T.

MLE for the AR(1)

    (Y_1, ..., Y_T)' ~ N(μ, Ω)

    μ = (μ, μ, ..., μ)',   μ = c/(1 − φ)

    Ω = [ γ_0      γ_1     ...  γ_{T−1}
          γ_1      γ_0     ...  γ_{T−2}
          ...      ...     ...  ...
          γ_{T−1}  γ_{T−2} ...  γ_0     ]

MLE for the AR(1)

The p.d.f. of the sample y = (y_1, y_2, ..., y_T)' is given by the multivariate normal density

    f_Y(y; μ, Ω) = (2π)^{−T/2} |Ω|^{−1/2} exp{−(1/2)(y − μ)'Ω^{−1}(y − μ)}

Denote Ω = σ_y² V with v_{ij} = ρ_{|i−j|} (note σ_y² = γ_0), so that

    Ω = [γ_{|i−j|}]_{i,j=1,...,T} = γ_0 [ρ_{|i−j|}]_{i,j=1,...,T}

For the AR(1), ρ(j) = φ^j, so Ω = σ_y² V where V is the T×T matrix with (i, j) element ρ(|i−j|) = φ^{|i−j|}.

Collecting the parameters of the model in θ = (c, φ, σ²)', the joint p.d.f. becomes

    f_Y(y; θ) = (2πσ_y²)^{−T/2} |V|^{−1/2} exp{−(1/(2σ_y²))(y − μ)'V^{−1}(y − μ)}

and the sample log-likelihood function is given by

    ln L(θ) = −(T/2) ln(2π) − (T/2) ln(σ_y²) − (1/2) ln(|V|) − (1/(2σ_y²))(y − μ)'V^{−1}(y − μ)
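
As an illustration (not in the original slides), the matrix-form Gaussian log-likelihood above can be evaluated directly by building V from ρ(j) = φ^j; this is practical only for moderate T, which motivates the factorization and Kalman-filter approaches that follow. The function name is mine.

    import numpy as np

    def ar1_mvn_loglik(params, y):
        """Exact AR(1) log-likelihood via the T x T multivariate normal density."""
        c, phi, sigma2 = params
        T = len(y)
        mu = c / (1 - phi)
        sigma2_y = sigma2 / (1 - phi ** 2)                 # var(y_t)
        idx = np.arange(T)
        V = phi ** np.abs(idx[:, None] - idx[None, :])     # v_ij = phi^|i-j|
        dev = y - mu
        sign, logdetV = np.linalg.slogdet(V)
        quad = dev @ np.linalg.solve(V, dev)
        return (-0.5 * T * np.log(2 * np.pi) - 0.5 * T * np.log(sigma2_y)
                - 0.5 * logdetV - 0.5 * quad / sigma2_y)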

MLE

The exact log-likelihood function is a non-linear function of the parameters θ; there is no closed-form solution for the exact MLEs.

The exact MLEs must be determined by numerically maximizing the exact log-likelihood function. Usually a Newton-Raphson type algorithm is used for the maximization, which leads to the iterative scheme

    θ̂_{mle,n+1} = θ̂_{mle,n} − H(θ̂_{mle,n})^{−1} s(θ̂_{mle,n})

where H(θ̂_{mle,n}) is an estimate of the Hessian matrix (the second derivative of the log-likelihood function) and s(θ̂_{mle,n}) is an estimate of the score vector (the first derivative of the log-likelihood function).

The estimates of the Hessian and score may be computed numerically (using numerical derivative routines) or analytically (if analytic derivatives are known).
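
A hedged sketch (not from the slides) of the Newton-Raphson scheme above, using simple central finite differences for the score and Hessian; the step size and stopping rule are arbitrary choices. In practice a quasi-Newton routine such as scipy.optimize.minimize with method="BFGS" is usually more robust.

    import numpy as np

    def numerical_score_hessian(loglik, theta, h=1e-5):
        """Central-difference gradient (score) and Hessian of loglik at theta."""
        k = len(theta)
        s = np.zeros(k)
        H = np.zeros((k, k))
        for i in range(k):
            e_i = np.zeros(k); e_i[i] = h
            s[i] = (loglik(theta + e_i) - loglik(theta - e_i)) / (2 * h)
            for j in range(k):
                e_j = np.zeros(k); e_j[j] = h
                H[i, j] = (loglik(theta + e_i + e_j) - loglik(theta + e_i - e_j)
                           - loglik(theta - e_i + e_j) + loglik(theta - e_i - e_j)) / (4 * h * h)
        return s, H

    def newton_raphson(loglik, theta0, tol=1e-6, max_iter=50):
        theta = np.asarray(theta0, dtype=float)
        for _ in range(max_iter):
            s, H = numerical_score_hessian(loglik, theta)
            step = np.linalg.solve(H, s)      # H^{-1} s
            theta = theta - step              # Newton-Raphson update
            if np.max(np.abs(step)) < tol:
                break
        return theta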

Factorization

Note that for large T, Ω may be large and difficult to invert.

Since Ω is a positive definite symmetric matrix, there exists a unique triangular factorization of Ω:

    Ω = A f A'

where f is a T×T diagonal matrix, f = diag(f_1, f_2, ..., f_T) with f_t > 0 for all t, and A is a T×T lower triangular matrix with 1s along the principal diagonal:

    A = [ 1       0      ...  0
          a_{21}  1      ...  0
          ...     ...    ...  ...
          a_{T1}  a_{T2} ...  1 ]

Likelihood

The likelihood function can be rewritten as

    L(θ|y_T) = (2π)^{−T/2} det(A f A')^{−1/2} exp{−(1/2)(y_T − μ)'(A f A')^{−1}(y_T − μ)}

This works by converting the correlated variables y_1, y_2, ..., y_T into a collection, say ε_1, ε_2, ..., ε_T, of uncorrelated variables. In the following, let P_j denote the projection onto the random variables y_1, ..., y_j. Define

    ε = A^{−1}(y_T − μ)   (the prediction errors),

so that A ε = y_T − μ.

Since A is a lower-triangular matrix with 1s along the principal diagonal,

    ε_1 = y_1 − μ
    ε_2 = y_2 − μ − a_{21} ε_1   (= y_2 − P_1 y_2)
    ε_3 = y_3 − μ − a_{31} ε_1 − a_{32} ε_2
    ...
    ε_T = y_T − μ − ∑_{i=1}^{T−1} a_{Ti} ε_i

Also, since A is lower triangular with 1s along the principal diagonal, det(A) = 1, so

    det(A f A') = det(A) · det(f) · det(A') = det(f)

Then

    L(θ|y_T) = (2π)^{−T/2} det(f)^{−1/2} exp{−(1/2) ε'f^{−1}ε}
             = ∏_{t=1}^T (1/√(2π f_t)) exp(−ε_t²/(2 f_t))

where ε_t, the t-th element of the T×1 vector ε, is the prediction error y_t − ŷ_{t|t−1}, with

    ŷ_{t|t−1} = μ − ∑_{i=1}^{t−1} ã_{t,i} (y_i − μ),   t = 2, 3, ..., T

and ã_{t,i} is the (t, i)-th element of A^{−1}.
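
A sketch (not from the slides) of the triangular-factorization route: obtain Ω = A f A' from a Cholesky factor, form ε = A^{−1}(y − μ), and sum the prediction-error terms. For the AR(1), with omega built as σ_y² times the φ^{|i−j|} matrix, this should reproduce ar1_mvn_loglik above; the helper name is mine.

    import numpy as np

    def tf_loglik(y, mu, omega):
        """Gaussian log-likelihood via the triangular factorization omega = A f A'."""
        T = len(y)
        C = np.linalg.cholesky(omega)         # omega = C C', C lower triangular
        d = np.diag(C)
        A = C / d                             # unit lower triangular factor
        f = d ** 2                            # diagonal elements of f
        eps = np.linalg.solve(A, y - mu)      # prediction errors eps = A^{-1}(y - mu)
        return np.sum(-0.5 * np.log(2 * np.pi * f) - 0.5 * eps ** 2 / f)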

Kalman Filters

The Kalman filter comprises a set of mathematical equations which result from an optimal recursive solution given by the least squares method.

The purpose of this solution is to compute a linear, unbiased and optimal estimator of a system's state at time t, based on the information available at t−1, and to update these estimates with the additional information available at t.

The filter's performance assumes that the system can be described through a stochastic linear model with an associated error following a normal distribution with mean zero and known variance.

Kalman Filters

The Kalman filter is the main algorithm for estimating dynamic systems in state-space form. This representation of the system is described by a set of state variables. The state contains all information relative to the system at a given point in time. This information should allow us to infer the past behaviour of the system, with the aim of predicting its future behaviour.

DEVELOPING THE KALMAN FILTER ALGORITHM

The basic building blocks of a Kalman filter are two equations: the measurement equation and the transition equation. The measurement equation relates the unobserved data (x_t, where t indicates a point in time) to the observable data (y_t):

    y_t = m x_t + v_t                                    (1)

where E(v_t) = 0 and var(v_t) = r_t.

The transition equation is based on a model that allows the unobserved data to change through time:

    x_{t+1} = a x_t + w_t                                (2)

where E(w_t) = 0 and var(w_t) = q_t. The process starts with an initial estimate of x_t, call it x_0, which has a mean of μ_0 and a standard deviation of σ_0. Using the expectation of equation (2), a prediction for x_1 emerges, call it x̂_1:

    x̂_1 = E(a x_0 + w_0) = a μ_0                         (3)

The predicted value from equation (3) is then inserted into equation (1) and again taken as an expectation to produce a prediction for y_1, call it ŷ_1:

    ŷ_1 = E(m x̂_1 + v_1) = m E(x̂_1) = m a μ_0            (4)
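
For concreteness (an addition, not from the slides), here is a simulation of the scalar state-space model in equations (1)-(2); the parameter values a, m, q, r and the initial state distribution are arbitrary assumptions used only to generate example data for the filter sketched later.

    import numpy as np

    rng = np.random.default_rng(2)

    # illustrative parameter choices for x_{t+1} = a x_t + w_t,  y_t = m x_t + v_t
    a, m, q, r = 0.8, 1.0, 0.5, 1.0
    mu0, sigma0 = 0.0, 1.0
    T = 200

    x = np.empty(T + 1)
    y = np.empty(T + 1)
    x[0] = rng.normal(mu0, sigma0)                            # initial state x_0
    y[0] = m * x[0] + rng.normal(scale=np.sqrt(r))
    for t in range(T):
        x[t + 1] = a * x[t] + rng.normal(scale=np.sqrt(q))        # transition equation (2)
        y[t + 1] = m * x[t + 1] + rng.normal(scale=np.sqrt(r))    # measurement equation (1)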

Thus far, predictions of the future are based on expectations and not on the variance or standard deviation associated with the predicted variables. The variance will eventually be incorporated to produce better estimates. However, the next step is to compare the predicted y_1 (i.e. ŷ_1) with the actual y_1 when it occurs. In equation (5), the predicted value and the actual value of y_1 are compared to produce the predicted (or expected) error, ŷ^e_1:

    ŷ^e_1 = E(y_1 − ŷ_1) = y_1 − m E(x̂_1) = y_1 − m a μ_0          (5)

Given the error in predicting y_1, which is based on the expectation of x_1 from equation (3), a new estimate of x_1 is considered, x_{1E}. Notice this is different from x̂_1 because x_{1E} incorporates the prediction error of y_1.

Equation (6) identifies x_{1E} as an expectation of an adjusted x_1:

    x_{1E} = E[x̂_1 + k_1 ŷ^e_1] = E[x̂_1] + k_1 (y_1 − E(m x̂_1))
           = a μ_0 + k_1 (y_1 − m a μ_0)                                        (6)

k_1 (or, more generically, k_t) in equation (6) is referred to as the Kalman gain and incorporates the variance of x̂_1 (denoted p_1, or generically p_t) and the variance of ŷ_1 (see the denominator in equation (8) below):

    var(x̂_1) = var(a x_0) + var(w_0) = σ_0² a² + q_0 = p_1                      (7)

    k_1 = m p_1 / (p_1 m² + r_0) = m (σ_0² a² + q_0) / ((σ_0² a² + q_0) m² + r_0)   (8)

The cycle then starts over, with x_{1E} taking the place of x_0 in equation (3) and used to forecast y_2.

The mean of x_{1E} is the value of the expectation calculated in equation (6). Notice that the mean of x_{1E} incorporates the mean of x_0, the variance of x_0 (via the Kalman gain), the variance of the error in the measurement equation (via the Kalman gain), the variance of the error in the transition equation (via the Kalman gain), and the observed y_1.

The variance of x_{1E} is

    var(x_{1E}) = p_1 [1 − k_1 m] = p_1 [1 − 1/(1 + r_0/((σ_0² a² + q_0) m²))]      (9)

Notice that the variance of x_{1E} is reduced relative to the variance of x̂_1.

Further, because the distributional aspects of each estimated value of x_t are known (assuming a model), the model parameters within the measurement and transition equations can be optimized using maximum likelihood estimation. An iterative sequence of filtering followed by parameter estimation eventually optimizes the entire system.

Predict the future unobserved variable (x) based on the current estimate of the unobserved variable:

    x̂_{t+1} = E(a x_{tE} + w_t)                           (10)

Use the predicted unobserved variable to predict the future observable variable (y):

    ŷ_{t+1} = E(m x̂_{t+1} + v_{t+1})                      (11)

When the future observable variable actually occurs, calculate the error in the prediction:

    ŷ^e_{t+1} = E(y_{t+1} − ŷ_{t+1})                      (12)

Generate a better estimate of the unobserved variable at time (t + 1), and start the process over for time (t + 2):

    x_{(t+1)E} = E[x̂_{t+1} + k_{t+1} ŷ^e_{t+1}]           (13)

Note: k_{t+1} is the Kalman gain and is based on the variances of the predicted variables in the first and second steps of the process:

    k_{t+1} = m var(x̂_{t+1}) / var(ŷ_{t+1})               (14)
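
Putting equations (10)-(14) together, here is a hedged sketch of the scalar Kalman recursion (my own illustration, reusing the simulated a, m, q, r and data y from the earlier snippet); it also accumulates the Gaussian prediction-error log-likelihood that the next slides maximize.

    import numpy as np

    def kalman_filter(y, a, m, q, r, mu0, sigma0_sq):
        """Scalar Kalman filter for x_{t+1} = a x_t + w_t, y_t = m x_t + v_t.
        Returns filtered state means and the prediction-error log-likelihood."""
        x_e, p_e = mu0, sigma0_sq          # current state estimate and its variance
        loglik = 0.0
        x_filtered = []
        for yt in y:
            # prediction step, equations (10)-(11)
            x_pred = a * x_e               # predicted state
            p_pred = a * a * p_e + q       # variance of the predicted state, cf. equation (7)
            y_pred = m * x_pred            # predicted observation
            f = m * m * p_pred + r         # variance of the predicted observation
            # update step, equations (12)-(14)
            err = yt - y_pred              # prediction error
            k = m * p_pred / f             # Kalman gain, cf. equation (8)
            x_e = x_pred + k * err         # updated state estimate, cf. equation (6)
            p_e = p_pred * (1 - k * m)     # updated state variance, cf. equation (9)
            x_filtered.append(x_e)
            loglik += -0.5 * np.log(2 * np.pi * f) - 0.5 * err ** 2 / f
        return np.array(x_filtered), loglik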

APPLYING MAXIMUM LIKELIHOOD ESTIMATION TO THE KALMAN FILTER

The Kalman filter provides output throughout the time series in the form of estimated values (e.g. x_{1E} from equation (6)), which are the means/expectations of the unobserved variables x_1, ..., x_T for every time period t, with associated variances provided by equation (9). (Note: x_0 is also drawn from a distribution, with mean μ_0 and variance σ_0², and is part of the time series.)

Keeping with the univariate structure from the previous section and assuming T time periods of data, maximum likelihood estimation (MLE) is imposed by further assuming that x_0, w_1, ..., w_T, and v_1, ..., v_T are jointly normal and uncorrelated. Consequently, a joint likelihood function exists:

    {(1/√(2πσ_0²)) exp(−(x_0 − μ_0)²/(2σ_0²))}
    × {[1/√(2πq_t)]^T exp(−∑_{t=1}^T (x_t − E(x_t))²/(2q_t))}
    × {[1/√(2πr_t)]^T exp(−∑_{t=1}^T (y_t − E(y_t))²/(2r_t))}                      (15)

Recall: x_0 is distributed N(μ_0, σ_0²), w_t is distributed N(0, q_t), and v_t is distributed N(0, r_t).

The problem with the likelihood function in equation (15) is that x_t is not observable.

However, as noted above, the Kalman filter provides means and variances for x_t throughout the time series. Consequently, remove the first two terms and define the mean of y_t as ŷ_t = m x̂_t = a m x_{(t−1)E}, define the variance of y_t as p_t m² + r_{t−1}, and keep the assumption that y_t is normally distributed. Notice that the variance definition (via p_t; see the denominator in equation (8)) and the mean definition incorporate the distributional properties of x_t while dealing with the observable y_t. Further, the initial conditions are introduced when t equals one, as x_{0E} = μ_0 and through p_1 (see equation (7)). The new likelihood function becomes

    ∏_{t=1}^T {1/√(2π(p_t m² + r_{t−1}))} exp(−(y_t − a m x_{(t−1)E})²/(2(p_t m² + r_{t−1})))      (16)

To simplify the math, take the natural logarithm of the likelihood function, creating the log-likelihood function:

    −(T/2) ln(2π) − (1/2) ∑_{t=1}^T ln[p_t m² + r_{t−1}]
    − (1/2) ∑_{t=1}^T (y_t − a m x_{(t−1)E})²/(p_t m² + r_{t−1})                   (17)

Further assume that r_t and q_t (contained in p_t) are constant through time. Consequently, equation (17) has a value, which is called the score. The next step is to maximize the log-likelihood function using the partial derivatives with respect to r (= r_t for all t), q (= q_t for all t), and a.

After solving for these three parameters by setting the partial derivatives to zero, the Kalman filter is re-estimated and the maximum likelihood procedure is applied again, until the score improves by less than a particular value (say 0.0001), indicating convergence. The iterative use of the MLE procedure is called the Expectation-Maximization (EM) algorithm.
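
As a closing illustration (not part of the original slides), the log-likelihood in equation (17) can also be maximized directly with a numerical optimizer over (a, q, r), reusing the kalman_filter sketch above, which already returns the prediction-error log-likelihood; an EM loop alternating filtering and parameter updates, as described on the last slide, is the classical alternative. The starting values and the log-parameterization of q and r are my own choices.

    import numpy as np
    from scipy.optimize import minimize

    def fit_state_space(y, m=1.0, mu0=0.0, sigma0_sq=1.0):
        """Estimate (a, q, r) by maximizing the Kalman prediction-error log-likelihood."""
        def neg_loglik(params):
            a, log_q, log_r = params          # log-parameterize q, r to keep them positive
            _, ll = kalman_filter(y, a, m, np.exp(log_q), np.exp(log_r), mu0, sigma0_sq)
            return -ll
        res = minimize(neg_loglik, x0=np.array([0.5, 0.0, 0.0]), method="Nelder-Mead")
        a_hat, q_hat, r_hat = res.x[0], np.exp(res.x[1]), np.exp(res.x[2])
        return a_hat, q_hat, r_hat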