Automated Estimation of Vector Error Correction Models
Zhipeng Liao*  Peter C. B. Phillips†
December, 2010
Abstract
Model selection and associated issues of post-model selection inference present well known challenges in empirical econometric research. These modeling issues are manifest in all applied work but they are particularly acute in multivariate time series settings such as cointegrated systems, where multiple interconnected decisions can materially affect the form of the model and its interpretation. In cointegrated system modeling, empirical estimation typically proceeds in a stepwise manner that involves the determination of cointegrating rank and autoregressive lag order in a reduced rank vector autoregression followed by estimation and inference. This paper proposes an automated approach to cointegrated system modeling that uses adaptive shrinkage techniques to estimate vector error correction models with unknown cointegrating rank structure and unknown transient lag dynamic order. These methods enable simultaneous order estimation of the cointegrating rank and autoregressive order in conjunction with oracle-like efficient estimation of the cointegrating matrix and transient dynamics. As such they offer considerable advantages to the practitioner as an automated approach to the estimation of cointegrated systems. The paper develops the new methods, derives their limit theory, reports simulations and presents an empirical illustration.

Keywords: Adaptive shrinkage; Automation; Cointegrating rank; Lasso regression; Oracle efficiency; Transient dynamics; Vector error correction.

JEL classification: C22
*Yale University. Support from an Anderson Fellowship is gratefully acknowledged. Email: zhipeng.liao@yale.edu
†Yale University, University of Auckland, University of Southampton and Singapore Management University. Support from the NSF under Grant No. SES 09-56687 is gratefully acknowledged. Email: peter.phillips@yale.edu
1 Introduction
Cointegrated system modeling is now one of the main workhorses in empirical time series research. Much of this empirical research makes use of vector error correction model (VECM) formulations. While there is often some prior information concerning the number of cointegrating vectors, most practical work involves (at least confirmatory) pre-testing to determine the cointegrating rank of the system as well as the lag order in the autoregressive component that embodies the transient dynamics. These order selection decisions can be made by sequential likelihood ratio tests (e.g. Johansen, 1988, for rank determination) or the application of suitable information criteria (Phillips, 1996). The latter approach offers several advantages such as joint determination of the cointegrating rank and autoregressive order, consistent estimation of both order parameters (Chao and Phillips, 1999), robustness to heterogeneity in the errors, and the convenience and generality of semi-parametric estimation in cases where the focus is simply the cointegrating rank (Cheng and Phillips, 2010). While appealing for practitioners, all of these methods are nonetheless subject to pre-test bias and post model selection inferential problems (Leeb and Pötscher, 2005).
The present paper explores a different approach. The goal is to liberate the empirical researcher from sequential testing procedures and post model selection issues in inference about cointegrated systems and in policy work that relies on impulse responses. The ideas originate in recent work on sparse system estimation using shrinkage techniques such as lasso and bridge regression. These procedures utilize penalized least squares criteria in regression that can succeed, at least asymptotically, in selecting the correct regressors in a linear regression framework while consistently estimating the non-zero regression coefficients.
One of the contributions of this paper is to show how to develop adaptive versions of these shrinkage methods that apply in vector error correction modeling. The implementation of these methods is not immediate. This is partly because of the nonlinearities involved in potential reduced rank structures and partly because of the interdependence of decision making concerning the form of the transient dynamics and the cointegrating rank structure. The paper designs a mechanism of estimation and selection that works through the eigenvalues of the levels coefficient matrix and the coefficient matrices of the transient dynamic components. The methods apply in quite general vector systems with unknown cointegrating rank structure and unknown lag dynamics. They permit simultaneous order estimation of the cointegrating rank and autoregressive order in conjunction with oracle-like efficient estimation of the cointegrating matrix and transient dynamics. As such they offer considerable advantages to the practitioner: in effect, it becomes unnecessary to implement pre-testing procedures because the empirical results reveal the order parameters as a consequence of the fitting procedure. In this sense, the methods provide an automated approach to the estimation of cointegrated systems. In the scalar case, the methods reduce to estimation in the presence or absence of a unit root and thereby implement an implicit unit root test procedure, as suggested in earlier work by Caner and Knight (2009).
The paper is organized as follows. Section 2 lays out the model and assumptions and shows how to implement adaptive shrinkage methods in VECM systems. Section 3 considers a simplified first order version of the VECM without lagged differences, which reveals the approach to cointegrating rank selection and develops key elements in the limit theory. Here we show that the cointegrating rank $r_o$ is identified by the number of zero eigenvalues of $\Pi_o$ and that the latter is consistently recovered by suitably designed shrinkage estimation. Section 4 extends this system and its asymptotics to the general case of cointegrated systems with weakly dependent errors. Here it is demonstrated that the cointegration rank $r_o$ can be consistently selected despite the fact that $\Pi_o$ may not be consistently estimable. Section 5 deals with the practically important case of a general VECM system driven by independent identically distributed (i.i.d.) shocks, where shrinkage estimation simultaneously performs consistent lag selection, cointegrating rank selection, and optimal estimation of the system coefficients. Section 6 considers adaptive selection of the tuning parameter. Section 7 reports some simulation findings and a brief empirical illustration. Section 8 concludes and outlines some useful extensions of the methods and limit theory to other models. Proofs and some supplementary technical results are given in the Appendix.
2 Vector Error Correction and Adaptive Shrinkage
Throughout the paper we consider the following parametric VECM representation of a cointegrated system
\[
\Delta Y_t = \Pi_o Y_{t-1} + \sum_{j=1}^{p} B_{o,j}\,\Delta Y_{t-j} + u_t, \tag{2.1}
\]
where $\Delta Y_t = Y_t - Y_{t-1}$, $Y_t$ is an $m$-dimensional vector-valued time series, $\Pi_o = \alpha_o\beta_o'$ has rank $0 \le r_o \le m$, the $B_{o,j}$ $(j = 1, \dots, p)$ are $m \times m$ (transient) coefficient matrices, and $u_t$ is an $m$-vector error term with mean zero and nonsingular covariance matrix $\Omega_u$. The rank $r_o$ of $\Pi_o$ is an order parameter measuring the cointegrating rank, or the number of (long run) cointegrating relations, in the system. The lag order $p$ is a second order parameter characterizing the transient dynamics in the system.
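As a concrete illustration (not part of the original paper), the data generating process (2.1) can be simulated directly; the dimensions and coefficient values below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r_o, p, n = 3, 1, 1, 200            # dimension, true rank, lag order, sample size

alpha = np.array([[-0.4], [0.1], [0.0]])   # m x r_o loading matrix
beta  = np.array([[1.0], [-1.0], [0.5]])   # m x r_o cointegrating matrix
Pi    = alpha @ beta.T                      # levels coefficient Pi_o, rank r_o
B1    = 0.2 * np.eye(m)                     # single transient coefficient B_{o,1}

Y = np.zeros((n + p + 1, m))
for t in range(p + 1, n + p + 1):
    dY_lag = Y[t - 1] - Y[t - 2]            # lagged difference Delta Y_{t-1}
    u_t = rng.standard_normal(m)            # i.i.d. shock with Omega_u = I_m
    dY = Pi @ Y[t - 1] + B1 @ dY_lag + u_t  # equation (2.1)
    Y[t] = Y[t - 1] + dY

# the rank of Pi equals the number of nonzero eigenvalues (here r_o = 1)
eigvals = np.linalg.eigvals(Pi)
```

The chosen values satisfy the stability condition discussed later ($I_{r_o} + \beta_o'\alpha_o$ has eigenvalue $0.5$ here), so the simulated system is a valid cointegrated VECM.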
Conventional methods of estimation of (2.1) include reduced rank regression or maximum likelihood, based on the assumption of Gaussian $u_t$ and a Gaussian likelihood. This approach relies on known $r_o$ and known $p$, so implementation requires preliminary order parameter estimation. The system can also be estimated by unrestricted fully modified vector autoregression (Phillips, 1995), which leads to consistent estimation of the unit roots in (2.1), the cointegrating vectors and the transient dynamics. This method does not require knowledge of $r_o$ but does require knowledge of $p$. In addition, a semiparametric approach can be adopted in which $r_o$ is estimated semiparametrically by order selection as in Cheng and Phillips (2010), followed by fully modified least squares regression to estimate the cointegrating matrix. This method achieves asymptotically efficient estimation of the long run relations (under Gaussianity) but does not estimate the transient relations.
The present paper explores the estimation of the parameters of (2.1) by Lasso-type regression, i.e. least squares (LS) regression with penalization. The resulting estimator is a shrinkage estimator. Specifically, the LS shrinkage estimator of $(\Pi_o, B_o)$, where $B_o = (B_{o,1}, \dots, B_{o,p})$, is defined as
\[
(\hat{\Pi}_n, \hat{B}_n) = \operatorname*{arg\,min}_{\Pi \in \mathbb{R}^{m\times m},\, B_j \in \mathbb{R}^{m\times m}} \sum_{t=1}^{n} \Big\| \Delta Y_t - \Pi Y_{t-1} - \sum_{j=1}^{p} B_j\,\Delta Y_{t-j} \Big\|_E^2 + n\sum_{j=1}^{p} \hat{P}_{\lambda_n}(B_j) + n\sum_{k=1}^{m} \hat{P}_{\lambda_n}\big(\bar{\lambda}_k(\Pi)\big), \tag{2.2}
\]
where $\|\cdot\|_E$ is the Euclidean norm, $\hat{P}_{\lambda_n}(\cdot)$ is a penalty function with tuning parameter $\lambda_n$, and $\bar{\lambda}_k(\Pi) = |\lambda_k(\Pi)|$ denotes the modulus of the $k$-th of the ordered eigenvalues $\{\lambda_k(\Pi)\}_{k=1}^{m}$ of the matrix $\Pi$.¹ Let $S_{n,\lambda} = \{k : \lambda_k(\hat{\Pi}_n) \neq 0\}$ be the index set of nonzero eigenvalues of $\hat{\Pi}_n$. Since the eigenvalues of $\hat{\Pi}_n$ are ordered as indicated, if there are $d_{S_{n,\lambda}}$ elements in $S_{n,\lambda}$, then $S_{n,\lambda} = \{m - d_{S_{n,\lambda}} + 1, \dots, m\}$. Similarly, we index the zero eigenvalues of $\hat{\Pi}_n$ using the set $S^c_{n,\lambda} = \{k : \lambda_k(\hat{\Pi}_n) = 0\} = \{1, \dots, m - d_{S_{n,\lambda}}\}$. Given $\lambda_n$, this procedure delivers a one step estimator of the model (2.1) with an implied estimate of the cointegrating rank (based on the number of non-zero eigenvalues of $\hat{\Pi}_n$) and an implied estimate of the transient dynamic order $p$ and transient dynamic structure (that is, the non-zero elements of $B_o$) based on the fitted value $\hat{B}_n$.

¹ Throughout this paper, for any $m \times m$ matrix $\Pi$, we order the eigenvalues of $\Pi$ in ascending order by their moduli, i.e. $|\lambda_1(\Pi)| \le |\lambda_2(\Pi)| \le \dots \le |\lambda_m(\Pi)|$. When there is a pair of complex conjugate eigenvalues, we order the one with a negative imaginary part before the other.

In the variable selection literature there are many popular choices of the penalty function. One example for $\hat{P}_{\lambda_n}(\cdot)$ is the adaptive Lasso penalty (Zou, 2006), defined as
\[
\hat{P}_{\lambda_n}(\lambda) = \lambda_n \hat{w}_\lambda |\lambda|, \tag{2.3}
\]
where $\hat{w}_\lambda = 1/|\hat{\lambda}_n|^{\omega}$ with $\omega > 0$ and some consistent estimator $\hat{\lambda}_n$ of $\lambda$. Other examples are the bridge penalty (Knight, 2000), defined as
\[
\hat{P}_{\lambda_n}(\lambda) = \lambda_n |\lambda|^{\gamma}, \tag{2.4}
\]
where $\gamma \in (0,1)$, and the SCAD penalty (Fan and Li, 2001), defined as
\[
\hat{P}_{\lambda_n}(\lambda) =
\begin{cases}
\lambda_n|\lambda| & |\lambda| \le \lambda_n, \\[4pt]
\dfrac{a\lambda_n|\lambda|}{a-1} - \dfrac{\lambda^2 + \lambda_n^2}{2(a-1)} & \lambda_n < |\lambda| \le a\lambda_n, \\[4pt]
\dfrac{(a+1)\lambda_n^2}{2} & a\lambda_n < |\lambda|,
\end{cases} \tag{2.5}
\]
where $a$ is some positive real number strictly larger than 2. In the context of variable selection, the LS shrinkage estimator with the above penalty functions has "oracle" properties in the sense that any zero coefficients are estimated as zeros with probability approaching 1 (w.p.a.1). Correspondingly, the non-zero coefficients are estimated as if the true model were known and implemented in estimation (Fan and Li, 2001).
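The three penalties (2.3)-(2.5) are simple scalar functions; a minimal sketch follows, with the adaptive Lasso weight built from a hypothetical first-step estimate (the argument names are ours, not the paper's).

```python
def adaptive_lasso(x, lam, first_step, omega=1.0):
    """Adaptive Lasso penalty (2.3): lam * |x| / |first_step|**omega."""
    return lam * abs(x) / abs(first_step) ** omega

def bridge(x, lam, gamma=0.5):
    """Bridge penalty (2.4): lam * |x|**gamma with gamma in (0, 1)."""
    return lam * abs(x) ** gamma

def scad(x, lam, a=3.7):
    """SCAD penalty (2.5); requires a > 2 (a = 3.7 is a common choice)."""
    x = abs(x)
    if x <= lam:
        return lam * x
    if x <= a * lam:
        return (2 * a * lam * x - x ** 2 - lam ** 2) / (2 * (a - 1))
    return (a + 1) * lam ** 2 / 2
```

Note that the SCAD pieces join continuously at $|\lambda| = \lambda_n$ and $|\lambda| = a\lambda_n$, and the penalty is flat beyond $a\lambda_n$, which is what limits the bias on large coefficients.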
Let $\lambda(\Pi_o) = (\lambda_1(\Pi_o), \dots, \lambda_m(\Pi_o))$ denote the vector containing the $m$ ordered eigenvalues of the matrix $\Pi_o \in \mathbb{R}^{m\times m}$. When $\{u_t\}_{t\ge1}$ is an i.i.d. process or a martingale difference sequence, the LS estimators $(\hat{\Pi}_{ols}, \hat{B}_{ols})$ of $(\Pi_o, B_o)$ are well known to be consistent. Thus it seems intuitively clear that some form of adaptive penalization can be devised to consistently distinguish the zero and nonzero components in $B_o$ and $\lambda(\Pi_o)$. We show that the shrinkage LS estimator defined in (2.2) enjoys these oracle-like properties, in the sense that the zero components in $B_o$ and $\lambda(\Pi_o)$ are estimated as zeros w.p.a.1. Thus, $\Pi_o$ and the non-zero elements in $B_o$ are estimated as if the form of the true model were known, and inference can be conducted as if we knew the true cointegration rank $r_o$.
If the transient behavior of (2.1) is misspecified and (for some given lag order $p$) the error process $\{u_t\}_{t\ge1}$ is weakly dependent and $r_o > 0$, then consistent estimators of $(\Pi_o, B_o)$ are typically unavailable without further assumptions. However, we show here that the $m - r_o$ zero eigenvalues of $\Pi_o$ can still be consistently estimated with an order $n$ convergence rate, while the rest of the eigenvalues of $\Pi_o$ are estimated with asymptotic bias at a $\sqrt{n}$ convergence rate. The different convergence rates of the eigenvalues are important, because when the non-zero eigenvalues of $\Pi_o$ are occasionally (asymptotically) estimated as zeros, the different convergence rates are useful in consistently distinguishing the zero eigenvalues from the biasedly estimated non-zero eigenvalues of $\Pi_o$. Specifically, we show that if the estimator of some non-zero eigenvalue of $\Pi_o$ has probability limit zero, then this estimator will converge in probability to zero at the rate $\sqrt{n}$, while estimates of the zero eigenvalues of $\Pi_o$ all have convergence rate $n$. Hence the adaptive penalties associated with estimates of zero eigenvalues of $\Pi_o$ will diverge to infinity at a rate faster than those of estimates of the nonzero eigenvalues of $\Pi_o$, even though the latter also converge to zero in probability. As we have prior knowledge about these different divergence rates in a potentially cointegrated system, we can impose explicit conditions on the convergence rate of the tuning parameter to ensure that only the estimates of zero eigenvalues of $\Pi_o$ are adaptively shrunk to zero in finite samples.
For empirical implementation of our approach, we provide data-driven procedures for selecting the tuning parameter of the penalty function in finite samples. Our method is executed as follows.

(i) Solve the minimization problem in (6.2) to get the data-determined tuning parameter $\hat{\lambda}^*_n$.

(ii) Using the tuning parameter $\hat{\lambda}^*_n$, obtain the LS shrinkage estimator $(\hat{\Pi}_n, \hat{B}_n)$ of $(\Pi_o, B_o)$.

(iii) The cointegration rank selected by the shrinkage method is implied by the rank of the shrinkage estimator $\hat{\Pi}_n$, and the lagged differences selected by the shrinkage method are implied by the nonzero matrices in $\hat{B}_n$.

(iv) The LS shrinkage estimator $(\hat{\Pi}_n, \hat{B}_{S_{n,B}})$ contains shrinkage bias introduced by the penalty on the nonzero eigenvalues of $\hat{\Pi}_n$ and the nonzero matrices in $\hat{B}_n$. To remove this bias, run a second-step reduced rank regression based on the cointegration rank and the model selected by the LS shrinkage estimation given $\hat{\lambda}^*_n$.

Notation is standard. For vector-valued, zero mean, covariance stationary stochastic processes $\{a_t\}_{t\ge1}$ and $\{b_t\}_{t\ge1}$, $\Sigma_{ab}(h) = E[a_t b_{t+h}']$ and $\Lambda_{ab} = \sum_{h=0}^{\infty} \Sigma_{ab}(h)$ denote the lag $h$ autocovariance matrix and the one-sided long-run covariance matrix, respectively. The symbolism $a_n \sim b_n$ means that $(1 + o_p(1))b_n = a_n$ or vice versa; the expression $a_n = o_p(b_n)$ signifies that $\Pr(|a_n/b_n| \ge \epsilon) \to 0$ for all $\epsilon > 0$ as $n \to \infty$; and $a_n = O_p(b_n)$ when $\Pr(|a_n/b_n| \ge M) \to 0$ as $n$ and $M$ go to infinity. As usual, "$\to_p$" and "$\to_d$" denote convergence in probability and convergence in distribution, respectively.
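The four-step procedure above can be sketched numerically. This is only an illustrative stand-in, not the paper's estimator: eigenvalue hard-thresholding of the OLS estimate replaces the penalized optimization in (2.2), and the tuning value is a placeholder for the solution of (6.2).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 400, 2
# DGP: one cointegrating relation between the two series
x = np.cumsum(rng.standard_normal(n))
Y = np.column_stack([x, x + rng.standard_normal(n)])

dY, Ylag = np.diff(Y, axis=0), Y[:-1]

# steps (i)-(ii): with a data-chosen tuning parameter, the shrinkage estimator
# solves (2.2); as a crude stand-in we take OLS of dY on Y_{t-1} and
# hard-threshold small eigenvalue moduli.
Pi_ols = np.linalg.lstsq(Ylag, dY, rcond=None)[0].T
lam_star = 0.2                       # placeholder for the data-driven tuning value
eigs = np.linalg.eigvals(Pi_ols)

# step (iii): implied cointegrating rank = number of surviving eigenvalues
r_hat = int(np.sum(np.abs(eigs) > lam_star))

# step (iv): a second-step reduced rank regression with rank r_hat would then
# be run to remove the shrinkage bias (omitted in this sketch).
```

For this DGP the population levels matrix has eigenvalues $\{0, -1\}$, so the sketch selects one cointegrating relation.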
3 The First Order VECM
This section considers the following simplified first order version of (2.1):
\[
\Delta Y_t = \Pi_o Y_{t-1} + u_t = \alpha_o\beta_o' Y_{t-1} + u_t. \tag{3.1}
\]
The model contains no deterministic trend and no lagged differences. Our focus in this simplified system is to outline the approach to cointegrating rank selection and develop key elements in the limit theory, showing consistency in rank selection and reduced rank coefficient matrix estimation. The theory is then generalized in subsequent sections.
We impose the following assumptions on the innovation $u_t$.

Assumption 3.1 (WN) $\{u_t\}_{t\ge1}$ is an $m$-dimensional i.i.d. process with mean zero and nonsingular covariance matrix $\Omega_u$.

Assumption 3.1 ensures that the parameter matrix $\Pi_o$ is consistently estimable in this simplified system. It could be relaxed to allow for martingale difference innovations. Under Assumption 3.1, partial sums of $u_t$ satisfy the functional law
\[
n^{-1/2}\sum_{t=1}^{[n\cdot]} u_t \to_d B_u(\cdot), \tag{3.2}
\]
where $B_u(\cdot)$ is vector Brownian motion with variance matrix $\Omega_u$.
Assumption 3.2 (RR) (i) The determinantal equation $|I - (I + \Pi_o)\lambda| = 0$ has roots on or outside the unit circle; (ii) the matrix $\Pi_o$ has rank $r_o$, with $0 \le r_o \le m$; (iii) if $r_o > 0$, then the matrix $R = I_{r_o} + \beta_o'\alpha_o$ has eigenvalues within the unit circle.

As $\Pi_o = \alpha_o\beta_o'$ has rank $r_o$, we can choose $\alpha_o$ and $\beta_o$ to be $m \times r_o$ matrices with full rank. When $r_o = 0$, we simply take $\Pi_o = 0$. Let $\alpha_{o,\perp}$ and $\beta_{o,\perp}$ be the matrix orthogonal complements of $\alpha_o$ and $\beta_o$ and, without loss of generality, assume that $\alpha_{o,\perp}'\alpha_{o,\perp} = I_{m-r_o}$ and $\beta_{o,\perp}'\beta_{o,\perp} = I_{m-r_o}$. Assumption 3.2(iii) implies that $\beta_o'\alpha_o$ is nonsingular, which further implies that $\alpha_{o,\perp}'\beta_{o,\perp}$ is nonsingular.
Suppose $\Pi_o \ne 0$ and define $Q = [\beta_o, \alpha_{o,\perp}]'$. In view of the well known relation (e.g., Johansen, 1995)
\[
\alpha_o(\beta_o'\alpha_o)^{-1}\beta_o' + \beta_{o,\perp}(\alpha_{o,\perp}'\beta_{o,\perp})^{-1}\alpha_{o,\perp}' = I_m,
\]
it follows that $Q^{-1} = \big[\alpha_o(\beta_o'\alpha_o)^{-1},\; \beta_{o,\perp}(\alpha_{o,\perp}'\beta_{o,\perp})^{-1}\big]$ and then
\[
Q\Pi_o Q^{-1} = \begin{pmatrix} \beta_o'\alpha_o & 0 \\ 0 & 0 \end{pmatrix}. \tag{3.3}
\]
The cointegrating rank is the number $r_o$ of non-zero eigenvalues of $\Pi_o$, and $\alpha_{o,\perp}$ is a matrix of normalized eigenvectors corresponding to those eigenvalues. When $\Pi_o = 0$, the result holds trivially with $r_o = 0$ and $\alpha_{o,\perp} = I_m$.
Let $S_\lambda = \{k : \lambda_k(\Pi_o) \ne 0\}$ be the index set of nonzero eigenvalues of $\Pi_o$ and similarly let $S^c_\lambda = \{k : \lambda_k(\Pi_o) = 0\}$ denote the index set of zero eigenvalues of $\Pi_o$. By the ordering of the eigenvalues of $\Pi_o$, we know that $S_\lambda = \{m - d_{S_\lambda} + 1, \dots, m\}$ and $S^c_\lambda = \{1, \dots, m - d_{S_\lambda}\}$, where $d_{S_\lambda}$ is the cardinality of the set $S_\lambda$. Based on the above discussion, the following equality holds:
\[
d_{S_\lambda} = r_o. \tag{3.4}
\]
Consistent selection of the rank of $\Pi_o$ is thus equivalent to consistent recovery of the zero components in $\lambda(\Pi_o)$. We show that the latter is achieved by the sparsity of our shrinkage estimator of the eigenvalues of $\Pi_o$.
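The ordering convention of footnote 1 and the identity (3.4) can be checked numerically; the matrix below is an arbitrary rank-one example, not taken from the paper.

```python
import numpy as np

alpha = np.array([[-0.5], [0.25]])
beta  = np.array([[1.0], [-1.0]])
Pi_o  = alpha @ beta.T                 # rank-one levels matrix (m = 2, r_o = 1)

# order eigenvalues in ascending order of modulus (footnote 1)
lam = np.linalg.eigvals(Pi_o)
lam = lam[np.argsort(np.abs(lam))]

S  = [k for k in range(len(lam)) if not np.isclose(lam[k], 0)]   # nonzero set
Sc = [k for k in range(len(lam)) if np.isclose(lam[k], 0)]       # zero set
r_o = len(S)        # (3.4): cointegrating rank = number of nonzero eigenvalues
```

Here the eigenvalues are $\{0, -0.75\}$, so the zero eigenvalue is indexed first, $S_\lambda = \{2\}$ in the paper's 1-based notation, and the implied rank is one.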
Using the matrix $Q$, we can transform (3.1) as
\[
\Delta Z_t = \Phi_o Z_{t-1} + w_t, \tag{3.5}
\]
where
\[
Z_t = \begin{pmatrix} \beta_o'Y_t \\ \alpha_{o,\perp}'Y_t \end{pmatrix} := \begin{pmatrix} Z_{1,t} \\ Z_{2,t} \end{pmatrix}, \qquad
w_t = \begin{pmatrix} \beta_o'u_t \\ \alpha_{o,\perp}'u_t \end{pmatrix} := \begin{pmatrix} w_{1,t} \\ w_{2,t} \end{pmatrix},
\]
and $\Phi_o = Q\Pi_o Q^{-1}$. Assumption 3.2 leads to the following Wold representation for $Z_{1,t}$:
\[
Z_{1,t} = \beta_o'Y_t = \sum_{i=0}^{\infty} R^i\beta_o'u_{t-i} = R(L)\beta_o'u_t, \tag{3.6}
\]
and the partial sum Granger representation
\[
Y_t = C\sum_{s=1}^{t} u_s + \alpha_o(\beta_o'\alpha_o)^{-1}R(L)\beta_o'u_t + CY_0, \tag{3.7}
\]
where $C = \beta_{o,\perp}(\alpha_{o,\perp}'\beta_{o,\perp})^{-1}\alpha_{o,\perp}'$. Under Assumption 3.2 and (3.2), we have the functional law
\[
n^{-1/2}\sum_{t=1}^{[n\cdot]} w_t \to_d B_w(\cdot) = QB_u(\cdot) = \begin{bmatrix} \beta_o'B_u(\cdot) \\ \alpha_{o,\perp}'B_u(\cdot) \end{bmatrix} =: \begin{bmatrix} B_{w_1}(\cdot) \\ B_{w_2}(\cdot) \end{bmatrix}
\]
for $w_t = Qu_t$, so that
\[
n^{-1/2}\sum_{t=1}^{[n\cdot]} Z_{1,t} = n^{-1/2}\sum_{t=1}^{[n\cdot]} \beta_o'Y_t \to_d -(\beta_o'\alpha_o)^{-1}B_{w_1}(\cdot), \tag{3.8}
\]
since $R(1) = \sum_{i=0}^{\infty} R^i = (I - R)^{-1} = -(\beta_o'\alpha_o)^{-1}$. Also
\[
n^{-1}\sum_{t=1}^{n} Z_{1,t-1}Z_{1,t-1}' = n^{-1}\sum_{t=1}^{n} \beta_o'Y_{t-1}Y_{t-1}'\beta_o \to_p \Sigma_{z_1z_1},
\]
where $\Sigma_{z_1z_1} := \operatorname{Var}(\beta_o'Y_t) = \sum_{i=0}^{\infty} R^i\beta_o'\Omega_u\beta_o R^{i\prime}$.
The shrinkage LS estimator $\hat{\Pi}_n$ of $\Pi_o$ is defined as
\[
\hat{\Pi}_n = \operatorname*{arg\,min}_{\Pi \in \mathbb{R}^{m\times m}} \sum_{t=1}^{n} \|\Delta Y_t - \Pi Y_{t-1}\|_E^2 + n\sum_{k=1}^{m} \hat{P}_{\lambda_n}\big(\bar{\lambda}_k(\Pi)\big). \tag{3.9}
\]
The unrestricted LS estimator $\hat{\Pi}_{ols}$ of $\Pi_o$ is
\[
\hat{\Pi}_{ols} = \operatorname*{arg\,min}_{\Pi \in \mathbb{R}^{m\times m}} \sum_{t=1}^{n} \|\Delta Y_t - \Pi Y_{t-1}\|_E^2 = \bigg(\sum_{t=1}^{n} \Delta Y_t Y_{t-1}'\bigg)\bigg(\sum_{t=1}^{n} Y_{t-1}Y_{t-1}'\bigg)^{-1}. \tag{3.10}
\]
Let $\lambda(\hat{\Pi}_{ols}) = [\lambda_1(\hat{\Pi}_{ols}), \dots, \lambda_m(\hat{\Pi}_{ols})]$ be the vector of ordered eigenvalues of $\hat{\Pi}_{ols}$. The asymptotic properties of $\hat{\Pi}_{ols}$ and its eigenvalues are described in the following result.

Lemma 3.1 Under Assumptions 3.1 and 3.2, we have:

(a) $\hat{\Pi}_{ols}$ satisfies
\[
Q\big(\hat{\Pi}_{ols} - \Pi_o\big)Q^{-1}D_n^{-1} = O_p(1), \tag{3.11}
\]
where $D_n = \operatorname{diag}\big(n^{-1/2}I_{r_o},\; n^{-1}I_{m-r_o}\big)$;

(b) the eigenvalues of $\hat{\Pi}_{ols}$ satisfy
\[
\big[\lambda_1(\hat{\Pi}_{ols}), \dots, \lambda_{m-r_o}(\hat{\Pi}_{ols})\big] \to_p (0, \dots, 0)_{1\times(m-r_o)} \tag{3.12}
\]
and
\[
\big[\lambda_{m-r_o+1}(\hat{\Pi}_{ols}), \dots, \lambda_m(\hat{\Pi}_{ols})\big] \to_p \big[\lambda_{m-r_o+1}(\Pi_o), \dots, \lambda_m(\Pi_o)\big]; \tag{3.13}
\]

(c) the first $m - r_o$ ordered eigenvalues of $\hat{\Pi}_{ols}$ satisfy
\[
n\big[\lambda_1(\hat{\Pi}_{ols}), \dots, \lambda_{m-r_o}(\hat{\Pi}_{ols})\big] \to_d \big(\tilde{\lambda}_{o,1}, \dots, \tilde{\lambda}_{o,m-r_o}\big), \tag{3.14}
\]
where the $\tilde{\lambda}_{o,j}$ $(j = 1, \dots, m - r_o)$ are the solutions of the following determinantal equation:
\[
\bigg|\lambda I_{m-r_o} - \bigg(\int dB_{w_2}B_{w_2}'\bigg)\bigg(\int B_{w_2}B_{w_2}'\bigg)^{-1}\bigg| = 0, \tag{3.15}
\]
where $B_{w_2} = \alpha_{o,\perp}'B_u$ is Brownian motion with covariance matrix $\alpha_{o,\perp}'\Omega_u\alpha_{o,\perp}$.
The results of Lemma 3.1 are useful in constructing the adaptive Lasso penalty function because the OLS estimate $\hat{\Pi}_{ols}$ and the related eigenvalue estimates $\lambda_k(\hat{\Pi}_{ols})$ of $\lambda_k(\Pi_o)$ $(k = 1, \dots, m)$ can be used as first step estimates in the penalty function. The convergence rates of $\hat{\Pi}_{ols}$ and $\lambda_k(\hat{\Pi}_{ols})$ are important for delivering consistent model selection and cointegrated rank selection. In the univariate case where $m = 1$ and $r_o = 0$, we immediately have (see Lemma 1(iii) in the Appendix)
\[
n\lambda_{S^c_\lambda}(\hat{\Pi}_{ols}) \to_d \frac{\int B_u(a)\,dB_u(a)}{\int B_u^2(a)\,da}, \tag{3.16}
\]
coinciding with the asymptotic distribution of the LS estimator in a regression with a unit root.
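The univariate limit (3.16) can be checked by simulation; a small Monte Carlo sketch follows (sample size and replication count are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 500
stats = np.empty(reps)
for r in range(reps):
    u = rng.standard_normal(n)
    y = np.cumsum(u)                  # random walk: m = 1, r_o = 0
    ylag, dy = y[:-1], np.diff(y)     # regress dy_t on y_{t-1}
    stats[r] = n * (dy @ ylag) / (ylag @ ylag)   # n times the OLS coefficient

# stats approximates draws from  (int B dB) / (int B^2),
# the Dickey-Fuller coefficient distribution, whose mass lies mostly below zero
```

The skew toward negative values is why, in the shrinkage setting, estimates of zero eigenvalues are $O_p(n^{-1})$ rather than $O_p(n^{-1/2})$.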
The following assumptions provide some regularity conditions on the penalty function $\hat{P}_{\lambda_n}(\cdot)$, which are sufficient for establishing consistency and the convergence rate of $\hat{\Pi}_n$, as well as the sparsity of $\lambda(\hat{\Pi}_n)$.

Assumption 3.3 (PF-I) (i) The penalty function $\hat{P}_{\lambda_n}(\cdot)$ is non-negative with $\hat{P}_{\lambda_n}(0) = 0$ and satisfies
\[
\hat{P}_{\lambda_n}(\lambda) = o_p(1) \tag{3.17}
\]
for all $\lambda \in (0,\infty)$; (ii) $\hat{P}_{\lambda_n}(\lambda)$ is twice continuously differentiable at $\lambda \in (0,\infty)$ with $\hat{P}''_{\lambda_n}(\lambda) = o_p(1)$ for all $\lambda \in (0,\infty)$; (iii) $\hat{P}_{\lambda_n}(\lambda)$ satisfies
\[
n^{1/2}\big|\hat{P}'_{\lambda_n}(\lambda)\big| = o_p(1) \tag{3.18}
\]
for any $\lambda \in (0,\infty)$; (iv) $\hat{P}_{\lambda_n}(\lambda)$ is an increasing function on $(0,+\infty)$ and for each $k \in S^c_\lambda$ there exists some sequence $v_{n,k} = o(1)$ such that
\[
\liminf_{n\to\infty}\big\{n\hat{P}_{\lambda_n}(n^{-1}v_{n,k})\big\} = \infty \quad a.e. \tag{3.19}
\]
Assumption 3.3(i) essentially states that the shrinkage effect on estimates of the non-zero eigenvalues goes to zero asymptotically. This requirement is important for establishing the consistency of the shrinkage estimator $\hat{\Pi}_n$. Assumption 3.3(ii) is a useful regularity condition in deriving the convergence rate of $\hat{\Pi}_n$, and Assumption 3.3(iii) ensures a $\sqrt{n}$ convergence rate. Assumption 3.3(iv) ensures that the penalty $\hat{P}_{\lambda_n}(\cdot)$ attached to estimates of the zero eigenvalues diverges to infinity, which ensures that the zero eigenvalues of $\Pi_o$ are estimated as zeros w.p.a.1, delivering consistent cointegration rank selection. Similar general assumptions are imposed on the penalty function $\hat{P}_{\lambda_n}(\cdot)$ in Liao (2010), where shrinkage techniques are employed to perform consistent moment selection in a GMM framework.
Theorem 3.1 (Consistency) Suppose Assumptions WN, RR and PF-I(i) are satisfied. Then the shrinkage LS estimator $\hat{\Pi}_n$ is consistent, i.e. $\hat{\Pi}_n - \Pi_o = o_p(1)$.
When consistent shrinkage estimators are considered, Theorem 3.1 extends Theorem 1 of Caner and Knight (2009), where shrinkage techniques are used to perform a unit root test. As the eigenvalues $\lambda(\Pi)$ of the matrix $\Pi$ are continuous functions of $\Pi$, we deduce from the consistency of $\hat{\Pi}_n$ and continuous mapping that $\lambda_k(\hat{\Pi}_n) \to_p \lambda_k(\Pi_o)$ for all $k = 1, \dots, m$. Theorem 3.1 implies that the nonzero eigenvalues of $\Pi_o$ are estimated as non-zeros, which means that the rank of $\Pi_o$ will not be under-selected. However, consistency of the estimates of the non-zero eigenvalues is not necessary for consistent cointegration rank selection. In that case what is essential is that the probability limits of the estimates of those (non-zero) eigenvalues are not zeros, or at least that their convergence rates are slower than those of estimates of the zero eigenvalues. This point will be pursued in the following section, where it is demonstrated that consistent estimation of the cointegrating rank continues to hold for weakly dependent innovations $\{u_t\}_{t\ge1}$ even though full consistency of $\hat{\Pi}_n$ does not generally apply in that case.
Our next result gives the convergence rate of the shrinkage estimator $\hat{\Pi}_n$.

Theorem 3.2 (Rate of Convergence) Denote $a_n = \max_{k\in S_\lambda}\big|\hat{P}'_{\lambda_n}\big(\bar{\lambda}_k(\Pi_o)\big)\big|$. Under Assumptions WN, RR and PF-I(i)-(ii), the shrinkage LS estimator $\hat{\Pi}_n$ satisfies the following:

(a) if $r_o = 0$, then $\hat{\Pi}_n - \Pi_o = O_p(n^{-1} + n^{-1}a_n)$;

(b) if $0 < r_o \le m$, then $\big(\hat{\Pi}_n - \Pi_o\big)Q^{-1}D_n^{-1} = O_p(1 + n^{1/2}a_n)$.
The term $a_n$ represents the shrinkage bias that the penalty function introduces into the LS shrinkage estimator. If the convergence rate of $\lambda_n$ is fast enough that Assumption PF-I(iii) is satisfied, then Theorem 3.2 implies that $\hat{\Pi}_n - \Pi_o = O_p(n^{-1})$ when $r_o = 0$ and $\big(\hat{\Pi}_n - \Pi_o\big)Q^{-1}D_n^{-1} = O_p(1)$ otherwise. Hence, under Assumptions WN, RR and PF-I(i)-(iii), the LS shrinkage estimator $\hat{\Pi}_n$ has the same stochastic properties as the LS estimator $\hat{\Pi}_{ols}$. However, we next show that if the tuning parameter $\lambda_n$ converges to zero not too fast, so that Assumption PF-I(iv) is also satisfied, then the correct rank restriction $r = r_o$ is automatically imposed on the LS shrinkage estimator $\hat{\Pi}_n$ w.p.a.1.

Recall that $S_{n,\lambda}$ is the index set of the largest (or last) $r_o$ ordered eigenvalues of $\hat{\Pi}_n$ and $S^c_{n,\lambda}$ is the index set of the smallest (or first) $m - r_o$ ordered eigenvalues of $\hat{\Pi}_n$. By virtue of the consistency of $\lambda(\hat{\Pi}_n) = \big[\lambda_1(\hat{\Pi}_n), \dots, \lambda_m(\hat{\Pi}_n)\big]$, we can deduce that $S^c_{n,\lambda} \subseteq S^c_\lambda$. To show $S^c_{n,\lambda} = S^c_\lambda$, it suffices to show that $S^c_\lambda \subseteq S^c_{n,\lambda}$.
Theorem 3.3 (Sparsity-I) Under Assumptions WN, RR and PF-I,
\[
\Pr\big(\lambda_k(\hat{\Pi}_n) = 0\big) \to 1 \tag{3.20}
\]
for all $k \in S^c_\lambda$, as $n \to \infty$.
Proof. Let $\hat{\Pi}_{n,r_o}$ denote the reduced rank regression estimator with the rank constraint $r(\Pi) = r_o$. By the definition of $\hat{\Pi}_n$,
\[
\sum_{t=1}^{n}\Big(\big\|\Delta Y_t - \hat{\Pi}_n Y_{t-1}\big\|_E^2 - \big\|\Delta Y_t - \hat{\Pi}_{n,r_o}Y_{t-1}\big\|_E^2\Big) \le n\sum_{k=1}^{m}\Big\{\hat{P}_{\lambda_n}\big[\bar{\lambda}_k\big(\hat{\Pi}_{n,r_o}\big)\big] - \hat{P}_{\lambda_n}\big[\bar{\lambda}_k\big(\hat{\Pi}_n\big)\big]\Big\}. \tag{3.21}
\]
The left-hand side of the inequality in (3.21) can be rewritten as
\[
L_n := \Big[\operatorname{vec}\big(\hat{\Pi}_{n,r_o} - \hat{\Pi}_n\big)\Big]'\bigg(\sum_{t=1}^{n} Y_{t-1}Y_{t-1}' \otimes I_m\bigg)\Big[\operatorname{vec}\big(2\Pi_o - \hat{\Pi}_{n,r_o} - \hat{\Pi}_n\big)\Big] + 2\Big[\operatorname{vec}\big(\hat{\Pi}_{n,r_o} - \hat{\Pi}_n\big)\Big]'\operatorname{vec}\bigg(\sum_{t=1}^{n} Y_{t-1}u_t'\bigg). \tag{3.22}
\]
We invoke Theorem 3.2 and Theorem 3.1 in Liao and Phillips (2010) to deduce that $L_n = O_p(1)$.

On the event $\big\{\lambda_{k_o}(\hat{\Pi}_n) \ne 0\big\}$ for some $k_o \in S^c_\lambda$, the right-hand side of the inequality in (3.21) can be bounded in the following way:
\[
R_n := n\sum_{k\in S_\lambda}\Big\{\hat{P}_{\lambda_n}\big[\bar{\lambda}_k\big(\hat{\Pi}_{n,r_o}\big)\big] - \hat{P}_{\lambda_n}\big[\bar{\lambda}_k\big(\hat{\Pi}_n\big)\big]\Big\} - n\sum_{k\in S^c_\lambda}\hat{P}_{\lambda_n}\big[\bar{\lambda}_k\big(\hat{\Pi}_n\big)\big]
\le -n\hat{P}_{\lambda_n}\big[\bar{\lambda}_{k_o}\big(\hat{\Pi}_n\big)\big] + n\sum_{k\in S_\lambda}\Big\{\hat{P}_{\lambda_n}\big[\bar{\lambda}_k\big(\hat{\Pi}_{n,r_o}\big)\big] - \hat{P}_{\lambda_n}\big[\bar{\lambda}_k\big(\hat{\Pi}_n\big)\big]\Big\}. \tag{3.23}
\]
From Assumptions PF-I(ii)-(iii), the Bauer-Fike Theorem and Theorem 3.1 in Liao and Phillips (2010), we have
\[
n\sum_{k\in S_\lambda}\Big\{\hat{P}_{\lambda_n}\big[\bar{\lambda}_k\big(\hat{\Pi}_{n,r_o}\big)\big] - \hat{P}_{\lambda_n}\big[\bar{\lambda}_k\big(\hat{\Pi}_n\big)\big]\Big\}
\le n^{1/2}\hat{P}'_{\lambda_n}\big[\bar{\lambda}_k(\Pi_o)\big]\,\big\|n^{1/2}\big(\hat{\Pi}_{n,r_o} - \hat{\Pi}_n\big)\big\|_E + [1 + o_p(1)]\,\hat{P}''_{\lambda_n}\big[\bar{\lambda}_k(\Pi_o)\big]\,\big\|n^{1/2}\big(\hat{\Pi}_{n,r_o} - \hat{\Pi}_n\big)\big\|_E^2 = O_p(1). \tag{3.24}
\]
Suppose that $\Pr\big(\lambda_{k_o}(\hat{\Pi}_n) \ne 0\big) \to \varepsilon \in (0,1]$; then on the event $\big\{\lambda_{k_o}(\hat{\Pi}_n) \ne 0\big\}$ the restriction $\lambda_{k_o}(\Pi) = 0$ is not imposed on the LS estimator $\hat{\Pi}_n$ asymptotically. Hence we can use Assumption PF-I(iv) and Theorem 4.1 in Liao and Phillips (2010) to deduce that
\[
n\hat{P}_{\lambda_n}\big[\bar{\lambda}_{k_o}\big(\hat{\Pi}_n\big)\big] = n\hat{P}_{\lambda_n}\big[n^{-1}\,n\bar{\lambda}_{k_o}\big(\hat{\Pi}_n\big)\big] \ge n\hat{P}_{\lambda_n}\big(n^{-1}v_n\big) \to_p \infty \tag{3.25}
\]
as $n \to \infty$. From the results in (3.23)-(3.25), we have $R_n \to -\infty$ as $n \to \infty$, and $L_n > R_n$ with nontrivial probability, which contradicts the definition of $\hat{\Pi}_n$. Hence it follows that $\Pr\big(\lambda_k(\hat{\Pi}_n) = 0\big) \to 1$ for all $k \in S^c_\lambda$, as $n \to \infty$. $\blacksquare$
Theorem 3.1 and the CMT imply that
\[
\lambda_k\big(\hat{\Pi}_n\big) \to_p \lambda_k(\Pi_o). \tag{3.26}
\]
Combining Theorem 3.1 and Theorem 3.3, we deduce that
\[
\Pr\big(S_{n,\lambda} = S_\lambda\big) \to 1, \tag{3.27}
\]
which implies consistent cointegration rank selection, giving the following result.

Corollary 3.4 Under Assumptions WN, RR and PF-I, we have
\[
\Pr\big(r(\hat{\Pi}_n) = r_o\big) \to 1 \tag{3.28}
\]
as $n \to \infty$, where $r(\hat{\Pi}_n)$ denotes the rank of $\hat{\Pi}_n$.

It is clear that Assumption PF-I plays an important role in deriving consistent cointegration rank selection. We next show that the adaptive Lasso and bridge penalty functions satisfy Assumption PF-I; the SCAD penalty satisfies Assumptions PF-I(i)-(iii) but fails to satisfy Assumption PF-I(iv).
Corollary 3.5 The adaptive Lasso, bridge and SCAD penalty functions have the following properties:

(a) if $\lambda_n = o(1)$, then the adaptive Lasso and bridge penalty functions satisfy Assumptions PF-I(i)-(ii) and the SCAD penalty function satisfies Assumptions PF-I(i)-(iii);

(b) if $n^{1/2}\lambda_n = o(1)$, then the adaptive Lasso and bridge penalty functions satisfy Assumptions PF-I(ii)-(iii);

(c) if $\lambda_n n^{\omega} \to \infty$, then the adaptive Lasso penalty satisfies Assumption PF-I(iv); if $n^{1-\gamma}\lambda_n \to \infty$, then the bridge penalty satisfies Assumption PF-I(iv);

(d) the SCAD penalty function does not satisfy Assumption PF-I(iv).
Proof. (a) First, it is clear that the adaptive Lasso, bridge and SCAD penalty functions all satisfy $\hat{P}_{\lambda_n}(0) = 0$. For the adaptive Lasso penalty,
\[
\hat{P}_{\lambda_n}\big[\bar{\lambda}_k(\Pi_o)\big] = \frac{\lambda_n|\lambda_k(\Pi_o)|}{|\lambda_k(\hat{\Pi}_{ols})|^{\omega}} = o_p(1)
\]
for all $k \in S_\lambda$, by the consistency of $\lambda_k(\hat{\Pi}_{ols})$ and the Slutsky Theorem. The adaptive Lasso penalty is clearly twice differentiable with $\hat{P}''_{\lambda_n}(\lambda) = 0$ for all $\lambda \in (0,\infty)$, and thus satisfies Assumptions PF-I(i)-(ii). For the bridge penalty,
\[
\hat{P}_{\lambda_n}\big[\bar{\lambda}_k(\Pi_o)\big] = \lambda_n|\lambda_k(\Pi_o)|^{\gamma} = o(1)
\]
for all $k$, and
\[
\hat{P}''_{\lambda_n}(\lambda) = \gamma(\gamma - 1)\lambda_n|\lambda|^{\gamma-2} \to 0
\]
for any $\lambda \in (0,\infty)$. Hence the bridge penalty satisfies Assumptions PF-I(i)-(ii). For the SCAD penalty, when $n$ is sufficiently large we have
\[
\hat{P}_{\lambda_n}\big[\bar{\lambda}_k(\Pi_o)\big] = \frac{(a+1)\lambda_n^2}{2} = o(1)
\]
for all $k \in S_\lambda$. Furthermore, when $n$ is sufficiently large, the SCAD penalty function is clearly twice differentiable at $\lambda_k(\Pi_o)$ $(k \in S_\lambda)$ with
\[
\sqrt{n}\,\hat{P}'_{\lambda_n}\big[\bar{\lambda}_k(\Pi_o)\big] = \sqrt{n}\,\frac{\big[\lambda_n a - |\lambda_k(\Pi_o)|\big]_+}{a - 1}\,I(\lambda > \lambda_n) = 0, \tag{3.29}
\]
where $[x]_+ = \max\{0, x\}$, and $\hat{P}''_{\lambda_n}\big[\bar{\lambda}_k(\Pi_o)\big] = 0$ for all $k \in S_\lambda$. Hence the SCAD penalty satisfies Assumptions PF-I(i)-(iii).

(b) For the adaptive Lasso penalty,
\[
n^{1/2}\hat{P}'_{\lambda_n}\big[\bar{\lambda}_k(\Pi_o)\big] = n^{1/2}\lambda_n|\lambda_k(\hat{\Pi}_{ols})|^{-\omega} = o_p(1),
\]
where the last equality follows from the consistency of $\lambda_k(\hat{\Pi}_{ols})$ and the Slutsky Theorem. For the bridge penalty,
\[
n^{1/2}\hat{P}'_{\lambda_n}\big[\bar{\lambda}_k(\Pi_o)\big] = n^{1/2}\gamma\lambda_n|\lambda_k(\Pi_o)|^{\gamma-1} = o(1).
\]
Hence both the adaptive Lasso and bridge penalty functions satisfy Assumption PF-I(iii).

(c) For the adaptive Lasso penalty, by Lemma 3.1 we can choose $v_{n,k} = n^{-\omega/2}\lambda_n^{-1/2} = o(1)$ such that
\[
n\hat{P}_{\lambda_n}\big(n^{-1}v_{n,k}\big) = \big|n\lambda_k(\hat{\Pi}_{ols})\big|^{-\omega}\,\big|n^{\omega/2}\lambda_n^{1/2}\big| \to_p \infty
\]
for all $k \in S^c_{n,\lambda}$. Thus the adaptive Lasso penalty satisfies Assumption PF-I(iv). For the bridge penalty, we can choose $v_{n,k} = \big(n^{1-\gamma}\lambda_n\big)^{-1/2} = o(1)$ such that
\[
n\hat{P}_{\lambda_n}\big(n^{-1}v_{n,k}\big) = n^{1-\gamma}\lambda_n|v_{n,k}|^{\gamma} = \big(n^{1-\gamma}\lambda_n\big)^{1-\gamma/2} \to \infty
\]
for all $k \in S^c_{n,\lambda}$. Hence the bridge penalty function satisfies Assumption PF-I(iv).

(d) For the SCAD penalty, if $n\lambda_n \to \infty$ and $\lambda_n \to 0$, then
\[
n\hat{P}_{\lambda_n}\big(n^{-1}v_{n,k}\big) = \lambda_n|v_{n,k}| \to 0
\]
for any $v_{n,k} = o(1)$. If $n\lambda_n \to 0$, then
\[
n\hat{P}_{\lambda_n}\big(n^{-1}v_{n,k}\big) = \frac{a\lambda_n|v_{n,k}|}{a - 1} - \frac{n^{-1}v_{n,k}^2 + n\lambda_n^2}{2(a - 1)} \to 0,
\]
or
\[
n\hat{P}_{\lambda_n}\big(n^{-1}v_{n,k}\big) = \frac{(a+1)n\lambda_n^2}{2} \to 0.
\]
Hence there exists no sequence $v_{n,k}$ such that Assumption PF-I(iv) is satisfied. $\blacksquare$
From the sparsity and consistency of $\lambda_k(\hat{\Pi}_n)$ $(k = 1, \dots, m)$, we can deduce that the rank constraint $r(\Pi) = r_o$ is imposed on the LS shrinkage estimator $\hat{\Pi}_n$ w.p.a.1. Hence, for large enough $n$ the shrinkage estimator $\hat{\Pi}_n$ can be decomposed as $\hat{\alpha}_n\hat{\beta}_n'$ w.p.a.1, where $\hat{\alpha}_n$ and $\hat{\beta}_n$ are $m \times r_o$ matrices. Without loss of generality, we assume that the first $r_o$ rows of $\beta_o$ are linearly independent. To ensure identification, we normalize $\beta_o$ as $\beta_o = [I_{r_o}, O_{r_o}]'$, where $O_{r_o}$ is some $r_o \times (m - r_o)$ matrix, so that
\[
\Pi_o = \alpha_o\beta_o' = [\alpha_o, \alpha_o O_{r_o}]. \tag{3.30}
\]
Hence $\alpha_o$ is the first $r_o$ columns of $\Pi_o$, which is an $m \times r_o$ matrix with full rank, and $O_{r_o}$ is uniquely determined by the equation $\alpha_o O_{r_o} = \Pi_{o,2}$, where $\Pi_{o,2}$ denotes the last $m - r_o$ columns of $\Pi_o$. Correspondingly, for large enough $n$ we can normalize $\hat{\beta}_n$ as $\hat{\beta}_n = [I_{r_o}, \hat{O}_n]'$, where $\hat{O}_n$ is some $r_o \times (m - r_o)$ matrix.

From Theorem 3.2 and Assumption 3.3(iii), we have
\[
O_p(1) = \big(\hat{\Pi}_n - \Pi_o\big)Q^{-1}D_n^{-1} = \big(\hat{\Pi}_n - \Pi_o\big)\big[\sqrt{n}\,\alpha_o(\beta_o'\alpha_o)^{-1},\; n\,\beta_{o,\perp}(\alpha_{o,\perp}'\beta_{o,\perp})^{-1}\big]. \tag{3.31}
\]
From (3.31) we can deduce that
\[
n\,\hat{\alpha}_n\big(\hat{\beta}_n - \beta_o\big)'\beta_{o,\perp}(\alpha_{o,\perp}'\beta_{o,\perp})^{-1} = O_p(1),
\]
which implies
\[
n\big(\hat{O}_n - O_{r_o}\big) = O_p(1). \tag{3.32}
\]
Similarly, we have
\[
\sqrt{n}\,\Big[(\hat{\alpha}_n - \alpha_o)\hat{\beta}_n'\alpha_o(\beta_o'\alpha_o)^{-1} + \alpha_o\big(\hat{\beta}_n - \beta_o\big)'\alpha_o(\beta_o'\alpha_o)^{-1}\Big] = O_p(1),
\]
which combined with (3.32) implies
\[
\sqrt{n}\,(\hat{\alpha}_n - \alpha_o) = O_p(1). \tag{3.33}
\]
To conclude this section, we give the centered limit distribution of the shrinkage estimator $\hat{\Pi}_n$ in the following theorem.

Theorem 3.6 Under Assumptions WN, RR and PF-I, we have
\[
Q\big(\hat{\Pi}_n - \Pi_o\big)Q^{-1}D_n^{-1} \to_d
\begin{pmatrix}
\beta_o'B_{1,m}\Sigma_{z_1z_1}^{-1} & \Psi_o B_{2,m}\big(\int B_{w_2}B_{w_2}'\big)^{-1} \\[4pt]
\alpha_{o,\perp}'B_{1,m}\Sigma_{z_1z_1}^{-1} & 0
\end{pmatrix}, \tag{3.34}
\]
where $B_{1,m} = N(0, \Omega_u \otimes \Sigma_{z_1z_1})$, $B_{2,m} = \big(\int B_{w_2}\,dB_u'\big)'$ and $\Psi_o = \beta_o'\alpha_o(\alpha_o'\alpha_o)^{-1}\alpha_o'$.

Proof. By the sparsity of $\lambda_k(\hat{\Pi}_n)$ $(k = 1, \dots, m)$, we can deduce that $\hat{\alpha}_n$ and $\hat{\beta}_n$ minimize the following criterion function w.p.a.1:
V3;n(�; �) =nXt=1
�Yt � ��0Yt�1 2E + nXk2S�
bP�n ���k(��0)� : (3.35)
De�ne U�1;n =pn (b�n � �o) and U�3;n = n�b�n � �o�0 = h0ro ; n� bOn �Oo�i := �0ro ; U�2;n�,
then�b�n ��o�Q�1D�1n =
�b�n �b�n � �o�0 + (b�n � �o)�0o�Q�1D�1n=
hn�
12 b�nU�3;n�o(�0o�o)�1 + U�1;n; b�nU�3;n�o;?(�0o;?�o;?)�1i :
17
De�ne
�n(U) =hn�
12 b�nU3�o(�0o�o)�1 + U1; b�nU3�o;?(�0o;?�o;?)�1i ;
then by de�nition, U�n =�U�1;n; U
�2;n
�minimizes the following criterion function
V4;n(U) =nXt=1
�k�Yt ��oYt�1 ��n(U)DnZt�1k2E � k�Yt ��oYt�1k
2E
�+n
Xk2S�
n bP�n ���k(�n(U)DnQ+�o)�� bP�n ���k(�o)�ow.p.a.1.
For any compact set $K \subset \mathbb{R}^{m \times r_o} \times \mathbb{R}^{r_o \times (m - r_o)}$ and any $U \in K$, we have
$$ \Psi_n(U)D_n^{-1}Q = O_p(n^{-\frac{1}{2}}). $$
Hence, from Assumption PF.(ii)-(iii) and the Bauer-Fike theorem, we can deduce that
$$ n\left| \hat P_{\lambda_n}\left[ |\lambda_k(\Psi_n(U)D_n^{-1}Q + \Pi_o)| \right] - \hat P_{\lambda_n}\left[ |\lambda_k(\Pi_o)| \right] \right| \le Cn\max_{k \in S_\lambda}\left\{ \hat P_{\lambda_n}'\left[ |\lambda_k(\Pi_o)| \right] \right\}\left\| \Psi_n(U)D_n^{-1}Q \right\|_E = O_p(n^{\frac{1}{2}})\max_{k \in S_\lambda}\left\{ \hat P_{\lambda_n}'\left[ |\lambda_k(\Pi_o)| \right] \right\} = o_p(1), \quad (3.36) $$
uniformly over $U \in K$. As $\beta_o = [I_{r_o}, O_{r_o}]'$, by definition we have $\beta_{o\perp} = [-O_{r_o}', I_{m-r_o}]'$ and hence
$$ \Psi_n(U) \to_p \left[ U_1,\; (0_{m \times r_o}, \alpha_oU_2)\beta_{o\perp}(\alpha_{o\perp}'\beta_{o\perp})^{-1} \right] = \left[ U_1,\; \alpha_oU_2(\alpha_{o\perp}'\beta_{o\perp})^{-1} \right] =: \Psi_1(U) \quad (3.37) $$
uniformly over $U \in K$. By Lemma 10.1 and (3.37), we deduce that
$$ \sum_{t=1}^n \left[ \left\| \Delta Y_t - \Pi_oY_{t-1} - \Psi_n(U)D_n^{-1}Z_{t-1} \right\|_E^2 - \left\| \Delta Y_t - \Pi_oY_{t-1} \right\|_E^2 \right] = \mathrm{vec}\left[ \Psi_n(U) \right]'\left( D_n^{-1}\sum_{t=1}^n Z_{t-1}Z_{t-1}'D_n^{-1} \otimes I_m \right)\mathrm{vec}\left[ \Psi_n(U) \right] - 2\,\mathrm{vec}\left[ \Psi_n(U) \right]'\mathrm{vec}\left( \sum_{t=1}^n u_tZ_{t-1}'D_n^{-1} \right) $$
$$ \to_d \mathrm{vec}\left[ \Psi_1(U) \right]'\left[ \begin{pmatrix} \Sigma_{z_1z_1} & 0 \\ 0 & \int B_{w_2}B_{w_2}' \end{pmatrix} \otimes I_m \right]\mathrm{vec}\left[ \Psi_1(U) \right] - 2\,\mathrm{vec}\left[ \Psi_1(U) \right]'\mathrm{vec}\left[ (B_{1,m}, B_{2,m}) \right] := V_2(U) \quad (3.38) $$
uniformly over $U \in K$. Note that $\mathrm{vec}[\Psi_1(U)] = \left[ \mathrm{vec}(U_1)',\, \mathrm{vec}(\alpha_oU_2(\alpha_{o\perp}'\beta_{o\perp})^{-1})' \right]'$ and $\mathrm{vec}(\alpha_oU_2(\alpha_{o\perp}'\beta_{o\perp})^{-1}) = \left[ (\beta_{o\perp}'\alpha_{o\perp})^{-1} \otimes \alpha_o \right]\mathrm{vec}(U_2)$. Hence $V_2(U)$ can be rewritten as
$$ V_2(U) = \mathrm{vec}(U_1)'\left[ \Sigma_{z_1z_1} \otimes I_m \right]\mathrm{vec}(U_1) + \mathrm{vec}(U_2)'\left[ (\alpha_{o\perp}'\beta_{o\perp})^{-1}\int B_{w_2}B_{w_2}'(\beta_{o\perp}'\alpha_{o\perp})^{-1} \otimes \alpha_o'\alpha_o \right]\mathrm{vec}(U_2) - 2\,\mathrm{vec}(U_1)'\mathrm{vec}(B_{1,m}) - 2\,\mathrm{vec}(U_2)'\mathrm{vec}\left( \alpha_o'B_{2,m}(\beta_{o\perp}'\alpha_{o\perp})^{-1} \right). \quad (3.39) $$
The expression in (3.39) makes it clear that $V_2(U)$ is uniquely minimized at $(U_1^*, U_2^*)$, where
$$ U_1^* = B_{1,m}\Sigma_{z_1z_1}^{-1} \quad \text{and} \quad U_2^* = (\alpha_o'\alpha_o)^{-1}\alpha_o'B_{2,m}\left( \int B_{w_2}B_{w_2}' \right)^{-1}(\alpha_{o\perp}'\beta_{o\perp}). \quad (3.40) $$
From (3.32) and (3.33), we can see that $U_n^*$ is asymptotically tight. Invoking the Argmax Continuous Mapping Theorem (ACMT), we can deduce that
$$ U_n^* = (U_{1,n}^*, U_{2,n}^*) \to_d (U_1^*, U_2^*). $$
Applying the CMT, we get
$$ Q(\hat\Pi_n - \Pi_o)Q^{-1}D_n \to_d \begin{pmatrix} \beta_o'U_1^* & \beta_o'\alpha_oU_2^*(\alpha_{o\perp}'\beta_{o\perp})^{-1} \\ \alpha_{o\perp}'U_1^* & 0 \end{pmatrix}, $$
which finishes the proof.
Remark 3.7 The generalized shrinkage LS estimator is defined as
$$ \hat\Pi_{g,n} = \arg\min_{\Pi}\sum_{t=1}^n (\Delta Y_t - \Pi Y_{t-1})'\hat\Omega_{u,n}^{-1}(\Delta Y_t - \Pi Y_{t-1}) + n\sum_{k=1}^m \hat P_{\lambda_n}\left[ |\lambda_k(\Pi)| \right], \quad (3.41) $$
where $\hat\Omega_{u,n}$ is some consistent estimator of $\Omega_u$. The consistency and convergence rate of $\hat\Pi_{g,n}$ and the sparsity of $\lambda_k(\hat\Pi_{g,n})$ ($k = 1, \ldots, m$) can be established similarly. Following similar arguments to those of Theorem 3.6, we can deduce that
$$ \sum_{t=1}^n \left( u_t - \Psi_n(U)D_n^{-1}Z_{t-1} \right)'\hat\Omega_{u,n}^{-1}\left( u_t - \Psi_n(U)D_n^{-1}Z_{t-1} \right) - \sum_{t=1}^n u_t'\hat\Omega_{u,n}^{-1}u_t \to_d \mathrm{vec}(U_1)'\left[ \Sigma_{z_1z_1} \otimes \Omega_u^{-1} \right]\mathrm{vec}(U_1) $$
$$ + \mathrm{vec}(U_2)'\left[ (\alpha_{o\perp}'\beta_{o\perp})^{-1}\int B_{w_2}B_{w_2}'(\beta_{o\perp}'\alpha_{o\perp})^{-1} \otimes \alpha_o'\Omega_u^{-1}\alpha_o \right]\mathrm{vec}(U_2) - 2\,\mathrm{vec}(U_1)'\mathrm{vec}\left( \Omega_u^{-1}B_{1,m} \right) - 2\,\mathrm{vec}(U_2)'\mathrm{vec}\left( \alpha_o'\Omega_u^{-1}B_{2,m}(\beta_{o\perp}'\alpha_{o\perp})^{-1} \right) := V_3(U), \quad (3.42) $$
uniformly over $U \in K$. $V_3(U)$ is uniquely minimized at $(U_{g,1}^*, U_{g,2}^*)$, where $U_{g,1}^* = B_{1,m}\Sigma_{z_1z_1}^{-1}$ and
$$ U_{g,2}^* = (\alpha_o'\Omega_u^{-1}\alpha_o)^{-1}\alpha_o'\Omega_u^{-1}B_{2,m}\left( \int B_{w_2}B_{w_2}' \right)^{-1}(\alpha_{o\perp}'\beta_{o\perp}). $$
Invoking the ACMT, we deduce that
$$ \sqrt{n}(\hat\alpha_{g,n} - \alpha_o) \to_d B_{1,m}\Sigma_{z_1z_1}^{-1} \quad (3.43) $$
and
$$ n(\hat O_{g,n} - O_o) \to_d (\alpha_o'\Omega_u^{-1}\alpha_o)^{-1}\alpha_o'\Omega_u^{-1}B_{2,m}\left( \int B_{w_2}B_{w_2}' \right)^{-1}(\alpha_{o\perp}'\beta_{o\perp}). \quad (3.44) $$
In the triangular representation of a cointegrated system studied in Phillips (1991), we have $\alpha_o = [I_{r_o}, 0_{r_o \times (m-r_o)}]'$ and $\beta_o = [I_{r_o}, O_o]'$. Thus $\alpha_{o\perp}'\beta_{o\perp} = I_{m-r_o}$ and
$$ \alpha_o'\Omega_u^{-1}\alpha_o = \Omega_{u,11\cdot2}^{-1} \quad \text{and} \quad \alpha_o'\Omega_u^{-1} = \Omega_{u,11\cdot2}^{-1}\left[ I_{r_o},\, -\Omega_{u,12}\Omega_{u,22}^{-1} \right], \quad (3.45) $$
where $\Omega_{u,11\cdot2} = \Omega_{u,11} - \Omega_{u,12}\Omega_{u,22}^{-1}\Omega_{u,21}$ and $\Omega_{u,11}$, $\Omega_{u,12}$ and $\Omega_{u,22}$ denote the leading $r_o \times r_o$ sub-matrix of $\Omega_u$, the upper-right $r_o \times (m-r_o)$ sub-matrix of $\Omega_u$ and the last $(m-r_o) \times (m-r_o)$ sub-matrix of $\Omega_u$, respectively. Using (3.44) and (3.45), we have
$$ n\left( \hat O_{g,n} - O_o \right) \to_d \left( \int dB_{u_1\cdot2}B_{u_2}' \right)\left( \int B_{u_2}B_{u_2}' \right)^{-1}, \quad (3.46) $$
where $B_{u_1}$ and $B_{u_2}$ denote the first $r_o$ and last $m - r_o$ components of $B_u$, and $B_{u_1\cdot2} = B_{u_1} - \Omega_{u,12}\Omega_{u,22}^{-1}B_{u_2}$. From the result in (3.46), we can see that the generalized shrinkage LS estimator $\hat O_{g,n}$ of the cointegration matrix $O_o$ is asymptotically equivalent to the maximum likelihood estimator studied in Phillips (1991).
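The triangular setup used in this comparison is easy to simulate. The sketch below is our own toy illustration of that setup using static OLS in the triangular form, not the paper's shrinkage estimator; it shows the fast ($n$-rate) convergence of an estimate of $O_o$:

```python
# Numerical sketch of the triangular cointegrated system of Phillips (1991):
# Y2 is a random walk and Y1 = O * Y2 + u1, so the levels regression of Y1 on
# Y2 estimates O superconsistently.  Illustration only, values are made up.
import numpy as np

rng = np.random.default_rng(1)
n, O_true = 2000, 0.7
u = rng.normal(size=(n, 2))
Y2 = np.cumsum(u[:, 1])            # I(1) regressor
Y1 = O_true * Y2 + u[:, 0]         # cointegrated with Y2
O_hat = (Y1 @ Y2) / (Y2 @ Y2)      # static OLS in the triangular form
print(abs(O_hat - O_true) < 0.05)  # error shrinks roughly like 1/n
```

The same data-generating design is the natural test bed for comparing $\hat O_{g,n}$ with the maximum likelihood estimator, since both are $n$-consistent for $O_o$ here.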
4 Extension I: VECM Estimation with Weakly Dependent Innovations

In this section, we study shrinkage reduced rank estimation in a scenario where the equation innovations $\{u_t\}_{t \ge 1}$ are weakly dependent. Specifically, we assume that $\{u_t\}_{t \ge 1}$ is generated by a linear process satisfying the following condition.

Assumption 4.1 (LP) Let $D(L) = \sum_{j=0}^\infty D_jL^j$, where $D_0 = I_m$ and $D(1)$ has full rank. Let $u_t$ have the Wold representation
$$ u_t = D(L)\varepsilon_t = \sum_{j=0}^\infty D_j\varepsilon_{t-j}, \quad \text{with} \quad \sum_{j=0}^\infty j^{\frac{1}{2}}\|D_j\|_E < \infty, \quad (4.1) $$
where $\varepsilon_t$ is $i.i.d.(0, \Sigma_\varepsilon)$ with $\Sigma_\varepsilon$ positive definite and finite fourth moments.

Denote the long-run variance of $\{u_t\}_{t \ge 1}$ as $\Omega_u = \sum_{h=-\infty}^\infty \Gamma_{uu}(h)$. From the Wold representation in (4.1), we have $\Omega_u = D(1)\Sigma_\varepsilon D(1)'$, which is positive definite because $D(1)$ has full rank and $\Sigma_\varepsilon$ is positive definite. The fourth moment assumption is needed for the limit distribution of sample autocovariances in the case of misspecified transient dynamics.
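The relation $\Omega_u = D(1)\Sigma_\varepsilon D(1)'$ can be checked numerically. The sketch below uses an MA(1) filter, so only lags $-1, 0, 1$ enter the long-run variance, with coefficient values of our own choosing:

```python
# Sketch: generate u_t = e_t + D1 e_{t-1}, a linear process with D0 = I, and
# compare the sample long-run variance with Omega_u = D(1) Sigma_e D(1)',
# where D(1) = I + D1 and Sigma_e = I here.
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 200_000
D1 = np.array([[0.4, 0.1], [0.0, 0.3]])
e = rng.normal(size=(n + 1, m))               # iid innovations, Sigma_e = I
u = e[1:] + e[:-1] @ D1.T                     # u_t = e_t + D1 e_{t-1}
D_of_1 = np.eye(m) + D1
Omega_pop = D_of_1 @ D_of_1.T                 # D(1) Sigma_e D(1)'
# long-run variance = sum of autocovariances; for an MA(1) only |h| <= 1 matter
G0 = (u.T @ u) / n                            # Gamma(0) estimate
G1 = (u[1:].T @ u[:-1]) / n                   # Gamma(1) estimate
Omega_hat = G0 + G1 + G1.T
print(np.max(np.abs(Omega_hat - Omega_pop)) < 0.1)
```

For filters with infinite lag length, a kernel (e.g. Bartlett-weighted) truncation of the autocovariance sum would replace the exact three-term sum used here.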
The following lemma is useful in establishing the asymptotic properties of the shrinkage estimator with weakly dependent innovations.
Lemma 4.1 Under Assumptions 3.2 and 4.1, (a)-(c) and (e) of Lemma 10.1 are unchanged, while Lemma 10.1.(d) becomes
$$ n^{-\frac{1}{2}}\sum_{t=1}^n \left[ u_tZ_{1,t-1}' - \Lambda_{uz_1}(1) \right] \to_d N(0, \Sigma_{uz_1}), \quad (4.2) $$
where $\Lambda_{uz_1}(1) = \sum_{j=0}^\infty \Gamma_{uu}(j)\beta_o(R^j)' < \infty$ and $\Sigma_{uz_1}$ is the long-run variance matrix of $u_t \otimes Z_{1,t-1}$.

Proof. From the partial sum expression in (3.7), we get $Z_{1,t-1} = \beta_o'Y_{t-1} = R(L)\beta_o'u_t$, which implies that $\{\beta_o'Y_{t-1}\}_{t \ge 1}$ is a stationary process. Note that
$$ E\left[ u_tZ_{1,t-1}' \right] = \sum_{j=0}^\infty E\left[ u_tu_{t-j}' \right]\beta_o(R^j)' = \sum_{j=0}^\infty \Gamma_{uu}(j)\beta_o(R^j)' < \infty. $$
Using a CLT for linear process time series (e.g. the multivariate version of Theorem 3.8 and Remark 3.9 of Phillips and Solo, 1992), we deduce that
$$ n^{-\frac{1}{2}}\sum_{t=1}^n \left[ u_tZ_{1,t-1}' - \Lambda_{uz_1}(1) \right] \to_d N(0, \Sigma_{uz_1}), $$
which establishes (4.2). The results of (a)-(c) and (e) can be proved using similar arguments to those of Lemma 10.1.
As expected, under general weak dependence assumptions on $u_t$, the simple reduced rank regression models (2.1) and (3.1) are susceptible to the effects of potential misspecification in the transient dynamics. These effects bear on the stationary components in the system. In particular, due to the centering term $\Lambda_{uz_1}(1)$ in (4.2), both the OLS estimator $\hat\Pi_{ols}$ and the shrinkage estimator $\hat\Pi_n$ are asymptotically biased. Specifically, we show that $\hat\Pi_{ols}$ has the following probability limit,
$$ \hat\Pi_{ols} \to_p \Pi_1 := Q^{-1}H_oQ + \Pi_o, \quad (4.3) $$
where $H_o = \left[ \Lambda_{wz_1}(1)\Sigma_{z_1z_1}^{-1},\, 0_{m \times (m-r_o)} \right]$. Note that
$$ Q^{-1}H_oQ + \Pi_o = \left[ \alpha_o + \Lambda_{uz_1}(1)\Sigma_{z_1z_1}^{-1} \right]\beta_o' = \tilde\alpha_o\beta_o', \quad (4.4) $$
which implies that the asymptotic bias of the OLS estimator $\hat\Pi_{ols}$ is introduced via the bias in the pseudo-true value limit $\tilde\alpha_o$. Observe also that $\Pi_1 = \tilde\alpha_o\beta_o'$ has rank at most equal to $r_o$, the number of rows in $\beta_o'$. Denote
$$ \hat S_{12} = \sum_{t=1}^n \frac{Z_{1,t-1}Z_{2,t-1}'}{n}, \quad \hat S_{21} = \sum_{t=1}^n \frac{Z_{2,t-1}Z_{1,t-1}'}{n}, \quad \hat S_{11} = \sum_{t=1}^n \frac{Z_{1,t-1}Z_{1,t-1}'}{n} \quad \text{and} \quad \hat S_{22} = \sum_{t=1}^n \frac{Z_{2,t-1}Z_{2,t-1}'}{n}. $$
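The rank observation just made, that the pseudo-true limit $\Pi_1 = \tilde\alpha_o\beta_o'$ has rank at most $r_o$ whatever the bias term is, can be checked in a couple of lines. This is a deterministic sketch with made-up values, not output from the paper:

```python
# Whatever the (hypothetical) bias term Lambda * Sigma^{-1} is, the limit
# Pi_1 = (alpha + Lambda Sigma^{-1}) beta' keeps the (m x ro)(ro x m) product
# form, so rank(Pi_1) <= ro.  All numbers below are invented for illustration.
import numpy as np

m, ro = 3, 1
alpha = np.array([[-0.5], [0.2], [0.0]])
beta = np.array([[1.0], [-1.0], [0.5]])
Lam_Sinv = np.array([[0.3], [-0.1], [0.2]])   # stand-in for Lambda_{uz1}(1) Sigma^{-1}
alpha_tilde = alpha + Lam_Sinv                # biased pseudo-true loading
Pi_1 = alpha_tilde @ beta.T
print(np.linalg.matrix_rank(Pi_1))  # at most ro = 1
```

The bias therefore distorts the loading directions but cannot raise the rank above $r_o$; it can only lower it, which is why the case $r_1 < r_o$ has to be handled separately below.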
The next lemma presents some asymptotic properties of the bias in the OLS estimator $\hat\Pi_{ols}$.

Lemma 4.2 Let $H_n = n\left[ \Lambda_{wz_1}(1),\, 0_{m \times (m-r_o)} \right]\left( \sum_{t=1}^n Z_{t-1}Z_{t-1}' \right)^{-1}$ and $\Pi_{1,n} = Q^{-1}H_nQ + \Pi_o$. Then

(a) $H_n$ converges in probability to $H_o$, i.e. $H_n \to_p H_o$;

(b) $nQ^{-1}H_nQ\beta_{o\perp}$ has limit distribution $\tilde\Delta_1\beta_{o\perp}$, where
$$ \tilde\Delta_1 = \Lambda_{uz_1}(1)\Sigma_{z_1z_1}^{-1}(\beta_o'\alpha_o)^{-1}\left[ \left( \int B_{w_2}dB_{w_1}' \right)' + \Lambda_{w_1w_2} \right]\left( \int B_{w_2}B_{w_2}' \right)^{-1}\alpha_{o\perp}'; \quad (4.5) $$

(c) $\sqrt{n}Q^{-1}(H_n - H_o)Q\beta_o$ has the limit distribution $\tilde\Delta_2\beta_o$, where
$$ \tilde\Delta_2 = \Lambda_{uz_1}(1)\Sigma_{z_1z_1}^{-1}N(0, V_{z_1z_1})\Sigma_{z_1z_1}^{-1}\beta_o' \quad (4.6) $$
and $N(0, V_{z_1z_1})$ denotes the matrix limit distribution of $\sqrt{n}\left( \hat S_{11} - \Sigma_{z_1z_1} \right)$.
Proof. (a). Denote $\Lambda_{wz_1}(1) = Q\Lambda_{uz_1}(1)$. By Lemma 4.1, we have
$$ H_n = \left[ \Lambda_{wz_1}(1),\, 0_{m \times (m-r_o)} \right]n^{\frac{1}{2}}D_n^{-1}\left( D_n^{-1}\sum_{t=1}^n Z_{t-1}Z_{t-1}'D_n^{-1} \right)^{-1}D_n^{-1}n^{\frac{1}{2}} $$
$$ = \left[ \Lambda_{wz_1}(1),\, 0_{m \times (m-r_o)} \right]\left( D_n^{-1}\sum_{t=1}^n Z_{t-1}Z_{t-1}'D_n^{-1} \right)^{-1}\begin{pmatrix} I_{r_o} & 0 \\ 0 & n^{-\frac{1}{2}}I_{m-r_o} \end{pmatrix} \to_p \left[ \Lambda_{wz_1}(1)\Sigma_{z_1z_1}^{-1},\, 0 \right] = H_o. \quad (4.7) $$
(b). From the expression for $H_n$ in the first line of (4.7), we get
$$ nQ^{-1}H_nQ\beta_{o\perp} = \left[ \Lambda_{uz_1}(1),\, 0 \right]\left( D_n^{-1}\sum_{t=1}^n Z_{t-1}Z_{t-1}'D_n^{-1} \right)^{-1}\begin{pmatrix} 0 \\ n^{\frac{1}{2}}\alpha_{o\perp}'\beta_{o\perp} \end{pmatrix} = -\Lambda_{uz_1}(1)\hat S_{11}^{-1}\hat S_{12}\left( n^{-1}\hat S_{22} - n^{-1}\hat S_{21}\hat S_{11}^{-1}\hat S_{12} \right)^{-1}\alpha_{o\perp}'\beta_{o\perp}. \quad (4.8) $$
By Lemma 4.1 and the CMT, we have
$$ \left( n^{-1}\hat S_{22} - n^{-1}\hat S_{21}\hat S_{11}^{-1}\hat S_{12} \right)^{-1} \to_d \left( \int B_{w_2}B_{w_2}' \right)^{-1}. \quad (4.9) $$
Next note that
$$ \hat S_{12} \to_d -(\beta_o'\alpha_o)^{-1}\left[ \left( \int B_{w_2}dB_{w_1}' \right)' + \Lambda_{w_1w_2} \right]. \quad (4.10) $$
The claimed result now follows by applying (4.9), (4.10), Lemma 4.1 and the CMT to the expression in (4.8).
(c). From the expression for $H_n$ in the first line of (4.7), we get
$$ \sqrt{n}Q^{-1}(H_n - H_o)Q\beta_o = \sqrt{n}\left[ \Lambda_{uz_1}(1),\, 0 \right]\left\{ \begin{pmatrix} \hat S_{11} & n^{-\frac{1}{2}}\hat S_{12} \\ n^{-\frac{1}{2}}\hat S_{21} & n^{-1}\hat S_{22} \end{pmatrix}^{-1} - \begin{pmatrix} \Sigma_{z_1z_1}^{-1} & 0 \\ 0 & 0 \end{pmatrix} \right\}Q\beta_o = \sqrt{n}\,\Lambda_{uz_1}(1)\left( H_{n,11} - \Sigma_{z_1z_1}^{-1} \right)\beta_o'\beta_o, \quad (4.11) $$
where $H_{n,11} = \left( \hat S_{11} - \hat S_{12}\hat S_{22}^{-1}\hat S_{21} \right)^{-1}$ and we use $\alpha_{o\perp}'\alpha_o = 0$ to get the third equality. Invoking the CLT and Lemma 4.1, we get
$$ \Lambda_{uz_1}(1)\sqrt{n}\left( H_{n,11} - \Sigma_{z_1z_1}^{-1} \right)\beta_o'\beta_o = -\Lambda_{uz_1}(1)H_{n,11}\sqrt{n}\left( H_{n,11}^{-1} - \Sigma_{z_1z_1} \right)\Sigma_{z_1z_1}^{-1}\beta_o'\beta_o = -\Lambda_{uz_1}(1)H_{n,11}\sqrt{n}\left( \hat S_{11} - \Sigma_{z_1z_1} \right)\Sigma_{z_1z_1}^{-1}\beta_o'\beta_o + o_p(1) $$
$$ \to_d \Lambda_{uz_1}(1)\Sigma_{z_1z_1}^{-1}N(0, V_{z_1z_1})\Sigma_{z_1z_1}^{-1}\beta_o'\beta_o, \quad (4.12) $$
which finishes the proof.
Denote the rank of $\Pi_1$ by $r_1$. Then, by virtue of the expression $\Pi_1 = \tilde\alpha_o\beta_o'$, we have $r_1 \le r_o$ as indicated. Without loss of generality, we decompose $\Pi_1$ as $\Pi_1 = \tilde\alpha_1\tilde\beta_1'$, where $\tilde\alpha_1$ and $\tilde\beta_1$ are $m \times r_1$ matrices with full rank. Denote the orthogonal complements of $\tilde\alpha_1$ and $\tilde\beta_1$ as $\tilde\alpha_{1\perp}$ and $\tilde\beta_{1\perp}$, respectively. Similarly, we decompose $\tilde\beta_{1\perp}$ as $\tilde\beta_{1\perp} = [\beta_{o\perp}, \tilde\beta_\perp]$, where $\tilde\beta_\perp$ is an $m \times (r_o - r_1)$ matrix. Denote $\tilde\Delta_0 = \left( \int B_{w_2}dB_u' \right)'\left( \int B_{w_2}B_{w_2}' \right)^{-1}\alpha_{o\perp}'$ and $\tilde\Delta_3 = N(0, \Sigma_{uz_1})\Sigma_{z_1z_1}^{-1}\beta_o'$. Let $[\lambda_1(\hat\Pi_{ols}), \ldots, \lambda_m(\hat\Pi_{ols})]$ and $[\lambda_1(\Pi_1), \ldots, \lambda_m(\Pi_1)]$ be the ordered eigenvalues of $\hat\Pi_{ols}$ and $\Pi_1$, respectively.

Lemma 4.3 Under Assumptions 3.2 and 4.1, we have the following results:

(a) the OLS estimator $\hat\Pi_{ols}$ satisfies
$$ \left[ Q(\hat\Pi_{ols} - \Pi_o)Q^{-1} - H_n \right]D_n = O_p(1); \quad (4.13) $$

(b) the eigenvalues of $\hat\Pi_{ols}$ satisfy
$$ [\lambda_1(\hat\Pi_{ols}), \ldots, \lambda_{m-r_1}(\hat\Pi_{ols})] \to_p (0, \ldots, 0)_{1 \times (m-r_1)} \quad (4.14) $$
and
$$ [\lambda_{m-r_1+1}(\hat\Pi_{ols}), \ldots, \lambda_m(\hat\Pi_{ols})] \to_p \left[ \lambda_{m-r_1+1}(\Pi_1), \ldots, \lambda_m(\Pi_1) \right]; \quad (4.15) $$

(c) the first $m - r_o$ ordered eigenvalues of $\hat\Pi_{ols}$ satisfy
$$ n[\lambda_1(\hat\Pi_{ols}), \ldots, \lambda_{m-r_o}(\hat\Pi_{ols})] \to_d [\tilde\lambda_1^0, \ldots, \tilde\lambda_{m-r_o}^0], \quad (4.16) $$
where $\tilde\lambda_j^0$ ($j = 1, \ldots, m-r_o$) are the ordered solutions of $\left| uI_{m-r_o} - \beta_{o\perp}'\left( \tilde\Delta_0 + \tilde\Delta_1 \right)\beta_{o\perp} \right| = 0$;

(d) $\hat\Pi_{ols}$ has $r_o - r_1$ eigenvalues satisfying
$$ \sqrt{n}[\lambda_{m-r_o+1}(\hat\Pi_{ols}), \ldots, \lambda_{m-r_1}(\hat\Pi_{ols})] \to_d [\tilde\lambda_{m-r_o+1}^0, \ldots, \tilde\lambda_{m-r_1}^0], \quad (4.17) $$
where $\tilde\lambda_j^0$ ($j = m-r_o+1, \ldots, m-r_1$) are the ordered solutions of $\left| uI_{r_o-r_1} - \tilde\beta_\perp'\left( \tilde\Delta_2 + \tilde\Delta_3 \right)\tilde\beta_\perp \right| = 0$.
Proof. (a). From the expressions in (3.10) and (4.7),
$$ \left[ Q(\hat\Pi_{ols} - \Pi_o)Q^{-1} - H_n \right]D_n = \left[ \sum_{t=1}^n \left( w_tZ_{t-1}' - H_{1,o} \right) \right]D_n^{-1}\left( D_n^{-1}\sum_{t=1}^n Z_{t-1}Z_{t-1}'D_n^{-1} \right)^{-1}, \quad (4.18) $$
where $H_{1,o} = \left[ \Lambda_{wz_1}(1),\, 0_{m \times (m-r_o)} \right]$. The results in (a) now follow directly from Lemma 4.1 and the CMT.

(b). Denote $P = \left[ \tilde\beta_1, \tilde\beta_{1\perp} \right]$ and $S_n(\lambda) = \lambda I_m - \hat\Pi_{ols}$; then by definition the eigenvalues of $\hat\Pi_{ols}$ are the solutions of the following determinantal equation,
$$ 0 = \left| P'S_n(\lambda)P \right| = \begin{vmatrix} \lambda\tilde\beta_1'\tilde\beta_1 - \tilde\beta_1'\hat\Pi_{ols}\tilde\beta_1 & -\tilde\beta_1'\hat\Pi_{ols}\tilde\beta_{1\perp} \\ -\tilde\beta_{1\perp}'\hat\Pi_{ols}\tilde\beta_1 & \lambda I_{m-r_1} - \tilde\beta_{1\perp}'\hat\Pi_{ols}\tilde\beta_{1\perp} \end{vmatrix}. \quad (4.19) $$
As $\Pi_1\tilde\beta_{1\perp} = 0$ and $\hat\Pi_{ols} = \Pi_1 + o_p(1)$,
$$ \tilde\beta_1'\hat\Pi_{ols}\tilde\beta_{1\perp} = \tilde\beta_1'\left( \hat\Pi_{ols} - \Pi_1 \right)\tilde\beta_{1\perp} + \tilde\beta_1'\Pi_1\tilde\beta_{1\perp} = o_p(1). \quad (4.20) $$
Similarly, we have
$$ \tilde\beta_{1\perp}'\hat\Pi_{ols}\tilde\beta_{1\perp} = \tilde\beta_{1\perp}'\left( \hat\Pi_{ols} - \Pi_1 \right)\tilde\beta_{1\perp} = o_p(1) \quad (4.21) $$
and
$$ \tilde\beta_{1\perp}'\hat\Pi_{ols}\tilde\beta_1 \to_p \tilde\beta_{1\perp}'\Pi_1\tilde\beta_1 \quad \text{and} \quad \tilde\beta_1'\hat\Pi_{ols}\tilde\beta_1 \to_p \tilde\beta_1'\Pi_1\tilde\beta_1. \quad (4.22) $$
From the results in (4.19)-(4.22), we can invoke the Slutsky Theorem to deduce that
$$ \left| \lambda I_m - \hat\Pi_{ols} \right| \to_p \left| \lambda I_{m-r_1} \right| \cdot \left| \lambda\tilde\beta_1'\tilde\beta_1 - \tilde\beta_1'\Pi_1\tilde\beta_1 \right|, \quad (4.23) $$
uniformly over any compact set in $\mathbb{R}$, where $\left| \lambda\tilde\beta_1'\tilde\beta_1 - \tilde\beta_1'\Pi_1\tilde\beta_1 \right| = 0$ can equivalently be written as $\left| \lambda I_{r_1} - \tilde\beta_1'\tilde\alpha_1 \right| = 0$. Hence the claimed results follow from (4.23) and the CMT.
(c). If we denote $u_{n,k}^* = n\lambda_k(\hat\Pi_{ols})$, then by definition $u_{n,k}^*$ ($k \in S_{ols,\lambda}^c$) solves the determinantal equation
$$ 0 = \left| \tilde\beta_1'S_n(u)\tilde\beta_1 \right| \cdot \left| \tilde\beta_{1\perp}'\left( S_n(u) - S_n(u)\tilde\beta_1\left[ \tilde\beta_1'S_n(u)\tilde\beta_1 \right]^{-1}\tilde\beta_1'S_n(u) \right)\tilde\beta_{1\perp} \right|, \quad (4.24) $$
where $S_n(u) = \frac{u}{n}I_m - \hat\Pi_{ols}$. From the results in (4.3) and (4.4), we have
$$ \tilde\beta_1'S_n(u)\tilde\beta_1 = n^{-1}u\tilde\beta_1'\tilde\beta_1 - \tilde\beta_1'\hat\Pi_{ols}\tilde\beta_1 \to_p -\tilde\beta_1'\tilde\alpha_1\tilde\beta_1'\tilde\beta_1, \quad (4.25) $$
where $\tilde\beta_1'\tilde\alpha_1\tilde\beta_1'\tilde\beta_1$ is an $r_1 \times r_1$ nonsingular matrix. Hence $u_{n,k}^*$ is asymptotically the solution of the determinantal equation
$$ 0 = \left| \tilde\beta_{1\perp}'\left( S_n(u) - S_n(u)\tilde\beta_1\left[ \tilde\beta_1'S_n(u)\tilde\beta_1 \right]^{-1}\tilde\beta_1'S_n(u) \right)\tilde\beta_{1\perp} \right|. \quad (4.26) $$
Denote $T_n(u) = S_n(u) - S_n(u)\tilde\beta_1\left[ \tilde\beta_1'S_n(u)\tilde\beta_1 \right]^{-1}\tilde\beta_1'S_n(u)$; then (4.26) can be equivalently written as
$$ 0 = \left| \tilde\beta_\perp'T_n(u)\tilde\beta_\perp \right| \cdot \left| \beta_{o\perp}'\left( T_n(u) - T_n(u)\tilde\beta_\perp\left[ \tilde\beta_\perp'T_n(u)\tilde\beta_\perp \right]^{-1}\tilde\beta_\perp'T_n(u) \right)\beta_{o\perp} \right|. \quad (4.27) $$
Note that
$$ n^{\frac{1}{2}}\tilde\beta_\perp'S_n(u)\tilde\beta_\perp = n^{-\frac{1}{2}}u\tilde\beta_\perp'\tilde\beta_\perp - n^{\frac{1}{2}}\tilde\beta_\perp'\left( \hat\Pi_{ols} - \Pi_{1,n} \right)\tilde\beta_\perp - n^{\frac{1}{2}}\tilde\beta_\perp'\Pi_{1,n}\tilde\beta_\perp, $$
$$ n^{\frac{1}{2}}\tilde\beta_1'S_n(u)\tilde\beta_\perp = -n^{\frac{1}{2}}\tilde\beta_1'\left( \hat\Pi_{ols} - \Pi_{1,n} \right)\tilde\beta_\perp - n^{\frac{1}{2}}\tilde\beta_1'\Pi_{1,n}\tilde\beta_\perp, \quad \tilde\beta_\perp'S_n(u)\tilde\beta_1 = -\tilde\beta_\perp'\hat\Pi_{ols}\tilde\beta_1. $$
From the above expressions, we can write
$$ n^{\frac{1}{2}}\tilde\beta_\perp'T_n(u)\tilde\beta_\perp = n^{-\frac{1}{2}}u\tilde\beta_\perp'\tilde\beta_\perp - \tilde\beta_\perp'\left( I_m + \hat\Pi_{ols}\tilde\beta_1\left[ \tilde\beta_1'S_n(u)\tilde\beta_1 \right]^{-1}\tilde\beta_1' \right)M_{1,n}, \quad (4.28) $$
where $M_{1,n} = n^{\frac{1}{2}}\left( \hat\Pi_{ols} - \Pi_{1,n} \right)\tilde\beta_\perp + n^{\frac{1}{2}}\Pi_{1,n}\tilde\beta_\perp$. Using Lemma 4.1 and the results in (a), we can deduce that
$$ n^{\frac{1}{2}}\left( \hat\Pi_{ols} - \Pi_{1,n} \right)\tilde\beta_\perp = Q^{-1}\left[ Q\left( \hat\Pi_{ols} - \Pi_{1,n} \right)Q^{-1}D_n \right]n^{\frac{1}{2}}D_n^{-1}Q\tilde\beta_\perp \to_d N(0, \Sigma_{uz_1})\Sigma_{z_1z_1}^{-1}\beta_o'\tilde\beta_\perp := N_1\tilde\beta_\perp. \quad (4.29) $$
Using the results in Lemma 4.2, we get
$$ n^{\frac{1}{2}}\Pi_{1,n}\tilde\beta_\perp = n^{\frac{1}{2}}\left( \Pi_{1,n} - \Pi_1 \right)\tilde\beta_\perp = n^{\frac{1}{2}}Q^{-1}\left( H_n - H_o \right)Q\tilde\beta_\perp \to_d \Lambda_{uz_1}(1)\Sigma_{z_1z_1}^{-1}N(0, V_{z_1z_1})\Sigma_{z_1z_1}^{-1}\beta_o'\tilde\beta_\perp := N_2\tilde\beta_\perp. \quad (4.30) $$
From (4.28)-(4.30), we can deduce that
$$ \left| \sqrt{n}\,\tilde\beta_\perp'T_n(u)\tilde\beta_\perp \right| \to_d \left| \tilde\beta_\perp'\left( N_1 + N_2 \right)\tilde\beta_\perp \right| \ne 0, \quad a.e. \quad (4.31) $$
Next note that
$$ \beta_{o\perp}'T_n(u)\beta_{o\perp} = n^{-1}u\beta_{o\perp}'\beta_{o\perp} - \beta_{o\perp}'\left( I_m + \hat\Pi_{ols}\tilde\beta_1\left[ \tilde\beta_1'S_n(u)\tilde\beta_1 \right]^{-1}\tilde\beta_1' \right)M_{2,n}, \quad (4.32) $$
$$ \tilde\beta_\perp'T_n(u)\beta_{o\perp} = -\tilde\beta_\perp'\left( I_m + \hat\Pi_{ols}\tilde\beta_1\left[ \tilde\beta_1'S_n(u)\tilde\beta_1 \right]^{-1}\tilde\beta_1' \right)M_{2,n}, \quad (4.33) $$
where $M_{2,n} = \left( \hat\Pi_{ols} - \Pi_{1,n} \right)\beta_{o\perp} + \Pi_{1,n}\beta_{o\perp}$. By (4.22) and (4.25), we have
$$ \beta_{o\perp}'T_n(u)\tilde\beta_\perp = \beta_{o\perp}'\hat\Pi_{ols}\tilde\beta_\perp - \beta_{o\perp}'\hat\Pi_{ols}\tilde\beta_1\left[ \tilde\beta_1'S_n(u)\tilde\beta_1 \right]^{-1}\tilde\beta_1'\hat\Pi_{ols}\tilde\beta_\perp = o_p(1). \quad (4.34) $$
Using Lemma 4.2, we can deduce that
$$ n\left( \hat\Pi_{ols} - \Pi_{1,n} \right)\beta_{o\perp} = Q^{-1}\left[ Q\left( \hat\Pi_{ols} - \Pi_{1,n} \right)Q^{-1}D_n \right]nD_n^{-1}Q\beta_{o\perp} = Q^{-1}\left[ Q\left( \hat\Pi_{ols} - \Pi_{1,n} \right)Q^{-1}D_n \right]\begin{pmatrix} 0 \\ \alpha_{o\perp}'\beta_{o\perp} \end{pmatrix} $$
$$ \to_d \left( \int B_{w_2}dB_u' \right)'\left( \int B_{w_2}B_{w_2}' \right)^{-1}\alpha_{o\perp}'\beta_{o\perp} := \tilde\Delta_0\beta_{o\perp}. \quad (4.35) $$
Using the result in (4.5), we get
$$ n\Pi_{1,n}\beta_{o\perp} = n\left( \Pi_o + Q^{-1}H_nQ \right)\beta_{o\perp} = nQ^{-1}H_nQ\beta_{o\perp} \to_d \tilde\Delta_1\beta_{o\perp}. \quad (4.36) $$
From (4.32)-(4.36), we deduce that
$$ \left| n\beta_{o\perp}'\left( T_n(u) - T_n(u)\tilde\beta_\perp\left[ \tilde\beta_\perp'T_n(u)\tilde\beta_\perp \right]^{-1}\tilde\beta_\perp'T_n(u) \right)\beta_{o\perp} \right| \to_d \left| uI_{m-r_o} - \beta_{o\perp}'\left( \tilde\Delta_0 + \tilde\Delta_1 \right)\beta_{o\perp} \right|, \quad (4.37) $$
uniformly over any compact set in $\mathbb{R}$. Now the results in (c) follow from (4.37) and the CMT.
(d). If we denote $u_{n,k}^* = \sqrt{n}\lambda_k(\hat\Pi_{ols})$, then by definition $u_{n,k}^*$ ($k \in \tilde S_{ols,\lambda}^c$) solves the determinantal equation
$$ 0 = \left| \tilde\beta_1'S_n(u)\tilde\beta_1 \right| \cdot \left| \tilde\beta_{1\perp}'\left( S_n(u) - S_n(u)\tilde\beta_1\left[ \tilde\beta_1'S_n(u)\tilde\beta_1 \right]^{-1}\tilde\beta_1'S_n(u) \right)\tilde\beta_{1\perp} \right|, \quad (4.38) $$
where $S_n(u) = \frac{u}{\sqrt{n}}I_m - \hat\Pi_{ols}$. Note that
$$ \tilde\beta_{1\perp}'S_n(u)\tilde\beta_{1\perp} = n^{-\frac{1}{2}}uI_{m-r_1} - \tilde\beta_{1\perp}'\hat\Pi_{ols}\tilde\beta_{1\perp}, \quad (4.39) $$
$$ \tilde\beta_{1\perp}'S_n(u)\tilde\beta_1 = -\tilde\beta_{1\perp}'\hat\Pi_{ols}\tilde\beta_1 \quad \text{and} \quad \tilde\beta_1'S_n(u)\tilde\beta_{1\perp} = -\tilde\beta_1'\hat\Pi_{ols}\tilde\beta_{1\perp}. \quad (4.40) $$
Using the expressions in (4.39) and (4.40), we have
$$ \tilde\beta_{1\perp}'\left( S_n(u) - S_n(u)\tilde\beta_1\left[ \tilde\beta_1'S_n(u)\tilde\beta_1 \right]^{-1}\tilde\beta_1'S_n(u) \right)\tilde\beta_{1\perp} = n^{-\frac{1}{2}}uI_{m-r_1} - \tilde\beta_{1\perp}'\left( I_m + \hat\Pi_{ols}\tilde\beta_1\left[ \tilde\beta_1'S_n(u)\tilde\beta_1 \right]^{-1}\tilde\beta_1' \right)\hat\Pi_{ols}\tilde\beta_{1\perp}. \quad (4.41) $$
From (4.35), we get
$$ \sqrt{n}\left( \hat\Pi_{ols} - \Pi_{1,n} \right)\beta_{o\perp} = o_p(1). \quad (4.42) $$
Using (4.29), (4.5) and (4.6), we have
$$ \sqrt{n}\left( \hat\Pi_{ols} - \Pi_{1,n} \right)\tilde\beta_\perp \to_d \tilde\Delta_3\tilde\beta_\perp, \quad (4.43) $$
$$ \sqrt{n}\,\Pi_{1,n}\beta_{o\perp} = o_p(1), \quad (4.44) $$
$$ \sqrt{n}\,\Pi_{1,n}\tilde\beta_\perp \to_d \tilde\Delta_2\tilde\beta_\perp. \quad (4.45) $$
Now, using the results in (4.42)-(4.45), we get
$$ \sqrt{n}\,\hat\Pi_{ols}\tilde\beta_{1\perp} = \sqrt{n}\left( \hat\Pi_{ols} - \Pi_{1,n} \right)\tilde\beta_{1\perp} + \sqrt{n}\,\Pi_{1,n}\tilde\beta_{1\perp} \to_d \left[ 0_{m \times (m-r_o)},\; \tilde\Delta_3\tilde\beta_\perp \right] + \left[ 0_{m \times (m-r_o)},\; \tilde\Delta_2\tilde\beta_\perp \right] = \left[ 0_{m \times (m-r_o)},\; \left( \tilde\Delta_2 + \tilde\Delta_3 \right)\tilde\beta_\perp \right]. \quad (4.46) $$
From the results in (4.41)-(4.46), it follows that
$$ \left| \sqrt{n}\,\tilde\beta_{1\perp}'\left( S_n(u) - S_n(u)\tilde\beta_1\left[ \tilde\beta_1'S_n(u)\tilde\beta_1 \right]^{-1}\tilde\beta_1'S_n(u) \right)\tilde\beta_{1\perp} \right| \to_d \left| uI_{m-r_1} - \tilde\beta_{1\perp}'\left( I_m + \tilde\alpha_1\left( \tilde\beta_1'\tilde\alpha_1 \right)^{-1}\tilde\beta_1' \right)\left[ 0_{m \times (m-r_o)},\; \left( \tilde\Delta_2 + \tilde\Delta_3 \right)\tilde\beta_\perp \right] \right| $$
$$ = \left| uI_{m-r_o} \right| \cdot \left| uI_{r_o-r_1} - \tilde\beta_\perp'\left( \tilde\Delta_2 + \tilde\Delta_3 \right)\tilde\beta_\perp \right|. \quad (4.47) $$
Note that the determinantal equation
$$ \left| uI_{m-r_o} \right| \cdot \left| uI_{r_o-r_1} - \tilde\beta_\perp'\left( \tilde\Delta_2 + \tilde\Delta_3 \right)\tilde\beta_\perp \right| = 0 \quad (4.48) $$
has $m - r_o$ zero solutions, which correspond to the probability limits of $\sqrt{n}\,\lambda_{S_{ols,\lambda}^c}(\hat\Pi_{ols})$, as illustrated in (c). Equation (4.48) also has $r_o - r_1$ non-trivial solutions given by the stochastic determinantal equation $\left| uI_{r_o-r_1} - \tilde\beta_\perp'\left( \tilde\Delta_2 + \tilde\Delta_3 \right)\tilde\beta_\perp \right| = 0$, which finishes the proof.
We next derive the asymptotic properties of the shrinkage estimator $\hat\Pi_n$ with weakly dependent innovations. Let $\tilde S_\lambda = \{k : \lambda_k(\Pi_1) \ne 0,\; k = 1, \ldots, m\}$ be the index set of nonzero eigenvalues of $\Pi_1$. From Lemma 4.3, it is clear that $\tilde S_\lambda \subseteq S_\lambda$.

Corollary 4.1 Under Assumptions 3.1, 4.1 and 3.3.(i), the shrinkage estimator $\hat\Pi_n$ satisfies
$$ \hat\Pi_n \to_p \Pi_1, \quad (4.49) $$
where $\Pi_1$ is defined in (4.3).

Proof. First, when $r_o = 0$, we have $\Pi_1 = \tilde\alpha_o\beta_o' = 0 = \Pi_o$, so (4.49) follows by arguments similar to those in the proof of Theorem 3.1. To finish the proof, we only need to consider the scenarios $r_o = m$ and $r_o \in (0, m)$.
Using the same notation $V_{1,n}(\Pi)$ as defined in the proof of Theorem 3.1, by definition we have $V_{1,n}(\hat\Pi_n) \le V_{1,n}(\Pi_1)$, which implies
$$ \left[ \mathrm{vec}(\hat\Pi_n - \Pi_1) \right]'\left( \sum_{t=1}^n Y_{t-1}Y_{t-1}' \otimes I_m \right)\left[ \mathrm{vec}(\hat\Pi_n - \Pi_1) \right] + 2\left[ \mathrm{vec}(\hat\Pi_n - \Pi_1) \right]'\mathrm{vec}\left[ \sum_{t=1}^n u_tY_{t-1}' - (\Pi_1 - \Pi_o)\sum_{t=1}^n Y_{t-1}Y_{t-1}' \right] \le n\left\{ \sum_{k=1}^m \hat P_{\lambda_n}\left[ |\lambda_k(\Pi_1)| \right] - \sum_{k=1}^m \hat P_{\lambda_n}\left[ |\lambda_k(\hat\Pi_n)| \right] \right\}. \quad (4.50) $$
When $r_o = m$, $Y_t$ is stationary and we have
$$ \frac{1}{n}\sum_{t=1}^n Y_{t-1}Y_{t-1}' \to_p \Sigma_{YY}(0) = R(1)\Omega_uR(1)'. \quad (4.51) $$
From the results in (4.50) and (4.51), we get w.p.a.1,
$$ \lambda_{\min}\left\| \hat\Pi_n - \Pi_1 \right\|_E^2 - \left\| \hat\Pi_n - \Pi_1 \right\|_Ec_n - d_n \le 0, \quad (4.52) $$
where $\lambda_{\min}$ denotes the smallest eigenvalue of $\Sigma_{YY}(0)$, which is positive with probability 1,
$$ c_n = \left\| \sum_{t=1}^n \frac{u_tY_{t-1}'}{n} - (\Pi_1 - \Pi_o)\sum_{t=1}^n \frac{Y_{t-1}Y_{t-1}'}{n} \right\|_E \to_p \left\| \Lambda_{uY}(1) - \Lambda_{uY}(1)\Sigma_{z_1z_1}^{-1}\Sigma_{z_1z_1} \right\|_E = 0 $$
and $d_n = \sum_{k \in \tilde S_\lambda}\hat P_{\lambda_n}\left[ |\lambda_k(\Pi_1)| \right] = o_p(1)$. So the result in (4.49) follows directly from the inequality in (4.52).

When $0 < r_o < m$, we get
$$ \left[ \mathrm{vec}(\hat\Pi_n - \Pi_1) \right]'\left( \sum_{t=1}^n Y_{t-1}Y_{t-1}' \otimes I_m \right)\left[ \mathrm{vec}(\hat\Pi_n - \Pi_1) \right] = \mathrm{vec}\left[ (\hat\Pi_n - \Pi_1)B_n \right]'\left( D_n^{-1}\sum_{t=1}^n Z_{t-1}Z_{t-1}'D_n^{-1} \otimes I_m \right)\mathrm{vec}\left[ (\hat\Pi_n - \Pi_1)B_n \right] \ge \lambda_{\min}\left\| (\hat\Pi_n - \Pi_1)B_n \right\|_E^2, \quad (4.53) $$
where $B_n = Q^{-1}D_n$.
Next, note that
$$ \left\{ \sum_{t=1}^n u_tZ_{t-1}' - \left[ (\Pi_1 - \Pi_o)Q^{-1} \right]\sum_{t=1}^n Z_{t-1}Z_{t-1}' \right\}D_n^{-1} = \left[ n^{-\frac{1}{2}}\sum_{t=1}^n \left( u_tZ_{1,t-1}' - \Lambda_{uz_1}(1)\Sigma_{z_1z_1}^{-1}Z_{1,t-1}Z_{1,t-1}' \right),\; n^{-1}\sum_{t=1}^n \left( u_tZ_{2,t-1}' - \Lambda_{uz_1}(1)\Sigma_{z_1z_1}^{-1}Z_{1,t-1}Z_{2,t-1}' \right) \right]. \quad (4.54) $$
From Lemma 10.1, we can deduce that
$$ n^{-1}\sum_{t=1}^n Z_{2,t-1}u_t' = O_p(1) \quad \text{and} \quad n^{-1}\sum_{t=1}^n Z_{2,t-1}Z_{1,t-1}'\Sigma_{z_1z_1}^{-1}\Lambda_{uz_1}(1)' = O_p(1). \quad (4.55) $$
Similarly, we get
$$ n^{-\frac{1}{2}}\sum_{t=1}^n \left( Z_{1,t-1}u_t' - \Lambda_{uz_1}(1)' \right) - n^{\frac{1}{2}}\left[ \hat S_{11} - \Sigma_{z_1z_1} \right]\Sigma_{z_1z_1}^{-1}\Lambda_{uz_1}(1)' = O_p(1). \quad (4.56) $$
Define $e_n = \left\| \left\{ \sum_{t=1}^n u_tZ_{t-1}' - \left[ (\Pi_1 - \Pi_o)Q^{-1} \right]\sum_{t=1}^n Z_{t-1}Z_{t-1}' \right\}D_n^{-1} \right\|_E$; then from (4.54)-(4.56) we can deduce that $e_n = O_p(1)$. By the Cauchy-Schwarz inequality, we have
$$ \left| \left[ \mathrm{vec}(\hat\Pi_n - \Pi_1) \right]'\mathrm{vec}\left[ \sum_{t=1}^n u_tY_{t-1}' - (\Pi_1 - \Pi_o)\sum_{t=1}^n Y_{t-1}Y_{t-1}' \right] \right| = \left| \left[ \mathrm{vec}(\hat\Pi_n - \Pi_1) \right]'\mathrm{vec}\left[ \left\{ \sum_{t=1}^n u_tZ_{t-1}' - \left[ (\Pi_1 - \Pi_o)Q^{-1} \right]\sum_{t=1}^n Z_{t-1}Z_{t-1}' \right\}D_n^{-1}B_n' \right] \right| \le \left\| (\hat\Pi_n - \Pi_1)B_n \right\|_Ee_n. \quad (4.57) $$
Define $d_n = n\sum_{k \in \tilde S_\lambda}\hat P_{\lambda_n}\left[ |\lambda_k(\Pi_1)| \right]$; then from the results in (4.50), (4.53) and (4.57), we get
$$ \lambda_{\min}\left\| (\hat\Pi_n - \Pi_1)B_n \right\|_E^2 - 2\left\| (\hat\Pi_n - \Pi_1)B_n \right\|_Ee_n - d_n \le 0. \quad (4.58) $$
Now the result in (4.49) follows by the same arguments as in Theorem 3.1.
Corollary 4.1 implies that the shrinkage estimator $\hat\Pi_n$ has the same probability limit as the OLS estimator $\hat\Pi_{ols}$. We next derive the convergence rate of $\hat\Pi_n$.

Corollary 4.2 Denote $a_n = \max_{k \in \tilde S_\lambda}\left| \hat P_{\lambda_n}'\left[ |\lambda_k(\Pi_1)| \right] \right|$. Under Assumptions RR, 3.3.(i)-(ii) and 4.1, the shrinkage LS estimator $\hat\Pi_n$ satisfies

(a) if $r_o = 0$, then $\hat\Pi_n - \Pi_1 = O_p(n^{-1} + n^{-1}a_n)$;

(b) if $0 < r_o \le m$, then $\hat\Pi_n - \Pi_1 = O_p(n^{-\frac{1}{2}} + a_n)$.
Proof. By the results of Corollary 4.1 and the CMT, we deduce that $\lambda_k(\hat\Pi_n)$ is a consistent estimator of $\lambda_k(\Pi_1)$ for all $k = 1, \ldots, m$. Using the same arguments as those of Theorem 3.2, we deduce that
$$ \sum_{k \in \tilde S_\lambda}\left\{ \hat P_{\lambda_n}\left[ |\lambda_k(\Pi_1)| \right] - \hat P_{\lambda_n}\left[ |\lambda_k(\hat\Pi_n)| \right] \right\} \le C\max_{k \in \tilde S_\lambda}\left| \hat P_{\lambda_n}'\left[ |\lambda_k(\Pi_1)| \right] \right|\left\| \hat\Pi_n - \Pi_1 \right\|_E + o_p(1)\left\| \hat\Pi_n - \Pi_1 \right\|_E^2. \quad (4.59) $$
Using the inequality (4.59) in (4.50) gives
$$ \left[ \mathrm{vec}(\hat\Pi_n - \Pi_1) \right]'\left( \sum_{t=1}^n Y_{t-1}Y_{t-1}' \otimes I_m \right)\left[ \mathrm{vec}(\hat\Pi_n - \Pi_1) \right] + 2\left[ \mathrm{vec}(\hat\Pi_n - \Pi_1) \right]'\mathrm{vec}\left[ \sum_{t=1}^n u_tY_{t-1}' - (\Pi_1 - \Pi_o)\sum_{t=1}^n Y_{t-1}Y_{t-1}' \right] \le Cn\max_{k \in \tilde S_\lambda}\left| \hat P_{\lambda_n}'\left[ |\lambda_k(\Pi_1)| \right] \right|\left\| \hat\Pi_n - \Pi_1 \right\|_E + o_p(1)\left\| \hat\Pi_n - \Pi_1 \right\|_E^2, \quad (4.60) $$
where $C > 0$ denotes a generic constant.

When $r_o = 0$, the convergence rate of $\hat\Pi_n$ can be derived using the same arguments as in Theorem 3.2. Hence, to finish the proof, we only need to consider the scenarios $r_o = m$ and $0 < r_o < m$.

Consider the case $r_o = m$. Note that
$$ n^{-\frac{1}{2}}\sum_{t=1}^n u_tY_{t-1}' - n^{-\frac{1}{2}}(\Pi_1 - \Pi_o)\sum_{t=1}^n Y_{t-1}Y_{t-1}' = n^{-\frac{1}{2}}\sum_{t=1}^n \left[ u_tY_{t-1}' - \Lambda_{uY}(1) \right] - \Lambda_{uY}(1)\Sigma_{z_1z_1}^{-1}\left[ n^{\frac{1}{2}}\left( \hat S_{11} - \Sigma_{z_1z_1} \right) \right] = O_p(1). $$
Following similar arguments to those of Theorem 3.2, we get
$$ \lambda_{\min}\left\| \hat\Pi_n - \Pi_1 \right\|_E^2 - \left\| \hat\Pi_n - \Pi_1 \right\|_E(c_n + a_nC) \le 0, \quad (4.61) $$
where $c_n = n^{-1}\left\| \sum_{t=1}^n u_tY_{t-1}' - (\Pi_1 - \Pi_o)\sum_{t=1}^n Y_{t-1}Y_{t-1}' \right\|_E = O_p(n^{-\frac{1}{2}})$. From the inequality in (4.61), we deduce that
$$ \hat\Pi_n - \Pi_1 = O_p(n^{-\frac{1}{2}} + a_n). \quad (4.62) $$
When $0 < r_o < m$, we can use (4.50), (4.53) and (4.57) in the proof of Corollary 4.1 and (10.29) in the proof of Corollary 3.2 to get
$$ \lambda_{\min}\left\| (\hat\Pi_n - \Pi_1)B_n \right\|_E^2 - 2\left\| (\hat\Pi_n - \Pi_1)B_n \right\|_Ee_n \le na_nC_V\left\| \hat\Pi_n - \Pi_1 \right\|_E, \quad (4.63) $$
where $e_n = \left\| \left\{ \sum_{t=1}^n u_tZ_{t-1}' - \left[ (\Pi_1 - \Pi_o)Q^{-1} \right]\sum_{t=1}^n Z_{t-1}Z_{t-1}' \right\}D_n^{-1} \right\|_E = O_p(1)$. Note that
$$ \left\| \hat\Pi_n - \Pi_1 \right\|_E = \left\| (\hat\Pi_n - \Pi_1)B_nB_n^{-1} \right\|_E \le Cn^{-\frac{1}{2}}\left\| (\hat\Pi_n - \Pi_1)B_n \right\|_E. \quad (4.64) $$
From the inequalities in (4.63) and (4.64), we obtain
$$ (\hat\Pi_n - \Pi_1)B_n = O_p(1 + n^{\frac{1}{2}}a_n) \quad \text{and hence} \quad \hat\Pi_n - \Pi_1 = O_p(n^{-\frac{1}{2}} + a_n), \quad (4.65) $$
which finishes the proof.
Corollary 4.1 implies that the nonzero eigenvalues of $\Pi_1$ are estimated as nonzero w.p.a.1. However, the matrix $\Pi_1$ may have more zero eigenvalues than $\Pi_o$. To ensure consistent cointegration rank selection, we need to show that $\Pi_1$ has $m - r_o$ zero eigenvalues which are estimated as zero w.p.a.1 and $r_o - r_1$ zero eigenvalues which are estimated as nonzero w.p.a.1. From Lemma 4.3, we see that $\hat\Pi_{ols}$ has $m - r_o$ eigenvalues which converge to zero at the rate $n$ and $r_o - r_1$ eigenvalues which converge to zero at the rate $\sqrt{n}$. The different convergence rates of the estimates of the zero eigenvalues of $\Pi_1$ enable us to empirically distinguish the estimates of the $m - r_o$ zero eigenvalues of $\Pi_1$ from the estimates of the $r_o - r_1$ zero eigenvalues of $\Pi_1$. For example, we can define thresholding estimators of the eigenvalues of $\Pi_1$ in the following way:
$$ \hat\lambda_{th,k} = \begin{cases} 0 & \text{if } |\hat\lambda_k| \le n^{-\frac{5}{8}}, \\ \hat\lambda_k & \text{otherwise}, \end{cases} $$
where $\hat\lambda_k = \lambda_k(\hat\Pi_{ols})$. Let $\tilde S_{ols,\lambda}^c$ index the first $m - r_o$ ordered eigenvalues of $\hat\Pi_{ols}$ (those converging to zero at rate $n$) and let $\tilde S_{ols,\lambda}$ denote its complement. As $n^{\frac{5}{8}}|\hat\lambda_k| = n^{\frac{1}{8}}|n^{\frac{1}{2}}\hat\lambda_k| \to_p \infty$ for all $k \in \tilde S_{ols,\lambda}$, it follows that $\hat\lambda_{th,k} = \hat\lambda_k$ for all $k \in \tilde S_{ols,\lambda}$ w.p.a.1. On the other hand, $n^{\frac{5}{8}}|\hat\lambda_k| = n^{-\frac{3}{8}}|n\hat\lambda_k| \to_p 0$ for all $k \in \tilde S_{ols,\lambda}^c$, and hence $\hat\lambda_{th,k} = 0$ for all $k \in \tilde S_{ols,\lambda}^c$ w.p.a.1. From the arguments above, we see that consistent cointegration rank selection is achieved. We next show that the LS shrinkage estimator $\hat\Pi_n$ has only $m - r_o$ zero eigenvalues w.p.a.1.

Assumption 4.2 The penalty function $\hat P_{\lambda_n}(\cdot)$ satisfies
$$ n^{\frac{1}{2}+v}\hat P_{\lambda_n}'\left[ |\lambda_k(\hat\Pi_n)| \right] = o_p(1) $$
for all $k \in S_\lambda$ and some positive constant $v$.
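The thresholding rule defined above, which zeroes eigenvalue estimates below $n^{-5/8}$, can be sketched as follows. The magnitudes used are stylized stand-ins for the three convergence rates, not actual estimates:

```python
# Sketch of the eigenvalue-thresholding rule: estimates with |lambda| below
# n^{-5/8} are set to zero.  O_p(1/n) estimates of the m - ro genuine zero
# eigenvalues fall below the cutoff, while O_p(n^{-1/2}) "extra zeros" of
# Pi_1 (and of course the O(1) eigenvalues) survive it.
import numpy as np

def threshold_eigs(lams, n):
    lams = np.asarray(lams, dtype=float)
    return np.where(np.abs(lams) <= n ** (-5 / 8), 0.0, lams)

n = 10_000  # cutoff n^{-5/8} is about 0.0032 here
# stylized magnitudes: one O(1) eigenvalue, one O(n^{-1/2}) term, one O(1/n) term
lams = [0.8, 3.0 / np.sqrt(n), 2.0 / n]
print(threshold_eigs(lams, n))
```

Any exponent strictly between $1/2$ and $1$ would separate the two rates in the same way; $5/8$ is simply the choice made in the text.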
It is easy to see that if $n^{\frac{1+\omega}{2}}\lambda_n = o(1)$, then by Lemma 4.3 the adaptive Lasso penalty satisfies Assumption 4.2. For any $m \times m$ matrix $\Pi$, we have by definition
$$ \Pi v_k = \lambda_kv_k \quad \text{and} \quad \Pi'u_k = \lambda_ku_k, $$
where $\lambda_k$ is the $k$-th largest eigenvalue of $\Pi$, and $v_k$ and $u_k$ are the related normalized right and left eigenvectors. Applying the chain rule, we get
$$ d\Pi\,v_k + \Pi\,dv_k = v_k\,d\lambda_k + \lambda_k\,dv_k, $$
so that
$$ u_k'\,d\Pi\,v_k = u_k'v_k\,d\lambda_k, \quad (4.66) $$
where we use the fact that $u_k'\Pi = \lambda_ku_k'$. It is easy to see that $\tilde\beta_{1\perp}$ and $\tilde\alpha_{1\perp}$ contain the right and left eigenvectors of the zero eigenvalues of $\Pi_1$, respectively. When $r_1 = m$, the limit results $\lambda_k(\hat\Pi_n) \to_p \lambda_k(\Pi_1)$, $k = 1, \ldots, m$, imply consistent cointegration rank selection immediately. When $r_1 < m$, both $\tilde\beta_{1\perp}$ and $\tilde\alpha_{1\perp}$ are non-zero. From the differentials in (4.66), for a zero eigenvalue of $\Pi_1$ we can take $u_k$ and $v_k$ as columns of $\tilde\alpha_{1\perp}$ and $\tilde\beta_{1\perp}$ and deduce that
$$ \left. \frac{d\lambda_k}{d\pi_{jl}} \right|_{\Pi = \Pi_1} = \frac{(u_k)_j(v_k)_l}{u_k'v_k}, \quad (4.67) $$
which is nonzero whenever the corresponding elements of $\tilde\alpha_{1\perp}$ and $\tilde\beta_{1\perp}$ are nonzero. The result in (4.67) is important for establishing the following result.
Corollary 4.3 Under Assumption LP, Assumption RR, Assumption 3.3 and Assumption 4.2, we have
$$ \Pr\left( \lambda_k(\hat\Pi_n) \ne 0 \right) \to 1 \quad (4.68) $$
for all $k \in S_\lambda$ as $n \to \infty$, and
$$ \Pr\left( \lambda_k(\hat\Pi_n) = 0 \right) \to 1 \quad (4.69) $$
for all $k \in S_\lambda^c$ as $n \to \infty$.

Proof. On the event $\{\lambda_k(\hat\Pi_n) = 0\}$ for some $k \in S_\lambda$, we have the following Karush-Kuhn-Tucker (KKT) optimality condition:
$$ \left| n^{-\frac{1}{2}+v}\mathrm{vec}(\partial\hat\Pi_n)'\left( Y_{-1} \otimes I_m \right)\left[ \Delta y - \left( Y_{-1}' \otimes I_m \right)\mathrm{vec}(\hat\Pi_n) \right] \right| < \frac{n^{\frac{1}{2}+v}\hat P_{\lambda_n}'\left[ |\lambda_k(\hat\Pi_n)| \right]}{2}, \quad (4.70) $$
where $\partial\hat\Pi_n$ denotes the matrix of derivatives of $\lambda_k(\cdot)$ at $\hat\Pi_n$, whose nonzero elements are the eigenvector products in (4.67) evaluated at the estimates. By Corollary 4.1 and continuous mapping, we have
$$ \partial\hat\Pi_n \to_p \partial\Pi_1, \quad (4.71) $$
where $\partial\Pi_1$ is the corresponding derivative matrix evaluated at $\Pi_1$.
Denote
$$ \hat J = \sum_{t=1}^n \frac{Y_{t-1}u_t'}{n} - \sum_{t=1}^n \frac{Y_{t-1}Y_{t-1}'}{n}(\hat\Pi_n - \Pi_o)' = Q^{-1}\left[ \sum_{t=1}^n \frac{Z_{t-1}w_t'}{n} - \sum_{t=1}^n \frac{Z_{t-1}Z_{t-1}'}{n}Q'^{-1}(\hat\Pi_n - \Pi_o)'Q' \right]Q'^{-1}, $$
where
$$ \sum_{t=1}^n \frac{Z_{t-1}Z_{t-1}'}{n}Q'^{-1}(\hat\Pi_n - \Pi_o)'Q' = \begin{pmatrix} \hat S_{11} & \hat S_{12} \\ \hat S_{21} & \hat S_{22} \end{pmatrix}\begin{pmatrix} \Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_1}(1) + o_p(1) & \Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_2}(1) + o_p(1) \\ (\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\hat\Pi_n'\beta_o & (\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\hat\Pi_n'\alpha_{o\perp} \end{pmatrix}. $$
Hence
$$ \hat J_{11} = \sum_{t=1}^n \frac{Z_{1,t-1}w_{1t}'}{n} - \left[ \hat S_{11}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_1}(1) + \hat S_{12}(\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\hat\Pi_n'\beta_o \right] + o_p(1) = \sum_{t=1}^n \frac{Z_{1,t-1}w_{1t}'}{n} - \hat S_{11}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_1}(1) + o_p(1) $$
$$ = \hat S_{11}\left( \hat S_{11}^{-1} - \Sigma_{z_1z_1}^{-1} \right)\sum_{t=1}^n \frac{Z_{1,t-1}w_{1t}'}{n} + \hat S_{11}\Sigma_{z_1z_1}^{-1}\left[ \sum_{t=1}^n \frac{Z_{1,t-1}w_{1t}'}{n} - \Lambda_{z_1w_1}(1) \right] + o_p(1), $$
which implies that
$$ n^{\frac{1}{2}+v}\hat J_{11} \to_p \infty \quad (4.72) $$
for any $v > 0$. Next note that
$$ \hat J_{12} = \sum_{t=1}^n \frac{Z_{1,t-1}w_{2t}'}{n} - \left[ \hat S_{11}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_2}(1) + \hat S_{12}(\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\hat\Pi_n'\alpha_{o\perp} \right] + o_p(1) = \sum_{t=1}^n \frac{Z_{1,t-1}w_{2t}'}{n} - \hat S_{11}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_2}(1) + o_p(1) $$
$$ = \hat S_{11}\left( \hat S_{11}^{-1} - \Sigma_{z_1z_1}^{-1} \right)\sum_{t=1}^n \frac{Z_{1,t-1}w_{2t}'}{n} + \hat S_{11}\Sigma_{z_1z_1}^{-1}\left[ \sum_{t=1}^n \frac{Z_{1,t-1}w_{2t}'}{n} - \Lambda_{z_1w_2}(1) \right] + o_p(1), $$
which implies that
$$ n^{\frac{1}{2}+v}\hat J_{12} \to_p \infty. \quad (4.73) $$
For the third term,
$$ \hat J_{21} = \sum_{t=1}^n \frac{Z_{2,t-1}w_{1t}'}{n} - \left[ \hat S_{21}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_1}(1) + \hat S_{22}(\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\hat\Pi_n'\beta_o \right] + o_p(1) $$
$$ = \sum_{t=1}^n \frac{Z_{2,t-1}w_{1t}'}{n} - \hat S_{22}(\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\left( \hat\Pi_n - \Pi_{1,n} \right)'\beta_o - \left[ \hat S_{21}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_1}(1) + \hat S_{22}(\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\Pi_{1,n}'\beta_o \right] + o_p(1) $$
$$ = \left( \int B_{w_2}dB_{w_1}' \right) - \left( \int B_{w_2}B_{w_2}' \right)(\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\left( \hat\Pi_n - \Pi_{1,n} \right)'\beta_o + o_p(1). $$
If $\hat\Pi_n$ were the OLS estimator, then it is easy to see that
$$ \left( \int B_{w_2}B_{w_2}' \right)(\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\left( \hat\Pi_{ols} - \Pi_{1,n} \right)'\beta_o \to_d \left( \int B_{w_2}dB_{w_1}' \right). $$
However, as shown in Liao and Phillips (2010), when there are rank constraints imposed on $\hat\Pi_n$, the limiting distribution of $\left( \int B_{w_2}B_{w_2}' \right)(\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\left( \hat\Pi_n - \Pi_{1,n} \right)'\beta_o$ will be different from the above. Hence, we can deduce that
$$ n^{\frac{1}{2}+v}\hat J_{21} \to_p \infty. \quad (4.74) $$
For the fourth term,
$$ \hat J_{22} = \sum_{t=1}^n \frac{Z_{2,t-1}w_{2t}'}{n} - \left[ \hat S_{21}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_2}(1) + \hat S_{22}(\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\hat\Pi_n'\alpha_{o\perp} \right] + o_p(1) $$
$$ = \sum_{t=1}^n \frac{Z_{2,t-1}w_{2t}'}{n} - \hat S_{22}(\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\left( \hat\Pi_n - \Pi_{1,n} \right)'\alpha_{o\perp} - \left[ \hat S_{21}\Sigma_{z_1z_1}^{-1}\Lambda_{z_1w_2}(1) + \hat S_{22}(\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\Pi_{1,n}'\alpha_{o\perp} \right] + o_p(1) $$
$$ = \left( \int B_{w_2}B_{w_2}' \right) - \left( \int B_{w_2}B_{w_2}' \right)(\beta_{o\perp}'\alpha_{o\perp})^{-1}\beta_{o\perp}'\left( \hat\Pi_n - \Pi_{1,n} \right)'\alpha_{o\perp} + o_p(1). $$
Using similar arguments to those used in deriving (4.74), we have
$$ n^{\frac{1}{2}+v}\hat J_{22} \to_p \infty. \quad (4.75) $$
From the results in (4.71) and (4.72)-(4.75), we can deduce that
$$ \left| n^{-\frac{1}{2}+v}\mathrm{vec}(\partial\hat\Pi_n)'\left( Y_{-1} \otimes I_m \right)\left[ \Delta y - \left( Y_{-1}' \otimes I_m \right)\mathrm{vec}(\hat\Pi_n) \right] \right| \to_p \infty. \quad (4.76) $$
However, from Assumption 4.2, we have
$$ \frac{n^{\frac{1}{2}+v}\hat P_{\lambda_n}'\left[ |\lambda_k(\hat\Pi_n)| \right]}{2} = o_p(1). \quad (4.77) $$
The results in (4.70), (4.76) and (4.77) imply that $\Pr\left( \lambda_k(\hat\Pi_n) = 0 \right) \to 0$ for any $k \in S_\lambda$, giving (4.68). The proof of (4.69) is similar to that of Theorem 3.3 and is therefore omitted.
Corollary 4.3 implies that $\hat\Pi_n$ has $m - r_o$ eigenvalues estimated as zero w.p.a.1, while the other eigenvalues are estimated as nonzero w.p.a.1. Hence Corollary 4.3 immediately gives us the following result.

Theorem 4.4 (Sparsity-II) Under the conditions of Corollary 4.3, we have
$$ \Pr\left( r(\hat\Pi_n) = r_o \right) \to 1 \quad (4.78) $$
as $n \to \infty$, where $r(\hat\Pi_n)$ denotes the rank of $\hat\Pi_n$.

5 Extension II: VECM Estimation with Explicit Transient Dynamics
In this section, we are interested in estimation of the VECM
$$ \Delta Y_t = \Pi_oY_{t-1} + \sum_{j=1}^p B_{o,j}\Delta Y_{t-j} + u_t \quad (5.1) $$
with simultaneous cointegration rank selection and lag order selection. Denoting $B_o = (B_{o,1}, \ldots, B_{o,p})$, the unknown parameter matrices $(\Pi_o, B_o)$ are estimated via the following penalized LS estimation:
$$ (\hat\Pi_n, \hat B_n) = \arg\min_{\Pi \in \mathbb{R}^{m \times m},\,B_j \in \mathbb{R}^{m \times m}}\sum_{t=1}^n \left\| \Delta Y_t - \Pi Y_{t-1} - \sum_{j=1}^p B_j\Delta Y_{t-j} \right\|_E^2 + n\sum_{j=1}^p \hat P_{\lambda_n}(B_j) + n\sum_{k=1}^m \hat P_{\lambda_n}\left[ |\lambda_k(\Pi)| \right]. \quad (5.2) $$
In (5.2), $\hat P_{\lambda_n}(\cdot)$ denotes the adaptive group Lasso penalty function, i.e.
$$ \hat P_{\lambda_n}(A) = \lambda_n\|A\|_E\Big/\|\hat A_n\|_E^\omega $$
for any $m \times m$ matrix $A$, where $\omega > 0$ and $\hat A_n$ is some first-step estimator of $A$.

The adaptive group Lasso applies the penalty to the shrinkage estimator groupwise. For example, if $\|A\|_E \ne 0$ for some $m \times m$ matrix $A$ and $\hat A_n$ is a first-step consistent estimator of $A$, then under the condition $\lambda_n \to 0$ the penalty on the estimate of each component of $A$ goes to zero asymptotically, because
$$ \|\hat A_n\|_E^\omega \to_p \|A\|_E^\omega \ne 0, $$
even though the matrix $A$ may have some zero elements. On the other hand, if $\|A\|_E = 0$, then the penalty on the estimate of each component of $A$ is nontrivial, and it is possible to consistently recover the zero matrix $A$ in the LS shrinkage estimation. Also note that when $A$ is a scalar, the adaptive group Lasso penalty function reduces to the adaptive Lasso penalty function.
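A minimal sketch of the adaptive group Lasso penalty just described, with toy values of our own choosing; `A_first` plays the role of the first-step estimator $\hat A_n$:

```python
# Adaptive group Lasso penalty for one coefficient matrix:
# P(A) = lam * ||A||_E / ||A_first||_E^w, applied to each lag matrix B_j as a
# single group.  A small first-step estimate makes the penalty explode, which
# is what pushes estimates of zero lag matrices exactly to zero.
import numpy as np

def group_penalty(A, A_first, lam, w):
    """Penalty contribution of candidate A given first-step estimate A_first."""
    return lam * np.linalg.norm(A) / np.linalg.norm(A_first) ** w

lam, w = 0.1, 2.0
B_nonzero_first = np.array([[0.5, 0.0], [0.1, 0.4]])  # first step near a nonzero B_j
B_zero_first = 1e-3 * np.ones((2, 2))                 # first step near a zero B_j
cand = 0.1 * np.eye(2)                                # same candidate value for both
# the "zero" group is penalized far more heavily than the "nonzero" group
print(group_penalty(cand, B_zero_first, lam, w) >
      group_penalty(cand, B_nonzero_first, lam, w))
```

Because the whole matrix enters through $\|A\|_E$, the penalty either keeps or kills the lag matrix as a block rather than element by element, which is exactly the groupwise behavior described above.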
To consistently select the true order of the lagged differences, $B_o$ should at least be consistently estimable. Thus we assume that the error term $u_t$ satisfies Assumption 3.1. Assumption 3.2 needs revision to accommodate the general structure in (5.1). Denote
$$ C(\lambda) = \Pi_o\lambda + \sum_{j=0}^p B_{o,j}(1 - \lambda)\lambda^j, \quad \text{where } B_{o,0} = -I_m. $$

Assumption 5.1 (RR) (i) The determinantal equation $|C(\lambda)| = 0$ has roots on or outside the unit circle; (ii) the matrix $\Pi_o$ has rank $r_o$, with $0 \le r_o \le m$; (iii) the following $(m - r_o) \times (m - r_o)$ matrix
$$ \alpha_{o\perp}'\left( I_m - \sum_{j=1}^p B_{o,j} \right)\beta_{o\perp} \quad (5.3) $$
is nonsingular.
Under Assumption 5.1, the time series $Y_t$ has the following partial sum representation:
$$ Y_t = C_B\sum_{s=1}^t u_s + \Xi(L)u_t + C_BY_0, \quad (5.4) $$
where $C_B = \beta_{o\perp}\left[ \alpha_{o\perp}'\left( I_m - \sum_{j=1}^p B_{o,j} \right)\beta_{o\perp} \right]^{-1}\alpha_{o\perp}'$ and $\Xi(L)u_t = \sum_{s=0}^\infty \Xi_su_{t-s}$ is a stationary process. From the partial sum representation in (5.4), we deduce that $\beta_o'Y_t = \beta_o'\Xi(L)u_t$ and $\Delta Y_{t-j}$ ($j = 0, \ldots, p$) are stationary.

Define
$$ Q_B := \begin{pmatrix} \beta_o' & 0 \\ 0 & I_{mp} \\ \alpha_{o\perp}' & 0 \end{pmatrix}_{m(p+1) \times m(p+1)} $$
and denote $\Delta X_{t-1} = \left( \Delta Y_{t-1}', \ldots, \Delta Y_{t-p}' \right)'$; then the model in (5.1) can be rewritten as
$$ \Delta Y_t = \left[ \Pi_o \;\; B_o \right]\begin{bmatrix} Y_{t-1} \\ \Delta X_{t-1} \end{bmatrix} + u_t. \quad (5.5) $$
Denote
$$ Z_{t-1} = Q_B\begin{bmatrix} Y_{t-1} \\ \Delta X_{t-1} \end{bmatrix} = \begin{bmatrix} Z_{1,t-1} \\ Z_{2,t-1} \end{bmatrix}, \quad (5.6) $$
where $Z_{1,t-1} = \begin{bmatrix} \beta_o'Y_{t-1} \\ \Delta X_{t-1} \end{bmatrix}$ is a stationary process and $Z_{2,t-1} = \alpha_{o\perp}'Y_{t-1}$ consists of the I(1) components.

Lemma 5.1 Under Assumptions 3.1 and 5.1, the results of Lemma 10.1 hold for $Z_{1,t}$ and $Z_{2,t}$ defined in (5.6).
We first establish the asymptotic properties of the OLS estimator $(\hat\Pi_{ols}, \hat B_{ols})$ of $(\Pi_o, B_o)$ and of the eigenvalues of $\hat\Pi_{ols}$. The estimator $(\hat\Pi_{ols}, \hat B_{ols})$ has the following closed-form solution:
$$ (\hat\Pi_{ols}, \hat B_{ols}) = \left( \hat S_{y_0y_1} \;\; \hat S_{y_0x_0} \right)\begin{pmatrix} \hat S_{y_1y_1} & \hat S_{y_1x_0} \\ \hat S_{x_0y_1} & \hat S_{x_0x_0} \end{pmatrix}^{-1}, \quad (5.7) $$
where
$$ \hat S_{y_0y_1} = \frac{1}{n}\sum_{t=1}^n \Delta Y_tY_{t-1}', \quad \hat S_{y_0x_0} = \frac{1}{n}\sum_{t=1}^n \Delta Y_t\Delta X_{t-1}', \quad \hat S_{y_1y_1} = \frac{1}{n}\sum_{t=1}^n Y_{t-1}Y_{t-1}', $$
$$ \hat S_{y_1x_0} = \frac{1}{n}\sum_{t=1}^n Y_{t-1}\Delta X_{t-1}', \quad \hat S_{x_0y_1} = \hat S_{y_1x_0}' \quad \text{and} \quad \hat S_{x_0x_0} = \frac{1}{n}\sum_{t=1}^n \Delta X_{t-1}\Delta X_{t-1}'. \quad (5.8) $$
Denote Y� = (Y0; :::; Yn�1)m�n, �Y = (�Y1; :::;�Yn)m�n and cM0 = Im ��X 0 bS�1x0x0�X,where �X = (�X0; :::;�Xn�1)mp�n, then b�ols has the explicit representation
b�ols =��Y cM0Y
0�
��Y�cM0Y
0�
��1= �o +
�UcM0Y
0�
��Y�cM0Y
0�
��1; (5.9)
41
where U = (u1; :::; un)m�n. Recall that [�1(b�ols); :::; �m(b�ols)] and [�1(�o); :::; �m(�o)] arethe ordered eigenvalues of b�ols and �o respectively, where �j(�o) = 0 (j = 1; :::;m� ro).Lemma 5.2 Suppose that Assumption 3.1 and 5.1 hold.
(a) De�ne Dn;B = diag(n12 Iro+mp; nIm�ro), then
h(b�ols; bBols)� (�o; Bo)iQ�1B Dn;B has
the following limit distribution�N(0;u �z1z1)��1z1z1 ;
�RBw2dB
0u
�0 �RBw2B
0w2
��1 �; (5.10)
(b) the eigenvalues of b�ols satisfy Lemma 3.1.(b);(c) the �rst m� ro ordered eigenvalues of b�ols satisfy Lemma 3.1.(c).
Proof. To prove (a), start by de�ning bSuy1 = 1n
nPt=1utY
0t�1 and bSux0 = 1
n
nPt=1ut�X
0t�1.
From the expression in (5.8), we geth(b�ols; bBols)� (�o; Bo)iQ�1B Dn;B
=� bSuy1 bSux0 �Q0BD�1n;B
"D�1n;BQB
bSy1y1 bSy1x0bSx0y1 bSx0x0!Q0BD
�1n;B
#�1: (5.11)
Note that
� bSuy1 bSux0 �Q0BD�1n;B = U"QB
Y�
�X
!#0D�1n;B =
�n�
12UZ 01 n�1UZ 02
�(5.12)
and
D�1n;BQB
bSy1y1 bSy1x0bSx0y1 bSx0x0!Q0BD
�1n;B =
0BB@ n�1nPt=1Z1;tZ
01;t n�
32
nPt=1Z1;tZ
02;t
n�32
nPt=1Z2;tZ
01;t n�2
nPt=1Z2;tZ
02;t
1CCA ; (5.13)
where Z1 = (Z1;0; :::; Z1;n�1) and Z2 = (Z2;0; :::; Z2;n�1). Now the result in (5.10) follows
by applying the Lemma 5.1.
The proofs of (b) and (c) are similar to those of Lemma 3.1.(b)-(c) and are therefore
omitted.
Denote the index set of the zero components in Bo as ScB such that kBo;jk = 0 for all
42
j 2 ScB and kBo;jk 6= 0 otherwise. We next derive the asymptotic properties of the LS
shrinkage estimator de�ned in (5.2).
Lemma 5.3 If Assumption 3.1 and 5.1 are satis�ed and �n = o(1), then the LS shrinkageestimator (b�n; bBn) satis�esh
(b�n; bBn)� (�o; Bo)iQ�1B Dn;B = Op(1 + n 12�n): (5.14)
Proof. Let � = (�; B) and
V3;n(�) =
nXt=1
�Yt ��Yt�1 �pXj=1
Bj�Yt�j
2
E
+n
pXk=1
bP�n(Bj) + n mXk=1
bP�n ���k(�)� ;= V4;n(�) + n
pXk=1
bP�n(Bj) + n mXk=1
bP�n ���k(�)� :Set b�n = (b�n; bBn), and then by de�nition V3;n(b�n) � V3;n(�o); so thatn
vechn�
12 (b�n ��o)Q�1B Dn;Bio0Wn
nvec
hn�
12 (b�n ��o)Q�1B Dn;Bio
: +2nvec
hn�
12 (b�n ��o)Q�1B Dn;Bio0 dn � (e1;n + e2;n) (5.15)
whereWn = D�1n;B
Pnt=1 Zt�1Z
0t�1D
�1n;BIm(p+1), dn = vec
�n�
12D�1n;B
Pnt=1 Zt�1u
0t
�, e1;n =P
k2ScB
h bP�n(Bo;j)� bP�n( bBn;j)i and e2;n =Pk2Sc�
n bP�n ���k(�o)�� bP�n [��k(b�n)]o. Apply-ing the Cauchy-Schwarz inequality to (5.15), we deduce that
�n;min
n� 12 (b�n ��o)Q�1B Dn;B 2
E� n� 1
2 (b�n ��o)Q�1B Dn;B Edn � (e1;n + e2;n) � 0;
(5.16)
where �n;min denotes the smallest eigenvalue of Wn which is bounded away from zero
w.p.a.1. By the de�nition of the adaptive group Lasso penalty function, the consistency ofb�ols and the Slutsky Theorem, we �nd thate1;n �
Xk2ScB
bP�n(Bo;j) = op(1) and e2;n � Xk2Sc�
bP�n ���k(�o)� = op(1): (5.17)
43
If ro = m, then Dn;B = n12 Im(p+1) and hence n
� 12Dn;B = Im(p+1) and n
� 12D�1n;B =
n�1Im(p+1). As Zt�1u0t is a stationary martingale di¤erence, we have by the LLN,
dn = vec
n�1
nXt=1
Zt�1u0t
!= op(1): (5.18)
The consistency of b�n is implied by the inequality in (5.16) and the results in (5.17)-(5.18).If ro 2 [0;m), then by Lemma 5.1, we can deduce that
dn = vec
n�
12D�1n;B
nXt=1
Zt�1u0t
!= op(1): (5.19)
The consistency of b�n is implied by the inequality in (5.16) and the results in (5.17) and(5.19).
We next derive the convergence rate of the LS shrinkage estimator $\hat\Phi_n$. Using a similar argument to the proof of Theorem 3.2, we get

$$\sum_{k\in S_\Pi}\big\{\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big]-\hat P_{\lambda_n}\big[\lambda_k(\hat\Pi_n)\big]\big\} \le n^{-\frac12}C\max_{k\in S_\Pi}\big\{\hat P'_{\lambda_n}\big[\lambda_k(\Pi_o)\big]\big\}\big\|(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big\|_E, \quad (5.20)$$

where $C$ is some generic positive constant. Similarly,

$$\sum_{j\in S_B}\big[\hat P_{\lambda_n}(B_{o,j})-\hat P_{\lambda_n}(\hat B_{n,j})\big] \le n^{-\frac12}C\max_{j\in S_B}\big\{\hat P'_{\lambda_n}(B_{o,j})\big\}\big\|(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big\|_E. \quad (5.21)$$

Combining the results in (5.20)-(5.21), we get

$$n(e_{1,n}+e_{2,n}) \le n^{\frac12}a_n\big\|(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big\|_E, \quad (5.22)$$

where $a_n = \max_{k\in S_\Pi}\big\{\hat P'_{\lambda_n}\big[\lambda_k(\Pi_o)\big]\big\}\vee\max_{j\in S_B}\big\{\hat P'_{\lambda_n}(B_{o,j})\big\} = O_p(\lambda_n)$. From Lemma 5.1, we have

$$n^{\frac12}d_n = \mathrm{vec}\Big(D_{n,B}^{-1}\sum_{t=1}^{n}Z_{t-1}u'_t\Big) = O_p(1). \quad (5.23)$$

Using the inequality in (5.22), we can rewrite the inequality in (5.16) as

$$\mu_{n,\min}\big\|(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big\|_E^2 - \big\|(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big\|_E\big(\big\|n^{\frac12}d_n\big\|_E+n^{\frac12}a_n\big) \le 0, \quad (5.24)$$
which combined with the result in (5.23) implies the rate claimed in (5.14).
The results in Lemma 5.3 imply that the LS shrinkage estimator $(\hat\Pi_n,\hat B_n)$ may have the same convergence rate as the OLS estimator $(\hat\Pi_{ols},\hat B_{ols})$. We next show that if the tuning parameter $\lambda_n$ converges to zero not too fast, then the zero eigenvalues of $\Pi_o$ and the zero matrices in $B_o$ are estimated as zero w.p.a.1. Let the smallest $m-r_o$ eigenvalues of $\hat\Pi_n$ be indexed by $S^c_{n,\Pi}$ and the zero matrices in $\hat B_n$ by $S^c_{n,B}$.

Theorem 5.1 Suppose that Assumptions 3.1 and 5.1 hold. If the tuning parameter $\lambda_n$ and $\omega$ satisfy the conditions $\omega>\frac12$, $n^{\frac12}\lambda_n = o(1)$, $\lambda_nn^{\omega}\to\infty$ and $n^{\frac{1+\omega}{2}}\lambda_n\to\infty$, then we have

$$\Pr\big[\lambda_k(\hat\Pi_n) = 0\big]\to1 \quad (5.25)$$

for all $k\in S^c_\Pi$, and

$$\Pr\big(\hat B_{n,j} = 0_{m\times m}\big)\to1 \quad (5.26)$$

for all $j\in S^c_B$.
Proof. The first result can be proved using similar arguments to those in the proof of Theorem 3.3 and hence is omitted. We next show the second result. Note that

$$\sum_{t=1}^{n}\Big\|\Delta Y_t-\Pi Y_{t-1}-\sum_{j=1}^{p}B_j\Delta Y_{t-j}\Big\|_E^2-\sum_{t=1}^{n}\|u_t\|_E^2 = \mathrm{vec}(\Phi-\Phi_o)'\Big(Q_B^{-1}\sum_{t=1}^{n}Z_{t-1}Z'_{t-1}Q_B^{-1\prime}\otimes I_m\Big)\mathrm{vec}(\Phi-\Phi_o) - 2\,\mathrm{vec}(\Phi-\Phi_o)'\,\mathrm{vec}\Big(Q_B^{-1}\sum_{t=1}^{n}Z_{t-1}u'_t\Big). \quad (5.27)$$

On the event $\{\hat B_{n,j}\neq0_{m\times m}\}$ for some $j\in S^c_B$, we have the following KKT optimality condition,

$$\frac{n\lambda_n}{2\|\hat B_{ols,j}\|_E^{\omega}\,\|\hat B_{n,j}\|_E}\,\mathrm{vec}(\hat B_{n,j}) = \big(V\hat W_B\otimes I_m\big)\mathrm{vec}(\hat\Phi_n-\Phi_o) - \mathrm{vec}\Big(VQ_B^{-1}\sum_{t=1}^{n}Z_{t-1}u'_t\Big), \quad (5.28)$$

where $V = \mathrm{diag}(V_1,\ldots,V_{p+1})$ with $V_{j+1} = I_m$ and $V_k = 0_m$ for $k\neq j+1$, and $\hat W_B = Q_B^{-1}\sum_{t=1}^{n}Z_{t-1}Z'_{t-1}Q_B^{-1\prime}$. By the definition of $V$, we have $V = V^*\otimes I_m$, where $V^*$ is a $(p+1)\times(p+1)$ diagonal matrix whose $(j+1)$-th diagonal element is 1 and zero otherwise. Hence, the results in (5.28) imply that

$$\frac{n^{\frac{1+\omega}{2}}\lambda_n}{2\big\|n^{\frac12}\hat B_{ols,j}\big\|_E^{\omega}} = \bigg\|\Big(\frac{V\hat W_B}{n^{\frac12}}\otimes I_m\Big)\mathrm{vec}(\hat\Phi_n-\Phi_o) - \mathrm{vec}\Big(n^{-\frac12}VQ_B^{-1}\sum_{t=1}^{n}Z_{t-1}u'_t\Big)\bigg\|_E. \quad (5.29)$$

By Lemma 5.1 and Lemma 5.3, we deduce that

$$\Big(\frac{V\hat W_B}{n^{\frac12}}\otimes I_m\Big)\mathrm{vec}(\hat\Phi_n-\Phi_o) = \Big(\frac{V\hat W_BQ'_BD_{n,B}^{-1}}{n^{\frac12}}\otimes I_m\Big)\mathrm{vec}\big[(\hat\Phi_n-\Phi_o)Q_B^{-1}D_{n,B}\big] = O_p(1), \quad (5.30)$$

and

$$\mathrm{vec}\Big(n^{-\frac12}VQ_B^{-1}\sum_{t=1}^{n}Z_{t-1}u'_t\Big) = O_p(1). \quad (5.31)$$

However, as $n^{\frac{1+\omega}{2}}\lambda_n\to\infty$, it follows by Lemma 5.2 and the Slutsky Theorem that

$$\frac{n^{\frac{1+\omega}{2}}\lambda_n}{2\big\|n^{\frac12}\hat B_{ols,j}\big\|_E^{\omega}}\to_p\infty. \quad (5.32)$$

Now, using the results in (5.29)-(5.31), we can deduce that the event $\{\hat B_{n,j}\neq0_{m\times m}\}$ for any $j\in S^c_B$ occurs with probability approaching zero as $n\to\infty$, which finishes the proof.
Theorem 5.1 indicates that the zero eigenvalues of $\Pi_o$ and the zero matrices in $B_o$ are estimated as zeros w.p.a.1. Hence Lemma 5.3 and Theorem 5.1 imply consistent cointegration rank selection and consistent lag order selection.
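The zero-selection mechanism behind Theorem 5.1 can be illustrated numerically. The sketch below is purely illustrative (the scaling choices are hypothetical, not the paper's penalty function): the adaptive weight $\lambda_n/|\hat\theta_{ols}|^{\omega}$ diverges on a true zero, whose OLS estimate is $O_p(n^{-1/2})$, and vanishes on a true nonzero, which is what drives exact zeros w.p.a.1. With $\omega=2$ the choice $\lambda_n = n^{-0.6}$ satisfies the theorem's rate conditions $n^{1/2}\lambda_n\to0$ and $n^{(1+\omega)/2}\lambda_n\to\infty$.

```python
import numpy as np

# Illustrative sketch only: behaviour of the adaptive weight
# lam_n / |theta_ols|**omega on a true zero versus a true nonzero.
omega = 2.0
weights_zero, weights_nonzero = [], []
for n in [10**2, 10**4, 10**6]:
    lam_n = n ** (-0.6)                # hypothetical tuning sequence
    theta_zero = 1.0 / np.sqrt(n)      # stand-in for an OLS estimate of a true zero
    theta_nonzero = 1.0                # stand-in for an OLS estimate of a true nonzero
    weights_zero.append(lam_n / abs(theta_zero) ** omega)        # grows like n^{0.4}
    weights_nonzero.append(lam_n / abs(theta_nonzero) ** omega)  # shrinks like n^{-0.6}

assert weights_zero[0] < weights_zero[1] < weights_zero[2]
assert weights_nonzero[0] > weights_nonzero[1] > weights_nonzero[2]
```

As the sample size grows, the penalty on the true zero becomes overwhelming while the penalty on the nonzero component disappears, which is the oracle mechanism in miniature.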
Theorem 5.2 Under the conditions of Theorem 5.1, we have

$$\Pr\big(S_{n,\Pi} = S_\Pi\ \text{and}\ S_{n,B} = S_B\big)\to1, \quad (5.33)$$

as $n\to\infty$.
As

$$Q_B = \begin{pmatrix} I_{r_o} & 0 & 0\\ 0 & 0 & I_{mp}\\ 0 & I_{m-r_o} & 0\end{pmatrix}\mathrm{diag}(Q,\,I_{mp}),$$

it is easy to see that

$$Q_B^{-1} = \begin{pmatrix} \alpha_o(\beta'_o\alpha_o)^{-1} & 0 & \beta_{o\perp}(\alpha'_{o\perp}\beta_{o\perp})^{-1}\\ 0 & I_{mp} & 0\end{pmatrix}.$$

To ensure identification, we normalize $\beta_o$ as $\beta_o = [I_{r_o},\,O_o]'$, where $O_o$ is some $r_o\times(m-r_o)$ matrix such that $\Pi_o = \alpha_o\beta'_o = [\alpha_o,\,\alpha_oO_o]$. From Lemma 5.3, we have

$$\Big[n^{\frac12}(\hat\Pi_n-\Pi_o)\alpha_o(\beta'_o\alpha_o)^{-1},\;\; n^{\frac12}(\hat B_n-B_o),\;\; n(\hat\Pi_n-\Pi_o)\beta_{o\perp}(\alpha'_{o\perp}\beta_{o\perp})^{-1}\Big] = O_p(1),$$

which implies that

$$n\big(\hat O_n-O_o\big) = O_p(1), \quad (5.34)$$

$$n^{\frac12}\big(\hat B_n-B_o\big) = O_p(1), \quad (5.35)$$

$$n^{\frac12}\big(\hat\alpha_n-\alpha_o\big) = O_p(1). \quad (5.36)$$
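The normalization $\beta_o = [I_{r_o}, O_o]'$ is purely a reparametrization and leaves $\Pi_o$ unchanged: post-multiplying a generic cointegrating matrix by the inverse of its top $r_o\times r_o$ block, and rotating $\alpha$ to compensate, recovers the normalized form. A minimal numerical sketch (with hypothetical matrices, not estimates from the paper):

```python
import numpy as np

# Sketch of the identification normalization beta_o = [I_r, O_o]'.
rng = np.random.default_rng(1)
m, r = 4, 2
alpha = rng.normal(size=(m, r))
beta = rng.normal(size=(m, r))       # top r x r block is invertible a.s.
Pi = alpha @ beta.T                  # Pi = alpha * beta'

c = beta[:r, :]                      # top r x r block of beta
beta_n = beta @ np.linalg.inv(c)     # normalized beta: identity block on top
alpha_n = alpha @ c.T                # compensating transformation of alpha

assert np.allclose(beta_n[:r, :], np.eye(r))
assert np.allclose(alpha_n @ beta_n.T, Pi)   # Pi is invariant to the normalization
```

Only the pair $(\alpha,\beta)$ is rotated; the product $\Pi = \alpha\beta'$, and hence the model itself, is untouched, which is why the normalization can be imposed without loss of generality.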
To conclude this section, we derive the centered limit distribution of the shrinkage estimator $\hat\Phi_S = \big(\hat\Pi_n,\hat B_{S_B}\big)$ in the following theorem, where $\hat B_{S_B}$ denotes the LS shrinkage estimator of the nonzero matrices in $B_o$. Denote $I_{S_B} = \mathrm{diag}\big(I_{1,m},\ldots,I_{d_{S_B},m}\big)$, where the $I_{j,m}$ ($j = 1,\ldots,d_{S_B}$) are $m\times m$ identity matrices and $d_{S_B}$ is the dimensionality of the index set $S_B$. Define

$$Q_S := \begin{pmatrix}\beta'_o & 0\\ 0 & I_{S_B}\\ \alpha'_{o\perp} & 0\end{pmatrix} \quad\text{and}\quad D_S = \mathrm{diag}\big(n^{\frac12}I_{r_o},\,n^{\frac12}I_{S_B},\,nI_{m-r_o}\big),$$

where the identity matrix $I_{S_B} = I_{md_{S_B}}$ in $Q_S$ serves to accommodate the nonzero matrices in $B_o$.
Theorem 5.3 Under the conditions of Theorem 5.1, we have

$$\big(\hat\Phi_S-\Phi_{o,S}\big)Q_S^{-1}D_S \to_d \Big[N\big(0,\Omega_u\otimes\Sigma_{z_1z_1}\big)\Sigma_{z_1z_1}^{-1},\;\; \alpha_oU_2^*\big(\alpha'_{o\perp}\beta_{o\perp}\big)^{-1}\Big], \quad (5.37)$$

where

$$U_2^* = \big(\alpha'_o\alpha_o\big)^{-1}\alpha'_o\Big(\int dB_uB'_{w_2}\Big)\Big(\int B_{w_2}B'_{w_2}\Big)^{-1}\big(\alpha'_{o\perp}\beta_{o\perp}\big). \quad (5.38)$$
Proof. From the results of Theorem 5.1, we deduce that $\hat\alpha_n$, $\hat\Pi_n$ and $\hat B_{S_B}$ minimize the following criterion function w.p.a.1,

$$V_{3,n}(\Phi_S) = \sum_{t=1}^{n}\Big\|\Delta Y_t-\alpha\beta'Y_{t-1}-\sum_{j\in S_B}B_j\Delta Y_{t-j}\Big\|_E^2 + n\sum_{k\in S_\Pi}\hat P_{\lambda_n}\big[\lambda_k(\alpha\beta')\big] + n\sum_{j\in S_B}\hat P_{\lambda_n}(B_j).$$

Define $U^*_{1,n} = \sqrt n\,(\hat\alpha_n-\alpha_o)$, $U_{2,n} = \big[0_{r_o\times r_o},\,U^*_{2,n}\big]$ with $U^*_{2,n} = n\big(\hat O_n-O_o\big)$, and $U^*_{3,n} = \sqrt n\,\big(\hat B_{S_B}-B_{o,S_B}\big)$. Then

$$\big[\big(\hat\Pi_n-\Pi_o\big),\,\big(\hat B_{S_B}-B_{o,S_B}\big)\big]Q_S^{-1}D_S = \big[n^{-\frac12}\hat\alpha_nU_{2,n}\alpha_o(\beta'_o\alpha_o)^{-1}+U^*_{1,n},\;\;U^*_{3,n},\;\;\hat\alpha_nU_{2,n}\beta_{o\perp}(\alpha'_{o\perp}\beta_{o\perp})^{-1}\big].$$

Denote

$$\Phi_n(U) = \big[n^{-\frac12}\hat\alpha_n\big[0_{r_o\times r_o},U_2\big]\alpha_o(\beta'_o\alpha_o)^{-1}+U_1,\;\;U_3,\;\;\hat\alpha_n\big[0_{r_o\times r_o},U_2\big]\beta_{o\perp}(\alpha'_{o\perp}\beta_{o\perp})^{-1}\big];$$

then by definition $U^*_n = \big(U^*_{1,n},U^*_{2,n},U^*_{3,n}\big)$ minimizes the following criterion function

$$V_{4,n}(U) = \sum_{t=1}^{n}\Big(\big\|u_t-\Phi_n(U)D_S^{-1}Z_{S,t-1}\big\|_E^2-\|u_t\|_E^2\Big) + n\sum_{k\in S_\Pi}\Big\{\hat P_{\lambda_n}\big[\lambda_k\big(\Phi_n(U)D_S^{-1}Q_SL_1+\Pi_o\big)\big]-\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big]\Big\} + n\sum_{j\in S_B}\Big\{\hat P_{\lambda_n}\big[\Phi_n(U)D_S^{-1}Q_SL_{j+1}+B_{o,j}\big]-\hat P_{\lambda_n}(B_{o,j})\Big\},$$

where $L_j = \mathrm{diag}(A_1,\ldots,A_{p+1})$ with $A_j = I_m$ and $A_k = 0$ for $k\neq j$, and $Z_{S,t-1} = Q_S\big(Y'_{t-1},\,\Delta X'_{S_B,t-1}\big)'$.

For any compact set $K\subset\mathbb R^{m\times r_o}\times\mathbb R^{r_o\times(m-r_o)}\times\mathbb R^{m\times md_{S_B}}$ and any $U\in K$, there is

$$\Phi_n(U)D_S^{-1}Q_S = O_p\big(n^{-\frac12}\big).$$
Hence, using similar arguments to those in the proof of Theorem 3.6, we can deduce that

$$n\sum_{k\in S_\Pi}\Big\{\hat P_{\lambda_n}\big[\lambda_k\big(\Phi_n(U)D_S^{-1}Q_SL_1+\Pi_o\big)\big]-\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big]\Big\} = o_p(1) \quad (5.39)$$

and

$$n\sum_{j\in S_B}\Big\{\hat P_{\lambda_n}\big[\Phi_n(U)D_S^{-1}Q_SL_{j+1}+B_{o,j}\big]-\hat P_{\lambda_n}(B_{o,j})\Big\} = o_p(1) \quad (5.40)$$

uniformly over $U\in K$. Next, note that

$$\Phi_n(U)\to_p\big[U_1,\;U_3,\;\alpha_oU_2\big(\alpha'_{o\perp}\beta_{o\perp}\big)^{-1}\big] := \Phi_1(U) \quad (5.41)$$

uniformly over $U\in K$. By Lemma 5.1 and (5.41), we can deduce that

$$\sum_{t=1}^{n}\Big(\big\|u_t-\Phi_n(U)D_S^{-1}Z_{S,t-1}\big\|_E^2-\|u_t\|_E^2\Big) \to_d \mathrm{vec}\big[\Phi_1(U)\big]'\left[\begin{pmatrix}\Sigma_{z_1z_1}&0\\0&\int B_{w_2}B'_{w_2}\end{pmatrix}\otimes I_m\right]\mathrm{vec}\big[\Phi_1(U)\big] - 2\,\mathrm{vec}\big[\Phi_1(U)\big]'\,\mathrm{vec}\big[(B_{1,m},B_{2,m})\big] := V_2(U) \quad (5.42)$$

uniformly over $U\in K$, where $B_{1,m} = N\big(0,\Omega_u\otimes\Sigma_{z_1z_1}\big)$ and $B_{2,m} = \big(\int B_{w_2}dB'_u\big)'$. Note that $V_2(U)$ can be rewritten as

$$V_2(U) = \mathrm{vec}(U_1,U_3)'\big(\Sigma_{z_1z_1}\otimes I_m\big)\mathrm{vec}(U_1,U_3) + \mathrm{vec}(U_2)'\Big[\big(\alpha'_{o\perp}\beta_{o\perp}\big)^{-1\prime}\int B_{w_2}B'_{w_2}\,\big(\alpha'_{o\perp}\beta_{o\perp}\big)^{-1}\otimes\alpha'_o\alpha_o\Big]\mathrm{vec}(U_2) - 2\,\mathrm{vec}(U_1,U_3)'\,\mathrm{vec}(B_{1,m}) - 2\,\mathrm{vec}(U_2)'\,\mathrm{vec}\big[\alpha'_oB_{2,m}\big(\alpha'_{o\perp}\beta_{o\perp}\big)^{-1}\big]. \quad (5.43)$$

The expression in (5.43) makes it clear that $V_2(U)$ is uniquely minimized at $\big(U^*_1,U^*_2,U^*_3\big)$, where

$$\big(U^*_1,U^*_3\big) = B_{1,m}\Sigma_{z_1z_1}^{-1} \quad\text{and}\quad U^*_2 = \big(\alpha'_o\alpha_o\big)^{-1}\alpha'_oB_{2,m}\Big(\int B_{w_2}B'_{w_2}\Big)^{-1}\big(\alpha'_{o\perp}\beta_{o\perp}\big). \quad (5.44)$$

From (5.34) and (5.36), we can see that $U^*_n$ is asymptotically tight. Invoking the ACMT, we deduce that $U^*_n\to_dU^*$. The results in (5.37) follow by applying the CMT.
6 Adaptive Selection of the Tuning Parameters
This section develops a data-driven procedure for selecting the tuning parameter $\lambda_n$. Given $\lambda$, we denote the rank of the shrinkage estimator $\hat\Pi_n(\lambda)$ as $\hat r_n(\lambda)$ and let $S_{B,\lambda}$ be the index set of the nonzero matrices in the shrinkage estimator $\hat B_n$. Define

$$IC_n(\lambda) = \min_{\substack{\Pi\in\mathbb R^{m\times m},\,r(\Pi)=\hat r_n(\lambda)\\B_j\in\mathbb R^{m\times m}}}\Big\{\sum_{t=1}^{n}\Big\|\Delta Y_t-\Pi Y_{t-1}-\sum_{j\in S_{B,\lambda}}B_j\Delta Y_{t-j}\Big\|_E^2\Big\} - C_nK\big[\hat r_n(\lambda),\hat p_n(\lambda)\big], \quad (6.1)$$

where $\hat p_n(\lambda)$ denotes the dimensionality of the index set $S_{B,\lambda}$, $K(r,p)$ is a strictly decreasing function in $r$ and $p$, and $C_n$ is a positive sequence. We propose to select the tuning parameter $\lambda$ by minimizing the information criterion $IC_n(\lambda)$ defined in (6.1), i.e.

$$\hat\lambda^*_n = \arg\min_{\lambda\in\Lambda_n}IC_n(\lambda), \quad (6.2)$$

where $\Lambda_n = [0,\lambda_{\max,n}]$ and $\lambda_{\max,n}$ is a positive sequence such that $\lambda_{\max,n} = o(1)$.

Note that the first term in (6.1) measures the fit of the selected model, while the second term gives an extra bonus to selecting a smaller cointegration rank and a smaller number of lagged differences. When an overfitted model is selected, we can show that

$$\min_{\substack{\Pi\in\mathbb R^{m\times m},\,r(\Pi)=\hat r_n(\lambda)\\B_j\in\mathbb R^{m\times m}}}\Big\{\sum_{t=1}^{n}\Big\|\Delta Y_t-\Pi Y_{t-1}-\sum_{j\in S_{B,\lambda}}B_j\Delta Y_{t-j}\Big\|_E^2\Big\} - \min_{\substack{\Pi\in\mathbb R^{m\times m},\,r(\Pi)=r_o\\B_j\in\mathbb R^{m\times m}}}\Big\{\sum_{t=1}^{n}\Big\|\Delta Y_t-\Pi Y_{t-1}-\sum_{j\in S_B}B_j\Delta Y_{t-j}\Big\|_E^2\Big\} = O_p(1), \quad (6.3)$$

while if $C_n\to\infty$, then

$$C_n\big\{K(r_o,p_o)-K\big[\hat r_n(\lambda),\hat p_n(\lambda)\big]\big\}\to\infty. \quad (6.4)$$

Results in (6.3)-(6.4) imply that overfitted models will not be selected if the tuning parameter $\hat\lambda^*_n$ is selected by minimizing the information criterion in (6.1). On the other hand,
if an underfitted model is selected, then we can show that

$$\min_{\substack{\Pi\in\mathbb R^{m\times m},\,r(\Pi)=\hat r_n(\lambda)\\B_j\in\mathbb R^{m\times m}}}\Big\{\frac1n\sum_{t=1}^{n}\Big\|\Delta Y_t-\Pi Y_{t-1}-\sum_{j\in S_{B,\lambda}}B_j\Delta Y_{t-j}\Big\|_E^2\Big\} - \min_{\substack{\Pi\in\mathbb R^{m\times m},\,r(\Pi)=r_o\\B_j\in\mathbb R^{m\times m}}}\Big\{\frac1n\sum_{t=1}^{n}\Big\|\Delta Y_t-\Pi Y_{t-1}-\sum_{j\in S_B}B_j\Delta Y_{t-j}\Big\|_E^2\Big\}\to_pC, \quad (6.5)$$

where $C$ is some generic positive constant, while if $C_n/n\to0$, then

$$\frac{C_n}{n}\big\{K(r_o,p_o)-K\big[\hat r_n(\lambda),\hat p_n(\lambda)\big]\big\}\to0. \quad (6.6)$$

Results in (6.5)-(6.6) imply that underfitted models will not be selected if the tuning parameter $\hat\lambda^*_n$ is selected by minimizing the information criterion in (6.1). From the above discussion, we see that the following assumption is important for the LS shrinkage estimator based on $\hat\lambda^*_n$ to enjoy oracle-like properties.

Assumption 6.1 (i) $K(r,p)$ is a bounded and strictly decreasing function in $r$ and $p$; (ii) $C_n\to\infty$ as the sample size $n\to\infty$ and $C_n = o(n)$.

We next show that the LS shrinkage estimation based on $\hat\lambda^*_n$ selected by minimizing the information criterion defined in (6.1) is consistent in cointegration rank and lag order selection.

Theorem 6.1 Under Assumptions 3.1, 5.1 and 6.1, there is

$$\Pr\big(S_{\Pi,\hat\lambda^*_n} = S_\Pi\ \text{and}\ S_{B,\hat\lambda^*_n} = S_B\big)\to1, \quad (6.7)$$

where $S_{\Pi,\hat\lambda^*_n}$ denotes the index set of the nonzero eigenvalues of $\hat\Pi_n(\hat\lambda^*_n)$ and $S_{B,\hat\lambda^*_n}$ denotes the index set of the nonzero matrices in $\hat B_n(\hat\lambda^*_n)$.

For empirical implementation, the function $K(r,p)$ can be specified as $K(r,p) = r^2-2mr-p$ and the sequence $C_n$ can be $\log n$ or $\log\log n$, which give BIC and HQIC type information criteria, respectively. On the other hand, if $C_n = o(n)$ and $C_n<M$ for some finite constant $M$ and all $n$, then we can show that the LS shrinkage estimation based on $\hat\lambda^*_n$ selects underfitted models with probability approaching zero, while it has nontrivial probability of selecting overfitted models. One such example is the AIC type information criterion, where $K(r,p) = r^2-2mr-p$ and $C_n = 2$.
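As a hedged illustration of the selection rule (6.1)-(6.2), the sketch below picks $\lambda$ on a grid by minimizing $IC_n(\lambda) = \mathrm{RSS}(\lambda) - C_nK(\hat r,\hat p)$ with the BIC-type choices $K(r,p)=r^2-2mr-p$ and $C_n=\log n$. The per-$\lambda$ triples of (RSS, rank, lag order) are hypothetical stand-ins for actual LS shrinkage fits, chosen only to mimic the behaviour described in (6.3)-(6.5): overfitting buys an $O_p(1)$ fit gain, underfitting costs $O(n)$.

```python
import numpy as np

# Hypothetical sketch of IC-based tuning-parameter selection (6.1)-(6.2).
m, n = 3, 200
C_n = np.log(n)                        # BIC-type choice
K = lambda r, p: r**2 - 2*m*r - p      # strictly decreasing in r and p for r <= m

# (lambda, RSS, selected rank, selected lag order), all stand-in values:
fits = [
    (0.01, 410.0, 3, 4),   # small lambda: overfitted model, slightly lower RSS
    (0.05, 415.0, 1, 2),   # moderate lambda: true-sized model
    (0.20, 690.0, 0, 1),   # large lambda: underfitted model, RSS grows with n
]
ic = [rss - C_n * K(r, p) for (_, rss, r, p) in fits]
lam_star = fits[int(np.argmin(ic))][0]
assert lam_star == 0.05    # the penalty rules out both over- and underfitting
```

Because $-C_nK(r,p) = C_n(2mr-r^2+p)$ and $2mr-r^2$ is the parameter count of a rank-$r$ matrix $\Pi$, the criterion is the familiar "fit plus $C_n$ times model dimension" form.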
7 Simulation Study
8 An Empirical Example
9 Conclusion
One of the main challenges in any applied econometric work is the selection of a good
model for practical implementation. The conduct of inference and model use in forecasting
and policy analysis are inevitably conditioned on the empirical process of model selection,
which typically leads to issues of post-model selection inference. Adaptive lasso and bridge
estimation methods provide a methodology where these difficulties may be attenuated by simultaneous model selection and estimation.

This paper uses that methodology to develop an automated approach to cointegrated system modeling that enables simultaneous estimation of the cointegrating rank and autoregressive order in conjunction with oracle-like efficient estimation of the cointegrating matrix and the transient dynamics. As such the methods offer practical advantages to the empirical researcher by avoiding sequential techniques where cointegrating rank and transient dynamics are estimated prior to model fitting.

Various extensions of the methods developed here are possible. One rather obvious extension is to allow for parametric restrictions on the cointegrating matrix which may relate to theory-induced specifications. Lasso type procedures have so far been confined to parametric models, whereas cointegrated systems are often formulated with some nonparametric elements relating to unknown features of the model. A second extension of the present methodology, therefore, is to semiparametric formulations in which the error process in the VECM is weakly dependent, which is partly considered already in Section 4. These and other generalizations of the framework will be explored in future work.
10 Appendix
We start with the following standard asymptotic result.
Lemma 10.1 Under Assumptions 3.1 and 3.2, we have

(a) $n^{-1}\sum_{t=1}^{n}Z_{1,t-1}Z'_{1,t-1}\to_p\Sigma_{z_1z_1}$;

(b) $n^{-\frac32}\sum_{t=1}^{n}Z_{1,t-1}Z'_{2,t-1}\to_p0$;

(c) $n^{-2}\sum_{t=1}^{n}Z_{2,t-1}Z'_{2,t-1}\to_d\int B_{w_2}B'_{w_2}$;

(d) $n^{-\frac12}\sum_{t=1}^{n}u_tZ'_{1,t-1}\to_dN\big(0,\Omega_u\otimes\Sigma_{z_1z_1}\big)$;

(e) $n^{-1}\sum_{t=1}^{n}u_tZ'_{2,t-1}\to_d\big(\int B_{w_2}dB'_u\big)'$.

The quantities in (c), (d), and (e) converge jointly.

Proof of Lemma 10.1. See Johansen (1995) and Cheng and Phillips (2009).
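To fix ideas, part (a) of Lemma 10.1 can be checked numerically in a deliberately simple special case. The sketch below assumes a scalar stationary AR(1) (an assumption of this illustration, not the paper's multivariate setting), for which the stationary second moment is $1/(1-\phi^2)$:

```python
import numpy as np

# Numeric check of the flavour of Lemma 10.1(a): for stationary
# z_t = phi*z_{t-1} + e_t, n^{-1} sum_t z_{t-1}^2 -> 1/(1 - phi^2) in probability.
rng = np.random.default_rng(42)
phi, n = 0.5, 200_000
e = rng.normal(size=n)
z = np.zeros(n)
for t in range(1, n):
    z[t] = phi * z[t - 1] + e[t]

sample_moment = np.mean(z[:-1] ** 2)   # n^{-1} * sum_t z_{t-1}^2
target = 1.0 / (1.0 - phi ** 2)        # stationary variance, = 4/3 here
assert abs(sample_moment - target) < 0.05
```

The stationary component $Z_{1,t}$ obeys this law-of-large-numbers behaviour, while the unit-root component $Z_{2,t}$ in parts (c) and (e) requires the $n^{-2}$ and $n^{-1}$ scalings and converges to Brownian functionals instead of constants.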
Proof of Lemma 3.1. (a) From (3.10),

$$Q\big(\hat\Pi_{ols}-\Pi_o\big)Q^{-1}D_n = \Big(\sum_{t=1}^{n}Qu_tY'_{t-1}Q'\Big)\Big(\sum_{t=1}^{n}QY_{t-1}Y'_{t-1}Q'\Big)^{-1}D_n = \Big(\sum_{t=1}^{n}v_tZ'_{t-1}D_n^{-1}\Big)\Big(D_n^{-1}\sum_{t=1}^{n}Z_{t-1}Z'_{t-1}D_n^{-1}\Big)^{-1}, \quad (10.1)$$

where $v_t = Qu_t$ and $Z_{t-1} = QY_{t-1}$. Result (a) follows directly from Lemma 10.1.

(b) Denote $P = [\beta_o,\,\beta_{o\perp}]$ and $S_n(\mu) = \mu I_m-\hat\Pi_{ols}$. Then, by definition, the elements of $\lambda(\hat\Pi_{ols})$ are the solutions of the determinantal equation

$$0 = \big|\mu I_m-\hat\Pi_{ols}\big| = \big|P'S_n(\mu)P\big| = \begin{vmatrix}\mu\beta'_o\beta_o-\beta'_o\hat\Pi_{ols}\beta_o & -\beta'_o\hat\Pi_{ols}\beta_{o\perp}\\ -\beta'_{o\perp}\hat\Pi_{ols}\beta_o & \mu I_{m-r_o}-\beta'_{o\perp}\hat\Pi_{ols}\beta_{o\perp}\end{vmatrix}. \quad (10.2)$$

Using (a) we deduce that

$$\beta'_o\hat\Pi_{ols}\beta_{o\perp} = \beta'_o\big(\hat\Pi_{ols}-\Pi_o\big)\beta_{o\perp} = o_p(1), \quad (10.3)$$

$$\beta'_{o\perp}\hat\Pi_{ols}\beta_{o\perp} = \beta'_{o\perp}\big(\hat\Pi_{ols}-\Pi_o\big)\beta_{o\perp} = o_p(1), \quad (10.4)$$

and, similarly,

$$\beta'_{o\perp}\hat\Pi_{ols}\beta_o\to_p\beta'_{o\perp}\alpha_o\beta'_o\beta_o \quad\text{and}\quad \beta'_o\hat\Pi_{ols}\beta_o\to_p\beta'_o\alpha_o\beta'_o\beta_o. \quad (10.5)$$

Using (10.2)-(10.5), we deduce that

$$\big|\mu I_m-\hat\Pi_{ols}\big|\to_p\big|\mu I_{m-r_o}\big|\cdot\big|\mu\beta'_o\beta_o-\beta'_o\alpha_o\beta'_o\beta_o\big|, \quad (10.6)$$

uniformly over any compact set in $\mathbb R^m$. By Assumption 3.2.(i), $\lambda(\Pi_o)\in U_1 := \{\lambda\in\mathbb R^m:\|\lambda\|_E\le1\}$ and $U_1$ is a compact set in $\mathbb R^m$. Thus, by continuous mapping, we have

$$\lambda_{S^c_\Pi,n}\big(\hat\Pi_{ols}\big)\to_p0 \quad\text{and}\quad \lambda_{S_\Pi,n}\big(\hat\Pi_{ols}\big)\to_p\lambda_{S_\Pi}(\Pi_o), \quad (10.7)$$

where $\lambda_{S_\Pi}(\Pi_o)$ denotes the solutions of the equation $\big|\mu\beta'_o\beta_o-\beta'_o\alpha_o\beta'_o\beta_o\big| = 0$. This determinantal equation is equivalent to $\big|\mu I_{r_o}-\beta'_o\alpha_o\big| = 0$, so result (b) follows.
(c) Using the notation from (b), we have

$$\big|S_n(\mu)\big| = \big|\beta'_oS_n(\mu)\beta_o\big|\cdot\Big|\beta'_{o\perp}\Big\{S_n(\mu)-S_n(\mu)\beta_o\big[\beta'_oS_n(\mu)\beta_o\big]^{-1}\beta'_oS_n(\mu)\Big\}\beta_{o\perp}\Big|. \quad (10.8)$$

Let $\mu^*_k = n\lambda_k(\hat\Pi_{ols})$ ($k\in S^c_{n,\Pi}$), so that $\mu^*_k$ is by definition a solution of the equation

$$0 = \big|\beta'_oS_n(\mu)\beta_o\big|\cdot\Big|\beta'_{o\perp}\Big\{S_n(\mu)-S_n(\mu)\beta_o\big[\beta'_oS_n(\mu)\beta_o\big]^{-1}\beta'_oS_n(\mu)\Big\}\beta_{o\perp}\Big|, \quad (10.9)$$

where $S_n(\mu) = \frac{\mu}{n}I_m-\hat\Pi_{ols}$.

For any compact subset $K\subset\mathbb R$, we can invoke the results in (a) to show that

$$\beta'_oS_n(\mu)\beta_o = \frac{\mu}{n}\beta'_o\beta_o-\beta'_o\big(\hat\Pi_{ols}-\Pi_o\big)\beta_o-\beta'_o\Pi_o\beta_o\to_p-\beta'_o\Pi_o\beta_o, \quad (10.10)$$

uniformly over $K$. From Assumption 3.2.(iii), we have

$$\big|\beta'_o\Pi_o\beta_o\big| = \big|\beta'_o\alpha_o\beta'_o\beta_o\big| = \big|\beta'_o\alpha_o\big|\,\big|\beta'_o\beta_o\big|\neq0.$$

Thus, the normalized $m-r_o$ smallest eigenvalues $\mu^*_k$ ($k\in S^c_{n,\Pi}$) of $\hat\Pi_{ols}$ are asymptotically the solutions of the following determinantal equation,

$$0 = \Big|\beta'_{o\perp}\Big\{S_n(\mu)-S_n(\mu)\beta_o\big[\beta'_oS_n(\mu)\beta_o\big]^{-1}\beta'_oS_n(\mu)\Big\}\beta_{o\perp}\Big|, \quad (10.11)$$

where

$$\beta'_oS_n(\mu)\beta_{o\perp} = -\beta'_o\big(\hat\Pi_{ols}-\Pi_o\big)\beta_{o\perp}, \quad (10.12)$$

$$\beta'_{o\perp}S_n(\mu)\beta_{o\perp} = \frac{\mu}{n}I_{m-r_o}-\beta'_{o\perp}\big(\hat\Pi_{ols}-\Pi_o\big)\beta_{o\perp}, \quad (10.13)$$

$$\beta'_{o\perp}S_n(\mu)\beta_o = -\beta'_{o\perp}\hat\Pi_{ols}\beta_o\to_p-\beta'_{o\perp}\alpha_o\beta'_o\beta_o. \quad (10.14)$$

Using the results in (10.10) and (10.12)-(10.14), we get

$$\beta'_{o\perp}\Big\{S_n(\mu)-S_n(\mu)\beta_o\big[\beta'_oS_n(\mu)\beta_o\big]^{-1}\beta'_oS_n(\mu)\Big\}\beta_{o\perp} = \frac{\mu}{n}I_{m-r_o}-\beta'_{o\perp}\Big[I_m-\alpha_o\big(\beta'_o\alpha_o\big)^{-1}\beta'_o+o_p(1)\Big]\big(\hat\Pi_{ols}-\Pi_o\big)\beta_{o\perp}. \quad (10.15)$$

Note that

$$\beta'_{o\perp}\Big[I_m-\alpha_o\big(\beta'_o\alpha_o\big)^{-1}\beta'_o\Big]Q^{-1} = \Big[0_{(m-r_o)\times r_o},\;\big(\alpha'_{o\perp}\beta_{o\perp}\big)^{-1}\Big] := H_1, \quad (10.16)$$

and

$$Q\beta_{o\perp} = [\beta_o,\,\alpha_{o\perp}]'\beta_{o\perp} = \Big[0_{(m-r_o)\times r_o},\;\alpha'_{o\perp}\beta_{o\perp}\Big]' := H'_2. \quad (10.17)$$

Using (10.1), we deduce that

$$nH_1Q\big(\hat\Pi_{ols}-\Pi_o\big)Q^{-1}H'_2 = H_1\Big(\sum_{t=1}^{n}v_tZ'_{t-1}D_n^{-1}\Big)\Big(D_n^{-1}\sum_{t=1}^{n}Z_{t-1}Z'_{t-1}D_n^{-1}\Big)^{-1}H'_2 \to_d \big(\alpha'_{o\perp}\beta_{o\perp}\big)^{-1}\Big(\int B_{w_2}dB'_{w_2}\Big)'\Big(\int B_{w_2}B'_{w_2}\Big)^{-1}\big(\alpha'_{o\perp}\beta_{o\perp}\big). \quad (10.18)$$

Then, from (10.11)-(10.18), we obtain

$$\Big|n\beta'_{o\perp}\Big\{S_n(\mu)-S_n(\mu)\beta_o\big[\beta'_oS_n(\mu)\beta_o\big]^{-1}\beta'_oS_n(\mu)\Big\}\beta_{o\perp}\Big| \to_d \bigg|\mu I_{m-r_o}-\Big(\int B_{w_2}dB'_{w_2}\Big)'\Big(\int B_{w_2}B'_{w_2}\Big)^{-1}\bigg|, \quad (10.19)$$

uniformly over $K$. The result in (c) follows from (10.19) and by continuous mapping.
Proof of Theorem 3.1. Define

$$V_{1,n}(\Pi) = \sum_{t=1}^{n}\|\Delta Y_t-\Pi Y_{t-1}\|_E^2 + n\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi)\big] = V_{2,n}(\Pi) + n\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi)\big].$$

We can write

$$\sum_{t=1}^{n}\|\Delta Y_t-\Pi Y_{t-1}\|_E^2 = \big[\Delta y-\big(Y'_{-1}\otimes I_m\big)\mathrm{vec}(\Pi)\big]'\big[\Delta y-\big(Y'_{-1}\otimes I_m\big)\mathrm{vec}(\Pi)\big],$$

where $\Delta y = \mathrm{vec}(\Delta Y)$, $\Delta Y = (\Delta Y_1,\ldots,\Delta Y_n)_{m\times n}$ and $Y_{-1} = (Y_0,\ldots,Y_{n-1})_{m\times n}$.

By definition, $V_{1,n}(\hat\Pi_n)\le V_{1,n}(\Pi_o)$ and thus

$$\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big]'\Big(\sum_{t=1}^{n}Y_{t-1}Y'_{t-1}\otimes I_m\Big)\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big] + 2\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big]'\,\mathrm{vec}\Big(\sum_{t=1}^{n}Y_{t-1}u'_t\Big) \le n\Big\{\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big]-\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\hat\Pi_n)\big]\Big\}. \quad (10.20)$$

When $r_o = 0$, $\Delta Y_t$ is stationary and $Y_t$ is full rank $I(1)$, so that

$$n^{-2}\sum_{t=1}^{n}Y_{t-1}Y'_{t-1}\to_d\int_0^1B_u(a)B'_u(a)\,da, \quad (10.21)$$

and $n^{-2}\sum_{t=1}^{n}Y_{t-1}u'_t = o_p(1)$. From the results in (10.20) and (10.21), we get w.p.a.1,

$$\big\|\hat\Pi_n-\Pi_o\big\|_E^2\,\mu_{\min} - 2\big\|\hat\Pi_n-\Pi_o\big\|_E\,c_n - d_n \le 0, \quad (10.22)$$

where $\mu_{\min}$ denotes the smallest eigenvalue of the random matrix $\int_0^1B_u(a)B'_u(a)\,da$, which is positive with probability 1, $c_n = \big\|n^{-2}\sum_{t=1}^{n}Y_{t-1}u'_t\big\|_E = o_p(1)$ and $d_n = n^{-1}\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big] = o_p(1)$. From (10.22), it is straightforward to deduce that $\big\|\hat\Pi_n-\Pi_o\big\|_E = o_p(1)$.

When $r_o = m$, $Y_t$ is stationary and we have

$$n^{-1}\sum_{t=1}^{n}Y_{t-1}Y'_{t-1}\to_p\Sigma_{YY}(0) = R(1)\Omega_uR(1)', \quad (10.23)$$
and $n^{-1}\sum_{t=1}^{n}Y_{t-1}u'_t = o_p(1)$. From the results in (10.20) and (10.23), we get w.p.a.1,

$$\big\|\hat\Pi_n-\Pi_o\big\|_E^2\,\mu_{\min} - 2\big\|\hat\Pi_n-\Pi_o\big\|_E\,c_n - d_n \le 0, \quad (10.24)$$

where $\mu_{\min}$ denotes the smallest eigenvalue of $\Sigma_{YY}(0)$, which is positive with probability 1, $c_n = \big\|n^{-1}\sum_{t=1}^{n}Y_{t-1}u'_t\big\|_E = o_p(1)$ and $d_n = \sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big] = o_p(1)$. So, consistency of $\hat\Pi_n$ follows directly from the inequality in (10.24).
Denote $B_n = Q^{-1}D_n$. When $0<r_o<m$, we can use the results in Lemma 10.1 to deduce that

$$\sum_{t=1}^{n}Y_{t-1}Y'_{t-1} = Q^{-1}D_n\Big(D_n^{-1}\sum_{t=1}^{n}Z_{t-1}Z'_{t-1}D_n^{-1}\Big)D_nQ'^{-1} = B_n\left[\begin{pmatrix}\Sigma_{z_1z_1}&0\\0&\int B_{w_2}B'_{w_2}\end{pmatrix}+o_p(1)\right]B'_n,$$

and thus, w.p.a.1,

$$\mathrm{vec}(\hat\Pi_n-\Pi_o)'\Big(B_n\Big[D_n^{-1}\sum_{t=1}^{n}Z_{t-1}Z'_{t-1}D_n^{-1}\Big]B'_n\otimes I_m\Big)\mathrm{vec}(\hat\Pi_n-\Pi_o) \ge \mu_{\min}\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E^2, \quad (10.25)$$

where $\mu_{\min}$ is the smallest eigenvalue of $\mathrm{diag}\big(\Sigma_{z_1z_1},\int B_{w_2}B'_{w_2}\big)$ and is strictly positive with probability 1. Next observe that

$$\Big|\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big]'\,\mathrm{vec}\Big(\sum_{t=1}^{n}Y_{t-1}u'_t\Big)\Big| \le \big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E\,e_n, \quad (10.26)$$

where $e_n = \big\|D_n^{-1}\sum_{t=1}^{n}Z_{t-1}u'_t\big\|_E = O_p(1)$ by Lemma 10.1. Denoting $d_n = n\sum_{k=1}^{m}\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big]$, we have the inequality

$$\mu_{\min}\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E^2 - 2\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E\,e_n - d_n \le 0, \quad (10.27)$$

which implies

$$\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E = O_p\big(1+d_n^{\frac12}\big). \quad (10.28)$$

By the definition of $B_n$ and (10.28), we deduce that

$$\hat\Pi_n-\Pi_o = O_p\big(n^{-\frac12}+n^{-\frac12}d_n^{\frac12}\big) = o_p(1),$$
which implies the consistency of $\hat\Pi_n$.

Proof of Theorem 3.2. By the consistency of $\hat\Pi_n$ and the CMT, $\lambda_k(\hat\Pi_n)$ is consistent for $\lambda_k(\Pi_o)$. Consistency of $\lambda_k(\hat\Pi_n)$ and continuous twice differentiability of $\hat P_{\lambda_n}(\cdot)$ imply that

$$\Big|\sum_{k\in S_\Pi}\big\{\hat P_{\lambda_n}\big[\lambda_k(\Pi_o)\big]-\hat P_{\lambda_n}\big[\lambda_k(\hat\Pi_n)\big]\big\}\Big| \le \max_{k\in S_\Pi}\big|\hat P'_{\lambda_n}\big[\lambda_k(\Pi_o)\big]\big|\sum_{k\in S_\Pi}\big|\lambda_k(\hat\Pi_n)-\lambda_k(\Pi_o)\big| + \max_{k\in S_\Pi}\big|\hat P''_{\lambda_n}\big[\lambda_k(\Pi_o)\big]+o_p(1)\big|\sum_{k\in S_\Pi}\big|\lambda_k(\hat\Pi_n)-\lambda_k(\Pi_o)\big|^2. \quad (10.29)$$

By the triangle inequality and the Bauer-Fike Theorem on eigenvalue sensitivity, we have the inequality

$$\big|\lambda_k(\hat\Pi_n)-\lambda_k(\Pi_o)\big| \le C_V\big\|\hat\Pi_n-\Pi_o\big\|_E \quad (10.30)$$

for all $k\in S_\Pi$, where the constant $C_V = \|V_o\|_E\big\|V_o^{-1}\big\|_E\in(0,\infty)$ and $V_o$ is the eigenvector matrix of $\Pi_o$.

Using (10.29) and (10.30) and invoking the inequality in (10.20), we get

$$\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big]'\Big(\sum_{t=1}^{n}Y_{t-1}Y'_{t-1}\otimes I_m\Big)\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big] + 2\big[\mathrm{vec}(\hat\Pi_n-\Pi_o)\big]'\,\mathrm{vec}\Big(\sum_{t=1}^{n}Y_{t-1}u'_t\Big) \le Cn\max_{k\in S_\Pi}\big|\hat P'_{\lambda_n}\big[\lambda_k(\Pi_o)\big]\big|\,\big\|\hat\Pi_n-\Pi_o\big\|_E + o_p(1)\big\|\hat\Pi_n-\Pi_o\big\|_E^2, \quad (10.31)$$

where $C>0$ denotes a generic constant.

When $r_o = 0$, we use (10.21) and (10.31) to deduce that

$$\big\|\hat\Pi_n-\Pi_o\big\|_E^2\,\mu_{\min} - \big\|\hat\Pi_n-\Pi_o\big\|_E\big(c_n+n^{-1}a_nC\big) \le 0, \quad (10.32)$$

where $c_n = \big\|n^{-2}\sum_{t=1}^{n}Y_{t-1}u'_t\big\|_E = O_p(n^{-1})$. We deduce from the inequality (10.32) that

$$\hat\Pi_n-\Pi_o = O_p\big(n^{-1}+n^{-1}a_n\big). \quad (10.33)$$

When $r_o = m$, we use (10.23) and (10.31) to deduce that

$$\big\|\hat\Pi_n-\Pi_o\big\|_E^2\,\mu_{\min} - \big\|\hat\Pi_n-\Pi_o\big\|_E\,(c_n+a_nC) \le 0, \quad (10.34)$$

where $c_n = \big\|n^{-1}\sum_{t=1}^{n}Y_{t-1}u'_t\big\|_E = O_p\big(n^{-\frac12}\big)$. The inequality (10.34) leads to

$$\hat\Pi_n-\Pi_o = O_p\big(n^{-\frac12}+a_n\big). \quad (10.35)$$

When $0<r_o<m$, we can use the results in (10.25), (10.26) and (10.31) to deduce that

$$\mu_{\min}\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E^2 - 2\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E\,e_n \le na_nC_V\big\|(\hat\Pi_n-\Pi_o)B_nB_n^{-1}\big\|_E, \quad (10.36)$$

where $e_n = \big\|D_n^{-1}\sum_{t=1}^{n}Z_{t-1}u'_t\big\|_E = O_p(1)$. Note that

$$\big\|(\hat\Pi_n-\Pi_o)B_nB_n^{-1}\big\|_E \le Cn^{-\frac12}\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E. \quad (10.37)$$

Using (10.36) and (10.37), we get

$$\big\|(\hat\Pi_n-\Pi_o)B_n\big\|_E = O_p\big(1+n^{\frac12}a_n\big), \quad (10.38)$$

which finishes the proof.
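The Bauer-Fike theorem invoked in (10.30) bounds the eigenvalue movement of a diagonalizable matrix by the eigenvector condition number times the size of the perturbation. A small numerical check, using hypothetical matrices rather than anything estimated in the paper:

```python
import numpy as np

# Bauer-Fike: for diagonalizable A = V diag(lam) V^{-1}, every eigenvalue of
# A + E lies within kappa(V)*||E|| (spectral norm) of some eigenvalue of A.
rng = np.random.default_rng(7)
A = rng.normal(size=(4, 4))            # diagonalizable with probability one
lam, V = np.linalg.eig(A)
E = 1e-3 * rng.normal(size=(4, 4))     # small perturbation
mu = np.linalg.eigvals(A + E)

kappa_V = np.linalg.norm(V, 2) * np.linalg.norm(np.linalg.inv(V), 2)
bound = kappa_V * np.linalg.norm(E, 2)
# distance from each perturbed eigenvalue to the nearest original eigenvalue:
dist = max(min(abs(m_ - l_) for l_ in lam) for m_ in mu)
assert dist <= bound
```

In the proof, $\hat\Pi_n-\Pi_o$ plays the role of the perturbation $E$, and the condition number of the eigenvector matrix $V_o$ supplies the constant $C_V$, so eigenvalue estimation errors inherit the convergence rate of the matrix estimator.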
References

[1] M. Caner and K. Knight, "No country for old unit root tests: bridge estimators differentiate between nonstationary versus stationary models and select optimal lag," unpublished manuscript, 2009.

[2] J. Chao and P.C.B. Phillips, "Model selection in partially nonstationary vector autoregressive processes with reduced rank structure," Journal of Econometrics, vol. 91, no. 2, pp. 227-271, 1999.

[3] X. Cheng and P.C.B. Phillips, "Semiparametric cointegrating rank selection," Econometrics Journal, vol. 12, pp. S83-S104, 2009.

[4] J. Fan and R. Li, "Variable selection via nonconcave penalized likelihood and its oracle properties," Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348-1360, 2001.

[5] S. Johansen, "Statistical analysis of cointegration vectors," Journal of Economic Dynamics and Control, vol. 12, no. 2-3, pp. 231-254, 1988.

[6] S. Johansen, Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press, USA, 1995.

[7] K. Knight and W. Fu, "Asymptotics for lasso-type estimators," Annals of Statistics, vol. 28, no. 5, pp. 1356-1378, 2000.

[8] H. Leeb and B. Potscher, "Model selection and inference: facts and fiction," Econometric Theory, vol. 21, no. 1, pp. 21-59, 2005.

[9] Z. Liao, "Adaptive GMM shrinkage estimation with consistent moment selection," unpublished manuscript, 2010.

[10] Z. Liao and P.C.B. Phillips, "Reduced rank regression of partially non-stationary vector autoregressive processes under misspecification," unpublished manuscript, 2010.

[11] P.C.B. Phillips, "Optimal inference in cointegrated systems," Econometrica, vol. 59, no. 2, pp. 283-306, 1991.

[12] P.C.B. Phillips, "Fully modified least squares and vector autoregression," Econometrica, vol. 63, no. 5, pp. 1023-1078, 1995.

[13] P.C.B. Phillips, "Econometric model determination," Econometrica, vol. 64, no. 4, pp. 763-812, 1996.

[14] P.C.B. Phillips and V. Solo, "Asymptotics for linear processes," Annals of Statistics, vol. 20, no. 2, pp. 971-1001, 1992.

[15] H. Zou, "The adaptive lasso and its oracle properties," Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418-1429, 2006.