arXiv:1407.6484v1 [stat.CO] 24 Jul 2014

The R-package phtt: Panel Data Analysis with Heterogeneous Time Trends

Oualid Bada
University of Bonn

Dominik Liebl
Université libre de Bruxelles

Abstract

The R-package phtt provides estimation procedures for panel data with large dimensions $n$, $T$, and general forms of unobservable heterogeneous effects. In particular, it implements the estimation procedures of Bai (2009) and Kneip, Sickles, and Song (2012), which complement one another very well: both models assume the unobservable heterogeneous effects to have a factor structure. Kneip et al. (2012) consider the case in which the time-varying common factors have relatively smooth patterns, including strongly positively auto-correlated stationary as well as non-stationary factors, whereas the method of Bai (2009) focuses on stochastically bounded factors such as ARMA processes. Additionally, the phtt package provides a wide range of dimensionality criteria in order to estimate the number of unobserved factors simultaneously with the remaining model parameters.

Keywords: Panel data, unobserved heterogeneity, principal component analysis, factor dimension.

1. Introduction

One of the main difficulties and at the same time appealing advantages of panel models is their need to deal with the problem of unobserved heterogeneity. Classical panel models, such as fixed effects or random effects, try to model unobserved heterogeneity using dummy variables or structural assumptions on the error term (see, e.g., Baltagi (2005)). In both cases the unobserved heterogeneity is assumed to remain constant over time within each cross-sectional unit, apart from a possible common time trend. This assumption might be reasonable for approximating panel data with fairly small temporal dimensions $T$; however, for panel data with large $T$ this assumption very often becomes implausible.

Nowadays, the availability of panel data with large cross-sectional dimensions $n$ and large time dimensions $T$ has triggered the development of a new class of panel data models. Recent discussions by Ahn, Lee, and Schmidt (2013), Pesaran (2006), Bai (2009), Bai, Kao, and Ng (2009), and Kneip et al. (2012) have focused on advanced panel models for which the unobservable individual effects are allowed to have heterogeneous (i.e., individual specific) time trends that can be approximated by a factor structure. The basic form of this new class of panel models can be presented as follows:

$$y_{it} = \sum_{j=1}^{P} x_{itj}\beta_j + \nu_{it} + \varepsilon_{it} \quad \text{for } i \in \{1,\dots,n\} \text{ and } t \in \{1,\dots,T\}, \tag{1}$$


where $y_{it}$ is the dependent variable for each individual $i$ at time $t$, $x_{itj}$ is the $j$th element of the vector of explanatory variables $x_{it} \in \mathbb{R}^P$, and $\varepsilon_{it}$ is the idiosyncratic error term. The time-varying individual effects $\nu_{it} \in \mathbb{R}$ of individual $i$ for the time points $t \in \{1,\dots,T\}$ are assumed to be generated by $d$ common time-varying factors. The following two specifications of the time-varying individual effects $\nu_{it}$ are implemented in our R package phtt:

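$$\nu_{it} = \begin{cases} v_{it} = \sum_{l=1}^{d} \lambda_{il} f_{lt}, & \text{for the model of Bai (2009),} \\ v_i(t) = \sum_{l=1}^{d} \lambda_{il} f_l(t), & \text{for the model of Kneip et al. (2012).} \end{cases} \tag{2}$$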

Here, $\lambda_{il}$ are unobserved individual loadings parameters, $f_{lt}$ are unobserved common factors for the model of Bai (2009), $f_l(t)$ are the unobserved common factors for the model of Kneip et al. (2012), and $d$ is the unknown factor dimension.

Note that the explicit consideration of an intercept in model (1) is not necessary but may facilitate interpretation. If $x_{it}$ includes an intercept, the time-varying individual effects $\nu_{it}$ are centered around zero. If $x_{it}$ does not include an intercept, the time-varying individual effects $\nu_{it}$ are centered around the overall mean.

Model (1) includes the classical panel data models with additive time-invariant individual effects and common time-specific effects. This model is obtained by choosing $d = 2$ with a first common factor $f_{1t} = 1$ for all $t \in \{1,\dots,T\}$ that has individual loadings parameters $\lambda_{i1}$, and a second common factor $f_{2t}$ that has the same loadings parameter $\lambda_{i2} = 1$ for all $i \in \{1,\dots,n\}$.

An intrinsic problem of factor models lies in the fact that the true factors are only identifiable up to rotation. In order to ensure the uniqueness of these parameters, a number of $d^2$ restrictions are required. The usual normalization conditions are given by

(a) $\frac{1}{T}\sum_{t=1}^{T} f_{lt}^2 = 1$ for all $l \in \{1,\dots,d\}$,

(b) $\sum_{t=1}^{T} f_{lt} f_{kt} = 0$ for all $l, k \in \{1,\dots,d\}$ with $k \neq l$, and

(c) $\sum_{i=1}^{n} \lambda_{il}\lambda_{ik} = 0$ for all $l, k \in \{1,\dots,d\}$ with $k \neq l$;

see, e.g., Bai (2009) and Kneip et al. (2012). For the model of Kneip et al. (2012), $f_{lt}$ in conditions (a) and (b) has to be replaced by $f_l(t)$. As usual in factor models, a certain degree of indeterminacy remains, because the factors can only be determined up to sign changes and different ordering schemes.
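The normalization conditions (a) to (c) can be illustrated numerically. The following lines are a minimal sketch on simulated data, not code from the phtt package: all object names are hypothetical, and the singular value decomposition is used here merely as one convenient way to rotate an arbitrary factor structure into a representation that satisfies the three conditions.

```r
# Sketch only: object names and simulated data are hypothetical.
set.seed(1)
n <- 46       # cross-sectional dimension
T_len <- 30   # time dimension (T_len avoids masking T, R's shorthand for TRUE)
d <- 2        # factor dimension

# An arbitrary, non-normalized factor structure v_it = sum_l lambda_il * f_lt.
F_raw <- matrix(rnorm(T_len * d), T_len, d)
L_raw <- matrix(rnorm(n * d), n, d)
V <- F_raw %*% t(L_raw)                  # T x n matrix of individual effects

# Rotate into a normalized representation via the singular value decomposition.
sv <- svd(V)
F_norm <- sqrt(T_len) * sv$u[, 1:d, drop = FALSE]
L_norm <- sv$v[, 1:d, drop = FALSE] %*% diag(sv$d[1:d], nrow = d) / sqrt(T_len)

round(crossprod(F_norm) / T_len, 10)     # identity matrix: conditions (a) and (b)
round(crossprod(L_norm), 10)             # diagonal matrix: condition (c)
max(abs(F_norm %*% t(L_norm) - V))       # the factor structure itself is unchanged
```

Flipping the sign of a column of `F_norm` together with the matching column of `L_norm` leaves all three conditions intact, which is the remaining sign indeterminacy mentioned above.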

Kneip et al. (2012) consider the case in which the common factors $f_l(t)$ show relatively smooth patterns over time. This includes strongly positively auto-correlated stationary as well as non-stationary factors. The authors propose to approximate the time-varying individual effects $v_i(t)$ by smooth nonparametric functions, say, $\vartheta_i(t)$. In this way (1) becomes a semi-parametric model and its estimation is done using a two-step estimation procedure, which we explain in more detail in Section 2. The asymptotic properties of this method rely, however, on independent and identically distributed errors.

Alternatively, Bai (2009) allows for weak forms of heteroskedasticity and dependency in both time and cross-section dimensions and proposes an iterated least squares approach to estimate (1) for stationary time-varying individual effects $v_{it}$ such as ARMA processes or non-stationary deterministic trends. However, Bai (2009) rules out a large class of non-stationary processes such as integrated stochastic processes.


Moreover, Bai (2009) assumes the factor dimension $d$ to be a known parameter, which is usually not the case. Therefore, the phtt package uses an algorithmic refinement of Bai's method proposed by Bada and Kneip (2014) in order to estimate the number of unobserved common factors $d$ jointly with the remaining model parameters; see Section 4 for more details.

Besides the implementations of the methods proposed by Kneip et al. (2012), Bai (2009), and Bada and Kneip (2014), the R package phtt comes with a wide range of criteria (16 in total) for estimating the factor dimension $d$. The main functions of the phtt package are given in the following list:

• KSS(): Computes the estimators of the model parameters according to the method of Kneip et al. (2012); see Section 2.

• Eup(): Computes the estimators of the model parameters according to the method of Bai (2009) and Bada and Kneip (2014); see Section 4.

• OptDim(): Allows for a comparison of the estimated factor dimensions $\hat d$ obtained from many different (in total 16) criteria; see Section 3.

• checkSpecif(): Tests whether to use a classical fixed effects panel model or a panel model with individual effects $\nu_{it}$; see Section 5.1.

The functions are provided with the usual print(), summary(), plot(), coef(), and residuals() methods.

Standard methods for estimating models for panel and longitudinal data are also implemented in the R packages plm (Croissant and Millo 2008), nlme (Pinheiro, Bates, DebRoy, Sarkar, and R Core team 2012), and lme4 (Bates, Maechler, and Bolker 2012); see Croissant and Millo (2008) for an exhaustive comparison of these packages. Recently, Millo and Piras (2012) published the R package splm for spatial panel data models. The phtt package further extends the toolbox for statisticians and econometricians: it provides the possibility of analyzing panel data with large dimensions $n$ and $T$ and covers the case in which the unobserved heterogeneity effects are time-varying.

To the best of our knowledge, our phtt package (Bada and Liebl 2012) is the first software package that offers the estimation methods of Bai (2009) and Kneip et al. (2012). Regarding the different dimensionality criteria that can be accessed via the function OptDim(), only those of Bai and Ng (2002) are publicly available as MATLAB codes (The MathWorks Inc. 2012) from the homepage of Serena Ng (http://www.columbia.edu/~sn2294/).

To demonstrate the use of our functions, we re-explore the well known Cigar dataset, which is frequently used in the literature of panel models. The panel contains the per capita cigarette consumptions of $n = 46$ American states from 1963 to 1992 ($T = 30$) as well as data about the income per capita and cigarette prices (see, e.g., Baltagi and Levin (1986) for more details on the dataset).

We follow Baltagi and Li (2004), who estimate the following panel model:

$$\ln(\text{Consumption}_{it}) = \mu + \beta_1 \ln(\text{Price}_{it}) + \beta_2 \ln(\text{Income}_{it}) + e_{it}. \tag{3}$$

Here, $\text{Consumption}_{it}$ represents the sales of cigarettes (packs of cigarettes per capita), $\text{Price}_{it}$ is the average real retail price of cigarettes, and $\text{Income}_{it}$ is the real disposable income per capita. The index $i \in \{1,\dots,46\}$ denotes the individual states and the index $t \in \{1,\dots,30\}$ denotes the year.

We revisit this model, but allow for a multidimensional factor structure such that

$$e_{it} = \sum_{l=1}^{d} \lambda_{il} f_{lt} + \varepsilon_{it}.$$

The Cigar dataset can be obtained from the phtt package using the function data("Cigar"). The panels of the variables $\ln(\text{Consumption}_{it})$, $\ln(\text{Price}_{it})$, and $\ln(\text{Income}_{it})$ are shown in Figure 1.

[Figure 1 omitted: three panels titled "Log's of Cigar-Consumption", "Log's of real Prices", and "Log's of real Income", each plotted against Time.]

Figure 1: Plots of the dependent variable $\ln(\text{Consumption}_{it})$ and regressor variables $\ln(\text{Price}_{it})$ and $\ln(\text{Income}_{it})$.

Section 2 is devoted to a short introduction of the method of Kneip et al. (2012), which is appropriate for relatively smooth common factors $f_l(t)$. Section 3 presents the usage of the function OptDim(), which provides access to a wide range of panel dimensionality criteria recently discussed in the literature on factor models. Section 4 deals with the explanation as well as application of the panel method proposed by Bai (2009), which is basically appropriate for stationary and relatively unstructured common factors $f_{lt}$.

2. Panel models for heterogeneity in time trends

The panel model proposed by Kneip et al. (2012) can be presented as follows:

$$y_{it} = \sum_{j=1}^{P} x_{itj}\beta_j + v_i(t) + \varepsilon_{it}, \tag{4}$$


where the time-varying individual effects $v_i(t)$ are parametrized in terms of common non-parametric basis functions $f_1(t),\dots,f_d(t)$ such that

$$v_i(t) = \sum_{l=1}^{d} \lambda_{il} f_l(t). \tag{5}$$

The asymptotic properties of this method rely on second order differences of $v_i(t)$, which apply for continuous functions as well as for classical discrete stochastic time series processes such as (S)AR(I)MA processes. Therefore, the functional notation of the time-varying individual effects $v_i(t)$ and their underlying common factors $f_1(t),\dots,f_d(t)$ does not restrict them to a purely functional interpretation. The main idea of this approach is to approximate the time series of individual effects $v_i(t)$ by smooth functions $\vartheta_i(t)$.

The estimation approach proposed by Kneip et al. (2012) relies on a two-step procedure: first, estimates of the common slope parameters $\beta_j$ and the time-varying individual effects $v_i(t)$ are obtained semi-parametrically. Second, functional principal component analysis is used to estimate the common factors $f_1(t),\dots,f_d(t)$, and to re-estimate the time-varying individual effects $v_i(t)$ more efficiently. In the following we describe both steps in more detail.

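Step 1: The unobserved parameters $\beta_j$ and $v_i(t)$ are estimated by the minimization of

$$\sum_{i=1}^{n} \frac{1}{T} \sum_{t=1}^{T} \Big( y_{it} - \sum_{j=1}^{P} x_{itj}\beta_j - \vartheta_i(t) \Big)^2 + \sum_{i=1}^{n} \kappa \int_{1}^{T} \frac{1}{T} \big( \vartheta_i^{(m)}(s) \big)^2 \, ds, \tag{6}$$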

over all $\beta_j \in \mathbb{R}$ and all $m$-times continuously differentiable functions $\vartheta_i(t)$, where $\vartheta_i^{(m)}(t)$ denotes the $m$th derivative of the function $\vartheta_i(t)$. A first approximation of $v_i(t)$ is then given by $\tilde v_i(t) := \hat\vartheta_i(t)$. Spline theory implies that any solution $\hat\vartheta_i(t)$ possesses an expansion in terms of a natural spline basis $z_1(t),\dots,z_T(t)$ such that $\hat\vartheta_i(t) = \sum_{s=1}^{T} \hat\zeta_{is} z_s(t)$; see, e.g., De Boor (2001). Using the latter expression, we can rewrite (6) to formalize the following objective function:

$$S(\beta, \zeta) = \sum_{i=1}^{n} \big( \|Y_i - X_i\beta - Z\zeta_i\|^2 + \kappa\, \zeta_i^\top R\, \zeta_i \big), \tag{7}$$

where $Y_i = (y_{i1},\dots,y_{iT})^\top$, $X_i = (x_{i1}^\top,\dots,x_{iT}^\top)^\top$, $\beta = (\beta_1,\dots,\beta_P)^\top$, $\zeta_i = (\zeta_{i1},\dots,\zeta_{iT})^\top$, and $Z$ and $R$ are $T \times T$ matrices with elements $\{z_s(t)\}_{s,t=1,\dots,T}$ and $\{\int z_s^{(m)}(t)\, z_k^{(m)}(t)\, dt\}_{s,k=1,\dots,T}$ respectively. $\kappa$ is a preselected smoothing parameter to control the smoothness of $\hat\vartheta_i(t)$. We follow the usual choice of $m = 2$, which leads to cubic smoothing splines.

In contrast to Kneip et al. (2012), we do not specify a common time effect in model (4), but the vector of explanatory variables is allowed to contain an intercept. This means that the time-varying individual effects $v_i(t)$ are not centered around zero for each specific time point $t$, but around a common intercept term. The separate estimation of the common time effect, say $\theta_t$, is also possible with our phtt package; we discuss this in detail in Section 5.

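The semi-parametric estimators $\hat\beta$, $\hat\zeta_i = (\hat\zeta_{i1},\dots,\hat\zeta_{iT})^\top$, and $\tilde v_i = (\tilde v_{i1},\dots,\tilde v_{iT})^\top$ can be obtained by minimizing $S(\beta, \zeta)$ over all $\beta \in \mathbb{R}^P$ and $\zeta \in \mathbb{R}^{T \times n}$.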


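The solutions are given by

$$\hat\beta = \Big( \sum_{i=1}^{n} X_i^\top (I - \mathcal{Z}_\kappa) X_i \Big)^{-1} \Big( \sum_{i=1}^{n} X_i^\top (I - \mathcal{Z}_\kappa) Y_i \Big), \tag{8}$$

$$\hat\zeta_i = (Z^\top Z + \kappa R)^{-1} Z^\top (Y_i - X_i\hat\beta), \quad \text{and} \tag{9}$$

$$\tilde v_i = \mathcal{Z}_\kappa (Y_i - X_i\hat\beta), \quad \text{where } \mathcal{Z}_\kappa = Z (Z^\top Z + \kappa R)^{-1} Z^\top. \tag{10}$$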
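Step 1 can be sketched in a few lines of base R. The code below is an illustration under stated assumptions and not the implementation used in phtt: the data are simulated, all names are hypothetical, and the smoother matrix $\mathcal{Z}_\kappa$ is built through the Reinsch form $(I + \kappa K)^{-1}$ of a cubic smoothing spline on unit-spaced time points, which yields the same fitted values as the basis representation in (10) for $m = 2$.

```r
# Sketch only: function names and simulated data are hypothetical.
smoother_matrix <- function(T_len, kappa) {
  # Second-difference matrix Q (T x (T-2)) and band matrix B ((T-2) x (T-2))
  # for unit-spaced time points (Reinsch construction of the roughness penalty).
  Q <- matrix(0, T_len, T_len - 2)
  for (j in seq_len(T_len - 2)) Q[j:(j + 2), j] <- c(1, -2, 1)
  B <- diag(2 / 3, T_len - 2)
  idx <- seq_len(T_len - 3)
  B[cbind(idx, idx + 1)] <- 1 / 6
  B[cbind(idx + 1, idx)] <- 1 / 6
  K <- Q %*% solve(B, t(Q))            # g' K g equals the integrated squared 2nd derivative
  solve(diag(T_len) + kappa * K)       # smoother matrix, playing the role of Z_kappa
}

set.seed(2)
n <- 30; T_len <- 40; P <- 2
beta_true <- c(1, -0.5)
t_grid <- seq_len(T_len)
f1 <- t_grid / T_len                   # smooth common factors
f2 <- sin(2 * pi * t_grid / T_len)

# Lists of T x P regressor matrices X_i and response vectors Y_i.
X <- replicate(n, matrix(rnorm(T_len * P), T_len, P), simplify = FALSE)
Y <- lapply(X, function(Xi) {
  v_i <- rnorm(1) * f1 + rnorm(1) * f2
  drop(Xi %*% beta_true) + v_i + rnorm(T_len, sd = 0.1)
})

Zk <- smoother_matrix(T_len, kappa = 10)
M <- diag(T_len) - Zk

# Equation (8): slope estimator after partialling out the smooth effects.
A <- Reduce(`+`, lapply(X, function(Xi) t(Xi) %*% M %*% Xi))
b <- Reduce(`+`, Map(function(Xi, Yi) t(Xi) %*% M %*% Yi, X, Y))
beta_hat <- drop(solve(A, b))

# Equation (10): first-step estimates of the individual effects (T x n matrix).
V_tilde <- mapply(function(Xi, Yi) Zk %*% (Yi - Xi %*% beta_hat), X, Y)

beta_hat
```

The value `kappa = 10` is an arbitrary choice for the illustration; Section 2.1 describes how the package selects the smoothing parameter from the data.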

Step 2: The common factors are obtained from the first $d$ eigenvectors $\hat\gamma_1,\dots,\hat\gamma_d$ that correspond to the largest eigenvalues $\hat\rho_1,\dots,\hat\rho_d$ of the empirical covariance matrix

$$\hat\Sigma = \frac{1}{n} \sum_{i=1}^{n} \tilde v_i \tilde v_i^\top. \tag{11}$$

The estimator of the common factor $f_l(t)$ is then defined by the $l$th scaled eigenvector

$$\hat f_l(t) = \sqrt{T}\, \hat\gamma_{lt} \quad \text{for all } l \in \{1,\dots,d\}, \tag{12}$$

where $\hat\gamma_{lt}$ is the $t$th element of the eigenvector $\hat\gamma_l$. The scaling factor $\sqrt{T}$ ensures that $\hat f_l(t)$ satisfies the normalization condition $\frac{1}{T}\sum_{t=1}^{T} \hat f_l(t)^2 = 1$ as listed above in Section 1. The estimates of the individual loadings parameters $\lambda_{il}$ are obtained by ordinary least squares regressions of $(Y_i - X_i\hat\beta)$ on $\hat f_l$, where $\hat f_l = (\hat f_l(1),\dots,\hat f_l(T))^\top$. Recall from conditions (a) and (b) that $\hat\lambda_{il}$ can be calculated as follows:

$$\hat\lambda_{il} = \frac{1}{T}\, \hat f_l^\top (Y_i - X_i\hat\beta). \tag{13}$$

A crucial part of the estimation procedure of Kneip et al. (2012) is the re-estimation of the time-varying individual effects $v_i(t)$ in Step 2 by $\hat v_i(t) := \sum_{l=1}^{d} \hat\lambda_{il} \hat f_l(t)$, where the factor dimension $d$ can be determined, e.g., by the sequential testing procedure of Kneip et al. (2012) or by any other dimensionality criterion; see also Section 3. This re-estimation leads to more efficiently estimated time-varying individual effects.
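The eigen-decomposition in Step 2 can be sketched as follows. This is an illustration on simulated data with hypothetical names, not the package code; as a simplification, the matrix `W` stands in both for the first-step fits $\tilde v_i$ and for the residuals $Y_i - X_i\hat\beta$, and the factor dimension is treated as known.

```r
# Sketch only: W is a simulated stand-in for the T x n matrix of residuals.
set.seed(3)
n <- 46; T_len <- 30; d <- 2
t_grid <- seq_len(T_len)
F_true <- cbind(rep(1, T_len), sin(2 * pi * t_grid / T_len))
L_true <- matrix(rnorm(n * d), n, d)
V_true <- F_true %*% t(L_true)                    # true individual effects
W <- V_true + matrix(rnorm(T_len * n, sd = 0.3), T_len, n)

Sigma_hat <- tcrossprod(W) / n                    # T x T covariance matrix, cf. (11)
eig <- eigen(Sigma_hat, symmetric = TRUE)         # eigenvalues in decreasing order
F_hat <- sqrt(T_len) * eig$vectors[, 1:d, drop = FALSE]   # scaled eigenvectors, cf. (12)
Lambda_hat <- crossprod(W, F_hat) / T_len         # n x d loadings, cf. (13)
V_hat <- F_hat %*% t(Lambda_hat)                  # re-estimated individual effects

colMeans(F_hat^2)                                 # each column satisfies condition (a)
c(before = mean((W - V_true)^2), after = mean((V_hat - V_true)^2))
```

The last line compares the mean squared deviation from the simulated truth before and after projecting onto the estimated factors, which illustrates in what sense the re-estimation is more efficient.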

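Kneip et al. (2012) derive the consistency of the estimators as $n, T \to \infty$ and show that the asymptotic distribution of the common slope estimators is given by $\Sigma_\beta^{-1/2} (\hat\beta - E_\varepsilon(\hat\beta)) \xrightarrow{d} N(0, I)$, where

$$\Sigma_\beta = \sigma^2 \Big( \sum_{i=1}^{n} X_i^\top (I - \mathcal{Z}_\kappa) X_i \Big)^{-1} \Big( \sum_{i=1}^{n} X_i^\top (I - \mathcal{Z}_\kappa)^2 X_i \Big) \Big( \sum_{i=1}^{n} X_i^\top (I - \mathcal{Z}_\kappa) X_i \Big)^{-1}. \tag{14}$$

A consistent estimator of $\sigma^2$ can be obtained by

$$\hat\sigma^2 = \frac{1}{(n-1)T} \sum_{i=1}^{n} \Big\| Y_i - X_i\hat\beta - \sum_{l=1}^{\hat d} \hat\lambda_{i,l} \hat f_l \Big\|^2. \tag{15}$$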

To determine the optimal smoothing parameter $\kappa_{opt}$, Kneip et al. (2012) propose the following cross validation (CV) criterion:

$$CV(\kappa) = \sum_{i=1}^{n} \Big\| Y_i - X_i\hat\beta_{-i} - \sum_{l=1}^{d} \hat\lambda_{-i,l} \hat f_{-i,l} \Big\|^2, \tag{16}$$

where $\hat\beta_{-i}$, $\hat\lambda_{-i,l}$, and $\hat f_{-i,l}$ are estimates of the parameters $\beta$, $\lambda$, and $f_l$ based on the dataset without the $i$th observation. Unfortunately, this criterion is computationally very costly and requires determining the factor dimension $d$ in advance. To overcome this disadvantage, we propose a plug-in smoothing parameter that is discussed in more detail in the following Section 2.1.

2.1. Computational details

Theoretically, it is possible to determine $\kappa$ by the CV criterion in (16); however, cross validation is computationally very costly. Moreover, Kneip et al. (2012) do not explain how the factor dimension $d$ is to be specified during the optimization process, which is critical since the estimator $\hat d$ is influenced by the choice of $\kappa$.

In order to get a quick and effective solution, we propose to determine the smoothing parameter $\kappa$ by generalized cross validation (GCV). However, we cannot apply the classical GCV formulas as proposed, e.g., in Craven and Wahba (1978) since we do not know the parameters $\beta$ and $v_i(t)$. Our computational algorithm for determining the GCV smoothing parameter $\kappa_{GCV}$ is based on the method of Cao and Ramsay (2010), who propose optimizing objective functions of the form (7) by updating the parameters iteratively in a functional hierarchy. Formally, the iteration algorithm can be described as follows:

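1. For given $\kappa$ and $\beta$, we optimize (7) with respect to $\zeta_i$ to get

$$\hat\zeta_i = (Z^\top Z + \kappa R)^{-1} Z^\top (Y_i - X_i\beta). \tag{17}$$

2. By using (17), we minimize (7) with respect to $\beta$ to get

$$\hat\beta = \Big( \sum_{i=1}^{n} X_i^\top X_i \Big)^{-1} \Big( \sum_{i=1}^{n} X_i^\top (Y_i - Z\hat\zeta_i) \Big). \tag{18}$$

3. Once (17) and (18) are obtained, we optimize the following GCV criterion to calculate $\kappa_{GCV}$:

$$\kappa_{GCV} = \arg\min_{\kappa} \frac{1}{\frac{n}{T}\, \mathrm{tr}(I - \mathcal{Z}_\kappa)^2} \sum_{i=1}^{n} \big\| Y_i - X_i\hat\beta - \mathcal{Z}_\kappa (Y_i - X_i\hat\beta) \big\|^2. \tag{19}$$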

The program starts with initial estimates of $\beta$ and $\kappa$ and repeats steps 1, 2, and 3 until all parameters converge, where the initial value $\hat\beta_{start}$ is defined in (50) and the initial value $\kappa_{start}$ is the GCV smoothing parameter of the residuals $Y_i - X_i\hat\beta_{start}$.

The advantage of this approach is that the inversion of the $P \times P$ matrix in (18) does not have to be updated during the iteration process. Moreover, the determination of the GCV-minimizer in (19) can be easily performed in R using the function smooth.spline(), which calls on a rapid C-routine.
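As a small illustration of GCV-based smoothing with smooth.spline(), the following sketch smooths a single simulated residual series. It is not the pooled criterion (19) across all $n$ units, the series and all object names are hypothetical, and the smoothing parameter reported by smooth.spline() is expressed on that function's own scale, so it need not coincide numerically with $\kappa$ in (7).

```r
# Sketch only: one simulated residual series, not the pooled criterion (19).
set.seed(4)
T_len <- 30
t_grid <- seq_len(T_len)
resid_i <- sin(2 * pi * t_grid / T_len) + rnorm(T_len, sd = 0.2)  # stands in for Y_i - X_i beta

# cv = FALSE (the default) selects the smoothing parameter by generalized cross validation.
fit <- smooth.spline(x = t_grid, y = resid_i, cv = FALSE)

fit$lambda                                # GCV-selected smoothing parameter (smooth.spline scale)
fit$df                                    # equivalent degrees of freedom, i.e., trace of the smoother
v_tilde_i <- predict(fit, x = t_grid)$y   # smoothed individual effect at the observed time points
```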

Note, however, that the GCV smoothing parameter $\kappa_{GCV}$ in (19) does not explicitly account for the factor structure of the time-varying individual effects $v_i(t)$ as formalized in (2). In fact, given that the assumption of a factor structure is true, the goal should not be to obtain optimal estimates of $v_i(t)$ but rather to obtain optimal estimates of the common factors $f_l(t)$, which implies that the optimal smoothing parameter $\kappa_{opt}$ will be smaller than $\kappa_{GCV}$; see Kneip et al. (2012).

If the goal is to obtain optimal estimates of $f_l(t)$, $\kappa_{GCV}$ is used as an upper bound when minimizing the CV criterion (16) (by setting the argument CV = TRUE), which, however, can take some time. Note that this optimal smoothing parameter $\kappa_{opt}$ depends on the unknown factor dimension $d$. Therefore, we propose, first, to estimate the dimension based on the smoothing parameter $\kappa_{GCV}$ and, second, to use the estimated dimension $\hat d$ (by explicitly setting the dimension argument factor.dim = $\hat d$) in order to determine the dimension-specific smoothing parameter $\kappa_{opt}$ (by setting the argument CV = TRUE).

2.2. Application

This section is devoted to the application of the method of Kneip et al. (2012) discussed above. The computation of this method is accessible through the function KSS(), which has the following arguments:

R> args(KSS)

function (formula, additive.effects = c("none", "individual",

"time", "twoways"), consult.dim.crit = FALSE, d.max = NULL,

sig2.hat = NULL, factor.dim = NULL, level = 0.01, spar = NULL,

CV = FALSE, convergence = 1e-06, restrict.mode = c("restrict.factors",

"restrict.loadings"), ...)

NULL

The argument formula is compatible with the usual R-specific symbolic designation of the model. The only peculiarity here is that the variables should be defined as $T \times n$ matrices, where $T$ is the temporal dimension and $n$ is the number of cross-section units.¹

The argument additive.effects makes it possible to extend the model (4) for additional additive individual, time, or twoways effects as discussed in Section 5.

If the logical argument consult.dim.crit is set to TRUE, all dimensionality criteria discussed in Section 3 are computed and the user is asked to choose one of their results.

The arguments d.max and sig2.hat are required for the computation of some dimensionality criteria discussed in Section 3. If their default values are maintained, the function internally computes d.max $= \lfloor \min\{\sqrt{n}, \sqrt{T}\} \rfloor$ and sig2.hat as in (15), where $\lfloor x \rfloor$ indicates the integer part of $x$. The argument level allows adjusting the significance level for the dimensionality testing procedure (21) of Kneip et al. (2012); see Section 3.
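For the dimensions of the Cigar panel, the default rule for d.max stated above can be evaluated directly. The variable names below are chosen for this illustration only.

```r
# Default maximal dimension for n = 46 states and T = 30 years,
# following the rule floor(min(sqrt(n), sqrt(T))) stated in the text.
n <- 46
T_len <- 30   # T_len avoids masking T, R's shorthand for TRUE
d_max <- floor(min(sqrt(n), sqrt(T_len)))
d_max   # 5
```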

CV is a logical argument. If it is set to TRUE, the cross validation criterion (16) of Kneip et al. (2012) will be computed. In the default case, the function uses the GCV method discussed above in Section 2.1.

The factor dimension $d$ can be pre-specified by the argument factor.dim. Recall from restriction (a) that $\frac{1}{T}\sum_{t=1}^{T} \hat f_l(t)^2 = 1$. Alternatively, it is possible to standardize the individual loadings parameters such that $\frac{1}{n}\sum_{i=1}^{n} \hat\lambda_{il} = 1$, which can be done by setting restrict.mode = "restrict.loadings".

¹ Note that phtt is written for balanced panels. Missing values have to be replaced in a pre-processing step by appropriate imputation methods.

As an illustration we estimate the Cigarettes model (3) introduced in Section 1:

$$\ln(\text{Consumption}_{it}) = \mu + \beta_1 \ln(\text{Price}_{it}) + \beta_2 \ln(\text{Income}_{it}) + e_{it} \quad \text{with } e_{it} = \sum_{l=1}^{d} \lambda_{il} f_l(t) + \varepsilon_{it}. \tag{20}$$

In the following lines of code we load the Cigar dataset and take logarithms of the three variables, $\text{Consumption}_{it}$, $\text{Price}_{it}/\text{cpi}_t$, and $\text{Income}_{it}/\text{cpi}_t$, where $\text{cpi}_t$ is the consumer price index. The variables are stored as $T \times n$ matrices. This is necessary because the formula argument of the KSS() function takes the panel variables as matrices in which the number of rows has to be equal to the temporal dimension $T$ and the number of columns has to be equal to the individual dimension $n$.

R> library("phtt")

R> data("Cigar")

R> N <- 46

R> T <- 30

R> l.Consumption <- log(matrix(Cigar$sales, T, N))

R> cpi <- matrix(Cigar$cpi, T, N)

R> l.Price <- log(matrix(Cigar$price, T, N)/cpi)

R> l.Income <- log(matrix(Cigar$ndi, T, N)/cpi)
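The reshaping with matrix() relies on the fact that R fills matrices column by column, so a long-format variable that is sorted by unit and then by time yields one column per unit. The sketch below demonstrates this on a small hypothetical data frame (it does not use the Cigar data, and the assumption that a given long-format dataset is sorted in this way should be checked before reshaping).

```r
# Hypothetical long-format panel with 3 units observed over 4 periods.
n_units <- 3
n_periods <- 4
long <- data.frame(
  unit  = rep(seq_len(n_units), each = n_periods),
  time  = rep(seq_len(n_periods), times = n_units),
  value = seq_len(n_units * n_periods)
)

# Enforce the ordering that the column-wise fill of matrix() relies on.
long <- long[order(long$unit, long$time), ]

# Rows index time periods, columns index cross-sectional units.
wide <- matrix(long$value, nrow = n_periods, ncol = n_units)

wide[2, 3]                                         # unit 3 in period 2
long$value[long$unit == 3 & long$time == 2]        # same observation in the long format
```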

The model parameters $\beta_1$, $\beta_2$, the factors $f_l(t)$, the loadings parameters $\lambda_{il}$, and the factor dimension $d$ can be estimated by the KSS() function with its default arguments. Inferences about the slope parameters can be obtained by using the method summary().

R> Cigar.KSS <- KSS(formula = l.Consumption ~ l.Price + l.Income)

R> (Cigar.KSS.summary <- summary(Cigar.KSS))

Call:

KSS.default(formula = l.Consumption ~ l.Price + l.Income)

Residuals:

Min 1Q Median 3Q Max

-0.11 -0.01 0.00 0.01 0.12

Slope-Coefficients:

Estimate StdErr z.value Pr(>z)

(Intercept) 4.0600 0.1770 23.00 < 2.2e-16 ***

l.Price -0.2600 0.0223 -11.70 < 2.2e-16 ***

l.Income 0.1550 0.0382 4.05 5.17e-05 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Additive Effects Type: none

Used Dimension of the Unobserved Factors: 6

Residual standard error: 0.000725 on 921 degrees of freedom

R-squared: 0.99

The effects of the log-real prices for cigarettes and the log-real incomes on the log-sales of cigarettes are highly significant and in line with results in the literature. The summary output reports an estimated factor dimension of $\hat d = 6$. In order to get a visual impression of the six estimated common factors $\hat f_1(t),\dots,\hat f_6(t)$ and the estimated time-varying individual effects $\hat v_1(t),\dots,\hat v_n(t)$, we provide a plot() method for the KSS-summary object.

R> plot(Cigar.KSS.summary)

[Figure 2 omitted: left panel titled "Estimated Factors (Used Dimension = 6)" and right panel titled "Estimated Factor-Structure", each plotted against Time.]

Figure 2: Left panel: Estimated factors $\hat f_1(t),\dots,\hat f_6(t)$. Right panel: Estimated time-varying individual effects $\hat v_1(t),\dots,\hat v_n(t)$.

The left panel of Figure 2 shows the six estimated common factors $\hat f_1(t),\dots,\hat f_6(t)$ and the right panel of Figure 2 shows the $n = 46$ estimated time-varying individual effects $\hat v_1(t),\dots,\hat v_n(t)$. The common factors are ordered according to the decreasing sequence of their eigenvalues. The first common factor is nearly time-invariant; this suggests extending the model (20) by additive individual (time-invariant) effects; see Section 5 for more details.

By setting the logical argument consult.dim.crit = TRUE, the user can choose from other dimensionality criteria, which are discussed in Section 3. Note that the consideration of different factor dimensions $d$ would not alter the results for the slope parameters $\beta$ since the estimation procedure of Kneip et al. (2012) for the slope parameters $\beta$ does not depend on the dimensionality parameter $d$.


3. Panel criteria for selecting the number of factors

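In order to estimate the factor dimension $d$, Kneip et al. (2012) propose a sequential testing procedure based on the following test statistic:

$$KSS(d) = \frac{n \sum_{r=d+1}^{T} \hat\rho_r - (n-1)\hat\sigma^2\, \mathrm{tr}(\mathcal{Z}_\kappa \hat P_d \mathcal{Z}_\kappa)}{\hat\sigma^2 \sqrt{2n \cdot \mathrm{tr}\big((\mathcal{Z}_\kappa \hat P_d \mathcal{Z}_\kappa)^2\big)}} \overset{a}{\sim} N(0, 1), \tag{21}$$

where $\hat P_d = I - \frac{1}{T}\sum_{l=1}^{d} \hat f_l \hat f_l^\top$ with $\hat f_l = (\hat f_l(1),\dots,\hat f_l(T))^\top$, and

$$\hat\sigma^2 = \frac{1}{(n-1)\, \mathrm{tr}\big((I - \mathcal{Z}_\kappa)^2\big)} \sum_{i=1}^{n} \big\| (I - \mathcal{Z}_\kappa)(Y_i - X_i\hat\beta) \big\|^2. \tag{22}$$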

The selection method can be described as follows: choose a significance level $\alpha$ (e.g., $\alpha = 1\%$) and begin with $H_0: d = 0$. Test whether $KSS(0) \leq z_{1-\alpha}$, where $z_{1-\alpha}$ is the $(1-\alpha)$-quantile of the standard normal distribution. If the null hypothesis is rejected, continue with $d = 1, 2, 3, \dots$ until $H_0$ cannot be rejected. The estimated dimension is then given by the smallest dimension $d$ for which $H_0$ cannot be rejected.
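The stopping rule of this sequential procedure can be sketched as a small helper. The function name and the values of the test statistic below are hypothetical and serve only to illustrate the rule; the computation of the statistic (21) itself is not shown.

```r
# Sketch only: `stat` holds hypothetical values of KSS(0), KSS(1), KSS(2), ...
select_dim_sequential <- function(stat, level = 0.01) {
  crit <- qnorm(1 - level)               # (1 - alpha)-quantile of N(0, 1)
  not_rejected <- which(stat <= crit)    # dimensions for which H0 is not rejected
  if (length(not_rejected) == 0L) {
    return(NA_integer_)                  # H0 rejected for every dimension supplied
  }
  not_rejected[1L] - 1L                  # stat[1] corresponds to d = 0
}

stat <- c(25.3, 11.8, 4.2, 0.7, -0.3)    # hypothetical statistics for d = 0, ..., 4
select_dim_sequential(stat, level = 0.01)  # 3
```

Returning `NA` when every hypothesis is rejected is a design choice of this sketch; in practice the search would be bounded by a maximal dimension such as d.max.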

The dimensionality criterion of Kneip et al. (2012) can be used for stationary as well as non-stationary factors. However, this selection procedure has a tendency to ignore factors that are weakly auto-correlated. As a result, the number of factors can be underestimated.

More robust against this kind of underestimation are the criteria of Bai and Ng (2002). The basic idea of their approach consists simply of finding a suitable penalty term $g_{nT}$, which counteracts the undesired variance reduction caused by an increasing number of factors $d$. Formally, $d$ can be obtained by minimizing the following criterion:

$$PC(l) = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \big( y_{it} - \hat y_{it}(l) \big)^2 + l\, g_{nT} \tag{23}$$

for all $l \in \{1, 2, \dots\}$, where $\hat y_{it}(l)$ is the fitted value for a given factor dimension $l$. To estimate consistently the dimension of stationary factors, Bai and Ng (2002) propose specifying $g_{nT}$ by one of the following penalty terms:

$$g_{nT}^{(PC1)} = \hat\sigma^2\, \frac{(n+T)}{nT} \log\Big( \frac{nT}{n+T} \Big), \tag{24}$$

$$g_{nT}^{(PC2)} = \hat\sigma^2\, \frac{(n+T)}{nT} \log(\min\{n, T\}), \tag{25}$$

$$g_{nT}^{(PC3)} = \hat\sigma^2\, \frac{\log(\min\{n, T\})}{\min\{n, T\}}, \quad \text{and} \tag{26}$$

$$g_{nT}^{(BIC3)} = \hat\sigma^2\, \frac{(n+T-l)}{nT} \log(nT), \tag{27}$$

where $\hat\sigma^2$ is the sample variance estimator of the residuals $\hat\varepsilon_{it}$. The proposed criteria are denoted by PC1, PC2, PC3, and BIC3 respectively. Note that only the first three criteria satisfy the requirements of Theorem 2 in Bai and Ng (2002), i.e., (i) $g_{nT} \to 0$ and (ii) $\min\{n, T\}\, g_{nT} \to \infty$, as $n, T \to \infty$. These conditions ensure consistency of the selection procedure without imposing additional restrictions on the proportional behavior of $n$ and $T$. The requirement (i) is not always fulfilled for BIC3, especially when $n$ is too large relative to $T$ or $T$ is too large relative to $n$ (e.g., $n = \exp(T)$ or $T = \exp(n)$). In practice, BIC3 seems to perform very well, especially when the idiosyncratic errors are cross-correlated.
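The PC-type criteria, together with their logarithmic IC counterparts that use the same penalties without the variance factor, can be sketched in base R. The code is an illustration under stated assumptions and is not the OptDim() implementation: the function name and the simulated data are hypothetical, the fitted values $\hat y_{it}(l)$ are taken to be the rank-$l$ principal components approximation of the data matrix, and the variance is estimated from the residuals at a maximal dimension `d_max`.

```r
# Sketch only: bai_ng_dims() is a hypothetical helper, not part of phtt.
bai_ng_dims <- function(Y, d_max = 8) {
  T_len <- nrow(Y); n <- ncol(Y)
  sv2 <- svd(Y, nu = 0, nv = 0)$d^2
  # V[l + 1] = mean squared residual after removing the first l principal components.
  V <- (sum(sv2) - c(0, cumsum(sv2)[seq_len(d_max)])) / (n * T_len)
  sigma2 <- V[d_max + 1]                 # variance estimate based on d_max factors
  l <- seq_len(d_max)
  m <- min(n, T_len)
  g1 <- (n + T_len) / (n * T_len) * log(n * T_len / (n + T_len))
  g2 <- (n + T_len) / (n * T_len) * log(m)
  g3 <- log(m) / m
  crit <- cbind(
    PC1 = V[l + 1] + l * sigma2 * g1,
    PC2 = V[l + 1] + l * sigma2 * g2,
    PC3 = V[l + 1] + l * sigma2 * g3,
    IC1 = log(V[l + 1]) + l * g1,
    IC2 = log(V[l + 1]) + l * g2,
    IC3 = log(V[l + 1]) + l * g3
  )
  apply(crit, 2, which.min)              # minimizing l for each criterion
}

# Simulated T x n panel with three strong stationary factors.
set.seed(5)
n <- 100; T_len <- 60; d_true <- 3
F0 <- matrix(rnorm(T_len * d_true), T_len, d_true)
L0 <- matrix(rnorm(n * d_true), n, d_true)
Y <- F0 %*% t(L0) + matrix(rnorm(T_len * n, sd = 0.5), T_len, n)

dims <- bai_ng_dims(Y, d_max = 8)
dims
```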

The variance estimator $\hat\sigma^2$ can be obtained by

$$\hat\sigma^2(d_{max}) = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \big( y_{it} - \hat y_{it}(d_{max}) \big)^2, \tag{28}$$

where $d_{max}$ is an arbitrary maximal dimension that is larger than $d$. This kind of variance estimation can, however, be inappropriate in some cases, especially when $\hat\sigma^2(d_{max})$ underestimates the true variance. To overcome this problem, Bai and Ng (2002) propose three additional criteria (IC1, IC2, and IC3):

$$IC(l) = \log\Big( \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \big( y_{it} - \hat y_{it}(l) \big)^2 \Big) + l\, g_{nT} \tag{29}$$

with

$$g_{nT}^{(IC1)} = \frac{(n+T)}{nT} \log\Big( \frac{nT}{n+T} \Big), \tag{30}$$

$$g_{nT}^{(IC2)} = \frac{(n+T)}{nT} \log(\min\{n, T\}), \quad \text{and} \tag{31}$$

$$g_{nT}^{(IC3)} = \frac{\log(\min\{n, T\})}{\min\{n, T\}}. \tag{32}$$

In order to improve the finite sample performance of IC1 and IC2, Alessi, Barigozzi, and Capasso (2010) propose to multiply the penalties $g_{nT}^{(IC1)}$ and $g_{nT}^{(IC2)}$ with a positive constant $c$ and apply the calibration strategy of Hallin and Liska (2007). The choice of $c$ is based on the inspection of the criterion behavior through $J$ different tuples of $n$ and $T$, i.e., $(n_1, T_1),\dots,(n_J, T_J)$, and for different values of $c$ in a pre-specified grid interval. We denote the refined criteria in our package by ABC.IC1 and ABC.IC2 respectively. Note that such a modification does not affect the asymptotic properties of the dimensionality estimator.

Under similar assumptions, Ahn and Horenstein (2013) propose selecting $d$ by maximizing the ratio of adjacent eigenvalues (or the ratio of their growth rate). The criteria are referred to as Eigenvalue Ratio (ER) and Growth Ratio (GR) and are defined as follows:

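$$ER = \frac{\hat\rho_l}{\hat\rho_{l+1}} \tag{33}$$

$$GR = \frac{\log\big( \sum_{r=l}^{T} \hat\rho_r \big/ \sum_{r=l+1}^{T} \hat\rho_r \big)}{\log\big( \sum_{r=l+1}^{T} \hat\rho_r \big/ \sum_{r=l+2}^{T} \hat\rho_r \big)}. \tag{35}$$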
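Both ratios depend only on the ordered eigenvalues and can be computed in a few lines. The following sketch uses simulated data and hypothetical names; the eigenvalues are taken from the $T \times T$ matrix $YY^\top/(nT)$, and the maximization is restricted to $l \leq$ `d_max`, both of which are assumptions of this illustration rather than a description of OptDim().

```r
# Sketch only: simulated T x n panel with three strong factors.
set.seed(6)
n <- 100; T_len <- 60; d_true <- 3; d_max <- 8
F0 <- matrix(rnorm(T_len * d_true), T_len, d_true)
L0 <- matrix(rnorm(n * d_true), n, d_true)
Y <- F0 %*% t(L0) + matrix(rnorm(T_len * n, sd = 0.5), T_len, n)

# Ordered eigenvalues rho_1 >= ... >= rho_T.
rho <- eigen(tcrossprod(Y) / (n * T_len), symmetric = TRUE,
             only.values = TRUE)$values

tail_sum <- rev(cumsum(rev(rho)))   # tail_sum[l] = sum of rho_r for r = l, ..., T
l <- seq_len(d_max)

ER <- rho[l] / rho[l + 1]
GR <- log(tail_sum[l] / tail_sum[l + 1]) / log(tail_sum[l + 1] / tail_sum[l + 2])

c(ER = which.max(ER), GR = which.max(GR))   # estimated dimensions
```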

Note that the theory of the above dimensionality criteria PC1, PC2, PC3, BIC3, IC1, IC2, IC3, ABC.IC1, ABC.IC2, KSS.C, ER, and GR is developed for stochastically bounded factors. In order to estimate the number of unit root factors, Bai (2004) proposes the following panel criteria:

$$IPC(l) = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \big( y_{it} - \hat y_{it}(l) \big)^2 + l\, g_{nT}, \tag{36}$$

where

$$g_{nT}^{(IPC1)} = \hat\sigma^2\, \frac{\log(\log(T))}{T}\, \frac{(n+T)}{nT} \log\Big( \frac{nT}{n+T} \Big), \tag{37}$$

$$g_{nT}^{(IPC2)} = \hat\sigma^2\, \frac{\log(\log(T))}{T}\, \frac{(n+T)}{nT} \log(\min\{n, T\}), \quad \text{and} \tag{38}$$

$$g_{nT}^{(IPC3)} = \hat\sigma^2\, \frac{\log(\log(T))}{T}\, \frac{(n+T-l)}{nT} \log(nT). \tag{39}$$

Alternatively, Onatski (2010) has introduced a threshold approach based on the empirical distribution of the sample covariance eigenvalues, which can be used for both stationary and non-stationary factors. The estimated dimension is obtained by

$$\hat d = \max\{ l \leq d_{max} : \hat\rho_l - \hat\rho_{l+1} \geq \delta \},$$

where $\delta$ is a positive threshold, estimated iteratively from the data. We refer to this criterion as ED, which stands for Eigenvalue Differences.

3.1. Application

The dimensionality criteria introduced above are implemented in the function OptDim(), which has the following arguments:

R> args(OptDim)

function (Obj, criteria = c("PC1", "PC2", "PC3", "BIC3", "IC1",

"IC2", "IC3", "IPC1", "IPC2", "IPC3", "ABC.IC1", "ABC.IC2",

"KSS.C", "ED", "ER", "GR"), standardize = FALSE, d.max, sig2.hat,

spar, level = 0.01, c.grid = seq(0, 5, length.out = 128),

T.seq, n.seq)

NULL

The desired criteria can be selected by one or several of the following character variables: "KSS.C", "PC1", "PC2", "PC3", "BIC3", "IC1", "IC2", "IC3", "ABC.IC1", "ABC.IC2", "ER", "GR", "IPC1", "IPC2", "IPC3", and "ED". The default significance level used for the "KSS.C" criterion is level = 0.01. The values of $d_{max}$ and $\hat\sigma^2$ can be specified externally by the arguments d.max and sig2.hat. By default, d.max is computed internally as d.max $= \lfloor \min\{\sqrt{n}, \sqrt{T}\} \rfloor$ and sig2.hat as in (22) and (28). The arguments c.grid, T.seq, and n.seq are required for computing "ABC.IC1" and "ABC.IC2". The grid interval of the calibration parameter can be externally specified with c.grid. The $J$ tuples $(n_1, T_1),\dots,(n_J, T_J)$ can be specified by using appropriate vectors in T.seq and n.seq. If these two arguments are left unspecified, the function constructs internally the following sequences: $T - C, T - C + 1, \dots, T$, and $n - C, n - C + 1, \dots, n$, for $C = \min\{\sqrt{n}, \sqrt{T}, 30\}$. Alternatively, the user can specify only the length of the sequences by giving appropriate integers to the arguments T.seq and n.seq, to control for $C$.

The input variable can be standardized by choosing standardize = TRUE. In this case, the calculation of the eigenvalues is based on the correlation matrix instead of the covariance matrix for all criteria.

As an illustration, imagine that we are interested in the estimation of the factor dimension of the variable $\ln(\text{Consumption}_{it})$ with the dimensionality criterion "PC1". The function OptDim() requires a $T \times n$ matrix as input variable.

R> OptDim(Obj = l.Consumption, criteria = "PC1")

Call: OptDim.default(Obj = l.Consumption, criteria = "PC1")

---------

Criterion of Bai and Ng (2002):

PC1

5
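To make the mechanics of such a criterion concrete, the following base-R sketch computes a PC1-type criterion for a simulated T × n matrix. It assumes the usual PC1 form of Bai and Ng (2002), i.e., the mean squared residual after removing l principal components plus a penalty l σ² ((n+T)/(nT)) log(nT/(n+T)). The simulated data, the fixed value d.max = 6, and all object names are hypothetical, and the sketch is not the internal code of OptDim().

```r
set.seed(1)
n <- 100; T.dim <- 100; d.true <- 2

# Simulate a T x n panel with d.true common factors plus noise
F.mat <- matrix(rnorm(T.dim * d.true), T.dim, d.true)
L.mat <- matrix(rnorm(n * d.true), n, d.true)
Y <- F.mat %*% t(L.mat) + matrix(rnorm(T.dim * n, sd = 0.5), T.dim, n)

# Eigenvalues of the T x T matrix Y Y' / (nT), in decreasing order
ev <- eigen(tcrossprod(Y) / (n * T.dim), symmetric = TRUE,
            only.values = TRUE)$values

# V(l): mean squared residual after removing the first l principal components
d.max <- 6
V <- sapply(0:d.max, function(l) sum(ev[seq_along(ev) > l]))

# PC1-type penalty, scaled by the residual variance at d.max
sig2.hat <- V[d.max + 1]
penalty <- sig2.hat * (n + T.dim) / (n * T.dim) * log(n * T.dim / (n + T.dim))
PC1 <- V + (0:d.max) * penalty

# Estimated dimension: the l in 0, ..., d.max that minimizes the criterion
d.hat <- (0:d.max)[which.min(PC1)]
print(d.hat)
```

With strong factors and moderate noise, as simulated here, the minimizer should coincide with the number of simulated factors.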

OptDim() offers the possibility of comparing the results of different selection procedures by giving the corresponding criteria to the argument criteria. If the argument criteria is left unspecified, OptDim() automatically compares all 16 procedures.

R> (OptDim.obj <- OptDim(Obj = l.Consumption, criteria = c("PC3", "ER",

+ "GR", "IPC1", "IPC2", "IPC3"), standardize = TRUE))

Call: OptDim.default(Obj = l.Consumption, criteria = c("PC3", "ER",

"GR", "IPC1", "IPC2", "IPC3"), standardize = TRUE)

---------

Criterion of Bai and Ng (2002):

PC3

5

--------

Criteria of Ahn and Horenstein (2013):

ER GR

3 3

---------

Criteria of Bai (2004):

IPC1 IPC2 IPC3

3 3 2


In order to help users choose the most appropriate dimensionality criterion for the data, OptDim-objects are provided with a plot()-method. This method displays, in descending order, the magnitude of the eigenvalues as a percentage of the total variance and indicates where the selected criteria detect the dimension; see Figure 3.

R> plot(OptDim.obj)

[Scree plot: proportion of variance (labels 81.8 %, 12.6 %, 0.5 %) against the ordered eigenvalues, with the selected dimensions marked: IPC3 at 2; ER, GR, IPC1, and IPC2 at 3; PC3 at 5.]

Figure 3: Scree plot produced by the plot()-method for OptDim-objects. Most of the dimensionality criteria (ER, GR, IPC1, and IPC2) suggest using the dimension d = 3.

We now come back to the KSS() function, which offers an additional way to compare the results of all dimensionality criteria and to select one of them: if the KSS()-argument consult.dim = TRUE, the results of the dimensionality criteria are printed on the R console and the user is asked to choose one of the results.

R> KSS(formula = l.Consumption ~ -1 + l.Price + l.Income, consult.dim = TRUE)

-----------------------------------------------------------

Results of Dimension-Estimations

-Bai and Ng (2002):

PC1 PC2 PC3 BIC3 IC1 IC2 IC3

5 5 5 4 5 5 5


-Bai (2004):

IPC1 IPC2 IPC3

3 3 2

-Alessi et al. (2010):

ABC.IC1 ABC.IC2

3 3

-Kneip et al. (2012):

KSS.C

6

-Onatski (2009):

ED

3

-Ahn and Horenstein (2013):

ER GR

3 6

-----------------------------------------------------------

Please, choose one of the proposed integers:

After entering a number of factors, e.g., 6, we get the following feedback:

Used dimension of unobs. factor structure is: 6

-----------------------------------------------------------

Note that the number of factors entered cannot exceed the highest estimated factor dimension (here, the maximal dimension would be 6). A higher dimension can be chosen using the argument factor.dim.

4. Panel models with stochastically bounded factors

The panel model proposed by Bai (2009) can be presented as follows:

$$y_{it} = \sum_{j=1}^{P} x_{itj}\beta_j + v_{it} + \varepsilon_{it}, \tag{40}$$

where

$$v_{it} = \sum_{l=1}^{d} \lambda_{il} f_{lt}. \tag{41}$$

Combining (40) with (41) and writing the model in matrix notation we get

$$Y_i = X_i\beta + F\Lambda_i^\top + \varepsilon_i, \tag{42}$$

where $Y_i = (y_{i1}, \ldots, y_{iT})^\top$, $X_i = (x_{i1}^\top, \ldots, x_{iT}^\top)^\top$, $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})^\top$, $\Lambda_i = (\lambda_{i1}, \ldots, \lambda_{id})$, and $F = (f_1, \ldots, f_T)^\top$ with $f_t = (f_{1t}, \ldots, f_{dt})$.

The asymptotic properties of Bai's method rely, among others, on the following assumption:

$$\frac{1}{T} F^\top F \xrightarrow{p} \Sigma_F, \quad \text{as } T \to \infty, \tag{43}$$

where $\Sigma_F$ is a fixed positive definite d × d matrix. This allows the factors to follow a deterministic time trend such as $f_t = t/T$ or to be stationary dynamic processes such that $f_t = \sum_{j=1}^{\infty} C_j e_{t-j}$, where the $e_t$ are i.i.d. zero mean stochastic components. It is, however, important to note that such an assumption rules out a large class of non-stationary factors such as I(p) processes with p ≥ 1.
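A small numerical check may help to see what assumption (43) permits. The following base-R sketch (all object names are hypothetical) compares the average of $f_t^2$ for the deterministic trend $f_t = t/T$, which settles near 1/3, with the same average for a simulated random walk, which typically keeps growing with the length of the series.

```r
T.dim <- 10000
t.idx <- seq_len(T.dim)

# Deterministic trend f_t = t/T: (1/T) * sum(f_t^2) approaches 1/3
f.trend <- t.idx / T.dim
m.trend <- mean(f.trend^2)

# Random walk (an I(1) process): the same average depends on the sample length
set.seed(1)
f.rw <- cumsum(rnorm(T.dim))
m.rw.short <- mean(f.rw[1:100]^2)   # based on the first 100 observations
m.rw.long  <- mean(f.rw^2)          # based on all 10000 observations

print(c(trend = m.trend, rw.T100 = m.rw.short, rw.T10000 = m.rw.long))
```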

4.1. Model with known number of factors d

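Bai (2009) proposes to estimate the model parameters $\beta$, $F$, and $\Lambda_i$ by minimizing the following least squares objective function:

$$S(\beta, F, \Lambda_i) = \sum_{i=1}^{n} \|Y_i - X_i\beta - F\Lambda_i^\top\|^2. \tag{44}$$

For each given $F$, the OLS estimator of $\beta$ can be obtained by

$$\hat\beta(F) = \left(\sum_{i=1}^{n} X_i^\top P_d X_i\right)^{-1}\left(\sum_{i=1}^{n} X_i^\top P_d Y_i\right), \tag{45}$$

where $P_d = I - F(F^\top F)^{-1}F^\top = I - FF^\top/T$. If $\beta$ is known, $F$ can be estimated by using the first d eigenvectors $\hat\gamma = (\hat\gamma_1, \ldots, \hat\gamma_d)$ corresponding to the first d eigenvalues of the empirical covariance matrix $\hat\Sigma = (nT)^{-1}\sum_{i=1}^{n} w_i w_i^\top$, where $w_i = Y_i - X_i\beta$. That is,

$$\hat F(\beta) = \sqrt{T}\,\hat\gamma.$$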

The idea of Bai (2009) is to start with initial values for $\beta$ or $F$ and calculate the estimators iteratively. The method requires, however, the factor dimension d to be known, which is usually not the case in empirical applications.

A feasible estimator of (45) can be obtained by using an arbitrarily large dimension $d_{\max}$ greater than d. The factor dimension can be estimated subsequently by applying the criteria of Bai and Ng (2002) to the remainder term $Y_i - X_i\hat\beta(\hat F(d_{\max}))$, as suggested by Bai (2009). This strategy can lead, however, to inefficient estimation and spurious interpretation of $\beta$ due to over-parameterization.
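The iteration between the OLS step for the slope parameters and the eigenvector step for the factors can be sketched in a few lines of base R. The example below is a minimal illustration for a known factor dimension d on simulated data. The data-generating process, the pooled OLS starting value, the stopping rule, and all object names are assumptions of this sketch and not the implementation used in phtt.

```r
set.seed(123)
n <- 60; T.dim <- 40; d <- 2
beta.true <- c(1, -0.5)

# Simulated T x n data: the first regressor is correlated with the factors
F0 <- matrix(rnorm(T.dim * d), T.dim, d)
L0 <- matrix(rnorm(n * d), n, d)
V0 <- F0 %*% t(L0)                                  # interactive effects
X  <- list(0.5 * V0 + matrix(rnorm(T.dim * n), T.dim, n),
           matrix(rnorm(T.dim * n), T.dim, n))
Y  <- beta.true[1] * X[[1]] + beta.true[2] * X[[2]] + V0 +
      matrix(rnorm(T.dim * n, sd = 0.5), T.dim, n)

# Least squares of vec(Y) on vec(X_1), vec(X_2)
ols <- function(Y, X) {
  Xmat <- sapply(X, as.vector)
  drop(solve(crossprod(Xmat), crossprod(Xmat, as.vector(Y))))
}

beta.pooled <- ols(Y, X)          # ignores the factor structure
beta <- beta.pooled               # initial value for the iteration
for (iter in 1:500) {
  # F(beta): sqrt(T) times the first d eigenvectors of sum_i w_i w_i' / (nT)
  W <- Y - beta[1] * X[[1]] - beta[2] * X[[2]]
  F.hat <- sqrt(T.dim) *
    eigen(tcrossprod(W) / (n * T.dim), symmetric = TRUE)$vectors[, 1:d]
  # beta(F): OLS after projecting out F with P_d = I - F F' / T
  P.d <- diag(T.dim) - tcrossprod(F.hat) / T.dim
  beta.new <- ols(P.d %*% Y, lapply(X, function(x) P.d %*% x))
  if (max(abs(beta.new - beta)) < 1e-8) break
  beta <- beta.new
}
print(round(rbind(pooled = beta.pooled, iterated = beta.new), 3))
```

Because the first regressor is correlated with the interactive effects, the pooled OLS estimate is biased, whereas the iterated estimate should end up close to the simulated slope parameters.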

4.2. Model with unknown number of factors d

In order to estimate d jointly with $\beta$, $F$, and $\Lambda_i$, Bada and Kneip (2014) propose to integrate a penalty term into the objective function to be globally optimized. In this case, the optimization criterion can be defined as a penalized least squares objective function of the form:

$$S(\beta, F, \Lambda_i, l) = \sum_{i=1}^{n} \|Y_i - X_i\beta - F\Lambda_i^\top\|^2 + l\,g_{nT}. \tag{46}$$


The role of the additional term $l\,g_{nT}$ is to pick up the dimension $\hat d$ of the unobserved factor structure. The penalty $g_{nT}$ can be chosen according to Bai and Ng (2002). The estimation algorithm is based on the parameter cascading strategy of Cao and Ramsay (2010), which in this case can be described as follows:

1. Minimizing (46) with respect to $\Lambda_i$ for each given $\beta$, $F$, and d, we get

$$\hat\Lambda_i^\top(\beta, F, d) = F^\top (Y_i - X_i\beta)/T. \tag{47}$$

2. Introducing (47) in (46) and minimizing with respect to $F$ for each given $\beta$ and d, we get

$$\hat F(\beta, d) = \sqrt{T}\,\hat\gamma(\beta, d), \tag{48}$$

where $\hat\gamma(\beta, d)$ is a T × d matrix that contains the first d eigenvectors corresponding to the first d eigenvalues $\rho_1, \ldots, \rho_d$ of the covariance matrix $\hat\Sigma = (nT)^{-1}\sum_{i=1}^{n} w_i w_i^\top$ with $w_i = Y_i - X_i\beta$.

3. Reintegrating (48) and (47) in (46) and minimizing with respect to $\beta$ for each given d, we get

$$\hat\beta(d) = \left(\sum_{i=1}^{n} X_i^\top X_i\right)^{-1}\left(\sum_{i=1}^{n} X_i^\top\big(Y_i - \hat F\hat\Lambda_i^\top(\hat\beta, d)\big)\right). \tag{49}$$

4. Optimizing (46) with respect to l given the results in (47), (48), and (49) allows us to select $\hat d$ as

$$\hat d = \arg\min_{l} \sum_{i=1}^{n} \|Y_i - X_i\hat\beta - \hat F\hat\Lambda_i^\top\|^2 + l\,g_{nT}, \quad \text{for all } l \in \{0, 1, \ldots, d_{\max}\}.$$

The final estimators are obtained by alternating between an inner iteration to optimize $\hat\beta(d)$, $\hat F(d)$, and $\hat\Lambda_i(d)$ for each given d and an outer iteration to select the dimension $\hat d$. The updating process is repeated in its entirety until all parameters converge. This is why the estimators are called entirely updated estimators (Eup). In order to avoid over-estimation, Bada and Kneip (2014) propose to re-scale $g_{nT}$ in each iteration stage with $\hat\sigma^2 = \sum_{i=1}^{n} \|Y_i - X_i\hat\beta - \hat F\hat\Lambda_i^\top\|^2$ instead of $\hat\sigma^2(d_{\max})$. Simulations show that such a calibration can improve the finite sample properties of the estimation method.

It is notable that the objective functions (46) and (44) are not globally convex. There is no guarantee that the iteration algorithm converges to the global optimum. Therefore, it is important to choose reasonable starting values $d_{start}$ and $\beta_{start}$. We propose to select a large dimension $d_{\max}$ and to start the iteration with the following estimate of $\beta$:

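$$\hat\beta_{start} = \left(\sum_{i=1}^{n} X_i^\top (I - GG^\top) X_i\right)^{-1}\left(\sum_{i=1}^{n} X_i^\top (I - GG^\top) Y_i\right), \tag{50}$$

where $G$ is the T × $d_{\max}$ matrix of the eigenvectors corresponding to the first $d_{\max}$ eigenvalues of the augmented covariance matrix

$$\Gamma_{Aug} = \frac{1}{nT}\sum_{i=1}^{n} (Y_i, X_i)(Y_i^\top, X_i^\top)^\top.$$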


The intuition behind these starting estimates relies on the fact that the unobserved factors cannot escape from the space spanned by the eigenvectors $G$. The projection of $X_i$ on the orthogonal complement of $G$ in (50) eliminates the effect of a possible correlation between the observed regressors and unobserved factors, which can heavily distort the value of $\beta_0$ if it is neglected. Greenaway-McGrevy, Han, and Sul (2012) give conditions under which (50) is a consistent estimator of $\beta$. In order to avoid misspecifying the model by identifying factors that exist only in $X_i$ and not in $Y_i$, Bada and Kneip (2014) recommend under-scaling the starting common factors $G_l$ that are highly correlated with $X_i$.
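The starting value (50) can be sketched for a single regressor as follows. The simulated data, the choice d.max = 4, and all object names are assumptions of this illustration. The sketch only shows how projecting on the orthogonal complement of the leading eigenvectors of the augmented covariance matrix can remove the distortion caused by regressors that are correlated with the factors.

```r
set.seed(42)
n <- 60; T.dim <- 40; d.max <- 4
beta.true <- 1

# Simulated T x n data with two factors; the regressor loads on the factors
F0 <- matrix(rnorm(T.dim * 2), T.dim, 2)
V0 <- F0 %*% t(matrix(rnorm(n * 2), n, 2))
X  <- 0.5 * V0 + matrix(rnorm(T.dim * n), T.dim, n)
Y  <- beta.true * X + V0 + matrix(rnorm(T.dim * n, sd = 0.5), T.dim, n)

# Augmented covariance matrix: (1/(nT)) * sum_i (Y_i, X_i)(Y_i, X_i)'
# which equals (Y Y' + X X') / (nT) for T x n matrices Y and X
Gamma.aug <- (tcrossprod(Y) + tcrossprod(X)) / (n * T.dim)

# G: eigenvectors belonging to the d.max largest eigenvalues
G <- eigen(Gamma.aug, symmetric = TRUE)$vectors[, 1:d.max]
M.G <- diag(T.dim) - tcrossprod(G)                  # I - G G'

# Starting value (50) for one regressor, compared with pooled OLS
beta.start  <- sum(X * (M.G %*% Y)) / sum(X * (M.G %*% X))
beta.pooled <- sum(X * Y) / sum(X * X)
print(round(c(start = beta.start, pooled = beta.pooled), 3))
```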

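According to Bai (2009), the asymptotic distribution of the slope estimator $\hat\beta(d)$ for known d is given by

$$\sqrt{nT}\,(\hat\beta(d) - \beta) \overset{a}{\sim} N(0, D_0^{-1} D_Z D_0^{-1}),$$

where $D_0 = \operatorname{plim} \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} Z_{it}^\top Z_{it}$ with $Z_i = (Z_{i1}, \ldots, Z_{iT})^\top = P_d X_i - \frac{1}{n}\sum_{k=1}^{n} P_d X_k a_{ik}$ and $a_{ik} = \Lambda_i\big(\frac{1}{n}\sum_{j=1}^{n} \Lambda_j^\top \Lambda_j\big)^{-1}\Lambda_k^\top$, and

Case 1. $D_Z = D_0\,\sigma^2$ if the errors are i.i.d. with zero mean and variance $\sigma^2$,

Case 2. $D_Z = \operatorname{plim} \frac{1}{nT}\sum_{i=1}^{n} \sigma_i^2 \sum_{t=1}^{T} Z_{it}^\top Z_{it}$, where $\sigma_i^2 = E(\varepsilon_{it}^2)$ with $E(\varepsilon_{it}) = 0$, if cross-section heteroskedasticity exists and $n/T \to 0$,

Case 3. $D_Z = \operatorname{plim} \frac{1}{nT}\sum_{i=1}^{n}\sum_{j=1}^{n} \omega_{ij} \sum_{t=1}^{T} Z_{it}^\top Z_{jt}$, where $\omega_{ij} = E(\varepsilon_{it}\varepsilon_{jt})$ with $E(\varepsilon_{it}) = 0$, if cross-section correlation and heteroskedasticity exist and $n/T \to 0$,

Case 4. $D_Z = \operatorname{plim} \frac{1}{nT}\sum_{t=1}^{T} \sigma_t^2 \sum_{i=1}^{n} Z_{it}^\top Z_{it}$, where $\sigma_t^2 = E(\varepsilon_{it}^2)$ with $E(\varepsilon_{it}) = 0$, if heteroskedasticity in the time dimension exists and $T/n \to 0$,

Case 5. $D_Z = \operatorname{plim} \frac{1}{nT}\sum_{t=1}^{T}\sum_{s=1}^{T} \rho(t, s) \sum_{i=1}^{n} Z_{it}^\top Z_{is}$, where $\rho(t, s) = E(\varepsilon_{it}\varepsilon_{is})$ with $E(\varepsilon_{it}) = 0$, if correlation and heteroskedasticity in the time dimension exist and $T/n \to 0$, and

Case 6. $D_Z = \operatorname{plim} \frac{1}{nT}\sum_{t=1}^{T}\sum_{i=1}^{n} \sigma_{it}^2 Z_{it}^\top Z_{it}$, where $\sigma_{it}^2 = E(\varepsilon_{it}^2)$ with $E(\varepsilon_{it}) = 0$, if heteroskedasticity in both time and cross-section dimensions exists with $T/n^2 \to 0$ and $n/T^2 \to 0$.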

In the presence of correlation and heteroskedasticity in panels with proportional dimensions n and T, i.e., $n/T \to c > 0$, the asymptotic distribution of $\hat\beta(d)$ will not be centered at zero. This can lead to false inference when using the usual test statistics such as the t- and $\chi^2$-statistics. To overcome this problem, Bai (2009) proposes to estimate the asymptotic bias and correct the estimator as follows:

$$\hat\beta^*(d) = \hat\beta(d) - \frac{1}{n}\hat B - \frac{1}{T}\hat C, \tag{51}$$

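where $\hat B$ and $\hat C$ are the estimators of

$$B = -\left(\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} Z_{it}^\top Z_{it}\right)^{-1} \frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{n} (X_i - V_i)^\top F \left(F^\top F\right)^{-1}$$

$$C = -\left(\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} Z_{it}^\top Z_{it}\right)^{-1} \frac{1}{n}\sum_{i=1}^{n} X_i^\top M_F\,\Omega\,F \left(F^\top F\right)^{-1}\left(\sum_{k=1}^{n} \Lambda_k^\top \Lambda_k\right)^{-1}\Lambda_i^\top$$

respectively. Here, $V_i = \frac{1}{n}\sum_{j=1}^{n} a_{ij} X_j$, $\Omega = \frac{1}{n}\sum_{k=1}^{n} \Omega_k$, and

Case 7. $\Omega_k$ is a T × T diagonal matrix with elements $\omega_{kt} = E(\varepsilon_{kt}^2)$ if heteroskedasticity in both time and cross-section dimensions exists and $n/T \to c > 0$, and

Case 8. $\Omega_k$ is a T × T matrix with elements $\Omega_{k,ts} = E(\varepsilon_{kt}\varepsilon_{ks})$ if correlation and heteroskedasticity in both time and cross-section dimensions exist and $n/T \to c > 0$.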

In a similar context, Bada and Kneip (2014) prove that estimating d jointly with the remaining model parameters does not affect the asymptotic properties of $\hat\beta(d)$. The asymptotic distribution of $\hat\beta = \hat\beta(\hat d)$ is given by

$$\sqrt{nT}\,(\hat\beta - \beta) \overset{a}{\sim} N(0, D_0^{-1} D_Z D_0^{-1})$$

under Cases 1-6, and

$$\sqrt{nT}\,(\hat\beta^* - \beta) \overset{a}{\sim} N(0, D_0^{-1} D_Z D_0^{-1})$$

under Cases 7-8, where $\hat\beta^* = \hat\beta^*(\hat d)$.

The asymptotic variance of $\hat\beta$ and the bias terms $B$ and $C$ can be estimated by replacing $F$, $\Lambda_i$, $Z_{it}$, and $\varepsilon_{it}$ with $\hat F$, $\hat\Lambda_i$, $\hat Z_{it}$, and $\hat\varepsilon_{it}$, respectively.

In the presence of serial correlation (Cases 5 and 8), consistent estimators for $D_Z$ and $C$ can be obtained by using the usual heteroskedasticity and autocorrelation (HAC) robust limiting covariance. In the presence of cross-section correlation (Case 3), $D_Z$ is estimated by $\hat D_Z = \frac{1}{mT}\sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{t=1}^{T} \hat Z_{it}^\top \hat Z_{jt}\hat\varepsilon_{it}\hat\varepsilon_{jt}$, where $m = \sqrt{n}$. If both cross-section and serial correlation exist (Case 8), we estimate the long-run covariance of $\frac{1}{\sqrt{m}}\sum_{i=1}^{m} \hat Z_{it}\hat\varepsilon_{it}$.

4.3. Application

The methods described above are implemented in the function Eup(), which takes the following arguments:

R> args(Eup)

function (formula, additive.effects = c("none", "individual",

"time", "twoways"), dim.criterion = c("PC1", "PC2", "PC3",

"BIC3", "IC1", "IC2", "IC3", "IPC1", "IPC2", "IPC3"), d.max = NULL,

sig2.hat = NULL, factor.dim = NULL, double.iteration = TRUE,

start.beta = NULL, max.iteration = 500, convergence = 1e-06,

restrict.mode = c("restrict.factors", "restrict.loadings"),

...)

NULL

The arguments additive.effects, d.max, sig2.hat, and restrict.mode have the same roles as in KSS(); see Section 2.2. The argument dim.criterion specifies the dimensionality criterion to be used if factor.dim is left unspecified and defaults to dim.criterion = "PC1".

Setting the argument double.iteration = FALSE may speed up computations, because the updates of $\hat d$ will be done simultaneously with $\hat F$ without waiting for their inner convergences. However, in this case, the convergence of the parameters is less stable than in the default setting.

The argument start.beta allows us to give a vector of starting values for the slope parameters $\beta_{start}$. The maximal number of iterations and the convergence condition can be controlled by max.iteration and convergence.


In our application, we take first-order differences of the observed time series. This is because some factors show temporal trends, which can violate the stationarity condition (43); see Figure 2. We consider the following modified cigarettes model:

$$\Delta\ln(\text{Consumption}_{it}) = \beta_1\,\Delta\ln(\text{Price}_{it}) + \beta_2\,\Delta\ln(\text{Income}_{it}) + e_{it}, \quad \text{with } e_{it} = \sum_{l=1}^{d} \lambda_{il} f_{lt} + \varepsilon_{it},$$

where $\Delta x_t = x_t - x_{t-1}$. To keep the notation simple, we use the same notation for the unobserved time-varying individual effects $v_{it} = \sum_{l=1}^{d} \lambda_{il} f_{lt}$ as above in (20). The $\Delta$-transformation can be easily performed in R using the standard diff()-function as follows:

R> d.l.Consumption <- diff(l.Consumption)

R> d.l.Price <- diff(l.Price)

R> d.l.Income <- diff(l.Income)

As previously mentioned for the KSS()-function, the formula argument of the Eup()-function takes balanced panel variables as T × n dimensional matrices, where the number of rows has to be equal to the temporal dimension T and the number of columns has to be equal to the individual dimension n.
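Because the panel variables are stored as T × n matrices, diff() differences each column over time and returns a matrix with one row less. The following sketch illustrates this with a small hypothetical matrix (the numbers are made up for illustration).

```r
# Hypothetical T x n panel matrix: T = 5 periods (rows), n = 3 units (columns)
x <- matrix(c( 1,  2,  4,  7, 11,
              10, 10, 12, 12, 15,
               0,  1,  1,  3,  3), nrow = 5, ncol = 3)

# diff() on a matrix takes differences along the rows, i.e., over time
d.x <- diff(x)

print(dim(d.x))   # one period is lost: (T - 1) x n
print(d.x[, 1])   # first differences of the first unit
```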

R> (Cigar.Eup <- Eup(d.l.Consumption ~ -1 + d.l.Price + d.l.Income,

+ dim.criterion = "PC3"))

Call:

Eup.default(formula = d.l.Consumption ~ -1 + d.l.Price + d.l.Income,

dim.criterion = "PC3")

Coeff(s) of the Observed Regressor(s) :

d.l.Price d.l.Income

-0.3140143 0.159392


Dimension of the Unobserved Factors: 5

Number of iterations: 55

Inferences about the slope parameters can be obtained by using the method summary(). The type of correlation and heteroskedasticity in the idiosyncratic errors can be specified by choosing one of the corresponding Cases 1-8 described above using the argument error.type = c(1, 2, 3, 4, 5, 6, 7, 8).

In the presence of serial correlation (Cases 5 and 8), the kernel weights required for estimating the long-run covariance can be specified externally by giving a vector of weights in the argument kernel.weights. By default, the function internally uses the linearly decreasing weights of Newey and West (1987) and a truncation at $\lfloor \min\{\sqrt{n}, \sqrt{T}\} \rfloor$. If Case 7 or 8 is chosen, the method summary() calculates the realization of the bias corrected estimators and gives appropriate inferences. The bias corrected coefficients can be obtained by applying the method coef() to the object produced by summary().
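For orientation, linearly decreasing weights with the truncation described above can be generated as follows. The sketch assumes the common Bartlett form $1 - j/(L+1)$ for lags $j = 1, \ldots, L$ and illustrative panel dimensions. Whether the argument kernel.weights expects exactly this layout is an assumption that should be checked against the package documentation.

```r
n <- 46; T.dim <- 30                      # illustrative panel dimensions
L <- floor(min(sqrt(n), sqrt(T.dim)))     # truncation lag

# Linearly decreasing (Bartlett-type) weights for the lags 1, ..., L
kernel.weights <- 1 - seq_len(L) / (L + 1)
print(round(kernel.weights, 3))
```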

R> summary(Cigar.Eup)

Call:

Eup.default(formula = d.l.Consumption ~ -1 + d.l.Price + d.l.Income,

dim.criterion = "PC3")

Residuals:


-0.147000 -0.013700 0.000889 0.014100 0.093300

Slope-Coefficients:

Estimate Std.Err Z value Pr(>z)

d.l.Price -0.3140 0.0227 -13.90 < 2.2e-16 ***

d.l.Income 0.1590 0.0358 4.45 8.39e-06 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Dimension of the Unobserved Factors: 5

Residual standard error: 0.02804 on 957 degrees of freedom,

R-squared: 0.7033

The summary output reports that "PC3" detects 5 common factors. The effect of the differenced log-real prices for cigarettes on the differenced log-sales is negative and amounts to −0.31. The estimated effect of the differenced real disposable log-income per capita is 0.16.

The estimated factors $\hat f_{lt}$ as well as the individual effects $\hat v_{it}$ can be plotted using the plot()-method for summary.Eup-objects. The corresponding graphics are shown in Figure 4.

R> plot(summary(Cigar.Eup))

5. Models with additive and interactive unobserved effects

Even though the classical additive "individual", "time", and "twoways" effects can be absorbed by the factor structure, there are good reasons to model them explicitly. On the one hand, if there are such effects in the true model, then neglecting them will result in inefficient estimators; see Bai (2009). On the other hand, additive effects can be very useful for interpretation.


[Figure 4: two line plots over Time (0 to 30); the curves in the left panel are labeled 1 to 5.]

Figure 4: Left Panel: Estimated factors $\hat f_{1t}, \ldots, \hat f_{5t}$. Right Panel: Estimated time-varying individual effects $\hat v_{1t}, \ldots, \hat v_{nt}$.

Consider now the following model:

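$$y_{it} = \mu + \alpha_i + \theta_t + x_{it}^\top\beta + \nu_{it} + \varepsilon_{it} \tag{52}$$

with

$$\nu_{it} = \begin{cases} v_{it} = \sum_{l=1}^{d} \lambda_{il} f_{lt}, & \text{for the model of Bai (2009),} \\ v_i(t) = \sum_{l=1}^{d} \lambda_{il} f_l(t), & \text{for the model of Kneip et al. (2012),} \end{cases}$$

where $\alpha_i$ are time-constant individual effects and $\theta_t$ is a common time-varying effect.

In order to ensure identification of the additional additive effects $\alpha_i$ and $\theta_t$, we need the following further restrictions:

(d) $\sum_{i=1}^{n} \lambda_{il} = 0$ for all $l \in \{1, \ldots, d\}$

(e) $\sum_{t=1}^{T} f_{lt} = 0$ for all $l \in \{1, \ldots, d\}$

(f) $\sum_{i=1}^{n} \alpha_i = 0$

(g) $\sum_{t=1}^{T} \theta_t = 0$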

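By using the classical within-transformations on the observed variables, we can eliminate the additive effects $\alpha_i$ and $\theta_t$, such that

$$\dot y_{it} = \dot x_{it}^\top\beta + \nu_{it} + \dot\varepsilon_{it},$$

where $\dot y_{it} = y_{it} - \frac{1}{T}\sum_{t=1}^{T} y_{it} - \frac{1}{n}\sum_{i=1}^{n} y_{it} + \frac{1}{nT}\sum_{t=1}^{T}\sum_{i=1}^{n} y_{it}$, $\dot x_{it} = x_{it} - \frac{1}{T}\sum_{t=1}^{T} x_{it} - \frac{1}{n}\sum_{i=1}^{n} x_{it} + \frac{1}{nT}\sum_{t=1}^{T}\sum_{i=1}^{n} x_{it}$, and $\dot\varepsilon_{it} = \varepsilon_{it} - \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{it} - \frac{1}{n}\sum_{i=1}^{n} \varepsilon_{it} + \frac{1}{nT}\sum_{t=1}^{T}\sum_{i=1}^{n} \varepsilon_{it}$.

Note that Restrictions (d) and (e) ensure that the transformation does not affect the time-varying individual effects $\nu_{it}$. The parameters $\mu$, $\alpha_i$, and $\theta_t$ can be easily estimated in a second step once an estimate of $\beta$ is obtained. Because of Restrictions (d) and (e), the solution has the same form as in the classical fixed effects model.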
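The two-way within transformation and the second-step estimates of the additive effects can be sketched in base R for a T × n matrix. For simplicity, the example uses a hypothetical variable without regressors, so that the second step operates on the variable itself; with regressors it would be applied to the variable net of the estimated regression part.

```r
set.seed(7)
T.dim <- 6; n <- 4
y <- matrix(rnorm(T.dim * n), T.dim, n)    # hypothetical T x n panel matrix

# y_it - (mean over t of unit i) - (mean over i at time t) + (overall mean)
y.dot <- y - matrix(colMeans(y), T.dim, n, byrow = TRUE) -
             matrix(rowMeans(y), T.dim, n) + mean(y)

# Second step: additive effects for a model without regressors
mu.hat    <- mean(y)
alpha.hat <- colMeans(y) - mu.hat          # sums to zero over i, as in (f)
theta.hat <- rowMeans(y) - mu.hat          # sums to zero over t, as in (g)

print(round(colMeans(y.dot), 12))          # unit means are removed
print(round(rowMeans(y.dot), 12))          # time means are removed
```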

The parameters $\beta$ and $\nu_{it}$ can be estimated by the estimation procedures introduced above. All possible variants of model (52) are implemented in the functions KSS() and Eup(). The appropriate model can be specified by the argument additive.effects = c("none", "individual", "time", "twoways"):

"none" $y_{it} = \mu + x_{it}^\top\beta + \nu_{it} + \varepsilon_{it}$

"individual" $y_{it} = \mu + \alpha_i + x_{it}^\top\beta + \nu_{it} + \varepsilon_{it}$

"time" $y_{it} = \mu + \theta_t + x_{it}^\top\beta + \nu_{it} + \varepsilon_{it}$

"twoways" $y_{it} = \mu + \alpha_i + \theta_t + x_{it}^\top\beta + \nu_{it} + \varepsilon_{it}$.

The presence of $\mu$ can be controlled by -1 in the formula-object: a formula with -1 refers to a model without intercept. However, for identification purposes, if a twoways model is specified, a -1 in the formula will be ignored.

As an illustration, we continue with the application of the KSS()-function in Section 2. The left panel of Figure 2 shows that the first common factor is nearly time-invariant. This motivates us to augment model (20) with time-constant additive effects $\alpha_i$. In this case, it is convenient to use an intercept $\mu$, which yields the following model:

$$\ln(\text{Consumption}_{it}) = \mu + \beta_1\ln(\text{Price}_{it}) + \beta_2\ln(\text{Income}_{it}) + \alpha_i + v_i(t) + \varepsilon_{it}, \tag{53}$$

where $v_i(t) = \sum_{l=1}^{d} \lambda_{il} f_l(t)$.

The estimation of the augmented model (53) can be done using the following lines of code.

R> Cigar2.KSS <- KSS(formula = l.Consumption ~ l.Price + l.Income,

+ additive.effects = "individual")

R> (Cigar2.KSS.summary <- summary(Cigar2.KSS))

Call:

KSS.default(formula = l.Consumption ~ l.Price + l.Income,

additive.effects = "individual")

Residuals:


-0.11 -0.01 0.00 0.01 0.12

Slope-Coefficients:

Estimate StdErr z.value Pr(>z)

(Intercept) 4.0500 0.1760 23.10 < 2.2e-16 ***

l.Price -0.2600 0.0222 -11.70 < 2.2e-16 ***

l.Income 0.1570 0.0381 4.11 3.88e-05 ***

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Additive Effects Type: individual

Used Dimension of the Unobserved Factors: 5

Residual standard error: 0.000734 on 951 degrees of freedom

R-squared: 0.99

Again, the plot() method provides a useful visualization of the results.

R> plot(Cigar2.KSS.summary)

The "individual"-transformation of the data does not affect the estimation of the slope parameters, but reduces the estimated dimension from d = 6 to d = 5. The remaining five common factors $f_1, \ldots, f_5$ correspond to those of model (20); see the middle panel of Figure 5. The estimated time-constant state-specific effects $\alpha_i$ are shown in the left plot of Figure 5. The extraction of the $\alpha_i$'s from the factor structure yields a denser set of time-varying individual effects $v_i$ shown in the right panel of Figure 5.

[Figure 5: three panels plotted against Time (0 to 30), including one titled "Additive Individual Effects".]

Figure 5: Left Panel: Estimated time-constant state-specific effects $\alpha_1, \ldots, \alpha_n$. Middle Panel: Estimated common factors $f_1(t), \ldots, f_5(t)$. Right Panel: Estimated time-varying individual effects $v_1(t), \ldots, v_n(t)$.

5.1. Specification tests

Model specification is an important step for any empirical analysis. The phtt package is equipped with two types of specification tests: the first is a Hausman-type test appropriate for the model of Bai (2009); see Section 5.1.1. The second one examines the existence of a factor structure in Bai's model as well as in the model of Kneip et al. (2012); see Section 5.1.2.


Testing the sufficiency of classical additive effects

For the case in which the estimated number of factors amounts to one or two ($1 \le \hat d \le 2$), it is interesting to check whether or not these factors can be interpreted as classical "individual", "time", or "twoways" effects. Bai (2009) considers the following testing problem:

$H_0$: $v_{it} = \alpha_i + \theta_t$

$H_1$: $v_{it} = \sum_{l=1}^{2} \lambda_{il} f_{lt}$

The estimator of the model with factor structure, as described in Section 4, is consistent under both hypotheses. However, it is less efficient under $H_0$ than the classical within estimator, while the latter is inconsistent under $H_1$ if $x_{it}$ and $v_{it}$ are correlated. These conditions are favorable for applying the Hausman test:

$$J_{Bai} = nT\,\big(\hat\beta - \hat\beta_{within}\big)^\top \Delta^{-1} \big(\hat\beta - \hat\beta_{within}\big) \overset{a}{\sim} \chi^2_P, \tag{54}$$

where $\hat\beta_{within}$ is the classical within least squares estimator, $\Delta$ is the asymptotic variance of $\sqrt{nT}\,\big(\hat\beta - \hat\beta_{within}\big)$, P is the vector-dimension of $\beta$, and $\chi^2_P$ is the $\chi^2$-distribution with P degrees of freedom.

The null hypothesis $H_0$ can be rejected if $J_{Bai} > \chi^2_{P,1-\alpha}$, where $\chi^2_{P,1-\alpha}$ is the $(1-\alpha)$-quantile of the $\chi^2$ distribution with P degrees of freedom.
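The decision rule can be illustrated with a few lines of base R. All numbers below (the two coefficient vectors, the variance matrix, and the panel dimensions) are hypothetical values made up for this sketch; they are not estimates from the cigarette data.

```r
# Hypothetical inputs (made up for illustration)
n <- 46; T.dim <- 29; P <- 2
beta.factor <- c(-0.31, 0.16)          # estimator with factor structure
beta.within <- c(-0.29, 0.17)          # classical within estimator
Delta <- matrix(c(0.9, 0.1,
                  0.1, 1.4), 2, 2)     # variance of sqrt(nT) * difference

# Hausman-type statistic: nT * (b1 - b2)' Delta^{-1} (b1 - b2)
diff.beta <- beta.factor - beta.within
J.Bai <- n * T.dim * drop(t(diff.beta) %*% solve(Delta, diff.beta))

# Reject H0 if the statistic exceeds the (1 - alpha)-quantile of chi^2_P
alpha  <- 0.05
crit   <- qchisq(1 - alpha, df = P)
reject <- J.Bai > crit
print(c(J = J.Bai, crit = crit))
print(reject)
```

Using solve(Delta, diff.beta) instead of an explicit inverse keeps the computation numerically stable when the variance matrix is close to singular.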

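Under i.i.d. errors, $J_{Bai}$ can be calculated by replacing $\Delta$ with its consistent estimator

$$\hat\Delta = \left(\left(\frac{1}{nT}\sum_{i=1}^{n} \hat Z_i^\top \hat Z_i\right)^{-1} - \left(\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}^\top x_{it}\right)^{-1}\right)\hat\sigma^2, \tag{55}$$

where

$$\hat\sigma^2 = \frac{1}{nT - (n + T)\hat d - P + 1}\sum_{i=1}^{n}\sum_{t=1}^{T}\Big(y_{it} - x_{it}^\top\hat\beta - \sum_{l=1}^{\hat d} \hat\lambda_{il}\hat f_{lt}\Big)^2. \tag{56}$$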

The residual variance estimator $\hat\sigma^2$ is chosen here because it is supposed to be consistent under the null as well as the alternative hypothesis. The idea behind this trick is to avoid negative definiteness of $\hat\Delta$. But notice that even with this construction, the possibility of getting a negative definite variance estimator cannot be excluded. As an illustration, consider the case in which the true number of factors is greater than the number of factors used under the alternative hypothesis, i.e., the true d > 2. In such a case, the favorable conditions for applying the test can be violated, since the iterated least squares estimator $\hat\beta$ is computed with d ≤ 2 and can be inconsistent under both hypotheses. To avoid such a scenario, we recommend that the user calculate $\hat\beta$ with a large dimension $d_{\max}$ instead of d ≤ 2.

The test is implemented in the function checkSpecif(), which takes the following arguments:

R> checkSpecif(obj1, obj2, level = 0.05)

The argument level is used to specify the significance level. The arguments obj1 and obj2 both take objects of class Eup produced by the function Eup():

obj1 Takes an Eup-object from an estimation with "individual", "time", or "twoways" effects and a factor dimension equal to d = 0; specified as factor.dim = 0.

obj2 Takes an Eup-object from an estimation with "none"-effects and a large factor dimension $d_{\max}$; specified with the argument factor.dim.

If the test statistic is negative (due to the negative definiteness of $\hat\Delta$), checkSpecif() prints an error message.

R> twoways.obj <- Eup(d.l.Consumption ~ -1 + d.l.Price + d.l.Income,

+ factor.dim = 0, additive.effects = "twoways")

R> not.twoways.obj <- Eup(d.l.Consumption ~ -1 + d.l.Price + d.l.Income,

+ factor.dim = 2, additive.effects = "none")

R> checkSpecif(obj1 = twoways.obj, obj2 = not.twoways.obj, level = 0.01)

Error in checkSpecif(obj1 = twoways.obj, obj2 = not.twoways.obj,

level = 0.01):

The assumptions of the test are not fulfilled.

The (unobserved) true number of factors is probably greater than 2.

Notice that the Hausman test of Bai (2009) assumes the within estimator to be inconsistent under the alternative hypothesis, which requires $x_{it}$ to be correlated with $v_{it}$. If this assumption is violated, the test can suffer from low power to reject the null hypothesis, since the within estimator becomes consistent under both hypotheses.

Bai (2009) discusses in his supplementary material another way to check whether a classical panel data model with fixed additive effects is sufficient to describe the data. His idea consists of estimating the factor dimension after eliminating the additive effects as described in Section 5. If the obtained estimate of d is zero, the additive model can be considered a reasonable alternative to the model with factor structure. But note that this procedure cannot be considered a formal testing procedure, since information about the significance level of the decision is not provided.

An alternative test for the sufficiency of a classical additive effects model can be obtained by adapting the test proposed by Kneip et al. (2012), as described in the following section.

Testing the existence of common factors

This section is concerned with testing the existence of common factors. In contrast to the Hausman type statistic discussed above, the goal of this test is not merely to decide which model specification is more appropriate for the data, but rather to test in general the existence of common factors beyond the possible presence of additional classical "individual", "time", or "twoways" effects in the model.

This test relies on using the dimensionality criterion proposed by Kneip et al. (2012) to test the following hypothesis after eliminating possible additive "individual", "time", or "twoways" effects:

$H_0$: d = 0

$H_1$: d > 0


Under $H_0$ the slope parameters $\beta$ can be estimated by the classical within estimation method. In this simple case, the dimensionality test of Kneip et al. (2012) can be reduced to the following test statistic:

$$J_{KSS} = \frac{n\,\operatorname{tr}(\hat\Sigma_w) - (n - 1)(T - 1)\hat\sigma^2}{\sqrt{2n(T - 1)}\,\hat\sigma^2} \overset{a}{\sim} N(0, 1),$$

where $\hat\Sigma_w$ is the covariance matrix of the within residuals. The reason for this simplification is that under $H_0$ there is no need for smoothing, which allows us to set κ = 0.

We reject $H_0$: d = 0 at a significance level α if $J_{KSS} > z_{1-\alpha}$, where $z_{1-\alpha}$ is the $(1-\alpha)$-quantile of the standard normal distribution. It is important to note that the performance of the test depends heavily on the accuracy of the variance estimator $\hat\sigma^2$. We propose to use the variance estimators (15) or (56), which are consistent under both hypotheses as long as $\hat d$ is greater than the unknown dimension d. Internally, the test procedure sets $\hat d$ = d.max and $\hat\sigma^2$ as in (56).
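The following base-R sketch illustrates the logic of this test on simulated data. It rests on several assumptions that are not taken from the package: the covariance matrix of the within residuals is normalized as $(1/n)\sum_i w_i w_i^\top$, the scaling term in the denominator is $\sqrt{2n(T-1)}$, there are no regressors, and the variance is passed in as a known value (the true noise variance of the simulation, which is unavailable in practice). All function and object names are hypothetical.

```r
set.seed(99)
n <- 50; T.dim <- 40

# Two-way within transformation of a T x n matrix
within2 <- function(y) {
  y - matrix(colMeans(y), nrow(y), ncol(y), byrow = TRUE) -
      matrix(rowMeans(y), nrow(y), ncol(y)) + mean(y)
}

# Test statistic; assumes Sigma_w = (1/n) * sum_i w_i w_i', so that
# n * tr(Sigma_w) is the total sum of squared within residuals
J.KSS <- function(w, sig2) {
  n <- ncol(w); T.dim <- nrow(w)
  (sum(w^2) - (n - 1) * (T.dim - 1) * sig2) /
    (sqrt(2 * n * (T.dim - 1)) * sig2)
}

noise    <- matrix(rnorm(T.dim * n), T.dim, n)     # error variance 1
fac.part <- tcrossprod(rnorm(T.dim), rnorm(n))     # one common factor
J0 <- J.KSS(within2(noise), sig2 = 1)              # H0 holds
J1 <- J.KSS(within2(noise + fac.part), sig2 = 1)   # H0 is violated

crit <- qnorm(1 - 0.01)                            # critical value at level 0.01
print(round(c(J0 = J0, J1 = J1, crit = crit), 2))
```

The critical value qnorm(0.99) is approximately 2.33. The sketch also makes the sensitivity noted above tangible: replacing sig2 = 1 by a value that is too small inflates the statistic even when no factor is present.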

This test can be performed for Eup- as well as for KSS-objects by using the function checkSpecif() and leaving the second argument obj2 unspecified. In the following, we apply the test for both models:

For the model of Bai (2009):

R> Eup.obj <- Eup(d.l.Consumption ~ -1 + d.l.Price + d.l.Income,

+ additive.effects = "twoways")

R> checkSpecif(Eup.obj, level = 0.01)

----------------------------------------------

Testing the Presence of Interactive Effects

Test of Kneip, Sickles, and Song (2012)

----------------------------------------------

H0: The factor dimension is equal to 0.

Test-Statistic p-value crit.-value sig.-level

13.29 0.00 2.33 0.01

For the model of Kneip et al. (2012):

R> KSS.obj <- KSS(l.Consumption ~ -1 + l.Price + l.Income,

+ additive.effects = "twoways")

R> checkSpecif(KSS.obj, level = 0.01)

----------------------------------------------

Testing the Presence of Interactive Effects

Test of Kneip, Sickles, and Song (2012)

----------------------------------------------

H0: The factor dimension is equal to 0.

Test-Statistic p-value crit.-value sig.-level

104229.55 0.00 2.33 0.01


The null hypothesis H0: d = 0 can be rejected for both models at a significance level α = 0.01.

6. Interpretation

This section is intended to outline an exemplary interpretation of the panel model (53), which is estimated by the function KSS() in Section 5. The interpretation of models estimated by the function Eup() can be done accordingly. For convenience, we restate model (53):

$$\ln(\text{Consumption}_{it}) = \mu + \beta_1\ln(\text{Price}_{it}) + \beta_2\ln(\text{Income}_{it}) + \alpha_i + v_i(t) + \varepsilon_{it},$$

where $v_i(t) = \sum_{l=1}^{d} \lambda_{il} f_l(t)$.

A researcher who chooses the panel models proposed by Kneip et al. (2012) or Bai (2009) will probably find them attractive because of their ability to control for very general forms of unobserved heterogeneity. Beyond this, a further great advantage of these models is that the time-varying individual effects $v_i(t)$ provide a valuable source of information about the differences between the individuals i. These differences are often of particular interest as, e.g., in the literature on stochastic frontier analysis.

The left panel of Figure 5 shows that the different states i have considerably different time-constant levels $\alpha_i$ of cigarette consumption. A classical further econometric analysis could be to regress the additive individual effects $\alpha_i$ on other time-constant variables, such as the general population compositions, the cigarette taxes, etc.

The middle panel of Figure 5 shows the five estimated common factors $f_1(t), \ldots, f_5(t)$. It is good practice to start the interpretation of the individual common factors with an overview of their importance in describing the differences between the $v_i(t)$'s, which is reflected in the variances of the individual loadings parameters $\lambda_{il}$. A convenient summary is the share of the variance of each individual loadings parameter in the total variance of the loadings parameters,

$$\texttt{coef(Cigar2.KSS)\$Var.shares.of.loadings.param[l]} = V(\lambda_{il}) \Big/ \sum_{k=1}^{d} V(\lambda_{ik}),$$

which is shown for all common factors $f_1(t), \ldots, f_5(t)$ in the following table:

Common Factor    Share of total variance of vi(t)

f1(t)            coef(Cigar2.KSS)$Var.shares.of.loadings.param[1] = 66.32%

Table 1: List of the variance shares of the common factors f1(t), . . . , f5(t).

The values in Table 1 suggest focusing on the first two common factors, which together explain about 90% of the total variance of the time-varying individual effects vi(t).
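The variance shares can also be computed by hand from a matrix of estimated loadings. The sketch below uses a simulated n × d matrix as a stand-in for the estimated loadings (the matrix and its scale are hypothetical); it only illustrates the ratio of each column variance to the sum of all column variances.

```r
set.seed(3)
n <- 46

# Hypothetical n x 5 matrix of individual loadings with decreasing spread
Lambda <- sapply(c(4, 2, 1, 0.5, 0.25), function(s) rnorm(n, sd = s))

# V(lambda_il) / sum_k V(lambda_ik) for l = 1, ..., d
col.vars   <- apply(Lambda, 2, var)
var.shares <- col.vars / sum(col.vars)

print(round(100 * var.shares, 2))   # shares in percent
print(sum(var.shares))              # the shares add up to one
```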


The first two common factors

coef(Cigar2.KSS)$Common.factors[,1] = f1(t) and

coef(Cigar2.KSS)$Common.factors[,2] = f2(t)

are plotted as black and red lines in the middle panel of Figure 5. Figure 6 visualizes the differences of the time-varying individual effects $v_i(t)$ in the direction of the first common factor (i.e., $\lambda_{i1} f_1(t)$) and in the direction of the second common factor (i.e., $\lambda_{i2} f_2(t)$). As for the time-constant individual effects $\alpha_i$, a further econometric analysis could be to regress the individual loadings parameters $\lambda_{i1}$ and $\lambda_{i2}$ on other explanatory time-constant variables.

[Figure 6, two panels. Left panel title: "Variance of time-varying indiv. effects in direction of the 1. common factor". Right panel title: "Variance of time-varying indiv. effects in direction of the 2. common factor". Horizontal axis in both panels: Time.]

Figure 6: Left Panel: Visualization of the differences of the time-varying individual effects vi(t) in the direction of the first factor f1(t) (i.e., λi1f1(t)). Right Panel: Visualization of the differences of the time-varying individual effects vi(t) in the direction of the second factor f2(t) (i.e., λi2f2(t)).

Generally, for both models proposed by Kneip et al. (2012) and Bai (2009), the time-varying individual effects

\[
\nu_{it} = \sum_{l=1}^{d} \lambda_{il} f_{lt}
\]

can be interpreted as is usually done in the literature on factor models. An important topic that is not covered in this section is the rotation of the common factors. Often, the common factors fl can be interpreted economically only after the application of an appropriate rotation scheme for the set of factors f1, . . . , fd. The latter can be done, e.g., using the function varimax() from the stats package. Alternatively, many other rotation schemes can be found in the GPArotation package (R Core Team (2014), Bernaards and Jennrich (2005)). Sometimes, it is also preferable to standardize the individual loadings parameters instead of the common factors, as is done, e.g., in Ahn, Lee, and Schmidt (2001). This can be done by choosing restrict.mode = c("restrict.loadings") in the functions KSS() and Eup(), respectively.
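A rotation with varimax() can be sketched as follows. The example is an illustration under assumptions: the factor and loadings matrices are simulated, and varimax() is applied to the loadings, with the same orthogonal rotation carried over to the factors (applying it to the factor matrix instead is equally possible). Because the rotation matrix is orthogonal, the fitted effects remain unchanged.

```r
# Varimax rotation of a factor structure; inputs are simulated stand-ins.
set.seed(4)
n <- 46                          # number of units (assumed)
T_len <- 30                      # number of time points (assumed)
d <- 3                           # number of factors (assumed)

F_hat <- matrix(rnorm(T_len * d), nrow = T_len)   # T x d factors
Lambda_hat <- matrix(rnorm(n * d), nrow = n)      # n x d loadings

# varimax() returns the rotated loadings and the orthogonal rotation matrix
rot <- stats::varimax(Lambda_hat, normalize = FALSE)
R <- rot$rotmat
Lambda_rot <- unclass(rot$loadings)   # equals Lambda_hat %*% R
F_rot <- F_hat %*% R                  # rotate the factors with the same R

# The fitted time-varying effects are invariant to the rotation
fit_before <- F_hat %*% t(Lambda_hat)
fit_after <- F_rot %*% t(Lambda_rot)

print(max(abs(fit_before - fit_after)))
```

Only the interpretation of the individual factors changes after rotation; the sum over all d components stays the same.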

7. Summary

This paper introduces the R package phtt for the new class of panel models proposed by Bai (2009) and Kneip et al. (2012). The two main functions of the package are the Eup()-function for the estimation procedure proposed in Bai (2009) and the KSS()-function for the estimation procedure proposed in Kneip et al. (2012). Both main functions are supported by the usual print()-, summary()-, plot()-, coef()- and residuals()-methods. While parts of the method of Bai (2009) are available in commercial software packages, the estimation procedure proposed by Kneip et al. (2012) is not available elsewhere. A further notable feature of our phtt package is the OptDim()-function, which provides easy access to many different dimensionality criteria proposed in the literature on factor models. The usage of the functions is demonstrated by a real data application.

8. Acknowledgment

The authors wish to thank the referees for their many helpful comments and suggestions that greatly improved the paper. The work of Dominik Liebl was supported by the IAP Research Network P7/06 of the Belgian State (Belgian Science Policy).


References

Ahn SC, Lee YH, Schmidt P (2001). “GMM Estimation of Linear Panel Data Models with Time-Varying Individual Effects.” Journal of Econometrics, 101(2), 219–255.

Ahn SC, Horenstein AR (2013). “Eigenvalue Ratio Test for the Number of Factors.” Econometrica, 81(3), 1203–1227.

Ahn SC, Lee YH, Schmidt P (2013). “Panel Data Models with Multiple Time-Varying Individual Effects.” Journal of Econometrics, 174(1), 1–14.

Alessi L, Barigozzi M, Capasso M (2010). “Improved Penalization for Determining the Number of Factors in Approximate Factor Models.” Statistics & Probability Letters, 80(23-24), 1806–1813.

Bada O, Kneip A (2014). “Parameter Cascading for Panel Models with Unknown Number of Unobserved Factors: An Application to the Credit Spread Puzzle.” Computational Statistics & Data Analysis (forthcoming).

Bada O, Liebl D (2012). phtt: Panel Data Analysis with Heterogeneous Time Trends. R package version 2.07, URL https://r-forge.r-project.org/R/?group_id=730.

Bai J (2004). “Estimating Cross-Section Common Stochastic Trends in Nonstationary Panel Data.” Journal of Econometrics, 122(1), 137–183.

Bai J (2009). “Panel Data Models with Interactive Fixed Effects.” Econometrica, 77(4), 1229–1279.

Bai J, Kao C, Ng S (2009). “Panel Cointegration with Global Stochastic Trends.” Journal of Econometrics, 149(1), 82–99.

Bai J, Ng S (2002). “Determining the Number of Factors in Approximate Factor Models.” Econometrica, 70(1), 191–221.

Baltagi B (2005). Econometric Analysis of Panel Data. Third edition. John Wiley & Sons.

Baltagi B, Levin D (1986). “Estimating Dynamic Demand for Cigarettes Using Panel Data: The Effects of Bootlegging, Taxation and Advertising Reconsidered.” The Review of Economics and Statistics, pp. 148–155.

Baltagi B, Li D (2004). Prediction in the Panel Data Model with Spatial Correlation. First edition. Springer-Verlag.

Bates D, Maechler M, Bolker B (2012). lme4: Linear Mixed-Effects Models Using S4 Classes. R package version 0.999375-42, URL http://CRAN.R-project.org/package=lme4.

Bernaards CA, Jennrich RI (2005). “Gradient Projection Algorithms and Software for Arbitrary Rotation Criteria in Factor Analysis.” Educational and Psychological Measurement, 65, 676–696.

Cao J, Ramsay J (2010). “Linear Mixed-Effects Modeling by Parameter Cascading.” Journal of the American Statistical Association, 105(489), 365–374.



Craven P, Wahba G (1978). “Smoothing Noisy Data with Spline Functions: Estimating the Correct Degree of Smoothing by the Method of Generalized Cross-Validation.” Numerische Mathematik, 31(4), 377–403.

Croissant Y, Millo G (2008). “Panel Data Econometrics in R: The plm Package.” Journal of Statistical Software, 27(2), 1–43. URL http://www.jstatsoft.org/v27/i02.

De Boor C (2001). A Practical Guide to Splines. Applied Mathematical Series, revised edition. Springer-Verlag.

Greenaway-McGrevy R, Han C, Sul D (2012). “Asymptotic Distribution of Factor Augmented Estimators for Panel Regression.” Journal of Econometrics (forthcoming).

Hallin M, Liska R (2007). “Determining the Number of Factors in the General Dynamic Factor Model.” Journal of the American Statistical Association, 102, 603–617.

Kneip A, Sickles RC, Song W (2012). “A New Panel Data Treatment for Heterogeneity in Time Trends.” Econometric Theory, 28(3), 590–628.

Millo G, Piras G (2012). “splm: Spatial Panel Data Models in R.” Journal of Statistical Software, 47(1), 1–38. URL http://www.jstatsoft.org/v47/i01.

Newey WK, West KD (1987). “A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica, 55(3), 703–708.

Onatski A (2010). “Determining the Number of Factors from Empirical Distribution of Eigenvalues.” The Review of Economics and Statistics, 92(4), 1004–1016.

Pesaran MH (2006). “Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure.” Econometrica, 74(4), 967–1012.

Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2012). nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-103, URL http://CRAN.R-project.org/package=nlme.

R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

The MathWorks Inc (2012). MATLAB – The Language of Technical Computing, Version 7.14. The MathWorks, Inc., Natick, Massachusetts. URL http://www.mathworks.com/products/matlab.

Affiliation:

Oualid Bada
Statistische Abteilung
University of Bonn
Adenauerallee 24-26
53113 Bonn, Germany
E-mail: [email protected]
