The Linear Regression Model with Panel Data

February 16, 2011

Readings: Chapter 7 of textbook.

I will cover the pooled and individual effects models.

A very popular version of the individual effects model, the stochastic frontier model, will also be discussed.

Computational tools: Gibbs sampler with data augmentation.

$y_{it}$ and $\varepsilon_{it}$ denote the $t$-th observations (for $t = 1, \ldots, T$) of the dependent variable and error, respectively, for the $i$-th individual, for $i = 1, \ldots, N$.

$y_i$ and $\varepsilon_i$ denote vectors of $T$ observations on the dependent variable and error, respectively, for the $i$-th individual.

Sometimes it is important to distinguish between the intercept and slope coefficients.

Hence, define $X_i$ to be a $T \times k$ matrix containing $T$ observations on each of $k$ explanatory variables (including the intercept) for the $i$-th individual. $\tilde{X}_i$ is the $T \times (k-1)$ matrix equal to $X_i$ with the intercept removed.

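If we stack observations for all $N$ individuals together, we obtain the $TN$-vectors:

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix} \quad \text{and} \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{pmatrix}$$

Similarly, stacking observations on all explanatory variables together yields the $TN \times k$ matrix:

$$X = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}$$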

The Pooled Model

Assume the same regression relationship holds for every individual:

$$y_i = X_i \beta + \varepsilon_i,$$

for $i = 1, \ldots, N$, where $\beta$ is the $k$-vector of regression coefficients, including the intercept.

This is just a linear regression model of the sort discussed in previous lectures.

No new issues arise.
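
A minimal sketch of the stacking step for the pooled model, assuming NumPy. The sizes, coefficient values and noise level are hypothetical and chosen only for illustration. The least-squares fit at the end is included because, under a noninformative prior, the posterior mean of the coefficients in the Normal linear regression model coincides with it.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 4, 3, 2  # hypothetical sizes: individuals, time periods, regressors

# One (T x k) regressor matrix per individual, intercept in the first column.
X_list = [np.column_stack([np.ones(T), rng.uniform(size=T)]) for _ in range(N)]
beta_true = np.array([1.0, 0.5])  # same coefficients for every individual
y_list = [X_i @ beta_true + 0.1 * rng.standard_normal(T) for X_i in X_list]

# Stack individuals on top of one another.
y = np.concatenate(y_list)  # TN-vector
X = np.vstack(X_list)       # TN x k matrix

# Ordinary least squares on the stacked data.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(y.shape, X.shape, beta_ols)
```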

Individual Effects Models

Model is of the form:

$$y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it}$$

Different intercept for every individual (the "individual effect").

Same slope for every individual.

The Likelihood Function

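Likelihood function based on the regression equation:

$$y_i = \alpha_i \iota_T + \tilde{X}_i \tilde{\beta} + \varepsilon_i$$

Properties of the multivariate Normal imply a likelihood function of the form:

$$p(y \mid \alpha, \tilde{\beta}, h) = \prod_{i=1}^{N} \frac{h^{T/2}}{(2\pi)^{T/2}} \exp\left[-\frac{h}{2}\left(y_i - \alpha_i \iota_T - \tilde{X}_i \tilde{\beta}\right)'\left(y_i - \alpha_i \iota_T - \tilde{X}_i \tilde{\beta}\right)\right]$$

where $\alpha = (\alpha_1, \ldots, \alpha_N)'$, $\iota_T$ is a $T$-vector of ones and $h$ is the error precision.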

The Prior

Any sort of prior can be used, including a noninformative one.

Here we consider two types of priors which are computationally simple and commonly used.

A Non-hierarchical Prior

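Individual effects model can be written as:

$$y = X^* \beta^* + \varepsilon$$

where $X^*$ is a $TN \times (N + k - 1)$ matrix given by

$$X^* = \begin{pmatrix} \iota_T & 0_T & \cdots & 0_T & \tilde{X}_1 \\ 0_T & \iota_T & \cdots & 0_T & \tilde{X}_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0_T & 0_T & \cdots & \iota_T & \tilde{X}_N \end{pmatrix}$$

and

$$\beta^* = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \\ \tilde{\beta} \end{pmatrix}$$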

The individual effects model can thus be written as a regression model with individual dummy variables.
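
A minimal sketch of how the dummy-variable regressor matrix could be assembled, assuming NumPy. The sizes and the random regressors are hypothetical. The Kronecker product of an identity matrix with a column of ones places one column of ones per individual on the block diagonal.

```python
import numpy as np

N, T, k = 3, 2, 3  # hypothetical sizes; k counts the intercept
rng = np.random.default_rng(1)

# Slope regressors (intercept removed), one (T x (k-1)) block per individual.
X_tilde = [rng.uniform(size=(T, k - 1)) for _ in range(N)]

# I_N (Kronecker) iota_T: individual i's dummy equals 1 on its own T rows only.
dummies = np.kron(np.eye(N), np.ones((T, 1)))        # TN x N
X_star = np.hstack([dummies, np.vstack(X_tilde)])    # TN x (N + k - 1)
print(X_star.shape)
```

The coefficient vector matching `X_star` holds the N individual intercepts first, followed by the k - 1 common slopes.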

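Use independent Normal-Gamma prior (but could also use natural conjugate prior):

$$\beta^* \sim N(\underline{\beta}^*, \underline{V})$$

$$h \sim G(\underline{s}^{-2}, \underline{\nu})$$

Underbars denote prior hyperparameters, and $G(\mu, \nu)$ denotes the Gamma distribution with mean $\mu$ and $\nu$ degrees of freedom.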

A Hierarchical Prior

Hierarchical priors are popular in many cases with high-dimensional parameter spaces (such as the individual effects model).

Consider a prior:

$$\alpha_i \sim N(\mu_\alpha, V_\alpha)$$

with $\alpha_i$ and $\alpha_j$ being independent of one another for $i \neq j$.

Hierarchical structure of the prior arises if we treat $\mu_\alpha$ and $V_\alpha$ as unknown parameters which require their own prior.

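We assume $\mu_\alpha$ and $V_\alpha$ to be independent of one another with:

$$\mu_\alpha \sim N(\underline{\mu}_\alpha, \underline{\sigma}^2_\alpha)$$

$$V_\alpha^{-1} \sim G(\underline{V}_\alpha^{-1}, \underline{\nu}_\alpha)$$

Hierarchical prior assumes all intercepts are drawn from the same distribution.

This extra structure (if consistent with patterns in the data) allows for more accurate estimation.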

For the remaining parameters, we assume a non-hierarchical prior of the independent Normal-Gamma variety:

$$\tilde{\beta} \sim N(\underline{\beta}, \underline{V}_\beta)$$

$$h \sim G(\underline{s}^{-2}, \underline{\nu})$$

This model is analogous to the frequentist random effects model.

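Bayesian Computation

Under the non-hierarchical prior, we have a linear regression model with independent Normal-Gamma prior. Hence, posterior inference can be carried out using methods in Chapter 4.

Posterior Inference under the Hierarchical Prior

A Gibbs sampler can be used.

The relevant posterior distributions for $\tilde{\beta}$ and $h$, conditional on $\alpha$, are

$$\tilde{\beta} \mid y, h, \alpha, \mu_\alpha, V_\alpha \sim N(\overline{\beta}, \overline{V}_\beta)$$

$$h \mid y, \tilde{\beta}, \alpha, \mu_\alpha, V_\alpha \sim G(\overline{s}^{-2}, \overline{\nu})$$

and those for the individual effects and their hyperparameters are

$$\alpha_i \mid y, \tilde{\beta}, h, \mu_\alpha, V_\alpha \sim N(\overline{\alpha}_i, \overline{V}_i)$$

$$\mu_\alpha \mid y, \tilde{\beta}, h, \alpha, V_\alpha \sim N(\overline{\mu}_\alpha, \overline{\sigma}^2_\alpha)$$

$$V_\alpha^{-1} \mid y, \tilde{\beta}, h, \alpha, \mu_\alpha \sim G(\overline{V}_\alpha^{-1}, \overline{\nu}_\alpha)$$

where formulae for the arguments of these densities are given in the textbook, pages 152-154.

The derivations are simple extensions of those for the Normal linear regression model.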

Gibbs sampler requires only random number generation from Normal and Gamma distributions.
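
A rough sketch of such a Gibbs sampler, assuming NumPy. The conditional moments coded here are the standard conjugate updates worked out for this illustration and should be checked against the textbook formulae. The prior hyperparameters, starting values, function names and the simulated data are all hypothetical.

```python
import numpy as np

def draw_gamma(rng, mean, dof):
    """Draw from G(mean, dof): Gamma with the given mean and degrees of freedom."""
    return rng.gamma(shape=dof / 2.0, scale=2.0 * mean / dof)

def gibbs_individual_effects(y, X, n_draws=2000, burn=500, seed=0):
    """y is (N, T); X is (N, T, p) and holds the slope regressors only."""
    rng = np.random.default_rng(seed)
    N, T, p = X.shape
    # Hypothetical, loose prior hyperparameters (not the textbook's choices).
    b0, V0_inv = np.zeros(p), np.eye(p) / 100.0   # beta ~ N(b0, V0)
    s2_0, nu_0 = 1.0, 1.0                         # h ~ G(1/s2_0, nu_0)
    mu_0, sig2_0 = 0.0, 100.0                     # mu_alpha ~ N(mu_0, sig2_0)
    Va_0, nua_0 = 1.0, 2.0                        # 1/V_alpha ~ G(1/Va_0, nua_0)

    alpha, h, mu_a, V_a = y.mean(axis=1), 1.0, 0.0, 1.0  # starting values
    XtX = np.einsum("itj,itk->jk", X, X)
    beta_keep, alpha_keep = [], []
    for s in range(n_draws):
        # Slopes | rest: Normal, from regressing (y - alpha_i) on X.
        V_bar = np.linalg.inv(V0_inv + h * XtX)
        resid_a = y - alpha[:, None]
        b_bar = V_bar @ (V0_inv @ b0 + h * np.einsum("itj,it->j", X, resid_a))
        beta = rng.multivariate_normal(b_bar, V_bar)
        # Error precision | rest: Gamma.
        e = resid_a - X @ beta
        nu_bar = N * T + nu_0
        s2_bar = (np.sum(e ** 2) + nu_0 * s2_0) / nu_bar
        h = draw_gamma(rng, 1.0 / s2_bar, nu_bar)
        # Individual effects | rest: independent Normals.
        prec = T * h + 1.0 / V_a
        a_bar = (h * (y - X @ beta).sum(axis=1) + mu_a / V_a) / prec
        alpha = a_bar + rng.standard_normal(N) / np.sqrt(prec)
        # Mean of the individual effects | rest: Normal.
        prec_mu = N / V_a + 1.0 / sig2_0
        mu_bar = (alpha.sum() / V_a + mu_0 / sig2_0) / prec_mu
        mu_a = mu_bar + rng.standard_normal() / np.sqrt(prec_mu)
        # Precision of the individual effects | rest: Gamma.
        nua_bar = N + nua_0
        Va_bar = (np.sum((alpha - mu_a) ** 2) + Va_0 * nua_0) / nua_bar
        V_a = 1.0 / draw_gamma(rng, 1.0 / Va_bar, nua_bar)
        if s >= burn:
            beta_keep.append(beta)
            alpha_keep.append(alpha)
    return np.array(beta_keep), np.array(alpha_keep)

# Hypothetical simulated panel: one slope regressor with true slope 2.
rng = np.random.default_rng(1)
N, T = 50, 5
X = rng.uniform(size=(N, T, 1))
alpha_true = rng.normal(1.0, 0.5, size=N)
y = alpha_true[:, None] + 2.0 * X[:, :, 0] + 0.2 * rng.standard_normal((N, T))
beta_draws, alpha_draws = gibbs_individual_effects(y, X)
print(beta_draws.mean(axis=0))
```

Each pass only calls Normal and Gamma generators, which is the point made above. The burn-in length and number of draws are arbitrary here and would need convergence checks in real work.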

Note: the random coefficients model is given by:

$$y_i = X_i \beta_i + \varepsilon_i$$

where $\beta_i$ varies across individuals.

Discussed in textbook, pages 155-157. (It is a simple extension of the individual effects model, so I will not discuss it here.)

Efficiency Analysis and the Stochastic Frontier Model

To motivate the model, let output of firm $i$ at time $t$, $Y_{it}$, be produced using a vector of inputs, $X^*_{it}$.

Firms have access to a common best-practice technology for turning inputs into output:

$$Y_{it} = f(X^*_{it}; \beta)$$

Production frontier measures the maximum amount of output that can be obtained from a given level of inputs.

Deviation of actual from maximum feasible output is a measure of inefficiency.

Formally:

$$Y_{it} = f(X^*_{it}; \beta)\,\tau_i$$

where $0 < \tau_i \le 1$ is a measure of firm-specific efficiency and $\tau_i = 1$ indicates firm $i$ is fully efficient.

Example: $\tau_i = 0.75$ means that firm $i$ is producing only 75% of the output it could have if it were operating according to best-practice technology.

In this specification, we have assumed each firm has a particular efficiency level which is constant over time. This assumption can be relaxed.

Adding a random error to the model, $\zeta_{it}$, to capture measurement (or specification) error:

$$Y_{it} = f(X^*_{it}; \beta)\,\tau_i\,\zeta_{it}$$

Common for $f()$ to be log-linear (e.g. Cobb-Douglas or translog):

$$y_{it} = X_{it}\beta + \varepsilon_{it} - z_i$$

where $y_{it} = \ln(Y_{it})$, $\varepsilon_{it} = \ln(\zeta_{it})$, $z_i = -\ln(\tau_i)$ and $X_{it}$ is the counterpart of $X^*_{it}$ with the inputs transformed to logarithms.
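
As a worked illustration of the log-linear step, here is the Cobb-Douglas case written out. The element-wise input notation $X^*_{j,it}$ and the $e^{\beta_1}$ scaling of the frontier are assumptions introduced for this sketch.

```latex
\begin{align*}
Y_{it} &= f(X^*_{it};\beta)\,\tau_i\,\zeta_{it}, \qquad
f(X^*_{it};\beta) = e^{\beta_1}\prod_{j=2}^{k}\left(X^*_{j,it}\right)^{\beta_j} \\
\ln Y_{it} &= \beta_1 + \sum_{j=2}^{k}\beta_j \ln X^*_{j,it} + \ln\tau_i + \ln\zeta_{it} \\
y_{it} &= X_{it}\beta + \varepsilon_{it} - z_i, \qquad z_i = -\ln\tau_i \ge 0 \\
\tau_i = 0.75 &\;\Rightarrow\; z_i = -\ln(0.75) \approx 0.288
\end{align*}
```

Taking logs turns the multiplicative efficiency term into an additive, non-negative shift below the frontier.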

Stacking into matrices:

$$y_i = X_i \beta + \varepsilon_i - z_i \iota_T$$

$z_i$ is referred to as inefficiency.

Since $0 < \tau_i \le 1$, $z_i$ is a non-negative random variable.

$X_{it}$ is assumed to contain an intercept and $\beta_1$ is its coefficient.

Note that this model is of the form of an individual effects model: $\beta_1 - z_i$ plays the same role that $\alpha_i$ did for individual effects models.

Bayesian Inference in the Stochastic Frontier Model

Very similar to individual effects model, so we will only sketch out details.

The important new issue here is the inefficiency term, $z_i$, so focus on that.

Hierarchical prior for inefficiencies:

Since $z_i > 0$, cannot use Normal hierarchical prior.

Common choices include the truncated-Normal and members of the family of Gamma distributions.

Here we will use the exponential distribution (which is Gamma with two degrees of freedom):

$$z_i \sim G(\mu_z, 2)$$

$\mu_z > 0$ requires a prior. We use:

$$\mu_z^{-1} \sim G(\underline{\mu}_z^{-1}, \underline{\nu}_z)$$

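Now set up a Gibbs sampler.

Derive full conditional posterior distributions similarly to the random effects model:

$$\beta \mid y, h, z, \mu_z \sim N(\overline{\beta}, \overline{V})$$

$$h \mid y, \beta, z, \mu_z \sim G(\overline{s}^{-2}, \overline{\nu})$$

$$p(z_i \mid y_i, X_i, \beta, h, \mu_z) \propto f_N\left(z_i \mid \overline{X}_i \beta - \overline{y}_i - (T h \mu_z)^{-1}, (Th)^{-1}\right) 1(z_i \ge 0)$$

$$\mu_z^{-1} \mid y, \beta, h, z \sim G(\overline{\mu}_z, \overline{\nu}_z)$$

Here $\overline{y}_i$ and $\overline{X}_i$ are the averages of $y_{it}$ and $X_{it}$ over $t$ for firm $i$. Formulae for the remaining arguments of the densities are given in the book.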

Gibbs sampler involves drawing from Normal, truncated Normal and Gamma distributions, all straightforward to do.
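
A sketch of the two steps that are new relative to the individual effects sampler, assuming NumPy plus the standard-library `statistics.NormalDist`. The truncated Normal draw uses an inverse-CDF method chosen here for simplicity. The Gamma update for the inverse of $\mu_z$ is the usual conjugate result for exponential data, worked out for this illustration rather than copied from the book. Function names, prior values and the simulated data are hypothetical.

```python
from statistics import NormalDist

import numpy as np

_STD = NormalDist()  # standard Normal, used for cdf and inverse cdf

def draw_truncated_normal(rng, mean, sd):
    """Element-wise draws from N(mean, sd^2) truncated to [0, infinity)."""
    out = np.empty(len(mean))
    for i, m in enumerate(mean):
        upper_mass = _STD.cdf(m / sd)   # P(untruncated draw >= 0)
        u = 1.0 - rng.random()          # uniform on (0, 1]
        p = min(max(u * upper_mass, 1e-300), 1.0 - 1e-16)
        out[i] = m - sd * _STD.inv_cdf(p)
    return np.maximum(out, 0.0)         # guard against rounding just below 0

def draw_z(rng, y, X, beta, h, mu_z):
    """Inefficiencies | rest. y is (N, T); X is (N, T, k) including the intercept."""
    N, T = y.shape
    mean = (X @ beta - y).mean(axis=1) - 1.0 / (T * h * mu_z)
    return draw_truncated_normal(rng, mean, 1.0 / np.sqrt(T * h))

def draw_mu_z(rng, z, mu_prior, nu_prior):
    """mu_z | z, drawing 1/mu_z from its Gamma conditional and inverting."""
    dof = 2 * z.size + nu_prior
    mean = dof / (2.0 * z.sum() + nu_prior * mu_prior)
    inv_mu = rng.gamma(shape=dof / 2.0, scale=2.0 * mean / dof)
    return 1.0 / inv_mu

# Hypothetical check on simulated data, holding beta and h at their true values.
rng = np.random.default_rng(7)
N, T = 100, 5
beta, h, mu_z = np.array([1.0, 0.75, 0.25]), 25.0, -np.log(0.85)
X = np.concatenate([np.ones((N, T, 1)), rng.uniform(size=(N, T, 2))], axis=2)
z_true = rng.exponential(mu_z, size=N)
y = X @ beta - z_true[:, None] + rng.normal(0.0, 0.2, size=(N, T))

z = draw_z(rng, y, X, beta, h, mu_z)
mu_z_new = draw_mu_z(rng, z, mu_prior=-np.log(0.85), nu_prior=2.0)
print(z.min(), mu_z_new)
```

A vectorised alternative for the truncated Normal step would be `scipy.stats.truncnorm`. The explicit loop is used here only to keep the sketch free of extra dependencies.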

Empirical Illustration: Efficiency Analysis with Stochastic Frontier Models

To illustrate Bayesian inference in the stochastic frontier model, artificial data was generated from:

$$y_{it} = 1.0 + 0.75 x_{2,it} + 0.25 x_{3,it} - z_i + \varepsilon_{it}$$

for $i = 1, \ldots, 100$ and $t = 1, \ldots, 5$.

$\varepsilon_{it} \sim N(0, 0.04)$, $z_i \sim G(-\ln[0.85], 2)$, $x_{2,it} \sim U(0, 1)$ and $x_{3,it} \sim U(0, 1)$.

Note: inefficiency distribution is selected to imply median of efficiency distribution is 0.85.
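
A sketch of how data of this form could be simulated, assuming NumPy. The seed and variable names are arbitrary, so this will not reproduce the exact data set behind the reported results. The value 0.04 is read as the error variance (standard deviation 0.2), which is an assumption about the notation.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
N, T = 100, 5
beta = np.array([1.0, 0.75, 0.25])
mu_z = -np.log(0.85)

x2 = rng.uniform(0.0, 1.0, size=(N, T))
x3 = rng.uniform(0.0, 1.0, size=(N, T))
X = np.stack([np.ones((N, T)), x2, x3], axis=2)   # (N, T, 3), intercept first

# G(mu_z, 2) is the exponential distribution with mean mu_z.
z = rng.exponential(scale=mu_z, size=N)
eps = rng.normal(0.0, np.sqrt(0.04), size=(N, T))  # variance 0.04 assumed

y = X @ beta - z[:, None] + eps   # firm-level inefficiency is constant over t
tau = np.exp(-z)                  # true firm-specific efficiencies
print(y.shape, tau.min(), tau.max())
```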

Priors are relatively noninformative (see textbook).

Posterior results based on Gibbs sampler

Table 7.3 contains posterior means and standard deviations for parameters.

With stochastic frontier models, interest often centers on firm-specific efficiencies, $\tau_i$ for $i = 1, \ldots, N$.

Since $\tau_i = \exp(-z_i)$, and Gibbs sampler yields draws of $z_i$, we can simply transform them and average to obtain $E(\tau_i \mid y)$.

There are $N = 100$ efficiencies; we select firms which have the minimum, median and maximum values for $E(\tau_i \mid y)$.

These are labelled $\tau_{min}$, $\tau_{med}$ and $\tau_{max}$ in Table 7.3.

The histogram in Figure 7.5, which plots the posterior means of the efficiencies of all 100 firms, gives a rough idea of how efficiencies are distributed across firms.
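
A sketch of the transform-then-average step, assuming NumPy. The array `z_draws` is a made-up stand-in for real Gibbs output (one row per retained draw, one column per firm), and picking the firm at position N // 2 of the ranking as the "median" firm is a convention chosen here.

```python
import numpy as np

rng = np.random.default_rng(3)
S, N = 1000, 100  # hypothetical number of retained draws and of firms

# Placeholder draws of z_i; in practice these come from the Gibbs sampler.
z_level = rng.exponential(scale=0.15, size=N)
z_draws = np.abs(z_level + 0.05 * rng.standard_normal((S, N)))

# Transform every draw first, then average over draws for each firm.
tau_draws = np.exp(-z_draws)
tau_mean = tau_draws.mean(axis=0)          # estimate of E(tau_i | y)

order = np.argsort(tau_mean)
i_min, i_med, i_max = order[0], order[N // 2], order[-1]

# Plugging the mean of z into exp() gives a different, smaller number.
tau_plugin = np.exp(-z_draws.mean(axis=0))
print(tau_mean[[i_min, i_med, i_max]])
```

Averaging after the transformation matters because $\exp(-z)$ is convex, so the plug-in value never exceeds the average of the transformed draws.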

Table 7.3: Posterior Results for Artificial Data Set from Stochastic Frontier Model

Parameter   Mean    Standard deviation
β1          0.98    0.03
β2          0.74    0.03
β3          0.27    0.03
h           26.69   1.86
µz          0.15    0.02
τmin        0.56    0.05
τmed        0.89    0.06
τmax        0.97    0.03

An important issue in efficiency analysis is whether point estimates can be treated as a reliable guide to the ranking of firms.

Important policy recommendations may hang on a finding that firm A is less efficient than firm B.

Simply relying on point estimates which indicate that firm A is less efficient than firm B may lead to inappropriate policy advice.

But Gibbs sampler output can be used in a straightforward manner to shed light on this issue.

For instance, $p(\tau_A < \tau_B \mid y)$ is the probability firm A is less efficient than firm B.

We find $p(\tau_{max} > \tau_{med} \mid y) = 0.89$, $p(\tau_{max} > \tau_{min} \mid y) = 1.00$ and $p(\tau_{med} > \tau_{min} \mid y) = 1.00$.

Thus, we can conclude that firms which are ranked far apart in terms of their efficiency estimates do truly differ in efficiency.

However, it is likely that a researcher would be very uncertain about saying, e.g., that the 12th ranked firm is more efficient than the 13th ranked.
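
A sketch of how such ranking probabilities could be estimated from sampler output, assuming NumPy. The draws below are invented for three hypothetical firms and the helper name is not from the textbook. The estimate is simply the fraction of draws in which one firm's efficiency falls below the other's.

```python
import numpy as np

def prob_less_efficient(tau_draws, a, b):
    """Estimate p(tau_a < tau_b | y) as the share of draws where it holds."""
    return float(np.mean(tau_draws[:, a] < tau_draws[:, b]))

rng = np.random.default_rng(5)
S = 5000
# Placeholder inefficiency draws for three firms (columns 0, 1, 2).
z_draws = np.abs(rng.normal([0.60, 0.12, 0.10], 0.05, size=(S, 3)))
tau_draws = np.exp(-z_draws)

p_far = prob_less_efficient(tau_draws, 0, 1)    # firms far apart
p_close = prob_less_efficient(tau_draws, 1, 2)  # firms close together
print(p_far, p_close)
```

With these placeholder draws the far-apart pair gives a probability close to one, while the close pair gives a value not far from one half, which is the kind of ranking uncertainty described above.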

Figure 7.6 plots posteriors for τmin, τmed and τmax.

Panel data topics are popular right now in the econometrics literature.

Panel data models introduced in this chapter are useful for modeling heterogeneity of various sorts.

This is a crucial issue in many fields.

E.g. marketing has consumer heterogeneity; in labour economics, individuals may vary in many ways that cannot be directly observed by the econometrician (e.g. they may differ in their returns to schooling, their value of leisure, their productivity, etc.).

Dynamic panel data models are very hot these days (i.e. T is large enough that you have to start worrying about time series and unit root issues).

