
Journal of Econometrics 181 (2014) 15–24


On the properties of the coefficient of determination in regression models with infinite variance variables✩

Jeong-Ryeol Kurz-Kim a,1, Mico Loretan b,∗

a Research Centre, Deutsche Bundesbank, Wilhelm-Epstein-Strasse 14, 60431 Frankfurt am Main, Germany
b Monetary Policy Analysis Group, Swiss National Bank, Börsenstrasse 15, 8001 Zürich, Switzerland

Article info

Article history: Available online 3 March 2014

JEL classification: C12, C13, C21, G12

Keywords: Coefficient of determination; α-stable distributions; Signal to noise ratio; Density transformation theorem; Monte Carlo simulation; Fama–MacBeth regression; CAPM

Abstract

We examine the asymptotic properties of the coefficient of determination, $R^2$, in models with α-stable random variables. If the regressor and error term share the same index of stability α < 2, we show that the $R^2$ statistic does not converge to a constant but has a nondegenerate distribution on the entire [0, 1] interval. We provide closed-form expressions for the cumulative distribution function and probability density function of this limit random variable, and we show that the density function is unbounded at 0 and 1. If the indices of stability of the regressor and error term are unequal, we show that the coefficient of determination converges in probability to either 0 or 1, depending on which variable has the smaller index of stability, irrespective of the value of the slope coefficient. In an empirical application, we revisit the Fama and MacBeth (1973) two-stage regression and demonstrate that in the infinite-variance case the $R^2$ statistic of the second-stage regression converges to 0 in probability even if the slope coefficient is nonzero. We deduce that a small value of the $R^2$ statistic should not, in itself, be used to reject the usefulness of a regression model.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Granger and Orr (1972) lead off their article “‘Infinite variance’ and research strategy in time series analysis” by questioning the uncritical use of the normal distribution assumption in economic modeling and estimation:

It is standard procedure in economic modeling and estimation to assume that random variables are normally distributed. In empirical work, confidence intervals and significance tests are widely used, and these usually hinge on the presumption of a normal population. Lately, there has been a growing awareness that some economic data display distributional characteristics that are flatly inconsistent with the hypothesis of normality.

✩ The views expressed in this paper are solely the responsibility of the authors and should not be interpreted as reflecting the views of the staff of the Deutsche Bundesbank or the Swiss National Bank. We thank Jean-Marie Dufour (the editor), three anonymous referees, Neil R. Ericsson, Peter C.B. Phillips, Werner Ploberger, Bernhard Schipp, Casper G. de Vries, Jonathan H. Wright, and participants of workshops at the Federal Reserve Board, the Deutsche Bundesbank, and Singapore Management University for valuable comments. Zhenyu Wang very kindly provided the data we used in the empirical section of this paper. Jeong-Ryeol Kurz-Kim gratefully acknowledges the research support from the Alexander von Humboldt Foundation.
∗ Corresponding author. Tel.: +41 44 631 8056.
E-mail addresses: [email protected] (J.-R. Kurz-Kim), [email protected] (M. Loretan).
1 Tel.: +49 69 9566 4576.

http://dx.doi.org/10.1016/j.jeconom.2014.02.004
0304-4076/© 2014 Elsevier B.V. All rights reserved.

Due importantly to the seminal work of Mandelbrot (1963), non-Gaussian α-stable distributions are often considered to provide the basis for more realistic distributional assumptions for some economic data, especially for high-frequency financial time series such as those of exchange rate fluctuations and stock returns. Financial time series are typically fat-tailed and excessively peaked around their mean, phenomena that can be better captured by α-stable distributions with 1 < α < 2 rather than by the normal distribution for which α = 2. The α-stable distributional assumption with α < 2 is a generalization of rather than a strict alternative to the Gaussian distributional assumption. If an economic series fluctuates according to an α-stable distribution with α < 2, it is known that many of the standard methods of statistical analysis do not apply in the conventional way. In particular, as we demonstrate in this paper, if α < 2 the coefficient of determination of a regression model has several nonstandard properties. Moreover, these properties are sufficiently important to cast doubt on the suitability of the coefficient of determination as a general goodness of fit criterion in regressions in which the regressor(s) and the error term are characterized by strong outlier activity, regardless of whether the sample value of the coefficient of determination is high or low.

The linear regression model is one of the most commonly used and basic econometric tools, not only for the analysis of macroeconomic relationships but also for the study of financial market data. Typical examples for the latter case are the estimation of the ex-post version of the capital asset pricing model (CAPM) and the two-stage modeling approach of Fama and MacBeth (1973). Because of the prevalence of heavy-tailed distributions in financial time series, it is of interest to study how regression models perform when the data are heavy-tailed rather than normally distributed. There are many heavy-tailed distributions that could be considered. One such class of distributions that is particularly suitable in a regression model context is the class of α-stable distributions, because (i) these distributions are able to capture the relative frequencies of extreme vs. ordinary observations in economic and financial variables, (ii) they have the convenient statistical property of closure under convolution, and (iii) only α-stable distributions can serve as limiting distributions of normalized sums of independent and identically distributed (iid) random variables, as proven in Zolotarev (1986). The second and third properties are especially appealing for regression analysis because the disturbance term may often be interpreted as a random variable which represents the sum of all external effects not captured by the regressors.

In this paper, we show that infinite variance of the regressor and disturbance term has important consequences for the asymptotic properties of the coefficient of determination, $R^2$, a very frequently used goodness-of-fit measure. We show that if the regressor and error term are both α-stable (with α < 2) with the same index of stability, the $R^2$ statistic does not converge to a fixed (positive) constant but has a nondegenerate limiting distribution on the (0, 1) interval. Hence, a low value of $R^2$ in an empirical regression application should not, by itself, be interpreted as implying either that the model is poorly specified or that there is no statistically significant (linear) relationship between the regressor and the dependent variable. In an empirical application, we revisit the Fama and MacBeth (1973) two-stage regression approach and establish that infinite variance of the regression variables affects decisively the interpretation of the well-known stylized empirical fact that the $R^2$ statistics in static CAPM models tend to be very close to 0. Specifically, we find that a low value of the $R^2$ statistic should not be used to conclude that the relationship between the regressor and the regressand is “flat”.

The rest of our paper is structured as follows. In Section 2 we provide a brief summary of the properties of α-stable distributions and of aspects of estimation, hypothesis testing, and model diagnostic checking in regression models with α-stable variables. Section 3 provides a detailed analysis of the asymptotic properties of the coefficient of determination in regression models with infinite variance variables. Our empirical application is presented in Section 4, and Section 5 offers concluding remarks.

2. Framework

2.1. A brief overview of the properties of α-stable distributions

A random variable $X$ is said to have a stable distribution if, for any positive integer $n > 2$, there exist coefficients $a_n > 0$ and $b_n \in \mathbb{R}$ such that $X_1 + \cdots + X_n \overset{d}{=} a_n X + b_n$, where $X_1, \ldots, X_n$ are independent copies of $X$ and $\overset{d}{=}$ signifies equality in distribution. The coefficients $a_n$ are necessarily of the form $a_n = n^{1/\alpha}$ for some $\alpha \in (0, 2]$; see Feller (1971, Chapter VI.1). The parameter α is called the index of stability of the distribution, and a random variable $X$ with index of stability α is called α-stable. An α-stable distribution is described by four parameters and will be denoted by $S(\alpha, \gamma, \beta, \delta)$. Closed-form expressions for the probability density functions (pdfs) of α-stable distributions are known to exist only for three special cases.² However, closed-form expressions for the characteristic functions of α-stable distributions are readily available. One parameterization of the logarithm of the characteristic function of $S(\alpha, \gamma, \beta, \delta)$ is³

$$\ln E\, e^{i\tau X} = i\delta\tau - \gamma^{\alpha}|\tau|^{\alpha}\left[1 + i\beta\,\mathrm{sign}(\tau)\,\omega(\tau, \alpha)\right], \tag{1}$$

where $\mathrm{sign}(\tau)$ equals $-1$ for $\tau < 0$, $0$ for $\tau = 0$, and $+1$ for $\tau > 0$; and $\omega(\tau, \alpha)$ equals $-\tan(\pi\alpha/2)$ for $\alpha \ne 1$ and $(2/\pi)\ln|\tau|$ for $\alpha = 1$.⁴

The asymptotic tail shape of an α-stable distribution is determined by its index of stability $\alpha \in (0, 2]$. Skewness is governed by $\beta \in [-1, 1]$; the distribution is symmetric about $\delta \in \mathbb{R}$ if and only if $\beta = 0$. The scale and location parameters of α-stable distributions are denoted by $\gamma > 0$ and $\delta$, respectively. If $\alpha = 2$, the right-hand side of Eq. (1) reduces to $i\delta\tau - \gamma^2\tau^2$, which is that of a Gaussian random variable with mean $\delta$ and variance $2\gamma^2$.

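For $\alpha < 2$ and $|\beta| < 1$, the tails of an α-stable random variable $X$ satisfy

$$\lim_{x \to \infty} \frac{\Pr(X > x)}{x^{-\alpha}} = C(\alpha)\,\gamma^{\alpha}(1 + \beta)/2 \tag{2}$$

and

$$\lim_{x \to -\infty} \frac{\Pr(X < x)}{|x|^{-\alpha}} = C(\alpha)\,\gamma^{\alpha}(1 - \beta)/2, \tag{3}$$

i.e., both tails of the pdf of $X$ are asymptotically Paretian, with tail shape parameter α.⁵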

The function $C(\alpha)$ in Eqs. (2) and (3) is given by⁶

$$C(\alpha) = \frac{1 - \alpha}{\Gamma(2 - \alpha)\cos(\pi\alpha/2)} \quad \text{for } \alpha \ne 1 \tag{4}$$

and by $2/\pi$ for $\alpha = 1$.⁷ The function $C(\alpha)$ is continuous and strictly decreasing over the interval (0, 2); furthermore, $\lim_{\alpha \downarrow 0} C(\alpha) = 1$ and $\lim_{\alpha \uparrow 2} C(\alpha) = 0$. In consequence, as $\alpha \uparrow 2$, proportionately less and less of the distribution’s probability mass is located in its tail region. In addition, because the density’s tails decline at an increasingly rapid rate as $\alpha \uparrow 2$, the likelihood of observing very large draws conditional on the draw coming from the tail region decreases as well. These observations explain why potentially very large sample sizes are required if one desires to estimate the index of stability with adequate precision if α is close to but smaller than 2.
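The behaviour of the tail constant can be checked numerically. The following Python sketch evaluates $C(\alpha)$ as written in Eq. (4), returning the limiting value $2/\pi$ at $\alpha = 1$; the function name `tail_constant` and the tolerance used to detect $\alpha = 1$ are illustrative assumptions, not part of the paper.

```python
import math


def tail_constant(alpha: float) -> float:
    """Evaluate C(alpha) of Eq. (4) for alpha in (0, 2).

    At alpha = 1 the formula is of the form 0/0, so the limiting
    value 2/pi stated in the text is returned instead.
    """
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie in the open interval (0, 2)")
    if abs(alpha - 1.0) < 1e-12:  # illustrative tolerance (assumption)
        return 2.0 / math.pi
    return (1.0 - alpha) / (math.gamma(2.0 - alpha) * math.cos(math.pi * alpha / 2.0))


if __name__ == "__main__":
    # Tabulate C(alpha) on a few points to see it fall from near 1 toward 0.
    for a in (0.1, 0.5, 1.0, 1.5, 1.9):
        print(f"alpha = {a:.1f}   C(alpha) = {tail_constant(a):.4f}")
```

Evaluating the function just above and just below $\alpha = 1$ gives values close to $2/\pi$, in line with the continuity claim.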

Defining $E|X|^{\xi} = \lim_{b \to \infty} \int_0^b x^{\xi}\, dF_{|X|}(x)$, Eqs. (2) and (3) imply that $E|X|^{\xi} < \infty$ for $\xi \in (0, \alpha)$ and $E|X|^{\xi} = \infty$ for $\xi \ge \alpha$ for an α-stable random variable $X$ with cumulative distribution function (cdf) $F_X$.⁸ If $\alpha \in (1, 2)$, as is usually the case for empirical data in finance, the mean of $X$ exists but its variance is infinite. For $\alpha > 1$ and the parameterization that underlies Eq. (1), it follows that $E(X) = \delta$. In addition, for $\beta = 0$, $\delta$ is equal to the distribution’s mode and median irrespective of the value of α, justifying the use of the term “central location parameter” for $\delta$ in the finite-mean or symmetric cases.

² These three special cases are: the Gaussian distribution, $S(2, \gamma, 0, \delta) \equiv N(\delta, 2\gamma^2)$; the symmetric Cauchy distribution, $S(1, \gamma, 0, \delta)$; and the Lévy distribution, $S(0.5, \gamma, \pm 1, \delta)$; see, e.g., Zolotarev (1986, chapter 2) and Rachev et al. (2005, chapter 7).
³ Other parameterizations exist as well. See Nolan (2013) for a discussion of several alternative parameterizations.
⁴ $0 \cdot \ln 0$ is always interpreted as 0.
⁵ For $\alpha < 2$ and $\beta = +1$ ($-1$), i.e., for maximally right-skewed (left-skewed) distributions, only the right (left) tail is asymptotically Paretian. For $\alpha < 1$ and $\beta = +1$, $\Pr(X < \delta) = 0$, i.e., the distribution’s support is bounded below by $\delta$. Zolotarev (1986, Theorem 2.5.3) and Samorodnitsky and Taqqu (1994, pp. 17–18) provide expressions for the rate of decline of the non-Paretian tail if $\beta = \pm 1$ and $\alpha \ge 1$.
⁶ See Samorodnitsky and Taqqu (1994, p. 17).
⁷ The numerator and the second term in the denominator of Eq. (4) both converge to 0 as $\alpha \to 1$. The result $C(1) = 2/\pi$ is obtained by applying L’Hôpital’s Rule.
⁸ Ibragimov and Linnik (1971, Theorem 2.6.4) show that this result holds not only for α-stable distributions but for all distributions in the domain of attraction of an α-stable distribution. Ibragimov and Linnik (1971, Theorem 2.6.1) provide necessary and sufficient conditions for a probability distribution to lie in the domain of attraction of an α-stable law.

2.2. Regression models with infinite-variance variables

Let $X$ and $Y$ be two jointly symmetric α-stable (henceforth, SαS) random variables with $\alpha > 1$, i.e., we require $X$ and $Y$ to have finite means. Our main reason for concentrating on the case $\alpha > 1$ lies in its empirical relevance. Estimated maximal moment exponents for most empirical financial data, such as returns to exchange rates and stock prices, are generally greater than 1.5; see, for example, de Vries (1991) and Loretan and Phillips (1994). An econometric (purposeful) reason for studying the case $\alpha > 1$ is that, for α-stable distributions with $\alpha > 1$, regression analysis that is based on sample second moments, such as least squares, is still asymptotically consistent for the regression coefficients even though the limit distributions of these regression coefficients are nonstandard. Another reason for this restriction comes from the viewpoint of statistical modeling. For a bivariate symmetric α-stable distribution $(X, Y)$, the conditional expectation function $E(Y \mid X)$ is linear in $X$ only if $\alpha \in (1, 2]$; the conditional expectation function is in general nonlinear or, rather, only asymptotically linear for $\alpha \le 1$.⁹

Suppose that the regression of a random variable $Y$ on a random variable $X$ is linear, i.e., there exists a constant $\theta$ such that

$$E(Y \mid X) = \theta X \quad \text{a.s.,} \tag{5}$$

with

$$\theta = \frac{[Y, X]_{\alpha}}{\gamma_x^{\alpha}},$$

where $\gamma_x$ is the scale parameter of the SαS random variable $X$ and $[Y, X]_{\alpha}$ is the covariation (covariance in the Gaussian case) between $Y$ and $X$, which can be expressed as $E(X Y^{\langle \xi - 1 \rangle}) / E(|Y|^{\xi})$ for $\xi \in (1, \alpha)$, with $a^{\langle \xi \rangle} \equiv |a|^{\xi}\,\mathrm{sign}(a)$; see Samorodnitsky and Taqqu (1994, p. 94). For estimation and diagnostic purposes, Eq. (5) can be recast as

$$y_t = c + \theta x_t + u_t, \quad t = 1, 2, \ldots, T, \tag{6}$$

where the maintained hypothesis is that $u_t$ is iid SαS with $\alpha \in (1, 2]$. The econometric issues of interest are to estimate $\theta$ consistently and efficiently, to test the hypothesis of significance of $\theta$, usually based on the t-statistic, and to compute model diagnostics such as the coefficient of determination, the Durbin–Watson statistic, and F-tests of parameter constancy across subsamples.

The consequences of infinite variance in the regressor and disturbance term for these issues can be substantial. For instance, if the variables share the same index of stability α, the ordinary least squares (OLS) estimator of $\theta$ is still consistent but its asymptotic distribution is α-stable with the same α as the underlying variables. Furthermore, the convergence rate to the true parameter is $T^{(\alpha - 1)/\alpha}$, smaller than the rate $T^{1/2}$ which applies in the finite-variance case. Moreover, if $\alpha < 2$, OLS loses its best linear unbiased estimator (BLUE) property, i.e., it is no longer the minimum-dispersion estimator in the class of linear estimators of $\theta$. In addition, the asymptotic efficiency of OLS converges to 0 as $\alpha \downarrow 1$.
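To give a sense of how much slower the rate $T^{(\alpha-1)/\alpha}$ is than $T^{1/2}$, the short Python sketch below tabulates both for a fixed sample size. The function name `ols_norming_rate` and the sample size are illustrative assumptions; the sketch only evaluates the rate formula quoted in the text and does not estimate anything.

```python
def ols_norming_rate(T: int, alpha: float) -> float:
    """Return T**((alpha - 1) / alpha), the OLS convergence rate for alpha in (1, 2]."""
    if not 1.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (1, 2]")
    return T ** ((alpha - 1.0) / alpha)


if __name__ == "__main__":
    T = 10_000  # illustrative sample size (assumption)
    for alpha in (2.0, 1.8, 1.5, 1.2):
        # For alpha = 2 the rate equals the familiar sqrt(T).
        print(f"alpha = {alpha:.1f}   rate = {ols_norming_rate(T, alpha):8.2f}")
```

For $T = 10{,}000$ the Gaussian rate is 100, whereas $\alpha = 1.5$ gives $T^{1/3} \approx 21.5$; as $\alpha \downarrow 1$ the rate approaches $T^0 = 1$.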

Blattberg and Sargent (1971) derived the BLUE for $\theta$ in Eq. (6) if α is known; their estimator coincides with the OLS estimator if $\alpha = 2$. Samorodnitsky et al. (2007) consider an optimal power estimate based on the Blattberg–Sargent estimator for unknown α, and they also provide an optimal linear estimator of the regression coefficients for various configurations of the indices of stability of $x_t$ and $u_t$. Other efficient estimators of the regression coefficients have been studied as well. Kanter and Steiger (1974) propose an unbiased $L_1$-estimator, which excludes very large shocks in its estimation to avoid excess sensitivity due to outliers. Using a weighting function, McCulloch (1998) considers a maximum likelihood estimator which is based on an approximation to a symmetric stable density.

⁹ For more on bivariate linearity, see Samorodnitsky and Taqqu (1994, Chapters 4 and 5).

Hypothesis testing is also affected considerably when the regressors and disturbance terms have infinite-variance stable distributions. For example, the t-statistic, commonly used to test the null hypothesis of parameter significance, no longer has a conventional Student-t distribution if $\alpha < 2$. Rather, as shown by Logan et al. (1973), its pdf has modes at $-1$ and $+1$, and for $\alpha < 1$ these modes are infinite. Kim (2003) provides empirical distributions of the t-statistic for finite degrees of freedom and various values of α by simulation. The usual applied goodness-of-fit test statistics, such as the likelihood ratio, Lagrange multiplier, and Wald statistics, also no longer have a conventional $\chi^2$ distribution asymptotically. Instead, they have a stable $\chi^2$ distribution, a term introduced by Mittnik et al. (1998).

In time series regressions with infinite-variance innovations, Phillips (1990) shows that the limit distributions of the augmented Dickey–Fuller tests for a unit root are functionals of Lévy processes, whereas they are functionals of Brownian motion processes in the finite-variance case. The F-test statistic for parameter constancy that is based on the residuals from a sample split test has an F-distribution in the conventional, finite-variance case. As shown by Runde (1993), the limiting distribution of the F-statistic for $\alpha < 2$ behaves completely differently from the Gaussian case. Whereas in the Gaussian case the statistic converges to 1 under the null as the degrees of freedom for both the numerator and denominator of the statistic approach infinity, in the non-Gaussian case the statistic converges to a ratio of two independent, positive, and maximally right-skewed α/2-stable distributions. This result is also used below in the derivation of closed-form expressions for the pdf and cdf of the limiting distribution of the $R^2$ statistic.

Moreover, commonly used criteria for judging the validity of some of the maintained hypotheses of a regression model, such as the Durbin–Watson statistic and the Box–Pierce Q-statistic, would be inappropriate if one were to rely on conventional critical values. Phillips and Loretan (1991) study the properties of the Durbin–Watson statistic for regression residuals with infinite variance, and Runde (1997) examines the properties of the Box–Pierce Q-statistic for random variables with infinite variance. Loretan and Phillips (1994) and Phillips and Loretan (1995) establish that both the size of tests of covariance stationarity under the null and the rate of divergence of these tests under the alternative are strongly affected by the failure of standard moment conditions; indeed, standard tests of covariance stationarity are inconsistent if population second moments do not exist.

3. Asymptotic properties of the coefficient of determination

3.1. Basic results

For the general asymptotic theory of stochastic processes with stable random variables, we refer to Davis and Resnick (1985a,b, 1986) and Resnick (1986). Our results in this section on the asymptotic properties of the $R^2$ statistic are, in large part, an application of their work to the regression diagnostic context.


The maintained assumptions of this section are:

1. the relationship between the dependent and independent variable conforms to the classical bivariate linear regression model

$$y_t = c + \theta x_t + u_t, \quad t = 1, \ldots, T; \tag{7}$$

2. $u_t$ is distributed iid SαS$(\alpha_u, 0, \gamma_u, 0)$, with $\alpha_u \in (1, 2)$;
3. $x_t$ is independent of $u_t$ and is also distributed iid SαS$(\alpha_x, 0, \gamma_x, 0)$, with $\alpha_x \in (1, 2)$;
4. the regressor and the error term have the same index of stability, i.e., $\alpha_x = \alpha_u = \alpha$;
5. the coefficients $c$ and $\theta$ are estimated consistently.¹⁰ Denote the point estimates by $\hat{c}$ and $\hat{\theta}$.

Assumption 1 may easily be generalized to include autoregressions of the form $y_t = c + \theta y_{t-1} + u_t$. Assumptions 2 and 3 could be relaxed without loss of generality, though at the cost of lengthening the proofs below, to require merely that the variables be in the normal domain of attraction of symmetric stable laws. The symmetry assumption for $u_t$ is strictly required only if $\alpha = 1$; for $1 < \alpha < 2$, it could be replaced (again at the cost of lengthening the proofs) with the requirement that $E\,u_t = 0$.¹¹ Assumption 4, that the regressor and the error term have the same index of stability, is rather strong, and its empirical validity may be difficult to ascertain in practice. In Corollary 2 below, we therefore examine the consequences of having unequal values for the indices of stability for $x_t$ and $u_t$ for the asymptotic properties of the coefficient of determination.

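The coefficient of determination, $R^2$, is the proportion of the total squared variation in the dependent variable that is “explained” by the regression¹²:

$$R^2 = \frac{\text{Explained Sum of Squares}}{\text{Total Sum of Squares}} = \frac{\sum (\hat{y}_t - \bar{y})^2}{\sum (y_t - \bar{y})^2}. \tag{8}$$

Some basic calculations show that $\hat{y}_t = \hat{c} + \hat{\theta} x_t$, $y_t = \hat{y}_t + \hat{u}_t$, and $\bar{y} = \hat{c} + \hat{\theta}\bar{x}$, where $\bar{y}$ and $\bar{x}$ are the respective sample averages, and hence that $\hat{y}_t - \bar{y} = \hat{\theta}(x_t - \bar{x})$ and $y_t - \bar{y} = \hat{\theta}(x_t - \bar{x}) + \hat{u}_t$. Since $\sum (x_t - \bar{x})\hat{u}_t = 0$ by construction, the coefficient of determination may also be expressed as

$$R^2 = \frac{\hat{\theta}^2 \sum (x_t - \bar{x})^2}{\hat{\theta}^2 \sum (x_t - \bar{x})^2 + \sum \hat{u}_t^2}. \tag{9}$$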
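The equivalence of Eqs. (8) and (9) is purely algebraic and can be verified on any sample. The Python sketch below fits a bivariate OLS regression by hand and computes $R^2$ both ways; the helper names (`ols_fit`, `r2_two_ways`, `make_sample`) and the Gaussian test data are illustrative assumptions and are not taken from the paper.

```python
import random


def ols_fit(x, y):
    """Return the OLS intercept and slope (c_hat, theta_hat) of y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    theta_hat = s_xy / s_xx
    return y_bar - theta_hat * x_bar, theta_hat


def r2_two_ways(x, y):
    """Compute R^2 as ESS/TSS (Eq. (8)) and via the decomposition in Eq. (9)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    c_hat, theta_hat = ols_fit(x, y)
    fitted = [c_hat + theta_hat * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ess = sum((fi - y_bar) ** 2 for fi in fitted)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    signal = theta_hat ** 2 * sum((xi - x_bar) ** 2 for xi in x)
    noise = sum(e * e for e in resid)
    return ess / tss, signal / (signal + noise)


def make_sample(T=200, c=0.5, theta=1.0, seed=0):
    """Illustrative data: Gaussian regressor and errors (an assumption of this sketch)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(T)]
    y = [c + theta * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


if __name__ == "__main__":
    r2_eq8, r2_eq9 = r2_two_ways(*make_sample())
    print(f"Eq. (8): {r2_eq8:.6f}   Eq. (9): {r2_eq9:.6f}")
```

The two values agree up to floating-point rounding because the OLS residuals are orthogonal to the demeaned regressor.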

To obtain asymptotically nondegenerate limits for the terms in the numerator and denominator of this expression, it is necessary to norm them by functions of the sample size $T$. Because $x_t^2$ and $u_t^2$ are in the normal domain of attraction of a stable distribution with index of stability $\alpha/2$, norming by $T^{-2/\alpha}$ rather than by $T^{-1}$ is required to obtain non-degenerate limits for the sums of the squared variables. Because $\hat{\theta} \to_p \theta$ by Assumption 5, applying the law of large numbers, the continuous mapping theorem, and the results of Davis and Resnick (1985b) yields the following expression for the joint limiting distribution of the elements in Eq. (9), where $\approx$ denotes approximate equality and $\xrightarrow{L}$ denotes convergence in law as $T \to \infty$:

$$\begin{aligned}
\left(T^{-2/\alpha}\gamma_u^{-2}\sum \hat{u}_t^2,\ \hat{\theta}^2 T^{-2/\alpha}\gamma_x^{-2}\sum (x_t - \bar{x})^2\right)
&\approx \left(T^{-2/\alpha}\gamma_u^{-2}\sum u_t^2,\ \theta^2 T^{-2/\alpha}\gamma_x^{-2}\sum x_t^2\right) \\
&= \left(T^{-2/\alpha}\sum (u_t/\gamma_u)^2,\ \theta^2 T^{-2/\alpha}\sum (x_t/\gamma_x)^2\right) \\
&\xrightarrow{L} \left(S_u,\ \theta^2 S_x\right).
\end{aligned} \tag{10}$$

¹⁰ If $\alpha_x = \alpha_u$, OLS generates consistent estimates of $c$ and $\theta$. See Samorodnitsky et al. (2007) for an overview and discussion of estimation methods that are consistent for various combinations of $\alpha_u$ and $\alpha_x$.
¹¹ See, for instance, Loretan and Phillips (1994) for a discussion of how far the maintained assumptions listed above could be weakened.
¹² Unless indicated otherwise, all summations are understood as $t$ running from 1 to $T$.

For $\alpha < 2$, the random variables $S_u$ and $S_x$ are independent (because $x_t$ and $u_t$ are independent), maximally right-skewed, and positive stable random variables with parameters $\alpha/2 < 1$, $\gamma = 1$,¹³ $\beta = +1$, $\delta = 0$, and log characteristic function

$$\ln E\, e^{i\tau S} = -|\tau|^{\alpha/2}\left[1 - i\,\mathrm{sign}(\tau)\tan(\pi\alpha/4)\right]. \tag{11}$$

We deduce that the $R^2$ statistic stated in Eq. (8) has a nondegenerate limiting distribution:

Theorem 1. Under Assumptions 1 to 5 and $\alpha < 2$, the coefficient of determination given in Eq. (8) converges in law:

$$R^2 \xrightarrow{L} \frac{\theta^2\gamma_x^2 S_x}{\theta^2\gamma_x^2 S_x + \gamma_u^2 S_u} = \frac{\eta S_x}{\eta S_x + S_u} = \frac{\eta Z}{\eta Z + 1} = R(\alpha, \eta), \tag{12}$$

where $\eta = (\theta\gamma_x/\gamma_u)^2 \ge 0$,¹⁴ $Z = S_x/S_u$, and $S_x$ and $S_u$ are iid $S(\alpha/2, 1, 1, 0)$.

Thus, for $\alpha < 2$ and $\eta > 0$, the coefficient of determination does not converge to a constant but, instead, has a nondegenerate asymptotic distribution on the interval [0, 1]. This is in stark contrast to the finite-variance case¹⁵:

Corollary 1. If $\alpha_u = \alpha_x = 2$, the limit of $R^2$ as $T \to \infty$ is given by

$$R^2 \to_p \frac{\theta^2\sigma_x^2}{\theta^2\sigma_x^2 + \sigma_u^2} = \frac{\eta}{\eta + 1}, \tag{13}$$

where $\eta = (\theta\sigma_x/\sigma_u)^2$.

Proof. If $\alpha = 2$, the dispersion parameter $\gamma$ in Eq. (1) equals $\sigma/\sqrt{2}$. Therefore, the normalization of the terms in Eq. (10) by $T^{-1}\gamma_x^{-2}$ and $T^{-1}\gamma_u^{-2}$, respectively, produces a constant of 2 in both cases. The corollary follows immediately from this observation. □
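The contrast between the random limit in Theorem 1 and the constant limit in Corollary 1 can be illustrated by simulating the limit variable $\eta S_x/(\eta S_x + S_u)$ directly. The Python sketch below is a minimal illustration under stated assumptions: it draws positive stable variates with Kanter's representation (a sampling method chosen here, not one described in the paper), whose scale differs from the normalization $\gamma = 1$ used above; this does not matter because only the scale-free ratio $S_x/S_u$ enters. All function names are hypothetical.

```python
import math
import random
import statistics


def positive_stable(rho: float, rng: random.Random) -> float:
    """Draw a positive, maximally right-skewed stable variate with index rho in (0, 1).

    Uses Kanter's representation; the draw has Laplace transform exp(-s**rho).
    """
    u = math.pi * (1.0 - rng.random())  # uniform on (0, pi]
    e = rng.expovariate(1.0)            # unit exponential
    return (math.sin(rho * u) / math.sin(u) ** (1.0 / rho)
            * (math.sin((1.0 - rho) * u) / e) ** ((1.0 - rho) / rho))


def limit_r2_draws(alpha: float, eta: float, n: int, seed: int = 0):
    """Simulate n draws of eta*S_x / (eta*S_x + S_u) with S_x, S_u iid (alpha/2)-stable."""
    rng = random.Random(seed)
    rho = alpha / 2.0
    draws = []
    for _ in range(n):
        s_x = positive_stable(rho, rng)
        s_u = positive_stable(rho, rng)
        draws.append(eta * s_x / (eta * s_x + s_u))
    return draws


if __name__ == "__main__":
    eta = 1.0
    q1, q2, q3 = statistics.quantiles(limit_r2_draws(1.5, eta, 20_000), n=4)
    print(f"alpha = 1.5: quartiles of the limit variable = {q1:.3f}, {q2:.3f}, {q3:.3f}")
    print(f"alpha = 2.0: constant limit eta/(eta+1)      = {eta / (eta + 1.0):.3f}")
```

For $\alpha = 1.5$ the simulated quartiles are spread widely over the unit interval, whereas the finite-variance limit is the single number $\eta/(\eta+1)$.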

For finite-variance regressors and error terms, the model’s asymptotic signal to noise ratio, $\eta = (\theta\sigma_x/\sigma_u)^2$, is a constant, as is therefore the limit of the coefficient of determination. In contrast, in the infinite-variance case the model’s limiting signal to noise ratio is given by the random variable $\eta Z$, where $\eta = (\theta\gamma_x/\gamma_u)^2$ and $Z = S_x/S_u$. It is this feature that causes the randomness of $R(\alpha, \eta)$.

¹³ To prove that $\gamma = 1$, see Brockwell and Davis (1991, p. 529, Eq. 13.3.14). In that equation, put $C = C(\alpha/2)$, where the function $C(\cdot)$ is given by Eq. (4), and employ the recursive relationship $\Gamma(2 - \alpha/2) = (1 - \alpha/2)\Gamma(1 - \alpha/2)$ to obtain the required result.
¹⁴ Because the dispersion parameters $\gamma_x$ and $\gamma_u$ are positive, $\eta = 0$ if and only if $\theta = 0$.
¹⁵ This corollary is stated for the case of $x_t$ and $u_t$ being normally distributed. It is straightforward to show that it holds much more generally, viz., for $x_t$ and $u_t$ having finite variances.

Assumption 4, i.e., the requirement $\alpha_x = \alpha_u$, is crucial in obtaining the result that the asymptotic distribution of $R$ is nondegenerate. Indeed, if the two indices of stability are not equal, the $R^2$ statistic has very different asymptotic properties:

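Corollary 2. Suppose that the maintained assumptions of Theorem 1 apply except that $\alpha_x \ne \alpha_u$, i.e., suppose that the indices of stability of the regressor and error term are unequal. Let $\theta \ne 0$. Then,

• If $\alpha_x < \alpha_u$, $1 - R^2 = o_p\!\left(T^{2/\alpha_u - 2/\alpha_x}\right)$; and
• If $\alpha_u < \alpha_x$, $R^2 = o_p\!\left(T^{2/\alpha_x - 2/\alpha_u}\right)$.

Thus, $R^2 \to_p 1$ if $\alpha_x < \alpha_u$ and $R^2 \to_p 0$ if $\alpha_u < \alpha_x$.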

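Proof. If $\alpha_x \ne \alpha_u$, different norming factors, viz., $T^{2/\alpha_x}$ and $T^{2/\alpha_u}$, are needed in Eq. (10) to achieve joint convergence of the terms $\sum (x_t - \bar{x})^2$ and $\sum u_t^2$ to $\gamma_x^2 S_x$ and $\gamma_u^2 S_u$, respectively. Suppose first that $\alpha_x < \alpha_u$. Because $T^{2/\alpha_x} > T^{2/\alpha_u}$, we find

$$T^{-2/\alpha_x}\sum u_t^2 = T^{2/\alpha_u - 2/\alpha_x} \cdot T^{-2/\alpha_u}\sum u_t^2 = o_p\!\left(T^{2/\alpha_u - 2/\alpha_x}\right).$$

Therefore,

$$R^2 = \frac{\hat{\theta}^2 T^{-2/\alpha_x}\sum (x_t - \bar{x})^2}{\hat{\theta}^2 T^{-2/\alpha_x}\sum (x_t - \bar{x})^2 + T^{-2/\alpha_x}\sum \hat{u}_t^2} \xrightarrow{L} \frac{\theta^2\gamma_x^2 S_x}{\theta^2\gamma_x^2 S_x + o_p\!\left(T^{2/\alpha_u - 2/\alpha_x}\right)} \to_p 1.$$

Similarly, if $\alpha_u < \alpha_x$, $T^{-2/\alpha_u}\sum (x_t - \bar{x})^2 = o_p\!\left(T^{2/\alpha_x - 2/\alpha_u}\right)$ and $R^2 \to_p 0$. □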

If $\alpha_x \ne \alpha_u$ and $\theta \ne 0$, the limiting distribution of the $R^2$ statistic is degenerate at either 0 or 1 because the model’s asymptotic signal to noise ratio is either zero (if $\alpha_u < \alpha_x$) or infinite (if $\alpha_x < \alpha_u$). From this proof one may also deduce that if $\alpha_x \ne \alpha_u$, Assumption 5 (that $\theta$ is estimated consistently) could be relaxed to require merely that $\hat{\theta} = o_p(1)$; the result that $R^2$ converges either to 0 or to 1 would then continue to hold.
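The finite-sample counterpart of Theorem 1 and Corollary 2 can be explored with a small Monte Carlo experiment. The Python sketch below is an illustration under stated assumptions, not the simulation design of the paper: it generates standard SαS draws with the Chambers–Mallows–Stuck method, sets $c = 0$ and $\gamma_x = \gamma_u = 1$, and uses sample sizes and replication counts chosen only to keep the run time short. All function names are hypothetical.

```python
import math
import random
import statistics


def sas_draw(alpha: float, rng: random.Random) -> float:
    """Standard symmetric alpha-stable draw (scale 1) via Chambers-Mallows-Stuck."""
    v = (rng.random() - 0.5) * math.pi  # uniform on [-pi/2, pi/2)
    w = rng.expovariate(1.0)            # unit exponential
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))


def ols_r2(x, y) -> float:
    """R^2 of the OLS regression of y on a constant and x (squared sample correlation)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy * s_xy / (s_xx * s_yy)


def simulate_r2(alpha_x, alpha_u, theta=1.0, T=1000, reps=200, seed=0):
    """Return `reps` simulated R^2 values for y_t = theta * x_t + u_t."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        x = [sas_draw(alpha_x, rng) for _ in range(T)]
        y = [theta * xi + sas_draw(alpha_u, rng) for xi in x]
        out.append(ols_r2(x, y))
    return out


if __name__ == "__main__":
    # Equal indices: R^2 stays dispersed. Unequal indices: R^2 piles up near 1 or 0.
    for a_x, a_u in ((1.5, 1.5), (1.2, 1.9), (1.9, 1.2)):
        q1, q2, q3 = statistics.quantiles(simulate_r2(a_x, a_u, T=2000, reps=100), n=4)
        print(f"alpha_x = {a_x}, alpha_u = {a_u}: quartiles = {q1:.3f}, {q2:.3f}, {q3:.3f}")
```

With equal indices the simulated $R^2$ values remain widely dispersed; with $\alpha_x < \alpha_u$ they cluster near 1 and with $\alpha_u < \alpha_x$ near 0, even though $\theta = 1$ in all three designs.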

3.2. Qualitative properties of R

Returning to the case of $\alpha_x = \alpha_u = \alpha$, we note that the random variable $R$ is defined for all values of $\alpha \in (0, 2)$ even though in a regression context one would typically assume that $\alpha \in (1, 2)$. We now establish some important qualitative properties of $R$.

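Remark 1. For $\theta \ne 0$ and hence $\eta > 0$, the median of $R$, $m$, equals $\eta/(\eta + 1)$.

Proof. For $\eta > 0$, observe that

$$\begin{aligned}
\Pr\!\left(R \le \frac{\eta}{\eta + 1}\right)
&= \Pr\!\left(\frac{\eta S_x}{\eta S_x + S_u} \le \frac{\eta}{\eta + 1}\right) \\
&= \Pr\!\left(S_x \le \frac{1}{\eta + 1}(\eta S_x + S_u)\right) \\
&= \Pr\!\left((\eta + 1)S_x - \eta S_x \le S_u\right) \\
&= \Pr\!\left(S_x \le S_u\right).
\end{aligned}$$

Because $S_x$ and $S_u$ are iid and have continuous cdfs, $S_x - S_u$ is symmetric about 0 and therefore $\Pr(S_x - S_u \le 0) = \Pr(S_x \le S_u) = 0.5$. This establishes that the median of $R$ equals $\eta/(\eta + 1)$. □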

Thus, $m$ is equal to the non-random limit of $R^2$ in the finite-variance case. Since $S_x$ and $S_u$ are positive a.s., we also have $\Pr(S_x/S_u \le 1) \equiv \Pr(Z \le 1) = 0.5$, i.e., the median of $Z \equiv S_x/S_u$ is equal to 1 regardless of the value of α. As we show in more detail below, the probability mass of $Z$ is highly concentrated around its median for values of α close to 2, whereas for small values of α, $Z$ is unlikely to be close to 1. (Instead, it is much more likely that one will obtain a draw of $Z$ that is either very close to 0 or very large.) Small and large draws of $Z$ strongly affect the model’s signal to noise ratio and therefore also $R^2$. This suggests that an informal measure of the effect of infinite variance in the regression variables on the value of $R^2$ in a given sample may be based on the difference between the model’s coefficient of determination and a consistent estimate of its median $m$, say $\hat{m} = \hat{\eta}/(\hat{\eta} + 1)$, where $\hat{\eta} = (\hat{\theta}\hat{\gamma}_x/\hat{\gamma}_u)^2$. The larger the difference between $R^2$ and $\hat{m}$, the more important is the consequence of having obtained a small (or large) draw of $Z$ for the sample value of $R^2$.

2. Thelarger the difference between R2 and m, the more important is theconsequence of having obtained a small (or large) draw of Z for thesample value of R2.

Another well-known finite-variance property of $R^2(\eta)$, viz., $R^2(1/\eta) = 1 - R^2(\eta)$ for $\eta > 0$, also carries over to $R$:

Remark 2. For $\theta \ne 0$ and $\eta \ne 1$, the distribution of $R(\alpha, \eta)$ is skew-symmetric:

$$R(\alpha, \eta) \overset{d}{=} 1 - R(\alpha, 1/\eta).$$

The distribution of $R$ is symmetric for $\eta = 1$.

Proof. Recall that $S_x$ and $S_u$ are iid. Thus, for $\eta > 0$

$$1 - R(\alpha, 1/\eta) = 1 - \frac{(1/\eta)S_x}{(1/\eta)S_x + S_u} = \frac{S_u}{(1/\eta)S_x + S_u} = \frac{\eta S_u}{\eta S_u + S_x} \overset{d}{=} \frac{\eta S_x}{\eta S_x + S_u} = R(\alpha, \eta).$$

The symmetry of $R$ about 0.5 for $\eta = 1$ also follows from this result. □

Even without knowing the functional form of either the cdf or pdf of $R$, one can further show that whereas its cdf is continuous on the support, its pdf is unbounded at both 0 and 1:

Remark 3. (i) The cdf of $R$ is continuous on [0, 1], and the distribution does not have atoms at 0 and 1. (ii) For $\theta \ne 0$ and hence $\eta > 0$, the pdf of $R$ is unbounded at 0 and 1, i.e., $f_R(0) = f_R(1) = \infty$.

Proof. The continuity of the cdf of $R$ on (0, 1) for $\eta > 0$ follows from the continuity of the cdfs of $S_x$ and $S_u$ on $\mathbb{R}_+$. Because the pdfs and cdfs of $S_x$ and $S_u$ are equal to 0 at the origin, it follows that $\Pr(R = 1) = \Pr(S_u = 0) = 0$ a.s. Because the choice of $\eta$ is arbitrary, the result $\Pr(R = 0) = 0$ follows from an application of Remark 2.

To establish the second part of the remark, we apply a standard result for the pdf of the ratio of two random variables (see, e.g., Mood et al. (1974, p. 187)) adapted to the present case of positive random variables. For $\eta > 0$, set $V = \eta S_x$ and $W = \eta S_x + S_u$. We have

$$f_R(r) = \int_0^{\infty} w\, f_{V,W}(rw, w)\, dw, \quad 0 \le r \le 1,$$

where the joint pdf $f_{V,W}(\cdot, \cdot)$ is nonzero on $\mathbb{R}_+ \times \mathbb{R}_+$. The case $r = 1$ can occur only if $S_u = 0$. If $S_u = 0$, however, the random variables $V$ and $W$ are equal and thus perfectly dependent, in which case their joint pdf, $f_{V,V}(w, w)$, is nonzero only on the positive 45° halfline, where it reduces to $(1/\sqrt{2})f_V(w)$, $w \ge 0$. Hence, for $r = 1$ we find

$$f_R(1) = \int_0^{\infty} w\, f_{V,V}(1 \cdot w, w)\, dw = \frac{1}{\sqrt{2}}\int_0^{\infty} w\, f_V(w)\, dw = \frac{1}{\sqrt{2}}E(V) = \frac{\eta}{\sqrt{2}}E(S_x) = \infty.$$

By Remark 2, we have $f_R(0) = \infty$ as well. □

The fact that the pdf of $R$ has infinite singularities at 0 and 1 may seem unusual at first. However, such singularities are a regular feature of pdfs that are based on ratios of stable random variables. For example, Logan et al. (1973) and Fiorio et al. (2010) show that if $\alpha < 1$, the density of the t-statistic has infinite modes at $-1$ and $+1$. Similarly, Phillips and Loretan (1991) show that if $\alpha < 2$ the limiting pdfs of the von Neumann ratio and the normalized Durbin–Watson test statistic have an infinite mode at 0.

3.3. The cdf and pdf of R

The results stated in the three remarks of the preceding subsection provide important qualitative information about some of the distributional properties of R. However, they do not address issues such as whether the pdf has modes beyond those at 0 and 1, whether the discontinuity of the pdf at the endpoints is simple or if f_R(r) diverges (and, if so, at which rate) as r ↓ 0 or r ↑ 1, or how much of the distribution's mass is concentrated near the endpoints of the support. To examine these issues, it is necessary to have expressions for the cdf and pdf of R. We now show that it is possible to provide closed-form statements for both the cdf and pdf of R, because this random variable is a continuously differentiable and invertible function of the ratio of two independent, maximally right-skewed, and positive α-stable random variables and because closed-form expressions for the cdf and pdf of this ratio are known; the expression for the cdf of the ratio is due to Zolotarev (1986, p. 205), and that for the pdf is given in Runde (1993, p. 11).

Proposition 1. Let S_1 and S_2 be iid positive α-stable random variables with common parameters α/2 ∈ (0, 1), γ = 1, β = +1, and δ = 0. Set Z = S_1/S_2.

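For z ≥ 0 (see footnote 16), the cdf of Z is given by

$$ F_Z(z) = \Pr(Z \le z) = \frac{1}{\pi\alpha/2} \left[ \arctan\!\left( \frac{z^{\alpha/2} + \cos(\pi\alpha/2)}{\sin(\pi\alpha/2)} \right) - \frac{\pi}{2} \right] + 1. \tag{14} $$

For z > 0, the pdf of Z is given by

$$ f_Z(z) = \frac{d}{dz} F_Z(z) = \frac{\sin(\pi\alpha/2)}{\pi z \left[ z^{-\alpha/2} + z^{\alpha/2} + 2\cos(\pi\alpha/2) \right]}. \tag{15} $$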

The random variable Z has several interesting properties. First, note that lim_{z↓0} f_Z(z) = ∞ and that the rate of divergence to infinity of f_Z(z) as z ↓ 0 is given by (1/z)^{1−α/2}. Thus, the pdf of Z has a one-sided infinite singularity at 0. Second, as z → ∞, f_Z(z) ≈ κ · z^{−α/2−1} for a suitable constant κ > 0. This result, along with Pr(Z > 0) = 1, implies that Z lies in the normal domain of attraction of a positive stable distribution, say, Z′, with index of stability α/2 and skewness parameter β = +1, i.e., the same parameter values as those of the component random variables S_1 and S_2 (see footnote 17). Hence, the mean of Z is infinite for all values of α < 2. Third, in the special case of α = 1, S_1 and S_2 are each distributed independently as a Lévy α-stable random variable, which is well known to be equivalent to the inverse of a χ²(1) random variable. The ratio of two independent χ²(1) random variables has, of course, a central F_{1,1}-distribution. For α = 1, the pdf of Z is given by (π z^{1/2} (1 + z))^{−1}. This expression can be derived both by setting α = 1 in Eq. (15) and by setting the two parameters of the F-distribution equal to 1; see also Runde (1993).

16 Because Z is a positive random variable, F_Z(z) = f_Z(z) = 0 for z < 0.

As we noted earlier, the median of Z is equal to 1 for all values of α ∈ (0, 2]. The regression model's limiting signal to noise ratio is a function of the random variable Z if α < 2, whereas it converges to a constant in the finite-variance case. The fact that the random variable Z has a median of 1 helps to develop the intuition that underlies the result of Remark 1, viz., that the median of R, η/(η + 1), is the same in both the finite- and infinite-variance cases. An inspection of Eq. (14) reveals that lim_{α↑2} Pr(Z ≤ z) = 0 for every fixed z < 1 and lim_{α↑2} Pr(Z > z) = 0 for every fixed z > 1. Put differently, Z converges in probability to 1 as α ↑ 2. The probability mass of Z therefore becomes perfectly concentrated at 1 as α ↑ 2 (even though, of course, its mean remains infinite as long as α < 2), and the model's signal to noise ratio ηZ therefore converges to the constant η as α ↑ 2.
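The properties of Z described above can be checked numerically. The following is a minimal Python sketch, not code from the paper: the function names `cdf_z`, `pdf_z` and `pdf_f11` are illustrative choices. The cdf is written with the −π/2 term inside the brackets, which is the form for which F_Z(0) = 0 and F_Z(z) → 1 as z → ∞. The loop prints the cdf just below, at, and just above the median for several values of α.

```python
import math


def cdf_z(z, alpha):
    """Cdf of Z = S1/S2 as in Eq. (14); alpha in (0, 2) is the index of stability."""
    if z <= 0.0:
        return 0.0
    a = alpha / 2.0
    arg = (z ** a + math.cos(math.pi * a)) / math.sin(math.pi * a)
    return (math.atan(arg) - math.pi / 2.0) / (math.pi * a) + 1.0


def pdf_z(z, alpha):
    """Pdf of Z as in Eq. (15), defined for z > 0."""
    a = alpha / 2.0
    bracket = z ** (-a) + z ** a + 2.0 * math.cos(math.pi * a)
    return math.sin(math.pi * a) / (math.pi * z * bracket)


def pdf_f11(z):
    """Pdf of a central F(1, 1) random variable, the alpha = 1 special case."""
    return 1.0 / (math.pi * math.sqrt(z) * (1.0 + z))


# The cdf at z = 1 stays at one half, while the mass on either side of 1
# moves towards 1 as alpha approaches 2.
for alpha in (0.5, 1.0, 1.5, 1.9, 1.99):
    print(alpha, cdf_z(0.9, alpha), cdf_z(1.0, alpha), cdf_z(1.1, alpha))
```

Comparing `pdf_z(z, 1.0)` with `pdf_f11(z)` at a few points gives a quick consistency check of the F(1, 1) special case.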

From Theorem 1, we have R = ηZ/(ηZ + 1) = g(Z), say. Note that Z ≡ S_x/S_u satisfies the conditions of Proposition 1 and that the function Z = g^{−1}(R) = (1/η)(R/(1 − R)) is continuously differentiable and strictly increasing in the interior of its domain. We are therefore able to provide closed-form expressions for the cdf and pdf of R by an application of the density transformation theorem (see footnote 18):

Theorem 2. Let the cdf and pdf of Z be given by Eqs. (14) and (15). For η > 0, set R = g(Z) = ηZ/(ηZ + 1); for r ∈ (0, 1), set z = g^{−1}(r) = r/[η(1 − r)].

(i) The cdf of R for r ∈ (0, 1) is given by

$$ F_R(r) = F_Z\!\left( g^{-1}(r) \right). \tag{16} $$

Furthermore, F_R(0) = 0 and F_R(1) = 1.

(ii) The pdf of R for r ∈ (0, 1) is given by

$$ f_R(r) = \left[ \frac{d}{dr} g^{-1}(r) \right] f_Z\!\left( g^{-1}(r) \right) = \frac{1}{\eta (1 - r)^2} \cdot \frac{\sin(\pi\alpha/2)}{\pi \, g^{-1}(r) \left\{ [g^{-1}(r)]^{-\alpha/2} + [g^{-1}(r)]^{\alpha/2} + 2\cos(\pi\alpha/2) \right\}} = \frac{\sin(\pi\alpha/2)}{\pi \, r (1 - r)} \cdot \left[ z^{-\alpha/2} + z^{\alpha/2} + 2\cos(\pi\alpha/2) \right]^{-1}, \tag{17} $$

where z = r/[η(1 − r)].

(iii) As r ↓ 0 or r ↑ 1, f_R(r) diverges to infinity at a rate proportional to (1/r)^{1−α/2} and [1/(1 − r)]^{1−α/2}, respectively.

17 See Mittnik et al. (1998) for a discussion of some of the properties of the stable law Z′.

18 For a statement of the density transformation theorem see, e.g., Mood et al. (1974, p. 200).


Fig. 1. Probability density functions of R (α, η), η = 1.

Fig. 2. Cumulative distribution functions of R (α, η), η = 1.

Proof. The first two claims follow immediately from Proposition 1 and the density transformation theorem. Because lim_{r↓0} dg^{−1}(r)/dr = η^{−1}, the rate of divergence of f_R(r) as r ↓ 0 is equal to that of f_Z(z) as z ↓ 0 (apart from the multiplicative constant η^{−1}), which is (1/z)^{1−α/2}. Finally, it follows from Remark 2 that as r ↑ 1 the pdf of R also diverges to infinity at this rate. □
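Theorem 2 translates directly into code. The sketch below is a hedged illustration rather than the authors' implementation: the function names (`g_inv`, `cdf_r`, `pdf_r`) are hypothetical, and the cdf of Z is coded with the −π/2 term inside the brackets so that it runs from 0 to 1. It evaluates Eqs. (16) and (17) and checks the last line of Eq. (17) against the chain-rule form in the first line.

```python
import math


def cdf_z(z, alpha):
    """Cdf of the ratio Z, Eq. (14)."""
    if z <= 0.0:
        return 0.0
    a = alpha / 2.0
    arg = (z ** a + math.cos(math.pi * a)) / math.sin(math.pi * a)
    return (math.atan(arg) - math.pi / 2.0) / (math.pi * a) + 1.0


def pdf_z(z, alpha):
    """Pdf of the ratio Z, Eq. (15)."""
    a = alpha / 2.0
    return math.sin(math.pi * a) / (
        math.pi * z * (z ** (-a) + z ** a + 2.0 * math.cos(math.pi * a)))


def g_inv(r, eta):
    """Inverse transformation z = r / (eta * (1 - r)) for r in (0, 1)."""
    return r / (eta * (1.0 - r))


def cdf_r(r, alpha, eta):
    """Cdf of R, Eq. (16), extended by F_R(0) = 0 and F_R(1) = 1."""
    if r <= 0.0:
        return 0.0
    if r >= 1.0:
        return 1.0
    return cdf_z(g_inv(r, eta), alpha)


def pdf_r(r, alpha, eta):
    """Pdf of R for r in (0, 1), last line of Eq. (17)."""
    a = alpha / 2.0
    z = g_inv(r, eta)
    bracket = z ** (-a) + z ** a + 2.0 * math.cos(math.pi * a)
    return math.sin(math.pi * a) / (math.pi * r * (1.0 - r) * bracket)


# Compare the closed form with the chain-rule form at an arbitrary point.
alpha, eta, r = 1.5, 2.0, 0.3
chain_rule = pdf_z(g_inv(r, eta), alpha) / (eta * (1.0 - r) ** 2)
print(pdf_r(r, alpha, eta), chain_rule)
print(cdf_r(eta / (eta + 1.0), alpha, eta))  # cdf at the median eta / (eta + 1)
```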

The pdfs and cdfs of R(α, η) for values of α between 0.5 and 1.98 are graphed in Figs. 1 and 2. (In all cases, we have set η = 1.) The pdfs in Fig. 1 are shown with a logarithmic scale on the ordinate. Since we know that f_R(0) = f_R(1) = ∞, we graph the functions only for r ∈ (10^{−13}, 1 − 10^{−13}).

The figures show that:

• if α is close to but less than 2, e.g., if α = 1.98 or α = 1.90, the pdf has an interior mode, and most of the probability mass of R is concentrated near its median. Conversely, only very little mass is located near 0 and 1;

• for α = 1.75 and α = 1.50, the pdf continues to have an interior mode (as well as, of course, the two unbounded modes at 0 and 1). However, the distribution is noticeably less concentrated around the interior mode than if α is closer to 2;

• if α declines further, increasingly more of the probability mass of R is located near 0 and 1. If α = 0.5, about 70% of the probability mass lies within 0.05 of the two endpoints of the distribution's support, while the probability of obtaining a realization of R for r ∈ [0.25, 0.75] (recall that the distribution's median, by design, is located at 0.5) is less than 15%.
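The mass statements in the list above can be checked from the closed-form cdf of R with η = 1. The snippet below is a small sketch under that assumption; `cdf_r` and `mass_between` are illustrative names, and the cdf of Z is used in the form that includes the −π/2 term inside the brackets. The printed values can be compared with the approximate figures quoted in the list.

```python
import math


def cdf_r(r, alpha, eta=1.0):
    """Cdf of R obtained by evaluating the cdf of Z at z = r / (eta * (1 - r))."""
    if r <= 0.0:
        return 0.0
    if r >= 1.0:
        return 1.0
    a = alpha / 2.0
    z = r / (eta * (1.0 - r))
    arg = (z ** a + math.cos(math.pi * a)) / math.sin(math.pi * a)
    return (math.atan(arg) - math.pi / 2.0) / (math.pi * a) + 1.0


def mass_between(lo, hi, alpha, eta=1.0):
    """Probability that R falls in the interval [lo, hi]."""
    return cdf_r(hi, alpha, eta) - cdf_r(lo, alpha, eta)


for alpha in (0.5, 1.5, 1.75, 1.9, 1.98):
    near_ends = mass_between(0.0, 0.05, alpha) + mass_between(0.95, 1.0, alpha)
    middle = mass_between(0.25, 0.75, alpha)
    print(f"alpha={alpha}: near endpoints={near_ends:.3f}, middle half={middle:.3f}")
```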

A heuristic summary of the properties of R is straightforward. We begin by recalling that the function C(α) in Eq. (4) controls the probability of observing tail-region values of the random variables in question, and that the rate of decline of the pdf of an α-stable distribution in the tail region increases as α ↑ 2. Suppose first that α is very close to 2. Then, C(α) is close to 0, and the fraction of observations of x_t and u_t that fall into the respective Paretian-tail regions is therefore very small. Moreover, given the fairly rapid rate of decline of the pdf's tails for α close to 2, the likelihood of obtaining a very large draw, conditional on having obtained a draw from the tail region, is also low. As a result, the unconditional probability of observing very large observations of x_t and u_t is low if α is close to 2. In turn, this makes observing very large draws of either S_x or S_u, and thus observing a value of Z that is either close to 0 or very large, a low-probability event. In summary, if α is very close to 2, Z is likely close to its median of 1, and most of the mass of R is therefore concentrated close to its median, viz., η/(η + 1).

Next, suppose that α takes on a value of about 1.5. The value of C(α) is now larger than if α is closer to 2, which implies a higher frequency of tail-region draws for x_t and u_t. Moreover, the pdf now declines more slowly, and it is therefore much more likely to obtain very large draws of the regressor and error term conditional on having obtained a draw from the tail region. In consequence, for values of α of about 1.5, the probability of obtaining a draw of Z that is not close to its median of 1 is much larger than if α is closer to 2, and the probability mass of R is therefore more dispersed.

Finally, as α decreases further, both the probability of drawing tail observations and the likelihood that draws from the distributions' tail areas will be very large increase. Therefore, it becomes increasingly likely that the largest few observations of x_t and u_t will dominate the realization of Z and therefore the realization of R. As a result, if α is sufficiently small, i.e., if α < 1.2, the central mode of the pdf of R vanishes entirely and almost all of its probability mass is located very close to the endpoints of the distribution's support. In the limit, as α ↓ 0, R converges to a Bernoulli random variable, for which all of the probability mass is located at 0 and 1.
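The heuristic account above can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden illustration, not the simulation design used in the paper: it draws standard symmetric α-stable variates with the Chambers–Mallows–Stuck method (a sampler chosen here for convenience), sets θ = 1 with equal scales for the regressor and the error term so that the median of R is expected to lie near 0.5, and computes R² from a bivariate OLS regression with an intercept. The names `rvs_sas` and `simulate_r2` are hypothetical.

```python
import numpy as np


def rvs_sas(alpha, size, rng):
    """Standard symmetric alpha-stable draws (Chambers-Mallows-Stuck method)."""
    u = rng.uniform(-np.pi / 2.0, np.pi / 2.0, size)
    w = rng.exponential(1.0, size)
    if alpha == 1.0:
        return np.tan(u)  # Cauchy special case
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def simulate_r2(alpha, theta=1.0, T=500, reps=1000, seed=0):
    """R^2 of y = theta * x + u for iid stable x and u, one value per replication."""
    rng = np.random.default_rng(seed)
    x = rvs_sas(alpha, (reps, T), rng)
    u = rvs_sas(alpha, (reps, T), rng)
    y = theta * x + u
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    sxy = (xc * yc).sum(axis=1)
    # In a bivariate regression with intercept, R^2 is the squared sample correlation.
    return sxy ** 2 / ((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))


for alpha in (0.5, 1.5, 1.9):
    r2 = simulate_r2(alpha)
    middle = np.mean((r2 >= 0.25) & (r2 <= 0.75))
    print(f"alpha={alpha}: median R2={np.median(r2):.3f}, share in [0.25, 0.75]={middle:.3f}")
```

As α falls, the share of simulated R² values in the middle of the unit interval should shrink, in line with the description above.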

4. Empirical application: revisiting the CAPM

Fama and MacBeth (1973) proposed the so-called Fama–MacBeth regression to test the hypothesis of a linear relationship between risk (market beta) and risk premium (returns) of stocks, or portfolios of stocks, in a cross-sectional setting. Their method requires two stages. In the first stage, the market beta of each stock (or portfolio of stocks) is estimated in a time series regression. In the second stage, the return of each stock (or portfolio of stocks) is regressed on the corresponding market betas.

Let r_it be the return on portfolio i at time t, where i = 1, . . . , N and t = 1, . . . , T. Denote the average return of portfolio i as r̄_i = T^{−1} Σ_{t=1}^{T} r_it, the average portfolio return at time t as R_t = N^{−1} Σ_{i=1}^{N} r_it, and the average portfolio return across all time periods by µ_R = T^{−1} Σ_{t=1}^{T} R_t. The first-stage regression, for each portfolio i, is an ex post CAPM,

$$ r_{it} = \zeta_i + \beta_i R_t + u_{it}, \qquad t = 1, \dots, T, \tag{18} $$

where it is assumed that E(u_it) = 0, E(u_it R_t) = 0, and u_it is distributed iid SαS with the same index α ∈ (1, 2] as r_it. The coefficients β_i, which denote the sensitivity of portfolio i to overall market returns at time t, may be assumed to follow a distribution of finite variance, denoted by Var(β). This assumption, which is empirically realistic, plays a crucial role in the derivation of some of the results below. Denote the OLS estimates of the slope coefficients in Eq. (18) by β̂_i.
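The first stage defined above, together with the second stage described at the start of this section, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `fama_macbeth_two_stage` is hypothetical, the market return is taken to be the equal-weighted cross-sectional average R_t, the second stage regresses average returns on the estimated betas, and the demonstration data are Gaussian and made up for illustration only.

```python
import numpy as np


def fama_macbeth_two_stage(returns):
    """Two-stage regression on a T x N array of portfolio returns.

    Returns the first-stage beta estimates, the second-stage intercept and
    slope, and the second-stage R^2.
    """
    r = np.asarray(returns, dtype=float)
    market = r.mean(axis=1)                      # R_t, cross-sectional average
    m_c = market - market.mean()
    # Stage 1: OLS slope of each portfolio's return on the market return.
    betas = (m_c @ (r - r.mean(axis=0))) / (m_c @ m_c)
    # Stage 2: cross-sectional OLS of average returns on the estimated betas.
    avg = r.mean(axis=0)
    b_c = betas - betas.mean()
    a_c = avg - avg.mean()
    lam1 = (b_c @ a_c) / (b_c @ b_c)
    lam0 = avg.mean() - lam1 * betas.mean()
    resid = avg - lam0 - lam1 * betas
    r2 = 1.0 - (resid @ resid) / (a_c @ a_c)
    return betas, lam0, lam1, r2


# Illustrative data only: a one-factor model with Gaussian noise.
rng = np.random.default_rng(0)
T, N = 330, 100
true_beta = rng.normal(1.0, 0.3, N)
factor = rng.normal(0.1, 1.0, T)
data = np.outer(factor, true_beta) + rng.normal(0.0, 1.0, (T, N))
betas, lam0, lam1, r2 = fama_macbeth_two_stage(data)
print(betas.mean(), lam0, lam1, r2)
```

Because the market return is the average of the same N portfolios, the estimated betas average to one by construction in this sketch.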

The second-stage Fama–MacBeth regression is given by

$$ \bar r_i = \lambda_0 + \lambda_1 \hat\beta_i + \varepsilon_i, \qquad i = 1, \dots, N, \tag{19} $$


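where ε_i is iid SαS with the same index α as r_it, E(ε_i) = 0, and E(ε_i β̂_i) = 0. The R² statistic of the second-stage regression is given by

$$ R^2 = \frac{\lambda_1^2 \sum_{i=1}^{N} \left( \hat\beta_i - \bar\beta \right)^2}{\lambda_1^2 \sum_{i=1}^{N} \left( \hat\beta_i - \bar\beta \right)^2 + \sum_{i=1}^{N} \varepsilon_i^2}, \tag{20} $$

where β̄ denotes the mean of the N coefficients β̂_i.

This R² statistic has the following asymptotic properties: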

Theorem 3. If the individual portfolio returns r_it follow an iid SαS distribution with α ∈ (1, 2] and if µ_R > 0, the coefficient of determination in Eq. (20) has the following limits as T → ∞ and N → ∞:

• if α = 2, R² →p η/(η + 1), where η = λ_1² Var(β)/Var(ε); and

• if α < 2, R² = o_p(N^{1−2/α}).

Thus, if α < 2, R² →p 0 at a rate that is proportional to N^{1−2/α}.

Proof. The result for the finite-variance case follows from Corollary 1. For α < 2, the dispersion of β̂_i about β_i converges to 0 as T → ∞ because OLS is a consistent (though inefficient) estimator, and the distributional properties of the β̂_i's therefore converge to those of β_i as T → ∞. Recall that the variance of the β_i's is finite by assumption. As both N → ∞ and T → ∞, the numerator and the first summand in the denominator of Eq. (20), normalized by N, therefore converge to λ_1² Var(β). However, the second summand in the denominator of Eq. (20) requires normalization by N^{2/α} > N in order to attain a proper limit. The coefficient of determination of the second-stage Fama–MacBeth regression therefore converges to 0 in probability as N, T → ∞, at a rate proportional to N^{1−2/α}, irrespective of the value of λ_1. □

This result does not conflict with Theorem 1, as the present case is one of an unbalanced regression design: the regressor's variance is finite, whereas the error term's variance is infinite, implying that the model's asymptotic signal to noise ratio is 0 irrespective of the value of λ_1. This result is closely related to the one provided in Corollary 2, which states the asymptotic limits of R² for the case α_x ≠ α_u. We note that even if T is fixed (as is generally taken to be the case in Fama–MacBeth regressions) and thus if the model's signal to noise ratio is not strictly 0, the ratio of the sample dispersions of β̂_i and ε_i, which is related to the quantity γ_x/γ_u in Theorem 1, is likely to be very small. The model's signal to noise ratio η and the median R² of the second-stage regression will therefore also likely be very small, even if λ_1 ≠ 0.
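The unbalanced-design argument can be illustrated with a stylized simulation of the second stage alone. The sketch below rests on simplifying assumptions that are not taken from the paper: the betas are treated as known (the T → ∞ limit), they are drawn from a normal distribution with mean 1 and standard deviation 0.3, the errors are symmetric α-stable draws from the Chambers–Mallows–Stuck sampler with an arbitrary scale of 0.2, and λ_1 = 1. All names and parameter values are hypothetical.

```python
import numpy as np


def rvs_sas(alpha, size, rng):
    """Standard symmetric alpha-stable draws (Chambers-Mallows-Stuck method)."""
    u = rng.uniform(-np.pi / 2.0, np.pi / 2.0, size)
    w = rng.exponential(1.0, size)
    if alpha == 1.0:
        return np.tan(u)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def median_second_stage_r2(alpha, n_portfolios, lam1=1.0, sd_beta=0.3,
                           scale_eps=0.2, reps=400, seed=0):
    """Median R^2 of a cross-sectional regression with finite-variance betas
    and symmetric alpha-stable errors (alpha = 2 gives Gaussian errors)."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(1.0, sd_beta, (reps, n_portfolios))
    eps = scale_eps * rvs_sas(alpha, (reps, n_portfolios), rng)
    avg_ret = lam1 * beta + eps  # the intercept does not affect R^2
    bc = beta - beta.mean(axis=1, keepdims=True)
    rc = avg_ret - avg_ret.mean(axis=1, keepdims=True)
    r2 = (bc * rc).sum(axis=1) ** 2 / ((bc ** 2).sum(axis=1) * (rc ** 2).sum(axis=1))
    return float(np.median(r2))


for alpha in (1.5, 2.0):
    for n in (30, 300, 3000):
        print(alpha, n, round(median_second_stage_r2(alpha, n), 4))
```

With α < 2 the median R² should fall as N grows even though the slope is nonzero, whereas with α = 2 it should stay roughly constant.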

These qualitative observations are confirmed by a small-scale Monte Carlo simulation. In Table 1, we report the median value of R² as a function of α, T, N, and µ_R. The design of the simulation and the choices of values for α, T, N and µ_R were influenced by a desire to maximize the empirical relevance of the simulation exercise. We chose α = 1.5 and α = 1.75 because α ≥ 1.5 for most empirical economic datasets. We chose values of T of 250 and 1000; T = 250 corresponds approximately to the number of business days in a calendar year, and T = 1000 was selected to provide a comparison with a larger sample. The case µ_R = 0 was included to show that the result of Theorem 3 applies in finite samples both for λ_1 = 0 and for λ_1 ≠ 0. µ_R = 0.1 is particularly relevant for the empirical study provided below.

It is evident for both α = 1.5 and α = 1.75 that (i) the median value of R² declines as N increases if T is fixed, (ii) this effect is particularly strong if T itself is large, and (iii) this effect is more pronounced for α = 1.5 than it is for α = 1.75. The final result is as one would expect, given that Theorem 3 states that the

Table 1
Median value of R² as a function of α, T, N, and µ_R.

                        µ_R
α      T      N         0.0      0.1      0.3      0.5      1.0
1.50   250    30        0.0402   0.0426   0.0779   0.1598   0.4417
              100       0.0161   0.0190   0.0448   0.1058   0.3304
              500       0.0064   0.0075   0.0220   0.0565   0.2020
              1000      0.0047   0.0058   0.0172   0.0452   0.1667
       1000   30        0.0387   0.0484   0.1499   0.3320   0.6748
              100       0.0162   0.0223   0.0940   0.2272   0.5558
              500       0.0065   0.0104   0.0521   0.1341   0.3994
              1000      0.0046   0.0072   0.0399   0.1079   0.3443
1.75   250    30        0.0474   0.0642   0.2509   0.4899   0.7993
              100       0.0265   0.0430   0.2066   0.4264   0.7560
              500       0.0169   0.0290   0.1571   0.3500   0.6950
              1000      0.0143   0.0251   0.1440   0.3273   0.6730
       1000   30        0.0470   0.1193   0.5351   0.7665   0.9309
              100       0.0265   0.0871   0.4612   0.7124   0.9115
              500       0.0169   0.0635   0.3910   0.6507   0.8865
              1000      0.0144   0.0579   0.3663   0.6257   0.8744

The numbers in the body of the table are the medians from simulated regressions with 100,000 replications.

Fig. 3. CRSP returns, monthly data, July 1963 to December 1990.

rate of convergence of R² to zero increases as α moves down further from 2.

On the basis of the small value of the coefficient of determination from the Fama–MacBeth regression, Jagannathan and Wang (1996) confirm the finding of Fama and French (1992) of a "flat" relation between average return and market beta. They report a very low coefficient of determination of 1.35% = 0.0135 for the Sharpe–Lintner–Black (SLB) static CAPM. Regarding "thick-tailed" phenomena in empirical data, Fama and MacBeth (1973, p. 621) conjecture that neglecting the heavy-tails phenomenon of the data does not lead to serious errors in the interpretation of empirical results. To (re)examine this claim, we use the same CRSP dataset as was used by Jagannathan and Wang (1996); the data are very similar to those that were used in the study of Fama and French (1992). The data consist of stock returns of nonfinancial firms listed on the NYSE and AMEX from July 1963 until December 1990 covered by CRSP alone; the frequency of observation is monthly. In the preceding notation, we have T = 330 and N = 100. Fig. 3 displays the time series of these monthly returns.

For our analysis we need to obtain point estimates of the index of stability of the stock returns and determine whether the estimates are less than 2. Under the assumption of symmetry, which implies that the left and right tails of the returns distribution possess the same maximal moment exponent and dispersion coefficient, the point estimate of α for monthly stock returns in the CRSP dataset using the method of Hill (1975) is 1.77, with a standard deviation


of 0.15 (see footnote 19). On the basis of these estimates, normality (α = 2) can be excluded only at a confidence level of approximately 87.5% (see footnote 20).

However, widths of confidence intervals for the Hill estimator are valid only asymptotically. In finite samples, Hill-method based estimates of α are known to be quite sensitive to even minor departures from the assumption of an exactly Paretian shape of the pdf's tails. Stable distributions have tails that are only asymptotically Paretian. Especially if the index of stability is not far below 2, it is known that the tails of stable distributions are not approximated particularly well by Pareto distributions with the same value of α. In particular, applying the Hill method may lead to an overestimate of the value of α (see footnote 21). In contrast, the method of Dufour and Kurz-Kim (2010) provides exact confidence intervals for finite sample sizes. By their method, the point estimate of α for the monthly stock returns data is 1.78, and the exact finite-sample 90% confidence interval for this point estimate is [1.64, 1.99]. This result also does not offer very strong evidence against the hypothesis α = 2. Nevertheless, because of estimation uncertainty in small samples, and because this uncertainty is especially severe if α is close to 2, the data can be regarded as being in the domain of attraction of a stable distribution with α < 2. We therefore proceed to investigate the consequences of this finding for the proper interpretation of the low R² statistic reported by Jagannathan and Wang (1996, Table II, p. 22).
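For readers who want to experiment with tail-index estimation, the following is a minimal sketch of the Hill (1975) estimator based on the k largest absolute observations, with the usual asymptotic standard error α̂/√k. It is not the authors' code: it omits the centering offset and the data-driven choice of k described in the footnotes, the function names are hypothetical, and the demonstration uses simulated exact Pareto data rather than the CRSP returns. The helper `normal_confidence` reproduces the roughly 87.5% figure quoted above under the assumption that it is a two-sided normal-approximation confidence level.

```python
import math

import numpy as np


def hill_estimate(x, k):
    """Hill estimate of the tail index from the k largest absolute values.

    Returns the point estimate and its asymptotic standard error.
    """
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]  # descending order
    if not 0 < k < a.size:
        raise ValueError("k must satisfy 0 < k < number of observations")
    logs = np.log(a[: k + 1])
    gamma_hat = logs[:k].mean() - logs[k]   # mean log-excess over the (k+1)-th value
    alpha_hat = 1.0 / gamma_hat
    return alpha_hat, alpha_hat / math.sqrt(k)


def normal_confidence(alpha_hat, std_err, alpha_null=2.0):
    """Two-sided normal-approximation confidence level at which alpha_null is excluded."""
    z = abs(alpha_null - alpha_hat) / std_err
    return math.erf(z / math.sqrt(2.0))


# Simulated classical Pareto data with tail index 1.5 (illustration only).
rng = np.random.default_rng(0)
sample = rng.pareto(1.5, 50_000) + 1.0
print(hill_estimate(sample, 5_000))
print(normal_confidence(1.77, 0.15))
```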

We designed Monte Carlo simulations to obtain the cdf of R² for our empirical data, first under the assumption that the returns data are in the domain of attraction of an α-stable distribution with α < 2, and second under the assumption of normality (α = 2) (see footnote 22). The cdfs of the simulated R² statistics are shown in Fig. 4, where a vertical line is drawn at R² = 0.0135 to indicate the in-sample value of the coefficient of determination. The shapes of the two curves are rather different, with the one for α = 1.78 rising much more quickly for small values of R².

The simulated median R² of the second-stage Fama–MacBeth regression is 0.384 for α = 2, but it is only 0.072 for α = 1.78. The simulated probability of obtaining R² ≤ 0.0135 is a minuscule 1.55% for α = 2, but it is a much more sizable 21.88% for α = 1.78. On the basis of these findings, we conclude that the inference drawn from the low value of R² by Fama and French (1992), namely that the empirical usefulness of the SLB CAPM is refuted, is not warranted once the proper allowance is made for the sampling properties of the data.

19 In this estimation, we used 0.0031 as the centering offset for the empirical data; this adjustment is necessary because the Hill estimator is not location-invariant. The offset is equal to the estimated location parameter obtained by the quantile estimation method of McCulloch (1986). The choice of the number of order statistics to include in the Hill method used was determined by the Monte Carlo method of Dufour and Kurz-Kim (2010). For the present dataset, this method indicated the use of 43% of all observations.

20 If the empirical distribution of extreme returns is symmetric, the Hill estimator should be based on observations from both tails. If, in contrast, the distribution is right-skewed (left-skewed), one should use only observations from the right (left) tail. In our sample, the distribution of extreme monthly stock returns is left-skewed, as the largest few negative returns are clearly larger than the largest few positive returns. Under the assumption of left-skewness, the Hill-method point estimate of α using only negative returns is 1.47, with a standard deviation of 0.18. Therefore, if we took into account only negative returns in our empirical estimation of α, the negative consequences of incorrectly assuming that the Gaussian distributional assumption applies for inference about the value of the R² statistic would be even more pronounced.

21 See Resnick (2006, pp. 86–89) for an overview of the consequences of these finite-sample features for the reliability of the Hill estimator and McCulloch (1997) for a broader discussion of methods for deciding whether α < 2.

22 The simulations' parameters were calibrated to the main characteristics of the empirical data. We set α = 1.78, T = 330, N = 100, and we set the expected return equal to the average annual return in the full sample, i.e., µ_R = 0.1088. The number of replications of the first-stage and second-stage Fama–MacBeth regressions is 100,000 for both values of α.

Fig. 4. Simulated cdf of the R² statistic from second-stage Fama–MacBeth regression.

5. Concluding remarks

By studying the properties of the coefficient of determination, our paper adds to the already-wide body of knowledge that there are substantial differences between regression models with infinite-variance and finite-variance regressors and error terms. In the infinite-variance case with iid regressors and error terms that share the same index of stability α, we find that the R² statistic does not converge to a constant but instead has a nondegenerate asymptotic distribution on the [0, 1] interval, with a pdf that has infinite singularities at 0 and 1. We provide closed-form expressions for the cdf and pdf of this limit random variable. In contrast, if the regressors and error term do not have the same index of stability, the coefficient of determination collapses either to 0 or to 1, depending on whether the model's signal to noise ratio converges asymptotically to zero or infinity. We provide an empirical application to the Fama–MacBeth two-stage regression framework, and we show that the R² statistic of the second-stage regression converges to 0 in probability if the error terms have infinite variance, a hypothesis not rejected by our data, regardless of the value of the model's slope coefficient.

Given the random nature of the limit law R whenever the regressors and error terms share the same index of stability α < 2, and given our related finding that the coefficient of determination converges in probability to 0 if the tail index of the disturbance term is smaller than that of the regressor (a case that may be difficult to rule out confidently unless the sample size is very large), we view our results as establishing that one should not rely on the R² statistic whenever the regressors and disturbance terms are sufficiently heavy-tailed to call into question the existence of second (population) moments. At the very least, if one chooses to report the coefficient of determination in regressions with heavy-tailed variables, one should also report a point estimate of the median of R, m = η/(η + 1), where η is as in Theorem 1. In addition, one should indicate whether the error terms and regressors may reasonably be assumed to share the same index of stability. If the validity of the latter assumption is in doubt, authors should also indicate which of the two indices of stability is likely to be smaller and how far apart they may plausibly be.

It is widely understood, and it is stressed in all introductory econometrics textbooks, that a high value of R² does not provide a solid basis either for judging that an empirical regression model is "good" at explaining the fluctuations of the dependent variable or for ascertaining whether or not the regression model is correctly specified. On the other hand, one suspects, researchers often tend to view low values of the R² statistic as indicating that the (linear) relationship posited in the regression model is either weak or unreliable. Our paper demonstrates that this view may not be


warranted either, at least whenever the data are characterized by significant outlier activity. In such cases the R² statistic is a random variable even in the limit as T → ∞, and a low R² value could be caused by strong outlier activity in the error terms rather than by a "flat" relationship between the regressor and regressand.

Several extensions to the work presented here are possible. First, the regression F-statistic is functionally related to the coefficient of determination; e.g., F = (T − 2) · R²/(1 − R²) in the bivariate regression case. Given the close connection between the two statistics, it seems useful to study how the distributional properties of the regression F-statistic are affected by the presence of α-stable regressors and error terms under both the null hypothesis, θ = 0, and the alternative hypothesis, θ ≠ 0. It would also be useful to elaborate on our idea, set forth in the paragraph that follows the proof of Remark 1, that the difference between the estimate of R² and a consistent estimate of its median may serve as a diagnostic check of the size of the effect of infinite variance on R². For example, it may be feasible to develop an asymptotic theory of the distributional properties of this difference.
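The functional link between the F-statistic and R² in the bivariate case is simple enough to state in code. The helper names below are illustrative assumptions. The final lines note an algebraic consequence of the form R = ηZ/(ηZ + 1): since R/(1 − R) = ηZ, the quantity F/(T − 2) evaluated at the limit variable R equals ηZ.

```python
def f_from_r2(r2, T):
    """Bivariate-regression F-statistic implied by R^2 with T observations."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return (T - 2) * r2 / (1.0 - r2)


def r2_from_f(f, T):
    """Inverse mapping: R^2 implied by the F-statistic."""
    return f / (f + T - 2)


# R = eta * Z / (eta * Z + 1) implies R / (1 - R) = eta * Z.
eta, z, T = 1.5, 0.8, 330
r = eta * z / (eta * z + 1.0)
print(f_from_r2(r, T) / (T - 2), eta * z)
```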

It also seems desirable to analyze how well the distribution of R approximates the empirical distribution of R² in finite samples, for various types of heavy-tailed distributions that are in the domain of attraction of SαS distributions, and for various types of estimators (such as OLS, Blattberg–Sargent's BLUE, and the least absolute deviation estimator). An extension to the multiple regression framework may produce additional insights into the properties of the R² statistic. Finally, the theoretical results presented in our paper depend importantly on the assumption that the regressors and error terms are independent and identically distributed. Relaxing this assumption would be useful because many economic and financial time series, especially when sampled at very high frequencies, are characterized by interesting dependence and heterogeneity features. Introducing serial dependence and conditional heteroskedasticity would serve the purpose of studying the properties of R in a wide range of models that are of particular empirical interest. The authors are considering conducting research along several of the lines suggested above.

References

Blattberg, R.C., Sargent, T.J., 1971. Regression with non-Gaussian stable distur-bances: some sampling results. Econometrica 39 (3), 501–510.

Brockwell, P.J., Davis, R.A., 1991. Time Series: Theory and Methods, 2nd ed. Springer, New York.

Davis, R.A., Resnick, S.I., 1985a. Limit theory for moving averages of randomvariables with regularly varying tail probabilities. Ann. Probab. 13 (1),179–195.

Davis, R.A., Resnick, S.I., 1985b. More limit theory for sample correlation functionsof moving averages. Stochastic Process. Appl. 20 (2), 257–279.

Davis, R.A., Resnick, S.I., 1986. Limit theory for sample covariance and correlationfunctions of moving averages. Ann. Statist. 14 (2), 533–558.

de Vries, C.G., 1991. On the relation between GARCH and stable processes.J. Econometrics 48 (3), 313–324.

Dufour, J.-M., Kurz-Kim, J.-R., 2010. Exact inference and optimal invariantestimation for the stability parameter of symmetric α-stable distributions.J. Empir. Finance 17 (2), 180–194.

Fama, E.F., French, K.R., 1992. The cross-section of expected stock returns. J. Finance47 (2), 427–465.

Fama, E.F., MacBeth, J.D., 1973. Risk, return, and equilibrium: empirical tests. J. Political Economy 81 (3), 607–636.

Feller, W., 1971. An Introduction to Probability Theory and its Applications, Vol. 2, 2nd ed. John Wiley & Sons, New York.

Fiorio, C.V., Hajivassiliou, V.A., Phillips, P.C.B., 2010. Bimodal t-ratios: the impact ofthick tails on inference. Econom. J. 13 (2), 271–289.

Granger, C.W.J., Orr, D., 1972. ‘Infinite variance’ and research strategy in time seriesanalysis. J. Amer. Statist. Assoc. 67 (338), 275–285.

Hill, B.M., 1975. A simple general approach to inference about the tail of adistribution. Ann. Statist. 3 (5), 1163–1174.

Ibragimov, I.A., Linnik, Y.V., 1971. Independent and Stationary Sequences ofRandom Variables. Wolters-Noordhoff, Groningen.

Jagannathan, R., Wang, Z., 1996. The conditional CAPM and the cross-section ofexpected returns. J. Finance 51 (1), 3–53.

Kanter, M., Steiger, W.L., 1974. Regression and autoregression with infinitevariance. Adv. Appl. Probab. 6 (4), 768–783.

Kim, J.-R., 2003. Finite-sample distributions of self-normalized sums. Comput.Statist. 18, 493–504.

Logan, B.F., Mallows, C.L., Rice, S.O., Shepp, L.A., 1973. Limit distributions of self-normalized sums. Ann. Probab. 1 (5), 788–809.

Loretan, M., Phillips, P.C.B., 1994. Testing the covariance stationarity of heavy-tailed time series: an overview of the theory with applications to several financial datasets. J. Empir. Finance 1 (2), 211–248.

Mandelbrot, B.B., 1963. The variation of certain speculative prices. J. Bus. 36 (4),394–429.

McCulloch, J.H., 1998. Linear regression with stable disturbances. In: Adler, R.J., Feldman, R.E., Taqqu, M.S. (Eds.), A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Birkhäuser, Berlin, pp. 359–376 (chapter 15).

McCulloch, J.H., 1986. Simple consistent estimators of stable distribution parame-ters. Commun. Stat.: Comput. Simul. 15 (4), 1109–1136.

McCulloch, J.H., 1997. Measuring tail thickness to estimate the stable index α: acritique. J. Bus. Econom. Statist. 15 (1), 74–81.

Mittnik, S., Rachev, S.T., Kim, J.-R., 1998. Chi-square-type distributions for heavy-tailed variates. Econometric Theory 14 (3), 339–354.

Mood, A.M., Graybill, F.A., Boes, D.C., 1974. Introduction to the Theory of Statistics, 3rd ed. McGraw-Hill, New York.

Nolan, J.P., 2013. Stable Distributions: Models for Heavy Tailed Data. Birkhäuser, Berlin (in preparation). Chapter 1 online at http://academic2.american.edu/~jpnolan.

Phillips, P.C.B., 1990. Time series regression with a unit root and infinite-varianceerrors. Econometric Theory 6 (1), 44–62.

Phillips, P.C.B., Loretan, M., 1991. The Durbin–Watson ratio under infinite-varianceerrors. J. Econometrics 47 (1), 85–114.

Phillips, P.C.B., Loretan, M., 1995. On the theory of testing covariance stationarityunder moment condition failure. In: Maddala, G. S., Phillips, P. C. B., Srinivasan,T. N. (Eds.), Advances in Econometrics and Quantitative Economics: Essays inHonor of Professor C. R. Rao. Blackwell, Oxford, pp. 198–233 (chapter 8).

Rachev, S.T., Menn, C., Fabozzi, F.J., 2005. Fat-Tailed and Skewed Asset Return Distributions: Implications for Risk Management, Portfolio Selection, and Option Pricing. John Wiley & Sons, New York.

Resnick, S.I., 1986. Point processes, regular variation and weak convergence. Adv.Appl. Probab. 18 (1), 66–138.

Resnick, S.I., 2006. Heavy-Tail Phenomena: Probabilistic and Statistical Modeling.Springer, New York.

Runde, R., 1993. A note on the asymptotic distribution of the F-statistic for randomvariables with infinite variance. Statist. Probab. Lett. 18 (1), 9–12.

Runde, R., 1997. The asymptotic null distribution of the Box–Pierce Q -statistic forrandom variables with infinite variance, with an application to German stockreturns. J. Econometrics 78 (2), 205–216.

Samorodnitsky, G., Rachev, S.T., Kurz-Kim, J.-R., Stoyanov, S.V., 2007. Asymptoticdistribution of unbiased linear estimators in the presence of heavy-tailedstochastic regressors and residuals. Probab. Math. Statist. 27 (2), 275–302.

Samorodnitsky, G., Taqqu, M.S., 1994. Stable Non-Gaussian Random Processes.Chapman and Hall, New York.

Zolotarev, V.M., 1986. One-Dimensional Stable Distributions. In: Translationsof Mathematical Monographs, vol. 65. American Mathematical Society,Providence RI.