
Statistics & Probability Letters 14 (1992) 269-274

North-Holland

17 July 1992

On measures of association as measures of positive dependence

Roger B. Nelsen Department of Mathematical Sciences, Lewis and Clark College, Portland, OR, USA

Received January 1991

Revised August 1991

Abstract: We show that Spearman’s rho is a measure of average positive (and negative) quadrant dependence, and that Kendall’s

tau is a measure of average total positivity (and reverse regularity) of order two.

AMS 1980 Subject Classifications: Primary 62H20; Secondary 62H05.

Keywords: Measures of association, Spearman’s rho, Kendall’s tau, positive dependence properties, positive quadrant dependence,

total positivity of order two, copulas.

1. Introduction

Kimeldorf and Sampson (1989) recently developed a structure for studying concepts of positive dependence. In their important paper they explore the relationships between and among three sets of positive dependence concepts: positive dependence orderings; positive dependence properties (such as positive quadrant dependence and total positivity of order two); and measures of positive association (such as Spearman’s $\rho_s$ and Kendall’s $\tau$). Concerning dependence properties and measures of association, Kimeldorf and Sampson state that “it is often unclear exactly what dependence (property) a specific measure of positive association is attempting to describe.” In this paper we provide a partial response: both as a population parameter and as a sample statistic, Spearman’s $\rho_s$ is a measure of average positive (and negative) quadrant dependence while Kendall’s $\tau$ is a measure of average total positivity (and reverse regularity) of order two.

2. Quadrant dependence and Spearman’s $\rho_s$

Let (X, Y) be a pair of continuous random variables with joint distribution function H and marginal distribution functions F and G. The pair (X, Y) is said to be positively quadrant dependent, written PQD (Lehmann, 1966), if $H(x, y) - F(x)G(y) \ge 0$ for all x and y, and negatively quadrant dependent when the inequality is reversed. So, in a sense, the expression $H(x, y) - F(x)G(y)$ measures ‘local’ quadrant dependence at each point $(x, y) \in \mathbb{R}^2$.

Correspondence to: Roger B. Nelsen, Department of Mathematics, Mount Holyoke College, South Hadley, MA 01075-1461, USA.

0167-7152/92/$05.00 © 1992 - Elsevier Science Publishers B.V. All rights reserved

Volume 14, Number 4 STATISTICS & PROBABILITY LETTERS 17 July 1992

It is well known (see Schweizer and Wolff, 1981, and the references cited therein) that the population version of Spearman’s rho is given by

$$\rho_s = 12 \iint_{\mathbb{R}^2} [H(x, y) - F(x)G(y)] \, dF(x) \, dG(y).$$

Hence $\frac{1}{12}\rho_s$ represents an average measure of quadrant dependence, where the average is taken with respect to the marginal distributions of X and Y.

The above expressions (as well as others to follow) can be simplified if we employ the probability transforms $u = F(x)$ and $v = G(y)$. Then our measure of quadrant dependence is given by $C(u, v) - uv$ (for u and v in $I = [0, 1]$) and its average becomes

$$\tfrac{1}{12}\rho_s = \iint_{I^2} [C(u, v) - uv] \, du \, dv,$$

where C is the function defined by $C(u, v) = H(F^{-1}(u), G^{-1}(v))$ and $F^{-1}$ and $G^{-1}$ denote the usual inverses of F and G. This function $C : I^2 \to I$ is the copula (Sklar, 1959) of the random variables X and Y. For a complete discussion of copulas, their properties, and a bibliography, see Schweizer (1991).
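As a numerical illustration of the copula form of the average, the sketch below approximates the double integral of $C(u, v) - uv$ over the unit square by a midpoint rule. The Farlie-Gumbel-Morgenstern (FGM) copula $C(u, v) = uv + \theta uv(1-u)(1-v)$ is an assumption chosen only because its Spearman’s rho is known in closed form ($\rho_s = \theta/3$); it does not appear in the paper, and the function names are hypothetical.

```python
def fgm_copula(u, v, theta):
    """FGM copula C(u, v) = uv + theta*uv(1-u)(1-v) (illustrative choice)."""
    return u * v + theta * u * v * (1.0 - u) * (1.0 - v)


def average_quadrant_dependence(copula, n=200):
    """Midpoint-rule approximation of the integral of C(u, v) - uv over [0, 1]^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total += copula(u, v) - u * v
    return total * h * h


theta = 0.5
avg = average_quadrant_dependence(lambda u, v: fgm_copula(u, v, theta))
rho_s = 12.0 * avg  # rho_s is twelve times the average quadrant dependence
print("numerical rho_s:", rho_s)
print("closed form theta/3:", theta / 3.0)
```

For positive theta the integrand is nonnegative everywhere, so this particular family is PQD and the average is positive.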

A measure of ‘expected’ quadrant dependence is the expected value of the random variable $W = H(X, Y) - F(X)G(Y)$ corresponding to our measure of quadrant dependence, i.e. $E(W) = \iint_{\mathbb{R}^2} [H(x, y) - F(x)G(y)] \, dH(x, y)$. But it is well known that the population version of Kendall’s tau is given by $\tau = 4 \iint_{\mathbb{R}^2} H(x, y) \, dH(x, y) - 1$ and that $\rho_s = 12 \iint_{\mathbb{R}^2} F(x)G(y) \, dH(x, y) - 3$ (see Schweizer and Wolff, 1981, for references), from which it follows that ‘expected’ quadrant dependence is given by

$$E(W) = \tfrac{1}{12}(3\tau - \rho_s).$$

The relationship between PQD and $\rho_s$ was exploited by Lehmann (1966) and Tchen (1980), who showed that when (X, Y) are PQD, both $\rho_s \ge 0$ and $\tau \ge 0$. But (X, Y) PQD implies $E(W) \ge 0$, so that in this case we now have the slightly stronger inequality $3\tau \ge \rho_s \ge 0$.
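The identity $E(W) = \frac{1}{12}(3\tau - \rho_s)$ can be checked numerically. The sketch below again assumes the FGM copula (an illustrative choice, not from the paper), for which $\tau = 2\theta/9$ and $\rho_s = \theta/3$ are known closed forms, and integrates $[C(u, v) - uv]\,c(u, v)$ over the unit square, where $c$ is the FGM density. All function names are hypothetical.

```python
def fgm_copula(u, v, theta):
    """FGM copula C(u, v) = uv + theta*uv(1-u)(1-v) (illustrative choice)."""
    return u * v + theta * u * v * (1.0 - u) * (1.0 - v)


def fgm_density(u, v, theta):
    """Mixed partial derivative of the FGM copula."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)


def midpoint_double_integral(f, n=200):
    """Midpoint-rule approximation of the integral of f over [0, 1]^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total += f(u, v)
    return total * h * h


theta = 0.8
# E(W) written on the copula scale: integral of (C - uv) dC.
expected_w = midpoint_double_integral(
    lambda u, v: (fgm_copula(u, v, theta) - u * v) * fgm_density(u, v, theta)
)
tau = 2.0 * theta / 9.0   # closed form for the FGM family
rho_s = theta / 3.0       # closed form for the FGM family
print("numerical E(W):", expected_w)
print("(3*tau - rho_s)/12:", (3.0 * tau - rho_s) / 12.0)
```

Since theta is positive here the pair is PQD, and the values satisfy $3\tau \ge \rho_s \ge 0$ as the inequality requires.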

3. Total positivity of order two and Kendall’s $\tau$ (absolutely continuous case)

A pair of random variables (X, Y) with an absolutely continuous distribution function H is said to be totally positive of order two (TP$_2$) if the joint density function $h(x, y)$ satisfies $h(x_2, y_2)h(x_1, y_1) - h(x_1, y_2)h(x_2, y_1) \ge 0$ whenever $x_1 < x_2$ and $y_1 < y_2$; and reverse regular of order two (RR$_2$) when the opposite inequality holds (Karlin, 1968; Barlow and Proschan, 1975). Thus $h(x_2, y_2)h(x_1, y_1) - h(x_1, y_2)h(x_2, y_1)$ measures ‘local’ TP$_2$ (and RR$_2$) for (X, Y). Let T denote its average for $-\infty < x_1 < x_2 < \infty$ and $-\infty < y_1 < y_2 < \infty$, that is,

$$T = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{y_2} \int_{-\infty}^{x_2} [h(x_2, y_2)h(x_1, y_1) - h(x_1, y_2)h(x_2, y_1)] \, dx_1 \, dy_1 \, dx_2 \, dy_2.$$

We will show that $T = \frac{1}{2}\tau$. Again employing the probability transforms $u = F(x)$ and $v = G(y)$ and utilizing copulas, we have

$$T = \int_0^1 \int_0^1 \int_0^{v_2} \int_0^{u_2} [c(u_2, v_2)c(u_1, v_1) - c(u_1, v_2)c(u_2, v_1)] \, du_1 \, dv_1 \, du_2 \, dv_2,$$

where $c(u, v)$ is the ‘density copula’ defined by $c(u, v) = (\partial^2 / \partial u \, \partial v) C(u, v)$, so that $h(x, y) = c(F(x), G(y)) f(x) g(y)$ (f and g denote the marginal densities).

To evaluate T, we first let $T(u_2, v_2)$ denote the inner double integral. It readily follows that

$$T(u_2, v_2) = C(u_2, v_2) \frac{\partial^2}{\partial u \, \partial v} C(u_2, v_2) - \frac{\partial}{\partial u} C(u_2, v_2) \, \frac{\partial}{\partial v} C(u_2, v_2).$$


Hence we have (after replacing $u_2$ and $v_2$ by u and v, respectively)

$$T = \int_0^1 \int_0^1 C(u, v) \frac{\partial^2}{\partial u \, \partial v} C(u, v) \, du \, dv - \int_0^1 \int_0^1 \frac{\partial}{\partial u} C(u, v) \, \frac{\partial}{\partial v} C(u, v) \, du \, dv.$$

But the first integral above is simply $\frac{1}{4}(\tau + 1)$ and the second is (after integration by parts) $\frac{1}{4}(1 - \tau)$; whence $T = \frac{1}{2}\tau$. Thus when the joint distribution function H is absolutely continuous, $\frac{1}{2}\tau$ represents an average measure of total positivity (and reverse regularity) of order two.
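The two integrals in the expression for T can be evaluated numerically for a concrete absolutely continuous copula. The sketch below assumes the FGM copula (an illustrative choice, not from the paper) with its closed-form partial derivatives and the known value $\tau = 2\theta/9$; the helper names are hypothetical.

```python
def midpoint_double_integral(f, n=200):
    """Midpoint-rule approximation of the integral of f over [0, 1]^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total += f(u, v)
    return total * h * h


theta = 0.6


def C(u, v):
    """FGM copula (illustrative choice)."""
    return u * v + theta * u * v * (1.0 - u) * (1.0 - v)


def C_u(u, v):
    """Partial derivative of C with respect to u."""
    return v + theta * v * (1.0 - v) * (1.0 - 2.0 * u)


def C_v(u, v):
    """Partial derivative of C with respect to v."""
    return u + theta * u * (1.0 - u) * (1.0 - 2.0 * v)


def c(u, v):
    """Density copula: mixed second partial derivative of C."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)


tau = 2.0 * theta / 9.0  # closed form for the FGM family
first = midpoint_double_integral(lambda u, v: C(u, v) * c(u, v))
second = midpoint_double_integral(lambda u, v: C_u(u, v) * C_v(u, v))
T = first - second
print("first integral:", first, "vs (tau + 1)/4 =", (tau + 1.0) / 4.0)
print("second integral:", second, "vs (1 - tau)/4 =", (1.0 - tau) / 4.0)
print("T:", T, "vs tau/2 =", tau / 2.0)
```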

4. Total positivity of order two and Kendall’s $\tau$ (general case)

Of course, the above argument fails if H fails to be absolutely continuous. But we will show that its conclusion still holds if we use a more general definition of TP$_2$. If J and K are intervals in $\mathbb{R}$, let $H(J, K)$ denote $\Pr\{X \in J, Y \in K\}$. Furthermore write $J < K$ whenever $x \in J$ and $y \in K$ implies $x < y$. Then we say that the joint distribution of (X, Y) is TP$_2$ if $H(J_1, K_1)H(J_2, K_2) \ge H(J_1, K_2)H(J_2, K_1)$ for all intervals $J_1 < J_2$ and $K_1 < K_2$ (Block et al., 1982; Kimeldorf and Sampson, 1987). This version of TP$_2$ can also be expressed in terms of the copula C, since $H(J, K) = C(F(J), G(K))$; where for any two intervals $[u_1, u_2]$, $[v_1, v_2]$ in I, $C([u_1, u_2], [v_1, v_2]) = C(u_2, v_2) - C(u_1, v_2) - C(u_2, v_1) + C(u_1, v_1)$.

To construct an average measure for the TP$_2$ property in this case, we begin by partitioning the interval $I = [0, 1]$ on the u axis in the usual manner: choose points $\{u_p\}_{p=0}^{n}$ such that $0 = u_0 < u_1 < \cdots < u_n = 1$; and let $J_p = [u_{p-1}, u_p]$. Similarly partition I on the v axis into intervals $K_q = [v_{q-1}, v_q]$ for $1 \le q \le m$; thus generating a partition P of $I^2$ into the mn rectangles $J_p \times K_q$. Let $\|P\|$ denote the norm of P. For each of the $\binom{n}{2}\binom{m}{2}$ choices of intervals $J_r$, $J_p$, $K_s$, and $K_q$ with $1 \le r < p \le n$ and $1 \le s < q \le m$ compute the ‘local’ measure $C(J_r, K_s)C(J_p, K_q) - C(J_p, K_s)C(J_r, K_q)$ of the TP$_2$ property; then sum and take the limit as $\|P\| \to 0$; that is, we now let

$$T = \lim_{\|P\| \to 0} \sum_{p=2}^{n} \sum_{q=2}^{m} \sum_{r=1}^{p-1} \sum_{s=1}^{q-1} [C(J_r, K_s)C(J_p, K_q) - C(J_p, K_s)C(J_r, K_q)].$$

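We now prove that the above limit for T exists and equals $\frac{1}{2}\tau$. The inner double summation in T readily telescopes to yield

$$T = \lim_{\|P\| \to 0} \sum_{p=2}^{n} \sum_{q=2}^{m} [C(u_p, v_q)C(u_{p-1}, v_{q-1}) - C(u_{p-1}, v_q)C(u_p, v_{q-1})]. \tag{3}$$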

If we employ the identity $2(ad - bc) = (a + b + c + d)(a - b - c + d) - (a^2 - b^2 - c^2 + d^2)$ to expand the summand, we have (writing $C_{p,q}$ for $C(u_p, v_q)$)

$$2T = \lim_{\|P\| \to 0} \sum_{p=1}^{n} \sum_{q=1}^{m} \left[ (C_{p,q} + C_{p-1,q} + C_{p,q-1} + C_{p-1,q-1})(C_{p,q} - C_{p-1,q} - C_{p,q-1} + C_{p-1,q-1}) - (C_{p,q}^2 - C_{p-1,q}^2 - C_{p,q-1}^2 + C_{p-1,q-1}^2) \right].$$

(We can use 1 for the lower limits of summation since the summand vanishes if either p = 1 or q = 1.) Now C is continuous on each rectangle $J_p \times K_q$, so that the intermediate value theorem guarantees the existence of points $(u_p^*, v_q^*) \in J_p \times K_q$ such that $4C(u_p^*, v_q^*) = C_{p,q} + C_{p-1,q} + C_{p,q-1} + C_{p-1,q-1}$. Furthermore, $\Delta C_{p,q} = C(J_p, K_q) = C_{p,q} - C_{p-1,q} - C_{p,q-1} + C_{p-1,q-1}$ represents the probability mass assigned to $J_p \times K_q$ by the copula C. Thus

$$2T = \lim_{\|P\| \to 0} \left[ 4 \sum_{p=1}^{n} \sum_{q=1}^{m} C(u_p^*, v_q^*) \, \Delta C_{p,q} - \sum_{p=1}^{n} \sum_{q=1}^{m} (C_{p,q}^2 - C_{p-1,q}^2 - C_{p,q-1}^2 + C_{p-1,q-1}^2) \right].$$

But the second double sum above telescopes to 1 and thus 2T is the limit of a Riemann sum for $\tau = 4 \iint_{I^2} C(u, v) \, dC - 1$. Hence $T = \frac{1}{2}\tau$ as claimed.
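Because the telescoped sum in (3) uses only values of C on a grid, it can be evaluated for copulas that have no density. The sketch below assumes a uniform partition with n = m and three standard copulas as test cases: $M(u, v) = \min(u, v)$ with $\tau = 1$, $W(u, v) = \max(u + v - 1, 0)$ with $\tau = -1$, and the independence copula $uv$ with $\tau = 0$. The function name is hypothetical.

```python
def telescoped_tp2_sum(copula, n):
    """Evaluate the double sum in (3) on a uniform n-by-n partition of [0, 1]^2."""
    grid = [k / n for k in range(n + 1)]
    # c[p][q] holds C(u_p, v_q) on the grid.
    c = [[copula(u, v) for v in grid] for u in grid]
    total = 0.0
    for p in range(2, n + 1):
        for q in range(2, n + 1):
            total += c[p][q] * c[p - 1][q - 1] - c[p - 1][q] * c[p][q - 1]
    return total


n = 200
T_upper = telescoped_tp2_sum(lambda u, v: min(u, v), n)             # singular copula M
T_lower = telescoped_tp2_sum(lambda u, v: max(u + v - 1.0, 0.0), n)  # singular copula W
T_indep = telescoped_tp2_sum(lambda u, v: u * v, n)                  # independence

print("M:", T_upper, "expected limit 1/2")
print("W:", T_lower, "expected limit -1/2")
print("independence:", T_indep, "expected 0")
```

For M and W the sum on a uniform grid works out to plus or minus $(n-1)/(2n)$, which shows the convergence to $\pm\frac{1}{2}$ as the partition is refined.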

5. Sample statistics

A result similar to (1) holds for the sample statistic $r_s$ corresponding to Spearman’s $\rho_s$ (we shall use Greek letters for population parameters, Latin letters for the corresponding sample statistics). Letting $\{(x_k, y_k)\}_{k=1}^{n}$ denote a sample of size n, we have

$$r_s = \frac{12}{n^2 - 1} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ C_n\!\left(\frac{i}{n}, \frac{j}{n}\right) - \frac{i}{n} \cdot \frac{j}{n} \right], \tag{4}$$

where $C_n$ is the empirical dependence function (Deheuvels, 1979) or empirical copula (Quesada-Molina, 1990) given by

$$C_n\!\left(\frac{i}{n}, \frac{j}{n}\right) = \frac{1}{n} \cdot (\text{number of pairs } (x, y) \text{ in the sample such that } x \le x_{(i)} \text{ and } y \le y_{(j)}),$$

and $x_{(i)}$ and $y_{(j)}$ denote order statistics from the sample.

To show that the expression in (4) is equivalent to the standard one for $r_s$ (Kruskal, 1958; Lehmann, 1975), we need only show that

$$\sum_{i=1}^{n} \sum_{j=1}^{n} C_n\!\left(\frac{i}{n}, \frac{j}{n}\right) = \frac{1}{n} \sum_{k=1}^{n} k R_k,$$

where $R_k = m$ when $(x_{(k)}, y_{(m)})$ is an element of the sample. But observe that a particular pair $(x_{(k)}, y_{(m)})$ in the sample contributes 1/n to the double sum above for each pair (i, j) with $i \ge k$ and $j \ge m$. Equivalently, the total contribution to the double sum by a particular pair $(x_{(k)}, y_{(m)})$ is 1/n times $(n - k + 1)(n - m + 1)$, the total number of pairs (i, j) such that $i \ge k$ and $j \ge m$. Hence, writing $R_k$ for m and summing on k, we have

$$\sum_{i=1}^{n} \sum_{j=1}^{n} C_n\!\left(\frac{i}{n}, \frac{j}{n}\right) = \frac{1}{n} \sum_{k=1}^{n} (n - k + 1)(n - R_k + 1) = \frac{1}{n} \sum_{k=1}^{n} k R_k,$$

as claimed (also see Quesada-Molina, 1990).

As with $r_s$ and $\rho_s$ we have similar results for the sample statistic t corresponding to Kendall’s $\tau$.
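The equivalence between an empirical-copula expression for $r_s$ and the classical rank formula can be checked on a small sample. The sketch below assumes the form $r_s = \frac{12}{n^2 - 1} \sum_i \sum_j [C_n(i/n, j/n) - (i/n)(j/n)]$, where the normalising constant is an assumption fixed by requiring agreement with the classical statistic, and compares it with $1 - 6\sum d_k^2 / (n(n^2 - 1))$. The data and function names are hypothetical, and the sample is assumed to have no ties.

```python
def ranks(values):
    """Ranks 1..n of the values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        result[k] = rank
    return result


def empirical_copula(xs, ys):
    """Table with table[i][j] = C_n(i/n, j/n) for 0 <= i, j <= n."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    table = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            count = sum(1 for k in range(n) if rx[k] <= i and ry[k] <= j)
            table[i][j] = count / n
    return table


def spearman_from_copula(xs, ys):
    """r_s as an average of C_n(i/n, j/n) - (i/n)(j/n) (assumed normalisation)."""
    n = len(xs)
    cn = empirical_copula(xs, ys)
    total = sum(
        cn[i][j] - (i / n) * (j / n)
        for i in range(1, n + 1)
        for j in range(1, n + 1)
    )
    return 12.0 / (n * n - 1) * total


def spearman_classical(xs, ys):
    """Classical rank-difference formula for r_s."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


xs = [2.1, 0.4, 3.3, 1.7, 5.0, 4.2]  # made-up sample
ys = [1.0, 0.2, 2.5, 3.1, 4.4, 2.9]
print(spearman_from_copula(xs, ys), spearman_classical(xs, ys))
```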

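Analogous to (2) we have (for the sample of size n)

$$t = \frac{2n}{n - 1} \sum_{i_2=2}^{n} \sum_{j_2=2}^{n} \sum_{i_1=1}^{i_2-1} \sum_{j_1=1}^{j_2-1} \left[ c_n\!\left(\frac{i_2}{n}, \frac{j_2}{n}\right) c_n\!\left(\frac{i_1}{n}, \frac{j_1}{n}\right) - c_n\!\left(\frac{i_1}{n}, \frac{j_2}{n}\right) c_n\!\left(\frac{i_2}{n}, \frac{j_1}{n}\right) \right], \tag{5}$$

where $c_n$ is the ‘empirical density copula’ defined by

$$c_n\!\left(\frac{i}{n}, \frac{j}{n}\right) = \begin{cases} 1/n, & \text{if } (x_{(i)}, y_{(j)}) \text{ is an element of the sample } \{(x_k, y_k)\}_{k=1}^{n}, \\ 0, & \text{otherwise.} \end{cases}$$

It is easy to show that the expression in (5) for t is equivalent to the classical one (Kendall, 1962), i.e., the difference between the numbers of ‘concordant’ and ‘discordant’ pairs of elements in the sample divided by the total number $\binom{n}{2}$ of pairs of elements from the sample. The summand in (5) reduces to $(1/n)^2$ whenever the sample contains both $(x_{(i_1)}, y_{(j_1)})$ and $(x_{(i_2)}, y_{(j_2)})$ (a ‘concordant’ pair of elements, since $x_{(i_1)} < x_{(i_2)}$ and $y_{(j_1)} < y_{(j_2)}$); to $-(1/n)^2$ whenever the sample contains both $(x_{(i_1)}, y_{(j_2)})$ and $(x_{(i_2)}, y_{(j_1)})$ (a ‘discordant’ pair), and 0 otherwise. Thus the quadruple sum in (5) is $(1/n)^2$ times the difference between the number of concordant pairs of elements and the number of discordant pairs, equivalent to the classical form for t.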
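The counting argument above can be illustrated directly. The sketch below builds the empirical density copula for a small made-up sample without ties, evaluates the quadruple sum over $i_1 < i_2$ and $j_1 < j_2$, and scales it by $2n/(n-1)$; that constant is an assumption, chosen as $n^2 / \binom{n}{2}$ so the result matches the classical concordant-minus-discordant form. All function names are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Ranks 1..n of the values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        result[k] = rank
    return result


def kendall_classical(xs, ys):
    """(concordant - discordant) / C(n, 2), assuming no ties."""
    n = len(xs)
    score = 0
    for a, b in combinations(range(n), 2):
        score += 1 if (xs[a] - xs[b]) * (ys[a] - ys[b]) > 0 else -1
    return score / (n * (n - 1) / 2)


def kendall_from_density_copula(xs, ys):
    """Quadruple sum over the empirical density copula, scaled by 2n/(n-1)."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    # c[i][j] = 1/n if (x_(i), y_(j)) is a sample point, else 0.
    c = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n):
        c[rx[k]][ry[k]] = 1.0 / n
    total = 0.0
    for i2 in range(2, n + 1):
        for j2 in range(2, n + 1):
            for i1 in range(1, i2):
                for j1 in range(1, j2):
                    total += c[i2][j2] * c[i1][j1] - c[i1][j2] * c[i2][j1]
    return 2.0 * n / (n - 1) * total


xs = [2.1, 0.4, 3.3, 1.7, 5.0, 4.2]  # made-up sample
ys = [1.0, 0.2, 2.5, 3.1, 4.4, 2.9]
print(kendall_from_density_copula(xs, ys), kendall_classical(xs, ys))
```

The quadruple loop costs on the order of $n^4$ operations, so this form is useful for checking the identity rather than for computing t on large samples.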

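Evaluating the inner double summation in (5) yields an expression for t similar to (3) in terms of the empirical copula $C_n$,

$$t = \frac{2n}{n - 1} \sum_{i=2}^{n} \sum_{j=2}^{n} \left[ C_n\!\left(\frac{i}{n}, \frac{j}{n}\right) C_n\!\left(\frac{i-1}{n}, \frac{j-1}{n}\right) - C_n\!\left(\frac{i-1}{n}, \frac{j}{n}\right) C_n\!\left(\frac{i}{n}, \frac{j-1}{n}\right) \right].$$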

6. Conclusion

Several authors (Barlow and Proschan, 1975; Shaked, 1977) have studied positive dependence properties in addition to PQD and TP$_2$; properties such as association, stochastically increasing, right tail increasing, left tail decreasing, and right corner set increasing. Of these properties, TP$_2$ is the strongest and PQD the weakest. Thus the two classical measures of association, Spearman’s $\rho_s$ and Kendall’s $\tau$, provide average measures of two relatively extreme positive dependence properties.

Acknowledgement

The author wishes to acknowledge the support of Mount Holyoke College in South Hadley, Massachusetts, where he was a visiting professor in the Department of Mathematics, Statistics, and Computation.

References

Barlow, R. and F. Proschan (1975), Statistical Theory of Reliability and Life Testing: Probability Models (Holt, Rinehart and Winston, New York).

Block, H., T. Savits and M. Shaked (1982), Some concepts of negative dependence, Ann. Probab. 10, 765-772.

Deheuvels, P. (1979), La fonction de dépendance empirique et ses propriétés. Un test non paramétrique d’indépendance, Acad. Roy. Belg. Bull. Cl. Sci. (5) 65, 274-292.

Karlin, S. (1968), Total Positivity, Vol. I (Stanford Univ. Press, Stanford, CA).

Kendall, M.G. (1962), Rank Correlation Methods (Griffin, London).

Kimeldorf, G. and A. Sampson (1987), Positive dependence orderings, Ann. Inst. Statist. Math. 39, 113-128.

Kimeldorf, G. and A. Sampson (1989), A framework for positive dependence, Ann. Inst. Statist. Math. 41, 31-45.

Kruskal, W.H. (1958), Ordinal measures of association, J. Amer. Statist. Assoc. 53, 814-861.

Lehmann, E. (1966), Some concepts of dependence, Ann. Math. Statist. 37, 1137-1153.

Lehmann, E. (1975), Nonparametrics: Statistical Methods Based on Ranks (Holden-Day, San Francisco, CA).

Quesada-Molina, J.J. (1990), Copulas and multivariate dependence, Symposium on Distributions with Given Marginals (Fréchet classes), Rome, 4-7 April.

Schweizer, B. (1991), Thirty years of copulas, in: G. Dall’Aglio, S. Kotz and G. Salinetti, eds., Advances in Probability Distributions with Given Marginals (Kluwer Academic Publishers, Dordrecht) pp. 13-50.

Schweizer, B. and E.F. Wolff (1981), On nonparametric measures of dependence for random variables, Ann. Statist. 9, 879-885.

Shaked, M. (1977), A family of concepts of dependence for bivariate distributions, J. Amer. Statist. Assoc. 72, 642-650.

Sklar, A. (1959), Fonctions de répartition à n dimensions et leurs marges, Publ. Inst. Statist. Univ. Paris 8, 229-231.

Tchen, A. (1980), Inequalities for distribution functions with given marginals, Ann. Probab. 8, 814-827.
