Statistics & Probability Letters 14 (1992) 269-274
North-Holland
17 July 1992
On measures of association as measures of positive dependence
Roger B. Nelsen Department of Mathematical Sciences, Lewis and Clark College, Portland, OR, USA
Received January 1991
Revised August 1991
Abstract: We show that Spearman’s rho is a measure of average positive (and negative) quadrant dependence, and that Kendall’s
tau is a measure of average total positivity (and reverse regularity) of order two.
AMS 1980 Subject Classifications: Primary 62H20; Secondary 62H05.
Keywords: Measures of association, Spearman’s rho, Kendall’s tau, positive dependence properties, positive quadrant dependence,
total positivity of order two, copulas.
1. Introduction
Kimeldorf and Sampson (1989) recently developed a structure for studying concepts of positive dependence. In their important paper they explore the relationships between and among three sets of positive dependence concepts: positive dependence orderings; positive dependence properties (such as positive quadrant dependence and total positivity of order two); and measures of positive association (such as Spearman’s $\rho_s$ and Kendall’s $\tau$). Concerning dependence properties and measures of association, Kimeldorf and Sampson state that “it is often unclear exactly what dependence (property) a specific measure of positive association is attempting to describe.” In this paper we provide a partial response: both as a population parameter and as a sample statistic, Spearman’s $\rho_s$ is a measure of average positive (and negative) quadrant dependence while Kendall’s $\tau$ is a measure of average total positivity (and reverse regularity) of order two.
2. Quadrant dependence and Spearman’s $\rho_s$
Let (X, Y) be a pair of continuous random variables with joint distribution function H and marginal distribution functions F and G. The pair (X, Y) is said to be positively quadrant dependent, written PQD (Lehmann, 1966), if $H(x, y) - F(x)G(y) \ge 0$ for all x and y, and negatively quadrant dependent when the inequality is reversed. So, in a sense, the expression $H(x, y) - F(x)G(y)$ measures ‘local’ quadrant dependence at each point $(x, y) \in \mathbb{R}^2$.
Correspondence to: Roger B. Nelsen, Department of Mathematics, Mount Holyoke College, South Hadley, MA 01075-1461, USA.
0167-7152/92/$05.00 © 1992 - Elsevier Science Publishers B.V. All rights reserved
It is well known (see Schweizer and Wolff, 1981, and the references cited therein) that the population version of Spearman’s rho is given by
$$\rho_s = 12 \iint_{\mathbb{R}^2} [H(x, y) - F(x)G(y)] \, dF(x) \, dG(y).$$
Hence $\tfrac{1}{12}\rho_s$ represents an average measure of quadrant dependence, where the average is taken with respect to the marginal distributions of X and Y.
The above expressions (as well as others to follow) can be simplified if we employ the probability transforms $u = F(x)$ and $v = G(y)$. Then our measure of quadrant dependence is given by $C(u, v) - uv$ (for u and v in $I = [0, 1]$) and its average becomes
$$\tfrac{1}{12}\rho_s = \iint_{I^2} [C(u, v) - uv] \, du \, dv, \qquad (1)$$
where C is the function defined by $C(u, v) = H(F^{-1}(u), G^{-1}(v))$ and $F^{-1}$ and $G^{-1}$ denote the usual inverses of F and G. This function $C : I^2 \to I$ is the copula (Sklar, 1959) of the random variables X and Y. For a complete discussion of copulas, their properties, and a bibliography, see Schweizer (1991).
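As a numerical illustration of the copula form of the average, the following sketch approximates $12 \iint_{I^2} [C(u, v) - uv] \, du \, dv$ with a midpoint rule. The Farlie-Gumbel-Morgenstern (FGM) copula $C(u, v) = uv + \theta uv(1 - u)(1 - v)$ is an assumption chosen only for illustration (it does not appear in this paper); its Spearman’s rho is known to equal $\theta/3$. The function names are hypothetical.

```python
def fgm_copula(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula (illustrative choice, -1 <= theta <= 1)."""
    return u * v + theta * u * v * (1.0 - u) * (1.0 - v)


def spearman_rho_from_copula(copula, n=200):
    """Approximate 12 * double integral of [C(u, v) - u v] over the unit
    square with a midpoint rule on an n-by-n grid."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total += copula(u, v) - u * v  # 'local' quadrant dependence at (u, v)
    return 12.0 * total * h * h


theta = 0.6
rho_s = spearman_rho_from_copula(lambda u, v: fgm_copula(u, v, theta))
print(rho_s, theta / 3.0)  # the two values should agree closely
```

For the independence copula $C(u, v) = uv$ the integrand vanishes identically, so the same routine returns zero.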
A measure of ‘expected’ quadrant dependence is the expected value of the random variable $W = H(X, Y) - F(X)G(Y)$ corresponding to our measure of quadrant dependence, i.e. $E(W) = \iint_{\mathbb{R}^2} [H(x, y) - F(x)G(y)] \, dH(x, y)$. But it is well known that the population version of Kendall’s tau is given by $\tau = 4 \iint_{\mathbb{R}^2} H(x, y) \, dH(x, y) - 1$ and that $\rho_s = 12 \iint_{\mathbb{R}^2} F(x)G(y) \, dH(x, y) - 3$ (see Schweizer and Wolff, 1981, for references), from which it follows that ‘expected’ quadrant dependence is given by
$$E(W) = \tfrac{1}{12}(3\tau - \rho_s).$$
The relationship between PQD and $\rho_s$ was exploited by Lehmann (1966) and Tchen (1980), who showed that when (X, Y) are PQD, both $\rho_s \ge 0$ and $\tau \ge 0$. But (X, Y) PQD implies $E(W) \ge 0$, so that in this case we now have the slightly stronger inequality: $3\tau \ge \rho_s \ge 0$.
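The relation between expected quadrant dependence, $\tau$ and $\rho_s$ can be checked numerically. The sketch below again assumes the FGM family (an illustrative choice not taken from this paper), for which $\tau = 2\theta/9$ and $\rho_s = \theta/3$ are known in closed form and which is PQD for $\theta \ge 0$. It evaluates $E(W) = \iint_{I^2} [C(u, v) - uv] \, c(u, v) \, du \, dv$ by a midpoint rule; the function name is hypothetical.

```python
def expected_quadrant_dependence(theta, n=200):
    """Midpoint-rule approximation of E(W), the integral of
    [C(u, v) - u v] * c(u, v) over the unit square, for the FGM copula."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            local = theta * u * v * (1.0 - u) * (1.0 - v)               # C(u, v) - u v
            density = 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)   # c(u, v)
            total += local * density
    return total * h * h


theta = 0.9
tau = 2.0 * theta / 9.0   # known Kendall's tau of the FGM family
rho_s = theta / 3.0       # known Spearman's rho of the FGM family
e_w = expected_quadrant_dependence(theta)
print(e_w, (3.0 * tau - rho_s) / 12.0)  # the two values should agree closely
print(3.0 * tau >= rho_s >= 0.0)        # the inequality for the PQD case
```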
3. Total positivity of order two and Kendall’s $\tau$ (absolutely continuous case)
A pair of random variables (X, Y) with an absolutely continuous distribution function H is said to be totally positive of order two (TP$_2$) if the joint density function h(x, y) satisfies $h(x_2, y_2)h(x_1, y_1) - h(x_1, y_2)h(x_2, y_1) \ge 0$ whenever $x_1 < x_2$ and $y_1 < y_2$; and reverse regular of order two (RR$_2$) when the opposite inequality holds (Karlin, 1968; Barlow and Proschan, 1975). Thus $h(x_1, y_1)h(x_2, y_2) - h(x_1, y_2)h(x_2, y_1)$ measures ‘local’ TP$_2$ (and RR$_2$) for (X, Y). Let T denote its average for $-\infty < x_1 < x_2 < \infty$ and $-\infty < y_1 < y_2 < \infty$, that is,
$$T = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{y_2} \int_{-\infty}^{x_2} [h(x_2, y_2)h(x_1, y_1) - h(x_1, y_2)h(x_2, y_1)] \, dx_1 \, dy_1 \, dx_2 \, dy_2.$$
We will show that $T = \tfrac{1}{2}\tau$. Again employing the probability transforms $u = F(x)$ and $v = G(y)$ and utilizing copulas, we have
$$T = \int_0^1 \int_0^1 \int_0^{v_2} \int_0^{u_2} [c(u_2, v_2)c(u_1, v_1) - c(u_1, v_2)c(u_2, v_1)] \, du_1 \, dv_1 \, du_2 \, dv_2, \qquad (2)$$
where c(u, v) is the ‘density copula’ defined by $c(u, v) = (\partial^2/\partial u \, \partial v) C(u, v)$, so that $h(x, y) = c(F(x), G(y)) f(x) g(y)$ (f and g denote the marginal densities).
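To evaluate T, we first let $T(u_2, v_2)$ denote the inner double integral. It readily follows that
$$T(u_2, v_2) = C(u_2, v_2) \frac{\partial^2}{\partial u \, \partial v} C(u_2, v_2) - \frac{\partial}{\partial u} C(u_2, v_2) \frac{\partial}{\partial v} C(u_2, v_2).$$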
Hence we have (after replacing $u_2$ and $v_2$ by u and v, respectively)
$$T = \int_0^1 \int_0^1 C(u, v) \frac{\partial^2}{\partial u \, \partial v} C(u, v) \, du \, dv - \int_0^1 \int_0^1 \frac{\partial}{\partial u} C(u, v) \frac{\partial}{\partial v} C(u, v) \, du \, dv.$$
But the first integral above is simply $\tfrac{1}{4}(\tau + 1)$ and the second is (after integration by parts) $\tfrac{1}{4}(1 - \tau)$; whence $T = \tfrac{1}{2}\tau$. Thus when the joint distribution function H is absolutely continuous, $\tfrac{1}{2}\tau$ represents an average measure of total positivity (and reverse regularity) of order two.
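The two integrals in this argument can be evaluated numerically for a concrete absolutely continuous copula. The sketch below assumes the FGM copula $C(u, v) = uv + \theta uv(1 - u)(1 - v)$ (an illustrative choice not taken from this paper, with known $\tau = 2\theta/9$), writes out its partial derivatives by hand, and applies a midpoint rule. The function name is hypothetical.

```python
def tp2_integrals_fgm(theta, n=200):
    """Return midpoint-rule approximations of
    (integral of C * d2C/du dv, integral of dC/du * dC/dv)
    over the unit square for the FGM copula with parameter theta."""
    h = 1.0 / n
    first = 0.0
    second = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            c_val = u * v + theta * u * v * (1.0 - u) * (1.0 - v)   # C(u, v)
            c_u = v + theta * v * (1.0 - v) * (1.0 - 2.0 * u)       # dC/du
            c_v = u + theta * u * (1.0 - u) * (1.0 - 2.0 * v)       # dC/dv
            c_uv = 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)  # density copula
            first += c_val * c_uv
            second += c_u * c_v
    return first * h * h, second * h * h


theta = 0.75
tau = 2.0 * theta / 9.0  # known Kendall's tau of the FGM family
first, second = tp2_integrals_fgm(theta)
T = first - second
print(first, (tau + 1.0) / 4.0)
print(second, (1.0 - tau) / 4.0)
print(T, tau / 2.0)
```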
4. Total positivity of order two and Kendall’s $\tau$ (general case)
Of course, the above argument fails if H fails to be absolutely continuous. But we will show that its conclusion still holds if we use a more general definition of TP$_2$. If J and K are intervals in $\mathbb{R}$, let H(J, K) denote $\Pr\{X \in J, Y \in K\}$. Furthermore write J < K whenever $x \in J$ and $y \in K$ implies x < y. Then we say that the joint distribution of (X, Y) is TP$_2$ if $H(J_1, K_1)H(J_2, K_2) \ge H(J_1, K_2)H(J_2, K_1)$ for all intervals $J_1 < J_2$ and $K_1 < K_2$ (Block et al., 1982; Kimeldorf and Sampson, 1987). This version of TP$_2$ can also be expressed in terms of the copula C, since H(J, K) = C(F(J), G(K)); where for any two intervals $[u_1, u_2]$, $[v_1, v_2]$ in I, $C([u_1, u_2], [v_1, v_2]) = C(u_2, v_2) - C(u_1, v_2) - C(u_2, v_1) + C(u_1, v_1)$.
To construct an average measure for the TP$_2$ property in this case, we begin by partitioning the interval I = [0, 1] on the u axis in the usual manner: choose points $\{u_p\}_{p=0}^{n}$ such that $0 = u_0 < u_1 < \cdots < u_n = 1$; and let $J_p = [u_{p-1}, u_p]$. Similarly partition I on the v axis into intervals $K_q = [v_{q-1}, v_q]$ for $1 \le q \le m$; thus generating a partition P of $I^2$ into the mn rectangles $J_p \times K_q$. Let $\|P\|$ denote the norm of P. For each of the $\binom{n}{2}\binom{m}{2}$ choices of intervals $J_r$, $J_p$, $K_s$, and $K_q$ with $1 \le r < p \le n$ and $1 \le s < q \le m$ compute the ‘local’ measure $C(J_p, K_q)C(J_r, K_s) - C(J_p, K_s)C(J_r, K_q)$ of the TP$_2$ property; then sum and take the limit as $\|P\| \to 0$; that is, we now let
$$T = \lim_{\|P\| \to 0} \sum_{p=2}^{n} \sum_{q=2}^{m} \sum_{r=1}^{p-1} \sum_{s=1}^{q-1} [C(J_p, K_q)C(J_r, K_s) - C(J_p, K_s)C(J_r, K_q)].$$
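We now prove that the above limit for T exists and equals $\tfrac{1}{2}\tau$. The inner double summation in T readily telescopes to yield
$$T = \lim_{\|P\| \to 0} \sum_{p=2}^{n} \sum_{q=2}^{m} [C(u_p, v_q)C(u_{p-1}, v_{q-1}) - C(u_{p-1}, v_q)C(u_p, v_{q-1})]. \qquad (3)$$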
If we employ the identity $2(ad - bc) = (a + b + c + d)(a - b - c + d) - (a^2 - b^2 - c^2 + d^2)$ to expand the summand, we have (writing $C_{p,q}$ for $C(u_p, v_q)$)
$$2T = \lim_{\|P\| \to 0} \sum_{p=1}^{n} \sum_{q=1}^{m} [(C_{p,q} + C_{p-1,q} + C_{p,q-1} + C_{p-1,q-1})(C_{p,q} - C_{p-1,q} - C_{p,q-1} + C_{p-1,q-1}) - (C_{p,q}^2 - C_{p-1,q}^2 - C_{p,q-1}^2 + C_{p-1,q-1}^2)].$$
(We can use 1 for the lower limits of summation since the summand vanishes if either p = 1 or q = 1.) Now C is continuous on each rectangle $J_p \times K_q$, so that the intermediate value theorem guarantees the existence of points $(u_p^*, v_q^*) \in J_p \times K_q$ such that $4C(u_p^*, v_q^*) = C_{p,q} + C_{p-1,q} + C_{p,q-1} + C_{p-1,q-1}$. Furthermore, $\Delta C_{p,q} = C(J_p, K_q) = C_{p,q} - C_{p-1,q} - C_{p,q-1} + C_{p-1,q-1}$ represents the probability mass assigned to $J_p \times K_q$ by the copula C. Thus
$$2T = \lim_{\|P\| \to 0} \left[ 4 \sum_{p=1}^{n} \sum_{q=1}^{m} C(u_p^*, v_q^*) \, \Delta C_{p,q} - \sum_{p=1}^{n} \sum_{q=1}^{m} (C_{p,q}^2 - C_{p-1,q}^2 - C_{p,q-1}^2 + C_{p-1,q-1}^2) \right].$$
But the second double sum above telescopes to 1 and thus 2T is the limit of a Riemann sum for $\tau = 4 \iint_{I^2} C(u, v) \, dC - 1$. Hence $T = \tfrac{1}{2}\tau$ as claimed.
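The telescoped partition sum needs only values of the copula itself, so it can be evaluated for copulas without a density. The sketch below uses a uniform partition with m = n and two test copulas that are assumptions for illustration, not examples from this paper: the comonotone copula min(u, v), which is singular with $\tau = 1$, and the independence copula uv with $\tau = 0$. The function name is hypothetical.

```python
def tp2_partition_sum(copula, n):
    """Sum over a uniform n-by-n partition of the unit square of
    C(u_p, v_q) C(u_{p-1}, v_{q-1}) - C(u_{p-1}, v_q) C(u_p, v_{q-1})."""
    grid = [k / n for k in range(n + 1)]
    total = 0.0
    for p in range(1, n + 1):
        for q in range(1, n + 1):
            total += (copula(grid[p], grid[q]) * copula(grid[p - 1], grid[q - 1])
                      - copula(grid[p - 1], grid[q]) * copula(grid[p], grid[q - 1]))
    return total


def comonotone(u, v):
    """Copula of a pair with Y an increasing function of X (singular, tau = 1)."""
    return min(u, v)


def independent(u, v):
    """Copula of an independent pair (tau = 0)."""
    return u * v


n = 400
t_comonotone = tp2_partition_sum(comonotone, n)    # equals (n - 1) / (2 n) up to rounding
t_independent = tp2_partition_sum(independent, n)  # zero up to rounding
print(t_comonotone, t_independent)
```

For the comonotone copula only the diagonal terms p = q contribute, giving (n - 1)/(2n), which tends to 1/2 as the partition is refined, in agreement with $T = \tfrac{1}{2}\tau$.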
5. Sample statistics
A result similar to (1) holds for the sample statistic $r_s$ corresponding to Spearman’s $\rho_s$ (we shall use Greek letters for population parameters, Latin letters for the corresponding sample statistics). Letting $\{(x_k, y_k)\}_{k=1}^{n}$ denote a sample of size n, we have
$$r_s = \frac{12}{n^2 - 1} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ C_n\left(\frac{i}{n}, \frac{j}{n}\right) - \frac{i}{n} \cdot \frac{j}{n} \right], \qquad (4)$$
where $C_n$ is the empirical dependence function (Deheuvels, 1979) or empirical copula (Quesada-Molina, 1990) given by
$$C_n\left(\frac{i}{n}, \frac{j}{n}\right) = \frac{1}{n} \cdot (\text{number of pairs } (x, y) \text{ in the sample such that } x \le x_{(i)} \text{ and } y \le y_{(j)}),$$
and $x_{(i)}$ and $y_{(j)}$ denote order statistics from the sample.
To show that the expression in (4) is equivalent to the standard one for $r_s$ (Kruskal, 1958; Lehmann, 1975), we need only show that
$$\sum_{i=1}^{n} \sum_{j=1}^{n} C_n\left(\frac{i}{n}, \frac{j}{n}\right) = \frac{1}{n} \sum_{k=1}^{n} k R_k,$$
where $R_k = m$ when $(x_{(k)}, y_{(m)})$ is an element of the sample. But observe that a particular pair $(x_{(k)}, y_{(m)})$ in the sample contributes 1/n to the double sum above for each pair (i, j) with $i \ge k$ and $j \ge m$. Equivalently, the total contribution to the double sum by a particular pair $(x_{(k)}, y_{(m)})$ is 1/n times $(n - k + 1)(n - m + 1)$, the total number of pairs (i, j) such that $i \ge k$ and $j \ge m$. Hence, writing $R_k$ for m and summing on k, we have
$$\sum_{i=1}^{n} \sum_{j=1}^{n} C_n\left(\frac{i}{n}, \frac{j}{n}\right) = \frac{1}{n} \sum_{k=1}^{n} (n - k + 1)(n - R_k + 1) = \frac{1}{n} \sum_{k=1}^{n} k R_k,$$
as claimed (also see Quesada-Molina, 1990).
As with $r_s$ and $\rho_s$ we have similar results for the sample statistic t corresponding to Kendall’s $\tau$.
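Analogous to (2) we have (for the sample of size n)
$$t = \frac{2n}{n - 1} \sum_{i_2=2}^{n} \sum_{j_2=2}^{n} \sum_{i_1=1}^{i_2-1} \sum_{j_1=1}^{j_2-1} \left[ c_n\left(\frac{i_2}{n}, \frac{j_2}{n}\right) c_n\left(\frac{i_1}{n}, \frac{j_1}{n}\right) - c_n\left(\frac{i_1}{n}, \frac{j_2}{n}\right) c_n\left(\frac{i_2}{n}, \frac{j_1}{n}\right) \right], \qquad (5)$$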
where $c_n$ is the ‘empirical density copula’ defined by
$$c_n\left(\frac{i}{n}, \frac{j}{n}\right) = \begin{cases} 1/n, & \text{if } (x_{(i)}, y_{(j)}) \text{ is an element of the sample } \{(x_k, y_k)\}_{k=1}^{n}, \\ 0, & \text{otherwise.} \end{cases}$$
It is easy to show that the expression in (5) for t is equivalent to the classical one (Kendall, 1962), i.e., the difference between the numbers of ‘concordant’ and ‘discordant’ pairs of elements in the sample divided by the total number $\binom{n}{2}$ of pairs of elements from the sample. The summand in (5) reduces to $(1/n)^2$ whenever the sample contains both $(x_{(i_1)}, y_{(j_1)})$ and $(x_{(i_2)}, y_{(j_2)})$ (a ‘concordant’ pair of elements, since $x_{(i_1)} < x_{(i_2)}$ and $y_{(j_1)} < y_{(j_2)}$); to $-(1/n)^2$ whenever the sample contains both $(x_{(i_1)}, y_{(j_2)})$ and $(x_{(i_2)}, y_{(j_1)})$ (a ‘discordant’ pair), and 0 otherwise. Thus the quadruple sum in (5) is $(1/n)^2$ times the difference between the number of concordant pairs of elements and the number of discordant pairs, equivalent to the classical form for t.
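Evaluating the inner double summation in (5) yields an expression for t similar to (3) in terms of the empirical copula $C_n$,
$$t = \frac{2n}{n - 1} \sum_{i=2}^{n} \sum_{j=2}^{n} \left[ C_n\left(\frac{i}{n}, \frac{j}{n}\right) C_n\left(\frac{i-1}{n}, \frac{j-1}{n}\right) - C_n\left(\frac{i}{n}, \frac{j-1}{n}\right) C_n\left(\frac{i-1}{n}, \frac{j}{n}\right) \right].$$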
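The sample results of this section can be checked on a small data set. The sketch below assumes the empirical-copula forms $r_s = \frac{12}{n^2 - 1} \sum_{i,j} [C_n(i/n, j/n) - (i/n)(j/n)]$ and $t = \frac{2n}{n - 1} \sum_{i,j \ge 2} [C_n(i/n, j/n) C_n((i-1)/n, (j-1)/n) - C_n(i/n, (j-1)/n) C_n((i-1)/n, j/n)]$, and compares them with the classical rank-difference and concordance formulas. The data, the function names, and the no-ties assumption are illustrative and not part of the paper.

```python
def empirical_copula(xs, ys):
    """Return table[i][j] = C_n(i/n, j/n) for i, j = 0..n, straight from the
    counting definition (assumes no ties; O(n^3), fine for small samples)."""
    n = len(xs)
    x_ord, y_ord = sorted(xs), sorted(ys)
    table = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            count = sum(1 for x, y in zip(xs, ys)
                        if x <= x_ord[i - 1] and y <= y_ord[j - 1])
            table[i][j] = count / n
    return table


def spearman_via_copula(xs, ys):
    """r_s as 12/(n^2 - 1) times the summed 'local' quadrant dependence."""
    n = len(xs)
    c_n = empirical_copula(xs, ys)
    total = sum(c_n[i][j] - (i / n) * (j / n)
                for i in range(1, n + 1) for j in range(1, n + 1))
    return 12.0 * total / (n * n - 1)


def kendall_via_copula(xs, ys):
    """t as 2n/(n - 1) times the telescoped TP2-type sum over the grid."""
    n = len(xs)
    c_n = empirical_copula(xs, ys)
    total = 0.0
    for i in range(2, n + 1):
        for j in range(2, n + 1):
            total += c_n[i][j] * c_n[i - 1][j - 1] - c_n[i][j - 1] * c_n[i - 1][j]
    return 2.0 * n * total / (n - 1)


def spearman_classical(xs, ys):
    """1 - 6 * sum(d^2) / (n (n^2 - 1)), with d the rank differences."""
    n = len(xs)
    rank_x = {x: r for r, x in enumerate(sorted(xs), start=1)}
    rank_y = {y: r for r, y in enumerate(sorted(ys), start=1)}
    d2 = sum((rank_x[x] - rank_y[y]) ** 2 for x, y in zip(xs, ys))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def kendall_classical(xs, ys):
    """(concordant pairs - discordant pairs) / (n choose 2)."""
    n = len(xs)
    s = 0
    for a in range(n):
        for b in range(a + 1, n):
            s += 1 if (xs[a] - xs[b]) * (ys[a] - ys[b]) > 0 else -1
    return 2.0 * s / (n * (n - 1))


xs = [2.1, 0.5, 3.7, 1.2, 4.4, 2.9]  # made-up sample without ties
ys = [1.0, 0.3, 2.2, 2.8, 3.9, 1.7]
print(spearman_via_copula(xs, ys), spearman_classical(xs, ys))
print(kendall_via_copula(xs, ys), kendall_classical(xs, ys))
```

Each printed pair should agree up to floating-point rounding.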
6. Conclusion
Several authors (Barlow and Proschan, 1975; Shaked, 1977) have studied positive dependence properties in addition to PQD and TP$_2$; properties such as association, stochastically increasing, right tail increasing, left tail decreasing, and right corner set increasing. Of these properties, TP$_2$ is the strongest and PQD the weakest. Thus the two classical measures of association, Spearman’s $\rho_s$ and Kendall’s $\tau$, provide average measures of two relatively extreme positive dependence properties.
Acknowledgement
The author wishes to acknowledge the support of Mount Holyoke College in South Hadley, Massachusetts, where he was a visiting professor in the Department of Mathematics, Statistics, and Computation.
References
Barlow, R. and F. Proschan (1975), Statistical Theory of Reliability and Life Testing: Probability Models (Holt, Rinehart and Winston, New York).
Block, H., T. Savits and M. Shaked (1982), Some concepts of negative dependence, Ann. Probab. 10, 765-772.
Deheuvels, P. (1979), La fonction de dépendance empirique et ses propriétés. Un test non paramétrique d’indépendance, Acad. Roy. Belg. Bull. Cl. Sci. (5) 65, 274-292.
Karlin, S. (1968), Total Positivity, Vol. I (Stanford Univ. Press, Stanford, CA).
Kendall, M.G. (1962), Rank Correlation Methods (Griffin, London).
Kimeldorf, G. and A. Sampson (1987), Positive dependence orderings, Ann. Inst. Statist. Math. 39, 113-128.
Kimeldorf, G. and A. Sampson (1989), A framework for positive dependence, Ann. Inst. Statist. Math. 41, 31-45.
Kruskal, W.H. (1958), Ordinal measures of association, J. Amer. Statist. Assoc. 53, 814-861.
Lehmann, E. (1966), Some concepts of dependence, Ann. Math. Statist. 37, 1137-1153.
Lehmann, E. (1975), Nonparametrics: Statistical Methods Based on Ranks (Holden-Day, San Francisco, CA).
Quesada-Molina, J.J. (1990), Copulas and multivariate dependence, Symposium on Distributions with Given Marginals (Fréchet classes), Rome, 4-7 April.
Schweizer, B. (1991), Thirty years of copulas, in: G. Dall’Aglio, S. Kotz and G. Salinetti, eds., Advances in Probability Distributions with Given Marginals (Kluwer Academic Publishers, Dordrecht) pp. 13-50.
Schweizer, B. and E.F. Wolff (1981), On nonparametric measures of dependence for random variables, Ann. Statist. 9, 879-885.
Shaked, M. (1977), A family of concepts of dependence for bivariate distributions, J. Amer. Statist. Assoc. 72, 642-650.
Sklar, A. (1959), Fonctions de répartition à n dimensions et leurs marges, Publ. Inst. Statist. Univ. Paris 8, 229-231.
Tchen, A. (1980), Inequalities for distribution functions with given marginals, Ann. Probab. 8, 814-827.