TRANSCRIPT

Nonparametric Measures of Multivariate Association
Author: Roger B. Nelsen
Source: Lecture Notes-Monograph Series, Vol. 28, Distributions with Fixed Marginals and Related Topics (1996), pp. 223-232
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/4355895
Accessed: 14/06/2014 00:10
Distributions with Fixed Marginals and Related Topics IMS Lecture Notes - Monograph Series Vol. 28, 1996
Nonparametric Measures
of Multivariate Association
By Roger B. Nelsen
Lewis and Clark College
We study measures of multivariate association which are generalizations of the measures of bivariate association known as Spearman's rho and Kendall's
tau. Since the population versions of Spearman's rho and Kendall's tau can be
interpreted as measures of average positive (and negative) quadrant dependence and average total positivity of order two, respectively (Nelsen, 1992), we extend
these ideas to the multivariate setting and derive measures of multivariate as-
sociation from averages of orthant dependence and multivariate total positivity of order two. We examine several properties of these measures, and present
examples in three and four dimensions.
1. Introduction. The purpose of this paper is to present three measures of multivariate association which are derived from multivariate dependence concepts. We begin in Section 2 by reviewing the main results for the bivariate case: Spearman's rho is a measure of average quadrant dependence, while Kendall's tau is a measure of average total positivity of order two. Relationships between measures of multivariate association and concordance are discussed in Section 3. In Section 4 we construct two measures of multivariate association from two generalizations of quadrant dependence (upper orthant dependence and lower orthant dependence), and examine properties of these measures in Section 5. In Section 6 we construct a measure from multivariate total positivity of order two, and in Section 7 we examine properties of this measure.
2. The Bivariate Case. Before proceeding, we adopt some notation and review some definitions. Let X and Y denote continuous random variables (r.v.'s) with joint distribution function (d.f.) H and marginal d.f.'s F and G. Let C denote the copula of X and Y, the function C: I^2 → I (where I = [0,1]) given by H(x,y) = C(F(x),G(y)). We will also denote the joint and marginal densities of X and Y by h, f, and g, respectively.
AMS 1991 Subject Classification: Primary 62H20
Key words and phrases: copula, Kendall's tau, multivariate total positivity of order
two, orthant dependence, Spearman's rho.
224 NONPARAMETRIC MEASURES
The r.v.'s X and Y are said to be positively quadrant dependent, written PQD (Lehmann, 1966), iff the probability that they are simultaneously large is at least as great as it would be were X and Y independent, i.e., iff

P(X > x, Y > y) ≥ P(X > x)P(Y > y) for all x and y.    (2.1)

In terms of d.f.'s, X and Y are PQD iff 1 - F(x) - G(y) + H(x,y) ≥ [1 - F(x)][1 - G(y)], or equivalently, iff H(x,y) ≥ F(x)G(y) (i.e., iff P(X ≤ x, Y ≤ y) ≥ P(X ≤ x)P(Y ≤ y)). So, in a sense, the expression H(x,y) - F(x)G(y) measures "local" quadrant dependence at each point (x,y) in R^2.
The population version of Spearman's rho is given by

ρ_s = 12 ∫∫_{R^2} [H(x,y) - F(x)G(y)] dF(x) dG(y),

and hence (1/12)ρ_s represents a measure of average quadrant dependence, where the average is with respect to the marginal distributions of X and Y (Nelsen, 1992). Invoking the probability transforms u = F(x) and v = G(y) and the copula C, the above expressions (as well as ones to follow) simplify. The measure of local quadrant dependence becomes C(u,v) - uv (for (u,v) in I^2), and

ρ_s = 12 ∫∫_{I^2} [C(u,v) - uv] du dv.
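As a numerical illustration (not part of the original paper), the identity ρ_s = 12 ∫∫_{I^2} C(u,v) du dv - 3 = 12 E[UV] - 3 can be checked by Monte Carlo against the bivariate Farlie-Gumbel-Morgenstern copula C(u,v) = uv[1 + θ(1-u)(1-v)], for which Spearman's rho is known to be θ/3; the sampler below uses conditional inversion:

```python
import numpy as np

def sample_fgm(theta, n, rng):
    """Sample (U, V) from the FGM copula C(u,v) = uv[1 + theta(1-u)(1-v)]
    by inverting the conditional d.f. dC/du = v + a*v*(1-v), a = theta(1-2u)."""
    u = rng.random(n)
    t = rng.random(n)
    a = theta * (1.0 - 2.0 * u)
    # Solve a*v^2 - (1+a)*v + t = 0 for the root in [0,1]; v = t when a ≈ 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        disc = (1.0 + a) ** 2 - 4.0 * a * t
        v = np.where(np.abs(a) < 1e-9, t,
                     ((1.0 + a) - np.sqrt(disc)) / (2.0 * a))
    return u, v

rng = np.random.default_rng(0)
theta = 0.8
u, v = sample_fgm(theta, 200_000, rng)
rho_s = 12.0 * np.mean(u * v) - 3.0   # population identity rho_s = 12 E[UV] - 3
print(rho_s)                           # close to theta/3
```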
In a similar fashion, the population analog of Kendall's tau is related to the dependence property known as total positivity of order two. X and Y are totally positive of order two (written TP2) iff their joint density h satisfies h(x1,y1)h(x2,y2) ≥ h(x1,y2)h(x2,y1) for all x1, x2, y1, y2 in R such that x1 < x2 and y1 < y2. Thus h(x1,y1)h(x2,y2) - h(x1,y2)h(x2,y1) measures "local" TP2 for X and Y. Let t denote its average for -∞ < x1 < x2 < ∞ and -∞ < y1 < y2 < ∞, i.e., let

t = ∫_{-∞}^{∞} ∫_{-∞}^{∞} ∫_{-∞}^{x2} ∫_{-∞}^{y2} [h(x1,y1)h(x2,y2) - h(x1,y2)h(x2,y1)] dy1 dx1 dy2 dx2.

It can be shown that τ = 2t, where τ is the population version of Kendall's tau,

τ = 4 ∫∫_{R^2} H(x,y) dH(x,y) - 1 = 4 ∫∫_{I^2} C(u,v) dC(u,v) - 1.    (2.2)

See Nelsen (1992) for details.
3. Concordance. In the bivariate setting, two pairs of r.v.'s (X1,Y1) and (X2,Y2) are concordant if X1 < X2 and Y1 < Y2 or if X1 > X2 and Y1 > Y2; and discordant if X1 < X2 and Y1 > Y2 or if X1 > X2 and Y1 < Y2. As is well known (Kruskal, 1958), the population version of Kendall's τ is the probability that two independent observations of the r.v.'s X and Y (with d.f. H) are concordant, scaled to be 0 when X and Y are independent and 1 when the d.f. of X and Y is the Fréchet upper bound. This can be seen by noting that

P(X1 < X2, Y1 < Y2 or X1 > X2, Y1 > Y2) = 2P(X1 < X2, Y1 < Y2)
  = 2 ∫∫_{R^2} P(X < x, Y < y) dH(x,y) = 2 ∫∫_{R^2} H(x,y) dH(x,y),

and observing that this integral appears in (2.2).
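The representation τ = 2P(concordant) - 1 (for continuous r.v.'s, concordance and discordance are complementary) can likewise be checked numerically; the sketch below (an illustration, not from the paper) uses the bivariate FGM copula, for which Kendall's τ is known to be 2θ/9:

```python
import numpy as np

def sample_fgm(theta, n, rng):
    """Sample (U, V) from the FGM copula C(u,v) = uv[1 + theta(1-u)(1-v)]."""
    u = rng.random(n)
    t = rng.random(n)
    a = theta * (1.0 - 2.0 * u)
    with np.errstate(divide="ignore", invalid="ignore"):
        v = np.where(np.abs(a) < 1e-9, t,
                     ((1.0 + a) - np.sqrt((1.0 + a) ** 2 - 4.0 * a * t)) / (2.0 * a))
    return u, v

rng = np.random.default_rng(7)
theta = 0.9
# Two independent observations of (X, Y), on the copula scale.
u1, v1 = sample_fgm(theta, 200_000, rng)
u2, v2 = sample_fgm(theta, 200_000, rng)
p_c = np.mean((u1 - u2) * (v1 - v2) > 0)   # estimated concordance probability
tau = 2.0 * p_c - 1.0
print(tau)                                  # close to 2*theta/9
```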
For notation in the multivariate case, let X1, X2, ..., Xn be continuous r.v.'s with marginal d.f.'s F1, F2, ..., Fn respectively, joint d.f. H, and copula C: I^n → I given by H(x1,x2,...,xn) = C(F1(x1),F2(x2),...,Fn(xn)). Also let X = (X1,X2,...,Xn), x = (x1,x2,...,xn), and let X > x denote the component-wise inequality Xi > xi, i = 1,2,...,n. In a recent paper Joe (1990) studied a family of measures of multivariate concordance given by

τ* = Σ_{k=n*}^{n} w_k P(D ∈ B_{k,n-k}),    (3.1)

where D = X - Y for X and Y independent, each with d.f. H; B_{k,n-k} is the set of x in R^n with k positive and n-k negative or k negative and n-k positive components; n* = ⌈n/2⌉; and the coefficients w_k satisfy some technical restrictions. At one extreme, w_k = 1 - 2k(n-k)/(n choose 2), and τ* is the average of the (n choose 2) pairwise Kendall's τ's for the components of X. At the other extreme, w_n = 1 and w_k = -1/(2^{n-1} - 1) for k < n, in which case τ* = (1/(2^{n-1} - 1))(2^{n-1} P(X < Y or X > Y) - 1). In Section 6 we will see that this measure of concordance is also a measure of average total positivity of order two.

We further note that ρ_n^+ and ρ_n^-, which will appear in Section 4, are also discussed in Joe (1990), where they appear as the scaled expected values E[F1(X1)F2(X2)···Fn(Xn)] and E[(1-F1(X1))(1-F2(X2))···(1-Fn(Xn))], respectively.
4. Two Measures of Average Orthant Dependence. There are two standard ways to generalize positive quadrant dependence to the multivariate situation (Shaked, 1982). The first is a generalization of (2.1): X is positively upper orthant dependent (PUOD) iff

P(X > x) ≥ ∏_{i=1}^{n} P(Xi > xi),

and negatively upper orthant dependent (NUOD) when ≥ is replaced by ≤. The second is similar: X is positively lower orthant dependent (PLOD) iff

P(X ≤ x) ≥ ∏_{i=1}^{n} P(Xi ≤ xi),

and negatively lower orthant dependent (NLOD) when ≥ is replaced by ≤. When n = 2, positive upper orthant dependence and positive lower orthant dependence are the same, both reducing to positive quadrant dependence (Lehmann, 1966); but for n ≥ 3 they are distinct concepts. For example (when n = 3), if X assumes the four values (1,1,1), (1,0,0), (0,1,0) and (0,0,1) each with probability 1/4, then it is easy to verify that X is PUOD but not PLOD. (Note that P(X ≤ 0) = 0 while P(X1 ≤ 0)P(X2 ≤ 0)P(X3 ≤ 0) = 1/8.)
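The four-point example can be verified by direct enumeration. The script below (an illustrative check, not from the paper) tests both orthant inequalities at every threshold that matters for components taking values in {0,1}:

```python
import itertools
import numpy as np

# The four equally likely values of X from the example.
atoms = np.array([(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)], dtype=float)

def p_upper(x):
    """P(X > x), componentwise strict inequality."""
    return np.mean(np.all(atoms > x, axis=1))

def p_lower(x):
    """P(X <= x), componentwise."""
    return np.mean(np.all(atoms <= x, axis=1))

# Components take values in {0, 1}, so thresholds -0.5, 0.5, 1.5 cover
# every distinct orthant event.
grid = [-0.5, 0.5, 1.5]
puod = all(p_upper(np.array(x)) >=
           np.prod([np.mean(atoms[:, i] > x[i]) for i in range(3)]) - 1e-12
           for x in itertools.product(grid, repeat=3))
plod = all(p_lower(np.array(x)) >=
           np.prod([np.mean(atoms[:, i] <= x[i]) for i in range(3)]) - 1e-12
           for x in itertools.product(grid, repeat=3))
print(puod, plod)   # PUOD holds, PLOD fails
```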
As with quadrant dependence, we can view the expression P(X > x) - ∏_{i=1}^{n} P(Xi > xi) as a measure of "local" upper orthant dependence at each point x in R^n. If we set Ui = Fi(Xi) for i = 1,2,...,n and U = (U1,U2,...,Un), then each Ui is uniform on I and the d.f. of U is the copula C. Furthermore, X is PUOD (NUOD) iff U is PUOD (NUOD), or equivalently, iff P(U > u) ≥ (≤) ∏_{i=1}^{n} (1 - ui). Thus P(U > u) - ∏_{i=1}^{n} (1 - ui) also measures local upper orthant dependence for X. Let p denote its average:

p = ∫_{I^n} [P(U > u) - ∏_{i=1}^{n} (1 - ui)] du1 du2 ··· dun.

Now let V be a random vector with n independent components each uniformly distributed on I. Then

∫_{I^n} P(U > u) du1 du2 ··· dun = P(U > V) = P(V < U) = ∫_{I^n} P(V < u) dC(u) = ∫_{I^n} u1 u2 ··· un dC(u),

and since ∫_{I^n} ∏_{i=1}^{n} (1 - ui) du1 du2 ··· dun = (1/2)^n, we have

p = ∫_{I^n} u1 u2 ··· un dC(u) - (1/2)^n.
If X is a vector of independent r.v.'s with marginal d.f.'s F1, F2, ..., Fn, then H(x1,x2,...,xn) = F1(x1)F2(x2)···Fn(xn), and hence the copula for such an X, which we will denote by Π(u), is given by Π(u1,u2,...,un) = u1 u2 ··· un. In this case p = 0. When H(x1,x2,...,xn) = min(F1(x1),F2(x2),...,Fn(xn)), the Fréchet upper bound for distributions with margins F1, F2, ..., Fn, the copula is M(u) = min(u1,u2,...,un). In this case P(U > u) = min(1-u1, 1-u2, ..., 1-un), so that ∫_{I^n} P(U > u) du1 du2 ··· dun = 1/(n+1). Since C(u) ≤ M(u) for every C, it follows that p ≤ 1/(n+1) - (1/2)^n. So if we divide p by this constant, we will have a measure which is 1 when the d.f. of X is the Fréchet upper bound (and still 0 when the components of X are independent). Hence we make the following
Definition 4.1. A measure of multivariate association ρ_n^+ derived from average upper orthant dependence for the random vector X with copula C(u) is given by:

ρ_n^+ = ((n+1)/(2^n - (n+1))) (2^n ∫_{I^n} u1 u2 ··· un dC(u) - 1);

or, equivalently,

ρ_n^+ = ((n+1)/(2^n - (n+1))) (2^n ∫_{I^n} Π(u) dC(u) - 1).
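Definition 4.1 lends itself to a quick Monte Carlo sanity check (an illustration, not from the paper): since ∫_{I^n} Π(u) dC(u) = E[U1···Un], ρ_n^+ should be near 0 for independent components and near 1 for comonotone ones (the Fréchet upper bound):

```python
import numpy as np

def rho_plus(u):
    """rho_n^+ from a copula sample u of shape (N, n)."""
    n = u.shape[1]
    return (n + 1) / (2 ** n - (n + 1)) * (2 ** n * np.mean(np.prod(u, axis=1)) - 1.0)

rng = np.random.default_rng(1)
n, N = 3, 200_000
indep = rng.random((N, n))                       # independence copula Pi
comono = np.repeat(rng.random(N)[:, None], n, axis=1)   # Frechet upper bound M
print(rho_plus(indep), rho_plus(comono))         # near 0 and near 1
```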
In a similar fashion, we can define another measure ρ_n^- from average lower orthant dependence. Recall that X is PLOD (NLOD) iff P(X ≤ x) ≥ (≤) ∏_{i=1}^{n} P(Xi ≤ xi), that is, iff P(U ≤ u) ≥ (≤) ∏_{i=1}^{n} ui. So P(U ≤ u) - ∏_{i=1}^{n} ui measures "local" positive and negative lower orthant dependence; let q denote its average:

q = ∫_{I^n} [P(U ≤ u) - ∏_{i=1}^{n} ui] du1 du2 ··· dun;

from which it readily follows that

q = ∫_{I^n} C(u) du1 du2 ··· dun - (1/2)^n.

As with p, q = 0 when the components of X are independent, and q = 1/(n+1) - (1/2)^n when the d.f. of X is the Fréchet upper bound. So we make the following

Definition 4.2. A measure of multivariate association ρ_n^- derived from average lower orthant dependence for the random vector X with copula C(u) is given by:

ρ_n^- = ((n+1)/(2^n - (n+1))) (2^n ∫_{I^n} C(u) du1 du2 ··· dun - 1);

or, equivalently,

ρ_n^- = ((n+1)/(2^n - (n+1))) (2^n ∫_{I^n} C(u) dΠ(u) - 1).
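A matching check for Definition 4.2 (again an illustration, not from the paper) uses the identity ∫_{I^n} C(u) du1 ··· dun = E[(1-U1)···(1-Un)], which follows from the same conditioning argument used for p in Section 4:

```python
import numpy as np

def rho_minus(u):
    """rho_n^- from a copula sample u of shape (N, n), via E[(1-U1)...(1-Un)]."""
    n = u.shape[1]
    return (n + 1) / (2 ** n - (n + 1)) * (2 ** n * np.mean(np.prod(1.0 - u, axis=1)) - 1.0)

rng = np.random.default_rng(2)
n, N = 3, 200_000
indep = rng.random((N, n))                       # independent components
comono = np.repeat(rng.random(N)[:, None], n, axis=1)   # comonotone components
print(rho_minus(indep), rho_minus(comono))       # near 0 and near 1
```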
5. Properties of ρ_n^+ and ρ_n^- and Examples.

1. A lower bound for ρ_n^+ and ρ_n^- is ((n+1)/(2^n - (n+1)))(2^n/(n+1)! - 1). Since the Fréchet lower bound for distributions with marginal d.f.'s F1, F2, ..., Fn is max(F1(x1) + F2(x2) + ··· + Fn(xn) - n + 1, 0), it follows that P(U > u) ≥ max(1 - u1 - u2 - ··· - un, 0). Thus ∫_{I^n} P(U > u) du1 du2 ··· dun ≥ ∫_{I^n} max(1 - u1 - u2 - ··· - un, 0) du1 du2 ··· dun = 1/(n+1)!, so that ρ_n^+ ≥ ((n+1)/(2^n - (n+1)))(2^n/(n+1)! - 1). Similarly, since C(u) ≥ max(u1 + u2 + ··· + un - n + 1, 0), we have ∫_{I^n} C(u) du1 du2 ··· dun ≥ ∫_{I^n} max(u1 + u2 + ··· + un - n + 1, 0) du1 du2 ··· dun = 1/(n+1)!, and we obtain the same lower bound for ρ_n^-. Since the Fréchet lower bound is not a d.f. for n ≥ 3, this bound may well fail to be best possible.
2. For n = 2, both ρ_2^+ and ρ_2^- reduce to Spearman's ρ_s discussed in Section 2.
3. For n = 3, the inclusion-exclusion principle yields

P(U > u) + C(u) = 1 - u1 - u2 - u3 + C(u1,u2,1) + C(u1,1,u3) + C(1,u2,u3).

Integrating over I^3 gives

(ρ_3^+ + 1)/8 + (ρ_3^- + 1)/8 = -1/2 + (ρ_XY + 3)/12 + (ρ_XZ + 3)/12 + (ρ_YZ + 3)/12,

where ρ_XY, ρ_XZ and ρ_YZ denote Spearman's ρ_s for the two r.v.'s displayed as the subscript. It follows that

(1/2)(ρ_3^+ + ρ_3^-) = (1/3)(ρ_XY + ρ_XZ + ρ_YZ).

Similar expressions can be obtained for n ≥ 4.
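This n = 3 identity can be tested on any trivariate copula sample. The sketch below (not from the paper) uses an equal mixture of the independence and comonotone copulas, together with the sample versions ρ_3^+ = 8E[UVW] - 1, ρ_3^- = 8E[(1-U)(1-V)(1-W)] - 1, and ρ_s = 12E[UiUj] - 3:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 300_000
# Equal mixture: each observation is either independent or comonotone.
mix = rng.random(N) < 0.5
u = rng.random((N, 3))
u[mix] = np.repeat(rng.random(mix.sum())[:, None], 3, axis=1)

rho3_plus = 8 * np.mean(np.prod(u, axis=1)) - 1
rho3_minus = 8 * np.mean(np.prod(1 - u, axis=1)) - 1
pair = [12 * np.mean(u[:, i] * u[:, j]) - 3 for i, j in [(0, 1), (0, 2), (1, 2)]]
lhs = 0.5 * (rho3_plus + rho3_minus)
rhs = sum(pair) / 3
print(lhs, rhs)   # the two sides agree (both near 1/2 for this mixture)
```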
4. Suppose that the distribution of X is radially symmetric (Nelsen, 1993), that is, there is a point a in R^n such that P(X ≤ a - x) = P(X ≥ a + x) for all x in R^n. It follows that P(U ≤ u) = P(U ≥ 1 - u), so that ρ_n^+ = ρ_n^-.
Example 1. Let X, Y, and Z be r.v.'s, each uniform on I, such that the probability mass in I^3 is uniformly distributed on the faces of the tetrahedron with vertices (1,0,0), (0,1,0), (0,0,1), and (1,1,1). For this distribution, ρ_3^+ = 2/15 and ρ_3^- = -2/15. Furthermore, since X, Y, and Z are pairwise independent, ρ_XY = ρ_XZ = ρ_YZ = 0. Thus a common measure of multivariate dependence, the average of the pairwise Spearman's ρ_s's, is 0 for (X,Y,Z). But the fact that ρ_3^+ is positive and ρ_3^- is negative indicates some degree of positive upper orthant dependence and negative lower orthant dependence: indeed, P(X > 1/2, Y > 1/2, Z > 1/2) = 3/16 and P(X < 1/2, Y < 1/2, Z < 1/2) = 1/16, but both of these probabilities are 1/8 when X, Y, and Z are mutually independent.
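Example 1 can be reproduced by simulation (an illustrative check, not from the paper): the four faces of the tetrahedron have equal area, so one samples a face uniformly and then a uniform barycentric point on it:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 400_000
verts = np.array([(1., 0., 0.), (0., 1., 0.), (0., 0., 1.), (1., 1., 1.)])
faces = np.array([(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)])

# Pick a face (all four have equal area), then a uniform barycentric point.
f = faces[rng.integers(0, 4, N)]
r1, r2 = rng.random(N), rng.random(N)
s = np.sqrt(r1)
lam = np.column_stack([1 - s, s * (1 - r2), s * r2])   # uniform on the 2-simplex
pts = np.einsum('ni,nij->nj', lam, verts[f])

rho3_plus = 8 * np.mean(np.prod(pts, axis=1)) - 1      # near 2/15
rho3_minus = 8 * np.mean(np.prod(1 - pts, axis=1)) - 1 # near -2/15
p_up = np.mean(np.all(pts > 0.5, axis=1))              # near 3/16
p_lo = np.mean(np.all(pts < 0.5, axis=1))              # near 1/16
```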
Example 2. Suppose X, Y, and Z have the trivariate Farlie-Gumbel-Morgenstern (FGM) distribution on I^3 with d.f.

H(x,y,z) = xyz[1 + α(1-y)(1-z) + β(1-x)(1-z) + γ(1-x)(1-y) + δ(1-x)(1-y)(1-z)],

where the four parameters α, β, γ, and δ are each in [-1,1] and satisfy the inequalities 1 + ε1 α + ε2 β + ε3 γ ≥ |δ| for εi = ±1, ε1 ε2 ε3 = 1. The univariate margins are uniform on I while the bivariate margins are FGM, and the pairwise Spearman's ρ_s's are ρ_XY = γ/3, ρ_XZ = β/3, and ρ_YZ = α/3. Thus the average of the pairwise ρ_s's is (1/9)(α + β + γ); however,

ρ_3^+ = (1/9)(α + β + γ) - δ/27 and ρ_3^- = (1/9)(α + β + γ) + δ/27.
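The formula for ρ_3^- in Example 2 can be verified by deterministic numerical integration of 8 ∫_{I^3} C(u) du - 1 (an illustrative check, not from the paper; the parameter values below are chosen to satisfy the stated constraint):

```python
import numpy as np

def rho3_minus_fgm(a, b, g, d, m=60):
    """8 * (midpoint-rule integral of the trivariate FGM copula over I^3) - 1."""
    t = (np.arange(m) + 0.5) / m
    x, y, z = np.meshgrid(t, t, t, indexing="ij")
    C = x * y * z * (1 + a * (1 - y) * (1 - z) + b * (1 - x) * (1 - z)
                     + g * (1 - x) * (1 - y) + d * (1 - x) * (1 - y) * (1 - z))
    return 8 * C.mean() - 1

a, b, g, d = 0.3, 0.2, 0.1, 0.3       # admissible FGM parameters
approx = rho3_minus_fgm(a, b, g, d)
exact = (a + b + g) / 9 + d / 27
print(approx, exact)                   # the two agree to quadrature accuracy
```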
Example 3. Copulas for trivariate Cuadras-Augé distributions are weighted geometric means of the copula for independent r.v.'s and the copula for the Fréchet upper bound: C_θ(x,y,z) = [min(x,y,z)]^θ (xyz)^{1-θ}, θ ∈ I. The univariate margins are uniform on I and the bivariate margins are Cuadras-Augé distributions with parameter θ. This distribution is both PUOD and PLOD, and ρ_XY = ρ_XZ = ρ_YZ = 3θ/(4-θ); however,

ρ_3^+ = θ(11-5θ)/((3-θ)(4-θ)) and ρ_3^- = θ(7-θ)/((3-θ)(4-θ)).
Example 4. Let X, Y, and Z be r.v.'s, each uniform on I, such that the probability mass in I^3 is uniformly distributed on two triangles, one with vertices (1,0,0), (0,1,0), and (0,0,1); and one with vertices (1,1,0), (1,0,1), and (0,1,1). Here ρ_3^+ = ρ_3^- = 0. Note that X, Y, and Z are again pairwise independent and that (X,Y,Z) is radially symmetric about (1/2, 1/2, 1/2).
6. A Measure of Average Multivariate Total Positivity of Order Two. The multivariate version of the TP2 property is called multivariate total positivity of order two (Karlin and Rinott, 1980): A distribution with joint density h is multivariate totally positive of order two (MTP2) iff for all x and y in R^n,

h(x ∨ y) h(x ∧ y) ≥ h(x) h(y),

where

x ∨ y = (max(x1,y1), max(x2,y2), ..., max(xn,yn))

and

x ∧ y = (min(x1,y1), min(x2,y2), ..., min(xn,yn)).

Thus h(x ∨ y) h(x ∧ y) - h(x) h(y) measures "local" MTP2 for a distribution with density h, and we let T denote its average:

T = ∫_{R^n} ∫_{R^n} [h(x ∨ y) h(x ∧ y) - h(x) h(y)] dx1 dx2 ··· dxn dy1 dy2 ··· dyn.

Let si = Fi(xi), ti = Fi(yi), and h(x) = c(F1(x1), F2(x2), ..., Fn(xn)) f1(x1) f2(x2) ··· fn(xn), where fi is the (marginal) density of Xi and c(u) = ∂^n C(u)/∂u1 ∂u2 ··· ∂un. Then

T = ∫_{I^n} ∫_{I^n} [c(s ∨ t) c(s ∧ t) - c(s) c(t)] ds1 ds2 ··· dsn dt1 dt2 ··· dtn.

Now let u = s ∨ t and v = s ∧ t. Then v ≤ u and dvi dui = dsi dti for i = 1,2,...,n; hence
T = ∫∫_{0 ≤ v ≤ u ≤ 1} [2^n c(u) c(v) - Σ_{k=0}^{n} Σ_{S⊆N, |S|=k} c(y_S) c(z_S)] dv1 dv2 ··· dvn du1 du2 ··· dun,

where N = {1,2,...,n}, and y_S and z_S have components

(y_S)_i = ui if i ∈ S, vi if i ∉ S,   and   (z_S)_i = vi if i ∈ S, ui if i ∉ S.

Evaluation of the n inner integrals yields

T = ∫_{I^n} [2^n C(u) c(u) - Σ_{k=0}^{n} Σ_{S⊆N, |S|=k} ∂_S C(u) ∂_{N\S} C(u)] du1 du2 ··· dun,

where ∂_S C(u) denotes the kth order mixed partial derivative of C(u) with respect to the k variables whose subscripts are in S. But since the double sum in the above expression is simply ∂^n C^2(u)/∂u1 ∂u2 ··· ∂un, we have

T = 2^n ∫_{I^n} C(u) c(u) du1 du2 ··· dun - ∫_{I^n} [∂^n C^2(u)/∂u1 ∂u2 ··· ∂un] du1 du2 ··· dun.

The second multiple integral above is C^2(1) = 1, and thus

T = 2^n ∫_{I^n} C(u) dC(u) - 1.

As with p and q in Section 4, T = 0 when the components of X are independent, and when the d.f. of X is the Fréchet upper bound, T = 2^{n-1} - 1. So we make the following
Definition 6.1. A measure of multivariate association τ_n derived from average multivariate total positivity of order two for the random vector X with copula C(u) is given by:

τ_n = (1/(2^{n-1} - 1)) (2^n ∫_{I^n} C(u) dC(u) - 1).
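Definition 6.1 can be exercised whenever C itself is evaluable, since τ_n = (2^n E[C(U)] - 1)/(2^{n-1} - 1) with U distributed as C. The sketch below (not from the paper) checks the two benchmark cases: τ_3 ≈ 0 under independence and τ_3 ≈ 1 at the Fréchet upper bound:

```python
import numpy as np

def tau_n(u, C):
    """tau_n from a copula sample u (shape (N, n)) and copula function C."""
    n = u.shape[1]
    return (2 ** n * np.mean(C(u)) - 1.0) / (2 ** (n - 1) - 1.0)

rng = np.random.default_rng(5)
N = 200_000
indep = rng.random((N, 3))
comono = np.repeat(rng.random(N)[:, None], 3, axis=1)

Pi = lambda u: np.prod(u, axis=1)   # independence copula
M = lambda u: np.min(u, axis=1)     # Frechet upper bound
print(tau_n(indep, Pi), tau_n(comono, M))   # near 0 and near 1
```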
7. Properties of τ_n and Examples.

1. A lower bound for τ_n is -1/(2^{n-1} - 1) (since ∫_{I^n} C(u) dC(u) ≥ 0).

2. For n = 2, τ_2 reduces to Kendall's τ as presented in Section 2.

3. For n = 3, τ_3 = (1/3)(τ_XY + τ_XZ + τ_YZ), where τ_XY, τ_XZ and τ_YZ denote Kendall's τ for the two r.v.'s displayed as the subscript. This follows from the fact that the family of measures of multivariate concordance studied by Joe (1990) includes both τ_n and the average of the pairwise Kendall's τ's, but has only one member when n = 3.

4. As noted in Section 3, τ_n is a scaled probability of concordance. This follows from the observation that 2 ∫_{I^n} C(u) dC(u) = P(X < Y or X > Y), where X and Y are independent, each with d.f. H.
Example 5. Let X, Y, and Z be r.v.'s on I^3 with a density h(x,y,z) = 4 in the two cubes [0,1/2]^3 and [1/2,1]^3, and 0 elsewhere. Then ρ_3^+ = ρ_3^- = 3/4, while τ_3 = 1/2.
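Example 5 has a closed-form d.f., H(x) = (1/2)∏min(2xi,1) + (1/2)∏max(2xi-1,0), so all three measures can be estimated by simulation (an illustrative check, not from the paper):

```python
import numpy as np

def H(x):
    """d.f. of the two-cube distribution of Example 5 (uniform margins, so H = C)."""
    return 0.5 * np.prod(np.minimum(2 * x, 1), axis=1) \
         + 0.5 * np.prod(np.clip(2 * x - 1, 0, 1), axis=1)

rng = np.random.default_rng(6)
N = 300_000
cube = rng.integers(0, 2, N)[:, None]        # which of the two cubes
x = 0.5 * rng.random((N, 3)) + 0.5 * cube    # uniform point in that cube

rho3_plus = 8 * np.mean(np.prod(x, axis=1)) - 1       # near 3/4
rho3_minus = 8 * np.mean(np.prod(1 - x, axis=1)) - 1  # near 3/4
tau3 = (8 * np.mean(H(x)) - 1) / 3                    # near 1/2
```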
Example 6. Now let X, Y, and Z be r.v.'s on I^3 with a density h(x,y,z) = 2 in the four cubes [1/2,1] × [0,1/2] × [0,1/2], [0,1/2] × [1/2,1] × [0,1/2], [0,1/2] × [0,1/2] × [1/2,1], and [1/2,1]^3, and 0 elsewhere. Now ρ_3^+ = 1/8 and ρ_3^- = -1/8, while τ_3 = 0. Note that in this example X, Y, and Z are pairwise independent.
Example 7. Let X, Y, Z, and W be r.v.'s on I^4 with d.f.

H(x,y,z,w) = xyzw[1 + θ(1-x)(1-y)(1-z)(1-w)]

and density

h(x,y,z,w) = 1 + θ(1-2x)(1-2y)(1-2z)(1-2w),

where |θ| ≤ 1. Any three of these four r.v.'s are mutually independent; however, all four are not unless θ = 0, and

ρ_4^+ = ρ_4^- = 5θ/891 and τ_4 = 2θ/567.
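Example 7's values can be confirmed by midpoint-rule quadrature of the stated density and d.f. (an illustrative check, not from the paper):

```python
import numpy as np

theta = 0.7
m = 24
t = (np.arange(m) + 0.5) / m
X = np.meshgrid(t, t, t, t, indexing="ij")
h = 1 + theta * np.prod([1 - 2 * xi for xi in X], axis=0)    # density on I^4
P = np.prod(X, axis=0)                                        # x*y*z*w
H = P * (1 + theta * np.prod([1 - xi for xi in X], axis=0))   # d.f. (= copula)

rho4 = (5 / 11) * (16 * np.mean(P * h) - 1)   # rho_4^+ = rho_4^-
tau4 = (16 * np.mean(H * h) - 1) / 7          # tau_4
print(rho4, tau4)
```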
References

Joe, H. (1990). Multivariate concordance. J. Multivariate Anal. 35, 12-30.

Karlin, S. and Rinott, Y. (1980). Classes of orderings of measures and related correlation inequalities - I. J. Multivariate Anal. 10, 467-498.

Kruskal, W. H. (1958). Ordinal measures of association. J. Amer. Statist. Assoc. 53, 814-861.

Lehmann, E. L. (1966). Some concepts of dependence. Ann. Math. Statist. 37, 1137-1153.

Nelsen, R. B. (1992). On measures of association as measures of positive dependence. Statist. Probab. Lett. 14, 269-274.

Nelsen, R. B. (1993). Some concepts of bivariate symmetry. Nonparametric Statist. 3, 95-101.

Shaked, M. (1982). A general theory of some positive dependence notions. J. Multivariate Anal. 12, 199-218.

Department of Mathematical Sciences
Lewis and Clark College
Portland, OR 97219-7899
nelsen@lclark.edu