NSAM-929

THE K-COEFFICIENT,

A PEARSON-TYPE SUBSTITUTE FOR THE CONTINGENCY COEFFICIENT

Robert J. Wherry, Jr. and Norman E. Lane

Distribution of this document is unlimited.


Distribution of this document is unlimited.

Research Report

THE K-COEFFICIENT,

A PEARSON-TYPE SUBSTITUTE FOR THE CONTINGENCY COEFFICIENT

Robert J. Wherry, Jr. and Norman E. Lane

Bureau of Medicine and Surgery
Project MR005.13-3003

Subtask 1, Report No. 43

Approved by: Captain Ashton Graybiel, MC, USN, Director of Research

Released by: Captain H. C. Hunley, MC, USN, Commanding Officer

25 May 1965

U. S. NAVAL SCHOOL OF AVIATION MEDICINE
U. S. NAVAL AVIATION MEDICAL CENTER
PENSACOLA, FLORIDA


SUMMARY PAGE

THE PROBLEM

Determining the relationship between two categorical or qualitative variables has previously been possible only through use of C, the contingency coefficient. C has several distinct disadvantages, among them its lack of comparability to the Pearson product-moment correlation measures of relationship. This lack of comparability frequently creates problems in the interpretation of contingency coefficients and the generalization of results.

FINDINGS

A technique is described which provides an extension of the Pearson correlation equation to the case of two categorical variables. Through canonical weighting, maximal scale values are assigned to categories of the qualitative variables. These scale values create an optimally weighted composite for each categorical variable, and the correlation between composites is the "K" coefficient, which avoids many of the disadvantages of C and allows the computation of a product-moment relationship between two categorical variables.


INTRODUCTION

Many problems in psychology must of necessity deal with categorical data. In the past, when an investigator has had two categorical variables and wished to express the relationship between them, his usual recourse was to compute C, the contingency coefficient. It is well documented that C has certain disadvantages. For example, the magnitude of C cannot be interpreted as indicating the same degree of relationship as an ordinary Pearson correlation coefficient. This is due, in part, to built-in upper limits of C. When the number of columns (h) equals the number of rows (g), the upper limit of C = $\sqrt{(g-1)/g}$. The problem of interpreting C is complicated even further when g ≠ h; for such cases the upper limit of C is unknown. Even for a 2 x 2 contingency table the magnitude of C does not equal the magnitude of φ, which is a Pearson equation and which can readily be computed.
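The built-in ceiling on C can be checked numerically. The sketch below assumes the usual definition C = sqrt(chi-square / (chi-square + N)); the function name and the 3 x 3 table of counts are hypothetical and are not taken from the report.

```python
import numpy as np

def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + N))."""
    table = np.asarray(table, dtype=float)
    n_total = table.sum()
    # Expected counts under independence of rows and columns.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n_total
    chi2 = ((table - expected) ** 2 / expected).sum()
    return np.sqrt(chi2 / (chi2 + n_total))

# A perfectly associated 3 x 3 table (hypothetical counts).
perfect = np.diag([10, 10, 10])
c_perfect = contingency_coefficient(perfect)
c_limit = np.sqrt((3 - 1) / 3)   # the stated upper limit for g = h = 3
print(round(c_perfect, 5), round(c_limit, 5))
```

Even with a perfect relationship, C for this table stops at the stated limit rather than reaching 1.0, which is the interpretive difficulty described above.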

PROCEDURE

THE CORRELATION RATIO

In attempting to find a Pearson-type substitute for the contingency coefficient, certain guidelines seemed evident. First, as mentioned above, the technique should, for a 2 x 2 table, degenerate into the four-fold contingency coefficient (φ). Secondly, the technique for a 2 x g table should yield the same result as if we had assigned a value of "1" to row 1, a "0" to row 2, and solved the problem by means of the correlation ratio equation.

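The raw score form of the Pearson product-moment equation is

\[ r_{XY} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} \tag{1} \]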

Wherry, Sr. (1) pointed out in 1944 that if one had categorical data for variable X and interval data for variable Y, and if one assigned the mean Y-score of those individuals in category i as their X-score (these are known as the Wherry weights), i.e., $X_i = \sum^{n_i} Y / n_i$, then the Pearson equation could be shown to be equivalent to the correlation ratio equation. Using the Wherry weights, certain values in Eq. (1) can be rewritten. Thus,

\[ \sum^{N} XY = \sum_{i=1}^{g} \sum^{n_i} (X_i Y) = \sum_{i=1}^{g} \left[ \left( \sum^{n_i} Y / n_i \right) \sum^{n_i} Y \right] = \sum_{i=1}^{g} \left( \sum^{n_i} Y \right)^2 / n_i \tag{2} \]

\[ \sum^{N} X = \sum_{i=1}^{g} n_i X_i = \sum_{i=1}^{g} n_i \left( \sum^{n_i} Y / n_i \right) = \sum^{N} Y \tag{3} \]

and

\[ \sum^{N} X^2 = \sum_{i=1}^{g} n_i X_i^2 = \sum_{i=1}^{g} n_i \left( \sum^{n_i} Y / n_i \right)^2 = \sum_{i=1}^{g} \left( \sum^{n_i} Y \right)^2 / n_i \tag{4} \]

[Note that Eqs. (2) and (4) are equal.]

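Substituting Eqs. (2), (3), and (4) into Eq. (1), we obtain

\[ r_{XY} = \frac{N \sum_{i=1}^{g} \left( \sum^{n_i} Y \right)^2 / n_i - \left( \sum^{N} Y \right)^2}{\sqrt{\left[ N \sum_{i=1}^{g} \left( \sum^{n_i} Y \right)^2 / n_i - \left( \sum^{N} Y \right)^2 \right] \left[ N \sum^{N} Y^2 - \left( \sum^{N} Y \right)^2 \right]}} \tag{5} \]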

But the numerator is the square of the left term in the denominator; therefore,

\[ r_{XY} = \sqrt{\frac{N \sum_{i=1}^{g} \left( \sum^{n_i} Y \right)^2 / n_i - \left( \sum^{N} Y \right)^2}{N \sum^{N} Y^2 - \left( \sum^{N} Y \right)^2}} \tag{6} \]

Equation (6) is, of course, the correlation ratio equation for use when Y is a continuous variable and X is a categorical variable with g categories. The fact that the correlation ratio is derivable from the Pearson equation is unfortunately neglected in most statistical texts.
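The equivalence can be illustrated with a few lines of NumPy. This is a minimal sketch on made-up data: the category labels and Y scores are hypothetical, and the correlation ratio is computed in its familiar between-groups over total sum-of-squares form rather than in the raw-score form of Eq. (6).

```python
import numpy as np

# Hypothetical data: category label (X) and continuous score (Y) per case.
category = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y = np.array([2.0, 3.0, 4.0, 6.0, 8.0, 1.0, 2.0, 2.0, 3.0])

# Wherry weights: each case's X-score is the mean Y of its category.
means = np.array([y[category == k].mean() for k in np.unique(category)])
x = means[category]

# Raw-score Pearson equation, Eq. (1).
n = len(y)
num = n * (x * y).sum() - x.sum() * y.sum()
den = np.sqrt((n * (x ** 2).sum() - x.sum() ** 2) *
              (n * (y ** 2).sum() - y.sum() ** 2))
r_xy = num / den

# Correlation ratio: sqrt(between-category SS / total SS).
ss_between = sum((category == k).sum() * (means[k] - y.mean()) ** 2
                 for k in range(len(means)))
ss_total = ((y - y.mean()) ** 2).sum()
eta = np.sqrt(ss_between / ss_total)

print(round(r_xy, 6), round(eta, 6))
```

The two printed values agree, which is the content of the derivation from Eq. (1) to Eq. (6).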

THE "K" COEFFICIENT

Wherry, Sr., in the article mentioned above, was primarily interested in development of a technique for using qualitative data in multiple regression techniques. An alternative method for using qualitative data in multiple regression studies has recently gained prominence in the literature. This technique is to create a dichotomous variable for each category of X. Such variables have been referred to as "categorical predictor variables" by Bottenberg and Ward (2) and as "pseudo dichotomous variables" by Wherry, Jr. (3). Thus, if an individual is in the first category of X, he receives a "1" on the first created dichotomous variable and a "0" on all other dichotomous variables representing variable X. If he is in the second category of X, he receives a "1" on the second created dichotomous variable and a "0" on all the other created dichotomous variables, et cetera.

If g dichotomous variables are created as stated above and these variables are used to predict variable Y, a continuous variable, the multiple correlation ($R_{Y \cdot X_1 X_2 \ldots X_g}$) will also equal $\eta$, and the raw score beta weights will equal the Wherry weights mentioned above. It is also of interest to point out that the shrunken multiple correlation R will equal Kelley's epsilon ($\epsilon$), the unbiased estimate of $\eta$.
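A small least-squares fit shows the same thing from the regression side. The sketch below uses hypothetical data and fits the g indicator columns without an intercept, which is an assumption made here so that the g weights are uniquely determined.

```python
import numpy as np

# Hypothetical data: three categories of X and a continuous Y.
category = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y = np.array([2.0, 3.0, 4.0, 6.0, 8.0, 1.0, 2.0, 2.0, 3.0])

# One 0/1 indicator column per category (g columns, no intercept).
dummies = (category[:, None] == np.arange(3)[None, :]).astype(float)

# Raw-score regression weights for the indicator variables.
weights, *_ = np.linalg.lstsq(dummies, y, rcond=None)
predicted = dummies @ weights
multiple_r = np.corrcoef(predicted, y)[0, 1]

# Category means of Y, i.e. the Wherry weights.
category_means = np.array([y[category == k].mean() for k in range(3)])
print(weights, category_means, round(multiple_r, 6))
```

The fitted weights coincide with the category means, and the multiple correlation equals the correlation ratio for the same data.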


Thus, we may recognize that there exists a method of assigning numbers not covered in the usual discussions of interval, ordinal, and nominal scaling techniques. This new technique we will refer to as "maximal" scaling, because it is the assignment of the set of numbers to a set of mutually exclusive categories which will maximize the Pearson equation.

If one had two categorical variables, X (with g categories) and Y (with h categories), one could, as stated above, create a set of g dichotomous variables to represent X and a set of h dichotomous variables to represent Y. If we could then simultaneously solve for the optimal ("maximal") weights for the h variables and the optimal weights for the g variables, so as to maximize the Pearson equation, one could have a statement of the magnitude of relationship between two categorical variables which is a Pearson-type of correlation coefficient. Fortunately, once one has computed the interrelationships among the two sets of dichotomous variables, Hotelling's (4) technique of canonical correlation can be used to solve for the maximal weights. The resulting Pearson coefficient will be referred to as the "K-coefficient." The use of the canonical correlation technique for scoring categorical data was first mentioned by Fisher (5) in 1938, and has also been reported elsewhere (6). The discussions of the technique are usually couched in terms of matrix algebra. Because both matrix algebra and canonical analysis are not well known, relatively few persons would be able to compute a K-coefficient even if they had the above references available. In fact, even if one understands the canonical technique, several modifications are necessary for handling two sets of categorical variables. For the above reasons it seems desirable to set forth the necessary steps in obtaining a K-coefficient.
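For readers with a matrix library at hand, the first canonical correlation between two sets of indicator variables can be obtained from a singular value decomposition of the standardized residuals of the contingency table. The sketch below shows that general approach only; it is not the step-by-step desk procedure of this report, it has not been checked against the report's tables, and the function name and example counts are hypothetical.

```python
import numpy as np

def first_canonical_correlation(table):
    """Largest canonical correlation between row and column indicator sets."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    r = p.sum(axis=1)          # row proportions
    c = p.sum(axis=0)          # column proportions
    # Standardized residuals; their singular values are the canonical correlations.
    s = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    u, sing, vt = np.linalg.svd(s, full_matrices=False)
    # Category scores with mean 0 and SD 1 over cases (signs are arbitrary).
    row_scores = u[:, 0] / np.sqrt(r)
    col_scores = vt[0] / np.sqrt(c)
    return sing[0], row_scores, col_scores

# Hypothetical 3 x 3 table of counts.
table = np.array([[20, 5, 2],
                  [6, 15, 4],
                  [3, 7, 18]])
k, zx, zy = first_canonical_correlation(table)
print(round(k, 5))
```

For a 2 x 2 table this quantity reduces to the absolute value of φ, in line with the first guideline stated under THE CORRELATION RATIO.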

AN EXAMPLE

Let us assume we desire to find the K-coefficient for the 7 x 8 contingency table shown in Table I. The contingency table should be arranged so that the number of rows (g) is equal to or less than the number of columns (h).

Step 1. Compute n, N - n, and $\sqrt{n(N - n)}$ for each row and column of the contingency table as shown in Table I.

Step 2. Compute a g by g+h intercorrelation matrix using the equation

\[ r_{ij} = \frac{N c_{ij} - n_i n_j}{\sqrt{n_i (N - n_i)}\,\sqrt{n_j (N - n_j)}} \tag{9} \]

where $c_{ij}$ = the cell entry in Table I if i is an X-variable and j is a Y-variable; otherwise let $c_{ij} = 0$.

Accuracy should be maintained to the sixth or seventh decimal in computing the matrix. It is not necessary to compute the intercorrelations among the Y-variables. The results of step 2 are shown in Table II.
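Steps 1 and 2 can be sketched as a short function. The table of counts below is hypothetical (Table I is not reproduced here), and the function name is an illustration only. One assumption is made explicit in the code: for a variable paired with itself the joint count is taken to be n, so that the diagonal entries come out as 1.0.

```python
import numpy as np

def indicator_correlations(table):
    """Eq. (9): correlations of the g row indicators with all g + h indicators."""
    c = np.asarray(table, dtype=float)
    g, h = c.shape
    n_total = c.sum()
    n = np.concatenate([c.sum(axis=1), c.sum(axis=0)])   # n for X rows, then Y columns
    # Joint counts c_ij: cell entries for X-Y pairs, zero for two different X categories.
    joint = np.zeros((g, g + h))
    joint[:, g:] = c
    joint[np.arange(g), np.arange(g)] = n[:g]   # assumption: a variable with itself
    root = np.sqrt(n * (n_total - n))           # Step 1: sqrt(n (N - n))
    return (n_total * joint - np.outer(n[:g], n)) / np.outer(root[:g], root)

# Hypothetical 3 x 4 table of counts.
table = np.array([[5, 2, 1, 0],
                  [1, 4, 2, 3],
                  [0, 1, 6, 2]])
r = indicator_correlations(table)
print(np.round(r, 6))
```

Each entry is the ordinary Pearson correlation between two 0/1 indicator variables, so the same matrix could be obtained by correlating case-level indicator columns directly.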

[Tables I, II, and III occupied the following two pages of the report; their entries are not legible in this copy.]

Step 3. Perform a diagonal factor analysis (7), extracting not more than g-1 factors. It turns out that a maximum of g-1 factors are needed to explain all the variance of the g X-variables, since the X-variables are mutually exclusive and mutually exhaustive. That is, if a person is found in one row, he cannot be found in any other row, and if a person is included in the analysis, he must be found in one of the rows. If, by chance, the proportions of cases found in each column are identical for two rows, the rows should be combined since, from a predictive standpoint, there is no difference between the two categories.

The results of the diagonal factor analysis are shown in Table III.

Step 4. Using the h loadings of the Y-variables on the g-1 diagonal factors, obtain the interrelationships of the Y-variables in the X-space. To obtain this matrix, use the equation

\[ r_{Y_i Y_j} = \sum_{f=1}^{g-1} \left( a_{f Y_i} \cdot a_{f Y_j} \right) \tag{10} \]

where

$r_{Y_i Y_j}$ is the relationship of the ith Y-variable to the jth Y-variable in the X-space, and

$a_{f Y_i}$ is the diagonal factor loading of the ith Y-variable on the fth diagonal factor.

The results of step 4 are shown in Table IV.

Table IV

The Interrelationships of the Y-Variables in the X-Space

        Y1       Y2       Y3       Y4       Y5       Y6       Y7       Y8

Y1   .04475  -.04189   .06056  -.03410  -.03684  -.01188  -.03146  -.04721
Y2            .11155  -.08051   .10540  -.040?8  -.00779   .02698   .01826
Y3                     .09650  -.07787   .05546  -.01101   .05055  -.05709
Y4                              .11701  -.03471  -.00843   .03500   .00442
Y5                                       .03527  -.00646   .02961  -.04246
Y6                                                .01398  -.00731   .01512
Y7                                                         .04095   .03380
Y8                                                                  .06986


Step 5. Perform a Principal Axis factor analysis (8), extracting only the first Principal Axis. This step obtains the vector through the X-space which will explain a maximum amount of the Y variance explainable with only one vector. The Principal Axis loadings ($a_{p Y_i}$'s) are shown in Table V.

Table V

The Principal Axis Loadings of the Y-Variables ($a_{p Y_i}$'s)

          Y1       Y2       Y3       Y4       Y5       Y6       Y7       Y8

P.A.  -.18127   .29501  -.30391   .28791  -.16924   .01475   .15021   .15049
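With a linear-algebra library, the first Principal Axis of Step 5 is the leading eigenvector of the matrix from Step 4, scaled by the square root of its eigenvalue. The sketch below assumes that standard definition; the 3 x 3 matrix is hypothetical, and the sign convention is an arbitrary choice made here.

```python
import numpy as np

def first_principal_axis(r):
    """Loadings on the first principal axis of a symmetric matrix r."""
    values, vectors = np.linalg.eigh(r)      # eigenvalues in ascending order
    lam, vec = values[-1], vectors[:, -1]
    if vec.sum() < 0:                        # sign is arbitrary; fix a convention
        vec = -vec
    return np.sqrt(lam) * vec, lam

# Hypothetical matrix of Y relationships in the X-space.
r = np.array([[0.09, -0.03, 0.02],
              [-0.03, 0.10, -0.04],
              [0.02, -0.04, 0.06]])
loadings, lam = first_principal_axis(r)
print(np.round(loadings, 5), round(lam, 5))
```

The sum of the squared loadings equals the leading eigenvalue, which is the amount of Y variance the single vector explains.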

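Step 6. Obtain the relationships ($a_{f p}$'s) of the Principal Axis to each of the g-1 diagonal factors by solving a set of g-1 equations of the type

\[
\begin{aligned}
a_{f_1 Y_1} a_{f_1 p} + a_{f_2 Y_1} a_{f_2 p} + \cdots + a_{f_{g-1} Y_1} a_{f_{g-1} p} &= a_{p Y_1} \\
a_{f_1 Y_2} a_{f_1 p} + a_{f_2 Y_2} a_{f_2 p} + \cdots + a_{f_{g-1} Y_2} a_{f_{g-1} p} &= a_{p Y_2} \\
&\;\;\vdots \\
a_{f_1 Y_{g-1}} a_{f_1 p} + a_{f_2 Y_{g-1}} a_{f_2 p} + \cdots + a_{f_{g-1} Y_{g-1}} a_{f_{g-1} p} &= a_{p Y_{g-1}}
\end{aligned}
\tag{11}
\]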

Notice that only g-1 of the available h Y-variables are used in the above simultaneous solution. A consistent solution will be obtained with any g-1 of the Y-variables used. For simplicity the first g-1 set was chosen. Table VI shows the initial values for the simultaneous equations as obtained from Tables III and V, and the relationship of the Principal Axis to the diagonal factors. As a computational check, the sum of the squares of the $a_{f p}$'s should equal 1.0 (within rounding error). This demonstrates that the Principal Axis vector (obtained in Step 5) is wholly within the X-space, and must therefore be completely predictable from the g-1 X-variables.


Table VI

The Beginning Values for the g-1 Simultaneous Equations to be Solved to Obtain the Relationship of the Principal Axis to the Diagonal Factors. The Final Solution is Shown in the Row Labeled $a_{f p}$

                          Coefficients for
         a_f1p    a_f2p    a_f3p    a_f4p    a_f5p    a_f6p       a_pY

       -.07235  -.02135  -.11552   .15839   .026     .02492  =  -.18127
       -.11255   .26?     .11692  -.07960  -.03003   .07423  =   .29501
        .14078  -.08813  -.14393   .18450   .05983  -.10287  =  -.30391
       -.08916   .23753   .11757  -.02682  -.05377   .18762  =   .28791
        .11519  -.02468  -.08230   .11311   .03746  -.02057  =  -.16924
        .03773  -.04738  -.00275  -.09817  -.02135   .01455  =   .01475

a_fp   -.46213   .44628   .47222  -.47346  -.14697   .34421
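Step 6 is an ordinary linear solve. The sketch below runs Steps 5 and 6 on a hypothetical loading matrix (two diagonal factors, four Y-variables) and applies the computational check described above; none of the numbers come from Tables III, V, or VI.

```python
import numpy as np

# Hypothetical loadings of h = 4 Y-variables on g - 1 = 2 diagonal factors.
y_loadings = np.array([[0.30, -0.10, -0.20, 0.05],
                       [-0.20, 0.25, -0.05, 0.10]])

# Step 5 on this toy matrix: first principal axis of the Eq. (10) matrix.
values, vectors = np.linalg.eigh(y_loadings.T @ y_loadings)
axis_loadings = np.sqrt(values[-1]) * vectors[:, -1]     # the a_pY values

# Step 6, Eq. (11): one equation per Y-variable, using only the first g - 1.
k = y_loadings.shape[0]
coeffs = y_loadings[:, :k].T      # row i holds the loadings of Y_i on each factor
a_fp = np.linalg.solve(coeffs, axis_loadings[:k])

# Computational check: the squared a_fp values should sum to 1.0.
check = (a_fp ** 2).sum()
print(np.round(a_fp, 5), round(check, 6))
```

The solution also reproduces the Principal Axis loadings of the Y-variables that were left out of the solve, which is the sense in which any g-1 of them give a consistent answer.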

Step 7. Using the loadings of the Principal Axis on the diagonal factors as the criterion variable, solve the g-1 simultaneous equations necessary to obtain the standard-score beta weights of the g-1 X-variables used as the basis for the diagonal factors. The set of g-1 simultaneous equations for step 7 is of the form

\[
\begin{aligned}
a_{f_1 X_1} \beta_{X_1} + a_{f_1 X_2} \beta_{X_2} + \cdots + a_{f_1 X_{g-1}} \beta_{X_{g-1}} &= a_{f_1 p} \\
a_{f_2 X_2} \beta_{X_2} + \cdots + a_{f_2 X_{g-1}} \beta_{X_{g-1}} &= a_{f_2 p} \\
&\;\;\vdots \\
a_{f_{g-1} X_{g-1}} \beta_{X_{g-1}} &= a_{f_{g-1} p}
\end{aligned}
\tag{12}
\]

The coefficients for the $\beta_X$'s for use in the above equations are the diagonal factor loadings of the first g-1 X-variables as shown in Table III. The $a_{f p}$'s are the solution to the simultaneous equations solved in Table VI. The results of step 7 are shown in Table VII.


Table VII

The Standard Score Beta Coefficients ($\beta_{X_i}$'s) for the g-1 X-Variables Which Perfectly Predict the Vector Through the X-Space Capable of Explaining the Most Variance

   β_X1     β_X2     β_X3     β_X4     β_X5     β_X6

 -.38191   .46194   .37010  -.48376  -.61551   .36691
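Because each diagonal factor involves one fewer X-variable than the last, the Step 7 system is triangular. The sketch below writes it in matrix form, again assuming that the diagonal factor loadings can be represented by a Cholesky factor; the correlation matrix and the axis-to-factor loadings are hypothetical.

```python
import numpy as np

# Hypothetical correlations among g - 1 = 2 X indicators and their Cholesky factor.
r_xx = np.array([[1.0, -0.4],
                 [-0.4, 1.0]])
lower = np.linalg.cholesky(r_xx)      # lower[i, f] = loading of X_i on factor f

a_fp = np.array([0.6, 0.8])           # hypothetical axis-to-factor loadings (unit length)

# Eq. (12) in matrix form: lower.T @ beta = a_fp, an upper-triangular system.
beta = np.linalg.solve(lower.T, a_fp)

# The composite of standardized X's with these weights has unit variance.
composite_variance = beta @ r_xx @ beta
print(np.round(beta, 5), round(composite_variance, 6))
```

The unit variance of the composite follows from the a_fp values having a sum of squares of 1.0, the check made at the end of Step 6.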

Step 8. Obtain the raw-score beta coefficients for the g-1 X-variables upon which the diagonal factors are based. This is accomplished by taking the mean of the Principal Axis vector to be zero and taking its variance to be unity. The raw-score beta for the ith X-variable becomes

\[ b_{X_i} = \beta_{X_i} \cdot \frac{1.00}{\sqrt{n_{X_i}\left(N - n_{X_i}\right)} \,/\, N} \tag{13} \]

Obtain the constant to be added (A) by the equation

\[ A = \bar{P} - \sum_{i=1}^{g-1} b_{X_i} \bar{X}_i = 0.0 - \sum_{i=1}^{g-1} b_{X_i} \frac{n_{X_i}}{N} = -\frac{\sum_{i=1}^{g-1} b_{X_i} n_{X_i}}{N} \tag{14} \]

Table VIII

The Raw Score Beta Coefficients ($b_{X_i}$'s) for the g-1 X-Variables and the Constant to be Added (A)

   b_X1     b_X2     b_X3     b_X4     b_X5     b_X6      A

 -.90867   3.275    .82315  -1.5188  -.16125  2.30387   .06111


Step 9. The values in Table VIII give weights to use in a prediction equation of the form

\[ P = b_{X_1} X_1 + b_{X_2} X_2 + \cdots + b_{X_i} X_i + \cdots + b_{X_{g-1}} X_{g-1} + A \tag{15} \]

If we consider the predicted score of a person in the ith X category, the only X-variable to have a nonzero score will be $X_i$, which will have a value of 1.0. Therefore, the predicted score (i.e., location of the case on the Principal Axis vector) for a person who was in the ith category of the X-variable will be

\[ P_{X_i} = b_{X_1}(0) + b_{X_2}(0) + \cdots + b_{X_i}(1) + \cdots + b_{X_{g-1}}(0) + A, \]

which simplifies to

\[ P_{X_i} = b_{X_i} + A. \tag{16} \]

For a case which was in the gth category of X, the predicted score will be simply $P_{X_g} = A$. These scores represent the locations of the various X categories on a vector through the X-space which must be optimally related to the variances of the Y-variables. The mean of the scores will be zero and the standard deviation will be one. For this reason we refer to these predicted scores ($P_X$'s) as the z-weights (i.e., $z_{X_i} = P_{X_i}$). Table IX shows the computed z-weights based on Eq. (16).

Table IX

The Computed z-Weights for the g X-Variables

   z_X1     z_X2     z_X3     z_X4     z_X5     z_X6     z_X7

 -.84755  3.78582   .88426 -1.45766  -.10014  2.36498   .06111
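Steps 8 and 9 can be sketched as follows. The standard-score weights and category counts are hypothetical, and the function name is an illustration only; the code follows Eqs. (13), (14), and (16) as written above.

```python
import numpy as np

def raw_score_weights(beta, n_x, n_total):
    """Eqs. (13)-(14): raw-score weights b and additive constant A.

    beta: standard-score weights for the first g-1 X categories.
    n_x: number of cases in each of those categories.
    """
    beta = np.asarray(beta, dtype=float)
    n_x = np.asarray(n_x, dtype=float)
    sd_x = np.sqrt(n_x * (n_total - n_x)) / n_total   # SD of a 0/1 indicator
    b = beta * 1.0 / sd_x                             # Eq. (13), criterion SD = 1
    a = 0.0 - (b * n_x).sum() / n_total               # Eq. (14), criterion mean = 0
    return b, a

# Hypothetical inputs: three categories, the last one omitted from the weights.
b, a = raw_score_weights(beta=[0.5, -0.3], n_x=[20, 30], n_total=100)
scores = np.append(b + a, a)            # Eq. (16): P = b + A, and A for category g
counts = np.array([20, 30, 50])
mean_score = (scores * counts).sum() / counts.sum()
print(np.round(scores, 5), round(mean_score, 10))
```

The case-weighted mean of the category scores is zero by construction of A. In Tables VIII and IX the same relation can be seen directly, for example .82315 + .06111 = .88426 for the third category.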

Step 10. Inasmuch as the X categories have been ordered along a single continuum, we may consider X to be an interval-type variable. The optimal set of weights for the Y categories may be obtained by Wherry's (1) multiserial equation, which states

\[ b_{Y_j} = \sum^{n_j} X \,/\, n_j \]

which, for our purposes, may be rewritten as

\[ b_{Y_j} = \frac{\sum_{i=1}^{g} \left( z_{X_i} c_{ij} \right)}{n_j} \tag{17} \]

where

$b_{Y_j}$ = the multiserial weight for the jth Y category,

$z_{X_i}$ = the z-weight for the ith X category,

$c_{ij}$ = the number of cases in the ith X category which were also in the jth Y category, and

$n_j$ = the total number of cases in the jth Y category.

The multiserial weights for the Y categories, as computed by Eq. (17), are shown in Table X.

Table X

The Multiserial Weights for the Y Categories

   b_Y1     b_Y2     b_Y3     b_Y4     b_Y5     b_Y6     b_Y7     b_Y8

 -.86933  1.01338  -.39234  1.76070  -.65546   .03737   .72035   .26810

Step 11. Obtain the K-coefficient by the equation

\[ K_{XY} = \sqrt{\frac{\sum_{j=1}^{h} b_{Y_j}^2 \, n_j}{N}} \tag{18} \]

For the present problem $K_{XY}$ has a value of .56219.

Step 12. This final step is optional but may be accomplished if z-weights are desired for the Y variables instead of multiserial weights. The z-weights, when applied to the entire sample, will yield a mean of zero and a standard deviation of unity. The z-weights for the h Y-variables are obtained by applying the equation

\[ z_{Y_j} = b_{Y_j} / K_{XY} \tag{19} \]

The calculated z-weights for the Y-variables are shown in Table XI.


Table XI

The z-Weights for the h Y-Variables

   z_Y1     z_Y2     z_Y3     z_Y4     z_Y5     z_Y6     z_Y7     z_Y8

-1.54632  1.80256  -.69788  3.13186 -1.16592   .06630  1.28133   .47688
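Eq. (19) can be checked against the reported figures. The sketch below divides six of the multiserial weights from Table X by the reported K-coefficient and compares the results with Table XI; the Y5 and Y6 entries are left out because their digits are not clearly legible in this copy, and the variable names are illustrative.

```python
import numpy as np

k_xy = 0.56219   # reported K-coefficient (Step 11)

# Multiserial weights for Y1-Y4, Y7, Y8 as reported in Table X.
b_y = np.array([-0.86933, 1.01338, -0.39234, 1.76070, 0.72035, 0.26810])

# z-weights for the same categories as reported in Table XI.
z_reported = np.array([-1.54632, 1.80256, -0.69788, 3.13186, 1.28133, 0.47688])

z_y = b_y / k_xy                          # Eq. (19)
max_gap = np.abs(z_y - z_reported).max()
print(np.round(z_y, 5), max_gap)
```

The recomputed values agree with the tabled z-weights to within rounding in the fifth decimal.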

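Proof that $K_{XY}$ is a Pearson Product-Moment Correlation Coefficient

Using the z-weights for the g X-categories and the h Y-categories will yield means of zero and standard deviations of unity for both X and Y. Under such circumstances the Pearson product-moment correlation coefficient may be computed as

\[ r_{XY} = \frac{\sum^{N} z_X z_Y}{N} \]

The above equation may be rewritten as

\[ r_{XY} = \frac{\sum_{j=1}^{h} \sum_{i=1}^{g} z_{X_i} z_{Y_j} c_{ij}}{N} = \frac{\sum_{j=1}^{h} z_{Y_j} \sum_{i=1}^{g} \left( z_{X_i} c_{ij} \right)}{N} \tag{20} \]

However, from Eq. (17) we know

\[ b_{Y_j} n_j = \sum_{i=1}^{g} \left( z_{X_i} c_{ij} \right) \]

therefore, substituting this value into Eq. (20) we obtain

\[ r_{XY} = \frac{\sum_{j=1}^{h} z_{Y_j} b_{Y_j} n_j}{N} \tag{21} \]

Now, substituting Eq. (19) into Eq. (21) we obtain

\[ r_{XY} = \frac{\sum_{j=1}^{h} b_{Y_j}^2 \, n_j}{N \cdot K_{XY}} \tag{22} \]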


From Eq. (18) we see that

\[ K_{XY}^2 = \frac{\sum_{j=1}^{h} b_{Y_j}^2 \, n_j}{N} \]

therefore, substituting this value into Eq. (22) we see

\[ r_{XY} = \frac{K_{XY}^2}{K_{XY}} = K_{XY} \]

which proves the K-coefficient is, in fact, a Pearson-type measure of correlation for categorical data.
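The identity in this proof can be confirmed numerically by expanding a contingency table into one pair of scores per case. The sketch below uses a hypothetical 3 x 3 table and an arbitrary (not optimal) set of standardized X scores, since the algebra of Eqs. (17) through (22) holds for any X scores with mean zero and unit standard deviation.

```python
import numpy as np

# Hypothetical 3 x 3 table of counts.
table = np.array([[20, 5, 2],
                  [6, 15, 4],
                  [3, 7, 18]])
n_total = table.sum()
n_x, n_y = table.sum(axis=1), table.sum(axis=0)

# Any X scores standardized over cases (hypothetical, not optimal weights).
raw = np.array([1.0, 2.0, 4.0])
mean = (raw * n_x).sum() / n_total
sd = np.sqrt(((raw - mean) ** 2 * n_x).sum() / n_total)
z_x = (raw - mean) / sd

b_y = (z_x @ table) / n_y                          # Eq. (17)
k_xy = np.sqrt((b_y ** 2 * n_y).sum() / n_total)   # Eq. (18)
z_y = b_y / k_xy                                   # Eq. (19)

# Expand the table to one (z_x, z_y) pair per case and correlate.
rows, cols = np.indices(table.shape)
case_x = np.repeat(z_x[rows.ravel()], table.ravel())
case_y = np.repeat(z_y[cols.ravel()], table.ravel())
r_xy = np.corrcoef(case_x, case_y)[0, 1]
print(round(k_xy, 6), round(r_xy, 6))
```

The case-level Pearson correlation and the value from Eq. (18) coincide. The value is the maximal one only when the X scores themselves come from the weighting procedure of Steps 1 through 9.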

The data used in the above example were obtained by randomly combining various adjacent columns and various adjacent rows of some "interval" type data on "height of fathers" (X) and "height of sons" (Y) as found on page 117 of McNemar (9). The data were combined as shown in Figure 1.

The z-weights derived by the procedure described in this paper, when applied in subsequent samples, would not be expected to yield a correlation this high. A paper on the expected "shrinkage" of the K-coefficient is being prepared, as well as a paper on testing the significance of the K-coefficient. Until the latter paper is published, users of the K-coefficient may use the $\chi^2$-test associated with the contingency coefficient to decide if obtained K-coefficients differ significantly from zero.

Initial investigations of the distribution of K-coefficients indicate that a K-coefficient may be less than, as well as greater than, the corresponding C-coefficient.

The obtaining of a K-coefficient, especially when the number of rows and columns is large, is obviously a tedious process and one which should be handled by computer rather than desk calculator. The technique has been programmed for the IBM 1620 computer and will handle problems where g + h ≤ 20. In spite of the complexity of the technique, the K-coefficient seems far more acceptable as the proper measure of relationship between categorical variables than the older contingency coefficient.


[Scatterplot: height of fathers (X) in rows, 63 to 75, against height of sons (Y) in columns, 62 to 73, with the combined row groups marked X1 to X7 and the value K_XY = .562 noted. Cell frequencies are not legible in this copy.]

Figure 1. The Original Scatterplot of the Data from which Table I was Obtained


REFERENCES

1. Wherry, R.J., Sr., Maximal weighting of qualitative data. Psychometrika, 9:263-266, 1944.

2. Bottenberg, R.A., and Ward, J.H., Jr., Applied multiple linear regression. Technical Documentary Report PRL-TDR-63-6. Lackland AFB, Texas: Personnel Research Laboratory, Aerospace Medical Division, AF Systems Command, 1963.

3. Wherry, R.J., Jr., Toward an optimal method of equating subgroups composed of different subjects. Monograph Series, No. 9. Pensacola, Fla.: Naval School of Aviation Medicine, 1964.

4. Hotelling, H., The most predictable criterion. J. educ. Psych., 26:139-142, 1935.

5. Fisher, R.A., Statistical Methods for Research Workers. 7th ed., Edinburgh: Oliver and Boyd, 1938.

6. McKeon, J.J., Canonical analysis: Some relations between canonical correlation, factor analysis, discriminant function analysis, and scaling theory. Public Health Service Training Grant 2M-6961 and Nonr 1834(39). Urbana, Ill.: University of Illinois, Department of Psychology, July, 1962.

7. Fruchter, B., Introduction to Factor Analysis. Princeton, N.J.: D. Van Nostrand Co., Inc., 1954.

8. Hotelling, H., Analysis of a complex of statistical variables into principal components. J. educ. Psych., 24:417-441 and 498-520, 1933.

9. McNemar, Q., Psychological Statistics. 3rd ed. New York: John Wiley and Sons, Inc., 1962.


Unclassified
Security Classification

DOCUMENT CONTROL DATA - R&D

1. ORIGINATING ACTIVITY (Corporate author): U.S. Naval School of Aviation Medicine, Pensacola, Florida

3. REPORT TITLE: THE K-COEFFICIENT, A PEARSON-TYPE SUBSTITUTE FOR THE CONTINGENCY COEFFICIENT.

5. AUTHOR(S): Wherry, Robert J., Jr., and Lane, Norman E.

6. REPORT DATE: 25 May 1965

7a. TOTAL NO. OF PAGES: 15

7b. NO. OF REFS: 9

8b. PROJECT NO.: MR005.13-3003, Subtask 1

9a. ORIGINATOR'S REPORT NUMBER(S): NSAM-929

9b. OTHER REPORT NO(S): 43

10. AVAILABILITY/LIMITATION NOTICES: Qualified requesters may obtain copies of this report from DDC. Available, for sale to the public, from the Clearinghouse for Federal Scientific and Technical Information, Springfield, Virginia, 22151.

13. ABSTRACT

The Pearson product-moment correlation coefficient (r) is recognized as the basic equation for relationship. Well-known and widely used derivatives of the Pearson equation include the point-biserial correlation ($r_{pt.bis.}$), the Spearman rank-difference correlation (ρ), the fourfold point correlation (φ), and the correlation ratio (η).

Expression of relationships between categorical variables has previously been possible only by means of the contingency coefficient. This paper describes an extension of Pearson's basic equation to provide a measure of relationship between two categorical variables, the "K" coefficient, which avoids many of the disadvantages inherent in the contingency coefficient.

14. KEY WORDS

Correlation
Correlation analysis
Canonical correlation
Factor analysis
Statistics
Statistical analysis
Qualitative data
