Lecture 5 Chi-Squares II (other categorical measures of association)


Page 1: Lecture 5 Chi-Squares II (other categorical measures of association)

Lecture 5

Chi-Squares II (other categorical measures of association)

Page 2: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• One problem with the hypothesis testing framework that we’ll discuss later is the fact that any observed difference has the potential to be statistically significant, provided the sample size is large enough.

• Hence, a hypothesis test can be viewed as little more than a test of whether your sample size is large enough to detect the true difference between two populations.

• A more informative way of describing observed differences relies on effect size indices (statistics that attempt to depict differences in a metric that provides substantive meaning to the observed difference).

• In the context of chi-square tests, the appropriate effect size indices are measures of association (statistics that depict the magnitude of the relationship between the two variables in the table).

Page 3: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• For example, consider the two tables below. Both have comparable chi-square and p-values, but most people would say that the one on the left shows evidence of a stronger relationship than the one on the right (particularly given the expected values shown in parentheses). Measures of association for these tables illustrate this difference.

Left table: χ² = 4.94, p = .03

  30 (24.5)   19 (24.5)
  19 (24.5)   30 (24.5)

Right table: χ² = 5.14, p = .02

  300 (281)   262 (281)
  262 (281)   300 (281)

Page 4: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• There are five relevant measures of association for nominal (categorical) data. The first three are interpreted similarly; the odds ratio and risk ratio have different interpretations.
  • Contingency coefficient
  • Phi coefficient
  • Cramer's phi coefficient
  • Odds ratio
  • Risk ratio

• The contingency coefficient cannot reach a maximum value of 1, so its interpretation is somewhat difficult.

Page 5: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• The phi coefficient and Cramer’s phi coefficient have a range from 0 to 1 with 0 indicating no association and 1 indicating a perfect relationship between the two variables in the contingency table.

• As a rule, values less than .2 indicate a negligible relationship, values from .2 up to .5 indicate an important relationship, and values from .5 up to 1 indicate a very strong relationship.

• The phi coefficient only applies to 2 x 2 tables, and Cramer's phi (aka Cramer's V) applies to any two-way table. As you can see from the equations below, the three indices are similar.
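The equations themselves appear to have been in a figure that did not survive the transcript. The standard forms, reconstructed here to match the descriptions above, are:

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}, \qquad \phi = \sqrt{\frac{\chi^2}{N}}, \qquad \phi_{\text{Cramer}} = V = \sqrt{\frac{\chi^2}{N(k - 1)}},$$

where $N$ is the total sample size and $k$ is the smaller of the number of rows and number of columns in the table.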

Page 6: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• Going back to our original example, let’s apply what we now know…

  300 (281)   262 (281)
  262 (281)   300 (281)

χ² = 5.14, p = .02
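The slide's own calculation was not captured in the transcript; applying the phi coefficient formula given earlier to this table yields:

$$\phi = \sqrt{\frac{\chi^2}{N}} = \sqrt{\frac{5.14}{1124}} \approx .07$$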

Page 7: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• Going back to our original example, let’s apply what we now know…

  30 (24.5)   19 (24.5)
  19 (24.5)   30 (24.5)

χ² = 4.94, p = .03
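Again, the slide's calculation is not in the transcript; using the same formula:

$$\phi = \sqrt{\frac{\chi^2}{N}} = \sqrt{\frac{4.94}{98}} \approx .22$$

So, despite the nearly identical chi-square values, the smaller table shows the stronger association, as claimed earlier.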

Page 8: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• These correlation-based statistics (we’ll learn more about correlations in Chapter 9) introduce a new way of thinking about the null hypothesis of the Pearson chi-square test of association.

• Recall that our null hypothesis is that there is no relationship between the two variables depicted by the table and that we represent this symbolically as

• H0: π_ij = π_(i+1),j for all i and j.

• That is, the proportion of observations in one row equals the proportion of observations in another row for each column of the table.

Page 9: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• Recall that this is a test of association and that Cramer’s V is a measure of association. Also, note that the null hypothesis implies that there is no relationship between the two variables in the table (i.e., that the proportion of observations in an individual cell is dictated by the marginal frequencies for the two variables).

• Hence, we can restate the null hypothesis for the Pearson chi-square test of association as

• H0: φ_Cramer = 0 (i.e., Cramer’s V = 0)

• That is, there is no relationship between the two variables in the table, which is equivalent to saying that the proportion of observations in one row equals the proportion of observations in another row for each column of the table.

Page 10: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• The odds ratio (OR) is a little more difficult to understand, but it also has a straightforward interpretation. Note that the odds of an event are represented as a fraction, e.g., 2/1, sometimes written 2:1 or 2 to 1.

• The odds represent the likelihood of one event relative to the converse of that event.

• For example, you could describe the odds of rolling a 1, 2, 3, or 4 on a six-sided die, rather than something other than a 1 through 4 (i.e., a 5 or a 6), as 4/2, which simplifies to 2/1 or simply 2.

• This means that an outcome of 1 through 4 is twice as likely as an outcome of 5 or 6. Hence, the odds of a 5 or 6 rather than its converse are 2/4, or 1/2, or simply 0.5: a 5 or 6 is half as likely as a 1 through 4.

• Note that in this example, the odds are a simplification of a ratio of probabilities. The probability of a 1 through 4 is 4/6 and the probability of a 5 or 6 is 2/6, so the odds of a 1 through 4 are:

odds = (4/6) / (2/6) = 4/2 = 2

Page 11: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• An odds ratio, on the other hand, is a ratio of odds compared between two groups.

• Let’s compare the odds of a fair die resulting in a 1 through 4 outcome to the odds of a “loaded” die resulting in a 1 through 4 outcome as the ratio of the odds for each event.

• For the fair die, the odds would be 2/1 or 2 (as stated on the previous slide).

• For the loaded die, the odds might be 4/1; loading the die has doubled the odds of seeing a 1 through 4.

• Hence, the odds ratio between an outcome of 1 through 4 for a fair versus a loaded die would be (2/1)/(4/1)=2/4, which equals 0.50. That means that the odds of a fair die showing a 1 through 4 is only 50% as large as the odds of a loaded die showing a 1 through 4.

Page 12: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• Alternatively, you can reverse the comparison by inverting the original odds ratio: 1 / .50 = 2, which is the odds ratio for an outcome of 1-4 on a loaded die versus a fair die. We can confirm this by constructing the odds ratio from the actual odds of each event:

• (4/1)/(2/1) = 4/2 = 2. Hence, the odds of a 1-4 on a loaded die are 2 times as large as the odds of a 1-4 on a fair die.
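A brief sketch (not part of the lecture) of the probability-to-odds conversion and the odds ratio for this die example; the function name is just illustrative:

```python
# Convert a probability into odds in favor of the event, then form the
# odds ratio for the fair vs. hypothetical "loaded" die described above.

def odds(p: float) -> float:
    return p / (1 - p)

p_fair = 4 / 6      # P(roll 1-4) on a fair die
p_loaded = 4 / 5    # P(roll 1-4) implied by odds of 4 (p / (1 - p) = 4)

odds_ratio = odds(p_fair) / odds(p_loaded)   # 2 / 4 = 0.50
print(odds_ratio)
```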

Example in our text…

Page 13: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

Table 6.4: The effect of aspirin on the incidence of heart attacks

Odds of heart attack given that participants did not take aspirin:
  Odds_NoAspirin = 189/10845 = 0.0174

Odds of heart attack given that participants did take aspirin:
  Odds_Aspirin = 104/10933 = 0.0095

OR = Odds_NoAspirin / Odds_Aspirin = 0.0174/0.0095 = 1.83

• Thus, the odds of having a heart attack given you didn’t take aspirin are 1.83 times as great as the odds of having a heart attack with aspirin.

Outcome     Heart Attack   No Heart Attack   Total
Aspirin              104             10933   11037
Placebo              189             10845   11034
Total                293             21778   22071

Page 14: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• An alternative calculation is simply dividing the cross products. Again, we want to divide the odds for the no-treatment (control) group by the odds for the treatment (experimental) group.

• Example in our text: Table 6.4, the effect of aspirin on the incidence of heart attacks.

• AD/BC and BC/AD will yield different ORs and different interpretations, so keep track of which cells form the numerator.

Numerator cross product (no-aspirin heart attacks × aspirin non-attacks): 189 × 10933 = 2,066,337
Denominator cross product (aspirin heart attacks × no-aspirin non-attacks): 104 × 10845 = 1,127,880
OR = 2,066,337 / 1,127,880 = 1.83

Outcome     Heart Attack   No Heart Attack   Total
Aspirin              104             10933   11037
Placebo              189             10845   11034
Total                293             21778   22071
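A short sketch (not part of the slides) showing that the two calculations give the same odds ratio for Table 6.4; the cell counts are taken from the table above:

```python
# Odds ratio for Table 6.4, computed two equivalent ways.

a, b = 104, 10933    # aspirin row: heart attack, no heart attack
c, d = 189, 10845    # placebo row: heart attack, no heart attack

or_from_odds = (c / d) / (a / b)              # ratio of the two groups' odds, ≈ 1.83
or_from_cross_products = (c * b) / (a * d)    # cross-product form, ≈ 1.83

print(round(or_from_odds, 2), round(or_from_cross_products, 2))
```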

Page 15: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• Another commonly seen measure of association is relative risk (RR).

• The relative risk is a measure of the relative size of the probabilities of two events: p1 / p2. We know that the probability of a 1 through 4 on a fair die is 4/6 (or 2/3 = .67). From the odds ratio, for the loaded die, we can see that the probability of a 1 through 4 is 4/5 (p/(1 - p) = 4, so p = .80). Hence, the relative risk of a 1 through 4 on a fair versus a loaded die is (2/3)/(4/5), or .83. That is, the likelihood of a 1 through 4 on a fair die is 83% of the likelihood of a 1 through 4 on a loaded die. This is different from the odds ratio for these events, which equals .50.

Page 16: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• Back to the example on the previous slide:

• Risk of heart attack given that participants did not take aspirin:
  Risk_NoAspirin = 189/11034 = 0.0171

  Risk of heart attack given that participants did take aspirin:
  Risk_Aspirin = 104/11037 = 0.0094

  Risk Difference = .0171 - .0094 = .0077
  RR = Risk_NoAspirin / Risk_Aspirin = 0.0171/0.0094 = 1.82

• Therefore, the risk of having a heart attack given you did not take aspirin is 1.82 times as high as if you had taken aspirin.

• Note: The odds ratio is only relevant for 2 x 2 tables
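A minimal sketch (not from the lecture) of the risk calculations above:

```python
# Risk, risk difference, and relative risk for Table 6.4, using the row totals.

risk_no_aspirin = 189 / 11034     # ≈ .0171
risk_aspirin = 104 / 11037        # ≈ .0094

risk_difference = risk_no_aspirin - risk_aspirin      # ≈ .0077
relative_risk = risk_no_aspirin / risk_aspirin        # ≈ 1.82

print(round(risk_difference, 4), round(relative_risk, 2))
```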

Page 17: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• Some quick notes on risk and odds
  – Risk is intuitive but limited
    • It is future oriented and inapplicable in retrospective studies
  – Odds are less intuitive
    • But odds are applicable in both retrospective and prospective studies
    • You can make odds more intuitive with some simple transformations

Page 18: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• Example

• The odds of having a heart attack given you took aspirin are .54 times the odds of having a heart attack given you were in the placebo group

• The probability of having a heart attack given you were in the aspirin group is OR/(1+OR) = .54/1.54=.35

• The probability of having a heart attack given you were in the placebo group is 1.83/2.83 = .65

• .65+.35 = 1

Outcome     Heart Attack   No Heart Attack   Total
Aspirin              104             10933   11037
Placebo              189             10845   11034
Total                293             21778   22071
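A small sketch (not from the slides) of the odds-to-probability transformation used above; it simply reproduces the slide's arithmetic:

```python
# Transform an odds (or odds ratio) onto a 0-1 probability scale.

def odds_to_prob(odds: float) -> float:
    return odds / (1 + odds)

or_aspirin_vs_placebo = 1 / 1.83                        # ≈ .54
print(round(odds_to_prob(or_aspirin_vs_placebo), 2))    # ≈ .35
print(round(odds_to_prob(1.83), 2))                     # ≈ .65
```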

Page 19: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• A quick reminder…

• All of the tests that we present in this course will place certain requirements, expectations, or assumptions on the data in order for the test interpretation to be valid. For the chi-square test, the assumptions are:

• Independence: We assume that observations are independent of one another. That is, the value of any one observation does not depend on, and is not influenced by, the value of other observations in the dataset. Don’t confuse this with the test of independence, which focuses on independence between variables (not observations).

• One way to ensure independence among observations is to verify that the categories constitute mutually exclusive codes (an individual cannot be a member of multiple categories).

• Another way to ensure independence among observations is to use simple random sampling from the population. A third way to ensure independence is to evaluate your research design to determine whether there are opportunities for participants to interact or to form group clusters.

Page 20: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• Normality: Recall that the chi-square distribution can be formed by summing squared observations from a standard normal curve (z-scores from a normal distribution). This suggests that the chi-square distribution relies on a normality assumption in some way.

– Look at the configurations below. If you fix the margins as indicated, there are several ways of allocating individuals to cells that allow you to maintain these marginal frequencies.

Fixed margins: row totals of 15 and 15; column totals of 10, 10, and 10. Possible cell configurations (row 1 / row 2):

  5  5  5  /  5  5  5
  4  6  5  /  6  4  5
  3  5  7  /  7  5  3
  6  4  5  /  4  6  5
  3  7  5  /  7  3  5
  6  3  6  /  4  7  4
  4  5  6  /  6  5  4
  4  4  7  /  6  6  3
  3  4  8  /  7  6  2
  6  2  7  /  4  8  3
  3  3  9  /  7  7  1
  6  1  8  /  4  9  2
  4  3  8  /  6  7  2
  3  2 10  /  7  8  0
  6  0  9  /  4 10  1

Page 21: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

– In fact, the possible values for any single cell in the table are approximately normally distributed, given that the sample size is large enough and the probability of an observation falling in that cell is not extreme.

– Also, recall that the expected cell frequencies for the chi-square test are defined as Np (total sample size times the probability of being in that cell). Hence, the requirement of normality can be satisfied if the expected cell frequencies are of sufficient size. A rule of thumb is that all of the expected cell frequencies should be 5 or greater.

Page 22: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• Sensitivity is the probability of a positive result on some (predictive) measure, given that the outcome actually occurs.

• Specificity is the converse: the probability of a negative result on the predictive measure, given that the outcome does not occur (i.e., the case does not meet criteria for the outcome).

            Screen Pos   Screen Neg   Total
Cancer               8            2      10
No Cancer          110          880     990
Total              118          882    1000

Page 23: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• These data are similar to mammography data predicting the presence or absence of breast cancer (not real data)

• We need to consider the conditional and marginal distributions to arrive at the sensitivity and specificity

            Screen Pos   Screen Neg   Total
Cancer               8            2      10
No Cancer          110          880     990
Total              118          882    1000

Page 24: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• sensitivity = 8/10 = .80 (the probability of screening positive in the diagnostic cancer population)

• specificity = 880/990 = .89 (the probability of screening negative in the non-diagnostic cancer population)

            Screen Pos   Screen Neg   Total
Cancer               8            2      10
No Cancer          110          880     990
Total              118          882    1000

Page 25: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• Going one step further…

• What if we wanted to use all of this information to answer the question, “What is the probability of having cancer, given you screened positive for cancer?”
  – Guesses?

• We can answer this with Bayes’ theorem

Page 26: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

P(C) = (8 + 2)/1000 = .01; P(C′) = .99
P(+) = (8 + 110)/1000 = .118; sensitivity = .80

P(C | +) = sensitivity × P(C) / P(+) = (.80)(.01)/.118 ≈ .07

or, reading it directly from the table, 8/118 ≈ .07

            Screen Pos   Screen Neg   Total
Cancer               8            2      10
No Cancer          110          880     990
Total              118          882    1000

Thoughts? Is this what you expected?
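A minimal sketch (not part of the lecture) that recomputes the sensitivity, specificity, and P(cancer | positive screen) directly from the table above:

```python
# Sensitivity, specificity, and Bayes' theorem for the screening table.

cancer_pos, cancer_neg = 8, 2            # cancer row
nocancer_pos, nocancer_neg = 110, 880    # no-cancer row
n = cancer_pos + cancer_neg + nocancer_pos + nocancer_neg   # 1000

sensitivity = cancer_pos / (cancer_pos + cancer_neg)           # 8/10 = .80
specificity = nocancer_neg / (nocancer_pos + nocancer_neg)     # 880/990 ≈ .89

# P(C | +) = P(+ | C) * P(C) / P(+)
p_cancer = (cancer_pos + cancer_neg) / n        # .01
p_positive = (cancer_pos + nocancer_pos) / n    # .118
p_cancer_given_positive = sensitivity * p_cancer / p_positive  # ≈ .07

print(sensitivity, round(specificity, 2), round(p_cancer_given_positive, 2))
```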

Page 27: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

– When this requirement is not met, you can use an exact statistic to perform the hypothesis test. The exact statistic is based on the empirical probability of observing a certain configuration of cell frequencies with fixed marginal frequencies. On an earlier slide, several such configurations were shown. To perform an exact test, you would rank order the tables based on the value of one of the cells, determine the probability of observing a value in that cell equal to or less than the observed value, and declare that probability as the p-value for your hypothesis test. For this class, you don’t need to know how to do an exact test, but you do need to know that it is an alternative when expected cell frequencies are small.
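The lecture does not name a specific exact test; for a 2 x 2 table, one common choice is Fisher's exact test, sketched here with SciPy (the aspirin counts are reused purely for illustration):

```python
# Fisher's exact test for a 2 x 2 table.

from scipy.stats import fisher_exact

table = [[104, 10933],   # aspirin: heart attack, no heart attack
         [189, 10845]]   # placebo: heart attack, no heart attack

odds_ratio, p_value = fisher_exact(table)
print(round(odds_ratio, 2), p_value)
```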

• Inclusion of Nonoccurrences: Another requirement of the chi-square test is that all cases in the data set be included in the contingency table. That is, the coding system must be exhaustive—it must represent all elements of the sample.

Page 28: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• A slightly different index, a measure of agreement rather than association, is coefficient kappa (κ—aka Cohen’s kappa). This index is referred to as a measure of agreement rather than a measure of association because it goes beyond merely indicating whether there is a relationship between two variables—kappa actually indicates the degree to which the categorizations of the two variables are identical.

• Coefficient kappa is commonly used to depict the level of agreement between two raters.

Page 29: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• For example, consider the frequency table below. A measure of association gives an overly optimistic picture of the level of agreement between two raters: Cramer’s V for this table equals 0.54, indicating fairly strong association. However, the raters only agree in 12 out of 36 cases.

rater1 × rater2 frequencies:

           rater2
rater1      0     1     2   Total
0           1    10     1      12
1           1     1    10      12
2           1     1    10      12
Total       3    12    21      36

Page 30: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• One measure of agreement for the table below would be to sum the relative cell frequencies in the cells where the two raters agree (i.e., 0,0; 1,1; and 2,2). For this table, the percentage of agreement between the raters would be 33% (12/36), which is not that great.

rater1 × rater2 frequencies:

           rater2
rater1      0     1     2   Total
0           1    10     1      12
1           1     1    10      12
2           1     1    10      12
Total       3    12    21      36

Page 31: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• However, such an index is misleading because it ignores the fact that raters may agree by chance. Cohen’s kappa corrects for this problem by depicting the proportion of agreement attained beyond that attainable by chance. As shown below, kappa gives us the proportion of agreement that was attained once the proportion attainable by chance is removed from the actual proportion of agreement.

[Figure: a 0.00 to 1.00 scale partitioned into the proportion of agreement attainable by chance and the proportion attainable beyond chance, with the actual level of agreement marked.]

Page 32: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• The computational formula for kappa (shown below) demonstrates this relationship. In this formula, the subscript D indicates the diagonal cells of the frequency table (the cells in which the raters agree).

• The first element of the numerator is the sum of the observed agreements. The second element of the numerator indicates that you subtract the sum of the expected agreements from this (where, as in the chi-square test, an expected value is the product of the corresponding marginal frequencies divided by N). Hence, the numerator gives you the number of observations in agreement beyond those expected by chance.

• The denominator takes the total number of observations and subtracts the number of expected agreements giving us the number of observations beyond those expected to agree given the marginal frequencies. Hence, the numerator divided by the denominator (kappa) gives us the proportion of observations in agreement beyond those expected by chance.
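The formula itself was in a figure that is not preserved in this transcript. The standard computational form, consistent with the description above, is:

$$\kappa = \frac{\sum f_{O_D} - \sum f_{E_D}}{N - \sum f_{E_D}}$$

where $\sum f_{O_D}$ is the sum of the observed diagonal (agreement) frequencies, $\sum f_{E_D}$ is the sum of the expected diagonal frequencies, and $N$ is the total number of observations.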

Page 33: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• Recall that in the table below, Cramer’s V equals 0.54 and the observed level of agreement equals 0.33 (12 out of 36 cases). Cohen’s kappa for this table equals 0.00 indicating that the observed level of agreement (0.33) is no better than that expected by chance alone.

rater1 × rater2 frequencies:

           rater2
rater1      0     1     2   Total
0           1    10     1      12
1           1     1    10      12
2           1     1    10      12
Total       3    12    21      36
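A short sketch (not part of the slides) that computes kappa for this table and confirms the value of 0.00:

```python
# Cohen's kappa from the rater1 x rater2 frequency table above.

counts = [[1, 10, 1],
          [1, 1, 10],
          [1, 1, 10]]

n = sum(sum(row) for row in counts)               # 36
row_totals = [sum(row) for row in counts]         # [12, 12, 12]
col_totals = [sum(col) for col in zip(*counts)]   # [3, 12, 21]

observed_agree = sum(counts[i][i] for i in range(3))                        # 12
expected_agree = sum(row_totals[i] * col_totals[i] / n for i in range(3))   # 1 + 4 + 7 = 12

kappa = (observed_agree - expected_agree) / (n - expected_agree)            # 0.0
print(kappa)
```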

Page 34: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• 6.29 Dabbs and Morris (1990) examined archival data from military records to study the relationship between high testosterone levels and antisocial behavior in males. Of 4016 men in the Normal Testosterone group, 10.0% had a record of adult delinquency. Of 446 men in the High Testosterone group, 22.6% had a record of adult delinquency. Is this relationship significant?

• 6.30 What’s the odds ratio? How would you interpret it?

Page 35: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• According to the description, the data for this study look like:

• The critical value for this test is χ²(1) = 3.84 at the .05 level.

                  Testosterone
Delinquency     High   Normal   Total
No               345     3614    3959
Yes              101      402     503
Total            446     4016    4462

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(345 - 395.723)^2}{395.723} + \frac{(3614 - 3563.277)^2}{3563.277} + \frac{(101 - 50.277)^2}{50.277} + \frac{(402 - 452.723)^2}{452.723} = 64.08$$

Since 64.08 far exceeds the critical value of 3.84, the relationship between testosterone level and adult delinquency is statistically significant.

Page 36: Lecture 5 Chi-Squares II (other categorical measures of association)

Measures of Association for Categorical Data

• According to the description, the data for this study look like:

The odds of adult delinquency for the high testosterone group are:
  Odds_high = 101/345 = 0.2928

The odds of adult delinquency for the normal testosterone group are:
  Odds_normal = 402/3614 = 0.1112

And the odds ratio: OR = .2928/.1112 = 2.63

• The odds of engaging in adult delinquent behavior are 2.63 times as high for the high testosterone group as for the normal testosterone group.

                  Testosterone
Delinquency     High   Normal   Total
No               345     3614    3959
Yes              101      402     503
Total            446     4016    4462
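A closing sketch (not part of the exercise solution) that reproduces the chi-square and odds ratio with SciPy; correction=False matches the uncorrected hand calculation above:

```python
# Chi-square test and odds ratio for the testosterone/delinquency table.

from scipy.stats import chi2_contingency

table = [[345, 3614],   # no delinquency: high, normal testosterone
         [101, 402]]    # delinquency:    high, normal testosterone

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(round(chi2, 2), p)                 # ≈ 64.08, p far below .05

odds_ratio = (101 / 345) / (402 / 3614)  # ≈ 2.63
print(round(odds_ratio, 2))
```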