Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Upload: kate-hargrove

Post on 30-Mar-2015


Page 1: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Testing for Marginal Independence Between Two Categorical Variables with Multiple

Responses

Robert Jeutong

Page 2: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Outline

• Introduction
– Kansas Farmer Data
– Notation

• Modified Pearson Based Statistic
– Nonparametric Bootstrap
– Bootstrap p-Value Methods

• Simulation Study

• Conclusion

Page 3: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Introduction

• Multiple-response categorical variables are also called “pick any” (or pick any/c) variables

• Survey data arising from multiple-response categorical variable questions present a unique challenge for analysis because of the dependence among responses provided by individual subjects.

• Testing for independence between two categorical variables is often of interest

• When at least one of the categorical variables can have multiple responses, traditional Pearson chi-square tests for independence should not be used because of the within-subject dependence among responses

Page 4: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Intro cont’d

• A special kind of independence, called marginal independence, becomes of interest in the presence of multiple response categorical variables

• The purpose of this article is to develop new approaches to the testing of marginal independence between two multiple-response categorical variables

• Agresti and Liu (1999) call this a test for simultaneous pairwise marginal independence (SPMI)

• The proposed tests are extensions to the traditional Pearson chi-square tests for independence testing between single-response categorical variables

Page 5: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Kansas Farmer Data

• The data come from Loughin (1998) and Agresti and Liu (1999)

• The survey was conducted by the Department of Animal Sciences at Kansas State University

• Two questions in the survey asked Kansas farmers about their sources of veterinary information and their swine waste storage methods

• Farmers were permitted to select as many responses as applied from a list of items

Page 6: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong
Page 7: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Data cont’d

• Interest lies in determining whether sources of veterinary information are independent of waste storage methods, much as a traditional Pearson chi-square test would for a contingency table of single-response categorical variables

• A test for SPMI can be performed to determine whether each source of veterinary information is simultaneously independent of each swine waste storage method

Page 8: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Data cont’d

• 4 × 5 = 20 different 2 × 2 tables can be formed to marginally summarize all possible responses to item pairs

• Independence is tested in each of the 20 2 × 2 tables simultaneously for a test of SPMI

                   Professional consultant
                        1        0
Lagoon        1        34      109
              0        10      126

Page 9: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Data cont’d

• The test is marginal because responses are summed over the other item choices for each of the multiple-response categorical variables

• If SPMI is rejected, examination of the individual 2 × 2 tables can follow to determine why the rejection occurs

Page 10: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Notation

• Let W and Y denote the multiple-response categorical variables for the rows and columns, respectively, of an r × c table

• Sources of veterinary information are denoted by Y and waste storage methods are denoted by W

• The categories for each multiple-response categorical variable are called items (Agresti and Liu, 1999); for example, lagoon is one of the items for waste storage method

• Suppose W has r items and Y has c items. Also, suppose n subjects are sampled at random

Page 11: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Notation cont’d

• Let $W_{si} = 1$ if subject $s$ gives a positive response for item $i$, and $W_{si} = 0$ for a negative response, for $i = 1, \ldots, r$ and $s = 1, \ldots, n$.

• Let $Y_{sj}$ for $j = 1, \ldots, c$ and $s = 1, \ldots, n$ be similarly defined.

• The abbreviated notation $W_i$ and $Y_j$ refers generally to the binary response random variables for items $i$ and $j$, respectively

• The sets of correlated binary item responses for subject $s$ are $\mathbf{Y}_s = (Y_{s1}, Y_{s2}, \ldots, Y_{sc})$ and $\mathbf{W}_s = (W_{s1}, W_{s2}, \ldots, W_{sr})$

Page 12: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Notation cont’d

• Cell counts in the joint table are denoted by $n_{gh}$ for the $g$th possible value of $(W_1, \ldots, W_r)$ and the $h$th possible value of $(Y_1, \ldots, Y_c)$

• The corresponding probability is denoted by $\tau_{gh}$. Multinomial sampling is assumed to occur within the entire joint table; thus, $\sum_{g,h} \tau_{gh} = 1$

• Let $m_{ij}$ denote the number of observed positive responses to both $W_i$ and $Y_j$

• The marginal probability of a positive response to both $W_i$ and $Y_j$ is denoted by $\pi_{ij}$, and its maximum likelihood estimate (MLE) is $m_{ij}/n$.
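As a minimal sketch of this notation, the item responses can be stored as two 0/1 matrices with one row per subject, so that all pairwise counts of joint positive responses come from a single matrix product. The toy data below are hypothetical (6 subjects, 2 items for W, 3 items for Y) and are not the farmer data.

```python
import numpy as np

# Hypothetical toy data: n = 6 subjects, r = 2 items for W, c = 3 items for Y.
# Row s holds the binary item responses of subject s.
W = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [1, 0],
              [0, 0],
              [1, 1]])
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 0, 0],
              [1, 0, 1]])

n = W.shape[0]

# m[i, j] counts the subjects with a positive response to both W item i and Y item j.
m = W.T @ Y

# Estimated pairwise probabilities: m_ij / n.
pi_hat = m / n

print(m)
print(pi_hat)
```

The r × c array `m` is the marginal table of positive responses; a single subject can contribute to several of its cells, which is why an ordinary chi-square test on it is not appropriate.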

Page 13: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Joint Table

Page 14: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

SPMI Defined in Hypothesis

• $H_0$: $\pi_{ij} = \pi_{i\bullet}\pi_{\bullet j}$ for $i = 1, \ldots, r$ and $j = 1, \ldots, c$

• $H_a$: At least one equality does not hold

• where $\pi_{ij} = P(W_i = 1, Y_j = 1)$, $\pi_{i\bullet} = P(W_i = 1)$, and $\pi_{\bullet j} = P(Y_j = 1)$. This specifies marginal independence between each $W_i$ and $Y_j$ pair

• $P(W_i = 1, Y_j = 1) = \pi_{ij}$

• $P(W_i = 1, Y_j = 0) = \pi_{i\bullet} - \pi_{ij}$

• $P(W_i = 0, Y_j = 1) = \pi_{\bullet j} - \pi_{ij}$

• $P(W_i = 0, Y_j = 0) = 1 - \pi_{i\bullet} - \pi_{\bullet j} + \pi_{ij}$
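The four cell probabilities above have direct count analogues, which can be checked against the Lagoon by Professional consultant table shown earlier. In this sketch the function name `pairwise_cells` is hypothetical, and the margins (143 lagoon users, 44 consultant users, n = 279) are obtained by summing the rows and columns of that table.

```python
def pairwise_cells(m_ij, m_i, m_j, n):
    """Return the 2x2 count table [[(1,1), (1,0)], [(0,1), (0,0)]].

    m_ij: subjects positive on both items
    m_i:  subjects positive on the W item
    m_j:  subjects positive on the Y item
    n:    total number of subjects
    """
    return [[m_ij, m_i - m_ij],
            [m_j - m_ij, n - m_i - m_j + m_ij]]


# Lagoon x Professional consultant: 34 joint positives,
# 34 + 109 = 143 lagoon users, 34 + 10 = 44 consultant users, n = 279.
table = pairwise_cells(34, 143, 44, 279)
print(table)
```

Dividing each count by n gives the estimated cell probabilities in the same layout.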

Page 15: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Hypothesis

• SPMI can be written as $OR_{WY,ij} = 1$ for $i = 1, \ldots, r$ and $j = 1, \ldots, c$, where OR is the abbreviation for odds ratio and
– $OR_{WY,ij} = \pi_{ij}(1 - \pi_{i\bullet} - \pi_{\bullet j} + \pi_{ij}) / [(\pi_{i\bullet} - \pi_{ij})(\pi_{\bullet j} - \pi_{ij})]$

• Therefore, SPMI represents simultaneous independence in the $rc$ 2 × 2 pairwise item response tables formed for each $W_i$ and $Y_j$ pair

• Joint independence implies SPMI, but the converse is not true
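To make the odds ratio form concrete, the sketch below plugs estimated probabilities into the formula. The function name `odds_ratio` is an illustrative choice; the counts are those of the Lagoon by Professional consultant table, with margins derived by summing its rows and columns.

```python
def odds_ratio(pi_ij, pi_i, pi_j):
    """Odds ratio of the 2x2 table for one (W item, Y item) pair."""
    numerator = pi_ij * (1 - pi_i - pi_j + pi_ij)
    denominator = (pi_i - pi_ij) * (pi_j - pi_ij)
    return numerator / denominator


# Under exact marginal independence (pi_ij = pi_i * pi_j) the odds ratio is 1.
or_independent = odds_ratio(0.2 * 0.5, 0.2, 0.5)

# Estimated odds ratio for Lagoon x Professional consultant (n = 279).
n = 279
or_lagoon = odds_ratio(34 / n, 143 / n, 44 / n)

print(or_independent, or_lagoon)
```

For the farmer table this reduces to (34 × 126) / (109 × 10), an estimate well above 1, although the estimate alone says nothing about statistical significance.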

Page 16: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Modified Pearson Statistic

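• Under the Null

• Outcomes for each $(W_i, Y_j)$ pair: (1,1), (1,0), (0,1), (0,0)

                        $Y_j$
                 1                                  0                                                        Total
$W_i$    1       $\pi_{ij}$                         $\pi_{i\bullet} - \pi_{ij}$                              $\pi_{i\bullet}$
         0       $\pi_{\bullet j} - \pi_{ij}$       $1 - \pi_{i\bullet} - \pi_{\bullet j} + \pi_{ij}$        $1 - \pi_{i\bullet}$
     Total       $\pi_{\bullet j}$                  $1 - \pi_{\bullet j}$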

Page 17: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

The Statistic
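The formula on this slide did not survive transcription. As a hedged reconstruction, assuming the statistic is the sum of the ordinary Pearson chi-square statistics over the rc pairwise 2 × 2 tables (which is how the follow-up slide describes it), and writing hats for the sample proportions, it could take the form:

```latex
X^2_S = \sum_{i=1}^{r} \sum_{j=1}^{c} X^2_{S,i,j},
\qquad
X^2_{S,i,j} =
\frac{n \left( \hat{\pi}_{ij} - \hat{\pi}_{i\bullet} \hat{\pi}_{\bullet j} \right)^2}
     {\hat{\pi}_{i\bullet} \left( 1 - \hat{\pi}_{i\bullet} \right)
      \hat{\pi}_{\bullet j} \left( 1 - \hat{\pi}_{\bullet j} \right)}
```

Each term is the usual Pearson statistic for a 2 × 2 table written in terms of proportions; any small-sample adjustments the authors may apply are not shown here.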

Page 18: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Nonparametric Bootstrap

• To resample under independence of W and Y, Ws and Ys are independently resampled with replacement from the data set.

• The test statistic calculated for the $b$th resample of size $n$ is denoted by $X^{2*}_{S,b}$.

• The p-value is calculated as
– $B^{-1} \sum_b I(X^{2*}_{S,b} \ge X^2_S)$

• where $B$ is the number of resamples taken and $I(\cdot)$ is the indicator function
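The resampling scheme can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the statistic is taken to be the sum of Pearson statistics over all pairwise 2 × 2 tables, a pair whose margin is degenerate in a resample is assumed to contribute zero, and the function names and toy data are hypothetical.

```python
import numpy as np


def x2_s(W, Y):
    """Sum of Pearson chi-square statistics over all r*c pairwise 2x2 tables."""
    n = W.shape[0]
    pi_ij = (W.T @ Y) / n                 # r x c joint positive proportions
    pi_i = W.mean(axis=0)[:, None]        # r x 1 margins for W items
    pi_j = Y.mean(axis=0)[None, :]        # 1 x c margins for Y items
    num = n * (pi_ij - pi_i * pi_j) ** 2
    den = pi_i * (1 - pi_i) * pi_j * (1 - pi_j)
    # Assumption: a pair with an all-0 or all-1 item contributes 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(den > 0, num / den, 0.0)
    return float(terms.sum())


def bootstrap_pvalue(W, Y, B=1000, seed=0):
    """Resample the W rows and the Y rows independently, with replacement."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    observed = x2_s(W, Y)
    exceed = 0
    for _ in range(B):
        w_idx = rng.integers(0, n, size=n)   # subjects drawn for W
        y_idx = rng.integers(0, n, size=n)   # separate draw for Y
        if x2_s(W[w_idx], Y[y_idx]) >= observed:
            exceed += 1
    return exceed / B


# Hypothetical toy data in which the first Y item copies the first W item.
rng = np.random.default_rng(1)
W = rng.binomial(1, 0.5, size=(100, 2))
Y = np.column_stack([W[:, 0], rng.binomial(1, 0.5, size=100)])
p = bootstrap_pvalue(W, Y, B=200)
print(p)
```

Resampling whole rows keeps the dependence among items within W and within Y, while drawing the two index sets separately breaks any association between W and Y, which is what resampling under independence requires.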

Page 19: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Bootstrap p-Value Combination Methods

• Each $X^2_{S,i,j}$ gives a test for independence between each $W_i$ and $Y_j$ pair for $i = 1, \ldots, r$ and $j = 1, \ldots, c$. The p-values from each of these tests (using a $\chi^2_1$ approximation) can be combined to form a new statistic $\tilde{p}$

• The product of the $r \times c$ p-values or the minimum of the $r \times c$ p-values could be used as $\tilde{p}$

• The p-value is calculated as
– $B^{-1} \sum_b I(\tilde{p}^*_b \le \tilde{p})$
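A sketch of the combination approach follows, under the same assumptions as before: ordinary Pearson statistics for each pair, degenerate pairs contributing zero, and hypothetical function names and toy data. The chi-square tail probability with one degree of freedom is computed with `math.erfc`, using the identity P(χ²₁ ≥ x) = erfc(√(x/2)).

```python
import math

import numpy as np


def pairwise_x2(W, Y):
    """r x c array of Pearson statistics, one per (W item, Y item) pair."""
    n = W.shape[0]
    pi_ij = (W.T @ Y) / n
    pi_i = W.mean(axis=0)[:, None]
    pi_j = Y.mean(axis=0)[None, :]
    num = n * (pi_ij - pi_i * pi_j) ** 2
    den = pi_i * (1 - pi_i) * pi_j * (1 - pi_j)
    # Assumption: pairs with a degenerate margin get a statistic of 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, 0.0)


def chi2_1_sf(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def combined_p(W, Y, how="product"):
    """Combine the r*c pairwise p-values by product or minimum."""
    pvals = np.array([chi2_1_sf(x) for x in pairwise_x2(W, Y).ravel()])
    return float(pvals.prod()) if how == "product" else float(pvals.min())


def bootstrap_combined_pvalue(W, Y, how="product", B=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    p_tilde = combined_p(W, Y, how)
    hits = 0
    for _ in range(B):
        w_idx = rng.integers(0, n, size=n)
        y_idx = rng.integers(0, n, size=n)
        # Small combined p-values are evidence against SPMI, hence "<=".
        if combined_p(W[w_idx], Y[y_idx], how) <= p_tilde:
            hits += 1
    return hits / B


# Hypothetical toy data in which the first Y item copies the first W item.
rng = np.random.default_rng(1)
W = rng.binomial(1, 0.5, size=(100, 2))
Y = np.column_stack([W[:, 0], rng.binomial(1, 0.5, size=100)])
p_product = bootstrap_combined_pvalue(W, Y, how="product", B=200)
p_minimum = bootstrap_combined_pvalue(W, Y, how="minimum", B=200)
print(p_product, p_minimum)
```

The pairwise p-values are dependent, so their product or minimum has no simple reference distribution; the bootstrap supplies one by recomputing the combined value on each resample.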

Page 20: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Results from the Farmer Data

Method                           My p-value    Authors' p-value
Bootstrap $X^2_S$                0.0001        <0.0001
Bootstrap product of p-values    0.0001        0.0001
Bootstrap minimum p-values       0.0047        0.0034

Page 21: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Interpretation and Follow-Up

• The p-values show strong evidence against SPMI

• Since $X^2_S$ is the sum of $rc$ different Pearson chi-square test statistics, each $X^2_{S,i,j}$ can be used to investigate why SPMI is rejected

• The individual tests can be done using an asymptotic $\chi^2_1$ approximation or the estimated sampling distribution of the individual statistics calculated in the proposed bootstrap procedures

• When this is done, the significant combinations are (Lagoon, Professional consultant), (Lagoon, Veterinarian), (Pit, Veterinarian), (Pit, Feed companies & representatives), (Natural drainage, Professional consultant), (Natural drainage, Magazines)
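As an illustration of one such follow-up test, the sketch below applies the ordinary Pearson statistic and the asymptotic chi-square approximation with one degree of freedom to the Lagoon by Professional consultant table shown earlier. The function name is hypothetical, and no adjustment for testing 20 tables is applied.

```python
import math


def pearson_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Lagoon x Professional consultant counts from the farmer data.
x2 = pearson_2x2(34, 109, 10, 126)

# Upper tail of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(x2 / 2.0))

print(x2, p_value)
```

The statistic is roughly 14.15 and the approximate p-value falls well below 0.05, consistent with this pair appearing among the significant combinations.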

Page 22: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Simulation Study

• Goal: determine which testing procedures hold the correct size under a range of different situations and have power to detect various alternative hypotheses

• 500 data sets are generated for each simulation setting investigated

• The SPMI testing methods are applied (B = 1000), and for each method the proportion of data sets for which SPMI is rejected at the 0.05 nominal level is recorded
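The size part of such a study can be sketched as below. This is a simplified illustration rather than the authors' design: data are generated with every item independent Bernoulli(0.5), which satisfies SPMI but ignores the within-variable associations a fuller study would vary, and the defaults mirror the slide (500 data sets, B = 1000) while the demonstration call uses much smaller values to run quickly. All function names are hypothetical.

```python
import numpy as np


def x2_s(W, Y):
    """Sum of Pearson statistics over all pairwise 2x2 tables."""
    n = W.shape[0]
    pi_ij = (W.T @ Y) / n
    pi_i = W.mean(axis=0)[:, None]
    pi_j = Y.mean(axis=0)[None, :]
    num = n * (pi_ij - pi_i * pi_j) ** 2
    den = pi_i * (1 - pi_i) * pi_j * (1 - pi_j)
    # Assumption: pairs with a degenerate margin contribute 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(np.where(den > 0, num / den, 0.0).sum())


def bootstrap_pvalue(W, Y, B, rng):
    """Bootstrap p-value with W rows and Y rows resampled independently."""
    n = W.shape[0]
    observed = x2_s(W, Y)
    exceed = 0
    for _ in range(B):
        w_idx = rng.integers(0, n, size=n)
        y_idx = rng.integers(0, n, size=n)
        exceed += x2_s(W[w_idx], Y[y_idx]) >= observed
    return exceed / B


def estimate_size(n_datasets=500, n=100, r=2, c=2, B=1000, alpha=0.05, seed=0):
    """Proportion of null data sets for which SPMI is rejected at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_datasets):
        # Assumed null model: all items independent Bernoulli(0.5).
        W = rng.binomial(1, 0.5, size=(n, r))
        Y = rng.binomial(1, 0.5, size=(n, c))
        if bootstrap_pvalue(W, Y, B, rng) <= alpha:
            rejections += 1
    return rejections / n_datasets


# Reduced settings so the example finishes quickly.
quick_rate = estimate_size(n_datasets=20, B=100)
print(quick_rate)
```

A method holds its size when this proportion stays close to the nominal 0.05; with only 500 data sets per setting, estimates a little above or below 0.05 are within simulation error.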

Page 23: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong
Page 24: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

My Results

• n = 100

• 2 × 2 marginal table

• OR = 25

Method                           My p-value    Authors' p-value
Bootstrap $X^2_S$                0.04          0.056
Bootstrap product of p-values    0.042         0.056
Bootstrap minimum p-values       0.036         0.044

Page 25: Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

Conclusion

• The bootstrap methods generally hold the correct size