chi-square test for qualitative data - dooleykevin.comdooleykevin.com/psyc60.15.pdf · chi-square...

Chi-Square Test for Qualitative Data

For qualitative data (measured on a nominal scale). Assumptions:

* Observations MUST be independent: no more than one measurement per subject

* Sample size must be large enough: expected frequencies must be ≥ 5
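You can check the sample-size rule mechanically. Below is a minimal Python sketch. `cells_below_minimum` is a hypothetical helper name, and the threshold of 5 comes from the rule above. The check applies to expected frequencies, not observed ones.

```python
def cells_below_minimum(expected, minimum=5.0):
    """Return the indices of cells whose EXPECTED frequency is below `minimum`.

    Hypothetical helper: an empty list means the 'expected frequencies >= 5'
    rule of thumb is satisfied.
    """
    return [i for i, fe in enumerate(expected) if fe < minimum]


# Expected frequencies from the marital-status example later in these notes (N = 100).
print(cells_below_minimum([55, 21, 9, 10, 5]))  # [] -> every cell is at least 5

# Hypothetical small sample: N = 20 with expected proportions .40/.40/.20.
print(cells_below_minimum([8, 8, 4]))  # [2] -> the third cell is too small
```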


Chi-square distribution

Critical Values Table on page 537 in your book!

X2 rollercoaster right here in California

Goodness of Fit χ²

* 1 variable
* H0: observed & expected frequencies do not differ
* Steps:
  - Calculate expected frequencies
  - Compute χ²
  - Compare to critical value (df = # categories - 1)

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where f_o is the observed frequency and f_e is the expected frequency.
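The goodness-of-fit steps above can be sketched in Python as follows. The function names are hypothetical. The critical value is passed in rather than computed, which matches the slides' approach of reading it from the table in the book.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def goodness_of_fit(observed, expected, critical_value):
    """Compute chi-square and df, then compare with a critical value from a table.

    Returns (chi_square, df, reject_h0). H0 is rejected when the statistic
    exceeds the critical value for df = number of categories - 1.
    """
    chi2 = chi_square_statistic(observed, expected)
    df = len(observed) - 1
    return chi2, df, chi2 > critical_value


# Usage with the ad-rating counts used later in these notes (critical value 5.99 for df = 2).
print(goodness_of_fit([5, 5, 14], [8, 8, 8], 5.99))  # (6.75, 2, True)
```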

Example: Goodness of Fit χ²

                         Married  Single  Separated  Divorced  Widowed  Total
Sample (N = 100), fo          50      22          8        18        2    100
Population proportion       0.55    0.21       0.09      0.10     0.05   100%
Expected freq., fe            55      21          9        10        5    100

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(50-55)^2}{55} + \frac{(22-21)^2}{21} + \frac{(8-9)^2}{9} + \frac{(18-10)^2}{10} + \frac{(2-5)^2}{5}$$

Is the marital status of our sample representative of the population?

Statistical hypotheses:
* H0: the observed frequencies (fo) conform to the expected frequencies (fe)
* H1: the observed frequencies differ from the expected frequencies

Decision rule: α = .05; df = 5 - 1 = 4; critical χ² = 9.49

Calculate test statistic (expected frequencies should not be below 5 in any cell!):

χ² = .45 + .05 + .11 + 6.40 + 1.80 = 8.81
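As a check on the hand calculation, this Python sketch recomputes the marital-status example from the table. It computes each expected frequency as population proportion × N, then each category's contribution to χ². The variable names are illustrative.

```python
# Marital-status example: observed counts and population proportions from the table.
categories = ["Married", "Single", "Separated", "Divorced", "Widowed"]
observed = [50, 22, 8, 18, 2]
proportions = [0.55, 0.21, 0.09, 0.10, 0.05]
n = sum(observed)  # 100

# Step 1: expected frequency = population proportion x sample size.
expected = [p * n for p in proportions]

# Step 2: each category's contribution (fo - fe)^2 / fe, then the total.
contributions = [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]
chi2 = sum(contributions)

for name, fe, c in zip(categories, expected, contributions):
    print(f"{name:<10} fe = {fe:5.1f}  contribution = {c:.2f}")
print(f"chi-square = {chi2:.2f}")  # chi-square = 8.81
```

Most of the statistic comes from the Divorced (6.40) and Widowed (1.80) cells. Even so, the total stays below the critical value of 9.49.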

Getting the Critical Value
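The table in the book is the intended route. To cross-check a table lookup, the sketch below computes the same cutoffs using only Python's standard library. It evaluates the chi-square CDF through the series for the regularized lower incomplete gamma function, then bisects for the point that leaves α in the upper tail. `chi2_cdf` and `chi2_critical` are hypothetical helper names. In practice, a statistics library would normally do this (for example, SciPy's `scipy.stats.chi2.ppf`).

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with `df` degrees of freedom.

    Uses P(a, z) = z^a e^(-z) / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n)),
    the regularized lower incomplete gamma function, with a = df/2 and z = x/2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the series
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_critical(alpha, df):
    """Value x with P(X > x) = alpha, found by bisection on the CDF."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:  # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for df in (1, 2, 4, 5):
    print(df, round(chi2_critical(0.05, df), 3))
```

Rounded, these values should match the table values used in these notes: 3.841 (df = 1), 5.99 (df = 2), 9.49 (df = 4) and 11.07 (df = 5).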

Example: Goodness of Fit χ²
* Observed statistical test value: χ²(4) = 8.81, p > .05
* Make a decision & interpret:
  - Retain H0 because 8.81 < 9.49
  - The sample does not significantly differ from the population with regard to marital status
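The slides report only p > .05. To get the actual p-value, you can use a closed form for the upper tail of the χ² distribution that holds when df is even, which covers this example (df = 4). This is a minimal sketch using only Python's standard library; the function name is hypothetical.

```python
import math


def chi2_p_value_even_df(chi2, df):
    """Upper-tail p-value P(X > chi2) for an EVEN number of degrees of freedom.

    For even df the tail has a closed form:
    exp(-chi2/2) * sum_{i=0}^{df/2 - 1} (chi2/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only works for even, positive df")
    half = chi2 / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


p = chi2_p_value_even_df(8.81, 4)
print(f"chi2(4) = 8.81, p = {p:.3f}")  # about .066, above .05, so H0 is retained
```

The same function gives p ≈ .034 for the advertising example's χ²(2) = 6.75, which is consistent with the "p < .05" reported there.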

Another Example

                      Rated G  Rated PG-13  Rated NC17
Sample (N = 24), fo         5            5          14
Expected freq., fe          8            8           8

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(5-8)^2}{8} + \frac{(5-8)^2}{8} + \frac{(14-8)^2}{8}$$

Is there an association between sexy advertising and buying more products?

Statistical hypotheses:
* H0: there is no association between sexy advertising and purchases
* H1: there is an association between sexy advertising and purchases

Decision rule: α = .05; df = 3 - 1 = 2; critical χ² = 5.99

Calculate test statistic (remember: expected frequencies should not be below 5 in any cell!):

χ² = 1.125 + 1.125 + 4.5 = 6.75
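When H0 says every category is equally likely, each expected frequency is simply N divided by the number of categories (24 / 3 = 8 here). Here is a short Python sketch of that case. The variable names are illustrative, and 5.99 is the table value from the decision rule above.

```python
# Observed counts for each rating category (N = 24), from the table above.
ratings = ["G", "PG-13", "NC17"]
observed = [5, 5, 14]
n = sum(observed)

# Under H0 (no preference), each of the k categories expects N / k.
k = len(observed)
expected = [n / k] * k  # [8.0, 8.0, 8.0]

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = k - 1
critical = 5.99  # table value for alpha = .05, df = 2 (from the decision rule)
print(f"chi2({df}) = {chi2:.2f}; reject H0: {chi2 > critical}")  # chi2(2) = 6.75; reject H0: True
```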

Another Example
* Observed statistical test value: χ²(2) = 6.75, p < .05
* Make a decision & interpret:
  - Reject H0 because 6.75 > 5.99
  - Sex sells!

Practice! Goodness of Fit χ²
* Let's say you roll a six-sided die 120 times. You would EXPECT each side to come up 1/6 of the time (i.e., 20 times).
* Now your friend gets his own six-sided die and rolls it 120 times. You would have the same EXPECTED frequencies here, right?
* Calculate a goodness-of-fit χ² for both you and your friend, and determine whether one of you has a weighted die, at α = .05. Don't forget to calculate df to get the critical χ² value! Is one of the dice suspect?

You:
Face    1   2   3   4   5   6
fo     18  19  21  23  22  17

Friend:
Face    1   2   3   4   5   6
fo      8   9  15  15  16  57

Your 120 Rolls

Die    Obs.   Exp.   O - E   (O - E)²   (O - E)²/E
1        18     20      -2          4          .20
2        19     20      -1          1          .05
3        21     20       1          1          .05
4        23     20       3          9          .45
5        22     20       2          4          .20
6        17     20      -3          9          .45
Total   120    120       0                     1.4

Friend's 120 Rolls

Die    Obs.   Exp.   O - E   (O - E)²   (O - E)²/E
1         8     20     -12        144         7.20
2         9     20     -11        121         6.05
3        15     20      -5         25         1.25
4        15     20      -5         25         1.25
5        16     20      -4         16         0.80
6        57     20      37       1369        68.45
Total   120    120       0                    85
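The worksheet columns in the two tables (O - E, (O - E)², (O - E)²/E) map directly onto a small function. This Python sketch rebuilds both tables and their χ² totals. The `worksheet` helper is hypothetical.

```python
def worksheet(observed, expected):
    """Rows of (face, O, E, O - E, (O - E)^2, (O - E)^2 / E) plus the chi-square total."""
    rows = []
    for face, (o, e) in enumerate(zip(observed, expected), start=1):
        diff = o - e
        rows.append((face, o, e, diff, diff ** 2, diff ** 2 / e))
    chi2 = sum(row[-1] for row in rows)
    return rows, chi2


expected = [120 / 6] * 6  # a fair die: each face 20 times in 120 rolls
yours = [18, 19, 21, 23, 22, 17]
friends = [8, 9, 15, 15, 16, 57]

for label, observed in (("You", yours), ("Friend", friends)):
    rows, chi2 = worksheet(observed, expected)
    for row in rows:
        print("{} {:>3} {:>5.0f} {:>4.0f} {:>5.0f} {:>6.2f}".format(*row))
    print(f"{label}: chi-square = {chi2:.2f}")
```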

df & critical value
* df = # categories - 1 = 6 - 1 = 5
* Critical χ² = 11.07

Practice: Goodness of Fit χ²
* You: χ² = Σ (O - E)²/E = 1.4
  - NOT SIGNIFICANT (1.4 < 11.07)
* Friend: χ² = Σ (O - E)²/E = 85
  - SIGNIFICANT (85 > 11.07)
  - Is your friend using a weighted die?
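A quick simulation makes "significant" concrete. The sketch below rolls a simulated fair die 120 times, over and over, and records χ² each time. The helper `chi_square_fair_die` is hypothetical, and the seed and number of trials are arbitrary choices. About 5% of fair dice should exceed 11.07 by chance alone, which is what α = .05 means. A value as large as the friend's 85 should essentially never appear.

```python
import random


def chi_square_fair_die(rolls, rng):
    """Roll a simulated fair six-sided die `rolls` times; return the goodness-of-fit chi-square."""
    counts = [0] * 6
    for _ in range(rolls):
        counts[rng.randrange(6)] += 1
    expected = rolls / 6
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(60)  # arbitrary fixed seed so the run is repeatable
trials = 4000
stats = [chi_square_fair_die(120, rng) for _ in range(trials)]

critical = 11.07  # alpha = .05, df = 5, from the slide
false_alarm_rate = sum(s > critical for s in stats) / trials
print(f"share of fair dice with chi-square > 11.07: {false_alarm_rate:.3f}")  # close to .05
print(f"largest chi-square from a fair die: {max(stats):.1f}")  # far below the friend's 85
```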

χ² Test for Independence
* Tests the association between 2 categorical variables
* Do the frequencies you actually observe differ from the expected frequencies by more than chance alone?
* Statistical hypotheses:
  - H0: the 2 variables are independent (i.e., no association)
  - H1: the variables are not independent
* Steps:
  - Calculate the expected frequency of each cell
  - Compute χ²
  - Compare to critical value; df = (# rows - 1) × (# columns - 1)

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where f_o is the observed frequency and f_e is the expected frequency.

Example: χ² Test for Independence
* Is there an association between gender and vegetarianism?
* Statistical hypotheses:
  - H0: gender and food preference are independent
  - H1: gender and food preference are associated / not independent
* Decision rule: α = .05; df = (# rows - 1) × (# columns - 1) = (2 - 1) × (2 - 1) = 1; critical χ² = 3.841

          Vegetarian  Non-Vegetarian  Total
Male              10              60     70
Female            50              80    130
Total             60             140    200

Next step: calculate the expected frequency of each cell


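$$f_e = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

* Male, Vegetarian: fe = (70 × 60) / 200 = 21
* Female, Vegetarian: fe = (130 × 60) / 200 = 39
* Male, Non-Vegetarian: fe = (70 × 140) / 200 = 49
* Female, Non-Vegetarian: fe = (130 × 140) / 200 = 91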
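The cell-by-cell rule (row total × column total / grand total) carries over to a table of any size. This is a minimal Python sketch; `expected_frequencies` is a hypothetical helper, and the table is stored as a list of rows.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Male, Female; columns: Vegetarian, Non-Vegetarian (from the table above).
observed = [[10, 60],
            [50, 80]]
print(expected_frequencies(observed))  # [[21.0, 49.0], [39.0, 91.0]]
```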

Now put it into the table…

                      Male Veg  Male Non-Veg  Female Veg  Female Non-Veg
Sample (N = 200), fo        10            60          50              80
Expected freq., fe          21            49          39              91

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(10-21)^2}{21} + \frac{(60-49)^2}{49} + \frac{(50-39)^2}{39} + \frac{(80-91)^2}{91}$$

χ² = 5.76 + 2.47 + 3.10 + 1.33 = 12.66
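This Python sketch reproduces the per-cell contributions and the total from the observed and expected rows of the table, then applies df = (rows - 1) × (columns - 1). The variable names are illustrative, and 3.841 is the table value from the decision rule.

```python
# Cells in the order of the table above.
cells = ["Male Veg", "Male Non-Veg", "Female Veg", "Female Non-Veg"]
observed = [10, 60, 50, 80]
expected = [21, 49, 39, 91]

contributions = [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]
chi2 = sum(contributions)

n_rows, n_cols = 2, 2  # gender x food preference
df = (n_rows - 1) * (n_cols - 1)
critical = 3.841  # alpha = .05, df = 1, from the decision rule

for name, c in zip(cells, contributions):
    print(f"{name:<15} {c:.2f}")
print(f"chi2({df}) = {chi2:.2f}; reject H0: {chi2 > critical}")  # chi2(1) = 12.66; reject H0: True
```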

Example: χ² Test for Independence
* Observed statistical test value: χ²(1) = 12.66, p < .05
* Make a decision & interpret:
  - Reject H0 and accept H1 because 12.66 > 3.84
  - Gender is related to food preference!
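For df = 1 there is a simple route to the exact p-value. A χ² variable with one degree of freedom is the square of a standard normal Z, so P(χ² > x) = P(|Z| > √x). This is a minimal sketch using Python's standard library `math.erfc`; the function name is hypothetical.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with df = 1.

    With one degree of freedom, chi-square is the square of a standard normal Z,
    so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


print(f"p for chi2(1) = 12.66: {chi2_p_value_df1(12.66):.5f}")  # far below .05
print(f"p at the critical value 3.841: {chi2_p_value_df1(3.841):.3f}")  # about .050
```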

Practice!
* Is there an association between cat ownership (yes/no) and life success (yes/no)? You survey 100 people…
* Don't forget to get your row and column totals…
* And follow the steps of hypothesis testing:
  - Statistical hypotheses
  - Decision rule
  - Calculate test statistic
  - Make a decision & interpret

          Successful  Not Successful  Total
Cat               60              15
No Cat            15              10
Total                                   100

Completed with row and column totals:

          Successful  Not Successful  Total
Cat               60              15     75
No Cat            15              10     25
Total             75              25    100

Statistical hypotheses:
* H0: cat ownership and life success are independent
* H1: cat ownership and life success are related

Decision rule: α = .05; df = (# rows - 1) × (# columns - 1) = (2 - 1) × (2 - 1) = 1; critical χ² = 3.841


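* Cat, Success: fe = (75 × 75) / 100 = 56.25
* Cat, No Success: fe = (75 × 25) / 100 = 18.75
* No Cat, Success: fe = (25 × 75) / 100 = 18.75
* No Cat, No Success: fe = (25 × 25) / 100 = 6.25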

                      Cat, Success  No Cat, Success  Cat, No Success  No Cat, No Success
Sample (N = 100), fo            60               15               15                  10
Expected freq., fe           56.25            18.75            18.75                6.25

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(60-56.25)^2}{56.25} + \frac{(15-18.75)^2}{18.75} + \frac{(15-18.75)^2}{18.75} + \frac{(10-6.25)^2}{6.25}$$

χ² = .25 + .75 + .75 + 2.25 = 4.00
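Because 4.00 clears the critical value of 3.841 by only a small margin, it is worth confirming that rounding played no part. This Python sketch redoes the calculation in exact arithmetic with the standard library's `fractions.Fraction`; the variable names are illustrative. It also shows that every cell deviates from its expected count by the same amount (3.75). In a 2 × 2 table this always happens, because the deviations must sum to zero along each row and each column.

```python
from fractions import Fraction

# Cat-ownership example; rows: Cat, No Cat; columns: Successful, Not Successful.
observed = [[60, 15],
            [15, 10]]

row_totals = [sum(row) for row in observed]        # [75, 25]
col_totals = [sum(col) for col in zip(*observed)]  # [75, 25]
n = sum(row_totals)                                # 100

# Exact expected counts (row total x column total / N), so no rounding creeps in.
expected = [[Fraction(r * c, n) for c in col_totals] for r in row_totals]

deviations = [[observed[i][j] - expected[i][j] for j in range(2)] for i in range(2)]
chi2 = sum(d ** 2 / expected[i][j]
           for i, row in enumerate(deviations)
           for j, d in enumerate(row))

print(deviations)  # every cell is off by 15/4 = 3.75 in one direction or the other
print(chi2)        # 4
```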


* Observed statistical test value: χ²(1) = 4.00, p < .05
* Make a decision & interpret:
  - Reject H0 because 4.00 > 3.84
  - Cat ownership is related to life success!
