Chi-Square Test for Qualitative Data
For qualitative data (measured on a nominal scale)
* Observations MUST be independent - No more than one measurement per subject
* Sample size must be large enough - Expected frequencies must be ≥ 5
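A minimal sketch of the sample-size check, assuming the expected frequencies are already available as a list of numbers. The helper name `expected_frequencies_ok` is hypothetical; the threshold of 5 is the rule stated above.

```python
def expected_frequencies_ok(expected, minimum=5):
    """Return True when every expected frequency is at least `minimum`."""
    return all(fe >= minimum for fe in expected)


# Illustrative values only (not taken from the lecture).
large_enough = expected_frequencies_ok([12, 8, 5])    # every cell is >= 5
too_small = expected_frequencies_ok([12, 8, 4.5])     # one cell falls below 5
print(large_enough, too_small)
```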
Chi-square distribution
Critical Values Table on page 537 in your book!
X2 rollercoaster right here in California
Goodness of Fit χ2
* 1 variable
* H0: observed & expected frequencies do not differ
* Steps:
  * Calculate expected frequencies
  * Compute χ2
  * Compare to critical value
    * df = # categories - 1
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
where f_o = observed frequency and f_e = expected frequency
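The formula above can be sketched as a small Python function. The names `chi_square_statistic` and `goodness_of_fit_df` are illustrative assumptions, not part of the lecture.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def goodness_of_fit_df(n_categories):
    """Degrees of freedom for a goodness-of-fit test: # categories - 1."""
    return n_categories - 1
```

When observed and expected frequencies match exactly, every term is zero, so the statistic is 0; larger values mean a larger overall departure from what was expected.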
Example: Goodness of Fit χ2

Sample (N = 100)                  Married  Single  Separated  Divorced  Widowed  Total
fo                                50       22      8          18        2        100
expected freq. fe (proportion)    0.55     0.21    0.09       0.10      0.05     100%

With N = 100, the expected frequencies are 55, 21, 9, 10 and 5.
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
$$\chi^2 = \frac{(50-55)^2}{55} + \frac{(22-21)^2}{21} + \frac{(8-9)^2}{9} + \frac{(18-10)^2}{10} + \frac{(2-5)^2}{5}$$
Is the marital status of our sample representative of the population?
Statistical Hypotheses:
* H0: fo's (observed frequencies) conform to fe's (expected)
* H1: the sample differs from the expected frequencies
Decision rule: α = .05; df = 5 - 1 = 4; critical χ2= 9.49
Calculate test statistic: (*expected frequencies should not be below 5 in any cell!)
$$\chi^2 = 0.45 + 0.05 + 0.11 + 6.4 + 1.8 = 8.81$$
Getting the Critical Value
Example: Goodness of Fit χ2
* Observed statistical test value: χ2 (4) = 8.81, p > .05
* Make a decision & interpret
  - Retain H0 because 8.81 < 9.49
  - The sample does not significantly differ from the population with regard to marital status
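As a check on the arithmetic, the marital-status example can be reproduced in plain Python. This is a sketch using only the numbers from the slides; the variable names are illustrative, and the critical value is copied from the decision rule rather than computed.

```python
observed = {"Married": 50, "Single": 22, "Separated": 8, "Divorced": 18, "Widowed": 2}
population = {"Married": 0.55, "Single": 0.21, "Separated": 0.09,
              "Divorced": 0.10, "Widowed": 0.05}

n = sum(observed.values())                                    # N = 100
expected = {status: population[status] * n for status in observed}

chi2 = sum((observed[s] - expected[s]) ** 2 / expected[s] for s in observed)
df = len(observed) - 1                                        # 5 - 1 = 4
critical = 9.49                                               # alpha = .05, df = 4 (from the slide)
reject_h0 = chi2 > critical

print(f"chi2({df}) = {chi2:.2f}, reject H0: {reject_h0}")
```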
Another Example

Sample (N = 24)      Rated G  Rated PG-13  Rated NC17
fo                   5        5            14
expected freq. fe    8        8            8
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
$$\chi^2 = \frac{(5-8)^2}{8} + \frac{(5-8)^2}{8} + \frac{(14-8)^2}{8}$$
Is there an association between sexy advertising and buying more products?
Statistical Hypotheses:
* H0: there is no association between sexy advertising and purchases
* H1: there is an association between advertising and purchases
Decision rule: α = .05; df = 3 - 1 = 2; critical χ2= 5.99
Calculate statistic: (remember: expected frequencies should not be below 5 in any cell!)
$$\chi^2 = 1.125 + 1.125 + 4.5 = 6.75$$
Another Example
* Observed statistical test value: χ2 (2) = 6.75, p < .05
* Make a decision & interpret
  * Reject H0 because 6.75 > 5.99
  * Sex sells!
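When every category is expected to be equally likely, the expected frequency is simply N divided by the number of categories. A brief sketch of the ratings example under that assumption (variable names are illustrative; the critical value is copied from the decision rule):

```python
observed = [5, 5, 14]                      # Rated G, Rated PG-13, Rated NC17
n = sum(observed)                          # N = 24
k = len(observed)
expected = [n / k] * k                     # 24 / 3 = 8 per category

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = k - 1                                 # 3 - 1 = 2
critical = 5.99                            # alpha = .05, df = 2 (from the slide)
reject_h0 = chi2 > critical

print(f"chi2({df}) = {chi2:.2f}, reject H0: {reject_h0}")
```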
Practice! Goodness of Fit χ2
* Let's say you roll a 6-sided die 120 times. You would EXPECT that each side would come up 1/6 of the time (i.e., 20 times).
* Now your friend gets his own 6-sided die and rolls it 120 times. You would have the same EXPECTED frequency here, right?
* Calculate a goodness of fit χ2 for both you and your friend, and determine whether one of you has a weighted die, at α = .05. Don't forget to calculate df to get the critical χ2 value! Is one of the dice suspect?
You:
Side    1   2   3   4   5   6
fo      18  19  21  23  22  17

Friend:
Side    1   2   3   4   5   6
fo      8   9   15  15  16  57
Your 120 Rolls

Dice   Obs.  Exp.  O-E  (O-E)2  (O-E)2/E
1      18    20    -2   4       .20
2      19    20    -1   1       .05
3      21    20    1    1       .05
4      23    20    3    9       .45
5      22    20    2    4       .20
6      17    20    -3   9       .45
Total  120   120   0            1.4
Friend's 120 Rolls

Dice   Obs.  Exp.  O-E  (O-E)2  (O-E)2/E
1      8     20    -12  144     7.20
2      9     20    -11  121     6.05
3      15    20    -5   25      1.25
4      15    20    -5   25      1.25
5      16    20    -4   16      0.80
6      57    20    37   1369    68.45
Total  120   120   0            85
df & critical value…
* df = # categories - 1 = 5
* Critical χ2 = 11.07
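Looking up the critical value can be sketched as a small table lookup. The dictionary below holds the standard χ2 critical values at α = .05 for df 1 through 5 only (rounded to three decimals); it stands in for the table in your book, and the names `CRITICAL_05` and `critical_value` are hypothetical.

```python
# Standard chi-square critical values at alpha = .05, keyed by df.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def critical_value(n_categories):
    """Return (df, critical chi-square) for a goodness-of-fit test at alpha = .05."""
    df = n_categories - 1
    if df not in CRITICAL_05:
        raise KeyError(f"df = {df} is outside this small table; use a full table")
    return df, CRITICAL_05[df]


df, critical = critical_value(6)   # a 6-sided die has 6 categories
print(df, critical)
```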
Practice: Goodness of Fit χ2
* You: $\chi^2 = \sum \frac{(O-E)^2}{E} = 1.4$
  * NOT SIGNIFICANT
* Friend: $\chi^2 = \sum \frac{(O-E)^2}{E} = 85$
  * SIGNIFICANT
  * Is your friend using a weighted die?
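Both dice results can be verified with a short script. This is a sketch using the roll counts from the practice problem; the function name `goodness_of_fit` is illustrative, and the critical value 11.07 is the df = 5 table value at α = .05.

```python
def goodness_of_fit(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [20] * 6                       # 120 rolls / 6 sides
you = [18, 19, 21, 23, 22, 17]
friend = [8, 9, 15, 15, 16, 57]
critical = 11.07                          # alpha = .05, df = 6 - 1 = 5

results = {}
for label, rolls in (("You", you), ("Friend", friend)):
    chi2 = goodness_of_fit(rolls, expected)
    results[label] = chi2
    verdict = "SIGNIFICANT" if chi2 > critical else "NOT SIGNIFICANT"
    print(f"{label}: chi2 = {chi2:.2f} -> {verdict}")
```

Almost all of the friend's statistic comes from the single cell for side 6 (68.45 of 85), which is what points to a weighted die.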
χ2 Test for Independence
* Tests the association between 2 categorical variables
* Do the frequencies you actually observe differ from the expected frequencies by more than chance alone?
* Statistical hypotheses:
  * H0: the 2 variables are independent (i.e. no association)
  * H1: the variables are not independent
* Steps:
  * Calculate expected frequency of each cell
  * Compute χ2
  * Compare to critical value
    * df = (# rows - 1) x (# columns - 1)
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
where f_o = observed frequency and f_e = expected frequency
Example: χ2 Test for Independence
* Is there an association between gender and vegetarianism?
* Statistical Hypotheses:
  * H0: gender and food preference are independent
  * H1: gender and food preference are associated / not independent
* Decision rule: α = .05
  * df = (# rows - 1) x (# columns - 1) → (2-1) x (2-1) = 1
  * Critical χ2 = 3.841
          Vegetarian  Non-Vegetarian  Total:
Male      10          60              70
Female    50          80              130
Total:    60          140             200
Next step: calculate the expected frequency of each cell
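$$\text{expected frequency of each cell} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

Male, Vegetarian: $f_e = \frac{70 \times 60}{200} = 21$
Female, Vegetarian: $f_e = \frac{130 \times 60}{200} = 39$
Male, Non-Vegetarian: $f_e = \frac{70 \times 140}{200} = 49$
Female, Non-Vegetarian: $f_e = \frac{130 \times 140}{200} = 91$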
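The expected-frequency rule (row total x column total / grand total) can be sketched for a table of any size. The function name `expected_frequencies` and the list-of-rows layout are assumptions made for this illustration.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Male, Female; columns: Vegetarian, Non-Vegetarian.
observed = [[10, 60],
            [50, 80]]
expected = expected_frequencies(observed)
print(expected)
```

Each row of `expected` sums to the same row total as the observed table, which is a quick sanity check on hand calculations.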
Now put it into the table…

Sample (N = 200)     Male Veg  Male Non-Veg  Female Veg  Female Non-Veg
fo                   10        60            50          80
expected freq. fe    21        49            39          91
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
$$\chi^2 = \frac{(10-21)^2}{21} + \frac{(60-49)^2}{49} + \frac{(50-39)^2}{39} + \frac{(80-91)^2}{91}$$
$$\chi^2 = 5.76 + 2.47 + 3.10 + 1.33 = 12.66$$
Example: χ2 Test for Independence
* Observed statistical test value: χ2 (1) = 12.66, p < .05
* Make a decision & interpret
  * Reject H0 and accept H1 because 12.66 > 3.84
  * Gender is related to food preference!
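The whole test for independence, from observed table to decision, can be sketched in one function. The name `chi_square_independence` is hypothetical, and the critical value 3.841 is the df = 1 table value from the decision rule rather than something the code computes.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / grand_total  # expected count
            chi2 += (fo - fe) ** 2 / fe

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: Male, Female; columns: Vegetarian, Non-Vegetarian.
chi2, df = chi_square_independence([[10, 60], [50, 80]])
critical = 3.841                      # alpha = .05, df = 1
reject_h0 = chi2 > critical
print(f"chi2({df}) = {chi2:.2f}, reject H0: {reject_h0}")
```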
Practice!
* Is there an association between cat ownership (yes/no) and life success (yes/no)? You survey 100 people…
* Don't forget to get your row and column totals…
* And follow the steps of hypothesis testing:
  * Statistical Hypothesis
  * Decision Rule
  * Calculate Test Statistic
  * Make a Decision & Interpret
          Successful  Not Successful  Total:
Cat       60          15
No Cat    15          10
Total:                                100

With the row and column totals filled in:

          Successful  Not Successful  Total:
Cat       60          15              75
No Cat    15          10              25
Total:    75          25              100
Statistical Hypotheses:
* H0: cat ownership and life success are independent
* H1: cat ownership and life success are related

Decision rule: α = .05
* df = (# rows - 1) x (# columns - 1) → (2-1) x (2-1) = 1
* Critical χ2 = 3.841
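Cat, Success: $f_e = \frac{75 \times 75}{100} = 56.25$
No cat, Success: $f_e = \frac{25 \times 75}{100} = 18.75$
Cat, No success: $f_e = \frac{75 \times 25}{100} = 18.75$
No cat, No success: $f_e = \frac{25 \times 25}{100} = 6.25$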
Sample (N = 100)     Cat, Success  No cat, Success  Cat, No success  No cat, No Success
fo                   60            15               15               10
expected freq. fe    56.25         18.75            18.75            6.25
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
$$\chi^2 = \frac{(60-56.25)^2}{56.25} + \frac{(15-18.75)^2}{18.75} + \frac{(15-18.75)^2}{18.75} + \frac{(10-6.25)^2}{6.25}$$
$$\chi^2 = .25 + .75 + .75 + 2.25 = 4.0$$
* Observed statistical test value: χ2 (1) = 4.00, p < .05
* Make a decision & interpret
  * Reject H0 because 4.00 > 3.84
  * Cat ownership is related to life success!
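For a 2 x 2 table there is also a well-known shortcut that gives the same statistic without computing expected frequencies cell by cell: χ2 = N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]. It is not the method used in the slides, so treat the sketch below as an independent cross-check; the function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]] via the shortcut formula."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: Cat, No Cat; columns: Successful, Not Successful.
chi2 = chi_square_2x2(60, 15, 15, 10)
critical = 3.841                      # alpha = .05, df = 1
reject_h0 = chi2 > critical
print(f"chi2(1) = {chi2:.2f}, reject H0: {reject_h0}")
```

The shortcut agrees with the cell-by-cell calculation above, which is a useful way to catch arithmetic slips in the expected frequencies.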