Chi-Square Tests
• Categorical data
• 1-sample, compared to theoretical distribution
  – Goodness-of-Fit Test
• 2+ samples, 2+ levels of response variable
  – Chi-square Test
Slide #1
Chi-square Slide #2
Chi-Square -- Examples
• Does the dominant plant in plots differ between two locations?
• Does the frequency of female students differ among majors in the natural sciences, social sciences, and humanities?
• Does the occurrence of a food item in the stomachs of lake trout and chinook salmon differ?
Chi-square Slide #3
What do those examples have in common?
• A categorical response variable
  – dominant plant in a plot
  – sex of student (male or female)
  – occurrence of a food item (Y/N)
• Compare response frequencies among two or more groups
  – between two locations
  – among three divisions
  – between lake trout and chinook salmon
Chi-square Slide #4
An Illustrative Example
• When Chinook Salmon were first introduced to Lake Superior, there was concern that they would compete with native Lake Trout for Lake Herring. Preliminarily, fisheries biologists classified the diets of 50 Lake Trout and 40 Chinook Salmon as containing Lake Herring or not. They found that 36 Lake Trout and 24 Chinook Salmon contained Lake Herring. Test (at the 10% level) whether there is a difference in the proportion of Lake Trout and Chinook Salmon that had Lake Herring.
Chi-square Slide #5
Observed Table
– Recall – “… the diets of 50 Lake Trout and 40 Chinook Salmon … found 36 Lake Trout and 24 Chinook Salmon contained Lake Herring”

| | LH | no LH | Total |
|---|---|---|---|
| Lake Trout | 36 | 14 | 50 |
| Ch. Salmon | 24 | 16 | 40 |
| Total | 60 | 30 | 90 |
Chi-square Slide #6
Observed Table
• If there is no difference between the rows (i.e., if Ho is true), then the Total row could represent either row.
• Thus, the proportion of predators (regardless of type) that consumed Lake Herring is estimated to be 60/90 = 0.67.

| | LH | no LH | Total |
|---|---|---|---|
| Lake Trout | 36 | 14 | 50 |
| Ch. Salmon | 24 | 16 | 40 |
| Total | 60 | 30 | 90 |
Chi-square Slide #7
Expectations if Ho is true
• If there is no difference, and the common proportion is estimated by 0.67, then how many …
• LT do we expect to have LH: $50 \times 0.67 \approx \frac{60 \times 50}{90}$
• LT … … to not have LH: $50 \times 0.33 \approx \frac{30 \times 50}{90}$
• CS … … to have LH: $40 \times 0.67 \approx \frac{60 \times 40}{90}$
• CS … … to not have LH: $40 \times 0.33 \approx \frac{30 \times 40}{90}$
Chi-square Slide #8
Create Expected Table

| | LH | no LH | Total |
|---|---|---|---|
| Lake Trout | 33.3 | | 50 |
| Ch. Salmon | | | 40 |
| Total | 60 | 30 | 90 |

• LT to have LH = $\frac{60 \times 50}{90} = 33.3$
Chi-square Slide #9
Create Expected Table

| | LH | no LH | Total |
|---|---|---|---|
| Lake Trout | 33.3 | 16.7 | 50 |
| Ch. Salmon | 26.7 | 13.3 | 40 |
| Total | 60 | 30 | 90 |

• LT to NOT have LH = $\frac{30 \times 50}{90} = 16.7$
• Each expected count is the product of its row total and column total, divided by the table total.
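A minimal plain-Python sketch of that rule, assuming the table is stored as a list of rows. The function name `expected_counts` is illustrative only and does not come from the course materials.

```python
def expected_counts(observed):
    """Expected count for every cell under Ho:
    (row total * column total) / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed table: rows = Lake Trout, Ch. Salmon; columns = LH, no LH
observed = [[36, 14],
            [24, 16]]

for label, row in zip(["Lake Trout", "Ch. Salmon"], expected_counts(observed)):
    print(label, [round(e, 1) for e in row])
# Lake Trout [33.3, 16.7]
# Ch. Salmon [26.7, 13.3]
```

The expected table keeps the same row and column totals as the observed table, which is a quick check on hand calculations.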
Chi-Square Tests Slide #10
A New Test Statistic
$$\chi^2 = \sum_{\text{table}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
df = (rows - 1) × (cols - 1)
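The statistic and its df can be sketched the same way in plain Python. The helper name `chi_square_statistic` is hypothetical; it rebuilds each expected count from the margins and adds up the per-cell contributions (observed − expected)² / expected.

```python
def chi_square_statistic(observed):
    """Return (chi-square, df) for a table of counts given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under Ho
            stat += (obs - expected) ** 2 / expected      # this cell's contribution
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Lake Trout / Chinook Salmon observed table (LH, no LH)
stat, df = chi_square_statistic([[36, 14], [24, 16]])
print(round(stat, 2), df)  # 1.44 1
```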
Chi-Square Tests Slide #11
Chi-Square Distribution
• Right-skewed (all values are non-negative)
• Less sharply skewed with increasing df
  – df are related to the size of the table, not to n
• All p-values are “right-of” (upper-tail) areas; there are no directional (“one-tailed”) alternatives with chi-square
• Examine HO – page 1
[Figure: chi-square density curves for Chi(3), Chi(10), and Chi(20); x-axis: Chi-square, 0 to 50]
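Because every p-value is a right-of area, a single function for P(χ² ≥ x) covers all the tests in these slides. A statistics package normally supplies this. The sketch below instead uses only Python's `math` module and assumes an integer df. It builds the regularized upper incomplete gamma Q(df/2, x/2) by a standard recurrence. The name `chi2_right_tail` is hypothetical.

```python
import math


def chi2_right_tail(x, df):
    """P(X >= x) for X ~ chi-square(df), with df a positive integer.

    Equals Q(df/2, x/2), the regularized upper incomplete gamma function,
    built up from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y))
    (odd df) with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


# The same statistic lies less far into the right tail as df grows.
for df in (3, 10, 20):
    print(df, round(chi2_right_tail(10.0, df), 4))
```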
Chi-square Slide #12
Chi-Square Test
• Ho: “distribution of individuals into the levels is the same for each population”
• HA: “distribution of individuals into levels is different for at least one pair of populations”
• Assume: at least 5 in each cell of expected table
• Statistic: Observed frequency table
• Test Statistic:
$$\chi^2 = \sum_{\text{table}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
• df: (rows - 1) × (columns - 1)
• When: categorical variable, 2+ populations/groups
Chi-square Slide #13
A Full Example
• When Chinook Salmon were first introduced to Lake Superior, there was concern that they would compete with native Lake Trout for Lake Herring. Preliminarily, fisheries biologists classified the diets of 50 Lake Trout and 40 Chinook Salmon as containing Lake Herring or not. They found that 36 Lake Trout and 24 Chinook Salmon contained Lake Herring. Test (at the 10% level) whether there is a difference in the proportion of Lake Trout and Chinook Salmon that had Lake Herring.
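The full test for this example can be sketched end to end in plain Python. This is an illustration, not the course's R solution. It checks the expected-count assumption, computes the statistic and df, and then uses the fact that a chi-square with 1 df is the square of a standard normal, so the right-of area is `erfc(sqrt(chi2 / 2))`.

```python
import math

# Observed table: rows = Lake Trout, Ch. Salmon; columns = LH, no LH
observed = [[36, 14],
            [24, 16]]
alpha = 0.10  # 10% level from the example

# Expected counts under Ho: row total * column total / table total
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Assumption check: every expected count should be at least 5
if min(min(row) for row in expected) < 5:
    raise ValueError("an expected count is below 5; chi-square test not appropriate")

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)  # 1 for a 2 x 2 table

# With df = 1, chi-square is Z**2 for a standard normal Z, so the
# right-of area P(Z**2 >= chi2) equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}")
print("reject Ho" if p_value < alpha else "do not reject Ho")
# chi-square = 1.44, df = 1, p-value = 0.230
# do not reject Ho
```

The erfc shortcut applies only to df = 1; larger tables need a general chi-square upper-tail function.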
Chi-square Slide #14
Another Full Example
• Modification -- the researchers recorded what the dominant food item was. Do the dominant food items in Lake Trout and Chinook Salmon differ at the 5% level?
• See R HO Page 2.

| | LH | smelt | Mysis | Total |
|---|---|---|---|---|
| Lake Trout | 32 | 10 | 8 | 50 |
| Ch. Salmon | 18 | 18 | 4 | 40 |
| Total | 50 | 28 | 12 | 90 |
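A plain-Python sketch for this 2 × 3 table (an illustration, not the R code the slide refers to). It prints each cell's contribution so the cells driving a large statistic are visible. It also relies on the closed form P(χ² ≥ x) = e^(−x/2), which holds only for df = 2.

```python
import math

rows = ["Lake Trout", "Ch. Salmon"]
cols = ["LH", "smelt", "Mysis"]
observed = [[32, 10, 8],
            [18, 18, 4]]
alpha = 0.05  # 5% level from the example

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count under Ho
        contribution = (o - e) ** 2 / e        # this cell's share of chi-square
        chi2 += contribution
        print(f"{rows[i]:<10} {cols[j]:<5} observed={o:>2} "
              f"expected={e:5.2f} contribution={contribution:.2f}")

df = (len(rows) - 1) * (len(cols) - 1)  # (2-1)*(3-1) = 2

# Only for df = 2: P(chi-square >= x) = exp(-x / 2)
p_value = math.exp(-chi2 / 2)

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
print("reject Ho" if p_value < alpha else "do not reject Ho")
# chi-square = 6.51, df = 2, p-value = 0.0386
# reject Ho
```

The smallest expected count here is 40 × 12 / 90 ≈ 5.33, so the at-least-5 assumption is met.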
Chi-Square Tests
Examine HO – Page 3
Slide #15
Chi-Square Tests Slide #17
Goodness-of-Fit Test
• Compare observed to theoretical frequencies of individuals in categories.
• Examples
  – Test whether responses are “random” (e.g., preference)
  – Test Mendelian genetics (e.g., 3:1 and 9:3:3:1 theories)
  – Test use of available resources (e.g., compare habitat usage to availability)
Chi-Square Tests Slide #18
An Illustrative Example
• Determine, at the 10% level, if Northland students prefer the Chris Duarte Group (CDG), Ronnie Baker Brooks (RBB), or Bernard Allison (BA).
• Hypotheses?
  – Ho: “same # of students prefer each artist”
  – Ha: “different # of students prefer each artist”
Chi-Square Tests Slide #19
An Illustrative Example
• Under Ho, what proportion prefer each artist? 1/3
• If n = 78, how many students prefer each artist if Ho is true? 78 × 1/3 = 26

Expected Table

| Artist | CDG | RBB | BA |
|---|---|---|---|
| Freq | 26 | 26 | 26 |
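The expected table for a goodness-of-fit test is n times each theoretical proportion. Below is a small sketch with a hypothetical helper `gof_expected`. The 9:3:3:1 call uses a made-up n = 160 purely to show a Mendelian ratio; it is not data from the slides.

```python
import math


def gof_expected(n, proportions):
    """Expected count per category under Ho: n * theoretical proportion."""
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError("theoretical proportions must sum to 1")
    return [n * p for p in proportions]


# Equal preference among CDG, RBB, and BA with n = 78 students
print(gof_expected(78, [1/3, 1/3, 1/3]))

# A 9:3:3:1 ratio (hypothetical n = 160, for illustration only)
ratio = [9, 3, 3, 1]
print(gof_expected(160, [r / sum(ratio) for r in ratio]))
```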
Chi-Square Tests Slide #20
An Illustrative Example
• Suppose these results were obtained:

Observed Table

| Artist | CDG | RBB | BA |
|---|---|---|---|
| Freq | 24 | 38 | 16 |

• Is there a preference – i.e., are these observations significantly different from what was expected when assuming no preference?
Chi-Square Tests Slide #21
A New Test Statistic
$$\chi^2 = \sum_{\text{table}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
df = cells - 1
Chi-Square Tests Slide #22
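An Illustrative Example
Observed Table

| Artist | CDG | RBB | BA |
|---|---|---|---|
| # | 24 | 38 | 16 |

Expected Table

| Artist | CDG | RBB | BA |
|---|---|---|---|
| # | 26 | 26 | 26 |

$$\chi^2 = \frac{(24-26)^2}{26} + \frac{(38-26)^2}{26} + \frac{(16-26)^2}{26} = 0.15 + 5.54 + 3.85 = 9.54$$
df = (3-1) = 2, p-value = 0.00848
Conclusion?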
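The same arithmetic as a plain-Python sketch (an illustration, not the course's R output). With 3 cells, df = 2, so the right-of area is exactly e^(−χ²/2). Using the unrounded statistic gives a p-value that agrees with the slide to about two significant figures.

```python
import math

artists = ["CDG", "RBB", "BA"]
observed = [24, 38, 16]
expected = [26, 26, 26]  # 78 students x 1/3 each under Ho
alpha = 0.10             # 10% level from the example

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
df = len(observed) - 1   # cells - 1

# Only for df = 2: P(chi-square >= x) = exp(-x / 2)
p_value = math.exp(-chi2 / 2)

for name, c in zip(artists, contributions):
    print(f"{name}: {c:.2f}")
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
print("reject Ho" if p_value < alpha else "do not reject Ho")
# CDG: 0.15, RBB: 5.54, BA: 3.85 (one per line)
# chi-square = 9.54, df = 2, p-value = 0.0085
# reject Ho
```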
Chi-Square Tests Slide #23
Goodness-of-Fit Test
• Ho: distribution of individuals into levels follows the theoretical distribution
• HA: distribution of individuals into levels does NOT follow the theoretical distribution
• Sample: random sample of size n, one categorical variable
• Assume: at least 5 in each cell of expected table
• Statistic: Observed frequency table
Chi-Square Tests Slide #24
Goodness-of-Fit Test
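• Test Statistic:
$$\chi^2 = \sum_{\text{table}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
• df: cells - 1
• Confidence Region:
  – $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p}$ is the sample proportion in the level of interest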
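A sketch of this interval for one level, assuming the usual large-sample form with z* taken from the standard normal via Python's `statistics.NormalDist`. The slide fixes neither the confidence level nor the level of interest. Here 90% (to match the 10% test above) and RBB (38 of 78) are illustrative choices, and `proportion_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist


def proportion_ci(count, n, conf=0.90):
    """Large-sample interval p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = count / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.645 for 90%
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Proportion of the 78 students preferring RBB (38 of 78), 90% confidence
low, high = proportion_ci(38, 78, conf=0.90)
print(f"{low:.3f} to {high:.3f}")
```

Under Ho each artist has proportion 1/3, so an interval for RBB that lies entirely above 0.333 points the same way as the chi-square result.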
Chi-Square Tests
Examine HO – Page 5
Slide #25