Tutorial: Chi-Square Distribution
Presented by: Nikki Natividad
Course: BIOL 5081 - Biostatistics


Post on 14-Dec-2015


Purpose

To analyze discontinuous (categorical or binned) data, in which we count the number of subjects that fall into each category

We want to compare our observed data to what we expect to see. Is the difference due to chance, or due to an association?

When can we use the Chi-Square Test?
◦ Testing the outcome of Mendelian crosses
◦ Testing independence: is one factor associated with another?
◦ Testing a population for expected proportions

Assumptions

2 or more categories

Independent observations

A sample size of at least 10

Random sampling

All observations must be used

For the test to be accurate, the expected frequency in each category should be at least 5

Conducting Chi-Square Analysis

1) Make a hypothesis based on your basic biological question

2) Determine the expected frequencies

3) Create a table with observed frequencies, expected frequencies, and chi-square values using the formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$

4) Find the degrees of freedom: $(c-1)(r-1)$ for a table with $c$ columns and $r$ rows; for a single row of $k$ categories, as in Example 1, $k-1$

5) Find the critical chi-square value for your degrees of freedom and significance level in the Chi-Square Distribution table

6) If the critical value > your calculated chi-square value, you do not reject your null hypothesis; if your calculated value is larger, you reject it.
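The six steps can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own method (the tutorial uses SAS): the 3:1 Mendelian counts are hypothetical, the name `chi_square_statistic` is made up for this sketch, and the lookup table holds only the two critical values quoted in this tutorial's examples.

```python
# Critical values at alpha = 0.05 quoted in this tutorial's examples,
# keyed by degrees of freedom.
CRITICAL_VALUES_ALPHA_05 = {1: 3.841, 2: 5.991}


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories (step 3)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Steps 1-2: hypothetical 3:1 Mendelian cross (illustrative counts only).
observed = [290, 110]
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]

# Step 3: calculated chi-square value.
chi2 = chi_square_statistic(observed, expected)

# Step 4: a single row of k categories has k - 1 degrees of freedom.
df = len(observed) - 1

# Step 5: critical value from the table.
critical = CRITICAL_VALUES_ALPHA_05[df]

# Step 6: reject the null hypothesis only if the calculated value is larger.
reject_null = chi2 > critical
print(f"chi2 = {chi2:.3f}, df = {df}, critical = {critical}, reject H0: {reject_null}")
```

With these hypothetical counts the calculated value stays below the critical value, so the 3:1 hypothesis would not be rejected.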

Example 1: Testing for Proportions

H0: Horned lizards eat equal amounts of leaf cutter, carpenter and black ants.
HA: Horned lizards eat more of one species of ant than of the others.

| | Leaf Cutter Ants | Carpenter Ants | Black Ants | Total |
|---|---|---|---|---|
| Observed | 25 | 18 | 17 | 60 |
| Expected | 20 | 20 | 20 | 60 |
| O - E | 5 | -2 | -3 | 0 |
| $(O-E)^2/E$ | 1.25 | 0.20 | 0.45 | $\chi^2 = 1.90$ |
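Written out term by term, with each expected count equal to 60/3 = 20 under the null hypothesis, the chi-square total for the ant counts is:

```latex
\chi^2 = \frac{(25-20)^2}{20} + \frac{(18-20)^2}{20} + \frac{(17-20)^2}{20}
       = 1.25 + 0.20 + 0.45
       = 1.90
```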

Critical value from the Chi-Square Distribution table ($\alpha = 0.05$, 2 degrees of freedom): $\chi^2 = 5.991$

Critical value: $\chi^2 = 5.991$. Our calculated value: $\chi^2 = 1.90$

*If the critical value > your calculated value, then you do not reject your null hypothesis: the difference between the observed and expected counts can be explained by chance.

5.991 > 1.90 ∴ We do not reject our null hypothesis.


SAS: Example 1

SAS program annotations:
◦ Included to format the table
◦ Define your data
◦ Indicate what you want in your output


SAS: What does the p-value mean?

“The exact p-value for a nondirectional test is the sum of probabilities for the table having a test statistic greater than or equal to the value of the observed test statistic.”

High p-value: if the null hypothesis is true, there is a high probability of a test statistic at least as large as the observed one. Do not reject the null hypothesis.

Low p-value: if the null hypothesis is true, there is a low probability of a test statistic at least as large as the observed one. Reject the null hypothesis.
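To make the p-value concrete, here is a small sketch for the ant example (calculated value 1.90, 2 degrees of freedom). It relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exp(-x/2). The function name `chi2_upper_tail_df2` is made up for this sketch, and the result is computed here rather than copied from the SAS output.

```python
import math


def chi2_upper_tail_df2(x):
    """P(X >= x) for a chi-square variable with exactly 2 degrees of freedom."""
    if x < 0:
        raise ValueError("a chi-square value cannot be negative")
    return math.exp(-x / 2)


alpha = 0.05
chi2_ants = 1.90  # calculated value from Example 1

p_value = chi2_upper_tail_df2(chi2_ants)
reject_null = p_value < alpha
print(f"p = {p_value:.4f}, reject H0: {reject_null}")

# Sanity check: the tabulated critical value 5.991 should leave about 5% in the tail.
print(f"tail beyond 5.991: {chi2_upper_tail_df2(5.991):.4f}")
```

The p-value is well above 0.05, which agrees with the table-based decision not to reject the null hypothesis.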

SAS: Example 1

The p-value is high: under the null hypothesis, a chi-square value larger than our calculated chi-square statistic is likely.

We do not reject our null hypothesis.


Example 2: Testing Association

H0: Gender and eye colour are not associated with each other.
HA: Gender and eye colour are associated with each other.

proc freq options:
◦ cellchi2 = displays how much each cell contributes to the overall chi-squared value
◦ nocol = do not display column percentages
◦ norow = do not display row percentages
◦ chisq = display chi-square statistics

Example 2: More SAS Examples

Degrees of freedom: $(2-1)(3-1) = 1 \times 2 = 2$

The p-value is high (0.7825): under the null hypothesis, a chi-square value larger than our calculated chi-square statistic is likely.

We do not reject our null hypothesis.

If there were an association, we could check which interactions describe it by looking at how much each cell contributes to the overall chi-square value.
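The per-cell contributions can also be computed by hand. The sketch below derives expected counts from the row and column totals, then the contribution of each cell, the overall chi-square value, and the degrees of freedom. The 2 x 3 gender-by-eye-colour counts are hypothetical (the tutorial's data are not reproduced in this transcript), and `contingency_chi_square` is a name invented for this illustration.

```python
def contingency_chi_square(table):
    """Return expected counts, per-cell contributions, chi-square total and df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Contribution of each cell: (O - E)^2 / E.
    cell_chi2 = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    chi2 = sum(sum(row) for row in cell_chi2)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return expected, cell_chi2, chi2, df


# Hypothetical counts: rows are two genders, columns are three eye colours.
eye_colour_by_gender = [
    [20, 15, 10],
    [18, 17, 12],
]

expected, cell_chi2, chi2, df = contingency_chi_square(eye_colour_by_gender)

# The cell with the largest contribution is where observed and expected differ most.
largest, position = max(
    (value, (i, j))
    for i, row in enumerate(cell_chi2)
    for j, value in enumerate(row)
)
print(f"chi2 = {chi2:.3f}, df = {df}, largest cell contribution at {position}")
```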

Limitations

No expected frequency should be less than 1

No more than 1/5 of the expected frequencies should be less than 5
◦ To correct for this, you can collect larger samples or combine your data for the smaller expected categories until their combined value is 5 or more

Yates Correction*
◦ When there is only 1 degree of freedom, the regular chi-square test should not be used
◦ Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O-E term, then continue as usual with the new corrected values
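The two rules on expected frequencies are easy to check before running a test. The following is a minimal sketch; `check_expected_frequencies` is a hypothetical helper, and the second list of expected counts is invented to show a failing case.

```python
def check_expected_frequencies(expected):
    """Check the two rules of thumb on a flat list of expected counts."""
    expected = list(expected)
    below_1 = sum(1 for e in expected if e < 1)
    below_5 = sum(1 for e in expected if e < 5)
    return {
        "none_below_1": below_1 == 0,
        # "No more than 1/5 below 5", written with integers to avoid rounding issues.
        "at_most_one_fifth_below_5": 5 * below_5 <= len(expected),
        "ok": below_1 == 0 and 5 * below_5 <= len(expected),
    }


# Expected counts from the heart disease example in this tutorial.
heart_disease_expected = [12.65, 9.35, 10.35, 7.65]
print(check_expected_frequencies(heart_disease_expected))

# Hypothetical expected counts where two of five categories fall below 5.
sparse_expected = [3.0, 4.0, 10.0, 12.0, 15.0]
print(check_expected_frequencies(sparse_expected))
```

When the check fails, the remedies listed above apply: collect a larger sample or combine the smaller categories.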

What do these mean?

Likelihood Ratio Chi-Square
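As a rough illustration, the likelihood ratio chi-square is commonly defined as $G^2 = 2 \sum O \ln(O/E)$. This transcript does not spell out the formula, so treat that definition as an assumption of the sketch. The code applies it to the ant counts from Example 1 and prints the Pearson value alongside for comparison; both function names are invented here.

```python
import math


def likelihood_ratio_chi_square(observed, expected):
    """G^2 = 2 * sum(O * ln(O / E)); cells with O = 0 contribute nothing."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E, as used in the rest of the tutorial."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [25, 18, 17]   # ant counts from Example 1
expected = [20, 20, 20]

g2 = likelihood_ratio_chi_square(observed, expected)
x2 = pearson_chi_square(observed, expected)
print(f"likelihood ratio G2 = {g2:.3f}, Pearson chi2 = {x2:.2f}")
```

The two statistics are close for these counts, and both are compared against the same chi-square distribution.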

Continuity-Adjusted Chi-Square Test

Mantel-Haenszel Chi-Square Test

$Q_{MH} = (n-1)r^2$

$r^2$ is the squared Pearson correlation coefficient between the row and column variables (which also measures the linear association between row and column)
◦ http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000659.htm

Tests the alternative hypothesis that there is a linear association between the row and column variable

Follows a Chi-square distribution with 1 degree of freedom
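The formula can be tried on the 2 x 2 heart disease counts used later in this tutorial. The sketch assumes integer scores 1, 2, ... for the row and column levels (an assumption about scoring, not something stated on the slide) and computes the Pearson correlation directly from the cell counts. The function name is invented for this illustration.

```python
import math


def mantel_haenszel_chi_square(table):
    """Return (Q_MH, r) using scores 1, 2, ... for row and column levels."""
    row_scores = range(1, len(table) + 1)
    col_scores = range(1, len(table[0]) + 1)
    cells = [
        (rs, cs, table[i][j])
        for i, rs in enumerate(row_scores)
        for j, cs in enumerate(col_scores)
    ]
    n = sum(count for _, _, count in cells)

    # Weighted means of the row and column scores.
    mean_r = sum(rs * count for rs, _, count in cells) / n
    mean_c = sum(cs * count for _, cs, count in cells) / n

    # Weighted sums of squares and cross-products.
    ss_r = sum(count * (rs - mean_r) ** 2 for rs, _, count in cells)
    ss_c = sum(count * (cs - mean_c) ** 2 for _, cs, count in cells)
    ss_rc = sum(count * (rs - mean_r) * (cs - mean_c) for rs, cs, count in cells)

    r = ss_rc / math.sqrt(ss_r * ss_c)
    return (n - 1) * r ** 2, r


heart_disease = [[15, 7], [8, 10]]
q_mh, r = mantel_haenszel_chi_square(heart_disease)
print(f"r = {r:.4f}, Q_MH = {q_mh:.4f}")
```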

Phi Coefficient

Contingency Coefficient

Cramer’s V
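All three measures rescale the chi-square value so that tables of different sizes can be compared. The sketch below uses their usual textbook definitions, which are an assumption here because this transcript does not give the formulas: $\phi = \sqrt{\chi^2/n}$, $C = \sqrt{\chi^2/(\chi^2+n)}$, and $V = \sqrt{\chi^2/(n \cdot \min(r-1, c-1))}$. It returns the magnitude of phi only, and the function name is hypothetical.

```python
import math


def association_measures(chi2, n, n_rows, n_cols):
    """Return (phi magnitude, contingency coefficient, Cramer's V)."""
    phi = math.sqrt(chi2 / n)
    contingency = math.sqrt(chi2 / (chi2 + n))
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return phi, contingency, cramers_v


# Uncorrected chi-square total (2.28) and sample size (40) from the
# 2 x 2 heart disease table in this tutorial.
phi, contingency, cramers_v = association_measures(2.28, 40, 2, 2)
print(f"phi = {phi:.4f}, C = {contingency:.4f}, V = {cramers_v:.4f}")
```

For a 2 x 2 table the magnitude of phi and Cramer's V coincide, because min(r-1, c-1) = 1.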

Yates & 2 x 2 Contingency Tables

H0: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.

Calculate degrees of freedom: $(c-1)(r-1) = 1 \times 1 = 1$
We need to use the YATES CORRECTION

Chi-square contributions before the correction:

| | High Cholesterol | Low Cholesterol | Total |
|---|---|---|---|
| Heart Disease | 15 | 7 | 22 |
| Expected | 12.65 | 9.35 | 22 |
| Chi-Square | 0.44 | 0.59 | 1.03 |
| No Heart Disease | 8 | 10 | 18 |
| Expected | 10.35 | 7.65 | 18 |
| Chi-Square | 0.53 | 0.72 | 1.25 |
| TOTAL | 23 | 17 | 40 |

Chi-Square Total: 2.28

Chi-square contributions after the Yates correction:

| | High Cholesterol | Low Cholesterol | Total |
|---|---|---|---|
| Heart Disease | 15 | 7 | 22 |
| Expected | 12.65 | 9.35 | 22 |
| Chi-Square | 0.27 | 0.37 | 0.64 |
| No Heart Disease | 8 | 10 | 18 |
| Expected | 10.35 | 7.65 | 18 |
| Chi-Square | 0.33 | 0.45 | 0.78 |
| TOTAL | 23 | 17 | 40 |

Chi-Square Total: 1.42

For the Heart Disease / High Cholesterol cell: $\frac{(|15 - 12.65| - 0.5)^2}{12.65} = 0.27$
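The corrected table can be reproduced with a short sketch that derives each expected count from the row and column totals and then applies the correction exactly as described in this tutorial (subtract 0.5 from |O - E| before squaring). The function name is made up for this illustration.

```python
def yates_corrected_chi_square(table):
    """Return the corrected per-cell contributions (row by row) and their sum."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    cells = []
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates correction: shrink |O - E| by 0.5 before squaring.
            cells.append((abs(observed - expected) - 0.5) ** 2 / expected)
    return cells, sum(cells)


# Rows: heart disease / no heart disease; columns: high / low cholesterol.
heart_disease = [[15, 7], [8, 10]]
cells, total = yates_corrected_chi_square(heart_disease)
print([round(c, 2) for c in cells], round(total, 4))
```

The rounded cell values match the table. The unrounded total is about 1.41; the 1.42 shown above comes from adding the already-rounded cell values, and the difference does not affect the conclusion.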

Critical value from the Chi-Square Distribution table ($\alpha = 0.05$, 1 degree of freedom): $\chi^2 = 3.841$


3.841 > 1.42 ∴ We do not reject our null hypothesis.
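The same decision can be expressed as a p-value. For exactly 1 degree of freedom, the upper-tail probability of the chi-square distribution equals erfc(sqrt(x/2)), which Python's standard library can evaluate. This is a sketch with an invented function name; the p-value is computed here, not read from the SAS output, and it is the nondirectional chi-square p-value.

```python
import math


def chi2_upper_tail_df1(x):
    """P(X >= x) for a chi-square variable with exactly 1 degree of freedom."""
    if x < 0:
        raise ValueError("a chi-square value cannot be negative")
    return math.erfc(math.sqrt(x / 2))


alpha = 0.05
chi2_yates = 1.42  # Yates-corrected total from the table

p_value = chi2_upper_tail_df1(chi2_yates)
reject_null = p_value < alpha
print(f"p = {p_value:.3f}, reject H0: {reject_null}")

# Sanity check: the tabulated critical value 3.841 should leave about 5% in the tail.
print(f"tail beyond 3.841: {chi2_upper_tail_df1(3.841):.4f}")
```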


Fisher’s Exact Test

Left: Use this one-sided test when the alternative to independence is negative association between the variables. These observations tend to lie in the lower left and upper right cells of the table. Small p-value = Likely negative association.

Right: Use this one-sided test when the alternative to independence is positive association between the variables. These observations tend to lie in the upper left and lower right cells of the table. Small p-value = Likely positive association.

Two-Tail: Use this when there is no prior alternative.
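The three p-values can be illustrated with the hypergeometric distribution of the upper-left cell, given fixed row and column totals. In this sketch, which is an illustration under stated assumptions rather than SAS's implementation, "left" sums the probabilities of tables whose upper-left count is at most the observed one, "right" sums those at least as large, and "two_tail" sums all tables no more probable than the observed table. The function name and the returned keys are invented here.

```python
from math import comb


def fisher_exact_2x2(table):
    """Left, right and two-tail Fisher p-values for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    # Possible values of the upper-left cell when all margins are fixed.
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    denominator = comb(n, col1)
    probs = {
        x: comb(row1, x) * comb(row2, col1 - x) / denominator
        for x in range(lowest, highest + 1)
    }

    observed = probs[a]
    left = sum(p for x, p in probs.items() if x <= a)
    right = sum(p for x, p in probs.items() if x >= a)
    # Small relative tolerance so ties with the observed table are included.
    two_tail = sum(p for p in probs.values() if p <= observed * (1 + 1e-9))
    return {
        "observed": observed,
        "left": left,
        "right": right,
        "two_tail": min(two_tail, 1.0),
    }


# Rows: heart disease / no heart disease; columns: high / low cholesterol.
result = fisher_exact_2x2([[15, 7], [8, 10]])
print({name: round(value, 4) for name, value in result.items()})
```

Because the heart disease counts are concentrated in the upper left and lower right cells, the right-sided p-value is the smaller of the two one-sided values.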

Yates & 2 x 2 Contingency Tables


H0: Heart Disease is not associated with cholesterol levels.

HA: Heart Disease is more likely in patients with a high cholesterol diet.

Conclusion

The Chi-square test is important in testing the association between variables and/or checking whether one’s expected proportions match the results of one’s experiment

There are multiple chi-square tests, each suited to a specific sample size, degrees of freedom, and number of categories

We can use SAS to conduct Chi-square tests on our data by using proc freq

References

Chi-Square Test Descriptions:
◦ http://www.enviroliteracy.org/pdf/materials/1210.pdf
◦ http://129.123.92.202/biol1020/Statistics/Appendix%206%20%20The%20Chi-Square%20TEst.pdf

Ozdemir T and Eyduran E. 2005. Comparison of chi-square and likelihood ratio chi-square tests: power of test. Journal of Applied Sciences Research. 1(2):242-244.

SAS Support website, “FREQ procedure”: http://www.sas.com/index.html

YouTube Chi-square SAS Tutorial (user: mbate001): http://www.youtube.com/watch?v=ACbQ8FJTq7k