Chi-Square Tests, Chapter 14: Chi-Square Test for Independence; Chi-Square Tests for Goodness-of-Fit


Upload: reginald-robertson

Post on 13-Jan-2016


TRANSCRIPT

Page 1: Chi-Square Tests

Chapter 14

Chi-Square Test for Independence

Chi-Square Tests for Goodness-of-Fit

Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved.

McGraw-Hill/Irwin

Page 2: Chi-Square Test for Independence

Chi-Square Test

• In a test of independence for an $r \times c$ contingency table, the hypotheses are:

$H_0$: Variable A is independent of variable B

$H_1$: Variable A is not independent of variable B

• Use the chi-square test for independence to test these hypotheses.

• This non-parametric test is based on frequencies.

• The $n$ data pairs are classified into $c$ columns and $r$ rows, and then the observed frequency $f_{jk}$ is compared with the expected frequency $e_{jk}$.

Page 3: Chi-Square Test for Independence

Chi-Square Distribution

• The critical value comes from the chi-square probability distribution with $\nu$ degrees of freedom.

• $\nu$ = degrees of freedom = $(r - 1)(c - 1)$

where $r$ = number of rows in the table

$c$ = number of columns in the table

• Appendix E contains critical values for right-tail areas of the chi-square distribution.

• The mean of a chi-square distribution is $\nu$, with variance $2\nu$.

Page 4: Chi-Square Test for Independence

Expected Frequencies

• Assuming that $H_0$ is true, the expected frequency of row $j$ and column $k$ is:

$e_{jk} = R_j C_k / n$

where $R_j$ = total for row $j$ ($j = 1, 2, \ldots, r$)

$C_k$ = total for column $k$ ($k = 1, 2, \ldots, c$)

$n$ = sample size
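
The expected-frequency formula $e_{jk} = R_j C_k / n$ can be sketched in a few lines of Python. This is only an illustration: the function name `expected_frequencies` and the 2 x 3 table of counts are assumptions made up for the example, not material from the slides.

```python
def expected_frequencies(observed):
    """Return e_jk = R_j * C_k / n for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]        # R_j, one per row
    col_totals = [sum(col) for col in zip(*observed)]  # C_k, one per column
    n = sum(row_totals)                                # sample size
    return [[r_j * c_k / n for c_k in col_totals] for r_j in row_totals]


# Hypothetical 2 x 3 contingency table (illustrative counts only)
observed = [[20, 30, 50],
            [30, 30, 40]]
expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Because both row totals are equal in this made-up table, each row gets the same expected counts; with unequal row totals the expected counts scale with $R_j$.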

Steps in Testing the Hypotheses

• Step 1: State the Hypotheses

$H_0$: Variable A is independent of variable B

$H_1$: Variable A is not independent of variable B

Page 5: Chi-Square Test for Independence

Steps in Testing the Hypotheses

• Step 2: Specify the Decision Rule

Calculate $\nu = (r - 1)(c - 1)$.

For a given $\alpha$, look up the right-tail critical value ($\chi^2_R$) from Appendix E or by using Excel.

Reject $H_0$ if the test statistic exceeds $\chi^2_R$.

• Step 3: Calculate the Expected Frequencies

$e_{jk} = R_j C_k / n$

Page 6: Chi-Square Test for Independence

Steps in Testing the Hypotheses

• Step 4: Calculate the Test Statistic

The chi-square test statistic is

$\chi^2_{\text{calc}} = \sum_{j=1}^{r} \sum_{k=1}^{c} \dfrac{(f_{jk} - e_{jk})^2}{e_{jk}}$

• Step 5: Make the Decision

Reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_R$ or if the p-value < $\alpha$.
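
Steps 2 through 5 can be put together in a short Python sketch. It is a minimal illustration under stated assumptions: the function name `chi_square_independence` and the table of counts are hypothetical, and the critical value is passed in by hand (5.991 is the tabulated right-tail value for 2 degrees of freedom at a 5% significance level) rather than computed.

```python
def chi_square_independence(observed, critical_value):
    """Chi-square test of independence for an r x c table of counts.

    Returns (test statistic, degrees of freedom, reject H0?).
    """
    row_totals = [sum(row) for row in observed]        # R_j
    col_totals = [sum(col) for col in zip(*observed)]  # C_k
    n = sum(row_totals)

    stat = 0.0
    for j, row in enumerate(observed):
        for k, f_jk in enumerate(row):
            e_jk = row_totals[j] * col_totals[k] / n   # expected frequency
            stat += (f_jk - e_jk) ** 2 / e_jk

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Reject H0 when the test statistic exceeds the right-tail critical value
    return stat, df, stat > critical_value


# Hypothetical 2 x 3 table; critical value looked up for df = 2, alpha = .05
observed = [[20, 30, 50],
            [30, 30, 40]]
stat, df, reject = chi_square_independence(observed, critical_value=5.991)
print(round(stat, 4), df, reject)
```

Passing the critical value as an argument mirrors the table lookup described in Step 2; a statistics library could supply it instead.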

Page 7: Chi-Square Test for Independence

Small Expected Frequencies

• The chi-square test is unreliable if the expected frequencies are too small.

• Rules of thumb:

• Cochran's Rule requires that $e_{jk} > 5$ for all cells.

• A less strict rule allows up to 20% of the cells to have $e_{jk} < 5$.

• Most agree that a chi-square test is infeasible if $e_{jk} < 1$ in any cell.

• If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.
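
The rules of thumb for small expected frequencies are easy to check mechanically. The sketch below is one possible way to do it; the function name `check_expected_frequencies`, the dictionary keys, and the expected counts are all hypothetical choices for illustration.

```python
def check_expected_frequencies(expected):
    """Apply the rules of thumb to a table of expected frequencies."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return {
        # Cochran's Rule: every expected frequency exceeds 5
        "cochran_rule_met": all(e > 5 for e in cells),
        # Less strict rule: no more than 20% of cells fall below 5
        "at_most_20pct_below_5": share_below_5 <= 0.20,
        # Test is generally considered infeasible if any cell is below 1
        "any_cell_below_1": any(e < 1 for e in cells),
    }


# Hypothetical expected frequencies with a sparse last column
expected = [[12.0, 8.0, 0.8],
            [18.0, 12.0, 1.2]]
result = check_expected_frequencies(expected)
print(result)
```

Here two of six cells fall below 5 and one falls below 1, so all three checks flag a problem; merging the sparse last column with its neighbor would be the natural next step.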

Page 8: Chi-Square Test for Goodness-of-Fit

Why Do a Chi-Square Test on Numerical Data?

• The researcher may believe there's a relationship between X and Y, but doesn't want to use regression.

• There are outliers or anomalies that prevent us from assuming that the data came from a normal population.

• The researcher has numerical data for one variable but not the other.

Purpose of the Test

• The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population.

• The chi-square test will be used because it is versatile and easy to understand.

• Goodness-of-fit tests may lack power in small samples. As a guideline, a chi-square goodness-of-fit test should be avoided if $n < 25$.

Page 9: Chi-Square Test for Goodness-of-Fit

Multinomial GOF Test

• A multinomial distribution is defined by any $k$ probabilities $\pi_1, \pi_2, \ldots, \pi_k$ that sum to unity.

• For example, consider the following "official" proportions of M&M colors. The hypotheses are

$H_0$: $\pi_1 = .13$, $\pi_2 = .13$, $\pi_3 = .24$, $\pi_4 = .20$, $\pi_5 = .16$, $\pi_6 = .14$

$H_1$: At least one of the $\pi_j$ differs from the hypothesized value

Page 10: Chi-Square Test for Goodness-of-Fit

Hypotheses for GOF

• The hypotheses are:

$H_0$: The population follows a _____ distribution

$H_1$: The population does not follow a _____ distribution

• The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).

Page 11: Chi-Square Test for Goodness-of-Fit

Test Statistic and Degrees of Freedom for GOF

• Assuming $n$ observations, the observations are grouped into $c$ classes and then the chi-square test statistic is found using:

$\chi^2_{\text{calc}} = \sum_{j=1}^{c} \dfrac{(f_j - e_j)^2}{e_j}$

where $f_j$ = the observed frequency of observations in class $j$

$e_j$ = the expected frequency in class $j$ if $H_0$ were true
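
As a rough illustration of the goodness-of-fit statistic, the Python sketch below applies it to the hypothesized M&M proportions from the multinomial example. The observed counts are invented for the example (no sample data appear in the slides), the function name `gof_statistic` is hypothetical, and the sketch assumes $c - 1$ degrees of freedom because no parameters are estimated from the data.

```python
def gof_statistic(observed, proportions):
    """Chi-square GOF statistic for observed counts vs. hypothesized proportions."""
    n = sum(observed)
    expected = [n * p for p in proportions]            # e_j = n * pi_j
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - 1                             # assumes no estimated parameters
    return stat, df, expected


# Hypothesized proportions from the M&M example
proportions = [0.13, 0.13, 0.24, 0.20, 0.16, 0.14]
# Hypothetical observed counts for a sample of n = 100 (illustrative only)
observed = [10, 16, 26, 18, 17, 13]

stat, df, expected = gof_statistic(observed, proportions)
print(round(stat, 4), df)
```

The resulting statistic would then be compared with the right-tail critical value for 5 degrees of freedom (11.070 at a 5% significance level).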

Page 12: Uniform Goodness-of-Fit Test

Uniform Distribution

• The uniform goodness-of-fit test is a special case of the multinomial in which every value has the same chance of occurrence.

• The chi-square test for a uniform distribution compares all $c$ groups simultaneously.

• The hypotheses are:

$H_0$: $\pi_1 = \pi_2 = \cdots = \pi_c = 1/c$

$H_1$: Not all $\pi_j$ are equal

Page 13: Uniform Goodness-of-Fit Test

Uniform GOF Test: Grouped Data

• The test can be performed on data that are already tabulated into groups.

• Calculate the expected frequency $e_j$ for each cell.

• The degrees of freedom are $\nu = c - 1$ since no parameters are estimated for the uniform distribution.

• Obtain the critical value $\chi^2_\alpha$ from Appendix E for the desired level of significance $\alpha$.

• The p-value can be obtained from Excel.

• Reject $H_0$ if p-value < $\alpha$.
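
The grouped-data procedure can be sketched in Python as follows. The slides obtain the p-value from Excel; as a stand-in, this sketch computes the right-tail area with a series expansion of the regularized incomplete gamma function, using only the standard library. The function names `chi2_right_tail` and `uniform_gof` and the group counts are assumptions for illustration.

```python
import math


def chi2_right_tail(x, df):
    """Right-tail area P(X > x) of a chi-square distribution with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= t / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-t + a * math.log(t) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def uniform_gof(observed, alpha=0.05):
    """Uniform GOF test on counts that are already tabulated into c groups."""
    c = len(observed)
    e = sum(observed) / c                              # e_j = n / c for every group
    stat = sum((f - e) ** 2 / e for f in observed)
    df = c - 1                                         # no parameters estimated
    p_value = chi2_right_tail(stat, df)
    return stat, df, p_value, p_value < alpha


# Hypothetical counts in five groups (n = 100)
observed = [18, 22, 25, 15, 20]
stat, df, p_value, reject = uniform_gof(observed)
print(round(stat, 2), df, round(p_value, 4), reject)
```

With these made-up counts the p-value is well above .05, so the hypothesis of a uniform distribution would not be rejected.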

Page 14: Uniform Goodness-of-Fit Test

Uniform GOF Test: Raw Data

• First form $c$ bins of equal width $(X_{\max} - X_{\min})/c$ and create a frequency distribution.

• Calculate the observed frequency $f_j$ for each bin.

• Define $e_j = n/c$ and perform the chi-square calculations.

• The degrees of freedom are $\nu = c - 1$ since no parameters are estimated for the uniform distribution.

• Obtain the critical value from Appendix E for a given significance level $\alpha$ and make the decision.
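
For raw data, the binning step comes first. The Python sketch below shows one way to form equal-width bins and compute the statistic; the function name `uniform_gof_raw` and the sample values are hypothetical, and placing the maximum value in the last bin is a design choice of this sketch rather than something the slides specify.

```python
def uniform_gof_raw(data, c):
    """Bin raw data into c equal-width bins and compute the uniform GOF statistic."""
    x_min, x_max = min(data), max(data)
    if x_max == x_min:
        raise ValueError("data must contain at least two distinct values")
    width = (x_max - x_min) / c                        # (X_max - X_min) / c

    observed = [0] * c
    for x in data:
        j = min(int((x - x_min) / width), c - 1)       # x_max goes in the last bin
        observed[j] += 1

    e = len(data) / c                                  # e_j = n / c
    stat = sum((f - e) ** 2 / e for f in observed)
    return observed, stat, c - 1                       # df = c - 1


# Hypothetical raw sample of n = 35 values (illustrative only)
data = [x / 2 for x in range(30)] + [1.0, 1.2, 2.0, 13.0, 7.1]
observed, stat, df = uniform_gof_raw(data, c=5)
print(observed, round(stat, 4), df)
```

The statistic is then compared with the tabulated critical value for $c - 1$ degrees of freedom, as in the grouped-data case.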

Page 15: Uniform Goodness-of-Fit Test

Uniform GOF Test: Raw Data

• Calculate the mean and standard deviation of the uniform distribution as:

$\mu = (a + b)/2$

$\sigma = \sqrt{[(b - a + 1)^2 - 1]/12}$

• If the data are not skewed and the sample size is large ($n > 30$), then the mean is approximately normally distributed.

• So, test the hypothesized uniform mean using

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
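
The mean test can be sketched in Python, assuming the usual one-sample z statistic $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ with $\mu$ and $\sigma$ taken from the hypothesized uniform distribution on the integers $a$ through $b$. The function names and the sample of 36 die rolls are made up for illustration.

```python
import math


def uniform_mean_sd(a, b):
    """Mean and standard deviation of a uniform distribution on integers a..b."""
    mu = (a + b) / 2
    sigma = math.sqrt(((b - a + 1) ** 2 - 1) / 12)
    return mu, sigma


def z_for_uniform_mean(data, a, b):
    """z statistic comparing the sample mean with the hypothesized uniform mean."""
    mu, sigma = uniform_mean_sd(a, b)
    n = len(data)
    x_bar = sum(data) / n
    return (x_bar - mu) / (sigma / math.sqrt(n))


# Hypothetical sample: 36 rolls of a six-sided die (a = 1, b = 6)
data = [1, 2, 3, 4, 5, 6] * 5 + [6, 6, 5, 4, 6, 5]
mu, sigma = uniform_mean_sd(1, 6)
z = z_for_uniform_mean(data, 1, 6)
print(mu, round(sigma, 4), round(z, 4))
```

A z value of this size is well inside the usual two-tailed cutoffs of about plus or minus 1.96 at a 5% significance level, so the hypothesized mean would not be rejected for this made-up sample.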