Prepared by Lloyd R. Jaisingh

A PowerPoint Presentation Package to Accompany Applied Statistics in Business & Economics, 4th edition, David P. Doane and Lori E. Seward. McGraw-Hill/Irwin, Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.

Upload: jane-hoffman

Post on 31-Dec-2015



TRANSCRIPT

Page 1: Prepared by Lloyd R. Jaisingh

A PowerPoint Presentation Package to Accompany

Applied Statistics in Business & Economics, 4th edition

David P. Doane and Lori E. Seward

Prepared by Lloyd R. Jaisingh

McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.

Page 2: Prepared by Lloyd R. Jaisingh

Chapter 15: Chi-Square Tests

Chapter Contents

15.1 Chi-Square Test for Independence

15.2 Chi-Square Tests for Goodness-of-Fit

15.3 Uniform Goodness-of-Fit Test

15.4 Poisson Goodness-of-Fit Test

15.5 Normal Chi-Square Goodness-of-Fit Test

15.6 ECDF Tests (Optional)

Page 3: Prepared by Lloyd R. Jaisingh

Chapter 15: Chi-Square Tests

Chapter Learning Objectives

LO15-1: Recognize a contingency table.

LO15-2: Find degrees of freedom and use the chi-square table of critical values.

LO15-3: Perform a chi-square test for independence on a contingency table.

LO15-4: Perform a goodness-of-fit (GOF) test for a uniform distribution.

LO15-5: Explain the GOF test for a Poisson distribution.

LO15-6: Use computer software to perform a chi-square GOF test for normality.

LO15-7: State advantages of ECDF tests as compared to chi-square GOF tests.

Page 4: Prepared by Lloyd R. Jaisingh

15.1 Chi-Square Test for Independence

Contingency Tables

LO15-1: Recognize a contingency table.

• A contingency table is a cross-tabulation of n paired observations into categories.

• Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) heading.

• For example: [example table not reproduced in this transcript]
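Since the slide's example table did not survive, here is a minimal sketch of how paired observations might be cross-tabulated into a contingency table. The function name `cross_tabulate` and the payment/store data are hypothetical, chosen only for illustration.

```python
from collections import Counter

def cross_tabulate(pairs):
    """Count paired observations (row_category, col_category) into a table."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})   # row headings
    cols = sorted({c for _, c in pairs})   # column headings
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical data: (payment method, store type) for n = 8 customers.
pairs = [("cash", "urban"), ("card", "urban"), ("card", "rural"),
         ("cash", "rural"), ("card", "urban"), ("card", "urban"),
         ("cash", "urban"), ("card", "rural")]
rows, cols, table = cross_tabulate(pairs)
for heading, counts_row in zip(rows, table):
    print(heading, counts_row)
```

Each entry of `table` is the count of observations falling in one row category and one column category, and the counts sum to n.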

Page 5: Prepared by Lloyd R. Jaisingh

15.1 Chi-Square Test for Independence

Chi-Square Test

LO15-3: Perform a chi-square test for independence on a contingency table.

LO15-2: Find degrees of freedom and use the chi-square table of critical values.

• In a test of independence for an r × c contingency table, the hypotheses are

H0: Variable A is independent of variable B

H1: Variable A is not independent of variable B

• Use the chi-square test for independence to test these hypotheses.

• This non-parametric test is based on frequencies.

• The n data pairs are classified into c columns and r rows, and then the observed frequency $f_{jk}$ is compared with the expected frequency $e_{jk}$.

• The critical value comes from the chi-square probability distribution with d.f. degrees of freedom. (See Appendix E for table values.)

d.f. = degrees of freedom = (r – 1)(c – 1)

where r = number of rows in the table

c = number of columns in the table

Page 6: Prepared by Lloyd R. Jaisingh

15.1 Chi-Square Test for Independence

LO15-3

Expected Frequencies

• Assuming that H0 is true, the expected frequency of row j and column k is:

$e_{jk} = R_j C_k / n$

where $R_j$ = total for row j (j = 1, 2, …, r)

$C_k$ = total for column k (k = 1, 2, …, c)

n = sample size

Steps in Testing the Hypotheses

• Step 1: State the Hypotheses.

  • H0: Variable A is independent of variable B

  • H1: Variable A is not independent of variable B

• Step 2: Specify the Decision Rule.

  • Calculate d.f. = (r – 1)(c – 1)

  • For a given α, look up the right-tail critical value ($\chi^2_R$) from Appendix E or by using Excel.

• Step 3: Calculate the Expected Frequencies.

  • $e_{jk} = R_j C_k / n$
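The expected-frequency formula and the degrees of freedom can be sketched in a few lines. This is an illustrative sketch, not the textbook's software; the function names and the 2 × 2 table are hypothetical.

```python
def expected_frequencies(observed):
    """e_jk = R_j * C_k / n for every cell of an r x c table."""
    row_totals = [sum(row) for row in observed]          # R_j
    col_totals = [sum(col) for col in zip(*observed)]    # C_k
    n = sum(row_totals)                                  # sample size
    return [[rj * ck / n for ck in col_totals] for rj in row_totals]

def degrees_of_freedom(observed):
    """d.f. = (r - 1)(c - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

observed = [[20, 30], [30, 20]]             # hypothetical counts
expected = expected_frequencies(observed)   # each cell: 50 * 50 / 100 = 25
df = degrees_of_freedom(observed)           # (2 - 1)(2 - 1) = 1
```

The expected frequencies keep the same row and column totals as the observed table, which is a quick sanity check on the arithmetic.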

Page 7: Prepared by Lloyd R. Jaisingh

15.1 Chi-Square Test for Independence

LO15-3

Steps in Testing the Hypotheses

• Step 4: Calculate the Test Statistic.

  • The chi-square test statistic is

$\chi^2 = \sum_{j=1}^{r} \sum_{k=1}^{c} (f_{jk} - e_{jk})^2 / e_{jk}$

• Step 5: Make the Decision.

  • Reject H0 if the test statistic > $\chi^2_R$ or if the p-value ≤ α.

Small Expected Frequencies

• The chi-square test is unreliable if the expected frequencies are too small.

• Rules of thumb:

  • Cochran's Rule requires that $e_{jk}$ > 5 for all cells.

  • A less strict rule allows up to 20% of the cells to have $e_{jk}$ < 5.

• Most agree that a chi-square test is infeasible if $e_{jk}$ < 1 in any cell.

• If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.
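As a rough sketch of Step 4 and the small-frequency rules of thumb, the snippet below sums the cell contributions and reports how many expected frequencies are small. The function names and the table values are hypothetical; the expected frequencies are assumed to have been computed already from the row and column totals.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_jk - e_jk)^2 / e_jk over all cells."""
    return sum((f - e) ** 2 / e
               for f_row, e_row in zip(observed, expected)
               for f, e in zip(f_row, e_row))

def small_cell_report(expected):
    """Summarize the rules of thumb for small expected frequencies."""
    cells = [e for row in expected for e in row]
    return {
        "share_below_5": sum(e < 5 for e in cells) / len(cells),
        "any_below_1": any(e < 1 for e in cells),
    }

observed = [[20, 30], [30, 20]]            # hypothetical counts
expected = [[25.0, 25.0], [25.0, 25.0]]    # R_j * C_k / n for that table
stat = chi_square_statistic(observed, expected)  # 4 cells x (5**2 / 25) = 4.0
report = small_cell_report(expected)
```

With d.f. = 1 and α = .05 the tabled right-tail critical value is 3.841, so a statistic of 4.0 would lead to rejecting H0 in this made-up example.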

Page 8: Prepared by Lloyd R. Jaisingh

15.1 Chi-Square Test for Independence

LO15-3

Cross-Tabulating Raw Data

• Chi-square tests for independence can also be used to analyze quantitative variables by coding them into categories.

Figure 14.6 [figure not reproduced in this transcript]

Why Do a Chi-Square Test on Numerical Data?

• The researcher may believe there's a relationship between X and Y, but doesn't want to use regression.

• There are outliers or anomalies that prevent us from assuming that the data came from a normal population.

• The researcher has numerical data for one variable but not the other.

Test of Two Proportions

• For a 2 × 2 contingency table, the chi-square test is equivalent to a two-tailed z test for two proportions, if the samples are large enough to ensure normality.

• The hypotheses are:

$H_0: \pi_1 = \pi_2$

$H_1: \pi_1 \neq \pi_2$
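The equivalence for a 2 × 2 table can be checked numerically: the square of the pooled two-proportion z statistic equals the chi-square statistic. The sketch below assumes the usual pooled-proportion z test; the function names and the counts (20 of 50 versus 30 of 50) are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for comparing two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic for the 2 x 2 table of successes and failures."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    n = n1 + n2
    return sum((observed[j][k] - row_totals[j] * col_totals[k] / n) ** 2
               / (row_totals[j] * col_totals[k] / n)
               for j in range(2) for k in range(2))

z = two_proportion_z(20, 50, 30, 50)
chi2 = chi_square_2x2(20, 50, 30, 50)
print(round(z ** 2, 6), round(chi2, 6))  # the two values agree
```

Because the chi-square statistic is the square of z, the chi-square test cannot distinguish the direction of the difference, which is why it matches the two-tailed z test.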

Page 9: Prepared by Lloyd R. Jaisingh

15.2 Chi-Square Tests for Goodness-of-Fit

Purpose of the Test

• The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population.

• The chi-square test will be used because it is versatile and easy to understand.

Multinomial GOF Test

• A multinomial distribution is defined by any k probabilities $\pi_1, \pi_2, \ldots, \pi_k$ that sum to unity. For example,

$H_0$: $\pi_1 = .13$, $\pi_2 = .13$, $\pi_3 = .24$, $\pi_4 = .20$, $\pi_5 = .16$, $\pi_6 = .14$

$H_1$: At least one of the $\pi_j$ differs from the hypothesized value.

• No parameters are estimated (m = 0) and there are c = 6 classes, so the degrees of freedom are d.f. = c – m – 1 = 6 – 0 – 1 = 5.
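To make the multinomial example concrete, the sketch below turns the hypothesized proportions from H0 into expected frequencies. The sample size n = 200 and the function name are assumptions made only for illustration; the slide does not give a sample size.

```python
def multinomial_expected(proportions, n):
    """Expected frequency e_j = n * pi_j for each class."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [p * n for p in proportions]

pi = [0.13, 0.13, 0.24, 0.20, 0.16, 0.14]   # hypothesized values from H0
n = 200                                      # hypothetical sample size
expected = multinomial_expected(pi, n)
m = 0                                        # no parameters estimated
df = len(pi) - m - 1                         # c - m - 1 = 6 - 0 - 1 = 5
```

The expected frequencies would then be compared with the observed class counts using the chi-square statistic.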

Page 10: Prepared by Lloyd R. Jaisingh

15.2 Chi-Square Tests for Goodness-of-Fit

Hypotheses for GOF

• The hypotheses are:

H0: The population follows a _____ distribution

H1: The population does not follow a _____ distribution

• The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).

Test Statistic and Degrees of Freedom for GOF

$\chi^2 = \sum_{j=1}^{c} (f_j - e_j)^2 / e_j$

where $f_j$ = the observed frequency of observations in class j and $e_j$ = the expected frequency in class j if H0 were true.

• The test statistic follows the chi-square distribution with degrees of freedom d.f. = c – m – 1, where c is the number of classes used in the test and m is the number of parameters estimated.
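The slides obtain critical values from Appendix E or Excel. As a hedged alternative for readers without either, the sketch below computes the GOF statistic and a right-tail p-value using a series expansion of the incomplete gamma function. The function names and the counts are hypothetical, and the series is intended only for moderate statistic values.

```python
import math

def gof_statistic(observed, expected):
    """Sum of (f_j - e_j)^2 / e_j over the c classes."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

def chi_square_p_value(stat, df):
    """Right-tail chi-square area via the series for the regularized
    lower incomplete gamma function P(df/2, stat/2)."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = total = 1.0 / a
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= x / (a + k)      # next term: x^k / (a (a+1) ... (a+k))
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - lower

observed = [18, 22, 20]          # hypothetical class counts
expected = [20.0, 20.0, 20.0]
stat = gof_statistic(observed, expected)   # (4 + 4 + 0) / 20 = 0.4
p_value = chi_square_p_value(stat, df=2)   # c - m - 1 = 3 - 0 - 1 = 2
```

The decision rule is unchanged: reject H0 when the p-value is at or below the chosen α.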

Page 11: Prepared by Lloyd R. Jaisingh

15.3 Uniform Goodness-of-Fit Test

LO15-4: Perform a goodness-of-fit (GOF) test for a uniform distribution.

Uniform Distribution

• The uniform goodness-of-fit test is a special case of the multinomial in which every value has the same chance of occurrence.

• The chi-square test for a uniform distribution compares all c groups simultaneously.

• The hypotheses are:

$H_0$: $\pi_1 = \pi_2 = \cdots = \pi_c = 1/c$

$H_1$: Not all $\pi_j$ are equal

• The test can be performed on data that are already tabulated into groups.

• Calculate the expected frequency $e_j$ for each cell.

• The degrees of freedom are d.f. = c – 1 since there are no parameters for the uniform distribution.

• Obtain the critical value $\chi^2_\alpha$ from Appendix E for the desired level of significance α.

• The p-value can be obtained from Excel.

• Reject H0 if p-value ≤ α.

Page 12: Prepared by Lloyd R. Jaisingh

15.3 Uniform Goodness-of-Fit Test

LO15-4

Uniform GOF Test: Raw Data

• First form c bins of equal width and create a frequency distribution.

• Calculate the observed frequency $f_j$ for each bin.

• Define $e_j = n/c$.

• Perform the chi-square calculations.

• The degrees of freedom are d.f. = c – 1 since there are no parameters for the uniform distribution.

• Obtain the critical value from Appendix E for a given significance level α and make the decision.

• Maximize the test's power by defining bin width as [formula not preserved in this transcript]. (As a result, the expected frequencies will be as large as possible.)
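The raw-data procedure above can be sketched as follows. The bin limits `low` and `high` are assumed to be the hypothesized range of the uniform distribution, and the function name and data are hypothetical. The sample is deliberately tiny so the counts can be checked by hand; a real test would need larger expected frequencies.

```python
def bin_counts(data, c, low, high):
    """Count observations in c equal-width bins covering [low, high]."""
    width = (high - low) / c
    counts = [0] * c
    for x in data:
        j = min(int((x - low) / width), c - 1)  # x == high goes in last bin
        counts[j] += 1
    return counts

data = [0.5, 1.2, 2.7, 3.1, 3.9, 4.4, 5.0, 6.3, 7.8, 8.0, 9.1, 9.9]
c = 5
observed = bin_counts(data, c, low=0.0, high=10.0)   # f_j for each bin
expected = len(data) / c                             # e_j = n / c
stat = sum((f - expected) ** 2 / expected for f in observed)
df = c - 1                                           # no parameters estimated
```

Here every bin has the same expected frequency, so the statistic depends only on how unevenly the observed counts are spread across the bins.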

Page 13: Prepared by Lloyd R. Jaisingh

15.3 Uniform Goodness-of-Fit Test

LO15-4

Uniform GOF Test: Raw Data

• Calculate the mean and standard deviation of the uniform distribution as: [formulas not preserved in this transcript]

• If the data are not skewed and the sample size is large (n > 30), then the mean is approximately normally distributed.

• So, test the hypothesized uniform mean using: [formula not preserved in this transcript]
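The slide's formulas were lost in extraction. The sketch below assumes the standard results for a continuous uniform distribution on [a, b], namely mean (a + b)/2 and standard deviation sqrt((b - a)^2 / 12), together with the usual large-sample z statistic for a mean. These are assumptions about what the slide showed, and the function name and data are hypothetical.

```python
import math
from statistics import NormalDist, mean

def uniform_mean_z_test(data, a, b):
    """Two-tailed z test of the sample mean against the uniform(a, b) mean."""
    mu = (a + b) / 2                          # assumed uniform mean
    sigma = math.sqrt((b - a) ** 2 / 12)      # assumed uniform std. deviation
    z = (mean(data) - mu) / (sigma / math.sqrt(len(data)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return mu, sigma, z, p_value

# Hypothetical sample of n = 33 values whose mean is exactly 5.
data = [i % 11 for i in range(33)]
mu, sigma, z, p_value = uniform_mean_z_test(data, a=0, b=10)
```

A test of the mean alone cannot confirm uniformity, since many non-uniform distributions share the same mean, so it complements the chi-square test rather than replacing it.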

Page 14: Prepared by Lloyd R. Jaisingh

15.4 Poisson Goodness-of-Fit Test

LO15-5: Explain the GOF test for a Poisson distribution.

Poisson Data-Generating Situations

• In a Poisson distribution model, X represents the number of events per unit of time or space.

• X is a discrete nonnegative integer (X = 0, 1, 2, …).

• Event arrivals must be independent of each other.

• Sometimes called a model of rare events because X typically has a small mean.

Poisson Goodness-of-Fit Test

• The mean λ is the only parameter.

• If λ is unknown, it must be estimated from the sample.

• Use the estimated λ to find the Poisson probability P(X) for each value of X.

• Compute the expected frequencies.

• Perform the chi-square calculations.

• Make the decision.

• You may need to combine classes until expected frequencies become large enough for the test (at least until $e_j \geq 2$).

Page 15: Prepared by Lloyd R. Jaisingh

15.4 Poisson Goodness-of-Fit Test

LO15-5

Poisson GOF Test: Tabulated Data

• Calculate the sample mean as:

$\hat{\lambda} = \sum_{j=1}^{c} x_j f_j / n$

• Using this estimated mean, calculate the Poisson probabilities either by using the Poisson formula $P(x) = \lambda^x e^{-\lambda} / x!$ or Excel.

• For c classes with m = 1 parameter estimated, the degrees of freedom are d.f. = c – m – 1.

• Obtain the critical value for a given α from Appendix E.

• Make the decision.
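A minimal sketch of the tabulated-data procedure follows. It assumes the last class is open-ended ("3 or more" in the example) and, as a simplification, treats that class as its lower limit when estimating the mean. The function name and the frequencies are hypothetical.

```python
import math

def poisson_expected(x_values, freqs):
    """Estimate the mean and return expected frequencies per class.
    The last class is treated as 'x_values[-1] or more'."""
    n = sum(freqs)
    lam = sum(x * f for x, f in zip(x_values, freqs)) / n   # estimated mean
    probs = [lam ** x * math.exp(-lam) / math.factorial(x)
             for x in x_values[:-1]]
    probs.append(1.0 - sum(probs))      # open-ended top class gets the rest
    return lam, [n * p for p in probs]

x_values = [0, 1, 2, 3]                 # 3 means "3 or more"
freqs = [30, 40, 20, 10]                # hypothetical observed frequencies
lam, expected = poisson_expected(x_values, freqs)
m = 1                                   # one parameter (the mean) estimated
df = len(x_values) - m - 1              # c - m - 1 = 4 - 1 - 1 = 2
stat = sum((f - e) ** 2 / e for f, e in zip(freqs, expected))
```

Assigning the remaining probability to the top class keeps the expected frequencies summing to n, which the chi-square calculation relies on.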

Page 16: Prepared by Lloyd R. Jaisingh

15.5 Normal Chi-Square Goodness-of-Fit Test

LO15-6: Use computer software to perform a chi-square GOF test for normality.

Normal Data Generating Situations

• Two parameters, the mean μ and the standard deviation σ, fully describe the normal distribution.

• Unless μ and σ are known a priori, they must be estimated from a sample.

• Using these statistics, the chi-square goodness-of-fit test can be used.

Method 1: Standardizing the Data

• Transform the sample observations $x_1, x_2, \ldots, x_n$ into standardized values.
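Method 1 can be sketched as below, assuming the usual standardization $z_i = (x_i - \bar{x})/s$ with the sample mean and sample standard deviation (the slide's own formula was not preserved). The function name and data are hypothetical.

```python
from statistics import mean, stdev

def standardize(xs):
    """Return z_i = (x_i - x_bar) / s for each observation."""
    x_bar, s = mean(xs), stdev(xs)   # sample mean and sample std. deviation
    return [(x - x_bar) / s for x in xs]

z_values = standardize([2.0, 4.0, 6.0])   # mean 4, s = 2
print(z_values)
```

Once standardized, the observations can be counted into bins defined on the z scale and compared with standard normal areas.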

Page 17: Prepared by Lloyd R. Jaisingh

15.5 Normal Chi-Square Goodness-of-Fit Test

LO15-6

Method 2: Equal Bin Widths

• To obtain equal-width bins, divide the exact data range into c groups of equal width.

• Step 1: Count the sample observations in each bin to get observed frequencies $f_j$.

• Step 2: Convert the bin limits into standardized z-values by using the formula $z = (x - \bar{x})/s$.

• Step 3: Find the normal area within each bin assuming a normal distribution.

• Step 4: Find expected frequencies $e_j$ by multiplying each normal area by the sample size n.

• Classes may need to be collapsed from the ends inward to enlarge expected frequencies.
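Steps 2 through 4 might look like the sketch below. It assumes the first and last bins are treated as open-ended so that the normal areas sum to 1, and it uses hypothetical values for the cut-points, sample mean, standard deviation, and sample size.

```python
from statistics import NormalDist

def normal_expected(bin_limits, x_bar, s, n):
    """Expected frequencies for bins separated by interior cut-points.
    The first and last bins are assumed to be open-ended."""
    std_normal = NormalDist()
    # Step 2: standardize each limit; Step 3: cumulative normal area
    cdf = [0.0] + [std_normal.cdf((x - x_bar) / s) for x in bin_limits] + [1.0]
    # Step 4: area within each bin times the sample size
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

expected = normal_expected([40, 50, 60], x_bar=50, s=10, n=100)
print([round(e, 2) for e in expected])
```

The tail bins have the smallest expected frequencies, which is why classes are collapsed from the ends inward when the frequencies are too small.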

Page 18: Prepared by Lloyd R. Jaisingh

15.5 Normal Chi-Square Goodness-of-Fit Test

LO15-6

Method 3: Equal Expected Frequencies

• Define histogram bins in such a way that an equal number of observations would be expected within each bin under the null hypothesis.

• Define bin limits so that $e_j = n/c$.

• A normal area of 1/c in each of the c bins is desired.

• The first and last classes must be open-ended for a normal distribution, so to define c bins, we need c – 1 cut-points.

• The upper limit of bin j can be found directly by using Excel.

• Alternatively, find $z_j$ for bin j using Excel and then calculate the upper limit for bin j as $\bar{x} + z_j s$.

• Once the bins are defined, count the observations $f_j$ within each bin and compare them with the expected frequencies $e_j = n/c$.
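The cut-point calculation can be sketched with Python's standard library in place of Excel. The function name and the values of the sample mean and standard deviation are hypothetical.

```python
from statistics import NormalDist

def equal_frequency_cut_points(c, x_bar, s):
    """Return the c - 1 upper limits x_bar + z_j * s, where z_j cuts off
    a cumulative standard normal area of j / c."""
    std_normal = NormalDist()
    return [x_bar + std_normal.inv_cdf(j / c) * s for j in range(1, c)]

cuts = equal_frequency_cut_points(4, x_bar=50, s=10)
print([round(x, 2) for x in cuts])   # quartile boundaries of N(50, 10)
```

With c = 4 the cut-points are the quartiles of the fitted normal distribution, so each bin has an expected frequency of n/4.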

Page 19: Prepared by Lloyd R. Jaisingh

15.6 ECDF Tests

LO15-7: State advantages of ECDF tests as compared to chi-square GOF tests.

• There are many alternatives to the chi-square test based on the Empirical Cumulative Distribution Function (ECDF).

• The Kolmogorov-Smirnov (K-S) test uses the largest absolute difference between the actual and expected cumulative relative frequency of the n data values.

• The K-S test is not recommended for grouped data.

• The K-S test assumes that no parameters are estimated.

• If parameters are estimated, use a Lilliefors test.

• Both of these tests are done by computer.

• The Anderson-Darling (A-D) test is widely used for non-normality because of its power.

• The A-D test is based on a probability plot.

• When the data fit the hypothesized distribution closely, the probability plot will be close to a straight line.
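To illustrate the K-S idea, the sketch below computes the largest absolute difference between the ECDF and a fully specified hypothesized CDF. It computes only the statistic, not a p-value or critical value; the function name and data are hypothetical.

```python
def ks_statistic(data, cdf):
    """Largest absolute gap between the ECDF and a hypothesized CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The ECDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

# Hypothetical data tested against a uniform(0, 1) CDF, F(x) = x.
d = ks_statistic([0.25, 0.5, 0.75], cdf=lambda x: x)
print(d)
```

Because the statistic uses every individual data value rather than bin counts, it avoids the arbitrary choice of bins that the chi-square GOF test requires.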