copyright © 2005 brooks/cole, a division of thomson learning, inc. 16.1 chapter 16 chi-squared...

29
16.1 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 16 Chi-Squared Tests

Upload: gwenda-may

Post on 04-Jan-2016

220 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.1Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Chapter 16

Chi-Squared Tests

Page 2: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.2Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

A Common Theme…

What to do? Data Type?Number of

Categories?Statistical Technique:

Describe a population

Nominal Two or moreX2 goodness of

fit test

Compare two populations

Nominal Two or moreX2 test of a contingency

table

Compare two or more populations

Nominal --X2 test of a contingency

table

Analyze relationship between two

variables

Nominal --X2 test of a contingency

table

One data type……Two techniques

Page 3: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.3Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Two Techniques…

The first is a goodness-of-fit test applied to data produced by a multinomial experiment, a generalization of a binomial experiment and is used to describe one population of data.

The second uses data arranged in a contingency table to determine whether two classifications of a population of nominal data are statistically independent; this test can also be interpreted as a comparison of two or more populations.

In both cases, we use the chi-squared ( ) distribution.

Page 4: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.4Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

The Multinomial Experiment…

Unlike a binomial experiment which only has two possible outcomes (e.g. heads or tails), a multinomial experiment:

• Consists of a fixed number, n, of trials.

• Each trial can have one of k outcomes, called cells.

• Each probability pi remains constant.

• Our usual notion of probabilities holds, namely:

p1 + p2 + … + pk = 1, and

• Each trial is independent of the other trials.

Page 5: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.5Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Chi-squared Goodness-of-Fit Test…

We test whether there is sufficient evidence to reject a specified set of values for pi.

To illustrate, our null hypothesis is:

H0: p1 = a1, p2 = a2, …, pk = ak

(where a1, a2, …, ak are the values of interest)

Our research hypothesis is:

H1: At least one pi ≠ ai

Page 6: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.6Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Chi-squared Goodness-of-Fit Test…

The test builds on comparing actual frequency and the expected frequency of occurrences in all the cells.

Example 16.1…

We compare market share before and after an advertising campaign to see if there is a difference (i.e. if the advertising was effective in improving market share).

H0: p1 = a1, p2 = a2, …, pk = ak

Where ai is the market share before the campaign. If there was no change, we’d expect H0 to not be rejected. If there is evidence to reject H0 in favor of: H1: At least one pi ≠ ai, what’s a logical conclusion?

Page 7: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.7Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.1…

Market shares before the advertising campaign…Company A – 45%

Company B – 40%

All Others – 15 %

200 customers surveyed after the campaign. The results:Company A – 102 customers preferred their product.

Company B – 82 customers…

All Others – 16 customers.

Before the campaign, we’d expect 45% of 200 customers (i.e. 90 customers) to prefer company A’s product. After the campaign, we observe its 102 customers. Does this mean the campaign was effective? (i.e. at a 5% significance level).

IDENTIFY

Page 8: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.8Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.1…Observed Frequency

A

B

Expected Frequency

A

B

Are these changes

statistically significant?

Page 9: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.9Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.1…

Our null hypothesis is:

H0: pCompanyA = .45, pCompanyB = .40, pOthers = .15

(i.e. the market shares pre-campaign), and our alternative hypothesis is:

H1: At least one pi ≠ ai

In order to complete our hypothesis testing we need a test statistic and a rejection region…

IDENTIFY

Page 10: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.10Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Chi-squared Goodness-of-Fit Test…

Our Chi-squared goodness of fit test statistic is given by:

Note: this statistic is approximately Chi-squared with k–1 degrees of freedom provided the sample size is large. The rejection region is:

observed frequency

expected frequency

Page 11: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.11Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.1…

In order to calculate our test statistic, we lay-out the data in a tabular fashion for easier calculation by hand:

Company

Observed Frequenc

y

Expected Frequenc

yDelta

SummationComponent

fi ei (fi – ei) (fi – ei)2/ei

A 102 90 12 1.60

B 82 80 2 0.05

Others 16 30 -14 6.53

Total 200 200 8.18

Check that these are equal

COMPUTE

Page 12: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.12Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.1…

Our rejection region is:

Since our test statistic is 8.18 which is greater than our critical value for Chi-squared, we reject H0 in favor of H1, that is,

“There is sufficient evidence to infer that the proportions have changed since the advertising campaigns were

implemented”

INTERPRET

Page 13: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.13Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.1…

Note: Table 5 in Appendix B does not allow for the direct calculation of , so we have to use Excel:

COMPUTE

Page 14: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.14Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.1…

Note: There are a couple of different ways to calculate the p-value of the test:

p-value

Computed manually from our

table

Computed directly from the data

Page 15: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.15Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Required Conditions…

In order to use this technique, the sample size must be large enough so that the expected value for each cell is 5 or more. (i.e. n x pi ≥ 5)

If the expected frequency is less than five, combine it with other cells to satisfy the condition.

Page 16: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.16Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Identifying Factors…

Factors that Identify the Chi-Squared Goodness-of-Fit Test:

ei=(n)(pi)

Page 17: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.17Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Chi-squared Test of a Contingency TableThe Chi-squared test of a contingency table is used to:

• determine whether there is enough evidence to infer that two nominal variables are related, and

• to infer that differences exist among two or more populations of nominal variables.

In order to use use these techniques, we need to classify the data according to two different criteria.

Page 18: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.18Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.2…

The demand for an MBA program’s optional courses and majors is quite variable year over year.

The research hypothesis is that the academic background of the students (i.e. their undergrad degrees) affects their choice of major.

A random sample of data on last year’s MBA students was collected and summarized in a contingency table…

IDENTIFY

Page 19: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.19Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.2 The Data

MBA Major

Undergrad Degree

Accounting

Finance Marketing Total

BA 31 13 16 60

BEng 8 16 7 31

BBA 12 10 17 39

Other 10 5 7 22

Total 61 44 47 152

Page 20: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.20Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.2…

Again, we are interesting in determining whether or not the academic background of the students affects their choice of MBA major. Thus our research hypothesis is:

H1: The two variables are dependent

Our null hypothesis then, is:

H0: The two variables are independent.

Page 21: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.21Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.2…

In this case, our test statistic is:

(where k is the number of cells in the contingency table, i.e. rows x columns)

Our rejection region is:

where the number of degrees of freedom is (r–1)(c–1)

Page 22: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.22Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.2…

In order to calculate our test statistic, we need the calculate the expected frequencies for each cell…

The expected frequency of the cell in row i and column j is:

COMPUTE

Page 23: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.23Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Contingency Table Set-up…

Page 24: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.24Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.2 COMPUTE

MBA Major

Undergrad Degree

Accounting

Finance Marketing Total

BA 31 13 16 60

BEng 8 1631 x 47

15231

BBA 12 10 17 39

Other 10 5 7 22

Total 61 44 47 152

e23 = (31)(47)/152 = 9.59 — compare this to f23 = 7

Compute expected frequencies…

Page 25: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.25Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.2…

We can now compare observed with expected frequencies…

and calculate our test statistic:

MBA Major

Undergrad Degree

Accounting Finance Marketing

BA 31 24.08 13 17.37 16 18.55

BEng 8 12.44 16 8.97 7 9.59

BBA 12 15.65 10 11.29 17 12.06

Other 10 8.83 5 6.37 7 6.80

Page 26: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.26Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.2…

We compare = 14.70 with:

Since our test statistic falls into the rejection region, we reject

H0: The two variables are independent.

in favor of

H1: The two variables are dependent.

That is, there is evidence of a relationship between undergrad degree and MBA major.

INTERPRET

Page 27: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.27Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Example 16.2…

We can also leverage the tools in Excel to process our data:

COMPUTE

Tools >Data Analysis Plus >Contingency Table

Compare

p-value

Page 28: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.28Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Required Condition – Rule of Five…

In a contingency table where one or more cells have expected values of less than 5, we need to combine rows or columns to satisfy the rule of five.

Note: by doing this, the degrees of freedom must be changed as well.

Page 29: Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 16.1 Chapter 16 Chi-Squared Tests

16.29Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Identifying Factors…

Factors that identify the Chi-squared test of a contingency table: