Chapter 13 Inference for Counts: Chi-Square Tests © 2011 Pearson Education, Inc. Business Statistics: A First Course

Upload: caitlin-malone

Post on 13-Dec-2015


TRANSCRIPT

Page 1:

Chapter 13

Inference for Counts: Chi-Square Tests

© 2011 Pearson Education, Inc.

Business Statistics: A First Course

Page 2:

13.1 Chi-Square Tests

Given the following…

1) Counts of items in each of several categories

2) A model that predicts the distribution of the relative frequencies

…the basic idea is to ask:

“Does the actual distribution differ from the model because of random error or do the differences mean that the model does not fit the data?”

In other words, “How good is the fit between what we observe and what we expect to observe?”

Page 3:

13.1 Chi-Square Tests

Example: Stock Market “Up” Days

Sample of 1000 “up” days

“Up” days appear to be more common than expected on Fridays (we expect them to be equally likely across trading days).

Null Hypothesis: The distribution of “up” days is no different from what we expect (equally likely across days).

Test the hypothesis with a chi-square goodness-of-fit test.

Page 4:

13.1 Chi-Square Tests

The Chi-Square Distribution

$$\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp}$$

Note that $\chi^2$ “accumulates” the relative squared deviation of each cell from its expected value.

So, $\chi^2$ gets “big” when the model is a poor fit.
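The statistic described above, the sum over all cells of (Obs − Exp)² / Exp, can be sketched in a few lines of Python. The counts below are hypothetical and are not the stock market data; the function name is also just an illustrative choice.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Obs - Exp)^2 / Exp over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for five categories (NOT the data from the slides).
observed = [18, 22, 20, 25, 15]

# Assumed model: all five categories equally likely, so each expects n / 5.
n = sum(observed)
expected = [n / 5] * 5

stat = chi_square_statistic(observed, expected)
perfect_fit = chi_square_statistic(expected, expected)

print(round(stat, 2))   # larger values mean a poorer fit
print(perfect_fit)      # a perfect fit contributes nothing
```

Each cell adds a non-negative amount, so the statistic can only grow as observed counts move away from the model.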

Page 5:

13.1 Chi-Square Tests

Assumptions and Conditions

• Counted Data Condition – The data must be counts for the categories of a categorical variable.

• Independence Assumption – The counts should be independent of each other.

• Randomization Condition – The counted individuals should be a random sample of the population.

• Expected Cell Frequency Condition – Expect at least 5 individuals per cell.
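The Expected Cell Frequency Condition is easy to check before running a test. The sketch below assumes a hypothetical model with four categories; the proportions and sample sizes are made up for illustration.

```python
def meets_expected_count_condition(expected, minimum=5):
    """True when every expected cell count is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Hypothetical model proportions for four categories (illustrative only).
proportions = [0.5, 0.3, 0.15, 0.05]

# Expected count in each cell is n times the model proportion.
small_sample = [40 * p for p in proportions]    # smallest cell expects 2
large_sample = [200 * p for p in proportions]   # smallest cell expects 10

small_ok = meets_expected_count_condition(small_sample)
large_ok = meets_expected_count_condition(large_sample)
print(small_ok, large_ok)
```

Note that the condition is about expected counts under the model, not the observed counts.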

Page 6:

13.1 Chi-Square Tests

The Chi-Square Calculation

Page 7:

13.1 Chi-Square Tests

The Chi-Square Calculation: Stock Market “Up” Days

$$\chi^2 = \frac{(192 - 193.4)^2}{193.4} + \cdots + \frac{(218 - 199.7)^2}{199.7} = 2.62$$

Using a chi-square table at a significance level of 0.05 and with 4 degrees of freedom:

$$\chi^2_4 = 9.488 > 2.62$$

Do not reject the null hypothesis. (The fit is “good”.)
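The table lookup can be cross-checked with a short calculation. For exactly 4 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form exp(−x/2)(1 + x/2); the sketch below relies on that special case (it is not a general-purpose routine) and uses only the two numbers quoted above.

```python
import math


def chi2_upper_tail_df4(x):
    """P(chi-square with 4 df >= x). Closed form valid only for 4 df."""
    return math.exp(-x / 2) * (1 + x / 2)


statistic = 2.62         # chi-square value from the calculation above
critical_value = 9.488   # table value at alpha = 0.05 with 4 df

p_value = chi2_upper_tail_df4(statistic)
tail_at_critical = chi2_upper_tail_df4(critical_value)
decision = "reject H0" if statistic > critical_value else "do not reject H0"

print(round(p_value, 3))           # well above 0.05
print(round(tail_at_critical, 3))  # should be close to 0.05
print(decision)
```

The P-value comes out at roughly 0.62, which agrees with the table-based decision not to reject.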

Page 8:

13.2 Interpreting Chi-Square Values

The Chi-Square Distribution

The $\chi^2$ distribution is right-skewed and becomes broader with increasing degrees of freedom.

The $\chi^2$ test is a one-sided test.
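One way to see both points is to compute upper-tail critical values for several degrees of freedom. The sketch below is an illustration, not a replacement for a statistics library: it uses a closed form for the upper tail that holds only for even degrees of freedom, and finds the 0.05 critical value by bisection. The function names are made up for this example.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom >= x), for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


def critical_value(alpha, df, upper=200.0):
    """Value whose upper-tail probability is alpha, found by bisection."""
    lo, hi = 0.0, upper
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_upper_tail_even_df(mid, df) > alpha:
            lo = mid   # tail still too large: move right
        else:
            hi = mid   # tail small enough: move left
    return (lo + hi) / 2


# Only the upper tail is used, because only large values signal a poor fit.
for df in (2, 4, 6, 8):
    print(df, round(critical_value(0.05, df), 3))
```

The critical value moves to the right as the degrees of freedom increase, so a chi-square value has to be judged against its own degrees of freedom.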

Page 9:

13.3 Examining the Residuals

When we reject a null hypothesis, we can examine the residuals in each cell to discover which values are extraordinary.

Because we might compare residuals for cells with very different counts, we should examine standardized residuals:

$$\text{Standardized residual} = \frac{Obs - Exp}{\sqrt{Exp}}$$

Note that standardized residuals from goodness-of-fit tests are actually z-scores (which we already know how to interpret and analyze).
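As a small sketch, taking the standardized residual to be (Obs − Exp) / √Exp, the two cells quoted in the stock market calculation (192 vs. 193.4 and 218 vs. 199.7) can be standardized directly. The remaining cells are not reproduced here, so only these two are shown.

```python
import math


def standardized_residual(observed, expected):
    """(Obs - Exp) / sqrt(Exp), read like a z-score."""
    return (observed - expected) / math.sqrt(expected)


# The two cells quoted in the chi-square calculation: (observed, expected).
first_cell = (192, 193.4)
last_cell = (218, 199.7)

r_first = standardized_residual(*first_cell)
r_last = standardized_residual(*last_cell)

print(round(r_first, 3))
print(round(r_last, 3))
```

The second value lands near 1.29; any small gap from a published figure would be expected if the expected counts used here are rounded.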

Page 10:

13.3 Examining the Residuals

Standardized residuals for the trading days data:

• None of these values is remarkable.

• The largest, Friday, at 1.292, is not impressive when viewed as a z-score.

• The deviations are in the direction of a “weekend effect”, but they aren’t quite large enough for us to conclude they are real.

Page 11:

13.6 Chi-Square Test of Independence

The table below shows the importance of personal appearance for several age groups.

Are Age and Appearance independent, or is there a relationship?

Page 12:

13.6 Chi-Square Test of Independence

A stacked barchart suggests a relationship:

Test for independence using a chi-square test of independence.

Page 13:

13.6 Chi-Square Test of Independence

The test requires finding expected counts under the assumption that the null hypothesis is true (that the two variables are independent). Find the expected count for each cell by multiplying the appropriate row and column totals and dividing by the table total:

$$Exp_{ij} = \frac{\text{Total Row}_i \times \text{Total Column}_j}{\text{Table Total}}$$
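The row-total times column-total rule can be sketched for a whole table at once. The 2 × 3 table below is hypothetical (it is not the Age and Appearance data), and the degrees of freedom line uses the standard (rows − 1)(columns − 1) rule for a test of independence.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 3 table of counts (illustrative only).
observed = [
    [30, 20, 10],
    [20, 30, 40],
]

expected = expected_counts(observed)

# Chi-square statistic: sum of (Obs - Exp)^2 / Exp over every cell.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for a test of independence.
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)
print(round(chi_square, 2), df)
```

The expected table keeps the same row and column totals as the observed table, which is a quick sanity check on the arithmetic.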

Page 14:

13.6 Chi-Square Test of Independence

For the Appearance and Age example, we reject the null hypothesis that the variables are independent.

So, it may be of interest to know how differently two age groups (teens and 30-something adults) select the “very important” category (Appearance response 6 or 7).

You can construct a confidence interval for the true difference in these proportions…

Page 15:

13.6 Chi-Square Test of Independence

From the table, the relevant percentages of responses (6 or 7) on Appearance for teens and 30-something adults are:

Teens: 45.17%

30-39: 39.91%

The 95% confidence interval is found below:
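A minimal sketch of the usual large-sample interval for a difference in two proportions, (p1 − p2) ± z* √(p1(1 − p1)/n1 + p2(1 − p2)/n2), is given here. The two percentages come from the text above, but the group sizes are not stated in this transcript, so the values of `n_teens` and `n_30s` are placeholders chosen only so the code runs; the resulting interval is illustrative and should not be read as the published answer.

```python
import math


def diff_proportion_ci(p1, n1, p2, n2, z_star=1.96):
    """Large-sample confidence interval for p1 - p2 (z_star=1.96 for 95%)."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z_star * se, diff + z_star * se


p_teens, p_30s = 0.4517, 0.3991   # proportions quoted above

# PLACEHOLDER sample sizes: assumptions, not values from the source.
n_teens, n_30s = 1000, 1000

low, high = diff_proportion_ci(p_teens, n_teens, p_30s, n_30s)
print(round(low, 4), round(high, 4))
```

With the real group sizes substituted in, an interval that excludes 0 would indicate a difference between the two age groups at the 95% level.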

Page 16:

What Can Go Wrong?

Don’t use chi-square methods unless you have counts.

Beware of large samples! With a sufficiently large sample size, a chi-square test can reject the null hypothesis even when the departures from the model are too small to matter in practice.

Don’t say that one variable “depends” on the other just because they’re not independent.
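The large-sample warning can be made concrete with a sketch. The counts below are hypothetical: the same 52% / 48% split is tested against a 50 / 50 model at two sample sizes. With two categories there is 1 degree of freedom, for which the 0.05 critical value is 3.841.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Obs - Exp)^2 / Exp over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_DF1 = 3.841   # chi-square table value, alpha = 0.05, 1 df

# Hypothetical data: identical proportions, sample 100 times larger.
small = chi_square_statistic([52, 48], [50, 50])
large = chi_square_statistic([5200, 4800], [5000, 5000])

print(round(small, 2), small > CRITICAL_DF1)
print(round(large, 2), large > CRITICAL_DF1)
```

Multiplying every count by 100 multiplies the statistic by 100, so the same modest departure from the model goes from unremarkable to statistically significant purely because of sample size.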

Page 17:

What Have We Learned?

Goodness-of-fit tests compare the observed distribution of a single categorical variable to an expected distribution based on a theory or model.

Tests of independence examine counts from a single group for evidence of an association between two categorical variables.
