Post on 13-Dec-2015


Chapter 13

Inference for Counts: Chi-Square Tests

© 2011 Pearson Education, Inc.

Business Statistics: A First Course

13.1 Chi-Square Tests

Given the following…

1) Counts of items in each of several categories

2) A model that predicts the distribution of the relative frequencies

…the basic idea is to ask:

“Does the actual distribution differ from the model because of random error or do the differences mean that the model does not fit the data?”

In other words, “How good is the fit between what we observe and what we expect to observe?”

13.1 Chi-Square Tests

Example: Stock Market “Up” Days

Sample of 1000 “up” days

“Up” days appear to be more common than expected on Fridays (we expect them to be equally likely across trading days).

Null Hypothesis: The distribution of “up” days is no different from what we expect (equally likely across days).

Test the hypothesis with a chi-square goodness-of-fit test.

13.1 Chi-Square Tests

The Chi-Square Distribution

$$\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp}$$

Note that $\chi^2$ “accumulates” the relative squared deviation of each cell from its expected value.

So, $\chi^2$ gets “big” when the model is a poor fit.
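
As a rough illustration of how the statistic accumulates one contribution per cell, the Python sketch below computes each cell's relative squared deviation and then sums them. The function names are illustrative assumptions, not part of the course materials. The two observed/expected pairs are the ones that appear in this chapter's stock market calculation; the other three cells are not listed in the transcript, so the result here is only a partial sum.

```python
def cell_contributions(observed, expected):
    """Return (Obs - Exp)^2 / Exp for each cell."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


def chi_square_statistic(observed, expected):
    """Sum the per-cell contributions to get the chi-square statistic."""
    return sum(cell_contributions(observed, expected))


# Only two of the five cells are available, so this is a partial sum,
# not the full statistic for the example.
partial = cell_contributions([192, 218], [193.4, 199.7])
partial_sum = chi_square_statistic([192, 218], [193.4, 199.7])
print([round(c, 3) for c in partial], round(partial_sum, 3))
```

A cell whose observed count sits close to its expected count adds almost nothing, while a cell far from its expected count dominates the total.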

13.1 Chi-Square Tests

Assumptions and Conditions

• Counted Data Condition – The data must be counts for the categories of a categorical variable.

• Independence Assumption – The counts should be independent of each other.

• Randomization Condition – The counted individuals should be a random sample of the population.

• Expected Cell Frequency Condition – Expect at least 5 individuals per cell.
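
The Expected Cell Frequency Condition is the one item on this list that can be checked mechanically. A minimal sketch, assuming the expected counts are already available as a list (the function name and the second set of counts are hypothetical):

```python
def meets_expected_count_condition(expected, minimum=5):
    """Return True when every expected cell count is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Expected counts that appear in this chapter's stock market example.
ok_example = meets_expected_count_condition([193.4, 199.7])

# HYPOTHETICAL expected counts, chosen only to show a failing case.
bad_example = meets_expected_count_condition([12.0, 3.5, 8.1])

print(ok_example, bad_example)
```

The other conditions (counted data, independence, randomization) depend on how the data were collected and cannot be verified from the numbers alone.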

13.1 Chi-Square Tests

The Chi-Square Calculation

13.1 Chi-Square Tests

The Chi-Square Calculation: Stock Market “Up” Days

$$\chi^2 = \frac{(192 - 193.4)^2}{193.4} + \cdots + \frac{(218 - 199.7)^2}{199.7} = 2.62$$

Using a chi-square table at a significance level of 0.05 and with 4 degrees of freedom, the critical value is $\chi^2_4 = 9.488 > 2.62$.

Do not reject the null hypothesis. (The fit is “good”.)
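
The table lookup can also be expressed as a P-value. The sketch below is one possible way to do that with only the Python standard library; it relies on the closed form of the chi-square upper tail that holds when the degrees of freedom are even, which covers the 4 degrees of freedom used here. The function name is an assumption for illustration. In practice a statistics library (for example `scipy.stats.chi2.sf`) would handle any degrees of freedom.

```python
import math


def chi_square_upper_tail(x, df):
    """P(X > x) for a chi-square variable with an even number of df.

    For df = 2k the upper-tail probability has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    if x <= 0:
        return 1.0
    half = x / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )


ALPHA = 0.05
p_value = chi_square_upper_tail(2.62, 4)   # statistic from the example
reject_null = p_value < ALPHA

# Sanity check: the tabled critical value should leave about 0.05 in the tail.
tail_at_critical = chi_square_upper_tail(9.488, 4)
print(round(p_value, 3), reject_null, round(tail_at_critical, 3))
```

A P-value well above 0.05 leads to the same decision as comparing 2.62 with the critical value 9.488.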

13.2 Interpreting Chi-Square Values

The Chi-Square Distribution

The $\chi^2$ distribution is right-skewed and becomes broader with increasing degrees of freedom.

The $\chi^2$ test is a one-sided test.

13.3 Examining the Residuals

When we reject a null hypothesis, we can examine the residuals in each cell to discover which values are extraordinary.

Because we might compare residuals for cells with very different counts, we should examine standardized residuals:

$$\text{Standardized residual} = \frac{Obs - Exp}{\sqrt{Exp}}$$

Note that standardized residuals from goodness-of-fit tests are actually z-scores (which we already know how to interpret and analyze).
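
A short sketch of the computation, assuming the usual definition of a standardized residual, (Obs − Exp)/√Exp. The function name is illustrative. The inputs are the two observed/expected pairs from the stock market calculation; the 218/199.7 pair is assumed here to be the Friday cell, and because 199.7 is a rounded expected count the result can differ slightly in the third decimal from a value computed with unrounded inputs.

```python
import math


def standardized_residual(observed, expected):
    """Return (Obs - Exp) / sqrt(Exp), which reads like a z-score."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (observed - expected) / math.sqrt(expected)


first_cell = standardized_residual(192, 193.4)   # slightly below expectation
last_cell = standardized_residual(218, 199.7)    # assumed to be Friday
print(round(first_cell, 2), round(last_cell, 2))
```

The sign shows the direction of the deviation, and the magnitude can be judged on the familiar z-score scale.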

13.3 Examining the Residuals

Standardized residuals for the trading days data:

• None of these values is remarkable.

• The largest, Friday, at 1.292, is not impressive when viewed as a z-score.

• The deviations are in the direction of a “weekend effect”, but they aren’t quite large enough for us to conclude they are real.

13.6 Chi-Square Test of Independence

The table below shows the importance of personal appearance for several age groups.

Are Age and Appearance independent, or is there a relationship?

13.6 Chi-Square Test of Independence

A stacked barchart suggests a relationship:

Test for independence using a chi-square test of independence.

13.6 Chi-Square Test of Independence

The test requires finding expected counts under the assumption that the null hypothesis is true (that the two variables are independent). Find the expected count for each cell by multiplying the appropriate row and column totals and dividing by the table total:

$$Exp_{ij} = \frac{\text{Total Row}_i \times \text{Total Column}_j}{\text{Table Total}}$$
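
The row-total times column-total rule can be sketched in a few lines of Python. The Age and Appearance counts are not reproduced in this transcript, so the 2×2 table below is a HYPOTHETICAL stand-in chosen only to make the arithmetic easy to follow; the function name is likewise an assumption.

```python
def expected_counts(table):
    """Expected counts under independence for a table given as a list of rows.

    Each cell is (row total * column total) / table total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


# HYPOTHETICAL counts, not the Age/Appearance data from the slides.
hypothetical_table = [
    [10, 20],
    [30, 40],
]
expected = expected_counts(hypothetical_table)
for row in expected:
    print(row)
```

The expected table keeps the same row and column totals as the observed table; only the way counts are spread across the cells changes.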

13.6 Chi-Square Test of Independence

For the Appearance and Age example, we reject the null hypothesis that the variables are independent.

So, it may be of interest to know how differently two age groups (teens and 30-something adults) select the “very important” category (Appearance response 6 or 7).

You can construct a confidence interval for the true difference in these proportions…

13.6 Chi-Square Test of Independence

From the table, the relevant percentages of responses (6 or 7) on Appearance for teens and 30-something adults are:

Teens: 45.17%

30-39: 39.91%

The 95% confidence interval is found below:
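
The interval itself is not reproduced in this transcript, and neither are the group sample sizes, so the original numbers cannot be recovered here. The sketch below only shows the shape of the calculation, assuming the standard two-proportion interval $(\hat p_1 - \hat p_2) \pm z^* \sqrt{\hat p_1 \hat q_1 / n_1 + \hat p_2 \hat q_2 / n_2}$. The sample sizes are HYPOTHETICAL placeholders and the function name is an assumption, so the printed interval is not the slide's result.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(p1, n1, p2, n2, confidence=0.95):
    """Two-proportion z-interval for p1 - p2."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * std_error, diff + z_star * std_error


# Proportions are taken from the slide; the sample sizes are HYPOTHETICAL
# placeholders because the real group sizes are not in the transcript.
N_TEENS = 1000
N_THIRTIES = 1000
low, high = diff_proportion_ci(0.4517, N_TEENS, 0.3991, N_THIRTIES)
print(round(low, 4), round(high, 4))
```

Substituting the actual group sizes from the Age and Appearance table would give the interval the slide refers to.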

What Can Go Wrong?

Don’t use chi-square methods unless you have counts.

Beware of large samples! With a sufficiently large sample size, a chi-square test can reject the null hypothesis even when the departure from the model is too small to matter in practice.
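
One way to see the large-sample effect: if every observed and expected count is multiplied by the same factor, the proportions stay the same but the statistic grows by that factor. The sketch below uses HYPOTHETICAL counts (a 52/48 split against an expected 50/50) and an illustrative function name; 3.841 is the standard 0.05 critical value for 1 degree of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Obs - Exp)^2 / Exp over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# HYPOTHETICAL counts: the same 52% / 48% split at three sample sizes.
base_observed = [52, 48]
base_expected = [50, 50]
CRITICAL_DF1 = 3.841  # chi-square critical value, alpha = 0.05, 1 df

results = {}
for scale in (1, 10, 100):
    observed = [o * scale for o in base_observed]
    expected = [e * scale for e in base_expected]
    results[scale] = chi_square_statistic(observed, expected)

for scale, stat in results.items():
    print(scale * 100, round(stat, 2), stat > CRITICAL_DF1)
```

The same two-point departure from 50/50 is nowhere near significant with 100 observations but is rejected with 10,000, which is why the size of the effect deserves attention alongside the test result.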

Don’t say that one variable “depends” on the other just because they’re not independent.

What Have We Learned?

Goodness-of-fit tests compare the observed distribution of a single categorical variable to an expected distribution based on a theory or model.

Tests of independence examine counts from a single group for evidence of an association between two categorical variables.
