Chapter 13 Inference for Counts: Chi-Square Tests

© 2011 Pearson Education, Inc.

Business Statistics: A First Course

13.1 Chi-Square Tests

Given the following…

1) Counts of items in each of several categories

2) A model that predicts the distribution of the relative frequencies

…the basic idea is to ask:

“Does the actual distribution differ from the model because of random error or do the differences mean that the model does not fit the data?”

In other words, “How good is the fit between what we observe and what we expect to observe?”


13.1 Chi-Square Tests

Example: Stock Market “Up” Days

Sample of 1000 “up” days

“Up” days appear to be more common than expected on Fridays (we expect them to be equally likely across trading days).

Null Hypothesis: The distribution of “up” days is no different from what we expect (equally likely across days).

Test the hypothesis with a chi-square goodness-of-fit test.


13.1 Chi-Square Tests

The Chi-Square Distribution

Note that $\chi^2$ “accumulates” the relative squared deviation of each cell from its expected value.

So, $\chi^2$ gets “big” when the model is a poor fit.



13.1 Chi-Square Tests

Assumptions and Conditions

• Counted Data Condition – The data must be counts for the categories of a categorical variable.

• Independence Assumption – The counts should be independent of each other.

• Randomization Condition – The counted individuals should be a random sample of the population.

• Expected Cell Frequency Condition – Expect at least 5 individuals per cell.
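The Expected Cell Frequency Condition can be checked mechanically once the expected counts are known. The following is a minimal sketch in Python, assuming the expected counts are available as a list; the function name `meets_expected_count_condition` and its `minimum` parameter are illustrative choices, not something defined in the text.

```python
def meets_expected_count_condition(expected, minimum=5):
    """Return True when every expected cell count is at least `minimum`."""
    return all(e >= minimum for e in expected)

# Two of the expected counts quoted for the trading-days example (the
# other three are not reproduced in the text, so they are omitted here).
print(meets_expected_count_condition([193.4, 199.7]))  # True

# A hypothetical cell expecting only 3.2 individuals violates the condition.
print(meets_expected_count_condition([193.4, 3.2]))    # False
```

The other three conditions concern how the data were collected and cannot be verified from the counts alone.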


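13.1 Chi-Square Tests

The Chi-Square Calculation

$$\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp}$$

For a goodness-of-fit test with $k$ cells, the statistic has $k - 1$ degrees of freedom.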


13.1 Chi-Square Tests

The Chi-Square Calculation: Stock Market “Up” Days

$$\chi^2 = \frac{(192 - 193.4)^2}{193.4} + \cdots + \frac{(218 - 199.7)^2}{199.7} = 2.62$$

Using a chi-square table at a significance level of 0.05 and with 4 degrees of freedom:

$$\chi^2_4 = 9.488 > 2.62$$

Do not reject the null hypothesis. (The fit is “good”.)
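The arithmetic behind this calculation can be sketched in Python. The snippet below assumes only the two cells that appear in the calculation (observed 192 and 218 against expected 193.4 and 199.7); the remaining three cells are elided in the text, so the full sum is not recomputed and the reported statistic of 2.62 is taken as given. The function name `chi_square_contributions` is illustrative.

```python
def chi_square_contributions(observed, expected):
    """Per-cell contributions (Obs - Exp)^2 / Exp to the chi-square statistic."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# The two cells shown in the text; the other three cells are elided there.
observed = [192, 218]
expected = [193.4, 199.7]
contributions = chi_square_contributions(observed, expected)
print([round(c, 3) for c in contributions])  # [0.01, 1.677]

# Decision rule, taking both numbers as given in the text.
chi_square_reported = 2.62   # statistic reported in the text
critical_value = 9.488       # table value for 4 df at the 0.05 level
reject_null = chi_square_reported > critical_value
print(reject_null)  # False
```

The second cell alone contributes about 1.68 of the 2.62 total, which shows where most of the discrepancy between observed and expected counts comes from.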


13.2 Interpreting Chi-Square Values

The Chi-Square Distribution

The $\chi^2$ distribution is right-skewed and becomes broader with increasing degrees of freedom.

The $\chi^2$ test is a one-sided test.
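Because the test is one-sided, the P-value is the upper-tail area beyond the observed statistic. The sketch below, in Python, assumes an even number of degrees of freedom, for which the upper tail has the closed form $P(\chi^2_{2m} > x) = e^{-x/2}\sum_{k=0}^{m-1} (x/2)^k / k!$. The function name `chi_square_upper_tail` is illustrative; for general degrees of freedom a statistics library would be the usual choice.

```python
import math

def chi_square_upper_tail(x, df):
    """Upper-tail probability P(chi-square_df > x); closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)

# Table value for 4 df at the 0.05 level: the tail area is about 0.05.
print(round(chi_square_upper_tail(9.488, 4), 3))  # 0.05

# Statistic from the trading-days example: a large P-value, so no rejection.
print(round(chi_square_upper_tail(2.62, 4), 3))   # 0.623
```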


13.3 Examining the Residuals

When we reject a null hypothesis, we can examine the residuals in each cell to discover which values are extraordinary.

Because we might compare residuals for cells with very different counts, we should examine standardized residuals:

$$\frac{Obs - Exp}{\sqrt{Exp}}$$

Note that standardized residuals from goodness-of-fit tests are actually z-scores (which we already know how to interpret and analyze).


13.3 Examining the Residuals

Standardized residuals for the trading days data:

• None of these values is remarkable.

• The largest, Friday, at 1.292, is not impressive when viewed as a z-score.

• The deviations are in the direction of a “weekend effect”, but they aren’t quite large enough for us to conclude they are real.
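A standardized residual is the cell's deviation divided by the square root of its expected count. The following Python sketch recomputes the Friday value from the rounded counts quoted in the chi-square calculation (observed 218, expected 199.7); the function name `standardized_residual` is illustrative. The result differs slightly from the 1.292 reported above, presumably because the slide used an unrounded expected count.

```python
import math

def standardized_residual(observed, expected):
    """Standardized residual (Obs - Exp) / sqrt(Exp) for one cell."""
    return (observed - expected) / math.sqrt(expected)

# Friday cell, using the rounded expected count 199.7 from the calculation.
friday = standardized_residual(218, 199.7)
print(round(friday, 3))  # 1.295
```

Read as a z-score, a value near 1.3 is well inside the range that random variation alone would produce.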



13.6 Chi-Square Test of Independence

The table below shows the importance of personal appearance for several age groups.

Are Age and Appearance independent, or is there a relationship?


A stacked bar chart suggests a relationship:

Test for independence using a chi-square test of independence.


The test requires finding expected counts under the assumption that the null hypothesis is true (that the two variables are independent). Find the expected count for each cell by multiplying the appropriate row and column totals and dividing by the table total:

$$Exp_{ij} = \frac{\text{Total}_{\text{Row } i} \times \text{Total}_{\text{Column } j}}{\text{Table Total}}$$
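The expected-count rule can be applied to a whole table at once. The Python sketch below assumes the observed counts are stored as a list of rows; the function name `expected_counts` is illustrative, and the 2×3 table is hypothetical data chosen for easy arithmetic, not the Age and Appearance counts.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2x3 table of counts (NOT the Age/Appearance data).
observed = [
    [20, 30, 50],
    [30, 20, 50],
]
for row in expected_counts(observed):
    print(row)
# [25.0, 25.0, 50.0]
# [25.0, 25.0, 50.0]
```

Both rows have the same total here, so independence implies identical expected counts in each row; the observed counts differ from them only in the first two columns.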




For the Appearance and Age example, we reject the null hypothesis that the variables are independent.

So, it may be of interest to know how differently two age groups (teens and 30-something adults) select the “very important” category (Appearance response 6 or 7).

You can construct a confidence interval for the true difference in these proportions…


From the table, the relevant percentages of responses (6 or 7) on Appearance for teens and 30-something adults are:

Teens: 45.17%

30-39: 39.91%

The 95% confidence interval is found below:
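The interval itself is not reproduced in the text. As a rough sketch, the usual large-sample interval for a difference in proportions is $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}$, which can be written in Python as below. The function name `two_proportion_ci` is illustrative, and the group sizes are hypothetical placeholders because the actual counts are not given here, so the printed interval is not the one from the original slide.

```python
import math

def two_proportion_ci(p1, n1, p2, n2, z_star=1.96):
    """Large-sample confidence interval for p1 - p2 (z_star = 1.96 gives about 95%)."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z_star * se, diff + z_star * se

# Percentages from the text; the group sizes are HYPOTHETICAL placeholders,
# because the table with the actual counts is not reproduced here.
n_teens, n_thirties = 1000, 1000
low, high = two_proportion_ci(0.4517, n_teens, 0.3991, n_thirties)
print(f"{low:.3f} to {high:.3f}")
```

Substituting the real group sizes from the table changes only the width of the interval; its center stays at the observed difference of 5.26 percentage points.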


What Can Go Wrong?

Don’t use chi-square methods unless you have counts.

Beware of large samples! With a sufficiently large sample size, a chi-square test will almost always reject the null hypothesis, because even trivially small departures from the model become statistically significant.

Don’t say that one variable “depends” on the other just because they’re not independent.
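The warning about large samples can be illustrated numerically: if every observed and expected count is multiplied by the same factor, the chi-square statistic is multiplied by that factor too, even though the proportions are unchanged. The Python sketch below uses hypothetical counts for five equally likely categories (illustration only) and compares the result with 9.488, the 0.05-level table value for 4 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Obs - Exp)^2 / Exp over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for five equally likely categories (illustration only).
observed = [102, 99, 101, 98, 100]
expected = [100] * 5
small = chi_square_statistic(observed, expected)

# Same proportions with 100 times as many individuals.
factor = 100
large = chi_square_statistic([o * factor for o in observed],
                             [e * factor for e in expected])

critical_value = 9.488  # 4 df, 0.05 level
print(round(small, 2), small > critical_value)  # 0.1 False
print(round(large, 2), large > critical_value)  # 10.0 True
```

The deviations are at most 2% of the expected count in both cases, yet only the larger sample leads to rejection.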



What Have We Learned?

Goodness-of-fit tests compare the observed distribution of a single categorical variable to an expected distribution based on a theory or model.

Tests of independence examine counts from a single group for evidence of an association between two categorical variables.

