slide 2 - 41 copyright © 2008 pearson education, inc. chapter 12 chi-square procedures

Upload: dale-morton

Post on 20-Jan-2016

Page 1: Slide 2 - 41 Copyright © 2008 Pearson Education, Inc. Chapter 12 Chi-Square Procedures
Copyright © 2008 Pearson Education, Inc. Slide 2 - 41

Chapter 12

Chi-Square Procedures

Figure 12.1

A variable has a chi-square distribution if its distribution has the shape of a special type of right-skewed curve, called a chi-square ($\chi^2$) curve. Actually, there are infinitely many chi-square distributions, and we identify the chi-square distribution in question by its number of degrees of freedom, just as we did for t-distributions. Figure 12.1 shows three $\chi^2$-curves and illustrates some basic properties of $\chi^2$-curves.

Key Fact 12.1

Example 12.2

The U.S. Federal Bureau of Investigation (FBI) compiles data on crimes and crime rates and publishes the information in Crime in the United States. A violent crime is classified by the FBI as murder, forcible rape, robbery, or aggravated assault. Table 12.1 gives a relative-frequency distribution for (reported) violent crimes in 2000. For instance, in 2000, 28.6% of violent crimes were robberies.

A random sample of 500 violent-crime reports from last year yielded the frequency distribution shown in Table 12.2. Suppose that we want to use the data in Tables 12.1 and 12.2 to decide whether last year’s distribution of violent crimes has changed from the 2000 distribution.

Table 12.1 Table 12.2

Solution Example 12.2

a. Formulate the problem statistically by posing it as a hypothesis test.

b. Explain the basic idea for carrying out the hypothesis test.

c. Discuss the details for making a decision concerning the hypothesis test.

Solution Example 12.2

a. The population is last year’s (reported) violent crimes. The variable is “type of violent crime,” and its possible values are murder, forcible rape, robbery, and aggravated assault. We want to perform the hypothesis test

H0 : Last year’s violent-crime distribution is the same as the 2000 distribution.

Ha : Last year’s violent-crime distribution is different from the 2000 distribution.

Solution Example 12.2

b. The idea behind the chi-square goodness-of-fit test is to compare the observed frequencies in the second column of Table 12.2 to the frequencies that would be expected – the expected frequencies – if last year’s violent-crime distribution is the same as the 2000 distribution. If the observed and expected frequencies match fairly well (i.e., each observed frequency is roughly equal to its corresponding expected frequency), we do not reject the null hypothesis; otherwise, we reject the null hypothesis.

Solution Example 12.2

c. To formulate a precise procedure for carrying out the hypothesis test, we need to answer two questions:

1. What frequencies should we expect from a random sample of 500 violent-crime reports from last year if last year’s violent-crime distribution is the same as the 2000 distribution?

2. How do we decide whether the observed and expected frequencies match fairly well?

The first question is easy to answer, which we illustrate with robberies. If last year’s violent-crime distribution is the same as the 2000 distribution, then, according to Table 12.1, 28.6% of last year’s violent crimes would have been robberies. Therefore, in a random sample of 500 violent-crime reports from last year, we would expect about 28.6% of the 500, or 143, to be robberies.

Solution Example 12.2

c. Compute each expected frequency using the formula E = np, where n is the sample size and p is the relative frequency. Calculations of the expected frequencies for all four types of violent crime are shown in Table 12.3. The third column of Table 12.3 answers the first question. It gives the frequencies that we would expect if last year’s violent-crime distribution is the same as the 2000 distribution.

Table 12.3
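The formula E = np can be sketched in a few lines of Python. This is only an illustration: the function name `expected_frequencies` is hypothetical, and because Table 12.1 is not reproduced in this transcript, only the robbery figure (28.6%) and the sample size (500) come from the text. The other three relative frequencies are assumed values chosen so that the four sum to 1.

```python
def expected_frequencies(n, relative_frequencies):
    """Return the expected frequency E = n * p for each category."""
    return {category: n * p for category, p in relative_frequencies.items()}


# Assumption: only "robbery" (28.6%) is stated in the text above.
# The remaining relative frequencies are assumed for illustration.
relative_2000 = {
    "murder": 0.011,
    "forcible rape": 0.063,
    "robbery": 0.286,
    "aggravated assault": 0.640,
}

expected = expected_frequencies(500, relative_2000)
for category, e in expected.items():
    print(f"{category}: {e:.1f}")
```

For robberies this reproduces the value worked out above, 28.6% of 500, or 143.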

Solution Example 12.2

c. The second question – whether the observed and expected frequencies match fairly well – is harder to answer. We need to calculate a number that measures the goodness of fit. In Table 12.4, the second column repeats the observed frequencies from the second column of Table 12.2. The third column of Table 12.4 repeats the expected frequencies from the third column of Table 12.3.

Table 12.4

Solution Example 12.2

c. To measure the goodness of fit of the observed and expected frequencies, we look at the differences, $O - E$, shown in the fourth column of Table 12.4. Summing these differences to obtain a measure of goodness of fit isn’t very useful because the sum is 0. Instead, we square each difference (shown in the fifth column) and then divide by the corresponding expected frequency. Doing so gives the values $(O - E)^2/E$, called chi-square subtotals, shown in the sixth column. The sum of the chi-square subtotals,

$$\sum \frac{(O - E)^2}{E} = 3.555,$$

is the statistic used to measure the goodness of fit of the observed and expected frequencies.
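A minimal sketch of this computation follows. Table 12.4 is not reproduced in this transcript, so the observed counts and all expected counts except the 143 for robberies are assumptions; they were chosen to be consistent with the sample size of 500 and with the reported value 3.555. The function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of the chi-square subtotals (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Assumed values (murder, forcible rape, robbery, aggravated assault).
observed = [3, 37, 154, 306]
expected = [5.5, 31.5, 143.0, 320.0]

# The raw differences O - E always sum to 0, so they cannot measure fit.
differences = [o - e for o, e in zip(observed, expected)]
print(sum(differences))

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 3))
```

Squaring removes the cancellation between positive and negative differences, and dividing by E puts each squared difference on a scale relative to the count that was expected.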

Solution Example 12.2

c. If the null hypothesis is true, the observed and expected frequencies should be roughly equal, resulting in a small value of the test statistic, $\sum (O - E)^2/E$. In other words, large values of $\sum (O - E)^2/E$ provide evidence against the null hypothesis. As we have seen, $\sum (O - E)^2/E = 3.555$. Can this value be reasonably attributed to sampling error, or is it large enough to suggest that the null hypothesis is false? To answer this question, we need to know the distribution of the test statistic $\sum (O - E)^2/E$.
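One way to get a feel for that distribution is to simulate it. The sketch below is not the textbook's method: it draws many random samples of size 500 under the null hypothesis and records the statistic each time. The function names are hypothetical, and all relative frequencies other than 28.6% for robbery are assumed because Table 12.1 is not reproduced here.

```python
import random


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_statistics(n, probabilities, trials, seed=0):
    """Simulate the statistic for `trials` samples drawn under H0."""
    rng = random.Random(seed)
    categories = list(range(len(probabilities)))
    expected = [n * p for p in probabilities]
    statistics = []
    for _ in range(trials):
        sample = rng.choices(categories, weights=probabilities, k=n)
        observed = [0] * len(probabilities)
        for category in sample:
            observed[category] += 1
        statistics.append(chi_square_statistic(observed, expected))
    return statistics


# Assumed null distribution; only 0.286 (robbery) is stated in the text.
probabilities = [0.011, 0.063, 0.286, 0.640]
statistics = simulate_statistics(500, probabilities, trials=2000)

# Fraction of simulated samples whose statistic is at least 3.555.
fraction_at_least = sum(s >= 3.555 for s in statistics) / len(statistics)
print(fraction_at_least)
```

If a sizable fraction of samples drawn under the null hypothesis produce a statistic at least as large as 3.555, then the observed value can reasonably be attributed to sampling error.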

Key Fact 12.2

Procedure 12.1

Procedure 12.1 (cont.)

Example 12.5

In Example 2.8, we considered data on political party affiliation for the students in Professor Weiss’s introductory statistics course. These are univariate data from the single variable “political party affiliation.” Now, we simultaneously consider data on political party affiliation and on class level for the students in Professor Weiss’s introductory statistics course, as shown in Table 12.7. These are bivariate data from the two variables “political party affiliation” and “class level.” Group these bivariate data into a contingency table.

Table 12.7

Solution Example 12.5

A contingency table must accommodate each possible pair of values for the two variables. The contingency table for these two variables has the form shown in Table 12.8. The small boxes inside the rectangle formed by the heavy lines are called cells, which hold the frequencies.

Table 12.8

Solution Example 12.5

To complete the contingency table, we first go through the data in Table 12.7 and place a tally mark in the appropriate cell of Table 12.8 for each student. The results of the tallying procedure are shown in Table 12.8. Replacing the tallies in Table 12.8 by the frequencies (counts of the tallies), we obtain the required contingency table, as shown in Table 12.9.

Table 12.9
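The tallying procedure can be sketched in Python with a counter keyed on (party, class level) pairs. The six student records below are hypothetical stand-ins, since the raw data of Table 12.7 are not reproduced in this transcript, and the function name `contingency_table` is likewise only illustrative.

```python
from collections import Counter


def contingency_table(pairs, row_values, column_values):
    """Count each (row value, column value) pair and lay the counts out as cells."""
    tallies = Counter(pairs)
    return [[tallies[(r, c)] for c in column_values] for r in row_values]


parties = ["Democratic", "Republican", "Other"]
levels = ["Freshman", "Sophomore", "Junior", "Senior"]

# Hypothetical records, one (party, class level) pair per student.
students = [
    ("Democratic", "Junior"),
    ("Republican", "Sophomore"),
    ("Other", "Senior"),
    ("Republican", "Sophomore"),
    ("Democratic", "Junior"),
    ("Republican", "Freshman"),
]

table = contingency_table(students, parties, levels)
for party, row in zip(parties, table):
    print(party, row)
```

Cells for pairs that never occur are filled with 0, so the table always accommodates each possible pair of values.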

Solution Example 12.5

The upper left cell of Table 12.9 shows that one student in the course is both a Democrat and a freshman. The cell diagonally below and to the right of that cell shows that eight students in the course are both Republicans and sophomores. According to the first row total, 13 (1 + 4 + 5 + 3) of the students are Democrats. Similarly, the third column total shows that 12 of the students are juniors. The lower right corner gives the total number of students in the course, 40. You can find that total by summing the row totals, the column totals, or the frequencies in the 12 cells.
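The three ways of reaching the grand total can be checked with a short sketch. Table 12.9 is not reproduced in this transcript, so the cell values below are partly assumed: the Democratic row, the Republican-sophomore cell, the junior column total, and the grand total are quoted in the text, and the remaining cells are filled in to be consistent with them.

```python
# Rows: Democratic, Republican, Other.
# Columns: Freshman, Sophomore, Junior, Senior.
# Assumption: cells not quoted in the text are illustrative.
table = [
    [1, 4, 5, 3],
    [4, 8, 4, 2],
    [1, 1, 3, 4],
]

row_totals = [sum(row) for row in table]
column_totals = [sum(column) for column in zip(*table)]
grand_total = sum(cell for row in table for cell in row)

print(row_totals)
print(column_totals)
print(grand_total)
```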

Example 12.6

In Example 12.5, we presented data on political party affiliation and class level for the students in Professor Weiss’s introductory statistics course. Consider those students a population of interest.

a. Find the distribution of political party affiliation within each class level.

b. Use the result of part (a) to decide whether the variables “political party affiliation” and “class level” are associated.

c. What would it mean if the variables “political party affiliation” and “class level” were not associated?

d. Explain how a segmented bar graph represents whether the variables “political party affiliation” and “class level” are associated.

e. Discuss another method for deciding whether the variables “political party affiliation” and “class level” are associated.

Solution Example 12.6

a. To obtain the distribution of political party affiliation within each class level, divide each entry in a column of the contingency table in Table 12.9 by its column total. Table 12.10 shows the results.

Table 12.10

Solution Example 12.6

a. The first column of Table 12.10 gives the distribution of political party affiliation for freshmen: 16.7% are Democrats, 66.7% are Republicans, and 16.7% are Other. This distribution is called the conditional distribution of the variable “political party affiliation” corresponding to the value “freshman” of the variable “class level”; or, more simply, the conditional distribution of political party affiliation for freshmen.

Similarly, the second, third, and fourth columns give the conditional distributions of political party affiliation for sophomores, juniors, and seniors, respectively. The “Total” column provides the (unconditional) distribution of political party affiliation for the entire population which, in this context, is called the marginal distribution of the variable “political party affiliation.”
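The conditional and marginal distributions can be computed as in the sketch below. The function names are hypothetical, and the contingency table is partly assumed: Table 12.9 is not reproduced in this transcript, so cells not quoted in the text were filled in to be consistent with the quoted totals and percentages.

```python
def conditional_distributions(table):
    """Divide each cell by its column total (one distribution per column)."""
    column_totals = [sum(column) for column in zip(*table)]
    return [
        [cell / total for cell, total in zip(row, column_totals)]
        for row in table
    ]


def marginal_distribution(table):
    """Divide each row total by the grand total."""
    grand_total = sum(sum(row) for row in table)
    return [sum(row) / grand_total for row in table]


# Rows: Democratic, Republican, Other.
# Columns: Freshman, Sophomore, Junior, Senior.
# Assumption: cells not quoted in the text are illustrative.
table = [
    [1, 4, 5, 3],
    [4, 8, 4, 2],
    [1, 1, 3, 4],
]

conditional = conditional_distributions(table)
marginal = marginal_distribution(table)

freshmen = [round(100 * row[0], 1) for row in conditional]
print(freshmen)
print([round(100 * p, 1) for p in marginal])
```

Each column of `conditional` sums to 1, as does `marginal`, which is what makes them distributions.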

Solution Example 12.6

b. Table 12.10 reveals that the variables “political party affiliation” and “class level” are associated because knowing the value of the variable “class level” imparts information about the value of the variable “political party affiliation.” For instance, as shown in Table 12.10, if we do not know the class level of a student in the course, there is a 32.5% chance that the student is a Democrat. But, if we know that the student is a junior, there is a 41.7% chance that the student is a Democrat.

c. If the variables “political party affiliation” and “class level” were not associated, the four conditional distributions of political party affiliation would be the same as each other and as the marginal distribution of political party affiliation; in other words, all five columns of Table 12.10 would be identical.

Solution Example 12.6

d. A segmented bar graph lets us visualize the concept of association. The first four bars of the segmented bar graph in Fig. 12.4 show the conditional distributions of political party affiliation for freshmen, sophomores, juniors, and seniors, respectively, and the fifth bar gives the marginal distribution of political party affiliation. This segmented bar graph is derived from Table 12.10. If political party affiliation and class level were not associated, the four bars displaying the conditional distributions of political party affiliation would be the same as each other and as the bar displaying the marginal distribution of political party affiliation; in other words, all five bars in Fig. 12.4 would be identical. That political party affiliation and class level are in fact associated is illustrated by the nonidentical bars.

Figure 12.4

Solution Example 12.6

e. Alternatively, we could decide whether the two variables are associated by obtaining the conditional distribution of class level within each political party affiliation. The conclusion regarding association (or nonassociation) will be the same, regardless of which variable’s conditional distributions we examine.

Definition 12.1

Example 12.9

A national survey was conducted to obtain information on the alcohol consumption patterns of U.S. adults by marital status. A random sample of 1772 residents, 18 years old and older, yielded the data displayed in Table 12.13. Suppose we want to use the data in Table 12.13 to decide whether marital status and alcohol consumption are associated.

a. Formulate the problem statistically by posing it as a hypothesis test.

b. Explain the basic idea for carrying out the hypothesis test.

c. Develop a formula for computing the expected frequencies.

d. Construct a table that provides both the observed frequencies in Table 12.13 and the expected frequencies.

e. Discuss the details for making a decision concerning the hypothesis test.

Table 12.13

Solution Example 12.9

a. For a chi-square independence test, the null hypothesis is that the two variables are not associated; the alternative hypothesis is that the two variables are associated. Thus, we want to perform the hypothesis test

H0 : Marital status and alcohol consumption are not associated.

Ha : Marital status and alcohol consumption are associated.

b. The idea behind the chi-square independence test is to compare the observed frequencies in Table 12.13 with the frequencies we would expect if the null hypothesis of nonassociation is true. The test statistic for making the comparison has the same form as the one used for the goodness-of-fit test: $\chi^2 = \sum (O - E)^2/E$, where O represents observed frequency and E represents expected frequency.

Solution Example 12.9

c. To develop a formula for computing the expected frequencies, consider, for instance, the cell of Table 12.13 corresponding to “Married and Abstain,” the cell in the second row and first column. We note that the population proportion of all adults who abstain can be estimated by the sample proportion of the 1772 adults sampled who abstain, that is, by $590/1772 \approx 0.333$.

If no association exists between marital status and alcohol consumption (i.e., if H0 is true), then the proportion of married adults who abstain is the same as the proportion of all adults who abstain.

Solution Example 12.9

c. Therefore, of the 1173 married adults sampled, we would expect about

$$\frac{590}{1772} \cdot 1173 = 390.6$$

to abstain from alcohol.

Let’s rewrite the left side of this expected-frequency computation in a slightly different way. By using algebra and referring to Table 12.13, we obtain

$$\text{Expected frequency} = \frac{590}{1772} \cdot 1173 = \frac{1173 \cdot 590}{1772} = \frac{(\text{Row total}) \cdot (\text{Column total})}{\text{Sample size}}.$$

Solution Example 12.9

c. If we let R denote “Row total” and C denote “Column total,” we can write this equation as

$$E = \frac{R \cdot C}{n}, \tag{12.1}$$

where, as usual, E denotes expected frequency and n denotes sample size.

d. Using Equation (12.1), we can calculate the expected frequencies for all the cells in Table 12.13. For the cell in the upper right corner of the table, we get

$$E = \frac{R \cdot C}{n} = \frac{354 \cdot 225}{1772} = 44.9.$$
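Equation (12.1) translates directly into code. The sketch below uses only the totals quoted in the text (row totals 1173 and 354, column totals 590 and 225, sample size 1772); the function name `expected_frequency` is hypothetical.

```python
def expected_frequency(row_total, column_total, n):
    """Expected cell frequency under nonassociation: E = R * C / n."""
    return row_total * column_total / n


# "Married and Abstain": R = 1173, C = 590, n = 1772.
married_abstain = expected_frequency(1173, 590, 1772)

# Upper right cell of Table 12.13: R = 354, C = 225, n = 1772.
upper_right = expected_frequency(354, 225, 1772)

print(round(married_abstain, 1), round(upper_right, 1))
```

Both results agree with the values worked out by hand above, 390.6 and 44.9.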

Solution Example 12.9

d. In Table 12.14 (on the next slide), we have modified Table 12.13 by including each expected frequency beneath the corresponding observed frequency. Table 12.14 shows, for instance, that of the adults sampled, 74 were observed to be single and consumed more than 60 drinks per month, whereas if marital status and alcohol consumption are not associated, the expected frequency is 44.9.

Table 12.14

Solution Example 12.9

e. If the null hypothesis of nonassociation is true, the observed and expected frequencies should be approximately equal, which would result in a relatively small value of the test statistic, $\chi^2 = \sum (O - E)^2/E$. Consequently, if $\chi^2$ is too large, we reject the null hypothesis and conclude that an association exists between marital status and alcohol consumption. From Table 12.14, we find that

$$\chi^2 = \sum (O - E)^2/E = 94.269.$$

Can this value be reasonably attributed to sampling error, or is it large enough to indicate that marital status and alcohol consumption are associated? Before we can answer that question, we must know the distribution of the $\chi^2$-statistic.
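The whole computation, from observed counts to the statistic, can be sketched as follows. Tables 12.13 and 12.14 are not reproduced in this transcript, so the observed counts are assumptions, chosen to agree with the totals quoted in the text (1772, 1173, 354, 590, 225), the observed count of 74, and the reported value 94.269. The row and column labels in the comments and the function names are likewise assumed.

```python
def expected_table(observed):
    """Expected frequencies E = R * C / n for every cell."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over all cells of the contingency table."""
    expected = expected_table(observed)
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for o, e in zip(observed_row, expected_row)
    )


# Assumed counts. Rows: single, married, widowed, divorced.
# Columns: abstain, 1 to 60 drinks per month, over 60 drinks per month.
observed = [
    [67, 213, 74],
    [411, 633, 129],
    [85, 51, 7],
    [27, 60, 15],
]

expected = expected_table(observed)
chi2 = chi_square_statistic(observed)
print(round(chi2, 3))
```

Because the expected frequencies are built from the row and column totals of the observed table, no separate null distribution needs to be supplied, unlike in the goodness-of-fit setting.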

Key Fact 12.3

Procedure 12.2

Procedure 12.2 (cont.)