CHAPTER 11 CHI-SQUARE AND F DISTRIBUTIONS

CHI-SQUARE TESTS OF INDEPENDENCE (SECTION 11.1 OF UNDERSTANDABLE STATISTICS)

In chi-square tests of independence we use the following hypotheses:

$H_0$: The two variables are independent.

$H_1$: The two variables are not independent.

To use SPSS for tests of independence of two variables, we need to enter the original occurrence records into the data editing screen (or retrieve them from a data file). The Chi-square command then prints a contingency table showing both the observed and expected counts. It computes the sample chi-square value using the following formula, in which E stands for the expected count in a cell and O stands for the observed count in that same cell. The sum is taken over all cells.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Then SPSS gives the number of degrees of freedom of the chi-square distribution. To conclude the test, use the P value of the sample chi-square statistic if your version of SPSS provides it. Otherwise, compare the calculated chi-square value to a table of the chi-square distribution with the indicated degrees of freedom. Use Table 8 of Appendix II of Understandable Statistics. If the calculated sample chi-square value is larger than the value in Table 8 for a specified level of significance, reject $H_0$.
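
As a cross-check on the formula above, the following minimal Python sketch computes the expected counts, the sample chi-square value, and the degrees of freedom from a contingency table. It is not SPSS output. The function name and the 2 x 3 table are hypothetical, chosen only for illustration, and the sketch assumes the usual rules for independence tests: E = (row total)(column total)/(sample size) and degrees of freedom = (rows - 1)(columns - 1).

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: (row total * column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum (O - E)^2 / E over all cells
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical 2 x 3 table, invented purely for illustration
table = [[10, 20, 30],
         [20, 20, 20]]
chi2, df, expected = chi_square_independence(table)
print(round(chi2, 3), df)
print(expected)
```

The resulting statistic can then be compared with the Table 8 value for the same degrees of freedom, exactly as described above.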

Use the menu selection

Analyze > Descriptive Statistics > Crosstabs

Dialog Box Responses

Enter one variable as the row variable. Enter the other variable as the column variable.

Click on [Cells] and check [Observed] as well as [Expected] for Counts. Then click on [Continue].

Click on [Statistics] and check [Chi-square]. Then click on [Continue].

Click on [OK].

Example

Let us first use a small sample to illustrate the procedure. Suppose that among ten students, four are male and six are female. When they vote on a certain issue, one male student votes “yes” and the other three vote “no”; two female students vote “yes” and the other four vote “no”. Use the chi-square test at the 5% level of significance to determine whether the two variables, gender and vote, are independent of each other.

First, enter the data under two variables, Gender and Vote (both of type “string”), as shown below.

366 Copyright © Houghton Mifflin Company. All rights reserved.

Part IV: SPSS Guide 367

Now, use the menu options Analyze > Descriptive Statistics > Crosstabs. Use Gender as the row variable, and Vote as the column variable. Click on [Cells] and check [Observed] as well as [Expected] for Counts, as shown below.

Click on [Continue]. Then click on [Statistics] and check [Chi-square], as shown below.


368 Technology Guide Understandable Statistics, 8th Edition

Click [Continue] and then [OK]. The results follow.

Since the P value (Asymp. Sig.) equals 0.778 which is greater than 0.05, we do not reject the null hypothesis.
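
The reported P value can be checked by hand. The sketch below is an illustrative Python cross-check, not part of the SPSS procedure: it tallies the ten (Gender, Vote) records into a contingency table, much as Crosstabs does, and then computes the sample chi-square value. The P value line relies on an identity that holds only for 1 degree of freedom (the upper-tail probability equals erfc(sqrt(x/2))), so it should not be reused for larger tables.

```python
import math
from collections import Counter

# The ten records from the example, one (Gender, Vote) pair per student
records = ([("male", "yes")] * 1 + [("male", "no")] * 3 +
           [("female", "yes")] * 2 + [("female", "no")] * 4)

counts = Counter(records)  # observed count for each cell
genders = sorted({g for g, _ in records})
votes = sorted({v for _, v in records})
n = len(records)
row_total = {g: sum(counts[(g, v)] for v in votes) for g in genders}
col_total = {v: sum(counts[(g, v)] for g in genders) for v in votes}

chi2 = 0.0
for g in genders:
    for v in votes:
        expected = row_total[g] * col_total[v] / n
        chi2 += (counts[(g, v)] - expected) ** 2 / expected

df = (len(genders) - 1) * (len(votes) - 1)
# Valid for 1 degree of freedom only: P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 3), df, round(p_value, 3))
```

The P value rounds to 0.778, in agreement with the Asymp. Sig. value quoted above.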


Example

Consider an example that involves a relatively large data set. A computer programming aptitude test has been developed for high school seniors. The test designers claim that scores on the test are independent of the type of school the student attends: rural, suburban, or urban. A study involving a random sample of students from these types of institutions yielded the following contingency table. Use the Chi-square command to compute the sample chi-square value and to determine the degrees of freedom of the chi-square distribution. Then determine whether type of school and test score are independent at the α = 0.05 level of significance.

                    School Type
Score        Rural    Suburban    Urban

200–299        33        65         83

300–399        45        79         95

400–500        21        47         63

SPSS conducts the chi-square test on the original occurrence records, as illustrated in the previous example. Therefore, first create a data file containing the original records under two variables, Score and Region. As shown below, using the above contingency table we can use Transform > Compute to see that there are 181 scores between 200 and 299, 219 scores between 300 and 399, and 131 scores between 400 and 500. Altogether there are 531 scores.

Next, in a new data editing screen, define three variables: id (type: numeric), Score (type: string), and Region (type: string). The variable id contains the record number, which equals the row number, and is used to make the entering of data more convenient as described below. We create the data following these steps:


1. Scroll down to row 531 and enter a value (any value) for the variable id. This defines the length of the data.

2. Use Transform > Compute. Select id as the target variable. Under the Function group select All. In that subgroup select the function $Casenum, which will assign the case number (row number) to the variable id.

Click [OK]. The results follow.


3. Now use Transform > Compute to enter data for the variables Score and Region.

For the variable Score, enter “200-299” when 1 <= id <= 181, enter “300-399” when 182 <= id <= 400, and enter “400-500” when 401 <= id <= 531.

For the variable Region, enter “rural” when 1 <= id <= 33 or 182 <= id <= 226 or 401 <= id <= 421; enter “suburban” when 34 <= id <= 98 or 227 <= id <= 305 or 422 <= id <= 468; and enter “urban” when 99 <= id <= 181 or 306 <= id <= 400 or 469 <= id <= 531.

Here is how to enter “200-299” for Score when 1 <= id <= 181. Use Transform > Compute, enter Score as the target variable, enter “200-299” as the string expression, click on [If], then choose [Include if case satisfies condition] and enter the condition id >= 1 & id <= 181.

Click [Continue] and then [OK]. The results follow.


Similarly, enter the rest of the data. The finished data will appear as below (the window split feature is used to display the beginning and the end of the data).
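
The id ranges used in step 3 follow mechanically from the contingency table. As a hedged illustration (plain Python, not SPSS syntax; the variable names are ours), the sketch below expands the table into 531 individual records, reading the cells row by row, and records the first and last id assigned to each cell so that the ranges can be verified.

```python
# Observed counts from the contingency table in this example
scores = ["200-299", "300-399", "400-500"]
regions = ["rural", "suburban", "urban"]
table = [[33, 65, 83],
         [45, 79, 95],
         [21, 47, 63]]

records = []  # one (id, Score, Region) tuple per student
ranges = {}   # (Score, Region) -> (first id, last id)
next_id = 1
for score, row in zip(scores, table):
    for region, count in zip(regions, row):
        ranges[(score, region)] = (next_id, next_id + count - 1)
        for _ in range(count):
            records.append((next_id, score, region))
            next_id += 1

print(len(records))
print(ranges[("300-399", "suburban")])
```

The total record count and each cell's id range can be compared directly with the conditions entered in the Compute dialog.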

Now use the menu options Analyze > Descriptive Statistics > Crosstabs. Use Score as the row variable, and Region as the column variable. Click on [Cells] and check [Observed] as well as [Expected] for Counts. Click on [Continue]. Then click on [Statistics] and check [Chi-square]. Click [Continue] and then [OK]. The results follow.

Since the P value, 0.855, is greater than α = 0.05, we do not reject the null hypothesis.
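
For readers who want to confirm this P value without SPSS, here is a small Python sketch working directly from the contingency table. The helper name is hypothetical, and the closed-form tail probability it uses, exp(-x/2) times the sum of (x/2)^k / k! for k from 0 to df/2 - 1, holds only when the number of degrees of freedom is even (here df = 4).

```python
import math

observed = [[33, 65, 83],
            [45, 79, 95],
            [21, 47, 63]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        e = r * c / n  # expected count for cell (i, j)
        chi2 += (observed[i][j] - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)


def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


p_value = chi2_upper_tail_even_df(chi2, df)
print(round(chi2, 2), df, round(p_value, 3))
```

The computed P value rounds to 0.855, matching the value used in the conclusion above.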


LAB ACTIVITIES FOR CHI-SQUARE TESTS OF INDEPENDENCE

Use SPSS to compute the sample chi-square value. If your version of SPSS produces the P value of the sample chi-square statistic, conclude the test using P values. Otherwise, use Table 8 of Understandable Statistics to find the chi-square value for the given α and degrees of freedom. Compare the sample chi-square value to the value found in Table 8 to conclude the test.

1. We Care Auto Insurance had its staff of actuaries conduct a study to see if vehicle type and loss claim are independent. A random sample of auto claims over six months gives the information in the contingency table.

                     Total Loss Claims per Year per Vehicle
Type of vehicle    $0–999    $1000–2999    $3000–5999    $6000+

Sports car           20          10            16           8

Truck                16          25            33           9

Family Sedan         40          68            17           7

Compact              52          73            48          12

Test the claim that car type and loss claim are independent. Use α = 0.05.

2. An educational specialist is interested in comparing three methods of instruction.

SL–standard lecture with discussion

TV–videotaped lectures with no discussion

IM–individualized method with reading assignments and tutoring, but no lectures

The specialist conducted a study to see whether final exam scores are independent of the method of instruction. A course was taught using each of the three methods, and a standard final exam was given at the end. Students were put into the different method sections at random. The course type and test results are shown in the next contingency table.

                      Final Exam Score
Course Type    < 60    60–69    70–79    80–89    90–100

SL              10       4       70       31        25

TV               8       3       62       27        23

IM               7       2       58       25        22

Test the claim that the instruction method and final exam test scores are independent, using α = 0.01.


ANALYSIS OF VARIANCE (ANOVA) (SECTION 11.5 OF UNDERSTANDABLE STATISTICS)

Section 11.5 of Understandable Statistics introduces single factor analysis of variance (also called one-way ANOVA). We consider several populations which are each assumed to follow a normal distribution. The standard deviations of the populations are assumed to be approximately equal. ANOVA provides a method to compare several different populations to see if the means are the same. Let population 1 have mean $\mu_1$, population 2 have mean $\mu_2$, and so forth. The hypotheses of ANOVA are

$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_n$

$H_1$: not all the means are equal.

In SPSS, use the menu selection Analyze > Compare Means > One-Way ANOVA to perform one-way ANOVA. Use two variables. One variable (column) contains the data from all populations. The other variable contains population numbers (called levels) that indicate which population each data value comes from. An analysis of variance table is printed, as well as a confidence interval for the mean of each level.

Analyze > Compare Means > One-Way ANOVA

Dialog Box Responses

Dependent list: Enter the columns containing the data.

Factor: Enter the column containing levels.

Click on [Options] and choose [Descriptive]

Example

A psychologist has developed a series of tests to measure a person’s level of depression. The composite scores range from 50 to 100, with 100 representing the most severe depression level. A random sample of 12 patients with approximately the same depression level, as measured by the tests, was divided into 3 different treatment groups. Then, one month after treatment was completed, the depression level of each patient was again evaluated. The after-treatment depression levels are given below.

Treatment 1 70 65 82 83 71

Treatment 2 75 62 81

Treatment 3 77 60 80 75

First we enter the data as shown below.


Now use Analyze > Compare Means > One-Way ANOVA. Enter Depression as the dependent variable. Enter Treatment as the factor. Click on [Options] and choose [Descriptive].


Click [Continue] and then [OK]. The results follow.

Since the level of significance α = 0.05 is less than the P value of 0.965, we do not reject $H_0$.
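
The ANOVA result for the depression data can also be cross-checked by hand. The sketch below is an illustration in plain Python under the standard one-way ANOVA definitions (F is the between-group mean square divided by the within-group mean square), which this guide does not spell out. The P value line uses a closed form that is valid only when the numerator has 2 degrees of freedom, that is, when exactly three groups are compared.

```python
groups = {
    "Treatment 1": [70, 65, 82, 83, 71],
    "Treatment 2": [75, 62, 81],
    "Treatment 3": [77, 60, 80, 75],
}

all_values = [x for values in groups.values() for x in values]
n_total = len(all_values)
k = len(groups)
grand_mean = sum(all_values) / n_total
means = {name: sum(v) / len(v) for name, v in groups.items()}

# Between-group and within-group sums of squares
ss_between = sum(len(v) * (means[name] - grand_mean) ** 2
                 for name, v in groups.items())
ss_within = sum((x - means[name]) ** 2
                for name, v in groups.items() for x in v)

df_between = k - 1
df_within = n_total - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Valid only when df_between == 2: P(F > f) = (1 + 2f/d2) ** (-d2/2)
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
print(round(f_stat, 3), df_between, df_within, round(p_value, 3))
```

The P value rounds to 0.965, in line with the value used in the conclusion above.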

LAB ACTIVITIES FOR ANALYSIS OF VARIANCE

1. A random sample of 20 overweight adults was randomly divided into 4 groups. Each group was given a different diet plan, and the weight loss for each individual after 3 months follows:

Plan 1 18 10 20 25 17

Plan 2 28 12 22 17 16

Plan 3 16 20 24 8 17

Plan 4 14 17 18 5 16

Test the claim that the population mean weight loss is the same for the four diet plans, at the 5% level of significance.

2. A psychologist is studying the time it takes rats to respond to stimuli after being given doses of different tranquilizing drugs. A random sample of 18 rats was divided into 3 groups. Each group was given a different drug. The response time to stimuli was measured (in seconds). The results follow.

Drug A 3.1 2.5 2.2 1.5 0.7 2.4

Drug B 4.2 2.5 1.7 3.5 1.2 3.1

Drug C 3.3 2.6 1.7 3.9 2.8 3.5


Test the claim that the population mean response times for the three drugs are the same, at the 5% level of significance.

3. A research group is testing various chemical combinations designed to neutralize and buffer the effects of acid rain on lakes. A random sample of 18 lakes of similar size in the same region have all been affected in the same way by acid rain. The lakes are divided into four groups, and each group of lakes is sprayed with a different chemical combination. An acidity index is then taken after treatment. The index ranges from 60 to 100, with 100 indicating the greatest acid rain pollution. The results follow.

Combination I 63 55 72 81 75

Combination II 78 56 75 73 82

Combination III 59 72 77 60

Combination IV 72 81 66 71

Test the claim that the population mean acidity index after each of the four treatments is the same at the 0.01 level of significance.
