exploratory data analysis with spss

Chapter 12 SPSS Analysis of SPF-2.4 Design

Data from Table 12.2-1

Exploratory Data Analysis With SPSS

Prior to performing an analysis of variance, an exploratory data analysis should be performed. The exploratory data analysis may uncover data recording errors, ANOVA assumptions that appear untenable, and unexpected promising lines of investigation. The vigilance data in Table 12.2-1 of Experimental Design: Procedures for the Behavioral Sciences (page 545) are used to illustrate the procedures for a two-treatment split-plot factorial design. Treatment A is the mode of signal presentation: a1 is a tone and a2 is a light. Treatment B is four successive-hour monitoring periods. The data are as follows.

Table 12.2-1. Vigilance Data

Treatment Combinations

              b1    b2    b3    b4
a1    s1       3     4     7     7
      s2       6     5     8     8
      s3       3     4     7     9
      s4       3     3     6     8
a2    s1       1     2     5    10
      s2       2     3     6    10
      s3       2     4     5     9
      s4       2     3     6    11
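Before typing the scores into SPSS, it can help to have the marginal totals on hand as a check against data-entry errors. The following Python sketch is not part of the SPSS procedure; it simply encodes Table 12.2-1 and sums the scores by signal and by monitoring period. The names VIGILANCE, signal_totals, and period_totals are hypothetical and chosen for illustration.

```python
# Vigilance data from Table 12.2-1, keyed by level of treatment A and subject.
# Each list holds one subject's scores for monitoring periods b1..b4.
VIGILANCE = {
    "a1": {"s1": [3, 4, 7, 7], "s2": [6, 5, 8, 8],
           "s3": [3, 4, 7, 9], "s4": [3, 3, 6, 8]},
    "a2": {"s1": [1, 2, 5, 10], "s2": [2, 3, 6, 10],
           "s3": [2, 4, 5, 9], "s4": [2, 3, 6, 11]},
}


def signal_totals(data):
    """Sum of all scores at each level of treatment A."""
    return {a: sum(sum(scores) for scores in subjects.values())
            for a, subjects in data.items()}


def period_totals(data):
    """Sum of all scores at each level of treatment B (b1..b4)."""
    rows = [scores for subjects in data.values()
            for scores in subjects.values()]
    return [sum(column) for column in zip(*rows)]


print(signal_totals(VIGILANCE))  # {'a1': 91, 'a2': 81}
print(period_totals(VIGILANCE))  # [22, 28, 50, 72]
```

If the totals computed from the SPSS Data View differ from these, a score was probably mistyped.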

1. Double click on the SPSS icon to open SPSS. This action opens the window shown below that asks, What would you like to do?

2. Select the Type in data button and then the OK button in the lower right corner.

This action opens the SPSS Statistics Data Editor window shown next. At the lower left of the window are two rectangular buttons: Data View and Variable View. The Data View button is highlighted, which means that the window for entering new data is open. Before entering new data, the names and details of the variables in the new data set should be defined.

3. Click on the Variable View button to open the following SPSS Statistics Data Editor Variable View window where the characteristics of the data are defined.

You can go directly to the Variable View window by clicking on the Cancel button in the lower right corner of the first window instead of the OK button.

4. Use row 1 of the Variable View window to describe the characteristics of the subjects. Use row 2 to describe the characteristics of treatment A, type of signal. Use rows 3–6 to describe the characteristics of the four levels of treatment B, four monitoring periods. For the data in Table 12.2-1, fill in the columns of row 1 as follows:

Column 1. Name. Name the variable Subject. A variable name can have no more than 64 characters. The first character must be an upper or lower case letter; the remaining characters can be any letter, digit, or the symbols @, #, _, or $. No spaces can appear in the name. Also, variable names cannot end with a period. The inclusion of the Subject variable in row 1 helps to organize the data in the Data View window.

Column 2. Type. This column enables you to define the variable type. The default type is Numeric for number. You can change the variable type by clicking on the Type cell. This action opens the Variable Type window where you can select from nine variable types including Scientific notation, Date, Custom currency, String, and Text. If you have changed the default, you need to click on the OK button at the bottom of the Variable Type window to return to the Variable View window.

Column 3. Width. This column enables you to define the number of characters that are shown for a variable in the SPSS Statistics Data Editor. The default is 8 characters. When you click on the Width cell, a blue scroll box appears on the right side of the cell where the options are 1, . . . , 40 characters.

Column 4. Decimals. This column enables you to define the number of digits shown to the right of the decimal point. When you click on the Decimals cell, a blue scroll box appears on the right side of the cell. The options are 0, . . . , 16 decimals: scroll to 0.

Column 5. Label. This column enables you to provide a descriptive label for the row variable: type in Pilots.

Column 6. Values. This column is used with grouping variables. Do not change None, which is the default.

Column 7. Missing. Do not change None, which is the default. The data in Table 12.2-1 have no missing values.

When you click on the Missing values cell, a blue button appears on the right side of the cell. If you click on the blue button, the Missing Values window that is shown next opens. This window enables you to identify discrete missing values or a range of missing values.

Click on the OK button at the bottom of the window to return to the SPSS Statistics Data Editor window.

Column 8. Columns. This column enables you to specify the number of characters in a column. When you click on the Columns cell, a blue scroll box appears on the right side of the cell where you can select from 1 to 255 characters. Do not change 8, which is the default.

Column 9. Align. This column enables you to determine the alignment of the columns. When you click on the Align cell, the cell enlarges and provides three options: Left, Right, and Center. Do not change Right, which is the default.

Column 10. Measure. This column enables you to specify the level of measurement of your variable. When you click on the Measure cell, the cell enlarges and provides three options: Scale, Ordinal, and Nominal. Click on the Nominal option. Scale denotes either interval or ratio measurement.

Column 11. Role. This column enables you to specify the role that the variable plays in the data set such as input, output, or partitioning data into samples. Do not change Input, which is the default. When you click on the Role cell, the cell enlarges and provides six options: Input, Target, Both, None, Partition, and Split. Further details about Role are available in SPSS’s help menu (Help Topics roles).
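The naming rules given under Column 1 can be checked mechanically before a long list of variables is typed in. The Python sketch below is a simplified, hypothetical checker (is_valid_name is not an SPSS function). It covers only the rules stated above and assumes that a period is allowed inside a name but not at the end; SPSS applies further restrictions, such as reserved keywords, that are not modeled here.

```python
import re

# First character: a letter. Later characters: letters, digits, @ # _ $,
# and (by assumption) interior periods. Only the rules stated in the text.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9@#_$.]*\Z")


def is_valid_name(name: str) -> bool:
    """Return True if name satisfies the simplified naming rules."""
    if not 1 <= len(name) <= 64:      # no more than 64 characters
        return False
    if name.endswith("."):            # cannot end with a period
        return False
    return _NAME_PATTERN.match(name) is not None


for candidate in ["Subject", "Signal", "Time_1", "Type of signal", "1st_hour"]:
    print(candidate, is_valid_name(candidate))
```

The last two candidates fail: one contains spaces and the other begins with a digit, which is why the descriptive text goes in the Label column rather than the Name column.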

5. After you have entered the information for the subject variable in row 1, it is a good idea to give your SPSS file a name and save the file. Select File in the menu and then Save as to name the file Table 12.2-1. The SPSS Statistics Data Editor Variable View window will appear as follows. Notice that Untitled1 in the top left of the window has been replaced with Table 12.2-1.sav.

6. Use row 2 of the Variable View window to describe the characteristics of treatment A, type of signal. The information for row 2 is given next. For the data in Table 12.2-1, fill in the columns of row 2 as follows.

Column 1. Name. Name the variable Signal.

Column 2. Type. Do not change Numeric, which is the default.

Column 3. Width. Do not change 8, which is the default.

Column 4. Decimals. Click on the Decimals cell. In the blue scroll box on the right side of the cell, scroll to 0.

Column 5. Label. Label this independent variable, Type of signal.

Column 6. Values. Click in the Values cell to activate the blue button on the right side of the cell. Click on the blue button to open the Value Labels window that is shown next.

Type “1” in the Value rectangle and “tone” in the Label rectangle. Then click on the Add button to enter the label into the large rectangle. Repeat the procedure by entering “2” in the Value rectangle and “light” in the Label rectangle.

The Value Labels window changes as shown next.

To return to the Variable View window to enter more information about treatment A, click on the OK button at the bottom of the Value Labels window.

Column 7. Missing. This column enables you to designate certain scores as missing. The data in Table 12.2-1 have no missing values. Do not change None, which is the default.

Column 8. Columns. Do not change 8, which is the default.

Column 9. Align. Do not change Right, which is the default.

Column 10. Measure. Click on the Measure cell and select the Nominal option.

Column 11. Role. Do not change Input, which is the default.

7. Use rows 3–6 of the Variable View window to describe the characteristics of the repeated measures treatment, four monitoring periods. The information for row 3 is given next. Rows 4–6 follow the same pattern. For the data in Table 12.2-1, fill in the columns of row 3 as follows.

Column 1. Name. Name the variable Time_1.

Column 2. Type. Do not change Numeric, which is the default.

Column 3. Width. Do not change 8, which is the default.

Column 4. Decimals. Click on the Decimals cell. In the blue scroll box on the right side of the cell, scroll to 0.

Column 5. Label. Label this variable 1st hour.

Column 6. Values. Click in the Values cell to activate the blue button on the right side of the cell. Click on the blue button to open the Value Labels window that is shown next.

Type “1” in the Value rectangle and “1st hour” in the Label rectangle. Then click on the Add button to enter the label into the large rectangle.

The Value Labels window changes as shown next.

To return to the Variable View window to enter more information about treatment B, click on the OK button at the bottom of the Value Labels window.

Column 7. Missing. This column enables you to designate certain scores as missing. The data in Table 12.2-1 have no missing values. Do not change None, which is the default.

Column 8. Columns. Do not change 8, which is the default.

Column 9. Align. Do not change Right, which is the default.

Column 10. Measure. Click on the Measure cell and select the Nominal option.

Column 11. Role. Do not change Input, which is the default.

8. In rows 4–6 repeat the descriptive procedures for the other three monitoring periods. In the Values column for rows 4–6 enter, respectively, “2” for the 2nd hour, “3” for the 3rd hour, and “4” for the 4th hour. The Variable View window should appear as follows:
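The six rows of the Variable View window amount to a small data dictionary. The Python sketch below records the settings described in steps 4 through 8 so they can be reviewed in one place. It is illustrative only: make_variable and VARIABLES are hypothetical names, and the variable names Time_2 through Time_4 are an assumption based on the statement that rows 4–6 follow the same pattern as row 3.

```python
def make_variable(name, label, values=None):
    """One Variable View row, using the settings chosen in the text."""
    return {
        "Name": name, "Type": "Numeric", "Width": 8, "Decimals": 0,
        "Label": label, "Values": values or {}, "Missing": None,
        "Columns": 8, "Align": "Right", "Measure": "Nominal", "Role": "Input",
    }


HOURS = {1: "1st hour", 2: "2nd hour", 3: "3rd hour", 4: "4th hour"}

VARIABLES = [
    make_variable("Subject", "Pilots"),
    make_variable("Signal", "Type of signal", {1: "tone", 2: "light"}),
] + [
    # Time_2..Time_4 are assumed names that follow the pattern of Time_1.
    make_variable(f"Time_{level}", label, {level: label})
    for level, label in HOURS.items()
]

for row in VARIABLES:
    print(f'{row["Name"]:<8} {row["Label"]:<15} {row["Values"]}')
```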

9. Click on Data View in the lower left of the SPSS Statistics Data Editor window to open the Data View window. In rows 1–8, enter the data for each of the eight subjects. After entering the data from Table 12.2-1, the Data View window should appear as follows:

10. To obtain descriptive statistics for the vigilance data, click on Analyze in the menu. Select Descriptive Statistics from the drop-down menu and then Explore (Analyze → Descriptive Statistics → Explore). These actions open the Explore window shown next where the nonrepeated and repeated variables are identified.

Subject is highlighted. Click on the arrow beside Label Cases by to move Subject into the Label Cases by rectangle. Highlight the Type of signal. Click on the arrow beside Factor List to move Type of signal into the Factor List rectangle. Highlight the four time periods. Click on the arrow beside Dependent List to move the four time periods into the Dependent List rectangle.

After these actions, the Explore window should appear as shown next.

11. Click on the Plots button in the upper right corner of the Explore window to open the Explore: Plots window shown next. Click on the Dependents together button and the Histogram check box. Click off the Stem-and-leaf check box.

Click on the Continue button at the bottom of the Explore: Plots window to return to the Explore window. Then click on the OK button in the Explore window to obtain the following descriptive statistics.

Results of the Exploratory Data Analysis

This output summarizes the information that was processed for the eight treatment combinations. It is important to examine the table to be sure that SPSS has correctly interpreted your instructions about the variables and number of observations.

The exploratory data analysis indicates that there are differences among the sample means. The differences would be of practical importance if they occurred in the population. The sample means, trimmed means, and medians are similar. This suggests that the distributions are reasonably symmetrical. However, the histograms and box plots shown next suggest that three of the populations may not be symmetrical.
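The comparison of means and medians can be reproduced outside SPSS as a cross-check. The sketch below uses Python's standard statistics module to compute the mean and median of each of the eight treatment combinations from the Table 12.2-1 scores. It does not compute the 5% trimmed mean that Explore also reports, and the names CELLS and cell_summary are hypothetical.

```python
from statistics import mean, median

# Scores from Table 12.2-1: one inner list per subject, periods b1..b4.
CELLS = {
    "tone":  [[3, 4, 7, 7], [6, 5, 8, 8], [3, 4, 7, 9], [3, 3, 6, 8]],
    "light": [[1, 2, 5, 10], [2, 3, 6, 10], [2, 4, 5, 9], [2, 3, 6, 11]],
}


def cell_summary(cells):
    """Return {(signal, period): (mean, median)} for every cell."""
    summary = {}
    for signal, subjects in cells.items():
        for period, scores in enumerate(zip(*subjects), start=1):
            summary[(signal, period)] = (mean(scores), median(scores))
    return summary


for (signal, period), (m, md) in cell_summary(CELLS).items():
    print(f"{signal:<5} hour {period}: mean = {m:5.2f}  median = {md:5.2f}")
```

With only four scores per cell, close agreement between a mean and a median is weak evidence about symmetry, which is why the histograms and box plots are also examined.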

The box plots identified four outliers and one extreme outlier. See Sections 3.6 and 3.7 of Experimental Design: Procedures for the Behavioral Sciences for ways to deal with nonnormality, unequal variances, and outliers.

Analysis of Variance With SPSS

1. To perform an analysis of variance on the vigilance data, click on Analyze in the menu; select General Linear Model from the drop-down menu, and then Repeated Measures (Analyze → General Linear Model → Repeated Measures). These actions open the Repeated Measures Define Factor(s) window shown next.

2. Replace “factor 1” in the rectangle with Periods. Type 4 in the Number of Levels box. This instructs SPSS to treat the four levels of Periods as different levels of the same variable, not as different variables. Click on the Add button to move Periods into the large rectangle. Type the name of the dependent variable, Latency, in the Measure Name rectangle. Click on the Add button to move Latency into the large rectangle. The Repeated Measures Define Factor(s) window should appear as shown next.

3. Click on the Define button in the lower left corner of the Repeated Measures Define Factor(s) window. This action opens the Repeated Measures window shown next. Select Type of signal and click on the arrow beside the Between-Subjects Factor(s) rectangle to move Type of signal into the rectangle. Select the four time periods and click on the arrow beside the Within-Subjects Variables box to move them into the large box to replace the four _?_(Latency). This action identifies the different levels of the repeated measures variable.

4. After these actions, the Repeated Measures window should appear as shown next.

5. Click on the Plots button on the right side of the Repeated Measures window to open the Repeated Measures: Profile Plots window that is shown next.

6. Transfer Signal into the Separate Lines rectangle by clicking on the arrow beside the rectangle. Transfer Periods into the Horizontal Axis rectangle by clicking on the arrow beside the rectangle. Then click on the Add button to transfer Signal and Periods into the large Plots rectangle in the lower part of the Repeated Measures: Profile Plots window. These actions will produce the following window.

Notice that Periods*Signal appears in the large Plots rectangle in the lower part of the Repeated Measures: Profile Plots window. Click on the Continue button at the bottom of the window to return to the Repeated Measures window. In the Repeated Measures window, click on the Options button. This action opens the Repeated Measures: Options window shown next.

7. In the Repeated Measures: Options window, highlight Signal and click on the arrow beside the Display Means for rectangle to transfer Signal to the rectangle. Repeat the procedure for Periods. Check the box beside Compare main effects. Scroll in the Confidence interval adjustment rectangle to select Bonferroni. Then in the Display area check the boxes beside Descriptive statistics and Estimates of effect size. The Repeated Measures: Options window should appear as shown next.

8. In the Repeated Measures: Options window, click on the Continue button to return to the Repeated Measures window. Click OK at the bottom of the Repeated Measures window to obtain the results of the split-plot factorial ANOVA.

Results of the Analysis of Variance

The SPSS output contains some results that are not relevant to a split-plot factorial analysis of variance. SPSS lists all of the results sub tables on the left side of the results window. A list of these sub tables is shown next.

Some of the sub tables such as the Multivariate Tests are not relevant in this example. The tables can be deleted by highlighting each one and selecting Edit from the menu and then Delete. Alternatively, the irrelevant sub tables can be ignored. The following output contains only the relevant sub tables.

It is important to examine the Within-Subjects Factors, Between-Subjects Factors, and the Descriptive Statistics tables to be sure that SPSS has correctly interpreted your instructions about the variables and number of observations.

The key assumption for the within-blocks tests is multisample sphericity (see Section 12.4 in Experimental Design: Procedures for the Behavioral Sciences). This assumption contains two assertions: $\mathbf{C}_B^{*\prime}\boldsymbol{\Sigma}_{a_j}\mathbf{C}_B^{*}$ is the same at each level of treatment A, and each $\mathbf{C}_B^{*\prime}\boldsymbol{\Sigma}_{a_j}\mathbf{C}_B^{*}$ satisfies the sphericity condition. I recommend a procedure due to P. Harris for testing the multisample sphericity assumption. Mauchly’s test in SPSS does not test this multisample sphericity assumption. When, as in this example, naj ≤ 8, it is customary to adopt the α = .25 level of significance for preliminary tests on a model to increase the power of the test. Harris’s test statistic, $W = 11.84$, is less than the critical value, $W_{.25;\,2,3,4} = 11.878$. Hence, the multisample sphericity assumption is tenable.

The Mauchly table provides three ε values for adjusting the degrees of freedom for treatment B and the A × B interaction. In SPSS, the use of ε̂ = .584 for adjusting the degrees of freedom for the F test corresponds to performing an adjusted F test. Note that the designations of the Greenhouse-Geisser and Huynh-Feldt tests in SPSS do not correspond to my designations on pages 558–560 of Experimental Design: Procedures for the Behavioral Sciences.
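The value .584 can be recovered from the raw scores. The Python sketch below pools the within-group sums of squares and cross-products across the two signals and applies Box's formula for the sphericity correction, which is the quantity SPSS labels Greenhouse-Geisser. It is a cross-check under the assumption of equal group sizes, not a description of how SPSS computes the value internally; pooled_sscp and box_epsilon are hypothetical names.

```python
# Scores from Table 12.2-1: one inner list per subject, periods b1..b4.
GROUPS = [
    [[3, 4, 7, 7], [6, 5, 8, 8], [3, 4, 7, 9], [3, 3, 6, 8]],       # a1, tone
    [[1, 2, 5, 10], [2, 3, 6, 10], [2, 4, 5, 9], [2, 3, 6, 11]],    # a2, light
]


def pooled_sscp(groups):
    """Pooled within-group sums of squares and cross-products matrix."""
    k = len(groups[0][0])
    sscp = [[0.0] * k for _ in range(k)]
    for rows in groups:
        means = [sum(col) / len(rows) for col in zip(*rows)]
        for row in rows:
            dev = [x - m for x, m in zip(row, means)]
            for i in range(k):
                for j in range(k):
                    sscp[i][j] += dev[i] * dev[j]
    return sscp


def box_epsilon(s):
    """Box's epsilon; unchanged if s is multiplied by a constant."""
    k = len(s)
    diag_mean = sum(s[i][i] for i in range(k)) / k
    grand_mean = sum(map(sum, s)) / k ** 2
    row_means = [sum(row) / k for row in s]
    sum_squares = sum(x * x for row in s for x in row)
    numerator = k ** 2 * (diag_mean - grand_mean) ** 2
    denominator = (k - 1) * (sum_squares
                             - 2 * k * sum(m * m for m in row_means)
                             + k ** 2 * grand_mean ** 2)
    return numerator / denominator


epsilon = box_epsilon(pooled_sscp(GROUPS))
print(round(epsilon, 3))  # 0.584
```

Because the formula is scale invariant, the pooled cross-products matrix can be used directly without dividing by its 6 degrees of freedom.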

The tests of treatment B and the A × B interaction are significant: F = MSB/MSB × Blocks w A = 64.833/0.507 = 127.89 and F = MSA × B/MSB × Blocks w A = 6.458/0.507 = 12.74. Because the A × B interaction is significant, interest shifts from treatment B to understanding the interaction. Procedures for performing tests of treatment-contrast interactions are described in Section 12.6 of Experimental Design: Procedures for the Behavioral Sciences.

The Within-Subjects Contrasts table provides tests of the linear, quadratic, and cubic trends. These tests are appropriate for quantitative independent variables such as treatment B that are separated by equal intervals and have equal ns.

When multisample sphericity is tenable, it is customary to use MSB × Blocks w A in the denominator of each trend test. When sphericity is not tenable, an error term appropriate for the specific trend, for example, MSB × Blocks w Alin, should be used.
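The trend tests rest on a partition of the treatment B sum of squares into linear, quadratic, and cubic components. The Python sketch below shows that partition for the Table 12.2-1 scores, using the standard orthogonal polynomial coefficients for four equally spaced levels. It reports sums of squares only and does not reproduce the error terms or F ratios in the SPSS Within-Subjects Contrasts table; TREND_COEFFICIENTS and trend_sums_of_squares are hypothetical names.

```python
# Scores from Table 12.2-1: one row per subject (both signals), periods b1..b4.
SCORES = [
    [3, 4, 7, 7], [6, 5, 8, 8], [3, 4, 7, 9], [3, 3, 6, 8],
    [1, 2, 5, 10], [2, 3, 6, 10], [2, 4, 5, 9], [2, 3, 6, 11],
]

# Orthogonal polynomial coefficients for four equally spaced levels.
TREND_COEFFICIENTS = {
    "linear":    [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic":     [-1, 3, -3, 1],
}


def trend_sums_of_squares(scores, coefficients):
    """SS for each trend: (sum c_j * T_j)^2 / (n * sum c_j^2)."""
    totals = [sum(col) for col in zip(*scores)]   # period totals T_j
    n = len(scores)                               # scores per period
    result = {}
    for name, c in coefficients.items():
        contrast = sum(cj * tj for cj, tj in zip(c, totals))
        result[name] = contrast ** 2 / (n * sum(cj * cj for cj in c))
    return result


ss = trend_sums_of_squares(SCORES, TREND_COEFFICIENTS)
for name, value in ss.items():
    print(f"{name:<9} SS = {value:.1f}")
print(f"total     SS = {sum(ss.values()):.1f}")  # equals SSB = 194.5
```

Nearly all of the treatment B variation falls in the linear component, which matches the steady rise of the scores over the four monitoring periods.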

The test of treatment A is not significant: F = MSA/MSBlocks w A = 3.125/1.563 = 2.00. Because the A × B interaction is significant, interest shifts from treatment A to understanding the interaction. Procedures for performing tests of treatment-contrast interactions are described in Section 12.6 of Experimental Design: Procedures for the Behavioral Sciences.
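The three F ratios can be verified by hand from the Table 12.2-1 scores. The Python sketch below computes the sums of squares, degrees of freedom, mean squares, and F statistics for a split-plot factorial design using the usual computational formulas. It assumes equal numbers of subjects at each level of treatment A and is a cross-check on the SPSS output, not a replacement for it; split_plot_anova is a hypothetical name.

```python
# Scores from Table 12.2-1: one inner list per subject, periods b1..b4.
GROUPS = {
    "a1": [[3, 4, 7, 7], [6, 5, 8, 8], [3, 4, 7, 9], [3, 3, 6, 8]],
    "a2": [[1, 2, 5, 10], [2, 3, 6, 10], [2, 4, 5, 9], [2, 3, 6, 11]],
}


def split_plot_anova(groups):
    """Return (SS, df, MS, F) dictionaries for an SPF-p.q design, equal n."""
    blocks = list(groups.values())
    p, n, q = len(blocks), len(blocks[0]), len(blocks[0][0])
    scores = [x for g in blocks for row in g for x in row]
    cf = sum(scores) ** 2 / len(scores)           # correction term

    ss_total = sum(x * x for x in scores) - cf
    ss_a = sum(sum(map(sum, g)) ** 2 for g in blocks) / (n * q) - cf
    ss_subjects = sum(sum(row) ** 2 for g in blocks for row in g) / q - cf
    ss_b = sum(sum(row[j] for g in blocks for row in g) ** 2
               for j in range(q)) / (n * p) - cf
    ss_cells = sum(sum(row[j] for row in g) ** 2
                   for g in blocks for j in range(q)) / n - cf
    ss_ab = ss_cells - ss_a - ss_b

    ss = {"A": ss_a, "Blocks w A": ss_subjects - ss_a, "B": ss_b,
          "A x B": ss_ab,
          "B x Blocks w A": ss_total - ss_subjects - ss_b - ss_ab}
    df = {"A": p - 1, "Blocks w A": p * (n - 1), "B": q - 1,
          "A x B": (p - 1) * (q - 1),
          "B x Blocks w A": p * (n - 1) * (q - 1)}
    ms = {source: ss[source] / df[source] for source in ss}
    f = {"A": ms["A"] / ms["Blocks w A"],
         "B": ms["B"] / ms["B x Blocks w A"],
         "A x B": ms["A x B"] / ms["B x Blocks w A"]}
    return ss, df, ms, f


ss, df, ms, f = split_plot_anova(GROUPS)
for source in ss:
    f_text = f"{f[source]:8.2f}" if source in f else ""
    print(f"{source:<15} {ss[source]:8.3f} {df[source]:3d} {ms[source]:8.3f} {f_text}")
```

The printed mean squares and F ratios agree with the values quoted above: 2.00 for treatment A, 127.89 for treatment B, and 12.74 for the A × B interaction.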

The Estimates table provides 95% two-sided confidence intervals for the treatment A means. The Pairwise Comparisons table provides a test of the treatment A contrast. Because the A × B interaction is significant, interest shifts to understanding the interaction.
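A conventional 95% confidence interval for each treatment A mean can be sketched from the between-subjects error term. The Python code below assumes the interval has the form mean ± t × sqrt(MSBlocks w A/(nq)) with 6 degrees of freedom, and it hard-codes the two-sided critical value t = 2.447 from a t table. Both the formula and the constant are assumptions for illustration, so the result should be compared with the SPSS Estimates table rather than taken as its definition.

```python
import math

# Scores from Table 12.2-1: one inner list per subject, periods b1..b4.
GROUPS = {
    "tone":  [[3, 4, 7, 7], [6, 5, 8, 8], [3, 4, 7, 9], [3, 3, 6, 8]],
    "light": [[1, 2, 5, 10], [2, 3, 6, 10], [2, 4, 5, 9], [2, 3, 6, 11]],
}

T_CRITICAL_DF6 = 2.447  # assumed two-sided .05 critical value of t, df = 6


def treatment_a_intervals(groups, t_critical):
    """Return {signal: (mean, lower, upper)} using MS for blocks within A."""
    n = len(next(iter(groups.values())))          # subjects per signal
    q = len(next(iter(groups.values()))[0])       # monitoring periods
    means, ss_blocks = {}, 0.0
    for signal, rows in groups.items():
        subject_means = [sum(row) / q for row in rows]
        means[signal] = sum(subject_means) / n
        ss_blocks += q * sum((m - means[signal]) ** 2 for m in subject_means)
    ms_blocks = ss_blocks / (len(groups) * (n - 1))
    half_width = t_critical * math.sqrt(ms_blocks / (n * q))
    return {signal: (m, m - half_width, m + half_width)
            for signal, m in means.items()}


intervals = treatment_a_intervals(GROUPS, T_CRITICAL_DF6)
for signal, (m, lower, upper) in intervals.items():
    print(f"{signal:<5} mean = {m:.3f}  95% CI = ({lower:.3f}, {upper:.3f})")
```

The two intervals overlap substantially, which is consistent with the nonsignificant test of treatment A.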

The Univariate Tests table duplicates the treatment A test in the Tests of Between-Subjects Effects table and provides a sample measure of strength of association, partial eta squared. I prefer to use partial omega squared to estimate the population strength of association. This measure is described in Section 12.2 of Experimental Design: Procedures for the Behavioral Sciences.
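Partial eta squared is the ratio of an effect's sum of squares to the sum of that effect's and its error term's sums of squares. The Python sketch below applies this definition to the three effects. The sums of squares are recovered by multiplying the mean squares reported above by their degrees of freedom (1, 6, 3, 3, and 18), and partial_eta_squared is a hypothetical helper name. Partial omega squared is not computed here.

```python
# Sums of squares recovered from the reported mean squares times their df.
SS = {
    "A": 3.125,                 # df = 1
    "Blocks w A": 9.375,        # df = 6, error term for A
    "B": 194.5,                 # df = 3
    "A x B": 19.375,            # df = 3
    "B x Blocks w A": 9.125,    # df = 18, error term for B and A x B
}

ERROR_TERM = {"A": "Blocks w A", "B": "B x Blocks w A", "A x B": "B x Blocks w A"}


def partial_eta_squared(ss, effect, error):
    """SS_effect / (SS_effect + SS_error)."""
    return ss[effect] / (ss[effect] + ss[error])


for effect, error in ERROR_TERM.items():
    value = partial_eta_squared(SS, effect, error)
    print(f"{effect:<6} partial eta squared = {value:.3f}")
```

Because partial eta squared describes the sample, it tends to overstate the strength of association in the population, which is the reason for preferring partial omega squared.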

The Estimates table provides 95% two-sided confidence intervals for the treatment B means. The Pairwise Comparisons table provides a test of the treatment B contrast. Because the A × B interaction is significant, interest shifts to understanding the interaction.

The Profile Plots figure provides a picture of the sample data. The figure is useful for identifying interesting treatment-contrast interactions.

Reference

Kirk, R. E. (2013). Experimental Design: Procedures for the Behavioral Sciences (4th ed.). Thousand Oaks, CA: Sage.

Suggested Readings

Brace, N., Kemp, R., & Snelgar, R. (2013). SPSS for Psychologists (5th ed.). New York, NY: Routledge.

Kinnear, P. R., & Gray, C. D. (2011). IBM SPSS Statistics Made Simple 18. New York, NY: Psychology Press.
