
Workshop: Introduction to

SPSS for Windows

September 28th, 2012

Presenters:

Kevin Callender & Oriana Aragon

SPSS version currently on StatLab computers: 19.0

Introduction
Getting Data into SPSS
Analyzing Data
Creating Graphs
Saving or Printing Data, Output or Graphs
Appendix


Introduction

Workshop overview: The purpose of this workshop is to help new users achieve a basic understanding of SPSS. This basic understanding will provide a foundation upon which to build a more comprehensive skill set based on your particular analytical needs. By the end of the workshop, you should be able to do all of the following:

o Navigate the interface
o Enter and upload data
o Perform basic statistical analyses
o Create graphs that help you visualize your data and results

• SPSS is a statistical analysis and data management package widely used in the social sciences. In SPSS, most features are accessible through menus, located at the top of display windows.

• When you start SPSS for the first time, you are greeted with the following dialog box:

• To get started, choose from the options and click 'OK'.

• There are three main window displays when using SPSS: the Data Editor, the Viewer, and the Syntax Editor. The Data Editor and the Viewer will open whenever you open a data set.

o Data Editor:

§ The Data Editor is a spreadsheet style window which displays data and information about the variables in that data. You can enter data directly in this window, edit data, and edit variable names, variable type, variable labels and more.

§ This window has two tabs at the bottom left of the window. The Data View tab allows you to see the data, with variable names across the top (in columns). The Variable View tab (below) lets you see your variables in rows, with the variable information in the columns.

§ You can open more than one data set at a time. Each data set will open in a separate Data Editor window. The currently active data set will have a green plus sign displayed in its icon in the upper left corner of the window. The same plus sign will be displayed in the icon on the taskbar in Windows. When you select an action from the menus, SPSS will perform that action on the currently active data set.

o Viewer:

§ The Viewer displays output of commands including charts, graphs, tables, and any error or warning messages. The left-hand part of the window displays the outline view of output, allowing you to easily select what you'd like to view, hide or delete from the file. The right side is where the content is displayed. You can edit most output by simply double-clicking it in the right side of the window. You can also open more than one Viewer if you wish. This feature comes in handy if you want to refer to the results of an analysis run previously (or by someone else) while working on a new analysis. Make sure to keep track of which Viewer is currently active when running analyses, as the output will be directed to the active Viewer (the last one you used or clicked).

o Syntax Editor:

§ The Syntax Editor displays statements from the SPSS command language, or syntax. This window is basically a plain text editor where you can type (or paste) commands you wish SPSS to run.

§ The Syntax Editor does not open by default.

• Although the menus displayed on each window include options specific to that window, several key menus are displayed on all SPSS windows, including File, Edit, Data, Transform and Analyze, to allow quick access to core functionality.

• First time users: A tutorial is available in SPSS. Click on Help > Tutorial. As you can see from the table of contents screen shown below, it is a comprehensive introduction and well worth exploring.

Getting Data into SPSS

Direct Entry

• You can use the SPSS Data Editor to enter raw data manually. Make sure you are in Data View. Check the tab at the very bottom of the Data Editor as shown:

• Once you are in the Data View just click in a cell and begin typing.

• The Variable View tab allows you to edit the variable names, type (numeric, string, etc.), variable labels, value labels, missing values and other variable attributes.


SPSS or .sav Files

• Click on File > Open > Data... to open an SPSS (.sav) file.

Excel, Stata, SAS, database Files

• SPSS can also directly open files from Excel, Stata, SAS, and several database files. Select the appropriate type from 'Files of type:' just below the 'File name:' text box.

Analyzing Data

• There are two basic ways to use SPSS to analyze, manage and present data:

1. Select procedures from the menus (Data, Transform, Analyze, Graphs).
2. Open a Syntax window and type in commands directly using the SPSS command language.

• Below I'll demonstrate four basic analyses: a Student's t-test, One-Way ANOVA, Bivariate Correlation, and Simple Linear Regression. We'll use the sample data file 'Employee data.sav' for all four analyses.

• The path for this file is C:\Program Files\IBM\SPSS\Statistics\19\Samples\English\Employee data.sav.

Student's t-test

o The t test (for independent samples) is found under Analyze > Compare Means > Independent-Samples T Test..., which brings up the dialog box below:

o We will compare men and women using 'Current Salary' and 'Beginning Salary'.

o First, click on 'Current Salary' then click the arrow to move the variable into the 'Test Variable(s)' box. Repeat with 'Beginning Salary'. Now click on 'Gender' then click the arrow to move it to the 'Grouping Variable' box. The 'Define Groups...' button will become active. Your dialog box will now look like this:

o Click on 'Define Groups...' to get this dialog box (I've already filled in the group values):

o This dialog lets you specify how to identify the groups for the analysis. 'Gender' has only two values, so I filled in the values in the appropriate boxes. In this case, they are letters, but the same can be done when the groups are numerical. The 'Cut point:' option lets you specify a value to divide the variable into two groups. Cases with values greater than or equal to the cut point will form group 2 and all cases with values less than the cut point will form group 1. If your grouping variable has short text values (8 characters or fewer, for example 'Yes' or 'No'), you can also type the short text into the group boxes.

o What if the grouping variable has more than two values and I don't want to use a cut point? Good question. Let's discuss a variable with the values: High, Medium, Low and a value for Missing (3, 2, 1, and 99 respectively). We could compare High and Low by designating 'Group 1:' in the dialog box to be 3 and 'Group 2:' to be 1. This analysis would ignore the Medium and Missing groups.

o Click 'Continue'.

o The 'Options' button allows you to specify a different confidence interval for your test than the default 95%. You can also opt to exclude cases listwise instead of the default (analysis by analysis).

o Click 'OK' to run the analysis.

o Part of the output from this analysis is shown below (this should be in your Viewer window):

o Since Levene's Test for Equality of Variances is significant for both variables (this indicates that the variances are significantly different), we'll look at the t values for the rows labeled 'Equal variances not assumed' to discover the results of our analysis. Both comparisons are significant. On average, female employees earn about $15,410 less currently and $7,210 less at the beginning of employment.

o If you were to click 'Paste' instead of clicking 'OK' after filling in the dialog box, this is the syntax that SPSS would generate:

T-TEST GROUPS=gender('f' 'm')
  /MISSING=ANALYSIS
  /VARIABLES=salary salbegin
  /CRITERIA=CI(.95).

The point-and-click interface in SPSS includes many defaults in the syntax it generates. To run the same analysis, the following is all you need (don't forget the / before VARIABLES or the period at the end of the command). Because 'gender' is a text variable, its two group values still have to be listed:

T-TEST GROUPS=gender('f' 'm') /VARIABLES=salary salbegin.

o To use a cut point to assign groups, include a single value (the cut point) in parentheses after the grouping variable in the GROUPS subcommand, instead of two values separated by a space.
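As a sketch of the two grouping styles described above, the commands below could be typed into the Syntax Editor. The first uses a cut point and assumes the sample file contains a numeric variable named educ (years of education), which is not used elsewhere in this handout. The second compares two specific values of a hypothetical variable named rating (coded 3 = High, 2 = Medium, 1 = Low, 99 = Missing) on a hypothetical outcome named score.

```spss
* Cut point: cases with educ at or above 12 form one group, the rest form the other.
* The variable name educ is an assumption about the sample file.
T-TEST GROUPS=educ(12)
  /VARIABLES=salary salbegin.

* Two specific values: compare High (3) with Low (1) and ignore all other codes.
* The variables rating and score are hypothetical.
T-TEST GROUPS=rating(3 1)
  /VARIABLES=score.
```

Cases coded 2 or 99 on rating are simply left out of the second test, which matches the High versus Low comparison discussed earlier.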

One-Way ANOVA

o The One-Way ANOVA is found under Analyze > Compare Means > One-Way ANOVA... and produces the dialog box below:

o For this analysis the groups will be formed by levels of 'Employment Category' and we'll see if 'Months since Hire' differs by group. Click on the variable (on the left) 'Months since Hire' then move it to the 'Dependent List:' box by clicking on the upper arrow button. Now click on 'Employment Category' and then click the lower arrow button to move that variable to the 'Factor:' box.

o At this point the 'OK' button and the 'Paste' button activate, as you have entered the minimum information needed to run a One-Way ANOVA. However, you may want to click the 'Options...' button to bring up the following dialog:

o All of these options will be useful at different times, depending on your circumstances. A quick reference is available by clicking the 'Help' button. Let's check all the boxes under Statistics and the Means plot as well. We'll leave Missing Values as the default.

o Contrasts are not needed for this example. If you do need to run contrasts, click on the 'Contrasts...' button and it will open a dialog to allow you to define contrasts. The 'Post Hoc...' button is covered in the GLM example in the appendix.

o Click 'OK' to run the analysis.

o The output in the Viewer reveals that there is no significant difference between the three groups. The p value for an F of 0.031 with 2 between-groups degrees of freedom and 471 within-groups degrees of freedom is 0.970 (much higher than .05).

o Let's take a second to look at the means plot, though:

o At first glance, that certainly LOOKS like a big difference, doesn't it? The default settings for charts in SPSS can sometimes be inappropriate for the data you are trying to display. This is one of those cases. In this situation, the difference between 81.07 and 81.55 is not important (or significant) and should not be exaggerated by the scale of the chart. Let's fix that.

§ First: double-click on the chart in the output window. This will open the Chart Editor in a new window, with your chart ready to edit.


§ Click on one of the scale points on the Y axis (the vertical one on the left). The scale points will now have a blue box around them.

§ Right Click in one of the blue boxes and select the first option 'Properties Window' to get the Properties dialog for the Y axis. (While editing a chart in SPSS, keep this in mind: When in doubt, right click on the item you want to change.) Click on the 'Scale' tab at the upper right and you should be seeing the following:

§ For those of you familiar with Excel, this may look familiar. We need to change the minimum and maximum from 'Auto' to 'Custom'. Uncheck the boxes next to 'Minimum' and 'Maximum' and replace the values to the right with a minimum of 50 and a maximum of 100. Let's also change the 'Major Increment' to 5. Click 'Apply' to update your chart. Hint: if you drag the properties box next to the chart, you can update the chart and see the impact of each change before clicking 'Close'. Those changes give us the following chart:


§ That's much better; the chart now displays the information in a more appropriate context.
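For reference, the One-Way ANOVA set up in the dialogs above can also be requested from the Syntax Editor. The following is a minimal sketch, not the exact text that 'Paste' would produce: it assumes the variables are named jobtime (Months since Hire) and jobcat (Employment Category) in the sample file, and it requests only some of the Statistics options.

```spss
* One-Way ANOVA of months since hire by employment category.
* The variable names jobtime and jobcat are assumptions about the sample file.
ONEWAY jobtime BY jobcat
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /PLOT MEANS
  /MISSING ANALYSIS.
```

The /PLOT MEANS subcommand asks for the same means plot discussed above, so the axis scale would still need to be adjusted in the Chart Editor.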

o Bivariate Correlation:

§ The Bivariate Correlation is found under Analyze > Correlate > Bivariate... and produces the dialog box below:


o For this analysis we will be testing the linear relationship between ‘Previous Experience’ and ‘Current Salary.’ Click on the variable (on the left) ‘Previous Experience’ then move it to the 'Variables List:' box by clicking on the upper arrow button. Repeat for ‘Current Salary.’

o Before proceeding, you may want to click the 'Options...' button to bring up the following dialog:

• By default, SPSS will exclude cases pairwise. Deleting cases listwise or pairwise are ways of dealing with missing values in data.

o If you select listwise deletion, then a respondent or case is omitted from the entire analysis if the respondent has a missing value for any of the variables being tested.

o If you select pairwise deletion, then a respondent or case is only omitted from analysis involving the specific variable(s) where missing values are present (this is less harsh).

o Leave the defaults for this analysis by clicking ‘Continue,’ then click the 'OK' button to run the Bivariate Correlation. Our output should look like this:

o According to the output, higher current salary is linearly associated with less previous experience. The correlation coefficient r is equal to -.097. The p-value for this association is less than .05, with 472 degrees of freedom (N - 2, where N = 474). You will notice that variables are perfectly correlated with themselves (r = 1). You will also see that the correlation coefficient between two variables remains the same regardless of the order in which they are examined.
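The same correlation can be requested from the Syntax Editor. This is a sketch that assumes 'Previous Experience' is stored under the variable name prevexp in the sample file; salary is the name used in the t-test syntax earlier in this handout.

```spss
* Pearson correlation between previous experience and current salary.
* The variable name prevexp is an assumption about the sample file.
CORRELATIONS
  /VARIABLES=prevexp salary
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```

Changing /MISSING=PAIRWISE to /MISSING=LISTWISE switches to listwise deletion, the other option described in the 'Options...' dialog.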


Simple Linear Regression

o Linear Regression is found under Analyze > Regression > Linear... and produces the dialog box below:

o For this analysis, we will essentially repeat the bivariate correlation procedure, but within a regression framework. I will demonstrate how a bivariate correlation is in some ways equivalent to a simple linear regression (i.e., a regression with one independent and one dependent variable).

o Select ‘Current Salary’ and move it to the ‘Dependent’ box. Select ‘Previous Experience’ and move it to the ‘Independent(s)’ box. Nothing else needs to be added at this moment. Once you have the following screen, click ‘OK.’

o Your output will include a Model Summary, an ANOVA table, and a Coefficients table. For now, let's jump to the Coefficients table. The unstandardized coefficient 'B' indicates the change in 'Current Salary' (the DV) that occurs when 'Previous Experience' (the IV) increases by one unit, which in this case is one month. The Std. Error of 7.479 indicates the precision of that estimate, and the standardized coefficient, Beta, is the number of standard deviations by which the dependent variable changes per one standard deviation increase in the predictor variable. The t statistic assesses whether or not the coefficient is significantly different from zero. Notice that Beta (-.097) and the p-value associated with it (.034) are exactly the same as the r coefficient and its p-value from the bivariate correlation we conducted earlier.

o In the case of a simple linear regression, the standardized beta is equivalent to the Pearson correlation between the two variables. This fact also highlights that although the format of a regression suggests one variable "predicting" an outcome, causation cannot be inferred from the analysis alone; correlation does not indicate causation. Let's take a look at what happens when the IV and the DV for this analysis are reversed. The following output is given:

o You’ll notice that the values under unstandardized coefficients have changed, but the standardized coefficients have remained the same. A one unit increase in ‘Current Salary’ (i.e., a $1 increase) is linearly associated with .001 fewer months of work experience. Despite the change in scale, the standardized association stays the same. Similar to the correlation coefficient, the standardized beta coefficient between two variables remains the same regardless of which is the IV and which is the DV.
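Both regressions discussed above can be run back to back from the Syntax Editor, which makes it easy to compare the two Coefficients tables. This is a minimal sketch rather than the full text 'Paste' generates, and it assumes 'Previous Experience' is stored as prevexp in the sample file.

```spss
* Simple linear regression: current salary predicted from previous experience.
* The variable name prevexp is an assumption about the sample file.
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT salary
  /METHOD=ENTER prevexp.

* The same pair of variables with the roles of IV and DV reversed.
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT prevexp
  /METHOD=ENTER salary.
```

In the two Coefficients tables, the B values should differ while the standardized Beta stays the same, as described above.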

Creating Graphs

• Click on Graphs > Chart Builder... to access the Chart Builder.

• By default, SPSS will display the following warning before you access the Chart Builder:

• Clicking 'OK' takes you to the Chart Builder. The 'Define Variable Properties...' option opens the Define Variable Properties dialog (also available through the menus: Data > Define Variable Properties... ). SPSS uses the Variable Properties to determine which options are available for various graphs and plots. Hint: Defining Variable Properties is a VERY good habit to get into. Unexpected behavior from SPSS can often be traced to incorrect Variable Property settings.

• For now, just click 'OK' to open the Chart Builder. You should see this:


• Suppose we would like to graph the relationship between beginning salary “salbegin” and current salary “salary”. First, click on Scatter/Dot in the Gallery tab of the lower part of the Chart Builder. Next, double click (or drag to the chart builder working area) the simple scatter icon (top row- first on left). Then drag the “salbegin” variable to the x-axis box, and then drag the “salary” to the y-axis box. You should see this:

• Click “OK,” and the graph will be shown in the viewer:

• Now let's say that you want to see whether, in these data, the relationship between beginning salary and current salary seems to be the same for men and women. Rather than creating two separate charts (one for men and one for women), we can mark the data points to indicate which are men and which are women.

• First, click Graphs > Chart Builder... again and then select OK. If the information for your last graph is still in place, simply press Reset in the bottom left corner. That will give us a clean slate. Go to the Scatter/Dot option again, but this time, instead of the simple scatter icon (top left), drag the grouped scatter icon (the next one to the right) to the chart builder working area.

• Next enter the variables of interest to us:

o x axis = "salbegin"
o y axis = "salary"
o set color = gender

• It should look like this:

• Lastly press OK. Your chart should look like this:
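Both scatterplots can also be requested from the Syntax Editor. The sketch below uses the older GRAPH command because it is compact; it is not the syntax the Chart Builder itself pastes (the Chart Builder generates GGRAPH commands), so the styling of the resulting charts may differ from the ones built above.

```spss
* Simple scatterplot of beginning salary (x axis) against current salary (y axis).
GRAPH
  /SCATTERPLOT(BIVAR)=salbegin WITH salary
  /MISSING=LISTWISE.

* Grouped scatterplot: the same points, with markers set by gender.
GRAPH
  /SCATTERPLOT(BIVAR)=salbegin WITH salary BY gender
  /MISSING=LISTWISE.
```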

• In other instances you may want to graph a categorical variable (e.g., gender) against a continuous variable (e.g., salary). In this case you can create a bar chart to give you a sense of the shape of your data. Again use Graphs > Chart Builder..., click OK past the first pop-up dialog box, and click Reset to clear past work.

• This time select Bar from the lower left-hand box of options and then drag the simple bar icon (top left) into the chart builder. Next, enter the variables of interest:

o x axis = gender
o y axis = salary

• You should see this:


• And again press OK; your bar graph should look like this:

• Just as in the scatterplot example, sometimes we want to consider a third variable of interest (e.g., minority classification) at the same time. Maybe the relationship we see between gender and salary is not the same depending on whether the employees are in a minority group. To visualize this, let's create a clustered bar graph.

• So again use Graphs > Chart Builder... and again click Ok past the first pop up dialogue box, and again click Reset to clear past work.

• Select Bar from the lower left-hand box of options and then drag the clustered bar icon (top row, second from the left) into the chart builder. Next, enter the variables of interest:

o x axis = gender
o y axis = salary
o cluster on x = minority

• You should see this:

• Press OK. Your clustered bar graph should look like this:

• For more help, go to the tutorial (Help > Tutorial) which can further introduce you to the Chart Builder.
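The two bar charts can likewise be sketched with the older GRAPH command. This assumes that the bars show mean salary and that the variables are named gender, salary and minority in the sample file; it is not the syntax the Chart Builder pastes.

```spss
* Simple bar chart: mean current salary for each gender.
GRAPH
  /BAR(SIMPLE)=MEAN(salary) BY gender.

* Clustered bar chart: mean current salary by gender, clustered by minority classification.
* The variable name minority is an assumption about the sample file.
GRAPH
  /BAR(GROUPED)=MEAN(salary) BY gender BY minority.
```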

Saving or Printing Data, Output or Graphs

• Click on File > Save to save work in the active window.

• Click on File > Print to print the contents of the active window, or click on the Print button on the icon bar.

o Saving paper, editing print output:

o Results of analyses in SPSS will bring up an SPSS Viewer window. The left pane displays an outline of the output, including titles, notes, and statistics. Clicking on a small box with a minus (-) sign in it will collapse that entry in the output, hiding it from view in the main window. Individual items in a listing can be toggled from displayed to hidden (and back) by double-clicking on the icon in the outline pane.

o There are several options for printing from SPSS that consolidate output and save paper. The following are available from the SPSS Viewer window. These options may be combined to consolidate output even further.

§ Select the section(s) you want to print from the outline box:

§ A single click suffices for one section.

§ If several sections need to be printed out, click on the appropriate sections while holding down the CTRL key.

§ Once the selection has been made, click on File > Print > Selection.

§ Delete or hide sections you do not need to print:

§ Select section(s) from the outline box and use the DELETE key to discard them, or click on the box with the minus (-) sign to hide them.

§ Click on File > Print > All visible output.

§ Tip: sections for titles, notes, and warnings should be deleted or hidden if you do not need these records in the printout.

o If presentation style is not an issue, you can change the text output page size so that you will print output without any page breaks.

§ Click on Edit > Options, and then the Viewer tab.

§ Click on Infinite in the Text Output Page Size box.

§ Click on OK to save options.

§ Click on File > Print > All visible output.

o Clear page breaks and add them wherever you prefer:

§ Click on Insert > Clear Page Break to clear current page breaks.

§ To add page breaks at certain sections, click on the section where you want the new page to start, either from the outline box or from the text output on the right.

§ Click Insert > Page Break.

§ Click on File > Print > All visible output.

§ Tip: Use File > Print > Preview to check the output to decide on modifications and options for printing.

• Sometimes, SPSS has a hard time opening output files created with other versions (e.g., older versions, versions for a different operating system). If you are sharing output with colleagues, you can always export it into another format. To do so, make sure you are in the Output window. Click on File > Export. This will allow you to save the output in other formats, such as HTML or PDF.
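Exporting can also be scripted, which is convenient at the end of a syntax file. The following is a minimal sketch using the OUTPUT EXPORT command; the output path is hypothetical and should point to a folder where you have write privileges.

```spss
* Export the visible contents of the active Viewer to a PDF file.
* The output path is hypothetical.
OUTPUT EXPORT
  /CONTENTS EXPORT=VISIBLE
  /PDF DOCUMENTFILE='C:\Temp\results.pdf'.
```

Items that have been hidden in the outline pane are left out of the exported file because only visible output is requested.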

Appendix

Fixed Format Data

• SPSS has a Text Import Wizard that will help you import data from many plain text files. The Wizard will scan your data and provide options for you to choose from. When data is delimited (variables separated by a tab, comma or space) the default choices are very often correct. For this demonstration, I'll be using a sample data set that I constructed for this exercise. You can download data.txt (right click to download) if you'd like to follow along. This data is Fixed Width format. That is, variables are identified by the column(s) of the data set and are not (necessarily) delimited by spaces or tabs. To start the Text Import Wizard from the menus, click File > Read Text Data... which opens the 'Open File' dialog box. Navigate to and select data.txt, then click 'Open'.

• The lower part of the screen provides a preview of the data file you are working with. As you can see, fixed format data can be very difficult to read unless you have very specific information about how the file was constructed. This information is usually called a codebook. I will provide that information below.

• Unless you have previously saved a format to apply to this data, click 'Next>'.

• The Wizard has correctly checked the 'Fixed width' radio button at the top of the dialog. Since variable names are not included in this data file, the second question is marked appropriately as well. Click 'Next>'.

• The suggested responses to all of the options are correct here as well. Note that with these options you can import data with cases that span several lines in the file, import a percentage of cases, or just the first few (or few thousand) cases. Click 'Next>' to continue.


• This is the section of the wizard that requires information from the codebook. Although that information can be presented in a number of ways, the following is fairly typical:

Variable   Columns   Variable Details
Subj       1-2       Subject, numeric2.0, no labels or missing value assignments
Age        3-4       Age, numeric2.0, no labels or missing value assignments
Ht         5-8       Height, numeric2.1, no labels or missing value assignments
Gen        9         Gender, numeric1.0, 0=Male 1=Female, no missing value assignments
Scr1       10-12     Score1, numeric3.0, no labels or missing value assignments
Scr2       13-15     Score2, numeric3.0, no labels or missing value assignments

• Using the information from the codebook, and following the directions from the wizard (at the top of the dialog box), we can place variable break lines (the vertical lines) to separate the variables. Since the variable Subj is in columns 1-2, we need to place a line at column 2. Either click on the Ruler above the data preview at column 2, or type a '2' into the 'Column Number' box below the data preview window; then click on 'Insert Break'. Continue until you've added all the breaks necessary.

• The following shows the dividers placed appropriately.


• When you have added the variable break lines as above, click 'Next>' to display the dialog box below:

• Notice that the 'Finish' button has become active, since you have completed enough of the wizard to create a data file. If you click 'Finish' now, the variables will be V1, V2, and so on. The wizard assigned the default variable naming strategy to the variables set up in the previous step. Currently highlighted in the first column is V1. Rather than click 'Finish' and rename the variables from the Data Editor, let's go ahead and rename them using the wizard. I'll explain why in a moment. If you type 'subj' into the box above (Variable name:), SPSS will name that variable 'subj'. Now highlight the second column by clicking on V2 in the Data preview window, then type 'age' into the Variable name box. Repeat for the remaining variables to get a dialog box as follows:

• Click 'Next>' to see the last screen of the wizard, below:

• The last screen of the wizard gives us the opportunity to save our work two different ways. If you save the file format using the first question of the screen, you can use that format file for data set up in a similar manner in the future. Remember, screen 1 of the wizard asked if your file matched a predefined format. This is how you can create that format. If you need to use the format file again, the wizard will use all of the settings (variable break lines, variable names, number of rows per case, etc.) and apply them to the new data.


• TIP: if you are not absolutely certain that your work to this point is perfectly correct, save the file format for future use. It can save you a great deal of time if you need to change just one thing in the wizard. Even if you don't use it right away, you may need it later.

• The other way to save your work is to paste the syntax. Go ahead and click 'Yes' under 'Would you like to paste the syntax?' then click 'Finish' to create the SPSS syntax file. Notice that we have not yet created a data file. We just pasted the syntax, shown below:

• Using information from the codebook we can add to the syntax to create a data file formatted as we wish. Notice that 'ht' has an F4.0 at the end of its line in the syntax file. That tells SPSS to create a numeric variable with a width of 4 and 0 decimal places. The codebook specifies that 'ht' has one decimal place (numeric3.1), so we can change it to F3.1 in the syntax. I'll leave it to you to adjust the syntax for the remaining variables if necessary. It will then look as follows:


• Before you run the syntax, make sure that the path (in the /FILE subcommand) is a location where you have write privileges; your desktop or thumb drive are good places to write data. This syntax will create the data file displayed in the Data Editor below:

• The data is now ready for any further cleaning, manipulation and/or analysis. The file is now saved in the location specified in the /FILE statement.
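The codebook can also be translated directly into syntax without the wizard. The sketch below uses the DATA LIST command, which is a different command from the one the wizard pastes; the file path is hypothetical, and the column positions and labels are taken from the codebook above.

```spss
* Read the fixed-width file using the column positions from the codebook.
* The path is hypothetical. The (1) after ht gives it one implied decimal place.
DATA LIST FIXED FILE='C:\Temp\data.txt'
  /subj 1-2 age 3-4 ht 5-8 (1) gen 9 scr1 10-12 scr2 13-15.
VARIABLE LABELS
  subj 'Subject'
  /age 'Age'
  /ht 'Height'
  /gen 'Gender'
  /scr1 'Score1'
  /scr2 'Score2'.
VALUE LABELS gen 0 'Male' 1 'Female'.
EXECUTE.
```

If the height values in the file already contain a decimal point, SPSS reads them as written and the implied decimal place is not applied.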

Delimited Format Data

• The Text Import Wizard can also assist with delimited data. The process will be similar to Fixed Width with a few adjustments. I'll be using a tab-delimited file called milascii.txt. You can download milascii.txt (right click to download) if you'd like to follow along. Open the wizard (File > Read Text Data... ) and click 'Next>' on step 1.

• In step 2 you will notice that the 'Delimited' radio button is chosen by the wizard. Since this file does not include variable names, you can click 'Next>' to proceed to step 3.

• Click 'Next>' to view the step 4 dialog box:


• The wizard correctly lists tab as a delimiter. If there had been multiple-word text in the file (such as free responses), the wizard would have marked the Space delimiter checkbox and created a large number of variables containing individual words. In that case, just uncheck that box and look in the Data preview window to check the effect. The remaining steps of the wizard will be as described above, so I won't repeat them here.
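For a tab-delimited file, the syntax pasted at the end of the wizard is a GET DATA command along the following lines. This is only a sketch: the path is hypothetical, and the variable names and formats (id, unit, yrs) are placeholders, since the contents of milascii.txt are not described here.

```spss
* Read a tab-delimited text file that has no row of variable names.
* The path, variable names and formats are hypothetical placeholders.
GET DATA
  /TYPE=TXT
  /FILE='C:\Temp\milascii.txt'
  /DELCASE=LINE
  /DELIMITERS="\t"
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
    id F4.0
    unit A12
    yrs F2.0.
EXECUTE.
```

Listing only "\t" on /DELIMITERS is the syntax counterpart of unchecking the Space checkbox in the wizard, so free-text responses containing spaces stay in one variable.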

Loading DATA into SPSS Using Syntax

• Why would you want to use syntax to load data in SPSS?

• Good question. Most of the time, you probably won't. But sometimes using syntax is a way to solve special problems, such as accessing data in older, less accessible formats or setting up a series of commands to be run multiple times, or on multiple machines.

SPSS files

• A common reason for using syntax to open SPSS files is for documentation purposes. If the syntax file is complete, that is, it opens the file, runs analyses, and saves the output and data files, you can refer to it when you have questions, or even send it to a colleague to allow them to recreate the analysis.

• It is often easier to start by using the dialog boxes to navigate to the file and then use the paste function to have SPSS write the syntax for you. This time around, I will use the sample data file '1991 U.S. General Social Survey.sav'. Instead of clicking 'Open', click on the 'Paste' button just below it to create the syntax statements as below.

• If your data is already in SPSS format, that's all you need. You can run this syntax by clicking Run > All.
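The pasted syntax for opening an SPSS data file is typically just a GET FILE command followed by a DATASET NAME command. The sketch below assumes the file sits in the same Samples folder as 'Employee data.sav'; adjust the path to match your installation.

```spss
* Open an SPSS data file by path. The folder shown is an assumption.
GET FILE='C:\Program Files\IBM\SPSS\Statistics\19\Samples\English\1991 U.S. General Social Survey.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
```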

Excel files

• The syntax below will open an Excel file included in a default SPSS installation directory. This code will work for Excel 95 through Excel 2003 files. For Excel 2007 files, you would use /TYPE=XLSX instead. I used the 'Open File...' dialog box and the 'Paste' function to create it.

• The /SHEET subcommand allows you to specify which worksheet the data is in. If omitted, the default is the first sheet. The single quotation marks are required around the name of the sheet. /CELLRANGE lets you specify a range to read the data from. So if there are a number of non data rows, or you only want to read in a portion of the data from a file, you can specify the range you want read into SPSS. The default is full, or all the data on the sheet. /READNAMES tells SPSS whether or not the first row in the range specified contains variable names or not. The default is on. /ASSUMEDSTRWIDTH allows you to specify the length of string (alphabetic) variables. The default is 255, the maximum is 32767. If you set up your Excel files properly, only the first 3 lines of code are necessary to open the file, since the command will default to same settings as below if you do not specify otherwise.

GET DATA
  /TYPE=XLS
  /FILE='C:\Program Files\SPSSInc\SPSS16\Samples\demo.xls'
  /SHEET=name 'demo'
  /CELLRANGE=full
  /READNAMES=on
  /ASSUMEDSTRWIDTH=32767.
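For an Excel 2007 workbook, a minimal sketch of the same command with /TYPE=XLSX might look like the following. The file path and the sheet name 'demo' are hypothetical, chosen only to mirror the example above; the optional subcommands are left out so that their defaults apply.

```spss
* Read an .xlsx workbook. The path and sheet name are hypothetical.
GET DATA
  /TYPE=XLSX
  /FILE='C:\Data\demo.xlsx'
  /SHEET=name 'demo'
  /READNAMES=on.
```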

• This Excel file contains a number of variables, including 'Maritalstatus' and 'IncomeCategory'. If you received a version of such a file on a regular basis, you could write a syntax file (as below) to read the file into SPSS, run FREQUENCIES, then save as a SPSS file in the same folder. The only thing you would need to edit each time you ran the script would be the file names.

GET DATA
  /TYPE=XLS
  /FILE='C:\Program Files\SPSSInc\SPSS16\Samples\demo.xls'.
FREQUENCIES Maritalstatus IncomeCategory
  /BARCHART .
SAVE OUTFILE='C:\Program Files\SPSSInc\SPSS16\Samples\demo.sav'
  /COMPRESSED.

• Of course, this is just a simple example. You could write a much more complicated syntax file to sort and merge files, create new variables and run statistical analyses on all or part of the data.

SPSS Syntax Basics

This section provides an introduction and sampling of basic syntax in SPSS. All illustrations and details are intended to apply to SPSS v16 for Windows. Most of the information will be the same for other versions, but there may be discrepancies.

Syntax refers to the computer language SPSS uses to complete analyses. While most commonly used commands are available through the menu system of SPSS (point & click) many more options and functionalities are available using syntax.

Using syntax can save a great deal of time when running repetitive analyses. It is also a way to document your work and allow you to duplicate that work with a new (or updated) data set. It allows you to 'tweak' your analysis in ways not available through dialog boxes.

• Syntax files in SPSS are plain text files with an extension of '.sps'. You can create syntax several ways. Probably the easiest way to start using syntax is by using the 'paste' button available in most dialog boxes. It is circled below as seen in the Frequencies dialog box.

• If you'd like to follow along, open the sample data file '1991 U.S. General Social Survey.sav' (in a default installation, it is located in C:\Program Files\SPSS.) From the toolbar in SPSS, click the leftmost icon, an open folder (Open File) or click File > Open and navigate to the file. The default settings will open the 'Program Files\SPSS' folder. If necessary, navigate to that folder. Select the file and click 'Open'. I obtained this dialog box by selecting Analyze >Descriptive Statistics >Frequencies.

• First select the options you wish from the dialog box, then, instead of clicking 'OK', click 'Paste'. I used the default settings. If you do not have a syntax window open already, this will open a new syntax window containing the commands you selected in the dialog box as seen below. If you already had a syntax window open, the 'Pasted' syntax will be appended to the end of the file.

• The Syntax Editor allows you to edit a plain text file and submit selected commands to SPSS directly. You can add notes, cut, paste and edit just as with any other text file.

Structure of Commands in SPSS syntax

• Commands in SPSS begin with a keyword that is the name of the command followed by any subcommands and user specifications. The end of the command is marked by a period/full stop.

• In SPSS syntax files, commands must always be placed in the first column. Subcommands and user specifications must be indented at least one space. Refer to the Command Syntax Reference for a discussion of available commands and options. It can be found in the menus under Help >Command Syntax Reference.

o Example: the FREQUENCIES command:

§ FREQUENCIES produces tables of frequency counts and percentages of the values of individual variables. FREQUENCIES is used to obtain frequencies and statistics for categorical variables and to obtain statistics and graphical displays for continuous variables.

§ By default, SPSS will paste syntax with commands and specifications in all caps, and will display variables as you have entered them. Commands and specifications do not have to be entered in all caps, but I will continue to display them that way to help differentiate them from variables. Further, all syntax commands will be shown in a blue font. In the syntax below, just as pasted from the dialog box, 'sex' and 'race' are the variables that I selected. NOTE: Variable names in SPSS are generally separated by spaces.

FREQUENCIES VARIABLES=sex race
  /ORDER= ANALYSIS .

§ In the syntax window there is a very useful toolbar button called 'Syntax Help'. It is context sensitive, meaning that it will display a syntax chart for the command where the cursor is currently located. Clicking on the 'Syntax Help' button provides the following information about the FREQUENCIES command.

§ Don't let the long list intimidate you. Many people are surprised to learn that FREQUENCIES has so many subcommands and specifications!

§ Subcommands and specifications in square brackets ([ ]) are optional, and those in braces ({ }) indicate a choice between elements. Look closely: only two words are NOT in brackets, FREQUENCIES and 'varlist'. This means that you can run frequencies on the two variables 'sex' and 'race' by typing the following into a syntax editor window:

FREQUENCIES sex race.

§ There are many abbreviations allowed in syntax (generally the first 3 or 4 letters of a command/specification will suffice.) So the following syntax will provide the same output:

FREQ sex race.

§ Don't forget the period at the end of the command. The 'Syntax Help' button provides the skeleton of the command you are using, but does not provide a detailed explanation of each possible subcommand and/or specification. That information is found in the Command Syntax Reference (use the menus: Help >Command Syntax Reference).

§ Now let's add a little more to the syntax. The following will provide descriptive statistics and bar charts for our two variables:

FREQUENCIES sex race
  /STATISTICS=STDDEV MINIMUM MAXIMUM MEAN MEDIAN MODE
  /BARCHART .

§ If you want to add documentation to your syntax file, indicate the start of a comment with an asterisk (*). Everything between that asterisk and the next period will be ignored by SPSS. Remember not to add other periods in your documentation if you use this method, since SPSS will try to interpret everything after the period as commands. Another method is to use /* and */ to set off a comment. That is, start with /*, insert your comment, then end the comment with */. A comment of this kind can share a line with a command, but it cannot continue onto the next line. Remember not to include any periods in the comment using this technique either.
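The two comment styles can be sketched as follows, using the 'sex' and 'race' variables from the examples above. This is only an illustration of where comments may go, not a required layout.

```spss
* Frequencies for two demographic variables, with bar charts.
FREQUENCIES sex race   /* this inline comment ends on the same line */
  /BARCHART.
```

The asterisk comment ends at its closing period, so the FREQUENCIES command on the next line runs normally.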

Missing Values, Variable and Value Labels

• In this section we will return to using the sample data file '1991 U.S. General Social Survey.sav'.

• Creating a well documented data file can be quite tedious. Syntax statements can make the process a little less painful. The MISSING VALUES command declares values for variables as user-missing. User-missing values are then treated the same as the system-missing values. (That is, they are usually ignored.) Multiple missing values are separated by commas, and a range of missing values may be declared using the keywords LO, LOWEST, HI, HIGHEST, and THRU. If the variable(s) are strings, enclose the missing values in single quotes. Still using the 'demo' file from above, the following command sets values of the variable 'Age' of 99 or higher and values of 'Emply' equal to 999 to user-missing. Don't forget the period at the end.

MISSING VALUES Age (99 THRU HIGHEST) Emply (999).

• To cancel previously declared missing values, simply reassign the missing values to blank (use () in the previous statement.) To remove all missing values settings at once, use the following code:

MISSING VALUES ALL ().
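A couple of further variations, sketched for the General Social Survey file: the codes 0, 8 and 9 for 'life' are the ones described as missing later in this handout, while the cut-offs used for 'age' are purely hypothetical. Try these on a scratch copy of the data, since each MISSING VALUES command replaces the earlier declaration for that variable.

```spss
* Several discrete codes for one variable.
MISSING VALUES life (0, 8, 9).

* A range plus one discrete code. The values 17 and 99 are hypothetical.
MISSING VALUES age (LOWEST THRU 17, 99).
```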

• VARIABLE LABELS assigns descriptive labels to variables in the datafile. You can assign a label to one variable or to a long list at the same time. The following syntax assigns labels to the first few variables in the 'demo' file. NOTE: Each variable label can be up to 120 characters long, although most procedures print fewer characters than that. All statistical procedures display at least 40 characters.

VARIABLE LABELS Age 'Age' Maritalstatus 'Marital Status' Address 'Address' .

• In general, syntax will ignore spaces and lines within commands and subcommands. It is often easier to read a syntax file if you add spaces and start new lines to create columns, as below.

VARIABLE LABELS
  Age            'Age'
  Maritalstatus  'Marital Status'
  Address        'Address' .

• The VALUE LABELS command assigns descriptive labels to values of variables in the datafile. Many people confuse variable and value labels when they are new to them. Variable labels describe the variables and value labels allow you to assign descriptions to particular values of a variable. In the 'demo' file, Maritalstatus is either 0 or 1. Value labels help you to remember whether 0 means married or not married.

VALUE LABELS Maritalstatus 0 'Not Married' 1 'Married' .

• NOTE: The VALUE LABELS command deletes all existing value labels for the specified variable(s) and assigns new value labels. The ADD VALUE LABELS command can be used to add new labels or to alter labels for specified values without deleting existing labels.

• This is another instance where adding spaces can make your syntax much more readable. The following command is equivalent, but is easier to follow.

VALUE LABELS
  Maritalstatus
    0 'Not Married'
    1 'Married' .

• To create value labels for additional variables, add a slash (/) after the last value label of the previous variable, then list the next variable followed by its values and their labels in single quotes. Remember to put a period only at the very end of the command.
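A minimal sketch of labeling two variables in one command, then adding a label without disturbing the existing ones. 'Retired' and its codes are hypothetical and stand in for any second 0/1 variable in your file.

```spss
* The slash separates the label sets of different variables.
VALUE LABELS
  Maritalstatus
    0 'Not Married'
    1 'Married'
  /Retired
    0 'No'
    1 'Yes' .

* Add one more label to Retired and keep the two defined above.
ADD VALUE LABELS Retired 9 'Unknown'.
```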

Data Management: COMPUTE, RECODE, SPLIT FILE, and FILTER

• In this section we will go over creating new variables (COMPUTE), recoding the values of existing variables (RECODE), running the same analysis on subgroups (SPLIT FILE) and using filters to select subsections of your data (FILTER). For this section we will be using the sample data file '1991 U.S. General Social Survey.sav' (in a default installation, it is located in C:\Program Files\SPSS.)

The COMPUTE command

o The COMPUTE command creates new numeric variables or modifies the values of existing string or numeric variables. You may be familiar with the dialog box, accessed by clicking Transform >Compute, shown below:

o It is often more efficient to write COMPUTE statements in syntax, instead of 'pointing and clicking' your way to a new variable. While the dialog box is especially useful for functions you are not familiar with, I find it faster to code common formulas directly in syntax. The examples below each calculate 'Hlth_totx' by adding the values of the 9 'hlthx' variables in the data set. NOTE: If Hlth_totx already exists in the data, the COMPUTE statement will replace it with new values. If it does not exist, COMPUTE creates a new variable at the end of your data. I'll discuss the differences after the examples.

COMPUTE Hlth_tot1=hlth1 + hlth2 + hlth3 + hlth4 + hlth5 + hlth6 + hlth7 + hlth8 +hlth9.
COMPUTE Hlth_tot2=sum(hlth1, hlth2, hlth3, hlth4, hlth5, hlth6, hlth7, hlth8, hlth9).
COMPUTE Hlth_tot3=sum(hlth1 TO hlth9).
COMPUTE Hlth_tot4=sum.6(hlth1, hlth2, hlth3, hlth4, hlth5, hlth6, hlth7, hlth8, hlth9).
EXECUTE.

o To begin, notice the EXECUTE statement at the end. SPSS requires this at the end of COMPUTE commands, unless a procedural command follows it (e.g. FREQUENCIES, or other statistical analyses). If you forget and run the syntax without it, the Data Editor window will display "Transformations Pending" in the bottom display area. Either add the EXECUTE statement to the syntax, highlight it and run it, or go to the menus and click Transform >Run Pending Transforms to complete the command. EXECUTE can be shortened to EXE.

o The first statement adds the values. NOTE: if there are ANY missing values in ANY of the variables in the equation, the result will be a missing value when using the + operator. The second statement sums across the variables, ignoring missing values; that is, it sums all available values. The third COMPUTE statement returns identical results to the second, UNLESS there are other variables in the data set between 'hlth1' and 'hlth9'. If, for example, you had previously created subtotals and they were in the data listed between 'hlth1' and 'hlth9', those subtotals would be included in the sum for the third statement, but NOT in the second. So, depending upon the data, Hlth_tot1, Hlth_tot2 and Hlth_tot3 may be very different variables. The last COMPUTE statement will add all the variables if at least 6 of them are non-missing, and return a missing value otherwise. This is usually a nice compromise between using the + sign (conservative) and using the SUM function (liberal).
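The difference between the three approaches is easiest to see on a tiny made-up data set. The sketch below is an assumption-laden illustration: the variables a, b and c and their values are invented, and it relies on a lone period being read as system-missing in freefield data. Note that DATA LIST replaces the active data set, so run it in a fresh session.

```spss
* Three invented cases: complete, one value missing, two values missing.
DATA LIST FREE / a b c.
BEGIN DATA
1 2 3
1 . 3
. . 3
END DATA.

COMPUTE tot_plus = a + b + c.
COMPUTE tot_sum  = SUM(a, b, c).
COMPUTE tot_sum2 = SUM.2(a, b, c).
LIST.
```

With the + operator only the complete case gets a total. SUM returns a total for every case that has at least one value, and SUM.2 returns a total only for the cases with at least two non-missing values, so the third case is missing there. LIST is a procedure, so no EXECUTE is needed.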

o For a full list of functions available with the COMPUTE command, refer to the SPSS 14.0 Command Syntax Reference (pp. 259-266). You can also use the menus Transform >Compute to get to the dialog box as shown above. Clicking in the 'Function Group:' box will populate the 'Functions and Special Variables:' box below it. Clicking on a particular function will populate the central box with details about that function.

The RECODE command

o The RECODE command allows you to change the values of a variable. For example, it is sometimes useful to reverse code responses to a survey (change the highest response to the lowest and vice versa). You may also need to collapse categories for an analysis. In general, it is safer to recode into a new variable, rather than change an existing one, so I will not address that option here.

o We will be using the variable 'life' as an example. It currently has value labels: 1 = Exciting, 2 = Routine, 3 = Dull, with 0, 8, and 9 set to missing for various reasons. We are going to recode 'life' so that higher numbers indicate more excitement, and all missing values set to 99. If you've ever tried doing a similar recode through the dialog boxes, you know how tedious this can be. The following syntax will accomplish this (and a couple of other things we've reviewed):

RECODE life
  (0, MISSING, SYSMIS = 99)
  (3 = 1)
  (2 = Copy)
  (1 = 3)
  (ELSE = 99)
  INTO life_r .
VARIABLE LABELS life_r 'Is Life Exciting or Dull - reverse coded'.
VALUE LABELS life_r
  1 'Dull'
  2 'Routine'
  3 'Exciting'
  99 'Missing' .
MISSING VALUES life_r (99).
EXE .

o The RECODE command needs a list of variables to act upon (yes, you can recode many variables at once by listing them after the RECODE statement) in this case 'life'. The variable(s) are followed by a list of recodes each enclosed in parentheses. As I've shown in the first example, you can list several values to be recoded to a single value.

o The keywords MISSING and SYSMIS both refer to missing values. MISSING includes both system-missing values (no value entered in the data set) and user-missing values (values entered but set to missing by the user). SYSMIS refers only to system-missing values. Since I included MISSING, I did not need to include SYSMIS; I did it for the sake of demonstration.

o Additionally, I could have set 2 equal to 2 (2=2) instead of using COPY, but it is useful to demonstrate (particularly if you only want to recode a few of a larger number of values.) The last recode, ELSE is included as a catch-all, any values that I did not specify will now be coded as 99 (I could also have written ELSE=COPY to avoid changing values. Your goals in the recode will dictate your own choice.)

o At the end of the RECODE statement is INTO, followed by the name of the new variable I want to create, here it is 'life_r'. The rest of the syntax adds labels and sets 99 to be a missing value.

o Remember that you can get reference information by clicking on the 'Syntax Help' button in the Syntax Editor, or by looking in the Command Reference.
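Collapsing categories, mentioned at the start of this section, follows the same pattern. The sketch below groups 'educ' (years of education) into three bands. The cut-points and the new variable name 'educ3' are hypothetical choices for illustration. The MISSING recode comes first because recodes are applied in order and a range such as 13 THRU HIGHEST would otherwise pick up user-missing codes.

```spss
* Collapse years of education into three groups. Cut-points are hypothetical.
RECODE educ
  (MISSING = SYSMIS)
  (LOWEST THRU 11 = 1)
  (12 = 2)
  (13 THRU HIGHEST = 3)
  INTO educ3 .
VARIABLE LABELS educ3 'Education in three groups'.
VALUE LABELS educ3
  1 'Less than 12 years'
  2 '12 years'
  3 'More than 12 years' .
EXECUTE.
```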

The SPLIT FILE command

o If you are interested in running the same analysis on a set of subgroups in your data you can use SPLIT FILE to accomplish this. Using the survey data, let's see if the distribution of education is different for males and females. We'll use SPLIT FILE and FREQUENCIES to get the output we're looking for. SPLIT FILE requires that the data be sorted by the variable(s) we want to split by. The SORT CASES command will sort by the variables listed after the command. The default is to sort ascending. If you want to sort descending, add (D) after that variable.

SORT CASES BY sex .
SPLIT FILE LAYERED BY sex .
FREQUENCIES educ /HISTOGRAM.
SPLIT FILE OFF .

o By default, SPLIT FILE will produce output with the values of the split variable as the outermost column entries of a table (the LAYERED BY option.) If you want split file groups to display in separate tables, use SEPARATE BY instead. It is a good practice to 'turn off' splits as soon as you complete the analysis. SPLIT FILE will be overridden by a later SPLIT FILE command. Until SPLIT FILE OFF is entered, all analyses will be carried out on a split file.
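As a sketch of the two options just described, the following sorts 'sex' in descending order and asks for a separate table per group. It uses the same variables as the example above.

```spss
* (D) sorts descending. SEPARATE BY gives one table per group.
SORT CASES BY sex (D).
SPLIT FILE SEPARATE BY sex.
FREQUENCIES educ /HISTOGRAM.
SPLIT FILE OFF.
```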

The FILTER and SELECT IF commands

o Filter variables (also known as indicator variables and dummy variables) are used to identify cases that meet the criteria you have specified. For example, let's create a variable to identify all the women in the survey that are 25 or older, have at least 12 years of education and no children. To accomplish this through the menus, go to Data >Select Cases... to produce the following dialogue box:

o Select 'If condition is satisfied'. This will enable the 'If...' button below it. Click the 'If...' button to see the following dialogue box. I have already entered the formula to create the filter variable we're interested in.

o Click 'Continue' and then click 'Paste' in the 'Select Cases' dialogue to create the following syntax:

USE ALL.
COMPUTE filter_$=(sex = 2 and age >= 25 and educ >= 12 and childs = 0).
VARIABLE LABEL filter_$ 'sex = 2 and age >= 25 and educ >= 12 and childs = 0 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .

o The USE ALL command is added by the dialogue box to ensure that no other filters are active when creating the new variable.

o The COMPUTE command creates a new variable 'filter_$'. This is the default SPSS name for filters. If you used the dialogue boxes to create another filter, it would also be named 'filter_$' which would overwrite the previous filter. You could, of course, rename the filter after using the dialogue boxes, but it is surprisingly easy to forget. Using syntax, you can name the variable something more useful as you create it.

o The next three lines of syntax (VARIABLE LABEL, VALUE LABELS, and FORMAT) are not absolutely necessary but are very helpful for documentation's sake, not to mention ease of use of the data set. Remember, if you change the name of the filter variable, you need to change 'filter_$' to the new name in each of the next four lines as well.

o The next line of the syntax applies the filter to your data. FILTER BY filters out all cases where the filter variable is 0 or missing. In other words, applying a filter selects only those cases where the filter variable has a nonzero, non-missing value. Using this data, if you ran

FILTER BY childs .
EXE .

you would filter out all cases that had 0 children (or a missing value for 'childs'), and leave in those with 1 or more children. So, you could use a variable that has many values. The common practice, however, is to use filter variables that only have values of 0 and 1. When you no longer wish to filter cases, run the following command:

FILTER OFF.
EXE .
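Writing the filter directly in syntax lets you give it a meaningful name from the start. The sketch below assumes the same selection criteria as the pasted example. The name 'women25' and its label are hypothetical, and the FREQUENCIES line is only there to show an analysis running while the filter is active.

```spss
* A named filter variable: 1 = selected, 0 = not selected.
COMPUTE women25 = (sex = 2 AND age >= 25 AND educ >= 12 AND childs = 0).
VARIABLE LABELS women25 'Women 25 or older, 12+ years of education, no children (FILTER)'.
VALUE LABELS women25 0 'Not Selected' 1 'Selected'.
FORMATS women25 (F1.0).

* Apply the filter, run an analysis on the subset, then turn the filter off.
FILTER BY women25.
FREQUENCIES happy.
FILTER OFF.
```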

o The SELECT IF statement is another way to select subsets of cases. Instead of using a filter variable, the logical expression (or formula) to select cases is specified in the command. The major difference is that this command permanently deletes non-selected cases. So, SELECT IF can be used if you need to create a new data set that is a subset of existing data. For example, the following syntax will provide a data set that includes only females:

SELECT IF (sex = 2).

o SELECT IF will evaluate the expression you enter between the parentheses as either True, False, or Missing; all the False and Missing cases are dropped from the active data set.

o Using the TEMPORARY command will allow you to return to the original data set once a command that reads the data is run. That is, temporary transformations only apply until the next command that reads data and are no longer in effect once that command has run. So, the following syntax will temporarily filter out males, and then run DESCRIPTIVES for the variables 'happy' and 'life'. Since the DESCRIPTIVES command reads the data, it turns off the TEMPORARY command. Repeating the same DESCRIPTIVES command will then act upon the entire data set, not just females.

TEMPORARY.
SELECT IF (sex=2).
DESCRIPTIVES happy life.
DESCRIPTIVES happy life.

o Here is the output:

o Note the second set of descriptive statistics has a valid N of 971 cases, while the first has 549.

o Which is better: SELECT IF or creating a filter? That depends on how you prefer to work in SPSS. If you are writing a syntax file to document an analysis, it may be easier to follow if you use SELECT IF, particularly if you need to run only one command on a subset of data. If you need to run multiple commands on a subset of data, it may be easier to use filters to subset your data, depending upon how you need to split the data. If you are in the midst of an analysis and are switching back and forth from syntax to point & click, it is useful to have permanent filter variables instead of re-creating them each time you want to use them.

Saving Files in SPSS Syntax

Saving Data as a SPSS file (.sav)

• The command to save a SPSS data file (extension '.sav') is as follows:

SAVE OUTFILE='C:\Program Files\SPSS\1991 U.S. General Social Survey.sav' .

• Specify the file you want to save by listing it between the single quotes. Don't forget to include the complete path, the file extension (in this case '.sav') and the period at the end of the command. If a file with the same name exists at the location you specify, it will be overwritten. You can produce this syntax by using the menus: File >Save As... then clicking on 'Paste' instead of 'OK'.

• If you only need some of the variables in the file, you can specify them using the KEEP and DROP subcommands. You can also rename variables as you save by using the RENAME subcommand. Check the Command Syntax Reference for more details.
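A minimal sketch of these subcommands, assuming you want a smaller copy of the survey file: the output path and the new name 'educ_years' are hypothetical. The variables are ones used earlier in this handout.

```spss
* Save five variables to a new file, renaming one of them on the way out.
SAVE OUTFILE='C:\Temp\gss_subset.sav'
  /KEEP=sex race educ happy life
  /RENAME=(educ = educ_years).
```

The active data set is not changed. Only the saved file contains the reduced, renamed variable list.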

Saving Data as a SPSS Portable file (.por)

• The SPSS portable file format is useful if you need to transfer SPSS data between machines with different operating systems (for example from a PC to a Mac or UNIX) or to other programs such as SAS. The following command will save the currently active file as a SPSS portable file:

EXPORT OUTFILE='C:\Program Files\SPSS\1991 U.S. General Social Survey.por'.

• Note the extension is '.por'. The subcommands KEEP, DROP and RENAME work in EXPORT OUTFILE as well.

Saving Data in other formats

• SPSS can save files in a variety of formats, including ASCII delimited text, versions of Excel, Stata, SAS, and some database formats. The SAVE TRANSLATE command is used to create these files. I recommend using the menus (File >Save As... then 'Paste') to begin exploring these other formatting options.
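As one hedged example of what the pasted syntax can look like, the sketch below writes the active data set to a comma-delimited text file. The output path is hypothetical, and the exact subcommands that 'Paste' produces may differ between SPSS versions.

```spss
* Write a CSV file with variable names in the first row and data values in the cells.
SAVE TRANSLATE OUTFILE='C:\Temp\gss.csv'
  /TYPE=CSV
  /FIELDNAMES
  /CELLS=VALUES
  /REPLACE.
```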

Another way of analyzing ANOVAs

• One-Way ANOVA (using GLM)

o The One-Way ANOVA is found under Analyze > General Linear Model > Univariate... and produces the dialog box below:

o We will be doing the same analysis as before, so the groups will be formed by levels of 'Death of a Close Friend' and 'General Happiness' will be the dependent variable.

o Click on 'General Happiness' then move it to the 'Dependent Variable:' box by clicking on the upper arrow button. Now click on 'Death of a Close Friend' (scroll down the variable list to find) and then click the arrow button to move it to the 'Fixed Factor(s):' box.

§ Why 'Fixed' and not the 'Random Factors' box? Another good question. The levels of a fixed factor include all levels about which conclusions are desired. Since our variable is yes/no, all the levels we're interested in are included and we should include it as a Fixed Factor.

§ Random Factors have levels that are a random sample of the levels about which we want conclusions. The key is to keep in mind what you want to do with the results. Are you interested in generalizing your findings to levels that are not sampled in the variable (the levels are sampled from a larger population of possible levels) or did you include the entire population in your levels? If a variable’s levels are sampled from a larger population, you should treat that variable as a Random Factor.

o Once you have assigned variables in the Univariate dialog box, click on 'Model...' to get the following dialog:

We will not be changing anything in this dialog for this analysis. Note the default is to run a full factorial model, that is, with multiple factors and/or covariates in your model, all the interactions will be included. The dialog gives you a way to 'prune' the list if you need to. You can also change the type of sum of squares calculated and whether or not to include an intercept in your model. Click 'Continue'.

o Clicking the 'Contrasts...' button opens the following dialog:

A number of kinds of contrast are available through this dialog. To change or add contrasts first select the factor of interest then select the type of contrast from the list box below. After you select the contrast type, you still need to click the 'Change' button. You will then see the contrast type listed next to the factor in parentheses. For this analysis, our factor has only 2 levels so contrasts are not necessary. Click 'Cancel' to return to the main Univariate dialog.

o Now click on the 'Plots...' button to bring up the Profile Plots dialog:

The Profile Plots dialog allows you to plot (graph) any or all of the factors in your analysis. To plot our factor, make sure it is highlighted in the 'Factors:' box and then click the arrow to copy it to the 'Horizontal Axis:' box. Now click the 'Add' button below to list it in the 'Plots' box at the bottom. The 'Separate Lines:' and 'Separate Plots:' boxes are pretty self explanatory. You can add multiple plots and try switching factors from the horizontal axis to separate lines to visually explore your results. If you want to make adjustments to a plot, select it in the 'Plots' box, make the changes above and then click the 'Change' button. To delete a plot, select it then click 'Remove'.

o The 'Post Hoc...' button displays the following dialog:

Since no factors have been selected, the various tests are grayed out. When you select the factor(s) the tests will become active and you can select any or all the tests you want. The current analysis does not require any post hoc tests so we'll click 'Continue' or 'Cancel' to return to the Univariate dialog.

o The 'Save...' dialog provides options to save diagnostic information as variables. The dialog is below:

• Click on File > New > SPSS Syntax to open a Syntax Window and type in commands. To submit commands in a Syntax Window, click on the Submit Button on the Tool Bar.

• Every choice you make using the menus is translated into syntax and executed when you click 'OK' or 'Run'. You can save these commands into a Syntax Window by using the 'Paste' button found next to the 'OK' or 'Run' button in the dialog boxes. You can then run that syntax whenever you want.
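For instance, pasting from the Univariate dialog described above, with one fixed factor and a profile plot, would give syntax roughly like the sketch below. 'happy' is the General Happiness variable used earlier. 'friend_died' is a hypothetical name standing in for the 'Death of a Close Friend' variable, so check the actual variable name in your data file before running it.

```spss
* One-way ANOVA through GLM Univariate. friend_died is a hypothetical variable name.
UNIANOVA happy BY friend_died
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /PLOT=PROFILE(friend_died)
  /CRITERIA=ALPHA(.05)
  /DESIGN=friend_died.
```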
