8/9/2019 Programming With s Pss Syntax and Macros
Programming with SPSS Syntax and Macros
SPSS Inc.
233 S Wacker Drive, 11th Floor
Chicago, Illinois 60606
312.651.3000
Training Department
800.543.6607
v10.0 Revised 12/31/99 ss
SPSS Neural Connection, SPSS QI Analyst, SPSS for Windows, SPSS Data
Entry II, SPSS-X, SCSS, SPSS/PC, SPSS/PC+, SPSS Categories, SPSS Graphics,
SPSS Professional Statistics, SPSS Advanced Statistics, SPSS Tables, SPSS
Trends, SPSS Exact Tests, and SPSS Missing Value are the trademarks of SPSS
Inc. for its proprietary computer software. CHAID for Windows is the
trademark of SPSS Inc. and Statistical Innovations Inc. for its proprietary
computer software. Excel for Windows and Word for Windows are trademarks of
Microsoft; dBase is a trademark of Borland; Lotus 1-2-3 is a trademark of
Lotus Development Corp. No material describing such software may be
produced or distributed without the written permission of the owners of the
trademark and license rights in the software and the copyrights in the
published materials.

General notice: Other product names mentioned herein are used for
identification purposes only and may be trademarks of their respective
companies.

Copyright (c) 2000 by SPSS Inc.
All rights reserved.
Printed in the United States of America.
No part of this publication may be reproduced or distributed in any form or
by any means, or stored on a database or retrieval system, without the prior
written permission of the publisher, except as permitted under the United
States Copyright Act of 1976.
SPSS Training
Programming with SPSS Syntax and Macros
Table of Contents

Chapter 1  Introduction and Syntax Review
    A Data Manipulation Example
    A Macro Example
    Rules and Aids for SPSS Syntax
    Advice for Those Working with Syntax
    Summary

Chapter 2  Basic SPSS Programming Concepts
    Command Types in SPSS
    The Three Types of SPSS Programming
    SPSS Data Definition
    SPSS Programming Constructs
    Do If & End If
    Do Repeat & End Repeat
    Loop & End Loop
    Scratch Variables
    Vector
    Summary

Chapter 3  Complex File Types
    ASCII Data and Records
    File Types
    Syntax Basics
    Data File Structure
    Grouped Data
    Mixed Data
    Nested Data
    Reading a Mixed File
    Errors in the Data
    Grouped File Type Without Record Information
    Summary

Chapter 4  Input Programs
    Syntax Components
    Example 1: Change the Case Base of a File
    End of Case Processing
    End of File Processing
    Checking Input Programs
    Incomplete Input Programs
    Example 2: Reading Files with Missing Identifiers
    When Things Go Wrong
    Summary

Chapter 5  Advanced Data Manipulation
    Reading a Comma-Delimited File
    Reading Multiple Cases on the Same Record
    An Existing SPSS Data File with Repeating Data
    Print Command for Diagnostics
    Practical Example: Consolidating Transactions
    Summary
    Appendix: Identifying Missing Values by Case

Chapter 6  Introduction to Macros
    Macro Basics
    Macro Arguments
    Macro Tokens
    Viewing a Macro Expansion
    Keyword Arguments
    Using a Varying Number of Tokens
    When Things Go Wrong
    Summary

Chapter 7  Advanced Macros
    Looping in Macros
    Producing Several Clustered Bar Charts
    Double Loops in Macros
    String Manipulation Functions
    Direct Assignment of Macro Variables
    Conditional Processing
    Creating Concatenated Stub and Banner Tables
    Additional Recommendations
    Summary

Chapter 8  Macro Tricks
    Combining Input Programs and Macros
    Ordering Tables and Charts
    The Case of the Disappearing Command
    Summary

Exercises
    Exercises
Chapter 1
Introduction and Syntax Review

Topics
A Data Manipulation Example
A Macro Example
Rules and Aids for SPSS Syntax
Advice for Those Working with Syntax

INTRODUCTION
This course has two major topical areas. We will review how to use
SPSS Syntax to perform complex data manipulations that are not
available under the SPSS menu system. This will be of interest to
those who need to read complex data files from legacy computer systems
(for example, legacy health care data, transaction oriented sales systems)
and those who find they need to reorganize their data in order to perform
a desired analysis. Examples of the latter include marketing and
customer relationship studies in which a number of products (or services
of a company) are rated on each of many attributes. All information
from a respondent is typically stored in a single record, but needs to be
spread across multiple records in order for factor analysis and perceptual
mapping to be performed. When preparing data for churn (customer
retention for telecoms, credit card issuers, insurance companies)
studies, comparisons might need to be made across transactional records
sorted by customer ID and date. SPSS Syntax permits a richer array of
data manipulations in this context than would the menu system. In
short, we will examine uses of SPSS Syntax to facilitate analysis of files
with complex structures or files that must be restructured for a desired
analysis.

The second topical area concerns automation in SPSS through the
SPSS macro language. SPSS macros can generate SPSS Syntax, which is
then executed. For this reason, macros are very handy in situations
where SPSS Syntax needs to be run repeatedly, but with minor and
systematic changes each time. For example, you might wish to produce
thirty Interactive graphs, each a clustered bar chart containing a
demographic variable and one of thirty rating scale variables. Within the
SPSS menu system, changes would have to be made in the Interactive
graph dialog box for each graph. Instead, an SPSS macro could generate
the SPSS Syntax for each interactive graph within a loop, substituting a
new rating scale variable name per iteration. In this way, the SPSS
macro language can automate what would otherwise be time-consuming
tasks for the analyst.

Since these topics involve SPSS Syntax, we will use the dialog boxes
within SPSS infrequently. A prerequisite for this course is familiarity
with SPSS Syntax at the level of our Introduction to SPSS Syntax
training course. In this chapter, we will present a sample of data
manipulation with SPSS syntax and a macro example, and provide a
brief review of and some recommendations for SPSS Syntax.
A DATA MANIPULATION EXAMPLE
To illustrate the type of data manipulation that can be performed with
SPSS Syntax, we will display the beginning and final form of a data file
recording SPSS training course purchases. Within the Training
department, there was interest in examining patterns of training courses
taken by SPSS customers, and an analysis was performed using SPSS
Clementine. However, a requirement of the analysis was a data set in
which all courses taken by a customer (an SPSS ID) were contained in a
single customer record.

The original data file, extracted from a transaction database,
contained one record per course taken, since an instance of a course being
taken by a customer constituted a sales transaction. We show this below.

Figure 1.1 Training Sales Data - Transaction File

Each record in this file is a sales transaction involving a specific
training course. The two fields displayed are customer ID and course
taken (which contains city, sequence within the year, and training course
code information). Additional fields, such as date and price, were
previously removed since they were not needed for this analysis. Here
different courses taken by an individual SPSS customer are scattered
throughout the file. Even if the file were sorted by customer ID, the fact
that the training course history for a single customer is spread across a
number of records, that varies from customer to customer, would create
difficulties for the analysis procedures.

Figure 1.2 Training Sales Data - One Record Per Customer

The training data has been reorganized so there is a single record per
customer ID and a separate variable for each training course. These
course variables are coded 1 if a customer signed up for the course and 0
if not. This structure makes it easy to explore associations among
training courses taken by customers. The SPSS syntax to perform the
data reorganization involved two steps: creating a vector of variables in
which each variable represented a specific course, and aggregating this
file to the customer ID level. The logic behind these operations is
reviewed in Chapter 5.
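
The two steps just described can be sketched in a few lines of syntax. This
is only a minimal illustration under stated assumptions, not the program
used in the course: the variable names custid and course, the indicator
names crs1 to crs3, and the three numeric course codes are all hypothetical
(the real file stores a longer course code).

```spss
* Hypothetical transaction file with one record per course taken.
DATA LIST FREE / custid course.
BEGIN DATA
101 1
101 3
102 2
103 1
103 2
103 3
END DATA.

* Step 1 creates a vector of indicator variables, one per course code.
VECTOR crs(3).
LOOP #i = 1 TO 3.
COMPUTE crs(#i) = (course = #i).
END LOOP.

* Step 2 aggregates to one record per customer and keeps a 1 for
* every course the customer took on any transaction.
AGGREGATE OUTFILE = *
  /BREAK = custid
  /crs1 crs2 crs3 = MAX(crs1 crs2 crs3).
LIST.
```

With these six hypothetical transactions, the listing should show three
cases, one per customer, each with a 0/1 value for every course.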
A MACRO EXAMPLE
We mentioned earlier that SPSS macros generate SPSS Syntax. A
common use of macros is to produce a series of syntax commands that
vary in specific ways, for example, a set of Interactive Graph or Tables
commands in which each command runs an analysis based on a different
variable. Thus one macro produces the same result as many syntax
commands (which it creates) or interactions with a dialog box. To
demonstrate, we will display results from a macro that produces a set of
Interactive graphs, substituting different variables in clustered bar
charts (this macro is discussed in Chapter 7).

Figure 1.3 Create Bar Chart Dialog Box

The dialog above will create a clustered bar chart displaying attitude
toward government action on health for different marital status groups.
Note that only a single variable can be placed in the horizontal and Color
boxes. (Note: actually multiple variables can be placed in a single box, but
this action will not produce multiple charts.) Thus creating a series of
charts, in which either the horizontal axis or Color variables change,
would require repeated visits to this dialog, substituting one variable at a
time. However, the macro below can build many clustered bar charts.

Figure 1.4 Macro to Produce Multiple Bar Charts (Interactive Graphs)

The details of this macro (Clu2IBar) will be discussed in Chapter 7.
However, we point out that the IGRAPH command, which was pasted by
clicking the Paste pushbutton in the Create Bar Chart dialog box (see
Figure 1.3), is nested within two loops, each of which iterates over a list
of variables supplied by the user. The invocation of the macro (last line in
program) supplies two variable names for the horizontal axis variable
and three variable names for the cluster variable. Thus six IGRAPH
commands will be generated, resulting in the six bar charts shown below.

Figure 1.5 Bar Charts Produced from Clu2IBar Macro

The six Interactive Graphs in the Outline pane were produced from
the macro. In this way, macros can automate the running of sets of
similar analyses. The second section of this course reviews SPSS macros
in detail.
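
To make the idea of a command nested within two macro loops concrete, here
is a minimal sketch. It is not the Clu2IBar macro (that is discussed in
Chapter 7): the macro name !twoloop, the argument names hvars and cvars, the
data, and the use of CROSSTABS in place of IGRAPH are all assumptions made
for illustration.

```spss
* Hypothetical data with two demographic and three rating variables.
DATA LIST FREE / marital sex rate1 rate2 rate3.
BEGIN DATA
1 1 3 4 5
2 2 1 2 3
1 2 4 4 2
3 1 5 3 1
END DATA.

* Display the commands that the macro generates.
SET MPRINT = ON.

DEFINE !twoloop (hvars = !CHAREND('/') / cvars = !CMDEND)
!DO !h !IN (!hvars)
!DO !c !IN (!cvars)
CROSSTABS /TABLES = !c BY !h.
!DOEND
!DOEND
!ENDDEFINE.

* Two outer variables times three inner variables gives six commands.
!twoloop hvars = marital sex / cvars = rate1 rate2 rate3.
SET MPRINT = OFF.
```

The single invocation on the second-to-last command line expands into six
CROSSTABS commands, one for each pairing of an outer and an inner variable,
in the same way that two and three variable names yield six charts above.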
RULES AND AIDS FOR SPSS SYNTAX
Since this course involves either the writing or generation of SPSS
Syntax, we begin by reviewing the rules of SPSS Syntax and how to
obtain syntax help.

The syntax rules for editing and writing SPSS commands are as
follows:

1. Each new command must begin on a new line and end with a
   period (.) or a blank line.
2. *Each command must begin in the first column of a new line.
3. *Continuation lines of a command must be indented at least
   one space.
4. Variable names must be spelled out fully.
5. Subcommands must be separated with a forward slash (/).
   The slash before the first subcommand is usually optional.
6. *Each line of command syntax cannot exceed 80 characters.

*Not required when running from a Syntax window, but required when
using the INCLUDE command or the SPSS Production Facility.
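
As an illustration of these rules, the short program below is written so
that it satisfies all six, including the starred ones. The data and the
variable names (age, income) are hypothetical.

```spss
* Hypothetical inline data with two variables.
DATA LIST FREE / age income.
BEGIN DATA
25 31000 40 52000 58 47000
END DATA.

* The command below starts in column 1 and ends with a period.
* Its continuation lines are indented by two spaces.
* Each subcommand is introduced by a forward slash.
FREQUENCIES VARIABLES = age income
  /FORMAT = NOTABLE
  /STATISTICS = MEAN MEDIAN.
```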
Syntax commands produced by clicking the Paste pushbutton from an
SPSS dialog box will conform to these rules, so the important issue is to
remember them when editing or entering syntax.

There are several useful sources of help when writing SPSS syntax. A
quick reminder of the keywords and requirements for an SPSS command
is only a tool-button click away. To demonstrate:

From within SPSS:
Click File..Open..Syntax
Move to the c:\Train\ProgSynMac directory
Double click on TransactionAgg
Scroll down to the Vector command
Click on the Vector command (so the insertion pointer touches it)
Click the Syntax Help tool

Figure 1.6 Syntax Help

In this syntax summary for the Vector command, subcommand
names and keywords are shown in upper case (some simple commands,
like Vector, have no subcommands); lower case elements describe
specifications that you supply (e.g. varlist indicates a list of variable
names that you provide). Sections of the command enclosed in square
brackets [ ] are optional, while those in braces indicate sets from which a
single choice can be made. To focus on the required elements of the
command, scan only sections not enclosed within square brackets.
While the syntax information about VECTOR is complete, there is no
explanation about what each specification does. Although some might be
obvious from their names, many are not. Complete documentation about
SPSS for Windows syntax commands can be found in the SPSS 10.0
Syntax Reference Guide (included on the CD-ROM containing the SPSS
10.0 program). If copied to your hard drive during SPSS installation, you
can access the guide from the main menu by clicking Help..Syntax
Guide..Base (or one of the optional modules). Commands are listed
alphabetically and the subcommand options are fully explained.
Experienced syntax command users, needing only reminders, can work
from the Syntax Help windows. For others, the Syntax Reference Guide is
necessary.

Note
The sequence below assumes the SPSS 10.0 Syntax Reference Guide has
been installed on your machine. If not, it can be installed from the SPSS
for Windows 10.0 CD-ROM.

Click Help..Syntax Guide..Base
Click the arrow beside Commands in the Outline pane
Scroll down to VECTOR
Click the arrow beside VECTOR in the Outline pane
Click on VECTOR in the Outline pane

Figure 1.7 SPSS Base 10.0 Syntax Reference for Vector Command
In addition to the syntax summary accessed through the Syntax Help
tool, the SPSS 10.0 Syntax Reference Guide contains discussion,
explanation and examples. All are useful when investigating the
possibilities of an SPSS command. For those working often with SPSS
Syntax, we strongly recommend installing the reference guide on your
machine or purchasing a copy of the SPSS 10.0 Syntax Reference Guide in
book form.

Click File..Exit to exit Adobe Acrobat and the SPSS 10.0 Syntax
Reference Guide
ADVICE FOR THOSE WORKING WITH SYNTAX
Finally, it is worth mentioning, at the risk of being obvious, several
recommendations for those working with SPSS Syntax.

Display Syntax commands as Log Items
By default, SPSS does not display syntax in the Viewer window, although
it is written to the SPSS journal file. If SPSS issues any error or warning
messages, it is useful to see which command they follow. For this reason,
while writing, editing and testing SPSS syntax, we recommend you turn on
the option to display syntax as a log item in the Viewer window. We will
do this explicitly in the next chapter, but view the Options dialog here.

Click Edit..Options
Click the Viewer tab

Figure 1.8 Viewer Options
The checkbox in the lower left corner of the Viewer tab within the
Options dialog controls whether SPSS syntax commands display in the
log.

Develop Basic Syntax Using Dialog Boxes
Although this course proves the exception to the rule (discussing Input
Programs, Vectors and Loops), most SPSS commands can be generated by
clicking the Paste pushbutton of the relevant dialog box. It is to your
advantage to use dialogs, when possible, to construct the basic SPSS
syntax and then edit it as needed. This will minimize errors by reducing
your opportunity to make typing mistakes.

Use File New to Clear Data
If errors lead to complex SPSS data operations (Input Program) not
completing, SPSS can be left in a waiting state. That is, it will not
properly process new instructions until it has closure on the interrupted
sequence. To clear the current data state of SPSS, you can run the NEW
FILE command or click File..New..Data. When writing complex Input
Programs (see Chapters 4 and 5), you might consider beginning the
program with a NEW FILE command to ensure that any problem data
state has been cleared prior to running your program. We illustrate this
in Chapter 4.
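
A minimal sketch of this habit, assuming nothing about the course files:
the small input program below generates its own five cases (the variable
name x is hypothetical) and begins with the NEW FILE command, the syntax
counterpart of File..New..Data, so that any interrupted data definition
from an earlier run is cleared first.

```spss
* Clear any working data file or interrupted input program.
NEW FILE.

INPUT PROGRAM.
LOOP #i = 1 TO 5.
COMPUTE x = #i.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.

* A procedure forces the data to be built and read.
LIST.
```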
Test After Each Step
It is a difficult challenge to foresee all possible data problems that a
program might face and it is rarely the case that any program, for that
matter, runs correctly the first time. For these reasons it is important to
systematically test and check results at each stage of the process. Full
programming methodologies have been developed to this end. Here we
merely wish to recommend that, during development, you include
displays or procedures to check the results of each set of operations, so
that when something goes awry you have a way of isolating and
identifying the problem. The Data Editor display in SPSS is useful for
this purpose, as are the Frequencies, Crosstabs, Case Summaries and
List procedures, and the Print transformation. These will be used
repeatedly in the examples we present in this course, and we can assure
you that the consultants in the SPSS Consulting group use them heavily.
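
One way to build such checks into a program is sketched below. The data and
the variable names (id, score, pct) are hypothetical; the point is only that
a Print transformation and a quick procedure follow the step being tested.

```spss
* Hypothetical data for the check.
DATA LIST FREE / id score.
BEGIN DATA
1 10 2 25 3 40
END DATA.

* Step under test.
COMPUTE pct = score / 50 * 100.

* PRINT is a transformation, so it writes one line per case
* the next time the data are read.
PRINT / id score pct.

* LIST is a procedure, so it reads the data and triggers PRINT.
LIST VARIABLES = id pct.
```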
Delete Items in Viewer Window
If an error occurs, and you have read and understood the warning
messages in the Viewer window, it is often a good idea to delete the
results in the Viewer window before rerunning your program. This is
because syntax commands may be appended to the last Log item in the
Outline pane, making it difficult to distinguish the old warning messages
from the new results.
SUMMARY
In this chapter we introduced, with examples, the major focus areas of
this course: Syntax for complex data manipulation and SPSS Macros. We
also briefly reviewed the available help for SPSS Syntax and offered some
advice for those working with SPSS Syntax.

Chapter 2
Basic SPSS Programming Concepts

Topics
Introduction
Command Types in SPSS
The Three Types of SPSS Programming
SPSS Data Definition
SPSS Programming Constructs
A Note About Program Execution
Analysis Tip: Reordering Variables

INTRODUCTION
All SPSS procedures are built upon a powerful programming
language that has been consistent, though greatly extended, since
SPSS was first developed as a mainframe statistical software
program. This course will teach you how to use this language and other
features for file and data input and manipulation, and for overall control
of SPSS execution.

The SPSS language, called syntax, is generated by the program every
time a user clicks on the OK button in a dialog box to execute a
procedure. Behind the scenes, SPSS builds syntax to send to the SPSS
central engine to execute a particular procedure or transformation. Using
the Paste button in a dialog box places a copy of that syntax in a Syntax
window so that it can be edited or saved and used again.

SPSS syntax is also often called a command or set of commands. The
grammar or rules associated with commands are fairly simple, and we
will review them as necessary throughout the chapters.

Note about Data File Access When Running SPSS from a Remote Server
SPSS for Windows 10.0 can run entirely on your desktop machine.
Alternatively, an SPSS Client, through which you request analyses and
view results, can run on your desktop, while the analyses are run by the
SPSS Server, possibly located on a different machine. In this course,
except for the directory you use to access the training data files, it makes
no difference whether the SPSS Server is located on your desktop or a
different computer. The SPSS Server Login dialog (click File..Switch
Server) allows you to connect to a remote SPSS Server (if installed on
your network).

If you are running SPSS from a Remote (not Local) server, then to
use the data files accompanying this course, they must be copied either to
the server running SPSS or to a directory that can be accessed by
(mapped from) the server. The directory references in this guide assume
you are running SPSS as a local server and can thus directly access files
stored on your hard drive.

COMMAND TYPES IN SPSS
Most users successfully program in SPSS without a complete
understanding of the various command and program states of SPSS, and
you can too. Nevertheless, it helps to know a little about this subject,
particularly to help put the various capabilities of SPSS in context. Many
users know the difference between transformations and procedures, the
two main types of commands, but there are a few others:

File Definition Commands: As their name implies, all these
    commands are used to input data into SPSS. They include
    familiar commands such as GET, DATA LIST, or GET
    CAPTURE ODBC, but also others like FILE TYPE, IMPORT,
    or MATCH FILES.

Input Program Commands: These are specialized commands,
    also used to input or create data in SPSS. These commands
    are more esoteric and include RECORD TYPE, REPEATING
    DATA, and END CASE. We have more to say about this
    below when discussing the data step in SPSS.

Transformation Commands: These commands are quite varied
    in their operation, but the key element they share in common
    is that they neither input data nor analyze data. Instead, they
    modify data (COMPUTE, RECODE), create new variables
    (VECTOR, NUMERIC), write out data (WRITE), or label data
    (VARIABLE LABELS). To be specific, transformations do not
    cause SPSS to read the data file.

Procedure Commands: Almost all of these commands analyze
    data. However, the actual definition of a procedure in SPSS is
    a command that causes data to be read. Thus, SAVE is also a
    procedure because it causes the data to be read and an SPSS
    system file (an .SAV extension) to be created.

Utility Commands: These commands handle a variety of chores.
    They include commands to add comments (COMMENT,
    DOCUMENT), to define a new file (NEW FILE), and to define
    macros (DEFINE--!ENDDEFINE).

Knowing about command types will be helpful in understanding how
and why a program operates. For example, XSAVE, an alternative to the
SAVE command, can be used within a loop because it is a transformation,
not a procedure.

THE THREE TYPES OF SPSS PROGRAMMING
Logically, there are three general methods of programming in SPSS. Two
of them involve syntax, while the third uses a version of the Basic
programming language (Sax Basic).

Standard Syntax: These programs are the most common and simply
involve writing a series of SPSS commands to accomplish a set of tasks.
An example of a simple program is shown in the box (this program uses
the FILE TYPE command to read a non-standard ASCII data file). In a
standard syntax program, each command does one thing, and it does not
refer to other SPSS syntax. Standard programs are executed either
through the Run button, the INCLUDE command, or the SPSS
Production Facility.

* SPSS Example to read a nested file * .
FILE TYPE NESTED FILE 'C:\TEST.DAT' / RECORD RECID 1
  (A) CASE 3 (F).
RECORD TYPE 'H'.
DATA LIST /H1 to H10 5-14.
RECORD TYPE 'F'.
DATA LIST /F1 to F5 15-19.
RECORD TYPE 'P'.
DATA LIST /CASEX 3 P1 to P3 20-22.
END FILE TYPE.
LIST.

Macros: Many programs allow users to define macros, which are
typically a series of commands grouped together as a single command to
make everyday tasks easier and more convenient. Macros can often be
assigned to a toolbar or menu to make them readily accessible. Normally,
macros are saved as a series of instructions in a special macro language.
SPSS Macros are a bit different and not exactly parallel to the more
common definition of a macro in other programs. First, they are written
in SPSS syntax (plus a few special macro commands) and are essentially
executed like any other syntax file. Second, they generate customized
SPSS command syntax, i.e., standard syntax, to reduce the time and
effort needed by the program writer to perform complex and repetitive
tasks. There is no special macro editor in SPSS or macro facility to
execute a macro; again, a macro is simply a specialized syntax file. Below
is an example of a macro that automates the production of a bar graph
and the insertion of today's date into the title of a graph (this program
actually defines two macros). The macro begins with DEFINE and ends
with !ENDDEFINE.
DEFINE !GRAPHIT ( ARG1 !TOKENS(1) / ARG2 !TOKENS(1)/
   ARG3 !TOKENS(1)) .
GRAPH /BAR(SIMPLE)=COUNT BY !ARG1
 /TITLE=
 "EXAMPLE OF MACRO GRAPHING TITLES AND DATES"
 /SUBTITLE !QUOTE(!CONCAT(!UNQUOTE(!ARG2),
 "Something ",!UNQUOTE(!ARG3)))
 /FOOTNOTE = !CONCAT(""Today is ",!EVAL(@DATEIT),""" ).
!ENDDEFINE .
DATA LIST FREE / A.
BEGIN DATA
1 2 3 2 1 2 4
END DATA.
* The following defines a macro entity with todays date *.
DO IF $CASENUM=1 .
WRITE OUTFILE 'TMP'/
 'DEFINE @DATEIT()',$TIME(ADATE),'!ENDDEFINE .'.
END IF .
EXECUTE.
INCLUDE 'TMP' .
!GRAPHIT ARG1=A ARG2="ARGUMENT 2"
 ARG3="ARGUMENT 3".

SPSS Scripting Facility: The scripting facility also allows you to
automate tasks in SPSS. It has far more power and capabilities than
SPSS macros. You can accomplish the same tasks that you would with a
macro, but can do much more, including the creation of dialog boxes, the
customization of output in the Viewer window, or the writing out of
selected portions of output to a separate file for use in other programs.
Scripts can be set to run automatically or run at user choice.

Unlike syntax programs or macros, scripts are written in a special
language, Sax Basic, that is similar to the macro language in other
programs, such as Visual Basic for Applications. Of course, scripts can
also use SPSS syntax in their definition, and they can be assigned to a
menu choice like macros. Scripts are often somewhat lengthy compared to
standard syntax, so we won't display an example in this chapter. SPSS
Scripts are covered in the Programming with SPSS Scripts training
course.

SPSS DATA DEFINITION
A substantial portion of this course is devoted to the manipulation of files
and data with SPSS. As such it will be helpful to understand a bit about
SPSS data file definition. More information is available in the Commands
and Program States Appendix in the SPSS Base 10.0 Syntax Reference
Guide (this guide is available on the CD-ROM containing SPSS and can
be copied to your hard drive when SPSS is installed).

To do something in SPSS you need to define a working data file,
possibly transform the data, and then analyze it. The first task is
accomplished with a file definition command. A simple instance might be
the GET command to open an SPSS data file, but a more complex one is
INPUT PROGRAM to define a non-standard data file. In either case, file
definition commands use an input program state to accomplish their job.
In an input program state, SPSS must determine how to read a data file,
what the definition is of a case, when to create a case, and when to create
the data file.

Often these decisions are straightforward for SPSS. When you click
on File..Open..Data, and name a file with an extension of SAV, SPSS
knows how to read the file, that each logical record in the file is to be
written to one row in the Data Editor, and that it should read the whole
file. Or in the simple program below, the execution of the DATA LIST
command (accessed by choosing File..Read Text Data; note that as of
SPSS 10.0 a GET DATA command is pasted instead) tells SPSS that
what follows is a normal file, where each line of data is to be written to a
new row, or case, in the Data Editor, and that the last case should be
written and the file created when the END DATA command is
encountered.

DATA LIST FREE / X.
BEGIN DATA.
1 2 3
END DATA.
LIST.

However, even in these simple commands, SPSS enters an input
program state that is more complex than it first appears. In fact, you can
explicitly place SPSS into this state by using the INPUT PROGRAM
command. Thus, the following program is equivalent to the first.

INPUT PROGRAM.
DATA LIST FREE / X.
END INPUT PROGRAM.
BEGIN DATA.
1 2 3
END DATA.
LIST.
The difference is tha t first, SPSS is explicitly put int o the inpu t
program sta te, and second, SPSS is told when to quit the input program
sta te (with END INPU T PROGRAM). There is, of course, no reason to use
INPU T PROGRAM in th is un complicated example, but for complex dat a,
using an inp ut pr ogra m ma y be necessary to successfully read dat a into
SPSS. We will see more of INPUT P ROGRAM in Ch apt er 4.
The key to understanding and using complex file definition
commands in SPSS is to understand that you, the user, are in charge of
telling SPSS what constitutes a case or row in the Data Editor, when to
create that case, and when to end the input program and create the
working data file. As an illustration, it is possible through an input
program to have SPSS read only a portion of a file rather than first
creating a larger working data file which is then reduced by selecting
certain cases to retain.
SPSS data files must be rectangular (denormalized, in database
terminology). This means that there must be a value for every variable
for every case. Or to put it another way, each row of the Data Editor
defines a case to SPSS. Often, data come in a format that doesn't match
this layout, and that is one of the most common uses for the input
program capability. In addition, SPSS supplies several predefined
complex file definitions that read common types of non-rectangular files
(we will discuss these in Chapter 3). Changing the case definition of a file
is a common technique to solve a variety of problems.
SPSS PROGRAMMING CONSTRUCTS
As with other programming languages, SPSS programs, whether they be
standard syntax, macros, or scripts, all have several standard constructs
that can be used to do many things. These include the ability to loop, to
create an array of elements, to repeat actions, and to do actions only if
some condition is true. We illustrate these concepts here with standard
syntax; then later you will see these constructs used again in macros.
DO IF & END IF
A DO IF & END IF structure is used to execute transformations on
subsets of cases based on some logical condition. It is often used to
replace a long series of IF statements. The logical structure of a DO IF
command sequence is:
DO IF (test for condition)
transformations
ELSE IF (test for another condition)
transformations
ELSE IF or ELSE
further transformations
END IF
The clear advantage is that not all statements are executed for each
case, as is true for a series of IF statements. Consider regression analyses
that found that the relationship between gross domestic product (GDP)
and birth rate (BTHR) is not the same for first- and third-world countries
(which is definitely true). The results of the separate regression analyses
can be applied to a file of first- and third-world countries efficiently with
these commands (where WORLD is the selection variable).
DO IF (WORLD = 1).
COMPUTE BTHR = 10.872 + .0014 * GDP.
ELSE IF (WORLD = 3).
COMPUTE BTHR = 46.148 - .004 * GDP.
END IF.
Although we could have accomplished the same with two IF
commands, the advantage is that the ELSE IF and second COMPUTE
commands are not executed for first-world countries.
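For comparison, the two-IF alternative mentioned above might be written
as follows. This is only a sketch using the same variables and
coefficients; here both conditions are tested for every case, which is the
extra work the DO IF structure avoids.

```spss
* Each IF is evaluated for every case, first-world or not.
IF (WORLD = 1) BTHR = 10.872 + .0014 * GDP.
IF (WORLD = 3) BTHR = 46.148 - .004 * GDP.
```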
DO REPEAT & END REPEAT
A DO REPEAT construct allows you to repeat the same group of
transformations on a set of variables, thereby reducing the number of
commands that you must enter. SPSS must still execute the same
number of commands; the efficiency comes for the user, not SPSS. To
illustrate its use, let's access the 1994 General Social Survey file, stored
in the c:\Train\ProgSynMac directory.
Displaying Variable Names in Dialog Boxes
First, to simplify the instructions in this course, we will request that
variable names (and not the default variable labels) be displayed in
dialog boxes. From within SPSS:
Click Edit..Options
Click the Display Names option button in the Variable Lists
section of the General tab
Click the Alphabetical option button in the Variable Lists
section of the General tab
In order to display SPSS commands in the Viewer window when we
run analyses, we change one of the Viewer options.
Click Viewer tab in the SPSS Options dialog box
Click checkbox beside Display commands in the log
Click OK
Now to read the data.
Click File...Open..Data
Move to the c:\Train\ProgSynMac directory (if necessary)
Double-click on GSS94
There are several questions in the file that ask about whether
national spending on various programs or areas should be increased, stay
the same, or be reduced. Imagine that we wish to compare pairs of
questions (urban problems and welfare) to see whether or not a
respondent gave the same answer to each. The program in Figure 2.1
accomplishes that task. Open it by
Clicking on File...Open..Syntax (move to
c:\Train\ProgSynMac folder if necessary)
Double-click on CHAPT2
Figure 2.1 DO REPEAT & END REPEAT Program
The DO REPEAT structure requires that stand-in variable names be
used to represent a list of variables or constants. The stand-in variables
exist only within the DO REPEAT structure. Between the DO REPEAT
and END REPEAT commands, transformation commands can be used,
referencing the stand-in variables. The PRINT keyword on the END
REPEAT command tells SPSS to list the commands generated by the DO
REPEAT structure. (This is a good idea except when SPSS generates
hundreds of commands.)
The COMPUTE command uses another feature of SPSS syntax, true/
false comparisons. The COMPUTE statement tells SPSS to compare the
values of the elements in FIRST to SECOND, in pairs. When, for
example, NATSPAC is equal to NATENVIR, the test is true and SPSS
returns a 1 to the variable SAME. When the two responses are not
equal, SPSS returns a false, or 0, to SAME.
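Figure 2.1 is not reproduced in this text. A minimal sketch of a program
of this kind, consistent with the description above, might look like the
following. The second pairing (NATCITY with NATFARE) and the result
names DIFF1 and DIFF2 are assumptions made for illustration, not the
contents of CHAPT2.SPS.

```spss
* FIRST, SECOND, and SAME are stand-in names that exist only inside
* the structure. The pairings and result names are illustrative.
DO REPEAT FIRST = NATSPAC NATCITY
         /SECOND = NATENVIR NATFARE
         /SAME = DIFF1 DIFF2.
COMPUTE SAME = (FIRST = SECOND).
END REPEAT PRINT.
LIST VARIABLES = NATSPAC NATENVIR DIFF1 /CASES = FROM 1 TO 10.
```

With two pairs, SPSS would generate two COMPUTE commands, one per
pair; each additional pair adds one more.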
To see this in operation
Highlight all the lines from DO REPEAT to LIST, then click on
the Run button
After SPSS runs the commands the Viewer window opens, as shown
in Figure 2.2. SPSS creates four COMPUTE commands based on the DO
REPEAT structure. Scrolling down through the output from LIST (not
shown) demonstrates that when NATSPAC is not equal to NATENVIR,
DIFF1 is set equal to zero, and when the two responses are equal, DIFF1
is set to 1 (a value of 0 for either of the spending variables is defined as
missing so the COMPUTE is not done).
Figure 2.2 Output from END REPEAT PRINT
Although in this instance little if any work was saved by the use of
DO REPEAT, in many circumstances the savings can be substantial.
LOOP & END LOOP
The DO REPEAT structure is an iterative construct because SPSS
iterates over sets of elements to carry out the user instructions. A more
generic form of iteration is provided by the looping facility in SPSS,
represented by the LOOP & END LOOP commands. They can be used to
perform repeated transformations on the same case until a specified
cutoff is reached, which can be defined by an index on the LOOP
command, an IF statement on the END LOOP command, or other
options. By default, the maximum number of loops is 40, defined on the
SET command. Almost any transformation can be used within a loop.
We begin with a very simple loop to illustrate its syntax.
Click on Window...CHAPT2 - SPSS Syntax Editor to return to
the Syntax Editor window
Scroll down to the program shown in Figure 2.3
Figure 2.3 LOOP & END LOOP Example
On the LOOP command, we tell SPSS to loop five times with the
index clause of #I=1 to 5. This tells SPSS to repeat the COMPUTE
command five times for each person in the GSS file. Usually indices are
increased by one, as in this example, but that is not always the case. Nor
must they begin at 1.
The COMPUTE command itself tells SPSS to add one to the previous
value of Z, which has initially been set to 0 before the loop. The loop then
finishes with the required END LOOP command to tell SPSS the
construct has finished.
A NOTE ABOUT PROGRAM EXECUTION
Notice that the program ends with an EXECUTE command. When
running syntax from a Syntax window, SPSS does not immediately
process transformations by reading the data file. Instead, it stores
transformations in memory and waits until a command is encountered
which forces a pass of the data. This is in contrast to running SPSS
commands from a dialog box, where the command is executed
immediately after the OK button is clicked. The EXECUTE command
forces a pass of the data and executes any preceding transformations.
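Figure 2.3 is not reproduced in this text. Based on the description above
(Z set to 0, an index clause of #I=1 to 5, a COMPUTE that adds one to Z,
and a closing EXECUTE), the program presumably resembles this
sketch; the exact layout in CHAPT2.SPS may differ.

```spss
* Z starts at 0 for each case; the loop runs five times per case.
COMPUTE Z = 0.
LOOP #I = 1 TO 5.
COMPUTE Z = Z + 1.
END LOOP.
* EXECUTE forces the data pass that carries out the pending
* transformations.
EXECUTE.
```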
Highlight the commands from the first COMPUTE to
EXECUTE
Click on the Run button
To see the effect of this program
Switch to the Data Editor window
Scroll to the last column in the Data View sheet
Figure 2.4 Data Editor with variable Z added
SPSS has added 1 to Z five times, and since Z initially was
zero, Z is now 5 for every case in the file. To reiterate, the LOOP
command works within a case rather than across cases. We will see many
uses of looping in programs, and the concept of looping will be repeated in
macros and scripts.
SCRATCH VARIABLES
The variable #I used to index the loop does not exist in the GSS file. If it
did, we would see it next to Z in the Data Editor. It hasn't been created by
SPSS because it was declared a scratch variable. This is done by
specifying a variable name that begins with the # character. Scratch
variables are used in transformations or data definition when there is no
reason to retain them in the data file. They cannot be used in procedures.
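Scratch variables are not limited to loop indices. As a small illustration,
a scratch variable can hold an intermediate result that we do not want
saved. The names #GAP and ABSGAP below are hypothetical and chosen
only for this sketch.

```spss
* #GAP holds an intermediate value and is never written to the file.
* Only ABSGAP appears in the Data Editor.
COMPUTE #GAP = NATCITY - NATCRIME.
COMPUTE ABSGAP = ABS(#GAP).
EXECUTE.
```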
VECTOR
A vector is a construct used to reference a set of existing variables or
newly created variables with an index. The vector can reference either
string or numeric variables.
Here is how to create a vector from existing variables.
VECTOR SAT = SATCITY TO SATHEALT.
The vector SAT is created from the five questions in the General
Social Survey that ask about a respondent's satisfaction with various
aspects of his/her life. This vector is not visible in the Data Editor as a
separate variable or set of variables because it is a logical construct from
existing variables. These variables must be contiguous in the file; that is,
they must be located next to each other when viewed in the Data Editor.
Conversely, the syntax
VECTOR X(10).
will create 10 new variables with names from X1 to X10, all
initialized to system-missing. To illustrate this point
Switch to the CHAPT2 - SPSS Syntax Editor window
Click VECTOR X(10)., then click the Run button
Go to the Data Editor (click Goto Data tool) and scroll to
the last columns
Figure 2.5 Data Editor with Variables X1 to X10
The ten new variables all have system-missing values for each case.
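To show how an index is used with a vector, here is a short sketch that
counts how many of the five satisfaction items have a valid response for
each case. The result variable NVALID is a hypothetical name introduced
for this illustration.

```spss
* SAT(#I) refers to the #I-th variable between SATCITY and SATHEALT.
VECTOR SAT = SATCITY TO SATHEALT.
COMPUTE NVALID = 0.
LOOP #I = 1 TO 5.
IF (NOT MISSING(SAT(#I))) NVALID = NVALID + 1.
END LOOP.
EXECUTE.
```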
A more interesting use of a vector is illustrated by the syntax shown
in Figure 2.6. In the DO REPEAT example we compared the value of one
spending variable to another to see if responses were identical. We can
accomplish a similar task with vectors and loops. In this instance we wish
to compare the responses on the variable NATCITY to responses on four
other variables (NATCRIME, NATEDUC, NATRACE, and NATARMS).
And instead of creating a new variable that indicates whether the
response on NATCITY is identical or not to the other four variables, we
will compute the difference.
Switch to the CHAPT2 - SPSS Syntax Editor window
Scroll down to the program shown in Figure 2.6
Figure 2.6 Program with Vector and Loop to Compute Differences
Between Variables
The VECTOR command creates two new vectors. GROUP is
composed of the five variables from NATCITY to NATARMS (again, they
must be contiguous). DIFF_ has four elements and so creates four new
variables, DIFF_1, DIFF_2, DIFF_3, and DIFF_4. We will place the
difference for each pair in this vector.
The loop increments by 1 but begins at 2 rather than 1. It loops until
5 (or a total of four times) because there are four variables to compare to
NATCITY. On the first pass through the loop, the COMPUTE command
compares NATCITY (the first element of GROUP) to the second element
of GROUP (NATCRIME) and puts the difference in DIFF_(1). And so on
for three other iterations.
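Figure 2.6 is not reproduced in this text. The following sketch is one way
to write a program that behaves as described. Two details are
assumptions: the vectors are declared on two separate VECTOR
commands rather than one, and the difference is taken as NATCITY
minus the other item.

```spss
* GROUP indexes the five existing items. DIFF_ creates DIFF_1 to DIFF_4.
VECTOR GROUP = NATCITY TO NATARMS.
VECTOR DIFF_(4).
* The index starts at 2 so that GROUP(1), NATCITY, is compared with
* each of the other four items in turn.
LOOP #I = 2 TO 5.
COMPUTE DIFF_(#I - 1) = GROUP(1) - GROUP(#I).
END LOOP.
LIST VARIABLES = NATCITY TO NATARMS DIFF_1 TO DIFF_4
  /CASES = FROM 1 TO 10.
```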
Highlight all the lines from VECTOR GROUP to LIST
Click the Run button
Figure 2.7 List Output Showing DIFF_1 to DIFF_4
Where a case has valid values for the spending variables, we can see
that SPSS created the four new DIFF_ variables measuring the
difference between NATCITY and the other four spending items. It would
be straightforward to create additional COMPUTE statements to
compare all other possible pairs.
We have used the LIST command to check the operation of SPSS in
two of the examples. Checking to see whether syntax has done what you
expected it to do is very important when doing SPSS programming. The
SUMMARIZE command, available through the menus, can do what LIST
does and more, but LIST is easier to type and less complicated when
using syntax.
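When a file is large, it is usually enough to check a handful of cases. As a
sketch, the CASES subcommand limits LIST to the first ten cases; the
variable list shown is only an example.

```spss
* List selected variables for the first ten cases only.
LIST VARIABLES = NATCITY DIFF_1 TO DIFF_4
  /CASES = FROM 1 TO 10.
```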
ANALYSIS TIP: REORDERING VARIABLES
A set of variables must be contiguous when placing them into a vector.
What can you do if that is not true in an existing file? Perhaps the easiest
method to rearrange variables is to use the trick of matching a file to
itself. Figure 2.8 displays syntax from CHAPT2.SPS that illustrates this
technique.
Switch to the CHAPT2 - SPSS Syntax Editor window
Scroll down to the Match Files example
Figure 2.8 Match Files Program
Normally, MATCH FILES is used to match one file to another.
However, here the working data file (referenced by an asterisk on the
FILE subcommand) is matched to itself because no other file is named.
Usually files are matched using one or more link variables (for example,
ID number), but here it is not necessary since we match one file to itself.
The key portion of the MATCH FILES command is the KEEP
subcommand, where we list the variables we wish to retain in the order
we want them to appear in the Data Editor. The EXECUTE command is
required because MATCH is a transformation, not a procedure.
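Figure 2.8 is not reproduced in this text. As a sketch of the same trick,
the commands below move five spending items to the front of the file
(rather than to the end, as in the course example). The keyword ALL
stands for every variable not already named, so nothing is dropped. The
choice of variables is an assumption made for illustration.

```spss
* The asterisk refers to the working data file, matched to itself.
MATCH FILES FILE = *
  /KEEP = NATCITY NATCRIME NATEDUC NATRACE NATARMS ALL.
EXECUTE.
```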
Highlight the lines from MATCH to EXECUTE
Click the Run button
Switch to the Data Editor window and scroll to the last
columns
Figure 2.9 Data Editor with Spending Variables moved to the End
The KEEP subcommand named the spending variables last, so they
have been moved to the last columns in the Data Editor.
SUMMARY
We reviewed the types of SPSS commands, the three types of SPSS
programs, and briefly reviewed data definition. We then discussed some
of the key programming techniques in SPSS, including the use of loops,
the creation of vectors, the processing of conditional statements (DO IF),
and the creation of repeating elements (DO REPEAT). These techniques
will be used repeatedly in SPSS programming. There are a few other
important programming techniques that you will see in later chapters
when the need arises. We turn in Chapter 3 to the handling of complex
data files.
Chapter 3
Complex File Types
Topics
Introduction
ASCII Data and Records
File Types
Syntax Basics
Data File Structure
Reading a Mixed File
Errors in the Data
Grouped File Type Without Record Information
INTRODUCTION
Most SPSS users find that the standard DATA LIST command is
sufficient to read the great majority of the data files they
normally encounter. This is because most data files are
rectangular, i.e., they contain the same number of records per case, the
definition of a case is consistent throughout the file, and the variables to
be defined are identical for each case. There are, however, situations in
which the above conditions do not hold. One example is a file at a medical
center with two types of records, one for inpatients and one for
outpatients, with identical variables located in different column positions
on each type of record, and some variables unique to each type of patient.
A standard DATA LIST cannot correctly read such a file and create a
separate case for each patient type.
To handle such a data file and any other that is non-rectangular,
SPSS supplies two general solutions. The first is to use a FILE TYPE
command, which allows the use of predefined file types that read
grouped, mixed, or nested files. The second solution is to allow the user to
take complete control of the process of reading data with an INPUT
PROGRAM command. This chapter discusses the use of file types; the
next chapter covers input programs. A third solution is to read the file
with a standard DATA LIST command, then use other programming
techniques to restructure the file, such as VECTOR and LOOP. We will
also illustrate this approach in subsequent chapters.
Before reviewing the various file types, we need to discuss some
background information.
ASCII DATA AND RECORDS
SPSS assumes that complex files are in ASCII format so that they can be
read with a DATA LIST command (within the complex file types). Files
that are stored in a spreadsheet or database format cannot be read
directly by SPSS with these techniques. In that case, you have two
options. You can write out an ASCII file from the other software and then
read it into SPSS with a complex file definition. Or you can read the file
into SPSS as you normally would, temporarily creating a working file
with an incorrect format for analysis. You can then use various
programming techniques to restructure the file.
The use of complex file types requires an understanding of a record.
For SPSS, a record refers to a physical line of data in an ASCII data file.
Technically speaking, a record ends with a carriage return and a line feed
(these are invisible to users in most software). In practice, if you open a
data file in a text editor, such as Notepad, each line will correspond to a
record in the data file. However, this is not always the case in word
processing software that wraps lengthy lines, so be careful when dealing
with a file for which you do not have a codebook that lists record length.
It is common to have several records for each case you plan to create
in the final SPSS data file or to have several cases on one physical record.
Understanding what constitutes a record and what the case definition
should be in the final SPSS file is part of the art of successful data input
programming.
FILE TYPES
In general, ASCII text data files can be in either fixed or delimited
format. Because complex file types must be able to locate case and/or
record variables, though, complex data should be stored in a fixed-format
ASCII text file.
The three available file types within the FILE TYPE command are:
Grouped: This is a file in which all records for a single case
are located physically together. Each case usually has one
record of each type. Each record should have a case
identification variable. This type of file is often identical to a
standard rectangular file, the difference being that a grouped
file type allows additional checking for errors because of
missing and out-of-sequence records, since SPSS normally
assumes that the records are in the same sequence within
each case.
Mixed: This is a file in which each record type defines a case.
Some information may be the same for all record types but
can be recorded in different locations. Other information may
be recorded only for specific record types. Not all record types
need be defined, so this is often a very efficient method to
read only part of a data file.
Nested: This is a file in which the record types are related to
each other hierarchically. An example is a file containing
school records and student records, where all the students
attending one school have their records placed together after
the school record. Usually the lowest level of the hierarchy,
the student in this example, defines a case. Information from
the higher-level records, perhaps overall GPA at the school, is
usually spread to the lower-level record when the case is
defined. All record types that form a complete set should be
physically grouped together, with an optional case identifier
on each record. It is worth noting that record types can be
skipped when reading the data, resulting in the creation of
cases at a higher level in the hierarchy.
SYNTAX BASICS
Complex file type programs are begun by the command FILE TYPE and
closed with the command END FILE TYPE. These two commands enclose
all definitional statements. One of the three keywords GROUPED,
MIXED, or NESTED must be placed on the FILE TYPE command. The
commands that define the data must include at least one RECORD TYPE
and one DATA LIST command, though it is common to have several. One
set of RECORD TYPE and DATA LIST commands is used to define each
type of record in any data file. The definition of a case, again, depends
upon which FILE TYPE is specified.
The RECORD subcommand is required and names the column
location of the record identification information and, optionally, the
variable that will store this information. A CASE subcommand is also
available (and required for a grouped file) that specifies the name and
location of the case identification information.
This syntax illustrates the basic structure:
FILE TYPE (Grouped, Mixed, or Nested) FILE = 'Your File' /
RECORD = RECID 4 CASE = ID 1-3.
RECORD TYPE 1.
DATA LIST / your variables and column locations here.
RECORD TYPE 2.
DATA LIST / more variables here.
etc.
END FILE TYPE.
All three file types have subcommands available that warn the user
when records and cases are encountered that don't meet the definitions of
the file type, record, and case. This warning can include situations when
records are missing.
After the FILE TYPE--END FILE TYPE structure is processed, a
rectangular active file is created, no matter the original structure of the
raw data file.
DATA FILE STRUCTURE
To further illustrate the three types of files, we display samples of data
files that can be read as grouped, mixed, or nested files.
GROUPED DATA
A grouped data file often looks identical or very similar to a standard
rectangular data file. However, a grouped file often has one or more of the
following problems:
1. A different number of records for each case
2. Records out of order
3. Records with the wrong record number
4. Duplicate records
These situations all mean that SPSS will not read the file
successfully with a simple DATA LIST command. The structure of a
simple grouped data file is shown in Figure 3.1.
Figure 3.1 Grouped Data File for Hospital Patients
These data are from a hospital and contain information on tests and
procedures administered to each patient. Each patient's data begins with
a record that lists identifying information. The second and subsequent
records include information on a test that was given, the date of the test,
and the cost. Each record after the first defines a test, but we would like
the case definition to be a patient. The problem is that a different number
of tests is given to each patient, so we cannot specify the same number of
records for each patient.
A CASE subcommand is required on the FILE TYPE command in
addition to the RECORD subcommand. Records with missing or
incorrect case identification information cannot be corrected and placed
with the correct case, but SPSS will warn you about the problem.
All defined variable names for a grouped file must be unique because
multiple records will be put together to form one case. By default all
instances of missing, duplicate, out of range (called "wild") or out-of-order
records will result in warnings from SPSS.
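To make the pieces concrete, here is a minimal sketch of a grouped file
definition with two record types per case. It is not the definition for the
hospital file in Figure 3.1; the file name, variable names, and column
locations are all hypothetical.

```spss
* Record type is read from column 4, case ID from columns 1-3.
* Variable names are unique across record types because the records
* are combined into a single case.
FILE TYPE GROUPED FILE = 'patients.dat' RECORD = #REC 4 CASE = ID 1-3.
RECORD TYPE 1.
DATA LIST / AGE 6-7 SEX 9 (A).
RECORD TYPE 2.
DATA LIST / TESTCODE 6-8 COST 10-14.
END FILE TYPE.
EXECUTE.
```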
MIXED DATA
A mixed raw data file looks quite different than a rectangular data file.
Again, a MIXED file type is used when each record type defines a
separate case (though not all record types need be defined). Figure 3.2
depicts a portion of the file MIXED.DAT that contains job information on
employees from a large company.
Figure 3.2 Mixed Data File For Employees
Columns 2-4 contain an identification number for each employee.
This is necessary for the company but not important for a FILE TYPE
MIXED definition. Column 6 contains the required record identification
information, in this case either a 1 or 2 (there is also a record number 3
not shown). The company had three separate record-keeping systems for
employee information for each division. The data from all three have
recently been placed in one file for reporting purposes.
A standard DATA LIST cannot be used to read this file because some
of the same information is in a different location for each record type.
Salary is recorded in a different location for each record type, and other
variables are not recorded in each system. We will attempt to read this
file in the first example.
NESTED DATA
A nested data file also looks quite different than a rectangular data file.
A FILE TYPE NESTED command is used when the records in a file are
hierarchically related. One example is a file with two types of records,
one for each department in a company, and one for each of the employees
in that department. All the employee records for one department are
placed consecutively together, after the record for the department in
which they are located and before the next department record.
The variable names on all the records must be unique because one
record of each type will be grouped together to form a case. Since not all
record types need be mentioned on the RECORD TYPE command, it is
possible to define a case at a higher level in the hierarchy, e.g., a
department rather than an employee. In fact, a case can be defined at any
level in the hierarchy of record types. Figure 3.3 depicts a nested data file
for a school district.
Figure 3.3 Nested Data File for School District
The file contains data on the performance of high school students
organized by homeroom and school. It contains three types of records:
Record 1: The high school record contains identifying
information on the school plus the school's overall GPA, SAT
verbal and SAT math scores.
Record 2: The homeroom record contains identifying
information on the homeroom plus the average GPA for all
students in the homeroom.
Record 3: The student record contains identifying information
including the student's sex and academic track, plus GPA,
SAT verbal and SAT math scores.
There is only one variable in common for all three records, the record
identifying information in the first column.
The data file has no case identifying information, which is typical of
real-life situations. That isn't a problem since SPSS simply stores the
higher-level record information, spreading it to each record 3 when it
creates a case for each student, retaining this information until another
record type 1 is encountered. SPSS can still successfully create the cases
even when intermediate-level records are missing (a homeroom record, in
this instance).
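A nested definition for a file of this kind might be sketched as follows.
Only the record identifier in column 1 comes from the description above;
the file name, variable names, and all other column locations are
hypothetical, since the layout of the school district file is not given here.

```spss
* The last record type named (3, the student) defines a case.
* School and homeroom values are spread to each student.
FILE TYPE NESTED FILE = 'school.dat' RECORD = #RECTYPE 1.
RECORD TYPE 1.
DATA LIST / SCHOOL 3-5 SCHGPA 7-10 (2) SCHSATV 12-14 SCHSATM 16-18.
RECORD TYPE 2.
DATA LIST / HOMEROOM 3-5 HRGPA 7-10 (2).
RECORD TYPE 3.
DATA LIST / SEX 3 (A) TRACK 5 GPA 7-10 (2) SATV 12-14 SATM 16-18.
END FILE TYPE.
EXECUTE.
```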
READING A MIXED FILE
We will read the employee mixed data file from Figure 3.2 (named
MIXED.DAT) into SPSS and create a rectangular file with a case for each
employee.
Here is a codebook table for the three types of employee record
systems.
                RECORD 1    RECORD 2    RECORD 3
VARIABLE        LOCATION    LOCATION    LOCATION
ID              2-4         2-4         2-4
RECORD ID       6           6           6
AGE             8-9         8-9         8-9
SEX             11          11          11
SALARY          13-17       22-26       13-17
TENURE          19-20       19-20       19-20
JOBCODE         22          17          not recorded
LOCATION        24          15          22
JOBRATE         26          13          not recorded
Not only are variables like SALARY recorded in different locations,
but two variables, JOBCODE and JOBRATE, were not recorded on record
3, which was the oldest record-keeping system. When the same variable,
e.g., AGE, is defined by more than one record type, the format type and
length should be the same on all records. SPSS uses the first appearance
of the variable for the active file dictionary.
The appropriate commands to read this file are included in Figure 3.4
and are in the file MIXED.SPS.
Click on File..Open..Syntax
If necessary, switch directories to c:\Train\ProgSynMac
Double-click on MIXED
Figure 3.4 Mixed File Type Program
The command FILE TYPE begins the file definition and puts SPSS
into an input program state. The MIXED subcommand tells SPSS that
this is a mixed data file. The data file is named here, not on the DATA
LIST commands that follow. The only other required subcommand is
RECORD to specify the record identification variable. The equal sign is
not required following RECORD or FILE. The record variable is in
column 6 and will be named SYSTEM. For the employee data it is
important to retain information that tells us under what record system
the data were created because of duplicate IDs under each system; often,
though, the record variable doesn't need to be retained in the final file. In
that case, it can be declared a scratch variable by beginning its name
with #.
Each employee data system, corresponding to a type of record in the
data file, gets its own RECORD TYPE command. The value specified on
the command (1, 2, or 3) refers to an actual value in the file MIXED.DAT
in the record identification position, here column 6 (refer to Figure 3.2).
The DATA LIST command following each RECORD TYPE command
defines the variables for that record type. Notice how SALARY is in
columns 13-17 for record types 1 and 3 but in columns 22-26 for record
type 2.
An optional subcommand on the FILE TYPE command is WILD,
which tells SPSS to issue a warning when it encounters undefined record
types in the data file. The default is NOWARN, so SPSS simply skips all
record types not mentioned and does not display warning messages.
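Figure 3.4 is not reproduced in this text. Working only from the
codebook table and the description above, a definition along these lines
would read the three record types. Treat it as a sketch: the full path to
MIXED.DAT and the assumption that every variable is numeric are ours,
and the labeling commands in MIXED.SPS are omitted.

```spss
* Column locations are taken from the codebook table.
FILE TYPE MIXED FILE = 'c:\Train\ProgSynMac\MIXED.DAT' RECORD = SYSTEM 6.
RECORD TYPE 1.
DATA LIST / ID 2-4 AGE 8-9 SEX 11 SALARY 13-17 TENURE 19-20
            JOBCODE 22 LOCATION 24 JOBRATE 26.
RECORD TYPE 2.
DATA LIST / ID 2-4 AGE 8-9 SEX 11 JOBRATE 13 LOCATION 15
            JOBCODE 17 TENURE 19-20 SALARY 22-26.
RECORD TYPE 3.
DATA LIST / ID 2-4 AGE 8-9 SEX 11 SALARY 13-17 TENURE 19-20
            LOCATION 22.
END FILE TYPE.
* A procedure is needed to force the data to be read.
FREQUENCIES VARIABLES = SYSTEM.
```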
The input program state ends with the END FILE TYPE command,
which is followed by labeling commands and then a FREQUENCIES
command. FILE TYPE--END FILE TYPE are not procedures and do not
cause the raw data file to be read, so they must be followed by either an
EXECUTE command or a procedure.
To run all the syntax
Click Run..All
SPSS displays the commands in the Viewer window (not shown) and
then the frequency table for SYSTEM. We can see that there are 212
employees in the file, created from 212 records, and that there are 52 of
record type 1, 122 of record type 2, and 38 of record type 3 in the data file.
All the information on, for example, salary, has now been placed in one
column despite its two different locations in the data (if you wish, switch
to the Data Editor to verify this).
Figure 3.5 Frequency Table for System
ERRORS IN THE DATA
We will illustrate what occurs with undefined record types by once again
reading the file MIXED.DAT. Because warning messages are turned off
by default, we were unaware that there are in fact 213 records, or
employees, in the file. However, the 213th case has an error in its record
type, as shown in Figure 3.6.
Its record type should be a 3 but is instead a 4.
Figure 3.6 Error in MIXED.DAT
SPSS skipped this employee's record because it was not defined on a
RECORD TYPE command, but didn't warn us. Let's tell SPSS to do that,
then reread the file.
Click Window..MIXED - SPSS Syntax Editor to return to the
syntax file
Add the subcommand /WILD=WARN to the end of the FILE
TYPE command, but before the period (.)
Figure 3.7 Modified File Type Command to Add Warnings From SPSS
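
The edited command might then read as follows. This is a sketch that assumes the FILE TYPE command names the file and the record variable as described earlier; only the WILD subcommand is new, and the RECORD TYPE, DATA LIST, and END FILE TYPE commands that follow it are unchanged.

```spss
* Only this command changes; the rest of the program stays as it was.
FILE TYPE MIXED FILE='MIXED.DAT' RECORD=SYSTEM 6 /WILD=WARN.
```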
After you have carefully added this subcommand, rerun all the
commands by
Clicking Run..All
When SPSS switches to the Viewer, you will now see a warning
message and a note in the log under the FREQUENCIES command, as
shown in Figure 3.8. The warning message is clear, telling us that the
record type (4) was ignored when building the file. You can verify this by
looking at the frequencies output, which lists only 212 cases.
The exact position of the problem is noted in the message that begins
with "Command line:". The critical information is that it was on case 213
that SPSS encountered an unknown record ID, whose value is 4. SPSS
also conveniently lists the actual line of data from the file MIXED.DAT
for reference. Obviously, warnings can be helpful in finding and fixing
errors in data entry or definition.
Figure 3.8 Warning Messages for Undefined Record Type 4
It is now possible to use the file created via the FILE TYPE MIXED
command to report on either the total group of employees, or differences
between employees across record-keeping systems.
Analysis Tip
If you plan to read a file with known errors, you might think that you
want to be warned every time there is a problem defining an SPSS data
file. However, this is not always the case. If it is a large data file with
many errors, SPSS could possibly generate hundreds, even thousands, of
warning messages. It is unlikely that you will care to scroll through all
that output. In recognition of this, the maximum number of warnings
SPSS will display has been set relatively low, to a value of 10. When you
do want to see more warnings, use the SET command with this syntax:
SET MXWARNS = 100. (or to whatever value is appropriate)
GROUPED FILE TYPE WITHOUT RECORD INFORMATION
Reading either a grouped or nested file into SPSS isn't much different
than reading a mixed file in terms of the syntax. However, one situation
that causes problems yet is still relatively common, and therefore worth
exploring, is when you wish to use FILE TYPE GROUPED but don't have
a record identification variable. This is fairly common, especially because
any rectangular data file can be read with either a standard DATA LIST
or via a FILE TYPE GROUPED format. The advantage of the latter is
that SPSS will fix any problems with out-of-order records and read the
file correctly if there are missing records. However, a record type variable
is needed in each instance, and you may not have created one for what
you knew was a standard rectangular file format.
Our example here is a little more complicated. Figure 3.9 displays a
small data file that stores information on students in a statistics class
and their scores on each of three assignments. There is an identification
variable for the student in column 2, but no numeric record identifier to
tell SPSS that the first record has information on the first quiz, the
second record on the first homework assignment, and the third record on
the first test. Moreover, student 2 did not complete the first homework
assignment and so only has two records, which is the real problem.
If this file had been created with one record for each student, and
each assignment in a separate column, it would be a straightforward task
to read it into SPSS.
Figure 3.9 Grouped Data File
Note
We should point out that the score type (quiz1, etc.) field can be used as a
record type identifier, although it is a string field. However, we will
ignore this in order to demonstrate another method of reading the file.
What we want to do is read this data file and create three cases, one
for each student. We also want to create three variables, one for each type
of assignment, and just as important, we want SPSS to realize that the
second student's second record has his score for the test, not the
homework assignment.
There are two methods to approach this problem without using a
more complex INPUT PROGRAM command sequence.
1) Read the data into SPSS, create a record identifier, write the
data back out as an ASCII file, then read it back in using
FILE TYPE GROUPED.
2) Read the data into SPSS, create a record identifier, then
manipulate the data to create the necessary variables.
The latter choice is clearly preferable because it requires fewer passes
of the data file. The file GROUPED.SPS contains the commands to
implement the second method.
Click on File..Open..Syntax (move to c:\Train\ProgSynMac
folder)
Double-click on GROUPED
Figure 3.10 Program to Read a Grouped File
The DATA LIST command reads GROUPED.DAT as if it were a
rectangular file. The interesting techniques in this program come
after the file is read.
First, we create a RECORD variable with the LAG function. Initially,
RECORD is set to 1 for each case. Then, when the current case has the
same ID as the previous case, RECORD is set equal to the value of
RECORD for the previous case plus one. When the previous case was a
different student, then this statement is not executed and RECORD
remains at 1, i.e., it is reset to 1 for each student's first record. This
creates the desired record identification variable. To see this
Highlight the lines from DATA LIST to the first LIST
command
Click on the Run button
Figure 3.11 List Output Showing Record ID
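
The record-numbering step can be tried on its own with a few lines of inline data. The sketch below is not the GROUPED.SPS program itself: the scores are invented, the spellings quiz1, hmwk1, and test1 are assumed, and the data are read in list format rather than from fixed columns, purely so the example is self-contained.

```spss
* Hypothetical stand-in for GROUPED.DAT, with invented scores.
DATA LIST LIST / ID (F1.0) ASSIGN (A5) SCORE (F3.0).
BEGIN DATA
1 quiz1 8
1 hmwk1 17
1 test1 85
2 quiz1 9
2 test1 78
3 quiz1 7
3 hmwk1 19
3 test1 91
END DATA.
* Start every case at 1, then count up within each student.
COMPUTE RECORD = 1.
IF (ID = LAG(ID)) RECORD = LAG(RECORD) + 1.
LIST.
```

For the very first case LAG(ID) is system-missing, so the IF statement is not executed and RECORD stays at 1.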
Analysis Tip
When creating programs that read or transform data, it is very helpful in
debugging your programs to list out the data after an action or series of
actions is executed. If you don't do this, then it will be very difficult to
figure out where things went wrong. Following this advice, the LIST
command has been placed at four spots in the program. It is better to err
on the side of excess here.
Our problems are not yet solved, though. Student 2 didn't complete
the homework, but the value of RECORD for his test1 score is 2, not 3.
The set of IF statements fixes this problem. They assign the correct value of
RECORD for each type of assignment, so that student 2's test1 score will
now be listed as record type 3.
Switch to the GROUPED - SPSS Syntax Editor window
Highlight the lines from IF to the next LIST command
Click on the Run button
In the Viewer (not shown) we can now see that the value of RECORD
for the second line (or case) for student 2 has been changed to a 3.
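
The IF statements in question might look like this. The sketch assumes ASSIGN is the string variable holding the score type and that its values are stored as quiz1, hmwk1, and test1; adjust the literals if the file spells them differently.

```spss
* Assumes ASSIGN holds the strings quiz1, hmwk1, and test1.
IF (ASSIGN = 'quiz1') RECORD = 1.
IF (ASSIGN = 'hmwk1') RECORD = 2.
IF (ASSIGN = 'test1') RECORD = 3.
LIST.
```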
One task remains, and that is to take these 8 cases in the Data Editor
and turn them into 3 cases. As mentioned above, we could write this file
out and then read it back into SPSS, but that is cumbersome. A better
method is to use the AGGREGATE command. For analysis purposes, it is
best to create a separate variable or column for each assignment. If we
simply aggregate the current working file by student ID, that won't
happen. Why?
To create these new variables (three in this instance), we can take
advantage of the VECTOR command we saw in Chapter 2. The VECTOR
command creates a new vector, SCORE_, with three elements. The
COMPUTE statement then assigns the value of SCORE for a case to an
element of SCORE_ based on the value of RECORD. In other words, for
the first record type (quiz1), SCORE_1 gets the value of SCORE for that
case, the quiz1 score. The values of SCORE_2 and SCORE_3 for that case
are system-missing. For the second record type (for hmwk1) SCORE_2
gets the value of SCORE for that case, and SCORE_1 and SCORE_3 are
system-missing.
It's probably easier to see this in action.
Switch to the GROUPED - SPSS Syntax Editor window
Highlight the lines from VECTOR to the next LIST command
Click on the Run button
Figure 3.12 List Output Showing Vector SCORE_
The values for each assignment or test have been placed in separate
variables, with the quiz1 score in SCORE_1, the hmwk1 score in
SCORE_2, and the test1 score in SCORE_3. However, there are still
up to three cases for each student, with lots of missing data, and all we
need is one case for each student to calculate appropriate statistics.
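
The VECTOR and COMPUTE commands behind this output can be sketched as follows, assuming SCORE and RECORD already exist in the working file as described and that RECORD takes only the values 1, 2, or 3.

```spss
* Creates SCORE_1, SCORE_2, and SCORE_3, all system-missing at first.
VECTOR SCORE_(3).
* RECORD (1, 2, or 3) picks the element that receives the score.
COMPUTE SCORE_(RECORD) = SCORE.
LIST.
```

Using RECORD as the vector index is what lets a single COMPUTE statement stand in for three separate IF statements.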
A file with that structure can be created using the AGGREGATE
command, which changes the case base in a file and calculates summary
statistics for the new case base. For example, in a file of customers that
buy five different products, AGGREGATE can create an SPSS data file
where the case base is product (and so there are only five cases), with
information such as the number of customers who bought each product,
the mean age of customers who bought that product, and so forth.
There are three necessary subcommands for AGGREGATE, as shown
in Figure 3.10. The OUTFILE subcommand tells SPSS whether to save
the new data file to disk or to make it the current working file in the Data
Editor. The asterisk means to replace the current working file with the
new one. The BREAK subcommand defines the case base for the new file.
By breaking on the variable ID, which here has only three unique values,
we will create a file with three cases. The aggregated variables
subcommand creates the summary variables in the new file. Its format is
LIST OF NEW VARIABLES = FUNCTION (LIST OF
EXISTING VARIABLES)
where you define new variable names on the left of the expression, an
aggregate function on the right, followed by the same number of existing
variables that will be used to create the new variables. In our example,
we use the MAX (maximum) function to create three variables called
QUIZ1, HMWK1, and TEST1, based on the maximum value of SCORE_1
to SCORE_3 for each ID value. Why does this accomplish our task? Could
we have used a different function?
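
Put together, the AGGREGATE command for our example could be written as below. This is a sketch based on the description above rather than a copy of the program in Figure 3.10, and it assumes SCORE_1 to SCORE_3 are adjacent variables in the working file.

```spss
* Replace the working file with one case per student.
AGGREGATE OUTFILE=*
  /BREAK=ID
  /QUIZ1 HMWK1 TEST1 = MAX(SCORE_1 TO SCORE_3).
LIST.
```

MAX works here because each student has at most one non-missing value in each SCORE_ variable, so the maximum is simply that value. By the same reasoning, a function such as MIN or MEAN would appear to give the same result.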
To finish the program
Highlight the AGGREGATE and LIST commands
Click on the Run button
Figure 3.13 List Output Showing New Assignment Variables
The output from LIST demonstrates that there are only three cases in
the new file, one for each student. Three new variables have been created,
one for each assignment. This makes it easy to calculate statistics for
each assignment. And student 2 has been correctly assigned a missing
score for HMWK1. Since AGGREGATE only creates the new summary
variables we define, the variables ASSIGN, SCORE, RECORD, and
SCORE_1 to SCORE_3 are gone, which is fine. You may also want to look
at the Data Editor (not shown) to see the file format.
Analysis Tip
Unlike the definition of complex file types, every command in this
program could have been created from the dialog boxes except VECTOR.
As this course is generally concerned with SPSS programming, we
instead worked from a Syntax file. Either approach is acceptable,
although seeing the syntax often helps your understanding, and it
certainly lets you apply the same type of program in the future to another
data file.
SUMMARY
We reviewed the types of complex files, their structure, and the SPSS
syntax used to read these data files. We illustrated the use of complex file
types by reading a mixed data file, then discussed how data errors are
handled by SPSS. We then showed how to read a grouped file with an odd
structure and no numeric record type information.
Chapter 4
Input Programs
Topics
Introduction
Syntax Basics
Changing the Case Base of a File
End of Case Processing
End of File Processing
Checking Input Programs
Incomplete Input Programs
Reading Files with Missing Identifiers
When Things Go Wrong
INTRODUCTION
There are situations where you encounter non-rectangular raw
data files that cannot be read directly with the complex file types
provided by SPSS. For those situations, SPSS offers an input
program facility, as mentioned in Chapter 2, that has the capability to
read essentially any type of ASCII data file. The ability to read a file will
at times depend upon the cleverness of the user, as really odd files may
require creative solutions.
An input program can also be used to create data that match a target
distribution, often for purposes of teaching or illustration. In other words,
an input program can create data from nothing (this is the one time that
SPSS provides a free lunch, so to speak).
For very large files, input programs also offer great efficiencies, even
if the file is a standard rectangular data file. An input program can be
used to select only certain cases as the file is read, saving one pass of the
data. Or it can concatenate raw data files, saving on having to create
SPSS data files of each. And input programs can perform the equivalent
functions of a grouped, mixed, or nested file type, but with added
flexibility.
The user is in charge of case definition when writing an input
program, so careful attention must often be paid to where in the program
stream a case should be created. At times, you may also need to tell SPSS
when to stop reading the data and create a working data file.
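
As a small illustration of creating data from nothing, the sketch below builds 200 cases of a normally distributed variable. The variable name SCORE, the case count, and the mean and standard deviation are arbitrary choices for this example. Because no data file is read, END CASE marks where each case is created and END FILE tells SPSS when to stop.

```spss
INPUT PROGRAM.
* #I is a scratch variable, so it is not saved in the working file.
LOOP #I = 1 TO 200.
* Draw one value from a normal distribution with mean 50 and SD 10.
COMPUTE SCORE = RV.NORMAL(50, 10).
* Build a case from the current values.
END CASE.
END LOOP.
* Stop building cases and create the working data file.
END FILE.
END INPUT PROGRAM.
DESCRIPTIVES VARIABLES=SCORE.
```

The values differ from run to run unless a random number seed is set beforehand.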
The commands INPUT PROGRAM and END INPUT PROGRAM enclose
data definition and transformation commands that build cases from input
data. At least one file definition command, such as a DATA LIST, must
be included in the structure. Essentially any transformation commands
can be placed within an input program structure, but no procedures. This
means that you can use COMPUTE, IF, DO IF, REPEATING DATA,
LOOP, or any of the other transformation commands that may help to
create a working data file.
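
For instance, an input program that keeps only some cases as a file is read might look like this. The file name BIGFILE.DAT, the variables, and their column locations are all hypothetical, chosen only to show where the transformation commands sit.

```spss
INPUT PROGRAM.
DATA LIST FILE='BIGFILE.DAT' / ID 1-5 REGION 7 SALES 9-15.
* Build a case only for records from region 3.
DO IF (REGION = 3).
END CASE.
END IF.
END INPUT PROGRAM.
* A procedure or EXECUTE is needed to run the input program.
FREQUENCIES VARIABLES=REGION.
```

Note that the procedure comes after END INPUT PROGRAM, since procedures are not allowed inside the structure.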
It is very important to understand that SPSS processes the input
program commands on a case-by-case basis. This may be hard to grasp
intuitively, given that an input program creates the definition of a case as
it is executed, but we will illustrate this concep