
  • 8/9/2019 Programming With s Pss Syntax and Macros

    1/154

    Programming with SPSS Syntax and Macros

    SPSS Inc.

    233 S Wacker Drive, 11th Floor

    Chicago, Illinois 60606

    312.651.3000

    Training Department

    800.543.6607

    v10.0 Revised 12/31/99 ss


    SPSS Neural Connection, SPSS QI Analyst, SPSS for Windows, SPSS Data Entry II, SPSS-X, SCSS, SPSS/PC, SPSS/PC+, SPSS Categories, SPSS Graphics, SPSS Professional Statistics, SPSS Advanced Statistics, SPSS Tables, SPSS Trends, SPSS Exact Tests, and SPSS Missing Value are the trademarks of SPSS Inc. for its proprietary computer software. CHAID for Windows is the trademark of SPSS Inc. and Statistical Innovations Inc. for its proprietary computer software. Excel for Windows and Word for Windows are trademarks of Microsoft; dBase is a trademark of Borland; Lotus 1-2-3 is a trademark of Lotus Development Corp. No material describing such software may be produced or distributed without the written permission of the owners of the trademark and license rights in the software and the copyrights in the published materials.

    General notice: Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective companies.

    Copyright (c) 2000 by SPSS Inc.

    All rights reserved.

    Printed in the United States of America.

    No part of this publication may be reproduced or distributed in any form or by any means, or stored on a database or retrieval system, without the prior written permission of the publisher, except as permitted under the United States Copyright Act of 1976.


    SPSS Training

    Programming with SPSS Syntax and Macros: Table of Contents

    Chapter 1: Introduction and Syntax Review
        A Data Manipulation Example
        A Macro Example
        Rules and Aids for SPSS Syntax
        Advice for Those Working with Syntax
        Summary

    Chapter 2: Basic SPSS Programming Concepts
        Command Types in SPSS
        The Three Types of SPSS Programming
        SPSS Data Definition
        SPSS Programming Constructs
            Do If & End If
            Do Repeat & End Repeat
            Loop & End Loop
            Scratch Variables
            Vector
        Summary

    Chapter 3: Complex File Types
        ASCII Data and Records
        File Types
        Syntax Basics
        Data File Structure
        Grouped Data
        Mixed Data
        Nested Data
        Reading a Mixed File
        Errors in the Data
        Grouped File Type Without Record Information
        Summary

    Chapter 4: Input Programs
        Syntax Components
        Example 1: Change the Case Base of a File
        End of Case Processing
        End of File Processing
        Checking Input Programs
        Incomplete Input Programs
        Example 2: Reading Files with Missing Identifiers
        When Things Go Wrong
        Summary

    Chapter 5: Advanced Data Manipulation
        Reading a Comma-Delimited File
        Reading Multiple Cases on the Same Record
        An Existing SPSS Data File with Repeating Data
        Print Command for Diagnostics
        Practical Example: Consolidating Transactions
        Summary
        Appendix: Identifying Missing Values by Case

    Chapter 6: Introduction to Macros
        Macro Basics
        Macro Arguments
        Macro Tokens
        Viewing a Macro Expansion
        Keyword Arguments
        Using a Varying Number of Tokens
        When Things Go Wrong
        Summary

    Chapter 7: Advanced Macros
        Looping in Macros
        Producing Several Clustered Bar Charts
        Double Loops in Macros
        String Manipulation Functions
        Direct Assignment of Macro Variables
        Conditional Processing
        Creating Concatenated Stub and Banner Tables
        Additional Recommendations
        Summary

    Chapter 8: Macro Tricks
        Combining Input Programs and Macros
        Ordering Tables and Charts
        The Case of the Disappearing Command
        Summary

    Exercises


    Introduction and Syntax Review 1 - 1

    Chapter 1: Introduction and Syntax Review

    Topics

    A Data Manipulation Example
    A Macro Example
    Rules and Aids for SPSS Syntax
    Advice for Those Working with Syntax

    INTRODUCTION

    This course has two major topical areas. First, we will review how to use SPSS Syntax to perform complex data manipulations that are not available from the SPSS menu system. This will interest those who need to read complex data files from legacy computer systems (for example, legacy health care data or transaction-oriented sales systems) and those who must reorganize their data in order to perform a desired analysis. Examples of the latter include marketing and customer relationship studies in which a number of products (or services of a company) are each rated on many attributes. All information from a respondent is typically stored in a single record, but it must be spread across multiple records before factor analysis and perceptual mapping can be performed. When preparing data for churn studies (customer retention for telecoms, credit card issuers, insurance companies), comparisons might need to be made across transactional records sorted by customer ID and date. SPSS Syntax permits a richer array of data manipulations in this context than the menu system does. In short, we will examine uses of SPSS Syntax to facilitate the analysis of files with complex structures, or of files that must be restructured for a desired analysis.

    The second topical area concerns automation in SPSS through the SPSS macro language. SPSS macros generate SPSS Syntax, which is then executed. For this reason, macros are very handy in situations where SPSS Syntax needs to be run repeatedly, but with minor and systematic changes each time. For example, you might wish to produce thirty Interactive graphs, each a clustered bar chart containing a demographic variable and one of thirty rating scale variables. Within the SPSS menu system, changes would have to be made in the Interactive graph dialog box for each graph. Instead, an SPSS macro could generate the SPSS Syntax for each interactive graph within a loop, substituting a new rating scale variable name on each iteration. In this way, the SPSS macro language can automate what would otherwise be time-consuming tasks for the analyst.


    Since these topics involve SPSS Syntax, we will use the dialog boxes within SPSS infrequently. A prerequisite for this course is familiarity with SPSS Syntax at the level of our Introduction to SPSS Syntax training course. In this chapter, we present an example of data manipulation with SPSS syntax and a macro example, and provide a brief review of, and some recommendations for, SPSS Syntax.

    A DATA MANIPULATION EXAMPLE

    To illustrate the type of data manipulation that can be performed with SPSS Syntax, we will display the beginning and final forms of a data file recording SPSS training course purchases. Within the Training department, there was interest in examining patterns of training courses taken by SPSS customers, and an analysis was performed using SPSS Clementine. However, the analysis required a data set in which all courses taken by a customer (an SPSS ID) were contained in a single customer record.

    The original data file, extracted from a transaction database, contained one record per course taken, since each instance of a customer taking a course constituted a sales transaction. We show this below.

    Figure 1.1 Training Sales Data - Transaction File

    Each record in this file is a sales transaction involving a specific training course. The two fields displayed are customer ID and course taken (which contains city, sequence within the year, and training course code information). Additional fields, such as date and price, were previously removed since they were not needed for this analysis. Here, the different courses taken by an individual SPSS customer are scattered throughout the file. Even if the file were sorted by customer ID, the fact that the training course history for a single customer is spread across a number of records, and that this number varies from customer to customer, would create difficulties for the analysis procedures.

    Figure 1.2 Training Sales Data - One Record Per Customer

    The training data have been reorganized so there is a single record per customer ID and a separate variable for each training course. These course variables are coded 1 if a customer signed up for the course and 0 if not. This structure makes it easy to explore associations among training courses taken by customers. The SPSS syntax to perform the data reorganization involved two steps: creating a vector of variables in which each variable represented a specific course, and aggregating this file to the customer ID level. The logic behind these operations is reviewed in Chapter 5.
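    The two steps just described can be sketched in syntax as follows. This is a minimal sketch, not the program used for the Training department analysis (Chapter 5 reviews that logic): it assumes hypothetical variables named custid and course, treats each distinct value of the course field as a separate course, and assumes there are no more than 20 such values. AUTORECODE is used here only as a convenient way to turn course codes into vector positions.

```spss
* Sketch: reshape one-record-per-course data into one record per customer.
* Hypothetical names: custid (customer ID), course (string course code).
* Assumes at most 20 distinct course codes in the file.

* Give each distinct course code a consecutive integer index starting at 1.
AUTORECODE VARIABLES=course /INTO coursenum.

* Step 1: create c1 TO c20 (initialized to system-missing) and flag the course taken.
VECTOR c(20).
COMPUTE c(coursenum) = 1.
RECODE c1 TO c20 (SYSMIS=0).

* Step 2: collapse to one record per customer; a flag is 1 if any record set it.
SORT CASES BY custid.
AGGREGATE OUTFILE=*
  /BREAK=custid
  /c1 TO c20 = MAX(c1 TO c20).
LIST.
```

    Each of c1 to c20 ends up 1 for customers who took that course and 0 otherwise. Before the AGGREGATE step, the value labels that AUTORECODE attaches to coursenum show which course code each index stands for.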

    A MACRO EXAMPLE

    We mentioned earlier that SPSS macros generate SPSS Syntax. A common use of macros is to produce a series of syntax commands that vary in specific ways, for example, a set of Interactive Graph or Tables commands in which each command runs an analysis based on a different variable. Thus one macro produces the same result as the many syntax commands it creates, or as many interactions with a dialog box. To demonstrate, we will display results from a macro that produces a set of Interactive graphs, substituting different variables in clustered bar charts (this macro is discussed in Chapter 7).


    Figure 1.3 Create Bar Chart Dialog Box

    The dialog above will create a clustered bar chart displaying attitude toward government action on health for different marital status groups. Note that only a single variable can be placed in the horizontal and Color boxes. (Actually, multiple variables can be placed in a single box, but this will not produce multiple charts.) Thus creating a series of charts, in which either the horizontal axis or the Color variable changes, would require repeated visits to this dialog, substituting one variable at a time. However, the macro below can build many clustered bar charts.


    Figure 1.4 Macro to Produce Multiple Bar Charts (Interactive Graphs)

    The details of this macro (Clu2IBar) will be discussed in Chapter 7. However, note that the IGRAPH command, which was pasted by clicking the Paste pushbutton in the Create Bar Chart dialog box (see Figure 1.3), is nested within two loops, each of which iterates over a list of variables supplied by the user. The invocation of the macro (the last line in the program) supplies two variable names for the horizontal axis variable and three variable names for the cluster variable. Thus six IGRAPH commands will be generated, resulting in the six bar charts shown below.
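    The double-loop idea can be sketched with a much simpler macro than Clu2IBar. This hypothetical !clubar macro uses the GRAPH command rather than IGRAPH, and the variable names in the call (marital, sex, helphlth, helppoor, helpsick) are illustrative assumptions; substitute names from your own file.

```spss
* Hypothetical macro: one clustered bar chart per (x variable, cluster variable) pair.
DEFINE !clubar (xvars = !CHAREND('/')
               /cvars = !CMDEND)
!DO !x !IN (!xvars)
!DO !c !IN (!cvars)
GRAPH /BAR(GROUPED)=COUNT BY !x BY !c.
!DOEND
!DOEND
!ENDDEFINE.

* Two x variables and three cluster variables expand to six GRAPH commands.
!clubar xvars = marital sex / cvars = helphlth helppoor helpsick.
```

    Running SET MPRINT=ON. before the macro call displays the generated GRAPH commands in the log, which is a convenient way to confirm what the loops produced.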


    Figure 1.5 Bar Charts Produced from Clu2IBar Macro

    The six Interactive Graphs in the Outline pane were produced by the macro. In this way, macros can automate the running of sets of similar analyses. The second section of this course reviews SPSS macros in detail.

    RULES AND AIDS FOR SPSS SYNTAX

    Since this course involves either the writing or the generation of SPSS Syntax, we begin by reviewing the rules of SPSS Syntax and how to obtain syntax help.

    The syntax rules for editing and writing SPSS commands are as follows:

    1. Each new command must begin on a new line and end with a period (.) or a blank line.
    2. *Each command must begin in the first column of a new line.
    3. *Continuation lines of a command must be indented at least one space.
    4. Variable names must be spelled out fully.
    5. Subcommands must be separated with a forward slash (/). The slash before the first subcommand is usually optional.
    6. *Each line of command syntax cannot exceed 80 characters.

    *Not required when running from a Syntax window, but required when using the INCLUDE command or the SPSS Production Facility.


    Syntax commands produced by clicking the Paste pushbutton in an SPSS dialog box will conform to these rules, so the important issue is to remember them when editing or entering syntax.
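    As a brief illustration of these rules, the command below (its variable names are hypothetical) begins in column 1, indents its continuation lines, separates its subcommands with slashes, and ends with a period.

```spss
* A comment command also starts in column 1 and ends with a period.
FREQUENCIES VARIABLES=marital sex
  /STATISTICS=MODE
  /ORDER=ANALYSIS.
```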

    There are several useful sources of help when writing SPSS syntax. A quick reminder of the keywords and requirements for an SPSS command is only a tool-button click away. To demonstrate:

    From within SPSS:

    Click File..Open..Syntax
    Move to the c:\Train\ProgSynMac directory
    Double-click on TransactionAgg
    Scroll down to the Vector command
    Click on the Vector command (so the insertion pointer touches it)
    Click the Syntax Help tool

    Figure 1.6 Syntax Help

    In this syntax summary for the Vector command, subcommand names and keywords are shown in upper case (some simple commands, like Vector, have no subcommands); lower-case elements describe specifications that you supply (e.g., varlist indicates a list of variable names that you provide). Sections of the command enclosed in square brackets [ ] are optional, while those in braces { } indicate sets from which a single choice can be made. To focus on the required elements of the command, scan only the sections not enclosed within square brackets.
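    The two forms of VECTOR shown in the summary can be illustrated with a short sketch. The variable names q1 to q10 are hypothetical and are assumed to be adjacent in the file, since the TO keyword refers to file order; new1 to new5 are created by the second VECTOR statement.

```spss
* Form 1: refer to existing, adjacent variables q1 TO q10 by position.
VECTOR rate = q1 TO q10.
* Form 2: create five new numeric variables, new1 TO new5 (system-missing).
VECTOR new(5).
* Copy the first five ratings into the new variables by index.
LOOP #i = 1 TO 5.
COMPUTE new(#i) = rate(#i).
END LOOP.
EXECUTE.
```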


    While the syntax information about VECTOR is complete, there is no explanation of what each specification does. Although some might be obvious from their names, many are not. Complete documentation of SPSS for Windows syntax commands can be found in the SPSS 10.0 Syntax Reference Guide (included on the CD-ROM containing the SPSS 10.0 program). If it was copied to your hard drive during SPSS installation, you can access the guide from the main menu by clicking Help..Syntax Guide..Base (or one of the optional modules). Commands are listed alphabetically, and the subcommand options are fully explained. Experienced syntax users who need only reminders can work from the Syntax Help windows. For others, the Syntax Reference Guide is necessary.

    The sequence below assumes the SPSS 10.0 Syntax Reference Guide has been installed on your machine. If not, it can be installed from the SPSS for Windows 10.0 CD-ROM.

    Click Help..Syntax Guide..Base
    Click the arrow beside Commands in the Outline pane
    Scroll down to VECTOR
    Click the arrow beside VECTOR in the Outline pane
    Click on VECTOR in the Outline pane

    Figure 1.7 SPSS Base 10.0 Syntax Reference for Vector Command

    Note

    In addition to the syntax summary accessed through the Syntax Help tool, the SPSS 10.0 Syntax Reference Guide contains discussion, explanation, and examples. All are useful when investigating the possibilities of an SPSS command. For those working often with SPSS Syntax, we strongly recommend installing the reference guide on your machine or purchasing a copy of the SPSS 10.0 Syntax Reference Guide in book form.

    Click File..Exit to exit Adobe Acrobat and the SPSS 10.0 Syntax Reference Guide

    ADVICE FOR THOSE WORKING WITH SYNTAX

    Finally, it is worth mentioning, at the risk of being obvious, several recommendations for those working with SPSS Syntax.

    Display Syntax Commands as Log Items

    By default, SPSS does not display syntax in the Viewer window, although it is written to the SPSS journal file. If SPSS issues any error or warning messages, it is useful to see which command they follow. For this reason, while writing, editing and testing SPSS syntax, we recommend that you turn on the option to display syntax as a log item in the Viewer window. We will do this explicitly in the next chapter, but view the Options dialog here.

    Click Edit..Options
    Click the Viewer tab

    Figure 1.8 Viewer Options


    The checkbox in the lower left corner of the Viewer tab within the Options dialog controls whether SPSS syntax commands are displayed in the log.
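    The same behavior can also be requested from syntax, which is handy at the top of a program under development. A minimal sketch, using the PRINTBACK keyword of the SET command (which controls whether commands are echoed to the output); the DESCRIPTIVES command and its variable names are only placeholders.

```spss
* Echo each command to the Viewer log so errors can be traced to it.
SET PRINTBACK=ON.
DESCRIPTIVES VARIABLES=age educ.
```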

    Develop Basic Syntax Using Dialog Boxes

    Although this course proves the exception to the rule (discussing Input Programs, Vectors and Loops), most SPSS commands can be generated by clicking the Paste pushbutton of the relevant dialog box. It is to your advantage to use dialogs, when possible, to construct the basic SPSS syntax and then edit it as needed. This minimizes errors by reducing your opportunity to make typing mistakes.

    Use File..New to Clear Data

    If errors prevent complex SPSS data operations (such as an Input Program) from completing, SPSS can be left in a waiting state. That is, it will not properly process new instructions until it has closure on the interrupted sequence. To clear the current data state of SPSS, you can run the NEW FILE command or click File..New..Data. When writing complex Input Programs (see Chapters 4 and 5), you might consider beginning the program with a NEW FILE command to ensure that any problem data state has been cleared prior to running your program. We illustrate this in Chapter 4.
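    In syntax, the command that clears the working data file is NEW FILE (the counterpart of File..New..Data). The sketch below shows the recommended pattern; the input program itself is only a toy that generates five cases with a hypothetical variable x, and input programs are discussed properly in Chapter 4.

```spss
* Clear any leftover data state before starting a new input program.
NEW FILE.
INPUT PROGRAM.
LOOP #i = 1 TO 5.
COMPUTE x = #i.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
* A procedure reads the data, which runs the input program.
LIST.
```

    Because #i is a scratch variable, only x is kept in the working data file.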

    Test After Each Step

    It is difficult to foresee all the data problems that a program might face, and it is rarely the case that any program, for that matter, runs correctly the first time. For these reasons it is important to systematically test and check results at each stage of the process. Full programming methodologies have been developed to this end. Here we merely recommend that, during development, you include displays or procedures to check the results of each set of operations, so that when something goes awry you have a way of isolating and identifying the problem. The Data Editor display in SPSS is useful for this purpose, as are the Frequencies, Crosstabs, Case Summaries and List procedures, and the Print transformation. These will be used repeatedly in the examples we present in this course, and we can assure you that the consultants in the SPSS Consulting group use them heavily.
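    As an illustration of checking a step before moving on, the sketch below follows a hypothetical COMPUTE (the variable names q1, q2, and total are placeholders) with a LIST of the first ten cases, a FREQUENCIES table, and a PRINT transformation, which writes values as each case is processed.

```spss
* Hypothetical transformation step to be checked.
COMPUTE total = q1 + q2.
* Look at the values for the first ten cases only.
LIST VARIABLES=q1 q2 total /CASES=FROM 1 TO 10.
* Summarize the new variable, including any missing values.
FREQUENCIES VARIABLES=total.
* PRINT is a transformation, so EXECUTE forces a data pass to run it.
PRINT / q1 q2 total.
EXECUTE.
```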

    Delete Items in Viewer Window

    If an error occurs, and you have read and understood the warning messages in the Viewer window, it is often a good idea to delete the results in the Viewer window before rerunning your program. This is because syntax commands may be appended to the last Log item in the Outline pane, making it difficult to distinguish the old warning messages from the new results.

    SUMMARY

    In this chapter we introduced, with examples, the major focus areas of this course: syntax for complex data manipulation, and SPSS macros. We also briefly reviewed the available help for SPSS Syntax and offered some advice for those working with SPSS Syntax.


    Basic SPSS Programming Concepts 2 - 1

    Chapter 2: Basic SPSS Programming Concepts

    Topics

    Introduction
    Command Types in SPSS
    The Three Types of SPSS Programming
    SPSS Data Definition
    SPSS Programming Constructs
    A Note About Program Execution
    Analysis Tip: Reordering Variables

    INTRODUCTION

    All SPSS procedures are built upon a powerful programming language that has remained consistent, though greatly extended, since SPSS was first developed as a mainframe statistical software program. This course will teach you how to use this language and other features for file and data input and manipulation, and for overall control of SPSS execution.

    The SPSS language, called syntax, is generated by the program every time a user clicks the OK button in a dialog box to execute a procedure. Behind the scenes, SPSS builds syntax to send to the SPSS central engine to execute a particular procedure or transformation. Using the Paste button in a dialog box places a copy of that syntax in a Syntax window so that it can be edited, or saved and used again.

    SPSS syntax is also often called a command or set of commands. The grammar or rules associated with commands are fairly simple, and we will review them as necessary throughout the chapters.

    Note about Data File Access When Running SPSS from a Remote Server

    SPSS for Windows 10.0 can run entirely on your desktop machine. Alternatively, an SPSS Client, through which you request analyses and view results, can run on your desktop, while the analyses are run by the SPSS Server, possibly located on a different machine. In this course, except for the directory you use to access the training data files, it makes no difference whether the SPSS Server is located on your desktop or on a different computer. The SPSS Server Login dialog (click File..Switch Server) allows you to connect to a remote SPSS Server (if installed on your network).

    If you are running SPSS from a Remote (not Local) server, then to use the data files accompanying this course, they must be copied either to the server running SPSS or to a directory that can be accessed by (mapped from) the server. The directory references in this guide assume you are running SPSS as a local server and can thus directly access files stored on your hard drive.


    COMMAND TYPES IN SPSS

    Most users successfully program in SPSS without a complete understanding of the various command and program states of SPSS, and you can too. Nevertheless, it helps to know a little about this subject, particularly to put the various capabilities of SPSS in context. Many users know the difference between transformations and procedures, the two main types of commands, but there are a few others:

    File Definition Commands: As their name implies, these commands are used to input data into SPSS. They include familiar commands such as GET, DATA LIST, or GET CAPTURE ODBC, but also others like FILE TYPE, IMPORT, or MATCH FILES.

    Input Program Commands: These are specialized commands, also used to input or create data in SPSS. These commands are more esoteric and include RECORD TYPE, REPEATING DATA, and END CASE. We have more to say about them below when discussing the data step in SPSS.

    Transformation Commands: These commands are quite varied in their operation, but the key element they share is that they neither input data nor analyze data. Instead, they modify data (COMPUTE, RECODE), create new variables (VECTOR, NUMERIC), write out data (WRITE), or label data (VARIABLE LABELS). To be specific, transformations do not cause SPSS to read the data file.

    Procedure Commands: Almost all of these commands analyze data. However, the actual definition of a procedure in SPSS is a command that causes data to be read. Thus, SAVE is also a procedure, because it causes the data to be read and an SPSS system file (with a .SAV extension) to be created.

    Utility Commands: These commands handle a variety of chores. They include commands to add comments (COMMENT, DOCUMENT), to define a new file (NEW FILE), and to define macros (DEFINE--!ENDDEFINE).

    Knowing about command types will be helpful in understanding how and why a program operates. For example, XSAVE, an alternative to the SAVE command, can be used within a loop because it is a transformation, not a procedure.
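    A minimal sketch of why the distinction matters, using hypothetical file and variable names: the COMPUTE and XSAVE transformations below are only queued, and nothing is read or written until FREQUENCIES, a procedure, forces a pass through the data. Because XSAVE is a transformation, it shares that single data pass instead of requiring a separate one, as SAVE would.

```spss
* File definition: name the working data file (hypothetical file name).
GET FILE='c:\Train\ProgSynMac\mydata.sav'.
* Transformations: these are queued and the data are not yet read.
COMPUTE total = q1 + q2.
XSAVE OUTFILE='c:\Train\ProgSynMac\withtotal.sav'.
* Procedure: reads the data, executing COMPUTE and XSAVE in the same pass.
FREQUENCIES VARIABLES=total.
```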


    THE THREE TYPES OF SPSS PROGRAMMING

    Logically, there are three general methods of programming in SPSS. Two of them involve syntax, while the third uses a version of the Basic programming language (Sax Basic).

    Standard Syntax: These programs are the most common and simply involve writing a series of SPSS commands to accomplish a set of tasks. An example of a simple program is shown below (this program uses the FILE TYPE command to read a non-standard ASCII data file). In a standard syntax program, each command does one thing, and it does not refer to other SPSS syntax. Standard programs are executed through the Run button, the INCLUDE command, or the SPSS Production Facility.

    * SPSS Example to read a nested file *.
    FILE TYPE NESTED FILE='C:\TEST.DAT' RECORD=RECID 1 (A)
      CASE=3 (F).
    RECORD TYPE 'H'.
    DATA LIST /H1 TO H10 5-14.
    RECORD TYPE 'F'.
    DATA LIST /F1 TO F5 15-19.
    RECORD TYPE 'P'.
    DATA LIST /CASEX 3 P1 TO P3 20-22.
    END FILE TYPE.
    LIST.

    Macros: Many programs allow users to define macros, which are typically a series of commands grouped together as a single command to make everyday tasks easier and more convenient. Macros can often be assigned to a toolbar or menu to make them readily accessible. Normally, macros are saved as a series of instructions in a special macro language. SPSS macros are a bit different and not exactly parallel to the more common definition of a macro in other programs. First, they are written in SPSS syntax (plus a few special macro commands) and are essentially executed like any other syntax file. Second, they generate customized SPSS command syntax, i.e., standard syntax, to reduce the time and effort the program writer needs to perform complex and repetitive tasks. There is no special macro editor or macro execution facility in SPSS; again, a macro is simply a specialized syntax file. Below is an example of a macro that automates the production of a bar graph and the insertion of today's date into the graph's footnote (this program actually defines two macros). The macro begins with DEFINE and ends with !ENDDEFINE.


    DEFINE !GRAPHIT (ARG1 = !TOKENS(1) / ARG2 = !TOKENS(1) /
      ARG3 = !TOKENS(1)).
    GRAPH /BAR(SIMPLE)=COUNT BY !ARG1
      /TITLE=
      "EXAMPLE OF MACRO GRAPHING TITLES AND DATES"
      /SUBTITLE= !QUOTE(!CONCAT(!UNQUOTE(!ARG2),
      "Something ",!UNQUOTE(!ARG3)))
      /FOOTNOTE= !CONCAT('"Today is ',!EVAL(@DATEIT),'"').
    !ENDDEFINE.
    DATA LIST FREE / A.
    BEGIN DATA
    1 2 3 2 1 2 4
    END DATA.
    * The following defines a macro containing the current date *.
    DO IF $CASENUM=1.
    WRITE OUTFILE='TMP' /
      'DEFINE @DATEIT() ' $TIME (ADATE10) ' !ENDDEFINE.'.
    END IF.
    EXECUTE.
    INCLUDE 'TMP'.
    !GRAPHIT ARG1=A ARG2="ARGUMENT 2"
      ARG3="ARGUMENT 3".

    SPSS Scripting Facility: The scripting facility also allows you to automate tasks in SPSS. It has far more power and capabilities than SPSS macros. You can accomplish the same tasks that you would with a macro, but can do much more, including creating dialog boxes, customizing output in the Viewer window, or writing selected portions of output to a separate file for use in other programs. Scripts can be set to run automatically or at the user's choice.

    Unlike syntax programs or macros, scripts are written in a special language, Sax Basic, that is similar to the macro languages in other programs, such as Visual Basic for Applications. Of course, scripts can also use SPSS syntax in their definition, and they can be assigned to a menu choice like macros. Scripts are often somewhat lengthy compared to standard syntax, so we won't display an example in this chapter. SPSS Scripts are covered in the Programming with SPSS Scripts training course.


    A substantial portion of this course is devoted to the manipulation of files and data with SPSS. As such, it will be helpful to understand a bit about SPSS data file definition. More information is available in the Commands and Program States appendix in the SPSS Base 10.0 Syntax Reference Guide (this guide is available on the CD-ROM containing SPSS and can be copied to your hard drive when SPSS is installed).

    To do something in S PSS you need t o define a working da ta file,

    possibly tra nsform t he dat a, and t hen a na lyze it. The first ta sk is

    accomplished with a file definition command. A simple instance might be

    the GET comma nd t o open a n SP SS dat a file, but a more complex one is

    INPU T PROGRAM to define a n on-sta nda rd da ta file. In eith er case, file

    definition comm an ds use an input program state to accomplish th eir job.

    In an input program stat e, SPSS must determine how to read a data file,

    what th e definition is of a case, when t o creat e a case, and wh en to create

    the da ta file.

    Often th ese decisions ar e str aightforwar d for S PSS. When you click

    on File...Open..Data, a nd na me a file with a n exten sion of SAV, SPSS

    kn ows how to read t he file, th at each logical r ecord in th e file is to bewritten to one row in the Dat a E ditor, and tha t i t sh ould read th e whole

    file. Or in th e simple pr ogra m below, the execution of th e DATA LIST

    command (accessed by choosing File...Read Text Data; note that as of

    SPSS 10.0 a GET DATA comman d is pasted inst ead) tells SPSS t ha t

    what follows is a n ormal file, where ea ch line of dat a is t o be written to a

    new row, or case, in th e Data Editor, and t ha t th e last case should be

    written a nd t he file creat ed when th e END DATA comma nd is

    encountered.

    DATA LIST F REE / X.

    BEGIN DATA.

    1 2 3

    EN D DATA.LIST.

    However, even in th ese simple comm an ds, SPSS ent ers an input

    program st at e tha t is more complex tha n it first appear s. In fact, you can

    explicitly place SPSS int o this sta te by using t he INP UT P ROGRAM

    comman d. Thu s, the following program is equivalent t o the first.

    INPU T PROGRAM.

    DATA LIST F REE / X.

    END INP UT PROGRAM.BEGIN DATA.

    1 2 3

    EN D DATA.

    LIST.

    SPSS DATADEFINITION
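The case-building rule can be made concrete with a small sketch. It is written in Python, not SPSS syntax, and is purely illustrative. It shows how a free-format read of a single variable turns the data line `1 2 3` into three cases. The function name `data_list_free` is hypothetical. The sketch assumes whitespace- or comma-separated numeric values. It simply drops an incomplete final case rather than modeling how SPSS reports one.

```python
from typing import Dict, List


def data_list_free(lines: List[str], varnames: List[str]) -> List[Dict[str, float]]:
    """Sketch of a free-format read: values are consumed left to right, and a
    case is written each time every variable has received a value."""
    cases: List[Dict[str, float]] = []
    current: Dict[str, float] = {}
    for line in lines:  # the lines between BEGIN DATA and END DATA
        for token in line.replace(",", " ").split():
            current[varnames[len(current)]] = float(token)
            if len(current) == len(varnames):
                cases.append(current)  # case complete: one new row
                current = {}
    # Reaching END DATA ends the read; an incomplete case is dropped here.
    return cases


print(data_list_free(["1 2 3"], ["X"]))  # [{'X': 1.0}, {'X': 2.0}, {'X': 3.0}]
```

With two variables, the same three values would produce only one complete case. In that situation, where the data lines fall does not matter; only the order of the values does.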


    The difference is twofold. First, SPSS is explicitly put into the input program state. Second, SPSS is told when to leave that state (with END INPUT PROGRAM). There is, of course, no reason to use INPUT PROGRAM in this uncomplicated example. For complex data, however, an input program may be necessary to read the data into SPSS successfully. We will see more of INPUT PROGRAM in Chapter 4.

    The key to understanding and using complex file definition commands in SPSS is that you, the user, are in charge. You tell SPSS what constitutes a case (a row in the Data Editor), when to create that case, and when to end the input program and create the working data file. For example, an input program can have SPSS read only a portion of a file. The alternative is to create a larger working data file first and then reduce it by selecting certain cases to retain.

    SPSS data files must be rectangular (denormalized, in database terminology). This means that there must be a value for every variable for every case. To put it another way, each row of the Data Editor defines a case to SPSS. Data often come in a format that doesn't match this layout, and handling such data is one of the most common uses of the input program capability. In addition, SPSS supplies several predefined complex file definitions that read common types of nonrectangular files (we discuss these in Chapter 3). Changing the case definition of a file is a common technique for solving a variety of problems.

    SPSS PROGRAMMING CONSTRUCTS

    As in other programming languages, SPSS programs share several standard constructs that can be used to do many things. This is true whether the program is standard syntax, a macro, or a script. The constructs include the ability to loop, to create an array of elements, to repeat actions, and to perform actions only if some condition is true. We illustrate these concepts here with standard syntax; later you will see the same constructs used again in macros.

    DO IF & END IF

    A DO IF & END IF structure is used to execute transformations on subsets of cases based on some logical condition. It often replaces a long series of IF statements. The logical structure of a DO IF command sequence is:

    DO IF (test for condition)
      transformations
    ELSE IF (test for another condition)
      transformations
    ELSE IF or ELSE
      further transformations
    END IF


    The clear advantage is that not every statement is executed for each case, as happens with a series of IF statements. Consider regression analyses finding that the relationship between gross domestic product (GDP) and birth rate (BTHR) is not the same for first- and third-world countries (which is definitely true). The results of the separate regression analyses can be applied efficiently to a file of first- and third-world countries with these commands, where WORLD is the selection variable.

    DO IF (WORLD = 1).
    COMPUTE BTHR = 10.872 + .0014 * GDP.
    ELSE IF (WORLD = 3).
    COMPUTE BTHR = 46.148 - .004 * GDP.
    END IF.

    We could have accomplished the same result with two IF commands. The advantage of DO IF is that the ELSE IF and second COMPUTE commands are not executed for first-world countries.
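The branching logic of the DO IF block above can be mirrored in a short Python function. This is an illustrative sketch, not SPSS code, and the function name is hypothetical. As with ELSE IF, the second test is evaluated only when the first one fails. `None` stands in for "no COMPUTE was executed." In SPSS, BTHR would then keep its prior value, or be system-missing if it is a new variable.

```python
from typing import Optional


def predicted_bthr(world: int, gdp: float) -> Optional[float]:
    """Mirror of the DO IF block: branches are tested in order and only the
    first true one runs."""
    if world == 1:
        return 10.872 + 0.0014 * gdp
    elif world == 3:  # tested only for cases where WORLD = 1 was false
        return 46.148 - 0.004 * gdp
    return None  # no branch ran: no COMPUTE for this case


for world, gdp in [(1, 1000.0), (3, 1000.0), (2, 1000.0)]:
    print(world, predicted_bthr(world, gdp))
```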

    DO REPEAT & END REPEAT

    A DO REPEAT construct allows you to repeat the same group of transformations on a set of variables, reducing the number of commands that you must enter. SPSS must still execute the same number of commands; the efficiency is for the user, not for SPSS. To illustrate its use, let's access the 1994 General Social Survey file, stored in the c:\Train\ProgSynMac directory.

    Displaying Variable Names in Dialog Boxes

    First, to simplify the instructions in this course, we will request that variable names (and not the default variable labels) be displayed in dialog boxes. From within SPSS:

    Click Edit...Options
    Click the Display Names option button in the Variable Lists section of the General tab
    Click the Alphabetical option button in the Variable Lists section of the General tab

    To display SPSS commands in the Viewer window when we run analyses, we change one of the Viewer options.

    Click the Viewer tab in the SPSS Options dialog box
    Click the check box beside Display commands in the log
    Click OK

    Now to read the data.

    Click File...Open...Data
    Move to the c:\Train\ProgSynMac directory (if necessary)
    Double-click on GSS94

    Several questions in the file ask whether national spending on various programs or areas should be increased, stay the same, or be reduced. Imagine that we wish to compare pairs of questions (urban problems and welfare) to see whether or not a respondent gave the same answer to each. The program in Figure 2.1 accomplishes that task. To open it:

    Click File...Open...Syntax (move to the c:\Train\ProgSynMac folder if necessary)
    Double-click on CHAPT2

    Figure 2.1 DO REPEAT & END REPEAT Program

    The DO REPEAT structure requires stand-in variable names that represent a list of variables or constants. The stand-in variables exist only within the DO REPEAT structure. Transformation commands placed between the DO REPEAT and END REPEAT commands can reference the stand-in variables. The PRINT keyword on the END REPEAT command tells SPSS to list the commands generated by the DO REPEAT structure. This is a good idea except when SPSS generates hundreds of commands.

    The COMPUTE command uses another feature of SPSS syntax: true/false comparisons. The COMPUTE statement tells SPSS to compare the elements in FIRST with those in SECOND, in pairs. When, for example, NATSPAC is equal to NATENVIR, the test is true and SPSS returns a 1 to the variable SAME. When the two responses are not equal, SPSS returns false, or 0, to SAME.
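Figure 2.1 is not reproduced here, so the following Python sketch rests on an assumption. It assumes a DO REPEAT with stand-ins FIRST, SECOND, and SAME and a single `COMPUTE SAME = FIRST = SECOND.` inside. The sketch expands the stand-ins position by position, much as END REPEAT PRINT lists them, and applies each generated COMPUTE to one case. Only NATSPAC, NATENVIR, and DIFF1 come from the text; the names A, B, and DIFF2 in the test are hypothetical. Using `None` for a missing value is this sketch's convention for system-missing.

```python
from typing import Dict, List, Optional

Case = Dict[str, Optional[float]]  # None stands for a missing value


def do_repeat_same(case: Case, first: List[str], second: List[str],
                   same: List[str]) -> List[str]:
    """Expand DO REPEAT FIRST=... /SECOND=... /SAME=... around the single
    command COMPUTE SAME = FIRST = SECOND, then apply it to one case.
    Returns the generated commands, similar to END REPEAT PRINT."""
    if not (len(first) == len(second) == len(same)):
        raise ValueError("stand-in lists must have the same number of items")
    generated = []
    for a, b, target in zip(first, second, same):
        generated.append(f"COMPUTE {target} = {a} = {b}.")
        x, y = case.get(a), case.get(b)
        # True/false comparison: 1 if equal, 0 if not, missing if either is missing.
        case[target] = None if x is None or y is None else float(x == y)
    return generated


case: Case = {"NATSPAC": 2.0, "NATENVIR": 2.0}
print(do_repeat_same(case, ["NATSPAC"], ["NATENVIR"], ["DIFF1"]))
print(case["DIFF1"])  # 1.0
```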

    To see this in operation

    Highlight all the lines from DO REPEAT to LIST, then click on the Run button


    After SPSS runs the commands, the Viewer window opens, as shown in Figure 2.2. SPSS creates four COMPUTE commands based on the DO REPEAT structure. Scrolling down through the output from LIST (not shown) demonstrates the result. When NATSPAC is not equal to NATENVIR, DIFF1 is set to 0, and when the two responses are equal, DIFF1 is set to 1. A value of 0 for either spending variable is defined as missing, so no valid comparison can be made and DIFF1 is left missing.

    Figure 2.2 Output from END REPEAT PRINT

    In this instance, DO REPEAT saved little if any work, but in many circumstances the savings can be substantial.

    LOOP & END LOOP

    The DO REPEAT structure is an iterative construct because SPSS iterates over sets of elements to carry out the user's instructions. A more generic form of iteration is provided by the looping facility in SPSS, represented by the LOOP & END LOOP commands. They perform repeated transformations on the same case until a specified cutoff is reached. The cutoff can be defined by an index on the LOOP command, an IF statement on the END LOOP command, or other options. By default, the maximum number of loops is 40, as defined by the MXLOOPS setting on the SET command. Almost any transformation can be used within a loop.

    We begin with a very simple loop to illustrate its syntax.

    Click on Window...CHAPT2 - SPSS Syntax Editor to return to the Syntax Editor window
    Scroll down to the program shown in Figure 2.3


    Figure 2.3 LOOP & END LOOP Example

    On the LOOP command, the index clause #I=1 TO 5 tells SPSS to loop five times. SPSS therefore repeats the COMPUTE command five times for each person in the GSS file. Indices are usually increased by one, as in this example, but that is not always the case, nor must they begin at 1.

    The COMPUTE command itself tells SPSS to add one to the previous value of Z, which was set to 0 before the loop. The loop then finishes with the required END LOOP command, which tells SPSS that the construct has ended.

    A NOTE ABOUT PROGRAM EXECUTION

    Notice that the program ends with an EXECUTE command. When you run syntax from a Syntax window, SPSS does not immediately process transformations by reading the data file. Instead, it stores the transformations in memory and waits until it encounters a command that forces a pass of the data. By comparison, a command run from a dialog box is executed immediately after the OK button is clicked. The EXECUTE command forces a pass of the data and executes any preceding transformations.
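This deferred behavior can be illustrated with a toy Python model. It is not an SPSS API, and the class and method names are hypothetical. Transformations are queued when submitted. Only a call that forces a data pass, here `execute()`, applies them to the cases in a single pass.

```python
from typing import Callable, Dict, List

Case = Dict[str, float]
Transform = Callable[[Case], None]


class Session:
    """Toy model: transformations are queued, not run, until a command that
    forces a pass of the data (here, execute())."""

    def __init__(self, cases: List[Case]) -> None:
        self.cases = cases
        self.pending: List[Transform] = []

    def transform(self, step: Transform) -> None:
        self.pending.append(step)  # stored in memory; data not read yet

    def execute(self) -> None:
        for case in self.cases:  # a single pass over the data
            for step in self.pending:
                step(case)
        self.pending.clear()


def set_z(case: Case) -> None:
    case["Z"] = 0.0


session = Session([{"A": 1.0}, {"A": 2.0}])
session.transform(set_z)
print(session.cases[0])  # {'A': 1.0}  (nothing has run yet)
session.execute()
print(session.cases[0])  # {'A': 1.0, 'Z': 0.0}
```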

    Highlight the commands from the first COMPUTE to EXECUTE
    Click on the Run button

    To see the effect of this program


    Switch to the Data Editor window
    Scroll to the last column in the Data View sheet

    Figure 2.4 Data Editor with variable Z added

    SPSS has added 1 to Z five times. Since Z was initially zero, Z is now 5 for every case in the file. To reiterate, the LOOP command works within a case rather than across cases. We will see many uses of looping in programs, and the concept of looping will recur in macros and scripts.

    SCRATCH VARIABLES

    The variable #I used to index the loop does not exist in the GSS file. If it did, we would see it next to Z in the Data Editor. SPSS hasn't added it to the file because it was declared a scratch variable, which you do by giving it a name that begins with the # character. Scratch variables are used in transformations or data definition when there is no reason to retain them in the data file. They cannot be used in procedures.
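The following Python sketch assumes the program in Figure 2.3 reads `COMPUTE Z = 0.` `LOOP #I = 1 TO 5.` `COMPUTE Z = Z + 1.` `END LOOP.` `EXECUTE.` The figure itself is not reproduced here. The sketch illustrates the two points above. The inner loop runs within each case. The loop index, like the scratch variable #I, is never written to the case, so it never appears as a column. The function name is hypothetical.

```python
from typing import Dict, List


def run_loop_example(cases: List[Dict[str, float]]) -> None:
    """COMPUTE Z = 0. LOOP #I = 1 TO 5. COMPUTE Z = Z + 1. END LOOP. EXECUTE."""
    for case in cases:            # the data pass moves from case to case
        case["Z"] = 0.0
        for _i in range(1, 6):    # LOOP #I = 1 TO 5 repeats within one case
            case["Z"] += 1
        # _i, like the scratch variable #I, is never stored in the case,
        # so it does not show up as a column in the data.


data = [{"ID": 1.0}, {"ID": 2.0}]
run_loop_example(data)
print(data)  # [{'ID': 1.0, 'Z': 5.0}, {'ID': 2.0, 'Z': 5.0}]
```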


    VECTOR

    A vector is a construct that uses an index to reference a set of existing or newly created variables. A vector can reference either string or numeric variables.

    Here is how to create a vector from existing variables.

    VECTOR SAT = SATCITY TO SATHEALT.

    The vector SAT is created from the five General Social Survey questions that ask about a respondent's satisfaction with various aspects of his or her life. This vector does not appear in the Data Editor as a separate variable or set of variables, because it is a logical construct built from existing variables. These variables must be contiguous in the file; that is, they must be located next to each other when viewed in the Data Editor.

    By contrast, the syntax

    VECTOR X(10).

    creates 10 new variables named X1 to X10, all initialized to system-missing. To illustrate this point:

    Switch to the CHAPT2 - SPSS Syntax Editor window
    Click VECTOR X(10)., then click the Run button
    Go to the Data Editor (click the Goto Data tool) and scroll to the last columns

    Figure 2.5 Data Editor with Variables X1 to X10

    The ten new variables all have system-missing values for each case.
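The two forms of VECTOR can be sketched in Python as follows. This is illustrative only: the function names are hypothetical, and NaN stands in for system-missing. The first helper shows why `VECTOR SAT = SATCITY TO SATHEALT` requires contiguous variables: TO simply takes every variable between the two names in file order. The variable order in the usage lines is an assumption, not the actual layout of the GSS94 file.

```python
import math
from typing import Dict, List


def vector_existing(file_order: List[str], first: str, last: str) -> List[str]:
    """VECTOR name = first TO last: elements are every variable from `first`
    through `last` in file order, which is why they must be contiguous."""
    i, j = file_order.index(first), file_order.index(last)
    if i > j:
        raise ValueError(f"{first} must come before {last} in the file")
    return file_order[i:j + 1]


def vector_new(case: Dict[str, float], prefix: str, n: int) -> List[str]:
    """VECTOR prefix(n): create prefix1 ... prefixn, all system-missing."""
    names = [f"{prefix}{k}" for k in range(1, n + 1)]
    for name in names:
        case[name] = math.nan  # NaN plays the role of system-missing
    return names


# Assumed file order, for illustration only.
order = ["ID", "SATCITY", "SATHOBBY", "SATFAM", "SATFRND", "SATHEALT", "AGE"]
print(vector_existing(order, "SATCITY", "SATHEALT"))  # the five SAT items
case: Dict[str, float] = {"ID": 1.0}
print(vector_new(case, "X", 10))  # X1 through X10
```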


    A more interesting use of a vector is illustrated by the syntax shown in Figure 2.6. In the DO REPEAT example, we compared the value of one spending variable to another to see whether responses were identical. We can accomplish a similar task with vectors and loops. Here we wish to compare the responses on NATCITY to those on four other variables (NATCRIME, NATEDUC, NATRACE, and NATARMS). And instead of creating a new variable that indicates whether the NATCITY response matches each of the other four, we will compute the difference.

    Switch to the CHAPT2 - SPSS Syntax Editor window
    Scroll down to the program shown in Figure 2.6

    Figure 2.6 Program with Vector and Loop to Compute Differences Between Variables

    The VECTOR command creates two new vectors. GROUP is composed of the five variables from NATCITY to NATARMS (again, they must be contiguous). DIFF_ has four elements and so creates four new variables: DIFF_1, DIFF_2, DIFF_3, and DIFF_4. We will place the difference for each pair in this vector.

    The loop increments by 1 but begins at 2 rather than 1. It runs through 5, a total of four times, because there are four variables to compare to NATCITY. On the first pass through the loop, the COMPUTE command compares NATCITY (the first element of GROUP) to NATCRIME (the second element of GROUP) and puts the difference in DIFF_(1). The three remaining iterations work the same way.
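Here is a Python sketch of the same vector-and-loop logic. The element order of GROUP follows the text. The exact COMPUTE in Figure 2.6 is not shown, however, so two details are assumptions: the direction of the difference (NATCITY minus the other item) and the use of `None` for missing responses. Note the shift from SPSS's 1-based vector indexes to Python's 0-based lists.

```python
from typing import Dict, List, Optional

Case = Dict[str, Optional[float]]  # None stands for a missing response

# GROUP = NATCITY TO NATARMS, in the order given in the text.
GROUP: List[str] = ["NATCITY", "NATCRIME", "NATEDUC", "NATRACE", "NATARMS"]


def spending_differences(case: Case) -> None:
    """VECTOR GROUP = NATCITY TO NATARMS / DIFF_(4), then LOOP #I = 2 TO 5
    with DIFF_(#I - 1) computed from GROUP(1) and GROUP(#I)."""
    group = [case.get(name) for name in GROUP]
    for i in range(2, 6):                      # four passes: #I = 2, 3, 4, 5
        base, other = group[0], group[i - 1]   # 1-based GROUP(1), GROUP(#I)
        diff = None if base is None or other is None else base - other
        case[f"DIFF_{i - 1}"] = diff           # DIFF_1 ... DIFF_4


row: Case = {"NATCITY": 1.0, "NATCRIME": 1.0, "NATEDUC": 2.0,
             "NATRACE": 3.0, "NATARMS": None}
spending_differences(row)
print([row[f"DIFF_{k}"] for k in range(1, 5)])  # [0.0, -1.0, -2.0, None]
```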


    Highlight all the lines from VECTOR GROUP to LIST
    Click the Run button

    Figure 2.7 List Output Showing DIFF_1 to DIFF_4

    Where a case has valid values for the spending variables, we can see that SPSS created the four new DIFF_ variables. They measure the difference between NATCITY and the other four spending items. It would be straightforward to add COMPUTE statements that compare all other possible pairs.

    We have used the LIST command to check the operation of SPSS in two of the examples. Checking whether syntax has done what you expected is very important in SPSS programming. The SUMMARIZE command, available through the menus, can do what LIST does and more, but LIST is easier to type and less complicated when using syntax.


    ANALYSIS TIP: REORDERING VARIABLES

    A set of variables must be contiguous when placing them into a vector. What can you do if that is not true in an existing file? Perhaps the easiest way to rearrange variables is the trick of matching a file to itself. Figure 2.8 displays syntax from CHAPT2.SPS that illustrates this technique.

    Switch to the CHAPT2 - SPSS Syntax Editor window
    Scroll down to the Match Files example

    Figure 2.8 Match Files Program

    Normally, MATCH FILES is used to match one file to another. Here, however, no other file is named, so the working data file (referenced by an asterisk on the FILE subcommand) is matched to itself. Files are usually matched using one or more link variables (for example, an ID number). That is not necessary here, since we are matching one file to itself.

    The key portion of the MATCH FILES command is the KEEP subcommand. On it, we list the variables we wish to retain, in the order we want them to appear in the Data Editor. The EXECUTE command is required because MATCH FILES is a transformation, not a procedure.
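The effect of KEEP can be sketched in Python. The function name is hypothetical. A file is modeled as a list of row dictionaries, and key order stands in for column order. Any variable not named on KEEP is dropped, so the list has to include every variable you want to retain.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def match_self_keep(order: List[str], rows: List[Row],
                    keep: List[str]) -> Tuple[List[str], List[Row]]:
    """Sketch of MATCH FILES /FILE=* /KEEP=...: the same cases and values,
    restricted to the KEEP variables and arranged in KEEP order."""
    unknown = [v for v in keep if v not in order]
    if unknown:
        raise KeyError(f"not in the working file: {unknown}")
    # Dicts keep insertion order, so building each row from `keep`
    # reorders the columns.
    return list(keep), [{v: row[v] for v in keep} for row in rows]


order = ["NATCITY", "NATCRIME", "ID", "AGE"]
rows = [{"NATCITY": 1.0, "NATCRIME": 2.0, "ID": 7.0, "AGE": 40.0}]
new_order, new_rows = match_self_keep(order, rows, ["ID", "AGE", "NATCITY", "NATCRIME"])
print(new_order)  # ['ID', 'AGE', 'NATCITY', 'NATCRIME']
```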

    Highlight the lines from MATCH to EXECUTE
    Click the Run button
    Switch to the Data Editor window and scroll to the last columns


    Figure 2.9 Data Editor with Spending Variables Moved to the End

    The KEEP subcommand named the spending variables last, so they have been moved to the last columns in the Data Editor.

    SUMMARY

    We reviewed the types of SPSS commands and the three types of SPSS programs, and briefly reviewed data definition. We then discussed some of the key programming techniques in SPSS. These included the use of loops, the creation of vectors, the processing of conditional statements (DO IF), and the creation of repeating elements (DO REPEAT). These techniques will be used repeatedly in SPSS programming. A few other important programming techniques will appear in later chapters as the need arises. In Chapter 3 we turn to the handling of complex data files.


    Chapter 3
    Complex File Types

    Topics

    Introduction
    ASCII Data and Records
    File Types
    Syntax Basics
    Data File Structure
    Reading a Mixed File
    Errors in the Data
    Grouped File Type Without Record Information

    INTRODUCTION

    Most SPSS users find that the standard DATA LIST command is sufficient to read the great majority of the data files they normally encounter. This is because most data files are rectangular. That is, they contain the same number of records per case, the definition of a case is consistent throughout the file, and the variables to be defined are identical for each case. There are, however, situations in which these conditions do not hold. Consider a file at a medical center with two types of records, one for inpatients and one for outpatients. Shared variables sit in different column positions on each type of record, and some variables are unique to each type of patient. A standard DATA LIST cannot correctly read such a file and create a separate case for each patient.

    To handle such a data file, and any other that is nonrectangular, SPSS supplies two general solutions. The first is the FILE TYPE command, which provides predefined file types for reading grouped, mixed, or nested files. The second is to take complete control of the process of reading data with an INPUT PROGRAM command. This chapter discusses the use of file types; the next chapter covers input programs. A third solution is to read the file with a standard DATA LIST command and then restructure it with other programming techniques, such as VECTOR and LOOP. We will also illustrate this approach in subsequent chapters.

    Before reviewing the various file types, we need to discuss some background information.


    ASCII DATA AND RECORDS

    SPSS assumes that complex files are in ASCII format, so that they can be read with a DATA LIST command within the complex file types. Files stored in a spreadsheet or database format cannot be read directly by SPSS with these techniques. In that case, you have two options. You can write out an ASCII file from the other software and then read it into SPSS with a complex file definition. Alternatively, you can read the file into SPSS as you normally would, which temporarily creates a working file in a format that is wrong for analysis. You can then restructure the file with various programming techniques.

    Using complex file types requires an understanding of what a record is. For SPSS, a record is a physical line of data in an ASCII data file. Technically speaking, a record ends with a carriage return and a line feed (these are invisible to users in most software). In practice, if you open a data file in a text editor such as Notepad, each line will correspond to a record in the data file. This is not always true in word processing software that wraps lengthy lines. Be careful, then, with a file for which you have no codebook listing the record length.

    A single case in the final SPSS data file is often built from several records. Conversely, one physical record may hold several cases. Understanding what constitutes a record, and what the case definition should be in the final SPSS file, is part of the art of successful data input programming.

    In general, ASCII text data files can be in either fixed or delimited format. Complex file types, though, must be able to locate the case and/or record identification variables. Complex data should therefore be stored in a fixed-format ASCII text file.

    FILE TYPES

    The three available file types within the FILE TYPE command are:

    Grouped: This is a file in which all records for a single case are located physically together. Each case usually has one record of each type, and each record should have a case identification variable. This type of file is often identical to a standard rectangular file. The difference is that a grouped file type allows additional checking for missing and out-of-sequence records, since SPSS normally assumes that the records are in the same sequence within each case.

    Mixed: This is a file in which each record type defines a case. Some information may be the same for all record types but recorded in different locations. Other information may be recorded only for specific record types. Not all record types need be defined, so this is often a very efficient way to read only part of a data file.

    Nested: This is a file in which the record types are related to each other hierarchically. An example is a file containing school records and student records. All the students attending one school have their records placed together after that school's record. Usually the lowest level of the hierarchy (here, the student) defines a case. Information from the higher-level records, such as the school's overall GPA, is usually spread to the lower-level record when the case is defined. All record types that form a complete set should be physically grouped together, with an optional case identifier on each record. Note that record types can be skipped when reading the data, which creates cases at a higher level in the hierarchy.

    SYNTAX BASICS

    A complex file type program begins with the FILE TYPE command and ends with the END FILE TYPE command. These two commands enclose all definitional statements. One of the three keywords GROUPED, MIXED, or NESTED must be placed on the FILE TYPE command. The commands that define the data must include at least one RECORD TYPE command and one DATA LIST command, though it is common to have several. One set of RECORD TYPE and DATA LIST commands defines each type of record in the data file. Again, the definition of a case depends on which FILE TYPE is specified.

    The RECORD subcommand is required. It names the column location of the record identification information and, optionally, the variable that will store this information. A CASE subcommand is also available (and is required for a grouped file); it specifies the name and location of the case identification information.

    This syntax illustrates the basic structure:

    FILE TYPE (Grouped, Mixed, or Nested) FILE = 'Your File' /
      RECORD = RECID 4 CASE = ID 1-3.
    RECORD TYPE 1.
    DATA LIST / your variables and column locations here.
    RECORD TYPE 2.
    DATA LIST / more variables here.
    etc.
    END FILE TYPE.

    All three file types have subcommands that warn the user about records and cases that don't meet the definitions of the file type, record, and case. These warnings can cover missing records.

    After the FILE TYPE...END FILE TYPE structure is processed, a rectangular active file is created, whatever the original structure of the raw data file.


    DATA FILE STRUCTURE

    To further illustrate the three types of files, we display samples of data files that can be read as grouped, mixed, or nested files.

    GROUPED DATA

    A grouped data file often looks identical or very similar to a standard rectangular data file. However, a grouped file often has one or more of the following problems:

    1. A different number of records for each case
    2. Records out of order
    3. Records with the wrong record number
    4. Duplicate records

    Any of these problems means that SPSS cannot read the file successfully with a simple DATA LIST command. The structure of a simple grouped data file is shown in Figure 3.1.

    Figure 3.1 Grouped Data File for Hospital Patients

    These data are from a hospital and contain information on tests and procedures administered to each patient. Each patient's data begin with a record that lists identifying information. The second and subsequent records each describe a test that was given, with its date and cost. Each record after the first defines a test, but we would like the case definition to be a patient. The problem is that each patient receives a different number of tests, so we cannot specify the same number of records for each patient.

    A CASE subcommand is required on the FILE TYPE command in addition to the RECORD subcommand. SPSS cannot correct records with missing or incorrect case identification information and place them with the correct case, but it will warn you about the problem.

    All defined variable names for a grouped file must be unique, because multiple records will be put together to form one case. By default, SPSS issues warnings for all missing, duplicate, out-of-range (called "wild"), or out-of-order records.

    MIXED DATA

    A mixed raw data file looks quite different from a rectangular data file. Again, a MIXED file type is used when each record type defines a separate case (though not all record types need be defined). Figure 3.2 depicts a portion of the file MIXED.DAT, which contains job information on employees of a large company.

    Figure 3.2 Mixed Data File for Employees

    Columns 2-4 contain an identification number for each employee. This number matters to the company but is not important for a FILE TYPE MIXED definition. Column 6 contains the required record identification information, in this case either a 1 or a 2 (there is also a record number 3, not shown). The company had three separate record-keeping systems for employee information, one for each division. The data from all three have recently been placed in one file for reporting purposes.

    A standard DATA LIST cannot read this file, because some of the same information is in a different location for each record type. Salary, for example, is recorded in a different location for each record type, and some variables are not recorded in every system. We will attempt to read this file in the first example.
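Here is a Python sketch of reading a mixed file. The record type in column 6 selects a layout, and each record becomes its own case. Record types without a layout are skipped, much as undefined record types are under FILE TYPE MIXED. Only the ID (columns 2-4) and record-type (column 6) positions come from the text. The SALARY columns and the sample lines are hypothetical, since the actual layout of MIXED.DAT is not given here.

```python
from typing import Dict, List, Tuple

# Hypothetical layouts: 1-based, inclusive column ranges per record type.
# Only ID (columns 2-4) and the record type (column 6) come from the text.
Layout = Dict[str, Tuple[int, int]]
LAYOUTS: Dict[int, Layout] = {
    1: {"ID": (2, 4), "SALARY": (8, 12)},
    2: {"ID": (2, 4), "SALARY": (14, 18)},
}


def field(line: str, first: int, last: int) -> str:
    """Read columns first..last (1-based, inclusive) from a fixed-format line."""
    return line[first - 1:last].strip()


def read_mixed(lines: List[str], layouts: Dict[int, Layout] = LAYOUTS) -> List[Dict[str, str]]:
    cases: List[Dict[str, str]] = []
    for line in lines:
        rec = field(line, 6, 6)
        if not rec.isdigit() or int(rec) not in layouts:
            continue  # no layout for this record type: skip the record
        case = {"RECTYPE": rec}
        for name, (first, last) in layouts[int(rec)].items():
            case[name] = field(line, first, last)
        cases.append(case)  # in a mixed file, each record is its own case
    return cases


data = [" 101 1 52000", " 102 2" + " " * 7 + "48000", " 103 3 61000"]
print(read_mixed(data))
```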


    NESTED DATA

    A nested data file also looks quite different from a rectangular data file. A FILE TYPE NESTED command is used when the records in a file are hierarchically related. One example is a file with two types of records: one for each department in a company, and one for each of the employees in that department. All the employee records for one department are placed consecutively, after the record for the department in which they are located and before the next department record.

    The variable names on all the records must be unique because one record of each type will be grouped together to form a case. Since not all record types need be mentioned on the RECORD TYPE command, it is possible to define a case at a higher level in the hierarchy, e.g., a department rather than an employee. In fact, a case can be defined at any level in the hierarchy of record types. Figure 3.3 depicts a nested data file for a school district.

    Figure 3.3 Nested Data File for School District

    The file contains data on the performance of high school students, organized by homeroom and school. It contains three types of records:

    Record 1: The high school record contains identifying information on the school plus the school's overall GPA, SAT verbal, and SAT math scores.

    Record 2: The homeroom record contains identifying information on the homeroom plus the average GPA for all students in the homeroom.

    Record 3: The student record contains identifying information, including the student's sex and academic track, plus GPA, SAT verbal, and SAT math scores.

    There is only one variable common to all three records: the record-identifying information in the first column.

    The data file has no case-identifying information, which is typical of real-life situations. That isn't a problem, since SPSS simply stores the higher-level record information and spreads it to each record 3 when it creates a case for each student, retaining this information until another record of type 1 is encountered. SPSS can still successfully create the cases even when intermediate-level records are missing (a homeroom record, in this instance).
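
    The spreading behavior just described can be sketched in Python. This models the idea rather than SPSS's actual implementation. It assumes records arrive as (record type, fields) pairs in file order, with type 1 = school, 2 = homeroom, and 3 = student. Clearing the homeroom fields when a new school record appears is a choice made for this sketch, so that a student whose homeroom record is missing simply gets no homeroom values; the function name and field names are hypothetical.

```python
def build_nested_cases(records):
    """Spread school (type 1) and homeroom (type 2) fields onto each student (type 3).

    records: iterable of (record_type, fields) tuples in file order.
    Returns one dict per student record.
    """
    school, homeroom = {}, {}
    cases = []
    for rtype, fields in records:
        if rtype == 1:
            school = dict(fields)
            homeroom = {}  # sketch's choice: a new school starts with no homeroom info
        elif rtype == 2:
            homeroom = dict(fields)
        elif rtype == 3:
            # A case is created only at the lowest level (one per student),
            # carrying copies of the higher-level fields retained so far.
            cases.append({**school, **homeroom, **fields})
    return cases
```

    Because the variable names are unique across record types, merging the three dictionaries never overwrites a value, which mirrors the uniqueness requirement stated above.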

    READING A MIXED FILE

    We will read the employee mixed data file from Figure 3.2 (named MIXED.DAT) into SPSS and create a rectangular file with a case for each employee. Here is a codebook table for the three types of employee record systems:

    VARIABLE     RECORD 1    RECORD 2    RECORD 3
                 LOCATION    LOCATION    LOCATION
    ID           2-4         2-4         2-4
    RECORD ID    6           6           6
    AGE          8-9         8-9         8-9
    SEX          11          11          11
    SALARY       13-17       22-26       13-17
    TENURE       19-20       19-20       19-20
    JOBCODE      22          17          not recorded
    LOCATION     24          15          22
    JOBRATE      26          13          not recorded

    Not only are variables like SALARY recorded in different locations, but two variables, JOBCODE and JOBRATE, were not recorded on record 3, which was the oldest record-keeping system. When the same variable, e.g., AGE, is defined by more than one record type, the format type and length should be the same on all records. SPSS uses the first appearance of the variable for the active file dictionary.

    The appropriate commands to read this file are included in Figure 3.4 and are in the file MIXED.SPS.

    Click on File..Open..Syntax

    If necessary, switch directories to c:\Train\ProgSynMac

    Double-click on MIXED


    Figure 3.4 Mixed File Type Program

    The command FILE TYPE begins the file definition and puts SPSS into an input program state. The MIXED subcommand tells SPSS that this is a mixed data file. The data file is named here, not on the DATA LIST commands that follow. The only other required subcommand is RECORD, which specifies the record identification variable. The equals sign is not required following RECORD or FILE. The record variable is in column 6 and will be named SYSTEM. For the employee data it is important to retain information that tells us under which record system the data were created, because the same IDs occur under each system; often, though, the record variable doesn't need to be retained in the final file. In that case, it can be declared a scratch variable by beginning its name with #.

    Each employee data system, corresponding to a type of record in the data file, gets its own RECORD TYPE command. The value specified on the command (1, 2, or 3) refers to an actual value in the file MIXED.DAT in the record identification position, here column 6 (refer to Figure 3.2). The DATA LIST command following each RECORD TYPE command defines the variables for that record type. Notice how SALARY is in columns 13-17 for record types 1 and 3 but in columns 22-26 for record type 2.

    An optional subcommand on the FILE TYPE command is WILD, which tells SPSS to issue a warning when it encounters undefined record types in the data file. For FILE TYPE MIXED the default is NOWARN, so SPSS simply skips all record types not mentioned and does not display warning messages.
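
    To make the roles of RECORD, RECORD TYPE, and WILD concrete, here is a rough Python model of how a mixed file becomes cases, using the column locations from the codebook above. It is an illustration, not SPSS syntax. The function name read_mixed, the warn_wild flag (standing in for WILD=WARN versus the default NOWARN), and keeping every field as a text string are assumptions of this sketch. Variables a record type does not define (JOBCODE and JOBRATE on record type 3) are left as None, the sketch's stand-in for system-missing.

```python
# Column locations (1-based, inclusive) from the MIXED.DAT codebook.
LAYOUTS = {
    "1": {"ID": (2, 4), "AGE": (8, 9), "SEX": (11, 11), "SALARY": (13, 17),
          "TENURE": (19, 20), "JOBCODE": (22, 22), "LOCATION": (24, 24),
          "JOBRATE": (26, 26)},
    "2": {"ID": (2, 4), "AGE": (8, 9), "SEX": (11, 11), "SALARY": (22, 26),
          "TENURE": (19, 20), "JOBCODE": (17, 17), "LOCATION": (15, 15),
          "JOBRATE": (13, 13)},
    "3": {"ID": (2, 4), "AGE": (8, 9), "SEX": (11, 11), "SALARY": (13, 17),
          "TENURE": (19, 20), "LOCATION": (22, 22)},
}
# Every variable in the final rectangular file; SYSTEM holds the record type.
ALL_VARS = ["SYSTEM"] + list(LAYOUTS["1"])


def read_mixed(lines, record_col=6, warn_wild=False):
    """Build one case per record whose type (in record_col) has a layout."""
    cases, warnings = [], []
    for lineno, line in enumerate(lines, start=1):
        rtype = line[record_col - 1:record_col]
        layout = LAYOUTS.get(rtype)
        if layout is None:
            # Undefined ("wild") record: always skipped, reported only on request.
            if warn_wild:
                warnings.append(f"line {lineno}: unknown record ID {rtype!r}: {line}")
            continue
        case = dict.fromkeys(ALL_VARS)  # variables this type lacks stay None
        case["SYSTEM"] = int(rtype)
        for name, (start, end) in layout.items():
            text = line[start - 1:end].strip()
            case[name] = text or None
        cases.append(case)
    return cases, warnings
```

    Passing warn_wild=True plays the role of adding /WILD=WARN: the undefined record is still left out of the cases, but it is now reported.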


    The input program state ends with the END FILE TYPE command, which is followed by labeling commands and then a FREQUENCIES command. FILE TYPE and END FILE TYPE are not procedures and do not cause the raw data file to be read, so they must be followed by either an EXECUTE command or a procedure.

    To run all the syntax

    Click Run..All

    SPSS displays the commands in the Viewer window (not shown) and then the frequency table for SYSTEM. We can see that there are 212 employees in the file, created from 212 records, and that there are 52 of record type 1, 122 of record type 2, and 38 of record type 3 in the data file. All the information on, for example, salary has now been placed in one column despite its two different locations in the data (if you wish, switch to the Data Editor to verify this).

    Figure 3.5 Frequency Table for System


    ERRORS IN THE DATA

    We will illustrate what occurs with undefined record types by once again reading the file MIXED.DAT. Because warning messages are turned off by default, we were unaware that there are in fact 213 records, or employees, in the file. However, the 213th record has an error in its record type, as shown in Figure 3.6: its record type should be a 3 but is instead a 4.

    Figure 3.6 Error in MIXED.DAT

    SPSS skipped this employee's record because its type was not defined on a RECORD TYPE command, but it didn't warn us. Let's tell SPSS to do that, then reread the file.

    Click Window..MIXED - SPSS Syntax Editor to return to the syntax file

    Add the subcommand /WILD=WARN to the end of the FILE TYPE command, but before the period (.)

    Figure 3.7 Modified File Type Command to Add Warnings From SPSS


    After you have carefully added this subcommand, rerun all the commands by

    Clicking Run..All

    When SPSS switches to the Viewer, you will now see a warning message and a note in the log under the FREQUENCIES command, as shown in Figure 3.8. The warning message is clear, telling us that the record type (4) was ignored when building the file. You can verify this by looking at the frequencies output, which lists only 212 cases.

    The exact position of the problem is noted in the message that begins with "Command line:". The critical information is that it was on case 213 that SPSS encountered an unknown record ID, whose value is 4. SPSS also conveniently lists the actual line of data from the file MIXED.DAT for reference. Obviously, warnings can be helpful in finding and fixing errors in data entry or definition.

    Figure 3.8 Warning Messages for Undefined Record Type 4

    It is now possible to use the file created via the FILE TYPE MIXED command to report on either the total group of employees or differences between employees across record-keeping systems.


    Analysis Tip

    If you plan to read a file with known errors, you might think that you want to be warned every time there is a problem defining an SPSS data file. However, this is not always the case. If it is a large data file with many errors, SPSS could generate hundreds, even thousands, of warning messages, and it is unlikely that you will care to scroll through all that output. In recognition of this, the maximum number of warnings SPSS will display has been set relatively low, to a value of 10. When you do want to see more warnings, use the SET command with this syntax:

    SET MXWARNS = 100. (or whatever value is appropriate)

    GROUPED FILE TYPE WITHOUT RECORD INFORMATION

    Reading either a grouped or nested file into SPSS isn't much different from reading a mixed file in terms of the syntax. However, one situation that causes problems yet is still relatively common, and therefore worth exploring, is when you wish to use FILE TYPE GROUPED but don't have a record identification variable. This is fairly common, especially because any rectangular data file with several records per case can be read with either a standard DATA LIST or a FILE TYPE GROUPED definition. The advantage of the latter is that SPSS will fix any problems with out-of-order records and read the file correctly if there are missing records. However, a record type variable is needed in each instance, and you may not have created one for what you knew was a standard rectangular file format.

    Our example here is a little more complicated. Figure 3.9 displays a small data file that stores information on students in a statistics class and their scores on each of three assignments. There is an identification variable for the student in column 2, but no numeric record identifier to tell SPSS that the first record has information on the first quiz, the second record on the first homework assignment, and the third record on the first test. Moreover, student 2 did not complete the first homework assignment and so has only two records, which is the real problem.

    If this file had been created with one record for each student, and each assignment in a separate column, it would be a straightforward task to read it into SPSS.

    Figure 3.9 Grouped Data File

    Note

    We should point out that the score type (quiz1, etc.) field can be used as a record type identifier, although it is a string field. However, we will ignore this in order to demonstrate another method of reading the file.


    What we want to do is read this data file and create three cases, one for each student. We also want to create three variables, one for each type of assignment, and, just as important, we want SPSS to realize that the second student's second record has his score for the test, not the homework assignment.

    There are two methods to approach this problem without using a more complex INPUT PROGRAM command sequence.

    1) Read the data into SPSS, create a record identifier, write the data back out as an ASCII file, then read it back in using FILE TYPE GROUPED.

    2) Read the data into SPSS, create a record identifier, then manipulate the data to create the necessary variables.

    The latter choice is clearly preferable because it requires fewer passes of the data file. The file GROUPED.SPS contains the commands to implement the second method.

    Click on File..Open..Syntax (move to the c:\Train\ProgSynMac folder)

    Double-click on GROUPED

    Figure 3.10 Program to Read a Grouped File

    The DATA LIST command reads GROUPED.DAT as if it were a rectangular file. The interesting techniques in this program come after the file is read.


    First, we create a RECORD variable with the LAG function. Initially, RECORD is set to 1 for each case. Then, when the current case has the same ID as the previous case, RECORD is set equal to the value of RECORD for the previous case plus one. When the previous case was a different student, this statement is not executed and RECORD remains at 1; i.e., it is reset to 1 for each student's first record. This creates the desired record identification variable. To see this:

    Highlight the lines from DATA LIST to the first LIST command

    Click on the Run button

    Figure 3.11 List Output Showing Record ID
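
    The RECORD logic can be traced in a few lines of Python. Since the program in Figure 3.10 is not reproduced here, the SPSS equivalent in the comment (COMPUTE RECORD = 1 followed by IF (ID = LAG(ID)) RECORD = LAG(RECORD) + 1) is our reconstruction from the description above, not a copy of GROUPED.SPS. The sample rows are hypothetical, arranged as the text describes: three students, with student 2 missing the homework record.

```python
# Roughly equivalent SPSS (a reconstruction, not copied from GROUPED.SPS):
#   COMPUTE RECORD = 1.
#   IF (ID = LAG(ID)) RECORD = LAG(RECORD) + 1.
def add_record_numbers(rows):
    """Number each student's records 1, 2, 3, ... in file order."""
    out = []
    prev_id, prev_record = None, None  # LAG values are missing for the first case
    for row in rows:
        record = 1
        if prev_id is not None and row["ID"] == prev_id:
            record = prev_record + 1
        out.append({**row, "RECORD": record})
        prev_id, prev_record = row["ID"], record
    return out


# Hypothetical data shaped like GROUPED.DAT: student 2 has no hmwk1 record.
ROWS = [
    {"ID": 1, "ASSIGN": "quiz1", "SCORE": 8},
    {"ID": 1, "ASSIGN": "hmwk1", "SCORE": 9},
    {"ID": 1, "ASSIGN": "test1", "SCORE": 85},
    {"ID": 2, "ASSIGN": "quiz1", "SCORE": 7},
    {"ID": 2, "ASSIGN": "test1", "SCORE": 78},
    {"ID": 3, "ASSIGN": "quiz1", "SCORE": 10},
    {"ID": 3, "ASSIGN": "hmwk1", "SCORE": 8},
    {"ID": 3, "ASSIGN": "test1", "SCORE": 92},
]

print([r["RECORD"] for r in add_record_numbers(ROWS)])  # [1, 2, 3, 1, 2, 1, 2, 3]
```

    Note that student 2's test1 row receives RECORD 2, which is exactly the problem taken up next.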

    Analysis Tip

    When creating programs that read or transform data, it is very helpful in debugging to list the data after an action or series of actions is executed. If you don't do this, it will be very difficult to figure out where things went wrong. Following this advice, the LIST command has been placed at four spots in the program. It is better to err on the side of excess here.

    Our problems are not yet solved, though. Student 2 didn't complete the homework, but the value of RECORD for his test1 score is 2, not 3. The set of IF statements fixes this problem. They assign the correct value of RECORD for each type of assignment, so that student 2's test1 score will now be listed as record type 3.

    Switch to the GROUPED - SPSS Syntax Editor window

    Highlight the lines from IF to the next LIST command

    Click on the Run button

    In the Viewer (not shown) we can now see that the value of RECORD for the second line (or case) for student 2 has been changed to a 3.
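
    Continuing the Python model, the IF statements amount to a lookup from the assignment name to its record number. The mapping is assumed from the assignment names used in this chapter (quiz1, hmwk1, test1), the scores are hypothetical, and the SPSS lines in the comment are again a reconstruction rather than the contents of GROUPED.SPS.

```python
# Roughly equivalent SPSS (a reconstruction):
#   IF (ASSIGN = 'quiz1') RECORD = 1.
#   IF (ASSIGN = 'hmwk1') RECORD = 2.
#   IF (ASSIGN = 'test1') RECORD = 3.
RECORD_FOR = {"quiz1": 1, "hmwk1": 2, "test1": 3}


def fix_record_numbers(rows):
    """Replace each row's RECORD with the number implied by its ASSIGN value."""
    return [{**row, "RECORD": RECORD_FOR[row["ASSIGN"]]} for row in rows]


# Student 2's rows after the LAG step (hypothetical scores): test1 wrongly has RECORD 2.
student2 = [
    {"ID": 2, "ASSIGN": "quiz1", "SCORE": 7, "RECORD": 1},
    {"ID": 2, "ASSIGN": "test1", "SCORE": 78, "RECORD": 2},
]
print([r["RECORD"] for r in fix_record_numbers(student2)])  # [1, 3]
```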


    One task remains: to take these 8 cases in the Data Editor and turn them into 3 cases. As mentioned above, we could write this file out and then read it back into SPSS, but that is cumbersome. A better method is to use the AGGREGATE command. For analysis purposes, it is best to create a separate variable, or column, for each assignment. If we simply aggregate the current working file by student ID, that won't happen. Why?

    To create these new variables (three in this instance), we can take advantage of the VECTOR command we saw in Chapter 2. The VECTOR command creates a new vector, SCORE_, with three elements. The COMPUTE statement then assigns the value of SCORE for a case to an element of SCORE_ based on the value of RECORD. In other words, for the first record type (quiz1), SCORE_1 gets the value of SCORE for that case, the quiz1 score. The values of SCORE_2 and SCORE_3 for that case are system-missing. For the second record type (hmwk1), SCORE_2 gets the value of SCORE for that case, and SCORE_1 and SCORE_3 are system-missing.

    It's probably easier to see this in action.

    Switch to the GROUPED - SPSS Syntax Editor window

    Highlight the lines from VECTOR to the next LIST command

    Click on the Run button

    Figure 3.12 List Output Showing Vector SCORE_

    The values for each assignment or test have been placed in separate variables, with the quiz1 score in SCORE_1, the hmwk1 score in SCORE_2, and the test1 score in SCORE_3. However, there are still several cases for each student (three for students 1 and 3, two for student 2), with lots of missing data, and all we need is one case for each student to calculate appropriate statistics.
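
    In the same Python model, the three-element VECTOR SCORE_ plus a COMPUTE that presumably targets SCORE_(RECORD) can be mimicked by giving every row three new slots that start out empty (None standing in for system-missing) and filling only the slot that its RECORD value points to. The function name and sample rows are hypothetical.

```python
def spread_scores(rows, n=3):
    """Mimic a 3-element VECTOR SCORE_ plus COMPUTE SCORE_(RECORD) = SCORE."""
    out = []
    for row in rows:
        new = dict(row)
        for i in range(1, n + 1):
            new[f"SCORE_{i}"] = None  # new vector elements start system-missing
        new[f"SCORE_{row['RECORD']}"] = row["SCORE"]
        out.append(new)
    return out


rows = [
    {"ID": 2, "ASSIGN": "quiz1", "SCORE": 7, "RECORD": 1},
    {"ID": 2, "ASSIGN": "test1", "SCORE": 78, "RECORD": 3},
]
for r in spread_scores(rows):
    print(r["ID"], r["SCORE_1"], r["SCORE_2"], r["SCORE_3"])
# 2 7 None None
# 2 None None 78
```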


    A file with that structure can be created using the AGGREGATE command, which changes the case base in a file and calculates summary statistics for the new case base. For example, in a file of customers that buy five different products, AGGREGATE can create an SPSS data file where the case base is product (and so there are only five cases), with information such as the number of customers who bought each product, the mean age of customers who bought that product, and so forth.

    There are three necessary subcommands for AGGREGATE, as shown in Figure 3.10. The OUTFILE subcommand tells SPSS whether to save the new data file to disk or to make it the current working file in the Data Editor. The asterisk means to replace the current working file with the new one. The BREAK subcommand defines the case base for the new file. By breaking on the variable ID, which here has only three unique values, we will create a file with three cases. The aggregated variables subcommand creates the summary variables in the new file. Its format is

    LIST OF NEW VARIABLES = FUNCTION (LIST OF EXISTING VARIABLES)

    where you define new variable names on the left of the expression and an aggregate function on the right, followed by the same number of existing variables that will be used to create the new variables. In our example, we use the MAX (maximum) function to create three variables called QUIZ1, HMWK1, and TEST1, based on the maximum values of SCORE_1 to SCORE_3 for each ID value. Why does this accomplish our task? Could we have used a different function?
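
    The AGGREGATE step can be modeled the same way: group the rows by the break variable and keep the largest non-missing value of each SCORE_ variable. This Python sketch assumes that MAX ignores system-missing values and leaves the result missing when a student has no valid value, and it returns the groups sorted by ID; the function name, defaults, and data are hypothetical.

```python
def aggregate_max(rows, break_var="ID",
                  sources=("SCORE_1", "SCORE_2", "SCORE_3"),
                  targets=("QUIZ1", "HMWK1", "TEST1")):
    """One output case per break value; each target is MAX(source), missing ignored."""
    groups = {}
    for row in rows:
        key = row[break_var]
        out = groups.setdefault(key, {break_var: key, **dict.fromkeys(targets)})
        for src, tgt in zip(sources, targets):
            value = row[src]
            if value is not None and (out[tgt] is None or value > out[tgt]):
                out[tgt] = value
    return [groups[k] for k in sorted(groups)]


# Spread rows for student 2 (hypothetical scores): no hmwk1 record at all.
spread = [
    {"ID": 2, "SCORE_1": 7, "SCORE_2": None, "SCORE_3": None},
    {"ID": 2, "SCORE_1": None, "SCORE_2": None, "SCORE_3": 78},
]
print(aggregate_max(spread))  # [{'ID': 2, 'QUIZ1': 7, 'HMWK1': None, 'TEST1': 78}]
```

    Only the break variable and the new summary variables survive, which is why ASSIGN, SCORE, and RECORD do not appear in the result.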

    To finish the program

    Highlight the AGGREGATE and LIST commands

    Click on the Run button

    Figure 3.13 List Output Showing New Assignment Variables

    The output from LIST demonstrates that there are only three cases in the new file, one for each student. Three new variables have been created, one for each assignment, which makes it easy to calculate statistics for each assignment. And student 2 has been correctly assigned a missing score for HMWK1. Since AGGREGATE creates only the break variable and the new summary variables we define, the variables ASSIGN, SCORE, RECORD, and SCORE_1 to SCORE_3 are gone, which is fine. You may also want to look at the Data Editor (not shown) to see the file format.

    Analysis Tip

    Unlike the definition of complex file types, every command in this program except VECTOR could have been created from the dialog boxes. As this course is generally concerned with SPSS programming, we instead worked from a syntax file. Either approach is acceptable, although seeing the syntax often helps your understanding, and it certainly lets you apply the same type of program to another data file in the future.

    SUMMARY

    We reviewed the types of complex files, their structure, and the SPSS syntax used to read these data files. We illustrated the use of complex file types by reading a mixed data file, then discussed how data errors are handled by SPSS. We then showed how to read a grouped file with an odd structure and no numeric record type information.


    Chapter 4

    Input Programs

    Topics

    Introduction

    Syntax Basics

    Changing the Case Base of a File

    End of Case Processing

    End of File Processing

    Checking Input Programs

    Incomplete Input Programs

    Reading Files with Missing Identifiers

    When Things Go Wrong

    INTRODUCTION

    There are situations where you encounter non-rectangular raw data files that cannot be read directly with the complex file types provided by SPSS. For those situations, SPSS offers an input program facility, as mentioned in Chapter 2, that can read essentially any type of ASCII data file. The ability to read a file will at times depend upon the cleverness of the user, as really odd files may require creative solutions.

    An input program can also be used to create data that match a target distribution, often for purposes of teaching or illustration. In other words, an input program can create data from nothing (this is the one time that SPSS provides a free lunch, so to speak).

    For very large files, input programs also offer great efficiencies, even if the file is a standard rectangular data file. An input program can be used to select only certain cases as the file is read, saving one pass of the data. Or it can concatenate raw data files, saving you from having to create an SPSS data file from each one. And input programs can perform the equivalent functions of a grouped, mixed, or nested file type, but with added flexibility.

    The user is in charge of case definition when writing an input program, so careful attention must often be paid to where in the program stream a case should be created. At times, you may also need to tell SPSS when to stop reading the data and create a working data file.


    The commands INPUT PROGRAM and END INPUT PROGRAM enclose data definition and transformation commands that build cases from input data. At least one file definition command, such as a DATA LIST, must be included in the structure. Essentially any transformation commands can be placed within an input program structure, but no procedures. This means that you can use COMPUTE, IF, DO IF, REPEATING DATA, LOOP, or any of the other transformation commands that may help to create a working data file.

    It is very important to understand that SPSS processes the input program commands on a case-by-case basis. This may be hard to grasp intuitively, given that an input program creates the definition of a case as

    it is executed, but we will illustrate this concep