planning, preparing, documenting, and referencing sas ... · although sas is flexible,...

53
United States General Accounting Office GAO Office of Policy August 1992 Planning, Preparing, Documenting, and Referencing SAS Products GAO/IMTEC-11.1.2

Upload: others

Post on 13-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

United States General Accounting Office

GAO Office of Policy

August 1992

Planning, Preparing,Documenting, andReferencing SASProducts

GAO/IMTEC-11.1.2

Page 2: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and
Page 3: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Preface

The SAS system is a software system for dataanalysis.1 It is primarily used for information storageand retrieval, data modification and programming,report writing, statistical analysis, graphics, and filehandling. It is used extensively within the GeneralAccounting Office (GAO) to retrieve, analyze, andpresent data. Evaluators, programmers, and analystsuse SAS because it is comprehensive and flexible.

Because properly planning, preparing, documenting,and referencing SAS products can be intricate anddemanding, this guide was developed to enable GAO’sSAS users to develop products that conform to GAOquality control and workpaper documentationstandards. The guide recommends SAS features thatare consistent with GAO standards and warns againstusing those that are not. Although SAS has nonauditapplications, such as management informationsystems, this guide is intended for audit and programevaluation applications. While this guide is specific toSAS, the principles discussed here apply to otheranalytical packages.

This guide is targeted toward GAO evaluators andanalysts who use SAS or supervise those who do. Itcomplements SAS training and reference manuals andsupplements GAO policy directives. Because thisguide assumes that the reader understands the syntaxand style of SAS statements and procedures, youshould consult your technical assistance group if youare unfamiliar with SAS. The Training Institute andother educational centers offer basic courses in usingSAS.

This document can also provide guidance to thegeneralist evaluator by recommending practices inthe planning, preparing, documenting, andreferencing of SAS products.

1SAS is a registered trademark of the SAS Institute Inc., Cary, N.C.

GAO/IMTEC-11.1.2 SAS DocumentsPage 1

Page 4: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Preface

A team of experienced SAS users, representing bothregional and headquarters staff, developed this guide.Major contributors were Arthur D. Foreman andRoJeanne W.L. Liu. If you have any suggestions onimproving this guide or need further assistance,please call Mr. Foreman at (513) 684-7120.

Ralph V. CarloneAssistant Comptroller GeneralInformation Management and Technology Division

Werner GrosshansAssistant Comptroller General for Policy

GAO/IMTEC-11.1.2 SAS DocumentsPage 2

Page 5: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

GAO/IMTEC-11.1.2 SAS DocumentsPage 3

Page 6: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Contents

Preface 1

Chapter 1 Introduction

6Flexibility 6Suitability 6Ease of Use 7Order of Topics 7

Chapter 2 Planning

8Deciding to Use SAS 8SAS Work Plan 11

Chapter 3 Checking YourData and SasPrograms

13Checking Your Data 13Checking Your Program 22

Chapter 4 Data Entry andTransfer

36Raw Data 36SAS Data 38Data Formats From Other Statistical

Software Packages38

Data From Data Base Packages 39SAS-Generated Data 39On-Line SAS Data Entry 40Transferring and Moving SAS Data Sets 41

Chapter 5 Processing andDocumenting theResults

42

GAO/IMTEC-11.1.2 SAS DocumentsPage 4

Page 7: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Contents

Chapter 6 Referencing SASWork

45

Chapter 7 Disposition ofWorkpapers

47

Appendix Additional Information 48

GAO/IMTEC-11.1.2 SAS DocumentsPage 5

Page 8: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 1

Introduction

Although SAS is flexible, multipurpose, and easy touse, it can easily be misapplied, the resultsmisinterpreted, and errors left undetected. This guidestresses the importance of

• knowing the underlying statistical principles and howto interpret results,

• understanding the structure and characteristics of thedata you will be using,

• being familiar with SAS procedures and appropriatelyapplying SAS options and modifiers, and

• being careful to specify the correct parameters forSAS procedures.

Flexibility SAS software works in a variety of user interfaces andexecution methods—batch, interactively with linenumbers, interactively with display managerwindows, and interactively with menus. Each of theseinterfaces and methods presents distinct challengesto build quality control into SAS products. Certaininterfaces and methods, although convenient and easyto use, are difficult to document. This guide provideshints on when each mode is appropriate and how touse it.

Suitability This guide suggests what SAS procedures areappropriate for an assignment and when othersoftware would be more effective than SAS. SAS wasdeveloped to give analysts one software system tomeet a wide variety of computing needs. In additionto providing data retrieval, modification, reportwriting, and statistical analysis procedures, SASsoftware provides graphics, operations research,forecasting, and business analysis tools. SAS alsoprovides interfaces to popular data bases,menu-driven facilities for data manipulation, facilities

GAO/IMTEC-11.1.2 SAS DocumentsPage 6

Page 9: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 1

Introduction

for interactive applications, and a matrixprogramming language.

Ease of Use Because of SAS syntax and user interfaces, SASprocedures are easy to use. One method of using SASis to simply check blocks on a menu. However,appropriately using many SAS procedures requires atechnical knowledge of the areas addressed by thoseprocedures. Many procedures require an extensiveknowledge of statistics, operations research, andeconometrics. In addition, some of the user interfacesdo not provide proper GAO workpaperdocumentation.

Order of Topics The sequence of topics in this guide follows thenormal order of assignment tasks:

• planning work that may involve SAS;• ensuring correctness in SAS work;• entering data into SAS from raw data formats, SAS

data files, other software formats, and data bases;• transferring SAS data between computers;• documenting SAS work;• referencing SAS work; and• storing SAS workpapers and files.

GAO/IMTEC-11.1.2 SAS DocumentsPage 7

Page 10: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 2

Planning

Using SAS to support an audit assignment requirescareful planning. To correctly decide if you shoulduse SAS and, if so, how, you must have explicitlydefined the assignment’s objectives and the specificquestions to be answered. Planning to use SASinvolves

• defining the analysis techniques you will use,• determining the source of data,• estimating staff proficiency in SAS and statistics,• determining available computer resources, and• determining available support material.

The result of this planning effort is an initial SASwork plan. This plan, developed after defining auditobjectives and specific information needs, should beincluded as part of the overall audit plan or evaluationdesign.

Deciding to UseSAS

The GAO Project Manual describes significantcharacteristics of various computer softwarepackages available to GAO staff. Selecting the mostappropriate package(s) depends on matching theassignment’s needs with a software’s strengths. Thefollowing questions will help you decide if SAS is themost appropriate package.

What AuditTechniques Do YouNeed?

SAS works well for data retrieval, statistical analysis,operations research, econometrics, and qualitycontrol problems and may be appropriate when theassignment calls for these techniques. For example, ifthe assignment calls for a statistical analysis, SAS is atop candidate. However, if simulation models are asignificant portion of the assignment, SAS would notbe the best choice because of the extensiveprogramming effort required. Also, although SAS hasextensive analytical graphic capabilities, SAS graphics

GAO/IMTEC-11.1.2 SAS DocumentsPage 8

Page 11: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 2

Planning

do not meet GAO’s visual communications standards.Therefore, do not use SAS graphics for presentation.

Instead, use GAO packages such as Instant Chart orText Frame.

SAS may be used in conjunction with other products.For example, if your assignment calls for creating arelational data base from which selected items needcomplex statistical analysis, you could select a database package, such as dBASE, to gather and structurethe data and then use SAS to perform the statisticalanalysis.1 If you need to collect data in the field, youmay find it convenient to use a data collectionpackage that will run on a laptop computer and laterconvert the data for analysis in SAS.

How Are AgencyData Stored?

In many assignments, existing data sources areneeded to satisfy assignment objectives. Convertingexisting data from one format to another involvesadditional programming and risks that are bestavoided. If your assignment needs existing data thatare stored in a SAS format, you should copy the SASdata file and use it without converting it to anotherformat. Conversely, if the data are stored as an SPSSdata file, SPSS is a better choice than SAS, all otherfactors being equal.2

Is the StaffProficient in UsingSAS?

To meet the staff qualifications requirement of theGovernment Auditing Standards, the assignment teammust collectively have the skills to use SAS. Thesupervisor or a designated person has to be able toreview the SAS programming and understand the

1dBASE is the registered trademark of Borland International, Inc.,for data base software.

2SPSS is a registered trademark of SPSS, Inc., for its computersoftware product.

GAO/IMTEC-11.1.2 SAS DocumentsPage 9

Page 12: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 2

Planning

principles behind the SAS procedures. In addition, thereferencer must be able to reference workpapersproduced with SAS. If the referencer does not possessthe necessary skills, a person independent of theassignment and familiar with SAS must review thework.

Does the AssignmentTeam Have Accessto ComputerResources Capableof Processing SASPrograms?

You need SAS software to interpret your SASprograms and to process your data. Since SAS is soldas modules, you must have the appropriate modules.For example, if your assignment calls for economictime series analysis, you will need the SASeconometric and time series module.

The SAS software must reside on a computer withenough capacity to process your data. If a largeagency data file is to be processed, you need access toa mainframe. If a small questionnaire is to beprocessed, access to a microcomputer with SAS issufficient. Depending on your processing needs andresources, you may use a combination.

Not all of GAO’s microcomputers have enough speedand storage capacity to run SAS effectively. As aminimum, you need a computer with 20 megabytes offree hard disk space. The SAS Institute recommends a“286” or better computer.

Besides an appropriately sized computer for SASsoftware, other factors should be considered. Forexample, if you need to process classified data, thecomputer must be certified for processing classifieddata. If you need computer graphics, the computermust have access to a graphics printer.

GAO/IMTEC-11.1.2 SAS DocumentsPage 10

Page 13: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 2

Planning

Does the AssignmentTeam Have Accessto SAS SupportMaterial?

SAS reference manuals must be available. Thesematerials are needed during the assignment toproperly implement and understand SAS procedures.While the SAS “help” and the SAS “assist” facilitiesprovide limited assistance, you need the SASreference manuals for a complete understanding. Ifmanuals are not available, they should be purchased.

SAS Work Plan As encouraged by the Project Manual, yourassignment team needs to decide on an initial SASwork plan. This plan should be a part of the overallassignment plan or evaluation design. The SAS workplan is important since it outlines the methodologies,programs, and procedures that are needed and showswhat data will be captured and verified. It logicallyties together the information needed, programs, datafiles, and reports.

The SAS work plan may include a process flow chartor diagram supported by a written narrativeexplaining the relationship among major programs,procedures, windows, data files, manual procedures,and reports. Consider the following items for yourplan:

• how data relevance and reliability will be gathered,transferred, or converted to a machine-readable form;

• how data will be checked;• when, if ever, a permanent SAS data file will be

created;• when and how data will be transferred between SAS

and other programming packages;• what major analysis procedures will be used and how

they relate to the assignment objectives;• how programs will be tested and reviewed;• what major reports and data files will be produced;

and• who will be responsible for the analysis and review.

GAO/IMTEC-11.1.2 SAS DocumentsPage 11

Page 14: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 2

Planning

Upon arriving at your SAS work plan, you may haverejected many alternative approaches. You shoulddocument the rejected approaches in the plan. Thiswill help you to justify your decisions and, ifconditions change, help you to revise the plan.

GAO/IMTEC-11.1.2 SAS DocumentsPage 12

Page 15: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

Quality evidence and sound analysis are necessary tosupport findings, conclusions, and recommendations.To ensure the quality of your evidence and thesoundness of your analysis in SAS-based work, youmust use data-checking techniques and programverification procedures that are tailored to SAS. Thissection discusses a variety of such techniques andprocedures.

Checking YourData

The Government Auditing Standards suggestschecking the relevance and reliability of data. Datareliability can be determined by conducting a reviewof the general and application controls incomputer-based systems or by conducting other testsand procedures. SAS programs are an excellent toolfor checking data since SAS contains procedures thatcan easily highlight potentially invalid data.

The nature of your data checks depends largely onhow you answer the following questions:

• What is the source of your data file? That is, did theagency supply the data file, did you develop it, or wasit program generated?

• How is the data file structured? Does the file haveembedded structure information relating records?Does the file consist of records of a single recordtype?

• How are the individual data elements stored? Are theystored as numbers or as alphabetic characters?

• How many data values are associated with each dataelement? In other words, is the data elementcontinuous, or is it categorical with a limited numberof values?

• Can you check your data against other available data?

GAO/IMTEC-11.1.2 SAS DocumentsPage 13

Page 16: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

The extent of checking will depend on the riskassociated with using the data. The followingsuggestions will help you use SAS to check data.

ReasonablenessChecks

• Extreme values in numeric data variables mayindicate invalid data. For example, the indicated agesof the youngest and oldest employees may be errors.You can identify extreme values with the PROCUNIVARIATE procedure. This procedure will reportthe five minimum and five maximum values of anydata variable and will identify the observations fromwhich the data came. Code the request in thefollowing manner:

• Many numeric variables have only a limited range ofvalid values. For example, the age of an employee isnormally greater than 15 years and less than 80. Youcan identify data outside an acceptable range with theIF ... THEN PUT ... statements. Information on thepotentially invalid values will appear on the SAS logfile. Code the request in the following manner:

GAO/IMTEC-11.1.2 SAS DocumentsPage 14

Page 17: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

• Categorical data normally have few valid values. Forexample, the gender of a person coded other than Mor F is a potential error. You can identify all values ofa categorical variable in a data file using the PROCFREQ procedure. PROC FREQ counts the number ofobservations for each and every value of the variable.Code the request in the following manner:

• You can identify invalid categorical data and theobservations from which the data came with the IF ...THEN PUT ... statement and the IN function. Code therequest in the following manner:

GAO/IMTEC-11.1.2 SAS DocumentsPage 15

Page 18: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

• SAS will ensure that numeric data are numeric andthat dates and times are valid. SAS does this whilereading raw data. If bad numeric data or invalid datesand times are found, SAS will report the data on theSAS log and will assign a missing value to the variablein the observation. For each observation with anerror, SAS will display the entire input record, thevalues assigned to each variable, and error messages.Do not suppress error-checking messages. This isimportant in checking the reasonableness of yourdata.

• SAS will help you check the consistency between dataelements. Elements with conflicting values can bechecked with IF ... THEN PUT ... statements. Code therequest in the following manner:

GAO/IMTEC-11.1.2 SAS DocumentsPage 16

Page 19: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

• SAS will help you select a sample of data for you tocompare for consistency with your source file. Forexample, you may want to select 10 observationsfrom the beginning of the file, 10 observations fromthe end of the file, and 5 percent randomly from a SASfile of 10,010 observations. You can use an IFstatement as follows:

Missing Data Checks SAS has two general types of data—numeric andalphanumeric. For numeric data, which include datesand times, SAS represents missing values with aperiod. For alphanumeric data, SAS representsmissing characters with a blank.

GAO/IMTEC-11.1.2 SAS DocumentsPage 17

Page 20: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

• SAS will check for missing data in numeric variableswith the PROC MEANS or PROC UNIVARIATEprocedure. These procedures will report the numberof observations with missing values. Code the requestin the following manner:

• You can identify missing numeric data and theobservations in which they appear using the IF ...THEN PUT ... statement. Code the request in thefollowing manner:

• SAS will check for missing data in categoricalvariables every time you use the PROC FREQprocedure. In the following example, all values,including missing values, will be identified. Code therequest in the following manner:

GAO/IMTEC-11.1.2 SAS DocumentsPage 18

Page 21: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

• You can identify missing alphanumeric data and theobservations in which they appear using the IF ...THEN PUT ... statement. Code the request in thefollowing manner:

Record ErrorChecks

• SAS will help you guard against losing observationsby counting the number of records read and thenumber of observations created in a newly createddata file. This information will appear on the SAS logfile. You can check the number on the SAS log fileagainst the expected number.

• SAS can help you locate unexpected duplicateobservations. You can locate duplicates by firstsorting the file with a PROC SORT procedure andthen using an IF ... THEN PUT ... statement with the

GAO/IMTEC-11.1.2 SAS DocumentsPage 19

Page 22: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

LAST and FIRST modifiers on the variable. Code therequest in the following manner:

If duplicate observations exist, they will not be boththe first and last observation with a particular value.For the above example, every duplicate will be listedon the SAS log.

The PROC SORT procedure using the NODUPKEYmodifier eliminates observations with the same sortkey. However, this method does not indicate whichobservations were deleted and should not normallybe used in GAO work.

Another way to locate duplicate observations is to usethe LAG function. First sort the file with a PROCSORT procedure and then using an IF ... THEN ... PUTstatement and the LAG function code the request inthe following manner:

GAO/IMTEC-11.1.2 SAS DocumentsPage 20

Page 23: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

• When an individual observation is structured asmultiple records or lines, you can resynchronize theinput after a missing record or a missing line by usingthe LOSTCARD statement. The LOSTCARD statementreports unexpected changes in the record identifierand is used in conjunction with the IF ... THEN PUT ...statements. The input file must be ordered by therecord identifier. In the following example, the inputdata file has two records per observation:

GAO/IMTEC-11.1.2 SAS DocumentsPage 21

Page 24: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

Checking YourProgram

A program is reliable if it is performing as expected.SAS programs depend on several simple practices toenhance program reliability and error detection andto reduce the time to correct errors.

Before writing a SAS program, you should establishits specific purpose. Include a general description ofyour program and the techniques you will use toverify that the program is working correctly. Thesedetails will assist in selecting the appropriate SASstatements and procedures, as discussed below.

Program Structure • Dedicate a program to one main purpose. This mayseem inefficient, which it is for production programswhere programs are run repetitively. However, bylimiting your program to one purpose, it is easier totest, correct, and understand. If you have a complexanalysis that takes several steps to achieve onepurpose, breaking up the analysis into more than oneprogram may make more sense.

GAO/IMTEC-11.1.2 SAS DocumentsPage 22

Page 25: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

• Preprogrammed procedures and built-in formats aremore likely to be error free than programs andformats you write. If a preprogrammed procedure orbuilt-in function exists and fits your analysis needs,use it. For example, the descriptive statisticsproduced by the PROC MEANS procedure can beduplicated using a DATA step; however, the PROCMEANS procedure takes less time to program andgives reliable results.

• SAS statements and procedures are reliable, but theyhave limitations and known errors. Before writingSAS programs with unfamiliar statements or beforeusing unfamiliar procedures, read the appropriatesections of the SAS manuals and check the usagenotes for any limitations.

• Keep SAS data steps simple and straightforward byusing standard programming constructs which areeasy to follow. Avoid complex and unstructuredprogramming which is characterized by excessiveGOTO statements, too many levels of nested DOloops, self-modifying code, excessive interactionbetween modules, and multiple modules performingthe same function.

• You should let programs run to their logical end. Usemodular programming constructions like the DOWHILE and DO UNTIL statements. Each constructionshould have only one entry and only one exit point.

Readability • Individual data steps and procedure steps arelogically distinct parts in a SAS program and shouldbe visibly separated in a program by at least oneblank line and a description at the beginning of a newprocess. A RUN statement at the end of each step willvisibly separate SAS programming steps and willassist in correcting errors.

• Put a program description near the beginning of everySAS program. Put a description of the SAS data stepor procedure step before every major logical

GAO/IMTEC-11.1.2 SAS DocumentsPage 23

Page 26: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

grouping. At the beginning of the program, you cancreate a comment block with the followinginformation: job title and assignment number,programmer’s name, program name and dateprepared, program description or purpose, and datasource. For example:

At the beginning of each major logical grouping in theSAS program, create a comment block containing thepurpose of the following step. For example:

GAO/IMTEC-11.1.2 SAS DocumentsPage 24

Page 27: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

Do not explain the meaning of SAS procedures orstatements in the program since they are explained inSAS manuals.

• Write meaningful comments into the source code.This will help the reviewer understand your logic.

Comments should be easily distinguished from theprogram code. To write comments directly into SASprograms, use the * statement or the /* */ commentdelimiter. For larger comments, draw boxes ofasterisks around them. Avoid burying the /* */comment delimiters within a SAS statement since thedelimiters can make the program harder to read.Avoid using the /* in the first columns of a programbecause certain IBM operating systems will confusethe delimiter with the job termination statement.1

• Put global titles near the beginning of the program.The first title line, TITLE1, will contain theassignment name; the second line will contain theassignment code. These titles will remain constantthroughout the SAS program. Every procedure thatproduces output will have an associated title whichexplains the output. For example:

1IBM is the registered trademark of International BusinessMachines Corporation.

GAO/IMTEC-11.1.2 SAS DocumentsPage 25

Page 28: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

• When possible, enhance SAS output with descriptivevariable labels and descriptive data formats. TheLABEL statement associates a variable name with alabel. The FORMAT statement associates data valueswith a description.

• Correcting errors and understanding data stepprograms is easier if logical groupings are readilyapparent. Likewise, correcting errors andunderstanding procedures is easier if the proceduremodifiers and options are logically arranged. Theexample below illustrates the lack of logicalgroupings in a data step:

GAO/IMTEC-11.1.2 SAS DocumentsPage 26

Page 29: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

The example above is rewritten below to show theadvantage of logically grouping statements in a datastep.

• Indentation will also help in understanding andcorrecting programs. For example, indent allstatements in a DO group at least one column fromthe word DO. Put the corresponding END statementon a separate line and indent it at least one columnfrom the word DO, as follows:

GAO/IMTEC-11.1.2 SAS DocumentsPage 27

Page 30: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

As another example, indent all statements in aSELECT group at least one column from the wordSELECT. Put the corresponding END statement on aseparate line indented one column from the wordSELECT. Indent the OTHERWISE statement onecolumn from the SELECT statement.

Indent all statements in a MACRO routine at least onecolumn from the MACRO header line. Put thecorresponding MEND statement on a separate lineindented one column from the label.

GAO/IMTEC-11.1.2 SAS DocumentsPage 28

Page 31: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

• Align items for readability. The following twoprogram examples look exactly the same to SAS, butthe latter is more readable.

• The IF statement may be written on one line. Whenmore than one line is required, put the word THENand the action clause on the line or lines following thecondition clause. Indent the line or lines at least one

GAO/IMTEC-11.1.2 SAS DocumentsPage 29

Page 32: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

column from the IF statement. Indent the ELSEstatement at least one column from the IF statementon a separate line.

• The SAS MENU facility creates a special problemsince its spacing is very different from what this guidesuggests. When using code written by the SAS MENUfacility, realign the code to comply with standardindentation and spacing.

• Use descriptive and consistent program names. Ifpossible, relate the first part of the program name tothe assignment or assignment segment. Use theextension “SAS” for the last level of the programname to indicate that the program is a SAS program.For example, MEDCST01.SAS is program number onefor the medical cost review.

• File references are more understandable if they havemeaningful names. Personal names of assignmentteam members are not acceptable. It is sometimesconvenient to name the SAS file reference the same asthe external file.

• Library references are more understandable if theyhave meaningful names. The library reference may bethe same as the external file name.

• Programs are more understandable if variables havemeaningful names or are related to other workpapers.For questionnaires and data collection instruments,you should tie variable names to items on thequestionnaires. For example, a variable named Q12bwould correspond to question 12b. For data comingfrom agency files, you can relate a variable to anagency data base. For example, a variable namedENTRY123 would correspond to a variable in anagency data dictionary.

In certain situations, variable names can be related tothe nature of an item. For example, a variable namedAGE can be used for a person’s age. Since SASvariable names are limited to eight characters, thismethod of naming variables is not appropriate when

GAO/IMTEC-11.1.2 SAS DocumentsPage 30

Page 33: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

the variable cannot be described adequately withineight characters.

• Whenever possible, spell out comparison operatorsrather than using special symbols. For example, useLT rather than the < symbol. Printers are inconsistentin the way they print special symbols, and mnemonicsare easier to understand than symbols. Also, symbolsdo not have the same meaning in all programminglanguages. For example, <> means the maximum oftwo values in SAS, while it means not equal in BASIC.

• Operator priority in a complex expression can beconfusing. When this is the case, use parentheses toshow the order in which operations are performed.

• Use decimal numbers, whenever appropriate, since inmost situations they are more widely used andunderstood than binary and hexadecimal numbers.

• Avoid nonstandard coding schemes. The FederalInformation Processing Standards Publications are agood source of standard codes on which many SASfunctions and formats are based. For example, all SASstate code functions and United States maps arebased on the Federal Information ProcessingStandards for state abbreviations.

Logic and StatisticalCorrectness

• Explicitly document options you use when they differfrom the standard normal default settings. Use theOPTIONS statement to set options. Unexplainedoption selections may change the way a SAS programoperates.

• When defining categories or groupings for SASprocedures or programs, make sure you haveexplicitly defined the categories as mutually exclusiveand collectively exhaustive. When using the IF andSELECT statements, make sure all possibilities arestated and the entire range of values is covered. Becareful that the end points of the categories reflectassignment needs and do not overlap.

GAO/IMTEC-11.1.2 SAS DocumentsPage 31

Page 34: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

Be explicit about all conditions, even those that maybe unusual. Use the ELSE statement with the IFstatement. Use the OTHERWISE statement with theSELECT statement. Use the special rangenames—HIGH, LOW, OTHER—in the FORMATprocedure. In the following examples you aredistinguishing between gender and allowing forunusual responses.

• Statistical procedures may be defined for eithersamples or universes. When sample statistics aredesired, use the correct options. For example, inPROC MEANS, the option VARDEF=DF requests thatthe procedure use the proper degrees of freedom forcomputing the sample variance.

• Frequently, one SAS observation represents multipleobservations. To deal with these situations, SAS hastwo statement modifiers—FREQ and WEIGHT. Donot confuse them. FREQ is used for integer weighing.WEIGHT is used for any value greater than zero.

GAO/IMTEC-11.1.2 SAS DocumentsPage 32

Page 35: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

• SAS procedures do not calculate a weighted skewnessor kurtosis. The WEIGHT statement affects only themean, variance, and related statistics.

• SAS procedures use standard algorithms forcomputing sample statistics. SAS also providesoptions to test for normal distributions of the data.You must decide whether the standard algorithms areappropriate. The UNIVARIATE procedure’s NORMALoption will test if your data values approximate anormal distribution, but you must decide whichstatistics are valid.

• Program explicitly for missing and miscoded data. Inthe FORMAT procedures, use the OTHER modifier toallow for missing and miscoded data. In proceduresteps, select the MISS or NOMISS modifiers asneeded. Use the SUM statement when missing dataare to be excluded from the calculation.

Program Testing • To ensure that your program is logically correct,review each program step to ensure there are no logicerrors. If the program is large or quite complex, askyour supervisor to review the program with you indetail. If your supervisor needs assistance, aknowledgeable colleague can help.

• Test SAS procedures. One method is with small testfiles, comparing the SAS results with known results.Include bad, missing, and valid data in your test files.Another method is independently testing resultsmanually or through another package, againcomparing the two results. Test results should bedocumented and included in the workpapers, as wellas an explanation of your choice in the use of optionsand modifiers.

• Once a program receives approval, do not modify itunless the modification is approved.

Efficiency • Although effectiveness rather than efficiency is theprimary concern in our programming environment,avoid excessively inefficient programming. Minimize

GAO/IMTEC-11.1.2 SAS DocumentsPage 33

Page 36: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

computer time by not repeatedly reading data. Forexample:

is preferable to

In the second case, SAS must read the file twice.

• Avoid repeating complex and time-consumingcalculations and file sorting. After making acalculation that you intend to use in a later program,save the calculation as a variable in the SAS data file.Avoid repeatedly sorting a file by saving the file in theorder that will be most useful in future programs.Indexing may also be efficient.

• Conserve computer storage by eliminating variableswith no further use.

• SAS programs may be more efficient if theytemporarily limit the number of variables beingprocessed. This is accomplished with the DROP or

GAO/IMTEC-11.1.2 SAS DocumentsPage 34

Page 37: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 3

Checking Your Data and Sas Programs

KEEP options attached to the SET statement. Forexample:

• For numeric variables requiring less than 16 digits ofprecision, you can reduce the space SAS normallyuses to store data by using the LENGTH statement.

GAO/IMTEC-11.1.2 SAS DocumentsPage 35

Page 38: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 4

Data Entry and Transfer

The method you use to read data into a SAS data setwill depend on the data source. In every case, youmust verify that the data were entered correctly.These requirements are discussed in the GovernmentAuditing Standards. This section explains how toimplement those requirements with SAS.

Raw Data In the data step, the INPUT statement is used forreading raw computer data. Most data from agencies,questionnaires, and data collection instruments areconverted to SAS with the INPUT statement. SAS canread any file and record structure, no matter howcomplex. The following points will help you input thedata:

• To reliably enter data, use a complete description ofthe data file, including a file structure, a recordlayout, and data descriptions for at least the criticaldata elements.

• The reliability of the data entry program is critical.Compare the resulting values of the SAS data set withraw data to ensure that data conversion is reliable. Ifthe data entry program is at all complex, test it with atest file and document the results.

Sources of errors may be (1) incorrectly specifying adata field starting location and length,(2) misinterpreting the format of the raw data, and(3) exceeding the field length of a SAS variable.

• Document any differences between the raw data fileand the newly created SAS file. If records are deleted,document the number of deleted records and explainwhy they were deleted. The raw data record countmust equal the sum of the SAS observations createdand the records deleted. Data base segments must becounted when the raw data file is a data base.

GAO/IMTEC-11.1.2 SAS DocumentsPage 36

Page 39: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 4

Data Entry and Transfer

• Fixed-format data entry should be used in preferenceto free formatted and named data entry.Free-formatted data entry can lead to misread andskipped data. When free-formatted data entry is theonly alternative, you should count the number of dataelements read for each observation.

• Maintain consistency between the variable and itsformat. To do this, completely define the variableattributes. Use either the ATTRIB statement with theFORMAT, INFORMAT, LABEL, and LENGTHspecifications or individual statements for FORMATand LABEL with the input format defined in theINPUT statement. When using the INPUT statementin the data step, use explicit input formats and readcontrols. Avoid suppressing input error reporting.

• The LENGTH statement is especially important forcharacter variables since letting SAS decide theirlength could cause truncation. The maximum lengthof a SAS character variable is 200 characters. If youneed a longer character variable, you have to eithersplit the variable into parts or use anotherprogramming package.

The LENGTH statement is also used to specify thenumber of bytes set aside to store the numericvariables. The maximum length and the default lengthof numeric variables in SAS is eight bytes, whichequates to the integer 75,057,594,037,927,935. If youneed more precision, you will have to split thevariable into parts or use another programmingpackage.

• Be careful with short records. When necessary, usethe MISSOVER modifier on the INFILE statement.Use the LOSTCARD statement to synchronize data onmulticard input.

GAO/IMTEC-11.1.2 SAS DocumentsPage 37

Page 40: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 4

Data Entry and Transfer

SAS Data Some agencies maintain data in a SAS file format. SAScan read these files without error. However, be awareof the following points:

• You must have a complete description of the data fileand data descriptions for the critical data elements.

You must also get a copy of agency-written SASformats. Use the PROC CONTENTS procedure for adescription of the file and how it was created.

• You must know the SAS version used to create thedata file and how it was exported.

Data FormatsFrom OtherStatisticalSoftwarePackages

The file you need may be in the native format ofanother statistical software package, such as SPSS orBMDP.1 While SAS can read these files, you shouldkeep the following points in mind:

• Be careful that the conversion maintains consistencybetween the value of the variable and its format. Becautious of variable name conversions, truncation oflong character variables, redefinition of missing data,changes in date conventions, and changes due toformat inconsistencies. These potentialinconsistencies are documented in the language andprocedure manuals.

• Have a complete description of the agency’s data file,including a file description and data descriptions ofall critical data elements.

• Test and document the reliability of the dataconversion process. Trace the data from the otherstatistical package’s data file to the SAS data. Sourcesof error may be mistakes in converting the variablename, field width changes, and data formatdifferences. Date formats are especially troublesome.

1BMDP is a registered trademark of BMDP Statistical Software, Inc.

GAO/IMTEC-11.1.2 SAS DocumentsPage 38

Page 41: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 4

Data Entry and Transfer

• The data entry program must read every observationin the other package’s data file and create a SASimage. If records are deleted, the number of deletedrecords should be reported. The input observationrecord count should equal the sum of the SASobservations created and the records deleted. Database segments should be counted when the raw datafile is a data base.

Data From DataBase Packages

Some agencies maintain data using data basepackages, such as dBASE. SAS can read these fileswith the PROC DBF procedure for dBASE files andwith special procedures for other data bases. Toconvert data from data base packages, you should beaware of the following points:

• Have a complete description of the agency’s data fileand data descriptions for the critical data elements, aswell as a description of the file structure.

• Test and document the reliability of the dataconversion programs. Sources of error may bemistakes in record segment pointers and incorrectSAS record structures.

SAS-GeneratedData

SAS can generate data internally in data andprocedure steps. Data generated in this manner musthave the same level of reliability as data from externalsources. Several points deserve special mention.

• GAO has approved the use of the SAS RANUNIrandom number generation function and subroutineto generate uniform random numbers for sampleselection. When you need more than one stream ofrandom numbers, use the subroutine rather than thefunction.

• Be careful to use the correct random number functionsince SAS offers several functions with different

GAO/IMTEC-11.1.2 SAS DocumentsPage 39

Page 42: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 4

Data Entry and Transfer

output distributions. Avoid the UNIFORM randomnumber generation function.

• Check character data created from charactersubstrings for leading and trailing blanks.

• Numeric data created from mathematical calculationsare subject to rounding errors. The ROUND and FUZZfunctions will eliminate rounding errors at specifiedrounding units.

On-Line SAS DataEntry

You can manually enter data directly into SAS datasets using the CARDS option, direct variableassignment, and full screen products, such as PROCFSEDIT. Data entered in this manner must have thesame level of reliability as data provided by othermethods. Also, you must be careful to create a solidaudit trail. Guidelines specific to this method of dataentry follow:

• Avoid the CARDS and CARDS4 methods of data entrysince they require the data and the SAS program to beready, reviewed, and approved at the same time.These methods are difficult to use because theprogrammer is not normally the same person whoenters the data. In addition, the SAS log file does notshow the data lines in the program, so the programmust be included in the workpapers to document theraw data.

• The SAS assignment statement is not a good way toenter a lot of data. It is normally used in a batch modefor correcting data in an existing SAS data file. Aprogram using the assignment statement mustcarefully document the data.

• Data can be entered directly into SAS files with thePROC FSEDIT procedure or by creating data entrywindows. These methods have the advantage ofimmediate data editing, but a danger exists in that thedata file can be lost or inadvertently modified. Always

GAO/IMTEC-11.1.2 SAS DocumentsPage 40

Page 43: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 4

Data Entry and Transfer

make a backup copy of the data file before using thismethod.

• The PROC FSEDIT procedure does not have a log fileof the changes made to the data base. Use the PROCCOMPARE procedure to compare values between theold and new data files, and review the output report.

Transferring andMoving SAS DataSets

Transferring and moving SAS data sets is reliable, butspecific methods must be used to retain the integrityof the files. Several methods are available, asdiscussed below:

• When copying SAS data files within a computersystem, use the PROC COPY procedure. Do not usesystem utilities, such as the mainframe computer’sIEBCOPY program or the microcomputer’s COPYprocedure. These utilities will not maintain theintegrity of the SAS file.

• You can use the interactive CATALOG window tocopy catalog entries from one catalog to another.Catalog entries include windows, formats, indices,and many other SAS items. You can use theinteractive DATASETS procedure to copy SAS datafiles.

• When copying files between remotely linkedcomputers running SAS, use the PROC UPLOADprocedure or the PROC DOWNLOAD procedure.

• When copying SAS files between computer systems,use the PROC COPY procedure with the IMPORT orEXPORT options.

GAO/IMTEC-11.1.2 SAS DocumentsPage 41

Page 44: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 5

Processing and Documenting theResults

Documentation standards for SAS programs aresimilar to those for manual workpapers; that is, SASworkpapers should be complete, accurate, clear, neat,relevant, and understandable. The term“understandable” needs some clarification whendealing with SAS programs. SAS programs must beunderstandable to people with a general knowledgeof SAS. However, SAS documentation need not be acomplete explanation of SAS procedures andstatements since this information is available in theSAS language manuals.

Documentation of a SAS program helps ensure thequality of the product and provides the supervisor andthe referencer with an essential audit trail. The propertime to document your SAS program is duringprogram development since documentation also helpsyou ensure that your program is working correctly.

Documentation includes the SAS log file and theoutput. The documentation can also include, but isnot limited to, flow charts, test results, comments andtitles within the programs, and explanations ofmessages in the SAS log file.

All GAO workpapers should receive promptsupervisory review, and workpapers produced by SASare no exception. The supervisor should indicatereview and acceptance of the SAS program by signingon the first page of the program output. For outputfrom mainframe computers, the first page is the jobcontrol output log. For microcomputers, the first pageis the SAS log file.

The following suggestions help ensure that SASprograms will be developed with sufficientdocumentation and adequate review.

GAO/IMTEC-11.1.2 SAS DocumentsPage 42

Page 45: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 5

Processing and Documenting the

Results

• You are encouraged to use interactive programmingfor program development, but workpapers for thefinal programs should be run in the batch mode orwith a clean interactive run.

• SAS programs are easily modified. To accuratelydocument the running of a SAS program you mustretain the log file in the workpapers. SAS outputreports must never be separated from their log file.Formal run books are not needed. On mainframeversions of SAS, the job control language messagesshould also be retained.

• The log file must be complete, including all source filestatements. When calling external programs ormacros with the %INCLUDE statement, use theSOURCE2 option so that all source file statements arelisted on the SAS log file. When using macros, use theMPRINT and SYBOLGEN options to show the macroand to display how symbolic variables are resolved.

• Messages on the SAS log should be reviewed and,where appropriate, explained by the programmer. Logfile notes needing comment are data read errors,format conversion errors, format changes, missingvalue generation, record counts, and variable counts.

-Output reports must be complete. Use the defaultoptions for date and page numbers.

• Systems documentation must indicate final programnames, data file locations, and workpaper indexes.

• SAS files and user-written permanent formats can bedocumented with the PROC CONTENTS procedure.Use the PROC PRINT procedure to display a portionof permanent SAS data files.

• Run final test programs in the batch mode or asseparate interactive runs. The resulting workpapersrequire approval of a knowledgeable person,preferably the supervisor.

• If possible, limit procedure output to relevantinformation. For example, if cell percentages are not

GAO/IMTEC-11.1.2 SAS DocumentsPage 43

Page 46: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 5

Processing and Documenting the

Results

needed in frequency tables, suppress printing thepercentages with the NOPERCENT option. Whereextraneous material is part of the output, line it out.

• Avoid using the TITLES window since this method ofentering titles does not create an audit trail. Likewise,avoid using the FOOTNOTES window for enteringfootnotes for the same reason.

GAO/IMTEC-11.1.2 SAS DocumentsPage 44

Page 47: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 6

Referencing SAS Work

Full referencing is the preferred method for all GAOreports, testimonies, and other products, and materialbased on SAS is no exception.

To reference SAS-based information, the referencermust be able to determine that the workpapersprovide sufficient, competent, and relevant evidence.Referencers who are not familiar with SAS must seekassistance from someone who is. This person mustalso be independent of the assignment.

The following guidance will help the referencerreference SAS workpapers:

• The referencer is responsible for determining that theSAS workpapers are in compliance with GAOworkpaper standards. For example, the SASworkpapers need evidence of supervisory review andquality control checks. The position of the titles,preparer’s name, date, supervisor’s signature, andsource reference may vary, but they are required.

• The referencer must check the SAS runs forunresolved errors, unexplained changes in thenumber of variables, unexplained changes in thenumber of records, unexplained variable typeconversion, and unexplained generation of missingdata. The SAS log file will contain many of theseitems.

• The referencer must determine if the workpapersinclude cross-referenced system documentation;program documentation, including log files; datadocumentation and validity tests; and tests ofcomplex programs.

• The workpapers need sufficient instructions so thatthe referencer can duplicate the work. Instead ofmanually verifying every computation (which, in mostcases, would be impossible), the referencer mustensure that computer program logic has been tested,the data have been checked for reliability, and the

GAO/IMTEC-11.1.2 SAS DocumentsPage 45

Page 48: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 6

Referencing SAS Work

report item matches the supporting SAS output. Onrare occasions, the figures may be independentlyverified using another computer program.

• Factual data derived from SAS procedures must beaccurate and objective. The procedures must havecomments indicating the assumptions made and themethod used. Items to check will depend on theprocedures used. The following examples indicatestypical items to check:

• When referencing the PROC ANOVA procedure,check the use of an unbalanced experimental designand changes in the significance level from thestandard 0.05. Deviations should have writtenjustification.

• When referencing the PROC CATMOD procedure,check the design matrix specification to ensure it isconsistent with the results.

• When referencing the PROC FREQ procedure, checkthe inclusion or exclusion of missing data in marginalcomputations, the proper adjustment to Chi-squaretests, and warnings about minimal cell frequencies.

• When referencing the PROC UNIVARIATE procedure,check the proper specification for the divisor in thecalculating variance.

• When graphs are based on SAS procedures, thereferencer must ensure that the axes do not distortthe data and that the smoothing techniques do nothide important changes.

GAO/IMTEC-11.1.2 SAS DocumentsPage 46

Page 49: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Chapter 7

Disposition of Workpapers

Retention standards for SAS workpapers are the sameas those for manual workpapers. The method ofarchiving data and programs will depend on theamount of data and the facilities available forarchiving. Microcomputer data will most likely bearchived on a floppy disk. Mainframe computer datawill most likely be archived on magnetic tape.

You do not have to save every data file you create inthe course of a job. The files that are archived willdepend on how the data were created and what fileswould be time consuming to recreate. The followingitems should be considered in archiving SASprograms and data files:

• Use only SAS procedures, such as the PROC COPYprocedure, and DATA steps to make backup copies ofSAS data files. Never use operating system utilities onthese files.

• When saving data files, remember to include formats,windows, and indices.

• When the SAS file is likely to be used on anothersystem, backups should be saved as portable filesusing the EXPORT function.

• All final SAS program files must be saved. SASprograms should be saved as text files.

GAO/IMTEC-11.1.2 SAS DocumentsPage 47

Page 50: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Additional Information

The SAS Institute publishes an extensive list of SASsoftware guides which are customized for variousoperating systems and computer environments. A fewof the over 200 SAS publications are listed below.

Version 6 — MS-DOS and PC-DOS environment:

SAS Language Guide for Personal Computers, Release6.03 Edition

SAS Procedures Guide, Release 6.03 Edition

SAS/STAT User’s Guide, Release 6.03 Edition

SAS/GRAPH User’s Guide, Release 6.03 Edition

SAS/FSP User’s Guide, Release 6.03 Edition

Version 6 — AOS/VS, CMS, MVS, OS/2, PRIMOS, andVMS environments:

SAS Language: Reference Version 6

SAS/STAT User’s Guide, Version 6

SAS/GRAPH Software Reference, Version 6, volumes1 and 2

SAS/FSP Software Usage and Reference, Version 6

SAS Companion for the MVS Environment, Version 6

You will find additional guidance in thesepublications:

Federal Information Processing StandardsPublications, Department of Commerce, NationalBureau of Standards. Washington, D.C.: 1970-92.

GAO/IMTEC-11.1.2 SAS DocumentsPage 48

Page 51: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Additional Information

Using Micro Computers in GAO Audits: ImprovingQuality and Productivity. General Accounting Office,Information Management and Technology Division,Technical Guideline 1. Washington, D.C.: 1986.

Assessing the Reliability of Computer-ProcessedData. General Accounting Office, InformationManagement and Technology Division, GAO/OP-8.1.3.Washington, D.C.: 1990.

Project Manual. General Accounting Office, chapter10.1. Washington, D.C.: 1986.

Government Auditing Standards, 1988 Revision.General Accounting Office. Washington, D.C.: 1988

GAO/IMTEC-11.1.2 SAS DocumentsPage 49

Page 52: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

Ordering Information

The first copy of each GAO report and testimony

is free. Additional copies are $2 each. Orders

should be sent to the following address,

accompanied by a check or money order made

out to the Superintendent of Documents, when

necessary. Orders for 100 or more copies to be

mailed to a single address are discounted

25 percent.

Orders by mail:

U.S. General Accounting Office

P.O. Box 6015

Gaithersburg, MD 20884-6015

or visit:

Room 1100

700 4th St. NW (corner of 4th & G Sts. NW)

U.S. General Accounting Office

Washington, DC

Orders may also be placed by calling

(202) 512-6000 or by using fax number

(301) 258-4066, or TDD (301) 413-0006.

Each day, GAO issues a list of newly available

reports and testimony. To receive facsimile

copies of the daily list or any list from the past

30 days, please call (301) 258-4097 using a

touchtone phone. A recorded menu will provide

information on how to obtain these lists.

Page 53: Planning, Preparing, Documenting, and Referencing SAS ... · Although SAS is flexible, multipurpose, and easy to use, it can easily be misapplied, the results misinterpreted, and

United States

General Accounting Office

Washington, D.C. 20548-0001

Official Business

Penalty for Private Use $300

Address Correction Requested

Bulk Mail

Postage & Fees Paid

GAO

Permit No. G100