
SAS INTRODUCTORY GUIDE

Systems Introduction to SAS

REFERENCES

Ref  Doc ID  Title                                                   Date of Issue
S10  56101   SAS Companion For The MVS Environment Ver 6 1st Edit    11/93
S2   56076   SAS Language: Reference Ver 6 1st Edit                  09/95
S4   56075   SAS Languages and Procedures: Usage Ver 6 1st Edit      12/93
S5   56078   SAS Languages and Procedures: Usage 2 Ver 6 1st Edit    07/91
S6   56080   SAS Procedures Guide Ver 6 3rd Edit                     06/95
S3   56077   SAS Languages and Procedures: Syntax Ver 6 1st Edit     09/94
S7   56041   SAS Guide To Macro Processing Ver 6 2nd Edit            06/95
S8   56042   SAS Guide To VSAM Processing Ver 6 1st Edit             08/92
S9   56150   SAS A Guide To Efficient SAS Processing Ver 6 1st Edit  11/92

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 WHAT IS SAS?
   1.2 TOOLS
   1.3 PROGRAM STEPS
   1.4 SAS DATASETS

2. WRITING A SAS PROGRAM
   2.1 PROGRAM STEPS
   2.2 SEQUENCE OF PROCESSING
   2.3 DATA TYPES
   2.4 COMMENTS

3. THE DATA STEP
   3.1 INFILE STATEMENT
   3.2 INPUT STATEMENT
   3.3 SET STATEMENT
   3.4 RECORD SELECTION
   3.5 CALCULATIONS AND ASSIGNMENTS
   3.6 KEEP AND DROP

4. THE PROC STEP
   4.1 GENERAL
   4.2 PROC PRINT
   4.3 PROC SORT
   4.4 PROC SUMMARY
   4.5 PROC CHART
   4.6 OTHER PROCEDURES
   4.7 SYSTEM OPTIONS

5. ADDITIONAL FACILITIES
   5.1 REPORT WRITING
   5.2 FUNCTIONS
   5.3 MERGING FILES
   5.4 COMPLEX FILES
   5.5 RECORDS WITH VARIABLE NUMBERS OF SETS
   5.6 PROGRAM CONTROL
   5.7 CREATING MORE THAN ONE DATASET
   5.8 RETAIN OPTION
   5.9 COUNTING AND SUM STATEMENTS
   5.10 LENGTH

1 of 48 07 April 2023


   5.11 FORMAT STATEMENT

6. DATES AND TIMES IN SAS
   6.1 INTRODUCTION
   6.2 INTERNAL REPRESENTATION
   6.3 FORMATS FOR DATES AND TIMES
   6.4 HOW TO USE THE FORMATS
   6.5 DATE AND TIME FUNCTIONS
   6.6 DATE AND TIME VALUES AS CONSTANTS
   6.7 A WORKED EXAMPLE

7. RUNNING YOUR SAS PROGRAM
   7.1 SAMPLE JCL FOR A SAS JOB

8. USEFUL TOOLS
   8.1 MISSING VALUES
   8.2 NEATER ARGUMENTS
   8.3 CHARACTER TESTS
   8.4 DO’S AND DON’TS
   8.5 GO TO
   8.6 LINK
   8.7 ARRAYS
   8.8 ARRAYS AND DOs
   8.9 BY-WAYS
   8.10 SELECTED OBSERVATIONS FOR TESTING
   8.11 A SLICK USE OF INDEX
   8.12 TROUBLE WITH SUBSTRING
   8.13 SAS MACRO LANGUAGE

9. ASPECTS OF INPUT
   9.1 TEST INPUT
   9.2 INPUTTING SEVERAL DATASETS
   9.3 MERGE WITHOUT BY
   9.4 OBSERVATION (END = )
   9.5 FIRST. AND LAST.
   9.6 DATASET OPTIONS

10. ASPECTS OF OUTPUT
   10.1 OUTPUT FROM DATA
   10.2 THE FILE STATEMENT
   10.3 FILE OPTIONS
   10.4 THE PUT STATEMENT
   10.5 PRINTING MULTIPLE COLUMNS
   10.6 OUTPUT TO EXTERNAL FILES
   10.7 PROC PRINTTO



STATISTICAL ANALYSIS SYSTEM (SAS)
INTRODUCTORY GUIDE

1. INTRODUCTION

This guide is designed to give a basic grounding in SAS. A reader who works through it from beginning to end should be able to cope with most SAS programs. The guide starts out at a basic level, then goes on to take the same topics further and to introduce some new ones.

1.1 WHAT IS SAS?

Like every other computer language, SAS is a way of writing instructions to a machine to handle data.

SAS is a high-level computer language. This means that it accepts comparatively simple instructions from the user, who does not need to be concerned with the details of their execution. In fact SAS is aimed at the user rather than the programmer. In practice this means that if you already use a language such as COBOL, you should try not to think how you would design COBOL code to meet a requirement, but rather how you can take advantage of the many built-in functions of SAS to do the job more quickly, with far less code, and in a form that is far easier to maintain.

If you have used PL/I you will probably find SAS code very familiar, because the syntax is virtually identical (this is because SAS originally compiled into PL/I), but the same lateral thought will be required of PL/I programmers.

1.2 TOOLS

SAS is a powerful language. Many different kinds of instruction are possible, giving the user access to an impressive range of information outputs.

The purpose of this guide is to teach a subset of the language which will be comparatively easy to remember and use. But the whole language is available and worth investigating further if you have special needs such as mathematical functions, statistical analysis, matrix handling, specially laid out reports, graphs, etc.

Some rules are included here only to make it easier to avoid errors. In fact, SAS is much less restrictive than some comparable languages, and with experience you will find that quite a lot can be left out without affecting the way SAS operates.

1.3 PROGRAM STEPS

A SAS program normally reads one or more input files and produces reports. You write your program as a series of separate steps; data is passed from one step to another on temporary workfiles, which disappear when the program run ends.



1.4 SAS DATASETS

The temporary files are known as SAS datasets. A SAS dataset differs from a normal (OS) file in that it contains not only the field names and values but also information for SAS about their structure and where the data came from.

SAS datasets can be created and used only from SAS programs.

A SAS dataset can also be made permanent if required.

2. WRITING A SAS PROGRAM

2.1 PROGRAM STEPS

A SAS program is made up of steps. Each step is made up of statements.

There are two kinds of step: a DATA step creates a SAS dataset (work file); a PROC step processes a SAS dataset. (PROC stands for procedure).

Here is a typical SAS program with two steps.

DATA WORK1;
   INFILE SIR;
   INPUT @1  PART   £CHAR15.
         @74 SLOC   £2.
         @90 MANRES £2.;
   IF MANRES = '02' & SLOC = 'DY';
   OUTPUT WORK1;
PROC PRINT DATA = WORK1;
   TITLE DEPT 02 PARTS;

Note: Each statement ends with a semicolon. You normally write one statement to a line, but several statements may be written on one line, or a statement can spill over several lines. The semicolon is decisive.

Lines are indented except for the DATA and PROC statements, which shows where the steps begin. This is not essential, but it helps when reading the program.

The DATA step creates a SAS dataset; the name of the dataset, in this case WORK1, appears in both the DATA and OUTPUT statements.

The INFILE statement defines the production file to be read. The INPUT statement tells SAS where to find the necessary fields on that file. The IF statement causes certain records to be selected, depending on the values of two of the fields.

When this is done and the selected records and fields are placed on the file WORK1, the PROC step is invoked to process it. PROC PRINT produces a printed report with the data in columns. The TITLE asked for will head each page.

This done, the dataset WORK1 will be erased, there being no further steps in the program.
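
To try the two-step pattern without access to the SIR file, a sketch along the following lines may help. The sample records, the shortened column positions and the use of DATALINES to hold the data inside the program are all assumptions made for illustration; $ below is the character that this guide shows as £.

```sas
* Sketch only: invented records stand in for the production file SIR;
DATA WORK1;
   INPUT @1 PART $CHAR8. @10 SLOC $2. @13 MANRES $2.;
   IF MANRES = '02' & SLOC = 'DY';   /* keep only matching records */
   OUTPUT WORK1;
   DATALINES;
BOLT-001 DY 02
NUT-0042 NF 02
WASHER-7 DY 03
;
PROC PRINT DATA = WORK1;
   TITLE 'DEPT 02 PARTS';
RUN;
```

Only the first sample record has MANRES equal to '02' and SLOC equal to 'DY', so the listing should contain a single line.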



2.2 SEQUENCE OF PROCESSING

A SAS program step looks at the input file record by record. Each statement is performed separately on every record, unless you state otherwise. For example:

IF MANRES = '02' & SLOC = 'DY';

causes SAS to look at the two fields MANRES and SLOC on every record of SIR.

Likewise:

OUTPUT WORK1;

directs each record selected to the work file. The single statement is to be performed as many times as there are records.
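
The record-by-record cycle can be made visible with a small sketch. It assumes invented inline data and relies on the automatic variable _N_, which SAS sets to the number of the current pass through the DATA step; the PUT statement writes one line to the SAS log on each pass. ($ is the character shown as £ in this guide.)

```sas
* Sketch only: every statement is executed once for each record;
DATA WORK1;
   INPUT @1 SLOC $2. @4 MANRES $2.;
   PUT 'PASS ' _N_ 'READ ' SLOC MANRES;   /* one log line per record */
   IF MANRES = '02' & SLOC = 'DY';
   OUTPUT WORK1;
   DATALINES;
DY 02
NF 02
DY 03
;
RUN;
```

The log should show three passes, although only the first record reaches WORK1.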

2.3 DATA TYPES

A field on a file can be either numeric or character.

A numeric field contains numbers only, numbers being made up of the digits 0 to 9 with optional decimal point, minus sign, and leading or trailing blanks.

A character field can contain any valid SAS characters; these include letters, digits, blanks, punctuation marks, and special symbols.

When you write a character value in a SAS program, enclose it in single quotes; e.g. MANRES = '02'.

SAS can do calculations on numeric values. It can also calculate with character values if they can be converted to numeric. For example '02' can be converted to 2, but 'DY' is not convertible to numeric and therefore not usable in calculations.
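
As a rough illustration of the conversion rule, the sketch below uses the values from the text. The variable names N1 to N3 and the explicit INPUT function call are additions for illustration and are not part of the original example.

```sas
* Sketch only: character values used in arithmetic;
DATA _NULL_;
   MANRES = '02';
   SLOC   = 'DY';
   N1 = MANRES + 1;          /* 02 is converted to 2, so N1 is 3 */
   N2 = INPUT(MANRES, 2.);   /* explicit conversion, N2 is 2 */
   N3 = SLOC + 1;            /* DY cannot be converted, N3 is missing */
   PUT N1= N2= N3=;
RUN;
```

SAS notes the automatic conversions in the log and reports invalid numeric data for SLOC, so the explicit form is usually the tidier choice.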

2.4 COMMENTS

You can include comments in a program to make it easier to understand. A statement that begins with an * (and ends with a semicolon) is taken as comment and is not processed by SAS. Or you can insert comments inside SAS statements wherever a blank is possible by using /* before and */ after each comment. (But do not place /* in columns 1 and 2, as this will be taken as a job delimiter).
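
A minimal sketch of both comment styles follows; the variable names are invented for illustration. The /* */ comments are indented so that /* never falls in columns 1 and 2.

```sas
* This whole statement is a comment and ends at the semicolon;
DATA _NULL_;
   A = 17;   /* a comment placed after a statement */
   B = A /* a comment inside a statement */ + 1;
   PUT A= B=;
RUN;
```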

3. THE DATA STEP

Note: The DATA and OUTPUT statements are described in "Program Steps".

The INFILE and INPUT statements described below enable you to read data from an OS file, i.e. any file other than a SAS dataset. If you are reading data from a SAS dataset, ignore these statements and use SET; see "SET Statement".



3.1 INFILE STATEMENT

Examples:

   INFILE SIR;
   INFILE TAC OBS = 50;

In the JCL (see "Running Your SAS Program") you will define the files which you want the program to read, and give a short name - the DD name - to each file.

The INFILE statement states the DD name of the file which is to be input. Various keywords may be added to the INFILE statement to describe the file being used; for example, if it is a VSAM dataset, add the VSAM keyword plus any other VSAM processing options required. SAS has interfaces to most database management systems on many platforms, including IMS, DB2, INGRES and ACCESS.

OBS = n causes SAS to read only the first n records from the input file. Suppose you are writing a program to read a file and produce a report which could be long. You are not yet sure what the report will look like or if you are making the right selection. Adding OBS = 50 will give you a test run using only 50 records from the file. Use this option when it could prevent waste of resources, and remove it when you are past the testing stage.
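
The effect of OBS = can be tried on a small scale with a sketch like the one below. It assumes that the records are held in the program itself (INFILE DATALINES) rather than in a file defined in the JCL; the dataset name TEST1 and the sample values are invented, and $ is the character shown as £ in this guide.

```sas
* Sketch only: OBS = 2 stops reading after the second record;
DATA TEST1;
   INFILE DATALINES OBS = 2;
   INPUT @1 DEPT $CHAR4. @6 CPUHRS 5.3;
   DATALINES;
D001 1.250
D002 0.500
D003 2.000
;
PROC PRINT DATA = TEST1;
RUN;
```

The listing should show only the D001 and D002 records. In a real job the INFILE statement would name a DD name such as TAC instead of DATALINES.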



3.2 INPUT STATEMENT

Example:

   INPUT @1  CHRG   5.3
         @6  CPUHRS 5.3
         @36 DEPT   £CHAR4.
         @48 PROJ   £CHAR4.
         @69 TIMOFF £CHAR4.;

The INPUT statement states which fields you wish to read from the input file. For each field you state:

1. its starting position on the record, using @,
2. a name for the field, and
3. the format of the field.

Fortunately you do not have to worry about the details of the INPUT statement, as a complete layout for all the fields can usually be obtained from the COBOL copy book by using the RPF XG.SASREC. You then just select the fields you need and copy their details into your INPUT statement. See “SAS MACRO LANGUAGE” for further information.

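Note that in the previous example, the fields CHRG and CPUHRS begin at positions 1 and 6 respectively on the record, and both are in decimal format. (A £ sign means a character format; all other fields are decimal. SAS manuals printed with the US character set show this same character as $.) Each occupies five columns. Where the record holds no explicit decimal point, the last three digits are taken as decimal places, so that 12345 is read as 12.345; a decimal point that is present on the record, as in 1.234, is used as it stands. DEPT, PROJ and TIMOFF are character fields, each four characters long.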

Copy out the positions and formats from the field list supplied, changing the field names if you wish. You need not write the fields in the same order as they appear on the field list.

The format indicator for a character field begins with either £ or £CHAR - thus either £15. or £CHAR15. specifies a field of 15 characters. The difference is that SAS automatically left-justifies a field input with £ only, while leaving a field input with £CHAR intact. So I recommend the £CHARn. format as generally safer. However, when writing variables using the PUT command, it is necessary to respecify the format in the command line, e.g.

   PUT @10 FIELD £CHARn.;

as otherwise SAS will truncate leading spaces even if the variable was read in as £CHARn.

Note also that every format description contains a full stop, and don't forget the semicolon at the end of the statement!
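
The difference between the two character formats can be seen with a sketch such as the following, which reads the same five columns twice. The variable names A and B and the sample record (two blanks followed by AB) are invented for illustration, and $ is the character shown as £ in this guide.

```sas
* Sketch only: the same five columns read with two character formats;
DATA _NULL_;
   INPUT @1 A $5. @1 B $CHAR5.;
   PUT '[' A $CHAR5. ']  [' B $CHAR5. ']';   /* brackets mark the field edges */
   DATALINES;
  AB
;
RUN;
```

In the log line, A should appear with AB at the left of its brackets, while B should keep its two leading blanks.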

3.3 SET STATEMENT

In the previous example, the input to the DATA step was an OS file. When this is the case, the INPUT and INFILE instructions are necessary.



If the input to the DATA step is a SAS dataset, however, omit the INFILE and INPUT statements, and replace them with a SET statement.

The syntax is

SET filename;

e.g. SET WORK1; The 'filename' will be a dataset (e.g. WORK1) created in an earlier DATA step.
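
A minimal sketch of one DATA step feeding another is shown below. The inline sample data, the dataset name WORK2 and the derived field MINUTES are assumptions for illustration; $ is the character shown as £ in this guide.

```sas
* Sketch only: the second DATA step reads the dataset built by the first;
DATA WORK1;
   INPUT @1 DEPT $CHAR4. @6 CPUHRS 5.3;
   DATALINES;
D001 1.250
D002 0.500
;
DATA WORK2;
   SET WORK1;               /* no INFILE or INPUT is needed here */
   MINUTES = CPUHRS * 60;
PROC PRINT DATA = WORK2;
RUN;
```

WORK2 carries over DEPT and CPUHRS from WORK1 without their positions or formats being restated, because that information is stored in the SAS dataset itself.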

3.4 RECORD SELECTION

A function of the DATA step is to select records from a production file or SAS dataset and produce a workfile which will be smaller and easier to handle. A typical retrieval will ask for only a small part of the entire data on the file.

The operative words are IF, THEN, and DELETE, together with the symbols

   =   equals      <   less than       &   and
   ¬   not         >   greater than    |   or

Of the examples below, ( a ) means delete all records for which TIMEON is less than TIMOFF. ( b ) means delete records where the date is not equal to 82300. In this case 82300 is written in quote marks because DAT is a character, not a numeric, field. ( c ) means delete records whose CPUHRS value is less than 1.0.

   ( a ) IF TIMEON < TIMOFF THEN DELETE;
   ( b ) IF DAT ¬= '82300' THEN DELETE;
   ( c ) IF CPUHRS < 1.0 THEN DELETE;

The following examples tackle the problem from the opposite standpoint, stating which records are to be retained (selected), not deleted (rejected). The last part of the sentence, which might have read something like "THEN HOLD" is omitted. Example ( d ) therefore means retain records for which TIMEON is greater than or equal to TIMOFF; it has exactly the same effect as example ( a ).

   ( d ) IF TIMEON >= TIMOFF;
   ( e ) IF CPUHRS >= 1.0;
   ( f ) IF TIMEON >= TIMOFF & CPUHRS >= 1.0;
   ( g ) IF MANRES = '02' & (SLOC = 'DY' | SLOC = 'NF');

Examples ( f ) and ( g ) are compound conditions involving and and or. Use '&' (and) if the selection (or deletion) requires both conditions to hold at once. Use '|' (or) if either condition alone is enough.



Example ( g ) shows logical nesting; the brackets are used to specify that the condition "stores location is either DY or NF" is to be considered first; the selection will be made if this is true and if MANRES is 02.

Note: The relational symbols may be combined in various ways:

   ¬=   not equals          >=   greater than or equal to
   ¬>   not greater than    <=   less than or equal to
   ¬<   not less than

If you don't like the symbols, you can use

   EQ   NE   LT   NL   GT   NG   LE   GE   AND   OR

instead of

   =    ¬=   <    ¬<   >    ¬>   <=   >=   &     |

Expressions must be written in full, e.g. SLOC = 'DY' | SLOC = 'NF', not SLOC = 'DY' | 'NF'.
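
The sketch below illustrates that a selecting IF and a deleting IF ... THEN DELETE can express the same choice. The dataset names ALLRECS, KEEP1 and KEEP2 and the sample records are invented; NE is used in place of the ¬= symbol, and $ is the character shown as £ in this guide.

```sas
* Sketch only: two ways of making the same selection;
DATA ALLRECS;
   INPUT @1 MANRES $2. @4 SLOC $2. @7 CPUHRS 5.3;
   DATALINES;
02 DY 1.500
02 NF 0.250
03 DY 2.000
02 XX 3.000
;
DATA KEEP1;
   SET ALLRECS;
   IF MANRES = '02' & (SLOC = 'DY' | SLOC = 'NF');   /* select */
DATA KEEP2;
   SET ALLRECS;
   IF MANRES NE '02' | (SLOC NE 'DY' & SLOC NE 'NF') THEN DELETE;   /* reject the rest */
PROC PRINT DATA = KEEP1;
PROC PRINT DATA = KEEP2;
RUN;
```

Both listings should contain the first two records only. Note how the deleting form has to reverse every comparison and swap & with |, which is one reason the selecting form is often easier to read.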

3.5 CALCULATIONS AND ASSIGNMENTS

You read fields from an input file in a DATA step. You can also create your own fields, using a statement of the general form

name = expression;

Possible examples are

   A = 17;
   HOURS = MINUTES / 60;
   COST = QTY * CHARGE;
   TOTAL_QY = QY1 + QY2 + QY3;
   X = X + 1;
   NEWRATE = (UNITS * RATE) - DISCOUNT;
   B = 'FISH';
   JOIN = B || C;

On the left of the = sign you give the new field a name, which can be anything you like, so long as

1. it has no more than 8 characters,
2. it has no embedded blanks,
3. it contains only letters and figures, and
4. it begins with a letter.

The underscore counts as a letter; it is a convenient replacement for a blank, as in TOTAL_QY as above.



On the right of the = sign is an expression made up of constants and/or already existing fields, combined to give a value to the field on the left. If the expression is constant, as with A = 17 and B = 'FISH' above, the field will have this value on all records. But if the expression contains existing fields, such as MINUTES or QTY, the result will depend on the values these have in each record.

The symbols +, -, *, / are used for add, subtract, multiply and divide. Brackets have their usual function, but note that * and / are executed before + and -, so that the NEWRATE example would not be affected if the brackets were removed. To join character fields together, use two | signs with no space between them. Thus, if B is 'FISH' and C is 'PASTE' then B || C would give 'FISHPASTE' and B || '-' || C would give 'FISH-PASTE'.

SAS functions can also be used in expressions.
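
The following sketch works through the precedence and joining rules with invented values. The field name OTHER is hypothetical and is included only to show a case where brackets do change the result.

```sas
* Sketch only: assignments using constants and existing fields;
DATA _NULL_;
   UNITS = 10;
   RATE = 2.5;
   DISCOUNT = 5;
   NEWRATE = UNITS * RATE - DISCOUNT;   /* same as (UNITS * RATE) - DISCOUNT */
   OTHER = UNITS * (RATE - DISCOUNT);   /* here the brackets change the result */
   B = 'FISH';
   C = 'PASTE';
   JOIN = B || '-' || C;
   PUT NEWRATE= OTHER= JOIN=;
RUN;
```

With these values NEWRATE is 20, OTHER is -25 and JOIN is FISH-PASTE.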

3.6 KEEP AND DROP

The DATA step reads certain fields from the input file, depending on the INPUT or SET statement, and adds to these any new fields set up as in "Calculations and Assignments". Unless you state otherwise, using KEEP or DROP, all these fields will be written on the output file, although they may not be needed.

The following program is the example in "Program Steps" with a DROP statement added. Three fields are read in by the INPUT statement. Two of these are used for record selection (IF) and, if they are not needed on the file WORK1, they may as well be DROPped.



DATA WORK1;
   INFILE SIR;
   INPUT @1  PART   £CHAR15.
         @74 SLOC   £2.
         @90 MANRES £2.;
   IF MANRES = '02' & SLOC = 'DY';
   DROP SLOC MANRES;
   OUTPUT WORK1;
PROC PRINT DATA = WORK1;

"KEEP PART;" in place of the DROP statement would have just the same effect.

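The equivalence of DROP and KEEP in this case can be checked with a sketch like the one below. The dataset names ALLRECS and WORK2, the shortened column positions and the sample records are assumptions for illustration; $ is the character shown as £ in this guide.

```sas
* Sketch only: DROP and KEEP leave the same single field on the output;
DATA ALLRECS;
   INPUT @1 PART $CHAR8. @10 SLOC $2. @13 MANRES $2.;
   DATALINES;
BOLT-001 DY 02
NUT-0042 NF 02
;
DATA WORK1;
   SET ALLRECS;
   IF MANRES = '02' & SLOC = 'DY';
   DROP SLOC MANRES;          /* name the fields to leave out */
DATA WORK2;
   SET ALLRECS;
   IF MANRES = '02' & SLOC = 'DY';
   KEEP PART;                 /* name the fields to retain */
PROC PRINT DATA = WORK1;
PROC PRINT DATA = WORK2;
RUN;
```

Both listings should show PART alone. The dropped fields are still available to the IF statement, because DROP and KEEP affect only what is written to the output dataset.
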
4. THE PROC STEP

4.1 GENERAL

When you are writing a DATA step, you program all your requirements down to the detail of field names and lengths, file and print positions, precise criteria for record selection, etc. A SAS PROCedure, on the other hand, is a complete program already written and incorporated in the language; you call it in and specify options. And if you don’t like what’s programmed, you find another procedure or tackle the job yourself in DATA.

The PROC step specifies a SAS procedure to be performed on the dataset created in a DATA step. Many procedures are available to analyse and process the data, covering a wide range of statistical and other analyses. Most procedures are for presenting data. About 30 are statistical packages - the initial thrust of the Statistical-Analysis System (SAS). There are graph procedures and facilities for producing bar charts and pie charts. The most useful procedure must be PROC PRINT.

The form of the PROCedure step is

PROC name DATA = workfile;
   information statements

'Name' is the PROC name, e.g. PRINT, CHART. The 'information statements' define columns to print, axes for plotting, variables for sorting, etc. as needed.

4.2 PROC PRINT

PROC PRINT DATA = WORK1;

is all you need to produce a print of the data in WORK1.

The result will be neat columns of data with column headings. This is an example of a PROC step consisting of only one statement.

But you can also call up information statements to enhance your print. The next example is a PROC step with eight such statements.

PROC PRINT DATA = WORK3;
   BY PROJ DEPT;
   PAGEBY PROJ;
   SUMBY DEPT;
   VAR CHARGE LAPSE UNITS JOBS;
   SUM UNITS JOBS;
   TITLE1 'LAPSE TIMES AND UNITS BY PROJECT';
   TITLE3 'DATA FOR JULY 1983';
   ID CHARGE;

Note: The first column of the print normally contains the observation number OBS (but see ID below). This is just a sequential numbering of the lines of data printed.

The BY statement asks for the print to be separated into chunks that end where values of the variables change. The BY variables then appear not in columns but as subheadings.

Before using a PROC PRINT with a BY statement, you should ensure that the dataset is already sorted (see below) by the same BY variables (at least). If you have DESCENDING in the PROC SORT, you should have it also in the PROC PRINT.

PAGEBY requests a new page on change of the variable. Use only one variable name with PAGEBY. The same variable name must also appear in the BY statement.

SUMBY in conjunction with SUM specifies subtotals at control breaks. In the example above, the columns for UNITS and JOBS will be totalled when the value of DEPT changes. But since DEPT is sorted within PROJ, totals by PROJ will also be shown.

Only one variable name may follow SUMBY; it must also appear in the BY statement. If the SUM statement is missing, SUMBY will give subtotals of all the numeric variables. If SUM is present but no SUMBY, you will get final totals only.

VAR indicates the variables to be printed and the order of the columns. If VAR is omitted, all variables present will be printed, in the order of their appearance in the DATA step.

TITLEn statements give the titles to be printed at line n of each page. Using TITLE1 and TITLE3 causes two title lines to be printed with a blank line between them. You can use TITLE instead of TITLE1. To include an apostrophe (single quote) in a title, code two quote marks, and one will be printed.

Note that a title once set up will continue in later PROCs unless respecified.

ID specifies which variable should appear first. This is then an IDentifying field and replaces the observation number. If ID is omitted, the first column will contain the observation number.
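
A cut-down version of the example can be run with invented data, as sketched below. The sample records and the reduced set of statements are assumptions for illustration; the dataset is sorted first, as the BY statement requires, and $ is the character shown as £ in this guide.

```sas
* Sketch only: invented usage records, sorted before printing BY;
DATA WORK3;
   INPUT @1 PROJ $CHAR4. @6 DEPT $CHAR4. @11 UNITS 3. @15 JOBS 2.;
   DATALINES;
P100 D001 120  4
P100 D002  80  2
P200 D001  40  1
P100 D001  60  3
;
PROC SORT DATA = WORK3;
   BY PROJ DEPT;
PROC PRINT DATA = WORK3;
   BY PROJ DEPT;
   SUMBY DEPT;
   VAR UNITS JOBS;
   SUM UNITS JOBS;
   TITLE1 'UNITS AND JOBS BY PROJECT';
RUN;
```

Adding PAGEBY PROJ; or an ID statement to this step is a convenient way to see what each of those statements contributes.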

4.3 PROC SORT

PROC SORT DATA = WORK1;
   BY PTNO;

PROC SORT DATA = WORK3;
   BY PROJ DEPT TIMEON;

Before printing, you may wish your dataset to be sorted. Before use of MERGE or UPDATE, sorting is essential.

The BY statement indicates the variable(s) used in the sort. The first example would sort WORK1 into PTNO order. The second example would sort WORK3 by TIMEON within DEPT within PROJ. Note the sequence: the major sort variable appears first.

You can sort in DESCENDING order:

PROC SORT DATA = WORK5 OUT = WORK6;
   BY DESCENDING SCORE;

This example also illustrates the OUT = option, which allows you to name the sorted version of the file. If you omit OUT =, the sorted version overwrites the original version and takes the same name.
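
The descending sort with OUT = can be tried on a few invented records, as in this sketch. The field NAME and the sample scores are assumptions for illustration; $ is the character shown as £ in this guide.

```sas
* Sketch only: sort into a new dataset, highest score first;
DATA WORK5;
   INPUT @1 NAME $CHAR5. @7 SCORE 3.;
   DATALINES;
ANNE   72
BOB    91
CAROL  85
;
PROC SORT DATA = WORK5 OUT = WORK6;
   BY DESCENDING SCORE;
PROC PRINT DATA = WORK6;
RUN;
```

WORK6 lists BOB, CAROL and ANNE in that order, while WORK5 keeps its original sequence because OUT = was given.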

4.4 PROC SUMMARY

The result of PROC SUMMARY is a summarised version of the original file, with the detailed information replaced by totals, subtotals and means if required. These can then be printed using PROC PRINT. PROC SUMMARY itself causes no printed report.

The way the data are summarised depends on the variables you choose as CLASSES.

PROC SUMMARY DATA = WORK1;
   CLASSES DEPT PROJ;
   VAR HOURS UNITS;
   OUTPUT OUT = WORK5 SUM = TOTHRS TOTUNI MEAN(UNITS) = AVGUNI;
PROC PRINT DATA = WORK5;
   TITLE 'CHARGEABLE UNITS & CPU HOURS SUMMARY';

In this example, hours and units from a resource-usage file are to be summarised by department and project number. So DEPT and PROJ are the classes, and the output will show

   (0) overall totals,
   (1) subtotals by project,
   (2) subtotals by department, and
   (3) subtotals by project within department

of the variables HOURS and UNITS.



SAS will create two new fields: _TYPE_ will be the type of subtotal, corresponding to (0), (1), (2), or (3) above. _FREQ_ will contain the number of observations which are totalled on that line.

Note: Here the PROC SUMMARY step is followed by a PROC PRINT, as no print results from the former.

The VARiables to be summed are as listed.

The OUTPUT statement -

   (a) names the file, using OUT =, on which the summary is to be written,
   (b) gives names to the totals of the fields in the VAR list,
   (c) gives names to the means of the fields in the VAR list.

Sums and means can be specified or not, as required. In the example above, both variables are to be summed, but only UNITS is to be averaged. When, as in the latter case, only part of the VAR list is involved, that part should be shown in brackets to avoid ambiguity.

Further note. Remember that _TYPE_ = 0 always refers to overall totals, frequencies etc. Where there is only one class, _TYPE_ = 1 records will take each value of the class variable in turn.

If there are more classes, these are grouped by _TYPE_ value in a logical pattern reminiscent of binary numbers. Suppose there are three classes called A, B, and C (CLASSES A B C;). The subtotals obtained with SUM = are then -

   for _TYPE_ = 0   overall
   for _TYPE_ = 1   for each value of C
   for _TYPE_ = 2   for each value of B
   for _TYPE_ = 3   by B and C
   for _TYPE_ = 4   for each value of A
   for _TYPE_ = 5   by A and C
   for _TYPE_ = 6   by A and B
   for _TYPE_ = 7   by A and B and C
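
The two-class DEPT and PROJ example from earlier in this section can be tried with a few invented records, as sketched below. The sample values and column positions are assumptions for illustration; $ is the character shown as £ in this guide.

```sas
* Sketch only: summarise three invented records by DEPT and PROJ;
DATA WORK1;
   INPUT @1 DEPT $CHAR4. @6 PROJ $CHAR4. @11 HOURS 4.1 @16 UNITS 3.;
   DATALINES;
D001 P100  1.5 120
D001 P200  0.5  40
D002 P100  2.0  80
;
PROC SUMMARY DATA = WORK1;
   CLASSES DEPT PROJ;
   VAR HOURS UNITS;
   OUTPUT OUT = WORK5 SUM = TOTHRS TOTUNI MEAN(UNITS) = AVGUNI;
PROC PRINT DATA = WORK5;
RUN;
```

The listing should contain eight lines: one with _TYPE_ = 0 (all three records), two with _TYPE_ = 1 (one per project), two with _TYPE_ = 2 (one per department) and three with _TYPE_ = 3 (one per department and project combination present in the data).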

4.5 PROC CHART

This procedure produces a bar chart or pie diagram directly from a SAS dataset. The output uses print characters and forms part of your normal print-out.

Example:

PROC CHART DATA = WORK1;
   HBAR TERMAD / SUMVAR = LAPSE;

This example asks for a horizontal bar chart (HBAR) showing lapse times for different terminal addresses; TERMAD and LAPSE are fields on the file WORK1 in this case. The result will be a bar chart with a bar for each terminal address, representing total lapse times by bar lengths.

Example:

PROC CHART DATA = WORK2;
   PIE DEPT / SUMVAR = COST;

DEPT and COST are fields on WORK2. The resulting PIE will be cut, with one slice for each department; the total cost for the department determines the width of the slice.

The types of chart available are

   VBAR - vertical bar chart        PIE   - pie diagram
   HBAR - horizontal bar chart      BLOCK - block diagram

The block diagram is also called the Manhattan chart: vertical bars rise from a horizontal matrix rather like skyscrapers. After each chart type write the name of the principal variable, which is to label the bar, slice or skyscraper, and follow it with a slash /. The options that can follow the slash include

   SUMVAR = (sum variable, giving bar length or slice width)
   GROUP = (grouping of principal variable)
   SYMBOL = (symbol forming the bar)

Example:

PROC CHART DATA = WORK1;
   VBAR TERMAD / SUMVAR = LAPSE GROUP = DEPT SYMBOL = 'OX';

Note: Write all the options you require after the slash, and follow them with a semicolon.

In this example, TERMAD would be grouped by DEPT.



In the BLOCK diagram, GROUP will add an extra dimension, giving the Manhattan effect.

If no SYMBOL is specified, the default is *. If two or more symbols are specified, as above, they are overprinted to form one symbol.
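
A horizontal bar chart can be produced from a few invented records with a sketch like this one. The sample terminal addresses, departments and lapse times are assumptions for illustration; $ is the character shown as £ in this guide.

```sas
* Sketch only: total lapse time per terminal address as a bar chart;
DATA WORK1;
   INPUT @1 TERMAD $CHAR3. @5 DEPT $CHAR4. @10 LAPSE 3.;
   DATALINES;
T01 D001  30
T02 D001  60
T01 D002  15
;
PROC CHART DATA = WORK1;
   HBAR TERMAD / SUMVAR = LAPSE;
RUN;
```

The two T01 records are added together, so the chart should show one bar for T01 (45) and one for T02 (60). Appending GROUP = DEPT after SUMVAR = LAPSE would split the bars by department.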

4.6 OTHER PROCEDURES

There are over 50 procedures available to the SAS user, covering a range of statistical and other applications. PRINT, SORT, SUMMARY, and CHART, described in this guide, are probably the most useful.

For particular applications you might also like to consider the following:-

PROC MEANS. Statistical data including means, standard deviations, maxima, minima, standard errors of the mean, and variances.

PROC FREQ. Frequency tables, cross tabulations, and other statistical data.

PROC FORMAT. Output formats for variables to be printed, e.g. FEMALE, MALE for 0, 1, or sizes grouped as SMALL, MEDIUM, LARGE.

PROC PLOT. Graphs formed with print characters.
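
As a minimal sketch of how these procedures fit together, the steps below attach the FEMALE/MALE labels mentioned above to a coded field and then tabulate and summarise it. The dataset STAFF, its fields SEX (0 or 1) and PAY, and the format name SEXFMT are hypothetical names chosen for illustration.

```sas
/* Assumed: STAFF is an existing SAS dataset with numeric SEX and PAY */
PROC FORMAT;
    VALUE SEXFMT 0 = 'FEMALE'      /* label printed instead of 0 */
                 1 = 'MALE';       /* label printed instead of 1 */

PROC FREQ DATA = STAFF;
    TABLES SEX;                    /* frequency table of SEX */
    FORMAT SEX SEXFMT.;            /* show the labels, not the codes */

PROC MEANS DATA = STAFF MEAN STD MIN MAX;
    VAR PAY;                       /* statistics for PAY only */
RUN;
```

The format is defined once in PROC FORMAT and then attached with a FORMAT statement in whichever procedure should print the labels.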

4.7 SYSTEM OPTIONS

The outputs from the PROCedures are controlled by system options, which you can change. The default options vary for printers and different terminals.

To re-specify options you can use the OPTIONS statement anywhere in a SAS program, e.g.

OPTIONS PAGESIZE = 100 NONUMBER;

and it will remain in force to the end of the program or until cancelled by another OPTIONS statement. Alternatively, options specified in the JCL are valid to the end of the program, or until you change them. These take the form

// EXEC SAS,OPTIONS='PAGESIZE=100 NONUMBER'

Here are some of the system options:

PAGESIZE=n (or PS=n) controls the maximum number of lines on a page. The default page size for printing is 60 lines. 33 lines are needed for a pie chart.

LINESIZE=n (or LS=n) controls the maximum length in characters of each line. The default for printing is 132.

CENTER or NOCENTER. CENTER, the default option, causes outputs to be centred on the page.

DATE or NODATE determines whether the date is to be printed at the top of each page. Default is DATE.

NUMBER or NONUMBER determines whether pages are to be numbered. Default is NUMBER.

PAGES=n sets a limit to the number of pages a SAS job may print. PAGES=MAX, the default, sets no limit except that imposed by operations.

5. ADDITIONAL FACILITIES

5.1 REPORT WRITING

The PRINT procedure (see "PROC PRINT") takes care of most printing requirements.

But if you need more control over your output, use a DATA step instead, like the one below. As no dataset is to be created, start with

DATA _NULL_;

followed by a SET for the data you want to print. Use 'FILE PRINT' to direct the output to a printer, and PUT statements. The PUT statement is like INPUT in reverse: you can specify positions and formats as with INPUT ("INPUT Statement"), but they now control the output.

Control symbols used with PUT include:

    @n     - at position n
    +n     - leave n spaces
    /      - move to start of next line
    //     - skip one line (etc.)
    _PAGE_ - skip to a new page

together with the format specifications as with INPUT.

    DATA _NULL_;
        SET WORK1;
        FILE PRINT HEADER = H NOTITLES;
        PUT @5  CHRG 5.3
            @15 PROJ
            @32 TIMEON;
        RETURN;
    H:
        PUT /  @5 'EVERYTHING IN ITS PLACE'
            // @5 'CHARGE' @14 'PROJECT' @32 'TIME'
            /  @5 'DEPT'   @14 'NUMBER'  @35 'ON' //;
        RETURN;

This example shows not only the formatting methods of the PUT statement but also a method of program control. The requirement is

(a) to suppress any existing titles (NOTITLES),

(b) when starting a new page, to go to another part of the program (H:) to print formatted headings,

(c) otherwise to print the values, record by record, at set positions and with controlled formats.

The first PUT statement in the example above controls the printing of the detail. HEADER = defines the label (H in this case, but any valid name will do) for the program to branch to when the page changes. The PUT statement after the label controls the headings. Note the double slash to produce a blank line below the column headings.

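The two RETURN statements are important. The first ends the current cycle and returns control to the top of the DATA step, so that the heading statements are not executed for every record. The second ends the heading routine and hands control back to the point where the new page was started, so that the processes occur in proper sequence.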

5.2 FUNCTIONS

"Calculations and Assignments" showed how to do calculations in a DATA step and create new fields. The functions are an extension of this idea. In each case the function acts on the value(s) in the brackets; the field named to the left of the = sign will take a value equal to the result.

Here is a selection of functions with examples of their use. Assume that A = 3.14, B = 17, C = 18, D = 19, E = 'MESSAGES', and F = 'NOW HERE':

    Function                          Result

    M=CEIL(A);                        M=4
    N=FLOOR(A);                       N=3
    O=ROUND(A);                       O=3
    P=MAX(A,B,C,D);                   P=19
    Q=MIN(A,B,C,D);                   Q=3.14
    R=MEAN(B,C,D);                    R=18
    S=SUBSTR(E,5,3);                  S='AGE'
    T=INDEX(E, 'AGE');                T=5
    U=LEFT(F);                        U='NOW HERE '
    V=RIGHT(F);                       V=' NOW HERE'
    W=COMPRESS(F);                    W='NOWHERE '
    X=TRANSLATE(E, 'WXY', 'RST');     X='MEXXAGEX'
    Y=TODAY( );                       today's date coded
    Z=PUT(Y,DDMMYY6.);                date as DDMMYY

Note: the arguments (values in brackets) can be field names or plain values. The names to the left of the = sign can be up to 8 characters long as before.

CEIL rounds up; FLOOR rounds down; ROUND rounds to the nearest whole number. CEIL(-3.14) and ROUND(-3.14) give -3; FLOOR(-3.14) gives -4.

SUBSTR takes a character field (or 'string'), and the result is a part (sub-string) of that field. The numeric arguments define the start position and the length of the sub-string.

INDEX searches within a character field for a pattern of characters; it returns the starting position if the pattern is found, zero if it isn't.

LEFT left-justifies and RIGHT right-justifies a character field; COMPRESS squeezes out the blanks.

TRANSLATE searches within a character field for specified characters to be replaced by others. In this example X replaces S, but R and T are not found.

TODAY has empty brackets and returns the date your job runs, but expressed in SAS day-number form (see "Dates and Times in SAS").

PUT can be used to change a value to a different format. In this case a date is changed to a readable form.

Many other functions are available including the maths functions SQRT (square root), EXP, LOG, LOG10, SIN, COS, TAN, etc.
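
A quick way to check what a function returns is to try it in a DATA _NULL_ step and write the results to the SAS log. The sketch below repeats three of the examples above with the same assumed values of A and E; a PUT statement with no FILE statement writes to the log.

```sas
DATA _NULL_;
    A = 3.14;
    E = 'MESSAGES';
    M = CEIL(A);            /* rounds up                    */
    S = SUBSTR(E,5,3);      /* 3 characters from position 5 */
    T = INDEX(E,'AGE');     /* position where 'AGE' starts  */
    PUT M= S= T=;           /* writes M=4 S=AGE T=5         */
RUN;
```

The form "PUT name=;" prints each field name with its value, which is convenient when testing.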

5.3 MERGING FILES

It is possible to read two (or more) files and produce a dataset containing fields from both (all).

The source files must have a field (or fields) in common, typically a part or check number, for reference. Both files should be in the sort order of the reference field(s) before merging.

Say you wish to select parts from the SIR file, and reference against the Unit-Cost File (UCF) to pick up current costs for those parts. First you will select the part numbers from SIR in a DATA step and produce a SAS dataset, say WORK1. Then you will select the required cost fields from UCF and produce WORK2, say. WORK1 and WORK2 will both be in part-number order. The next DATA step will then perform the merge:

    DATA WORK3;
        MERGE WORK1 (IN = IN1) WORK2 (IN = IN2);
        BY PTNO;
        IF IN1 = 1 & IN2 = 1;
        KEEP MAT LAB ASL SLOC PTNO;
        OUTPUT WORK3;

Note: PTNO, part number, is the field common to WORK1 and WORK2, both of which must be in PTNO order.

IN1 and IN2 (you can choose your own names) are indicators that a record for the particular part number is present - in WORK1 and WORK2 respectively. Only two values are possible: 0 or 1, absent or present. So

IF IN1 = 1 & IN2 = 1;

selects records for part numbers which were present on both files. Alternatively, "IF IN1 = 0 & IN2 = 1;" would select records where the part number occurred on WORK2 only.

The records created by the merge will contain the common field together with all the other fields on both files. These can then be kept or dropped by using KEEP or DROP.

If there is no match, one of the indicators will have value 0, and the fields from the file whose record is missing will be flagged as "missing".

("Missing" values are a special feature of SAS. If you try to print a missing value, the result will be blank for a character field or a decimal point for a numeric field. Calculations on missing values give results which are also missing.)
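
The indicators can also be used to route matched and unmatched records to different datasets in one pass. The following is a sketch only: the output names BOTH and NOCOST are hypothetical, and WORK1 and WORK2 are assumed to hold the PTNO field as in the example above. The two PROC SORT steps put both files in PTNO order before the merge.

```sas
PROC SORT DATA = WORK1;
    BY PTNO;
PROC SORT DATA = WORK2;
    BY PTNO;

DATA BOTH NOCOST;
    MERGE WORK1 (IN = IN1) WORK2 (IN = IN2);
    BY PTNO;
    /* part number present on both files */
    IF IN1 = 1 & IN2 = 1 THEN OUTPUT BOTH;
    /* part number on WORK1 only: the WORK2 fields are missing */
    ELSE IF IN1 = 1 THEN OUTPUT NOCOST;
RUN;
```

Records present on WORK2 only satisfy neither condition and are written to neither dataset.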

5.4 COMPLEX FILES

A simple file is one in which all the records have the same fields with the same format.

A complex file contains records of more than one type; records of any one type have the same fields with the same format.

Normally in a complex file a record of type 1 will be followed by a varying number of type-2 records, and each of these may be followed by type-3 records, and so on.

The first one or two fields will usually have the same format for all record types, and one of these fields will contain the record type (RT).

In your program you must INPUT part of the record including the RT field, and suspend the INPUT. Do this with an @ sign at the end of the statement, a so-called trailing @ sign. Then test the record type before resuming the INPUT to read in the other fields that belong to the record type you want.

(If you don't use the trailing @ sign, each INPUT statement reads a fresh record from the file.)

The example below shows a DATA step which reads the Repair-of-Spares File (ROS) and writes fields selected from the two record types on two SAS datasets.

    DATA WORK1 WORK2;
        INFILE ROS;
        INPUT @1  CUSKEY £CHAR6.
              @16 RT     £1. @;
        IF RT='1' THEN DO;
            INPUT @17  GRSTAT  £1.
                  @109 CROSQTY PD3.0;
            OUTPUT WORK1;
        END;
        ELSE DO;
            INPUT @17 WJSTAT  £1.
                  @60 ROSCSSD PD3.0;
            OUTPUT WORK2;
        END;

5.5 RECORDS WITH VARIABLE NUMBERS OF SETS

A second type of complex file is one where there is one record type but where a group of fields on the record can be repeated a varying number of times.

The MACART History File is an example. The first 18 fields occupy constant positions, and a normal INPUT statement can be used to read them. The remaining 7 fields form a set, which can be repeated up to 9 times.

The INPUT statement should be set to look for the maximum number of repeats, starting from the position where the set begins. On MACART History the set begins at position 90, TRADE is the first field of the set, and there are 24 more character positions in the set after TRADE. The INPUT statement contains

@90 (TRADE1-TRADE9) (£2. +24)

The brackets mean that the formats are to be read in rotation; +24 is the number of character positions to be skipped over after each TRADE.

There could of course be fewer than 9 sets on a particular record: 9 is the maximum. To prevent SAS running over to the next record in such cases, include MISSOVER LENGTH=L in the INFILE statement.

The following example shows a DATA step that reads the MACART History file and creates a SAS dataset from the fields JCN, ORDNO, TRADE1 to TRADE9, and DIRCUM1 to DIRCUM9.

    DATA WORK1;
        INFILE MACART MISSOVER LENGTH = L;
        INPUT @1  JCN   £CHAR10.
              @11 ORDNO £CHAR6.
              @90 (TRADE1-TRADE9) (£CHAR2. +24)
              @90 (DIRCUM1-DIRCUM9) (+10 PD4.2 +12);
        OUTPUT WORK1;

Note: In the repeatable sets there are 10 character positions before DIRCUM, 12 after.

Where the set occurs fewer than 9 times, the remaining values of TRADE and DIRCUM will be "missing" - see "Merging Files".

5.6 PROGRAM CONTROL

In a DATA step you can include statements that are to be executed only if a certain condition applies - or only if it does not apply. For example,

IF N > 0 THEN AVGE = TOT / N;

will make SAS calculate AVGE only when N is greater than zero; otherwise this calculation will be omitted.

If your next statement is

ELSE AVGE = 0;

SAS will then set AVGE to 0 if N is not greater than zero (written without the ELSE, this statement would set AVGE to 0 in all cases).

To attach more than one statement to a THEN or an ELSE, form a DO group. For example:

    IF N > 0 THEN DO;
        AVGE = TOT / N;
        PUT @55 AVGE;
    END;
    ELSE DO;
        AVGE = 0;
        PUT @55 AVGE
            @65 'BUCKET';
    END;

Note that an IF statement must contain a THEN (except in the special case of record selection, "Record Selection"). Note also that

IF ... THEN DO;

is one statement and that a DO group must finish with

END;

5.7 CREATING MORE THAN ONE DATASET

You can create several SAS datasets in one DATA step, by making different selections from the input file for instance. Include all the datasets' names in the DATA statement, and write a separate OUTPUT for each dataset, e.g.

    DATA WORKB WORKD WORKE WORKS;
        INFILE INFO;
        INPUT @1  PTNO £CHAR15.
              @16 SITE £1.
              @17 QTY  PD3.0;
        SELECT(SITE);
            WHEN('B') DO;
                OUTPUT WORKB;
            END;
            WHEN('D') OUTPUT WORKD;
            WHEN('E') OUTPUT WORKE;
            WHEN('S') OUTPUT WORKS;
            OTHER;
        END;
        DROP SITE;

Note the use of the SELECT statement above; this can be a useful replacement for multiple IFs. The DO group on the first WHEN is unnecessary but is shown for completeness.

5.8 RETAIN OPTION

SAS cycles through the statements in a DATA step once for each input record, as described in "Sequence of Processing". Generally, if you give a field a value in one cycle, it will not retain this value to the next cycle; instead the field will be set to "missing" until given a new value.

By using RETAIN, however, you can cause a field to hold its value until another statement changes it. For example,

RETAIN A X Y NAME TOTAL;

will cause A, X, Y, NAME, and TOTAL to remember their values from one cycle (input record) to the next.

The RETAIN option also allows you to initialise values:

RETAIN A X Y 0 NAME 'LUCY' TOTAL;

will also initialise A, X, and Y to zero and NAME to 'LUCY'.

The next section shows a possible use of RETAIN.

5.9 COUNTING AND SUM STATEMENTS

The following example illustrates a method of counting the records for which a certain condition applies:

    DATA WORK3;
        SET WORK1;
        RETAIN COUNT 0;
        IF SEX = 'F' THEN COUNT = COUNT + 1;
        OUTPUT WORK3;

COUNT is initialized to zero by RETAIN, which also carries its value from one record to the next; it is increased by one whenever the record is for a female.

However, in this case, the same effect can be achieved by writing a sum statement,

    COUNT + 1;

instead of COUNT = COUNT + 1. Retaining and initializing to zero are then automatic, so you can omit the RETAIN statement.

    DATA WORK3;
        SET WORK1;
        IF SEX = 'F' THEN COUNT + 1;
        OUTPUT WORK3;

The general form of the sum statement in SAS is

variable + expression;

(the sign must be +).

5.10 LENGTH

The first time you mention a character field in a program you automatically set its length. This may happen in an INPUT or an assignment statement, so that for example either

    INPUT ... @41 WAIT £4. ...;
or
    WAIT = 'LONG';

would define WAIT as a four-character field. So if you then put

WAIT = 'SHORT';

WAIT would still have only four characters, with the value 'SHOR'.

A third way to fix the length of a field is to use a LENGTH statement before the field is INPUT or assigned.

LENGTH WAIT £ 5;

would define WAIT as five characters. (The £ sign means characters, but this is not a format specification: there is no dot.) You can specify several lengths in one statement:

LENGTH QTY COST 6 WAIT CODE £ 5 ADDRESS £ 20;

Note that the maximum length allowed for a character field is 200 bytes.

Numeric lengths are in bytes, not digits. The default length of numeric fields is 8 bytes; this gives 15 significant figures and is called double precision. The effect of

LENGTH DEFAULT=4;

is to reduce the default length of all numeric fields in the program to 4 bytes. The resulting 7 significant figures (single precision) are enough for most purposes and save space for running. The precision of individual fields can then be overruled in another LENGTH statement, like the one above which gives QTY and COST a length of 6 bytes (11 significant figures).

5.11 FORMAT STATEMENT

A format is a code like 5.3, ZD2.0, £CHAR15., DATE7., which describes the way a value is either (a) stored on an external file or (b) to be written. SAS recognises the format by the dot; a number just before the dot defines how many positions the field takes up.

A FORMAT statement attaches a format to a variable or variables. It has no effect on the variables themselves or on the way they are held in the program. It is just a label which says how the variables should appear on output, e.g. by PROC PRINT or PROC CHART. For example

FORMAT BUST WAIST HIP 3.0 WEIGHT 5.1;

would signal that the measurements should be printed as whole numbers and the weight to one decimal place.

6. DATES AND TIMES IN SAS

6.1 INTRODUCTION

SAS's handling of dates and times is very flexible but necessarily rather complicated. This section is intended to show several ways to exploit the facilities.

6.2 INTERNAL REPRESENTATION

When SAS works with dates it uses a sequential day number: day 0 is January 1st 1960. Likewise time is worked in seconds counted from midnight; and date-time combinations are worked in seconds from 0 hours on January 1st 1960. This enables SAS to calculate intervals by subtracting and to sort dates and times correctly. The user, however, doesn't need to understand this internal representation.
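
If you are curious, the internal numbers can be inspected by writing a few date and time constants (see "Date and Time Values as Constants") to the log without a format. This is a minimal sketch; the field names D0, D1, and T are arbitrary.

```sas
DATA _NULL_;
    D0 = '01JAN60'D;        /* day 0                      */
    D1 = '16SEP87'D;        /* days since January 1st 1960 */
    T  = '12:00'T;          /* seconds since midnight      */
    PUT D0= D1= T=;         /* writes D0=0 D1=10120 T=43200 */
RUN;
```

Because the values are ordinary numbers, subtracting one date from another gives the interval in days.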

6.3 FORMATS FOR DATES AND TIMES

There are several SAS formats to allow dates and times to be printed in conventional form. A selection of these appears below, taking September 16th 1987 and/or noon as an example. The figure before the point in the format gives the field width. If this is omitted, the default width is taken (e.g. "DATE." has the same effect as "DATE7."); if it is less than the default, the field will be truncated from the right; if it is specified too long, an error message will result. * marks the default field-widths.

      format          typical result

      DATE5.          16SEP
    * DATE7.          16SEP87
      DATE9.          16SEP1987
      DATETIME7.      16SEP87
      DATETIME12.     16SEP87:12
      DATETIME15.     16SEP87:12:00
    * DATETIME18.     16SEP87:12:00:00
      DDMMYY2.        16
      DDMMYY4.        1609
      DDMMYY5.        16/09
      DDMMYY6.        160987
    * DDMMYY8.        16/09/87
    * HHMM5.          12:00
    * MONYY5.         SEP87
      TIME5.          12:00
    * TIME8.          12:00:00
      WEEKDATE3.      WED
      WEEKDATE9.      WEDNESDAY
      WEEKDATE15.     WED, SEP 16, 87
      WEEKDATE17.     WED, SEP 16, 1987
    * WEEKDATE29.     WEDNESDAY, SEPTEMBER 16, 1987
      WORDDATE3.      SEP
      WORDDATE12.     SEP 16, 1987
    * WORDDATE18.     SEPTEMBER 16, 1987
      YYMMDD2.        87
      YYMMDD4.        8709
      YYMMDD5.        87-09
      YYMMDD6.        870916
    * YYMMDD8.        87-09-16
    * YYQ4.           87Q3

6.4 HOW TO USE THE FORMATS

To read dates or times from a file, specify on INPUT the format in which they appear on the file, e.g.

INPUT @45 START HHMM5.;

To specify formats for printing in a PROC step, use a FORMAT statement in a DATA step:

    INPUT @35 DFROM   DATE7.
          @45 EXPIRES DATE7.;
    FORMAT DFROM EXPIRES DDMMYY8.;

To print dates or times in a DATA step, use the formats in a PUT statement (see "Report Writing"):

    FILE PRINT;
    PUT @10 START HHMM5.;

6.5 DATE AND TIME FUNCTIONS

(a) The following functions convert dates or times from the SAS internal representation to numeric variables containing meaningful dates or times. Suppose D is a SAS date and T a SAS time, the representations in this case of September 16th 1987 and noon respectively. Writing X = HOUR(T); Y = MONTH(D); will set X to 12 and Y to 9 for example.

    function        result    comment

    DAY(D)          16        day of month
    HOUR(T)         12
    JULDATE(D)      87259     year & day number
    MINUTE(T)       0
    MONTH(D)        9
    QTR(D)          3         quarter
    SECOND(T)       0
    WEEKDAY(D)      4         1=Sunday etc.
    YEAR(D)         1987

The function PUT(value, format) can be used to convert to any format, e.g.

    DAT = PUT(D,DDMMYY.);
    TIM = PUT(T,TIME8.);

(b) The following functions convert numeric arguments to SAS internal representations of dates or times. Note that DATEJUL is the reverse of the JULDATE function, and that DHMS requires a SAS date as first argument. The functions are shown with typical arguments, and the actual result is the SAS representation of the date or time indicated (e.g. not 16.09.87 but 10120).

    function          result (see last sentence above)

    DATEJUL(87259)    16.09.87
    MDY(9,16,87)      16.09.87
    YYQ(87,3)         1.07.87
    DHMS(D,12,0,0)    16.09.87 12:00
    HMS(12,0,0)       12:00

(c) The following functions are written with empty brackets; they return the current date or time in internal representation. Use FORMATs, or functions of type (a), to convert to readable form:

    HODIE = TODAY( );
    FORMAT HODIE DATE.;

The first two functions give identical results:

    function       result

    DATE( )        current date
    TODAY( )       current date
    DATETIME( )    current date-time
    TIME( )        current time

(d) The remaining functions compute intervals:

INTCK('DAY',D1,D2)

returns the number of days from date D1 to date D2, both in SAS representation.

INTNX('DAY',D1,N)

returns the date, in SAS representation, which is N days later than the date D1. The first argument of INTCK or INTNX must be one of the following, in single quotes:

for dates: DAY, WEEK, MONTH, QTR, YEAR;

for date-times: DTDAY, DTWEEK, DTMONTH, DTQTR, DTYEAR;

for times: HOUR, MINUTE, SECOND.
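
A short sketch of both functions with the DAY interval follows; the dates are arbitrary examples and the field names DAYS and LATER are hypothetical. The results are written to the log.

```sas
DATA _NULL_;
    D1 = '16SEP87'D;
    D2 = '25DEC87'D;
    DAYS  = INTCK('DAY',D1,D2);    /* days from D1 to D2         */
    LATER = INTNX('DAY',D1,30);    /* the date 30 days after D1  */
    PUT DAYS= LATER= DATE7.;       /* writes DAYS=100 LATER=16OCT87 */
RUN;
```

LATER is held as a SAS day number; the DATE7. format in the PUT statement is what makes it readable.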

6.6 DATE AND TIME VALUES AS CONSTANTS

The values of SAS dates or times can be written in DATA or PROC steps. Use quotes as for normal character values, and follow the second quote with D, T, or DT without a space:-

    IF HR > '23:30'T THEN DAT1 = '25DEC87'D;
    HOGMAN = '31DEC87:23:59:59'DT;

6.7 A WORKED EXAMPLE

Read the dates DAT1 and DAT2 from a file, selecting records for which DAT1 is later than 1st January 1987. Print DAT1 and MIDDATE, which lies midway between DAT1 and DAT2, both in the form DDMONYY. If DAT2 is before the run date of the job, print 'LATE'.

    DATA WORK1;
        INFILE DIARY;
        INPUT @1 DAT1 YYMMDD6.
              @7 DAT2 YYMMDD6.;
        IF DAT1 > '01JAN87'D;
        FORMAT DAT1 MIDDATE DATE.;
        N = INTCK('DAY',DAT1,DAT2);
        MIDDATE = INTNX('DAY',DAT1,0.5*N);
        IF DAT2 < TODAY( ) THEN NOTE = 'LATE';
        ELSE NOTE = ' ';
        DROP DAT2 N;
    PROC PRINT DATA=WORK1;

7. RUNNING YOUR SAS PROGRAM

7.1 SAMPLE JCL FOR A SAS JOB

//VC01RX9S JOB ,'CGW MSL TOMOTT X4183',CLASS=C,MSGCLASS=R
//*MAIN SYSTEM=JGLOBAL
//SAS EXEC SAS
//external file DD's
//SYSIN DD *
    SAS code …
/*

8. USEFUL TOOLS

8.1 MISSING VALUES

SAS recognises missing as a value. This allows a program to continue, with a warning message, when it meets a missing or invalid field on an input file, or when for any other reason there is a field without an acceptable value.

For example, the following statements would produce missing values if

. SEX is not M and P is not already set,

. N is numeric

. either RATE or HOURS is missing:-

    IF SEX = 'M' THEN P = 1;
    N = ' ';
    OTIME = RATE * HOURS;

In general any missing value used in a calculation will produce a missing result. But the SUM function treats missing values as zero; see the next section.

On output, a missing numeric value prints as a dot, a missing character value as blank. You can use these symbols in program logic, for example

    IF TITLE = 'MISS' | TITLE = 'MRS' THEN SEX = 0;
    IF TITLE = 'MR' THEN SEX = 1;
    IF TITLE = ' ' THEN SEX = .;    /* MISSING TITLE */

If you prefer missing numeric values to appear on prints as something other than a dot, use the MISSING option:

    OPTIONS MISSING = 'c';

where c is blank or whatever else you choose.

8.2 NEATER ARGUMENTS

The functions MAX, MIN, MEAN, N, RANGE, STD, and SUM - called sample-statistic functions - calculate statistics for a range of arguments supplied. They ignore arguments with missing values; with SUM this is equivalent to taking 0 for each missing value.

N returns the number of (non-missing) arguments, RANGE the difference between the largest and smallest arguments, STD the standard deviation; and there are more obscure functions for the statistician.

The usual way of writing arguments is a list with commas:

    AVERAGE = MEAN(TOM,DICK,GIACOMO);
    AP = SUM(WEEK1,WEEK2,WEEK3,WEEK4);

But you can use blanks in place of commas or compress a suffixed list, if you insert OF:

    AVERAGE = MEAN(OF TOM DICK GIACOMO);
    AP = SUM(OF WEEK1 WEEK2 WEEK3 WEEK4);
    AP = SUM(OF WEEK1-WEEK4);
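
The difference between ordinary arithmetic and the sample-statistic functions shows up as soon as one argument is missing. The sketch below uses made-up values for WEEK1 to WEEK4, with WEEK2 missing; AP1, AP2, and K are hypothetical field names.

```sas
DATA _NULL_;
    WEEK1 = 10;
    WEEK2 = .;                              /* missing value            */
    WEEK3 = 30;
    WEEK4 = 20;
    AP1 = WEEK1 + WEEK2 + WEEK3 + WEEK4;    /* missing, because of WEEK2 */
    AP2 = SUM(OF WEEK1-WEEK4);              /* ignores WEEK2            */
    K   = N(OF WEEK1-WEEK4);                /* non-missing arguments    */
    PUT AP1= AP2= K=;                       /* writes AP1=. AP2=60 K=3  */
RUN;
```

SAS also writes a note to the log saying that missing values were generated by the addition.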

8.3 CHARACTER TESTS

Comparing character fields of different lengths works as if the shorter field were padded out with trailing blanks. So if NAME has the value ‘FRED’ and PASSWORD is ‘FRED ’, the following will select positively:

    IF NAME = 'FRED ';
    IF PASSWORD = 'FRED';
    IF PASSWORD = NAME;

while

    IF NAME = ' FRED';
    IF NAME = 'FRED BLOGGS';

will not.

But if you put a colon after the = sign (or other comparison symbol), it is as if the long field were cut down to the length of the short, and so

    IF NAME =: 'FRED BLOGGS';

would connect.

You can use < > ¬< etc. to check alphabetic position. A blank counts first in the collating sequence, followed by special characters, letters A to Z, and numbers 0 to 9. So

    IF PTNO >: 'Z' … or IF PTNO >=: '0' …

checks if PTNO begins with a numeral.

8.4 DO’S AND DON’TS

A DO group is a list of statements written between “DO;” and “END;”. There are several examples in these notes where the execution of such a group of statements together depends on a single condition:-

    IF condition THEN DO;
        (action statements forming a DO group)
    END;

DO groups are also useful when parts of a DATA step are to execute repetitively. (This is distinct from the normal repetitive execution of the whole DATA step wherever an input file is involved).

    DATA;
        SET WORK1;
        LENGTH CODE £ 1;
        DO I = 1 TO 4;
            CODE = SUBSTR('ABCD',I,1);
            OUTPUT;
        END;

The example above would produce four observations for every observation read from WORK1, setting a new field called CODE to A, B, C, and D respectively.

In general DO statements for repetitive processing take the forms

    DO I = a TO b BY c;
    DO I = p, q, r, s, …;
    DO WHILE (expression);
    DO UNTIL (expression);
    DO OVER arrayname;

The first two forms can be combined in various ways, using commas between the items. Any of the entries a, b, c, p, q, etc. can be numbers, variables, or expressions that yield numbers. The following are two examples which come to the same thing:

    DO K=13, 5 TO 8, 3 TO -3 BY -2;
    DO K=13,5,6,7,8,3,1,-1,-3;

With DO WHILE, SAS first evaluates the ‘expression’ (which should be in brackets). If the expression is true, it executes the statements in the DO group. SAS then goes back and evaluates the expression again, and so on until the expression is not true. So another way of coding the DO group in the previous example would be:

    I = 1;
    DO WHILE (I < 5);
        CODE = SUBSTR('ABCD',I,1);
        OUTPUT;
        I + 1;
    END;

With DO UNTIL, SAS first executes the statements in the DO group regardless; it then evaluates the expression. If the expression is not true, it executes the statements again, and so on until the expression is true; then it stops. So again:-

    I = 1;
    DO UNTIL (I = 5);
        CODE = SUBSTR('ABCD',I,1);
        OUTPUT;
        I + 1;
    END;

“DO OVER arrayname;”, where ‘arrayname’ is the name of an array valid in the same step, means

DO i = 1 TO n;

where ‘i’ is the index variable of the array and ‘n’ is the number of elements in it. See “Arrays”.

8.5 GO TO

The statement

GO TO label;

tells SAS to jump to another statement within the same DATA step. Attach the ‘label’ to the front of the target statement with a colon ( : ).

    IF N <= 0 THEN GO TO L1;
    AVGE = TOT / N;
    PUT @55 AVGE;
    GO TO L2;
L1: AVGE = 0;
    PUT @55 AVGE @65 'BUCKET';
L2: …

Too many GO TOs produce unwieldy coding, and it is often better to use DO. The logic above, for example, is more lucidly coded with DO groups in “PROGRAM CONTROL”.

8.6 LINK

LINK label;

does the same as “GO TO label;”, except that when, after branching to the labelled statement, SAS meets

RETURN;

it returns to the statement immediately following the LINK statement.

This is a useful technique where a standard piece of coding needs to be executed at several places in a DATA step.
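
The following sketch shows the idea: one warning routine is called from two places in the step. The field names PTNO, QTY, and COST on WORK1 and the label WARN are hypothetical, chosen only for illustration.

```sas
DATA _NULL_;
    SET WORK1;
    FILE PRINT;
    IF QTY  < 0 THEN LINK WARN;     /* branch, then come back here */
    IF COST < 0 THEN LINK WARN;     /* same routine, second caller */
    PUT @5 PTNO @25 QTY @35 COST;
    RETURN;                         /* end of cycle: skip the routine */
WARN:
    PUT @5 '*** NEGATIVE VALUE ON NEXT LINE ***';
    RETURN;                         /* back to the statement after LINK */
RUN;
```

The RETURN before the label keeps the routine from being executed for every record; the RETURN inside it sends control back to whichever LINK called it.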

8.7 ARRAYS

An array is a group of variables (fields) referred to by a single name. If you have several variables requiring similar action, define an array on them, using an ARRAY statement. Examples are

    ARRAY COST (I) LABC MATC ASLC;
    ARRAY SCORE (TEST) S1-S4 ENDSCR;
    ARRAY ITEM X1-X12 Y1-Y12;
    ARRAY NAME (I) £ NAME1-NAME5;

Here COST, SCORE, ITEM, and NAME are array names. I and TEST are index variables, written in parentheses. The names on the right belong to the original variables, which become the elements of the arrays.

Notes on arrays: Numbered variables are a convenient shorthand: they do not necessarily form an array. Here S1-S4 just means S1 S2 S3 and S4.

The elements of an array are variables in their own right; you can still call them by their own names.

An array declaration is valid in one step only. Redeclare arrays in each DATA step where you want them to apply.

Do not use array names in INPUT, PUT, KEEP, DROP, FORMAT, or RETAIN statements or in PROC steps (except in certain special PROCedures).

You don't need to give an index variable: SAS uses _I_ by default.

A £ sign before the element names defines the elements as character variables. But if they are already defined as character (in INPUT or LENGTH statements or by assignment), you can omit the £. Don't use the £ sign with numeric variables.

Using arrays: You can refer to a variable by its own name or by the name of an array to which it belongs, if the index is set. So instead of

IF MATC > 0 THEN TOT = TOT + MATC;

you could code

    I = 2;
    IF COST > 0 THEN TOT = TOT + COST;

(with the array definition above), and instead of

Y1 = 10;

you could code

_I_ = 13; ITEM = 10;

8.8 ARRAYS AND DOS

Define an array if you want to perform similar actions on several fields. This means repetition, and a DO group is ideal for this.

Suppose you have 8 fields called QTY1 to QTY8 - and if any of these is negative its value is to be changed to 100. This repetitive process calls for an array and a DO group:

    ARRAY QTY (J) QTY1-QTY8;
    DO OVER QTY;
        IF QTY < 0 THEN QTY = 100;
    END;

Note that the DO statement above is equivalent to

DO J = 1 TO 8;

You don’t actually need the array index (J) here, and if you missed it out you could use

DO _I_ = 1 TO 8;

but in any case DO OVER QTY is neater, easier, and self-documenting.

8.9 BY-WAYS

You can use a BY statement in

1. PROC SORT

2. Other PROCedures (the result is like running the procedure separately for each value of the BY variable), or

3. a DATA step with SET or MERGE.

In cases 2 and 3 the dataset must be in order of the BY variable(s). To be safe, see that the BY statement is the same as (the first part of) the BY statement in a previous PROC SORT, including DESCENDING if used.
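
A minimal sketch of this rule, assuming a dataset WORK1 that holds the fields DEPT and NAME: the BY statement in PROC PRINT repeats the first part of the BY statement used in PROC SORT.

```sas
PROC SORT DATA = WORK1;
    BY DEPT NAME;           /* sort by department, then name    */

PROC PRINT DATA = WORK1;
    BY DEPT;                /* one printed section per department */
RUN;
```

Within each department the records still print in NAME order, because the sort has already arranged them that way.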

Each item in a BY statement is (to be) sorted in ascending or alphabetic order, unless preceded by DESCENDING or followed by NOTSORTED. Compare

    BY DEPT NAME;
    BY DESCENDING DEPT NAME;
    BY DEPTTITL NOTSORTED NAME;

(NAME is in ascending alphabetical order in each case.)

NOTSORTED (not to be used in PROC SORT) means that observations with the same value of the variable are grouped together, although the file is not necessarily in order of the variable. For example, if a file containing both department number and department title is sorted

BY DEPT;

observations with the same department title will come together, without the department titles necessarily being in alphabetical order. So we could write

BY DEPTTITL NOTSORTED;

in a PROC PRINT for example.

8.10 SELECTED OBSERVATIONS FOR TESTING

    DATA;
        SET MONEY (OBS = 50);
        IF COST < BOMB & CODE = 'MORSE';
        AS = SIGN(MENT);
        SUM + MATION;

This step reads the first 50 observations from MONEY in order to test a program, say. But those 50 observations might never get past the selection (IF . . .); the dataset would then have 0 observations, and the test would be frustrated.

Code as follows to make sure that 50 records are selected for the rest of the program (STOP halts the current step, not the ones that follow):

    DATA;
        SET MONEY;
        IF N = 50 THEN STOP;
        IF COST < BOMB & CODE = 'MORSE' THEN DO;
            AS = SIGN(MENT);
            SUM + MATION;
            . . .;
            OUTPUT;
            N + 1;
        END;

8.11 A SLICK USE OF INDEX

You can use the INDEX function to save writing a long string of ORs when selecting records with IF. For example, replace

    IF SITE = 'A' | SITE = 'B' | SITE = 'P' | SITE = 'Q' | SITE = 'W';
or
    IF SLOC = 'BA' | SLOC = 'BR' | SLOC = 'EW' | SLOC = 'DY';

with

    IF INDEX('ABPQW',SITE) > 0;
or
    IF INDEX('BA,BR,EW,DY',SLOC) > 0;

8.12 TROUBLE WITH SUBSTRING

Suppose you are taking bits out of character strings using SUBSTR, and tacking them together using ||, to make a new character field. This sort of thing:

    DATA;
        . . .
        INPUT . . . CATSEQ £CHAR12. . . .;
        FIGURE = SUBSTR(CATSEQ,7,2);
        ITEMBIT = SUBSTR(CATSEQ,10,3);
        ITEM = FIGURE || ITEMBIT;

You expect FIGURE to be 2 and ITEMBIT 3 characters long, and the new field ITEM to be 5 characters. You may be surprised to find that ITEM is 24 characters long with lots of blanks.

This happened because a field created from SUBSTR takes its length from the first argument, so that in this example FIGURE and ITEMBIT will be the same length as CATSEQ, ie. 12 characters each.

The solution is to get a LENGTH statement in before the fields are created -

LENGTH FIGURE £ 2 ITEMBIT £ 3;
FIGURE = SUBSTR (CATSEQ, 7,2);
ITEMBIT = . . .

8.13 SAS MACRO LANGUAGE

The SAS macro language allows the user to generate data-dependent statements, to communicate information between steps, to execute DATA or PROC steps conditionally and to generate repetitive code. While this powerful feature is beyond the scope of this guide, the %INCLUDE statement is worthy of note.

%INCLUDE member; copies in the code from the dataset with a DDNAME of member. This is useful for copying in common code, eg. file definitions.

//VC01RX9S JOB ,'CGW MSL TOMOTT X4183',CLASS=C,MSGCLASS=R
//*MAIN SYSTEM=JGLOBAL
//SAS EXEC SAS
//CPSYS DD DISP=SHR,DSN=V.VSAM.CT.AT.CPSYS
//CPSYSFD DD *
 INPUT @1 ORG 3.
       @4 …
/*
//SYSIN DD *
 DATA _NULL_;
 INFILE CPSYS VSAM;
 %INCLUDE CPSYSFD;
 …
/*

For clarity the above example shows the %INCLUDEd code instream; normally it would be kept on a sequential dataset.

9. ASPECTS OF INPUT

9.1 TEST INPUT

Instead of using an input file, you can enter data with your program. Comment out the INFILE statement, and include your data at the end of the DATA step, preceded by the statement

CARDS;

The format of the data should correspond with the INPUT statement. Follow the last line of your data with the first line of the next step or with RUN; - the semicolon in this line signals the end of the data.

DATA WORK1;
  INPUT NAME £ DATE SCORE;
  INFORMAT DATE DDMMYY6.;
  . . .
  CARDS;
PETE 061282 151
SUE 150483 180
GEOFF 220183 207
PROC SORT; BY SCORE;
PROC CHART …

If you use the normal style of INPUT, with @ and format, your CARDS input must correspond exactly. But if your data items are separated by blanks, and have no embedded blanks, it is easier to code the INPUT in list form as in the example: just write a list of fields, with £ after each character field.

9.2 INPUTTING SEVERAL DATASETS

DATA WORK7; SET WORK3 WORK4 WORK6; . . .

If more than one dataset follows SET, these datasets will be concatenated ie. the new dataset will contain the observations from all the input datasets. Normally the observations will be in their original order: the observations from the first dataset named will be followed by those from the second, etc.

However, if a BY statement is included, and if the input datasets are sorted by the BY variable, the new dataset will also be sorted by the BY variable. Note how this interleaving process differs from merging, where the observations from the different sources are fused together.

The input datasets need not all have the same fields. In the example above, WORK7 will contain all the fields present on WORK3, WORK4 or WORK6. Where an observation originates from a dataset in which one of the fields is absent, the corresponding value will be held as missing.
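
The difference between concatenating and interleaving can be sketched with two tiny datasets. The names A, B, BOTH, ID, X and Y are invented for this illustration, and the fields are all numeric to keep the INPUT statements simple:

```sas
* Two small input datasets, each already in order of ID;
DATA A;
  INPUT ID X;
  CARDS;
1 10
3 30
RUN;
DATA B;
  INPUT ID Y;
  CARDS;
2 200
4 400
RUN;
* SET with BY interleaves the observations in ID order;
DATA BOTH;
  SET A B;
  BY ID;
PROC PRINT DATA = BOTH;
RUN;
```

With the BY statement the observations should come out in the order 1, 2, 3, 4; without it they would be 1, 3, 2, 4. Either way Y is missing on the observations that came from A, and X is missing on those from B.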

9.3 MERGE WITHOUT BY

DATA WORK7; MERGE WORK3 WORK4 WORK6; . . .

As with a multiple SET statement, MERGE reads from two or more datasets to produce a single dataset. But whereas SET preserves the total number of observations from all sources, MERGE combines them: each observation in the new dataset can contain fields from all the source datasets.

If there is no BY statement, the first observation of the new dataset is made up from the first observation of each of the source datasets, the second from the second, etc. The dataset with the most observations fixes the number of observations created; once a shorter dataset runs out, the fields arising from it are made up with missing values.
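
A minimal sketch of a MERGE without BY, using invented datasets LONG (three observations) and SHORT (two observations) with numeric fields only:

```sas
DATA LONG;
  INPUT A B;
  CARDS;
1 2
3 4
5 6
RUN;
DATA SHORT;
  INPUT C;
  CARDS;
10
20
RUN;
* No BY statement, so observations are paired by position;
DATA PAIRS;
  MERGE LONG SHORT;
PROC PRINT DATA = PAIRS;
RUN;
```

PAIRS should have three observations, each with fields A, B and C. On the third observation C is missing, because SHORT has no third observation to supply it.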

9.4 OBSERVATION (END = )

DATA;
  SET WORK2 END = E;
  OUTPUT;
  IF E = 1 THEN DO;
    SET WORK3;
    OUTPUT;
  END;

The END = option on the SET statement creates a field you can use to detect when you have come to the end of the input file. Give the field any valid name (E in the example). Its value will be 0 until the last observation has been read in, and then it will be 1. The example shows how to take an observation (totals perhaps) from a single-observation file WORK3 and tack it on the end of WORK2 to produce a new dataset.

You can use END = in the same way with INFILE, MERGE, and UPDATE statements.

If you want to detect the first or the nth record of a file, use the automatic variable _N_. _N_ is the number of times SAS has begun executing the DATA step. To process only every 10th observation, for example, code

IF FLOOR (_N_ / 10) = _N_ / 10;
or
IF MOD (_N_,10) = 0;
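
To see the effect of the MOD test, here is a small sketch. The datasets BIG and SAMPLE and the field I are invented for the illustration; BIG is simply built with a DO loop so that the example needs no input file:

```sas
* Build a dataset of 35 observations, I = 1 to 35;
DATA BIG;
  DO I = 1 TO 35;
    OUTPUT;
  END;
* Keep only every 10th observation read;
DATA SAMPLE;
  SET BIG;
  IF MOD (_N_, 10) = 0;
PROC PRINT DATA = SAMPLE;
RUN;
```

SAMPLE should contain three observations, with I equal to 10, 20 and 30. This works because there is only one SET statement, executed once per pass through the step, so _N_ matches the observation number.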

9.5 FIRST. AND LAST.

If you use BY with a SET statement (the dataset being in order of the BY variable), SAS generates a “LAST.” variable, whose value is 1 for the last observation before the variable changes, and 0 otherwise; and also a “FIRST.” variable, whose value is 1 for the first observation after the variable changes, and 0 otherwise.

PROC SORT DATA = WORK2;
  BY DEPT;
DATA;
  SET WORK2;
  BY DEPT;
  COUNT + 1;
  IF LAST.DEPT = 1 THEN DO;
    OUTPUT;
    COUNT = 0;
  END;

In this example, the dataset WORK2 is sorted by department and used as input to a DATA step. Several observations might have the same value of DEPT. On reading the last of these, ie. just before DEPT changes value, a record is to be output and the record count reset to zero.

Note that, where a value of DEPT occurs only once, FIRST.DEPT and LAST.DEPT will both be 1 for that observation.

Note also that the FIRST. and LAST. facility may not work if you have more than one SET in the step. A further problem may arise if you are selecting observations. A statement like “IF SLOC='F';” in the example above could delete an observation having LAST.DEPT equal to 1, and the change of department would go undetected.
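
One way round the selection problem is to make the selection part of the counting, rather than deleting observations before LAST. is tested. The following is only a sketch: the test data, the field COST, the limit of 100 and the dataset COUNTS are all invented for the illustration.

```sas
DATA WORK2;
  INPUT DEPT COST;
  CARDS;
10 50
10 150
20 200
20 75
30 20
RUN;
PROC SORT DATA = WORK2;
  BY DEPT;
DATA COUNTS;
  SET WORK2;
  BY DEPT;
  * Start a fresh count at the first observation of each department;
  IF FIRST.DEPT THEN COUNT = 0;
  * Count only the selected observations, but delete none;
  IF COST > 100 THEN COUNT + 1;
  * Every observation reaches this test, so no change of DEPT is missed;
  IF LAST.DEPT THEN OUTPUT;
  KEEP DEPT COUNT;
PROC PRINT DATA = COUNTS;
RUN;
```

COUNTS should hold one observation per department: a count of 1 for departments 10 and 20, and 0 for department 30, which would have vanished altogether had the selection been coded as a subsetting IF.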

9.6 DATASET OPTIONS

The coding below illustrates a few normal SAS facilities. Observations 101 to 200 are selected from the file XY, using FIRSTOBS and OBS on INFILE; certain fields are read in; and after a further selection based on the values of OTIME and SALARY, these fields are dropped from the file.

DATA;
  INFILE XY FIRSTOBS=101 OBS=200;
  INPUT @1 NAME £CHAR18.
        @16 DOB PD3.0
        …;
  RENAME NAME = SURNAME DOB = BORN;
  IF OTIME > 0 & SALARY < 8000;
  DROP OTIME SALARY;

The facilities OBS, FIRSTOBS, RENAME, KEEP and DROP are among those that can be specified as options on SET, MERGE, and UPDATE statements, eg.:

DATA;
  SET WORK1 (OBS = 9 DROP = PITCH DIA RENAME = (COLOUR = SHADE));

Notice that the options are put in brackets, even if there is only one:

SET WORK5 (OBS = 25);

and separated by at least one blank. If you are renaming several fields, use blanks to separate them too:

SET WORK8 (RENAME = ( R = RED P = PINK B = BLUE));

You can also use dataset options in the same way in DATA = and OUT = in PROCedures. In a DATA statement they provide a way of sending different fields to different datasets:

DATA WORK6 (KEEP = PART YEAR81 TOT81)
     WORK7 (KEEP = PART YEAR82 TOT82)
     WORK8 (KEEP = PART YEAR83 TOT83);
  . . .

Although OBS and FIRSTOBS don’t make any sense on a DATA statement or on the OUT = option in a PROCedure, you can use them with DATA = to process only part of a dataset:

PROC PRINT DATA = WORK9 (OBS = 10);

If you use RENAME and KEEP or DROP on the same dataset, the KEEP/DROP happens first; so use the old name with KEEP or DROP.
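
A short sketch of KEEP and RENAME used together on one dataset. The test values are invented, and COLOUR is treated here as a numeric code purely to keep the INPUT statement simple:

```sas
DATA WORK1;
  INPUT COLOUR PITCH DIA;
  CARDS;
1 20 5
2 25 6
RUN;
* KEEP is applied first, so it names COLOUR, not SHADE;
DATA WORK2;
  SET WORK1 (KEEP = COLOUR DIA RENAME = (COLOUR = SHADE));
PROC PRINT DATA = WORK2;
RUN;
```

WORK2 should contain the fields SHADE and DIA only. Writing KEEP = SHADE DIA instead would go wrong, because at the time the KEEP is applied there is no field called SHADE on WORK1.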

10. ASPECTS OF OUTPUT

10.1 OUTPUT FROM DATA

The next four sections concern ways of programming output from a DATA step. The PROCedures like PRINT, CHART, PLOT, and FREQ are programs written to process data in special ways for the print file. Within a DATA step you can program your own procedure, with options to output the print file to an O.S. file. If you don’t want your DATA step to create a SAS dataset as well, begin it with DATA _NULL_;

Use PUT statements to output, in the same way as INPUT to input.

And just as INPUT is preceded by INFILE to specify the file you are reading, so PUT is preceded by a FILE statement to specify the file you are writing.

10.2 THE FILE STATEMENT

The form is

FILE filename options;

where ‘filename’ is PRINT, PUNCH, LOG, or a dd name for an OS file, and ‘options’ may be omitted; see the next section.

FILE PRINT will direct your output to the print file, and this is an alternative to using PROC PRINT. FILE PUNCH will produce punched cards (remember those?).

With FILE LOG you write on the SAS log, and your output appears with the SAS notes and messages. If you omit the FILE statement, LOG is taken by default. No file allocation is needed with PRINT, PUNCH, or LOG.
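
As a minimal sketch of writing to the log, the step below has no FILE statement, so its PUT output should appear among the SAS notes. The fields I and SQ are invented for the illustration, and the "name=" form of PUT is used so that each value is labelled:

```sas
* No dataset is created and no FILE statement is given;
DATA _NULL_;
  DO I = 1 TO 3;
    SQ = I * I;
    PUT I= SQ=;
  END;
RUN;
```

Adding FILE PRINT; before the DO loop would send the same three lines to the print file instead.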

10.3 FILE OPTIONS

Several of the options usable with FILE are ways to set up variables to tell you where you are when printing.

DATA _NULL_;
  TITLE MENUS FOR A SUMMER SEASON;
  SET WORK80;
  FILE PRINT LINE = LCOUNT;
  PUT @22 DAT DATE7.
      @29 ENTREE
      @44 MAIN
      @57 SWEET;
  IF LCOUNT = 18 THEN PUT @10 / / / 68 * '=' _PAGE_;

This example uses the LINE = option and a title to prepare printed pages. LINE = LCOUNT sets up a field LCOUNT (you choose the name), a count of lines printed. When 18 lines are printed, two lines are skipped (3 slashes), 68 equals signs are printed for decoration, and a new page begins. Each page carries a title.

A FILE statement need not carry any options. If there are several options, at least one blank should separate them. For example:

FILE MENU PRINT HEADER = HDR LINESLEFT = LDF;

If the ‘filename’ is PRINT, SAS supplies carriage-control characters in column 1 to trigger off line skips and new pages. With any other file name, use PRINT as an option (as above) if you want carriage-control characters - for example if you want to store the file on disk to print at a later stage.

The HEADER = option defines a label for branching when a new page is needed; LINE = sets a field to contain the current line number when printing. LINESLEFT = sets a field to tell how many lines are left on the page; this is useful if you want to include a footnote:

DATA _NULL_;
  SET WORK2;
  FILE PRINT LINESLEFT = GAP;
  PUT @9 START
      @19 DESTINATION
      @29 DISTANCE 5.1
      @36 TIME 4.0;
  IF GAP = 5 THEN PUT @40 / / / / DATE DATE7. _PAGE_;

Further options: NOTITLES is often used with HEADER = to suppress automatic titles. PAGESIZE = sets the number of lines per page (the default is that specified in the PAGESIZE system option). COLUMN = works similarly to LINE = and sets a variable to the current location of the pointer on the print line.

10.4 THE PUT STATEMENT

Follow the word PUT with a list of variables to write, and line controls if required. If you are printing, the values will appear on a line, separated by single blanks, or in the positions your line controls specify. Some examples:

PUT X Y ALPHA BETA RATIO _ERROR_;
PUT @22 DAT DATE7. @29 ENTREE @44 MAIN @57 SWEET;
PUT @POS1 RED @POS2 ORANGE @POS3 YELLOW _PAGE_;
PUT _PAGE_;
PUT _ALL_;
PUT @20 x 5.2 +10 y 5.2 +10 z 5.2;
PUT @60 'DEFINITE' OVERPRINT @60 '-----------';

The first of these will write the values of six variables including the error indicator (SAS maintains _ERROR_ as 0 until it finds an error, then sets it to 1). The second and third statements give positions and formats (sometimes). For printing, the positions are the columns on the page at which the fields should start. Where a format is omitted, SAS uses the format it already knows for the variable.

+10 means “Skip 10 positions (columns) after the end of the preceding field”.

PUT _ALL_ means “Write the values of all the current variables” - useful for diagnosis.

The most useful line controls are

@n move to position n (where n is a number or variable but not an expression)

+n move forward n positions (ditto)

/ go to the next line

#m go to line m (see next section)

_PAGE_ go to the next page

OVERPRINT overprint on the same line

The items to PUT can be variables or character constants. The latter should be in quotes, eg. 'DEFINITE'; without the quotes SAS would hunt for a variable called DEFINITE. You can replicate a character constant; so 6*'HA' would print the same as 'HAHAHAHAHAHA'.

10.5 PRINTING MULTIPLE COLUMNS

You may want to save paper, when printing only a few fields, by repeating the column headings across the page. To do this, make use of

1. the # control symbol. “PUT #m . . .” means move to line m

2. the N = option on the FILE statement; this controls the number of lines held in a buffer before printing. The default is N = 1, which doesn’t allow you to use #. Here you need to use N = PAGESIZE (or N = PS) and format a whole page at a time:

DATA _NULL_;
  TITLE ECONOMIC MOTORING FILE;
  FILE PRINT HEADER = H N = PS;
  DO C = 2, 32, 62;
    DO L = 5 TO 19 BY 2;
      SET CAR;
      D = C + 12;
      PUT #L @C OWNER
             @D TYPE;
    END;
  END;
  PUT _PAGE_;
  RETURN;
H: PUT /@2 'OWNER' @16 'TYPE'
        @32 'OWNER' @46 'TYPE'
        @62 'OWNER' @74 'TYPE';

This example uses nested DO groups to set values for line (L) and column (C). The SET statement, to read the data, is inside the DO groups. The PUT _PAGE_ instruction dispatches the completed page to the printer; without it the page would remain in the buffer to be overwritten by the data that followed.

10.6 OUTPUT TO EXTERNAL FILES

The outputs from SAS are

1. the SAS print, ie. the output from printing PROCedures and from PUT statements used with FILE PRINT,

2. the SAS log, ie. information messages about steps and files, and error and warning messages, and

3. saved SAS datasets.

(A SAS dataset can be written or read only from within SAS. If you want the data in a non-SAS file, use PROC PRINT or PUT.)

As a rule, the first two outputs appear as prints when you submit a batch job.

10.7 PROC PRINTTO

PROC PRINTTO allows you to change in mid-program the destination of your print output. In the following program, WORK1 will be printed normally, WORK2 will go to a file defined by UNIT=90, and WORK3 will be printed normally:

DATA WORK1 WORK2 WORK3;
  . . .
PROC PRINT DATA = WORK1;
PROC PRINTTO UNIT = 90 NEW;
PROC PRINT DATA = WORK2;
PROC PRINTTO;
PROC PRINT DATA = WORK3;

The form of the procedure is

PROC PRINTTO UNIT = nn NEW;
or
PROC PRINTTO UNIT = nn;
or just
PROC PRINTTO;

The UNIT option sends output to one of the special files FTnnF001 belonging to SAS (FT11F001 is already the SAS log, FT12F001 the print file, etc.: files from FT90F001 are available). Use the NEW option the first time in your program that you are writing to a particular file. PROC PRINTTO on its own cancels previous options and reroutes output to the printer; otherwise the last options are effective to the end of the program.

Before running your program you must allocate or code a DD card for any of these files. For example, if you want to save a print file, you can use UNIT=90 and

//FT90F001 DD …
