SAS Summary Guide School of Applied Statistics November, 03



Contents

1. Introduction
   1.1 Structure of a SAS Job
   1.2 SAS Language
   1.3 SAS Variables
   1.4 SAS Data Sets
2. Introduction to the DATA Step
   2.1 DATA Statement
   2.2 Sources of Input
   2.3 Input of Raw Data
   2.4 Formats: Input and Output
   2.5 How SAS Executes a DATA Step
   2.6 Transformation of Data
   2.7 Missing Values
   2.8 Modifying an Existing SAS Data Set
   2.9 Output from a SAS DATA Step
   2.10 Output to Create Stored ASCII Files
3. Introduction to the PROC Step
4. Basic Procedures
5. More on the DATA Step
   5.1 IF - THEN - ELSE Statements
   5.2 Selecting Observations
   5.3 DO and END Statements
   5.4 DO Loops
   5.5 Arrays
   5.6 RETAIN
   5.7 DROP and KEEP
   5.8 RENAME and LABEL
6. Data Management
   6.1 SET
   6.2 MERGE
   6.3 UPDATE
7. Statistical Procedures
8. Graphical Procedures
9. Output Delivery System (ODS)
10. Further Facilities
11. Publications


1. Introduction

This handout is a brief introduction to the syntax of the SAS package, which is available on UNIX workstations and PCs at The University of Reading. The SAS language is similar for all versions but there are differences in file access and storage. This document gives a brief synopsis of many basic commands used in the DATA step and the general structure of some statistical procedures (PROC). It is by no means complete, and there are numerous specialised manuals published by SAS Institute (some of which are in Room G16 in the School of Applied Statistics).

1.1 Structure of a SAS Job

A SAS program consists of a sequence of one or more steps and each step may contain several SAS statements. There are two kinds of step:-

• The DATA step which is used to create and manipulate SAS data sets

• The PROC step which is used for analysing or processing SAS data sets

A SAS job is made up of any number of these steps. The beginning of one step signifies the ending of the previous step.

1.2 SAS Language

SAS statements can begin in any column of a line and can be continued on subsequent lines. Each SAS statement must end with a semicolon. SAS keywords and names are not case-sensitive (i.e. upper and lower case can be freely mixed), although character values in quotes are compared exactly as typed.

There are three types of SAS statements:-

• Statements which appear in the DATA step

• Statements which appear in the PROC step

• Statements which can appear anywhere (global statements)

Comments can also be included in a SAS program; these are useful for annotating your program. An asterisk is used to comment out a single statement.

e.g. * This is a comment ;

or to comment out a block of lines use the /* and */ delimiter pairs:-

e.g. /* This is a comment

which will not be acted upon by SAS */

1.3 SAS Variables

There are two types of SAS variable - numeric and character. They can have the following attributes:-

LENGTH      numeric variables 2 - 8 bytes
            character variables 1 - 32,767 bytes / characters (1 - 200 before Version 7)

INFORMAT    format SAS uses to read a data value into a variable

FORMAT      format SAS uses to write each value of a variable

LABEL       descriptive label of up to 256 characters


1.4 SAS Data Sets

A SAS data set is a collection of data values arranged in a rectangular table, with the rows representing observations and the columns representing variables. Each variable must be given a name which consists of 1 - 32 characters. The name must start with a letter or an underscore and can contain any alphanumeric character or underscore. Avoid special characters in variable names such as . or $ . Special variables within SAS are denoted by names that begin and end with an underscore.

SAS data sets can be either temporary or permanent. Temporary data sets are given a one-level name by the user which is automatically prefixed with WORK. by the SAS system. This name can be omitted altogether, in which case SAS names the data sets DATA1, DATA2 ... for the 1st, 2nd ... data sets defined. Temporary data sets are erased on leaving the current SAS session. Permanent data sets must be given a two-level name by the user linking to their storage location.

e.g. LIBNAME PERM 'complete_pathname';
     PROC PRINT DATA=PERM.STUDENTS;
     RUN;

Permanent SAS data sets are stored differently between versions and allocated different file extensions. However, all data sets are upward compatible. There are several words which should not be used as the first part of the SAS data set name. These include such words as PRINT, EXEC, DATA etc. and also SAS reserved names such as LIBRARY, MAPS, WORK etc.

SAS automatically documents a permanent data set to include a data set label, variable attributes and history information. The data are stored in the form in which SAS uses them, therefore saving computer time and making it unnecessary to execute input statements each time the data set is used.

2. Introduction to the DATA Step

2.1 DATA Statement

The DATA statement signals the beginning of the DATA step and gives a name to the SAS data set being created. This SAS data set can be used as input to any subsequent DATA or PROC steps.

e.g. a) DATA PERM.PATIENTS;    creates a permanent data set
     b) DATA SCHOOL;           creates a temporary data set
     c) DATA;                  creates a temporary data set with default name DATAn
     d) DATA _NULL_;           does not create a data set

2.2 Sources of Input

a) The DATALINES or CARDS statement is used when the data are in the same file as the SAS statements:-

DATA REGRESS;

INPUT X Y Z;


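DATALINES;
61 44 29
17 6 43
;
RUN;

Further data lines follow in the same layout. A line containing only a semicolon marks the end of the data.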

b) The INFILE statement is used to read data from an external file on your workdisk:-

DATA REGRESS;
INFILE 'file_identifier';
INPUT X Y Z;

The file identifier in the INFILE statement is the full pathname and filename of the external data file, residing on your disk, which is to be linked to your SAS program.

2.3 Input of Raw Data

The INPUT statement is used to describe the raw input data. There are three types of input mode which can be mixed in one INPUT statement:-

• LIST (or free-field)

• COLUMN

• FORMATTED

a) LIST INPUT

This mode of input simply lists the variables in the order in which they appear in the input data

e.g. INPUT NAME $ AGE SEX $;

INPUT NAME $ Q1-Q32;

where $ is used after a variable name to indicate a character variable whose value has a default length of 8 with no embedded blanks. Values must be separated by at least one space (free format).

b) COLUMN INPUT

With this mode of input the columns are specified within which each variable value is located

e.g. INPUT CANNAME $ 1-15 PARTY $ 20-24 VOTES 30-40;

The data values can be read in any order and blank fields are automatically set to missing. Embedded blanks are allowed in character data by specifying the maximum length of a value.

c) FORMATTED INPUT

This is a very flexible method of input as it is possible to read data in virtually any form. SAS keeps track of its position on the input lines with a 'pointer'

e.g. INPUT @3 QUEST3 +10 QUEST12 / @60 RESPONSE;

There are various types of 'pointer' controls each having a different meaning. Listed below are some of the more frequently used ones:-

@n move pointer to column n


+n move the pointer forward n columns

#n move pointer to line n

/ move to next line

Whichever mode of input is used the following 'pointer' controls can be used to maintain the current pointer position:-

@ 'hold' data line for next INPUT statement in the current DATA step

@@ 'hold' data line for more executions of the DATA step
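As an illustration of the double trailing @, the sketch below reads several observations from each data line. The data set name PAIRS, the variable names and the data values are made up for this example.

```sas
/* Hypothetical example: each data line holds three X-Y pairs */
DATA PAIRS;
  INPUT X Y @@;   /* @@ holds the line across executions of the DATA step */
  DATALINES;
1 10 2 20 3 30
4 40 5 50 6 60
;
RUN;

PROC PRINT DATA=PAIRS;
RUN;
```

Without the @@ only the first pair on each line would be read, giving two observations instead of six.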

2.4 Formats: Input and Output

A set of directions for reading a value is called an INFORMAT and a set of directions for printing a value is called a FORMAT. It is possible to specify formats for numeric and character variables and also date and time variables. There are a large number of FORMAT and INFORMAT specifications; refer to SAS Language Reference Version 8 for further information.

2.5 How SAS Executes a DATA Step

A DATA step is executed once for each observation in the data set. A DATA step that does not contain an INPUT, SET, MERGE or UPDATE statement is executed once. The SAS variable _N_ is automatically generated for each DATA step; its value is the number of times that SAS has begun executing the step (_N_ is not directly available outside the current DATA step). All variables referred to in the DATA step, for example the variables named in the INPUT statement and any new variables generated, make up the program data vector.

For each execution of the DATA step:-

• The program data vector is initialised to missing.

• The data values of the current observation are read using the INPUT statement. Any new variables are computed and added to the program data vector and any variables not wanted are dropped.

• The values in the program data vector are then added to the data set being created
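The cycle above can be watched in the SAS log with a small sketch such as the following. The data set name COUNT, the variable OBSNO and the data values are hypothetical.

```sas
/* Hypothetical example: trace each execution of the DATA step */
DATA COUNT;
  INPUT X;
  OBSNO = _N_;    /* _N_ itself is not written to the data set, so copy it */
  PUT _N_= X=;    /* writes the iteration number and X to the log */
  DATALINES;
5
8
13
;
RUN;
```

The step executes once for each data line, so COUNT contains three observations with OBSNO equal to 1, 2 and 3.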

2.6 Transformation of Data

There is a range of standard functions available in SAS for transforming data. For a full list of these functions consult the SAS Language Reference. Manipulation and transformation of data is carried out in the DATA step with the resulting variable being added to the data set automatically.

e.g. SUM=X + Y;
     X2=X * X; or X2=X**2;
     LX=LOG(X);

2.7 Missing Values

Variables with missing values on input are specified in SAS by a full stop or a blank field. On output numeric variables are displayed as a full stop and character variables as a blank field. For numeric variables it is also possible to specify up to 27 special missing value symbols ( A - Z and _ ) to distinguish between different kinds of missing data. This is done using the MISSING statement:-


DATA;

INPUT X;

MISSING A B;

IF X = 99 THEN X = .A;

IF X = 999 THEN X = .B;

CARDS;

a) .A is used to distinguish from the variable name A

b) A variable is set to missing if the input field contains only a full stop or is blank.

c) A variable is set to missing if the input field contains an illegal character

2.8 Modifying an Existing SAS Data Set

Once data have been read into a SAS data set it is possible to modify that data in other DATA steps while keeping the original data set unchanged and without having to re-input the data from the raw data file. This is easily done by transferring data from the existing SAS data set into another one.

e.g. DATA NEW;

SET PERM.PATIENTS;

DOSE=PILL_A*QTY_A;

Each time the SET statement is executed another observation is transferred from the existing SAS data set PERM.PATIENTS to the SAS data set being created and called NEW .

2.9 Output from a SAS DATA Step

OUTPUT statements allow you to control when an observation is written to one of the SAS data sets which are currently being created.

e.g. OUTPUT;

OUTPUT MISSDATA;

When an OUTPUT statement is executed SAS will immediately output the current values to the named or current SAS data set. OUTPUT statements are useful for:-

a) Creating 2 or more observations from 1 record of input data

b) Combining several observations into one observation

c) Creating more than one SAS data set from one input file

e.g. DATA HARV1 HARV2;

SET COMPLETE;

IF HARVEST=1 THEN OUTPUT HARV1;

IF HARVEST=2 THEN OUTPUT HARV2;
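Use a) above, creating several observations from one record, might be sketched as follows. The data set LONG, the variables SUBJECT, SCORE1 - SCORE3, TIME and SCORE, and the data values are all invented for illustration.

```sas
/* Hypothetical example: one input record becomes three observations */
DATA LONG;
  INPUT SUBJECT SCORE1 SCORE2 SCORE3;
  TIME = 1; SCORE = SCORE1; OUTPUT;
  TIME = 2; SCORE = SCORE2; OUTPUT;
  TIME = 3; SCORE = SCORE3; OUTPUT;
  DROP SCORE1-SCORE3;
  DATALINES;
1 12 15 19
2 10 11 14
;
RUN;
```

Each OUTPUT statement writes the current values at that point, so the two input records give six observations in LONG.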


2.10 Output to Create Stored ASCII Files

The FILE and PUT statements are used within a DATA step and are analogous to the INFILE and INPUT statements. The FILE command links SAS to a specific external file, while the PUT command specifies the output record format.

e.g. DATA CREATE;

SET CLASSNO;

FILE 'file_identifier';

PUT NAME $ 1-8 SEX $ 11 AGE 13-14;

3. Introduction to the PROC Step

Some of the procedures available in SAS are:-

Basics:     CHART, CONTENTS, CORR, DATASETS, FORMAT, FREQ, MEANS, PLOT, PRINT, SORT, SUMMARY, TABULATE, TRANSPOSE, UNIVARIATE

Statistics: ANOVA, CANCORR, CANDISC, CLUSTER, DISCRIM, FACTOR, GLM, PRINCOMP, REG, TTEST

Graph:      GCHART, GCONTOUR, GMAP, GPLOT, GSLIDE, G3D, G3GRID

SAS procedures analyse and process SAS data sets as follows:-

a) Read SAS data sets

b) Perform the requested task

c) Print results

d) Create SAS output data sets (optional)

Most SAS procedures have default option settings for the more common situations or analyses. However, information can be given to the PROC step to specify:-

a) Which data set to process

b) Which variables to process

c) Whether to process the data in subsets

The PROC statement is used to begin a procedure.

e.g. PROC MEANS DATA=PERM.PATIENTS MEAN STD;

Some of the more commonly used statements within the PROC step are:-

a) General statements common to many procedures

VAR Specifies variables to be analysed

ID Specifies a variable whose values identify observations in the SAS data set


BY Specifies that the data set is to be processed in groups

N.B. The data set must have already been sorted in the order of the current

BY group.

WEIGHT Specifies a variable whose values are the relative weights for the observations

WHERE Subsets observations to be analysed based on specified criteria

b) Statements specific to individual procedures

TABLES Table request in PROC FREQ

PLOT Plot request in PROC PLOT

MODEL Model specification in PROC ANOVA, PROC GLM, PROC REG etc.

c) Statements describing variable attributes

FORMAT Specifies formats for printing variable values

LABEL Associates descriptive labels with variable names

Lists of names can be abbreviated:-

a) Range of variables                   VAR SEX -- TEMP;

b) Numeric suffix range                 VAR Q1 - Q20;

c) Range of numeric variables only      VAR AGE-NUMERIC-TEMP;

d) Range of character variables only    VAR NAME-CHARACTER-SEX;

e) All numeric variables                VAR _NUMERIC_;

f) All character variables              VAR _CHARACTER_;

4. Basic Procedures

PROC CHART

This procedure produces horizontal and vertical bar charts, pie charts, star charts and block charts for numeric and character variables. The charts can represent frequencies and cumulative frequencies, percentages and cumulative percentages, sums and means.

PROC CHART DATA = data_set_name options ;

HBAR variable_list ; produces horizontal bar chart

VBAR variable_list ; produces vertical bar chart

PIE variable_list ; produces pie chart

STAR variable_list ; produces star chart

BLOCK variable_list ; produces block chart

BY variable_list ;


PROC CORR

This procedure computes correlation coefficients between variables. Various univariate statistics are also computed.

PROC CORR DATA = data_set_name options ;

VAR variable_list ;

WITH variable_list ;

WEIGHT variable ;

FREQ variable ;

BY variable_list ;

PROC FORMAT

This procedure is used to define formats that attach labels to variable values on output. Formats can be defined for either numeric or character variables. They can be used in PUT statements in a DATA step and in FORMAT statements in a PROC step. They can also be used in FORMAT statements in a DATA step, in which case they are associated with the variable for the remainder of the SAS job, unless changed.

PROC FORMAT options ;

VALUE format_name value1 = label1 value2 = label2 . . valuen = labeln ;

format_name   Must be a unique SAS name which must begin with a $ for character variables

values        Can be a single number or a range of numbers, or several numerical or character values

labels        Labels can contain a maximum of 40 characters and must be enclosed in quotes

e.g. PROC FORMAT;

VALUE $SEXFMT 'M' = 'Male' 'F' = 'Female';

VALUE AGEFMT 1 - 16 = 'Child' 17 - High = 'Adult';

The formats defined above can be used in other procedures as follows:-

PROC PRINT DATA = PERM.PATIENTS;


VAR SEX AGE;

FORMAT SEX $SEXFMT. AGE AGEFMT. ;

NB. The full stop after SEXFMT and AGEFMT is essential

PROC FREQ

This procedure produces 1 - way to n - way frequency tables of character and numeric variables.

PROC FREQ DATA = data_set_name options ;

WEIGHT weighting_variable ;

BY variable_list ;

TABLES table_request / options ;

In the TABLES specification the values of the last variable form the columns and the values of the second last variable form the rows.

e.g. TABLES VAR1; one - way table

TABLES VAR1 * VAR2; two - way table
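A small self-contained sketch of both kinds of table request is given below. The data set SURVEY, its variables SEX and ANSWER, and the data values are hypothetical.

```sas
/* Hypothetical example: one-way and two-way frequency tables */
DATA SURVEY;
  INPUT SEX $ ANSWER $ @@;
  DATALINES;
M YES F NO F YES M NO F YES M YES
;
RUN;

PROC FREQ DATA=SURVEY;
  TABLES SEX;            /* one-way table */
  TABLES SEX * ANSWER;   /* two-way table: SEX forms the rows, ANSWER the columns */
RUN;
```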

PROC MEANS

This procedure is used to produce simple univariate statistics for numeric variables. The options available allow you to specify which statistics you want calculated e.g. mean, standard deviation, minimum. If no statistics are specifically requested in the MEANS statement, then variable name, N, mean, standard deviation, minimum, maximum are printed automatically.

PROC MEANS DATA = data_set_name options ;

BY variable_list ;

VAR variable_list ;

ID variable_list ;

FREQ variable ;

WEIGHT weighting_variable ;

OUTPUT OUT = output_data_set_name statistics ;
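As a sketch of the OUTPUT statement, the step below prints selected statistics and also saves them in a data set. It assumes a data set CLASSNO containing a numeric variable AGE, as in the example in section 2.10; the names AGESTATS, MEANAGE and SDAGE are invented for illustration.

```sas
/* Assumes CLASSNO exists and contains the numeric variable AGE */
PROC MEANS DATA=CLASSNO N MEAN STD;
  VAR AGE;
  OUTPUT OUT=AGESTATS MEAN=MEANAGE STD=SDAGE;   /* one variable per requested statistic */
RUN;
```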


PROC PLOT

This procedure produces line-printer plots for both numeric and character variables. Various options are available for specifying the plotting symbol, scaling the axes, drawing reference lines, superimposing 2 or more plots and drawing contour plots.

PROC PLOT DATA = data_set_name options ;

PLOT vertical_variable * horizontal_variable / options ;

BY variable_list ;

PROC PRINT

This procedure prints the values in a SAS data set.

PROC PRINT DATA = data_set_name options ;

BY variable_list ;

VAR variable_list ;

ID variable_list ;

PAGEBY variable ;

SUM variable_list ;

SUMBY variable ;

PROC SORT

This procedure rearranges the observations in an existing SAS data set or creates a new data set containing the rearranged observations. Multiple sorting groups can be specified and variables can be sorted in ascending or descending order.

PROC SORT DATA = data_set_name OUT = output_data_set_name options ;

BY variable_list ;

Variables are sorted in ascending order by default; for descending order put DESCENDING before each variable name concerned in the BY statement. The SORT procedure should be used whenever subsequent procedures process the data set in groups using the BY statement. It is possible to process a data set without sorting it beforehand by using the NOTSORTED option on the BY statement of the procedure being used. In that case SAS assumes that consecutive observations with the same BY value are grouped together, although the BY values are not necessarily in alphabetic or numeric order.
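For example, the following sketch sorts by one variable ascending and another descending, then prints the result in BY groups. It assumes a data set CLASSNO with variables SEX and AGE, as in section 2.10; the output name SORTED is hypothetical.

```sas
/* Assumes CLASSNO exists with variables SEX and AGE */
PROC SORT DATA=CLASSNO OUT=SORTED;
  BY SEX DESCENDING AGE;   /* SEX ascending, AGE descending within SEX */
RUN;

PROC PRINT DATA=SORTED;
  BY SEX;                  /* valid because SORTED is ordered by SEX */
RUN;
```

Using OUT= leaves the original data set CLASSNO in its original order.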


PROC SUMMARY

This procedure produces a SAS data set containing statistics similar to the MEANS procedure, but much more efficiently. PROC SUMMARY does not produce any printed output and the data does not have to be sorted in order to produce subgroup statistics. An OUTPUT and a VAR statement must be specified, and any number of OUTPUT statements can be used. The VAR statement must precede the OUTPUT statements.

PROC SUMMARY DATA = data_set_name options ;

CLASS variable_list ;

VAR variable_list ;

BY variable_list ;

FREQ variable ;

WEIGHT weighting_variable ;

ID variable_list ;

OUTPUT OUT = output_data_set_name statistics ;
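A minimal sketch of subgroup statistics without prior sorting is shown below. It again assumes the data set CLASSNO with variables SEX and AGE; the names SUMSTATS, MEANAGE and NAGE are made up for the example.

```sas
/* Assumes CLASSNO exists with variables SEX and AGE */
PROC SUMMARY DATA=CLASSNO;
  CLASS SEX;               /* no sorting needed when CLASS is used */
  VAR AGE;
  OUTPUT OUT=SUMSTATS MEAN=MEANAGE N=NAGE;
RUN;

PROC PRINT DATA=SUMSTATS;  /* PROC SUMMARY prints nothing itself */
RUN;
```

The output data set also contains the automatic variables _TYPE_ and _FREQ_, which identify the level of subgrouping and the number of observations in each row.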

PROC TABULATE

This procedure provides a more flexible alternative to the FREQ procedure for producing tables. Each cell in the table contains a descriptive statistic e.g. mean, standard deviation, etc. TABULATE will generate tables defined by the TABLE statement. Classification variables must be specified with the CLASS statement, while the variables to be tabulated i.e. whose values are to be the cell contents must be specified by the VAR statement. Each expression in the TABLE statement defines the categories for the table's dimensions - page, row and column.

PROC TABULATE DATA = data_set_name options ;

CLASS variable_list ;

VAR variable_list ;

BY variable_list ;

FREQ variable ;

WEIGHT weighting_variable ;

FORMAT variables format ;

LABEL variable = 'label' ;

TABLE page_expression, row_expression, column_expression ;
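A simple two-dimensional table might be requested as in the sketch below, which assumes the data set CLASSNO with a classification variable SEX and an analysis variable AGE.

```sas
/* Assumes CLASSNO exists with variables SEX and AGE */
PROC TABULATE DATA=CLASSNO;
  CLASS SEX;
  VAR AGE;
  TABLE SEX, AGE * (N MEAN STD);   /* rows: SEX; columns: statistics of AGE */
RUN;
```

With only two expressions in the TABLE statement the first defines the rows and the second the columns; no page dimension is used.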


PROC TRANSPOSE

This procedure transposes data sets, changing observations into variables and variables into observations. An output data set is created automatically and named according to the DATAn convention if a name is not specified.

PROC TRANSPOSE DATA = data_set_name options ;

VAR variable_list ;

ID variable ;

IDLABEL variable ;

COPY variable_list ;

BY variable_list ;
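The following sketch transposes a small data set so that each respondent becomes a column. The data sets WIDE and TALL, the variables and the data values are hypothetical.

```sas
/* Hypothetical example: turn questions Q1-Q3 into observations */
DATA WIDE;
  INPUT NAME $ Q1 Q2 Q3;
  DATALINES;
ANN 3 4 5
BOB 2 2 4
;
RUN;

PROC TRANSPOSE DATA=WIDE OUT=TALL;
  VAR Q1-Q3;
  ID NAME;     /* values of NAME become the new variable names */
RUN;
```

TALL has three observations, one per question, with the automatic variable _NAME_ holding Q1, Q2 and Q3 and the new variables ANN and BOB holding the values.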

5. More on the DATA Step

5.1 IF - THEN - ELSE Statements

These statements are used to execute a further SAS statement conditional on some expression.

IF expression THEN statement;

ELSE statement ;

THEN statement is executed if expression is non zero, non missing or true

ELSE statement is executed if expression is zero, missing or false

There are eight relational operators:-

LT or < LE or <= GT or > GE or >=

NL or ~< NG or ~> EQ or = NE or ~=

In addition there are three logical operators:-

NOT or ~ AND or & OR or |

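e.g. DATA ;

LENGTH SEX $ 6 ;

IF CODE = 1 OR CODE = 2 THEN SEX = 'MALE' ;

ELSE SEX = 'FEMALE';

The LENGTH statement is needed because the length of a character variable is otherwise fixed by its first assignment, which would truncate 'FEMALE' to four characters.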

e.g. DATA ;

INPUT AGE ;


IF 0 < AGE < 10 THEN AGEGRP = 1 ;

IF 10 <= AGE < 19 THEN AGEGRP = 2 ;

IF AGE >= 19 THEN AGEGRP = 3 ;

Any observations with values not included in one of the categories will produce missing or blank values.
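The same grouping can be written as a chain of ELSE IF statements, so that testing stops as soon as one condition is true. This is a sketch only; the data set name GROUPS and the data values are made up.

```sas
/* Hypothetical example: age groups with an ELSE IF chain */
DATA GROUPS;
  INPUT AGE;
  IF 0 < AGE < 10 THEN AGEGRP = 1;
  ELSE IF 10 <= AGE < 19 THEN AGEGRP = 2;
  ELSE IF AGE >= 19 THEN AGEGRP = 3;
  DATALINES;
4
12
.
30
;
RUN;
```

The third observation has a missing AGE, satisfies none of the conditions and therefore has a missing AGEGRP.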

5.2 Selecting Observations

If not all observations are to be included in the data set being created they can be excluded by the DELETE statement or the subsetting IF statement. The DELETE statement stops the processing of an observation:-

e.g. DATA MALES ;

INPUT AGE SEX $ ;

IF SEX = 'F' THEN DELETE ;

The subsetting IF statement allows an observation to pass if the expression is true:-

e.g. DATA MALES ;

INPUT AGE SEX $ ;

IF SEX = 'M' ;

The result from both of the above DATA steps is the same, provided SEX takes only the values 'M' and 'F'.

5.3 DO and END Statements

DO statements specify that any statements following the DO are to be executed until a matching END appears.

e.g. DATA ;

INPUT AGE SEX $ FAMILY $ ;

IF SEX = 'F' THEN DO ;

AGE = AGE - 5 ;

FAMILY = 'NEW' ;

END ;

ELSE AGE = AGE + 3 ;

5.4 DO Loops

DO loops allow a range of statements, within a DATA step, to be repeated either a specified number of times or while a specified condition holds.

DO variable= start TO stop ;


DO variable = start TO stop BY increment ;

DO WHILE (expression) ;

DO UNTIL (expression) ;

DO OVER array_name ;

Each must have a matching END statement to terminate execution.

e.g. DO N = 1 TO 20 ;

DO N = 1 TO 20 BY 4 ;

DO WHILE (N < 20) ;

DO UNTIL (N = 20) ;
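Two complete loops are sketched below, one iterative and one conditional. The data set names SQUARES and GROWTH, the variables and the 5 per cent growth rate are invented for illustration.

```sas
/* Hypothetical example: iterative DO loop, N takes the values 1, 5, 9, 13, 17 */
DATA SQUARES;
  DO N = 1 TO 20 BY 4;
    NSQ = N ** 2;
    OUTPUT;              /* one observation per pass through the loop */
  END;
RUN;

/* Hypothetical example: DO WHILE loop, repeated while the condition holds */
DATA GROWTH;
  BALANCE = 100;
  YEAR = 0;
  DO WHILE (BALANCE < 200);
    YEAR = YEAR + 1;
    BALANCE = BALANCE * 1.05;
    OUTPUT;
  END;
RUN;
```

Neither step reads any data, so each DATA step is executed once and all observations come from the OUTPUT statement inside the loop.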

5.5 Arrays

Arrays in SAS are useful for processing a lot of SAS variables in the same way

ARRAY array_name [index_variable] array_elements ;

e.g. ARRAY A Q1 - Q5 ;

DO OVER A ;

A = LOG(A) ;

END ;

Array elements are substituted for the array name in SAS statements depending on the value of the index variable. SAS will use its own internal index variable if none is defined. In the example above the DO group is executed for every element in the array.
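The same transformation can be written with an explicitly subscripted array and an iterative DO loop, which makes the index variable visible. This is a sketch under assumed names: the data set LOGGED, the index I and the data values are hypothetical.

```sas
/* Hypothetical example: explicit array subscripts */
DATA LOGGED;
  INPUT Q1-Q5;
  ARRAY A{5} Q1-Q5;
  DO I = 1 TO 5;
    IF A{I} > 0 THEN A{I} = LOG(A{I});
    ELSE A{I} = .;       /* avoid taking the log of zero or a negative value */
  END;
  DROP I;
  DATALINES;
1 2 3 4 5
10 20 0 40 50
;
RUN;
```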

5.6 RETAIN

This statement retains a variable value from the last execution of the DATA step. Normally all variables are set to missing before each execution of the DATA step. Initial values can also be assigned to the variables.

RETAIN variable ;

RETAIN variable initial_value ;
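A common use of RETAIN is a running total, sketched below with a hypothetical data set RUNTOT and made-up data values.

```sas
/* Hypothetical example: running total carried between executions */
DATA RUNTOT;
  INPUT X;
  RETAIN TOTAL 0;        /* TOTAL starts at 0 and is not reset to missing */
  TOTAL = TOTAL + X;
  DATALINES;
3
4
5
;
RUN;
```

TOTAL takes the values 3, 7 and 12. Without the RETAIN statement TOTAL would be reset to missing at the start of each execution and every sum would be missing.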

5.7 DROP and KEEP

The DROP statement excludes named variables from a data set or analysis and the KEEP statement includes only named variables in a data set or analysis. Both statements can be used in the DATA step or as data set options which appear after the data set name on PROC steps.


e.g. DATA PERM.PATIENTS ;

DROP PATNO ;

DATA PERM.PATIENTS(DROP = PATNO) ;

PROC PRINT DATA = PERM.PATIENTS(KEEP = AGE SEX) ;

5.8 RENAME and LABEL

The RENAME statement is used to rename variables.

RENAME old_name = new_name ;

The LABEL statement assigns labels of up to 256 characters to variables.

LABEL variable = 'label' ;

6. Data Management

6.1 SET

Reads observations from 1 or more SAS data sets and can interleave observations.

a) Subset the observations

DATA FEMALES ;
SET STUDENTS ;
IF SEX = 'F' ;

b) Subset the variables

DATA SMALL ;
SET STUDENTS ;
DROP WEIGHT AGE ;

c) Add a new variable

DATA ADD ;
SET STUDENTS ;
WTKG = WEIGHT / 2.2 ;

d) Multiple output data sets

DATA MALES FEMALES ;
SET STUDENTS ;
IF SEX = 'M' THEN OUTPUT MALES ;
IF SEX = 'F' THEN OUTPUT FEMALES ;

e) Multiple input data sets (Concatenate)

DATA ALL ;
SET MALES FEMALES ;

f) Multiple input data sets (Interleave)

DATA ALL ;
SET MALES FEMALES ;
BY NAME ;

6.2 MERGE

Combines observations from two or more SAS data sets and places them side by side.

a) One-to-one Merging

If there are the same number of observations in each data set and if the observations are in the same order then they can be combined as shown below. The two data sets are placed side by side in the combined data set being created.

DATA COUPLES ;
MERGE HUSBANDS WIVES;

For any duplicate variable name in the data sets, only the values of that variable from the last named data set will be saved.

b) Match Merging

The two data sets, having already been sorted, are placed side-by-side in the order specified in the BY statement.

DATA STABLE ;

MERGE HORSE TRAINER ;

BY OWNER ;
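A fuller sketch of the match merge, including the sorting it relies on, is given below. The IN= data set option and the flag variables INH and INT do not appear in the example above; they are standard SAS but are added here as an illustration of keeping only owners present in both data sets.

```sas
/* Assumes HORSE and TRAINER exist and both contain the variable OWNER */
PROC SORT DATA=HORSE;
  BY OWNER;
RUN;

PROC SORT DATA=TRAINER;
  BY OWNER;
RUN;

DATA STABLE;
  MERGE HORSE(IN=INH) TRAINER(IN=INT);   /* INH and INT are temporary flags */
  BY OWNER;
  IF INH AND INT;                        /* keep owners found in both data sets */
RUN;
```

Omitting the subsetting IF keeps every owner, with missing values for variables from the data set in which the owner does not appear.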

6.3 UPDATE

Updates a master file with a transaction file where the BY variable is the KEY for matching observations.

DATA SURGERY;

UPDATE SURGERY BLOODCT;

BY PATIENT;

This should be used only when, for a master data set, there are several changes that can be applied all in one job.
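The sketch below shows a master data set being updated by a one-line transaction data set. All data values and the variables WEIGHT and BP are hypothetical; both data sets are already in PATIENT order.

```sas
/* Hypothetical master data set */
DATA SURGERY;
  INPUT PATIENT WEIGHT BP;
  DATALINES;
1 70 120
2 82 135
;
RUN;

/* Hypothetical transaction data set: only BP is changed for patient 2 */
DATA BLOODCT;
  INPUT PATIENT WEIGHT BP;
  DATALINES;
2 . 128
;
RUN;

DATA SURGERY;
  UPDATE SURGERY BLOODCT;
  BY PATIENT;
RUN;
```

By default a missing value in the transaction data set leaves the master value unchanged, so patient 2 keeps WEIGHT 82 and has BP replaced by 128.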


7. Statistical Procedures

There is a wide range of statistical procedures available in SAS for carrying out such techniques as analysis of variance and covariance, linear and non-linear regression analysis, multivariate methods and non-parametric methods. A few examples of some of the more widely used procedures are given below. For more details on all the procedures available for statistical analysis, consult the appropriate manuals.

PROC ANOVA

This procedure is used to carry out an analysis of variance of balanced data (see also PROC GLM). Many of the statements which can be used with this procedure are not necessary for standard analyses.

PROC ANOVA DATA = data_set_name options ;

Required statements; must appear in this order:

CLASS variable_list ;
MODEL dependent_variables = effects / options ;

Must appear before the first RUN statement:

BY variable_list ;
ABSORB variable_list ;
FREQ variable ;

Can appear after the MODEL statement and can be used interactively:

MEANS effects / options ;
TEST H = effects E = effect ;
MANOVA H = effects E = effect M = equations / options ;
REPEATED factor_names / options ;

e.g. PROC ANOVA DATA = EXPT ;

CLASS METHOD VARIETY ;

MODEL YIELD = METHOD VARIETY METHOD * VARIETY ;

BY YEAR ;


PROC GLM

This procedure can be used to fit general linear models to data to enable statistical methods such as analysis of variance, analysis of covariance, regression analysis (including comparison of regressions) and multivariate analysis of variance to be carried out. Unbalanced data and data with missing values can also be analysed using this procedure. There are numerous statements and options available with this procedure, but most applications only use a few of them.

Main statements (MODEL is a required statement; CLASS must precede the MODEL statement):

PROC GLM DATA = data_set_name options ;
CLASS variable_list ;
MODEL dependent_variables = independent_variables / options ;

Statements that must appear before the first RUN statement:

ABSORB variable_list ;
BY variable_list ;
FREQ variable ;
ID variable_list ;
WEIGHT weighting_variable ;

Statements that can appear after the MODEL statement and can be used interactively:

CONTRAST 'label' effect_values / options ;
ESTIMATE 'name' effect_values / options ;
LSMEANS effects / options ;
MANOVA H = effects E = effect M = equations / options ;
MEANS effects / options ;
OUTPUT OUT = output_data_set_name ;
RANDOM effects / options ;
REPEATED factor_names / options ;
TEST H = effects E = effect / options ;

e.g. PROC GLM DATA = EXPT2 ;

CLASS TREAT SUBJECT TIME ;

MODEL RESP = TREAT SUBJECT(TREAT) TIME TREAT * TIME ;

TEST H = TREAT E = SUBJECT(TREAT) ;

LSMEANS TREAT TIME TREAT*TIME ;

OUTPUT OUT = NEW P = RHAT R = RESID ;


PROC TTEST

This procedure carries out a simple t-test on the means of two groups of observations. The grouping factor specified by the CLASS statement must have only two levels.

Required statements:

PROC TTEST DATA = data_set_name options ;
CLASS variable_list ;

Optional statements:

BY variable_list ;
VAR variable_list ;

e.g. PROC TTEST DATA = EXPT5 ;

CLASS SEX ;

VAR SCORE ;

PROC NLIN

This procedure is used to fit nonlinear regression models. The model to be fitted has to be specified, as do the parameters to be estimated, initial guesses for them, and possibly the partial derivatives of the model with respect to each parameter. Some models are difficult to fit and in these cases the initial guesses can be critical. There is no guarantee that the procedure will be able to fit the model successfully.

Required statements:

PROC NLIN DATA = data_set_name options ;
PARMS parameter = values ;
MODEL dependent_variable = expression ;

Optional statements:

BOUNDS expressions ;
BY variable_list ;
ID variable_list ;
DER.parameter = expression ;
OUTPUT OUT = output_data_set_name ;

e.g. PROC NLIN DATA = EXPT3 ;

PARMS B0 = 0.5 B1 = 0.08 ;

MODEL Y = B0*(1-EXP(-B1*X)) ;

DER.B0 = 1-EXP(-B1*X) ;

DER.B1 = B0*X*EXP(-B1*X) ;


PROC REG

This procedure is used to fit linear regression models. There are other regression procedures such as RSQUARE, RSREG and STEPWISE for selecting subsets of independent variables in a multiple regression analysis, fitting quadratic response surfaces and carrying out stepwise regression, respectively.

Main statements (PROC REG is a required statement; MODEL is required for model fitting and can be used interactively):

PROC REG DATA = data_set_name options ;
MODEL dependent_variables = independent_variables / options ;

Statements that must appear before the first RUN statement:

VAR variable_list ;
BY variable_list ;
FREQ variable ;
WEIGHT weighting_variable ;
ID variable ;

Statements that can appear anywhere after a MODEL statement and can be used interactively:

ADD variable_list ;
DELETE variable_list ;
MTEST equations ;
OUTPUT OUT = output_data_set_name ;
PLOT y_variate*x_variate ;
REFIT ;
RESTRICT equations ;
REWEIGHT condition ;
TEST equations ;

e.g. PROC REG DATA = EXPT4 ;

MODEL POP = YEAR ;

OUTPUT OUT = REGOUT P = EPOP R = RESID ;
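The interactive use of statements after MODEL can be sketched as follows. This is only an illustration: it assumes that EXPT4 also contains a second regressor, here given the hypothetical name YEARSQ, which does not appear in the example above.

```sas
/* Sketch only: YEARSQ is a hypothetical variable assumed to exist in EXPT4 */
PROC REG DATA = EXPT4 ;
   MODEL POP = YEAR ;
RUN ;                 /* fit and print the first model */
   ADD YEARSQ ;       /* add a regressor to the current model */
   PRINT ;            /* refit and display the revised model */
RUN ;
QUIT ;                /* end the interactive procedure */
```

Each RUN statement executes the statements submitted so far without ending the procedure, which is why a final QUIT is needed.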

8. Graphical Procedures

The majority of the procedures available to produce high-quality, hard-copy graphical output work in the same way as those mentioned in section 4. Syntactically, most are prefixed by the letter G, e.g. GCHART and GPLOT. Additional global statements allow the user to specify more precisely the axes, symbols, patterns etc. used in the representation of the data.

This is a topic beyond the scope of this Summary Guide, but information can be found in the two volumes of the SAS/GRAPH manuals. The various versions of SAS access graphics devices in different ways when producing hard copy, so refer to the appropriate SAS Companion Guide for more complete information.
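As a small taste of how the global statements combine with a G-prefixed procedure, the sketch below plots observed and fitted values. It assumes the data set REGOUT created by the PROC REG example in section 7 (holding POP, EPOP and YEAR) and assumes the observations are already in YEAR order; the axis labels are illustrative.

```sas
/* Sketch only: assumes REGOUT from the PROC REG example, sorted by YEAR */
SYMBOL1 VALUE = DOT  INTERPOL = NONE ;   /* observed values as points */
SYMBOL2 VALUE = NONE INTERPOL = JOIN ;   /* fitted values as a joined line */
AXIS1 LABEL = ('Year') ;
AXIS2 LABEL = (ANGLE = 90 'Population') ;

PROC GPLOT DATA = REGOUT ;
   PLOT POP*YEAR = 1 EPOP*YEAR = 2 / OVERLAY HAXIS = AXIS1 VAXIS = AXIS2 ;
RUN ;
QUIT ;
```

The "= 1" and "= 2" after each plot request select the SYMBOL1 and SYMBOL2 definitions respectively.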


9. Output Delivery System (ODS)

Before ODS, many procedures could produce output data sets for use in further calculations, e.g. parameter estimates from a regression analysis. However, some of the more common procedures lacked this facility. Since version 7, the Output Delivery System (ODS) has made it much simpler to save data sets, to produce formatted output for high-resolution printers and to produce web-quality output using HTML.

It is also possible to control the output stream more effectively, and there is a greater choice of output objects that can be written to data sets.

ODS is a vast topic with many individual statements. Each statement (shown in the table below) has its own set of options, which are not shown here and are best described in the manual.

Table of ODS Statements

ODS EXCLUDE   Specify output objects to exclude from ODS destinations.

ODS HTML      Open, manage or close the HTML destination. If the destination is open, you can create HTML output.

ODS LISTING   Open, manage or close the Listing destination.

ODS OUTPUT    Create a SAS data set from an output object and manage the selection and exclusion lists for the Output destination.

ODS PATH      Specify which locations to search for the definitions that were created by PROC TEMPLATE, as well as the order in which to search for them.

ODS PRINTER   Open, manage or close the Printer destination. If the destination is open, you can create Printer output.

ODS SELECT    Specify output objects for ODS destinations.

ODS SHOW      Write to the SAS log the specified selection or exclusion list.

ODS TRACE     Write to the SAS log a record of each output object that is created, or suppress the writing of this record.

ODS VERIFY    Print or suppress a warning that a style definition or a table definition that is used is not supplied by SAS Institute.
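A minimal sketch of saving a procedure's results as a data set with ODS is given below. It reuses the EXPT4 regression from section 7. The output object name ParameterEstimates is the one normally reported for PROC REG, but treat it as an assumption and confirm it in the log using ODS TRACE; the data set name PARMS is arbitrary.

```sas
/* Sketch only: confirm the output object name in the log via ODS TRACE */
ODS TRACE ON ;                            /* record each output object in the SAS log */
ODS OUTPUT ParameterEstimates = PARMS ;   /* save the estimates as data set PARMS */

PROC REG DATA = EXPT4 ;
   MODEL POP = YEAR ;
RUN ;
QUIT ;

ODS TRACE OFF ;

PROC PRINT DATA = PARMS ;                 /* inspect the saved estimates */
RUN ;
```

The ODS OUTPUT statement is placed before the procedure whose output it is to capture.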


10. Further Facilities

There are many more facilities in SAS in addition to those that have been documented here. These include:

• A macro processing language

• A full-screen editor (FSP) enabling data to be entered and updated. It also contains a spreadsheet facility.

• Interactive matrix language (IML), a very powerful module for programming matrix algebra that is useful for statistical and mathematical applications.

• Time series module (ETS) for carrying out econometric and time-series analysis.
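To give a flavour of the first of these facilities, the sketch below wraps the PROC TTEST example from section 7 in a macro so that it can be reused with other data sets. The macro name TWOGROUP and its parameter names are hypothetical.

```sas
/* Sketch only: TWOGROUP and its parameters are hypothetical names */
%MACRO TWOGROUP(DS=, GROUP=, RESP=) ;
   PROC TTEST DATA = &DS ;
      CLASS &GROUP ;    /* grouping factor with two levels */
      VAR &RESP ;       /* response variable to compare */
   RUN ;
%MEND TWOGROUP ;

/* Generates the same step as the PROC TTEST example in section 7 */
%TWOGROUP(DS=EXPT5, GROUP=SEX, RESP=SCORE)
```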

11. Publications

There is a vast range of SAS manuals for both UNIX and PC versions. They can be ordered from:

SAS Software Ltd.
Wittington House
Henley Road
Medmenham
Marlow SL7 2EB

The Main Library on campus has a few manuals for reference based on previous versions. In addition, users of SAS at The University of Reading can read the current documentation on-line by registering at

http://v8doc.sas.com/sashtml/