sas#programming#and#applications# …carpedm/courses/stat6110/notes/module2/module2.… · sas base...
TRANSCRIPT
SAS PROGRAMMING AND APPLICATIONS (STAT 5110/6110): FALL 2015 Module 2
Mark Carpenter, Professor of StaGsGcs Department of MathemaGcs and StaGsGcs
Phone: 4-‐3620 Office: Parker 364-‐A E-‐mail: [email protected] Web: hUp://www.auburn.edu/~carpedm/stat6110
TOPICS (Module 2) • Introduction to SAS Windows Environment
(log, editor, and output screens). • Introduction to SAS Help Screens (on-line and
within SAS system) • Introduction to the SAS DATASTEP • SAS LIBNAMES • SAS INFILE
2
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
• SAS statements end with a semicolon. • You can enter SAS statements in lowercase, uppercase, or a
mixture of the two. • You can begin SAS statements in any column of a line and
write several statements on the same line. • You can begin a statement on one line and continue it on
another line, but you cannot split a word between two lines. • Words in SAS statements are separated by blanks or by special
characters (such as the equal sign and the minus sign in the calculation of the Loss variable in the WEIGHT_CLUB example).
3
Rules for SAS Statements
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Comment Statements Documents the purpose of the programming statements or the overall program.
• Can appear anywhere in the program • Are helpful reminders to the programmer and assist the user in implementation of the program.
Syntax: *message; or /*message*/
4
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Comment Statements (cont)
Example: /* the following lines produce summary statistics */ or *the following lines produce summary statistics;
5
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Comment Statements (cont)
Example: /* Author: John Smith Assignment: Homework 1 Due Date: 9/21/04 */
6
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Comment Statements (cont)
Example: /* *********************** * Author: John Smith * * Assignment: Homework 1 * * Due Date: 9/21/04 * *************************/
7
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Comment Statements (cont)
NOTE: All Programs for Homework assigments turned will have to have to start with a preamble: /* *********************** * Author: John Smith * * Assignment: Homework 1 * * Due Date: 9/21/04 * *************************/
8
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Comment Statements (cont)
Example: * Author: John Smith; * Assignment: Homework 1; * Due Date: 9/21/04;
9
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
INTRODUCTION TO THE SAS DATASTEP
Click on “Help”, “SAS Help and Documentation” Click “Contents” tab. Click “SAS Products” then “Base SAS” Click “Step-by-step Programming with Base software
10
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
SAS BASE PROGRAMMING
The DATA step is one of the basic building blocks of SAS programming. It creates the data sets that are used in a SAS program's analysis and reporting procedures. Understanding the basic structure, functioning, and components of the DATA step is fundamental to learning how to create your own SAS data sets.
11
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
SAS DATA SETS AND DATASTEPs
In this section, you will learn the following: • what a SAS data set is and why it is needed • how the DATA step works • what information you have to supply to SAS so that it can construct a SAS data set for you.
12
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
ANOTOMY OF A DATASTEP
DATA weight_club; INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
Loss=StartWeight-EndWeight; DATALINES;
1023 David Shaw red 189 165 1049 Amelia Serrano yellow 145 124 1219 Alan Nance red 210 192 1246 Ravi Sinha yellow 194 177 1078 Ashley McKnight red 127 118 ;
Creating a SAS data set from Scratch using datalines statement
13
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
ANOTOMY OF A DATASTEP DATA weight_club; INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
Loss=StartWeight-EndWeight; DATALINES;
1023 David Shaw red 189 165 1049 Amelia Serrano yellow 145 124 1219 Alan Nance red 210 192 1246 Ravi Sinha yellow 194 177 1078 Ashley McKnight red 127 118 ;
1
The DATA statement tells SAS to begin building a SAS data set named WEIGHT_CLUB 1
14
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
ANOTOMY OF A DATASTEP DATA weight_club; INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
Loss=StartWeight-EndWeight; DATALINES;
1023 David Shaw red 189 165 1049 Amelia Serrano yellow 145 124 1219 Alan Nance red 210 192 1246 Ravi Sinha yellow 194 177 1078 Ashley McKnight red 127 118 ;
2
2 The INPUT statement idenGfies the fields to be read from the input data and names the SAS variables to be created from them (IdNumber, Name, Team, StartWeight, and EndWeight).
15
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
ANOTOMY OF A DATASTEP DATA weight_club; INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
Loss=StartWeight-EndWeight; DATALINES;
1023 David Shaw red 189 165 1049 Amelia Serrano yellow 145 124 1219 Alan Nance red 210 192 1246 Ravi Sinha yellow 194 177 1078 Ashley McKnight red 127 118 ;
3
3 The third statement is an assignment statement. It calculates the weight each person lost and assigns the result to a new variable, Loss.
16
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
ANOTOMY OF A DATASTEP DATA weight_club; INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
Loss=StartWeight-EndWeight; DATALINES;
1023 David Shaw red 189 165 1049 Amelia Serrano yellow 145 124 1219 Alan Nance red 210 192 1246 Ravi Sinha yellow 194 177 1078 Ashley McKnight red 127 118 ;
4
4 The DATALINES statement indicates that data lines follow
17
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
ANOTOMY OF A DATASTEP DATA weight_club; INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
Loss=StartWeight-EndWeight; DATALINES;
1023 David Shaw red 189 165 1049 Amelia Serrano yellow 145 124 1219 Alan Nance red 210 192 1246 Ravi Sinha yellow 194 177 1078 Ashley McKnight red 127 118 ;
5 The data lines follow the DATALINES statement. This approach to processing raw data is useful when you have only a few lines of data. (Later secGons show ways to access larger amounts of data that are stored in files.)
5
18
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
ANOTOMY OF A DATASTEP DATA weight_club; INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
Loss=StartWeight-EndWeight; DATALINES;
1023 David Shaw red 189 165 1049 Amelia Serrano yellow 145 124 1219 Alan Nance red 210 192 1246 Ravi Sinha yellow 194 177 1078 Ashley McKnight red 127 118 ;
6 The DATALINES statement marks the beginning of the input data. The single semicolon marks the end of the input data and the DATA step.
6
19
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
NAMING CONVENTIONS
Rules for Most SAS Names SAS names are used for SAS data set names, variable names, and other items. The following rules apply:
• A SAS name can contain from one to 32 characters. • The first character must be a letter or an underscore (_). • Subsequent characters must be letters, numbers, or underscores. • Blanks cannot appear in SAS names.
20
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
NAMING CONVENTIONS
Special Rules for Variable Names For variable names only, SAS remembers (labels) the combination of uppercase and lowercase letters that you use when you create the variable name. Internally, the case of letters does not matter. "CAT," "cat," and "Cat" all represent the same variable. But for presentation purposes, SAS remembers (labels) the initial case of each letter and uses it to represent the variable name when printing it.
21
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
STAT 6110 SOME SAS BASE PROCEDURES
OPTIONS linesize=80 pagesize=60 pageno=1 nodate; PROC PRINT DATA=weight_club; title 'Health Club Data'; run;
22
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
STAT 6110 SOME SAS BASE PROCEDURES
options linesize=80 pagesize=60 pageno=1 nodate; PROC PRINT DATA=weight_club; TITLE 'Health Club Data'; RUN;
23
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
STAT 6110 SOME SAS BASE PROCEDURES
OPTIONS linesize=80 pagesize=60 pageno=1 nodate; PROC TABULATE DATA=weight_club;
CLASS team; VAR StartWeight EndWeight Loss; TABLE team, mean*(StartWeight EndWeight Loss); TITLE1 'Mean StarGng Weight, Ending Weight,'; TITLE2 'and Weight Loss';
RUN;
24
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
SAS module
25
1. Create a directory on your hard drive called c:\sasfiles 2. Save the SAS programs to your local directory,
a. module1_example1.sas b. module1_example2.sas c. module1_example3.sas d. module1_example4.sas e. module1_exampl5.sas
3. Save the text files, module1_text1.txt and module1_text2.txt, and the the excel file “classroll_example.xls” to the “c:\sasfiles” directory.
4. Open SAS and go to the editor. 5. Follow Professor’s instructions on how to open and run these
programs. 6. Replicate these steps at home and make sure you can open and run
SAS programs before the next class meeting.
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
26
SAS DataSet from Existing SAS DataSet
DATA weight_club; INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
DATALINES; 1023 David Shaw red 189 165 1049 Amelia Serrano yellow 145 124 1219 Alan Nance red 210 192 1246 Ravi Sinha yellow 194 177 1078 Ashley McKnight red 127 118 ; DATA weight2; *DATA statement tells SAS to begin building a SAS data set named weight2;
SET weight_club; *SET statement tells SAS from which existing dataset to begin;
RUN; *Run statement tells SAS that you are at the end of this DATASTEP;
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
27
• Both SAS datasets, “weight_club” and “weight2” are temporary SAS datesets • Temporary SAS datasets can be referenced and used throughout the SAS module in which they were created only. • Temporary SAS datasets are stored in the temporary SAS library that SAS calls the “WORK”.
Temporary SAS datasets and the WORK Directory
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
28
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
module 2 : STAT 6110
Temporary SAS datasets and the WORK Directory
28
29
29
module 2 : STAT 6110
Double-click
Temporary SAS datasets and the WORK Directory
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
30
Temporary SAS datasets and the WORK Directory
List of SAS Datasets
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
31
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Permanent SAS datasets and user defined SAS Libraries
• LIBNAME Statement is used to define a permanent SAS library with name of user’s choosing. • The SAS library is mapped to a specific folder located on the user’s hard-drive.
32
Permanent SAS datasets and user defined SAS Libraries
LIBNAME libref 'SAS-data-library';
Syntax:
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
33
Permanent SAS datasets and user defined SAS Libraries
LIBNAME libref 'SAS-data-library';
Syntax:
SAS LIBNAME statement tells SAS you are going to create or reference a SAS Library mapped to a specific location on the harddrive.
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
34
Permanent SAS datasets and user defined SAS Libraries
LIBNAME libref 'SAS-data-library';
Syntax:
User defined library name. Instead of “libref” the user may choose the name.
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
35
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Permanent SAS datasets and user defined SAS Libraries
LIBNAME libref 'SAS-data-library';
Syntax:
In quotes the user tells SAS where the files will be kept. This is a specific Folder that must already exist on the user’s harddrive.
Example: LIBNAME stat6110 ‘c:\sasfiles’;
36
Permanent SAS datasets and user defined SAS Libraries
LIBNAME libref 'SAS-data-library';
Syntax:
In quotes the user tells SAS where the files will be kept. This is a specific Folder that must already exist on the user’s harddrive.
Example: LIBNAME stat6110 ‘c:\sasfiles’;
Must exist on harddrive Module 2
STAT 5110/6110: SAS Programming and ApplicaGons Mark Carpenter, Professor of StaGsGcs
37
Permanent SAS datasets and user defined SAS Libraries
LIBNAME stat6110 'c:\sasfiles'; DATA stat6110.weight_club; SET weight2; RUN;
85 86 LIBNAME stat6110 'c:\sasfiles'; NOTE: Libref STAT6110 was successfully assigned as follows: Engine: V9 Physical Name: c:\sasfiles 87 88 DATA stat6110.weight_club; 89 SET weight2; 90 RUN;
Programming Statements SAS log file
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
38
Permanent SAS datasets and user defined SAS Libraries
LIBNAME stat6110 'c:\sasfiles'; DATA stat6110.weight_club; SET weight2; RUN;
Programming Statements Creates a library called “stat6110” “stat6110” is mapped to c:\sasfiles
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
39
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Permanent SAS datasets and user defined SAS Libraries
LIBNAME stat6110 'c:\sasfiles'; DATA stat6110.weight_club; SET weight2; RUN;
Programming Statements
Creates a permanent SAS dataset called weight_club which is “virtually” mapped to the stat6110 library but actual file is located on the harddrive in ‘c:\sasfiles’
40
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Permanent SAS datasets and user defined SAS Libraries
New SAS Library mapped to c:\sasfiles
Permanent SAS dataset
41
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
If the raw data is in “rectangular” format where columns represent variables and rows represent observations and the variables are separated by spaces, then the SAS dataset can be created (using DATALINES, INFILE, etc) without column formatting.
FREE FORMAT DATA CREATION
42
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
DATA DATA1; INPUT ID Age savings; DATALINES; 1 25 4000 2 33 1000 3 32 8000 4 26 1500 ;
DATA DATA2; INFILE datalines delimiter=','; INPUT ID Age savings; DATALINES; 1,25,4000 2,33,1000 3,32,8000 4,26,1500 ;
INFILE statement is used when we need to tell SAS special features for the data or special locations (external files).
DATA1 and DATA2 are identical
FREE FORMAT AND COMMA DELIMITED FILES
43
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
DATA DATA2b; INFILE datalines DSD; INPUT ID Age savings; DATALINES; 1,25,4000 2,33,1000 3,32,8000 4,26,1500 ;
The DSD option sets the comma as the default delimiter
DATA DATA2; INFILE datalines delimiter=','; INPUT ID Age savings; DATALINES; 1,25,4000 2,33,1000 3,32,8000 4,26,1500 ;
The DSD and delimiter=',' both sets the comma as the delimiter for this dataset
DATA2 and DATA2b are identical
DSD versus delimter=','
44
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
DSD (delimiter-sensitive data) specifies that when data values are enclosed in quotation marks, delimiters within the value be treated as character data. The DSD option changes how SAS treats delimiters when you use LIST input and sets the default delimiter to a comma. When you specify DSD, SAS treats two consecutive delimiters as a missing value and removes quotation marks from character values.
DATA DATA2b; INFILE datalines DSD; INPUT ID Age savings; DATALINES; 1,25,4000 2,33,1000 3,32,8000 4,26,1500 ;
DSD versus delimter=','
45
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
FREE FORMAT DATA CREATION Other Delimiters
DATA DATA3; INFILE datalines delimiter=‘8'; INPUT first$ last$; DATALINES; John8Smith Bill8Johnson Alice8Bening ;
DATA DATA4; INFILE datalines delimiter=‘*'; INPUT ID Age savings; DATALINES; 1*25*4000 2*33*1000 3*32*8000 4*26*1500 ;
INFILE statement is used when we need to tell SAS special features for the data or special locations (external files).
46
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
STAT 6110 READING CHARACTER VARIABLES
DATA DATA5; INPUT ID Age Gender$ Savings; DATALINES; 1 25 Male 4000 2 33 Female 1000 3 32 Male 8000 4 26 Male 1500 ;
Dollar sign, $, tells SAS that the variable to be read is a character variable.
47
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
This example demonstrates how to prevent missing values from causing problems when you read the data with list input. Some data lines in this example contain fewer than five temperature values. Use the MISSOVER option so that these values are set to missing.
DATA weather1; INFILE datalines missover; INPUT temp1-temp5; DATALINES; 97.9 98.1 98.3 98.6 99.2 99.1 98.5 97.5 96.2 97.3 98.3 97.6 96.5 ;
DATA weather2; INPUT temp1-temp5; DATALINES; 97.9 98.1 98.3 . . 98.6 99.2 99.1 98.5 97.5 96.2 97.3 98.3 97.6 96.5 ;
weather1 and weather2 are identical
MISSOVER STATEMENT
48
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
MISSOVER STATEMENT
DATA weather1; INFILE datalines missover; INPUT temp1-temp5; DATALINES; 97.9 98.1 98.3 98.6 99.2 99.1 98.5 97.5 96.2 97.3 98.3 97.6 96.5 ;
Prevents an INPUT statement from reading a new input data record if it does not find values in the current input line for all the variables in the statement. When an INPUT statement reaches the end of the current input data record, variables without any values assigned are set to missing.
DATA weather2; INPUT temp1-temp5; DATALINES; 97.9 98.1 98.3 . . 98.6 99.2 99.1 98.5 97.5 96.2 97.3 98.3 97.6 96.5 ;
49
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Using the INFILE statement (Reading External Text Files)
To find more information on INFILE: While in the text editor in a SAS module, go to “Help” then click on the “Index” tab. Type the word “infile” in the keyword box, then double click the word “INFILE” in the results section.
50
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Using the INFILE statement (Reading External Text Files)
• Because the INFILE statement identifies the file to read, it must execute before the INPUT statement that reads the input data records.
• Usually, you use an INFILE statement to read data from an external file. When data is read from the job stream, you must use a DATALINES statement. However, to take advantage of certain data-reading options that are available only in the INFILE statement, you can use an INFILE statement with the file-specification DATALINES and a DATALINES statement in the same DATA step.
51
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Using the INFILE statement (Reading External Text Files)
Reading Multiple Input Files You can read from multiple input files in a single iteration of the DATA step by using multiple INFILE statements.
52
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Using the INFILE statement (Reading External Text Files)
France,575,Express,10 Spain,510,World,12 Brazil,540,World,6 India,489,Express,.
C:\module2_text1.txt Japan,720,Express,10 Greece,698,Express,20 New Z.,1489, Southsea,6 Venez.,425,World,8 Italy,468,Express,9 USSR,924,World,6 Switz.,734,World,20 Austral.,1079,Southsea,10 Ireland,558,Express,9
C:\module2_text2.txt
53
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
Using the INFILE statement (Reading External Text Files)
DATA DATA1; INFILE 'c:\module2_text1.txt' DSD; INPUT country$ cost vendor$ number; RUN; DATA DATA2; INFILE 'c:\module2_text2.txt' DSD; INPUT country$ cost vendor$ number; RUN; DATA DATA3; SET DATA1 DATA2; RUN;
SAS program that reads in C:\module2_text1.txt
54
Module 2 STAT 5110/6110: SAS Programming and ApplicaGons
Mark Carpenter, Professor of StaGsGcs
(@@ or "double trailing @").
Sometimes you may need to create multiple observations from a single record of raw data. One way to tell SAS how to read such a record is to use the other line-hold specifier, the double trailing at-sign (@@ or "double trailing @"). The double trailing @ not only prevents SAS from reading a new record into the input buffer when a new INPUT statement is encountered, but it also prevents the record from being released when the program returns to the top of the DATA step.