dsci 325: handout 4 if-then statements in sascourse1.winona.edu/thooks/media/handout 4 - if-then...
TRANSCRIPT
1
DSCI 325: Handout 4 – If-Then Statements in SAS Spring 2017
IF-THEN STATEMENTS IN SAS
We have already worked briefly with IF-THEN statements in SAS. Here, we will discuss using
IF-THEN statements in a DATA step to assign a value to a variable using one (or more)
condition(s). Note that an IF-THEN statement takes on the following general form.
IF condition THEN action;
For example, consider once again the Grades data set. Suppose that the professor uses the
following statements to calculate a final percentage grade for each student and to determine
whether they pass or fail the course. Identify the condition and the action in each of the IF-
THEN statements in the following code.
DATA Grades2;
SET Hooks.Grades;
TotalQuiz =
SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = SUM(TotalQuiz,TotalExam,EC,Final)/640;
IF FinalPercent >= .70 THEN Status = 'Pass';
IF FinalPercent < .70 THEN Status = 'Fail';
RUN;
PROC PRINT Data=Grades2;
VAR FirstName LastName FinalPercent Status;
RUN;
Note that for each observation in the data set, when SAS sees the keyword IF it evaluates the
condition to determine whether it is true or false. If the condition is true, SAS takes the action
that follows the keyword THEN. If the condition is false, SAS ignores the THEN clause and
simply proceeds to the next statement in the DATA step.
A portion of the result of the PROC PRINT is shown below:
2
The condition in an IF-THEN statement typically involves the use of a comparison operator
(such as “greater than or equal to” as was used above). SAS uses the following syntax for the
standard comparison operators. Note that you can use either syntax when programming.
Comparison
SAS syntax using
symbols
SAS syntax using
Mnemonic Equivalent
Less than < LT
Greater than > GT
Less than or equal to <= LE
Greater than or equal to >= GE
Equal to = EQ
Not equal to ^= or ~= NE
Equal to one of a list IN( )
Change the IF-THEN statements in the above code to the following and verify that they work.
IF FinalPercent GE .70 THEN Status = 'Pass';
IF FinalPercent LT .70 THEN Status = 'Fail';
USING DO and END STATEMENTS
The IF-THEN statements discussed thus far allow only one action to take place. If multiple
actions need to take place if the condition is true, then you should use DO and END statements.
Typical IF Statement
(one action)
IF Statement using DO and END
(multiple actions)
IF condition THEN action;
IF condition THEN DO;
action 1;
action 2;
…
END;
Consider the following example that has two action statements. DATA Grades2;
SET Hooks.Grades;
ParenLocation = INDEX(ZipTownAreaCode,'(');
Town = SUBSTR(ZipTownAreaCode,7,ParenLocation-8);
IF SUBSTR(Town,1,1) EQ 'A' THEN DO;
ATown=1;
BTown=0;
END;
RUN;
3
Discuss what the IF statement should do in detail.
Consider the print-out from the above code. Is this what you expected?
Task: This dataset contains only towns that start with an A or B. Modify the code so that all B
towns are given a value of 1 for BTown and 0 for ATown (instead of leaving them missing as
was done above).
DATA Grades2;
SET Hooks.Grades;
ParenLocation = INDEX(ZipTownAreaCode,'(');
Town = SUBSTR(ZipTownAreaCode,7,ParenLocation-8);
IF SUBSTR(Town,1,1) EQ 'A' THEN DO;
ATown=1;
BTown=0;
END;
RUN;
4
IF-THEN-ELSE Statements
Next, note that the IF-THEN statement tells SAS what to do when the condition is true. To tell
SAS what action to take when the condition is false, you could utilize an ELSE statement. For
example, we could have also created the variable Status as follows.
DATA Grades2;
SET Hooks.Grades;
TotalQuiz =
SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = SUM(TotalQuiz,TotalExam,EC,Final)/640;
IF FinalPercent >= .70 THEN Status = 'Pass';
ELSE Status = 'Fail';
RUN;
PROC PRINT Data=Grades2;
VAR FirstName LastName FinalPercent Status;
RUN;
Task: Recall the example from the previous page. Use an ELSE statement to modify the code so
that all B towns are given a value of 1 for BTown and 0 for ATown.).
DATA Grades2;
SET Hooks.Grades;
ParenLocation = INDEX(ZipTownAreaCode,'(');
Town = SUBSTR(ZipTownAreaCode,7,ParenLocation-8);
IF SUBSTR(Town,1,1) EQ 'A' THEN DO;
ATown=1;
BTown=0;
END;
RUN;
5
Problems with Missing Data
Suppose that one of the students was ill and was unable to take the final exam as scheduled. In
the data file Grades_missing.csv, this student’s final exam score is a missing value. Read this
file into your personal library and run code similar to the following. What happens?
DATA Grades3;
SET Hooks.Grades_missing;
TotalQuiz =
SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = SUM(TotalQuiz,TotalExam,EC,Final)/640;
IF FinalPercent >= .70 THEN Status = 'Pass';
ELSE Status = 'Fail';
RUN;
PROC PRINT Data=Grades3;
VAR FirstName LastName FinalPercent Status;
RUN;
Note: The SUM function returns the sum of all of the non-missing arguments, so SAS still
calculates a value for FinalPercent. Alternatively, consider calculating FinalPercent as follows.
DATA Grades3;
SET Hooks.Grades_missing;
TotalQuiz =
SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;
IF FinalPercent >= .70 THEN Status = 'Pass';
ELSE Status = 'Fail';
RUN;
Note: The ‘+’ operator returns a missing value if any of the arguments are missing.
6
Also, note that SAS considers a missing value to be smaller than any other numerical value.
Thus, it is always a good idea to program for missing values as shown below.
DATA Grades3;
SET Hooks.Grades_missing;
TotalQuiz =
SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;
IF FinalPercent =. THEN Status = ' ';
ELSE IF FinalPercent >= .70 THEN Status = 'Pass';
ELSE Status = 'Fail';
RUN;
Questions:
1. Run this program. Is this the result you expected?
2. Why did you get this result?
3. Add one statement to the DATA step to fix this problem.
DATA Grades3;
SET Hooks.Grades_missing;
TotalQuiz =SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,
Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;
IF FinalPercent = . THEN Status = ' ';
ELSE IF FinalPercent >= .70 THEN Status = 'Pass';
ELSE Status = 'Fail';
RUN;
7
Using Logical Operators in IF-THEN Statements
In addition to comparison operators, we can also use logical operators when writing a condition
for an IF-THEN statement.
Operation
SAS syntax using
symbols
SAS syntax using
Mnemonic Equivalent
All conditions must be true & AND
At least one condition must be true | or ! OR
Reverse the logic of a condition ^ or ~ NOT
For example, we could assign grades to the students in the file Grades.csv as follows.
DATA Grades2;
SET Hooks.Grades;
TotalQuiz =
SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = SUM(TotalQuiz,TotalExam,EC,Final)/640;
IF (FinalPercent >= 0.90) THEN Grade='A';
IF (FinalPercent >= 0.80) AND (FinalPercent < 0.90) THEN Grade='B';
IF (FinalPercent >= 0.70) AND (FinalPercent < 0.80) THEN Grade='C';
IF (FinalPercent >= 0.60) AND (FinalPercent < 0.70) THEN Grade='D';
IF (FinalPercent < 0.60) THEN Grade='F';
RUN;
PROC PRINT Data=Grades2;
VAR FirstName LastName Final FinalPercent Grade;
RUN;
Comment: Though it is not necessary, it is good practice to write programs that are easy to read (the
earlier you develop this habit, the better off you’ll be). For example, in the above code, the conditions and
actions have been aligned, and optional parentheses have been used above to offset the conditions.
These statements create the following data set (only a portion is shown here).
8
Task: Now, suppose the instructor wants to give bonus points to students who show some sign
of improvement from the beginning of the course to the end of the course. Suppose that two
percentage points will be added to a student's final percentage grade if either their Exam 1
grade is less than their Exam 3 and Final grade or their Exam 2 grade is less than their Exam 3
and Final grade. Call this new variable FinalPercent_Adjusted. Finish the following SAS code to
do this, and verify that your program works.
DATA Grades2;
SET Hooks.Grades;
TotalQuiz =
SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = SUM(TotalQuiz,TotalExam,EC,Final)/640;
9
A Word of Caution Regarding IF-THEN Statements
Be careful when using multiple IF-THEN statements when creating a new variable. For
example, suppose we tried to assign grades to the students in the file Grades_missing.csv as
follows.
DATA Grades3;
SET Hooks.Grades_missing;
TotalQuiz =
SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;
IF FinalPercent=. THEN Grade='Incomplete';
IF FinalPercent >= 0.90 THEN Grade='A';
IF FinalPercent >= 0.80 AND FinalPercent < 0.90 THEN Grade='B';
IF FinalPercent >= 0.70 AND FinalPercent < 0.80 THEN Grade='C';
IF FinalPercent >= 0.60 AND FinalPercent < 0.70 THEN Grade='D';
IF FinalPercent < 0.60 THEN Grade='F';
RUN;
PROC PRINT Data=Grades3;
VAR FirstName LastName Final FinalPercent Grade;
RUN;
SAS creates the following data set (only a portion is shown here).
Question: Is this the result you expected? What is the problem?
10
Note that SAS checks each IF statement, and the value for Grade is overwritten each time a
condition is met. Any change in the order of the statements may affect the results.
USING ELSE-IF STATEMENTS
A better idea is to use IF-THEN-ELSE statements. The general form is as follows:
IF condition THEN action;
ELSE IF condition THEN action;
ELSE IF condition THEN action;
This way, once a condition is met, SAS stops checking the other IF-THEN statements that follow
ELSE. This not only prevents you from writing over values but is also much more efficient. The
correct program is shown below.
DATA Grades3;
SET Hooks.Grades_missing;
TotalQuiz =
SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;
IF FinalPercent=. THEN Grade='Incomplete';
ELSE IF FinalPercent >= 0.90 THEN Grade='A';
ELSE IF FinalPercent >= 0.80 THEN Grade='B';
ELSE IF FinalPercent >= 0.70 THEN Grade='C';
ELSE IF FinalPercent >= 0.60 THEN Grade='D';
ELSE IF FinalPercent < 0.60 THEN Grade='F';
RUN;
PROC PRINT Data=Grades3;
VAR FirstName LastName Final FinalPercent Grade;
RUN;
A portion of the results is shown below.
11
Comment: The last ELSE statement in a series may simply specify an action that is
automatically executed for all observations that fail to satisfy any of the previous IF-THEN
statements.
IF condition THEN action;
ELSE IF condition THEN action;
ELSE action;
For example, we could have modified the above program as follows:
DATA Grades3;
SET Hooks.Grades_missing;
TotalQuiz =
SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;
IF FinalPercent=. THEN Grade='Incomplete';
ELSE IF FinalPercent >= 0.90 THEN Grade='A';
ELSE IF FinalPercent >= 0.80 THEN Grade='B';
ELSE IF FinalPercent >= 0.70 THEN Grade='C';
ELSE IF FinalPercent >= 0.60 THEN Grade='D';
ELSE Grade='F';
RUN;
PROC PRINT Data=Grades3;
VAR FirstName LastName Final FinalPercent Grade;
RUN;
12
Task: Consider the following SAS code.
DATA test;
INPUT ID Gender $ Score;
DATALINES;
1 male 21
2 male 34
3 male 54
4 male 76
5 female 44
6 female 21
7 female 22
8 female 17
;
DATA test2;
SET test;
IF Score < 30 THEN Group = 1;
IF Gender = 'male' then Group = 2;
RUN;
PROC PRINT DATA=test2; RUN;
Predict the value of Group for each observation and write it in the table below. Then, run the
program to verify your answer.
Now, suppose the code is changed to the following:
DATA test2;
SET test;
IF Score < 30 THEN Group = 1;
ELSE IF Gender = 'male' then Group = 2;
RUN;
PROC PRINT DATA=test2; RUN;
13
Once again, predict the value of Group for each observation and write it in the table below.
Then, run the program to verify your answer.
Finally, consider changing the code as follows and predict the output. Run the program in SAS
to check your answer.
DATA test2;
SET test;
IF Gender = 'male' THEN Group = 2;
ELSE IF Score < 30 then Group = 1;
RUN;
PROC PRINT DATA=test2; RUN;
To avoid errors in your programs, make sure you understand how SAS deals with a series of IF-
THEN or IF-THEN-ELSE statements!
14
IF-THEN STATEMENTS AND CHARACTER VALUES
Consider again the Grades.csv data set. The following code is used to create a new variable
named Ada. The code assigns a 1 to this variable if the town is Ada and a ‘ . ‘ if the town is not
Ada.
DATA Grades2;
SET Hooks.Grades;
ParenLocation = INDEX(ZipTownAreaCode,'(');
Town = SUBSTR(ZipTownAreaCode,7,ParenLocation-8);
IF Town = 'Ada' THEN Ada=1;
RUN;
PROC PRINT DATA=Grades2;
Var FirstName LastName ZipTownAreaCode ParenLocation Town Ada;
RUN;
Comment #1: SAS is case-sensitive with regard to the string of text. For example, try to replace
the above IF-THEN statement with the following: IF Town = 'ADA' THEN Ada=1;
What happens?
Comment #2: This code does NOT work if you have leading or trailing spaces (which you
cannot see) in your variable name. For example, in the following code, town contains a leading
space. Thus, in the first observation, Town is specified by _Ada, not Ada. As a result, the IF
statement fails to recognize _Ada as Ada and thus does not assign Ada a value of 1.
DATA Grades2;
SET Hooks.Grades;
ParenLocation = INDEX(ZipTownAreaCode,'(');
Town = SUBSTR(ZipTownAreaCode,6,ParenLocation-7);
IF Town = 'Ada' THEN Ada=1;
RUN;
PROC PRINT DATA=Grades2;
Var FirstName LastName ZipTownAreaCode ParenLocation Town Ada;
RUN;
15
The following program verifies this.
DATA Grades2;
SET Hooks.Grades;
ParenLocation = INDEX(ZipTownAreaCode,'(');
Town = SUBSTR(ZipTownAreaCode,6,ParenLocation-7);
IF Town = 'Ada' THEN Ada=1;
TownLength = Length(Town);
Test = '*' || Town;
RUN;
PROC PRINT DATA=Grades2;
Var FirstName LastName ZipTownAreaCode ParenLocation Town Ada TownLength
Test;
RUN;
You can remove any character (including a space) using the COMPRESS function in SAS. The
first argument is the string with which you want to work; the second argument in the
COMPRESS function is the character you’d like to remove.
DATA Grades2;
SET Hooks.Grades;
ParenLocation = INDEX(ZipTownAreaCode,'(');
Town = COMPRESS(SUBSTR(ZipTownAreaCode,6,ParenLocation-7),' ');
IF Town = 'Ada' THEN Ada=1;
TownLength = Length(Town);
Test = '*' || Town;
RUN;
PROC PRINT DATA=Grades2;
VAR FirstName LastName ZipTownAreaCode ParenLocation Town Ada TownLength;
RUN;
16
Note that the COMPRESS function removes ALL spaces, which is not desirable here because
now any town with two words has its space removed (e.g., Albert_Lea is now AlbertLea). A
better alternative is to use the STRIP function in SAS, which removes leading and trailing
blanks from a string.
DATA Grades2;
SET Hooks.Grades;
ParenLocation = INDEX(ZipTownAreaCode,'(');
Town = STRIP(SUBSTR(ZipTownAreaCode,6,ParenLocation-7));
IF Town = 'Ada' THEN Ada=1;
TownLength = Length(Town);
Test = '*' || Town;
RUN;
PROC PRINT DATA=Grades2;
VAR FirstName LastName ZipTownAreaCode ParenLocation Town Ada TownLength Test;
RUN;
17
Tasks:
1. Suppose we’d like to create a variable named NotAda which takes on the value 1 for all
towns NOT EQUAL to Ada and 0 otherwise. Write a program to do this.
2. Write a program which creates a variable named ATown which takes on the value 1 if
the town is either Ada, Adams, or Adolph and 0 otherwise.
3. SAS has an IN operator that can be used to accomplish the previous task. The IN
operator allows you to check a condition using a list. Verify that the following code
produces the same output as your code for Task 2.
DATA Grades2;
SET Hooks.Grades;
ParenLocation = INDEX(ZipTownAreaCode,'(');
Town = SUBSTR(ZipTownAreaCode,7,ParenLocation-8);
IF Town IN ('Ada','Adams', 'Adolph') THEN ATown=1;
RUN;
4. Modify your code from Task 2 so that all towns that start with an ‘A’ are given a value
of 1 for ATown and 0 otherwise.