1 controlling input and output outputting multiple observations writing to multiple sas data sets...

74
1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

Upload: osborne-kelly

Post on 19-Jan-2016

228 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

1

Controlling Input and Output

Outputting Multiple Observations

Writing to Multiple SAS Data Sets

Selecting Variables and Observations

Page 2: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

2

Outputting Multiple Observations

Page 3: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

A Forecasting Application

The growth rate of six departments is stored in the Increase variable in the data set orion.growth. If each department grows at its predicted rate for the next two years, how many employees will be in each department at the end of each year?

3

libname orion "/folders/myfolders/sasdata/prg2" ;

proc print data=orion.growth;run;

Page 4: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

4

The output SAS data set, forecast, should contain 12 observations. Two observations are written for each observation read.

Page 5: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

5

The end of a DATA step iteration.

data forecast; set orion.growth; total_employees= total_employees * (1+increase);run;

Implicit OUTPUT;Implicit RETURN;

Page 6: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

6

Explicit OutputThe explicit OUTPUT statement writes the contents

of the program data vector (PDV) to the data set or data sets being created. The presence of an explicit OUTPUT statement overrides implicit output.

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

No Implicit OUTPUT;

Page 7: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

7

Compilation and Execution of the Data Step

Page 8: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

8

Compilationdata forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

PDV – Program Data Vector

Department$ 40

Total_Employees

N 8

IncreaseN 8

...

Page 9: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

9

Compilation

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

...

Page 10: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

10

Compilationdata forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

PDV

Department$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

...

Write descriptor portion of output data set

work.forecast, descriptor portionDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Page 11: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

11

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

. . . ...

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

Initialize PDVdata forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

Initialize PDV

Page 12: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

12

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Administration 34 0.25 . ...

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

Initialize PDV

Page 13: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

13

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Administration 34 0.25 1...

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

Initialize PDVdata forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

Page 14: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

14

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Administration 42.5 0.25 1 ...

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

Initialize PDV

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

34 * (1 + 0.25)

Page 15: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

15

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Administration 42.5 0.25 1

...

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

Initialize PDV

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

Output current observation

Page 16: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

16

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Administration 42.5 0.25 2

...

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

Initialize PDV

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

Page 17: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

17

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Administration 53.125 0.25 2

...

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

Initialize PDV

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

42.5 * (1 + 0.25)

Page 18: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

18

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Administration 53.125 0.25 2

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

Initialize PDV

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run; Output current observation

Page 19: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

19

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Administration 53.125 0.25 2

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

Initialize PDV

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

No Implicit OUTPUT;

Page 20: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

20

The forecast data set contains two observations after the first iteration of the DATA step.

work.forecast

Department Total_Employees

Increase Year

Administration 42.500 0.25 1Administration 53.125 0.25 2

Page 21: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

21

Prior to the second iteration of the DATA step, some variables in the program data vector will be reinitialized.

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Administration 53.125 0.25 2

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

Page 22: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

22

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Administration 53.125 0.25 . ...

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

Initialize PDV

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

Reinitialize PDV

Page 23: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

23

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Engineering 9 0.30 ....

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

Initialize PDV

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

Page 24: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

24

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Engineering 9 0.30 1 ...

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

Initialize PDVdata forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

Page 25: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

25

Execution: Explicit Output

PDVDepartment

$ 40

Total_Employees

N 8

IncreaseN 8

YearN 8

Engineering 9 0.30 1

orion.growthDepartment Total_Employees Increase

Administration 34 0.25Engineering 9 0.30IS 25 0.10

Initialize PDVdata forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); output; Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

Continue until EOF

Page 26: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

26

Modify the DATA step to write only one observation per department, showing the number of employees after two years.

data forecast; set orion.growth; Year=1; Total_Employees=Total_Employees*(1+Increase); Year=2; Total_Employees=Total_Employees*(1+Increase); output;run;

Page 27: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

27

Writing to Multiple SAS Data Sets

Page 28: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

28

Use the orion.employee_addresses data set as input to create three new data sets: usa, australia, and other.

The usa data set will contain observations with a Country value of US.

The australia data set will contain observations with a Country value of AU.

Observations with any other Country value will be written to the other data set.

Page 29: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

29

The Input Data Set

libname orion "/folders/myfolders/sasdata/prg2";proc print data=orion.employee_addresses;var employee_name city country;run;

Page 30: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

30

Creating Multiple DATA Sets

Multiple data sets can be created in a DATA step by listing the names of the output data sets in the DATA statement.

You can direct output to a specific data set or data sets by listing the data set names in the OUTPUT statement.

An OUTPUT statement without arguments writes to every SAS data set listed in the DATA statement.

DATA <SAS-data-set-1 … SAS-data-set-n>; DATA <SAS-data-set-1 … SAS-data-set-n>;

OUTPUT <SAS-data-set-1 … SAS-data-set-n>;OUTPUT <SAS-data-set-1 … SAS-data-set-n>;

Page 31: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

31

Creating Multiple SAS Data Sets

libname orion "/folders/myfolders/sasdata/prg2";data usa australia other; set orion.employee_addresses; if Country='AU' then output australia; else if Country='US' then output usa; else output other;run;

Check the SAS Log

Page 32: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

32

Efficient Conditional Processing

• It is more efficient to check values in order of decreasing frequency. data usa australia other; set orion.employee_addresses; if Country='US' then output usa; else if Country='AU' then output australia; else output other;run;

NOTE: There were 424 observations read from the data set ORION.EMPLOYEE_ADDRESSES.NOTE: The data set WORK.USA has 311 observations and 9 variables.NOTE: The data set WORK.AUSTRALIA has 105 observations and 9 variables.NOTE: The data set WORK.OTHER has 8 observations and 9 variables.

Page 33: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

33

Displaying Multiple SAS Data SetsThe PRINT procedure can only print one data set. A separate PROC PRINT step is required for each data set.

title 'Employees in the United States';proc print data=usa;run;

title 'Employees in Australia';proc print data=australia;run;

title 'Non US and AU Employees';proc print data=other;run;

title;

Page 34: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

34

Using a SELECT GroupAn alternate form of conditional execution uses a SELECT group.

The select-expression specifies any SAS expression that evaluates to a single value. Often a variable name is used as the select-expression.

SELECT <(select-expression)>; WHEN-1 (value-1 <…,value-n>)

statement; <…WHEN-n (value-1 <…,value-n>)

statement;> <OTHERWISE statement;>END;

SELECT <(select-expression)>; WHEN-1 (value-1 <…,value-n>)

statement; <…WHEN-n (value-1 <…,value-n>)

statement;> <OTHERWISE statement;>END;

Page 35: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

35

Using a SELECT Group

The previous task could be rewritten using a SELECT group:

SELECT processes the WHEN statements from top to bottom, so it is more efficient to check the values in order of decreasing frequency.

data usa australia other; set orion.employee_addresses; select (Country);

when ('US') output usa; when ('AU') output australia; otherwise output other;

end;run;

Page 36: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

36

The OTHERWISE Statement

The OTHERWISE statement is optional, but omitting it will result in an error when all WHEN conditions are false.

Use OTHERWISE followed by a null statement to prevent SAS from issuing an error message.

SELECT <(select-expression)>; WHEN-1 (value-1 <…,value-n>)

statement; <…WHEN-n (value-1 <…,value-n>)

statement;> <OTHERWISE statement;>END;

SELECT <(select-expression)>; WHEN-1 (value-1 <…,value-n>)

statement; <…WHEN-n (value-1 <…,value-n>)

statement;> <OTHERWISE statement;>END;

otherwise;

Page 37: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

37

data usa australia; set orion.employee_addresses; select (Country);

when ('US') output usa; when ('AU') output australia;

end;run;

Page 38: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

38

data usa australia; set orion.employee_addresses; select (Country);

when ('US') output usa;when ('AU') output australia;

otherwise; end;run;

An OTHERWISE statement is required.

Page 39: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

39

Test for Multiple Values in a WHEN Statement

data usa australia other; set orion.employee_addresses; select (Country);

when ('US','us') output usa; when ('AU','au') output australia; otherwise output other;

end;run;

Page 40: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

40

Using Functions in a Select Expression data usa australia other; set orion.employee_addresses; select (upcase(Country));

when ('US') output usa; when ('AU') output australia; otherwise output other;

end;run;

Page 41: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

41

Omitting the Select ExpressionThe select-expression can be omitted in a SELECT group:

Each WHEN expression evaluates to true or false. If true, the associated statement(s) are executed. If false, SAS proceeds to the next WHEN statement. If all WHEN expressions are false, then the statement(s) following the OTHERWISE statement will execute.

SELECT; WHEN (expression-1) statement; <…WHEN (expression-n) statement;> <OTHERWISE statement;>END;

SELECT; WHEN (expression-1) statement; <…WHEN (expression-n) statement;> <OTHERWISE statement;>END;

Page 42: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

42

Omitting the Select Expression

data usa australia other; set orion.employee_addresses; select;

when (country='US') output usa; when (country='AU') output australia; otherwise output other;

end;run;

Page 43: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

43

Using DO-END in a SELECT Groupdata usa australia other; set orion.employee_addresses; select (country); when ('US') do; Benefits=1; output usa; end;

when ('AU') do; Benefits=2;

output australia; end;

otherwise do; Benefits=0;

output other; end; end;run;

Page 44: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

44

Selecting Variables and Observations

Page 45: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

45

Create three data sets that are subsets of orion.employee_addresses based on the value of the Country variable.

Name the data sets usa, australia, and other, and write different variables to each output data set.

Page 46: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

46

Controlling Variable Output (Review)

By default, the SAS System writes all variables from the input data set to every output data set.

In the DATA step, the DROP and KEEP statements can be used to control which variables are written to output data sets.

Page 47: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

47

The DROP and KEEP Statements (Review)

PDV

Raw data file

Output SAS data set

Input SAS data set

DROP and KEEP statements

Page 48: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

48

Display Information About the Variables

libname orion "/folders/myfolders/sasdata/prg2";proc contents data=orion.employee_addresses;run;

Page 49: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

49

The DROP Statement

Use a DROP statement to drop the variable Street_ID.

data usa australia other; drop Street_ID; set orion.employee_addresses; if Country='US' then output usa; else if Country='AU' then output australia; else output other;run;

Page 50: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

50

Check the SAS Log

•Street_ID was dropped from every output data set. Each output data set has eight variables.

Page 51: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

51

Controlling Variable Output

What if you want to drop Street_ID and Country from usa, drop Street_ID, Country, and State from australia, and keep all variables in other?

Page 52: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

52

The DROP and KEEP statements affect every data set named on the DATA statement.

We need to use the DROP= or KEEP= data set option for more flexibility.

Page 53: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

53

The DROP= Data Set Option

The DROP= data set option can be used to exclude variables from an output data set.

General form of the DROP= data set option:

The specified variables are not written to the output data set; however, all variables are available for processing.

SAS-data-set(DROP=variable-1 <variable-2 …variable-n>)SAS-data-set(DROP=variable-1 <variable-2 …variable-n>)

Page 54: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

54

Using the DROP= Data Set Option

data usa(drop=Street_ID Country) australia(drop=Street_ID State Country) other; set orion.employee_addresses; if Country='US' then output usa; else if Country='AU' then

output australia; else output other;run;

Page 55: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

55

The KEEP= Data Set Option

The KEEP= data set option can be used to specify which variables to write to an output data set.

Only the specified variables are written to the output data set; however, all variables are available for processing.

SAS-data-set(KEEP=variable-1 <variable-2 …variable-n>)SAS-data-set(KEEP=variable-1 <variable-2 …variable-n>)

Page 56: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

56

Using the DROP= and KEEP= Options

data usa(keep=Employee_Name City State) australia(drop=Street_ID State) other; set orion.employee_addresses; if Country='US' then output usa; else if Country='AU' then output australia; else output other;run;

Page 57: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

57

The data set orion.employee_addresses contains nine variables. How many variables will be in the usa, australia, and other data sets?

data usa(keep=Employee_Name City State Country) australia(drop=Street_ID State Country) other; set orion.employee_addresses; if Country='US' then output usa; else if Country='AU' then output australia; else output other;run;

Page 58: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

58

Controlling Variable InputWhen the KEEP= data set option is associated with an input data set, only the variables specified are read into the PDV and therefore are available for processing.

data usa; set orion.employee_addresses (keep=Employee_ID Employee_Name City State Postal_Code); <additional SAS statements>run;

PDVEmployee_

IDEmployee_

NameCity State

Postal_Code

Only specified variables are in PDV

Page 59: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

59

Controlling Variable InputWhen the DROP= data set option is associated with an input data set, the specified variables are not read into the PDV, and therefore are not available for processing.

data usa; set orion.employee_addresses (drop=Street_ID Street_Number

Street_Name Country); <additional SAS statements>run;

PDVEmployee_

IDEmployee_

NameCity State

Postal_Code

Dropped variables are not in PDV

Page 60: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

60

Raw data file

Output SAS data set

DROP and KEEP statements

PDV

DROP= and KEEP= on an output data set

DROP= and KEEP= on an input data set

Input SAS data set

Affects contents of PDV

Page 61: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

61

What happens?

data usa australia(drop=State) other; set orion.employee_addresses (drop=Country Street_ID Employee_ID); if Country='US' then output usa; else if Country='AU' then output australia; else output other;run;

Page 62: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

62

Country is dropped on input, and therefore it is not available for processing. Every observation is written to other.

data usa australia(drop=State) other; set orion.employee_addresses (drop=Country Street_ID Employee_ID); if Country='US' then output usa; else if Country='AU' then output australia; else output other;run;

Page 63: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

63

Controlling Which Observations Are Read

By default, SAS processes every observation in a SAS data set, from the first observation to the last. The FIRSTOBS= and OBS= data set options can be used to control which observations are processed.

FIRSTOBS= and OBS= are used with input data sets. You cannot use either option with output data sets.

Page 64: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

64

The OBS= Data Set Option

The OBS= data set option specifies an ending point for processing an input data set.

General form of OBS= data set option:

This option specifies the number of the last observation to process, not how many observations should be processed.

SAS-data-set(OBS=n)SAS-data-set(OBS=n)

Page 65: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

65

data australia; set orion.employee_addresses (obs=100); if Country='AU' then output;run;

Page 66: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

66

The FIRSTOBS= Data Set Option

The FIRSTOBS= data set option specifies a starting point for processing an input data set. This option specifies the number of the first observation to process.

FIRSTOBS= and OBS= are often used together to define a range of observations to be processed.

SAS-data-set (FIRSTOBS=n)SAS-data-set (FIRSTOBS=n)

Page 67: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

67

Using OBS= and FIRSTOBS= Data Set Options

proc contents data=orion.employee_addresses;run;

data australia; set orion.employee_addresses (firstobs=50 obs=100); if Country='AU' then output;run;

Page 68: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

68

Using OBS= and FIRSTOBS= in a PROC Step

The FIRSTOBS= and OBS= data set options can also be used in SAS procedures. The PROC PRINT below begins processing at observation 10 and ends after observation 15.

proc print data=orion.employee_addresses (firstobs=10 obs=15); var Employee_Name City State Country;run;

Page 69: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

69

Adding a WHERE Statement

When FIRSTOBS= or OBS= and WHERE are used together:

The subsetting WHERE is applied first The FIRSTOBS= and OBS= are applied to the resulting observations.

The following step includes a WHERE statement and an OBS= option.

proc print data=orion.employee_addresses (obs=10); where Country='AU'; var Employee_Name City Country;run;

Page 70: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

70

An Aside -- proc surveyselect

proc surveyselect data=orion.employee_addresses method=srs n=10out=simplerandomsampleseed=12345;

run;title "Simple random sample";proc print data=simplerandomsample;var employee_id country;run;

Page 71: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

71

proc surveyselect data=orion.employee_addresses method=srs n=10out=simplerandomsampleseed=12345;

where country="US";run;title "Simple Random sample of US employees";proc print data=simplerandomsample;var employee_id country;run;

Page 72: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

72

Bootstrap sampling

data tmp;input x y @@;datalines;1 7 2 6 3 4 4 8 5 9 6 11 7 12;

run;proc print data=tmp;run;

Page 73: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

73

A single boot strap sample

proc surveyselect data=tmp method=urs n=7 outhits out=tmp1(drop=numberhits) seed=7456713;

run;

proc print data=tmp1;run;

Page 74: 1 Controlling Input and Output Outputting Multiple Observations Writing to Multiple SAS Data Sets Selecting Variables and Observations

74

Multiple bootstrap samples

proc surveyselect data=tmp method=urs n=7 outhits out=tmp1(drop=numberhits) seed=7456713reps=3;

run;

proc print data=tmp1;run;