planting your rows: using sas ® formats to make the generation of zero-filled rows in tables less...

Post on 28-Mar-2015

216 Views

Category:

Documents

2 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Planting Your Rows: Using SAS® Formats to Make the Generation of Zero-Filled Rows in Tables

Less Thorny

Kathy Hardis FraemanHASUG Q3 2012

“Planting your rows” with SAS® Formats

One of many clever tricks that can be done with SAS formats

Techniques used for this trick are not unique to this trick, can be applied to other programming tricks

I ♥ SAS formats!

When to use this trick with SAS formats

Need tables with all possible values of a variable or variables

All possible values of the variable don’t exist in the data

3

Solution without using SAS Formats

“Hard code” all possible values of a variable in the program

Disadvantages–Tedious–Potential for large number of variables and/or values

–“Possible values” could change over time

4

Solution using SAS Formats

Attach a format to each variable with all possible values described

Advantages–“Dynamic”–Only need to include variable values in the SAS format

5

SAS Programming Techniques

1. Saving a permanent SAS format library2. Attach a format to a variable3. Determine the format attached to a

variable–SCL–SQL

4. Identify the data values defined in the attached format using CNTLOUT

6

Technique 1Saving a permanent SAS format library

libname library ">>pathname<<";

proc format library = library; . . . run;

7

Permanent SAS format library

8

• Named “formats”

• Stored as a SAS Catalog

SAMPLE DATA

Obs employee year num dollar  1 Hall FY 2008 10 $10,000.00 2 Hall FY 2010 15 $15,500.00 3 Oates FY 2008 8 $500.00 4 Brooks FY 2008 15 $11,111.00 5 Brooks FY 2010 20 $12,345.67 6 Abbott FY 2008 50 $75,757.00 7 Abbott FY 2010 75 $99,999.99 8 Costello FY 2008 33 $33,333.00 9 Costello FY 2010 44 $44,444.44

9

Variables with attached formats

EMPLOYEE and YEAR are not “text” variables

Variables have attached formats–EMPLOYEE has an attached format emplfmt.–YEAR has an attached format yearfmt.

10

SAS Formats attached to variablesproc format library = library; value emplfmt 1 = "Hall" 2 = "Oates"

3 = "Brooks" 4 = "Dunn"

5 = "Abbott" 6 = "Costello" ;

  value yearfmt 2008 = "FY 2008" 2009 = "FY 2009"

2010 = "FY 2010" ;

run;11

Technique 2Attach format to the variable

data in.sales; set sales; format employee emplfmt. year yearfmt. ; run;  

12

Obs employee year num dollar  1 Hall FY 2008 10 $10,000.00 2 Hall FY 2010 15 $15,500.00 3 Oates FY 2008 8 $500.00 4 Brooks FY 2008 15 $11,111.00 5 Brooks FY 2010 20 $12,345.67 6 Abbott FY 2008 50 $75,757.00 7 Abbott FY 2010 75 $99,999.99 8 Costello FY 2008 33 $33,333.00 9 Costello FY 2010 44 $44,444.44

Dunn hasn’t sold anything No sales by anyone in FY 2009 Oates didn’t sell anything in 201013

REPORT WITH MISSING ROWS Number Amount of Year Employee of Sales Sales ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ  FY 2008 Hall 10 $10,000.00 Oates 8 $500.00 Brooks 15 $11,111.00 Abbott 50 $75,757.00 Costello 33 $33,333.00  FY 2010 Hall 15 $15,500.00 Brooks 20 $12,345.67 Abbott 75 $99,999.99 Costello 44 $44,444.44

14

Technique 3Determine a Variable’s Format

Don’t need to “hard code” the name of a variables format unless in a FORMAT statement

Can use two techniques to determine a variable’s format–%SYSFUNC with SCL–A dictionary table with PROC SQL

15

Method 1 -- %SYSFUNC with SAS SCL

%let dsid = %sysfunc(open(IN.SALES, i)); %let varnum = %sysfunc(varnum(&dsid,EMPLOYEE)); %let format = %sysfunc(varfmt(&dsid, &varnum)); %let rc = %sysfunc(close(&dsid));  %put EMPLOYEE VARIABLE FORMAT = &format;

999 %put EMPLOYEE VARIABLE FORMAT = &format; EMPLOYEE VARIABLE FORMAT = EMPLFMT. Note that the “.” in the format name is included in the

macro variable &format

16

Method 2 – PROC SQL DICTIONARY TABLES

proc sql; create table formats as select format from dictionary.columns where upcase(libname) = 'IN' and upcase(memname) = 'SALES' and

upcase(name) = ('EMPLOYEE') ; quit;

  17

PROC SQL Dictionary Table Output

proc print data=formats; run;

Obs format 1 EMPLFMT.

Again note that the “.” in the format name is included in the value of the variable

18

Technique 4Identify all values of a SAS format

CNTLOUT Option of PROC FORMAT

–FMTNAME – name of the format–START – starting value of the format–LABEL – descriptive label associated with the value of START

19

PROC FORMAT Code

proc format library = library cntlout = formatlib (keep = fmtname start label); run;

 proc print data = formatlib; run;

20

Output data set FORMATLIB

Obs FMTNAME START LABEL  1 EMPLFMT 1 Hall 2 EMPLFMT 2 Oates 3 EMPLFMT 3 Brooks 4 EMPLFMT 4 Dunn 5 EMPLFMT 5 Abbott 6 EMPLFMT 6 Costello 7 YEARFMT 2008 FY 2008 8 YEARFMT 2009 FY 2009 9 YEARFMT 2010 FY 2010

21

Output data set FORMATLIB

Note that the variable with the name of the format FMTNAME does not have a “.” at the end

Can add a “.” to the end

fmtname = cats(fmtname,’.’);

22

Putting it all together – finding all possible values for variables using attached formats

Uses previously described techniques

Can be done with SCL or SQL

Complete code is provided

23

SAS Macro to put it all together

Two macros, one for each technique–%SYSFUNC with SCL–PROC SQL Dictionary tables

Both macros use CNTLOUTBoth macros input name of variable, output

values of variable in a data set

%getfmt1(var=employee, outvals=empvals);%getfmt1(var=year, outvals=yearvals);

24

Method 1 – SCL%macro getfmt1(var=, outvals=);  %let dsid = %sysfunc(open(IN.SALES,i)); /* Input data is hard coded*/

%let varnum = %sysfunc(varnum(&dsid, &var)); %let varfmt = %sysfunc(varfmt(&dsid, &varnum)); %let rc = %sysfunc(close(&dsid));

  %put &varfmt;

  proc format library = library cntlout = &outvals (keep = fmtname start label where = (cats(fmtname,'.') = "&varfmt"));

run; 

title "Data set &outvals from macro GETFMT1 -- SYSFUNC and SCL"; proc print data = &outvals;

run; %mend getfmt1;%getfmt1(var=employee, outvals=empvals);%getfmt1(var=year, outvals=yearvals);

25

Method 2 – SQL (part 1)

%macro getfmt2(var=, outvals=);  proc sql; create table &var.fmt as select format

from dictionary.columns where upcase(libname) = 'IN' and /* Input data is hard coded*/ upcase(memname) = 'SALES' and

upcase(name) = upcase("&var") ;

quit;

data _null_; set &var.fmt; call symputX("varfmt", format, 'L');

run;

%put &varfmt;  

26

Method 2 – SQL (part 2)

  proc format library = library cntlout = &outvals (keep = fmtname start label where = (cats(fmtname,'.') =

"&varfmt")); run;

  title "Data set &outvals from macro GETFMT2 SQL Dictionary Table";

proc print data = &outvals; run;

 %mend getfmt2;

%getfmt2(var=employee, outvals=empvals);%getfmt2(var=year, outvals=yearvals);

27

Output Data from %GETFMT1 (SCL) Data set empvals from macro GETFMT1 -- SYSFUNC and SCL

Obs FMTNAME START LABEL

1 EMPLFMT 1 Hall 2 EMPLFMT 2 Oates 3 EMPLFMT 3 Brooks 4 EMPLFMT 4 Dunn 5 EMPLFMT 5 Abbot 6 EMPLFMT 6 Costello

Data set yearvals from macro GETFMT1 -- SYSFUNC and SCL

Obs FMTNAME START LABEL

1 YEARFMT 2008 FY 2008 2 YEARFMT 2009 FY 2009 3 YEARFMT 2010 FY 201028

Output Data from %GETFMT2 (SQL) Data set empvals from macro GETFMT2 SQL Dictionary Table

Obs FMTNAME START LABEL

1 EMPLFMT 1 Hall 2 EMPLFMT 2 Oates 3 EMPLFMT 3 Brooks 4 EMPLFMT 4 Dunn 5 EMPLFMT 5 Abbot 6 EMPLFMT 6 Costello

Data set yearvals from macro GETFMT2 SQL Dictionary Table

Obs FMTNAME START LABEL

1 YEARFMT 2008 FY 2008 2 YEARFMT 2009 FY 2009 3 YEARFMT 2010 FY 201029

Data set with all possible values of both variablesUses data sets created by macros

–EMPVALS – all possible values of EMPLOYEE

–YEARVALS – all possible values of YEAR

30

Data set with all possible values of both variablesCombined with SQL join to create a data

set will all 6 possible values of the variable EMPLOYEE with all 3 possible values of the variable YEAR

Name of combined data set is ALLROWSALLROWS has 6 x 3 = 18 observations

31

SQL Join to create ALLROWSdata yearvals_mod (keep = year a); set yearvals;

year = input(trim(left(start)),8.); a = 1; run;

data empvals_mod (keep = employee b); set empvals;

employee = input(trim(left(start)),8.); b = 1; run;

proc sql; create table allrows as

select year, employee from yearvals_mod y, empvals_mod e where y.a = e.b; quit;

proc sort data=allrows; by year employee;

run;

32

Data set ALLROWS All rows needed for table

Formats will automatically be added later in the merge

Obs year employee

1 2008 1 2 2008 2 3 2008 3 4 2008 4 5 2008 5 6 2008 6 7 2009 1 8 2009 2 9 2009 3 10 2009 4 11 2009 5 12 2009 6 13 2010 1 14 2010 2 15 2010 3 16 2010 4 17 2010 5 18 2010 633

Create data set for reportproc sort data=in.sales out=sales; by year employee;

run;

data sales_all; merge allrows (in=a)

sales (in=s); by year employee;

/*------------------------------*/ /* Zero-fill the missing rows /*------------------------------*/ if a and ^s then do; num = 0; dollar = 0; end; run;

34

Data set SALES_ALL Input data for report Obs year employee num dollar 1 FY 2008 Hall 10 $10,000.00 2 FY 2008 Oates 8 $500.00 3 FY 2008 Brooks 15 $11,111.00 4 FY 2008 Dunn 0 $0.00 5 FY 2008 Abbott 50 $75,757.00 6 FY 2008 Costello 33 $33,333.00 7 FY 2009 Hall 0 $0.00 8 FY 2009 Oates 0 $0.00 9 FY 2009 Brooks 0 $0.00 10 FY 2009 Dunn 0 $0.00 11 FY 2009 Abbott 0 $0.00 12 FY 2009 Costello 0 $0.00 13 FY 2010 Hall 15 $15,500.00 14 FY 2010 Oates 0 $0.00 15 FY 2010 Brooks 20 $12,345.67 16 FY 2010 Dunn 0 $0.00 17 FY 2010 Abbott 75 $99,999.99 18 FY 2010 Costello 44 $44,444.4435

REPORT WITHOUT MISSING ROWS Number Amount of Year Employee Sales Sales ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ  FY 2008 Hall 10 $10,000.00 Oates 8 $500.00 Brooks 15 $11,111.00 Dunn 0 $0.00 Abbott 50 $75,757.00 Costello 33 $33,333.00

FY 2009 Hall 0 $0.00 Oates 0 $0.00 Brooks 0 $0.00 Dunn 0 $0.00 Abbott 0 $0.00 Costello 0 $0.00

  FY 2010 Hall 15 $15,500.00 Oates 0 $0.00 Brooks 20 $12,345.67 Dunn 0 $0.00 Abbott 75 $99,999.99 Costello 44 $44,444.4436

Conclusions

Use of SAS formats can improve the programming process

Techniques used for this programming trick can be adapted to other programming tricks

37

top related