
Developing Data-Driven SAS Programs Using Proc Contents
Robert W. Graebner, Quintiles, Inc., Kansas City, MO

ABSTRACT

It is often desirable to write SAS programs that adapt to different data set structures without being modified. Such programs are referred to as data-driven programs because they assess the structure of the data set they are working with and automatically adapt to that structure. In SAS, the macro language can be used in conjunction with PROC CONTENTS to produce such programs. In this paper examples are provided to illustrate how this technique can be used to reduce programming and maintenance effort in a variety of situations. This paper is intended for experienced SAS programmers who have a basic understanding of the SAS macro language.

INTRODUCTION

The SAS macro language provides powerful capabilities for writing flexible programs that can behave differently depending on the parameters passed to them. A common use of macros is to reduce repetition in programs. An example from the pharmaceutical industry is the need to produce a summary listing for each subject with the subject ID included in the report title. To accomplish this, you could write PROC REPORT code, make a copy for each subject, and then change the ID in each TITLE statement. A more efficient way would be to put the PROC REPORT code in a macro, pass the subject ID as a parameter, and then use macro variable substitution to place the ID in the title. This method is simple when there are only a few differences between each report, but what do you do when you need reports for many different data sets with different structures? PROC CONTENTS provides a simple solution with its capability of storing data set structure information in a data set. This information can then be stored in macro variables and used to build SAS programming statements tailored to the data set you are working with. For example, in PROC REPORT, the variable names could be used to construct the COLUMN statement. In addition to making your program more generic, you also eliminate many errors. Because the variable names are obtained from PROC CONTENTS, you are assured that all variables will be included and that they will all be spelled correctly. Information on type, length, label and format can be used in a similar fashion to produce a DEFINE statement for each variable. This method can be used to generate SAS programming statements for any SAS procedure that utilizes data set structure information.

There are two basic ways in which this process can be used in your programs. The first is to use macro variable substitution in the source code run by your current session. This has the advantage that your program can be generic and self-contained. The second method is to use a DATA _NULL_ step with a series of PUT statements that use macro variable substitution to create a text file containing SAS source code statements. An advantage of this method is that you can modify the program before you run it. This is helpful when you are not able to handle all coding needs in your macro. It also allows you to give the source code to clients without giving away your macro technology.

METHOD

PROC CONTENTS has several features that make it useful in developing data-driven applications. It can determine the structure of any data set in a SAS library by using the DATA= LIBNAME.MEMBER option, or it can process all data sets in a library at once by specifying DATA= LIBNAME._ALL_. By using the OUT= LIBNAME.MEMBER and NOPRINT options you can send the output to a SAS data set and suppress printed output. The resulting data set contains a series of variables describing the data set structures, with an observation for each variable in each data set. The most useful variables are listed below.

PROC CONTENTS Variable    Description
LIBNAME                   SAS Library Name
MEMNAME                   SAS Library Member Name
NAME                      Variable Name
TYPE                      Variable Type (1 = numeric, 2 = character)
LENGTH                    Variable Length
LABEL                     Variable Label
FORMAT                    Variable Format
FORMATL                   Format Length
FORMATD                   Format Decimals
INFORMAT                  Variable Informat
INFORML                   Informat Length
INFORMD                   Informat Decimals

Creating a structure data set is very simple; an example is given below.

    proc contents
      data=&saslib..&ds
      out=struct
      position
      noprint;
    run;

The macro variables that indicate the SAS library and data set name to be used are passed as parameters to the macro that contains the call to PROC CONTENTS. The PROC CONTENTS output is stored in a temporary data set called struct. The POSITION option specifies that the observations will be ordered by the location of variables in the data set rather than alphabetically by variable name. The NOPRINT option suppresses printed output of the PROC CONTENTS results.
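Because the order of the observations matters later (for example, when the COLUMN statement is built), it may be worth making that order explicit rather than relying on an option alone. The following is a minimal sketch, not part of the original macro: it assumes the SASHELP.CLASS sample data set in place of &saslib..&ds, and it sorts the structure data set by VARNUM, the position number that PROC CONTENTS also writes to the OUT= data set.

```sas
/* Sketch: build the structure data set and force position order. */
/* SASHELP.CLASS stands in for &saslib..&ds (an assumption).      */
proc contents data=sashelp.class
              out=struct(keep=memname name type length varnum)
              noprint;
run;

/* VARNUM holds each variable's position within its data set. */
proc sort data=struct;
  by memname varnum;
run;

/* Inspect the result: one observation per variable. */
proc print data=struct noobs;
run;
```

Sorting by MEMNAME first keeps the step usable when the structure data set describes more than one data set.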

The next step is to place the desired information into SAS macro variables. Because these variables are often used in iterative processes, it is desirable to have them in an array. While the SAS macro language does not support arrays, you can simulate arrays (sometimes called pseudo arrays) by using multi-pass macro variable resolution. The following source code creates a pseudo array containing the variable names and types from the data set struct. The CALL SYMPUT routine is used to store the data set variables NAME and TYPE in macro variables that have the observation number added to the end of the macro variable name (e.g. var1, var2, etc.) to facilitate referencing them in a %DO loop. The last observation number is stored as well to serve as the upper limit for the %DO loop.

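    data _null_;
      set struct end=last;
      /* var1..varN and type1..typeN hold each variable's name and type */
      call symput('var'  || left(_N_), trim(name));
      call symput('type' || left(_N_), left(type));
      /* numrec is the upper limit for the loop over the pseudo array */
      if last then call symput('numrec', left(_N_));
    run;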

As mentioned earlier, one use of this information is to use macro variable substitution to form the necessary SAS source code when the macro is resolved. The following example loops through all variables in the data set and calls PROC FREQ for each one.

    %do i = 1 %to &numrec;
      proc freq data=&saslib..&ds;
        tables &&var&i;
      run;
    %end;

The macro variable reference &&var&i will be resolved in two passes. When i = 1, the first pass turns && into & and &i into 1, giving &var1. The second pass resolves &var1 to the string stored in it, which will be the name of the first variable in the data set.
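The pieces described so far can be put together as one small macro. The following is a sketch under stated assumptions rather than the author's production code: the macro name FREQALL is hypothetical, and the call at the end assumes the SASHELP.CLASS sample data set is available.

```sas
/* Sketch of a complete data-driven macro. FREQALL is a hypothetical name. */
%macro freqall(saslib, ds);
  %local i numrec;
  %let numrec = 0;   /* the loop is skipped if no variables are found */

  /* Capture the structure of the requested data set. */
  proc contents data=&saslib..&ds out=struct noprint;
  run;

  /* Load NAME into the pseudo array var1..varN and store the count. */
  data _null_;
    set struct end=last;
    call symput('var' || left(put(_n_, 8.)), trim(name));
    if last then call symput('numrec', trim(left(put(_n_, 8.))));
  run;

  /* One PROC FREQ per variable. &&var&i resolves first to &var1, */
  /* &var2, ... and then to the variable name stored there.        */
  %do i = 1 %to &numrec;
    %put NOTE: variable &i is &&var&i;
    proc freq data=&saslib..&ds;
      tables &&var&i;
    run;
  %end;
%mend freqall;

/* Example call, assuming the SASHELP.CLASS sample data set. */
%freqall(sashelp, class)
```

The %PUT statement writes each resolved name to the log, which makes the two-pass resolution easy to follow while the macro runs.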

Another option is to use a DATA _NULL_ step and PUT statements to generate SAS source code. The example below uses this method to generate PROC REPORT code. The source code is put in the text file referenced in the FILE statements. The MOD option is used so that each successive DATA step will append to the file rather than overwrite it.

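    data _null_;
      set struct end=last;
      file sascode mod;
      if _N_ = 1 then do;
        /* PROC REPORT statement, then the start of the COLUMN statement */
        put / "proc report data=&saslib..&ds missing nowindows headline headskip split='\';"
            / ' column ' @;
      end;
      /* running width of the current COLUMN line */
      linelen + length(name);
      if linelen >= 70 then do;
        /* start a continuation line, indented under the first name */
        put / +9 @;
        linelen = 10;
      end;
      put name @;
      if last then put ';';
    run;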

    data _null_;
      set struct end=last;
      file sascode mod;
      length clabel $ 120 coltype $ 7;
      vnwidth = length(trim(name));
      /* key columns are ORDER variables; everything else is DISPLAY */
      if name in ('PATNO','VISIT') then coltype = 'order';
      else coltype = 'display';
      select;
        when (length <= 4)
          clabel = "define " || trim(name) || " / " || trim(coltype) ||
                   " width=" || put(max(vnwidth, 4), 2.) || " left;";
        when (4 < length <= 20)
          clabel = "define " || trim(name) || " / " || trim(coltype) ||
                   " width=" || put(max(vnwidth, length), 2.) || " left;";
        when (length > 20)
          clabel = "define " || trim(name) || " / " || trim(coltype) ||
                   " width=20 left flow;";
      end;
      put @3 clabel;
      if last then do;
        put "  title1 'QC Listing Report for &ds';" /
            'run;';
      end;
    run;


To start the program generation, a PROC REPORT statement and the associated options are written. Because this only needs to be done once, before the variable-specific statements are written, an IF statement is used so that this line is only written when _N_ equals one. The macro variables &SASLIB and &DS contain the SAS library name and the data set name. Because these macro variables need to be resolved as the source code is generated, double quotes are used to surround the string that contains them. If you need to include an unresolved macro variable reference in the source code you generate, use single quotes to enclose the string. The remaining statements in this DATA step are used to create a COLUMN statement that contains the names of all the variables in the data set to be reported from.
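The difference between the two kinds of quotes can be seen with a short sketch that writes to the log instead of a file. The data set name demog is hypothetical and is used only as text, so no such data set needs to exist.

```sas
%let ds = demog;   /* hypothetical data set name, used only as text */

data _null_;
  /* Double quotes: &ds is resolved when this step is compiled. */
  put "proc print data=&ds; run;";
  /* Single quotes: the reference is written out unresolved. */
  put 'proc print data=&ds; run;';
run;
```

The first line in the log should show demog substituted for &ds, while the second should show &ds exactly as typed, ready to be resolved when the generated program is run later.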

The second DATA step is used to create DEFINE statements for each column. This section illustrates how conditional processing can be used to generate source code that is dependent on each variable’s attributes. When the length of a variable is less than or equal to four, the column width is set to the maximum of the length of the variable name and four. This guarantees that you will not have any columns narrower than four spaces. When the length is greater than four, but less than or equal to 20, the column width is equal to the maximum of the length of the name and the length of the variable. When the length is greater than 20, the column width is set to 20 and the FLOW option is used for column wrapping. After the last variable is reached, a TITLE statement is generated.
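The code-generation method can also be tried end to end on a much smaller scale. The following sketch is a simplified illustration, not the PROC REPORT generator itself: it assumes the SASHELP.CLASS sample data set, writes a short PROC PRINT step to a temporary fileref instead of a named file, and then runs the generated statements with %INCLUDE.

```sas
/* Sketch: generate a small program from the structure data set, then run it. */
/* SASHELP.CLASS and the temporary fileref are assumptions for illustration.  */
filename sascode temp;

proc contents data=sashelp.class out=struct noprint;
run;

data _null_;
  set struct end=last;
  file sascode;
  /* Write the fixed part once, then hold the line for the variable names. */
  if _n_ = 1 then put 'proc print data=sashelp.class;' / '  var ' @;
  put name @;
  /* Close the VAR statement and the step after the last variable. */
  if last then put ';' / 'run;';
run;

/* Execute the generated statements; SOURCE2 echoes them to the log. */
%include sascode / source2;

filename sascode clear;
```

Writing to a named file, as the paper does, has the advantage that the generated program can be reviewed and edited before it is run.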

This source code is part of a macro (PRGEN in the example below) that receives the data set name in the parameter DS. If you need to generate source code for multiple data sets, you can write another macro that creates a pseudo array containing the names of the required data sets and then calls the source code generating macro for each data set. An example of such a macro is given below.

    %macro qcrptgen (saslib, codefile);
      options nolabel nofmterr;
      filename sascode "&codefile";

      /**** CREATE PROGRAM HEADER ****/
      data _null_;
        file sascode mod;
        gendate = put(today(), date9.);
        put "/***********************";
        put " QC Listing for: &saslib";
        put " Program name :: &codefile";
        put " Authors name :: ";
        put " Date started :: " gendate /;
        put " Source code generated by the";
        put " QCList Macro.";
        put "******************************/";
      run;

      /**** CREATE A DATASET CONTAINING THE NAMES
            OF ALL DATASETS IN THE LIBRARY SASLIB ****/
      proc contents
        data=&saslib.._all_
        out=libmem (keep=memname varnum)
        position
        noprint;
      run;

      proc sort data=libmem nodupkey;
        by memname;
      run;

      /**** CREATE A PSEUDO-ARRAY (D1..Dn) OF
            MACRO VARIABLES CONTAINING EACH DATASET
            NAME ****/
      data _null_;
        set libmem end=last;
        call symput('d' || left(_N_), trim(memname));
        if last then call symput('numrec', left(_N_));
      run;

      /**** CALL THE SOURCE CODE GENERATOR FOR
            EACH DATASET ****/
      %do i = 1 %to &numrec;
        %prgen(&&d&i);
      %end;
    %mend qcrptgen;

This macro has two parameters: SASLIB, which contains the name of the SAS library to use, and CODEFILE, which contains the full path and filename of the source code file you want to create. The first DATA step generates a program header. The next step uses PROC CONTENTS to create a data set that contains the data structure information for all data sets in the specified SAS library. The purpose of this data set is to provide a list of data sets to generate PROC REPORT source code for. The structure data set created by PROC CONTENTS will have one observation for each variable in each data set. PROC SORT with the NODUPKEY option, using MEMNAME (data set name) as the BY variable, is used to create a data set with one observation per data set in the specified library.


The next DATA step uses the CALL SYMPUT routine to store the data set names in a pseudo array of macro variables. This allows a %DO loop to be used to call the source code generating macro for each of the data sets.
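As an aside, the same pseudo array of data set names can be built with PROC SQL instead of PROC SORT and CALL SYMPUT. This is a swapped-in alternative, not the method used in the macro above. The sketch assumes the WORK library in place of &saslib and creates two small hypothetical data sets, demo1 and demo2, so that the library is not empty.

```sas
/* Sketch: PROC SQL as an alternative way to fill the pseudo array.   */
/* WORK stands in for &saslib; demo1 and demo2 are hypothetical data. */
data work.demo1; x = 1;   run;
data work.demo2; y = 'a'; run;

proc contents data=work._all_ out=libmem(keep=memname) noprint;
run;

/* SELECT DISTINCT removes the one-row-per-variable repetition, and */
/* INTO stores the names in d1, d2, ... for as many rows as exist.  */
proc sql noprint;
  select distinct memname
    into :d1 - :d9999
    from libmem;
quit;

/* SQLOBS holds the number of rows returned by the last SELECT. */
%let numrec = &sqlobs;

%put NOTE: &numrec data sets found. The first one is &d1;
```

The count depends on what else is already in the WORK library, so the value of numrec will vary from session to session.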

CONCLUSION

The methods presented in this paper illustrate how PROC CONTENTS and the SAS macro language can be used to increase programming efficiency by creating data-driven programs or by generating SAS source code.

ACKNOWLEDGMENTS

SAS is a registered trademark or trademark of SAS Institute, Inc. in the USA and other countries. ® indicates USA registration.

CONTACTING THE AUTHOR

Robert Graebner
Quintiles, Inc.
P.O. Box 9708
Kansas City, MO 64134-0708

Email: [email protected]@grapevine.net

Web Site: Quintiles.com
www.grapevine.net/~graetech


MWSUG
Jazz Up Your SAS Skills in Data Management
