parallel processing sas batch jobs in windows

25
Parallel processing SAS Batch jobs in Windows, check return codes and then aggregate information This post will discuss a method and present macros to achieve Parallel processing SAS Batch jobs in Windows OS. Scenario: You need to extract data from say a large database or multiple sources and then aggregate that information into one dataset in the least possible time without sacrificing much on the processing time and programming time. Let’s discuss this hypothetical scenario with an example…Imagine that there were 3 independent extract processes…and suppose each of those processes took the below min to process…See Appendix #1 below…The example is simple to understand but the macro might be a little bit overkill…but I am sure it is worth 100% to try…Thanks in advance for your patience…. Process 1 took 30 min Process 2 took 30 min Process 3 took take 30 min Aggregate process took 15 min Total process : 1 hr 45 min. Now your task is to somehow decreasing the processing time.. The simple solution is to decrease the processing time by processing these independent extract jobs in parallel and then aggregate the info in a separate process. But how do we do that?

Upload: chakravarthy-narnindi-sharad

Post on 26-Sep-2015

66 views

Category:

Documents


3 download

DESCRIPTION

Parallel Processing SAS Batch Jobs in Windows

TRANSCRIPT

  • Parallel processing SAS Batch jobs in Windows, check return codes and then aggregate information

    This post will discuss a method and present macros to achieve Parallel processing SAS Batch jobs in Windows OS.

    Scenario:

    You need to extract data from say a large database or multiple sources and then aggregate that information into one dataset in the least possible time without sacrificing much on the processing time and programming time.

    Lets discuss this hypothetical scenario with an exampleImagine that there were 3 independent extract processesand suppose each of those processes took the below min to processSee Appendix #1 belowThe example is simple to understand but the macro might be a little bit overkillbut I am sure it is worth 100% to tryThanks in advance for your patience.

    Process 1 took 30 min

    Process 2 took 30 min

    Process 3 took take 30 min

    Aggregate process took 15 min

    Total process : 1 hr 45 min.

    Now your task is to somehow decreasing the processing time..

    The simple solution is to decrease the processing time by processing these independent extract jobs in parallel and then aggregate the info in a separate process. But how do we do that?

  • Assuming that we have a directory structure like this for the Project to store permanent datasets(data), source information (sourcedata), logs (log), programs(progs) and output (out)

    Here I present the steps to do that:

    1. Break down the processes into separate .sas programs (Look at Programs in Appendix # 2 #3)

    Put each of the extract processes into separate programsFor this example prog1, prog2 , prog 3 and aggregate(aggregation part)

    SASStart.bat It is a batch file (save the code in Appendix # 4 into C:\Project\bat\SASStart.bat)

  • 2. Kick off those separate programs using systask command with the SASStart.bat batch file

    Use systask SAS statement to execute the Batch file SASStart.bat asynchronously i.e. concurrently.

    More info http://support.sas.com/onlinedoc/913/getDoc/en/hostwin.hlp/win-stmt-systask.htm

    systask command "SASStart.bat progsname" taskname="task1" status=rc_task1;

    Syntax briefly

    systask command tells SAS to run the bat program asynchronously

    taskname identify the task

    status= stores the return code value in the macro variable specified (rc_task1 in the example)

    3. Check the return codes before aggregating

    http://support.sas.com/onlinedoc/913/getDoc/en/hostwin.hlp/win-stmt-systask.htm
  • We need to check the return codes of each of those process by looking at the values of the macro variables mentioned in the status=macvar.

    4. Aggregate the code

    Prepare the aggregation code and the process code to do all the above tasks.

    Please look at the Appendices for the code and explanation of the code.

    Download all the code and data from http://www.sastechies.com/blogs/parallel.zip

    All the macros are generic and a little bit of customization in the highlighted areas is required to implement this technique to your programs. 1.Replace these macro variable values to your project specific paths %let progPath=C:\Project\progs; /* Location of the Programs*/ %let BatchPath=C:\Project\bat; /* Location of the SAS Start Batch file*/ %let prognames= prog1 prog2 prog3; /*name of the program names to execute*/ 2. replace the contents of the Prog1-3.sas and aggregate.sas to your project specific programs.

    http://www.sastechies.com/blogs/parallel.zip
  • APPENDIX # 1 Code that processes sequentially Libname datalib "C:\Project\data"; Libname source "C:\Project\sourcedata"; /* Process 1: Extracting info starting with J */ proc sql; create table datalib.data1 as select * from source.class1 where name like 'J%'; quit; /* Process 2: Extracting info starting with A */ proc sql; create table datalib.data2 as select * from source.class2 where name like 'A%'; quit; /* Process 3: Extracting info starting with R */ proc sql; create table datalib.data3 as select * from source.class3

  • where name like 'R%'; quit; /* aggregate Process: Put the information together */ data datalib.aggregatedata; set datalib.data1 datalib.data2 datalib.data3; run; /* sorting by name */ proc sort data=datalib.aggregatedata; by name; quit;

  • Appendix # 2:

    Save the code into each of these .sas files

    Program prog1.sas

    Libname datalib "C:\Project\data"; /* Process 1: Extracting info starting with J */ proc sql; create table datalib.data1 as select * from source.class1 where name like 'J%'; quit;

  • Program prog2.sas

    Libname datalib "C:\Project\data"; /* Process 2: Extracting info starting with A */ proc sql; create table datalib.data2 as select * from source.class2 where name like 'A%'; quit;

    Program prog3.sas

    Libname datalib "C:\Project\data"; /* Process 3: Extracting info starting with R */ proc sql; create table datalib.data3 as select * from source.class3 where name like 'R%'; quit;

  • Appendix # 3:

    Program aggregate.sas

    Description of each of these macros used in the program

    %initiateParallelProcessing This is the main macro which kicks off all the parallel programs with different SAS sessions. The SYSTASK statement within the Do loop here kicks off the batch program c:\Project\bat\SASStart.bat with program as the parameter for every task. It also has the name of the taskname and stores the return code of the batch process to the macro variable mentioned in the status= statement.

    systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.;

    the WAITFOR _ALL_ task1 task2 task3 tells SAS to wait for all the processes to complete before executing all the statements after that i.e. %checkreturncodes and %aggregate etc..

    waitfor _all_ %tasklist(&start.,&end.);; /* Wait for All of the tasks to complete */ %checkreturncodes(&start.,&end.,task,"flag_error=N");; - Print All the Error Codes to Log

    %aggregatecode this macro checks all the return codes of the parallel tasks..if 0 then completes the aggregating code after that.

    %sleepsecs sometimes it is better to sleep sas for couple of secs just to let the Windows OS settle with the new taskso Ive used sleepsecs(2) so that Windows can recognize the batch file execution properlythis is optional

  • %tasklist(start,end) Ive used this macro just to give me the list of the tasks that were spawned..

    %Conditions(start,end,rctype)- Ive used this macro piece to check the values for rc_task1 eq 0 or rc_task2 eq 0 etc for all the task ids spawned

    %Checkreturncodes(start,end,rctype,cond)- This macro checks for the conditions rc_task1 eq 0 or rc_task2 eq 0 and then sets the condition that a user wants to setin this case I wanted to set the condition flag_error=N

    %printerrorcodes(start,end,rctype)- This macro prints the error codes to the Log..just a listing of the values of the macro variables of all the tasks..

    splitmacroval(macvar,prefix,splitcnt,dlm=' ')- Read more about this macro at- http://sastechies.blogspot.com/2009/11/sas-macro-to-split-macro-variables.html

    Full Code

    /* Macro to send SAS to a sleep mode for few seconds */ %macro sleepsecs(sec); %put;%put =============Macro: sleepsecs(&sec); data _null_; slept= sleep(&sec);/* argument to sleep must be expressed in seconds */ run; %mend sleepsecs; /* Macro to generate a tasklist task1 task2 etc... */ %macro tasklist(start,end);

  • %do i=&start %to &end.; task&i. %end; %mend tasklist; /* Macro to check if all the returncodes for the Child sessions is 0 or not*/ %macro Conditions(start,end,rctype); %put;%put =============Macro: Conditions(&start,&end,&rctype); %do i=&start %to %eval(&end - 1); &&&rc_&rctype.&i. eq 0 or %end; %mend Conditions; /* Macro to Check the return codes of all tasks */ %macro Checkreturncodes(start,end,rctype,cond); %put;%put =============Macro: Checkreturncodes(&start,&end,&rctype,&cond); %put;%put =============================================================================; %put The Error codes for the Above tasks are: %printerrorcodes(&start,&end,&rctype); %put =============================================================================;%put; %if %Conditions(&start,&end,&rctype) &&&rc_&rctype.&end. eq 0 %then %do;

  • %let leftof=%scan(%sysfunc(compress(&cond,'"')),1,'='); %let rightof=%scan(%sysfunc(compress(&cond,'"')),2,'='); %global &leftof; %let &leftof=&rightof; %put Flag Set: &leftof=&rightof; %put; %end; %mend Checkreturncodes; /* Macro to print the error codes to the Log */ %macro printerrorcodes(start,end,rctype); %put;%put =============Macro: printerrorcodes(&start,&end,&rctype); %do i=&start %to %eval(&end); rc_&rctype.&i.=&&&rc_&rctype.&i. %end; %mend printerrorcodes; /* Create a SAS session for each of the Filename Inputs */ %macro initiateParallelProcessing; %let start=1; %let end=&progcnt; options noxwait noxsync; x cd "&BatchPath"; %do i=&start. %to &end.; /* */

  • /* Call the SASStart.bat with All the parameters in the Back ground mode with Systask command */ %put;%put *************************Task executing task&i.********************************************; %put systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.; systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.; %sleepsecs(3); /* wait for 3 secs atleast for the system to stabilize with the new App*/ %end; waitfor _all_ %tasklist(&start.,&end.);; /* Wait for All of the tasks to complete */ %checkreturncodes(&start.,&end.,task,"flag_error=N");; /* Print All the Error Codes to Log */ %mend initiateParallelProcessing; %macro splitmacroval(macvar,prefix,splitcnt,dlm=' '); %global &splitcnt; /*create the global variable for storing the splitcount*/ %let num=1; /*initialize a counter variable*/ %global &prefix# /*create the global variable for storing the first split*/ %let &prefix&num=%scan(&&&macvar,&num,&dlm); /*Scan the first value*/ %do %while(&&&prefix&num ne ); %let num=%eval(&num + 1);/*increment the counter*/

  • %global &prefix# /*create the global variable*/ %let &prefix&num=%scan(&&&macvar,&num,&dlm); /*scan the next value*/ %end; %let &splitcnt=%eval(&num - 1); /*Store the split count to the macro variable reqd*/ %mend splitmacroval; %let flag_error=Y; /* Initialize the error flag to Y */ %let progPath=C:\Project\progs; /* Location of the Programs*/ %let BatchPath=C:\Project\bat; /* Location of the SAS Start Batch file*/ %let prognames= prog1 prog2 prog3; /*name of the program names to execute*/ %splitmacroval(prognames,progs,progcnt,dlm=' ');/* Read more about this macro at-http://sastechies.blogspot.com/2009/11/sas-macro-to-split-macro-variables.html*/ %initiateParallelProcessing;/* Call the SAS Process to process those files */ %put The value of flag_error=&flag_error; %put; %macro aggregatecode; %if &flag_error eq N %then %do; Libname datalib "C:\Project\data"; data datalib.aggregatedata; set datalib.data1 datalib.data2 datalib.data3; run; proc sort data=datalib.aggregatedata;

  • by name; quit; %end; %else %do; %put Errors in Processing...Please check the Logs...; %end; %mend aggregatecode; %aggregatecode;

  • Appendix # 4:

    Save the code to SASstart.bat and put it in C:\projects\bat\

    This windows batch file takes one parameter i.e. name of the sas program (extension .sas is not required). This batch file kicks a batch SAS session with the command

    For eg. The macro initiateProcessing uses SASstart.bat to kick off prog2.sas with this command.

    "C:\Program Files\SAS\SAS 9.1\sas.exe " -sysin "C:\Project\progs\prog2.sas.sas" -print "C:\Project\out\prog2 08DEC2009.lst" log "C:\Project\Log\prog2 08DEC2009.log" -unbuflog -nodmsexp -nosplash

    Read more about Running SAS in batch mode in Windows OS at

    http://sastechies.blogspot.com/2009/12/running-sas-in-batch-mode-in-windows-os.html

    tips to understand this program is to compare with SAS syntax

    The Set statement is like our SAS %let statement

    %XXX% is like out SAS &XXXX macro resolution

    %ERRORLEVEL% is like out SAS &sysrc macvariable At SET ProgName=%1 is the parameter that we send to the SASstart.bat program

    SASStart.bat

    @echo off

    http://sastechies.blogspot.com/2009/12/running-sas-in-batch-mode-in-windows-os.html
  • if x==%1x goto CHKPARM goto PROCESS :PROCESS echo Start Processing SET ProgName=%1 SET SASEXEpath=C:\Program Files\SAS\SAS 9.1\sas.exe SET ProgPath=C:\Project\progs SET Prog2Start=%ProgPath%\%ProgName%.sas for /f "tokens=1-2,* delims= " %%p in ('echo %DATE%') do SET Rundated=%%q echo Rundated=%Rundated% for /f "tokens=1-3,* delims=/" %%f in ('echo %Rundated%') do SET Rundatet=%%f%%g%%h echo Rundatet=%Rundatet% SET day=%Rundatet:~2,2% SET year=%Rundatet:~4,4% SET Monthtemp=%Rundatet:~0,2% echo Monthtemp=%Monthtemp% if %Monthtemp%==01 set month=JAN if %Monthtemp%==02 set month=FEB

  • if %Monthtemp%==03 set month=MAR if %Monthtemp%==04 set month=APR if %Monthtemp%==05 set month=MAY if %Monthtemp%==06 set month=JUN if %Monthtemp%==07 set month=JUL if %Monthtemp%==08 set month=AUG if %Monthtemp%==09 set month=SEP if %Monthtemp%==10 set month=OCT if %Monthtemp%==11 set month=NOV if %Monthtemp%==12 set month=DEC SET Rundate=%day%%Month%%year% SET ProjPath=C:\Project SET Logpath=%ProjPath%\Log SET OutputPath=%ProjPath%\out SET Logfile=%Logpath%\%ProgName% %Rundate%.log SET Outputfile=%OutputPath%\%ProgName% %Rundate%.lst echo Command Executing is echo --------------------------------------------------- echo "%SASEXEpath%" -sysin "%Prog2Start%.sas" -print "%Outputfile%" -log "%Logfile%" -unbuflog -nodmsexp -nosplash echo --------------------------------------------------- "%SASEXEpath%" -icon -sysin "%Prog2Start%" -print "%Outputfile%" -log "%Logfile%" -unbuflog -nodmsexp -nosplash SET Capterr=%ERRORLEVEL% echo SAS Execution Complete with Exit code %Capterr%

  • goto TERM :CHKPARM echo Please check your parameter list echo Correct Usage: DQS AppName Environment FileDate Filename echo Eg: DQS CAHSEE Production 05/10/2009 CAHSEE.DAT goto TERM :TERM exit /b %Capterr%

  • Appendix # 5

    Log of the Aggregate process

    535 /* Macro to send SAS to a sleep mode for few seconds */ 536 %macro sleepsecs(sec); 537 %put;%put =============Macro: sleepsecs(&sec); 538 539 data _null_; 540 slept= sleep(&sec);/* argument to sleep must be expressed in seconds */ 541 run; 542 %mend sleepsecs; 543 544 /* Macro to generate a tasklist task1 task2 etc... */ 545 %macro tasklist(start,end); 546 %do i=&start %to &end.; 547 task&i. 548 %end; 549 %mend tasklist; 550 551 /* Macro to check if all the returncodes for the Child sessions is 0 or not*/ 552 %macro Conditions(start,end,rctype); 553 %put;%put =============Macro: Conditions(&start,&end,&rctype); 554 555 %do i=&start %to %eval(&end - 1); 556 &&&rc_&rctype.&i. eq 0 or 557 %end; 558 %mend Conditions; 559 560 /* Macro to Check the return codes of all tasks */ 561 %macro Checkreturncodes(start,end,rctype,cond); 562 %put;%put =============Macro: Checkreturncodes(&start,&end,&rctype,&cond); 563 564 %put;%put =============================================================================; 565 %put The Error codes for the Above tasks are: %printerrorcodes(&start,&end,&rctype); 566 %put =============================================================================;%put; 567 568 %if %Conditions(&start,&end,&rctype) &&&rc_&rctype.&end. eq 0 %then 569 %do;

  • 570 %let leftof=%scan(%sysfunc(compress(&cond,'"')),1,'='); 571 %let rightof=%scan(%sysfunc(compress(&cond,'"')),2,'='); 572 %global &leftof; 573 %let &leftof=&rightof; 574 %put Flag Set: &leftof=&rightof; 575 %put; 576 %end; 577 578 %mend Checkreturncodes; 579 /* Macro to print the error codes to the Log */ 580 %macro printerrorcodes(start,end,rctype); 581 %put;%put =============Macro: printerrorcodes(&start,&end,&rctype); 582 583 %do i=&start %to %eval(&end); 584 rc_&rctype.&i.=&&&rc_&rctype.&i. 585 %end; 586 %mend printerrorcodes; 587 588 /* Create a SAS session for each of the Filename Inputs */ 589 %macro initiateParallelProcessing; 590 %let start=1; 591 %let end=&progcnt; 592 593 options noxwait noxsync; 594 x cd "&BatchPath"; 595 596 %do i=&start. %to &end.; /* */ 597 /* Call the SASStart.bat with All the parameters in the Back ground mode with Systask 597! command */ 598 %put;%put *************************Task executing 598! task&i.********************************************; 599 %put systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.; 600 systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.; 601 %sleepsecs(3); /* wait for 3 secs atleast for the system to stabilize with the new 601! App*/ 602 %end; 603 604 waitfor _all_ %tasklist(&start.,&end.);; /* Wait for All of the tasks to complete 604! */ 605 %checkreturncodes(&start.,&end.,task,"flag_error=N");; /* Print All the Error 605! Codes to Log */

  • 606 607 %mend initiateParallelProcessing; 608 609 %macro splitmacroval(macvar,prefix,splitcnt,dlm=' '); 610 %global &splitcnt; /*create the global variable for storing the splitcount*/ 611 %let num=1; /*initialize a counter variable*/ 612 613 %global &prefix# /*create the global variable for storing the first split*/ 614 %let &prefix&num=%scan(&&&macvar,&num,&dlm); /*Scan the first value*/ 615 616 %do %while(&&&prefix&num ne ); 617 %let num=%eval(&num + 1);/*increment the counter*/ 618 %global &prefix# /*create the global variable*/ 619 %let &prefix&num=%scan(&&&macvar,&num,&dlm); /*scan the next value*/ 620 %end; 621 622 %let &splitcnt=%eval(&num - 1); /*Store the split count to the macro variable reqd*/ 623 %mend splitmacroval; 624 625 626 %let progPath=C:\Project\progs; /* Location of the Programs*/ 627 %let BatchPath=C:\Project\bat; /* Location of the Batch file*/ 628 %let prognames= prog1 prog2 prog3; /*name of the program names to execute*/ 629 %splitmacroval(prognames,progs,progcnt,dlm=' ');/* Read more about this macro at- 630 http://sastechies.blogspot.com/2009/11/sas-macro-to-split-macro-variables.html*/ 631 632 %let flag_error=Y; 633 %initiateParallelProcessing; /* Call the SAS Process to process those files */ *************************Task executing task1******************************************** systask command "SASStart.bat prog1" taskname="task1" status=rc_task1 =============Macro: sleepsecs(3) NOTE: Task "task1" produced no LOG/Output. NOTE: DATA statement used (Total process time): real time 3.06 seconds cpu time 0.04 seconds

  • *************************Task executing task2******************************************** systask command "SASStart.bat prog2" taskname="task2" status=rc_task2 =============Macro: sleepsecs(3) NOTE: Task "task2" produced no LOG/Output. NOTE: DATA statement used (Total process time): real time 3.06 seconds cpu time 0.04 seconds *************************Task executing task3******************************************** systask command "SASStart.bat prog3" taskname="task3" status=rc_task3 =============Macro: sleepsecs(3) NOTE: Task "task3" produced no LOG/Output. NOTE: DATA statement used (Total process time): real time 3.06 seconds cpu time 0.04 seconds =============Macro: Checkreturncodes(1,3,task,"flag_error=N") ============================================================================= =============Macro: printerrorcodes(1,3,task) The Error codes for the Above tasks are: rc_task1=0 rc_task2=0 rc_task3=0 ============================================================================= =============Macro: Conditions(1,3,task) Flag Set: flag_error=N 634 %put The value of flag_error=&flag_error; %put; The value of flag_error=N 635 636 %macro aggregatecode;

  • 637 %if &flag_error eq N %then 638 %do; 639 Libname datalib "C:\Project\data"; 640 data datalib.aggregatedata; 641 set datalib.data1 datalib.data2 datalib.data3; 642 run; 643 644 proc sort data=datalib.aggregatedata; 645 by name; 646 quit; 647 648 %end; 649 %else %do; %put Errors in Processing...Please check the Logs...; %end; 650 %mend aggregatecode; 651 652 %aggregatecode; NOTE: Libref DATALIB was successfully assigned as follows: Engine: V9 Physical Name: C:\Project\data NOTE: There were 7 observations read from the data set DATALIB.DATA1. NOTE: There were 2 observations read from the data set DATALIB.DATA2. NOTE: There were 2 observations read from the data set DATALIB.DATA3. NOTE: The data set DATALIB.AGGREGATEDATA has 11 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.04 seconds cpu time 0.00 seconds NOTE: There were 11 observations read from the data set DATALIB.AGGREGATEDATA. NOTE: The data set DATALIB.AGGREGATEDATA has 11 observations and 5 variables. NOTE: PROCEDURE SORT used (Total process time): real time 0.07 seconds cpu time 0.01 seconds

  • Appendix # 6

    Final Datasets

    Download all the code and data from

    https://drive.google.com/open?id=0ByjOA-zaHec0Y2FpbzN5eFpfM0U&authuser=0

    Read more @ http://sastechies.blogspot.com/

    https://drive.google.com/open?id=0ByjOA-zaHec0Y2FpbzN5eFpfM0U&authuser=0http://sastechies.blogspot.com/