Introduction to SAS: Lecture 2, Brian Healy


Page 1: Introduction to SAS

Introduction to SAS

Lecture 2

Brian Healy

Page 2: Introduction to SAS

Why use statistical packages

Built-in functions
Data manipulation
Updated often to include new applications
Different packages complete certain tasks more easily than others
Packages we will introduce
  – SAS
  – R (S-plus)

Page 3: Introduction to SAS

SAS

Easy to input and output data sets
Preferred for data manipulation
"proc" used to complete analyses with built-in functions
Macros used to build your own functions

Page 4: Introduction to SAS

Outline

SAS Structure
Efficient SAS Code for Large Files
SAS Macro Facility

Page 5: Introduction to SAS

Common errors

Missing semicolon
Misspelling
Unmatched quotes/comments
Mixed proc and data statements
Using wrong options

Page 6: Introduction to SAS

SAS Structure

Data Step: input, create, manipulate or output data
  – Always starts with a data line
  – Ex. data one;
Procedure Step: complete an operation on data
  – Always starts with a proc line
  – Ex. proc contents;
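As a minimal sketch of the two kinds of steps, the program below builds a small data set in a data step and then describes and prints it in two procedure steps. The data set name demo_steps and its values are made up for illustration.

```sas
/* Data step: create a data set named demo_steps (hypothetical name and values) */
data demo_steps;
    input id score;
    datalines;
1 90
2 85
3 77
;
run;

/* Procedure steps: each one operates on the data set created above */
proc contents data=demo_steps;
run;

proc print data=demo_steps;
run;
```

Each step ends at the run; statement, and every statement ends with a semicolon.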

Page 7: Introduction to SAS

SAS System Options

System options are global instructions that affect the entire SAS session and control the way SAS performs operations. SAS system options differ from SAS data set options and statement options in that once you invoke a system option, it remains in effect for all subsequent data and proc steps in a SAS job, unless you respecify it.

To view which options are available and in effect for your SAS session, use
    proc options; run;
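A short sketch of how a system option persists across steps. It uses sashelp.class, a sample data set that ships with SAS, only so the example runs without course files. The linesize values are arbitrary choices for illustration.

```sas
/* Show the current setting of a single system option */
proc options option=linesize;
run;

/* Change two options; the new values stay in effect for all later steps */
options linesize=80 nodate;

proc print data=sashelp.class (obs=5);
run;

/* Respecify the options to change them again */
options linesize=100 date;
```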

Page 8: Introduction to SAS

Log, output and procedure options

center controls whether SAS procedure output is centered. By default, output is centered. To specify not centered, use nocenter.

date prints the date and time to the log and output window. By default, the date and time are printed. To suppress the printing of the date, use nodate.

label allows SAS procedures to use labels with variables. By default, labels are permitted. To suppress the printing of labels, use nolabel.

notes controls whether notes are printed to the SAS log. By default, notes are printed. To suppress the printing of notes, use nonotes.

number controls whether page numbers are printed. By default, page numbers are printed. To suppress the printing of page numbers, use nonumber.

linesize= specifies the line size (printer line width) for the SAS log and the SAS procedure output file used by the data step and procedures.

pagesize= specifies the number of lines that can be printed per page of SAS output.

missing= specifies the character to be printed for missing numeric values.

formchar= specifies the list of graphics characters that define table boundaries.

Example:
    OPTIONS NOCENTER NODATE NONOTES LINESIZE=80 MISSING=. ;

Page 9: Introduction to SAS

SAS data set control options

SAS data set control options specify how SAS data sets are input, processed, and output.

firstobs= causes SAS to begin reading at a specified observation in a data set. The default is firstobs=1.

obs= specifies the last observation from a data set or the last record from a raw data file that SAS is to read. To return to using all observations in a data set, use obs=max.

replace specifies whether permanently stored SAS data sets are to be replaced. By default, the SAS system will overwrite existing SAS data sets if the SAS data set is re-specified in a data step. To suppress this option, use noreplace.

Example:
    OPTIONS OBS=100 NOREPLACE;
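The sketch below contrasts obs= as a system option, which applies to every later step, with firstobs= and obs= as data set options, which apply to one step only. It again uses the built-in sashelp.class sample data set so that it does not depend on course files.

```sas
/* System option: every later step reads at most 10 observations */
options obs=10;

proc means data=sashelp.class;
run;

/* Return to reading all observations */
options obs=max;

/* Data set options: limit a single step (reads observations 5 through 10) */
proc print data=sashelp.class (firstobs=5 obs=10);
run;
```

Note that obs= names the last observation to read, not a count, so firstobs=5 with obs=10 gives six observations.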

Page 10: Introduction to SAS

Error handling options

Error handling options specify how the SAS System reports on and recovers from error conditions.

errors= controls the maximum number of observations for which complete error messages are printed. The default is errors=20.

fmterr controls whether the SAS System generates an error message when the system cannot find a format to associate with a variable. SAS will generate an ERROR message for every unknown format it encounters and will terminate the SAS job without running any following data and proc steps. To read a SAS system data set without requiring a SAS format library, use nofmterr.

Example:
    OPTIONS ERRORS=100 NOFMTERR;

Page 11: Introduction to SAS

Statements for Reading Data

The data statement names the data set you are making
Can use any of the following commands to input data
  – infile: identifies an external raw data file to read with an INPUT statement
  – input: lists variable names in the input file
  – cards: indicates internal data
  – set: reads a SAS data set

Page 12: Introduction to SAS

Looking at the data

To look at the variables in a data set, use
    proc contents data=dataset;
    run;
To look at the actual data in the data set, use
    proc print data=dataset (obs=num);
        var varlist;
    run;

Page 13: Introduction to SAS

Example

    data treat;
        infile "g:\shared\BIO271\treat.dat";
        input id bpa bpb chola cholb;
    run;

    proc print data=treat (obs=10);
    run;

    proc contents data=treat;
    run;

Page 14: Introduction to SAS

Delimiter Option

The default delimiter is a blank space
The DELIMITER= option specifies that the INPUT statement use a character other than a blank as a delimiter for data values that are read with list input

Page 15: Introduction to SAS

Delimiter Example

Sometimes you want to input the data yourself
Try the following data step:
    data nums;
        infile datalines dsd delimiter='&';
        input X Y Z;
        datalines;
    1&2&3
    4&5&6
    7&8&9
    ;
Notice that there are no semicolons within the data lines; a single semicolon on its own line marks the end of the datalines

Page 16: Introduction to SAS

Cards

Another way to input data using the keyboard (and often a last resort if having problems inputting the data) is cards
Similar to datalines
    data score;
        input test1 test2 test3;
        cards;
    91 87 95
    97 . 92
    . 89 99
    ;
    run;

Page 17: Introduction to SAS

Inputting character variables

Sometimes your data will have characters
Example:
    data fam;
        input name$ age;
        cards;
    Brian 27
    Andrew 29
    Kate 24
    ;
    run;

    proc print data=fam;
    run;
What is different and what happens if you don't have the dollar sign?

Page 18: Introduction to SAS

Using the libname command

The final way we will show to input data applies when you already have a SAS data set: you can use a libname command
    libname summer "g:\shared\bio271";

    data treat2;
        set summer.treat2;
    run;
Look at the data set with proc print

Page 19: Introduction to SAS

Labeling variables

Variable label: Use the label statement in the data step to assign labels to the variables. You could also assign labels to variables in proc steps, but then the labels only exist for that step. When labels are assigned in the data step they are available for all procedures that use that data set.

Example:
    DATA labtreat;
        SET treat;
        LABEL id="patient id"
              bpa="BP on treatment A"
              bpb="BP on treatment B"
              cholA="Cholesterol on treatment A"
              cholB="Cholesterol on treatment B";
    RUN;

    PROC CONTENTS DATA=labtreat;
    RUN;

Page 20: Introduction to SAS

Try on your own

Make a data set with the following data, calling it redsox
    8, 58, 491, 163
    7, 50, 469, 133
    31, 107, 458, 136
    33, 111, 410, 117
Label the variables HR, RBI, AB, HITS
Use proc print to ensure that you have input the data correctly

Page 21: Introduction to SAS

Data Manipulations

One of the best parts of SAS is the ability to complete data manipulations
There are four major types of manipulations
  – Subset of data
      Drop / keep variables
      Drop observations
  – Concatenate data files
  – Merge data files
  – Create new variables

Page 22: Introduction to SAS

Drop / Keep

SAS easily allows you to make a data set with a subset of the variables
What do you think happens with this code?
    DATA redsox2;
        SET redsox;
        KEEP ba rbi;
    RUN;
How do you think you could use drop to do the same thing?

Page 23: Introduction to SAS

Dropping observations

We can also get a subset of the observations
Read in treat2 from the g: drive
This is helpful when we want to remove missing data
    DATA notreat2;
        SET treat2;
        IF cholA ^= . ;
    RUN;
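Two other ways to write the same kind of subset, as a sketch. It assumes treat2 has already been read in and has the same variables as treat (including cholB). The output data set names notreat2b and complete are made up for illustration.

```sas
/* Same result as a subsetting IF on cholA: delete rows where cholA is missing */
data notreat2b;
    set treat2;
    if cholA = . then delete;
run;

/* Keep only rows where both cholesterol values are present */
data complete;
    set treat2;
    if not missing(cholA) and not missing(cholB);
run;
```

The MISSING function returns 1 when its argument is missing, so it reads more clearly than a comparison with a period.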

Page 24: Introduction to SAS

Concatenating data files in SAS

SAS allows us to combine data sets by adding more observations, using
    data tottreat;
        set treat treat2;
    run;
Check that it worked using proc print
If a variable is called by a different name in each data set, you must use:
    data momdad;
        set dads(RENAME=(dadinc=inc))
            moms(RENAME=(mominc=inc));
    run;
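After concatenating, it is often useful to know which data set each observation came from. One way to sketch this is with the IN= data set option. The names tottreat_src, from_treat and source are hypothetical.

```sas
data tottreat_src;
    set treat (in=from_treat) treat2;
    /* from_treat is 1 for rows read from treat and 0 otherwise.   */
    /* IN= variables are temporary, so copy the value to keep it.  */
    if from_treat then source = 1;
    else source = 2;
run;

/* Count how many observations came from each data set */
proc freq data=tottreat_src;
    tables source;
run;
```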

Page 25: Introduction to SAS

Merge data files

SAS also allows us to add more variables by merging data files
The data set demo gives demographic information about the patients in treat
Read in demo
Now, use this code to combine the information
    data extratreat;
        merge treat demo;
        by id;
    run;
Note: each data set must already be sorted by the by variable (here id) to use this code
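A sketch of the full sequence, assuming treat and demo have both been read in and share the variable id: sort each data set by id first, then merge.

```sas
/* Sort both data sets by the merge key */
proc sort data=treat;
    by id;
run;

proc sort data=demo;
    by id;
run;

/* Match-merge: one output row per id, with variables from both data sets */
data extratreat;
    merge treat demo;
    by id;
run;
```

If either input is not sorted by id, the data step stops with an error saying the BY variables are not properly sorted.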

Page 26: Introduction to SAS

Making new variables

We can make new variables in a data step
Let's make a new variable in the redsox data set by finding batting average and a variable for hr30
    data redsox2;
        set redsox;
        ba=hits/ab;
        if hr>=30 then hr30=1;
        else hr30=0;
    run;
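An alternative way to build the same 0/1 indicator, shown as a sketch: in SAS a comparison evaluates to 1 when true and 0 when false, so it can be assigned directly. The data set name redsox2_alt is made up so that redsox2 is not overwritten.

```sas
data redsox2_alt;
    set redsox;
    ba = hits / ab;
    /* The comparison itself yields 1 (true) or 0 (false) */
    hr30 = (hr >= 30);
run;
```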

Page 27: Introduction to SAS

Try on your own

Make a new data set called redsox3 using the following data and combine it with redsox
    7, 51, 378, 113
    4, 41, 367, 99
    20, 58, 361, 109
Make a new variable in redsox3 that equals 1 if rbi is more than 100 and 0 if rbi is less than or equal to 100

Page 28: Introduction to SAS

Statements for Outputting Data

file: Specifies the current output file for PUT statements
put: Writes lines to the SAS log, to the SAS procedure output file, or to an external file that is specified in the most recent FILE statement.

Example:
    data _null_;
        set redsox;
        file 'p:\redsox.csv' delimiter=',' dsd;
        put hr rbi ab hits;
    run;
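The example writes only the data values. A common variation, sketched here, also writes a header line with the variable names before the first row. The output file name redsox_header.csv is hypothetical.

```sas
data _null_;
    set redsox;
    file 'p:\redsox_header.csv' delimiter=',' dsd;
    /* _n_ counts data step iterations, so this runs once, before the first row */
    if _n_ = 1 then put 'hr,rbi,ab,hits';
    put hr rbi ab hits;
run;
```

Using data _null_ runs the data step without creating a new SAS data set, which is what you want when the only goal is to write a file.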

Page 29: Introduction to SAS

Comparisons

The INFILE statement specifies the input file for any INPUT statements in the DATA step. The FILE statement specifies the output file for any PUT statements in the DATA step.

Both the FILE and INFILE statements allow you to use options that provide SAS with additional information about the external file being used.

An INFILE statement usually identifies data from an external file. A DATALINES statement indicates that data follow in the job stream. You can use the INFILE statement with the file specification DATALINES to take advantage of certain data-reading options that affect how the INPUT statement reads in-stream data.

Page 30: Introduction to SAS

Missing values

Missing numeric values in SAS are shown by a period (.)
As a general rule, SAS procedures that perform computations handle missing data by omitting the missing values, including proc means, proc freq, proc corr, and proc reg
Check SAS web page for more information

Page 31: Introduction to SAS

Missing values in logical statements

SAS treats a missing value as the smallest possible value (e.g., negative infinity) in logical statements.
    data times6;
        set times;
        if (var1 <= 1.5) then varc1 = 0;
        else varc1 = 1;
    run;

Output:
    Obs  id  var1  varc1
      1   1   1.5      0
      2   2     .      0
      3   3   2.1      1
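In the output, the observation with missing var1 is coded 0 because missing compares as smaller than 1.5. If that is not what you want, one way to guard against it is to test for missing first, as in this sketch. The data set name times6_checked is made up.

```sas
data times6_checked;
    set times;
    if missing(var1) then varc1 = .;        /* keep missing as missing */
    else if var1 <= 1.5 then varc1 = 0;
    else varc1 = 1;
run;
```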

Page 32: Introduction to SAS

Basic procs

proc print and proc contents: we have seen these
proc sort
proc means
proc univariate
proc plot

Page 33: Introduction to SAS

Options in most procs

var: lists the variables you want to perform the proc on
by: breaks the data into groups
where: limits the data set to a specific group of observations
output: allows you to output the results into a data set

Page 34: Introduction to SAS

Sort data

We can use proc sort to sort data
The code to complete this is
    proc sort data=extratreat; by gender; run;

    proc sort data=extratreat out=extreat; by gender; run;

    proc sort data=extratreat out=extreat2; by descending gender; run;

    proc sort data=extratreat out=extreat3 noduplicates; by gender; run;

Page 35: Introduction to SAS

proc means / univariate

The basic form of proc means is
    proc means data=extratreat;
        var ______;
        by _______;
        where _______;
        output out=stat mean=bpamean cholamean;
    run;
The basic form of proc univariate is the same, but much more information is given
It is helpful to use the output window to get the info you need
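One way the blanks might be filled in, as a sketch. It assumes extratreat contains gender, bpa and chola. The sorted copy bygender is a made-up name, and the where condition is only an illustration.

```sas
/* BY-group processing needs the data sorted by the BY variable */
proc sort data=extratreat out=bygender;
    by gender;
run;

proc means data=bygender;
    var bpa chola;              /* variables to summarize */
    by gender;                  /* one set of statistics per gender */
    where bpa ne .;             /* drop rows with missing bpa before summarizing */
    output out=stat mean=bpamean cholamean;
run;

/* The means are now stored in the data set stat */
proc print data=stat;
run;
```

The names after mean= are matched in order to the variables in the var statement, so bpamean holds the mean of bpa and cholamean the mean of chola.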

Page 36: Introduction to SAS

proc plot

To make different plots in SAS, you use proc plot
Scatterplot
    proc plot data=redsox;
        plot rbi*ab;
    run;
You can also make plots using
    proc univariate data=redsox plot;
        var rbi;
    run;

Page 37: Introduction to SAS

Try on your own

Find the mean blood pressure on treatment A in women
Make a scatterplot of blood pressure on treatment B versus blood pressure on treatment A in men
Find the median number of home runs hit by the Red Sox

Page 38: Introduction to SAS

SAS Macro

Macros are the SAS method of making functions
Avoid repetitious SAS code
Create generalizable and flexible SAS code
Pass information from one part of a SAS job to another
Conditionally execute data steps and PROCs

Page 39: Introduction to SAS

SAS Macro Facility

SAS macro variable
SAS Macro
There are many discussions of macro variables on the web; one good one is given here: http://www2.sas.com/proceedings/sugi30/130-30.pdf

Page 40: Introduction to SAS

SAS Macro Delimiters

Two delimiters will trigger the macro processor in a SAS program.

&macro-variable
This refers to a macro variable. The current value of the variable will replace &macro-variable.

%macro-name
This refers to a macro, which consists of one or more complete SAS statements, or even whole data or proc steps.

Page 41: Introduction to SAS

SAS Macro Variables

SAS Macro variables can be defined and used anywhere in a SAS program, except in data lines. They are independent of a SAS dataset.

Page 42: Introduction to SAS

SAS Macro Variables

%LET: assigns text to a macro variable
    %LET macrovar = value;
  1. macrovar is the name of a global macro variable
  2. value is the macro variable value: a character string without quotation marks, or a macro expression

%PUT: displays macro variable values as text in the SAS log
    %put _all_;
    %put _user_;

&macrovar: substitutes the value of a macro variable in a program
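The three pieces can be put together in a short sketch. It assumes the data set treat from the earlier slides exists. The title text is made up for illustration.

```sas
/* Assign text to a macro variable */
%let int = treat;

/* Write its value to the log */
%put The macro variable int has the value &int;

/* List all user-defined macro variables in the log */
%put _user_;

/* Macro variables resolve inside double quotes but not inside single quotes */
title "Summary of data set &int";
proc means data=&int;
run;
title;   /* clear the title */
```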

Page 43: Introduction to SAS

SAS Macro Variables

Here is an example of how to use a macro variable:
    %let int=treat;
    proc means data=&int;
    run;
Now we can rerun the code simply by changing the value of the macro variable, without altering the rest of the code.
    %let int=redsox;
    proc means data=&int;
    run;
This is extremely helpful when you have a large amount of code you want to reference

Page 44: Introduction to SAS

Create SAS Macro

Definition:
    %MACRO macro-name(parm1, parm2, …, parmk);
        Macro definition (&parm1, &parm2, …, &parmk)
    %MEND macro-name;

Application:
    %macro-name(values of parm1, parm2, …, parmk);
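A concrete instance of this template, as a sketch. The macro name summarize and its parameters ds and vars are made up. It assumes the data sets treat and redsox from earlier slides exist.

```sas
/* Definition: ds is a data set name, vars is a list of variables */
%macro summarize(ds, vars);
    proc means data=&ds;
        var &vars;
    run;
%mend summarize;

/* MPRINT writes the SAS code generated by the macro to the log */
options mprint;

/* Application: the same code runs on two different data sets */
%summarize(treat, bpa bpb)
%summarize(redsox, hr rbi)
```

Turning on mprint is a convenient way to see exactly which statements a macro call produced when something goes wrong.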

Page 45: Introduction to SAS

SAS Macro Example

Import Excel to SAS Datasets by a Macro
    %macro excelsas(in, out);
        proc import out=work.&out
            datafile="g:\shared\bio271\&in"
            dbms=excel replace;
            getnames=yes;
        run;
    %mend excelsas;

    %excelsas(practice.xls, test)
Use proc print to ensure that you have the data input properly

Page 46: Introduction to SAS

What is this macro doing?

    %let int=treat;
    %let dop=%str(id bpa);

    %macro happy;
        data new;
            set &int;
            drop &dop;
        run;

        proc means data=new;
        run;
    %mend happy;

    %happy

Page 47: Introduction to SAS

In class practice

Use the auto data from the g: drive
Read data into SAS (variables: id, weight, mpg, foreign)
Create a new variable for better than 20 mpg
Get means/frequencies for weight and mpg for foreign and domestic vehicles
Are there any missing values?
Write a macro to sort a data set by a variable and then print the first 10 observations (use macro variables)