Presentation and Data

http://www.lisa.stat.vt.edu

Short Courses

Intro to SAS

Download Data to Desktop



Mark Seiss, Dept. of Statistics

Introduction to SAS

November 8 and 9, 2010

Reference Material
• The Little SAS Book – Delwiche and Slaughter
• SAS Programming I: Essentials
• SAS Programming II: Manipulating Data with the DATA Step
• Presentation and Data: http://www.lisa.stat.vt.edu


Presentation Outline

1. Introduction to the SAS Environment

2. Working With SAS Data Sets

3. Summary Procedures

4. Basic Statistical Analysis Procedures


Presentation Outline

Questions/Comments

Individual Goals/Interests

Introduction to the SAS Environment
1. SAS Programs
2. SAS Data Sets and Data Libraries
3. SAS System Help
4. Creating SAS Data Sets

SAS Programs
• File extension - .sas
• Editor window has four uses:
    • Access and edit existing SAS programs
    • Write new SAS programs
    • Submit SAS programs for execution
    • Save SAS programs
• SAS program – sequence of steps that the user submits for execution
• Submitting SAS programs
    • Entire program
    • Selection of the program

SAS Programs
• Syntax rules for SAS statements
    • Free-format – can use upper or lower case
    • Usually begin with an identifying keyword
    • Can span multiple lines
    • Always end with a semicolon
    • Multiple statements can be on the same line
• Errors
    • Misspelled keywords
    • Missing or invalid punctuation (a missing semicolon is common)
    • Invalid options
    • Indicated in the Log window

SAS Programs
• 2 basic steps in SAS programs:
    • DATA steps
        • Typically used to create SAS data sets and manipulate data
        • Begin with a DATA statement
    • PROC steps
        • Typically used to process SAS data sets
        • Begin with a PROC statement
• The end of a DATA or PROC step is indicated by:
    • RUN statement – most steps
    • QUIT statement – some steps
    • Beginning of another step (DATA or PROC statement)
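To make the two kinds of steps concrete, here is a minimal sketch of a program with one DATA step followed by one PROC step. The data set name trips and its values are made up for illustration and are not part of the course data.

```sas
/* DATA step: creates the temporary data set work.trips from in-line data */
data trips;
    input city $ distance time;
    speed = distance / time;    /* computed for every row that is read */
    datalines;
Roanoke 120 2
Richmond 210 3.5
;
run;

/* PROC step: processes the data set created above */
proc print data=trips;
run;
```

Each step here ends with an explicit RUN statement, although the PROC statement alone would also end the DATA step.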

SAS Programs
• Output generated from a SAS program – 2 windows
    • SAS log
        • Information about the processing of the SAS program
        • Includes any warnings or error messages
        • Accumulated in the order the data and procedure steps are submitted
    • SAS output
        • Reports generated by the SAS procedures
        • Accumulates output in the order it is generated

SAS Data Sets and Data Libraries
• SAS Data Set
    • Specifically structured file that contains data values
    • File extension - .sas7bdat
    • Rows and columns format – similar to Excel
        • Columns – variables in the table corresponding to fields of data
        • Rows – single record or observation
    • Two types of variables
        • Character – contain any value (letters, numbers, symbols, etc.)
        • Numeric – floating point numbers
    • Located in SAS Data Libraries

SAS Data Sets and Data Libraries
• SAS Data Libraries
    • Contain SAS data sets
    • Identified by assigning a library reference name – libref
    • Temporary
        • Work library
        • SAS data files are deleted when session ends
        • Library reference name not necessary
    • Permanent
        • SAS data sets are saved after session ends
        • SASUSER library
        • You can create and access your own libraries

SAS Data Sets and Data Libraries
• SAS Data Libraries cont.
    • Assigning library references
        • Syntax
              LIBNAME libref 'SAS-data-library';
        • Rules for library references
            • 8 characters or fewer
            • Must begin with a letter or underscore
            • Other characters are letters, numbers, or underscores

SAS Data Sets and Data Libraries
• SAS Data Libraries cont.
    • Identifying SAS data sets within SAS Data Libraries
          libref.filename
    • Accessing SAS data sets within SAS Data Libraries
      Example:
          DATA new_data_set;
              set libref.filename;
          run;
    • Creating SAS data sets within SAS Data Libraries
      Example:
          DATA libref.filename;
              set old_data_set;
          run;
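A short sketch of the library syntax in use. The libref mylib and the folder path are assumptions chosen for illustration; sashelp.class is a sample data set that ships with SAS.

```sas
/* Assumed folder: replace with a directory that exists on your machine */
libname mylib 'C:\sas_course\data';

/* Create a permanent data set: written to the folder as class_copy.sas7bdat */
data mylib.class_copy;
    set sashelp.class;
run;

/* Access the permanent data set and copy it into the temporary Work library */
data class_temp;
    set mylib.class_copy;
run;
```

Because class_temp has no libref, it lands in Work and disappears when the session ends, while mylib.class_copy remains on disk.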

SAS System Help
• SAS Help and Documentation
    • Help → SAS Help and Documentation
    • Red book icon
• SAS Online Help
    • http://support.sas.com/

Creating SAS Data Sets
• Creating SAS data sets from raw data
    • 4 methods
        1. Importing existing data sets using Import menu option
        2. Importing existing raw data in SAS program
        3. Manually entering raw data in SAS program
        4. Manually entering raw data using Table Editor

Creating SAS Data Sets
• Using the Import Data menu option
    1. File → Import Data
    2. Standard data source → select the file format
    3. Specify file location or Browse to select file
    4. Create name for the new SAS data set and specify location

Creating SAS Data Sets
• Compatible file formats
    • Microsoft Excel Spreadsheets
    • Microsoft Access Databases
    • Comma-Separated Files (.csv)
    • Tab-Delimited Files (.txt)
    • dBASE Files (.dbf)
    • JMP data sets
    • SPSS Files
    • Lotus Spreadsheets
    • Stata Files
    • Paradox Files

Creating SAS Data Sets
• Example Data Sets
    • Excel file – State_SAT_data.xls
        • http://www.stat.ucla.edu/labs/datasets/sat.dat
        • Extracted from 1997 Digest of Education Statistics, an annual publication of the U.S. Department of Education
        • Contains variables that show the relationship between public school expenditure and SAT performance
        • Variables:
            – State (state)
            – Current expenditure per pupil (expend)
            – Average pupil to teacher ratio (PT_ratio)
            – Estimated annual salary of teachers (salary)
            – Percentage of eligible students taking the SAT (students)
            – Average verbal SAT score (verbal)
            – Average math SAT score (math)
            – Average total score (total)

Creating SAS Data Sets
• Example Data Sets cont.
    • Text file – State_region_data.txt
        • Contains region assignments for each state
            • 1 = New England
            • 2 = Middle Atlantic
            • 3 = East North Central
            • 4 = West North Central
            • 5 = South Atlantic
            • 6 = East South Central
            • 7 = West South Central
            • 8 = Mountain
            • 9 = Pacific

Creating SAS Data Sets
Import State_SAT_data.xls
    Assign as work.state_sat_data.sas7bdat
Import State_region_data.txt
    Assign as work.state_region_data.sas7bdat
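The slides perform these imports through the Import Data menu. A programmatic sketch of the same two imports with PROC IMPORT might look like the following. The folder path is hypothetical, and the sketch assumes that the text file is tab-delimited, that the first row of each file holds variable names, and that SAS/ACCESS Interface to PC Files is available for the Excel import.

```sas
/* Assumed location of the downloaded files; adjust to your own Desktop path */
%let course_dir = C:\Users\me\Desktop;

/* Excel workbook (DBMS=XLS assumes SAS/ACCESS Interface to PC Files) */
proc import datafile="&course_dir\State_SAT_data.xls"
            out=work.state_sat_data
            dbms=xls
            replace;
    getnames=yes;   /* assumes the first row holds the variable names */
run;

/* Text file, assumed to be tab-delimited */
proc import datafile="&course_dir\State_region_data.txt"
            out=work.state_region_data
            dbms=tab
            replace;
    getnames=yes;   /* assumes the first row holds the variable names */
run;
```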

Introduction to the SAS Environment

Questions/Comments

Working With SAS Data Sets
1. Data Set Information
2. Data Set Manipulation
3. Data Set Processing
4. Combining Data Sets
    A. Concatenating/Appending
    B. Merging
5. Saving Data Sets

Data Set Information
• PROC CONTENTS
    • Output contains a table of contents of the specified data set
    • Data set information
        • Data set name
        • Number of observations
        • Number of variables
    • Variable information
        • Type (numeric or character)
        • Length
    • Syntax:
          PROC CONTENTS DATA=input_data_set;
          RUN;

Data Set Information
• Assignment

Obtain Data Set Information for work.state_sat_data and work.state_region_data

Data Set Information
• Solution

proc contents data=state_sat_data;

run;

proc contents data=state_region_data;

run;

Data Set Manipulation
• Create a new SAS data set using an existing SAS data set as input
• Specify name of the new SAS data set after the DATA statement
• Use SET statement to identify SAS data set being read
• Syntax:
      DATA output_data_set;
          SET input_data_set;
          <additional SAS statements>;
      RUN;
• By default the SET statement reads all observations and variables from the input data set into the output data set.

Data Set Manipulation
• Assignment Statements
    • Evaluate an expression
    • Assign resulting value to a variable
    • General Form: variable = expression;
    • Example: miles_per_hour = distance/time;
• SAS Functions
    • Perform arithmetic functions, compute simple statistics, manipulate dates, etc.
    • General Form: variable=function_name(argument1, argument2, …);
    • Example: Time_worked = sum(Day1, Day2, Day3, Day4, Day5);
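The two slide examples can be tried in a small self-contained DATA step. The data set work_hours and its values are invented for illustration.

```sas
data work_hours;
    input employee $ day1-day5 distance time;
    /* Assignment statement: evaluate the expression, store the result */
    miles_per_hour = distance / time;
    /* SAS function: SUM skips missing arguments instead of returning missing */
    time_worked = sum(day1, day2, day3, day4, day5);
    datalines;
Ann 8 8 7 . 8 120 2
Bob 6 8 8 8 8 90 1.5
;
run;

proc print data=work_hours;
run;
```

Ann has a missing value for day4 (the period in the data line). Writing day1+day2+day3+day4+day5 would give a missing total for her, whereas the SUM function adds the non-missing days.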

Data Set Manipulation
• Selecting Variables
    • Use DROP and KEEP to determine which variables are written to the new SAS data set.
    • 2 ways
        • DROP and KEEP as statements
            – Form: DROP Variable1 Variable2;
                    KEEP Variable3 Variable4 Variable5;
        • DROP and KEEP options in SET statement
            – Form: SET input_data_set (KEEP=Var1);
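A minimal sketch showing both forms in one step, using the sashelp.class sample data set that ships with SAS (not part of the course data). The new variable height_cm is made up for the example.

```sas
data class_small;
    /* KEEP= option: only these three variables are read from the input */
    set sashelp.class (keep=name age height);
    height_cm = height * 2.54;
    /* DROP statement: height is available above but is not written out */
    drop height;
run;
```

The resulting data set contains name, age, and height_cm.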

Data Set Manipulation
• Conditional Processing
    • Uses IF-THEN-ELSE logic
    • General Form:
          IF <expression1> THEN <statement>;
          ELSE IF <expression2> THEN <statement>;
          ELSE <statement>;
    • <expression> is a true/false statement, such as:
        • Day1=Day2, Day1 > Day2, Day1 < Day2
        • Day1+Day2=10
        • Sum(day1,day2)=10
        • Day1=5 and Day2=5

Data Set Manipulation
• Conditional Processing

Symbolic    Mnemonic          Example
=           EQ                IF region = 'Spain';
~= or ^=    NE                IF region ne 'Spain';
>           GT                IF rainfall > 20;
<           LT                IF rainfall lt 20;
>=          GE                IF rainfall ge 20;
<=          LE                IF rainfall <= 20;
&           AND               IF rainfall ge 20 & temp < 90;
| or !      OR                IF rainfall ge 20 OR temp < 90;
            IN                IF region IN ('Rain', 'Spain', 'Plain');
            IS NOT MISSING    WHERE region IS NOT MISSING;
            BETWEEN AND       WHERE region BETWEEN 'Plain' AND 'Spain';
            CONTAINS          WHERE region CONTAINS 'ain';

• Note: IS NOT MISSING, BETWEEN-AND, and CONTAINS are valid only in WHERE expressions (the WHERE statement or the WHERE= data set option), not in IF statements.

Data Set Manipulation
• Conditional Processing cont.
    • If <expression1> is true, <statement> is processed
    • ELSE IF and ELSE are only processed if <expression1> is false
    • Only one statement can follow THEN or ELSE in this form
    • Use DO and END statements to execute a group of statements
    • General Form:
          IF <expression> THEN DO;
              <statements>;
          END;
          ELSE DO;
              <statements>;
          END;
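A small sketch of DO/END grouping, again using the sashelp.class sample data set. The variables group and older_ind and the age cutoff of 14 are invented for the example.

```sas
data class_groups;
    set sashelp.class;
    length group $ 8;            /* make room for the longer value 'younger' */
    if age >= 14 then do;        /* both assignments run when the test is true */
        group = 'older';
        older_ind = 1;
    end;
    else do;
        group = 'younger';
        older_ind = 0;
    end;
run;
```

Without the LENGTH statement, group would take its length from the first assignment ('older', 5 characters) and 'younger' would be truncated.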

Data Set Manipulation
• Subsetting Rows (Observations)
    • We will look at two ways
        • Using IF statement
        • Using WHERE option in SET statement
    • IF statement
        • Only writes observations to the new data set in which an expression is true
        • General Form: IF <expression>;
        • Example: IF career = 'Teacher';
                   IF sex ne 'M';
        • In the second example, only observations where sex is not equal to 'M' will be written to the output data set

Data Set Manipulation
• Subsetting Rows (Observations) cont.
    • WHERE option in SET statement
        • Use option to only read rows from the input data set in which the expression is true
        • General Form: SET input_data_set (where=(<expression>));
        • Example: SET vacation (where=(destination='Bermuda'));
        • Only observations where the destination equals 'Bermuda' will be read from the input data set
    • Comparison
        • Resulting output data set is equivalent
        • IF statement – all rows read from the input data set
        • WHERE option – only rows where expression is true are read from input data set
        • Difference in processing time when working with big data sets
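The comparison can be checked with a pair of DATA steps on the sashelp.class sample data set (an assumption for illustration; it is not one of the course files).

```sas
/* Subsetting IF: every row is read, then rows failing the test are discarded */
data girls_if;
    set sashelp.class;
    if sex = 'F';
run;

/* WHERE= option: only rows passing the test are read into the DATA step */
data girls_where;
    set sashelp.class (where=(sex = 'F'));
run;
```

Both output data sets hold the same rows. The difference shows up in the log notes, which report how many observations each step read from the input data set.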

Data Set Manipulation
• Assignments
1. Create new dataset work.state_SAT_data2 from work.state_SAT_data
    • Assign new variable upper_ind
    • If total > 1000 then upper_ind=1
    • Otherwise upper_ind=0
2. Create new dataset work.south from work.state_region_data
    • Specify work.south contains only records from regions 5, 6, or 7
    • Specify work.south only contains the state variable

Data Set Manipulation
• Solutions

1. data state_sat_data2;

set state_sat_data;

if total>1000 then upper_ind=1;

else upper_ind=0;

run;

Data Set Manipulation
• Solutions

2. data south;

set state_region_data;

if region=5 or region=6 or region=7;

keep state;

run;

OR

data south;

set state_region_data(where=(region=5 or region=6 or region=7));

keep state;

run;

Data Set Manipulation
• PROC SORT sorts data according to specified variables
• General Form:
      PROC SORT DATA=input_data_set <options>;
          BY Variable1 Variable2;
      RUN;
• Sorts data according to Variable1 and then Variable2
• By default, SAS sorts data in ascending order
    • Numbers low to high
    • A to Z
• Use the DESCENDING keyword before a BY variable for numbers high to low and letters Z to A
    • BY City DESCENDING Population;
    • SAS sorts data first by City A to Z and then by Population high to low

Data Set Manipulation
• Some Options
    • NODUPKEY
        • Eliminates observations that have the same values for the BY variables
    • OUT=output_data_set
        • By default, PROC SORT replaces the input data set with the sorted data set
        • Using this option, PROC SORT creates a newly sorted data set and the input data set remains unchanged
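A brief sketch of the sort options, using the sashelp.class sample data set as a stand-in for real data. The output data set names are made up.

```sas
/* Sort by sex A to Z, then by age high to low; OUT= leaves the input untouched */
proc sort data=sashelp.class out=class_sorted;
    by sex descending age;
run;

/* NODUPKEY keeps one observation per distinct value of the BY variable */
proc sort data=sashelp.class out=one_per_age nodupkey;
    by age;
run;
```

OUT= is needed here in any case, since the SASHELP library is normally read-only and cannot be sorted in place.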

Data Set Processing
• DATA steps read in data from existing data sets or raw data files one row at a time, like a loop
• DATA step reads data from the input data set in the following way:
    1. Read in current row from input data set to Program Data Vector (PDV)
    2. Process SAS statements
    3. Write PDV to output data set
    4. Set current row to the next row in the input data set
    5. Return to Step 1
• One row at a time is processed
• Thus we cannot simply add the value of a variable in one row to the value in another row
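One way to watch this implicit loop is to write a line to the log on every pass. The sketch below uses the automatic variable _N_, which counts DATA step iterations, and the sashelp.class sample data set. DATA _NULL_ runs the step without creating an output data set.

```sas
data _null_;
    set sashelp.class;                       /* step 1: read one row into the PDV */
    put 'Iteration ' _n_ ': read ' name=;    /* step 2: runs once per row */
run;
```

The log shows one line per input row, which illustrates that the statements see a single observation at a time.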

Combining Data Sets
• Concatenating (or Appending)
    • Stacks each data set upon the other
    • If one data set does not have a variable that the other data sets do, the variable in the new data set is set to missing for the observations from that data set.
    • General Form:
          DATA output_data_set;
              SET data1 data2;
          run;
    • PROC APPEND may also be used
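A tiny example of the missing-value behavior, with two made-up data sets a and b that share only the variable id.

```sas
data a;
    input id x;
    datalines;
1 10
2 20
;
run;

data b;
    input id y;
    datalines;
3 300
;
run;

/* Stack a on top of b: x is missing for the row from b, y for the rows from a */
data stacked;
    set a b;
run;

proc print data=stacked;
run;
```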

Combining Data Sets
• Merging Data Sets
    • One-to-One Match Merge
        • A single record in a data set corresponds to a single record in all other data sets
        • Example: Patient and Billing Information
    • One-to-Many Match Merge
        • Matching one observation from one data set to multiple observations in other data sets
        • Example: County and State Information
    • Note: Data must be sorted by the matching (BY) variables before merging can be done (PROC SORT)

Combining Data Sets
• One-to-One Match Merge
    • Usually need at least one common variable between data sets – matching purposes
    • For the example, a patient ID would be needed
    • Do not need common variable if all data sets are in exactly the same order
    • General Form:
          DATA output_data_set;
              MERGE input_data_set1 input_data_set2;
              BY variable1 variable2;
          RUN;

Combining Data Sets
• One-to-One Match Merge
    • Example:

      Performance
      Month   Sales
      1       8223
      2       6034
      3       4220

      Goals
      Month   Goal
      1       9000
      2       6000
      3       5000

      Code:
          DATA compare;
              MERGE performance goals;
              BY month;
              difference=sales-goal;
          RUN;

Combining Data Sets
• One-to-One Match Merge
    • Example cont.:

      Compare
      Month   Sales   Goal   Difference
      1       8223    9000   -777
      2       6034    6000   34
      3       4220    5000   -780
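For readers who want to run this example, the sketch below enters the Performance and Goals values from the slides as in-line data and then applies the merge code. Entering the data with DATALINES is an assumption; the slides do not say how the two data sets were created.

```sas
data performance;
    input month sales;
    datalines;
1 8223
2 6034
3 4220
;
run;

data goals;
    input month goal;
    datalines;
1 9000
2 6000
3 5000
;
run;

/* Both inputs are already in month order, so no PROC SORT is needed here */
data compare;
    merge performance goals;
    by month;
    difference = sales - goal;
run;

proc print data=compare;
run;
```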

Combining Data Sets
• One-to-Many Match Merge
    • Requires at least one common variable in the data sets for matching purposes
    • For the example, State information is in both the state and county files
    • If two data sets have variables with the same name, the variables in the second data set will overwrite the variables in the first.
    • General Form:
          DATA output_data_set;
              MERGE Data1 Data2 Data3;
              BY Variable1 Variable2;
          RUN;

Combining Data Sets
• One-to-Many Match Merge
    • Example:

      Videos
      Category   Sales
      Aerobics   12.99
      Aerobics   13.99
      Aerobics   13.99
      Step       12.99
      Step       12.99
      Weights    15.99

      Adjustment
      Category   Adjustment
      Aerobics   .20
      Step       .30
      Weights    .25

      Code:
          DATA prices;
              MERGE videos adjustment;
              BY category;
              NewPrice=(1-adjustment)*sales;
          RUN;

Combining Data Sets
• One-to-Many Match Merge
    • Example cont.:

      Prices
      Category   Sales   Adjustment   NewPrice
      Aerobics   12.99   .20          10.39
      Aerobics   13.99   .20          11.19
      Aerobics   13.99   .20          11.19
      Step       12.99   .30          9.09
      Step       12.99   .30          9.09
      Weights    15.99   .25          11.99

Combining Data Sets
• Assignment
    • Create the dataset work.state_data
    • Merge work.state_sat_data2 with work.state_region_data by the state variable

Combining Data Sets
• Solution

proc sort data=state_sat_data2;

by state;

run;

proc sort data=state_region_data;

by state;

run;

data state_data;

merge state_sat_data2 state_region_data;

by state;

run;

Saving Data Sets
• Save as SAS data set (.sas7bdat)
      LIBNAME libref "destination folder";
      DATA libref.filename;
          SET current_name;
          optional commands;
      RUN;
• Other formats
    1. File → Export Data
    2. Specify SAS data set
    3. Standard data source → select the file format
    4. Specify file folder and filename


Working With SAS Data Sets

Questions/Comments

Summary Procedures
1. Print Procedure
2. Plot Procedure
3. Univariate Procedure
4. Means Procedure
5. Freq Procedure

Print Procedure
• PROC PRINT is used to print data to the output window
• By default, prints all observations and variables in the SAS data set
• General Form:
      PROC PRINT DATA=input_data_set <options>;
          <optional SAS statements>;
      RUN;
• Some options
    • input_data_set (obs=n) - specifies the number of observations to be printed in the output
    • NOOBS - suppresses printing of observation numbers
    • LABEL - prints the labels instead of variable names

Print Procedure
• Optional SAS statements
    • BY variable1 variable2 variable3;
        • Starts a new section of output for every new value of the BY variables
    • ID variable1 variable2 variable3;
        • Prints ID variables on the left hand side of the page and suppresses the printing of the observation numbers
    • SUM variable1 variable2 variable3;
        • Prints sum of listed variables at the bottom of the output
    • VAR variable1 variable2 variable3;
        • Prints only listed variables in the output
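A sketch that combines the four optional statements, using the sashelp.class sample data set rather than the course data. The sorted copy class_by_sex is a made-up name.

```sas
/* BY-group output requires the data to be sorted by the BY variable */
proc sort data=sashelp.class out=class_by_sex;
    by sex;
run;

proc print data=class_by_sex;
    by sex;                  /* new section of output for each value of sex */
    id name;                 /* name replaces the observation number column */
    var age height weight;   /* only these variables are printed */
    sum weight;              /* totals of weight are added to the output */
run;
```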

Print Procedure
• Assignment

Use PROC PRINT to print out the state variable separately for each region

Note: All procedures for the remainder of the course will be run on the data set work.state_data.

Print Procedure
• Solution

proc sort data=state_data;

by region;

run;

proc print data=state_data;

var state;

by region;

run;

Plot Procedure
• Used to create basic scatter plots of the data
• Use PROC GPLOT or PROC SGPLOT for more sophisticated plots
• General Form:
      PROC PLOT DATA=input_data_set;
          PLOT vertical_variable * horizontal_variable/<options>;
      RUN;
• By default, SAS uses letters to mark points on plots
    • A for a single observation, B for two observations at the same point, etc.
• To specify a different character to represent a point
    • PLOT vertical_variable * horizontal_variable = '*';

Plot Procedure
• To specify a third variable to use to mark points
    • PLOT vertical_variable * horizontal_variable = third_variable;
• To plot more than one variable on the vertical axis
    • PLOT vertical_variable1 * horizontal_variable='2'
           vertical_variable2 * horizontal_variable='1'/OVERLAY;

Plot Procedure
• Assignment
    • Use the PLOT procedure to plot SAT Verbal scores versus SAT Math scores
    • Use the value of the region variable to mark points

Plot Procedure
• Solution

proc plot data=state_data;

plot math*verbal=region;

run;

Univariate Procedure
• PROC UNIVARIATE is used to examine the distribution of data
• Produces summary statistics for a single variable
    • Includes mean, median, mode, standard deviation, skewness, kurtosis, quantiles, etc.
• General Form:
      PROC UNIVARIATE DATA=input_data_set <options>;
          VAR variable1 variable2 variable3;
      RUN;
• If the VAR statement is not used, summary statistics will be produced for all numeric variables in the input data set.

Univariate Procedure
• Options include:
    • PLOT – produces stem-and-leaf plot, box plot, and normal probability plot
    • NORMAL – produces tests of normality

Univariate Procedure
• Assignment

Use PROC UNIVARIATE to produce a normal probability plot and test the normality of the SAT Total variable and Expenditure variable

Univariate Procedure
• Solution

proc univariate data=state_data normal plot;

var expend total;

run;

Means Procedure
• Similar to the Univariate procedure
• General Form:
      PROC MEANS DATA=input_data_set <options>;
          <optional SAS statements>;
      RUN;
• With no options or optional SAS statements, the Means procedure will print out the number of non-missing values, mean, standard deviation, minimum, and maximum for all numeric variables in the input data set

Means Procedure
• Options
    • Statistics available

Keyword           Statistic                      Keyword     Statistic
CLM               Two-Sided Confidence Limits    RANGE       Range
CSS               Corrected Sum of Squares       SKEWNESS    Skewness
CV                Coefficient of Variation       STDDEV      Standard Deviation
KURTOSIS          Kurtosis                       STDERR      Standard Error of Mean
LCLM              Lower Confidence Limit         SUM         Sum
MAX               Maximum Value                  SUMWGT      Sum of Weight Variables
MEAN              Mean                           UCLM        Upper Confidence Limit
MIN               Minimum Value                  USS         Uncorrected Sum of Squares
N                 Number Non-missing Values      VAR         Variance
NMISS             Number Missing Values          PROBT       Probability for Student's t
MEDIAN (or P50)   Median                         T           Student's t
Q1 (P25)          25% Quantile                   Q3 (P75)    75% Quantile
P1                1% Quantile                    P5          5% Quantile
P10               10% Quantile                   P90         90% Quantile
P95               95% Quantile                   P99         99% Quantile

    • Note: The default confidence level for confidence limits is 95% (ALPHA=0.05). Use the ALPHA= option to specify a different level.

Means Procedure
• Optional SAS Statements
    • VAR Variable1 Variable2;
        • Specifies which numeric variables statistics will be produced for
    • BY Variable1 Variable2;
        • Calculates statistics for each combination of the BY variables
    • OUTPUT OUT=output_data_set;
        • Creates data set with the default statistics
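A sketch that puts the statistic keywords and the optional statements together, using the sashelp.class sample data set. The 90% confidence level and the names class_by_sex and class_stats are arbitrary choices for illustration.

```sas
/* BY processing requires the data to be sorted by the BY variable */
proc sort data=sashelp.class out=class_by_sex;
    by sex;
run;

/* Request specific statistics; ALPHA=0.10 gives 90% confidence limits */
proc means data=class_by_sex n mean stddev clm alpha=0.10;
    var height weight;
    by sex;
    output out=class_stats;   /* default statistics, one set per BY group */
run;
```

The keywords on the PROC MEANS statement control only the printed report; the OUTPUT statement without statistic keywords stores the default statistics.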

Means Procedure
• Assignment

Use PROC MEANS to calculate the mean and variance of the expenditure variable for each region

Means Procedure
• Solution

proc sort data=state_data;

by region;

run;

proc means data=state_data mean var;

var expend;

by region;

run;

FREQ Procedure
• PROC FREQ is used to generate frequency tables
• Most common usage is to create a table showing the distribution of categorical variables
• General Form:
      PROC FREQ DATA=input_data_set;
          TABLE variable1*variable2*variable3/<options>;
      RUN;
• Options
    • LIST – prints cross tabulations in list format rather than grid
    • MISSING – specifies that missing values should be included in the tabulations
    • OUT=output_data_set – creates a data set containing frequencies, list format
    • NOPRINT – suppresses printing in the output window
• Use BY statement to get percentages within each category of a variable
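A short sketch of a two-way table with several of these options, using the sashelp.class sample data set. The output data set name sex_age_counts is made up.

```sas
proc freq data=sashelp.class;
    /* Cross-tabulate sex by age, print as a list, and save the counts */
    table sex*age / list missing out=sex_age_counts;
run;

/* The saved data set has one row per sex/age combination */
proc print data=sex_age_counts;
run;
```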

FREQ Procedure
• Assignment

Use PROC FREQ to find the number of states within each region

FREQ Procedure
• Solution

proc freq data=state_data;

table region;

run;


Summary Procedures

Questions/Comments

Statistical Analysis Procedures
1. Correlation – PROC CORR
2. Regression – PROC REG
3. Analysis of Variance – PROC ANOVA
4. Chi-square Test of Association – PROC FREQ
5. Generalized Linear Models – PROC GENMOD

CORR Procedure
• PROC CORR is used to calculate the correlations between variables
• Correlation coefficient measures the linear relationship between two variables
• Values range from -1 to 1
    • Negative correlation – as one variable increases the other decreases
    • Positive correlation – as one variable increases the other increases
    • 0 – no linear relationship between the two variables
    • 1 – perfect positive linear relationship
    • -1 – perfect negative linear relationship
• General Form:
      PROC CORR DATA=input_data_set <options>;
          VAR Variable1 Variable2;
          WITH Variable3;
      RUN;

CORR Procedure
• If the VAR and WITH statements are not used, correlation is computed for all pairs of numeric variables
• Options include
    • SPEARMAN – computes Spearman's rank correlations
    • KENDALL – computes Kendall's Tau coefficients
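A minimal sketch of the VAR and WITH statements together with an option, run on the sashelp.class sample data set rather than the course data.

```sas
/* Spearman rank correlations of height and weight, each paired with age */
proc corr data=sashelp.class spearman;
    var height weight;
    with age;
run;
```

With both statements present, the output shows only the WITH-by-VAR pairs instead of the full correlation matrix.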

CORR Procedure
• Question: What is the correlation between the SAT Total variable and Expenditure variable? Is it significant? Based on previous exercises, which correlation coefficient should we use?
• Assignment: Use PROC CORR to find the correlation between the SAT Total variable and Expenditure variable

CORR Procedure
• Solution

If the normality assumption is valid

proc corr data=state_data;

var total expend;

run;

If the normality assumption is not valid

proc corr data=state_data spearman;

var total expend;

run;

REG Procedure
• PROC REG is used to fit linear regression models by least squares estimation
• One of many SAS procedures that can perform regression analysis
• Only continuous independent variables (use GENMOD for categorical variables)
• General Form:
      PROC REG DATA=input_data_set <options>;
          MODEL dependent=independent1 independent2/<options>;
          <optional statements>;
      RUN;
• PROC REG statement options include
    • PCOMIT=m – performs principal component estimation with m principal components
    • CORR – displays correlation matrix for independent variables in the model

REG Procedure
• MODEL statement options include
    • SELECTION=
        • Specifies a model selection procedure be conducted – FORWARD, BACKWARD, and STEPWISE
    • ADJRSQ – computes the Adjusted R-Square
    • MSE – computes the Mean Square Error
    • COLLIN – performs collinearity analysis
    • CLB – computes confidence limits for parameter estimates
    • ALPHA=
        • Sets significance value for confidence and prediction intervals and tests
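A sketch of a MODEL statement with a few of these options, fit to the sashelp.class sample data set. The choice of variables and the 0.10 significance level are arbitrary and only for illustration.

```sas
proc reg data=sashelp.class;
    /* CLB: confidence limits for estimates; COLLIN: collinearity diagnostics */
    model weight = height age / clb collin alpha=0.10;
run;
quit;   /* PROC REG is interactive, so QUIT ends the procedure */
```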

REG Procedure
• Optional statements include
    • PLOT Dependent*Independent1 – generates plot of data

REG Procedure
• Assignment
    • Use PROC REG to generate a multiple linear regression model
    • Dependent variable – SAT Total (total)
    • Use stepwise selection
    • Possible independent variables
        – Average pupil to teacher ratio (PT_ratio)
        – Current expenditure per pupil (expend)
        – Estimated annual salary of teachers (salary)
        – Percentage of eligible students taking the SAT (students)

Page 84: Presentation and Data http://  Short Courses Intro to SAS Download Data to Desktop

REG Procedure
• Solution

proc reg data=state_data;

model total=pt_ratio expend salary students/selection=stepwise;

run;
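Stepwise selection adds and removes variables based on significance thresholds. In PROC REG these are controlled by the SLENTRY= and SLSTAY= options on the MODEL statement, both of which default to 0.15 for SELECTION=STEPWISE. The sketch below is a variation on the solution, not part of the original slides; the 0.10 thresholds are an arbitrary illustration.

```sas
/* Sketch: stepwise selection with explicit entry/stay thresholds. */
proc reg data=state_data;
  model total = pt_ratio expend salary students
        / selection=stepwise
          slentry=0.10    /* p-value needed for a variable to enter */
          slstay=0.10;    /* p-value needed for a variable to stay  */
run;
quit;
```

Tighter thresholds tend to give a smaller final model; comparing the selected variables under a couple of settings is a quick check on how stable the selection is.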

Page 85: Presentation and Data http://  Short Courses Intro to SAS Download Data to Desktop

ANOVA Procedure
• PROC ANOVA performs analysis of variance
• Designed for balanced data (PROC GLM is used for unbalanced data)
• Can handle nested and crossed effects and repeated measures
• General Form:

PROC ANOVA DATA=input_data_set <options>;

CLASS independent1 independent2;

MODEL dependent=independent1 independent2;

<optional statements>;

RUN;

• The CLASS statement defines the classification variables and must come before the MODEL statement

Page 86: Presentation and Data http://  Short Courses Intro to SAS Download Data to Desktop

ANOVA Procedure
• Useful PROC ANOVA statement option – OUTSTAT=output_data_set
  • Generates an output data set that contains sums of squares, degrees of freedom, statistics, and p-values for each effect in the model
• Useful optional statement – MEANS independent1/<comparison type>
  • Used to perform multiple comparisons analysis
  • Set <comparison type> to:
    • TUKEY – Tukey’s studentized range test
    • BON – Bonferroni t test
    • T – pairwise t tests
    • DUNCAN – Duncan’s multiple-range test
    • SCHEFFE – Scheffe’s multiple comparison procedure
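As an illustration of how OUTSTAT= and the MEANS statement fit together, here is a minimal sketch. It assumes the work.state_data data set with the math and region variables used in this course; the output data set name work.anova_stats is hypothetical, and BON is chosen only as an example comparison type.

```sas
/* Sketch: save the ANOVA table to a data set and request
   Bonferroni-adjusted pairwise comparisons. */
proc anova data=state_data outstat=work.anova_stats;
  class region;            /* CLASS must precede MODEL            */
  model math = region;
  means region / bon;      /* Bonferroni t tests between regions  */
run;
quit;

/* Inspect the saved sums of squares, df, F statistics, and p-values */
proc print data=work.anova_stats;
run;
```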

Page 87: Presentation and Data http://  Short Courses Intro to SAS Download Data to Desktop

ANOVA Procedure
• Question: Are there significant differences between the Math SAT scores of students from different regions? If there are significant differences, which regions are different?

• Assignment: Use PROC ANOVA to determine if there are significant differences in the Math SAT variable between regions. Perform multiple comparisons between regions using Tukey’s adjustment.

Page 88: Presentation and Data http://  Short Courses Intro to SAS Download Data to Desktop

ANOVA Procedure
• Solution

proc anova data=state_data;

class region;

model math=region;

means region/tukey;

run;
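Since PROC ANOVA is designed for balanced data, the same analysis can also be run in PROC GLM, which the earlier slide recommends for unbalanced data. This is a sketch of one possible equivalent, assuming the same data set and variables; it is not part of the original solution.

```sas
/* Sketch: one-way analysis in PROC GLM with Tukey-adjusted
   pairwise comparisons of the region means. */
proc glm data=state_data;
  class region;
  model math = region;
  lsmeans region / pdiff adjust=tukey;  /* adjusted p-values for all pairs */
run;
quit;
```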

Page 89: Presentation and Data http://  Short Courses Intro to SAS Download Data to Desktop

FREQ Procedure
• PROC FREQ can also be used to perform analysis with categorical data
• General Form:

PROC FREQ DATA=input_data_set;

TABLES variable1*variable2/<options>;

RUN;

• TABLES statement options include:
  • AGREE – Tests and measures of classification agreement, including McNemar’s test, Bowker’s test, Cochran’s Q test, and Kappa statistics
  • CHISQ – Chi-square test of homogeneity and measures of association
  • MEASURES – Measures of association, including Pearson and Spearman correlation, gamma, Kendall’s tau-b, Stuart’s tau-c, Somers’ D, lambda, odds ratios, risk ratios, and confidence intervals
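A short sketch of the general form in use, assuming that work.state_data contains the categorical variables region and upper_ind referred to elsewhere in this course. It requests a two-way table and the chi-square test of association between the two variables.

```sas
/* Sketch: two-way frequency table with a chi-square test. */
proc freq data=state_data;
  tables region*upper_ind / chisq;  /* rows = region, columns = upper_ind */
run;
```

The asterisk between the variable names is what requests a cross-tabulation; listing the variables separated by a space would instead produce two separate one-way tables.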

Page 90: Presentation and Data http://  Short Courses Intro to SAS Download Data to Desktop

GENMOD Procedure
• PROC GENMOD is used to estimate generalized linear models, in which the response is not necessarily normal
• Logistic and Poisson regression are examples of generalized linear models
• General Form:

PROC GENMOD DATA=input_data_set;

CLASS independent1;

MODEL dependent = independent1 independent2/DIST=<option> LINK=<option>;

RUN;

Page 91: Presentation and Data http://  Short Courses Intro to SAS Download Data to Desktop

GENMOD Procedure
• DIST= – specifies the distribution of the response variable
• LINK= – specifies the link function from the linear predictor to the mean of the response

• Example – Logistic Regression
  • DIST = binomial
  • LINK = logit

• Example – Poisson Regression
  • DIST = poisson
  • LINK = log
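To make the Poisson case concrete, here is a small self-contained sketch. The data set work.visits, its variables (group, age, n_visits), and all of its values are hypothetical and made up purely for illustration; only the DIST= and LINK= settings come from the slide above.

```sas
/* Hypothetical toy data: values are invented for illustration only. */
data work.visits;
  input group $ age n_visits;
  datalines;
A 25 2
A 40 3
A 55 6
B 30 1
B 45 2
B 60 4
;
run;

/* Sketch: Poisson regression of a count response on one
   categorical and one continuous predictor. */
proc genmod data=work.visits;
  class group;                         /* categorical predictor        */
  model n_visits = group age
        / dist=poisson link=log;       /* count response with log link */
run;
```

Switching to DIST=binomial and LINK=logit with a 0/1 response gives the logistic regression setup used in the assignment that follows.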

Page 92: Presentation and Data http://  Short Courses Intro to SAS Download Data to Desktop

GENMOD Procedure
• Question: How do we model the probability of having a high total SAT average based on other variables in the dataset?

Is the dependent variable normal, or does it have a different distribution?

What link function would you specify?

• Assignment: Use PROC GENMOD to perform Logistic Regression on the work.state_data data set
• Dependent variable – upper_ind
• Independent variables
– Average pupil to teacher ratio (PT_ratio)
– Current expenditure per pupil (expend)
– Estimated annual salary of teachers (salary)
– Percentage of eligible students taking the SAT (students)
– Region (region)

Page 93: Presentation and Data http://  Short Courses Intro to SAS Download Data to Desktop

GENMOD Procedure
• Solution

proc genmod data=state_data descending;

class region;

model upper_ind=pt_ratio expend salary students region/dist=bin link=logit;

run;

Page 94: Presentation and Data http://  Short Courses Intro to SAS Download Data to Desktop

Statistical Analysis Procedures

Questions/Comments

Page 95: Presentation and Data http://  Short Courses Intro to SAS Download Data to Desktop

Attendee Questions

If time permits