sas slides 8 : base sas statistics procedures

SASTechies http://www.sastechies.com

Upload: sastechies

Post on 16-Nov-2014



DESCRIPTION

Learning Base SAS, Advanced SAS, PROC SQL, ODS, SAS in the financial industry, clinical trials, SAS macros, SAS BI, SAS on UNIX, SAS on mainframes, SAS interview questions and answers, SAS tips and techniques, SAS resources, SAS certification questions... visit http://sastechies.blogspot.com

TRANSCRIPT

Page 2: SAS Slides 8 : BASE SAS Statistics Procedures

Creating TLGs (Tables, Listings, and Graphs)

◦ SAS Tables
◦ Listings
◦ Basic Statistics Procedures with SAS
◦ Graphs
◦ ODS HTML
◦ PROC REPORT and Other Utility Procedures

04/08/23 2SAS Techies 2009

Page 3: SAS Slides 8 : BASE SAS Statistics Procedures

Descriptive and distributive statistics


Page 4: SAS Slides 8 : BASE SAS Statistics Procedures

PROC TABULATE creates customized one-, two-, and three-dimensional tables that display any of a large number of descriptive statistics. With PROC TABULATE you can

◦ modify virtually every feature of a table
◦ calculate percentages
◦ produce subreports without sorting the data
◦ summarize data and produce a report in one step
◦ generate multiple tables in one step.


proc tabulate data=clinic.diabstat;
   class type;
   var premium;
   table type premium;
run;

proc tabulate data=clinic.admit;
   class sex;
   var height weight;
   table sex,height*min weight*min;
run;

        Height    Weight
          Min       Min
Sex
F        61.00    118.00
M        69.00    147.00

proc tabulate data=clinic.admit;
   class sex actlevel;
   var height weight;
   table actlevel,sex,height*min weight*min;
run;

ActLevel HIGH
        Height    Weight
          Min       Min
Sex
F        66.00    140.00
M        72.00    168.00

ActLevel LOW
        Height    Weight
          Min       Min
Sex
F        61.00    118.00
M        71.00    154.00


Page 5: SAS Slides 8 : BASE SAS Statistics Procedures

To set up a table with PROC TABULATE, you need to identify the data you are analyzing and then determine

◦ which variables, if any, you need to classify your data
◦ which variables, if any, you need to analyze your data
◦ the type of table you need to represent your data.


PROC TABULATE invokes the procedure and identifies your data set.

CLASS specifies the variables used to classify the data into groups.

VAR specifies the numeric analysis variables on which statistics are computed.

TABLE defines the table that displays your data; it uses the variables and statistics to form the table.
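A minimal sketch of the three statements working together. It assumes the SASHELP.CLASS sample data set that ships with SAS (not the clinic library used on these slides), and the OUT= data set name work.tab_min is a hypothetical choice that only makes the results easy to inspect.

```sas
/* Sketch: assumes SASHELP.CLASS (Name Sex Age Height Weight) is available */
proc tabulate data=sashelp.class out=work.tab_min;  /* OUT= name is hypothetical */
   class sex;                          /* class variable: one row per value of Sex */
   var height weight;                  /* analysis variables: must be numeric */
   table sex, height*min weight*min;   /* row expression, column expression */
run;
```

The OUT= data set contains one observation per value of Sex, with the statistics stored in variables named by joining the analysis variable and the statistic, such as Height_Min and Weight_Min.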


Page 6: SAS Slides 8 : BASE SAS Statistics Procedures

Class variables

◦ can be character or numeric
◦ classify data into groups or categories
◦ in most cases, have only a few distinct values (PROC TABULATE prints each value of a class variable).

Analysis variables

Unlike class variables, analysis variables

◦ must be numeric
◦ are used for arithmetic analysis
◦ often contain continuous values.


Note: The same variable cannot appear in both the CLASS statement and the VAR statement in the same step.


Page 7: SAS Slides 8 : BASE SAS Statistics Procedures


You use the TABLE statement to specify

• the number of dimensions in the table (page, row, column)
• the variables in the table (Sex, Height)
• the statistics to be calculated (MAX).

Page 8: SAS Slides 8 : BASE SAS Statistics Procedures


TABLE page-expression, row-expression, column-expression;

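Dimension expressions contain elements.

Dimension expressions can also contain operators that you use when combining elements to produce the table you want.

Commas, one type of operator, separate the dimensions of the table. A TABLE statement with one dimension expression defines only the columns; with two, the rows and columns; with three, the pages, rows, and columns.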

Page 9: SAS Slides 8 : BASE SAS Statistics Procedures

proc tabulate data=clinic.diabstat;
   class type sex;
   var totalclaim premium;
   table type;
   table type premium;
   table type,premium;
   table type,premium,sum;
run;

Two-dimensional tables always have row and column headings; one-dimensional tables only have column headings.


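Output of table type;

          Type
       I        II
       N        N
     3.00     17.00

Output of table type premium;

          Type             Premium
       I        II
       N        N            Sum
     3.00     17.00        3359.15

Output of table type,premium;

         Premium
           Sum
Type
I         312.65
II       3046.50

Output of table type,premium,sum; (one page per value of Type)

Type I
            Sum
Premium   312.65

Type II
            Sum
Premium  3046.50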

Page 10: SAS Slides 8 : BASE SAS Statistics Procedures

Your final task before writing your own PROC TABULATE step is to specify the statistics needed. To request a statistic, you use an operator, the asterisk (*), to attach the statistic to the variable.

If you specify only class variables in your TABLE statement,

◦ the default statistic is N (frequency)
◦ the only statistics you can request are N and PCTN (percent of total frequency).

If you specify any analysis variables in your TABLE statement,

◦ the default statistic is SUM
◦ you can request any statistic to be computed on the analysis variables.

In a TABLE statement, you can specify statistics in any dimension, but they must all be in the same dimension.


proc tabulate data=clinic.admit;
   class sex actlevel;
   var height weight;
   table height*mean weight*max,actlevel;
   table sex*pctn actlevel*n;
run;
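A sketch of both statistics rules on SASHELP.CLASS (assumed to be available; it is not part of the original slides). The first table uses only a class variable, so only N and PCTN can be requested; the second adds the analysis variable Height and can request any statistic. The OUT= name work.tab_stats is hypothetical.

```sas
/* Sketch: assumes SASHELP.CLASS is available */
proc tabulate data=sashelp.class out=work.tab_stats;  /* hypothetical OUT= name */
   class sex;
   var height;
   /* Class variable only: N (the default) and PCTN are the available statistics */
   table sex*(n pctn);
   /* Analysis variable present: any statistic, all kept in the column dimension */
   table sex, height*(mean max);
run;
```

Because both tables go to the same OUT= data set, the automatic variable _TABLE_ identifies which TABLE statement produced each observation.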


Page 11: SAS Slides 8 : BASE SAS Statistics Procedures

To specify a summary row, you specify ALL in the row expression of your TABLE statement.

proc tabulate data=clinic.admit;
   var fee;
   class sex;
   table sex all,fee;
run;

        Fee
        Sum
Sex
F     1418.35
M     1268.60
All   2686.95

proc tabulate data=clinic.admit;
   class sex;
   table sex all;
run;

     Sex
  F       M      All
  N       N       N
 11      10      21

Page 12: SAS Slides 8 : BASE SAS Statistics Procedures

proc tabulate data=clinic.admit;
   class sex;
   var height weight;
   table (height weight)*mean,sex all;
   label sex='Sex of Patient'
         height='Height' weight='Weight';
   keylabel min='Lowest Reading'
            max='Highest Reading' mean='Average Reading';
run;

                              Lowest    Highest   Average
                              Reading   Reading   Reading
Sex
F   Fasting Glucose Level      152.00    568.00    253.45
M   Fasting Glucose Level      156.00    492.00    354.89

Page 13: SAS Slides 8 : BASE SAS Statistics Procedures


You can concatenate elements with the blank operator to display variables side by side (in columns) or stacked (in rows):

table height*mean weight*mean,sex all;

You can also produce hierarchical tables by using the asterisk operator to cross class variables with other variables:

table sex,actlevel*age*max;

To group elements and control how expressions are evaluated, you can use the parentheses operator:

table type*(sex all);
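The three operators can be combined in a single step. This sketch assumes SASHELP.CLASS is available (it is not part of the original slides); the OUT= data set name work.tab_ops is hypothetical.

```sas
/* Sketch: assumes SASHELP.CLASS is available */
proc tabulate data=sashelp.class out=work.tab_ops;   /* hypothetical OUT= name */
   class sex age;
   var height weight;
   /* Blank operator: concatenate two crossings side by side in the columns */
   table sex, height*mean weight*mean;
   /* Asterisk operator: cross Sex with Age for hierarchical row headings */
   table sex*age, height*max;
   /* Parentheses: apply MEAN to both variables; ALL adds a summary row */
   table sex all, (height weight)*mean;
run;
```

In the third table, (height weight)*mean is equivalent to height*mean weight*mean; the parentheses simply distribute the statistic.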


Page 14: SAS Slides 8 : BASE SAS Statistics Procedures


You can condense multiple pages onto a single page (condensed output):

table type,fee,sum / condense;

You can specify how percentages are calculated; the denominator definition in angle brackets (here <type>) tells PROC TABULATE which totals to use as the denominator:

table type,fee*sex*pctsum<type>;

You can create formats to change headings for class variable values:

proc format;
   value $actfmt 'LOW'='(1) Low'
                 'MOD'='(2) Moderate'
                 'HIGH'='(3) High';
run;
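A sketch of the format technique on SASHELP.CLASS (assumed available), using Sex because that data set has no ActLevel variable. The format name $sexfmt and the OUT= name work.tab_fmt are hypothetical.

```sas
proc format;
   value $sexfmt 'F' = 'Female'     /* hypothetical format for this sketch */
                 'M' = 'Male';
run;

proc tabulate data=sashelp.class out=work.tab_fmt;  /* hypothetical OUT= name */
   class sex;
   var weight;
   format sex $sexfmt.;             /* row headings print Female and Male */
   table sex all, weight*(sum pctsum);
run;
```

The format changes only the printed headings; the stored values of Sex remain F and M.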


Page 15: SAS Slides 8 : BASE SAS Statistics Procedures

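Common PROC TABULATE errors, their causes, and how to rectify them:

Error: VARIABLE appears in both CLASS and VAR lists.
Cause: You specified a variable as both a class variable and an analysis variable. List it in only one of the two statements.

Error: Type of name (VARIABLE) unknown at line n.
Cause: You used a variable in the TABLE statement without specifying it as either a class or an analysis variable. Add it to the CLASS or VAR statement.

Error: Variable VARIABLE in list does not match type prescribed for this list.
Cause: You specified a character variable as an analysis variable. Analysis variables must be numeric, so list a character variable in the CLASS statement instead.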

Page 16: SAS Slides 8 : BASE SAS Statistics Procedures

proc means data=clinic.diabetes n mean std min max;
run;

Descriptive statistics such as mean, sum, minimum, and maximum can answer basic questions about numeric data.

If you do not specify variables, PROC MEANS calculates statistics for all numeric variables in the data set. To restrict the analysis, add a VAR statement such as

var age height weight;

Variable    N   Mean   Std Dev   Minimum   Maximum
Age        20     47        13        15        63
Height     20     67         4        61        75
Weight     20    175        36       102       240
Pulse      20     75         8        65       100
FastGluc   20    299       126       152       568
PostGluc   20    355       126       206       625

Variable   Minimum   Maximum
Age             15        63
Height          61        75
Weight         102       240
Pulse           65       100
FastGluc       152       568
PostGluc       206       625

proc means data=clinic.diabetes min max maxdec=0;
run;
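The same ideas on SASHELP.CLASS (assumed available; not part of the original slides): statistic keywords choose the columns, MAXDEC= controls decimals, the VAR statement limits the analysis, and an OUTPUT statement with the AUTONAME option saves the results. The output data set name work.means_demo is hypothetical.

```sas
/* Sketch: assumes SASHELP.CLASS is available */
proc means data=sashelp.class n mean std min max maxdec=1;
   var height weight;                       /* analyze only these variables */
   output out=work.means_demo               /* hypothetical output data set */
          mean= min= max= / autoname;       /* names such as Height_Mean, Weight_Max */
run;
```

MAXDEC= affects only the printed report; the values saved by the OUTPUT statement keep full precision.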


Page 17: SAS Slides 8 : BASE SAS Statistics Procedures

proc means data=flights.laguardia median maxdec=0;
   var boarded transferred deplaned;
   class dest;
run;

CLASS group processing: You will often want statistics for groups of observations rather than for the data set as a whole. The CLASS statement produces a separate set of statistics for each value of the class variable (here, each destination).

Dest   N Obs   Variable       Median
CPH        6   Boarded           137
               Transferred        12
               Deplaned          149
FRA        7   Boarded           176
               Transferred        12
               Deplaned          189
LON       20   Boarded           186
               Transferred        11
               Deplaned          200
PAR       13   Boarded           155
               Transferred        15
               Deplaned          182
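A sketch of CLASS processing that needs no prior sort, assuming SASHELP.CLASS is available (it is not part of the original slides); the output data set name work.class_demo is hypothetical. Without the NWAY option, the OUTPUT data set also contains an overall observation (_TYPE_=0) in addition to one observation per value of Sex.

```sas
/* Sketch: assumes SASHELP.CLASS is available; no PROC SORT is needed for CLASS */
proc means data=sashelp.class median maxdec=1;
   class sex;                                       /* statistics per value of Sex */
   var height weight;
   output out=work.class_demo median= / autoname;   /* Height_Median, Weight_Median */
run;
```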


Page 18: SAS Slides 8 : BASE SAS Statistics Procedures

proc sort data=clinic.heart out=work.hartsort;
   by survive sex;
run;

proc means data=work.hartsort maxdec=1;
   var arterial heart cardiac urinary;
   by survive sex;
run;

Like the CLASS statement, the BY statement specifies variables to use for categorizing observations.

Unlike CLASS processing, BY processing requires that your data already be sorted in the order of the BY variables.

BY group results have a layout that is different from that of CLASS group results.


Survive=DIED Sex=2

Variable    N    Mean   Std Dev   Minimum   Maximum
Arterial    6    94.2      27.3      72.0     145.0
Heart       6   103.7      16.7      81.0     130.0
Cardiac     6   318.3     102.6     156.0     424.0
Urinary     6   100.3     155.7       0.0     405.0

Survive=SURV Sex=1

Variable    N    Mean   Std Dev   Minimum   Maximum
Arterial    5    77.2      12.2      61.0      88.0
Heart       5   109.0      32.0      77.0     149.0
Cardiac     5   298.0     139.8      66.0     410.0
Urinary     5   100.8      60.2      44.0     200.0
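A BY-group analysis requires sorting first. This sketch assumes SASHELP.CLASS is available (it is not part of the original slides); the data set names work.class_sorted and work.by_demo are hypothetical.

```sas
/* Sketch: assumes SASHELP.CLASS is available */
proc sort data=sashelp.class out=work.class_sorted;   /* BY needs sorted input */
   by sex;
run;

proc means data=work.class_sorted mean maxdec=2;
   by sex;                                   /* separate output section per Sex */
   var height;
   output out=work.by_demo mean= / autoname; /* one observation per BY group */
run;
```

Running the PROC MEANS step against the unsorted SASHELP.CLASS with the same BY statement would stop with an error about data not sorted in ascending sequence, which is the main practical difference from CLASS.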


Page 19: SAS Slides 8 : BASE SAS Statistics Procedures

By default, PROC FREQ creates a one-way table with the frequency, percent, cumulative frequency, and cumulative percent of every value of all variables in a data set.

Frequency distributions work best with variables that contain repeating values.

proc freq data=finance.loans;
   tables rate months;
run;

Note: A one-way frequency analysis produces one table per variable.

Column                 Description
Variable               Value of the variable.
Frequency              Number of observations with the value.
Percent                Frequency of the value divided by the total number of observations, expressed as a percentage.
Cumulative Frequency   Sum of the frequency counts of the value and all other values listed above it in the table.
Cumulative Percent     Cumulative frequency of the value divided by the total number of observations, expressed as a percentage.

Rate     Frequency   Percent   Cumulative Frequency   Cumulative Percent
9.50%        2        22.22             2                   22.22
9.75%        1        11.11             3                   33.33
10.00%       2        22.22             5                   55.56
10.50%       4        44.44             9                  100.00

Months   Frequency   Percent   Cumulative Frequency   Cumulative Percent
12           1        11.11             1                   11.11
24           1        11.11             2                   22.22
36           1        11.11             3                   33.33
48           1        11.11             4                   44.44
60           2        22.22             6                   66.67
360          3        33.33             9                  100.00
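To keep the counts as data rather than only as a printed report, the TABLES statement accepts the OUT= and OUTCUM options. A sketch on SASHELP.CLASS (assumed available; not part of the original slides); the data set name work.freq_demo is hypothetical.

```sas
/* Sketch: assumes SASHELP.CLASS is available */
proc freq data=sashelp.class;
   /* OUT= saves COUNT and PERCENT; OUTCUM adds CUM_FREQ and CUM_PCT */
   tables age / out=work.freq_demo outcum;
run;
```

When several variables are listed in one TABLES statement, OUT= saves only the last table, so request one variable per TABLES statement if each needs its own output data set.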

Page 20: SAS Slides 8 : BASE SAS Statistics Procedures

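Height in order of appearance in the data (ORDER=DATA), first rows:

Height   Frequency   Percent   Cumulative Frequency   Cumulative Percent
61           2        10.00             2                   10.00
71           2        10.00             4                   20.00
66           2        10.00             6                   30.00

Height in order of descending frequency (ORDER=FREQ), first rows:

Height   Frequency   Percent   Cumulative Frequency   Cumulative Percent
64           3        15.00             3                   15.00
61           2        10.00             5                   25.00
65           2        10.00             7                   35.00
66           2        10.00             9                   45.00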

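The ORDER= option in the PROC FREQ statement controls the order in which values are listed: ORDER=DATA|FORMATTED|FREQ|INTERNAL, where

◦ DATA orders values by their order of appearance in the data set
◦ FORMATTED orders values by their formatted values
◦ FREQ orders values by descending frequency count
◦ INTERNAL orders values by their unformatted values (the default).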

SAS Data Set Clinic.Diabetes (partial listing)

ID     Sex   Age   Height   Weight   Pulse   FastGluc   PostGluc
2304   F      16     61       102      100      568        625
1128   M      43     71       218       76      156        208
4425   F      48     66       162       80      244        322
1387   F      57     64       142       70      177        206

Height   Frequency   Percent   Cumulative Frequency   Cumulative Percent
Medium       8        40.00             8                   40.00
Short        7        35.00            15                   75.00
Tall         5        25.00            20                  100.00
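A sketch of ORDER=FREQ on SASHELP.CLASS (assumed available; not part of the original slides), where age 12 is the most frequent value and should therefore be listed first. The OUT= name work.order_demo is hypothetical.

```sas
/* Sketch: assumes SASHELP.CLASS is available */
proc freq data=sashelp.class order=freq;   /* ORDER= goes on the PROC FREQ statement */
   tables age / out=work.order_demo;        /* hypothetical OUT= name */
run;
```

Changing order=freq to order=internal (or removing the option) lists the ages in ascending numeric order instead.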


Page 21: SAS Slides 8 : BASE SAS Statistics Procedures

To create a two-way table, join two variables with an asterisk (*) in the TABLES statement of a PROC FREQ step.

Table of Weight by Height
(cell contents: Frequency / Percent / Row Pct / Col Pct)

                         Height
Weight     < 5'5"   5'5-10"   > 5'10"    Total
< 140         2         0         0         2
          10.00      0.00      0.00     10.00
         100.00      0.00      0.00
          28.57      0.00      0.00
140-180       5         5         0        10
          25.00     25.00      0.00     50.00
          50.00     50.00      0.00
          71.43     62.50      0.00
> 180         0         3         5         8
           0.00     15.00     25.00     40.00
           0.00     37.50     62.50
           0.00     37.50    100.00
Total         7         8         5        20
          35.00     40.00     25.00    100.00

proc format;
   value wtfmt low-139='< 140'
               140-180='140-180'
               181-high='> 180';
   value htfmt low-64='< 5''5"'
               65-70='5''5-10"'
               71-high='> 5''10"';
run;

proc freq data=clinic.diabetes;
   tables weight*height;
   format weight wtfmt. height htfmt.;
run;
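A self-contained two-way sketch on SASHELP.CLASS (assumed available; not part of the original slides). The OUTPCT option adds the row and column percentages to the OUT= data set; the name work.xtab_demo is hypothetical.

```sas
/* Sketch: assumes SASHELP.CLASS is available */
proc freq data=sashelp.class;
   /* Rows: Sex; columns: Age. Each cell prints Frequency, Percent, Row Pct, Col Pct */
   tables sex*age / out=work.xtab_demo outpct;   /* OUTPCT adds PCT_ROW and PCT_COL */
run;
```

The first variable in the request defines the rows and the second the columns, so swapping them to age*sex transposes the table.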

Page 22: SAS Slides 8 : BASE SAS Statistics Procedures


Table 1 of Weight by Height
Controlling for Sex=F
(cell contents: Frequency / Percent / Row Pct / Col Pct)

                         Height
Weight     < 5'5"   5'5-10"   > 5'10"    Total
< 140         2         0         0         2
          18.18      0.00      0.00     18.18
         100.00      0.00      0.00
          28.57      0.00        .
140-180       5         4         0         9
          45.45     36.36      0.00     81.82
          55.56     44.44      0.00
          71.43    100.00        .
> 180         0         0         0         0
           0.00      0.00      0.00      0.00
             .         .         .
           0.00      0.00        .
Total         7         4         0        11
          63.64     36.36      0.00    100.00

Table 2 of Weight by Height
Controlling for Sex=M
(cell contents: Frequency / Percent / Row Pct / Col Pct)

                         Height
Weight     < 5'5"   5'5-10"   > 5'10"    Total
< 140         0         0         0         0
           0.00      0.00      0.00      0.00
             .         .         .
             .       0.00      0.00
140-180       0         1         0         1
           0.00     11.11      0.00     11.11
           0.00    100.00      0.00
             .      25.00      0.00
> 180         0         3         5         8
           0.00     33.33     55.56     88.89
           0.00     37.50     62.50
             .      75.00    100.00
Total         0         4         5         9
           0.00     44.44     55.56    100.00

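In the request tables sex*weight*height; the first variable (Sex) defines the levels, so PROC FREQ produces one two-way table for each value of Sex; the last two variables form the rows (Weight) and columns (Height) of each table.

proc freq data=clinic.diabetes;
   tables sex*weight*height;
   format weight wtfmt. height htfmt.;
run;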

Page 23: SAS Slides 8 : BASE SAS Statistics Procedures

Sex   Weight    Height    Frequency   Percent   Cumulative Frequency   Cumulative Percent
F     < 140     < 5'5"        2        10.00             2                   10.00
F     140-180   < 5'5"        5        25.00             7                   35.00
F     140-180   5'5-10"       4        20.00            11                   55.00
M     140-180   5'5-10"       1         5.00            12                   60.00
M     > 180     5'5-10"       3        15.00            15                   75.00
M     > 180     > 5'10"       5        25.00            20                  100.00

To generate list output for crosstabulations, add a slash (/) and the LIST option to the TABLES statement in your PROC FREQ step.

TABLES variable-1*variable-2 <* ... variable-n> / LIST;

proc freq data=clinic.diabetes;
   tables sex*weight*height / list;
   format weight wtfmt. height htfmt.;
run;

Page 24: SAS Slides 8 : BASE SAS Statistics Procedures

NOFREQ suppresses cell frequencies

NOPERCENT suppresses cell percentages

NOROW suppresses row percentages

NOCOL suppresses column percentages


proc freq data=clinic.diabetes;
   tables sex*weight / nofreq norow nocol;
   format weight wtfmt.;
run;

Table of Sex by Weight
(cell contents: Percent)

                  Weight
Sex       < 140   140-180   > 180     Total
F         10.00     45.00    0.00     55.00
M          0.00      5.00   40.00     45.00
Total         2        10       8        20
          10.00     50.00   40.00    100.00