Mainframe Online Training
Poldsni Anil Kumar
Producing Descriptive Statistics
Statistical Analysis System
SAS Presentation | IBM Confidential | Apr 21, 2023 2
Producing Descriptive Statistics
Introduction
Computing Statistics Using PROC MEANS
Creating a Summarized Data Set Using PROC SUMMARY
Producing Frequency Tables Using PROC FREQ
Introduction
One of the features of PROC REPORT is the ability to summarize large amounts of data by producing descriptive statistics. However, there are SAS procedures that are designed specifically to produce various types of descriptive statistics and to display them in meaningful reports.
If the data values that you want to describe are continuous numeric values, then you can use the MEANS procedure or the SUMMARY procedure to calculate statistics such as the mean, sum, minimum, and maximum.
If the data values that you want to describe are discrete, then you can use the FREQ procedure to show the distribution of these values.
Computing Statistics Using PROC MEANS (Page 1)
The MEANS procedure computes descriptive statistics such as the mean, minimum, and maximum, which provide useful information about numeric data.
Procedure Syntax
PROC MEANS <DATA=SAS-data-set>
<statistic-keyword(s)> <option(s)>;
RUN;
Where
SAS-data-set is the name of the data set to be used
statistic-keyword(s) specify the statistics to compute
option(s) control the content, analysis, and appearance
Example
proc means data=perm.survey;
run;
Computing Statistics Using PROC MEANS (Page 2)
Selecting Statistics
To see the median and range of the numeric values in Perm.Survey, add the MEDIAN and RANGE keywords as options in the PROC MEANS statement.
Example
proc means data=perm.survey median range;
run;
The following keywords can be used with PROC MEANS to compute statistics:
Keyword   Description
MAX       Maximum value
MEAN      Average
MODE      Value that occurs most frequently
MIN       Minimum value
VAR       Variance
Computing Statistics Using PROC MEANS (Page 3)
Limiting Decimal Places
To limit decimal places, use the MAXDEC= option in the PROC MEANS statement, and set it equal to the maximum number of decimal places that you want displayed.
Syntax
PROC MEANS <DATA=SAS-data-set>
<statistic-keyword(s)> MAXDEC=n;
where n specifies the maximum number of decimal places.
Example
proc means data=clinic.diabetes min max maxdec=0;
run;
Computing Statistics Using PROC MEANS (Page 4)
Specifying Variables in PROC MEANS
To specify the variables that PROC MEANS analyzes, add a VAR statement and list the variable names.
General form, VAR statement:
VAR variable(s);
Example
proc means data=clinic.diabetes min max maxdec=0;
var age height weight;
run;
In addition to listing variables separately, you can use a numbered range of variables:
var item1-item5;
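The options covered so far can be combined in one step. The following is a minimal sketch, not taken from the original slides: the data set Work.Survey_demo and its values are hypothetical, created here only so that the step can run on its own.

```sas
/* Hypothetical survey data: an ID plus five numbered item variables */
data work.survey_demo;
   input id item1-item5;
   datalines;
1 4 3 5 2 4
2 5 4 4 3 5
3 3 3 2 4 4
4 4 5 5 3 3
;
run;

/* N, mean, median, and range for item1 through item5,
   displayed with at most one decimal place */
proc means data=work.survey_demo n mean median range maxdec=1;
   var item1-item5;
run;
```

Because the VAR statement names only item1-item5, the numeric variable id is left out of the report.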
Computing Statistics Using PROC MEANS (Page 5)
Group Processing Using the CLASS Statement
To produce separate analyses of grouped observations, add a CLASS statement to the MEANS procedure.
General form, CLASS statement:
CLASS variable(s);
where variable(s) specifies category variables for group processing.
CLASS variables can be either character or numeric, but they should contain a limited number of discrete values that represent meaningful groupings.
Example
proc means data=clinic.heart maxdec=1;
var arterial heart cardiac urinary;
class survive sex;
run;
Computing Statistics Using PROC MEANS (Page 6)
Group Processing Using the BY Statement
Like the CLASS statement, the BY statement specifies variables to use for categorizing observations.
General form, BY statement:
BY variable(s);
Difference between BY and CLASS
1. Unlike CLASS processing, BY processing requires that your data already be sorted or indexed in the order of the BY variables.
2. BY group results have a layout that is different from the layout of CLASS group results. Note that the BY statement in the program below creates four small tables; a CLASS statement would produce a single large table.
Computing Statistics Using PROC MEANS (Page 7)
Example for BY statement.
proc sort data=clinic.heart out=work.heartsort;
by survive sex;
run;
proc means data=work.heartsort maxdec=1;
var arterial heart cardiac urinary;
by survive sex;
run;
Computing Statistics Using PROC MEANS (Page 8)
Creating a Summarized Data Set Using PROC MEANS
You might want to create an output SAS data set that contains just the summarized variable.
General form, OUTPUT statement:
OUTPUT OUT=SAS-data-set statistic=variable(s);
where
OUT= specifies the name of the output data set
statistic= specifies the summary statistic written out
Computing Statistics Using PROC MEANS (Page 9)
Specifying the STATISTIC= Option
You can specify which statistics to produce in the output data set. To do so, you must specify the statistic and then list all of the variables. The variables must be listed in the same order as in the VAR statement. You can specify more than one statistic in the OUTPUT statement.
proc means data=clinic.diabetes;
var age height weight;
class sex;
output out=work.sum_gender
mean=AvgAge AvgHeight AvgWeight
min=MinAge MinHeight MinWeight;
run;
To see the contents of the output data set, submit a PROC PRINT step for Work.Sum_gender.
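A minimal sketch of such a PROC PRINT step, assuming that the output data set Work.Sum_gender has already been created by the PROC MEANS program above:

```sas
/* Print the summarized data set written by the OUTPUT statement */
proc print data=work.sum_gender;
run;
```

Besides Sex and the named statistics (AvgAge, MinAge, and so on), the listing also shows the automatic variables _TYPE_ and _FREQ_. Because a CLASS statement is used, the data set includes one overall observation (_TYPE_=0) in addition to one observation for each value of Sex.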
Computing Statistics Using PROC MEANS (Page 10)
Creating only the output data set
You can use the NOPRINT option in the PROC MEANS statement to prevent the default report from being created. For example, the following program creates only the output data set:
Example
proc means data=clinic.diabetes noprint;
var age height weight;
class sex;
output out=work.sum_gender
mean=AvgAge AvgHeight AvgWeight;
run;
Creating a Summarized Data Set Using PROC SUMMARY
You can also create a summarized output data set by using PROC SUMMARY.
The difference between the two procedures is that PROC MEANS produces a report by default. By contrast, to produce a report in PROC SUMMARY, you must include a PRINT option in the PROC SUMMARY statement.
Example
proc summary data=clinic.diabetes print;
var age height weight;
class sex;
output out=work.sum_gender
mean=AvgAge AvgHeight AvgWeight;
run;
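For comparison, here is a sketch of the same kind of step without the PRINT option, in which case PROC SUMMARY writes only the output data set. The data set Work.Diabetes_demo and its values are hypothetical stand-ins for Clinic.Diabetes, included so that the example can run on its own.

```sas
/* Hypothetical data standing in for Clinic.Diabetes */
data work.diabetes_demo;
   input sex $ age height weight;
   datalines;
F 48 65 140
F 52 63 155
M 45 70 190
M 60 68 175
;
run;

/* No PRINT option: no report is produced, only the output data set */
proc summary data=work.diabetes_demo;
   var age height weight;
   class sex;
   output out=work.sum_demo
          mean=AvgAge AvgHeight AvgWeight;
run;

/* Display the summarized data set */
proc print data=work.sum_demo;
run;
```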
Producing Frequency Tables Using PROC FREQ (Page 1)
The FREQ procedure is a descriptive procedure as well as a statistical procedure. It produces one-way and n-way frequency tables.
You can use the FREQ procedure to create cross-tabulation tables that summarize data for two or more categorical variables by showing the number of observations for each combination of variable values.
General form, basic FREQ procedure:
PROC FREQ <DATA=SAS-data-set>;
RUN;
By default, PROC FREQ creates a one-way table with the frequency, percent, cumulative frequency, and cumulative percent of every value of all variables in a data set.
Producing Frequency Tables Using PROC FREQ (Page 2)
Example
For example, the following FREQ procedure creates a frequency table for each variable in the data set Parts.Widgets. All the unique values are shown for ItemName, LotSize, and Region.
proc freq data=parts.widgets;
run;
Producing Frequency Tables Using PROC FREQ (Page 3)
Specifying Variables in PROC FREQ
By default, the FREQ procedure creates frequency tables for every variable in your data set.
To specify the variables to be processed by the FREQ procedure, include a TABLES statement.
Syntax
TABLES variable(s);
where variable(s) lists the variables to include.
Producing Frequency Tables Using PROC FREQ (Page 4)
Example
Consider the SAS data set Finance.Loans. The variables Rate and Months are best described as categorical values, so they are the best choices for frequency tables.
proc freq data=finance.loans;
tables rate months;
run;
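To try the TABLES statement without access to Finance.Loans, a small stand-in data set can be created first. This is a sketch only: Work.Loans_demo, its variables Account and Amount, and all of its values are hypothetical.

```sas
/* Hypothetical loan data: Rate and Months take only a few distinct values */
data work.loans_demo;
   input account $ amount rate months;
   datalines;
A101 22000 0.095 60
A102 114000 0.095 360
A103 3500 0.105 12
A104 10000 0.105 36
A105 88000 0.095 360
;
run;

/* One-way frequency tables for Rate and Months only;
   Account and Amount are not tabulated */
proc freq data=work.loans_demo;
   tables rate months;
run;
```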
Producing Frequency Tables Using PROC FREQ (Page 5)
Creating Two-Way Tables
It is often helpful to crosstabulate frequencies with the values of other variables. For example, census data is typically crosstabulated with a variable that represents geographical regions.
Syntax
TABLES variable-1 *variable-2 <* ... variable-n>;
where (for two-way tables)
variable-1 specifies table rows and variable-2 specifies table columns.
When crosstabulations are specified, PROC FREQ produces tables with cells that contain
cell frequency
cell percentage of total frequency
cell percentage of row frequency
cell percentage of column frequency.
Producing Frequency Tables Using PROC FREQ (Page 6)
For example, the following program creates a two-way table of Weight by Height.
proc format;
value wtfmt low-139='< 140'
140-180='140-180'
181-high='> 180';
value htfmt low-64='< 5''5"'
65-70='5''5-10"'
71-high='> 5''10"';
run;
proc freq data=clinic.diabetes;
tables weight*height;
format weight wtfmt.
height htfmt.;
run;
Producing Frequency Tables Using PROC FREQ (Page 7)
Creating N-Way Tables
For a frequency analysis of more than two variables, use PROC FREQ to create n-way crosstabulations. A series of two-way tables is produced, with a table for each level of the other variables.
For example, suppose you want to add the variable Sex to your crosstabulation of Weight and Height in the data set Clinic.Diabetes. Add Sex to the TABLES statement, joined to the other variables with an asterisk (*).
Example
tables sex*weight*height;
The order of the variables is important. In n-way tables, the last two variables of the TABLES statement become the two-way rows and columns.
Producing Frequency Tables Using PROC FREQ (Page 8)
Example
proc format;
value wtfmt low-139='< 140'
140-180='140-180'
181-high='> 180';
value htfmt low-64='< 5''5"'
65-70='5''5-10"'
71-high='> 5''10"';
run;
proc freq data=clinic.diabetes;
tables sex*weight*height;
format weight wtfmt. height htfmt.;
run;
Producing Frequency Tables Using PROC FREQ (Page 9)
Changing the Table Format
Adding the CROSSLIST option to your TABLES statement displays crosstabulation tables in ODS column format.
proc format;
value wtfmt low-139='< 140'
140-180='140-180'
181-high='> 180';
value htfmt low-64='< 5''5"'
65-70='5''5-10"'
71-high='> 5''10"';
run;
proc freq data=clinic.diabetes;
tables sex*weight*height/crosslist;
format weight wtfmt. height htfmt.;
run;
Producing Frequency Tables Using PROC FREQ (Page 10)
Creating Tables in List Format
When three or more variables are specified, the multiple levels of n-way tables can produce considerable output. Such bulky, often complex crosstabulations are often easier to read as a continuous list. To generate list output, add a slash (/) and the LIST option to the TABLES statement.
Syntax
TABLES variable-1 *variable-2 <* … variable-n> / LIST;
Example
proc format;
value wtfmt low-139='< 140'
….;
run;
proc freq data=clinic.diabetes;
tables sex*weight*height / list;
format weight wtfmt. height htfmt.;
run;
Producing Frequency Tables Using PROC FREQ (Page 11)
Suppressing Table Information
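To limit the output to specific statistics, add a slash (/) and one or more of the following options to the TABLES statement:
NOFREQ suppresses cell frequencies
NOPERCENT suppresses cell percentages
NOROW suppresses row percentages
NOCOL suppresses column percentages.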
Example
proc freq data=clinic.diabetes;
tables sex*weight / nofreq norow nocol;
format weight wtfmt.;
run;
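The step above relies on the WTFMT. format defined on an earlier slide. A self-contained sketch of the same technique follows; the data set Work.Weight_demo and its values are hypothetical, and the format definition is repeated so that the program can run on its own.

```sas
/* Hypothetical data standing in for Clinic.Diabetes */
data work.weight_demo;
   input sex $ weight;
   datalines;
F 130
F 152
F 168
M 175
M 190
M 205
;
run;

/* Group weights into three ranges */
proc format;
   value wtfmt low-139='< 140'
               140-180='140-180'
               181-high='> 180';
run;

/* NOFREQ, NOROW, and NOCOL leave only the percentage of the total
   in each cell of the Sex by Weight table */
proc freq data=work.weight_demo;
   tables sex*weight / nofreq norow nocol;
   format weight wtfmt.;
run;
```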