5. SAS procedures for linear regression
Karl B Christensen
http://192.38.117.59/~kach/SAS
Contents
Scatter plots
Correlation
Simple linear regression
Residual plots
Histogram, Probability plot, Box plot
Data example: obesity score and blood pressure
http://192.38.117.59/~kach/SAS/bp.sas7bdat
http://192.38.117.59/~kach/SAS/bp.txt
http://192.38.117.59/~kach/SAS/bp.xlsx
Use PROC PRINT to look at the data set
LIBNAME dat 'P:\';
PROC PRINT DATA=dat.bp(OBS=7);
RUN;
Output
Obs sexnr obese bp
1 1 1.31 130
2 1 1.31 148
3 1 1.19 146
4 1 1.11 122
5 1 1.34 140
6 1 1.17 146
7 1 1.56 132
Simplest scatter plot using PROC GPLOT
PROC GPLOT DATA=dat.bp;
PLOT bp*obese;
RUN; QUIT;
[Figure: scatter plot of bp (vertical axis, 90 to 210) against obese (horizontal axis, 0.8 to 2.4)]
This plot can be improved a lot.
Improved scatter plot using PROC GPLOT
PROC GPLOT DATA=dat.bp;
PLOT bp*obese / HAXIS=AXIS1 VAXIS=AXIS2;
AXIS1 LABEL=(H=3) VALUE =(H=2) MINOR=NONE;
AXIS2 LABEL=(H=3 A=90) VALUE=(H=2) MINOR=NONE;
SYMBOL1 V=CIRCLE H=2;
RUN; QUIT;
[Figure: scatter plot of bp against obese with larger axis labels, larger tick values, and circles as plotting symbols]
Things to notice
AXIS1: larger label and larger numbers on the axis
AXIS2: also rotates the label
MINOR=NONE: removes the minor tick marks
SYMBOL1: option V (for value) gives circles instead of +, option H makes the symbols larger
Correlation
Is blood pressure related to obesity? We compute the correlation using PROC CORR.
The default is the parametric correlation based on the bivariate normal distribution (the Pearson correlation).
The Spearman correlation, based on ranks, is the most commonly used nonparametric correlation.
The Kendall correlation is an alternative rank correlation based on the number of concordant and discordant pairs.
Correlation coefficient
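Calculated as
$$r = r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$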
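Measures the strength of the (linear) association between two variables
Values are between −1 and 1
0 corresponds to no linear association (independent variables have correlation 0, but a correlation of 0 does not by itself imply independence)
−1 and 1 correspond to perfect linearity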
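As a minimal sketch of the two extreme cases, the following uses two made-up data sets (WORK.line and WORK.parab are hypothetical names, not part of the course data). In the first, y is an exact linear function of x, so the Pearson correlation is 1. In the second, y = x² on values of x that are symmetric around 0, so the Pearson correlation is 0 even though y is completely determined by x.

```sas
/* Toy data: y = 1 + 2x exactly (perfect linearity) */
DATA WORK.line;
  DO x = 1 TO 5;
    y = 1 + 2*x;
    OUTPUT;
  END;
RUN;

/* Toy data: y = x**2 on x = -2,...,2 (dependent, but no linear trend) */
DATA WORK.parab;
  DO x = -2 TO 2;
    y = x**2;
    OUTPUT;
  END;
RUN;

PROC CORR DATA=WORK.line PEARSON;
  VAR y;
  WITH x;
RUN;

PROC CORR DATA=WORK.parab PEARSON;
  VAR y;
  WITH x;
RUN;
```

The second case is a reminder that the coefficient only measures linear association, so a scatter plot should always accompany it.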
Computing correlations in SAS
SAS code
PROC CORR DATA=dat.bp PEARSON SPEARMAN KENDALL;
VAR bp;
WITH obese;
RUN; QUIT;
With the options PEARSON, SPEARMAN, and KENDALL we get the Pearson correlation, the Spearman correlation, and the Kendall correlation. We can also use
PROC CORR DATA=dat.bp PEARSON SPEARMAN KENDALL;
VAR bp obese;
RUN; QUIT;
but this gives us more (unneeded) output.
Output
Pearson Correlation Coefficients , N = 102
Prob > |r| under H0: Rho=0
bp
obese 0.32614
0.0008
Spearman Correlation Coefficients , N = 102
Prob > |r| under H0: Rho=0
bp
obese 0.30363
0.0019
Kendall Tau b Correlation Coefficients , N = 102
Prob > |tau| under H0: Tau=0
bp
obese 0.21298
0.0019
Linear regression
Y: Response variable, outcome variable, dependent variable (bp)
X: Explanatory variable, independent variable, covariate (obesity score)
Bivariate observations of X and Y for n individuals
$$(x_i, y_i), \quad i = 1, \ldots, n$$
Linear regression
The equation for a straight line: Y = α + βX
α: intercept, intersection with the Y-axis (at X = 0)
The expected bp for an individual with obesity score 0 (often an illegal extrapolation).
β: slope, regression coefficient
Expected difference in bp for two individuals with a difference of 1 unit in obesity score.
Often the parameter of interest.
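Simple linear regression
Model
$$Y_i = \alpha + \beta X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independent}$$
Estimation by least squares: the estimates of α and β are the values minimizing
$$\sum_{i=1}^{n} \left(y_i - (\alpha + \beta x_i)\right)^2$$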
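For reference, this minimization has a well-known closed-form solution. The slides do not show it, so the following is an added note: it reuses the quantities $S_{xy}$ and $S_{xx}$ from the correlation formula, and the hats marking estimates are notation introduced here.

```latex
\hat{\beta} = \frac{S_{xy}}{S_{xx}}
            = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
```

Two consequences are easy to read off: the fitted line passes through the point $(\bar{x}, \bar{y})$, and the slope estimate has the same sign as the correlation coefficient $r$.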
Assumptions in linear regression
Linearity in the mean value
Independence between the error terms $\varepsilon_i$
Normally distributed error terms, $\varepsilon_i \sim N(0, \sigma^2)$
Variance homogeneity, that is, identical variances for all $\varepsilon_i$
The first assumption is the most important. If the assumptions underlying the linear regression model are not met, we cannot trust the results.
Add a smooth curve to the scatter plot
PROC GPLOT DATA=dat.bp;
PLOT bp*obese / HAXIS=AXIS1 VAXIS=AXIS2;
SYMBOL1 V=CIRCLE H=2 I=SM75S L=2;
RUN; QUIT;
Interpolation I=SM75S gives a smooth curve, L=2 makes the curve dashed.
[Figure: scatter plot of bp against obese with a dashed smooth curve]
Add regression line to the scatter plot
PROC GPLOT DATA=dat.bp;
PLOT bp*obese / HAXIS=AXIS1 VAXIS=AXIS2;
SYMBOL1 V=CIRCLE H=2 I=RL L=1;
RUN; QUIT;
Interpolation I=RL gives the regression line, L=1 makes the line solid.
[Figure: scatter plot of bp against obese with the fitted regression line]
Regression using PROC REG
The SAS procedure PROC REG can be used for linear regression
PROC REG DATA=dat.bp;
MODEL bp=obese;
RUN; QUIT;
Output
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 3552.41931 3552.41931 11.90 0.0008
Error 100 29846 298.45541
Corrected Total 101 33398
The F test shows a significant association between bp and obese (p = 0.0008).
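As a quick arithmetic check of the analysis of variance table (a sketch that uses only the printed, rounded numbers), the F value is the ratio of the two mean squares, and the error mean square is the error sum of squares divided by its degrees of freedom:

```latex
F = \frac{\mathrm{MS}_{\mathrm{Model}}}{\mathrm{MS}_{\mathrm{Error}}}
  = \frac{3552.41931}{298.45541} \approx 11.90,
\qquad
\mathrm{MS}_{\mathrm{Error}} = \frac{\mathrm{SS}_{\mathrm{Error}}}{n - 2}
  \approx \frac{29846}{100} \approx 298.46
```

With a single explanatory variable this F test is equivalent to the t test for the slope: the t value reported for obese in the Parameter Estimates table is 3.45, and 3.45² ≈ 11.90.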
Regression using PROC REG
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 96.81793 8.91960 10.85 <.0001
obese 1 23.00135 6.66701 3.45 0.0008
Estimated relation:
bp = 96.81793 + 23.00135 × obese
Interpretation: A difference of 1 unit in obesity score corresponds to an expected difference of 23.00135 units in blood pressure.
Confidence limits: use the option CLB
PROC REG DATA=dat.bp;
MODEL bp = obese / CLB;
RUN; QUIT;
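The limits printed by CLB are the estimate plus or minus a t quantile times the standard error. As a minimal sketch, they can be reproduced by hand from the Parameter Estimates table; the data set name WORK.clb_check and the variable names are made up for this illustration.

```sas
/* Hand computation of 95% confidence limits for the slope,   */
/* using the estimate and standard error printed by PROC REG. */
DATA WORK.clb_check;                     /* hypothetical data set name */
  estimate = 23.00135;                   /* slope for obese            */
  se       = 6.66701;                    /* its standard error         */
  df       = 100;                        /* error degrees of freedom   */
  tcrit    = QUANTILE('T', 0.975, df);   /* 97.5% quantile of t(100)   */
  lower    = estimate - tcrit*se;
  upper    = estimate + tcrit*se;
RUN;

PROC PRINT DATA=WORK.clb_check;
RUN;
```

The resulting interval runs from roughly 9.8 to 36.2 and should agree with the CLB output up to rounding.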
The variance around the regression line, σ², is estimated as
$$s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2$$
and is found in the output as the mean square error; here the value is 298.45541. The standard deviation around the regression line is found in the output as the root mean square error ('Root MSE'):
$$s = \sqrt{s^2} = 17.27586$$
It has the same unit as the outcome and is easier to interpret.
Confidence limits for regression line
PROC GPLOT DATA=dat.bp;
PLOT bp*obese / HAXIS=AXIS1 VAXIS=AXIS2;
SYMBOL1 V=CIRCLE I=RLCLM95 L=1;
RUN; QUIT;
The option I=RLCLM95 gives us 95% confidence limits for the regression line (the mean), whereas I=RLCLI95 gives us prediction limits for individual observations.
[Figure: scatter plot of bp against obese with the regression line and its 95% confidence limits]
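The prediction-limit variant can be drawn the same way. This is a minimal sketch, assuming that AXIS1 and AXIS2 are still defined from the earlier slides; only the interpolation option changes.

```sas
/* Same plot, but with 95% prediction limits for individual observations */
PROC GPLOT DATA=dat.bp;
  PLOT bp*obese / HAXIS=AXIS1 VAXIS=AXIS2;
  SYMBOL1 V=CIRCLE I=RLCLI95 L=1;
RUN; QUIT;
```

The prediction limits are wider than the confidence limits, because they also account for the variation of single observations around the line.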
Model control in regression analysis
Residuals
$$\hat{\varepsilon}_i = y_i - \hat{y}_i = y_i - (\hat{\alpha} + \hat{\beta} x_i)$$
should be plotted against:
1 the explanatory variable $x_i$, to check linearity
2 the fitted values $\hat{y}_i$, to check variance homogeneity (and normality)
3 'normal scores', i.e. a probability plot, to check normality
The first two should give the impression of random scatter, while the probability plot ought to show a straight line.
Residual plots and linearity
Look for ∪ or ∩ forms
Plots for model checking in the HTML output
If you want to add a smooth curve to ease the check for ∪ or ∩ forms in the plots of the residuals against the covariates, try
ODS GRAPHICS ON;
PROC REG DATA=dat.bp PLOTS=(DIAGNOSTICS RESIDUALS(SMOOTH));
MODEL bp = obese / CLB;
RUN;
ODS GRAPHICS OFF;
The distribution of the residuals is not well approximated by a normal distribution.
Types of residuals
Ordinary residuals = model deviations
$$\hat{\varepsilon}_i = y_i - \hat{y}_i$$
SAS option: R
Standardized (studentized) residuals. SAS option: STUDENT
Residuals where the current observation is not used in the estimation of the corresponding line. SAS option: PRESS
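As a sketch of how the three types can be saved side by side, the OUTPUT statement of PROC REG accepts the keywords R=, STUDENT= and PRESS=. The data set name WORK.res3 and the variable names after the equals signs are made up for this illustration.

```sas
/* Save ordinary, studentized and PRESS residuals in one data set */
PROC REG DATA=dat.bp;
  MODEL bp = obese;
  OUTPUT OUT=WORK.res3          /* hypothetical output data set        */
         P=forv                 /* predicted values                    */
         R=resid                /* ordinary residuals                  */
         STUDENT=stud_resid     /* studentized residuals               */
         PRESS=press_resid;     /* residuals leaving out observation i */
RUN; QUIT;

PROC PRINT DATA=WORK.res3(OBS=7);
RUN;
```

Comparing the columns for the first few observations shows how little or how much a single observation changes its own fitted value.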
Save residuals and predicted values
PROC REG can calculate and save all these quantities for later use, typically for graphics (e.g., if the HTML output does not work on your own computer or you want to create your own plots)
PROC REG DATA=dat.bp;
MODEL bp = obese / CLB;
OUTPUT OUT=WORK.ch P=forv R=resid;
RUN;
We get a new data set WORK.ch with the new variables forv (predicted values) and resid (residuals), which we may then use to construct residual plots.
Plot residuals and predicted values
PROC GPLOT DATA=WORK.ch;
PLOT resid*(obese forv) /
HAXIS=AXIS1
VAXIS=AXIS2
VREF=0
CVREF=GRAYAA
LVREF =33;
SYMBOL1 V=CIRCLE CV=BLACK H=2 I=SM75S CI=RED L=8 W=3;
RUN; QUIT;
[Figure: residuals against obese, with a smooth curve and a reference line at 0]
[Figure: residuals against the predicted value of bp, with a smooth curve and a reference line at 0]
Plot histogram and probability plot for the residuals
PROC UNIVARIATE NORMAL DATA=WORK.ch;
VAR resid;
HISTOGRAM / HEIGHT =3 CFILL=GRAY NORMAL;
PROBPLOT / HEIGHT =3 NORMAL(MU=EST SIGMA=EST L=2 COLOR=RED);
RUN; QUIT;
[Figure: histogram of the residuals (percent) with fitted normal curve]
[Figure: normal probability plot of the residuals against normal percentiles]
Exercise: Regression and graphics I
In this exercise we want to study the effect of age on the SIGF1 level in the Juul data.
1 Get the data into SAS using a LIBNAME statement.
2 Create a new data set containing only prepubertal children (Tanner stage 1 and age > 5).
3 Use PROC GPLOT to plot the relationship between SIGF1 and age for prepubertal children. Add regression lines or smooth curves.
4 Use PROC REG to do a regression analysis of SIGF1 vs. age for prepubertal children.
5 Use OUTPUT OUT=WORK.check .. to make a data set with residuals and use these to evaluate model fit.
More about scatter plots in SAS
PROC GPLOT in the simplest form
PROC GPLOT DATA=dat.bp;
PLOT bp*obese;
RUN; QUIT;
[Figure: default scatter plot of bp against obese]
More about scatter plots in SAS
More code can give us nicer output:
AXIS1 LABEL=(H=3 F=swissbe "Obese")
VALUE=(H=2)
ORDER =(0.8 TO 2.4 BY 0.2)
MINOR=NONE
OFFSET =(3,2)CM;
AXIS2 LABEL=(A=90 H=2 F=SWISSBI "Blood pressure")
VALUE=(H=2)
ORDER =(90 TO 210 BY 10)
LENGTH =12CM;
PROC GPLOT DATA=dat.bp;
PLOT bp*obese / HAXIS=AXIS1 VAXIS=AXIS2;
SYMBOL1 C=RED V=DOT H=2 I=SM45S L=1 W=3;
LEGEND1 LABEL=(H=2.5) VALUE=(H=2 JUSTIFY=LEFT);
TITLE1 F=CSCRIPT H=3 "BP and "
F="Times New Roman" H=2 "obese";
RUN; QUIT;
[Figure: scatter plot of bp against obese with customised axes, red dots, and a smooth curve]
More about scatter plots in SAS
AXIS1 LABEL=(H=3 F=swissbe "obese")
VALUE=(H=2)
ORDER =(0.8 TO 2.4 BY 0.2)
MINOR=NONE
OFFSET =(1,1)CM;
AXIS2 LABEL=(A=90 H=2 F=SWISSBI "Blood pressure")
VALUE=(H=2)
ORDER =(90 TO 210 BY 10)
LENGTH =12CM;
PROC GPLOT DATA=dat.bp;
PLOT bp*obese = sexnr / HAXIS=AXIS1 VAXIS=AXIS2;
SYMBOL1 C=RED V=CIRCLE H=2 I=SM85S L=1 W=3;
SYMBOL2 C=BLUE V=DOT H=2 I=SM85S L=7 W=3;
LEGEND1 LABEL=(H=2) VALUE =(H=2 JUSTIFY=LEFT);
TITLE1 F=CSCRIPT H=2 F="Times New Roman" "BP and obese";
RUN; QUIT;
Use a third variable in the PLOT statement and include AXIS, SYMBOL, LEGEND, and TITLE statements.
[Figure: scatter plot of bp against obese titled "BP and obese", with separate symbols and smooth curves for sexnr = 1 and sexnr = 2]
Symbol statements
One SYMBOL statement for each group (each value of sexnr). Options:
C=RED: color of points and lines (BLACK, RED, BLUE, ...)
V=DOT: plotting symbol (CIRCLE, DOT, STAR, ...)
H=2: size of plotting symbol (default 1)
I=SM25S: interpolation method (NONE, JOIN, SM75S, RL, RLCLI95, RLCLM95, ...)
L=1: line type (1: solid, 2-46: dashed)
W=3: line three times thicker than standard
MODE=INCLUDE: observations outside the range specified in the ORDER= option of the AXIS statement are used in the interpolation (default is MODE=EXCLUDE)
Plotting symbols
V= or VALUE= in SYMBOL statements
Interpolation methods
I= or INTERPOL= in SYMBOL statements
Line types
L= or LINE= in SYMBOL statements
Fonts in SAS
F= or FONT= in SYMBOL, AXIS or TITLE statements
or specify font names in quotes (e.g. "Times New Roman"). See the choices under Tools → Options → Enhanced Editor → Appearance → Font Name.
AXIS specifications
HAXIS=AXIS1 VAXIS=AXIS2: the horizontal axis uses AXIS1 and the vertical axis uses AXIS2 (AXIS1 and AXIS2 must be prespecified)
LABEL=(A=90 H=2 ".."): specifies the axis text, the size of the letters, and the direction of the text (A=90: text string rotated 90 degrees counterclockwise)
VALUE=(H=2): the size of the digits on the axis
ORDER=(135 TO 195 BY 5): specifies the desired range and the specific numbers on the axis
MINOR=9: number of tick marks between the numbers, may be set to NONE
OFFSET=(3,2)CM: leaves 3 cm of empty space to the left and 2 cm of empty space to the right
LOGBASE=10: log-scaled axis with tick marks at powers of 10
LENGTH=12CM: length of the axis
Histograms in PROC UNIVARIATE
PROC UNIVARIATE DATA=dat.bp;
CLASS sexnr;
VAR bp;
HISTOGRAM bp / CFILL=GRAY NORMAL;
INSET MEAN STD SKEWNESS / HEADER="Descriptive stats" FORMAT =8.3;
RUN;
HISTOGRAM: gives a histogram for the variable bp
CFILL=GRAY: bars are gray
CLASS sexnr: one histogram for each gender
NORMAL: the best fitting normal curve is included
Endpoints of the bars in the histogram may also be specified, e.g., ENDPOINTS=1.9 TO 2.3 BY 0.1
INSET MEAN STD SKEWNESS ...: a box with descriptive statistics is included; formats can be used for the descriptive statistics
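A minimal sketch of the ENDPOINTS= option for the bp variable follows. The range 90 to 210 is an assumption taken from the axis of the scatter plots; if some bp values fall outside it, a wider range should be chosen.

```sas
/* Histogram of bp with bar endpoints at 90, 100, ..., 210 */
/* NOPRINT suppresses the tables but not the graph.        */
PROC UNIVARIATE DATA=dat.bp NOPRINT;
  VAR bp;
  HISTOGRAM bp / ENDPOINTS=90 TO 210 BY 10 CFILL=GRAY NORMAL;
RUN;
```

Fixing the endpoints makes histograms from different subgroups directly comparable, since all of them use the same bars.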
[Figure: histograms of bp (percent) with fitted normal curves, one panel for each value of sexnr. Inset descriptive stats: sexnr = 1: mean 127.955, std deviation 16.591, skewness 1.217; sexnr = 2: mean 126.310, std deviation 19.419, skewness 1.574]
Probability plots in PROC UNIVARIATE
PROC UNIVARIATE DATA=dat.bp;
VAR bp;
PROBPLOT bp / HEIGHT =3 NORMAL(MU=EST SIGMA=EST L=41);
INSET MEAN STD SKEWNESS / HEADER="Descriptive stats" FORMAT =8.1;
RUN;
PROBPLOT: gives us a probability plot
HEIGHT=3: size of text
NORMAL(MU=EST SIGMA=EST L=41): adds a (dashed) line for the best fitting normal distribution
CLASS sexnr: separate plot for each gender (in some SAS versions the same fitted line is shown in all plots)
Formats can be used for the descriptive statistics
[Figure: normal probability plot of bp against normal percentiles. Inset descriptive stats: mean 127.0, std deviation 18.2, skewness 1.4]
Box plots
Two different ways: PROC BOXPLOT
PROC SORT DATA=dat.bp;
BY sexnr;
RUN;
PROC BOXPLOT DATA=dat.bp;
PLOT bp*sexnr / HEIGHT =3 BOXSTYLE=SCHEMATIC;
RUN;
or PROC GPLOT
PROC GPLOT DATA=dat.bp;
PLOT bp*sexnr / HAXIS=AXIS1 VAXIS=AXIS2;
AXIS1 LABEL=(H=3) VALUE=(H=2) OFFSET=(6,6)CM ORDER=(1 TO 2 BY 1);
AXIS2 LABEL=(H=3 A=90) VALUE=(H=2);
SYMBOL1 V=CIRCLE H=2 I=BOX10TJ W=3;
RUN; QUIT;
[Figure: box plots of bp for sexnr = 1 and 2, made with PROC BOXPLOT]
[Figure: box plots of bp for sexnr = 1 and 2, made with PROC GPLOT]
PROC BOXPLOT
PROC SORT DATA=dat.bp;
BY sexnr;
RUN;
PROC BOXPLOT DATA=dat.bp;
PLOT bp*sexnr / HEIGHT =3 BOXSTYLE=SCHEMATIC;
RUN;
bp*sexnr: distribution of bp for each value of sexnr
Data must be sorted by sexnr
HEIGHT=3: size of text
BOXSTYLE=SCHEMATIC: type of box plot (there are several). SCHEMATIC means that the whiskers extend to the most extreme points within 1.5 × the interquartile range, measured from the end of the box. (If data are perfectly normally distributed, these limits lie about 2.7 standard deviations from the mean, close to the 0.35 and 99.65 percentiles; otherwise this is not really relevant.)
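A quick numerical check of where the schematic whisker limits fall under a standard normal distribution can be done in a DATA step. This is only an illustrative sketch; the variable names are made up, and the results are written to the log.

```sas
/* Whisker limit (upper fence) of a schematic box plot for N(0,1) */
DATA _NULL_;
  q3    = QUANTILE('NORMAL', 0.75);        /* upper quartile          */
  iqr   = 2*q3;                            /* interquartile range     */
  fence = q3 + 1.5*iqr;                    /* end of box + 1.5 x IQR  */
  p_out = 2*(1 - CDF('NORMAL', fence));    /* share outside both ends */
  PUT q3= iqr= fence= p_out=;
RUN;
```

The fence comes out at about 2.7 standard deviations, so under exact normality only about 0.7% of the observations would be drawn as separate points beyond the whiskers.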
Box plot in PROC GPLOT
PROC GPLOT DATA=dat.bp;
PLOT bp*sexnr / HAXIS=AXIS1 VAXIS=AXIS2;
AXIS1 LABEL=(H=3) VALUE=(H=2) OFFSET=(6,6)CM ORDER=(1 TO 2 BY 1);
AXIS2 LABEL=(H=3 A=90) VALUE=(H=2);
SYMBOL1 V=CIRCLE H=2 I=BOX10TJ W=3;
RUN; QUIT;
bp*sexnr: the distribution of bp for each value of sexnr
OFFSET=(6,6)CM: needed, otherwise the first and last box will be placed adjacent to the left and right border of the plot
I=BOX10TJ: 10 means that the whiskers show the 10th and 90th percentiles (any number between 00 and 25 can be used), T means that there are serifs at the ends of the whiskers, and J means that the medians are joined by a line (more interesting when the x-variable has more than two levels)
W=3: thickness of the lines used to draw the boxes (and the joining line)
How to save graphics and tables
SAS can use the 'output delivery system' (ODS) to direct output from a SAS procedure to other files
ODS RTF FILE='p:\eksempel.rtf' STYLE=Journal BODYTITLE;
PROC GPLOT;
---;
RUN; QUIT;
PROC FREQ;
---;
RUN;
ODS RTF CLOSE;
This generates a file in 'rich text format' (RTF), which can be opened by Word. An alternative is PDF.
STYLE=Journal: a predefined style that looks a little like the presentation in scientific papers.
BODYTITLE: SAS titles and footnotes are placed in the body of the document rather than in the page header and page footer.
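The PDF alternative follows the same open and close pattern. A minimal sketch, where the file name p:\eksempel.pdf is a hypothetical path chosen to mirror the RTF example:

```sas
/* Send the output of one plot to a PDF file instead of RTF */
ODS PDF FILE='p:\eksempel.pdf' STYLE=Journal;   /* hypothetical file name */
PROC GPLOT DATA=dat.bp;
  PLOT bp*obese;
RUN; QUIT;
ODS PDF CLOSE;
```

Everything produced between the opening statement and ODS PDF CLOSE ends up in the file, so several procedures can be collected in one document.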
Exercise: Regression and graphics II
1 Download the bp.sas7bdat data set from the homepage and get it into SAS using a LIBNAME statement.
2 Use graphical methods to answer the questions:
Is there an association between obese and sexnr?
Is there an association between bp and sexnr?
Is there an association between obese and bp?
3 Use ODS RTF to save the plots in an RTF file that can be opened using Word.