
Biostat Methods STAT 5820/6910 – Handout #5: Logistic Regression

(with Overdispersion, Separation of Points, and Inverse Interval Estimation)

Example 1: 102 patients with acute myelogenous leukemia (AML) in remission were enrolled in a study of a new anti-relapse treatment (ACT). Patients were randomly assigned to receive a 10-day infusion of ACT or a placebo (PBO), and effects were followed for 90 days. Of interest was whether or not each patient suffered a major 'relapse' during the 90 days, defined to include relapse, death, or a major intervention such as a bone marrow transplant. The time of remission from diagnosis or prior relapse ('x', in months) at study enrollment was considered an important covariate in predicting relapse. Is there any evidence that ACT leads to a decreased relapse rate compared to PBO?

                               Relapse (y)
                            No (0)    Yes (1)
Treatment (trt)   PBO (0)     20        30
                  ACT (1)     29        23

/* Define options */
ods html image_dpi=300 style=journal;

data aml; input group $ x relapse $ @@;
trt = (group='ACT');   /* 1 = ACT, 0 = PBO */
y = (relapse='Y');     /* 1 = relapse, 0 = no relapse */
label x = 'Months in Remission';
cards;
ACT 3 N ACT 3 Y ACT 3 Y ACT 6 Y ACT 15 N ACT 6 Y
ACT 6 Y ACT 6 Y ACT 15 N ACT 15 N ACT 12 N ACT 18 N
ACT 6 Y ACT 15 N ACT 6 Y ACT 15 N ACT 12 Y ACT 9 N
ACT 6 Y ACT 6 N ACT 6 N ACT 6 N ACT 3 Y ACT 18 N
ACT 9 N ACT 12 Y ACT 6 N ACT 9 Y ACT 9 Y ACT 3 N
ACT 9 Y ACT 12 N ACT 12 N ACT 3 N ACT 12 N ACT 12 N
ACT 12 N ACT 9 Y ACT 6 Y ACT 12 N ACT 6 N ACT 15 Y
ACT 9 N ACT 3 Y ACT 9 N ACT 9 N ACT 9 N ACT 9 N
ACT 9 Y ACT 12 Y ACT 3 Y ACT 6 Y PBO 9 Y PBO 3 N
PBO 12 Y PBO 3 Y PBO 3 Y PBO 15 Y PBO 9 Y PBO 12 Y
PBO 3 Y PBO 9 Y PBO 15 Y PBO 9 Y PBO 6 Y PBO 9 Y
PBO 6 Y PBO 12 N PBO 9 N PBO 15 N PBO 15 Y PBO 9 N
PBO 9 N PBO 12 Y PBO 3 Y PBO 6 Y PBO 6 Y PBO 12 N
PBO 12 N PBO 12 Y PBO 3 Y PBO 12 Y PBO 3 Y PBO 12 Y
PBO 6 Y PBO 6 Y PBO 9 Y PBO 15 N PBO 15 N PBO 12 N
PBO 9 N PBO 12 N PBO 15 N PBO 18 Y PBO 12 N PBO 15 Y
PBO 15 N PBO 15 N PBO 18 N PBO 18 Y PBO 18 N PBO 18 N
;


/* Run usual chi-square test */
proc freq data=aml;
tables trt*y / chisq nopercent nocol;
title1 'Chi-square test of association';
title2 '(ignoring covariate)';
run;

Chi-square test of association
(ignoring covariate)

The FREQ Procedure

Frequency
Row Pct

Table of trt by y

trt        y = 0     y = 1     Total
0          20        30        50
           40.00     60.00
1          29        23        52
           55.77     44.23
Total      49        53        102

Statistics for Table of trt by y

Statistic                      DF   Value     Prob
Chi-Square                     1    2.5394    0.1110
Likelihood Ratio Chi-Square    1    2.5505    0.1103
Continuity Adj. Chi-Square     1    1.9469    0.1629
Mantel-Haenszel Chi-Square     1    2.5145    0.1128
Phi Coefficient                     -0.1578
Contingency Coefficient              0.1559
Cramer's V                          -0.1578

Fisher's Exact Test
Cell (1,1) Frequency (F)   20
Left-sided Pr <= F         0.0813
Right-sided Pr >= F        0.9637
Table Probability (P)      0.0450
Two-sided Pr <= P          0.1189


/* Do equivalent test in logistic regression */
proc logistic data=aml;
model y(event='1') = trt;
title1 'Logistic regression';
title2 '(ignoring covariate)';
run;

Logistic regression
(ignoring covariate)

Response Profile
Ordered Value   y   Total Frequency
1               0   49
2               1   53

Probability modeled is y=1.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         143.245          142.695
SC          145.870          147.945
-2 Log L    141.245          138.695

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   2.5505       1    0.1103
Score              2.5394       1    0.1110
Wald               2.5178       1    0.1126

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1     0.4055    0.2887           1.9728            0.1602
trt         1    -0.6373    0.4016           2.5178            0.1126

Odds Ratio Estimates
Effect   Point Estimate   95% Wald Confidence Limits
trt      0.529            0.241   1.162


/* Fit logistic regression model with covariate */
proc logistic data=aml plots(only)=roc;
model y(event='1') = trt x;
title2 '(accounting for covariate)';
run;

Logistic regression
(accounting for covariate)

Response Profile
Ordered Value   y   Total Frequency
1               0   49
2               1   53

Probability modeled is y=1.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         143.245          129.376
SC          145.870          137.251
-2 Log L    141.245          123.376

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   17.8687      2    0.0001
Score              16.4848      2    0.0003
Wald               14.0612      2    0.0009

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1     2.6135    0.7149           13.3662           0.0003
trt         1    -1.1191    0.4669            5.7446           0.0165
x           1    -0.1998    0.0560           12.7187           0.0004

Odds Ratio Estimates
Effect   Point Estimate   95% Wald Confidence Limits
trt      0.327            0.131   0.815
x        0.819            0.734   0.914


Association of Predicted Probabilities and Observed Responses
Percent Concordant   68.5    Somers' D   0.454
Percent Discordant   23.1    Gamma       0.496
Percent Tied          8.4    Tau-a       0.229
Pairs                2597    c           0.727

/* Fit equivalent logistic regression model,
   and look at 'dose-response' curves
   for each level of group variable */
proc logistic data=aml;
class group;
model y(event='1') = group x;
effectplot fit(plotby=group x=x);
title2 '(with dose-response curve)';
run;


Logistic regression
(with dose-response curve)

Analysis of Maximum Likelihood Estimates
Parameter       DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept       1     2.0539    0.5967           11.8477           0.0006
group     ACT   1    -0.5595    0.2335            5.7446           0.0165
x               1    -0.1998    0.0560           12.7187           0.0004

Odds Ratio Estimates
Effect             Point Estimate   95% Wald Confidence Limits
group ACT vs PBO   0.327            0.131   0.815
x                  0.819            0.734   0.914


/***************************************/
/* Look at inverse interval estimation */
/***************************************/

/* First get 'weighted' version of data */
proc sort data=aml; by trt x;
proc means data=aml sum n noprint;
by trt x;
var y;
/* total = number of patients, resp = number who relapsed, per (trt, x) group */
output out=out1 n=total sum=resp;
proc print data=out1;
title1 'Weighted version of AML data';
run;

Weighted version of AML data

Obs   trt    x   _TYPE_   _FREQ_   total   resp
  1     0    3      0        7        7       6
  2     0    6      0        6        6       6
  3     0    9      0       10       10       6
  4     0   12      0       12       12       6
  5     0   15      0       10       10       4
  6     0   18      0        5        5       2
  7     1    3      0        8        8       5
  8     1    6      0       14       14       9
  9     1    9      0       12       12       5
 10     1   12      0       10       10       3
 11     1   15      0        6        6       1
 12     1   18      0        2        2       0

/* Put the 'weighted' data in order with trt=1 first.
   With the ORDER=DATA option in PROC PROBIT, trt=1 then
   occurs first in the data set and gets its own indicator
   parameter, while trt=0 (the last level) becomes the
   reference level. */
proc sort data=out1; by descending trt;
run;


/* Get (and plot) inverse intervals for response probabilities
   when trt=0 [need to give an x-level (6 here), but not used] */
data trt0; input trt x;
cards;
0 6
;
proc probit data=out1 order=data xdata=trt0 plot=ippplot;
class trt;
model resp/total = trt x / d=logistic inversecl lackfit covb;
/* NOTE: Put the 'dose' variable as the first continuous
   (or non-CLASS) variable in the MODEL statement.
   INVERSECL applies to the first continuous predictor;
   all predictors must have levels set in the XDATA set */
title1 'Inverse Interval Estimation (trt=0; PBO)';
run;

Inverse Interval Estimation
(trt=0; PBO)

The Probit Procedure

Model Information
Data Set                      WORK.OUT1
Events Variable               resp
Trials Variable               total
Number of Observations        12
Number of Events              53
Number of Trials              102
Name of Distribution          Logistic
Log Likelihood                -61.68822985

Number of Observations Read   12
Number of Observations Used   12
Number of Events              53
Number of Trials              102

Algorithm converged.

Goodness-of-Fit Tests
Statistic            Value    DF   Value/DF   Pr > ChiSq
Pearson Chi-Square   3.2825   9    0.3647     0.9520
L.R. Chi-Square      4.5899   9    0.5100     0.8685

Note: Since the Pearson Chi-Square is small (p > 0.1000), fiducial limits will be calculated using a z value of 1.96.

Response-Covariate Profile
Response Levels              2
Number of Covariate Values   12

Type III Analysis of Effects
Effect   DF   Wald Chi-Square   Pr > ChiSq
trt      1     5.7449           0.0165
x        1    12.7192           0.0004


Analysis of Maximum Likelihood Parameter Estimates
Parameter      DF   Estimate   Standard Error   95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept      1     2.6136    0.7149            1.2125    4.0147       13.37        0.0003
trt        1   1    -1.1191    0.4669           -2.0343   -0.2040        5.74        0.0165
trt        0   0     0.0000    .                 .         .             .           .
x              1    -0.1998    0.0560           -0.3095   -0.0900       12.72        0.0004

Estimated Covariance Matrix
             Intercept        trt1           x
Intercept     0.511028   -0.209445   -0.035981
trt1         -0.209445    0.218010    0.009684
x            -0.035981    0.009684    0.003137

Probit Analysis on x
Probability         x      95% Fiducial Limits
0.01          36.0862    27.0084    66.4687
0.02          32.5655    24.6797    58.7092
0.03          30.4845    23.2938    54.1320
…
0.35          16.1822    12.9927    23.4513
0.40          15.1131    12.0206    21.3599
…
0.97          -4.3177   -24.1121     1.8153
0.98          -6.3988   -28.6721     0.4122
0.99          -9.9194   -36.4121    -1.9359


/* Get (and plot) inverse intervals for response probabilities
   when trt=1 [need to give an x-level (6 here), but not used] */
data trt1; input trt x;
cards;
1 6
;
proc probit data=out1 order=data xdata=trt1 plot=ippplot;
class trt;
model resp/total = trt x / d=logistic inversecl lackfit covb;
title1 'Inverse Interval Estimation (trt=1; ACT)';
run;

NOTE: Output here is the same as for the trt=0 case, except for the “95% Fiducial Limits” table and the corresponding figure.


Example 2: Erectile Dysfunction Data

48 male subjects in an anti-impotence study had experienced erectile dysfunction following prostate surgery. Subjects were randomly assigned to receive a new drug (trt=1) or placebo (trt=0) and kept a diary for one month, recording the number of attempts at sexual intercourse after taking the medication and the number of those attempts that were successful. Subject age was also recorded.

Does the new drug have a higher success rate than the placebo?

data ED; input trt age successes attempts @@;
ID = _n_;
cards;
0 41 3 6 1 57 3 8
0 44 5 15 1 54 10 12
0 62 0 4 1 65 0 0
0 44 1 2 1 51 5 8
0 70 3 8 1 53 8 10
0 35 4 8 1 44 17 22
0 72 1 6 1 66 2 3
0 34 5 15 1 55 9 11
0 61 1 7 1 37 6 8
0 35 5 5 1 40 2 4
0 52 6 8 1 44 9 16
0 66 1 7 1 64 5 9
0 35 4 10 1 78 1 3
0 61 4 8 1 51 6 12
0 55 2 5 1 67 5 11
0 41 7 9 1 44 3 3
0 53 2 4 1 65 7 18
0 72 4 6 1 69 0 2
0 58 0 0 1 53 4 14
0 56 12 17 1 49 5 8
0 53 8 15 1 74 10 15
0 45 3 4 1 39 4 9
0 40 14 20 1 35 8 10
1 47 4 5
1 46 6 7
;


proc logistic data=ED;
model successes/attempts = trt age;
title1 'ED Data Analysis';
run;

ED Data Analysis

Number of Observations Read   48
Number of Observations Used   46
Sum of Frequencies Read       417
Sum of Frequencies Used       417

Response Profile
Ordered Value   Binary Outcome   Total Frequency
1               Event            234
2               Nonevent         183

Note: 2 observations with invalid response values have been deleted. Either the number of trials was less than or equal to zero or less than the number of events, or the number of events was negative.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1     1.2111    0.4597           6.9410            0.0084
trt         1     0.5265    0.2041           6.6574            0.0099
age         1    -0.0243    0.00881          7.5816            0.0059

Odds Ratio Estimates
Effect   Point Estimate   95% Wald Confidence Limits
trt      1.693            1.135   2.526
age      0.976            0.959   0.993


/* Check whether the subject strata (and the associated
   dependence among each subject's attempts) have caused overdispersion */
proc logistic data=ED;
model successes/attempts = trt age / scale=pearson;
output out=out1 p=phat;
title1 'ED Data Analysis';
title2 '(Also Check for Overdispersion)';
run;

ED Data Analysis
(Also Check for Overdispersion)

Note: 2 observations with invalid response values have been deleted. Either the number of trials was less than or equal to zero or less than the number of events, or the number of events was negative.

Deviance and Pearson Goodness-of-Fit Statistics
Criterion   Value     DF   Value/DF   Pr > ChiSq
Deviance    70.3355   43   1.6357     0.0053
Pearson     63.7235   43   1.4819     0.0216

Number of events/trials observations: 46

Note: The covariance matrix has been multiplied by the heterogeneity factor (Pearson Chi-Square / DF) 1.48194.

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1     1.2111    0.5596           4.6837            0.0305
trt         1     0.5265    0.2484           4.4923            0.0340
age         1    -0.0243    0.0107           5.1160            0.0237


Example 3: Menopause Data

For 370 female patients, age and menopause status (menopause=1 for post-menopausal, 0 otherwise) are recorded. Age is also categorized into a variable agecat: 1 for age < 50, 2 for 50 ≤ age < 60, 3 for 60 ≤ age < 70, and 4 for 70 ≤ age. How does the menopause rate depend on age?

filename myurl url "http://www.stat.usu.edu/jrstevens/biostat/data/bcancer.csv" lrecl=800;
data bcancer;
infile myurl dsd delimiter = "," firstobs=2 missover;
input menopause age agecat;
run;

data bcancer; set bcancer;
ID = _n_;
run;

/* Run regular logistic regression */
proc logistic data=bcancer plots=(roc effect);
model menopause(event='1') = age;
title1 'Logistic regression on menopause data';
run;


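Logistic regression on menopause data

Response Profile
Ordered Value   menopause   Total Frequency
1               0            59
2               1           301

Probability modeled is menopause=1.

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   124.1456     1    <.0001
Score               81.0669     1    <.0001
Wald                49.7646     1    <.0001

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1    -12.8675   1.9360           44.1735           <.0001
age         1      0.2829   0.0401           49.7646           <.0001

Odds Ratio Estimates
Effect   Point Estimate   95% Wald Confidence Limits
age      1.327            1.227   1.436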


/* What would happen if the trend were even more 'clear'? */
data new; set bcancer;
if menopause = 1 & age < 57 then delete;
if menopause = 0 & age > 55 then delete;
proc logistic data=new plots=(roc effect);
model menopause(event='1') = age;
title1 'Logistic regression on menopause subset';
run;

Logistic regression on menopause subset

Response Profile
Ordered Value   menopause   Total Frequency
1               0            58
2               1           180

Probability modeled is menopause=1.

Note: 1 observation was deleted due to missing values for the response or explanatory variables.

Model Convergence Status
Complete separation of data points detected.

Warning: The maximum likelihood estimate does not exist.
Warning: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   263.5557     1    <.0001
Score              153.7178     1    <.0001
Wald                 5.4176     1    0.0199

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1    -131.3     56.7926          5.3459            0.0208
age         1      2.3737    1.0198          5.4176            0.0199


/* How to fix this? One option is Firth's penalized likelihood (FIRTH option) */
proc logistic data=new plots=(roc effect);
model menopause(event='1') = age / firth;
title1 'Bias-corrected logistic regression on menopause subset';
run;

Bias-corrected logistic regression on menopause subset

Model Information
Data Set             WORK.NEW
Likelihood Penalty   Firth's bias correction

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1    -88.0491   28.3566          9.6414            0.0019
age         1      1.5948    0.5078          9.8639            0.0017


/* A simple fix for separation with one continuous
   predictor */
data new; set new;
dummy = (age >= 56);
proc logistic data=new;
model menopause(event='1') = dummy;
title1 'logistic with dummy';
run;

logistic with dummy

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1    -10.0502   19.9843          0.2529            0.6150
dummy       1     19.6361   21.9147          0.8029            0.3702

proc freq data=new;
tables dummy*menopause / chisq norow nocol nopercent;
title1 'chi-square with dummy';
run;

chi-square with dummy

Table of dummy by menopause

dummy   menopause = 0   menopause = 1   Total
0       58              0               58
1       0               180             180
Total   58              180             238

Statistics for Table of dummy by menopause
Statistic    DF   Value      Prob
Chi-Square   1    238.0000   <.0001

Effective Sample Size = 238
Frequency Missing = 1
