


Biostat Methods STAT 5820/6910 – Handout #5: Logistic Regression

(with Overdispersion, Separation of Points, and Inverse Interval Estimation)

Example 1: 102 patients with acute myelogenous leukemia (AML) in remission were
enrolled in a study of a new anti-relapse treatment (ACT). Patients were randomly
assigned to receive a 10-day infusion of ACT or a placebo (PBO) and were followed
for 90 days. The outcome of interest was whether a patient suffered a major
'relapse' event during the 90 days, defined as relapse, death, or a major intervention
such as bone marrow transplant. The time in remission since diagnosis or prior relapse
('x', in months) at study enrollment was considered an important covariate in
predicting relapse. Is there any evidence that ACT leads to a decreased relapse rate
compared to PBO?

                     Relapse (y)
Treatment (trt)      No (0)    Yes (1)
PBO (0)              20        30
ACT (1)              29        23

/* Define options */

ods html image_dpi=300 style=journal;

data aml; input group $ x relapse $ @@;

trt = (group='ACT');

y = (relapse='Y');

label x = 'Months in Remission';

cards;

ACT 3 N ACT 3 Y ACT 3 Y ACT 6 Y ACT 15 N ACT 6 Y

ACT 6 Y ACT 6 Y ACT 15 N ACT 15 N ACT 12 N ACT 18 N

ACT 6 Y ACT 15 N ACT 6 Y ACT 15 N ACT 12 Y ACT 9 N

ACT 6 Y ACT 6 N ACT 6 N ACT 6 N ACT 3 Y ACT 18 N

ACT 9 N ACT 12 Y ACT 6 N ACT 9 Y ACT 9 Y ACT 3 N

ACT 9 Y ACT 12 N ACT 12 N ACT 3 N ACT 12 N ACT 12 N

ACT 12 N ACT 9 Y ACT 6 Y ACT 12 N ACT 6 N ACT 15 Y

ACT 9 N ACT 3 Y ACT 9 N ACT 9 N ACT 9 N ACT 9 N

ACT 9 Y ACT 12 Y ACT 3 Y ACT 6 Y PBO 9 Y PBO 3 N

PBO 12 Y PBO 3 Y PBO 3 Y PBO 15 Y PBO 9 Y PBO 12 Y

PBO 3 Y PBO 9 Y PBO 15 Y PBO 9 Y PBO 6 Y PBO 9 Y

PBO 6 Y PBO 12 N PBO 9 N PBO 15 N PBO 15 Y PBO 9 N

PBO 9 N PBO 12 Y PBO 3 Y PBO 6 Y PBO 6 Y PBO 12 N

PBO 12 N PBO 12 Y PBO 3 Y PBO 12 Y PBO 3 Y PBO 12 Y

PBO 6 Y PBO 6 Y PBO 9 Y PBO 15 N PBO 15 N PBO 12 N

PBO 9 N PBO 12 N PBO 15 N PBO 18 Y PBO 12 N PBO 15 Y

PBO 15 N PBO 15 N PBO 18 N PBO 18 Y PBO 18 N PBO 18 N

;


/* Run usual chi-square test */

proc freq data=aml;

tables trt*y / chisq nopercent nocol;

title1 'Chi-square test of association';

title2 '(ignoring covariate)';

run;

Chi-square test of association

(ignoring covariate)

The FREQ Procedure

Table of trt by y

Cell contents: Frequency, Row Pct

trt        y
           0         1         Total
0          20        30        50
           40.00     60.00
1          29        23        52
           55.77     44.23
Total      49        53        102

Statistics for Table of trt by y

Statistic                      DF    Value      Prob
Chi-Square                     1     2.5394     0.1110
Likelihood Ratio Chi-Square    1     2.5505     0.1103
Continuity Adj. Chi-Square     1     1.9469     0.1629
Mantel-Haenszel Chi-Square     1     2.5145     0.1128
Phi Coefficient                      -0.1578
Contingency Coefficient              0.1559
Cramer's V                           -0.1578

Fisher's Exact Test

Cell (1,1) Frequency (F) 20

Left-sided Pr <= F 0.0813

Right-sided Pr >= F 0.9637

Table Probability (P) 0.0450

Two-sided Pr <= P 0.1189
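As a cross-check outside SAS, the Pearson chi-square statistic reported above can be reproduced from the four cell counts alone. The following is a minimal Python sketch, not part of the handout's SAS workflow; the function names `pearson_chisq_2x2` and `chisq1_pvalue` are made up for illustration. It uses the shortcut formula for a 2x2 table and the identity that the upper tail of a 1-df chi-square at x equals erfc(sqrt(x/2)).

```python
import math

def pearson_chisq_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

def chisq1_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Rows: PBO, ACT; columns: no relapse, relapse (counts from the table above)
chi2 = pearson_chisq_2x2(20, 30, 29, 23)
p_value = chisq1_pvalue(chi2)
print(round(chi2, 4), round(p_value, 4))
```

The printed values should agree with the "Chi-Square" row of the PROC FREQ output up to rounding.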


/* Do equivalent test in logistic regression */

proc logistic data=aml;

model y(event='1') = trt;

title1 'Logistic regression';

title2 '(ignoring covariate)';

run;

Logistic regression

(ignoring covariate)

Response Profile

Ordered Value    y    Total Frequency
1                0    49
2                1    53

Probability modeled is y=1.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Criterion    Intercept Only    Intercept and Covariates
AIC          143.245           142.695
SC           145.870           147.945
-2 Log L     141.245           138.695

Testing Global Null Hypothesis: BETA=0

Test Chi-Square DF Pr > ChiSq

Likelihood Ratio 2.5505 1 0.1103

Score 2.5394 1 0.1110

Wald 2.5178 1 0.1126

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept    1     0.4055      0.2887            1.9728             0.1602
trt          1     -0.6373     0.4016            2.5178             0.1126

Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits
trt       0.529             0.241    1.162
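With a single 0/1 predictor, the logistic fit simply reproduces the observed odds in each arm, which is why this analysis matches the chi-square test so closely. The Python sketch below (an illustration outside SAS; variable names are arbitrary, and z = 1.96 is the usual normal approximation) rebuilds the intercept, the trt coefficient, its standard error, the Wald chi-square, and the Wald limits for the odds ratio from the 2x2 counts.

```python
import math

# Cell counts from the trt-by-y table: (no relapse, relapse) in each arm
pbo_no, pbo_yes = 20, 30
act_no, act_yes = 29, 23

# Intercept = log odds of relapse under PBO; trt coefficient = log odds ratio
intercept = math.log(pbo_yes / pbo_no)
log_or = math.log((act_yes / act_no) / (pbo_yes / pbo_no))

# Large-sample standard error of a log odds ratio from a 2x2 table
se = math.sqrt(1 / pbo_no + 1 / pbo_yes + 1 / act_no + 1 / act_yes)

z = 1.96  # approximate 97.5th percentile of the standard normal
odds_ratio = math.exp(log_or)
ci_low, ci_high = math.exp(log_or - z * se), math.exp(log_or + z * se)
wald_chisq = (log_or / se) ** 2

print(round(intercept, 4), round(log_or, 4), round(se, 4))
print(round(odds_ratio, 3), round(ci_low, 3), round(ci_high, 3), round(wald_chisq, 4))
```

The results should agree with the "Analysis of Maximum Likelihood Estimates" and "Odds Ratio Estimates" tables up to rounding.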


/* Fit logistic regression model with covariate */

proc logistic data=aml plots(only)=roc;

model y(event='1') = trt x ;

title1 'Logistic regression';

title2 '(accounting for covariate)';

run;

Logistic regression

(accounting for covariate)

Response Profile

Ordered Value    y    Total Frequency
1                0    49
2                1    53

Probability modeled is y=1.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Criterion    Intercept Only    Intercept and Covariates
AIC          143.245           129.376
SC           145.870           137.251
-2 Log L     141.245           123.376

Testing Global Null Hypothesis: BETA=0

Test Chi-Square DF Pr > ChiSq

Likelihood Ratio 17.8687 2 0.0001

Score 16.4848 2 0.0003

Wald 14.0612 2 0.0009

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept    1     2.6135      0.7149            13.3662            0.0003
trt          1     -1.1191     0.4669            5.7446             0.0165
x            1     -0.1998     0.0560            12.7187            0.0004

Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits
trt       0.327             0.131    0.815
x         0.819             0.734    0.914
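The estimates above can be read as a fitted equation. The worked lines below restate the output and are only a reading aid; the choice of x = 6 months for the two predicted probabilities is an arbitrary illustration, not something the handout singles out.

```latex
\begin{align*}
\operatorname{logit}\hat{p} &= 2.6135 - 1.1191\,\mathrm{trt} - 0.1998\,x \\
\widehat{\mathrm{OR}}_{\mathrm{trt}} &= e^{-1.1191} \approx 0.327,
\qquad
\widehat{\mathrm{OR}}_{x} = e^{-0.1998} \approx 0.819 \\
\text{LR statistic} &= 141.245 - 123.376 = 17.869 \quad (2\ \text{df}) \\
\hat{p}(\mathrm{trt}=0,\ x=6) &= \frac{1}{1 + e^{-(2.6135 - 0.1998 \cdot 6)}} \approx 0.80 \\
\hat{p}(\mathrm{trt}=1,\ x=6) &= \frac{1}{1 + e^{-(2.6135 - 1.1191 - 0.1998 \cdot 6)}} \approx 0.57
\end{align*}
```

Once months in remission is accounted for, the treatment effect becomes significant at the 5% level (p = 0.0165), whereas the unadjusted comparison was not (p = 0.1126).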


Association of Predicted Probabilities and Observed Responses

Percent Concordant    68.5    Somers' D    0.454
Percent Discordant    23.1    Gamma        0.496
Percent Tied          8.4     Tau-a        0.229
Pairs                 2597    c            0.727

/* Fit equivalent logistic regression model,

and look at 'dose-response' curves

for each level of group variable */

proc logistic data=aml;

class group;

model y(event='1') = group x ;

effectplot fit(plotby=group x=x);

title1 'Logistic regression';

title2 '(with dose-response curve)';

run;


Logistic regression

(with dose-response curve)

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept    1     2.0539      0.5967            11.8477            0.0006
group ACT    1     -0.5595     0.2335            5.7446             0.0165
x            1     -0.1998     0.0560            12.7187            0.0004

Odds Ratio Estimates

Effect              Point Estimate    95% Wald Confidence Limits
group ACT vs PBO    0.327             0.131    0.815
x                   0.819             0.734    0.914
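The group coefficient here is half the trt coefficient from the earlier fit, yet the odds ratio is unchanged. A likely explanation, assuming the CLASS statement used PROC LOGISTIC's default effect parameterization (the code sets no PARAM= option), is that group is coded +1 for ACT and -1 for PBO. Under that assumption the two fits are the same model:

```latex
\begin{align*}
\operatorname{logit}\hat{p} &= 2.0539 - 0.5595\,g - 0.1998\,x,
\qquad g = \begin{cases} +1 & \text{ACT} \\ -1 & \text{PBO} \end{cases} \\
\log\widehat{\mathrm{OR}}_{\text{ACT vs PBO}} &= (-0.5595)(+1) - (-0.5595)(-1) = -1.1190 \\
\widehat{\mathrm{OR}}_{\text{ACT vs PBO}} &= e^{-1.1190} \approx 0.327 \\
\text{PBO intercept} &= 2.0539 + 0.5595 = 2.6134 \approx 2.6135
\end{align*}
```

The standard error halves in the same way (0.4669 / 2 ≈ 0.2335), so the Wald chi-square and p-value are identical in the two parameterizations.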


/***************************************/

/* Look at inverse interval estimation */

/***************************************/

/* First get 'weighted' version of data */

proc sort data=aml; by trt x;

proc means data=aml sum n noprint;

by trt x;

var y;

output out=out1 n=total sum=resp;

proc print data=out1;

title1 'Weighted version of AML data';

run;

Weighted version of AML data

Obs trt x _TYPE_ _FREQ_ total resp

1 0 3 0 7 7 6

2 0 6 0 6 6 6

3 0 9 0 10 10 6

4 0 12 0 12 12 6

5 0 15 0 10 10 4

6 0 18 0 5 5 2

7 1 3 0 8 8 5

8 1 6 0 14 14 9

9 1 9 0 12 12 5

10 1 12 0 10 10 3

11 1 15 0 6 6 1

12 1 18 0 2 2 0

/* Get 'weighted' data in order with trt=1 first

-- that way the ORDER=DATA option in PROC PROBIT

will make trt=1 be the indicated factor level

since it will occur first in the data set. */

proc sort data=out1; by descending trt;

run;


/* Get (and plot) inverse intervals for response probabilities

when trt=0 [need to give an x-level (6 here), but not used] */

data trt0; input trt x ;

cards;

0 6

;

proc probit data=out1 order=data xdata=trt0 plot=ippplot;

class trt;

model resp/total = trt x / d=logistic inversecl lackfit

covb;

/* NOTE: Put the 'dose' variable as the first continuous

(or non-CLASS) variable in the MODEL statement.

INVERSECL applies to first continuous predictor;

all predictors must have levels set in XDATA set */

title1 'Inverse Interval Estimation (trt=0; PBO)';

run;

Inverse Interval Estimation (trt=0; PBO)

The Probit Procedure

Model Information

Data Set                  WORK.OUT1
Events Variable           resp
Trials Variable           total
Number of Observations    12
Number of Events          53
Number of Trials          102
Name of Distribution      Logistic
Log Likelihood            -61.68822985

Number of Observations Read    12
Number of Observations Used    12
Number of Events               53
Number of Trials               102

Algorithm converged.

Goodness-of-Fit Tests

Statistic             Value     DF    Value/DF    Pr > ChiSq
Pearson Chi-Square    3.2825    9     0.3647      0.9520
L.R. Chi-Square       4.5899    9     0.5100      0.8685

Note: Since the Pearson Chi-Square is small (p > 0.1000), fiducial limits will be calculated using a z value of 1.96.

Response-Covariate Profile

Response Levels               2
Number of Covariate Values    12

Type III Analysis of Effects

Effect    DF    Wald Chi-Square    Pr > ChiSq
trt       1     5.7449             0.0165
x         1     12.7192            0.0004


Analysis of Maximum Likelihood Parameter Estimates

Parameter    DF    Estimate    Standard Error    95% Confidence Limits    Chi-Square    Pr > ChiSq
Intercept    1     2.6136      0.7149            1.2125     4.0147        13.37         0.0003
trt 1        1     -1.1191     0.4669            -2.0343    -0.2040       5.74          0.0165
trt 0        0     0.0000      .                 .          .             .             .
x            1     -0.1998     0.0560            -0.3095    -0.0900       12.72         0.0004

Estimated Covariance Matrix

Intercept trt1 x

Intercept 0.511028 -0.209445 -0.035981

trt1 -0.209445 0.218010 0.009684

x -0.035981 0.009684 0.003137

Probit Analysis on x

Probability x 95% Fiducial Limits

0.01 36.0862 27.0084 66.4687

0.02 32.5655 24.6797 58.7092

0.03 30.4845 23.2938 54.1320

0.35 16.1822 12.9927 23.4513

0.40 15.1131 12.0206 21.3599

0.97 -4.3177 -24.1121 1.8153

0.98 -6.3988 -28.6721 0.4122

0.99 -9.9194 -36.4121 -1.9359
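The point estimates in the "Probit Analysis on x" table come from inverting the fitted logistic curve: for a target probability p, solve logit(p) = b0 + b1*x for x. The Python sketch below illustrates only that inversion, using the rounded estimates printed above for the reference level trt=0; the helper names `logit` and `inverse_x` are made up. It does not attempt the fiducial limits, which also require the covariance matrix.

```python
import math

# Estimates copied from the PROC PROBIT output above (trt=0 is the reference level)
b0 = 2.6136    # Intercept
b_x = -0.1998  # slope for x (months in remission)

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inverse_x(p, intercept=b0, slope=b_x):
    """Value of x at which the fitted relapse probability equals p."""
    return (logit(p) - intercept) / slope

for p in (0.01, 0.35, 0.40, 0.99):
    print(p, round(inverse_x(p), 2))
```

Because the inputs are rounded to four decimals, the results match the table only to about two decimals. For the ACT arm, the same function can be called with the intercept shifted by the trt coefficient, for example `inverse_x(0.40, intercept=b0 - 1.1191)`.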


/* Get (and plot) inverse intervals for response probabilities

when trt=1 [need to give an x-level (6 here), but not used] */

data trt1; input trt x ;

cards;

1 6

;

proc probit data=out1 order=data xdata=trt1 plot=ippplot;

class trt;

model resp/total = trt x /

d=logistic inversecl lackfit covb;

title1 'Inverse Interval Estimation (trt=1; ACT)';

run;

NOTE: The output here is the same as for the trt=0 case, except for the “95% Fiducial
Limits” table and the corresponding figure (neither is reproduced here).


Example 2: Erectile Dysfunction Data

48 male subjects in an anti-impotence study had experienced erectile dysfunction
following prostate surgery. Subjects were randomly assigned to receive a new drug
(trt=1) or placebo (trt=0) and kept a diary for one month, recording the number of
attempts at sexual intercourse after taking the medication and the number of those
attempts that were successful. Subject age was also recorded.

Does the new drug have a higher success rate than the placebo?

data ED; input trt age successes attempts @@;

ID = _n_;

cards;

0 41 3 6 1 57 3 8

0 44 5 15 1 54 10 12

0 62 0 4 1 65 0 0

0 44 1 2 1 51 5 8

0 70 3 8 1 53 8 10

0 35 4 8 1 44 17 22

0 72 1 6 1 66 2 3

0 34 5 15 1 55 9 11

0 61 1 7 1 37 6 8

0 35 5 5 1 40 2 4

0 52 6 8 1 44 9 16

0 66 1 7 1 64 5 9

0 35 4 10 1 78 1 3

0 61 4 8 1 51 6 12

0 55 2 5 1 67 5 11

0 41 7 9 1 44 3 3

0 53 2 4 1 65 7 18

0 72 4 6 1 69 0 2

0 58 0 0 1 53 4 14

0 56 12 17 1 49 5 8

0 53 8 15 1 74 10 15

0 45 3 4 1 39 4 9

0 40 14 20 1 35 8 10

1 47 4 5

1 46 6 7

;


proc logistic data=ED;

model successes/attempts = trt age;

title1 'ED Data Analysis';

run;

ED Data Analysis

Number of Observations Read 48

Number of Observations Used 46

Sum of Frequencies Read 417

Sum of Frequencies Used 417

Response Profile

Ordered Value    Binary Outcome    Total Frequency
1                Event             234
2                Nonevent          183

Note: 2 observations with invalid response values have been deleted. Either the number of trials was less than or equal to zero or less than the number of events, or the number of events was negative.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept    1     1.2111      0.4597            6.9410             0.0084
trt          1     0.5265      0.2041            6.6574             0.0099
age          1     -0.0243     0.00881           7.5816             0.0059

Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits
trt       1.693             1.135    2.526
age       0.976             0.959    0.993


/* Check whether the subject strata (and the associated
   dependence of observations) have caused overdispersion */

proc logistic data=ED;

model successes/attempts = trt age / scale=pearson;

output out=out1 p=phat;

title1 'ED Data Analysis';

title2 '(Also Check for Overdispersion)';

run;

ED Data Analysis

(Also Check for Overdispersion)

Note: 2 observations with invalid response values have been deleted. Either the number of trials was less than or equal to zero or less than the number of events, or the number of events was negative.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Deviance and Pearson Goodness-of-Fit Statistics

Criterion Value DF Value/DF Pr > ChiSq

Deviance 70.3355 43 1.6357 0.0053

Pearson 63.7235 43 1.4819 0.0216

Number of events/trials observations: 46

Note: The covariance matrix has been multiplied by the heterogeneity factor (Pearson Chi-Square / DF) 1.48194.

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept    1     1.2111      0.5596            4.6837             0.0305
trt          1     0.5265      0.2484            4.4923             0.0340
age          1     -0.0243     0.0107            5.1160             0.0237
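The SCALE=PEARSON adjustment leaves the estimates unchanged and inflates each standard error by the square root of the heterogeneity factor (Pearson chi-square / DF). The Python sketch below, an illustration outside SAS with arbitrary variable names, applies that scaling to the unscaled standard errors from the first ED fit.

```python
import math

# Pearson goodness-of-fit statistic and its degrees of freedom (from the output above)
pearson_chisq, df = 63.7235, 43
phi = pearson_chisq / df  # heterogeneity factor

# (estimate, unscaled standard error) from the first ED analysis
fits = {
    "Intercept": (1.2111, 0.4597),
    "trt": (0.5265, 0.2041),
    "age": (-0.0243, 0.00881),
}

scaled = {}
for name, (est, se) in fits.items():
    se_adj = se * math.sqrt(phi)        # inflated standard error
    wald = (est / se_adj) ** 2          # Wald chi-square after scaling
    scaled[name] = (se_adj, wald)
    print(name, round(se_adj, 4), round(wald, 2))
```

The scaled standard errors should agree with the adjusted table to about four decimals; the Wald statistics differ slightly in the later digits because the inputs here are rounded. The treatment effect remains significant after the adjustment, but with less evidence (p = 0.0340 instead of 0.0099).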


Example 3: Menopause Data

Age and menopause status (menopause=1 for post-menopausal, 0 otherwise) were recorded
for 370 female patients. Age is also categorized into a variable agecat: 1 for age < 50,
2 for 50 ≤ age < 60, 3 for 60 ≤ age < 70, and 4 for 70 ≤ age. How does the probability
of being post-menopausal depend on age?

filename myurl url "http://www.stat.usu.edu/jrstevens/biostat/data/bcancer.csv"

lrecl=800;

data bcancer;

infile myurl dsd delimiter = "," firstobs=2 missover;

input menopause age agecat;

run;

data bcancer; set bcancer;

ID = _n_;

run;

/* Run regular logistic regression */

proc logistic data=bcancer plots=(roc effect);

model menopause(event='1') = age ;

title1 'Logistic regression on menopause data';

run;


Logistic regression on menopause data

Response Profile

Ordered Value    menopause    Total Frequency
1                0            59
2                1            301

Probability modeled is menopause=1.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Testing Global Null Hypothesis: BETA=0

Test Chi-Square DF Pr > ChiSq

Likelihood Ratio 124.1456 1 <.0001

Score 81.0669 1 <.0001

Wald 49.7646 1 <.0001

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept    1     -12.8675    1.9360            44.1735            <.0001
age          1     0.2829      0.0401            49.7646            <.0001

Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits
age       1.327             1.227    1.436
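Read as a fitted equation, the estimates give the per-year odds ratio shown in the table, and the same inversion idea used in Example 1 gives the age at which the fitted probability is one half. The ten-year odds ratio is an extra illustration, not a quantity reported in the output.

```latex
\begin{align*}
\operatorname{logit}\hat{p} &= -12.8675 + 0.2829\,\mathrm{age} \\
\widehat{\mathrm{OR}}_{\text{per year}} &= e^{0.2829} \approx 1.327 \\
\widehat{\mathrm{OR}}_{\text{per 10 years}} &= e^{10 \times 0.2829} \approx 16.9 \\
\hat{p} = 0.5 \iff \mathrm{age} &= \frac{12.8675}{0.2829} \approx 45.5
\end{align*}
```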


/* What happens if the trend were even more 'clear'? */

data new; set bcancer;

if menopause = 1 & age < 57 then delete;

if menopause = 0 & age > 55 then delete;

proc logistic data=new plots=(roc effect);

model menopause(event='1') = age ;

title1 'Logistic regression on menopause subset';

run;

Logistic regression on menopause subset

Response Profile

Ordered Value    menopause    Total Frequency
1                0            58
2                1            180

Probability modeled is menopause=1.

Note: 1 observation was deleted due to missing values for the response or explanatory variables.

Model Convergence Status

Complete separation of data points detected.


Warning: The maximum likelihood estimate does not exist.

Warning: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.

Testing Global Null Hypothesis: BETA=0

Test Chi-Square DF Pr > ChiSq

Likelihood Ratio 263.5557 1 <.0001

Score 153.7178 1 <.0001

Wald 5.4176 1 0.0199

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept    1     -131.3      56.7926           5.3459             0.0208
age          1     2.3737      1.0198            5.4176             0.0199
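The warning that the maximum likelihood estimate does not exist can be seen directly: when an age cutoff separates the two outcomes perfectly, the log-likelihood keeps increasing toward 0 as the slope grows, so no finite slope maximizes it. The Python sketch below illustrates this with a small hypothetical data set (not the menopause data) and a made-up helper `log_lik`; the cutoff of 56 is chosen only to sit between the two groups of the toy data.

```python
import math

# Hypothetical, perfectly separated toy data (NOT the menopause data):
# every age below 56 has y=0 and every age above 56 has y=1.
ages = [48, 51, 53, 55, 57, 59, 62, 66]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def log_lik(slope, cutoff=56.0):
    """Binomial log-likelihood for the model logit(p) = slope * (age - cutoff)."""
    total = 0.0
    for a, yi in zip(ages, y):
        eta = slope * (a - cutoff)
        signed = eta if yi == 1 else -eta
        # log(sigmoid(signed)) written as -log(1 + exp(-signed))
        total += -math.log1p(math.exp(-signed))
    return total

slopes = [0.5, 1, 2, 4, 8]
values = [log_lik(b) for b in slopes]
for b, v in zip(slopes, values):
    print(b, round(v, 6))
```

Each larger slope gives a log-likelihood closer to 0, which is why the iterations in PROC LOGISTIC drift toward very large estimates and standard errors, and why the Wald test becomes unreliable even though the likelihood ratio and score tests are highly significant.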


/* How to fix this? */

proc logistic data=new plots=(roc effect);

model menopause(event='1') = age / firth;

title1 'Bias-corrected logistic regression on menopause subset';

run;

Bias-corrected logistic regression on menopause subset

Model Information

Data Set WORK.NEW

Likelihood Penalty Firth's bias correction

Model Convergence Status

Complete separation of data points detected.

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept    1     -88.0491    28.3566           9.6414             0.0019
age          1     1.5948      0.5078            9.8639             0.0017


/* A simple fix for separation with one continuous

predictor */

data new; set new;

dummy = (age >= 56);

proc logistic data=new;

model menopause(event='1') = dummy;

title1 'logistic with dummy';

run;

logistic with dummy

Model Convergence Status

Complete separation of data points detected.

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept    1     -10.0502    19.9843           0.2529             0.6150
dummy        1     19.6361     21.9147           0.8029             0.3702

proc freq data=new;

tables dummy*menopause / chisq norow nocol nopercent;

title1 'chi-square with dummy';

run;

chi-square with dummy

Table of dummy by menopause

dummy      menopause
           0        1        Total
0          58       0        58
1          0        180      180
Total      58       180      238

Statistics for Table of dummy by menopause

Statistic DF Value Prob

Chi-Square 1 238.0000 <.0001

Effective Sample Size = 238

Frequency Missing = 1
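The chi-square value of exactly 238 is no coincidence: when both off-diagonal cells of a 2x2 table are empty, the Pearson statistic equals the sample size. Writing n_{ij} for the count with dummy = i and menopause = j (notation introduced here for illustration), the shortcut formula gives:

```latex
\[
\chi^2 = \frac{n\,(n_{00} n_{11} - n_{01} n_{10})^2}{n_{0+}\, n_{1+}\, n_{+0}\, n_{+1}}
       = \frac{238\,(58 \cdot 180 - 0 \cdot 0)^2}{58 \cdot 180 \cdot 58 \cdot 180}
       = 238 = n
\]
```

The same empty cells explain why the logistic fit on dummy still reports complete separation: the observed odds are 0 in one row and infinite in the other, so the coefficient estimates and standard errors are not meaningful, while the chi-square test on the table remains usable.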