Different Distributions

David Purdie

Topics

Application of GEE to:

• Binary outcomes: – logistic regression

• Events over time (rate): – Poisson regression

• Survival data: – Cox regression

General form for distributions from the exponential family

Outcome for subject $i$ at time $j$: $Y_{ij}$

$$E(Y_{ij}) = \mu_{ij}$$

Generalized linear model:

$$g(\mu_{ij}) = X_i\beta$$

where $X_i = (x_{i1}, \dots, x_{ij})$ is the matrix of covariates for subject $i$.

Binary outcomes: logistic regression

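Outcome:

$$Y_{ij} = \begin{cases} 1 & \text{for event} \\ 0 & \text{for no event} \end{cases}$$

$\Pr(Y_{ij} = 1) = \mu_{ij}$ (probability of an event)

$\Pr(Y_{ij} = 0) = 1 - \mu_{ij}$

Logit link function:

$$\operatorname{logit}(\mu_{ij}) = \log\left(\frac{\mu_{ij}}{1 - \mu_{ij}}\right)$$

Logistic model:

$$\operatorname{logit}(\mu_{ij}) = \operatorname{logit}\{E(Y_{ij} \mid X_i)\} = X_i\beta$$

where $\mu_{ij} = E(Y_{ij} \mid X_i)$.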
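
As a quick worked check of the logit link, write $\mu$ for the event probability and $\eta$ for the linear predictor. The value 0.2 is an illustrative assumption, not a figure from the study.

```latex
\[
\mu = 0.2 \;\Rightarrow\; \operatorname{logit}(\mu) = \log\frac{0.2}{0.8} = \log(0.25) \approx -1.386,
\qquad
\mu = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{e^{-1.386}}{1 + e^{-1.386}} \approx 0.2 .
\]
```

Because the model is linear on the logit scale, exponentiating a coefficient gives the odds ratio for a one-unit change in that covariate.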

Events over time: Poisson regression

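• Outcome: $Y_i$ = number of events in time period $t_i$

• $E(Y_i) = \lambda_i t_i$, $\operatorname{Var}(Y_i) = \lambda_i t_i$ (where $\lambda_i$ is the event rate)

• Log link function: $\log(\lambda_i)$

• Poisson model:

$$\log\{E(Y_i \mid X_i)\} = \log(\lambda_i t_i) = \log(\lambda_i) + \log(t_i) = X_i\beta + \log(t_i)$$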
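
A small worked example of how the log link turns a coefficient into a rate ratio. Here $\lambda$ denotes the event rate; the two-group covariate $x_i$ and the symbols $\beta_0$, $\beta_1$ are illustrative assumptions.

```latex
\[
\log(\lambda_i) = \beta_0 + \beta_1 x_i,\quad x_i \in \{0, 1\}
\;\Rightarrow\;
\frac{\lambda(x_i = 1)}{\lambda(x_i = 0)} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}.
\]
```

The log of the time period enters the model with its coefficient fixed at 1, so it acts as an offset rather than as an estimated parameter.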

Survival data: Cox regression

• Parameter: $t_{ij}$ (time to event $y_{ij}$)

• Based on a hazard function: $h_t$

• Outcome: $T_{ij}$ = time until event $y_{ij}$

• Log link function: $\log(h_t)$

• Cox model:

$$\log(h_t) = \log(h_{0t}) + X_i\beta$$

where $h_{0t}$ is the baseline hazard rate.
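
A short sketch of why the baseline hazard need not be specified. The notation is assumed here: $h_{0t}$ for the baseline hazard, and $X_i$, $X_k$ for the covariates of two subjects.

```latex
\[
\frac{h_t(X_i)}{h_t(X_k)}
= \frac{h_{0t}\, e^{X_i\beta}}{h_{0t}\, e^{X_k\beta}}
= e^{(X_i - X_k)\beta}.
\]
```

The baseline hazard cancels, so the exponentiated coefficients are hazard ratios that do not depend on $t$.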

Alternating logistic regression

If the responses are binary, it may make more sense to use a matrix of odds ratios rather than correlations.

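Replace $\operatorname{corr}(Y_{ij}, Y_{ik})$ with:

$$\operatorname{OR}(Y_{ij}, Y_{ik}) = \frac{\Pr(Y_{ij}=1, Y_{ik}=1)\,\Pr(Y_{ij}=0, Y_{ik}=0)}{\Pr(Y_{ij}=1, Y_{ik}=0)\,\Pr(Y_{ij}=0, Y_{ik}=1)}$$

The ALR algorithm models $\gamma_{ijk} = \log\{\operatorname{OR}(Y_{ij}, Y_{ik})\}$ as:

$$\gamma_{ijk} = z_{ijk}\alpha$$

where $\alpha$ are regression parameters and $z_{ijk}$ is fixed and needs to be specified.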
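
A worked illustration of the pairwise odds ratio. The four joint probabilities are hypothetical values chosen only to sum to 1, and $\gamma$, $\alpha$ denote the log odds ratio and its regression parameter.

```latex
\[
\Pr(1,1) = 0.3,\ \Pr(0,0) = 0.4,\ \Pr(1,0) = 0.2,\ \Pr(0,1) = 0.1
\;\Rightarrow\;
\operatorname{OR} = \frac{0.3 \times 0.4}{0.2 \times 0.1} = 6,
\qquad
\gamma = \log(6) \approx 1.79 .
\]
```

For an exchangeable log odds ratio structure, $z = 1$ for every pair of visits, so a single parameter $\alpha$ is the common log odds ratio.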

Mixed Models for Non-Normal Data

• $E(y \mid u) = \mu$, $\operatorname{var}(y \mid u) = V(\mu)$, $g(\mu) = X\beta + Zu$

• Random coefficients $u$ have distribution $f(u)$

• $y \mid u$ has the usual GLM distribution

• Binary outcome: – binomial for $y \mid u$ and beta for $u$

• Count outcome: – Poisson for $y \mid u$ and gamma for $u$
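
A sketch of what the Poisson and gamma pairing implies for the marginal variance. It assumes a multiplicative gamma effect $u$ with mean 1 and variance $\kappa$; this parameterisation is an assumption made for illustration, not something stated on the slide.

```latex
\[
Y \mid u \sim \text{Poisson}(\mu u),\qquad E(u) = 1,\quad \operatorname{var}(u) = \kappa
\]
\[
E(Y) = \mu,\qquad
\operatorname{var}(Y) = E\{\operatorname{var}(Y \mid u)\} + \operatorname{var}\{E(Y \mid u)\}
= \mu + \kappa\mu^{2}.
\]
```

Under these assumptions the random effect pushes the variance above the Poisson value $\mu$, which is one way to account for overdispersed counts.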

Example - binary

• Study of bladder cancer

• All patients had superficial bladder tumours on entry, which were removed

• Two randomly allocated treatments (group): – Placebo (n=47), Thiotepa (n=38)

• Many multiple recurrences of tumours

• Month is month since treatment (1 to 53)

• Baseline covariates of number of initial tumours (number) & size of largest tumour (size)

• Lots of missing data: 3585 out of 4505 potential observations (80%) are missing

• Model missing data (yes/no) using a binomial GEE to assess if data is missing at random (logit link function)

Names in parentheses are the variable names in the data set.
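
The missing-data figures can be checked with simple arithmetic, assuming every one of the 47 + 38 = 85 subjects could have been observed in each of months 1 to 53:

```latex
\[
85 \times 53 = 4505 \ \text{potential observations},\qquad
4505 - 3585 = 920 \ \text{observed},\qquad
\frac{3585}{4505} \approx 0.796 \approx 80\% .
\]
```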

Visits per subject

Group       N    Mean   Min   Max
Placebo     47    8.7     1    19
Thiotepa    38   13.5     1    38
Total       85   10.8     1    38

Plot of missing proportion over time

Format for the data in SAS

Subject  group  number  count  month  missing  size
1        0      1       0      1      0        3
1        0      1       .      2      1        3
1        0      1       .      3      1        3
...
2        0      2       0      1      0        1
2        0      2       .      2      1        1
2        0      2       0      3      0        1
...
48       1      1       0      1      0        3
48       1      1       .      2      1        3
48       1      1       .      3      1        3
...
49       1      3       0      1      0        1
49       1      3       .      2      1        1
49       1      3       .      3      1        1
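
A minimal sketch of how the missing column could be derived from count. The input name tumour_long is hypothetical; tumour_miss is the data set name used in the logistic GEE code.

```sas
/* Sketch: derive the 0/1 missingness indicator from the count column.
   tumour_long is a hypothetical name for the long-format input data. */
data tumour_miss;
  set tumour_long;
  missing = (count = .);  /* 1 when count is missing, 0 otherwise */
run;
```

A SAS comparison evaluates to 1 or 0, so no IF/ELSE logic is needed.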

Logistic GEE in SAS

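proc genmod data=tumour_miss descending;           /* descending: model Pr(missing = 1) */
  class group subject month;
  model missing = group month size number
        / dist=binomial type3;                     /* logit is the default link for binomial */
  repeated subject=subject
        / type=ind corrw within=month;             /* independence working correlation */
  estimate 'effect of thiotepa' group -1 1 / exp;  /* exp reports the odds ratio */
run;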

ORs for group (Thiotepa vs placebo)

Corr structure          OR     95% CI        P-value
ind                     0.53   0.33 - 0.84   0.007
exch (ρ=0.12)           0.41   0.23 - 0.72   0.002
AR(1)                   0.53   0.33 - 0.84   0.007
mdep(1)                 0.53   0.33 - 0.84   0.007
mdep(3)                 0.56   0.35 - 0.88   0.013
unstr                   -      -             -

Log OR structure
logor=exch (OR=1.05)    0.51   0.32 - 0.81   0.004
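
The rows of this table differ only in the REPEATED statement. The following sketch shows how the alternatives could be requested; it assumes the same data set and model as the logistic GEE code and is not the author's exact program.

```sas
/* Sketch: refit the logistic GEE under a different dependence model.
   Activate one REPEATED statement at a time. */
proc genmod data=tumour_miss descending;
  class group subject month;
  model missing = group month size number / dist=binomial type3;
  repeated subject=subject / type=exch corrw within=month;        /* exchangeable */
  /* repeated subject=subject / type=ar(1) corrw within=month;   */
  /* repeated subject=subject / type=mdep(3) corrw within=month; */
  /* repeated subject=subject / type=unstr corrw within=month;   */
  /* repeated subject=subject / logor=exch;                      */
  estimate 'effect of thiotepa' group -1 1 / exp;
run;
```

The last alternative requests alternating logistic regression with an exchangeable log odds ratio in place of a working correlation matrix.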

Example - Poisson

• Response: number of new tumours (count)

• Month is month since treatment (1 to 53)

• Baseline covariates of number of initial tumours (number) & size of largest tumour (size)

• Timesince is the number of months since the last visit

• Missing data are dependent upon treatment group and time

• Model new tumour counts using a Poisson GEE to assess treatment effect (log link function)

Names in parentheses are the variable names in the data set.

Count of tumours by treatment group

Group       N     Mean   Std    Min   Max
Placebo     407   0.70   1.73   0     9
Thiotepa    513   0.23   0.99   0     9
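
A crude ratio of the two observed means gives a rough preview of the treatment effect. It is unadjusted and ignores both the time between visits and the within-subject correlation, so it is only a sanity check on the modelled estimates.

```latex
\[
\frac{\bar{y}_{\text{Thiotepa}}}{\bar{y}_{\text{Placebo}}} = \frac{0.23}{0.70} \approx 0.33 .
\]
```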

New tumour counts over time by treatment group

Plot of observed means over time

Poisson GEE in SAS

proc genmod data=tumour_count;
  class group subject month;
  model count = group size number timesince
        / dist=poisson scale=deviance;             /* log is the default link for poisson */
  repeated subject=subject
        / type=exch withinsubject=month corrw;     /* exchangeable working correlation */
  estimate 'effect of thiotepa' group -1 1 / exp;  /* exp reports the rate ratio */
run;

RRs for group (Thiotepa vs placebo)

Corr structure     RR     95% CI        P-value
ind                0.33   0.18 - 0.59   0.0002
exch (ρ=0.08)      0.38   0.21 - 0.68   0.0012
AR(1)              0.35   0.19 - 0.62   0.0004
mdep(1)            0.35   0.19 - 0.62   0.0004
mdep(5)            0.37   0.21 - 0.65   0.0006
unstr*             0.49   0.29 - 0.83   0.008

*WARNING: The number of response pairs for estimating correlation is less than or equal to the number of regression parameters. A simpler correlation model might be more appropriate.

Using an offset

data tumour_count;
  set tumour_count;
  off = log(timesince + 1);
run;

proc genmod data=tumour_count;
  class group subject month;
  model count = group size number
        / dist=poisson scale=deviance offset=off type3;
  repeated subject=subject
        / type=unstr withinsubject=month;
  estimate 'effect of thiotepa' group -1 1 / exp;
run;

RRs for group (Thiotepa vs placebo), with offset

Corr structure     RR     95% CI        P-value
ind                0.46   0.25 - 0.86   0.014
exch (ρ=0.07)      0.48   0.26 - 0.91   0.023
AR(1)              0.46   0.25 - 0.84   0.012
mdep(1)            0.46   0.25 - 0.84   0.012
mdep(5)            0.46   0.25 - 0.84   0.011
unstr*             0.85   0.36 - 2.02   0.708

*WARNING: The number of response pairs for estimating correlation is less than or equal to the number of regression parameters. A simpler correlation model might be more appropriate.
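
A rough consistency check on the independence row (0.46, 0.25 - 0.86, 0.014). It assumes the interval is a Wald interval computed on the log scale and uses the rounded table values, so the result is approximate.

```latex
\[
\widehat{SE} \approx \frac{\log(0.86) - \log(0.25)}{2 \times 1.96}
\approx \frac{1.235}{3.92} \approx 0.315,
\qquad
z \approx \frac{\log(0.46)}{0.315} \approx -2.46 .
\]
```

A standard normal value of $|z| \approx 2.46$ corresponds to a two-sided p-value of about 0.014, in line with the table.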

Interpretation and Presentation

• Descriptive: plots of means or tables of means (percentages, etc.)

• Tables of parameter estimates and confidence intervals (odds ratios or relative risks)

• P-values for effects or interactions (possibly just in the text)

• Emphasize results from descriptive analysis and effect estimates.

Statistical Methods

• What is the distribution of the outcome?

• How were the data summarized?

• Due to the repeated nature of the data, a generalized estimating equations (GEE) approach was used to estimate parameters and test for differences between groups.

• What was the form of the correlation structure?

• What hypotheses were being tested?

• How were missing data handled?

• How were variances calculated?

• What statistical package was used?

Example: Statistical Methods

Mean numbers of new tumours were used to summarise the data. Poisson regression was used to model tumour counts, using the time between successive observations as an offset. Due to the repeated nature of the data, a generalized estimating equations (GEE) approach was used to estimate parameters and test for differences between groups. The main hypothesis being tested was whether Thiotepa affected the numbers of new tumours. The correlation between successive observations was examined and an appropriate correlation structure was specified. Drop-outs and non-attendance were examined to assess differences between the treatment groups. Robust variance estimation techniques were used to calculate standard errors and confidence intervals. All analyses were performed using SAS version 8.2.