Different Distributions
David Purdie
Topics
Application of GEE to:
• Binary outcomes: logistic regression
• Events over time (rate): Poisson regression
• Survival data: Cox regression
General form for distributions from the exponential family
Outcome for subject i at time j: $Y_{ij}$
$E(Y_{ij}) = \mu_{ij}$
Generalized linear model:
$g(\mu_{ij}) = X_i\beta$, where $X_i = (x_{i1}, \ldots, x_{ij})$ is the matrix of covariates for subject i
Binary outcomes: logistic regression
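Outcome:
$Y_{ij} = 1$ for an event, $Y_{ij} = 0$ for no event
$\Pr(Y_{ij} = 1) = \pi_{ij}$ (probability of an event)
$\Pr(Y_{ij} = 0) = 1 - \pi_{ij}$
Logit link function:
$\mathrm{logit}(\pi_{ij}) = \log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right)$
Logistic model:
$\mathrm{logit}(\pi_{ij}) = \mathrm{logit}\{E(Y_{ij} \mid X_i)\} = X_i\beta$
where $\pi_{ij} = E(Y_{ij} \mid X_i)$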
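The logit link can be illustrated numerically. The sketch below is not part of the original slides; the coefficient values are hypothetical and chosen only to show how a linear predictor on the log-odds scale maps back to a probability, and how a coefficient translates into an odds ratio.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Map a linear predictor eta back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients (not from the slides): an intercept and a
# treatment effect, both on the log-odds scale.
beta0 = -1.0
beta_group = math.log(0.5)  # corresponds to an odds ratio of 0.5

p_placebo = inv_logit(beta0)
p_treated = inv_logit(beta0 + beta_group)

# The ratio of the two odds recovers exp(beta_group).
odds_ratio = (p_treated / (1 - p_treated)) / (p_placebo / (1 - p_placebo))
print(round(p_placebo, 3), round(p_treated, 3), round(odds_ratio, 3))
```

Exponentiating a coefficient from the logistic model gives an odds ratio, which is why the estimates in a logistic GEE are usually reported on that scale.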
Events over time: Poisson regression
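• Outcome: $Y_i$ = number of events in time period $t_i$
• $E(Y_i) = \lambda_i t_i$, $\mathrm{Var}(Y_i) = \lambda_i t_i$ (where $\lambda_i$ is the event rate)
• Log link function: $\log(\lambda_i)$
• Poisson model:
$\log\{E(Y_i)\} = \log(\lambda_i t_i) = \log(\lambda_i) + \log(t_i) = X_i\beta + \log(t_i)$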
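A small numerical sketch may help show what the log(time) term does in the Poisson model. It is an illustration only; the log rates and exposure time are hypothetical values, not estimates from the slides.

```python
import math

def expected_count(linear_predictor, exposure):
    """E(Y) under a log link, with log(exposure) entering as an offset."""
    return math.exp(linear_predictor + math.log(exposure))

# Hypothetical values: log event rates per month for two groups,
# each observed over the same 3-month period.
log_rate_placebo = -1.2
log_rate_treated = -1.2 + math.log(0.4)  # rate ratio of 0.4
months = 3.0

mu_placebo = expected_count(log_rate_placebo, months)
mu_treated = expected_count(log_rate_treated, months)

# The exposure cancels, so the ratio of expected counts is the rate ratio.
rate_ratio = mu_treated / mu_placebo
print(round(mu_placebo, 3), round(mu_treated, 3), round(rate_ratio, 3))
```

Because the offset has a fixed coefficient of 1, the regression coefficients describe rates per unit time rather than raw counts.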
Survival data: Cox regression
• Parameter: $t_{ij}$ (time to event $y_{ij}$)
• Based on a hazard function: $h_t$
• Outcome: $T_{ij}$ = time until event $y_{ij}$
• Log link function: $\log(h_t)$
• Cox model:
$\log\{h_t(X_i)\} = \log(\lambda_t) + X_i\beta$
where $\lambda_t$ is the baseline hazard rate.
Alternating logistic regression
If the responses are binary, it may make more sense to use a matrix of odds ratios rather than correlations.
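Replace corr($Y_{ij}$, $Y_{ik}$) with:
$\mathrm{OR}(Y_{ij}, Y_{ik}) = \frac{\Pr(Y_{ij}=1, Y_{ik}=1)\,\Pr(Y_{ij}=0, Y_{ik}=0)}{\Pr(Y_{ij}=1, Y_{ik}=0)\,\Pr(Y_{ij}=0, Y_{ik}=1)}$
The ALR algorithm models $\gamma_{ijk} = \log\{\mathrm{OR}(Y_{ij}, Y_{ik})\}$ as $\gamma_{ijk} = z_{ijk}\alpha$,
where $\alpha$ are regression parameters and $z_{ijk}$ is fixed and needs to be specified.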
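The pairwise odds ratio is easy to compute from a cross-classification of two binary responses. The sketch below uses hypothetical counts (not data from the slides) to show the calculation and the log odds ratio that ALR models.

```python
import math

def pairwise_odds_ratio(n11, n10, n01, n00):
    """Odds ratio between two binary responses Y_ij and Y_ik.

    n11: both 1, n10: first 1 and second 0,
    n01: first 0 and second 1, n00: both 0.
    """
    return (n11 * n00) / (n10 * n01)

# Hypothetical counts of response pairs within subjects.
or_pair = pairwise_odds_ratio(n11=20, n10=10, n01=10, n00=20)
log_or = math.log(or_pair)  # the quantity modelled by ALR

# When the two responses are independent, the odds ratio is 1.
or_indep = pairwise_odds_ratio(n11=10, n10=20, n01=20, n00=40)
print(or_pair, round(log_or, 3), or_indep)
```

An odds ratio of 1 (log odds ratio of 0) plays the same role here that a correlation of 0 plays in the usual working correlation matrix.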
Mixed Models for Non-Normal Data
• $E(y \mid u) = \mu$, $\mathrm{var}(y \mid u) = V(\mu)$, $g(\mu) = X\beta + Zu$
• Random coefficients u have distribution f(u)
• y|u has the usual GLM distribution
• Binary outcome: binomial for y|u and beta for u
• Count outcome: Poisson for y|u and gamma for u
Example - binary
• Study of bladder cancer
• All patients had superficial bladder tumours on entry, which were removed
• Two randomly allocated treatments (group):
  – Placebo (n=47), Thiotepa (n=38)
• Multiple recurrences of tumours were common
• Month is month since treatment (1 to 53)
• Baseline covariates: number of initial tumours (number) and size of largest tumour (size)
• Lots of missing data: 3585 out of 4505 potential observations (80%) are missing
• Model missing data (yes/no) using a binomial GEE with a logit link function to assess whether data are missing at random
• Names in parentheses are the variable names in the data set
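The missing-data figures can be checked with simple arithmetic. The sketch below only reuses numbers quoted on the slides (group sizes, the 1 to 53 month range, and the 3585 missing observations); the variable names are illustrative.

```python
# Figures quoted on the slides
n_subjects = 47 + 38   # placebo + Thiotepa
n_months = 53          # months 1 to 53
n_missing = 3585

# One potential observation per subject per month
n_potential = n_subjects * n_months
n_observed = n_potential - n_missing
pct_missing = 100 * n_missing / n_potential

print(n_potential, n_observed, round(pct_missing, 1))
```

The number of observed visits obtained this way agrees with the N column of the tumour count table (407 + 513).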
Visits per subject
            N    Mean   Min   Max
Placebo     47    8.7     1    19
Thiotepa    38   13.5     1    38
Total       85   10.8     1    38
Plot of missing proportion over time
Format for the data in SAS
Subject  group  number  count  month  missing  size
1        0      1       0      1      0        3
1        0      1       .      2      1        3
1        0      1       .      3      1        3
.
2        0      2       0      1      0        1
2        0      2       .      2      1        1
2        0      2       0      3      0        1
.
48       1      1       0      1      0        3
48       1      1       .      2      1        3
48       1      1       .      3      1        3
.
49       1      3       0      1      0        1
49       1      3       .      2      1        1
49       1      3       .      3      1        1
Logistic GEE in SAS
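/* Binomial GEE for the missingness indicator; DESCENDING models Pr(missing=1) */
proc genmod data=tumour_miss descending;
  class group subject month;
  model missing = group month size number
        / dist=binomial type3;
  /* Independence working correlation; CORRW prints the working correlation matrix */
  repeated subject=subject
        / type=ind corrw within=month;
  /* EXP reports the contrast as an odds ratio (Thiotepa vs placebo) */
  estimate 'effect of thiotepa' group -1 1 / exp;
run;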
ORs for group (Thiotepa vs placebo)
Corr structure            OR     95% CI        P-value
ind                       0.53   0.33 - 0.84   0.007
exch (correlation=0.12)   0.41   0.23 - 0.72   0.002
AR(1)                     0.53   0.33 - 0.84   0.007
mdep(1)                   0.53   0.33 - 0.84   0.007
mdep(3)                   0.56   0.35 - 0.88   0.013
unstr                     -      -             -
Log OR structure
logor=exch (OR=1.05)      0.51   0.32 - 0.81   0.004
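The odds ratios and confidence intervals in this table are computed on the log scale and then exponentiated. As a rough check, the sketch below recovers an approximate standard error and Wald p-value from the reported independence-structure row (OR 0.53, 95% CI 0.33 to 0.84). It assumes a normal-approximation interval with critical value 1.96, which the slides do not state, and it works from rounded figures, so the result only approximates the reported p-value.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate SE of a log ratio from its confidence limits (assumes a Wald interval)."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def wald_p_value(ratio, se):
    """Two-sided p-value for H0: ratio = 1, using the normal approximation."""
    z = math.log(ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Reported values for the independence working correlation
or_ind, ci_low, ci_high = 0.53, 0.33, 0.84

se = se_from_ci(ci_low, ci_high)
p = wald_p_value(or_ind, se)

# On the log scale the interval is symmetric, so the point estimate
# should be close to the geometric mean of the limits.
geo_mid = math.sqrt(ci_low * ci_high)
print(round(se, 3), round(p, 4), round(geo_mid, 2))
```

The same calculation applies to the rate ratios from the Poisson models, since they are also estimated on the log scale.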
Example - Poisson
• Response: number of new tumours (count)
• Month is month since treatment (1 to 53)
• Baseline covariates: number of initial tumours (number) and size of largest tumour (size)
• Timesince is the number of months since the last visit
• Missing data are dependent upon treatment group and time
• Model new tumour counts using a Poisson GEE with a log link function to assess the treatment effect
• Names in parentheses are the variable names in the data set
Count of tumours by treatment group
            N     Mean   Std    Min   Max
Placebo     407   0.70   1.73   0     9
Thiotepa    513   0.23   0.99   0     9
New tumour counts over time by treatment group
Plot of observed means over time
Poisson GEE in SAS
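/* Poisson GEE for new tumour counts; SCALE=DEVIANCE estimates the dispersion from the deviance */
proc genmod data=tumour_count;
  class group subject month;
  model count = group size number timesince
        / dist=poisson scale=deviance;
  /* Exchangeable working correlation across months within subject */
  repeated subject=subject
        / type=exch withinsubject=month corrw;
  /* EXP reports the contrast as a rate ratio (Thiotepa vs placebo) */
  estimate 'effect of thiotepa' group -1 1 / exp;
run;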
RRs for group (Thiotepa vs placebo)
Corr structure            RR     95% CI        P-value
ind                       0.33   0.18 - 0.59   0.0002
exch (correlation=0.08)   0.38   0.21 - 0.68   0.0012
AR(1)                     0.35   0.19 - 0.62   0.0004
mdep(1)                   0.35   0.19 - 0.62   0.0004
mdep(5)                   0.37   0.21 - 0.65   0.0006
unstr*                    0.49   0.29 - 0.83   0.008
*WARNING: The number of response pairs for estimating correlation is less than or equal to the number of regression parameters. A simpler correlation model might be more appropriate.
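The working correlation structures compared in these tables can be written out explicitly. The sketch below builds small exchangeable, AR(1) and m-dependent matrices and counts the parameters an unstructured matrix would need. The matrix size and the AR(1) and m-dependent correlation values are hypothetical; only the exchangeable value 0.08 is taken from the table. The count for 53 time points assumes one time point per month, matching the 1 to 53 month range.

```python
def exchangeable(n, rho):
    """Same correlation rho between every pair of time points."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]

def ar1(n, rho):
    """Correlation decays as rho ** lag."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]

def m_dependent(n, rhos):
    """rhos[d - 1] is the correlation at lag d; zero beyond lag m = len(rhos)."""
    m = len(rhos)

    def corr(j, k):
        lag = abs(j - k)
        if lag == 0:
            return 1.0
        return rhos[lag - 1] if lag <= m else 0.0

    return [[corr(j, k) for k in range(n)] for j in range(n)]

def n_unstructured_params(n):
    """Number of distinct off-diagonal correlations in an n x n matrix."""
    return n * (n - 1) // 2

# Small illustrative matrices (4 time points)
r_exch = exchangeable(4, 0.08)    # 0.08 as in the table above
r_ar1 = ar1(4, 0.5)               # hypothetical rho
r_mdep = m_dependent(4, [0.3])    # hypothetical 1-dependent structure

# An unstructured matrix over 53 monthly time points
print(n_unstructured_params(53))
```

With so many correlations to estimate and most observations missing, few response pairs are available for each one, which is consistent with the warning SAS issues for the unstructured fit.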
Using an offset
data tumour_count;
  set tumour_count;
  off = log(timesince + 1);  /* log of (months since last visit + 1) */
run;
proc genmod data=tumour_count;
  class group subject month;
  model count = group size number
        / dist=poisson scale=deviance offset=off type3;
  repeated subject=subject
        / type=unstr withinsubject=month;
  estimate 'effect of thiotepa' group -1 1 / exp;
run;
RRs for group (Thiotepa vs placebo)
Corr structure            RR     95% CI        P-value
ind                       0.46   0.25 - 0.86   0.014
exch (correlation=0.07)   0.48   0.26 - 0.91   0.023
AR(1)                     0.46   0.25 - 0.84   0.012
mdep(1)                   0.46   0.25 - 0.84   0.012
mdep(5)                   0.46   0.25 - 0.84   0.011
unstr*                    0.85   0.36 - 2.02   0.708
*WARNING: The number of response pairs for estimating correlation is less than or equal to the number of regression parameters. A simpler correlation model might be more appropriate.
Interpretation and Presentation
• Descriptive: plots of means or tables of means (percentages, etc.)
• Tables of parameter estimates and confidence intervals (odds ratios or relative risks)
• P-values for effects or interactions (possibly just in the text)
• Emphasize results from descriptive analysis and effect estimates.
Statistical Methods
• What is the distribution of the outcome?
• How were the data summarized?
• Due to the repeated nature of the data, a generalized estimating equations (GEE) approach was used to estimate parameters and test for differences between groups.
• What was the form of the correlation structure?
• What hypotheses were being tested?
• How were missing data handled?
• How were variances calculated?
• What statistical package was used?
Example: Statistical Methods
Mean numbers of new tumours were used to summarise the data. Poisson regression was used to model tumour counts, using the time between successive observations as an offset. Due to the repeated nature of the data, a generalized estimating equations (GEE) approach was used to estimate parameters and test for differences between groups. The main hypothesis being tested was whether Thiotepa affected the numbers of new tumours. The correlation between successive observations was examined and an appropriate correlation structure was specified. Drop-outs and non-attendance were examined to assess differences between the treatment groups. Robust variance estimation techniques were used to calculate standard errors and confidence intervals. All analyses were performed using SAS version 8.2.