Lecture 11: Hypothesis Testing III

Stratified Tests, Renyi and Other Tests

Stratified Tests

• Adjust for a covariate
• Allows you to control for a confounder without using a regression approach
• However
– Like regression, if interaction is present, it won’t be detected
– Assumes the ‘treatment’ effect is the same across strata

Sometimes Confusing

• “Stratified” analysis
• Sometimes
– Subgroup analysis
– Stratified “combined” test
• In this case, combined test
• Recall Mantel-Haenszel odds ratio

Notation

• Now three variables
– Outcome (time to event)
– Group variable (i.e. treatment)
– Strata variable (i.e. gender, cancer grade)
• j = 1, 2, …, K indexes groups
• s = 1, 2, …, M indexes strata

Similar to the Standard Test

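• Formal hypothesis

$$H_0: h_{1s}(t) = h_{2s}(t) = \cdots = h_{Ks}(t), \quad s = 1, 2, \ldots, M; \; t \le \tau$$

• Now, $Z_{j.}(\tau)$ is represented by a sum over strata, as is the variance-covariance estimate

$$Z_{j.}(\tau) = \sum_{s=1}^{M} Z_{js}(\tau), \qquad \hat{\sigma}_{jg.} = \sum_{s=1}^{M} \hat{\sigma}_{jgs}$$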

From there, inference is the same

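• Chi-square test with K – 1 d.f., where $\hat{\Sigma}^{-1}$ is the inverse of the estimated variance-covariance matrix

$$\chi^2 = \left(Z_{1.}(\tau), \ldots, Z_{K-1.}(\tau)\right) \hat{\Sigma}^{-1} \left(Z_{1.}(\tau), \ldots, Z_{K-1.}(\tau)\right)' \sim \chi^2_{K-1}$$

• For the 2 group scenario it can be reduced to a Z-score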
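As a sketch of the two-group reduction: with K = 2 only the group 1 term remains, so the quadratic form collapses to a single standardized sum over strata. The symbol $\hat{\sigma}_{11s}$ (the stratum-specific variance of $Z_{1s}(\tau)$) is notation assumed here for illustration.

$$Z = \frac{\sum_{s=1}^{M} Z_{1s}(\tau)}{\sqrt{\sum_{s=1}^{M} \hat{\sigma}_{11s}}} \;\sim\; N(0, 1) \text{ under } H_0, \qquad Z^2 \sim \chi^2_1$$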

Asymptotics

• Just like the unstratified test, requires large N
• Here requires even larger N: think about dividing the sample into M strata
• In most cases, there probably is not sufficient N

Small Example

• 20 subjects received 1 of two treatments
– 9 patients on treatment 1
– 11 patients received treatment 2
• Patients also categorized by disease type
– 2 strata
• Question:
– Does the data show a treatment effect after adjusting for disease type?

Time  Death  Censor  Trt  Disease
   1      1       0    1        1
   5      1       0    1        1
   5      0       1    1        1
   6      0       1    1        2
   8      1       0    1        1
  37      1       0    1        2
  49      1       0    1        1
  58      1       0    1        2
  79      0       1    1        2
  11      0       1    2        1
  50      1       0    2        1
  51      1       0    2        2
  62      1       0    2        1
  67      1       0    2        1
  73      1       0    2        1
  86      1       0    2        1
  90      1       0    2        2
  96      1       0    2        2
  97      1       0    2        2
  97      1       0    2        2

What first

• Data in standard format
– Trt 1: 1, 5, 5+, 6+, 8, 37, 49, 58, 79+
– Trt 2: 11+, 50, 51, 62, 67, 73, 86, 90, 96, 97, 97
• We might first conduct a global test
– What is our hypothesis?

$$H_0: h_{trt1}(t) = h_{trt2}(t) \quad vs. \quad H_A: h_{trt1}(t) \ne h_{trt2}(t)$$

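Constructing Statistic

Table columns (one row per distinct death time $t_i$): $t_i$, $d_i$, $Y_i$, $d_{i,trt1}$, $Y_{i,trt1}$, and the two summands

$$d_{i,trt1} - Y_{i,trt1}\frac{d_i}{Y_i} \qquad \text{and} \qquad \frac{Y_{i,trt1}}{Y_i}\left(1 - \frac{Y_{i,trt1}}{Y_i}\right)\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i$$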

Calculate Statistic

• Z-statistic

• χ² statistic
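A sketch of both statistics for this example, assuming log-rank weights $W(t_i) = 1$ and writing $\hat{\sigma}^2$ for the sum of the variance terms over the death times. The observed and expected counts for treatment 1 (6 and 2.63) are the values `survdiff` reports for these data; $\hat{\sigma}^2 \approx 1.866$ comes from summing the variance column by hand.

$$Z_1(\tau) = \sum_i \left[d_{i,trt1} - Y_{i,trt1}\frac{d_i}{Y_i}\right] = 6 - 2.63 = 3.37$$

$$Z = \frac{Z_1(\tau)}{\sqrt{\hat{\sigma}^2}} \approx \frac{3.37}{\sqrt{1.866}} \approx 2.47, \qquad \chi^2 = Z^2 \approx 6.1 \text{ on 1 d.f.}$$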

Now Let’s Adjust for Disease Type

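• Steps:
1. Divide the data according to strata
2. Calculate $Z_{js}(\tau)$ and $\hat{\sigma}_{jgs}$

$$Z_{js}(\tau) = \sum_{i=1}^{D} W(t_i)\left[d_{ijs} - Y_{ijs}\frac{d_{is}}{Y_{is}}\right]$$

$$\hat{\sigma}_{jjs} = \sum_{i=1}^{D} W(t_i)^2 \frac{Y_{ijs}}{Y_{is}}\left(1 - \frac{Y_{ijs}}{Y_{is}}\right)\left(\frac{Y_{is} - d_{is}}{Y_{is} - 1}\right) d_{is}$$

$$\hat{\sigma}_{jgs} = -\sum_{i=1}^{D} W(t_i)^2 \frac{Y_{ijs}}{Y_{is}}\frac{Y_{igs}}{Y_{is}}\left(\frac{Y_{is} - d_{is}}{Y_{is} - 1}\right) d_{is}, \quad g \ne j$$

3. Sum $Z_{js}(\tau)$ and $\hat{\sigma}_{jgs}$ across strata to get $Z_{j.}(\tau)$ and $\hat{\sigma}_{jg.}$
4. Calculate your test statistic according to

$$\chi^2 = \left(Z_{1.}(\tau), \ldots, Z_{K-1.}(\tau)\right) \hat{\Sigma}^{-1} \left(Z_{1.}(\tau), \ldots, Z_{K-1.}(\tau)\right)'$$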

Divide data By Strata

• Disease 1

Time  Death  Censor  Trt
   1      1       0    1
   5      1       0    1
   5      0       1    1
   8      1       0    1
  49      1       0    1
  11      0       1    2
  50      1       0    2
  62      1       0    2
  67      1       0    2
  73      1       0    2
  86      1       0    2

• Disease 2

Time  Death  Censor  Trt
   6      0       1    1
  37      1       0    1
  58      1       0    1
  79      0       1    1
  51      1       0    2
  90      1       0    2
  96      1       0    2
  97      1       0    2
  97      1       0    2

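Calculate $Z_{js}(\tau)$ and $\hat{\sigma}_{jgs}$ (Disease 1)

Table columns (Disease 1 stratum only): $t_i$, $d_i$, $Y_i$, $d_{i,trt1}$, $Y_{i,trt1}$, and the two summands

$$d_{i,trt1} - Y_{i,trt1}\frac{d_i}{Y_i} \qquad \text{and} \qquad \frac{Y_{i,trt1}}{Y_i}\left(1 - \frac{Y_{i,trt1}}{Y_i}\right)\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i$$

Column sums give $Z_{Trt 1, Dis 1}(\tau)$ and $\hat{\sigma}^2_{Trt 1, Dis 1}$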

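Calculate $Z_{js}(\tau)$ and $\hat{\sigma}_{jgs}$ (Disease 2)

Table columns (Disease 2 stratum only): $t_i$, $d_i$, $Y_i$, $d_{i,trt1}$, $Y_{i,trt1}$, and the two summands

$$d_{i,trt1} - Y_{i,trt1}\frac{d_i}{Y_i} \qquad \text{and} \qquad \frac{Y_{i,trt1}}{Y_i}\left(1 - \frac{Y_{i,trt1}}{Y_i}\right)\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i$$

Column sums give $Z_{Trt 1, Dis 2}(\tau)$ and $\hat{\sigma}^2_{Trt 1, Dis 2}$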

Calculate the Statistic

• Z (or chi-square)

• What is our conclusion

R Code
> library(survival)
> times<-c(1,5,5,6,8,11,37,49,50,51,58,62,67,73,79,86,90,96,97,97)
> trt<-c(1,1,1,1,1,2,1,1,2,2,1,2,2,2,1,2,2,2,2,2)
> strat<-c(1,1,1,2,1,1,2,1,1,2,2,1,1,1,2,1,2,2,2,2)
> death<-c(1,1,0,0,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1)
> st<-Surv(times, death)   # survival object used in the calls below

#Global
> survdiff(st~trt)
Call:
survdiff(formula = st ~ trt)

       N Observed Expected (O-E)^2/E (O-E)^2/V
trt=1  9        6     2.63     4.329       6.1
trt=2 11       10    13.37     0.851       6.1

 Chisq= 6.1  on 1 degrees of freedom, p= 0.0136

R Code

#Stratified
> survdiff(st~trt + strata(strat))
Call:
survdiff(formula = st ~ trt + strata(strat))

       N Observed Expected (O-E)^2/E (O-E)^2/V
trt=1  9        6     2.27      6.16      9.46
trt=2 11       10    13.73      1.02      9.46
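The global and stratified statistics can also be reproduced by hand, which makes the "sum across strata" step explicit. The following is a minimal base R sketch, assuming log-rank weights W(t_i) = 1; `logrank_parts` is a hypothetical helper written for this illustration and is not part of the survival package.

```r
# Hand computation of the two-group log-rank test, globally and stratified.
# Assumption: log-rank weights W(t_i) = 1. `logrank_parts` is a hypothetical helper.
times <- c(1,5,5,6,8,11,37,49,50,51,58,62,67,73,79,86,90,96,97,97)
trt   <- c(1,1,1,1,1,2,1,1,2,2,1,2,2,2,1,2,2,2,2,2)
strat <- c(1,1,1,2,1,1,2,1,1,2,2,1,1,1,2,1,2,2,2,2)
death <- c(1,1,0,0,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1)

# Returns Z (observed minus expected deaths in group `ref`) and its variance V.
logrank_parts <- function(time, status, group, ref = 1) {
  Z <- 0; V <- 0
  for (ti in sort(unique(time[status == 1]))) {
    Y  <- sum(time >= ti)                         # at risk, both groups
    Y1 <- sum(time >= ti & group == ref)          # at risk, reference group
    d  <- sum(time == ti & status == 1)           # deaths, both groups
    d1 <- sum(time == ti & status == 1 & group == ref)
    Z  <- Z + (d1 - Y1 * d / Y)
    # Variance term is undefined when only one subject is at risk; skip it.
    if (Y > 1) V <- V + (Y1 / Y) * (1 - Y1 / Y) * ((Y - d) / (Y - 1)) * d
  }
  c(Z = Z, V = V)
}

# Global test: ignore the strata.
global <- logrank_parts(times, death, trt)
chisq_global <- unname(global["Z"]^2 / global["V"])

# Stratified test: compute Z and V within each stratum, then sum across strata.
by_stratum <- sapply(split(seq_along(times), strat), function(idx)
  logrank_parts(times[idx], death[idx], trt[idx]))
pooled <- rowSums(by_stratum)
chisq_strat <- unname(pooled["Z"]^2 / pooled["V"])

print(round(c(global = chisq_global, stratified = chisq_strat), 2))
```

The two values should agree with the (O-E)^2/V column of the `survdiff` output (6.1 and 9.46). Here the treatment effect points the same way in both disease strata, so summing across strata strengthens the evidence rather than cancelling it.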

BMT: Hodgkin’s & Non-Hodgkin’s Lymphoma

• Study included 43 BMT patients

• Is there a difference in hazard rates between
– Allogenic transplant = HLA matched sibling donor (N=16)
– Autogenic transplant = Own “cleaned” marrow (N=27)
• But want to adjust for disease state
– Non-Hodgkin’s lymphoma (N=23)
– Hodgkin’s disease (N=20)

Global Test

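Columns: $t_i$, $d_i$, $Y_i$, $d_{i,Allo}$, $Y_{i,Allo}$, then

$$\text{O-E} = d_{i,Allo} - Y_{i,Allo}\frac{d_i}{Y_i}, \qquad \text{Var} = \frac{Y_{i,Allo}}{Y_i}\left(1 - \frac{Y_{i,Allo}}{Y_i}\right)\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i$$

(Selected rows shown; the sums are over all death times.)

 t_i  d_i  Y_i  d_Allo  Y_Allo     O-E     Var
   2    1   43       1      16   0.628   0.234
   4    1   42       1      15   0.643   0.230
  28    1   41       1      14   0.659   0.225
  30    1   40       0      13  -0.325   0.219
  32    1   39       1      13   0.667   0.222
   …
 132    1   22       0       7  -0.318   0.217
 140    1   21       0       7  -0.333   0.222
 252    1   18       0       7  -0.389   0.238
 357    1   16       1       7   0.563   0.246
 Sum                             0.886   5.841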
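As a quick check on the table, the two column sums (0.886 and 5.841) give the global log-rank chi-square directly:

$$\chi^2 = \frac{Z_{Allo}(\tau)^2}{\hat{\sigma}^2} = \frac{0.886^2}{5.841} \approx 0.134 \text{ on 1 d.f.}$$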

Global Results

• Global Test Results
> dat<-read.csv("C:\\BJW\\AutoAllo.csv")
> d<-dat$death; t<-dat$time
> dis<-dat$disease; type<-dat$graft
> nostrat<-survdiff(Surv(t, d)~type)
> nostrat
Call:
survdiff(formula = Surv(t, d) ~ type)

        N Observed Expected (O-E)^2/E (O-E)^2/V
type=1 16       10     9.11    0.0862     0.134
type=2 27       16    16.89    0.0465     0.134

Stratified by Disease Type

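Non-Hodgkin’s Lymphoma subjects

Columns: $t_i$, $d_i$, $Y_i$, $d_{i,Allo}$, $Y_{i,Allo}$, then

$$\text{O-E} = d_{i,Allo} - Y_{i,Allo}\frac{d_i}{Y_i}, \qquad \text{Var} = \frac{Y_{i,Allo}}{Y_i}\left(1 - \frac{Y_{i,Allo}}{Y_i}\right)\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i$$

 t_i  d_i  Y_i  d_Allo  Y_Allo     O-E     Var
  28    1   23       1      11   0.522   0.250
  32    1   22       1      10   0.545   0.248
  42    1   21       0       9  -0.429   0.245
  49    1   20       1       9   0.550   0.248
  53    1   19       0       8  -0.421   0.244
  57    1   18       0       8  -0.444   0.247
  63    1   17       0       8  -0.471   0.249
  81    2   16       0       8  -1.000   0.467
  84    1   14       1       8   0.429   0.245
 140    1   13       0       7  -0.538   0.249
 252    1   11       0       7  -0.636   0.231
 357    1   10       1       7   0.300   0.210
 524    1    8       0       6  -0.750   0.188
 Sum                            -2.344   3.319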

Stratified by Disease Type

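Hodgkin’s Disease subjects

Columns: $t_i$, $d_i$, $Y_i$, $d_{i,Allo}$, $Y_{i,Allo}$, then

$$\text{O-E} = d_{i,Allo} - Y_{i,Allo}\frac{d_i}{Y_i}, \qquad \text{Var} = \frac{Y_{i,Allo}}{Y_i}\left(1 - \frac{Y_{i,Allo}}{Y_i}\right)\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i$$

 t_i  d_i  Y_i  d_Allo  Y_Allo     O-E     Var
   2    1   20       1       5   0.750   0.188
   4    1   19       1       4   0.789   0.166
  30    1   18       0       3  -0.167   0.139
  36    1   17       0       3  -0.176   0.145
  41    1   16       0       3  -0.188   0.152
  52    1   15       0       3  -0.200   0.160
  62    1   14       0       3  -0.214   0.168
  72    1   13       1       3   0.769   0.178
  77    1   12       1       2   0.833   0.139
  79    1   11       1       1   0.909   0.083
 108    1   10       0       0   0.000   0.000
 132    1    9       0       0   0.000   0.000
 Sum                             3.106   1.518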

Stratified Results

• Stratified Test Results
> strat<-survdiff(Surv(t, d)~type + strata(dis))
> strat
Call:
survdiff(formula = Surv(t, d) ~ type + strata(dis))

        N Observed Expected (O-E)^2/E (O-E)^2/V
type=1 16       10     9.24    0.0629      0.12
type=2 27       16    16.76    0.0347      0.12

Stratified Results

• Stratified Test Results

$$\chi^2 = 0.120, \qquad p = 0.729$$

• Again we fail to reject
• This seems in error (recall our survival curves looked VERY different)

Problem?

• The treatment effect is not the same in the 2 disease states
• They are in different directions
– Z_Allo = -2.344 in the Non-Hodgkin’s lymphoma stratum
– Z_Allo = 3.106 in the Hodgkin’s disease stratum
• Stratified approach is NOT appropriate
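The cancellation can be seen by combining the stratum sums from the two tables (Z = -2.344 with variance 3.319 for Non-Hodgkin's lymphoma; Z = 3.106 with variance 1.518 for Hodgkin's disease). A worked sketch:

$$Z_{Allo.}(\tau) = -2.344 + 3.106 = 0.762, \qquad \hat{\sigma}^2_{.} = 3.319 + 1.518 = 4.837$$

$$\chi^2_{stratified} = \frac{0.762^2}{4.837} \approx 0.120$$

Taken one stratum at a time, the same sums give

$$\chi^2_{NHL} = \frac{(-2.344)^2}{3.319} \approx 1.66, \qquad \chi^2_{HD} = \frac{3.106^2}{1.518} \approx 6.36$$

so the opposite signs nearly cancel in the numerator of the combined test even though the Hodgkin's stratum alone shows a sizeable difference.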

Alternative to Stratified Analysis

• Alternatives
– Define 4 groups and conduct a K-sample log rank test
• Allogenic and NHL
• Allogenic and Hodgkin’s
• Autogenic and NHL
• Autogenic and Hodgkin’s
– Subgroup analysis (by disease) should be performed
• Allo|Hodgkins
• Allo|Non-Hodgkins

R Code- K sample test
> allgrp<-ifelse(dis==1 & type==1, 1, 0)
> allgrp<-ifelse(dis==1 & type==2, 2, allgrp)
> allgrp<-ifelse(dis==2 & type==1, 3, allgrp)
> allgrp<-ifelse(dis==2 & type==2, 4, allgrp)
> grp4<-survdiff(Surv(t, d)~allgrp)
> grp4
Call:
survdiff(formula = Surv(t, d) ~ allgrp)

          N Observed Expected (O-E)^2/E (O-E)^2/V
allgrp=1 11        5     7.67     0.927     1.350
allgrp=2 12        9     7.45     0.324     0.459
allgrp=3  5        5     1.45     8.721     9.567
allgrp=4 15        7     9.44     0.631     0.997

R Code- Subgroup analysis
> ### Subgroup (NHL)
> subNHL<-survdiff(Surv(t,d)[which(dis==1)]~type[which(dis==1)])
> subNHL
Call:
survdiff(formula = Surv(t, d)[which(dis == 1)] ~ type[which(dis == 1)])

                          N Observed Expected (O-E)^2/E (O-E)^2/V
type[which(dis == 1)]=1  11        5     7.34     0.748      1.66
type[which(dis == 1)]=2  12        9     6.66     0.825      1.66

 Chisq= 1.7  on 1 degrees of freedom, p= 0.198

> ### Subgroup (Hodgkins)
> subHD<-survdiff(Surv(t,d)[which(dis==2)]~type[which(dis==2)])
> subHD
Call:
survdiff(formula = Surv(t, d)[which(dis == 2)] ~ type[which(dis == 2)])

                          N Observed Expected (O-E)^2/E (O-E)^2/V
type[which(dis == 2)]=1   5        5     1.89     5.095      6.36
type[which(dis == 2)]=2  15        7    10.11     0.955      6.36

Summary: Stratified Testing

• Alternative to a regression approach to control for a 2nd covariate when examining treatment effect.

• Sample size needs to be larger than in the unstratified K-group test for the test results to be valid.

• One needs to be cautious about misinterpreting null results when interactions exist.

• We can use a subgroup approach if this fails.

Renyi Tests

• Previous tests we discussed all use weighted integral of estimated difference in cumulative hazard rates

• Doesn’t address situation where early differences favor one group, and later differences favor another group

• Solution: Renyi tests
– i.e. addresses issue of crossing hazard rates

Renyi Test

• Censored data analogs of the Kolmogorov-Smirnov statistic for comparing two uncensored samples

• Recall KS is a test of equality of one-dimensional probability distributions used to compare two samples

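Kolmogorov-Smirnov Test

• Recall empirical distribution function

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(x_i \le x)$$

• Hypothesis (about the distributions $F_1$ and $F_2$ that $F_{1,n}$ and $F_{2,n'}$ estimate)

$$H_0: F_1(x) = F_2(x) \text{ for all } x \quad vs. \quad H_A: F_1(x) \ne F_2(x) \text{ for some } x$$

• The KS statistic is

$$D_{n,n'} = \sup_x \left|F_{1,n}(x) - F_{2,n'}(x)\right|$$

$$\text{Reject } H_0 \text{ at level } \alpha \text{ if: } \sqrt{\frac{n n'}{n + n'}}\; D_{n,n'} > K_\alpha$$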

Example of a KS test

• Two groups observed for a continuous outcome:

– 1: -0.2, 3.7, 4.3, 5.0, 7.7, 8.6– 2: -0.9, 0.4, 0.5, 2.6, 3.0, 12.1

• We want to determine whether the distributions of the outcome differ between the groups (without assuming any distributional form…)

Constructing KS statistic

    x   P(X1 ≤ x)  P(X2 ≤ x)  |P(X1 ≤ x) - P(X2 ≤ x)|
 -0.9      0          1/6        1/6
 -0.2     1/6         1/6        0
  0.4     1/6         1/3        1/6
  0.5     1/6         1/2        1/3
  2.6     1/6         2/3        1/2
  3.0     1/6         5/6        2/3
  3.7     1/3         5/6        1/2
  4.3     1/2         5/6        1/3
  5.0     2/3         5/6        1/6
  7.7     5/6         5/6        0
  8.6      1          5/6        1/6
 12.1      1           1         0

$$D_{n,n'} = \frac{2}{3}, \qquad \sqrt{\frac{6 \cdot 6}{6 + 6}} \cdot \frac{2}{3} = 1.15, \qquad p \approx 0.142$$

K-S Test
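The table above can be reproduced in a few lines of base R. This is a minimal sketch using `ecdf()` and `ks.test()` from the stats package; the variable names are chosen here for illustration.

```r
# Two-sample Kolmogorov-Smirnov statistic for the example data.
x1 <- c(-0.2, 3.7, 4.3, 5.0, 7.7, 8.6)
x2 <- c(-0.9, 0.4, 0.5, 2.6, 3.0, 12.1)

# Empirical distribution functions, evaluated at every observed value.
F1 <- ecdf(x1)
F2 <- ecdf(x2)
grid <- sort(c(x1, x2))
D <- max(abs(F1(grid) - F2(grid)))   # largest vertical gap between the two curves

# Scaled statistic that is compared with the critical value K_alpha.
n1 <- length(x1); n2 <- length(x2)
scaled <- sqrt(n1 * n2 / (n1 + n2)) * D

# Built-in test for comparison; it reports the same D together with a p-value.
ks <- ks.test(x1, x2)
print(c(D = D, scaled = scaled, D_builtin = unname(ks$statistic)))
```

The maximum gap occurs at x = 3.0, matching the 2/3 entry in the table.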

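Renyi Test

• Approach
– Find the value of Z(t_i) for each failure time
• Note: different from Z(τ), which sums over all t_i ≤ τ
– Calculate series of Z(t_i):

$$Z(t_i) = \sum_{t_k \le t_i} W(t_k)\left[d_{k1} - Y_{k1}\frac{d_k}{Y_k}\right], \quad i = 1, 2, \ldots, D$$

– Estimate the standard error of Z(τ) (all times)

$$\sigma^2(\tau) = \sum_{t_k \le \tau} W(t_k)^2 \frac{Y_{k1}}{Y_k}\frac{Y_{k2}}{Y_k}\left(\frac{Y_k - d_k}{Y_k - 1}\right) d_k$$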

Renyi Statistic

• When hazard rates cross, the absolute value of Z(t) will have its max value at some time t < τ
• Hypothesis test:

$$H_0: h_1(t) = h_2(t) \text{ for all } t \le \tau \quad vs. \quad H_A: h_1(t) \ne h_2(t) \text{ for some } t \le \tau$$

• Note that multiple tests are made, because we are taking the max over Z(t)

Test Statistic Q

• Use the same variance estimate for test statistic as in standard two-sample approach

• Test statistic

$$Q = \frac{\sup\{|Z(t)|,\; t \le \tau\}}{\sigma(\tau)}$$

• Q is approximated by the distribution of sup{|B(x)|, 0 ≤ x ≤ 1}, where B is a standard Brownian motion process
• Use table C.5 to find associated p-value

Small Example

• Given the following data
– Group 1: (7, 8+, 9, 15, 17)
– Group 2: (1, 4, 5+, 6, 19)

Constructing the statistic

Table columns (one row per death time): $t_i$, $d_k$, $d_{k1}$, $Y_k$, $Y_{k1}$, $d_{k1} - Y_{k1}(d_k / Y_k)$, Var, $Z(t_i)$

Calculating Q

• First we can calculate Q

• Once we have Q we compare to table C.5
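The calculation for the small example can be sketched in base R. Assumptions: log-rank weights W(t_k) = 1, and `p_sup_bm` is a hypothetical helper that evaluates the standard alternating series for P(sup|B(x)| > y) on [0, 1] in place of a table lookup.

```r
# Renyi (supremum) version of the log-rank test for the small example.
# Assumptions: W(t_k) = 1; `p_sup_bm` is a hypothetical helper, not a package function.
time   <- c(7, 8, 9, 15, 17,  1, 4, 5, 6, 19)
status <- c(1, 0, 1,  1,  1,  1, 1, 0, 1,  1)   # 0 = censored (marked "+")
group  <- c(1, 1, 1,  1,  1,  2, 2, 2, 2,  2)

event_times <- sort(unique(time[status == 1]))
incr <- numeric(length(event_times))   # d_k1 - Y_k1 * d_k / Y_k at each death time
vars <- numeric(length(event_times))   # variance contribution at each death time
for (i in seq_along(event_times)) {
  ti <- event_times[i]
  Y  <- sum(time >= ti)
  Y1 <- sum(time >= ti & group == 1)
  d  <- sum(time == ti & status == 1)
  d1 <- sum(time == ti & status == 1 & group == 1)
  incr[i] <- d1 - Y1 * d / Y
  # Y1/Y * (1 - Y1/Y) equals (Y1/Y)(Y2/Y); the term is 0 when one subject is at risk.
  vars[i] <- if (Y > 1) (Y1 / Y) * (1 - Y1 / Y) * ((Y - d) / (Y - 1)) * d else 0
}

Z_path <- cumsum(incr)        # Z(t_i) at each death time
sigma  <- sqrt(sum(vars))     # standard error of Z(tau), using all death times
Q      <- max(abs(Z_path)) / sigma

# P(sup_{0<=x<=1} |B(x)| > y) via the usual alternating series.
p_sup_bm <- function(y, terms = 50) {
  k <- 0:terms
  1 - (4 / pi) * sum((-1)^k / (2 * k + 1) * exp(-pi^2 * (2 * k + 1)^2 / (8 * y^2)))
}

print(data.frame(t = event_times, Z = round(Z_path, 3)))
print(c(Q = Q, p = p_sup_bm(Q)))
```

As a sanity check on the series, `p_sup_bm(1.590442)` should return about 0.2235, the p-value that the survMisc output for the kidney infection data reports for a log-rank Q of 1.590442.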

Example 2: Kidney Infection

• Data on 119 kidney dialysis patients
• Comparing time to kidney infection between two groups
– Catheters placed percutaneously (n = 76)
– Catheters placed surgically (n = 43)

Example: Kidney Infection

R Code: Kidney Infection
> kidney<-read.csv("H:\\public_html\\BMRTY722_Summer2015\\Data\\Kidney.csv")
> time<-kidney$Time
> infect<-kidney$d
> percut<-kidney$cath
> st<-Surv(time, infect)
> LRtest<-survdiff(st~percut)
> LRtest
Call:
survdiff(formula = st ~ percut)

          N Observed Expected (O-E)^2/E (O-E)^2/V
percut=1 43       15       11      1.42      2.53
percut=2 76       11       15      1.05      2.53

How to Test This in R?

• We could write our own R function to conduct the Renyi test…

• BUT, it turns out there was a package released in April that has the Renyi test (and all weight functions from K & M included )

R Code: Kidney Infection
> library(survMisc)
> RYtest<-comp(survfit(st~percut))
> RYtest
$tne
       t    n  e  n_percut=1  e_percut=1  n_percut=2  e_percut=2
 1:  0.5  119  6          76           6          43           0
 2:  1.5  103  1          60           0          43           1
…
16: 26.5    5  1           3           0           2           1

$tests$lrTests
                                     ChiSq  df        p
Log-rank                       2.529506318   1  0.11174
Gehan-Breslow (mod~ Wilcoxon)  0.002084309   1  0.96359
Tarone-Ware                    0.402738202   1  0.52568
Peto-Peto                      1.399160019   1  0.23686
Mod~ Peto-Peto (Andersen)      1.275908836   1  0.25866
Flem~-Harr~ with p=1, q=1      9.834062861   1  0.00171

$tests$supTests
                                        Q        p
Log-rank                         1.590442  0.22347
Gehan-Breslow (mod~ Wilcoxon)    1.430499  0.30511
Tarone-Ware                      1.260498  0.41467
Peto-Peto                        1.166979  0.48551
Mod~ Peto-Peto (Andersen)        1.185549  0.47085
Renyi Flem~-Harr~ with p=1, q=1  7.460348  0.00000

R Code: Kidney Infection
> library(survMisc)
> RYtest<-comp(survfit(st~percut), FHp=0, FHq=0)
> RYtest
$tne
       t    n  e  n_percut=1  e_percut=1  n_percut=2  e_percut=2
 1:  0.5  119  6          76           6          43           0
 2:  1.5  103  1          60           0          43           1
…
16: 26.5    5  1           3           0           2           1

Example 3: Gastric Cancer

• Clinical trial of chemotherapy vs. chemotherapy combined with radiotherapy

• 45 patients randomized to each of two arms
• Followed for up to 8 years

R Code: Gastric Cancer
> RYtest<-comp(survfit(Surv(tm, dth)~x, data=dat))
> RYtest
$tne
       t  n_x=1  e_x=1  n_x=2  e_x=2   n  e
 1:    1     45      1     45      0  90  1
…
80: 2363      3      1      6      0   9  1

Compare To Log Rank

• Renyi test 0.05< p <0.06

• What would you expect to see from the log rank test? More or less significant?

LR Results

> LRtest<-survdiff(Surv(tm, dth)~x)
> LRtest
Call:
survdiff(formula = Surv(tm, dth) ~ x)

     N Observed Expected (O-E)^2/E (O-E)^2/V
x=1 45       43     45.1     0.102     0.232
x=2 45       39     36.9     0.125     0.232

Final Comments on the Renyi Test

• Simulations comparing the Renyi vs. log-rank
– When hazards cross, the Renyi test performs better
– Renyi test has little loss of power if the proportional hazards assumption holds (with limited censoring)
– However, with large amounts of censoring, advantages of the Renyi test decline
• So this test provides a good alternative when hazard rates cross.
• But caution still needs to be taken when there is a large amount of censoring.

Other Tests for Crossing Hazards

• Cramer-von Mises test(s):
– Based on the integrated squared difference between two curves
• T-test analog:
– Requires estimation of the mean
– Compares area under S1(t) and S2(t)
• Brookmeyer-Crowley
– Censored version of two-sample median test

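Cramer-Von Mises Test

• Based on the Nelson-Aalen estimator for the cumulative hazard rate and its associated variance

Group specific:

$$\tilde{H}_j(t) = \sum_{t_i \le t} \frac{d_{ij}}{Y_{ij}} \quad \& \quad \sigma_j^2(t) = \sum_{t_i \le t} \frac{d_{ij}}{Y_{ij}\left(Y_{ij} - 1\right)}, \quad j = 1, 2$$

Across groups:

$$\sigma^2(t) = \sigma_1^2(t) + \sigma_2^2(t)$$

• Ideally we integrate over time 0 to τ, but this integral is estimated by summing over distinct death times

$$Q_1 = \left(\frac{1}{\sigma^2(\tau)}\right)^2 \int_0^\tau \left[\tilde{H}_1(t) - \tilde{H}_2(t)\right]^2 d\sigma^2(t)$$

Estimated by:

$$Q_1 = \left(\frac{1}{\sigma^2(\tau)}\right)^2 \sum_{t_i \le \tau} \left[\tilde{H}_1(t_i) - \tilde{H}_2(t_i)\right]^2 \left[\sigma^2(t_i) - \sigma^2(t_{i-1})\right]$$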

2-Sample T-test analog

• Again this test is based on the difference in the area under the survival curve between two groups
• Components of the test include:
– Order all observed times (event and censored)
– Calculate d_ij, c_ij, and Y_ij for both groups
– Calculate the KM estimator for survival and censoring

$$\hat{S}_j(t) = \prod_{t_i \le t}\left(1 - \frac{d_{ij}}{Y_{ij}}\right) \quad \& \quad \hat{G}_j(t) = \prod_{t_i \le t}\left(1 - \frac{c_{ij}}{Y_{ij}}\right)$$

– Calculate the pooled KM estimate of survival, $\hat{S}_p(t)$

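2-Sample T-test analog

• Once these estimates are obtained:
– Construct weight function

$$w(t) = \frac{n\,\hat{G}_1(t)\,\hat{G}_2(t)}{n_1 \hat{G}_1(t) + n_2 \hat{G}_2(t)}$$

– Construct the test statistic

$$W_{KM} = \sqrt{\frac{n_1 n_2}{n}} \sum_{i=1}^{D-1} \left(t_{i+1} - t_i\right) w(t_i) \left[\hat{S}_1(t_i) - \hat{S}_2(t_i)\right]$$

– Construct the variance of the test statistic

$$A_i = \sum_{k=i}^{D-1} \left(t_{k+1} - t_k\right) w(t_k)\, \hat{S}_p(t_k)$$

$$\hat{\sigma}_p^2 = \sum_{i=1}^{D-1} \frac{A_i^2}{\hat{S}_p(t_i)\,\hat{S}_p(t_{i-1})} \cdot \frac{n_1 \hat{G}_1(t_{i-1}) + n_2 \hat{G}_2(t_{i-1})}{n\,\hat{G}_1(t_{i-1})\,\hat{G}_2(t_{i-1})} \left[\hat{S}_p(t_{i-1}) - \hat{S}_p(t_i)\right]$$

– Calculate a Z-score according to

$$Z = \frac{W_{KM}}{\hat{\sigma}_p} \sim N(0, 1)$$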

Summary of Other 2-Sample Tests

• When the hazard rates cross, both the Cramer-Von Mises and the 2-sample t-test analog have greater power than log-rank.

• When hazard rates are proportional, both show power loss relative to log-rank.

• Performance is similar to the Renyi test when hazards cross but Renyi has better power for proportional hazards.

Test Based on Fixed Points in Time

• Complicated description in K&M (chapter 7.8)

• However, pretty simple idea when you are comparing two groups:

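$$Z = \frac{\hat{S}_1(t_0) - \hat{S}_2(t_0)}{\sqrt{\hat{V}\left[\hat{S}_1(t_0)\right] + \hat{V}\left[\hat{S}_2(t_0)\right]}}$$

where $t_0$ is the fixed time point chosen for the comparison.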
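To make the idea concrete, here is a minimal base R sketch that applies the fixed-point comparison to the 20-subject treatment example from earlier in the lecture. Assumptions: the time point t0 = 50 is an arbitrary choice for illustration, variances use Greenwood's formula, and `km_at` is a hypothetical helper rather than a package function.

```r
# Fixed-time comparison of two Kaplan-Meier estimates.
# Assumptions: t0 = 50 is arbitrary; Greenwood variances; `km_at` is a hypothetical helper.
times <- c(1,5,5,6,8,11,37,49,50,51,58,62,67,73,79,86,90,96,97,97)
trt   <- c(1,1,1,1,1,2,1,1,2,2,1,2,2,2,1,2,2,2,2,2)
death <- c(1,1,0,0,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1)

# KM estimate S(t0) and its Greenwood variance for one group.
# Note: not meaningful once everyone at risk has died (Y == d gives a zero divisor).
km_at <- function(time, status, t0) {
  S <- 1; gw <- 0
  for (ti in sort(unique(time[status == 1 & time <= t0]))) {
    Y <- sum(time >= ti)                 # number at risk just before ti
    d <- sum(time == ti & status == 1)   # deaths at ti
    S  <- S * (1 - d / Y)
    gw <- gw + d / (Y * (Y - d))
  }
  c(S = S, V = S^2 * gw)
}

t0 <- 50
g1 <- km_at(times[trt == 1], death[trt == 1], t0)
g2 <- km_at(times[trt == 2], death[trt == 2], t0)

Z <- unname((g1["S"] - g2["S"]) / sqrt(g1["V"] + g2["V"]))
p <- 2 * pnorm(-abs(Z))                  # two-sided normal p-value
print(round(c(S1 = unname(g1["S"]), S2 = unname(g2["S"]), Z = Z, p = p), 4))
```

Because the comparison looks at a single time point, the answer can change with the choice of t0, so that choice is best fixed before looking at the data.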

Next Time

• We will begin our discussion of semi-parametric regression modeling in survival analysis.
