SJS SDI_6 1
Design of Statistical Investigations
Stephen Senn
6. Orthogonal Designs
Randomised Blocks II
SJS SDI_6 2
Exp_5 Alternative Analyses
• We will now show three alternative analyses of example Exp_5
• The first two of these are equivalent to the analysis already done
– The first is equivalent only because there are only two treatments
– It is not equivalent for three or more treatments
SJS SDI_6 3
Matched Pairs t-test
• Reduce data to a single difference d
– per patient
– between treatments
• Analyse these differences using a t-test for a single sample
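$$\bar{d} = \frac{1}{r}\sum_{j=1}^{r} d_j, \qquad \hat{\sigma}_d^2 = \frac{\sum_{j=1}^{r}(d_j - \bar{d})^2}{r-1}, \qquad \frac{\bar{d} - \delta}{\hat{\sigma}_d/\sqrt{r}} \sim t_{r-1}$$
where r is the number of patients and δ is the population mean difference.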
SJS SDI_6 4
Matched Pairs t-test (cont)
• Under H0, the population mean difference δ is zero
• Hence a significance test may be based on the statistic
$$t = \frac{\bar{d}}{\hat{\sigma}_d/\sqrt{r}}, \qquad t \sim t_{r-1} \text{ under } H_0$$
SJS SDI_6 5
Exp_5 Matched Pairs t-test
SEQ     Patient  Formoterol  Salbutamol  Difference
forsal     1        310         270          40
salfor     2        385         370          15
salfor     3        400         310          90
forsal     4        310         260          50
salfor     5        410         380          30
forsal     6        370         300          70
forsal     7        410         390          20
salfor     9        320         290          30
forsal    10        250         210          40
forsal    11        380         350          30
salfor    12        340         260          80
salfor    13        220          90         130
forsal    14        330         365         -35
SJS SDI_6 6
Exp_5 Calculations
Differences: 40, 15, 90, 50, 30, 70, 20, 30, 40, 30, 80, 130, -35
Statistic  Value
Mean       45.38462
Variance   1647.756
SD         40.59257
r          13
SE         11.25835
t          4.03
DF         12
P-value    0.0017
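The calculations on this slide can be checked with a short sketch in Python (not the S-Plus used in the course). It assumes only the thirteen differences listed in the data table; the variable names are illustrative. The P-value is not computed here because it needs the t distribution with 12 degrees of freedom.

```python
import math

# Formoterol minus salbutamol PEF differences, copied from the Exp_5 table
differences = [40, 15, 90, 50, 30, 70, 20, 30, 40, 30, 80, 130, -35]

r = len(differences)                     # number of patients
mean_d = sum(differences) / r            # mean difference
var_d = sum((d - mean_d) ** 2 for d in differences) / (r - 1)  # sample variance
sd_d = math.sqrt(var_d)                  # standard deviation
se_d = sd_d / math.sqrt(r)               # standard error of the mean difference
t_stat = mean_d / se_d                   # matched pairs t statistic, r - 1 DF

print(f"mean={mean_d:.5f} variance={var_d:.3f} SD={sd_d:.5f}")
print(f"r={r} SE={se_d:.5f} t={t_stat:.4f} DF={r - 1}")
```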
SJS SDI_6 7
Exp_5 Matched Pairs Using S-Plus
#First split data into two columns
#Matched by patient
i<-sort.list(patient)
data2<-cbind(patient=patient[i],pef=pef[i],treat=treat[i])
pefTREAT<-split(data2[,2],data2[,3])
#Perform matched pairs t-test
t.test(pefTREAT$"2", pefTREAT$"1", alternative="two.sided", mu=0, paired=T, conf.level=.95)
SJS SDI_6 8
Exp_5 S-Plus Output
Paired t-Test
data: pefTREAT$"2" and pefTREAT$"1"
t = 4.0312, df = 12, p-value = 0.0017
alternative hypothesis: true mean of differences is not equal to 0
95 percent confidence interval:
 20.85477 69.91446
sample estimates:
 mean of x - y
 45.38462
SJS SDI_6 9
Exp_5 using a Linear Model Approach
> fit3 <- lm(pef ~ patient + treat)
> summary(fit3, corr = F)
Call: lm(formula = pef ~ patient + treat)
Residuals:
    Min     1Q     Median    3Q   Max
 -42.31 -11.15 1.554e-015 11.15 42.31
Coefficients:
               Value Std. Error t value Pr(>|t|)
(Intercept) 207.3077    21.0624  9.8425   0.0000
   patient1  60.0000    28.7033  2.0904   0.0585
  patient11 135.0000    28.7033  4.7033   0.0005
etc.
      treat  45.3846    11.2584  4.0312   0.0017
SJS SDI_6 10
[Figure: diagnostic plots for the linear model fit. Panels: treat against pef; histogram of residuals; residuals against fitted values; empirical against theoretical quantiles of the residuals]
SJS SDI_6 11
Exp_5 A Non-Parametric Approach
• Wilcoxon signed ranks test
• Calculate the difference for each patient
• Ignore the sign
• Rank the absolute differences
• Re-assign the signs to the ranks
• Calculate the sum of the negative (or positive) ranks
SJS SDI_6 12
Exp_5 Signed Rank Calculations
Patient  Difference  Absolute Difference  Rank (Abs Diff)  Signed Rank
   1         40             40                  7               7 *
   2         15             15                  1               1
   3         90             90                 12              12
   4         50             50                  9               9
   5         30             30                  3               3 *
   6         70             70                 10              10
   7         20             20                  2               2
   9         30             30                  3               4 *
  10         40             40                  7               8 *
  11         30             30                  3               5 *
  12         80             80                 11              11
  13        130            130                 13              13
  14        -35             35                  6              -6
* = tie arbitrarily broken
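The signed rank steps can be sketched in Python as follows. This is an illustration rather than the method used for the slide: it breaks ties by patient order (via Python's stable sort), which is one arbitrary choice that happens to give the same signed ranks as the table. The dictionary name and layout are assumptions.

```python
# Patient number -> formoterol minus salbutamol difference (patient 8 is absent)
differences = {1: 40, 2: 15, 3: 90, 4: 50, 5: 30, 6: 70, 7: 20,
               9: 30, 10: 40, 11: 30, 12: 80, 13: 130, 14: -35}

# Ignore the sign and order patients by absolute difference.
# sorted() is stable, so tied patients stay in patient order:
# an arbitrary tie-break, as on the slide.
ordered = sorted(differences, key=lambda p: abs(differences[p]))

# Rank, then re-assign the sign of the original difference.
signed_rank = {}
for rank, patient in enumerate(ordered, start=1):
    sign = 1 if differences[patient] > 0 else -1
    signed_rank[patient] = sign * rank

neg_sum = -sum(v for v in signed_rank.values() if v < 0)
pos_sum = sum(v for v in signed_rank.values() if v > 0)

for patient in differences:
    print(patient, differences[patient], signed_rank[patient])
print("sum of negative ranks:", neg_sum, "sum of positive ranks:", pos_sum)
```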
SJS SDI_6 13
Exp_5 Hypothesis Test
• Suppose H0 is true
• Then any difference is positive or negative with probability P = 1/2
• There are 2^13 = 8192 different possible patterns of - and +
• How many produce a sum of negative ranks as low as that seen?
SJS SDI_6 14
Possible Assignments of Negative Ranks With Equal or Lower Score
• No negative ranks: 1 case
• One of 1, 2, 3, 4, 5, 6 only: 6 cases
• 1 + (one of 2, 3, 4, 5): 4 cases
• 2 + (one of 3, 4): 2 cases
• 1 + 2 + 3: 1 case
• Total = 1 + 6 + 4 + 2 + 1 = 14 cases
• 14/2^13 = 0.00171
• Two-sided P-value = 2 x 0.00171 = 0.0034
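The count of 14 can be confirmed by brute force. The sketch below assumes ranks 1 to 13 with ties broken arbitrarily (so every rank is distinct) and an observed negative rank sum of 6; it enumerates every set of ranks that could carry a negative sign.

```python
from itertools import combinations

n = 13          # number of patients, hence ranks 1..13
observed = 6    # observed sum of negative ranks

# Each subset of ranks is one way of choosing which ranks are negative.
count = 0
for k in range(n + 1):
    for subset in combinations(range(1, n + 1), k):
        if sum(subset) <= observed:
            count += 1

one_sided = count / 2 ** n   # all 2^13 sign patterns are equally likely under H0
two_sided = 2 * one_sided

print(count, round(one_sided, 5), round(two_sided, 4))
```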
SJS SDI_6 15
Exp_5 SPlus Output
> wilcox.test(pefTREAT$"1", pefTREAT$"2", alternative = "two.sided", mu = 0, paired = T, exact = T, correct = T)
Wilcoxon signed-rank test
cannot compute exact p-value with ties in: wil.sign.rank(dff, alternative, exact, correct)
data: pefTREAT$"1" and pefTREAT$"2"
signed-rank normal statistic with correction Z = -2.7297, p-value = 0.0063
alternative hypothesis: true mu is not equal to 0
SJS SDI_6 16
Why the Discrepancy?
• P = 0.0034 by hand, 0.0063 from SPlus
• SPlus is using an asymptotic approximation
• StatXact gives an accurate calculation
StatXact output
Exact Inference:
One-sided p-value: Pr { Test Statistic .GE. Observed } = 0.0017
Pr { Test Statistic .EQ. Observed } = 0.0005
Two-sided p-value: Pr { | Test Statistic - Mean | .GE. | Observed - Mean | = 0.0034
Two-sided p-value: 2*One-Sided = 0.0034
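The asymptotic figure can be approximated with the sketch below. It rests on assumptions about what the software did rather than on its documentation: tied absolute differences get midranks, the variance carries the usual tie correction, and a continuity correction of 0.5 is applied towards the mean. Under those assumptions the calculation gives a Z and P-value close to the SPlus output.

```python
import math

# Table differences are formoterol minus salbutamol; the wilcox.test call
# used treatment 1 minus treatment 2, so the signs are reversed here.
differences = [-d for d in [40, 15, 90, 50, 30, 70, 20, 30, 40, 30, 80, 130, -35]]
n = len(differences)

# Midranks: tied absolute values share the average of their positions.
positions = {}
for pos, value in enumerate(sorted(abs(d) for d in differences), start=1):
    positions.setdefault(value, []).append(pos)
midrank = {value: sum(p) / len(p) for value, p in positions.items()}

# Statistic: sum of the ranks attached to positive differences.
stat = sum(midrank[abs(d)] for d in differences if d > 0)

mean = n * (n + 1) / 4
tie_term = sum(len(p) ** 3 - len(p) for p in positions.values()) / 48
sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term)

# Continuity correction of 0.5 towards the mean (assumed).
correction = math.copysign(0.5, stat - mean)
z = (stat - mean - correction) / sd
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

print(f"statistic={stat} Z={z:.4f} p={p_two_sided:.4f}")
```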
SJS SDI_6 17
Orthogonal (see Marriott, A Dictionary of Statistical Terms)
• Mathematical meaning is perpendicular
– as in co-ordinate axes
• Statistical variates are orthogonal if independent
• An experimental design is orthogonal if certain variates or linear combinations are independent
– rectangular arrays are orthogonal
SJS SDI_6 18
Randomised Blocks and Orthogonality
• Randomised blocks are examples of orthogonal designs
– Rectangular arrays
– Balanced
• Consequences
– Treatment sum of squares does not change as blocks are fitted
– Design is efficient
SJS SDI_6 19
Orthogonality and Regression
Illustration of orthogonality using Exp_5
Input design matrix corresponding to treatment only
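ORIGIN = 1
X1 is the 26 x 2 design matrix: the first column is all 1s (intercept); the second column is the treatment indicator, 0 in rows 1 to 13 and 1 in rows 14 to 26.
$$(X_1^T X_1)^{-1} = \begin{pmatrix} 0.077 & -0.077 \\ -0.077 & 0.154 \end{pmatrix}$$
Variance-covariance matrix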
SJS SDI_6 20
Orthogonality and Regression 2
Create design matrix with dummy variables for patients also
X2: for i = 1, ..., 13 and j = 1, ..., 12
  temp[i, j] = 1 if i = j, otherwise 0
  temp[i + 13, j] = 1 if i = j, otherwise 0
X2 = augment(X1, temp)
Variance-covariance matrix (part)
$$\left[(X_2^T X_2)^{-1}\right]_{1..3,\,1..3} = \begin{pmatrix} 0.538 & -0.077 & -0.5 \\ -0.077 & 0.154 & 0 \\ -0.5 & 0 & 1 \end{pmatrix}$$
SJS SDI_6 21
Orthogonality and Regression 3
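$$\operatorname{var}(\hat{\tau}_1 - \hat{\tau}_2) = \sigma^2\left(\frac{1}{r_1} + \frac{1}{r_2}\right), \qquad r_1 = r_2 = r \;\Rightarrow\; \operatorname{var}(\hat{\tau}_1 - \hat{\tau}_2) = \sigma^2\left(\frac{1}{r} + \frac{1}{r}\right) = \frac{2\sigma^2}{r}$$
Here we have
$$\operatorname{var}(\hat{\tau}_1 - \hat{\tau}_2) = \frac{2\sigma^2}{13} = 0.154\,\sigma^2$$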
Orthogonality
• Addition of patients has not increased variance multiplier for treatments
• Variance is as it would be had patients not been included
• “patient” and “treat” are orthogonal
• The factors are balanced
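The orthogonality claim can be checked numerically. The sketch below is a Python illustration (the slides use a different tool): it builds the treatment-only design and the design with twelve patient dummies, inverts X'X exactly with fractions, and compares the variance multiplier for treat. The helper names invert and xtx_inverse are hypothetical.

```python
from fractions import Fraction

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def xtx_inverse(design):
    """Return (X'X)^-1 for a design matrix given as a list of rows."""
    cols = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(cols)]
           for i in range(cols)]
    return invert(xtx)

r = 13  # patients, each observed under both treatments

# X1: intercept and treatment indicator (rows 1-13 treatment 0, rows 14-26 treatment 1)
x1 = [[1, t] for t in (0, 1) for _ in range(r)]
# X2: X1 augmented with dummies for patients 1..12 (patient 13 is the reference)
x2 = [[1, t] + [int(p == j) for j in range(r - 1)] for t in (0, 1) for p in range(r)]

inv1 = xtx_inverse(x1)
inv2 = xtx_inverse(x2)

print("treat multiplier without patients:", inv1[1][1], float(inv1[1][1]))
print("treat multiplier with patients:   ", inv2[1][1], float(inv2[1][1]))
```

Both multipliers equal 2/13, which rounds to 0.154, so fitting patient leaves the treatment variance multiplier unchanged.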
SJS SDI_6 23
Exp_6 Another Example of a Two-Way Layout
• Classic attempt (Cushny & Peebles, 1905) to investigate optical isomers
• Subjects: eleven patients in an insane asylum in Kalamazoo
• Outcome: hours of sleep gained
• Data subsequently analysed by Student (1908) in famous t-test paper.
SJS SDI_6 24
The Cushny and Peebles Data
Patient  Control (Mean)  B (Mean)  C (Mean)  D (Mean)
   1          0.6           1.3       2.5       2.1
   2          3.0           1.4       3.8       4.4
   3          4.7           4.5       5.8       4.7
   4          5.5           4.3       5.6       4.8
   5          6.2           6.1       6.1       6.7
   6          3.2           6.6       7.6       8.3
   7          2.5           6.2       8         8.2
   8          2.8           3.6       4.4       4.3
   9          1.1           1.1       5.7       5.8
  10          2.9           4.9       6.3       6.4
B = L-Hyoscyamine HBr
C = L-Hyoscine HBr
D = R-Hyoscine HBr
SJS SDI_6 25
Features
• Treatments provide one dimension
• Patients provide the other
• Patients are the “blocks”
• Control is no treatment
• Main interest is difference between optical isomers
• The other treatment (L-Hyoscyamine HBr) is a positive control
SJS SDI_6 26
Cushny and Peebles Data
[Figure: outcome (axis from 0 to 8) plotted against patient number (1 to 11), with one symbol per treatment: Control, L-Hyoscyamine HBr, L-Hyoscine HBr, R-Hyoscine HBr]
SJS SDI_6 27
Plotting Points
• Important to show three dimensions of data
– Outcome
– Treatment
– Block (patients)
• This has been done here by using
– Patient as a “pseudo-dimension” X
– Outcome as the Y dimension
– Treatment by colour and symbols
SJS SDI_6 28
Points
• Clear difference between treatments and control
• Some suggestion of difference to active control
• Little suggestion of difference between isomers
SJS SDI_6 29
Cushny and Peebles Data
Comparison of Two Isomers with Respect to Sleep (hours)
[Figure: scatter plot of R-Hyoscine HBr against L-Hyoscine HBr, both axes from 0 to 6]
SJS SDI_6 30
Exp_6 SPlus Analysis
#Analysis of first 10 patients
#Cushny and Peebles data
patient<-factor(rep(c("1","2","3","4","5","6","7","8","9","10"),4))
treat<-factor(c(rep("A",10),rep("B",10),rep("C",10),rep("D",10)))
sleep<-c(0.6,3.0,4.7,5.5,6.2,3.2,2.5,2.8,1.1,2.9,
         1.3,1.4,4.5,4.3,6.1,6.6,6.2,3.6,1.1,4.9,
         2.5,3.8,5.8,5.6,6.1,7.6,8.0,4.4,5.7,6.3,
         2.1,4.4,4.7,4.8,6.7,8.3,8.2,4.3,5.8,6.4)
fit1<-aov(sleep~patient+treat)
summary(fit1)
SJS SDI_6 31
Exp_6 SPlus Output
summary(fit1)
          Df Sum of Sq  Mean Sq  F Value        Pr(F)
  patient  9    89.500  9.94444  7.38815 0.0000231769
    treat  3    40.838 13.61267 10.11342 0.0001225074
Residuals 27    36.342  1.34600
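Because the layout is balanced, the sums of squares in this table can be reproduced directly from row and column means. The following Python sketch assumes the data for the first 10 patients as entered above; the variable names are illustrative, and the Pr(F) column is not reproduced because it needs the F distribution.

```python
# Hours of sleep for patients 1..10 under each treatment (A = control)
sleep = {
    "A": [0.6, 3.0, 4.7, 5.5, 6.2, 3.2, 2.5, 2.8, 1.1, 2.9],
    "B": [1.3, 1.4, 4.5, 4.3, 6.1, 6.6, 6.2, 3.6, 1.1, 4.9],
    "C": [2.5, 3.8, 5.8, 5.6, 6.1, 7.6, 8.0, 4.4, 5.7, 6.3],
    "D": [2.1, 4.4, 4.7, 4.8, 6.7, 8.3, 8.2, 4.3, 5.8, 6.4],
}
n_treat = len(sleep)
n_patients = len(sleep["A"])

all_values = [y for values in sleep.values() for y in values]
grand_mean = sum(all_values) / len(all_values)
treat_means = {t: sum(v) / n_patients for t, v in sleep.items()}
patient_means = [sum(sleep[t][i] for t in sleep) / n_treat
                 for i in range(n_patients)]

# Sums of squares for a balanced two-way layout without replication
ss_total = sum((y - grand_mean) ** 2 for y in all_values)
ss_treat = n_patients * sum((m - grand_mean) ** 2 for m in treat_means.values())
ss_patient = n_treat * sum((m - grand_mean) ** 2 for m in patient_means)
ss_resid = ss_total - ss_treat - ss_patient

df_patient, df_treat = n_patients - 1, n_treat - 1
df_resid = df_patient * df_treat
ms_resid = ss_resid / df_resid
f_patient = (ss_patient / df_patient) / ms_resid
f_treat = (ss_treat / df_treat) / ms_resid

print(f"patient   {df_patient:2d} {ss_patient:8.3f} F={f_patient:.5f}")
print(f"treat     {df_treat:2d} {ss_treat:8.3f} F={f_treat:.5f}")
print(f"Residuals {df_resid:2d} {ss_resid:8.3f} MS={ms_resid:.5f}")
```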
SJS SDI_6 32
[Figure: diagnostic plots. Panels: treat against lsleep; histogram of residuals; residuals against fitted values; empirical against theoretical quantiles of the residuals]
SJS SDI_6 33
Calculation of Standard Errors
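$$\operatorname{var}(\hat{\tau}_l - \hat{\tau}_m) = \sigma^2\left(\frac{1}{r} + \frac{1}{r}\right) = \frac{2\sigma^2}{r}, \qquad l \neq m$$
Here we have $\hat{\sigma}^2 = 1.34600$
$$\widehat{\operatorname{var}}(\hat{\tau}_l - \hat{\tau}_m) = \frac{2 \times 1.3460}{10} = 0.2692$$
$$\widehat{SE}(\hat{\tau}_l - \hat{\tau}_m) = \sqrt{0.2692} = 0.519$$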
Note that this applies as a consequence of orthogonality
The multiplier 2/r is the same whether or not we fit patient in addition to treat
SJS SDI_6 34
> multicomp(fit1, focus = "treat", error.type = "cwe", method = "lsd")
95 % non-simultaneous confidence intervals for specified linear combinations, by the Fisher LSD method
critical point: 2.0518 response variable: sleep
intervals excluding 0 are flagged by '****'
     Estimate  Std.Error  Lower Bound  Upper Bound
A-B    -0.75     0.519       -1.81        0.315
A-C    -2.33     0.519       -3.39       -1.270  ****
A-D    -2.32     0.519       -3.38       -1.260  ****
B-C    -1.58     0.519       -2.64       -0.515  ****
B-D    -1.57     0.519       -2.63       -0.505  ****
C-D     0.01     0.519       -1.05        1.070
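These intervals follow from the common standard error and the critical point. A Python sketch, assuming the treatment means computed from the data table (A 3.25, B 4.00, C 5.58, D 5.57), the residual mean square 1.346 and the critical point 2.0518 quoted in the output:

```python
import math
from itertools import combinations

treat_means = {"A": 3.25, "B": 4.00, "C": 5.58, "D": 5.57}  # from the data table
ms_resid = 1.346    # residual mean square from the ANOVA table
r = 10              # patients per treatment
critical = 2.0518   # critical point quoted in the multicomp output

se = math.sqrt(2 * ms_resid / r)  # same for every pair because the design is balanced

results = {}
for a, b in combinations(treat_means, 2):
    estimate = treat_means[a] - treat_means[b]
    results[f"{a}-{b}"] = (estimate, estimate - critical * se, estimate + critical * se)

# Intervals that exclude 0 are flagged, as in the output above
flagged = [name for name, (_, lower, upper) in results.items()
           if lower > 0 or upper < 0]

for name, (estimate, lower, upper) in results.items():
    mark = " ****" if name in flagged else ""
    print(f"{name} {estimate:6.2f} {se:.3f} {lower:6.2f} {upper:6.3f}{mark}")
```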
SJS SDI_6 35
Questions
• Perform a matched pairs analysis comparing B and C and ignoring data from A and D.
• Compare it to the pair-wise contrast for B and C obtained above.
• Why are the results not the same?
• What are the advantages and disadvantages of these approaches?