Effect size calculation in educational and behavioral research
Wim Van den Noortgate
‘Power training’ Faculty Psychology and Educational Sciences, K.U.Leuven
Leuven, October 10 2003
Questions and comments: [email protected]
1. Applications
2. A measure for each situation
3. Some specific topics
Applications
1. Expressing size of association
2. Comparing size of association
3. Determining power
Application 1: Expressing size of association
Example: $\mu_M = 8$; $\mu_F = 8.5$; $\sigma_M = \sigma_F = 1.5$ $\Rightarrow$ $\delta = (\mu_F - \mu_M)/\sigma = 0.5/1.5 = 0.33$
Ten simulated samples from this population:
Mean M   Mean F   s_C    s_E    p (two-sided)   g
8.10     9.34     1.55   1.55   0.015 (*)       0.80
7.60     7.59     1.23   1.47   0.98            -0.0069
7.96     8.81     1.38   1.59   0.078           0.57
7.70     8.25     1.49   1.65   0.28            0.35
8.17     8.25     1.76   1.33   0.87            0.053
7.86     8.81     1.24   1.58   0.040 (*)       0.67
8.19     7.93     1.79   1.78   0.65            -0.14
8.11     8.15     1.76   1.97   0.95            0.020
7.86     7.94     1.89   1.64   0.89            0.042
8.34     8.53     1.39   1.79   0.71            0.12
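Sampling distribution of g:
$g \sim N(\delta, \sigma^2_{(g)})$ with $\sigma^2_{(g)} \approx SE^2_{(g)} = \frac{n_M + n_F}{n_M n_F}$
For 95% of g: $g \in [\delta - 1.96\,SE;\ \delta + 1.96\,SE]$
For 95% of g: $\delta \in [g - 1.96\,SE;\ g + 1.96\,SE]$
Here: $g \sim N(0.33;\ 0.32^2)$
[Figure: sampling distribution of g, with 0 and δ = 0.33 marked on the g axis]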
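The sampling distribution of g can be checked with a small simulation. The sketch below is illustrative only: it assumes 20 observations per group (a value not stated on the slide, chosen because it gives SE close to 0.32), normally distributed scores, and an arbitrary random seed; `simulate_g` is a hypothetical helper name.

```python
import math
import random
import statistics

def simulate_g(rng, n=20, mu_m=8.0, mu_f=8.5, sigma=1.5):
    """Draw one sample per group and return the standardized mean difference g."""
    males = [rng.gauss(mu_m, sigma) for _ in range(n)]
    females = [rng.gauss(mu_f, sigma) for _ in range(n)]
    # Pooled SD; with equal group sizes this is the root of the mean variance
    s_pooled = math.sqrt((statistics.variance(males) + statistics.variance(females)) / 2)
    return (statistics.mean(females) - statistics.mean(males)) / s_pooled

rng = random.Random(2003)  # arbitrary seed, for reproducibility only
gs = [simulate_g(rng) for _ in range(2000)]

mean_g = statistics.mean(gs)
sd_g = statistics.stdev(gs)
# SE from the large-sample formula sqrt((n_M + n_F) / (n_M * n_F)), assumed n = 20
se_formula = math.sqrt((20 + 20) / (20 * 20))

print(f"mean of g: {mean_g:.2f}, SD of g: {sd_g:.2f}, SE from formula: {se_formula:.2f}")
```

The empirical SD of the simulated g values should land near the formula-based SE, and their mean near δ = 0.33.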
Mean M   Mean F   s_C    s_E    p           g         95% CI
8.10     9.34     1.55   1.55   0.015 (*)   0.80      [0.17; 1.43]
7.60     7.59     1.23   1.47   0.98        -0.0069   [-0.63; 0.62]
7.96     8.81     1.38   1.59   0.078       0.57      [-0.06; 1.20]
7.70     8.25     1.49   1.65   0.28        0.35      [-0.28; 0.98]
8.17     8.25     1.76   1.33   0.87        0.053     [-0.57; 0.68]
7.86     8.81     1.24   1.58   0.040 (*)   0.67      [0.04; 1.30]
8.19     7.93     1.79   1.78   0.65        -0.14     [-0.77; 0.49]
8.11     8.15     1.76   1.97   0.95        0.020     [-0.61; 0.65]
7.86     7.94     1.89   1.64   0.89        0.042     [-0.59; 0.67]
8.34     8.53     1.39   1.79   0.71        0.12      [-0.51; 0.75]
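The confidence intervals in this table follow from g ± 1.96 SE. A minimal sketch, assuming the same SE of 0.32 for every sample (the value in g ~ N(0.33; 0.32²)); the variable names are illustrative:

```python
# Observed g values of the ten simulated samples, as listed in the table
g_values = [0.80, -0.0069, 0.57, 0.35, 0.053, 0.67, -0.14, 0.020, 0.042, 0.12]
SE = 0.32  # standard error of g, assumed identical for all samples
Z = 1.96   # two-sided 95% critical value of the standard normal

intervals = [(g - Z * SE, g + Z * SE) for g in g_values]

for g, (low, high) in zip(g_values, intervals):
    note = "" if low <= 0 <= high else "  (0 not in CI)"
    print(f"g = {g:7.4f}   95% CI [{low:5.2f}; {high:5.2f}]{note}")

# Number of samples whose interval excludes 0 (matches the starred p-values)
n_excluding_zero = sum(1 for low, high in intervals if not (low <= 0 <= high))
```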
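Suppose the simulated data come from 10 studies that are replications of each other:
$\hat{\delta} = \bar{g} = \frac{\sum g}{k}$
$SE_{(\bar{g})} = \sqrt{var(\bar{g})} = \sqrt{\frac{var(g)}{k}} = \frac{SE_{(g)}}{\sqrt{k}}$
$\hat{\delta} = 0.25$ or $[0.05;\ 0.45]$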
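The combined estimate can be reproduced from the ten observed g values. This is a sketch under the assumption that every study has the same standard error (0.32), so that a simple unweighted mean is appropriate:

```python
import math

g_values = [0.80, -0.0069, 0.57, 0.35, 0.053, 0.67, -0.14, 0.020, 0.042, 0.12]
SE_SINGLE = 0.32  # standard error of g in one study (assumed equal across studies)

k = len(g_values)
g_bar = sum(g_values) / k             # combined estimate of delta
se_bar = SE_SINGLE / math.sqrt(k)     # SE of the mean of k independent estimates
ci = (g_bar - 1.96 * se_bar, g_bar + 1.96 * se_bar)

print(f"combined g = {g_bar:.2f}, 95% CI [{ci[0]:.2f}; {ci[1]:.2f}]")
```

Combining the ten studies shrinks the standard error by a factor √10, which is why the interval no longer contains 0.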
Comparing individual study results and combined study results
Individual studies:
1. Observed effect sizes may be negative, small, moderate or large.
2. CIs relatively wide
3. 0 often included in the confidence intervals
Combined result:
4. Combined effect size close to the population effect size
5. CI relatively narrow
6. 0 not included in the confidence interval
Meta-analysis: Gene Glass (Educational Researcher, 1976, p.3):
“Meta-analysis refers to the analysis of analyses”
Example: Raudenbush & Bryk (2002)
Study                              Weeks previous contact   g       SE
 1. Rosenthal et al. (1974)        2                        0.03    0.13
 2. Conn et al. (1968)             3                        0.12    0.15
 3. Jose & Cody (1971)             3                        -0.14   0.17
 4. Pellegrini & Hicks (1972)      0                        1.18    0.37
 5. Pellegrini & Hicks (1972)      0                        0.26    0.37
 6. Evans & Rosenthal (1969)       3                        -0.06   0.10
 7. Fielder et al. (1971)          3                        -0.02   0.10
 8. Claiborn (1969)                3                        -0.32   0.22
 9. Kester & Letchworth (1972)     0                        0.27    0.16
10. Maxwell (1970)                 1                        0.80    0.25
11. Carter (1970)                  0                        0.54    0.30
12. Flowers (1966)                 0                        0.18    0.22
13. Keshock (1970)                 1                        -0.02   0.29
14. Henrickson (1970)              2                        0.23    0.29
15. Fine (1972)                    3                        -0.18   0.16
16. Greiger (1970)                 3                        -0.06   0.17
17. Rosenthal & Jacobson (1968)    1                        0.30    0.14
18. Fleming & Anttonen (1971)      2                        0.07    0.09
19. Ginsburg (1970)                3                        -0.07   0.17
Application 2: Comparing the size of association
Results meta-analysis:
1. The variation between observed effect sizes is larger than could be expected based on sampling variance alone: the population effect size is probably not the same for all studies.
2. The effect depends on the amount of previous contact
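Both conclusions can be illustrated with the 19 effect sizes from the Raudenbush & Bryk example. The sketch below uses a fixed-effect homogeneity statistic Q and a weighted least-squares regression of g on weeks of previous contact; this is a simplification chosen for illustration and is not necessarily the model used in the original analysis.

```python
g = [0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27, 0.80,
     0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07]
se = [0.13, 0.15, 0.17, 0.37, 0.37, 0.10, 0.10, 0.22, 0.16, 0.25,
      0.30, 0.22, 0.29, 0.29, 0.16, 0.17, 0.14, 0.09, 0.17]
weeks = [2, 3, 3, 0, 0, 3, 3, 3, 0, 1, 0, 0, 1, 2, 3, 3, 1, 2, 3]

w = [1 / s ** 2 for s in se]  # inverse-variance weights
sum_w = sum(w)

# Precision-weighted mean effect size
g_bar = sum(wi * gi for wi, gi in zip(w, g)) / sum_w

# Homogeneity statistic: approximately chi-square with k - 1 df if all
# studies share one population effect size
Q = sum(wi * (gi - g_bar) ** 2 for wi, gi in zip(w, g))
df = len(g) - 1
CHI2_CRIT_DF18 = 28.87  # 95th percentile of chi-square with 18 df

# Weighted least-squares regression of g on weeks of previous contact
x_bar = sum(wi * xi for wi, xi in zip(w, weeks)) / sum_w
slope = (sum(wi * (xi - x_bar) * (gi - g_bar) for wi, xi, gi in zip(w, weeks, g))
         / sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, weeks)))
intercept = g_bar - slope * x_bar

print(f"weighted mean g = {g_bar:.3f}")
print(f"Q = {Q:.1f} on {df} df (critical value {CHI2_CRIT_DF18})")
print(f"g = {intercept:.2f} + ({slope:.2f}) * weeks")
```

A Q above the critical value points to heterogeneity, and a negative slope indicates smaller effects when there was more previous contact.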
Application 3: Power calculations
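$H_0: \delta = 0$; $H_a: \delta \neq 0$
$0 \in [g - z_{(1-\alpha/2)}\,SE;\ g + z_{(1-\alpha/2)}\,SE] \Rightarrow$ do not reject $H_0$
$0 \notin [g - z_{(1-\alpha/2)}\,SE;\ g + z_{(1-\alpha/2)}\,SE] \Rightarrow$ reject $H_0$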
Power = probability of rejecting H0
Power depends on:
- δ
- α
- N
‘Powerful’ questions:
1. Suppose the population effect size is small (δ = 0.20), how large should my sample size (N) be, to have a high probability (say, .80) to draw the conclusion that there is an effect (power), when testing with an α-level of .05?
2. I did not find an effect, but maybe the chance to find an effect (power) with such a small sample is small anyway? (N and α from study, assume for instance that δ=g)
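Both questions can be approached with a normal approximation to the distribution of g. The sketch below assumes two equally sized groups, SE = sqrt(2/n) with n the size of each group, and a two-sided test; the function names and the second scenario (δ = 0.33 with 20 participants per group) are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_groups(delta, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test of H0: delta = 0."""
    se = math.sqrt(2 / n_per_group)          # (n1 + n2) / (n1 * n2) with n1 = n2
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta / se                        # expected value of g / SE
    # Probability that g / SE falls above z_crit or below -z_crit
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

def required_n_per_group(delta, target_power=0.80, alpha=0.05):
    """Smallest group size that reaches the target power (simple search)."""
    n = 2
    while power_two_groups(delta, n, alpha) < target_power:
        n += 1
    return n

# Question 1: small effect, power .80, alpha .05
n_needed = required_n_per_group(0.20)
# Question 2: power of a small study, assuming delta = 0.33 and n = 20 per group
power_small_study = power_two_groups(0.33, 20)

print(f"n per group for delta = 0.20: {n_needed}")
print(f"power with n = 20 per group and delta = 0.33: {power_small_study:.2f}")
```

Under these assumptions a small effect requires several hundred participants per group, while a study with 20 per group has low power even for δ = 0.33.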
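A measure for each situation
Independent variable dichotomous:
- Dependent variable dichotomous: RD, RR, Φ, OR
- Dependent variable ordinal/interval: $g_{IG}$, Glass’s Δ, $g_{gain}$, $g_{gain\,IG}$, nonparametric measures, $r_{pb}$
Independent variable nominal:
- Dependent variable nominal: measures of contingency, Goodman-Kruskal’s tau, uncertainty coefficient, Cohen’s kappa
- Dependent variable ordinal/interval: multiple g’s, η², ICC
Independent variable ordinal/interval:
- Dependent variable ordinal/interval: Pearson’s r, Spearman’s ρ, Kendall’s τ / Somers’ D, gamma coefficient, weighted kappa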
Dichotomous independent-dichotomous dependent variable
                   Final exam
Predictive test    1             0            Total
1                  130 (87 %)    20 (13 %)    150 (100 %)
0                  30 (60 %)     20 (40 %)    50 (100 %)
Total              160           40           200

1. Risk difference: .87 - .60 = .27
2. Relative risk: .87/.60 = 1.45
3. Phi: (130 x 20 - 20 x 30)/sqrt(150 x 50 x 160 x 40) = 0.29
4. Odds ratio: (130 x 20)/(20 x 30) = 4.33
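The four measures can be computed directly from the cell counts. A minimal sketch; the cell names a, b, c, d are an illustrative convention, not notation from the slides:

```python
import math

# Rows: predictive test (1, 0); columns: final exam (1, 0)
a, b = 130, 20   # predictive test = 1
c, d = 30, 20    # predictive test = 0

p1 = a / (a + b)   # proportion passing the exam when the test score is 1
p0 = c / (c + d)   # proportion passing the exam when the test score is 0

risk_difference = p1 - p0
relative_risk = p1 / p0
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
odds_ratio = (a * d) / (b * c)

print(f"RD = {risk_difference:.2f}, RR = {relative_risk:.2f}, "
      f"phi = {phi:.2f}, OR = {odds_ratio:.2f}")
```

Working from unrounded proportions gives a relative risk of about 1.44; the 1.45 on the slide comes from dividing the rounded values .87 and .60.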
Dichotomous independent-continuous dependent variable
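1. Independent groups, homogeneous variance: $g = \frac{\bar{x}_E - \bar{x}_C}{s_p}$
2. Independent groups, heterogeneous variance: Glass’s $\Delta = \frac{\bar{x}_E - \bar{x}_C}{s_C}$
3. Repeated measures (one group): $g_{gain} = \frac{\bar{x}_{post} - \bar{x}_{pre}}{s_{pre}}$ or $g = \frac{\bar{D}}{s_D}$
4. Repeated measures (independent groups): $g_{gain\,IG} = \frac{\bar{x}_{postE} - \bar{x}_{preE}}{s_{preE}} - \frac{\bar{x}_{postC} - \bar{x}_{preC}}{s_{preC}}$
5. Nonparametric measures
6. $r_{pb}$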
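The first four measures can be sketched as small functions of summary statistics. The function names and all numeric inputs below are hypothetical illustrations, not values from the slides:

```python
import math

def standardized_mean_difference(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """g for independent groups, using the pooled standard deviation s_p."""
    s_p = math.sqrt(((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2))
    return (mean_e - mean_c) / s_p

def glass_delta(mean_e, mean_c, sd_c):
    """Glass's Delta: standardize by the control-group SD only."""
    return (mean_e - mean_c) / sd_c

def g_gain(mean_post, mean_pre, sd_pre):
    """Gain in one group, standardized by the pretest SD."""
    return (mean_post - mean_pre) / sd_pre

def g_gain_ig(post_e, pre_e, sd_pre_e, post_c, pre_c, sd_pre_c):
    """Difference between the standardized gains of two independent groups."""
    return g_gain(post_e, pre_e, sd_pre_e) - g_gain(post_c, pre_c, sd_pre_c)

# Hypothetical summary statistics
g = standardized_mean_difference(9.34, 8.10, 1.55, 1.55, 20, 20)
delta_glass = glass_delta(9.34, 8.10, 1.55)
gain = g_gain(12.0, 10.0, 4.0)
gain_ig = g_gain_ig(12.0, 10.0, 4.0, 11.0, 10.0, 4.0)

print(f"g = {g:.2f}, Glass's Delta = {delta_glass:.2f}, "
      f"g_gain = {gain:.2f}, g_gain IG = {gain_ig:.2f}")
```

When both groups have the same SD, g and Glass's Δ coincide; they diverge as the variances become more unequal.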
Nominal independent-nominal dependent variable
1. Contingency measures, e.g.:
   a) Pearson’s contingency coefficient
   b) Cramér’s V
   c) Phi coefficient
2. Goodman-Kruskal tau
3. Uncertainty coefficient
4. Cohen’s Kappa
                Illness
                Better   Same   Worse
Experimental    10       5      2
Control         4        7      3

                Illness
                Better   Same   Worse
Control         10       5      2
Experimental    4        7      3

                Illness
                Same     Better   Worse
Experimental    10       5        2
Control         4        7        3
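The three tables contain the same counts under different row or column labels. Measures for nominal variables ignore category order, which the sketch below illustrates with Cramér's V; the helper `cramers_v` is written here for illustration and each table is entered with rows (Experimental, Control) and columns (Better, Same, Worse).

```python
import math

def cramers_v(table):
    """Cramér's V from a table of counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))  # smaller of number of rows and columns
    return math.sqrt(chi2 / (n * (k - 1)))

first_table = [[10, 5, 2], [4, 7, 3]]
second_table = [[4, 7, 3], [10, 5, 2]]   # row labels exchanged
third_table = [[5, 10, 2], [7, 4, 3]]    # 'Better' and 'Same' labels exchanged

v_values = [cramers_v(t) for t in (first_table, second_table, third_table)]
print([round(v, 3) for v in v_values])
```

All three tables give the same V, even though they describe different outcomes for the experimental group: the measure expresses strength of association but not direction.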
Nominal independent-continuous dependent variable
1. ANOVA: multiple g’s
2. η²
3. ICC
Continuous independent-Continuous dependent variable
1. r
2. Non-normal data: Spearman ρ
3. Ordinal data: Kendall’s τ, Somers’ D, Gamma coefficient
4. Weighted Kappa
More complex situations
1. Two or more independent variables
a) Regression models
1. Y continuous: $Y_i = a + bX_i + e_i$
   1. X continuous: b estimated by $r_{XY}\,\frac{s_Y}{s_X}$
   2. X dichotomous (1 = experimental, 0 = control): b estimated by $\bar{y}_E - \bar{y}_C$
2. Y dichotomous: $Logit(P(Y=1)) = a + bX$
   If X dichotomous, b estimated by the log odds ratio
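These equivalences can be verified numerically. The sketch below uses small made-up data sets (hypothetical values, for illustration only) and reuses the 2 x 2 exam table from earlier in the talk for the log odds ratio:

```python
import math
import statistics

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data
x_cont = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

# X continuous: slope equals r * s_Y / s_X
b = ols_slope(x_cont, y)
b_from_r = pearson_r(x_cont, y) * statistics.stdev(y) / statistics.stdev(x_cont)

# X dichotomous (0 = control, 1 = experimental): slope equals the mean difference
group = [0, 0, 0, 1, 1, 1]
b_dummy = ols_slope(group, y)
mean_diff = statistics.mean(y[3:]) - statistics.mean(y[:3])

# Y and X dichotomous: logistic slope corresponds to the log odds ratio
log_odds_ratio = math.log((130 * 20) / (20 * 30))

print(f"b = {b:.3f}, r * s_Y / s_X = {b_from_r:.3f}")
print(f"b (dummy X) = {b_dummy:.3f}, mean difference = {mean_diff:.3f}")
print(f"log odds ratio = {log_odds_ratio:.3f}")
```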
More complex situations
1. Two or more independent variables
a) Regression models
b) Stratification
c) Contrast analyses in factorial designs (Rosenthal, Rosnow & Rubin, 2000)
                      Number of treatments weekly
Dose of medication    0       1       2       3       Mean
100 mg                3       10      9       12      8.5
50 mg                 1       4       8       9       5.5
0 mg                  1       4       6       5       4.0
Mean                  1.67    6.00    7.67    8.67    6.0

Source            SS      df     MS       F       p
Between           1420    11     129.09   5.16    .000002
  Treatments      860     3      286.67   11.47   .000002
  Dose            420     2      210.00   8.40    .0004
  Treat. x dose   140     6      23.33    0.93    .47
Within            2700    108    25.00
Total             4120    119
Note: N = 120 (12 x 10)

                      Number of treatments weekly
Dose of medication    0     1     2     3
100 mg                -3    -1    +1    +3
50 mg                 -3    -1    +1    +3
0 mg                  -3    -1    +1    +3

                      Number of treatments weekly
Dose of medication    0     1     2     3     Mean
100 mg                -1    +1    +3    +5    +2
50 mg                 -3    -1    +1    +3    0
0 mg                  -5    -3    -1    +1    -2
Mean                  -3    -1    +1    +3    0
$F_{contrast} = \frac{MS_{contrast}}{MS_{within}} = \frac{r^2_{means \cdot weights} \times SS_{between}}{MS_{within}}$
$r_{effect\ size} = \sqrt{\frac{F_{contrast}}{F_{contrast} + F_{noncontrast}(df_{noncontrast}) + df_{within}}}$
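The contrast computation can be traced with the cell means and ANOVA results above. The sketch below uses the second set of weights (the table with row and column means) and the standard formula SS_contrast = n L² / Σλ² with L = Σλ·mean; the variable names are illustrative, and the choice of this weight table is an assumption about which contrast is being evaluated.

```python
import math

means = [[3, 10, 9, 12],
         [1, 4, 8, 9],
         [1, 4, 6, 5]]
weights = [[-1, 1, 3, 5],
           [-3, -1, 1, 3],
           [-5, -3, -1, 1]]
n_per_cell = 10
ss_between, df_between = 1420.0, 11
ms_within, df_within = 25.0, 108

flat_means = [m for row in means for m in row]
flat_weights = [lam for row in weights for lam in row]
assert sum(flat_weights) == 0  # contrast weights must sum to zero

# Contrast value L and its sum of squares (df = 1, so SS = MS)
L = sum(lam * m for lam, m in zip(flat_weights, flat_means))
ss_contrast = n_per_cell * L ** 2 / sum(lam ** 2 for lam in flat_weights)
f_contrast = ss_contrast / ms_within

# Squared correlation between cell means and weights
r2_means_weights = ss_contrast / ss_between

# Remaining between-cell variation not captured by the contrast
df_noncontrast = df_between - 1
f_noncontrast = (ss_between - ss_contrast) / df_noncontrast / ms_within

r_effect_size = math.sqrt(
    f_contrast / (f_contrast + f_noncontrast * df_noncontrast + df_within)
)

print(f"SS_contrast = {ss_contrast:.2f}, F_contrast = {f_contrast:.2f}")
print(f"r^2(means, weights) = {r2_means_weights:.2f}, r_effect size = {r_effect_size:.2f}")
```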
More complex situations
1. Two or more independent variables
a) Regression models
b) Stratification
c) Contrast analyses in factorial designs
2. Multilevel models
3. Two or more dependent variables
4. Single-case studies
• $Y_i = b_0 + b_1\,phase_i + e_i$
• $Y_i = b_0 + b_1\,time_i + b_2\,phase_i + b_3\,(time_i \times phase_i) + e_i$
Specific topics
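Comparability of effect sizes
Example: $g_{IG}$ vs. $g_{gain}$:
$\delta_{IG} = \frac{\mu_E - \mu_C}{\sigma}$
$\delta_{gain} = \frac{\mu_D}{\sigma_D} = \frac{\mu_{post} - \mu_{pre}}{\sqrt{\sigma^2_{pre} + \sigma^2_{post} - 2\rho\,\sigma_{pre}\sigma_{post}}} = \frac{\mu_{post} - \mu_{pre}}{\sigma\sqrt{2(1-\rho)}} = \frac{\delta_{IG}}{\sqrt{2(1-\rho)}}$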
Comparability of effect sizes
1. Estimating different population parameters, e.g., $g_{gain} = \frac{g}{\sqrt{2(1-r)}}$
2. Estimating with different precision, e.g., g vs. Glass’s Δ
Choosing a measure
1. Design and measurement level
2. Assumptions
3. Popularity
4. Simplicity of sampling distribution
   Fisher’s Z = 0.5 log[(1+r)/(1-r)]
   Log odds ratio
   Ln(RR)
5. Directional effect size
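As an illustration of point 4, Fisher's Z has an approximately normal sampling distribution with standard error 1/sqrt(n - 3), which makes confidence intervals for r straightforward. A minimal sketch; the function names and the example values (r = 0.5, n = 50) are hypothetical:

```python
import math

def fisher_z(r):
    """Fisher's Z transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))

def inverse_fisher_z(z):
    """Back-transform Fisher's Z to the correlation scale."""
    return math.tanh(z)

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for r, computed on the Z scale and back-transformed."""
    z = fisher_z(r)
    se = 1 / math.sqrt(n - 3)
    return inverse_fisher_z(z - z_crit * se), inverse_fisher_z(z + z_crit * se)

z_value = fisher_z(0.5)
lower, upper = r_confidence_interval(0.5, 50)

print(f"Z = {z_value:.4f}, 95% CI for r: [{lower:.2f}; {upper:.2f}]")
```

The interval is asymmetric around r on the correlation scale, because the symmetric interval is built on the Z scale.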
Threats to effect sizes
1. ‘Bad data’
2. Measurement error
3. Artificial dichotomization
4. Imperfect construct validity
5. Range restriction
6. Bias