effect size calculation in educational and behavioral research wim van den noortgate ‘power...

Effect size calculation in educational and behavioral research

Wim Van den Noortgate

‘Power training’, Faculty of Psychology and Educational Sciences, K.U.Leuven

Leuven, October 10, 2003

Questions and comments: [email protected]

1. Applications

2. A measure for each situation

3. Some specific topics

Applications

1. Expressing size of association

2. Comparing size of association

3. Determining power

Application 1: Expressing size of association

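Example: $\mu_M = 8$; $\mu_F = 8.5$; $\sigma_M = \sigma_F = 1.5 \Rightarrow \delta = (\mu_F - \mu_M)/\sigma = 0.33$

[Figure: population distributions of M and F, with sample means $\bar{x}_M$ and $\bar{x}_F$]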


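$g = \frac{\bar{x}_F - \bar{x}_M}{s_p}$ estimates $\delta$

$g \sim N(\delta, \sigma^2)$, with $\sigma = SE$ and $\sigma^2(g) \approx \frac{n_M + n_F}{n_M n_F} + \frac{\delta^2}{2(n_M + n_F)}$

For 95% of $g$: $\delta - 1.96\,SE \le g \le \delta + 1.96\,SE$

For 95% of $g$: $g - 1.96\,SE \le \delta \le g + 1.96\,SE$

Here: $g \sim N(0.33;\ 0.32^2)$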

[Figure: sampling distribution of g, centered at δ = 0.33]

Study  x̄_M   x̄_F   s_M   s_F   p (two-sided)  g        95% CI for δ
1      8.10  9.34  1.55  1.55  0.015 (*)      0.80     [0.17; 1.43]
2      7.60  7.59  1.23  1.47  0.98           -0.0069  [-0.63; 0.62]
3      7.96  8.81  1.38  1.59  0.078          0.57     [-0.06; 1.20]
4      7.70  8.25  1.49  1.65  0.28           0.35     [-0.28; 0.98]
5      8.17  8.25  1.76  1.33  0.87           0.053    [-0.57; 0.68]
6      7.86  8.81  1.24  1.58  0.040 (*)      0.67     [0.04; 1.30]
7      8.19  7.93  1.79  1.78  0.65           -0.14    [-0.77; 0.49]
8      8.11  8.15  1.76  1.97  0.95           0.020    [-0.61; 0.65]
9      7.86  7.94  1.89  1.64  0.89           0.042    [-0.59; 0.67]
10     8.34  8.53  1.39  1.79  0.71           0.12     [-0.51; 0.75]

(*) p < .05
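A minimal Python sketch of how the intervals in the table are formed, assuming the usual large-sample approximation $\sigma^2(g) \approx \frac{n_M + n_F}{n_M n_F} + \frac{\delta^2}{2(n_M + n_F)}$. The function names `se_g` and `ci_delta` are illustrative, not from the slides. The group sizes n_M = n_F = 20 in the last line are hypothetical (the slides do not report them); with δ = 0.33 they give SE ≈ 0.32, the value used for every interval.

```python
import math
from typing import Tuple

Z_975 = 1.96  # two-sided 95% critical value, as used on the slides


def se_g(n_m: int, n_f: int, delta: float) -> float:
    """Large-sample SE of g for group sizes n_m and n_f."""
    n_sum = n_m + n_f
    return math.sqrt(n_sum / (n_m * n_f) + delta ** 2 / (2 * n_sum))


def ci_delta(g: float, se: float, z: float = Z_975) -> Tuple[float, float]:
    """95% CI for delta: [g - z * SE; g + z * SE]."""
    return g - z * se, g + z * se


if __name__ == "__main__":
    # The slides use SE = 0.32 for every replication.
    for g in (0.80, -0.0069, 0.57):
        lo, hi = ci_delta(g, 0.32)
        print(f"g = {g:7.4f}  95% CI [{lo:.2f}; {hi:.2f}]")
    # Hypothetical group sizes (not reported on the slides):
    print(f"SE for n_M = n_F = 20, delta = 0.33: {se_g(20, 20, 0.33):.2f}")
```

With SE = 0.32 this reproduces, for example, [0.17; 1.43] for g = 0.80 and [-0.63; 0.62] for g = -0.0069.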

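Suppose the simulated data come from 10 studies that are replications of each other. With k studies:

$\hat{\delta} = \bar{g} = \frac{1}{k}\sum_{i=1}^{k} g_i$

$\mathrm{var}(\bar{g}) = \frac{\mathrm{var}(g)}{k}$, so $SE(\bar{g}) = \frac{SE(g)}{\sqrt{k}}$

$\hat{\delta} = 0.25$, 95% CI $[0.05; 0.45]$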
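The combination step can be checked with a short Python sketch. It assumes, as on the slides, that every replication has the same SE (0.32), so the combined estimate is a plain average of the ten g values; `combine_replications` is an illustrative name.

```python
import math

# g values of the 10 simulated replications (table above)
G = [0.80, -0.0069, 0.57, 0.35, 0.053, 0.67, -0.14, 0.020, 0.042, 0.12]
SE_G = 0.32  # common SE of a single g, as on the slides


def combine_replications(gs, se_single, z=1.96):
    """Equal-weight combination: mean of g, with SE = SE(g) / sqrt(k)."""
    k = len(gs)
    g_bar = sum(gs) / k
    se_bar = se_single / math.sqrt(k)
    return g_bar, (g_bar - z * se_bar, g_bar + z * se_bar)


g_bar, (lo, hi) = combine_replications(G, SE_G)
print(f"delta-hat = {g_bar:.2f}, 95% CI [{lo:.2f}; {hi:.2f}]")
```

This prints delta-hat = 0.25 with 95% CI [0.05; 0.45], matching the combined result above.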

Comparing individual study results and combined study results

Individual studies:
1. Observed effect sizes vary widely: some are negative, others small, moderate, or large.
2. CIs are relatively wide.
3. 0 is often included in the confidence interval.

Combined result:
4. The combined effect size is close to the population effect size.
5. The CI is relatively narrow.
6. 0 is not included in the confidence interval.

Meta-analysis: Gene Glass (Educational Researcher, 1976, p. 3):

“Meta-analysis refers to the analysis of analyses”

Example: Raudenbush & Bryk (2002)

Study  Reference                    Weeks of previous contact  g      SE
1      Rosenthal et al. (1974)      2                          0.03   0.13
2      Conn et al. (1968)           3                          0.12   0.15
3      Jose & Cody (1971)           3                          -0.14  0.17
4      Pellegrini & Hicks (1972)    0                          1.18   0.37
5      Pellegrini & Hicks (1972)    0                          0.26   0.37
6      Evans & Rosenthal (1969)     3                          -0.06  0.10
7      Fielder et al. (1971)        3                          -0.02  0.10
8      Claiborn (1969)              3                          -0.32  0.22
9      Kester & Letchworth (1972)   0                          0.27   0.16
10     Maxwell (1970)               1                          0.80   0.25
11     Carter (1970)                0                          0.54   0.30
12     Flowers (1966)               0                          0.18   0.22
13     Keshock (1970)               1                          -0.02  0.29
14     Henrickson (1970)            2                          0.23   0.29
15     Fine (1972)                  3                          -0.18  0.16
16     Greiger (1970)               3                          -0.06  0.17
17     Rosenthal & Jacobson (1968)  1                          0.30   0.14
18     Fleming & Anttonen (1971)    2                          0.07   0.09
19     Ginsburg (1970)              3                          -0.07  0.17

Application 2: Comparing the size of association

Results of the meta-analysis:

1. The variation between observed effect sizes is larger than could be expected from sampling variance alone: the population effect size is probably not the same for all studies.

2. The effect depends on the amount of previous contact.
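Both conclusions can be explored with a small Python sketch. It uses a standard inverse-variance (fixed-effect) weighted mean, the homogeneity statistic Q (compared with a chi-square distribution on k − 1 df; the .05 critical value for 18 df is about 28.87), and a weighted least-squares slope of g on weeks of previous contact. This is not necessarily the model Raudenbush & Bryk (2002) fit, the function names are illustrative, and the SEs are the rounded values from the table, so results will differ slightly from analyses on unrounded data.

```python
# Data from the table above (Raudenbush & Bryk, 2002), SEs rounded to two decimals
WEEKS = [2, 3, 3, 0, 0, 3, 3, 3, 0, 1, 0, 0, 1, 2, 3, 3, 1, 2, 3]
G = [0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27, 0.80,
     0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07]
SE = [0.13, 0.15, 0.17, 0.37, 0.37, 0.10, 0.10, 0.22, 0.16, 0.25,
      0.30, 0.22, 0.29, 0.29, 0.16, 0.17, 0.14, 0.09, 0.17]
CHI2_CRIT_18_DF = 28.87  # chi-square .95 quantile for 18 df (rounded)


def weights(ses):
    """Inverse-variance weights 1 / SE^2."""
    return [1 / s ** 2 for s in ses]


def fixed_effect(gs, ses):
    """Weighted mean effect size and homogeneity statistic Q (df = k - 1)."""
    w = weights(ses)
    mean = sum(wi * gi for wi, gi in zip(w, gs)) / sum(w)
    q = sum(wi * (gi - mean) ** 2 for wi, gi in zip(w, gs))
    return mean, q, len(gs) - 1


def wls_slope(xs, gs, ses):
    """Weighted least-squares slope of g on a study characteristic x."""
    w = weights(ses)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    g_bar = sum(wi * gi for wi, gi in zip(w, gs)) / sw
    sxy = sum(wi * (xi - x_bar) * (gi - g_bar) for wi, xi, gi in zip(w, xs, gs))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, xs))
    return sxy / sxx


if __name__ == "__main__":
    mean, q, df = fixed_effect(G, SE)
    print(f"weighted mean g = {mean:.3f}; Q = {q:.2f} on {df} df "
          f"(critical value {CHI2_CRIT_18_DF} at alpha = .05)")
    print(f"slope per unit of weeks of previous contact: {wls_slope(WEEKS, G, SE):.3f}")
```

With these inputs Q clearly exceeds the critical value and the slope is negative, in line with the two results listed above.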

Application 3: Power calculations

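$H_0: \delta = 0$; $H_1: \delta \neq 0$

$g - z_{1-\alpha/2}\,SE < 0 < g + z_{1-\alpha/2}\,SE \Rightarrow$ do not reject $H_0$

$0 \notin [g - z_{1-\alpha/2}\,SE;\ g + z_{1-\alpha/2}\,SE] \Rightarrow$ reject $H_0$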

Power = probability of rejecting $H_0$

Power depends on:
- δ
- α
- N

‘Powerful’ questions:

1. Suppose the population effect size is small (δ = 0.20). How large should my sample size (N) be to have a high probability (say, .80) of concluding that there is an effect (power) when testing at α = .05?

2. I did not find an effect, but perhaps the chance of finding one (power) was small anyway with such a small sample? (Take N and α from the study and assume, for instance, that δ = g.)
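A minimal Python sketch for both questions, assuming equal group sizes n and the simplified approximation SE ≈ sqrt(2/n) (the δ² term of the SE is ignored). Under $g \sim N(\delta, SE^2)$, $H_0$ is rejected when $|g| > z_{1-\alpha/2}\,SE$. It uses `statistics.NormalDist` from the Python standard library (3.8+); `power_two_sided` and `min_n_per_group` are illustrative names.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_two_sided(delta, n_per_group, alpha=0.05):
    """Approximate power of the two-sided test of H0: delta = 0.

    Assumes g ~ N(delta, SE^2) with SE = sqrt(2 / n) for equal group sizes.
    """
    se = math.sqrt(2 / n_per_group)
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Reject H0 when 0 lies outside [g - z*SE; g + z*SE], i.e. |g| > z*SE.
    upper = 1 - STD_NORMAL.cdf(z - delta / se)
    lower = STD_NORMAL.cdf(-z - delta / se)
    return upper + lower


def min_n_per_group(delta, target_power=0.80, alpha=0.05, max_n=100_000):
    """Smallest n per group whose approximate power reaches target_power."""
    n = 2
    while power_two_sided(delta, n, alpha) < target_power:
        n += 1
        if n > max_n:
            raise ValueError("target power not reached within max_n")
    return n


if __name__ == "__main__":
    n = min_n_per_group(0.20)
    print(f"delta = 0.20: n = {n} per group (N = {2 * n})")
```

Under this approximation, question 1 needs 393 participants per group (N = 786). For question 2, `power_two_sided(g, n)` with the study's own g and group size gives the power under the assumption δ = g.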

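A measure for each situation

Dichotomous independent, dichotomous dependent variable: RD, RR, Φ, OR
Dichotomous independent, ordinal/interval dependent variable: $g_{IG}$, Glass’s Δ, $g_{gain}$, $g_{gain,IG}$, nonparametric measures, $r_{pb}$
Nominal independent, nominal dependent variable: measures of contingency, Goodman-Kruskal’s τ, uncertainty coefficient, Cohen’s Kappa
Nominal independent, ordinal/interval dependent variable: multiple g’s, η², ICC
Ordinal/interval independent, ordinal/interval dependent variable: Pearson’s r, Spearman’s ρ, Kendall’s τ / Somers’ D, gamma coefficient, weighted Kappa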

Dichotomous independent-dichotomous dependent variable

                    Final exam
Predictive test     1            0            Total
1                   130 (87 %)   20 (13 %)    150 (100 %)
0                   30 (60 %)    20 (40 %)    50 (100 %)
Total               160          40           200

1. Risk difference: $RD = 130/150 - 30/50 = .87 - .60 = .27$
2. Relative risk: $RR = (130/150)/(30/50) = .867/.60 = 1.44$
3. Phi: $\Phi = (130 \times 20 - 20 \times 30)/\sqrt{150 \times 50 \times 160 \times 40} = 0.29$
4. Odds ratio: $OR = (130 \times 20)/(20 \times 30) = 4.33$
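The four measures can be computed directly from the cell counts. In this Python sketch, `two_by_two_measures` is an illustrative name, and the cell labels a, b (row 1) and c, d (row 0) are an assumed convention: rows are the predictive test (1, 0), columns the final exam (1, 0).

```python
import math


def two_by_two_measures(a, b, c, d):
    """RD, RR, phi and OR for a 2x2 table.

    a, b: row 1 (outcome 1, outcome 0); c, d: row 0 (outcome 1, outcome 0).
    """
    p1 = a / (a + b)  # P(outcome = 1 | predictor = 1)
    p0 = c / (c + d)  # P(outcome = 1 | predictor = 0)
    rd = p1 - p0
    rr = p1 / p0
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    return {"RD": rd, "RR": rr, "phi": phi, "OR": odds_ratio}


if __name__ == "__main__":
    for name, value in two_by_two_measures(130, 20, 30, 20).items():
        print(f"{name}: {value:.2f}")
```

For the predictive-test table this gives RD = 0.27, RR = 1.44, phi = 0.29 and OR = 4.33.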




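Dichotomous independent-continuous dependent variable

1. Independent groups, homogeneous variance: $g = \frac{\bar{x}_E - \bar{x}_C}{s_p}$
2. Independent groups, heterogeneous variance: Glass’s $\Delta = \frac{\bar{x}_E - \bar{x}_C}{s_C}$
3. Repeated measures (one group): $g_{gain} = \frac{\bar{x}_{post} - \bar{x}_{pre}}{s_{pre}}$ or $g_D = \frac{\bar{D}}{s_D}$, with $D$ the post-pre difference score
4. Repeated measures (independent groups): $g_{gain,IG} = \frac{\bar{x}_{post,E} - \bar{x}_{pre,E}}{s_{pre,E}} - \frac{\bar{x}_{post,C} - \bar{x}_{pre,C}}{s_{pre,C}}$
5. Nonparametric measures
6. $r_{pb}$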
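The standardized mean differences in items 1 to 4 can be sketched in Python with the standard `statistics` module. Assumptions: sample SDs use n − 1, g uses the pooled SD without a small-sample correction, and the function names are illustrative. Items 5 and 6 are not covered here.

```python
import math
from statistics import mean, stdev, variance


def g_pooled(x_e, x_c):
    """g = (mean_E - mean_C) / s_p, with the pooled SD (no small-sample correction)."""
    n_e, n_c = len(x_e), len(x_c)
    pooled_var = ((n_e - 1) * variance(x_e) + (n_c - 1) * variance(x_c)) / (n_e + n_c - 2)
    return (mean(x_e) - mean(x_c)) / math.sqrt(pooled_var)


def glass_delta(x_e, x_c):
    """Glass's Delta = (mean_E - mean_C) / s_C, using the control-group SD only."""
    return (mean(x_e) - mean(x_c)) / stdev(x_c)


def g_gain(pre, post):
    """g_gain = (mean_post - mean_pre) / s_pre (one group, repeated measures)."""
    return (mean(post) - mean(pre)) / stdev(pre)


def g_d(pre, post):
    """g_D = mean(D) / s_D, where D = post - pre for each person."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / stdev(diffs)


def g_gain_ig(pre_e, post_e, pre_c, post_c):
    """g_gain,IG: standardized gain of the experimental group minus that of the control group."""
    return g_gain(pre_e, post_e) - g_gain(pre_c, post_c)


if __name__ == "__main__":
    # Hypothetical toy scores, for illustration only
    print(g_pooled([5, 6, 7], [2, 4, 6]), glass_delta([5, 6, 7], [2, 4, 6]))
    print(g_gain([1, 2, 3], [2, 4, 6]), g_d([1, 2, 3], [2, 4, 6]))
    print(g_gain_ig([1, 2, 3], [2, 4, 6], [1, 2, 3], [1, 3, 5]))
```

The toy example shows why g and Glass’s Δ can differ: when the control group is more variable, dividing by $s_C$ alone gives a smaller value than dividing by the pooled SD.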




Nominal independent-nominal dependent variable

1. Contingency measures, e.g.:
   a) Pearson’s contingency coefficient
   b) Cramér’s V
   c) Phi coefficient
2. Goodman-Kruskal tau
3. Uncertainty coefficient
4. Cohen’s Kappa

                Illness
                Better   Same   Worse
Experimental    10       5      2
Control         4        7      3

                Illness
                Better   Same   Worse
Control         10       5      2
Experimental    4        7      3

                Illness
                Same     Better Worse
Experimental    10       5      2
Control         4        7      3
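The chi-square-based contingency measures can be sketched in Python using the standard definitions C = sqrt(χ²/(χ² + N)), phi = sqrt(χ²/N) and Cramér’s V = sqrt(χ²/(N(min(r, c) − 1))); the function names are illustrative. Because these measures depend only on the cell counts, the three tables above, which contain the same counts under different labels, all give the same values, so the measures do not show which group improved.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and N for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2, n


def contingency_measures(table):
    """Pearson's contingency coefficient C, phi, and Cramer's V."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return {
        "C": math.sqrt(chi2 / (chi2 + n)),
        "phi": math.sqrt(chi2 / n),
        "V": math.sqrt(chi2 / (n * (k - 1))),
    }


if __name__ == "__main__":
    counts = [[10, 5, 2], [4, 7, 3]]
    reordered = [[5, 10, 2], [7, 4, 3]]  # same counts, columns listed in another order
    print(contingency_measures(counts))
    print(contingency_measures(reordered))
```

For a table with two rows, min(r, c) − 1 = 1, so Cramér’s V equals phi.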




Nominal independent-continuous dependent variable

1. ANOVA: multiple g’s

2. η²

3. ICC
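As an illustration of item 2, here is a Python sketch of η² under its common definition SS_between / SS_total for a one-way layout. The slide gives no formula, so this definition is an assumption, and `eta_squared` is an illustrative name.

```python
from statistics import mean


def eta_squared(groups):
    """eta^2 = SS_between / SS_total for a one-way layout (list of groups)."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    ss_total = sum((x - grand) ** 2 for x in all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total


if __name__ == "__main__":
    # Hypothetical scores in two groups
    print(eta_squared([[1, 2, 3], [4, 5, 6]]))
```

With the two hypothetical groups, SS_between = 13.5 and SS_total = 17.5, so η² = 13.5/17.5.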




Continuous independent-continuous dependent variable

1. Pearson’s r
2. Non-normal data: Spearman’s ρ
3. Ordinal data: Kendall’s τ, Somers’ D, gamma coefficient
4. Weighted Kappa
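A minimal pure-Python sketch of items 1 and 2. Spearman’s ρ is computed here as Pearson’s r on ranks, with tied values given their average rank; the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [1, 4, 9, 16, 25]  # monotone but not linear
    print(pearson_r(x, y), spearman_rho(x, y))
```

For a monotone but nonlinear relation such as y = x², Spearman’s ρ is 1 while Pearson’s r stays below 1.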

More complex situations

1. Two or more independent variables

a) Regression models

   1. Y continuous: $Y_i = a + bX_i + e_i$
      1. X continuous: b estimated by $r_{XY}\frac{s_Y}{s_X}$
      2. X dichotomous (1 = experimental, 0 = control): b estimated by $\bar{y}_E - \bar{y}_C$
   2. Y dichotomous: $\mathrm{logit}(P(Y_i = 1)) = a + bX_i$
      If X is dichotomous, b is estimated by the log odds ratio.
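A short Python sketch checking the two estimators for a continuous Y numerically: the least-squares slope equals $r_{XY} s_Y / s_X$, and with a 0/1 X it equals the difference in group means. The data are hypothetical and the function names illustrative. The last line computes the log odds ratio for the 2 × 2 predictive-test example, which is the slope b of the logit model with a dichotomous X.

```python
import math
from statistics import mean, stdev


def ols_slope(x, y):
    """Least-squares slope b in y_i = a + b * x_i + e_i."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def slope_from_r(x, y):
    """b = r_XY * s_Y / s_X (sample SDs with n - 1)."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    r = cov / (stdev(x) * stdev(y))
    return r * stdev(y) / stdev(x)


if __name__ == "__main__":
    # Hypothetical data: X coded 1 = experimental, 0 = control
    x = [1, 1, 1, 0, 0, 0]
    y = [5, 6, 7, 2, 4, 6]
    # Both give mean_E - mean_C (up to floating-point rounding)
    print(ols_slope(x, y), slope_from_r(x, y))
    # Logit model with a dichotomous X: b = ln(OR) for the 2 x 2 example
    print(math.log((130 * 20) / (20 * 30)))
```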


b) Stratification
c) Contrast analyses in factorial designs (Rosenthal, Rosnow & Rubin, 2000)

                      Number of treatments weekly
Dose of medication    0      1      2      3      Mean
100 mg                3      10     9      12     8.5
50 mg                 1      4      8      9      5.5
0 mg                  1      4      6      5      4.0
Mean                  1.67   6.00   7.67   8.67   6.0

Source          SS      df    MS       F       p
Between         1420    11    129.09   5.16    .000002
  Treatments    860     3     286.67   11.47   .000002
  Dose          420     2     210.00   8.40    .0004
  Treat. × dose 140     6     23.33    0.93    .47
Within          2700    108   25.00
Total           4120    119

Note: N = 120 (12 × 10)


Contrast weights:

                      Number of treatments weekly
Dose of medication    0      1      2      3
100 mg                -3     -1     +1     +3
50 mg                 -3     -1     +1     +3
0 mg                  -3     -1     +1     +3

                      Number of treatments weekly
Dose of medication    0      1      2      3      Mean
100 mg                -1     +1     +3     +5     +2
50 mg                 -3     -1     +1     +3     0
0 mg                  -5     -3     -1     +1     -2
Mean                  -3     -1     +1     +3     0

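$r_{means,weights}$: correlation between the cell means and the contrast weights

$SS_{contrast} = MS_{contrast} = r_{means,weights}^2 \times SS_{between}$

$F_{contrast} = \frac{MS_{contrast}}{MS_{within}}$

$r_{effect\,size} = \sqrt{\frac{F_{contrast}}{F_{contrast} + F_{noncontrast}\,(df_{noncontrast}) + df_{within}}}$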
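A Python sketch of the contrast computations for the dose × treatments example, using the cell means and the second set of weights. Assumptions: n = 10 per cell (from N = 120 = 12 × 10), MS_within = 25 and df_within = 108 from the ANOVA table, and df_noncontrast = df_between − 1 = 10 for the between-cells variation left over after the contrast. The function names are illustrative, not from Rosenthal, Rosnow & Rubin (2000).

```python
import math

# Cell means, rows 100 mg, 50 mg, 0 mg; columns 0-3 treatments weekly
MEANS = [3, 10, 9, 12, 1, 4, 8, 9, 1, 4, 6, 5]
# Contrast weights from the second weight table, same cell order
WEIGHTS = [-1, 1, 3, 5, -3, -1, 1, 3, -5, -3, -1, 1]
N_PER_CELL = 10   # N = 120 = 12 cells x 10
MS_WITHIN = 25.0  # from the ANOVA table
DF_WITHIN = 108
DF_BETWEEN = 11


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def contrast_analysis(means, weights, n, ms_within, df_within, df_between):
    grand = sum(means) / len(means)
    ss_between = n * sum((m - grand) ** 2 for m in means)
    r_mw = pearson_r(means, weights)          # r between cell means and weights
    ss_contrast = r_mw ** 2 * ss_between      # = MS_contrast (1 df)
    f_contrast = ss_contrast / ms_within
    df_noncontrast = df_between - 1           # assumption: remaining between-cells df
    f_noncontrast = (ss_between - ss_contrast) / df_noncontrast / ms_within
    r_effect = math.sqrt(
        f_contrast / (f_contrast + f_noncontrast * df_noncontrast + df_within))
    return {"ss_between": ss_between, "r_means_weights": r_mw,
            "ss_contrast": ss_contrast, "F_contrast": f_contrast,
            "r_effect_size": r_effect}


if __name__ == "__main__":
    res = contrast_analysis(MEANS, WEIGHTS, N_PER_CELL, MS_WITHIN, DF_WITHIN, DF_BETWEEN)
    for key, value in res.items():
        print(f"{key}: {value:.3f}")
```

SS_between recomputed from the cell means is 1420, matching the ANOVA table. Under these assumptions, $r_{effect\,size}^2$ reduces to SS_contrast / SS_total.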


2. Multilevel models
3. Two or more dependent variables
4. Single-case studies

• $Y_i = b_0 + b_1\,phase_i + e_i$

• $Y_i = b_0 + b_1\,time_i + b_2\,phase_i + b_3\,(time_i \times phase_i) + e_i$
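A minimal Python sketch of the two single-case models, assuming phase is coded 0 (baseline) and 1 (treatment) and time is the raw session number. With this coding, b2 is the difference in intercepts at time 0, not at the phase change. Because the second model contains the full time × phase interaction, its least-squares fit equals a separate regression line in each phase, so the coefficients can be recovered from two simple regressions. The function names are illustrative.

```python
def simple_ols(x, y):
    """Intercept and slope of y = a + b x by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def phase_model(y, phase):
    """Y_i = b0 + b1 phase_i + e_i: b1 is the difference between phase means."""
    return simple_ols(phase, y)


def time_phase_model(y, time, phase):
    """Y_i = b0 + b1 time_i + b2 phase_i + b3 (time_i x phase_i) + e_i."""
    t_a = [t for t, p in zip(time, phase) if p == 0]
    y_a = [v for v, p in zip(y, phase) if p == 0]
    t_b = [t for t, p in zip(time, phase) if p == 1]
    y_b = [v for v, p in zip(y, phase) if p == 1]
    b0, b1 = simple_ols(t_a, y_a)   # baseline line
    c0, c1 = simple_ols(t_b, y_b)   # treatment line
    return b0, b1, c0 - b0, c1 - b1  # b0, b1, b2 (level change), b3 (slope change)


if __name__ == "__main__":
    # Hypothetical series: 5 baseline sessions, then 5 treatment sessions
    time = list(range(1, 11))
    phase = [0] * 5 + [1] * 5
    y = [2 + t for t in time[:5]] + [10 + 0.5 * t for t in time[5:]]
    print(time_phase_model(y, time, phase))
```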

Specific topics

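Comparability of effect sizes

Example: $g_{IG}$ vs. $g_{gain}$:

$g_{IG}$ estimates $\delta_{IG} = \frac{\mu_E - \mu_C}{\sigma}$

$g_{gain}$ estimates $\delta_{gain} = \frac{\mu_{post} - \mu_{pre}}{\sigma}$

$g_D$ estimates $\delta_D = \frac{\mu_{post} - \mu_{pre}}{\sigma_D}$, with $\sigma_D^2 = \sigma_{pre}^2 + \sigma_{post}^2 - 2\rho\,\sigma_{pre}\sigma_{post} = 2\sigma^2(1 - \rho)$ if $\sigma_{pre} = \sigma_{post} = \sigma$

so $\delta_D = \frac{\delta_{gain}}{\sqrt{2(1 - \rho)}}$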

Comparability of effect sizes

1. Estimating different population parameters, e.g., $g_D = \frac{g_{gain}}{\sqrt{2(1 - r)}}$

2. Estimating with different precision, e.g., g vs. Glass’s Δ
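The conversion in item 1 can be checked numerically with a Python sketch. Assumption: the relation is exact only when the pre- and post-test SDs are equal, so the hypothetical toy scores below are chosen with equal SDs. The function names are illustrative.

```python
import math
from statistics import mean, stdev


def g_gain(pre, post):
    """Mean gain standardized by the pretest SD."""
    return (mean(post) - mean(pre)) / stdev(pre)


def g_d(pre, post):
    """Mean gain standardized by the SD of the difference scores."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / stdev(diffs)


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def g_d_from_g_gain(g_gain_value, r):
    """g_D = g_gain / sqrt(2 (1 - r)); exact only when s_pre = s_post."""
    return g_gain_value / math.sqrt(2 * (1 - r))


if __name__ == "__main__":
    pre = [1, 2, 3, 4]
    post = [4, 3, 6, 5]  # hypothetical; same SD as pre, r = 0.6
    r = pearson_r(pre, post)
    print(g_gain(pre, post), g_d(pre, post), g_d_from_g_gain(g_gain(pre, post), r))
```

Because the factor is $\sqrt{2(1 - r)}$, $g_D$ exceeds $g_{gain}$ when r > .5 and is smaller when r < .5, so the two measures estimate different parameters.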

Choosing a measure

1. Design and measurement level
2. Assumptions
3. Popularity
4. Simplicity of sampling distribution, e.g., Fisher’s $Z = 0.5\ln[(1+r)/(1-r)]$, the log odds ratio, $\ln(RR)$
5. Directional effect size
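Item 4 is why these transformed scales are convenient: their sampling distributions are close to normal with simple SEs. This Python sketch assumes two standard large-sample results that the slides do not state: SE(Z) = 1/sqrt(n − 3) for Fisher’s Z, and Woolf’s SE = sqrt(1/a + 1/b + 1/c + 1/d) for ln(OR). The interval is built on the transformed scale and transformed back. Function names are illustrative.

```python
import math


def fisher_z_ci(r, n, z_crit=1.96):
    """CI for a correlation via Fisher's Z = 0.5 ln[(1+r)/(1-r)], SE = 1/sqrt(n-3)."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def log_or_ci(a, b, c, d, z_crit=1.96):
    """CI for the odds ratio via ln(OR), SE = sqrt(1/a + 1/b + 1/c + 1/d)."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se)


if __name__ == "__main__":
    print(fisher_z_ci(0.30, 50))            # hypothetical r and n
    print(log_or_ci(130, 20, 30, 20))       # 2 x 2 predictive-test example
```

Back-transformed intervals are asymmetric around r and OR, which reflects the skewed sampling distributions on the original scales.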

Threats to effect sizes

1. ‘Bad data’
2. Measurement error
3. Artificial dichotomization
4. Imperfect construct validity
5. Range restriction
6. Bias

