psyc3010 lecture 5 - university of queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. ·...
TRANSCRIPT
1
11
psyc3010 lecture 5psyc3010 lecture 5
power analysis and blocking designspower analysis and blocking designs
last lecture: between-Ps ANOVA III: higher-order designsnext week: correlation and regression
22
last lecture last lecture this lecturethis lecture
last lecture:last lecture:–– conceptual overview of higherconceptual overview of higher--order designsorder designs–– tests in 3tests in 3--way betweenway between--participants ANOVAparticipants ANOVA
this lecture:this lecture:–– Brief recap 3Brief recap 3--way ANOVAway ANOVA–– statistical powerstatistical power–– blocking designs in ANOVAblocking designs in ANOVA
2
33
last week’s experiment with last week’s experiment with some some differentdifferent datadata
Males
Personality Rew None Pun
Impulsive 350 425 450345 430 465330 445 465
Total 1025 1300 1380Mean 341.67 433.33 460.00
Anxious 485 450 470450 455 480495 445 465
Total 1430 1350 1415Mean 476.67 450.00 471.67
Reinforcement
Females
Personality Rew None Pun
Impulsive 350 355 460320 350 495330 340 485
Total 1000 1045 1440Mean 333.33 348.33 480.00
Anxious 320 360 480340 360 470350 370 450
Total 1010 1090 1400Mean 336.67 363.33 466.67
Reinforcement
44
omnibus tests in a 3omnibus tests in a 3--way anovaway anovamain effects:main effects:–– differences between marginal means of one factor differences between marginal means of one factor
(averaging over levels of other factors)(averaging over levels of other factors)
twotwo--way interactions (firstway interactions (first--order):order):–– examines whether the effect of one factor is the same examines whether the effect of one factor is the same
at every level of a second factor (averaging over a at every level of a second factor (averaging over a third factor)third factor)
threethree--way interaction way interaction (second(second--order)order)::–– examines whether the twoexamines whether the two--way interaction between way interaction between
two factors is the same at every level of the third two factors is the same at every level of the third factor factor
–– oror whether the cell means differ more than you would whether the cell means differ more than you would expect given the main effects and the twoexpect given the main effects and the two--way way interactionsinteractions
3
55
main summary table main summary table (from SPPS)(from SPPS)
Tests of Between-Subjects Effects
Dependent Variable: RT
60237.500 2 30118.750 169.418 .0007802.778 1 7802.778 43.891 .000
24544.444 1 24544.444 138.062 .0007751.389 2 3875.694 21.801 .000
16426.389 2 8213.194 46.199 .0006944.444 1 6944.444 39.062 .000
6601.389 2 3300.694 18.566 .000
4266.667 24 177.778134575.000 35
SourceREINFORCPERSONALGENDERREINFORC * PERSONALREINFORC * GENDERPERSONAL * GENDERREINFORC * PERSONAL* GENDERErrorTotal
Type III Sumof Squares df Mean Square F Sig.
66
steps for followingsteps for following--up up a 3a 3--way interactionway interaction
is interaction significant?
NO
calculate simple interaction effects
YES
NOYES
does factor A or B have >2 levels?
NO YESconduct tests
for simple simple
comparisons
STOP
STOP
STOP
STOP
is A x B at C1 or C2
significant?
are the simple simple effects of A or B significant?
NO
STOP(repeat this sequence for AC at B1 and B2, (repeat this sequence for AC at B1 and B2, and BC at A1 and A2, to explore interaction and BC at A1 and A2, to explore interaction
–– or pick based on theory / hypotheses)or pick based on theory / hypotheses)
YES
4
77
following up the 3following up the 3--way interactionway interaction
simple interaction effectssimple interaction effects–– tests of 2tests of 2--way interactions at each level of the third factorway interactions at each level of the third factor–– e.g., personality x reinforcement at each level of gendere.g., personality x reinforcement at each level of gender
simple simple effects:simple simple effects:–– tests of the effect of a factor at each level of second tests of the effect of a factor at each level of second
factor within levels of a third factorfactor within levels of a third factor–– e.g., effects of reinforcement at each level of personality (for males)e.g., effects of reinforcement at each level of personality (for males)
simple simple comparisons:simple simple comparisons:–– compares effect of different levels of a factor at a compares effect of different levels of a factor at a
particular level of a second factor within levels of a third particular level of a second factor within levels of a third factorfactor
–– e.g., effects of rew versus pun or neutral for impulsive males e.g., effects of rew versus pun or neutral for impulsive males
88
summary table for followsummary table for follow--up testsup testsSource SS df MS F p
Simple Interaction Effects
PR at G1 13744.44 2 6872.22 38.656 0.000PR at G2 608.33 2 304.17 1.711 0.202Error 4266.67 24 177.78
Simple Simple effects in the interaction of PR at G1
R at P1 at G1 23116.67 2 11558.33 65.016 0.000R at P2 at G1 538.89 2 269.44 1.516 0.240Error 4266.67 24 177.78
Simple Simple comparisons in simple effect of R at P1 at G1
R vs N and P at P1 at G1 22047.90 1 22047.90 124.019 0.000N vs P at P1 at G1 355.64 1 355.64 2.000 0.170Error 4266.67 24 177.78
5
99
summary table for followsummary table for follow--up testsup testsSource SS df MS F p
Simple Interaction Effects
PR at G1 13744.44 2 6872.22 38.656 0.000PR at G2 608.33 2 304.17 1.711 0.202Error 4266.67 24 177.78
Simple Simple effects in the interaction of PR at G1
R at P1 at G1 23116.67 2 11558.33 65.016 0.000R at P2 at G1 538.89 2 269.44 1.516 0.240Error 4266.67 24 177.78
Simple Simple comparisons in simple effect of R at P1 at G1
R vs N and P at P1 at G1 22047.90 1 22047.9 124.019 0.000N vs P at P1 at G1 355.64 1 355.64 2.000 0.170Error 4266.67 24 177.78
the personality x reinforcement interaction is
- significant for males
- non-significant for femaleswe can follow up this significant simple interaction effect of Personality x Reinforcement for males using simple simple effects
1010
summary table for followsummary table for follow--up testsup testsSource SS df MS F p
Simple Interaction Effects
PR at G1 13744.44 2 6872.22 38.656 0.000PR at G2 608.33 2 304.17 1.711 0.202Error 4266.67 24 177.78
Simple Simple effects in the interaction of PR at G1
R at P1 at G1 23116.67 2 11558.33 65.016 0.000R at P2 at G1 538.89 2 269.44 1.516 0.240Error 4266.67 24 177.78
Simple Simple comparisons in simple effect of R at P1 at G1
R vs N and P at P1 at G1 22047.90 1 22047.9 124.019 0.000N vs P at P1 at G1 355.64 1 355.64 2.000 0.170Error 4266.67 24 177.78
the effect of reinforcement (for males) is- significant for the impulsive group- not-significant for the anxious group
we can follow up the significant simple simple effect of reinforcement at impulsivity (for males) using simple simple comparisons
6
1111
summary table for followsummary table for follow--up testsup testsSource SS df MS F p
Simple Interaction Effects
PR at G1 13744.44 2 6872.22 38.656 0.000PR at G2 608.33 2 304.17 1.711 0.202Error 4266.67 24 177.78
Simple Simple effects in the interaction of PR at G1
R at P1 at G1 23116.67 2 11558.33 65.016 0.000R at P2 at G1 538.89 2 269.44 1.516 0.240Error 4266.67 24 177.78
Simple Simple comparisons in simple effect of R at P1 at G1
R vs N and P at P1 at G1 22047.90 1 22047.90 124.019 0.000N vs P at P1 at G1 355.64 1 355.64 2.000 0.170Error 4266.67 24 177.78
simple simple comparisons
- calculated using the t-test formula (c.f. lecture 3)
- can also be calculated using a F-test (last week’s tutorial exercise showed how to do this)
1212
summary table for followsummary table for follow--up testsup testsSource SS df MS F p
Simple Interaction Effects
PR at G1 13744.44 2 6872.22 38.656 0.000PR at G2 608.33 2 304.17 1.711 0.202Error 4266.67 24 177.78
Simple Simple effects in the interaction of PR at G1
R at P1 at G1 23116.67 2 11558.33 65.016 0.000R at P2 at G1 538.89 2 269.44 1.516 0.240Error 4266.67 24 177.78
Simple Simple comparisons in simple effect of R at P1 at G1
R vs N and P at P1 at G1 22047.90 1 22047.9 124.019 0.000N vs P at P1 at G1 355.64 1 355.644 2.000 0.170Error 4266.67 24 177.78
for impulsive males, 1) performance under rewarding feedback was
significantly different from that under neutral or punishing feedback
2) performance under punishing feedback was not significantly different to performance under neutralfeedback
7
1313
reportingreporting
A 2 x 2 x 3 factorial ANOVA was performed, and a significant A 2 x 2 x 3 factorial ANOVA was performed, and a significant threethree--way interaction was observed among personality, gender, way interaction was observed among personality, gender, and reinforcement and reinforcement FF(2,24) = 169.42, (2,24) = 169.42, pp<.001<.001. The simple . The simple personality x reinforcement interaction was significant for men, personality x reinforcement interaction was significant for men, FF(2,24) = 38.66, (2,24) = 38.66, pp<.001<.001, but not for women, , but not for women, FF(2,24) = 1.71, (2,24) = 1.71, p p = = .202.202. Simple simple effects of reinforcement were significant for . Simple simple effects of reinforcement were significant for impulsive males, impulsive males, FF(2,24) = 65.02, (2,24) = 65.02, pp<.001<.001, but not for anxious , but not for anxious males, males, FF(2,24) = 1.52, (2,24) = 1.52, p p = .240= .240. Planned contrasts showed that . Planned contrasts showed that impulsive males performed better with reward impulsive males performed better with reward ((MM = 341.67)= 341.67) than than with punishment or no reinforcement with punishment or no reinforcement ((MM = 441.67), = 441.67), FF(1,24) = (1,24) = 124.02, 124.02, p<p<.001,.001, while the performance of impulsive males while the performance of impulsive males receiving no reinforcement receiving no reinforcement ((MM = 433.33)= 433.33) was not significantly was not significantly different to those receiving punishment different to those receiving punishment ((MM = 460.00), = 460.00), FF(2,24) = (2,24) = 2.00, 2.00, p = p = .170.170..
1. I haven’t put effect sizes. These would be required for all omnibus tests. Effect sizes are usually included for simple effects as well, and even for simple comparisons.
1414
reportingreporting
Results of a 2 x 2 x 3 factorial ANOVA indicated a significant threeResults of a 2 x 2 x 3 factorial ANOVA indicated a significant three--way interaction among personality, gender, and reinforcement way interaction among personality, gender, and reinforcement FF(2,24) = 169.42, (2,24) = 169.42, pp<.001<.001. The simple personality x reinforcement . The simple personality x reinforcement interaction was significant for males, interaction was significant for males, FF(2,24) = 38.66, (2,24) = 38.66, pp<.001<.001, but not , but not for females, for females, FF(2,24) = 1.71, (2,24) = 1.71, p p = .202= .202. There was a significant simple . There was a significant simple simple effect of reinforcement for impulsive males, simple effect of reinforcement for impulsive males, FF(2,24) = 65.02, (2,24) = 65.02, pp<.001<.001, but not for anxious males, , but not for anxious males, FF(2,24) = 1.52, (2,24) = 1.52, p p = .240= .240. Planned . Planned contrasts showed that impulsive males performed better with reward contrasts showed that impulsive males performed better with reward ((MM = 341.67)= 341.67) than with punishment or no reinforcement than with punishment or no reinforcement ((MM = = 441.67), 441.67), FF(1,24) = 124.02, (1,24) = 124.02, p<p<.001,.001, while the performance of while the performance of impulsive males receiving no reinforcement impulsive males receiving no reinforcement ((MM = 433.33)= 433.33) was not was not significantly different to those receiving punishment significantly different to those receiving punishment ((MM = 460.00), = 460.00), FF(2,24) = 2.00, (2,24) = 2.00, p = p = .170.170..
2. NB we are reporting simple simple effects of reinforcement now – assumes this = theoretical focus. Last week we hypothesized different personality effects depending on reinforcement, but this week we are focusing on different reinforcement effects depending on personality.
8
1515
reportingreporting
Results of a 2 x 2 x 3 factorial ANOVA indicated a significant threeResults of a 2 x 2 x 3 factorial ANOVA indicated a significant three--way way interaction among personality, gender, and reinforcement interaction among personality, gender, and reinforcement FF(2,24) = (2,24) = 169.42, 169.42, pp<.001. The simple personality x reinforcement interaction was <.001. The simple personality x reinforcement interaction was significant for males, significant for males, FF(2,24) = 38.66, (2,24) = 38.66, pp<.001, but not for females, <.001, but not for females, FF(2,24) = 1.71, (2,24) = 1.71, p p = .202. [Insert before simple = .202. [Insert before simple simplesimple effects of R for effects of R for men:] men:] For women, across personality type the simple effect of For women, across personality type the simple effect of reinforcement was / was not significant, F(2,24)=X, p=, eta2= .reinforcement was / was not significant, F(2,24)=X, p=, eta2= . [If yes, [If yes, would also followwould also follow--up with contrasts.] In addition, there was a significant up with contrasts.] In addition, there was a significant simple simple simplesimple effect of reinforcement for impulsive males, effect of reinforcement for impulsive males, FF(2,24) = (2,24) = 65.02, 65.02, pp<.001<.001, but not for anxious males, , but not for anxious males, FF(2,24) = 1.52, (2,24) = 1.52, p p = .240= .240. etc”. etc”
3. NB for simplicity we reported for women that the simple interaction of P x R was ns, but unless interested ONLY in interaction we would also ‘followup’ ns simple interaction by looking at simple effect of Reinforcement for women (collapsing across Personality), if hypotheses suggest R = the theoretically key IV. We want to know if there are effects of reinforcement for women that don’t change depending on personality.
1616
ReportingReportingWhere there is one key IV with two moderators Where there is one key IV with two moderators of interest, you might also choose to skip directly of interest, you might also choose to skip directly from a threefrom a three--way to the simple simple effects.way to the simple simple effects.–– E.g. (madeE.g. (made--up data), “The threeup data), “The three--way interaction was way interaction was
significant, F=,p=,eta2=. Followsignificant, F=,p=,eta2=. Follow--up tests examined up tests examined the simple simple effects of reinforcement within each the simple simple effects of reinforcement within each level of personality and gender. Differences in level of personality and gender. Differences in reaction to reinforcement were observed for anxious reaction to reinforcement were observed for anxious men, F=,p=,eta2=, and impulsive men, F=,p=,eta2=, and impulsive women,F=,p=,eta2=. Specifically, [describe simple women,F=,p=,eta2=. Specifically, [describe simple simple comparisons here]. For impulsive men and simple comparisons here]. For impulsive men and anxious women, no effects of reinforcement were anxious women, no effects of reinforcement were found, Fs< ##, ps> ##, eta2s< ##.”found, Fs< ##, ps> ##, eta2s< ##.”
In fact you might even go directly from the threeIn fact you might even go directly from the three--way to the simple comparisons (and put these in way to the simple comparisons (and put these in a table) to reduce your wordiness and the a table) to reduce your wordiness and the complexity of the text.complexity of the text.
9
1717
Table 1. Mean reaction time as a function of Table 1. Mean reaction time as a function of gender, personality, and reinforcement. gender, personality, and reinforcement.
Reinforcement: Reinforcement: Punishment NeutralPunishment Neutral ReinforcementReinforcementWomenWomen
ImpulsiveImpulsive 485.33485.33aa 481.67481.67aa 320.00320.00bbAnxiousAnxious 487.10487.10aa 485.33485.33aa 323.00323.00bb
MenMenImpulsiveImpulsive 484.25484.25aa 478.76478.76aa 316.67316.67bbAnxiousAnxious 367.55367.55aa 455.67455.67bb 583.00583.00cc
Note. Differing subscripts within the same row indicate significant Note. Differing subscripts within the same row indicate significant differences among the means.differences among the means.
1. Most experimental journals would also want to see standard deviations.
E.g. (madeE.g. (made--up data): “The threeup data): “The three--way interaction of P x R x G was significant, way interaction of P x R x G was significant, F=,p=,eta2=. Table 1 displays the means by gender, personality, and F=,p=,eta2=. Table 1 displays the means by gender, personality, and reinforcement to decompose the interaction. As seen in Table 1, followreinforcement to decompose the interaction. As seen in Table 1, follow--up up analyses revealed that for women and for impulsive men, reinforcement led to analyses revealed that for women and for impulsive men, reinforcement led to faster responses (more learning) than either the control condition or faster responses (more learning) than either the control condition or punishment, which did not differ. For anxious men, by contrast, punishment punishment, which did not differ. For anxious men, by contrast, punishment produced the most learning, followed by the control condition, with produced the most learning, followed by the control condition, with reinforcement being the least successful condition (slowest response times).”reinforcement being the least successful condition (slowest response times).”
1818
Table 1. Mean reaction time as a Table 1. Mean reaction time as a function of gender, personality, and function of gender, personality, and reinforcement. reinforcement.
Reinforcement: Reinforcement: Punishment NeutralPunishment Neutral ReinforcementReinforcementWomenWomen
ImpulsiveImpulsive 485.33485.33aa 481.67481.67aa 320.00320.00bbAnxiousAnxious 487.10487.10aa 485.33485.33aa 323.00323.00bb
MenMenImpulsiveImpulsive 484.25484.25aa 478.76478.76aa 316.67316.67bbAnxiousAnxious 367.55367.55aa 455.67455.67bb 583.00583.00cc
Note. Differing subscripts within the same row indicate significant Note. Differing subscripts within the same row indicate significant differences among the means.differences among the means.
2. NB how this hides all stats (except the three-way) – excellent for non-stats minded reviewers
E.g. (madeE.g. (made--up data): “The threeup data): “The three--way interaction of P x R x G was significant, way interaction of P x R x G was significant, F=,p=,eta2=. Table 1 displays the means by gender, personality, and F=,p=,eta2=. Table 1 displays the means by gender, personality, and reinforcement to decompose the interaction. As seen in Table 1, followreinforcement to decompose the interaction. As seen in Table 1, follow--up up analyses revealed that for women and for impulsive men, reinforcement led to analyses revealed that for women and for impulsive men, reinforcement led to faster responses (more learning) than either the control condition or faster responses (more learning) than either the control condition or punishment, which did not differ. For impulsive men, by contrast, punishment punishment, which did not differ. For impulsive men, by contrast, punishment produced the most learning, followed by the control condition, with produced the most learning, followed by the control condition, with reinforcement being the least successful condition (slowest response times).”reinforcement being the least successful condition (slowest response times).”
10
1919
Reporting: The bottom lineReporting: The bottom lineWith higherWith higher--order designs, there’s a lot more order designs, there’s a lot more flexibility about how you structure the analyses / flexibility about how you structure the analyses / writewrite--upupIt’s best to be guided by your theory.It’s best to be guided by your theory.This may require you to fully explore the simple This may require you to fully explore the simple interactions, simple simple effects, simple simple interactions, simple simple effects, simple simple comparisons, etc. within the textcomparisons, etc. within the textOr allow for some skipping.Or allow for some skipping.The more complex the design, the more likely The more complex the design, the more likely that you need a Table (or Tables, and/or figures) that you need a Table (or Tables, and/or figures) to report the results.to report the results.
2020
What happens with Unequal n?What happens with Unequal n?Something really complex mathematically Something really complex mathematically –– so you’re not so you’re not required to learn this material (it will not be assessed on required to learn this material (it will not be assessed on the exam)the exam)We provide a quick overview of the issues for the We provide a quick overview of the issues for the interested among you on the Blackboard web site in the interested among you on the Blackboard web site in the practice materials sectionpractice materials sectionTake home message: Take home message: If unequal n are too extreme If unequal n are too extreme (ratio of largest to smallest cell size > 3:1), results of (ratio of largest to smallest cell size > 3:1), results of analyses are unstable (less likely to replicate analyses are unstable (less likely to replicate –– can have can have more Type 1 more Type 1 andand more Type 2 errors). more Type 2 errors). –– Always try for equal n in study designs where anticipate using Always try for equal n in study designs where anticipate using
ANOVA.ANOVA.–– Avoid using natural group variables like gender or ethnicity in Avoid using natural group variables like gender or ethnicity in
ANOVA if ratio of people in groups exceeds 3:1 for largest: ANOVA if ratio of people in groups exceeds 3:1 for largest: smallest.smallest.
11
2121
2222
statistical power: statistical power: a conceptual a conceptual
overviewoverview
12
2323
typetype--1 and type1 and type--2 errors2 errorstypetype--1 error1 error == finding a significant difference in the finding a significant difference in the sample that actually doesn’t exist in the population sample that actually doesn’t exist in the population
denoted denoted αα
typetype--2 error2 error == finding no significant difference in the finding no significant difference in the sample when one actually exists in the populationsample when one actually exists in the population
denoted denoted ββ
significant differences are defined with reference to a significant differences are defined with reference to a criterion or threshold (i.e., acceptable rate) for or threshold (i.e., acceptable rate) for committing typecommitting type--1 errors: typically set at .05 or .011 errors: typically set at .05 or .01
hypothesis testing pays little attention to typehypothesis testing pays little attention to type--2 error2 errorconcept of power shifts focus to typeconcept of power shifts focus to type--2 error2 error
2424
definition of powerdefinition of powertechnical definition:
the probability of correctly rejecting a false H0mathematically works out to 1 - β( β = type-2 error = probability of accepting false H0)
useful definition:the degree to which we can detect treatment effects(includes main effects, interactions, simple effects, etc.)
when they exist in the population
Power analyses put the emphasis on researchers’ ability to find effects that exist, rather than on the likelihood of incorrectly finding effects that don’t exist
13
2525
reality vs. statistical decisions
false alarmfalse alarmαα
jackpotjackpot1 1 -- ββ
good savegood save1 1 -- αα
missβ
reality
H0 is correct H1 is correctdecision
rejected H0
did not reject H0
2626
why you might care about powerwhy you might care about power1)1) I have done a study and I want to report the power I have done a study and I want to report the power
of my significant effect.of my significant effect.need to calculate need to calculate observed power (post hoc power)observed power (post hoc power)
2)2) I have done a study and did not find a significant I have done a study and did not find a significant effect. But I know the mean difference exists in the effect. But I know the mean difference exists in the population. How can I increase power to detect it?population. How can I increase power to detect it?
need to calculate need to calculate predicted power (a priori power)predicted power (a priori power)
3)3) I am designing a study. I want to be sure that I have I am designing a study. I want to be sure that I have enough power to detect my predicted effects.enough power to detect my predicted effects.
need to calculate need to calculate predicted power (a priori power)predicted power (a priori power)
14
2727
in terms of hypothesis testingin terms of hypothesis testingHo
μo α (one-tailed)
Sampling distribution of mean if H0 is true
H1
μ1
power
β (probability of Type II error)
Sampling distribution of mean if H1 is true
2828
factors affecting powerfactors affecting power
power depends on a number of thingspower depends on a number of things–– significance level, significance level, αα
•• relaxed relaxed αα more powermore power–– sample size, sample size, NN
•• more N more N more powermore power–– mean differences, mean differences, μμ00 –– μμ11
•• larger differences larger differences more powermore power–– error variance error variance –– σσee
22 or MSor MSerrorerror•• less error variance less error variance more powermore power
15
2929
power as a function of power as a function of αHo H1
μo μ1 α
power
β
3030
power as a function of power as a function of αHo H1
μo μ1 α
power
β
occasionally people will set α=.10 if power is
low
16
3131
power as a function of power as a function of μμ00 –– μμ11
Ho H1
μo μ1 α
power
β
3232
power as a function of power as a function of μμ00 –– μμ11
Ho H1
μo μ1 α
power
β
not very surprising!
17
3333
Ho H1
μo μ1 α
power
β
as a function of n and as a function of n and σσ22
3434
as a function of n and as a function of n and σσ22
Ho H1
μo μ1 α
power
β
an increase in sample size or
decrease variance will
decrease error variance
Recall: σσ22means means = = σσ22//nn
or or σσmeans means = = σσ22//√√nn
18
3535
as a function of effect size (as a function of effect size (d)d)
power closely related to effect sizepower closely related to effect sizeeffect size is determined by some of these previous effect size is determined by some of these previous factors we looked at (factors we looked at (μo -- μ1 and σ):d = d = μ1 -- μ0
σd indicates about how many standard deviations the d indicates about how many standard deviations the means are apart, & thus the overlap of the two means are apart, & thus the overlap of the two distributions:distributions:
i.e., how much the means differvariability in the sample
effect size d percentage overlap
Small 0.20 85%
Medium 0.50 67%
Large 0.80 53%
3636
what’s the point again?aimaim: to achieve .80 power (minimum optimal level): to achieve .80 power (minimum optimal level)
80% chance that you will find a significant 80% chance that you will find a significant effect, if it exists in the populationeffect, if it exists in the population
can conduct can conduct a priori a priori power analyses power analyses beforebeforeyou run your study you run your study –– If you want to know how many participants you need in If you want to know how many participants you need in
your study, in order to find the effects you’re looking foryour study, in order to find the effects you’re looking for
but you can also conduct but you can also conduct postpost--hochoc power analyses power analyses afterafter your study to help strengthen your argument:your study to help strengthen your argument:–– if you didn’t find significant results, but you wanted to: if you didn’t find significant results, but you wanted to:
can argue that you can argue that you didn’t have sufficient powerdidn’t have sufficient power–– if you didn’t find significant results, and that’s good: if you didn’t find significant results, and that’s good:
can argue that you can argue that you did have sufficient powerdid have sufficient power
19
3737
first, gather some information
a priori estimates a priori estimates –– what N to achieve .80 power?what N to achieve .80 power?
information to be gatheredinformation to be gathered::1. estimate of effect size 1. estimate of effect size 2. estimate of error (2. estimate of error (MSMSerrorerror))
postpost--hoc estimates hoc estimates –– how powerful was my study?how powerful was my study?
information to be gatheredinformation to be gathered::1. effect size 1. effect size 2. error (2. error (MSMSerrorerror))3. 3. NN in your studyin your study
we typically use previous research to estimate the
likely effect size and MSerror
we get these estimatesfrom our dataset
3838
next, get the softwarenext, get the softwareG*POWER: free program for power calculationsfree program for power calculationshttp://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
• a priori power analyses – provides sample sizesfor given effect sizes, alpha levels, and power values
• post hoc power analyses – provides power valuesfor given sample sizes, effect sizes, and alpha levels
suitable for most major statistical methods
useful reference:Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39 (2), 175-191.
20
3939
finally, use the softwarefinally, use the softwareselect a “test family” from the drop-down menu of options
- for example “F tests”select a “Statistical test” within this family
- for example “ANOVA: fixed effects, special, main effects, and interactions”
select the type of Power Analysis you want to conduct - “a priori” or “post-hoc”fill in the necessary information and select “calculate”
you may need to conduct a separate analysis for each systematic effect in your design
So how many in 2-way ANOVA? And 3-way?
4040
21
4141
strategic issuesstrategic issuesin thinking about in thinking about statistical powerstatistical power
4242
some caveatssome caveats1.1. an effect must exist for you to find itan effect must exist for you to find it
increasing power can help you detect even very increasing power can help you detect even very small effects small effects (which can be important )(which can be important )
but if your theory and predictions but if your theory and predictions (about mean (about mean differences)differences) do notdo not reflect what’s actually going on reflect what’s actually going on in the population:in the population:
HH00 will be right, and Hwill be right, and H11 will be wrongwill be wrong
all the power in the world won’t help you find all the power in the world won’t help you find the effects you’re searching forthe effects you’re searching for
22
4343
2. large samples can be bewitching
a large sample can detect very small effects that may be relatively unimportant and unstable
exclusive focus on significance test may lead you to overestimate the importance of a small effect
some caveatssome caveats
4444
3. error variance is also important
high error variance means that a large effect may still turn out to be non-significant
don’t just focus on sample size
some caveatssome caveats
23
4545
so how do I maximize power?1. focus on studying large effects (!)
but limited opportunities in psychology
2. increase sample sizebut may face practical constraints
3. increase α level but this increases chance of type-1 error
4. decrease error varianceHrm how?
4646
how to reduce error varianceaim is to reduce variation in DV scoresfrom sources other than your IVs
improve operationalisation of variablesincreases validity
improve measurement of variablesincreases (internal) reliability
improve design of your studyaccount for variance from other sources (e.g. blocking designs - more in a minute)
improve methods of analysiscontrol for variance from other sources (e.g. ANCOVA - more in next lecture)
24
4747
If you torture data sufficiently, it will confess If you torture data sufficiently, it will confess to almost anything. to almost anything.
----Fred Fred MengerMenger, chemistry professor (1937, chemistry professor (1937-- ))
4848
blocking designsblocking designs
25
4949
what “blocking” iswhat “blocking” isin any research study, you want to explain variance in any research study, you want to explain variance in a DV with a novel IV / treatmentin a DV with a novel IV / treatmentoften, variance in the DV can also be explained by often, variance in the DV can also be explained by additional factors which are less novel or interestingadditional factors which are less novel or interesting–– known as known as controlcontrol or or concomitantconcomitant variablesvariables
blocking introduces blocking introduces control variables into your designvariables into your design–– reflect additional sources of variation or prereflect additional sources of variation or pre--existing existing
differences on the DV scoredifferences on the DV score
DV IV of interest control variableproblem solving teaching style IQprotest behaviour group norms political attitudesdepression stress family history
5050
11stst application of blockingapplication of blockingadding a second factor adding a second factor increases powerincreases power by by reducing error variancereducing error variance
how this works:how this works:–– error varianceerror variance, or , or residual varianceresidual variance, represents , represents
all variance left over after accounting for all variance left over after accounting for systematic variance explained by IV of interestsystematic variance explained by IV of interest
–– error variance could reflect error variance could reflect chance chance oror systematicsystematicunmeasuredunmeasured influencesinfluences from other factor(s)from other factor(s)
–– adding a 2adding a 2ndnd factor (that is correlated with the DV) factor (that is correlated with the DV) may account for some of this leftmay account for some of this left--over varianceover variance
–– reducing error variance in this way will increase reducing error variance in this way will increase power (think back to the first half of this lecture)power (think back to the first half of this lecture)
26
5151
example example -- 11stst application of blockingapplication of blocking
11--way ANOVA:way ANOVA:
–– DV = problemDV = problem--solving timesolving time–– IV = caffeine intake (caffeine or control) IV = caffeine intake (caffeine or control)
we could introduce another IV that was related to we could introduce another IV that was related to the DV, giving us a the DV, giving us a 22--way ANOVAway ANOVA::
–– DV = problemDV = problem--solving timesolving time–– factor 1 factor 1 (IV) = caffeine intake(IV) = caffeine intake–– factor 2 factor 2 (blocking variable) is related to the DV(blocking variable) is related to the DV
•• e.g., IQ levele.g., IQ level
5252
IVError
caffeineerrorerror
variance in the 1variance in the 1--way ANOVA:way ANOVA:effect of caffeine onlyeffect of caffeine only
large error / residual variance means less power for test
27
5353
IVErrorBlockIV*Block
caffeine
errorerrorIQ
IQ x C
variance in the 2variance in the 2--way ANOVA:way ANOVA:effects of caffeine and IQeffects of caffeine and IQ
Maybe you can reduce the error / residual variance by partitioning out variability due to 2nd factor
more power to find effect of original IVas long as the original IV and 2nd factor don’t
explain the same variance in the DV (can’t just manipulate same thing 2x)
Tests of Between-Subjects Effects
Dependent Variable: ATTRACT
199.85 1 199.85 5.323 .0261902.290 2 951.145 25.332 .000120.870 2 60.435 1.610 .212
1464.340 39 37.5478966.667 44
SourceIVBlockIV*BlockErrorTotal
Type III Sumof Squares df Mean Square F Sig.
ProbSolv
Tests of Between-Subjects Effects
Dependent Variable: ATTRACT
199.850 1 199.850 2.464 .1243487.500 43 81.1058966.667 44
SourceIVErrorTotal
Type III Sumof Squares df Mean Square F Sig.
ProbSolv
5454
evidence that blocking really can help increase power
28
5555
5656
how to set up a blocking designhomogenous blocks created with levels of blocking homogenous blocks created with levels of blocking factor: Ps are factor: Ps are matchedmatched within levels of blocking factorwithin levels of blocking factor–– e.g. first divide people into IQ groups (hi, med, low)e.g. first divide people into IQ groups (hi, med, low)
participants within each block then randomly assigned participants within each block then randomly assigned to levels of the IV (to levels of the IV (stratified random assignmentstratified random assignment))–– e.g. within each of three IQ groups, assign Ps to e.g. within each of three IQ groups, assign Ps to
either caffeine or control conditioneither caffeine or control condition
low IQ mid IQ high IQlow IQ mid IQ high IQ
controlcontrol
caffeinecaffeine
29
5757
what’s different about blockingwhat’s different about blockingexperimental designsexperimental designs are fully are fully randomisedrandomised: : participants are randomly assigned to 1 level participants are randomly assigned to 1 level of each (and every) factorof each (and every) factor
blockingblocking is not randomised: is not randomised: participants are categorized into levels of the participants are categorized into levels of the blocking factorblocking factor
design with blocking factor is also known as:design with blocking factor is also known as:•• randomized block designrandomized block design•• stratificationstratification•• matched samples designmatched samples design
5858
1) what if the blocking variable doesn’t reduce the error term?– then you have chosen the wrong blocking variable– blocking variables are selected because of their
known association with the dependent variable– blocking variables should also not explain the same
variance as the focal IV
2) how does the blocking variable fit in with predictions and reporting of findings?– unlike the effect of the focal IV, the effect of the
blocking factor is not usually of interest– the blocking variable is only factored in to reduce error
and increase the power of the test for the focal IV
some questions about blockingsome questions about blocking
30
5959
6060
22ndnd application of blockingapplication of blockingdetecting potential confounds: factors that offer alternative explanations for systematic resultsexample:example: each condition in your study is run by a different experimenter (2nd factor)– the experimental effects may be due to the
experimenter, rather than treatment conditions [main effect of E, rather than main effect of T]
– the effect of the treatment may depend on who the experimenter was [E x T interaction]
31
6161
IVs, controls, confounds...The label depends entirely on theory / argument
– an independent variable (IV) independent variable (IV) has a significant has a significant effect, which is of great interest to you effect, which is of great interest to you (systematic variance that you predicted)(systematic variance that you predicted)
– a controlcontrol variablevariable has a significant effect, has a significant effect, but that can be okay but that can be okay (reduces error variance)(reduces error variance)
– a confoundconfound variablevariable has a significant effect, has a significant effect, which is not wanted which is not wanted (additional systematic variance)(additional systematic variance)
6262
controls and confounds in blockingmain effect of blocking = sign of good control var.main effect of blocking = sign of good control var.•• shows systematic variability due to blocking factor, shows systematic variability due to blocking factor,
which has been removed from “error” variance which has been removed from “error” variance •• increases power of test for focal IVincreases power of test for focal IV
(as long as it doesn’t explain the same variance as IV!)(as long as it doesn’t explain the same variance as IV!)
blocking factor x IV interaction = sign of confoundblocking factor x IV interaction = sign of confound•• increases power to detect focal IV main effect increases power to detect focal IV main effect
(because systematic variability due to interaction is (because systematic variability due to interaction is removed from “error”)removed from “error”)
•• but that positive outcome is but that positive outcome is outweighedoutweighed by negative by negative outcome: interaction means that effect of focal IV outcome: interaction means that effect of focal IV changes depending on blocking factor changes depending on blocking factor
•• significant Block x IV interaction shows failure of significant Block x IV interaction shows failure of treatment IV effect to generalise across levels oftreatment IV effect to generalise across levels ofblocking variableblocking variable
32
6363
oneone--way ANOVAway ANOVAsourcesource SSSS dfdf MSMS FFTaskTask 213.78213.78 22 106.89106.89 0.710.71ErrorError 2263.332263.33 1515 150.89150.89TotalTotal 2477.112477.11 17 17
block on age (e.g., 60 block on age (e.g., 60 69, 70 69, 70 79, 80 +)79, 80 +)
sourcesource SSSS dfdf MSMS FFTaskTask 213.78213.78 22 106.89106.89 4.18*4.18*AgeAge 1933.781933.78 22 966.89966.89 37.83*37.83*T * AT * A 99.5599.55 44 24.8924.89 0.970.97ErrorError 230.00230.00 99 25.5625.56TotalTotal 2477.112477.11 1717
CONTROL
observe same treatment effects for 3 age groups - GOOD(i.e., non-significant interaction shows generalisability)
reduction in dferror (9 vs. 15), BUT this loss of power is compensated for by smaller error term (25.56 vs. 150.89)
effe
cts
of T
ask
on
effe
cts
of T
ask
on
mot
or s
kills
of e
lder
lym
otor
ski
lls o
f eld
erly
6464
oneone--way ANOVAway ANOVA
sourcesource SSSS dfdf MSMS FFExerciseExercise 1065.501065.50 22 532.75532.75 18.42*18.42*ErrorError 1301.751301.75 4545 28.9328.93TotalTotal 2367.252367.25 4747
block on gender (G):block on gender (G):
sourcesource SSSS dfdf MSMS FFExerciseExercise 1065.501065.50 22 532.75532.75 36.02*36.02*GenderGender 330.75330.75 11 330.75330.75 22.36*22.36*E * GE * G 350.00350.00 22 175.00175.00 11.83*11.83*ErrorError 621.00621.00 4242 14.7914.79TotalTotal 2367.252367.25 4747
CONFOUND (moderator) IV
effe
cts
of e
xerc
ise
leng
thef
fect
s of
exe
rcis
e le
ngth
on s
tude
nts’
flex
ibili
tyon
stu
dent
s’ fl
exib
ility
treatment effects differ according to gender (sig. interaction)treatment effects differ according to gender (sig. interaction)
potential confound potential confound –– NOT GOODNOT GOODor new IV (moderator) or new IV (moderator) –– INTERESTING!INTERESTING!
33
6565
advantages:advantages:1)1) may equate treatment groups better than does a may equate treatment groups better than does a
completely randomized design completely randomized design assuming equal assuming equal nn for levels of blocking factorfor levels of blocking factor
2)2) greater power, because error term is reducedgreater power, because error term is reduced3)3) can check interactions of treatments and blocks can check interactions of treatments and blocks
do effects of treatments generalise?do effects of treatments generalise?
disadvantages:disadvantages:1)1) practical costs of introducing blocking factorpractical costs of introducing blocking factor2)2) loss of power if blocking variable is poorly correlated with loss of power if blocking variable is poorly correlated with
DV (DV (rr < .20), because fewer < .20), because fewer dfdferrorerror3)3) blocking factor is treated as having discrete levels blocking factor is treated as having discrete levels ––
artificial grouping may be necessary artificial grouping may be necessary could lose some informationcould lose some information
pros and cons of blocking
6666
readingsreadingspower analysis and blocking designs (this lecture)
Field (2nd or 3rd ed): no new readingsHowell (all eds): Chapter 8
correlation and regression (next lecture)Field (3rd ed): Chapter 6 + Chapter 7 (pp. 197-209)Field (2nd ed): Chapter 4 + Chapter 5 (pp. 143-152)Howell (all eds): Chapter 9