psyc3010 lecture 5 - university of queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. ·...

33
1 1 psyc3010 lecture 5 psyc3010 lecture 5 power analysis and blocking designs power analysis and blocking designs last lecture: between-Ps ANOVA III: higher-order designs next week: correlation and regression 2 last lecture last lecture Æ Æ this lecture this lecture last lecture: last lecture: – conceptual overview of higher conceptual overview of higher-order designs order designs – tests in 3 tests in 3-way between way between-participants ANOVA participants ANOVA this lecture: this lecture: – Brief recap 3 Brief recap 3-way ANOVA way ANOVA – statistical power statistical power – blocking designs in ANOVA blocking designs in ANOVA

Upload: others

Post on 24-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

1

11

psyc3010 lecture 5psyc3010 lecture 5

power analysis and blocking designspower analysis and blocking designs

last lecture: between-Ps ANOVA III: higher-order designsnext week: correlation and regression

22

last lecture last lecture this lecturethis lecture

last lecture:last lecture:–– conceptual overview of higherconceptual overview of higher--order designsorder designs–– tests in 3tests in 3--way betweenway between--participants ANOVAparticipants ANOVA

this lecture:this lecture:–– Brief recap 3Brief recap 3--way ANOVAway ANOVA–– statistical powerstatistical power–– blocking designs in ANOVAblocking designs in ANOVA

Page 2: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

2

33

last week’s experiment with last week’s experiment with some some differentdifferent datadata

Males

Personality Rew None Pun

Impulsive 350 425 450345 430 465330 445 465

Total 1025 1300 1380Mean 341.67 433.33 460.00

Anxious 485 450 470450 455 480495 445 465

Total 1430 1350 1415Mean 476.67 450.00 471.67

Reinforcement

Females

Personality Rew None Pun

Impulsive 350 355 460320 350 495330 340 485

Total 1000 1045 1440Mean 333.33 348.33 480.00

Anxious 320 360 480340 360 470350 370 450

Total 1010 1090 1400Mean 336.67 363.33 466.67

Reinforcement

44

omnibus tests in a 3omnibus tests in a 3--way anovaway anovamain effects:main effects:–– differences between marginal means of one factor differences between marginal means of one factor

(averaging over levels of other factors)(averaging over levels of other factors)

twotwo--way interactions (firstway interactions (first--order):order):–– examines whether the effect of one factor is the same examines whether the effect of one factor is the same

at every level of a second factor (averaging over a at every level of a second factor (averaging over a third factor)third factor)

threethree--way interaction way interaction (second(second--order)order)::–– examines whether the twoexamines whether the two--way interaction between way interaction between

two factors is the same at every level of the third two factors is the same at every level of the third factor factor

–– oror whether the cell means differ more than you would whether the cell means differ more than you would expect given the main effects and the twoexpect given the main effects and the two--way way interactionsinteractions

Page 3: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

3

55

main summary table main summary table (from SPPS)(from SPPS)

Tests of Between-Subjects Effects

Dependent Variable: RT

60237.500 2 30118.750 169.418 .0007802.778 1 7802.778 43.891 .000

24544.444 1 24544.444 138.062 .0007751.389 2 3875.694 21.801 .000

16426.389 2 8213.194 46.199 .0006944.444 1 6944.444 39.062 .000

6601.389 2 3300.694 18.566 .000

4266.667 24 177.778134575.000 35

SourceREINFORCPERSONALGENDERREINFORC * PERSONALREINFORC * GENDERPERSONAL * GENDERREINFORC * PERSONAL* GENDERErrorTotal

Type III Sumof Squares df Mean Square F Sig.

66

steps for followingsteps for following--up up a 3a 3--way interactionway interaction

is interaction significant?

NO

calculate simple interaction effects

YES

NOYES

does factor A or B have >2 levels?

NO YESconduct tests

for simple simple

comparisons

STOP

STOP

STOP

STOP

is A x B at C1 or C2

significant?

are the simple simple effects of A or B significant?

NO

STOP(repeat this sequence for AC at B1 and B2, (repeat this sequence for AC at B1 and B2, and BC at A1 and A2, to explore interaction and BC at A1 and A2, to explore interaction

–– or pick based on theory / hypotheses)or pick based on theory / hypotheses)

YES

Page 4: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

4

77

following up the 3following up the 3--way interactionway interaction

simple interaction effectssimple interaction effects–– tests of 2tests of 2--way interactions at each level of the third factorway interactions at each level of the third factor–– e.g., personality x reinforcement at each level of gendere.g., personality x reinforcement at each level of gender

simple simple effects:simple simple effects:–– tests of the effect of a factor at each level of second tests of the effect of a factor at each level of second

factor within levels of a third factorfactor within levels of a third factor–– e.g., effects of reinforcement at each level of personality (for males)e.g., effects of reinforcement at each level of personality (for males)

simple simple comparisons:simple simple comparisons:–– compares effect of different levels of a factor at a compares effect of different levels of a factor at a

particular level of a second factor within levels of a third particular level of a second factor within levels of a third factorfactor

–– e.g., effects of rew versus pun or neutral for impulsive males e.g., effects of rew versus pun or neutral for impulsive males

88

summary table for followsummary table for follow--up testsup testsSource SS df MS F p

Simple Interaction Effects

PR at G1 13744.44 2 6872.22 38.656 0.000PR at G2 608.33 2 304.17 1.711 0.202Error 4266.67 24 177.78

Simple Simple effects in the interaction of PR at G1

R at P1 at G1 23116.67 2 11558.33 65.016 0.000R at P2 at G1 538.89 2 269.44 1.516 0.240Error 4266.67 24 177.78

Simple Simple comparisons in simple effect of R at P1 at G1

R vs N and P at P1 at G1 22047.90 1 22047.90 124.019 0.000N vs P at P1 at G1 355.64 1 355.64 2.000 0.170Error 4266.67 24 177.78

Page 5: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

5

99

summary table for followsummary table for follow--up testsup testsSource SS df MS F p

Simple Interaction Effects

PR at G1 13744.44 2 6872.22 38.656 0.000PR at G2 608.33 2 304.17 1.711 0.202Error 4266.67 24 177.78

Simple Simple effects in the interaction of PR at G1

R at P1 at G1 23116.67 2 11558.33 65.016 0.000R at P2 at G1 538.89 2 269.44 1.516 0.240Error 4266.67 24 177.78

Simple Simple comparisons in simple effect of R at P1 at G1

R vs N and P at P1 at G1 22047.90 1 22047.9 124.019 0.000N vs P at P1 at G1 355.64 1 355.64 2.000 0.170Error 4266.67 24 177.78

the personality x reinforcement interaction is

- significant for males

- non-significant for femaleswe can follow up this significant simple interaction effect of Personality x Reinforcement for males using simple simple effects

1010

summary table for followsummary table for follow--up testsup testsSource SS df MS F p

Simple Interaction Effects

PR at G1 13744.44 2 6872.22 38.656 0.000PR at G2 608.33 2 304.17 1.711 0.202Error 4266.67 24 177.78

Simple Simple effects in the interaction of PR at G1

R at P1 at G1 23116.67 2 11558.33 65.016 0.000R at P2 at G1 538.89 2 269.44 1.516 0.240Error 4266.67 24 177.78

Simple Simple comparisons in simple effect of R at P1 at G1

R vs N and P at P1 at G1 22047.90 1 22047.9 124.019 0.000N vs P at P1 at G1 355.64 1 355.64 2.000 0.170Error 4266.67 24 177.78

the effect of reinforcement (for males) is- significant for the impulsive group- not-significant for the anxious group

we can follow up the significant simple simple effect of reinforcement at impulsivity (for males) using simple simple comparisons

Page 6: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

6

1111

summary table for followsummary table for follow--up testsup testsSource SS df MS F p

Simple Interaction Effects

PR at G1 13744.44 2 6872.22 38.656 0.000PR at G2 608.33 2 304.17 1.711 0.202Error 4266.67 24 177.78

Simple Simple effects in the interaction of PR at G1

R at P1 at G1 23116.67 2 11558.33 65.016 0.000R at P2 at G1 538.89 2 269.44 1.516 0.240Error 4266.67 24 177.78

Simple Simple comparisons in simple effect of R at P1 at G1

R vs N and P at P1 at G1 22047.90 1 22047.90 124.019 0.000N vs P at P1 at G1 355.64 1 355.64 2.000 0.170Error 4266.67 24 177.78

simple simple comparisons

- calculated using the t-test formula (c.f. lecture 3)

- can also be calculated using a F-test (last week’s tutorial exercise showed how to do this)

1212

summary table for followsummary table for follow--up testsup testsSource SS df MS F p

Simple Interaction Effects

PR at G1 13744.44 2 6872.22 38.656 0.000PR at G2 608.33 2 304.17 1.711 0.202Error 4266.67 24 177.78

Simple Simple effects in the interaction of PR at G1

R at P1 at G1 23116.67 2 11558.33 65.016 0.000R at P2 at G1 538.89 2 269.44 1.516 0.240Error 4266.67 24 177.78

Simple Simple comparisons in simple effect of R at P1 at G1

R vs N and P at P1 at G1 22047.90 1 22047.9 124.019 0.000N vs P at P1 at G1 355.64 1 355.644 2.000 0.170Error 4266.67 24 177.78

for impulsive males, 1) performance under rewarding feedback was

significantly different from that under neutral or punishing feedback

2) performance under punishing feedback was not significantly different to performance under neutralfeedback

Page 7: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

7

1313

reportingreporting

A 2 x 2 x 3 factorial ANOVA was performed, and a significant A 2 x 2 x 3 factorial ANOVA was performed, and a significant threethree--way interaction was observed among personality, gender, way interaction was observed among personality, gender, and reinforcement and reinforcement FF(2,24) = 169.42, (2,24) = 169.42, pp<.001<.001. The simple . The simple personality x reinforcement interaction was significant for men, personality x reinforcement interaction was significant for men, FF(2,24) = 38.66, (2,24) = 38.66, pp<.001<.001, but not for women, , but not for women, FF(2,24) = 1.71, (2,24) = 1.71, p p = = .202.202. Simple simple effects of reinforcement were significant for . Simple simple effects of reinforcement were significant for impulsive males, impulsive males, FF(2,24) = 65.02, (2,24) = 65.02, pp<.001<.001, but not for anxious , but not for anxious males, males, FF(2,24) = 1.52, (2,24) = 1.52, p p = .240= .240. Planned contrasts showed that . Planned contrasts showed that impulsive males performed better with reward impulsive males performed better with reward ((MM = 341.67)= 341.67) than than with punishment or no reinforcement with punishment or no reinforcement ((MM = 441.67), = 441.67), FF(1,24) = (1,24) = 124.02, 124.02, p<p<.001,.001, while the performance of impulsive males while the performance of impulsive males receiving no reinforcement receiving no reinforcement ((MM = 433.33)= 433.33) was not significantly was not significantly different to those receiving punishment different to those receiving punishment ((MM = 460.00), = 460.00), FF(2,24) = (2,24) = 2.00, 2.00, p = p = .170.170..

1. I haven’t put effect sizes. These would be required for all omnibus tests. Effect sizes are usually included for simple effects as well, and even for simple comparisons.

1414

reportingreporting

Results of a 2 x 2 x 3 factorial ANOVA indicated a significant threeResults of a 2 x 2 x 3 factorial ANOVA indicated a significant three--way interaction among personality, gender, and reinforcement way interaction among personality, gender, and reinforcement FF(2,24) = 169.42, (2,24) = 169.42, pp<.001<.001. The simple personality x reinforcement . The simple personality x reinforcement interaction was significant for males, interaction was significant for males, FF(2,24) = 38.66, (2,24) = 38.66, pp<.001<.001, but not , but not for females, for females, FF(2,24) = 1.71, (2,24) = 1.71, p p = .202= .202. There was a significant simple . There was a significant simple simple effect of reinforcement for impulsive males, simple effect of reinforcement for impulsive males, FF(2,24) = 65.02, (2,24) = 65.02, pp<.001<.001, but not for anxious males, , but not for anxious males, FF(2,24) = 1.52, (2,24) = 1.52, p p = .240= .240. Planned . Planned contrasts showed that impulsive males performed better with reward contrasts showed that impulsive males performed better with reward ((MM = 341.67)= 341.67) than with punishment or no reinforcement than with punishment or no reinforcement ((MM = = 441.67), 441.67), FF(1,24) = 124.02, (1,24) = 124.02, p<p<.001,.001, while the performance of while the performance of impulsive males receiving no reinforcement impulsive males receiving no reinforcement ((MM = 433.33)= 433.33) was not was not significantly different to those receiving punishment significantly different to those receiving punishment ((MM = 460.00), = 460.00), FF(2,24) = 2.00, (2,24) = 2.00, p = p = .170.170..

2. NB we are reporting simple simple effects of reinforcement now – assumes this = theoretical focus. Last week we hypothesized different personality effects depending on reinforcement, but this week we are focusing on different reinforcement effects depending on personality.

Page 8: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

8

1515

reportingreporting

Results of a 2 x 2 x 3 factorial ANOVA indicated a significant threeResults of a 2 x 2 x 3 factorial ANOVA indicated a significant three--way way interaction among personality, gender, and reinforcement interaction among personality, gender, and reinforcement FF(2,24) = (2,24) = 169.42, 169.42, pp<.001. The simple personality x reinforcement interaction was <.001. The simple personality x reinforcement interaction was significant for males, significant for males, FF(2,24) = 38.66, (2,24) = 38.66, pp<.001, but not for females, <.001, but not for females, FF(2,24) = 1.71, (2,24) = 1.71, p p = .202. [Insert before simple = .202. [Insert before simple simplesimple effects of R for effects of R for men:] men:] For women, across personality type the simple effect of For women, across personality type the simple effect of reinforcement was / was not significant, F(2,24)=X, p=, eta2= .reinforcement was / was not significant, F(2,24)=X, p=, eta2= . [If yes, [If yes, would also followwould also follow--up with contrasts.] In addition, there was a significant up with contrasts.] In addition, there was a significant simple simple simplesimple effect of reinforcement for impulsive males, effect of reinforcement for impulsive males, FF(2,24) = (2,24) = 65.02, 65.02, pp<.001<.001, but not for anxious males, , but not for anxious males, FF(2,24) = 1.52, (2,24) = 1.52, p p = .240= .240. etc”. etc”

3. NB for simplicity we reported for women that the simple interaction of P x R was ns, but unless interested ONLY in interaction we would also ‘followup’ ns simple interaction by looking at simple effect of Reinforcement for women (collapsing across Personality), if hypotheses suggest R = the theoretically key IV. We want to know if there are effects of reinforcement for women that don’t change depending on personality.

1616

ReportingReportingWhere there is one key IV with two moderators Where there is one key IV with two moderators of interest, you might also choose to skip directly of interest, you might also choose to skip directly from a threefrom a three--way to the simple simple effects.way to the simple simple effects.–– E.g. (madeE.g. (made--up data), “The threeup data), “The three--way interaction was way interaction was

significant, F=,p=,eta2=. Followsignificant, F=,p=,eta2=. Follow--up tests examined up tests examined the simple simple effects of reinforcement within each the simple simple effects of reinforcement within each level of personality and gender. Differences in level of personality and gender. Differences in reaction to reinforcement were observed for anxious reaction to reinforcement were observed for anxious men, F=,p=,eta2=, and impulsive men, F=,p=,eta2=, and impulsive women,F=,p=,eta2=. Specifically, [describe simple women,F=,p=,eta2=. Specifically, [describe simple simple comparisons here]. For impulsive men and simple comparisons here]. For impulsive men and anxious women, no effects of reinforcement were anxious women, no effects of reinforcement were found, Fs< ##, ps> ##, eta2s< ##.”found, Fs< ##, ps> ##, eta2s< ##.”

In fact you might even go directly from the threeIn fact you might even go directly from the three--way to the simple comparisons (and put these in way to the simple comparisons (and put these in a table) to reduce your wordiness and the a table) to reduce your wordiness and the complexity of the text.complexity of the text.

Page 9: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

9

1717

Table 1. Mean reaction time as a function of Table 1. Mean reaction time as a function of gender, personality, and reinforcement. gender, personality, and reinforcement.

Reinforcement: Reinforcement: Punishment NeutralPunishment Neutral ReinforcementReinforcementWomenWomen

ImpulsiveImpulsive 485.33485.33aa 481.67481.67aa 320.00320.00bbAnxiousAnxious 487.10487.10aa 485.33485.33aa 323.00323.00bb

MenMenImpulsiveImpulsive 484.25484.25aa 478.76478.76aa 316.67316.67bbAnxiousAnxious 367.55367.55aa 455.67455.67bb 583.00583.00cc

Note. Differing subscripts within the same row indicate significant Note. Differing subscripts within the same row indicate significant differences among the means.differences among the means.

1. Most experimental journals would also want to see standard deviations.

E.g. (madeE.g. (made--up data): “The threeup data): “The three--way interaction of P x R x G was significant, way interaction of P x R x G was significant, F=,p=,eta2=. Table 1 displays the means by gender, personality, and F=,p=,eta2=. Table 1 displays the means by gender, personality, and reinforcement to decompose the interaction. As seen in Table 1, followreinforcement to decompose the interaction. As seen in Table 1, follow--up up analyses revealed that for women and for impulsive men, reinforcement led to analyses revealed that for women and for impulsive men, reinforcement led to faster responses (more learning) than either the control condition or faster responses (more learning) than either the control condition or punishment, which did not differ. For anxious men, by contrast, punishment punishment, which did not differ. For anxious men, by contrast, punishment produced the most learning, followed by the control condition, with produced the most learning, followed by the control condition, with reinforcement being the least successful condition (slowest response times).”reinforcement being the least successful condition (slowest response times).”

1818

Table 1. Mean reaction time as a Table 1. Mean reaction time as a function of gender, personality, and function of gender, personality, and reinforcement. reinforcement.

Reinforcement: Reinforcement: Punishment NeutralPunishment Neutral ReinforcementReinforcementWomenWomen

ImpulsiveImpulsive 485.33485.33aa 481.67481.67aa 320.00320.00bbAnxiousAnxious 487.10487.10aa 485.33485.33aa 323.00323.00bb

MenMenImpulsiveImpulsive 484.25484.25aa 478.76478.76aa 316.67316.67bbAnxiousAnxious 367.55367.55aa 455.67455.67bb 583.00583.00cc

Note. Differing subscripts within the same row indicate significant Note. Differing subscripts within the same row indicate significant differences among the means.differences among the means.

2. NB how this hides all stats (except the three-way) – excellent for non-stats minded reviewers

E.g. (madeE.g. (made--up data): “The threeup data): “The three--way interaction of P x R x G was significant, way interaction of P x R x G was significant, F=,p=,eta2=. Table 1 displays the means by gender, personality, and F=,p=,eta2=. Table 1 displays the means by gender, personality, and reinforcement to decompose the interaction. As seen in Table 1, followreinforcement to decompose the interaction. As seen in Table 1, follow--up up analyses revealed that for women and for impulsive men, reinforcement led to analyses revealed that for women and for impulsive men, reinforcement led to faster responses (more learning) than either the control condition or faster responses (more learning) than either the control condition or punishment, which did not differ. For impulsive men, by contrast, punishment punishment, which did not differ. For impulsive men, by contrast, punishment produced the most learning, followed by the control condition, with produced the most learning, followed by the control condition, with reinforcement being the least successful condition (slowest response times).”reinforcement being the least successful condition (slowest response times).”

Page 10: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

10

1919

Reporting: The bottom lineReporting: The bottom lineWith higherWith higher--order designs, there’s a lot more order designs, there’s a lot more flexibility about how you structure the analyses / flexibility about how you structure the analyses / writewrite--upupIt’s best to be guided by your theory.It’s best to be guided by your theory.This may require you to fully explore the simple This may require you to fully explore the simple interactions, simple simple effects, simple simple interactions, simple simple effects, simple simple comparisons, etc. within the textcomparisons, etc. within the textOr allow for some skipping.Or allow for some skipping.The more complex the design, the more likely The more complex the design, the more likely that you need a Table (or Tables, and/or figures) that you need a Table (or Tables, and/or figures) to report the results.to report the results.

2020

What happens with Unequal n?What happens with Unequal n?Something really complex mathematically Something really complex mathematically –– so you’re not so you’re not required to learn this material (it will not be assessed on required to learn this material (it will not be assessed on the exam)the exam)We provide a quick overview of the issues for the We provide a quick overview of the issues for the interested among you on the Blackboard web site in the interested among you on the Blackboard web site in the practice materials sectionpractice materials sectionTake home message: Take home message: If unequal n are too extreme If unequal n are too extreme (ratio of largest to smallest cell size > 3:1), results of (ratio of largest to smallest cell size > 3:1), results of analyses are unstable (less likely to replicate analyses are unstable (less likely to replicate –– can have can have more Type 1 more Type 1 andand more Type 2 errors). more Type 2 errors). –– Always try for equal n in study designs where anticipate using Always try for equal n in study designs where anticipate using

ANOVA.ANOVA.–– Avoid using natural group variables like gender or ethnicity in Avoid using natural group variables like gender or ethnicity in

ANOVA if ratio of people in groups exceeds 3:1 for largest: ANOVA if ratio of people in groups exceeds 3:1 for largest: smallest.smallest.

Page 11: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

11

2121

2222

statistical power: statistical power: a conceptual a conceptual

overviewoverview

Page 12: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

12

2323

typetype--1 and type1 and type--2 errors2 errorstypetype--1 error1 error == finding a significant difference in the finding a significant difference in the sample that actually doesn’t exist in the population sample that actually doesn’t exist in the population

denoted denoted αα

typetype--2 error2 error == finding no significant difference in the finding no significant difference in the sample when one actually exists in the populationsample when one actually exists in the population

denoted denoted ββ

significant differences are defined with reference to a significant differences are defined with reference to a criterion or threshold (i.e., acceptable rate) for or threshold (i.e., acceptable rate) for committing typecommitting type--1 errors: typically set at .05 or .011 errors: typically set at .05 or .01

hypothesis testing pays little attention to typehypothesis testing pays little attention to type--2 error2 errorconcept of power shifts focus to typeconcept of power shifts focus to type--2 error2 error

2424

definition of powerdefinition of powertechnical definition:

the probability of correctly rejecting a false H0mathematically works out to 1 - β( β = type-2 error = probability of accepting false H0)

useful definition:the degree to which we can detect treatment effects(includes main effects, interactions, simple effects, etc.)

when they exist in the population

Power analyses put the emphasis on researchers’ ability to find effects that exist, rather than on the likelihood of incorrectly finding effects that don’t exist

Page 13: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

13

2525

reality vs. statistical decisions

false alarmfalse alarmαα

jackpotjackpot1 1 -- ββ

good savegood save1 1 -- αα

missβ

reality

H0 is correct H1 is correctdecision

rejected H0

did not reject H0

2626

why you might care about powerwhy you might care about power1)1) I have done a study and I want to report the power I have done a study and I want to report the power

of my significant effect.of my significant effect.need to calculate need to calculate observed power (post hoc power)observed power (post hoc power)

2)2) I have done a study and did not find a significant I have done a study and did not find a significant effect. But I know the mean difference exists in the effect. But I know the mean difference exists in the population. How can I increase power to detect it?population. How can I increase power to detect it?

need to calculate need to calculate predicted power (a priori power)predicted power (a priori power)

3)3) I am designing a study. I want to be sure that I have I am designing a study. I want to be sure that I have enough power to detect my predicted effects.enough power to detect my predicted effects.

need to calculate need to calculate predicted power (a priori power)predicted power (a priori power)

Page 14: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

14

2727

in terms of hypothesis testingin terms of hypothesis testingHo

μo α (one-tailed)

Sampling distribution of mean if H0 is true

H1

μ1

power

β (probability of Type II error)

Sampling distribution of mean if H1 is true

2828

factors affecting powerfactors affecting power

power depends on a number of thingspower depends on a number of things–– significance level, significance level, αα

•• relaxed relaxed αα more powermore power–– sample size, sample size, NN

•• more N more N more powermore power–– mean differences, mean differences, μμ00 –– μμ11

•• larger differences larger differences more powermore power–– error variance error variance –– σσee

22 or MSor MSerrorerror•• less error variance less error variance more powermore power

Page 15: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

15

2929

power as a function of power as a function of αHo H1

μo μ1 α

power

β

3030

power as a function of power as a function of αHo H1

μo μ1 α

power

β

occasionally people will set α=.10 if power is

low

Page 16: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

16

3131

power as a function of power as a function of μμ00 –– μμ11

Ho H1

μo μ1 α

power

β

3232

power as a function of power as a function of μμ00 –– μμ11

Ho H1

μo μ1 α

power

β

not very surprising!

Page 17: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

17

3333

Ho H1

μo μ1 α

power

β

as a function of n and as a function of n and σσ22

3434

as a function of n and as a function of n and σσ22

Ho H1

μo μ1 α

power

β

an increase in sample size or

decrease variance will

decrease error variance

Recall: σσ22means means = = σσ22//nn

or or σσmeans means = = σσ22//√√nn

Page 18: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

18

3535

as a function of effect size (as a function of effect size (d)d)

power closely related to effect sizepower closely related to effect sizeeffect size is determined by some of these previous effect size is determined by some of these previous factors we looked at (factors we looked at (μo -- μ1 and σ):d = d = μ1 -- μ0

σd indicates about how many standard deviations the d indicates about how many standard deviations the means are apart, & thus the overlap of the two means are apart, & thus the overlap of the two distributions:distributions:

i.e., how much the means differvariability in the sample

effect size d percentage overlap

Small 0.20 85%

Medium 0.50 67%

Large 0.80 53%

3636

what’s the point again?aimaim: to achieve .80 power (minimum optimal level): to achieve .80 power (minimum optimal level)

80% chance that you will find a significant 80% chance that you will find a significant effect, if it exists in the populationeffect, if it exists in the population

can conduct can conduct a priori a priori power analyses power analyses beforebeforeyou run your study you run your study –– If you want to know how many participants you need in If you want to know how many participants you need in

your study, in order to find the effects you’re looking foryour study, in order to find the effects you’re looking for

but you can also conduct but you can also conduct postpost--hochoc power analyses power analyses afterafter your study to help strengthen your argument:your study to help strengthen your argument:–– if you didn’t find significant results, but you wanted to: if you didn’t find significant results, but you wanted to:

can argue that you can argue that you didn’t have sufficient powerdidn’t have sufficient power–– if you didn’t find significant results, and that’s good: if you didn’t find significant results, and that’s good:

can argue that you can argue that you did have sufficient powerdid have sufficient power

Page 19: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

19

3737

first, gather some information

a priori estimates a priori estimates –– what N to achieve .80 power?what N to achieve .80 power?

information to be gatheredinformation to be gathered::1. estimate of effect size 1. estimate of effect size 2. estimate of error (2. estimate of error (MSMSerrorerror))

postpost--hoc estimates hoc estimates –– how powerful was my study?how powerful was my study?

information to be gatheredinformation to be gathered::1. effect size 1. effect size 2. error (2. error (MSMSerrorerror))3. 3. NN in your studyin your study

we typically use previous research to estimate the

likely effect size and MSerror

we get these estimatesfrom our dataset

3838

next, get the softwarenext, get the softwareG*POWER: free program for power calculationsfree program for power calculationshttp://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/

• a priori power analyses – provides sample sizesfor given effect sizes, alpha levels, and power values

• post hoc power analyses – provides power valuesfor given sample sizes, effect sizes, and alpha levels

suitable for most major statistical methods

useful reference:Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39 (2), 175-191.

Page 20: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

20

3939

finally, use the softwarefinally, use the softwareselect a “test family” from the drop-down menu of options

- for example “F tests”select a “Statistical test” within this family

- for example “ANOVA: fixed effects, special, main effects, and interactions”

select the type of Power Analysis you want to conduct - “a priori” or “post-hoc”fill in the necessary information and select “calculate”

you may need to conduct a separate analysis for each systematic effect in your design

So how many in 2-way ANOVA? And 3-way?

4040

Page 21: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

21

4141

strategic issuesstrategic issuesin thinking about in thinking about statistical powerstatistical power

4242

some caveatssome caveats1.1. an effect must exist for you to find itan effect must exist for you to find it

increasing power can help you detect even very increasing power can help you detect even very small effects small effects (which can be important )(which can be important )

but if your theory and predictions but if your theory and predictions (about mean (about mean differences)differences) do notdo not reflect what’s actually going on reflect what’s actually going on in the population:in the population:

HH00 will be right, and Hwill be right, and H11 will be wrongwill be wrong

all the power in the world won’t help you find all the power in the world won’t help you find the effects you’re searching forthe effects you’re searching for

Page 22: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

22

4343

2. large samples can be bewitching

a large sample can detect very small effects that may be relatively unimportant and unstable

exclusive focus on significance test may lead you to overestimate the importance of a small effect

some caveatssome caveats

4444

3. error variance is also important

high error variance means that a large effect may still turn out to be non-significant

don’t just focus on sample size

some caveatssome caveats

Page 23: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

23

4545

so how do I maximize power?1. focus on studying large effects (!)

but limited opportunities in psychology

2. increase sample sizebut may face practical constraints

3. increase α level but this increases chance of type-1 error

4. decrease error varianceHrm how?

4646

how to reduce error varianceaim is to reduce variation in DV scoresfrom sources other than your IVs

improve operationalisation of variablesincreases validity

improve measurement of variablesincreases (internal) reliability

improve design of your studyaccount for variance from other sources (e.g. blocking designs - more in a minute)

improve methods of analysiscontrol for variance from other sources (e.g. ANCOVA - more in next lecture)

Page 24: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

24

4747

If you torture data sufficiently, it will confess If you torture data sufficiently, it will confess to almost anything. to almost anything.

----Fred Fred MengerMenger, chemistry professor (1937, chemistry professor (1937-- ))

4848

blocking designsblocking designs

Page 25: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

25

4949

what “blocking” iswhat “blocking” isin any research study, you want to explain variance in any research study, you want to explain variance in a DV with a novel IV / treatmentin a DV with a novel IV / treatmentoften, variance in the DV can also be explained by often, variance in the DV can also be explained by additional factors which are less novel or interestingadditional factors which are less novel or interesting–– known as known as controlcontrol or or concomitantconcomitant variablesvariables

blocking introduces blocking introduces control variables into your designvariables into your design–– reflect additional sources of variation or prereflect additional sources of variation or pre--existing existing

differences on the DV scoredifferences on the DV score

DV IV of interest control variableproblem solving teaching style IQprotest behaviour group norms political attitudesdepression stress family history

5050

11stst application of blockingapplication of blockingadding a second factor adding a second factor increases powerincreases power by by reducing error variancereducing error variance

how this works:how this works:–– error varianceerror variance, or , or residual varianceresidual variance, represents , represents

all variance left over after accounting for all variance left over after accounting for systematic variance explained by IV of interestsystematic variance explained by IV of interest

–– error variance could reflect error variance could reflect chance chance oror systematicsystematicunmeasuredunmeasured influencesinfluences from other factor(s)from other factor(s)

–– adding a 2adding a 2ndnd factor (that is correlated with the DV) factor (that is correlated with the DV) may account for some of this leftmay account for some of this left--over varianceover variance

–– reducing error variance in this way will increase reducing error variance in this way will increase power (think back to the first half of this lecture)power (think back to the first half of this lecture)

Page 26: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

26

5151

example example -- 11stst application of blockingapplication of blocking

11--way ANOVA:way ANOVA:

–– DV = problemDV = problem--solving timesolving time–– IV = caffeine intake (caffeine or control) IV = caffeine intake (caffeine or control)

we could introduce another IV that was related to we could introduce another IV that was related to the DV, giving us a the DV, giving us a 22--way ANOVAway ANOVA::

–– DV = problemDV = problem--solving timesolving time–– factor 1 factor 1 (IV) = caffeine intake(IV) = caffeine intake–– factor 2 factor 2 (blocking variable) is related to the DV(blocking variable) is related to the DV

•• e.g., IQ levele.g., IQ level

5252

IVError

caffeineerrorerror

variance in the 1variance in the 1--way ANOVA:way ANOVA:effect of caffeine onlyeffect of caffeine only

large error / residual variance means less power for test

Page 27: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

27

5353

IVErrorBlockIV*Block

caffeine

errorerrorIQ

IQ x C

variance in the 2variance in the 2--way ANOVA:way ANOVA:effects of caffeine and IQeffects of caffeine and IQ

Maybe you can reduce the error / residual variance by partitioning out variability due to 2nd factor

more power to find effect of original IVas long as the original IV and 2nd factor don’t

explain the same variance in the DV (can’t just manipulate same thing 2x)

Tests of Between-Subjects Effects

Dependent Variable: ATTRACT

199.85 1 199.85 5.323 .0261902.290 2 951.145 25.332 .000120.870 2 60.435 1.610 .212

1464.340 39 37.5478966.667 44

SourceIVBlockIV*BlockErrorTotal

Type III Sumof Squares df Mean Square F Sig.

ProbSolv

Tests of Between-Subjects Effects

Dependent Variable: ATTRACT

199.850 1 199.850 2.464 .1243487.500 43 81.1058966.667 44

SourceIVErrorTotal

Type III Sumof Squares df Mean Square F Sig.

ProbSolv

5454

evidence that blocking really can help increase power

Page 28: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

28

5555

5656

how to set up a blocking designhomogenous blocks created with levels of blocking homogenous blocks created with levels of blocking factor: Ps are factor: Ps are matchedmatched within levels of blocking factorwithin levels of blocking factor–– e.g. first divide people into IQ groups (hi, med, low)e.g. first divide people into IQ groups (hi, med, low)

participants within each block then randomly assigned participants within each block then randomly assigned to levels of the IV (to levels of the IV (stratified random assignmentstratified random assignment))–– e.g. within each of three IQ groups, assign Ps to e.g. within each of three IQ groups, assign Ps to

either caffeine or control conditioneither caffeine or control condition

low IQ mid IQ high IQlow IQ mid IQ high IQ

controlcontrol

caffeinecaffeine

Page 29: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

29

5757

what’s different about blockingwhat’s different about blockingexperimental designsexperimental designs are fully are fully randomisedrandomised: : participants are randomly assigned to 1 level participants are randomly assigned to 1 level of each (and every) factorof each (and every) factor

blockingblocking is not randomised: is not randomised: participants are categorized into levels of the participants are categorized into levels of the blocking factorblocking factor

design with blocking factor is also known as:design with blocking factor is also known as:•• randomized block designrandomized block design•• stratificationstratification•• matched samples designmatched samples design

5858

1) what if the blocking variable doesn’t reduce the error term?– then you have chosen the wrong blocking variable– blocking variables are selected because of their

known association with the dependent variable– blocking variables should also not explain the same

variance as the focal IV

2) how does the blocking variable fit in with predictions and reporting of findings?– unlike the effect of the focal IV, the effect of the

blocking factor is not usually of interest– the blocking variable is only factored in to reduce error

and increase the power of the test for the focal IV

some questions about blockingsome questions about blocking

Page 30: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

30

5959

6060

22ndnd application of blockingapplication of blockingdetecting potential confounds: factors that offer alternative explanations for systematic resultsexample:example: each condition in your study is run by a different experimenter (2nd factor)– the experimental effects may be due to the

experimenter, rather than treatment conditions [main effect of E, rather than main effect of T]

– the effect of the treatment may depend on who the experimenter was [E x T interaction]

Page 31: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

31

6161

IVs, controls, confounds...The label depends entirely on theory / argument

– an independent variable (IV) independent variable (IV) has a significant has a significant effect, which is of great interest to you effect, which is of great interest to you (systematic variance that you predicted)(systematic variance that you predicted)

– a controlcontrol variablevariable has a significant effect, has a significant effect, but that can be okay but that can be okay (reduces error variance)(reduces error variance)

– a confoundconfound variablevariable has a significant effect, has a significant effect, which is not wanted which is not wanted (additional systematic variance)(additional systematic variance)

6262

controls and confounds in blockingmain effect of blocking = sign of good control var.main effect of blocking = sign of good control var.•• shows systematic variability due to blocking factor, shows systematic variability due to blocking factor,

which has been removed from “error” variance which has been removed from “error” variance •• increases power of test for focal IVincreases power of test for focal IV

(as long as it doesn’t explain the same variance as IV!)(as long as it doesn’t explain the same variance as IV!)

blocking factor x IV interaction = sign of confoundblocking factor x IV interaction = sign of confound•• increases power to detect focal IV main effect increases power to detect focal IV main effect

(because systematic variability due to interaction is (because systematic variability due to interaction is removed from “error”)removed from “error”)

•• but that positive outcome is but that positive outcome is outweighedoutweighed by negative by negative outcome: interaction means that effect of focal IV outcome: interaction means that effect of focal IV changes depending on blocking factor changes depending on blocking factor

•• significant Block x IV interaction shows failure of significant Block x IV interaction shows failure of treatment IV effect to generalise across levels oftreatment IV effect to generalise across levels ofblocking variableblocking variable

Page 32: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

32

6363

oneone--way ANOVAway ANOVAsourcesource SSSS dfdf MSMS FFTaskTask 213.78213.78 22 106.89106.89 0.710.71ErrorError 2263.332263.33 1515 150.89150.89TotalTotal 2477.112477.11 17 17

block on age (e.g., 60 block on age (e.g., 60 69, 70 69, 70 79, 80 +)79, 80 +)

sourcesource SSSS dfdf MSMS FFTaskTask 213.78213.78 22 106.89106.89 4.18*4.18*AgeAge 1933.781933.78 22 966.89966.89 37.83*37.83*T * AT * A 99.5599.55 44 24.8924.89 0.970.97ErrorError 230.00230.00 99 25.5625.56TotalTotal 2477.112477.11 1717

CONTROL

observe same treatment effects for 3 age groups - GOOD(i.e., non-significant interaction shows generalisability)

reduction in dferror (9 vs. 15), BUT this loss of power is compensated for by smaller error term (25.56 vs. 150.89)

effe

cts

of T

ask

on

effe

cts

of T

ask

on

mot

or s

kills

of e

lder

lym

otor

ski

lls o

f eld

erly

6464

oneone--way ANOVAway ANOVA

sourcesource SSSS dfdf MSMS FFExerciseExercise 1065.501065.50 22 532.75532.75 18.42*18.42*ErrorError 1301.751301.75 4545 28.9328.93TotalTotal 2367.252367.25 4747

block on gender (G):block on gender (G):

sourcesource SSSS dfdf MSMS FFExerciseExercise 1065.501065.50 22 532.75532.75 36.02*36.02*GenderGender 330.75330.75 11 330.75330.75 22.36*22.36*E * GE * G 350.00350.00 22 175.00175.00 11.83*11.83*ErrorError 621.00621.00 4242 14.7914.79TotalTotal 2367.252367.25 4747

CONFOUND (moderator) IV

effe

cts

of e

xerc

ise

leng

thef

fect

s of

exe

rcis

e le

ngth

on s

tude

nts’

flex

ibili

tyon

stu

dent

s’ fl

exib

ility

treatment effects differ according to gender (sig. interaction)treatment effects differ according to gender (sig. interaction)

potential confound potential confound –– NOT GOODNOT GOODor new IV (moderator) or new IV (moderator) –– INTERESTING!INTERESTING!

Page 33: psyc3010 lecture 5 - University of Queenslanduqwloui1/stats/3010 for post... · 2018. 8. 2. · Mean 336.67 363.33 466.67 Reinforcement 4 omnibus tests in a 3omnibus tests in a 3--way

33

6565

advantages:advantages:1)1) may equate treatment groups better than does a may equate treatment groups better than does a

completely randomized design completely randomized design assuming equal assuming equal nn for levels of blocking factorfor levels of blocking factor

2)2) greater power, because error term is reducedgreater power, because error term is reduced3)3) can check interactions of treatments and blocks can check interactions of treatments and blocks

do effects of treatments generalise?do effects of treatments generalise?

disadvantages:disadvantages:1)1) practical costs of introducing blocking factorpractical costs of introducing blocking factor2)2) loss of power if blocking variable is poorly correlated with loss of power if blocking variable is poorly correlated with

DV (DV (rr < .20), because fewer < .20), because fewer dfdferrorerror3)3) blocking factor is treated as having discrete levels blocking factor is treated as having discrete levels ––

artificial grouping may be necessary artificial grouping may be necessary could lose some informationcould lose some information

pros and cons of blocking

6666

readingsreadingspower analysis and blocking designs (this lecture)

Field (2nd or 3rd ed): no new readingsHowell (all eds): Chapter 8

correlation and regression (next lecture)Field (3rd ed): Chapter 6 + Chapter 7 (pp. 197-209)Field (2nd ed): Chapter 4 + Chapter 5 (pp. 143-152)Howell (all eds): Chapter 9