murphy power analysis

Upload: utkarsh-gaurav

Post on 06-Apr-2018


  • 8/3/2019 Murphy Power Analysis

    1/40

    Power Analysis for Traditional and

    Modern Hypothesis Tests

    Kevin R. Murphy

    Pennsylvania State University

  • 8/3/2019 Murphy Power Analysis

    2/40

    Power Analysis

    Helps you plan better studies

    Helps you make better sense of existing studies

    Is not limited to traditional null hypothesis tests

    Application of power analysis to minimum-effect tests will be discussed

  • 8/3/2019 Murphy Power Analysis

    3/40

    Errors in Null Hypothesis Tests

                              True State of Affairs
    Your Decision             No Effect (H0)                 Some Effect
    Reject Null               Type I Error (α):              Power = 1 - β
                              reject null when it is true
    Fail to Reject Null                                      Type II Error (β):
                                                             fail to reject null when you should

  • 8/3/2019 Murphy Power Analysis

    4/40

    Power Depends On

    Effect Size

    How large is the effect in the population?

    Sample Size (N)

    You are using a sample to make inferences about

    the population. How large is the sample?

    Decision Criteria - How do you define "significant" and why?

  • 8/3/2019 Murphy Power Analysis

    5/40

    Power Analysis and the F Distribution

    The power of most statistical tests in the social sciences (e.g., ANOVA, regression, t-tests, other linear model statistics) can be evaluated via the familiar F distribution

    F is a ratio of observed effect to error: F = MS treatments / MS error

    F = (True Effect + Error) / Error

    The larger the true treatment effect, the larger the F you expect to find

    If the null hypothesis is correct, E(F) is approximately 1.0

  • 8/3/2019 Murphy Power Analysis

    6/40

    How Does Power Analysis Work?

    In the familiar F distribution below, 95% of the values are below 2.00 (distribution for df = 7, 200)

    [Figure: central F distribution; x-axis: F Value, 0 to 4; F = 2.0 represents the cutoff for rejecting H0]

  • 8/3/2019 Murphy Power Analysis

    7/40

    The Noncentral F Distribution

    If the null hypothesis is false, the Noncentral F distribution is needed. In the Noncentral F distribution below, 75% of the values are below 2.00. Therefore, power = .25

    [Figure: Central F and Noncentral F distributions; x-axis: F Value, 0 to 4]
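
    The areas described on this slide can be approximated by simulation. The sketch below is an illustration, not the author's procedure: it draws central and noncentral F values using only Python's standard library, takes the 95th percentile of the central draws as the cutoff, and reports the share of noncentral draws above that cutoff. The noncentrality values (5 and 10), the seeds, and the number of draws are assumptions chosen only for illustration.

```python
import random

def draw_f(dfn, dfd, lam, rng):
    """Draw one value from a noncentral F(dfn, dfd) with noncentrality lam."""
    # A noncentral chi-square(dfn, lam) is (Z + sqrt(lam))^2 plus a central
    # chi-square with dfn - 1 degrees of freedom.
    numerator = rng.gauss(lam ** 0.5, 1.0) ** 2
    if dfn > 1:
        numerator += rng.gammavariate((dfn - 1) / 2.0, 2.0)
    # A central chi-square(k) is a gamma variate with shape k/2 and scale 2.
    denominator = rng.gammavariate(dfd / 2.0, 2.0)
    return (numerator / dfn) / (denominator / dfd)

def simulate(dfn, dfd, lam, n_draws=50_000, seed=1):
    """Return n_draws simulated F values, sorted in ascending order."""
    rng = random.Random(seed)
    return sorted(draw_f(dfn, dfd, lam, rng) for _ in range(n_draws))

def share_above(draws, cutoff):
    """Fraction of simulated F values that exceed the cutoff."""
    return sum(1 for f in draws if f > cutoff) / len(draws)

DFN, DFD = 7, 200                      # degrees of freedom used on the slides
central = simulate(DFN, DFD, lam=0.0)
cutoff = central[int(0.95 * len(central))]   # simulated 95th percentile

# Two illustrative (assumed) noncentrality values: a smaller and a larger effect.
power_smaller = share_above(simulate(DFN, DFD, lam=5.0, seed=2), cutoff)
power_larger = share_above(simulate(DFN, DFD, lam=10.0, seed=3), cutoff)

print(f"cutoff for rejecting H0: about {cutoff:.2f}")
print(f"power with lam = 5:  about {power_smaller:.2f}")
print(f"power with lam = 10: about {power_larger:.2f}")
```

    Simulation stands in here for an exact noncentral F routine; a statistics package with that distribution built in would give the same areas without sampling noise.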

  • 8/3/2019 Murphy Power Analysis

    8/40

    A Larger Effect

    In the Noncentral F distribution below, in which the effect is larger, 30% of the values are below 2.00. Therefore, power = .70

    [Figure: Central F and Noncentral F distributions; x-axis: F Value, 0 to 4]

  • 8/3/2019 Murphy Power Analysis

    9/40

    Power Functions

    [Figure: power function; x-axis: Effect Size, 0 to 1; y-axis: Likelihood of rejecting H0, 0 to 1]

  • 8/3/2019 Murphy Power Analysis

    10/40

    Power Functions

    [Figure: power function; x-axis: Sample Size, 25 to 275; y-axis: Likelihood of rejecting H0, 0 to 1]
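
    Power functions like the two charted above can be tabulated directly. The following is a minimal sketch under stated assumptions, not the calculation behind the original charts: it uses a large-sample normal approximation for a two-tailed comparison of two equal-sized groups, and the fixed values (50 per group, d = .30) are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group comparison of means."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = d * math.sqrt(n_per_group / 2)   # expected value of the z statistic
    # Probability of landing in either rejection region.
    return STD_NORMAL.cdf(delta - z_crit) + STD_NORMAL.cdf(-delta - z_crit)

# Power as a function of effect size, n fixed at a hypothetical 50 per group.
by_effect = [(d / 10, power_two_groups(d / 10, 50)) for d in range(0, 11, 2)]

# Power as a function of sample size, d fixed at a hypothetical .30.
by_sample = [(n, power_two_groups(0.30, n)) for n in range(25, 300, 50)]

for d, p in by_effect:
    print(f"d = {d:.1f}  power = {p:.2f}")
for n, p in by_sample:
    print(f"n per group = {n:3d}  power = {p:.2f}")
```

    When d is 0 the function returns alpha itself, which is the left-hand starting point of an effect-size power curve.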

  • 8/3/2019 Murphy Power Analysis

    11/40

    How to Increase Power

    Increase N

    Effects of adding more subjects are not identical to those of adding more observations

    Increase ES

    Choose a different research question

    Use stronger treatments or interventions

    Use better measures

    Use a more lenient alpha

  • 8/3/2019 Murphy Power Analysis

    12/40

    Effects of Implementing

    Power Analysis

    Stronger studies

    Larger samples, better measures

    Fewer studies

    Adequate studies are harder to do than most people

    realize

    Less emphasis, in the long term, on null

    hypothesis testing

  • 8/3/2019 Murphy Power Analysis

    13/40

    Conducting a Power Analysis

    The classic text in this field is still one of the best sources

    Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). Erlbaum

    More current (and more accessible) sources include

    Lipsey, M. (1990). Design sensitivity. Sage

    Murphy, K. & Myors, B. (2004). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Erlbaum.

  • 8/3/2019 Murphy Power Analysis

    14/40

    Conducting a Power Analysis

    Power Analysis software

    Power and Precision - Biostat

    www.PowerAnalysis.com

    One-Stop F Calculator - included in Murphy & Myors (2004)

    PASS - NCSS software www.ncss.com/pass.html

  • 8/3/2019 Murphy Power Analysis

    15/40

    Conducting a Power Analysis

    In planning studies, you should

    Assume relatively small effects

    If it were reasonable to expect a large effect, you probably wouldn't need to do the study or the test

    Aim for power of .80 or better

    Power of .50 means that significance tests have

    become a coin flip

  • 8/3/2019 Murphy Power Analysis

    16/40

    Effect Size Conventions

    In behavioral and social sciences, there are widely followed conventions for describing small, moderate, and large effects

                  d (standardized       Percentage of
                  mean difference)      variance explained
    Small         .20                   1%
    Moderate      .50                   10%
    Large         .80                   25%

  • 8/3/2019 Murphy Power Analysis

    17/40

    Applications of Power Analysis

    Study planning - Given ES and α, solve for N

    If you wanted to compare the effects of four types of training programs and:

    You expected small to moderate effects (programs account for 5% of variation in performance)

    You use an α level of .05

    You need N=214 to achieve Power=.80

  • 8/3/2019 Murphy Power Analysis

    18/40

    Applications of Power Analysis

    Study evaluation - Given N and α, solve for ES

    If you wanted to compare the effects of four safety interventions and:

    You have 44 subjects available

    You use an α level of .05

    You will achieve Power=.80 only if the effects of interventions are truly large (accounting for 25% of the variance in outcomes)
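
    Both questions (solving for N on the previous slide, solving for ES on this one) come down to searching over one input of a power function. The sketch below is one way to do that with the standard library only; it is not the routine behind the slide figures. It computes the noncentral F distribution as a Poisson-weighted sum of incomplete beta functions and assumes the convention lam = N * PV / (1 - PV), which may differ from the convention behind the slides, so expect answers near, but not necessarily equal to, N = 214 and PV = .25.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def ncf_cdf(f, dfn, dfd, lam):
    """P(F <= f) for a noncentral F, as a Poisson mixture of central terms."""
    x = dfn * f / (dfn * f + dfd)
    half, total = lam / 2.0, 0.0
    weight = math.exp(-half)
    for j in range(2000):
        if j > half and weight < 1e-16:
            break
        total += weight * betainc(dfn / 2.0 + j, dfd / 2.0, x)
        weight *= half / (j + 1)
    return total

def critical_f(dfn, dfd, alpha=0.05):
    """Central F value with upper-tail area alpha, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if ncf_cdf(mid, dfn, dfd, 0.0) < 1 - alpha else (lo, mid)
    return hi

def power(n_total, groups, pv, alpha=0.05):
    dfn, dfd = groups - 1, n_total - groups
    lam = n_total * pv / (1 - pv)        # assumed noncentrality convention
    return 1 - ncf_cdf(critical_f(dfn, dfd, alpha), dfn, dfd, lam)

def solve_n(groups, pv, target=0.80, n_max=2000):
    """Smallest total N reaching the target power (assumed to be below n_max)."""
    lo, hi = groups + 1, n_max
    while hi - lo > 1:
        mid = (lo + hi) // 2
        lo, hi = (lo, mid) if power(mid, groups, pv) >= target else (mid, hi)
    return hi

def solve_pv(n_total, groups, target=0.80):
    """Smallest proportion of variance giving the target power at this N."""
    lo, hi = 0.0, 0.9
    for _ in range(40):
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if power(n_total, groups, mid) >= target else (mid, hi)
    return hi

n_required = solve_n(groups=4, pv=0.05)
pv_detectable = solve_pv(n_total=44, groups=4)
print(f"four groups, PV = .05: N of about {n_required}")
print(f"four groups, N = 44: PV of about {pv_detectable:.2f}")
```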

  • 8/3/2019 Murphy Power Analysis

    19/40

    Applications of Power Analysis

    Making a rational choice regarding α - Given N and ES, solve for α

    If you wanted to compare the effects of two leadership development programs and:

    You have 200 subjects available

    You expect a small difference (d=.20, or 1% of the variance explained by programs)

    You will achieve Power=.64 using α = .05

    You will achieve Power=.37 using α = .01
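
    The slide does not say how these two power values were obtained. As a rough check, the sketch below uses a normal approximation and reproduces .64 and .37 under two assumptions that are inferences, not statements from the source: the test is one-tailed, and there are 200 subjects in each of the two groups.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_one_tailed(d, n_per_group, alpha):
    """Approximate power of a one-tailed, two-group comparison of means."""
    delta = d * math.sqrt(n_per_group / 2)   # expected value of the z statistic
    return STD_NORMAL.cdf(delta - STD_NORMAL.inv_cdf(1 - alpha))

# Assumed reading of the slide: d = .20 with 200 subjects per group.
p_05 = power_one_tailed(0.20, 200, alpha=0.05)
p_01 = power_one_tailed(0.20, 200, alpha=0.01)

print(f"alpha = .05: power is about {p_05:.2f}")
print(f"alpha = .01: power is about {p_01:.2f}")
```

    Under other readings (for example, a two-tailed test with 200 subjects in total) the same approximation gives lower power, so the design details matter when comparing α levels.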

  • 8/3/2019 Murphy Power Analysis

    20/40

    Moving Beyond Traditional

    Significance Testing

    Traditional null hypothesis tests are the focus

    of most power analyses

    These tests are deeply flawed, and there is

    relatively little research on the power of

    alternatives

    Minimum effect tests represent one useful

    alternative

  • 8/3/2019 Murphy Power Analysis

    21/40

    Nil Hypothesis Testing

    Testing the hypothesis that treatments, interventions, etc. have no effect (Nil Hypothesis Test - NHT) is the most common and least useful thing social and behavioral scientists do

    Two problems loom largest:

    Confusion over Type I errors

    As N grows, the likelihood of rejecting the null hypothesis eventually reaches 1.0, regardless of the research question

  • 8/3/2019 Murphy Power Analysis

    22/40

    Type I Errors are Very Rare

    Type I error - reject H0 when it is true

    If H0 is never true, it is impossible to make a Type I

    error

    If H0 is very unlikely, a Type I error is even less likely

    H0 - treatment had NO effect at all

    H1 - SOMETHING happened

    Most things we do to minimize Type I errors lead to more Type II errors

  • 8/3/2019 Murphy Power Analysis

    23/40

    This Implies

    Large literature on protecting yourself from Type I

    errors is not really useful

    NHTs yield one of two outcomes

    Confirm the obvious: reject H0, which you already know is likely to be wrong

    Confuse you: accept H0 even though you know it is likely to be wrong

  • 8/3/2019 Murphy Power Analysis

    24/40

    In NHT, All You Need is N

    As N increases, the likelihood of rejecting the

    nil hypothesis approaches 1.0

    Power to reject H0 does not depend all that much on the phenomenon

    if N is big enough, you will reject H0

    if N is small enough, you won't

    Significance tests are an indirect index of how

    many subjects showed up

  • 8/3/2019 Murphy Power Analysis

    25/40

    There Must be a Better Way

    Stop doing significance tests (e.g., Schmidt,

    1992)

    Confidence intervals (e.g., APA Task Force,

    American Psychologist, August, 1999)

    Bayesian methods (e.g., Rouanet, Psychological Bulletin, 1996)

  • 8/3/2019 Murphy Power Analysis

    26/40

    There Must be a Better Way

    Minimum-Effect Tests

    Test the hypothesis that something nontrivial happened

    Murphy, K. & Myors, B. (2003). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests (2nd Ed.). Mahwah, NJ: Erlbaum.

    Murphy, K. & Myors, B. (1999). Testing the hypothesis that treatments have negligible effects: Minimum-effect tests in the general linear model. Journal of Applied Psychology, 84, 234-248.

  • 8/3/2019 Murphy Power Analysis

    27/40

    Minimum-Effect Tests

    H0 - treatments have a negligible effect (e.g., they account for 1% or less of the variance)

    H1 - the effect of treatments is big enough to care about

    This approach addresses the two biggest flaws of traditional tests

    H0 really is plausible. Treatments rarely have zero effect, but they often have negligible effects

    Increasing N does not automatically increase the likelihood of rejecting H0

  • 8/3/2019 Murphy Power Analysis

    28/40

    Minimum-Effect Tests

    With Minimum Effect Tests (METs)

    Type I errors are once again possible, but can be minimized

    the question asked in MET is no longer trivial

    you can actually learn something by doing the test

    Power Analysis works exactly the same way in MET as in NHT

  • 8/3/2019 Murphy Power Analysis

    29/40

    Performing Minimum-Effect Tests

    Put your test statistics in a simple, common form, e.g., F

    Decide what you mean by a negligible effect

    Find or create an F table based on that definition of a negligible effect - Noncentral F distribution

    Proceed as you would for any traditional NHT
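
    The steps above can be sketched in code. This is an illustration under stated assumptions, not the procedure from Murphy & Myors: the study (4 groups, 204 subjects), the observed F of 3.5, and the convention lam0 = N * PV / (1 - PV) are all hypothetical, and the "F table" is approximated by simulating the noncentral F distribution with the standard library.

```python
import random

def draw_f(dfn, dfd, lam, rng):
    """Draw one value from a noncentral F(dfn, dfd) with noncentrality lam."""
    numerator = rng.gauss(lam ** 0.5, 1.0) ** 2
    if dfn > 1:
        numerator += rng.gammavariate((dfn - 1) / 2.0, 2.0)
    denominator = rng.gammavariate(dfd / 2.0, 2.0)
    return (numerator / dfn) / (denominator / dfd)

def percentile_95(dfn, dfd, lam, n_draws=50_000, seed=1):
    """Simulated 95th percentile of a noncentral F distribution."""
    rng = random.Random(seed)
    draws = sorted(draw_f(dfn, dfd, lam, rng) for _ in range(n_draws))
    return draws[int(0.95 * n_draws)]

# Step 1: the test statistic is already in F form (hypothetical values).
GROUPS, N_TOTAL, OBSERVED_F = 4, 204, 3.5
dfn, dfd = GROUPS - 1, N_TOTAL - GROUPS

# Step 2: define "negligible" as 1% of the variance or less.
NEGLIGIBLE_PV = 0.01
lam0 = N_TOTAL * NEGLIGIBLE_PV / (1 - NEGLIGIBLE_PV)   # assumed convention

# Step 3: critical values under the nil hypothesis and under the
# negligible-effect hypothesis.
nil_cutoff = percentile_95(dfn, dfd, lam=0.0)
met_cutoff = percentile_95(dfn, dfd, lam=lam0, seed=2)

# Step 4: proceed as in a traditional test, using the new cutoff.
print(f"nil cutoff: about {nil_cutoff:.2f}, MET cutoff: about {met_cutoff:.2f}")
print("reject nil hypothesis:", OBSERVED_F > nil_cutoff)
print("reject negligible-effect hypothesis:", OBSERVED_F > met_cutoff)
```

    The MET cutoff is always the larger of the two, which is why the same observed F can reject the nil hypothesis without rejecting the negligible-effect hypothesis.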

  • 8/3/2019 Murphy Power Analysis

    30/40

    Working with the Noncentral F

    Calculating or deriving noncentral F

    distributions was once a daunting task

    Many simple calculators now available

    http://calculators.stat.ucla.edu/cdf/ncf/ncfcalc.php

    Noncentrality parameter (λ)

    λ is a measure of effect size

    λ = [dfh * (MSh - MSe)] / MSe
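
    A short worked example of this formula follows. The mean squares are hypothetical numbers chosen only to show the arithmetic, and the function name is an assumption for illustration.

```python
def noncentrality(df_h, ms_h, ms_e):
    """Noncentrality estimate: df_h * (MS_h - MS_e) / MS_e."""
    return df_h * (ms_h - ms_e) / ms_e

# Hypothetical values: 3 hypothesis df, MS_h = 12, MS_e = 4.
lam = noncentrality(df_h=3, ms_h=12.0, ms_e=4.0)
f_ratio = 12.0 / 4.0

print(lam)                  # 3 * (12 - 4) / 4 = 6.0
print(3 * (f_ratio - 1))    # the same quantity written as df_h * (F - 1)
```

    Because MS_h / MS_e is the F ratio, the estimate can also be read as df_h * (F - 1), so it is zero when F equals 1 and negative when F falls below 1.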

  • 8/3/2019 Murphy Power Analysis

    31/40

    What Constitutes a Negligible

    Effect?

    Standards for negligible effects depend on the research area and on the consequences of decisions

    Aspirin use accounts for very little variance in heart attacks, but the use of aspirin saves thousands of lives at minimal cost

    In personnel selection, it is relatively easy to account for a large proportion of the variance in performance with simple cognitive tests, so the increase in effectiveness that is defined as negligible might be larger

  • 8/3/2019 Murphy Power Analysis

    32/40

    Defining a Negligible Effect

    Effect Size conventions are useful, but by themselves may not be sufficient

    Consequences of errors must also be considered

                  d (standardized       Percentage of
                  mean difference)      variance explained
    Small         .20                   1%
    Moderate      .50                   10%

  • 8/3/2019 Murphy Power Analysis

    33/40

    Power Analysis for MET:

    Small Effect - d=.20, PV=.01

    [Figure: power function; x-axis: Effect Size, 0 to 1; y-axis: Likelihood of rejecting H0, 0 to 1]

  • 8/3/2019 Murphy Power Analysis

    34/40

    Power Analysis for MET:

    Small Effect - d=.20, PV=.01

    [Figure: power function; x-axis: Sample Size, 25 to 275; y-axis: Likelihood of rejecting H0 given population d=.30, 0 to 0.8]

  • 8/3/2019 Murphy Power Analysis

    35/40

    Power Analysis for MET:

    Small Effect - d=.20, PV=.01

    [Figure: power function; x-axis: Effect Size, 0 to 1; y-axis: Likelihood of rejecting H0, 0 to 1]

  • 8/3/2019 Murphy Power Analysis

    36/40

    Power Analysis for MET:

    Small Effect - d=.20, PV=.01

    [Figure: x-axis: Effect Size, 0 to 0.25; y-axis: Likelihood of rejecting H0, 0 to 0.07]

  • 8/3/2019 Murphy Power Analysis

    37/40

    Errors in MET

    The potential downsides of MET are:

    Type I errors could actually occur

    Lower power than corresponding NHT

    You can reduce Type I errors by using larger

    samples

    The loss of power is more than balanced by the fact that the hypothesis being tested is not a trivial one

  • 8/3/2019 Murphy Power Analysis

    38/40

    Type I Error Rates of Minimum-

    Effect Tests

    [Figure: Type I error rate by effect size; x-axis: Effect Size, 0 to 0.25; y-axis: 0 to 0.08; series: Smaller Sample, Larger Sample]

  • 8/3/2019 Murphy Power Analysis

    39/40

    Type I vs Type II Errors

    The tradeoff between Type I and Type II errors is more complicated in METs than in Nil tests

    In MET, alpha is precise only if the true effect size is exactly the same as your definition of negligible

    Type II errors are more of a problem with METs

    METs are less powerful than NHTs (it is easier to reject the hypothesis that nothing happened than the hypothesis that nothing important happened), but this is not necessarily a bad thing

    METs place an even greater premium on large samples, but small samples cause problems even where there is substantial power

  • 8/3/2019 Murphy Power Analysis

    40/40

    Examples - comparing two

    treatments

    N needed, by true effect

                               PV=.05      PV=.10
    Nil                        149         79
    MET (1%=negligible)        375         117
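
    A comparison of this kind can be sketched with a large-sample approximation. The code below is not the source of the table's figures: it assumes a target power of .80, α = .05, the convention lam = N * PV / (1 - PV), and it treats the two-group F statistic as (Z + sqrt(lam))^2, ignoring the denominator degrees of freedom. Its results should land in the same neighborhood as the table without matching it exactly.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def tail_prob(s, root_lam):
    """P(|Z + root_lam| > s): chance that (Z + sqrt(lam))^2 exceeds s^2."""
    return (1 - STD_NORMAL.cdf(s - root_lam)) + STD_NORMAL.cdf(-s - root_lam)

def critical_root(root_lam0, alpha=0.05):
    """Square root of the critical F under the (nil or minimum-effect) null."""
    lo, hi = 0.0, root_lam0 + 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if tail_prob(mid, root_lam0) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

def n_needed(pv_true, pv_negligible=0.0, alpha=0.05, target=0.80):
    """Smallest total N whose approximate power reaches the target."""
    for n in range(4, 5000):
        root0 = math.sqrt(n * pv_negligible / (1 - pv_negligible))
        root1 = math.sqrt(n * pv_true / (1 - pv_true))
        if tail_prob(critical_root(root0, alpha), root1) >= target:
            return n
    return None

nil_05, nil_10 = n_needed(0.05), n_needed(0.10)
met_05, met_10 = n_needed(0.05, 0.01), n_needed(0.10, 0.01)

print(f"Nil: PV=.05 needs about {nil_05}, PV=.10 needs about {nil_10}")
print(f"MET: PV=.05 needs about {met_05}, PV=.10 needs about {met_10}")
```

    Setting pv_negligible to 0 recovers the traditional nil test, so one function covers both rows of the table.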