hypothesis test - pennsylvania state university

Hypothesis Test


Post on 29-Jan-2022


TRANSCRIPT

Page 1: Hypothesis Test - Pennsylvania State University

Hypothesis Test

Page 2: Hypothesis Test - Pennsylvania State University

An Old Example Again

Page 3: Hypothesis Test - Pennsylvania State University

How about This?
• Students from University A rush to Fort Lauderdale.
• We know the average SAT score of students from this university is 600, with a standard deviation of 15.
• We sampled 100 college students in Fort Lauderdale and found that their average SAT score is 620. Are they from University A?
• Does a sample belong to a known population or to a totally different population?
• Two populations:
  • Totally different groups of people
  • The same group of people before and after a particular treatment
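As a rough worked check of the Fort Lauderdale question, assuming the z-test for a sample mean that the later slides introduce (the symbols \sigma_M and z are borrowed from those slides):

```latex
\[
\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{100}} = 1.5,
\qquad
z = \frac{M - \mu}{\sigma_M} = \frac{620 - 600}{1.5} \approx 13.33
\]
```

A sample mean more than 13 standard errors above 600 would be extremely unlikely if the sampled students all came from University A.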

Page 4: Hypothesis Test - Pennsylvania State University

Another Example: Does A Treatment Have An Effect?

=?

Page 5: Hypothesis Test - Pennsylvania State University

Hypothesis Test
• What is hypothesis testing?
  • Use sample statistics to evaluate a hypothesis about a population
• Steps
  1. State the hypotheses about the value of the population mean
  2. Set the criteria for a decision
  3. Collect data and compute sample statistics
  4. Make a decision based on the criteria and statistics

Page 6: Hypothesis Test - Pennsylvania State University

The logic of Hypothesis Testing

[Figure: a population with mean µ and standard deviation σ, compared with a sample mean M]

Page 7: Hypothesis Test - Pennsylvania State University

Example
• A study on the effect of mild electrical brain stimulation on mathematics skills
  • IV: electric current
  • DV: scores on a standard math test
• Construct validity
  • Stimulation: electric current
  • Math skills: test score

Page 8: Hypothesis Test - Pennsylvania State University

Step 1: Hypotheses
• Two hypotheses are needed here
• The null hypothesis
  • The IV has no effect on the DV
  • The sample mean is “equal” to the population mean
  • H0: µ with stimulation = 80
• The alternative hypothesis
  • The IV has an effect on the DV.
  • The sample mean is not “equal” to the population mean
  • H1: µ with stimulation ≠ 80
• H0 and H1 are mutually exclusive and exhaustive
  • Only one can be true
  • One must be true

Page 9: Hypothesis Test - Pennsylvania State University

Step 2: Set the Criteria
• When we have a sample mean different from the population mean, we often ask a question:
  • How likely is it that the difference is due to random error rather than systematic error?
• For hypothesis testing, this question is:
  • How likely are we to get this particular sample mean if H0 is true?
• We need criteria for decision making.

Page 10: Hypothesis Test - Pennsylvania State University
Page 11: Hypothesis Test - Pennsylvania State University

The Alpha Level
• A value that separates the high-probability samples from the low-probability samples
  • Also called the level of significance
  • Defines the very unlikely sample outcomes if the null hypothesis is true
  • Must be small: 5%, 1%, 0.1%
• Its implication
  • If the null hypothesis is true, the probability of mistakenly rejecting it is no greater than the alpha level

Page 12: Hypothesis Test - Pennsylvania State University

Critical Region
• The region composed of extreme sample values
  • A value falling into the critical region is very unlikely to occur if H0 is true
• If a sample mean falls into the region
  • It is unlikely that the sample is from the population.
  • The sample may come from a different population with a different population mean.
  • The null hypothesis should be rejected.

Page 13: Hypothesis Test - Pennsylvania State University
Page 14: Hypothesis Test - Pennsylvania State University

In Practice …
• We use z-scores to specify the boundaries that define the critical region
  • We need the unit normal table to locate the z-scores
• One tail or two tails
  • Depends on your hypotheses (whether H1 is directional)
• For this example, two tails:
  • z-score: ±1.96
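The ±1.96 boundary can also be recovered without a printed unit normal table. The following is a minimal sketch using Python's standard library; the helper name `two_tailed_critical_z` is made up for illustration, and the three alpha values are the ones listed on the alpha-level slide.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha: float) -> float:
    """Return the positive boundary z such that P(|Z| > z) = alpha."""
    # Split alpha evenly between the two tails of the standard normal.
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01, 0.001):
    z = two_tailed_critical_z(alpha)
    print(f"alpha = {alpha}: critical region is |z| > {z:.2f}")
```

A smaller alpha pushes the boundaries further out, so the sample mean has to be more extreme before the null hypothesis is rejected.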

Page 15: Hypothesis Test - Pennsylvania State University

Step 3: Collect Data and Compute Statistics
• Data can be collected in different ways
  • Experiments
  • Surveys
• Statistics computation
  • Find the mean
  • Find the corresponding z-score
    • Our interest: sample means
    • The standard error should be used

Page 16: Hypothesis Test - Pennsylvania State University

Step 4: Make a Decision
• Two possible outcomes
  • Reject the null hypothesis
  • Fail to reject the null hypothesis
• Rejecting the null hypothesis
  • The z-score for the sample mean is beyond the z-scores defining the critical region
    • Big discrepancy between the sample and the null hypothesis
    • Unlikely to happen if the null hypothesis is true
  • The alternative hypothesis is supported
    • The treatment has an impact
• M = 92, σ = 20, sample size = 25
  • σM = ?
  • z = ?

Page 17: Hypothesis Test - Pennsylvania State University

Step 4: Make a Decision (Cont.)
• Failing to reject the null hypothesis
  • The sample does not fall in the critical region
  • You cannot reject the null hypothesis
    • This does not mean the null hypothesis is true
    • It may be false, but the study fails to show it
• M = 84, σ = 20, sample size = 25
  • z = ?
• Two outcomes:
  • You have enough evidence to show the treatment has an impact
  • The evidence you gathered is not convincing. You cannot show that the null hypothesis is wrong. All you can say is that your data fail to show the treatment has an impact. You cannot say the treatment has no effect.
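The two exercises in Step 4 (M = 92 and M = 84) can be checked with a minimal sketch of the z-test. It assumes the running example's null value µ = 80 and the two-tailed boundary ±1.96; the function name `z_test` is made up for illustration.

```python
from math import sqrt

def z_test(sample_mean, mu0, sigma, n, z_crit=1.96):
    """Two-tailed z-test for a sample mean; returns (standard error, z, reject?)."""
    standard_error = sigma / sqrt(n)          # sigma_M = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error  # distance from mu0 in standard errors
    return standard_error, z, abs(z) > z_crit

for m in (92, 84):
    se, z, reject = z_test(m, mu0=80, sigma=20, n=25)
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"M = {m}: sigma_M = {se:.2f}, z = {z:.2f} -> {decision}")
```

Under these assumptions M = 92 gives z = 3.00, which lies in the critical region, while M = 84 gives z = 1.00, which does not.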

Page 18: Hypothesis Test - Pennsylvania State University

Analogy: Hypothesis Test as Jury Trial

• Null hypothesis
  • The defendant is innocent until proven guilty.
  • H0: Defendant = not guilty
  • H1: Defendant = guilty
• Alpha level
  • The jury must be convinced beyond a reasonable doubt before they believe that the defendant is guilty
  • The probability of wrongly convicting the defendant
• Critical region
  • Sufficient evidence
• Sample data
  • Evidence presented by the prosecutors
• Decision
  • Reject the null hypothesis: the defendant is guilty
    • Sufficient evidence beyond a reasonable doubt
    • Possible error: the defendant was wrongfully convicted because the evidence was wrong.
  • Fail to reject the null hypothesis
    • Fail to find the defendant guilty based on the given evidence
    • The defendant could be guilty, but the evidence is not strong enough

Page 19: Hypothesis Test - Pennsylvania State University

Uncertainty and Errors
• Hypothesis testing is an inferential process
  • Its conclusion could be correct or incorrect.
• Four different outcomes

Page 20: Hypothesis Test - Pennsylvania State University

Which Error is More Dangerous?

Page 21: Hypothesis Test - Pennsylvania State University

Two Types of Errors
• Type I errors
  • Reject a null hypothesis that is actually true
  • The treatment is claimed to have an effect although it actually does not.
• Type II errors
  • Fail to reject a null hypothesis that is actually false
  • The treatment indeed has an effect, but the study fails to find it.

Page 22: Hypothesis Test - Pennsylvania State University

The Danger of Type I Errors
• Rejecting the null hypothesis is very tempting.
  • It means a scientific discovery.
• But a false discovery could be fatal.
  • Research as a building unit of theory
    • Research is built upon previous results and findings, with the assumption that those results and findings are true.
    • Standing upon the shoulders of giants
  • Type I errors may jeopardize the whole enterprise of scientific research
    • What if Newton’s laws were wrong?
    • What if the belief in the link between high cholesterol and heart disease is wrong?

Page 23: Hypothesis Test - Pennsylvania State University

How to Prevent or Minimize Type I Errors

Revisit the critical region

The role of the alpha value
• To minimize the chance of Type I errors occurring
• The measure of the probability of a Type I error

Page 24: Hypothesis Test - Pennsylvania State University

Selecting the Alpha Level
• The alpha level
  • Sets up the boundary of the critical region
  • Measures the probability of a Type I error
• A small alpha level
  • Minimizes the chance of a Type I error
  • But also demands more evidence from research, or even makes it practically impossible to reject the null hypothesis

Page 25: Hypothesis Test - Pennsylvania State University
Page 26: Hypothesis Test - Pennsylvania State University

Trade-off
• Three values: 5%, 1%, and 0.1%

Page 27: Hypothesis Test - Pennsylvania State University

Type II Errors
• Less severe
  • Fail to find something significant
  • Personally, miss the chance to make a big breakthrough
  • Generally, science slows down a little bit

Page 28: Hypothesis Test - Pennsylvania State University

Example
• Population without stimulation: µ = 80, σ = 20
• Sample: n = 25, M = 87

Page 29: Hypothesis Test - Pennsylvania State University

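Solve It
• State the hypotheses
• Set the alpha value and the critical region
• Compute the sample statistics
• Make the decision
• Report the result
  • The stimulation has a significant effect on math skills (z = 1.75, p < .05, one-tailed)
    • Note: z = 1.75 exceeds the one-tailed boundary of 1.645 at α = .05, but not the two-tailed boundaries of ±1.96 set earlier for this example
    • Significant: statistically significant
      • You can reject the null hypothesis
    • p-value: the probability of obtaining a result at least this extreme if the null hypothesis is true
      • Usually reported in the form p < alpha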
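The reported z = 1.75 and the claim p < .05 can be checked with a short sketch. It uses only the numbers given for this example (µ = 80, σ = 20, n = 25, M = 87) and the standard normal distribution from Python's standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, sample_mean = 80, 20, 25, 87

standard_error = sigma / sqrt(n)            # 20 / 5 = 4
z = (sample_mean - mu0) / standard_error    # (87 - 80) / 4 = 1.75

upper_tail = 1 - NormalDist().cdf(z)        # P(Z >= z) if H0 is true
p_one_tailed = upper_tail
p_two_tailed = 2 * upper_tail

print(f"z = {z:.2f}")
print(f"one-tailed p = {p_one_tailed:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
```

Under these assumptions the one-tailed p is about .040 and the two-tailed p is about .080, so the p < .05 report holds only for a directional test.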

Page 30: Hypothesis Test - Pennsylvania State University

Weakest Link(s) in This Process What potential problems do you see that can

jeopardize the conclusion from the research?

Page 31: Hypothesis Test - Pennsylvania State University

Some Underlying Assumptions for Hypothesis Tests
• Random sampling
• Independent observations
  • Observed data from subjects should be independent
• The standard deviation is the same for the sample and the population
  • The treatment only affects the sample data by adding a constant to each score
• A normal sampling distribution can be used to analyze the problem
  • Normally distributed scores, or
  • A fairly large sample size

Page 32: Hypothesis Test - Pennsylvania State University

Recent Discussions on p value
• Statisticians Found One Thing They Can Agree On: It’s Time To Stop Misusing P-Values
  http://fivethirtyeight.com/features/statisticians-found-one-thing-they-can-agree-on-its-time-to-stop-misusing-p-values/
• Sifting the evidence—what's wrong with significance tests?
  https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1119478/

Page 33: Hypothesis Test - Pennsylvania State University

Effect Size and Power
• Revisit the formula
  • We use the standard error to measure the distance between a sample mean and the population mean
  • Standard error
    • Depends on the sample size (σM = σ/√n)
• A very large sample size makes the standard error very small
  • This increases the chance that a sample mean falls into the critical region
    • Increasing the sample size tends to bring the sample mean closer to the population mean, but this is not guaranteed
  • The null hypothesis is then likely to be rejected
• How big is a significant effect really?

Page 34: Hypothesis Test - Pennsylvania State University

Example
• µ = 50, σ = 10, M = 51
• When n = 25: σM = ?, z = ?, Conclusion: ?
• When n = 400: σM = ?, z = ?, Conclusion: ?
• So, if the sample size is large enough, even a very small treatment effect can be found statistically significant.
• How important is such a treatment effect?
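The two blanks in this example can be filled in with a few lines of arithmetic. This is a sketch that assumes a two-tailed test at α = .05 (boundary ±1.96), which the slide does not state explicitly.

```python
from math import sqrt

mu0, sigma, sample_mean = 50, 10, 51
z_crit = 1.96  # assumed two-tailed boundary at alpha = .05

results = {}
for n in (25, 400):
    standard_error = sigma / sqrt(n)              # shrinks as n grows
    z = (sample_mean - mu0) / standard_error
    results[n] = (standard_error, z, abs(z) > z_crit)
    print(f"n = {n}: sigma_M = {standard_error:.2f}, z = {z:.2f}, reject H0: {abs(z) > z_crit}")
```

The same one-point mean difference is not significant with n = 25 (z = 0.50) but is significant with n = 400 (z = 2.00), which is the motivation for reporting an effect size alongside the test.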

Page 35: Hypothesis Test - Pennsylvania State University

Effect Size
• Helps you evaluate the impact of a significant treatment effect
  • Not just compare the z-score, but also compare the mean difference in terms of the standard deviation
• Cohen’s d
  • The ratio of the mean difference to the standard deviation
  • Large, medium and small

Page 36: Hypothesis Test - Pennsylvania State University
Page 37: Hypothesis Test - Pennsylvania State University

Cohen’s d
• Measures how far apart two different populations are
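As a hedged worked example, the ratio can be applied to the earlier µ = 50, σ = 10, M = 51 example. Using the population σ as the standardizer is an assumption that fits the z-test setting:

```latex
\[
d = \frac{M - \mu}{\sigma} = \frac{51 - 50}{10} = 0.1
\]
```

The value is the same for n = 25 and n = 400, because d does not involve the sample size.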

Page 38: Hypothesis Test - Pennsylvania State University

Power
• Serves the same purpose, but measures the probability that the test will correctly reject a false null hypothesis
• Compares two normal distributions
  • If two populations are claimed to be different, what proportion of the distribution of sample means of one population falls into the critical region of the other?
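The proportion described here can be computed directly for a z-test. The sketch below assumes a two-tailed test and a hypothetical treated-population mean of 88 (not a value from the slides); `z_test_power` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-tailed z-test when the true population mean is mu1."""
    standard_error = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Boundaries of the critical region, expressed as sample means under H0.
    lower = mu0 - z_crit * standard_error
    upper = mu0 + z_crit * standard_error
    # Distribution of sample means if the treated population mean is mu1.
    treated = NormalDist(mu=mu1, sigma=standard_error)
    # Proportion of that distribution falling in either tail of the critical region.
    return treated.cdf(lower) + (1 - treated.cdf(upper))

# mu1 = 88 is a hypothetical treatment effect chosen for illustration.
for n in (25, 100):
    print(f"n = {n}: power = {z_test_power(mu0=80, mu1=88, sigma=20, n=n):.3f}")
```

With the effect held fixed, the larger sample gives noticeably higher power, since the two distributions of sample means overlap less.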

Page 39: Hypothesis Test - Pennsylvania State University
Page 40: Hypothesis Test - Pennsylvania State University
Page 41: Hypothesis Test - Pennsylvania State University

Power and Effect Size
• They are both indications of the strength or magnitude of a treatment effect
• Power is influenced by many factors, including the sample size
  • Effect size is not.

Page 42: Hypothesis Test - Pennsylvania State University

Directional Hypothesis Test
• Two-tailed hypothesis
  • Critical regions are located on both sides of the distribution
  • The null hypothesis: equal mean
• Often, we have a hypothesis about an increased or decreased mean
  • We will need a directional hypothesis test

Page 43: Hypothesis Test - Pennsylvania State University

Difference Between One-Tailed and Two-Tailed Hypothesis Tests
• Hypotheses
  • Two-tailed
    • H0: µ with stimulation = 80
    • H1: µ with stimulation ≠ 80
  • One-tailed
    • H0: µ with stimulation ≤ 80
    • H1: µ with stimulation > 80
• Critical region
  • Two-tailed: both sides
    • Divide the alpha value by two and then find the z-scores
  • One-tailed: one side
    • Find the z-score based on the whole alpha value
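For concreteness, at α = .05 the standard normal boundaries work out as follows (values as read from a unit normal table; the subscript notation is an assumption, not the slides' own):

```latex
\[
\text{Two-tailed: } z_{\text{crit}} = \pm z_{1-\alpha/2} = \pm z_{0.975} = \pm 1.96,
\qquad
\text{One-tailed (upper): } z_{\text{crit}} = z_{1-\alpha} = z_{0.95} \approx 1.645
\]
```

A one-tailed test therefore reaches significance with a smaller z, provided the effect is in the predicted direction.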

Page 44: Hypothesis Test - Pennsylvania State University
Page 45: Hypothesis Test - Pennsylvania State University

The Problems of Using z
• How can we know the population standard deviation?
  • Very often, it is what researchers are pursuing.
• If we don’t know it, how can we do the hypothesis test?
• We use the t statistic rather than z!

Page 46: Hypothesis Test - Pennsylvania State University

t-Tests

Page 47: Hypothesis Test - Pennsylvania State University

Hypothesis Test with t-Statistics
• Procedures are similar to those using z-statistics, except
  • Finding the critical region from the t-distribution table
  • Using estimated standard errors to compute the scores

Page 48: Hypothesis Test - Pennsylvania State University

t Statistics
• Use estimated standard errors to replace standard errors
  • Estimated standard errors are from sample statistics, rather than population parameters
• Sample variance (unbiased variance): $s^2 = \frac{SS}{n - 1} = \frac{SS}{df}$
• Estimated standard error: $s_M = \sqrt{\frac{s^2}{n}}$

Page 49: Hypothesis Test - Pennsylvania State University

t Statistics
• Use estimated standard errors to replace standard errors
  • Estimated standard errors are from sample statistics, rather than population parameters.
• Degrees of freedom
• Critical region
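A minimal sketch of how the estimated standard error is built from raw scores. The nine scores are made up for illustration (they are not data from the slides) and were chosen so that M = 13 and SS = 72, the summary values used in the worked example in this section.

```python
from math import sqrt

# Hypothetical raw scores, invented for illustration only.
scores = [9, 9, 11, 13, 13, 13, 15, 17, 17]

n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)   # SS: sum of squared deviations from M
df = n - 1                                   # degrees of freedom
sample_variance = ss / df                    # s^2 = SS / df
estimated_se = sqrt(sample_variance / n)     # s_M = sqrt(s^2 / n)

print(f"M = {mean}, SS = {ss}, df = {df}, s^2 = {sample_variance}, s_M = {estimated_se}")
```

Dividing SS by df rather than n is what makes the sample variance an unbiased estimate of the population variance.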

Page 50: Hypothesis Test - Pennsylvania State University
Page 51: Hypothesis Test - Pennsylvania State University

Example
• Infants, even newborns, prefer to look at attractive faces compared to less attractive faces (Slater, et al., 1998).
  • Subjects: infants from 1 to 6 days old
  • IV: face in photo
  • DV: time spent looking at a photo (in seconds)
  • Method:
    • Showing two photographs of women's faces (one significantly more attractive than the other)
    • 20 seconds for both
• M = 13 (attractive face), SS = 72, n = 9

Page 52: Hypothesis Test - Pennsylvania State University

Steps
• Hypotheses
  • H0: µ attractive = 10
  • H1: µ attractive ≠ 10
• Critical region
  • df = n − 1 = 8

Page 53: Hypothesis Test - Pennsylvania State University

Steps
• Calculation
  • M = 13, n = 9, SS = 72
  • t = ?
  • $s^2 = \frac{SS}{n - 1} = \frac{SS}{df}$
  • $s_M = \sqrt{\frac{s^2}{n}}$
  • $t = \frac{M - \mu}{s_M}$
• Conclusion
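The calculation for the infant example can be sketched as follows. The summary values come from the example; the critical value 2.306 (two-tailed, df = 8, α = .05) is read from a standard t table, and α = .05 is an assumption based on how the result is reported later.

```python
from math import sqrt

sample_mean, n, ss, mu0 = 13, 9, 72, 10

df = n - 1                                 # 8
sample_variance = ss / df                  # s^2 = 72 / 8 = 9
estimated_se = sqrt(sample_variance / n)   # s_M = sqrt(9 / 9) = 1
t = (sample_mean - mu0) / estimated_se     # (13 - 10) / 1 = 3

# Two-tailed critical value for df = 8 at alpha = .05, read from a t table.
t_crit = 2.306
reject_h0 = abs(t) > t_crit
print(f"t({df}) = {t:.2f}, reject H0: {reject_h0}")
```

Since 3.00 lies beyond 2.306, the sample mean falls in the critical region and the null hypothesis is rejected under these assumptions.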

Page 54: Hypothesis Test - Pennsylvania State University

Effect size
• Cohen’s d: $d = \frac{\text{mean difference}}{\text{sample standard deviation}} = \frac{M - \mu}{s}$

Magnitude of d    Evaluation of Effect Size
d = 0.2           Small effect (mean difference: 0.2 SD)
d = 0.5           Medium effect
d = 0.8           Large effect
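Applied to the infant example (M = 13, µ = 10, SS = 72, n = 9), a worked version of this formula would look like the following, taking s as the square root of the sample variance:

```latex
\[
s = \sqrt{\frac{SS}{df}} = \sqrt{\frac{72}{8}} = 3,
\qquad
d = \frac{M - \mu}{s} = \frac{13 - 10}{3} = 1.00
\]
```

By the guideline above, a d of 1.00 would count as a large effect.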

Page 55: Hypothesis Test - Pennsylvania State University

Effect size Percentage of Variance Explained

Page 56: Hypothesis Test - Pennsylvania State University

Effect size
• An easy way to calculate: $r^2 = \frac{t^2}{t^2 + df} = \frac{9}{9 + 8} = 0.5294$
• $r^2 = \frac{\text{variability accounted for}}{\text{total variability}} = \frac{81}{153} = 0.5294$
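Both routes to r² can be reproduced from the infant example's summary values. The sketch assumes that "total variability" means the sum of squared deviations from the hypothesized mean µ = 10, which equals SS + n(M − µ)²; this reading matches the 153 and 81 shown.

```python
sample_mean, n, ss, mu0 = 13, 9, 72, 10
df = n - 1

# Variability around the hypothesized mean (total) versus around the sample mean (SS).
total_variability = ss + n * (sample_mean - mu0) ** 2   # 72 + 9 * 9 = 153
accounted_for = total_variability - ss                   # 153 - 72 = 81
r2_from_variability = accounted_for / total_variability

# Shortcut using the t statistic from the test.
t = (sample_mean - mu0) / ((ss / df) / n) ** 0.5         # 3.0
r2_from_t = t ** 2 / (t ** 2 + df)                       # 9 / 17

print(round(r2_from_variability, 4), round(r2_from_t, 4))
```

The two values agree because n(M − µ)² / SS equals t² / df, so the shortcut is an algebraic rearrangement rather than an approximation.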

Page 57: Hypothesis Test - Pennsylvania State University

Confidence Intervals for Estimating μ
• Alternative technique for describing effect size
• Estimates μ from the sample mean (M)
• Based on the reasonable assumption that M should be “near” μ
• Based on the estimated standard error of the mean (sM)

Page 58: Hypothesis Test - Pennsylvania State University

Confidence Intervals for Estimating μ (continued)
• Every sample mean has a corresponding t: $t = \frac{M - \mu}{s_M}$
• Rearrange the equation, solving for μ: $\mu = M \pm t\,s_M$

Page 59: Hypothesis Test - Pennsylvania State University

Distribution with df = 8

$\mu = M \pm t\,s_M = 13 \pm 1.397 \times 1.00 = 13 \pm 1.397$

We are 80% confident that the average time spent looking at the more attractive face is 13 seconds, with a margin of error of 1.397 seconds (between 11.603 and 14.397 seconds).
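The interval can be sketched as a small helper. The t value 1.397 for 80% confidence is the one used on this slide; the 95% value 2.306 for df = 8 is taken from a standard t table and is added here only for comparison. The function name is illustrative.

```python
def confidence_interval(sample_mean, estimated_se, t_value):
    """Return (lower, upper) bounds of M +/- t * s_M."""
    margin = t_value * estimated_se
    return sample_mean - margin, sample_mean + margin

# s_M = 1.0 comes from the infant example (SS = 72, n = 9).
ci_80 = confidence_interval(13, 1.0, 1.397)   # t for 80% confidence, df = 8
ci_95 = confidence_interval(13, 1.0, 2.306)   # t for 95% confidence, df = 8

print(f"80% CI: [{ci_80[0]:.3f}, {ci_80[1]:.3f}]")
print(f"95% CI: [{ci_95[0]:.3f}, {ci_95[1]:.3f}]")
```

Asking for more confidence widens the interval: the price of being more certain is a less precise estimate.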

Page 60: Hypothesis Test - Pennsylvania State University

Report t-Test Results
• The subjects averaged M = 13 seconds on the more attractive face with SD = 3.00. Statistical analysis indicated that the time spent on the attractive face was significantly more than would be expected by chance, t(8) = 3.00, p < .05, r² = 52.94%.

Page 61: Hypothesis Test - Pennsylvania State University

How about this?
• Statistical analysis indicated that the time spent on the attractive face was significantly more than would be expected by chance, t(8) = 3.00, p < .05, r² = 52.94%. The subjects averaged M = 13 seconds on the more attractive face with SD = 3.00.

Page 62: Hypothesis Test - Pennsylvania State University

General Rule
• Report the descriptive statistics first.
  • Mean, standard deviation, …
• Then present the inferential statistics.
  • z, t, F, …

Page 63: Hypothesis Test - Pennsylvania State University

How about One Tailed?

Page 64: Hypothesis Test - Pennsylvania State University

This t-Test Is Better, But
• It still requires knowledge of the population mean.
  • Often, we don’t know the population mean.
• In practice, we are not just inferring population parameters from samples, but also checking the mean difference between two populations based on two samples.

Page 65: Hypothesis Test - Pennsylvania State University

Two Different t-Tests
• Checking whether scores from two groups are different
  • Between-subjects design
• Checking whether scores from different treatments are different
  • Within-subjects design

Page 66: Hypothesis Test - Pennsylvania State University

t-Test for Two Independent Samples

Page 67: Hypothesis Test - Pennsylvania State University

Between-Subjects Design

Page 68: Hypothesis Test - Pennsylvania State University

The t-Test for Independent Measures
• Hypotheses
  • The null hypothesis: no difference between the two population means
    • H0: µ1 = µ2
  • The alternative hypothesis: there is a mean difference
    • H1: µ1 ≠ µ2

Page 69: Hypothesis Test - Pennsylvania State University

Set the Criteria
• The alpha value
• The critical region
  • How to determine the df?
    • We have two samples
    • Two dfs
    • The overall df is the sum of the two dfs: df = df1 + df2

Page 70: Hypothesis Test - Pennsylvania State University

Compute Statistics
• The t value
  • $t = \frac{\text{sample mean} - \text{hypothesized population mean}}{\text{estimated standard error}} = \frac{M - \mu}{s_M}$
• For independent measures
  • $t = \frac{\text{sample mean difference} - \text{hypothesized population mean difference}}{\text{estimated standard error}}$
  • With a hypothesized population mean difference of 0: $t = \frac{\text{sample mean difference}}{\text{estimated standard error}} = \frac{M_1 - M_2}{s_{(M_1 - M_2)}}$

Page 71: Hypothesis Test - Pennsylvania State University

HOW TO COMPUTE S(M1-M2) ?

Page 72: Hypothesis Test - Pennsylvania State University

Estimated Standard Error
• Measure of the standard or average distance between the sample statistic (M1 − M2) and the population parameter
  • $s_{(M_1 - M_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
• Unbiased only if n1 = n2

Page 73: Hypothesis Test - Pennsylvania State University

Pooled Variance
• Instead, we use a pooled variance to replace $s_1^2$ and $s_2^2$
• Pooled variance ($s_p^2$ provides an unbiased basis for calculating the standard error)
  • $s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2}$
  • $df = df_1 + df_2 = (n_1 - 1) + (n_2 - 1)$

Page 74: Hypothesis Test - Pennsylvania State University

Make a Decision
• Rejecting the null hypothesis
  • µ1 − µ2 ≠ 0
  • The mean difference between the samples reflects a mean difference between the populations
• Not rejecting the null hypothesis
  • No evidence to show that the population means are different

Page 75: Hypothesis Test - Pennsylvania State University

Example
• Impact of TV time on student academic performance
  • IV: Sesame Street
  • DV: high school grade

Page 76: Hypothesis Test - Pennsylvania State University

Steps

• df = df1 + df2 = (n1 − 1) + (n2 − 1) = 9 + 9 = 18
• α = 0.01

Page 77: Hypothesis Test - Pennsylvania State University

Calculation
• $t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{s_{(M_1 - M_2)}}$
• $s_{(M_1 - M_2)} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$
• $s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2}$
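The calculation can be sketched end to end from summary statistics. The means and SS values below are hypothetical (the transcript does not preserve the example's data); only n = 10 per group, df = 18 and α = .01 follow from the slides. The critical value 2.878 is the two-tailed table value for df = 18 at α = .01, and `independent_t` is an illustrative name.

```python
from math import sqrt

def independent_t(m1, ss1, n1, m2, ss2, n2):
    """Independent-measures t for H0: mu1 - mu2 = 0, using the pooled variance."""
    df = (n1 - 1) + (n2 - 1)
    pooled_variance = (ss1 + ss2) / df                             # s_p^2
    se_diff = sqrt(pooled_variance / n1 + pooled_variance / n2)    # s_(M1-M2)
    t = (m1 - m2) / se_diff
    return df, pooled_variance, se_diff, t

# Hypothetical summary statistics; n = 10 per group matches df = 18.
df, sp2, se, t = independent_t(m1=93, ss1=200, n1=10, m2=85, ss2=160, n2=10)
t_crit = 2.878  # two-tailed critical t for df = 18 at alpha = .01, from a t table
print(f"df = {df}, sp^2 = {sp2:.2f}, s_(M1-M2) = {se:.2f}, t = {t:.2f}, reject H0: {abs(t) > t_crit}")
```

With these made-up numbers the pooled variance is 20, the standard error is 2, and t = 4.00, which would fall in the critical region.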

Page 78: Hypothesis Test - Pennsylvania State University

Effect Size
• Cohen’s d
  • Use sp (the pooled standard deviation) as the standard deviation
• r²: $r^2 = \frac{t^2}{t^2 + df}$

Page 79: Hypothesis Test - Pennsylvania State University

Assumptions for independent-measures t-test
• Independent observations within each sample
• The two populations are normal.
• The two populations have equal variances.
  • Homogeneity of variance
  • Hartley’s Fmax test: $F_{max} = \frac{s^2(\text{largest})}{s^2(\text{smallest})}$

Page 80: Hypothesis Test - Pennsylvania State University

Hartley’s F-max Test
• To test whether two samples come from populations with equal variances.
  • The desired outcome: fail to reject the null hypothesis
• $F_{max} = \frac{s^2(\text{largest})}{s^2(\text{smallest})}$
• F-max table
  • α value, n, df
  • http://archive.bio.ed.ac.uk/jdeacon/statistics/table8.htm
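The statistic itself is a single ratio. A minimal sketch, with hypothetical sample variances (SS = 200 and SS = 160, n = 10 each, chosen for illustration):

```python
def f_max(sample_variances):
    """Hartley's F-max statistic: largest sample variance over the smallest."""
    return max(sample_variances) / min(sample_variances)

# Hypothetical sample variances: s^2 = SS / (n - 1) for two groups of n = 10.
variances = [200 / 9, 160 / 9]
print(f"F-max = {f_max(variances):.2f}")
```

The result is then compared with the tabled critical value for the chosen α, the number of samples, and df = n − 1. A value near 1.00 suggests the variances are similar, while a value above the critical value suggests the homogeneity assumption is violated.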

Page 81: Hypothesis Test - Pennsylvania State University

t-Test for Two Related Samples

Page 82: Hypothesis Test - Pennsylvania State University

Within-Subjects Design
• Subjects are compared with themselves
  • Different conditions
  • No groups to compare
• Data in different treatments are actually related
• To study the impact of treatments, we can still use a t-test
  • Slightly different from comparing independent samples

Page 83: Hypothesis Test - Pennsylvania State University

The Interest in Related Samples
• The mean difference between two populations
• The difference between scores in different samples
  • In related samples, we can put scores into pairs based on subjects
  • This is impossible in independent samples
• The target: the score difference
  • D = X2 − X1

Page 84: Hypothesis Test - Pennsylvania State University
Page 85: Hypothesis Test - Pennsylvania State University

Example
• “Swearing as a response to pain”
  • Pain tolerance when saying
    • A neutral word vs. a swear word
  • Both conditions for each subject
    • N then S vs. S then N

Page 86: Hypothesis Test - Pennsylvania State University

Procedures
• Hypotheses
  • H0: µD = 0
  • H1: µD ≠ 0
• The alpha value and critical region
  • 5% as usual
• t statistic: $t = \frac{M_D - \mu_D}{s_{M_D}}$
• Estimated standard error: $s_{M_D} = \sqrt{\frac{s^2}{n}}$, where $s^2 = \frac{SS}{df}$

Page 87: Hypothesis Test - Pennsylvania State University

Example

Page 88: Hypothesis Test - Pennsylvania State University

Example

MD= -2

Page 89: Hypothesis Test - Pennsylvania State University

Example

MD = −2
SS = ΣD² − (ΣD)²/n = 68 − (−18)²/9 = 68 − 36 = 32

Page 90: Hypothesis Test - Pennsylvania State University

Example

MD = −2
SS = ΣD² − (ΣD)²/n = 68 − (−18)²/9 = 68 − 36 = 32
s² = SS/df = 32/8 = 4
$s_{M_D} = \sqrt{s^2/n} = \sqrt{4/9} = 2/3$

Page 91: Hypothesis Test - Pennsylvania State University

$t = \frac{M_D - \mu_D}{s_{M_D}} = \frac{-2 - 0}{2/3} = -3.00$
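The steps of this example can be reproduced from the sums alone. In the sketch below, ΣD = −18 and n = 9 are inferred from MD = −2, the 18·2 term and df = 8; the critical value 2.306 (two-tailed, df = 8, α = .05) is read from a t table.

```python
from math import sqrt

n = 9
sum_d = -18          # sum of difference scores, so M_D = -18 / 9 = -2
sum_d_squared = 68   # sum of squared difference scores

mean_d = sum_d / n
ss = sum_d_squared - sum_d ** 2 / n        # 68 - 324 / 9 = 32
df = n - 1
sample_variance = ss / df                  # 32 / 8 = 4
estimated_se = sqrt(sample_variance / n)   # sqrt(4 / 9) = 2/3
t = (mean_d - 0) / estimated_se            # H0: mu_D = 0

t_crit = 2.306  # two-tailed critical t for df = 8 at alpha = .05, from a t table
print(f"t({df}) = {t:.2f}, reject H0: {abs(t) > t_crit}")
```

The sign of t only reflects the direction of subtraction (D = X2 − X1); the decision depends on its absolute value.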

Page 92: Hypothesis Test - Pennsylvania State University

Effect Size
• Cohen’s d: $d = \frac{M_D}{s}$
• r²: $r^2 = \frac{t^2}{t^2 + df}$
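Plugging in the values from the repeated-measures example (MD = −2, s² = 4, t = −3.00, df = 8) gives, as a worked sketch:

```latex
\[
d = \frac{M_D}{s} = \frac{-2}{\sqrt{4}} = -1.00,
\qquad
r^2 = \frac{t^2}{t^2 + df} = \frac{9}{9 + 8} \approx 0.5294
\]
```

The magnitude of d (1.00) is what matters for judging the size of the effect; the sign again reflects the order of subtraction.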

Page 93: Hypothesis Test - Pennsylvania State University

Uses and Assumptions of Repeated Measures t-test
• Fewer subjects
• Reduce individual differences
• Order effects
• Independent observations within each treatment.
• Normal distribution.

Page 94: Hypothesis Test - Pennsylvania State University

t-test for Unequal Variance Samples

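• $s_M = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
• $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$
• $t = \frac{M - \mu}{s_M}$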
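A minimal sketch of the unequal-variance calculation, assuming that M in the t formula stands for the sample mean difference M1 − M2 and that the null hypothesis is µ1 = µ2. The summary statistics are hypothetical and `welch_t` is an illustrative name.

```python
from math import sqrt

def welch_t(m1, var1, n1, m2, var2, n2):
    """t statistic and approximate df for two samples with unequal variances (H0: mu1 = mu2)."""
    a, b = var1 / n1, var2 / n2            # s1^2/n1 and s2^2/n2
    se = sqrt(a + b)                        # estimated standard error of M1 - M2
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    t = (m1 - m2) / se
    return t, df, se

# Hypothetical summary statistics, chosen only to illustrate the arithmetic.
t, df, se = welch_t(m1=20, var1=16, n1=16, m2=15, var2=48, n2=16)
print(f"t = {t:.2f}, df = {df:.1f}, standard error = {se:.2f}")
```

The adjusted df is usually not a whole number and is never larger than n1 + n2 − 2; when the variances and sample sizes are equal it reduces to the pooled-variance df.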

Page 95: Hypothesis Test - Pennsylvania State University

t-test in statistics software
• SPSS
• R
  • t.test() function
• Excel