06. hypothesis testing - department of zoology, ubcmfscott/lectures/06_hypothesis_testing.pdf ·...

61
Assignment #2 Chapter 3: 17, 21 Chapter 4: 18, 20 Due This Friday Oct. 2 nd by 2pm in your TA’s homework box

Upload: vothuan

Post on 23-May-2019

215 views

Category:

Documents


0 download

TRANSCRIPT

Assignment #2

Chapter 3: 17, 21 Chapter 4: 18, 20 Due This Friday Oct. 2nd by 2pm in your TA’s homework box

Assignment #1

Chapter 1: 14, 17, 19, 20 Chapter 2: 25, 32 Answers are now posted on the webpage Late assignments (even 1 min.) will not be accepted

Assignment #3

Chapter 5: 28, 36, 37 Chapter 6: 16, 18, 19 Due next Friday Oct. 9th by 2pm in your TA’s homework box

Reading

For Today: Chapter 6 For Thursday: Chapter 7

Chapter 5 Review The probability of an event is its true relative frequency; the proportion of times the event would occur if we repeated the same process over and over again. A probability distribution describes the true relative frequency (a.k.a. the probability) of all possible values of a random variable.

Sum of two dice

Probability

Chapter 5 Review

The probability of A OR B involves addition If the two are mutually exclusive:

Pr(A or B) = Pr(A) + Pr(B) If the two are not mutually exclusive: Pr(A or B) = Pr(A) + Pr(B) – Pr(A and B)

The probability of A AND B involves multiplication

If the two are independent Pr(A and B) = Pr(A) Pr(B)

If the two are dependent Pr(A and B) = Pr(A) Pr(B|A)

Chapter 5 Review Mutually Exclusive Not Mutually Exclusive

Independent Not possible First draw is king (put card back) OR Second draw is king 4/52 + 4/52 – (4/52*4/52) = 0.15 First draw is king (put card back) AND Second draw is king 4/52 * 4/52 = 0.006

Dependent First draw is ace of hearts OR Second draw is ace of hearts (1/52) + (51/52 * 1/51) = 0.04 First draw is ace of hearts AND Second draw is ace of hearts 1/52 * 0 = 0

First draw is king OR Second draw is king 4/52 + ((48/52 * 4/51) + (4/52 * 3/51)) – (4/52 * 3/51) First draw is king AND Second draw is king 4/52 * 3/51

Two consecutive draws from the same deck of cards

Chapter 5 Review •  The conditional probability of an event is the

probability of that event occurring given that a condition is met.

•  Law of total probability:

•  Bayes theorem:

Pr A[ ] = Pr A | B[ ]All valuesof B∑ Pr B[ ]

Pr A |B[ ] =Pr B |A[ ]Pr[A]

Pr[B]

Hypothesis testing

Hypothesis testing compares data to what we would expect to see if a specific null hypothesis were true. If the data are too unusual, compared to what we would expect to see if the null hypothesis were true, then the null hypothesis is rejected.

So we imagine making an infinite number of samples,

from a distribution where men and women have the same

height.

Hypothesis testing in a nutshell

PopulationWe want to know somethingabout this population, say, are men and women the same height, on average?

We can't measure everyone- it would take too long and cost too much. So we take a sample, and meaure those. For these we estimate the difference between men and women's mean height.

Sample

But we have a problem: The sample doesn't have the sameproperties as the population,because of chance errors.

So we need to know how good the sample is, and how likely it is that it is much different from the population.

We make an estimate from each of these samples, and from these we can calculate the sampling distribution of the estimate.

Freq

uenc

y

Difference in mean height

If the actual sample value is so different from what we would expect samples to look like, then we can say that the men in this population are on average taller than the women.

Freq

uenc

y

Difference in mean height

So we imagine making an infinite number of samples,

from a distribution where men and women have the same

height.

Hypothesis testing in a nutshell

PopulationWe want to know somethingabout this population, say, are men and women the same height, on average?

We can't measure everyone- it would take too long and cost too much. So we take a sample, and meaure those. For these we estimate the difference between men and women's mean height.

Sample

But we have a problem: The sample doesn't have the sameproperties as the population,because of chance errors.

So we need to know how good the sample is, and how likely it is that it is much different from the population.

We make an estimate from each of these samples, and from these we can calculate the sampling distribution of the estimate.

Freq

uenc

y

Difference in mean height

If the actual sample value is so different from what we would expect samples to look like, then we can say that the men in this population are on average taller than the women.

Freq

uenc

y

Difference in mean height

Hypotheses are about populations, but are tested

with data from samples

Hypothesis testing usually assumes that sampling is random.

Null hypothesis (Ho): a specific statement about a population parameter made for the purposes of argument. Alternate hypothesis (HA): represents all other possible parameter values except that stated in the null hypothesis.

The null hypothesis is usually the simplest statement, whereas the alternative hypothesis is usually the

statement of greatest interest.

A good null hypothesis would be interesting if proven wrong.

A null hypothesis is specific; an alternate hypothesis is not.

A test statistic is a number calculated from the data that is used to evaluate how compatible the data are with the result expected

under the null hypothesis

Null distribution: the sampling distribution of outcomes for a test statistic under the assumption that the null hypothesis is true

P-value

A P-value is the probability of getting the data, or something as or more unusual, if the null hypothesis were true.

Hypothesis testing: an example

Does a red shirt help win wrestling?

The experiment and the results

•  Animals use red as a sign of aggression

•  Does red influence the outcome of wrestling, taekwondo, and boxing?

–  16 of 20 rounds had more red-shirted than blue-shirted winners in these sports in the 2004 Olympics

–  Shirt color was randomly assigned

Hill, RA, and RA Burton 2005. Red enhances human performance in contests Nature 435:293.

Stating the hypotheses

H0: Red- and blue-shirted athletes are equally likely to win (proportion = 0.5).

HA: Red- and blue-shirted athletes

are not equally likely to win (proportion ≠ 0.5).

Estimating the value

•  16 of 20 is a proportion, proportion = 0.8

•  This is a discrepancy of 0.3 from the proportion proposed by the null hypothesis, proportion = 0.5

Is this discrepancy by chance alone?:

Estimating the probability of such an extreme result

We use the null distribution to estimate the probability of getting the data, or

something as or more unusual, if the null hypothesis were true (i.e. to get the P-

value for our test statistic)

The null distribution of the sample proportion

Calculating the P-value from the null distribution

The P-value is calculated as

P = 2 × [Pr(16) + Pr(17) + Pr(18) + Pr(19) + Pr(20)] = 0.012.

Statistical significance

The significance level, α, is a probability used as a criterion for rejecting the null hypothesis. If the P-value for a test is less than or equal to α, then the null hypothesis is rejected.

α is often 0.05

Significance for the red shirt example

•  P = 0.012

•  P < α, so we can reject the null hypothesis

•  Athletes in red shirts were more likely to win.

Larger samples give more information

•  A larger sample will tend to give and estimate with a smaller confidence interval

•  A larger sample will give more power to reject a false null hypothesis

Hypothesis testing: another example Do dogs resemble their owners?

Common wisdom holds that dogs resemble their owners. Is this true?

•  41 dog owners approached in parks; photos taken of dog and owner separately

•  Photos of owner and dog, along with a photo of another dog, shown to students to match

Roy, M.M., & Christenfeld, N.J.S. (2004). Do dogs resemble their owners? Psychological Science, 15, 361–363

Hypotheses

H0: The proportion of correct matches is proportion = 0.5.

HA: The proportion of correct matches is different from proportion = 0.5.

Data

Of 41 matches, 23 were correct and 18 were incorrect.

Estimating the proportion

sample proportion =2341

= 0.56

Null distribution for dog/owner resemblance

P = 0.53.

The P-value:

We do not reject the null hypothesis that dogs do not resemble their owners.

Jargon

Significance level

•  The acceptable probability of rejecting a true null hypothesis

•  Called α

•  For many purposes, α = 0.05 is acceptable

“Statistically significant”

•  P < α

•  We can “reject the null hypothesis”

We never “accept the null hypothesis”

Type I error

•  Rejecting a true null hypothesis

•  Probability of Type I error is α (the significance level)

Type II error

•  Not rejecting a false null hypothesis

•  The probability of a Type II error is β.

•  The smaller β, the more power a test has.

Power

•  The ability of a test to reject a false null hypothesis

•  Power = 1- β

One- and two-tailed tests

•  Most tests are two-tailed tests.

•  This means that a deviation in either direction would reject the null hypothesis.

•  Normally α is divided into α/2 on one side and α/2 on the other.

Test statistic

2.5% 2.5%

One-tailed tests

•  Only used when the other tail is nonsensical

•  For example, comparing grades on a multiple choice test to that expected by random guessing

Test Statistic

•  A number calculated to represent the match between a set of data and the null hypothesis

•  Can be compared to a general distribution to infer probability

Critical value

•  The value of a test statistic beyond which the null hypothesis can be rejected

Correlation does not automatically imply causation

Correlation does not automatically imply causation

57

Life expectancy by country:

Confounding variable

An unmeasured variable that may be cause both X and Y"

Observations vs. Experiments

Statistical significance ≠ Biological importance

Important Unimportant Significant

Polio vaccine reduces incidence of polio

Things you don’t care about, or already well known things:

Insignificant Small study shows a possible effect, leading to larger study which finds significance.

or Large study showing no effect of drug that was thought to be beneficial.

Studies with small sample size and high P-value

or Things you don’t care

about