hypothesis testing · steps to answering the questions with data how does science advance...

52
Hypothesis Testing

Upload: truongthuan

Post on 25-Aug-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

Hypothesis Testing

Steps to Answering the Questions with Data

How does science advance knowledge? How do we answer questions about the world using observations? Generally, science forms a question and brings data to bear to answer it. Informally, the process is:

• Understand what data says and form clear questions about its implications.

• State a scientific question as either "just random variability" or "unusual outcomes"?

• Compare the actual data with these two choices and decide which to believe.

• Write down our understanding.

Detailed Steps Phase 1: State the Question

1. Evaluate and describe the data 2. Review assumptions 3. State the question-in the form of hypotheses

Phase 2: Decide How to Answer the Question4. Decide on a summary number-a statistic-that

reflects the question 5. How could random variation affect that statistic? 6. State a decision rule, using the statistic, to

answer the question

Detailed Steps (cont)Phase 3: Answer the Question

7. Calculate the statistic 8. Make a statistical decision 9. State the substantive conclusion

Phase 4: Communicate the Answer to the Question10. Document our understanding with text,

tables, or figures

Tonight weTonight we’’ll let the statistics speak for themselvesll let the statistics speak for themselvesEd Ed KorenKoren, , ©© The New YorkerThe New Yorker, 9 December 1974. , 9 December 1974.

Phase 1: State the Question• The goal of phase 1 is to state the question. This

may at first seem trivial but it turns out that this is often the hardest part!

• Perhaps the question is clear. If so, then in phase 1 of the process, we look at the data and its relationship to the question at hand. Sometimes this relationship is obvious.

• More commonly, we may have data and it's not entirely clear what the question is. Or we have a clear question but the relationship of data to this question in tenuous.

Step 1. Evaluate & Describe the Data

1. The goal is to understand the data and to fully describe the data. The questions we should ask are:

• "Where did the data come from?" and • "What are the observed statistics?"

Where did the data come from?

At this point, we need to satisfy ourselves that this was a well-designed study and that the folks in the study are representative.

At the moment, all that is necessary is to know that the study's methods were clearly defined and relevant, that there were adequate controls in place, that the subjects in the study were representative of cardiac-arrest victims, and that the measurements of mortality were without bias.

Bias could result, for instance, if only the "easier" cardiac-arrest cases were included in the study.

What are the observed statistics?

This case study "found that 29 of 278 CPR patients survived." How can we summarize that statement numerically? With a statistic.

A statistic is a single descriptive number computed from the data

The proportion of people who survived is:

• where x(survived) is the number of individuals studied who survived,

• n(phone) is the number of individuals studied (those receiving the phone-CPR)

• p is the observed proportion surviving.

survived

phone

xp

n=

In the NEJM paper, the observed proportion surviving is:

29 0.104 ~ 10%278

survived

phone

xp

n= = =

So, about 10% of cardiacSo, about 10% of cardiac--arrest victims arrest victims survived after phonesurvived after phone--CPR. CPR.

2. Review AssumptionsThe next step is stating what assumptions we're making. In

this case study, the main assumptions are that the cardiac arrest victims in this study are representative of cardiac victims in general and that each victim's experience is independent of the next.

Since the study had well-designed entry criteria and no subjects were excluded for arbitrary reasons, we can conclude that the subjects in this study are representative of others who may have an emergency response system similar to that of Seattle.

3. State the Question

What, in statistical terms, is the question?

Informally, we want to know if phone-CPR is an improvement.

We want to know if our observed 10.4% phone-CPR survival rate is better than the old survival rate of 6%.

Informally, we might be tempted to say "10.4% is greater than 6% so phone-CPR is better." But be more critical and ask whether this difference could be due to chance.

That is, if we'd observed a 6.2% survival rate, would we be tempted to say "6.2% is greater than 6% so phone-CPR is better?"

Defining a Hypothesis

A statistical hypothesis is a hypothesis stated so that it can be evaluated using statistical techniques.

We begin by conceiving the true state of nature as being either "no difference" or "a difference."

We state two hypotheses to reflect these two states of nature.

A hypothesis is a statement about a population (Notabout a sample or the data).

First, we phrase a hypothesis of no difference, termed a null-hypothesis.

Null Hypothesis

The null hypothesis, H0, is the statement that is tested.

A null-hypothesis is the simplest explanation of events: There is no difference. There is no change. There is no improvement. Nothing unusual is occurring.

A null-hypothesis is the statement we hope to contradict with data. We usually hope to reject the null hypothesis.

We always start by assuming that nothing is going on-that any apparent differences are purely because of chance.

Our preference, as scientists, is to believe the simplest explanation for a phenomenon.

Randomness

Thou shalt not interpret randomness.

• Randomness just "is"; making an interpretation that goes beyond this requires justification.

• If random noise, measurement error, or chance occurrence can account for observations, then there is no need to formulate a more complicated explanation.

• We embody this preference in the statement of the null hypothesis.

CPR Study

We start by assuming that nothing is going on-that any apparent improvement over the 6% survival rate is purely because of chance.

What would the survival rate be if essentially no intervention was made?

From the information in this case, we see that we'd expect a 6% survival rate. Two ways to state the null-hypothesis are: H0: true percentage surviving is ≤ 6%, or H0: true proportion surviving is ≤ 0.06.

Thus, one proposed state of nature is that survival is 0.06 or less.

Then, the other possible state of nature is that survival is greater than 0.06

The questionThus we actually state the question addressed by

the study by positing two possible states of nature. H0: true proportion ≤ 0.06, orHA: true proportion > 0.06.

The first state, as we've seen, is called the null-hypothesis.

The second state is the alternative-hypothesis.

Alternative Hypothesis

The alternative hypothesis, HA, is the statement we hope to be able to conclude.

The statement about a population that is true if the null hypothesis is not true. The two are complementary: One and only one of the two hypotheses is true.

Also called the research hypothesis.

Proof by contradiction

When forced to choose between a simple conceptual model and real-world data that clearly contradicts that model, we choose to trust the data.

We won’t PROVE the null hypothesis, we will accept the alternative by contradiction.

Phase 2: Decide How to Answer the Question

In Phase 2, we’ll decide how to answer the question.

We’ll consider all the outcomes that could happen and assess the impact of random variation.

4. Decide on a summary statistic

Our summary statistic is p, the proportion surviving.

If the null-hypothesis is true, this should be approximately 0.06.

5. How could random variation affect the observed statistic?

What survival proportions would we expect to observe if

1) the null-hypothesis is true, and 2) we repeated the study over and over

again?

A simulation How often would we observe various proportions

surviving if we run this study 1000 times, each time on n = 278 subjects and each subject has a 0.06 chance of surviving?

Every time this study is done you'll observe a different number of survivors; the estimated proportion surviving will be different.

What we actually observed in this study is just oneof a number of possibilities of what we couldhave observed.

VariabilityHere we see that the range of proportions includes

p = 0.018. In one experiment only five out of 278 survived. Up

through two experiments where p = 0.108, or an observed proportion almost twice the true, underlying survival rate of 0.06.

One answer to the question "how much variability do we expect?" is "proportions between 0.018 and 0.108."

• The single most common proportion is 0.058 but it occurs only 10.4% of the time. Other more typical values occur around this middle.

• 29% of the time we observed proportion between 0.054 and 0.061.

• Over 46% of the time between 0.050 and 0.065.

• How much variability do we expect? If the true underlying proportion is 0.06 then more than half the time (61.3%) we'll observe values between 0.047 and 0.068.

It's useful to consider what we would choose to believe if various outcomes occur.

In our simulation of 1000 experiments, we saw p = 0.058 (n = 16 survivors) in 104 experiments.

We saw p = 0.061 (n = 17 survivors) in 88 experiments.

We saw p = 0.065 (n = 18 survivors) in 88 experiments. ... and so on.

Overall, in 1000 experiments, we saw 613 this extreme or more.

So, observing 16 survivors or more is fairly common; it happened in 61.3% of the experiments.

What would this mean?

1. The survival proportion is 0.06 (or less).To choose this, we're saying that an event occurred that is within what we'd expect if the null hypothesis is true.

2. The survival proportion is larger than 0.06.To choose this, we're saying that the observed data-survival p = 0.058-is compelling evidence that the null hypothesis is not true.

After observing only 16 survivors, it would be extremely difficult to defend choice 2.

We just can't observe a proportion less than 0.06 as evidence that the real proportion is greater than 0.06.

Observing p = 0.058 clearly supports the null hypothesis. We'd choose to stay with our preference that the actual proportion is unchanged at 0.06.

6. State a decision rule to answer the question

• Which outcomes lead us to retain our preference for the null hypothesis?

• And which lead us to conclude that the alternative is compelling?

• The answer to these questions will be our decision rule.

Consider three possible outcomes: n = 17 survivors, n = 18 survivors, or n = 19 survivors.

• The simulation shows that about half of the time (509/1000) we'd see proportions of p = 0.061 or greater (x = 17 or more survivors).

• About 40% of the time (421/1000) we'd see proportions of p = 0.065 or greater (x = 18 or more survivors).

• About a third of the time (333/1000) we'd see proportions of p = 0.068 or greater (x = 19 or more survivors).

Decision

if we observed p = 0.068 we would retain our preference for the null-hypothesis.

That is, we presume the null-hypothesis is true and for the three examples above, we would agree that there was no compelling evidence to reject the null-hypothesis.

Significance

Choice 2 implies what an observed result of p = 0.108 is so "significant" a departure from what we'd observe by chance, that we reject the null hypothesis

• The significance level is represented by the Greek symbol "alpha",α. It is the probability of rejecting a true null hypothesis.

• The researcher chooses the risk of making this error: concluding that the null hypothesis is false when it really is true.

• The most frequent values are α = 0.05, 0.01, or 0.10.

Choosing α

• The researcher chooses the risk of making this error, prior to conducting the analysis.

• The significance level is the percentage or proportion of time the researcher is willing to conclude that the null hypothesis is false when it really is true.

CPR StudyOur decision rule is thus:

Ho: The population survival proportion is 0.06 or less if the observed proportion p ≤ 0.083 (x = 23 survivors or less).

HA: The population survival proportion is larger than 0.06 if the observed proportion p > 0.086 (x = 24 or more survivors).

Phase 3: Answer the Question

• Now we return to what we did observe. • Recall the question: Does dispatcher-

instructed bystander-administered CPR improve survival?

7. Calculate the statistic

• In the actual study, we observed proportion of 0.104.

8. Make a decision

• Based on our simulated results, our observed proportion falls under the decision rule for rejecting the null hypothesis in favor of the alternative hypothesis.

• That is, we choose to believe the alternative hypothesis:

HA: The survival proportion is > 0.06, since the observed proportion p ≥ 0.086 (x = 24 or more survivors).

Associated p-value• So, if the survival rate really is 6% it would apparently be

a rare event to observe 10.4% survival.• Therefore, the p-value for our experiment is thus

approximately 3 in 1000 (p-value = 0.003). • Recall that this calculation assumes that the null-

hypothesis is true: That the survival proportion is really 0.06.

• It is important to note that it is possible to observe proportions as large as 0.104, but the likelihood is very small (three in a thousand).

P-value

A p-value is the probability that randomness alone leads to a test statistic ≥ to the observed statistic.

When H0 is true, the p-value is the probability of the test statistic as extreme, or more extreme than the one actually observed

9. State the conclusionDoes dispatcher-instructed bystander-

administered CPR improve the chances of survival over no CPR? With this set of observed data, it would be hard to defend the null hypothesis. We reject the null hypothesis of survival rate being 0.06 in favor of the rate being > 0.06. Bystander CPR improves the survival rate compared to no CPR.

Phase 4: Communicate the Answer to the Question

• The final, and perhaps most important, phase of this process is to communicate what you understand.

• We’ve formalized a precise question, described how to answer it, then brought data to bear on the question, and finally answered it

10. Document our understanding with text, tables, or figures

Bystander-administered CPR was administered to n = 278 cardiac arrest victims. Survival probabilities of cardiac arrest victims without this intervention is 6%. In this study, the observed survival probability was 10.4% (x = 29), which was a significant increase (p-value < 0.01). Bystander CPR improves the survival to hospital discharge compared to non CPR.

Summary

In this section, we used a case study and simulation to go through the steps to answer questions with data.

These steps are called hypothesis testing. This was done using simulated results, specific to this study.

Often, this isn’t possible, so in the next section, we’ll standardize the process to a formal set of steps to use in general.