Biostatistics in Practice Peter D. Christenson Biostatistician http://gcrc.LABioMed.org/Biostat Session 3: Testing Hypotheses

Upload: pierce-gordon

Post on 13-Dec-2015


Page 1

Biostatistics in Practice

Peter D. Christenson

Biostatistician

http://gcrc.LABioMed.org/Biostat

Session 3: Testing Hypotheses

Page 2

Session 3 Preparation

We have been using a recent study on hyperactivity for the concepts in this course. The questions below based on this paper are intended to prepare you for session 3.

1. Look at the bottom panel of Figure 3. Based on what we have discussed about confidence intervals, do you see evidence for change in hyperactivity under Mix A?

2. Repeat question 1 for placebo.

Page 3

Session 3 Preparation: #1 and #2

Page 4

Session 3 Preparation

We have been using a recent study on hyperactivity for the concepts in this course. The questions below based on this paper are intended to prepare you for session 3.

3. Now look at the fourth vertical bar in this same panel in Fig 3. Does it agree with your combined conclusions in questions 1 and 2?

Page 5

Session 3 Preparation: #3

Page 6

Session 3 Preparation

We have been using a recent study on hyperactivity for the concepts in this course. The questions below based on this paper are intended to prepare you for session 3.

4. Do you think that the negative conclusion for question #1 has been "proven"?

5. Do you think that the positive conclusion for question #2 has been "proven"?

Page 7

Session 3 Preparation: #4 and #5

Possible values for real effect.

Zero is “ruled out”.

Page 8

Session 3 Preparation

5. From Tables 1 and 2, we see that (209-137)/209=34% of parents of the younger children and (160-130)/160=19% of parents of the older children initially were interested but did not complete the study. What are the main reported reasons for not completing? Does it seem logical that the rate is higher for the 3-year-olds? Do you have any intuition on whether the magnitude of the 34% vs. 19% difference is enough to support an age difference, regardless of the logical reason?

Page 9

Session 3 Preparation: #5

73% ↔ Consented ↔ 90%

66% ↔ Completed ↔ 81%

Not intuitive whether 73% vs. 90% is real, or reproducible.

Page 10

Session 3 Goals

Statistical testing concepts

Three most common tests

Software

Equivalence of testing and confidence intervals

False positive and false negative conclusions

Page 11

Goal: Do Groups Differ By More than is Expected By Chance?

Cohan (2005) Crit Care Med;33:2358-66.

Page 12

Goal: Do Groups Differ By More than is Expected By Chance?

First, need to:

• Specify experimental units (Persons? Blood draws?).

• Specify single outcome for each unit (e.g., Yes/No, mean or min of several measurements?).

• Examine raw data, e.g., histogram, for meeting test requirements.

• Specify group summary measure to be used (e.g., % or mean, median over units).

• Choose particular statistical test for the outcome.

Page 13

Outcome Type → Statistical Test

Cohan (2005) Crit Care Med;33:2358-66.

Medians → Wilcoxon Test

%s → Chi-Square Test

Means → t Test

Page 14

Minimal MAP: Group Distributions of Individual Units

AI Group (N=42)

    Stem  Leaf          #
     7    6             1
     7    11334         5
     6    555           3
     6    01112344      8
     5    5566778       7
     5    01222234      8
     4    57788         5
     4    23            2
     3    6             1
     3    13            2
          ----+----+----+----+
    Multiply Stem.Leaf by 10**+1

Non-AI Group (N=38)

    Stem  Leaf          #
     7    79            2
     7    00111234      8
     6    5556777888   10
     6    00112234      8
     5    67999         5
     5    3             1
     4    79            2
     4    04            2
          ----+----+----+----+
    Multiply Stem.Leaf by 10**+1

→ Approximately normally distributed

→ Use means to summarize groups.

→ Use t-test to compare means.
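As a rough illustration of examining the raw data, the sketch below expands the AI group's stem-and-leaf rows back into individual values and summarizes them. The row encoding and the helper name `decode` are assumptions made for this example; because the leaves are rounded, the summaries should only approximately match the mean (56.2) and SD (10.8) reported for this group.

```python
from statistics import mean, stdev

# AI group stem-and-leaf rows as (stem, leaves). Each value is
# 10 * stem + leaf, following "Multiply Stem.Leaf by 10**+1".
ai_rows = [
    (7, "6"), (7, "11334"), (6, "555"), (6, "01112344"),
    (5, "5566778"), (5, "01222234"), (4, "57788"), (4, "23"),
    (3, "6"), (3, "13"),
]

def decode(rows):
    """Expand (stem, leaves) rows into a flat list of individual values."""
    return [10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves]

ai_values = decode(ai_rows)
ai_mean = mean(ai_values)
ai_sd = stdev(ai_values)  # sample SD (divides by n - 1)
print(len(ai_values), round(ai_mean, 2), round(ai_sd, 2))
```

A roughly symmetric, single-peaked display like this one is what supports summarizing the group by its mean.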

Page 15

Goal: Do Groups Differ By More than is Expected By Chance?

Next, need to:

1. Calculate a standardized quantity for the particular test, a “test statistic”.

• Often: t=(Diff in Group Means)/SE(Diff)

2. Compare the test statistic to what it is expected to be if (populations represented by) groups do not differ. Often: t is approximately a normal bell curve.

3. Declare groups to differ if test statistic is too deviant from expectations in (2) above.

• Often: absolute value of t >~2.

Page 16

t-Test for Minimal MAP: Step 1

1. Calculate a standardized quantity for the particular test, a “test statistic”.

Diff in Group Means = 63.4 - 56.2 = 7.2

SE(Diff) ≈ sqrt[SEM1² + SEM2²] = sqrt(1.66² + 1.41²) ≈ 2.2

    Group    N    Mean         Std Dev      SE(Mean)
    AI       42   56.1666667   10.7824634   1.66 = 10.78/√42
    Non-AI   38   63.4122807    8.7141575   1.41 = 8.71/√38

→ Test Statistic = t = (7.2 - 0)/2.2 = 3.28
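The arithmetic in Step 1 can be sketched as follows, using the unrounded group summaries from the table. The helper name `sem` and the variable names are assumptions for this example. Because the inputs are not rounded first, the result (about 3.3) differs slightly from the slide's hand calculation.

```python
import math

def sem(sd, n):
    """Standard error of a mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)

# Group summaries as reported on the slide.
ai = {"n": 42, "mean": 56.1666667, "sd": 10.7824634}
non_ai = {"n": 38, "mean": 63.4122807, "sd": 8.7141575}

sem_ai = sem(ai["sd"], ai["n"])               # about 1.66
sem_non_ai = sem(non_ai["sd"], non_ai["n"])   # about 1.41

diff = non_ai["mean"] - ai["mean"]                 # about 7.2
se_diff = math.sqrt(sem_ai**2 + sem_non_ai**2)     # about 2.2
t_approx = (diff - 0) / se_diff                    # compare with "no difference" (0)

print(round(diff, 1), round(se_diff, 1), round(t_approx, 2))
```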

Page 17

t-Test for Minimal MAP: Step 2

2. Compare the test statistic to what it is expected to be if (populations represented by) groups do not differ. Often: t is approximately a normal bell curve.

[Figure labels: Expect, 0.95 chance; Observed = 3.28]

Expected values for test statistic if groups do not differ.

Area under sections of curve = probability of values in the interval.

(0.5 for 0 to ∞)

Prob (-2 to -1) is Area = 0.14

Page 18

t-Test for Minimal MAP: Step 3

[Figure labels: Expect, 95% chance; Observed = 3.28]

3. Declare groups to differ if test statistic is too deviant. [How much?]

Convention:

“Too deviant” is < 5% chance → |t| >~2.

“Two-tailed” = the 5% is allocated equally for either group to be superior.

[Figure labels: 2.5% in each tail]

Conclude: Groups differ, since a test statistic of 3.28 or more extreme has <5% chance if there is no difference between the entire populations.

Page 19

t-Test for Minimal MAP: p value

[Figure labels: Expect, 95% chance; Observed = 3.28]

p-value:

Probability of a test statistic at least as deviant as observed, if populations really do not differ.

Smaller values ↔ more evidence of group differences.

Area = 0.0007 in each tail

p value = 2(0.0007) = 0.0014 <<0.05

3. Declare groups to differ if test statistic is too deviant. [How much?]
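A minimal sketch of the "area under the curve" idea, assuming the normal bell-curve approximation to t mentioned in Step 2 (the function name `two_sided_p_normal` is made up for this example). Under that approximation the two-sided p-value for 3.28 comes out near 0.001. The slide's 0.0014 is slightly larger, which would be consistent with software using the t distribution and its somewhat heavier tails.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1

def two_sided_p_normal(t):
    """Two-sided p-value: area beyond |t| in both tails of the bell curve."""
    one_tail = 1.0 - STD_NORMAL.cdf(abs(t))
    return 2.0 * one_tail

# Area for an interval = probability of values in that interval.
area_minus2_to_minus1 = STD_NORMAL.cdf(-1) - STD_NORMAL.cdf(-2)  # about 0.14
area_0_to_infinity = 1.0 - STD_NORMAL.cdf(0)                     # 0.5

p_value = two_sided_p_normal(3.28)
print(round(area_minus2_to_minus1, 2), area_0_to_infinity, p_value < 0.05)
```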

Page 20

t-Test: Technical Note

There are actually several types of t-tests:

• Equal vs. unequal variance (variance = SD²), depending on whether the SDs are too different between the groups. [Yes, there is another statistical test for comparing the SDs.]

SE(Diff) ≈ sqrt[SEM1² + SEM2²] = sqrt(1.66² + 1.41²) ≈ 2.2 is approximate. There are more complicated exact formulas that software implements.

    Group    N    Mean         Std Dev      SE(Mean)
    AI       42   56.1666667   10.7824634   1.66 = 10.78/√42
    Non-AI   38   63.4122807    8.7141575   1.41 = 8.71/√38
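To make the equal vs. unequal variance distinction concrete, the sketch below computes both standard errors from the group summaries. The formulas are the standard textbook ones, not taken from the slides: the unequal-variance SE is exactly sqrt[SEM1² + SEM2²], while the equal-variance version pools the two variances first. The pooled version gives t of about 3.28, which matches the value quoted for this example, so it seems likely (though the slides do not say so) that the reported statistic came from an equal-variance t-test.

```python
import math

n1, mean1, sd1 = 42, 56.1666667, 10.7824634   # AI
n2, mean2, sd2 = 38, 63.4122807, 8.7141575    # Non-AI
diff = mean2 - mean1

# Unequal-variance SE: each group's squared SEM added together.
se_unequal = math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Equal-variance SE: pool the variances, weighting by degrees of freedom.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_unequal = diff / se_unequal
t_pooled = diff / se_pooled
print(round(t_unequal, 2), round(t_pooled, 2))
```

With SDs as similar as these, the two versions lead to the same conclusion.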

Page 21

t-Test: Another Note

There are other types of t-tests:

• A two-sided t-test assumes that differences (between groups or pre-to-post) are possible in both directions, e.g., increase or decrease.

• A one-sided t-test assumes that a difference is possible in only one direction: only an increase or only a decrease, or one group can only have higher (or only lower) responses than the other group. This is very rare, and generally not acceptable.

Page 22

Back to Paper: Normal Range

Δ= 63.4-56.2= 7.2 is the best guess for the MAP diff between a randomly chosen AI and non-AI patient, w/o other patient info.

What is the “normal” range for AI patients?

Non-AI: N = 38, SD = 8.7

AI: N = 42, SD = 10.8

Page 23

Back to Paper: Confidence Intervals

Δ= 7.2 is the best guess for the MAP diff between the means of “all” AI and non-AI patients.

We are 95% sure that diff is within ≈ 7.2±2SE(Diff) = 7.2±2(2.2) = 2.8 to 11.6.

Non-AI: N = 38, SD = 8.7, SE = 1.41

AI: N = 42, SD = 10.8, SE = 1.66

SE(Diff of Means) = 2.2

SE(Diff) ≈ sqrt[SEM1² + SEM2²]

Page 24

Back to Paper: t-test

Δ= 7.2 is statistically significant (p=0.0014); i.e., only 14 of 10,000 sets of 80 patients would differ so much, if AI and non-AI really don’t differ in MAP.

Is Δ= 7.2 clinically significant?

Page 25

Confidence Intervals ↔ Tests

[Figure from the hyperactivity paper, with intervals labeled p>0.05, p≈0.05, and p<0.05]

Page 26

Confidence Intervals ↔ Tests

|Δ/SE(Δ)| = |t| < 2

is equivalent to:

|Δ| < 2 SE(Δ)

is equivalent to:

-2 SE(Δ) < Δ < 2 SE(Δ)

is equivalent to:

Δ - 2 SE(Δ) < 0 < Δ + 2 SE(Δ) (95% Confidence Interval)
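The chain of equivalences above can be checked numerically. In this sketch the function names are assumptions made for illustration, and the (Δ, SE) pairs other than the MAP example (7.2, 2.2) are invented test values.

```python
def approx_ci95(delta, se):
    """Approximate 95% confidence interval: delta plus or minus 2 SE."""
    return delta - 2 * se, delta + 2 * se

def nonsignificant_by_t(delta, se):
    """Not significant when |t| = |delta / SE| is below 2."""
    return abs(delta / se) < 2

def nonsignificant_by_ci(delta, se):
    """Not significant when the 95% interval contains 0."""
    low, high = approx_ci95(delta, se)
    return low < 0 < high

# MAP example from the paper, then some made-up cases.
cases = [(7.2, 2.2), (1.0, 2.2), (-5.0, 2.0), (-3.0, 2.0)]
for delta, se in cases:
    print(delta, se, approx_ci95(delta, se),
          nonsignificant_by_t(delta, se), nonsignificant_by_ci(delta, se))
```

For every pair, the two checks agree: the test and the confidence interval are two views of the same comparison.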

Page 27

Confidence Intervals ↔ Tests

95% Confidence Intervals

Non-overlapping 95% confidence intervals, as here, are sufficient for significant (p<0.05) group differences.

However, non-overlapping is not necessary. They can overlap and still groups can differ significantly.
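A small made-up example shows why overlap does not rule out significance. The group means (10 and 13) and the common SE (1) are hypothetical numbers chosen for illustration, and the ±2 SE rule is the same approximation used throughout this session.

```python
import math

def approx_ci95(mean, se):
    """Approximate 95% confidence interval for one group mean."""
    return mean - 2 * se, mean + 2 * se

# Hypothetical group summaries (not from the paper).
mean_a, mean_b, se_each = 10.0, 13.0, 1.0

ci_a = approx_ci95(mean_a, se_each)   # (8, 12)
ci_b = approx_ci95(mean_b, se_each)   # (11, 15)
intervals_overlap = ci_a[1] > ci_b[0] and ci_b[1] > ci_a[0]

# The test uses the SE of the difference, which is smaller than
# the sum of the two individual SEs.
se_diff = math.sqrt(se_each**2 + se_each**2)
t = (mean_b - mean_a) / se_diff

print(intervals_overlap, round(t, 2), abs(t) > 2)
```

The intervals overlap between 11 and 12, yet |t| exceeds 2, because SE(Diff) = sqrt(2) ≈ 1.41 is less than the 2 that comparing interval endpoints implicitly uses.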

Page 28

Back to Paper: Experimental Units

Cannot use t-test for comparing lab data for multiple blood draws per subject.

[Table footnote b: at least 100 μg/kg/min of propofol administered at the time of blood draw, or any pentobarbital in the 48 hrs before the blood draw]

Page 29

Tests on Percentages

Is 26.3% vs. 61.9% statistically significant (p<0.05), i.e., is the difference so large that it has a <5% chance of occurring if groups do not really differ?

Solution: same theme as for means. Find a test statistic and compare to its expected values if groups do not differ.

See next slide.

Page 30

Tests on Percentages

Cannot use t-test for comparing lab data for multiple blood draws per subject.

[Figure: Chi-Square Distribution. Labels: Expect 1; 95% chance; cutoff 5.99; Observed = 10.2; Area = 0.002]

Here, the test statistic is a ratio, expected to be 1, rather than a difference, expected to be 0.

Test statistic=10.2 >> 5.99, so p<0.05. In fact, p=0.002.

Page 31

Tests on Percentages: Chi-Square

The chi-square test statistic (10.2 in the example) is found by first calculating the expected number of AI patients with MAP <60, and the same for non-AI patients, if AI and non-AI really do not differ for this.

Then, chi-square is found as the sum of standardized (Observed − Expected)², i.e., each squared difference divided by its Expected value. This should be close to 1, as in the graph on the previous slide, if groups do not differ. The value 10.2 seems too big to have happened by chance (probability=0.002).
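The calculation can be sketched as below. The cell counts are not given on the slides; they are inferred here from the percentages and group sizes (26/42 = 61.9% of AI and 10/38 = 26.3% of non-AI patients with MAP <60), so treat them as an assumption. For a 2×2 table the statistic has 1 degree of freedom, for which the p-value can be obtained from the normal curve and the 5% cutoff is about 3.84; the 5.99 shown with the graph is the cutoff for 2 degrees of freedom. Either way, 10.2 is well beyond it.

```python
from statistics import NormalDist

# Assumed counts, inferred from the reported percentages and group sizes.
observed = [[26, 16],   # AI:     MAP < 60, MAP >= 60
            [10, 28]]   # Non-AI: MAP < 60, MAP >= 60

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if the groups do not differ.
        expected = row_totals[i] * col_totals[j] / n_total
        chi_square += (obs - expected) ** 2 / expected

# With 1 degree of freedom, chi-square is the square of a standard normal,
# so the p-value is the two-tailed normal area beyond sqrt(chi_square).
z = NormalDist()
p_value = 2 * (1 - z.cdf(chi_square ** 0.5))
cutoff_1df = z.inv_cdf(0.975) ** 2   # about 3.84

print(round(chi_square, 1), chi_square > cutoff_1df, p_value < 0.05)
```

This uncorrected version gives a p-value near 0.0014, the same order of magnitude as the 0.002 on the slide; software may apply a continuity correction or an exact test, which would change the value somewhat.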

Page 32

Back to t-Test

[Figure labels: Expect, 95% chance; Observed = 3.28]

Declare groups to differ if test statistic is too deviant.

Convention:

“Too deviant” is < 5% chance → |t| >~2.

Why not choose, say, |t|>3, so that our chances of being wrong are even less, <1%?

[Figure labels: 2.5% in each tail]

How much “deviance” is enough proof?

Page 33

Graphical Representation of t-test

[Figure: two bell curves on the axis Δ = Effect (Difference Between Group Means). Labels: "No Effect" curve at no real effect (0); "Real Effect" curve at real effect = 3; effect in study = 1.13; "Conclude real effect" region. The axis is just Δ, not t = Δ/SE(Δ).]

\\\ = Probability: Conclude Effect, But no Real Effect (5%).

/// = Probability: Conclude No Effect, But Real Effect (41%).

Page 34

Graphical Representation of t-test

[Figure: two bell curves on the axis Δ = Effect (Difference Between Group Means). Labels: "No Effect" curve at no real effect (0); "Real Effect" curve at real effect = 3; effect in study = 1.13; shaded areas 5% and 41%; "Conclude real effect" region. The axis is just Δ, not t = Δ/SE(Δ).]

Suppose we need stronger proof; i.e., shift cutoff to right.

Then, chance of false positive is reduced to ~1%, but false negative is increased to ~60%.
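The trade-off can be sketched numerically under the normal approximation. The real effect of 3 comes from the graph, but the slides do not give SE(Δ); the value 1.37 below is an assumption, chosen so that the false negative rate at the usual 5% cutoff is about 41%. The function name `false_negative_rate` is also made up for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def false_negative_rate(real_effect, se, alpha):
    """Chance the observed effect falls inside the 'no effect' region
    (within +/- cutoff) when the real effect is `real_effect`."""
    cutoff = STD_NORMAL.inv_cdf(1 - alpha / 2) * se   # two-tailed cutoff on the delta scale
    observed = NormalDist(mu=real_effect, sigma=se)   # distribution of the study's effect
    return observed.cdf(cutoff) - observed.cdf(-cutoff)

REAL_EFFECT = 3.0
SE_ASSUMED = 1.37   # assumed; not stated on the slides

beta_05 = false_negative_rate(REAL_EFFECT, SE_ASSUMED, alpha=0.05)
beta_01 = false_negative_rate(REAL_EFFECT, SE_ASSUMED, alpha=0.01)
power_05 = 1 - beta_05

print(round(beta_05, 2), round(beta_01, 2), round(power_05, 2))
```

With these assumptions, tightening the false positive rate from 5% to 1% raises the false negative rate from about 41% to roughly 65%, the same direction and general size as the ~60% read off the graph.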

Page 35

Power of a Study

Statistical power is the sensitivity of a study to detect real effects, if they exist.

In the graph two slides back, it is 100-41=59%.

This is the topic of the next session, #4.